CMPSCI 710 - Advanced Compilers

Project Description

 

This project is designed to give students a taste of how to do and report on experimental research. Although the project is a compiler optimization, the approach will teach you skills that are valuable for experimental research in general. Each team picks a compiler optimization, predicts the results, implements it, experiments with it, and reports on it. In the rest of this document, take the word “optimization” to mean “optimization or analysis.”

The class will divide into teams of 2 students. In consultation with the instructor, each team will choose one or more compiler optimizations, find and read relevant papers or book chapters on the optimization, design its implementation within a compiler of the team's choice, implement it, perform experiments to test its correctness and effectiveness, write a report, and give a talk. The remainder of this document discusses potential projects and the expectations and due dates for each of the above components.

You may choose among several compilers. I particularly recommend one of the following; others are possible subject to approval.

  • The Jikes RVM, a Java JIT compiler and runtime system:
    http://www-124.ibm.com/developerworks/oss/jikesrvm/

  • Broadway and C-Breeze, a meta-compiler plus a C compiler infrastructure from the University of Texas at Austin:
    http://www.cs.utexas.edu/users/sammy
    http://www.cs.utexas.edu/users/c-breeze

We have working versions of all of these tools, and each offers different existing infrastructure, optimizations, and targets. Also available is an experimental compiler called Shark, designed by yours truly to optimize scripting languages; we can discuss possible projects using Shark if you’re interested.

Each project should extend the existing compiler's functionality with an optimization and its corresponding analysis; that is, your optimization must add something the compiler does not already do. You may implement a known optimization, invent a new one, or extend an existing one. Some possible projects are:

In Jikes:

  • unrolling and other loop optimizations for improving instruction level parallelism (ILP)
  • policies for generating ILP
  • policies for method inlining
  • more sophisticated escape analysis for scalar replacement
  • escape and other analyses to improve garbage collection
  • as above, but towards improving virtual memory behavior
  • class analysis for safe object inlining
  • policies for object inlining
  • class analysis for improving cache replacement decisions
  • sound global array-bounds check elimination
  • sparse conditional constant propagation
  • induction variable elimination
  • array-based optimizations (there are lots of them to choose from)
  • static and dynamic verification of high-level properties

DUE 2/13/2003: Phase 1 - Project Description

You should turn in a one-page document consisting of a list of the team members, a one- or two-paragraph description of the optimization, and 3 to 5 references to relevant literature (e.g., book chapters, theses, and conference or journal articles). Explain why you chose this project/compiler pair and what results you expect.

DUE 2/27/2003: Phase 2 - Project Design

Having read the references specified in the first document (and perhaps others), each team will develop a 2 to 4 page working document describing the design of the implementation with pseudocode and a full specification of the algorithm (with appropriate citations).

DUE 3/25/2003: Phase 3 - Project Review

By appointment, each team will give a 20-minute demo consisting of 10-20 minutes going through and explaining the current state of the implementation, and 10-20 minutes of discussion in which the team and the instructor ensure that the project will be completed successfully. PREPARE a schedule of activities and discussion points for this meeting. You are driving.

DUE 4/29/2003: Phase 4 - Project Implementation

Your implementation should work by this date. Spend the remainder of the semester on your results, report, and presentation.

DUE 5/6/2003 to 5/13/2003: Phase 5 - Project Talks

Each team gives a 25-minute talk with slides: 5 minutes on what problem you are solving and why it is interesting, 10 to 15 minutes on how you did it and what it does (including some examples), and 5 to 10 minutes on how well it worked and what you learned.

DUE 5/13/2003: Phase 6 - Project Report

The final document will take the basic form of a conference paper and will include an abstract, introduction, project description, results, and summary. The abstract will describe in 1 paragraph why the optimization is important, the implementation of the optimization, and the results. The introduction will in 1 page elaborate on the abstract and further motivate the selected optimization. The remainder of the paper will present the approach and results with static and dynamic measurements using various configurations. You will include a related work section and then summarize, including lessons learned. Much of the report should be taken from the working document developed in Phase 2.

Important: In an appendix to your final report, you must include an implementation details section that has 3 parts: an overview, 1 to 3 well-chosen examples, and a code listing. Part 1 should provide an overview (approximately 1 page) briefly describing any key implementation details and listing the important procedures that you wrote. Part 2 has 1 to 3 before-and-after examples showing how your optimization changes the code. Part 3 includes the files you created and changed. Please mark the procedure definitions and call sites with a highlighter pen.
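
As a hypothetical illustration of what a Part 2 before-and-after example might look like, the Java sketch below shows loop unrolling by a factor of 4, written out at source level. The class and method names are invented for this illustration and come from neither Jikes RVM nor C-Breeze; a real optimization would transform the compiler's intermediate representation rather than source code.

```java
// Hypothetical before/after illustration of loop unrolling (factor 4).
// All names are invented for this sketch; none come from a real compiler.
class UnrollExample {
    // Before: one add and one loop-condition test per element.
    static int sumOriginal(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // After: the body is replicated four times, so the main loop tests its
    // condition once per four elements. A cleanup loop handles the leftover
    // 0 to 3 elements when the length is not a multiple of 4.
    static int sumUnrolled(int[] a) {
        int sum = 0;
        int i = 0;
        int limit = a.length - 3;
        for (; i < limit; i += 4) {
            sum += a[i];
            sum += a[i + 1];
            sum += a[i + 2];
            sum += a[i + 3];
        }
        for (; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7};
        // Both versions must compute the same result.
        System.out.println(sumOriginal(data) + " " + sumUnrolled(data));
    }
}
```

An example in this style makes the correctness argument visible: the reader can check that both versions visit every element exactly once, including when the array length is not a multiple of the unroll factor.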


Notes on the Project Design Document

Your project design document should have the following components:

1. Abstract - 1 paragraph (problem, solution, meaning)

2-4 sentences on what problem you are solving and why it is important,

1-2 sentences on what you will do/did, and

1-2 sentences on what it means (is it new, is it in a new setting, will you confirm someone else's results, etc.)

2. Introduction - 4-7 paragraphs (problem, solution, meaning) expanding the above, plus a paragraph that describes the organization of the document. (This paragraph should not be interchangeable with someone else's paper.)

3. Background - Describe the system in which you are working, what it has, and any previous work on which you are relying.

4. Project Description - Describe what specifically you are adding that the system in which you are working does not already have, and why you chose it over other techniques that do the same thing. Include an example of how your work will change the code.

5. Experimental Design - Detail all the parts of the experimental setting. Include the system version, the particular kind of code it generates that you will use (justify this choice if necessary), the programs, the machine(s), and the types of experiments. For example: we will measure the static number of aliases; we will measure the program execution time configured with M, first without and then with our new technique Y; we will use a cache simulator to measure Z. Restate what is in your system, what you are adding, and what you expect to see as a benefit of your addition.

6. Related Work - Relate your work to the previous work in the area. Why are you implementing this technique (say, this particular register allocator)? What other choices are there? What trade-off does your choice make compared to the other techniques?
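
One possible shape for the execution-time experiments mentioned under Experimental Design is sketched below in Java. It assumes a workload that can be run repeatedly inside one process, discards warm-up runs, and compares medians over several timed trials. All names (TimingSketch, measure, median, speedup) are hypothetical, and the two workloads in main are identical stand-ins; a real study would more likely time whole benchmark programs under each compiler configuration.

```java
import java.util.Arrays;

// Hypothetical timing harness; not part of any compiler named in this document.
class TimingSketch {
    static long sink; // stores results so the JIT cannot discard the workloads

    // Median of the samples; less sensitive to outliers than the mean.
    static double median(long[] samples) {
        long[] s = samples.clone();
        Arrays.sort(s);
        int n = s.length;
        return (n % 2 == 1) ? s[n / 2] : (s[n / 2 - 1] + s[n / 2]) / 2.0;
    }

    // Runs the workload `warmups` times untimed, then `trials` times timed.
    // Returns the elapsed time of each timed trial in nanoseconds.
    static long[] measure(Runnable workload, int warmups, int trials) {
        for (int i = 0; i < warmups; i++) {
            workload.run();
        }
        long[] times = new long[trials];
        for (int i = 0; i < trials; i++) {
            long start = System.nanoTime();
            workload.run();
            times[i] = System.nanoTime() - start;
        }
        return times;
    }

    // Median baseline time divided by median optimized time.
    // A value above 1 means the optimized configuration ran faster.
    static double speedup(long[] baseline, long[] optimized) {
        return median(baseline) / median(optimized);
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        Arrays.fill(data, 1);
        // Stand-ins for "without Y" and "with Y"; here both run the same loop.
        Runnable withoutY = () -> { long s = 0; for (int x : data) s += x; sink = s; };
        Runnable withY = () -> { long s = 0; for (int x : data) s += x; sink = s; };
        long[] base = measure(withoutY, 5, 15);
        long[] opt = measure(withY, 5, 15);
        System.out.println("median baseline (ns):  " + median(base));
        System.out.println("median optimized (ns): " + median(opt));
        System.out.println("speedup: " + speedup(base, opt));
    }
}
```

Whatever harness you use, the design document should state the number of warm-up runs and trials and the summary statistic, so that a reader could repeat the experiment.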

General Writing Tips:

Pick 1-3 main points for your document. Make these points at least 4 times, maybe 5: once each in the abstract, intro, body (1-3 times), related work, and conclusion.

Write in the active voice (not the passive voice). Always choose a clear noun and verb. This tenet greatly increases clarity when writing about computer systems, because knowing which system component is responsible for an action is key to understanding how your system works. In particular, the compiler, memory manager, runtime system, application, or architecture could each be the part of the system responsible for the action.

Consider the following passive sentence: "The object is then allocated into the immortal generation." Instead choose: "The compiler modifies the allocation site, replacing the default nursery allocation with allocation into the immortal space." OR: "At run time, the memory manager chooses to allocate the object into the immortal space." The two alternatives show the problem: the passive sentence does not say which component acts. Hopefully one of them is correct.

To improve clarity, use the shortest sentence that you can. Chop long sentences into 2 or 3 short ones.

"This" is not a noun. For clarity, always follow "this" with the noun it describes. Consider the following incorrect usage: "This causes an error in the runtime system." Instead use: "This buffer overflow causes an error in the runtime system."

Except for the abstract and introduction, each section (and many subsections) should begin with a summary of the contents of the section. In technical writing, there should be no surprise endings.

Every conference citation should include: Authors, title, conference, place, date, page numbers. Every journal citation should include: Authors, title, journal, volume, number, date, page numbers. Every thesis citation should include: Author, title, institution, date. WITHOUT FAIL!