Monday, November 12, 2007

The First Compiler

When I first came to know about compilers there was this question that bothered me lot - How was the first compiler compiled, being that no compiler was available to compile it ? There is another related question if you consider an interesting point, that most compilers are written in the language that are built to compile. A good example is C. The first C compiler obviously couldn't have been written in C. This sounds like a chicken and egg problem right - How did the first compiler get generated ?

The first compiler was written in the traditional way. You write a assembly program and convert it to machine code and you have a executable that can be used to compile.

Now to the other question - how do you write a compiler for a new language if the compilers are written in the language that are built to compile. Well, there are many methods for this :

  1. Write the compiler for the new language in any of the existing language.  The first Pascal compiler was actually written in Fortran.
  2. Write the compiler in a language that is actually a subset of the new language so that we can have the compiler of the subset language to compile the new language. Unlike the first case here both languages are related. The new language is derived from another language.
  3. Write the compiler in the new language and hand compile it in an un-optimized way and use this compiler to compile the original source code, thereby you actually get the optimized compiler.
  4. And finally the classic way - write the compiler in assembly language and get the machine code from it.

So do you know which is the first compiler that was built and for which language it was for ? Well The "first" compiler was created by Rear Admiral Dr. Grace Murray Hopper in 1949. That was for A-O programming language.You can know more about that here. But this compiler was closer to what we would today call a linker, I.e., a tool for combining existing libraries into a program. The name came from the fact that they 'compiled' the libraries into a single binary image.The first successful compiler was the FORTRAN I compiler first began in 1954 and finished in 1957.Modern compiler theory was developed in the early and mid 1960s, primarily out of the early ALGOL-60 compilers.

Friday, October 19, 2007

Caller-saves versus Callee-saves

Consider two functions say fun and bar and bar is calling fun, then bar is called the caller and fun is called the callee. All the registers saved by fun is called callee saved registers and the registers saved by bar is called caller saved registers. Now that we know what caller saved registers and callee saved registers are, let’s look into their role in register allocation.

A good complier will make sure that there are as many registers are available for allocator to work with. This is not always an easy job. When one function calls another function, the values in the registers used by the parent or the caller should not be altered by the child or the callee. One way to make sure this is happening is to save all the registers used by the caller just before calling the callee. Thus the callee can access all the registers without any restrictions and the allocator can also do its function efficiently. Another way for smooth functioning of the allocator is to make the callee save all the registers just before executing its main function body. Yet another way is to make sure that none of the registers used by the caller is allocated to the callee. The caller need to save only those registers which are live across the call and the callee need to save only the registers which are used inside the callee function. This is so because the registers allocator will have information about which all variables are live across a call in the caller and which all variables are used inside a callee function. Whenever a register is saved, it should be reloaded back into the registers either by the caller after the function call or by the callee just before returning to the parent depending on whether its caller saved or callee saved. So basically the only difference between the caller saves and callee saves comes down to when the registers get actually saved.

A compiler can be either use caller save strategy or callee save strategy, but this has got some problems. The entire caller saved registers may not be used in the callee function and not all the registers saved by the callee are live or used across the function call. We cannot avoid these unnecessary saves since when one function is being complied we don’t know the register usage of callee/caller. But unnecessary saves can be reduced by adopting a mix of caller save and callee save strategy. In this some set of registers will be marked as caller save registers and another set will be marked as callee saved registers. No registers will be in both the sets and all the registers available need not be in both the sets. So if any of the registers in the caller save list is live across the function call, then only they need to be saved and not all the registers live across the call. If a mixed strategy is used the variables that are live across a function call should be allocated to callee saved registers. This way, the caller doesn’t have to save these and ,with luck, they don’t have to be saved by the callee either (if the callee doesn’t use these registers in its body).