Go: Overview of the Compiler.

Illustration created for “A Journey With Go”, made from the original Go Gopher, created by Renee French.

ℹ️ This article is based on Go 1.13.

The Go compiler is an important tool in the Go ecosystem, since compilation is one of the essential steps in building our programs into executable binaries. The compiler has had a long journey: it was originally written in C before being moved to Go, and many optimizations and cleanups will keep happening in the future. Let’s take a high-level look at how it operates.

The Go compiler is composed of four phases that could be grouped into two categories:

  • frontend. These phases analyze the source code and produce an abstract syntactic structure of it, called the AST (Abstract Syntax Tree).
  • backend. These phases transform the representation of the source code into machine code, applying several optimizations along the way.
compiler documentation

In order to better understand each phase, let’s use a simple program:

package main

func main() {
	a := 1
	b := 2
	if true {
		add(a, b)
	}
}

func add(a, b int) {
	println(a + b)
}

Parsing

The first phase is pretty straightforward and well explained in the documentation:

In the first phase of compilation, source code is tokenized (lexical analysis), parsed (syntax analysis), and a syntax tree is constructed for each source file.

The lexer will be the first package to run in order to tokenize the source code. Here is the output of the previous example tokenized:

Go source code tokenized
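A similar token stream can be produced with the go/scanner package from the standard library. This is only a sketch under one important assumption: the compiler uses its own internal scanner rather than go/scanner, so the token names printed here may not match the compiler's exactly. The Token type and the tokenize helper are illustrative names introduced for this example.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// src is the beginning of the sample program used in this article.
const src = `package main

func main() {
	a := 1
	b := 2
}
`

// Token is an illustrative type: the kind of a token and its literal text.
type Token struct {
	Kind string
	Lit  string
}

// tokenize scans Go source with go/scanner (not the compiler's internal
// scanner) and returns the tokens in source order.
func tokenize(source string) []Token {
	fset := token.NewFileSet()
	file := fset.AddFile("main.go", fset.Base(), len(source))

	var s scanner.Scanner
	s.Init(file, []byte(source), nil, 0) // no error handler, comments skipped

	var tokens []Token
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		tokens = append(tokens, Token{Kind: tok.String(), Lit: lit})
	}
	return tokens
}

func main() {
	for _, t := range tokenize(src) {
		fmt.Printf("%-8s %q\n", t.Kind, t.Lit)
	}
}
```

Note that the scanner also emits the semicolons that Go inserts automatically at the end of lines, which is why ";" tokens appear even though the source contains none.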

Once tokenized, the source is parsed and used to build a syntax tree.
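The parsing step can be explored in the same spirit with go/parser and go/ast from the standard library. As before, this is a sketch and not the compiler's own parser, which lives in an internal package and builds its own tree. The summarize helper is a hypothetical name; it walks the tree and collects the declared functions and the functions being called.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src is the sample program used in this article.
const src = `package main

func main() {
	a := 1
	b := 2
	if true {
		add(a, b)
	}
}

func add(a, b int) {
	println(a + b)
}
`

// summarize parses source with go/parser and walks the resulting syntax
// tree, returning the names of declared functions and of called functions.
func summarize(source string) (funcs, calls []string, err error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "main.go", source, 0)
	if err != nil {
		return nil, nil, err
	}

	ast.Inspect(file, func(n ast.Node) bool {
		switch node := n.(type) {
		case *ast.FuncDecl:
			funcs = append(funcs, node.Name.Name)
		case *ast.CallExpr:
			// Only plain identifiers are handled, e.g. add(...) or println(...).
			if id, ok := node.Fun.(*ast.Ident); ok {
				calls = append(calls, id.Name)
			}
		}
		return true // keep descending into child nodes
	})
	return funcs, calls, nil
}

func main() {
	funcs, calls, err := summarize(src)
	if err != nil {
		panic(err)
	}
	fmt.Println("functions:", funcs) // functions: [main add]
	fmt.Println("calls:", calls)     // calls: [add println]
}
```

Replacing the walk with ast.Print(fset, file) dumps the whole tree, which is a convenient way to see every node the parser produced.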


AST transformation

The transformation to an Abstract Syntax Tree can be displayed thanks to the command go tool compile with the flag -W:

This phase also includes optimizations like inlining. In our example, the function add has already been inlined, since we do not see any CALLFUNC instruction calling add. Let’s run the command again with the flag -l, which disables inlining:

Once the AST is generated, the compiler can move to a lower-level intermediate representation: the SSA form.

SSA generation

The Static Single Assignment (SSA) phase is where optimizations such as dead code elimination, removal of unused branches, and replacement of some expressions with constant values happen.

The SSA code can be dumped thanks to the command GOSSAFUNC=main go tool compile main.go && open ssa.html, which produces an HTML document with all the different passes that are done in the SSA package:

The generated SSA is shown in the “start” tab:

SSA code

The variables a and b are highlighted here, along with the if condition, which will allow us to see later how those lines are changed. The code also shows how the compiler handles the println function, which is decomposed into four steps: printlock, printint, printnl, printunlock. The compiler automatically adds a lock for us and, according to the type of the argument, calls the matching function to print it correctly.
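The decomposition of println can be pictured as ordinary Go code. The sketch below is only an imitation: the real printlock, printint, printnl and printunlock are unexported runtime functions that write to standard error, so the versions here are stand-ins with the same names. The mutex, the strings.Builder used to collect output, and the loweredPrintln helper are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"sync"
)

// Stand-ins for the runtime helpers named in the SSA dump. These are NOT the
// runtime implementations, which are unexported and write to standard error.
var (
	printMu sync.Mutex      // assumption: a plain mutex models the print lock
	out     strings.Builder // collects the output so that it can be inspected
)

func printlock()       { printMu.Lock() }
func printunlock()     { printMu.Unlock() }
func printint(v int64) { out.WriteString(strconv.FormatInt(v, 10)) }
func printnl()         { out.WriteString("\n") }

// loweredPrintln performs the four calls that stand in for println(v) when v
// is an integer. Another argument type would select another print helper.
func loweredPrintln(v int64) {
	printlock()
	printint(v)
	printnl()
	printunlock()
}

func main() {
	a, b := 1, 2
	loweredPrintln(int64(a + b))
	fmt.Print(out.String()) // prints 3 followed by a newline
}
```

Holding the lock across all the calls is what keeps the pieces of one println from interleaving with output printed by other goroutines.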

In our example, since a and b are known at compile time, the compiler can calculate the final result and mark the variables as no longer necessary. The opt pass will optimize this part:

SSA code — “opt” pass

Here, v11 has been replaced by the result of adding v4 and v5, which have in turn been marked as dead code. The opt deadcode pass will then remove that code:

SSA code — “opt deadcode” pass
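The combined effect of these two passes can be imitated on a toy model. The sketch below is not the compiler's SSA package: the Value type and the fold and deadcode functions are hypothetical and heavily simplified, and only the names v4, v5 and v11 are borrowed from the dump above. It folds an addition of two constants into a single constant, then keeps only the values still reachable from the value that is actually printed.

```go
package main

import "fmt"

// Value is a hypothetical, heavily simplified SSA value: either a constant
// or the addition of two other values.
type Value struct {
	Name string
	Op   string   // "Const" or "Add"
	Aux  int64    // constant payload when Op == "Const"
	Args []*Value // operands when Op == "Add"
}

func (v *Value) String() string {
	if v.Op == "Const" {
		return fmt.Sprintf("%s = Const %d", v.Name, v.Aux)
	}
	return fmt.Sprintf("%s = Add %s %s", v.Name, v.Args[0].Name, v.Args[1].Name)
}

// fold rewrites, in place, every Add of two constants into a constant.
// Dropping the operands is what makes them unused afterwards.
func fold(values []*Value) {
	for _, v := range values {
		if v.Op == "Add" && len(v.Args) == 2 &&
			v.Args[0].Op == "Const" && v.Args[1].Op == "Const" {
			v.Aux = v.Args[0].Aux + v.Args[1].Aux
			v.Op = "Const"
			v.Args = nil
		}
	}
}

// deadcode returns the values reachable from roots, in their original order.
func deadcode(values []*Value, roots ...*Value) []*Value {
	live := map[*Value]bool{}
	var mark func(v *Value)
	mark = func(v *Value) {
		if live[v] {
			return
		}
		live[v] = true
		for _, arg := range v.Args {
			mark(arg)
		}
	}
	for _, r := range roots {
		mark(r)
	}

	var kept []*Value
	for _, v := range values {
		if live[v] {
			kept = append(kept, v)
		}
	}
	return kept
}

func main() {
	v4 := &Value{Name: "v4", Op: "Const", Aux: 1}
	v5 := &Value{Name: "v5", Op: "Const", Aux: 2}
	v11 := &Value{Name: "v11", Op: "Add", Args: []*Value{v4, v5}}
	values := []*Value{v4, v5, v11}

	fold(values)
	// v11 is the only root because it is the value passed to the print call.
	for _, v := range deadcode(values, v11) {
		fmt.Println(v) // v11 = Const 3
	}
}
```

Splitting the work in two mirrors the dump: the first pass only rewrites v11, and a separate pass is responsible for discarding v4 and v5 once nothing refers to them.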

Regarding the if condition, the opt pass will mark the constant true as dead code, and it will then be removed:

constant boolean is removed

Then, another pass will simplify the control flow by marking the unnecessary block and condition as invalid. Those blocks will later be removed by another pass dedicated to the dead code:

unnecessary control flow is removed

Once all the passes are done, the Go compiler will now generate an intermediate assembly code:

Go asm code

The next phase will generate the machine code into the binary file.


Machine code generation

The last step of the compiler is the generation of the object file, main.o in our example. This file can then be disassembled with the objdump tool, which performs the reverse process. Here is a nice diagram created by Grant Seltzer Richman:

go tool compile
go tool objdump

You can find more information about the object file and binaries in “Dissecting Go Binaries”.

Once the object file is generated, it can now be passed directly to the linker with the command go tool link and your binary will finally be ready.
