ℹ️ This article is based on Go 1.13.
The Go compiler is an important tool in the Go ecosystem, since it is one of the essential steps for building our programs into executable binaries. The compiler has come a long way: it was originally written in C before being rewritten in Go, and many optimizations and cleanups will keep happening in the future. Let’s take a high-level look at how it operates.
The Go compiler is composed of four phases that can be grouped into two categories:
- frontend. This part analyzes the source code and produces an abstract syntactic representation of it, called the AST (Abstract Syntax Tree).
- backend. This part transforms that representation of the source code into machine code, applying several optimizations along the way.
In order to better understand each phase, let’s use a simple program:
```go
package main

func main() {
	a := 1
	b := 2
	if true {
		add(a, b)
	}
}

func add(a, b int) {
	println(a + b)
}
```
Parsing
The first phase is pretty straightforward and well explained in the documentation:
In the first phase of compilation, source code is tokenized (lexical analysis), parsed (syntax analysis), and a syntax tree is constructed for each source file.
The lexer runs first and tokenizes the source code. Here is the tokenized output of the previous example:
Once tokenized, the source is parsed and used to build a syntax tree.
AST transformation
The transformation to an Abstract Syntax Tree can be displayed with the command `go tool compile` and the flag `-W`:
This phase also includes optimizations such as inlining. In our example, the function `add` has already been inlined, since we do not see any `CALLFUNC` instruction calling `add`. Let’s run the command again with the flag `-l`, which disables inlining:
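Inlining can also be switched off for a single function instead of the whole build. The minimal sketch below assumes we only want to keep `add` out of line. It places the `//go:noinline` directive, which the gc compiler honors, directly above the declaration. Here `add` is a modified version of the article's function: it returns the sum instead of printing it, so the result can be checked.

```go
package main

// add is a variant of the example's add that returns its result.
// The //go:noinline directive asks the compiler not to inline this one
// function, a per-function alternative to compiling everything with -l.
//
//go:noinline
func add(a, b int) int {
	return a + b
}

func main() {
	println(add(1, 2))
}
```

If you build this with `go build -gcflags=-m`, which prints the compiler's optimization decisions, the output should no longer list `add` as a function that can be inlined.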
Once the AST is generated, the compiler can move to a lower-level intermediate representation: the SSA form.
SSA generation
The Static Single Assignment (SSA) phase is where the optimizations happen: dead code elimination, removal of unused branches, replacement of some expressions with constant values, etc. In SSA form, every variable is assigned exactly once, which makes these analyses simpler.
The SSA code can be dumped with the command `GOSSAFUNC=main go tool compile main.go && open ssa.html`, which produces an HTML document with all the different passes performed in the SSA package:
The generated SSA appears in the “start” tab:
The variables `a` and `b` are highlighted here, along with the `if` condition, so we can see later how those lines change. The code also shows how the compiler handles the `println` function, which is decomposed into 4 steps: `printlock`, `printint`, `printnl`, `printunlock`. The compiler automatically adds a lock for us and, according to the type of the argument, calls the related function to print it correctly.
In our example, since `a` and `b` are known at compile time, the compiler can compute the final result and mark the variables as no longer necessary. The `opt` pass optimizes this part:
`v11` has been replaced here by the result of the addition of `v4` and `v5`, which have been marked as dead code. The `opt deadcode` pass then removes that code:
Regarding the `if` condition, the `opt` phase marks the constant `true` as dead code, which is then removed:
Then, another pass simplifies the control flow by marking the unnecessary block and condition as invalid. Those blocks are later removed by another pass dedicated to dead code:
Once all the passes are done, the Go compiler generates an intermediate assembly code:
The next phase generates the machine code and writes it to the object file.
Machine code generation
The last step of the compiler is the generation of the object file, `main.o` in our example. This file can then be disassembled with the `objdump` tool, which does the reverse process. Here is a nice diagram created by Grant Seltzer Richman:
You can find more information about object files and binaries in “Dissecting Go Binaries”.
Once the object file is generated, it can be passed directly to the linker with the command `go tool link`, and your binary will finally be ready.