<!---
// Copyright 2018 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
-->

## Introduction to the Go compiler

`cmd/compile` contains the main packages that form the Go compiler. The compiler
may be logically split into several phases, which we will briefly describe
alongside the list of packages that contain their code.

You may sometimes hear the terms "front-end" and "back-end" when referring to
the compiler. Roughly speaking, these translate to the first two and last two
phases we are going to list here. A third term, "middle-end", often refers to
much of the work that happens in the phases in between.

Note that the `go/*` family of packages, such as `go/parser` and
`go/types`, are mostly unused by the compiler. Since the compiler was
initially written in C, the `go/*` packages were developed to enable
writing tools working with Go code, such as `gofmt` and `vet`.
However, over time the compiler's internal APIs have slowly evolved to
be more familiar to users of the `go/*` packages.

It should be clarified that the name "gc" stands for "Go compiler", and has
little to do with uppercase "GC", which stands for garbage collection.

### 1. Parsing

* `cmd/compile/internal/syntax` (lexer, parser, syntax tree)

In the first phase of compilation, source code is tokenized (lexical analysis),
parsed (syntax analysis), and a syntax tree is constructed for each source
file.

Each syntax tree is an exact representation of the respective source file, with
nodes corresponding to the various elements of the source such as expressions,
declarations, and statements. The syntax tree also includes position information
which is used for error reporting and the creation of debugging information.

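
The `syntax` package is internal to the compiler and cannot be imported from
ordinary programs. As a rough illustration of the same idea, the sketch below
uses the standard library's `go/parser` as a stand-in (an assumption made for
this example, not what the compiler itself runs) to parse a source string and
report the position recorded on two kinds of syntax tree nodes. The helper name
`describe` and the file name `p.go` are made up for the example.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// src is a tiny source file to parse. It is written with explicit escapes so
// that line and column numbers are unambiguous.
const src = "package p\n\nfunc add(a, b int) int {\n\treturn a + b\n}\n"

// describe parses src and returns one line for each function declaration and
// return statement it finds, prefixed with the node's source position.
func describe(src string) ([]string, error) {
	fset := token.NewFileSet() // holds the position information for all nodes
	file, err := parser.ParseFile(fset, "p.go", src, 0)
	if err != nil {
		return nil, err
	}
	var out []string
	ast.Inspect(file, func(n ast.Node) bool {
		switch n := n.(type) {
		case *ast.FuncDecl:
			out = append(out, fmt.Sprintf("%s: func %s", fset.Position(n.Pos()), n.Name.Name))
		case *ast.ReturnStmt:
			out = append(out, fmt.Sprintf("%s: return statement", fset.Position(n.Pos())))
		}
		return true // keep descending into child nodes
	})
	return out, nil
}

func main() {
	lines, err := describe(src)
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
	// Prints:
	// p.go:3:1: func add
	// p.go:4:2: return statement
}
```

The positions attached to each node are what make error messages of the form
`file:line:column` possible in later phases.
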
### 2. Type checking

* `cmd/compile/internal/types2` (type checking)

The types2 package is a port of `go/types` to use the syntax package's
AST instead of `go/ast`.

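
Because types2 is a port of `go/types`, the public `go/types` API gives a
reasonable feel for what this phase does, even though the compiler does not
call it. The following is a minimal sketch under that assumption: it
type-checks a small package and lists the type computed for each package-level
object. The helper name `check` is hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// src declares a variable whose type must be inferred and a function.
const src = "package p\n\nvar x = 1 + 2.5\n\nfunc half(n int) int { return n / 2 }\n"

// check parses and type-checks src, then returns "name: type" for every
// package-level object, in sorted name order.
func check(src string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "p.go", src, 0)
	if err != nil {
		return nil, err
	}
	// src has no imports, so the zero Config (without an Importer) is enough.
	conf := types.Config{}
	pkg, err := conf.Check("p", fset, []*ast.File{file}, nil)
	if err != nil {
		return nil, err // the first type error, if any
	}
	var out []string
	for _, name := range pkg.Scope().Names() { // Names returns sorted names
		obj := pkg.Scope().Lookup(name)
		out = append(out, fmt.Sprintf("%s: %s", name, obj.Type()))
	}
	return out, nil
}

func main() {
	lines, err := check(src)
	if err != nil {
		panic(err)
	}
	for _, l := range lines {
		fmt.Println(l)
	}
	// Prints:
	// half: func(n int) int
	// x: float64

	// A declaration with mismatched types is rejected by the type checker.
	_, err = check("package p\n\nvar s string = 1\n")
	fmt.Println(err != nil) // Prints: true
}
```
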
### 3. IR construction ("noding")

* `cmd/compile/internal/types` (compiler types)
* `cmd/compile/internal/ir` (compiler AST)
* `cmd/compile/internal/noder` (create compiler AST)

The compiler middle end uses its own AST definition and representation of Go
types carried over from when it was written in C. All of its code is written in
terms of these, so the next step after type checking is to convert the syntax
and types2 representations to ir and types. This process is referred to as
"noding."

Noding uses a process called Unified IR, which builds a node representation
using a serialized version of the typechecked code from step 2.
Unified IR is also involved in import/export of packages and inlining.

### 4. Middle end

* `cmd/compile/internal/deadcode` (dead code elimination)
* `cmd/compile/internal/inline` (function call inlining)
* `cmd/compile/internal/devirtualize` (devirtualization of known interface method calls)
* `cmd/compile/internal/escape` (escape analysis)

Several optimization passes are performed on the IR representation:
dead code elimination, (early) devirtualization, function call
inlining, and escape analysis.

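
To see these passes at work, it can help to compile a small program that gives
each of them something to do. The program below is only a sketch: all names in
it are invented for the example, and which decisions the compiler actually
makes (and how it words its diagnostics) can differ between Go versions.

```go
package main

import "fmt"

type shape interface{ area() int }

type square struct{ side int }

func (s square) area() int { return s.side * s.side }

// double is small, so it is a likely candidate for inlining.
func double(n int) int { return n * 2 }

// newCounter returns the address of a local variable, so escape analysis is
// expected to place c on the heap rather than on the stack.
func newCounter() *int {
	c := 0
	return &c
}

func main() {
	// The concrete type behind s is visible here, which makes the s.area()
	// call below a candidate for devirtualization.
	var s shape = square{side: 3}

	p := newCounter()
	*p = double(s.area())
	fmt.Println(*p) // Prints: 18

	if false {
		// This branch can never run and is a candidate for dead code elimination.
		fmt.Println("unreachable")
	}
}
```

Building this file with `go build -gcflags=-m=2` prints the inlining and escape
analysis decisions the compiler made for it.
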
### 5. Walk

* `cmd/compile/internal/walk` (order of evaluation, desugaring)

The final pass over the IR representation is "walk," which serves two purposes:

1. It decomposes complex statements into individual, simpler statements,
   introducing temporary variables and respecting order of evaluation. This step
   is also referred to as "order."

2. It desugars higher-level Go constructs into more primitive ones. For example,
   `switch` statements are turned into binary search or jump tables, and
   operations on maps and channels are replaced with runtime calls.

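
The compiler performs this rewriting on its internal IR, not on Go source, so
the result cannot be shown directly. As a loose, hand-written approximation of
the `switch` example above, the sketch below pairs a `switch` with an
equivalent chain of comparisons arranged as a binary search over the sorted
case values. Both function names are hypothetical, and the real lowering may
choose a different shape (such as a jump table).

```go
package main

import "fmt"

// nameSwitch is the code as a programmer would write it.
func nameSwitch(n int) string {
	switch n {
	case 1:
		return "one"
	case 3:
		return "three"
	case 5:
		return "five"
	case 7:
		return "seven"
	}
	return "other"
}

// nameLowered approximates a binary-search lowering of nameSwitch: the sorted
// case values are split in half with one comparison, then each half is tested
// with simple equality checks.
func nameLowered(n int) string {
	if n <= 3 {
		if n == 1 {
			return "one"
		}
		if n == 3 {
			return "three"
		}
	} else {
		if n == 5 {
			return "five"
		}
		if n == 7 {
			return "seven"
		}
	}
	return "other"
}

func main() {
	// Both versions must agree on every input.
	for n := 0; n <= 8; n++ {
		if nameSwitch(n) != nameLowered(n) {
			panic(fmt.Sprintf("mismatch for %d", n))
		}
	}
	fmt.Println(nameLowered(5)) // Prints: five
}
```
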
### 6. Generic SSA

* `cmd/compile/internal/ssa` (SSA passes and rules)
* `cmd/compile/internal/ssagen` (converting IR to SSA)

In this phase, IR is converted into Static Single Assignment (SSA) form, a
lower-level intermediate representation with specific properties that make it
easier to implement optimizations and to eventually generate machine code from
it.

During this conversion, function intrinsics are applied. These are special
functions that the compiler has been taught to replace with heavily optimized
code on a case-by-case basis.

Certain nodes are also lowered into simpler components during the AST to SSA
conversion, so that the rest of the compiler can work with them. For instance,
the copy builtin is replaced by memory moves, and range loops are rewritten into
for loops. Some of these currently happen before the conversion to SSA due to
historical reasons, but the long-term plan is to move all of them here.

Then, a series of machine-independent passes and rules are applied. These do not
concern any single computer architecture, and thus run on all `GOARCH` variants.
These passes include dead code elimination, removal of
unneeded nil checks, and removal of unused branches. The generic rewrite rules
mainly concern expressions, such as replacing some expressions with constant
values, and optimizing multiplications and float operations.

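
A small function that touches several of these points makes a convenient
subject when inspecting SSA output. The sketch below is written with that in
mind: it contains a constant expression, a repeated pointer dereference, a
range loop, and a call to `math/bits.TrailingZeros64`, which is treated as an
intrinsic on several architectures. Which optimizations actually fire depends
on the Go version and `GOARCH`, so treat the comments as expectations rather
than guarantees.

```go
package main

import (
	"fmt"
	"math/bits"
)

// Foo is deliberately simple so that its SSA is easy to read.
func Foo(xs []int, p *int) int {
	const scale = 4 * 8 // a constant expression, folded to 32

	total := *p // the first dereference needs a nil check
	total += *p // a second nil check on the same pointer is unneeded

	for _, x := range xs { // range loops are rewritten into plain for loops
		total += x * scale // multiplication by a constant
	}

	// Candidate for replacement by an intrinsic on supporting architectures.
	return total + bits.TrailingZeros64(uint64(total))
}

func main() {
	v := 5
	fmt.Println(Foo([]int{1, 2, 3}, &v)) // Prints: 203
}
```

Building with `GOSSAFUNC=Foo go build` writes an `ssa.html` file that shows
`Foo` after each pass.
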
### 7. Generating machine code

* `cmd/compile/internal/ssa` (SSA lowering and arch-specific passes)
* `cmd/internal/obj` (machine code generation)

The machine-dependent phase of the compiler begins with the "lower" pass, which
rewrites generic values into their machine-specific variants. For example, on
amd64 memory operands are possible, so many load-store operations may be combined.

Note that the lower pass runs all machine-specific rewrite rules, and thus it
currently applies lots of optimizations too.

Once the SSA has been "lowered" and is more specific to the target architecture,
the final code optimization passes are run. This includes yet another dead code
elimination pass, moving values closer to their uses, the removal of local
variables that are never read from, and register allocation.

Other important pieces of work done as part of this step include stack frame
layout, which assigns stack offsets to local variables, and pointer liveness
analysis, which computes which on-stack pointers are live at each GC safe point.

At the end of the SSA generation phase, Go functions have been transformed into
a series of obj.Prog instructions. These are passed to the assembler
(`cmd/internal/obj`), which turns them into machine code and writes out the
final object file. The object file will also contain reflect data, export data,
and debugging information.

### 8. Tips

#### Getting Started

* If you have never contributed to the compiler before, a simple way to begin
  can be adding a log statement or `panic("here")` to get some
  initial insight into whatever you are investigating.

* The compiler itself provides logging, debugging and visualization capabilities,
  such as:
  ```
  $ go build -gcflags=-m=2                   # print optimization info, including inlining, escape analysis
  $ go build -gcflags=-d=ssa/check_bce/debug # print bounds check info
  $ go build -gcflags=-W                     # print internal parse tree after type checking
  $ GOSSAFUNC=Foo go build                   # generate ssa.html file for func Foo
  $ go build -gcflags=-S                     # print assembly
  $ go tool compile -bench=out.txt x.go      # print timing of compiler phases
  ```

  Some flags alter the compiler behavior, such as:
  ```
  $ go tool compile -h file.go               # panic on first compile error encountered
  $ go build -gcflags=-d=checkptr=2          # enable additional unsafe pointer checking
  ```

  There are many additional flags. Some descriptions are available via:
  ```
  $ go tool compile -h              # compiler flags, e.g., go build -gcflags='-m=1 -l'
  $ go tool compile -d help         # debug flags, e.g., go build -gcflags=-d=checkptr=2
  $ go tool compile -d ssa/help     # ssa flags, e.g., go build -gcflags=-d=ssa/prove/debug=2
  ```

  There are some additional details about `-gcflags` and the differences between `go build`
  vs. `go tool compile` in a [section below](#-gcflags-and-go-build-vs-go-tool-compile).

* In general, when investigating a problem in the compiler you usually want to
  start with the simplest possible reproduction and understand exactly what is
  happening with it.

#### Testing your changes

* Be sure to read the [Quickly testing your changes](https://go.dev/doc/contribute#quick_test)
  section of the Go Contribution Guide.

* Some tests live within the cmd/compile packages and can be run by `go test ./...` or similar,
  but many cmd/compile tests are in the top-level
  [test](https://github.com/golang/go/tree/master/test) directory:

  ```
  $ go test cmd/internal/testdir                           # all tests in 'test' dir
  $ go test cmd/internal/testdir -run='Test/escape.*.go'   # test specific files in 'test' dir
  ```
  For details, see the [testdir README](https://github.com/golang/go/tree/master/test#readme).
  The `errorCheck` method in [testdir_test.go](https://github.com/golang/go/blob/master/src/cmd/internal/testdir/testdir_test.go)
  is helpful for a description of the `ERROR` comments used in many of those tests.

  In addition, the `go/types` package from the standard library and `cmd/compile/internal/types2`
  have shared tests in `src/internal/types/testdata`, and both type checkers
  should be checked if anything changes there.

* The new [application-based coverage profiling](https://go.dev/testing/coverage/) can be used
  with the compiler, such as:

  ```
  $ go install -cover -coverpkg=cmd/compile/... cmd/compile  # build compiler with coverage instrumentation
  $ mkdir /tmp/coverdir                                      # pick location for coverage data
  $ GOCOVERDIR=/tmp/coverdir go test [...]                   # use compiler, saving coverage data
  $ go tool covdata textfmt -i=/tmp/coverdir -o coverage.out # convert to traditional coverage format
  $ go tool cover -html coverage.out                         # view coverage via traditional tools
  ```

#### Juggling compiler versions

* Many of the compiler tests use the version of the `go` command found in your PATH and
  its corresponding `compile` binary.

* If you are in a branch and your PATH includes `<go-repo>/bin`,
  doing `go install cmd/compile` will build the compiler using the code from your
  branch and install it to the proper location so that subsequent `go` commands
  like `go build` or `go test ./...` will exercise your freshly built compiler.

* [toolstash](https://pkg.go.dev/golang.org/x/tools/cmd/toolstash) provides a way
  to save, run, and restore a known good copy of the Go toolchain. For example, it can be
  a good practice to initially build your branch, save that version of
  the toolchain, then restore the known good version of the tools to compile
  your work-in-progress version of the compiler.

  Sample set up steps:
  ```
  $ go install golang.org/x/tools/cmd/toolstash@latest
  $ git clone https://go.googlesource.com/go
  $ cd go
  $ git checkout -b mybranch
  $ ./src/all.bash               # build and confirm good starting point
  $ export PATH=$PWD/bin:$PATH
  $ toolstash save               # save current tools
  ```
  After that, your edit/compile/test cycle can be similar to:
  ```
  <... make edits to cmd/compile source ...>
  $ toolstash restore && go install cmd/compile   # restore known good tools to build compiler
  <... 'go build', 'go test', etc. ...>           # use freshly built compiler
  ```

* toolstash also allows comparing the installed vs. stashed copy of
  the compiler, such as if you expect equivalent behavior after a refactor.
  For example, to check that your changed compiler produces identical object files to
  the stashed compiler while building the standard library:
  ```
  $ toolstash restore && go install cmd/compile   # build latest compiler
  $ go build -toolexec "toolstash -cmp" -a -v std # compare latest vs. saved compiler
  ```

* If versions appear to get out of sync (for example, with errors like
  `linked object header mismatch` with version strings like
  `devel go1.21-db3f952b1f`), you might need to do
  `toolstash restore && go install cmd/...` to update all the tools under cmd.

#### Additional helpful tools

* [compilebench](https://pkg.go.dev/golang.org/x/tools/cmd/compilebench) benchmarks
  the speed of the compiler.

* [benchstat](https://pkg.go.dev/golang.org/x/perf/cmd/benchstat) is the standard tool
  for reporting performance changes resulting from compiler modifications,
  including whether any improvements are statistically significant:
  ```
  $ go test -bench=SomeBenchmarks -count=20 > new.txt   # use new compiler
  $ toolstash restore                                   # restore old compiler
  $ go test -bench=SomeBenchmarks -count=20 > old.txt   # use old compiler
  $ benchstat old.txt new.txt                           # compare old vs. new
  ```

* [bent](https://pkg.go.dev/golang.org/x/benchmarks/cmd/bent) facilitates running a
  large set of benchmarks from various community Go projects inside a Docker container.

* [perflock](https://github.com/aclements/perflock) helps obtain more consistent
  benchmark results, including by manipulating CPU frequency scaling settings on Linux.

* [view-annotated-file](https://github.com/loov/view-annotated-file) (from the community)
  overlays inlining, bounds check, and escape info back onto the source code.

* [godbolt.org](https://go.godbolt.org) is widely used to examine
  and share assembly output from many compilers, including the Go compiler. It can also
  [compare](https://go.godbolt.org/z/5Gs1G4bKG) assembly for different versions of
  a function or across Go compiler versions, which can be helpful for investigations and
  bug reports.

#### -gcflags and 'go build' vs. 'go tool compile'

* `-gcflags` is a go command [build flag](https://pkg.go.dev/cmd/go#hdr-Compile_packages_and_dependencies).
  `go build -gcflags=<args>` passes the supplied `<args>` to the underlying
  `compile` invocation(s) while still doing everything that the `go build` command
  normally does (e.g., handling the build cache, modules, and so on). In contrast,
  `go tool compile <args>` asks the `go` command to invoke `compile <args>` a single time
  without involving the standard `go build` machinery. In some cases, it can be helpful to have
  fewer moving parts by doing `go tool compile <args>`, such as if you have a
  small standalone source file that can be compiled without any assistance from `go build`.
  In other cases, it is more convenient to pass `-gcflags` to a build command like
  `go build`, `go test`, or `go install`.

* `-gcflags` by default applies to the packages named on the command line, but can
  use package patterns such as `-gcflags='all=-m=1 -l'`, or multiple package patterns such as
  `-gcflags='all=-m=1' -gcflags='fmt=-m=2'`. For details, see the
  [cmd/go documentation](https://pkg.go.dev/cmd/go#hdr-Compile_packages_and_dependencies).

### Further reading

To dig deeper into how the SSA package works, including its passes and rules,
head to [cmd/compile/internal/ssa/README.md](internal/ssa/README.md).

Finally, if something in this README or the SSA README is unclear
or if you have an idea for an improvement, feel free to leave a comment in
[issue 30074](https://go.dev/issue/30074).