1# go-json
2
3![Go](https://github.com/goccy/go-json/workflows/Go/badge.svg)
4[![GoDoc](https://godoc.org/github.com/goccy/go-json?status.svg)](https://pkg.go.dev/github.com/goccy/go-json?tab=doc)
5[![codecov](https://codecov.io/gh/goccy/go-json/branch/master/graph/badge.svg)](https://codecov.io/gh/goccy/go-json)
6
7Fast JSON encoder/decoder compatible with encoding/json for Go
8
9<img width="400px" src="https://user-images.githubusercontent.com/209884/92572337-42b42900-f2bf-11ea-973a-c74a359553a5.png"></img>
10
11# Roadmap
12
13```
14* version ( expected release date )
15
16* v0.9.0
17 |
18 | while maintaining compatibility with encoding/json, we will add convenient APIs
19 |
20 v
21* v1.0.0
22```
23
24We are accepting requests for features that will be implemented between v0.9.0 and v.1.0.0.
25If you have the API you need, please submit your issue [here](https://github.com/goccy/go-json/issues).
26
27# Features
28
29- Drop-in replacement of `encoding/json`
30- Fast ( See [Benchmark section](https://github.com/goccy/go-json#benchmarks) )
31- Flexible customization with options
32- Coloring the encoded string
33- Can propagate context.Context to `MarshalJSON` or `UnmarshalJSON`
34- Can dynamically filter the fields of the structure type-safely
35
36# Installation
37
38```
39go get github.com/goccy/go-json
40```
41
42# How to use
43
44Replace import statement from `encoding/json` to `github.com/goccy/go-json`
45
46```
47-import "encoding/json"
48+import "github.com/goccy/go-json"
49```
50
51# JSON library comparison
52
53| name | encoder | decoder | compatible with `encoding/json` |
54| :----: | :------: | :-----: | :-----------------------------: |
55| encoding/json | yes | yes | N/A |
56| [json-iterator/go](https://github.com/json-iterator/go) | yes | yes | partial |
57| [easyjson](https://github.com/mailru/easyjson) | yes | yes | no |
58| [gojay](https://github.com/francoispqt/gojay) | yes | yes | no |
59| [segmentio/encoding/json](https://github.com/segmentio/encoding/tree/master/json) | yes | yes | partial |
60| [jettison](https://github.com/wI2L/jettison) | yes | no | no |
61| [simdjson-go](https://github.com/minio/simdjson-go) | no | yes | no |
62| goccy/go-json | yes | yes | yes |
63
64- `json-iterator/go` isn't compatible with `encoding/json` in many ways (e.g. https://github.com/json-iterator/go/issues/229 ), but it hasn't been supported for a long time.
65- `segmentio/encoding/json` is well supported for encoders, but some are not supported for decoder APIs such as `Token` ( streaming decode )
66
67## Other libraries
68
69- [jingo](https://github.com/bet365/jingo)
70
71I tried the benchmark but it didn't work.
72Also, it seems to panic when it receives an unexpected value because there is no error handling...
73
74- [ffjson](https://github.com/pquerna/ffjson)
75
76Benchmarking gave very slow results.
77It seems that it is assumed that the user will use the buffer pool properly.
78Also, development seems to have already stopped
79
80# Benchmarks
81
82```
83$ cd benchmarks
84$ go test -bench .
85```
86
87## Encode
88
89<img width="700px" src="https://user-images.githubusercontent.com/209884/107126758-0845cb00-68f5-11eb-8db7-086fcf9bcfaa.png"></img>
90<img width="700px" src="https://user-images.githubusercontent.com/209884/107126757-07ad3480-68f5-11eb-87aa-858cc5eacfcb.png"></img>
91
92## Decode
93
94<img width="700" alt="" src="https://user-images.githubusercontent.com/209884/107979944-bd1d6d80-7002-11eb-944b-9d17b6674e3f.png">
95<img width="700" alt="" src="https://user-images.githubusercontent.com/209884/107979931-b989e680-7002-11eb-87a0-66fc22d90dd4.png">
96<img width="700" alt="" src="https://user-images.githubusercontent.com/209884/107979940-bc84d700-7002-11eb-9647-869bbc25c9d9.png">
97
98
99# Fuzzing
100
101[go-json-fuzz](https://github.com/goccy/go-json-fuzz) is the repository for fuzzing tests.
102If you run the test in this repository and find a bug, please commit to corpus to go-json-fuzz and report the issue to [go-json](https://github.com/goccy/go-json/issues).
103
104# How it works
105
106`go-json` is very fast in both encoding and decoding compared to other libraries.
107It's easier to implement by using automatic code generation for performance or by using a dedicated interface, but `go-json` dares to stick to compatibility with `encoding/json` and is the simple interface. Despite this, we are developing with the aim of being the fastest library.
108
109Here, we explain the various speed-up techniques implemented by `go-json`.
110
111## Basic technique
112
113The techniques listed here are the ones used by most of the libraries listed above.
114
115### Buffer reuse
116
117Since the only value required for the result of `json.Marshal(interface{}) ([]byte, error)` is `[]byte`, the only value that must be allocated during encoding is the return value `[]byte` .
118
119Also, as the number of allocations increases, the performance will be affected, so the number of allocations should be kept as low as possible when creating `[]byte`.
120
121Therefore, there is a technique to reduce the number of times a new buffer must be allocated by reusing the buffer used for the previous encoding by using `sync.Pool`.
122
123Finally, you allocate a buffer that is as long as the resulting buffer and copy the contents into it, you only need to allocate the buffer once in theory.
124
125```go
126type buffer struct {
127 data []byte
128}
129
130var bufPool = sync.Pool{
131 New: func() interface{} {
132 return &buffer{data: make([]byte, 0, 1024)}
133 },
134}
135
136buf := bufPool.Get().(*buffer)
137data := encode(buf.data) // reuse buf.data
138
139newBuf := make([]byte, len(data))
140copy(newBuf, buf)
141
142buf.data = data
143bufPool.Put(buf)
144```
145
146### Elimination of reflection
147
148As you know, the reflection operation is very slow.
149
150Therefore, using the fact that the address position where the type information is stored is fixed for each binary ( we call this `typeptr` ),
151we can use the address in the type information to call a pre-built optimized process.
152
153For example, you can get the address to the type information from `interface{}` as follows and you can use that information to call a process that does not have reflection.
154
155To process without reflection, pass a pointer (`unsafe.Pointer`) to the value is stored.
156
157```go
158
159type emptyInterface struct {
160 typ unsafe.Pointer
161 ptr unsafe.Pointer
162}
163
164var typeToEncoder = map[uintptr]func(unsafe.Pointer)([]byte, error){}
165
166func Marshal(v interface{}) ([]byte, error) {
167 iface := (*emptyInterface)(unsafe.Pointer(&v)
168 typeptr := uintptr(iface.typ)
169 if enc, exists := typeToEncoder[typeptr]; exists {
170 return enc(iface.ptr)
171 }
172 ...
173}
174```
175
176※ In reality, `typeToEncoder` can be referenced by multiple goroutines, so exclusive control is required.
177
178## Unique speed-up technique
179
180## Encoder
181
182### Do not escape arguments of `Marshal`
183
184`json.Marshal` and `json.Unmarshal` receive `interface{}` value and they perform type determination dynamically to process.
185In normal case, you need to use the `reflect` library to determine the type dynamically, but since `reflect.Type` is defined as `interface`, when you call the method of `reflect.Type`, The reflect's argument is escaped.
186
187Therefore, the arguments for `Marshal` and `Unmarshal` are always escaped to the heap.
188However, `go-json` can use the feature of `reflect.Type` while avoiding escaping.
189
190`reflect.Type` is defined as `interface`, but in reality `reflect.Type` is implemented only by the structure `rtype` defined in the `reflect` package.
191For this reason, to date `reflect.Type` is the same as `*reflect.rtype`.
192
193Therefore, by directly handling `*reflect.rtype`, which is an implementation of `reflect.Type`, it is possible to avoid escaping because it changes from `interface` to using `struct`.
194
195The technique for working with `*reflect.rtype` directly from `go-json` is implemented at [rtype.go](https://github.com/goccy/go-json/blob/master/internal/runtime/rtype.go)
196
197Also, the same technique is cut out as a library ( https://github.com/goccy/go-reflect )
198
199Initially this feature was the default behavior of `go-json`.
200But after careful testing, I found that I passed a large value to `json.Marshal()` and if the argument could not be assigned to the stack, it could not be properly escaped to the heap (a bug in the Go compiler).
201
202Therefore, this feature will be provided as an **optional** until this issue is resolved.
203
204To use it, add `NoEscape` like `MarshalNoEscape()`
205
206### Encoding using opcode sequence
207
208I explained that you can use `typeptr` to call a pre-built process from type information.
209
210In other libraries, this dedicated process is processed by making it an function calling like anonymous function, but function calls are inherently slow processes and should be avoided as much as possible.
211
212Therefore, `go-json` adopted the Instruction-based execution processing system, which is also used to implement virtual machines for programming language.
213
214If it is the first type to encode, create the opcode ( instruction ) sequence required for encoding.
215From the second time onward, use `typeptr` to get the cached pre-built opcode sequence and encode it based on it. An example of the opcode sequence is shown below.
216
217```go
218json.Marshal(struct{
219 X int `json:"x"`
220 Y string `json:"y"`
221}{X: 1, Y: "hello"})
222```
223
224When encoding a structure like the one above, create a sequence of opcodes like this:
225
226```
227- opStructFieldHead ( `{` )
228- opStructFieldInt ( `"x": 1,` )
229- opStructFieldString ( `"y": "hello"` )
230- opStructEnd ( `}` )
231- opEnd
232```
233
234※ When processing each operation, write the letters on the right.
235
236In addition, each opcode is managed by the following structure (
237Pseudo code ).
238
239```go
240type opType int
241const (
242 opStructFieldHead opType = iota
243 opStructFieldInt
244 opStructFieldStirng
245 opStructEnd
246 opEnd
247)
248type opcode struct {
249 op opType
250 key []byte
251 next *opcode
252}
253```
254
255The process of encoding using the opcode sequence is roughly implemented as follows.
256
257```go
258func encode(code *opcode, b []byte, p unsafe.Pointer) ([]byte, error) {
259 for {
260 switch code.op {
261 case opStructFieldHead:
262 b = append(b, '{')
263 code = code.next
264 case opStructFieldInt:
265 b = append(b, code.key...)
266 b = appendInt((*int)(unsafe.Pointer(uintptr(p)+code.offset)))
267 code = code.next
268 case opStructFieldString:
269 b = append(b, code.key...)
270 b = appendString((*string)(unsafe.Pointer(uintptr(p)+code.offset)))
271 code = code.next
272 case opStructEnd:
273 b = append(b, '}')
274 code = code.next
275 case opEnd:
276 goto END
277 }
278 }
279END:
280 return b, nil
281}
282```
283
284In this way, the huge `switch-case` is used to encode by manipulating the linked list opcodes to avoid unnecessary function calls.
285
286### Opcode sequence optimization
287
288One of the advantages of encoding using the opcode sequence is the ease of optimization.
289The opcode sequence mentioned above is actually converted into the following optimized operations and used.
290
291```
292- opStructFieldHeadInt ( `{"x": 1,` )
293- opStructEndString ( `"y": "hello"}` )
294- opEnd
295```
296
297It has been reduced from 5 opcodes to 3 opcodes !
298Reducing the number of opcodees means reducing the number of branches with `switch-case`.
299In other words, the closer the number of operations is to 1, the faster the processing can be performed.
300
301In `go-json`, optimization to reduce the number of opcodes itself like the above and it speeds up by preparing opcodes with optimized paths.
302
303### Change recursive call from CALL to JMP
304
305Recursive processing is required during encoding if the type is defined recursively as follows:
306
307```go
308type T struct {
309 X int
310 U *U
311}
312
313type U struct {
314 T *T
315}
316
317b, err := json.Marshal(&T{
318 X: 1,
319 U: &U{
320 T: &T{
321 X: 2,
322 },
323 },
324})
325fmt.Println(string(b)) // {"X":1,"U":{"T":{"X":2,"U":null}}}
326```
327
328In `go-json`, recursive processing is processed by the operation type of ` opStructFieldRecursive`.
329
330In this operation, after acquiring the opcode sequence used for recursive processing, the function is **not** called recursively as it is, but the necessary values are saved by itself and implemented by moving to the next operation.
331
332The technique of implementing recursive processing with the `JMP` operation while avoiding the `CALL` operation is a famous technique for implementing a high-speed virtual machine.
333
334For more details, please refer to [the article](https://engineering.mercari.com/blog/entry/1599563768-081104c850) ( but Japanese only ).
335
336### Dispatch by typeptr from map to slice
337
338When retrieving the data cached from the type information by `typeptr`, we usually use map.
339Map requires exclusive control, so use `sync.Map` for a naive implementation.
340
341However, this is slow, so it's a good idea to use the `atomic` package for exclusive control as implemented by `segmentio/encoding/json` ( https://github.com/segmentio/encoding/blob/master/json/codec.go#L41-L55 ).
342
343This implementation slows down the set instead of speeding up the get, but it works well because of the nature of the library, it encodes much more for the same type.
344
345However, as a result of profiling, I noticed that `runtime.mapaccess2` accounts for a significant percentage of the execution time. So I thought if I could change the lookup from map to slice.
346
347There is an API named `typelinks` defined in the `runtime` package that the `reflect` package uses internally.
348This allows you to get all the type information defined in the binary at runtime.
349
350The fact that all type information can be acquired means that by constructing slices in advance with the acquired total number of type information, it is possible to look up with the value of `typeptr` without worrying about out-of-range access.
351
352However, if there is too much type information, it will use a lot of memory, so by default we will only use this optimization if the slice size fits within **2Mib** .
353
354If this approach is not available, it will fall back to the `atomic` based process described above.
355
356If you want to know more, please refer to the implementation [here](https://github.com/goccy/go-json/blob/master/internal/runtime/type.go#L36-L100)
357
358## Decoder
359
360### Dispatch by typeptr from map to slice
361
362Like the encoder, the decoder also uses typeptr to call the dedicated process.
363
364### Faster termination character inspection using NUL character
365
366In order to decode, you have to traverse the input buffer character by position.
367At that time, if you check whether the buffer has reached the end, it will be very slow.
368
369`buf` : `[]byte` type variable. holds the string passed to the decoder
370`cursor` : `int64` type variable. holds the current read position
371
372```go
373buflen := len(buf)
374for ; cursor < buflen; cursor++ { // compare cursor and buflen at all times, it is so slow.
375 switch buf[cursor] {
376 case ' ', '\n', '\r', '\t':
377 }
378}
379```
380
381Therefore, by adding the `NUL` (`\000`) character to the end of the read buffer as shown below, it is possible to check the termination character at the same time as other characters.
382
383```go
384for {
385 switch buf[cursor] {
386 case ' ', '\n', '\r', '\t':
387 case '\000':
388 return nil
389 }
390 cursor++
391}
392```
393
394### Use Boundary Check Elimination
395
396Due to the `NUL` character optimization, the Go compiler does a boundary check every time, even though `buf[cursor]` does not cause out-of-range access.
397
398Therefore, `go-json` eliminates boundary check by fetching characters for hotspot by pointer operation. For example, the following code.
399
400```go
401func char(ptr unsafe.Pointer, offset int64) byte {
402 return *(*byte)(unsafe.Pointer(uintptr(ptr) + uintptr(offset)))
403}
404
405p := (*sliceHeader)(&unsafe.Pointer(buf)).data
406for {
407 switch char(p, cursor) {
408 case ' ', '\n', '\r', '\t':
409 case '\000':
410 return nil
411 }
412 cursor++
413}
414```
415
416### Checking the existence of fields of struct using Bitmaps
417
418I found by the profiling result, in the struct decode, lookup process for field was taking a long time.
419
420For example, consider decoding a string like `{"a":1,"b":2,"c":3}` into the following structure:
421
422```go
423type T struct {
424 A int `json:"a"`
425 B int `json:"b"`
426 C int `json:"c"`
427}
428```
429
430At this time, it was found that it takes a lot of time to acquire the decoding process corresponding to the field from the field name as shown below during the decoding process.
431
432```go
433fieldName := decodeKey(buf, cursor) // "a" or "b" or "c"
434decoder, exists := fieldToDecoderMap[fieldName] // so slow
435if exists {
436 decoder(buf, cursor)
437} else {
438 skipValue(buf, cursor)
439}
440```
441
442To improve this process, `json-iterator/go` is optimized so that it can be branched by switch-case when the number of fields in the structure is 10 or less (switch-case is faster than map). However, there is a risk of hash collision because the value hashed by the FNV algorithm is used for conditional branching. Also, `gojay` processes this part at high speed by letting the library user yourself write `switch-case`.
443
444
445`go-json` considers and implements a new approach that is different from these. I call this **bitmap field optimization**.
446
447The range of values per character can be represented by `[256]byte`. Also, if the number of fields in the structure is 8 or less, `int8` type can represent the state of each field.
448In other words, it has the following structure.
449
450- Base ( 8bit ): `00000000`
451- Key "a": `00000001` ( assign key "a" to the first bit )
452- Key "b": `00000010` ( assign key "b" to the second bit )
453- Key "c": `00000100` ( assign key "c" to the third bit )
454
455Bitmap structure is the following
456
457```
458 | key index(0) |
459------------------------
460 0 | 00000000 |
461 1 | 00000000 |
462~~ | |
46397 (a) | 00000001 |
46498 (b) | 00000010 |
46599 (c) | 00000100 |
466~~ | |
467255 | 00000000 |
468```
469
470You can think of this as a Bitmap with a height of `256` and a width of the maximum string length in the field name.
471In other words, it can be represented by the following type .
472
473```go
474[maxFieldKeyLength][256]int8
475```
476
477When decoding a field character, check whether the corresponding character exists by referring to the pre-built bitmap like the following.
478
479```go
480var curBit int8 = math.MaxInt8 // 11111111
481
482c := char(buf, cursor)
483bit := bitmap[keyIdx][c]
484curBit &= bit
485if curBit == 0 {
486 // not found field
487}
488```
489
490If `curBit` is not `0` until the end of the field string, then the string is
491You may have hit one of the fields.
492But the possibility is that if the decoded string is shorter than the field string, you will get a false hit.
493
494- input: `{"a":1}`
495```go
496type T struct {
497 X int `json:"abc"`
498}
499```
500※ Since `a` is shorter than `abc`, it can decode to the end of the field character without `curBit` being 0.
501
502Rest assured. In this case, it doesn't matter because you can tell if you hit by comparing the string length of `a` with the string length of `abc`.
503
504Finally, calculate the position of the bit where `1` is set and get the corresponding value, and you're done.
505
506Using this technique, field lookups are possible with only bitwise operations and access to slices.
507
508`go-json` uses a similar technique for fields with 9 or more and 16 or less fields. At this time, Bitmap is constructed as `[maxKeyLen][256]int16` type.
509
510Currently, this optimization is not performed when the maximum length of the field name is long (specifically, 64 bytes or more) in addition to the limitation of the number of fields from the viewpoint of saving memory usage.
511
512### Others
513
514I have done a lot of other optimizations. I will find time to write about them. If you have any questions about what's written here or other optimizations, please visit the `#go-json` channel on `gophers.slack.com` .
515
516## Reference
517
518Regarding the story of go-json, there are the following articles in Japanese only.
519
520- https://speakerdeck.com/goccy/zui-su-falsejsonraiburariwoqiu-mete
521- https://engineering.mercari.com/blog/entry/1599563768-081104c850/
522
523# Looking for Sponsors
524
525I'm looking for sponsors this library. This library is being developed as a personal project in my spare time. If you want a quick response or problem resolution when using this library in your project, please register as a [sponsor](https://github.com/sponsors/goccy). I will cooperate as much as possible. Of course, this library is developed as an MIT license, so you can use it freely for free.
526
527# License
528
529MIT
View as plain text