// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// Garbage collector (GC).
//
// The GC runs concurrently with mutator threads, is type accurate (aka precise), allows multiple
// GC threads to run in parallel. It is a concurrent mark and sweep that uses a write barrier. It is
// non-generational and non-compacting. Allocation is done using size segregated per P allocation
// areas to minimize fragmentation while eliminating locks in the common case.
//
// The algorithm decomposes into several steps.
// This is a high level description of the algorithm being used. For an overview of GC a good
// place to start is Richard Jones' gchandbook.org.
//
// The algorithm's intellectual heritage includes Dijkstra's on-the-fly algorithm, see
// Edsger W. Dijkstra, Leslie Lamport, A. J. Martin, C. S. Scholten, and E. F. M. Steffens. 1978.
// On-the-fly garbage collection: an exercise in cooperation. Commun. ACM 21, 11 (November 1978),
// 966-975.
// For journal quality proofs that these steps are complete, correct, and terminate see
// Hudson, R., and Moss, J.E.B. Copying Garbage Collection without stopping the world.
// Concurrency and Computation: Practice and Experience 15(3-5), 2003.
//
//  1. GC performs sweep termination.
//
//     a. Stop the world. This causes all Ps to reach a GC safe-point.
//
//     b. Sweep any unswept spans. There will only be unswept spans if
//     this GC cycle was forced before the expected time.
//
//  2. GC performs the mark phase.
//
//     a. Prepare for the mark phase by setting gcphase to _GCmark
//     (from _GCoff), enabling the write barrier, enabling mutator
//     assists, and enqueueing root mark jobs. No objects may be
//     scanned until all Ps have enabled the write barrier, which is
//     accomplished using STW.
//
//     b. Start the world. From this point, GC work is done by mark
//     workers started by the scheduler and by assists performed as
//     part of allocation. The write barrier shades both the
//     overwritten pointer and the new pointer value for any pointer
//     writes (see mbarrier.go for details). Newly allocated objects
//     are immediately marked black.
//
//     c. GC performs root marking jobs. This includes scanning all
//     stacks, shading all globals, and shading any heap pointers in
//     off-heap runtime data structures. Scanning a stack stops a
//     goroutine, shades any pointers found on its stack, and then
//     resumes the goroutine.
//
//     d. GC drains the work queue of grey objects, scanning each grey
//     object to black and shading all pointers found in the object
//     (which in turn may add those pointers to the work queue).
//
//     e. Because GC work is spread across local caches, GC uses a
//     distributed termination algorithm to detect when there are no
//     more root marking jobs or grey objects (see gcMarkDone). At this
//     point, GC transitions to mark termination.
//
//  3. GC performs mark termination.
//
//     a. Stop the world.
//
//     b. Set gcphase to _GCmarktermination, and disable workers and
//     assists.
//
//     c. Perform housekeeping like flushing mcaches.
//
//  4. GC performs the sweep phase.
//
//     a. Prepare for the sweep phase by setting gcphase to _GCoff,
//     setting up sweep state and disabling the write barrier.
//
//     b. Start the world. From this point on, newly allocated objects
//     are white, and allocating sweeps spans before use if necessary.
//
//     c. GC does concurrent sweeping in the background and in response
//     to allocation. See description below.
//
//  5. When sufficient allocation has taken place, replay the sequence
//     starting with 1 above. See discussion of GC rate below.

// Concurrent sweep.
//
// The sweep phase proceeds concurrently with normal program execution.
// The heap is swept span-by-span both lazily (when a goroutine needs another span)
// and concurrently in a background goroutine (this helps programs that are not CPU bound).
// At the end of STW mark termination all spans are marked as "needs sweeping".
//
// The background sweeper goroutine simply sweeps spans one-by-one.
//
// To avoid requesting more OS memory while there are unswept spans, when a
// goroutine needs another span, it first attempts to reclaim that much memory
// by sweeping. When a goroutine needs to allocate a new small-object span, it
// sweeps small-object spans for the same object size until it frees at least
// one object. When a goroutine needs to allocate a large-object span from the heap,
// it sweeps spans until it frees at least that many pages into the heap. There is
// one case where this may not suffice: if a goroutine sweeps and frees two
// nonadjacent one-page spans to the heap, it will allocate a new two-page
// span, but there can still be other one-page unswept spans which could be
// combined into a two-page span.
//
// It's critical to ensure that no operations proceed on unswept spans (that would corrupt
// mark bits in the GC bitmap). During GC all mcaches are flushed into the central cache,
// so they are empty. When a goroutine grabs a new span into mcache, it sweeps it.
// When a goroutine explicitly frees an object or sets a finalizer, it ensures that
// the span is swept (either by sweeping it, or by waiting for the concurrent sweep to finish).
// The finalizer goroutine is kicked off only when all spans are swept.
// When the next GC starts, it sweeps all not-yet-swept spans (if any).

// GC rate.
// Next GC is after we've allocated an extra amount of memory proportional to
// the amount already in use. The proportion is controlled by the GOGC environment variable
// (100 by default). If GOGC=100 and we're using 4M, we'll GC again when we get to 8M
// (this mark is computed by the gcController.heapGoal method). This keeps the GC cost in
// linear proportion to the allocation cost. Adjusting GOGC just changes the linear constant
// (and also the amount of extra memory used).
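//
// As a rough sketch of that relation (illustrative only; the precise goal
// is computed by gcController.heapGoal and may account for more than just
// GOGC):
//
//	next GC goal ≈ heap in use + heap in use*GOGC/100
//	               e.g. 4M + 4M*100/100 = 8M when GOGC=100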

// Oblets
//
// In order to prevent long pauses while scanning large objects and to
// improve parallelism, the garbage collector breaks up scan jobs for
// objects larger than maxObletBytes into "oblets" of at most
// maxObletBytes. When scanning encounters the beginning of a large
// object, it scans only the first oblet and enqueues the remaining
// oblets as new scan jobs.

package runtime

import (
	"internal/cpu"
	"runtime/internal/atomic"
	"unsafe"
)

const (
	_DebugGC      = 0
	_FinBlockSize = 4 * 1024

	// concurrentSweep is a debug flag. Disabling this flag
	// ensures all spans are swept while the world is stopped.
	concurrentSweep = true

	// debugScanConservative enables debug logging for stack
	// frames that are scanned conservatively.
	debugScanConservative = false

	// sweepMinHeapDistance is a lower bound on the heap distance
	// (in bytes) reserved for concurrent sweeping between GC
	// cycles.
	sweepMinHeapDistance = 1024 * 1024
)

// heapObjectsCanMove always returns false in the current garbage collector.
// It exists for go4.org/unsafe/assume-no-moving-gc, which is an
// unfortunate idea that had an even more unfortunate implementation.
// Every time a new Go release happened, the package stopped building,
// and the authors had to add a new file with a new //go:build line, and
// then the entire ecosystem of packages with that as a dependency had to
// explicitly update to the new version. Many packages depend on
// assume-no-moving-gc transitively, through paths like
// inet.af/netaddr -> go4.org/intern -> assume-no-moving-gc.
// This was causing a significant amount of friction around each new
// release, so we added this bool for the package to //go:linkname
// instead. The bool is still unfortunate, but it's not as bad as
// breaking the ecosystem on every new release.
//
// If the Go garbage collector ever does move heap objects, we can set
// this to true to break all the programs using assume-no-moving-gc.
//
//go:linkname heapObjectsCanMove
func heapObjectsCanMove() bool {
	return false
}

func gcinit() {
	if unsafe.Sizeof(workbuf{}) != _WorkbufSize {
		throw("size of Workbuf is suboptimal")
	}
	// No sweep on the first cycle.
	sweep.active.state.Store(sweepDrainedMask)

	// Initialize GC pacer state.
	// Use the environment variable GOGC for the initial gcPercent value.
	// Use the environment variable GOMEMLIMIT for the initial memoryLimit value.
	gcController.init(readGOGC(), readGOMEMLIMIT())

	work.startSema = 1
	work.markDoneSema = 1
	lockInit(&work.sweepWaiters.lock, lockRankSweepWaiters)
	lockInit(&work.assistQueue.lock, lockRankAssistQueue)
	lockInit(&work.wbufSpans.lock, lockRankWbufSpans)
}

// gcenable is called after the bulk of the runtime initialization,
// just before we're about to start letting user code run.
// It kicks off the background sweeper goroutine, the background
// scavenger goroutine, and enables GC.
func gcenable() {
	// Kick off sweeping and scavenging.
	c := make(chan int, 2)
	go bgsweep(c)
	go bgscavenge(c)
	<-c
	<-c
	memstats.enablegc = true // now that runtime is initialized, GC is okay
}

// Garbage collector phase.
// Indicates to the write barrier and synchronization task what to perform.
var gcphase uint32

// The compiler knows about this variable.
// If you change it, you must change builtin/runtime.go, too.
// If you change the first four bytes, you must also change the write
// barrier insertion code.
var writeBarrier struct {
	enabled bool    // compiler emits a check of this before calling write barrier
	pad     [3]byte // compiler uses 32-bit load for "enabled" field
	alignme uint64  // guarantee alignment so that compiler can use a 32 or 64-bit load
}

// gcBlackenEnabled is 1 if mutator assists and background mark
// workers are allowed to blacken objects. This must only be set when
// gcphase == _GCmark.
var gcBlackenEnabled uint32

const (
	_GCoff             = iota // GC not running; sweeping in background, write barrier disabled
	_GCmark                   // GC marking roots and workbufs: allocate black, write barrier ENABLED
	_GCmarktermination        // GC mark termination: allocate black, P's help GC, write barrier ENABLED
)

//go:nosplit
func setGCPhase(x uint32) {
	atomic.Store(&gcphase, x)
	writeBarrier.enabled = gcphase == _GCmark || gcphase == _GCmarktermination
}

// gcMarkWorkerMode represents the mode that a concurrent mark worker
// should operate in.
//
// Concurrent marking happens through four different mechanisms. One
// is mutator assists, which happen in response to allocations and are
// not scheduled. The other three are variations in the per-P mark
// workers and are distinguished by gcMarkWorkerMode.
type gcMarkWorkerMode int

const (
	// gcMarkWorkerNotWorker indicates that the next scheduled G is not
	// starting work and the mode should be ignored.
	gcMarkWorkerNotWorker gcMarkWorkerMode = iota

	// gcMarkWorkerDedicatedMode indicates that the P of a mark
	// worker is dedicated to running that mark worker. The mark
	// worker should run without preemption.
	gcMarkWorkerDedicatedMode

	// gcMarkWorkerFractionalMode indicates that a P is currently
	// running the "fractional" mark worker. The fractional worker
	// is necessary when GOMAXPROCS*gcBackgroundUtilization is not
	// an integer and using only dedicated workers would result in
	// utilization too far from the target of gcBackgroundUtilization.
	// The fractional worker should run until it is preempted and
	// will be scheduled to pick up the fractional part of
	// GOMAXPROCS*gcBackgroundUtilization.
	gcMarkWorkerFractionalMode

	// gcMarkWorkerIdleMode indicates that a P is running the mark
	// worker because it has nothing else to do. The idle worker
	// should run until it is preempted and account its time
	// against gcController.idleMarkTime.
	gcMarkWorkerIdleMode
)

// gcMarkWorkerModeStrings are the string labels of gcMarkWorkerModes
// to use in execution traces.
var gcMarkWorkerModeStrings = [...]string{
	"Not worker",
	"GC (dedicated)",
	"GC (fractional)",
	"GC (idle)",
}

// pollFractionalWorkerExit reports whether a fractional mark worker
// should self-preempt. It assumes it is called from the fractional
// worker.
func pollFractionalWorkerExit() bool {
	// This should be kept in sync with the fractional worker
	// scheduler logic in findRunnableGCWorker.
	now := nanotime()
	delta := now - gcController.markStartTime
	if delta <= 0 {
		return true
	}
	p := getg().m.p.ptr()
	selfTime := p.gcFractionalMarkTime + (now - p.gcMarkWorkerStartTime)
	// Add some slack to the utilization goal so that the
	// fractional worker isn't behind again the instant it exits.
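	// For example (illustrative numbers only): with a
	// fractionalUtilizationGoal of 0.05, the worker self-preempts once
	// it has consumed more than 1.2*0.05 = 6% of the wall-clock time
	// on this P since marking started.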
	return float64(selfTime)/float64(delta) > 1.2*gcController.fractionalUtilizationGoal
}

var work workType

type workType struct {
	full  lfstack          // lock-free list of full blocks workbuf
	_     cpu.CacheLinePad // prevents false-sharing between full and empty
	empty lfstack          // lock-free list of empty blocks workbuf
	_     cpu.CacheLinePad // prevents false-sharing between empty and nproc/nwait

	wbufSpans struct {
		lock mutex
		// free is a list of spans dedicated to workbufs, but
		// that don't currently contain any workbufs.
		free mSpanList
		// busy is a list of all spans containing workbufs on
		// one of the workbuf lists.
		busy mSpanList
	}

	// Restore 64-bit alignment on 32-bit.
	_ uint32

	// bytesMarked is the number of bytes marked this cycle. This
	// includes bytes blackened in scanned objects, noscan objects
	// that go straight to black, and permagrey objects scanned by
	// markroot during the concurrent scan phase. This is updated
	// atomically during the cycle. Updates may be batched
	// arbitrarily, since the value is only read at the end of the
	// cycle.
	//
	// Because of benign races during marking, this number may not
	// be the exact number of marked bytes, but it should be very
	// close.
	//
	// Put this field here because it needs 64-bit atomic access
	// (and thus 8-byte alignment even on 32-bit architectures).
	bytesMarked uint64

	markrootNext uint32 // next markroot job
	markrootJobs uint32 // number of markroot jobs

	nproc  uint32
	tstart int64
	nwait  uint32

	// Number of roots of various root types. Set by gcMarkRootPrepare.
	//
	// nStackRoots == len(stackRoots), but we have nStackRoots for
	// consistency.
	nDataRoots, nBSSRoots, nSpanRoots, nStackRoots int

	// Base indexes of each root type. Set by gcMarkRootPrepare.
	baseData, baseBSS, baseSpans, baseStacks, baseEnd uint32

	// stackRoots is a snapshot of all of the Gs that existed
	// before the beginning of concurrent marking. The backing
	// store of this must not be modified because it might be
	// shared with allgs.
	stackRoots []*g

	// Each type of GC state transition is protected by a lock.
	// Since multiple threads can simultaneously detect the state
	// transition condition, any thread that detects a transition
	// condition must acquire the appropriate transition lock,
	// re-check the transition condition and return if it no
	// longer holds or perform the transition if it does.
	// Likewise, any transition must invalidate the transition
	// condition before releasing the lock. This ensures that each
	// transition is performed by exactly one thread and threads
	// that need the transition to happen block until it has
	// happened.
	//
	// startSema protects the transition from "off" to mark or
	// mark termination.
	startSema uint32
	// markDoneSema protects transitions from mark to mark termination.
	markDoneSema uint32

	bgMarkReady note   // signal background mark worker has started
	bgMarkDone  uint32 // cas to 1 when at a background mark completion point
	// Background mark completion signaling

	// mode is the concurrency mode of the current GC cycle.
	mode gcMode

	// userForced indicates the current GC cycle was forced by an
	// explicit user call.
	userForced bool

	// initialHeapLive is the value of gcController.heapLive at the
	// beginning of this GC cycle.
	initialHeapLive uint64

	// assistQueue is a queue of assists that are blocked because
	// there was neither enough credit to steal nor enough work to
	// do.
	assistQueue struct {
		lock mutex
		q    gQueue
	}

	// sweepWaiters is a list of blocked goroutines to wake when
	// we transition from mark termination to sweep.
	sweepWaiters struct {
		lock mutex
		list gList
	}

	// cycles is the number of completed GC cycles, where a GC
	// cycle is sweep termination, mark, mark termination, and
	// sweep. This differs from memstats.numgc, which is
	// incremented at mark termination.
	cycles atomic.Uint32

	// Timing/utilization stats for this cycle.
	stwprocs, maxprocs                 int32
	tSweepTerm, tMark, tMarkTerm, tEnd int64 // nanotime() of phase start

	pauseNS int64 // total STW time this cycle

	// debug.gctrace heap sizes for this cycle.
	heap0, heap1, heap2 uint64

	// Cumulative estimated CPU usage.
	cpuStats
}

// GC runs a garbage collection and blocks the caller until the
// garbage collection is complete. It may also block the entire
// program.
func GC() {
	// We consider a cycle to be: sweep termination, mark, mark
	// termination, and sweep. This function shouldn't return
	// until a full cycle has been completed, from beginning to
	// end. Hence, we always want to finish up the current cycle
	// and start a new one. That means:
	//
	// 1. In sweep termination, mark, or mark termination of cycle
	// N, wait until mark termination N completes and transitions
	// to sweep N.
	//
	// 2. In sweep N, help with sweep N.
	//
	// At this point we can begin a full cycle N+1.
	//
	// 3. Trigger cycle N+1 by starting sweep termination N+1.
	//
	// 4. Wait for mark termination N+1 to complete.
	//
	// 5. Help with sweep N+1 until it's done.
	//
	// This all has to be written to deal with the fact that the
	// GC may move ahead on its own. For example, when we block
	// until mark termination N, we may wake up in cycle N+2.

	// Wait until the current sweep termination, mark, and mark
	// termination complete.
	n := work.cycles.Load()
	gcWaitOnMark(n)

	// We're now in sweep N or later. Trigger GC cycle N+1, which
	// will first finish sweep N if necessary and then enter sweep
	// termination N+1.
	gcStart(gcTrigger{kind: gcTriggerCycle, n: n + 1})

	// Wait for mark termination N+1 to complete.
	gcWaitOnMark(n + 1)

	// Finish sweep N+1 before returning. We do this both to
	// complete the cycle and because runtime.GC() is often used
	// as part of tests and benchmarks to get the system into a
	// relatively stable and isolated state.
	for work.cycles.Load() == n+1 && sweepone() != ^uintptr(0) {
		Gosched()
	}

	// Callers may assume that the heap profile reflects the
	// just-completed cycle when this returns (historically this
	// happened because this was a STW GC), but right now the
	// profile still reflects mark termination N, not N+1.
	//
	// As soon as all of the sweep frees from cycle N+1 are done,
	// we can go ahead and publish the heap profile.
	//
	// First, wait for sweeping to finish. (We know there are no
	// more spans on the sweep queue, but we may be concurrently
	// sweeping spans, so we have to wait.)
	for work.cycles.Load() == n+1 && !isSweepDone() {
		Gosched()
	}

	// Now we're really done with sweeping, so we can publish the
	// stable heap profile. Only do this if we haven't already hit
	// another mark termination.
	mp := acquirem()
	cycle := work.cycles.Load()
	if cycle == n+1 || (gcphase == _GCmark && cycle == n+2) {
		mProf_PostSweep()
	}
	releasem(mp)
}

// gcWaitOnMark blocks until GC finishes the Nth mark phase. If GC has
// already completed this mark phase, it returns immediately.
func gcWaitOnMark(n uint32) {
	for {
		// Disable phase transitions.
		lock(&work.sweepWaiters.lock)
		nMarks := work.cycles.Load()
		if gcphase != _GCmark {
			// We've already completed this cycle's mark.
			nMarks++
		}
		if nMarks > n {
			// We're done.
			unlock(&work.sweepWaiters.lock)
			return
		}

		// Wait until sweep termination, mark, and mark
		// termination of cycle N complete.
		work.sweepWaiters.list.push(getg())
		goparkunlock(&work.sweepWaiters.lock, waitReasonWaitForGCCycle, traceBlockUntilGCEnds, 1)
	}
}

// gcMode indicates how concurrent a GC cycle should be.
type gcMode int

const (
	gcBackgroundMode gcMode = iota // concurrent GC and sweep
	gcForceMode                    // stop-the-world GC now, concurrent sweep
	gcForceBlockMode               // stop-the-world GC now and STW sweep (forced by user)
)

// A gcTrigger is a predicate for starting a GC cycle. Specifically,
// it is an exit condition for the _GCoff phase.
type gcTrigger struct {
	kind gcTriggerKind
	now  int64  // gcTriggerTime: current time
	n    uint32 // gcTriggerCycle: cycle number to start
}

type gcTriggerKind int

const (
	// gcTriggerHeap indicates that a cycle should be started when
	// the heap size reaches the trigger heap size computed by the
	// controller.
	gcTriggerHeap gcTriggerKind = iota

	// gcTriggerTime indicates that a cycle should be started when
	// it's been more than forcegcperiod nanoseconds since the
	// previous GC cycle.
	gcTriggerTime

	// gcTriggerCycle indicates that a cycle should be started if
	// we have not yet started cycle number gcTrigger.n (relative
	// to work.cycles).
	gcTriggerCycle
)

// test reports whether the trigger condition is satisfied, meaning
// that the exit condition for the _GCoff phase has been met. The exit
// condition should be tested when allocating.
func (t gcTrigger) test() bool {
	if !memstats.enablegc || panicking.Load() != 0 || gcphase != _GCoff {
		return false
	}
	switch t.kind {
	case gcTriggerHeap:
		trigger, _ := gcController.trigger()
		return gcController.heapLive.Load() >= trigger
	case gcTriggerTime:
		if gcController.gcPercent.Load() < 0 {
			return false
		}
		lastgc := int64(atomic.Load64(&memstats.last_gc_nanotime))
		return lastgc != 0 && t.now-lastgc > forcegcperiod
	case gcTriggerCycle:
		// t.n > work.cycles, but accounting for wraparound.
		return int32(t.n-work.cycles.Load()) > 0
	}
	return true
}
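
// As an illustration of how triggers are used: runtime.GC above forces a
// new cycle with gcTrigger{kind: gcTriggerCycle, n: n + 1}, while (per the
// kind comments above) heap growth is checked against gcTriggerHeap when
// allocating and the periodic force-GC path relies on gcTriggerTime.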

// gcStart starts the GC. It transitions from _GCoff to _GCmark (if
// debug.gcstoptheworld == 0) or performs all of GC (if
// debug.gcstoptheworld != 0).
//
// This may return without performing this transition in some cases,
// such as when called on a system stack or with locks held.
func gcStart(trigger gcTrigger) {
	// Since this is called from malloc and malloc is called in
	// the guts of a number of libraries that might be holding
	// locks, don't attempt to start GC in non-preemptible or
	// potentially unstable situations.
	mp := acquirem()
	if gp := getg(); gp == mp.g0 || mp.locks > 1 || mp.preemptoff != "" {
		releasem(mp)
		return
	}
	releasem(mp)
	mp = nil

	// Pick up the remaining unswept/not being swept spans concurrently.
	//
	// This shouldn't happen if we're being invoked in background
	// mode since proportional sweep should have just finished
	// sweeping everything, but rounding errors, etc, may leave a
	// few spans unswept. In forced mode, this is necessary since
	// GC can be forced at any point in the sweeping cycle.
	//
	// We check the transition condition continuously here in case
	// this G gets delayed into the next GC cycle.
	for trigger.test() && sweepone() != ^uintptr(0) {
	}

	// Perform GC initialization and the sweep termination
	// transition.
	semacquire(&work.startSema)
	// Re-check transition condition under transition lock.
	if !trigger.test() {
		semrelease(&work.startSema)
		return
	}

	// In gcstoptheworld debug mode, upgrade the mode accordingly.
	// We do this after re-checking the transition condition so
	// that multiple goroutines that detect the heap trigger don't
	// start multiple STW GCs.
	mode := gcBackgroundMode
	if debug.gcstoptheworld == 1 {
		mode = gcForceMode
	} else if debug.gcstoptheworld == 2 {
		mode = gcForceBlockMode
	}

	// Ok, we're doing it! Stop everybody else.
	semacquire(&gcsema)
	semacquire(&worldsema)

	// For stats, check if this GC was forced by the user.
	// Update it under gcsema to avoid gctrace getting wrong values.
	work.userForced = trigger.kind == gcTriggerCycle

	trace := traceAcquire()
	if trace.ok() {
		trace.GCStart()
		traceRelease(trace)
	}

	// Check that all Ps have finished deferred mcache flushes.
	for _, p := range allp {
		if fg := p.mcache.flushGen.Load(); fg != mheap_.sweepgen {
			println("runtime: p", p.id, "flushGen", fg, "!= sweepgen", mheap_.sweepgen)
			throw("p mcache not flushed")
		}
	}

	gcBgMarkStartWorkers()

	systemstack(gcResetMarkState)

	work.stwprocs, work.maxprocs = gomaxprocs, gomaxprocs
	if work.stwprocs > ncpu {
		// This is used to compute CPU time of the STW phases,
		// so it can't be more than ncpu, even if GOMAXPROCS is.
		work.stwprocs = ncpu
	}
	work.heap0 = gcController.heapLive.Load()
	work.pauseNS = 0
	work.mode = mode

	now := nanotime()
	work.tSweepTerm = now
	var stw worldStop
	systemstack(func() {
		stw = stopTheWorldWithSema(stwGCSweepTerm)
	})
	// Finish sweep before we start concurrent scan.
	systemstack(func() {
		finishsweep_m()
	})

	// clearpools before we start the GC. If we wait the memory will not be
	// reclaimed until the next GC cycle.
	clearpools()

	work.cycles.Add(1)

	// Assists and workers can start the moment we start
	// the world.
	gcController.startCycle(now, int(gomaxprocs), trigger)

	// Notify the CPU limiter that assists may begin.
	gcCPULimiter.startGCTransition(true, now)

	// In STW mode, disable scheduling of user Gs. This may also
	// disable scheduling of this goroutine, so it may block as
	// soon as we start the world again.
	if mode != gcBackgroundMode {
		schedEnableUser(false)
	}

	// Enter concurrent mark phase and enable
	// write barriers.
	//
	// Because the world is stopped, all Ps will
	// observe that write barriers are enabled by
	// the time we start the world and begin
	// scanning.
	//
	// Write barriers must be enabled before assists are
	// enabled because they must be enabled before
	// any non-leaf heap objects are marked. Since
	// allocations are blocked until assists can
	// happen, we want to enable assists as early as
	// possible.
	setGCPhase(_GCmark)

	gcBgMarkPrepare() // Must happen before assists are enabled.
	gcMarkRootPrepare()

	// Mark all active tinyalloc blocks. Since we're
	// allocating from these, they need to be black like
	// other allocations. The alternative is to blacken
	// the tiny block on every allocation from it, which
	// would slow down the tiny allocator.
	gcMarkTinyAllocs()

	// At this point all Ps have enabled the write
	// barrier, thus maintaining the no white to
	// black invariant. Enable mutator assists to
	// put back-pressure on fast allocating
	// mutators.
	atomic.Store(&gcBlackenEnabled, 1)

	// In STW mode, we could block the instant systemstack
	// returns, so make sure we're not preemptible.
	mp = acquirem()

	// Concurrent mark.
	systemstack(func() {
		now = startTheWorldWithSema(0, stw)
		work.pauseNS += now - stw.start
		work.tMark = now

		sweepTermCpu := int64(work.stwprocs) * (work.tMark - work.tSweepTerm)
		work.cpuStats.gcPauseTime += sweepTermCpu
		work.cpuStats.gcTotalTime += sweepTermCpu

		// Release the CPU limiter.
		gcCPULimiter.finishGCTransition(now)
	})

	// Release the world sema before Gosched() in STW mode
	// because we will need to reacquire it later but before
	// this goroutine becomes runnable again, and we could
	// self-deadlock otherwise.
	semrelease(&worldsema)
	releasem(mp)

	// Make sure we block instead of returning to user code
	// in STW mode.
	if mode != gcBackgroundMode {
		Gosched()
	}

	semrelease(&work.startSema)
}

// gcMarkDoneFlushed counts the number of P's with flushed work.
//
// Ideally this would be a captured local in gcMarkDone, but forEachP
// escapes its callback closure, so it can't capture anything.
//
// This is protected by markDoneSema.
var gcMarkDoneFlushed uint32

// gcMarkDone transitions the GC from mark to mark termination if all
// reachable objects have been marked (that is, there are no grey
// objects and can be no more in the future). Otherwise, it flushes
// all local work to the global queues where it can be discovered by
// other workers.
//
// This should be called when all local mark work has been drained and
// there are no remaining workers. Specifically, when
//
//	work.nwait == work.nproc && !gcMarkWorkAvailable(p)
//
// The calling context must be preemptible.
//
// Flushing local work is important because idle Ps may have local
// work queued. This is the only way to make that work visible and
// drive GC to completion.
//
// It is explicitly okay to have write barriers in this function. If
// it does transition to mark termination, then all reachable objects
// have been marked, so the write barrier cannot shade any more
// objects.
func gcMarkDone() {
	// Ensure only one thread is running the ragged barrier at a
	// time.
	semacquire(&work.markDoneSema)

top:
	// Re-check transition condition under transition lock.
	//
	// It's critical that this checks the global work queues are
	// empty before performing the ragged barrier. Otherwise,
	// there could be global work that a P could take after the P
	// has passed the ragged barrier.
	if !(gcphase == _GCmark && work.nwait == work.nproc && !gcMarkWorkAvailable(nil)) {
		semrelease(&work.markDoneSema)
		return
	}

	// forEachP needs worldsema to execute, and we'll need it to
	// stop the world later, so acquire worldsema now.
	semacquire(&worldsema)

	// Flush all local buffers and collect flushedWork flags.
	gcMarkDoneFlushed = 0
	forEachP(waitReasonGCMarkTermination, func(pp *p) {
		// Flush the write barrier buffer, since this may add
		// work to the gcWork.
		wbBufFlush1(pp)

		// Flush the gcWork, since this may create global work
		// and set the flushedWork flag.
		//
		// TODO(austin): Break up these workbufs to
		// better distribute work.
		pp.gcw.dispose()
		// Collect the flushedWork flag.
		if pp.gcw.flushedWork {
			atomic.Xadd(&gcMarkDoneFlushed, 1)
			pp.gcw.flushedWork = false
		}
	})

	if gcMarkDoneFlushed != 0 {
		// More grey objects were discovered since the
		// previous termination check, so there may be more
		// work to do. Keep going. It's possible the
		// transition condition became true again during the
		// ragged barrier, so re-check it.
		semrelease(&worldsema)
		goto top
	}

	// There was no global work, no local work, and no Ps
	// communicated work since we took markDoneSema. Therefore
	// there are no grey objects and no more objects can be
	// shaded. Transition to mark termination.
	now := nanotime()
	work.tMarkTerm = now
	getg().m.preemptoff = "gcing"
	var stw worldStop
	systemstack(func() {
		stw = stopTheWorldWithSema(stwGCMarkTerm)
	})
	// The gcphase is _GCmark; it will transition to _GCmarktermination
	// below. The important thing is that the write barrier remains active
	// until all marking is complete. This includes writes made by the GC.

	// There is sometimes work left over when we enter mark termination due
	// to write barriers performed after the completion barrier above.
	// Detect this and resume concurrent mark. This is obviously
	// unfortunate.
	//
	// See issue #27993 for details.
	//
	// Switch to the system stack to call wbBufFlush1, though in this case
	// it doesn't matter because we're non-preemptible anyway.
	restart := false
	systemstack(func() {
		for _, p := range allp {
			wbBufFlush1(p)
			if !p.gcw.empty() {
				restart = true
				break
			}
		}
	})
	if restart {
		getg().m.preemptoff = ""
		systemstack(func() {
			now := startTheWorldWithSema(0, stw)
			work.pauseNS += now - stw.start
		})
		semrelease(&worldsema)
		goto top
	}

	gcComputeStartingStackSize()

	// Disable assists and background workers. We must do
	// this before waking blocked assists.
	atomic.Store(&gcBlackenEnabled, 0)

	// Notify the CPU limiter that GC assists will now cease.
	gcCPULimiter.startGCTransition(false, now)

	// Wake all blocked assists. These will run when we
	// start the world again.
	gcWakeAllAssists()

	// Likewise, release the transition lock. Blocked
	// workers and assists will run when we start the
	// world again.
	semrelease(&work.markDoneSema)

	// In STW mode, re-enable user goroutines. These will be
	// queued to run after we start the world.
	schedEnableUser(true)

	// endCycle depends on all gcWork cache stats being flushed.
	// The termination algorithm above ensured that they are,
	// up to allocations made since the ragged barrier.
	gcController.endCycle(now, int(gomaxprocs), work.userForced)

	// Perform mark termination. This will restart the world.
	gcMarkTermination(stw)
}

// World must be stopped and mark assists and background workers must be
// disabled.
func gcMarkTermination(stw worldStop) {
	// Start marktermination (write barrier remains enabled for now).
	setGCPhase(_GCmarktermination)

	work.heap1 = gcController.heapLive.Load()
	startTime := nanotime()

	mp := acquirem()
	mp.preemptoff = "gcing"
	mp.traceback = 2
	curgp := mp.curg
	// N.B. The execution tracer is not aware of this status
	// transition and handles it specially based on the
	// wait reason.
	casGToWaiting(curgp, _Grunning, waitReasonGarbageCollection)

	// Run gc on the g0 stack. We do this so that the g stack
	// we're currently running on will no longer change. Cuts
	// the root set down a bit (g0 stacks are not scanned, and
	// we don't need to scan gc's internal state). We also
	// need to switch to g0 so we can shrink the stack.
	systemstack(func() {
		gcMark(startTime)
		// Must return immediately.
		// The outer function's stack may have moved
		// during gcMark (it shrinks stacks, including the
		// outer function's stack), so we must not refer
		// to any of its variables. Return back to the
		// non-system stack to pick up the new addresses
		// before continuing.
	})

	var stwSwept bool
	systemstack(func() {
		work.heap2 = work.bytesMarked
		if debug.gccheckmark > 0 {
			// Run a full non-parallel, stop-the-world
			// mark using checkmark bits, to check that we
			// didn't forget to mark anything during the
			// concurrent mark process.
			startCheckmarks()
			gcResetMarkState()
			gcw := &getg().m.p.ptr().gcw
			gcDrain(gcw, 0)
			wbBufFlush1(getg().m.p.ptr())
			gcw.dispose()
			endCheckmarks()
		}

		// marking is complete so we can turn the write barrier off
		setGCPhase(_GCoff)
		stwSwept = gcSweep(work.mode)
	})

	mp.traceback = 0
	casgstatus(curgp, _Gwaiting, _Grunning)

	trace := traceAcquire()
	if trace.ok() {
		trace.GCDone()
		traceRelease(trace)
	}

	// all done
	mp.preemptoff = ""

	if gcphase != _GCoff {
		throw("gc done but gcphase != _GCoff")
	}

	// Record heapInUse for scavenger.
	memstats.lastHeapInUse = gcController.heapInUse.load()

	// Update GC trigger and pacing, as well as downstream consumers
	// of this pacing information, for the next cycle.
	systemstack(gcControllerCommit)

	// Update timing memstats
	now := nanotime()
	sec, nsec, _ := time_now()
	unixNow := sec*1e9 + int64(nsec)
	work.pauseNS += now - stw.start
	work.tEnd = now
	atomic.Store64(&memstats.last_gc_unix, uint64(unixNow)) // must be Unix time to make sense to user
	atomic.Store64(&memstats.last_gc_nanotime, uint64(now)) // monotonic time for us
	memstats.pause_ns[memstats.numgc%uint32(len(memstats.pause_ns))] = uint64(work.pauseNS)
	memstats.pause_end[memstats.numgc%uint32(len(memstats.pause_end))] = uint64(unixNow)
	memstats.pause_total_ns += uint64(work.pauseNS)

	markTermCpu := int64(work.stwprocs) * (work.tEnd - work.tMarkTerm)
	work.cpuStats.gcPauseTime += markTermCpu
	work.cpuStats.gcTotalTime += markTermCpu

	// Accumulate CPU stats.
	//
	// Pass gcMarkPhase=true so we can get all the latest GC CPU stats in there too.
	work.cpuStats.accumulate(now, true)

	// Compute overall GC CPU utilization.
	// Omit idle marking time from the overall utilization here since it's "free".
	memstats.gc_cpu_fraction = float64(work.cpuStats.gcTotalTime-work.cpuStats.gcIdleTime) / float64(work.cpuStats.totalTime)

	// Reset assist time and background time stats.
	//
	// Do this now, instead of at the start of the next GC cycle, because
	// these two may keep accumulating even if the GC is not active.
	scavenge.assistTime.Store(0)
	scavenge.backgroundTime.Store(0)

	// Reset idle time stat.
	sched.idleTime.Store(0)

	if work.userForced {
		memstats.numforcedgc++
	}

	// Bump GC cycle count and wake goroutines waiting on sweep.
	lock(&work.sweepWaiters.lock)
	memstats.numgc++
	injectglist(&work.sweepWaiters.list)
	unlock(&work.sweepWaiters.lock)

	// Increment the scavenge generation now.
	//
	// This moment represents peak heap in use because we're
	// about to start sweeping.
	mheap_.pages.scav.index.nextGen()

	// Release the CPU limiter.
	gcCPULimiter.finishGCTransition(now)

	// Finish the current heap profiling cycle and start a new
	// heap profiling cycle. We do this before starting the world
	// so events don't leak into the wrong cycle.
	mProf_NextCycle()

	// There may be stale spans in mcaches that need to be swept.
	// Those aren't tracked in any sweep lists, so we need to
	// count them against sweep completion until we ensure all
	// those spans have been forced out.
	//
	// If gcSweep fully swept the heap (for example if the sweep
	// is not concurrent due to a GODEBUG setting), then we expect
	// the sweepLocker to be invalid, since sweeping is done.
	//
	// N.B. Below we might duplicate some work from gcSweep; this is
	// fine as all that work is idempotent within a GC cycle, and
	// we're still holding worldsema so a new cycle can't start.
	sl := sweep.active.begin()
	if !stwSwept && !sl.valid {
		throw("failed to set sweep barrier")
	} else if stwSwept && sl.valid {
		throw("non-concurrent sweep failed to drain all sweep queues")
	}

	systemstack(func() {
		// The memstats updated above must be updated with the world
		// stopped to ensure consistency of some values, such as
		// sched.idleTime and sched.totaltime. memstats also include
		// the pause time (work.pauseNS), forcing computation of the
		// total pause time before the pause actually ends.
		//
		// Here we reuse the same now for starting the world so that the
		// time added to /sched/pauses/total/gc:seconds will be
		// consistent with the value in memstats.
		startTheWorldWithSema(now, stw)
	})

	// Flush the heap profile so we can start a new cycle next GC.
	// This is relatively expensive, so we don't do it with the
	// world stopped.
	mProf_Flush()

	// Prepare workbufs for freeing by the sweeper. We do this
	// asynchronously because it can take non-trivial time.
	prepareFreeWorkbufs()

	// Free stack spans. This must be done between GC cycles.
	systemstack(freeStackSpans)

	// Ensure all mcaches are flushed. Each P will flush its own
	// mcache before allocating, but idle Ps may not. Since this
	// is necessary to sweep all spans, we need to ensure all
	// mcaches are flushed before we start the next GC cycle.
	//
	// While we're here, flush the page cache for idle Ps to avoid
	// having pages get stuck on them. These pages are hidden from
	// the scavenger, so in small idle heaps a significant amount
	// of additional memory might be held onto.
	//
	// Also, flush the pinner cache, to avoid leaking that memory
	// indefinitely.
	forEachP(waitReasonFlushProcCaches, func(pp *p) {
		pp.mcache.prepareForSweep()
		if pp.status == _Pidle {
			systemstack(func() {
				lock(&mheap_.lock)
				pp.pcache.flush(&mheap_.pages)
				unlock(&mheap_.lock)
			})
		}
		pp.pinnerCache = nil
	})
	if sl.valid {
		// Now that we've swept stale spans in mcaches, they don't
		// count against unswept spans.
		//
		// Note: this sweepLocker may not be valid if sweeping had
		// already completed during the STW. See the corresponding
		// begin() call that produced sl.
		sweep.active.end(sl)
	}

	// Print gctrace before dropping worldsema. As soon as we drop
	// worldsema another cycle could start and smash the stats
	// we're trying to print.
	if debug.gctrace > 0 {
		util := int(memstats.gc_cpu_fraction * 100)

		var sbuf [24]byte
		printlock()
		print("gc ", memstats.numgc,
			" @", string(itoaDiv(sbuf[:], uint64(work.tSweepTerm-runtimeInitTime)/1e6, 3)), "s ",
			util, "%: ")
		prev := work.tSweepTerm
		for i, ns := range []int64{work.tMark, work.tMarkTerm, work.tEnd} {
			if i != 0 {
				print("+")
			}
			print(string(fmtNSAsMS(sbuf[:], uint64(ns-prev))))
			prev = ns
		}
		print(" ms clock, ")
		for i, ns := range []int64{
			int64(work.stwprocs) * (work.tMark - work.tSweepTerm),
			gcController.assistTime.Load(),
			gcController.dedicatedMarkTime.Load() + gcController.fractionalMarkTime.Load(),
			gcController.idleMarkTime.Load(),
			markTermCpu,
		} {
			if i == 2 || i == 3 {
				// Separate mark time components with /.
				print("/")
			} else if i != 0 {
				print("+")
			}
			print(string(fmtNSAsMS(sbuf[:], uint64(ns))))
		}
		print(" ms cpu, ",
			work.heap0>>20, "->", work.heap1>>20, "->", work.heap2>>20, " MB, ",
			gcController.lastHeapGoal>>20, " MB goal, ",
			gcController.lastStackScan.Load()>>20, " MB stacks, ",
			gcController.globalsScan.Load()>>20, " MB globals, ",
			work.maxprocs, " P")
		if work.userForced {
			print(" (forced)")
		}
		print("\n")
		printunlock()
	}

	// Set any arena chunks that were deferred to fault.
	lock(&userArenaState.lock)
	faultList := userArenaState.fault
	userArenaState.fault = nil
	unlock(&userArenaState.lock)
	for _, lc := range faultList {
		lc.mspan.setUserArenaChunkToFault()
	}

	// Enable huge pages on some metadata if we cross a heap threshold.
	if gcController.heapGoal() > minHeapForMetadataHugePages {
		systemstack(func() {
			mheap_.enableMetadataHugePages()
		})
	}

	semrelease(&worldsema)
	semrelease(&gcsema)
	// Careful: another GC cycle may start now.

	releasem(mp)
	mp = nil

	// now that gc is done, kick off finalizer thread if needed
	if !concurrentSweep {
		// give the queued finalizers, if any, a chance to run
		Gosched()
	}
}

// gcBgMarkStartWorkers prepares background mark worker goroutines. These
// goroutines will not run until the mark phase, but they must be started while
// the world is not stopped and from a regular G stack. The caller must hold
// worldsema.
func gcBgMarkStartWorkers() {
	// Background marking is performed by per-P G's. Ensure that each P has
	// a background GC G.
	//
	// Worker Gs don't exit if gomaxprocs is reduced. If it is raised
	// again, we can reuse the old workers; no need to create new workers.
	for gcBgMarkWorkerCount < gomaxprocs {
		go gcBgMarkWorker()

		notetsleepg(&work.bgMarkReady, -1)
		noteclear(&work.bgMarkReady)
		// The worker is now guaranteed to be added to the pool before
		// its P's next findRunnableGCWorker.

		gcBgMarkWorkerCount++
	}
}

// gcBgMarkPrepare sets up state for background marking.
// Mutator assists must not yet be enabled.
func gcBgMarkPrepare() {
	// Background marking will stop when the work queues are empty
	// and there are no more workers (note that, since this is
	// concurrent, this may be a transient state, but mark
	// termination will clean it up). Between background workers
	// and assists, we don't really know how many workers there
	// will be, so we pretend to have an arbitrarily large number
	// of workers, almost all of which are "waiting". While a
	// worker is working it decrements nwait. If nproc == nwait,
	// there are no workers.
	work.nproc = ^uint32(0)
	work.nwait = ^uint32(0)
}

// gcBgMarkWorkerNode is an entry in the gcBgMarkWorkerPool. It points to a single
// gcBgMarkWorker goroutine.
type gcBgMarkWorkerNode struct {
	// Unused workers are managed in a lock-free stack. This field must be first.
	node lfnode

	// The g of this worker.
	gp guintptr

	// Release this m on park. This is used to communicate with the unlock
	// function, which cannot access the G's stack. It is unused outside of
	// gcBgMarkWorker().
	m muintptr
}

func gcBgMarkWorker() {
	gp := getg()

	// We pass node to a gopark unlock function, so it can't be on
	// the stack (see gopark). Prevent deadlock from recursively
	// starting GC by disabling preemption.
	gp.m.preemptoff = "GC worker init"
	node := new(gcBgMarkWorkerNode)
	gp.m.preemptoff = ""

	node.gp.set(gp)

	node.m.set(acquirem())
	notewakeup(&work.bgMarkReady)
	// After this point, the background mark worker is generally scheduled
	// cooperatively by gcController.findRunnableGCWorker. While performing
	// work on the P, preemption is disabled because we are working on
	// P-local work buffers. When the preempt flag is set, this puts itself
	// into _Gwaiting to be woken up by gcController.findRunnableGCWorker
	// at the appropriate time.
	//
	// When preemption is enabled (e.g., while in gcMarkDone), this worker
	// may be preempted and schedule as a _Grunnable G from a runq. That is
	// fine; it will eventually gopark again for further scheduling via
	// findRunnableGCWorker.
	//
	// Since we disable preemption before notifying bgMarkReady, we
	// guarantee that this G will be in the worker pool for the next
	// findRunnableGCWorker. This isn't strictly necessary, but it reduces
	// latency between _GCmark starting and the workers starting.

	for {
		// Go to sleep until woken by
		// gcController.findRunnableGCWorker.
		gopark(func(g *g, nodep unsafe.Pointer) bool {
			node := (*gcBgMarkWorkerNode)(nodep)

			if mp := node.m.ptr(); mp != nil {
				// The worker G is no longer running; release
				// the M.
				//
				// N.B. it is _safe_ to release the M as soon
				// as we are no longer performing P-local mark
				// work.
				//
				// However, since we cooperatively stop work
				// when gp.preempt is set, if we releasem in
				// the loop then the following call to gopark
				// would immediately preempt the G. This is
				// also safe, but inefficient: the G must
				// schedule again only to enter gopark and park
				// again. Thus, we defer the release until
				// after parking the G.
				releasem(mp)
			}

			// Release this G to the pool.
			gcBgMarkWorkerPool.push(&node.node)
			// Note that at this point, the G may immediately be
			// rescheduled and may be running.
			return true
		}, unsafe.Pointer(node), waitReasonGCWorkerIdle, traceBlockSystemGoroutine, 0)

		// Preemption must not occur here, or another G might see
		// p.gcMarkWorkerMode.

		// Disable preemption so we can use the gcw. If the
		// scheduler wants to preempt us, we'll stop draining,
		// dispose the gcw, and then preempt.
		node.m.set(acquirem())
		pp := gp.m.p.ptr() // P can't change with preemption disabled.
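
		// The scheduler (gcController.findRunnableGCWorker) is expected
		// to run this worker only while blackening is enabled and after
		// setting this P's gcMarkWorkerMode; the checks below guard
		// against violations of that expectation.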
		if gcBlackenEnabled == 0 {
			println("worker mode", pp.gcMarkWorkerMode)
			throw("gcBgMarkWorker: blackening not enabled")
		}

		if pp.gcMarkWorkerMode == gcMarkWorkerNotWorker {
			throw("gcBgMarkWorker: mode not set")
		}

		startTime := nanotime()
		pp.gcMarkWorkerStartTime = startTime
		var trackLimiterEvent bool
		if pp.gcMarkWorkerMode == gcMarkWorkerIdleMode {
			trackLimiterEvent = pp.limiterEvent.start(limiterEventIdleMarkWork, startTime)
		}

		decnwait := atomic.Xadd(&work.nwait, -1)
		if decnwait == work.nproc {
			println("runtime: work.nwait=", decnwait, "work.nproc=", work.nproc)
			throw("work.nwait was > work.nproc")
		}

		systemstack(func() {
			// Mark our goroutine preemptible so its stack
			// can be scanned. This lets two mark workers
			// scan each other (otherwise, they would
			// deadlock). We must not modify anything on
			// the G stack. However, stack shrinking is
			// disabled for mark workers, so it is safe to
			// read from the G stack.
			//
			// N.B. The execution tracer is not aware of this status
			// transition and handles it specially based on the
			// wait reason.
			casGToWaiting(gp, _Grunning, waitReasonGCWorkerActive)
			switch pp.gcMarkWorkerMode {
			default:
				throw("gcBgMarkWorker: unexpected gcMarkWorkerMode")
			case gcMarkWorkerDedicatedMode:
				gcDrainMarkWorkerDedicated(&pp.gcw, true)
				if gp.preempt {
					// We were preempted. This is
					// a useful signal to kick
					// everything out of the run
					// queue so it can run
					// somewhere else.
					if drainQ, n := runqdrain(pp); n > 0 {
						lock(&sched.lock)
						globrunqputbatch(&drainQ, int32(n))
						unlock(&sched.lock)
					}
				}
				// Go back to draining, this time
				// without preemption.
				gcDrainMarkWorkerDedicated(&pp.gcw, false)
			case gcMarkWorkerFractionalMode:
				gcDrainMarkWorkerFractional(&pp.gcw)
			case gcMarkWorkerIdleMode:
				gcDrainMarkWorkerIdle(&pp.gcw)
			}
			casgstatus(gp, _Gwaiting, _Grunning)
		})

		// Account for time and mark us as stopped.
		now := nanotime()
		duration := now - startTime
		gcController.markWorkerStop(pp.gcMarkWorkerMode, duration)
		if trackLimiterEvent {
			pp.limiterEvent.stop(limiterEventIdleMarkWork, now)
		}
		if pp.gcMarkWorkerMode == gcMarkWorkerFractionalMode {
			atomic.Xaddint64(&pp.gcFractionalMarkTime, duration)
		}

		// Was this the last worker and did we run out
		// of work?
		incnwait := atomic.Xadd(&work.nwait, +1)
		if incnwait > work.nproc {
			println("runtime: p.gcMarkWorkerMode=", pp.gcMarkWorkerMode,
				"work.nwait=", incnwait, "work.nproc=", work.nproc)
			throw("work.nwait > work.nproc")
		}

		// We'll releasem after this point and thus this P may run
		// something else. We must clear the worker mode to avoid
		// attributing the mode to a different (non-worker) G in
		// traceGoStart.
		pp.gcMarkWorkerMode = gcMarkWorkerNotWorker

		// If this worker reached a background mark completion
		// point, signal the main GC goroutine.
		if incnwait == work.nproc && !gcMarkWorkAvailable(nil) {
			// We don't need the P-local buffers here, allow
			// preemption because we may schedule like a regular
			// goroutine in gcMarkDone (block on locks, etc).
			releasem(node.m.ptr())
			node.m.set(nil)

			gcMarkDone()
		}
	}
}

// gcMarkWorkAvailable reports whether executing a mark worker
// on p is potentially useful. p may be nil, in which case it only
// checks the global sources of work.
func gcMarkWorkAvailable(p *p) bool {
	if p != nil && !p.gcw.empty() {
		return true
	}
	if !work.full.empty() {
		return true // global work available
	}
	if work.markrootNext < work.markrootJobs {
		return true // root scan work available
	}
	return false
}

// gcMark runs the mark (or, for concurrent GC, mark termination).
// All gcWork caches must be empty.
// STW is in effect at this point.
func gcMark(startTime int64) {
	if debug.allocfreetrace > 0 {
		tracegc()
	}

	if gcphase != _GCmarktermination {
		throw("in gcMark expecting to see gcphase as _GCmarktermination")
	}
	work.tstart = startTime

	// Check that there's no marking work remaining.
	if work.full != 0 || work.markrootNext < work.markrootJobs {
		print("runtime: full=", hex(work.full), " next=", work.markrootNext, " jobs=", work.markrootJobs, " nDataRoots=", work.nDataRoots, " nBSSRoots=", work.nBSSRoots, " nSpanRoots=", work.nSpanRoots, " nStackRoots=", work.nStackRoots, "\n")
		panic("non-empty mark queue after concurrent mark")
	}

	if debug.gccheckmark > 0 {
		// This is expensive when there's a large number of
		// Gs, so only do it if checkmark is also enabled.
		gcMarkRootCheck()
	}

	// Drop allg snapshot. allgs may have grown, in which case
	// this is the only reference to the old backing store and
	// there's no need to keep it around.
	work.stackRoots = nil

	// Clear out buffers and double-check that all gcWork caches
	// are empty. This should be ensured by gcMarkDone before we
	// enter mark termination.
	//
	// TODO: We could clear out buffers just before mark if this
	// has a non-negligible impact on STW time.
	for _, p := range allp {
		// The write barrier may have buffered pointers since
		// the gcMarkDone barrier. However, since the barrier
		// ensured all reachable objects were marked, all of
		// these must be pointers to black objects. Hence we
		// can just discard the write barrier buffer.
		if debug.gccheckmark > 0 {
			// For debugging, flush the buffer and make
			// sure it really was all marked.
			wbBufFlush1(p)
		} else {
			p.wbBuf.reset()
		}

		gcw := &p.gcw
		if !gcw.empty() {
			printlock()
			print("runtime: P ", p.id, " flushedWork ", gcw.flushedWork)
			if gcw.wbuf1 == nil {
				print(" wbuf1=<nil>")
			} else {
				print(" wbuf1.n=", gcw.wbuf1.nobj)
			}
			if gcw.wbuf2 == nil {
				print(" wbuf2=<nil>")
			} else {
				print(" wbuf2.n=", gcw.wbuf2.nobj)
			}
			print("\n")
			throw("P has cached GC work at end of mark termination")
		}
		// There may still be cached empty buffers, which we
		// need to flush since we're going to free them. Also,
		// there may be non-zero stats because we allocated
		// black after the gcMarkDone barrier.
		gcw.dispose()
	}

	// Flush scanAlloc from each mcache since we're about to modify
	// heapScan directly. If we were to flush this later, then scanAlloc
	// might have incorrect information.
	//
	// Note that it's not important to retain this information; we know
	// exactly what heapScan is at this point via scanWork.
	for _, p := range allp {
		c := p.mcache
		if c == nil {
			continue
		}
		c.scanAlloc = 0
	}

	// Reset controller state.
	gcController.resetLive(work.bytesMarked)
}

// gcSweep must be called on the system stack because it acquires the heap
// lock. See mheap for details.
//
// Returns true if the heap was fully swept by this function.
//
// The world must be stopped.
//
//go:systemstack
func gcSweep(mode gcMode) bool {
	assertWorldStopped()

	if gcphase != _GCoff {
		throw("gcSweep being done but phase is not GCoff")
	}

	lock(&mheap_.lock)
	mheap_.sweepgen += 2
	sweep.active.reset()
	mheap_.pagesSwept.Store(0)
	mheap_.sweepArenas = mheap_.allArenas
	mheap_.reclaimIndex.Store(0)
	mheap_.reclaimCredit.Store(0)
	unlock(&mheap_.lock)

	sweep.centralIndex.clear()

	if !concurrentSweep || mode == gcForceBlockMode {
		// Special case synchronous sweep.
		// Record that no proportional sweeping has to happen.
		lock(&mheap_.lock)
		mheap_.sweepPagesPerByte = 0
		unlock(&mheap_.lock)
		// Flush all mcaches.
		for _, pp := range allp {
			pp.mcache.prepareForSweep()
		}
		// Sweep all spans eagerly.
		for sweepone() != ^uintptr(0) {
		}
		// Free workbufs eagerly.
		prepareFreeWorkbufs()
		for freeSomeWbufs(false) {
		}
		// All "free" events for this mark/sweep cycle have
		// now happened, so we can make this profile cycle
		// available immediately.
		mProf_NextCycle()
		mProf_Flush()
		return true
	}

	// Background sweep.
	lock(&sweep.lock)
	if sweep.parked {
		sweep.parked = false
		ready(sweep.g, 0, true)
	}
	unlock(&sweep.lock)
	return false
}

// gcResetMarkState resets global state prior to marking (concurrent
// or STW) and resets the stack scan state of all Gs.
//
// This is safe to do without the world stopped because any Gs created
// during or after this will start out in the reset state.
//
// gcResetMarkState must be called on the system stack because it acquires
// the heap lock. See mheap for details.
//
//go:systemstack
func gcResetMarkState() {
	// This may be called during a concurrent phase, so lock to make sure
	// allgs doesn't change.
	forEachG(func(gp *g) {
		gp.gcscandone = false // set to true in gcphasework
		gp.gcAssistBytes = 0
	})

	// Clear page marks. This is just 1MB per 64GB of heap, so the
	// time here is pretty trivial.
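	//
	// (pageMarks holds one bit per page; with the common 64MB arenas and
	// 8KB pages that is 1KB per arena, which is where the 1MB-per-64GB
	// figure above comes from.)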
	lock(&mheap_.lock)
	arenas := mheap_.allArenas
	unlock(&mheap_.lock)
	for _, ai := range arenas {
		ha := mheap_.arenas[ai.l1()][ai.l2()]
		for i := range ha.pageMarks {
			ha.pageMarks[i] = 0
		}
	}

	work.bytesMarked = 0
	work.initialHeapLive = gcController.heapLive.Load()
}

// Hooks for other packages

var poolcleanup func()
var boringCaches []unsafe.Pointer // for crypto/internal/boring

//go:linkname sync_runtime_registerPoolCleanup sync.runtime_registerPoolCleanup
func sync_runtime_registerPoolCleanup(f func()) {
	poolcleanup = f
}

//go:linkname boring_registerCache crypto/internal/boring/bcache.registerCache
func boring_registerCache(p unsafe.Pointer) {
	boringCaches = append(boringCaches, p)
}

func clearpools() {
	// clear sync.Pools
	if poolcleanup != nil {
		poolcleanup()
	}

	// clear boringcrypto caches
	for _, p := range boringCaches {
		atomicstorep(p, nil)
	}

	// Clear central sudog cache.
	// Leave per-P caches alone, they have strictly bounded size.
	// Disconnect cached list before dropping it on the floor,
	// so that a dangling ref to one entry does not pin all of them.
	lock(&sched.sudoglock)
	var sg, sgnext *sudog
	for sg = sched.sudogcache; sg != nil; sg = sgnext {
		sgnext = sg.next
		sg.next = nil
	}
	sched.sudogcache = nil
	unlock(&sched.sudoglock)

	// Clear central defer pool.
	// Leave per-P pools alone, they have strictly bounded size.
	lock(&sched.deferlock)
	// disconnect cached list before dropping it on the floor,
	// so that a dangling ref to one entry does not pin all of them.
	var d, dlink *_defer
	for d = sched.deferpool; d != nil; d = dlink {
		dlink = d.link
		d.link = nil
	}
	sched.deferpool = nil
	unlock(&sched.deferlock)
}

// Timing

// itoaDiv formats val/(10**dec) into buf.
func itoaDiv(buf []byte, val uint64, dec int) []byte {
	i := len(buf) - 1
	idec := i - dec
	for val >= 10 || i >= idec {
		buf[i] = byte(val%10 + '0')
		i--
		if i == idec {
			buf[i] = '.'
			i--
		}
		val /= 10
	}
	buf[i] = byte(val + '0')
	return buf[i:]
}

// fmtNSAsMS nicely formats ns nanoseconds as milliseconds.
func fmtNSAsMS(buf []byte, ns uint64) []byte {
	if ns >= 10e6 {
		// Format as whole milliseconds.
		return itoaDiv(buf, ns/1e6, 0)
	}
	// Format two digits of precision, with at most three decimal places.
	x := ns / 1e3
	if x == 0 {
		buf[0] = '0'
		return buf[:1]
	}
	dec := 3
	for x >= 100 {
		x /= 10
		dec--
	}
	return itoaDiv(buf, x, dec)
}
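
// fmtNSAsMSExample is an illustrative sketch only: it is not part of the
// runtime, nothing calls it, and the input values are made up. It shows the
// kind of output the two formatting helpers above produce.
func fmtNSAsMSExample() {
	var buf [32]byte
	println(string(itoaDiv(buf[:], 12345, 2)))     // 12345/10^2 -> "123.45"
	println(string(fmtNSAsMS(buf[:], 2_350_000)))  // 2.35ms -> "2.3" (two digits of precision)
	println(string(fmtNSAsMS(buf[:], 48_000_000))) // 48ms -> "48" (whole milliseconds)
}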

// Helpers for testing GC.

// gcTestMoveStackOnNextCall causes the stack to be moved on a call
// immediately following the call to this. It may not work correctly
// if any other work appears after this call (such as returning).
// Typically the following call should be marked go:noinline so it
// performs a stack check.
//
// In rare cases this may not cause the stack to move, specifically if
// there's a preemption between this call and the next.
func gcTestMoveStackOnNextCall() {
	gp := getg()
	gp.stackguard0 = stackForceMove
}

// gcTestIsReachable performs a GC and returns a bit set where bit i
// is set if ptrs[i] is reachable.
func gcTestIsReachable(ptrs ...unsafe.Pointer) (mask uint64) {
	// This takes the pointers as unsafe.Pointers in order to keep
	// them live long enough for us to attach specials. After
	// that, we drop our references to them.

	if len(ptrs) > 64 {
		panic("too many pointers for uint64 mask")
	}

	// Block GC while we attach specials and drop our references
	// to ptrs. Otherwise, if a GC is in progress, it could mark
	// them reachable via this function before we have a chance to
	// drop them.
	semacquire(&gcsema)

	// Create reachability specials for ptrs.
	specials := make([]*specialReachable, len(ptrs))
	for i, p := range ptrs {
		lock(&mheap_.speciallock)
		s := (*specialReachable)(mheap_.specialReachableAlloc.alloc())
		unlock(&mheap_.speciallock)
		s.special.kind = _KindSpecialReachable
		if !addspecial(p, &s.special) {
			throw("already have a reachable special (duplicate pointer?)")
		}
		specials[i] = s
		// Make sure we don't retain ptrs.
		ptrs[i] = nil
	}

	semrelease(&gcsema)

	// Force a full GC and sweep.
	GC()

	// Process specials.
	for i, s := range specials {
		if !s.done {
			printlock()
			println("runtime: object", i, "was not swept")
			throw("IsReachable failed")
		}
		if s.reachable {
			mask |= 1 << i
		}
		lock(&mheap_.speciallock)
		mheap_.specialReachableAlloc.free(unsafe.Pointer(s))
		unlock(&mheap_.speciallock)
	}

	return mask
}

// gcTestPointerClass returns the category of what p points to, one of:
// "heap", "stack", "data", "bss", "other". This is useful for checking
// that a test is doing what it's intended to do.
//
// This is nosplit simply to avoid extra pointer shuffling that may
// complicate a test.
//
//go:nosplit
func gcTestPointerClass(p unsafe.Pointer) string {
	p2 := uintptr(noescape(p))
	gp := getg()
	if gp.stack.lo <= p2 && p2 < gp.stack.hi {
		return "stack"
	}
	if base, _, _ := findObject(p2, 0, 0); base != 0 {
		return "heap"
	}
	for _, datap := range activeModules() {
		if datap.data <= p2 && p2 < datap.edata || datap.noptrdata <= p2 && p2 < datap.enoptrdata {
			return "data"
		}
		if datap.bss <= p2 && p2 < datap.ebss || datap.noptrbss <= p2 && p2 < datap.enoptrbss {
			return "bss"
		}
	}
	KeepAlive(p)
	return "other"
}
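
// The definitions below are an illustrative sketch only: gcTestExample and
// gcTestExampleSink are not part of the runtime and nothing calls them. They
// show how the test helpers above fit together; real tests live in the
// runtime_test package and reach these helpers through aliases exported for
// testing.

// gcTestExampleSink forces the sketch's objects to escape to the heap;
// reachability specials can only be attached to heap objects.
var gcTestExampleSink unsafe.Pointer

func gcTestExample() {
	var onStack int
	reachable := new([64]byte)
	dead := new([64]byte)
	gcTestExampleSink = unsafe.Pointer(reachable)
	gcTestExampleSink = unsafe.Pointer(dead)
	gcTestExampleSink = nil

	// Classify a stack pointer and a heap pointer.
	println(gcTestPointerClass(unsafe.Pointer(&onStack)))  // expect "stack"
	println(gcTestPointerClass(unsafe.Pointer(reachable))) // expect "heap"

	// Ask the GC which objects survive a forced collection. gcTestIsReachable
	// drops its own references before running the GC; reachable is kept alive
	// by the KeepAlive below, while dead is never referenced again.
	mask := gcTestIsReachable(unsafe.Pointer(reachable), unsafe.Pointer(dead))
	println("bit 0 (reachable):", mask&1 != 0) // expect true
	println("bit 1 (dead):", mask&2 != 0)      // expect false, though conservative
	// scanning of preempted frames can occasionally retain dead objects.
	KeepAlive(reachable)
}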