1 // Copyright 2009 The Go Authors. All rights reserved. 2 // Use of this source code is governed by a BSD-style 3 // license that can be found in the LICENSE file. 4 5 /* 6 7 Multi-precision division. Here be dragons. 8 9 Given u and v, where u is n+m digits, and v is n digits (with no leading zeros), 10 the goal is to return quo, rem such that u = quo*v + rem, where 0 ≤ rem < v. 11 That is, quo = ⌊u/v⌋ where ⌊x⌋ denotes the floor (truncation to integer) of x, 12 and rem = u - quo·v. 13 14 15 Long Division 16 17 Division in a computer proceeds the same as long division in elementary school, 18 but computers are not as good as schoolchildren at following vague directions, 19 so we have to be much more precise about the actual steps and what can happen. 20 21 We work from most to least significant digit of the quotient, doing: 22 23 • Guess a digit q, the number of v to subtract from the current 24 section of u to zero out the topmost digit. 25 • Check the guess by multiplying q·v and comparing it against 26 the current section of u, adjusting the guess as needed. 27 • Subtract q·v from the current section of u. 28 • Add q to the corresponding section of the result quo. 29 30 When all digits have been processed, the final remainder is left in u 31 and returned as rem. 32 33 For example, here is a sketch of dividing 5 digits by 3 digits (n=3, m=2). 34 35 q₂ q₁ q₀ 36 _________________ 37 v₂ v₁ v₀ ) u₄ u₃ u₂ u₁ u₀ 38 ↓ ↓ ↓ | | 39 [u₄ u₃ u₂]| | 40 - [ q₂·v ]| | 41 ----------- ↓ | 42 [ rem | u₁]| 43 - [ q₁·v ]| 44 ----------- ↓ 45 [ rem | u₀] 46 - [ q₀·v ] 47 ------------ 48 [ rem ] 49 50 Instead of creating new storage for the remainders and copying digits from u 51 as indicated by the arrows, we use u's storage directly as both the source 52 and destination of the subtractions, so that the remainders overwrite 53 successive overlapping sections of u as the division proceeds, using a slice 54 of u to identify the current section. This avoids all the copying as well as 55 shifting of remainders. 56 57 Division of u with n+m digits by v with n digits (in base B) can in general 58 produce at most m+1 digits, because: 59 60 • u < B^(n+m) [B^(n+m) has n+m+1 digits] 61 • v ≥ B^(n-1) [B^(n-1) is the smallest n-digit number] 62 • u/v < B^(n+m) / B^(n-1) [divide bounds for u, v] 63 • u/v < B^(m+1) [simplify] 64 65 The first step is special: it takes the top n digits of u and divides them by 66 the n digits of v, producing the first quotient digit and an n-digit remainder. 67 In the example, q₂ = ⌊u₄u₃u₂ / v⌋. 68 69 The first step divides n digits by n digits to ensure that it produces only a 70 single digit. 71 72 Each subsequent step appends the next digit from u to the remainder and divides 73 those n+1 digits by the n digits of v, producing another quotient digit and a 74 new n-digit remainder. 75 76 Subsequent steps divide n+1 digits by n digits, an operation that in general 77 might produce two digits. However, as used in the algorithm, that division is 78 guaranteed to produce only a single digit. The dividend is of the form 79 rem·B + d, where rem is a remainder from the previous step and d is a single 80 digit, so: 81 82 • rem ≤ v - 1 [rem is a remainder from dividing by v] 83 • rem·B ≤ v·B - B [multiply by B] 84 • d ≤ B - 1 [d is a single digit] 85 • rem·B + d ≤ v·B - 1 [add] 86 • rem·B + d < v·B [change ≤ to <] 87 • (rem·B + d)/v < B [divide by v] 88 89 90 Guess and Check 91 92 At each step we need to divide n+1 digits by n digits, but this is for the 93 implementation of division by n digits, so we can't just invoke a division 94 routine: we _are_ the division routine. Instead, we guess at the answer and 95 then check it using multiplication. If the guess is wrong, we correct it. 96 97 How can this guessing possibly be efficient? It turns out that the following 98 statement (let's call it the Good Guess Guarantee) is true. 99 100 If 101 102 • q = ⌊u/v⌋ where u is n+1 digits and v is n digits, 103 • q < B, and 104 • the topmost digit of v = vₙ₋₁ ≥ B/2, 105 106 then q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ satisfies q ≤ q̂ ≤ q+2. (Proof below.) 107 108 That is, if we know the answer has only a single digit and we guess an answer 109 by ignoring the bottom n-1 digits of u and v, using a 2-by-1-digit division, 110 then that guess is at least as large as the correct answer. It is also not 111 too much larger: it is off by at most two from the correct answer. 112 113 Note that in the first step of the overall division, which is an n-by-n-digit 114 division, the 2-by-1 guess uses an implicit uₙ = 0. 115 116 Note that using a 2-by-1-digit division here does not mean calling ourselves 117 recursively. Instead, we use an efficient direct hardware implementation of 118 that operation. 119 120 Note that because q is u/v rounded down, q·v must not exceed u: u ≥ q·v. 121 If a guess q̂ is too big, it will not satisfy this test. Viewed a different way, 122 the remainder r̂ for a given q̂ is u - q̂·v, which must be positive. If it is 123 negative, then the guess q̂ is too big. 124 125 This gives us a way to compute q. First compute q̂ with 2-by-1-digit division. 126 Then, while u < q̂·v, decrement q̂; this loop executes at most twice, because 127 q̂ ≤ q+2. 128 129 130 Scaling Inputs 131 132 The Good Guess Guarantee requires that the top digit of v (vₙ₋₁) be at least B/2. 133 For example in base 10, ⌊172/19⌋ = 9, but ⌊18/1⌋ = 18: the guess is wildly off 134 because the first digit 1 is smaller than B/2 = 5. 135 136 We can ensure that v has a large top digit by multiplying both u and v by the 137 right amount. Continuing the example, if we multiply both 172 and 19 by 3, we 138 now have ⌊516/57⌋, the leading digit of v is now ≥ 5, and sure enough 139 ⌊51/5⌋ = 10 is much closer to the correct answer 9. It would be easier here 140 to multiply by 4, because that can be done with a shift. Specifically, we can 141 always count the number of leading zeros i in the first digit of v and then 142 shift both u and v left by i bits. 143 144 Having scaled u and v, the value ⌊u/v⌋ is unchanged, but the remainder will 145 be scaled: 172 mod 19 is 1, but 516 mod 57 is 3. We have to divide the remainder 146 by the scaling factor (shifting right i bits) when we finish. 147 148 Note that these shifts happen before and after the entire division algorithm, 149 not at each step in the per-digit iteration. 150 151 Note the effect of scaling inputs on the size of the possible quotient. 152 In the scaled u/v, u can gain a digit from scaling; v never does, because we 153 pick the scaling factor to make v's top digit larger but without overflowing. 154 If u and v have n+m and n digits after scaling, then: 155 156 • u < B^(n+m) [B^(n+m) has n+m+1 digits] 157 • v ≥ B^n / 2 [vₙ₋₁ ≥ B/2, so vₙ₋₁·B^(n-1) ≥ B^n/2] 158 • u/v < B^(n+m) / (B^n / 2) [divide bounds for u, v] 159 • u/v < 2 B^m [simplify] 160 161 The quotient can still have m+1 significant digits, but if so the top digit 162 must be a 1. This provides a different way to handle the first digit of the 163 result: compare the top n digits of u against v and fill in either a 0 or a 1. 164 165 166 Refining Guesses 167 168 Before we check whether u < q̂·v, we can adjust our guess to change it from 169 q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ into the refined guess ⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋. 170 Although not mentioned above, the Good Guess Guarantee also promises that this 171 3-by-2-digit division guess is more precise and at most one away from the real 172 answer q. The improvement from the 2-by-1 to the 3-by-2 guess can also be done 173 without n-digit math. 174 175 If we have a guess q̂ = ⌊uₙuₙ₋₁ / vₙ₋₁⌋ and we want to see if it also equal to 176 ⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋, we can use the same check we would for the full division: 177 if uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂, then the guess is too large and should be reduced. 178 179 Checking uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂ is the same as uₙuₙ₋₁uₙ₋₂ - q̂·vₙ₋₁vₙ₋₂ < 0, 180 and 181 182 uₙuₙ₋₁uₙ₋₂ - q̂·vₙ₋₁vₙ₋₂ = (uₙuₙ₋₁·B + uₙ₋₂) - q̂·(vₙ₋₁·B + vₙ₋₂) 183 [splitting off the bottom digit] 184 = (uₙuₙ₋₁ - q̂·vₙ₋₁)·B + uₙ₋₂ - q̂·vₙ₋₂ 185 [regrouping] 186 187 The expression (uₙuₙ₋₁ - q̂·vₙ₋₁) is the remainder of uₙuₙ₋₁ / vₙ₋₁. 188 If the initial guess returns both q̂ and its remainder r̂, then checking 189 whether uₙuₙ₋₁uₙ₋₂ < q̂·vₙ₋₁vₙ₋₂ is the same as checking r̂·B + uₙ₋₂ < q̂·vₙ₋₂. 190 191 If we find that r̂·B + uₙ₋₂ < q̂·vₙ₋₂, then we can adjust the guess by 192 decrementing q̂ and adding vₙ₋₁ to r̂. We repeat until r̂·B + uₙ₋₂ ≥ q̂·vₙ₋₂. 193 (As before, this fixup is only needed at most twice.) 194 195 Now that q̂ = ⌊uₙuₙ₋₁uₙ₋₂ / vₙ₋₁vₙ₋₂⌋, as mentioned above it is at most one 196 away from the correct q, and we've avoided doing any n-digit math. 197 (If we need the new remainder, it can be computed as r̂·B + uₙ₋₂ - q̂·vₙ₋₂.) 198 199 The final check u < q̂·v and the possible fixup must be done at full precision. 200 For random inputs, a fixup at this step is exceedingly rare: the 3-by-2 guess 201 is not often wrong at all. But still we must do the check. Note that since the 202 3-by-2 guess is off by at most 1, it can be convenient to perform the final 203 u < q̂·v as part of the computation of the remainder r = u - q̂·v. If the 204 subtraction underflows, decremeting q̂ and adding one v back to r is enough to 205 arrive at the final q, r. 206 207 That's the entirety of long division: scale the inputs, and then loop over 208 each output position, guessing, checking, and correcting the next output digit. 209 210 For a 2n-digit number divided by an n-digit number (the worst size-n case for 211 division complexity), this algorithm uses n+1 iterations, each of which must do 212 at least the 1-by-n-digit multiplication q̂·v. That's O(n) iterations of 213 O(n) time each, so O(n²) time overall. 214 215 216 Recursive Division 217 218 For very large inputs, it is possible to improve on the O(n²) algorithm. 219 Let's call a group of n/2 real digits a (very) “wide digit”. We can run the 220 standard long division algorithm explained above over the wide digits instead of 221 the actual digits. This will result in many fewer steps, but the math involved in 222 each step is more work. 223 224 Where basic long division uses a 2-by-1-digit division to guess the initial q̂, 225 the new algorithm must use a 2-by-1-wide-digit division, which is of course 226 really an n-by-n/2-digit division. That's OK: if we implement n-digit division 227 in terms of n/2-digit division, the recursion will terminate when the divisor 228 becomes small enough to handle with standard long division or even with the 229 2-by-1 hardware instruction. 230 231 For example, here is a sketch of dividing 10 digits by 4, proceeding with 232 wide digits corresponding to two regular digits. The first step, still special, 233 must leave off a (regular) digit, dividing 5 by 4 and producing a 4-digit 234 remainder less than v. The middle steps divide 6 digits by 4, guaranteed to 235 produce two output digits each (one wide digit) with 4-digit remainders. 236 The final step must use what it has: the 4-digit remainder plus one more, 237 5 digits to divide by 4. 238 239 q₆ q₅ q₄ q₃ q₂ q₁ q₀ 240 _______________________________ 241 v₃ v₂ v₁ v₀ ) u₉ u₈ u₇ u₆ u₅ u₄ u₃ u₂ u₁ u₀ 242 ↓ ↓ ↓ ↓ ↓ | | | | | 243 [u₉ u₈ u₇ u₆ u₅]| | | | | 244 - [ q₆q₅·v ]| | | | | 245 ----------------- ↓ ↓ | | | 246 [ rem |u₄ u₃]| | | 247 - [ q₄q₃·v ]| | | 248 -------------------- ↓ ↓ | 249 [ rem |u₂ u₁]| 250 - [ q₂q₁·v ]| 251 -------------------- ↓ 252 [ rem |u₀] 253 - [ q₀·v ] 254 ------------------ 255 [ rem ] 256 257 An alternative would be to look ahead to how well n/2 divides into n+m and 258 adjust the first step to use fewer digits as needed, making the first step 259 more special to make the last step not special at all. For example, using the 260 same input, we could choose to use only 4 digits in the first step, leaving 261 a full wide digit for the last step: 262 263 q₆ q₅ q₄ q₃ q₂ q₁ q₀ 264 _______________________________ 265 v₃ v₂ v₁ v₀ ) u₉ u₈ u₇ u₆ u₅ u₄ u₃ u₂ u₁ u₀ 266 ↓ ↓ ↓ ↓ | | | | | | 267 [u₉ u₈ u₇ u₆]| | | | | | 268 - [ q₆·v ]| | | | | | 269 -------------- ↓ ↓ | | | | 270 [ rem |u₅ u₄]| | | | 271 - [ q₅q₄·v ]| | | | 272 -------------------- ↓ ↓ | | 273 [ rem |u₃ u₂]| | 274 - [ q₃q₂·v ]| | 275 -------------------- ↓ ↓ 276 [ rem |u₁ u₀] 277 - [ q₁q₀·v ] 278 --------------------- 279 [ rem ] 280 281 Today, the code in divRecursiveStep works like the first example. Perhaps in 282 the future we will make it work like the alternative, to avoid a special case 283 in the final iteration. 284 285 Either way, each step is a 3-by-2-wide-digit division approximated first by 286 a 2-by-1-wide-digit division, just as we did for regular digits in long division. 287 Because the actual answer we want is a 3-by-2-wide-digit division, instead of 288 multiplying q̂·v directly during the fixup, we can use the quick refinement 289 from long division (an n/2-by-n/2 multiply) to correct q to its actual value 290 and also compute the remainder (as mentioned above), and then stop after that, 291 never doing a full n-by-n multiply. 292 293 Instead of using an n-by-n/2-digit division to produce n/2 digits, we can add 294 (not discard) one more real digit, doing an (n+1)-by-(n/2+1)-digit division that 295 produces n/2+1 digits. That single extra digit tightens the Good Guess Guarantee 296 to q ≤ q̂ ≤ q+1 and lets us drop long division's special treatment of the first 297 digit. These benefits are discussed more after the Good Guess Guarantee proof 298 below. 299 300 301 How Fast is Recursive Division? 302 303 For a 2n-by-n-digit division, this algorithm runs a 4-by-2 long division over 304 wide digits, producing two wide digits plus a possible leading regular digit 1, 305 which can be handled without a recursive call. That is, the algorithm uses two 306 full iterations, each using an n-by-n/2-digit division and an n/2-by-n/2-digit 307 multiplication, along with a few n-digit additions and subtractions. The standard 308 n-by-n-digit multiplication algorithm requires O(n²) time, making the overall 309 algorithm require time T(n) where 310 311 T(n) = 2T(n/2) + O(n) + O(n²) 312 313 which, by the Bentley-Haken-Saxe theorem, ends up reducing to T(n) = O(n²). 314 This is not an improvement over regular long division. 315 316 When the number of digits n becomes large enough, Karatsuba's algorithm for 317 multiplication can be used instead, which takes O(n^log₂3) = O(n^1.6) time. 318 (Karatsuba multiplication is implemented in func karatsuba in nat.go.) 319 That makes the overall recursive division algorithm take O(n^1.6) time as well, 320 which is an improvement, but again only for large enough numbers. 321 322 It is not critical to make sure that every recursion does only two recursive 323 calls. While in general the number of recursive calls can change the time 324 analysis, in this case doing three calls does not change the analysis: 325 326 T(n) = 3T(n/2) + O(n) + O(n^log₂3) 327 328 ends up being T(n) = O(n^log₂3). Because the Karatsuba multiplication taking 329 time O(n^log₂3) is itself doing 3 half-sized recursions, doing three for the 330 division does not hurt the asymptotic performance. Of course, it is likely 331 still faster in practice to do two. 332 333 334 Proof of the Good Guess Guarantee 335 336 Given numbers x, y, let us break them into the quotients and remainders when 337 divided by some scaling factor S, with the added constraints that the quotient 338 x/y and the high part of y are both less than some limit T, and that the high 339 part of y is at least half as big as T. 340 341 x₁ = ⌊x/S⌋ y₁ = ⌊y/S⌋ 342 x₀ = x mod S y₀ = y mod S 343 344 x = x₁·S + x₀ 0 ≤ x₀ < S x/y < T 345 y = y₁·S + y₀ 0 ≤ y₀ < S T/2 ≤ y₁ < T 346 347 And consider the two truncated quotients: 348 349 q = ⌊x/y⌋ 350 q̂ = ⌊x₁/y₁⌋ 351 352 We will prove that q ≤ q̂ ≤ q+2. 353 354 The guarantee makes no real demands on the scaling factor S: it is simply the 355 magnitude of the digits cut from both x and y to produce x₁ and y₁. 356 The guarantee makes only limited demands on T: it must be large enough to hold 357 the quotient x/y, and y₁ must have roughly the same size. 358 359 To apply to the earlier discussion of 2-by-1 guesses in long division, 360 we would choose: 361 362 S = Bⁿ⁻¹ 363 T = B 364 x = u 365 x₁ = uₙuₙ₋₁ 366 x₀ = uₙ₋₂...u₀ 367 y = v 368 y₁ = vₙ₋₁ 369 y₀ = vₙ₋₂...u₀ 370 371 These simpler variables avoid repeating those longer expressions in the proof. 372 373 Note also that, by definition, truncating division ⌊x/y⌋ satisfies 374 375 x/y - 1 < ⌊x/y⌋ ≤ x/y. 376 377 This fact will be used a few times in the proofs. 378 379 Proof that q ≤ q̂: 380 381 q̂·y₁ = ⌊x₁/y₁⌋·y₁ [by definition, q̂ = ⌊x₁/y₁⌋] 382 > (x₁/y₁ - 1)·y₁ [x₁/y₁ - 1 < ⌊x₁/y₁⌋] 383 = x₁ - y₁ [distribute y₁] 384 385 So q̂·y₁ > x₁ - y₁. 386 Since q̂·y₁ is an integer, q̂·y₁ ≥ x₁ - y₁ + 1. 387 388 q̂ - q = q̂ - ⌊x/y⌋ [by definition, q = ⌊x/y⌋] 389 ≥ q̂ - x/y [⌊x/y⌋ < x/y] 390 = (1/y)·(q̂·y - x) [factor out 1/y] 391 ≥ (1/y)·(q̂·y₁·S - x) [y = y₁·S + y₀ ≥ y₁·S] 392 ≥ (1/y)·((x₁ - y₁ + 1)·S - x) [above: q̂·y₁ ≥ x₁ - y₁ + 1] 393 = (1/y)·(x₁·S - y₁·S + S - x) [distribute S] 394 = (1/y)·(S - x₀ - y₁·S) [-x = -x₁·S - x₀] 395 > -y₁·S / y [x₀ < S, so S - x₀ < 0; drop it] 396 ≥ -1 [y₁·S ≤ y] 397 398 So q̂ - q > -1. 399 Since q̂ - q is an integer, q̂ - q ≥ 0, or equivalently q ≤ q̂. 400 401 Proof that q̂ ≤ q+2: 402 403 x₁/y₁ - x/y = x₁·S/y₁·S - x/y [multiply left term by S/S] 404 ≤ x/y₁·S - x/y [x₁S ≤ x] 405 = (x/y)·(y/y₁·S - 1) [factor out x/y] 406 = (x/y)·((y - y₁·S)/y₁·S) [move -1 into y/y₁·S fraction] 407 = (x/y)·(y₀/y₁·S) [y - y₁·S = y₀] 408 = (x/y)·(1/y₁)·(y₀/S) [factor out 1/y₁] 409 < (x/y)·(1/y₁) [y₀ < S, so y₀/S < 1] 410 ≤ (x/y)·(2/T) [y₁ ≥ T/2, so 1/y₁ ≤ 2/T] 411 < T·(2/T) [x/y < T] 412 = 2 [T·(2/T) = 2] 413 414 So x₁/y₁ - x/y < 2. 415 416 q̂ - q = ⌊x₁/y₁⌋ - q [by definition, q̂ = ⌊x₁/y₁⌋] 417 = ⌊x₁/y₁⌋ - ⌊x/y⌋ [by definition, q = ⌊x/y⌋] 418 ≤ x₁/y₁ - ⌊x/y⌋ [⌊x₁/y₁⌋ ≤ x₁/y₁] 419 < x₁/y₁ - (x/y - 1) [⌊x/y⌋ > x/y - 1] 420 = (x₁/y₁ - x/y) + 1 [regrouping] 421 < 2 + 1 [above: x₁/y₁ - x/y < 2] 422 = 3 423 424 So q̂ - q < 3. 425 Since q̂ - q is an integer, q̂ - q ≤ 2. 426 427 Note that when x/y < T/2, the bounds tighten to x₁/y₁ - x/y < 1 and therefore 428 q̂ - q ≤ 1. 429 430 Note also that in the general case 2n-by-n division where we don't know that 431 x/y < T, we do know that x/y < 2T, yielding the bound q̂ - q ≤ 4. So we could 432 remove the special case first step of long division as long as we allow the 433 first fixup loop to run up to four times. (Using a simple comparison to decide 434 whether the first digit is 0 or 1 is still more efficient, though.) 435 436 Finally, note that when dividing three leading base-B digits by two (scaled), 437 we have T = B² and x/y < B = T/B, a much tighter bound than x/y < T. 438 This in turn yields the much tighter bound x₁/y₁ - x/y < 2/B. This means that 439 ⌊x₁/y₁⌋ and ⌊x/y⌋ can only differ when x/y is less than 2/B greater than an 440 integer. For random x and y, the chance of this is 2/B, or, for large B, 441 approximately zero. This means that after we produce the 3-by-2 guess in the 442 long division algorithm, the fixup loop essentially never runs. 443 444 In the recursive algorithm, the extra digit in (2·⌊n/2⌋+1)-by-(⌊n/2⌋+1)-digit 445 division has exactly the same effect: the probability of needing a fixup is the 446 same 2/B. Even better, we can allow the general case x/y < 2T and the fixup 447 probability only grows to 4/B, still essentially zero. 448 449 450 References 451 452 There are no great references for implementing long division; thus this comment. 453 Here are some notes about what to expect from the obvious references. 454 455 Knuth Volume 2 (Seminumerical Algorithms) section 4.3.1 is the usual canonical 456 reference for long division, but that entire series is highly compressed, never 457 repeating a necessary fact and leaving important insights to the exercises. 458 For example, no rationale whatsoever is given for the calculation that extends 459 q̂ from a 2-by-1 to a 3-by-2 guess, nor why it reduces the error bound. 460 The proof that the calculation even has the desired effect is left to exercises. 461 The solutions to those exercises provided at the back of the book are entirely 462 calculations, still with no explanation as to what is going on or how you would 463 arrive at the idea of doing those exact calculations. Nowhere is it mentioned 464 that this test extends the 2-by-1 guess into a 3-by-2 guess. The proof of the 465 Good Guess Guarantee is only for the 2-by-1 guess and argues by contradiction, 466 making it difficult to understand how modifications like adding another digit 467 or adjusting the quotient range affects the overall bound. 468 469 All that said, Knuth remains the canonical reference. It is dense but packed 470 full of information and references, and the proofs are simpler than many other 471 presentations. The proofs above are reworkings of Knuth's to remove the 472 arguments by contradiction and add explanations or steps that Knuth omitted. 473 But beware of errors in older printings. Take the published errata with you. 474 475 Brinch Hansen's “Multiple-length Division Revisited: a Tour of the Minefield” 476 starts with a blunt critique of Knuth's presentation (among others) and then 477 presents a more detailed and easier to follow treatment of long division, 478 including an implementation in Pascal. But the algorithm and implementation 479 work entirely in terms of 3-by-2 division, which is much less useful on modern 480 hardware than an algorithm using 2-by-1 division. The proofs are a bit too 481 focused on digit counting and seem needlessly complex, especially compared to 482 the ones given above. 483 484 Burnikel and Ziegler's “Fast Recursive Division” introduced the key insight of 485 implementing division by an n-digit divisor using recursive calls to division 486 by an n/2-digit divisor, relying on Karatsuba multiplication to yield a 487 sub-quadratic run time. However, the presentation decisions are made almost 488 entirely for the purpose of simplifying the run-time analysis, rather than 489 simplifying the presentation. Instead of a single algorithm that loops over 490 quotient digits, the paper presents two mutually-recursive algorithms, for 491 2n-by-n and 3n-by-2n. The paper also does not present any general (n+m)-by-n 492 algorithm. 493 494 The proofs in the paper are remarkably complex, especially considering that 495 the algorithm is at its core just long division on wide digits, so that the 496 usual long division proofs apply essentially unaltered. 497 */ 498 499 package big 500 501 import "math/bits" 502 503 // rem returns r such that r = u%v. 504 // It uses z as the storage for r. 505 func (z nat) rem(u, v nat) (r nat) { 506 if alias(z, u) { 507 z = nil 508 } 509 qp := getNat(0) 510 q, r := qp.div(z, u, v) 511 *qp = q 512 putNat(qp) 513 return r 514 } 515 516 // div returns q, r such that q = ⌊u/v⌋ and r = u%v = u - q·v. 517 // It uses z and z2 as the storage for q and r. 518 func (z nat) div(z2, u, v nat) (q, r nat) { 519 if len(v) == 0 { 520 panic("division by zero") 521 } 522 523 if u.cmp(v) < 0 { 524 q = z[:0] 525 r = z2.set(u) 526 return 527 } 528 529 if len(v) == 1 { 530 // Short division: long optimized for a single-word divisor. 531 // In that case, the 2-by-1 guess is all we need at each step. 532 var r2 Word 533 q, r2 = z.divW(u, v[0]) 534 r = z2.setWord(r2) 535 return 536 } 537 538 q, r = z.divLarge(z2, u, v) 539 return 540 } 541 542 // divW returns q, r such that q = ⌊x/y⌋ and r = x%y = x - q·y. 543 // It uses z as the storage for q. 544 // Note that y is a single digit (Word), not a big number. 545 func (z nat) divW(x nat, y Word) (q nat, r Word) { 546 m := len(x) 547 switch { 548 case y == 0: 549 panic("division by zero") 550 case y == 1: 551 q = z.set(x) // result is x 552 return 553 case m == 0: 554 q = z[:0] // result is 0 555 return 556 } 557 // m > 0 558 z = z.make(m) 559 r = divWVW(z, 0, x, y) 560 q = z.norm() 561 return 562 } 563 564 // modW returns x % d. 565 func (x nat) modW(d Word) (r Word) { 566 // TODO(agl): we don't actually need to store the q value. 567 var q nat 568 q = q.make(len(x)) 569 return divWVW(q, 0, x, d) 570 } 571 572 // divWVW overwrites z with ⌊x/y⌋, returning the remainder r. 573 // The caller must ensure that len(z) = len(x). 574 func divWVW(z []Word, xn Word, x []Word, y Word) (r Word) { 575 r = xn 576 if len(x) == 1 { 577 qq, rr := bits.Div(uint(r), uint(x[0]), uint(y)) 578 z[0] = Word(qq) 579 return Word(rr) 580 } 581 rec := reciprocalWord(y) 582 for i := len(z) - 1; i >= 0; i-- { 583 z[i], r = divWW(r, x[i], y, rec) 584 } 585 return r 586 } 587 588 // div returns q, r such that q = ⌊uIn/vIn⌋ and r = uIn%vIn = uIn - q·vIn. 589 // It uses z and u as the storage for q and r. 590 // The caller must ensure that len(vIn) ≥ 2 (use divW otherwise) 591 // and that len(uIn) ≥ len(vIn) (the answer is 0, uIn otherwise). 592 func (z nat) divLarge(u, uIn, vIn nat) (q, r nat) { 593 n := len(vIn) 594 m := len(uIn) - n 595 596 // Scale the inputs so vIn's top bit is 1 (see “Scaling Inputs” above). 597 // vIn is treated as a read-only input (it may be in use by another 598 // goroutine), so we must make a copy. 599 // uIn is copied to u. 600 shift := nlz(vIn[n-1]) 601 vp := getNat(n) 602 v := *vp 603 shlVU(v, vIn, shift) 604 u = u.make(len(uIn) + 1) 605 u[len(uIn)] = shlVU(u[0:len(uIn)], uIn, shift) 606 607 // The caller should not pass aliased z and u, since those are 608 // the two different outputs, but correct just in case. 609 if alias(z, u) { 610 z = nil 611 } 612 q = z.make(m + 1) 613 614 // Use basic or recursive long division depending on size. 615 if n < divRecursiveThreshold { 616 q.divBasic(u, v) 617 } else { 618 q.divRecursive(u, v) 619 } 620 putNat(vp) 621 622 q = q.norm() 623 624 // Undo scaling of remainder. 625 shrVU(u, u, shift) 626 r = u.norm() 627 628 return q, r 629 } 630 631 // divBasic implements long division as described above. 632 // It overwrites q with ⌊u/v⌋ and overwrites u with the remainder r. 633 // q must be large enough to hold ⌊u/v⌋. 634 func (q nat) divBasic(u, v nat) { 635 n := len(v) 636 m := len(u) - n 637 638 qhatvp := getNat(n + 1) 639 qhatv := *qhatvp 640 641 // Set up for divWW below, precomputing reciprocal argument. 642 vn1 := v[n-1] 643 rec := reciprocalWord(vn1) 644 645 // Compute each digit of quotient. 646 for j := m; j >= 0; j-- { 647 // Compute the 2-by-1 guess q̂. 648 // The first iteration must invent a leading 0 for u. 649 qhat := Word(_M) 650 var ujn Word 651 if j+n < len(u) { 652 ujn = u[j+n] 653 } 654 655 // ujn ≤ vn1, or else q̂ would be more than one digit. 656 // For ujn == vn1, we set q̂ to the max digit M above. 657 // Otherwise, we compute the 2-by-1 guess. 658 if ujn != vn1 { 659 var rhat Word 660 qhat, rhat = divWW(ujn, u[j+n-1], vn1, rec) 661 662 // Refine q̂ to a 3-by-2 guess. See “Refining Guesses” above. 663 vn2 := v[n-2] 664 x1, x2 := mulWW(qhat, vn2) 665 ujn2 := u[j+n-2] 666 for greaterThan(x1, x2, rhat, ujn2) { // x1x2 > r̂ u[j+n-2] 667 qhat-- 668 prevRhat := rhat 669 rhat += vn1 670 // If r̂ overflows, then 671 // r̂ u[j+n-2]v[n-1] is now definitely > x1 x2. 672 if rhat < prevRhat { 673 break 674 } 675 // TODO(rsc): No need for a full mulWW. 676 // x2 += vn2; if x2 overflows, x1++ 677 x1, x2 = mulWW(qhat, vn2) 678 } 679 } 680 681 // Compute q̂·v. 682 qhatv[n] = mulAddVWW(qhatv[0:n], v, qhat, 0) 683 qhl := len(qhatv) 684 if j+qhl > len(u) && qhatv[n] == 0 { 685 qhl-- 686 } 687 688 // Subtract q̂·v from the current section of u. 689 // If it underflows, q̂·v > u, which we fix up 690 // by decrementing q̂ and adding v back. 691 c := subVV(u[j:j+qhl], u[j:], qhatv) 692 if c != 0 { 693 c := addVV(u[j:j+n], u[j:], v) 694 // If n == qhl, the carry from subVV and the carry from addVV 695 // cancel out and don't affect u[j+n]. 696 if n < qhl { 697 u[j+n] += c 698 } 699 qhat-- 700 } 701 702 // Save quotient digit. 703 // Caller may know the top digit is zero and not leave room for it. 704 if j == m && m == len(q) && qhat == 0 { 705 continue 706 } 707 q[j] = qhat 708 } 709 710 putNat(qhatvp) 711 } 712 713 // greaterThan reports whether the two digit numbers x1 x2 > y1 y2. 714 // TODO(rsc): In contradiction to most of this file, x1 is the high 715 // digit and x2 is the low digit. This should be fixed. 716 func greaterThan(x1, x2, y1, y2 Word) bool { 717 return x1 > y1 || x1 == y1 && x2 > y2 718 } 719 720 // divRecursiveThreshold is the number of divisor digits 721 // at which point divRecursive is faster than divBasic. 722 const divRecursiveThreshold = 100 723 724 // divRecursive implements recursive division as described above. 725 // It overwrites z with ⌊u/v⌋ and overwrites u with the remainder r. 726 // z must be large enough to hold ⌊u/v⌋. 727 // This function is just for allocating and freeing temporaries 728 // around divRecursiveStep, the real implementation. 729 func (z nat) divRecursive(u, v nat) { 730 // Recursion depth is (much) less than 2 log₂(len(v)). 731 // Allocate a slice of temporaries to be reused across recursion, 732 // plus one extra temporary not live across the recursion. 733 recDepth := 2 * bits.Len(uint(len(v))) 734 tmp := getNat(3 * len(v)) 735 temps := make([]*nat, recDepth) 736 737 z.clear() 738 z.divRecursiveStep(u, v, 0, tmp, temps) 739 740 // Free temporaries. 741 for _, n := range temps { 742 if n != nil { 743 putNat(n) 744 } 745 } 746 putNat(tmp) 747 } 748 749 // divRecursiveStep is the actual implementation of recursive division. 750 // It adds ⌊u/v⌋ to z and overwrites u with the remainder r. 751 // z must be large enough to hold ⌊u/v⌋. 752 // It uses temps[depth] (allocating if needed) as a temporary live across 753 // the recursive call. It also uses tmp, but not live across the recursion. 754 func (z nat) divRecursiveStep(u, v nat, depth int, tmp *nat, temps []*nat) { 755 // u is a subsection of the original and may have leading zeros. 756 // TODO(rsc): The v = v.norm() is useless and should be removed. 757 // We know (and require) that v's top digit is ≥ B/2. 758 u = u.norm() 759 v = v.norm() 760 if len(u) == 0 { 761 z.clear() 762 return 763 } 764 765 // Fall back to basic division if the problem is now small enough. 766 n := len(v) 767 if n < divRecursiveThreshold { 768 z.divBasic(u, v) 769 return 770 } 771 772 // Nothing to do if u is shorter than v (implies u < v). 773 m := len(u) - n 774 if m < 0 { 775 return 776 } 777 778 // We consider B digits in a row as a single wide digit. 779 // (See “Recursive Division” above.) 780 // 781 // TODO(rsc): rename B to Wide, to avoid confusion with _B, 782 // which is something entirely different. 783 // TODO(rsc): Look into whether using ⌈n/2⌉ is better than ⌊n/2⌋. 784 B := n / 2 785 786 // Allocate a nat for qhat below. 787 if temps[depth] == nil { 788 temps[depth] = getNat(n) // TODO(rsc): Can be just B+1. 789 } else { 790 *temps[depth] = temps[depth].make(B + 1) 791 } 792 793 // Compute each wide digit of the quotient. 794 // 795 // TODO(rsc): Change the loop to be 796 // for j := (m+B-1)/B*B; j > 0; j -= B { 797 // which will make the final step a regular step, letting us 798 // delete what amounts to an extra copy of the loop body below. 799 j := m 800 for j > B { 801 // Divide u[j-B:j+n] (3 wide digits) by v (2 wide digits). 802 // First make the 2-by-1-wide-digit guess using a recursive call. 803 // Then extend the guess to the full 3-by-2 (see “Refining Guesses”). 804 // 805 // For the 2-by-1-wide-digit guess, instead of doing 2B-by-B-digit, 806 // we use a (2B+1)-by-(B+1) digit, which handles the possibility that 807 // the result has an extra leading 1 digit as well as guaranteeing 808 // that the computed q̂ will be off by at most 1 instead of 2. 809 810 // s is the number of digits to drop from the 3B- and 2B-digit chunks. 811 // We drop B-1 to be left with 2B+1 and B+1. 812 s := (B - 1) 813 814 // uu is the up-to-3B-digit section of u we are working on. 815 uu := u[j-B:] 816 817 // Compute the 2-by-1 guess q̂, leaving r̂ in uu[s:B+n]. 818 qhat := *temps[depth] 819 qhat.clear() 820 qhat.divRecursiveStep(uu[s:B+n], v[s:], depth+1, tmp, temps) 821 qhat = qhat.norm() 822 823 // Extend to a 3-by-2 quotient and remainder. 824 // Because divRecursiveStep overwrote the top part of uu with 825 // the remainder r̂, the full uu already contains the equivalent 826 // of r̂·B + uₙ₋₂ from the “Refining Guesses” discussion. 827 // Subtracting q̂·vₙ₋₂ from it will compute the full-length remainder. 828 // If that subtraction underflows, q̂·v > u, which we fix up 829 // by decrementing q̂ and adding v back, same as in long division. 830 831 // TODO(rsc): Instead of subtract and fix-up, this code is computing 832 // q̂·vₙ₋₂ and decrementing q̂ until that product is ≤ u. 833 // But we can do the subtraction directly, as in the comment above 834 // and in long division, because we know that q̂ is wrong by at most one. 835 qhatv := tmp.make(3 * n) 836 qhatv.clear() 837 qhatv = qhatv.mul(qhat, v[:s]) 838 for i := 0; i < 2; i++ { 839 e := qhatv.cmp(uu.norm()) 840 if e <= 0 { 841 break 842 } 843 subVW(qhat, qhat, 1) 844 c := subVV(qhatv[:s], qhatv[:s], v[:s]) 845 if len(qhatv) > s { 846 subVW(qhatv[s:], qhatv[s:], c) 847 } 848 addAt(uu[s:], v[s:], 0) 849 } 850 if qhatv.cmp(uu.norm()) > 0 { 851 panic("impossible") 852 } 853 c := subVV(uu[:len(qhatv)], uu[:len(qhatv)], qhatv) 854 if c > 0 { 855 subVW(uu[len(qhatv):], uu[len(qhatv):], c) 856 } 857 addAt(z, qhat, j-B) 858 j -= B 859 } 860 861 // TODO(rsc): Rewrite loop as described above and delete all this code. 862 863 // Now u < (v<<B), compute lower bits in the same way. 864 // Choose shift = B-1 again. 865 s := B - 1 866 qhat := *temps[depth] 867 qhat.clear() 868 qhat.divRecursiveStep(u[s:].norm(), v[s:], depth+1, tmp, temps) 869 qhat = qhat.norm() 870 qhatv := tmp.make(3 * n) 871 qhatv.clear() 872 qhatv = qhatv.mul(qhat, v[:s]) 873 // Set the correct remainder as before. 874 for i := 0; i < 2; i++ { 875 if e := qhatv.cmp(u.norm()); e > 0 { 876 subVW(qhat, qhat, 1) 877 c := subVV(qhatv[:s], qhatv[:s], v[:s]) 878 if len(qhatv) > s { 879 subVW(qhatv[s:], qhatv[s:], c) 880 } 881 addAt(u[s:], v[s:], 0) 882 } 883 } 884 if qhatv.cmp(u.norm()) > 0 { 885 panic("impossible") 886 } 887 c := subVV(u[0:len(qhatv)], u[0:len(qhatv)], qhatv) 888 if c > 0 { 889 c = subVW(u[len(qhatv):], u[len(qhatv):], c) 890 } 891 if c > 0 { 892 panic("impossible") 893 } 894 895 // Done! 896 addAt(z, qhat.norm(), 0) 897 } 898