1 // Copyright 2019 The Go Authors. All rights reserved. 2 // Use of this source code is governed by a BSD-style 3 // license that can be found in the LICENSE file. 4 5 /* 6 Package ppc64 implements a PPC64 assembler that assembles Go asm into 7 the corresponding PPC64 instructions as defined by the Power ISA 3.0B. 8 9 This document provides information on how to write code in Go assembler 10 for PPC64, focusing on the differences between Go and PPC64 assembly language. 11 It assumes some knowledge of PPC64 assembler. The original implementation of 12 PPC64 in Go defined many opcodes that are different from PPC64 opcodes, but 13 updates to the Go assembly language used mnemonics that are mostly similar if not 14 identical to the PPC64 mneumonics, such as VMX and VSX instructions. Not all detail 15 is included here; refer to the Power ISA document if interested in more detail. 16 17 Starting with Go 1.15 the Go objdump supports the -gnu option, which provides a 18 side by side view of the Go assembler and the PPC64 assembler output. This is 19 extremely helpful in determining what final PPC64 assembly is generated from the 20 corresponding Go assembly. 21 22 In the examples below, the Go assembly is on the left, PPC64 assembly on the right. 23 24 1. Operand ordering 25 26 In Go asm, the last operand (right) is the target operand, but with PPC64 asm, 27 the first operand (left) is the target. The order of the remaining operands is 28 not consistent: in general opcodes with 3 operands that perform math or logical 29 operations have their operands in reverse order. Opcodes for vector instructions 30 and those with more than 3 operands usually have operands in the same order except 31 for the target operand, which is first in PPC64 asm and last in Go asm. 32 33 Example: 34 ADD R3, R4, R5 <=> add r5, r4, r3 35 36 2. Constant operands 37 38 In Go asm, an operand that starts with '$' indicates a constant value. If the 39 instruction using the constant has an immediate version of the opcode, then an 40 immediate value is used with the opcode if possible. 41 42 Example: 43 ADD $1, R3, R4 <=> addi r4, r3, 1 44 45 3. Opcodes setting condition codes 46 47 In PPC64 asm, some instructions other than compares have variations that can set 48 the condition code where meaningful. This is indicated by adding '.' to the end 49 of the PPC64 instruction. In Go asm, these instructions have 'CC' at the end of 50 the opcode. The possible settings of the condition code depend on the instruction. 51 CR0 is the default for fixed-point instructions; CR1 for floating point; CR6 for 52 vector instructions. 53 54 Example: 55 ANDCC R3, R4, R5 <=> and. r5, r3, r4 (set CR0) 56 57 4. Loads and stores from memory 58 59 In Go asm, opcodes starting with 'MOV' indicate a load or store. When the target 60 is a memory reference, then it is a store; when the target is a register and the 61 source is a memory reference, then it is a load. 62 63 MOV{B,H,W,D} variations identify the size as byte, halfword, word, doubleword. 64 65 Adding 'Z' to the opcode for a load indicates zero extend; if omitted it is sign extend. 66 Adding 'U' to a load or store indicates an update of the base register with the offset. 67 Adding 'BR' to an opcode indicates byte-reversed load or store, or the order opposite 68 of the expected endian order. If 'BR' is used then zero extend is assumed. 69 70 Memory references n(Ra) indicate the address in Ra + n. When used with an update form 71 of an opcode, the value in Ra is incremented by n. 72 73 Memory references (Ra+Rb) or (Ra)(Rb) indicate the address Ra + Rb, used by indexed 74 loads or stores. Both forms are accepted. When used with an update then the base register 75 is updated by the value in the index register. 76 77 Examples: 78 MOVD (R3), R4 <=> ld r4,0(r3) 79 MOVW (R3), R4 <=> lwa r4,0(r3) 80 MOVWZU 4(R3), R4 <=> lwzu r4,4(r3) 81 MOVWZ (R3+R5), R4 <=> lwzx r4,r3,r5 82 MOVHZ (R3), R4 <=> lhz r4,0(r3) 83 MOVHU 2(R3), R4 <=> lhau r4,2(r3) 84 MOVBZ (R3), R4 <=> lbz r4,0(r3) 85 86 MOVD R4,(R3) <=> std r4,0(r3) 87 MOVW R4,(R3) <=> stw r4,0(r3) 88 MOVW R4,(R3+R5) <=> stwx r4,r3,r5 89 MOVWU R4,4(R3) <=> stwu r4,4(r3) 90 MOVH R4,2(R3) <=> sth r4,2(r3) 91 MOVBU R4,(R3)(R5) <=> stbux r4,r3,r5 92 93 4. Compares 94 95 When an instruction does a compare or other operation that might 96 result in a condition code, then the resulting condition is set 97 in a field of the condition register. The condition register consists 98 of 8 4-bit fields named CR0 - CR7. When a compare instruction 99 identifies a CR then the resulting condition is set in that field 100 to be read by a later branch or isel instruction. Within these fields, 101 bits are set to indicate less than, greater than, or equal conditions. 102 103 Once an instruction sets a condition, then a subsequent branch, isel or 104 other instruction can read the condition field and operate based on the 105 bit settings. 106 107 Examples: 108 CMP R3, R4 <=> cmp r3, r4 (CR0 assumed) 109 CMP R3, R4, CR1 <=> cmp cr1, r3, r4 110 111 Note that the condition register is the target operand of compare opcodes, so 112 the remaining operands are in the same order for Go asm and PPC64 asm. 113 When CR0 is used then it is implicit and does not need to be specified. 114 115 5. Branches 116 117 Many branches are represented as a form of the BC instruction. There are 118 other extended opcodes to make it easier to see what type of branch is being 119 used. 120 121 The following is a brief description of the BC instruction and its commonly 122 used operands. 123 124 BC op1, op2, op3 125 126 op1: type of branch 127 16 -> bctr (branch on ctr) 128 12 -> bcr (branch if cr bit is set) 129 8 -> bcr+bctr (branch on ctr and cr values) 130 4 -> bcr != 0 (branch if specified cr bit is not set) 131 132 There are more combinations but these are the most common. 133 134 op2: condition register field and condition bit 135 136 This contains an immediate value indicating which condition field 137 to read and what bits to test. Each field is 4 bits long with CR0 138 at bit 0, CR1 at bit 4, etc. The value is computed as 4*CR+condition 139 with these condition values: 140 141 0 -> LT 142 1 -> GT 143 2 -> EQ 144 3 -> OVG 145 146 Thus 0 means test CR0 for LT, 5 means CR1 for GT, 30 means CR7 for EQ. 147 148 op3: branch target 149 150 Examples: 151 152 BC 12, 0, target <=> blt cr0, target 153 BC 12, 2, target <=> beq cr0, target 154 BC 12, 5, target <=> bgt cr1, target 155 BC 12, 30, target <=> beq cr7, target 156 BC 4, 6, target <=> bne cr1, target 157 BC 4, 1, target <=> ble cr1, target 158 159 The following extended opcodes are available for ease of use and readability: 160 161 BNE CR2, target <=> bne cr2, target 162 BEQ CR4, target <=> beq cr4, target 163 BLT target <=> blt target (cr0 default) 164 BGE CR7, target <=> bge cr7, target 165 166 Refer to the ISA for more information on additional values for the BC instruction, 167 how to handle OVG information, and much more. 168 169 5. Align directive 170 171 Starting with Go 1.12, Go asm supports the PCALIGN directive, which indicates 172 that the next instruction should be aligned to the specified value. Currently 173 8 and 16 are the only supported values, and a maximum of 2 NOPs will be added 174 to align the code. That means in the case where the code is aligned to 4 but 175 PCALIGN $16 is at that location, the code will only be aligned to 8 to avoid 176 adding 3 NOPs. 177 178 The purpose of this directive is to improve performance for cases like loops 179 where better alignment (8 or 16 instead of 4) might be helpful. This directive 180 exists in PPC64 assembler and is frequently used by PPC64 assembler writers. 181 182 PCALIGN $16 183 PCALIGN $8 184 185 Functions in Go are aligned to 16 bytes, as is the case in all other compilers 186 for PPC64. 187 188 6. Shift instructions 189 190 The simple scalar shifts on PPC64 expect a shift count that fits in 5 bits for 191 32-bit values or 6 bit for 64-bit values. If the shift count is a constant value 192 greater than the max then the assembler sets it to the max for that size (31 for 193 32 bit values, 63 for 64 bit values). If the shift count is in a register, then 194 only the low 5 or 6 bits of the register will be used as the shift count. The 195 Go compiler will add appropriate code to compare the shift value to achieve the 196 the correct result, and the assembler does not add extra checking. 197 198 Examples: 199 200 SRAD $8,R3,R4 => sradi r4,r3,8 201 SRD $8,R3,R4 => rldicl r4,r3,56,8 202 SLD $8,R3,R4 => rldicr r4,r3,8,55 203 SRAW $16,R4,R5 => srawi r5,r4,16 204 SRW $40,R4,R5 => rlwinm r5,r4,0,0,31 205 SLW $12,R4,R5 => rlwinm r5,r4,12,0,19 206 207 Some non-simple shifts have operands in the Go assembly which don't map directly 208 onto operands in the PPC64 assembly. When an operand in a shift instruction in the 209 Go assembly is a bit mask, that mask is represented as a start and end bit in the 210 PPC64 assembly instead of a mask. See the ISA for more detail on these types of shifts. 211 Here are a few examples: 212 213 RLWMI $7,R3,$65535,R6 => rlwimi r6,r3,7,16,31 214 RLDMI $0,R4,$7,R6 => rldimi r6,r4,0,61 215 216 More recently, Go opcodes were added which map directly onto the PPC64 opcodes. It is 217 recommended to use the newer opcodes to avoid confusion. 218 219 RLDICL $0,R4,$15,R6 => rldicl r6,r4,0,15 220 RLDICR $0,R4,$15,R6 => rldicr r6.r4,0,15 221 222 Register naming 223 224 1. Special register usage in Go asm 225 226 The following registers should not be modified by user Go assembler code. 227 228 R0: Go code expects this register to contain the value 0. 229 R1: Stack pointer 230 R2: TOC pointer when compiled with -shared or -dynlink (a.k.a position independent code) 231 R13: TLS pointer 232 R30: g (goroutine) 233 234 Register names: 235 236 Rn is used for general purpose registers. (0-31) 237 Fn is used for floating point registers. (0-31) 238 Vn is used for vector registers. Slot 0 of Vn overlaps with Fn. (0-31) 239 VSn is used for vector-scalar registers. V0-V31 overlap with VS32-VS63. (0-63) 240 CTR represents the count register. 241 LR represents the link register. 242 243 */ 244 package ppc64 245