golang[102]-assembly: an assembly tutorial

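introduction

Go's assembly is based on Plan 9 assembly and acts as an intermediate, semi-abstract assembly language, which lets it hide some of the differences between the underlying architectures. Reading it mostly comes down to understanding how the various registers are used and how operands are addressed. With that knowledge we can explore how Go is implemented under the hood: how memory is allocated, how the stack grows, and how interfaces are converted.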

register

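The pseudo-registers:

  • FP: Frame pointer: arguments and locals. (points to the current stack frame)
  • PC: Program counter: jumps and branches. (points to the instruction address)
  • SB: Static base pointer: global symbols. (points to the global symbol table)
  • SP: Stack pointer: top of stack. (points to the top of the current stack)
  • Note: the stack grows downward. In Go assembly the caller maintains the arguments, the return values and the return address, so the address held in FP is no higher than the addresses of the arguments and return values.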

analysis for add

think about this simple program:

package main

//go:noinline
func add(a, b int32) (int32, bool) { return a + b, true }

func main() { add(10, 32) }

generate the assembly code for linux/amd64:

GOOS=linux GOARCH=amd64 go tool compile -S main.go

this is the annotated listing of the add function:

// 0x0000: Offset of the current instruction, relative to the start of the function.
// TEXT "".add: The TEXT directive declares the "".add symbol as part of the .text section (i.e. runnable code) and indicates that the instructions that follow are the body of the function.
// The empty string "" will be replaced by the name of the current package at link-time: i.e., "".add will become main.add once linked into our final binary.
// (SB): SB is the virtual register that holds the "static-base" pointer, i.e. the address of the beginning of the address-space of our program.
// "".add(SB) declares that our symbol is located at some constant offset (computed by the linker) from the start of our address-space.


// NOSPLIT: Indicates to the compiler that it should not insert the stack-split preamble, which checks whether the current stack needs to be grown.
// In the case of our add function, the compiler has set the flag by itself: it is smart enough to figure that, since add has no local variables and no stack-frame of its own, it simply cannot outgrow the current stack; thus it'd be a complete waste of CPU cycles to run these checks at each call site.


// $0-16: $0 denotes the size in bytes of the stack-frame that will be allocated; while $16 specifies the size of the arguments passed in by the caller.
// In the general case, the frame size is followed by an argument size, separated by a minus sign. (It's not a subtraction, just idiosyncratic syntax.)
// The frame size $24-8 states that the function has a 24-byte frame and is called with 8 bytes of argument, which live on the caller's frame.
// If NOSPLIT is not specified for the TEXT, the argument size must be provided. For assembly functions with Go prototypes, go vet will check that the argument size is correct.

0x0000 00000 (main.go:4) TEXT "".add(SB), NOSPLIT|ABIInternal, $0-16

// The FUNCDATA and PCDATA directives carry information for the Go garbage collector; the compiler inserts them.
0x0000 00000 (main.go:4) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (main.go:4) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0000 00000 (main.go:4) FUNCDATA $3, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)

//
0x0000 00000 (main.go:4) PCDATA $2, $0
0x0000 00000 (main.go:4) PCDATA $0, $0

// The Go calling convention mandates that every argument must be passed on the stack, using the pre-reserved space on the caller's stack-frame.
// It is the caller's responsibility to grow (and shrink back) the stack appropriately so that arguments can be passed to the callee, and potential return-values passed back to the caller.
// The Go compiler never generates instructions from the PUSH/POP family: the stack is grown or shrunk by respectively decrementing or incrementing the virtual hardware stack pointer SP.
// The SP pseudo-register is a virtual stack pointer used to refer to frame-local variables and the arguments being prepared for function calls.
// It points to the top of the local stack frame, so references should use negative offsets in the range [−framesize, 0): x-8(SP), y-4(SP), and so on.

// "".b+12(SP) and "".a+8(SP) respectively refer to the addresses 12 bytes and 8 bytes below the top of the stack (remember: it grows downwards!).
// .a and .b are arbitrary aliases given to the referred locations; although they have absolutely no semantic meaning whatsoever, they are mandatory when using relative addressing on virtual registers. The documentation about the virtual frame-pointer has something to say about this:

// The FP pseudo-register is a virtual frame pointer used to refer to function arguments. The compilers maintain a virtual frame pointer and refer to the arguments on the stack as offsets from that pseudo-register. Thus 0(FP) is the first argument to the function, 8(FP) is the second (on a 64-bit machine), and so on. However, when referring to a function argument this way, it is necessary to place a name at the beginning, as in first_arg+0(FP) and second_arg+8(FP). (The meaning of the offset —offset from the frame pointer— distinct from its use with SB, where it is an offset from the symbol.) The assembler enforces this convention, rejecting plain 0(FP) and 8(FP). The actual name is semantically irrelevant but should be used to document the argument's name.

// The first argument a is not located at 0(SP), but rather at 8(SP); that's because the caller stores its return-address in 0(SP) via the CALL pseudo-instruction.
// Arguments are passed in reverse-order; i.e. the first argument is the closest to the top of the stack.

0x0000 00000 (main.go:4) MOVL "".b+12(SP), AX
0x0004 00004 (main.go:4) MOVL "".a+8(SP), CX

// ADDL does the actual addition of the two Long-words (i.e. 4-byte values) stored in AX and CX, then stores the final result in AX.
0x0008 00008 (main.go:4) ADDL CX, AX
// That result is then moved over to "".~r2+16(SP), where the caller had previously reserved some stack space and expects to find its return values. Once again, "".~r2 has no semantic meaning here.
0x000a 00010 (main.go:4) MOVL AX, "".~r2+16(SP)
0x000e 00014 (main.go:4) MOVB $1, "".~r3+20(SP)
// A final RET pseudo-instruction tells the Go assembler to insert whatever instructions are required by the calling convention of the target platform in order to properly return from a subroutine call.
// Most likely this will cause the code to pop off the return-address stored at 0(SP) then jump back to it.
0x0013 00019 (main.go:4) RET

a more concise version:

;; Declare global function symbol "".add (actually main.add once linked)
;; Do not insert stack-split preamble
;; 0 bytes of stack-frame, 16 bytes of arguments passed in
;; func add(a, b int32) (int32, bool)
0x0000 TEXT "".add(SB), NOSPLIT, $0-16
;; ...omitted FUNCDATA stuff...
0x0000 MOVL "".b+12(SP), AX ;; move second Long-word (4B) argument from caller's stack-frame into AX
0x0004 MOVL "".a+8(SP), CX ;; move first Long-word (4B) argument from caller's stack-frame into CX
0x0008 ADDL CX, AX ;; compute AX=CX+AX
0x000a MOVL AX, "".~r2+16(SP) ;; move addition result (AX) into caller's stack-frame
0x000e MOVB $1, "".~r3+20(SP) ;; move `true` boolean (constant) into caller's stack-frame
0x0013 RET ;; jump to return address stored at 0(SP)

   |    +-------------------------+ <-- 32(SP)
   |    |                         |
 G |    |                         |
 R |    |                         |
 O |    | main.main's saved       |
 W |    |     frame-pointer (BP)  |
 S |    |-------------------------| <-- 24(SP)
   |    |      [alignment]        |
 D |    | "".~r3 (bool) = 1/true  | <-- 21(SP)
 O |    |-------------------------| <-- 20(SP)
 W |    |                         |
 N |    | "".~r2 (int32) = 42     |
 W |    |-------------------------| <-- 16(SP)
 A |    |                         |
 R |    | "".b (int32) = 32       |
 D |    |-------------------------| <-- 12(SP)
 S |    |                         |
   |    | "".a (int32) = 10       |
   |    |-------------------------| <-- 8(SP)
   |    |                         |
   |    |                         |
   |    |                         |
 \ | /  | return address to       |
  \|/   |     main.main + 0x30    |
   -    +-------------------------+ <-- 0(SP) (TOP OF STACK)

(diagram made with https://textik.com)
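
The offsets in the diagram can be cross-checked from Go itself. What follows is a minimal sketch, assuming a 64-bit target and the stack-based calling convention shown in this listing (recent Go releases pass arguments in registers on amd64, so it models the listing rather than whatever your toolchain emits). The names addArgs, retAddrSize and spOffsets are hypothetical, introduced only for illustration. For this particular signature the argument and result slots happen to line up like the fields of a plain struct.

```go
package main

import (
	"fmt"
	"unsafe"
)

// addArgs is a hypothetical mirror of the area that main reserves
// for the arguments and results of add(a, b int32) (int32, bool).
type addArgs struct {
	a  int32 // "".a
	b  int32 // "".b
	r2 int32 // "".~r2
	r3 bool  // "".~r3
}

// retAddrSize is the size of the return address pushed by CALL
// (assumption of this sketch: a 64-bit target).
const retAddrSize = 8

// spOffsets returns the offset of each slot as seen from inside add,
// i.e. relative to SP right after the CALL, plus the total size of
// the argument area (the "16" in $0-16).
func spOffsets() (a, b, r2, r3, argSize uintptr) {
	var f addArgs
	a = retAddrSize + unsafe.Offsetof(f.a)
	b = retAddrSize + unsafe.Offsetof(f.b)
	r2 = retAddrSize + unsafe.Offsetof(f.r2)
	r3 = retAddrSize + unsafe.Offsetof(f.r3)
	argSize = unsafe.Sizeof(f)
	return
}

func main() {
	a, b, r2, r3, argSize := spOffsets()
	fmt.Printf("a+%d(SP) b+%d(SP) ~r2+%d(SP) ~r3+%d(SP), argument size %d\n",
		a, b, r2, r3, argSize)
}
```

The trailing padding after the bool is what rounds the argument area up to 16 bytes, which corresponds to the [alignment] slot in the diagram.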

analysis for main


// "".main (main.main once linked) is a global function symbol in the .text section, whose address is some constant offset from the beginning of our address-space.
// It allocates a 24 bytes stack-frame and doesn't receive any argument nor does it return any value

// As we mentioned above, the Go calling convention mandates that every argument must be passed on the stack.

// Our caller, main, grows its stack-frame by 24 bytes (remember that the stack grows downwards, so SUBQ here actually makes the stack-frame bigger) by decrementing the virtual stack-pointer. Of those 24 bytes:

// 8 bytes (16(SP)-24(SP)) are used to store the current value of the frame-pointer BP (the real one!) to allow for stack-unwinding and facilitate debugging
// 1+3 bytes (12(SP)-16(SP)) are reserved for the second return value (bool) plus 3 bytes of necessary alignment on amd64
// 4 bytes (8(SP)-12(SP)) are reserved for the first return value (int32)
// 4 bytes (4(SP)-8(SP)) are reserved for the value of argument b (int32)
// 4 bytes (0(SP)-4(SP)) are reserved for the value of argument a (int32).
0x0000 00000 (main.go:6) TEXT "".main(SB), ABIInternal, $24-0
// ;; stack-split prologue...
// The prologue checks whether the goroutine is running out of space and, if it's the case, jumps to the epilogue.
// TLS is a virtual register maintained by the runtime that holds a pointer to the current g, i.e. the data-structure that keeps track of all the state of a goroutine.
// Looking at the definition of g from the source code of the runtime:
// type g struct {
// stack stack // 16 bytes
// stackguard0 is the stack pointer compared in the Go stack growth prologue.
// It is stack.lo+StackGuard normally, but can be StackPreempt to trigger a preemption.
// stackguard0 uintptr
// stackguard1 uintptr

// ...omitted dozens of fields...
// }
// We can see that 16(CX) corresponds to g.stackguard0, which is the threshold value maintained by the runtime that, when compared to the stack-pointer, indicates whether or not a goroutine is about to run out of space.
// The prologue thus checks whether the current SP value is less than or equal to the stackguard0 threshold (that is, whether the stack has grown down to or past the guard), and jumps to the epilogue if that is the case.

0x0000 00000 (main.go:6) MOVQ (TLS), CX ;; store current *g in CX
0x0009 00009 (main.go:6) CMPQ SP, 16(CX) ;; compare SP and g.stackguard0
0x000d 00013 (main.go:6) JLS 58 ;; jumps to 0x3a if SP <= g.stackguard0
// Subtract 24 bytes from SP, which grows the stack frame.
0x000f 00015 (main.go:6) SUBQ $24, SP
// Save the old BP and set up the new one. This BP is the real hardware register.
0x0013 00019 (main.go:6) MOVQ BP, 16(SP)
0x0018 00024 (main.go:6) LEAQ 16(SP), BP
// ;; ... PCDATA stuff...
0x001d 00029 (main.go:6) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x001d 00029 (main.go:6) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x001d 00029 (main.go:6) FUNCDATA $3, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x001d 00029 (main.go:6) PCDATA $2, $0
0x001d 00029 (main.go:6) PCDATA $0, $0
//Finally, following the growth of the stack, LEAQ computes the new address of the frame-pointer and stores it in BP.
// The caller pushes the arguments for the callee as a Quad word (i.e. an 8-byte value) at the top of the stack that it has just grown.
// Although it might look like random garbage at first, 137438953482 actually corresponds to the 10 and 32 4-byte values concatenated into one 8-byte value:

// $ echo 'obase=2;137438953482' | bc
// 10000000000000000000000000000000001010
// \____/\______________________________/
//   32                  10

0x001d 00029 (main.go:6) MOVQ $137438953482, AX
0x0027 00039 (main.go:6) MOVQ AX, (SP)
0x002b 00043 (main.go:6) CALL "".add(SB)
// Restore the BP register and shrink the stack frame.
0x0030 00048 (main.go:6) MOVQ 16(SP), BP
0x0035 00053 (main.go:6) ADDQ $24, SP
0x0039 00057 (main.go:6) RET
;; ... stack-split epilogue...
// The epilogue, on the other hand, triggers the stack-growth machinery and then jumps back to the prologue.
// This creates a feedback loop that goes on for as long as a large enough stack hasn't been allocated for our starved goroutine.

// The body of the epilogue is pretty straightforward: it calls into the runtime, which will do the actual work of growing the stack, then jumps back to the first instruction of the function (i.e. to the prologue).

// The NOP instruction just before the CALL exists so that the prologue doesn't jump directly onto a CALL instruction. On some platforms, doing so can lead to very dark places; it's a common practice to set up a no-op instruction right before the actual call and land on this NOP instead.
0x003a 00058 (main.go:6) NOP
0x003a 00058 (main.go:6) PCDATA $0, $-1
0x003a 00058 (main.go:6) PCDATA $2, $-1
0x003a 00058 (main.go:6) CALL runtime.morestack_noctxt(SB)
0x003f 00063 (main.go:6) JMP 0
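
The packed constant 137438953482 from the listing above can be reproduced in a few lines of Go. This is a small sketch; packArgs and unpackArgs are hypothetical helper names, not anything the compiler or runtime exposes. It assumes a little-endian target such as amd64, where the low 4 bytes of the quad word land at the lower address, so a single MOVQ to (SP) places a at 0(SP) and b at 4(SP).

```go
package main

import "fmt"

// packArgs packs two int32 arguments into the single 8-byte value
// that the compiler stores with one MOVQ: b in the high 32 bits,
// a in the low 32 bits.
func packArgs(a, b int32) uint64 {
	return uint64(uint32(b))<<32 | uint64(uint32(a))
}

// unpackArgs reverses packArgs.
func unpackArgs(v uint64) (a, b int32) {
	return int32(uint32(v)), int32(uint32(v >> 32))
}

func main() {
	v := packArgs(10, 32)
	fmt.Println(v) // 137438953482
	fmt.Printf("%b\n", v)
	a, b := unpackArgs(v)
	fmt.Println(a, b) // 10 32
}
```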

receiver method

package main

//go:noinline
func Add(a, b int32) int32 { return a + b }

type Adder struct{ id int32 }

//go:noinline
func (adder *Adder) AddPtr(a, b int32) int32 { return a + b }

//go:noinline
func (adder Adder) AddVal(a, b int32) int32 { return a + b }

func main() {
	Add(10, 32) // direct call of top-level function

	adder := Adder{id: 6754}
	adder.AddPtr(10, 32) // direct call of method with pointer receiver
	adder.AddVal(10, 32) // direct call of method with value receiver

	(&adder).AddVal(10, 32) // implicit dereferencing
}

0x0000 00000 (main.go:14) TEXT "".main(SB), ABIInternal, $40-0
0x0000 00000 (main.go:14) MOVQ (TLS), CX
0x0009 00009 (main.go:14) CMPQ SP, 16(CX)
0x000d 00013 (main.go:14) JLS 161
// Subtract 40 bytes from SP, which grows the stack frame.
0x0013 00019 (main.go:14) SUBQ $40, SP
0x0017 00023 (main.go:14) MOVQ BP, 32(SP)
0x001c 00028 (main.go:14) LEAQ 32(SP), BP
0x0021 00033 (main.go:14) FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0021 00033 (main.go:14) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
0x0021 00033 (main.go:14) FUNCDATA $3, gclocals·9fb7f0986f647f17cb53dda1484e0f7a(SB)
0x0021 00033 (main.go:15) PCDATA $2, $0
0x0021 00033 (main.go:15) PCDATA $0, $0
// move (32,10) to the top of the stack (arguments #2 & #1)
0x0021 00033 (main.go:15) MOVQ $137438953482, AX
0x002b 00043 (main.go:15) MOVQ AX, (SP)
0x002f 00047 (main.go:15) CALL "".Add(SB)

// First things first, the receiver is initialized via adder := Adder{id: 6754}:
0x0034 00052 (main.go:17) MOVL $0, "".adder+28(SP)
0x003c 00060 (main.go:17) MOVL $6754, "".adder+28(SP)
0x0044 00068 (main.go:18) PCDATA $2, $1
// load the address of the receiver (8 bytes) into AX
0x0044 00068 (main.go:18) LEAQ "".adder+28(SP), AX
0x0049 00073 (main.go:18) PCDATA $2, $0
0x0049 00073 (main.go:18) MOVQ AX, (SP)
0x004d 00077 (main.go:18) MOVQ $137438953482, AX
0x0057 00087 (main.go:18) MOVQ AX, 8(SP)
0x005c 00092 (main.go:18) CALL "".(*Adder).AddPtr(SB)

// copy the value of the receiver (4 bytes) into AX
0x0061 00097 (main.go:19) MOVL "".adder+28(SP), AX
0x0065 00101 (main.go:19) MOVL AX, (SP)
0x0068 00104 (main.go:19) MOVQ $137438953482, AX
0x0072 00114 (main.go:19) MOVQ AX, 4(SP)
0x0077 00119 (main.go:19) CALL "".Adder.AddVal(SB)

// Somehow, Go automagically dereferences our pointer and manages to make the call. How so?

// How the compiler handles this kind of situation depends on whether or not the receiver being pointed to has escaped to the heap or not.

// Case A: The receiver is on the stack

// If the receiver is still on the stack and its size is sufficiently small that it can be copied in a few instructions, as is the case here, the compiler simply copies its value over to the top of the stack then does a straightforward method call to "".Adder.AddVal (i.e. the one with a value receiver).

// (&adder).AddVal(10, 32) thus looks like this in this situation:

// 0x0074 MOVL "".adder+28(SP), AX ;; move (i.e. copy) adder (note the MOV instead of a LEA) to..
// 0x0078 MOVL AX, (SP) ;; ..the top of the stack (argument #1)
// 0x007b MOVQ $137438953482, AX ;; move (32,10) to..
// 0x0085 MOVQ AX, 4(SP) ;; ..the top of the stack (arguments #3 & #2)
// 0x008a CALL "".Adder.AddVal(SB)
// Boring (although efficient). Let's move on to case B.

// Case B: The receiver is on the heap

// If the receiver has escaped to the heap then the compiler has to take a cleverer route: it generates a new method (with a pointer receiver, this time) that wraps "".Adder.AddVal, and replaces the original call to "".Adder.AddVal (the wrappee) with a call to "".(*Adder).AddVal (the wrapper).
// The wrapper's sole mission, then, is to make sure that the receiver gets properly dereferenced before being passed to the wrappee, and that any arguments and return values involved are properly copied back and forth between the caller and the wrappee.

// (NOTE: In assembly outputs, these wrapper methods are marked as <autogenerated>.)


0x007c 00124 (main.go:21) MOVL "".adder+28(SP), AX
0x0080 00128 (main.go:21) MOVL AX, (SP)
0x0083 00131 (main.go:21) MOVQ $137438953482, AX
0x008d 00141 (main.go:21) MOVQ AX, 4(SP)
0x0092 00146 (main.go:21) CALL "".Adder.AddVal(SB)
0x0097 00151 (main.go:22) MOVQ 32(SP), BP
0x009c 00156 (main.go:22) ADDQ $40, SP
0x00a0 00160 (main.go:22) RET
0x00a1 00161 (main.go:22) NOP
0x00a1 00161 (main.go:14) PCDATA $0, $-1
0x00a1 00161 (main.go:14) PCDATA $2, $-1
0x00a1 00161 (main.go:14) CALL runtime.morestack_noctxt(SB)
0x00a6 00166 (main.go:14) JMP 0
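
The difference between the two call sequences in the listing above (arguments stored at 8(SP) for AddPtr, at 4(SP) for AddVal) comes down to the size of the receiver. Here is a rough sketch, assuming a 64-bit target and the stack-based convention used in this listing. The mirror structs ptrCallArgs and valCallArgs and the function argOffsets are hypothetical and only model the outgoing argument slots, not the return value.

```go
package main

import (
	"fmt"
	"unsafe"
)

type Adder struct{ id int32 }

// ptrCallArgs mirrors the outgoing arguments of adder.AddPtr(a, b):
// the receiver is an 8-byte pointer.
type ptrCallArgs struct {
	recv *Adder
	a, b int32
}

// valCallArgs mirrors the outgoing arguments of adder.AddVal(a, b):
// the receiver is the 4-byte struct itself, copied by value.
type valCallArgs struct {
	recv Adder
	a, b int32
}

// argOffsets returns the offsets of a and b relative to the top of
// the caller's stack for both kinds of receiver.
func argOffsets() (ptrA, ptrB, valA, valB uintptr) {
	var p ptrCallArgs
	var v valCallArgs
	return unsafe.Offsetof(p.a), unsafe.Offsetof(p.b),
		unsafe.Offsetof(v.a), unsafe.Offsetof(v.b)
}

func main() {
	ptrA, ptrB, valA, valB := argOffsets()
	fmt.Printf("AddPtr: a at %d(SP), b at %d(SP)\n", ptrA, ptrB)
	fmt.Printf("AddVal: a at %d(SP), b at %d(SP)\n", valA, valB)
}
```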

Here’s an annotated listing of the generated wrapper that should hopefully clear things up a bit:

0x0000 TEXT	"".(*Adder).AddVal(SB), DUPOK|WRAPPER, $32-24
;; ...omitted preambles...

0x0026 MOVQ ""..this+40(SP), AX ;; check whether the receiver..
0x002b TESTQ AX, AX ;; ..is nil
0x002e JEQ 92 ;; if it is, jump to 0x005c (panic)

0x0030 MOVL (AX), AX ;; dereference pointer receiver..
0x0032 MOVL AX, (SP) ;; ..and move (i.e. copy) the resulting value to argument #1

;; forward (copy) arguments #2 & #3 then call the wrappee
0x0035 MOVL "".a+48(SP), AX
0x0039 MOVL AX, 4(SP)
0x003d MOVL "".b+52(SP), AX
0x0041 MOVL AX, 8(SP)
0x0045 CALL "".Adder.AddVal(SB) ;; call the wrapped method

;; copy return value from wrapped method then return
0x004a MOVL 16(SP), AX
0x004e MOVL AX, "".~r2+56(SP)
;; ...omitted frame-pointer stuff...
0x005b RET

;; throw a panic with a detailed error
0x005c CALL runtime.panicwrap(SB)

;; ...omitted epilogues...
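
In Go terms, the wrapper listed above behaves roughly like the hand-written function below. This is only a sketch of the control flow: addValViaPointer and callWithNil are hypothetical names, and the real wrapper is generated by the compiler and panics through runtime.panicwrap rather than through a plain panic call.

```go
package main

import "fmt"

type Adder struct{ id int32 }

//go:noinline
func (adder Adder) AddVal(a, b int32) int32 { return a + b }

// addValViaPointer is a hand-written stand-in for the autogenerated
// wrapper: check the receiver for nil, dereference it, then forward
// the arguments to the value-receiver method.
func addValViaPointer(adder *Adder, a, b int32) int32 {
	if adder == nil {
		panic("value method main.Adder.AddVal called using nil *Adder pointer")
	}
	return (*adder).AddVal(a, b)
}

// callWithNil calls the stand-in with a nil receiver and returns the
// recovered panic message.
func callWithNil() (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	addValViaPointer(nil, 10, 32)
	return ""
}

func main() {
	fmt.Println(addValViaPointer(&Adder{id: 6754}, 10, 32)) // 42
	fmt.Println(callWithNil())
}
```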

Obviously, this kind of wrapper can induce quite a bit of overhead considering all the copying that needs to be done in order to pass the arguments back and forth; especially if the wrappee is just a few instructions.
Fortunately, in practice, the compiler would have inlined the wrappee directly into the wrapper to amortize these costs (when feasible, at least).

Note the WRAPPER directive in the definition of the symbol, which indicates that this method shouldn’t appear in backtraces (so as not to confuse the end-user), nor should it be able to recover from panics that might be thrown by the wrappee.

WRAPPER: This is a wrapper function and should not count as disabling recover.

The runtime.panicwrap function, which throws a panic if the wrapper’s receiver is nil, is pretty self-explanatory; here’s its complete listing for reference (src/runtime/error.go):

// panicwrap generates a panic for a call to a wrapped value method
// with a nil pointer receiver.
//
// It is called from the generated wrapper code.
func panicwrap() {
	pc := getcallerpc()
	name := funcname(findfunc(pc))
	// name is something like "main.(*T).F".
	// We want to extract pkg ("main"), typ ("T"), and meth ("F").
	// Do it by finding the parens.
	i := stringsIndexByte(name, '(')
	if i < 0 {
		throw("panicwrap: no ( in " + name)
	}
	pkg := name[:i-1]
	if i+2 >= len(name) || name[i-1:i+2] != ".(*" {
		throw("panicwrap: unexpected string after package name: " + name)
	}
	name = name[i+2:]
	i = stringsIndexByte(name, ')')
	if i < 0 {
		throw("panicwrap: no ) in " + name)
	}
	if i+2 >= len(name) || name[i:i+2] != ")." {
		throw("panicwrap: unexpected string after type name: " + name)
	}
	typ := name[:i]
	meth := name[i+2:]
	panic(plainError("value method " + pkg + "." + typ + "." + meth + " called using nil *" + typ + " pointer"))
}
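
The string slicing in panicwrap is easier to follow on a concrete name. The sketch below replays the same parsing steps outside the runtime; splitWrapperName is a hypothetical helper, it uses strings.IndexByte from the standard library in place of the runtime-internal stringsIndexByte, and it returns errors where the runtime would throw.

```go
package main

import (
	"fmt"
	"strings"
)

// splitWrapperName splits a name such as "main.(*T).F" into its
// package, type and method parts, following the steps of panicwrap.
func splitWrapperName(name string) (pkg, typ, meth string, err error) {
	i := strings.IndexByte(name, '(')
	if i < 1 {
		return "", "", "", fmt.Errorf("no ( in %q", name)
	}
	pkg = name[:i-1]
	if i+2 >= len(name) || name[i-1:i+2] != ".(*" {
		return "", "", "", fmt.Errorf("unexpected string after package name: %q", name)
	}
	rest := name[i+2:]
	j := strings.IndexByte(rest, ')')
	if j < 0 {
		return "", "", "", fmt.Errorf("no ) in %q", name)
	}
	if j+2 >= len(rest) || rest[j:j+2] != ")." {
		return "", "", "", fmt.Errorf("unexpected string after type name: %q", name)
	}
	return pkg, rest[:j], rest[j+2:], nil
}

func main() {
	pkg, typ, meth, err := splitWrapperName("main.(*Adder).AddVal")
	fmt.Println(pkg, typ, meth, err)
}
```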

Anatomy of an interface

Overview of the datastructures

Before we can understand how they work, we first need to build a mental model of the datastructures that make up interfaces and how they’re laid out in memory.
To that end, we’ll have a quick peek into the runtime package to see what an interface actually looks like from the standpoint of the Go implementation.

The iface structure

iface is the root type that represents an interface within the runtime (src/runtime/runtime2.go).
Its definition goes like this:

type iface struct { // 16 bytes on a 64bit arch
	tab  *itab
	data unsafe.Pointer
}

An interface is thus a very simple structure that maintains 2 pointers:

  • tab holds the address of an itab object, which embeds the datastructures that describe both the type of the interface as well as the type of the data it points to.
  • data is a raw (i.e. unsafe) pointer to the value held by the interface.

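Because data is a pointer, any concrete value that we wrap into an interface must have its address taken. More often than not, this results in a heap allocation, as the compiler takes the conservative route and forces the receiver to escape.
This holds true even for scalar types!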

We can prove that with a few lines of code (escape.go):

type Addifier interface{ Add(a, b int32) int32 }

type Adder struct{ id int32 }

//go:noinline
func (adder Adder) Add(a, b int32) int32 { return a + b }

func main() {
	adder := Adder{id: 6754}
	adder.Add(10, 32)           // doesn't escape
	Addifier(adder).Add(10, 32) // escapes
}

One could even visualize the resulting heap allocation using a simple benchmark (escape_test.go):

func BenchmarkDirect(b *testing.B) {
	adder := Adder{id: 6754}
	for i := 0; i < b.N; i++ {
		adder.Add(10, 32)
	}
}

func BenchmarkInterface(b *testing.B) {
	adder := Adder{id: 6754}
	for i := 0; i < b.N; i++ {
		Addifier(adder).Add(10, 32)
	}
}

$ GOOS=linux GOARCH=amd64 go test -bench=. -benchmem ./escape_test.go
BenchmarkDirect-8       2000000000    1.60 ns/op    0 B/op    0 allocs/op
BenchmarkInterface-8    100000000     15.0 ns/op    4 B/op    1 allocs/op

We can clearly see how each time we create a new Addifier interface and initialize it with our adder variable, a heap allocation of sizeof(Adder) actually takes place. Later in this chapter, we’ll see how even simple scalar types can lead to heap allocations when used with interfaces.

Let’s turn our attention towards the next datastructure: itab.

The itab structure

itab is defined thusly (src/runtime/runtime2.go):

type itab struct { // 40 bytes on a 64bit arch
	inter *interfacetype
	_type *_type
	hash  uint32 // copy of _type.hash. Used for type switches.
	_     [4]byte
	fun   [1]uintptr // variable sized. fun[0]==0 means _type does not implement inter.
}

An itab is the heart & brain of an interface.

First, it embeds a _type, which is the internal representation of any Go type within the runtime.
A _type describes every facet of a type: its name, its characteristics (e.g. size, alignment…), and to some extent, even how it behaves (e.g. comparison, hashing…)!
In this instance, the _type field describes the type of the value held by the interface, i.e. the value that the data pointer points to.

Second, we find a pointer to an interfacetype, which is merely a wrapper around _type with some extra information that is specific to interfaces.
As you’d expect, the inter field describes the type of the interface itself.

Finally, the fun array holds the function pointers that make up the virtual/dispatch table of the interface.
Notice the comment that says // variable sized, meaning that the size with which this array is declared is irrelevant.
We’ll see later in this chapter that the compiler is responsible for allocating the memory that backs this array, and does so independently of the size indicated here. Likewise, the runtime always accesses this array using raw pointers, thus bounds-checking does not apply here.

The _type structure

As we said above, the _type structure gives a complete description of a Go type.
It’s defined as such (src/runtime/type.go):

type _type struct { // 48 bytes on a 64bit arch
	size       uintptr
	ptrdata    uintptr // size of memory prefix holding all pointers
	hash       uint32
	tflag      tflag
	align      uint8
	fieldalign uint8
	kind       uint8
	alg        *typeAlg
	// gcdata stores the GC type data for the garbage collector.
	// If the KindGCProg bit is set in kind, gcdata is a GC program.
	// Otherwise it is a ptrmask bitmap. See mbitmap.go for details.
	gcdata    *byte
	str       nameOff
	ptrToThis typeOff
}

Thankfully, most of these fields are quite self-explanatory.

The nameOff & typeOff types are int32 offsets into the metadata embedded into the final executable by the linker. This metadata is loaded into runtime.moduledata structures at run time (src/runtime/symtab.go), which should look fairly similar if you’ve ever had to look at the content of an ELF file.
The runtime provides helpers that implement the necessary logic for following these offsets through the moduledata structures, such as resolveNameOff (src/runtime/type.go) and resolveTypeOff (src/runtime/type.go):

func resolveNameOff(ptrInModule unsafe.Pointer, off nameOff) name {}
func resolveTypeOff(ptrInModule unsafe.Pointer, off typeOff) *_type {}

I.e., assuming t is a _type, calling resolveTypeOff(t, t.ptrToThis) returns a copy of t.

The interfacetype structure

Finally, here’s the interfacetype structure (src/runtime/type.go):

type interfacetype struct { // 80 bytes on a 64bit arch
	typ     _type
	pkgpath name
	mhdr    []imethod
}

type imethod struct {
	name nameOff
	ityp typeOff
}

As mentioned, an interfacetype is just a wrapper around a _type with some extra interface-specific metadata added on top.
In the current implementation, this metadata is mostly composed of a list of offsets that points to the respective names and types of the methods exposed by the interface ([]imethod).

Conclusion

Here’s an overview of what an iface looks like when represented with all of its sub-types inlined; this hopefully should help connect all the dots:

type iface struct { // `iface`
	tab *struct { // `itab`
		inter *struct { // `interfacetype`
			typ struct { // `_type`
				size       uintptr
				ptrdata    uintptr
				hash       uint32
				tflag      tflag
				align      uint8
				fieldalign uint8
				kind       uint8
				alg        *typeAlg
				gcdata     *byte
				str        nameOff
				ptrToThis  typeOff
			}
			pkgpath name
			mhdr    []struct { // `imethod`
				name nameOff
				ityp typeOff
			}
		}
		_type *struct { // `_type`
			size       uintptr
			ptrdata    uintptr
			hash       uint32
			tflag      tflag
			align      uint8
			fieldalign uint8
			kind       uint8
			alg        *typeAlg
			gcdata     *byte
			str        nameOff
			ptrToThis  typeOff
		}
		hash uint32
		_    [4]byte
		fun  [1]uintptr
	}
	data unsafe.Pointer
}

This section glossed over the different data-types that make up an interface to help us to start building a mental model of the various cogs involved in the overall machinery, and how they all work with each other.

Creating an interface

Now that we’ve had a quick look at all the datastructures involved, we’ll focus on how they actually get allocated and initialized.

Consider the following program (iface.go):

type Mather interface {
	Add(a, b int32) int32
	Sub(a, b int64) int64
}

type Adder struct{ id int32 }

//go:noinline
func (adder Adder) Add(a, b int32) int32 { return a + b }

//go:noinline
func (adder Adder) Sub(a, b int64) int64 { return a - b }

func main() {
	m := Mather(Adder{id: 6754})

	// This call just makes sure that the interface is actually used.
	// Without this call, the linker would see that the interface defined above
	// is in fact never used, and thus would optimize it out of the final
	// executable.
	m.Add(10, 32)
}

NOTE: For the remainder of this chapter, we will denote an interface I that holds a type T as <I,T>. E.g. Mather(Adder{id: 6754}) instantiates an iface<Mather, Adder>.

Let’s zoom in on the instantiation of iface<Mather, Adder>:

m := Mather(Adder{id: 6754})

This single line of Go code actually sets off quite a bit of machinery, as the assembly listing generated by the compiler can attest:

;; part 1: allocate the receiver
0x001d MOVL $6754, ""..autotmp_1+36(SP)
;; part 2: set up the itab
0x0025 LEAQ go.itab."".Adder,"".Mather(SB), AX
0x002c MOVQ AX, (SP)
;; part 3: set up the data
0x0030 LEAQ ""..autotmp_1+36(SP), AX
0x0035 MOVQ AX, 8(SP)
0x003a CALL runtime.convT2I32(SB)
0x003f MOVQ 16(SP), AX
0x0044 MOVQ 24(SP), CX

Part 1: Allocate the receiver

0x001d MOVL	$6754, ""..autotmp_1+36(SP)

A constant decimal value of 6754, corresponding to the ID of our Adder, is stored at the beginning of the current stack-frame.
It’s stored there so that the compiler will later be able to reference it by its address; we’ll see why in part 3.

Part 2: Set up the itab

0x0025 LEAQ	go.itab."".Adder,"".Mather(SB), AX
0x002c MOVQ AX, (SP)

It looks like the compiler has already created the necessary itab for representing our iface<Mather, Adder> interface, and made it available to us via a global symbol: go.itab."".Adder,"".Mather.

We’re in the process of building an iface<Mather, Adder> interface and, in order to do so, we’re loading the effective address of this global go.itab."".Adder,"".Mather symbol at the top of the current stack-frame.
Once again, we’ll see why in part 3.
Semantically, this gives us something along the lines of the following pseudo-code:

tab := getSymAddr(`go.itab.main.Adder,main.Mather`).(*itab)

That’s half of our interface right there!
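
Since the itab for a given <I,T> pair lives behind a single global symbol, two interface values built from the same pair should carry the same tab pointer. The sketch below checks this from user code. It relies on the unexported two-word layout of interface values, so treat it as an experiment rather than a guarantee; tabOf and the Other type are hypothetical names introduced for the comparison.

```go
package main

import (
	"fmt"
	"unsafe"
)

type Mather interface {
	Add(a, b int32) int32
	Sub(a, b int64) int64
}

type Adder struct{ id int32 }

func (adder Adder) Add(a, b int32) int32 { return a + b }
func (adder Adder) Sub(a, b int64) int64 { return a - b }

// Other is a second concrete type implementing Mather.
type Other struct{ id int32 }

func (o Other) Add(a, b int32) int32 { return a + b }
func (o Other) Sub(a, b int64) int64 { return a - b }

// tabOf returns the first word of the interface value, i.e. the
// pointer to the itab of the <Mather, T> pair it was built from.
func tabOf(m Mather) unsafe.Pointer {
	return *(*unsafe.Pointer)(unsafe.Pointer(&m))
}

func main() {
	m1 := Mather(Adder{id: 1})
	m2 := Mather(Adder{id: 2})
	m3 := Mather(Other{id: 1})
	fmt.Println(tabOf(m1) == tabOf(m2)) // same <Mather, Adder> itab
	fmt.Println(tabOf(m1) == tabOf(m3)) // different concrete type
}
```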

Now, while we’re at it, let’s have a deeper look at that go.itab."".Adder,"".Mather symbol.
As usual, the -S flag of the compiler can tell us a lot:

$ GOOS=linux GOARCH=amd64 go tool compile -S iface.go | grep -A 7 '^go.itab."".Adder,"".Mather'
go.itab."".Adder,"".Mather SRODATA dupok size=40
0x0000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x0010 8a 3d 5f 61 00 00 00 00 00 00 00 00 00 00 00 00 .=_a............
0x0020 00 00 00 00 00 00 00 00 ........
rel 0+8 t=1 type."".Mather+0
rel 8+8 t=1 type."".Adder+0
rel 24+8 t=1 "".(*Adder).Add+0
rel 32+8 t=1 "".(*Adder).Sub+0

Neat. Let’s analyze this piece by piece.

The first piece declares the symbol and its attributes:

go.itab."".Adder,"".Mather SRODATA dupok size=40
As usual, since we’re looking directly at the intermediate object file generated by the compiler (i.e. the linker hasn’t run yet), symbol names are still missing package names. Nothing new on that front.
Other than that, what we’ve got here is a 40-byte global object symbol that will be stored in the .rodata section of our binary.

Note the dupok directive, which tells the linker that it is legal for this symbol to appear multiple times at link-time: the linker will have to arbitrarily choose one of them over the others.

The second piece is a hexdump of the 40 bytes of data associated with the symbol. I.e., it’s a serialized representation of an itab structure:

0x0000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
0x0010 8a 3d 5f 61 00 00 00 00 00 00 00 00 00 00 00 00 .=_a............
0x0020 00 00 00 00 00 00 00 00 ........

As you can see, most of this data is just a bunch of zeros at this point. The linker will take care of filling them up, as we’ll see in a minute.

Notice how, among all these zeros, 4 bytes have actually been set, starting at offset 0x10 (written 0x10+4 in the same offset+size notation that the relocation entries use).
If we take a look back at the declaration of the itab structure and annotate the respective offsets of its fields:

type itab struct { // 40 bytes on a 64bit arch
	inter *interfacetype // offset 0x00 ($00)
	_type *_type         // offset 0x08 ($08)
	hash  uint32         // offset 0x10 ($16)
	_     [4]byte        // offset 0x14 ($20)
	fun   [1]uintptr     // offset 0x18 ($24)
	                     // offset 0x20 ($32)
}

We see that the 4 bytes at offset 0x10 (0x10+4) match the hash uint32 field: i.e., the hash value that corresponds to our main.Adder type is already right there in our object file.
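As a quick sanity check of the offsets and of the hash bytes, here is a minimal sketch. It assumes a 64-bit architecture, and `itabMirror`, `layout`, `decodeHash` and `objectDump` are hypothetical names made up for this illustration: the mirror struct copies the layout above with two `fun` slots (one each for Add and Sub), and is not the runtime's own type.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

// itabMirror is a hypothetical copy of the layout above, with two fun slots
// (Add and Sub). It is not the runtime's own type.
type itabMirror struct {
	inter unsafe.Pointer // *interfacetype in the runtime
	_type unsafe.Pointer // *_type in the runtime
	hash  uint32
	_     [4]byte
	fun   [2]uintptr
}

// layout reports the offsets of hash and fun, and the total size in bytes.
func layout() (hashOff, funOff, size uintptr) {
	var t itabMirror
	return unsafe.Offsetof(t.hash), unsafe.Offsetof(t.fun), unsafe.Sizeof(t)
}

// decodeHash reads the 4 bytes at offset 0x10 as a little-endian uint32.
func decodeHash(dump []byte) uint32 {
	return binary.LittleEndian.Uint32(dump[0x10:0x14])
}

// objectDump rebuilds the 40 bytes of the hexdump: zeros except the hash.
func objectDump() []byte {
	dump := make([]byte, 40)
	copy(dump[0x10:], []byte{0x8a, 0x3d, 0x5f, 0x61})
	return dump
}

func main() {
	hashOff, funOff, size := layout()
	fmt.Printf("hash at %#x, fun at %#x, size %d\n", hashOff, funOff, size)
	fmt.Printf("hash = %#x\n", decodeHash(objectDump()))
}
```

Since amd64 is little-endian, the bytes 8a 3d 5f 61 read back as the uint32 value 0x615f3d8a.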

The third and final piece lists a bunch of relocation directives for the linker:

rel 0+8 t=1 type."".Mather+0
rel 8+8 t=1 type."".Adder+0
rel 24+8 t=1 "".(*Adder).Add+0
rel 32+8 t=1 "".(*Adder).Sub+0

rel 0+8 t=1 type."".Mather+0 tells the linker to fill up the first 8 bytes (0+8) of the contents with the address of the global object symbol type."".Mather.
rel 8+8 t=1 type."".Adder+0 then fills the next 8 bytes with the address of type."".Adder, and so on and so forth.

Once the linker has done its job and followed all of these directives, our 40-byte serialized itab will be complete.
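To make the relocation step more concrete, here is a toy model of it. This is only a sketch and not how the Go linker is actually implemented: the `reloc` type, the `applyRelocs` and `linkItab` helpers, and the symbol addresses (0x1000, 0x2000, ...) are all made up for the illustration.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// reloc is a hypothetical, simplified model of one "rel off+size" directive.
type reloc struct {
	off, size int
	sym       string
}

// applyRelocs writes the resolved address of each symbol into data at the
// requested offset, in little-endian order.
func applyRelocs(data []byte, relocs []reloc, addrs map[string]uint64) error {
	for _, r := range relocs {
		addr, ok := addrs[r.sym]
		if !ok {
			return fmt.Errorf("unresolved symbol %q", r.sym)
		}
		if r.size != 8 || r.off < 0 || r.off+r.size > len(data) {
			return fmt.Errorf("bad relocation %d+%d", r.off, r.size)
		}
		binary.LittleEndian.PutUint64(data[r.off:r.off+r.size], addr)
	}
	return nil
}

// linkItab starts from the 40 bytes found in the object file (zeros plus the
// hash) and applies the four relocations with made-up addresses.
func linkItab() ([]byte, error) {
	data := make([]byte, 40)
	copy(data[0x10:], []byte{0x8a, 0x3d, 0x5f, 0x61})
	relocs := []reloc{
		{0, 8, "type.main.Mather"},
		{8, 8, "type.main.Adder"},
		{24, 8, "main.(*Adder).Add"},
		{32, 8, "main.(*Adder).Sub"},
	}
	addrs := map[string]uint64{ // hypothetical addresses
		"type.main.Mather":  0x1000,
		"type.main.Adder":   0x2000,
		"main.(*Adder).Add": 0x3000,
		"main.(*Adder).Sub": 0x4000,
	}
	err := applyRelocs(data, relocs, addrs)
	return data, err
}

func main() {
	data, err := linkItab()
	if err != nil {
		panic(err)
	}
	for off := 0; off < len(data); off += 8 {
		fmt.Printf("0x%04x % x\n", off, data[off:off+8])
	}
}
```

Note that the hash bytes at 0x10 are left untouched: no relocation targets that range.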
Overall, we’re now looking at something akin to the following pseudo-code:

tab := getSymAddr(`go.itab.main.Adder,main.Mather`).(*itab)

// NOTE: The linker strips the `type.` prefix from these symbols when building
// the executable, so the final symbol names in the .rodata section of the
// binary will actually be `main.Mather` and `main.Adder` rather than
// `type.main.Mather` and `type.main.Adder`.
// Don't get tripped up by this when toying around with objdump.
tab.inter = getSymAddr(`type.main.Mather`).(*interfacetype)
tab._type = getSymAddr(`type.main.Adder`).(*_type)

tab.fun[0] = getSymAddr(`main.(*Adder).Add`).(uintptr)
tab.fun[1] = getSymAddr(`main.(*Adder).Sub`).(uintptr)
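If you want to poke at the linked itab from a running program, the sketch below peeks at the first word of an interface value through `unsafe`. Heavy caveats apply: the `Mather` and `Adder` declarations are assumed here (they stand in for the ones used throughout this article), and `itabMirror`, `ifaceMirror` and `itabOf` are hypothetical helpers that mirror an internal runtime layout on a 64-bit arch, which may change between Go versions.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Assumed declarations, standing in for the article's Mather and Adder.
type Mather interface {
	Add(a, b int32) int32
	Sub(a, b int64) int64
}

type Adder struct{ id int32 }

func (adder Adder) Add(a, b int32) int32 { return a + b }
func (adder Adder) Sub(a, b int64) int64 { return a - b }

// itabMirror mirrors the itab layout discussed above (assumption: 64-bit
// arch, layout unchanged in the Go version at hand).
type itabMirror struct {
	inter unsafe.Pointer
	_type unsafe.Pointer
	hash  uint32
	_     [4]byte
	fun   [2]uintptr
}

// ifaceMirror mirrors the two words of a non-empty interface value.
type ifaceMirror struct {
	tab  *itabMirror
	data unsafe.Pointer
}

// itabOf returns the itab pointer stored in the first word of m.
func itabOf(m Mather) *itabMirror {
	return (*ifaceMirror)(unsafe.Pointer(&m)).tab
}

func main() {
	t1 := itabOf(Mather(Adder{id: 6754}))
	t2 := itabOf(Mather(Adder{id: 1}))
	fmt.Println("same itab for both values:", t1 == t2)
	fmt.Printf("hash: %#x\n", t1.hash)
	fmt.Println("fun[0] set:", t1.fun[0] != 0, "fun[1] set:", t1.fun[1] != 0)
}
```

Both conversions should end up pointing at the same global itab, since it describes the (Adder, Mather) pair rather than any particular value. The printed hash depends on the toolchain, so don't expect it to match the dump above byte for byte.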

We’ve got ourselves a ready-to-use itab; now if we just had some data to go along with it, that’d make for a nice, complete interface.

Remember from part 1 that the top of the stack (SP) currently holds the address of go.itab."".Adder,"".Mather (argument #1).
Also remember from part 2 that we had stored a $6754 decimal constant in ""..autotmp_1+36(SP): we now load the effective address of this constant just below the top of the stack-frame, at 8(SP) (argument #2).

These two pointers are the two arguments that we pass into runtime.convT2I32, which will apply the final touches of glue to create and return our complete interface.
Let’s have a closer look at it (src/runtime/iface.go):

func convT2I32(tab *itab, elem unsafe.Pointer) (i iface) {
	t := tab._type
	/* ...omitted debug stuff... */
	var x unsafe.Pointer
	if *(*uint32)(elem) == 0 {
		x = unsafe.Pointer(&zeroVal[0])
	} else {
		x = mallocgc(4, t, false)
		*(*uint32)(x) = *(*uint32)(elem)
	}
	i.tab = tab
	i.data = x
	return
}

So runtime.convT2I32 does 4 things:

1. It creates a new iface structure i (to be pedantic, its caller creates it… same difference).
2. It assigns the itab pointer we just gave it to i.tab.
3. It allocates a new object of type i.tab._type on the heap, then copies the value pointed to by the second argument elem into that new object.
4. It returns the final interface.

This process is quite straightforward overall, although the 3rd step does involve some tricky implementation details in this specific case, which are caused by the fact that our Adder type is effectively a scalar type.
We’ll look at the interactions of scalar types and interfaces in more detail in the section about the special cases of interfaces.

Conceptually, we’ve now accomplished the following (pseudo-code):

tab := getSymAddr(`go.itab.main.Adder,main.Mather`).(*itab)
elem := getSymAddr(`""..autotmp_1+36(SP)`).(*int32)

i := runtime.convT2I32(tab, unsafe.Pointer(elem))

assert(i.tab == tab)
assert(*(*int32)(i.data) == 6754) // same value..
assert((*int32)(i.data) != elem) // ..but different (al)locations!
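The "same value, different locations" property can be observed from plain Go as well. The following is a rough sketch: the `Mather` and `Adder` declarations are assumed (standing in for the ones used in this article), and `convert`, `dataWord` and `aliases` are hypothetical helpers that read the second word of the interface through `unsafe`, which relies on an internal layout rather than on anything the language guarantees.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Assumed declarations, standing in for the article's Mather and Adder.
type Mather interface {
	Add(a, b int32) int32
	Sub(a, b int64) int64
}

type Adder struct{ id int32 }

func (adder Adder) Add(a, b int32) int32 { return a + b }
func (adder Adder) Sub(a, b int64) int64 { return a - b }

// convert wraps the value pointed to by a into a Mather interface.
//
//go:noinline
func convert(a *Adder) Mather { return Mather(*a) }

// dataWord returns the second word of the interface value, i.e. the pointer
// to the data it holds (assumption: two-word interface layout).
func dataWord(m Mather) unsafe.Pointer {
	return (*[2]unsafe.Pointer)(unsafe.Pointer(&m))[1]
}

// aliases reports whether the interface's data pointer points at a itself.
func aliases(m Mather, a *Adder) bool {
	return dataWord(m) == unsafe.Pointer(a)
}

func main() {
	a := Adder{id: 6754}
	m := convert(&a)
	a.id = 0 // mutate the original after the conversion

	held := *(*int32)(dataWord(m))
	fmt.Println("value held by the interface:", held)
	fmt.Println("interface points at the original:", aliases(m, &a))
}
```

Mutating `a` after the conversion leaves the interface untouched, because the interface holds its own copy of the 4-byte value.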

To summarize all that just went down, here’s a complete, annotated version of the assembly code for all 3 parts:

0x001d MOVL $6754, ""..autotmp_1+36(SP)        ;; create an addressable $6754 value at 36(SP)
0x0025 LEAQ go.itab."".Adder,"".Mather(SB), AX ;; set up go.itab."".Adder,"".Mather..
0x002c MOVQ AX, (SP)                           ;; ..as first argument (tab *itab)
0x0030 LEAQ ""..autotmp_1+36(SP), AX           ;; set up &36(SP)..
0x0035 MOVQ AX, 8(SP)                          ;; ..as second argument (elem unsafe.Pointer)
0x003a CALL runtime.convT2I32(SB)              ;; call convT2I32(go.itab."".Adder,"".Mather, &$6754)
0x003f MOVQ 16(SP), AX                         ;; AX now holds i.tab (go.itab."".Adder,"".Mather)
0x0044 MOVQ 24(SP), CX                         ;; CX now holds i.data (&$6754, somewhere on the heap)