Instruction Set References
--------------------------
https://en.wikipedia.org/wiki/X86_instruction_listings
https://stackoverflow.com/questions/3818856/what-does-the-rep-stos-x86-assembly-instruction-sequence-do
https://stackoverflow.com/questions/6555094/what-does-cltq-do-in-assembly


Register Names / Sizes
----------------------
"Traditional" general-purpose registers:
    rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp

 MSB                                                                   LSB
 +--------+--------+--------+--------+--------+--------+--------+--------+
 |                                  rax                                  |
 +--------+--------+--------+--------+--------+--------+--------+--------+
                                     |                eax                |
                                     +--------+--------+--------+--------+
                                                       |        ax       |
                                                       +--------+--------+
                                                       |   ah   |   al   |
                                                       +--------+--------+

Additional x86_64 general-purpose registers:
    r8, r9, r10, r11, r12, r13, r14, r15

 MSB                                                                   LSB
 +--------+--------+--------+--------+--------+--------+--------+--------+
 |                                  r8                                   |
 +--------+--------+--------+--------+--------+--------+--------+--------+
                                     |                r8d                |
                                     +--------+--------+--------+--------+
                                                       |       r8w       |
                                                       +--------+--------+
                              * Note: High byte of lower        |  r8b / |
                                16-bit word is inaccessible     |  r8l   |
                                                                +--------+
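
The rax diagram can be read as a set of masks and shifts.  The sketch below
is illustrative only: the view_* helper names are made up here, and it models
reads of the sub-registers, not writes.  For r8 through r15 only the low-byte
view exists (r8b); there is no counterpart to ah.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: each returns one named view of a 64-bit rax value. */
uint32_t view_eax(uint64_t rax) { return (uint32_t)(rax & 0xFFFFFFFFu); } /* low 32 bits */
uint16_t view_ax(uint64_t rax)  { return (uint16_t)(rax & 0xFFFFu); }     /* low 16 bits */
uint8_t  view_ah(uint64_t rax)  { return (uint8_t)((rax >> 8) & 0xFFu); } /* bits 8..15  */
uint8_t  view_al(uint64_t rax)  { return (uint8_t)(rax & 0xFFu); }        /* low 8 bits  */

/* Usage: one byte per box in the diagram, MSB first. */
void demo_views(void)
{
    uint64_t rax = 0x1122334455667788ULL;
    assert(view_eax(rax) == 0x55667788u);
    assert(view_ax(rax)  == 0x7788u);
    assert(view_ah(rax)  == 0x77u);
    assert(view_al(rax)  == 0x88u);
}
```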


Calling Conventions
-------------------
How function arguments are passed depends on the architecture: see below.  The
return address into the caller is pushed after any argument values, by the call
instruction itself.

Often, the called function will use the base pointer register to mark the stack
address at the bottom of the new stack frame and adjust the stack pointer
register to allocate space for the new frame in full.  The old bp value is saved
on the stack immediately after the return address, at the next lower memory
address (drawn above it in the illustration below).

On return, the original base and stack pointer values are restored.  Any pushed
argument values remain on the stack and are the responsibility of the caller.
The function return value is stored in the "a" register (eax or rax).

+----------------------------+  <- sp (register)            top of stack
|                            |                              lower addresses
|  space for local function  |
| storage: variables, arrays |
|                            |
|                            |
+============================+  <- bp (register)
|     saved base pointer     |
+----------------------------+
|  saved instruction pointer |
+============================+
|    function argument ??    |
+----------------------------+
|    function argument ??    |
+----------------------------+
|            ...             |
+----------------------------+
|                            |
|                            |
|                            |
|     caller stack frame     |
|                            |
|                            |
|                            |
+============================+  <- saved base pointer (on stack)
|   caller saved base ptr    |
..............................                              higher addresses
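
The frame layout above can be traced with a small simulation.  This is a sketch
only: it assumes the 32-bit convention described below (all arguments on the
stack), models the stack as an array of 4-byte slots, and uses made-up names
(stack_mem, push, pop, callee, call_sub) and a placeholder return address.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 64

/* Simulated machine state; every name here is illustrative. */
static uint32_t stack_mem[STACK_WORDS];  /* one entry per 4-byte stack slot  */
static uint32_t esp = STACK_WORDS;       /* slot index; stack grows toward 0 */
static uint32_t ebp = 0;

static void     push(uint32_t v) { stack_mem[--esp] = v; }
static uint32_t pop(void)        { return stack_mem[esp++]; }

/* Callee: builds a frame, reads two arguments via ebp, returns arg1 - arg2. */
static uint32_t callee(void)
{
    push(ebp);                           /* save caller's base pointer   */
    ebp = esp;                           /* ebp marks the new frame base */
    esp -= 2;                            /* reserve two local slots      */

    uint32_t arg1 = stack_mem[ebp + 2];  /* [ebp+8]  in byte terms */
    uint32_t arg2 = stack_mem[ebp + 3];  /* [ebp+12] in byte terms */
    uint32_t eax  = arg1 - arg2;         /* return value goes in eax */

    esp = ebp;                           /* release locals               */
    ebp = pop();                         /* restore caller's ebp         */
    (void)pop();                         /* "ret" pops the return address */
    return eax;
}

/* Caller: pushes arguments in reverse order, then the return address. */
uint32_t call_sub(uint32_t a, uint32_t b)
{
    push(b);                             /* last argument first          */
    push(a);                             /* first argument ends on top   */
    push(0xDEADBEEFu);                   /* stand-in for what "call" pushes */
    uint32_t eax = callee();
    esp += 2;                            /* caller removes its arguments */
    return eax;
}
```

After call_sub returns, esp is back at its starting value, which is the
"responsibility of the caller" cleanup described above.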


    32-bit (x86)
    ------------
    All function arguments are pushed to the stack in reverse order, leaving the
    first argument on the top of the stack.

    Stack pointer register: esp
    Base pointer register:  ebp
    Return value in:        eax


    64-bit (x86_64)
    ---------------
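    Under the System V AMD64 ABI (Linux, macOS, and other Unix-like systems),
    the first six integer or pointer arguments are passed in the registers
    listed below.  All remaining arguments are pushed to the stack in reverse
    order.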

    Argument #1:            rdi
    Argument #2:            rsi
    Argument #3:            rdx
    Argument #4:            rcx
    Argument #5:            r8
    Argument #6:            r9
    Stack pointer register: rsp
    Base pointer register:  rbp
    Return value in:        rax
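
As a rough illustration of the table above, here is a hypothetical
eight-argument function annotated with where each argument would typically
arrive.  It assumes the System V AMD64 ABI and integer arguments only; sum8 is
a made-up name, and an optimizing compiler may inline the call entirely.

```c
#include <assert.h>

/* Hypothetical function: the comments show the expected argument locations. */
long sum8(long a,   /* rdi */
          long b,   /* rsi */
          long c,   /* rdx */
          long d,   /* rcx */
          long e,   /* r8  */
          long f,   /* r9  */
          long g,   /* stack: [rsp+8]  on entry, just past the return address */
          long h)   /* stack: [rsp+16] on entry (pushed first) */
{
    return a + b + c + d + e + f + g + h;   /* result is returned in rax */
}

void demo_sum8(void)
{
    assert(sum8(1, 2, 3, 4, 5, 6, 7, 8) == 36);
}
```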



Specific Callouts
============================================================

TEST vs. CMP
------------
CMP subtracts its second operand from its first, discards the difference, and
sets the status flags.  Among these, it sets the zero flag (ZF) if the
difference is zero (operands are equal).

TEST computes the bitwise AND of its operands, discards the result, and sets
the zero flag (ZF) when that result is zero.  If the two operands are equal,
their bitwise AND is zero only when the operands themselves are zero.  TEST also
sets the sign flag (SF) when the most significant bit is set in the result, and
the parity flag (PF) when the number of set bits in the low byte of the result
is even.

JE (alias of JZ) tests the zero flag and jumps if it is set.  This creates the
following equivalencies:

test eax, eax
je <somewhere>          ---->           if (eax == 0) {}

cmp eax, ebx
je <somewhere>          ---->           if (eax == ebx) {}
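
The same equivalencies can be checked with a small model of the flags.  This
is a sketch under assumptions: it covers only ZF, SF, and PF for 32-bit
operands (CMP also sets carry, overflow, and auxiliary flags, omitted here),
and the names flags, flags_from, do_test, and do_cmp are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Subset of the status flags discussed above. */
struct flags { bool zf, sf, pf; };

/* Derive ZF, SF, and PF from a 32-bit result. */
static struct flags flags_from(uint32_t result)
{
    struct flags f;
    uint8_t low = (uint8_t)result;       /* PF looks at the low byte only */
    int ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (low >> i) & 1;
    f.zf = (result == 0);
    f.sf = (result >> 31) != 0;          /* most significant bit */
    f.pf = (ones % 2) == 0;              /* even number of set bits */
    return f;
}

struct flags do_test(uint32_t a, uint32_t b) { return flags_from(a & b); }
struct flags do_cmp(uint32_t a, uint32_t b)  { return flags_from(a - b); }

void demo_flags(void)
{
    assert(do_test(0u, 0u).zf);          /* test eax, eax with eax == 0 */
    assert(!do_test(5u, 5u).zf);         /* equal but nonzero: ZF clear */
    assert(do_cmp(5u, 5u).zf);           /* cmp eax, ebx with eax == ebx */
    assert(!do_cmp(5u, 6u).zf);
}
```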


REP prefix
----------
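The "rep" prefix repeats a string instruction the number of times given in the
count register (CX, ECX, or RCX, depending on address size), decrementing the
count after each repetition and stopping when it reaches zero.

e.g.  STOS is "Store String"
It stores the value in the accumulator at the address in RDI, then advances RDI
by the operand size (or moves it backward if the direction flag is set).
STOSB, STOSW, STOSD, and STOSQ store AL, AX, EAX, and RAX respectively.
If RCX = 0x20, RDI = some buffer, and RAX = 0,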

`rep stosq` is equivalent to:

```
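uint64_t *buf_ptr = buf;              /* RDI */
for (i = 0x20; i != 0; i--) {         /* RCX counts down to zero */
    *buf_ptr = 0;                     /* store RAX (8 bytes) */
    buf_ptr++;                        /* RDI += 8, direction flag clear */
}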
```
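
The byte-sized variant, `rep stosb`, behaves like memset.  The sketch below
models it in C and compares the result against memset; rep_stosb and
demo_rep_stosb are made-up names, and the model assumes the direction flag is
clear.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Model of `rep stosb` with the direction flag clear; returns the final RDI. */
uint8_t *rep_stosb(uint8_t *rdi, uint8_t al, size_t rcx)
{
    while (rcx != 0) {      /* nothing is stored when RCX starts at zero */
        *rdi = al;          /* store AL at [RDI] */
        rdi++;              /* advance by one byte */
        rcx--;
    }
    return rdi;
}

void demo_rep_stosb(void)
{
    uint8_t a[0x20], b[0x20];
    uint8_t *end = rep_stosb(a, 0xAB, sizeof a);
    memset(b, 0xAB, sizeof b);
    assert(end == a + sizeof a);            /* RDI ends one past the buffer */
    assert(memcmp(a, b, sizeof a) == 0);    /* same bytes as memset */
}
```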