✿ introduction
Sitting in class one day I thought of what happens when you use the jmp
instruction to jump from somewhere in the code to an arbitrary location within another instruction. By "arbitrary location" here I mean the middle of a different instruction. In fact the simplest example I could think of as I was sitting in that classroom was the machine code sequence (in hexadecimal) eb ff
, which translates to jmp $ + 1
using nasm
notation where the dollar sign $
denotes the current position of an imaginary "cursor" at compile time, in this case that cursor is placed at the beginning of the jmp
instruction.
Now, it is important to note that eb
, in its base form, takes a relative signed 8-bit address as its one and only operand and this relative address jumps from the end of the given instruction. In the case of ff
, which in two's complement translates to a relative address -1
, this means we jump from the end of the jmp
instruction directly to its center, therefore the next instruction will read ff ...
.
✿ defining the string
Taking this concept, I wrote a simple "Hello, World!"
program. There was something that bugged me about the initial version of this "gibberish" program; it defined a string, offset code execution ever so slightly, printed the string from an address using one syscall, since the length of the string is known by nasm
, and then gracefully exits. Where's the excitement in defining a string in a preprocessor variable? We offset code execution to "obfuscate", but the program's purpose is obvious at first glance.
This blog post is not to go over that version's source code, a detailed explanation can be found in the README.md
of that project here. It was a fun first approach, but it could be better, and so, how about we scatter every byte in the string "Hello, World!"
all around the binary.
✿ truly gibberish
Something I hadn't mentioned in the previous section is that there was a fun little challenge to this project. We weren't just recklessly using the jmp
instruction all around the code to offset code execution; it had to have some spice to it and therefore, the only sensible challenge to give ourselves is limiting ourselves to only using each instruction once. Note that this only includes directly writing the instruction in the source code, whatever machine code the source text assembles to is not counted towards the instructions-only-once challenge (no, this doesn't mean we can just write the instructions out using the db
directive).
✿ execution
I'll throw in some code snippets here and there, but if you want to fully follow along with the code being discussed the source can be found here.
During the project I got pretty interested in the concept of just executing code behind the _start
entry point. It's a small thing to be interested in, but it reflects in how code execution is essentially stunted behind the entry point, to do some "initialization" and then catches up again. It could be even more interesting to possibly even mess with the linker a bit to have code at _start
, but the actual entry point earlier. Nevertheless, in this case we simply instantly use up our only call
instruction to jump back 11 bytes.
clflush [0xe6819648]
dd _start+8
jmp $+1
salc
_start:
call _start-11
What the "preamble" does here is setup the esi
register to hold a specific address. Here, the address is the location of _start + 8
as seen by the nasm
preprocessor define doubleword (dd
). To clarify this a bit more, let's unpack the operand to clflush
as
dec eax ; 48
xchg esi, eax ; 96
and esi, _start + 8 ; 81 e6 ...
So, we underflow eax
to 0xffffffff
and then exchange esi
with eax
(yes, this is the same as just doing dec esi
). Next, we make use of the fact that using the and
bitwise operation on any value with all bits set will essentially work as assignment of the given operand to the register. All in all this code is a cheeky way of doing mov esi, _start + 8
.
After esi
has its value we use our only jmp
instruction to jump into the middle of itself, meaning the next instruction starts with ff
, which we can find to be an x86 opcode holding multiple instructions; so let's take a bit of a detour and talk about some ways x86 encodes its instructions.
✿ x86 instruction encoding
Intel's x86 architecture is such a huge collection of instructions and various intricacies, it's fascinating that they're still adding instructions. They even wrote a series of Developer's Manuals about x86 and x86-64 which I highly recommend as they are very well written and it gives a good understanding of what is actually going on and how we can further interface with this complex structure to make our software as good as it can be. If you'd prefer to watch a cool short video about x86 instead I highly recommend this talk by XlogicX at DEF CON 25; although it is not exactly about x86 as a whole, it is a beautiful introduction to how a large architecture can be full of so many redundancies just for the sole purpose of backwards compatibility.
I'm rambling a bit. Nevertheless, instruction encoding in x86 is a large topic, so we'll try to skim over it as simply as we can.
Instructions come in the following structure:
+--------+--------+---------+ | PREFIX | OPCODE | OPERAND | +--------+--------+---------+
PREFIX
here is an optional prefix that can be added to instructions for adding extra information to the instruction. Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2A in Section 2.1.1 categorizes prefixes into 4 functional groups, though only group 3 (operand-size override — 66h
) and group 4 (address-size override — 67h
) are of note to us. Further extensions to the available prefixes were added in x86-64, where all 4Xh
instructions were replaced with 16 REX prefix combinations, essentially acting as a gateway to 64-bit computations (see Opcodes 0x40 - 0x4f).
OPCODE
is a primary opcode of 1 to 3 bytes long, where 3-bits of extra information is sometimes encoded in the OPERAND
field. x86 is constantly adding new instructions to their Instruction Set Architecture (ISA), so they began using an escape opcode of 0x0f
which will pop up quite a bit later on as we begin running out of instructions to use. If you're not the type to carry all 4 books in the 2nd volume of the Intel manuals, I highly recommend using ref.x86asm.net for quick opcode lookup.
OPERAND
consists of quite a few fields; Figure 2.1 in Volume 2A bunches these all together, though we'll try go through them one by one to provide you with a base understanding of how operand encoding works in x86 to help better grasp some of the approaches taken in this article.
✿ operand encoding
msg: "this string is somewhere in memory"
mov edi, 0xcafe ; immediate
mov eax, edi ; register
mov ecx, [msg] ; address
and eax, [ecx] ; modr/m
xor eax, [ecx+4] ; modr/m with displacement
In the code snippet above we may observe a bunch of different types of operands we may encounter in x86 programming (ignore the instructions used, it's mainly the right side containing the different operand types that we'll be focusing on).
Immediates are quite self-explanatory and we can define them as any operand that is explicitly written out in the machine code. It is immediately known and needs no looking up in registers or memory, therefore 'immediate'.
We also may note that immediates can come in many sizes. Though the size of the operand is not interpreted implicitly as may happen in compilers of higher level programming languages. Instead we use the context of the instruction to determine how large the operand is (think back to the instruction prefixes we talked about). The CPU will read a 66h
a the beginning of the instruction and therefore knows to interpret the operand of the instruction as 16-bit (or another size, depending on the predefined expected operand size for the given instruction).
Assemblers and syntaxes differ in how they interpret the operand size the programmer intends to use. nasm
, for example, uses the register size to infer the size of an immediate operand (eax
would be a 32-bit operand and ax
would be 16 bits), whereas AT&T syntax may require the operand size to be specified in the suffix of the instruction (movd 0x9090, %eax
— move double-word, i.e. 32-bits, into eax; in this case the assembler pads the value from the most significant side).
nasm
syntax requires the programmer to directly specify the size they need in dereferencing/address operands (cmp al, byte [edx]
— compare the byte in al
with the byte at the address pointed to by the contents of edx
). Conversely, AT&T assembly syntax (used by the GNU Assembler as
) requires the operand size to be specified in the suffix of most instruction mnemonics (cmpb (%edx), %al
— the previous expression in AT&T syntax).
Register operands are arguably the most common case, in which the contents of the register are used for execution of the instruction. Registers are the primary way of manipulating data quickly and efficiently on a processor and therefore we wouldn't have much to do if they weren't viable as operands.
The size of register operands is similarly manipulated, where nasm
once again infers from register sizes and as
uses instruction suffixes. Note that if the inferred size is incompatible with other operands in the instruction, the assembler will (or should) throw an error (e.g. cmp al, eax
stresses nasm
out and we get error: invalid combination of opcode and operands
).
ModR/M and SIB -bytes are where things get a bit complicated, so we dedicate a full subsection to the topic. Bear with me for a second as we dive into this. It makes more sense as you go.
✿ modr/m
The addressing-form specifier byte (called the ModR/M byte) is most common in instructions that have an operand referring to an address in memory. Let us take a 4-byte nop
as an example — 0f 1f 47 10
. Here the primary opcode is 0f 1f
and 47 10
refers to what we may see in assembly as [edi+0x10]
. Thus we may write this whole instruction in assembly as nop dword [edi+0x10]
.
So, where did we get these two operand bytes (essentially one since we can notice the 10
is simply a displacement) from in the first place? Well, let's write it out in binary and add some context.
hex: 47 10 bin: 01000111 00010000
The ModR/M byte is built up of 3 fields, which are: Mod
represented by the most significant 2 bits, Reg/Opcode
by the next 3 bits and R/M
by the final 3 bits. The Mod
field is used mainly to split up the R/M
(register or memory) field into 4 distinct sections (note that Mod
values 00
, 01
and 10
are dereferrenced — brackets in nasm
):
- - a register or a single immediate value.
- - a register plus an 8-bit (signed) displacement byte.
- - a register plus a 32-bit (signed) displacement byte.
- - a register.
In our case 01
applies and therefore the next byte after the ModR/M byte is an 8-bit displacement.
The next 3 bits represent the Reg/Opcode
field, meaning the value of this field is either a register operand or an opcode extension. For reference, registers are given the indices:
-
0 () =
eax
-
1 () =
ecx
-
2 () =
edx
-
3 () =
ebx
-
4 () =
esp
-
5 () =
ebp
-
6 () =
esi
-
7 () =
edi
To differentiate between an instruction that uses the Reg/Opcode
field for its register value or opcode value, we must consult the documentation. All (documented) x86 instructions may be found in Volume 2 of the Developer's Manuals [Developer's Manual Combined Volumes 2A, 2B, 2C, and 2D: Instruction Set Reference, A-Z], though as an online resource we may use coder32 / coder64 or felixcloutier.com (which takes from the official Instruction Set Reference to provide an online resource for the ISA).
Navigating to the NOP instruction on Félix Cloutier's website leads us to an interesting table of information. For now we'll focus on these two relevant columns:
Opcode | Instruction |
---|---|
NP 90 | NOP |
NP 0F 1F /0 | NOP r/m16 |
NP 0F 1F /0 | NOP r/m32 |
The Opcode
column contains quite a lot of information about how the instruction will be written in its machine code form, whereas the Instruction
column tells us the mnemonic and what operand or operand size the instruction expects.
First, let's take a look at NP 0F 1F /0
in the Opcode
column. Section 3.1.1.1 in Volume 2A of the x86 Developer's Manuals contains a good summary for the shorthands used in opcode notation, though we'll discuss the most frequent ones here.
NP
tells us that this instruction is not allowed to have the prefixes 66
, F2
or F3
, however in the case of nop
these prefixes do not cause an invalid-opcode exception, but simply get ignored. Another very common prefix in the Opcode
column is REX.W
, meaning the instruction expects a 64-bit operand.
The 0F 1F
are two hexadecimal values denoting the bytes used to represent the primary opcode part of the instruction. nop
s can be represented by the primary opcode 90
or 0F 1F
corresponding to the single-byte and multi-byte no-operations respectively. You may be thinking to yourself "Why would anyone ever need a nop
, letalone a version of it that uses more storage?", I've thought it too. However, having multiple forms of a nop
, ranging from 1 to 15 bytes, allows compilers to optimize machine code to be better aligned with the processor's word-size for faster addressing.
/digit
(in the case of a multi-byte no-op /0
) indicates that only the R/M
field of the ModR/M byte is used for the operand and the Reg/Opcode
field is used as an opcode extension. This is common for instructions that only take one operand or a register or memory operand along with an immediate value (e.g. inc
and dec
instructions). When writing the machine code for a specific instruction and you use a different Reg/Opcode
value than the one specified by the opcode, you may be left with an entirely different instruction.
Returning back to the ModR/M byte's fields, we have the final R/M
part of the value, which stands for "register or memory". In the case of our nop dword [edi+0x10]
we have a R/M
value of 7 (111
in binary). Here we make use of the same register indices as for Reg/Opcode
and so we know we're dealing with the register edi
.
Taking all the information we gathered from the 47h
byte, we know our operand is the 32-bit value located at the address pointed to by the register edi
with a displacement of 10
.
The easiest way to work with ModR/M bytes is to use the lookup table provided in Volume 2A of the Developer's Manuals (specifically Table 2-1 for 16-bit ModR/M and Table 2-2 for 32-bit ModR/M). A convenient online reference can be found on coder32, including a table for 64-bit ModR/M on coder64.
Something to make note of is that not all registers are available through the R/M
byte for Mod
values 00
, 01
and 10
. If the Mod
field is equal to 0, the R/M
value of 101
(register ebp
) corresponds to a literal 32-bit displacement expected in the next 4 bytes after the ModR/M byte. Further, for the 3 referenced Mod
values esp
is replaced by the SIB-byte, which we will discuss in the following section.
✿ sib
The SIB byte, standing for Scale-Index-Base, allows for an even wider range of addressing modes. An address is formed by multiplying the index by the scale and adding the base (). Using this we get the array indexing we know in high level languages. Say we have an array of 16-bit integers that starts at the address in eax
, we can index the 4th element with [2 * 3 + eax]
. You may already notice what is going on here, but let's break it down.
The Scale value represents a scaling factor for the index; in the case of traditional array indexing we can think of it as the amount of bytes a single value of the array has.
The Index value specifies the specific increment from the beginning of the array. Here we even get a hint as to why programming languages tend to beginning indexing arrays from 0; indexing the first element of the array would also be dereferencing the pointer to the beginning of the array, which could be written in assembly as [2 * 0 + eax]
, where the multiplication by zero would zero out and the array pointer itself would get dereferenced.
The Base value is simple and straightforward as it points to the beginning of the array or the data block relative to which we wish to index.
✿ modr/m in practice
ModR/M provides a very useful interface, not only for addressing things relative to current values, but also for optimizations in addition and multiplication.
int
int
int
This C code can be optimized using ModR/M and SIB in the following way (where edi
and esi
hold the function operands and eax
holds the return value in accordance with the SysV ABI).
add:
lea eax, [edi+esi] ; add two register values and store them in the return register
ret
mul4:
lea eax, [edi*4]
ret
mul5:
lea eax, [edi+edi*4]
ret
Using this relative addressing allows us to remove the need to mov
the sum/product into the return register, since lea
(load effective address) with ModR/M and SIB does this all in one instruction.