gibberish

14/01/25
making a simple "Hello, World!" program unreadable

introduction

Sitting in class one day I thought of what happens when you use the jmp instruction to jump from somewhere in the code to an arbitrary location within another instruction. By "arbitrary location" here I mean the middle of a different instruction. In fact the simplest example I could think of as I was sitting in that classroom was the machine code sequence (in hexadecimal) eb ff, which translates to jmp $ + 1 using nasm notation where the dollar sign $ denotes the current position of an imaginary "cursor" at compile time, in this case that cursor is placed at the beginning of the jmp instruction.

Now, it is important to note that eb, in its base form, takes a relative signed 8-bit address as its one and only operand and this relative address jumps from the end of the given instruction. In the case of ff, which in two's complement translates to a relative address -1, this means we jump from the end of the jmp instruction directly to its center, therefore the next instruction will read ff ....

defining the string

Taking this concept, I wrote a simple "Hello, World!" program. There was something that bugged me about the initial version of this "gibberish" program; it defined a string, offset code execution ever so slightly, printed the string from an address using one syscall, since the length of the string is known by nasm, and then gracefully exits. Where's the excitement in defining a string in a preprocessor variable? We offset code execution to "obfuscate", but the program's purpose is obvious at first glance.

This blog post is not to go over that version's source code, a detailed explanation can be found in the README.md of that project here. It was a fun first approach, but it could be better, and so, how about we scatter every byte in the string "Hello, World!" all around the binary.

truly gibberish

Something I hadn't mentioned in the previous section is that there was a fun little challenge to this project. We weren't just recklessly using the jmp instruction all around the code to offset code execution; it had to have some spice to it and therefore, the only sensible challenge to give ourselves is limiting ourselves to only using each instruction once. Note that this only includes directly writing the instruction in the source code, whatever machine code the source text assembles to is not counted towards the instructions-only-once challenge (no, this doesn't mean we can just write the instructions out using the db directive).

execution

I'll throw in some code snippets here and there, but if you want to fully follow along with the code being discussed the source can be found here.

During the project I got pretty interested in the concept of just executing code behind the _start entry point. It's a small thing to be interested in, but it reflects in how code execution is essentially stunted behind the entry point, to do some "initialization" and then catches up again. It could be even more interesting to possibly even mess with the linker a bit to have code at _start, but the actual entry point earlier. Nevertheless, in this case we simply instantly use up our only call instruction to jump back 11 bytes.

    clflush [0xe6819648]
dd _start+8
    jmp $+1
    salc
_start:
    call _start-11

What the "preamble" does here is setup the esi register to hold a specific address. Here, the address is the location of _start + 8 as seen by the nasm preprocessor define doubleword (dd). To clarify this a bit more, let's unpack the operand to clflush as

    dec eax             ; 48
    xchg esi, eax       ; 96
    and esi, _start + 8 ; 81 e6 ...
So, we underflow eax to 0xffffffff and then exchange esi with eax (yes, this is the same as just doing dec esi). Next, we make use of the fact that using the and bitwise operation on any value with all bits set will essentially work as assignment of the given operand to the register. All in all this code is a cheeky way of doing mov esi, _start + 8.

After esi has its value we use our only jmp instruction to jump into the middle of itself, meaning the next instruction starts with ff, which we can find to be an x86 opcode holding multiple instructions; so let's take a bit of a detour and talk about some ways x86 encodes its instructions.

x86 instruction encoding

Intel's x86 architecture is such a huge collection of instructions and various intricacies, it's fascinating that they're still adding instructions. They even wrote a series of Developer's Manuals about x86 and x86-64 which I highly recommend as they are very well written and it gives a good understanding of what is actually going on and how we can further interface with this complex structure to make our software as good as it can be. If you'd prefer to watch a cool short video about x86 instead I highly recommend this talk by XlogicX at DEF CON 25; although it is not exactly about x86 as a whole, it is a beautiful introduction to how a large architecture can be full of so many redundancies just for the sole purpose of backwards compatibility.

I'm rambling a bit. Nevertheless, instruction encoding in x86 is a large topic, so we'll try to skim over it as simply as we can.

Instructions come in the following structure:

+--------+--------+---------+
| PREFIX | OPCODE | OPERAND |
+--------+--------+---------+

PREFIX here is an optional prefix that can be added to instructions for adding extra information to the instruction. Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2A in Section 2.1.1 categorizes prefixes into 4 functional groups, though only group 3 (operand-size override — 66h) and group 4 (address-size override — 67h) are of note to us. Further extensions to the available prefixes were added in x86-64, where all 4Xh instructions were replaced with 16 REX prefix combinations, essentially acting as a gateway to 64-bit computations (see Opcodes 0x40 - 0x4f).

OPCODE is a primary opcode of 1 to 3 bytes long, where 3-bits of extra information is sometimes encoded in the OPERAND field. x86 is constantly adding new instructions to their Instruction Set Architecture (ISA), so they began using an escape opcode of 0x0f which will pop up quite a bit later on as we begin running out of instructions to use. If you're not the type to carry all 4 books in the 2nd volume of the Intel manuals, I highly recommend using ref.x86asm.net for quick opcode lookup.

OPERAND consists of quite a few fields; Figure 2.1 in Volume 2A bunches these all together, though we'll try go through them one by one to provide you with a base understanding of how operand encoding works in x86 to help better grasp some of the approaches taken in this article.

operand encoding
msg: "this string is somewhere in memory"
    
    mov edi, 0xcafe     ; immediate
    mov eax, edi        ; register
    mov ecx, [msg]      ; address
    and eax, [ecx]      ; modr/m
    xor eax, [ecx+4]    ; modr/m with displacement

In the code snippet above we may observe a bunch of different types of operands we may encounter in x86 programming (ignore the instructions used, it's mainly the right side containing the different operand types that we'll be focusing on).

Immediates are quite self-explanatory and we can define them as any operand that is explicitly written out in the machine code. It is immediately known and needs no looking up in registers or memory, therefore 'immediate'.

We also may note that immediates can come in many sizes. Though the size of the operand is not interpreted implicitly as may happen in compilers of higher level programming languages. Instead we use the context of the instruction to determine how large the operand is (think back to the instruction prefixes we talked about). The CPU will read a 66h a the beginning of the instruction and therefore knows to interpret the operand of the instruction as 16-bit (or another size, depending on the predefined expected operand size for the given instruction).

Assemblers and syntaxes differ in how they interpret the operand size the programmer intends to use. nasm, for example, uses the register size to infer the size of an immediate operand (eax would be a 32-bit operand and ax would be 16 bits), whereas AT&T syntax may require the operand size to be specified in the suffix of the instruction (movd 0x9090, %eax — move double-word, i.e. 32-bits, into eax; in this case the assembler pads the value from the most significant side).

nasm syntax requires the programmer to directly specify the size they need in dereferencing/address operands (cmp al, byte [edx] — compare the byte in al with the byte at the address pointed to by the contents of edx). Conversely, AT&T assembly syntax (used by the GNU Assembler as) requires the operand size to be specified in the suffix of most instruction mnemonics (cmpb (%edx), %al — the previous expression in AT&T syntax).

Register operands are arguably the most common case, in which the contents of the register are used for execution of the instruction. Registers are the primary way of manipulating data quickly and efficiently on a processor and therefore we wouldn't have much to do if they weren't viable as operands.

The size of register operands is similarly manipulated, where nasm once again infers from register sizes and as uses instruction suffixes. Note that if the inferred size is incompatible with other operands in the instruction, the assembler will (or should) throw an error (e.g. cmp al, eax stresses nasm out and we get error: invalid combination of opcode and operands).

ModR/M and SIB -bytes are where things get a bit complicated, so we dedicate a full subsection to the topic. Bear with me for a second as we dive into this. It makes more sense as you go.

modr/m

The addressing-form specifier byte (called the ModR/M byte) is most common in instructions that have an operand referring to an address in memory. Let us take a 4-byte nop as an example — 0f 1f 47 10. Here the primary opcode is 0f 1f and 47 10 refers to what we may see in assembly as [edi+0x10]. Thus we may write this whole instruction in assembly as nop dword [edi+0x10].

So, where did we get these two operand bytes (essentially one since we can notice the 10 is simply a displacement) from in the first place? Well, let's write it out in binary and add some context.

   hex:    47       10
   bin: 01000111 00010000

The ModR/M byte is built up of 3 fields, which are: Mod represented by the most significant 2 bits, Reg/Opcode by the next 3 bits and R/M by the final 3 bits. The Mod field is used mainly to split up the R/M (register or memory) field into 4 distinct sections (note that Mod values 00, 01 and 10 are dereferrenced — brackets in nasm):

In our case 01 applies and therefore the next byte after the ModR/M byte is an 8-bit displacement.

The next 3 bits represent the Reg/Opcode field, meaning the value of this field is either a register operand or an opcode extension. For reference, registers are given the indices:

To differentiate between an instruction that uses the Reg/Opcode field for its register value or opcode value, we must consult the documentation. All (documented) x86 instructions may be found in Volume 2 of the Developer's Manuals [Developer's Manual Combined Volumes 2A, 2B, 2C, and 2D: Instruction Set Reference, A-Z], though as an online resource we may use coder32 / coder64 or felixcloutier.com (which takes from the official Instruction Set Reference to provide an online resource for the ISA).

Navigating to the NOP instruction on Félix Cloutier's website leads us to an interesting table of information. For now we'll focus on these two relevant columns:

OpcodeInstruction
NP 90NOP
NP 0F 1F /0NOP r/m16
NP 0F 1F /0NOP r/m32

The Opcode column contains quite a lot of information about how the instruction will be written in its machine code form, whereas the Instruction column tells us the mnemonic and what operand or operand size the instruction expects.

First, let's take a look at NP 0F 1F /0 in the Opcode column. Section 3.1.1.1 in Volume 2A of the x86 Developer's Manuals contains a good summary for the shorthands used in opcode notation, though we'll discuss the most frequent ones here.

NP tells us that this instruction is not allowed to have the prefixes 66, F2 or F3, however in the case of nop these prefixes do not cause an invalid-opcode exception, but simply get ignored. Another very common prefix in the Opcode column is REX.W, meaning the instruction expects a 64-bit operand.

The 0F 1F are two hexadecimal values denoting the bytes used to represent the primary opcode part of the instruction. nops can be represented by the primary opcode 90 or 0F 1F corresponding to the single-byte and multi-byte no-operations respectively. You may be thinking to yourself "Why would anyone ever need a nop, letalone a version of it that uses more storage?", I've thought it too. However, having multiple forms of a nop, ranging from 1 to 15 bytes, allows compilers to optimize machine code to be better aligned with the processor's word-size for faster addressing.

/digit (in the case of a multi-byte no-op /0) indicates that only the R/M field of the ModR/M byte is used for the operand and the Reg/Opcode field is used as an opcode extension. This is common for instructions that only take one operand or a register or memory operand along with an immediate value (e.g. inc and dec instructions). When writing the machine code for a specific instruction and you use a different Reg/Opcode value than the one specified by the opcode, you may be left with an entirely different instruction.

Returning back to the ModR/M byte's fields, we have the final R/M part of the value, which stands for "register or memory". In the case of our nop dword [edi+0x10] we have a R/M value of 7 (111 in binary). Here we make use of the same register indices as for Reg/Opcode and so we know we're dealing with the register edi.

Taking all the information we gathered from the 47h byte, we know our operand is the 32-bit value located at the address pointed to by the register edi with a displacement of 10.

The easiest way to work with ModR/M bytes is to use the lookup table provided in Volume 2A of the Developer's Manuals (specifically Table 2-1 for 16-bit ModR/M and Table 2-2 for 32-bit ModR/M). A convenient online reference can be found on coder32, including a table for 64-bit ModR/M on coder64.

Something to make note of is that not all registers are available through the R/M byte for Mod values 00, 01 and 10. If the Mod field is equal to 0, the R/M value of 101 (register ebp) corresponds to a literal 32-bit displacement expected in the next 4 bytes after the ModR/M byte. Further, for the 3 referenced Mod values esp is replaced by the SIB-byte, which we will discuss in the following section.

sib

The SIB byte, standing for Scale-Index-Base, allows for an even wider range of addressing modes. An address is formed by multiplying the index by the scale and adding the base (S*I+B). Using this we get the array indexing we know in high level languages. Say we have an array of 16-bit integers that starts at the address in eax, we can index the 4th element with [2 * 3 + eax]. You may already notice what is going on here, but let's break it down.

The Scale value represents a scaling factor for the index; in the case of traditional array indexing we can think of it as the amount of bytes a single value of the array has.

The Index value specifies the specific increment from the beginning of the array. Here we even get a hint as to why programming languages tend to beginning indexing arrays from 0; indexing the first element of the array would also be dereferencing the pointer to the beginning of the array, which could be written in assembly as [2 * 0 + eax], where the multiplication by zero would zero out and the array pointer itself would get dereferenced.

The Base value is simple and straightforward as it points to the beginning of the array or the data block relative to which we wish to index.

modr/m in practice

ModR/M provides a very useful interface, not only for addressing things relative to current values, but also for optimizations in addition and multiplication.

int add(int a, int b) {
    return a + b;
}

int mul4(int a) {
    return a * 4;
}

int mul5(int a) {
    return a * 5;
}

This C code can be optimized using ModR/M and SIB in the following way (where edi and esi hold the function operands and eax holds the return value in accordance with the SysV ABI).

add:
    lea eax, [edi+esi] ; add two register values and store them in the return register
    ret

mul4:
    lea eax, [edi*4]
    ret

mul5:
    lea eax, [edi+edi*4]
    ret

Using this relative addressing allows us to remove the need to mov the sum/product into the return register, since lea (load effective address) with ModR/M and SIB does this all in one instruction.