Code Caves on Windows

I have been reading about code caves on Windows and found some interesting links on the topic. Unfortunately, no single source contained all of the relevant information, so I have been collecting the pieces from different places and putting them together. This page is meant to gather everything I learned about code caves on Windows in one place.

General Scheme

The general scheme regarding code caves is well-documented and I will not replicate it here. Instead, I will just link to the relevant sources:

http://www.blizzhackers.cc/viewtopic.php?p=2483118

http://resources.infosecinstitute.com/code-injection-techniques/

32-bit Shellcode

The 32-bit code responsible for loading the DLL from within the target process goes as follows:

bits 32
push dword 0x17171717 ; ret
pushfd
pushad
push dword 0x18181818 ; dll path
mov eax, 0x19191919 ; LoadLibrary
call eax
popad
popfd
ret

The 0x17171717 value is a placeholder for the return address. It must be the value of the eip register at the moment the target thread in the target process is halted with SuspendThread, so that the final ret resumes the thread exactly where it was interrupted.

The 0x18181818 value is a placeholder for the address, inside the target process, of the string that holds the path to the DLL file to inject. Note that this path must be either an absolute path or a path relative to the current directory of the target process (not of the injector). If it is not, LoadLibrary is unlikely to find the file and the call will fail.

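The 0x19191919 value is a placeholder for the address of LoadLibrary itself. The DLL injection program should resolve this address using GetProcAddress and overwrite the placeholder with it. Since LoadLibrary is a macro, the name to look up is LoadLibraryA or LoadLibraryW, whichever matches the encoding of the path string.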
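
As a rough sketch of the patching step, the snippet below copies the assembled 32-bit stub into a buffer and overwrites the three placeholders. The offsets 1, 8 and 13 follow from the instruction encodings (each placeholder sits right after a one-byte opcode). The names build_stub32 and put_u32le are made up for this example, and the addresses passed in are assumed to have been obtained by the injector beforehand.

```c
#include <stdint.h>
#include <string.h>

/* Offsets of the 4-byte placeholders inside the 32-bit stub. */
enum {
    OFF_RET    = 1,   /* after 0x68 (push imm32): return address */
    OFF_PATH   = 8,   /* after 0x68 (push imm32): address of the DLL path */
    OFF_LOAD   = 13,  /* after 0xb8 (mov eax, imm32): LoadLibrary address */
    STUB32_LEN = 22
};

/* Unpatched stub, byte for byte the code assembled from the listing. */
static const unsigned char stub32_template[STUB32_LEN] = {
    0x68, 0x17, 0x17, 0x17, 0x17,   /* push dword 0x17171717 */
    0x9c,                           /* pushfd */
    0x60,                           /* pushad */
    0x68, 0x18, 0x18, 0x18, 0x18,   /* push dword 0x18181818 */
    0xb8, 0x19, 0x19, 0x19, 0x19,   /* mov eax, 0x19191919 */
    0xff, 0xd0,                     /* call eax */
    0x61,                           /* popad */
    0x9d,                           /* popfd */
    0xc3                            /* ret */
};

/* Store a 32-bit value in little-endian byte order, as x86 expects. */
static void put_u32le(unsigned char *dst, uint32_t v)
{
    dst[0] = (unsigned char)(v & 0xff);
    dst[1] = (unsigned char)((v >> 8) & 0xff);
    dst[2] = (unsigned char)((v >> 16) & 0xff);
    dst[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Fill a caller-provided buffer with a patched copy of the stub. */
void build_stub32(unsigned char out[STUB32_LEN], uint32_t ret_addr,
                  uint32_t path_addr, uint32_t load_library_addr)
{
    memcpy(out, stub32_template, STUB32_LEN);
    put_u32le(out + OFF_RET, ret_addr);
    put_u32le(out + OFF_PATH, path_addr);
    put_u32le(out + OFF_LOAD, load_library_addr);
}
```

The template is kept constant and patched into a copy, so the same injector can build a stub for several targets without the placeholders being lost.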

An interesting side effect of how Windows works is that system DLLs like Kernel32 (where LoadLibrary is defined) are mapped at the same base address in every process of the same bitness until the next reboot, even when ASLR is in effect: the randomized base is chosen once per boot, not once per process. So, while this code cave scheme where the address of LoadLibrary is hard-coded into the shellcode is technically wrong, in practice it works wonders, as long as the injector and the target are both 32-bit or both 64-bit. The code cave tutorial on INFOSEC Institute works around this by making the shellcode itself resolve the address of LoadLibrary. Since the shellcode runs from within the target process, the address obtained will be correct. However, I found this to be unnecessary for the reason just described.
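
For illustration, resolving the address from within the injector's own process could look like the sketch below. It assumes the injector has the same bitness as the target and that the path string is ANSI, hence LoadLibraryA. The helper name resolve_load_library_a is made up for this example. The Windows calls are compiled only on Windows so that the snippet still builds elsewhere, where it simply returns 0.

```c
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#endif

/* Return the address of LoadLibraryA in the current process, or 0 on
 * failure. The same value is then used as the address in the target. */
uintptr_t resolve_load_library_a(void)
{
#ifdef _WIN32
    /* kernel32.dll is already loaded in every Win32 process, so
     * GetModuleHandleA is enough; no LoadLibrary call is needed here. */
    HMODULE kernel32 = GetModuleHandleA("kernel32.dll");
    if (kernel32 == NULL) {
        return 0;
    }
    return (uintptr_t)GetProcAddress(kernel32, "LoadLibraryA");
#else
    /* Not on Windows: nothing to resolve. */
    return 0;
#endif
}
```

The returned value is what replaces the LoadLibrary placeholder in the shellcode.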

Shellcode can be obtained from the above assembly code using the ShellBlade tool:

const char shellcode[] =
        "\x68\x17\x17\x17\x17\x9c\x60\x68\x18\x18\x18\x18\xb8\x19\x19"
        "\x19\x19\xff\xd0\x61\x9d\xc3";

64-bit Shellcode

This is where it got tricky. I could not find working shellcode for 64-bit systems. This project on Code Project gives a hint as to what it may look like, but the code was not entirely correct. In fact, I replied to the thread with working shellcode. The 64-bit code that loads the DLL from within the target process goes as follows:

bits 64
push dword 0x17171717 ; ret (low), pushes 8 bytes, sign-extended
mov dword [rsp+4], 0x17171717 ; ret (high), overwrites the upper half
pushfq
; no pushad on x64
push rax
push rbx
push rcx
push rdx
push rsi
push rdi
push rbp
push r8
push r9
push r10
push r11
push r12
push r13
push r14
push r15
mov rbp, rsp ; remember rsp (rbp is preserved by the callee)
and rsp, -16 ; align stack to 16 bytes
sub rsp, 32 ; shadow space for the callee
mov rcx, 0x1818181818181818 ; dll path, using fastcall
mov rax, 0x1919191919191919 ; LoadLibrary
call rax
mov rsp, rbp ; drop shadow space and alignment padding
pop r15
pop r14
pop r13
pop r12
pop r11
pop r10
pop r9
pop r8
pop rbp
pop rdi
pop rsi
pop rdx
pop rcx
pop rbx
pop rax
popfq
ret

This code is rather lengthy, but actually quite simple.

The first two instructions place the 8-byte return address on the stack. In 64-bit mode there is no push instruction that takes a 64-bit immediate: push with a 32-bit immediate sign-extends the value and pushes a full 8 bytes. Two such pushes would therefore occupy 16 bytes, and the final ret would only consume one of them. Instead, the push stores the low 4 bytes of the address, and the mov to [rsp+4] overwrites the sign-extended upper half with the high 4 bytes. As a consequence, the DLL injector must overwrite the placeholders with two memcpys, since the two 0x17171717 values are separated by the 0xc7 0x44 0x24 0x04 bytes that encode the mov (see the shellcode below).

Next, there is no pushad or popad instruction in x64, so the registers must be saved and later restored manually. And there are quite a few of them.

The mov rbp, rsp / and rsp, -16 / sub rsp, 32 sequence prepares the stack for the call. The fastcall calling convention requires rsp to be aligned to a 16-byte boundary at the point of the call, and it requires the caller to reserve 32 bytes of shadow space that the callee is free to overwrite, as explained here. Without the shadow space, LoadLibrary could clobber the registers saved just above. Because the thread can be suspended with any stack alignment, rsp is first saved in rbp, which is a non-volatile register and so survives the call, and then rounded down. Both and and sub modify the flags, which is fine because the flags have already been saved by pushfq. Getting the alignment right was the final touch I needed to get the shellcode working, and I haven't seen a single reference regarding x64 code caves that explains this.

The next three instructions are responsible for calling LoadLibrary. The 0x1818181818181818 value is a placeholder for the address of the path to the DLL file, which again must either be an absolute path or a path relative to the current directory of the target process. The 0x1919191919191919 value is a placeholder for the address of LoadLibrary. Again, this address is the same for all 64-bit processes due to how Kernel32.dll is loaded into memory. Also note that the path to the DLL file is moved to rcx instead of pushed onto the stack. This is how the first function argument is passed using the fastcall calling convention. More details can be found here.

The other instructions restore rsp from rbp, undo the changes made to the stack and give control back to the target process.

Shellcode can be obtained from the above assembly code using the ShellBlade tool:

const char shellcode[] =
        "\x68\x17\x17\x17\x17\xc7\x44\x24\x04\x17\x17\x17\x17\x9c\x50"
        "\x53\x51\x52\x56\x57\x55\x41\x50\x41\x51\x41\x52\x41\x53\x41"
        "\x54\x41\x55\x41\x56\x41\x57\x48\x89\xe5\x48\x83\xe4\xf0\x48"
        "\x83\xec\x20\x48\xb9\x18\x18\x18\x18\x18\x18\x18\x18\x48\xb8"
        "\x19\x19\x19\x19\x19\x19\x19\x19\xff\xd0\x48\x89\xec\x41\x5f"
        "\x41\x5e\x41\x5d\x41\x5c\x41\x5b\x41\x5a\x41\x59\x41\x58\x5d"
        "\x5f\x5e\x5a\x59\x5b\x58\x9d\xc3";

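The two kinds of 64-bit patching mentioned above can be sketched as follows: a contiguous 8-byte immediate (the DLL path and LoadLibrary placeholders) is written in one go, while the return address is written as two separate 4-byte halves because its placeholders are not adjacent in the stub. The helper names patch_u64 and patch_split_u64 are made up for this example, and the offsets are parameters because they must be read off the actual shellcode bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Store a 32-bit value in little-endian byte order, as x86 expects. */
static void put_u32le(unsigned char *dst, uint32_t v)
{
    dst[0] = (unsigned char)(v & 0xff);
    dst[1] = (unsigned char)((v >> 8) & 0xff);
    dst[2] = (unsigned char)((v >> 16) & 0xff);
    dst[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Overwrite one contiguous 8-byte immediate starting at off. */
void patch_u64(unsigned char *stub, size_t off, uint64_t value)
{
    put_u32le(stub + off, (uint32_t)(value & 0xffffffffu));
    put_u32le(stub + off + 4, (uint32_t)(value >> 32));
}

/* Overwrite a 64-bit value whose low and high halves live at two
 * different offsets, as is the case for the return address. */
void patch_split_u64(unsigned char *stub, size_t low_off, size_t high_off,
                     uint64_t value)
{
    put_u32le(stub + low_off, (uint32_t)(value & 0xffffffffu));
    put_u32le(stub + high_off, (uint32_t)(value >> 32));
}
```

For example, with a hypothetical return address of 0x00007FF812345678, the low placeholder receives the bytes 78 56 34 12 and the high placeholder receives F8 7F 00 00.
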
Other Links of Interest

A Walk in x64 Land

Introduction to x64 Assembly

Happy hacking!