Shellcode


Shellcoding is an excellent way to learn more about assembly language and how a program communicates with the underlying OS.

Why do red teamers and penetration testers write shellcode?

Because in real attacks, shellcode is code injected into a running program to make it do something it was not designed to do, for example through a buffer overflow. That is why shellcode is generally used as the “payload” of an exploit.

Boilerplate for Testing Shellcode

We will use the run.c program below to test all of our shellcode.

Code

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
 
int main() {
    char code[] = "Your shellcode here.";

    // Allocate executable memory
    void *exec = mmap(0, sizeof(code), PROT_READ | PROT_WRITE | PROT_EXEC, MAP_ANON | MAP_PRIVATE, -1, 0);
    if (exec == MAP_FAILED) {   // mmap reports failure with MAP_FAILED, not NULL
        perror("mmap");
        return 1;
    }

    // Copy the shellcode bytes into the executable region
    memcpy(exec, code, sizeof(code));

    // Cast and call
    ((void(*)())exec)();
    return 0;
}
 

Explanation


#include <sys/mman.h>

<sys/mman.h> is a C header file on Unix/Linux systems that provides access to memory management functions, most notably mmap() and munmap().

  • It gives us low-level control over how memory is allocated, protected, and shared in our program, beyond what malloc() or new can do.

Common Functions in <sys/mman.h>

Function     Description
mmap()       Maps memory (e.g., allocates a memory region that can be read, written, or executed). Often used for shared memory, file-backed memory, or raw executable memory for shellcode.
munmap()     Unmaps a memory region created by mmap().
mprotect()   Changes the protection (read/write/exec) of memory pages.
msync()      Flushes changes made in memory-mapped files to disk.

void *exec = mmap(0, sizeof(code), PROT_READ | PROT_WRITE | PROT_EXEC, MAP_ANON | MAP_PRIVATE, -1, 0);

This line is allocating a region of memory that is:

  • Readable,
  • Writable,
  • Executable,
  • Private to this process,
  • Not backed by any file.

Argument                              Meaning
0                                     Let the OS choose the address for the memory region.
sizeof(code)                          Size of the memory to allocate, in bytes (same size as your shellcode).
PROT_READ | PROT_WRITE | PROT_EXEC    The memory can be read, written, and executed.
MAP_ANON (or MAP_ANONYMOUS)           The memory is not backed by any file; it is just zero-initialized memory.
MAP_PRIVATE                           Changes are private to this process (not shared with others).
-1                                    File descriptor (not used, because this is an anonymous mapping).
0                                     Offset in the file (also not used here).

memcpy(exec, code, sizeof(code));

Copies the shellcode (from the code array) into the executable memory region pointed to by exec.

Part           Meaning
memcpy         A standard C function that copies memory.
exec           Destination address: the memory returned by mmap() that is marked executable.
code           Source address: your shellcode, stored as a byte array.
sizeof(code)   Number of bytes to copy (the shellcode plus the string's terminating null byte).

We can’t directly execute code stored in a regular data array (like char code[], which lives on the stack in run.c) on most modern operating systems, because that memory is marked non-executable by default for security. So we:

  1. Allocate a memory region that can be executed (mmap(...)).
  2. Copy the shellcode there with memcpy.
  3. Run it by casting and calling as a function.

((void(*)())exec)();

This line casts the exec pointer to a function pointer and then calls it like a normal function. Here’s a breakdown:

Step-by-step explanation:

Part                    Meaning
exec                    A void* pointer to memory where your shellcode is copied.
(void(*)())exec         Casts exec to a pointer to a function that returns void and takes no arguments.
((void(*)())exec)()     Calls that function. This jumps to the start of the shellcode and executes it.

Disable ASLR


What is ASLR?

Address Space Layout Randomization (ASLR) is a security feature used by operating systems like Windows, Linux, and macOS.

Why does ASLR exist?

Traditionally, memory addresses in a program were predictable. Attackers could guess where in memory to inject or run malicious code.
With ASLR:

  • The memory addresses used by a program change every time the program is run.
  • This makes it very hard for an attacker to know where their malicious code should go.

What gets randomized?

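Every time a program starts, these can be placed at random addresses:

  • The stack[1]
  • The heap[2]
  • The base address[3] of the executable (when it is built as a position-independent executable)
  • The location of shared libraries (DLLs on Windows, like kernel32.dll and user32.dll; .so files such as libc on Linux)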

Without ASLR:

  • Attacker knows: “My shellcode is always loaded at 0x00401000”
  • Easy to exploit.

With ASLR:

  • That address changes every time: 0x7FFD3000, 0x60400000, etc.
  • Exploit fails unless they guess correctly - much harder!

We can configure ASLR in Linux using the /proc/sys/kernel/randomize_va_space interface.

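  • 0 = No randomization
  • 1 = Conservative randomization (the stack, mmap base, shared libraries, and vDSO page are randomized)
  • 2 = Full randomization (additionally randomizes the heap)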

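To disable ASLR, run as root (the setting lasts until the next reboot):

echo 0 > /proc/sys/kernel/randomize_va_space

With sudo, the write itself must happen in a privileged process (sudo echo 0 > ... fails, because the redirection runs in your unprivileged shell):

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

To re-enable ASLR, run:

echo 2 > /proc/sys/kernel/randomize_va_space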

Assembly


x86 Intel Register Basics

Register   Size     Purpose
EAX        32-bit   Accumulator: used for arithmetic, I/O, syscall return values
EBX        32-bit   Base: often holds pointers or syscall arguments
ECX        32-bit   Counter: loop counters or shifts
EDX        32-bit   Data: used in arithmetic, I/O, or syscalls

Each has smaller parts:

  • AX (the lower 16 bits of EAX), split into AH (upper 8 bits of AX) and AL (lower 8 bits)
  • Example: EAX contains AX, and AX is made of AH + AL

Common Assembly Instructions

Instruction     Meaning
mov eax, 32     Move 32 into EAX (EAX = 32)
xor eax, eax    Clear EAX (EAX = 0) using the XOR trick
push eax        Push the value of EAX onto the stack
pop ebx         Pop the top of the stack into EBX
call func       Call a function (like in C)
int 0x80        Trigger a Linux syscall interrupt

What is a Syscall?

A system call lets our program ask the Linux kernel to do something privileged, like:

  • Open a file
  • Print text
  • Exit a program
  • Allocate memory

In C, library functions such as write() and exit() make these requests for us. In assembly we make them ourselves, by telling the CPU directly which request we want and with which arguments.

Steps to Make a Linux Syscall (in x86, 32-bit)

  1. Put the syscall number in EAX.

    • This tells Linux which syscall we want.
    • Example: EAX = 1 → exit()
  2. Put the arguments in these registers:

Argument #    Register
1st           EBX
2nd           ECX
3rd           EDX
4th           ESI
5th           EDI
6th           EBP

  3. Trigger the syscall using:
int 0x80
    • This causes a software interrupt → the kernel takes over and runs our request.
  4. The result (a return value, or a negative error code on failure) is placed in EAX.

Example: Exit with status 0

mov eax, 1      ; syscall number for exit()
mov ebx, 0      ; exit code 0
int 0x80        ; make the syscall

This tells the kernel: “I want to exit with code 0.”

Where are syscall numbers listed?

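  • File: /usr/include/asm/unistd_32.h (on Debian/Ubuntu x86-64 systems it may be at /usr/include/x86_64-linux-gnu/asm/unistd_32.h instead)
  • Or run man 2 syscall for each architecture's calling convention, and man 2 syscalls for the list of system calls
    Examples (32-bit x86):
  • exit() → syscall number = 0x1
  • exit_group() → syscall number = 0xfc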

Example: C Program exit(0)

exit0.c:

#include <stdlib.h>
int main(void) {
    exit(0);
}

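Compile & Disassemble (-masm=intel only affects the assembly GCC generates; GDB still shows AT&T syntax, as in the output below, unless you run set disassembly-flavor intel):

gcc -masm=intel -static -m32 -o exit0 exit0.c
gdb -q ./exit0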
voldemort@IdeaPad:~/Malware$ gdb -q ./exit0
Reading symbols from ./exit0...
 
This GDB supports auto-downloading debuginfo from the following URLs:
  <https://debuginfod.ubuntu.com>
Enable debuginfod for this session? (y or [n]) n
Debuginfod has been disabled.
To make this setting permanent, add 'set debuginfod enabled off' to .gdbinit.
(No debugging symbols found in ./exit0)
(gdb) disas _exit
Dump of assembler code for function _exit:
   0x08054030 <+0>:	call   0x804989a <__x86.get_pc_thunk.ax>
   0x08054035 <+5>:	add    $0x8efbf,%eax
   0x0805403a <+10>:	push   %ebx
   0x0805403b <+11>:	mov    $0xfc,%edx
   0x08054040 <+16>:	mov    0x8(%esp),%ebx
   0x08054044 <+20>:	mov    $0xffffffe0,%ecx
   0x0805404a <+26>:	jmp    0x8054051 <_exit+33>
   0x0805404c <+28>:	lea    0x0(%esi,%eiz,1),%esi
   0x08054050 <+32>:	hlt
   0x08054051 <+33>:	mov    %edx,%eax
   0x08054053 <+35>:	call   *%gs:0x10
   0x0805405a <+42>:	cmp    $0xfffff000,%eax
   0x0805405f <+47>:	jbe    0x8054050 <_exit+32>
   0x08054061 <+49>:	neg    %eax
   0x08054063 <+51>:	mov    %eax,%gs:(%ecx)
   0x08054066 <+54>:	jmp    0x8054050 <_exit+32>
End of assembler dump.
(gdb) run
Starting program: /home/voldemort/Malware/exit0 
[Inferior 1 (process 47630) exited normally]
(gdb) break exit
Breakpoint 1 at 0x804c570
(gdb) run
Starting program: /home/voldemort/Malware/exit0 
 
Breakpoint 1, 0x0804c570 in exit ()
(gdb) disas
Dump of assembler code for function exit:
=> 0x0804c570 <+0>:	push   %esi
   0x0804c571 <+1>:	pop    %esi
   0x0804c572 <+2>:	call   0x804989a <__x86.get_pc_thunk.ax>
   0x0804c577 <+7>:	add    $0x96a7d,%eax
   0x0804c57c <+12>:	sub    $0xc,%esp
   0x0804c57f <+15>:	lea    0x6c(%eax),%eax
   0x0804c585 <+21>:	push   $0x1
   0x0804c587 <+23>:	push   $0x1
   0x0804c589 <+25>:	push   %eax
   0x0804c58a <+26>:	push   0x1c(%esp)
   0x0804c58e <+30>:	call   0x804c2d0 <__run_exit_handlers>
End of assembler dump.
(gdb)  

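This shows how a C library call turns into machine instructions. exit() first runs the registered exit handlers (__run_exit_handlers). _exit() then loads 0xfc (exit_group) into EAX by way of EDX, puts the exit status in EBX, and enters the kernel with call *%gs:0x10, the entry point glibc typically uses on 32-bit x86 instead of a literal int 0x80.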

Summary

  • Registers like EAX, EBX are essential in low-level code.
  • Assembly instructions control data movement and logic.
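  • Linux syscalls require loading registers and then entering the kernel; in our 32-bit shellcode that is done with int 0x80.
  • exit() runs the registered exit handlers and then calls _exit(), a thin wrapper around the exit_group syscall (EAX = 0xfc); our shellcode uses the plain exit syscall (EAX = 1) directly.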

Importance of the null byte

Create meow.c with the code below:

#include <stdio.h>
int main(void) {
	printf ("=^..^= meow \x00 meow");
	return 0;
}

Compile and run

gcc -m32 -w -o meow meow.c
./meow


printf ("=^..^= meow \x00 meow");

  • This line prints a formatted string to the console using printf().

Now, let’s break down the string =^..^= meow \x00 meow:

  • =^..^=: This is just a simple string, which is the face of a cat or kitten in “ASCII art” (often referred to as a “meow face”).
  • meow: This is the literal string “meow”.
  • \x00: This is the hexadecimal escape sequence in C. \x00 represents a null character (ASCII value 0). It’s a special character that typically marks the end of a string in C.
    • In this case, \x00 is stored in the array like any other byte, but printf() stops reading the string when it reaches it, so nothing from that point on is printed.
  • meow: Another literal string “meow”.

So, the final string passed to printf() will look something like this in memory:

=^..^= meow [null byte] meow

Explanation of the Output:

  • =^..^=: This is the “meow face” or ASCII art representing a cat.
  • meow: This is the first “meow”, the last text before the null byte.
  • After this, we see the prompt (voldemort@IdeaPad:~/Malware$) on the same line, because the printed text does not end with a newline. This is just the normal shell prompt, indicating that our program has finished executing.

Why nothing after \x00 showed up:

The null character \x00 (hexadecimal value 0) is a non-printable character that C uses to mark the end of a string. printf() receives only a pointer to the start of the string, so it has no way to know that more characters follow the null byte.

The null byte and the second “meow” are still present in memory, but printf() stops at the first null byte, which comes right after the first “meow”. That is why the second “meow” never appears.

So, even though \x00 is in the string, it won’t be printed, and the output we get is:

=^..^= meow
  • The null byte \x00 is a non-printable character that serves as a string terminator in C, but doesn’t actually show up in the output.
  • Our terminal prompt (voldemort@IdeaPad:~/Malware$) is shown after the program finishes running, but this is separate from the program output itself.

Assembly Code

Program 1

; exit1.asm
 
section .data
 
section .bss
 
section .text
	global _start
 
_start:
	mov eax, 0
	mov eax, 1
	int 0x80
Explanation
  1. section .data

    • This is where you put variables whose values are already known (initialized data).
      Example:
      section .data
      msg db "Hello", 0
  2. section .bss

    • This is for uninitialized variables that you fill in later; this area starts out zero-filled.
      Example:
      section .bss
      buffer resb 64  ; reserve 64 bytes

Think: .data = pre-filled, .bss = empty box.

  3. section .text

    • This is where your actual instructions (code) live.
      Execution begins at the entry point inside this section, much like main() in C.
  4. What is global _start?

    • global _start makes the _start label visible to the linker, which uses it as the program's entry point.
    • When the program runs, Linux asks:
      “Where do I start?”
    • So we write:
      global _start
      _start:
    • This is like saying: “Start from here!”
      It plays the role of int main() in C (in a C program, the C runtime's own _start calls main()).
mov eax, 0
  • This moves the value 0 into the eax register.
  • But it’s immediately overwritten, so it does nothing useful here; it is included so we can inspect its machine code in the objdump output below.
mov eax, 1
  • eax is like a temporary number holder.
  • We say: “Put number 1 in eax.”

Why number 1? Because:
System calls = Asking the OS to do something

Linux gives us a list of numbers. Each number = a request:

Number    Syscall
1         exit()
4         write()
5         open()
6         close()
11        execve()
We can find it at /usr/include/asm/unistd_32.h
So:
mov eax, 1
int 0x80

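Means: “Hey Linux, I want to exit.”

The exit status comes from EBX, which this program never sets; it exits with status 0 because Linux starts a new 32-bit process with its general-purpose registers cleared.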

What is int 0x80?

  • It switches to kernel mode (a special mode where Linux runs).
  • It tells Linux: “Please perform the syscall I asked for.”
  • It’s like hitting the enter key after typing the syscall.
Compile and Investigate exit1.asm
nasm -f elf32 -o exit1.o exit1.asm
ld -m elf_i386 -o exit1 exit1.o
./exit1
objdump -M intel -d exit1


Let’s look again at this part of the objdump -d output:

08049000 <_start>:
 8049000:	b8 00 00 00 00        mov    eax,0x0
 8049005:	b8 01 00 00 00        mov    eax,0x1
 804900a:	cd 80                 int    0x80

Each line shows the address, then the machine code in hex (byte by byte), then the disassembled instruction:

Line 1:

8049000:	b8 00 00 00 00
  • b8 = opcode for mov eax, <value>
  • 00 00 00 00 = the 4-byte value, which is 0x00000000
    This is 4 zero bytes being written into eax.

Line 2:

8049005:	b8 01 00 00 00
  • Again b8 = mov to eax
  • Now the value is 0x00000001 = 1 in 4 bytes
  • So here you have three 0x00 bytes, and one 0x01

Line 3:

804900a:	cd 80
  • cd is the opcode for int (interrupt)
  • 80 is the interrupt number 0x80 (the syscall gateway)

Program 2

;exit2.asm
 
section .data
 
section .bss
 
section .text
	global _start
 
_start:
	xor eax, eax   ;zero out eax
	xor ebx, ebx
	mov al, 1
	
	int 0x80
Compile and Investigate exit2.asm
nasm -f elf32 -o exit2.o exit2.asm
ld -m elf_i386 -o exit2 exit2.o
./exit2
objdump -M intel -d exit2

Difference Between Program 1 and Program 2

The main idea is:

Avoid null (0x00) bytes in your shellcode.

Why are null bytes a problem?

  • In C strings, the null byte (0x00) means “end of string”.
  • So, if our shellcode (which is often injected as a string) contains a null byte in the middle, it might get truncated or cut off, making it fail.
    Example:
char shellcode[] = "\xB8\x01\x00\x00\x00"; // mov eax, 1

That shellcode has 3 null bytes (\x00\x00\x00), so string functions such as strcpy() or strlen() would treat it as just:

"\xB8\x01"

Because it sees the first \x00 and thinks “end of string.”

❌ This version:

mov eax, 1     ; Generates 5 bytes: B8 01 00 00 00

This has three null bytes: 00 00 00. Bad for shellcode.

✅ This version:

xor eax, eax   ; Zero the whole register (generates 2 bytes: 31 C0 — no nulls)
mov al, 1      ; Set just the low 8 bits (generates: B0 01 — again, no nulls)

This version does not contain any null bytes, so it’s safe to embed into C strings or payloads.

Summary

  • Null bytes (0x00) terminate C strings, so they break shellcode if embedded.
  • mov eax, 1 → generates 3 null bytes in machine code. ❌
  • xor eax, eax + mov al, 1 → no null bytes. ✅
  • EAX is a 32-bit register. AX, AH, AL are parts of it:
    • AX = lower 16 bits of EAX
    • AL = lower 8 bits (used to avoid full 32-bit operations)
  • Use mov al, 1 instead of mov eax, 1 to avoid 0x00 bytes in shellcode.

Testing our first Program

We will use the output of Program 2 in the testing boilerplate.

Program 2’s Output

So, the bytes we need are 31 c0 31 db b0 01 cd 80 (xor eax, eax; xor ebx, ebx; mov al, 1; int 0x80). Replace the code in run.c with:

//run.c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
 
int main() {
    char code[] = "\x31\xc0\x31\xdb\xb0\x01\xcd\x80";
 
    // Allocate executable memory
    void *exec = mmap(0, sizeof(code), PROT_READ | PROT_WRITE | PROT_EXEC, MAP_ANON | MAP_PRIVATE, -1, 0);
 
    memcpy(exec, code, sizeof(code));
 
    // Cast and call
    ((void(*)())exec)();
 
    return 0;
}
 

Now, compile and Run:

gcc -m32 -o run run.c
./run
echo $?


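echo $? prints 0, the exit status passed to the exit syscall in EBX. Because main() would also return 0, this alone does not prove the shellcode ran; to confirm it, set BL to a nonzero value before int 0x80, for example with mov bl, 42 (bytes b3 2a), and echo $? should then print 42.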

Program 3

In this program we will spawn a Linux shell using the execve syscall.

Code
;shell.asm
section .data
	msg: db '/bin/sh', 0    ; execve needs a null-terminated path
	
section .bss
	
section .text
	global _start
	
_start:
	xor eax, eax    ; zero out eax
	xor ebx, ebx    ; zero out ebx
	xor ecx, ecx    ; zero out ecx (argv = NULL)
	xor edx, edx    ; zero out edx (envp = NULL)
	
	mov al, 0xb     ; mov eax, 11: execve
	mov ebx, msg    ; load the string pointer to ebx
	int 0x80        ; syscall (does not return if execve succeeds)
	
	;normal exit, reached only if execve failed
	xor eax, eax    ; clear the error code execve left in eax
	mov al, 1       ; sys_exit system call
	xor ebx, ebx    ; no errors (mov ebx, 0)
	int 0x80        ; call sys_exit
Compile and run
nasm -f elf32 -o shell.o shell.asm 
ld -m elf_i386 -o shell shell.o
./shell

Program 4

In this program we will write code without any null bytes, using the stack to store the /bin/sh string.

Code
;shell2.asm
section .bss
	
section .text
	global _start
	
_start:
	
	xor eax, eax     ; zero out eax
	xor ebx, ebx     ; zero out ebx
	xor ecx, ecx     ; zero out ecx
	xor edx, edx     ; zero out edx
	
	push eax         ; string terminator
	push 0x68732f6e  ; "hs/n"
	push 0x69622f2f  ; "ib//"
	mov ebx, esp     ; "//bin/sh", 0 pointer is ESP
	mov al, 0xb     ; mov eax, 11: execve
	int 0x80         ; syscall
Explanation
Instruction       Purpose                                    Value pushed    Bytes in memory (low to high)
push eax          Push the string terminator                 0x00000000      00 00 00 00 (null bytes)
push 0x68732f6e   Push part of the string: n/sh (reversed)   0x68732f6e      n, /, s, h
push 0x69622f2f   Push part of the string: //bi (reversed)   0x69622f2f      /, /, b, i
mov ebx, esp      Point EBX at the start of the string       (copies ESP)    EBX now points to //bin/sh\0

The extra slash pads the path to 8 bytes so it fits exactly in two pushes; Linux treats //bin/sh the same as /bin/sh.

Memory Order (lowest address first; ESP points at the last value pushed, because the stack grows toward lower addresses)

Address   Bytes         Meaning
ESP       2f 2f 62 69   //bi
ESP+4     6e 2f 73 68   n/sh
ESP+8     00 00 00 00   NULL terminator (\0)

All values are in little-endian, which is why the byte order appears reversed in memory.

Assemble and check if it works properly
nasm -f elf32 -o shell2.o shell2.asm 
ld -m elf_i386 -o shell2 shell2.o
./shell2
objdump -M intel -d shell2

Now we will use these bytes in the testing code.

Extracting Byte Code

objdump -d ./shell2|grep '[0-9a-f]:'|grep -v 'file'| \
cut -f2 -d:|cut -f1-6 -d' '|tr -s ' '|tr '\t' ' '| \
sed 's/ $//g'|sed 's/ /\\x/g'|paste -d '' -s | \
sed 's/^/"/'|sed 's/$/"/g'


Replace the byte string in run.c (the testing code):

//run.c
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
 
int main() {
    char code[] = "\x31\xc0\x31\xdb\x31\xc9\x31\xd2\x50\x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\xb0\x0b\xcd\x80";
 
    // Allocate executable memory
    void *exec = mmap(0, sizeof(code), PROT_READ | PROT_WRITE | PROT_EXEC, MAP_ANON | MAP_PRIVATE, -1, 0);
 
    memcpy(exec, code, sizeof(code));
 
    // Cast and call
    ((void(*)())exec)();
 
    return 0;
}
Compile and run code
gcc -m32 -o run run.c
./run

Footnotes

  1. What is the Stack?

    • The stack is a part of memory used for:
      • Function calls
      • Local variables
      • Return addresses

    📦 Think of it like a pile of plates:

    • We push a plate (data) on top when entering a function.
    • We pop it off when the function exits.
    • On x86, it grows downward (toward lower addresses).

    🔁 Automatically managed (we don’t manually free stack memory).

    📌 Example:

    void foo() {
       int a = 5;  // stored on the stack
    }
  2. What is the Heap?

    • The heap is a larger memory area used for:
      • Dynamically allocated memory (e.g., malloc, new)

    🧰 Think of it like a storage warehouse:

    • We ask for space.
    • We get a pointer.
    • We must free it manually when done.

    📌 Example:

    int *a = malloc(sizeof(int));  // the int lives on the heap (a itself is on the stack)
    free(a);                       // release it when done
  3. What is a Base Address?

    • The base address is the starting address of:
      • A memory segment (like a loaded program, DLL, or memory block).
      • A pointer’s target memory region.

    🧭 Think of it as a “starting point” in memory

    • We can calculate offsets from it (like base + 4, etc.)