Global vs. Local Variables
Global Variables
C Code:
int x = 1;
int y = 2;
void main() {
x = x + y;
printf("total = %d\n", x);
}
Assembly (Listing 6-3):
00401003 mov eax, dword_40CF60 ; load global x
00401008 add eax, dword_40C000 ; add global y
0040100E mov dword_40CF60, eax ; store result back to x
00401013 mov ecx, dword_40CF60 ; load x again for printf
00401019 push ecx ; push x
0040101A push offset aTotalD ; push format string "total = %d\n"
0040101F call printf ; call printf
Local Variables
C Code:
void main() {
int x = 1;
int y = 2;
x = x + y;
printf("total = %d\n", x);
}
Assembly (without labels):
00401006 mov dword ptr [ebp-4], 1 ; x = 1
0040100D mov dword ptr [ebp-8], 2 ; y = 2
00401014 mov eax, [ebp-4] ; load x
00401017 add eax, [ebp-8] ; add y
0040101A mov [ebp-4], eax ; store result back to x
0040101D mov ecx, [ebp-4] ; load x again for printf
00401020 push ecx ; push x
00401021 push offset aTotalD ; push format string
00401026 call printf ; call printf
- Global: memory addresses like dword_40CF60, program-wide scope.
- Local: stack offsets like [ebp-4] or [ebp+var_4], function-limited scope.
- Without labels, the listing uses explicit stack offsets ([ebp-4], [ebp-8], etc.).
- This is harder to read because you have to track what’s at which offset. Here, you have to remember that [ebp-4] → x and [ebp-8] → y.
Assembly (with labels):
00401006 mov [ebp+var_4], 1 ; x = 1
0040100D mov [ebp+var_8], 2 ; y = 2
00401014 mov eax, [ebp+var_4] ; load x
00401017 add eax, [ebp+var_8] ; add y
0040101A mov [ebp+var_4], eax ; store result back to x
0040101D mov ecx, [ebp+var_4] ; load x again for printf
00401020 push ecx ; push x
00401021 push offset aTotalD ; push format string
00401026 call printf ; call printf
- Uses symbolic names (like var_4, var_8) assigned by a disassembler (like IDA Pro). The name var_4 stands for the constant -4, so [ebp+var_4] is the same address as [ebp-4].
- Much easier to read because you don’t have to track offsets in your head.
- Tools can rename var_4 to x directly:
  - var_4 → local variable x
  - var_8 → local variable y

The actual binary code and execution don’t change at all. It’s purely about readability during analysis.
Without Label | With Label | What It Means |
---|---|---|
[ebp-4] | [ebp+var_4] | Local variable x |
[ebp-8] | [ebp+var_8] | Local variable y |
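The scope difference can also be seen from the C side. The following is a minimal sketch; the names g_x, add_to_global, and add_to_local are illustrative and do not come from the listings. A global keeps its value between calls because it lives at one fixed address, while a local is set up again in every new stack frame.

```c
/* Global: one fixed location in the data section (compare dword_40CF60). */
int g_x = 1;

/* Adds y to the global; the change persists across calls. */
int add_to_global(int y) {
    g_x = g_x + y;
    return g_x;
}

/* Adds y to a local; x is re-initialized each time the function is entered. */
int add_to_local(int y) {
    int x = 1;   /* in an unoptimized 32-bit build this would sit at a stack offset */
    x = x + y;
    return x;
}
```

Calling add_to_local(2) repeatedly always yields the same result, while each add_to_global(2) call builds on the previous one.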
Disassembling Arithmetic Operations
C Code:
int a = 0;
int b = 1;
a = a + 11;
a = a - b;
a--;
b++;
b = a % 3;
This initializes two integers, a and b, then performs a series of arithmetic operations: addition, subtraction, decrement, increment, and modulo.
Assembly:
00401006 mov [ebp+var_4], 0 ; a = 0
0040100D mov [ebp+var_8], 1 ; b = 1
00401014 mov eax, [ebp+var_4] ; load a
00401017 add eax, 0Bh ; eax = a + 11
0040101A mov [ebp+var_4], eax ; a = a + 11
0040101D mov ecx, [ebp+var_4] ; load a
00401020 sub ecx, [ebp+var_8] ; ecx = a - b
00401023 mov [ebp+var_4], ecx ; a = a - b
00401026 mov edx, [ebp+var_4] ; load a
00401029 sub edx, 1 ; edx = a - 1
0040102C mov [ebp+var_4], edx ; a--
0040102F mov eax, [ebp+var_8] ; load b
00401032 add eax, 1 ; eax = b + 1
00401035 mov [ebp+var_8], eax ; b++
00401038 mov eax, [ebp+var_4] ; load a into eax
0040103B cdq ; sign-extend eax into edx:eax
0040103C mov ecx, 3 ; load 3 into ecx
00401041 idiv ecx ; divide edx:eax by ecx
00401043 mov [ebp+var_8], edx ; store remainder (modulo) in b
Explanation:
- Initialization: a (var_4) = 0, b (var_8) = 1.
- Addition (a + 11): the constant 0Bh (11) is added to a by way of eax, and the result ends up back in a.
- Subtraction (a - b): b is subtracted from a by way of ecx, and the result is stored in a.
- Decrement (a--): rather than using dec, the compiler uses sub with 1.
- Increment (b++): uses add with 1 instead of inc.
- Modulo (b = a % 3): a is loaded into eax, cdq sign-extends it into edx:eax, and idiv ecx divides edx:eax by 3. idiv leaves the quotient in eax and the remainder in edx, and the remainder is stored as b.
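To check the arithmetic by hand, the sequence can be replayed in C. This is a rough sketch: run_arithmetic and idiv_model are hypothetical helper names, and idiv_model only imitates how a single idiv produces both a quotient and a remainder.

```c
#include <stdint.h>

/* Replays the operations from the C source; returns the final a and
   writes the final b through out_b. */
int32_t run_arithmetic(int32_t *out_b) {
    int32_t a = 0;
    int32_t b = 1;
    a = a + 11;   /* the 0Bh constant in the listing */
    a = a - b;
    a--;
    b++;
    b = a % 3;    /* remainder, which idiv leaves in edx */
    *out_b = b;
    return a;
}

/* Models idiv: one division gives the quotient (eax) and remainder (edx). */
void idiv_model(int32_t dividend, int32_t divisor,
                int32_t *quot, int32_t *rem) {
    *quot = dividend / divisor;  /* C truncates toward zero, as idiv does */
    *rem  = dividend % divisor;  /* remainder takes the sign of the dividend */
}
```

Because both C (since C99) and idiv truncate toward zero, a negative dividend gives a negative remainder, which is worth remembering when reading modulo code in a disassembly.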
Recognizing If Statements
if statements in C let a program choose different execution paths based on a condition. Let’s see how this is translated to assembly.
C Code Example:
int x = 1;
int y = 2;
if (x == y) {
printf("x equals y.\n");
} else {
printf("x is not equal to y.\n");
}
Assembly Code:
00401006 mov [ebp+var_8], 1 ; x = 1
0040100D mov [ebp+var_4], 2 ; y = 2
00401014 mov eax, [ebp+var_8] ; load x
00401017 cmp eax, [ebp+var_4] ; compare x and y
0040101A jnz short loc_40102B ; jump if not equal
0040101C push offset aXEqualsY ; "x equals y.\n"
00401021 call printf
00401026 add esp, 4
00401029 jmp short loc_401038 ; jump to end
0040102B loc_40102B:
0040102B push offset aXIsNotEqualToY ; "x is not equal to y.\n"
00401030 call printf
- The comparison (cmp) checks if x equals y.
- The conditional jump (jnz) means “jump if not equal.” If x is not equal to y, the jump occurs to the else block (loc_40102B).
- If x equals y, the jump is skipped, and the code prints "x equals y."
- At the end of the if block, an unconditional jump (jmp) skips over the else block, ensuring only one branch runs.
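The jump structure described above can be written back into C with goto, which makes the inverted condition visible. This is only an illustrative sketch: pick_message and the two MSG_ constants are made-up names, and the function returns the chosen string instead of calling printf.

```c
static const char MSG_EQUAL[]     = "x equals y.\n";
static const char MSG_NOT_EQUAL[] = "x is not equal to y.\n";

/* if/else laid out the way the compiler emitted it: test the inverted
   condition, jump over the "then" block, and jump past the else at the end. */
const char *pick_message(int x, int y) {
    const char *msg;
    if (x != y) goto else_block;   /* cmp + jnz loc_40102B */
    msg = MSG_EQUAL;               /* fall through when x == y */
    goto end;                      /* jmp loc_401038 */
else_block:
    msg = MSG_NOT_EQUAL;
end:
    return msg;
}
```

Note that the source says x == y while the branch instruction tests "not equal"; compilers commonly invert the condition so the "then" block can follow the comparison directly.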
Recognizing Nested if Statements
Nested if statements test multiple conditions and can add complexity in assembly.
C Code:
int x = 0;
int y = 1;
int z = 2;
if (x == y) {
if (z == 0) {
printf("z is zero and x = y.\n");
} else {
printf("z is non-zero and x = y.\n");
}
} else {
if (z == 0) {
printf("z zero and x != y.\n");
} else {
printf("z non-zero and x != y.\n");
}
}
Assembly Code:
00401006 mov [ebp+var_8], 0 ; x=0
0040100D mov [ebp+var_4], 1 ; y=1
00401014 mov [ebp+var_C], 2 ; z=2
0040101B mov eax, [ebp+var_8] ; load x
0040101E cmp eax, [ebp+var_4] ; compare x and y
00401021 jnz short loc_401047 ; jump if x != y
00401023 cmp [ebp+var_C], 0 ; check z==0
00401027 jnz short loc_401038 ; jump if z!=0
00401029 push offset aZIsZeroAndXY_ ; "z is zero and x = y.\n"
0040102E call printf
00401033 add esp, 4
00401036 jmp short loc_401045 ; jump to end
00401038 loc_401038:
00401038 push offset aZIsNonZeroAndX ; "z is non-zero and x = y.\n"
0040103D call printf
00401042 add esp, 4
00401045 loc_401045:
00401045 jmp short loc_401069 ; skip else section
00401047 loc_401047:
00401047 cmp [ebp+var_C], 0 ; check z==0 again
0040104B jnz short loc_40105C ; jump if z!=0
0040104D push offset aZZeroAndXY_ ; "z zero and x != y.\n"
00401052 call printf
00401057 add esp, 4
0040105A jmp short loc_401069
0040105C loc_40105C:
0040105C push offset aZNonZeroAndXY_ ; "z non-zero and x != y.\n"
00401061 call printf
- The first comparison (cmp + jnz) at 0040101E checks if x != y.
- If x == y, it compares z to 0 (cmp + jnz) at 00401023 and chooses which message to print.
- If x != y, another comparison (cmp + jnz) at 00401047 chooses between the last two messages.
Recognizing Loops
For Loop:
C Code
int i;
for(i=0; i<100; i++) {
printf("i equals %d\n", i);
}
Assembly Code:
00401004 mov [ebp+var_4], 0 ; i=0
0040100B jmp short loc_401016 ; jump to compare first
0040100D loc_40100D:
0040100D mov eax, [ebp+var_4] ; increment loop variable
00401010 add eax, 1
00401013 mov [ebp+var_4], eax ; store incremented value
00401016 loc_401016:
00401016 cmp [ebp+var_4], 64h ; compare i with 100
0040101A jge short loc_40102F ; if i >= 100, jump to end
0040101C mov ecx, [ebp+var_4] ; execution: print i
0040101F push ecx
00401020 push offset aID ; "i equals %d\n"
00401025 call printf
0040102A add esp, 8
0040102D jmp short loc_40100D ; jump to increment
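The listing shows the four parts of a for loop in the order the compiler placed them: initialization, a jump to the comparison, the increment, and then the comparison followed by the body. A goto-based sketch of the same layout follows; count_body_runs is a hypothetical name, and it counts iterations instead of calling printf.

```c
/* Counts how many times the loop body runs, with gotos that follow the
   listing's layout: init, jump to compare, increment, compare, body. */
int count_body_runs(int limit) {
    int runs = 0;
    int i;

    i = 0;                        /* mov [ebp+var_4], 0 */
    goto compare;                 /* jmp loc_401016 */
increment:
    i = i + 1;                    /* loc_40100D */
compare:
    if (i >= limit) goto done;    /* cmp + jge loc_40102F */
    runs++;                       /* loop body (printf in the original) */
    goto increment;               /* jmp loc_40100D */
done:
    return runs;
}
```

The backward jump to the increment block, combined with a forward conditional jump out of the loop, is the pattern to look for when spotting a for loop in a disassembly.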
while Loop
The while loop is commonly used in malware and other software to repeat code execution until a condition is met, such as waiting for a packet or command.
C Code for a while Loop:
int status=0;
int result=0;
while(status == 0) {
result = performAction();
status = checkResult(result);
}
Assembly Code:
00401036 mov [ebp+var_4], 0 ; status = 0
0040103D mov [ebp+var_8], 0 ; result = 0
00401044 loc_401044:
00401044 cmp [ebp+var_4], 0 ; check if status == 0
00401048 jnz short loc_401063 ; exit loop if not zero
0040104A call performAction ; result = performAction()
0040104F mov [ebp+var_8], eax
00401052 mov eax, [ebp+var_8]
00401055 push eax
00401056 call checkResult ; status = checkResult(result)
0040105B add esp, 4
0040105E mov [ebp+var_4], eax
00401061 jmp short loc_401044 ; jump back to start of loop
Key Points:
- The loop starts at loc_401044, where it compares status with 0.
- If status is not zero (jnz), the loop exits at loc_401063.
- If status is zero, performAction() and checkResult() are executed, updating status.
- An unconditional jump (jmp) at the end ensures the loop repeats until the condition is false.
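As a rough illustration of these points, the loop can be written with goto so that each jump in the listing has a counterpart. The bodies of performAction and checkResult below are invented stand-ins (the original does not show them), and wait_for_status is a hypothetical wrapper name.

```c
static int attempts = 0;

/* Hypothetical stand-in: each call returns a new attempt number. */
static int performAction(void) { return ++attempts; }

/* Hypothetical stand-in: returns nonzero once the third attempt is seen. */
static int checkResult(int result) { return result >= 3; }

/* The while loop as jumps: condition at the top, jump back at the bottom. */
int wait_for_status(void) {
    int status = 0;
    int result = 0;
loop_top:
    if (status != 0) goto loop_exit;   /* cmp + jnz loc_401063 */
    result = performAction();
    status = checkResult(result);
    goto loop_top;                     /* jmp loc_401044 */
loop_exit:
    return result;
}
```

Unlike the for loop, there is no separate increment block; the only backward jump goes straight to the comparison.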
Function Call Conventions
Function calls can vary in how arguments are passed, who cleans up the stack, and where the return value is stored. These rules are known as calling conventions, and they are crucial for understanding assembly code and interfacing with APIs.
Key Conventions
cdecl
- Parameters: pushed right-to-left
- Stack cleanup: caller
- Return value: in EAX
- Example:
push c
push b
push a
call test
add esp, 12 ; caller cleans up
mov ret, eax ; return value arrives in EAX
stdcall
- Same parameter order as cdecl.
- Callee cleans up the stack (no add esp in caller).
- Used in Windows API.
fastcall
- First few arguments (often 2) passed in registers (e.g., ECX, EDX).
- Additional arguments on stack, right-to-left.
- Stack cleanup varies by compiler; under Microsoft's __fastcall, the callee pops any stack arguments.
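The cdecl sequence (push right-to-left, call, caller cleanup) can be mimicked with a toy stack in plain C. This is a simplified model and not how a compiler implements calls: stack_mem, sp, push, callee, call_cdecl, and stack_is_balanced are all invented for illustration, and the return address that a real call pushes is left out.

```c
#define STACK_SLOTS 16

static int stack_mem[STACK_SLOTS];
static int sp = STACK_SLOTS;      /* index of the top slot; grows downward like ESP */

static void push(int value) { stack_mem[--sp] = value; }

/* Callee: reads its arguments relative to the top of the stack. */
static int callee(void) {
    int a = stack_mem[sp];        /* first argument, pushed last */
    int b = stack_mem[sp + 1];
    int c = stack_mem[sp + 2];    /* last argument, pushed first */
    return a * 100 + b * 10 + c;  /* encodes the order the callee saw */
}

/* cdecl-style call of test(a, b, c). */
int call_cdecl(int a, int b, int c) {
    push(c);                      /* push c */
    push(b);                      /* push b */
    push(a);                      /* push a */
    int ret = callee();           /* call test; result comes back "in EAX" */
    sp += 3;                      /* add esp, 12: caller removes three 4-byte slots */
    return ret;
}

/* Nonzero when every pushed slot has been removed again. */
int stack_is_balanced(void) { return sp == STACK_SLOTS; }
```

In a stdcall version of this model, the sp += 3 step would move into callee, which is why the caller shows no add esp after the call.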
Push vs. Move
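Compilers may either push arguments onto the stack or move them directly into stack slots that were reserved in advance.
For instance, for the call adder(x, y) in the C code below (arguments go right-to-left, so y is placed first):
- Visual Studio (pushes):
push y
push x
call adder
- GCC (moves directly):
mov [esp+4], y
mov [esp], x
call adder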
C code:
int adder(int a, int b) {
return a + b;
}
void main() {
int x = 1, y = 2;
printf("Result: %d\n", adder(x, y));
}
The assembly code for adder is:
00401730 push ebp
00401731 mov ebp, esp
00401733 mov eax, [ebp+arg_0]
00401736 add eax, [ebp+arg_4]
00401739 pop ebp
0040173A retn
Visual Studio vs. GCC
Visual Studio (push) | GCC (move) |
---|---|
push y | mov [esp+4], y |
push x | mov [esp], x |
call adder | call adder |
add esp, 8 (cleanup) | No cleanup needed |
printf uses similar variations | printf uses similar variations |
- Even for the same compiler, these conventions can change based on build settings.
- Always check disassembly to confirm the convention.
- Knowing these conventions helps you reverse-engineer or interface with compiled code confidently.
Switch Statements
C Code:
switch(i)
{
case 1:
printf("i = %d", i+1);
break;
case 2:
printf("i = %d", i+2);
break;
case 3:
printf("i = %d", i+3);
break;
default:
break;
}
Assembly:
00401013 cmp [ebp+var_8], 1
00401017 jz short loc_401027 ; jump if i == 1
00401019 cmp [ebp+var_8], 2
0040101D jz short loc_40103D ; jump if i == 2
0040101F cmp [ebp+var_8], 3
00401023 jz short loc_401053 ; jump if i == 3
00401025 jmp short loc_401067 ; jump to default / end
; ---------- case 1 ----------
00401027 loc_401027:
00401027 mov ecx, [ebp+var_4] ; move i to ecx
0040102A add ecx, 1 ; i + 1
0040102D push ecx ; push result
0040102E push offset unk_40C000 ; push format string "i = %d"
00401033 call printf
00401038 add esp, 8 ; clean up stack
0040103B jmp short loc_401067 ; jump to end
; ---------- case 2 ----------
0040103D loc_40103D:
0040103D mov edx, [ebp+var_4] ; move i to edx
00401040 add edx, 2 ; i + 2
00401043 push edx ; push result
00401044 push offset unk_40C004 ; push format string "i = %d"
00401049 call printf
0040104E add esp, 8 ; clean up stack
00401051 jmp short loc_401067 ; jump to end
; ---------- case 3 ----------
00401053 loc_401053:
00401053 mov eax, [ebp+var_4] ; move i to eax
00401056 add eax, 3 ; i + 3
00401059 push eax ; push result
0040105A push offset unk_40C008 ; push format string "i = %d"
0040105F call printf
00401064 add esp, 8 ; clean up stack
; ---------- end of switch ----------
00401067 loc_401067:
- Comparisons (cmp) are done one after the other:
  - First compare i to 1 (cmp [ebp+var_8], 1)
  - Jump to code for case 1 if equal
  - If not, compare to 2, and so on
- Unconditional jumps (jmp) at the end of the case 1 and case 2 blocks:
  - Prevent fall-through (like break in C)
  - All cases end up at loc_401067 (end of switch); case 3 is placed last, so it reaches loc_401067 without a jump
- For each case:
  - Load i (mov ecx, [ebp+var_4], etc.)
  - Add case-specific value (add ecx, 1, etc.)
  - Push arguments to printf (push ecx, etc.)
  - Call printf
  - Clean up stack (add esp, 8)
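The compare-and-jump chain can be expressed in C to confirm that it behaves like the original switch. The sketch below uses two hypothetical functions, switch_as_compare_chain and switch_plain, and returns the value that would be passed to printf (0 for the default case) so the two can be compared.

```c
/* The switch as a chain of compares and jumps, mirroring the listing. */
int switch_as_compare_chain(int i) {
    int value = 0;
    if (i == 1) goto case_1;      /* cmp ..., 1 / jz loc_401027 */
    if (i == 2) goto case_2;      /* cmp ..., 2 / jz loc_40103D */
    if (i == 3) goto case_3;      /* cmp ..., 3 / jz loc_401053 */
    goto end;                     /* default: jmp loc_401067 */
case_1:
    value = i + 1;
    goto end;                     /* break */
case_2:
    value = i + 2;
    goto end;                     /* break */
case_3:
    value = i + 3;                /* last case reaches the end without a jump */
end:
    return value;
}

/* The same logic written as an ordinary switch, for comparison. */
int switch_plain(int i) {
    switch (i) {
    case 1:  return i + 1;
    case 2:  return i + 2;
    case 3:  return i + 3;
    default: return 0;
    }
}
```

This if-chain shape is typical for small switches; compilers may instead emit a jump table when there are many contiguous cases.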
Disassembling Arrays
C Code:
int b[5] = {123, 87, 487, 7, 978};
void main()
{
int i;
int a[5];
for(i = 0; i < 5; i++)
{
a[i] = i;
b[i] = i;
}
}
Assembly Code:
00401006 mov [ebp+var_18], 0 ; i = 0
0040100D jmp short loc_401018 ; jump to comparison
0040100F loc_40100F:
0040100F mov eax, [ebp+var_18] ; eax = i
00401012 add eax, 1 ; eax = i + 1
00401015 mov [ebp+var_18], eax ; i++
00401018 loc_401018:
00401018 cmp [ebp+var_18], 5 ; compare i to 5
0040101C jge short loc_401037 ; if i >= 5, exit loop
; ---------- a[i] = i ----------
0040101E mov ecx, [ebp+var_18] ; ecx = i
00401021 mov edx, [ebp+var_18] ; edx = i
00401024 mov [ebp+ecx*4+var_14], edx ; a[i] = i
; ---------- b[i] = i ----------
00401028 mov eax, [ebp+var_18] ; eax = i
0040102B mov ecx, [ebp+var_18] ; ecx = i
0040102E mov dword_40A000[ecx*4], eax ; b[i] = i
00401035 jmp short loc_40100F ; loop again
00401037 loc_401037:
Local variable a
- Base address: [ebp+var_14]
- Access: [ebp+ecx*4+var_14], since int is 4 bytes.

Global variable b
- Base address: dword_40A000 (global in the .data segment).
- Access: dword_40A000[ecx*4].

Loop Control
- mov [ebp+var_18], 0 → i = 0
- Comparison: cmp [ebp+var_18], 5
- If i >= 5, exit loop (jge loc_401037)
- Loop increment: i++ (add eax, 1)

In general:
- Local array element: mov [ebp+index*4+var_14], value
- Global array element: mov dword_40A000[index*4], value
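The base + index*4 addressing can be spelled out in C as byte arithmetic. The sketch below is illustrative only: store_scaled and fill_arrays are hypothetical names, and sizeof(int) stands in for the scale factor of 4 seen in the listing.

```c
#include <stddef.h>
#include <string.h>

/* Stores value at base + index * sizeof(int), computed as a byte address
   the way [base + ecx*4] is computed by the CPU. */
void store_scaled(int *base, size_t index, int value) {
    unsigned char *addr = (unsigned char *)base + index * sizeof(int);
    memcpy(addr, &value, sizeof value);
}

int b[5] = {123, 87, 487, 7, 978};   /* global array, like dword_40A000 */

/* Runs the loop from the C source with the explicit address computation
   and returns the sum of the local array as a quick check. */
int fill_arrays(void) {
    int a[5];                        /* local array, like [ebp+var_14] */
    int sum = 0;
    for (int i = 0; i < 5; i++) {
        store_scaled(a, (size_t)i, i);   /* a[i] = i */
        store_scaled(b, (size_t)i, i);   /* b[i] = i */
    }
    for (int i = 0; i < 5; i++) {
        sum += a[i];
    }
    return sum;
}
```

The scale factor in the addressing mode is a quick hint about element size: *4 suggests 4-byte elements such as int, while *1, *2, or *8 point to other element widths.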
Let’s break down the C struct, the assembly for main() and test(), and explain how the struct’s layout can be reverse engineered from the disassembly.
Struct Definition
C Code:
struct my_structure { // Struct layout
int x[5]; // offset 0x00
char y; // offset 0x14
double z; // offset 0x18
};
struct my_structure *gms; // Global variable
void test(struct my_structure *q)
{
int i;
q->y = 'a'; // ASCII 0x61
q->z = 15.6; // Floating point
for (i = 0; i < 5; i++) {
q->x[i] = i; // Array initialization
}
}
void main()
{
gms = (struct my_structure *) malloc(sizeof(struct my_structure));
test(gms);
}
Assembly: main() Function
00401050 push ebp
00401051 mov ebp, esp
00401053 push 20h ; sizeof(my_structure) = 0x20 (32 bytes)
00401055 call malloc
0040105A add esp, 4
0040105D mov dword_40EA30, eax ; store malloc ptr to global gms
00401062 mov eax, dword_40EA30 ; eax = gms
00401067 push eax ; pass pointer to test()
00401068 call sub_401000 ; call test()
0040106D add esp, 4
00401070 xor eax, eax
00401072 pop ebp
00401073 retn
Assembly: test() Function
00401000 push ebp
00401001 mov ebp, esp
00401003 push ecx ; reserve stack space for i
00401004 mov eax, [ebp+arg_0] ; eax = q (pointer to struct)
00401007 mov byte ptr [eax+14h], 61h ; q->y = 'a' → offset 0x14
0040100B mov ecx, [ebp+arg_0] ; ecx = q
0040100E fld ds:dbl_40B120 ; load the double constant 15.6 onto the FPU stack
00401014 fstp qword ptr [ecx+18h] ; q->z = 15.6 → offset 0x18
00401017 mov [ebp+var_4], 0 ; i = 0
0040101E jmp short loc_401029 ; jump to loop condition
00401020 loc_401020:
00401020 mov edx, [ebp+var_4]
00401023 add edx, 1
00401026 mov [ebp+var_4], edx ; i++
00401029 loc_401029:
00401029 cmp [ebp+var_4], 5
0040102D jge short loc_40103D ; exit loop when i >= 5
0040102F mov eax, [ebp+var_4] ; eax = i
00401032 mov ecx, [ebp+arg_0] ; ecx = q
00401035 mov edx, [ebp+var_4] ; edx = i
00401038 mov [ecx+eax*4], edx ; q->x[i] = i → offset 0x00 + i*4
0040103B jmp short loc_401020
0040103D loc_40103D:
0040103D mov esp, ebp
0040103F pop ebp
00401040 retn
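The displacements in test() (14h, 18h) and the 20h passed to malloc in main() can be cross-checked against the struct definition with offsetof. The sketch below assumes a 4-byte int, as in the listing; print_layout is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdio.h>

struct my_structure {
    int x[5];
    char y;
    double z;
};

/* Prints each member's offset and the total size so they can be matched
   against the displacements seen in the disassembly. */
void print_layout(void) {
    printf("x:    0x%02zx\n", offsetof(struct my_structure, x));
    printf("y:    0x%02zx\n", offsetof(struct my_structure, y));
    printf("z:    0x%02zx\n", offsetof(struct my_structure, z));
    printf("size: 0x%02zx\n", sizeof(struct my_structure));
}
```

Working in the other direction is the reverse-engineering task: a byte-sized write at +14h, an 8-byte floating-point store at +18h, and scaled 4-byte writes starting at +0 together suggest an int array followed by a char and a double, with padding between y and z.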
Linked List Traversal
C Code:
struct node {
int x;
struct node* next;
};
typedef struct node pnode;
void main() {
pnode *curr, *head;
int i;
head = NULL;
// Linked list creation
for (i = 1; i <= 10; i++) {
curr = (pnode *)malloc(sizeof(pnode));
curr->x = i;
curr->next = head;
head = curr;
}
// Linked list traversal
curr = head;
while (curr) {
printf("%d\n", curr->x);
curr = curr->next;
}
}
struct node {
int x; // offset 0
struct node *next; // offset 4
};
Linked List Creation Loop (Assembly):
0040106A mov [ebp+var_8], 0 ; head = NULL
00401071 mov [ebp+var_C], 1 ; i = 1
00401078 loc_401078:
00401078 cmp [ebp+var_C], 0Ah ; if (i > 10) exit loop
0040107C jg loc_4010AB ; jump if i > 10
...
00401085 call malloc ; allocate new node
0040108A mov [ebp+var_4], eax ; curr = malloc(...)
0040108D mov edx, [ebp+var_4]
00401090 mov eax, [ebp+var_C]
00401093 mov [edx], eax ; curr->x = i
00401095 mov edx, [ebp+var_4]
00401098 mov eax, [ebp+var_8]
0040109B mov [edx+4], eax ; curr->next = head
0040109E mov eax, [ebp+var_4]
004010A1 mov [ebp+var_8], eax ; head = curr
...
004010A7 inc [ebp+var_C] ; i++
004010A9 jmp loc_401078 ; loop again
- malloc creates a new struct.
- curr->x = i → stored at [curr + 0] → offset 0x0
- curr->next = head → stored at [curr + 4] → offset 0x4
- head = curr updates the head of the list.
Linked List Traversal Loop (Assembly):
004010AB loc_4010AB:
004010AB mov eax, [ebp+var_8] ; eax = head
004010AE mov [ebp+var_4], eax ; curr = head
004010B1 loc_4010B1:
004010B1 cmp [ebp+var_4], 0 ; while(curr)
004010B5 jz locret_4010D7 ; if null, exit
004010B7 mov eax, [ebp+var_4] ; eax = curr
004010BA mov eax, [eax] ; eax = curr->x
004010BC mov [esp+var_14], eax ; prepare printf arg
004010C0 mov [esp+var_18], offset aD ; "%d\n"
004010C7 call printf ; print
004010CC mov eax, [ebp+var_4]
004010CF mov eax, [eax+4] ; eax = curr->next
004010D2 mov [ebp+var_4], eax ; curr = curr->next
004010D5 jmp loc_4010B1 ; repeat
- The value curr->x is printed: mov eax, [eax]
- curr->next is accessed via mov eax, [eax+4]; this is the key to identifying a linked list.
- Pointer Field within Structure:
  - When a value is loaded from [eax + 4] where eax is a struct pointer, it’s likely a next pointer.
  - If that value is then reused as a pointer to the same structure → you have a linked list.
- Recursive Structure Access:
  - The pointer field (next) is repeatedly dereferenced in a loop.
- Offset Use:
  - If the struct is 8 bytes (4 for x, 4 for next), and [eax] and [eax + 4] are accessed consistently → classic linked list.
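For reference, here is a compact C sketch of the same build-and-traverse pattern, with the printf replaced by a sum so the result is easy to check. The helpers build_list, sum_list, and free_list are hypothetical names that do not appear in the original program. On a 64-bit build the next pointer would sit at offset 8 instead of 4, so the offsets above apply to 32-bit code.

```c
#include <stdlib.h>

struct node {
    int x;
    struct node *next;
};

/* Frees every node by following next, the same walk as the traversal loop. */
void free_list(struct node *head) {
    while (head) {
        struct node *next = head->next;
        free(head);
        head = next;
    }
}

/* Builds a list by prepending 1..n, as the creation loop does.
   Returns the head, or NULL if n < 1 or an allocation fails. */
struct node *build_list(int n) {
    struct node *head = NULL;
    for (int i = 1; i <= n; i++) {
        struct node *curr = malloc(sizeof *curr);
        if (curr == NULL) {
            free_list(head);
            return NULL;
        }
        curr->x = i;          /* write at offset 0 */
        curr->next = head;    /* write at the pointer field */
        head = curr;
    }
    return head;
}

/* Traversal: load x, then replace curr with curr->next until NULL. */
int sum_list(const struct node *head) {
    int sum = 0;
    for (const struct node *curr = head; curr; curr = curr->next) {
        sum += curr->x;
    }
    return sum;
}
```

Because each new node is placed in front of the old head, the traversal visits the values in reverse order of creation, starting with the last node allocated.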