Levels of Abstraction


In traditional computer architecture, a computer system can be viewed as several levels of abstraction, each hiding the implementation details of the level below it. For example, we can run the Windows OS on many different types of hardware because the hardware is abstracted from the OS.

  • Three coding levels are involved in malware analysis: high-level languages, low-level (assembly) languages, and machine code.

Note

  • Malware authors write programs in high-level languages and use a compiler to generate machine code that is run by the CPU.
  • Malware analysts and reverse engineers operate at the low-level language level. They use a disassembler to generate assembly code that they can read and analyze to understand how a program operates.

The x86 Architecture


Most modern computer architectures, including x86, share three hardware components:

  • The CPU executes code.
  • The RAM stores all data and code.
  • The I/O interfaces connect devices such as hard drives, keyboards, and monitors.

The CPU itself contains three components:

  • The control unit gets instructions to execute from RAM.
  • Registers are the CPU’s basic data storage units. They save time because the CPU doesn’t need to access RAM.
  • The ALU executes an instruction fetched from RAM and places the results in registers.

Register

A register is a small amount of data storage available to the CPU, whose contents can be accessed more quickly than storage available elsewhere.

  • The most common x86 registers fall into the following four categories:
    • General registers are used by the CPU during execution.
    • Segment registers are used to track sections of memory.
    • Status flags are used to make decisions.
    • The instruction pointer is used to keep track of the next instruction to execute.

Main Memory (RAM)

The main memory used by a program can be divided into four major parts.

Data

  • Data refers to a specific section of memory called the data section.
  • It contains values that are put in place when a program is initially loaded. These are sometimes called static values because they may not change while the program is running.
  • They are also known as global values because they are available to any part of the program.

Code

  • Code includes the instructions fetched by the CPU to execute the program.
  • Code controls what the program does and how its tasks are carried out.

Heap

  • The heap is used for dynamic memory during program execution: to create (allocate) new values and to free old values that the program no longer needs.
  • The heap is a larger memory area used for:
    • Dynamically allocated memory (e.g., malloc, new)
  • Think of it like a storage warehouse:
    • We ask for space.
    • We get a pointer.
    • We must free it manually when done.
      Example:
	int *a = malloc(sizeof(int));  // lives in the heap

Stack

  • The stack is a part of memory used for:
    • Function calls
    • Local variables
    • Return addresses

Think of it like a pile of plates:

  • We push a plate (data) on top when entering a function.
  • We pop it off when the function exits.
  • The stack grows downwards in memory.

The stack is managed automatically (we don’t manually free stack memory).

Example:

	void foo() {
	   int a = 5;  // stored on the stack
	}
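
The four parts of memory can be seen side by side in one small C sketch. The names global_counter, heap_value, and sum_on_stack are hypothetical and chosen only for illustration, and the exact placement of each object can vary by compiler and optimization level.

```c
#include <stdlib.h>

/* Data section: a global value that is in place when the program is loaded. */
int global_counter = 10;

/* Heap: we ask for space, get a pointer, and the caller must free it. */
int *heap_value(int v) {
    int *p = malloc(sizeof *p);   /* lives in the heap */
    if (p != NULL) {
        *p = v;
    }
    return p;
}

/* Stack: 'total' is a local variable that goes away when the function
   returns (an optimizing compiler may keep it in a register instead). */
int sum_on_stack(int a, int b) {
    int total = a + b;
    return total;
}
```

The machine instructions generated for heap_value and sum_on_stack themselves belong to the code part of memory.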

Instruction

  • Instructions are the building blocks of assembly programs.
  • An assembly instruction is made up of 1. a mnemonic and 2. zero or more operands.
    1. The mnemonic is a word that identifies the instruction to execute, such as mov, or, xor, etc.
    2. Operands identify the data used by the instruction. There are three types of operands.
      • Immediate operands are fixed values, such as 0x42.
      • Register operands refer to registers, such as ecx.
      • Memory address operands refer to a memory address that contains the value of interest, typically denoted by a value, register, or equation between brackets, such as [eax].

Simple Instruction

The simplest and most common instruction is mov, which is used to move data from one location to another. In other words, it’s the instruction for reading and writing to memory. The mov instruction can move data into registers or RAM. The format is mov destination, source.

lea

  • lea means “load effective address”. It looks similar to mov, but it puts a memory address into the destination rather than the data stored at that address.
  • The format of the instruction is lea destination, source.
    Example:
  • lea eax, [ebx+8] will put EBX+8 into EAX.
  • In contrast, mov eax, [ebx+8] loads the data at the memory address specified by EBX+8.
    Therefore, lea eax, [ebx+8] would be the same as mov eax, ebx+8; however, a mov instruction like that is invalid.

The lea instruction is not used exclusively to refer to memory addresses. It is also useful for calculating values, because it requires fewer instructions. For example, it is common to see an instruction such as lea ebx, [eax*4+4], where eax is a number rather than a memory address. This instruction is the functional equivalent of ebx = (eax+1)*4, but it is shorter and more efficient for the compiler to use than a sequence of four instructions (for example, inc eax; mov ecx, 4; mul ecx; mov ebx, eax).
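
The difference between lea and mov, and the arithmetic use of lea, can be sketched in C. This is only a model under stated assumptions: memory is a hypothetical flat byte array standing in for RAM, and all function names are made up for illustration.

```c
#include <stdint.h>

/* lea ebx, [eax*4+4]: evaluates the expression and never touches memory. */
uint32_t lea_scaled(uint32_t eax) {
    return eax * 4 + 4;
}

/* The four-instruction alternative: inc eax; mov ecx, 4; mul ecx; mov ebx, eax */
uint32_t four_instructions(uint32_t eax) {
    uint32_t ecx;
    eax = eax + 1;       /* inc eax */
    ecx = 4;             /* mov ecx, 4 */
    eax = eax * ecx;     /* mul ecx (only the low 32 bits are kept here) */
    return eax;          /* mov ebx, eax */
}

/* lea eax, [ebx+8]: the result is the address itself. */
uint32_t lea_address(uint32_t ebx) {
    return ebx + 8;
}

/* mov eax, [ebx+8]: the result is the 32-bit little-endian value stored
   at that address in the hypothetical 'memory' array. */
uint32_t mov_load(const uint8_t *memory, uint32_t ebx) {
    const uint8_t *p = memory + ebx + 8;
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}
```

For any 32-bit input, lea_scaled and four_instructions return the same value, which is why a compiler can replace the longer sequence with a single lea.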

Arithmetic

Addition & Subtraction:

  • Syntax:
    • add destination, value - Adds value to destination.
    • sub destination, value - Subtracts value from destination.
  • Flags:
    • Zero Flag (ZF): Set if result is zero.
    • Carry Flag (CF): Set by sub if destination < value subtracted (an unsigned borrow).
  • Increment & Decrement:
    • inc reg - Increments register by 1.
    • dec reg - Decrements register by 1.
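
A minimal C sketch of how add and sub update the destination and the two flags, assuming 32-bit unsigned operands. The ArithResult type and the x86_add and x86_sub names are hypothetical, and only ZF and CF are modelled (the real instructions also update other flags).

```c
#include <stdint.h>

typedef struct {
    uint32_t result;  /* new value of the destination */
    int zf;           /* Zero Flag */
    int cf;           /* Carry Flag */
} ArithResult;

/* sub destination, value */
ArithResult x86_sub(uint32_t destination, uint32_t value) {
    ArithResult r;
    r.result = destination - value;     /* wraps modulo 2^32 */
    r.zf = (r.result == 0);             /* set if the result is zero */
    r.cf = (destination < value);       /* set on an unsigned borrow */
    return r;
}

/* add destination, value */
ArithResult x86_add(uint32_t destination, uint32_t value) {
    ArithResult r;
    r.result = destination + value;     /* wraps modulo 2^32 */
    r.zf = (r.result == 0);
    r.cf = (r.result < destination);    /* set on a carry out of bit 31 */
    return r;
}
```

For example, x86_sub(3, 5) wraps around to 0xFFFFFFFE and sets CF, while x86_sub(5, 5) produces zero and sets ZF.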

Multiplication & Division:

  • Unsigned Instructions:
    • mul value - Multiplies EAX by value, result in EDX:EAX (64 bits).
    • div value - Divides EDX:EAX by value; quotient in EAX, remainder in EDX.
  • Registers Used:
    • Before: Set EAX (for mul) or EDX:EAX (for div) properly.
    • After mul: EDX = high 32 bits, EAX = low 32 bits of the product.
  • Signed Versions:
    • imul and idiv.
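
The way mul and div use the EDX:EAX register pair can be sketched with 64-bit arithmetic in C. The RegPair type and function names are hypothetical. The sketch assumes the divisor is nonzero and the quotient fits in 32 bits; on a real CPU, violating either condition raises an exception.

```c
#include <stdint.h>

typedef struct {
    uint32_t eax;
    uint32_t edx;
} RegPair;

/* mul value: EDX:EAX = EAX * value (unsigned) */
RegPair x86_mul(uint32_t eax, uint32_t value) {
    uint64_t product = (uint64_t)eax * value;
    RegPair r;
    r.eax = (uint32_t)product;           /* low 32 bits */
    r.edx = (uint32_t)(product >> 32);   /* high 32 bits */
    return r;
}

/* div value: quotient of EDX:EAX / value goes to EAX, remainder to EDX.
   Assumes value != 0 and a quotient that fits in 32 bits. */
RegPair x86_div(uint32_t edx, uint32_t eax, uint32_t value) {
    uint64_t dividend = ((uint64_t)edx << 32) | eax;
    RegPair r;
    r.eax = (uint32_t)(dividend / value);  /* quotient */
    r.edx = (uint32_t)(dividend % value);  /* remainder */
    return r;
}
```

This also shows why EDX must be set properly before div: whatever is left in EDX becomes the high half of the dividend.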

Logical Operators:

  • Use the same destination, value form as add/sub, but operate bit by bit.
  • Common:
    • or destination, value
    • and destination, value
    • xor destination, value

Shifting Instructions:

  • Syntax:
    • shl dest, count - Shift left by count bits.
    • shr dest, count - Shift right by count bits.
  • Bit behavior: Bits are shifted out of the register, and the last bit shifted out is kept in CF; zeros fill the gaps.

Rotation Instructions:

  • Syntax:
    • rol dest, count - Rotate bits left.
    • ror dest, count - Rotate bits right.
  • Behavior: Bits wrap around.
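
The difference between shifting and rotating can be sketched in C for 32-bit values. The function names are hypothetical, and the sketch assumes a count between 1 and 31 so that every C shift is well defined.

```c
#include <stdint.h>

/* shl dest, count: bits fall off the left, zeros come in on the right.
   *cf receives the last bit shifted out. Assumes 1 <= count <= 31. */
uint32_t x86_shl(uint32_t dest, unsigned count, int *cf) {
    *cf = (int)((dest >> (32 - count)) & 1u);
    return dest << count;
}

/* shr dest, count: bits fall off the right, zeros come in on the left. */
uint32_t x86_shr(uint32_t dest, unsigned count, int *cf) {
    *cf = (int)((dest >> (count - 1)) & 1u);
    return dest >> count;
}

/* rol dest, count: bits that fall off the left wrap around to the right. */
uint32_t x86_rol(uint32_t dest, unsigned count) {
    return (dest << count) | (dest >> (32 - count));
}

/* ror dest, count: bits that fall off the right wrap around to the left. */
uint32_t x86_ror(uint32_t dest, unsigned count) {
    return (dest >> count) | (dest << (32 - count));
}
```

With the input 0x80000001 and a count of 1, shl loses the top bit (it only survives in CF), while rol moves it around to bit 0.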

Opcodes and Endianness

  • The instruction mov ecx, 0x42 is encoded as the opcode bytes B9 42 00 00 00.
  • The value 0xB9 corresponds to mov ecx, and the remaining bytes 42 00 00 00 (read in order as 0x42000000) correspond to the value 0x42.

Endianness

  • Here 0x42000000 is treated as 0x42 because the x86 architecture uses the little-endian format.
  • The endianness of data describes the byte order within a larger data item: in little-endian the least significant byte comes first (at the lowest address), while in big-endian the most significant byte comes first.
  • Changing between endianness is something malware must do during network communication, because network data uses big-endian and an x86 program uses little-endian.
    • For example, the IP address 127.0.0.1 will be represented as 0x7F000001 in big-endian format (over the network) and 0x0100007F in little-endian format (locally in memory).
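
Both byte orders can be reproduced with a few shifts in C. This is a minimal sketch with hypothetical helper names (store_le32, store_be32, swap32); real programs typically rely on library routines for the conversion.

```c
#include <stdint.h>

/* Little-endian (x86): least significant byte at the lowest address. */
void store_le32(uint32_t value, uint8_t out[4]) {
    out[0] = (uint8_t)(value);
    out[1] = (uint8_t)(value >> 8);
    out[2] = (uint8_t)(value >> 16);
    out[3] = (uint8_t)(value >> 24);
}

/* Big-endian (network order): most significant byte at the lowest address. */
void store_be32(uint32_t value, uint8_t out[4]) {
    out[0] = (uint8_t)(value >> 24);
    out[1] = (uint8_t)(value >> 16);
    out[2] = (uint8_t)(value >> 8);
    out[3] = (uint8_t)(value);
}

/* Reverse the byte order of a 32-bit value. */
uint32_t swap32(uint32_t v) {
    return (v >> 24) |
           ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) |
           (v << 24);
}
```

store_le32(0x42, bytes) yields 42 00 00 00, matching the operand bytes of mov ecx, 0x42, and swap32(0x7F000001) gives 0x0100007F, matching the two representations of 127.0.0.1.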

Stack

  • Stack Basics:
    • Stack = LIFO (Last In, First Out).
    • push: adds item on top of stack.
    • pop: removes top item.
    • For example, if you push the numbers 1, 2, and then 3 (in order), the first item to pop off will be 3, because it was the last item pushed onto the stack.
  • Registers:
    • ESP (Stack Pointer):
      • Points to top of the stack.
      • Updates on push/pop.
    • EBP (Base Pointer):
      • Stays constant within a function.
      • Used to track local variables and parameters.
  • Stack Growth:
    • Stack grows downwards in memory (top-down).
    • Higher addresses → allocated first.
    • Lower addresses → used as values are pushed.
  • Stack Use Cases:
    • Short-term storage for:
      • Local variables
      • Function parameters
      • Return addresses
    • Especially important for function calls.
  • Key Instructions:
    • push, pop, call, ret, leave, enter
  • Local Variables:
    • Typically accessed relative to EBP.
    • Convention varies by compiler, but EBP-based references are common.
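
The push/pop behaviour and the downward growth of the stack can be modelled in a few lines of C. This is a simplified sketch: the Stack type and its 64-byte size are hypothetical, ESP is modelled as an offset into a small array, and there is no overflow or underflow checking.

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

typedef struct {
    uint8_t  memory[STACK_BYTES]; /* hypothetical region used as the stack */
    uint32_t esp;                 /* offset of the current top of the stack */
} Stack;

/* An empty stack starts with ESP at the highest address. */
void stack_init(Stack *s) {
    memset(s->memory, 0, sizeof s->memory);
    s->esp = STACK_BYTES;
}

/* push: ESP moves down by 4, then the value is written at [ESP]. */
void push(Stack *s, uint32_t value) {
    s->esp -= 4;
    memcpy(&s->memory[s->esp], &value, 4);
}

/* pop: the value at [ESP] is read, then ESP moves up by 4. */
uint32_t pop(Stack *s) {
    uint32_t value;
    memcpy(&value, &s->memory[s->esp], 4);
    s->esp += 4;
    return value;
}
```

After pushing 1, 2, and 3, ESP is 12 bytes lower than where it started, and the first pop returns 3.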

Function call

  • What are functions?
    Functions are blocks of code that do a specific job. They’re called by other parts of the program, and after they’re done, they return to where they were called from.
  • Stack Role:
    The stack is used for temporary storage, like function arguments, local variables, and return addresses. It works in a “last in, first out” way: the last item put on the stack is the first one taken off.
  • Registers Involved:
    • ESP (Stack Pointer): Points to the top of the stack.
    • EBP (Base Pointer): Used to keep track of the current function’s stack frame.
  • Function Structure:
    Functions often start with a prologue (setup code) and end with an epilogue (cleanup code).

How a Typical Function Call Works:

  1. Arguments are pushed onto the stack (like setting up the function’s input).
  2. The function is called using the call instruction. This saves where to return to after the function finishes.
  3. Prologue:
    • The function saves the previous base pointer (EBP).
    • It sets up a new base pointer for itself.
    • It makes space for local variables.
  4. The function does its job (calculations, logic, etc.).
  5. Epilogue:
    • Frees up the stack space used for local variables.
    • Restores the old base pointer so the calling code can use its own variables again.
    • Often uses the leave instruction to do this efficiently.
  6. The function ends with ret, which jumps back to the original caller using the saved return address.
  7. The stack is cleaned up to remove any arguments no longer needed.

In short:

Functions use the stack to store data they need temporarily. They set up (prologue), do their job, then clean up (epilogue) before returning control to the main program.

Stack Layout

The stack in x86 is top-down in memory:

  • High memory addresses at the top (like 0x0012F050).
  • Low memory addresses at the bottom (like 0x0012F000).
  • The stack grows downward (toward 0).

When a function is called, a new stack frame is created:

  • It contains function arguments, a return address, the previous EBP (base pointer), and local variables.
  • When the function ends, this stack frame is removed, and the previous (caller’s) stack frame is restored.

ESP and EBP

  • ESP (Stack Pointer): Points to the top of the stack.
    • Pushing data (push eax) decreases ESP (the stack grows down).
    • Popping data (pop ebx) increases ESP (the stack shrinks up).
  • EBP (Base Pointer): Used as a stable reference within a function’s stack frame (doesn’t change during the function).

Example of the Stack Frame

  • Arguments are at the highest addresses of the frame (like 0x12F044).
  • Next, at a lower address, is the return address (where the function will go back).
  • Below that is the old EBP (previous base pointer).
  • Below that are the local variables.
  • ESP points to the top of the stack (the lowest address in use, where the last item was pushed).
  • EBP stays constant within the function for easy access to locals and arguments.
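
A stack frame can be built step by step in a small C model. This sketch assumes a conventional EBP-based frame with one 4-byte argument and one 4-byte local variable; the Cpu type, the 64-byte memory, and the made-up caller EBP value are all hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define MEM_BYTES 64

typedef struct {
    uint8_t  mem[MEM_BYTES];  /* hypothetical stack memory */
    uint32_t esp;
    uint32_t ebp;
} Cpu;

void push32(Cpu *c, uint32_t v) {
    c->esp -= 4;                       /* the stack grows downward */
    memcpy(&c->mem[c->esp], &v, 4);
}

uint32_t read32(const Cpu *c, uint32_t addr) {
    uint32_t v;
    memcpy(&v, &c->mem[addr], 4);
    return v;
}

void write32(Cpu *c, uint32_t addr, uint32_t v) {
    memcpy(&c->mem[addr], &v, 4);
}

/* Builds the frame for a call to a function with one argument and one local. */
void build_frame(Cpu *c, uint32_t arg, uint32_t return_address, uint32_t local) {
    memset(c->mem, 0, sizeof c->mem);
    c->esp = MEM_BYTES;                /* empty stack at the highest address */
    c->ebp = 0xAAAAAAAAu;              /* made-up EBP value of the caller */
    push32(c, arg);                    /* caller: push the argument */
    push32(c, return_address);         /* call: pushes the return address */
    push32(c, c->ebp);                 /* prologue: push ebp */
    c->ebp = c->esp;                   /* prologue: mov ebp, esp */
    c->esp -= 4;                       /* prologue: sub esp, 4 (one local) */
    write32(c, c->ebp - 4, local);     /* store the local variable */
}
```

In this layout the argument is found at [ebp+8], the return address at [ebp+4], the saved EBP at [ebp], and the local variable at [ebp-4], with ESP pointing at the local.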

Direct Stack Access
You don’t always have to use push or pop to access data on the stack.

  • Example: mov eax, ss:[esp] reads the top of the stack directly without changing ESP.
  • This can be useful in certain situations (like low-level tweaks).

Special Push/Pop Instructions

  • pusha: Pushes all 16-bit registers onto the stack (AX, CX, DX, etc.).
  • pushad: Pushes all 32-bit registers (EAX, ECX, EDX, etc.).
  • popa / popad: Pop all those registers back off the stack.

These are rarely used by compilers, but you’ll see them in hand-coded assembly or malware shellcode because they’re a fast way to save the entire CPU state (like a snapshot).

Conditional

All programming languages need to compare values and make decisions. In x86 assembly, we do this with conditionals.
Key Instructions for Conditions:

  • test
  • cmp

The test Instruction

  • It’s basically like doing and with two values.
  • BUT: It doesn’t change the values themselves!
  • It just sets flags in the CPU based on the result.
  • Common Use: Check if something is zero (like checking if a register is empty or NULL).
  • Example:
    test eax, eax   ; tests if eax is zero (because eax & eax == eax)
    
  • test eax, eax has a shorter encoding than cmp eax, 0, which is why compilers commonly use it for zero checks.

The cmp Instruction

  • It’s like a subtraction: cmp a, b is basically a - b.
  • BUT: It doesn’t actually change a or b – it just sets the flags.
  • The CPU’s zero flag (ZF) and carry flag (CF) are most important here.

Here’s how to read the flags after a cmp dst, src:

Comparison    ZF    CF
dst == src    1     0
dst < src     0     1
dst > src     0     0

This lets the CPU decide what to do next (like jump to another part of the code) based on whether the comparison was equal, less, or greater.

  • test is for checking if a value is zero (without changing the value).
  • cmp is for checking how two values relate (equal, less, greater) without modifying them.
  • CPU flags (ZF and CF) tell the next steps in the program’s decision-making.
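
The flag rules for cmp and test can be written as two small C functions. This is a sketch for unsigned 32-bit operands; the Flags type and function names are hypothetical, and only ZF and CF are modelled.

```c
#include <stdint.h>

typedef struct {
    int zf;  /* Zero Flag */
    int cf;  /* Carry Flag */
} Flags;

/* cmp dst, src: behaves like dst - src, but only the flags are kept. */
Flags x86_cmp(uint32_t dst, uint32_t src) {
    Flags f;
    f.zf = (dst == src);   /* the subtraction result would be zero */
    f.cf = (dst < src);    /* the subtraction would need a borrow */
    return f;
}

/* test a, b: behaves like a & b, but only the flags are kept.
   test always clears CF. */
Flags x86_test(uint32_t a, uint32_t b) {
    Flags f;
    f.zf = ((a & b) == 0);
    f.cf = 0;
    return f;
}
```

x86_test(eax, eax) sets ZF only when eax is zero, which is the common zero check, and x86_cmp reproduces the three rows of the flag table for equal, less, and greater.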

Branching

A branch is a part of code that the CPU may or may not execute. It depends on what’s going on in the program’s flow (like an if or while in higher-level languages).

  • Branching means choosing where to go next in the program.

The jmp Instruction: Basic Jump

  • jmp location
    Unconditionally jumps to location.
    Like saying: “Go here no matter what.”
  • No conditions – it always jumps.

Conditional Jumps: Making Decisions

For if/else-style logic in assembly, we use conditional jumps.

  • Conditional jumps check CPU flags (set by instructions like cmp or test).
  • Only jump if the condition is true.

Common Conditional Jumps

Instruction           Jumps If…                        Notes
jz loc or je loc      Zero flag (ZF) = 1               Jumps if values were equal (e.g., a == b).
jnz loc or jne loc    Zero flag (ZF) = 0               Jumps if not equal (a != b).
jg loc                Destination > Source (signed)    After cmp, jumps if signed greater.
jge loc               Dest ≥ Source (signed)           Jumps if signed greater or equal.
ja loc                Dest > Source (unsigned)         For unsigned values.
jae loc               Dest ≥ Source (unsigned)         For unsigned values.
jl loc                Dest < Source (signed)           Jumps if signed less than.
jle loc               Dest ≤ Source (signed)           Jumps if signed less or equal.
jb loc                Dest < Source (unsigned)         For unsigned values.
jbe loc               Dest ≤ Source (unsigned)         For unsigned values.
jo loc                Overflow flag (OF) = 1           If last operation had signed overflow.
js loc                Sign flag (SF) = 1               If result is negative (sign bit is set).
jecxz loc             ECX = 0                          Jump if ECX is zero (often used in loop counters).

  • Programs use these jumps to handle conditions and loops.
  • CPU flags (ZF, CF, OF, etc.) tell the CPU whether to jump or not.
  • The jumps don’t affect the values – they just control what code to run next.
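
The signed and unsigned jump variants matter because the same 32 bits can mean two different numbers. The C sketch below models whether a jump would be taken after cmp dst, src. The function names are hypothetical, and the cast to int32_t assumes the usual two's complement conversion that mainstream compilers provide.

```c
#include <stdint.h>

/* ja: taken if dst > src when both are read as unsigned values. */
int ja_taken(uint32_t dst, uint32_t src) {
    return dst > src;
}

/* jb: taken if dst < src when both are read as unsigned values. */
int jb_taken(uint32_t dst, uint32_t src) {
    return dst < src;
}

/* jg: taken if dst > src when both are read as signed values. */
int jg_taken(uint32_t dst, uint32_t src) {
    return (int32_t)dst > (int32_t)src;
}

/* jl: taken if dst < src when both are read as signed values. */
int jl_taken(uint32_t dst, uint32_t src) {
    return (int32_t)dst < (int32_t)src;
}
```

With dst = 0xFFFFFFFF and src = 1, ja is taken because 4294967295 > 1, but jg is not, because as a signed value 0xFFFFFFFF is -1. Seeing ja/jb versus jg/jl in a disassembly is therefore a hint about whether the original variables were unsigned or signed.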

Rep

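  • The rep instructions are a set of instructions for manipulating data buffers (like arrays of bytes, words, or double words).
  • They let a single instruction process a whole buffer by repeating a one-element operation, instead of the program looping over the elements itself.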

Key Players:

  • ESI: Source index register (where to read from).
  • EDI: Destination index register (where to write to).
  • ECX: Counter (how many times to repeat).

How Rep Instructions Work

The rep prefix tells the CPU:

  • “Repeat this instruction until ECX = 0.”

There are variations:

Prefix           Stop When…
rep              ECX = 0
repe / repz      ECX = 0 or ZF = 0
repne / repnz    ECX = 0 or ZF = 1

Common Rep Instructions

Instruction    What It Does (Like in C)
rep movsb      Copies bytes from source to destination (memcpy).
repe cmpsb     Compares two buffers until mismatch (memcmp).
rep stosb      Initializes a buffer to the same value (memset).
repne scasb    Searches for a byte in a buffer.

Example: rep movsb

  • ESI points to source
  • EDI points to destination
  • ECX = number of bytes to copy

On each iteration, the CPU moves 1 byte from the address in ESI to the address in EDI, then decrements ECX. If DF (Direction Flag) = 0 (default), ESI and EDI are incremented (copy forward); if DF = 1, they are decremented.

  • Repeat until ECX = 0.
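
The loop that rep movsb performs can be sketched in C. This model assumes a hypothetical flat memory array in which ESI and EDI are offsets; the StringRegs type and the function name are made up for illustration.

```c
#include <stdint.h>

typedef struct {
    uint32_t esi;  /* source index: where to read from */
    uint32_t edi;  /* destination index: where to write to */
    uint32_t ecx;  /* counter: how many bytes are left */
    int      df;   /* Direction Flag: 0 = forward, 1 = backward */
} StringRegs;

/* rep movsb: copy ECX bytes from [ESI] to [EDI]. */
void rep_movsb(uint8_t *memory, StringRegs *r) {
    uint32_t step = r->df ? (uint32_t)-1 : 1u;  /* -1 wraps, moving backward */
    while (r->ecx != 0) {
        memory[r->edi] = memory[r->esi];  /* movsb: move one byte */
        r->esi += step;                   /* advance the source */
        r->edi += step;                   /* advance the destination */
        r->ecx--;                         /* rep: count down */
    }
}
```

After the loop, ECX is 0 and ESI and EDI point just past the copied data, which is the register state an analyst will typically see after a memcpy-style rep movsb.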

Example: repe cmpsb

Used to compare two buffers:

  • ESI points to buffer A
  • EDI points to buffer B
  • ECX = number of bytes to compare
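  • Compares one byte at a time: the byte at the address in ESI vs. the byte at the address in EDI.
  • If mismatch, stop (ZF=0).
  • ECX is decremented after every comparison; if the bytes were equal, repeat until ECX=0 or a mismatch.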
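
The repe cmpsb loop can be sketched in C in the same style. The CmpRegs type and the flat memory array are hypothetical, and the sketch assumes DF = 0 (forward comparison).

```c
#include <stdint.h>

typedef struct {
    uint32_t esi;  /* offset of buffer A */
    uint32_t edi;  /* offset of buffer B */
    uint32_t ecx;  /* number of bytes left to compare */
    int      zf;   /* Zero Flag from the last comparison */
} CmpRegs;

/* repe cmpsb: compare byte by byte while the bytes are equal. */
void repe_cmpsb(const uint8_t *memory, CmpRegs *r) {
    while (r->ecx != 0) {
        r->zf = (memory[r->esi] == memory[r->edi]);  /* cmpsb sets ZF */
        r->esi++;
        r->edi++;
        r->ecx--;
        if (!r->zf) {
            break;  /* repe stops as soon as ZF = 0 (mismatch) */
        }
    }
}
```

When the loop ends, ZF = 1 means every compared byte matched, and ZF = 0 means a mismatch was found; the remaining ECX shows how far from the end of the buffers the comparison stopped.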

Other Handy Instructions

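  • stosb: Stores AL at the address in EDI. With rep, it stores AL in every byte of the buffer.
    (e.g., set entire buffer to zero)
  • scasb: Compares AL to the byte at the address in EDI. With repne, it searches the buffer, comparing AL to each byte until the byte is found or ECX = 0.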
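
Both instructions can be modelled with short C loops. This sketch assumes DF = 0 and a hypothetical flat memory array with EDI as an offset; the function names and the choice to return the remaining ECX are made up for illustration.

```c
#include <stdint.h>

/* rep stosb: store AL at [EDI], ECX times (like memset). */
void rep_stosb(uint8_t *memory, uint32_t edi, uint32_t ecx, uint8_t al) {
    while (ecx != 0) {
        memory[edi] = al;  /* stosb: store one byte */
        edi++;
        ecx--;
    }
}

/* repne scasb: compare AL with [EDI] until a match or ECX = 0.
   Returns the value left in ECX; *zf is 1 if the byte was found.
   Simplification: *zf starts at 0, while a real CPU leaves the flags
   unchanged when ECX is already 0. */
uint32_t repne_scasb(const uint8_t *memory, uint32_t edi, uint32_t ecx,
                     uint8_t al, int *zf) {
    *zf = 0;
    while (ecx != 0) {
        *zf = (memory[edi] == al);  /* scasb sets ZF on a match */
        edi++;
        ecx--;
        if (*zf) {
            break;  /* repne stops as soon as ZF = 1 */
        }
    }
    return ecx;
}
```

The number of bytes scanned is the original ECX minus the value returned, which is how code built around repne scasb can work out the position of the byte it was searching for.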