What is the PE file format?

A PE (Portable Executable) file is the native file format for executables, DLLs, and other binary files in the Windows operating system. Derived from the Unix COFF (Common Object File Format), it standardizes how Windows loads and executes programs across all Win32-compatible platforms, including those running on different CPU architectures like x86 or ARM. Despite the name “portable,” a PE file’s machine code is architecture-specific and must be recompiled for other CPU types. The PE format includes headers and sections that define code, data, resources, and other execution details, making it essential for developers, reverse engineers, and security analysts to understand how Windows software is structured and executed.

The structure of a PE file looks like this:

DOS header

  • The DOS header sits at the very start of the file and stores the information needed to locate the rest of the PE file. It is therefore mandatory: without a valid DOS header, the file cannot be loaded.

DOS header structure:

typedef struct _IMAGE_DOS_HEADER {  // DOS .EXE header
    WORD e_magic;           // Magic number
    WORD e_cblp;            // Bytes on last page of file
    WORD e_cp;              // Pages in file
    WORD e_crlc;            // Relocations
    WORD e_cparhdr;         // Size of header in paragraphs
    WORD e_minalloc;        // Minimum extra paragraphs needed
    WORD e_maxalloc;        // Maximum extra paragraphs needed
    WORD e_ss;              // Initial (relative) SS value
    WORD e_sp;              // Initial SP value
    WORD e_csum;            // Checksum
    WORD e_ip;              // Initial IP value
    WORD e_cs;              // Initial (relative) CS value
    WORD e_lfarlc;          // File address of relocation table
    WORD e_ovno;            // Overlay number
    WORD e_res[4];          // Reserved words
    WORD e_oemid;           // OEM identifier (for e_oeminfo)
    WORD e_oeminfo;         // OEM information; e_oemid specific
    WORD e_res2[10];        // Reserved words
    LONG e_lfanew;          // File address of new exe header
} IMAGE_DOS_HEADER, *PIMAGE_DOS_HEADER;

  • It is 64 bytes in size.
  • The two most important fields in the DOS header are e_magic and e_lfanew.
    • The e_magic field contains the magic number 4d 5a (or “MZ” in ASCII), which identifies the file as a DOS-compatible executable and serves as a signature for the PE format. These initials belong to Mark Zbikowski, a Microsoft engineer who played a key role in early DOS development.
    • The e_lfanew field points to the actual start of the PE header, making it critical for the Windows loader and analysis tools to locate and parse the PE file properly.
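
The two checks described above can be sketched in portable C. This is a minimal illustration, not the Windows loader's actual logic: the function and macro names (find_pe_header_offset, DOS_MAGIC and so on) are hypothetical, and the file is assumed to be already read into a byte buffer. The offset 0x3C for e_lfanew follows from the structure layout: it is the last 4 bytes of the 64-byte header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DOS_HEADER_SIZE 64u      /* size of IMAGE_DOS_HEADER */
#define DOS_MAGIC       0x5A4Du  /* bytes 4D 5A ("MZ") read as a little-endian WORD */
#define E_LFANEW_OFFSET 0x3Cu    /* e_lfanew occupies the last 4 bytes of the header */

/* Read little-endian integers byte by byte so the code works on any host. */
static uint16_t read_u16le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: returns 0 and stores e_lfanew in *pe_offset on
 * success, or -1 if the buffer does not start with a plausible DOS header. */
static int find_pe_header_offset(const uint8_t *buf, size_t len,
                                 uint32_t *pe_offset) {
    assert(buf != NULL && pe_offset != NULL);
    if (len < DOS_HEADER_SIZE) return -1;          /* too short for a DOS header */
    if (read_u16le(buf) != DOS_MAGIC) return -1;   /* missing "MZ" */
    uint32_t off = read_u32le(buf + E_LFANEW_OFFSET);
    if (off > len || len - off < 4) return -1;     /* no room for the PE signature */
    *pe_offset = off;
    return 0;
}
```

A caller would load the file into memory, pass the buffer and its length, and then continue parsing at buf + pe_offset, where the PE signature is expected.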

DOS stub

  • The DOS stub starts right after the first 64 bytes of the file.
  • Its main purpose is to maintain backward compatibility with DOS systems by displaying a message like “This program cannot be run in DOS mode” if the file is executed in a non-Windows environment.
  • Although it contains valid 16-bit DOS code, it’s largely obsolete and never runs on modern Windows systems. It also plays a structural role by filling space before the actual PE header, whose location is specified by the e_lfanew field in the DOS header.
  • Apart from the stub code and its message string, the bytes in this region of the file are often just zero padding.

PE header

  • This portion is small: it contains only a 4-byte file signature, the magic bytes 50 45 00 00 (“PE” followed by two null bytes).

File Header

  • Comes immediately after the PE signature.
  • Contains metadata about the executable such as:
    • Target machine architecture (e.g., x86, x64)
    • Number of sections
    • Timestamp
    • Flags that describe the file type (e.g., executable, DLL)
  • It does not contain actual executable code but guides the loader on how to interpret the rest of the file.
  • Sometimes referred to as the COFF (Common Object File Format) Header, because it’s inherited from the COFF structure used in Unix.
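
As a rough sketch of how a tool might read these fields, the following C code checks the PE signature and decodes part of the 20-byte File Header that follows it. The struct and function names (file_header_info, parse_file_header, is_dll) are hypothetical; the field offsets and the constants for the AMD64 machine type and the DLL flag are the ones documented for IMAGE_FILE_HEADER. The pe_offset argument is assumed to be the value taken from e_lfanew.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PE_SIGNATURE     0x00004550u /* bytes 50 45 00 00 as a little-endian DWORD */
#define FILE_HEADER_SIZE 20u         /* size of IMAGE_FILE_HEADER */
#define MACHINE_I386     0x014Cu     /* x86 */
#define MACHINE_AMD64    0x8664u     /* x64 */
#define FILE_DLL         0x2000u     /* Characteristics flag: file is a DLL */

static uint16_t read_u16le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical container for the fields discussed in the text. */
typedef struct {
    uint16_t machine;                 /* target architecture */
    uint16_t number_of_sections;
    uint32_t time_date_stamp;
    uint16_t size_of_optional_header;
    uint16_t characteristics;         /* file type flags */
} file_header_info;

/* Returns 0 on success, -1 if the signature is wrong or the buffer is short. */
static int parse_file_header(const uint8_t *buf, size_t len,
                             uint32_t pe_offset, file_header_info *out) {
    assert(buf != NULL && out != NULL);
    if (pe_offset > len || len - pe_offset < 4 + FILE_HEADER_SIZE) return -1;
    const uint8_t *p = buf + pe_offset;
    if (read_u32le(p) != PE_SIGNATURE) return -1;  /* expect "PE\0\0" */
    p += 4;                                        /* File Header follows */
    out->machine                 = read_u16le(p + 0);
    out->number_of_sections      = read_u16le(p + 2);
    out->time_date_stamp         = read_u32le(p + 4);
    out->size_of_optional_header = read_u16le(p + 16);
    out->characteristics         = read_u16le(p + 18);
    return 0;
}

static int is_dll(const file_header_info *h) {
    return (h->characteristics & FILE_DLL) != 0;
}
```

The two skipped fields at offsets 8 and 12 (the symbol table pointer and symbol count) are left out here because they are not discussed above.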

Optional Header

  • It is optional only in the context of COFF object files; a PE image always has one. It contains many important fields, such as AddressOfEntryPoint, ImageBase, SectionAlignment, SizeOfImage, SizeOfHeaders and the DataDirectory.
  • This structure has a 32-bit version (PE32) and a 64-bit version (PE32+).
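
The difference between the two versions can be illustrated with a short C sketch. It relies on the documented Magic field at the start of the Optional Header (0x10B for PE32, 0x20B for PE32+) and on the fact that ImageBase is a 4-byte field at offset 28 in PE32 but an 8-byte field at offset 24 in PE32+. The names optional_header_info and parse_optional_header are hypothetical, and opt is assumed to point at the first byte of the Optional Header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OPT_MAGIC_PE32      0x010Bu
#define OPT_MAGIC_PE32_PLUS 0x020Bu

static uint16_t read_u16le(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t read_u32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static uint64_t read_u64le(const uint8_t *p) {
    return ((uint64_t)read_u32le(p + 4) << 32) | read_u32le(p);
}

/* Hypothetical container for a few of the fields named in the text. */
typedef struct {
    int      is_pe32_plus;            /* 1 for the 64-bit layout */
    uint32_t address_of_entry_point;  /* RVA of the entry point */
    uint64_t image_base;              /* preferred load address */
    uint32_t section_alignment;
    uint32_t size_of_image;
    uint32_t size_of_headers;
} optional_header_info;

/* Returns 0 on success, -1 on an unknown Magic value or a short buffer. */
static int parse_optional_header(const uint8_t *opt, size_t len,
                                 optional_header_info *out) {
    assert(opt != NULL && out != NULL);
    if (len < 64) return -1;          /* fields read below end at offset 64 */
    uint16_t magic = read_u16le(opt);
    if (magic == OPT_MAGIC_PE32) {
        out->is_pe32_plus = 0;
        out->image_base = read_u32le(opt + 28);   /* 4-byte ImageBase */
    } else if (magic == OPT_MAGIC_PE32_PLUS) {
        out->is_pe32_plus = 1;
        out->image_base = read_u64le(opt + 24);   /* 8-byte ImageBase */
    } else {
        return -1;
    }
    /* These fields sit at the same offsets in both layouts. */
    out->address_of_entry_point = read_u32le(opt + 16);
    out->section_alignment      = read_u32le(opt + 32);
    out->size_of_image          = read_u32le(opt + 56);
    out->size_of_headers        = read_u32le(opt + 60);
    return 0;
}
```

Checking Magic first is the design choice that matters here: every field after ImageBase can only be interpreted once the layout is known.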

Section Table

  • Contains an array of IMAGE_SECTION_HEADER structs which define the sections of the PE file such as the .text and .data sections.
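  • Each entry in the table is 0x28 (40) bytes long, and the number of entries is given by the number of sections recorded in the File Header.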
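
A single 40-byte entry could be decoded along the following lines. This is a sketch under stated assumptions: section_info and parse_section are hypothetical names, table is assumed to point at the first IMAGE_SECTION_HEADER, and only a subset of the fields is extracted. The 8-byte Name field is not guaranteed to be null-terminated, so the code copies it into a 9-byte buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTION_HEADER_SIZE 40u   /* 0x28 bytes per IMAGE_SECTION_HEADER */

static uint32_t read_u32le(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical container for the most commonly used fields. */
typedef struct {
    char     name[9];              /* 8-byte Name plus a terminator */
    uint32_t virtual_size;         /* size of the section in memory */
    uint32_t virtual_address;      /* RVA of the section in memory */
    uint32_t size_of_raw_data;     /* size of the section on disk */
    uint32_t pointer_to_raw_data;  /* file offset of the section data */
    uint32_t characteristics;      /* flags such as code, readable, writable */
} section_info;

/* Decodes entry number `index`; returns -1 if it lies outside the table. */
static int parse_section(const uint8_t *table, size_t table_len,
                         size_t index, section_info *out) {
    assert(table != NULL && out != NULL);
    if (index >= table_len / SECTION_HEADER_SIZE) return -1;
    const uint8_t *p = table + index * SECTION_HEADER_SIZE;
    memcpy(out->name, p, 8);
    out->name[8] = '\0';
    out->virtual_size        = read_u32le(p + 8);
    out->virtual_address     = read_u32le(p + 12);
    out->size_of_raw_data    = read_u32le(p + 16);
    out->pointer_to_raw_data = read_u32le(p + 20);
    out->characteristics     = read_u32le(p + 36);
    return 0;
}
```

Looping index from 0 up to the section count from the File Header would walk the whole table.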

Sections

  • Applications in Windows do not directly access physical memory. Instead, they operate within a virtual memory space. Within this space, sections of a PE file (like code, data, and resources) are loaded and managed. These sections are mapped into virtual memory, and all operations are performed using this mapped data. The address used by an application to reference memory, without any additional offset, is called a Virtual Address (VA).
  • Each application has a preferred starting point in virtual memory, specified by the ImageBase field in the PE file’s Optional Header.
  • Relative Virtual Addresses (RVAs) are offsets measured from the image's base address, which is the ImageBase when the image is loaded at its preferred address. The relationship between these addresses is given by the formula:
    • RVA = VA - ImageBase.
  • Since the ImageBase is known, you can convert between VA and RVA as needed.
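  • The size of each section is recorded in the PE file’s section table. On disk, a section’s raw data is rounded up to a multiple of FileAlignment, and in memory each section starts at a multiple of SectionAlignment; if the actual data is smaller than the allotted size, the remainder is padded with null bytes (00).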
  • A typical Windows NT application includes predefined sections like
    • .text (code),
    • .data (initialized data),
    • .bss (uninitialized data),
    • .rdata (read-only data), and
    • .rsrc (resources).
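
The formula RVA = VA - ImageBase given above translates directly into code. The following C sketch uses hypothetical helper names (rva_to_va, va_to_rva) and assumes the image is loaded at the given base address; it also rejects virtual addresses that lie below the base or too far above it to fit in a 32-bit RVA.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* VA = ImageBase + RVA */
static uint64_t rva_to_va(uint64_t image_base, uint32_t rva) {
    return image_base + rva;
}

/* RVA = VA - ImageBase. Returns 0 on success, or -1 if the VA cannot
 * belong to an image loaded at image_base. */
static int va_to_rva(uint64_t image_base, uint64_t va, uint32_t *rva) {
    assert(rva != NULL);
    if (va < image_base) return -1;               /* below the image */
    if (va - image_base > UINT32_MAX) return -1;  /* RVAs are 32-bit values */
    *rva = (uint32_t)(va - image_base);
    return 0;
}
```

For example, with an ImageBase of 0x400000, the RVA 0x1000 corresponds to the VA 0x401000, and converting that VA back yields 0x1000 again.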

Import Address Table

  • The Import Address Table (IAT) consists of function pointers. When the DLLs an application depends on are loaded, the table is filled with the addresses of the imported functions.
  • A compiled application does not call API functions at hardcoded addresses. Instead, every such call goes through a function pointer stored in the IAT.
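
The idea of calling through a table of function pointers can be shown with a small self-contained simulation. This is not how the Windows loader is implemented and it does not parse a real import directory: the "DLL" functions, the export list and the iat array are all hypothetical stand-ins, chosen only to show that application code calls through a slot that is filled in at load time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*api_fn)(int, int);

/* Hypothetical "DLL" functions standing in for real API functions. */
static int dll_add(int a, int b) { return a + b; }
static int dll_mul(int a, int b) { return a * b; }

/* Hypothetical export list of that DLL: name to address. */
typedef struct { const char *name; api_fn address; } export_entry;
static const export_entry fake_exports[] = {
    { "Add", dll_add },
    { "Mul", dll_mul },
};

/* The application's simulated IAT: one slot per imported function.
 * The slots are empty until the imports are resolved. */
enum { IAT_ADD, IAT_MUL, IAT_COUNT };
static const char *const import_names[IAT_COUNT] = { "Add", "Mul" };
static api_fn iat[IAT_COUNT];

/* Plays the role of the loader: looks up every imported name and
 * writes the resolved address into the matching IAT slot. */
static int resolve_imports(void) {
    size_t n_exports = sizeof fake_exports / sizeof fake_exports[0];
    for (size_t i = 0; i < (size_t)IAT_COUNT; i++) {
        iat[i] = NULL;
        for (size_t j = 0; j < n_exports; j++) {
            if (strcmp(import_names[i], fake_exports[j].name) == 0)
                iat[i] = fake_exports[j].address;
        }
        if (iat[i] == NULL) return -1;   /* unresolved import */
    }
    return 0;
}

/* Application code: calls go through the IAT slots, never through
 * an address fixed at compile time. Computes (x + y) * y. */
static int app_compute(int x, int y) {
    assert(iat[IAT_ADD] != NULL && iat[IAT_MUL] != NULL);
    return iat[IAT_MUL](iat[IAT_ADD](x, y), y);
}
```

Because app_compute only ever reads the slots, the "DLL" functions could live at any address; only the table needs to be updated, which is the property the IAT provides for real imports.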