bddisasm: The Bitdefender x86 Disassembler
Introduction
Hypervisor Memory Introspection (HVMI) relies on analyzing memory accesses in order to determine whether they are legitimate or not. For example, by analyzing the old memory value and the newly stored value, HVMI can decide whether to allow the modification or not. This, however, introduces the complication of needing to do an in-depth analysis of each instruction that modifies protected memory. Unlike a RISC architecture, x86 has a large number of instructions that may access memory in a complicated, read-modify-write (RMW) manner, and using complicated addressing schemes. In order to simplify instruction decoding and analysis, a dedicated x86 instruction decoder has been created, capable of providing full instruction information, and thus alleviating the HVMI module from needing to know x86 instructions format. This blog post will detail some bddisasm internals, how to work with it, while highlighting why it is a critical part of HVMI. In addition, we will go through some particularities of x86 instruction encoding. The main bddisasm project is located here, and the documentation can be accessed here.
bddisasm Overview
bddisasm is a standalone library, written in C, with some Python used for internal decoder-tables generation. The library is built to be fast, while providing as much information about the decoded instructions as possible - this is important, because other projects using bddisasm can rely on it to provide complete and accurate information about instructions. When considering other decoding libraries, there are only some which are similar in features:
- Intel Xed, which is written and maintained by Intel, thus somehow making it the standard x86 decoder; while not the fastest, it provides rich information about the decoded instructions;
- Capstone, which is basically a collection of decoders for multiple architectures;
- ZyDis, which is comparable to Xed as far as features go, and lightweight (however, when bddisasm was created internally, ZyDis did not exist yet);
Other decoders written in other languages, such as Rust or C#, or disassembler (which only provide a textual output of the instruction, without providing actual decoded instruction information) were not considered. Given that Xed and Capstone seemed rather difficult to work with at that time, we decided to create a lightweight decoder of our own, with the following objectives in mind:
- Lightweight - written entirely in C, with no external dependencies, no memory allocated, and thread-safe by design;
- Fast - while we needed a decoder capable of providing as much information as possible about instructions, we still considered speed an important factor;
- Resilient - the decoder should be able to handle all kinds of malformed instructions, as well as valid instructions containing redundant prefixes, or encodings which are not typical;
- Complete - the decoder must support all existing x86 instructions, including AVX; moreover, extending the support to new instructions should be as simple as possible;
- Easy to work with - single-header file, single-API library, which provides all possible information in the output decoded instruction, without the need to call additional functions in order to extract information from the decoded instruction;
We will not get into a comparison between bddisasm and other decoding libraries, and we will keep the focus of this blog post on how to work with bddisasm and how useful it is in HVMI.
Working With bddisasm
Working with the decoding library is easy: include the bddisasm.h
header file, link with the bddisasm.lib
(Windows) or libbddisasm.a
(Linux), and call the decoding API! bddisasm uses a single-API decoding scheme, where the NdDecode
APIs provide an output INSTRUX
structure containing all the possible information about the instruction. The only thing not included in the INSTRUX
structure is the textual disassembly of the instruction, which has to be generated separately using the NdToText
API. A typical usage scenario might be the following:
#include "bddisasm/bddisasm.h"
int main()
{
INSTRUX ix;
unsigned char ins[2] = { 0x33, 0xC0 };
NDSTATUS status;
status = NdDecodeEx(&ix, ins, sizeof(ins), ND_CODE_64, ND_DATA_64);
if (!ND_SUCCESS(status))
{
printf("Decoding failed with error 0x%08x!\n", status);
return -1;
}
printf("Decoded instruction with length %d!\n", ix.Length);
}
Decoded Instruction Information
Note that the output INSTRUX ix
structure will contain all the information about the decoded instruction. A comprehensive list about what kind of information you can find:
- Raw information about the instruction, such as prefixes, opcodes, modrm, SIB, displacement, immediate fields;
- Information about operand size, address size, vector length;
- Decoded prefix information, such as whether the instruction uses lock, is repeated, is xacquire/xrelease enabled or CET tracked;
- Length information, including about the instruction itself and different fields of the instruction, such as immediate fields or displacement;
- Offset information about each constituent field of the instruction (position of each field inside the instruction);
- Individual RFLAGS flags access mode: modified, tested, set, cleared and undefined;
- FPU flags access for
C0
,C1
,C2
andC3
flags; - AVX information, such as instruction exception class, tuple type, and rounding mode;
- Instruction category (example: CALL, RET, ARITH, LOGIC, etc.);
- Instruction set (example: I86, I64, MMX, SSE4, etc.);
- CPUID feature flag, which indicates the leaf and subleaf that must be queried, together with the register and bit which indicates support for that particular instruction;
- Valid operating modes for the instruction (example: real, protected, long, VMX root, etc.);
- Valid prefixes for the instruction (example: REP, LOCK, etc.);
- Valid decorators for AVX instructions (example: masking, broadcast, rounding, etc.);
- Instruction Mnemonic;
- Operand information, including: operand type, size, access (read, conditional read, write, conditional write) and details;
- Full memory operand details: segment, base, index, scale, compressed displacement, stack, string, bitbase, VSIB, etc.);
- Implicit operands information: registers, implicit memory locations, etc.
Examples on how different kind if info can be extracted from INSTRUX
can be found on the official documentation page.
Textual Disassembly
Using the NdToText
function, one can convert a decoded INSTRUX
to a textual disassembly that can be printed. The NdToText
function only supports Intel style syntax, so the following instruction 33C0
would be decoded as XOR eax, eax
, and 4833C0
would be decoded as XOR rax, rax
. Typical usage of the NdToText
function is:
// Create the text disassembly for this instruction.
char text[ND_MIN_BUF_SIZE];
NdToText(&ix, 0, sizeof(text), text);
printf("Instruction: %s\n", text);
bddisasm in HVMI
bddisasm is a critical part of the HVMI technology. As mentioned, decoding and analyzing instructions is important, since the vast majority of events HVMI is dealing with are EPT violations (memory-referencing instructions).
One of the first things HVMI does when an EPT violation takes place is to decode the offending instruction. Since accessing guest memory is usually slow (as it involves translating the guest linear address to a guest physical address, and mapping each physical page in the process - both page-tables and the actual page), once an instruction is decoded, it gets cached internally. Subsequent accesses from the same instruction pointer would yield an already decoded, cached instruction, thus speeding up this process (of course, caching instructions also implies that the pages containing them have to be monitored for modifications, in order to invalidate the cache if an instruction is modified).
Once the instruction has been decoded, HVMI will dissect it in order to determine each memory location accessed. This is important, because one instruction may directly, or indirectly, access multiple memory locations, and we have to analyze each access before allowing the instruction to continue. Since bddisasm provides full operand information (including implicit operands), HVMI will basically iterate through all the instruction operands, and check if they’re memory. For each memory operand, it would then make sure that it accesses a region of memory that is not monitored by HVMI. Examples of instructions that access multiple addresses might include:
CALL [mem]
- it reads the[mem]
operand, and it writes the stack; it may also access the shadow stack, if CET Shadow Stack is enabled;MOVS
- it reads memory from[rsi]
, and writes memory to[rdi]
;- AVX instruction using VSIB addressing - multiple addresses may be accessed by a VSIB operand such as
[rax+xmm0*8]
; - MPX instructions
BNDLDX
andBNDSTX
, which access the bounds tables;
There are also some events (mostly asynchronous) which may cause EPT violations, even if the current instruction does not do any kind of memory access; such events include:
- Delivery of interrupts or exceptions, which will read the Interrupt Descriptor Table (IDT), and will write the interrupt frame on the stack;
- Accesses inside the Global Descriptor Table (GDT), as part of a descriptor load;
- Access inside the Task State Segment (TSS) as part of a task switch or interrupt delivery;
- Accesses inside page-tables, as part of page-walks (although these are distinctively indicated using a dedicated bit inside the VMCS EPT violation exit qualification);
- Other memory accesses, such as Processor Trace induced accesses, Branch Trace Stores accesses, etc.
These types of events are not included in the INSTRUX
structure, as they may occur in an asynchronous manner, during normal instruction execution.
Once the faulting instruction has been decoded, it gets analyzed by multiple modules inside Introcore, in order to determine whether it is a legitimate modification (this is used together with other kinds of information extracted, such as what module the instruction belongs to). A typical validation is comparing the old memory value with the new value that is about to be stored - if they are the same, it is generally safe to allow the instruction to execute. Extracting the old value can be done by doing a simple emulation of the instruction - since bddisasm already provides all the necessary information, the new value can be computed fairly easy (and since this is not a full-blown emulation, there’s no need to do checks at this point).
Another common task done to instructions which access memory is to compute the guest linear address accessed by an explicit memory operand. For a typical modrm encoded memory operand, steps needed in order to compute this value include:
- Querying the segment registers, and fetching the base of the segment used in the addressing; the resulting linear address can be initialized to this segment-base value, or 0, of no segment is used;
- If the memory operand is direct (such as in
A01111111111111111
, which decodes toMOV al, byte ptr [0x1111111111111111]
), the address can be added to the linear address, and no further processing is needed; - If the memory operand uses a base register, add its value to the linear address;
- If the memory operand uses an index register, scale it and add it to the linear address;
- If the memory operand uses a displacement, add it to the linear address;
- If the memory oprand is RIP-relative, add the length of the current instruction, followed by the current RIP;
- If the memory operand uses bitbase addressing (
BT
,BTS
,BTR
,BTC
instructions), compute the bit offset from the source operand, and add it to the resulting linear address; - If the operation is a stack push, subtract the size of the stack operation from the resulting linear address;
The steps listed above will yield the linear address used by the instruction operand, which can be further processed to extract more information.
Adding support for new instructions
bddisasm has been created in an extensible way, making the addition of new instructions trivial, as long as they don’t use a new encoding scheme, or new registers. The instructions database is contained in several .dat
files located inside the isagenerator/instructions
project of the bddisasm repo. While sufficient information about this project is already provided in the readme, we will only show a quick demonstration about adding new instructions to bddisasm.
Let’s first pick up an encoding that is not currently used by any instruction - 0F04
(although it should be noted that it was used on 286 by LOADALL
). If we try to decode this instruction, we will see that bddisasm fails:
C:\>disasmtool -b64 -h 0F04
0000000000000000 0f db 0x0f (0x80000002)
Looking inside disasmstatus.h
, we see that error code 0x80000002
means ND_STATUS_INVALID_ENCODING
- so there is no valid encoding for it, which is good, since we want to create our own.
Let’s now assume that the instruction we want to create is a simple, modrm encoded instruction, with two operands - the first one is a general purpose register, while the second one is either general purpose register, or memory. Let’s call this instruction BDDISASM
, and let’s see a couple of possible forms:
BDDISASM eax, ecx
BDDISASM rax, r15
BDDISASM rsp, qword [rdi]
As already mentioned, the opcode for this instruction will be (the currently unassigned) 0F04
. The first operand will be a 16, 32 or 64 bit general purpose register, depending on operand size. The second operand will be a 16, 32 or 64 bit general purpose register or memory location, depending on modrm.mod and operand size. The first operand is read-write, while the second one is only read. There are no implicit operands. We can now describe the basic instruction in table_0F.dat
:
BDDISASM Gv,Ev nil [ 0x0F 0x04 /r] w:RW|R
Let’s break down this specification:
- The first element,
BDDISASM
, represents the mnemonic; - The second element,
Gv
, indicates that the first operand is a general purpose register encoded in modrm.reg (G
), and is 16, 32 or 64 bit in size, depending on operand size (v
); - The third element,
Ev
, indicates that the second operand is a general purpose register or memory encoded in modrm.rm (E
), and is 16, 32 or 64 bit in size, depending on operand size (v
); - The next element describes the implicit operands accessed by the instruction; since we have none, we specify
nil
; - The next element is the most important, and it describes the encoding; it enumerates all the opcode bytes -
0F
and04
, followed by/r
, indicating that the instruction uses modrm encoding; - The last element is the access map for each instruction operand, and is specified using the keyword
w:
. The first operand is read-write (RW
), while the second one is read (R
).
Let’s now rebuild the decoding tree by building the isagenerator
project in VisualStudio, or by running make
, followed by rebuilding the bddisasm
library and the disasmtool
. Let’s now try to decode the instruction again:
C:\>disasmtool -b64 -h 0F04
0000000000000000 0f db 0x0f (0x80000001)
We now get a different error: 0x80000001
which means ND_STATUS_BUFFER_TOO_SMALL
- this indicates that there is a valid instruction for this encoding, but we didn’t supply enough bytes to decode it. Indeed, we did not, as we only specified the opcode, without the modrm byte. Let’s try again, using the modrm byte as well this time:
C:\>disasmtool -b64 -h 0F0400
0000000000000000 0f0400 BDDISASM eax, dword ptr [rax]
As you can see, the instruction has been successfully decoded! Playing with different encodings yield the expected instruction:
C:\>disasmtool -b64 -h 0F0400660F0400480F0400670F0400F30F0400
0000000000000000 0f0400 BDDISASM eax, dword ptr [rax]
0000000000000003 660f0400 BDDISASM ax, word ptr [rax]
0000000000000007 480f0400 BDDISASM rax, qword ptr [rax]
000000000000000B 670f0400 BDDISASM eax, dword ptr [eax]
000000000000000F f30f0400 BDDISASM eax, dword ptr [rax]
If we want, we can add more information to our instruction. For example, we can now assume that the instruction modifies the flags - it always sets the carry-flag (CF
). In order to specify this, we have to first indicate that the instruction uses the implicit operand Fv
, which stands for the flags register, and then we have to tell it what flags it modifies, using the f:
keyword:
BDDISASM Gv,Ev Fv [ 0x0F 0x04 /r] w:RW|R|W, f:CF=1
Running the disassembler once again, with the extended information option, yields this:
C:\>disasmtool -b64 -h 0F0400 -exi
0000000000000000 0f0400 BDDISASM eax, dword ptr [rax]
DSIZE: 32, ASIZE: 64, VLEN: -
ISA Set: UNKNOWN, Ins cat: UNKNOWN, CET tracked: no
FLAGS access
CF: 1,
Valid modes
R0: yes, R1: yes, R2: yes, R3: yes
Real: yes, V8086: yes, Prot: yes, Compat: yes, Long: yes
SMM on: yes, SMM off: yes, SGX on: yes, SGX off: yes, TSX on: yes, TSX off: yes
VMXRoot: yes, VMXNonRoot: yes, VMXRoot SEAM: yes, VMXNonRoot SEAM: yes, VMX off: yes
Valid prefixes
REP: no, REPcc: no, LOCK: no
HLE: no, XACQUIRE only: no, XRELEASE only: no
BND: no, BHINT: no, DNT: no
Operand: 0, Acc: RW, Type: Register, Size: 4, RawSize: 4, Encoding: R, RegType: General Purpose, RegSize: 4, RegId: 0, RegCount: 1
Operand: 1, Acc: R-, Type: Memory, Size: 4, RawSize: 4, Encoding: M,
Segment: 3, Base: 0,
Operand: 2, Acc: -W, Type: Register, Size: 4, RawSize: 4, Encoding: S, RegType: Flags, RegSize: 4, RegId: 0, RegCount: 1
You can see that in the FLAGS access
section, CF: 1
is listed, meaning that it is always set to 1
. Other possible values for flags include m
, which means that the flag is modified according to the result, 0
, which means that the flag is cleared, t
which means the flag is tested, and u
, which means the flag is undefined. Flags which are missing from the FLAGS access
section are not touched at all.
Special Instruction Encodings
Finally, let’s discuss some not-so-well-known cases when dealing with instruction encodings. Of course, all of them are documented, but finding this information in the SDM can take some time, so here is a quick list of the most relevant ones:
- It is possible to encode seemingly valid instructions which are longer than the maximum limit of 15 bytes, but the CPU would generate #GP on them anyway;
- The REX prefix must always be the last byte before the opcode bytes, otherwise it’s ignored:
48F333C0
will decode toxor eax, eax
, NOTxor rax, rax
, becauseF3
prefix comes after the REX48
prefix; - If both
F2
andF3
prefixes are present in an instruction, the CPU will consider the last occurring prefix:F2F2F2F3A6
will decode torepz cmpsb
, sinceF3
is the last occurring prefix, and all of theF2
prefixes will be ignored ; - Likewise, the last occurring segment prefix is considered, but note that in 64 bit mode, only
fs
andgs
overrides are accepted:65642e2e3300
will decode toxor eax, dword ptr cs:[eax]
in 32 bit mode, but it will decode toxor eax, dword ptr fs:[rax]
in 64 bit mode; - Ignored and redundant prefixes are part of the instruction, so they are counted to the instruction length:
F2F2F2F2F2F2F2F2F2F2F2F2F2F290
is decoded & executed as aNOP
, but adding one moreF2
prefix will make the instruction 16 bytes long, thus generating a #GP, and causing a decode error; - RSP cannot be used as an index with SIB addressing, and it will be ignored:
3304E0
will decode toxor eax, dword ptr [rax]
instead ofxor eax, dword ptr [rax+rsp*8]
; - When using
modrm.mod == 0
(memory) andmodrm.rm == 5
(rbp) in 64 bit mode, RIP relative addressing is used instead of direct addressing; however, by using SIB addressing withmodrm.mod == 0
(memory),modrm.rm == 4
(rsp, SIB),SIB.base == 5
(rbp), andSIB.index == 4
(rsp), one ca still do direct addressing to an absolute 32 bit address even in 64 bit mode:33042544444444
will decode toxor eax, dword ptr [0x44444444]
; - In 64 bit mode, stack operations use 64 bit operands by default (even without REX.W prefix); however, this means that encoding a PUSH/POP with a 32 bit operand is impossible, although you can still encode PUSH/POP with 16 bit operands:
6650
will decode topush ax
even in 64 bit mode, with no way to encodepush eax
; - In 64 bit mode, on Intel CPUs, indirect branches always use 64 bit operands:
FF20
,48FF20
,66FF20
will all decode tojmp qword ptr [rax]
in 64 bit mode; - Moves to/from control/debug registers will always use modrm.mod == 3, even if the actual modrm.mod is 0, 1 or 2:
0F2000
will decode tomov rax, cr0
, notmov [rax], cr0
; - Moves to/from control/debug registers with modrm.mod == 1 or modrm.mod == 2 (with displacement), will ignore the displacement:
0F2040
will decode tomov rax, cr0
, even if normally modrm.mod == 1 would require a 1 byte displacement; - On AMD, one can use the
LOCK
prefixF0
withmov cr
instruction to access the cr8 registers:F00F22C0
decodes tomov cr8, eax
;
Conclusions
We presented in this blog-post how a comprehensive, well written instruction decoder can greatly improve the life of a developer when working with instructions. Surely, bddisasm is a very important part of HVMI, as much of the introspection logic revolves around analyzing instructions, and having the option of doing this with a simple, fast and accurate decoder greately improves code quality.
We will see in a future blog-post how bddisasm can also help create simple instruction emulators, when we will dig into another important project - the Bitdefender Shellcode Emulator - bdshemu.