
SRE Basics


Presentation Transcript


  1. SRE Basics

  2. In this Section… • We briefly cover the following topics • Assembly code • Virtual machine/Java bytecode • Windows PE file format

  3. Assembly Code

  4. High Level Languages • First, high level languages… • Ancient high level languages • Basic --- little structure • FORTRAN --- limited structure • C --- “structured” language • C was designed to deal with complexity • OO languages take this one step further • Above languages considered primitive today

  5. High Level Languages • Object oriented (OO) languages • “Object” groups code and data together • Considered the best way to handle complexity (at least for now…) • Important OO ideas include • Encapsulation, inheritance, polymorphism

  6. High Level Languages • Program must deal with code and data • Data • Variables, data structures, files, etc. • Code • Reverser must study control flow • Conditionals, switches, loops, etc.

  7. High Level Languages • High level languages --- different users want different things • Goes back (at least) to C vs FORTRAN • Today, major tradeoff is between simplicity and flexibility • Simplicity --- easy to write short program to do exactly what you want (e.g., C) • Flexibility --- language has it all (e.g., Java)

  8. High Level Languages • Some languages compiled into native code • exe is specific to the hardware • C, C++, FORTRAN, etc. • Other languages “compiled” into “code”, which is interpreted by a virtual machine • Java, C# • Often possible to make compiled version • For reverser, this distinction is far more important than OO or not

  9. Intro to Assembly • At the lowest level, machine binary • Assembly code lives between binary and high level languages • When reversing native code, we must deal with assembly code • Why assembly code? • Why not “reverse” binary to, say, C?

  10. Intro to Assembly • Reverser would like to deal with high level, but is stuck with low level • Ideally, want to create mental “link” from low level to high level • Easier for code written in C • Harder for OO code, such as C++ • Why?

  11. Intro to Assembly • Perhaps biggest difference at assembly level is dealing with data • High level languages hide lots and lots of details on data manipulations • For example, loading and storing • Also, low level instructions are primitive • Each instruction does not do very much

  12. Intro to Assembly • Consider the following simple C program • Simple, but far higher level than assembly code

          int multiply(int x, int y)
          {
              int z;
              z = x * y;
              return z;
          }

  13. Intro to Assembly

          int multiply(int x, int y)
          {
              int z;
              z = x * y;
              return z;
          }

      In assembly code…

      1. Store state before entering function
      2. Allocate memory for z
      3. Load x and y into registers
      4. Multiply x by y and store result in register
      5. Copy result back to memory for z (optional)
      6. Restore state that was stored in step 1
      7. Return z
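
The seven steps can be traced with a toy model. The following is a minimal Python sketch, not real IA-32 semantics: the `regs` dictionary, the `stack` list, and the saved value `0xBEEF` are made up for illustration, and the multiplication is treated as unsigned and truncated to 32 bits.

```python
def multiply_low_level(x, y):
    """Trace the steps a compiler might emit for z = x * y (toy model)."""
    regs = {"eax": 0, "ecx": 0, "ebp": 0xBEEF}   # 0xBEEF: caller's frame pointer (made up)
    stack = []                       # grows with append(); top of stack is stack[-1]

    stack.append(regs["ebp"])        # 1. store state (here: the caller's EBP)
    stack.append(0)                  # 2. allocate a stack slot for z
    z_slot = len(stack) - 1

    regs["eax"] = x                  # 3. load x and y into registers
    regs["ecx"] = y
    regs["eax"] = (regs["eax"] * regs["ecx"]) & 0xFFFFFFFF   # 4. multiply, result in EAX
    stack[z_slot] = regs["eax"]      # 5. copy result back to z (optional)

    stack.pop()                      # release the slot for z
    regs["ebp"] = stack.pop()        # 6. restore the state saved in step 1
    return regs["eax"]               # 7. the return value travels in EAX
```

Step 5 is marked optional because an optimizing compiler can leave `z` in a register and never touch the stack slot.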

  14. Intro to Assembly • Why are things so complicated at low level? • It’s all about efficiency! • Reading memory and storing are slow • No single asm instruction reads two values from memory, operates on them, and stores the result back to memory • But this is common in high level languages

  15. Intro to Assembly • Registers --- “local” processor memory • So don’t have to read and write RAM • Stack --- “scratch paper” (in RAM) • Holds register values, local variables, function parameters and return values • E.g., storage for “z” in multiply example • Heap --- dynamic, variable-sized data • Data section --- e.g., string constants • Control flow --- high level “if” or “while” are much more complex at low level

  16. Registers • Registers used in most instructions • Specifics here deal with “IA-32” • Intel Architecture, 32-bit • Used in “Wintel” machines • We use Intel notation • AT&T notation also exists • Eight 32-bit registers (next slide) • All 8 start with “E” • Also several system registers

  17. Registers • EAX, EBX, EDX --- generic, used for int, Boolean, …, memory operations • ECX --- generic, used as counter • ESI/EDI --- generic, source/destination pointers when copying memory • SI == source index, DI == destination index • EBP --- generic, stack “base” pointer • Usually, stack position after return address • ESP --- stack pointer • Current stack frame is between ESP and EBP

  18. Flags • EFLAGS --- special register • Status flags updated by various operations to “record” outcomes • System flags too, but we don’t care about them • Flags are basic tool for conditionals • For example, a TEST followed by a jump instruction • TEST sets various flags, jump determines action to take, based on those flags

  19. Instruction Format • Most instructions consist of… • Opcode --- the “instruction” • One or two operands --- “parameter(s)” • Operands (parameters) are data • Operands come in 3 flavors • Register name --- for example, EAX • Immediate --- e.g., hard-coded constant • Memory address --- enclosed in [brackets]

  20. Operand Examples • EAX • Read from (or write to) EAX register, depending on opcode • 0x30004040 • Immediate --- number is embedded in code • Usually a constant in high-level code • [0x4000349e] • This is a memory address • Could be a global variable in high level code

  21. Basic Instructions • We cover a few common instructions • First we give general format • Later, we give a few simple examples • There are lots of assembly instructions • But, most assembly code uses only a few • About 14 assembly instructions account for more than 90% of all code
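
A claim like "a few opcodes cover most of the code" can be checked against any disassembly listing. The following is a minimal Python sketch, assuming a plain-text listing with one instruction per line and `;` comments. The `sample` listing is assembled from instructions shown elsewhere in these slides and is far too small to confirm the 90% figure.

```python
from collections import Counter

def opcode_counts(listing):
    """Count mnemonics in a disassembly listing, one instruction per line."""
    counts = Counter()
    for line in listing.splitlines():
        line = line.split(";", 1)[0].strip()   # drop comments and whitespace
        if line:
            counts[line.split()[0].lower()] += 1
    return counts

def coverage(counts, top_n):
    """Fraction of all instructions covered by the top_n most common opcodes."""
    total = sum(counts.values())
    return sum(n for _, n in counts.most_common(top_n)) / total

sample = """
    push eax
    push edi
    mov  edi, [ecx+0x5b0]
    mov  ebx, [ecx+0x5b4]
    imul edi, ebx
    cmp  ebx, 0xf020
    jnz  0x10026509
    mov  eax, edi
"""
counts = opcode_counts(sample)
```

Calling `coverage(counts, 14)` on the listing of a real program is one way to test the slide's figure for that program.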

  22. Opcode Counts • Typical opcode counts, “normal” code

  23. Opcode Counts • Opcode counts, typical virus code

  24. Instructions • We consider following operations • Moving data • Arithmetic • Comparisons • Conditional branches • Function calls

  25. Moving Data • MOV is the most popular opcode • 2 operands, destination and source: • MOV DestOperand, SourceOperand • Note the order • Destination first, source second

  26. Arithmetic • Six integer arithmetic operations • ADD, SUB, MUL, DIV, IMUL, IDIV • Many variations based on operands • ADD Op1, Op2 ; add, store result in Op1 • SUB Op1, Op2 ; sub Op2 from Op1 --> Op1 • MUL Op ; mul Op by EAX ---> EDX:EAX • DIV Op ; div EDX:EAX by Op, quotient ---> EAX, remainder ---> EDX • IMUL, IDIV --- like MUL and DIV, but signed
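
The EDX:EAX register pair is easier to follow with numbers. The following is a minimal Python sketch of the unsigned 32-bit forms of MUL and DIV. The function names `mul32` and `div32` are invented for illustration, and the signed variants (IMUL, IDIV) are not modeled.

```python
MASK32 = 0xFFFFFFFF

def mul32(eax, op):
    """Unsigned MUL: EDX:EAX = EAX * op. Returns (edx, eax)."""
    product = (eax & MASK32) * (op & MASK32)       # up to 64 bits wide
    return (product >> 32) & MASK32, product & MASK32

def div32(edx, eax, op):
    """Unsigned DIV: divide EDX:EAX by op. Returns (quotient -> EAX, remainder -> EDX)."""
    divisor = op & MASK32
    if divisor == 0:
        raise ZeroDivisionError("the CPU raises a divide error here")
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, divisor)
    if quotient > MASK32:
        raise OverflowError("quotient does not fit in EAX; the CPU raises a divide error")
    return quotient, remainder
```

For example, `mul32(0xFFFFFFFF, 2)` spills one bit into EDX, and feeding that pair back into `div32` with divisor 2 recovers the original EAX.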

  27. Comparisons • CMP opcode has 2 operands • CMP Operand1, Operand2 • Subtracts Operand2 from Operand1 • Result “stored” in flag bits • If 0 then ZF flag is set • Other flags can be used to tell which is greater, depending on signed or unsigned
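
How the "other flags" separate signed from unsigned comparisons can be shown by computing them by hand. The following is a minimal Python sketch of the four status flags CMP sets for 32-bit operands. The function names are invented for illustration, and the two predicates mirror the conditions tested by JB (unsigned) and JL (signed).

```python
MASK32 = 0xFFFFFFFF

def sign_bit(value):
    """Top bit of a 32-bit value."""
    return (value >> 31) & 1

def cmp32(op1, op2):
    """Return the status flags set by CMP op1, op2 (computes op1 - op2, discards the result)."""
    a, b = op1 & MASK32, op2 & MASK32
    result = (a - b) & MASK32
    return {
        "ZF": int(result == 0),                       # operands are equal
        "SF": sign_bit(result),                       # result is negative
        "CF": int(a < b),                             # unsigned borrow
        "OF": int(sign_bit(a) != sign_bit(b)          # signed overflow
                  and sign_bit(result) != sign_bit(a)),
    }

def unsigned_below(flags):
    """Condition tested by JB: op1 < op2 as unsigned values."""
    return flags["CF"] == 1

def signed_less(flags):
    """Condition tested by JL: op1 < op2 as signed values."""
    return flags["SF"] != flags["OF"]
```

Comparing 1 with 0xFFFFFFFF shows the difference: as unsigned values 1 is below 0xFFFFFFFF, but as signed values 1 is greater than -1.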

  28. Conditional Branches • Conditional branches use “Jcc” family of instructions (je, jne, jz, jnz, etc.) • Format is • Jcc TargetAddress • If Jcc true, goto TargetAddress • Otherwise, what happens?

  29. Function Calls • Use CALL and RET • CALL FunctionAddress …… • RET ; pops return address • RET can be told to increment ESP • Need to reset stack pointer • Why?
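
One answer to the "Why?": parameters pushed before the CALL are still on the stack after the return address is popped, so either the callee removes them with `RET n` or the caller adjusts ESP itself. The following is a minimal Python sketch of that bookkeeping. `ToyStack`, the starting ESP of 0x1000, and the return address 0x401005 are made up for illustration.

```python
class ToyStack:
    """Toy model of ESP and stack memory; each slot is 4 bytes wide."""
    def __init__(self, esp=0x1000):
        self.esp = esp
        self.mem = {}

    def push(self, value):
        self.esp -= 4                 # the stack grows toward lower addresses
        self.mem[self.esp] = value

    def pop(self):
        value = self.mem.pop(self.esp)
        self.esp += 4
        return value

def call(stack, return_address):
    stack.push(return_address)        # CALL pushes the address of the next instruction

def ret(stack, extra_bytes=0):
    eip = stack.pop()                 # RET pops the return address...
    stack.esp += extra_bytes          # ...and RET n also discards n bytes of parameters
    return eip

stack = ToyStack()
start_esp = stack.esp
stack.push(7)                         # caller pushes two 4-byte parameters
stack.push(6)
call(stack, 0x401005)
resume_at = ret(stack, extra_bytes=8) # like RET 8: ESP is back where it started
```

With a plain `ret(stack)`, ESP would end up 8 bytes below `start_esp`, and the caller would have to add 8 to ESP itself.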

  30. Examples

          cmp ebx,0xf020
          jnz 10026509

      • What does this do? • Compares value in EBX with constant • Jumps to specified address if operands are not the same • Note: JNE and JNZ are same instruction

  31. Examples

          mov edi,[ecx+0x5b0]
          mov ebx,[ecx+0x5b4]
          imul edi,ebx

      • What does this do? • First, add 0x5b0 to ECX register, get value at that memory and put in EDI • Next, add 0x5b4 to ECX, get value at that memory and put in EBX • Note that ECX points to some data structure • Finally, EDI = EDI * EBX • Note there are different forms of IMUL
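
The same three instructions can be mirrored at a higher level: two 32-bit fields are read at fixed offsets from a structure pointer and multiplied. The following is a minimal Python sketch in which `memory` is a hypothetical flat byte buffer, the structure is placed at address 0x100 for the demonstration, and the fields are assumed to be signed 32-bit integers (the listing itself does not say).

```python
import struct

def field_product(memory, ecx):
    """Mirror: mov edi,[ecx+0x5b0] / mov ebx,[ecx+0x5b4] / imul edi,ebx."""
    (edi,) = struct.unpack_from("<i", memory, ecx + 0x5B0)   # little-endian 32-bit load
    (ebx,) = struct.unpack_from("<i", memory, ecx + 0x5B4)
    product = (edi * ebx) & 0xFFFFFFFF                       # two-operand imul keeps the low 32 bits
    # Reinterpret the low 32 bits as a signed value, as EDI would hold it.
    return product - (1 << 32) if product & 0x80000000 else product

memory = bytearray(0x1000)            # hypothetical flat memory
struct.pack_into("<ii", memory, 0x100 + 0x5B0, 6, -7)   # two adjacent fields
```

Here `field_product(memory, 0x100)` plays the role of the listing with ECX = 0x100.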

  32. Examples

          push eax
          push edi
          push ebx
          push esi
          push dword ptr [esp+0x24]
          call 0x10026eeb

      • What does this do? • PUSH four register values • PUSH something related to stack ptr • Probably, parameter or local variable • Would need to look at more code to decide • Note “dword ptr” is effectively a cast • CALL a function

  33. Examples

          mov eax, dword ptr [ebp - 0x20]
          shl eax, 4
          mov ecx, dword ptr [ebp - 0x24]
          cmp dword ptr [eax+ecx+4], 0
          call 0x10026eeb

      • What does this do? • Maybe “data structure in an array” • Last line • ECX --- gets base pointer • EAX --- current offset into the array • Add 4 to get specific member of structure
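
The "data structure in an array" reading can be checked arithmetically: shifting the index left by 4 multiplies it by 16, which suggests 16-byte elements, and the +4 selects the member at offset 4. The following is a minimal Python sketch using `ctypes`. The `Record` layout and its field names are hypothetical, since only the element size and the +4 offset can be inferred from the listing.

```python
import ctypes

class Record(ctypes.Structure):
    """Hypothetical 16-byte structure; field names are invented."""
    _fields_ = [
        ("first", ctypes.c_uint32),    # offset 0
        ("second", ctypes.c_uint32),   # offset 4, the member compared with 0
        ("third", ctypes.c_uint32),    # offset 8
        ("fourth", ctypes.c_uint32),   # offset 12
    ]

def member_address(base, index):
    """Address formed by shl eax,4 and [eax+ecx+4], with EAX = index and ECX = base."""
    return base + (index << 4) + 4

records = (Record * 8)()               # array of 8 records
base = ctypes.addressof(records)
```

The address computed by `member_address(base, 3)` is the address of `records[3].second`, which is what a C compiler would compute for an access such as `records[i].second`.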

  34. Examples • AT&T syntax

          pushl $14
          pushl $helloWorld
          pushl $1
          movl $4, %eax
          pushl %eax
          int $0x80
          addl $16, %esp
          pushl $0
          movl $1, %eax
          pushl %eax
          int $0x80

  35. Compilation • Converts high level representation of code to binary • Front end --- lexical analysis • Verify syntax, etc. • Intermediate representation • Optimization • Improve structure, eliminate redundancy, …

  36. Compilation • Back end --- generates the actual code • Instruction selection • Register allocation • Instruction scheduling --- pipelining, parallelism • Back end process might make disassembly hard to read • Optimization too • Each compiler has its own quirks • Can you automatically determine compiler?

  37. Virtual Machines & Bytecode

  38. Virtual Machines • Some languages instead generate intermediate bytecode • Bytecode runs in a virtual machine • Virtual machine is a program that (historically) interprets bytecode • Translates bytecode for the hardware • Bytecode analogous to assembly code
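
What "a program that interprets bytecode" means can be shown with a very small stack machine. The following is a minimal Python sketch. The instruction set (`LOAD`, `PUSH`, `MUL`, `ADD`, `RET`) is invented for illustration and is not Java bytecode, although the load, multiply, return shape of `MULTIPLY` resembles what a JVM compiler produces for a simple multiply method.

```python
def run(bytecode, args):
    """Interpret a list of (opcode, operand) pairs on a small stack machine."""
    stack, local_vars = [], list(args)
    for opcode, operand in bytecode:
        if opcode == "LOAD":             # push a local variable
            stack.append(local_vars[operand])
        elif opcode == "PUSH":           # push a constant
            stack.append(operand)
        elif opcode == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif opcode == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "RET":            # return the value on top of the stack
            return stack.pop()
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    raise RuntimeError("bytecode ended without RET")

# Bytecode for the multiply(x, y) example: load both arguments, multiply, return.
MULTIPLY = [("LOAD", 0), ("LOAD", 1), ("MUL", None), ("RET", None)]
```

The same `MULTIPLY` list runs unchanged wherever `run` runs, which is the hardware independence that bytecode offers.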

  39. Virtual Machines • Advantages? • Hardware independent • Disadvantages? • Slow • Today, usually just-in-time compilers instead of interpreters • Compile snippets of bytecode into native code as needed

  40. Reversing Bytecode • Reversing bytecode is easy • Unless special precautions are taken • Even then, easier than native code • Bytecode usually contains lots of metadata • Possible to reconstruct highly accurate high level language • Bytecode can be obfuscated • In worst case, reverser must learn bytecode • But bytecode is easier than native code
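
The amount of metadata that survives compilation to bytecode is easy to observe. The following is a minimal sketch that uses Python's own bytecode as a stand-in for Java's, which is an assumption of convenience: the formats differ, but Java class files likewise keep class, method, and field names.

```python
import dis

def multiply(x, y):
    z = x * y
    return z

code = multiply.__code__               # the compiled bytecode object
names = {
    "function": code.co_name,                            # function name survives
    "arguments": code.co_varnames[:code.co_argcount],    # so do parameter names
    "locals": code.co_varnames,                          # and local variable names
}
opnames = [ins.opname for ins in dis.get_instructions(multiply)]
```

Printing `names` and `opnames` gives a reverser the function name, the variable names, and a readable instruction stream, none of which is kept in stripped native code.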

  41. Windows PE Files

  42. Windows PE File Format • Designed to be standard executable file format for all versions of OS… • …on all supported processors • Only small changes since PE format was introduced • E.g., support for 64-bit Windows

  43. Windows PE Files • Trivia • Q: What’s the difference between exe and dll? • A: Not much --- one bit differs in PE files • Q: What is size of smallest possible PE file? • A: 133 bytes • PE file on disk is a file • Once loaded into memory, it’s a module • File is mapped to module • Address where module begins is HMODULE • PE file may not all be mapped to module
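
Assuming the "one bit" in the trivia answer is the `IMAGE_FILE_DLL` flag (0x2000) in the `Characteristics` field of the file header, it can be tested with a few lines of header parsing. The following is a minimal Python sketch. `fake_image` is a hypothetical helper that builds only enough header bytes for the demonstration and does not produce a loadable file.

```python
import struct

IMAGE_FILE_DLL = 0x2000                # Characteristics flag from WINNT.H

def is_dll(image):
    """Return True if the PE image in `image` (bytes) has IMAGE_FILE_DLL set."""
    if image[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)      # file offset of the PE header
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER (20 bytes) follows the signature; Characteristics is its last field.
    (characteristics,) = struct.unpack_from("<H", image, e_lfanew + 4 + 18)
    return bool(characteristics & IMAGE_FILE_DLL)

def fake_image(characteristics):
    """Build a header-only buffer for demonstration; not a loadable PE file."""
    image = bytearray(0x40 + 4 + 20)
    image[:2] = b"MZ"
    struct.pack_into("<I", image, 0x3C, 0x40)                # e_lfanew points just past the DOS header
    image[0x40:0x44] = b"PE\0\0"
    struct.pack_into("<H", image, 0x40 + 4 + 18, characteristics)
    return bytes(image)
```

On a real file, `is_dll(open(path, "rb").read())` applies the same check, where `path` is whatever file is being examined.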

  44. Windows PE Files • WINNT.H is final word on what PE file looks like • Tools to examine PE files • Dumpbin (Visual Studio) • Depends • PEBrowse Professional • In spite of its name, it’s free • PEDUMP (by author of article)

  45. PE File Sections • Each section is “chunk of code or data that logically belongs together” • For example, all import tables in one section • Code is in .text section • Code is code, but many types of data • Data examples • Program data (e.g., .rdata for read-only) • API import/export tables • Resources, relocation info, etc. • Can specify section names in C++ source

  46. PE File Sections • When mapped, module starts on a page boundary • Linker can be told to merge sections • E.g., to merge .text and .rdata: • /MERGE:.rdata=.text • Some sections commonly merged • Some sections cannot be merged

  47. Relative Virtual Addresses • Exe file specifies in-memory addresses • PE file specifies preferred load location • But DLL can actually load just about anywhere • So, PE specifies addresses in a way that is independent of where it loads • No hardcoded addresses in PE • Instead, Relative Virtual Addresses (RVAs) • RVA is an offset relative to where PE is loaded

  48. Relative Virtual Addresses • To find actual memory location, add RVA to the actual load address • For example, suppose • Exe file is loaded at 0x400000 • And RVA is 0x1000 • Then code (.text) starts at 0x401000 • In Windows terminology, actual address is known as Virtual Address (VA)
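
The RVA arithmetic is a single addition, which is why the same file works at any load address. The following is a minimal Python sketch with invented helper names. The second load address used in the usage note is made up to show a module loaded somewhere other than its preferred location.

```python
def rva_to_va(load_address, rva):
    """Virtual address = actual load address + relative virtual address."""
    return load_address + rva

def va_to_rva(load_address, va):
    """Inverse mapping, for turning an address seen in a debugger back into an RVA."""
    if va < load_address:
        raise ValueError("address lies below the module's load address")
    return va - load_address
```

With the numbers from the slide, `rva_to_va(0x400000, 0x1000)` gives 0x401000. If the same module were loaded at 0x10000000 instead, the same RVA would map to 0x10001000.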

  49. Data Directory • There are many data structures within exe • For efficiency, must be loaded quickly • E.g., imports, exports, resources, base relocations, etc. • DataDirectory • Array of 16 data structures • #define IMAGE_DIRECTORY_ENTRY_xxx defines array indexes (0 to 15)
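
The array layout can be made concrete: each of the 16 entries is an 8-byte record holding an RVA and a size. The following is a minimal Python sketch that assumes `raw` already holds the 128 bytes of the DataDirectory array (locating it inside the optional header is not shown). The enum member names follow the `IMAGE_DIRECTORY_ENTRY_xxx` constants, except `RESERVED`, which is a name invented here for index 15.

```python
import struct
from enum import IntEnum

class DirectoryEntry(IntEnum):
    """IMAGE_DIRECTORY_ENTRY_xxx indexes (index 15 is reserved; its name is invented)."""
    EXPORT = 0
    IMPORT = 1
    RESOURCE = 2
    EXCEPTION = 3
    SECURITY = 4
    BASERELOC = 5
    DEBUG = 6
    ARCHITECTURE = 7
    GLOBALPTR = 8
    TLS = 9
    LOAD_CONFIG = 10
    BOUND_IMPORT = 11
    IAT = 12
    DELAY_IMPORT = 13
    COM_DESCRIPTOR = 14
    RESERVED = 15

def parse_data_directory(raw):
    """Split 16 (RVA, size) records out of the raw DataDirectory bytes."""
    entries = {}
    for entry in DirectoryEntry:
        rva, size = struct.unpack_from("<II", raw, entry * 8)   # two little-endian DWORDs
        entries[entry] = (rva, size)
    return entries
```

An entry of `(0, 0)` means the corresponding structure is absent from the file. A nonzero RVA is converted to an address by adding the load address.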

  50. Importing Functions • To use code or data from another DLL, must import it • When PE file loads, Windows loader locates imported functions/data • Usually automatic, when program first starts • Imported DLLs may import others • For example, any program created with Visual C++ imports KERNEL32.DLL… • …and KERNEL32.DLL imports from NTDLL.DLL
