Friday, September 18, 2009

Hackers Disassembling 1.1.5 (Step Five: IDA Emerges onto the Scene)

Step Five: IDA Emerges onto the Scene

Following Dennis Ritchie's example, it has become typical to begin learning a new programming language by creating the simple "Hello, World!" program. We aren't going to sidestep this tradition. Let's evaluate the capabilities of IDA Pro using the following example. (I recommend that you compile it using Microsoft Visual C++ 6.0. Call "cl.exe first.cpp" from the command line to obtain results consistent with those in this book.)

Listing 6: The Source Code of the first.cpp Program


#include <iostream.h>   // pre-standard stream header shipped with Visual C++ 6.0
void main()
{
    cout << "Hello, Sailor!\n";
}




The compiler will generate an executable file of almost 40 KB, most of which is occupied by auxiliary, start-up, or library code! Attempts to disassemble it with a disassembler such as W32Dasm won't get us far: the listing will be more than 500 KB! You can imagine how much time that would eat up, especially since serious programs produce dozens of megabytes of disassembled code.

Let's try to disassemble this program using IDA. If the default settings are used, the screen should look as follows upon completion of the analysis (although variations are possible depending on the version):



Figure 4: The IDA Pro 3.6 console interface



Figure 5: The IDA Pro 4.0 command line interface





Figure 6: The IDA Pro 4.0 GUI interface

Beginning with version 3.8x (possibly earlier), IDA supports collapsing. This feature considerably simplifies code navigation, allowing us to remove lines from the screen that aren't of interest at the moment. By default, all library functions are collapsed.

You can expand a function by positioning the cursor on it and pressing the <+> key on the numeric keypad. The <-> key is used to collapse the function.

After finishing analysis of the first.exe file, IDA places the cursor on the line .text:00401B2C, the program's entry point. Many novice programmers mistakenly believe that programs written in C start executing from the main function. Actually, immediately after the file is loaded, control is passed to the Start function inserted by the compiler. It prepares the following global variables: _osver (the operating system build number), _winmajor (the major version number of the operating system), _winminor (the minor version number of the operating system), _winver (the complete version of the operating system, combining _winmajor and _winminor), __argc (the number of arguments on the command line), __argv (an array of pointers to the argument strings), and _environ (an array of pointers to environment variable strings). The Start function also initializes the heap and calls the main function. After main returns control, Start terminates the process using the Exit function. The following program clearly demonstrates the initialization of variables performed by the start code:

Listing 7: The Source Code of the CRt0.demo.c Program


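#include <stdio.h>
#include <stdlib.h>   /* declares _winmajor, _winminor, _osver, __argc, __argv, _environ */
void main()
{
    int a;
    /* All of these values are prepared by the start code before main gets control */
    printf(">OS Version:\t\t\t%d.%d\n\
>Build:\t\t\t%d\n\
>Number of arguments:\t%d\n",
        _winmajor, _winminor, _osver, __argc);
    for (a = 0; a < __argc; a++)
        printf(">\tArgument %02d:\t\t%s\n", a + 1, __argv[a]);
    a = -1;                     /* the pre-increment below then starts at index 0 */
    while (_environ[++a]) ;     /* count entries up to the terminating NULL */
    printf(">Number of environment variables:%d\n", a);
    while (a) {
        --a;                    /* separate statement, so the index is well defined */
        printf(">\tVariable %d:\t\t%s\n", a, _environ[a]);
    }
}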




The main function looks as though the application doesn't accept any arguments from the command line, but running the program proves the opposite. On my computer, its (abridged) output looks like this:

Listing 8: The Result of Running the CRt0.demo.c Program (Abridged)


> OS Version: 5.0
>Build: 2195
>Number of arguments: 1
>Argument 01: CRt0.demo
>Number of environment variables: 30
>Variable 29: windir=C:\WINNT
> ...




There's no need to analyze the standard start code. The first task is to find where control is passed to the main function. Unfortunately, a guaranteed solution requires the complete analysis of the Start function. Investigators have plenty of tricks, but all of them are based on the implementations of particular compilers[i]; these tricks can't be considered universal.

I recommend that you study the source code of the Start functions of popular compilers, contained in the CRt0.c file (Microsoft Visual C++) and in the c0w.asm file (Borland C++). This will simplify analysis of the listing obtained from the disassembler. As an illustration, the start code of the first.exe program is shown in the following listing as a result of W32Dasm disassembly:

Listing 9: The Start Code of first.exe Obtained Using W32Dasm


//******************** Program Entry Point ********
:00401B2C 55 push ebp
:00401B2D 8BEC mov ebp, esp
:00401B2F 6AFF push FFFFFFFF
:00401B31 6870714000 push 00407170
:00401B36 68A8374000 push 004037A8
:00401B3B 64A100000000 mov eax, dword ptr fs: [00000000]
:00401B41 50 push eax
:00401B42 64892500000000 mov dword ptr fs:[00000000], esp
:00401B49 83EC10 sub esp, 00000010
:00401B4C 53 push ebx
:00401B4D 56 push esi
:00401B4E 57 push edi
:00401B4F 8965E8 mov dword ptr [ebp-18], esp

Reference To: KERNEL32.GetVersion, Ord:0174h

:00401B52 FF1504704000 call dword ptr [00407004]
:00401B58 33D2 xor edx, edx
:00401B5A 8AD4 mov dl, ah
:00401B5C 8915B0874000 mov dword ptr [004087B0], edx

:00401B62 8BC8 mov ecx, eax
:00401B64 81E1FF000000 and ecx, 000000FF
:00401B6A 890DAC874000 mov dword ptr [004087AC], ecx
:00401B70 C1E108 shl ecx, 08
:00401B73 03CA add ecx, edx
:00401B75 890DA8874000 mov dword ptr [004087A8], ecx
:00401B7B C1E810 shr eax, 10
:00401B7E A3A4874000 mov dword ptr [004087A4], eax
:00401B83 6A00 push 00000000
:00401B85 E8D91B0000 call 00403763
:00401B8A 59 pop ecx
:00401B8B 85C0 test eax, eax
:00401B8D 7508 jne 00401B97
:00401B8F 6A1C push 0000001C
:00401B91 E89A000000 call 00401C30
:00401B96 59 pop ecx

Referenced by a (U)nconditional or (C)onditional Jump at Address:
:00401B8D(C)

:00401B97 8365FC00 and dword ptr [ebp-04], 00000000
:00401B9B E8D70C0000 call 00402877

Reference To: KERNEL32.GetCommandLineA, Ord:00CAh

:00401BA0 FF1560704000 call dword ptr [00407060]
:00401BA6 A3E49C4000 mov dword ptr [00409CE4], eax
:00401BAB E8811A0000 call 00403631
:00401BB0 A388874000 mov dword ptr [00408788], eax
:00401BB5 E82A180000 call 004033E4
:00401BBA E86C170000 call 0040332B
:00401BBF E8E1140000 call 004030A5
:00401BC4 A1C0874000 mov eax, dword ptr [004087C0]
:00401BC9 A3C4874000 mov dword ptr [004087C4], eax
:00401BCE 50 push eax
:00401BCF FF35B8874000 push dword ptr [004087B8]
:00401BD5 FF35B4874000 push dword ptr [004087B4]
:00401BDB E820F4FFFF call 00401000
:00401BE0 83C40C add esp, 0000000C
:00401BE3 8945E4 mov dword ptr [ebp-1C], eax
:00401BE6 50 push eax
:00401BE7 E8E6140000 call 004030D2
:00401BEC 8B45EC mov eax, dword ptr [ebp-14]
:00401BEF 8B08 mov ecx, dword ptr [eax]
:00401BF1 8B09 mov ecx, dword ptr [ecx]
:00401BF3 894DE0 mov dword ptr [ebp-20], ecx
:00401BF6 50 push eax
:00401BF7 51 push ecx
:00401BF8 E8AA150000 call 004031A7
:00401BFD 59 pop ecx
:00401BFE 59 pop ecx
:00401BFF C3 ret




IDA knows how to recognize library functions by their signatures. (Almost the same algorithm is used by anti-virus software.) Therefore, the quality of the disassembly depends strongly on the version and completeness of the signature package. Not all IDA Pro versions are capable of working with programs generated by present-day compilers. (See the %IDA%/SIG/list file for the list of supported compilers.)
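To make the idea of signature recognition concrete, here is a minimal Python sketch of byte-pattern matching with wildcards. It is only an illustration of the principle, not IDA's actual FLIRT format or algorithm; the signature name and the choice of wildcarded bytes are assumptions made for this example. The sample bytes are the first instructions of the start code from Listing 9.

```python
def matches(code, pattern):
    """Return True if code begins with pattern; None in pattern matches any byte."""
    if len(code) < len(pattern):
        return False
    return all(p is None or p == b for p, b in zip(pattern, code))


def identify(code, signatures):
    """Return the name of the first signature matching the start of code, or None."""
    for name, pattern in signatures:
        if matches(code, pattern):
            return name
    return None


# Hypothetical signature: push ebp / mov ebp, esp / push -1 / push imm32 / push imm32.
# The two 32-bit operands are wildcarded because addresses differ between builds.
SIGNATURES = [
    ("start_prologue (illustrative)",
     [0x55, 0x8B, 0xEC, 0x6A, 0xFF,
      0x68, None, None, None, None,
      0x68, None, None, None, None]),
]

listing9 = bytes.fromhex("558BEC6AFF687071400068A8374000")   # bytes from Listing 9
other_build = bytes.fromhex("558BEC6AFF681122334468AABBCCDD")  # same code, other addresses
unrelated = bytes.fromhex("90" * 15)

print(identify(listing9, SIGNATURES))
print(identify(other_build, SIGNATURES))
print(identify(unrelated, SIGNATURES))
```

The wildcards are the important design choice: without them, the same library function linked at a different address would no longer match.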

Listing 10: The Start Code of first.exe Obtained Using IDA Pro 4.01


00401B2C start proc near
00401B2C
00401B2C var_20 = dword ptr -20h
00401B2C var_1C = dword ptr -1Ch
00401B2C var_18 = dword ptr -18h
00401B2C var_14 = dword ptr -14h
00401B2C var_4 = dword ptr -4
00401B2C
00401B2C push ebp
00401B2D mov ebp, esp
00401B2F push 0FFFFFFFFh
00401B31 push offset stru_407170
00401B36 push offset __except_handler3
00401B3B mov eax, large fs:0
00401B41 push eax
00401B42 mov large fs:0, esp
00401B49 sub esp, 10h
00401B4C push ebx
00401B4D push esi
00401B4E push edi
00401B4F mov [ebp+var_18], esp
00401B52 call ds:GetVersion
00401B58 xor edx, edx
00401B5A mov dl, ah
00401B5C mov dword_4087B0, edx
00401B62 mov ecx, eax
00401B64 and ecx, 0FFh
00401B6A mov dword_4087AC, ecx
00401B70 shl ecx, 8
00401B73 add ecx, edx
00401B75 mov dword_4087A8, ecx
00401B7B shr eax, 10h
00401B7E mov dword_4087A4, eax
00401B83 push 0
00401B85 call __heap_init
00401B8A pop ecx
00401B8B test eax, eax
00401B8D jnz short loc_401B97
00401B8F push 1Ch
00401B91 call sub_401C30 ; _fast_error_exit
00401B96 pop ecx
00401B97
00401B97 loc_401B97: ; CODE XREF: start+61↑j
00401B97 and [ebp+var_4], 0
00401B9B call __ioinit
00401BA0 call ds:GetCommandLineA
00401BA6 mov dword_409CE4, eax
00401BAB call ___crtGetEnvironmentStringsA
00401BB0 mov dword_408788, eax
00401BB5 call __setargv
00401BBA call __setenvp
00401BBF call __cinit
00401BC4 mov eax, dword_4087C0
00401BC9 mov dword_4087C4, eax
00401BCE push eax
00401BCF push dword_4087B8
00401BD5 push dword_4087B4
00401BDB call sub_401000
00401BE0 add esp, 0Ch
00401BE3 mov [ebp+var_1C], eax
00401BE6 push eax
00401BE7 call _exit
00401BEC ; ------------------------------------------------------------
00401BEC
00401BEC loc_401BEC: ; DATA XREF: _rdata:00407170↓o
00401BEC mov eax, [ebp-14h]
00401BEF mov ecx, [eax]
00401BF1 mov ecx, [ecx]
00401BF3 mov [ebp-20h], ecx
00401BF6 push eax
00401BF7 push ecx
00401BF8 call __XcptFilter
00401BFD pop ecx
00401BFE pop ecx
00401BFF retn
00401BFF start endp ; sp = -34h




IDA Pro successfully copes with the above example, as confirmed by the line "Using FLIRT signature: VC v2.0/4.x/5.0 runtime" in the message box.





Figure 7: Loading the signature library

The disassembler has successfully determined the names of all the functions called by the start code, except the one called at address 0x00401BDB. Knowing that three arguments are passed to it, and that exit is called upon its return, we can assume this function is main.
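Before moving on to main, the arithmetic that follows the GetVersion call in the start code can be restated in a few lines of Python. This is a sketch under two assumptions: the packed value comes from an NT-family system (where the high word holds the build number), and the sample value is constructed here to correspond to the 5.0 / build 2195 output shown in Listing 8 rather than captured from a real call.

```python
def unpack_get_version(value):
    """Split a packed GetVersion() result the way the start code does."""
    minor = (value >> 8) & 0xFF      # xor edx, edx / mov dl, ah
    major = value & 0xFF             # and ecx, 0FFh
    winver = (major << 8) + minor    # shl ecx, 8 / add ecx, edx
    build = (value >> 16) & 0xFFFF   # shr eax, 10h
    return {"_winmajor": major, "_winminor": minor,
            "_winver": winver, "_osver": build}


# Hypothetical input: major 5, minor 0, build 2195 packed into one 32-bit value.
sample = (2195 << 16) | (0 << 8) | 5
info = unpack_get_version(sample)
print("%d.%d build %d" % (info["_winmajor"], info["_winminor"], info["_osver"]))
```

Reading the listing this way also suggests which of the four dword_4087xx locations holds which variable, since each store follows the corresponding computation.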

There are several ways of getting to the address 0x00401000 to see the main function, including scrolling the screen using the arrow keys, or pressing the G key and entering the required address in the dialog box that appears. However, it's easier and faster to use the navigation system built into IDA Pro. If you place the cursor on a name, constant, or expression and press the Enter key, IDA automatically goes to the required address.

In this case, we need to place the cursor on the string sub_401000 (an argument of the call instruction) and press the Enter key. The disassembler window should look like this:

00401000 ; ---------------- S U B R O U T I N E -------------------------
00401000
00401000 ; Attributes: bp-based frame
00401000
00401000 sub_401000 proc near ; CODE XREF: start+AF↓p
00401000 push ebp
00401001 mov ebp, esp
00401003 push offset aHelloSailor ; "Hello, Sailor!\n"
00401008 mov ecx, offset dword_408748
0040100D call ??6ostream@@QAEAAV0@PBD@Z ;
0040100D ;ostream::operator<<(char const *)
00401012 pop ebp
00401013 retn
00401013 sub_401000 endp

The disassembler recognized a string variable and has given it a meaningful name: aHelloSailor. For clarity, in the comment on the right, it has given the original contents: "Hello, Sailor!\n". If you place the cursor on aHelloSailor and press the Enter key, IDA will go to the required string:

00408040 aHelloSailor db 'Hello, Sailor!',0Ah,0 ; DATA XREF: sub_401000+3↑o

The comment DATA XREF: sub_401000+3↑o is known as a cross-reference: the instruction located 3 bytes from the start of the sub_401000 procedure refers to this address as an offset. The "o" stands for offset, and the upward arrow shows that the referencing instruction lies above this line (at a lower address).

If you place the cursor on the sub_401000+3 expression and press the Enter key, IDA Pro will go to the following line:

00401003 push offset aHelloSailor ; "Hello, Sailor!\n"

Pressing the Esc key cancels the previous move and returns the cursor to its initial position (like the Back command in a Web browser). An offset to the string "Hello, Sailor!\n" is passed to the procedure ??6ostream@@QAEAAV0@PBD@Z, which is the << operator in C++. The strange name arises because only a limited set of characters can be used in the names of library functions, so the compiler encodes the class, the operator, and the argument types into the name. Compilers mangle such names automatically, transforming them into gobbledygook suitable only for the linker. Few novice programmers suspect such hidden "machinations."

To facilitate analysis of code, IDA Pro displays the "correct" names in the comments, but it can be forced to show demangled names everywhere. To do this, we need to select the Demangled names item from the Options menu, then set the Names radio button in the dialog box that pops up; after that, the call to the << operator will appear as follows:

0040100D call ostream::operator<<(char const *)

At this point, the analysis of the first.cpp application is complete. We only have to rename the sub_401000 function to main. For this, we need to position the cursor on line 0x00401000 (the function's start address), press the N key, and enter "main" in the dialog box that opens. The result should look like this:

00401000 ; ---------------------- S U B R O U T I N E ----------------------
00401000
00401000 ; Attributes: bp-based frame
00401000
00401000 main proc near ; CODE XREF: start+AF↓p
00401000 push ebp
00401001 mov ebp, esp
00401003 push offset aHelloSailor ; "Hello, Sailor!\n"
00401008 mov ecx, offset dword_408748
0040100D call ostream::operator<<(char const *)
00401012 pop ebp
00401013 retn
00401013 main endp

Compare this to W32Dasm. (Only the contents of the main function are given.)

:00401000 55 push ebp
:00401001 8BEC mov ebp, esp

Possible StringData Ref from Data Obj -->"Hello, Sailor!"

:00401003 6840804000 push 00408040
:00401008 B948874000 mov ecx, 00408748
:0040100D E8AB000000 call 004010BD
:00401012 5D pop ebp
:00401013 C3 ret

Another important advantage of IDA is the ability to disassemble encrypted programs. In the example /SRC/Crypt.com, a static encryption method, frequently found with "wrapper" protections, was used. This simple trick "dazzles" most disassemblers. For example, processing the Crypt.com file using Sourcer results in:

Crypt proc far
7E5B:0100 start:
7E5B:0100 83 C6 06 add si, 6
7E5B:0103 FF E6 jmp si ; *
; *No entry point to code
7E5B:0105 B9 14BE mov cx, 14BEh
7E5B:0108 01 AD 5691 add ds:data_le[di], bp ; (7E5B:5691=0)
7E5B:010C 80 34 66 xor byte ptr [si], 66h ; 'f'
7E5B:010F 46 inc si
7E5B:0110 E2 FA loop $-4 ; Loop if cx > 0

7E5B:0112 FF E6 jmp si ;*
; *No entry point to code
7E5B:114 18 00 sbb [bx+si], al
7E5B:116 D2 6F DC shr byte ptr [bx-24h], cl ; Shift w/zeros fill
7E5B:119 6E 67 AB 47 A5 2E db 6Eh, 67h, 0ABh, 47h, 0A5h, 2Eh
7E5B:11F 03 0A 0A 09 4A 35 db 03h, 0Ah, 0Ah, 09h, 4Ah, 35h
7E5B:125 07 0F 0A 09 14 47 db 07h, 0Fh, 0Ah, 09h, 14h, 47h
7E5B:12B 6B 6C 42 E8 00 00 db 6Bh, 6Ch, 42h, E8h, 00h, 00h
7E5B:131 59 5E BF 00 01 57 db 59h, 5Eh, BFh, 00h, 01h, 57h
7E5B:137 2B CE F3 A4 C3 db 2Bh, CEh, F3h, A4h, C3h

Crypt endp

Sourcer failed to disassemble half of the code, leaving it as a dump, and it incorrectly disassembled the other half! The JMP SI instruction at line :0x103 jumps to the address :0x106. (When the COM file is loaded, the value in the SI register is equal to 0x100; therefore, after the ADD SI, 6 instruction is executed, the SI register contains 0x106.) However, the instruction following the JMP is at address :0x105! The source code has a dummy byte inserted in this location, which leads the disassembler astray. That byte is interpreted as the start of the next instruction, shifting all the code disassembled after it.

Start:
add si, 6
jmp si
db 0B9H ;
lea si, _end ; to the beginning of the encrypted fragment

Sourcer is unable to predict the targets of register jumps. After encountering the JMP SI instruction, it continues disassembling, silently assuming that instructions follow one another sequentially. It's possible to create a definition file indicating that a byte of data is located at address 0x105, but this is inconvenient.
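The effect of the dummy byte can be reproduced with a toy "length disassembler" in Python. This is a sketch, not a real decoder: the length table is hard-coded for the handful of opcodes and operand forms that occur in this fragment, and the bytes are taken from the Sourcer listing above. It compares the instruction boundaries found by decoding sequentially from 0x100 with those found by following the jump to 0x106.

```python
# Bytes at offsets 0x100..0x113 of Crypt.com, copied from the Sourcer listing.
CODE = bytes.fromhex("83C606" "FFE6" "B9" "BE1401" "AD" "91" "56"
                     "803466" "46" "E2FA" "FFE6")
BASE = 0x100  # a COM file is loaded at offset 0x100

# Toy length table: valid only for the operand forms present in this fragment.
LENGTHS = {
    0x83: 3,  # add si, imm8
    0xFF: 2,  # jmp si
    0xB9: 3,  # mov cx, imm16
    0xBE: 3,  # mov si, imm16
    0x01: 4,  # add [di+disp16], bp (the bogus instruction Sourcer reports)
    0xAD: 1,  # lodsw
    0x91: 1,  # xchg ax, cx
    0x56: 1,  # push si
    0x80: 3,  # xor byte ptr [si], imm8
    0x46: 1,  # inc si
    0xE2: 2,  # loop rel8
}


def sweep(code, base, start, end):
    """Return instruction start addresses found by sequential decoding in [start, end)."""
    addresses = []
    address = start
    while address < end:
        addresses.append(address)
        address += LENGTHS[code[address - base]]
    return addresses


sequential = sweep(CODE, BASE, 0x100, 0x10C)                  # no knowledge of JMP SI
actual = [0x100, 0x103] + sweep(CODE, BASE, 0x106, 0x10C)     # resume at the jump target

print([hex(a) for a in sequential])
print([hex(a) for a in actual])
```

Both sequences meet again at 0x10C, which is why Sourcer shows the XOR loop correctly even though the instructions in front of it are wrong.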

In contrast to Sourcer-like disassemblers, IDA was designed as an interactive, user-friendly environment. IDA doesn't make assumptions; if difficulties arise, it asks the user for help. Therefore, after encountering a register jump to an unknown address, it stops further analysis. This means the result of analyzing the Crypt.com file looks like this:

seg000:0100 start proc near
seg000:0100 add si, 6
seg000:0103 jmp si
seg000:0103 start endp
seg000:0103
seg000:0103 ; ---------------------------------------------------------
seg000:0105 db 0B9h ;
seg000:0106 db 0BEh ; -
seg000:0107 db 14h ;
seg000:0108 db 1 ;
seg000:0109 db 0ADh ; i
seg000:010A db 91h ; N
...

We can help the disassembler by specifying the jump address. In this situation, novice users usually bring the cursor to the corresponding line and press the C key, forcing IDA to disassemble the code from that position to the function's end. However, such a solution is erroneous; IDA still doesn't know where the branch in line :0x103 points, or how the code at address :0x106 receives control.

The correct solution is to add a cross-reference that would link line :0x103 to line :0x106. For this, we need to select Cross references from the View menu. Then, in the dialog box that opens, we need to fill in the from and to fields with the values seg000:0103 and seg000:0106, respectively.

As a result, the disassembler output should look as follows. (A bug in IDA 4.01.300 means adding a new cross-reference does not always result in automatic disassembling.)

seg000:0100 public start
seg000:0100 start proc near
seg000:0100 add si, 6
seg000:0103 jmp si
seg000:0103 start endp
seg000:0103
seg000:0103 ; -------------------------------------------------------------
seg000:0105 db 0B9h
seg000:0106 ; -------------------------------------------------------------
seg000:0106
seg000:0106 loc_0_106: ; CODE XREF: start+3↑u
seg000:0106 mov si, 114h
seg000:0109 lodsw
seg000:010A xchg ax, cx
seg000:010B push si
seg000:010C
seg000:010C loc_0_10C: ; CODE XREF: seg000:0110↓j
seg000:010C xor byte ptr [si], 66h
seg000:010F inc si
seg000:0110 loop loc_0_10C
seg000:0112 jmp si
seg000:0112 ; -------------------------------------------------------------
seg000:0114 db 18h ;
seg000:0115 db 0 ;
seg000:0116 db 0D2h ; T
seg000:0117 db 6Fh ; o
...

Since IDA Pro doesn't display the target address of the cross-reference, I'd suggest you display it manually. This will improve the code's readability and simplify navigation. Place the cursor on line :0x103, press the <:> key, and enter a comment in the dialog box that opens (for example, "jump to address 0106"). The display will change as follows:

seg000:0103 jmp si ; Jump to address 0106

Such a comment makes it possible to jump to the specified address: Just place the cursor on 0106 and press the Enter key. Note that IDA Pro doesn't recognize hexadecimal format in the C style (0x106) or in the MASM/TASM style (0106h).

What does the value 114h at line :0x106 represent: a constant or an offset? To figure this out, we need to analyze the LODSW instruction that follows. It loads the word located at address DS:SI into the AX register; therefore, the value loaded into the SI register is an offset.

seg000:0106 mov si, 114h
seg000:0109 lodsw

Pressing the O key transforms the constant to an offset. The disassembled code will appear like this:

seg000:0106 mov si, offset unk_0_114
seg000:0109 lodsw
...
seg000:0114 unk_0_114 db 18h ; DATA XREF: seg000:0106↑o
seg000:0115 db 0 ;
seg000:0116 db 0D2h ; T
seg000:0117 db 6Fh ; o
...

IDA Pro automatically created a new name, unk_0_114, that refers to an unknown variable with a size of 1 byte. But the LODSW instruction loads a word into the AX register; therefore, we need to go to line :0114 and press the D key twice to obtain the following code:

seg000:0114 word_0_114 dw 18h ; DATA XREF: seg000:0106↑o
seg000:0116 db 0D2h ; T

What does the word_0_114 location contain? The following code will help us find out:

seg000:0106 mov si, offset word_0_114
seg000:0109 lodsw
seg000:010A xchg ax, cx
seg000:010B push si
seg000:010C
seg000:010C loc_0_10C: ; CODE XREF: seg000:0110↓j
seg000:010C xor byte ptr [si], 66h
seg000:010F inc si
seg000:0110 loop loc_0_10C

In line :0x10A, the AX register value is moved to the CX register, where it is then used by the LOOP loc_0_10C instruction as a loop counter. The loop body is a simple decoder: The XOR instruction decrypts the byte pointed to by the SI register, and the INC SI instruction moves the pointer to the next byte. Therefore, the word_0_114 location contains the number of bytes to be decrypted. Place the cursor on it, press the N key, and give it a better name ("BytesToDecrypt", for example).
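The decoder loop is simple enough to emulate directly. The following Python sketch models memory as a bytearray indexed by offset (an assumption made for illustration; segments are ignored) and applies the loop to the first eight encrypted bytes at 0x116, as they appear in the Sourcer dump. It assumes the counter is greater than zero.

```python
def xor_decode(memory, si, cx, key=0x66):
    """Emulate: xor byte ptr [si], key / inc si / loop. Returns the final SI."""
    for _ in range(cx):        # LOOP repeats the body CX times (CX > 0 assumed)
        memory[si] ^= key      # xor byte ptr [si], 66h
        si += 1                # inc si
    return si


memory = bytearray(0x200)                         # toy memory, indexed by offset
encrypted = bytes.fromhex("D26FDC6E67AB47A5")     # bytes 0x116..0x11D from the dump
memory[0x116:0x116 + len(encrypted)] = encrypted

final_si = xor_decode(memory, 0x116, len(encrypted))
print(memory[0x116:final_si].hex())
```

The result, B4 09 / BA 08 01 / CD 21 / C3, is the machine code for MOV AH, 9 / MOV DX, 108h / INT 21h / RETN. Because XOR is its own inverse, running the same function over the decoded bytes restores the encrypted ones.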

There's one more unconditional register jump after the decryption loop.

seg000:0112 jmp si

To find out where it transfers control, we need to analyze the code and determine the SI register's contents. A debugger is often used for this: We set a breakpoint on line 0x112 and, when the debugger window pops up, look at the register value. IDA Pro can generate MAP files containing symbol information for the debugger especially for this purpose. In particular, to avoid memorizing the numerical values of all the addresses being tested, each of them can be assigned an easily remembered name. For example, if you place the cursor on line seg000:0112, then press the N key and enter "BreakHere", the debugger will be able to resolve the breakpoint address automatically from its name.

To create a MAP file, click on Produce output file in the File menu and select Produce MAP file from the drop-down submenu, or press the Shift+F10 key combination. In either case, a dialog box will appear, which allows us to specify the data to include in the MAP file: information on segments, names automatically generated by IDA Pro (loc_0_106, sub_0x110, etc.), and demangled names. The contents of the MAP file obtained should be as follows:

Start Stop Length Name Class
00100H 0013BH 0003CH seg000 CODE
Address Publics by Value
0000:0100 start
0000:0112 BreakHere
0000:0114 BytesToDecrypt
Program entry point at 0000:0100

This format is supported by most debuggers, including the most popular one: SoftIce. SoftIce includes the msym utility, which is launched with the MAP file specified on the command line. The SYM file obtained should be placed in the directory where the program being debugged is located, then loaded from the loader without specifying the extension (WLDR Crypt, for example). Otherwise, the symbol information won't be loaded.

Then, we need to set a breakpoint using the bpx BreakHere command, and quit the debugger with the x command. In a second, the debugger window will pop up again, informing us that the processor has reached a breakpoint. Looking at the registers displayed at the top of the screen by default, we can see that SI equals 0x12E.

This value can also be calculated mentally, without using the debugger. The MOV instruction at line 0x106 loads the offset 0x114 into the SI register. From there, the LODSW instruction reads the number of bytes to decrypt (0x18) and increases the SI register by the word size (2 bytes). Hence, when the decryption loop is complete, the SI value will be 0x114+0x18+0x2 = 0x12E.
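The same calculation can be written down step by step. In this Python sketch, the function name and the way the counter word is passed in are illustrative choices; the two bytes 18 00 are the word stored at offset 0x114 in the listings above.

```python
def si_at_final_jump(image, image_base, table_offset):
    """Follow SI through mov si / lodsw / the decryption loop."""
    si = table_offset                                          # mov si, 114h
    index = si - image_base
    count = int.from_bytes(image[index:index + 2], "little")   # lodsw: AX = word at DS:SI
    si += 2                                                    # lodsw advances SI past the word
    si += count                                                # inc si runs once per byte
    return si, count


# The counter word at offset 0x114 (bytes 18 00, little-endian 0x0018).
si, count = si_at_final_jump(bytes.fromhex("1800"), 0x114, 0x114)
print(hex(count), hex(si))
```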

After calculating the jump address for line 0x112, let's create a corresponding cross-reference (from 0x112 to 0x12E) and add a comment to line 0x112 ("Jump to address 012E"). Creating the cross-reference automatically disassembles the code from the address seg000:012E to the end of the file.

seg000:012E loc_0_12E: ; CODE XREF: seg000:0112↑u
seg000:012E call $+3
seg000:0131 pop cx
seg000:0132 pop si
seg000:0133 mov di, 100h
seg000:0136 push di
seg000:0137 sub cx, si
seg000:0139 repe movsb
seg000:013B retn

The CALL $+3 instruction ($ designates the current value of the IP instruction pointer) pushes the IP contents to a stack, from which it can be extracted into any general-purpose register. In Intel 80x86 microprocessors, the IP register cannot be addressed directly, and only instructions that change the course of execution can read its value, including call.

We can supplement lines 0x12E and 0x131 with a comment (MOV CX, IP), or we can calculate and substitute the direct value (MOV CX, 0x131).

The POP SI instruction at line 0x132 pops a word off the stack and places it in the SI register. Scrolling the disassembler upward, you will see the PUSH SI instruction at line 0x10B. It is paired with this POP SI instruction and pushes the offset of the first decrypted byte onto the stack. Now, the meaning of the subsequent MOV DI, 100h / SUB CX, SI / REPE MOVSB instructions is clear: They move the decrypted fragment to the address starting at offset 0x100. Such an operation is characteristic of "wrapper" protections superimposed on a compiled file, which must be moved back to its "native" addresses before it is launched.

Before relocation, the CX register is loaded with the length of the block being copied. (The length is calculated by subtracting the offset of the first decrypted byte from the offset of the second instruction of the code performing relocation.) The true length is 3 bytes shorter; strictly speaking, we would need to subtract three from that value. However, the difference has no effect: The contents of memory locations beyond the end of the decrypted fragment aren't defined, and those locations may contain anything.
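The relocation step can be checked in the same way. This Python sketch emulates the six instructions starting at 0x12E on a toy bytearray memory; segment registers are ignored (in a COM program they all point to the same segment), and only the first eight decrypted bytes are placed in memory, so the remaining copied bytes are zeros here.

```python
def relocate(memory, return_address, first_decrypted, destination=0x100):
    """Emulate call $+3 / pop cx / pop si / mov di, 100h / sub cx, si / rep movsb."""
    cx = return_address        # call $+3 pushes 0x131; pop cx retrieves it
    si = first_decrypted       # pop si retrieves the offset pushed at 0x10B
    di = destination           # mov di, 100h
    cx -= si                   # sub cx, si
    for _ in range(cx):        # rep movsb copies CX bytes from [si] to [di]
        memory[di] = memory[si]
        si += 1
        di += 1
    return cx


memory = bytearray(0x200)                       # toy memory, indexed by offset
payload = bytes.fromhex("B409BA0801CD21C3")     # decrypted code at 0x116
memory[0x116:0x116 + len(payload)] = payload

copied = relocate(memory, 0x131, 0x116)
print(hex(copied), copied - 0x18)               # bytes copied, excess over the true length
print(memory[0x100:0x108].hex())
```

The source and destination ranges overlap, but since the copy runs forward and the destination lies below the source, every byte is read before it can be overwritten.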

The 0x136:PUSH DI and 0x13B:RETN instructions are equivalent to a JMP DI instruction: PUSH places the target address on the stack where RETN expects to find a return address, and RETN pops it and passes control to that address. Knowing the DI value (0x100), we can add a cross-reference (from :0x13B to :0x100) and a comment to line :0x13B: "Jump to address 0x100." However, after relocation, different code is located at the indicated addresses! Therefore, it's more logical to add the cross-reference from :0x13B to :0x116 and the comment "Jump to address 0x116."

After the new cross-reference is created, IDA will try to disassemble the encrypted code. The following will result:

seg000:0116 loc_0_116: ; CODE XREF: seg000:013B↓u
seg000:0116 shr byte ptr [bx-24h], cl
seg000:0119 outsb
seg000:011A stos word ptr es: [edi]
seg000:011C inc di
seg000:011D movsw
seg000:011E add cx, cs:[bp+si]
seg000:0121 or cl, [bx+di]
seg000:0123 dec dx
seg000:0124 xor ax, 0F07h
seg000:0127 or cl, [bx+di]
seg000:0129 adc al, 47h
seg000:0129;--------------------------------------------------------
seg000:012B db 6Bh ; k
seg000:012C db 6Ch ; l
seg000:012D db 42h ; B
seg000:012E;--------------------------------------------------------

Immediate disassembling of the encrypted code is impossible: It must be decrypted first. Most disassemblers aren't able to modify analyzed code on the fly; they require it to be decrypted completely beforehand. In practice, however, things are different. Before decrypting, we need to understand the decryption algorithm by analyzing the accessible part of the file. Then, we can quit the disassembler, decrypt the "secret" fragment, load the file into the disassembler again, and continue analyzing it until the next encrypted fragment occurs. We'll have to repeat the "quit-decrypt-load-analyze" cycle.

IDA allows us to solve the same task with less effort and without quitting the disassembler. This can be achieved because of virtual memory. We can imagine IDA is a "transparent" virtual machine, operating on the physical memory of the computer. To modify memory, we need to know the address. This consists of a pair of numbers: a segment address and an offset.

On the left side, each line's segment name and offset are given (seg000:0116, for example). We can get the base address of a segment from its name: Open the Segments window by selecting the Segments item from the View menu.

The required address is in the Base column. (It is in bold and underlined in Fig. 8.) Any location of the segment can be addressed using the [segment, offset] construction. Memory cells can be read and modified using the Byte and PatchByte functions, respectively. Calling a=Byte([0x1000, 0x100]) reads the cell at offset 0x100 in the segment with the base address of 0x1000; calling PatchByte([0x1000, 0x100], 0x27) writes the value 0x27 to the memory cell at offset 0x100 in the segment with the base address of 0x1000. As their names indicate, the functions work one byte at a time.
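To see what such an address pair amounts to, here is a small Python model. It assumes that the pair is converted to a linear address by the usual real-mode rule (base multiplied by 16, plus offset); the Database class and its method names are purely hypothetical stand-ins for the disassembler's view of memory, not IDA's API.

```python
def linear(base, offset):
    """Real-mode address arithmetic (assumed): paragraph base * 16 + offset."""
    return (base << 4) + offset


class Database:
    """Toy model of a byte-addressable image; not IDA's interface."""

    def __init__(self):
        self.cells = {}

    def byte(self, address):
        # Cells that were never written read as 0xFF in this model.
        return self.cells.get(address, 0xFF)

    def patch_byte(self, address, value):
        self.cells[address] = value & 0xFF


db = Database()
address = linear(0x1000, 0x100)
db.patch_byte(address, 0x27)
print(hex(address), hex(db.byte(address)))
```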



Figure 8: The Segments window
These two functions and familiarity with the C language are enough to write a decrypting script. The IDA-C implementation doesn't completely follow the ANSI C standard. In particular, IDA doesn't allow a variable's type to be specified; variables are declared with the auto keyword, and the interpreter determines the type automatically when the variable is first used. For example, auto MyVar, s0 declares two variables: MyVar and s0.

To create a script, we need to press the Shift+F2 key combination, or select IDC Command from the File menu. Then, we must enter the source code of the program into the dialog box that pops up.



Figure 9: An embedded script editor
Listing 11: The Source Code of a Decryption Script


auto a;
for (a=0x116; a<0x12E; a++)
PatchByte([0x1000, a], Byte([0x1000, a])^0x66);




Explanation: As shown below, the decryption algorithm sequentially converts the bytes of the encrypted fragment using the XOR 0x66 operation (the first instruction of the loop).


seg000:010C xor byte ptr [si], 66h
seg000:010F inc si
seg000:0110 loop loc_0_10C

The encrypted fragment itself starts at address seg000:0x116 and ends just before address seg000:0x12E. Therefore, decryption in C looks like this: for (a=0x116; a<0x12E; a++) PatchByte([0x1000, a], Byte([0x1000, a])^0x66);

To execute the script, press the Enter key (in IDA version 3.8x or higher), or the Ctrl+Enter key combination (in earlier versions). After executing the script, the disassembler window should show the code as it is in Listing 12.

If you encounter an error, you may have used the improper character case (IDA is case sensitive), the wrong syntax, or a base address that does not equal 0x1000. (Call the Segments window again to check its value.) Place the cursor on line seg000:0116 and press the "U" key to delete the previous disassembling results, then press the "C" key to disassemble the decrypted code anew.

Listing 12: The Output of the Decryption Script


seg000:0116 loc_0_116: ; CODE XREF: seg000:013B↓u
seg000:0116 mov ah, 9
seg000:0118 mov dx, 108h
seg000:011B int 21h ; DOS -- PRINT STRING
seg000:011B ; DS:DX (string terminated
seg000:011B ; by $)
seg000:011D retn
seg000:011D ; ------------------------------------------------------------------------------
seg000:011E db 48h ; H
seg000:011F db 65h ; e
seg000:0120 db 6Ch ; l
seg000:0121 db 6Ch ; l
seg000:0122 db 6Fh ; o
seg000:0123 db 2Ch ; ,
seg000:0124 db 20h ;
seg000:0125 db 53h ; S
seg000:0126 db 61h ; a
seg000:0127 db 69h ; i
seg000:0128 db 6Ch ; l
seg000:0129 db 6Fh ; o
seg000:012A db 72h ; r
seg000:012B db 21h ; !
seg000:012C db 0Dh ;
seg000:012D db 0Ah ;
seg000:012E db 24h ; $
seg000:012F ; ------------------------------------------------------------------------




The chain of characters beginning at address seg000:011E can be converted to a readable string: Place the cursor on it, and press the A key. The disassembler window will look like this:

seg000:0116 loc_0_116: ; CODE XREF: seg000:013B↓u
seg000:0116 mov ah, 9
seg000:0118 mov dx, 108h
seg000:011B int 21h ; DOS -- PRINT STRING
seg000:011B ; DS:DX (string terminated
seg000:011B ; by $)
seg000:011D retn
seg000:011D ; -----------------------------------------------------------------------
seg000:011E aHelloSailor db 'Hello, Sailor!', 0Dh, 0Ah, '$'
seg000:012E ; -----------------------------------------------------------------------

Prior to calling interrupt 0x21, the MOV AH, 9 instruction at line :0116 prepares the AH register: It selects the function that will display the string whose offset is written in the DX register by the next instruction. To successfully assemble the listing, we need to replace the constant 0x108 with a corresponding offset. However, when assembling the code (before relocation), the string that will be displayed is located in another place! To solve this problem, you could create a new segment and copy the decrypted code to it; this would simulate the relocation of the working code.

Explanation The new segment, MySeg, can have any base address, provided it doesn't overlap the seg000 segment. The initial address of the segment is chosen so that the offset of its first byte is 0x100. The difference between the first and last addresses is the segment length. This can be calculated by subtracting the offset of the beginning of the decrypted fragment from the offset of its end: 0x13B - 0x116 = 0x25.
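The arithmetic can be checked with a short Python sketch. It assumes the base 0x2000 for MySeg (the value the scripts in this section use to address it) and the usual real-mode rule that a linear address equals the segment base shifted left by four bits plus the offset; the helper name linear is hypothetical.

```python
def linear(segment_base: int, offset: int) -> int:
    """Real-mode linear address: base paragraph * 16 + offset."""
    return (segment_base << 4) + offset


# Length of the decrypted fragment, as computed in the text.
length = 0x13B - 0x116

# Start and end addresses for a segment with base 0x2000 whose
# first byte has offset 0x100 (the base value is an assumption).
start = linear(0x2000, 0x100)
end = start + length

print(hex(length), hex(start), hex(end))
```

These are the kinds of values the segment creation dialog expects: a start address, an end address, and a base.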


To create a new segment, select Segments from the View menu and press the Insert button in the dialog box. Another dialog box similar to the following one will appear:



Figure 10: Creating a new segment
We can use the following script to copy the required fragment to the segment we just created:

Listing 13: The Source Code of the Copying Script


auto a;
for (a=0x0; a<0x25; a++) PatchByte([0x2000, a+0x100], Byte([0x1000, a+0x116]));
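What the loop does can be reproduced outside IDA. The following Python sketch is an illustration only: a bytearray stands in for IDA's virtual memory, the helper linear mimics the [segment, offset] notation of the script, and the source bytes are taken from the listings in this section.

```python
def linear(segment_base: int, offset: int) -> int:
    """Mimics IDC's [segment, offset] notation: base * 16 + offset."""
    return (segment_base << 4) + offset


# Hypothetical stand-in for IDA's virtual memory (1 MB real-mode space).
memory = bytearray(0x100000)

# The decrypted fragment at seg000:0116: eight code bytes and the string.
fragment = bytes([0xB4, 0x09, 0xBA, 0x08, 0x01, 0xCD, 0x21, 0xC3])
fragment += b"Hello, Sailor!\r\n$"
src = linear(0x1000, 0x116)
memory[src:src + len(fragment)] = fragment

# The same loop as the IDC script: copy 0x25 bytes to MySeg:0100.
for a in range(0x25):
    memory[linear(0x2000, a + 0x100)] = memory[linear(0x1000, a + 0x116)]

code = bytes(memory[linear(0x2000, 0x100):linear(0x2000, 0x108)])
string_start = bytes(memory[linear(0x2000, 0x108):linear(0x2000, 0x10D)])
print(code.hex(), string_start)
```

After the copy, the string begins at offset 0x108 of the new segment, which is exactly the constant loaded into DX.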




To enter this script, press the Shift+F2 key combination again. The previous script will be lost. (IDA doesn't allow us to work simultaneously with more than one script.) After the operation is complete, the disassembler screen will look like this:

Listing 14: The Result of Executing the Copying Script


MySeg:0100 MySeg segment byte public ' ' use16
MySeg:0100 assume cs:MySeg
MySeg:0100 ;org 100h
MySeg:0100 assume es:nothing, ss:nothing, ds:nothing, fs:nothing, gs:nothing
MySeg:0100 db 0B4h ;
MySeg:0101 db 9 ;
MySeg:0102 db 0BAh ;
MySeg:0103 db 8 ;
MySeg:0104 db 1 ;
MySeg:0105 db 0CDh ;
MySeg:0106 db 21h ;
MySeg:0107 db 0C3h ;
MySeg:0108 db 48h ; H
MySeg:0109 db 65h ; e
MySeg:010A db 6Ch ; l
MySeg:010B db 6Ch ; l
MySeg:010C db 6Fh ; o
MySeg:010D db 2Ch ; ,
MySeg:010E db 20h ;
MySeg:010F db 53h ; S
MySeg:0110 db 61h ; a
MySeg:0111 db 69h ; i
MySeg:0112 db 6Ch ; l
MySeg:0113 db 6Fh ; o
MySeg:0114 db 72h ; r
MySeg:0115 db 21h ; !
MySeg:0116 db 0Dh ;
MySeg:0117 db 0Ah ;
MySeg:0118 db 24h ; $
MySeg:0118 MySeg ends




Now, we need to create a cross-reference from seg000:013B to MySeg:0100 and convert the chain of characters into a readable string. For the latter, bring the cursor to line MySeg:0108 and press the A key. The disassembler window should change to the following:

Listing 15: The Result of Disassembling the Copied Fragment


MySeg:0100 loc_1000_100: ; CODE XREF: seg000:013B↑u
MySeg:0100 mov ah, 9
MySeg:0102 mov dx, 108h
MySeg:0105 int 21h ; DOS -- PRINT STRING
MySeg:0105 ; DS:DX (string terminated by $)
MySeg:0107 retn
MySeg:0107 ; --------------------------------------------------------------------------

MySeg:0108 aHelloSailorS db 'Hello, Sailor!', 0Dh, 0Ah
MySeg:0108 db '$'
MySeg:0118 MySeg ends




As a result of all these operations, the offset loaded into the DX register (108h) now coincides with the offset of the string in MySeg. If we bring the cursor to the constant 108h and press the Ctrl+O key combination, it will change into an offset.

Listing 16: Converting a Constant into an Offset


MySeg:0102 mov dx, offset aHelloSailorS ; "Hello, Sailor!\r\n$"
MySeg:0105 int 21h ; DOS -- PRINT STRING
MySeg:0105 ; DS:DX (string terminated by $)
MySeg:0107 retn
MySeg:0107 ; -------------------------------------------------------------------------
MySeg:0108 aHelloSailorS db 'Hello, Sailor!', 0Dh, 0Ah ; DATA XREF: MySeg:0102o




The listing obtained is convenient for analysis, but it isn't ready for assembling: No assembler is capable of encrypting the required code. That can be performed manually, but IDA allows us to do the same without using any other tools.

Our demonstration will be more to the point if we make some changes to the file, for example, by adding a wait for a keystroke. To do this, we can use the assembler integrated into IDA. First, however, we should extend the boundaries of MySeg to add some space for new code.

Select Segments from the View menu. In the window that opens, move the cursor to the MySeg line. Press the Ctrl+E key combination to open the dialog box for setting segment properties that contain, among other fields, the last address to be changed. We do not need to set an exact value; we can expand the segment with a small surplus over the space required to accommodate the planned changes.

If we try to add the code XOR AX, AX; INT 16h to the program, it would overwrite the beginning of the string "Hello, Sailor!". Therefore, we need to move it downward slightly beforehand (i.e., into higher addresses). We can do so with a script such as:

for (a=0x108; a<0x11A; a++) PatchByte([0x2000, a+0x20], Byte([0x2000, a]));

Explanation The declaration of variable a is omitted for brevity. The relocation, as usual, is specified with a surplus to avoid the need for precise calculations. The copy proceeds from lower to higher addresses, which is safe because the initial and target fragments do not overlap.
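The remark about overlapping can be verified with a small model. This is a Python sketch under stated assumptions: the buffer and the helper names ranges_overlap and copy_ascending are hypothetical, while the addresses are those of the script above.

```python
def ranges_overlap(src: int, dst: int, length: int) -> bool:
    """True if [src, src+length) and [dst, dst+length) share a byte."""
    return src < dst + length and dst < src + length


def copy_ascending(buf: bytearray, src: int, dst: int, length: int) -> None:
    """Copy byte by byte from low to high addresses, like the IDC loop."""
    for i in range(length):
        buf[dst + i] = buf[src + i]


# Hypothetical image of MySeg with the string at offset 0x108.
segment = bytearray(0x200)
message = b"Hello, Sailor!\r\n$"
segment[0x108:0x108 + len(message)] = message

length = 0x11A - 0x108  # 0x12 bytes: the string plus a one-byte surplus
safe = not ranges_overlap(0x108, 0x128, length)
copy_ascending(segment, 0x108, 0x128, length)
moved = bytes(segment[0x128:0x128 + len(message)])
print(safe, moved)
```

With a shift smaller than the fragment length (say, 4 bytes), the ranges would overlap, and an ascending copy would overwrite source bytes before reading them; in that case the loop would have to run from the last byte to the first.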


Place the cursor on line :0128 and press the A key to transform the chain of characters to a form convenient for reading. Then, bring the cursor to line :0102 and select Assembler from the Patch program submenu of the Edit menu. Enter the instruction MOV DX, 128h (where 128h is the new offset of the string) and immediately make it an offset by pressing the Ctrl+O key combination.

Now, enter the new code. Place the cursor on the retn instruction, call the assembler again, and enter XOR AX, AX <Enter> INT 16h <Enter> RET <Enter> <Esc>. It wouldn't be a bad idea to clean up a little: Reduce the segment size to the one used, and move the line containing "Hello, Sailor!" upward, closer to the code.

Explanation The Segment Properties window is called by pressing the Alt+S key combination. If its Disable Address option is set, you can decrease the segment's size and delete the addresses beyond the new end of the segment.


If everything is done correctly, the final result should look as follows:

Listing 17: The Final Disassembled Code


seg000:0100 ; File Name : F:\IDAN\SRC\Crypt.com
seg000:0100 ; Format : MS-DOS COM-file
seg000:0100 ; Base Address : 1000h Range: 10100h-1013Ch Loaded length: 3Ch
seg000:0100
seg000:0100
seg000:0100 ; ==================================================
seg000:0100
seg000:0100 ; Segment type: Pure code
seg000:0100 seg000 segment byte public 'CODE' use16
seg000:0100 assume cs:seg000
seg000:0100 org 100h
seg000:0100 assume es:nothing, ss:nothing, ds:seg000
seg000:0100
seg000:0100 ; -------------- S U B R O U T I N E -------------------------
seg000:0100
seg000:0100
seg000:0100 public start
seg000:0100 start proc near
seg000:0100 add si, 6
seg000:0103 jmp si ; jump to address 0106
seg000:0103 start endp
seg000:0103
seg000:0103 ; ------------------------------------------------------------
seg000:0105 db 0B9h ; |
seg000:0106 ; ------------------------------------------------------------
seg000:0106 mov si, offset BytesToDecrypt
seg000:0109 lodsw
seg000:010A xchg ax, cx
seg000:010B push si
seg000:010C
seg000:010C loc_0_10C: ; CODE XREF: seg000:0110↓j
seg000:010C xor byte ptr [si], 66h
seg000:010F inc si
seg000:0110 loop loc_0_10C
seg000:0112
seg000:0112 BreakHere: ; Jump to the 012E address
seg000:0112 jmp si
seg000:0112 ; --------------------------------------------------------------
seg000:0114 BytesToDecrypt dw 18h ; DATA XREF: seg000:0106↑o
seg000:0116 ; --------------------------------------------------------------
seg000:0116
seg000:0116 loc_0_116: ; CODE XREF: seg000:013B↓u
seg000:0116 mov ah, 9
seg000:0118 mov dx, 108h ; "Hello, Sailor!\r\n$"
seg000:011B int 21h ; DOS -- PRINT STRING
seg000:011B ; DS:DX (string terminated
seg000:011B ; by $)
seg000:011D retn
seg000:011D ; ---------------------------------------------------------------
seg000:011E aHelloSailor db 'Hello, Sailor!', 0Dh, 0Ah, '$'
seg000:011E ;DATA XREF: seg000:0118↑o
seg000:012E ; ---------------------------------------------------------------
seg000:012E
seg000:012E loc_0_12E: ; CODE XREF: seg000:0112↓u
seg000:012E call $+3
seg000:0131 pop cx
seg000:0132 pop si
seg000:0133 mov di, 100h
seg000:0136 push di
seg000:0137 sub cx, si
seg000:0139 repe movsb
seg000:013B retn
seg000:013B seg000 ends
seg000:013B
MySeg:0100 ; -----------------------------------------------------------------
MySeg:0100 ; ================================================================
MySeg:0100
MySeg:0100 ; Segment type: Regular
MySeg:0100 MySeg segment byte public ' ' use16
MySeg:0100 assume cs:MySeg
MySeg:0100 ;org 100h
MySeg:0100
MySeg:0100 loc_1000_100: ; CODE XREF: seg000:013B↑u
MySeg:0100 mov ah, 9
MySeg:0102 mov dx, offset aHelloSailor_0
MySeg:0102 ; "Hello, Sailor!\r\n$"
MySeg:0105 int 21h ; DOS -- PRINT STRING
MySeg:0105 ; DS:DX (string terminated by $)
MySeg:0107 xor ax, ax
MySeg:0109 int 16h
MySeg:0109 ; KEYBOARD -- READ CHAR FROM BUFFER,
MySeg:0109 ; WAIT IF EMPTY
MySeg:0109 ; Return: AH = scan code, AL = character
MySeg:010B retn
MySeg:010B ; --------------------------------------------------------------
MySeg:010C aHelloSailor_0 db 'Hello, Sailor!', 0Dh, 0Ah, '$'
MySeg:010C ; DATA XREF: MySeg:0102↑o
MySeg:010C MySeg ends
MySeg:010C
MySeg:010C end start




Structurally, the program consists of the following parts: the decoder, occupying addresses from seg000:0x100 to seg000:0x113; a one-word variable that holds the number of bytes to decrypt, occupying addresses seg000:0x114 and seg000:0x115; the executable code of the program, occupying the entire MySeg segment; and the loader, occupying addresses from seg000:0x12E to seg000:0x13B. All these parts should be copied to the target file in the listed order. Prior to copying, each byte of the executable code should be encrypted using the XOR 0x66 operation.
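The layout just described can be sketched in Python before writing the IDC version. This is an illustration under assumptions, not the author's tool: the function name build_image is hypothetical, and the decoder and loader are represented by zero-filled placeholders of the right size rather than by the real bytes.

```python
KEY = 0x66  # XOR key used by the decoder


def build_image(decoder: bytes, payload: bytes, loader: bytes) -> bytes:
    """Concatenate the parts in file order: decoder, 16-bit byte count
    (low byte first), XOR-encrypted payload, loader."""
    count = len(payload).to_bytes(2, "little")
    encrypted = bytes(b ^ KEY for b in payload)
    return decoder + count + encrypted + loader


decoder = bytes(0x14)             # placeholder for seg000:0100..0113
loader = bytes(0x0E)              # placeholder for seg000:012E..013B
payload = b"Hello, Sailor!\r\n$"  # stand-in for the contents of MySeg

image = build_image(decoder, payload, loader)
print(len(image), image[0x14:0x16].hex())
```

Because XOR with a constant is its own inverse, applying the same key to the encrypted part restores the payload, which is what the decoder does at run time.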

An example of a script that performs these operations is in Listing 18. To load it, just press the F2 key or select IDC File from the Load File submenu of the File menu.

Listing 18: The Source Code of the Compiler Script


// A Compiler for the Crypt file
//
static main()
{
    auto a, f;

    // The Crypt2.com file is opened for binary writing.
    f = fopen("crypt2.com", "wb");

    // The decoder is copied into the Crypt2 file.
    for (a = 0x100; a < 0x114; a++) fputc(Byte([0x1000, a]), f);
    // The word that contains the number of bytes to be deciphered is
    // found and copied to the file.
    fputc(SegEnd([0x2000, 0x100]) - SegStart([0x2000, 0x100]), f);
    fputc(0, f);

    // The deciphered fragment is copied and encrypted on the fly.
    for (a = SegStart([0x2000, 0x100]); a != SegEnd([0x2000, 0x100]); a++)
        fputc(Byte(a) ^ 0x66, f);
    // The loader code is appended.
    for (a = 0x12E; a < 0x13C; a++)
        fputc(Byte([0x1000, a]), f);

    // The file is closed.
    fclose(f);
}




Executing this script will create the Crypt2.com file. You can test it by launching it. The program should display a string, wait until a key is pressed, and terminate.
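If no DOS environment is at hand, the decryption step can at least be checked off-line. The following Python sketch models the decoder loop of Listing 17 (lodsw, xchg ax, cx, then xor byte ptr [si], 66h in a loop). The function name run_decoder and the test image are assumptions: the image uses a zero-filled placeholder instead of the real decoder bytes, and the model assumes a non-zero byte count.

```python
KEY = 0x66
LOAD_BASE = 0x100  # a COM file is loaded at offset 0x100


def run_decoder(image: bytearray, count_addr: int = 0x114) -> int:
    """Decrypt the image in place and return the final value of SI."""
    si = count_addr - LOAD_BASE
    cx = int.from_bytes(image[si:si + 2], "little")  # lodsw / xchg ax, cx
    si += 2
    for _ in range(cx):                              # xor / inc si / loop
        image[si] ^= KEY
        si += 1
    return si + LOAD_BASE


# Hypothetical file image: placeholder decoder, byte count, encrypted text.
payload = b"Hello, Sailor!\r\n$"
image = bytearray(0x14)
image += len(payload).to_bytes(2, "little")
image += bytes(b ^ KEY for b in payload)

end_addr = run_decoder(image)
decoded = bytes(image[0x16:0x16 + len(payload)])
print(hex(end_addr), decoded)
```

The returned address is where the jmp si instruction transfers control after the loop, that is, the first byte following the decrypted fragment.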

One advantage of such an approach is a "walkthrough" compilation of the file; that is, the disassembled code wasn't assembled! Instead, the original contents, identical to the source file except for the modified lines, were read byte by byte from virtual memory. Reassembling a disassembled listing, by contrast, practically never reproduces the original file byte for byte.

IDA is a convenient tool for modifying files whose source code is unavailable. It's almost the only disassembler capable of analyzing encrypted programs without using additional tools. It has an advanced user interface and a convenient system for navigating code being analyzed. It also can cope with any task.

However, these and many other capabilities can't be used to their full potential without mastery of the script language, which the previous example has confirmed.

Most protection methods can be cracked using standard techniques that don't require you to understand "how it works." A man widely known to investigators for nearly ten years (and with whom I share the same name) once said: "Having the skills to remove protection doesn't imply having the skills to set it." Crackers typically break and destroy. But a hacker's aim isn't breaking (i.e., finding ways to force the program to work at any cost); it's an understanding of the mechanism, of "how it works." Breaking is secondary.

[i]For example, Microsoft Visual C, regardless of the main function prototype, always passes three arguments to it: a pointer to the array of pointers to environment variables, a pointer to the array of pointers to command line arguments, and the number of command line arguments. All other functions of the start code take a smaller number of arguments.
