Hacking KING: August 2009

Friday, August 21, 2009

Hackers Disassembling 1.1.4.3 (Method 2: Setting a Breakpoint at the Password Input Function)

Method 2: Setting a Breakpoint at the Password Input Function

We can't call the previous method of directly searching for the entered password elegant or practical. Why should we search for the password, stumbling over irregularly scattered buffers, when we can place a breakpoint directly on the function that reads it? Will it be easier to guess which function the developer used?

The operation can be performed with one of just a few functions. Looking them up won't take a lot of time. In particular, editable field contents often are read with GetWindowTextA or, less frequently, with GetDlgItemTextA.

Since we're talking about windows, let's start our GUI crackme01 example and set a breakpoint at the GetWindowTextA function ("bpx GetWindowTextA"). Since this is a system function, the breakpoint will be global (i.e., it will affect all running applications). Therefore, close all unneeded programs. If you set the breakpoint before starting crackme01, you'll get several false hits because the system reads the window contents when displaying the dialog.

Let's enter KPNC Kaspersky++ as usual, then press the <Enter> key. The debugger will show up instantly.

USER32!GetWindowTextA
001B: 77E1A4E2 55 push ebp
001B: 77E1A4E3 8BEC mov ebp, esp
001B: 77E1A4E5 6AFF push FF
001B: 77E1A4E7 6870A5E177 push 77E1A570
001B: 77E1A4EC 68491DE677 push 77E61D49
001B: 77E1A4F1 64A100000000 mov eax, fs: [00000000]
001B: 77E1A4F7 50 push eax

Many hacking manuals recommend that we immediately quit the function with P RET, saying there's no need to analyze it. But, we needn't hurry! We should clarify where the entered string is located and set a breakpoint at it. Let's look at the arguments the function accepts and the sequence in which it accepts them. (If you don't remember, view the SDK documentation.)

int GetWindowText (
HWND hWnd, // Handle to window or control with text
LPTSTR lpString, // Address of buffer for text
int nMaxCount // Maximum number of characters to copy
) ;

If a program is written in C, it may seem that the arguments are placed on the stack according to the C convention. Not quite! Win32 API functions are called according to the stdcall convention, regardless of the language in which the program is written. As in C, arguments are pushed on the stack from right to left, but the called function removes them itself; the last item pushed onto the stack is the return address. In 32-bit Windows, each argument and the return address occupy a double word (4 bytes). Therefore, on entry to the function, to reach the pointer to the string, you need to add 8 bytes to the stack's top pointer register, or ESP (one double word for the return address, and another one for hWnd). This is represented more clearly in Fig. 3.
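The offset arithmetic can be sketched in a few lines of C. This is only an illustration, assuming the stdcall frame that a Win32 API function sees at its first instruction (return address at [esp], arguments above it in prototype order); the helper name stdcall_arg_offset is made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Size of one stack slot in 32-bit Windows: every argument and the
   return address occupy one double word. */
enum { SLOT_SIZE = 4 };

/* Offset from ESP, at the first instruction of a stdcall function, of the
   argument with the given zero-based position in the prototype.
   [esp+0] holds the return address, so argument 0 starts at [esp+4].
   (Hypothetical helper, not part of any SDK.) */
static size_t stdcall_arg_offset(size_t arg_index)
{
    assert(arg_index < 1024); /* sanity bound for the sketch, not an API limit */
    return SLOT_SIZE + arg_index * SLOT_SIZE;
}
```

For GetWindowText, hWnd is argument 0 and lpString is argument 1, so stdcall_arg_offset(1) gives 8, the offset used with ESP to reach the buffer pointer.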

Figure 3: The stack when calling GetWindowText
In SoftIce, you can display the contents of a specified address using the * operator. (See the debugger documentation for more details.)

:d * (esp+8)
0023:0012F9FC 1C FA 12 00 3B 5A E1 77-EC 4D E1 77 06 02 05 00 ....;Z.w.M.w....
0023:0012FA0C 01 01 00 00 10 00 00 00-01 00 2A C0 10 A8 48 00 ..........*...H.
0023:0012FA1C 10 9B 13 00 0A 02 04 00-E8 3E 2F 00 00 00 00 00 .........>/.....
0023:0012FA2C 01 02 04 00 83 63 E1 77-08 DE 48 00 0A 02 04 00 .....c.w..H.....

The buffer is filled with garbage because the string hasn't been read yet. Let's quit the function with P RET and see what happens. (Note that it will be impossible to use d *(esp+8) afterward; once we exit the function, its arguments will have been popped off the stack.)

:p ret
:d 0012F9FC
0023:0012F9FC 4B 50 4E 43 20 4B 61 73-70 65 72 73 6B 79 2B 2B KPNC Kaspersky++
0023:0012FA0C 00 01 00 00 0D 00 00 00-01 00 1C 80 10 A8 48 00 ..............H.
0023:0012FA1C 10 9B 13 00 0A 02 04 00-E8 3E 2F 00 00 00 00 00 .........>/.....
0023:0012FA2C 01 02 04 00 83 63 E1 77-08 DE 48 00 0A 02 04 00 .....c.w..H.....

This is the buffer we need. Set a breakpoint and wait for the debugger window to show up. Look! (Do you recognize the comparing procedure?) After the first try, we are where we want to be.

001B:004013E3 8A10 mov dl, [eax]
001B:004013E5 8A1E mov bl, [esi]
001B:004013E7 8ACA mov cl, dl
001B:004013E9 3AD3 cmp dl, bl
001B:004013EB 751E jnz 0040140B
001B:004013ED 84C9 test cl, cl
001B:004013EF 7416 jz 00401407
001B:004013F1 8A5001 mov dl, [eax+01]

This is wonderful! Elegantly, quickly, beautifully — and without any false hits — we defeated the protection.

This method is universal; we'll take advantage of it many times. It simply requires us to determine the key function and set a breakpoint at it. In Windows, all attempts to read a password (calls to a key file, to the registry, etc.) are reduced to calls of API functions. There are many, but the number is finite and known beforehand.

Monday, August 17, 2009

Hackers Disassembling 1.1.4.2 (Method 1: Searching Directly for the Entered Password in Memory)

Method 1: Searching Directly for the Entered Password in Memory

Storing a password as plain text in the program's body is more of an exception than the rule. Hackers are hardly needed if the password can be seen with the naked eye. Therefore, protection developers try to hide it in every possible way. (We'll discuss how they do this later.) Taking into account the size of modern applications, a programmer may place the password in an unremarkable file stuffed with "dummies" (strings that look like a password, but are not). It's unclear what is fake and what isn't, especially because in a project of average size, there may be several hundred, or even thousands, of suitable strings.

Let's approach the problem from the opposite side: let's not search for the original password, which is unknown to us, but rather for the string that we've fed to the program as the password. Then, let's set a breakpoint on it, and proceed in the same manner as before. The break will occur inside the matching call. We'll quit the matching procedure, correct the JMP, and…

Let's take another look at the simple.c source code that we're cracking.

for (;;)
{
    printf ("Enter password:") ;
    fgets (&buff[0], PASSWORD_SIZE, stdin) ;

    if (strcmp (&buff[0], PASSWORD) )
        printf ("Wrong password\n") ;
    else break;
    if (++count>2) return -1;
}

Notice that the user-supplied password is read into buff, and compared to the reference password. If no match is made, the password again is requested from the user — but buff isn't cleared before the next attempt. From this, we can see that, if, upon receiving the message Wrong password, we open the debugger and walk through it with a context search, we may find buff.

So, let's begin. Let's start simple.exe, enter any password that comes to mind (KPNC Kaspersky++, for example), ignore the Wrong password message, and press <Ctrl>+<D>, the key combination for calling SoftIce. We needn't search blindly: Windows NT/9x isn't Windows 3.x or MS-DOS, with a common address space for all processes. Now, to keep one process from inadvertently intruding on another, each is allotted address space for its exclusive use. For example, process A may have the number 0x66 written at address 23:0146660, process B may have 0x0 written at the same address, 23:0146660, and process C may have a third value. Each process (A, B, or C) won't even suspect the existence of the others (unless it uses special resources for interprocess communication).

You can find a more detailed consideration of all these issues in books by Helen Custer and Jeffrey Richter. Here, we're more worried about another problem: The debugger called by pressing the <Ctrl>+<D> key combination emerges in another process (most likely in Idle), and a context search over memory gives no results. We need to manually switch the debugger to the necessary address space.

From the documentation that comes with SoftIce, you may know that switching contexts is performed by the ADDR command, with either the process name truncated to eight characters or its PID. You can get that with another command — PROC. In cases where the process name is syntactically indistinguishable from a PID — "123", for example — we have to use the PID (the second column of digits in the PROC report).

:addr simple

Now, let's try the addr simple command. Nothing happens. Even the registers remain the same! Don't worry; the word "simple" is in the lower-right corner, identifying the current process. Keeping the same register values is just a bug in SoftIce. It ignores them, and only switches addresses. This is why tracing a switched program is impossible. Searching, however, is another matter.

:s 23:0 L -1 "KPNC Kaspersky"

The first argument after s is the search start address, written as selector:offset. In Windows 2000, selector 23 is used to address data and the stack. In other operating systems, the selector may differ. We can find it by loading any program and reading the contents of the DS register.

In general, starting a search from a zero offset is silly. According to the memory map, the auxiliary code is located there, and it is unlikely to contain the required password. However, this will do no harm, and will be much faster than trying to figure out the program load address and where to start the search. The next argument, L -1, is the length of the area to search, where -1 means search until successful. Note that we are not searching for the entire string, but only for part of it (KPNC Kaspersky, not KPNC Kaspersky++). This allows us to get rid of false results: SoftIce likes to display references to its own buffers containing the search template. They are always located above 0x80000000, where no normal password ever lives. Nevertheless, it'll be more demonstrative if just the string we need is found using an incomplete substring.
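What the s command does can be imitated on an ordinary byte buffer. The sketch below is not SoftIce's implementation; it simply assumes a linear scan, and find_pattern and NOT_FOUND are names invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NOT_FOUND ((size_t)-1)

/* Return the offset of the first occurrence of pat (pat_len bytes) in
   mem[start..len), or NOT_FOUND. Passing the previous hit + 1 as start
   mimics repeating the s command to get the next occurrence. */
static size_t find_pattern(const unsigned char *mem, size_t len,
                           size_t start, const char *pat, size_t pat_len)
{
    assert(mem != NULL && pat != NULL);
    if (pat_len == 0 || pat_len > len)
        return NOT_FOUND;
    for (size_t i = start; i + pat_len <= len; i++) {
        if (memcmp(mem + i, pat, pat_len) == 0)
            return i;
    }
    return NOT_FOUND;
}
```

Calling it repeatedly, each time starting one byte past the last hit, enumerates every copy of the entered string until NOT_FOUND comes back, much like issuing s until Pattern not found appears.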

Pattern found at 0023:00016E40 (00016E40)

We found at least one occurrence. But what if there are more of them in memory? Let's check this by issuing s commands until the message Pattern not found is received, or until the upper search address of 0x80000000 is exceeded.

:s
Pattern found at 0023:0013FF18 (0013FF18)
:s
Pattern found at 0023:0024069C (0024069C)
:s
Pattern found at 0023:80B83F18 (80B83F18)

We have three! Isn't this too much? It would be silly to set all three breakpoints. In this case, four debug-processor registers will suffice, but even three breakpoints are enough to get us lost! What would we do if we found ten matches?

Let's think: Some matches likely result from reading the input via the keyboard and putting characters into the system buffers. This seems plausible. How can we filter out the "interference?"

The memory map will help: Knowing the owner of an area that possesses a buffer, we can say a lot about that buffer. By typing in map32 simple, we obtain approximately the following:

:map32 simple
Owner Obj Name Obj# Address Size Type
simple .text 0001 001B:00011000 00003F66 CODE RO
simple .rdata 0002 0023:00015000 0000081E IDATA RO
simple .data 0003 0023:00016000 00001E44 IDATA RW

Hurrah! One of the matches belongs to our process. The buffer at address 0x16E40 belongs to the data segment and is probably what we need. But we shouldn't be hasty; everything may not be as simple as it seems. Let's look for the address 0x16E40 in the simple.exe file. (Taking into account the reverse sequence of bytes, it'll be 40 6E 01 00.)
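Both checks in this step are simple arithmetic: deciding whether an address falls inside a section reported by map32, and writing the address in the byte order in which it appears in a dump. A minimal sketch, with helper names (in_section, dword_to_le) chosen only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value the way x86 keeps it in memory: low byte first. */
static void dword_to_le(uint32_t value, uint8_t out[4])
{
    assert(out != 0);
    out[0] = (uint8_t)(value & 0xFF);
    out[1] = (uint8_t)((value >> 8) & 0xFF);
    out[2] = (uint8_t)((value >> 16) & 0xFF);
    out[3] = (uint8_t)((value >> 24) & 0xFF);
}

/* True if addr lies inside a section that starts at base and is
   size bytes long (the Address and Size columns of map32). */
static int in_section(uint32_t addr, uint32_t base, uint32_t size)
{
    return addr >= base && addr - base < size;
}
```

With the .data values from the map above (base 0x16000, size 0x1E44), in_section accepts 0x16E40 and rejects the other two matches, and dword_to_le(0x16E40, ...) yields the bytes 40 6E 01 00.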

> dumpbin /SECTION:.data /RAWDATA simple.exe
RAW DATA #3
00016030: 45 6E 74 65 72 20 70 61 73 73 77 6F 72 64 3A 00 Enter password:.
00016040: 6D 79 47 4F 4F 44 70 61 73 73 77 6F 72 64 0A 00 myGOODpassword..
00016050: 57 72 6F 6E 67 20 70 61 73 73 77 6F 72 64 0A 00 Wrong password..
00016060: 50 61 73 73 77 6F 72 64 20 4F 4B 0A 00 00 00 00 Password OK.....
00016070: 40 6E 01 00 00 00 00 00 40 6E 01 00 01 01 00 00 @n......@n......
00016080: 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 ................

We found two of them there. Let's see what references the first one by looking for the substring 16070 in the disassembled code.

00011032: 68 70 60 01 00 push 16070h
00011037: 6A 64 push 64h ; Max. password length (== 100 dec)
00011039: 8D 4D 98 lea ecx, [ebp-68h] ; Pointer to the buffer in which the password should be written
0001103C: 51 push ecx
0001103D: E8 E2 00 00 00 call 00011124 ; fgets
00011042: 83 C4 0C add esp, 0Ch ; Pop the three arguments off the stack

It should be clear where we are in the code, except for a mysterious pointer to 0x16070. In MSDN, where the prototype of the fgets function is described, we'll discover that "the mysterious stranger" is a pointer to the FILE structure. (According to the C convention, arguments are pushed onto the stack from right to left.) The first member of the FILE structure is the pointer to the buffer. (In the standard C library, file input/output is buffered, with a buffer size of 4 KB by default.) Thus, 0x16E40 is the address of an auxiliary input/output buffer, and we can cross it off the list of candidates.

Candidate No. 2 is 0x24069C. It falls outside the data segment. In general, it's not clear to whom it belongs. Remember the heap? Let's see what's there.

:heap 32 simple
Base Id Cmmt/Psnt/Rsvd Segments Flags Process
00140000 01 0003/0003/00FD 1 00000002 simple
00240000 02 0004/0003/000C 1 00008000 simple
00300000 03 0008/0007/0008 1 00001003 simple

That's it. We just need to clarify who allocated the memory — the system, or the programmer. The first thing that jumps out is the suspicious and strangely undocumented 0x8000 flag. We can find its definition in WINNT.H, but this won't be helpful unless it shows the system using the flag.

#define HEAP_PSEUDO_TAG_FLAG 0x8000

To be convinced, load any application into the debugger and give the command heap 32 proc_name. The system automatically allocates three areas from the heap — exactly like those in our case. This means that this candidate also has led nowhere.

One address remains: 0x13FF18. Does it remind you of anything? What was the ESP value while loading? It seems that it was 0x13FFC4. (Note that in Windows 9x, the stack is located in another place. Nevertheless, this reasoning also works for it: Just remember the stack location in your own operating system and know how to recognize it.)

Since the stack grows downward (i.e., from higher addresses to lower ones), the address 0x13FF18, which lies just below that ESP value, is located on the stack. That looks very much like our buffer: most programmers allocate buffers in local variables that, in turn, are allocated on the stack by the compiler.

Shall we try to set a breakpoint here?

:bpm 23:13FF18
:x
Break due to BPMB #0023:0013FF18 RW DR3 (ET = 369.65 microseconds)
MSR LastBranchFromIp = 0001144F
MSR LastBranchToIp = 00011156
001B:000110B0 mov eax, [edx]
001B:000110B2 cmp al, [ecx]
001B:000110B4 jnz 000110E4
001B:000110B6 or al, al
001B:000110B8 jz 000110E0
001B:000110BA cmp ah, [ecx+01]
001B:000110BD jnz 000110E4
001B:000110BF or ah, ah

We're in the body of the comparing procedure, which should be familiar. Let's display the values of the EDX and ECX pointers to find out what is being compared.

:d edx
0023:0013FF18 4B 50 4E 43 20 4B 61 73-70 65 72 73 6B 79 2B 2B KPNC Kaspersky++

:d ecx
0023:00016040 6D 79 47 4F 4F 44 70 61-73 73 77 6F 72 64 0A 00 myGOODpassword..

We've already discussed everything else that needs to be done. Let's quit the comparing procedure using the P RET command. Then, we need to find a branch, note its address, and correct the executable file. We're done.

You now are acquainted with one common way of cracking protection based on matching passwords. (Later, you'll see that this method is also suitable for cracking protection based on registration numbers.) Its main advantage is its simplicity. There are at least two drawbacks:

If the programmer clears the buffer after making a comparison, a search for the entered password will give nothing unless the system buffers remain. These are difficult to erase. However, it's also difficult to trace the password from system to local buffers!

With the abundance of auxiliary buffers, it can be difficult to find the "right" one. A programmer may allocate the password buffer in the data segment (a static buffer), on the stack (a local buffer), or on the heap. The programmer may even allocate memory using low-level VirtualAlloc calls. As a result, it sometimes appears necessary to go through all obtained occurrences.

Let's analyze another example: crackme01. It's the same as simple.exe except for its graphic user interface (GUI). Its key procedure looks like this:

Listing 5: The Source Code of the Key Procedure of crackme01

void CCrackme_01Dlg::OnOK()
{
    char buff[PASSWORD_SIZE];
    m_password.GetWindowText (&buff[0], PASSWORD_SIZE);
    if (strcmp (&buff[0], PASSWORD) )
    {
        MessageBox("Wrong password") ;
        m_password.SetSel (0,-1,0) ;
        return;
    }
    else
    {
        MessageBox ("Password OK");
    }
    CDialog::OnOK() ;
}

Everything seems straightforward. Enter the password KPNC Kaspersky++ as usual, but before you press the OK button in response to the wrong password dialog, call the debugger and switch the context.

:s 23:0 L -1 'KPNC Kaspersky'
Pattern found at 0023:0012F9FC (0012F9FC)
:s
Pattern found at 0023:00139C78 (00139C78)

There are two occurrences, and both are on the stack. Let's begin with the first one. Set a breakpoint and wait for the debugger to emerge. The debugger's window does not make us wait long, but it shows some strange code. Exit the debugger (the x command will do). A cascade of windows follows, each less intelligible than the previous one.

We can speculate that the CCrackme_01Dlg::OnOK function is called directly when the OK button is pressed: It's allotted part of the stack for local variables, which is deallocated automatically when the function is exited. Thus, the local buffer with the password that we've entered exists only while it is checked, and then it is released automatically. Our only bit of luck is the modal dialog, which tells us that we entered the wrong password. While it remains on the screen, the buffer still contains the entered password, which can be found in memory. But this does little to help us trace when this buffer will be accessed. We have to sort through the false pop-ups one by one. At last, we see the string we seek in the data window and some intelligent code in the code window.

0023:0012F9FC 4B 50 4E 43 20 4B 61 73-70 65 72 73 6B 79 2B 2B KPNC Kaspersky++
0023:0012FA0C 00 01 00 00 0D 00 00 00-01 00 1C C0 A8 AF 47 00 ..............G.
0023:0012FA1C 10 9B 13 00 78 01 01 00-F0 3E 2F 00 00 00 00 00 ....x....>/.....
0023:0012FA2C 01 01 01 00 83 63 E1 77-F0 AD 47 00 78 01 01 00 .....c.w..G.x...

001B:004013E3 8A10 mov dl, [eax]
001B:004013E5 8A1E mov bl, [esi]
001B:004013E7 8ACA mov cl, dl
001B:004013E9 3AD3 cmp dl, bl
001B:004013EB 751E jnz 0040140B
001B:004013ED 84C9 test cl, cl
001B:004013EF 7416 jz 00401407
001B:004013F1 8A5001 mov dl, [eax+01]

Let's see where ESI points.

:d esi
0023:0040303C 4D 79 47 6F 6F 64 50 61-73 73 77 6F 72 64 00 00 MyGoodPassword..

All that remains is to patch the executable file. Here, more difficulties are waiting for us. First, the compiler has optimized the code, inserting the strcmp code instead of calling it. Second, it's swarming with conditional jumps! It will take a lot of work to find what we need. Let's approach the problem in a scientific way by viewing the disassembled code, or, to be more exact, its key fragment that compares the passwords:

>dumpbin /DISASM crackme_01.exe
004013DA: BE 3C 30 40 00 mov esi, 40303Ch
0040303C: 4D 79 47 6F 6F 64 50 61 73 73 77 6F 72 64 00 MyGoodPassword

A pointer to the reference password was placed in the ESI register.

004013DF: 8D 44 24 10 lea eax, [esp+10h]

A pointer to the user-supplied password was placed in the EAX register.

004013E3: 8A 10 mov dl, byte ptr [eax] ←---(3)
004013E5: 8A 1E mov bl, byte ptr [esi]
004013E7: 8A CA mov cl, dl
004013E9: 3A D3 cmp dl, bl

The first characters were compared.

004013EB: 75 1E jne 0040140B ---→ (1)

If the first character didn't match, a jump was made. Further checking would be pointless.

004013ED: 84 C9 test cl, cl

Did the first character equal zero?

004013EF: 74 16 je 00401407 ---→ (2)

If so, we reached the end of the string, and the passwords were identical.

004013F1: 8A 50 01 mov dl, byte ptr [eax+1]
004013F4: 8A 5E 01 mov bl, byte ptr [esi+1]
004013F7: 8A CA mov cl, dl
004013F9: 3A D3 cmp dl, bl

The next pair of characters was checked.

004013FB: 75 0E jne 0040140B ---→ (1)

If they were not equal, the check was stopped.

004013FD: 83 C0 02 add eax, 2
00401400: 83 C6 02 add esi, 2

Both pointers were advanced by two characters.

00401403: 84 C9 test cl, cl

Did we reach the end of the string?

00401405: 75 DC jne 004013E3 -→ (3)

No, we didn't. Matching was continued.

00401407: 33 C0 xor eax, eax ←---(2)
00401409: EB 05 jmp 00401410 ---→ (4)

EAX was cleared (strcmp returns zero if the strings match), and the comparison code was exited.

0040140B: 1B C0 sbb eax, eax ←---(1)
0040140D: 83 D8 FF sbb eax, 0FFFFFFFFh

This branch is executed when the passwords don't match. EAX was set to a nonzero value. (Guess why.)
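For readers who would rather check than guess, the two SBB instructions can be emulated in C. This is a sketch that assumes only the documented SBB semantics (destination minus source minus the carry flag, with the carry flag receiving the borrow); sbb32 and mismatch_result are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Emulate x86 SBB: dst = dst - src - CF, updating CF with the borrow. */
static uint32_t sbb32(uint32_t dst, uint32_t src, int *cf)
{
    assert(cf != 0 && (*cf == 0 || *cf == 1));
    uint64_t wide = (uint64_t)dst - (uint64_t)src - (uint64_t)*cf;
    *cf = (int)((wide >> 32) & 1u);  /* bit 32 is set exactly when a borrow occurred */
    return (uint32_t)wide;
}

/* Value left in EAX by the mismatch branch. dl is the byte of the entered
   password and bl the byte of the reference one; they differ here.
   CMP DL, BL sets CF when dl < bl (unsigned), and JNE leaves it intact. */
static uint32_t mismatch_result(uint8_t dl, uint8_t bl, uint32_t eax_before)
{
    int cf = dl < bl;
    uint32_t eax = sbb32(eax_before, eax_before, &cf); /* sbb eax, eax        */
    eax = sbb32(eax, 0xFFFFFFFFu, &cf);                /* sbb eax, 0FFFFFFFFh */
    return eax;
}
```

Whatever EAX held before, the result is 0xFFFFFFFF (that is, -1) when the entered character is smaller and 1 when it is larger, so the pair reproduces the sign convention of strcmp and never yields zero.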

00401410: 85 C0 test eax, eax ←---(4)

A check was made to see whether EAX equaled zero.

00401412: 6A 00 push 0
00401414: 6A 00 push 0

Something was placed on the stack.

00401416: 74 38 je 00401450 ---→ (5)

A jump was made somewhere.

00401418: 68 2C 30 40 00 push 40302Ch
0040302C: 57 72 6F 6E 67 20 70 61 73 73 77 6F 72 64 00 Wrong password

Aha! "Wrong password." (The code that follows isn't of interest; it's just displaying error messages.)

Now that we understand the algorithm, we can crack it (for example, by replacing the conditional jump at address 0x401416 with a short unconditional jump, opcode 0xEB).
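The patch itself is a one-byte change: 74 38 becomes EB 38, and the displacement stays as it is. The sketch below works on an in-memory copy of the code bytes and checks the opcode before touching it; translating address 0x401416 into a file offset is deliberately left out, and patch_je_to_jmp is a name invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { OPCODE_JE_SHORT = 0x74, OPCODE_JMP_SHORT = 0xEB };

/* Replace the short JE at offset with a short JMP, keeping its one-byte
   displacement. Returns 1 on success, 0 if the byte found there is not
   the expected JE opcode (wrong offset, or already patched). */
static int patch_je_to_jmp(uint8_t *image, size_t size, size_t offset)
{
    assert(image != NULL);
    if (offset >= size || image[offset] != OPCODE_JE_SHORT)
        return 0;
    image[offset] = OPCODE_JMP_SHORT;
    return 1;
}
```

Verifying the original byte first is a cheap safeguard: if the offset is wrong, the file is left untouched instead of being corrupted.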

Hackers Disassembling 1.1.4.1 (Method 0: Cracking the Original Password)

Method 0: Cracking the Original Password

Using the wldr utility delivered with SoftIce, load the file to be cracked by specifying its name on the command line, for example, as follows:

> wldr simple.exe

Yes, wldr is a 16-bit loader, and NuMega recommends that you use its 32-bit version, loader32, developed for Windows NT/9x. They have a point, but loader32 often malfunctions. (In particular, it does not always stop at the first line of the program.) However, wldr works with 32-bit applications, and the only disadvantage is that it doesn't support long file names.

If the debugger is configured correctly, a black textbox appears — a surprise to beginners. Command.com in the era of graphical interfaces! Why not? It's faster to type a command than to search for it in a long chain of nested submenus, trying to recollect where you saw it last. Besides, language is the natural means to express thoughts; a menu is best suited for listing dishes at a cafe. As an example, try to print the list of files in a directory using Windows Explorer. Have you succeeded? In MS-DOS, it was simple: dir > PRN.

If you only see INVALID in the text box (this will probably be the case), don't get confused: Windows simply hasn't yet loaded the executable file into memory. You just need to press the <F10> key (an analog of the P command that traces without entering, or stepping over, the function) or the <F8> key (an analog of the T command that traces and enters, or steps into, the function). Everything will fall into place.

001B:00401277 INVALID
001B:00401279 INVALID
001B:0040127B INVALID
001B:0040127D INVALID
:P

001b:00401285 push ebx
001b:00401286 push esi
001b:00401287 push edi
001b:00401288 mov [ebp-18], esp
001B:0040128B call [KERNEL32!GetVersion]
001b:00401291 xor edx, edx
001b:00401293 mov dl, ah
001b:00401295 mov [0040692c], edx

Pay attention: Unlike the DUMPBIN disassembler, SoftIce recognizes system function names, thus significantly simplifying analysis. However, there's no need to analyze the entire program. Let's quickly try to find the protection mechanism and, without going into detail, chop it off altogether. This is easy to say—and even easier to do! Just recall where the reference password is located in memory. Umm… Is your memory failing? Can you remember the exact address? We'll have to find it!

We'll ask the map32 command for help. It displays the memory map of a selected module. (Our module has the name "simple," the name of the executable file without its extension.)

:map32 simple
Owner Obj Name Obj# Address Size Type
simple .text 0001 001B:00401000 00003F66 CODE RO
simple .rdata 0002 0023:00405000 0000081E IDATA RO
simple .data 0003 0023:00406000 00001E44 IDATA RW

Here is the address of the beginning of the .data section. (Hopefully you remember that the password is in the .data section.) Now, create the data window using the wd command. Then, issue the d 23:406000 command, and press the <Alt>+<D> key combination to get to the desired window. Scroll using the <↓> key (or put a brick on it). We won't need to search long.

0023:00406040 6D 79 47 4F 4F 44 70 61-73 73 77 6F 72 64 0A 00 myGOODpassword..
0023:00406050 57 72 6F 6E 67 20 70 61-73 73 77 6F 72 64 0A 00 Wrong password..
0023:00406060 50 61 73 73 77 6F 72 64-20 4F 4B 0A 00 00 00 00 Password OK.....
0023:00406070 47 6E 40 00 00 00 00 00-40 6E 40 00 01 01 00 00 Gn@.....@n@.....
0023:00406080 00 00 00 00 00 00 00 00-00 10 00 00 00 00 00 00 ................
0023:00406090 00 00 00 00 00 00 00 00-00 00 00 00 02 00 00 00 ................
0023:004060A0 01 00 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................
0023:004060B0 00 00 00 00 00 00 00 00-00 00 00 00 02 00 00 00 ................

We've got it! Remember that to be checked, the user-entered password needs to be compared to the model value. By setting a breakpoint at the instruction for reading address 0x406040, we will catch the comparison "by its tail." No sooner said than done.

:bpm 406040

Now, press the <Ctrl>+<D> key combination (or issue the x command) to exit the debugger. Enter any password that comes to mind (KPNC++, for example). The debugger pops up immediately:

001B:004010B0 mov eax, [edx]
001B:004010B2 cmp al, [ecx]
001B:004010B4 jnz 004010E4 (JUMP ↑)
001B:004010B6 or al, al
001B:004010B8 jz 004010E0
001B:004010BA cmp ah, [ECX+01]
001B:004010BD jnz 004010E4
001B:004010BF or ah, ah
Break due to BPMB #0023:00406040 RW DR3 (ET=752.27 milliseconds)
MSR LastBranchFromIp=0040104E
MSR LastBranchToIp=004010A0

Because of certain architectural features of Intel processors, the break is activated after the instruction has been executed (i.e., CS:EIP points to the following executable instruction — to JNZ 004010E4, in our case). Therefore, the memory location with our breakpoint was addressed by the CMP AL, [ECX] instruction. What is in AL? Let's look at the line above: MOV EAX, [EDX]. We can assume that ECX contains a pointer to the string with the reference password (because it caused the break in execution). This means EDX must be a pointer to the password entered by the user. Let's verify our assumption.

:d ecx
0023:00406040 6D 79 47 4F 4F 44 70 61-73 73 77 6F 72 64 0A 00 myGOODpassword..
:d edx
0023:0012FF18 4B 50 4E 43 2B 2B 0A 00-00 00 00 00 00 00 00 00 KPNC++..........

We were right. Now, the only question is how to crack this. We might replace JNZ with JZ, or more elegantly replace EDX with ECX — then the reference password will be compared to itself! Wait a minute… We shouldn't hurry. What if we aren't in the protection routine, but in the library function (actually, in strcmp)? Changing it will result in the program perceiving any strings as identical, not just the reference and entered passwords. It won't hurt our example, in which strcmp was only called once, but it would cause normal, fully functional applications to fail. What can be done?

Let's exit strcmp and change the IF that determines whether or not the password is right. For this purpose, P RET is used (to trace until the RET instruction occurs — returning from the function).

:P RET
001B:0040104E call 004010A0
001B:00401053 add esp, 08
001B:00401056 test eax, eax
001B:00401058 jz 00401069
001B:0040105A push 00406050
001B:0040105F call 00401234
001B:00401064 add esp, 04
001B:00401067 jmp 0040106B

This is familiar. We were previously here with the disassembler. We can take the same steps now: Replace the TEST instruction with XOR, or write the sequence of bytes that identifies… Just a moment. Where are our bytes, the hexadecimal instructions? SoftIce doesn't display them by default, but the CODE ON command forces it to do so.

:code on
001B:0040104E E84D000000 call 004010A0
001B:00401053 83C408 add esp, 08
001B:00401056 85C0 test eax, eax
001B:00401058 740F jz 00401069
001B:0040105A 6850604000 push 00406050
001B:0040105F E8D0010000 call 00401234
001B:00401064 83C404 add esp, 04
001B:00401067 EB02 jmp 0040106B

That's better. But how can we be sure that these bytes will be in the executable file at the same addresses? The question isn't as silly as it may seem. Try to crack the example crackme0x03 using the method just given. At first, it seems similar to simple.exe—even the reference password is located at the same address. Let's set a breakpoint on it, wait for the debugger to pop up, exit the comparing procedure, and look at the code identical to the one we previously came across.

001B:0042104E E87D000000 call 004210D0
001B:00421053 83C408 add esp, 08
001B:00421056 85C0 test eax, eax
001B:00421058 740F jz 00421069

Start HIEW, jump to address 0x421053, and… Oops; HIEW is upset with us. It says there's no such address in the file! The last byte ends at 0x407FFF. How can we be at 0x421053 in the debugger but not in the file? Perhaps we're in the body of a Windows system function. But Windows system functions are located much higher — beginning at 0x80000000.

The PE file could be loaded at a different address than the one for which it was created. (This property is called relocatability.) The system automatically corrects references to absolute addresses, replacing them with new values. As a result, the file image in memory doesn't correspond to the one written on disk. How can we find the place that needs to be corrected now?

This task is partly facilitated by the system loader, which can relocate only DLLs and always tries to load executable files at their "native" addresses. If this is impossible, loading is interrupted and an error message is sent. Likely, we are dealing with a DLL loaded by the protection we are investigating. Why are DLLs here, and where did they come from?

We'll have to study Listing 4.2 to find out.

Listing 4.2: The Source Code of crackme0x03

#include <stdio.h>
#include <string.h>
#include <windows.h>

__declspec(dllexport) void Demo()
{
    #define PASSWORD_SIZE 100
    #define PASSWORD "myGOODpassword\n"

    int count=0;
    char buff [PASSWORD_SIZE]="";

    for(;;)
    {
        printf("Enter password:");
        fgets(&buff[0], PASSWORD_SIZE-1, stdin);

        if (strcmp(&buff[0], PASSWORD))
            printf("Wrong password\n");
        else break;

        if (++count>2) return; // Demo() is void, so it cannot return -1
    }
    printf("Password OK\n");
}

int main()
{
    HMODULE hmod;
    void (*zzz) ();

    // Load this same file as a DLL under its short (8.3) name
    if ((hmod=LoadLibrary("crack0~1.exe"))
    && (zzz=(void (*) ())GetProcAddress (hmod, "Demo")))
        zzz();

    return 0;
}

What a way to call a function! This technique exports it directly from the executable file and loads the same file as a DLL. (Yes, the same file can be both the executable application and the DLL.)

"It doesn't make a difference", a naive programmer might object. "Everyone knows that Windows isn't so silly as to load the same file twice. LoadLibrary will return the base address of the crackme0x03 module, but won't allocate memory for it." Nothing of the sort! An artful protection scheme accesses the file by its alternate short name, leaving the system loader in a deep delusion.

The system allocates memory and returns the base address of the loaded module to the hmod variable. The code and data of this module are displaced by the difference between hmod and the image base, the base address of the module with which HIEW and the disassembler work. We can easily figure out the image base: Just call DUMPBIN with the /HEADERS key. (Only a fragment of its response is given.)

>dumpbin /HEADERS crack0x03
OPTIONAL HEADER VALUES
...
400000 image base
...

Hence, the base address is 0x400000 (in bytes). We can determine the load address using the mod -u command in the debugger. (The -u key allows us to display only application modules, not system ones.)

:mod -u
hMod Base PEHeader Module Name File Name
00400000 004000D8 crack0x0 \.PHCK\src\crack0x03.exe
00420000 004200D8 crack0x0 \.PHCK\src\crack0x03.exe
77E80000 77E800D0 kernel32 \WINNT\system32\kernel32.dll
77F80000 77F800C0 ntdll \WINNT\system32\ntdll.dll

Two copies of crack0x03 are loaded at once, and the last one is located at 0x420000, which is just what we need! Now, it's easy to calculate that the address 0x421056 (the one we tried to find in the cracked file) corresponds "on disk" to the address 0x421056 - (0x420000 - 0x400000) = 0x421056 - 0x20000 = 0x401056. Let's take a look at that location:

00401056: 85C0 test eax, eax
00401058: 740F je .000401069 --------(1)

Everything is as expected — see how well it matches the dump produced by the debugger:

001B:00421056 85C0 test eax, eax
001B:00421058 740F jz 00421069

This calculation technique is applicable to any DLL, not just to those representing executable files.
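The subtraction above can be written as a small helper. This is only a sketch: the function name and parameters are made up for illustration, and all values are treated as plain 32-bit numbers.

```c
#include <stdint.h>

/* Map an address seen in the debugger back to the address the same byte
   has when the image sits at its preferred base, which is the base that
   HIEW and the disassembler assume. */
uint32_t to_preferred_address(uint32_t runtime_addr,
                              uint32_t load_base,
                              uint32_t preferred_base)
{
    uint32_t delta = load_base - preferred_base;  /* e.g. 0x420000 - 0x400000 */
    return runtime_addr - delta;
}
```

With the numbers from this example, to_preferred_address(0x421056, 0x420000, 0x400000) gives 0x401056.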

What if, instead of recalculating addresses, we searched the file being cracked for the sequence of bytes taken from the debugger, including the bytes of the CALL 004210D0 instruction? Would we find the sequence?

001B:0042104E E87D000000 call 004210D0
001B:00421053 83C408 add esp, 08
001B:00421056 85C0 test eax, eax
001B:00421058 740F jz 00421069
:File image in memory
.0040104E: E87D000000 call .0004010D0 --------(1)
.00401053: 83C408 add esp, 008 ; "▪"
.00401056: 85C0 test eax, eax
.00401058: 740F je .000401069 --------(2)
:File image on disk

The same machine code — E8 7D 00 00 00 — corresponds to the CALL 0x4210D0 and CALL 0x4010D0 instructions. How can this be? Here's how: The operand of the 0xE8 processor instruction does not represent the offset of a subroutine; it represents the difference between the offsets of the subroutine and the instruction next to the CALL instruction. Therefore, in the first case, 0x421053 (the offset of the instruction next to CALL) + 0x0000007D (don't forget about the reverse byte order in double words) = 0x4210D0 — the required address. Thus, when the load address is changed, we don't need to correct the CALL instruction.
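To see why the same five bytes decode to different targets, here is a minimal sketch of how the operand of a near CALL is resolved. The function name is hypothetical; it assumes code points at the 0xE8 opcode byte and addr is the address of that byte.

```c
#include <stdint.h>

/* Decode the target of a 5-byte near CALL (opcode 0xE8).
   code points at the 0xE8 byte; addr is the address of that byte. */
uint32_t call_target(const uint8_t *code, uint32_t addr)
{
    /* The 32-bit displacement is stored little-endian after the opcode. */
    uint32_t rel = (uint32_t)code[1]
                 | (uint32_t)code[2] << 8
                 | (uint32_t)code[3] << 16
                 | (uint32_t)code[4] << 24;

    /* The displacement is relative to the instruction after the CALL.
       Unsigned wraparound also covers negative displacements. */
    return addr + 5 + rel;
}
```

Feeding it E8 7D 00 00 00 at 0x42104E yields 0x4210D0, and the same bytes at 0x40104E yield 0x4010D0.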

In the crack0x03 example, the following line is also in another location (which can be found using HIEW):

004012C5: 89154C694000 mov [00040694C], edx

The MOV instruction uses absolute addressing, rather than relative. What will happen if you change the load address of the module? Will the file image on disk and that in memory be identical in this case?

Looking at the address 0x4212C5 (0x4012C5 + 0x20000) using the debugger, we see that the write does not go to 0x42694C, but to 0x40694C! Our module intrudes in another's domain, modifying it as it likes. This can quickly lead to a system crash! In this case, it doesn't crash, but only because the line being accessed is located in the Startup procedure (in start code), has already been executed (when the application started), and isn't called from the loaded module. It would be another matter altogether if the Demo () function accessed a static variable; the compiler, having substituted its offset, would make the module unrelocatable! It's hard to imagine how DLLs, whose load address isn't known beforehand, manage to work. But there are at least two solutions.

The first is to use indirect addressing instead of direct (for example, [reg+offset_val], where reg is a register containing the base load address, and offset_val is the offset of the memory location from the beginning of the module). This will allow the module to be loaded at any address, but the loss of just one register will appreciably lower the program's performance.

The second is to instruct the loader to correct direct offsets according to a selected base load address. This will slightly slow loading, but it won't affect the speed of the program. This doesn't mean that load time can be neglected; this method simply is preferred by Microsoft.

The problem is distinguishing actual direct offsets from constants that have the same value. It'd be silly to disassemble a DLL just to clear up which locations we need to tweak. It's much easier to list the addresses in a special table, bearing the name Relocation [Fix Up] table, directly in the loaded file. The linker is responsible for creating it. Each DLL contains such a table.

To get acquainted with the table, compile and study the following listing.

Listing 4.3: The Source Code of fixupdemo.c

::fixupdemo.c
__declspec(dllexport) void meme(int x)
{
static int a=0x666;
a = x;
}
> cl fixupdemo.c /LD

Compile the code, then disassemble it right away using "DUMPBIN /DISASM fixupdemo.dll" and "DUMPBIN /SECTION:.data /RAWDATA fixupdemo.dll".

10001000: 55 push ebp
10001001: 8B EC mov ebp, esp
10001003: 8B 45 08 mov eax, dword ptr [ebp+8]
10001006: A3 30 50 00 10 mov [10005030], eax
1000100B: 5D pop ebp
1000100C: C3 ret

RAW DATA #3
10005000: 00 00 00 00 00 00 00 00 00 00 00 00 33 24 00 10 ............3$..
10005010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
10005020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
10005030: 66 06 00 00 64 11 00 10 FF FF FF FF 00 00 00 00 f...d...........

Judging by the code, the contents of EAX are always written to 0x10005030. Nevertheless, don't jump to conclusions! Try "DUMPBIN /RELOCATIONS fixupdemo.dll".

BASE RELOCATIONS #4
1000 RVA, 154 SizeOfBlock
7 HIGHLOW
1C HIGHLOW
23 HIGHLOW
32 HIGHLOW
3A HIGHLOW

The relocation table isn't empty! Its first entry points to the location 0x10001007, obtained by adding the offset 0x7 to the RVA 0x1000 and the base load address 0x10000000 (found using DUMPBIN). The location 0x10001007 belongs to the MOV [0x10005030], EAX instruction, and it points to the lowest byte of the direct offset. This offset is corrected by the loader while loading the DLL (if required).
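What the loader does with each HIGHLOW entry can be sketched in a few lines. This is an illustration under simplifying assumptions, not the real loader code: the function name is made up, the page is an ordinary byte array, and delta is the actual load base minus the preferred one.

```c
#include <stddef.h>
#include <stdint.h>

/* Apply HIGHLOW fixups to one page of an image held in memory.
   offsets[] holds offsets within the page (the numbers DUMPBIN prints);
   delta is the actual load base minus the preferred load base. */
void apply_highlow(uint8_t *page, const uint16_t *offsets, size_t count,
                   uint32_t delta)
{
    for (size_t i = 0; i < count; i++) {
        uint8_t *p = page + offsets[i];

        /* Read the 32-bit little-endian value stored at the fixup. */
        uint32_t v = (uint32_t)p[0]
                   | (uint32_t)p[1] << 8
                   | (uint32_t)p[2] << 16
                   | (uint32_t)p[3] << 24;

        v += delta;  /* shift the absolute address by the load delta */

        /* Write the corrected value back in the same byte order. */
        p[0] = (uint8_t)v;
        p[1] = (uint8_t)(v >> 8);
        p[2] = (uint8_t)(v >> 16);
        p[3] = (uint8_t)(v >> 24);
    }
}
```

Applied to the bytes of meme() with the single offset 7, a nonzero delta changes only the four operand bytes of the MOV instruction and leaves the rest of the function alone.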

Want to check? Let's create two copies of one DLL (such as fixupdemo.dll and fixupdemo2.dll) and load them one by one using the following program:

Listing 4.4: The Source Code of fixupload.c

::fixupload.c
#include <windows.h>

main ()
{
void (*demo) (int a) ;
HMODULE h;
if ( (h=LoadLibrary ("fixupdemo.dll") ) &&
(h=LoadLibrary ("fixupdemo2.dll") ) &&
(demo=(void (*) (int a) )GetProcAddress (h, "meme") ) )
demo (0x777);
}
> cl fixupload.c

Since we can't load two different DLLs at the same address (how will the system know it's the same DLL?), the loader has to relocate one. Let's load the compiled program in the debugger, and set a breakpoint at the LoadLibraryA function. This is necessary to skip the startup code and get into the main function body. (Program execution doesn't start from the main function; instead, it starts from the auxiliary code, in which you can easily "drown.")

Where did the A character at the end of the function name come from? Its roots are closely related to the introduction of Unicode in Windows. (Unicode encodes each character with 2 bytes. Therefore, 2^16 = 65,536 symbols are available, enough to represent practically all of the alphabets of the world.) The name passed to LoadLibrary may be written in any language or in many languages simultaneously: in Russian-French-Chinese, for example. This seems tempting, but doesn't it decrease performance? It certainly does, and substantially. There's a price to be paid for Unicode!

ASCII encoding suffices in most cases. Why waste precious processor clock ticks? To save performance, size was disregarded, and separate functions were created for Unicode and ASCII characters. The former received the W suffix (Wide); the latter received A (ASCII). This subtlety is hidden from programmers: Which function to call, W or A, is decided at compile time (the header files map the generic name to one of the two, depending on whether UNICODE is defined). However, when you work with the debugger, you should specify the full function name, because the debugger cannot determine the suffix independently. The stumbling block is that certain functions, such as ShowWindow, have no suffixes; their library names are the same as the canonical one. How do we know?

The simplest way is to look up the import table of the file being analyzed, and find your function there. For example, in our case:

> DUMPBIN /IMPORTS fixupload.exe > filename
> type filename
19D HeapDestroy
1C2 LoadLibraryA
CA GetCommandLineA
174 GetVersion
7D ExitProcess
29E TerminateProcess
...

From this fragment, you can see that LoadLibrary has the A suffix. The ExitProcess and TerminateProcess functions have no suffix because they don't work with strings.

The other way is to look in the SDK. You won't find library names in it, but the Quick Info subsections give brief information on Unicode support (if such support is implemented). If Unicode is supported, the W or A suffix is indicated; if not, there are no suffixes. Shall we check this?

Here's Quick Info on LoadLibrary:

QuickInfo
Windows NT: Requires version 3.1 or later.
Windows: Requires Windows 95 or later.
Windows CE: Requires version 1.0 or later.
Header: Declared in winbase.h.
Import Library: Use kernel32.lib.
Unicode: Implemented as Unicode and ANSI versions on Windows NT.

We now understand the situation for Windows NT, but what about the one for the more common Windows 95/98? A glance at the KERNEL32.DLL export table shows there is such a function. However, looking more closely, we see something surprising: Its entry point coincides with the entry points of more than twenty other functions!

ordinal hint RVA name
556 1B3 00039031 LoadLibraryW

The third column in the DUMPBIN report is the RVA address — the virtual address of the beginning of the function minus the file-loading base address. A simple search shows that it occurs more than once. Using the srcln program-filter to obtain the list of functions, we get the following:

21: 118 1 00039031 AddAtomW
116: 217 60 00039031 DeleteFileW
119: 220 63 00039031 DisconnectNamedPipe
178: 279 9E 00039031 FindAtomW
204: 305 B8 00039031 FreeEnvironmentStringsW
260: 361 F0 00039031 GetDriveTypeW
297: 398 115 00039031 GetModuleHandleW
341: 442 141 00039031 GetStartupInfoW
377: 478 165 00039031 GetVersionExW
384: 485 16C 00039031 GlobalAddAtomW
389: 490 171 00039031 GlobalFindAtomW
413: 514 189 00039031 HeapLock
417: 518 18D 00039031 HeapUnlock
440: 541 1A4 00039031 IsProcessorFeaturePresent
455: 556 1B3 00039031 LoadLibraryW
508: 611 1E8 00039031 OutputDebugStringW
547: 648 20F 00039031 RemoveDirectoryW
590: 691 23A 00039031 SetComputerNameW
592: 693 23C 00039031 SetConsoleCP
597: 698 241 00039031 SetConsoleOutputCP
601: 702 245 00039031 SetConsoleTitleW
605: 706 249 00039031 SetCurrentDirectoryW
645: 746 271 00039031 SetThreadLocale
678: 779 292 00039031 TryEnterCriticalSection

What a surprise: All Unicode functions live under the same roof. Since it's hard to believe that LoadLibraryW and, say, DeleteFileW are identical, we have to assume that we are dealing with a "stub", which only returns an error. Therefore, the LoadLibraryW function isn't implemented in Windows 9x.
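The "simple search" for a shared entry point amounts to counting export entries with the same RVA. A minimal sketch follows; the struct and function names are hypothetical, and the table is assumed to have been filled in by hand from the DUMPBIN report.

```c
#include <stddef.h>
#include <stdint.h>

/* One line of an export listing: function name and its RVA. */
struct export_entry {
    const char *name;
    uint32_t    rva;
};

/* Count how many entries in the table share the given RVA.
   A count well above 1 hints that the entries point at a common stub. */
size_t count_sharing_rva(const struct export_entry *tab, size_t n,
                         uint32_t rva)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (tab[i].rva == rva)
            hits++;
    return hits;
}
```

A function that turns out to be the only one at its RVA is far more likely to have a real implementation behind it.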

However, let's get back to the subject at hand. Let's open the debugger, set a breakpoint on LoadLibraryA, then quit the debugger and wait for it to pop up. Fortunately, we won't have to wait long.

KERNEL32!LoadLibraryA
001B:77E98023 push ebp
001B:77E98024 mov ebp, esp
001B:77E98026 push ebx
001B:77E98027 push esi
001B:77E98028 push edi
001B:77E98029 push 77E98054
001B:77E9802E push dword ptr [ebp+08]

Let's issue the P RET command to exit LoadLibraryA (we really don't need to analyze it), and return to the easily recognizable main function.

001B:0040100B call [KERNEL32!LoadLibraryA]
001B:00401011 mov [ebp-08], eax
001B:00401014 cmp dword ptr [ebp-08], 00
001B:00401018 jz 00401051
001B:0040101A push 00405040
001B:0040101F call [KERNEL32!LoadLibraryA]
001B:00401025 mov [ebp-08], eax
001B:00401028 cmp dword ptr [ebp-08], 00

Note the value of the EAX register: the function has returned the load address in it (on my computer, 0x10000000). Continuing to trace (using the F10 key), wait for the second execution of LoadLibraryA. This time, the load address has changed. (On my computer, it now equals 0x00530000.)

We are getting closer to the demo function call. (In the debugger, it looks like PUSH 00000777/CALL [EBP-04]. The EBP-04 tells us nothing, but the 0x777 argument definitely reminds us of something in Listing 4.4.) Don't forget to move your finger from the F10 key to the F8 key to enter the function.

001B:00531000 55 push ebp
001B:00531001 8BEC mov ebp, esp
001B:00531003 8B4508 mov eax, [ebp+08]
001B:00531006 A330505300 mov [00535030], eax
001B:0053100B 5D pop ebp
001B:0053100C C3 ret

That's it! The system loader corrected the address according to the base address of loading the DLL itself. This is how it should work. However, there's one problem — neither that location, nor the sequence A3 30 50 53 00, is in the original DLL, which we can easily see via a context search. How can we find this instruction in the original DLL? Perhaps we'd like to replace it with NOPs.

Let's look a little bit higher — at instructions that don't contain relocatable elements: PUSH EBP/MOV EBP, ESP/MOV EAX, [EBP+08]. Why not look for the sequence 55 8B EC xxx A3? In this case, it'll work but, if the relocatable elements were densely packed with "normal" ones, we wouldn't find it. The short sequence would produce many false hits.
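One way to search around relocatable elements is to mask them out of the pattern, so that only the fixed bytes have to match. The sketch below is an illustration with made-up names; a zero in the mask marks a byte that may take any value.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first place where pat matches buf, or -1.
   A zero in mask[i] means "accept any byte at position i". */
long find_masked(const uint8_t *buf, size_t len,
                 const uint8_t *pat, const uint8_t *mask, size_t plen)
{
    if (plen == 0 || plen > len)
        return -1;

    for (size_t i = 0; i + plen <= len; i++) {
        size_t j = 0;
        while (j < plen && (!mask[j] || buf[i + j] == pat[j]))
            j++;
        if (j == plen)
            return (long)i;   /* every unmasked byte matched */
    }
    return -1;
}
```

Taking A3 30 50 53 00 5D C3 from the debugger and masking the four operand bytes lets the pattern match the unrelocated copy on disk. The fewer fixed bytes remain, the more false hits to expect.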

A more reliable way to find the contents of relocatable elements is to subtract the difference between the actual and recommended load addresses from them: 0x535030 (the address modified by the loader) - (0x530000 (the base loading address) - 0x10000000 (the recommended loading address)) = 0x10005030. Taking into account the reverse sequence of bytes, the machine code of the MOV [10005030], EAX instruction should look like this: A3 30 50 00 10. If we search for it using HIEW, miracle of miracles, there it is!
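The same subtraction, together with the byte reversal, can be sketched as follows. The function name is hypothetical; it only produces the four operand bytes to search for, and the opcode byte (A3 here) still has to be prepended by hand.

```c
#include <stdint.h>

/* Undo the loader's correction of a 32-bit absolute operand and write
   the result in the byte order in which it is stored in the file. */
void unrelocated_operand(uint32_t relocated, uint32_t load_base,
                         uint32_t preferred_base, uint8_t out[4])
{
    /* Subtract the load delta; unsigned arithmetic wraps as needed. */
    uint32_t v = relocated - (load_base - preferred_base);

    out[0] = (uint8_t)v;          /* lowest byte comes first on x86 */
    out[1] = (uint8_t)(v >> 8);
    out[2] = (uint8_t)(v >> 16);
    out[3] = (uint8_t)(v >> 24);
}
```

For 0x535030 with bases 0x530000 and 0x10000000, the output is 30 50 00 10.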

Thursday, August 13, 2009

Hackers Disassembling 1.1.4 ( Step Four: Getting Acquainted with the Debugger )

Step Four: Getting Acquainted with the Debugger

Overview
Debugging was initially the step-by-step execution of code, which is also called tracing. Today, programs have become so inflated that tracing them is senseless — you'll sink into a whirlpool of nested procedures, and you won't even understand what they do. A debugger isn't the best way to understand a program; an interactive disassembler (IDA, for example) copes better with this task.

We'll defer a detailed consideration of the debugger for a while. (See the section "Counteracting Debuggers.") For now, we will focus on the main functions. Using debuggers efficiently is impossible without understanding the following:

Tracing write/read/execute addresses, also called breakpoints

Tracing write/read calls to input/output ports (which can no longer be used for protection with modern operating systems because they forbid applications such low-level hardware access — that is now the prerogative of drivers, where protection is seldom implemented)

Tracing the loading of the dynamic link library (DLL) and the calling of certain functions, including system components (which, as we'll see later, is the main weapon of the present-day hacker)

Tracing program/hardware interrupts (which is not particularly relevant, since protection rarely plays with interrupts)

Tracing messages sent to windows and context searches in memory

So far, you don't need to know how the debugger works; you only need to realize that a debugger can do all of these things. However, it is important to know which debugger to use. Turbo Debugger, although widely known, is primitive, and few hackers use it.

The most powerful and universal tool is SoftIce, now available for all Windows platforms. (Some time ago, it only supported Windows 95, not Windows NT.) The fourth version, the latest available when I was writing this, did not work well with my video adapter. Therefore, I had to confine myself to the earlier 3.25 version, which is more reliable.

Hackers Disassembling 1.1.3 (Step Three: Surgery)

Step Three: Surgery

Direct modification of an executable file is a serious task. We are restricted by the existing code in that we can't move instructions apart or "push" them together, having thrown away "superfluous parts" of the protection. The offsets of all other instructions would shift, while the values of pointers and jump addresses would remain the same, and thus would point to the wrong spot.

It's rather simple to cope with the elimination of "spare parts." Just stuff the code with NOP instructions (whose opcode is 0x90, not 0x0, as many novice code diggers seem to think), that is, with an empty operation (since, generally, NOP is simply another form of the XCHG EAX, EAX instruction). Things are much more complicated when we move instructions apart! Fortunately, in PE files, "holes" always remain after alignment, which we can fill with our code or data.

But isn't it easier to simply reassemble the disassembled file after we make the required changes? No, it isn't: If the disassembler can't recognize pointers passed to a function (as we saw, our disassembler can't distinguish them from constants), the assembler can't correct them properly, and the program won't run.

Therefore, we have to "dissect" the "live" program. The easiest way to do this is to use the HIEW utility that "digests" PE files, and thus simplifies the search for the necessary fragment. Launch it with the executable file name in the command line (hiew simple.exe). Then, press the Enter key two times to switch to assembler mode, and press the F5 key to proceed to the required address. As you may recall, the TEST instruction that checks the string-comparison result returned by the function is located at 0x401056.

0040104E: E8 4D 00 00 00 call 004010A0
00401053: 83 C4 08 add esp, 8
00401056: 85 C0 test eax, eax
00401058: 74 0F je 00401069

So that HIEW is able to distinguish the address from the offset in the file itself, precede this address with a dot: .401056.

00401056: 85C0 test eax, eax
00401058: 740F je 00401069 ---(1)

Now, press the F3 key to switch HIEW to edit mode. Place the cursor at the TEST EAX, EAX instruction, press the Enter key, and replace it with XOR EAX, EAX.

00001056: 33C0 xor eax, eax
00001058: 740F je 00401069

Because the new instruction fits exactly in the place of the previous one, press the F9 key to save the changes to disk, and quit HIEW. Start the program and enter the first password that comes to mind.

> simple.exe
Enter password:Hi, blockhead!
Password OK

The protection has fallen! But what would we do if HIEW did not know how to "digest" PE files? We'd have to use a context search. Look at the hex dump that the disassembler displays to the left of the assembly instructions. If you try to find the 85 C0 sequence (the TEST EAX, EAX instruction), you won't come up with anything useful: There can be hundreds or more of these TEST instructions in a program. The ADD ESP,8/TEST EAX,EAX combination also is common, since it represents many typical constructions in C: if (func(arg1,arg2))…, if (!func(arg1,arg2))…, while (func(arg1,arg2))…, etc. The jump address likely will be different at various branches in the program; therefore, the ADD ESP,8/TEST EAX,EAX/JE 00401069 substring has a good chance of being unique. Let's try to find the code that corresponds to it: 83 C4 08 85 C0 74 0F. (To do this, just press the F7 key in HIEW.)
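Whether a byte sequence is unique can be checked by counting its occurrences in the file image. Here is a minimal sketch, assuming the whole file has already been read into a buffer; the function name is made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count every position in buf where the byte pattern pat occurs. */
size_t count_pattern(const uint8_t *buf, size_t len,
                     const uint8_t *pat, size_t plen)
{
    size_t hits = 0;

    if (plen == 0 || plen > len)
        return 0;

    for (size_t i = 0; i + plen <= len; i++)
        if (memcmp(buf + i, pat, plen) == 0)
            hits++;

    return hits;
}
```

A count of exactly 1 means the pattern is safe to use as a search key; anything higher means it should be lengthened.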

Yippee! Only one entry is found, and that's just what we need. Now, let's try to modify the file directly in hex mode, without using the assembler. Note that inverting the lower bit of the instruction code results in inverting the condition for branching (i.e., 74 JE → 75 JNE).
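The bit flip itself is tiny. The sketch below uses a made-up function name and covers only the short conditional jumps (opcodes 0x70 through 0x7F), where the lowest bit of the opcode selects between a condition and its opposite.

```c
#include <stdint.h>

/* Flip the condition of a short conditional jump (opcodes 0x70..0x7F).
   Returns 1 if the byte was patched, 0 if it is not a short Jcc opcode. */
int invert_short_jcc(uint8_t *opcode)
{
    if ((*opcode & 0xF0) != 0x70)
        return 0;

    *opcode ^= 0x01;   /* 0x74 (JE) <-> 0x75 (JNE), likewise for each pair */
    return 1;
}
```

Refusing to touch anything outside that range guards against patching the wrong byte after a careless search.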

It works, doesn't it? (Has the protection gone mad? It doesn't recognize valid passwords, but it welcomes all others.) It's wonderful!

Now, we need to clear up which bytes have changed. For this, we need an original copy of the file we modified (which we prudently saved before editing), and any file "comparer." Today, the most popular ones are c2u by Professor Nimnull and MakeCrk from Doctor Stein's Labs. The first is the better of the two; it more precisely meets the most popular "standard", and it knows how to generate the extended XCK format. At worst, we can use the utility that comes with MS-DOS/Windows — fc.exe (an abbreviation of File Compare).
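What a file comparer does can be sketched in a few lines. This is not how c2u or fc are implemented; the names are hypothetical, and both files are assumed to be the same length and already read into memory.

```c
#include <stddef.h>
#include <stdint.h>

/* One difference: offset from the start of the file, old and new byte. */
struct byte_diff {
    size_t  offset;
    uint8_t before;
    uint8_t after;
};

/* Compare two equal-length images and store up to max differences in out.
   Returns the total number of differing bytes, which may exceed max. */
size_t diff_bytes(const uint8_t *a, const uint8_t *b, size_t len,
                  struct byte_diff *out, size_t max)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i]) {
            if (n < max) {
                out[n].offset = i;
                out[n].before = a[i];
                out[n].after  = b[i];
            }
            n++;
        }
    }
    return n;
}
```

Printing each stored record as offset, old byte, new byte gives the same three columns the comparers produce.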

Start your favorite comparer, and look at the differences between the original and modified executables.

The left column shows the offset of a byte from the beginning of the file, the second column shows the contents of the byte in the original file, and the third column contains the byte's value after modification. Let's compare that to the report generated by the c2u utility.

> c2u simple.exe simple.ex_

Corrections are written to the *.crx file, where "*" is the name of the original file. Let's consider the result more closely.

>type simple.crx
[BeginXCK]——————————————————————————————————
▪ Description : $) 1996 by Professor Nimnul
▪ Crack subject :
▪ Used packer : None/UnKnOwN/WWPACK/PKLITE/AINEXE/DIET/EXEPACK/PRO-PACK/LZEXE
▪ Used unpacker : None/UNP/X-TRACT/iNTRUDER/AUT0Hack/CUP/TR0N
▪ Comments :
▪ Target OS : DOS/WiN/WNT/W95/0S$/UNX
▪ Protection : [ ] %17
▪ Type of hack : Bit hack/JMP Correction
▪ Language : UnKn0wN/Turbo/Borland/Quick/MS/Visual C/C++/Pascal/Assembler
▪ Size : 28672
▪ Price : $000
▪ Used tools : TD386 v3.2, HiEW 5.13, C2U/486 v0.10
▪ Time for hack : 00:00:00
▪ Crack made at : 21-07-2001 12:34:21
▪ Under Music : iRON MAiDEN
[BeginCRA]————————————————————————————————
Difference(s) between simple.exe & simple.ex_
SIMPLE.EXE
00001058: 74 75
[EndCRA]——————————————————————————————————
[EndXCK]——————————————————————————————————

The result is the same; there simply is an additional text-file header explaining what kind of a beast this is. The collection of fields differs from one hacker to another. If you want, you can add your own fields or delete someone else's. However, I don't recommend doing that without a good reason. Besides, it's better to adhere to one template. Let's use the one just shown.

Description is simply an explanation. In our case, this may look like this: "Test cracking No.1."

Crack subject is what we've just cracked. Let's write: "Password protection of simple.exe."

Used packer is the type of packer. In the days of good old MS-DOS, packers were widely used to automatically decompress executable files into memory when they were launched. Thus, disk space was economized (recall the ridiculously small hard disks at the end of the 1980s and the beginning of the 1990s), while protection was strengthened. A packed file cannot be directly investigated or edited. Before you do anything with the file, you have to unpack it. Both the hacker and users of the CRK file have to do the same. Since our file wasn't packed, we'll leave this field empty or write "None" in it.

Used unpacker is the recommended unpacker. Not all unpackers are identical; many packers provide advanced protection and skillfully resist attempts to remove it. Therefore, unpackers are not simple things. An "intelligent" unpacker easily deals with "tough" packers, but it often has difficulty with simple protection, or vice versa. If an unpacker isn't required, leave this field blank or write "None."

Comments is used to list additional tasks the user should perform before cracking (for example, removing the "system" attribute from the file, or, conversely, setting it). However, additional operations are only required in extreme cases; therefore, this field is usually filled with boasts. (Sometimes you'll even find obscenities concerning the mental abilities of the protection developer.)

Target OS is the operating system for which the cracked product is intended, and in which the hacker tested it. The program won't necessarily run under all of the same systems after cracking. For example, Windows 9x always ignores the checksum field, but Windows NT doesn't; therefore, if you haven't corrected it, you won't be able to run the cracked program using Windows NT. In our case, the checksum of the PE file header is equal to zero. (This depends on the compiler.) This means the file integrity isn't checked, and the hack will work in Windows NT/9x.

Protection is a "respectability level" evaluated as a percentage. Generally, 100% corresponds to the upper limit of the mental abilities of a hacker, but who would ever admit that? It's not surprising that the "respectability level" is usually underestimated, occasionally ten times or more. ("Look everybody! What a cool hacker I am; cracking whatever I like is as easy as A-B-C!")

Type of hack is more useful for other hackers than for users who don't understand protection and hack types. There's no universal classification. The most commonly used term, bit-hack, means cracking by changing one or more bits in one or more bytes. A particular case of a bit-hack is the JMP correction — changing the address or condition of a jump (as we've just done). Another term, NOP-ing, refers to a bit-hack that replaces certain instructions with the NOP instruction, or inserts insignificant instructions. For example, to erase a two-byte JZ xxx instruction, a combination of two one-byte INC EAX/DEC EAX instructions can be used.

Language or, to be more accurate, the compiler, is the programming environment in which the program was written. In our case, it was Microsoft Visual C++. (We know this because we compiled the program.) How do we know the environment of someone else's program? The first thing that comes to mind is to look in the file for copyrights: They are left by many compilers, including Visual C++. Look for "000053d9:Microsoft Visual C++ Runtime Library." If compilers aren't specified, run the file through IDA. It automatically recognizes most standard libraries, and even indicates particular versions. As a last resort, try to determine the language in which the code was written, taking into account C and Pascal conventions and familiar compiler features. (Each compiler has its own "handwriting." An experienced hacker can figure out how a program was compiled and even discover the optimization key.)

Size refers to the size of the cracked program, which is useful for controlling the version. (Different versions of the program often differ in size.) It is determined automatically by the c2u utility; you don't need to specify it manually.

Price refers to the price of a licensed copy of the program. (The user should know how much money the crack has saved him or her.)

Used tools are the instruments used. Not filling in this field is considered bad form: it's interesting to know what instruments were used to hack the program. This is especially true for users who believe that if they get a hold of these DUMPBIN and HIEW thingies, the protection will fall by itself.

Time for hack is the time spent hacking, including breaks for having a smoke and getting a drink. What percentage of people fills in this field accurately, without trying to look "cool?" It can be given little credence.

Crack made at is the timestamp for the completion of the crack. It's generated automatically, and you don't need to correct it (unless you get up with the sun, want to pretend you are a night owl, and set the time of completion to 3 a.m.).

Under Music is the music that you were listening to when hacking. (It's a pity that there's no field for the name of your pet hamster.) Were you listening to music while hacking? If you were, write it down — let everyone know your inspiration.

Now, we should have the following:

[BeginXCK]————————————————————————————————
▪ Description : Test cracking No. 1
▪ Crack subject : Password protection of simple.exe
▪ Used packer : None
▪ Used unpacker : None
▪ Comments : Hello, sailor! Been at sea a bit too long?
▪ Target OS : WNT/W95
▪ Protection : [ ] %1
▪ Type of hack : JMP Correction
▪ Language : Visual C/C++
▪ Size : 28672
▪ Price : $000
▪ Used tools : DUMPBIN, HiEW 6.05, C2U/486 v0.10 & Brain
▪ Time for hack : 00:10:00
▪ Crack made at : 21-07-2001 12:34:21
▪ Under Music : Paul Mauriat L'Ete Indien "Africa"
[BeginCRA]————————————————————————————————
Difference(s) between simple.exe & simple.ex_
SIMPLE.EXE
00001058: 74 75
[EndCRA]————————————————————————————————
[EndXCK]————————————————————————————————

To change the same bytes in the original program, we need another utility to do what the CRK (XCRK) file specifies. There are a lot of such utilities nowadays, which adversely affects their compatibility with various CRK formats. The most popular are cra386 by Professor Nimnull and pcracker by Doctor Stein's Labs.
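The core of such a utility is a guarded byte replacement: change the byte only if the file still contains the value the CRK record expects. A minimal sketch with hypothetical names follows, working on a file image already loaded into memory.

```c
#include <stddef.h>
#include <stdint.h>

/* Apply one "offset: old new" record to an image in memory.
   Returns 1 on success, or 0 if the offset is out of range or the byte
   found there is not the expected original (wrong version, or the file
   has already been patched). */
int apply_patch(uint8_t *image, size_t len, size_t offset,
                uint8_t expected, uint8_t replacement)
{
    if (offset >= len)
        return 0;
    if (image[offset] != expected)
        return 0;

    image[offset] = replacement;
    return 1;
}
```

Checking the old byte first is what keeps a crack meant for one version from silently corrupting another.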

Of the products for Windows, Patch Maker has an advanced user interface (Fig. 2). It includes a file comparer, crk editor, hex editor (for manual corrections?), and crk compiler to generate executable files and save users the trouble of figuring out the crack and how to do it.

Figure 2: The Patch Maker at work

Some users may find such an interface convenient, but most hackers can't stand the mouse; they prefer console applications and the keyboard.

Wednesday, August 5, 2009

Hackers Disassembling 1.1.2 (Step Two: Getting Acquainted with the Disassembler)

Step Two: Getting Acquainted with the Disassembler

In the previous step, we found the password. But how tiresome it is to enter the password each time you start the program! It wouldn't be a bad idea to hack the program so that no password is requested, or so that any password is accepted.

Hack?! It's not difficult. It's tougher to know what to hack with. A huge variety of hacker tools exists: disassemblers, debuggers, spyware such as API and message loggers, file (port, registry) call monitors, decompressors, and so on. How can a novice code digger grasp all of these facilities?

Spies, monitors, and decompressors are auxiliary, "Plan B" utilities. The main hacker weapons are the disassembler and the debugger.

The purpose of a disassembler is clear from its name. Whereas assembling is the translation of assembly instructions into machine code, disassembling is the translation of machine code into assembly instructions.

However, a disassembler can be used to study more than programs written in the assembler. Its range of application is wide, but not boundless. You may wonder where that boundary lies.

All implementations of programming languages can be divided roughly into the following categories:

>> Interpreters execute a program in the order it was typed by the programmer. In other words, interpreters "chew up" the source code, which can be accessed directly, without using additional resources. To start most BASIC and Perl implementations, you need an interpreter in addition to the source code of the program. This is inconvenient both for users (who, to execute a program of 10 KB, need to install an interpreter of 10 MB) and for developers (who likely don't want to give everyone the entire source code of the program). In addition, syntactic parsing takes a lot of time, which means no interpreter can claim great performance.

>> Compilers behave differently. They "grind" the program into machine code that can be executed directly by the processor, without using the source code or an accessory program such as an interpreter. From a user's point of view, a compiled program is a mash of hexadecimal bytes impossible for nonexperts to understand. This facilitates the development of protection mechanisms: You can only crack the simplest algorithms without deciphering them.

Is it possible to obtain the source code of a program from its machine code? No! Compilation is a unidirectional process. Labels and comments aren't included. (However, we can get the gist of the source code without comments—we are hackers, aren't we?) The main stumbling block is the ambiguous correspondence of machine instructions to constructions in high-level languages. Moreover, assembling also is a unidirectional process, and automatic disassembling is impossible in principle. However, we will not cram such details into the heads of novice code diggers; we'll leave this problem for later consideration.

>> Several software development systems lie between compilers and interpreters. The source code is transformed not to machine code, but rather to code in another interpreted language. To execute this code, the "compiled" file needs its own interpreter. FoxPro, Clipper, numerous dialects of BASIC, and certain other languages are examples.

In this last category, program code is still executed via an interpreter, but all extraneous information — labels, variable names, comments — is removed, and meaningful operator names are replaced with digital codes. This "stone" kills two birds: The intermediate language is tweaked for fast interpretation and is optimized for size beforehand, and the program code becomes unavailable for direct investigation (and/or modification).

Disassembling such programs is impossible — disassemblers only work with machine code, and can't "digest" code in an interpreted language (also known as π code) that they don't understand. The processor can't digest π code either. It can only be executed with an interpreter. But the interpreter is just what the disassembler can digest! By investigating how it works, you can "understand" π code and the purpose of its instructions. It's a laborious process! Interpreters can be so complex, and can occupy so many megabytes, that their analysis can take several months or years. Fortunately, there's no need to analyze each program. Many interpreters are identical, and π code does not tend to vary significantly from one version to another — at least, the main parts don't change daily. Therefore, it's possible to create a program to translate π code back to source code. It's not possible to restore names of variables; nevertheless, the listing will be readable.

So, disassemblers are used to investigate compiled programs, and can be applied when analyzing "pseudo-compiled" code. If that's the case, they should be suitable for cracking simple.exe. The only question is which disassembler to use.

Not all disassemblers are identical. There are "intellectuals" that automatically recognize constructions (i.e., prologs and epilogs of functions, local variables, cross-references, etc.). There are also "simpletons" that merely translate machine code into assembly instructions.

Intellectual disassemblers are the most helpful, but don't hurry to these: Begin with a manual analysis. Disassembler tools are not always on hand; therefore, it wouldn't be a bad idea to master working "in field conditions" first. Besides, working with a poor disassembler will emphasize "the taste" of good things.

Let's use the familiar DUMPBIN utility — a true "Swiss Army knife" that has plenty of useful functions, including a disassembler. Let's disassemble the code section (bearing the name .text). Redirect the output to a file, since we certainly won't find room for it on the screen.

> dumpbin /SECTION:.text /DISASM simple.exe >.code

In less than a second, the .code file is created. It is as large as 300 KB, hundreds of times larger than the source program! How much time will it take to make sense of all this "Greek"? The overwhelming bulk of it has no relation to the protection mechanism; it represents the compiler's standard library functions, which are of no use to us. How can we distinguish these from the "useful" code?

Let's think a bit. We don't know where the procedure to match passwords is located, and we don't know how it works. But we can assert with confidence that one of its arguments is a pointer to the reference password. We just need to find where this password is located in memory. Its address will be stored by the pointer.

Let's have a look at the data section once again (or wherever the password is stored).

> dumpbin /SECTION:.data /RAWDATA simple.exe >.data

RAW DATA #3
00406000: 00 00 00 00 00 00 00 00 00 00 00 00 7B 11 40 00 ............{.@.
00406010: 6E 40 40 00 00 00 00 00 00 00 00 00 20 12 40 00 n@@......... .@.
00406020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00406030: 45 6E 74 65 72 20 70 61 73 73 77 6F 72 64 3A 00 Enter password:.
00406040: 6D 79 47 4F 4F 44 70 61 73 73 77 6F 72 64 0A 00 myGOODpassword..
00406050: 57 72 6F 6E 67 20 70 61 73 73 77 6F 72 64 0A 00 Wrong password..
00406060: 50 61 73 73 77 6F 72 64 20 4F 4B 0A 00 00 00 00 Password OK.....

Aha! The password is located at the offset 0x406040 (the left column of numbers), so the pointer to it also must equal 0x406040. Let's try to find this number in the disassembled listing by searching with any text editor.

Have you found it? Here it is (the push instruction in the first line of the fragment):

00401045: 68 40 60 40 00 push 406040h
0040104A: 8D 55 98 lea edx, [ebp-68h]
0040104D: 52 push edx
0040104E: E8 4D 00 00 00 call 004010A0
00401053: 83 C4 08 add esp, 8
00401056: 85 C0 test eax, eax
00401058: 74 0F je 00401069

This is one of the two arguments of the function at 0x4010A0; the push machine instruction places it on the stack. The second argument is a pointer to a local buffer, probably containing the user-entered password.
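The same search can be run against raw bytes instead of the text listing. In the file, the operand of push 406040h is stored as four little-endian bytes: 40 60 40 00. What follows is only a minimal sketch, assuming the code has already been read into a memory buffer; find_le32 is a hypothetical helper invented for illustration, not part of DUMPBIN or any other tool mentioned here.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the first occurrence of `value`, stored as a
   32-bit little-endian number, inside buf[0..len), or -1 if absent. */
long find_le32(const unsigned char *buf, size_t len, uint32_t value)
{
    unsigned char pat[4];
    size_t i;

    /* Split the value into bytes, least significant first. */
    pat[0] = (unsigned char)(value & 0xFF);
    pat[1] = (unsigned char)((value >> 8) & 0xFF);
    pat[2] = (unsigned char)((value >> 16) & 0xFF);
    pat[3] = (unsigned char)((value >> 24) & 0xFF);

    for (i = 0; i + 4 <= len; i++) {
        if (buf[i] == pat[0] && buf[i + 1] == pat[1] &&
            buf[i + 2] == pat[2] && buf[i + 3] == pat[3])
            return (long)i;
    }
    return -1;
}
```

Applied to the bytes 68 40 60 40 00 8D 55 98 52 from the fragment above, the search for 0x406040 reports offset 1, the byte right after the push opcode 68.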

Here, we have to deviate from our subject to consider passing parameters in detail. The following ways of passing function arguments are the most common: via registers and via the stack.

Passing parameters via registers is the fastest way, but it's not free from disadvantages: The number of registers is very limited, and it complicates implementing recursion — calling a function from within its own body. Furthermore, before writing new arguments into registers, we need to save the old values in RAM. In this case, isn't it easier to pass arguments through RAM without being tormented by registers?

Most compilers pass arguments via the stack, and each compiler has a standard way of doing so. There are at least two different mechanisms:

>> The C convention pushes arguments onto the stack from right to left (i.e., the first argument of the function is placed on the stack last, and thus appears on top). Deleting arguments from the stack is entrusted not to the function, but to the code calling the function. This is wasteful because each function call makes the program heavier by several bytes. However, it allows us to create functions with a variable number of arguments because the calling code knows the exact number of arguments passed.

The stack usually is cleared by the instruction ADD ESP, xxx, where xxx is the number of bytes to be deleted. Since, in 32-bit mode, each argument as a rule occupies 4 bytes, the number of function arguments is calculated in this way: n_args = xxx / 4. Optimizing compilers can be more inventive. To clear a stack of several arguments, they often pop them into unused registers with the POP instruction. Alternatively, an optimizing compiler clears the stack at the time it deems most convenient, rather than immediately after exiting a function.

>> The Pascal convention pushes arguments on the stack from left to right (i.e., the first argument of the function is placed on the stack first, and thus appears on the bottom). The deletion of function arguments is entrusted to the function itself, and is usually performed by the RET xxx instruction (i.e., return from the subroutine and pop xxx bytes from the stack).

The value returned by the function is passed through the EAX register (or EDX:EAX when returning 64-bit variables) in both conventions.
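The remark about a variable number of arguments can be seen from the C side. The sketch below is illustrative only: sum_ints is a hypothetical function, and the comments about stack cleanup assume a 32-bit x86 compiler using the C convention (64-bit targets pass arguments differently).

```c
#include <stdarg.h>

/* Adds up `count` int arguments. The function itself cannot know how
   many bytes the caller pushed; it trusts `count`. This is why, under
   the C convention, the caller removes the arguments after the call. */
int sum_ints(int count, ...)
{
    va_list ap;
    int i, total = 0;

    va_start(ap, count);
    for (i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```

Under the stated assumption, a call such as sum_ints(2, 10, 20) pushes three 4-byte values and is typically followed by ADD ESP, 0Ch, while sum_ints(4, 1, 2, 3, 4) is followed by ADD ESP, 14h. A function that cleaned up with a fixed RET xxx could not serve both calls.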

Since our program was written in C, and pushes arguments from right to left, its source code may look like this:

(*0x4010A0) (ebp-68, "myGOODpassword")

We can be convinced that there are two arguments, not six or ten, by looking at the ADD ESP, 8 instruction that immediately follows the CALL:

0040104E: E8 4D 00 00 00 call 004010A0
00401053: 83 C4 08 add esp, 8
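The arithmetic is simple enough to automate. The following is a rough sketch, assuming the short encoding 83 C4 ib of ADD ESP, imm8 and 4 bytes per argument; args_from_add_esp is a hypothetical helper, and it deliberately ignores the longer 81 C4 encoding as well as 8-byte arguments, which would distort the count.

```c
#include <stddef.h>

/* Returns the argument count implied by an "ADD ESP, imm8" instruction
   (bytes 83 C4 ib) at code[0], assuming 4 bytes per argument.
   Returns -1 if the bytes are not that instruction or if the immediate
   is negative (which would grow the stack instead of cleaning it). */
int args_from_add_esp(const unsigned char *code, size_t len)
{
    if (len < 3 || code[0] != 0x83 || code[1] != 0xC4)
        return -1;
    if (code[2] & 0x80)        /* imm8 is sign-extended by the processor */
        return -1;
    return code[2] / 4;
}
```

For the bytes 83 C4 08 shown above the result is 2; for 83 C4 04, which follows the calls to 0x401234, it is 1.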

Now, we only need to understand the goal of the 0x4010A0 function — although, if we used our brains, we'd see this is unnecessary! It's clear that this function checks the password; otherwise, why would the password be passed to it? How the function does this is a question of minor importance. What we're really interested in is the return value of the function. So, let's proceed to the following line:

00401056: 85 C0 test eax, eax
00401058: 74 0F je 00401069

What do we see? The TEST EAX, EAX instruction checks whether the value returned by the function equals zero. If it does, the JE instruction following it jumps to address 0x401069. Otherwise (i.e., if EAX != 0):

0040105A: 68 50 60 40 00 push 406050h

It seems to be a pointer, doesn't it? Let's verify that assumption by looking at the data segment:

00406050: 57 72 6F 6E 67 20 70 61 73 73 77 6F 72 64 0A 00 Wrong password..

We are almost there. The pointer has led us to the "Wrong password" string, which the next function outputs to the screen. Therefore, a nonzero EAX value indicates a wrong password, and a zero value indicates a correct one.

OK, let's look at the branch of the program that handles a valid password.

0040105F: E8 D0 01 00 00 call 00401234
00401064: 83 C4 04 add esp, 4
00401067: EB 02 jmp 0040106B
00401069: EB 16 jmp 00401081
...
00401081: 68 60 60 40 00 push 406060h
00401086: E8 A9 01 00 00 call 00401234

Well, we see one more pointer. The 0x401234 function was already encountered; it's (presumably) used for string output. We can find the strings in the data segment. This time, "Password OK" is referenced.
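The jump targets in these fragments can be checked by hand. A short jump (74 for JE, EB for JMP) is two bytes long, and its second byte is a signed displacement counted from the address of the next instruction. A small sketch of that calculation, with short_jump_target as a hypothetical helper name:

```c
#include <stdint.h>

/* Target of a two-byte short jump (opcode + rel8) located at `addr`.
   The displacement is signed and is added to the address of the
   instruction that follows the jump, i.e. addr + 2. */
uint32_t short_jump_target(uint32_t addr, unsigned char rel8)
{
    int32_t disp = (rel8 & 0x80) ? (int32_t)rel8 - 256 : (int32_t)rel8;
    return addr + 2u + (uint32_t)disp;
}
```

With the bytes from the listing: 74 0F at 0x401058 gives 0x401069, EB 02 at 0x401067 gives 0x40106B, and EB 16 at 0x401069 gives 0x401081, matching the addresses DUMPBIN printed.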

The following are some working suggestions: If we replace the JE instruction with JNE, the program will reject the real password as incorrect, and all incorrect passwords will be accepted. If we replace TEST EAX, EAX with XOR EAX, EAX, upon executing this instruction, the EAX register will always contain zero, no matter what password is entered.

Just a trifle remains: to find these bytes in the executable file and correct them.
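To make the byte-level view concrete: the short form of JE is encoded as 74 and JNE as 75, while TEST EAX, EAX is 85 C0 and XOR EAX, EAX can be encoded as 33 C0. The sketch below only patches a copy of the bytes held in memory; patch_bytes is a hypothetical helper, and locating the same bytes inside the file on disk is a separate step not shown here.

```c
#include <stddef.h>
#include <string.h>

/* Replaces the first occurrence of `from` with `to` (both n bytes long)
   in buf[0..len). Returns the patched offset, or -1 if not found. */
long patch_bytes(unsigned char *buf, size_t len,
                 const unsigned char *from, const unsigned char *to,
                 size_t n)
{
    size_t i;

    if (n == 0 || n > len)
        return -1;
    for (i = 0; i + n <= len; i++) {
        if (memcmp(buf + i, from, n) == 0) {
            memcpy(buf + i, to, n);
            return (long)i;
        }
    }
    return -1;
}
```

Searching for a longer pattern such as 85 C0 74 0F rather than for 85 C0 alone reduces the chance of hitting an unrelated TEST EAX, EAX elsewhere in the code.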

Hackers disassembling 1.1.1(Step One: Warming up)

Step One: Warming up

The simplest authentication algorithm consists of a character-by-character comparison of the password entered by a user to the reference value stored either in the program (which frequently happens), or outside of it, for example in a configuration file or the registry (which happens less often).

The advantage of such protection is its extremely simple software implementation. Its core is actually only one line of code that, in the C language, could be written as follows: if (strcmp (password entered, reference password)) {/* Password is incorrect */} else {/* Password is OK*/}.

Let's supplement this code with procedures to prompt for a password and display the comparison, and then examine the program for its vulnerability to cracking.

Listing 1: The Simplest System of Authentication

// Matching the password character by character

#include <stdio.h>
#include <string.h>

#define PASSWORD_SIZE 100
#define PASSWORD "myGOODpassword\n"
// The newline above is needed
// because fgets keeps the newline
// typed by the user in the buffer.

int main ()
{
    // The counter for authentication failures
    int count=0;
    // The buffer for the user-entered password
    char buff [PASSWORD_SIZE];

    // The main authentication loop
    for (;;)
    {
        // Prompting the user for a password
        // and reading it
        printf ("Enter password:");
        fgets (&buff [0], PASSWORD_SIZE,stdin);

        // Matching the entered password against the reference value
        if (strcmp (&buff [0], PASSWORD))
            // "Scolding" if the passwords don't match;
            printf ("Wrong password\n");
        // otherwise (if the passwords are identical),
        // getting out of the authentication loop
        else break;

        // Incrementing the counter of authentication failures
        // and terminating the program once more than 3 attempts have failed
        if (++count>3) return -1;
    }

    // Once we're here, the user has entered the right password.
    printf ("Password OK\n");
}

In popular movies, cool hackers easily penetrate heavily protected systems by guessing the required password in just a few attempts. Can we do this in the real world?

Passwords can be common words, like "Ferrari", "QWERTY", or names of pet hamsters, geographical locations, etc. However, guessing the password is like looking for a needle in a haystack, and there's no guarantee of success — we can only hope that we get lucky. And lady luck, as we all know, can't be trifled with. Is there a more reliable way to crack this code?

Let's think. If the reference password is stored in the program, and isn't ciphered in some artful manner, it can be found by simply looking at the binary code. Looking at all the text strings, especially those that look like a password, we'll quickly find the required key and easily "open" the program!

The area in which we need to look can be narrowed down using the fact that, in the overwhelming majority of cases, compilers put initialized variables in the data segment (in PE files, in the .data section). The only exception is, perhaps, early Borland compilers, with their maniacal passion for putting text strings in the code segment, directly where they're used. This simplifies the compiler, but creates a lot of problems. Modern operating systems, as opposed to our old friend MS-DOS, prohibit modifying the code segment. Therefore, all variables allocated in it are read-only. Apart from this, on processors with separate code and data caches (Pentiums, for example), these strings "litter" the code cache: they are loaded into it during read ahead and, when they're accessed for the first time, loaded again from the slow RAM (or the L2 cache) into the data cache. The result is slower operation and a drop in performance.

So, let's assume it's in the data section. Now, we just need a handy instrument to view the binary file. You can open the file in the viewer of your favorite shell (FAR, DOS Navigator) and admire the digits scrolling down until it bores you. You can also use a hex editor (QVIEW, HIEW, etc.) but, in this book, for presentation purposes, I'll use the DUMPBIN utility supplied with Microsoft Visual Studio.

Let's print out the data section (the key is /SECTION:.data) as raw data (the key is /RAWDATA:BYTES), having specified the ">" character for redirecting the output to a file. (The response occupies a lot of space, and only its "tail" would find room on the screen.)

> dumpbin /RAWDATA:BYTES /SECTION:.data simple.exe >filename

RAW DATA #3
00406000: 00 00 00 00 00 00 00 00 00 00 00 00 3B 11 40 00 ............;.@.
00406010: 64 40 40 00 00 00 00 00 00 00 00 00 70 11 40 00 d@@.........p.@.
00406020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00406030: 45 6E 74 65 72 20 70 61 73 73 77 6F 72 64 3A 00 Enter password:.
00406040: 6D 79 47 4F 4F 44 70 61 73 73 77 6F 72 64 0A 00 myGOODpassword..
00406050: 57 72 6F 6E 67 20 70 61 73 73 77 6F 72 64 0A 00 Wrong password..
00406060: 50 61 73 73 77 6F 72 64 20 4F 4B 0A 00 00 00 00 Password OK.....
00406070: 40 6E 40 00 00 00 00 00 40 6E 40 00 01 01 00 00 @n@.....@n@.....

Look! In the middle of the other stuff, there's a string that is similar to a reference password (at offset 0x406040). Shall we try it? It seems likely we need not even bother: Judging from the source code, it really is the password. The compiler has selected too prominent a place in which to store it; it wouldn't be such a bad idea to hide the reference password better.

One of the ways to do this is to manually place the reference password value in a section that we choose ourselves. The ability to define the location isn't standard, and, consequently, each compiler (strictly speaking, not actually the compiler, but the linker, although that isn't really important) is free to implement it in any way (or not implement it at all). In Microsoft Visual C++, a special pragma, data_seg, is used for this; it indicates in which section the initialized variables following it should be placed. By default, uninitialized variables are placed in the .bss section, and are controlled by the bss_seg pragma.

Let's add the following lines to Listing 1, and see how they run.

int count=0;
// From now on, all the initialized variables will be
// located in the .kpnc section.
#pragma data_seg (".kpnc")
// Note that the period before the name
// isn't mandatory, just customary.
char passwd[ ]=PASSWORD;
#pragma data_seg ()
// Now all the initialized variables will again
// be located in the section by default (i.e., ".data").
char buff [PASSWORD_SIZE]=" ";
...
if (strcmp(&buff[0] , &passwd[0]))

> dumpbin /RAWDATA:BYTES /SECTION:.data simple2.exe >filename

RAW DATA #3
00406000: 00 00 00 00 00 00 00 00 00 00 00 00 45 11 40 00 ............E.@.
00406010: 04 41 40 00 00 00 00 00 00 00 00 00 40 12 40 00 .A@.........@.@.
00406020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00406030: 45 6E 74 65 72 20 70 61 73 73 77 6F 72 64 3A 00 Enter password:.
00406040: 57 72 6F 6E 67 20 70 61 73 73 77 6F 72 64 0A 00 Wrong password..
00406050: 50 61 73 73 77 6F 72 64 20 4F 4B 0A 00 00 00 00 Password OK.....
00406060: 20 6E 40 00 00 00 00 00 20 6E 40 00 01 01 00 00 n@..... n@......
00406070: 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 ................

Aha! Now, there's no password in the data section, and the hacker's attack has been held up. But don't jump to conclusions. Simply display the list of sections in the file:

> dumpbin simple2.exe

Summary
2000 .data
1000 .kpnc
1000 .rdata
4000 .text

The nonstandard section .kpnc attracts our attention right away. Well, shall we check to see what's in it?

> dumpbin /SECTION:.kpnc /RAWDATA simple2.exe

RAW DATA #4
00408000: 6D 79 47 4F 4F 44 70 61 73 73 77 6F 72 64 0A 00 myGOODpassword..

There's the password! And we thought we hid it. It's certainly possible to put confidential data into a section of noninitialized data (.bss), the service RTL section (.rdata), or even into the code section (.text) — not everyone will look there for the password, and such allocation won't disturb the functioning of the program. But you shouldn't forget about the possibility of an automated search for text strings in a binary file. Wherever the reference password may be, such a filter will easily find it. (The only problem is determining which text string holds the required key; most likely, a dozen or so possible "candidates" will need to be tried.)
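Such a filter is easy to imagine. Below is a minimal sketch of the idea, not a reproduction of any particular utility: count_ascii_strings is a hypothetical function that treats every run of printable ASCII characters of at least min_len bytes as a candidate string.

```c
#include <stddef.h>

/* Counts runs of at least min_len printable ASCII characters
   (0x20..0x7E) in buf[0..len). If `first` is not NULL and at least one
   run is found, *first receives the offset of the first run. */
size_t count_ascii_strings(const unsigned char *buf, size_t len,
                           size_t min_len, size_t *first)
{
    size_t i, run = 0, count = 0;

    /* i == len acts as a terminator so that a run ending at the very
       end of the buffer is still counted. */
    for (i = 0; i <= len; i++) {
        if (i < len && buf[i] >= 0x20 && buf[i] <= 0x7E) {
            run++;
        } else {
            if (min_len > 0 && run >= min_len) {
                if (count == 0 && first != NULL)
                    *first = i - run;
                count++;
            }
            run = 0;
        }
    }
    return count;
}
```

A threshold of four or more characters is a common choice for such filters; lower values drown the real strings in accidental matches from machine code.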

If the password is written in Unicode, the search is somewhat more complicated, since not all such utilities support this encoding. But it'd be rather naive to hope that this obstacle will stop a hacker for long.
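To see why the obstacle is minor, consider the usual Windows representation, UTF-16LE, in which each ASCII character is simply followed by a zero byte. The sketch below assumes that encoding and a plain ASCII needle; find_utf16le is a hypothetical helper written for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* Looks for the ASCII string `needle` stored as UTF-16LE (each
   character followed by a zero byte) inside buf[0..len).
   Returns the byte offset of the match, or -1 if there is none. */
long find_utf16le(const unsigned char *buf, size_t len, const char *needle)
{
    size_t n = strlen(needle), i, j;

    if (n == 0 || 2 * n > len)
        return -1;
    for (i = 0; i + 2 * n <= len; i++) {
        for (j = 0; j < n; j++) {
            if (buf[i + 2 * j] != (unsigned char)needle[j] ||
                buf[i + 2 * j + 1] != 0)
                break;
        }
        if (j == n)
            return (long)i;
    }
    return -1;
}
```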

Hackers Disassembling 1.1 (Getting Acquainted with Basic Hacking Techniques)

Part I: Getting Acquainted with Basic Hacking Techniques

Step One: Warming up
Step Two: Getting Acquainted with the Disassembler
Step Three: Surgery
Step Four: Getting Acquainted with the Debugger
Step Five: IDA Emerges onto the Scene
Step Six: Using a Disassembler with a Debugger
Step Seven: Identifying Key Structures of High-Level Languages

Hackers Disassembling 1.0.1 (Protection Strength)

Protection Strength

If protection is based on the assumption that its code won't be investigated and/or changed, it's poor protection. Concealing the source code isn't an insurmountable obstacle to studying and modifying the application. Modern reverse engineering techniques automatically recognize library functions, local variables, stack arguments, data types, branches, loops, etc. And, in the near future, disassemblers will probably be able to generate code similar in appearance to that of high-level languages.

But, even today, analyzing machine code isn't so complex as to stop hackers for long. The overwhelming number of cracks that constantly appear is the best testament to this. Ideally, knowing the protection algorithm shouldn't influence the protection's strength, but this is not always possible to achieve. For example, if a server application has a limitation on the number of simultaneous connections in a demo version (which frequently happens), all a hacker needs to do is find the instruction carrying out this check and delete it. Modification of a program can be detected and prevented by testing the checksum regularly; however, the code that calculates the checksum and compares it to a particular value can be found and deleted.
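A checksum test of this kind can be sketched in a few lines. Everything here is an assumption made for illustration: the rotate-and-add checksum is a toy, and simple_checksum and is_unmodified are hypothetical names, not functions of any real protection.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy checksum: rotate the running sum left by one bit, then add the
   next byte. Chosen for brevity, not for strength. */
uint32_t simple_checksum(const unsigned char *buf, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i < len; i++)
        sum = ((sum << 1) | (sum >> 31)) + buf[i];
    return sum;
}

/* Returns nonzero if the buffer still matches the stored checksum. */
int is_unmodified(const unsigned char *buf, size_t len, uint32_t expected)
{
    return simple_checksum(buf, len) == expected;
}
```

Changing a single byte, for example 74 to 75, changes the result, so the patch is noticed. The weakness described above remains, though: the comparison inside is_unmodified is itself just one more conditional jump that can be located and removed.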

However many protection levels there are, one or one million, the program can be cracked! It's only a matter of time and effort. But, when there are no effective laws protecting intellectual property, developers must rely on protection more than law-enforcement bodies. There's a common opinion that if the expense of neutralizing protection isn't lower than the cost of a legal copy, nobody will crack it. This is wrong! Material gain isn't the only motivation for a hacker. Much stronger motivation appears to lie in the intellectual struggle (who's more clever: the protection developer or me?), the competition (which hacker can crack more programs?), curiosity (what makes it tick?), advancing one's own skills (to create protections, you first need to learn how to crack them), and simply as an interesting way to spend one's time. Many young hackers spend weeks removing the protection from a program that only costs a few dollars, or even one distributed free of charge.

The usefulness of protection is limited by competition: other things being equal, clients always select unprotected products, even if the protection doesn't restrict the client's rights. Nowadays, the demand for programmers considerably exceeds supply, but, in the distant future, developers should either come to an agreement or completely refuse to offer protection. Thus, protection experts will be forced to look for other work.

This doesn't mean that this book is useless; on the contrary, the knowledge that it provides should be applied as soon as possible, while the need for protection hasn't disappeared yet.

Monday, August 3, 2009

Hackers Disassembling 1.0(Protection Classifications)

Introduction

Protection Classifications

Checking authenticity is the "heart" of the overwhelming majority of protection mechanisms. In all cases, we have to make sure that the person working with our program is who he or she claims to be, and that this person is authorized to work with the program. The word "person" might mean not only a user, but the user's computer or the medium that stores a licensed copy of the program. Thus, all protection mechanisms can be divided into two main categories:

Protection based on knowledge (of a password, serial number, etc.)

Protection based on possession (of a key disc, documentation, etc.)

Knowledge-based protection is useless if a legitimate owner isn't interested in keeping the secret. An owner can give the password (and/or serial number) to whomever he or she likes, and thus anyone can use a program with such protection on his or her computer.

Therefore, password protection against illegal copying is not effective. Why, then, do practically all prominent software manufacturers use serial numbers? The answer is simple—to protect their intellectual property with the threat (however unlikely) of brute force. The idea goes approximately as follows: The quiet, work-a-day environment of a certain company is suddenly broken into by a squad of agents dressed in camouflage, comparing the Windows license numbers (Microsoft Office, Microsoft Visual Studio) to license agreements. If they find even one illegal copy, some official pops up seemingly from out of nowhere and starts to joyfully rub his or her hands in anticipation of the expected windfall. At best, they'll force the company to buy all the illegal copies. At worst…

Naturally, nobody is barging in on users in their homes, and nobody is even considering it (yet) — your house is still your castle. Besides, what can you get from a domestic user? A wide distribution of products is good for manufacturers, and who can distribute better than pirates? Even in that case, serial numbers aren't superfluous—unregistered users cannot use technical support, which may push them to purchase legal versions.

Such protection is ideal for giant corporations, but it isn't suitable for small groups of programmers or individual developers, especially if they earn their bread by writing highly specialized programs for a limited market (say, star spectra analysis, or modeling nuclear reactions). Since they cannot apply sufficient pressure, it's unrealistic for them to ask users to check their licenses, and it's hardly possible to "beat" the payment out of illegal users. All they can rely on is threats and eloquence.

In this case, protection based on the possession of some unique subject that is extremely difficult to copy, or impossible to copy in general (the ideal case), is more appropriate. The first of this kind were key floppies with information written on them in such a manner that copying the floppy disk was impossible. The simplest way (but not the best) to prepare such a floppy was to gently damage the disk with a nail (an awl, a penknife), and then, having determined the sector in which the defect was located (by writing and reading any test information — up until a certain point, reading will proceed normally, followed by "garbage"), register it in the program. Then, each time the program started, it checked whether the defect was located in the same place or not. When floppy disks became less popular, the same technique was used with compact discs. The more affluent cripple their discs with a laser, while ordinary folk still use an awl or nail.

Thus, the program is rigidly bound to a disc, and requires its presence to run. Since copying such a disc is impossible (just try making identical defects on a copy), pirates have to give up.

Other possession-based protection mechanisms frequently modify the subject of possession, limiting the number of program starts or the duration of its use. Such a mechanism is often used in installers. So as to not irritate users, the key is only requested once, when the program is installed, and it's possible to work without the key. If the number of installations is limited, the damage arising from unauthorized installation of one copy on several computers can be slight.

The problem is that all of this deprives a legal user of his or her rights. Who wants to limit the number of installations? (Some people reinstall the operating system and software each month or even several times a day). In addition, key discs are not recognized by all types of drives, and are frequently "invisible" devices on the network. If the protection mechanism accesses the equipment directly, bypassing drivers in order to thwart hackers' attacks more effectively, such a program definitely won't run under Windows NT/2000, and will probably fail under Windows 9x. (This is, of course, if it wasn't designed appropriately beforehand. But such a case is even worse, since protection executing with the highest privileges can cause considerable damage to the system.) Apart from that, the key item can be lost, stolen, or just stop working correctly. (Floppy disks are inclined to demagnetize and develop bad clusters, CDs can get scratched, and electronic keys can "burn out".)

Naturally, these considerations concern the effectiveness of keys in thwarting hackers, and not the concept of keys in general. End users are none the better for this! If protection causes inconveniences, users would rather visit the nearest pirate and buy illegal software. Speeches on morals, ethics, respectability, and so on won't have any effect. Shame on you, developers! Why make users' lives even more complicated? Users are human beings too!

That said, protections based on registration numbers have been gaining popularity: Once run for the first time, the program binds itself to the computer, turns on a "counter", and sometimes blocks certain functionalities. To make the program fully functional, you have to enter a password from the developer in exchange for monetary compensation. To prevent pirate copying, the password is often a derivative of key parameters of the user's computer (or a derivative of their user name, in an elementary case).
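The "elementary case", a code derived from the user name, might look roughly like the sketch below. The derivation is an assumption made purely for illustration: it uses the public FNV-1a hash, and derive_code and registration_ok are hypothetical names. A real product would more likely mix in hardware parameters and use something much harder to reproduce.

```c
#include <stddef.h>
#include <stdint.h>

/* Folds the user name into a 32-bit number with the FNV-1a hash. */
uint32_t derive_code(const char *user_name)
{
    uint32_t h = 2166136261u;            /* FNV-1a 32-bit offset basis */
    size_t i;

    for (i = 0; user_name[i] != '\0'; i++) {
        h ^= (unsigned char)user_name[i];
        h *= 16777619u;                   /* FNV-1a 32-bit prime */
    }
    return h;
}

/* Returns nonzero if the entered code matches the one derived from
   the user name. */
int registration_ok(const char *user_name, uint32_t entered_code)
{
    return derive_code(user_name) == entered_code;
}
```

Because the whole derivation ships inside the program, anyone who reads the disassembly can reproduce it, and the final comparison is again a single branch. The scheme binds a code to a name, but it doesn't keep the algorithm secret.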

Certainly, this brief overview of protection types has left many of them out, but a detailed discussion of protection classifications is beyond the scope of this book. We'll leave it for a second volume.