Hacking KING: Hackers Disassembling 1.1.2 (Step Two: Getting Acquainted with the Disassembler)

Step Two: Getting Acquainted with the Disassembler

In the previous step, we found the password. But how tiresome it is to enter the password each time you start the program! It wouldn't be a bad idea to hack the program so that no password is requested, or so that any password is accepted.

Hack?! It's not difficult. It's tougher to know what to hack with. A huge variety of hacker tools exists: disassemblers, debuggers, spyware such as API and message loggers, file (port, registry) call monitors, decompressors, and so on. How can a novice code digger grasp all of these facilities?

Spies, monitors, and decompressors are auxiliary, "Plan B" utilities. The main hacker weapons are the disassembler and the debugger.

The purpose of a disassembler is clear from its name. Whereas assembling is the translation of assembly instructions into machine code, disassembling is the translation of machine code into assembly instructions.

However, a disassembler can be used to study more than programs written in assembly language. Its range of application is wide, but not boundless. You may wonder where that boundary lies.

All implementations of programming languages can be divided roughly into the following categories:

>> Interpreters execute a program in the order it was typed by the programmer. In other words, interpreters "chew up" the source code, which can be accessed directly, without using additional resources. To start most BASIC and Perl implementations, you need an interpreter in addition to the source code of the program. This is inconvenient both for users (who, to execute a program of 10 KB, need to install an interpreter of 10 MB) and for developers (who likely don't want to give everyone the entire source code of the program). In addition, syntactic parsing takes a lot of time, which means no interpreter can claim great performance.

>> Compilers behave differently. They "grind" the program into machine code that can be executed directly by the processor, without using the source code or an accessory program such as an interpreter. From a user's point of view, a compiled program is a mash of hexadecimal bytes impossible for nonexperts to understand. This facilitates the development of protection mechanisms: Only the simplest algorithms can be cracked without first deciphering the code.

Is it possible to obtain the source code of a program from its machine code? No! Compilation is a unidirectional process. Labels and comments aren't included. (However, we can get the gist of the source code without comments—we are hackers, aren't we?) The main stumbling block is the ambiguous correspondence of machine instructions to constructions in high-level languages. Moreover, assembling also is a unidirectional process, and automatic disassembling is impossible in principle. However, we will not cram such details into the heads of novice code diggers; we'll leave this problem for later consideration.

>> Several software development systems lie between compilers and interpreters. The source code is transformed not to machine code, but rather to code in another interpreted language. To execute this code, the "compiled" file needs its own interpreter. FoxPro, Clipper, numerous dialects of BASIC, and certain other languages are examples.

In this last category, program code is still executed via an interpreter, but all extraneous information — labels, variable names, comments — is removed, and meaningful operator names are replaced with digital codes. This "stone" kills two birds: The intermediate language is tweaked for fast interpretation and is optimized for size beforehand, and the program code becomes unavailable for direct investigation (and/or modification).

Disassembling such programs is impossible — disassemblers only work with machine code, and can't "digest" code in an interpreted language (also known as π code) that they don't understand. The processor can't digest π code either. It can only be executed with an interpreter. But the interpreter is just what the disassembler can digest! By investigating how it works, you can "understand" π code and the purpose of its instructions. It's a laborious process! Interpreters can be so complex, and can occupy so many megabytes, that their analysis can take several months or years. Fortunately, there's no need to analyze each program. Many interpreters are identical, and π code does not tend to vary significantly from one version to another — at least, the main parts don't change daily. Therefore, it's possible to create a program to translate π code back to source code. It's not possible to restore names of variables; nevertheless, the listing will be readable.

So, disassemblers are used to investigate compiled programs, and can be applied when analyzing "pseudo-compiled" code. If that's the case, they should be suitable for cracking simple.exe. The only question is which disassembler to use.

Not all disassemblers are identical. There are "intellectuals" that automatically recognize constructions (i.e., prologs and epilogs of functions, local variables, cross-references, etc.). There are also "simpletons" that merely translate machine code into assembly instructions.

Intellectual disassemblers are the most helpful, but don't rush to use them: Begin with a manual analysis. Such tools are not always on hand; therefore, it wouldn't be a bad idea to master working "in field conditions" first. Besides, working with a poor disassembler will help you appreciate "the taste" of good things.

Let's use the familiar DUMPBIN utility — a true "Swiss Army knife" that has plenty of useful functions, including a disassembler. Let's disassemble the code section (bearing the name .text). Redirect the output to a file, since we certainly won't find room for it on the screen.

> dumpbin /SECTION:.text /DISASM simple.exe >.code

In less than a second, the .code file is created. It is as large as 300 KB, hundreds of times larger than the source program! How long will it take to make sense of all this "Greek"? The overwhelming bulk of it has no relation to the protection mechanism; it consists of the compiler's standard library functions, which are of no use to us. How can we distinguish these from the "useful" code?

Let's think a bit. We don't know where the procedure to match passwords is located, and we don't know how it works. But we can assert with confidence that one of its arguments is a pointer to the reference password. We just need to find where this password is located in memory. Its address will be stored by the pointer.

Let's have a look at the data section once again (or wherever the password is stored).

> dumpbin /SECTION:.data /RAWDATA simple.exe >.data

RAW DATA #3
00406000: 00 00 00 00 00 00 00 00 00 00 00 00 7B 11 40 00 ............{.@.
00406010: 6E 40 40 00 00 00 00 00 00 00 00 00 20 12 40 00 n@@......... .@.
00406020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00406030: 45 6E 74 65 72 20 70 61 73 73 77 6F 72 64 3A 00 Enter password:.
00406040: 6D 79 47 4F 4F 44 70 61 73 73 77 6F 72 64 0A 00 myGOODpassword..
00406050: 57 72 6F 6E 67 20 70 61 73 73 77 6F 72 64 0A 00 Wrong password..
00406060: 50 61 73 73 77 6F 72 64 20 4F 4B 0A 00 00 00 00 Password OK.....

Aha! The password is located at the offset 0x406040 (the left column of numbers), so the pointer to it also must equal 0x406040. Let's try to find this number in the disassembled listing by searching with any text editor.
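The lookup just performed can also be scripted. The following Python sketch is only an illustration: the DUMP text is copied from the address and hex-byte columns of the listing above, and the helper names parse_dump and find_string are made up for this example. It finds the address of the reference password and shows the little-endian byte order in which that address is stored inside a machine instruction.

```python
import struct

# Three rows copied from the .data dump above (address and hex bytes only).
DUMP = """\
00406030: 45 6E 74 65 72 20 70 61 73 73 77 6F 72 64 3A 00
00406040: 6D 79 47 4F 4F 44 70 61 73 73 77 6F 72 64 0A 00
00406050: 57 72 6F 6E 67 20 70 61 73 73 77 6F 72 64 0A 00
"""

def parse_dump(text):
    """Return (base_address, data) for consecutive dump rows."""
    base = None
    data = bytearray()
    for line in text.splitlines():
        addr, _, hex_part = line.partition(":")
        if base is None:
            base = int(addr, 16)
        data += bytes.fromhex(hex_part)
    return base, bytes(data)

def find_string(base, data, needle):
    """Return the virtual address of needle, or None if it is absent."""
    pos = data.find(needle)
    return None if pos < 0 else base + pos

base, data = parse_dump(DUMP)
password_addr = find_string(base, data, b"myGOODpassword")
print(hex(password_addr))

# x86 stores 32-bit immediates least significant byte first.
pointer_bytes = struct.pack("<I", password_addr)
print(pointer_bytes.hex(" "))
```

In the text listing produced by DUMPBIN, the value appears as 406040h; in the raw machine code, it appears as the byte sequence printed on the last line.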

Have you found it? Here it is (the first line of the fragment):

00401045: 68 40 60 40 00 push 406040h
0040104A: 8D 55 98 lea edx, [ebp-68h]
0040104D: 52 push edx
0040104E: E8 4D 00 00 00 call 004010A0
00401053: 83 C4 08 add esp, 8
00401056: 85 C0 test eax, eax
00401058: 74 0F je 00401069

This is one of two arguments of the 0x4010A0 function placed on the stack by the push machine instruction. The second argument is a pointer to a local buffer, probably containing the user-entered password.
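Note that the target address shown in call 004010A0 is not stored in the instruction directly. The E8 opcode is followed by a signed 32-bit displacement counted from the address of the next instruction. A minimal Python sketch of that arithmetic, using the bytes from the listing above (the function name call_target is made up for this example):

```python
import struct

def call_target(address, code):
    """Decode a near relative CALL (opcode E8) located at `address`."""
    if len(code) < 5 or code[0] != 0xE8:
        raise ValueError("not an E8 rel32 call")
    (rel32,) = struct.unpack_from("<i", code, 1)   # signed displacement
    next_instruction = address + 5                 # E8 + 4 displacement bytes
    return (next_instruction + rel32) & 0xFFFFFFFF

target = call_target(0x0040104E, bytes.fromhex("E8 4D 00 00 00"))
print(hex(target))
```

For the instruction at 0x40104E, the next instruction starts at 0x401053, and 0x401053 + 0x4D gives the address that the disassembler prints.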

Here, we have to deviate from our subject to consider passing parameters in detail. The following ways of passing function arguments are the most common: via registers and via the stack.

Passing parameters via registers is the fastest way, but it's not free from disadvantages: The number of registers is very limited, and it complicates implementing recursion — calling a function from within its own body. Furthermore, before writing new arguments into registers, we need to save the old values in RAM. In this case, isn't it easier to pass arguments through RAM without being tormented by registers?

Most compilers pass arguments via the stack, and each compiler has a standard way of doing so. There are at least two different mechanisms:

>> The C convention pushes arguments onto the stack from right to left (i.e., the first argument of the function is placed on the stack last, and thus appears on top). Deleting arguments from the stack is entrusted not to the function, but to the code calling the function. This is wasteful because each function call makes the program heavier by several bytes. However, it allows us to create functions with a variable number of arguments because the calling code knows the exact number of arguments passed.

The stack usually is cleared by the instruction ADD ESP, xxx, where xxx is the number of bytes to be deleted. Since, in 32-bit mode, each argument as a rule occupies 4 bytes, the number of function arguments is calculated in this way: n_args = xxx / 4. Optimizing compilers can be more inventive. To clear a stack of several arguments, they often pop them into unused registers with the POP instruction. Alternatively, an optimizing compiler clears the stack at the time it deems most convenient, rather than immediately after exiting a function.

>> The Pascal convention pushes arguments on the stack from left to right (i.e., the first argument of the function is placed on the stack first, and thus appears on the bottom). The deletion of function arguments is entrusted to the function itself, and is usually performed by the RET xxx instruction (i.e., return from the subroutine and pop xxx bytes from the stack).

The value returned by the function is passed through the EAX register (or EDX:EAX when returning 64-bit variables) in both conventions.
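The difference between the two conventions can be modeled with a toy stack. The Python sketch below is a simplified illustration, not a description of any particular compiler: the function names and the list-based stack are assumptions of the example, and return addresses and saved registers are left out.

```python
def push_order(args, convention):
    """Return the arguments in the order they are pushed onto the stack."""
    if convention == "c":
        return list(reversed(args))      # right to left
    if convention == "pascal":
        return list(args)                # left to right
    raise ValueError("unknown convention: " + convention)

def cleanup(args, convention, arg_size=4):
    """Return (who removes the arguments, instruction that typically does it)."""
    nbytes = arg_size * len(args)
    if convention == "c":
        return "caller", "add esp, %d" % nbytes
    if convention == "pascal":
        return "callee", "ret %d" % nbytes
    raise ValueError("unknown convention: " + convention)

args = ["buffer", "password"]
for convention in ("c", "pascal"):
    pushed = push_order(args, convention)
    # The last value pushed is the one on top of the stack.
    print(convention, pushed, "top:", pushed[-1], cleanup(args, convention))
```

With the C convention, the first argument ends up on top of the stack; with the Pascal convention, the last one does.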

Since our program was written in C, and pushes arguments from right to left, its source code may look like this:

(*0x4010A0) (ebp-68h, "myGOODpassword")

We can be convinced that there are two arguments, not six or ten, by looking at the ADD ESP, 8 instruction that immediately follows the CALL:

0040104E: E8 4D 00 00 00 call 004010A0
00401053: 83 C4 08 add esp, 8
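The same check can be written as arithmetic: the last byte of 83 C4 08 is the number of bytes removed from the stack, and dividing it by the 4-byte argument size gives the argument count. A small Python sketch, assuming the short imm8 form of the instruction and 4-byte arguments (the function name args_removed is made up for this example):

```python
def args_removed(code, arg_size=4):
    """Number of stack arguments removed by `add esp, imm8` (83 C4 ib)."""
    if len(code) != 3 or code[0] != 0x83 or code[1] != 0xC4:
        raise ValueError("not an 'add esp, imm8' instruction")
    nbytes = code[2]
    if nbytes > 0x7F:
        # imm8 is sign-extended, so values above 7Fh would be negative.
        raise ValueError("negative stack adjustment")
    if nbytes % arg_size:
        raise ValueError("byte count is not a multiple of the argument size")
    return nbytes // arg_size

print(args_removed(bytes.fromhex("83 C4 08")))
```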

Now, we only need to understand the goal of the 0x4010A0 function — although, if we used our brains, we'd see this is unnecessary! It's clear that this function checks the password; otherwise, why would the password be passed to it? How the function does this is a question of minor importance. What we're really interested in is the return value of the function. So, let's proceed to the following line:

00401056: 85 C0 test eax, eax
00401058: 74 0F je 00401069

What do we see? The TEST EAX, EAX instruction checks whether the value returned by the function equals zero. If it does, the JE instruction following it jumps to line 0x401069. Otherwise (i.e., if EAX != 0):

0040105A: 68 50 60 40 00 push 406050h

It seems to be a pointer, doesn't it? Let's verify that assumption by looking at the data segment:

00406050: 57 72 6F 6E 67 20 70 61 73 73 77 6F 72 64 0A 00 Wrong password..

We are almost there. The pointer has led us to the "Wrong password" string, which the next function outputs to the screen. Therefore, a nonzero EAX value indicates a wrong password, and a zero value indicates a correct one.

OK, let's look at the branch of the program that handles a valid password.

0040105F: E8 D0 01 00 00 call 00401234
00401064: 83 C4 04 add esp, 4
00401067: EB 02 jmp 0040106B
00401069: EB 16 jmp 00401081
...
00401081: 68 60 60 40 00 push 406060h
00401086: E8 A9 01 00 00 call 00401234

Well, we see one more pointer. The 0x401234 function was already encountered; it's (presumably) used for string output. We can find the strings in the data segment. This time, "Password OK" is referenced.
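Putting the pieces together, the control flow of this fragment can be modeled in a few lines. The Python sketch below is a paraphrase of the listing, not recovered source code; the function name branch and the MESSAGES table are assumptions of the example.

```python
# Addresses of the message strings, taken from the .data dump.
MESSAGES = {0x406050: "Wrong password\n", 0x406060: "Password OK\n"}

def branch(eax):
    """Follow the code after `call 004010A0` for a given return value."""
    zero_flag = (eax & eax) == 0      # test eax, eax
    if zero_flag:                     # je 00401069, then jmp 00401081
        pushed = 0x406060             # push 406060h
    else:                             # fall through to 0040105A
        pushed = 0x406050             # push 406050h
    return MESSAGES[pushed]           # call 00401234 (presumably prints it)

print(repr(branch(0)))
print(repr(branch(1)))
```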

The following are some working suggestions: If we replace the JE instruction with JNE, the program will reject the real password as incorrect, and all incorrect passwords will be accepted. If we replace TEST EAX, EAX with XOR EAX, EAX, upon executing this instruction, the EAX register will always contain zero, no matter what password is entered.

Just a trifle remains: to find these bytes in the executable file and correct them.
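Both suggestions amount to small byte changes. The Python sketch below applies them to an in-memory copy of the two instructions from the listing; it deliberately does not touch simple.exe, because locating these bytes in the file on disk is a separate task. The opcode values are standard x86 encodings (75 for a short JNE, 33 C0 for XOR EAX, EAX), and the helper name patch is made up for this example.

```python
ORIGINAL = bytes.fromhex("85 C0 74 0F")   # test eax, eax / je 00401069

def patch(code, old, new):
    """Replace the single occurrence of `old` with `new` (same length)."""
    if len(old) != len(new):
        raise ValueError("a patch must not change the code size")
    if code.count(old) != 1:
        raise ValueError("pattern must occur exactly once")
    return code.replace(old, new)

# Variant 1: invert the condition, je (74) -> jne (75).
inverted = patch(ORIGINAL, bytes.fromhex("74 0F"), bytes.fromhex("75 0F"))

# Variant 2: force EAX to zero, test eax, eax (85 C0) -> xor eax, eax (33 C0).
always_ok = patch(ORIGINAL, bytes.fromhex("85 C0"), bytes.fromhex("33 C0"))

print(inverted.hex(" "))
print(always_ok.hex(" "))
```

Requiring the pattern to occur exactly once is a precaution of this sketch: short byte sequences such as 74 0F are likely to appear many times in a real executable, so a blind search-and-replace could damage unrelated code.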

Wednesday, August 5, 2009
