The function (also called a procedure or a subroutine) is the main structural unit of procedural and object-oriented languages. Therefore, disassembling code usually starts with identifying functions and the arguments passed to them.
Strictly speaking, the term "function" is not used in all programming languages. Even when it is, its definition varies from one language to another. Without going into detail, we'll take a function to be a separate routine that can be called from various parts of a program. A function can accept one or more arguments, or it can take none; it can return the result of its execution, or it can return nothing. This isn't important. The key property of the function is returning control to the place from which it was called, and its characteristic feature is that it can be called repeatedly from various parts of a program (although some functions are called from only one place).
How does a function know where it should return control? Obviously, the calling code should save a return address and pass it to the called function along with the other arguments. There are plenty of ways to solve this problem: We can, for example, place an instruction for an unconditional jump to the return address at the end of the function before calling it. We also could save the return address in a special variable, then, when the function's execution is complete, make an indirect jump using this variable as an operand of the JUMP instruction. Without going into a discussion of the strong and weak points of each method, I'd like to note that in most cases, compilers use the CALL and RET special machine instructions for calling functions and returning from them.
The CALL instruction pushes the address of the instruction following it on top of the stack, and RET pops it out from there and passes control to it. The address to which the CALL instruction points is the address of the function's beginning. The RET instruction ends the function. (However, be aware that not every RET designates a function's end! See the "Values Returned by Functions" section for more details on this issue.)
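The mechanics of CALL and RET can be sketched with a toy model in C. This is only an illustration under stated assumptions: the ToyCpu structure and the toy_init, toy_call, and toy_ret names are hypothetical, the stack holds just a few 32-bit words, and nothing else about the processor is modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of a 32-bit stack machine. All names here are hypothetical. */
enum { STACK_WORDS = 16 };

typedef struct {
    uint32_t eip;                 /* address of the next instruction */
    uint32_t stack[STACK_WORDS];  /* simulated stack memory */
    int      sp;                  /* index of the top of the stack;
                                     STACK_WORDS means the stack is empty */
} ToyCpu;

void toy_init(ToyCpu *cpu, uint32_t entry)
{
    cpu->eip = entry;
    cpu->sp = STACK_WORDS;
}

/* CALL: push the address of the instruction that follows the CALL
   (call_addr + call_len), then pass control to the target. */
void toy_call(ToyCpu *cpu, uint32_t call_addr, uint32_t call_len,
              uint32_t target)
{
    assert(cpu->sp > 0);                      /* room for one more word */
    cpu->stack[--cpu->sp] = call_addr + call_len;
    cpu->eip = target;
}

/* RET: pop the saved address and continue execution from it. */
void toy_ret(ToyCpu *cpu)
{
    assert(cpu->sp < STACK_WORDS);            /* something to pop */
    cpu->eip = cpu->stack[cpu->sp++];
}
```

For instance, a 5-byte CALL located at 0x401004 that targets 0x401019 leaves 0x401009 on the stack, and the matching RET resumes execution there.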
Thus, we can recognize a function in two ways: by cross-references that lead to the CALL machine instruction, or by its epilog, which ends with the RET instruction. The cross-references and the epilog allow us to determine the addresses of the function's beginning and end. Jumping a bit ahead (see the section "Local Stack Variables"), I'd like to note that at the beginning of many functions, there is a special sequence of instructions, called a prolog, which is also suitable for identifying functions.
Cross-references Looking up the disassembled code, let's find all CALL instructions; the contents of their operands are simply the required addresses of the function's beginning. The addresses of the nonvirtual functions called by name are calculated during compilation, and the operand of the CALL instruction in such cases represents an immediate value. Thanks to this, the address of the function's beginning can be discovered by simple analysis: Using a context search, we find all CALL substrings and remember (or write down) immediate operands.
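When no disassembler is at hand, the same search can be sketched directly over the raw bytes of the code section. The following is a minimal sketch, assuming 32-bit x86 code and only the near relative form of CALL (opcode E8 followed by a 32-bit displacement); the read_le32 and find_call_targets names are hypothetical. It is a naive byte scan, not a disassembler, so an E8 byte that happens to sit inside another instruction or in data produces a false candidate.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value from a byte buffer. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Collect the targets of near relative calls (E8 rel32) found in
 * code[0..size). base is the address of code[0]. At most max_targets
 * candidates are stored; the number stored is returned.
 * Assumption: every E8 byte is treated as a possible CALL, so the
 * results are candidates that still need to be reviewed.
 */
size_t find_call_targets(const uint8_t *code, size_t size, uint32_t base,
                         uint32_t *targets, size_t max_targets)
{
    size_t found = 0;
    assert(code != NULL && targets != NULL);
    for (size_t i = 0; i + 5 <= size && found < max_targets; i++) {
        if (code[i] != 0xE8)
            continue;
        /* target = address of the next instruction + displacement;
           unsigned arithmetic wraps, which handles backward calls */
        uint32_t target = base + (uint32_t)i + 5u + read_le32(code + i + 1);
        /* keep only targets that land inside the scanned block */
        if (target >= base && target - base < size)
            targets[found++] = target;
    }
    return found;
}
```

Note that the disassembler shows the absolute address of the function as the operand, whereas the machine code stores a displacement relative to the instruction that follows the CALL; the sketch performs that conversion itself.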
Let's consider the following example:
Listing 19: A Direct Function Call
func();
main()
{
int a;
func();
a=0x666;
func();
}
func()
{
int a;
a++;
}
The result of its compilation should look approximately like this:
Listing 20: The Disassembled Code for the Direct Function Call
.text:00401000 push ebp
.text:00401001 mov ebp, esp
.text:00401003 push ecx
.text:00401004 call 401019
.text:00401004 ; Here we've caught the CALL instruction
.text:00401004 ; in which the immediate operand
.text:00401004 ; is an address of the function's beginning or,
.text:00401004 ; to be more exact, its offset in the code segment
.text:00401004 ; (.text, in this case). Now we can jump to the
.text:00401004 ; .text:00401019 line, and, having named the function,
.text:00401004 ; replace the operand of the CALL instruction with
.text:00401004 ; the call offset Function_name construction.
.text:00401009 mov dword ptr [ebp-4], 666h
.text:00401010 call 401019
.text:00401010 ; Here is yet another function call! Having looked at
.text:00401010 ; the .text:401019 line, we see that this combination
.text:00401010 ; of instructions is already defined as a function,
.text:00401010 ; and all we have to do is to replace call 401019 with
.text:00401010 ; call offset Function_name.
.text:00401015 mov esp, ebp
.text:00401017 pop ebp
.text:00401018 retn
.text:00401018 ; Here we've encountered an instruction for returning from
.text:00401018 ; the function; however, this is not necessarily
.text:00401018 ; the function's end - a function can have several exits.
.text:00401018 ; However, next to this ret is the beginning of
.text:00401018 ; My function, identifiable by the operand of
.text:00401018 ; the call instruction. Since the functions can't overlap,
.text:00401018 ; it seems likely that this ret is the function's end!
.text:00401019 push ebp
.text:00401019 ; The operands of several call instructions
.text:00401019 ; have references to this line.
.text:00401019 ; Hence, this is an address of a function's beginning.
.text:00401019 ; Every function should have its own name.
.text:00401019 ; How should we name it? Let's name it My function. :-)
.text:0040101A mov ebp, esp
.text:0040101C push ecx
.text:0040101D mov eax, [ebp-4]
.text:00401020 add eax, 1 ; This is the body of My function.
.text:00401023 mov [ebp-4], eax
.text:00401026 mov esp, ebp
.text:00401028 pop ebp
.text:00401029 retn
.text:00401029 ; This is the end of My function.
As you can see, everything is pretty simple. However, the task becomes much more complicated if the programmer (or the compiler) uses indirect function calls, passing their addresses via a register and dynamically calculating the address while executing the program. In particular, calls of virtual functions are implemented in just this way. (See the section "Virtual Functions.") In any case, the compiler has to store the function address somewhere in the code, which allows us to find and calculate it. It's even easier to load the analyzed application in the debugger, set a breakpoint on the CALL instruction "under investigation", and, after waiting for the debugger to pop up, see to which address it will pass control.
Let's consider the following example:
Listing 21: Calling a Function Using a Pointer
func();
main()
{
int (*a) ();
a=func;
a();
}
Generally, the result of its compilation should be this:
Listing 22: The Disassembled Code for Calling a Function Using a Pointer
.text:00401000 push ebp
.text:00401001 mov ebp, esp
.text:00401003 push ecx
.text:00401004 mov dword ptr [ebp-4], 401012
.text:0040100B call dword ptr [ebp-4]
.text:0040100B ; Here is the CALL instruction that implements
.text:0040100B ; an indirect call of the function
.text:0040100B ; at the address contained in the [EBP-4] cell.
.text:0040100B ; How can we know what is contained there?
.text:0040100B ; Let's scroll the disassembler screen up a little
.text:0040100B ; until we encounter the mov dword ptr [ebp-4], 401012 line.
.text:0040100B ; Aha! Then control is passed to the .text:401012 address.
.text:0040100B ; This is exactly the address of the function's beginning.
.text:0040100B ; Let's name the function and replace
.text:0040100B ; mov dword ptr [ebp-4], 401012 with
.text:0040100B ; mov dword ptr [ebp-4], offset Function_name.
.text:0040100E mov esp, ebp
.text:00401010 pop ebp
.text:00401011 retn
Some quite rare programs use indirect calls of functions that involve a complex calculation of their addresses. Let's consider the following example:
Listing 23: Calling a Function Using a Pointer and a Complex Calculation of the Target Address
func_1();
func_2();
func_3();
main ()
{
int x;
int a[3]={(int) func_1, (int) func_2, (int) func_3};
int (*f) ();
for (x=0; x<3; x++)
{
f=(int (*) ()) a[x];
f();
}
}
Generally, the result of disassembling should look like this:
Listing 24: The Disassembled Code for Calling a Function Using a Pointer and a Complex Calculation of the Target Address
.text:00401000 push ebp
.text:00401001 mov ebp, esp
.text:00401003 sub esp, 14h
.text:00401006 mov [ebp-0xC], offset sub_401046
.text:0040100D mov [ebp-0x8], offset sub_401058
.text:00401014 mov [ebp-0x4], offset sub_40106A
.text:0040101B mov [ebp-0x14], 0
.text:00401022 jmp short loc_40102D
.text:00401024 mov eax, [ebp-0x14]
.text:00401027 add eax, 1
.text:0040102A mov [ebp-0x14], eax
.text:0040102D cmp [ebp-0x14], 3
.text:00401031 jge short loc_401042
.text:00401033 mov ecx, [ebp-0x14]
.text:00401036 mov edx, [ebp+ecx*4-0xC]
.text:0040103A mov [ebp-0x10], edx
.text:0040103D call [ebp-0x10]
.text:0040103D ; This is the indirect function call. And what's
.text:0040103D ; in [EBP-0x10]? Having looked at the previous line,
.text:0040103D ; we see that we have the EDX value in [EBP-0x10].
.text:0040103D ; And what is the EDX value? Scrolling up for one line,
.text:0040103D ; we see that EDX is the same as the contents of
.text:0040103D ; the [EBP+ECX*4-0xC] location.
.text:0040103D ; What a mess! Besides the fact that we have
.text:0040103D ; to learn the contents of this cell, we also have
.text:0040103D ; to calculate its address! What is ECX equal to?
.text:0040103D ; The contents of [EBP-0x14], it seems.
.text:0040103D ; And what is the value of [EBP-0x14]?
.text:0040103D ; "Just a moment," we murmur, scrolling up the
.text:0040103D ; disassembler window. Got it! In line 0x40102A,
.text:0040103D ; EAX's contents are loaded into it.
.text:0040103D ; It's certainly possible to waste a lot of time
.text:0040103D ; and effort reconstructing the entire key algorithm
.text:0040103D ; (especially now that we've come to the end of the analysis),
.text:0040103D ; but are there any guarantees that there will be no
.text:0040103D ; mistakes? It's much faster and more reliable to load
.text:0040103D ; the program being investigated into the debugger, set
.text:0040103D ; a breakpoint on line .text:0040103D, and then wait
.text:0040103D ; until the debugger window pops up and see what is there
.text:0040103D ; in the [EBP-0x10] cell. The debugger will pop up three
.text:0040103D ; times, and it will show a new address each time! Bear in
.text:0040103D ; mind that you will only notice this in the disassembler after
.text:0040103D ; you have completed the entire reconstruction of the
.text:0040103D ; algorithm. However, you shouldn't cherish any illusions
.text:0040103D ; about the power of the debugger. A program can
.text:0040103D ; call the same function one thousand times, and can
.text:0040103D ; call a different function the one thousand first time!
.text:0040103D ; The debugger cannot reveal this. The fact is that
.text:0040103D ; such a function call can occur at any moment - when
.text:0040103D ; a certain combination of the current time, the phase of the
.text:0040103D ; moon, and the data processed by the program occurs, for
.text:0040103D ; example. Certainly, we aren't going to run the program
.text:0040103D ; under the debugger for ages. The disassembler is
.text:0040103D ; quite another matter. A complete reconstruction
.text:0040103D ; of the algorithm will allow us to unequivocally
.text:0040103D ; and reliably trace all addresses of indirect calls.
.text:0040103D ; That's why the disassembler and the debugger
.text:0040103D ; should be galloping in one harness.
.text:00401040 jmp short loc_401024
.text:00401042
.text:00401042 mov esp, ebp
.text:00401044 pop ebp
.text:00401045 retn
The most difficult cases are "manual" function calls that use a JMP instruction preceded by pushing the return address on the stack. In general, a call using JMP looks like this: PUSH ret_addr/JMP func_addr, where ret_addr and func_addr are a direct or an indirect return address and the function's beginning address, respectively. (By the way, keep in mind that the PUSH and JMP instructions don't always follow one after the other; occasionally, they are separated by other instructions.)
You might ask: What is so bad about CALL, and why do we use JMP at all? The function called by the CALL instruction always passes control to the instruction next to CALL after returning control to the parent function. In some cases (for example, when performing structured exception handling), the execution returns from the function and continues from a completely different branch of the program, rather than from the instruction next to CALL. If this is the case, we have to manually specify the required return address and call the child function using JMP.
It can be very difficult to identify such functions (especially if they have no prolog); a context search gives no result because the body of any program contains plenty of JMP instructions used for near jumps. How, then, can we analyze all of them? If we don't identify the functions, two of them will drop out of sight — the called function and the function to which control is passed just upon returning. Unfortunately, there is no quick and easy solution to this problem; the only hook here is that the calling JMP practically always goes beyond the boundaries of the function in whose body it's located. We can determine the boundaries of a function by using an epilog.
Let's consider the following example:
Listing 25: A "Manual" Function Call Using JMP
funct();
main()
{
__asm
{
LEA ESI, return_addr
PUSH ESI
JMP funct
return_addr:
}
}
Generally, the result of its compilation should look like this:
Listing 26: The Disassembled Code for a "Manual" Function Call Using JMP
.text:00401000 push ebp
.text:00401001 mov ebp, esp
.text:00401003 push ebx
.text:00401004 push esi
.text:00401005 push edi
.text:00401006 lea esi, [401012h]
.text:0040100C push esi
.text:0040100D jmp 401017
.text:0040100D ; This would seem to be a simple branch -
.text:0040100D ; what could possibly be unusual in it? However, it's not
.text:0040100D ; a simple branch, but a masked function call. How
.text:0040100D ; do we know this? Let's go to address 0x401017 and see.
.text:0040100D ; .text:00401017 push ebp
.text:0040100D ; .text:00401018 mov ebp, esp
.text:0040100D ; .text:0040101A pop ebp
.text:0040100D ; .text:0040101B retn
.text:0040100D ; What do you think - where does this ret return control?
.text:0040100D ; Naturally, to the address that lies on the top of the
.text:0040100D ; stack. And what do we have there? PUSH EBP from line
.text:0040100D ; 401017 is popped back by the POP EBP from line 40101A.
.text:0040100D ; Well... let's return to the JMP instruction
.text:0040100D ; and begin slowly scrolling the disassembler window up,
.text:0040100D ; tracing all accesses to the stack. Here it is!
.text:0040100D ; The PUSH ESI instruction from line 40100C throws the
.text:0040100D ; contents of the ESI register onto the top of the stack,
.text:0040100D ; and the register holds the value 0x401012,
.text:0040100D ; which is simply the address to which the function
.text:0040100D ; called by the JMP instruction will return.
.text:0040100D ; (To be more exact, it's not an address but an offset.
.text:0040100D ; But this isn't of great importance.)
.text:00401012 pop edi
.text:00401013 pop esi
.text:00401014 pop ebx
.text:00401015 pop ebp
.text:00401016 retn
Automatically identifying functions using IDA Pro The IDA Pro disassembler is capable of analyzing operands of the CALL instructions, which allows it to divide the program into functions automatically. Besides, IDA copes quite successfully with most indirect calls. However, it can't yet master complex calls and manual function calls that use the JMP instruction. This shouldn't cause much distress, since constructions of this kind are extremely rare; they make up less than one percent of "normal" function calls, which are easily recognized by IDA.
Prolog Most nonoptimizing compilers place the following code, called a prolog, at the beginning of the function.
Listing 27: The Generalized Code of a Function Prolog
push ebp
mov ebp, esp
sub esp, xx
Generally, the purpose of a prolog comes down to the following: if the EBP register is used for addressing local variables (as is often the case), it must be saved in the stack before using it. (Otherwise, the called function will make the parent function "go crazy".) Then, the current value of the stack pointer register (ESP) is copied into EBP — the stack frame is opened, and the ESP value is decreased by the size of the memory block allocated for local variables.
The PUSH EBP/MOV EBP, ESP/SUB ESP, xx sequence can be used to find all functions in the file being investigated, including those that have no direct references to them. In particular, IDA Pro uses this technique. However, optimizing compilers know how to address local variables through the ESP register and use EBP like any other general-purpose register. The prolog of such optimized functions consists of only one SUB ESP, xxx instruction. Unfortunately, this sequence is too short to be used as a signature of the function. A more detailed story about function prologs is ahead (see the "Local Stack Variables" section), so to avoid unnecessary repetition, I don't cover this topic in much detail here.
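A prolog search of this kind can be sketched as a byte-signature scan. The sketch below assumes 32-bit x86 code and looks only for PUSH EBP (55) followed by MOV EBP, ESP in either of its two encodings (8B EC or 89 E5); the is_frame_prolog and find_prologs names are hypothetical, and nothing here claims to reproduce what IDA Pro actually does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if code[pos..] starts with PUSH EBP / MOV EBP, ESP. */
int is_frame_prolog(const uint8_t *code, size_t size, size_t pos)
{
    assert(code != NULL);
    if (pos + 3 > size || code[pos] != 0x55)       /* 55 = push ebp */
        return 0;
    /* MOV EBP, ESP has two equivalent encodings */
    return (code[pos + 1] == 0x8B && code[pos + 2] == 0xEC) ||
           (code[pos + 1] == 0x89 && code[pos + 2] == 0xE5);
}

/*
 * Store the offsets of all prolog candidates in code[0..size).
 * At most max_offsets are stored; the number stored is returned.
 * Assumption: a match is only a candidate, because the same three
 * bytes can also occur inside data or in the middle of an instruction.
 */
size_t find_prologs(const uint8_t *code, size_t size,
                    size_t *offsets, size_t max_offsets)
{
    size_t found = 0;
    assert(offsets != NULL);
    for (size_t i = 0; i < size && found < max_offsets; i++)
        if (is_frame_prolog(code, size, i))
            offsets[found++] = i;
    return found;
}
```

The three-byte signature is already short; dropping it to a lone SUB ESP, xxx, as optimized code does, would make the number of false candidates unmanageable, which is the limitation described above.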
Epilog At the end of its "life", the function closes the stack frame, moving the stack pointer downward, and then restores the former value of EBP (only if the optimizing compiler didn't address local variables via ESP using EBP as a general-purpose register). The function's epilog can work in one of two ways: Either ESP is increased by the proper value using the ADD instruction, or the EBP value that points to the bottom of the stack frame is copied into it.
Listing 28: The Generalized Code of a Function Epilog
Epilog 1
pop ebp
add esp, 64h
retn
Epilog 2
mov esp, ebp
pop ebp
retn
Note The POP EBP/ADD ESP, xxx and MOV ESP, EBP/POP EBP instructions needn't follow one after another; they can be separated by other instructions. Therefore, a context search is unsuitable for finding epilogs — we will have to use a search on a mask.
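A search on a mask can be sketched as an ordinary byte search in which some positions of the pattern are marked as "any byte". This is a minimal sketch; the mask_search name and its return convention are hypothetical. It covers only gaps of a fixed length (such as the varying operand of ADD ESP, xxx) and would need several patterns, or a real disassembler, to cope with an arbitrary number of intervening instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Find the first position at or after `from` where code matches
 * pattern[0..len). A position i of the pattern is compared only if
 * mask[i] is nonzero; mask[i] == 0 means "any byte".
 * Returns the offset of the match, or `size` if there is none.
 */
size_t mask_search(const uint8_t *code, size_t size, size_t from,
                   const uint8_t *pattern, const uint8_t *mask, size_t len)
{
    assert(code != NULL && pattern != NULL && mask != NULL && len > 0);
    for (size_t pos = from; pos + len <= size; pos++) {
        size_t i = 0;
        while (i < len && (mask[i] == 0 || code[pos + i] == pattern[i]))
            i++;
        if (i == len)
            return pos;
    }
    return size;
}
```

For example, the pattern 5D 83 C4 ?? C3 (POP EBP / ADD ESP, imm8 / RETN, with the immediate byte masked out) matches the first epilog form regardless of how many bytes of local variables the function allocated.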
If the function was written taking into account the Pascal convention, it should clear the stack of arguments itself. In the overwhelming majority of cases, this is done by the RET n instruction, where n is the number of bytes popped from the stack upon returning. Functions obeying the C convention leave clearing the stack to the code that called them and always terminate with the RET instruction. Windows API functions obey a combination of the C and Pascal conventions — arguments are placed on the stack from right to left, but clearing the stack is done by the function itself. (See the "Function Arguments" section for more details on this.)
Thus, RET can be a sufficient indication of the function's epilog, but not every epilog is the function's end. If the function has several return statements in its body (as is often the case), the compiler usually generates an epilog for each of them. Check whether there's a new prolog after the end of the epilog, or whether the code of the old function continues. Don't forget that compilers usually don't place code that never receives control into an executable file. In other words, if a return is unconditional, the function will have only one epilog, and everything after the first RET will be thrown out as unnecessary.
Listing 29: Eliminating Code after the Unconditional RET Operators
int func(int a)             push ebp
{                           mov ebp, esp
                            mov eax, [ebp+arg_0]
    return a++;             mov ecx, [ebp+arg_0]
    a=1/a;                  add ecx, 1
    return a;               mov [ebp+arg_0], ecx
                            pop ebp
}                           retn
On the other hand, if an unplanned exit from the function occurs when some condition becomes true, such a RET will be preserved by the compiler and "embraced" with a branch jumping over the epilog.
Listing 30: A Function with Several Epilogs
int func(int a)
{
if (!a) return a++;
return 1/a;
}
Listing 31: The Disassembled Code for the Compiled Function with Several Epilogs
push ebp
mov ebp, esp
cmp [ebp+arg_0], 0
jnz short loc_0_401017
mov eax, [ebp+arg_0]
mov ecx, [ebp+arg_0]
add ecx, 1
mov [ebp+arg_0], ecx
pop ebp
retn
; Yes, this is obviously the function's epilog;
; but it's followed by code continuing the function,
; not a new prolog at all!
loc_0_401017: ; CODE XREF: sub_0_401000+7↑j
; This cross-reference that leads us to a branch indicates that this code
; is a continuation of the former function, and not the beginning
; of a new one, since "normal" functions are called using not JMP,
; but CALL! And what if it's an "abnormal" function? Well, that's
; easy to check. Just figure out whether the return address is
; on the top of the stack or not. It isn't there;
; hence, our assumption about the function code's continuation
; is true.
mov eax, 1
cdq
idiv [ebp+arg_0]
loc_0_401020: ; CODE XREF: sub_0_401000+15↑j
pop ebp
retn
A special remark Starting with the 80186 processor, two instructions, ENTER and LEAVE, have appeared in the instruction set. They are specifically intended for opening and closing the stack frame. However, they are practically never used by present-day compilers. Why? The reason is that ENTER and LEAVE are very sluggish, much slower than PUSH EBP/MOV EBP, ESP/SUB ESP, xxx and MOV ESP, EBP/POP EBP. So, on a Pentium, ENTER takes ten clock cycles to execute, whereas the equivalent sequence normally takes only seven cycles. Similarly, LEAVE requires five clock cycles, although the same operation can be executed in two cycles (or even fewer if you separate MOV ESP, EBP/POP EBP with an instruction). Therefore, the contemporary reader will likely never come across either ENTER or LEAVE. However, it shouldn't be too much trouble to remember their purpose. (You suddenly may want to disassemble ancient programs or programs written in assembler. It's no secret that many of those who write in assembler don't know the subtleties of the processor's operation very well, and their manual optimization is appreciably worse than the compiler's when it comes to performance.)
Naked functions The Microsoft Visual C++ compiler supports the non-standard attribute naked, which allows programmers to create functions without a prolog and an epilog. Yes, without a prolog and an epilog at all! The compiler doesn't even place RET at the end of the function; you have to do it manually using an assembly insert __asm{ret}. (Using return alone doesn't produce the desired result.)
Generally, support for naked functions was planned only for writing drivers in pure C (well, almost pure — with small assembly inserts), but it has gained unexpected recognition among developers of protection mechanisms, too. It's really pleasant to have the possibility of manually creating functions without worrying that the compiler will cripple them in an unpredictable way.
For us code diggers, this means that one or more functions that contain neither a prolog nor an epilog can occur in the program. Well, what's so scary about this? Optimizing compilers also throw out the prolog and leave only the RET from the epilog, but functions are easily identified by their CALL instructions.
Identifying inline functions The most effective way to avoid the overhead inflicted by calling functions is to not call them. Really, why can't we insert the function's code directly in the calling function? Certainly, this will appreciably increase the size (especially as the function is called from more places), but it will also considerably increase the program's performance (if the inline function is called frequently).
Do inline functions hinder the analysis of a program? Well, inlining increases the size of the parent function and makes its code less readable. Instead of CALL/TEST EAX, EAX/JZ xxx containing an evident branch, we see a heap of instructions reminiscent of nothing and whose operation logic has yet to be figured out.
Recall that we already came across such a technique when analyzing crackme02:
Listing 32: The Disassembled Code for an Inline Function
mov ebp, ds:SendMessageA
push esi
push edi
mov edi, ecx
push eax
push 666h
mov ecx, [edi+80h]
push 0Dh
push ecx
call ebp ; SendMessageA
lea esi, [esp+678h+var_668]
mov eax, offset aMygoodpassword ; "MyGoodPassword"
loc_0_4013F0: ; CODE XREF: sub_0_4013C0+52↑j
mov dl, [eax]
mov bl, [esi]
mov cl, dl
cmp dl, bl
jnz short loc_0_401418
test cl, cl
jz short loc_0_401414
mov dl, [eax+1]
mov bl, [esi+1]
mov cl, dl
cmp dl, bl
jnz short loc_0_401418
add eax, 2
add esi, 2
test cl, cl
jnz short loc_0_4013F0
loc_0_401414: ; CODE XREF: sub_0_4013C0+3C↑j
xor eax, eax
jmp short loc_0_40141D
loc_0_401418: ; CODE XREF: sub_0_4013C0+38↑j
sbb eax, eax
sbb eax, 0FFFFFFFFh
loc_0_40141D: ; CODE XREF: sub_0_4013C0+56↑j
test eax, eax
push 0
push 0
jz short loc_0_401460
To summarize, inline functions have neither their own prolog nor epilog. Their code and local variables (if any) are completely inlined in a calling function, so the resulting compilation looks exactly as if there were no function call at all. The only catch is that inlining the function inevitably results in duplicating its code in all places where it's used. This can be revealed (although with difficulty), because the inline function, when it becomes a part of a calling function, undergoes optimization in the context of the parent function, which results in significant variations in the code. Let's consider this example:
Listing 33: The Pass-Through Optimization of an Inline Function
#include <stdio.h>
__inline int max(int a, int b)
{
if (a>b) return a;
return b;
}
int main(int argc, char **argv)
{
printf("%x\n", max(0x666, 0x777));
printf("%x\n", max(0x666, argc));
printf("%x\n", max(0x666, argc));
return 0;
}
Generally, the result of its compilation will look like this:
Listing 34: The Disassembled Code for the Pass-Through Optimization of an Inline Function
push esi
push edi
push 777h ; This is the code of the first call of max.
; The compiler has already calculated the value
; that the max function returns and inserted it in the program,
; thus getting rid of an extra function call.
push offset aProc ; "%x\n"
call printf
mov esi, [esp+8+arg_0]
add esp, 8
cmp esi, 666h ; The code of the second call of max
mov edi, 666h ; The code of the second call of max
jl short loc_0_401027 ; The code of the second call of max
mov edi, esi ; The code of the second call of max
loc_0_401027: ; CODE XREF: sub_0_401000+23↑j
push edi
push offset aProc ; "%x\n"
call printf
add esp, 8
cmp esi, 666h ; The code of the third call of max
jge short loc_0_401042 ; The code of the third call of max
mov esi, 666h ; The code of the third call of max
; See how the code of the function has changed! First, the sequence
; of executing instructions has changed: It was CMP -> MOV -> Jx, and
; has become CMP -> Jx-> MOV. Second, the JL branch has mysteriously
; turned into JGE! However, there's nothing really mysterious in this -
; it's just that a pass-through optimization has occurred! Since after
; the third call of the max function, the argc variable that the compiler
; placed in the ESI register isn't used anymore. The compiler can use
; the possibility of directly modifying this register instead of using
; a temporary variable and allocating the EDI register to it.
; (See the "Register and Temporary Variables" section.)
loc_0_401042: ; CODE XREF: sub_0_401000+3B↑j
push esi
push offset aProc ; "%x\n"
call printf
add esp, 8
mov eax, edi
pop edi
pop esi
retn
When calling the function for the first time, the compiler throws out all of its code, having calculated the result of its operation at compile time. (Really, 0x777 is always greater than 0x666; there's no need to waste processor time comparing them.) The second call is only slightly similar to the third one, although the same arguments were passed to both functions. Here, a mask search will not help (nor a context search). Even a skilled expert will fail to understand whether the same function is called or not!
Memory models and 16-bit compilers Up until now, the address of a function was understood as its offset in a code segment. The flat memory model of 32-bit Windows NT/9x packs all three segments — the code segment, the stack segment, and the data segment — in a uniform 4-gigabyte address space, allowing us to forget about the existence of segments in general.
The 16-bit applications for MS-DOS and Windows 3.x are another matter. In these, the maximum segment size is only 64 KB, which is not enough for most applications. In tiny memory models, the code, stack, and data segments are also located in the single address space. In contrast to the flat model, this address space is extremely limited in size, and serious applications should be stuffed in several different segments.
In this case, it's not enough to know the function's offset to call a function — you also must specify the segment in which it's located. However, today we can forget about this old-fashioned rule with a clear conscience. In view of the forthcoming 64-bit Windows version, describing 16-bit code in detail would simply be ridiculous.
Order of translating functions. Most compilers allocate functions in an executable file in the order they were declared in the program.