The new and delete Operators
The compiler translates the new and delete operators into calls of library functions that can be precisely recognized like ordinary library functions. (See the "Library Functions" section.) IDA Pro, in particular, can recognize library functions automatically, removing this concern from the user's mind. However, not everyone has IDA Pro. In addition, it doesn't know all library functions and doesn't always recognize new and delete among those it knows. Thus, there are lots of reasons for identifying these functions manually.
The new and delete operators can have any implementation, but Windows compilers rarely implement their own heap management functions; there's no need for them. It's easier to use the operating system services. However, it's naive to expect a call to HeapAlloc in place of new, or HeapFree in place of delete. The compiler isn't that simple! The new operator is translated into the new function, which calls malloc for memory allocation; malloc, in turn, calls heap_alloc (or a similar function, depending on the implementation of the memory management library). That function acts as a "wrapper" for the similarly named Win32 API procedure, HeapAlloc. Releasing memory is performed in a similar way.
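The layering described above can be sketched in portable C++. This is only an illustration, not any vendor's runtime: sketch_malloc and sketch_heap_alloc are hypothetical stand-ins for the library's malloc and heap_alloc layers, std::malloc stands in for the operating system call (HeapAlloc on Windows), and a class-specific operator new is used so the chain can be observed without touching the global allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

static std::string g_chain;  // hypothetical log of the layers that were entered

// Stand-in for the lowest library layer. In a Windows runtime this is
// roughly where the HeapAlloc call would sit; std::malloc is an assumption.
void* sketch_heap_alloc(std::size_t bytes) {
    g_chain += "heap_alloc";
    return std::malloc(bytes);
}

// Stand-in for the library's malloc, which only forwards the request.
void* sketch_malloc(std::size_t bytes) {
    g_chain += "malloc>";
    return sketch_heap_alloc(bytes);
}

struct Widget {
    int value;

    // The operator itself does no heap work; it delegates downward.
    static void* operator new(std::size_t bytes) {
        g_chain += "new>";
        void* p = sketch_malloc(bytes);
        if (!p) throw std::bad_alloc();
        return p;
    }
    static void operator delete(void* p) { std::free(p); }
};

// Allocates and releases one Widget and returns the recorded call chain.
std::string allocation_chain() {
    g_chain.clear();
    Widget* w = new Widget;
    assert(w != nullptr);
    delete w;
    return g_chain;
}
```

Calling allocation_chain() yields the string "new>malloc>heap_alloc", which mirrors the nesting a disassembler would have to descend through to reach the system call.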
It's too tedious to go deep into the jungle of nested calls. But it's possible to identify new and delete in a less laborious way. Let's recall all we know about the new operator:
The new operator takes only one argument — the number of memory bytes to be allocated. This argument is calculated at compile time in the majority of cases, making it a constant.
If the object contains neither data nor virtual functions, its size is one byte (the minimum memory allocation, giving this something to point to). Therefore, a lot of calls will be of the PUSH 01\CALL xxx type, where xxx is simply the address of new. The typical size of objects is less than 100 bytes. So look for a frequently called function that takes a constant smaller than 100 as its argument.
The new function is one of the most popular library functions, so look for a function that has a "crowd" of cross-references.
Characteristically, new returns the this pointer, and this is easily identified, even when you are glancing over the code. (See "The this Pointer" section.)
The result returned by new is always checked for equality to zero. If it equals zero, the constructor (if there is one — see the "Constructors and Destructors" section) isn't called.
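Several of these signs can be checked from the source side. The following is a minimal sketch, assuming a mainstream compiler; Empty, Probe, and g_last_request are hypothetical names introduced only for illustration. The nothrow form of new is used because in standard C++ the plain form throws on failure rather than returning zero, so the nothrow form is the one where the null check is meaningful.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

struct Empty {};  // neither data nor virtual functions

// On mainstream compilers an empty object still occupies one byte,
// which is why "new Empty" tends to show up as PUSH 01 / CALL xxx.
static_assert(sizeof(Empty) == 1, "empty class expected to occupy one byte");

static std::size_t g_last_request = 0;  // hypothetical probe variable

struct Probe {
    char payload[24];

    // The operator receives exactly one argument: the byte count,
    // which the compiler computes as a constant (sizeof(Probe)).
    static void* operator new(std::size_t bytes) {
        g_last_request = bytes;
        return ::operator new(bytes);
    }
    static void operator delete(void* p) { ::operator delete(p); }
};

// Returns the constant the compiler passed to operator new for a Probe.
std::size_t bytes_requested_for_probe() {
    Probe* p = new Probe;
    delete p;
    return g_last_request;
}

// The source-level counterpart of the "test eax, eax / jz" pattern.
bool nothrow_new_is_checked() {
    int* x = new (std::nothrow) int(5);
    if (x == nullptr) return false;  // allocation failed, nothing is constructed
    bool ok = (*x == 5);
    delete x;
    return ok;
}
```

Here bytes_requested_for_probe() returns 24, a small constant of the kind the list above tells you to look for.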
The new operator has more "birthmarks" than necessary to quickly and reliably identify it, so there's no need to waste time analyzing its code. Still, keep in mind that new is used not only to create new object instances, but also to allocate memory for arrays, structures, and, occasionally, single variables (of the sort int *x = new int; it's usually pretty stupid, but some people do it). Fortunately, it's simple to distinguish the creation of a new object instance from the allocation of memory — neither arrays, nor structures, nor single variables have the this pointer!
To sum up, let's consider a code fragment generated by the Watcom compiler (IDA Pro doesn't recognize its "native" new operator):
Listing 68: Identifying the new Operator
main_ proc near ; CODE XREF: __CMain+40↓p
push 10h
call __CHK
push ebx
push edx
mov eax, 4
call W?$nwn_ui_pnv
; This is, as we'll find out later, the new function.
; By the way, IDA has recognized it,
; but you'd have to be a psychic to recognize the memory
; allocation operator in this gobbledygook! For now, notice
; that the new function accepts one argument - a constant of a very
; small value. That is, it's certainly not an offset.
; (See the "Constants and Offsets" section.)
; Passing the argument via the register tells us nothing -
; Watcom treats many library functions in such a manner.
; Other compilers, on the other hand,
; always push an argument to the stack.
mov edx, eax
test eax, eax
; The result returned by the function is checked for null
; (this is typical for new).
jz short loc_41002A
mov dword ptr [eax], offset BASE_VTBL
; Aha! The function has returned a pointer to the location where
; the pointer to the virtual table (or at least to an array
; of functions) is written. EAX suspiciously resembles this,
; but to be sure of this, we need more signs.
loc_41002A: ; CODE XREF: main_+1A↑j
mov ebx, [edx]
mov eax, edx
call dword ptr [ebx]
; There is no longer any doubt that EAX is the this pointer,
; and this code is
; just the call of a virtual function!
; Hence, the W?$nwn_ui_pnv function is new
; (what else could have returned this?)
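For orientation, source code of roughly the following shape could produce a listing like the one above. This is a hypothetical reconstruction, not the program that was actually compiled: the class name Base is taken from the BASE_VTBL label, while demo, its return value, and the wrapper function are assumptions. On a 32-bit target such an object holds only the virtual table pointer, which would match the constant 4 loaded into EAX before the call.

```cpp
#include <cassert>

class Base {
public:
    // Declared first so that, on common ABIs, it occupies the first
    // virtual table slot (the "call dword ptr [ebx]" in the listing).
    virtual int demo() { return 1; }
    // Not visible in the listing; added so the object can be deleted safely.
    virtual ~Base() {}
};

// Allocates the object, calls its virtual function, and releases it.
int call_through_vtable() {
    Base* a = new Base;      // new receives sizeof(Base) as its only argument
    assert(a != nullptr);
    int result = a->demo();  // indirect call through the virtual table
    delete a;
    return result;
}
```

The function call_through_vtable() returns 1; the interesting part is the sequence it compiles to: a constant-size call to new, a store of the virtual table address through the returned pointer, and an indirect call through that table.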
It's more difficult to identify delete. This function has no specific features. It takes only one argument: a pointer to the memory area to be released. In most cases, the pointer is this. But the this pointer is accepted by dozens, if not hundreds, of other functions! There's one tiny distinction between delete and other functions: in most cases, delete receives the this pointer via the stack, whereas the other functions receive it via a register. Unfortunately, certain compilers (for example, Watcom) pass arguments via registers to many library functions, hiding all distinctions. What's more, delete returns nothing, but there are lots of functions that behave in the same way. The call of delete follows the call of the destructor (if there is one), but the destructor is itself identified as the function preceding delete. Now we're in a vicious circle!
All we can do is analyze the contents of the function - sooner or later, delete calls HeapFree. Other variants are possible; Borland C++, for example, contains libraries that work with the heap at the low level, so memory is released by calling VirtualFree. Fortunately, IDA Pro identifies delete in most cases, and you needn't strain yourself.
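The ordering mentioned above, destructor first and delete second, is easy to confirm from the source side. The sketch below is an illustration under assumptions: Tracked and g_trace are hypothetical names, and std::malloc/std::free stand in for whatever the runtime library eventually calls (HeapFree or VirtualFree in the cases described).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <string>

static std::string g_trace;  // hypothetical event log

struct Tracked {
    ~Tracked() { g_trace += "dtor;"; }

    static void* operator new(std::size_t bytes) {
        g_trace += "new;";
        void* p = std::malloc(bytes);
        if (!p) throw std::bad_alloc();
        return p;
    }

    // Receives a single argument, the pointer to release, and returns nothing.
    static void operator delete(void* p) {
        g_trace += "delete;";
        std::free(p);
    }
};

// Creates and destroys one object and returns the order of events.
std::string trace_lifetime() {
    g_trace.clear();
    Tracked* t = new Tracked;
    delete t;  // expands to: call the destructor, then call operator delete
    return g_trace;
}
```

trace_lifetime() returns "new;dtor;delete;". In a disassembly the same order appears as a call to the destructor immediately followed by a call that takes the same pointer as its only argument.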
Approaches to implementing the heap
In many programming manuals on C++ ("Advanced Windows" by Jeffrey Richter, for example), you will be urged to allocate memory using new, not malloc. This is because new employs the effective memory management tools of the operating system, whereas malloc uses its own (rather sluggish) heap manager. But this is a rather strained argument! The standard says nothing about heap implementation, and we don't know beforehand which function will appear more effective. Everything depends on the specific libraries of a specific compiler.
Let's consider memory management in the standard libraries of three popular compilers: Microsoft Visual C++, Borland C++, and Watcom C++.
In Microsoft Visual C++, malloc and new are both thunks to the same __nh_malloc function, so either can be used with equal success. __nh_malloc calls __heap_alloc, which, in turn, calls the Windows API function HeapAlloc. (It's worth noting that __heap_alloc can call its own heap manager if the system's manager is unavailable. In Microsoft Visual C++ 6.0, however, only a wrapper for the hook remains; the library's own heap manager was removed.)
Things are quite different in Borland C++. First, it works directly with the Windows virtual memory and implements its own heap manager based on the VirtualAlloc/VirtualFree functions. Testing shows that it performs poorly on Windows 2000. (I didn't test other systems.) It also places superfluous code in the program, increasing its size. In addition, new calls the malloc function, and not directly, but through several layers of "wrapping" code! Therefore, contrary to all guidelines, calling malloc in Borland C++ is more efficient than calling new.
Watcom C++ (its eleventh version, in any case — the latest one I could find) implements new and malloc in practically identical ways: Both of them refer to _nmalloc, a "thick" wrapper of LocalAlloc, the 16-bit Windows function that itself is a thunk to HeapAlloc!
Tuesday, September 22, 2009