Why is it difficult to identify the constructor? First, in most cases, the constructor is called automatically when a new instance of the object is created. This makes it the first function to be called — but only if it is called. The constructor is optional; it may be present in an object, or it may not. Therefore, the function called first isn't always a constructor!
Second, by looking at the description of the C++ language, we learn that the constructor doesn't return a value. This is unusual in regular functions, but this feature isn't unique and can't be used to reliably identify the constructor. What should we do then?
The language rules say that the constructor must not run if the memory allocation for the object fails. With compilers whose operator new reports failure by returning a null pointer (as in the listings below; a standard-conforming plain new throws std::bad_alloc instead, and only the nothrow form returns null), this requirement is implemented by placing a check for a null pointer before invoking the constructor. Control is passed to the constructor only if memory for the object has been allocated successfully.
In contrast, the other functions of the object are always called, even if the attempt to allocate memory was unsuccessful. To be precise, the program tries to call them. If a memory allocation error occurs, a null pointer is returned. This causes an exception (an access violation) as soon as one of these functions touches the object through the null this pointer. Control is then passed to the handler of the corresponding exception.
Thus, the function enclosed only by checks for a null pointer is a constructor. Theoretically, a similar check can be used when other functions are called, but I have not come across such functions yet.
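To make the null-pointer "fence" concrete, here is a minimal sketch of what a new-expression roughly expands to when the allocator reports failure with a null pointer. The names Widget, allocate, create_widget, and destroy_widget are hypothetical and exist only for this illustration; the fail parameter is an assumption that lets us exercise the failure path without exhausting memory.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical class used only for this illustration.
struct Widget {
    std::string* log;
    explicit Widget(std::string* l) : log(l) { *log += "ctor;"; }
};

// Hypothetical allocator hook: returns a null pointer when 'fail' is set.
void* allocate(std::size_t size, bool fail) {
    return fail ? nullptr : ::operator new(size, std::nothrow);
}

// Roughly what "new Widget(log)" amounts to: the constructor call
// is enclosed by a check for a null pointer.
Widget* create_widget(std::string* log, bool fail) {
    assert(log != nullptr);
    void* raw = allocate(sizeof(Widget), fail);
    if (raw == nullptr) {
        return nullptr;            // allocation failed: constructor is skipped
    }
    return new (raw) Widget(log);  // placement new runs only the constructor
}

// Counterpart of "delete w": destructor, then release of the memory.
void destroy_widget(Widget* w) {
    if (w != nullptr) {
        w->~Widget();
        ::operator delete(w);
    }
}
```

Calling create_widget(&log, true) returns a null pointer and leaves the log empty, which is the behavior the check in the disassembly protects.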
The destructor, like the constructor, is optional; the function of the object that is called last is not necessarily a destructor. Nevertheless, it's simple to distinguish a destructor from any other function: it's called only if memory has been successfully allocated and the object has been created. This is a documented property of the language; it must be implemented by all compilers. Just as with the constructor, a "ring" of null-pointer checks is placed in the code, but no confusion arises because the constructor is called first, and the destructor last.
An object consisting entirely of one constructor or one destructor is a special case. How can we figure out what we're dealing with? The call of a constructor is practically always followed by code that resets the this pointer to zero if memory allocation was unsuccessful; there's nothing of the kind for a destructor. What's more, the destructor is rarely called directly from the parent procedure. Instead, the destructor is called from a wrapper function, along with the delete operator that releases the memory occupied by the object. So, it's quite possible to distinguish a constructor from a destructor!
To better understand these distinctions, let's consider the following example:
Listing 52: An Example of a Constructor and a Destructor
#include <stdio.h>

class MyClass {
public:
    MyClass(void);
    void demo(void);
    ~MyClass(void);
};

MyClass::MyClass()
{
    printf("Constructor\n");
}

MyClass::~MyClass()
{
    printf("Destructor\n");
}

void MyClass::demo(void)
{
    printf("MyClass\n");
}

int main()
{
    MyClass *zzz = new MyClass;  // the object is created on the heap
    zzz->demo();
    delete zzz;                  // the destructor runs, then memory is released
}
In general, the disassembled code of the compiled version of this example should look like this:
Listing 53: The Disassembled Code for a Constructor and a Destructor
Constructor proc near ; CODE XREF: main+11↓p
; This is a constructor function. We can make sure that this is
; a constructor by looking at an implementation of its call.
; (See the main function below.)
push esi
mov esi, ecx
push offset aConstructor ; "Constructor\n"
call printf
add esp, 4
mov eax, esi
pop esi
retn
Constructor endp
Destructor proc near ; CODE XREF: __destructor+6↓p
; This is a destructor function. We can make sure that this is
; a destructor by looking at the implementation of its call.
; (See the main function below.)
push offset aDestructor ; "Destructor\n"
call printf
pop ecx
retn
Destructor endp
demo proc near ; CODE XREF: main+1E↓p
; This is an ordinary demo function.
push offset aMyclass ; "MyClass\n"
call printf
pop ecx
retn
demo endp
main proc near ; CODE XREF: start+AF↓p
push esi
push 1
call ??2@YAPAXI@Z ; operator new(uint)
add esp, 4
; Memory is allocated for a new object,
; or, rather, an attempt is made to do so.
test eax, eax
jz short loc_0_40105A
; A check for whether the allocation of memory for the object
; is successful. Pay attention to the jump destination.
; The destination is XOR ESI, ESI, which resets the pointer to the object.
; Attempting to use the null pointer causes an exception
; to be thrown, but the constructor should not throw an exception,
; even though allocating memory for the object is unsuccessful.
; Therefore, the constructor gets control
; only if the memory allocation is a success.
; Hence, the function preceding XOR ESI, ESI is just a constructor!
mov ecx, eax
; The this pointer is prepared.
call Constructor
; This function is a constructor, since it's called
; only if the memory allocation is a success.
mov esi, eax
jmp short loc_0_40105C
loc_0_40105A: ; CODE XREF: main+D↓j
xor esi, esi
; The pointer to the object is reset
; to cause an exception when attempting to use the pointer.
; Attention: The constructor never throws an exception,
; therefore, the function below definitely isn't a constructor.
loc_0_40105C: ; CODE XREF: main+18↑j
mov ecx, esi
; The this pointer is prepared.
call demo
; An ordinary function of the object is called.
test esi, esi
jz short loc_0_401070
; Checking the this pointer for NULL. The destructor is called
; only if memory for the object has been allocated.
; (If not, we likely have nothing to release.)
; Thus, the following function is a destructor and nothing else.
push 1
; The number of bytes to release. (This is necessary for delete.)
mov ecx, esi
; Preparing the this pointer.
call __destructor
; The destructor is called.
loc_0_401070: ; CODE XREF: main+25↑j
pop esi
retn
main endp
__destructor proc near ; CODE XREF: main+2B↑p
; This is a destructor function. Notice that the destructor
; is usually called from the same function as delete.
; (This is not always the case.)
arg_0 = byte ptr 8
push ebp
mov ebp, esp
push esi
mov esi, ecx
call Destructor
; A user-defined destructor function is called.
test [ebp+arg_0], 1
jz short loc_0_40109A
push esi
call ??3@YAXPAX@Z ; operator delete(void *)
add esp, 4
; Memory is released, previously allocated for the object.
loc_0_40109A: ; CODE XREF: __destructor+F↑j
mov eax, esi
pop esi
pop ebp
retn 4
__destructor endp
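The __destructor wrapper in the listing above can be written back in C++ roughly as follows. This is only a sketch under stated assumptions: deleting_destructor and delete_object are hypothetical names, MyClass writes to a log instead of calling printf, and the released out-parameter is added purely so that the effect can be observed.

```cpp
#include <cassert>
#include <new>
#include <string>

// Stand-in for MyClass from Listing 52 (logs instead of printing).
struct MyClass {
    std::string* log;
    explicit MyClass(std::string* l) : log(l) {}
    ~MyClass() { *log += "Destructor;"; }
};

// Hypothetical equivalent of the __destructor wrapper: run the
// user-defined destructor, then free memory only if bit 0 of flags is set.
void deleting_destructor(MyClass* self, unsigned flags, bool* released) {
    assert(self != nullptr && released != nullptr);
    self->~MyClass();               // call Destructor
    if (flags & 1u) {               // test [ebp+arg_0], 1
        ::operator delete(self);    // call operator delete(void *)
        *released = true;
    }
}

// What "delete p" amounts to at the call site in main:
// a null check around the call of the wrapper.
bool delete_object(MyClass* p) {
    bool released = false;
    if (p != nullptr) {                         // test esi, esi / jz
        deleting_destructor(p, 1u, &released);  // push 1 / call __destructor
    }
    return released;
}
```

The flag argument explains the "push 1" before the call: the same wrapper can destroy an object without releasing its memory when the flag is zero.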
For objects in automatic memory, the constructor/destructor can't be identified
If an object is placed on the stack (in automatic memory), no checks are performed for the success of its allocation. In this case, the call of the constructor becomes indistinguishable from the calls of other functions. The situation is similar with the destructor: The stack memory is released automatically upon the function's completion, and the object ceases to exist without invoking delete (which is used only for deleting objects from the heap).
To make sure of this, let's modify the main function of our previous example as follows.
Listing 54: A Constructor/Destructor for an Object on the Stack
int main()
{
    MyClass zzz;
    zzz.demo();
}
In general, the result of compiling this code should look like this:
Listing 55: The Compilation of a Constructor/Destructor for an Object on the Stack
main proc near ; CODE XREF: start+AF↓p
var_4 = byte ptr -4
; The zzz local variable is an instance of the MyClass object.
push ebp
mov ebp, esp
push ecx
lea ecx, [ebp+var_4]
; The this pointer is prepared.
call constructor
; The constructor is invoked just like an ordinary function!
; We can guess, although not with complete certainty,
; that this is a constructor judging from its contents alone.
; (The constructor usually initializes an object.)
lea ecx, [ebp+var_4]
call demo
; Notice that calling the demo function
; doesn't differ from calling the constructor!
lea ecx, [ebp+var_4]
call destructor
; Calling the destructor, as we already understand,
; has no specific peculiarities.
mov esp, ebp
pop ebp
retn
main endp
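With a stack object, the only remaining hint is the order of the calls. The sketch below records that order; Tracer and run_on_stack are hypothetical names introduced for this illustration, and the log string stands in for the printf calls of the original example.

```cpp
#include <cassert>
#include <string>

// Hypothetical class that records which member ran, in order.
struct Tracer {
    std::string* log;
    explicit Tracer(std::string* l) : log(l) {
        assert(l != nullptr);
        *log += "ctor;";
    }
    void demo() { *log += "demo;"; }
    ~Tracer() { *log += "dtor;"; }
};

// All three calls receive the same this pointer (the address of the
// local variable) and none of them is guarded by any check.
std::string run_on_stack() {
    std::string log;
    {
        Tracer zzz(&log);  // constructor: a plain call, no allocation check
        zzz.demo();        // ordinary member function: a plain call
    }                      // destructor: a plain call at scope exit, no delete
    return log;
}
```

The constructor is always the first call made on the object and the destructor the last, but, as noted earlier, the first or last function called is not necessarily a constructor or destructor, so this remains a guess.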
Identifying the constructor/destructor in global objects
Global objects (also known as static objects) are allocated in the data segment at compile time. Hence, memory allocation errors are basically impossible. Does this mean that, as with stack objects, we can't reliably identify the constructor/destructor? Not quite.
A global object is accessible from many places in the program, but its constructor should be called only once. How is this done? Most compilers simply use a global flag variable that is initially zero and is incremented by one (or, more generally, set to TRUE) before the first call of the constructor. Each time execution reaches the object, the program only has to check whether the flag is zero. If it's not, the constructor call is skipped. Once again, the constructor is encircled with a branch that allows us to reliably distinguish it from all other functions.
Things are easier still with the destructor: If the object is global, it's destroyed only when the program terminates. And what keeps track of this, if not the run-time library? A special function, such as _atexit, receives the pointer to the destructor, saves it, and then invokes it when it becomes necessary. This special function should be called only once; to avoid using yet another flag, it's called just after the constructor is invoked. At first, the object may seem to consist of the constructor/destructor only, but this is not the case! Don't forget that _atexit doesn't immediately pass control to the destructor code; it only remembers the pointer to it for later use.
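The flag-and-_atexit scheme just described can be mimicked by hand. The following is a sketch, not what any particular compiler emits: all of the names (zzz_storage, zzz_object, zzz_initialized, zzz_destructor_thunk, call_demo_on_static, constructor_calls) are hypothetical, and the standard std::atexit is used in place of the compiler's internal _atexit.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

struct MyClass {
    int value;
    MyClass() : value(42) {}
    int demo() const { return value; }
    ~MyClass() {}
};

// Hypothetical names for what the compiler sets up behind the scenes.
alignas(MyClass) static unsigned char zzz_storage[sizeof(MyClass)];
static MyClass* zzz_object = nullptr;
static bool zzz_initialized = false;  // the initialization flag
static int constructor_calls = 0;     // illustration only

// Thunk registered with atexit: loads this and calls the destructor.
static void zzz_destructor_thunk() {
    zzz_object->~MyClass();
}

int call_demo_on_static() {
    if (!zzz_initialized) {                      // check the flag
        zzz_initialized = true;                  // set it before construction
        zzz_object = new (zzz_storage) MyClass;  // constructor, called once
        ++constructor_calls;
        std::atexit(zzz_destructor_thunk);       // remember the destructor
    }
    assert(zzz_object != nullptr);
    return zzz_object->demo();                   // ordinary call, unguarded
}
```

However many times call_demo_on_static runs, the guarded branch executes only once, and the destructor thunk runs when the program exits.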
Thus, it's simple to identify the constructor/destructor of the global object, as the following example proves:
Listing 56: A Constructor/Destructor for a Global Object
int main()
{
    static MyClass zzz;
    zzz.demo();
}
Generally, the result of compiling this code should look like this:
Listing 57: The Compilation of a Constructor/Destructor for a Global Object
main proc near ; CODE XREF: start+AF↓p
mov cl, byte_0_4078E0 ; This is a flag for initializing
; the instance of the object.
mov al, 1
test al, cl
; Is the object initialized?
jnz short loc_0_40106D
; Yes, it's initialized; the constructor shouldn't be called.
mov dl, cl
mov ecx, offset unk_0_4078E1 ; This is an instance of the object.
; The this pointer is prepared.
or dl, al
; The initialization flag is set,
; and the constructor called.
mov byte_0_4078E0, dl ; This is a flag for initializing
; the instance of the object.
call constructor
; Notice that if the instance of the object
; is already initialized (see the check above),
; the constructor isn't called.
; Thus, it can be easily identified!
push offset thunk_destructo
call _atexit
add esp, 4
; This passes the pointer to the destructor thunk to the _atexit function.
; The destructor should be called upon completion of the program.
loc_0_40106D: ; CODE XREF: main+A↑j
mov ecx, offset unk_0_4078E1 ; This is an instance of the object.
; The this pointer is prepared.
jmp demo
; Calling demo
main endp
thunk_destructo: ; DATA XREF: main+20↑o
; This is a thunk to the destructor function.
mov ecx, offset unk_0_4078E1 ; This is an instance of the object.
jmp destructor
byte_0_4078E0 db 0 ; DATA XREF: main↑r main+15↑w
; This is a flag for initializing
; the instance of the object.
unk_0_4078E1 db 0 ; DATA XREF: main+E↑o main+2D↑o...
; This is an instance of the object.
Similar code is generated by Borland C++; the only difference is that Borland handles the destructor calls more artfully. The calls of all destructors are gathered in a special procedure that usually resides before or near the library functions, and so they're easy to identify. Take a look:
Listing 58: A Constructor/Destructor for a Global Object Using Borland C++
_main proc near ; DATA XREF: DATA:00407044↓o
push ebp
mov ebp, esp
cmp ds:byte_0_407074, 0 ; A flag for initializing the object
jnz short loc_0_4010EC
; If the object is already initialized, the constructor isn't called.
mov eax, offset unk_0_4080B4 ; An instance of the object
call constructor
inc ds:byte_0_407074 ; A flag for initializing the object
; The flag is incremented by one
; (to set the TRUE value).
loc_0_4010EC: ; CODE XREF: _main+A↑j
mov eax, offset unk_0_4080B4 ; An instance of the object
call demo
xor eax, eax
pop ebp
retn
_main endp
call_destruct proc near ; DATA XREF: DATA:004080A4↓o
; This function contains the calls of all the destructors of global
; objects. Since the call of each destructor is encircled by the check
; for the initialization flag, this function can be easily identified -
; only this function contains such encircling code. (Calls of
; constructors are usually scattered over the entire program.)
push ebp
mov ebp, esp
cmp ds:byte_0_407074, 0 ; A flag for initializing the object
jz short loc_0_401117
; Is the object initialized?
mov eax, offset unk_0_4080B4 ; An instance of the object
; The this pointer is prepared.
mov edx, 2
call destructor
loc_0_401117: ; CODE XREF: call_destruct+A↑j
pop ebp
retn
call_destruct endp
Virtual destructor
A destructor can be virtual, too! This is useful if an instance of a derived class is deleted through a pointer to the base class. Since virtual functions belong to the class of an object, not the class of a pointer, a virtual destructor is invoked according to the object type, not the pointer type. However, these subtleties concern programming. Code diggers are interested in how to identify the virtual destructor. It's easy: a virtual destructor combines the properties of a typical destructor and of a virtual function. (See the "Virtual Functions" section.)
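A short sketch of the situation described above: a derived object deleted through a pointer to its base class. Base, Derived, and delete_through_base are hypothetical names, and the log string is only there to make the order of the destructor calls visible.

```cpp
#include <cassert>
#include <string>

struct Base {
    std::string* log;
    explicit Base(std::string* l) : log(l) { assert(l != nullptr); }
    virtual ~Base() { *log += "~Base;"; }  // virtual: dispatched via the vtable
};

struct Derived : Base {
    explicit Derived(std::string* l) : Base(l) {}
    ~Derived() override { *log += "~Derived;"; }
};

std::string delete_through_base() {
    std::string log;
    Base* p = new Derived(&log);  // the static type is Base*, the object is Derived
    delete p;                     // selects ~Derived by the object type
    return log;
}
```

Because the destructor is virtual, the body of ~Derived runs first and ~Base follows; in a disassembly, the call goes through the virtual table rather than to a fixed address.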
Virtual constructor
Is there such a thing? Standard C++ doesn't directly support a virtual constructor, and programmers seldom need one. Still, if they do, they can write some emulating code. The code is placed in a virtual function (not the constructor!) specially chosen for this purpose. It looks approximately like this: return new (class_name) (*this). This trick is not pretty, but it works.
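Here is a minimal sketch of that emulation, using the common convention of a virtual function named clone. Shape, Circle, Square, and clone_and_name are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <string>

struct Shape {
    virtual ~Shape() {}
    virtual Shape* clone() const = 0;      // the emulated "virtual constructor"
    virtual std::string name() const = 0;
};

struct Circle : Shape {
    int radius;
    explicit Circle(int r) : radius(r) {}
    // Takes only this and returns a pointer to a new object.
    Shape* clone() const override { return new Circle(*this); }
    std::string name() const override { return "Circle"; }
};

struct Square : Shape {
    int side;
    explicit Square(int s) : side(s) {}
    Shape* clone() const override { return new Square(*this); }
    std::string name() const override { return "Square"; }
};

// Copies an object without knowing its concrete type at the call site.
std::string clone_and_name(const Shape& original) {
    Shape* copy = original.clone();
    assert(copy != nullptr);
    std::string result = copy->name();
    delete copy;
    return result;
}
```

In the compiled code, each clone is an ordinary virtual function whose body contains a call to operator new followed by a copy constructor.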
Certainly, there are other solutions. A detailed discussion of them is beyond the scope of this book. It would require a profound knowledge of C++, would occupy too much space, and would hardly interest most readers.
So, the identification of a virtual constructor is basically impossible because the language has no such concept. Its emulation can be performed using any of dozens of solutions; go ahead, try and count them all. However, in most cases, virtual constructors are virtual functions that take the this pointer as an argument and return the pointer to a new object. This isn't a reliable identification criterion, but it's better than nothing.
One constructor, two constructor…
An object may have more than one constructor. This doesn't influence the analysis in any way. For each object instance, the compiler chooses exactly one constructor, depending on how the instance is declared, and invokes it. One essential detail: Different instances of the same object may invoke different constructors, so be on guard!
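As a small illustration, the sketch below declares three instances of one class, each of which invokes a different constructor. Point, made_by, and which_constructors are hypothetical names; the made_by string only records which constructor ran.

```cpp
#include <cassert>
#include <string>

struct Point {
    int x, y;
    std::string made_by;  // records which constructor built this instance
    Point() : x(0), y(0), made_by("Point()") {}
    Point(int x_, int y_) : x(x_), y(y_), made_by("Point(int,int)") {}
    Point(const Point& other)
        : x(other.x), y(other.y), made_by("Point(const Point&)") {}
};

std::string which_constructors() {
    Point a;        // the declaration selects the default constructor
    Point b(1, 2);  // selects the two-argument constructor
    Point c(b);     // selects the copy constructor
    assert(c.x == 1 && c.y == 2);
    return a.made_by + " " + b.made_by + " " + c.made_by;
}
```

In a disassembly, these appear as calls to three different functions, so the constructors of one class do not all have to share an address.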
More than one way to skin a cat, or, Attention: the empty constructor
Certain limitations of a constructor (no returned value, in particular) have resulted in the empty-constructor programming style. The constructor is deliberately left empty, and all initializing code is placed in a special member function called Init. The strong and weak points of such a style could be the subject of a separate discussion. It's enough for code diggers to know that such a style exists and is actively used, not only by individual programmers but also by giant companies such as Microsoft. Therefore, don't be surprised if you encounter a call of an empty constructor; just look for the initializing function among the ordinary members.
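A minimal sketch of this style follows. Buffer and its members are hypothetical names, and the rule that a zero size is an error is an assumption made only so that Init has something to report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class Buffer {
public:
    Buffer() {}  // deliberately empty in the source

    // All real initialization lives here, so failure can be
    // reported through the return value.
    bool Init(std::size_t size) {
        if (size == 0) {
            return false;  // illustrative failure condition
        }
        data_.assign(size, 0);
        initialized_ = true;
        return true;
    }

    bool initialized() const { return initialized_; }

    std::size_t size() const {
        assert(initialized_);  // Init must have succeeded first
        return data_.size();
    }

private:
    std::vector<unsigned char> data_;
    bool initialized_ = false;
};
```

A typical caller creates the object and immediately checks the result of Init, so in a disassembly the interesting code sits in the function called right after the constructor.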