  #25  
Old 08-13-2004, 00:06
mihaliczaj
 
Exe2C
5-10 years ago I found a program called Exe2C. You can find some references to it on the Internet.

It produced a C program that, in theory, compiles back to the same .exe. Of course the exact result depends on the compiler and the optimizations, but one can see the functions, the global data areas referenced by each function, and so on.

Theoretical thoughts
If you fix some parameters (compiler, optimization flags, platform, endianness, exact language rules etc.), then in my opinion it is possible to write a program that decompiles an .exe into source code that produces the same .exe when compiled with those fixed parameters.

Unfortunately not all these parameters can be retrieved from the .exe.

Even if we somehow had this information, it would still be impossible to get back the same source code (ignoring the names, of course). A lot of information is not preserved even in a non-optimized build.
Just an example:
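class C
{
    int m_iX;
public:
    // static member: the object is passed explicitly
    static int GetX_static( C *pThis ) { return pThis->m_iX; }
    // ordinary member: the object is the hidden "this" argument
    int GetX() { return m_iX; }
    // friend: a global function that may read the private member
    friend int GetX_global( C *pThis );
};
int GetX_global( C *pThis ) { return pThis->m_iX; }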

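There is essentially no difference in the generated code of these three functions (the static member, the ordinary member and the friend). The only thing that may differ is the calling convention: 32-bit MSVC, for example, passes "this" in a register for ordinary member functions.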
Most C++ language elements have a C equivalent, and the resulting assembly code cannot be told apart (assuming there is no debug info in the .exe, which is the usual case).
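To make the point concrete, here is a minimal sketch of the class above next to the kind of plain C-style code a decompiler could emit for it. The constructor, the C_plain struct, C_GetX and DemoAccessors are my own additions for illustration; they are not taken from Exe2C or any real tool.

```cpp
#include <cassert>

// The class from the example, plus a constructor (added here only so
// that m_iX can be given a value).
class C
{
    int m_iX;
public:
    explicit C( int iX ) : m_iX( iX ) {}
    static int GetX_static( C *pThis ) { return pThis->m_iX; }
    int GetX() { return m_iX; }
    friend int GetX_global( C *pThis );
};
int GetX_global( C *pThis ) { return pThis->m_iX; }

// Hypothetical decompiler output: a plain struct and a free function
// taking the object pointer explicitly. Access control, "static" and
// "friend" leave no trace at this level.
struct C_plain { int m_iX; };
int C_GetX( C_plain *pThis ) { return pThis->m_iX; }

// Returns true when all four accessors yield the stored value.
bool AccessorsAgree( int iX )
{
    C obj( iX );
    C_plain plain = { iX };
    return obj.GetX() == iX
        && C::GetX_static( &obj ) == iX
        && GetX_global( &obj ) == iX
        && C_GetX( &plain ) == iX;
}

// Small usage check.
void DemoAccessors()
{
    assert( AccessorsAgree( 42 ) );
}
```

All four accessors read one int through one pointer, so a decompiler that sees only that load has no basis for choosing between them.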

Some language elements (exceptions, virtual base classes) cannot be translated directly to a C equivalent. The compiler has to generate its own characteristic structures for them, so they can be recognized and rebuilt.

For a long time the C++ compiler (cfront, originally written by B. Stroustrup) was just a C++ to C translator. When new language elements were added (templates and especially exceptions) this approach became impractical.
A very good description of how the different C++ language elements are implemented can be found in "Inside the C++ Object Model" by Stanley B. Lippman. Among other things it describes the internal structures for virtual inheritance and the structures used to handle member function pointers to virtual functions of virtual base classes.

Conclusions
I think it is a reasonable goal to write an .exe to C decompiler, but it is almost impossible to get back the really useful C++ extras. Knowing the compiler and having debug info can help a lot.
Virtual tables and virtual functions can be recognized, but there is no clue left for templates and inline functions.
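A rough sketch of both halves of that statement. The first part is a hand-built virtual table of the kind a decompiler could reconstruct from a binary; the second part shows a template instantiation next to a hand-written function that does the same thing. All names here (ShapeR, ShapeVtbl, SquareR_Area, Max, MaxInt, DemoVtable) are made up for the illustration, and real compilers lay out their tables in their own ways.

```cpp
#include <cassert>

// Part 1: what a recovered virtual call can look like in C style.
// Each object starts with a pointer to a table of function pointers.
struct ShapeR;
struct ShapeVtbl { int (*Area)( const ShapeR * ); };
struct ShapeR { const ShapeVtbl *vptr; int side; };

int SquareR_Area( const ShapeR *p ) { return p->side * p->side; }
const ShapeVtbl g_SquareVtbl = { SquareR_Area };

// An indirect call through the table: this pattern stays visible in
// the binary, which is why virtual functions can be recognized.
int CallArea( const ShapeR *p ) { return p->vptr->Area( p ); }

// Part 2: a template and a hand-written function. Once Max<int> is
// instantiated it is just another function on ints; nothing in the
// machine code says it came from a template.
template <typename T> T Max( T a, T b ) { return a < b ? b : a; }
int MaxInt( int a, int b ) { return a < b ? b : a; }

// Small usage check.
void DemoVtable()
{
    ShapeR sq = { &g_SquareVtbl, 3 };
    assert( CallArea( &sq ) == 9 );
    assert( Max<int>( 2, 5 ) == MaxInt( 2, 5 ) );
}
```

The table and the indirect call leave a pattern to search for. Max<int> and MaxInt would be expected to compile to the same instructions, and an inlined call leaves no separate function at all.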

Optimization is a general problem for every language: there is optimization at the language level and also at the assembly level, and both can hide constructs that were visible in the original source.
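A small illustration of how a construct can disappear. Some optimizing compilers are able to replace a simple counting loop with a closed formula, so the decompiler never sees a loop at all. Whether a given compiler does this depends on the compiler and flags; the function names below are mine and the two functions are only meant to show "before" and a possible "after".

```cpp
#include <cassert>

// What the programmer wrote: a loop summing 1..n.
unsigned SumLoop( unsigned n )
{
    unsigned sum = 0;
    for ( unsigned i = 1; i <= n; ++i )
        sum += i;
    return sum;
}

// What a decompiler might reconstruct if the optimizer folded the
// loop into arithmetic: same result for small n, but no loop left.
unsigned SumClosed( unsigned n )
{
    return n * ( n + 1 ) / 2;
}

// Small usage check for a few small inputs.
void DemoSum()
{
    for ( unsigned n = 0; n < 100; ++n )
        assert( SumLoop( n ) == SumClosed( n ) );
}
```

Both functions return the same values for the inputs checked here, yet the "recovered" one tells you nothing about how the original source was written.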