Download hex and decompile it from avr - c

I have Orange PI with ubuntu connected to atmega328p through usbasp.
I've developed a program in C, compiled it, translated to hex and uploaded on the atmega, but because of some strange behavior, the file.c is lost.
How can I get my program back from the atmega?

The good news: it is definitely possible.
The bad news: it's a lot of work, depending on the size of your application. I have done this more than once with AVR code written in C, BASCOM, or C++ (Arduino). It takes many hours; for example, some 20 hours for a 100-line BASCOM program.
The approach is:
1. Disassemble the HEX file. Use this output as reference. You might need some options to have all constant data in the output.
2. Start with the best approximation of the source that your memory still holds.
3. Compile, link and convert it into a HEX file, too.
4. Disassemble this HEX file, and compare the output with the reference.
5. Repeat editing your source until both disassemblies are equal.
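To make the starting point of the approach more concrete, here is a minimal sketch of decoding a single Intel HEX record in C, which is the first thing any disassembler does with a .hex file. The names hex_record and hex_parse_record are made up for this illustration; in practice you would let a disassembler such as avr-objdump read the file for you.

```c
#include <stdint.h>

/* One decoded Intel HEX record (hypothetical type for this sketch). */
struct hex_record {
    uint8_t  length;     /* number of data bytes in the record */
    uint16_t address;    /* 16-bit load offset */
    uint8_t  type;       /* 0x00 = data, 0x01 = end of file */
    uint8_t  data[255];  /* payload, only the first 'length' bytes are valid */
};

/* Convert one hex digit to its value, or -1 if it is not a hex digit. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Read two hex digits at s as one byte; -1 on a bad digit or end of string. */
static int hex_byte(const char *s)
{
    int hi = hex_digit(s[0]);
    if (hi < 0) return -1;
    int lo = hex_digit(s[1]);
    if (lo < 0) return -1;
    return (hi << 4) | lo;
}

/*
 * Parse one line of the form ":LLAAAATT<data>CC".
 * Returns 0 on success, -1 on a malformed line or a checksum mismatch.
 */
int hex_parse_record(const char *line, struct hex_record *out)
{
    int field[4];
    uint8_t sum = 0;

    if (line[0] != ':') return -1;

    /* Header: length, address high, address low, record type. */
    for (int i = 0; i < 4; i++) {
        field[i] = hex_byte(line + 1 + 2 * i);
        if (field[i] < 0) return -1;
        sum = (uint8_t)(sum + field[i]);
    }
    out->length  = (uint8_t)field[0];
    out->address = (uint16_t)((field[1] << 8) | field[2]);
    out->type    = (uint8_t)field[3];

    /* Payload bytes start at character offset 9. */
    for (int i = 0; i < out->length; i++) {
        int b = hex_byte(line + 9 + 2 * i);
        if (b < 0) return -1;
        out->data[i] = (uint8_t)b;
        sum = (uint8_t)(sum + b);
    }

    /* All bytes including the checksum must add up to 0 modulo 256. */
    int check = hex_byte(line + 9 + 2 * out->length);
    if (check < 0) return -1;
    return ((uint8_t)(sum + check) == 0) ? 0 : -1;
}
```

Feeding every line of both HEX files through such a parser and comparing the data bytes per address is also a quick way to see how far two builds still differ before comparing the disassemblies.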
Notes:
You need deep understanding about the translation from C into machine code.
The names of functions and variables can't be reconstructed exactly. These names are gone after compiling and linking.
Be aware that the order of functions in the resulting code does not necessarily follow their order of appearance in the source. Most compilers do keep the source order, though.
Be aware that the order of variables in memory might depend not on their order of appearance in the source, but on their names. Additionally, they are commonly not sorted lexically; for example, I found GCC using some kind of hashing algorithm. However, members of structs keep their order, because the standard demands that.
In a first phase, ignore differences of variable placement.
Try to identify functions of the C library, and ignore them. Especially the printf() family draws a lot of other functions with it. When your own code is finished, the library functions will most probably be there, too.
Final note: If you happen to have the ELF file, use this for disassembling and looking up names. You will be much faster.
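The note about struct members can be checked directly in C. The following is a small sketch using offsetof; the struct sensor_state and the helper function are invented for illustration only. The offsets of members are guaranteed to increase in declaration order, while stand-alone globals carry no such guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct, used only to illustrate member ordering. */
struct sensor_state {
    uint8_t  flags;
    uint16_t raw_value;
    uint8_t  channel;
};

/* Returns 1 if the members are laid out in declaration order. */
int members_in_declaration_order(void)
{
    return offsetof(struct sensor_state, flags)
               < offsetof(struct sensor_state, raw_value)
        && offsetof(struct sensor_state, raw_value)
               < offsetof(struct sensor_state, channel);
}
```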

Related

How to get a c source code from the compiled code

I have the compiled C code in text format. I need to extract the source code by decompiling the machine code. How to do that?
"True" decompiling is, basically, impossible. Foremost, you can't "decompile" local names (in functions and source code files / modules). For those, you'll get something like, for int local variables: i1, i2... Of course, unless you also have debug information, which is not often the case.
Decompiling to "something" (which might not be very readable) is possible, but it usually relies on some heuristics, recognizing code patterns that compilers generate and can be fooled into generating strange (possibly even incorrect) C code. In practice that means that a decompiler usually works OK for a certain compiler with certain (default) compile options, but, not so nice with others.
Having said that, decompilers do exist and you can try your luck with, say, Snowman.
As Srdjan has said, in general decompilation of a C (or C++) program is not possible. There is too much information lost during the compilation process. For example, consider a declaration such as int x: it is 'lost' because it does not directly produce any machine-level instruction. The compiler needs this information only for type checking.
However, it is possible to disassemble, which means taking the compiled executable back up one level to assembly language. Interpreting the assembly might (will?) be difficult and certainly time consuming. There are several disassemblers available. If you have money, IDA-Pro is probably the industry standard in disassemblers, and if you are doing this type of work, it is well worth the several thousand dollars per license. There are also a number of open source disassemblers available; Google can find them.
That being said, there have been efforts to create decompilers. IDA-Pro has one, and you can look at http://boomerang.sourceforge.net/ in addition to Snowman linked above.
Lastly, other languages are more friendly towards decompilation than C or C++. For example, a C# program is decompilable with tools like dotPeek or ilSpy. Similarly, for Java there are a number of tools that can convert Java bytecode back into Java source.
Please post a sample of the "compiled C code in text format."
Perhaps then it will be easier to see what you are trying to achieve.
Typically it is not practical to reverse engineer assembly language into C because much of the human-readable information, in the form of labels and variable names, is permanently lost in the compilation process.

MpLab, ASM, C, Building To accommodate both

I have a large and substantial ASM project for a PIC24 chip. (The specific chip is the PIC24FJ256GB210)
I now have some other routines in C.
I want to incorporate these into my project.
The C routines are in a project of 5 or so files, one of which contains the int main(void) statement as the starting point. This was for the purpose of testing them and giving us the confidence that they work. We are now ready to move that code and incorporate it into the larger existing system.
The assembly language stuff starts with the __reset: instruction.
How do I arrange the project and build options so that I can do these next three things ?
Keep starting with my __reset instruction
(Or at least make sure that my existing __reset and the int main(void) at least cooperate with each other)
Call his routines from the ASM code
Use the same data buffers that the C code sets up
Interestingly enough, Microchip's User forums and sample code sections seem to miss this idea (or, more likely, I haven't figured out how to find them).
I would think this question has been asked a lot, and I hope I'm not duplicating a previous question, but I don't see it here nor on MicroChip's site. Links to helpful websites on this topic are welcome.
If I just need to learn how to search this and other sites better, that will be a useful and workable answer in and of itself. Again, hope I'm not asking a duplicate question.
I recommend you to read DS51284H ("MPLAB® C COMPILER FOR PIC24 MCUs AND dsPIC® DSCs USER’S GUIDE") (PDF).
In particular see section 4.4 STARTUP AND INITIALIZATION
"Two C run-time startup modules are included in the libpic30.a archive/library. The
entry point for both startup modules is __reset. The linker scripts construct a GOTO
__reset instruction at location 0 in program memory, which transfers control upon
device reset.
....
5. The function main is called with no parameters."
Your __reset label and the one in the CRT (C run-time) would appear to conflict. If you have the source for the CRT you could change that by renaming the __reset label in the CRT to something else so that your own __reset always is called first.
Another point is that it sounds like you want to take a stand-alone program and use it as a library from within your own program. Since stand-alone programs often are designed to perform one or more specific tasks and exit once that task is finished you might want to refactor your C code a bit to make it more library-ish (like getting rid of the main() function and perhaps replace it with some sort of init() function).
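A minimal sketch of what such a refactoring could look like is shown below. All names here (routines_init, routines_store, routines_buffer) are hypothetical and only illustrate the idea of replacing main() with an init function plus callable entry points; the buffer is deliberately not static so that other modules can refer to its symbol.

```c
#include <stddef.h>
#include <stdint.h>

#define ROUTINES_BUF_SIZE 64u

/* Buffer owned by the C code; global so other modules can reference it. */
uint8_t routines_buffer[ROUTINES_BUF_SIZE];

/* Set once routines_init() has run. */
static int routines_ready = 0;

/* Takes over the set-up work that used to sit at the top of main(). */
void routines_init(void)
{
    for (size_t i = 0; i < ROUTINES_BUF_SIZE; i++)
        routines_buffer[i] = 0;
    routines_ready = 1;
}

/*
 * One former main() step, exposed as a callable function.
 * Copies up to ROUTINES_BUF_SIZE bytes into the shared buffer.
 * Returns the number of bytes stored, or -1 if init has not run yet.
 */
int routines_store(const uint8_t *src, size_t len)
{
    if (!routines_ready)
        return -1;
    if (len > ROUTINES_BUF_SIZE)
        len = ROUTINES_BUF_SIZE;
    for (size_t i = 0; i < len; i++)
        routines_buffer[i] = src[i];
    return (int)len;
}
```

With this shape, whichever side owns the reset entry point only has to call the init function once before using the other routines. How the assembly side names and references these symbols depends on the toolchain, so check the user's guide for that.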
And section 4.11 FUNCTION CALL CONVENTIONS.
"The first eight working registers (W0-W7) are used for function parameters. Parameters
are allocated to registers in left-to-right order, and a parameter is assigned to the first
available register that is suitably aligned.
....
Function return values are returned in W0 for 8- or 16-bit scalars, W1:W0 for 32-bit
scalars, and W3:W2:W1:W0 for 64-bit scalars."
Michael gave you a good answer. The only thing I would like to add is that you should make the project in C and put the assembly functions within it.
This way you keep the speedy and functional asm code and can maintain the project in C, which is much easier.
It is not in your interest to convert the C code into assembly and have a large assembly code base to maintain; it's the other way around.
Once you read the docs you will see it is not so hard to use an assembly function in C, but to get you started, you can take a look at this:
C:\ ...bla bla... \Microchip\MPLAB C30\src\dsp\include\dsp.h
contains function declaration in C for the actual assembly functions located in this folder:
C:\ ...bla bla... \Microchip\MPLAB C30\src\dsp\asm
You can begin with the function _VectorAdd: Vector Addition, file "vadd.s"
Note that the assembly function _VectorAdd is defined as VectorAdd in the header file.
These example files are for the DSP engine in the dsPIC, something the PIC24 does not feature. But they are still illustrative enough to extract the principle.
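The principle can be sketched as follows. The function name sum_words is made up for this example, and the body is a portable C stand-in so that the sketch builds on a host; on the real target the body would live in an assembly file under the label _sum_words (with the leading underscore, as with _VectorAdd above). The register comments follow the calling convention quoted earlier and should be treated as an assumption to verify against the user's guide.

```c
#include <stdint.h>

/*
 * Prototype as the C side would see it in a header.
 * Assumed mapping under the quoted convention (verify in the guide):
 *   src   -> W0 (first parameter)
 *   count -> W1 (second parameter)
 *   16-bit return value -> W0
 */
int16_t sum_words(const int16_t *src, uint16_t count);

/* Portable stand-in for the assembly body, for illustration only. */
int16_t sum_words(const int16_t *src, uint16_t count)
{
    int16_t total = 0;
    for (uint16_t i = 0; i < count; i++)
        total = (int16_t)(total + src[i]);
    return total;
}
```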

Combining source code into a single file for optimization

I was aiming at reducing the size of the executable for my C project and I have tried all compiler/linker options, which have helped to some extent. My code consists of a lot of separate files. My question was whether combining all source code into a single file will help with optimization that I desire? I read somewhere that a compiler will optimize better if it finds all code in a single file in place of separate multiple files. Is that true?
A compiler can indeed optimize better when it finds needed code in the same compilable (*.c) file. If your program is longer than 1000 lines or so, you'll probably regret putting all the code in one file, because doing so will make your program hard to maintain, but if shorter than 500 lines, you might try the one file, and see if it does not help.
The crucial consideration is how often code in one compilable file calls or otherwise uses objects (including functions) defined in another. If there are few transfers of control across this boundary, then erasing the boundary will not help performance appreciably. Therefore, when coding for performance, the key is to put tightly related code in the same file.
I like your question a great deal. It is the right kind of question to ask, in my view; and, though the complete answer is not simple enough to treat fully in a Stackexchange answer, your pursuit of the answer will teach you much. Though you may not yet realize it, your question really regards linking, a subject every advancing programmer eventually has to learn. Your question regards symbol tables, inlining, the in-place construction of return values and several, other, subtle factors.
At any rate, if your program is shorter than 500 lines or so, then you have little to lose by trying the single-file approach. If longer than 1000 lines, then a single file is not recommended.
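As a small illustration of why the file boundary matters, consider the sketch below (function names invented for this example). Because clamp_u8 is static and lives in the same file as its caller, the compiler sees its body at the call site and is free to inline it and to drop the stand-alone copy. If it lived in another .c file, a compiler without cross-file optimization would have to emit a real call.

```c
#include <stdint.h>

/* static: visible only in this file, so the compiler may inline it
 * and need not emit a separate copy of the function. */
static inline uint8_t clamp_u8(int value)
{
    if (value < 0)   return 0;
    if (value > 255) return 255;
    return (uint8_t)value;
}

/* Caller in the same translation unit; the helper's body is visible here. */
uint8_t scale_sample(int sample)
{
    return clamp_u8(sample * 2);
}
```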
It depends on the compiler. The Intel C++ Composer XE for example can automatically optimize over multiple files (when building using icc -fast *.c *.cpp or icl /fast *.c *.cpp, for linux/windows respectively).
When you use Microsoft Visual Studio, or a derived product (like Atmel Studio for microcontrollers), every single source file is compiled on its own (i.e. one cl, icl, or gcc command is issued for every c and cpp file in the project). This means no optimization across files.
For microcontroller projects I sometimes have to put everything in a single file in order to make it fit at all in the limited flash memory on the controller. If your compiler/IDE does it like Visual Studio, you can use a trick: select all the source files and exclude them from the build process (but leave them in the project), then create a file (I always use whole_program.c) and #include every single source (i.e. non-header) file in it. Note that including c files is frowned upon by many high-level programmers, but sometimes you have to do it the dirty way, and with microcontrollers that's actually more often than not.
My experience has been that with GNU/gcc the optimization happens within a single file plus its includes, producing a single object. With clang/llvm whole-project optimization is quite easy, and I recommend this: do NOT optimize in the clang step. Use clang to get from C to bytecode, then use llvm-link to link all of your bytecode modules into one bytecode module. Then you can optimize the whole project, with all source files optimized together; llc adds more optimization as it heads for the target. You get the best results if you tell clang, using its target triple command line option, what your ultimate target is. For the GNU path, to do the same thing either use includes to make one big file compiled to one object, or, if there is a machine-code-level optimizer beyond the few things the linker does, then that is where it would have to happen. Maybe GNU has an exposed IR file format, optimizer, and IR-to-target tool, but I think I would have seen that by now.
At http://github.com/dwelch67, a number of my projects, although very simple programs, have llvm and gnu builds for the same source files. In the llvm builds you can see that I make one binary from unoptimized bytecode and another from optimized bytecode. (llvm's optimizer has problems with small while loops and sometimes generates non-working code. A very quick check to see whether it is you or them is to try the non-optimized llvm binary and the gnu binary and see whether they all behave the same (you) or whether only the optimized llvm binary doesn't work (them).)

hidden routines linked in c program

Hullo,
When you disassemble a win32 exe compiled by a C compiler, it
shows that some compilers link some 'hidden' routines into it,
I think even if the C program is an empty one of 5 bytes or so.
I understand that those 5 bytes are wrapped in the PE .exe format, but
why add extra routines? They seem unnecessary to me and even
somewhat annoy me. What are they? Can they be omitted? As I understand it,
a C program (not speaking about C++ right now, which I know has some
initial routines) should not need such additional hidden functions.
Many thanks for an answer, maybe even a link to more extended info, because this
topic interests me a lot.
//edit
OK, here is some disassembly I did a while back.
(Digital Mars and the old Borland command-line compiler, which I also tested,
both produce much more code, and I'm especially interested in bcc32,
but they do not include readable names/symbols in the disassembly,
so I will not post them here.)
These are somewhat readable, but I am not experienced in understanding
what they show ;-)
https://dl.dropbox.com/u/42887985/prog_devcpp.htm
https://dl.dropbox.com/u/42887985/prog_lcc.htm
https://dl.dropbox.com/u/42887985/prog_mingw.htm
https://dl.dropbox.com/u/42887985/prog_pelles.htm
Could someone add some explanatory comments on what this is?
(I am afraid there may be some C++ cruft in here. I am
interested in the pure C additions, not C++,
but I am too tired right now to verify that it was compiled in C
mode. The extension of the compiled empty-main program was .c,
so I assumed the output would be C, not C++.)
Thanks for any longer explanation of what it is.
Since your win32 exe file is a dynamically linked object file, it will contain the necessary data needed by the dynamic linker to do its job, such as names of libraries to link to, and symbols that need resolving.
Even a program with an empty main() will link with the c-runtime and kernel32.dll libraries (and probably others? - a while since I last did Win32 dev).
You should also be aware that main() is only the entry point of your program. Quite a bit has already gone on before this point, such as retrieving and tokenizing the command line, setting up the locale, creating stderr, stdin, and stdout, and setting up the other mechanisms required by the C runtime library, such as atexit(). Similarly, when your main() returns, the runtime does some clean-up, and at the very least needs to call the kernel to tell it that you're done.
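One of these mechanisms is easy to observe from your own code. The sketch below (helper names invented for the example) registers a handler with the standard atexit() function. The handler is not called by anything you wrote; it is invoked by the runtime's clean-up code after main() returns or exit() is called, which is exactly the kind of routine that shows up in the disassembly.

```c
#include <stdio.h>
#include <stdlib.h>

/* Called by the C runtime during normal program termination. */
static void on_exit_cleanup(void)
{
    fputs("runtime clean-up: handler called after main()\n", stderr);
}

/* Registers the handler; atexit() returns 0 on success. */
int register_cleanup(void)
{
    return atexit(on_exit_cleanup);
}
```

Calling register_cleanup() once from main() is enough; the message then appears when the program terminates normally.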
As to whether it's necessary? Yes, unless you fancy writing your own program prologue and epilogue each time. There are probably ways of writing minimal, statically linked applications if you're sufficiently masochistic.
As for storage overhead, why are you getting so worked up? It's not enough to worry about.
There are several initialization functions that load whenever you run a program on Windows. These functions, among other things, call the main() function that you write - which is why you need either a main() or WinMain() function for your program to run. I'm not aware of other included functions though. Do you have some disassembly to show?
You don't have much detail to go on but I think most of what you're seeing is probably the routines of the specific C runtime library that your compiler works with.
For instance, there will be startup code at the executable's entry point, which the loader jumps to and which in turn calls the main(int argc, char **argv) that you wrote in your C program.

How do i compile a c program without all the bloat?

I'm trying to learn x86. I thought this would be quite easy to start with: I'll just compile a very small program containing basically nothing and see what the compiler gives me. The problem is that it gives me a ton of bloat ("This program cannot be run in DOS mode" and so on): a 25KB file for an empty main() calling one empty function.
How do I compile my code without all this bloat? (and why is it there in the first place?)
Executable formats contain a bit more than just the raw machine code for the CPU to execute. If you want only that, then the only option is (I think) a DOS .com file, which essentially is just a bunch of code loaded into memory and then jumped into. Some software (e.g. Volkov Commander) made clever use of that format to deliver quite a lot in very little executable code.
Anyway, the PE format which Windows uses contains a few things that are specially laid out:
A DOS stub saying "This program cannot be run in DOS mode" which is what you stumbled over
several sections containing things like program code, global variables, etc. that are each handled differently by the executable loader in the operating system
some other things, like import tables
You may not need some of those, but a compiler usually doesn't know you're trying to create a tiny executable. Usually nowadays the overhead is negligible.
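To see this layout for yourself, the following sketch locates the PE signature inside an executable image that has already been read into memory. It relies on two well-known facts of the format: the file starts with the DOS header magic "MZ", and the 32-bit little-endian field at offset 0x3C holds the file offset of the "PE\0\0" signature. Everything between the DOS header and that offset is the DOS stub with the message you stumbled over. The function name is made up for this example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Returns the file offset of the "PE\0\0" signature,
 * or -1 if the image does not look like a PE file.
 */
int64_t find_pe_signature(const uint8_t *image, size_t size)
{
    /* The DOS header is 64 bytes long and starts with "MZ". */
    if (size < 0x40 || image[0] != 'M' || image[1] != 'Z')
        return -1;

    /* Little-endian 32-bit offset of the PE header, stored at 0x3C. */
    uint32_t offset = (uint32_t)image[0x3C]
                    | ((uint32_t)image[0x3D] << 8)
                    | ((uint32_t)image[0x3E] << 16)
                    | ((uint32_t)image[0x3F] << 24);

    /* The 4-byte signature must lie completely inside the image. */
    if (offset > size - 4)
        return -1;

    if (image[offset] == 'P' && image[offset + 1] == 'E' &&
        image[offset + 2] == 0 && image[offset + 3] == 0)
        return (int64_t)offset;

    return -1;
}
```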
There is an article out there that strives to create the tiniest possible PE file, though.
You might get better results by digging up older compilers. If you want binaries that are stripped to the bone, COM files are really that, so if you get hold of an old compiler that supports generating COM binaries instead of EXE you should be set. There is a long list of free compilers at http://www.thefreecountry.com/compilers/cpp.shtml; I assume that Borland's Turbo C would be a good starting point.
The bloated module could be the loader (operating system required interface) attached by linker. Try adding a module with only something like:
void foo(){}
and see the disassembly (I assume that's the format the compiler 'gives you'). Of course the details vary much from operating systems and compilers. There are so many!

Resources