Can't extract machine code from Cortex-M3 firmware - arm

I want to extract machine code from XBee DigiMesh firmware (Cortex-M3, EM357). I have an SREC file with three sections inside. I suppose that one of these sections is a code section, but arm-none-eabi-objdump reports "unknown instruction" very often. Does anyone know why this happens?
This is how I try to do it:
arm-none-eabi-objcopy --input-target=srec --output-target=binary -j .sec2 xbp24-dm_8073.ehx2.dec sec2.bin
arm-none-eabi-objdump -D -bbinary -marm -Mforce-thumb sec2.bin
Firmware: http://tmp.nazaryev.ru/xbp24-dm_8073.ehx2.dec
EM357 datasheet: https://www.silabs.com/documents/public/data-sheets/EM35x.pdf
Update: there is an answer at https://reverseengineering.stackexchange.com/questions/15049/cant-extract-machine-code-from-cortex-m3-firmware

The Cortex-M3 supports Thumb-2, which has variable-length instructions (16 or 32 bits). You cannot simply start at the beginning and disassemble a variable-length instruction set linearly: the disassembler can easily get out of sync with the real instruction boundaries, and from there the output turns to garbage and can stay that way. In this case it probably will not stay out of sync forever, but you will have errors; this is expected. It is also quite possible that they added items to confuse a disassembler.
Also you could have a lot of data here, or it could be compressed code, or who knows what else...
For a Cortex-M you would expect a vector table at the start, but the first chunk of words is not a vector table, so what is this code? If you examine the output produced by the GNU tools the way you used them, it does not look like real code, so it is perhaps encrypted, compressed, or something else. Do you see stack frames being built, and functions ending with a pop (ldmia) that restores the saved link register into the program counter, followed by some static words and then what looks like the beginning of the next function? Granted, it might not all be compiled code, but some of it should look like compiled code.
If you are trying to hack some firmware, you should perhaps figure out how and where it is loaded, and create or use an instruction set simulator. Without a vector table to start with, though, have fun.
The way you did your disassembly, you lost the address information in the records:
S123110001BE8110204D5401BE96102452...
0x1100 : 0x01BE
0x1102 : 0x8110
0x1104 : 0x204D
0x1106 : 0x5401
0x1108 : 0xBE96
0x110A : 0x1024
and so on, or more probably byte-swapped (little-endian halfwords):
0x1100 : 0xBE01
0x1102 : 0x1081
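To make the address arithmetic concrete, here is a minimal Python sketch of decoding one data record into address/halfword pairs. The record in the post is truncated, so the RECORD string below is a hypothetical shortened S1 record built from its first twelve data bytes with a recomputed byte count and checksum; parse_srec_line and halfwords are names made up for this example.

```python
def parse_srec_line(line):
    """Decode one S1/S2/S3 data record into (address, data bytes)."""
    line = line.strip()
    addr_len = {"S1": 2, "S2": 3, "S3": 4}[line[:2]]  # address bytes per type
    raw = bytes.fromhex(line[2:])
    if raw[0] != len(raw) - 1:
        raise ValueError("byte count does not match record length")
    # The checksum is the ones' complement of the low byte of the sum.
    if (~sum(raw[:-1])) & 0xFF != raw[-1]:
        raise ValueError("bad checksum")
    body = raw[1:-1]
    return int.from_bytes(body[:addr_len], "big"), body[addr_len:]


def halfwords(address, data, byteorder="little"):
    """Yield (address, 16-bit value) pairs, two bytes at a time."""
    for offset in range(0, len(data) - 1, 2):
        yield address + offset, int.from_bytes(data[offset:offset + 2], byteorder)


# Hypothetical record: same leading bytes as the truncated one in the post.
RECORD = "S10F110001BE8110204D5401BE96102445"

addr, data = parse_srec_line(RECORD)
for a, value in halfwords(addr, data):
    print(f"0x{a:04X} : 0x{value:04X}")
```

Passing byteorder="big" to halfwords gives the non-swapped reading instead, which makes it easy to compare both interpretations side by side.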
You could take that SREC and create an assembly source file from it:
.hword 0xbe01
.hword 0x1081
Create one file for each contiguous address range; when the address jumps, start a new file. Write a linker script that places each file at its start address, assemble the files, put a _start label in one of them, and link with the script. Then you have an ELF that you can disassemble. I still expect it to be problematic, but some of the relative and absolute addressing would then make sense.
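A rough sketch of the "one file per contiguous range" step, in Python. It assumes the records have already been decoded into (address, data) tuples; the sample records, group_contiguous and to_hword_source are hypothetical names and values for illustration, and the linker script is still up to you.

```python
def group_contiguous(records):
    """Merge (address, data) records into contiguous (start, bytearray) ranges."""
    ranges = []
    for address, data in sorted(records):
        if ranges and ranges[-1][0] + len(ranges[-1][1]) == address:
            ranges[-1][1].extend(data)      # continues the previous range
        else:
            ranges.append((address, bytearray(data)))  # address jumped
    return ranges


def to_hword_source(data):
    """Render bytes as .hword directives, read as little-endian halfwords."""
    lines = []
    for i in range(0, len(data) - 1, 2):
        value = int.from_bytes(data[i:i + 2], "little")
        lines.append(f".hword 0x{value:04x}")
    if len(data) % 2:                       # odd trailing byte, if any
        lines.append(f".byte 0x{data[-1]:02x}")
    return "\n".join(lines)


# Hypothetical decoded records: two adjacent ones and one after a jump.
records = [
    (0x1100, bytes.fromhex("01be8110")),
    (0x1104, bytes.fromhex("204d")),
    (0x2000, bytes.fromhex("00b5")),
]

for start, data in group_contiguous(records):
    print(f"/* range starting at 0x{start:08x}, {len(data)} bytes */")
    print(to_hword_source(data))
```

Each printed range would go into its own source file, and the start address in the comment is what the linker script has to reproduce.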
Also note your entry point record:
S903189351
That gives 0x1893, assuming it is real; there is no reason to assume that it is.
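For reference, a small Python sketch that decodes that S9 record and checks its checksum. parse_s9 is a name made up for this example. On a Cortex-M, an odd code address normally just means the Thumb bit is set, so the sketch also prints the address with bit 0 cleared.

```python
def parse_s9(line):
    """Return the 16-bit start address from an S9 termination record."""
    line = line.strip()
    if not line.startswith("S9"):
        raise ValueError("not an S9 record")
    raw = bytes.fromhex(line[2:])
    if raw[0] != len(raw) - 1:
        raise ValueError("byte count does not match record length")
    if (~sum(raw[:-1])) & 0xFF != raw[-1]:
        raise ValueError("bad checksum")
    return int.from_bytes(raw[1:-1], "big")


entry = parse_s9("S903189351")
print(f"entry field   : 0x{entry:04X}")
print(f"Thumb bit set : {bool(entry & 1)}")
print(f"code address  : 0x{entry & ~1:04X}")
```

A valid checksum only shows the record is well formed; it says nothing about whether the loader on the device actually uses that address.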

Related

How to get program's version number through 10-pin ISP-connector with avrdude?

I'm making a printed circuit board and within the board there is a 10-pin ISP connector attached to the AVR microcontroller. So I can flash new code to the AVR.
I have several of these same boards, but with different code in it. I don't want to flash it every time I need to know what's inside that particular AVR. I just want to extract the version number of that code somehow through avrdude.
What's the easiest way to do this?
If you can read the content of the flash via a connector, you can have your build process compile and link that flash content in a well defined format which contains the version number, or some other version information.
Then you can read using avrdude and write your own program to parse (part of) the content of the flash, and then maybe script that. This helps avoid cases where your EEPROM contains one version, but the Flash already contains a new one, or vice versa.
You get bonus points for
making the location quick to find without reading lots of the flash
using a CRC or similar checksum guarding against accidental bytes looking the same as your version information
storing more detailed information than just "1.2.3|main|dirty"
You need to extend your existing build system to do all the necessary steps. Those might involve linker scripts, linker sections, C or assembly language sources, and other things.
Going on my AVR experience using an avr-libc and avr-gcc based toolchain, I would start by adding something like
jmp skip_version_number
.global version_number
.type version_number, #object
.balign 8
version_number:
.quad 0xdeadcafef00d0001
.asciz VERSION_STRING
.asciz GIT_BRANCH_NAME
.asciz GIT_DIRTY_FLAG
.asciz ""
.size version_number, . - version_number
skip_version_number:
to one of the .initN sections:
The .initN sections are linked relatively close to the front of the image, so it should not be necessary to read a lot of the flash.
It contains a 64-bit magic number aligned to an 8-byte boundary, so the version structure should be easy to find.
The magic number can easily be changed, so a different number can mark a different format when I eventually want to change it, e.g. by adding a checksum, which this does not contain yet.
Parsing a list of NUL-terminated C strings until an empty string appears should be relatively easy to write and still flexible.
Then verify the result from the disassembly dump and the map file.
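On the reading side, the parser could look roughly like this Python sketch. It assumes the flash was dumped as a raw binary starting at address 0 (so file offsets equal flash addresses) and that the .quad ends up little-endian, as it does on AVR; find_version and parse_strings are hypothetical names.

```python
import struct

# The .quad from the assembly snippet, as stored in little-endian flash.
MAGIC = struct.pack("<Q", 0xDEADCAFEF00D0001)


def parse_strings(flash, pos):
    """Read NUL-terminated strings from pos until an empty string appears."""
    strings = []
    while True:
        end = flash.find(b"\0", pos)
        if end == -1:
            raise ValueError("unterminated string in version block")
        if end == pos:                      # empty string ends the list
            return strings
        strings.append(flash[pos:end].decode("ascii"))
        pos = end + 1


def find_version(flash):
    """Scan 8-byte-aligned offsets for the magic number; None if absent."""
    for offset in range(0, len(flash) - len(MAGIC) + 1, 8):
        if flash[offset:offset + 8] == MAGIC:
            return parse_strings(flash, offset + 8)
    return None


# Hypothetical flash dump: filler, the version block, more filler.
image = b"\xff" * 16 + MAGIC + b"1.2.3\0main\0dirty\0\0" + b"\xff" * 8
print(find_version(image))
```

Because only aligned offsets are checked, the scan stays cheap even if you end up reading more of the flash than strictly necessary.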
Adding the checksum might involve generating a binary file for the payload at build time, then calculating that file's checksum as a separate binary file, and replacing the above content definitions with
.balign 8
version_number:
.incbin "version_number.bin"
.balign 8
.incbin "version_number_checksum.bin"
.size version_number, . - version_number
This can be made arbitrarily complex, but I hope this gives a few ideas.
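One possible build step for generating those two files, sketched in Python. Everything specific here is an assumption: the new magic value, the choice of CRC-32 via zlib.crc32 as the checksum, and the helper names build_payload, build_checksum and write_files. Only the two file names come from the .incbin lines.

```python
import os
import struct
import zlib

# Hypothetical magic number for the checksummed format.
MAGIC = 0xDEADCAFEF00D0002


def build_payload(strings):
    """Magic number, NUL-terminated strings, then an empty string."""
    body = b"".join(s.encode("ascii") + b"\0" for s in strings) + b"\0"
    return struct.pack("<Q", MAGIC) + body


def build_checksum(payload):
    """CRC-32 of the payload as 4 little-endian bytes (an assumed choice)."""
    return struct.pack("<I", zlib.crc32(payload))


def write_files(directory, strings):
    """Write the two files named in the .incbin directives."""
    payload = build_payload(strings)
    with open(os.path.join(directory, "version_number.bin"), "wb") as f:
        f.write(payload)
    with open(os.path.join(directory, "version_number_checksum.bin"), "wb") as f:
        f.write(build_checksum(payload))


payload = build_payload(["1.2.3", "main", "dirty"])
print(payload.hex(), build_checksum(payload).hex())
```

A reader would parse the strings as before, round the end position up to the next multiple of 8 to match the .balign 8, and compare the four bytes found there with its own CRC of the payload.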
Thanks to Juraj for the suggestion.
Writing the version to the EEPROM with something like this gets the job done:
write_version:
avrdude -U eeprom:w:0x76,0x31,0x2e,0x31,0x32,0x33:m
read_version:
avrdude -U eeprom:r:-

Intel hex format and position independent code using gcc

I'm not sure if this is specific to the processor I'm using, so for what it's worth I'm using a Cortex M0+. I was wondering: if I generate a hex file through gcc using -fPIC, I produce...Position Independent Code. However, the intel hex file format that I get out of objcopy always has address information on each line's header. If I'm trying to write a bootloader, do I just ignore that information, skip the bytes relating to it, and load the actual code into memory wherever I want, or do I have to keep track of it somehow?
The Intel HEX format was designed for programming PROMs, EPROMs, or processors with an internal EPROM, and is normally used with programmers for these devices. The addresses at the beginning of the records have little to do with the program code itself. They indicate the address in the PROM at which the data will be written. Remember also that the PROM can be mapped anywhere into the address space of the processor, so the final address can change anyway.
If you are not programming a PROM, you must strip everything except the data from the records. (Don't forget to drop the checksum at the end ;-)
As I understand the Intel HEX format, the records need not be contiguous; there may be holes in between.
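As an illustration of what "only the data" means, here is a minimal Python sketch that splits records into their fields, keeps the data of type-00 records, and reports holes between them. The three sample records are made up for this example, extended-address records are ignored for brevity, and the function names are hypothetical.

```python
def parse_ihex_record(line):
    """Split one Intel HEX record into (type, 16-bit address, data)."""
    line = line.strip()
    if not line.startswith(":"):
        raise ValueError("record must start with ':'")
    raw = bytes.fromhex(line[1:])
    if len(raw) < 5 or len(raw) != raw[0] + 5:
        raise ValueError("length field does not match record")
    if sum(raw) & 0xFF:                     # all bytes incl. checksum sum to 0
        raise ValueError("bad checksum")
    return raw[3], int.from_bytes(raw[1:3], "big"), raw[4:-1]


def data_records(text):
    """Return (address, data) for every type-00 record, ignoring the rest."""
    records = []
    for line in text.splitlines():
        if line.strip():
            rec_type, address, data = parse_ihex_record(line)
            if rec_type == 0x00:
                records.append((address, data))
    return records


def holes(records):
    """Return (start, end) address gaps between consecutive data records."""
    ordered = sorted(records)
    gaps = []
    for (addr, data), (next_addr, _) in zip(ordered, ordered[1:]):
        if addr + len(data) < next_addr:
            gaps.append((addr + len(data), next_addr))
    return gaps


# Hypothetical file: 4 bytes at 0x0000, 2 bytes at 0x0010, end-of-file record.
SAMPLE = ":0400000001BE8110AC\n:02001000204D81\n:00000001FF\n"

recs = data_records(SAMPLE)
print(recs, holes(recs))
```

A bootloader that relocates position-independent code would keep the data and the offsets between records, and add its own load address instead of the one in the file.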
Some remarks:
The -fPIC option is not responsible for the Intel HEX format. I think that somewhere in your command lines you'll find -O ihex. If you want a file that can be executed, objcopy provides better-suited output formats.
As long as you don't write earlier stages of the boot process by yourself, you don't load your bootloader - it will be loaded for you. The address at which this will happen is normally fixed and not changeable. So there is no need for position independent code, but it doesn't hurt either.

writing hex file to RAM in ARM Cortex-M

I am doing an ongoing project to write a simplified OS for hobby/learning purposes. I can generate hex files, and now I want to write a script on the chip to accept them via serial, load them into RAM, then execute them. For simplicity I'm writing in assembly so that all of the startup code is up to me. Where do I start here? I know that the hex file format is well documented, but is it as simple as reading the headers for each line, aligning the addresses, then putting the data into RAM and jumping to the address? It sounds like I need a lot more than that, but this is a problem that most people don't even try to solve. Any help would be great.
This is way too vague. There are many different file formats, and at least two really popular ones that use text with the data in hex, so "hex file" does not really tell us which one you mean.
Writing a script on the chip implies you have an operating system running on your microcontroller. What operating system is it, what does the command line look like, and so on?
Assembly is not required to control everything completely (basically bare metal). You can use assembly to bootstrap C and then write the rest in C; that is not a problem.
Do you want to download to RAM and run, or download and then burn to flash so that you can reset into it in some way?
Basically what you are making is a bootloader. And yes we write bootloaders all the time, one for each platform, sometimes borrowing code from a prior platform sometimes not.
First off, on your development computer (Windows, Mac, Linux, whatever), write a program (ideally in C or Pascal, something you can easily port to the microcontroller) that reads the whole file into an array. Then write some code that accepts one byte at a time, as you would if you were receiving it serially, and parses whichever file format you chose (you can change formats later if you decide you no longer like it). Feed it real programs that you have built; the disassembler or other tools should have output options that show which bytes or words should land at which addresses. Parse the data, printf the address/byte or address/word items you find, and compare them with what the toolchain showed. Then carve the parser out and replace the printf with a write to memory at that address. And yes, you then jump to the entry point, if you can figure it out, or you as the designer decide that all programs must have a specific entry point.
Intel HEX and Motorola S-record are good choices (-O ihex or -O srec; my current favorite is --srec-forceS3 -O srec), and both are documented on Wikipedia. At least the GNU tools will produce both. You can use a dumb terminal program (minicom) to dump the file into your microcontroller and hopefully parse it and write to RAM as it comes in. If you can't keep up with that flow, you might consider a raw binary (-O binary) and implement an XMODEM receiver in your bootloader.
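A host-side prototype of that byte-at-a-time approach might look like the following Python sketch, here for Intel HEX. It is only an illustration: HexLoader is a made-up class, the dictionary stands in for RAM writes, only record types 00, 01 and 04 are handled, and the sample image with its 0x20000000 base is hypothetical.

```python
class HexLoader:
    """Accepts one byte at a time, as a serial receive routine would."""

    def __init__(self):
        self.memory = {}          # address -> byte, stand-in for RAM
        self.line = bytearray()   # characters of the record being received
        self.upper = 0            # upper 16 address bits (type-04 records)
        self.done = False         # set when the end-of-file record arrives

    def feed(self, byte):
        if byte in (0x0A, 0x0D):  # LF or CR ends a record
            if self.line:
                self._record(self.line.decode("ascii"))
                self.line.clear()
        else:
            self.line.append(byte)

    def _record(self, text):
        if not text.startswith(":"):
            raise ValueError("record must start with ':'")
        raw = bytes.fromhex(text[1:])
        if len(raw) < 5 or len(raw) != raw[0] + 5 or sum(raw) & 0xFF:
            raise ValueError("malformed record or bad checksum")
        offset, rec_type, data = int.from_bytes(raw[1:3], "big"), raw[3], raw[4:-1]
        if rec_type == 0x00:      # data: on the target, write to RAM here
            base = (self.upper << 16) + offset
            for i, value in enumerate(data):
                self.memory[base + i] = value
        elif rec_type == 0x01:    # end of file
            self.done = True
        elif rec_type == 0x04:    # extended linear address
            self.upper = int.from_bytes(data, "big")


# Hypothetical image: 4 bytes loaded at 0x20000000.
IMAGE = ":020000042000DA\n:0400000001BE8110AC\n:00000001FF\n"

loader = HexLoader()
for byte in IMAGE.encode("ascii"):
    loader.feed(byte)
for address in sorted(loader.memory):
    print(f"0x{address:08X} : 0x{loader.memory[address]:02X}")
```

The printed address/byte pairs are what you would compare against the toolchain's own dump before replacing the dictionary write with a real store to memory.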

Calculating size of code part of the C file

With reference to this:
calculating FLASH utilisation by C code
I have decided to check the calculation by counting the actual assembly instructions.
So my script counts the assembly instructions that belong to the feature-enable code in the assembly listing file.
e.g.
if(TRUE == feature1_enable)
{
// instruction counting starts here
doSomething;
.
.
.
// instruction counting stops here
}
This gives me some counts x from which I can figure out the size of the code.
To cross check the result I decided to nm the object file of the feature code but nm gives the size of entire function and not the individual statements.
So I copied the code part for that feature in separate file, made the function of it, included necessary headers and declared variables to get this file compile (by taking care of locals would remain locals and globals would remain globals).
so the new file looks like this:
#include "header1.h"
#include "header2.h"
global_variables;
void checkSize( void )
{
local_variables;
// feature1_enable code
doSomething;
.
.
.
}
Now the function checkSize contains only the feature-enable code, so after compiling, if I run nm on the object file, I should get almost the same result as the assembly count (apart from some extra size used by the function setup).
But that is not the case; I got a huge difference (1335 bytes from the assembly instructions versus 1458 bytes from nm on the object file).
For further clarification, I generated the assembly for the file containing checkSize and compared it with the original assembly file.
I understand there is some extra code due to the addition of the checkSize function, but I expected the instructions of the feature-enable code to be the same (with the same compiler optimization and other options).
But they were not the same.
Now the question is: why is there such a difference in the assembly instructions for the feature code inside the big function compared with when I move the feature code alone into another file?
Is there any way to predict the extra size in either case?
There could be several things happening here. To be sure, you are going to have to read the actual assembly code and figure out what it is doing. The compiler is VERY clever when you have it set to a high optimization level. For example, in your first code segment it is very possible for the compiler to have assembly statements outside of your
// instruction counting starts here
// instruction counting stops here
comments that perform work belonging between the comments. In your second example that optimization is not possible, and all the work needs to be done in the function. Also, do not discount the amount of space that function prologues and epilogues take. Depending on the instruction set of your processor and its stack and register usage, it can be quite large. For example, on PowerPC there is no push-many-registers instruction, and you have to push and pop each individual register when entering and leaving a function. When you're dealing with 32 registers, that can be quite a bit of code.
You could try a trick when you have high optimization levels set for your compiler. The compiler does not look inside "asm" statements, so labels placed in them end up in the object file (the optimizer may still move surrounding code across them, so check the listing). What you could do is put some dummy code in the "asm" statements. I personally like creating global symbols that are in the object file. That way I can get the address of the starting symbol and ending symbol and calculate the size of the code in between. It looks something like this...
asm(" .globl sizeCalc_start");
asm(" sizeCalc_start: ");
// some code
asm(" .globl sizeCalc_end");
asm(" sizeCalc_end:");
Then you can do something in a function like
// declared as char arrays so that the difference is counted in bytes
extern char sizeCalc_start[];
extern char sizeCalc_end[];
printf("Code Segment Size %d\r\n", (int)(sizeCalc_end - sizeCalc_start));
I've done this in the past and it worked. I have not tried to compile this exact snippet, so you may need to adjust it a bit to get what you want.
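If you would rather not run code on the target, the same two symbols can be read from the nm output of the object file. A minimal Python sketch, assuming the usual three-column "address type name" lines; the NM_OUTPUT sample and its addresses are invented for illustration.

```python
def symbol_addresses(nm_output):
    """Map symbol name -> address from 'address type name' lines."""
    symbols = {}
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 3:       # undefined symbols have no address column
            address, _kind, name = parts
            symbols[name] = int(address, 16)
    return symbols


# Hypothetical nm output for an object file containing the two labels.
NM_OUTPUT = """\
00000000 T checkSize
00000010 T sizeCalc_start
0000005c T sizeCalc_end
         U doSomething
"""

symbols = symbol_addresses(NM_OUTPUT)
size = symbols["sizeCalc_end"] - symbols["sizeCalc_start"]
print(f"code between the labels: {size} bytes")
```

This only measures the distance between the labels, so code that the optimizer moved outside of them is still not counted.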
Optimization is tricky. Within a big function (and the big file) the compiler has wider context, and may optimize more aggressively - reuse common expressions, pick shorter forms of branches, etc (hard to say exactly without knowing your target architecture).
PS: I am not quite sure how you go from the assembly instruction count to the byte count.

Automating linker configuration in IAR Embedded Workbench

I am working on a firmware project in which I have to do a CRC16 check for flash integrity.
The CRC is calculated by the IAR XLINK linker and kept at the end of the flash. The CRC is calculated again at run time by the code and compared with the value stored in the flash to check integrity. However, we can only calculate the CRC over the code segment of the flash memory. Its size may change whenever we make changes in the code. Can I automate this process, which I am doing manually right now?
from the .xcl linker file:
// ---------------------------------------------------------
// CRC16 Essentials: -H for fill,-J for checksum calculation
// ---------------------------------------------------------
-HFF
-J2,crc16,,,CHECKSUM2,2=(CODE)5C00-FF7F;(CODE)10000-0x20A13
Here I need to change the end value of the second code segment, which is 0x20A13 right now.
I get this value from the .map file, i.e. from the memory range my code occupies in the flash.
This is the first change I make.
The second change I need to make is in the code:
sum = fast_crc16(sum, 0x5C00, 0xFF7F-0x5C00+1);
sum = fast_crc16(sum, 0x10000,0x20A13-0x10000+1);
//Check the crc16 values
if(sum != __checksum)
{
// Action to be taken if checksum doesn't match
}
Please help me automate this process!
You can try to use the __segment_begin and __segment_size or __segment_end intrinsics in IAR which are explained in the "C/C++ Compiler Reference Guide", which you can get to from your Help menu in IAR EW430. The manual says they work with segments defined in the linker file, and plenty of the people around the internet seem to be using it like that, but I tried and got compiler errors (IAR EW430 5.40.7). If that is somehow broken you might want to report it to IAR and get a fix (assuming you have a support contract).
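You can use them like this (the segment has to be declared with #pragma segment before the intrinsics are used):
#pragma segment="CODE"
sum = fast_crc16(sum, __segment_begin("CODE"), __segment_size("CODE"));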
I don't know what happens with split segments. But why would you exclude your reset vectors from your checksum calculation? You could just go from the start of CODE to the end and include the reset vectors.
I guess you could structure your code like this:
sum = fast_crc16(sum, __segment_begin("CODE"), (char *)__segment_begin("INTVEC") - (char *)__segment_begin("CODE") + 1);
sum = fast_crc16(sum, 0x10000, (char *)__segment_end("CODE") - 0x10000);
Also, you may or may not have noticed that the __checksum variable is put into memory wherever it fits. I found it lurking after my DATA16_ID segment, which put it right in the middle of the range of my checksum code, and I did not know of a way to automate skipping sections of memory for the checksum calculation. What I did was force __checksum into the first two bytes of flash by defining a segment for those two bytes and placing it there.
Edit: Missed the first change. If you are manually adjusting the range of the IAR linker checksum routine, then to be able to use the segment intrinsics from the compiler you would need to define a custom segment in your linker file that marks the end of your code.
I don't know if there's any way to automate that. You might need to compile your code twice (ugh) once with the segment unlimited to get the end of the code, then use a script to extract the end of code and then update a linker script. You could probably run the initial build on a pre-build command line event and just build the IAR project with an unrestricted linker file. But that seems pretty ugly.
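The "update a linker script" half of that could be a small script run between the two builds. Here is a Python sketch that rewrites the end address of the last (CODE) range on the -J line; getting the end address out of the .map file is left out because it depends on the map format, and set_code_end and the 0x21FFF value are hypothetical.

```python
import re

# Matches the -J line up to the start of the last (CODE) range, then its end.
J_LINE = re.compile(
    r"^(-J.*\(CODE\)[0-9A-Fa-fx]+-)(?:0x)?[0-9A-Fa-f]+[ \t]*$",
    re.MULTILINE,
)


def set_code_end(xcl_text, end_address):
    """Return xcl_text with the last (CODE) range ending at end_address."""
    new_text, count = J_LINE.subn(
        lambda m: f"{m.group(1)}0x{end_address:X}", xcl_text
    )
    if count != 1:
        raise ValueError("expected exactly one -J line with a (CODE) range")
    return new_text


# The linker lines from the question.
XCL = """\
// CRC16 Essentials: -H for fill,-J for checksum calculation
-HFF
-J2,crc16,,,CHECKSUM2,2=(CODE)5C00-FF7F;(CODE)10000-0x20A13
"""

updated = set_code_end(XCL, 0x21FFF)  # hypothetical new end of code
print(updated)
```

The run-time code would still need the same end address, for example through a generated header, which is part of why this route stays ugly.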
Perhaps you can also change your solution to build the CRC over the complete flash area reserved for the application, not only over the used part.
Then you never need to change your linker file or your C code, and even a bootloader could calculate the CRC without knowing the actual size of the application.
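To illustrate the idea on the host side, here is a small Python sketch. It uses binascii.crc_hqx (CRC-CCITT, polynomial 0x1021) purely as a stand-in; it is not claimed to match what XLINK's crc16 produces bit for bit. The 0xFF padding mirrors the -HFF fill option, and the region size and code bytes are made up.

```python
import binascii

REGION_SIZE = 64   # hypothetical size of the flash reserved for the application


def padded_image(code, region_size=REGION_SIZE, fill=0xFF):
    """Pad the code with the fill byte up to the full reserved region."""
    if len(code) > region_size:
        raise ValueError("code does not fit into the reserved region")
    return code + bytes([fill]) * (region_size - len(code))


def region_crc(image):
    """CRC over the whole region; stand-in algorithm, see the note above."""
    return binascii.crc_hqx(image, 0)


# Two hypothetical builds of different size: the checked range stays the same.
for code in (b"\x01\x02\x03", b"\x01\x02\x03\x04\x05\x06\x07"):
    image = padded_image(code)
    print(f"{len(code)} code bytes, {len(image)} bytes checked, "
          f"crc=0x{region_crc(image):04X}")
```

The range passed to the CRC routine is a constant of the memory layout, so neither the linker file nor the run-time check has to change when the code grows.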
