C code that checksums itself *in ram* - c

I'm trying to get a ram-resident image to checksum itself, which is proving easier said than done.
The code is first compiled on a cross development platform, generating an .elf output. A utility is used to strip out the binary image, and that image is burned to flash on the target platform, along with the image size. When the target is started, it copies the binary to the correct region of ram, and jumps to it. The utility also computes a checksum of all the words in the elf that are destined for ram, and that too is burned into the flash. So my image theoretically could checksum its own ram resident image using the a-priori start address and the size saved in flash, and compare to the sum saved in flash.
That's the theory anyway. The problem is that once the image begins executing, there is change in the .data section as variables are modified. By the time the sum is done, the image that has been summed is no longer the image for which the utility calculated the sum.
I've eliminated change due to variables defined by my application, by moving the checksum routine ahead of all other initializations in the app (which makes sense b/c why run any of it if an integrity check fails, right?), but the killer is the C run time itself. It appears that there are some items relating to malloc and pointer casting and other things that are altered before main() is even entered.
Is the entire idea of self-checksumming C code lame? If there was a way to force app and CRT .data into different sections, I could avoid the CRT thrash, but one might argue that if the goal is to integrity check the image before executing (most of) it, that initialized CRT data should be part of that. Is there a way to make code checksum itself in RAM like this at all?
FWIW, I seem stuck with a requirement for this. Personally I'd have thought that the way to go is to checksum the binary in the flash, before transfer to ram, and trust the loader and the ram. Paranoia has to end somewhere right?
Misc details: tool chain is GNU, image contains .text, .rodata and .data as one contiguously loaded chunk. There is no OS, this is bare metal embedded. Primary loader essentially memcpy's my binary into ram, at a predetermined address. No relocations occur. VM is not used. Checksum only needs testing once at init only.
updated
Found that by doing this..
__attribute__((constructor)) void sumItUp(void) {
// sum it up
// leave result where it can be found
}
.. that I can get the sum done before almost everything except the initialization of the malloc/sbrk vars by the CRT init, and some vars owned by "impure.o" and "locale.o". Now, the malloc/sbrk value is something I know from the project linker script. If impure.o and locale.o could be mitigated, might be in business.
update
Since I can control the entry point (by what's stated in flash for the primary loader), it seems the best angle of attack now is to use a piece of custom assembler code to set up stack and sdata pointers, call the checksum routine, and then branch into the "normal" _start code.

If the checksum is done EARLY enough, you could use ONLY stack variables, and not write to any data-section variables - that is, make EVERYTHING you need to perform the checksumming [and all preceding steps to get to that point] ONLY use local variables for storing things in [you can read global data of course].
I'm fairly convinced that the right way is to trust the flash & loader to load what is in the flash. If you want to checksum the code, sure, go and do that [assuming it's not being modified by the loader of course - for example runtime loading of shared libraries or relocation of the executable itself, such as random virtual address spaces and such]. But the data loaded from flash can't be relied upon once execution starts properly.
If there is a requirement from someone else that you should do this, then please explain to them that this is not feasible to implement, and that "the requirement, as it stands" is "broken".

I'd suggest approaching this like an executable packer, like upx.
There are several things in the other answers and in your question that, for lack of a better term, make me nervous.
I would not trust the loader or anything in flash that wasn't forced on me.
There is source code floating around on the net that was used to secure one of, I think, HTCs recent phones. Look around on forum.xda-developers.com and see if you can find it and use it for an example.
I would push back on this requirement. Cellphone manufacturers spend a lot of time on keeping their images locked down and, eventually, all of them are beaten. This seems like a vicious circle.

Can you use the linker script to place impure.o and locale.o before or after everything else, allowing you to checksum everything but those and the malloc/sbrk stuff? I'm guessing malloc and sbrk are called in the bootloader that loads your application, so the thrash caused by those cannot be eliminated?
It's not an answer to just tell you to fight this requirement, but I agree that this seems to be over-thought. I'm sure you can't go into any amount of detail, but I'm assuming the spec authors are concerned about malicious users/hackers, rather than regular memory corruption due to cosmic rays, etc. In this case, if a malicious user/hacker can change what's loaded into RAM, they can just change your checksumming routine (which is itself running from RAM, correct?) to always return a happy status, no matter how well the checksum routine they aren't running anymore is designed.
Even if they are concerned about regular memory corruption, this checksum routine would only catch that if the error occurred during the original copy to memory, which is actually the least likely time such an error would occur, simply because the system hasn't been running long enough to have a high probability of a corruption event.

In general, what you want to do is impossible, since on many (most?) platforms the program loader may "relocate" some program address constants.

Can you update the loader to perform the checksum test on the flash resident binary image, before it is copied to ram?

Related

Is is possible to have a linker generate a binary, that is as close to the previously generated one as possible?

Here is the context: I work with microcontrollers, where the resulting binary is written to an internal flash, and executed from there. The flash is erased and written in 4KiB blocks. The flasher is smart enough to skip over blocks that do not need to change.
During development, it is typical to make minor changes and recompile and re-flash dozens of times a day.
What I want to achieve is, if possible, have the linker keep the existing structure as much as possible. For example, if I remove an if-check somewhere, the code gets a few bytes smaller, and all subsequent information becomes shifted by a few bytes, hence the entire flash gets erased and reprogrammed.
If, somehow, the linker was able to add padding between the linked symbols, a big are of the flash could often be skipped without overwriting. This will obviously not work when the code gets bigger.
Are there any gcc options that can help with automating such tasks (without handcrafting massive linker scripts)? Are there any compilers that have at least partial support for such behaviours?

How to stop RAM test destroying its own variables embedded C?

During startup tests I am required to test all RAM locations using a galpat test, I have wrote the function to do this but run into a problem that the functions variables exist in RAM and therefore get mashed as part of the test.
What would be the best way around this?
A possible approach could be - especially taking care of the processor stack, .data and .bss is something you can avoid - but there is no easy way to have C work without a proper stack:
Write your code that it exclusively uses ROM code and stack variables.
Have your startup code (that would normally be written in assembler anyhow) allocate the stack in the upper half of memory, test the lower half
move (copy) the stack from upper memory to already tested areas (can be done in C)
Reset the stack pointer to point to the copied stack (involves assembler coding)
Do the rest of the memtest (can be done in C again)
(This assumes your code runs from ROM, which it would normally do at such early point in the start-up). In case there is any memory failure in areas where you allocate the stack, your code will simply crash (What it does before that - Reactor meltdown... - depends on the application).
When moving the stack around outside C's control, you should be very careful what you actually store there - Pointers to stack variables will become invalid - or rather undefined - once you've moved the stack. Simple scalar variables and pointers to outside the stack as typically used in a RAM test should work, however.
You could try and declare your variables as register to try and keep RAM usage as low as possible - But you can't force C to put certain variables into registers, and a good C compiler will put them there anyhow.
Whether this is any better than writing the whole memtest in assembler (you'd need to do the stack adjustments in assembler anyhow, as there is no means to move the processor stack around in C) I dare to challenge. I don't see much of a point here using C on this low level, especially as assembler could run a memtest routine completely from registers, without using any RAM. This makes it much more immune to any RAM problem. A RAM testing routine shouldn't rely on working memory.
The bottom line is you cannot, you have two choices, you can either only test part of the ram or part of the ram at a time which means you are not doing a complete address test. Or you dont run from the ram you are testing, which is basically the rule if you really want to test the ram. So you have to run from rom with out using a stack in the ram under test or you use another ram, perhaps there is a cache somewhere that can be used direct access to give you a little ram.
Testing half or some other fraction at a time, which is not a complete test, but better than not testing part of it at all, can be done with either a position independent module or with multiple compiles of the test that are position dependent. No reason for the stack to be an issue, the rom based code copying and jumping to the code under test can set the stack pointer based on the fraction under test or not under test, and then repeat. Treat the module like a function not like an entire program and the preserving the stack or "moving the test" problem goes away it returns back to the rom based code which can relaunch further tests.
One crazy way to do it would be to attempt to turn on the I cache get the test code into the cache (before it hits itself), and then blast away at the ram including the code behind the ram. (cant have stack) I would only try this as a fun experiment but not for anything real. Lots of problems to solve with an approach like this.
My approach would be this:
Do the test before anything is initialized (variables, stack). In gcc you can: void RAM_test(void) _attribute_ ((section (".init0")));
Write it in assembly. Ensure that the test does not use/store any variables in the RAM, only uses processor registers.
Store the result somewhere so you can use it later in normal program.
If you do have ROM space and can afford the time at boot time, I would implement the test in assembly or in a naked C function using directives to put your variables in registers so that no RAM is consumed as part of the test. This is going to be pretty architecture and compiler specific, though, and neither of those were mentioned.

Need to migrate my firmware image to ROM mask

I need to create a ROM mask that provides some functions. However, it should be possible to overwrite the functions for providing firmware patches. Therefore, a patch table should be located in a Flash memory that may be overwritten by later firmware upgrades, whereas the main part of the firmware is located in mask ROM and cannot be modified later.
Does anyone have any idea how this is done? what is the best practice to create patch tables?
Patch table - Basically your ROM has one more built in jumps to a RAM (Flash) adddresses built into it. The flash always has some sort of jump back to the loction just after your ROM jump call to return to your original code. You now have the ability to change your program's behavior from RAM. This assumes you are allowed to run code from RAM of course. If not, then only data tables can be changed on the fly.
Now, just having a single jump on startup allows you to modify starting state code like version numbers or other global/constant data, but that may not be enough. You can add another jump to flash that is run every "so often" to your code as well so that it can update program state after it is running - something like once a vblank or once a application loop.
The above should give you enough lee-way to change data that has changed over time since release or perhaps to fix a slight logic error, but it won't allow you to whole-sale change functions. To do that, you need more code. And that more code depends on just how much RAM you can use.
For instance, if you had enough RAM, and running from Flash RAM didn't hurt perforamnce too much (and is allowed), you could make things very flexible by copying all your key ROM functions into RAM on startup, then calling your RAM patch jump described above, which would allow new code stored elsewhere in your RAM patch to overwrite any previously copied code. If you took this approach, you would also want to make sure that you left some extra space around your original functions to allow new padded functions a bit of room to grow. The original code to be copied to RAM could be stored compressed as well to save ROM space. This allows you to change anything after the fact. It also introduces a very easy way to exploit your code if something else is allowed to write to RAM so beware.
Hope this helps.

RAM Checksum in C language

I need to check the RAM of a MCU at startup using a checkerboard-like algorithm. I do not want to lose whatever data is already in RAM, and also i do not know how not to affect the variables im using to perform this algorithm.
I was thinking something like:
for (position=0; position< 4096; position++)
{
*Temporal = 0x5555;
if(*Temporal != 0x5555) Error = TRUE;
*Temporal = 0xAAAA;
if(*Temporal != 0xAAAA) Error= TRUE;
Temporal +=1;
}
should i modify the linker to know where Temporal and Error are being placed?
Use the modifier "register" when declaring your pointers (as in "register int *"). Registers are a special segment of memory inside the processor core that (usually) does not count as part of RAM, so any changes to them don't count as RAM read/writes. Some exceptions exist; for instance, in AVR microcontroles, the first 32 bytes of RAM are "faked" into registers.
Your question probably got a downvote for the lack of clarity, and the question about retaining whatever was in RAM being easily solved by most beginner C programmers (just copy the contents of the pointer into a temp variable before testing it, and copying back after the end of the test). Also, you're not performing a Checksum: you're just making a Memory Test. These are very different, separate things.
You need to ensure the memory that you are testing is in a different location than the memory containing the program you are running. You may be able to do this by running directly out of flash, or whatever other permanent storage you have connected. If not, you need to do something with the link map to ensure proper memory segmentation.
Inside the test function, ussing register as MVittiS suggests is a good idea. Another alternate is to use a global variable mapped to a different segment than the one under test.
You may want to read this article about memory testing to understand the limitations of the test you propose, how memory can fail, and what you should be testing for.
I don't know (not enough details) but your memory test may be a cache test and might not test any RAM at all.
Your memory test (if it does test memory) is also very badly designed. For example, you could chop (or short out) all of the address lines and all data lines except for the least significant 2 data lines, and even though there'd be many extremely serious faults the test will still pass.
My advice would be to read something like this web page about memory testing, just to get some ideas and background information: http://www.esacademy.com/en/library/technical-articles-and-documents/miscellaneous/software-based-memory-testing.html
In general, for non-destructive tests you'd copy whatever you want to keep somewhere else, then do the test, then copy the data back. It's very important that "somewhere else" is tested first (you don't want to copy everything into faulty RAM and then copy it back).
I would also recommend using assembly (instead of C) to ensure no unwanted memory accesses are made; including your stack.
If your code is in RAM that needs to be tested then you will probably need 2 copies of your RAM testing code. You'd use the second copy of your code when you're testing the RAM that contains the first copy of your code. If your code is in ROM then it's much easier (but you still need to worry about your stack).

What is the deal with position-independent code (PIC)?

Could somebody explain why I should be interested in compiling position-independent code, and also why should I avoid it?
Making code position-independent adds a layer of abstraction, which requires an additional lookup step at runtime for certain operations (usually pertaining to accessing variables with static storage).
So if you don't need it, don't use it!
There are specific situations where you must produce PIC (namely when creating run-time loadable code, such as a plug-in module or library), but the added flexibility comes at a price.
The gory details depend on how your loader works on on whether you are building a executable or a library, but there is a sense in which this is all a problem for the build system and the compiler, not for you.
If you really want to understand you need to consider where the code gets put in the address space before execution starts and what set of branching instructions your chip provides. Are branches relative or absolute? Is access to the data segment relative or absolute?
If branches are absolute, then the code must be loaded to a reliable address or it won't work. That's position dependent code. Many simple (or older) operating systems accommodate this by always loading a program to the same place.
Relative branches mean that the can be placed at any location in memory. That is position independent code.
Again, all you need to know is the recipe for invoking your compiler and linker on your platform.
PIC code usually has to be slightly larger because the compiler can't use instructions that encode relative address offsets. Without PIC, many addresses can be encoded with 16 or 8 bits relative to current PC. Sometimes in embedded systems, PIC is useful. For example if you want to have patch code that can run at various physical addresses.

Resources