Robust incremental CRC32 - c

I am implementing a flash integrity check for an STM32F10x MCU-based board. This check needs to process the program flash one block at a time, i.e. one 64-byte block per CRC32 invocation, because it needs to be periodic and interruptible.
Unfortunately, the CRC hardware in the STM32F10x doesn't allow configuring the initial checksum for the CRC32, so effectively, for each block the CRC32 algorithm starts with 0xFFFFFFFF.
I am looking for a way to "merge" these checksums to produce a signature that is more robust than a trivial XOR of all the checksums. The XOR solution is insensitive to swapping the order of blocks and, of course, suffers from the traditional "an even number of identical errors cancels itself out" kind of problem.
What would be a better way of integrity checking of blocks sequence in this case? Would the following pseudo-code be "good enough":
checksum = 0xffffffff;
for (idx = 0; idx < size_in_blocks; idx++)
{
    CRC_ResetDR();
    checksum = (checksum >> 8) ^ CRC_CalcBlockCRC(blocks[idx], block_size);
}
We can't calculate the whole flash CRC32 in one go, as it takes too long, and the CRC HW block is used by other entities in the OS as well, i.e. it can be reset in the middle.
NOTE: there is no need to reconstruct the exact CRC32 as if the whole flash were one single CRC32 input block. If that can be done, even better, but the scheme only needs to be reliable enough to give reasonable assurance that the flash has not been manipulated.
Edited: Clearly, CRC32 is not a cryptographic hash, so the word "cryptography" was removed from the title.
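One possible direction (an editorial sketch of an order-sensitive combiner, not something from the question itself): treat the sequence of per-block hardware CRCs as data and fold each one into a running software CRC32, so that swapped blocks, or errors that cancel across blocks, still change the final signature.
#include <stdint.h>
/* Software CRC-32 update step (same polynomial as the STM32 peripheral,
 * 0x04C11DB7, MSB first, no reflection), used only to fold the per-block
 * hardware results together in an order-dependent way. */
static uint32_t combine_step(uint32_t acc, uint32_t block_crc)
{
    acc ^= block_crc;
    for (int bit = 0; bit < 32; bit++)
        acc = (acc & 0x80000000u) ? (acc << 1) ^ 0x04C11DB7u : (acc << 1);
    return acc;
}
/* Usage sketch, mirroring the loop from the question: */
signature = 0xFFFFFFFFu;
for (idx = 0; idx < size_in_blocks; idx++)
{
    CRC_ResetDR();
    signature = combine_step(signature, CRC_CalcBlockCRC(blocks[idx], block_size));
}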

Related

How does a bootloader know the "expected" CRC value?

Let's say we have some firmware, and a bootloader. When we flash both onto the device, during boot, the bootloader would know some "expected" CRC from the binary firmware image. The bootloader would compare the expected vs. actually calculated CRC value from the binary firmware image. If they're equal, it jumps to the firmware application startup address, and if not, it just stays in the bootloader.
What I'm confused by is how the bootloader would know some "expected" CRC value. How does a discrepancy arise between an incorrect CRC value and the expected one? And where does the "expected" one come from?
I use two methods.
The CRC is stored somewhere in the binary image. The bootloader calculates the CRC of the image and compares it with that value. If they match, the image is good and can be executed.
The same CRC value is always used, and some additional data is appended to the image so that it checksums to that value. This requires slightly more complicated post-build steps.
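A minimal sketch of the first method, with assumed layout (the addresses, the "length plus CRC in the last two words" convention, and the crc32() routine are all assumptions for illustration, not from the answer): the post-build step stores the image length and its CRC32 at the end of the application area, and the bootloader recomputes and compares.
#include <stdint.h>
#define APP_BASE  0x08004000u   /* assumed application start address    */
#define APP_SIZE  0x0001C000u   /* assumed size of the application area */
/* Whatever CRC-32 routine both the post-build step and the bootloader agree on. */
uint32_t crc32(const uint8_t *data, uint32_t length);
int application_is_valid(void)
{
    const uint32_t *end = (const uint32_t *)(APP_BASE + APP_SIZE);
    uint32_t image_length = end[-2];   /* bytes covered by the CRC, stored post-build */
    uint32_t expected_crc = end[-1];   /* CRC stored by the post-build step           */
    if (image_length > APP_SIZE - 8u)  /* sanity check before trusting the length     */
        return 0;
    return crc32((const uint8_t *)APP_BASE, image_length) == expected_crc;
}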
CRCs are calculated following a well-known formula, so the bootloader applies that formula to the full boot record to get what is called the actual value (depending on the software and the CRC algorithm, some programs include the CRC code embedded in the data in that calculation and some do not), and it compares this with the expected value, which is the code that comes in the data.
Other programs store in the CRC field a value derived from the calculated CRC that forces the algorithm to return a fixed value (which depends on the algorithm but is always the same, e.g. zero). This simplifies verification: the algorithm just calculates the CRC over the full data, and if the data has not been touched (modified), the fixed value, e.g. zero, results.
If you are dealing with some established protocol that defines a CRC algorithm to calculate and verify the data in a boot record, you need to look for the documentation on how the CRC is calculated and stored in it. So your expected value will be described there.
With respect to CRCs, some algorithms initialize the CRC machine so as to distinguish the start of the stream from a sequence of zeros, by initializing the shift register with ones (equivalently, prepending a fixed string of ones to the bit string) and then feeding the block contents into the shift register. Others append trailing zeros and XOR the calculated CRC into that field, so that the total chain of bits always results in an all-zeros expected CRC. You need to consult the firmware provider to see how the boot record CRC is calculated, as the device will most probably refuse to boot until the CRC is properly set.

Getting CRC-32 over STM32 flash and consistency with other CRC-32 tools

I'm moving my STM32F1xx project to a solution with a bootloader.
Part of this is that I want to be able to compute a CRC value over the existing bootloader and application flash ranges to compare existing and possible upload candidates.
Using a simple implementation on the STM32 which just does the following steps:
Enable CRC peripheral
Reset the peripheral CRC value (sets to 0xFFFFFFFF)
Iterate over flash range (in this case 0x08000000 to 0x08020000) passing values to CRC peripheral
Return CRC peripheral output
uint32_t get_crc(void) {
    RCC->AHBENR |= RCC_AHBENR_CRCEN;   /* enable the CRC peripheral clock            */
    CRC->CR |= CRC_CR_RESET;           /* reset the CRC data register to 0xFFFFFFFF  */
    /* feed every 32-bit word of flash (0x08000000 up to 0x08020000) to the peripheral */
    for (uint32_t *n = (uint32_t *)FLASH_BASE; n < (uint32_t *)(FLASH_BANK1_END + 1u); n++) {
        CRC->DR = *n;
    }
    return CRC->DR;                    /* read back the accumulated CRC              */
}
The value I get from this is 0x0deddeb3.
To compare this value with something I am running the .bin file through two tools.
The value I get from npm's crc-32 is 0x776b0ea2
The value I get from a zip file's CRC-32 is also 0x776b0ea2
What could be causing this? Is there a difference between iterating over the entire flash range and the contents of the bin file (smaller than entire flash range)? The polynomial used by the STM32 is 0x04c11db7 which seems to be fairly standard for a CRC-32. Would the zip tool and npm crc-32 be using a different polynomial?
I have also tried iterating over bytes and half-words as well as words on the STM32 in case the other tools used a different input format.
There is a similar question on here already, but I'm hoping to use a node-js solution because that is the platform my interface application is being developed on.
Calculating CRCs is a minefield. Your question already has some points to look at:
Is there a difference between iterating over the entire flash range and the contents of the bin file (smaller than entire flash range)?
Yes, of course.
Would the zip tool and npm crc-32 be using a different polynomial?
The documentation will tell you. And I'm sure you can select another polynomial with these tools via an option.
Anyway, these are the things to consider when calculating CRCs:
The number of bytes (words, ...) to "sum up".
The contents of the flash not covered by the binary file, most probably all bits set to 1.
Width of the polynomial (in your case fixed to 32 bits).
Value of the polynomial.
Initial value for the register.
Whether the bits of each byte are reflected before being processed.
Whether the algorithm feeds input bytes through the register or xors them with a byte from one end and then straight into the table.
Whether the final register value should be reversed (as in reflected versions).
Value to XOR with the final register value.
Points 3 to the last are shamelessly copied from "A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS", which I suggest reading.
The polynomial is only one of several parameters that define a CRC. In this case the CRC is not reflected, whereas the standard zip CRC, using the same polynomial, is reflected. Also, that zip CRC is exclusive-or'ed with 0xffffffff at the end, whereas yours isn't. They do both get initialized the same, which is with 0xffffffff.
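For comparison, a software model of the hardware result under those parameters (an editorial sketch, assuming the flash is fed as 32-bit words exactly as in get_crc() above):
#include <stdint.h>
#include <stddef.h>
/* CRC-32, polynomial 0x04C11DB7, init 0xFFFFFFFF, no reflection, no final
 * XOR, fed one 32-bit word at a time (MSB first) -- the parameter set the
 * answer above attributes to the STM32 peripheral. */
uint32_t stm32_sw_crc(const uint32_t *words, size_t n_words)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n_words; i++) {
        crc ^= words[i];
        for (int bit = 0; bit < 32; bit++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : (crc << 1);
    }
    return crc;
}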

Is CRC32 really so bad for file integrity check?

Of course MD5 is better than CRC32, SHA1 is better than MD5, and so on... But they are also much slower than CRC32.
Right now, I am thinking about how to check the consistency of a file being transferred, and CRC32 is the fastest option.
I haven't found anywhere how bad CRC32 is for integrity checks (in other words, how probable it is that CRC32 will fail to detect a malformed file).
Quoting from http://www.mathpages.com/home/kmath458.htm :
So, if we assume that any corruption of our data affects our string in a completely random way, i.e., such that the corrupted string is totally uncorrelated with the original string, then the probability of a corrupted string going undetected is 1/(2^n). This is the basis on which people say a 16-bit CRC has a probability of 1/(2^16) = 1.5E-5 of failing to detect an error in the data, and a 32-bit CRC has a probability of 1/(2^32), which is about 2.3E-10 (less than one in a billion).
My opinion: CRC-32 is more than enough for error detection. It is widely used. However, it is not secure when you want to use it as a "hash function".
Collisions (same hash output for different data) can occur easily with CRC-32 because CRC-32 uses only 32 bits, compared to other algorithms: MD5 is 128 bits, SHA-1 is 160 bits, and SHA-2 (the SHA-256/512 series) is 224 to 512 bits, depending on which you use. Also, no collision has been found for the SHA-2 series.
For more info on the mathematics and the probability of your data colliding, look up hash collisions and the birthday paradox problem.
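To put rough numbers on the birthday effect (an editorial back-of-the-envelope estimate, not from the answers above): with k random inputs and a 32-bit checksum, the probability of at least one collision somewhere in the set is approximately p ≈ 1 - exp(-k(k-1) / (2 * 2^32)), which crosses 50% at around k ≈ 77,000 inputs, even though the chance that one specific corrupted file goes undetected stays near 1/(2^32).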

Checksum with low probability of false negative

At this moment I'm using a simple checksum scheme that just adds the words in a buffer. Firstly, my question is: what is the probability of a false negative, that is, of the receiving system calculating the same checksum as the sending system even when the data is different (corrupted)?
Secondly, how can I reduce the probability of false negatives? What is the best checksumming scheme for that? Note that each word in the buffer is 64 bits or 8 bytes in size, i.e. a long variable on a 64-bit system.
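For concreteness, the kind of scheme described above (a minimal sketch, assuming the sum simply wraps around modulo 2^64 on overflow):
#include <stdint.h>
#include <stddef.h>
/* Simple additive checksum over 64-bit words; the sum wraps modulo 2^64. */
uint64_t additive_checksum(const uint64_t *buf, size_t n_words)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n_words; i++)
        sum += buf[i];
    return sum;
}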
Assuming a sane checksum implementation, the probability of a randomly-chosen input string colliding with a reference input string is 1 in 2^n, where n is the checksum length in bits.
However, if you're talking about input that differs from the original by a low number of bits, then the probability of collision is generally much, much lower.
One possibility is to have a look at T. Maxino's thesis titled "The Effectiveness of Checksums for Embedded Networks" (PDF), which contains an analysis for some well-known checksums.
However, usually it is better to go with CRCs, which have additional benefits, such as detection of burst errors.
For these, P. Koopman's paper "Cyclic Redundancy Code (CRC) Selection for Embedded Networks" (PDF) is a valuable resource.

Calculating an 8-bit CRC with the C preprocessor?

I'm writing code for a tiny 8-bit microcontroller with only a few bytes of RAM. It has a simple job, which is to transmit 7 16-bit words, then the CRC of those words. The values of the words are chosen at compile time. The CRC specifically is "remainder of division of word 0 to word 6 as unsigned number divided by the polynomial x^8+x^2+x+1 (initial value 0xFF)."
Is it possible to calculate the CRC of those bytes at compile time using the C preprocessor?
#define CALC_CRC(a,b,c,d,e,f,g) /* what goes here? */
#define W0 0x6301
#define W1 0x12AF
#define W2 0x7753
#define W3 0x0007
#define W4 0x0007
#define W5 0x5621
#define W6 0x5422
#define CRC CALC_CRC(W0, W1, W2, W3, W4, W5, W6)
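For reference, a plain run-time version of the CRC described above is handy for cross-checking whatever constant a compile-time macro produces (an editorial sketch; feeding each 16-bit word high byte first is an assumption, since the question does not state the byte order):
#include <stdint.h>
#include <stddef.h>
/* CRC-8, polynomial x^8 + x^2 + x + 1 (0x07), initial value 0xFF, MSB first,
 * no final XOR, fed the 16-bit words high byte first. */
uint8_t crc8_words(const uint16_t *words, size_t n_words)
{
    uint8_t crc = 0xFF;
    for (size_t i = 0; i < n_words; i++) {
        uint8_t bytes[2] = { (uint8_t)(words[i] >> 8), (uint8_t)words[i] };
        for (int k = 0; k < 2; k++) {
            crc ^= bytes[k];
            for (int bit = 0; bit < 8; bit++)
                crc = (crc & 0x80) ? (uint8_t)((crc << 1) ^ 0x07) : (uint8_t)(crc << 1);
        }
    }
    return crc;
}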
It is possible to design a macro which will perform a CRC calculation at compile time. Something like
// Choosing names to be short and hopefully unique.
#define cZX(n,b,v) (((n) & (1 << (b))) ? (v) : 0)
#define cZY(n,b,w,x,y,z) (cZX(n,(b),w) ^ cZX(n,(b)+1,x) ^ cZX(n,(b)+2,y) ^ cZX(n,(b)+3,z))
/* cZ0..cZ7 would be the precomputed CRC contributions of bits 0..7 of n */
#define CRC(n) (cZY(n,0,cZ0,cZ1,cZ2,cZ3) ^ cZY(n,4,cZ4,cZ5,cZ6,cZ7))
should probably work, and will be very efficient if (n) can be evaluated as a compile-time constant; it will simply evaluate to a constant itself. On the other hand, if n is an expression, that expression will end up getting recomputed eight times. Even if n is a simple variable, the resulting code will likely be significantly larger than the fastest non-table-based way of writing it, and may be slower than the most compact way of writing it.
BTW, one thing I'd really like to see in the C and C++ standard would be a means of specifying overloads which would be used for functions declared inline only if particular parameters could be evaluated as compile-time constants. The semantics would be such that there would be no 'guarantee' that any such overload would be used in every case where a compiler might be able to determine a value, but there would be a guarantee that (1) no such overload would be used in any case where a "compile-time-const" parameter would have to be evaluated at runtime, and (2) any parameter which is considered a constant in one such overload will be considered a constant in any functions invoked from it. There are a lot of cases where a function could be written to evaluate to a compile-time constant if its parameter is constant, but where run-time evaluation would be absolutely horrible. For example:
#define bit_reverse_byte(n) ( (((n) & 128)>>7)|(((n) & 64)>>5)|(((n) & 32)>>3)|(((n) & 16)>>1)| \
  (((n) & 8)<<1)|(((n) & 4)<<3)|(((n) & 2)<<5)|(((n) & 1)<<7) )
#define bit_reverse_word(n) (bit_reverse_byte((n) >> 8) | (bit_reverse_byte(n) << 8))
A simple rendering of a non-looped single-byte bit-reverse function in C on the PIC would be about 17-19 single-cycle instructions; a word bit-reverse would be 34, or about 10 plus a byte-reverse function (which would execute twice). Optimal assembly code would be about 15 single-cycle instructions for byte reverse or 17 for word-reverse. Computing bit_reverse_byte(b) for some byte variable b would take many dozens of instructions totalling many dozens of cycles. Computing bit_reverse_word(w) for some 16-bit word w would probably take hundreds of instructions taking hundreds or thousands of cycles to execute. It would be really nice if one could mark a function to be expanded inline using something like the above formulation in the scenario where it would expand to a total of four instructions (basically just loading the result) but use a function call in scenarios where inline expansion would be heinous.
The simplest checksum algorithm is the so-called longitudinal parity check, which breaks the data into "words" with a fixed number n of bits, and then computes the exclusive or of all those words. The result is appended to the message as an extra word.
To check the integrity of a message, the receiver computes the exclusive or of all its words, including the checksum; if the result is not a word with n zeros, the receiver knows that a transmission error occurred.
(source: Wikipedia)
In your example:
#define CALC_LRC(a,b,c,d,e,f,g) ((a)^(b)^(c)^(d)^(e)^(f)^(g))
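Applied to the words from the question, that becomes (a usage sketch; note this produces an XOR parity word, not the polynomial CRC-8 the question specifies, so it only works if both ends agree on it):
#define CRC CALC_LRC(W0, W1, W2, W3, W4, W5, W6)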
Disclaimer: this is not really a direct answer, but rather a series of questions and suggestions that are too long for a comment.
First Question: Do you have control over both ends of the protocol, i.e. can you choose the checksum algorithm because either you or a coworker controls the code on the other end?
If YES to question #1:
You need to evaluate why you need the checksum, what checksum is appropriate, and the consequences of receiving a corrupt message with a valid checksum (which factors into both the what & why).
What is your transmission medium, protocol, bitrate, etc.? Are you expecting/observing bit errors? For example, with SPI or I2C from one chip to another on the same board, if you have bit errors, it's probably the HW engineer's fault, or you need to slow the clock rate, or both. A checksum can't hurt, but shouldn't really be necessary. On the other hand, with an infrared signal in a noisy environment, you'll have a much higher probability of error.
The consequences of a bad message are always the most important question here. If you're writing the controller for a digital room thermometer and sending a message to update the display 10x a second, one bad value every 1000 messages does very little if any real harm. No checksum or a weak checksum should be fine.
If these 6 bytes fire a missile, set the position of a robotic scalpel, or cause the transfer of money, you better be damn sure you have the right checksum, and may even want to look into a cryptographic hash (which may require more RAM than you have).
For in-between stuff, with noticeable detriment to performance/satisfaction with the product but no real harm, it's your call. For example, a TV that occasionally changes the volume instead of the channel could annoy the hell out of customers--more so than simply dropping the command if a good CRC detects an error, but if you're in the business of making cheap/knock-off TVs that might be OK if it gets product to market faster.
So what checksum do you need?
If either or both ends have HW support for a checksum built into the peripheral (fairly common in SPI for example), that might be a wise choice. Then it becomes more or less free to calculate.
An LRC, as suggested by vulkanino's answer, is the simplest algorithm.
Wikipedia has some decent info on how/why to choose a polynomial if you really need a CRC:
http://en.wikipedia.org/wiki/Cyclic_redundancy_check
If NO to question #1:
What CRC algorithm/polynomial does the other end require? That's what you're stuck with, but telling us might get you a better/more complete answer.
Thoughts on implementation:
Most of the algorithms are pretty light-weight in terms of RAM/registers, requiring only a couple extra bytes. In general, a function will result in better, cleaner, more readable, debugger-friendly code.
You should think of the macro solution as an optimization trick, and like all optimization tricks, jumping to them too early can be a waste of development time and a cause of more problems than it's worth.
Using a macro also has some strange implications you may not have considered yet:
You are aware that the preprocessor can only do the calculation if all the bytes in a message are fixed at compile time, right? If you have a variable in there, the compiler has to generate code. Without a function, that code will be inlined every time it's used (yes, that could mean lots of ROM usage). If all the bytes are variable, that code might be worse than just writing the function in C. Or with a good compiler, it might be better. Tough to say for certain. On the other hand, if a different number of bytes are variable depending on the message being sent, you might end up with several versions of the code, each optimized for that particular usage.
