Getting CRC-32 over STM32 flash and consistency with other CRC-32 tools - c

I'm moving my STM32F1xx project to a solution with a bootloader.
Part of this is that I want to be able to compute a CRC value over the existing bootloader and application flash ranges to compare existing and possible upload candidates.
Using a simple implementation on the STM32 which just does the following steps:
Enable the CRC peripheral
Reset the peripheral CRC value (sets it to 0xFFFFFFFF)
Iterate over the flash range (in this case 0x08000000 to 0x08020000), passing values to the CRC peripheral
Return the CRC peripheral output
uint32_t get_crc(void) {
    RCC->AHBENR |= RCC_AHBENR_CRCEN;  /* enable the CRC peripheral clock */
    CRC->CR |= CRC_CR_RESET;          /* reset DR to 0xFFFFFFFF */
    for (uint32_t *n = (uint32_t *)FLASH_BASE; n < (uint32_t *)(FLASH_BANK1_END + 1u); n++) {
        CRC->DR = *n;
    }
    return CRC->DR;
}
The value I get from this is 0x0deddeb3.
To compare this value with something I am running the .bin file through two tools.
The value I get from npm's crc-32 is 0x776b0ea2
The value I get from a zip file's CRC-32 is also 0x776b0ea2
What could be causing this? Is there a difference between iterating over the entire flash range and the contents of the bin file (smaller than entire flash range)? The polynomial used by the STM32 is 0x04c11db7 which seems to be fairly standard for a CRC-32. Would the zip tool and npm crc-32 be using a different polynomial?
I have also tried iterating over bytes and half-words as well as words on the STM32 in case the other tools used a different input format.
There is a similar question on here already, but I'm hoping to use a node-js solution because that is the platform my interface application is being developed on.

Calculating CRCs is a minefield. Your question already has some points to look at:
Is there a difference between iterating over the entire flash range and the contents of the bin file (smaller than the entire flash range)?
Yes, of course.
Would the zip tool and npm crc-32 be using a different polynomial?
The documentation will tell you. And I'm sure that you can use another polynomial with these tools via an option.
Anyway, these are the things to consider when calculating CRCs:
The number of bytes (or words, ...) to "sum up".
The contents of the flash not covered by the binary file, most probably all bits set to 1.
Width of the polynomial (in your case fixed to 32 bits).
Value of the polynomial.
Initial value for the register.
Whether the bits of each byte are reflected before being processed.
Whether the algorithm feeds input bytes through the register or xors them with a byte from one end and then straight into the table.
Whether the final register value should be reversed (as in reflected versions).
Value to XOR with the final register value.
Points 3 through the last are shamelessly copied from "A Painless Guide to CRC Error Detection Algorithms", which I suggest reading.

The polynomial is only one of several parameters that define a CRC. In this case the CRC is not reflected, whereas the standard zip CRC, which uses the same polynomial, is reflected. Also, that zip CRC is exclusive-or'ed with 0xffffffff at the end, whereas yours isn't. They do both get initialized the same way, with 0xffffffff.
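The parameter difference can be demonstrated in software. Below is a sketch (not the poster's code; the function names are mine) of two bitwise CRC-32 routines over the same polynomial 0x04C11DB7: one matching the STM32 peripheral's behaviour (non-reflected, MSB-first, init 0xFFFFFFFF, no final XOR, i.e. the CRC-32/MPEG-2 variant), and one matching the zip / npm crc-32 convention (reflected input and output, final XOR with 0xFFFFFFFF):

```c
#include <stdint.h>
#include <stddef.h>

/* Non-reflected CRC-32 (CRC-32/MPEG-2 style), as the STM32F1 CRC
 * peripheral computes it: poly 0x04C11DB7, init 0xFFFFFFFF,
 * no reflection, no final XOR. Bytes are fed MSB-first. */
uint32_t crc32_stm32_style(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 24;
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : crc << 1;
    }
    return crc;
}

/* Reflected CRC-32 as used by zip and npm's crc-32: the same polynomial
 * bit-reversed (0xEDB88320), init 0xFFFFFFFF, final XOR 0xFFFFFFFF. */
uint32_t crc32_zip_style(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}
```

One caveat: the peripheral consumes whole 32-bit words MSB-first, so to reproduce its result byte-by-byte on a little-endian image you must feed each word's four bytes in reversed order.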

Related

How does a bootloader know the "expected" CRC value?

Let's say we have some firmware, and a bootloader. When we flash both onto the device, during boot, the bootloader would know some "expected" CRC from the binary firmware image. The bootloader would compare the expected vs. actually calculated CRC value from the binary firmware image. If they're equal, it jumps to the firmware application startup address, and if not, it just stays in the bootloader.
What I'm confused by is how the bootloader would know some "expected" CRC value. How does a discrepancy arise between an incorrect CRC value and an expected one? And where does the "expected" one come from?
I use two methods.
The CRC is stored somewhere in the binary image. Bootloader calculates the CRC of the image and compares with that value. If they match - the image is good and can be executed.
Always the same CRC is used, and some additional data is appended to the image to force this CRC. This requires slightly more complicated post-build steps.
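The first method can be sketched as follows, assuming the build step appends the image CRC as the last 4 bytes, little-endian. The layout, function names, and the particular CRC routine here are illustrative assumptions; any algorithm works as long as the post-build tool and the bootloader agree on it:

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Illustrative bitwise CRC-32 (zip convention). */
static uint32_t crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Method 1: the stored CRC covers everything except its own 4 bytes. */
bool image_is_valid(const uint8_t *image, size_t total_len)
{
    if (total_len < 4)
        return false;
    size_t payload_len = total_len - 4;
    uint32_t stored = (uint32_t)image[payload_len]
                    | (uint32_t)image[payload_len + 1] << 8
                    | (uint32_t)image[payload_len + 2] << 16
                    | (uint32_t)image[payload_len + 3] << 24;
    return crc32(image, payload_len) == stored;
}
```

If the check passes, the bootloader jumps to the application entry point; otherwise it stays in the bootloader.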
CRCs are calculated following a well-known formula, so the bootloader applies that formula to the full boot record to get what is called the actual value (depending on the CRC algorithm, some programs include the CRC code that comes with the data in the calculation, and some leave it out), and it compares that with the expected value, which is the code that comes with the data.
Other programs store in the CRC field a value derived from the calculated CRC that forces the algorithm to return a fixed value (depending on the algorithm, but always the same, e.g. zero). This allows simplifying the check: the algorithm just calculates the CRC over the full data, and if the data has not been touched (modified), the fixed value (e.g. zero) comes out.
If you are dealing with an established protocol that defines a CRC algorithm to calculate and verify the data in a boot record, you need to look for the documentation on how the CRC is calculated and stored. Your expected value will be described there.
With respect to CRCs: some algorithms initialize the CRC machine in order to distinguish the start of the stream from a sequence of zeros, by initializing the shift register with ones (equivalently, prepending a fixed string of ones to the bit string, or to the initial polynomial) and then feeding the block contents into the shift register. Others append trailing zeros and XOR the calculated CRC into that field, so the total chain of bits should always result in an all-zeros expected CRC. You need to consult the firmware provider to see how the boot record CRC is calculated, as most probably the device will refuse to boot until the CRC is properly set.
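For the standard zip-style CRC-32, the "fixed value" trick can be checked concretely: appending the message's CRC little-endian makes the CRC over the whole buffer come out to the constant 0x2144DF1C, regardless of the message content. A sketch, with an illustrative bitwise crc32 (function names are mine):

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative zip-style bitwise CRC-32. */
static uint32_t crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Append the message CRC little-endian (buf must have 4 spare bytes);
 * the CRC over message+CRC is then the fixed constant 0x2144DF1C. */
uint32_t crc_of_message_plus_crc(uint8_t *buf, size_t msg_len)
{
    uint32_t c = crc32(buf, msg_len);
    buf[msg_len + 0] = (uint8_t)(c);
    buf[msg_len + 1] = (uint8_t)(c >> 8);
    buf[msg_len + 2] = (uint8_t)(c >> 16);
    buf[msg_len + 3] = (uint8_t)(c >> 24);
    return crc32(buf, msg_len + 4);
}
```

A verifier can then just compute the CRC over the whole record and compare against the constant, without knowing where the CRC field is.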

Robust incremental CRC32

I am implementing a flash integrity check for an STM32F10x MCU-based board. This check needs to process the program flash, one block at a time, i.e. 64-byte block per CRC32 invocation, because it needs to be periodic and interruptible.
Unfortunately, CRC hardware in STM32F10x doesn't allow configuring the initial checksum for the CRC32, so effectively, for each block CRC32 algorithm starts with 0xFFFFFFFF.
I am looking for a way to "merge" these checksums to provide some signature that would be more robust than a trivial XOR of all the checksums. This XOR solution has a problem with swapping of the order of blocks and, of course, the traditional "even number of errors cancels itself" kind of problem.
What would be a better way of integrity checking of blocks sequence in this case? Would the following pseudo-code be "good enough":
checksum = 0xffffffff;
for (idx = 0; idx < size_in_blocks; idx++)
{
    CRC_ResetDR();
    checksum = (checksum >> 8) ^ CRC_CalcBlockCRC(blocks[idx], block_size);
}
We can't calculate the whole flash CRC32 in one go, as it takes too long, and the CRC HW block is used by other entities in the OS as well, i.e. it can be reset in the middle.
NOTE: there is no need to reconstruct the exact CRC32 as if it would be a one single CRC32 input block. If it can be done, this would be even better, but it can be just reliable enough to guarantee a certain lack of flash manipulation.
Edited: Clearly, CRC32 is not a cryptographic hash, so the word "cryptography" was taken out of the title.
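One order-sensitive way to merge the per-block results, sketched below: feed each block's hardware CRC, byte by byte, into a second software CRC accumulator, so that both the block contents and their position affect the final signature. The hardware block CRC is modelled here in software (CRC-32/MPEG-2, the variant the STM32F10x unit computes); all names are illustrative:

```c
#include <stdint.h>
#include <stddef.h>

/* Software stand-in for the hardware per-block CRC
 * (poly 0x04C11DB7, init 0xFFFFFFFF, non-reflected, no final XOR). */
static uint32_t block_crc(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint32_t)data[i] << 24;
        for (int b = 0; b < 8; b++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ 0x04C11DB7u : crc << 1;
    }
    return crc;
}

/* Chain the per-block CRCs through a second, software CRC so that
 * swapping two blocks changes the final signature. */
uint32_t chained_signature(const uint8_t *blocks, size_t n_blocks, size_t block_size)
{
    uint32_t sig = 0xFFFFFFFFu;
    for (size_t i = 0; i < n_blocks; i++) {
        uint32_t c = block_crc(blocks + i * block_size, block_size);
        /* feed the 4 bytes of this block's CRC into the running CRC */
        for (int s = 24; s >= 0; s -= 8) {
            sig ^= ((c >> s) & 0xFFu) << 24;
            for (int b = 0; b < 8; b++)
                sig = (sig & 0x80000000u) ? (sig << 1) ^ 0x04C11DB7u : sig << 1;
        }
    }
    return sig;
}
```

The software part touches only 4 bytes per 64-byte block, so it stays cheap, and the hardware unit can still be reset between blocks by other users of the peripheral.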

IAR linker CRC32 calculation doesn't match with the one calculated in C (and in other webs)

The point is that I need to calculate a 32-bit CRC using the IAR linker in order to auto-store this value at a known memory address, but the result given by IAR and the one I calculate using a C function (checked against some online calculators) don't match.
In the next paragraphs I will try to go step by step following all the process I followed.
I configured the IAR linker as recommended (I think so, at least) in the following link:
IAR documentation CRC links
The fact is that I have configured it as in the "Calculate CRC32 as in STM32 hardware (v.5.50 and later)" example (the first part of it, because I have IAR 6.5).
As I see there, I tried to clone the configuration shown in the screenshot:
CONFIGURATION PICTURE
And this is the configuration I use in my C CRC file:
/* PARAMETERS EXPLANATION
* 'order' [1..32] is the CRC polynom order, counted without the leading
* '1' bit.
* 'polynom' is the CRC polynom without leading '1' bit.
* 'direct' [0,1] specifies the kind of algorithm: 1=direct, no augmented
* zero bits.
* 'crcinit' is the initial CRC value belonging to that algorithm.
* 'crcxor' is the final XOR value.
* 'refin' [0,1] specifies if a data byte is reflected before processing
* (UART) or not.
* 'refout' [0,1] specifies if the CRC will be reflected before XOR.
*/
/* Init parameters for CRC 32 algorithm */
crcParams.order = 32;
crcParams.polynom = 0x4C11DB7;
crcParams.direct = true;
crcParams.crcinit = 0xffffffff;
crcParams.crcxor = 0xffffffff;
crcParams.refin = false;
crcParams.refout = false;
I had a doubt about if the crcxor should be 0xFFFFFFFF or 0x00000000, but I tried with both without getting the expected result.
In order to check that the C function was working I have used the following websites:
FIRST CRC32 CALCULATOR
SECOND CRC32 CALCULATOR
The C code I use to calculate the CRC is based on the one explained here:
CRC C code
This is an example of the configuration used in one of the websites:
Any help will be really appreciated.
Thanks in advance.
Best regards,
Iván
Finally I solved this. I used this CRC32 code adapted to my purpose:
https://www.iar.com/support/tech-notes/general/c-source-for-crc32/
In all my other attempts I used the right polynomial and tried all the configurations used with other boards in the IAR linker checksum configuration, but those didn't work. So the configuration I ended up using was an init value of 0xFFFFFFFF and the following IAR linker configuration:
I hope this helps. It was very strange, given the problems I had implementing the other, I think valid, CRC32 implementations.
Best regards,
Iván
Why do you think that their code is not working? What is your test case?
It is a CRC-32 that uses the polynomial 0x04c11db7 in a forward direction, with the initial CRC equal to 0xc704dd7b and no final exclusive-or'ing. So fast_crc32(0xc704dd7b, data, length) will return the CRC of data[0..length-1].

array storage and multiplication in verilog

I have a peripheral connected to my Altera FPGA and am able to read data from it using SPI. I would like to store this incoming data into an array, preferably as a floating point value. Further, I have a csv file on my computer and want to store that data in another array, and then, after triggering a 'start' signal, multiply both arrays and send the output via RS-232 to my PC. Any suggestions on how to go about this? Code for reading data from the peripheral is as follows:
// we sample on the negative edge of the clock
always @(negedge SCL)
begin
    // data comes MSB first
    MOSI_reg[31:0] <= {MOSI_reg[30:0], MOSI}; // left shift for MOSI data
    MISO_reg[31:0] <= {MISO_reg[30:0], MISO}; // left shift for MISO data
end
thank you.
A 1024x28 matrix of 32 bits per element requires 917504 bits of RAM in your FPGA, plus another 28*32 = 896 bits for the SPI data. Multiplying these two matrices will result in a vector of 1024x1 elements, thus add 32768 bits for the result. This sums to 951168 bits you will need in your device. Does your FPGA chip have this much memory?
Assuming you do, yes: you can instantiate a ROM inside your design and initialize it with $readmemh or $readmemb (for values in hexadecimal or binary form, respectively).
If precision is not an issue, go for fixed point, as implementing multiplication and addition in floating point is a hard job.
You then need an FSM to fill your source vector with SPI data, do the multiplication, and store the result in your destination vector. You may consider instantiating a small processor to do the job more easily.
Multiplication is non-trivial in hardware, and 'assign c = a*b' is not necessarily going to produce what you want.
If your FPGA has DSP blocks, you can use one of Altera's customizable IP cores to do your multiplication in a DSP block. If not, you can still use an IP core to tune the multiplier the way you want (with regards to signed/unsigned, latency, etc.) and likely produce a better result.

Storing Large Integers/Values in an Embedded System

I'm developing an embedded system that can test a large number of wires (up to 360) - essentially a continuity checking system. The system works by clocking in a test vector and reading the output from the other end. The output is then compared with a stored result (which would be on an SD Card) that tells what the output should have been. The test vectors are just walking ones, so there's no need to store them anywhere. The process would be a bit like follows:
Clock out test-vector (walking ones)
Read in output test-vector.
Read corresponding output test-vector from SD Card which tells what the output vector should be.
Compare the test-vectors from step 2 and 3.
Note down the errors/faults in a separate array.
Continue back to step 1 unless all wires are checked.
Output the errors/faults to the LCD.
My hardware consists of a large shift register that's clocked into the AVR microcontroller. For every test vector (which would also be 360 bits), I will need to read in 360 bits. So, for 360 wires the total amount of data would be 360*360 bits = 16kB or so. I already know I cannot do this in one pass (i.e. read the entire data and then compare), so it will have to be test-vector by test-vector.
As there are no built-in types that can hold such large numbers, I intend to use a bit array of length 360 bits. Now, my question is: how should I store this bit array in a txt file?
One way is to store raw values, i.e. on each line store the raw binary data that I read in from the shift register. So, for 8 wires, it would be 0b10011010. But this can get ugly for up to 360 wires - each line would contain 360 characters.
Another way is to store hex values - this would be just two characters for 8 bits (9A for the above) and about 90 characters for 360 bits. This would, however, require me to read in the text - line by line - and convert the hex value into the bit-array representation, somehow.
So whats the best solution for this sort of problem? I need the solution to be completely "deterministic" - I can't have calls to malloc or such. They are a bit of a no-no in embedded systems from what I've read.
SUMMARY
I need to store large values that can't be represented by any traditional variable types. Currently I intend to store these values in a bitarray. What's the best way to store these values in a text file on an SD Card?
These are not integer values but rather bit maps; they have no arithmetic meaning. What you are suggesting is simply a byte array of length 360/8, and not related to "large integers" at all. However, some more appropriate data structure or representation may be possible.
If the test vector is a single bit in 360, then it is both inefficient and unnecessary to store 360 bits for each vector; a value from 0 to 359 is sufficient to unambiguously define each vector. If the correct output is also a single bit, then that could also be stored as a bit index; if not, you could store it as a list of indices for each bit that should be set, with some sentinel value >=360 or <0 to indicate the end of the list. Where most vectors contain fewer than 22 set bits, this structure will be more efficient than storing a 45-byte array.
From any bit index value, you can determine the address and mask of the individual wire by:
byte_address = base_address + bit_index / 8 ;
bit_mask = 0x01 << (bit_index % 8) ;
You could either test each of the 360 bits iteratively or generate a 360 bit vector on the fly from the list of bits.
I can see no need for dynamic memory allocation in this, but whether or not it is advisable in an embedded system is largely dependent on the application and target resources. A typical AVR system has very little memory, and dynamic memory allocation carries an overhead for heap management and block alignment that you may not be able to afford. Dynamic memory allocation is also unsuited to situations where hard real-time deterministic timing is required. And in all cases you should have a well defined strategy or architecture for avoiding memory leak issues (repeatedly allocating memory that never gets released).
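If the hex-text option from the question is used instead, the decoding step is straightforward and needs no dynamic allocation. A sketch (function names are illustrative): each line of the file holds the vector as hex characters, and the reader converts it back into a fixed-size byte array:

```c
#include <stdint.h>
#include <stddef.h>

/* Convert one hex digit to its value; returns -1 on invalid input. */
static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decode a hex line ("9A3F...") into out[]; out must hold len/2 bytes.
 * Returns 0 on success, -1 on a non-hex character or odd length. */
int hex_to_bytes(const char *line, size_t len, uint8_t *out)
{
    if (len % 2 != 0)
        return -1;
    for (size_t i = 0; i < len; i += 2) {
        int hi = hex_val(line[i]);
        int lo = hex_val(line[i + 1]);
        if (hi < 0 || lo < 0)
            return -1;
        out[i / 2] = (uint8_t)((hi << 4) | lo);
    }
    return 0;
}
```

For 360 wires the output buffer is a fixed 45-byte array, which can be statically allocated and reused for every line read from the SD card.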
