How to calculate CRC32-C checksum with zlib

I am using the C zlib API because it has the crc32_combine function to concatenate checksums together, whereas the Boost one does not.
However, I need to implement the CRC32-C (Castagnoli) checksum, with polynomial 0x1EDC6F41, instead of the standard CRC32 checksum.
With Boost I can apparently use:
#include <boost/crc.hpp>
using crc_32c_type = boost::crc_optimal<32, 0x1EDC6F41, 0xFFFFFFFF, 0xFFFFFFFF, true, true>;
crc_32c_type result;
result.process_bytes(reinterpret_cast<const char*>(&buffer), len);
return result.checksum();
Which can use the 0x1EDC6F41 polynomial.
Is there a similar way for me to do this with zlib?

zlib is open source. You can simply take the source and modify it for your own needs. In crc32.c you can change the line odd[0] = 0xedb88320UL; to the reflection of the Castagnoli polynomial.
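For reference, a sketch of what that edit might look like (0x82f63b78 is the bit-reflected form of 0x1EDC6F41). Note that crc32() itself also relies on the precomputed byte-wise table in crc32.h, which would need regenerating with the new polynomial (e.g. via zlib's MAKECRCH option) for the checksums themselves to be CRC-32C:

/* in crc32_combine(), swap the CRC-32 polynomial for the CRC-32C one: */
/* odd[0] = 0xedb88320UL; */    /* CRC-32 (Ethernet/ZIP), reflected */
odd[0] = 0x82f63b78UL;          /* CRC-32C (Castagnoli), reflected */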

For Objective-C:
Took me a while to find one that worked.
//------------------------------------------------------------------------------------
// crc32c
// Calculate crc32c (Castagnoli) Checksum
//------------------------------------------------------------------------------------
+ (uint32_t) crc32c:(NSData *)data {
    /* CRC-32C (iSCSI) polynomial in reversed bit order. */
    int k;
    const unsigned char *buf = [data bytes];
    unsigned long len = [data length];
    uint32_t crc = 0xFFFFFFFF;
    while (len--) {
        crc ^= *buf++;
        for (k = 0; k < 8; k++)
            // CRC-32C polynomial 0x1EDC6F41 in reversed bit order.
            crc = crc & 1 ? (crc >> 1) ^ 0x82f63b78 : crc >> 1;
    }
    return ~crc;
}

Related

Is there an architecture-independent method to create a little-endian byte stream from a value in C?

I am trying to transmit values between architectures, by creating a uint8_t[] buffer and then sending that. To ensure they are transmitted correctly, the spec is to convert all values to little-endian as they go into the buffer.
I read this article here which discussed how to convert from one endianness to the other, and here where it discusses how to check the endianness of the system.
I am curious if there is a method to read bytes from a uint64 or other value in little-endian order regardless of whether the system is big or little? (ie through some sequence of bitwise operations)
Or is the only method to first check the endianness of the system, and then if big explicitly convert to little?
That's actually quite easy -- you just use shifts to convert between 'native' format (whatever that is) and little-endian:
/* put a 32-bit value into a buffer in little-endian order (4 bytes) */
void put32(uint8_t *buf, uint32_t val) {
    buf[0] = val;
    buf[1] = val >> 8;
    buf[2] = val >> 16;
    buf[3] = val >> 24;
}

/* get a 32-bit value from a buffer (little-endian) */
uint32_t get32(uint8_t *buf) {
    return (uint32_t)buf[0] + ((uint32_t)buf[1] << 8) +
           ((uint32_t)buf[2] << 16) + ((uint32_t)buf[3] << 24);
}
If you put a value into a buffer, transmit it as a byte stream to another machine, and then get the value from the received buffer, the two machines will have the same 32-bit value regardless of whether they have the same or different native byte ordering. The casts are needed because the default promotions will just convert to int, which might be smaller than a uint32_t, in which case the shifts could be out of range.
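A quick round-trip check of the functions above (a sketch; the value 0x12345678 is arbitrary):

#include <assert.h>
#include <stdint.h>

int main(void) {
    uint8_t buf[4];
    put32(buf, 0x12345678u);
    /* the bytes land in the same order on any host: 78 56 34 12 */
    assert(buf[0] == 0x78 && buf[1] == 0x56 && buf[2] == 0x34 && buf[3] == 0x12);
    assert(get32(buf) == 0x12345678u);
    return 0;
}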
Be careful if your buffers are char rather than uint8_t (char might or might not be signed) -- you need to mask in that case:
uint32_t get32(char *buf) {
    return ((uint32_t)buf[0] & 0xff) + (((uint32_t)buf[1] & 0xff) << 8) +
           (((uint32_t)buf[2] & 0xff) << 16) + (((uint32_t)buf[3] & 0xff) << 24);
}
You can always serialize a uint64_t value to an array of uint8_t in little-endian order simply as
uint64_t source = ...;
uint8_t target[8];
target[0] = source;
target[1] = source >> 8;
target[2] = source >> 16;
target[3] = source >> 24;
target[4] = source >> 32;
target[5] = source >> 40;
target[6] = source >> 48;
target[7] = source >> 56;
or
for (int i = 0; i < sizeof (uint64_t); i++) {
    target[i] = source >> i * 8;
}
and this will work anywhere uint64_t and uint8_t exist.
Notice that this assumes that the source value is unsigned. Bit-shifting negative signed values will cause all sorts of headaches and you just don't want to do that.
Deserialization is a bit more complex if reading a byte at a time in order:
uint8_t source[8] = ...;
uint64_t target = 0;
for (int i = 0; i < sizeof (uint64_t); i++) {
    target |= (uint64_t)source[i] << i * 8;
}
The cast to (uint64_t) is absolutely necessary, because the operands of << will undergo integer promotions, and uint8_t would always be converted to a signed int - and "funny" things will happen when you shift a set bit into the sign bit of a signed int.
If you write this into a function
#include <inttypes.h>
void serialize(uint64_t source, uint8_t *target) {
    target[0] = source;
    target[1] = source >> 8;
    target[2] = source >> 16;
    target[3] = source >> 24;
    target[4] = source >> 32;
    target[5] = source >> 40;
    target[6] = source >> 48;
    target[7] = source >> 56;
}
and compile for x86-64 using GCC 11 and -O3, the function will be compiled to
serialize:
        movq    %rdi, (%rsi)
        ret
which just moves the 64-bit value of source into the target array as-is. If you reverse the indices (7 ... 0; big-endian), GCC will be clever enough to recognize that too and will compile it (with -O3) to
serialize:
        bswap   %rdi
        movq    %rdi, (%rsi)
        ret
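For completeness, a sketch of the matching deserializer (deserialize is a name introduced here); a recent GCC at -O3 should likewise recognize the byte-assembly idiom and reduce it to a single 64-bit load on little-endian x86-64, though that is not verified here:

#include <stdint.h>

uint64_t deserialize(const uint8_t *source) {
    uint64_t target = 0;
    for (int i = 0; i < 8; i++)
        target |= (uint64_t)source[i] << (i * 8);   /* cast before shifting, as above */
    return target;
}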
Most standardized network protocols specify numbers in big-endian format. In fact, big-endian is also referred to as network byte order, and there are functions specifically for translating integers of various sizes between host and network byte order.
These functions are htons and ntohs for 16-bit values, and htonl and ntohl for 32-bit values. However, there is no equivalent for 64-bit values, and you're using little-endian for the network protocol, so these won't help you.
You can still however translate between the host byte order and the network byte order (little-endian in this case) without knowing the host order. You can do this by bit shifting the relevant values in to or out of the host numbers.
For example, to convert a 32 bit value from host to little endian and back to host:
uint32_t src_value = *some value*;
uint8_t buf[sizeof(uint32_t)];
int i;

for (i = 0; i < sizeof(uint32_t); i++) {
    buf[i] = (src_value >> (8 * i)) & 0xff;
}

uint32_t dest_value = 0;
for (i = 0; i < sizeof(uint32_t); i++) {
    dest_value |= (uint32_t)buf[i] << (8 * i);
}
For two systems that must communicate, you specify an "intercommunication byte order". Then you have functions that convert between that and the native architecture byte order of each system.
There are three approaches to this problem. In order of efficiency:
Compile time detection of endianness
Run time detection of endianness
Endian agnostic code (corresponding to "sequence of bitwise operations" in your question).
Compile time detection of endianness
On architectures whose byte order is the same as the intercomm byte order, these functions do no transformation, but by using them, the same code becomes portable between systems.
Such functions may already exist on your target platform, for example:
Linux's endian.h: be64toh() et al.
POSIX: htonl, htons, ntohl, ntohs
Windows' winsock.h (same as POSIX, but adds 64-bit htonll() and ntohll())
Where they don't exist creating them with cross-platform support is trivial. For example:
uint16_t intercom_to_host_16( uint16_t intercom_word )
{
#if __BIG_ENDIAN__
    return intercom_word ;
#else
    return intercom_word >> 8 | intercom_word << 8 ;
#endif
}
Here I have assumed that the intercom order is big-endian; that makes the function compatible with network byte order per ntohs() et al. The macro __BIG_ENDIAN__ is a predefined macro on most compilers. If not, simply define it as a command-line macro when compiling, e.g. -D __BIG_ENDIAN__.
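The same pattern extends to wider types; for example, a 32-bit version under the same __BIG_ENDIAN__ convention might look like:

uint32_t intercom_to_host_32( uint32_t intercom_word )
{
#if __BIG_ENDIAN__
    return intercom_word ;
#else
    return intercom_word >> 24 |
           (intercom_word & 0x00ff0000u) >> 8 |
           (intercom_word & 0x0000ff00u) << 8 |
           intercom_word << 24 ;
#endif
}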
Run time detection of endianness
It is possible to detect endianness at runtime with minimal overhead:
uint16_t intercom_to_host_16( uint16_t intercom_word )
{
    static const union
    {
        uint16_t word ;
        uint8_t bytes[2] ;
    } test = { .word = 0xff00u } ;

    return test.bytes[0] == 0xffu ? intercom_word :
                                    intercom_word >> 8 | intercom_word << 8 ;
}
Of course you might wrap the test in a function for use in similar functions for other word sizes:
#include <stdbool.h>

bool isBigEndian()
{
    static const union
    {
        uint16_t word ;
        uint8_t bytes[2] ;
    } test = { .word = 0xff00u } ;

    return test.bytes[0] == 0xffu ;
}
Then simply have:
uint16_t intercom_to_host_16( uint16_t intercom_word )
{
    return isBigEndian() ? intercom_word :
                           intercom_word >> 8 | intercom_word << 8 ;
}
Endian agnostic code
It is entirely possible to use endian agnostic code, but in that case all participants in the communication or file processing have the software overhead imposed even if the native byte order is already the same as the intercom byte order.
#include <string.h>

uint16_t intercom_to_host_16( uint16_t intercom_word )
{
    /* intercom_word holds the raw wire bytes (big-endian intercom order),
       loaded verbatim from the stream into memory */
    uint8_t bytes[2] ;
    memcpy( bytes, &intercom_word, sizeof bytes ) ;
    return (uint16_t)( bytes[0] << 8 | bytes[1] ) ;
}
Reading the bytes back through memcpy also sidesteps the alignment and strict-aliasing hazards of casting a byte array to uint16_t*.

Calculate CRC with words (16-bit) as a base variable

Is there a simple CRC algorithm based on a lookup table, but with words entering the algorithm instead of bytes?
For example, this algorithm works with bytes:
#include <stdint.h>
const uint16_t wTable_CRC16_Modbus[256] = {
0x0000, 0xC0C1, 0xC181, 0x0140, 0xC301, 0x03C0, 0x0280, 0xC241,
0xC601, 0x06C0, 0x0780, 0xC741, 0x0500, 0xC5C1, 0xC481, 0x0440,
0xCC01, 0x0CC0, 0x0D80, 0xCD41, 0x0F00, 0xCFC1, 0xCE81, 0x0E40,
0x0A00, 0xCAC1, 0xCB81, 0x0B40, 0xC901, 0x09C0, 0x0880, 0xC841,
0xD801, 0x18C0, 0x1980, 0xD941, 0x1B00, 0xDBC1, 0xDA81, 0x1A40,
0x1E00, 0xDEC1, 0xDF81, 0x1F40, 0xDD01, 0x1DC0, 0x1C80, 0xDC41,
0x1400, 0xD4C1, 0xD581, 0x1540, 0xD701, 0x17C0, 0x1680, 0xD641,
0xD201, 0x12C0, 0x1380, 0xD341, 0x1100, 0xD1C1, 0xD081, 0x1040,
0xF001, 0x30C0, 0x3180, 0xF141, 0x3300, 0xF3C1, 0xF281, 0x3240,
0x3600, 0xF6C1, 0xF781, 0x3740, 0xF501, 0x35C0, 0x3480, 0xF441,
0x3C00, 0xFCC1, 0xFD81, 0x3D40, 0xFF01, 0x3FC0, 0x3E80, 0xFE41,
0xFA01, 0x3AC0, 0x3B80, 0xFB41, 0x3900, 0xF9C1, 0xF881, 0x3840,
0x2800, 0xE8C1, 0xE981, 0x2940, 0xEB01, 0x2BC0, 0x2A80, 0xEA41,
0xEE01, 0x2EC0, 0x2F80, 0xEF41, 0x2D00, 0xEDC1, 0xEC81, 0x2C40,
0xE401, 0x24C0, 0x2580, 0xE541, 0x2700, 0xE7C1, 0xE681, 0x2640,
0x2200, 0xE2C1, 0xE381, 0x2340, 0xE101, 0x21C0, 0x2080, 0xE041,
0xA001, 0x60C0, 0x6180, 0xA141, 0x6300, 0xA3C1, 0xA281, 0x6240,
0x6600, 0xA6C1, 0xA781, 0x6740, 0xA501, 0x65C0, 0x6480, 0xA441,
0x6C00, 0xACC1, 0xAD81, 0x6D40, 0xAF01, 0x6FC0, 0x6E80, 0xAE41,
0xAA01, 0x6AC0, 0x6B80, 0xAB41, 0x6900, 0xA9C1, 0xA881, 0x6840,
0x7800, 0xB8C1, 0xB981, 0x7940, 0xBB01, 0x7BC0, 0x7A80, 0xBA41,
0xBE01, 0x7EC0, 0x7F80, 0xBF41, 0x7D00, 0xBDC1, 0xBC81, 0x7C40,
0xB401, 0x74C0, 0x7580, 0xB541, 0x7700, 0xB7C1, 0xB681, 0x7640,
0x7200, 0xB2C1, 0xB381, 0x7340, 0xB101, 0x71C0, 0x7080, 0xB041,
0x5000, 0x90C1, 0x9181, 0x5140, 0x9301, 0x53C0, 0x5280, 0x9241,
0x9601, 0x56C0, 0x5780, 0x9741, 0x5500, 0x95C1, 0x9481, 0x5440,
0x9C01, 0x5CC0, 0x5D80, 0x9D41, 0x5F00, 0x9FC1, 0x9E81, 0x5E40,
0x5A00, 0x9AC1, 0x9B81, 0x5B40, 0x9901, 0x59C0, 0x5880, 0x9841,
0x8801, 0x48C0, 0x4980, 0x8941, 0x4B00, 0x8BC1, 0x8A81, 0x4A40,
0x4E00, 0x8EC1, 0x8F81, 0x4F40, 0x8D01, 0x4DC0, 0x4C80, 0x8C41,
0x4400, 0x84C1, 0x8581, 0x4540, 0x8701, 0x47C0, 0x4680, 0x8641,
0x8201, 0x42C0, 0x4380, 0x8341, 0x4100, 0x81C1, 0x8081, 0x4040
};
uint16_t Modbus_Calculate_CRC16(uint8_t *frame, uint8_t size) {
    // Initialize CRC16 word
    uint16_t crc16 = 0xFFFF;
    // Calculate CRC16 word
    uint16_t ind;
    while (size--) {
        ind = (crc16 ^ *frame++) & 0x00FF;
        crc16 >>= 8;
        crc16 ^= wTable_CRC16_Modbus[ind];
    }
    // Return the CRC16 word with swapped low and high bytes
    crc16 = (crc16 << 8) | (crc16 >> 8);
    return crc16;
}
Since I'm always going to send words, using the above algorithm I would need to break each WORD into LSB and MSB and repeat the code in the loop body twice, first for the LSB and then for the MSB. Instead of doing that, I would like to update the CRC in one step, with a WORD as the algorithm (loop body) input.
If and only if you read words from memory in little-endian order (least significant byte first), then you can read a 16-bit word at a time. You would still need to use the table twice to process the word. The loop would be:
size >>= 1;                                         // better have been even!
uint16_t const *words = (uint16_t const *)frame;    // better be little-endian!
while (size--) {
    crc16 ^= *words++;
    crc16 = (crc16 >> 8) ^ wTable_CRC16_Modbus[crc16 & 0xff];
    crc16 = (crc16 >> 8) ^ wTable_CRC16_Modbus[crc16 & 0xff];
}
Depending on your processor, you may also need to make sure that frame points to an even address. Note that if size is odd, this won't process the last byte. If size can be and is odd, you can just add the processing of the last byte after the loop.
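For example (a sketch; size_bytes is a name introduced here for the original byte count before halving):

/* fold in a trailing odd byte with the original byte-at-a-time step */
if (size_bytes & 1) {
    uint16_t ind = (crc16 ^ frame[size_bytes - 1]) & 0x00FF;
    crc16 = (crc16 >> 8) ^ wTable_CRC16_Modbus[ind];
}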
If you would like a single table lookup per word, then you'll need a much bigger table. The existing table is simply the CRC of the single-byte values 0..255 (where the initial CRC is zero, not 0xffff). The bigger table would be the same thing, but for the two-byte values 0..65535, with the least significant byte processed first. You can make that table using the existing table:
void modbus_bigtable(uint16_t *table) {
    for (uint16_t lo = 0; lo < 256; lo++) {
        uint16_t crclo = wTable_CRC16_Modbus[lo];
        uint16_t crchi = crclo >> 8;
        crclo &= 0xff;
        uint16_t *nxt = table;
        for (uint16_t hi = 0; hi < 256; hi++) {
            *nxt = crchi ^ wTable_CRC16_Modbus[hi ^ crclo];
            nxt += 256;
        }
        table++;
    }
}
Then the loop becomes:
size >>= 1;                                         // better have been even!
uint16_t const *words = (uint16_t const *)frame;    // better be little-endian!
while (size--)
    crc16 = table[crc16 ^ *words++];
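Putting the pieces together, a complete word-at-a-time variant might look like this (a sketch; modbus_crc16_words is a name introduced here, and a little-endian host with an even byte count is assumed):

#include <stddef.h>
#include <stdint.h>

uint16_t modbus_crc16_words(const uint16_t *words, size_t nwords,
                            const uint16_t *table) {
    uint16_t crc16 = 0xFFFF;
    while (nwords--)
        crc16 = table[crc16 ^ *words++];
    /* swap low and high bytes for transmission, as in Modbus_Calculate_CRC16() */
    return (uint16_t)((crc16 << 8) | (crc16 >> 8));
}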

Parallel Verilog CRC algorithm from C-like reference

I have a set of C-like snippets provided that describe a CRC algorithm, and this article that explains how to transform a serial implementation into a parallel one, which I need to implement in Verilog.
I tried using multiple online code generators, both serial and parallel (although serial would not work in final solution), and also tried working with the article, but got no similar results to what these snippets generate.
I should say I'm more or less exclusively a hardware engineer and my understanding of C is rudimentary. I also never worked with CRC other than a straightforward shift register implementation. I can see the polynomial and initial value from what I have, but that is more or less it.
The serial implementation uses an augmented message. Should I also create a parallel one for a message 6 bits wider and append zeros to it?
I do not understand too well how the final value crc6 is generated. CrcValue is generated using the CalcCrc function for the final zeros of the augmented message, then its top bit is written to its place in crc6 and removed before feeding it to the function again. Why is that? When working the algorithm to get the matrices for the parallel implementation, should I take crc6 as my final result, not the last value of CrcValue?
Regardless of how crc6 is obtained, the snippet for the CRC check only runs the telegram through the function. How does that work?
Here are the code snippets:
const unsigned crc6Polynom = 0x03;    // x**6 + x + 1

unsigned CalcCrc(unsigned crcValue, unsigned thisbit) {
    unsigned m = crcValue & crc6Polynom;
    while (m > 0) {
        thisbit ^= (m & 1);
        m >>= 1;
    }
    return (((thisbit << 6) | crcValue) >> 1);
}
// obtain CRC6 for sending (6 bit)
unsigned GetCrc(unsigned crcValue) {
    unsigned crc6 = 0;
    for (i = 0; i < 6; i++) {
        crcValue = CalcCrc(crcValue, 0);
        crc6 |= (crcValue & 0x20) | (crc6 >> 1);
        crcValue &= 0x1F;    // remove output bit
    }
    return (crc6);
}
// Calculate CRC6
unsigned crcValue = 0x3F;
for (i = 1; i < nDataBits; i++) {       // Startbit excluded
    unsigned thisBit = (unsigned)((telegram >> i) & 0x1);
    crcValue = CalcCrc(crcValue, thisBit);
}
/* now send telegram + GetCrc(crcValue) */

// Check CRC6
unsigned crcValue = 0x3F;
for (i = 1; i < nDataBits + 6; i++) {   // No startbit, but with CRC
    unsigned thisBit = (unsigned)((telegram >> i) & 0x1);
    crcValue = CalcCrc(crcValue, thisBit);
}
if (crcValue != 0) { /* put error handler here */ }
Thanks in advance for any advice, I'm really stuck there.
XORing bits of the data stream can be done in parallel because only the least significant bit is used for feedback (in this case), and the order of the data stream bit XOR operations doesn't affect the result.
Whether the hardware would need a parallel version depends on how a data stream is handled. The hardware could calculate the CRC one bit at a time during transmission or reception. If the hardware is staged to work with 6 bit characters, then a parallel version would make sense.
Since the snippets use a right shift for the CRC, it would seem that data for each 6 bit character is transmitted and received least significant bit first, to allow for hardware that could calculate CRC 1 bit at a time as it's transmitted or received. After all 6 bit data characters are transmitted, then the 6 bit CRC is transmitted (also least significant bit first).
The snippets seem wrong. My guess at what they should be:
/* calculate crc6 1 bit at a time */
const unsigned crc6Polynom = 0x43;    /* x**6 + x + 1 */

unsigned CalcCrc(unsigned crcValue, unsigned thisbit) {
    crcValue ^= thisbit;
    if (crcValue & 1)
        crcValue ^= crc6Polynom;
    crcValue >>= 1;
    return crcValue;
}
Here is an example passing 6 bits at a time. A 64-entry by 6-bit table lookup could be used to replace the for loop; see the sketch after this function.
/* calculate 6 bits at a time */
unsigned CalcCrc6(unsigned crcValue, unsigned sixbits) {
    int i;
    crcValue ^= sixbits;
    for (i = 0; i < 6; i++) {
        if (crcValue & 1)
            crcValue ^= crc6Polynom;
        crcValue >>= 1;
    }
    return crcValue;
}
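A sketch of the 64-entry table mentioned above (crc6Table and InitCrc6Table are names introduced here). Because the CRC register is only 6 bits wide and 6 bits are absorbed per step, CalcCrc6(crc, data) depends only on crc ^ data, so a single lookup per character suffices:

unsigned crc6Table[64];

void InitCrc6Table(void) {
    unsigned c;
    for (c = 0; c < 64; c++)
        crc6Table[c] = CalcCrc6(0, c);
}

/* one table lookup replaces the 6-step loop */
unsigned CalcCrc6Table(unsigned crcValue, unsigned sixbits) {
    return crc6Table[(crcValue ^ sixbits) & 0x3f];
}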
Assume that telegram contains 31 bits, 1 start bit + 30 data bits (five 6 bit characters):
/* code to calculate crc 6 bits at a time: */
unsigned crcValue = 0x3F;
int i;
telegram >>= 1;    /* skip start bit */
for (i = 0; i < 5; i++) {
    crcValue = CalcCrc6(crcValue, telegram & 0x3f);
    telegram >>= 6;
}

_mm_crc32_u8 gives different result than reference code

I've been struggling with the intrinsics. In particular I don't get the same results using the standard CRC calculation and the supposedly equivalent Intel intrinsics. I'd like to move to using _mm_crc32_u16 and _mm_crc32_u32, but if I can't get the 8-bit operation to work there's no point.
static UINT32 g_ui32CRC32Table[256] =
{
    0x00000000L, 0x77073096L, 0xEE0E612CL, 0x990951BAL,
    0x076DC419L, 0x706AF48FL, 0xE963A535L, 0x9E6495A3L,
    0x0EDB8832L, 0x79DCB8A4L, 0xE0D5E91EL, 0x97D2D988L,
    ....
// Your basic 32-bit CRC calculator
// NOTE: this code cannot be changed
UINT32 CalcCRC32(unsigned char *pucBuff, int iLen)
{
    UINT32 crc = 0xFFFFFFFF;
    for (int x = 0; x < iLen; x++)
    {
        crc = g_ui32CRC32Table[(crc ^ *pucBuff++) & 0xFFL] ^ (crc >> 8);
    }
    return crc ^ 0xFFFFFFFF;
}

UINT32 CalcCRC32_Intrinsic(unsigned char *pucBuff, int iLen)
{
    UINT32 crc = 0xFFFFFFFF;
    for (int x = 0; x < iLen; x++)
    {
        crc = _mm_crc32_u8(crc, *pucBuff++);
    }
    return crc ^ 0xFFFFFFFF;
}
That table is for a different CRC polynomial than the one used by the Intel instruction. The table is for the Ethernet/ZIP/etc. CRC, often referred to as CRC-32. The Intel instruction uses the iSCSI (Castagnoli) polynomial, for the CRC often referred to as CRC-32C.
This short example code can calculate either, by uncommenting the desired polynomial:
#include <stddef.h>
#include <stdint.h>

/* CRC-32 (Ethernet, ZIP, etc.) polynomial in reversed bit order. */
#define POLY 0xedb88320
/* CRC-32C (iSCSI) polynomial in reversed bit order. */
/* #define POLY 0x82f63b78 */

/* Compute CRC of buf[0..len-1] with initial CRC crc. This permits the
   computation of a CRC by feeding this routine a chunk of the input data at a
   time. The value of crc for the first chunk should be zero. */
uint32_t crc32c(uint32_t crc, const unsigned char *buf, size_t len)
{
    int k;

    crc = ~crc;
    while (len--) {
        crc ^= *buf++;
        for (k = 0; k < 8; k++)
            crc = crc & 1 ? (crc >> 1) ^ POLY : crc >> 1;
    }
    return ~crc;
}
You can use the core of this code to generate a replacement table for your code: the table entry for each byte value 0, 1, 2, ..., 255 is that value run through the eight-step inner loop, with no pre- or post-inversion (the ~crc conditioning applies to the whole message, not to the individual table entries).
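A sketch of that table generation (make_crc32c_table is a name introduced here; the entries are the raw shift-register outputs for each byte value, in the same layout as g_ui32CRC32Table):

#include <stdint.h>

void make_crc32c_table(uint32_t table[256]) {
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t crc = n;
        for (int k = 0; k < 8; k++)
            crc = crc & 1 ? (crc >> 1) ^ 0x82f63b78 : crc >> 1;
        table[n] = crc;
    }
}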
FWIW, I've obtained SW code that demonstrably matches the Intel crc32c instruction, but it uses a different polynomial: 0x82f63b78 (the bit-reflected form of 0x1EDC6F41). The function definitely doesn't match any of the iSCSI test examples here: https://www.rfc-editor.org/rfc/rfc3720#appendix-B.4
What's frustrating in all this is every implementation I've tried for CRC-32C comes out with different hashes from all the others. Is there a true piece of reference code out there?

Matching CRC32 from STM32F0 and zlib

I'm working on a communication link between a computer running Linux and a STM32F0. I want to use some kind of error detection for my packets and since the STM32F0 has CRC32 hw and I have zlib with CRC32 on Linux I thought it would be a good idea to use CRC32 for my project. The problem is that I won't get the same CRC value for the same data on the different platforms.
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <zlib.h>

int main(void)
{
    uint8_t byte0 = 0x00;
    uint32_t crc0 = crc32(0L, Z_NULL, 0);
    crc0 = crc32(crc0, &byte0, 1);
    printf("CRC32 value of %" PRIu8 " is: %08" PRIx32 "\n", byte0, crc0);
}
outputs CRC32 value of 0 is: d202ef8d which matches the result on several online calculators.
It seems though that whatever settings I use on the STM32 I can't get the same CRC.
I have found a flowchart on how the CRC hw calculates its value in an application note from ST but I can't figure out how it's done in zlib.
Does anyone know if they are compatible?
[edit 1] They both use the same init value and polynomial.
[edit 2] The STM32 code is relatively uninteresting since it's using the hw.
...
/* Default values are used for init value and polynomial, see edit 1 */
CRC->CR |= CRC_CR_RESET;
CRC->DR = (uint8_t)0x00;
uint32_t crc = CRC->DR;
...
The CRC32 implementation on STM32Fx seems not to be the standard CRC32 implementation you find on many online CRC calculators and the one used in zip.
STM32 implements CRC32-MPEG2, which processes bits MSB-first ("big-endian") and applies no final flip mask, whereas the zip CRC32 processes bits LSB-first ("little-endian") and applies a final flip mask.
I found this online calculator which supports CRC32-MPEG2.
If you are more interested in other CRC algorithms and their implementations, look at this link.
PS: The HAL driver from STM supports input in byte, half word and word formats and they seem to work fine for STM32F0x in v1.3.1
From the documentation, it appears that your STM32 code is not just uninteresting — it is rather incomplete. From the documentation, in order to use the CRC hardware you need to:
1. Enable the CRC peripheral clock via the RCC peripheral.
2. Set the CRC Data Register to the initial CRC value by configuring the Initial CRC value register (CRC_INIT). (a)
3. Set the I/O reverse bit order through the REV_IN[1:0] and REV_OUT bits respectively in the CRC Control register (CRC_CR). (a)
4. Set the polynomial size and coefficients through the POLYSIZE[1:0] bits in the CRC Control register (CRC_CR) and the CRC Polynomial register (CRC_POL) respectively. (b)
5. Reset the CRC peripheral through the Reset bit in the CRC Control register (CRC_CR).
6. Set the data to the CRC Data register.
7. Read the content of the CRC Data register.
8. Disable the CRC peripheral clock.
Note in particular steps 2, 3, and 4, which define the CRC being computed. They say that their example has rev_in and rev_out false, but for the zlib crc, they need to be true. Depending on the way the hardware is implemented, the polynomial will likely need to be reversed as well (0xedb88320UL). The initial CRC needs to be 0xffffffff, and the final CRC inverted to match the zlib crc.
I haven't tested this, but I suspect that you're not, in fact, doing an 8-bit write on the STM32.
Instead you're probably doing a write to the full width of the register (32 bits), which of course means you're computing the CRC32 of more bytes than you intended.
Disassemble the generated code to analyze the exact store instruction that is being used.
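For example, a byte-wide store can be forced with an explicit cast (a sketch; the next answer shows ST's own variant of the same idea):

/* force an 8-bit store to the data register instead of a 32-bit one */
*(volatile uint8_t *)&CRC->DR = 0x00;
uint32_t crc = CRC->DR;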
I had similar issues implementing a CRC on an STM32 CRC module where the final checksum was not matching. I was able to fix this by looking at the example code in stm32f30x_crc.c that is provided with STMCube. In the source it has a function for 8-bit CRC that used the following code to write to the DR register:
*(uint8_t*)(CRC_BASE) = (uint8_t) CRC_Data;
As stated previously, CRC->DR is defined as volatile uint32_t, so writing it directly accesses the whole 32-bit register. It would have been helpful if ST had been more explicit about accessing the DR register rather than simply saying it supports 8-bit types.
From my experience, you cannot directly compare the STM32 CRC unit output with that of an online CRC32 calculator.
Please find my CRC32 calculator for STM32 in this link.
This function has been used in my project, and proved to be correct.
For reference, getting compliant crc32 using the Low Level API:
void crc_init(void) {
    LL_AHB1_GRP1_EnableClock(LL_AHB1_GRP1_PERIPH_CRC);
    LL_CRC_SetPolynomialCoef(CRC, LL_CRC_DEFAULT_CRC32_POLY);
    LL_CRC_SetPolynomialSize(CRC, LL_CRC_POLYLENGTH_32B);
    LL_CRC_SetInitialData(CRC, LL_CRC_DEFAULT_CRC_INITVALUE);
    LL_CRC_SetInputDataReverseMode(CRC, LL_CRC_INDATA_REVERSE_WORD);
    LL_CRC_SetOutputDataReverseMode(CRC, LL_CRC_OUTDATA_REVERSE_BIT);
}
uint32_t crc32(const char* const buf, const uint32_t len) {
    uint32_t data;
    uint32_t index;

    LL_CRC_ResetCRCCalculationUnit(CRC);

    /* Compute the CRC of Data Buffer array; the (uint8_t) casts guard
       against sign extension when char is signed */
    for (index = 0; index < (len / 4); index++) {
        data = (uint32_t)(((uint8_t)buf[4 * index + 3] << 24) |
                          ((uint8_t)buf[4 * index + 2] << 16) |
                          ((uint8_t)buf[4 * index + 1] << 8) |
                           (uint8_t)buf[4 * index]);
        LL_CRC_FeedData32(CRC, data);
    }

    /* Last bytes specific handling */
    if ((len % 4) != 0) {
        if (len % 4 == 1) {
            LL_CRC_FeedData8(CRC, buf[4 * index]);
        }
        if (len % 4 == 2) {
            LL_CRC_FeedData16(CRC, (uint16_t)(((uint8_t)buf[4 * index + 1] << 8) |
                                               (uint8_t)buf[4 * index]));
        }
        if (len % 4 == 3) {
            LL_CRC_FeedData16(CRC, (uint16_t)(((uint8_t)buf[4 * index + 1] << 8) |
                                               (uint8_t)buf[4 * index]));
            LL_CRC_FeedData8(CRC, buf[4 * index + 2]);
        }
    }

    return LL_CRC_ReadData32(CRC) ^ 0xffffffff;
}
Tweaked from the example here
The LL_CRC_ResetCRCCalculationUnit() call re-initializes the CRC unit so that repeated calls to this function return the expected value; otherwise the initial value would be whatever the previous CRC result was, instead of 0xFFFFFFFF.
You probably don't need the lines setting the polynomial to the default value because they should already be initialized to these values from reset, but I've left them in for posterity.
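As a quick sanity check (a sketch, untested on hardware): feeding the nine ASCII bytes "123456789" should yield 0xcbf43926, the well-known CRC-32 check value, if the unit really matches zlib.

crc_init();
uint32_t crc = crc32("123456789", 9);
/* expected: 0xcbf43926, matching zlib's crc32() */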
