Parallel Verilog CRC algorithm from C-like reference

I have a set of C-like snippets that describe a CRC algorithm, and an article that explains how to transform a serial implementation into a parallel one, which I need to implement in Verilog.
I tried multiple online code generators, both serial and parallel (although a serial version would not work in the final solution), and also tried working from the article, but got nothing similar to what these snippets generate.
I should say I'm more or less exclusively a hardware engineer and my understanding of C is rudimentary. I have also never worked with CRCs beyond a straightforward shift-register implementation. I can see the polynomial and the initial value in what I have, but that is more or less it.
The serial implementation uses an augmented message. Should I also create a parallel one for a message 6 bits wider, with zeros appended?
I do not understand too well how the final value crc6 is generated. crcValue is produced by calling the CalcCrc function on the final zeros of the augmented message; its top bit is then written into its place in crc6 and removed before the value is fed to the function again. Why is that? When working through the algorithm to get the matrices for the parallel implementation, should I take crc6 as my final result rather than the last value of crcValue?
Regardless of how crc6 is obtained, the snippet for the CRC check only runs the telegram through the function. How does that work?
Here are the code snippets:
const unsigned crc6Polynom = 0x03; // x**6 + x + 1

unsigned CalcCrc(unsigned crcValue, unsigned thisbit) {
    unsigned m = crcValue & crc6Polynom;
    while (m > 0) {
        thisbit ^= (m & 1);
        m >>= 1;
    }
    return (((thisbit << 6) | crcValue) >> 1);
}
// obtain CRC6 for sending (6 bit)
unsigned GetCrc(unsigned crcValue) {
    unsigned crc6 = 0;
    for (i = 0; i < 6; i++) {
        crcValue = CalcCrc(crcValue, 0);
        crc6 |= (crcValue & 0x20) | (crc6 >> 1);
        crcValue &= 0x1F; // remove output bit
    }
    return (crc6);
}
// Calculate CRC6
unsigned crcValue = 0x3F;
for (i = 1; i < nDataBits; i++) { // Startbit excluded
    unsigned thisBit = (unsigned)((telegram >> i) & 0x1);
    crcValue = CalcCrc(crcValue, thisBit);
}
/* now send telegram + GetCrc(crcValue) */

// Check CRC6
unsigned crcValue = 0x3F;
for (i = 1; i < nDataBits + 6; i++) { // No startbit, but with CRC
    unsigned thisBit = (unsigned)((telegram >> i) & 0x1);
    crcValue = CalcCrc(crcValue, thisBit);
}
if (crcValue != 0) { /* put error handler here */ }
Thanks in advance for any advice; I'm really stuck here.

XORing bits of the data stream can be done in parallel because only the least significant bit is used for feedback (in this case), and the order of the data-stream bit XOR operations doesn't affect the result.
Whether the hardware would need a parallel version depends on how a data stream is handled. The hardware could calculate the CRC one bit at a time during transmission or reception. If the hardware is staged to work with 6 bit characters, then a parallel version would make sense.
Since the snippets use a right shift for the CRC, it would seem that data for each 6 bit character is transmitted and received least significant bit first, to allow for hardware that could calculate CRC 1 bit at a time as it's transmitted or received. After all 6 bit data characters are transmitted, then the 6 bit CRC is transmitted (also least significant bit first).
The snippets seem wrong. My guess at what they should be:
/* calculate crc6 1 bit at a time */
const unsigned crc6Polynom = 0x43; /* x**6 + x + 1 */

unsigned CalcCrc(unsigned crcValue, unsigned thisbit) {
    crcValue ^= thisbit;
    if (crcValue & 1)
        crcValue ^= crc6Polynom;
    crcValue >>= 1;
    return crcValue;
}
Example for passing 6 bits at a time. A 64-by-6-bit table lookup could be used to replace the for loop (see the sketch after the function below).
/* calculate 6 bits at a time */
unsigned CalcCrc6(unsigned crcValue, unsigned sixbits) {
    int i;
    crcValue ^= sixbits;
    for (i = 0; i < 6; i++) {
        if (crcValue & 1)
            crcValue ^= crc6Polynom;
        crcValue >>= 1;
    }
    return crcValue;
}
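For instance, the 64-entry table can be generated with the bit-at-a-time routine itself; a minimal sketch (my names crcTable and InitCrcTable are illustrative, assuming CalcCrc6 above):
/* build the 64-entry lookup table once; afterwards
   crcValue = crcTable[(crcValue ^ sixbits) & 0x3F];
   replaces the inner for loop */
unsigned crcTable[64];

void InitCrcTable(void) {
    unsigned n;
    for (n = 0; n < 64; n++)
        crcTable[n] = CalcCrc6(0, n);
}
This works because the six shift/XOR steps form a linear function of the 6-bit state after the initial XOR.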
Assume that telegram contains 31 bits, 1 start bit + 30 data bits (five 6 bit characters):
/* code to calculate crc 6 bits at a time: */
unsigned crcValue = 0x3F;
int i;
telegram >>= 1; /* skip start bit */
for (i = 0; i < 5; i++) {
    crcValue = CalcCrc6(crcValue, telegram & 0x3f);
    telegram >>= 6;
}
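To get the XOR matrices for a parallel Verilog implementation, you can exploit the same linearity: feed one-hot state and data vectors through CalcCrc6 and record which next-state bits each input bit drives. This is a sketch of my own, not part of the original snippets:
/* derive the parallel XOR network: next_state is the XOR of the
   contributions of each current-state bit and each data bit */
int b;
for (b = 0; b < 6; b++) {
    unsigned stateMask = CalcCrc6(1u << b, 0); /* next-state bits driven by state bit b */
    unsigned dataMask  = CalcCrc6(0, 1u << b); /* next-state bits driven by data bit b */
    printf("state[%d] -> 0x%02X, data[%d] -> 0x%02X\n", b, stateMask, b, dataMask);
}
Reading the printed masks column-wise gives, for each next-state bit, its list of XOR terms, which translates directly into one assign per CRC register bit in Verilog.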

Related

MSP430 I2C read multiple bytes communication problem

I'm trying to use a temperature sensor (PCT2075) with an MSP430F249.
To get a temperature, I read 2 bytes from this sensor.
I based my code on this link:
https://e2e.ti.com/support/microcontrollers/msp430/f/166/t/589712?MSP430FR5969-Read-multiple-bytes-of-data-i2c-with-repeated-start-and-without-interrupts
Since I'm using an MSP430F249, I modified the code from that link.
However, I just get the same value twice; I think it is the MSByte.
Is there any way to get both bytes from the sensor?
My code is here:
void i2c_read_multi(uint8_t slv_addr, uint8_t reg_addr, uint8_t l, uint8_t *arr)
{
    uint8_t i;
    while (UCB0STAT & UCBBUSY);
    UCB0I2CSA = slv_addr;       // set slave address
    UCB0CTL1 |= UCTR | UCTXSTT; // transmitter mode and START condition.
    while (UCB0CTL1 & UCTXSTT);
    UCB0TXBUF = reg_addr;
    while (!(UCB0CTL1 & UCTXSTT));
    UCB0CTL1 &= ~UCTR;          // receiver mode
    UCB0CTL1 |= UCTXSTT;        // START condition
    while (UCB0CTL1 & UCTXSTT); // make sure start has been cleared
    for (i = 0; i < l; i++) {
        while (!(IFG2 & UCB0RXIFG));
        if (i == l - 1) {
            UCB0CTL1 |= UCTXSTP; // STOP condition
        }
        arr[i] = UCB0RXBUF;
    }
    while (UCB0CTL1 & UCTXSTP);
}
There are two issues ...
The linked-to code assumes that the port only needs to read one byte for each output value.
But, based on the sensor documentation you've shown, for each value output to the array, we need to read two bytes (one for MSB and one for LSB).
And, we need to merge those two byte values into one 16 bit value. Note that arr is now uint16_t instead of uint8_t. And, l is now the number of [16 bit] samples (vs. number of bytes). So, the caller of this may need to be adjusted accordingly.
Further, note that we have to "ignore" the lower 5 bits of lsb. We do that by shifting the 16 bit value right by 5 bits (e.g. val16 >>= 5). I assume that's the correct way to do it. Or, it could be just val16 &= ~0x1F [less likely]. You may have to experiment a bit.
Here's the refactored code.
Note that this assumes the data arrives in "big endian" order [based on my best guess]. If it's actually little endian, reverse the msb = and lsb = statements.
Also, the placement of the "STOP" condition code may need to be adjusted. I had to guess as to whether it should be placed above the LSB read or MSB read.
I chose the LSB (the last byte) because that's closest to how the linked general i2c read is done. That is, i2c doesn't know or care about the MSB/LSB multiplexing of the device in question. It wants the STOP just before the last byte [not the last 16 bit sample].
void
i2c_read_multi(uint8_t slv_addr, uint8_t reg_addr, uint8_t l,
    uint16_t *arr)
{
    uint8_t i;
    uint8_t msb;
    uint8_t lsb;
    uint16_t val16;

    while (UCB0STAT & UCBBUSY);

    // set slave address
    UCB0I2CSA = slv_addr;

    // transmitter mode and START condition.
    UCB0CTL1 |= UCTR | UCTXSTT;
    while (UCB0CTL1 & UCTXSTT);
    UCB0TXBUF = reg_addr;
    while (!(UCB0CTL1 & UCTXSTT));

    // receiver mode
    UCB0CTL1 &= ~UCTR;

    // START condition
    UCB0CTL1 |= UCTXSTT;

    // make sure start has been cleared
    while (UCB0CTL1 & UCTXSTT);

    for (i = 0; i < l; i++) {
        while (!(IFG2 & UCB0RXIFG));
        msb = UCB0RXBUF;

        while (!(IFG2 & UCB0RXIFG));

        // STOP condition
        if (i == l - 1) {
            UCB0CTL1 |= UCTXSTP;
        }
        lsb = UCB0RXBUF;

        val16 = msb;
        val16 <<= 8;
        val16 |= lsb;

        // use only most 11 significant bits
        // NOTE: this _may_ not be the correct way to scale the data
        val16 >>= 5;

        arr[i] = val16;
    }

    while (UCB0CTL1 & UCTXSTP);
}
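A possible usage sketch (my addition, not part of the answer): read one sample and convert it to degrees Celsius. The 0x48 slave address, the 0x00 temperature-register pointer, and the 0.125 °C/LSB scaling are assumptions based on the usual PCT2075 configuration; verify them against your wiring and datasheet.
/* hypothetical caller: one 11-bit sample, converted to Celsius */
uint16_t sample;
int16_t raw;
float celsius;

i2c_read_multi(0x48, 0x00, 1, &sample); /* 0x48: default PCT2075 address, 0x00: Temp register */
raw = (int16_t)(sample << 5) >> 5;      /* sign-extend the 11-bit value (assumes arithmetic shift) */
celsius = raw * 0.125f;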

Bit bang DATA and CLOCK synchronisation issue

I'm attempting to bit bang a sequence of data from an ATSAME70Q21 to a series of shift registers (TLC5971 LED drivers). My production hardware will likely make use of hardware SPI for this purpose, but my current prototype is making use of GPIO pins so I'd like to use bit banging.
I'm fairly comfortable in arranging the data in the correct sequence as per the TLC5971 datasheet, and in shifting each bit at a time. I have an oscilloscope connected to both the DATA (BBDAT) and CLOCK (BBCLK) lines, which has illustrated something not quite right with the function.
If I simply toggle each of the BBDAT and BBCLK lines HIGH and then immediately LOW again, I get a very stable 1.85MHz signal on both lines on the oscilloscope. However, when I loop through the data structure, setting DATA either HIGH or LOW depending on the bit value, and then toggling CLOCK HIGH/LOW, the resulting signal frequency on BBDAT is around an order of magnitude lower than that of BBCLK. Given that I only toggle BBCLK once at the end of each BBDAT set/clear, I'm not sure why this would be the case.
void writeData(void)
{
    Pio *baseData = (Pio *)(uintptr_t)PIOD;
    Pio *baseClock = (Pio *)(uintptr_t)PIOB;

    // I have 3 TLC5971 ICs in series, hence i < 3
    for (uint i = 0; i < 3; i++) {
        // Each packet for each TLC5971 IC is 28 bytes long
        for (uint j = 0; j < 28; j++) {
            for (uint k = 0; k < 8; k++) {
                if (dataPacket[i][j] & (1 << k)) {
                    // If bit is 1, set DATA line (BBDAT)
                    baseData->PIO_SODR = 1U << (BBDAT & 0x1F);
                } else {
                    // If bit is 0, clear DATA line (BBDAT)
                    baseData->PIO_CODR = 1U << (BBDAT & 0x1F);
                }
                // Toggle the CLOCK line (BBCLK)
                baseClock->PIO_SODR = 1U << (BBCLK & 0x1F);
                baseClock->PIO_CODR = 1U << (BBCLK & 0x1F);
            }
        }
    }
}
The above code (with no optimisation) produces a BBDAT signal of ~100kHz and a BBCLK signal of ~1.4MHz. What am I missing here?

byte order using GCC struct bit packing

I am using GCC struct bit fields in an attempt to interpret 8-byte CAN message data. I wrote a small program as an example of one possible message layout. The code and the comments should describe my problem. I assigned the 8 bytes so that all 5 signals should equal 1. As the output on an Intel PC shows, that is hardly the case. All CAN data that I deal with is big endian, and the fact that the signals are almost never packed 8-bit aligned makes htonl() and friends useless in this case. Does anyone know of a solution?
#include <stdio.h>
#include <netinet/in.h>

typedef union
{
    unsigned char data[8];
    struct {
        unsigned int signal1 : 32;
        unsigned int signal2 : 6;
        unsigned int signal3 : 16;
        unsigned int signal4 : 8;
        unsigned int signal5 : 2;
    } __attribute__((__packed__));
} _message1;

int main()
{
    _message1 message1;
    unsigned char incoming_data[8]; // This is how this message would come in from a CAN bus for all signals == 1
    incoming_data[0] = 0x00;
    incoming_data[1] = 0x00;
    incoming_data[2] = 0x00;
    incoming_data[3] = 0x01; // bit 1 of signal 1
    incoming_data[4] = 0x04; // bit 1 of signal 2
    incoming_data[5] = 0x00;
    incoming_data[6] = 0x04; // bit 1 of signal 3
    incoming_data[7] = 0x05; // bit 1 of signal 4 and signal 5
    for (int i = 0; i < 8; ++i) {
        message1.data[i] = incoming_data[i];
    }
    printf("signal1 = %x\n", message1.signal1);
    printf("signal2 = %x\n", message1.signal2);
    printf("signal3 = %x\n", message1.signal3);
    printf("signal4 = %x\n", message1.signal4);
    printf("signal5 = %x\n", message1.signal5);
}
Because struct packing order varies between compilers and architectures, the best option is to use a helper function to pack/unpack the binary data instead.
For example:
static inline void message1_unpack(uint32_t *fields,
                                   const unsigned char *buffer)
{
    const uint64_t data = (((uint64_t)buffer[0]) << 56)
                        | (((uint64_t)buffer[1]) << 48)
                        | (((uint64_t)buffer[2]) << 40)
                        | (((uint64_t)buffer[3]) << 32)
                        | (((uint64_t)buffer[4]) << 24)
                        | (((uint64_t)buffer[5]) << 16)
                        | (((uint64_t)buffer[6]) << 8)
                        |  ((uint64_t)buffer[7]);

    fields[0] =  data >> 32;           /* Bits 32..63 */
    fields[1] = (data >> 26) & 0x3F;   /* Bits 26..31 */
    fields[2] = (data >> 10) & 0xFFFF; /* Bits 10..25 */
    fields[3] = (data >> 2) & 0xFF;    /* Bits 2..9 */
    fields[4] =  data & 0x03;          /* Bits 0..1 */
}
Note that because the consecutive bytes are interpreted as a single unsigned integer (in big-endian byte order), the above will be perfectly portable.
Instead of an array of fields, you could use a structure, of course; but it does not need to have any resemblance to the on-the-wire structure at all. However, if you have several different structures to unpack, an array of (maximum-width) fields usually turns out to be easier and more robust.
All sane compilers will optimize the above code just fine. In particular, GCC with -O2 does a very good job.
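As a quick check (my addition): unpacking the question's incoming_data with this helper should print 1 for every signal.
/* verify against the question's example buffer: each line should print 1 */
uint32_t fields[5];
message1_unpack(fields, incoming_data);
for (int i = 0; i < 5; i++)
    printf("signal%d = %x\n", i + 1, (unsigned)fields[i]);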
The inverse, packing those same fields to a buffer, is very similar:
static inline void message1_pack(unsigned char *buffer,
                                 const uint32_t *fields)
{
    const uint64_t data = (((uint64_t)(fields[0]         )) << 32)
                        | (((uint64_t)(fields[1] & 0x3F  )) << 26)
                        | (((uint64_t)(fields[2] & 0xFFFF)) << 10)
                        | (((uint64_t)(fields[3] & 0xFF  )) << 2)
                        |  ((uint64_t)(fields[4] & 0x03  ));

    buffer[0] = data >> 56;
    buffer[1] = data >> 48;
    buffer[2] = data >> 40;
    buffer[3] = data >> 32;
    buffer[4] = data >> 24;
    buffer[5] = data >> 16;
    buffer[6] = data >> 8;
    buffer[7] = data;
}
Note that the masks define the field length (0x03 = 0b11 (2 bits), 0x3F = 0b111111 (6 bits), 0xFF = 0b11111111 (8 bits), 0xFFFF = 0b1111111111111111 (16 bits)); and the shift amount depends on the bit position of the least significant bit in each field.
To verify such functions work, pack, unpack, repack, and re-unpack a buffer that should contain all zeros except one of the fields all ones, and verify the data stays correct over two roundtrips. It usually suffices to detect the typical bugs (wrong bit shift amounts, typos in masks).
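A minimal sketch of such a roundtrip test (my addition, assuming the two functions above):
/* roundtrip: for each field, set it to all ones, pack, unpack, compare */
#include <assert.h>
#include <string.h>

void message1_test_roundtrip(void)
{
    static const uint32_t ones[5] = { 0xFFFFFFFF, 0x3F, 0xFFFF, 0xFF, 0x03 };
    for (int f = 0; f < 5; f++) {
        uint32_t in[5] = { 0 }, out[5];
        unsigned char buf[8];
        in[f] = ones[f];
        message1_pack(buf, in);
        message1_unpack(out, buf);
        assert(memcmp(in, out, sizeof in) == 0);
    }
}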
Note that documentation will be key to ensure the code remains maintainable. I'd personally add comment blocks before each of the above functions, similar to
/* message1_unpack(): Unpack 8-byte message to 5 fields:
field[0]: Foobar. Bits 32..63.
field[1]: Buzz. Bits 26..31.
field[2]: Wahwah. Bits 10..25.
field[3]: Cheez. Bits 2..9.
field[4]: Blop. Bits 0..1.
*/
with the field "names" reflecting their names in documentation.

how to make a 3D mask

I've run into a technical issue that makes me want to improve the previous implementation. The situation is:
I have 5 GPIO pins, and I need to use these pins as a hardware identifier, for example:
pin1: LOW
pin2: LOW
pin3: LOW
pin4: LOW
pin5: LOW
this means one of my HW variants, so we can have many combinations. In the previous design, the developer used if-else to implement this, like:
if(PIN1 == LOW && ... && ......&& PIN5 ==LOW)
{
HWID = variant1;
}
else if( ... )
{
}
...
else
{
}
but I think this is not good, because there will be more than 200 variants and the code will become too long, so I want to change it to a mask. The idea is to treat these five pins as a five-bit register, and because I can predict which variant I need to assign according to the GPIO status (this is already defined by the hardware team, who provide a variant list with all the GPIO pin configurations), the code may look like this:
enum {
    variant1 = 0x00,   // GPIO config 1
    ...
    variant243 = 0xF3  // GPIO config 243
};
then I can first read these five GPIO pins' status and compare with some mask to see if they are equal or not.
Question
However, a GPIO pin has three states, namely LOW, HIGH, and OPEN. Is there a good calculation method to build such a three-state ("3-D") mask?
You have 5 pins of 3 states each. You can approach representing this in a few ways.
First, imagine using this sort of framework:
#define LOW  (0)
#define HIGH (1)
#define OPEN (2)

uint16_t config = PIN_CONFIG(pin1, pin2, pin3, pin4, pin5);

if (config == PIN_CONFIG(LOW, HIGH, OPEN, LOW, LOW))
{
    // do something
}

switch (config) {
case PIN_CONFIG(LOW, HIGH, OPEN, LOW, HIGH):
    // do something;
    break;
}

uint16_t config_max = PIN_CONFIG(OPEN, OPEN, OPEN, OPEN, OPEN);
uint32_t hardware_ids[config_max + 1] = {0};

// init your hardware ids
hardware_ids[PIN_CONFIG(LOW, HIGH, HIGH, LOW, LOW)] = 0xF315;
hardware_ids[PIN_CONFIG(LOW, LOW, HIGH, LOW, LOW)] = 0xF225;

// look up a HWID
uint32_t hwid = hardware_ids[config];
This code is just the sort of stuff you'd like to do with pin configurations. The only bit left to implement is PIN_CONFIG.
Approach 1
The first approach is to keep using it as a bitfield, but instead of 1 bit per pin you use 2 bits to represent each pin state. I think this is the cleanest, even though you're "wasting" half a bit for each pin.
#define PIN_CLAMP(x) ((x) & 0x03)
#define PIN_CONFIG(p1, p2, p3, p4, p5) \
    (PIN_CLAMP(p1)       | \
    (PIN_CLAMP(p2) << 2) | \
    (PIN_CLAMP(p3) << 4) | \
    (PIN_CLAMP(p4) << 6) | \
    (PIN_CLAMP(p5) << 8))
This is kind of nice because it leaves room for a "Don't care" or "Invalid" value if you are going to do searches later.
Approach 2
Alternatively, you can use arithmetic to do it, making sure you use the minimum amount of bits necessary. That is, ~1.5 bits to encode 3 values. As expected, this goes from 0 up to 242 for a total of 3^5=243 states.
Without knowing anything else about your situation I believe this is the smallest complete encoding of your pin states.
(Practically, you have to use 8 bits to encode the 243 values, so it comes out slightly higher than 1.5 bits per pin.)
#define PIN_CLAMP(x) ((x) % 3) /* note this should really assert */
#define PIN_CONFIG(p1, p2, p3, p4, p5) \
    (PIN_CLAMP(p1)       + \
    (PIN_CLAMP(p2) * 3)  + \
    (PIN_CLAMP(p3) * 9)  + \
    (PIN_CLAMP(p4) * 27) + \
    (PIN_CLAMP(p5) * 81))
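To read a single pin back out of this base-3 encoding, divide by that pin's power of 3; a sketch (my addition, pins numbered 0..4 for p1..p5):
/* recover one pin's state from the arithmetic (base-3) encoding */
static inline unsigned pin_state(unsigned config, unsigned pin)
{
    static const unsigned pow3[5] = { 1, 3, 9, 27, 81 };
    return (config / pow3[pin]) % 3;
}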
Approach 1.1
If you don't like preprocessor stuff, you could use functions a bit like this:
enum PinLevel { low = 0, high, open };

void set_pin(uint32_t *config, uint8_t pin_number, enum PinLevel value) {
    int shift = pin_number * 2;  // 2 bits
    int pinmask = 0x03 << shift; // 2 bits set to on, moved to the right spot
    *config &= ~pinmask;
    *config |= (((int)value) << shift) & pinmask;
}

enum PinLevel get_pin(uint32_t config, uint8_t pin_number) {
    int shift = pin_number * 2;  // 2 bits
    return (enum PinLevel)((config >> shift) & 0x03);
}
This follows the first (2 bit per value) approach.
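A quick usage sketch of these helpers (my addition):
/* build a configuration one pin at a time, then inspect it */
uint32_t config = 0;
set_pin(&config, 0, high);
set_pin(&config, 2, open);
if (get_pin(config, 2) == open) {
    /* the pin at index 2 is floating */
}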
Approach 1.2
YET ANOTHER WAY using C's cool bitfield syntax:
struct pins {
    uint16_t pin1 : 2;
    uint16_t pin2 : 2;
    uint16_t pin3 : 2;
    uint16_t pin4 : 2;
    uint16_t pin5 : 2;
};

typedef union pinconfig_ {
    struct pins pins;
    uint16_t value;
} pinconfig;

pinconfig input;
input.value = 0; // don't forget to init the members unless static
input.pins.pin1 = HIGH;
input.pins.pin2 = LOW;
printf("%d", input.value);

input.value = 0x0003;
printf("%d", input.pins.pin1);
(note: all code completely untested)
This is my suggestion to solve the problem:
#include <stdio.h>

#define LOW  0
#define HIGH 1
#define OPEN 2

#define MAXGPIO 5

int main()
{
    int gpio[MAXGPIO] = { LOW, LOW, OPEN, HIGH, OPEN };
    int mask = 0;

    for (int i = 0; i < MAXGPIO; i++)
        mask = mask << 2 | gpio[i];

    printf("Masked: %d\n", mask);
    printf("Unmasked:\n");

    for (int i = 0; i < MAXGPIO; i++)
        printf("GPIO %d = %d\n", i + 1, (mask >> (2 * (MAXGPIO - 1 - i))) & 0x03);

    return 0;
}
A little explanation about the code.
Masking
I am using 2 bits to save each GPIO value. The combinations are:
00: LOW
01: HIGH
10: OPEN
11 is invalid
I iterate over the gpio array (where I have the acquired values) and build the mask in the mask variable, shifting left 2 bits and applying an or operation.
Unmasking
To get the initial values back I do the opposite operation: shift right by 2 bits times (MAXGPIO - 1 - i) and mask with 0x03.
I apply the 0x03 mask because those are the bits I am interested in.
This is the output of the program:
$ cc -Wall test.c -o test;./test
Masked: 38
Unmasked:
GPIO 1 = 0
GPIO 2 = 0
GPIO 3 = 2
GPIO 4 = 1
GPIO 5 = 2
Hope this helps

_mm_crc32_u8 gives different result than reference code

I've been struggling with the intrinsics. In particular, I don't get the same results using the standard CRC calculation and the supposedly equivalent Intel intrinsics. I'd like to move to using _mm_crc32_u16 and _mm_crc32_u32, but if I can't get the 8-bit operation to work there's no point.
static UINT32 g_ui32CRC32Table[256] =
{
    0x00000000L, 0x77073096L, 0xEE0E612CL, 0x990951BAL,
    0x076DC419L, 0x706AF48FL, 0xE963A535L, 0x9E6495A3L,
    0x0EDB8832L, 0x79DCB8A4L, 0xE0D5E91EL, 0x97D2D988L,
    ....
// Your basic 32-bit CRC calculator
// NOTE: this code cannot be changed
UINT32 CalcCRC32(unsigned char *pucBuff, int iLen)
{
    UINT32 crc = 0xFFFFFFFF;
    for (int x = 0; x < iLen; x++)
    {
        crc = g_ui32CRC32Table[(crc ^ *pucBuff++) & 0xFFL] ^ (crc >> 8);
    }
    return crc ^ 0xFFFFFFFF;
}

UINT32 CalcCRC32_Intrinsic(unsigned char *pucBuff, int iLen)
{
    UINT32 crc = 0xFFFFFFFF;
    for (int x = 0; x < iLen; x++)
    {
        crc = _mm_crc32_u8(crc, *pucBuff++);
    }
    return crc ^ 0xFFFFFFFF;
}
That table is for a different CRC polynomial than the one used by the Intel instruction. The table is for the Ethernet/ZIP/etc. CRC, often referred to as CRC-32. The Intel instruction uses the iSCSI (Castagnoli) polynomial, for the CRC often referred to as CRC-32C.
This short example code can calculate either, by uncommenting the desired polynomial:
#include <stddef.h>
#include <stdint.h>

/* CRC-32 (Ethernet, ZIP, etc.) polynomial in reversed bit order. */
#define POLY 0xedb88320
/* CRC-32C (iSCSI) polynomial in reversed bit order. */
/* #define POLY 0x82f63b78 */

/* Compute CRC of buf[0..len-1] with initial CRC crc. This permits the
   computation of a CRC by feeding this routine a chunk of the input data at a
   time. The value of crc for the first chunk should be zero. */
uint32_t crc32c(uint32_t crc, const unsigned char *buf, size_t len)
{
    int k;

    crc = ~crc;
    while (len--) {
        crc ^= *buf++;
        for (k = 0; k < 8; k++)
            crc = crc & 1 ? (crc >> 1) ^ POLY : crc >> 1;
    }
    return ~crc;
}
You can use the bit-at-a-time loop in this code to generate a replacement table for your code: each entry is the result of running the eight shift/XOR steps on the byte values 0, 1, 2, ..., 255. (The table entries themselves do not include the initial and final inversions; CalcCRC32 applies those around the table lookups.)
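A sketch of that table generation (my addition; make_crc32c_table is an illustrative name):
/* regenerate the 256-entry table for CRC-32C: entry n is the raw 8-step
   shift/XOR transformation of n, with no pre/post inversion */
void make_crc32c_table(uint32_t table[256])
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t r = n;
        for (int k = 0; k < 8; k++)
            r = r & 1 ? (r >> 1) ^ 0x82f63b78 : r >> 1;
        table[n] = r;
    }
}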
FWIW, I've obtained SW code that demonstrably matches the Intel crc32c instruction, but it uses a different polynomial: 0x82f63b78. The function definitely doesn't match any of the iSCSI test examples here: https://www.rfc-editor.org/rfc/rfc3720#appendix-B.4
What's frustrating in all this is every implementation I've tried for CRC-32C comes out with different hashes from all the others. Is there a true piece of reference code out there?
