Compute the CRC16 of a bytearray / userdata in Lua / C

I am writing a Wireshark protocol dissector in Lua. The protocol it parses contains a CRC16 checksum, and the dissector should check whether the CRC is correct.
I have found a CRC16 implementation written in C, together with its Lua wrapper code, here. I have successfully compiled and run it (e.g. crc16.compute("test")). The problem is that it expects a string as input. From Wireshark, I get a buffer that seems to be of Lua type userdata. So when I do
crc16.compute(buffer(5, 19))
Lua complains: bad argument #1 to 'compute' (string expected, got userdata).
compute() in the crc16 implementation looks like this:
static int compute(lua_State *L)
{
    const char *data;
    size_t len = 0;
    unsigned short r, crc = 0;

    data = luaL_checklstring(L, 1, &len);
    for ( ; len > 0; len--)
    {
        r = (unsigned short)(crc >> 8);
        crc <<= 8;
        crc ^= crc_table[r ^ *data];
        data++;
    }
    lua_pushinteger(L, crc);
    return 1;
}
It seems luaL_checklstring fails. So I guess I would either need to convert the input into a Lua string, which I am not sure will work, as not all bytes of my input are necessarily printable characters. Or I would need to adjust the above code so that it accepts input of type userdata. I found lua_touserdata(), but this seems to return something like a pointer, so I would need a second argument for the length, right?
I don't necessarily need to use this implementation. Any crc16 implementation for lua that accepts userdata would perfectly solve the problem.

The buffer that you get from Wireshark can be used as a ByteArray like this:
byte_array = buffer(5,19):bytes()
ByteArray has a __tostring metamethod that converts the bytes into a string representation of the bytes as hex. So you can call the crc function like this:
crc16.compute(tostring(byte_array))
'Representation of the bytes as hex' means an input byte with the bits 11111111 turns into the ASCII string FF. The ASCII string FF is 01000110 01000110 in bits, or 46 46 in hex. This means what you get in C is not the original byte array: you need to decode the ASCII representation back into the original bytes before computing the CRC, otherwise you will obviously get a different CRC.
First, this function converts a single character c containing one ascii hex character back into the value it represents:
static char ascii2char(char c) {
    c = tolower(c);
    if (c >= '0' && c <= '9')
        return c - '0';
    else if (c >= 'a' && c <= 'f')
        return c - 'a' + 10;
    return 0; /* not a hex digit; should not happen with tostring() output */
}
Now in the compute function we loop through the string representation, always combining two characters into one byte.
int compute(lua_State *L) {
    size_t len;
    const char *str = luaL_checklstring(L, 1, &len);
    uint8_t *data = (uint8_t *) malloc(len / 2);

    /* combine each pair of hex characters into one byte */
    for (size_t n = 0; n < len / 2; n++) {
        data[n] = ascii2char(str[2 * n]) << 4;
        data[n] |= ascii2char(str[2 * n + 1]);
    }

    crc16_t crc = crc16_init();
    crc = crc16_update(crc, data, len / 2);
    crc = crc16_finalize(crc);
    lua_pushinteger(L, crc);
    free(data);
    return 1;
}
In this example, I used the CRC functions crc16_init, crc16_update and crc16_finalize generated using pycrc, not the CRC implementation linked in the question. Note that you need to use the same polynomial etc. as were used when generating the CRC. Pycrc allows you to generate CRC functions as needed.
My packets also contain a crc32. Pycrc can also generate code for crc32, so it works all the same way for crc32.

Christopher K outlines what is mostly the correct answer, but the conversion of hex values back into bytes seemed like a lot of hard work, and that got me looking further, since I was searching for something like this.
The trick I had missed was that, as well as calling the function with buffer:bytes(), you can also call
buffer:raw()
This provides exactly what is needed: a plain Lua string (TSTRING) containing the raw bytes, which can be passed to the C function directly, without the ASCII conversions that would, I imagine, add significantly to the load in the C code.
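With raw(), the hex-decoding step disappears entirely. A minimal sketch of the C side, reusing the pycrc-style helpers from the answer above (crc16_init / crc16_update / crc16_finalize; their exact signatures depend on how you generated them, so treat this as an assumption):

static int compute(lua_State *L)
{
    size_t len;
    /* buffer:raw() pushes the underlying bytes as a plain Lua string,
     * so luaL_checklstring hands us the raw data directly */
    const char *data = luaL_checklstring(L, 1, &len);

    crc16_t crc = crc16_init();
    crc = crc16_update(crc, (const unsigned char *)data, len);
    crc = crc16_finalize(crc);
    lua_pushinteger(L, crc);
    return 1;
}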

Related

Run length encoding on binary files in C

I wrote this function that performs a slightly modified variation of run-length encoding on text files in C.
I'm trying to generalize it to binary files but I have no experience working with them. I understand that I can compare bytes of binary data much the same way I compare chars from a text file, but I am not sure how to go about writing the number of occurrences of a byte to the compressed version like I do in the code below.
A note on the type of RLE I'm using: bytes that occur more than once in a row are duplicated, to signal that the number which follows is the count of occurrences rather than just a literal character in the file. Counts longer than one digit are broken down into runs of at most 9 occurrences.
For example, aaaaaaaaaaabccccc becomes aa9aa2bcc5.
Here's my code:
char* encode(char* str)
{
    char* ret = calloc(2 * strlen(str) + 1, 1);
    size_t retIdx = 0, inIdx = 0;

    while (str[inIdx]) {
        size_t count = 1;
        size_t contIdx = inIdx;
        while (str[inIdx] == str[++contIdx]) {
            count++;
        }
        size_t tmpCount = count;
        // break down counts with 2 or more digits into counts ≤ 9
        while (tmpCount > 9) {
            tmpCount -= 9;
            ret[retIdx++] = str[inIdx];
            ret[retIdx++] = str[inIdx];
            ret[retIdx++] = '9';
        }
        char tmp[2];
        ret[retIdx++] = str[inIdx];
        if (tmpCount > 1) {
            // repeat character (this tells the decompressor that the next digit
            // is in fact the # of consecutive occurrences of this char)
            ret[retIdx++] = str[inIdx];
            // convert single-digit count to string ("%zu" is the correct
            // conversion for size_t; "%ld" was technically wrong)
            snprintf(tmp, 2, "%zu", tmpCount);
            ret[retIdx++] = tmp[0];
        }
        inIdx += count;
    }
    return ret;
}
What changes are needed to adapt this to a binary stream? The first problem I see is the snprintf call, since it operates on a text format. The way I handle multiple-digit occurrence runs also rings a bell: we're not working in base 10 anymore, so that has to change. I'm just unsure how, having almost never worked with binary data.
A few ideas that can be useful to you:
one simple method to generalize RLE to binary data is to use bit-based compression. For example, the bit sequence 00000000011111100111 can be translated to: first bit 0, then the run lengths 9 6 2 3. Since the binary alphabet is composed of only two symbols, you only need to store the first bit value (this can be as simple as storing it in the very first bit) and then the lengths of the contiguous runs of equal values. Arbitrarily large integers can be stored in a binary format using Elias gamma coding; a small sketch follows after these ideas. Extra padding can be added to fit the entire sequence nicely into an integer number of bytes. Using this method, the above sequence can be encoded like this:
00000000011111100111 -> 0 0001001 00110 010 011
                        ^ ^       ^     ^   ^
                first bit 9       6     2   3
If you want to keep it byte based, one idea is to treat all the even bytes as frequencies (interpreted as an unsigned char) and all the odd bytes as the values. If one byte occurs more than 255 times, you can just repeat it. This can be very inefficient, but it is definitely simple to implement, and it might be good enough if you can make some assumptions about the input.
Also, you can consider moving away from RLE and implementing Huffman coding or other, more sophisticated algorithms (e.g. LZW).
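As promised, here is a minimal sketch of an Elias gamma encoder matching the example above (the bit-writer and all names in it are made up for this sketch):

#include <stdint.h>
#include <stddef.h>

/* Illustrative bit-writer: appends bits to a zero-initialized byte
 * buffer, most significant bit of each byte first. */
typedef struct {
    uint8_t *buf;    /* output buffer */
    size_t bitpos;   /* number of bits written so far */
} BitWriter;

static void put_bit(BitWriter *w, int bit)
{
    if (bit)
        w->buf[w->bitpos / 8] |= (uint8_t)(0x80u >> (w->bitpos % 8));
    w->bitpos++;
}

/* Elias gamma code for n >= 1: floor(log2(n)) zero bits, then n in
 * binary, MSB first.  E.g. 9 -> 0001001, 6 -> 00110, 2 -> 010, 3 -> 011. */
static void put_elias_gamma(BitWriter *w, unsigned n)
{
    int bits = 0;
    for (unsigned t = n; t > 1; t >>= 1)
        bits++;                   /* bits = floor(log2(n)) */
    for (int i = 0; i < bits; i++)
        put_bit(w, 0);            /* unary length prefix */
    for (int i = bits; i >= 0; i--)
        put_bit(w, (n >> i) & 1); /* the value itself */
}

Encoding the example sequence then amounts to one put_bit(w, 0) for the first bit, followed by put_elias_gamma for 9, 6, 2 and 3.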
Implementation-wise, I think tucuxi already gave you some hints.
You only have to address 2 problems:
you cannot use any str-related functions, because C strings do not deal well with embedded '\0' bytes: strlen, for example, will return the index of the first 0x0 byte in the buffer. The length of the input must be passed in as an additional parameter: char *encode(char *start, size_t length)
your output cannot have an implicit length of strlen(ret), because there may be extra 0-bytes sprinkled about in the output. You again need an extra parameter: size_t encode(char *start, size_t length, char *output) (this version requires the output buffer to be reserved externally, with a size of at least length*2, and returns the length of the encoded output; a sketch of this variant follows the snippet below)
The rest of the code, assuming it was working before, should continue to work correctly now. If you want to go beyond base 10, and for instance use base 256 for greater compression, you only need to change the constant in the break-things-up loop (from 9 to 255) and replace the snprintf as follows:
// before
snprintf(tmp, 2, "%zu", tmpCount);
ret[retIdx++] = tmp[0];

// after: much easier
ret[retIdx++] = tmpCount;
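Putting both points together, here is a minimal sketch of the binary-safe, base-256 variant (it keeps the duplicate-byte convention from the question; the name encode_bin and the unsigned char types are choices of this sketch):

#include <stddef.h>

/* Binary-safe RLE encode: the input length is passed in explicitly and
 * the encoded length is returned.  The output buffer must hold at least
 * 2 * length bytes (worst case: runs of exactly two bytes emit three). */
size_t encode_bin(const unsigned char *start, size_t length, unsigned char *output)
{
    size_t outIdx = 0, inIdx = 0;
    while (inIdx < length) {
        size_t count = 1;
        while (inIdx + count < length && start[inIdx + count] == start[inIdx])
            count++;
        size_t tmpCount = count;
        // break down long runs into counts <= 255
        while (tmpCount > 255) {
            tmpCount -= 255;
            output[outIdx++] = start[inIdx];
            output[outIdx++] = start[inIdx];
            output[outIdx++] = 255;
        }
        output[outIdx++] = start[inIdx];
        if (tmpCount > 1) {
            // duplicated byte signals that a count byte follows
            output[outIdx++] = start[inIdx];
            output[outIdx++] = (unsigned char)tmpCount;
        }
        inIdx += count;
    }
    return outIdx;
}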

c reinterpret pointer to datatype with bigger size

I'm trying to interpret WebSocket frames that I get over a TCP connection. I want to do this in pure C (so no reinterpret_cast). The format is specified in RFC 6455. I want to fill the following struct:
typedef struct {
    uint8_t flags;
    uint8_t opcode;
    uint8_t isMasked;
    uint64_t payloadLength;
    uint32_t maskingKey;
    char* payloadData;
} WSFrame;
with the following Function:
static void parseWsFrame(char *data, WSFrame *frame) {
    frame->flags = (*data) & FLAGS_MASK;
    frame->opcode = (*data) & OPCODE_MASK;
    //next byte
    data += 1;
    frame->isMasked = (*data) & IS_MASKED;
    frame->payloadLength = (*data) & PAYLOAD_MASK;
    //next byte
    data += 1;
    if (frame->payloadLength == 126) {
        frame->payloadLength = *((uint16_t *)data);
        data += 2;
    } else if (frame->payloadLength == 127) {
        frame->payloadLength = *((uint64_t *)data);
        data += 8;
    }
    if (frame->isMasked) {
        frame->maskingKey = *((uint32_t *)data);
        data += 4;
    } else {
        //still need to initialize it to shut up the compiler
        frame->maskingKey = 0;
    }
    frame->payloadData = data;
}
The code is for the ESP8266, so debugging is only possible with printfs to the serial console. Using this method, I discovered that the code crashes right after frame->maskingKey = *((uint32_t *)data);. The first two ifs are skipped, so this is the first time I cast a pointer to another pointer type.
The data is not \0-terminated, but I get the size in the data-received callback. In my test, I'm sending the message 'test' over the already established WebSocket, and the received data length is 10, so:
1 byte flags and opcode
1 byte masked and payload length
4 bytes masking key
4 bytes payload data ('test')
At the point where the code crashes, I expect data to be offset by 2 bytes from the initial position, so there is enough data left to read the following 4 bytes.
I did not code any C for a long time, so I expect only a small error in my code.
PS: I've seen a lot of code where the values are interpreted byte by byte and shifted, but I see no reason why this method should not work either.
The problem with casting a char* to a pointer to a larger type is that some architectures do not allow unaligned reads.
That is, for example, if you try to read a uint32_t through a pointer, the value of the pointer itself has to be a multiple of 4. Otherwise, on some architectures, you will get a bus fault of some sort (e.g. a signal, trap, or exception).
Because this data is coming in over TCP and the stream / protocol format is laid out without any padding, you will likely need to read it out of the buffer into local variables byte by byte (e.g. using memcpy) as appropriate. For example:
if (frame->isMasked) {
    memcpy(&frame->maskingKey, data, 4);
    data += 4;
    // TODO: handle endianness, e.g.: frame->maskingKey = ntohl(frame->maskingKey);
} else {
    //still need to initialize it to shut up the compiler
    frame->maskingKey = 0;
}
There are two problems:
data might not be correctly aligned for uint32_t
The bytes in data might not be in the same order your hardware uses for the value representation of integers (the "endianness" issue).
To write reliable code, look at the message specification to see which order the bytes come in. If they are most-significant-byte first, then the portable version of your code would be:
unsigned char *udata = (unsigned char *)data;
frame->maskingKey = udata[0] * 0x1000000ul
                  + udata[1] * 0x10000ul
                  + udata[2] * 0x100ul
                  + udata[3];
This might look like a handful at first, but you could make an inline function that takes a pointer as an argument and returns the uint32_t, which will keep your code readable. A similar problem applies to your reads of uint16_t; a sketch of such helpers follows.
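A minimal sketch of those helpers (the names read_be32 and read_be16 are made up here):

#include <stdint.h>

/* Read big-endian (network byte order) integers from a possibly
 * unaligned buffer, one byte at a time. */
static inline uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static inline uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)(((uint16_t)p[0] << 8) | p[1]);
}

With these, the masking-key line becomes frame->maskingKey = read_be32((const unsigned char *)data); and the 16-bit extended payload length can be read with read_be16 the same way.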

Converting from hexadecimal string to byte array in C

I am sending byte arrays between a TCP socket server and client in C. The information that I am sending is a series of integers.
I have it working, but because I am not too conversant with C, I was wondering if anyone could suggest a better solution, or at least look it over and tell me that I'm not being too crazy or using outdated code.
First, I generate a random decimal value, let's say 350. I need to transmit this over the socket connection as a hex byte array; it is decoded back to its decimal value at the other end.
So far, I convert it to hex this way:
unsigned char hexstr[4];
sprintf(hexstr, "%02X", numToConvert); // where numToConvert is a decimal integer value like 350
At this point, I have a string in hexstr that's something like "15E" (again, using the hex value of 350 for an example).
Now, I need to store this in a byte array so that it looks something like: myArray = {0X00, 0X00, 0X01, 0X5E};
Obviously I can't just write: myArray = {0X00, 0X00, 0X01, 0X5E} because the values will be different every time, since a new random number is generated every time.
Currently, I do it like this (pseudocode because the string manipulation part is irrelevant but long):
lastTwoChars = getLastTwoCharsFromString(hexstr); // so lastTwoChars would now contain "5E"
Then (actual code):
sscanf(lastTwoChars, "%0X", &res); // now the variable res contains the byte representation of lastTwoChars, is my understanding
Then finally:
myArray[3] = res;
Then, I take the next two rightmost chars from hexstr (again, using the sample value of "15E", this would be "01"; if there is only one character left, as here where "1" remains after taking "5E" out of "15E", I pad with zeros on the left), convert it the same way using sscanf, and insert it into myArray[2]. Repeat for myArray[1] and myArray[0].
Then I send the array using write().
So, after hours of plugging away at it, this all does work... but because I don't use C very much, I have a nagging suspicion that there's something I am missing in all this. Can anyone comment if what I'm doing seems OK, or there's something obvious I'm using improperly or neglecting to use?
#include <stdio.h>
#include <limits.h>

int main(){
    unsigned num = 0x15E; // num = 350
    int i, size = sizeof(unsigned);
    unsigned char myArray[size];

    for (i = size - 1; i >= 0; --i, num >>= CHAR_BIT){
        myArray[i] = num & 0xFF;
    }
    for (i = 0; i < size; ++i){
        printf("0X%02hhX ", myArray[i]);
    }
    printf("\n");
    return 0;
}
On the transmit side, convert a 32-bit number to a four byte array with this code
void ConvertValueToArray( uint32_t value, uint8_t array[] )
{
    int i;
    for ( i = 3; i >= 0; i-- )
    {
        array[i] = value & 0xff;
        value >>= 8;
    }
}
On the receive side, convert the byte array back into a number with this code
uint32_t ConvertArrayToValue( uint8_t array[] )
{
    int i;
    uint32_t value = 0;
    for ( i = 0; i < 4; i++ )
    {
        value <<= 8;
        value |= array[i];
    }
    return( value );
}
Note that it's important not to use generic types like int when writing this kind of code, since an int can be different sizes on different systems. The fixed-width types are defined in <stdint.h>.
Here's a simple test that demonstrates the conversions (without actually sending the byte arrays over the network).
#include <stdio.h>
#include <stdint.h>

int main( void )
{
    uint32_t input, output;
    uint8_t byte_array[4];

    input = 350;
    ConvertValueToArray( input, byte_array );
    output = ConvertArrayToValue( byte_array );
    printf( "%u\n", output );
}
If your array is 4-byte aligned (and even if it isn't on machines that support unaligned access), you can use the htonl function to convert a 32-bit integer from host to network byte order and store the whole thing at once:
#include <arpa/inet.h> // or <netinet/in.h>
...
*(uint32_t*)myArray = htonl(num);
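If you cannot guarantee that alignment, the same idea stays safe on any machine with memcpy; a minimal sketch (the name store_u32_be is made up here):

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Store a 32-bit value into a byte array in network (big-endian) byte
 * order; memcpy imposes no alignment requirement on the destination. */
void store_u32_be(uint8_t *dst, uint32_t value)
{
    uint32_t be = htonl(value);
    memcpy(dst, &be, sizeof be);
}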

Decode FOUR_BITS of a byte in a byte array (in C)

I am writing code to decode a byte array based on user input (EIGHT_BITS at a time or FOUR_BITS at a time). I've managed to decode the byte array as EIGHT_BITS; now I want to decode it in terms of FOUR_BITS.
INT DecodeElem(UINT8 *decodeBuf, UINT8 elemlen, UINT8 *tempBuf, UINT8 elemlength){
    if (elemlength == EIGHT_BITS){
        *tempBuf = getByte(decodeBuf + decodeByteCount);
        decodeOffset = 0;
        decodeByteCount++;
    }
}
i.e., if elemlength == FOUR_BITS, I need to decode the first four bits of a particular byte in the byte array. Could someone let me know how to do this without modifying the EIGHT_BITS case I have written above?
What I basically need is another if statement: if (elemlength == FOUR_BITS)
Note: tempBuf is of type CHAR * and I can't change the type.
decodeByteCount and decodeOffset are global variables; decodeBuf is the already encoded byte array which needs to be decoded. elemlen is for future use and I will take care of it.
This is my getByte function:
UINT8 getByte(UINT8 *byteBuf)
{
    return ((UINT8)*byteBuf);
}
I'm not sure what decodeOffset and decodeByteCount do....
It should be something like this (assuming each byte holds two 4-bit values; if that assumption is wrong, remove the "/ 2" from the code):
if (elemlength == FOUR_BITS){
    *tempBuf = getByte(decodeBuf + decodeByteCount / 2);
    if (decodeByteCount % 2)
        *tempBuf = (*tempBuf & 0xF0) >> 4;
    else
        *tempBuf = (*tempBuf & 0x0F);
    // ???decodeOffset = 0;
    // ???decodeByteCount++;
}
For example, with decodeBuf[0] = 0xAB, the call with decodeByteCount == 0 yields 0x0B (the low nibble) and the call with decodeByteCount == 1 yields 0x0A (the high nibble).

Convert HEX value to (chars, string, letters and numbers)

I'm programming an AVR ATmega16 microcontroller in C.
I don't really understand C; I spend my time programming in PHP, and I just don't get it in C.
My problem is, I have this function:
unsigned char
convert_sn_to_string (const unsigned char *SN_a /* IDstring */ )
{
    unsigned char i;
    for (i = 0; i < 6; i++) // the SN is 6 bytes
    {
        if (SN_a[i] == 0x00)
        {
            Send_A_String("00");
        }
    }
    return 1;
}
Inside the for loop I can access the 6-byte value as hex.
Inside the variable SN_A I can have
SN_A[0]=0x00;
SN_A[1]=0xFF;
SN_A[3]=0xAA;
SN_A[4]=0x11;
(...)
and similar.
Each byte can be anything from 00 to FF.
What I need is to convert that to a 12-character string.
If I have
SN_A[0]=0x00;
SN_A[1]=0xFF;
SN_A[3]=0xAA;
SN_A[4]=0x11;
(...)
I would like to get as output
new[0]=0;
new[1]=0;
new[2]=A;
new[3]=A;
new[4]=1;
new[5]=1;
(...)
and so on up to 12, because I would like to split those 6 double-digit hex values into separate characters.
So then I can do a loop
for (i = 0; i < 12; i++)
{
    do_something_with_one_letter(new[i]);
}
Now I can play with those values; I can send them to a display, or anything else I need.
A hex value is simply another way of writing integers. 0xFF == 255.
So, if you want to "split" them, you first need to decide how: that decision determines exactly how you stuff the split values into your new array.
To split a value, something like this can be used:
hexval = 0x1A
low_nybble = hexval & 0xF
high_nybble = (hexval >> 4) & 0xF
You now have 1 stored in high_nybble and 10 stored in low_nybble.
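Applying that to the question, here is a minimal sketch that expands the 6 serial-number bytes into a 12-character hex string (the name sn_to_string and the lookup-table approach are choices of this sketch):

static const char hexdigits[] = "0123456789ABCDEF";

/* Expand 6 input bytes into 12 ASCII hex characters plus a terminating
 * '\0', so each character can then be sent to the display individually. */
void sn_to_string(const unsigned char *SN_a, char *out /* at least 13 bytes */)
{
    unsigned char i;
    for (i = 0; i < 6; i++) {
        out[2 * i]     = hexdigits[(SN_a[i] >> 4) & 0xF]; /* high nybble */
        out[2 * i + 1] = hexdigits[SN_a[i] & 0xF];        /* low nybble  */
    }
    out[12] = '\0';
}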
