Access struct as array of bytes - c

I'm currently rewriting a program that processes data received over a serial connection using the RDM protocol. Each packet is received by a UART and has a specific structure, but packets may vary in length. An example packet structure is below, assuming the number of bytes in the packet is n (this may change depending on the contents of the packet).
What I want to do is define a struct in my C code that has the various parameters defined, but still be able to read and write bytes to/from the struct from the UART as though the struct were just an array of uint8_t. My issue with this is that I have read that structs may not always be stored in contiguous sections of memory, so taking &RDMPacket1 and incrementing through the struct may end up with the data not being in the right place.
My other problem is that if I have an array inside the struct to store packet data of the maximum possible length (220 bytes), then the checksum at the end of the packet would be written into the wrong place. What methods could be used to receive the data and place it into the struct?
Example packet definition (shortened from standard)
Byte | Description
0 | START Code - Constant, can be ignored
1 | Sub-Start Code - Contains command for device to process
2 | Message Length - Points to byte number of Checksum High (up to 255)
3-8 | Destination UID - Unique ID of packet Destination
9-14 | Source UID - Unique ID of packet Source
15 | Transaction Number - ID of transaction between controller and responder
16-(n-2) | Data (up to 220 bytes long)
n-1 | Checksum High
n | Checksum Low
This is an example of a struct to hold a packet of the maximum possible length:
struct RDMPacket
{
uint8_t subStartCode;
uint8_t messageLength;
uint32_t destinationUID;
uint32_t sourceUID;
uint8_t transactionNumber;
uint8_t portID;
uint8_t messageCount;
uint8_t subDevice;
uint8_t commandClass;
uint8_t parameterID;
uint8_t parameterDataLength;
uint8_t parameterData[220];
uint16_t checksum;
} RDMPacket1;

The problem you are describing arises because the compiler may insert padding between struct members to satisfy alignment requirements. Each struct field has its own alignment: for example, if a field's alignment is 4 bytes, that field will start at an address that is divisible by 4. To avoid this, you can use GCC's packed attribute, __attribute__((packed)), on the structure, which instructs the compiler to lay the structure out without padding. Other compilers have #pragma pack or a corresponding compiler directive for this purpose. To make sure that your struct is packed, you can check its size with sizeof and compare it to the expected size.
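As a rough illustration of the sizeof check, here is a minimal sketch assuming GCC or Clang and a C11 compiler. It reuses the first few fields of the struct from the question; the struct names, the macro and the helper function are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same leading fields as the struct in the question, without packing.
   The compiler is free to insert padding before the uint32_t members. */
struct rdm_fields_plain {
    uint8_t  subStartCode;
    uint8_t  messageLength;
    uint32_t destinationUID;
    uint32_t sourceUID;
    uint8_t  transactionNumber;
};

/* The same fields with the GCC/Clang packed attribute applied. */
struct __attribute__((packed)) rdm_fields_packed {
    uint8_t  subStartCode;
    uint8_t  messageLength;
    uint32_t destinationUID;
    uint32_t sourceUID;
    uint8_t  transactionNumber;
};

/* 1 + 1 + 4 + 4 + 1 = 11 bytes expected when no padding is inserted. */
#define RDM_FIELDS_WIRE_SIZE 11

/* Compile-time version of "compare sizeof to the expected size". */
_Static_assert(sizeof(struct rdm_fields_packed) == RDM_FIELDS_WIRE_SIZE,
               "rdm_fields_packed contains padding");

/* Copy received bytes into the packed struct. Multi-byte members keep
   whatever byte order they had in the buffer. */
struct rdm_fields_packed rdm_fields_from_bytes(const uint8_t *buf)
{
    struct rdm_fields_packed f;
    assert(buf != NULL);
    memcpy(&f, buf, sizeof f);
    return f;
}
```

On many targets sizeof(struct rdm_fields_plain) comes out larger than 11 because of the padding, which is exactly the mismatch the size check is meant to catch.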

Related

Parsing ID3V2 Frames in C

I have been attempting to retrieve ID3V2 Tag Frames by parsing through the mp3 file and retrieving each frame's size. So far I have had no luck.
I have effectively allocated memory to a buffer to aid in reading the file and have been successful in printing out the header version but am having difficulty in retrieving both the header and frame sizes. For the header framesize I get 1347687723, although viewing the file in a hex editor I see 05 2B 19.
Two snippets of my code:
typedef struct{ //typedef structure used to read tag information
char tagid[3]; //0-2 "ID3"
unsigned char tagversion; //3 $04
unsigned char tagsubversion;//4 00
unsigned char flags; //5 %abc0000
uint32_t size; //6-9 4 * %0xxxxxxx
}ID3TAG;
if(buff){
fseek(filename,0,SEEK_SET);
fread(&Tag, 1, sizeof(Tag),filename);
if(memcmp(Tag.tagid,"ID3", 3) == 0)
{
printf("ID3V2.%02x.%02x.%02x \nHeader Size:%lu\n",Tag.tagversion,
Tag.tagsubversion, Tag.flags ,Tag.size);
}
}
Due to memory alignment, the compiler has inserted 2 bytes of padding between flags and size. If your struct were laid out in memory without padding, size would be at offset 6 (from the beginning of the struct). Since a 4-byte element must normally sit at an address that is a multiple of 4, the compiler adds 2 bytes so that size moves to the next multiple of 4, which here is 8. So when you read from your file, size receives file bytes 8-11 instead of bytes 6-9. The four size bytes from the file actually land at offset 6 of the struct, that is, at (unsigned char *)&Tag + 6, with the first two of them sitting in the padding.
To fix that, you can read fields one by one.
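Reading the fields one by one could look like the sketch below. It assumes the 10-byte header layout described in the question ("ID3", version, revision, flags, then four size bytes); the ID3HEADER type and both helper functions are hypothetical names, and the size bytes are kept raw here because they still need decoding.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ID3_HEADER_LEN 10  /* "ID3" + version + revision + flags + 4 size bytes */

typedef struct {
    char          tagid[3];
    unsigned char tagversion;
    unsigned char tagsubversion;
    unsigned char flags;
    unsigned char rawsize[4];  /* still encoded, decode separately */
} ID3HEADER;

/* Copy each field from its known byte offset, so struct padding
   never influences where the bytes end up. Returns 0 on success. */
int id3_parse_header(const unsigned char *raw, ID3HEADER *out)
{
    assert(raw != NULL && out != NULL);
    if (memcmp(raw, "ID3", 3) != 0)
        return -1;                     /* not an ID3v2 tag */
    memcpy(out->tagid, raw, 3);
    out->tagversion    = raw[3];
    out->tagsubversion = raw[4];
    out->flags         = raw[5];
    memcpy(out->rawsize, raw + 6, 4);
    return 0;
}

/* Read the header bytes from the start of an open file, then parse them. */
int id3_read_header(FILE *fp, ID3HEADER *out)
{
    unsigned char raw[ID3_HEADER_LEN];
    if (fseek(fp, 0, SEEK_SET) != 0)
        return -1;
    if (fread(raw, 1, sizeof raw, fp) != sizeof raw)
        return -1;
    return id3_parse_header(raw, out);
}
```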
The ID3v2 header structure is consistent across all ID3v2 versions (ID3v2.0, ID3v2.3 and ID3v2.4).
Its size is stored as a big-endian synch-safe int32.
Synchsafe integers are
integers that keep its highest bit (bit 7) zeroed, making seven bits
out of eight available. Thus a 32 bit synchsafe integer can store 28
bits of information.
Example:
255 (%11111111) encoded as a 16 bit synchsafe integer is 383
(%00000001 01111111).
Source : http://id3.org/id3v2.4.0-structure § 6.2
Below is a straightforward, real-life C# implementation that you can easily adapt to C
public int DecodeSynchSafeInt32(byte[] bytes)
{
return
bytes[0] * 0x200000 + //2^21
bytes[1] * 0x4000 + //2^14
bytes[2] * 0x80 + //2^7
bytes[3];
}
=> Using values you read on your hex editor (00 05 EB 19), the actual tag size should be 112025 bytes.
By coincidence I am also working on an ID3V2 reader. The doc says that the size is encoded in four 7-bit bytes. So you need another step to convert the byte array into an integer... I don't think just reading those bytes as an int will work because of the null bit on top.

casting pointers in a buffer

Say I have a buffer filled with data and that I got it off the network.
uint8_t buffer[100];
Now imagine that this buffer has different fields. Some are 1 byte, some 2 bytes, and some 4 bytes. All these fields are packed in the buffer.
Now pretend that I want to grab the value of one of the 16 bit fields. Say that in the buffer, the field is stored like so:
buffer[2] = one byte of two byte field
buffer[3] = second byte of two byte field
I could grab that value like this:
uint16_t* p_val;
p_val = (uint16_t*) &buffer[2];
or
p_val = (uint16_t*) (buffer + 2);
printf("value: %d\n", ntohs(*p_val));
Is there anything wrong with this approach? Or alignment issues I should watch out for?
As has come out in commentary, yes, there are issues with your proposed approach. Although it might work on the target machine, or it might happen to work in a given case, it is not, in general, safe to cast between different pointer types. (There are exceptions.)
To properly take alignment and byte order into consideration, you could do this:
union convert {
uint32_t word;
uint16_t halfword[2];
uint8_t bytes[4];
} convert;
uint16_t result16;
memcpy(convert.bytes, buffer + offset, 2);
/* assuming network byte order: */
result16 = ntohs(convert.halfword[0]);
If you are in control of the data format, then network byte order is a good choice, as the program doesn't then need explicitly to determine, assume, or know the byte order of the machine on which it is running.
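A slightly simpler variant of the same idea, for illustration: memcpy the two bytes straight into a uint16_t and then convert from network byte order. This sketch assumes a POSIX system (for ntohs in <arpa/inet.h>); the helper name read_u16_net is made up.

```c
#include <arpa/inet.h>  /* ntohs (POSIX) */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 16-bit network-order field at any byte offset in the buffer.
   memcpy has no alignment requirement, so odd offsets are fine. */
uint16_t read_u16_net(const uint8_t *buffer, size_t offset)
{
    uint16_t raw;
    assert(buffer != NULL);
    memcpy(&raw, buffer + offset, sizeof raw);
    return ntohs(raw);
}
```

For the field in the question this would be called as read_u16_net(buffer, 2).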

Reverse the Endianness of a C structure

I have a structure in C that looks like this:
typedef u_int8_t NN;
typedef u_int8_t X;
typedef int16_t S;
typedef u_int16_t U;
typedef char C;
typedef struct{
X test;
NN test2[2];
C test3[4];
U test4;
} Test;
I have declared the structure and written values to the fields as follows:
Test t;
int t_buflen = sizeof(t);
memset( &t, 0, t_buflen);
t.test = 0xde;
t.test2[0]=0xad; t.test2[1]=0x00;
t.test3[0]=0xbe; t.test3[1]=0xef; t.test3[2]=0x00; t.test3[3]=0xde;
t.test4=0xdeca;
I am sending this structure via UDP to a server. At present this works fine when I test locally, however I now need to send this structure from my little-endian machine to a big-endian machine. I'm not really sure how to do this.
I've looked into using htons but I'm not sure if that's applicable in this situation as it seem to only be defined for unsigned ints of 16 or 32 bits, if I understood correctly.
I think there may be two issues here, depending on how you're sending this data over UDP.
Issue 1: Endianness
As you've said, endianness is an issue. You're right when you mention using htons and ntohs for shorts. You may also find htonl and its opposite, ntohl, useful.
Endianness has to do with the byte ordering of multiple-byte data types in memory. Therefore, for single-byte data types you do not have to worry. In your case it is the 2-byte data that I guess you're asking about.
To use these functions you will need to do something like the following...
Sender:
-------
t.test = 0xde; // Does not need to be swapped
t.test2[0] = 0xad; ... // Does not need to be swapped
t.test3[0] = 0xbe; ... // Does not need to be swapped
t.test4 = htons(0xdeca); // Needs to be swapped
...
sendto(..., &t, ...);
Receiver:
---------
recvfrom(..., &t, ...);
t.test4 = ntohs(t.test4); // Needs to be swapped
htons() and ntohs() convert to and from network byte order, which is big-endian. Therefore your little-endian machine byte-swaps t.test4, and on receipt the big-endian machine just uses the value as read (ntohs() is effectively a no-op there).
The following diagram will make this more clear...
If you did not want to use the htons() function and its variants then you could just define the buffer format at the byte level. This diagram makes this clearer...
In this case your code might look something like
Sender:
-------
uint8_t buffer[SOME SIZE];
t.test = 0xde;
t.test2[0] = 0xad; ...
t.test3[0] = 0xbe; ...
t.test4 = 0xdeca;
buffer[0] = t.test;
buffer[1] = t.test2[0];
/// and so on, until...
buffer[7] = t.test4 & 0xff;
buffer[8] = (t.test4 >> 8) & 0xff;
...
sendto(..., buffer, ...);
Receiver:
---------
uint8_t buffer[SOME SIZE];
recvfrom(..., buffer, ...);
t.test = buffer[0];
t.test2[0] = buffer[1];
// and so on, until...
t.test4 = buffer[7] | (buffer[8] << 8);
The send and receive code will work regardless of the respective endianness of the sender and receiver because the byte-layout of the buffer is defined and known by the program running on both machines.
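Filled in completely, the byte-level version could look like the following sketch. It uses the field layout from the question (1 + 2 + 4 + 2 = 9 bytes) and the same low-byte-first order for test4 as the fragments above; the function names and TEST_WIRE_SIZE are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TEST_WIRE_SIZE 9  /* 1 + 2 + 4 + 2 bytes on the wire */

typedef struct {
    uint8_t  test;
    uint8_t  test2[2];
    char     test3[4];
    uint16_t test4;
} Test;

/* Write each field at a fixed offset; test4 goes out low byte first. */
void test_pack(const Test *t, uint8_t buffer[TEST_WIRE_SIZE])
{
    assert(t != NULL && buffer != NULL);
    buffer[0] = t->test;
    memcpy(buffer + 1, t->test2, 2);
    memcpy(buffer + 3, t->test3, 4);
    buffer[7] = t->test4 & 0xff;
    buffer[8] = (t->test4 >> 8) & 0xff;
}

/* Rebuild the struct from the same fixed offsets. */
void test_unpack(const uint8_t buffer[TEST_WIRE_SIZE], Test *t)
{
    assert(t != NULL && buffer != NULL);
    t->test = buffer[0];
    memcpy(t->test2, buffer + 1, 2);
    memcpy(t->test3, buffer + 3, 4);
    t->test4 = (uint16_t)(buffer[7] | (buffer[8] << 8));
}
```

The sender would pass the 9-byte buffer to sendto() and the receiver would call test_unpack() on what recvfrom() returned.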
However, if you're sending your structure through the socket in this way you should also note the caveat below...
Issue 2: Data alignment
The article "Data alignment: Straighten up and fly right" is a great read for this one...
The other problem you might have is data alignment. This is not always the case, even between machines that use different endian conventions, but is nevertheless something to watch out for...
struct
{
uint8_t v1;
uint16_t v2;
}
In the above bit of code the offset of v2 from the start of the structure could be 1 byte, 2 bytes, 4 bytes (or just about anything). The compiler cannot re-order members in your structure, but it can pad the distance between variables.
Let's say machine 1 has a 16-bit wide data bus. If we took the structure without padding, the machine would have to do two fetches to get v2. Why? Because we access 2 bytes of memory at a time at the h/w level. Therefore the compiler could pad out the structure like so
struct
{
uint8_t v1;
uint8_t invisible_padding_created_by_compiler;
uint16_t v2;
}
If the sender and receiver differ on how they pack data into a structure then just sending the structure as a binary blob will cause you problems. In this case you may have to pack the variables into a byte stream/buffer manually before sending. This is often the safest way.
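If you want to see what a particular compiler actually does with the two-member struct above, offsetof and sizeof will tell you. This is just a small sketch; the struct tag and function names are invented, and the numbers printed depend on the compiler and target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct sample {
    uint8_t  v1;
    uint16_t v2;
};

/* Number of padding bytes the compiler added to struct sample. */
size_t sample_padding(void)
{
    size_t payload = sizeof(uint8_t) + sizeof(uint16_t);
    assert(sizeof(struct sample) >= payload);
    return sizeof(struct sample) - payload;
}

/* Print where v2 really lives and how big the struct is. */
void print_sample_layout(void)
{
    printf("offsetof(v2) = %zu, sizeof = %zu, padding = %zu\n",
           offsetof(struct sample, v2),
           sizeof(struct sample),
           sample_padding());
}
```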
The structure as a whole doesn't really have an endianness. It's the separate multi-byte fields that need to be converted to big-endian when needed. You can make a copy of the structure, rewrite each multi-byte field using htonl/htons, and then send the result. 8-bit fields don't need any modification, of course.
In the case of TCP you could also just send each part separately and count on Nagle's algorithm to merge all parts into a single packet, but with UDP you need to prepare everything up front.
The data you are sending over the network should be the same regardless of the endianness of the machines involved. The key word you need to research is serialization. This means converting a data structure to a series of bits/bytes to be sent over a network or saved to disk, which will always be the same regardless of anything like architecture or compiler.

C socket sending multiple fields message with binary protocol

How do I construct a request message from a given message specification and then send it to a server through a C socket? A binary protocol is used for client and server communication. Are the following approaches correct?
Given message specification:
Field         Format  Length  Values
------------  ------  ------  -----------
requesID      Uint16  2       20
requestNum    Uint16  2       100
requestTitle  String  10      data string
/************** approach 1 ****************/
typedef unsigned short uint16;
typedef struct {
uint16 requesID [2];
uint16 requestNum [2];
unsigned char requestTitle [10];
}requesMsg;
…
requesMsg rqMsg;
memcpy(rqMsg.requesID, "\x0\x14", 2); //20
memcpy(rqMsg.requesNum, "\x0\x64", 2); //100
memcpy(rqMsg.requesTitle, "title01 ", 10);
…
send(sockfd, &rqMsg, sizeof(rqMsg), 0);
/************** approach 2 ****************/
unsigned char rqMsg[14];
memset(rqMsg, 0, 14);
memcpy(rqMsg, "\x0\x14", 2);
memcpy(rqMsg+2, "\x0\x64", 2);
memcpy(rqMsg+4, "title01 ", 10);
…
send(sock, &rqMsg, sizeof(rqMsg), 0);
I'm afraid you are misunderstanding something: The length column appears to tell you the length in bytes, so if you receive a uint16 you receive 2 bytes.
Your first approach could lead to serious problems through data structure alignment. If I were in your shoes I'd prefer the second approach and fill the bytes into a byte array on my own.
A general note about filling fields here: It's useless to use memcpy for "native" fields like uint16, etc. It might work but is simply a waste of runtime. You can fill in the fields of a struct by simply assigning them a value, like rqMsg.requesID = 20;
Another issue is the question of byte order or endianness of your binary protocol.
As a whole package, I'd implement a "serializeRequest" function taking fields of your struct and convert it into a byte array according to the protocol.
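A "serializeRequest" function along those lines might look like this sketch. It assumes big-endian 16-bit fields (as the "\x0\x14" literals in the question suggest) and a title that the caller has already padded to 10 bytes; neither is stated by the specification, and the Request type and macros are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define REQUEST_TITLE_LEN 10
#define REQUEST_WIRE_SIZE 14  /* 2 + 2 + 10 bytes, per the specification */

typedef struct {
    uint16_t requestID;
    uint16_t requestNum;
    char     requestTitle[REQUEST_TITLE_LEN];  /* not NUL-terminated */
} Request;

/* Convert the struct into the 14-byte wire format.
   Assumption: 16-bit fields are sent most significant byte first. */
void serializeRequest(const Request *req, unsigned char out[REQUEST_WIRE_SIZE])
{
    assert(req != NULL && out != NULL);
    out[0] = (unsigned char)(req->requestID >> 8);
    out[1] = (unsigned char)(req->requestID & 0xFF);
    out[2] = (unsigned char)(req->requestNum >> 8);
    out[3] = (unsigned char)(req->requestNum & 0xFF);
    memcpy(out + 4, req->requestTitle, REQUEST_TITLE_LEN);
}
```

The resulting array can then be handed to send() with a length of REQUEST_WIRE_SIZE, so the in-memory layout of the struct never reaches the wire.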
Both of them are at least partially correct but I much prefer the first one because it allows for quick and natural data manipulations and access and leaves less space for errors compared to manual indexing. As a bonus you can even copy and assign structure values as a whole and in C it works as expected.
But for any outgoing data you should make sure to use a "packed" struct. Not only will it reduce the amount of data transmitted down to the array-based implementation's figure, but it will also make sure that the field alignments are the same in all the programs involved. For most C compilers I tried (GCC included) it can be done with the __attribute__((__packed__)) attribute, but there are other compilers that require different attributes or even a different keyword.
Also endianness control may be required if your application is going to run on different architectures (ARM clients vs x86_64 server is a major example). I just use some simple macros like these to preprocess each field individually before doing any calculations or data output:
#define BYTE_SWAP16(num) ( (((num) & 0xFF) << 8) | (((num) >> 8) & 0xFF) )
#define BYTE_SWAP32(num) ( (((num) >> 24) & 0xff) | (((num) << 8) & 0xff0000) | (((num) >> 8) & 0xff00) | (((num) << 24) & 0xff000000) )
But you can use different approaches like BCD encoding, separate decoding functions or something else.
Also notice that uint16_t is already a 2-byte value. You probably don't need two of them to store your single values.

Safe, efficient way to access unaligned data in a network packet from C

I'm writing a program in C for Linux on an ARM9 processor. The program is to access network packets which include a sequence of tagged data like:
<fieldID><length><data><fieldID><length><data> ...
The fieldID and length fields are both uint16_t. The data can be 1 or more bytes (up to 64k if the full length was used, but it's not).
As long as <data> has an even number of bytes, I don't see a problem. But if I have a 1-, 3- or 5-byte <data> section, then the next 16-bit fieldID ends up not on a 16-bit boundary and I anticipate alignment issues. It's been a while since I've done anything like this from scratch, so I'm a little unsure of the details. Any feedback welcome. Thanks.
To avoid alignment issues in this case, access all data as an unsigned char *. So:
unsigned char *p;
//...
uint16_t id = p[0] | (p[1] << 8);
p += 2;
The above example assumes "little endian" data layout, where the least significant byte comes first in a multi-byte number.
You should have functions (inline and/or templated if the language you're using supports those features) that will read the potentially unaligned data and return the data type you're interested in. Something like:
uint16_t unaligned_uint16( void* p)
{
// this assumes big-endian values in data stream
// (which is common, but not universal in network
// communications) - this may or may not be
// appropriate in your case
unsigned char* pByte = (unsigned char*) p;
uint16_t val = (pByte[0] << 8) | pByte[1];
return val;
}
The easy way is to manually rebuild the uint16_ts, at the expense of speed:
uint8_t *packet = ...;
uint16_t fieldID = (packet[0] << 8) | packet[1]; // assumes big-endian (network) byte order in the packet
uint16_t length = (packet[2] << 8) | packet[3];
uint8_t *data = packet + 4;
packet += 4 + length;
If your processor supports it, you can type-pun or use a union (but beware of strict aliasing).
uint16_t fieldID = ntohs(*(uint16_t *)packet);
uint16_t length = ntohs(*(uint16_t *)(packet + 2));
Note that unaligned accesses aren't always supported (e.g. they might generate a fault of some sort), and on other architectures they're supported, but there's a performance penalty.
If the packet isn't aligned, you could always copy it into a static buffer and then read it:
static char static_buffer[65540];
memcpy(static_buffer, packet, packet_size); // make sure packet_size <= 65540
uint16_t fieldId = ntohs(*(uint16_t *)static_buffer);
uint16_t length = ntohs(*(uint16_t *)(static_buffer + 2));
Personally, I'd just go for option #1, since it'll be the most portable.
Alignment is always going to be fine, although perhaps not super-efficient, if you go through a byte pointer.
Setting aside issues of endian-ness, you can memcpy from the 'real' byte pointer into whatever you want/need that is properly aligned and you will be fine.
(this works because the generated code will load/store the data as bytes, which is alignment safe. It's when the generated assembly has instructions loading and storing 16/32/64 bits of memory in a mis-aligned manner that it all falls apart).
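Putting the byte-pointer approach together, a walk over the <fieldID><length><data> sequence could be sketched as follows. It assumes both 16-bit values are big-endian in the packet, which the question does not state; load_be16 and find_field are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a 16-bit value from two bytes, most significant byte first.
   Only byte loads are used, so the address may be odd. */
uint16_t load_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Search the packet for a field with the given ID. On success, returns a
   pointer to its data and stores the data length in *length_out.
   Returns NULL if the field is missing or the packet is truncated. */
const uint8_t *find_field(const uint8_t *packet, size_t packet_size,
                          uint16_t wanted_id, uint16_t *length_out)
{
    size_t pos = 0;
    assert(packet != NULL && length_out != NULL);
    while (packet_size - pos >= 4) {           /* room for ID + length */
        uint16_t id     = load_be16(packet + pos);
        uint16_t length = load_be16(packet + pos + 2);
        pos += 4;
        if (packet_size - pos < length)
            return NULL;                       /* data runs past the end */
        if (id == wanted_id) {
            *length_out = length;
            return packet + pos;
        }
        pos += length;                         /* may leave pos odd */
    }
    return NULL;
}
```

Because every access goes through a uint8_t pointer, a 1-, 3- or 5-byte data section before the next fieldID causes no alignment problem.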

Resources