C socket sending multiple fields message with binary protocol - c

How do I construct a request message from a given message specification and then send it to a server through a C socket? A binary protocol is used for client-server communication. Are the following approaches correct?
Given message specification:
Field         Format  Length  Values
------------  ------  ------  -----------
requesID      Uint16  2       20
requestNum    Uint16  2       100
requestTitle  String  10      data string
/************** approach 1 ****************/
typedef unsigned short uint16;
typedef struct {
uint16 requesID [2];
uint16 requestNum [2];
unsigned char requestTitle [10];
}requesMsg;
…
requesMsg rqMsg;
memcpy(rqMsg.requesID, "\x0\x14", 2); //20
memcpy(rqMsg.requestNum, "\x0\x64", 2); //100
memcpy(rqMsg.requestTitle, "title01   ", 10); // padded with spaces to 10 bytes
…
send(sockfd, &rqMsg, sizeof(rqMsg), 0);
/************** approach 2 ****************/
unsigned char rqMsg[14];
memset(rqMsg, 0, 14);
memcpy(rqMsg, "\x0\x14", 2);
memcpy(rqMsg+2, "\x0\x64", 2);
memcpy(rqMsg+4, "title01   ", 10); // padded with spaces to 10 bytes
…
send(sock, &rqMsg, sizeof(rqMsg), 0);

I'm afraid you are misunderstanding something: The length column appears to tell you the length in bytes, so if you receive a uint16 you receive 2 bytes.
Your first approach could lead to serious problems through data structure alignment. If I were in your shoes I'd prefer the second approach and fill the bytes into a byte array myself.
A general note about filling fields here: it's pointless to use memcpy for "native" fields like uint16. It might work but is simply a waste of runtime. You can fill in the fields of a struct by assigning them a value, like rqMsg.requesID = 20;
Another issue is the question of byte order or endianness of your binary protocol.
As a whole package, I'd implement a "serializeRequest" function taking fields of your struct and convert it into a byte array according to the protocol.
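A minimal sketch of such a function follows. The names Request, put_u16_be and the two size macros are hypothetical, and both the big-endian byte order and the space padding of the title are assumptions, since the specification in the question states neither.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define REQUEST_TITLE_LEN 10
#define REQUEST_WIRE_SIZE (2 + 2 + REQUEST_TITLE_LEN)

/* In-memory form of the request; its layout never reaches the wire. */
typedef struct {
    uint16_t requestID;
    uint16_t requestNum;
    char     requestTitle[REQUEST_TITLE_LEN + 1]; /* NUL-terminated */
} Request;

/* Store a 16-bit value most significant byte first (assumed wire order). */
static void put_u16_be(uint8_t *out, uint16_t v)
{
    out[0] = (uint8_t)(v >> 8);
    out[1] = (uint8_t)(v & 0xFF);
}

/* Serialize req into out, which must hold at least REQUEST_WIRE_SIZE bytes.
 * Returns the number of bytes written. */
size_t serializeRequest(const Request *req, uint8_t *out)
{
    /* Bounded length scan, so an unterminated title cannot overrun. */
    size_t len = 0;
    while (len < REQUEST_TITLE_LEN && req->requestTitle[len] != '\0')
        len++;

    put_u16_be(out, req->requestID);
    put_u16_be(out + 2, req->requestNum);
    memset(out + 4, ' ', REQUEST_TITLE_LEN);   /* assumed space padding */
    memcpy(out + 4, req->requestTitle, len);
    return REQUEST_WIRE_SIZE;
}
```

The resulting buffer can then be handed to send() together with the returned length, and neither struct padding nor host byte order affects what goes out.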

Both of them are at least partially correct but I much prefer the first one because it allows for quick and natural data manipulations and access and leaves less space for errors compared to manual indexing. As a bonus you can even copy and assign structure values as a whole and in C it works as expected.
But for any outgoing data you should make sure to use a "packed" struct. Not only will it reduce the amount of data transmitted to the same size as the array-based implementation, it will also make sure that the field alignments are the same in all the programs involved. For most C compilers I tried (GCC included) this can be done with the __attribute__((__packed__)) attribute, but other compilers require different attributes or even a different keyword.
Also endianness control may be required if your application is going to run on different architectures (ARM clients vs x86_64 server is a major example). I just use some simple macros like these to preprocess each field individually before doing any calculations or data output:
#define BYTE_SWAP16(num) ( (((num) & 0xFF) << 8) | (((num) >> 8) & 0xFF) )
#define BYTE_SWAP32(num) ( (((num) >> 24) & 0xff) | (((num) << 8) & 0xff0000) | (((num) >> 8) & 0xff00) | (((num) << 24) & 0xff000000) )
But you can use different approaches like BCD encoding, separate decoding functions or something else.
Also notice that uint16_t is already a 2-byte value. You probably don't need two of them to store your single values.
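As a rough sketch of the packed-struct approach, the following uses a single uint16_t per numeric field. It swaps in the POSIX htons() for the macros above, because htons() only swaps on hosts where it is needed. PackedRequest and fillPackedRequest are hypothetical names, and big-endian wire order and space padding are assumptions.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htons(), ntohs() (POSIX) */

/* GCC/Clang syntax; other compilers need e.g. #pragma pack instead. */
typedef struct __attribute__((__packed__)) {
    uint16_t requestID;        /* kept in network byte order */
    uint16_t requestNum;       /* kept in network byte order */
    char     requestTitle[10]; /* space padded, no terminator */
} PackedRequest;

/* Fill msg so that its bytes can be passed to send() unchanged. */
void fillPackedRequest(PackedRequest *msg, uint16_t id, uint16_t num,
                       const char *title)
{
    size_t len = strlen(title);
    if (len > sizeof msg->requestTitle)
        len = sizeof msg->requestTitle;   /* truncate long titles */

    msg->requestID  = htons(id);
    msg->requestNum = htons(num);
    memset(msg->requestTitle, ' ', sizeof msg->requestTitle);
    memcpy(msg->requestTitle, title, len);
}
```

The numeric fields now hold wire-order values, so they have to go through ntohs() before being used in any calculation on the host.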

Related

Access struct as array of bytes

I'm currently re-writing a program that processes data received over a serial connection using the RDM protocol. Each packet is received by a UART and has a specific structure but may vary in length. A packet structure example is below, assuming the number of bytes in the packet to be n (this may change depending on the contents of the packet).
What I want to do is define a struct in my C code that has the various parameters defined, but to be able to read and write bytes to/from the struct from the UART as though the struct is just an array of uint8_t. My issue with this is that I have read that structs may not always be stored in contiguous sections of memory, so taking &RDMPacket1 and incrementing through the struct may end up with the data not being in the right place.
My other problem is that if I have an array to store a packet data of the maximum possible length (220 bytes) inside the struct, then the checksum at the end of the packet would be written into the wrong place. What methods could be used to receive the data and place it into the struct?
Example packet definition (shortened from standard)
Byte | Description
0 | START Code - Constant, can be ignored
1 | Sub-Start Code - Contains command for device to process
2 | Message Length - Points to byte number of Checksum High (up to 255)
3-8 | Destination UID - Unique ID of packet Destination
9-14 | Source UID - Unique ID of packet Source
15 | Transaction Number - ID of transaction between controller and responder
16-(n-2) | Data (up to 220 bytes long)
n-1 | Checksum High
n | Checksum Low
This is an example of a struct to hold a packet of the maximum possible length:
struct RDMPacket
{
uint8_t subStartCode;
uint8_t messageLength;
uint32_t destinationUID;
uint32_t sourceUID;
uint8_t transactionNumber;
uint8_t portID;
uint8_t messageCount;
uint8_t subDevice;
uint8_t commandClass;
uint8_t parameterID;
uint8_t parameterDataLength;
uint8_t parameterData[220];
uint16_t checksum;
} RDMPacket1;
The problem you are describing arises from structure padding. The compiler may place each field at an address suited to its type: if a uint32_t requires 4-byte alignment, it will start at an address divisible by 4, with padding bytes inserted before it if necessary. To avoid this, you can use GCC's packed attribute, which instructs the compiler to lay the structure out without padding. Other compilers offer #pragma pack or a corresponding directive for this purpose. To make sure that your struct is packed, check its size with sizeof and compare it to the expected size.
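A sketch of that size check, applied to the fixed 16-byte leading part of the packet from the byte table in the question. RDMHeader and rdmReadChecksum are hypothetical names, and the layout is taken from the shortened table rather than from the full standard. Because the data length varies, the checksum is read from the position given by the Message Length byte instead of from a struct member.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed leading part of the packet, bytes 0..15 of the table. */
typedef struct __attribute__((__packed__)) {
    uint8_t startCode;
    uint8_t subStartCode;
    uint8_t messageLength;      /* index of the Checksum High byte */
    uint8_t destinationUID[6];
    uint8_t sourceUID[6];
    uint8_t transactionNumber;
} RDMHeader;

/* Compile-time version of "compare sizeof to the expected size" (C11). */
_Static_assert(sizeof(RDMHeader) == 16, "RDMHeader must be 16 bytes");

/* Read the checksum of a packet held in raw[0..rawLen-1].
 * Returns 0 on success, -1 if the buffer is too short. */
int rdmReadChecksum(const uint8_t *raw, size_t rawLen, uint16_t *checksum)
{
    if (rawLen < sizeof(RDMHeader))
        return -1;

    size_t hi = raw[2];         /* messageLength byte */
    if (hi < sizeof(RDMHeader) || hi + 1 >= rawLen)
        return -1;

    *checksum = (uint16_t)((raw[hi] << 8) | raw[hi + 1]);
    return 0;
}
```

With this split, the variable-length data sits between the header and the checksum in a plain byte buffer, so the checksum never lands in the wrong place.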

Reverse the Endianness of a C structure

I have a structure in C that looks like this:
typedef u_int8_t NN;
typedef u_int8_t X;
typedef int16_t S;
typedef u_int16_t U;
typedef char C;
typedef struct{
X test;
NN test2[2];
C test3[4];
U test4;
} Test;
I have declared the structure and written values to the fields as follows:
Test t;
int t_buflen = sizeof(t);
memset( &t, 0, t_buflen);
t.test = 0xde;
t.test2[0]=0xad; t.test2[1]=0x00;
t.test3[0]=0xbe; t.test3[1]=0xef; t.test3[2]=0x00; t.test3[3]=0xde;
t.test4=0xdeca;
I am sending this structure via UDP to a server. At present this works fine when I test locally, however I now need to send this structure from my little-endian machine to a big-endian machine. I'm not really sure how to do this.
I've looked into using htons but I'm not sure if that's applicable in this situation, as it seems to only be defined for unsigned ints of 16 or 32 bits, if I understood correctly.
I think there may be two issues here depending on how you're sending this data over UDP.
Issue 1: Endianness
As you've said, endianness is an issue. You're right when you mention using htons and ntohs for shorts. You may also find htonl and its opposite, ntohl, useful.
Endianness has to do with the byte ordering of multi-byte data types in memory. Therefore, for single-byte data types you do not have to worry. In your case it is the 2-byte data that I guess you're asking about.
To use these functions you will need to do something like the following...
Sender:
-------
t.test = 0xde; // Does not need to be swapped
t.test2[0] = 0xad; ... // Does not need to be swapped
t.test3[0] = 0xbe; ... // Does not need to be swapped
t.test4 = htons(0xdeca); // Needs to be swapped
...
sendto(..., &t, ...);
Receiver:
---------
recvfrom(..., &t, ...);
t.test4 = ntohs(t.test4); // Needs to be swapped back to host order
htons() and ntohs() convert between host byte order and network byte order, which is big-endian. Your little-endian machine therefore byte-swaps t.test4 before sending, and on receipt the big-endian machine just uses the value as read (there, ntohs() is effectively a no-op).
The following diagram will make this more clear...
If you did not want to use the htons() function and its variants then you could just define the buffer format at the byte level. This diagram makes this more clear...
In this case your code might look something like
Sender:
-------
uint8_t buffer[SOME SIZE];
t.test = 0xde;
t.test2[0] = 0xad; ...
t.test3[0] = 0xbe; ...
t.test4 = 0xdeca;
buffer[0] = t.test;
buffer[1] = t.test2[0];
/// and so on, until...
buffer[7] = t.test4 & 0xff;
buffer[8] = (t.test4 >> 8) & 0xff;
...
sendto(..., buffer, ...);
Receiver:
---------
uint8_t buffer[SOME SIZE];
recvfrom(..., buffer, ...);
t.test = buffer[0];
t.test2[0] = buffer[1];
// and so on, until...
t.test4 = buffer[7] | (buffer[8] << 8);
The send and receive code will work regardless of the respective endianness of the sender and receiver because the byte-layout of the buffer is defined and known by the program running on both machines.
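Filled in completely, the byte-level approach might look like the sketch below. packTest, unpackTest and TEST_WIRE_SIZE are hypothetical names, the standard uint8_t/uint16_t types stand in for the question's typedefs, and the low-byte-first order for test4 follows the snippet above.

```c
#include <stdint.h>

#define TEST_WIRE_SIZE 9  /* 1 + 2 + 4 + 2 bytes */

typedef struct {
    uint8_t  test;
    uint8_t  test2[2];
    char     test3[4];
    uint16_t test4;
} Test;

/* Write t into buffer (TEST_WIRE_SIZE bytes) in the agreed layout. */
void packTest(const Test *t, uint8_t *buffer)
{
    buffer[0] = t->test;
    buffer[1] = t->test2[0];
    buffer[2] = t->test2[1];
    for (int i = 0; i < 4; i++)
        buffer[3 + i] = (uint8_t)t->test3[i];
    buffer[7] = (uint8_t)(t->test4 & 0xff);         /* low byte first */
    buffer[8] = (uint8_t)((t->test4 >> 8) & 0xff);
}

/* Rebuild t from a buffer produced by packTest(). */
void unpackTest(const uint8_t *buffer, Test *t)
{
    t->test     = buffer[0];
    t->test2[0] = buffer[1];
    t->test2[1] = buffer[2];
    for (int i = 0; i < 4; i++)
        t->test3[i] = (char)buffer[3 + i];
    t->test4 = (uint16_t)(buffer[7] | (buffer[8] << 8));
}
```

Only the 9-byte buffer is passed to sendto() and recvfrom(), so the in-memory layout of Test on either machine no longer matters.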
However, if you're sending your structure through the socket in this way you should also note the caveat below...
Issue 2: Data alignment
The article "Data alignment: Straighten up and fly right" is a great read for this one...
The other problem you might have is data alignment. This is not always the case, even between machines that use different endian conventions, but is nevertheless something to watch out for...
struct
{
uint8_t v1;
uint16_t v2;
}
In the above bit of code the offset of v2 from the start of the structure could be 1 byte, 2 bytes, 4 bytes (or just about anything). The compiler cannot re-order members in your structure, but it can pad the distance between variables.
Let's say machine 1 has a 16-bit wide data bus. If we took the structure without padding, the machine would have to do two fetches to get v2. Why? Because we access 2 bytes of memory at a time at the h/w level. Therefore the compiler could pad out the structure like so
struct
{
uint8_t v1;
uint8_t invisible_padding_created_by_compiler;
uint16_t v2;
}
If the sender and receiver differ on how they pack data into a structure then just sending the structure as a binary blob will cause you problems. In this case you may have to pack the variables into a byte stream/buffer manually before sending. This is often the safest way.
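One way to see what a particular compiler does is to ask it for the offsets. The sketch below uses offsetof from <stddef.h>; the struct tags and helper names are made up for illustration, and the packed variant relies on GCC/Clang attribute syntax.

```c
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler is free to pad between v1 and v2. */
struct Padded {
    uint8_t  v1;
    uint16_t v2;
};

/* Packed layout (GCC/Clang syntax): no padding is inserted. */
struct __attribute__((__packed__)) Packed {
    uint8_t  v1;
    uint16_t v2;
};

/* offsetof yields compile-time constants that can be printed or asserted. */
size_t paddedOffsetOfV2(void) { return offsetof(struct Padded, v2); }
size_t packedOffsetOfV2(void) { return offsetof(struct Packed, v2); }
```

The packed offset is 1 by construction. The padded one depends on the target and is commonly 2 on mainstream platforms, which is exactly the kind of difference that breaks a struct sent as a binary blob.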
There's no endianness of the structure really. It's the separate fields that need to be converted to big-endian when needed. You can make a copy of the structure, rewrite each multi-byte field using htons/htonl, and then send the result. 8-bit fields don't need any modification of course.
In case of TCP you could also just send each part separately and count on Nagle's algorithm to merge all parts into a single packet, but with UDP you need to prepare everything up front.
The data you are sending over the network should be the same regardless of the endianness of the machines involved. The key word you need to research is serialization. This means converting a data structure to a series of bits/bytes to be sent over a network or saved to disk, which will always be the same regardless of anything like architecture or compiler.

Why both utf-16le and utf-16be exists? endianness efficiency - C

I was wondering why both utf-16le and utf-16be exist. Is it considered "inefficient" for a big-endian environment to process little-endian data?
Currently, this is what I use while storing 2 bytes var locally:
unsigned char octets[2];
short int shortint = 12345; /* (assuming short int = 2 bytes) */
octets[0] = (shortint) & 255;
octets[1] = (shortint >> 8) & 255;
I know that while storing and reading as a fixed endianness locally - there is no endian risk. I was wondering if it's considered to be "inefficient"? what would be the most "efficient" way to store a 2 bytes var? (while restricting the data to the environment's endianness, local use only.)
Thanks, Doori Bar
This allows code to write large amounts of Unicode data to a file without conversion. During loading, you must always check the endianness. If you're lucky, you need no conversion. So in 66% of the cases, you need no conversion and only in 33% you must convert.
In memory, you can then access the data using the native datatypes of your CPU which allows for efficient processing.
That way, everyone can be as happy as possible.
So in your case, you need to check the encoding when loading the data but in RAM, you can use an array of short int to process it.
[EDIT] The fastest way to convert a 16bit value to 2 octets is:
char octet[2];
short * ptr = (short*)&octet[0];
*ptr = 12345;
Now you don't know if octet[0] is the low or upper 8 bits. To find that out, write a known value and then examine it.
This will give you one of the encodings; the native one of your CPU.
If you need the other encoding, you can either swap the octets as you write them to a file (i.e. write them octet[1], octet[0]) or swap them in your code.
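The "write a known value and examine it" check can be sketched as follows. hostIsLittleEndian and storeU16 are hypothetical helpers, and memcpy is used in place of the pointer cast because copying bytes is well-defined regardless of alignment or aliasing rules.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the host stores the least significant byte first. */
int hostIsLittleEndian(void)
{
    uint16_t probe = 0x0102;
    unsigned char octet[2];
    memcpy(octet, &probe, sizeof probe);   /* look at the raw bytes */
    return octet[0] == 0x02;
}

/* Store value in octet[] in the requested order, whatever the host is. */
void storeU16(unsigned char octet[2], uint16_t value, int littleEndian)
{
    unsigned char lo = (unsigned char)(value & 0xFF);
    unsigned char hi = (unsigned char)(value >> 8);
    octet[0] = littleEndian ? lo : hi;
    octet[1] = littleEndian ? hi : lo;
}
```

Calling storeU16(octet, value, hostIsLittleEndian()) produces the native encoding, and passing the opposite flag produces the other one.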
If you have several octets, you can use 32bit integers to swap two 16bit values at once:
char octet[4];
short * ptr = (short*)&octet[0];
*ptr ++ = 12345;
*ptr ++ = 23456;
unsigned int * ptr32 = (unsigned int*)&octet[0];
unsigned int val = ((*ptr32 << 8) & 0xff00ff00) | ((*ptr32 >> 8) & 0x00ff00ff);

How do I unpack bits from a structure's stream_data in c code?

Ex.
typedef struct
{
bool streamValid;
dword dateTime;
dword timeStamp;
stream_data[800];
} RadioDataA;
Ex. Where stream_data[800] contains:
**Variable** **Length (in bits)**
packetID 8
packetL 8
versionMajor 4
versionMinor 4
radioID 8
etc..
I need to write:
void unpackData(radioDataA *streamData, MA_DataA *maData)
{
//unpack streamData (from above) & put some of the data into maData
//How do I read in bits of data? I know it's by groups of 8 but I don't understand how.
//MAData is also a struct.
}
I'm not sure I understood it right, but why can't you do just:
memcpy(maData, streamData->stream_data, sizeof(MA_DataA));
This will fully copy data contained in the array of bytes to the structure.
Your types are inconsistent or unspecified. I believe you are trying to extract packed data from a byte stream. If so, assume buf contains your data packed in order with the lengths specified. The following code should then extract each field correctly:
int packetID = buf[0];
int packetL = buf[1];
int versionMajor = (buf[2] >> 4);
int versionMinor = (buf[2] & 0x0F);
int radioID = buf[3];
As you can see, the byte-aligned values are straightforward copies. However, the 4-bit fields must be masked and/or shifted to extract only the desired data. For more information on bitwise operations refer to the excellent Bit Twiddling Hacks code snippets.
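When there are many fields of mixed widths, the same shifts and masks can be wrapped in a small cursor that tracks the current bit position. This is only a sketch: BitReader and readBits are hypothetical names, and it assumes fields are packed most significant bit first, consistent with versionMajor occupying the high nibble above.

```c
#include <stddef.h>
#include <stdint.h>

/* Cursor over a byte buffer, counting its position in bits. */
typedef struct {
    const uint8_t *data;
    size_t bitPos;   /* next unread bit; 0 is the MSB of data[0] */
} BitReader;

/* Read count bits (1..32), MSB first, and advance the cursor.
 * The caller must make sure the buffer holds enough bits. */
uint32_t readBits(BitReader *r, unsigned count)
{
    uint32_t value = 0;
    while (count-- > 0) {
        uint8_t byte = r->data[r->bitPos / 8];
        unsigned shift = 7u - (unsigned)(r->bitPos % 8);
        value = (value << 1) | ((uint32_t)(byte >> shift) & 1u);
        r->bitPos++;
    }
    return value;
}
```

With a reader initialised to the start of stream_data, successive calls readBits(&r, 8), readBits(&r, 8), readBits(&r, 4), readBits(&r, 4) and readBits(&r, 8) would yield packetID, packetL, versionMajor, versionMinor and radioID in turn.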
I'm just trying to unpack data and output it. I'm just stuck on how to work with bits and keeping an index and determining how to truncate it into my different variables.
The stream_data[800] is of type byte. Sorry!!
I don't think memcpy will work because it's not a 1:1 direct transfer.
hope you get what I mean!

Safe, efficient way to access unaligned data in a network packet from C

I'm writing a program in C for Linux on an ARM9 processor. The program is to access network packets which include a sequence of tagged data like:
<fieldID><length><data><fieldID><length><data> ...
The fieldID and length fields are both uint16_t. The data can be 1 or more bytes (up to 64k if the full length was used, but it's not).
As long as <data> has an even number of bytes, I don't see a problem. But if I have a 1-, 3- or 5-byte <data> section then the next 16-bit fieldID ends up not on a 16-bit boundary and I anticipate alignment issues. It's been a while since I've done anything like this from scratch so I'm a little unsure of the details. Any feedback welcome. Thanks.
To avoid alignment issues in this case, access all data as an unsigned char *. So:
unsigned char *p;
//...
uint16_t id = p[0] | (p[1] << 8);
p += 2;
The above example assumes "little endian" data layout, where the least significant byte comes first in a multi-byte number.
You should have functions (inline and/or templated if the language you're using supports those features) that will read the potentially unaligned data and return the data type you're interested in. Something like:
uint16_t unaligned_uint16( void* p)
{
// this assumes big-endian values in data stream
// (which is common, but not universal in network
// communications) - this may or may not be
// appropriate in your case
unsigned char* pByte = (unsigned char*) p;
uint16_t val = (pByte[0] << 8) | pByte[1];
return val;
}
The easy way is to manually rebuild the uint16_ts, at the expense of speed:
uint8_t *packet = ...;
uint16_t fieldID = (packet[0] << 8) | packet[1]; // assumes big-endian data in the packet
uint16_t length = (packet[2] << 8) | packet[3];
uint8_t *data = packet + 4;
packet += 4 + length;
If your processor supports it, you can type-pun or use a union (but beware of strict aliasing).
uint16_t fieldID = ntohs(*(uint16_t *)packet);
uint16_t length = ntohs(*(uint16_t *)(packet + 2));
Note that unaligned accesses aren't always supported (e.g. they might generate a fault of some sort), and on other architectures they're supported, but with a performance penalty.
If the packet isn't aligned, you could always copy it into a static buffer and then read it:
static char static_buffer[65540];
memcpy(static_buffer, packet, packet_size); // make sure packet_size <= 65540
uint16_t fieldId = ntohs(*(uint16_t *)static_buffer);
uint16_t length = ntohs(*(uint16_t *)(static_buffer + 2));
Personally, I'd just go for option #1, since it'll be the most portable.
Alignment is always going to be fine, although perhaps not super-efficient, if you go through a byte pointer.
Setting aside issues of endian-ness, you can memcpy from the 'real' byte pointer into whatever you want/need that is properly aligned and you will be fine.
(this works because the generated code will load/store the data as bytes, which is alignment safe. It's when the generated assembly has instructions loading and storing 16/32/64 bits of memory in a mis-aligned manner that it all falls apart).
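Putting the byte-pointer advice together, a walker over the tagged records could look like the sketch below. Field, readU16BE and nextField are hypothetical names, and big-endian fieldID and length values are an assumption, since the question does not state the byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit big-endian value from any address, aligned or not. */
static uint16_t readU16BE(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

typedef struct {
    uint16_t id;
    uint16_t length;
    const uint8_t *data;   /* points into the packet, nothing is copied */
} Field;

/* Decode the field starting at *pos and advance *pos past it.
 * Returns 1 on success, 0 at the end of the packet, -1 if truncated. */
int nextField(const uint8_t *packet, size_t size, size_t *pos, Field *out)
{
    if (*pos >= size)
        return 0;
    if (size - *pos < 4)
        return -1;                       /* header does not fit */

    uint16_t length = readU16BE(packet + *pos + 2);
    if (size - *pos - 4 < length)
        return -1;                       /* data runs past the end */

    out->id     = readU16BE(packet + *pos);
    out->length = length;
    out->data   = packet + *pos + 4;
    *pos += 4u + length;
    return 1;
}
```

Every access goes through a uint8_t pointer, so an odd-length data section before a field cannot cause an alignment fault, and the bounds checks keep a malformed length from reading past the packet.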

Resources