Porting an application from little-endian to big-endian architecture - c

I have a TCP server developed on x86 in C under Linux, using the Berkeley sockets API. The server runs fine without any problems. But now, for various reasons, I have to run the server on a MIPS machine, which is big-endian.
The server and the clients communicate through a set of predefined protocol. I will give an example of how a server sends a simple message to the clients:
struct echo_req req;
req.header.version = OFP_VERSION;
req.header.type = OFPT_ECHO_REQUEST;
req.header.length = htons (sizeof req);
req.header.xid = htonl(y);
req.data = htonl (456);
char data[sizeof (req)];
data[0] = req.header.version;
data[1] = req.header.type;
memcpy (data + 2, &req.header.length, 2);
memcpy (data + 4, &req.header.xid, 4);
memcpy (data + 8, &req.data, 4);
if ((send (sock_fd, &data, sizeof (data), 0) == -1))
{
printf ("Error in sending echo request message\n");
exit (-1);
}
printf("Echo Request sent!\n");
As you can see, I use htonl and htons on any type longer than a byte to convert it to network byte order. After building a packet, I serialize it into a char array and finally send it over the network.
Before I run my server on a big-endian architecture, I wanted to clear up a few things. As I understand it, because I memcpy the already-converted fields into the array, sending it over the network shouldn't cause any problems on a big-endian machine: memcpy performs a byte-by-byte copy of the data into the array, so the byte ordering should stay correct. Still, I wanted to get the opinion of people here, who I presume know a lot more than I do, as I am still a beginner in network programming :). Please tell me whether I am on the right track or not. All help much appreciated.
Thanks

Yes, memcpy just copies bytes in order from a source to a destination.
Without seeing the rest of your code, it's impossible to say that you've used hton(l|s) everywhere you should. It's also possible that you've done something like copying a floating point number byte for byte, which doesn't necessarily work, independent of endianness issues.
I don't see any obvious problems in the code you've posted above though.
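To make the point concrete, here is a minimal sketch of the packing step from the question written as a function. The name pack_echo_req and the fixed length of 12 bytes (used instead of sizeof req, which could include padding) are assumptions for illustration, not part of the original code. Because every multi-byte field goes through htons/htonl before memcpy, the bytes in the buffer come out the same on x86 and on MIPS.

```c
#include <arpa/inet.h>  /* htons, htonl */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ECHO_REQ_LEN 12  /* 1 + 1 + 2 + 4 + 4 bytes, matching the offsets in the question */

/* Hypothetical helper: packs an echo request into buf, which must hold
 * at least ECHO_REQ_LEN bytes. Returns the number of bytes written. */
static size_t pack_echo_req(uint8_t *buf, uint8_t version, uint8_t type,
                            uint32_t xid, uint32_t payload)
{
    uint16_t len_be = htons(ECHO_REQ_LEN);   /* host -> network order */
    uint32_t xid_be = htonl(xid);
    uint32_t payload_be = htonl(payload);

    buf[0] = version;                        /* single bytes need no conversion */
    buf[1] = type;
    memcpy(buf + 2, &len_be, sizeof len_be); /* memcpy keeps the converted order */
    memcpy(buf + 4, &xid_be, sizeof xid_be);
    memcpy(buf + 8, &payload_be, sizeof payload_be);
    return ECHO_REQ_LEN;
}
```

On a big-endian host htons and htonl are effectively no-ops, so the same source produces the same wire bytes without any changes.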

Have you made sure you use ntohl/ntohs when receiving data as well?
BTW you should simply use the struct for sending data; re-assembling it into the character array does nothing but take CPU time and possibly bear errors.

Related

C convert char* to network byte order before transfer

I'm working on a project where I must send data to a server, and the client will run on different OSes. I know about the endianness problem across machines, so I'm converting (almost) everything to network byte order using htonl, and converting back on the other end.
Also, I know that for a single byte, I don't need to convert anything. But, what should I do for char*? ex:
send(sock, (char*)text, sizeof(text), 0);
What's the best approach to solve this? Should I create an intermediate function to intercept this send and then send the char array char by char? If so, do I need to convert every char to network byte order? I think not, since every char is only one byte.
Thinking about this, if I create such an intermediate function, I don't have to convert anything else to network byte order, since the function will send char by char and thus needs no endianness conversion.
I'd appreciate any advice on this.
I am presuming from your question that the application layer protocol (more specifically everything above level 4) is under your design control. For single byte-wide (octet-wide in networking parlance) data there is no issue with endian ordering and you need do nothing special to accommodate that. Now if the character data is prepended with a length specifier that is, say 2 octets, then the ordering of the bytes must be treated consistently.
Going with network byte ordering (big-endian) will certainly fill the bill, but so would consistently using little-endian. Consistency of byte ordering on each end of a connection is the crucial issue.
If the protocol is not under your design control, then the protocol specification should offer guidance on the issue of byte ordering for multi-byte integers and you should follow that.
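As a minimal sketch of the length-prefixed case described above: the character data is copied as-is, and only the 2-octet length needs a defined byte order. The function name encode_text and the choice of a big-endian prefix are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes a 2-octet big-endian length followed by the
 * text bytes (without the terminating NUL) into out.
 * Returns the number of bytes written, or 0 if the text does not fit. */
static size_t encode_text(uint8_t *out, size_t cap, const char *text)
{
    size_t len = strlen(text);

    if (len > 0xFFFF || cap < len + 2)
        return 0;
    out[0] = (uint8_t)(len >> 8);    /* most significant octet first */
    out[1] = (uint8_t)(len & 0xFF);
    memcpy(out + 2, text, len);      /* octets need no byte-order handling */
    return len + 2;
}
```

Writing the length with shifts rather than htons is a matter of taste; both give the same two octets on any host.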

How to represent DNS Resource Records in a structure?

I'm making a program using pcap to parse .pcap files.
I'm currently working on the DNS protocol. I'm able to get the header and display its information. Now I'd like to display its Resource Records (Question, Answer, Authority, Additional).
I found this interesting doc: http://www.zytrax.com/books/dns/ch15/
And, as I did before for parsing the different headers, I wanted to create a structure and cast my packet in it.
Following this doc, I created my structure as follows:
struct question_s {
u_short *qname;
u_short qtype;
u_short qclass;
};
and I'm casting :
struct question_s *record = (struct question_s*)(data + offset);
Where data is the packet representation, and offset is the total size of previous protocols.
Now I'm having trouble understanding some points, and as my English is not perfect, it's possible that I missed something in the documentation. Here are my questions:
As qname is of variable size, am I doing it right by making it a pointer on u_short?
All pointers are 8 bytes long, so my structure should be 12 bytes long, but where is the name in memory? Should I add 12 to my offset without taking the name length into account?
I tried to display qname, working on it as if it was a char*, but it doesn't seem to work (seg. fault), here is what I did:
void test(u_short *qname) {
for (int c = 0; qname[c] != 0; ++c)
write(1, &qname[c], 1);
}
But maybe there isn't a '\0' in the string?
Maybe that's an endianness issue? I use htons and htonl on all my u_short and u_int values because the network byte order isn't the same as mine, but I'm not sure it applies to pointers.
If you want to see how to dissect DNS records, first read and understand RFC 1035, and then take a look at the tcpdump code to dissect DNS records. It's harder than you think; you can't just overlay a structure on top of the raw packet data.
And you can't ever overlay a structure with a pointer in it on top of raw packet data. The pointer will almost certainly point to some bogus location in your address space; protocols don't send raw pointers over the network, as a pointer is a pointer in a particular address space, and two processes on the network will have different address spaces.
(In fact, just about everything in packet dissection is harder than people think when they first try to write code to dissect packets.)
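To illustrate why a struct overlay cannot work here, the following is a minimal sketch of walking one question entry by hand. In RFC 1035 the QNAME is a sequence of length-prefixed labels ending in a zero octet, followed by a 2-octet QTYPE and a 2-octet QCLASS, all inside the packet itself. The function name parse_question and its signature are assumptions for illustration, and the sketch deliberately rejects compression pointers, which real dissectors such as tcpdump have to handle.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: parses one question entry starting at msg[off].
 * Writes the dotted name into name (NUL-terminated) and fills qtype/qclass.
 * Returns the offset just past the entry, or -1 on malformed or
 * unsupported input (compression pointers are not handled here). */
static long parse_question(const uint8_t *msg, size_t len, size_t off,
                           char *name, size_t name_cap,
                           uint16_t *qtype, uint16_t *qclass)
{
    size_t n = 0;

    for (;;) {
        if (off >= len)
            return -1;
        uint8_t label_len = msg[off++];
        if (label_len == 0)
            break;                        /* zero-length root label ends the name */
        if (label_len & 0xC0)
            return -1;                    /* pointer or reserved bits: unsupported */
        if (off + label_len > len || n + label_len + 2 > name_cap)
            return -1;                    /* truncated packet or output too small */
        if (n > 0)
            name[n++] = '.';
        for (uint8_t i = 0; i < label_len; i++)
            name[n++] = (char)msg[off++];
    }
    if (n >= name_cap)
        return -1;
    name[n] = '\0';

    if (len - off < 4)
        return -1;
    /* Both fields are big-endian on the wire; assemble them byte by byte. */
    *qtype  = (uint16_t)((msg[off] << 8) | msg[off + 1]);
    *qclass = (uint16_t)((msg[off + 2] << 8) | msg[off + 3]);
    return (long)(off + 4);
}
```

The caller would pass the offset just past the 12-byte DNS header for the first question, and use the returned offset to find the next entry.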

How can I portably send a C struct through a network socket?

Suppose I have a C struct defined as follows :
typedef struct servData {
char max_word[MAX_WORD];
char min_word[MAX_WORD];
int word_count ;
} servSendData ;
where 'MAX_WORD' could be any value.
Now if I have an instance of this structure :
servSendData myData ;
And if I populate this instance and then send it over the network, will there be any portability issues here considering that I want my server as well as the client to be running on either a 64-bit system or a 32-bit system.
I am going to send and receive data as follows :
//server side
strcpy(myData.max_word, "some large word") ;
strcpy(myData.min_word, "small") ;
myData.word_count=100 ;
send(sockFd, (char*)&myData, sizeof(myData), 0);
//client side
recv(sockFd, (char*)&myData, sizeof(myData), 0);
printf("large word is %s\n", myData.max_word) ;
printf("small word is %s\n", myData.min_word) ;
printf("total words is %d\n", myData.word_count) ;
Yes, there definitely will be portability issues.
Alignment of structure members can be different even among different compilers on the same platform, let alone different platforms. And that's all assuming that sizeof(int) is the same across all of them (though granted, it usually is --- but do you really want to rely on "usually" and hope for the best?).
This holds even if MAX_WORD is the same on both computers (I'll assume they are from here on out; if they're not, then you're in trouble here).
What you need to do is send (and receive) each field separately. There is also a problem with sizeof(int) and endianness, so I've added a call to htonl() to convert from system to network byte order (the inverse function is ntohl()). They both return uint32_t which has a fixed, known, size.
send(sockFd, myData.max_word, sizeof(myData.max_word), 0); // or just MAX_WORD
send(sockFd, myData.min_word, sizeof(myData.min_word), 0);
uint32_t count = htonl(myData.word_count); // convert to network byte order
send(sockFd, &count, sizeof(count), 0);
// error handling!
if((ret = recv(sockFd, myData.max_word, sizeof(myData.max_word), 0)) != sizeof(myData.max_word))
{
// handle error or read more data
}
... // and so on
// remember to convert back from network byte order on recv!
// also keep in mind the third field is now `uint32_t`, and not `int` in the stream
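The "handle error or read more data" step matters because send() and recv() on a stream socket may transfer fewer bytes than requested. A minimal sketch of the usual loop follows; the helper names send_all and recv_all are assumptions for illustration, not standard functions.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: sends exactly len bytes.
 * Returns 0 on success, -1 on error. */
static int send_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                    /* short write: continue with the rest */
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: receives exactly len bytes.
 * Returns 0 on success, -1 on error or if the peer closed early. */
static int recv_all(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n == 0)
            return -1;             /* connection closed before len bytes arrived */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

Each fixed-size field can then be transferred with one call, for example recv_all(sockFd, myData.max_word, sizeof(myData.max_word)), followed by ntohl() on the received count.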
As other replies have stated, there are real problems in copying a C structure between different machines with different compilers, word sizes, and endianness. One common way to resolve this issue is to transform your data into a machine-independent format, transfer it across the network, and then transform it back into a structure on the receiver. This is such a common requirement that multiple technologies already exist to do this. The two that spring to my mind initially are gsoap and rpcgen, although there are probably many other options.
I've mostly used gsoap and after you get past the initial learning curve you can develop robust solutions that scale well (with multiple threads) and which handles both the networking and data translations for you.
If you don't want to go down this route then the safest approach is to write routines that convert your data to/from a standard string format (if you have issues with Unicode you'll need to take that into account as well) and then send that across the network.
You have to take care of endianness.
You should probably use the htonl()/htons() and ntohl()/ntohs() functions to convert between host and network byte order.
You can use structure packing. With most C compilers, you can enforce a specific structure alignment. It is sometimes used for what you need it - to transfer a struct over a network.
Note that this still leaves endianness issues, so this is not a universal solution.
If you are not writing embedded software, sending data between applications without serializing properly is rarely a good idea.
The same goes for using raw sockets, which is not very convenient, and feels a bit like "reinventing the wheel".
Many libraries can help you with both! Of course, you don't have to use them, but reading their documentation, and understanding how they work will help you make better choices. Things you have not yet planned can come out of the box (like, what happens when you want to update your system, and the message format changes?)
For serialization, have a read on those general purpose formats:
Human readable: JSON, XML, YAML, others...
Binary: Protobuf, TPL, Avro, BSON, MessagePack, and many others
For socket abstraction, look up
Boost ASIO
ZeroMQ
nanomsg
Many others

passing a struct over TCP (SOCK_STREAM) socket in C

I have a small client-server application in which I wish to send an entire structure over a TCP socket in C, not C++. Assume the struct to be the following:
struct something{
int a;
char b[64];
float c;
};
I have found many posts saying that I need to use pragma pack or to serialize the data before sending and receiving.
My question is: is it enough to use just pragma pack or just serialization, or do I need to use both?
Also, since serialization is a processor-intensive process that makes performance fall drastically, what is the best way to serialize a struct without using an external library? (I would love sample code or an algorithm.)
You need the following to portably send structs over the network:
Pack the structure. For gcc and compatible compilers, do this with __attribute__((packed)).
Do not use any members other than unsigned integers of fixed size, other packed structures satisfying these requirements, or arrays of any of the former. Signed integers are OK too, unless your machine doesn't use a two's complement representation.
Decide whether your protocol will use little- or big-endian encoding of integers. Make conversions when reading and writing those integers.
Also, do not take pointers of members of a packed structure, except to those with size 1 or other nested packed structures. See this answer.
A simple example of encoding and decoding follows. It assumes that the byte order conversion functions hton8(), ntoh8(), hton32(), and ntoh32() are available (the former two are a no-op, but there for consistency).
#include <stdint.h>
#include <inttypes.h>
#include <stdlib.h>
#include <stdio.h>
// get byte order conversion functions
#include "byteorder.h"
struct packet {
uint8_t x;
uint32_t y;
} __attribute__((packed));
static void decode_packet (uint8_t *recv_data, size_t recv_len)
{
// check size
if (recv_len < sizeof(struct packet)) {
fprintf(stderr, "received too little!");
return;
}
// make pointer
struct packet *recv_packet = (struct packet *)recv_data;
// fix byte order
uint8_t x = ntoh8(recv_packet->x);
uint32_t y = ntoh32(recv_packet->y);
printf("Decoded: x=%"PRIu8" y=%"PRIu32"\n", x, y);
}
int main (int argc, char *argv[])
{
// build packet
struct packet p;
p.x = hton8(17);
p.y = hton32(2924);
// send packet over link....
// on the other end, get some data (recv_data, recv_len) to decode:
uint8_t *recv_data = (uint8_t *)&p;
size_t recv_len = sizeof(p);
// now decode
decode_packet(recv_data, recv_len);
return 0;
}
As far as byte order conversion functions are concerned, your system's htons()/ntohs() and htonl()/ntohl() can be used, for 16- and 32-bit integers, respectively, to convert to/from big-endian. However, I'm not aware of any standard function for 64-bit integers, or to convert to/from little endian. You can use my byte order conversion functions; if you do so, you have to tell it your machine's byte order by defining BADVPN_LITTLE_ENDIAN or BADVPN_BIG_ENDIAN.
As far as signed integers are concerned, the conversion functions can be implemented safely in the same way as the ones I wrote and linked (swapping bytes directly); just change unsigned to signed.
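For readers who would rather not depend on an external header, here is a minimal sketch of conversion helpers written with shifts, covering the two cases mentioned above that htons()/htonl() do not: 64-bit values and little-endian encoding. The names put_be64, get_be64, put_le32 and get_le32 are assumptions for illustration. Because the code works on individual bytes, it does not need to know the host's byte order.

```c
#include <stdint.h>

/* Stores v into out[0..7], most significant byte first (big-endian). */
static void put_be64(uint8_t *out, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Loads a big-endian 64-bit value from in[0..7]. */
static uint64_t get_be64(const uint8_t *in)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | in[i];
    return v;
}

/* Stores v into out[0..3], least significant byte first (little-endian). */
static void put_le32(uint8_t *out, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(v >> (8 * i));
}

/* Loads a little-endian 32-bit value from in[0..3]. */
static uint32_t get_le32(const uint8_t *in)
{
    return (uint32_t)in[0]
         | ((uint32_t)in[1] << 8)
         | ((uint32_t)in[2] << 16)
         | ((uint32_t)in[3] << 24);
}
```

These also sidestep the advice about pointers into packed structures, since they read and write a plain byte buffer at any alignment.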
UPDATE: if you want an efficient binary protocol, but don't like fiddling with the bytes, you can try something like Protocol Buffers (C implementation). This allows you to describe the format of your messages in separate files, and generates source code that you use to encode and decode messages of the format you specify. I also implemented something similar myself, but greatly simplified; see my BProto generator and some examples (look in .bproto files, and addr.h for usage example).
Before you send any data over a TCP connection, work out a protocol specification. It doesn't have to be a multiple-page document filled with technical jargon. But it does have to specify who transmits what when and it must specify all messages at the byte level. It should specify how the ends of messages are established, whether there are any timeouts and who imposes them, and so on.
Without a specification, it's easy to ask questions that are simply impossible to answer. If something goes wrong, which end is at fault? With a specification, the end that didn't follow the specification is at fault. (And if both ends follow the specification and it still doesn't work, the specification is at fault.)
Once you have a specification, it's much easier to answer questions about how one end or the other should be designed.
I also strongly recommend not designing a network protocol around the specifics of your hardware. At least, not without a proven performance issue.
It depends on whether you can be sure that your systems on either end of the connection are homogeneous or not. If you are sure, for all time (which most of us cannot be), then you can take some shortcuts - but you must be aware that they are shortcuts.
struct something some;
...
if ((nbytes = write(sockfd, &some, sizeof(some))) != sizeof(some))
...short write or erroneous write...
and the analogous read().
However, if there's any chance that the systems might be different, then you need to establish how the data will be transferred formally. You might well linearize (serialize) the data - possibly fancily with something like ASN.1 or probably more simply with a format that can be reread easily. For that, text is often beneficial - it is easier to debug when you can see what's going wrong. Failing that, you need to define the byte order in which an int is transferred and make sure that the transfer follows that order, and the string probably gets a byte count followed by the appropriate amount of data (consider whether to transfer a terminal null or not), and then some representation of the float. This is more fiddly. It is not all that hard to write serialization and deserialization functions to handle the formatting. The tricky part is designing (deciding on) the protocol.
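A minimal sketch of such serialization functions for the struct in the question is shown below. It makes several assumptions that a real protocol would have to state explicitly: the int is sent as a 32-bit big-endian value, the string is sent as the fixed 64-byte field rather than as a byte count plus data, and the float is sent as the big-endian bit pattern of an IEEE 754 single, which only works if both ends use that representation. The names something_encode, something_decode and SOMETHING_WIRE_LEN are hypothetical.

```c
#include <stdint.h>
#include <string.h>

struct something {
    int a;
    char b[64];
    float c;
};

#define SOMETHING_WIRE_LEN (4 + 64 + 4)   /* assumed wire layout, no padding */

_Static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

static void put_be32(uint8_t *out, uint32_t v)
{
    out[0] = (uint8_t)(v >> 24);
    out[1] = (uint8_t)(v >> 16);
    out[2] = (uint8_t)(v >> 8);
    out[3] = (uint8_t)v;
}

static uint32_t get_be32(const uint8_t *in)
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Writes s into out using the assumed wire layout. */
static void something_encode(const struct something *s,
                             uint8_t out[SOMETHING_WIRE_LEN])
{
    uint32_t bits;

    put_be32(out, (uint32_t)s->a);       /* assumes the value fits in 32 bits */
    memcpy(out + 4, s->b, sizeof s->b);  /* bytes have no byte order */
    memcpy(&bits, &s->c, sizeof bits);   /* reinterpret the float as raw bits */
    put_be32(out + 68, bits);
}

/* Reads the assumed wire layout from in into s. */
static void something_decode(const uint8_t in[SOMETHING_WIRE_LEN],
                             struct something *s)
{
    uint32_t bits = get_be32(in + 68);

    /* Conversion back to a signed type assumes two's complement. */
    s->a = (int)(int32_t)get_be32(in);
    memcpy(s->b, in + 4, sizeof s->b);
    memcpy(&s->c, &bits, sizeof s->c);
}
```

The encoded buffer is always SOMETHING_WIRE_LEN bytes regardless of how either compiler pads the struct in memory, which is the property a raw write(sockfd, &some, sizeof(some)) cannot guarantee.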
You could use a union with the structure you want to send and an array:
union SendSomething {
char arr[sizeof(struct something)];
struct something smth;
};
This way you can send and receive just arr. Of course, you have to take care of endianness issues, and sizeof(struct something) might vary across machines (though you can usually overcome this with a #pragma pack).
Why would you do this when there are good and fast serialization libraries out there like Message Pack which do all the hard work for you, and as a bonus they provide you with cross-language compatibility of your socket protocol?
Use Message Pack or some other serialization library to do this.
Usually, serialization brings several benefits over e.g. sending the bits of the structure over the wire (with e.g. fwrite).
It happens individually for each non-aggregate atomic data (e.g. int).
It defines precisely the serial data format sent over the wire
So it deals with heterogeneous architectures: sending and receiving machines could have different word lengths and endianness.
It may be less brittle when the type changes a little bit. So if one machine has an old version of your code running, it might be able to talk with a machine with a more recent version, e.g. one having a char b[80]; instead of char b[64];
It may deal with more complex data structures -variable-sized vectors, or even hash-tables- with a logical way (for the hash-table, transmit the association, ..)
Very often, the serialization routines are generated. Even 20 years ago, RPCXDR already existed for that purpose, and XDR serialization primitives are still in many libc.
Pragma pack is used for binary compatibility of your struct with the other end.
This matters because the server or client to which you send the struct may be written in another language, or built with another C compiler or with other compiler options.
Serialization, as I understand it, is making a stream of bytes from your struct. When you write your struct to the socket, you are serializing it.
Google Protocol Buffers offers a nifty solution to this problem. Refer to the C implementation, protobuf-c.
Create a .proto file based on the structure of your payload and save it as payload.proto
syntax = "proto3";
message Payload {
int32 age = 1;
string name = 2;
}
Compile the .proto file using
protoc --c_out=. payload.proto
This will create the header file payload.pb-c.h and its corresponding payload.pb-c.c in your directory.
Create your server.c file and include the protobuf-c header files
#include <stdio.h>
#include <stdint.h>
#include "payload.pb-c.h"
int main()
{
    Payload pload = PAYLOAD__INIT;   // generated initializer for message Payload
    pload.name = "Adam";
    pload.age = 1300000;
    size_t len = payload__get_packed_size(&pload);
    uint8_t buffer[len];
    payload__pack(&pload, buffer);
    // Now send this buffer (len bytes) to the client via socket.
}
On your receiving side client.c
....
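int main()
{
    uint8_t buffer[MAX_SIZE]; // load this buffer with the socket data.
    size_t buffer_len; // length of the received data, obtained via read()
    Payload *pload = payload__unpack(NULL, buffer_len, buffer);
    if (pload == NULL)
        return 1; // the received bytes were not a valid Payload message
    printf("Age : %d Name : %s\n", pload->age, pload->name);
    payload__free_unpacked(pload, NULL); // release the decoded message
}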
Make sure you compile your programs with -lprotobuf-c flag
gcc server.c payload.pb-c.c -lprotobuf-c -o server.out
gcc client.c payload.pb-c.c -lprotobuf-c -o client.out

Parsing Binary Data in C?

Are there any libraries or guides for how to read and parse binary data in C?
I am looking at some functionality that will receive TCP packets on a network socket and then parse that binary data according to a specification, turning the information into a more useable form by the code.
Are there any libraries out there that do this, or even a primer on performing this type of thing?
I have to disagree with many of the responses here. I strongly suggest you avoid the temptation to cast a struct onto the incoming data. It seems compelling and might even work on your current target, but if the code is ever ported to another target/environment/compiler, you'll run into trouble. A few reasons:
Endianness: The architecture you're using right now might be big-endian, but your next target might be little-endian. Or vice-versa. You can overcome this with macros (ntoh and hton, for example), but it's extra work and you have to make sure you call those macros every time you reference the field.
Alignment: The architecture you're using might be capable of loading a multi-byte word at an odd-addressed offset, but many architectures cannot. If a 4-byte word straddles a 4-byte alignment boundary, the load may pull garbage. Even if the protocol itself doesn't have misaligned words, sometimes the byte stream itself is misaligned. (For example, although the IP header definition puts all 4-byte words on 4-byte boundaries, often the ethernet header pushes the IP header itself onto a 2-byte boundary.)
Padding: Your compiler might choose to pack your struct tightly with no padding, or it might insert padding to deal with the target's alignment constraints. I've seen this change between two versions of the same compiler. You could use #pragmas to force the issue, but #pragmas are, of course, compiler-specific.
Bit Ordering: The ordering of bits inside C bitfields is compiler-specific. Plus, the bits are hard to "get at" for your runtime code. Every time you reference a bitfield inside a struct, the compiler has to use a set of mask/shift operations. Of course, you're going to have to do that masking/shifting at some point, but best not to do it at every reference if speed is a concern. (If space is the overriding concern, then use bitfields, but tread carefully.)
All this is not to say "don't use structs." My favorite approach is to declare a friendly native-endian struct of all the relevant protocol data without any bitfields and without concern for the issues, then write a set of symmetric pack/parse routines that use the struct as a go-between.
typedef struct _MyProtocolData
{
Bool myBitA; // Using a "Bool" type wastes a lot of space, but it's fast.
Bool myBitB;
Word32 myWord; // You have a list of base types like Word32, right?
} MyProtocolData;
Void myProtocolParse(const Byte *pProtocol, MyProtocolData *pData)
{
// Somewhere, your code has to pick out the bits. Best to just do it one place.
// Parenthesize the mask: >> binds tighter than &, so the mask must be applied first.
pData->myBitA = (*(pProtocol + MY_BITS_OFFSET) & MY_BIT_A_MASK) >> MY_BIT_A_SHIFT;
pData->myBitB = (*(pProtocol + MY_BITS_OFFSET) & MY_BIT_B_MASK) >> MY_BIT_B_SHIFT;
// Endianness and Alignment issues go away when you fetch byte-at-a-time.
// Here, I'm assuming the protocol is big-endian.
// You could also write a library of "word fetchers" for different sizes and endiannesses.
// Cast each byte to Word32 before shifting so the shift happens in an unsigned 32-bit type.
pData->myWord = (Word32)*(pProtocol + MY_WORD_OFFSET + 0) << 24;
pData->myWord += (Word32)*(pProtocol + MY_WORD_OFFSET + 1) << 16;
pData->myWord += (Word32)*(pProtocol + MY_WORD_OFFSET + 2) << 8;
pData->myWord += (Word32)*(pProtocol + MY_WORD_OFFSET + 3);
// You could return something useful, like the end of the protocol or an error code.
}
Void myProtocolPack(const MyProtocolData *pData, Byte *pProtocol)
{
// Exercise for the reader! :)
}
Now, the rest of your code just manipulates data inside the friendly, fast struct objects and only calls the pack/parse when you have to interface with a byte stream. There's no need for ntoh or hton, and no bitfields to slow down your code.
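The pack side left as an exercise could be sketched as follows. Since the answer never defines its offsets or base types, this version is self-contained and assumes a hypothetical 5-byte layout: byte 0 carries the two flag bits in its top two positions, and bytes 1 to 4 hold the word in big-endian order. The standard bool, uint8_t and uint32_t types stand in for Bool, Byte and Word32.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout constants, chosen only for this sketch. */
enum {
    MY_BITS_OFFSET = 0,
    MY_BIT_A_SHIFT = 7,
    MY_BIT_B_SHIFT = 6,
    MY_WORD_OFFSET = 1,
    MY_PROTOCOL_LEN = 5
};

typedef struct {
    bool myBitA;
    bool myBitB;
    uint32_t myWord;
} MyProtocolData;

/* Writes pData into pProtocol, which must hold MY_PROTOCOL_LEN bytes. */
static void myProtocolPack(const MyProtocolData *pData, uint8_t *pProtocol)
{
    /* Put each flag into its bit position; all other bits of the byte are zero. */
    pProtocol[MY_BITS_OFFSET] = (uint8_t)((pData->myBitA << MY_BIT_A_SHIFT) |
                                          (pData->myBitB << MY_BIT_B_SHIFT));

    /* Store the word byte-at-a-time, most significant byte first. */
    pProtocol[MY_WORD_OFFSET + 0] = (uint8_t)(pData->myWord >> 24);
    pProtocol[MY_WORD_OFFSET + 1] = (uint8_t)(pData->myWord >> 16);
    pProtocol[MY_WORD_OFFSET + 2] = (uint8_t)(pData->myWord >> 8);
    pProtocol[MY_WORD_OFFSET + 3] = (uint8_t)(pData->myWord);
}
```

Keeping pack and parse as mirror images of each other makes it easy to test them together with a simple round trip.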
The standard way to do this in C/C++ is really casting to structs as 'gwaredd' suggested
It is not as unsafe as one would think. You first cast to the struct that you expected, as in his/her example, then you test that struct for validity. You have to test for max/min values, termination sequences, etc.
Whatever platform you are on, you must read Unix Network Programming, Volume 1: The Sockets Networking API. Buy it, borrow it, steal it (the victim will understand, it's like stealing food or something...), but do read it.
After reading the Stevens, most of this will make a lot more sense.
Let me restate your question to see if I understood properly. You are looking for software that will take a formal description of a packet and then will produce a "decoder" to parse such packets?
If so, the reference in that field is PADS. A good article introducing it is PADS: A Domain-Specific Language for Processing Ad Hoc Data. PADS is very complete but unfortunately under a non-free licence.
There are possible alternatives (I did not mention non-C solutions). Apparently, none can be regarded as completely production-ready:
binpac
PacketTypes
DataScript
If you read French, I summarized these issues in Génération de décodeurs de formats binaires.
In my experience, the best way is to first write a set of primitives, to read/write a single value of some type from a binary buffer. This gives you high visibility, and a very simple way to handle any endianness-issues: just make the functions do it right.
Then, you can for instance define structs for each of your protocol messages, and write pack/unpack (some people call them serialize/deserialize) functions for each.
As a base case, a primitive to extract a single 8-bit integer could look like this (assuming an 8-bit char on the host machine, you could add a layer of custom types to ensure that too, if needed):
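const void * read_uint8(const void *buffer, unsigned char *value)
{
    const unsigned char *vptr = buffer;
    *value = *vptr++;   // a void pointer cannot be dereferenced, so read through vptr
    return vptr;        // return the advanced pointer so calls can be chained
}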
Here, I chose to return the value by reference, and return an updated pointer. This is a matter of taste, you could of course return the value and update the pointer by reference. It is a crucial part of the design that the read-function updates the pointer, to make these chainable.
Now, we can write a similar function to read a 16-bit unsigned quantity:
const void * read_uint16(const void *buffer, unsigned short *value)
{
unsigned char lo, hi;
buffer = read_uint8(buffer, &hi);
buffer = read_uint8(buffer, &lo);
*value = (hi << 8) | lo;
return buffer;
}
Here I assumed incoming data is big-endian, this is common in networking protocols (mainly for historical reasons). You could of course get clever and do some pointer arithmetic and remove the need for a temporary, but I find this way makes it clearer and easier to understand. Having maximal transparency in this kind of primitive can be a good thing when debugging.
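The matching write side can follow the same chainable pattern. The sketch below is an assumption about how such primitives might look, not code from the answer: the names write_uint8, write_uint16 and write_uint32 are hypothetical, they use fixed-width types from stdint.h, and like the readers they assume a big-endian wire format.

```c
#include <stdint.h>

/* Writes one byte and returns the advanced pointer. */
static unsigned char *write_uint8(unsigned char *buffer, uint8_t value)
{
    *buffer++ = value;
    return buffer;
}

/* Writes a 16-bit value, high byte first, by chaining write_uint8. */
static unsigned char *write_uint16(unsigned char *buffer, uint16_t value)
{
    buffer = write_uint8(buffer, (uint8_t)(value >> 8));
    buffer = write_uint8(buffer, (uint8_t)(value & 0xFF));
    return buffer;
}

/* Writes a 32-bit value as two big-endian 16-bit halves. */
static unsigned char *write_uint32(unsigned char *buffer, uint32_t value)
{
    buffer = write_uint16(buffer, (uint16_t)(value >> 16));
    buffer = write_uint16(buffer, (uint16_t)(value & 0xFFFF));
    return buffer;
}
```

A message writer then becomes a sequence of calls such as p = write_uint8(p, type); p = write_uint16(p, length); and the difference between the final pointer and the start of the buffer gives the encoded size.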
The next step would be to start defining your protocol-specific messages, and write read/write primitives to match. At that level, think about code generation; if your protocol is described in some general, machine-readable format, you can generate the read/write functions from that, which saves a lot of grief. This is harder if the protocol format is clever enough, but often doable and highly recommended.
You might be interested in Google Protocol Buffers, which is basically a serialization framework. It's primarily for C++/Java/Python (those are the languages supported by Google) but there are ongoing efforts to port it to other languages, including C. (I haven't used the C port at all, but I'm responsible for one of the C# ports.)
You don't really need to parse binary data in C, just cast some pointer to whatever you think it should be.
struct SomeDataFormat
{
....
};
struct SomeDataFormat *pParsedData = (struct SomeDataFormat *) pBuffer;
Just be wary of endian issues, type sizes, reading off the end of buffers, etc etc
Parsing/formatting binary structures is one of the very few things that is easier to do in C than in higher-level/managed languages. You simply define a struct that corresponds to the format you want to handle and the struct is the parser/formatter. This works because a struct in C represents a precise memory layout (which is, of course, already binary). See also kervin's and gwaredd's replies.
I don't really understand what kind of library you are looking for. A generic library that will take any binary input and parse it into an unknown format?
I'm not sure such a library can ever exist in any language.
I think you need to elaborate on your question a little bit.
Edit:
OK, so after reading Jon's answer, it seems there is a library, although it's more of a code generation tool. But as many have stated, if you just cast the data to the appropriate data structure with appropriate care, i.e. using packed structures and taking care of endian issues, you are good. Using such a tool with C is overkill.
Basically suggestions about casting to struct work but please be aware that numbers can be represented differently on different architectures.
To deal with endian issues network byte order was introduced - common practice is to convert numbers from host byte order to network byte order before sending the data and to convert back to host order on receipt. See functions htonl, htons, ntohl and ntohs.
And really consider kervin's advice - read UNP. You won't regret it!

Resources