Byte swapping a struct - c

Okay, I hate to ask this question, but here goes. I am writing some code in C on an x86 machine. I want to send a struct over the network, and I want to convert the struct to network byte order. I understand all the drama about packing and the gcc packing pragmas. What I want to know is how to convert a struct (or an array, or any such arbitrary memory blob) to network byte order.
Is there a standard (Unix/Linux/POSIX) function call I can use, or must I roll my own?

In principle, you can go through the struct and call htonl or htons on each uint32_t or uint16_t field, respectively, assigning the results back or to a copy of the struct. However, I would not recommend this sort of approach. It's very fragile and subject to struct alignment issues, etc.
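For reference, the field-by-field approach might look roughly like the sketch below. The struct msg layout and the function names are hypothetical, made up for illustration; htonl, htons, ntohl and ntohs are the standard POSIX calls from <arpa/inet.h>.

```c
#include <arpa/inet.h>  /* htonl, htons, ntohl, ntohs (POSIX) */
#include <stdint.h>

/* Hypothetical message layout, used only for illustration. */
struct msg {
    uint32_t id;
    uint16_t kind;
    uint16_t flags;
};

/* Return a copy of *host with every field converted to network byte order. */
struct msg msg_to_network(const struct msg *host)
{
    struct msg net;
    net.id    = htonl(host->id);
    net.kind  = htons(host->kind);
    net.flags = htons(host->flags);
    return net;
}

/* Inverse conversion for the receiving side. */
struct msg msg_to_host(const struct msg *net)
{
    struct msg host;
    host.id    = ntohl(net->id);
    host.kind  = ntohs(net->kind);
    host.flags = ntohs(net->flags);
    return host;
}
```

Note that this only fixes the byte order of each field. Both ends still have to agree on padding and field sizes.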
Unless transmitting and receiving data is extremely performance-critical, I would simply implement proper serialize and deserialize functions for your structures. You can write numeric values out one byte at a time in binary format, choosing whether you want to write the least significant or most significant byte first. But really, I would recommend choosing a modern text-based serialization format like JSON or (ugh, I hate to say this) XML. The cost of serializing and deserializing text is quite small, and the advantages in terms of debugging ease and extensibility are significant.
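A minimal sketch of the byte-at-a-time binary approach, writing the most significant byte first. The struct sample record, the SAMPLE_WIRE_SIZE constant and the function names are assumptions made up for this example.

```c
#include <stdint.h>

/* Hypothetical record, used only for illustration. */
struct sample {
    uint32_t id;
    uint16_t port;
};

enum { SAMPLE_WIRE_SIZE = 6 };  /* 4 bytes for id + 2 bytes for port */

/* Write *s into out[0..5], most significant byte first. */
void sample_serialize(const struct sample *s, unsigned char out[SAMPLE_WIRE_SIZE])
{
    out[0] = (unsigned char)(s->id >> 24);
    out[1] = (unsigned char)(s->id >> 16);
    out[2] = (unsigned char)(s->id >> 8);
    out[3] = (unsigned char)(s->id);
    out[4] = (unsigned char)(s->port >> 8);
    out[5] = (unsigned char)(s->port);
}

/* Rebuild *s from the six bytes produced by sample_serialize. */
void sample_deserialize(struct sample *s, const unsigned char in[SAMPLE_WIRE_SIZE])
{
    s->id   = ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
            | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
    s->port = (uint16_t)(((unsigned)in[4] << 8) | in[5]);
}
```

Because the code only uses shifts on values, it produces the same bytes on big-endian and little-endian hosts, and the in-memory padding of the struct never reaches the wire.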
Finally, if you want to use text but find JSON or XML too distasteful, too heavy, or too much of a learning curve, you can always just use printf and scanf formatting to read and write structures as text in a fixed order. Writing all numeric values, including floats, in hex rather than decimal (C99 provides the %a conversion for floats) will probably improve performance a bit and ensure round-trip accuracy of floating point values. If you don't have C99, another option for floats could be to decompose them to mantissa/exponent form and recompose them using frexp and ldexp.
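A rough sketch of the printf/scanf idea, assuming a C99 library (for the %a hexadecimal floating-point conversion) and a hypothetical struct reading. The record and the function names are invented for the example.

```c
#include <inttypes.h>  /* PRIx32, SCNx32 */
#include <stdio.h>

/* Hypothetical record, used only for illustration. */
struct reading {
    uint32_t id;
    double   value;
};

/* Format *r as "<id in hex> <value as hex float>". Returns the snprintf
   result: the length that was (or would have been) written. */
int reading_to_text(const struct reading *r, char *buf, size_t size)
{
    return snprintf(buf, size, "%" PRIx32 " %a", r->id, r->value);
}

/* Parse text produced by reading_to_text. Returns 1 on success, 0 otherwise. */
int reading_from_text(struct reading *r, const char *text)
{
    return sscanf(text, "%" SCNx32 " %la", &r->id, &r->value) == 2;
}
```

The hex float form carries every bit of the mantissa, so a value such as 0.1 should come back exactly as it was written, which decimal output with a short precision would not guarantee.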

If you want to send data over the network, you will want to look at using the network-to-host and host-to-network byte ordering functions.
http://beej.us/guide/bgnet/output/html/multipage/htonsman.html

htons
http://www.gnu.org/s/libc/manual/html_node/Byte-Order.html

RESPONSE: Thank you for the responses. I guess the CORRECT answer is that for structs, arrays and such memory blobs you MUST implement your own serialization function ... this is fine ... I will look into this. I wanted to get a good feel before I attempted such a thing ...

Also look into frameworks that have been implemented to tackle this exact problem, that allow you to marshall / demarshall arbitrarily complex data structures. If you are going to do this on a scale any larger than a few types, then use a framework.
rpcgen / XDR : Don't let all the talk about RPC / client / server scare you away. You can use rpcgen to generate XDR marshalling / demarshalling routines for your data, that you can transport whichever way you like.
Flick IDL Compiler Kit
CORBA : Most CORBA frameworks have IDL compilers e.g. ACE TAO.
ASN.1 : If you enjoy some pain. Very exotic though.

Related

Define size of long and time_t to 4bytes

I want to synchronize two Raspberry Pis with a C program. It works fine if the program only runs on the Pis, but for development I want to use my PC (where it's also easier to debug). However, I send the timespec struct directly as binary over the wire. A Raspberry Pi uses 4 bytes for long and time_t, while my PC uses 8 bytes for each, so they do not match.
Is it possible to set long and time_t to 4byte each, only for this C script?
I know that the size of long, short, etc. is defined by the system.
Important: I only want to define it once in the script and not transforming it to uintXX or int each time.
In programming, it is not uncommon to need to treat network transmissions as separate from in-memory handling. In fact, it is pretty much the norm. So converting the data to a network format with a defined byte order and size is really recommended, and will help with the abstractions for your interfaces.
You might as well consider transforming to plain text, if that is not a time-critical piece of data exchange. It makes for a lot easier debugging.
C is probably not the best tool for the job here. It's much too low level to provide automatic data serialization like JavaScript, Python or similar more abstract languages.
You cannot assume the definition of timespec will be identical on different platforms. For one thing, the sizes of long and time_t can differ between 32-bit and 64-bit architectures, and you can have endianness problems too.
When you want to exchange data structures between heterogeneous platforms, you need to define your own protocol with unambiguous data sizes and a clear endianness convention.
One solution would be to send the numbers as ASCII. Not terribly efficient, but if it's just a couple of values, who cares?
Another would be to create an exchange structure with (u)intXX_t fields.
You can assume a standard raspberry kernel will be little endian like your PC, but if you're writing a small exchange protocol, you might as well add a couple of htonl/ntohl for good measure.
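As a sketch of such an exchange format, the code below always sends a timespec as 12 bytes: 8 bytes of seconds followed by 4 bytes of nanoseconds, both big-endian. The layout, the TS_WIRE_SIZE constant and the function names are assumptions chosen for this example, not part of any standard.

```c
#include <stdint.h>
#include <time.h>   /* struct timespec */

enum { TS_WIRE_SIZE = 12 };  /* 8 bytes of seconds + 4 bytes of nanoseconds */

/* Encode *ts into out[0..11], big-endian, independent of the local
   sizes of time_t and long. */
void timespec_encode(const struct timespec *ts, unsigned char out[TS_WIRE_SIZE])
{
    uint64_t sec  = (uint64_t)(int64_t)ts->tv_sec;
    uint32_t nsec = (uint32_t)ts->tv_nsec;   /* valid values are 0..999999999 */
    int i;

    for (i = 0; i < 8; i++)
        out[i] = (unsigned char)(sec >> (56 - 8 * i));
    for (i = 0; i < 4; i++)
        out[8 + i] = (unsigned char)(nsec >> (24 - 8 * i));
}

/* Decode the 12 bytes produced by timespec_encode into *ts. */
void timespec_decode(struct timespec *ts, const unsigned char in[TS_WIRE_SIZE])
{
    uint64_t sec = 0;
    uint32_t nsec = 0;
    int i;

    for (i = 0; i < 8; i++)
        sec = (sec << 8) | in[i];
    for (i = 0; i < 4; i++)
        nsec = (nsec << 8) | in[8 + i];

    /* If time_t is only 32 bits wide, values that do not fit are truncated. */
    ts->tv_sec  = (time_t)(int64_t)sec;
    ts->tv_nsec = (long)nsec;
}
```

With this, the rest of the program can keep using struct timespec everywhere, and the conversion to fixed sizes happens in exactly one place on each side.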

Structure padding

I am learning structure padding and packing in C.
I have a doubt: I have read that padding depends on the architecture, so does it affect inter-machine communication, i.e. when data created on one machine is read on another machine?
How is this problem avoided in that scenario?
Yes, you cannot send the binary data of a structure between platforms and expect it to look the same on the other side.
The way you solve it is to create a marshaller/demarshaller for your construct and pass the data through it on the way out of one system and on the way in to the other. This lets the compiler on each system take care of that system's own padding.
Each side knows how to take the data, as you've specified it will be sent, and deal with it for the local platform.
Platforms such as java handle this for you by creating serialization mechanisms for your classes. In C, you'll need to do this for yourself. How you do it depends on how you want to send your data. You could serialize to binary, XML, or anything else.
#pragma pack is supported by most compilers that I know of. This can allow the programmer to specify their desired padding method for structs.
http://msdn.microsoft.com/en-us/library/2e70t5y1%28v=vs.80%29.aspx
http://gcc.gnu.org/onlinedocs/gcc/Structure_002dPacking-Pragmas.html
http://clang.llvm.org/docs/UsersManual.html#microsoft-extensions
In C/C++, structures are used to pack data together. They do not provide any data encapsulation or data hiding features (C++ is an exception, because its structs are semantically similar to classes).
Because of the alignment requirements of the various data types, every member of a structure should be naturally aligned. The members of a structure are allocated sequentially, in increasing address order.
It will only be affected if the code you have compiled for some other architecture uses a different padding scheme.
To help alleviate problems, I recommend that you pack structures with no padding. Where padding is required, use place-holders in (eg char reserved[2]). Also, don't use bitfields!! They are not portable.
You should also be aware of other architecture-related problems. Specifically endianness, and datatype sizes. If you need better portability, you may want to serialise and de-serialise a byte stream instead of casting it as a struct.
You can use #pragma pack(1) before the struct declaration and #pragma pack() after it to disable architecture-based padding. This solves half of the problem, because some data types are architecture-dependent too. To solve the second half, I usually use fixed-width data types such as int16_t for 16-bit integers, uint32_t for 32-bit integers, and so on.
Take a look at http://freebsd.active-venture.com/FreeBSD-srctree/newsrc/netinet/ip_icmp.h.html ; this header describes some architecture-independent network data packets.
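A small sketch combining the suggestions above: #pragma pack, fixed-width types from <stdint.h>, and an explicit reserved placeholder instead of compiler padding. The struct wire_header layout is hypothetical, and the _Static_assert checks assume a C11 compiler.

```c
#include <stddef.h>  /* offsetof */
#include <stdint.h>

#pragma pack(1)      /* no padding between the members that follow */
struct wire_header {
    uint8_t  type;
    uint8_t  reserved[1];   /* explicit placeholder instead of compiler padding */
    uint16_t length;
    uint32_t sequence;
};
#pragma pack()       /* restore the default packing */

/* Fail the build if the layout is not the 8 bytes we expect (C11). */
_Static_assert(sizeof(struct wire_header) == 8, "unexpected wire_header size");
_Static_assert(offsetof(struct wire_header, length) == 2, "unexpected offset");
_Static_assert(offsetof(struct wire_header, sequence) == 4, "unexpected offset");
```

This pins down sizes and offsets only. The byte order of length and sequence is still whatever the host uses, so it has to be handled separately.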

Questions about memory alignment in structures and portability of the sizeof operator

I have a question about structure padding and memory alignment optimizations for structures in C. I am sending a structure over the network, and I know that, for run-time optimization purposes, the members of a structure are not necessarily contiguous in memory. I ran some tests on my local computer and, indeed, sizeof(my_structure) was different from the sum of the sizes of all my structure members. I did some research and found out two things:
First, the sizeof() operator retrieves the padded size of the structure (i.e the real size that would be stored in memory).
When specifying __attribute__((__packed__)) in the declaration of the structure this optimization is disabled by the compiler, so sizeof(my_structure) will be exactly the same as the sum of the fields of my structure.
That being said, I am wondering whether the sizeof operator returns the padded size on every compiler implementation and on every architecture. In other words, is it always safe to copy a structure with memcpy, using the sizeof operator, like this:
memcpy(&struct_dest, &struct_src, sizeof(struct_src));
I am also wondering what the real purpose of __attribute__((__packed__)) is. Is it used to send less data over the network when transmitting a structure, or is it in fact used to avoid some unspecified, platform-dependent behaviour of the sizeof operator?
Thanks in advance.
Different compilers on different architectures can and do use different padding. So for wire transmission it is not uncommon to pack structs to achieve a consistent binary layout. This can then cater for the code at each end of the wire running on different architecture.
However you also need to make sure that your data types are the same size if you use this approach. For example, on 64 bit systems, long is 4 bytes on Windows and 8 bytes almost everywhere else. And you also need to deal with endianness issues. The standard is to transmit over the wire in network byte order. In practice you would be better using a dedicated serialization library rather than trying to reinvent solutions to all these issues.
"I am sending a structure over the network"
Stop there. Perhaps some would disagree with me on this (in practice you do see a lot of projects doing this), but struct is a way of laying out things in memory - it's not a serialization mechanism. By using this tool for the job, you're already tying yourself to a bunch of non-portable assumptions.
Sure, you may be able to fake it with things like structure padding pragmas and attributes, but can you really? Even with those non-portable mechanisms you never know what quirks might show up. I recall working in a code base where "packed" structures were used, then suddenly taking it to a platform where access had to be word aligned. Even though it was nominally the same compiler (and thus supported the same proprietary extensions), it produced binaries which crashed. Any pain you get from this path is probably deserved, and I would say only take it if you can be 100% sure it will only run with a given compiler and environment, and that this will never change. I'd say the safer bet is to write a proper serialization mechanism that doesn't involve writing structures directly across process boundaries.
"Is it always safe to copy a structure with memcpy for example using the sizeof operator"
Yes, it is and that is the purpose of providing the sizeof operator.
Usually __attribute__((__packed__)) is used not for size considerations but when you want to make sure that the layout of a structure is exactly what you want it to be.
For example:
If a structure is to be used to match hardware or be sent on a wire, then it needs to have the exact same layout without any padding. This is because different architectures usually implement different kinds and amounts of padding and alignment, and the only way to ensure common ground is to take padding out of the picture by using packing.
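To make the difference concrete, here is a small sketch with two hypothetical structs that have the same members, one left to the compiler and one packed. The __attribute__((__packed__)) syntax is a GCC/Clang extension, and the struct and function names are made up for the example.

```c
#include <stdint.h>
#include <string.h>

struct padded {
    uint8_t  tag;
    uint32_t value;   /* the compiler may insert padding before this member */
    uint16_t count;
};

struct packed {
    uint8_t  tag;
    uint32_t value;
    uint16_t count;
} __attribute__((__packed__));   /* GCC/Clang extension: no padding at all */

/* Copy a whole struct, padding included; sizeof covers the full object. */
void copy_padded(struct padded *dst, const struct padded *src)
{
    memcpy(dst, src, sizeof *src);
}
```

Here sizeof(struct packed) is 7, the sum of the member sizes, while sizeof(struct padded) is whatever the ABI decides (12 on common x86 ABIs). The memcpy is safe either way, because sizeof always reports the full in-memory size of the type it is applied to.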

Dumping struct in C

Is it a good idea to simply dump a struct to a binary file using fwrite?
e.g
struct Foo {
    char name[100];
    double f;
    int bar;
} data;
fwrite(&data, sizeof(data), 1, fout);
How portable is it?
I think it's a really bad idea to just dump whatever the compiler produces (padding, integer sizes, etc.), even if platform portability is not important.
A friend of mine argues that doing so is very common in practice.
Is that true?
Edit: What is the recommended way to write a portable binary file? Using some sort of library?
I'm interested in how this is achieved, too (by specifying byte order, sizes, ...?).
That's certainly a very bad idea, for two reasons:
the same struct may have different sizes on different platforms due to alignment issues and compiler mood
the struct's elements may have different representations on different machines (think big-endian/little-endian, IEEE 754 vs. some other format, sizeof(int) on different platforms)
It rather critically matters whether you want the file to be portable, or just the code.
If you're only ever going to read the data back on the same C implementation (and that means with the same values for any compiler options that affect struct layout in any way), using the same definition of the struct, then the code is portable. It might be a bad idea for other reasons: difficulty of changing the struct, and in theory there could be security risks around dumping padding bytes to disk, or bytes after any NUL terminator in that char array. They could contain information that you never intended to persist. That said, the OS does it all the time in the swap file, so whatEVER, but try using that excuse when users notice that your document format doesn't always delete data they think they've deleted, and they just emailed it to a reporter.
If the file needs to be passed between different platforms then it's a pretty bad idea, because you end up accidentally defining your file format to be something like, "whatever MSVC on Win32 ends up writing". This could end up being pretty inconvenient to read and write on some other platform, and certainly the code you wrote in the first place won't do it when running on another platform with an incompatible storage representation of the struct.
The recommended way to write portable binary files, in order of preference, is probably:
Don't. Use a text format. Be prepared to lose some precision in floating-point values.
Use a library, although there's a bit of a curse of choice here. You might think ASN.1 looks all right, and it is as long as you never have to manipulate the stuff yourself. I would guess that Google Protocol Buffers is fairly good, but I've never used it myself.
Define some fairly simple binary format in terms of what each unsigned char in turn means. This is fine for characters[*] and other integers, but gets a bit tricky for floating-point types. "This is a little-endian representation of an IEEE-754 float" will do you OK provided that all your target platforms use IEEE floats. Which I expect they do, but you have to bet on that. Then, assemble that sequence of characters to write and interpret it to read: if you're "lucky" then on a given platform you can write a struct definition that matches it exactly, and use this trick. Otherwise do whatever byte manipulation you need to. If you want to be really portable, be careful not to use an int throughout your code to represent the value taken from bar, because if you do then on some platform where int is 16 bits, it won't fit. Instead use long or int_least32_t or something, and bounds-check the value on writing. Or use uint32_t and let it wrap.
[*] Until you hit an EBCDIC machine, that is. Not that anybody will seriously expect your files to be portable to a machine that plain text files aren't portable to either.
How fond are you of getting a call in the middle of the night? Either use a #pragma to pack them or write them variable by variable.
Yes, this sort of foolishness is very common but that doesn't make it a good idea. You should write each field individually in a specified byte order, that will avoid alignment and byte order problems at the cost of a little tiny bit of extra effort. Reading and writing field by field will also make your life easier when you upgrade your software and have to read your old data format or if the underlying hardware architecture changes.
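A sketch of the field-by-field approach for the struct Foo from the question, using a little-endian file layout of 100 + 8 + 4 = 112 bytes. The helper names (put_le, get_le, foo_write, foo_read) are made up for this example, and the code assumes that double is a 64-bit IEEE 754 value stored with the same byte order as integers, and that bar fits in 32 bits.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct Foo {
    char   name[100];
    double f;
    int    bar;
};

/* Write v to fp as n bytes, least significant byte first. Returns 1 on success. */
static int put_le(FILE *fp, uint64_t v, int n)
{
    int i;
    for (i = 0; i < n; i++)
        if (fputc((int)((v >> (8 * i)) & 0xFF), fp) == EOF)
            return 0;
    return 1;
}

/* Read n little-endian bytes from fp into *v. Returns 1 on success. */
static int get_le(FILE *fp, uint64_t *v, int n)
{
    int i, c;
    *v = 0;
    for (i = 0; i < n; i++) {
        if ((c = fgetc(fp)) == EOF)
            return 0;
        *v |= (uint64_t)c << (8 * i);
    }
    return 1;
}

/* File layout: 100 bytes of name, 8 bytes of f, 4 bytes of bar. */
int foo_write(FILE *fp, const struct Foo *d)
{
    uint64_t bits;
    memcpy(&bits, &d->f, sizeof bits);   /* assumes double is 64-bit IEEE 754 */
    return fwrite(d->name, 1, sizeof d->name, fp) == sizeof d->name
        && put_le(fp, bits, 8)
        && put_le(fp, (uint32_t)d->bar, 4);   /* assumes bar fits in 32 bits */
}

/* Read back one record written by foo_write. Returns 1 on success. */
int foo_read(FILE *fp, struct Foo *d)
{
    uint64_t bits, bar;
    if (fread(d->name, 1, sizeof d->name, fp) != sizeof d->name
        || !get_le(fp, &bits, 8) || !get_le(fp, &bar, 4))
        return 0;
    memcpy(&d->f, &bits, sizeof bits);
    d->bar = (int)(int32_t)(uint32_t)bar;
    return 1;
}
```

The file should be opened in binary mode ("wb" / "rb"). The struct's padding bytes never reach the disk, although the bytes of name after the NUL terminator still do unless they are cleared first.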

MPI and C structs

I have to admit, I was quite shocked to see how many lines of code are required to transfer one C struct with MPI.
Under what circumstances will it work to simply transmit a struct using the predefined datatype MPI_CHAR? Consider the following example:
struct particle {
    double x;
    double y;
    long i;
};
struct particle p;
/* dest is the rank of the receiving process */
MPI_Isend(&p, sizeof(struct particle), MPI_CHAR, dest, tag, MPI_COMM_WORLD, &sendr);
In my case, all processes run on the same architecture. Is padding the only issue?
MPI_BYTE is the datatype to use when sending untyped data, not MPI_CHAR. If the two machines have the same architecture but use different character encoding, using MPI_CHAR may corrupt your data. Sending your data as MPI_BYTE will leave the binary representation as it is and perform no representation conversion whatsoever.
That said, yes, technically it is correct to send a struct that way, if (and only if) you can guarantee that the data representation will be the same on the sending and receiving ends. However, it is poor programming practice, as it obfuscates the purpose of your code, and introduces platform dependency.
Keep in mind that you only have to define and commit a datatype once, while you will generally need to write code to send it several times. Reducing the clarity of all your sends just to save a couple lines on the single definition is not a trade up.
Personally I'd be more concerned about comprehensibility and maintainability, even portability, than padding. If I'm sending a structure I like my code to show that I am sending a structure, not a sequence of bytes or chars. And I expect my codes to run on multiple architectures, across multiple generations of language standards and compilers.
I guess all I'm saying is that, if it's worth defining a structure (and you obviously think it is) then it's worth defining a structure. Saving a few lines of (near-)boilerplate isn't much of an argument against that.
