On our ASIC we have two processors, which means two different compilers that behave slightly differently. We pass structs full of data between the two for communication; this happens many times per second, so we don't have much time to burn here.
The problem is that the two compilers treat padding differently, so when we copy data from one domain to the other, the fields don't line up and we read the wrong values. Our initial solution was to put __attribute__((packed)) on everything in the structs. While this seems to work most of the time, it is by no means portable: as we move the code to different platforms, I'm noticing that not all compilers understand __attribute__((packed)), and I'd like to keep our code portable.
Has anyone here dealt with this kind of issue? What would you recommend?
Thanks in advance!
I would manually pack such structures, so that there is no padding issue with any compiler, present or future.
It is tedious, but it is a portable future-proof solution, worth the effort.
Fundamentally, struct layout in C isn't portable, so __attribute__((packed)) or a similar extension is the usual way to impose a fixed layout on a struct.
The other option is to manually add pad fields in appropriate places, being aware of each platform's alignment constraints, so that the two structures match across your two platforms; but this is essentially a manual __attribute__((packed)).
Note that the pahole utility from dwarves (announcement, paper) is a great tool for checking and viewing the layout of structures, assuming your compiler emits ELF files with DWARF debug information.
This is the reason why you shouldn't use structs for data protocols. It might seem like a harsh thing to say, but structs are unfortunately non-portable and therefore dangerous. You could consider doing something like this (pseudo code):
#include <stdint.h>
#include <string.h>
typedef struct
{
    uint32_t x;
    uint16_t y;
    /* ... further fields ... */
} my_struct_t; // some custom struct
#define PROTOCOL_SIZE ( sizeof(uint32_t) + sizeof(uint16_t) /* + ... */ )
void pack (uint8_t raw_data[PROTOCOL_SIZE],
           const my_struct_t* ms)
{
    size_t i = 0;
    memcpy(&raw_data[i], &ms->x, sizeof(ms->x)); /* copy from the member's address */
    i += sizeof(ms->x);
    memcpy(&raw_data[i], &ms->y, sizeof(ms->y));
    i += sizeof(ms->y);
    /* ... further fields ... */
}
And then make a similar unpack() function which copies raw data into the struct.
The advantages: it is 100% portable, and if the protocol specifies a particular endianness, this function can also handle the conversion (which you would have to do anyway).
The disadvantages are one extra memory buffer and some extra data copying.
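A minimal sketch of the matching pack()/unpack() pair, assuming a little-endian wire format and only the two fields x and y (the names pack_le, unpack_le and WIRE_SIZE are illustrative, not from the answer). Assembling each byte with shifts instead of a memcpy() of the native representation makes the byte order explicit, so the same code works on hosts of either endianness:

```c
#include <stdint.h>
#include <stddef.h>

typedef struct
{
    uint32_t x;
    uint16_t y;
} my_struct_t;

/* Wire size is the sum of the field sizes: no padding on the wire. */
#define WIRE_SIZE (4 + 2)

/* Serialize ms into raw_data, least significant byte first. */
void pack_le(uint8_t raw_data[WIRE_SIZE], const my_struct_t *ms)
{
    size_t i = 0;
    raw_data[i++] = (uint8_t)(ms->x);
    raw_data[i++] = (uint8_t)(ms->x >> 8);
    raw_data[i++] = (uint8_t)(ms->x >> 16);
    raw_data[i++] = (uint8_t)(ms->x >> 24);
    raw_data[i++] = (uint8_t)(ms->y);
    raw_data[i++] = (uint8_t)(ms->y >> 8);
}

/* Deserialize little-endian raw_data back into ms. */
void unpack_le(my_struct_t *ms, const uint8_t raw_data[WIRE_SIZE])
{
    ms->x = (uint32_t)raw_data[0]
          | ((uint32_t)raw_data[1] << 8)
          | ((uint32_t)raw_data[2] << 16)
          | ((uint32_t)raw_data[3] << 24);
    ms->y = (uint16_t)(raw_data[4] | (raw_data[5] << 8));
}
```

Adding a field means adding its bytes to WIRE_SIZE and to both functions, which is the tedious but portable part.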
Related
I am learning about structure padding and packing in C.
I have a doubt: I have read that padding depends on the architecture, so does it affect inter-machine communication, i.e. when data created on one machine is read on another machine?
How is this problem avoided in that scenario?
Yes, you cannot send the binary data of a structure between platforms and expect it to look the same on the other side.
The way you solve it is to create a marshaller/demarshaller for your construct and pass the data through it on the way out of one system and on the way in to the other. This lets each system's compiler lay out (and pad) the structure however it likes locally.
Each side knows how to take the data, as you've specified it will be sent, and deal with it for the local platform.
Platforms such as Java handle this for you by providing serialization mechanisms for your classes. In C, you'll need to do this yourself. How you do it depends on how you want to send your data: you could serialize to binary, XML, or anything else.
#pragma pack is supported by most compilers that I know of. This can allow the programmer to specify their desired padding method for structs.
http://msdn.microsoft.com/en-us/library/2e70t5y1%28v=vs.80%29.aspx
http://gcc.gnu.org/onlinedocs/gcc/Structure_002dPacking-Pragmas.html
http://clang.llvm.org/docs/UsersManual.html#microsoft-extensions
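A minimal sketch, assuming a compiler that accepts the push/pop form of the pragma (the MSVC and GCC pages above both document it) and supports C11 `_Static_assert`; wire_header_t is a made-up example. Because an unrecognised pragma is typically ignored with at most a warning, the static assertions turn a silently ignored pragma into a build error:

```c
#include <stdint.h>
#include <stddef.h>

/* Save the current packing, force 1-byte alignment, then restore it. */
#pragma pack(push, 1)
typedef struct
{
    uint8_t  type;    /* offset 0 */
    uint32_t length;  /* offset 1: no padding inserted before it */
    uint16_t flags;   /* offset 5 */
} wire_header_t;      /* hypothetical message header */
#pragma pack(pop)

/* Fail the build if the pragma was not honoured. */
_Static_assert(sizeof(wire_header_t) == 7, "wire_header_t must be packed");
_Static_assert(offsetof(wire_header_t, length) == 1, "unexpected padding");
```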
In C, structures are used as plain aggregates of data; they don't provide any data encapsulation or data-hiding features (C++ is an exception, since there a struct is semantically almost identical to a class).
Because of the alignment requirements of the various data types, each member of a structure is normally placed at its natural alignment, and members are allocated in declaration order at increasing addresses.
It will only be affected if the code you have compiled for some other architecture uses a different padding scheme.
To help alleviate problems, I recommend that you lay out structures so that no padding is needed. Where padding would be required, use explicit placeholders instead (e.g. char reserved[2]). Also, don't use bit-fields! Their layout is not portable.
You should also be aware of other architecture-related problems. Specifically endianness, and datatype sizes. If you need better portability, you may want to serialise and de-serialise a byte stream instead of casting it as a struct.
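A sketch of the "no padding, explicit placeholders" layout, assuming uint32_t needs at most 4-byte alignment (true on common ABIs, but still an assumption) and a C11 compiler; msg_header_t and its fields are hypothetical. Members are ordered so that natural alignment leaves no gaps, reserved fills the one gap that would otherwise remain, and the static assertions document the expected offsets. Byte order still has to be handled separately:

```c
#include <stdint.h>
#include <stddef.h>

/* No pragma or attribute: the layout relies only on natural alignment. */
typedef struct
{
    uint32_t sequence;     /* offset 0 */
    uint16_t length;       /* offset 4 */
    uint8_t  type;         /* offset 6 */
    uint8_t  reserved[1];  /* offset 7: explicit placeholder, send as 0 */
    uint32_t payload_crc;  /* offset 8 */
} msg_header_t;            /* hypothetical message header */

/* Catch any platform where the compiler would still insert padding. */
_Static_assert(sizeof(msg_header_t) == 12, "unexpected trailing padding");
_Static_assert(offsetof(msg_header_t, payload_crc) == 8, "unexpected padding");
```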
You can use #pragma pack(1) before the struct declaration and #pragma pack() after it to disable architecture-based packing. This solves half of the problem, because the sizes of some data types are architecture-dependent too; to solve the second half I use fixed-width types such as int16_t for 16-bit integers, uint32_t for 32-bit integers, and so on.
Take a look at http://freebsd.active-venture.com/FreeBSD-srctree/newsrc/netinet/ip_icmp.h.html; this header describes some architecture-independent network data packets.
I have seen various code that reads data into a char or void buffer and then casts it to a struct. An example is parsing file formats where data sits at fixed offsets.
Example:
struct some_format {
char magic[4];
uint32_t len;
uint16_t foo;
};
struct some_format *sf = (struct some_format*) buf;
To be sure this is always valid, one needs to pack the struct using __attribute__((packed)):
struct test {
uint8_t a;
uint8_t b;
uint32_t c;
uint8_t d[128];
} __attribute__((packed));
When reading big and complex file formats, this surely makes things much simpler, typically when reading media formats with structs that have 30+ members.
It is also easy to read in a huge buffer and cast to proper type by for example:
struct mother {
    uint8_t a;
    uint8_t b;
    uint32_t offset_child;
};
struct child {
    /* ... */
};
m = (struct mother*) buf;
c = (struct child*) ((uint8_t*)buf + m->offset_child);
Or:
read_big_buf(buf, 4096);
a = (struct a*) buf;
b = (struct b*) (buf + sizeof(struct a));
c = (struct c*) (buf + SOME_DEF);
...
It would also be easy to quickly write such structures to file.
My question is how good or bad this way of coding is. I am looking at various data structures and would like to use the best way to handle them.
Is this how it is done? (As in: is this common practice.)
Is __attribute__((packed)) always safe?
Is it better to use sscanf? (What was I thinking? Thanks, @Amardeep.)
Is it better to write functions that initialize the structure using casts and bit shifting?
etc.
As of now I use this mainly in data-information tools, such as listing all structures of a certain type, with their values, in a file format like a media stream; that is, information-dumping tools.
It is how it is sometimes done. Packed is safe as long as you use it correctly. Using sscanf() would imply you are reading text data, which is a different use case than a binary image from a structure.
If your code does not require portability across compilers and/or platforms (CPU architectures), and your compiler has support for packed structures, then this is a perfectly legitimate way of accessing serialized data.
However, problems may arise if you try to generate data on one platform and use it on another due to:
Host Byte Order (Little Endian/Big Endian)
Different sizes for language primitive types (long can be 32 or 64 bits for example)
Code changes on one side but not the other.
There are libraries that simplify serialization/deserialization and handle most of these issues. The overhead of such operations is more easily justified on systems that must span processes and hosts. However, if your structures are very complex, using a ser/des library may be justified simply for ease of maintenance.
Is this how it is done?
I didn't understand this question at first. Edit: you'd like to know whether this is a common idiom. In codebases where depending on GNU extensions is acceptable, yes, it is used quite frequently, since it's convenient.
Is __attribute__((packed)) always safe?
For this use case, pretty much yes, except when it's unavailable.
Is it better to use sscanf?
No. Don't use scanf().
Is it better to make functions where one initiates structure with casts and bit shifting.
It's more portable. __attribute__((packed)) is a GNU extension, and not all compilers support it (although I'm wondering who cares about compilers other than GCC and Clang, but theoretically, this still is an issue).
One of my gripes about the C language standards to date is that they impose enough rules about how compilers must lay out structures and bit-fields to preclude what might otherwise be useful optimizations [e.g. on a system with power-of-two integer sizes, a compiler would be forbidden from packing eight three-bit fields into three bytes], yet they do not provide any means by which a programmer can specify an explicit struct layout. I used to use byte pointers frequently to read data out of structures, but I don't favor such techniques as much as I used to. When speed isn't critical, I nowadays prefer a family of functions that write multi-byte types to, or read them from, consecutive memory locations using whatever endianness is needed [e.g. void storeI32LE(uint8_t **dest, int32_t dat) or int32_t readI32LE(uint8_t const **src);]. Such code will not be as efficient as what a compiler might generate when the processor has the correct endianness and either the structure members are aligned or the processor supports unaligned accesses, but code using such methods can easily be ported to any processor regardless of its native alignment and endianness.
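A minimal sketch of the two functions named above, using the signatures given there (the bodies are an assumption). Each call advances the pointer past the bytes it consumed, so a sequence of calls walks through a buffer field by field, and no unaligned access or cast to a struct is ever needed:

```c
#include <stdint.h>

/* Write dat to *dest as 4 bytes, least significant first, and advance *dest. */
void storeI32LE(uint8_t **dest, int32_t dat)
{
    uint32_t u = (uint32_t)dat;  /* well-defined modulo 2^32 conversion */
    uint8_t *p = *dest;
    p[0] = (uint8_t)u;
    p[1] = (uint8_t)(u >> 8);
    p[2] = (uint8_t)(u >> 16);
    p[3] = (uint8_t)(u >> 24);
    *dest = p + 4;
}

/* Read 4 little-endian bytes from *src as an int32_t and advance *src. */
int32_t readI32LE(uint8_t const **src)
{
    const uint8_t *p = *src;
    uint32_t u = (uint32_t)p[0]
               | ((uint32_t)p[1] << 8)
               | ((uint32_t)p[2] << 16)
               | ((uint32_t)p[3] << 24);
    *src = p + 4;
    /* Map back to a signed value without relying on the
       implementation-defined unsigned-to-signed conversion. */
    if (u <= INT32_MAX)
        return (int32_t)u;
    return (int32_t)(u - 0x80000000u) - INT32_MAX - 1;
}
```

For example, storeI32LE(&p, -2) stores the bytes FE FF FF FF and moves p forward by four.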
I have a question about structure padding and memory alignment in C. I am sending a structure over the network, and I know that, for run-time optimization purposes, the members of a structure are not necessarily contiguous in memory. I've run some tests on my local computer and indeed, sizeof(my_structure) was different from the sum of the sizes of my structure's members. I did some research and found out two things:
First, the sizeof operator yields the padded size of the structure (i.e. the real size it occupies in memory).
Second, when __attribute__((__packed__)) is specified in the declaration of the structure, the compiler disables this optimization, so sizeof(my_structure) will be exactly the sum of the sizes of the structure's fields.
That being said, I am wondering whether sizeof gives the padded size with every compiler implementation and on every architecture; in other words, is it always safe to copy a structure with memcpy, for example, using the sizeof operator like this:
memcpy(struct_dest, struct_src, sizeof(*struct_src));
I am also wondering what the real purpose of __attribute__((__packed__)) is: is it used to reduce the amount of data sent over a network when transmitting a structure, or is it in fact used to avoid unspecified, platform-dependent sizeof behaviour?
Thanks in advance.
Different compilers on different architectures can and do use different padding. So for wire transmission it is not uncommon to pack structs to achieve a consistent binary layout. This can then cater for the code at each end of the wire running on different architecture.
However, if you use this approach you also need to make sure that your data types are the same size at both ends. For example, on 64-bit systems long is 4 bytes on Windows and 8 bytes almost everywhere else. You also need to deal with endianness; the standard is to transmit over the wire in network byte order. In practice you would be better off using a dedicated serialization library than trying to reinvent solutions to all these issues.
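A minimal sketch of sending a 32-bit field in network byte order, assuming a POSIX system (<arpa/inet.h>; Winsock declares htonl()/ntohl() in <winsock2.h>); put_u32_net and get_u32_net are hypothetical helper names. memcpy() is used so that the buffer may be unaligned:

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* POSIX htonl()/ntohl() */

/* Store host_value into buf in network byte order (big-endian). */
void put_u32_net(uint8_t buf[4], uint32_t host_value)
{
    uint32_t net = htonl(host_value);
    memcpy(buf, &net, sizeof net);
}

/* Read a network-byte-order value from buf and return it in host order. */
uint32_t get_u32_net(const uint8_t buf[4])
{
    uint32_t net;
    memcpy(&net, buf, sizeof net);
    return ntohl(net);
}
```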
I am sending a structure over the network
Stop there. Perhaps some would disagree with me on this (in practice you do see a lot of projects doing this), but struct is a way of laying out things in memory - it's not a serialization mechanism. By using this tool for the job, you're already tying yourself to a bunch of non-portable assumptions.
Sure, you may be able to fake it with things like structure-packing pragmas and attributes, but can you really? Even with those non-portable mechanisms you never know what quirks might show up. I recall working in a code base where "packed" structures were used, and then suddenly taking it to a platform where accesses had to be word-aligned: even though it was nominally the same compiler (and thus supported the same proprietary extensions), it produced binaries that crashed. Any pain you get from this path is probably deserved, and I would only take it if you can be 100% sure the code will only ever run with a given compiler and environment, and that this will never change. I'd say the safer bet is to write a proper serialization mechanism that doesn't rely on passing raw structures across process boundaries.
Is it always safe to copy a structure with memcpy for example using the sizeof operator
Yes, it is; that is the purpose of the sizeof operator.
Usually __attribute__((__packed__)) is used not for size considerations but when you want to make sure that the layout of a structure is exactly what you want it to be.
For example:
If a structure has to match hardware or be sent over a wire, it needs to have exactly the same layout without any padding. This is because different architectures usually implement different kinds and amounts of padding and alignment, and the only way to ensure common ground is to take padding out of the picture by packing.
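A small sketch of the difference, assuming GCC or Clang (which accept __attribute__((__packed__))); the struct names are illustrative. On common ABIs sizeof(struct padded) is 8 because three padding bytes follow tag, but the standard only guarantees that it is at least the sum of the member sizes:

```c
#include <stdint.h>

/* Default layout: the compiler may pad after tag so value is aligned. */
struct padded
{
    uint8_t  tag;
    uint32_t value;
};

/* GCC/Clang extension: no padding between or after the members. */
struct __attribute__((__packed__)) packed
{
    uint8_t  tag;
    uint32_t value;
};
```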
I have lots of different structs containing enum members that I have to transmit via TCP/IP. The communication endpoints run on different operating systems (Windows XP and Linux), meaning different compilers (gcc 4.x.x and MSVC 2008), but both program parts share the same header files with the type declarations.
For performance reasons, the structures should be transmitted directly (see code sample below) without expensively serializing or streaming the members inside.
So the question is how to ensure that both compilers use the same internal memory representation for the enumeration members (i.e. both use 32-bit unsigned integers). Or if there is a better way to solve this problem...
//type and enum declaration
typedef enum
{
    A = 1,
    B = 2,
    C = 3
} eParameter;
typedef enum
{
    READY = 400,
    RUNNING = 401,
    BLOCKED = 402,
    FINISHED = 403
} eState;
#pragma pack(push,1)
typedef struct
{
    eParameter mParameter;
    eState mState;
    int32_t miSomeValue;
    uint8_t miAnotherValue;
    ...
} tStateMessage;
#pragma pack(pop)
//... send via socket
tStateMessage msg;
send(iSocketFD, (void*)(&msg), sizeof(tStateMessage), 0);
//... receive message on the other side
tStateMessage msg_received;
// note: on TCP, recv() may return fewer bytes than requested; loop until the whole message has arrived
recv(iSocketFD, (void*)(&msg_received), sizeof(tStateMessage), 0);
Additionally...
Since both endpoints are little-endian machines, endianness is not a problem here.
And the pack #pragma solves alignment issues satisfactorily.
Thx for your answers,
Axel
I'll answer your question pragmatically because you've chosen a relatively risky path after weighing the performance gains against the possible downsides (at least I hope you have!).
If portability and robustness against future changes to those compilers have also been considered then an empirical approach would be the best guard against problems.
Ensure you are using initializers for the enums (your examples do this) in all cases.
Do empirical testing to see how the data is interpreted on the receiving side.
Record the version numbers of the build tools on both sides and archive them with the source code. Preferably archive the tools as well.
Document everything you did so unforeseen maintenance in the future is not handicapped.
Pray for the best! ;-)
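One cheap form of the empirical testing above, assuming both toolchains accept C11 `_Static_assert` (MSVC 2008 from the question does not, so there an equivalent trick such as a negative-size array typedef would be needed): assert the sizes and offsets the wire format depends on, and compile the same header on both sides. The struct is the question's tStateMessage trimmed to its four named fields:

```c
#include <stdint.h>
#include <stddef.h>

typedef enum { A = 1, B = 2, C = 3 } eParameter;
typedef enum { READY = 400, RUNNING = 401, BLOCKED = 402, FINISHED = 403 } eState;

#pragma pack(push, 1)
typedef struct
{
    eParameter mParameter;
    eState     mState;
    int32_t    miSomeValue;
    uint8_t    miAnotherValue;
} tStateMessage;  /* trimmed: the question's "..." fields are omitted */
#pragma pack(pop)

/* Any mismatch between the two compilers now fails the build
   instead of corrupting messages at run time. */
_Static_assert(sizeof(eParameter) == 4, "eParameter must be 32 bits");
_Static_assert(sizeof(eState) == 4, "eState must be 32 bits");
_Static_assert(offsetof(tStateMessage, miSomeValue) == 8, "unexpected layout");
_Static_assert(sizeof(tStateMessage) == 13, "unexpected layout");
```

With pack(push,1) and 4-byte enums the expected offsets are 0, 4, 8 and 12, so an option that shrinks enums on one side (for instance gcc's -fshort-enums) breaks that side's build rather than the protocol.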
I would advise you to use one of the serialization libraries specially designed for such problems, like:
Apache Avro (tutorial)
Facebook's Thrift (tutorial)
Google's Protocol Buffers (tutorial)
What you will get is maximum platform portability, an easy way of changing the interface and the type of messages transmitted plus a lot more useful features.
Note that only Avro has an officially supported C API. For Thrift and Protocol Buffers you either make a thin C wrapper over the C++ API or use one of the C APIs, like protobuf-c.
This is premature optimization. You have made two costly assumptions without measurements.
The first assumption is that this part of the code is a performance bottleneck in the first place. Is it? Very unlikely. If one is going to make assumptions about performance, then the safe assumption is that the network speed will be the bottleneck, not the code which sends and receives the network messages. This alone should prevent you from ever considering the second assumption.
The second assumption is that serializing the struct portably will be noticeably slower than writing the raw bits of the struct. This assumption is nearly always false.
Skeptical? Measure it! :)
If you don't want to go through serialization, one method I've seen used is to eschew enums and simply use 32-bit unsigned ints and #defines to emulate enums. You trade away some type safety for some assurances about data format.
Otherwise, you are relying on behaviour that the language specification doesn't guarantee to be implemented the same way by all your compilers. If you aren't worried about general portability and just want the same result from two compilers, it should be possible, through trial and error and a lot of testing, to get the two to do the same thing. I believe the C99 spec allows enums to be internally the size of int or smaller, but not larger than int. So one thing I've seen done to supposedly hint the compiler in the right direction is:
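A minimal sketch of that approach; the STATE_* names and state_msg_t are hypothetical. The wire field is a uint32_t, so its size no longer depends on how each compiler sizes enums, and the price is that the receiver must validate the value itself:

```c
#include <stdint.h>

/* State values as plain 32-bit constants instead of an enum. */
#define STATE_READY    UINT32_C(400)
#define STATE_RUNNING  UINT32_C(401)
#define STATE_BLOCKED  UINT32_C(402)
#define STATE_FINISHED UINT32_C(403)

typedef struct
{
    uint32_t state;      /* one of the STATE_* values; always 4 bytes */
    int32_t  some_value;
} state_msg_t;           /* hypothetical message type */

/* Without an enum type, nothing stops garbage values: check on receipt. */
int state_is_valid(uint32_t state)
{
    return state >= STATE_READY && state <= STATE_FINISHED;
}
```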
typedef enum
{
    READY = 400,
    RUNNING = 401,
    BLOCKED = 402,
    FINISHED = 403,
    MAX = INT_MAX /* from <limits.h> */
} eState;
This should limit the compiler's choices for how to store the enum. Note, however, that compilers can deviate from the standard; I know gcc has a non-standard extension that allows 64-bit enums if necessary.
Also, check out:
What is the size of an enum in C?
It is strongly recommended to serialize the data in some way, or at least to use an indicator of the hardware architecture. Even if you use the same compiler, you can have problems with internal data representations (little endian, big endian, etc.).
I have just discovered the joy of bitflags. I have several questions related to "best-practices" regarding the use of bitflags in C. I learned everything from various examples I found on the web but still have questions.
In order to save space, I am using a single 32-bit integer field in a struct (A->flag) to represent several different sets of boolean properties. In all, 20 different bits are #defined. Some of these are true presence/absence flags (STORAGE_INTERNAL vs. STORAGE_EXTERNAL). Others have more than two values (e.g. a mutually exclusive set of formats: FORMAT_A, FORMAT_B, FORMAT_C). I have defined macros for setting specific bits (and simultaneously turning off mutually exclusive bits). I have also defined macros for testing whether specific combinations of bits are set in the flag.
However, what is lost in the above approach is the grouping of flags, which is best captured by enums. For writing functions, I would like to use enums (e.g. STORAGE_TYPE and FORMAT_TYPE) so that function definitions look nice. I expect to use enums only for passing parameters, and #defined macros for setting and testing flags.
(a) How do I define flag (A->flag) as a 32 bit integer in a portable fashion (across 32 bit / 64 bit platforms)?
(b) Should I worry about potential size differences in how A->flag vs. #defined constants vs. enums are stored?
(c) Am I making things unnecessarily complicated, meaning should I just stick to using #defined constants for passing parameters as ordinary ints? What else should I worry about in all this?
I apologize for the poorly articulated question. It reflects my ignorance about potential issues.
There is a C99 header that was intended to solve exactly problem (a), but for some reason Microsoft doesn't implement it. Fortunately, you can get <stdint.h> for Microsoft Windows here. Every other platform will already have it. The 32-bit int types are uint32_t and int32_t. These also come in 8-, 16-, and 64-bit flavors.
So, that takes care of (a).
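A short sketch of (a), assuming <stdint.h> and <inttypes.h> are available; struct A mirrors A->flag from the question and the bit values are hypothetical. UINT32_C() makes each constant an unsigned integer constant at least 32 bits wide, and PRIx32 prints the field with the correct conversion specifier on any platform:

```c
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

/* Hypothetical flag bits. */
#define STORAGE_INTERNAL (UINT32_C(1) << 0)
#define FORMAT_A         (UINT32_C(1) << 4)

struct A
{
    uint32_t flag;   /* exactly 32 bits on every platform that provides uint32_t */
};

/* PRIx32 expands to the right printf conversion for uint32_t. */
void print_flags(const struct A *a)
{
    printf("flags = 0x%08" PRIx32 "\n", a->flag);
}
```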
(b) and (c) are kind of the same question. We do make assumptions whenever we develop something. You assume that C will be available. You assume that <stdint.h> can be found somewhere. You could always assume that int was at least 16 bits and now a >= 32 bit assumption is fairly reasonable.
In general, you should try to write conforming programs that don't depend on layout, but they will make assumptions about word length. You should worry about performance at the algorithm level, that is, am I writing something that is quadratic, polynomial, exponential?
You should not worry about performance at the operation level until (a) you notice a performance lag, and (b) you have profiled your program. You need to get your job done without bogging down worrying about individual operations. :-)
Oh, I should add that you particularly don't need to worry about low-level operation performance when you are writing the program in C in the first place. C is the close-to-the-metal, go-as-fast-as-possible language. We routinely write stuff in php, python, ruby, or lisp because we want a powerful language, and CPUs are so fast these days that we can get away with an entire interpreter, never mind a not-perfect choice of bit-twiddle-word-length ops. :-)
You can use bit-fields and let the compiler do the bit twiddling. For example:
struct PropertySet {
    unsigned internal_storage : 1;
    unsigned format : 4;
};
int main(void) {
    struct PropertySet x = {0};             /* zero-initialize before reading */
    struct PropertySet y[10] = {{0}};       /* array of structures containing bit-fields */
    if (x.internal_storage) x.format |= 2;
    if (y[2].internal_storage) y[2].format |= 2;
    return 0;
}
Edited to add array of structures
As others have said, your problem (a) is resolvable by using <stdint.h> and either uint32_t or uint_least32_t (if you want to worry about Burroughs mainframes which have 36-bit words). Note that MSVC does not support C99, but @DigitalRoss shows where you can obtain a suitable header to use with MSVC.
Your problem (b) is not an issue; C will type convert safely for you if it is necessary, but it probably isn't even necessary.
The area of most concern is (c) and in particular the format sub-field. There, 3 values are valid. You can handle this by allocating 3 bits and requiring that the 3-bit field is one of the values 1, 2, or 4 (any other value is invalid because of too many or too few bits set). Or you could allocate a 2-bit number, and specify that either 0 or 3 (or, if you really want to, one of 1 or 2) is invalid. The first approach uses one more bit (not currently a problem since you're only using 20 of 32 bits) but is a pure bitflag approach.
When writing function calls, there is no particular problem writing:
some_function(FORMAT_A | STORAGE_INTERNAL, ...);
This will work whether FORMAT_A is a #define or an enum (as long as you specify the enum value correctly). The called code should check whether the caller had a lapse in concentration and wrote:
some_function(FORMAT_A | FORMAT_B, ...);
But that is an internal check for the module to worry about, not a check for users of the module to worry about.
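A minimal sketch of that internal check, assuming one-hot format bits as in the 3-bit scheme above; the bit values are hypothetical. Clearing the lowest set bit with fmt & (fmt - 1) leaves zero only when at most one bit was set, so the function rejects both "no format" and FORMAT_A | FORMAT_B:

```c
#include <stdint.h>

/* Hypothetical bit assignments for the three mutually exclusive formats. */
#define FORMAT_A    UINT32_C(0x0010)
#define FORMAT_B    UINT32_C(0x0020)
#define FORMAT_C    UINT32_C(0x0040)
#define FORMAT_MASK (FORMAT_A | FORMAT_B | FORMAT_C)

/* Return 1 if exactly one format bit is set in flags, 0 otherwise. */
int format_is_valid(uint32_t flags)
{
    uint32_t fmt = flags & FORMAT_MASK;
    /* fmt & (fmt - 1) clears the lowest set bit; the result is zero
       only when fmt had at most one bit set. */
    return fmt != 0 && (fmt & (fmt - 1)) == 0;
}
```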
If people are going to be switching bits in the flags member around a lot, the macros for setting and unsetting the format field might be beneficial. Some might argue that any pure boolean fields barely need it, though (and I'd sympathize). It might be best to treat the flags member as opaque and provide 'functions' (or macros) to get or set all the fields. The less people can get wrong, the less will go wrong.
Consider whether using bit-fields works for you. My experience is that they lead to big code and not necessarily very efficient code; YMMV.
Hmmm...nothing very definitive here, so far.
I would use enums for everything because those are guaranteed to be visible in a debugger where #define values are not.
I would probably not provide macros to get or set bits, but I'm a cruel person at times.
I would provide guidance on how to set the format part of the flags field, and might provide a macro to do that.
Like this, perhaps:
enum { ..., FORMAT_A = 0x0010, FORMAT_B = 0x0020, FORMAT_C = 0x0040, ... };
enum { FORMAT_MASK = FORMAT_A | FORMAT_B | FORMAT_C };
#define SET_FORMAT(flag, newval) (((flag) & ~FORMAT_MASK) | (newval))
#define GET_FORMAT(flag) ((flag) & FORMAT_MASK)
SET_FORMAT is safe if used accurately but horrid if abused. One advantage of the macros is that you could replace them with a function that validated things thoroughly if necessary; this works well if people use the macros consistently.
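A short usage sketch of the two macros above, assuming 32-bit flags and the format values shown (STORAGE_INTERNAL's value and switch_to_format_b are hypothetical). Note that SET_FORMAT yields a new value rather than modifying its argument, so the result has to be assigned back:

```c
#include <stdint.h>

/* Format bits as in the answer above; other flag values are omitted. */
enum { FORMAT_A = 0x0010, FORMAT_B = 0x0020, FORMAT_C = 0x0040 };
enum { FORMAT_MASK = FORMAT_A | FORMAT_B | FORMAT_C };

#define SET_FORMAT(flag, newval) (((flag) & ~FORMAT_MASK) | (newval))
#define GET_FORMAT(flag)         ((flag) & FORMAT_MASK)

/* Hypothetical storage bit, kept outside the format field. */
enum { STORAGE_INTERNAL = 0x0001 };

/* Replace whatever format is set with FORMAT_B, keeping all other bits. */
uint32_t switch_to_format_b(uint32_t flags)
{
    return (uint32_t)SET_FORMAT(flags, FORMAT_B);
}
```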
For question (a), if you are using C99 (you probably are), you can use the uint32_t type declared in the <stdint.h> header.
Regarding (c): if your enumerations are defined correctly you should be able to pass them as arguments without a problem. A few things to consider:
Enumeration storage is often compiler-specific, so depending on what kind of development you are doing (you don't mention whether it's Windows vs. Linux vs. embedded vs. embedded Linux :) ) you may want to check your compiler's options for enum storage to make sure there are no issues there. I generally agree with the consensus above that the compiler should convert your enumerations appropriately, but it's something to be aware of.
If you are doing embedded work, many static quality-checking tools such as PC-lint will "bark" if you start getting too clever with enums, #defines, and bit-fields. If your development will need to pass through any quality gates, this might be something to keep in mind. In fact, some automotive standards (such as MISRA-C) get downright irritable if you try to get tricky with bit-fields.
"I have just discovered the joy of bitflags." I agree with you - I find them very useful.
I added comments to each answer above, and I think I have some clarity now. It seems enums are cleaner, as they show up in the debugger and keep fields separate, and macros can be used for setting and getting values.
I have also read that enums are stored as small integers, which as I understand it is not a problem for the boolean tests, since these would be performed starting at the rightmost bits. But can enums be used to store large integers (1 << 21)?
thanks again to you all. I have already learned more than I did two days ago!!
~Russ