In an old program I serialized a data structure to bytes by allocating an array of unsigned char, and then stored ints into it with:
*((int*)p) = value;
(where p is the unsigned char*, and value is the value to be stored).
This worked fine, except when compiled on SPARC, where it triggered exceptions for accessing memory with improper alignment. That made perfect sense: the data elements had varying sizes, so p quickly became unaligned, and it triggered the error when used to store an int value, since the underlying SPARC instructions require aligned access.
This was quickly fixed by writing the value into the char array byte by byte. But I'm a bit concerned, because I've used this construction in many programs over the years without issue. Clearly I'm violating some C rule (strict aliasing?), and while this case was easy to discover, such violations might cause other, subtler kinds of undefined behavior under optimizing compilers. I'm also a bit puzzled because I believe I've seen constructions like this in a lot of C code over the years; think of hardware drivers that describe the data structures exchanged with the hardware as structs (using pack(1), of course) and write those to h/w registers. So it seems to be a common technique.
So my question is: what rule exactly was violated by the above, and what would be the proper C way to realize the use case (i.e. serializing data to an array of unsigned char)? Of course, custom serialization functions can be written for all the types involved to write the data out byte by byte, but that sounds cumbersome and not very efficient.
Finally, can ill effects (outside of alignment problems etc.) in general be expected through violation of this aliasing rule?
Yes, your code violates the strict aliasing rule. In C, only char* and its signed and unsigned counterparts may alias objects of other types.
So the proper way to do such raw serialization is to create an array of ints, and then treat it as an unsigned char buffer:
int arr[] = { 1, 2, 3, 4, 5 };
unsigned char* rawData = (unsigned char*)arr;
You can memcpy, fwrite, or do other serialization of rawData, and it is absolutely valid.
Deserialization code may look like this:
int* arr = (int*)calloc(5, sizeof(int));
memcpy(arr, rawData, 5 * sizeof(int));
Of course, you have to take care of endianness, padding and other issues to implement reliable serialization.
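If the serialized bytes must also be read on machines with a different byte order, one way is to pick a fixed wire order and encode each value explicitly. A minimal sketch, assuming 32-bit values and a little-endian wire format (the helper names put_le32/get_le32 are mine):

#include <stdint.h>

/* Write v into p in little-endian order, byte by byte, so neither
   the host's endianness nor the alignment of p matters. */
static void put_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFFu);
    p[1] = (unsigned char)((v >> 8) & 0xFFu);
    p[2] = (unsigned char)((v >> 16) & 0xFFu);
    p[3] = (unsigned char)((v >> 24) & 0xFFu);
}

static uint32_t get_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}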
It is compiler- and platform-specific how a struct is represented (laid out) in memory, and whether or not the start address of a struct is aligned to a 1-, 2-, 4-, 8-, ... byte boundary. Therefore, you should not make any assumptions about the layout of your struct's members.
On platforms where your member types require specific alignment, padding bytes are added to the struct (which is just another way of saying that sizeof(struct Foo) >= the sum of the sizes of its data members).
Now, if you fwrite() or memcpy() a struct from one instance to another on the same machine, with the same compiler and settings (e.g. within the same program of yours), you will write both the data content and the padding bytes added by the compiler. As long as you handle the whole struct, you can successfully round-trip (at least as long as there are no pointer members inside the struct).
What you cannot assume is that you can cast a pointer to a smaller type (e.g. unsigned char) to a pointer to a "larger type" (e.g. unsigned int) and access the memory through it, because unsigned int might require proper alignment on the target platform. Usually, if you get that wrong, you see bus errors or the like.
malloc() in the most general case is the generic way to get heap memory for any type of data, be it a byte array or some struct, independent of its alignment requirements. There is no system where you cannot write struct Foo *ps = malloc(sizeof(struct Foo)). On platforms where alignment is vital, malloc will not return unaligned addresses, as that would break any code trying to allocate memory for a struct. And since malloc() is not psychic, it also returns "struct-compatible aligned" pointers when you use it to allocate byte arrays.
Any form of "ad hoc" serialization, like writing out the whole struct, is only a promising approach as long as you do not need to exchange the serialized data with other machines or other applications (or future versions of the same application, where someone might have tinkered with compiler settings related to alignment).
If you are looking for a portable, more reliable and robust solution, you should consider using one of the mainstream serialization packages, such as Google Protocol Buffers.
Related
struct Foo {
int a;
char b;
};
Will it be guaranteed in this case that b will have an offset of sizeof(int) in the struct? Will it be guaranteed that members will be packed together as long as all alignment requirements are being met, no padding required (Not taking into account the padding at the end to align the structures size to the largest member)?
I am asking this because I would like to know if simply fwrite()ing a struct to a save file can cause problems if the layout of a struct is not consistent across platforms, because then each save file would be specific to the platform on which it was created.
There are no guarantees. If a compiler wishes to insert unnecessary padding between structure members, it can do so. (For example, it might have determined that a particular member could be handled much more efficiently were it eight-byte aligned, even though it doesn't require alignment at all.)
Portable code intended for interoperability should not fwrite structs. Aside from the potential of differences in padding, not all platforms use the same endianness, nor the same floating point representation (if that's relevant).
Strictly speaking, the C standard makes no guarantees regarding padding inside of a struct, other than no padding at the beginning.
That being said, most implementations you're likely to come across tend to perform padding in a consistent manner (see The Lost Art of Structure Packing). In your particular example however there's a potential for inconsistency as an int is not necessarily the same size on all platforms.
If you used fixed width types such as int32_t and int8_t instead of int and char then you're probably OK. Adding a static_assert for the size of the struct can help enforce this.
You do however need to worry about endianness, converting each field to a known byte order before saving and back to the host byte order after reading back.
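A sketch of what that might look like; the record layout, its expected size, and the little-endian wire order are my own assumptions, not something from the question:

#include <stdint.h>
#include <stdio.h>

struct save_record {
    int32_t score;   /* fixed width on every platform      */
    int8_t  level;   /* trailing padding is still possible */
};

/* C11: fail the build if the layout is not what this code expects. */
_Static_assert(sizeof(struct save_record) == 8, "unexpected save_record layout");

/* Write field by field in a fixed byte order instead of fwrite()ing
   the whole struct, so padding and endianness drop out entirely. */
static void write_record(FILE *f, const struct save_record *r)
{
    unsigned char buf[5];
    uint32_t u = (uint32_t)r->score;
    buf[0] = (unsigned char)(u & 0xFFu);
    buf[1] = (unsigned char)((u >> 8) & 0xFFu);
    buf[2] = (unsigned char)((u >> 16) & 0xFFu);
    buf[3] = (unsigned char)((u >> 24) & 0xFFu);
    buf[4] = (unsigned char)r->level;
    fwrite(buf, 1, sizeof buf, f);
}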
We know that there is padding in some structures in C. Please consider the following two:
struct node1 {
int a;
int b;
char c;
};
struct node2 {
int a;
char c;
int b;
};
Assuming sizeof(int) = alignof(int) = 4 bytes:
sizeof(node1) = sizeof(node2) = 12, due to padding.
What is the performance difference between the two? (if any, w.r.t. the compiler or the architecture of the system, especially with GCC)
These are bad examples - in this case it doesn't matter, since the amount of padding will be the same in either case. There will not be any performance differences.
The compiler will always strive to fill up trailing padding at the end of a struct, since otherwise arrays of structs wouldn't be feasible: the first member must always be aligned. Without the trailing padding in some item struct_array[0], the first member of struct_array[1] would end up misaligned.
The order would matter if we were to do this though:
struct node3 {
int a;
char b;
int c;
char d;
};
Assuming 4 byte int and 4 byte alignment, then b occupies 1+3 bytes here, and d an additional 1+3 bytes. This could have been written better if the two char members were placed adjacently, in which case the total amount of padding would just have been 2 bytes.
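For comparison, here is the rearranged version (the name node3_packed is mine; sizes assume the same 4-byte int and alignment):

struct node3_packed {
    int  a;
    int  c;
    char b;   /* the two chars now share one 4-byte slot   */
    char d;   /* 2 bytes of trailing padding instead of 6  */
};
/* sizeof(struct node3) == 16, sizeof(struct node3_packed) == 12 */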
I would not be surprised if the interviewer's opinion was based on the old argument of backward compatibility when extending the struct in the future. Additional fields (char, smallint) may benefit from the space occupied by the trailing padding, without the risk of affecting the memory offset of the existing fields.
In most cases, it's a moot point. The approach itself is likely to break compatibility, for two reasons:
Starting the extensions on a new alignment boundary (as would happen to node2) may not be memory-optimal, but it might well prevent the new fields from accidentally being overwritten by the padding of a 'legacy' struct.
When compatibility is that much of an issue (e.g. when persisting or transferring data), then it makes more sense to serialize/deserialize (even if binary is a requirement) than to depend on a binary format that varies per architecture, per compiler, even per compiler option.
OK, I might be completely off the mark here since this is a bit out of my league. If so, please correct me. But this is how I see it:
First of all, why do we need padding and alignment at all? It's just wasted bytes, isn't it? Well, it turns out that processors like it. That is, if you issue an instruction to the CPU that operates on a 32-bit integer, the CPU will demand that this integer reside at a memory address divisible by 4. A 64-bit integer will need to reside at an address divisible by 8. And so on. This is done to make the CPU design simpler and faster.
If you violate this requirement (aka "unaligned memory access"), most CPUs will raise an exception. x86 is actually an oddity because it will still perform the operation; however, it can take more than twice as long, because it fetches the value from memory in two passes rather than one and then does bitwise magic to stitch the value together from these separate accesses.
So this is the reason why compilers add padding to structs: so that all the members are properly aligned and the CPU can access them quickly (or at all). Well, that's assuming the struct itself is located at a proper memory address, but the compiler will take care of that too, as long as you stick to the standard ways of allocating memory.
But it is possible to explicitly tell the compiler that you want a different alignment too. For example, if you want to use your struct to read in a bunch of data from a tightly packed file, you could explicitly set the padding to 1. In that case the compiler will also have to emit extra instructions to compensate for potential misalignment.
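For example, a sketch using the pack pragma that GCC, Clang and MSVC all accept (it is not part of standard C, and the struct here is a made-up example):

#include <stdint.h>

#pragma pack(push, 1)          /* no padding inside this struct */
struct file_header {
    char     magic[2];
    uint32_t length;           /* may now sit at an odd address;
                                  the compiler emits compensating code */
};
#pragma pack(pop)
/* sizeof(struct file_header) == 6 instead of 8 */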
TL;DR - wrong alignment makes everything slower (or under certain conditions can crash your program entirely).
However, this doesn't answer the question "where is it better to put the padding?" Padding is needed, yes, but where? Directly, it doesn't make much difference; however, by rearranging your members carefully you can reduce the size of the entire struct, and less memory used usually means a faster program. Especially if you create large arrays of these structs, using less memory means fewer memory accesses and more efficient use of the CPU cache.
In your example however I don't think there's any difference.
P.S. Why does your struct end with padding? Because of arrays. The compiler wants to make sure that if you allocate an array of these structs, they will all be properly aligned, because array elements don't have any padding between them.
What is the performance difference between the two?
The performance difference is "indeterminable". For most cases it won't make any difference.
For cases where it does make a difference; either version might be faster, depending on how the structure is used. For one example, if you have a large array of these structures and frequently select a structure in the array "randomly"; then if you only access a and b of the randomly selected structure the first version can be faster (because a and b are more likely to be in the same cache line), and if you only access a and c then the second version can be faster.
I'm currently working on a project to build a small compiler just for the heck of it.
I've decided to take the approach of building an extremely simple virtual machine to target, so I don't have to worry about learning the ins and outs of ELF, Intel assembly, etc.
My question is about type punning in C using unions. I've decided to only support 32 bit integers and 32 bit float values in the vm's memory. To facilitate this, the "main memory" of the vm is set up like this:
typedef union
{
int i;
float f;
} word;
memory = (word *)malloc(mem_size * sizeof(word));
So I can then just treat the memory section as either an int or a float depending on the instruction.
Is this technically type punning? It certainly would be if I used ints as the words of memory and then treated them as floats through a float*. My current approach, while syntactically different, doesn't seem semantically different to me: in the end I'm still treating 32 bits in memory as either an int or a float.
The only information I could find online suggests that this is implementation-dependent. Is there a more portable way to achieve this without wasting a bunch of space?
I could do the following, but then I would be taking up more than 2 times as much memory and "reinventing the wheel" with respect to unions.
typedef struct
{
int i;
float f;
char is_int;
} tagged_word;
Edit
I perhaps didn't make my exact question clear. I am aware that I can use either a float or an int from a union without undefined behavior. What I'm after is specifically a way to have a 32 bit memory location that I can safely use as an int or float without knowing what the last value set was. I want to account for the situation where the other type is used.
Yes, storing one member of union and reading another is type punning (assuming the types are sufficiently different). Moreover, this is the only kind of universal (any type to any type) type punning that is officially supported by C language. It is supported in a sense that the language promises that in this case the type punning will actually occur, i.e. that a physical attempt to read an object of one type as an object of another type will take place. Among other things it means that writing one member of the union and reading another member implies a data dependency between the write and the read. This, however, still leaves you with the burden of ensuring that the type punning does not produce a trap representation.
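For illustration, a minimal sketch of such union-based punning (using int32_t to pin the width; the question's union used plain int, and valid-float bits are assumed):

#include <stdint.h>

typedef union {
    int32_t i;
    float   f;
} word32;

float int_bits_as_float(int32_t i)
{
    word32 w;
    w.i = i;      /* store through one member ...             */
    return w.f;   /* ... read through the other: type punning */
}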
When you use casted pointers for type punning (what is usually understood as "classic" type punning), the language explicitly states that in the general case the behavior is undefined (aside from reinterpreting an object's value as an array of chars and other restricted cases). Compilers like GCC implement so-called "strict aliasing semantics", which basically means that pointer-based type punning might not work as you expect it to. For example, the compiler might (and will) ignore the data dependency between type-punned reads and writes and rearrange them arbitrarily, thus completely ruining your intent. This
int i;
float f;
i = 5;
f = *(float *) &i;
can easily be rearranged into
f = *(float *) &i;
i = 5;
specifically because a strict-aliased compiler deliberately ignores the possibility of data dependency between the write and the read in the example.
In a modern C compiler, when you really need to perform a physical reinterpretation of one object's value as a value of another type, you are restricted to either memcpy-ing bytes from one object to another or union-based type punning. There are no other ways; casting pointers is no longer a viable option.
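The memcpy route, for comparison, is a one-liner (a sketch; it assumes sizeof(int) == sizeof(float), and modern compilers typically optimize the call down to a register move):

#include <string.h>

float float_from_int_bits(int i)
{
    float f;
    /* Copy the object representation; no aliasing rule is violated. */
    memcpy(&f, &i, sizeof f);
    return f;
}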
As long as you only access the member (int or float) which was most recently stored, there's no problem and no real implementation dependency. It's perfectly safe and well-defined to store a value in a union member and then read that same member.
(Note that there's no guarantee that int and float are the same size, though they are on every system I've seen.)
If you store a value in one member and then read the other, that's type punning. Quoting a footnote in the latest C11 draft:
If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.
In the embedded software domain, when copying structures of the same type, people often don't use direct assignment but instead use the memcpy() function or copy each element.
Take, for example:
struct tag
{
int a;
int b;
};
struct tag exmple1 = {10,20};
struct tag exmple2;
To copy exmple1 into exmple2, instead of writing the direct assignment
exmple2=exmple1;
people use
memcpy(&exmple2, &exmple1, sizeof(struct tag));
or
exmple2.a=exmple1.a;
exmple2.b=exmple1.b;
why ????
One way or the other, there is nothing specific about embedded systems that makes this dangerous; the language semantics are identical for all platforms.
C has been used in embedded systems for many years, and early C compilers, before ANSI/ISO standardisation did not support direct structure assignment. Many practitioners are either from that era, or have been taught by those that were, or are using legacy code written by such practitioners. This is probably the root of the doubt, but it is not a problem on an ISO compliant implementation. On some very resource constrained targets, the available compiler may not be fully ISO compliant for a number of reasons, but I doubt that this feature would be affected.
One issue (that applies to embedded and non-embedded alike), is that when assigning a structure, an implementation need not duplicate the value of any undefined padding bits, therefore if you performed a structure assignment, and then performed a memcmp() rather than member-by-member comparison to test for equality, there is no guarantee that they will be equal. However if you perform a memcpy(), any padding bits will be copied so that memcmp() and member-by-member comparison will yield equality.
So it is arguably safer to use memcpy() in all cases (not just embedded), but the improvement is marginal, and not conducive to readability. It would be a strange implementation that did not use the simplest method of structure assignment, and that is a simple memcpy(), so it is unlikely that the theoretical mismatch would occur.
In your given code there is no problem even if you write:
example2 = example1;
But just assume if in future, the struct definition changes to:
struct tag
{
int a[1000];
int b;
};
Now if you execute the assignment operator as above, then some compilers might inline the code for byte-by-byte (or int-by-int) copying, i.e.
example2.a[0] = example1.a[0];
example2.a[1] = example1.a[1];
example2.a[2] = example1.a[2];
...
which will result in code bloat in your code segment. This kind of code-size problem is not trivial to find. That's why people use memcpy.
[However, I have heard that modern compilers are capable enough to use memcpy internally when such instruction is encountered especially for PODs.]
Copying C structures via memcpy() is often done by programmers who learned C decades ago and did not follow the standardization process since. They simply don't know that C supports assignment of structures (direct structure assignment was not available in all pre-ANSI C89 compilers).
When they learn about this feature, some still stick to the memcpy() way because it is their custom. There are also motivations that originate in cargo-cult programming, e.g. it is claimed that memcpy is just faster - of course, without backing this up with a benchmark.
Structures are also memcpy()ed by some newbie programmers because they either confuse structure assignment with the assignment of a pointer to a structure - or they simply overuse memcpy() (they often also use memcpy() where strcpy() would be more appropriate).
There is also the memcmp() structure-comparison anti-pattern that some programmers cite as a reason for using memcpy() instead of structure assignment. The reasoning is the following: since C does not automatically generate an == operator for structures, and writing a custom structure comparison function is tedious, memcmp() is used to compare structures. In the next step - to avoid differences in the padding bits of compared structures - memset(..., 0, ...) is used to initialize all structures (instead of using the C99 initializer syntax or initializing all fields separately), and memcpy() is used to copy them, because memcpy() also copies the contents of the padding bits.
But note that this reasoning is flawed for several reasons:
the use of memcpy()/memcmp()/memset() introduces new error possibilities - e.g. supplying a wrong size
when the structure contains integer fields the ordering under memcmp() changes between big- and little-endian architectures
a char array field of size n that is 0-terminated at position x must also have all elements after position x zeroed out at all times - otherwise two otherwise-equal structs compare unequal
assignment from a register to a field may also set the neighbouring padding bits to nonzero values; thus, subsequent comparisons with otherwise equal structures yield an unequal result
The last point is best illustrated with a small example (assuming architecture X):
#include <assert.h>
#include <string.h>

struct S {
int a; // on X: sizeof(int) == 4
char b; // on X: 24 padding bits are inserted after b
int c;
};
typedef struct S S;
S s1;
memset(&s1, 0, sizeof(S));
s1.a = 0;
s1.b = 'a';
s1.c = 0;
S s2;
memcpy(&s2, &s1, sizeof(S));
assert(memcmp(&s1, &s2, sizeof(S)) == 0); // assertion is always true
s2.b = 'x';
assert(memcmp(&s1, &s2, sizeof(S)) != 0); // assertion is always true
// some computation
char x = 'x'; // on X: 'x' is stored in a 32 bit register
// as least significant byte
// the other bytes contain previous data
s1.b = x; // the complete register is copied
// i.e. the higher 3 register bytes are the new
// padding bits in s1
assert(memcmp(&s1, &s2, sizeof(S)) == 0); // assertion is not always true
The failure of the last assertion may depend on code reordering, change of the compiler, change of compiler options and stuff like that.
Conclusion
As a general rule: to increase code correctness and portability use direct struct assignment (instead of memcpy()), C99 struct initialization syntax (instead of memset) and a custom comparison function (instead of memcmp()).
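For the last recommendation, a custom comparison for the struct S above could be as simple as this sketch:

/* Member-by-member comparison: padding bits never take part. */
int S_equal(const S *x, const S *y)
{
    return x->a == y->a && x->b == y->b && x->c == y->c;
}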
In C, people probably do that because they think memcpy would be faster. But I don't think that is true; compiler optimizations take care of that.
In C++ it may also have different semantics because of user defined assignment operator and copy constructors.
On top of what the others wrote some additional points:
Using memcpy instead of a simple assignment gives a hint to someone who maintains the code that the operation might be expensive. Using memcpy in these cases improves the understanding of the code.
Embedded systems are often written with portability and performance in mind. Portability is important because you may want to re-use your code even if the CPU in the original design is not available or if a cheaper micro-controller can do the same job.
These days low-end micro-controllers come and go faster than the compiler developers can catch up, so it is not uncommon to work with compilers that use a simple byte-copy loop instead of something optimized for structure assignments. With the move to 32 bit ARM cores this is not true for a large part of embedded developers. There are however a lot of people out there who build products that target obscure 8 and 16 bit micro-controllers.
A memcpy tuned for a specific platform may be faster than what a compiler can generate. For example, on embedded platforms, keeping structures in flash memory is common. Reading from flash is not as slow as writing to it, but it is still a lot slower than an ordinary copy from RAM to RAM. An optimized memcpy function may use DMA or special features of the flash controller to speed up the copy process.
That is complete nonsense. Use whichever way you prefer. The simplest is:
exmple2=exmple1;
Whatever you do, don't do this:
exmple2.a=exmple1.a;
exmple2.b=exmple1.b;
It poses a maintainability problem because any time that anyone adds a member to the structure, they have to add a line of code to do the copy of that member. Someone is going to forget to do that and it will cause a hard to find bug.
On some implementations, the way in which memcpy() is performed may differ from the way in which "normal" structure assignment would be performed, in a manner that may be important in some narrow contexts. For example, one or the other structure operand may be unaligned and the compiler might not know about it (e.g. one memory region might have external linkage and be defined in a module written in a different language that has no means of enforcing alignment). Use of a __packed declaration would be better if a compiler supported such, but not all compilers do.
Another reason for using something other than structure assignment could be that a particular implementation's memcpy might access its operands in a sequence that would work correctly with certain kinds of volatile source or destination, while that implementation's struct assignment might use a different sequence that wouldn't work. This is generally not a good reason to use memcpy, however, since aside from the alignment issue (which memcpy is required to handle correctly in any case) the specifications for memcpy don't promise much about how the operation will be performed. It would be better to use a specially-written routine which performs the operations exactly as required (for example, if the target is a piece of hardware which needs to have 4 bytes of structure data written using four 8-bit writes rather than a single 32-bit write, one should write a routine which does that, rather than hoping that no future version of memcpy decides to "optimize" the operation).
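Such a specially-written routine might look like this sketch (the byte-at-a-time device requirement is the hypothetical one from above):

#include <stddef.h>

/* Write n bytes of structure data to a device using separate
   8-bit bus cycles, in order; memcpy() promises none of this. */
static void write_reg_bytes(volatile unsigned char *dst,
                            const void *src, size_t n)
{
    const unsigned char *p = src;
    while (n--)
        *dst++ = *p++;   /* exactly one 8-bit write per byte */
}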
A third reason for using memcpy in some cases is that compilers will often perform small structure assignments using a direct sequence of loads and stores, rather than using a library routine. On some controllers, the amount of code this requires may vary depending upon where the structures are located in memory, to the point that the load/store sequence may end up being bigger than a memcpy call. For example, on a PICmicro controller with 1K words of code space and 192 bytes of RAM, copying a 4-byte structure from bank 1 to bank 0 would take 16 instructions. A memcpy call would take eight or nine (depending upon whether count is an unsigned char or an int [with only 192 bytes of RAM total, unsigned char should be more than sufficient!]). Note, however, that calling a memcpy-ish routine which assumed a hard-coded size and required both operands to be in RAM rather than code space would only require five instructions to call, and that could be reduced to four with the use of a global variable.
The first version is perfect.
The second one may be used for speed (though there is no reason for it at your struct's size).
The third one is used only if the padding differs between target and source.
Is it a good practice to group all the variables of same type together while declaring in local scope within a function? If yes, why? Does it solve memory alignment issues?
I think it mattered with the VAX C compiler I used 20 years ago, but not with any modern compiler. It is not safe to assume that local variables will be in any particular order, certainly not safe to assume they will be in the order you declared them. I have definitely seen the MSVC compiler reorder them.
Grouping variables of the same type does help when they are fields of a struct, because the ordering of fields of a struct is guaranteed to match the order of declaration.
It depends on the compiler; i.e. the compiler will layout memory as it sees fit. So other than being good style, it has no effect (at least in any modern compilers I've used).
In general it will not help for local variables. There are optimization rules which can be applied by the compiler, and additional "pragma" directives that can be used to manipulate the alignment.
It will not solve alignment issues, because there shouldn't be any: the compiler will lay out your local variables correctly aligned.
The only issue that grouping like-aligned types might have is to reduce use of the stack, but compilers are free to reorder the layout of variables on the stack anyway (or even reuse locations for different local variables at different times, or to keep locals in registers and not ever have them on the stack), so you're generally not buying anything for an optimized compile.
If you're going to be 'type punning' items on the stack, you'll need to use the same methods for alignment safety that you'd use for data off the stack - maybe more, since memory allocated by malloc() or new is guaranteed to be appropriately aligned for any type - that guarantee is not made for storage allocated to automatic variables.
'Type punning' is when you circumvent the type system, such as by accessing the bytes in a char array as an int by casting a char* to an int*:
char data[4];
fill_data(data, sizeof(data));
int x = *(int*) data;
Since the alignment requirement of a char[] might be different from that of an int, the above access of data through an int* might not be 'alignment-safe'. However, since malloc() is specified to return pointers suitably aligned for any type, the following should not have any alignment problems:
int x;
char* pData = malloc(4);
if (!pData) exit(-1);
fill_data( pData, 4);
x = *(int*) pData;
However, note that sizeof(int) might not be 4, and int types might be little- or big-endian, so there are still portability issues with the above code - just not alignment issues. There are other ways of performing type punning, including accessing data through different members of a union, but those may have their own portability issues, notably that reading a member other than the one last written is unspecified behavior.
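As an aside, when the source's alignment cannot be guaranteed (as with the automatic char array in the first snippet), memcpy sidesteps the alignment question entirely:

#include <string.h>

int x;
char data[4];
fill_data(data, sizeof(data));
memcpy(&x, data, sizeof x);   /* safe regardless of data's alignment */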
Padding and alignment issues only matter for structs, not local variables, because the compiler can put local variables in whatever order it wants. As for why it matters in structs -
Many C compilers will align struct members by inserting padding bytes between them. For example, if you have a struct S { int a; char b; int c; char d; int e; }, and the target hardware requires that ints be aligned on 4-byte boundaries, then the compiler will insert three bytes of padding between b and c and between d and e, wasting 6 bytes of memory per instance. On the other hand, if the members were in the order a, c, e, b, d, then it will insert two bytes of padding at the end (so that the size of S as a whole is a multiple of 4 and the members stay properly aligned in arrays), wasting only 2 bytes per instance.

The rules are very much platform- and compiler-specific; some compilers may rearrange members to avoid padding, and some have extensions to control the padding and alignment rules in case you need binary compatibility. In general, you should only care about alignment if you're either reading/writing structs directly and depending on them having the same layout (which is usually a bad idea), or you expect to have lots of instances and memory is at a premium.
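To put numbers on that example (the struct names are mine; sizes assume 4-byte ints and 4-byte alignment):

struct S_padded    { int a; char b; int c; char d; int e; }; /* sizeof == 20 */
struct S_reordered { int a; int c; int e; char b; char d; }; /* sizeof == 16 */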