Structure Alignment Attribute when Wrapping C from Ada - c

When is the Ada type attribute Alignment needed when wrapping C structures from Ada?
Our typical wrapper structure looks like
type T is record
   a : aliased Interfaces.C.unsigned_char;
   b : aliased Interfaces.C.double;
end record;
Now, when/where is
for T'Alignment use 8;
needed?
And does this depend on the target architecture?

Ada 2012 LRM's definition of 'Alignment.
When 'Alignment is associated with a type, the address of objects of that type must be evenly divisible by the alignment value.
So in your definition, an object of type T could be explicitly placed, or will be automatically allocated by the compiler, at address 800, but not 804.
This becomes relevant when certain data types must obey alignment constraints, such as doubles starting on a double-word (8-byte) boundary. (This depends on the target architecture: some impose such constraints, others do not.) Likewise, some architectures allow multi-byte values to start on odd addresses ('Alignment use 1), but most do not.
This issue is most likely to arise in situations such as you describe where you need to define an Ada layout matching an externally defined one. Specifying 'Alignment can ensure that objects and record components get properly laid out to match the external source.
Often though, especially when interfacing to C, simply applying the Convention aspect or pragma to the corresponding type definitions will ensure that the Ada layout automatically matches the C layout, leaving all the detailed aligning and padding to the compiler.
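To see what layout the Ada side has to match, it can help to ask the C compiler directly. The following is only a sketch: struct T is a hypothetical C counterpart of the Ada record in the question, and the helper names are made up for illustration. On many 64-bit ABIs it reports an offset of 8 for b and an alignment of 8 for the struct, but those numbers depend on the target and are not guaranteed.

```c
#include <stddef.h>   /* offsetof, size_t */

/* Hypothetical C counterpart of the Ada record T from the question. */
struct T {
    unsigned char a;
    double        b;
};

/* Offset of b inside struct T, as chosen by this C compiler and ABI. */
size_t t_offset_of_b(void) { return offsetof(struct T, b); }

/* Alignment this C compiler requires for objects of struct T (C11). */
size_t t_alignment(void) { return _Alignof(struct T); }

/* Total size of struct T, including internal and trailing padding. */
size_t t_size(void) { return sizeof(struct T); }
```

If the Ada compiler is given Convention => C for the record, the values it picks should agree with these. If they are specified by hand instead, these are the numbers to put into the 'Alignment and record representation clauses for that particular target.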

The alignment attribute's purpose is not to support C compatibility. It is chiefly there to provide compatibility with hardware that has special alignment restrictions for its data (including some data for some CPUs, most typically floats and the FPU).
That being said, technically I suppose it could be useful for C compatibility in situations where you know the alignment rules used by the C compiler you are trying to make your Ada code compatible with.
In this particular case, I suspect the compiler would make sure all objects of type T get laid out at a quadword boundary. However, the compiler might then choose to put seven bytes of fill in the middle of your T object to ensure that the double is also put on a quadword boundary. You can't really say without forcing its hand with a record representation clause as well. If you care how the internals of a record structure are laid out (e.g. you want to ensure it matches some C layout), you probably ought to be explicit about it.

Related

Why doesn't the standard require that struct members be padded minimally?

The standard doesn't seem to impose any padding requirements on struct members, even though it does prohibit reordering (6.7.2.1p6). How likely is it that a C platform will not pad minimally, i.e., not add only the minimum amount of padding needed to make sure the next member (or instance of the same struct, if this is the last member) is sufficiently aligned for its type?
Is it even sensible of the standard not to require that padding be minimal?
I'm asking because this lack of a padding guarantee seems to prevent me from portably representing serialized objects as structs (even if I limit myself to just uint8_t arrays as members, compilers seem to be allowed to add padding in between them), and I'm finding it a little weird to have to resort to offset arithmetic there.
How likely is it that a C platform will not pad minimally, i.e., not add only the minimum amount of padding needed to make sure the next member (or instance of the same struct, if this is the last member) is sufficiently aligned for its type?
Essentially, the "extra" padding may allow significant compiler optimizations.
Unfortunately, I don't know if any compilers actually do that (and therefore cannot provide any estimate on its likelihood of occurring).
As a simple example, consider a 32-bit or 64-bit architecture, where the ABI states that string literals and character arrays are aligned to 32-bit or 64-bit boundary. Many of the C library functions are (also) implemented by the C compiler itself; see e.g. these lists for GCC. The compiler can track the parameters to see if they refer to a string literal or (the beginning of a) character array, and if so, replace e.g. strcmp() with an optimized built-in version (which does the comparison in 32-bit units, rather than char-at-a-time).
As a more complicated example, consider a RISC hardware architecture, where unaligned byte access is slower than aligned native word access. (For example, the former may be implemented in hardware as the latter, followed by a bit shift.) Such an architecture could have an ABI that requires all structure members to be word-aligned. Then, the C compiler would be required to add more-than-minimal padding.
Traditionally, the C standards committee has been very careful to not exclude any kind of hardware architecture from correctly implementing the language.
Is it even sensible of the standard not to require that padding be minimal?
The purpose of the C standard used to be to ensure that C code would behave in the same manner if compiled with different compilers, and to allow implementation of the language on any sufficiently capable hardware architecture. In that sense, it is very sensible for the standard not to require minimal padding, as some ABIs may require more than minimal padding for whatever reason.
With the introduction of the Microsoft "extensions", the purpose of the C standard has shifted significantly, to binding C to C++ to ensure a C++ compiler can compile C code with minimal differences to C++ compilation, and to provide interfaces that can be marketed as "safer" with the actual purpose of balkanizing developers and binding them to a single vendor implementation. Because this is contrary to the previous purpose of the standard, and it is clearly non-sensible to standardize single-vendor functions like fscanf_s() while not standardizing multi-vendor functions like getline(), it may not be possible to define what sensible means anymore in the context of the C standard. It definitely does not match "good judgment"; it probably now refers to "being perceptible by the senses".
I'm asking because this lack of a padding guarantee seems to prevent me from portably representing serialized objects as structs
You are making the same mistake C programmers make, over and over again. Structs are not suitable for representing serialized objects. You should not use a struct to represent a network object, or a file header, because of the C struct rules.
Instead, you should use a simple character buffer, and either accessor functions (to extract or pack each member or field from the buffer), or conversion functions (to convert the buffer contents to a struct and vice versa).
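A minimal sketch of the accessor approach, assuming a made-up record with an 8-bit kind followed by a 32-bit length stored most significant byte first. The names put_u32_be, get_u32_be, header_pack and header_unpack are hypothetical, as is the 5-byte wire format itself.

```c
#include <stdint.h>

/* Store a 32-bit value into buf[0..3], most significant byte first. */
void put_u32_be(unsigned char *buf, uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
}

/* Read a 32-bit value from buf[0..3], most significant byte first. */
uint32_t get_u32_be(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}

/* In-memory form; its padding never reaches the wire. */
struct header {
    uint8_t  kind;
    uint32_t length;
};

enum { HEADER_WIRE_SIZE = 5 };   /* 1 byte kind + 4 bytes length */

/* Convert the struct to its fixed 5-byte wire representation. */
void header_pack(unsigned char out[HEADER_WIRE_SIZE], const struct header *h)
{
    out[0] = h->kind;
    put_u32_be(out + 1, h->length);
}

/* Convert the 5-byte wire representation back to the struct. */
void header_unpack(struct header *h, const unsigned char in[HEADER_WIRE_SIZE])
{
    h->kind   = in[0];
    h->length = get_u32_be(in + 1);
}
```

Because every byte position is written explicitly, the wire format is the same regardless of the compiler's padding or the host's byte order.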
The underlying reason why even experienced programmers like the asker still would prefer to use a struct instead, is that the accessors/conversion involves a lot of extra code; having the compiler do it instead would be much better: less code, simpler code, easier to maintain.
And I agree. It would even be quite straightforward if a new keyword, say serialized_struct, were introduced to declare a serialized data structure with completely different member rules from traditional C structs. (Note that this support would not affect e.g. linking at all, so it really is not as complicated as one might think.) Additional attributes or keywords could be used to specify explicit byte order, and the compiler would do all the conversion details for us, in whatever way the compiler sees best for the specific architecture it compiles for. This support would only be available for new code, but it would be hugely beneficial in cutting down on interoperability issues -- and it would make a lot of serialization code simpler!
Unfortunately, when you combine the C standard committee's traditional dislike to adding new keywords, and the overall direction change from interoperability to vendor lock-in, there is no chance at all for anything like this to be included in the C standard.
Of course, as described in the comments, there are lots of C libraries that implement one serialization scheme or other. I've even written a few myself (for rather peculiar use cases, though). A sensible approach (poor pun intended) would be to pick a vibrant one (well maintained, with a lively community around the library), and use it.

Are there reasons to avoid bit-field structure members?

I long knew there are bit-fields in C and occasionally I use them for defining densely packed structs:
typedef struct Message_s {
unsigned int flag : 1;
unsigned int channel : 4;
unsigned int signal : 11;
} Message;
When I read open source code, I instead often find bit-masks and bit-shifting operations used to store and retrieve such information in hand-rolled bit-fields. This is so common that I doubt the authors were simply unaware of the bit-field syntax, so I wonder if there are reasons to roll your own bit-fields via bit-masks and shifting operations instead of relying on the compiler to generate code for getting and setting such bit-fields.
Why do other programmers use hand-coded bit manipulations instead of bitfields to pack multiple fields into a single word?
This answer is opinion based as the question is quite open:
Many programmers are unaware of the availability of bitfields or unsure about their portability and precise semantics. Some even distrust the compiler's ability to produce correct code. They prefer to write explicit code that they understand.
As commented by Cornstalks, this attitude is rooted in real life experience as explained in this article.
Bitfield's actual memory layout is implementation defined: if the memory layout must follow a precise specification, bitfields should not be used and hand-coded bit manipulations may be required.
The handling of signed values in signed-typed bitfields is implementation defined. If signed values are packed into a range of bits, it may be more reliable to hand-code the access functions.
Are there reasons to avoid bitfield-structs?
bitfield-structs come with some limitations:
Bit fields result in non-portable code. Also, the bit field length has a high dependency on word size.
Reading (using scanf()) and using pointers on bit fields is not possible due to non-addressability.
Bit fields are used to pack more variables into a smaller data space, but cause the compiler to generate additional code to manipulate these variables. This results in an increase in both space as well as time complexities.
The sizeof() operator cannot be applied to the bit fields, since sizeof() yields the result in bytes and not in bits.
Source
So whether you should use them or not depends. Read more in Why bit endianness is an issue in bitfields?
PS: When to use bit-fields in C?
There is no reason for it. Bitfields are useful and convenient. They are in common use in embedded projects. Some architectures (like ARM) even have special instructions to manipulate bitfields.
Just compare the code (and write the rest of the function foo1)
https://godbolt.org/g/72b3vY
In many cases, it is useful to be able to address individual groups of bits within a word, or to operate on a word as a unit. The Standard presently does not provide any practical and portable way to achieve such functionality. If code is written to use bitfields and it later becomes necessary to access multiple groups as a word, there would be no nice way to accommodate that without reworking all the code using the bit fields or disabling type-based aliasing optimizations, using type punning, and hoping everything gets laid out as expected.
Using shifts and masks may be inelegant, but until C provides a means of treating an explicitly-designated sequence of bits within one lvalue as another lvalue, it is often the best way to ensure that code will be adaptable to meet needs.
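For comparison, here is a sketch of the shift-and-mask version of the Message fields from the question (flag: 1 bit, channel: 4 bits, signal: 11 bits). The bit positions are an assumption made for this example (flag in bit 0, channel in bits 1 to 4, signal in bits 5 to 15), and the function names are hypothetical. The point is that the layout is fixed by the code, not by the compiler.

```c
#include <stdint.h>

/* Assumed layout inside one 16-bit word:
 *   bit  0      flag    (1 bit)
 *   bits 1..4   channel (4 bits)
 *   bits 5..15  signal  (11 bits)
 */
#define MSG_FLAG_MASK     0x1u
#define MSG_CHANNEL_SHIFT 1
#define MSG_CHANNEL_MASK  0xFu
#define MSG_SIGNAL_SHIFT  5
#define MSG_SIGNAL_MASK   0x7FFu

/* Pack the three fields into one word; out-of-range bits are dropped. */
uint16_t message_pack(unsigned flag, unsigned channel, unsigned signal)
{
    return (uint16_t)((flag & MSG_FLAG_MASK) |
                      ((channel & MSG_CHANNEL_MASK) << MSG_CHANNEL_SHIFT) |
                      ((signal & MSG_SIGNAL_MASK) << MSG_SIGNAL_SHIFT));
}

/* Extract the individual fields from a packed word. */
unsigned message_flag(uint16_t w)    { return w & MSG_FLAG_MASK; }
unsigned message_channel(uint16_t w) { return (w >> MSG_CHANNEL_SHIFT) & MSG_CHANNEL_MASK; }
unsigned message_signal(uint16_t w)  { return (w >> MSG_SIGNAL_SHIFT) & MSG_SIGNAL_MASK; }
```

The packed value is an ordinary uint16_t, so it can also be compared, copied or serialized as a whole word, which is the flexibility the bit-field syntax does not give you.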

Structure padding

I am learning structure padding and packing in C.
I have this doubt: I have read that padding depends on the architecture, so does it affect inter-machine communication, i.e. when data created on one machine is read on another machine?
How is this problem avoided in this scenario?
Yes, you cannot send the binary data of a structure between platforms and expect it to look the same on the other side.
The way you solve it is you create a marshaller/demarshaller for your construct and pass it through on the way out of one system, and on the way in to the other system. This lets the compiler take care of the buffering for you on each system.
Each side knows how to take the data, as you've specified it will be sent, and deal with it for the local platform.
Platforms such as java handle this for you by creating serialization mechanisms for your classes. In C, you'll need to do this for yourself. How you do it depends on how you want to send your data. You could serialize to binary, XML, or anything else.
#pragma pack is supported by most compilers that I know of. This can allow the programmer to specify their desired padding method for structs.
http://msdn.microsoft.com/en-us/library/2e70t5y1%28v=vs.80%29.aspx
http://gcc.gnu.org/onlinedocs/gcc/Structure_002dPacking-Pragmas.html
http://clang.llvm.org/docs/UsersManual.html#microsoft-extensions
In C/C++, structures are used as data packs. They don't provide any data encapsulation or data hiding features (C++ is an exception, since its structs are semantically similar to classes).
Because of the alignment requirements of various data types, every member of a structure should be naturally aligned. The members of a structure are allocated sequentially, in increasing address order.
It will only be affected if the code you have compiled for some other architecture uses a different padding scheme.
To help alleviate problems, I recommend that you pack structures with no padding. Where padding is required, put in place-holders (e.g. char reserved[2]). Also, don't use bitfields!! They are not portable.
You should also be aware of other architecture-related problems. Specifically endianness, and datatype sizes. If you need better portability, you may want to serialise and de-serialise a byte stream instead of casting it as a struct.
You can use #pragma pack(1) before the struct declaration and #pragma pack() after it to disable architecture-based padding. This solves half of the problem, because some data types are architecture-dependent too. To solve the second half, I usually use fixed-width data types such as int16_t for 16-bit integers, uint32_t for 32-bit integers, and so on.
Take a look at http://freebsd.active-venture.com/FreeBSD-srctree/newsrc/netinet/ip_icmp.h.html ; this include describe some architecture independent network data packets.
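As a rough illustration of the packing advice above, the sketch below declares the same members twice, once with default padding and once under #pragma pack(push, 1), a form accepted by GCC, Clang and MSVC. The struct and member names are made up for the example. Note that packing fixes only the padding: byte order still has to be handled separately, and taking pointers to misaligned packed members can be a problem on strict-alignment hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler may insert padding between members. */
struct plain_record {
    uint8_t  kind;
    uint32_t length;
    uint16_t checksum;
};

/* Packed layout: members are placed back to back, with no padding. */
#pragma pack(push, 1)
struct wire_record {
    uint8_t  kind;
    uint32_t length;
    uint16_t checksum;
};
#pragma pack(pop)   /* restore the previous packing for later declarations */

/* 1 + 4 + 2 bytes; the build fails if the pragma was not honoured. */
_Static_assert(sizeof(struct wire_record) == 7,
               "wire_record is expected to have no padding");

size_t plain_record_size(void) { return sizeof(struct plain_record); }
size_t wire_record_size(void)  { return sizeof(struct wire_record); }
```

On a typical desktop target plain_record_size() comes out larger than 7 because of alignment padding, but the exact value is up to the ABI.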

Questions about memory alignment in structures and portability of the sizeof operator

I have a question about structure padding and memory alignment optimizations regarding structures in the C language. I am sending a structure over the network, and I know that, for run-time optimization purposes, the members of a structure are not necessarily contiguous in memory. I've run some tests on my local computer and indeed, sizeof(my_structure) was different from the sum of the sizes of all my structure members. I did some research and found out two things:
First, the sizeof() operator retrieves the padded size of the structure (i.e the real size that would be stored in memory).
When __attribute__((__packed__)) is specified in the declaration of the structure, this optimization is disabled by the compiler, so sizeof(my_structure) will be exactly the same as the sum of the sizes of the fields of my structure.
That being said, I am wondering whether the sizeof operator returns the padded size on every compiler implementation and on every architecture. In other words, is it always safe to copy a structure with memcpy, for example, using the sizeof operator like this:
memcpy(struct_dest, struct_src, sizeof(struct_src));
I am also wondering what the real purpose of __attribute__((__packed__)) is. Is it used to send a smaller amount of data over a network when transmitting a structure, or is it in fact used to avoid some unspecified and platform-dependent behaviour of the sizeof operator?
Thanks in advance.
Different compilers on different architectures can and do use different padding. So for wire transmission it is not uncommon to pack structs to achieve a consistent binary layout. This can then cater for the code at each end of the wire running on different architecture.
However you also need to make sure that your data types are the same size if you use this approach. For example, on 64 bit systems, long is 4 bytes on Windows and 8 bytes almost everywhere else. And you also need to deal with endianness issues. The standard is to transmit over the wire in network byte order. In practice you would be better using a dedicated serialization library rather than trying to reinvent solutions to all these issues.
I am sending a structure over the network
Stop there. Perhaps some would disagree with me on this (in practice you do see a lot of projects doing this), but struct is a way of laying out things in memory - it's not a serialization mechanism. By using this tool for the job, you're already tying yourself to a bunch of non-portable assumptions.
Sure, you may be able to fake it with things like structure padding pragmas and attributes, but - can you really? Even with those non-portable mechanisms you never know what quirks might show up. I recall working in a code base where "packed" structures were used, then suddenly taking it to a platform where access had to be word aligned... even though it was nominally the same compiler (and thus supported the same proprietary extensions), it produced binaries which crashed. Any pain you get from this path is probably deserved, and I would say only take it if you can be 100% sure it will only run in a given compiler and environment, and that will never change. I'd say the safer bet is to write a proper serialization mechanism instead of passing raw structures across process boundaries.
Is it always safe to copy a structure with memcpy for example using the sizeof operator
Yes, it is and that is the purpose of providing the sizeof operator.
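A short sketch of the copy, using a hypothetical struct sample. One detail worth watching in the snippet from the question: if struct_src is a pointer, sizeof(struct_src) is the size of the pointer, not of the structure, so the sketch takes sizeof *src instead.

```c
#include <string.h>   /* memcpy, size_t */

/* Hypothetical structure with members of different alignment. */
struct sample {
    char   tag;
    double value;
    int    count;
};

/* Copy the whole object, padding bytes included.
 * sizeof *src is the size of the structure; sizeof src would only be
 * the size of a pointer. */
void sample_copy(struct sample *dst, const struct sample *src)
{
    memcpy(dst, src, sizeof *src);
}

/* Sum of the member sizes, for comparison with sizeof(struct sample). */
size_t sample_member_sum(void)
{
    return sizeof(char) + sizeof(double) + sizeof(int);
}
```

sizeof(struct sample) is never smaller than sample_member_sum(), and on most targets it is larger because of padding. Either way the memcpy above copies exactly one complete object.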
Usually __attribute__((__packed__)) is used not for size considerations but when you want to make sure that the layout of a structure is exactly as you want it to be.
For example:
If a structure is to be used to match hardware or be sent on a wire, then it needs to have the exact same layout without any padding. This is because different architectures usually implement different kinds and amounts of padding and alignment, and the only way to ensure common ground is to take padding out of the picture by using packing.

Using fread to read the contents of a file into a structure

In the "Advanced Programming in the Unix Environment" book there's a part (ch 8.14, page 251) in which the author shows us the definition of the "acct" struct (used to store accounting records info). He then shows a program in which he reads the accounting data from a file into the struct (the key part of which is):
fread (&acdata, sizeof(acdata), 1, fp)
The trouble I'm having is that I've heard that C compilers will sometimes rearrange the elements of a struct in memory in order to better utilize space (due to alignment issues). So, if this code is just taking all of the content of the file and sticking it into acdata (and the contents of the file are arranged to match the ordering specified in the struct definition) if some of the elements of struct have been moved, then if I refer to them in code, I may not be getting what I expected (since the data in the file did not get rearranged the way the struct did in memory).
What am I missing (because from what I'm getting this doesn't seem reliable)?
Thanks for your help (my apologies if I've done something wrong procedurally - this is my first time posting)
Worry!
You are right to worry about this issue and pay attention to it. It's a vexing problem, and often happens when you carry your source to another machine, with a different -- even slightly different -- architecture, and perhaps with a different OS or maybe a different compiler; compile your program there; and expect your structs to remain intact over fwrite( ) and fread( ). Or when you add a 1-byte variable to your struct, recompile, and send out binaries to all your friends. Your program doesn't work on their machines anymore, for some mysterious reason.
Sometimes it works (by accident) and you never notice the problem; sometimes it doesn't work and you pull your hair out for a few days.
The issue has nothing to do with rearrangement of struct members. Compilers don't do that. It has nothing to do with optimization, either.
The issue is byte alignment, and the Wikipedia article mentioned below tells you how to fix up your structs so they'll always be correctly aligned. It's always a good idea to pay attention to byte alignment. Otherwise your program isn't portable. And, worse, the program you carefully compiled on your whiz-bang x86-64 and distributed to all of your customers all of a sudden won't run on their 32-bit machines.
Just as important: be mindful of the lengths and alignments of the struct members, too.
There's a nice Wikipedia article that explains the details. It's a very worthwhile read.
I would be wary of a compiler-specific pragma that does the job, but just for that compiler. If you put a pragma in your code, then your program isn't C anymore.
The layout (padding and alignment, but not order) of the structure may change if you compile your code on a different compiler, or a later version of the compiler, or even with different compile-time options.
It won't change from run to run of the same compiled program - that would be a nightmare scenario :-)
So, provided the same program (or technically, any program which has the same structure layout encoded into it at compile time) is the one doing the reading, this will work just fine.
The relevant sections of the C99 standard are:
6.2.6.1/1: The representations of all types are unspecified except as stated in this subclause.
6.2.6.1/6 (the only mention of structures in that subclause): When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values. The value of a structure or union object is never a trap representation, even though the value of a member of the structure or union object may be a trap representation.
That's the only mention of structure padding in that subclause. In other words, it's up to the implementation and they don't even need to document it (unspecified as opposed to implementation-defined, which would require documenting).
6.7.2.1/13: ... There may be unnamed padding within a structure object, but not at its beginning.
6.7.2.1/15: There may be unnamed padding at the end of a structure or union.
If you were to create version 1.1 of your program and it uses a different structure layout (new compiler, different compiler options, #pragma pack, etc), it would very quickly be evident that you had a problem during your unit tests (which should include loading in a file from the previous version).
In that case, you could include some 'intelligence' in your 1.1 program which could recognise an earlier file layout and transform the data as it comes in. That's why good file formats will often have a version indicator (for the file layout version, not the program version) as the first item in that file.
For example, quite a few of my applications use an application identifier along with a 16-bit integer at the front of the file to indicate what application and version it is and the file loader part of the program can handle at least the current and previous versions (and often every version ever created).
The program version and file layout version are separate things - they can drift if, for example, you release ten versions of your program without needing to update the file layout.
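A minimal sketch of such a header, assuming a made-up 4-byte application identifier "DEMO" followed by a 16-bit layout version written most significant byte first. The function names and the identifier are hypothetical, not taken from any real file format.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define APP_ID     "DEMO"   /* hypothetical application identifier */
#define APP_ID_LEN 4
#define HEADER_LEN (APP_ID_LEN + 2)

/* Write the identifier and the file layout version. Returns 0 on success. */
int write_header(FILE *fp, uint16_t layout_version)
{
    unsigned char hdr[HEADER_LEN];
    memcpy(hdr, APP_ID, APP_ID_LEN);
    hdr[APP_ID_LEN]     = (unsigned char)(layout_version >> 8);
    hdr[APP_ID_LEN + 1] = (unsigned char)(layout_version & 0xFF);
    return fwrite(hdr, 1, sizeof hdr, fp) == sizeof hdr ? 0 : -1;
}

/* Read and check the header. Returns 0 and stores the layout version on
 * success, -1 on a short read or a foreign identifier. */
int read_header(FILE *fp, uint16_t *layout_version)
{
    unsigned char hdr[HEADER_LEN];
    if (fread(hdr, 1, sizeof hdr, fp) != sizeof hdr)
        return -1;
    if (memcmp(hdr, APP_ID, APP_ID_LEN) != 0)
        return -1;
    *layout_version = (uint16_t)((hdr[APP_ID_LEN] << 8) | hdr[APP_ID_LEN + 1]);
    return 0;
}
```

A loader would call read_header first and then dispatch on the returned version to the code that knows that particular record layout.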
Yes
Your program will be stable.
Your question has touched off a bonfire of portability recommendations that you didn't actually ask for. The question you seemed to be asking is "is this code pattern and my program stable?". And the answer to that is yes.
Your structure will not be reordered. C99 specifically prohibits rearranging the structure members [1].
Also, the layout and alignment do not depend on optimization level. If they did, all programs would have to be entirely built with the same optimization level, as well as all library routines, the kernel, all kernel interfaces, etc.
Users would also have to track, forever, the optimization level of every one of those interfaces listed above that ever had been compiled as part of the system.
The memory alignment rules are really a kind of hidden ABI. They can't change without adding very specialized and by definition rarely-used compiler flags. They tend to work just fine over different compilers. (Otherwise, every element of a system identified above would ALSO have to be compiled by the same compiler, or be useless. Every compiler that supports a given system uses the exact same alignment rules. Nothing would work, otherwise.) The compiler flags that change alignment policies are usually intended to be built into the compiler configuration for a given OS.
Now, your binary file layout, while perfectly reasonable, is a bit old-school. It has certain drawbacks. While none of these are show-stoppers and none are generally worth rewriting an app, they include:
it's hard to debug binary files
they do lock in a single byte order and a single alignment policy. In the (sadly, increasingly unlikely) case where you need to port to a new architecture, you might end up needing to unpack the record with memcpy(3). Not the end of the world.
they aren't structured. Things like YAML and, ahem, even XML are sort of self-parsing, so it becomes a lot easier to read in a file, and certain types of file manipulations can be done with tools. Even more important, the file format itself becomes more flexible. Your ability to take advantage of the auto-parsed-object is limited, however, in C and C++.
As I understand Paxdiablo's request, he would like me to agree that there exist compiler options and pragmas that, if used, will alter the alignment rules. That's true. Obviously these options are used only for specific reasons.
1. C99 6.7.2.1(13) Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared.
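The ordering rule quoted above can be checked at compile time with offsetof. This is only a sketch: struct record is a hypothetical stand-in, not the real acct structure, and _Static_assert requires C11.

```c
#include <stddef.h>   /* offsetof, size_t */

/* Hypothetical record with members of different sizes. */
struct record {
    char           flag;
    unsigned short status;
    long           utime;
};

/* No padding before the first member. */
_Static_assert(offsetof(struct record, flag) == 0,
               "first member must sit at offset 0");

/* Members keep their declaration order in memory. */
_Static_assert(offsetof(struct record, flag) < offsetof(struct record, status),
               "flag must precede status");
_Static_assert(offsetof(struct record, status) < offsetof(struct record, utime),
               "status must precede utime");

/* Number of padding bytes this compiler added to the structure. */
size_t record_padding(void)
{
    return sizeof(struct record) -
           (sizeof(char) + sizeof(unsigned short) + sizeof(long));
}
```

The ordering assertions hold on any conforming compiler. The value of record_padding() is what may differ between compilers and targets, which is why a file written by one build is only guaranteed readable by a build with the same layout.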
The struct is written to the file based on how it is in memory. The ordering will be the same. Mixing compilers between write and read might be an issue however.
