Why c-faq 16.7 solution does not consider padding bytes? - c

http://c-faq.com/strangeprob/ptralign.html
In 16.7, the author explains:
s.i32 = *(long int *)p;
s.i16 = *(int *)p;
will get into trouble because these cast pointers may not be aligned, so he uses byte-wise manipulation as the solution instead.
My question is, since this code:
struct mystruct {
    char c;
    long int i32;
    int i16;
} s;
will have padding bytes after 'char c;', why didn't the author skip the padding when he tries to get the 'long int i32;'?

Keep in mind this is supposed to be a FAQ list. You have to read the Q and the A as if they were written by different people. Theoretically, the Q is something that actually gets Asked Frequently by people who don't know the answer. In reality it's probably not a direct quote from an actual question, but a sort of idealized version of the question made up by the FAQ list maintainer. But still, when writing the Q section the author adopts a different persona.
In this case the questioning persona doesn't know about alignment or padding. He writes char buf[7] and the struct definition and thinks they should both be 7 bytes long. The 7-byte buffer is an external data format (in a file or a network protocol stream) that the questioner is trying to parse, the struct represents the variables he would like to parse it into, and the statements like s.i32 = *(long int *)p; are his unsuccessful attempt at doing it.
In the A section, our author drops that persona and gives the correct method of transferring the data from the packed 7-byte buffer into the struct. He doesn't explain every detail of alignment and padding rules as applied to the struct and the char buffer because he wants to keep the answer brief.
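For concreteness, the byte-wise method the answer recommends assembles each multi-byte field from individual bytes, so no misaligned pointer is ever dereferenced. A sketch along the lines of the 16.7 answer, assuming the external data is big-endian, with a 4-byte long and a 2-byte int on the wire:

unsigned char *p = (unsigned char *)buf;
s.c = *p++;
s.i32  = (long)*p++ << 24;   /* build the 32-bit value byte by byte */
s.i32 |= (long)*p++ << 16;
s.i32 |= (long)*p++ << 8;
s.i32 |= (long)*p++;
s.i16  = *p++ << 8;          /* then the 16-bit value */
s.i16 |= *p++;

Note how the padding question dissolves: p only ever walks the packed 7-byte buffer, while the struct members are assigned through their names, so the compiler places each value at whatever (possibly padded) offset the struct actually uses.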
You're looking at a true old-style newsgroup FAQ list, which was designed to actually answer the questions that people ask frequently, not a corporate web-site style "FAQ" in which a marketing team makes up fake questions designed to flatter the company and avoid answering any complaints. (And does anybody else remember when there was a distinction between a FAQ which was a single question and a FAQL which was the FAQ List with answers? Where did that go?)

Related

Determine struct size, ignoring padding

I receive datagrams through a network and I would like to copy the data to a struct with the appropriate fields (corresponding to the format of the message). There are many different types of datagrams (with different fields and sizes). Here is a simplified version (in reality the fields are always arrays of chars):
struct dg_a
{
    char id[2];
    char time[4];
    char flags;
    char end;
};
struct dg_a data;
memcpy(&data, buffer, offsetof(struct dg_a, end));
Currently I add a dummy field called end to the end of the struct so that I can use offsetof to determine how many bytes to copy.
Is there a better and less error-prone way to do this? I was looking for something more portable than putting __attribute__((packed)) and using sizeof.
--
EDIT
Several people in the comments have stated that my approach is bad, but so far nobody has presented a reason why. Since the struct members are char, there are no trap representations and no padding between the members (guaranteed by the standard).
A central issue is the size of buffer (assumed to be a character array). The two calls below copy different byte counts, perhaps differing by only a few bytes.
memcpy(&data, buffer, offsetof(struct dg_a, end)); // 7
// or
memcpy(&data, buffer, sizeof data); // 7, 8, 16 depends on alignment.
Consider avoiding those issues: use a buffer as wide as any of the data structures, zero-filled/padded prior to being populated with incoming data.
struct dg_a {
    char id[2];
    char time[4];
    char flags;
}; // no end field

union dg_all {
    struct dg_a a;
    struct dg_b b;
    ...
    struct dg_z z;
} buffer = { 0 };

foo(&buffer, sizeof buffer); // get data
switch (bar(buffer)) {
case 'a': {
    struct dg_a data = buffer.a; // Ditch the memcpy
    // or maybe no need for a copy, just use buffer.a
    break;
}
// ... other message types
}
If the term "language" refers to a mapping between source text and behavior, the name C describes two families of languages:
The family of languages which mapped "C syntax" to the behaviors of commonplace microcomputer hardware in ways which were defined more by precedent than specification, but were essentially 100% consistent throughout the 1980s and most of the 1990s among implementations targeting commonplace hardware.
The family of all languages that meet the C Specification, including those processed by deliberately-capricious implementations.
Even though the authors of the C Standard recognized that it would not be practical to mandate that all implementations be suitable for all of the purposes served by C programs, a mentality has emerged in some fields that the only programs that should be considered "portable" are those which the Standard requires all implementations to support. A program which could be broken by a deliberately-capricious implementation should (given that mentality) be viewed as "non-portable" or "erroneous", even if it would benefit greatly from semantics which compilers for commonplace hardware had unanimously supported during the late 20th century, and for which the Standard defines no nice replacements.
Because compilers targeting certain fields like high-end number crunching can benefit from assuming that code won't rely upon certain hardware features, and because the authors of the Standard didn't want to get into details of deciding what implementations should be regarded as suitable for what purposes, some compiler writers really don't want to support code which attempts to overlay data onto structures. Such constructs may be more readable than code which tries to manually parse all the data, and compilers that endeavor to support such code may be able to process it more easily and efficiently than code which manually parses all the data, but since the Standard would allow compilers to assign struct layouts in silly ways if they chose to do so, compiler writers have a mentality that any code which tries to overlay data onto structures should be considered defective.
C has no standard mechanism for avoiding padding between structure elements or at the end of the structure. Many implementations provide such a thing as an extension, however, and inasmuch as you seem to want to match structure layout to network message payloads, your only alternative is to rely on such an extension.
Although using __attribute__((packed)) or a work-alike will enable you to use sizeof for your purpose, that's just a bonus. The main point of doing so is to match the structure layout to the network message structure for the benefit of your proposed memory copying. If the structure is laid out with internal padding where the protocol message has none, then a direct, whole-message copy such as you propose simply cannot work. That sizeof otherwise does not give you the correct size is only a symptom of the larger problem.
Note also that you may face other issues with copying raw bytes, too. In particular, if you intend to exchange messages between machines with different architectures, and these messages contain integers larger than one byte, then you need to account for byte-order differences. If the protocol is well designed, then it in fact specifies byte order. Similarly, if you're passing around character data then you may need to deal with encoding issues (which may themselves have their own byte-ordering considerations).
Overall, you are unlikely to be able to build a robust, portable protocol implementation based on copying whole message payloads into corresponding structures, all at once. At minimum, you would likely need to perform message-type-specific fixup after the main copy. I recommend instead biting the bullet and writing appropriate marshalling functions for each message type into and out of the corresponding network representation. You'll more easily make this portable.
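To make that recommendation concrete, a per-message unmarshalling function for dg_a might look like the following sketch. The wire-size constant is hypothetical; since these fields are all char arrays it reduces to bounded copies, but the same shape accommodates byte-order fixups for wider fields:

#include <string.h>

#define DG_A_WIRE_SIZE 7  /* hypothetical: 2 id + 4 time + 1 flags */

int unmarshal_dg_a(struct dg_a *out, const unsigned char *buf, size_t len)
{
    if (len < DG_A_WIRE_SIZE)
        return -1;                        /* short datagram */
    memcpy(out->id,   buf + 0, sizeof out->id);
    memcpy(out->time, buf + 2, sizeof out->time);
    out->flags = buf[6];
    return 0;
}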

Linux kernel: why do 'subclass' structs put base class info at end?

I was reading the chapter in Beautiful Code on the Linux kernel, where the author discusses how the Linux kernel implements inheritance in the C language (amongst other topics). In a nutshell, a 'base' struct is defined, and in order to inherit from it the 'subclass' struct places a copy of the base at the end of the subclass struct definition. The author then spends a couple of pages explaining a clever and complicated macro to figure out how many bytes to go back in order to convert from the base part of the object to the subclass part of the object.
My question: Within the subclass struct, why not declare the base struct as the first thing in the struct, instead of the last thing?
The main advantage of putting the base struct stuff first is when casting from the base to the subclass you wouldn't need to move the pointer at all - essentially, doing the cast just means telling the compiler to let your code use the 'extra' fields that the subclass struct has placed after the stuff that the base defines.
Just to clarify my question a little bit let me throw some code out:
struct device { // this is the 'base class' struct
    int a;
    int b;
    // etc.
};

struct usb_device { // this is the 'subclass' struct
    int usb_a;
    int usb_b;
    struct device dev; // This is what confuses me -
                       // why put this here, rather than before usb_a?
};
If one happens to have a pointer to the "dev" field inside of a usb_device object then in order to cast it back to that usb_device object one needs to subtract 8 from that pointer. But if "dev" was the first thing in a usb_device casting the pointer wouldn't need to move the pointer at all.
Any help on this would be greatly appreciated. Even advice on where to find an answer would be appreciated - I'm not really sure how to Google for the architectural reason behind a decision like this. The closest I could find here on StackOverflow is:
why to use these weird nesting structure
And, just to be clear - I understand that a lot of bright people have worked on the Linux kernel for a long time so clearly there's a good reason for doing it this way, I just can't figure out what it is.
The Amiga OS uses this "common header" trick in a lot of places and it looked like a good idea at the time: Subclassing by simply casting the pointer type. But there are drawbacks.
Pro:
You can extend existing data structures
You can use the same pointer in all places where the base type is expected, no pointer arithmetic needed, saving precious cycles
It feels natural
Con:
Different compilers tend to align data structures differently. If the base structure ended with char a;, then you could have 0, 1 or 3 pad bytes afterwards before the next field of the subclass starts. This led to quite nasty bugs, especially when you had to maintain backwards compatibility (i.e. for some reason, you have to have a certain padding because an ancient compiler version had a bug and now, there is lots of code which expects the buggy padding).
You don't notice quickly when you pass the wrong structure around. With the code in your question, fields get trashed very quickly if the pointer arithmetic is wrong. That is a good thing, since it raises the chance that a bug is discovered earlier.
It leads to an attitude "my compiler will fix it for me" (which it sometimes won't) and all the casts lead to a "I know better than the compiler" attitude. The latter one would make you automatically insert casts before understanding the error message, which would lead to all kinds of odd problems.
The Linux kernel puts the common structure elsewhere; it can be, but doesn't have to be, at the end.
Pro:
Bugs will show early
You will have to do some pointer arithmetic for every structure, so you're used to it
You don't need casts
Con:
Not obvious
Code is more complex
I'm new to the Linux kernel code, so take my ramblings here with a grain of salt. As far as I can tell, there is no requirement as to where to put the "subclass" struct. That is exactly what the macros provide: you can cast to the "subclass" structure regardless of its layout. This provides robustness to your code (the layout of a structure can be changed without having to change your code).
Perhaps there is a convention of placing the "base class" struct at the end, but I'm not aware of it. I've seen lots of code in drivers, where different "base class" structs are used to cast back to the same "subclass" structure (from different fields in the "subclass" of course).
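For reference, the macro machinery the question alludes to boils down to the kernel's container_of, which recovers a pointer to the enclosing structure from a pointer to one of its members, wherever that member happens to sit. A simplified sketch (the kernel's real version adds a type-safety check):

#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Given struct device *d pointing at the dev field of a usb_device: */
/* struct usb_device *u = container_of(d, struct usb_device, dev);   */

Because offsetof supplies the member's displacement at compile time, the cast works regardless of whether dev is first, last, or in the middle of the subclass struct.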
I don't have fresh experience from the Linux kernel, but from other kernels. I'd say that this doesn't matter at all.
You are not supposed to cast from one to the other. Allowing casts like that should only be done in very specific situations. In most cases it reduces the robustness and flexibility of the code and is considered quite sloppy. So the deepest "architectural reason" you're looking for might just be "because that's the order someone happened to write it in". Or alternatively, that's what the benchmarks showed would be the best for performance of some important code path in that code. Or alternatively, the person who wrote it thinks it looks pretty (I always build upside-down pyramids in my variable declarations and structs if I have no other constraints). Or someone happened to write it this way 20 years ago and since then everyone else has been copying it.
There might be some deeper design behind this, but I doubt it. There's just no reason to design those things at all. If you want to find out from an authoritative source why it's done this way, just submit a patch to linux that changes it and see who yells at you.
It's for multiple inheritance. struct device isn't the only interface you can apply to a struct in the Linux kernel, and if you have more than one, just casting the subclass to a base class wouldn't work. For example:
struct device {
    int a;
    int b;
    // etc...
};

struct asdf {
    int asdf_a;
};

struct usb_device {
    int usb_a;
    int usb_b;
    struct device dev;
    struct asdf asdf;
};

explain notifier.c from the Linux kernel

I'm seeking to fully understand the following code snippet from kernel/notifier.c. I have read and built simple linked lists and think I get the construct from K&R's C programming. However, this is slightly more complex. The second line below, which begins with int, appears to be two items together, which is unclear to me.
The first is the (*notifier_call), which I believe has independent but related significance to the second part, which contains the 'notifier_block' term.
Can you explain how it works in detail? I understand that there is a function pointer and multiple subscribers are possible. But I lack the way to tie these facts together, and could use a primer or key so I understand exactly how the code works. The third line looks to contain the linking structure, or recursive nature. Forgive my terms, and correct them as you see fit, as I am a new student of computer science terminology.
struct notifier_block {
    int (*notifier_call)(struct notifier_block *, unsigned long, void *);
    struct notifier_block *next;
    int priority;
};
This is a simple pointer to a function returning int, taking as its arguments a pointer to a notifier_block structure, an unsigned long, and a pointer to void.
It has nothing to do with the linked list; the list is linked through the next member.
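If the declaration is easier to digest with the two roles separated, a typedef can name the function-pointer type on its own (a sketch; recent kernels define a similar notifier_fn_t):

struct notifier_block;  /* forward declaration so the callback can name it */

typedef int (*notifier_fn_t)(struct notifier_block *nb,
                             unsigned long action, void *data);

struct notifier_block {
    notifier_fn_t notifier_call;  /* callback each subscriber registers */
    struct notifier_block *next;  /* singly linked list of subscribers */
    int priority;                 /* ordering within the list */
};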

what the author of nedtries means by "in-place"?

I just implemented a kind of bitwise trie (based on nedtries), but my code does a lot of memory allocation (one for each node).
Contrary to my implementation, nedtries are claimed to be fast, among other things, because of their small number of memory allocations (if any).
The author claims his implementation to be "in-place", but what does that really mean in this context? And how does nedtries achieve such a small number of dynamic memory allocations?
PS: I know that the sources are available, but the code is pretty hard to follow and I cannot figure out how it works.
I'm the author, so this is for the benefit of the many who, according to Google, are similarly having difficulties in using nedtries. I would like to thank the people here on Stack Overflow for not making unpleasant comments about me personally, which some other discussions about nedtries do.
I am afraid I don't understand the difficulties with knowing how to use it. Usage is exceptionally easy - simply copy the example in the Readme.html file:
#include <assert.h>
#include <stdio.h>
#include "nedtrie.h"

typedef struct foo_s foo_t;
struct foo_s {
    NEDTRIE_ENTRY(foo_t) link;
    size_t key;
};
typedef struct foo_tree_s foo_tree_t;
NEDTRIE_HEAD(foo_tree_s, foo_t);
static foo_tree_t footree;

static size_t fookeyfunct(const foo_t *RESTRICT r)
{
    return r->key;
}

NEDTRIE_GENERATE(static, foo_tree_s, foo_s, link, fookeyfunct, NEDTRIE_NOBBLEZEROS(foo_tree_s));

int main(void)
{
    foo_t a, b, c, *r;
    NEDTRIE_INIT(&footree);
    a.key = 2;
    NEDTRIE_INSERT(foo_tree_s, &footree, &a);
    b.key = 6;
    NEDTRIE_INSERT(foo_tree_s, &footree, &b);
    r = NEDTRIE_FIND(foo_tree_s, &footree, &b);
    assert(r == &b);
    c.key = 5;
    r = NEDTRIE_NFIND(foo_tree_s, &footree, &c);
    assert(r == &b); /* NFIND finds next largest. Invert the key function to invert this */
    NEDTRIE_REMOVE(foo_tree_s, &footree, &a);
    NEDTRIE_FOREACH(r, foo_tree_s, &footree)
    {
        printf("%p, %zu\n", (void *) r, r->key);
    }
    NEDTRIE_PREV(foo_tree_s, &footree, &a);
    return 0;
}
You declare your item type; here it's struct foo_s. You need the NEDTRIE_ENTRY() inside it; beyond that, it can contain whatever you like. You also need a key generating function. Other than that, it's pretty boilerplate.
I wouldn't have chosen this system of macro-based initialisation myself! But it's for compatibility with the BSD rbtree.h, so nedtries is very easy to swap into anything using BSD rbtree.h.
Regarding my usage of "in place" algorithms, well, I guess my lack of computer science training shows here. What I would call "in place" is when you only use the memory passed into a piece of code, so if you hand 64 bytes to an in-place algorithm it will only touch those 64 bytes, i.e. it won't make use of extra metadata, or allocate some extra memory, or indeed write to global state. A good example is an "in place" sort implementation where only the collection being sorted (and I suppose the thread stack) gets touched.
Hence no, nedtries doesn't need a memory allocator. It stores all the data it needs in the NEDTRIE_ENTRY and NEDTRIE_HEAD macro expansions. In other words, when you allocate your struct foo_s, you do all the memory allocation for nedtries.
Regarding understanding the "macro goodness", it's far easier to understand the logic if you compile it as C++ and then debug it :). The C++ build uses templates and the debugger will cleanly show you state at any given time. In fact, all debugging from my end happens in a C++ build and I meticulously transcribe the C++ changes into macroised C.
Lastly, before a new release, I search Google for people having problems with my software to see if I can fix things, and I am typically amazed at what some people say about me and my free software. Firstly, why didn't those people having difficulties ask me directly for help? If I know that there is something wrong with the docs, then I can fix them; equally, asking on Stack Overflow doesn't let me know immediately that there is a docs problem, but rather relies on me finding it before the next release. So all I would say is that if anyone finds a problem with my docs, please do email me and say so, even if there is a discussion like this one here on Stack Overflow.
Niall
I took a look at the nedtrie.h source code.
It seems that the reason it is "in-place" is that you have to add the trie bookkeeping data to the items that you want to store.
You use the NEDTRIE_ENTRY macro to add parent/child/next/prev links to your data structure, and you can then pass that data structure to the various trie routines, which will extract and use those added members.
So it is "in-place" in the sense that you augment your existing data structures and the trie code piggybacks on that.
At least that's what it looks like. There's lots of macro goodness in that code so I could have gotten myself confused (:
In-place means you operate on the original (input) data, so the input data becomes the output data. Not-in-place means that you have separate input and output data, and the input data is not modified. In-place operations have a number of advantages - smaller cache/memory footprint, lower memory bandwidth, hence typically better performance, etc, but they have the disadvantage that they are destructive, i.e. you lose the original input data (which may or may not matter, depending on the use case).
In-place means to operate on the input data and (possibly) update it. The implication is that there is no copying or moving of the input data. This may result in losing the input data's original values, which you will need to consider if it is relevant to your particular case.
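As a minimal illustration of the distinction, an in-place byte reversal writes its output over its input and allocates nothing:

#include <stddef.h>

/* In-place: the input buffer is also the output buffer. */
void reverse_in_place(unsigned char *buf, size_t n)
{
    size_t i, j;
    if (n < 2)
        return;
    for (i = 0, j = n - 1; i < j; i++, j--) {
        unsigned char tmp = buf[i];
        buf[i] = buf[j];
        buf[j] = tmp;
    }
}

A not-in-place version would instead fill a second, caller-supplied or freshly allocated buffer, leaving the original intact.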

Parsing Binary Data in C?

Are there any libraries or guides for how to read and parse binary data in C?
I am looking at some functionality that will receive TCP packets on a network socket and then parse that binary data according to a specification, turning the information into a more useable form by the code.
Are there any libraries out there that do this, or even a primer on performing this type of thing?
I have to disagree with many of the responses here. I strongly suggest you avoid the temptation to cast a struct onto the incoming data. It seems compelling and might even work on your current target, but if the code is ever ported to another target/environment/compiler, you'll run into trouble. A few reasons:
Endianness: The architecture you're using right now might be big-endian, but your next target might be little-endian. Or vice-versa. You can overcome this with macros (ntoh and hton, for example), but it's extra work, and you have to make sure you call those macros every time you reference the field.
Alignment: The architecture you're using might be capable of loading a multi-byte word at an odd-addressed offset, but many architectures cannot. If a 4-byte word straddles a 4-byte alignment boundary, the load may pull garbage. Even if the protocol itself doesn't have misaligned words, sometimes the byte stream itself is misaligned. (For example, although the IP header definition puts all 4-byte words on 4-byte boundaries, often the Ethernet header pushes the IP header itself onto a 2-byte boundary.)
Padding: Your compiler might choose to pack your struct tightly with no padding, or it might insert padding to deal with the target's alignment constraints. I've seen this change between two versions of the same compiler. You could use #pragmas to force the issue, but #pragmas are, of course, compiler-specific.
Bit Ordering: The ordering of bits inside C bitfields is compiler-specific. Plus, the bits are hard to "get at" for your runtime code. Every time you reference a bitfield inside a struct, the compiler has to use a set of mask/shift operations. Of course, you're going to have to do that masking/shifting at some point, but best not to do it at every reference if speed is a concern. (If space is the overriding concern, then use bitfields, but tread carefully.)
All this is not to say "don't use structs." My favorite approach is to declare a friendly native-endian struct of all the relevant protocol data without any bitfields and without concern for the issues, then write a set of symmetric pack/parse routines that use the struct as a go-between.
typedef struct _MyProtocolData
{
    Bool myBitA; // Using a "Bool" type wastes a lot of space, but it's fast.
    Bool myBitB;
    Word32 myWord; // You have a list of base types like Word32, right?
} MyProtocolData;

Void myProtocolParse(const Byte *pProtocol, MyProtocolData *pData)
{
    // Somewhere, your code has to pick out the bits. Best to just do it in one place.
    // Note the parentheses: >> binds tighter than &, so we must mask first,
    // then shift.
    pData->myBitA = (*(pProtocol + MY_BITS_OFFSET) & MY_BIT_A_MASK) >> MY_BIT_A_SHIFT;
    pData->myBitB = (*(pProtocol + MY_BITS_OFFSET) & MY_BIT_B_MASK) >> MY_BIT_B_SHIFT;
    // Endianness and Alignment issues go away when you fetch byte-at-a-time.
    // Here, I'm assuming the protocol is big-endian.
    // You could also write a library of "word fetchers" for different sizes and endiannesses.
    // Cast each byte up to Word32 before shifting so the high byte isn't
    // shifted inside a plain int.
    pData->myWord  = (Word32)*(pProtocol + MY_WORD_OFFSET + 0) << 24;
    pData->myWord += (Word32)*(pProtocol + MY_WORD_OFFSET + 1) << 16;
    pData->myWord += (Word32)*(pProtocol + MY_WORD_OFFSET + 2) << 8;
    pData->myWord += (Word32)*(pProtocol + MY_WORD_OFFSET + 3);
    // You could return something useful, like the end of the protocol or an error code.
}
Void myProtocolPack(const MyProtocolData *pData, Byte *pProtocol)
{
// Exercise for the reader! :)
}
Now, the rest of your code just manipulates data inside the friendly, fast struct objects and only calls the pack/parse when you have to interface with a byte stream. There's no need for ntoh or hton, and no bitfields to slow down your code.
The standard way to do this in C/C++ is really casting to structs, as 'gwaredd' suggested.
It is not as unsafe as one would think. You first cast to the struct that you expected, as in his/her example, then you test that struct for validity. You have to test for max/min values, termination sequences, etc.
Whatever platform you are on, you must read Unix Network Programming, Volume 1: The Sockets Networking API. Buy it, borrow it, steal it (the victim will understand; it's like stealing food or something...), but do read it.
After reading the Stevens, most of this will make a lot more sense.
Let me restate your question to see if I understood properly. You are looking for software that will take a formal description of a packet and then produce a "decoder" to parse such packets?
If so, the reference in that field is PADS. A good article introducing it is PADS: A Domain-Specific Language for Processing Ad Hoc Data. PADS is very complete, but unfortunately it is under a non-free licence.
There are possible alternatives (I did not mention non-C solutions). Apparently, none can be regarded as completely production-ready:
binpac
PacketTypes
DataScript
If you read French, I summarized these issues in Génération de décodeurs de formats binaires.
In my experience, the best way is to first write a set of primitives, to read/write a single value of some type from a binary buffer. This gives you high visibility, and a very simple way to handle any endianness-issues: just make the functions do it right.
Then, you can for instance define structs for each of your protocol messages, and write pack/unpack (some people call them serialize/deserialize) functions for each.
As a base case, a primitive to extract a single 8-bit integer could look like this (assuming an 8-bit char on the host machine; you could add a layer of custom types to ensure that too, if needed):
const void * read_uint8(const void *buffer, unsigned char *value)
{
    // A void pointer cannot be dereferenced or incremented; step through
    // an unsigned char pointer instead.
    const unsigned char *vptr = buffer;
    *value = *vptr++;
    return vptr;
}
Here, I chose to return the value by reference and return an updated pointer. This is a matter of taste; you could of course return the value and update the pointer by reference. It is a crucial part of the design that the read function updates the pointer, to make these chainable.
Now, we can write a similar function to read a 16-bit unsigned quantity:
const void * read_uint16(const void *buffer, unsigned short *value)
{
    unsigned char lo, hi;
    buffer = read_uint8(buffer, &hi);
    buffer = read_uint8(buffer, &lo);
    *value = (hi << 8) | lo;
    return buffer;
}
Here I assumed incoming data is big-endian; this is common in networking protocols (mainly for historical reasons). You could of course get clever, do some pointer arithmetic, and remove the need for a temporary, but I find this way makes it clearer and easier to understand. Having maximal transparency in this kind of primitive can be a good thing when debugging.
The next step would be to start defining your protocol-specific messages, and write read/write primitives to match. At that level, think about code generation; if your protocol is described in some general, machine-readable format, you can generate the read/write functions from that, which saves a lot of grief. This is harder if the protocol format is clever enough, but often doable and highly recommended.
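Continuing the same pattern, a 32-bit reader can compose the 16-bit primitive, so the big-endian assumption stays confined to the lowest layer (a sketch in the style of the functions above):

const void * read_uint32(const void *buffer, unsigned long *value)
{
    unsigned short hi, lo;
    buffer = read_uint16(buffer, &hi);
    buffer = read_uint16(buffer, &lo);
    *value = ((unsigned long)hi << 16) | lo;
    return buffer;
}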
You might be interested in Google Protocol Buffers, which is basically a serialization framework. It's primarily for C++/Java/Python (those are the languages supported by Google) but there are ongoing efforts to port it to other languages, including C. (I haven't used the C port at all, but I'm responsible for one of the C# ports.)
You don't really need to parse binary data in C, just cast some pointer to whatever you think it should be.
struct SomeDataFormat
{
    ....
};

struct SomeDataFormat *pParsedData = (struct SomeDataFormat *) pBuffer;
Just be wary of endian issues, type sizes, reading off the end of buffers, etc etc
Parsing/formatting binary structures is one of the very few things that is easier to do in C than in higher-level/managed languages. You simply define a struct that corresponds to the format you want to handle and the struct is the parser/formatter. This works because a struct in C represents a precise memory layout (which is, of course, already binary). See also kervin's and gwaredd's replies.
I don't really understand what kind of library you are looking for. A generic library that will take any binary input and parse it into an unknown format?
I'm not sure such a library can ever exist in any language.
I think you need to elaborate on your question a little bit.
Edit:
OK, so after reading Jon's answer, it seems there is a library, well, kind of a library; it's more like a code generation tool. But as many have stated, just casting the data to the appropriate data structure, with appropriate care, i.e. using packed structures and taking care of endian issues, works. Using such a tool with C is just overkill.
Basically, the suggestions about casting to a struct work, but please be aware that numbers can be represented differently on different architectures.
To deal with endian issues, network byte order was introduced: common practice is to convert numbers from host byte order to network byte order before sending the data, and to convert back to host order on receipt. See the functions htonl, htons, ntohl and ntohs.
And really consider kervin's advice - read UNP. You won't regret it!
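As a minimal illustration of the round trip (on POSIX systems these functions are declared in <arpa/inet.h>):

#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t host_value = 0x12345678;
    uint32_t wire_value = htonl(host_value); /* host -> network (big-endian) */
    uint32_t round_trip = ntohl(wire_value); /* network -> host */
    printf("%#x -> %#x -> %#x\n", (unsigned) host_value,
           (unsigned) wire_value, (unsigned) round_trip);
    return 0;
}

On a big-endian host all three values print the same; on a little-endian host the middle value shows the swapped byte order.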
