I am new to the C language and just learned about structs and pointers.
My question is related to the offsetof macro I recently saw. I know how it works and the logic behind that.
In the <stddef.h> file the definition is as follows:
#define offsetof(type,member) ((unsigned long) &(((type*)0)->member))
My question is, if I have a struct as shown below:
struct test {
int field1:
int field2:
};
struct test var;
Why cannot I directly get the address of field2 as:
char * p = (char *)&var;
char *addressofField2 = p + sizeof(int);
Rather than writing something like this
field2Offset = offsetof (struct test, field2);
and then adding offset value to var's starting address?
Is there any difference? Is using offsetof more efficient?
The C compiler will often add extra padding bits or bytes between members of a struct in order to improve efficiency and keep integers word-aligned (which in some architectures is required to avoid bus errors and in some architectures is required to avoid efficiency problems). For example, in many compilers, if you have this struct:
struct ImLikelyPadded {
int x;
char y;
int z;
};
you might find that sizeof(struct ImLikelyPadded) is 12, not 9, because the compiler will insert three extra padding bytes between the end of the one-byte char y and the word-sized int z. This is why offsetof is so useful - it lets you determine where things really are even factoring in padding bytes and is highly portable.
Unlike arrays, memory layout of struct is not always contiguous. Compiler may add extra bytes, in order to align the memory. This is called padding.
Because of padding, it us difficult to find location of member manually. This is also why we always use sizeof to find struct size.
Offsetof , macro let you find out the distance,offset, of a member of struct from the strating position of the struct.
One intelligent use if offsetof is seen in Linux kernel's container_of macro. This macro let you find out starting position of node given the address of member in a generic inclusive doubly linked list
As already mentioned in the other answers, padding is one of the reasons. I won't repeat what was already said about it.
Another good reason to use the offsetof macro and not manually compute offsets is that you only have to write it once. Imagine what happens if you need to change the type of field1 or insert or remove one or more fields in front of field2. Using your hand-crafted calculation you have to find and change all its occurrences. Missing one of them will produce mysterious bugs that are difficult to find.
The code written using offsetof doesn't need any update in this situation. The compiler takes care of everything on the next compilation.
Even more, the code that uses offsetof is more clear. The macro is standard, its functionality is documented. A fellow programmer that reads the code understands it immediately. It's not that easy to understand what the hand-crafted code attempts.
Related
I am working on refactoring some old code and have found few structs containing zero length arrays (below). Warnings depressed by pragma, of course, but I've failed to create by "new" structures containing such structures (error 2233). Array 'byData' used as pointer, but why not to use pointer instead? or array of length 1? And of course, no comments were added to make me enjoy the process...
Any causes to use such thing? Any advice in refactoring those?
struct someData
{
int nData;
BYTE byData[0];
}
NB It's C++, Windows XP, VS 2003
Yes this is a C-Hack.
To create an array of any length:
struct someData* mallocSomeData(int size)
{
struct someData* result = (struct someData*)malloc(sizeof(struct someData) + size * sizeof(BYTE));
if (result)
{ result->nData = size;
}
return result;
}
Now you have an object of someData with an array of a specified length.
There are, unfortunately, several reasons why you would declare a zero length array at the end of a structure. It essentially gives you the ability to have a variable length structure returned from an API.
Raymond Chen did an excellent blog post on the subject. I suggest you take a look at this post because it likely contains the answer you want.
Note in his post, it deals with arrays of size 1 instead of 0. This is the case because zero length arrays are a more recent entry into the standards. His post should still apply to your problem.
http://blogs.msdn.com/oldnewthing/archive/2004/08/26/220873.aspx
EDIT
Note: Even though Raymond's post says 0 length arrays are legal in C99 they are in fact still not legal in C99. Instead of a 0 length array here you should be using a length 1 array
This is an old C hack to allow a flexible sized arrays.
In C99 standard this is not neccessary as it supports the arr[] syntax.
Your intution about "why not use an array of size 1" is spot on.
The code is doing the "C struct hack" wrong, because declarations of zero length arrays are a constraint violation. This means that a compiler can reject your hack right off the bat at compile time with a diagnostic message that stops the translation.
If we want to perpetrate a hack, we must sneak it past the compiler.
The right way to do the "C struct hack" (which is compatible with C dialects going back to 1989 ANSI C, and probably much earlier) is to use a perfectly valid array of size 1:
struct someData
{
int nData;
unsigned char byData[1];
}
Moreover, instead of sizeof struct someData, the size of the part before byData is calculated using:
offsetof(struct someData, byData);
To allocate a struct someData with space for 42 bytes in byData, we would then use:
struct someData *psd = (struct someData *) malloc(offsetof(struct someData, byData) + 42);
Note that this offsetof calculation is in fact the correct calculation even in the case of the array size being zero. You see, sizeof the whole structure can include padding. For instance, if we have something like this:
struct hack {
unsigned long ul;
char c;
char foo[0]; /* assuming our compiler accepts this nonsense */
};
The size of struct hack is quite possibly padded for alignment because of the ul member. If unsigned long is four bytes wide, then quite possibly sizeof (struct hack) is 8, whereas offsetof(struct hack, foo) is almost certainly 5. The offsetof method is the way to get the accurate size of the preceding part of the struct just before the array.
So that would be the way to refactor the code: make it conform to the classic, highly portable struct hack.
Why not use a pointer? Because a pointer occupies extra space and has to be initialized.
There are other good reasons not to use a pointer, namely that a pointer requires an address space in order to be meaningful. The struct hack is externalizeable: that is to say, there are situations in which such a layout conforms to external storage such as areas of files, packets or shared memory, in which you do not want pointers because they are not meaningful.
Several years ago, I used the struct hack in a shared memory message passing interface between kernel and user space. I didn't want pointers there, because they would have been meaningful only to the original address space of the process generating a message. The kernel part of the software had a view to the memory using its own mapping at a different address, and so everything was based on offset calculations.
It's worth pointing out IMO the best way to do the size calculation, which is used in the Raymond Chen article linked above.
struct foo
{
size_t count;
int data[1];
}
size_t foo_size_from_count(size_t count)
{
return offsetof(foo, data[count]);
}
The offset of the first entry off the end of desired allocation, is also the size of the desired allocation. IMO it's an extremely elegant way of doing the size calculation. It does not matter what the element type of the variable size array is. The offsetof (or FIELD_OFFSET or UFIELD_OFFSET in Windows) is always written the same way. No sizeof() expressions to accidentally mess up.
I have a structure like this
struct packet
{
int seqnum;
char type[1];
float time1;
float pri;
float time2;
unsigned char data[512];
}
I am receiving packet in an array
char buf[529];
I want to take the seqnum,data everything separately.Does the following typecast work.. It is giving junk value for me.
struct packet *pkt;
pkt=(struct packet *)buf;
printf(" %d",pkt->seqnum)
No, that likely won't work and is generally a bad and broken way of doing this.
You must use compiler-specific extensions to make sure there's no invisible padding between your struct members, for something like that to work. With gcc, for instance, you do this using the __attribute__() syntax.
It is, thus, not a portable idea.
It's much better to be explicit about it, and unpack each field. This also gives you a chance to have a well-defined endianness in your network protocol, which is generally a good idea for interoperability's sake.
No, that isn't generally valid code. You should make the struct first and then memcopy stuff into it:
packet p;
memcpy(&p.seqnum, buf + 0, 4);
memcpy(&p.type[0], buf + 4, 1);
memcpy(&p.time1, buf + 5, 4);
And so forth.
You must take great care to get the type sizes and endianness right.
First of all, you cannot know in advance where the compiler will insert padding bytes in your structure for performance optimization (cache line alignment, integer alignment etc) since this is platform-dependent. Except, of course, if you are considering building the app only on your platform.
Anyway, in your case it seems like you are getting data from somewhere (network ?) and it is highly probable that the data has been compacted (no padding bytes between fields).
If you really want to typecast your array to a struct pointer, you can still tell the compiler to remove the padding bytes it might add. Note that this depends on the compiler you use and is not a standard C implementation. With gcc, you might add this statement at the end of your structure definition :
struct my_struct {
int blah;
/* Blah ... */
} __attribute__((packed));
Note that it will affect the performance for member access, copy etc ...
Unless you have a very good reason to do so, don't ever use the __attribute__((packed)) thing !
The other solution, which is much more advisable is to make the parsing on your own. You just allocate an appropriate structure and fill its fields by seeking the good information from your buffer. A sequence of memcpy instructions is likely to do the trick here (see Kerrek's answer)
I want to grab part of the data inside some C structs to partially serialize/deserialize them, writing the bytes from memory to disk, and viceversa.
The structs are not known in advance, they are dynamically built with my own C code generator (as well as the code that will serialize it). The serializable fields will be placed at the beginning of the struct.
Let's say a have a struct with 4 fields, and the first two are to be serialized:
typedef struct {
int8_t x1;
int32_t x2; /* 1 + 4 = 5 bytes (if packed) */
int8_t y1;
int32_t y2; /* 1 + 4 +1 + 4 = 10 bytes (if packed) */
} st;
I plan to grab the pointer to the struct variable and write/read the n bytes that cover those two first fields (x1, x2). I don't think I need to worry about alignment/packing because I don't intend the serialization to survive different compilations (only a unique executable is expected to read/write the data). And, as I'm targeting a wide scope of compilers-architectures, I don't want to place assumptions on alignment-packing or compiler specific tricks.
Then, I need to count bytes. And I can't just do sizeof(st.x1)+sizeof(st.x2) because of alingment-padding. So, I'm planning to substract pointers, from the start of the struct to the first "non persistent" field:
st myst;
int partsize = (char*)&myst.y1 - (char*)(&myst);
printf("partial size=%d (total size=%d)\n",partsize,sizeof(myst));
This seems to work. And it can be placed in a macro.
(For the record: I tried also to write another macro that does not requrire an instance of the struct, something like this, but it doesnt seem possible here - but this does not matter me much).
My question: Is this correct and safe? Can you see any potential pitfall, or some better approach?
Among other things: Does C standard (and de-facto compilers) assume that the structs fields lay in memory in the same order as they are defined in source? This probably is a stupid question, but I'd want to be sure...
UPDATE: Some conclusions from the answers and my own findings:
There seems to be no problem with my approach. In particular, C dictates that struct fields will never change order.
One could also (as suggested by an aswer) count from the last persistent field and add its size : (char*)&myst.x2 + sizeof(&myst.x2) - (char*)(&myst) . That would be equivalent, except that it would include not the padding bytes (if present) for the last field. A very small advantage - and a very small disadvantage, in being less simple.
But the accepted answer, with offsetof, seems to be preferable than my proposal. It's clear-expressive and pure compile-time, it does not require an instance of the struct. It further seems to be standard, available in any compiler.
If one does not need a compile-time construct, and has an instance of the struct available (as is my scenario) both solutions are esentially equivalent.
Have you looked at the offsetof facility? It returns the offset of a member from the start of a struct. So offsetof (st, x2) returns the offset of x2 from the start of the struct. So in your example offsetof (st, x2) + sizeof(st.x2) will give you the count of bytes of the serialized components.
This is pretty similar to what you are doing now, you just get to ignore the padding after x2 and to use a rarely used piece of C.
C guarantees this kind of behaviour. It's intended to allow primitive polymorphism. Consider:
struct X {
int a;
};
struct Y {
int a;
int b;
};
void foo(X* x) {
x->a = 10;
};
Y y;
foo((X*)&y); // Well defined behaviour- y.a = 10.
The C complier may insert padding bytes for alignment, but may not reorder the struct variables.
A cleaner way might be to just define a 2nd struct for sizeof() purposes, that includes the beginning variables of the struct you want. The compiler will guarantee that 2 struts that have the same variables in the same order will layout in the same way.
You may wish to take a look at protobuf; it seems to do what you want in a portable manner.
I vote for the KISS principle. Write to the file element by element and you're guaranteed no compiler dependencies.
I have a linked list, which stores groups of settings for my application:
typedef struct settings {
struct settings* next;
char* name;
char* title;
char* desc;
char* bkfolder;
char* srclist;
char* arcall;
char* incfold;
} settings_row;
settings_row* first_profile = { 0 };
#define SETTINGS_PER_ROW 7
When I load values into this structure, I don't want to have to name all the elements. I would rather treat it like a named array -- the values are loaded in order from a file and placed incrementally into the struct. Then, when I need to use the values, I access them by name.
//putting values incrementally into the struct
void read_settings_file(settings_row* settings){
char* field = settings + sizeof(void*);
int i = 0;
while(read_value_into(field[i]) && i++ < SETTINGS_PER_ROW);
}
//accessing components by name
void settings_info(settings_row* settings){
printf("Settings 'profile': %s\n", settings.title);
printf("Description: %s\n", settings.desc);
printf("Folder to backup to: %s\n", settings.bkfolder);
}
But I wonder, since these are all pointers (and there will only ever be pointers in this struct), will the compiler add padding to any of these values? Are they guaranteed to be in this order, and have nothing between the values? Will my approach work sometimes, but fail intermittently?
edit for clarification
I realize that the compiler can pad any values of a struct--but given the nature of the struct (a struct of pointers) I thought this might not be a problem. Since the most efficient way for a 32 bit processor to address data is in 32 bit chunks, this is how the compiler pads values in a struct (ie. an int, short, int in a struct will add 2 bytes of padding after the short, to make it into a 32 bit chunk, and align the next int to the next 32 bit chunk). But since a 32 bit processor uses 32 bit addresses (and a 64 bit processor uses 64 bit addresses (I think)), would padding be totally unnecessary since all of the values of the struct (addresses, which are efficient by their very nature) are in ideal 32 bit chunks?
I am hoping some memory-representation / compiler-behavior guru can come shed some light on whether a compiler would ever have a reason to pad these values
Under POSIX rules, all pointers (both function pointers and data pointers) are all required to be the same size; under just ISO C, all data pointers are convertible to 'void *' and back without loss of information (but function pointers need not be convertible to 'void *' without loss of information, nor vice versa).
Therefore, if written correctly, your code would work. It isn't written quite correctly, though! Consider:
void read_settings_file(settings_row* settings)
{
char* field = settings + sizeof(void*);
int i = 0;
while(read_value_into(field[i]) && i++ < SETTINGS_PER_ROW)
;
}
Let's assume you're using a 32-bit machine with 8-bit characters; the argument is not all that significantly different if you're using 64-bit machines. The assignment to 'field' is all wrong, because settings + 4 is a pointer to the 5th element (counting from 0) of an array of 'settings_row' structures. What you need to write is:
void read_settings_file(settings_row* settings)
{
char* field = (char *)settings + sizeof(void*);
int i = 0;
while(read_value_into(field[i]) && i++ < SETTINGS_PER_ROW)
;
}
The cast before addition is crucial!
C Standard (ISO/IEC 9899:1999):
6.3.2.3 Pointers
A pointer to void may be converted to or from a pointer to any incomplete or object
type. A pointer to any incomplete or object type may be converted to a pointer to void
and back again; the result shall compare equal to the original pointer.
[...]
A pointer to a function of one type may be converted to a pointer to a function of another
type and back again; the result shall compare equal to the original pointer. If a converted
pointer is used to call a function whose type is not compatible with the pointed-to type,
the behavior is undefined.
In many cases pointers are natural word sizes, so the compiler is unlikely to pad each member, but that doesn't make it a good idea. If you want to treat it like an array you should use an array.
I'm thinking out loud here so there's probably many mistakes but perhaps you could try this approach:
enum
{
kName = 0,
kTitle,
kDesc,
kBkFolder,
kSrcList,
kArcAll,
kIncFold,
kSettingsCount
};
typedef struct settings {
struct settings* next;
char *settingsdata[kSettingsCount];
} settings_row;
Set the data:
settings_row myRow;
myRow.settingsData[kName] = "Bob";
myRow.settingsData[kDescription] = "Hurrrrr";
...
Reading the data:
void read_settings_file(settings_row* settings){
char** field = settings->settingsData;
int i = 0;
while(read_value_into(field[i]) && i++ < SETTINGS_PER_ROW);
}
It's not guaranteed by the C standard. I've a sneaking suspicion, that I don't have time to check right now either way, that it guarantees no padding between the char* fields, i.e. that consecutive fields of the same type in a struct are guaranteed to be layout-compatible with an array of that type. But even if so, you're on your own between the settings* and the first char*, and also between the last char* and the end of the struct. But you could use offsetof to deal with the first issue, and I don't think the second affects your current code.
However, what you want is almost certainly guaranteed by your compiler, which somewhere in its documentation will set out its rules for struct layout, and will almost certainly say that all pointers to data are word sized, and that a struct can be the size of 8 words without additional padding. But if you want to write highly portable code, you have to use only the guarantees in the standard.
The order of fields is guaranteed. I also don't think you'll see intermittent failure - AFAIK the offset of each field in that struct will be consistent for a given implementation (meaning the combination of compiler and platform).
You could assert that sizeof(settings*) == sizeof(char*) and sizeof(settings_row) == sizeof(char*)*8. If both those hold, there is no room for any padding in the struct, since fields are not allowed to "overlap". If you ever hit a platform where they don't hold, you'll find out.
Even so, if you want an array, I'd be inclined to say use an array, with inline accessor functions or macros to get the individual fields. Whether your trick works or not, it's even easier not to think about it at all.
Although not a duplicate, this probably answers your question:
Why isn't sizeof for a struct equal to the sum of sizeof of each member?
It's not uncommon for applications to write an entire struct into a file and read it back out again. But this suffers from the possibility that one day the file will need to be read back on another platform, or by another version of the compiler that packs the struct differently. (Although this can be dealt with by specially-written code that understands the original packing format).
Technically, you can rely only on the order; the compiler could insert padding. If different pointers were of different size, or if the pointer size wasn't a natural word size, it might insert padding.
Practically speaking, you could get away with it. I wouldn't recommend it; it's a bad, dirty trick.
You could achieve your goal with another level of indirection (what doesn't that solve?), or by using a temporary array initialized to point to the various members of the structure.
It's not guaranteed, but it will work fine in most cases. It won't be intermittent, it will either work or not work on a particular platform with a particular build. Since you're using all pointers, most compilers won't mess with any padding.
Also, if you wanted to be safer, you could make it a union.
You can't do that the way you are trying. The compiler is allowed to pad any and all members of the struct. I do not believe it is allowed to reorder the fields.
Most compilers have an attribute that can be applied to the struct to pack it (ie to turn it into a collection of tightly packed storage with no padding), but the downside is that this generally affects performance. The packed flag will probably allow you to use the struct the way you want, but it may not be portable across various platforms.
Padding is designed to make field access as efficient as possible on the target architecture. It's best not to fight it unless you have to (ie, the struct goes to a disk or over a network.)
It seems to me that this approach creates more problems than it solves.
When you read this code six months from now, will you still be aware of all the subtleties of how the compiler pads a struct?
Would someone else, who didn't write the code?
If you must use the struct, use it in the canonical way and just write a function which
assigns values to each field separately.
You could also use an array and create macros to give field names to indices.
If you get too "clever" about optimizing your code, you will end up with slower code anyway, since the compiler won't be able to optimize it as well.
How do you compare two instances of structs for equality in standard C?
C provides no language facilities to do this - you have to do it yourself and compare each structure member by member.
You may be tempted to use memcmp(&a, &b, sizeof(struct foo)), but it may not work in all situations. The compiler may add alignment buffer space to a structure, and the values found at memory locations lying in the buffer space are not guaranteed to be any particular value.
But, if you use calloc or memset the full size of the structures before using them, you can do a shallow comparison with memcmp (if your structure contains pointers, it will match only if the address the pointers are pointing to are the same).
If you do it a lot I would suggest writing a function that compares the two structures. That way, if you ever change the structure you only need to change the compare in one place.
As for how to do it.... You need to compare every element individually
You can't use memcmp to compare structs for equality due to potential random padding characters between field in structs.
// bad
memcmp(&struct1, &struct2, sizeof(struct1));
The above would fail for a struct like this:
typedef struct Foo {
char a;
/* padding */
double d;
/* padding */
char e;
/* padding */
int f;
} Foo ;
You have to use member-wise comparison to be safe.
#Greg is correct that one must write explicit comparison functions in the general case.
It is possible to use memcmp if:
the structs contain no floating-point fields that are possibly NaN.
the structs contain no padding (use -Wpadded with clang to check this) OR the structs are explicitly initialized with memset at initialization.
there are no member types (such as Windows BOOL) that have distinct but equivalent values.
Unless you are programming for embedded systems (or writing a library that might be used on them), I would not worry about some of the corner cases in the C standard. The near vs. far pointer distinction does not exist on any 32- or 64- bit device. No non-embedded system that I know of has multiple NULL pointers.
Another option is to auto-generate the equality functions. If you lay your struct definitions out in a simple way, it is possible to use simple text processing to handle simple struct definitions. You can use libclang for the general case – since it uses the same frontend as Clang, it handles all corner cases correctly (barring bugs).
I have not seen such a code generation library. However, it appears relatively simple.
However, it is also the case that such generated equality functions would often do the wrong thing at application level. For example, should two UNICODE_STRING structs in Windows be compared shallowly or deeply?
Note you can use memcmp() on non static stuctures without
worrying about padding, as long as you don't initialise
all members (at once). This is defined by C90:
http://www.pixelbeat.org/programming/gcc/auto_init.html
It depends on whether the question you are asking is:
Are these two structs the same object?
Do they have the same value?
To find out if they are the same object, compare pointers to the two structs for equality.
If you want to find out in general if they have the same value you have to do a deep comparison. This involves comparing all the members. If the members are pointers to other structs you need to recurse into those structs too.
In the special case where the structs do not contain pointers you can do a memcmp to perform a bitwise comparison of the data contained in each without having to know what the data means.
Make sure you know what 'equals' means for each member - it is obvious for ints but more subtle when it comes to floating-point values or user-defined types.
memcmp does not compare structure, memcmp compares the binary, and there is always garbage in the struct, therefore it always comes out False in comparison.
Compare element by element its safe and doesn't fail.
If the structs only contain primitives or if you are interested in strict equality then you can do something like this:
int my_struct_cmp(const struct my_struct * lhs, const struct my_struct * rhs)
{
return memcmp(lhs, rsh, sizeof(struct my_struct));
}
However, if your structs contain pointers to other structs or unions then you will need to write a function that compares the primitives properly and make comparison calls against the other structures as appropriate.
Be aware, however, that you should have used memset(&a, sizeof(struct my_struct), 1) to zero out the memory range of the structures as part of your ADT initialization.
if the 2 structures variable are initialied with calloc or they are set with 0 by memset so you can compare your 2 structures with memcmp and there is no worry about structure garbage and this will allow you to earn time
This compliant example uses the #pragma pack compiler extension from Microsoft Visual Studio to ensure the structure members are packed as tightly as possible:
#include <string.h>
#pragma pack(push, 1)
struct s {
char c;
int i;
char buffer[13];
};
#pragma pack(pop)
void compare(const struct s *left, const struct s *right) {
if (0 == memcmp(left, right, sizeof(struct s))) {
/* ... */
}
}