Recently I reviewed some C code and found something equivalent to the following:
struct foo {
int some_innocent_variables;
double some_big_array[VERY_LARGE_NUMBER];
}
Being almost, but not quite, almost entirely a newbie in C, am I right in thinking that this struct is awfully inefficient in its use of space because of the array member? What happens when this struct gets passed as an argument to a function? Is it copied in its entirety on the stack, including the full array?
Would it be better in most cases to have a double *some_pointer instead?
If you pass by value yes it will make a copy of everything.
But that's why pointers exist.
//Just the address is passed
void doSomething(struct foo *myFoo)
{
}
Being passed as an argument it will be copied which is very inefficient way of passing structures, especially big ones. However, basically, structs are passed to functions by pointer.
Choosing between
double some_big_array[VERY_LARGE_NUMBER];
and
double *some_pointer
depends only on the program design and how this field/structure will be used. The latter allows using variable size storage, however may need dynamic allocation.
There are plenty of reasons to use arrays in structs. Among them is the fact that structs are passed to functions by value, while arrays are passed by reference. That said, this struct is probably passed to functions with pointers.
As others have said, objects of that type are usually passed around with pointers (always sizeof (struct foo) bytes, often 4 bytes).
You may also see the "struct hack" (also passed around with pointers):
struct foo {
int some_innocent_variables;
double some_array[]; /* C99 flexible array member */
/* double some_array[1]; ** the real C89 "struck hack" */
}
This "struct hack" gets an array sized by the malloc call.
/* allocate an object of struct foo type with an array with 42 elements */
struct foo *myfoo = malloc(sizeof *myfoo + 42 * sizeof *myfoo->some_array);
/* some memory may be wasted when using C89 and
the "struct hack" and this allocation method */
Yes, in C you would usually pass a pointer to the structure around due to efficiency reasons.
That structure is fine as long as you pass it by reference (using a pointer).
Offtopic:
Beware of the struct hack, as it is not strictly standard compliant; it ignores the automatic padding. The Unix IPC messaging queues use it (see struct msgbuf), though, and it is almost certainly to work with any compiler.
That said, the functions that use that structure may use pointers to it instead of using a copy.
Related
I must say, I have quite a conundrum in a seemingly elementary problem. I have a structure, in which I would like to store an array as a field. I'd like to reuse this structure in different contexts, and sometimes I need a bigger array, sometimes a smaller one. C prohibits the use of variable-sized buffer. So the natural approach would be declaring a pointer to this array as struct member:
struct my {
struct other* array;
}
The problem with this approach however, is that I have to obey the rules of MISRA-C, which prohibits dynamic memory allocation. So then if I'd like to allocate memory and initialize the array, I'm forced to do:
var.array = malloc(n * sizeof(...));
which is forbidden by MISRA standards. How else can I do this?
Since you are following MISRA-C, I would guess that the software is somehow mission-critical, in which case all memory allocation must be deterministic. Heap allocation is banned by every safety standard out there, not just by MISRA-C but by the more general safety standards as well (IEC 61508, ISO 26262, DO-178 and so on).
In such systems, you must always design for the worst-case scenario, which will consume the most memory. You need to allocate exactly that much space, no more, no less. Everything else does not make sense in such a system.
Given those pre-requisites, you must allocate a static buffer of size LARGE_ENOUGH_FOR_WORST_CASE. Once you have realized this, you simply need to find a way to keep track of what kind of data you have stored in this buffer, by using an enum and maybe a "size used" counter.
Please note that not just malloc/calloc, but also VLAs and flexible array members are banned by MISRA-C:2012. And if you are using C90/MISRA-C:2004, there are no VLAs, nor are there any well-defined use of flexible array members - they invoked undefined behavior until C99.
Edit: This solution does not conform to MISRA-C rules.
You can kind of include VLAs in a struct definition, but only when it's inside a function. A way to get around this is to use a "flexible array member" at the end of your main struct, like so:
#include <stdio.h>
struct my {
int len;
int array[];
};
You can create functions that operate on this struct.
void print_my(struct my *my) {
int i;
for (i = 0; i < my->len; i++) {
printf("%d\n", my->array[i]);
}
}
Then, to create variable length versions of this struct, you can create a new type of struct in your function body, containing your my struct, but also defining a length for that buffer. This can be done with a varying size parameter. Then, for all the functions you call, you can just pass around a pointer to the contained struct my value, and they will work correctly.
void create_and_use_my(int nelements) {
int i;
// Declare the containing struct with variable number of elements.
struct {
struct my my;
int array[nelements];
} my_wrapper;
// Initialize the values in the struct.
my_wrapper.my.len = nelements;
for (i = 0; i < nelements; i++) {
my_wrapper.my.array[i] = i;
}
// Print the struct using the generic function above.
print_my(&my_wrapper.my);
}
You can call this function with any value of nelements and it will work fine. This requires C99, because it does use VLAs. Also, there are some GCC extensions that make this a bit easier.
Important: If you pass the struct my to another function, and not a pointer to it, I can pretty much guarantee you it will cause all sorts of errors, since it won't copy the variable length array with it.
Here's a thought that may be totally inappropriate for your situation, but given your constraints I'm not sure how else to deal with it.
Create a large static array and use this as your "heap":
static struct other heap[SOME_BIG_NUMBER];
You'll then "allocate" memory from this "heap" like so:
var.array = &heap[start_point];
You'll have to do some bookkeeping to keep track of what parts of your "heap" have been allocated. This assumes that you don't have any major constraints on the size of your executable.
Generally it is preferred to pass pointer to structure to a function in C, in order to avoid copying during function call. This has an unwanted side effect that the called function can modify the elements of the structure inadvertently. What is a good programming practice to avoid such errors without compromising on the efficiency of the function call ?
Pass a pointer-to-const is the obvious answer
void foo(const struct some_struct *p)
That will prevent you from modifying the immediate members of the struct inadvertently. That's what const is for.
In fact, your question sounds like a copy-paste from some quiz card, with const being the expected answer.
In general, when it comes to simple optimizations like what you've described, it is often preferable to use a pointer-to-struct rather than passing a struct itself, as passing a whole struct means more overhead from extra data being copied onto the call stack.
The example below is a fairly common approach:
#include <errno.h>
typedef struct myStruct {
int i;
char c;
} myStruct_t;
int myFunc(myStruct_t* pStruct) {
if (!pStruct) {
return EINVAL;
}
// Do some stuff
return 0;
}
If you want to avoid modifying the data passed to the function, just make sure that the data is immutable by modifying the function prototype.
int myFunc(const myStruct_t* pStruct)
You will also benefit from reading up on "const correctness".
A very common idiom, particularly in unix/posix style system code is to have the caller allocate a struct, and pass a pointer to that struct through the function call.
This is a little different than what I think your asking about where you are passing data into a function with a struct (where as others have mention you may the function to treat the struct as const). In these cases, the struct is empty (or only partially full) before the function call. The caller will do something like allocate an empty struct and then passes a pointer to this struct. Probably different than your general question, but relevant to the discussion I think.
This accomplishes a couple handy things. It avoids copying a possibly large structure, also it lets the caller fill in some fields and the callee to fill out other (giving an effective shared space for communication).
The most important aspect to this idiom is that the caller has full control over the allocation of the struct. It can have it on the stack, heap, reuse the same one repeatedly, but where it comes from the caller is responsible for the handling the memory.
This is one of the problems with passing around struct pointers; you can easily lose track of who allocated the struct and whose responsibility it is to free it. This idiom gives you the advantage of not having to copy the struct around, while making it clear who has the job of free'ing the memory is.
I am working on refactoring some old code and have found few structs containing zero length arrays (below). Warnings depressed by pragma, of course, but I've failed to create by "new" structures containing such structures (error 2233). Array 'byData' used as pointer, but why not to use pointer instead? or array of length 1? And of course, no comments were added to make me enjoy the process...
Any causes to use such thing? Any advice in refactoring those?
struct someData
{
int nData;
BYTE byData[0];
}
NB It's C++, Windows XP, VS 2003
Yes this is a C-Hack.
To create an array of any length:
struct someData* mallocSomeData(int size)
{
struct someData* result = (struct someData*)malloc(sizeof(struct someData) + size * sizeof(BYTE));
if (result)
{ result->nData = size;
}
return result;
}
Now you have an object of someData with an array of a specified length.
There are, unfortunately, several reasons why you would declare a zero length array at the end of a structure. It essentially gives you the ability to have a variable length structure returned from an API.
Raymond Chen did an excellent blog post on the subject. I suggest you take a look at this post because it likely contains the answer you want.
Note in his post, it deals with arrays of size 1 instead of 0. This is the case because zero length arrays are a more recent entry into the standards. His post should still apply to your problem.
http://blogs.msdn.com/oldnewthing/archive/2004/08/26/220873.aspx
EDIT
Note: Even though Raymond's post says 0 length arrays are legal in C99 they are in fact still not legal in C99. Instead of a 0 length array here you should be using a length 1 array
This is an old C hack to allow a flexible sized arrays.
In C99 standard this is not neccessary as it supports the arr[] syntax.
Your intution about "why not use an array of size 1" is spot on.
The code is doing the "C struct hack" wrong, because declarations of zero length arrays are a constraint violation. This means that a compiler can reject your hack right off the bat at compile time with a diagnostic message that stops the translation.
If we want to perpetrate a hack, we must sneak it past the compiler.
The right way to do the "C struct hack" (which is compatible with C dialects going back to 1989 ANSI C, and probably much earlier) is to use a perfectly valid array of size 1:
struct someData
{
int nData;
unsigned char byData[1];
}
Moreover, instead of sizeof struct someData, the size of the part before byData is calculated using:
offsetof(struct someData, byData);
To allocate a struct someData with space for 42 bytes in byData, we would then use:
struct someData *psd = (struct someData *) malloc(offsetof(struct someData, byData) + 42);
Note that this offsetof calculation is in fact the correct calculation even in the case of the array size being zero. You see, sizeof the whole structure can include padding. For instance, if we have something like this:
struct hack {
unsigned long ul;
char c;
char foo[0]; /* assuming our compiler accepts this nonsense */
};
The size of struct hack is quite possibly padded for alignment because of the ul member. If unsigned long is four bytes wide, then quite possibly sizeof (struct hack) is 8, whereas offsetof(struct hack, foo) is almost certainly 5. The offsetof method is the way to get the accurate size of the preceding part of the struct just before the array.
So that would be the way to refactor the code: make it conform to the classic, highly portable struct hack.
Why not use a pointer? Because a pointer occupies extra space and has to be initialized.
There are other good reasons not to use a pointer, namely that a pointer requires an address space in order to be meaningful. The struct hack is externalizeable: that is to say, there are situations in which such a layout conforms to external storage such as areas of files, packets or shared memory, in which you do not want pointers because they are not meaningful.
Several years ago, I used the struct hack in a shared memory message passing interface between kernel and user space. I didn't want pointers there, because they would have been meaningful only to the original address space of the process generating a message. The kernel part of the software had a view to the memory using its own mapping at a different address, and so everything was based on offset calculations.
It's worth pointing out IMO the best way to do the size calculation, which is used in the Raymond Chen article linked above.
struct foo
{
size_t count;
int data[1];
}
size_t foo_size_from_count(size_t count)
{
return offsetof(foo, data[count]);
}
The offset of the first entry off the end of desired allocation, is also the size of the desired allocation. IMO it's an extremely elegant way of doing the size calculation. It does not matter what the element type of the variable size array is. The offsetof (or FIELD_OFFSET or UFIELD_OFFSET in Windows) is always written the same way. No sizeof() expressions to accidentally mess up.
I'm sort of learning C, I'm not a beginner to programming though, I "know" Java and python, and by the way I'm on a mac (leopard).
Firstly,
1: could someone explain when to use a pointer and when not to?
2:
char *fun = malloc(sizeof(char) * 4);
or
char fun[4];
or
char *fun = "fun";
And then all but the last would set indexes 0, 1, 2 and 3 to 'f', 'u', 'n' and '\0' respectively. My question is, why isn't the second one a pointer? Why char fun[4] and not char *fun[4]? And how come it seems that a pointer to a struct or an int is always an array?
3:
I understand this:
typedef struct car
{
...
};
is a shortcut for
struct car
{
...
};
typedef struct car car;
Correct? But something I am really confused about:
typedef struct A
{
...
}B;
What is the difference between A and B? A is the 'tag-name', but what's that? When do I use which? Same thing for enums.
4. I understand what pointers do, but I don't understand what the point of them is (no pun intended). And when does something get allocated on the stack vs. the heap? How do I know where it gets allocated? Do pointers have something to do with it?
5. And lastly, know any good tutorial for C game programming (simple) ? And for mac/OS X, not windows?
PS. Is there any other name people use to refer to just C, not C++? I hate how they're all named almost the same thing, so hard to try to google specifically C and not just get C++ and C# stuff.
Thanks!!
It was hard to pick a best answer, they were all great, but the one I picked was the only one that made me understand my 3rd question, which was the only one I was originally going to ask. Thanks again!
My question is, why isn't the second one a pointer?
Because it declares an array. In the two other cases, you have a pointer that refers to data that lives somewhere else. Your array declaration, however, declares an array of data that lives where it's declared. If you declared it within a function, then data will die when you return from that function. Finally char *fun[4] would be an array of 4 pointers - it wouldn't be a char pointer. In case you just want to point to a block of 4 chars, then char* would fully suffice, no need to tell it that there are exactly 4 chars to be pointed to.
The first way which creates an object on the heap is used if you need data to live from thereon until the matching free call. The data will survive a return from a function.
The last way just creates data that's not intended to be written to. It's a pointer which refers to a string literal - it's often stored in read-only memory. If you write to it, then the behavior is undefined.
I understand what pointers do, but I don't understand what the point of them is (no pun intended).
Pointers are used to point to something (no pun, of course). Look at it like this: If you have a row of items on the table, and your friend says "pick the second item", then the item won't magically walk its way to you. You have to grab it. Your hand acts like a pointer, and when you move your hand back to you, you dereference that pointer and get the item. The row of items can be seen as an array of items:
And how come it seems that a pointer to a struct or an int is always an array?
item row[5];
When you do item i = row[1]; then you first point your hand at the first item (get a pointer to the first one), and then you advance till you are at the second item. Then you take your hand with the item back to you :) So, the row[1] syntax is not something special to arrays, but rather special to pointers - it's equivalent to *(row + 1), and a temporary pointer is made up when you use an array like that.
What is the difference between A and B? A is the 'tag-name', but what's that? When do I use which? Same thing for enums.
typedef struct car
{
...
};
That's not valid code. You basically said "define the type struct car { ... } to be referable by the following ordinary identifier" but you missed to tell it the identifier. The two following snippets are equivalent instead, as far as i can see
1)
struct car
{
...
};
typedef struct car car;
2)
typedef struct car
{
...
} car;
What is the difference between A and B? A is the 'tag-name', but what's that? When do I use which? Same thing for enums.
In our case, the identifier car was declared two times in the same scope. But the declarations won't conflict because each of the identifiers are in a different namespace. The two namespaces involved are the ordinary namespace and the tag namespace. A tag identifier needs to be used after a struct, union or enum keyword, while an ordinary identifier doesn't need anything around it. You may have heard of the POSIX function stat, whose interface looks like the following
struct stat {
...
};
int stat(const char *path, struct stat *buf);
In that code snippet, stat is registered into the two aforementioned namespaces too. struct stat will refer to the struct, and merely stat will refer to the function. Some people don't like to precede identifiers always with struct, union or enum. Those use typedef to introduce an ordinary identifier that will refer to the struct too. The identifier can of course be the same (both times car), or they can differ (one time A the other time B). It doesn't matter.
3) It's bad style to use two different names A and B:
typedef struct A
{
...
} B;
With that definition, you can say
struct A a;
B b;
b.field = 42;
a.field = b.field;
because the variables a and b have the same type. C programmers usually say
typedef struct A
{
...
} A;
so that you can use "A" as a type name, equivalent to "struct A" but it saves you a lot of typing.
Use them when you need to. Read some more examples and tutorials until you understand what pointers are, and this ought to be a lot clearer :)
The second case creates an array in memory, with space for four bytes. When you use that array's name, you magically get back a pointer to the first (index 0) element. And then the [] operator then actually works on a pointer, not an array - x[y] is equivalent to *(x + y). And yes, this means x[y] is the same as y[x]. Sorry.
Note also that when you add an integer to a pointer, it's multiplied by the size of the pointed-to elements, so if you do someIntArray[1], you get the second (index 1) element, not somewhere inbetween starting at the first byte.
Also, as a final gotcha - array types in function argument lists - eg, void foo(int bar[4]) - secretly get turned into pointer types - that is, void foo(int *bar). This is only the case in function arguments.
Your third example declares a struct type with two names - struct A and B. In pure C, the struct is mandatory for A - in C++, you can just refer to it as either A or B. Apart from the name change, the two types are completely equivalent, and you can substitute one for the other anywhere, anytime without any change in behavior.
C has three places things can be stored:
The stack - local variables in functions go here. For example:
void foo() {
int x; // on the stack
}
The heap - things go here when you allocate them explicitly with malloc, calloc, or realloc.
void foo() {
int *x; // on the stack
x = malloc(sizeof(*x)); // the value pointed to by x is on the heap
}
Static storage - global variables and static variables, allocated once at program startup.
int x; // static
void foo() {
static int y; // essentially a global that can only be used in foo()
}
No idea. I wish I didn't need to answer all questions at once - this is why you should split them up :)
Note: formatting looks ugly due to some sort of markdown bug, if anyone knows of a workaround please feel free to edit (and remove this note!)
char *fun = malloc(sizeof(char) * 4);
or
char fun[4];
or
char *fun = "fun";
The first one can be set to any size you want at runtime, and be resized later - you can also free the memory when you are done.
The second one is a pointer really 'fun' is the same as char ptr=&fun[0].
I understand what pointers do, but I don't understand what the point of
them is (no pun intended). And when
does something get allocated on the
stack vs. the heap? How do I know
where it gets allocated? Do pointers
have something to do with it?
When you define something in a function like "char fun[4]" it is defined on the stack and the memory isn't available outside the function.
Using malloc (or new in C++) reserves memory on the heap - you can make this data available anywhere in the program by passing it the pointer. This also lets you decide the size of the memory at runtime and finaly the size of the stack is limited (typically 1Mb) while on the heap you can reserve all the memory you have available.
edit 5. Not really - I would say pure C. C++ is (almost) a superset of C so unless you are working on a very limited embedded system it's usualy OK to use C++.
\5. Chipmunk
Fast and lightweight 2D rigid body physics library in C.
Designed with 2D video games in mind.
Lightweight C99 implementation with no external dependencies outside of the Std. C library.
Many language bindings available.
Simple, read the documentation and see!
Unrestrictive MIT license.
Makes you smarter, stronger and more attractive to the opposite gender!
...
In your second question:
char *fun = malloc(sizeof(char) * 4);
vs
char fun[4];
vs
char *fun = "fun";
These all involve an array of 4 chars, but that's where the similarity ends. Where they differ is in the lifetime, modifiability and initialisation of those chars.
The first one creates a single pointer to char object called fun - this pointer variable will live only from when this function starts until the function returns. It also calls the C standard library and asks it to dynamically create a memory block the size of an array of 4 chars, and assigns the location of the first char in the block to fun. This memory block (which you can treat as an array of 4 chars) has a flexible lifetime that's entirely up to the programmer - it lives until you pass that memory location to free(). Note that this means that the memory block created by malloc can live for a longer or shorter time than the pointer variable fun itself does. Note also that the association between fun and that memory block is not fixed - you can change fun so it points to different memory block, or make a different pointer point to that memory block.
One more thing - the array of 4 chars created by malloc is not initialised - it contains garbage values.
The second example creates only one object - an array of 4 chars, called fun. (To test this, change the 4 to 40 and print out sizeof(fun)). This array lives only until the function it's declared in returns (unless it's declared outside of a function, when it lives for as long as the entire program is running). This array of 4 chars isn't initialised either.
The third example creates two objects. The first is a pointer-to-char variable called fun, just like in the first example (and as usual, it lives from the start of this function until it returns). The other object is a bit strange - it's an array of 4 chars, initialised to { 'f', 'u', 'n', 0 }, which has no name and that lives for as long as the entire program is running. It's also not guaranteed to be modifiable (although what happens if you try to modify it is left entirely undefined - it might crash your program, or it might not). The variable fun is initialised with the location of this strange unnamed, unmodifiable, long-lived array (but just like in the first example, this association isn't permanent - you can make fun point to something else).
The reason why there's so many confusing similarities and differences between arrays and pointers is down to two things:
The "array syntax" in C (the [] operator) actually works on pointers, not arrays!
Trying to pin down an array is a bit like catching fog - in almost all cases the array evaporates and is replaced by a pointer to its first element instead.
How do you compare two instances of structs for equality in standard C?
C provides no language facilities to do this - you have to do it yourself and compare each structure member by member.
You may be tempted to use memcmp(&a, &b, sizeof(struct foo)), but it may not work in all situations. The compiler may add alignment buffer space to a structure, and the values found at memory locations lying in the buffer space are not guaranteed to be any particular value.
But, if you use calloc or memset the full size of the structures before using them, you can do a shallow comparison with memcmp (if your structure contains pointers, it will match only if the address the pointers are pointing to are the same).
If you do it a lot I would suggest writing a function that compares the two structures. That way, if you ever change the structure you only need to change the compare in one place.
As for how to do it.... You need to compare every element individually
You can't use memcmp to compare structs for equality due to potential random padding characters between field in structs.
// bad
memcmp(&struct1, &struct2, sizeof(struct1));
The above would fail for a struct like this:
typedef struct Foo {
char a;
/* padding */
double d;
/* padding */
char e;
/* padding */
int f;
} Foo ;
You have to use member-wise comparison to be safe.
#Greg is correct that one must write explicit comparison functions in the general case.
It is possible to use memcmp if:
the structs contain no floating-point fields that are possibly NaN.
the structs contain no padding (use -Wpadded with clang to check this) OR the structs are explicitly initialized with memset at initialization.
there are no member types (such as Windows BOOL) that have distinct but equivalent values.
Unless you are programming for embedded systems (or writing a library that might be used on them), I would not worry about some of the corner cases in the C standard. The near vs. far pointer distinction does not exist on any 32- or 64- bit device. No non-embedded system that I know of has multiple NULL pointers.
Another option is to auto-generate the equality functions. If you lay your struct definitions out in a simple way, it is possible to use simple text processing to handle simple struct definitions. You can use libclang for the general case – since it uses the same frontend as Clang, it handles all corner cases correctly (barring bugs).
I have not seen such a code generation library. However, it appears relatively simple.
However, it is also the case that such generated equality functions would often do the wrong thing at application level. For example, should two UNICODE_STRING structs in Windows be compared shallowly or deeply?
Note you can use memcmp() on non static stuctures without
worrying about padding, as long as you don't initialise
all members (at once). This is defined by C90:
http://www.pixelbeat.org/programming/gcc/auto_init.html
It depends on whether the question you are asking is:
Are these two structs the same object?
Do they have the same value?
To find out if they are the same object, compare pointers to the two structs for equality.
If you want to find out in general if they have the same value you have to do a deep comparison. This involves comparing all the members. If the members are pointers to other structs you need to recurse into those structs too.
In the special case where the structs do not contain pointers you can do a memcmp to perform a bitwise comparison of the data contained in each without having to know what the data means.
Make sure you know what 'equals' means for each member - it is obvious for ints but more subtle when it comes to floating-point values or user-defined types.
memcmp does not compare structure, memcmp compares the binary, and there is always garbage in the struct, therefore it always comes out False in comparison.
Compare element by element its safe and doesn't fail.
If the structs only contain primitives or if you are interested in strict equality then you can do something like this:
int my_struct_cmp(const struct my_struct * lhs, const struct my_struct * rhs)
{
return memcmp(lhs, rsh, sizeof(struct my_struct));
}
However, if your structs contain pointers to other structs or unions then you will need to write a function that compares the primitives properly and make comparison calls against the other structures as appropriate.
Be aware, however, that you should have used memset(&a, sizeof(struct my_struct), 1) to zero out the memory range of the structures as part of your ADT initialization.
if the 2 structures variable are initialied with calloc or they are set with 0 by memset so you can compare your 2 structures with memcmp and there is no worry about structure garbage and this will allow you to earn time
This compliant example uses the #pragma pack compiler extension from Microsoft Visual Studio to ensure the structure members are packed as tightly as possible:
#include <string.h>
#pragma pack(push, 1)
struct s {
char c;
int i;
char buffer[13];
};
#pragma pack(pop)
void compare(const struct s *left, const struct s *right) {
if (0 == memcmp(left, right, sizeof(struct s))) {
/* ... */
}
}