Why use address of first element of struct, rather than struct itself? - c

I've just come upon yet another code base at work where developers consistently use the address of the first element of structs when copying/comparing/setting, rather than the struct itself. Here's a simple example.
First there's a struct type:
typedef struct {
int a;
int b;
} foo_t;
Then there's a function that makes a copy of such a struct:
void bar(foo_t *inp)
{
foo_t l;
...
memcpy(&l.a, &inp->a, sizeof(foo_t));
...
}
I wouldn't write a call to memcpy that way myself, and I started out suspecting that the original developers simply didn't quite grasp pointers and structs in C. However, I've now seen this in two unrelated code bases with no developers in common, so I'm starting to doubt myself.
Why would one want to use this style?

Nobody should do that.
If you rearrange struct members you are in trouble.

Instead of that:
memcpy(&l.a, &inp->a, sizeof(foo_t));
you can do that:
memcpy(&l, inp, sizeof(foo_t));
While it can be dangerous and misleading, both statements actually do the same thing here as C guarantees there is no padding before the first structure member.
But the best is just to copy the structure objects using a simple assignment operator:
l = *inp;
Why would one want to use this style?
My guess: ignorance or bad discipline.
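To make the comparison concrete, here is a minimal sketch using the foo_t type from the question. The function names are hypothetical and exist only for illustration; the point is that assignment and a whole-object memcpy give the same result, and that the first member is at offset 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    int a;
    int b;
} foo_t;

/* Copy via plain assignment: the compiler knows the size and layout. */
foo_t copy_by_assignment(const foo_t *inp)
{
    assert(inp != NULL);
    return *inp;
}

/* Copy via memcpy on the whole object; sizeof l follows the variable's type. */
foo_t copy_by_memcpy(const foo_t *inp)
{
    foo_t l;
    assert(inp != NULL);
    memcpy(&l, inp, sizeof l);
    return l;
}

/* Returns 1 when the first member starts at offset 0, which is why
   &l.a and &l happen to hold the same address. */
int first_member_at_offset_zero(void)
{
    return offsetof(foo_t, a) == 0;
}
```

Neither version names a member, so both keep working if the members of foo_t are reordered.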

One wouldn't. If you ever moved a in the struct or you inserted member(s) before it, you would introduce a memory smashing bug.

This code is unsafe because rearranging the members of the struct can result in the memcpy accessing beyond the bounds of the struct if member a is no longer the first member.
However, it's conceivable that members are intentionally ordered within the struct and programmer only wants to copy a subset of them, beginning with member a and running until the end of the struct. If that's the case then the code can be made safe with the following change:
memcpy(&l.a, &inp->a, sizeof(foo_t) - offsetof(foo_t, a));
Now the struct members may be rearranged into any order and this memcpy will never go out of bounds.
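As a rough sketch of that "copy from member a to the end" idea, assume a hypothetical rec_t with one member (id) that should not be copied. The start address is computed from the struct base with offsetof rather than from &dst->a, which is a stylistic choice of this sketch and not something the answer requires.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout: 'id' is deliberately excluded from the copy. */
typedef struct {
    int id;
    int a;
    int b;
} rec_t;

/* Copy every byte from member 'a' to the end of the struct, leaving the
   members that precede 'a' in dst untouched. */
void copy_tail(rec_t *dst, const rec_t *src)
{
    const size_t start = offsetof(rec_t, a);

    assert(dst != NULL && src != NULL);
    memcpy((char *)dst + start, (const char *)src + start,
           sizeof(rec_t) - start);
}
```

Because the length is derived from offsetof, moving a elsewhere in the struct changes how much is copied but never lets the copy run past the end of the object.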

Actually, there is one legitimate use case for this: constructing a class hierarchy.
When treating structs as a class instances, the first member (i.e. offset 0) will typically be the supertype instance... if a supertype exists. This allows a simple cast to move between using the subtype vs. the supertype. Very useful.
On Darren Stone's note about intention, this is expected when executing OO in the C language.
In any other case, I would suggest avoiding this pattern and accessing the member directly instead, for reasons already cited.
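The supertype-as-first-member idea can be sketched as follows. shape_t, circle_t and both functions are hypothetical names invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "supertype". */
typedef struct {
    int x;
    int y;
} shape_t;

/* Hypothetical "subtype": the supertype must be the first member. */
typedef struct {
    shape_t base;
    int radius;
} circle_t;

/* Works on any object whose first member is a shape_t. */
void shape_move(shape_t *s, int dx, int dy)
{
    assert(s != NULL);
    s->x += dx;
    s->y += dy;
}

/* Upcast: a pointer to a struct, suitably converted, points to its
   first member, so this is the same address as &c->base. */
shape_t *circle_as_shape(circle_t *c)
{
    return (shape_t *)c;
}
```

Writing &c->base instead of the cast gives the same pointer and keeps working even if base stops being the first member, which is why the cast form is only worth using when the layout is a deliberate part of the design.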

It's a really bad habit. The struct might have another member prepended, for example. This is an insanely careless habit and I am surprised to read that anyone would do this.
Others have already noted these; the one that bugs me is this:
struct Foo rgFoo[3];
struct Foo *pfoo = &rgFoo[0];
instead of
struct Foo *pfoo = rgFoo;
Why index into the array and then take the address again? The array name already converts to a pointer to its first element. The only difference of note is that rgFoo itself cannot be reassigned, so it behaves like a
struct Foo *const,
whereas pfoo is a plain
struct Foo *.
Yet I used to see the first one all the time.

Related

Is there a principle for choosing between embedding a struct itself or the pointer to a struct inside a struct?

This is a code snippet from qemu (qemu-5.1.0, include/hw/arm/smmu-common.h):
typedef struct SMMUDevice {
void *smmu;
PCIBus *bus;
int devfn;
IOMMUMemoryRegion iommu;
AddressSpace as;
uint32_t cfg_cache_hits;
uint32_t cfg_cache_misses;
QLIST_ENTRY(SMMUDevice) next;
} SMMUDevice;
I've seen a lot of code like this, and I'm curious whether there is any principle or rule for choosing between
embedding a struct A inside a struct B
embedding a pointer to the struct A inside a struct B
Two things come to mind right away. If struct A is to be shared by many structs, it is better to use a pointer. And if the containing struct (that is, struct B) is frequently passed as a function argument, it would be better to use a pointer (either a pointer to struct B as the argument, or a pointer to A inside struct B with struct B as the argument), because copying the struct onto the stack would take a long time.
I am curious if there are other important rules.
There's no correct answer because it depends on what you want to use them for. Storing a struct inside another struct is generally more efficient, since it gives faster access and better data cache use.
However, it isn't as flexible. If you wish to swap out the whole contents of a big struct for something else, it goes much faster to just swap two pointers than doing a hard copy of all the data. Pointers also enable different forms of allocation - you could have a static storage struct with a pointer at dynamically allocated memory for example.
if a struct A is to be shared by many structs, it is better to use pointer
I don't see how that matters at all. It's just a . vs -> notation by the code using it.
or if the struct containing the struct(that is, struct B) is to be frequently passed as a function argument, it would be better to use pointer
No, that's nonsense: you'd always pass the outer struct through a pointer no matter what members it has. Passing it by value doesn't make any sense in either scenario.
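A small sketch of the trade-off described in this answer, using hypothetical types (payload_t, embedded_t, indirect_t) that are not taken from qemu. With a pointer member, swapping contents is just a pointer exchange; with an embedded member, the data sits inline in the containing struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "big" struct. */
typedef struct { double samples[256]; } payload_t;

/* Payload stored inline: one allocation, contiguous in memory. */
typedef struct { int id; payload_t data; } embedded_t;

/* Payload stored elsewhere: the struct only holds its address. */
typedef struct { int id; payload_t *data; } indirect_t;

/* Swapping two indirect_t payloads exchanges two pointers and copies
   none of the 256 doubles. */
void swap_payloads(indirect_t *a, indirect_t *b)
{
    payload_t *tmp;

    assert(a != NULL && b != NULL);
    tmp = a->data;
    a->data = b->data;
    b->data = tmp;
}
```

Doing the same swap on two embedded_t objects would mean copying the whole payload_t three times through a temporary.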

Can I access a member of a struct with a pointer to a struct which contains a pointer to that struct

I am trying to access the members in the struct tCAN_MESSAGE. What I think would work is like the first example in main, i.e. some_ptr->canMessage_ptr->value = 10;. But I have some code that someone else has written, and what I can see is that that person has used some_ptr->canMessage_ptr[i].value;.
Is it possible to do it the first way? We quite often use pointers to structs which contain a pointer to another struct (like the example below), but I never see the use of ptr1->ptr2->value?
typedef struct
{
int value1;
int value2;
int value3;
float value4;
}tCAN_MESSAGE;
typedef struct
{
tCAN_MESSAGE *canMessage_ptr;
}tSOMETHING;
int main(void)
{
tCAN_MESSAGE var_canMessage;
tSOMETHING var_something;
tSOMETHING *some_ptr = &var_something;
int i = 0; // index; only element 0 exists in this example
some_ptr->canMessage_ptr = &var_canMessage;
some_ptr->canMessage_ptr->value1 = 10; // is this valid?
// I have some code that does this, and iterates through it with a for loop:
some_ptr->canMessage_ptr[i].value1; // is this valid?
return 0;
}
It's very simple: every pointer has to be set to point at a valid memory location before use. If it isn't, you can't use it. You cannot "store data inside pointers". See this:
Crash or "segmentation fault" when data is copied/scanned/read to an uninitialized pointer
None of your code is valid. some_ptr isn't set to point anywhere, so it cannot be accessed, nor can its members. Similarly, some_ptr->canMessage_ptr isn't set to point anywhere either.
I am trying to access the members in the struct tCAN_MESSAGE. What I think would work is like the first example in main, i.e. some_ptr->canMessage_ptr->value = 10;. But I have some code that someone else has written, and what I can see is that that person has used some_ptr->canMessage_ptr[i].value;. Is it possible to do it the first way?
The expression
some_ptr->canMessage_ptr[i].value
is 100% equivalent to
(*(some_ptr->canMessage_ptr + i)).value
which in turn is 100% equivalent to
(some_ptr->canMessage_ptr + i)->value
When i is 0, that is of course equivalent to
some_ptr->canMessage_ptr->value
So yes, it is possible to use some_ptr->canMessage_ptr->value as long as the index in question is 0. If the index is always 0 then chaining arrow operators as you suggest is good style. Otherwise, the mixture of arrow and indexing operators that you see in practice would be my style recommendation.
We quite often use pointers to structs which contain a pointer to another struct (like the example below), but I never see the use of ptr1->ptr2->value?
I'm inclined to suspect that you do not fully understand what you're working with. Usage of the form some_ptr->canMessage_ptr[i].value suggests that your tSOMETHING type contains a pointer to the first element of an array of possibly many tCAN_MESSAGEs, which is a subtle but important distinction to make. In that case, yes, as shown above, you can chain arrow operators to access the first element of such an array (at index 0). However, the cleanest syntax for accessing other elements of that array is to use the indexing operator, and it pays to be consistent.
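The array interpretation can be sketched like this, reusing the types from the question. The function sum_value1 and its count parameter are assumptions added for illustration, since the pointer alone does not record how many elements it points to.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int value1;
    int value2;
    int value3;
    float value4;
} tCAN_MESSAGE;

typedef struct {
    tCAN_MESSAGE *canMessage_ptr;   /* points at the first element of an array */
} tSOMETHING;

/* Sum value1 over 'count' messages reachable through some_ptr. */
int sum_value1(const tSOMETHING *some_ptr, size_t count)
{
    int total = 0;
    size_t i;

    assert(some_ptr != NULL && some_ptr->canMessage_ptr != NULL);
    for (i = 0; i < count; i++) {
        /* Same as (some_ptr->canMessage_ptr + i)->value1 */
        total += some_ptr->canMessage_ptr[i].value1;
    }
    return total;
}
```

For i equal to 0 the indexed form and some_ptr->canMessage_ptr->value1 name the same object; for any other index only the indexed (or pointer-arithmetic) form reaches the element.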

Struct with array member in C

Recently I reviewed some C code and found something equivalent to the following:
struct foo {
int some_innocent_variables;
double some_big_array[VERY_LARGE_NUMBER];
};
Being almost, but not quite, almost entirely a newbie in C, am I right in thinking that this struct is awfully inefficient in its use of space because of the array member? What happens when this struct gets passed as an argument to a function? Is it copied in its entirety on the stack, including the full array?
Would it be better in most cases to have a double *some_pointer instead?
If you pass it by value, yes, it will make a copy of everything.
But that's why pointers exist.
//Just the address is passed
void doSomething(struct foo *myFoo)
{
}
When passed by value as an argument, the whole struct is copied, which is a very inefficient way of passing structures, especially big ones. In practice, though, structs are usually passed to functions by pointer.
Choosing between
double some_big_array[VERY_LARGE_NUMBER];
and
double *some_pointer
depends only on the program design and how this field/structure will be used. The latter allows variable-size storage, but may need dynamic allocation.
There are plenty of reasons to use arrays in structs. Among them is the fact that structs are passed to functions by value, while a bare array argument decays to a pointer to its first element, so wrapping an array in a struct is one way to pass it by value. That said, this struct is probably passed to functions with pointers.
As others have said, objects of that type are usually passed around with pointers (always sizeof (struct foo *) bytes, often 4 or 8 bytes).
You may also see the "struct hack" (also passed around with pointers):
struct foo {
int some_innocent_variables;
double some_array[]; /* C99 flexible array member */
/* double some_array[1]; ** the real C89 "struct hack" */
};
This "struct hack" gets an array sized by the malloc call.
/* allocate an object of struct foo type with an array with 42 elements */
struct foo *myfoo = malloc(sizeof *myfoo + 42 * sizeof *myfoo->some_array);
/* some memory may be wasted when using C89 and
the "struct hack" and this allocation method */
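A minimal sketch of the flexible array member allocation, assuming a C99 compiler. struct samples, samples_new and samples_get are hypothetical names, and the count field is an addition for illustration so the object remembers its own length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct samples {
    size_t count;       /* number of elements in values[] */
    double values[];    /* C99 flexible array member, must be last */
};

/* Allocate one block holding the header plus 'count' doubles.
   Returns NULL on allocation failure; the caller must free() the result. */
struct samples *samples_new(size_t count)
{
    size_t i;
    struct samples *s = malloc(sizeof *s + count * sizeof s->values[0]);

    if (s == NULL) {
        return NULL;
    }
    s->count = count;
    for (i = 0; i < count; i++) {
        s->values[i] = 0.0;   /* malloc does not initialise the memory */
    }
    return s;
}

/* Bounds-checked read, with the check active in debug builds only. */
double samples_get(const struct samples *s, size_t i)
{
    assert(s != NULL && i < s->count);
    return s->values[i];
}
```

A single free() releases both the header and the array, which is the main practical difference from a struct that holds a separately allocated double pointer.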
Yes, in C you would usually pass a pointer to the structure around due to efficiency reasons.
That structure is fine as long as you pass it by reference (using a pointer).
Offtopic:
Beware of the struct hack, as it is not strictly standard compliant; it ignores the automatic padding. The Unix IPC message queues use it (see struct msgbuf), though, and it is almost certain to work with any compiler.
That said, the functions that use that structure may use pointers to it instead of using a copy.

Manually zeroing variables VS copying struct

I have a long C (not C++) struct. It is used to control entities in a game, with position, some behavior data, nothing flashy, except for two strings. The struct is global.
Right now, whenever an object is initialized, I put all the values to defaults, one by one
myobjects[index].angle = 0; myobjects[index].speed = 0;
like that. It doesn't really feel slow, but I am wondering if copying a "template" struct with all values set to the defaults is faster or more convenient.
So, to sum up into two proper questions: Is it faster to copy a struct instead of manually setting all the data?
What should I keep in mind about the malloc-ed memory for the strings?
"More convenient" is likely the more important part.
struct s vs[...];
...
// initialize every member to "default"
// but can't use in normal assignment
struct s newv = {0};
vs[...] = newv;
Or hide the "initialization details" behind an init-function (or a macro, if you dislike maintainable code :-)
struct s* init_s (struct s* v, ...) { /* and life goes on */ }
You may use this sequence:
memset(myobjects+index,0,sizeof(myobjects[0]));
if all you need is to set all members to zero
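Beware: if a particular member is a pointer, its bytes will all be zero. That is a null pointer on virtually every platform, although the C standard does not strictly guarantee it.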
Nelu Cozac
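The "template struct" idea from the question could look roughly like this. struct entity, its health and name fields, entity_defaults and entity_reset are all hypothetical; only angle and speed appear in the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entity type. */
struct entity {
    float angle;
    float speed;
    int   health;
    char *name;      /* malloc-ed string owned by the entity, or NULL */
};

/* One place that defines every default value. */
static const struct entity entity_defaults = {
    .angle = 0.0f, .speed = 0.0f, .health = 100, .name = NULL
};

/* Reset an entity by copying the template over it. Struct assignment
   copies the pointer value only, so the caller must free() any old
   e->name before calling this, or the string is leaked. */
void entity_reset(struct entity *e)
{
    assert(e != NULL);
    *e = entity_defaults;
}
```

Unlike memset, the template can hold non-zero defaults, and new members pick up a default automatically as long as the template is updated.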

a few beginner C questions

I'm sort of learning C, I'm not a beginner to programming though, I "know" Java and python, and by the way I'm on a mac (leopard).
Firstly,
1: could someone explain when to use a pointer and when not to?
2:
char *fun = malloc(sizeof(char) * 4);
or
char fun[4];
or
char *fun = "fun";
And then all but the last would set indexes 0, 1, 2 and 3 to 'f', 'u', 'n' and '\0' respectively. My question is, why isn't the second one a pointer? Why char fun[4] and not char *fun[4]? And how come it seems that a pointer to a struct or an int is always an array?
3:
I understand this:
typedef struct car
{
...
};
is a shortcut for
struct car
{
...
};
typedef struct car car;
Correct? But something I am really confused about:
typedef struct A
{
...
}B;
What is the difference between A and B? A is the 'tag-name', but what's that? When do I use which? Same thing for enums.
4. I understand what pointers do, but I don't understand what the point of them is (no pun intended). And when does something get allocated on the stack vs. the heap? How do I know where it gets allocated? Do pointers have something to do with it?
5. And lastly, know any good tutorial for C game programming (simple) ? And for mac/OS X, not windows?
PS. Is there any other name people use to refer to just C, not C++? I hate how they're all named almost the same thing, so hard to try to google specifically C and not just get C++ and C# stuff.
Thanks!!
It was hard to pick a best answer, they were all great, but the one I picked was the only one that made me understand my 3rd question, which was the only one I was originally going to ask. Thanks again!
My question is, why isn't the second one a pointer?
Because it declares an array. In the two other cases, you have a pointer that refers to data that lives somewhere else. Your array declaration, however, declares an array of data that lives where it's declared. If you declared it within a function, then the data will die when you return from that function. Finally, char *fun[4] would be an array of 4 pointers, not a char pointer. If you just want to point to a block of 4 chars, then char* fully suffices; there is no need to tell it that there are exactly 4 chars to be pointed to.
The first way, which creates an object on the heap, is used if you need the data to live from then on until the matching free call. The data will survive a return from a function.
The last way just creates data that's not intended to be written to. It's a pointer which refers to a string literal - it's often stored in read-only memory. If you write to it, then the behavior is undefined.
I understand what pointers do, but I don't understand what the point of them is (no pun intended).
Pointers are used to point to something (no pun, of course). Look at it like this: If you have a row of items on the table, and your friend says "pick the second item", then the item won't magically walk its way to you. You have to grab it. Your hand acts like a pointer, and when you move your hand back to you, you dereference that pointer and get the item. The row of items can be seen as an array of items:
item row[5];
And how come it seems that a pointer to a struct or an int is always an array?
When you do item i = row[1]; then you first point your hand at the first item (get a pointer to the first one), and then you advance till you are at the second item. Then you take your hand with the item back to you :) So, the row[1] syntax is not something special to arrays, but rather special to pointers - it's equivalent to *(row + 1), and a temporary pointer is made up when you use an array like that.
What is the difference between A and B? A is the 'tag-name', but what's that? When do I use which? Same thing for enums.
typedef struct car
{
...
};
That's not valid code. You basically said "define the type struct car { ... } to be referable by the following ordinary identifier" but you forgot to supply the identifier. The following two snippets are equivalent to each other instead, as far as I can see
1)
struct car
{
...
};
typedef struct car car;
2)
typedef struct car
{
...
} car;
What is the difference between A and B? A is the 'tag-name', but what's that? When do I use which? Same thing for enums.
In our case, the identifier car was declared two times in the same scope. But the declarations won't conflict because the two identifiers are in different namespaces. The two namespaces involved are the ordinary namespace and the tag namespace. A tag identifier needs to be used after a struct, union or enum keyword, while an ordinary identifier doesn't need anything around it. You may have heard of the POSIX function stat, whose interface looks like the following
struct stat {
...
};
int stat(const char *path, struct stat *buf);
In that code snippet, stat is registered in both of the aforementioned namespaces too. struct stat will refer to the struct, and merely stat will refer to the function. Some people don't like to always precede identifiers with struct, union or enum. They use typedef to introduce an ordinary identifier that will refer to the struct too. The identifier can of course be the same (both times car), or the two can differ (one time A, the other time B). It doesn't matter.
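The two namespaces can be sketched with a small example that mirrors the stat case. The names point, point_t and sum_coords are hypothetical.

```c
#include <assert.h>

struct point { int x; int y; };   /* 'point' in the tag namespace */
typedef struct point point_t;     /* 'point_t' in the ordinary namespace */

/* 'point' again, this time as an ordinary identifier naming a function.
   It does not clash with the tag above. */
struct point point(int x, int y)
{
    struct point p = { x, y };
    return p;
}

/* point_t and struct point are the same type, so either spelling works. */
int sum_coords(point_t p)
{
    return p.x + p.y;
}
```

After the struct keyword the compiler looks the name up among tags, and everywhere else among ordinary identifiers, so struct point a = point(3, 4); is unambiguous.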
3) It's bad style to use two different names A and B:
typedef struct A
{
...
} B;
With that definition, you can say
struct A a;
B b;
b.field = 42;
a.field = b.field;
because the variables a and b have the same type. C programmers usually say
typedef struct A
{
...
} A;
so that you can use "A" as a type name, equivalent to "struct A" but it saves you a lot of typing.
Use them when you need to. Read some more examples and tutorials until you understand what pointers are, and this ought to be a lot clearer :)
The second case creates an array in memory, with space for four bytes. When you use that array's name in most expressions, you get back a pointer to the first (index 0) element. The [] operator then actually works on a pointer, not an array - x[y] is equivalent to *(x + y). And yes, this means x[y] is the same as y[x]. Sorry.
Note also that when you add an integer to a pointer, it's multiplied by the size of the pointed-to elements, so if you do someIntArray[1], you get the second (index 1) element, not somewhere in between starting at the first byte.
Also, as a final gotcha - array types in function argument lists - e.g., void foo(int bar[4]) - secretly get turned into pointer types - that is, void foo(int *bar). This is only the case in function arguments.
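A short sketch of these rules; sum_ints and ARRAY_LEN are hypothetical helpers written for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements in a true array. It gives a wrong answer if 'a'
   is a pointer, including an array-typed function parameter. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* 'bar' is declared with array syntax but is really a const int *,
   so the element count has to be passed separately. */
int sum_ints(const int bar[], size_t n)
{
    int total = 0;
    size_t i;

    assert(bar != NULL);
    for (i = 0; i < n; i++) {
        total += *(bar + i);   /* identical to bar[i] */
    }
    return total;
}
```

With int xs[4] = { 1, 2, 3, 4 }; the call sum_ints(xs, ARRAY_LEN(xs)) passes a pointer to xs[0] plus the length computed while xs is still an array.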
Your third example declares a struct type with two names - struct A and B. In pure C, the struct keyword is mandatory for A - in C++, you can just refer to it as either A or B. Apart from the name change, the two types are completely equivalent, and you can substitute one for the other anywhere, anytime without any change in behavior.
C has three places things can be stored:
The stack - local variables in functions go here. For example:
void foo() {
int x; // on the stack
}
The heap - things go here when you allocate them explicitly with malloc, calloc, or realloc.
void foo() {
int *x; // on the stack
x = malloc(sizeof(*x)); // the value pointed to by x is on the heap
}
Static storage - global variables and static variables, allocated once at program startup.
int x; // static
void foo() {
static int y; // essentially a global that can only be used in foo()
}
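The practical difference between the three shows up in lifetimes. A minimal sketch, with hypothetical function names:

```c
#include <assert.h>
#include <stdlib.h>

/* Static storage: 'calls' exists for the whole program run and keeps
   its value between calls. */
int count_calls(void)
{
    static int calls;   /* zero-initialised at program startup */
    return ++calls;
}

/* Heap: the int outlives this function. The pointer variable p is a
   local (stack) variable, but the object it points to is not.
   Returns NULL on allocation failure; the caller must free() the result. */
int *make_heap_int(int value)
{
    int *p = malloc(sizeof *p);

    if (p != NULL) {
        *p = value;
    }
    return p;
}
```

Returning the address of an ordinary local variable instead would hand the caller a pointer to storage that no longer exists once the function has returned.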
No idea. I wish I didn't need to answer all questions at once - this is why you should split them up :)
char *fun = malloc(sizeof(char) * 4);
or
char fun[4];
or
char *fun = "fun";
The first one can be set to any size you want at runtime, and be resized later - you can also free the memory when you are done.
The second one is an array, not a pointer, but in most expressions the name fun converts to a pointer to its first element, as if you had written char *ptr = &fun[0].
I understand what pointers do, but I don't understand what the point of them is (no pun intended). And when does something get allocated on the stack vs. the heap? How do I know where it gets allocated? Do pointers have something to do with it?
When you define something in a function like "char fun[4]", it is defined on the stack and the memory is no longer valid once the function returns.
Using malloc (or new in C++) reserves memory on the heap - you can make this data available anywhere in the program by passing the pointer around. This also lets you decide the size of the memory at runtime. Finally, the size of the stack is limited (typically on the order of 1 MB), while on the heap you can reserve all the memory you have available.
edit 5. Not really - I would say pure C. C++ is (almost) a superset of C, so unless you are working on a very limited embedded system it's usually OK to use C++.
5. Chipmunk
Fast and lightweight 2D rigid body physics library in C.
Designed with 2D video games in mind.
Lightweight C99 implementation with no external dependencies outside of the Std. C library.
Many language bindings available.
Simple, read the documentation and see!
Unrestrictive MIT license.
Makes you smarter, stronger and more attractive to the opposite gender!
...
In your second question:
char *fun = malloc(sizeof(char) * 4);
vs
char fun[4];
vs
char *fun = "fun";
These all involve an array of 4 chars, but that's where the similarity ends. Where they differ is in the lifetime, modifiability and initialisation of those chars.
The first one creates a single pointer to char object called fun - this pointer variable will live only from when this function starts until the function returns. It also calls the C standard library and asks it to dynamically create a memory block the size of an array of 4 chars, and assigns the location of the first char in the block to fun. This memory block (which you can treat as an array of 4 chars) has a flexible lifetime that's entirely up to the programmer - it lives until you pass that memory location to free(). Note that this means that the memory block created by malloc can live for a longer or shorter time than the pointer variable fun itself does. Note also that the association between fun and that memory block is not fixed - you can change fun so it points to different memory block, or make a different pointer point to that memory block.
One more thing - the array of 4 chars created by malloc is not initialised - it contains garbage values.
The second example creates only one object - an array of 4 chars, called fun. (To test this, change the 4 to 40 and print out sizeof(fun)). This array lives only until the function it's declared in returns (unless it's declared outside of a function, when it lives for as long as the entire program is running). This array of 4 chars isn't initialised either.
The third example creates two objects. The first is a pointer-to-char variable called fun, just like in the first example (and as usual, it lives from the start of this function until it returns). The other object is a bit strange - it's an array of 4 chars, initialised to { 'f', 'u', 'n', 0 }, which has no name and that lives for as long as the entire program is running. It's also not guaranteed to be modifiable (although what happens if you try to modify it is left entirely undefined - it might crash your program, or it might not). The variable fun is initialised with the location of this strange unnamed, unmodifiable, long-lived array (but just like in the first example, this association isn't permanent - you can make fun point to something else).
The reason why there's so many confusing similarities and differences between arrays and pointers is down to two things:
The "array syntax" in C (the [] operator) actually works on pointers, not arrays!
Trying to pin down an array is a bit like catching fog - in almost all cases the array evaporates and is replaced by a pointer to its first element instead.
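The three declarations can be compared side by side in a small sketch. The function names are hypothetical, and each one wraps a single variant from the question.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Heap block: uninitialised after malloc, so we fill it ourselves.
   Returns NULL on failure; the caller owns the result and must free() it. */
char *fun_on_heap(void)
{
    char *fun = malloc(4 * sizeof *fun);

    if (fun != NULL) {
        memcpy(fun, "fun", 4);   /* copies 'f', 'u', 'n', '\0' */
    }
    return fun;
}

/* Local array: sizeof sees the whole array, and the bytes are writable,
   but the array is gone once the function returns. */
size_t fun_array_size(void)
{
    char fun[4] = "fun";

    fun[0] = 'r';                /* fine: the array is ours to modify */
    assert(strcmp(fun, "run") == 0);
    return sizeof fun;           /* size of the array, not of a pointer */
}

/* String literal: lives for the whole program and must not be written
   to, hence the const in this sketch. */
const char *fun_literal(void)
{
    const char *fun = "fun";
    return fun;                  /* safe to return: static storage */
}
```

Only the array version reports its full size through sizeof; in the two pointer versions sizeof fun would give the size of the pointer itself.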

Resources