struct inside struct : to point or not to point? - c

I'd like to understand the difference between using a pointer and a value when it comes to referencing a struct inside another struct.
By that I mean, I can have those two declarations:
struct foo {
int bar;
};
struct fred {
struct foo barney;
struct foo *wilma;
};
It appears I can get the same behavior from both barney and wilma entries, as long as I de-reference accordingly when I access them. The barney case intuitively feels “wrong” but I cannot say why.
Am I just relying on some C undefined behavior? If not, what would be the reason(s) to opt for one style over the other?
The following code shows how I come to the conclusion both use cases are equivalent; neither clang nor gcc complain about anything.
#include <stdio.h>
#include <stdlib.h>
struct a_number {
int i;
};
struct s_w_ptr {
struct a_number *n;
};
struct s_w_val {
struct a_number n;
};
void store_via_ptr(struct s_w_ptr *swp, struct s_w_val *swv) {
struct a_number *i = malloc(sizeof(i));
i->i = 1;
swp->n = i;
swv->n = *i;
}
void store_via_val(struct s_w_ptr *swp, struct s_w_val *swv) {
struct a_number j;
j.i = 2;
swp->n = &j;
swv->n = j;
}
int main(void) {
struct s_w_ptr *swp = malloc(sizeof(swp));
struct s_w_val *swv = malloc(sizeof(swv));
store_via_ptr(swp, swv);
printf("p: %d | v: %d\n", swp->n->i, swv->n.i);
store_via_val(swp, swv);
printf("p: %d | v: %d\n", swp->n->i, swv->n.i);
}

It's perfectly valid to have both struct members in a struct and have pointers to struct in a struct. They must be used differently but both are legal.
Why have a struct in a struct ?
One reason is to group things together. For instance:
struct car
{
struct motor motor; // a struct with several members describing the motor
struct wheel wheel; // a struct with several members describing the wheels
...
};
struct car myCar = {....initializer...};
myCar.wheel = SomeOtherWheelModel; // Replace wheels in a single assign
myCar.wheel.pressure = 2.1; // Change a single wheel member
Why have a struct pointer in a struct?
One very obvious reason is that it can be used as an array of N structs by dynamically allocating N times the struct size.
Another typical example is linked lists where you have a pointer to a struct of the same type as the struct containing the pointer.
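The dynamic-array case mentioned above could be sketched roughly as follows. The names struct wheel_set, wheel_set_init and wheel_set_free are hypothetical, and the wheel type is reduced to a single pressure member for illustration; none of this comes from the original answer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical wheel type; the answer only mentions a pressure member. */
struct wheel {
    double pressure;
};

/* The pointer member refers to a heap block holding `count` wheels. */
struct wheel_set {
    size_t count;
    struct wheel *wheels;
};

/* Allocate room for n wheels; returns 1 on success, 0 on failure. */
int wheel_set_init(struct wheel_set *set, size_t n, double pressure)
{
    assert(set != NULL);
    set->wheels = malloc(n * sizeof *set->wheels);
    if (set->wheels == NULL) {
        set->count = 0;
        return 0;
    }
    set->count = n;
    for (size_t i = 0; i < n; i++)
        set->wheels[i].pressure = pressure;   /* indexed like an array */
    return 1;
}

/* Release the block and leave the struct in a safe empty state. */
void wheel_set_free(struct wheel_set *set)
{
    free(set->wheels);
    set->wheels = NULL;
    set->count = 0;
}
```

The number of wheels is chosen at run time, which an embedded struct wheel member (or a fixed-size array of them) cannot offer. The price is that the caller has to pair every successful wheel_set_init with a wheel_set_free.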

There are several advantages of having a struct in a struct instead of having a pointer to struct in a struct:
It requires less memory allocation. When a struct holds a pointer to another struct, memory must be provided for the pointer inside the parent struct and, separately, for the child struct itself.
Fewer instructions are typically required to access the contents of the child struct. For example, consider a program that reads the contents of the child struct. If a struct within a struct is used, the program applies an offset to the address of the variable and reads the contents of that memory location. With a pointer to a struct in a struct, the program applies an offset to the address of the parent struct variable, fetches the address of the child struct, and only then reads the contents of the child struct from memory.
With a pointer, separate variables must be declared for the parent and the child struct, and if initializers are used, each needs its own. With a struct in a struct, only one variable is declared and a single initializer is used.
When dynamic memory allocation is used with a pointer member, the developer must remember to deallocate memory for both the child and the parent objects before the variables fall out of scope. With a struct in a struct, memory must be freed for only one object.
Lastly, as is shown in the example, if a pointer is used, Null checking may be necessary to ensure that the pointer to the child struct has been initialized.
The primary advantage of having a pointer to a struct in a struct arises when you need to replace the child struct with another struct within the program, as in a linked list. A less common case is when the child struct can be of more than one type; in that case you might use a void * type for the child. You may also use a pointer within a struct to point to an array when the array may vary in size between instances.
Based on my understanding of the case shown in the example above, I would be inclined to use a struct in a struct, since both objects are of fixed size and type and since it appears that they would not need to be separated.
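As a rough illustration of the allocation and cleanup points above, here is a minimal sketch that contrasts the two layouts. The names struct child, struct parent_val, struct parent_ptr and the create/destroy helpers are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdlib.h>

struct child {
    int value;
};

/* Child embedded by value: one object, one allocation. */
struct parent_val {
    struct child c;
};

/* Child reached through a pointer: two objects, two allocations. */
struct parent_ptr {
    struct child *c;
};

struct parent_val *parent_val_create(int value)
{
    struct parent_val *p = malloc(sizeof *p);
    if (p != NULL)
        p->c.value = value;          /* fixed offset inside *p */
    return p;
}

struct parent_ptr *parent_ptr_create(int value)
{
    struct parent_ptr *p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->c = malloc(sizeof *p->c);     /* second allocation for the child */
    if (p->c == NULL) {
        free(p);
        return NULL;
    }
    p->c->value = value;             /* load the pointer, then the value */
    return p;
}

void parent_ptr_destroy(struct parent_ptr *p)
{
    if (p != NULL) {
        free(p->c);                  /* child first, then parent */
        free(p);
    }
}
```

A struct parent_val obtained from parent_val_create is released with a single free, whereas the pointer variant needs the dedicated parent_ptr_destroy helper and an extra failure path during creation.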

C structures can be used to group related data, such as the title of a book, its author, its assigned book number, and so on. But much of what we use structures for is creating data structures (in a different sense of the word “structure”) in memory.
Consider that the book’s author has a name, a date of birth, other biographical information, a list of books they have written, and more. We could include in the struct book a struct author that would contain all this information. But, if the author has written a hundred books, we could have 100 copies of all that information, one copy in each struct book. Further, we cannot continue the “contain the data inside the structure directly” model with the struct author, because it cannot contain a struct book for each book the author publishes if those struct book members also have to contain the struct author for the author—every object would have to contain itself.
It is more efficient to create one struct author and have each struct book for that author link to their struct author.
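A minimal sketch of that layout might look like this. The member names (name, birth_year, title) and the helper book_init are assumptions made for illustration; the answer does not specify them.

```c
#include <assert.h>
#include <string.h>

struct author {
    char name[64];
    int birth_year;
};

struct book {
    char title[64];
    const struct author *author;   /* link, not a copy */
};

/* Fill in a book that refers to an existing author record. */
void book_init(struct book *b, const char *title, const struct author *a)
{
    assert(b != NULL && title != NULL && a != NULL);
    strncpy(b->title, title, sizeof b->title - 1);
    b->title[sizeof b->title - 1] = '\0';
    b->author = a;
}
```

Every book initialized with the same struct author shares one record, so a correction to the author data is visible through all of that author's books without touching any struct book.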
Another example is that we use pointers to create data structures for efficient access to data. If we are reading data for thousands of items and want to keep them sorted by name, one option is to allocate memory for some number of structures, read the data, and sort the data. When new data is read and we have used all the memory we allocated, we allocate new memory, copy all the old data to the new memory if necessary, and move some of the data so we can insert the new data in its proper place. However, we have many better options than that. We can use linked lists, binary trees, other kinds of trees, and hash tables.
These data structures effectively require using pointers. A binary tree will have a root node, and each node contains two pointers, one to a subtree of nodes that are earlier than it in the sorting order and another to a subtree of nodes that are later than it. We can look up items in the tree by following pointers to earlier or later nodes to find the right position. And we can insert items by changing a few pointers. If the tree happens to become unbalanced, we can rearrange nodes in the tree by changing pointers. The bulk of the data in the nodes does not have to be changed or copied, just some pointers.
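The binary tree described above could be sketched as follows, without any rebalancing. The names struct tree_node, tree_insert, tree_find and tree_free are hypothetical, and the node stores only a borrowed name string as its key.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct tree_node {
    const char *name;               /* key; not owned by the node */
    struct tree_node *earlier;      /* subtree sorting before name */
    struct tree_node *later;        /* subtree sorting after name */
};

/* Insert name into the tree rooted at *root. Returns the node holding
   name (new or already present), or NULL if allocation fails. */
struct tree_node *tree_insert(struct tree_node **root, const char *name)
{
    assert(root != NULL && name != NULL);
    while (*root != NULL) {
        int cmp = strcmp(name, (*root)->name);
        if (cmp == 0)
            return *root;
        root = (cmp < 0) ? &(*root)->earlier : &(*root)->later;
    }
    struct tree_node *n = malloc(sizeof *n);
    if (n == NULL)
        return NULL;
    n->name = name;
    n->earlier = n->later = NULL;
    *root = n;                      /* only this one pointer changes */
    return n;
}

/* Follow pointers to find name; returns NULL when it is absent. */
struct tree_node *tree_find(struct tree_node *root, const char *name)
{
    while (root != NULL) {
        int cmp = strcmp(name, root->name);
        if (cmp == 0)
            return root;
        root = (cmp < 0) ? root->earlier : root->later;
    }
    return NULL;
}

void tree_free(struct tree_node *root)
{
    if (root != NULL) {
        tree_free(root->earlier);
        tree_free(root->later);
        free(root);
    }
}
```

Inserting a node writes a single pointer in an existing node (or the root pointer); none of the data already in the tree is moved or copied.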
We can also use pointers to have multiple structures for the same data. All the data about books could be stored in one place, and a tree ordered by name could contain nodes in which each node contained a pointer to the book structure and two pointers to subtrees. We could have one tree like this ordered by title of the book and another tree ordered by the name of the author and another tree ordered by the assigned book number. Then we can efficiently look up a book by title or author or number, but there is only one master copy of the complete book data, in the struct book objects. The look-up data is in the tree, which contains only pointers. That is much more efficient than copying all of the struct book data for each tree.
So the reason for choosing between structures and pointers as members is not whether the C syntax allows us to refer to the data; we can get to the data in both cases. The reason is that one method embeds the data, which is inflexible and requires copying, while the other is flexible and efficient.

Let's consider at first this function
void store_via_ptr(struct s_w_ptr *swp, struct s_w_val *swv) {
struct a_number *i = malloc(sizeof(i));
i->i = 1;
swp->n = i;
swv->n = *i;
}
This declaration
struct a_number *i = malloc(sizeof(i));
is equivalent to the following declaration
struct a_number *i = malloc(sizeof( struct a_number * ));
So in general the function can invoke undefined behavior when sizeof( struct a_number ) is greater than sizeof( struct a_number * ).
It seems you mean
struct a_number *i = malloc(sizeof( *i ) );
^^^
If you split the function into two functions, one for each of its parameters, like
void store_via_ptr1( struct s_w_ptr *swp ) {
struct a_number *i = malloc(sizeof( *i ) );
i->i = 1;
swp->n = i;
}
and
void store_via_ptr2( struct s_w_val *swv ) {
struct a_number *i = malloc(sizeof( *i));
i->i = 1;
swv->n = *i;
}
then after the first function, whoever owns the object pointed to by swp must remember to free the allocated memory later. Otherwise there will be a memory leak.
The second function already produces a memory leak because the allocated memory is never freed.
Now let's consider the second function
void store_via_val(struct s_w_ptr *swp, struct s_w_val *swv) {
struct a_number j;
j.i = 2;
swp->n = &j;
swv->n = j;
}
Here the pointer swp->n will point to a local object j. So after exiting the function this pointer will be invalid because the pointed object will not be alive.
So both functions are incorrect. Instead you could write the following functions
int store_via_ptr(struct s_w_ptr *swp ) {
swp->n = malloc( sizeof( *swp->n ) );
int success = swp->n != NULL;
if ( success ) swp->n->i = 1;
return success;
}
and
void store_via_val( struct s_w_val *swv ) {
swv->n.i = 2;
}
Whether to embed a whole object of a structure type in another structure or to use a pointer to it depends on the design and the context in which the objects are used.
For example consider a structure struct Point
struct Point
{
int x;
int y;
};
In this case if you want to declare a structure struct Rectangle then it is natural to define it like
struct Rectangle
{
struct Point top_left;
struct Point bottom_right;
};
On the other hand, if you have a two-sided singly-linked list then it can look like
struct Node
{
int value;
struct Node *next;
};
struct List
{
struct Node *head;
struct Node *tail;
};

Two problems:
In store_via_ptr you allocate memory for i dynamically. When you use s_w_val you copy the structure, and then leave the pointer. Which means the pointer will be lost and can't be passed to free later.
In store_via_val you make swp->n point to the local variable j. A variable whose life-time will end when the function returns, leaving you with an invalid pointer.
The first problem might lead to a memory leak (something you never care about in your simple example problem).
The second problem is worse, since it will lead to undefined behavior when you dereference the pointer swp->n.
Unrelated to that, in the main function you don't need to allocate memory dynamically for the structures. You could just have defined them as plain structure objects and used the address-of operator & when calling the functions.

Related

What would happen if a linked list was implemented without pointers?

The standard struct for list node is:
struct node {
int x;
struct node *next;
};
But, what would happen if we defined a node without a pointer, like this:
struct node {
int x;
struct node next;
};
?
I assume that the main problem would be not knowing where the list ends, since there wouldn't be a NULL pointer. But apart from that is there any other effects to be taken into consideration?
What would happen if we defined a node without a pointer, like this:
struct node {
int x;
struct node next;
};
This declares a structure with unterminated recursion. Hence the declaration is invalid and is rejected by the compiler.
Let's calculate this:
sizeof(struct node)
Well, we have an int, possibly some padding and sizeof(struct node). Putting it into one formula:
sizeof(struct node) = sizeof(int) + padding + sizeof(struct node)
This cannot be solved.
Thinking about it less theoretically, it would be a structure containing an infinite number of itself.
Languages that use reference semantics instead of value semantics, like Haskell, allow this kind of data structure (type). I'm oversimplifying a lot here, but think of every structure member (record field) as a pointer, and it is probably clear why it works there:
data List = EndOfList | Node Int List
A struct type may not contain an instance of itself as a member for two reasons.
First, the type definition isn't complete until the closing } of the struct type; until the type definition is complete, the compiler won't know how much space to allocate for that member. Secondly, a struct type that contains an instance of itself would be infinitely large.
A struct type may contain a pointer to itself as a member since pointers to incomplete types are allowed, and all struct pointer types have the same size and representation.
You can create linked lists without using pointers; I did it in Fortran 77, which didn't have pointer types. You simply use an array as your storage, and use array indices as your "pointers".

Naming a variable with another variable in C

I want to create a struct with 2 variables, such as
struct myStruct {
char charVar;
int intVar;
};
and I will name the structs as:
struct myStruct name1;
struct myStruct name2;
etc.
The problem is, I don't know how many variables will be entered, so there must be infinite nameX structures.
So, how can I name these structures with variables?
Thanks.
You should use an array and a pointer.
struct myStruct *p = NULL;
p = malloc(N * sizeof *p); // where N is the number of entries.
int index = 1; /* or any other number - from 0 to N-1*/
p[index].member = x;
Then you can add elements to it by using realloc if you need to add additional entries.
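One way the realloc-based growth might look is sketched below. The wrapper struct myStructArray, the function myStructArray_append and the doubling strategy are assumptions for illustration, not part of the answer.

```c
#include <assert.h>
#include <stdlib.h>

struct myStruct {
    char charVar;
    int intVar;
};

/* Hypothetical wrapper that tracks the array and its sizes. */
struct myStructArray {
    struct myStruct *items;
    size_t count;      /* entries in use */
    size_t capacity;   /* entries allocated */
};

/* Append one entry, growing the block when full.
   Returns 1 on success, 0 if realloc fails (the array is left intact). */
int myStructArray_append(struct myStructArray *a, char c, int i)
{
    assert(a != NULL);
    if (a->count == a->capacity) {
        size_t new_cap = a->capacity ? a->capacity * 2 : 4;
        struct myStruct *tmp = realloc(a->items, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return 0;
        a->items = tmp;
        a->capacity = new_cap;
    }
    a->items[a->count].charVar = c;
    a->items[a->count].intVar = i;
    a->count++;
    return 1;
}
```

Start from an array with items set to NULL and both sizes set to 0, append as data arrives, and free items when done. Assigning the result of realloc to a temporary first keeps the old block reachable if the call fails.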
Redefine myStruct as
struct myStruct {
char charVar;
int intVar;
struct myStruct *next;
};
Keep track of the last structure you have as well as the start of the list. When adding new elements, append them to the end of your linked list.
/* To initialize the list */
struct myStruct *start, *end;
start = malloc(sizeof(struct myStruct));
start->next = NULL;
end = start;
/* To add a new structure at the end */
end->next = malloc(sizeof(struct myStruct));
end = end->next;
end->next = NULL;
This example does not do any error checking. Here is how you would step along the list to print all the values in it:
struct myStruct *ptr;
for(ptr = start; ptr != NULL; ptr = ptr->next)
printf("%d %c\n", ptr->intVar, ptr->charVar);
You do not have to have a distinct name for each structure in a linked list (or any other kind of list, in general). You can assign any of the unnamed structures to the pointer ptr as you use them.
So, how can I name these structures with variables?
I think every beginner starts out wanting to name everything. It's not surprising -- you learn about using variables to store data, so it seems natural that you'd always use variables. The answer, however, is that you don't always use variables for storing data. Very often, you store data in structures or objects that are created dynamically. It may help to read about dynamic allocation. The idea is that when you have a new piece of data to store, you ask for a piece of memory (using a library call like malloc or calloc). You refer to that piece of memory by its address, i.e. a pointer.
There are a number of ways to keep track of all the pieces of memory that you've obtained, and each one constitutes a data structure. For example, you could keep a number of pieces of data in a contiguous block of memory -- that's an array. See Devolus's answer for an example. Or you could have lots of little pieces of memory, with each one containing the address (again, a pointer) of the next one; that's a linked list. Mad Physicist's answer is a fine example of a linked list.
Each data structure has its own advantages and disadvantages -- for example, arrays allow fast access but are slow for inserting and deleting, while linked lists are relatively slow for access but are fast for inserting and deleting. Choosing the right data structure for the job at hand is an important part of programming.
It usually takes a little while to get comfortable with pointers, but it's well worth the effort as they open up a lot of possibilities for storing and manipulating data in your program. Enjoy the ride.

Using unions with structures

I have a structure like this:
struct data
{
char abc[10];
int cnt;
struct data *next, *prior;
};
struct data *start, *last;
struct data *start1, *last1;
struct data *start2, *last2;
The integer 'cnt' can have two values. The pointers:
struct data *start, *last;
are used to link all data with all values of 'cnt'. The pointers:
struct data *start1, *last1;
struct data *start2, *last2;
are used to link data when the value of 'cnt' is either 1 or 2. My problem is that when I change the value of 'abc' or 'cnt' for one linked list, say 'start->abc', the value 'start1->abc' and 'start2->abc' are unchanged because they live in different memory locations.
I would like a change in data under one list to be reflected in the other two lists. I believe 'unions' could help me do this but I don't know how to set it up.
Any help appreciated!
Nope, can't be done.
Even if you come up with a solution that uses unions to get this done, you'll essentially have some data objects allocated in such a way that they overlap each other in memory. You'd end up with a contiguous memory block.
Rather than that, disregard the linked list altogether and use an array:
struct data {
char abc[10];
int data;
};
struct data datas[50];
struct data* some = &datas[20];
struct data* prev = some - 1;
struct data* next = some + 1;
(Don't go out of bounds.)
If you really want a linked list for some reason, the whole point of them is that each element can be anywhere in the memory. This means that each element needs to remember the address of the next and the previous in order to allow two-way navigation.
Therefore, rather than thinking about union tricks, just write functions such as insertData and removeData that perform the basic operations on a list and also fix all the pointers in neighbouring elements.
char global_abc[10];
int global_cnt;
struct data
{
char *abc;
int *cnt;
struct data *next, *prior;
};
start->abc = start1->abc = start2->abc = global_abc;
start->cnt = start1->cnt = start2->cnt = &global_cnt;
Now when you change the string with
strcpy(start->abc, "any");
it is changed for the other elements too.
And when you change the counter with
*(start->cnt) = 5;
it is changed for the other elements too.
If you want the data to live on two lists simultaneously, the "all" list and the "cnt" list, then you need two sets of start, last pointers in the structure.
struct data
{
char abc[10];
int cnt;
struct data *next_all, *prior_all;
struct data *next_cnt, *prior_cnt;
};
When you change the value of cnt, you must remove the data from the next_cnt, prior_cnt list (corresponding to start1, last1 or start2, last2) and add it to the other.
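A rough sketch of keeping one node on both lists at once follows. It is simplified: the list heads start_all and start_cnt[] stand in for the question's start, start1 and start2, the last pointers are omitted, and the helper names data_link and data_set_cnt are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct data {
    char abc[10];
    int cnt;
    struct data *next_all, *prior_all;   /* links for the "all" list */
    struct data *next_cnt, *prior_cnt;   /* links for the per-cnt list */
};

struct data *start_all;        /* every node */
struct data *start_cnt[3];     /* start_cnt[1], start_cnt[2]; index 0 unused */

/* Put d at the front of the "all" list and of the list for its cnt. */
void data_link(struct data *d)
{
    assert(d != NULL && (d->cnt == 1 || d->cnt == 2));

    d->prior_all = NULL;
    d->next_all = start_all;
    if (start_all != NULL)
        start_all->prior_all = d;
    start_all = d;

    d->prior_cnt = NULL;
    d->next_cnt = start_cnt[d->cnt];
    if (start_cnt[d->cnt] != NULL)
        start_cnt[d->cnt]->prior_cnt = d;
    start_cnt[d->cnt] = d;
}

/* Change cnt: unlink d from its old per-cnt list, link it into the new
   one. Its place in the "all" list is untouched. */
void data_set_cnt(struct data *d, int new_cnt)
{
    assert(d != NULL && (new_cnt == 1 || new_cnt == 2));

    if (d->prior_cnt != NULL)
        d->prior_cnt->next_cnt = d->next_cnt;
    else
        start_cnt[d->cnt] = d->next_cnt;
    if (d->next_cnt != NULL)
        d->next_cnt->prior_cnt = d->prior_cnt;

    d->cnt = new_cnt;
    d->prior_cnt = NULL;
    d->next_cnt = start_cnt[new_cnt];
    if (start_cnt[new_cnt] != NULL)
        start_cnt[new_cnt]->prior_cnt = d;
    start_cnt[new_cnt] = d;
}
```

Because each piece of data exists as exactly one node, a change made while walking either list is seen from the other list automatically; only the per-cnt links need maintenance when cnt changes.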
Use a set of arrays to hold the data, and use a pointer to those arrays from your structures. Then "linked" entries can point to the same data buffer...
struct data
{
char* abc;
int cnt;
struct data *next, *prior;
};
struct data *start, *last;
struct data *start1, *last1;
struct data *start2, *last2;
char abcBuffer[2][10];
and in some function somewhere...
start->abc = abcBuffer[start->cnt];
start1->abc = abcBuffer[start1->cnt];
start2->abc = abcBuffer[start2->cnt];
In this case, changing the content of abcBuffer[n] will reflect the same change across all of the structures linked to that buffer. The key, however, is that you cannot do this using a "shared" structure such as a union, but have to manage it in your code.

define a function returning struct pointer

Please bear with me; I come from another language, I am new to C, and I am learning it from http://c.learncodethehardway.org/book/learn-c-the-hard-way.html
struct Person {
char *name;
int age;
int height;
int weight;
};
struct Person *Person_create(char *name, int age, int height, int weight)
{
struct Person *who = malloc(sizeof(struct Person));
assert(who != NULL);
who->name = strdup(name);
who->age = age;
who->height = height;
who->weight = weight;
return who;
}
I understand that the Person_create function returns a pointer to struct Person. What I don't understand (maybe because I come from other languages: Erlang, Ruby) is why it is defined as
struct Person *Person_create(char *name, int age, int height, int weight)
not
struct Person Person_create(char *name, int age, int height, int weight)
and is there other way to define a function to return a structure?
sorry if this question is too basic.
It is defined so because it returns a pointer to a struct, not a struct. You assign the return value to a struct Person *, not to struct Person.
It is possible to return a full struct, like that:
struct Person Person_create(char *name, int age, int height, int weight)
{
struct Person who;
who.name = strdup(name);
who.age = age;
who.height = height;
who.weight = weight;
return who;
}
But it is not used very often.
The Person_create function returns a pointer to a struct Person so you have to define the return value to be a pointer (by adding the *). To understand the reason for returning a pointer to a struct and not the struct itself one must understand the way C handles memory.
When you call a function in C you add a record for it on the call stack. At the bottom of the call stack is the main function of the program you're running, at the top is the currently executing function. The records on the stack contain information such as the values of the parameters passed to the functions and all the local variables of the functions.
There is another type of memory your program has access to: heap memory. This is where you allocate space using malloc, and it is not connected to the call stack.
When you return from a function the call stack is popped and all the information associated with the function call is lost. If you want to return a struct you have two options: copy the data inside the struct before it is popped from the call stack, or keep the data in heap memory and return a pointer to it. Copying the data byte for byte is more expensive than simply returning a pointer, so you would normally return a pointer to save resources (both memory and CPU cycles). However, that doesn't come without cost: when you keep your data in heap memory you have to remember to free it when you stop using it, otherwise your program will leak memory.
The function returns who, which is a struct Person * - a pointer to a structure. The memory to hold the structure is allocated by malloc(), and the function returns a pointer to that memory.
If the function were declared to return struct Person and not a pointer, then who could also be declared as a structure. Upon return, the structure would be copied and returned to the caller. Note that the copy is less efficient than simply returning a pointer to the memory.
Structs are not pointers (or references) by default in C/C++, as they are, for example, in Java. A function declared as struct Person Function() would therefore return the struct itself (by value, making a copy), not a pointer.
You often don't want to create copies of objects (shallow copies by default, or copies created using copy constructors) as this can get pretty time consuming soon.
To copy the whole struct and not just pointer is less efficient because a pointer's sizeof is usually much smaller than sizeof of a whole struct itself.
Also, a struct might contain pointers to other data in memory, and blindly copying that could be dangerous for dynamically allocated data (if a code handling one copy would free it, the other copy would be left with invalid pointer).
So shallow copy is almost always a bad idea, unless you're sure that the original goes out of scope - and then why wouldn't you just return a pointer to the struct instead (a struct dynamically allocated on heap of course, so it won't be destroyed like the stack-allocated entities are destroyed, on return from a function).
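To illustrate the shallow-copy hazard described above, here is a minimal sketch with a trimmed-down struct Person (only name and age). The helpers Person_make, Person_clone and Person_release are hypothetical and are not the book's API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct Person {
    char *name;      /* heap-allocated, owned by this struct */
    int age;
};

/* Return a struct by value; the caller receives its own copy of the
   members, and the name buffer is allocated here. */
struct Person Person_make(const char *name, int age)
{
    struct Person who = { NULL, age };
    who.name = malloc(strlen(name) + 1);
    if (who.name != NULL)
        strcpy(who.name, name);
    return who;
}

/* Deep copy: duplicate the name so the copies do not share a buffer. */
struct Person Person_clone(const struct Person *src)
{
    assert(src != NULL && src->name != NULL);
    return Person_make(src->name, src->age);
}

void Person_release(struct Person *p)
{
    free(p->name);
    p->name = NULL;   /* guards against a second free through this copy */
}
```

A plain assignment such as struct Person shallow = a; copies the name pointer, so both structs refer to the same buffer and releasing one leaves the other dangling. Person_clone avoids that at the cost of a second allocation.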

Deep copy of graph structure

I have a graph structure in C and want to make a deep copy of it (including nodes and edges).
The structure looks like this:
struct li_node {
struct li_node *next, *prev;
};
struct li_list {
struct li_node n;
};
struct gr_graph {
struct li_list nodes;
int nodecount;
};
struct gr_node {
struct li_node node;
struct gr_graph *graph;
int pred_count, succ_count;
struct li_list pred, succ;
};
struct gr_edge {
struct li_node succ, pred;
struct gr_node *from, *to;
unsigned long marks;
};
These structs do not exist as themselves, but "inherited" in another struct, like this:
struct ex_node {
struct gr_node _; // "Superclass"
int id;
struct ex_node *union_find_parent;
...
};
Is there an elegant solution for creating a deep copy of such a structure, including updating references to the copies?
Note: Members of nested structs do not point to the root struct it contains, but to their related nested struct (for instance, ex_node._.pred.n.next points to a ex_edge._.pred). This implies tedious pointer arithmetic when these must be updated.
My solution up to now is
Memcopy all structs
Iterate through all copies
Call a bunch of macros for all fields that contain references (Due to missing RTTI in C, I probably won't come around this)
The macros use
offsetof to calculate the address of the root struct
Retrieve the address of the copied equivalent
offsetof to make the pointer point to the correct nested struct
Is there any easier way to do this? I am also afraid that I forget to add a macro call when I add more fields.
I don't think you can do a deep copy per se, because the copied pointers still hold the addresses of the original objects. The best approach I can think of is to allocate a new graph structure, copy the data (not the pointers), and build it up from there by allocating the new nodes with malloc and adjusting the pointers in the ex_node structures. That would be a more thorough solution...
Hope this helps,
Best regards,
Tom.
Sounds ok. My $0.02:
Not sure why you need both li_list and li_node. Further, don't you need a data member for li_node?
The overall structure looks a bit complex (of course, I don't know your requirements) and smells of C++ style design (pardon me, if I am wrong)
memcpy is not required. A simple assignment suffices.
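Define a function pointer member for each structure with pointer members, so that you can do:
struct foo; // forward declaration so the typedef can name the type
typedef void (*foo_copy)(const struct foo *src, struct foo *dst);
struct foo {
int datum;
int *p;
foo_copy copy;
};
void foo_cp(const struct foo *src, struct foo *dst)
{
*dst = *src; // copy non-pointer data (and the copy member itself)
dst->p = malloc(sizeof *dst->p); // separate storage for the copy; check for NULL in real code
*dst->p = *src->p; // copy the pointed-to value, not the pointer
}
// somewhere else
struct foo s;
// initialize s, including s.copy = foo_cp
struct foo *t = malloc(sizeof *t);
s.copy(&s, t);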
and nested types call appropriate member copy methods ...
memcpy all structs and create a sorted list where each entry contains address of original struct and address of copy of struct.
Now iterate through all copies. For each pointer variables in all copied structs, search pointer in the sorted list and replace it with the address of its copy.
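The copy-then-remap idea could be sketched like this. To keep the example short, a linear search over the address table stands in for the sorted list, and a hypothetical single-pointer struct gnode replaces the question's nested structures; all names are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct gnode {
    int id;
    struct gnode *next;      /* may point at any node of the same graph */
};

struct map_entry {
    const struct gnode *orig;
    struct gnode *copy;
};

/* Look up the copy that corresponds to an original node
   (NULL or an unknown address maps to NULL). */
static struct gnode *map_find(const struct map_entry *map, size_t n,
                              const struct gnode *orig)
{
    for (size_t i = 0; i < n; i++)
        if (map[i].orig == orig)
            return map[i].copy;
    return NULL;
}

/* Copy n nodes given as an array of pointers. Returns a malloc'd array of
   n copies whose next pointers refer to copies, or NULL on failure. */
struct gnode *graph_copy(struct gnode *const *nodes, size_t n)
{
    struct gnode *copies = malloc(n * sizeof *copies);
    struct map_entry *map = malloc(n * sizeof *map);
    if (copies == NULL || map == NULL) {
        free(copies);
        free(map);
        return NULL;
    }
    for (size_t i = 0; i < n; i++) {          /* pass 1: raw copies + map */
        memcpy(&copies[i], nodes[i], sizeof copies[i]);
        map[i].orig = nodes[i];
        map[i].copy = &copies[i];
    }
    for (size_t i = 0; i < n; i++)            /* pass 2: remap pointers */
        copies[i].next = map_find(map, n, copies[i].next);
    free(map);
    return copies;
}
```

With many nodes, sorting the table by original address and using a binary search, as the answer suggests, would replace the linear map_find. Pointers to nested members, as in the question, would additionally need the offsetof adjustment before and after the lookup.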
Yes, there is an elegant solution using spanning trees and the decorator pattern.
-First, build a spanning tree of the graph. You can use a DFS (Depth First Search)
or a BFS(Breadth First Search) to achieve this. Use the decorator pattern to give
each visited node a unique identifier.
-Next, (or at the same time) traverse the spanning tree from start to finish
and begin building your second tree by allocating new nodes and connecting
the edges that form the spanning tree.
-Finally, take one more pass through the spanning tree, and using the synchronized
identifiers, connect the remaining missing edges in the new graph, so that they match the
connectivity of the old graph.
(e.g. If node5 in graph1 has edges connecting to node7 and node 11, then
use the ordering of graph2 to connect its node5 to its node7 and 11.)

Resources