How is it possible for a C struct to reference itself? - c

How does a C compiler (I'm using GCC) know what to do with the following?
struct node
{
    int x;
    struct node* next;
};
More precisely, if node has not yet been completely defined (we have not reached the closing curly brace), how does the compiler know how big the struct ought to be?
While I realize that "pointing to" only requires an address, incrementing a pointer does require the size of the data it points to.

The size of the struct does not matter here, because the member stores a pointer to the struct, not the struct itself.
As for incrementing pointers to the struct: that happens outside the struct definition, where the type is already complete, so again it is not a problem.
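A minimal sketch of both points, reusing the struct node from the question. The helper name third_node_value is made up for illustration, and it assumes the caller passes a pointer into an array of at least three nodes.

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int x;
    /* struct node is still incomplete on this line, but a pointer to it
       has a known size, so the member is accepted. */
    struct node *next;
};

/* Pointer arithmetic happens here, after the closing brace, where
   sizeof(struct node) is known. Hypothetical helper: first must point
   into an array holding at least three nodes. */
int third_node_value(struct node *first)
{
    assert(first != NULL);
    struct node *p = first;
    p += 2;                 /* advances by 2 * sizeof(struct node) bytes */
    return p->x;
}
```

The compiler only needs the struct's size when it compiles `p += 2`, and by then the definition is finished.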

Related

Is there a principle for choosing between embedding a struct itself or the pointer to a struct inside a struct?

This is a code snippet from QEMU (qemu-5.1.0, include/hw/arm/smmu-common.h):
typedef struct SMMUDevice {
    void *smmu;
    PCIBus *bus;
    int devfn;
    IOMMUMemoryRegion iommu;
    AddressSpace as;
    uint32_t cfg_cache_hits;
    uint32_t cfg_cache_misses;
    QLIST_ENTRY(SMMUDevice) next;
} SMMUDevice;
I've seen a lot of code like this, and I am now curious whether there is any principle or rule for choosing between
embedding a struct A inside a struct B
embedding a pointer to the struct A inside a struct B
Two things come to mind right away. If struct A is to be shared by many structs, it is better to use a pointer. And if the containing struct (that is, struct B) is frequently passed as a function argument, it would be better to use a pointer (a pointer to struct B as the argument, or a pointer to A inside struct B with struct B as the argument), because copying the struct onto the stack would take a long time.
I am curious if there are other important rules.
There's no correct answer because it depends on what you want to use them for. Storing a struct inside another struct is generally more efficient, since it gives faster access and better data cache use.
However, it isn't as flexible. If you wish to swap out the whole contents of a big struct for something else, it is much faster to swap two pointers than to do a hard copy of all the data. Pointers also enable different forms of allocation: you could have a struct with static storage duration that holds a pointer to dynamically allocated memory, for example.
if a struct A is to be shared by many structs, it is better to use pointer
I don't see how that matters at all. It's just a . vs -> notation by the code using it.
or if the struct containing the struct(that is, struct B) is to be frequently passed as a function argument, it would be better to use pointer
No, that's nonsense: you'd always pass the outer struct through a pointer no matter what members it has. Passing it by value doesn't make any sense in either scenario.
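A rough sketch of the trade-off described above. All the type and function names here (payload, embeds, points, swap_payloads, first_embedded, first_pointed) are hypothetical and not taken from QEMU.

```c
#include <assert.h>
#include <stddef.h>

struct payload { int values[256]; };   /* hypothetical "big" struct A */

struct embeds { int id; struct payload data;  };  /* A stored inside B */
struct points { int id; struct payload *data; };  /* B holds a pointer to A */

/* Swapping the contents of two pointer-holding structs only exchanges
   two addresses; no payload bytes are copied. */
void swap_payloads(struct points *a, struct points *b)
{
    assert(a != NULL && b != NULL);
    struct payload *tmp = a->data;
    a->data = b->data;
    b->data = tmp;
}

/* Either way the outer struct is passed by pointer; only the member
   access differs: '.' for the embedded struct, '->' for the pointer. */
int first_embedded(const struct embeds *e) { return e->data.values[0]; }
int first_pointed(const struct points *p)  { return p->data->values[0]; }
```

With struct embeds the payload sits in the same block of memory as id, so there is nothing extra to allocate or free; with struct points the payload has to live somewhere else, and the code has to keep track of who owns it.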

Pointers to structures when declaring structure inside a structure

I have a question regarding structures definition and pointers.
In the definition of linked list node structure we define the structure as follows:
typedef struct node
{
    int data;
    struct node *next;
} Node;
Why would we use this form of declaration instead of:
typedef struct node
{
    int data;
    struct node next; //changed this line
} Node;
Thanks in advance!
A structure type is complete only after its closing brace. Until then, the structure has an incomplete type. But the definition of a structure requires that all its members, except a trailing flexible array member, have complete types.
So in this declaration
typedef struct node
{
    int data;
    struct node next; //changed this line
} Node;
the data member next has an incomplete type.
From the C Standard (6.7.2.1 Structure and union specifiers)
...The type is incomplete until immediately after the } that terminates
the list, and complete thereafter.
and
3 A structure or union shall not contain a member with incomplete or
function type (hence, a structure shall not contain an instance of
itself, but may contain a pointer to an instance of itself), except
that the last member of a structure with more than one named member
may have incomplete array type; such a structure (and any union
containing, possibly recursively, a member that is such a structure)
shall not be a member of a structure or an element of an array.
As for pointers, they always have complete types, because the size of a pointer is known even when the type it points to is incomplete.
We do it this way to avoid infinite recursion. Consider your second case, where a node contains a node. What is sizeof(node)? It is sizeof(int) + sizeof(node). But that inner node is again sizeof(int) + sizeof(node), and so on: the computation never terminates. The first case avoids the recursion, because the member merely points to an object of the same structure type.
The compiler needs to determine the size of every member it encounters.
That requires a complete definition, so that the size can actually be calculated.
In the first case, the self-referential structure, we have a pointer. A pointer of type struct node * has a definite size determined by the architecture.
In the second case - struct node next, what would be the size of next? Would it be the size of struct node? Okay, let's say it is, but then again - what is the size of struct node? Well, the answer is sizeof(int) + sizeof(struct node). Okay, but then again... wait...what??... go back and read this entire para again, and realize the catch-22 situation here...
The compilers won't and don't appreciate this!
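To make the difference concrete, here is a small sketch built on the Node typedef from the question. The helpers push_front and list_sum are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node {
    int data;
    struct node *next;      /* pointer: complete type, size is known */
    /* struct node next;       would be rejected: struct node is still
                               incomplete at this point */
} Node;

/* Link n in front of head and return the new head. */
Node *push_front(Node *head, Node *n)
{
    assert(n != NULL);
    n->next = head;
    return n;
}

/* Add up the data fields by following next until NULL. */
int list_sum(const Node *head)
{
    int sum = 0;
    for (const Node *n = head; n != NULL; n = n->next)
        sum += n->data;
    return sum;
}
```

Each Node has a fixed, finite size, and the chain is formed at run time by storing addresses in next, with NULL marking the end.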

Node pointer in the definition of node itself, how does it work?

Consider this struct definition:
struct node {
    int age;
    struct node* next;
};
How is it possible to have struct node *next in struct node {...} when it is in the definition of struct node itself?
What would the value of sizeof(struct node*) be?
With:
struct node* to_add = (struct node*)malloc(sizeof(struct node));
Does to_add only have the address of the allocated memory?
Although I have used and implemented data structures in C/C++, I still have some basic doubts. Could you please help me clear them up? I tried searching online, but the doubts remain.
Because you only need a declaration of a type to define a pointer to that type. The start of the struct definition declares the type so you're ok to have pointers to that type inside it.
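sizeof(struct node*) evaluates to the size of the pointer, not the size of the struct.
malloc returns a pointer to a block of heap memory large enough for the type, but the memory is uninitialized.
So to_add holds only the address of that block. The size is right, but the contents are uninitialized.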
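A short sketch of the malloc point, using the struct node from the question. The function node_new is a hypothetical helper, and the non-negative age check is an assumption made only for this example.

```c
#include <assert.h>
#include <stdlib.h>

struct node {
    int age;
    struct node *next;
};

/* Allocate one node and give every member a defined value.
   Returns NULL if malloc fails. */
struct node *node_new(int age)
{
    assert(age >= 0);                    /* assumption: ages are non-negative */
    struct node *n = malloc(sizeof *n);  /* no cast is needed in C */
    if (n == NULL)
        return NULL;
    n->age = age;      /* malloc leaves the bytes indeterminate, */
    n->next = NULL;    /* so set each member before reading it   */
    return n;
}
```

The caller gets back just an address, and is responsible for passing it to free when the node is no longer needed.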
Here's an MS Painted pictorial description of what such a node would look like:
Each node has a value (42, 69, or 613 in this example) and each holds a pointer to the next node, the last node holding a pointer to nothing.
This isn't an infinite recursive structure - a node doesn't have a node member, it has a node* member. And a pointer is just a pointer, another memory address. sizeof(node*) is the same as the size of any other pointer (whether 4 or 8 bytes). sizeof(node) is more interesting, but dependent on padding and other alignment issues. On a 32-bit machine, it'll probably be 8 bytes. On a 64-bit machine, it'll probably be 16 bytes (with 4 bytes of padding in the middle).
A declaration is sufficient to create a pointer. E.g.
struct node;
struct node* nodePtr = NULL;
The size of a pointer does not depend on the full definition of the object. Hence you can use:
struct node;
struct node* nodePtr = NULL;
size_t s = sizeof(nodePtr);
Since creation of a pointer does not depend on the full definition of the struct, it is possible to use:
struct node { // At this point struct node is a declared type.
    int age;
    // Since struct node is a declared type, you can create a pointer
    struct node* next;
};
It's a pointer. Pointer declarations are naturally opaque: to have a pointer, the compiler only needs to know that the pointed-to type has been declared, not how big it is or what it contains. Opaque pointers are, in fact, a way of building an encapsulated API in C/C++; they're used by libraries such as SDL.
It should normally be the same size as any other pointer. Pointers are usually implemented as simple memory addresses, but memory itself can be a complicated matter depending on the architecture and the software running on it. On x86-64, I believe the size is the same as one word, which is 8 bytes.
Depends on implementation or architecture. Normally (meaning x86 or x86-64), yes.
Yes.
One more thing: note that sizeof(struct node) can differ from the sum of the sizes of age and next, because the compiler may insert padding for alignment. Compilers typically place each member on a boundary suited to its type (often the word size), since many architectures access aligned data faster.
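The sizes and padding can be inspected directly with sizeof and offsetof. This is only a sketch: padding_before_next and print_node_layout are made-up helper names, and the numbers depend on the compiler and target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct node {
    int age;
    struct node *next;
};

/* Bytes of padding the compiler placed between age and next.
   age is the first member, so it sits at offset 0. */
size_t padding_before_next(void)
{
    return offsetof(struct node, next) - sizeof(int);
}

/* Print the layout chosen by the current compiler and target. */
void print_node_layout(void)
{
    /* The total can exceed the sum of the members, never fall below it. */
    assert(sizeof(struct node) >= sizeof(int) + sizeof(struct node *));
    printf("sizeof(struct node *) = %zu\n", sizeof(struct node *));
    printf("sizeof(struct node)   = %zu\n", sizeof(struct node));
    printf("offsetof(next)        = %zu\n", offsetof(struct node, next));
    printf("padding before next   = %zu\n", padding_before_next());
}
```

On a typical x86-64 Linux build with GCC one would expect 8, 16, 8 and 4 respectively, which matches the "4 bytes of padding in the middle" mentioned above; other targets may print different values.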

Arrow vs. Dot in C Structs?

I've got a specific question regarding the arrow vs. dot notation for structs in C. I understand that -> is used for struct pointers, and . is used for objects, however I've been having some trouble parsing some code I found online.
typedef struct node {
    int data;
} Node;
typedef struct heap {
    int size;
    Node *dataArray;
} Heap;
typedef struct plan {
    int maxPile;
    Heap *heapArray;
} Plan;
Given this code, if I create:
Plan *p;
And then I want to access a specific index in the heapArray inside Plan I would do:
p->heapArray[i]
From here, though, if I want to access either the size or the dataArray inside a struct heap, would I use '->' or '.'?
So if I wanted to get the first element of the data array of that heap would I do:
p->heapArray[i].dataArray[0]
or
p->heapArray[i]->dataArray[0]
The correct answer is
p->heapArray[i].dataArray[0]
because when you use the subscript on the heapArray pointer, it's like doing pointer arithmetic and then dereferencing the pointer, something like this
(*(p->heapArray + i)).dataArray[0]
so when you dereference it, the type of it becomes Heap which means it's not a pointer and has to be accessed with a . and not a ->.
p->heapArray[i] is of type Heap, which is a struct, so you'd use the . operator.
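Both spellings can be checked side by side with the typedefs from the question. The function names first_data and first_data_explicit are invented here, and the bounds check assumes maxPile is the number of entries in heapArray, which the question does not actually state.

```c
#include <assert.h>
#include <stddef.h>

typedef struct node { int data; } Node;
typedef struct heap { int size; Node *dataArray; } Heap;
typedef struct plan { int maxPile; Heap *heapArray; } Plan;

/* p is a pointer             -> use '->'
   p->heapArray[i] is a Heap  -> use '.'
   dataArray[0] is a Node     -> use '.' */
int first_data(const Plan *p, int i)
{
    /* Assumption: maxPile counts the heaps in heapArray. */
    assert(p != NULL && i >= 0 && i < p->maxPile);
    return p->heapArray[i].dataArray[0].data;
}

/* The same access with the subscript written as pointer arithmetic. */
int first_data_explicit(const Plan *p, int i)
{
    return (*(p->heapArray + i)).dataArray[0].data;
}
```

Writing p->heapArray[i]->dataArray[0] does not compile, because the left operand of -> must be a pointer and p->heapArray[i] is a Heap object.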

Pointer to struct containing structs

I am writing a multithreaded application and would like to pass around pointers to a struct.
Do the structs inside the struct need to be malloced separately, or, if the outer struct is malloced, will that keep the inner structs from being deleted or lost while the pointer is passed around?
Struct I am asking about is
struct thread_data
{
    position starttile;
    position destinationtile;
    char *message;
};
where position is a struct itself that contains no pointers.
If the struct contains child structs, then it is generally all one block of memory. And so there would be no separate allocation.
If the struct instead contains pointers to structs, then my previous comment would not apply. In this case, it kind of depends on what you are doing.
Had you considered posting a tiny bit of code so people would have a clue what you had in mind?
You will probably find it easier to manage memory if you do
struct X {
    struct Y data;
};
struct X* var = malloc(sizeof(struct X));
instead of
struct X {
    struct Y* pData;
};
struct X* var = malloc(sizeof(struct X));
var->pData = malloc(sizeof(struct Y));
If your outer struct contains actual structures, there's no need to allocate them separately.
If your outer struct contains pointers to structures, then they'll need to be allocated somewhere.
It is easier if your outer structure contains actual structures. Even so, with pointers, simply make sure that you never make the pointer to the outer structure available to other threads until the inner structures are fully allocated - which avoids threading issues on allocation. Deallocation will require suitable care to ensure exclusive access.
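Applied to the struct from the question, this might look like the sketch below. The layout of position (two ints) is an assumption, since the question does not show it, and thread_data_new and thread_data_free are hypothetical helper names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct { int x; int y; } position;   /* assumed layout */

struct thread_data {
    position starttile;        /* embedded: lives inside the block */
    position destinationtile;  /* embedded: lives inside the block */
    char *message;             /* pointer: needs its own storage   */
};

/* One malloc covers both embedded positions; only the string that
   message points to needs a second allocation. Returns NULL on failure. */
struct thread_data *thread_data_new(position start, position dest,
                                    const char *msg)
{
    assert(msg != NULL);
    struct thread_data *td = malloc(sizeof *td);
    if (td == NULL)
        return NULL;
    td->starttile = start;
    td->destinationtile = dest;
    td->message = malloc(strlen(msg) + 1);
    if (td->message == NULL) {
        free(td);
        return NULL;
    }
    strcpy(td->message, msg);
    return td;
}

/* Release the string first, then the outer block. */
void thread_data_free(struct thread_data *td)
{
    if (td != NULL) {
        free(td->message);
        free(td);
    }
}
```

Because the struct is fully built before thread_data_new returns, the pointer can be handed to another thread only after every member is valid. Which thread eventually calls thread_data_free is still something the program has to decide.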

Resources