Linked List function explanation, subscription of a structure pointer - c

While programming a simple singly linked list in C and looking for examples, I came across this repository on GitHub: https://github.com/clehner/ll.c
It contains the following function, _list_next(void *):
struct list
{
struct list *next; // on 64-bit-systems, we have 8 bytes here, on 32-bit-systems 4 bytes.
void *value[]; // ISO C99 flexible array member, incomplete type, sizeof may not be applied and evaluates to zero.
};
void *_list_next(void *list)
{
return list ? ((struct list *)list)[-1].next : NULL; // <-- what is happening here?
}
Could you explain how this works?
It looks like the author is casting a void pointer to a list pointer and then subscripting that pointer. How does that work, and what exactly happens there?
I don't understand the purpose of [-1].

This is undefined behavior that happens to work on the system where the author has tried it.
To understand what is going on, note the return value of _ll_new:
void * _ll_new(void *next, size_t size)
{
struct ll *ll = malloc(sizeof(struct ll) + size);
if (!ll)
return NULL;
ll->next = next;
return &ll->value;
}
The author gives you the address of value, not the address of the node. However, _list_next needs the address of struct list: otherwise it would be unable to access next. Therefore, in order to get to next member you need to find its address by walking back one member.
That is the idea behind indexing list at [-1] - it gets the address of next associated with this particular address of value. However, this indexes the array outside of its valid range, which is undefined behavior.
Other functions do that too, but they use pointer arithmetic instead of indexing. For example, _ll_pop uses
ll--;
which achieves the same result.
A better approach would be to use something along the lines of the container_of macro, which computes the address of the enclosing struct from the address of one of its members.
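A minimal sketch of that idea, assuming a node layout similar to the one in the question. The names node_from_value, node_new, node_next and node_free are made up for illustration and are not part of ll.c. Instead of indexing at [-1], it steps back offsetof(struct ll, value) bytes from the payload address, staying inside the block that malloc returned:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct ll {
    void *next;      /* payload pointer of the following node, or NULL */
    void *value[];   /* flexible array member: the payload starts here */
};

/* Hypothetical helper: recover the node from the payload address
   (the same job container_of does). */
static struct ll *node_from_value(void *value)
{
    assert(value != NULL);
    return (struct ll *)((unsigned char *)value - offsetof(struct ll, value));
}

/* Allocate a node with `size` payload bytes and hand out the payload address. */
void *node_new(void *next, size_t size)
{
    struct ll *node = malloc(sizeof *node + size);
    if (!node)
        return NULL;
    node->next = next;
    return node->value;
}

/* Payload pointer of the following node, or NULL at the end of the list. */
void *node_next(void *value)
{
    return value ? node_from_value(value)->next : NULL;
}

/* Release one node, given its payload pointer. */
void node_free(void *value)
{
    if (value)
        free(node_from_value(value));
}
```

The caller only ever sees payload pointers, as in the original library, but every step back to the header goes through one clearly named helper.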

Related

When pointer points to structure location in memory how can we access structures fields just through that address?

I'm new to the community and this is my first post, so hello to everyone.
I have recently started learning to program, C in particular, but I am confused about structures and how they are referenced in memory.
Here is an example where my lack of understanding keeps me from seeing exactly what is happening in the code.
When asking malloc for space for, say, a node structure, my understanding so far is that the computer allocates as many bytes as the struct needs, as given by the sizeof operator in the parentheses, starting at the location the pointer points to. But a pointer of the struct type we allocated memory for (in this case struct node) only stores the address of the first byte of that struct, as all pointers do, if I understand correctly.
Then, when syntax such as
*(pointername).exactfieldname
is used, I can't understand how it works exactly. For example, assume there is a node structure defined in the code, with two fields: an int called numbers and a pointer called next.
node *n=malloc (sizeof(node));
*(n).next=malloc (sizeof(node));
How is the computer, given only a pointer to the first of the allocated bytes, suddenly able to access the fields of the structure?
This is additionally confusing because, when defining a node struct for a linked list, it is possible to declare a pointer to the struct type inside the struct before its definition is complete, because it is just a pointer and only stores an address. Given that, a struct pointer can't have any special property that lets it access fields; it's still just a pointer, right?
When the pointer is dereferenced, does that mean the computer goes to the pointed-to location and "enters" the structure? And can the rest of the syntax after the dereference, like *(pointer).fieldname, be used because the computer is now "inside" the structure, so that .fieldname refers to a field of it?
I'll try to answer despite your question lacking some clarity.
If I get you right, you are confused by this:
typedef struct node {
struct node *next; // <<<< here
some_type_t data;
} node;
In the line marked, the compiler does not yet know what struct node looks like.
That is correct. It doesn't need to know that because we only store a pointer.
In that place you cannot define a non-pointer element of that type (or any other incomplete type) for exactly that reason.
Now if you come to that part:
node *n=malloc (sizeof(node));
n->next=malloc (sizeof(node));
(Note: Your syntax was incorrect)
You seem to wonder how the compiler would know what n->next really is as it was unknown when the struct was defined.
That does not matter.
It is known when the compiler comes to this line. You can only dereference a pointer if the type is fully known in that location.
The compiler now knows what node* means and can address the fields in *n and in the same way it can deal with n->next.
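One way to picture what n->next does: the struct definition tells the compiler the offset of each member, so member access is just "address stored in the pointer plus a constant". The sketch below checks that by hand with offsetof; the field names follow the question, and next_slot_by_hand is a made-up helper for illustration:

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int numbers;
    struct node *next;
};

/* Address of the `next` member computed manually:
   start of the struct plus the member's offset in bytes. */
struct node **next_slot_by_hand(struct node *n)
{
    assert(n != NULL);
    return (struct node **)((unsigned char *)n + offsetof(struct node, next));
}
```

For any valid n, next_slot_by_hand(n) is the same address as &n->next. The pointer itself carries no extra information; the knowledge about the fields lives in the pointer's type, which only exists at compile time.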
Study and try to understand the following code.
Compare the values that are printed out. %p prints address values (typically in hex format) and %d prints decimal values.
Take a good look at the arguments that are passed to the printf function. & is the 'address of' operator, and -> dereferences a pointer and selects a member: pointer->field is equivalent to (*pointer).field.
#include <stdio.h>
#include <stdlib.h>
struct node {
    struct node *next; //pointer to struct node
    struct data_rec { //embedded struct
        int value; //some value of type int
    } data; //data of type struct data_rec
};
//prints some information of the node
void printNodeInfo(struct node *node_)
{
    //%p expects a void *, hence the casts
    printf(
        "address of node: %p\n"
        "address of node.next: %p\n"
        "value of node.next: %p\n"
        "address of node.data: %p\n"
        "address of node.data.value: %p\n"
        "value of node.data.value: %d\n",
        (void *)node_,
        (void *)&node_->next,
        (void *)node_->next,
        (void *)&node_->data,
        (void *)&node_->data.value,
        node_->data.value
    );
}
int main(void)
{
    //allocation on the heap -> pointer to struct node
    struct node *allocated_node = malloc(sizeof(struct node));
    if (allocated_node == NULL)
        return 1;
    allocated_node->next = NULL;
    allocated_node->data.value = 0;
    //allocation on the stack (sizeof(struct node) bytes)
    struct node base_node;
    base_node.next = allocated_node;
    base_node.data.value = 42;
    printNodeInfo(&base_node);
    printNodeInfo(allocated_node);
    free(allocated_node);
    return 0;
}

Error with malloc while creating memory for list of struct pointers

For some reason, while trying to create an array of pointers of a struct called Node, I keep getting an error:
Node ret[2] = (struct Node*)malloc(2*sizeof(Node));
error: invalid initializer
Can someone please help me with that?
Node ret[2] = (struct Node*)malloc(2*sizeof(Node));
should probably be:
Node *ret = malloc(2 * sizeof(*ret));
That's because you need a pointer to the memory, not an array. With an array, the = introduces an initialisation, which would require a braced init list rather than a pointer value. Note that this line provides memory for two Node objects. If what you want is really an array of pointers to Node, declare Node **ret instead; that provides memory only for the pointers, not the things they point at, which need to be allocated separately if you wish to use them.
You'll probably notice two other changes as well:
I've removed the cast on the malloc return value. This serves no purpose in C, since the void* returned is implicitly converted to other object pointer types. In fact, there are situations where explicit casts can hide subtle problems, such as a missing #include <stdlib.h>.
I've used *ret as the operand of sizeof rather than the type. This is just a habit of mine so that, if I change the type to (for example) tNode, I only have to change it in one place on that line - yes, I'm basically lazy :-) Note that this is just a preference; doing it the original way has no adverse effect on the program itself, only on developer maintenance.
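If the goal really is an array of pointers to Node, each pointer needs its own allocation. A minimal sketch, assuming a Node with an int payload (the question does not show the struct) and the made-up helper names make_node_array and free_node_array:

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {
    int value;            /* assumed payload; not shown in the question */
    struct Node *next;
} Node;

/* Allocate `count` pointers, then one Node behind each pointer.
   Returns NULL (after releasing everything) if any allocation fails. */
Node **make_node_array(size_t count)
{
    assert(count > 0);
    Node **ret = malloc(count * sizeof *ret);
    if (!ret)
        return NULL;
    for (size_t i = 0; i < count; i++) {
        ret[i] = malloc(sizeof *ret[i]);
        if (!ret[i]) {
            while (i > 0)
                free(ret[--i]);   /* undo the allocations made so far */
            free(ret);
            return NULL;
        }
        ret[i]->value = 0;
        ret[i]->next = NULL;
    }
    return ret;
}

/* Release the nodes first, then the array of pointers. */
void free_node_array(Node **ret, size_t count)
{
    if (!ret)
        return;
    for (size_t i = 0; i < count; i++)
        free(ret[i]);
    free(ret);
}
```

Called as make_node_array(2), this yields the two separately allocated nodes the question seems to be after.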
I think your struct is typedef'ed. Instead of
Node ret[2] = ( struct Node* ) malloc( 2 * sizeof(Node) );
it should be
Node *ret[2] = { malloc(sizeof(Node)) , malloc(sizeof(Node)) };
or
Node *ret = malloc(2*sizeof(Node));

Pondering the purpose of TAILQ's tqe_prev not pointing to the previous node

sys/queue.h defines a data structure called TAILQ. It is widely used in BSD-derived code. Its definition looks like this:
#define TAILQ_ENTRY(type) \
struct { \
struct type *tqe_next; /* next element */ \
struct type **tqe_prev; /* address of previous next element */ \
}
I am a little baffled by this code: what is the advantage of having tqe_prev point to the tqe_next field of the previous node? If it were me, I would have tqe_prev point directly to the previous node, just as tqe_next points to the next node.
One reason I can think of: when we insert a node, we operate directly on the pointer to be updated and do not need to go through its owning node first. But is that it? Are there any other advantages?
I am also wondering how we can traverse the queue backwards. When we have a pointer to a node, its tqe_prev does not point to the previous node, so it seems we have no way to walk the queue back to the head. Or is such backward traversal, by design, not supported by TAILQ?
Oh, interesting. I didn't know this technique had any other users (I came up with it myself).
The reason to do things this way is that there may not be a "previous node": The first element does not have a predecessor, but it does have a pointer pointing to it.
This simplifies several operations. For example, if you want to delete a node given only a pointer to it, you can do this:
void delete(struct node *p) {
*p->tqe_prev = p->tqe_next;
if (p->tqe_next) {
p->tqe_next->tqe_prev = p->tqe_prev;
}
free(p);
}
If you had a pointer to the preceding node, you'd have to write this:
void delete(struct node *p) {
if (p->tqe_prev) {
p->tqe_prev->tqe_next = p->tqe_next;
} else {
???
}
if (p->tqe_next) {
p->tqe_next->tqe_prev = p->tqe_prev;
}
free(p);
}
... but now you're stuck: You can't write the ??? part without knowing where the root of the list is.
Similar arguments apply to insert operations.
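The delete case above and the matching insert can be put together in a small self-contained sketch. It uses a hypothetical struct node with its own field names (next and pprev standing in for tqe_next and tqe_prev) rather than the real TAILQ macros, which additionally keep a tail pointer in the list head:

```c
#include <assert.h>
#include <stddef.h>

struct node {
    int value;
    struct node *next;    /* role of tqe_next */
    struct node **pprev;  /* role of tqe_prev: address of the pointer that points at us */
};

/* Insert n at the front of the list whose head pointer is *head. */
void push_front(struct node **head, struct node *n)
{
    assert(head != NULL && n != NULL);
    n->next = *head;
    n->pprev = head;
    if (*head)
        (*head)->pprev = &n->next;
    *head = n;
}

/* Unlink n without knowing the head and without a first-element special case. */
void unlink_node(struct node *n)
{
    assert(n != NULL && n->pprev != NULL);
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
    n->next = NULL;
    n->pprev = NULL;
}
```

When the first element is unlinked, n->pprev is simply the address of the head pointer, so the head is updated by the same assignment that handles every other position.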
Backwards traversal is indeed not a priority for this kind of structure. But it can be done if need be (though only if you know for sure that you are not at the root, i.e. you know there actually is a previous node):
#include <stddef.h>
struct node *prev(struct node *p) {
return (struct node *)((unsigned char *)p->tqe_prev - offsetof(struct node, tqe_next));
}
We know that p->tqe_prev is the address of a .tqe_next slot within a struct node. We cast this address to (unsigned char *) so we can do bytewise pointer arithmetic. We subtract the (byte) offset of .tqe_next within the struct node structure (offsetof macro courtesy of <stddef.h>). This gives us the address of the beginning of the struct node structure, which we finally cast to the right type.
Linus answered the question in https://meta.slashdot.org/story/12/10/11/0030249/linus-torvalds-answers-your-questions.
The quote is as follows:
At the opposite end of the spectrum, I actually wish more people understood the really core low-level kind of coding. Not big, complex stuff like the lockless name lookup, but simply good use of pointers-to-pointers etc. For example, I've seen too many people who delete a singly-linked list entry by keeping track of the "prev" entry, and then to delete the entry, doing something like
if (prev)
prev->next = entry->next;
else
list_head = entry->next;
and whenever I see code like that, I just go "This person doesn't understand pointers". And it's sadly quite common.
People who understand pointers just use a "pointer to the entry pointer", and initialize that with the address of the list_head. And then as they traverse the list, they can remove the entry without using any conditionals, by just doing a "*pp = entry->next".
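A minimal sketch of what the quote describes, with made-up names (struct entry, remove_entry); it is an illustration, not code from the interview:

```c
#include <assert.h>
#include <stddef.h>

struct entry {
    int value;
    struct entry *next;
};

/* Unlink `target` from the list starting at *list_head.
   Returns 1 if it was found and unlinked, 0 otherwise. */
int remove_entry(struct entry **list_head, struct entry *target)
{
    assert(list_head != NULL);
    struct entry **pp = list_head;   /* address of the pointer being examined */

    while (*pp && *pp != target)
        pp = &(*pp)->next;           /* move on to the next "next" slot */

    if (*pp == NULL)
        return 0;                    /* target is not in this list */

    *pp = target->next;              /* same statement for head and non-head entries */
    return 1;
}
```

The only remaining test is the "not found" case; the unlink itself is one assignment, whether pp still holds the address of list_head or the address of some entry's next field.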

function definition in C

I don't understand why we use a * in a function definition like:
struct node *create_ll(struct node *start)
{
/* body here */
}
Why do we put that * before create_ll, which is the function name?
The function is called with the statement:
start = create_ll(start);
in case that helps.
Please explain this.
struct node *create_ll(struct node *)
means that the return type of this function is a pointer to struct node. Read it as
struct node * ,
not as
*create_ll.
The * has nothing to do with the NAME of the function.
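To make the declaration concrete, here is a sketch with a hypothetical body (the question does not show one, and the data member is an assumption): create_ll puts one new node in front of the list it is given and returns a pointer to the new first node.

```c
#include <stdlib.h>

struct node {
    int data;              /* assumed member, for illustration only */
    struct node *next;
};

/* Return type: "struct node *". Function name: create_ll. */
struct node *create_ll(struct node *start)
{
    struct node *n = malloc(sizeof *n);
    if (!n)
        return start;                       /* allocation failed: list unchanged */
    n->data = start ? start->data + 1 : 1;  /* arbitrary demo payload */
    n->next = start;
    return n;                               /* the new first node */
}
```

This is why the call site reads start = create_ll(start); the returned pointer replaces the caller's old head pointer.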
As stated by Sourav (I thought I'd elaborate further, and I can't comment due to low rep), the * in a declaration makes the type a pointer to the given type. A pointer is essentially a number that holds the starting memory address of an object, not the object itself. Its size depends on the platform the program is built for: typically 32 bits on a 32-bit OS/processor and 64 bits on a 64-bit OS/processor.
For instance, even if you have a 64-bit processor, a program running on Windows XP (32-bit) uses 32-bit pointers (4 bytes of memory each); if you switched over to a 64-bit OS and built the program for it, the pointers would be 64 bits (8 bytes of memory each).
To get a pointer to an existing object, the & operator is needed. Memory that is dynamically allocated using malloc() or something similar is already handed to you as a pointer.
To access a member through the pointer, the -> operator is used (instead of the . operator).
To give an example in code:
#include <stdio.h>
struct test_object
{
    unsigned int value;
};
void function(void)
{
    // Declare a POINTER to an object of type <struct test_object>
    struct test_object *pointer;
    // Declare 2 temporary objects
    struct test_object object1, object2;
    // Set object1's value using the . operator
    object1.value = 1;
    // Set object2's value using the . operator
    object2.value = 2;
    // Set the pointer to point at object2
    // Note the usage of the & operator
    pointer = &object2;
    // Print out whatever the pointer points to (in this case object2)
    // Note the usage of -> instead of .
    // This is how pointers access the object being pointed at
    printf("%u\n", pointer->value);
    // Now set the pointer to point at object1
    pointer = &object1;
    // Print out whatever the pointer points to (in this case object1)
    // Note this is the EXACT same line used above
    // but the end result is completely different
    printf("%u\n", pointer->value);
}
A word of warning: pointers can be quite dangerous if used incorrectly. In the above example, I didn't initialize the pointer when I declared it, i.e. I did not write:
struct test_object *pointer = NULL;
If you tried to use the printf line in the above code without setting the pointer first, then really bad stuff can happen (program crashes, accessing the wrong memory location giving unexpected results, etc).
The best way to avoid such things is to ALWAYS initialize pointers to NULL, and ALWAYS check that the pointer is not NULL before actually trying to access the memory being pointed to.
Re-using the above code, but making it safer:
void function(void)
{
    // Declare the pointer
    struct test_object *pointer = NULL;
    // Declare the 2 actual objects
    struct test_object object1, object2;
    // Set values
    object1.value = 1;
    object2.value = 2;
    // Check that the pointer points at something
    if (pointer != NULL)
    {
        // At this moment in time, it doesn't point at anything (it's still NULL)
        // So this code WON'T run, which stops the program crashing
        // Print out whatever the pointer points to
        printf("%u\n", pointer->value);
    }
    // Set the pointer to point at object2
    pointer = &object2;
    // Check that the pointer points at something
    if (pointer != NULL)
    {
        // Now it DOES point to something (anything other than NULL)
        // Print out whatever the pointer points to
        printf("%u\n", pointer->value);
    }
}
If you comment out the 2 if statements, then the program will probably crash when the first printf is reached (dereferencing a NULL pointer is undefined behaviour, so a crash is likely but not guaranteed).
I hope this answers your question
Here * means you are using pointers. The reason is that a Node could contain a lot of variables and so consume a lot of memory. To avoid copying all of that around, we use pointers like
struct Node*
When calling a function and passing or returning a pointer to Node, you save a lot of memory, since only the address of the Node is passed or returned. Otherwise a copy of the whole Node (huge if Node has a lot of variables) would be made in memory each time it is passed to or returned from a function.
Passing by value is like me going with you to the apple shop to buy the apples; in the pointer case I only tell you the address of the shop and you go and buy them yourself.
Now to the second part of your question, struct Node*:
here the function returns a pointer to struct Node, so to use it you would write the following code.
struct Node* someInput = NULL;
struct Node* someOutput=create_ll(someInput);
To use the members of someOutput, for example if Node has members called name and age, you would write
someOutput->age;
someOutput->name;
Your function is returning a pointer.
In this case it is returning a pointer of type struct node.
so the function prototype looks like
struct node *func(struct node *);

C generic linked-list

I have a generic linked list that holds data of type void*. I am trying to populate my list with objects of type struct employee; eventually I would like to destroy the struct employee objects as well.
Consider this generic linked-list header file (I have tested it with type char*):
struct accListNode //the nodes of a linked-list for any data type
{
void *data; //generic pointer to any data type
struct accListNode *next; //the next node in the list
};
struct accList //a linked-list consisting of accListNodes
{
struct accListNode *head;
struct accListNode *tail;
int size;
};
void accList_allocate(struct accList *theList); //allocate the accList and set to NULL
void appendToEnd(void *data, struct accList *theList); //append data to the end of the accList
void removeData(void *data, struct accList *theList); //removes data from accList
--------------------------------------------------------------------------------------
Consider the employee structure
struct employee
{
char name[20];
float wageRate;
};
Now consider this sample testcase that will be called from main():
void test2()
{
struct accList secondList;
struct employee *emp = Malloc(sizeof(struct employee));
emp->name = "Dan";
emp->wageRate =.5;
struct employee *emp2 = Malloc(sizeof(struct employee));
emp2->name = "Stan";
emp2->wageRate = .3;
accList_allocate(&secondList);
appendToEnd(emp, &secondList);
appendToEnd(emp2, &secondList);
printf("Employee: %s\n", ((struct employee*)secondList.head->data)->name); //cast to type struct employee
printf("Employee2: %s\n", ((struct employee*)secondList.tail->data)->name);
}
Why does the answer that I posted below solve my problem? I believe it has something to do with pointers and memory allocation. The function Malloc() that i use is a custom malloc that checks for NULL being returned.
Here is a link to my entire generic linked list implementation: https://codereview.stackexchange.com/questions/13007/c-linked-list-implementation
The problem is this accList_allocate() and your use of it.
struct accList secondList;
accList_allocate(&secondList);
In the original test2(), secondList is memory on the stack and &secondList is a pointer to that memory. When you call accList_allocate(), a copy of that pointer is passed in. Malloc() then returns a chunk of memory whose address is assigned to the copy of the pointer, so it is the new chunk that gets set up, not the original secondList.
Coming back out, secondList is still uninitialised memory on the stack, so the call to appendToEnd() fails.
The same happens with the answer, except that secondList just happens to be free of junk there. Possibly by chance, possibly by design of the compiler. Either way it is not something you should rely on.
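The effect can be reproduced in isolation. The sketch below is reconstructed from the description in these answers, not copied from the original post: allocate_broken is a guess at what the faulty function does, and both function names are made up.

```c
#include <assert.h>
#include <stdlib.h>

struct accList {
    void *head;
    void *tail;
    int size;
};

/* Mirrors the mistake: theList is a local copy of the caller's pointer,
   so reassigning it changes nothing for the caller. */
void allocate_broken(struct accList *theList)
{
    theList = malloc(sizeof *theList);   /* overwrites the local copy only */
    if (!theList)
        return;
    theList->head = NULL;
    theList->tail = NULL;
    theList->size = 0;
    free(theList);   /* nobody else can reach this block, so release it here */
}

/* Working variant: write through the pointer the caller passed in. */
void initialise(struct accList *theList)
{
    assert(theList != NULL);
    theList->head = NULL;
    theList->tail = NULL;
    theList->size = 0;
}
```

After allocate_broken(&list) the caller's list holds whatever it held before the call; after initialise(&list) its three fields are NULL, NULL and 0.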
Either:
struct accList *secondList = NULL;
accList_allocate(&secondList);
and change accList_allocate() to
void accList_allocate(struct accList **theList) {
    *theList = Malloc(sizeof(struct accList));
    (*theList)->head = NULL;
    (*theList)->tail = NULL;
    (*theList)->size = 0;
}
OR
struct accList secondList;
accList_initialise(&secondList);
with accList_allocate() renamed to accList_initialise(), because it does not allocate:
void accList_initialise(struct accList *theList) {
    theList->head = NULL;
    theList->tail = NULL;
    theList->size = 0;
}
I think that your problem is this:
You've allocated secondList on the stack in your original test2 function.
The stack memory is probably dirty, so secondList requires initialization.
Your accList_allocate function takes a pointer to the list, but then overwrites that pointer with the result of the Malloc call. This means that the struct you passed in is never initialized.
When test2 then uses the list, it hits a bad pointer (because the memory isn't initialized).
The reason that it works when you declare the list in main is probably luck: the stack memory that main uses early in the program often happens to be zero-filled, so head, tail and size start out as zero. Nothing in the C language guarantees this, so secondList is only accidentally initialized properly when you declare it in main.
Your current accList_allocate doesn't actually initialize the pointer that's been passed in, and the rest of your code will never see the pointer that it allocates with Malloc. To solve your problem, I would create a new function: accList_initialize whose only job is to initialize the list:
void accList_initialize(struct accList* theList)
{
// NO malloc
theList->head = NULL;
theList->tail = NULL;
theList->size = 0;
}
Use this, instead of accList_allocate in your original test2 function. If you really want to allocate the list on the heap, then you should do so (and not mix it with a struct allocated on the stack). Have accList_allocate return a pointer to the allocated structure:
struct accList* accList_allocate(void)
{
struct accList* theList = Malloc( sizeof(struct accList) );
accList_initialize(theList);
return theList;
}
A few things I see wrong here, based on the original code in the above question:
What you've seen is undefined behaviour, and the bus error arose from it: you were assigning a string literal to the array member when you should have been using the strcpy function. You've since edited your original code accordingly, so this is just something to keep in mind for the future :)
The name Malloc is going to cause confusion, especially in peer review: the reviewers are going to stop and say "whoa, what's this, should that not be malloc?" and very likely raise it. (Basically, do not give custom functions names that look or sound almost like C standard library functions.)
You're not checking for NULL. If your souped-up version of Malloc fails, emp is going to be NULL! Always check, no matter how trivial the allocation or how tempting it is to think "Ah sure, the platform has heaps of memory on it, 4GB RAM, no problem, I won't bother to check for NULL".
Have a look at this question posted elsewhere to explain what is a bus error.
Edit: With linked list structures, how the arguments are passed to the functions is crucial to understanding them. Notice the use of &, which takes the address of the variable holding the linked list structure and so passes it by reference, as opposed to passing by value, which passes a copy of the variable. The same rule applies to the use of pointers in general :)
You've got the parameters slightly wrong in the first code in your question; if you were using double pointers in the parameter list then yes, passing &secondList would have worked.
It may depend on how your Employee structure is designed, but you should note that
strcpy(emp->name, "Dan");
and
emp->name = "Dan";
behave differently. In particular, if name is a char * member, the latter only makes it point at the string literal, and a later write through it is a likely source of bus errors because string literals generally cannot be modified. Especially if your code has something like
name = "NONE"
or the like.
EDIT: Okay, so with the design of the employee struct, the problem is this:
You can't assign to arrays. The C Standard defines which lvalues are modifiable, and arrays are not among them.
char name[20];
name = "JAMES"; //illegal
strcpy is fine: it takes the address of name[0] and copies "JAMES" plus the terminating '\0' into the memory there, one byte at a time.
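One caveat: strcpy does not check that the source fits into the 20 bytes. A bounded copy can be sketched like this, using the employee struct from the question and a made-up helper name, employee_set_name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct employee {
    char name[20];
    float wageRate;
};

/* Hypothetical helper: copy `name` into the fixed-size array,
   truncating if necessary. Returns 1 if the whole name fit, 0 if not. */
int employee_set_name(struct employee *emp, const char *name)
{
    assert(emp != NULL && name != NULL);
    /* snprintf writes at most sizeof emp->name bytes, including the '\0',
       and returns the length the full string would have needed. */
    int written = snprintf(emp->name, sizeof emp->name, "%s", name);
    return written >= 0 && (size_t)written < sizeof emp->name;
}
```

With this, employee_set_name(emp, "Dan") takes the place of emp->name = "Dan"; and an over-long name is cut off at 19 characters instead of overflowing the array.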
