I think I have a basic understanding of how skip lists work, but being new to them in addition to being a C-beginner has me confused on a few points, especially the initialization of the list. Here's the code I'm trying to follow:
#define MAXSKIPLEVEL 5
typedef struct Node {
int data;
struct Node *next[1];
} Node;
typedef struct SkipList {
Node *header;
int level;
} SkipList;
// Initialize skip list
SkipList* initList() {
SkipList *list = calloc(1, sizeof(SkipList));
if ((list->header = calloc(1, sizeof(Node) + MAXSKIPLEVEL*sizeof(Node*))) == 0) {
printf("Memory Error\n");
exit(1);
}
for (int i = 0; i < MAXSKIPLEVEL; i++)
list->header->next[i] = list->header;
return list;
}
I haven't done anything with arrays of pointers yet in C, so I think I'm getting a bit caught up with how they work. I have a few questions if someone would be kind enough to help me out.
First, I did sizeof(int) and got 4, sizeof(Node*) and got 8, so I expected sizeof(Node) to equal 12, but it ended up being 16, why is this? Same confusion with the size of SkipList compared to the sizes of its contents. I took the typedef and [1] out to see if either of them was the cause, but the size was still 16.
Second, why is the [1] in struct Node *next[1]? Is it needed for the list->header->next[i] later on? Is it okay that next[i] will go higher than 1? Is it just because the number of pointers for each node is variable, so you make it an array then increase it individually later on?
Third, why does list->header->next[i] = list->header initially instead of NULL?
Any advice/comments are greatly appreciated, thanks.
For your first question - why isn't the size of the struct the size of its members? - this is due to struct padding, where the compiler, usually for alignment reasons, may add in extra blank space between or after members of a struct in order to get the size up to a nice multiple of some fundamental size (often 8 or 16). There's no portable way to force the size of the struct to be exactly the size of its members, though most compilers have some custom switches you can flip to do this.
For your second question - why the [1]? - the idea here is that when you actually allocate one of the node structs, you'll overallocate the space so that the memory at the end of the struct can be used for the pointers. By creating an array of length one and then overallocating the space, you make it syntactically convenient to access this overallocated space as though it were a part of the struct all along. Newer versions of C have a concept called flexible array members that have supplanted this technique; I'd recommend Googling it and seeing if that helps out.
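As a rough illustration of the flexible array member alternative mentioned above, here is a minimal sketch assuming a C99 compiler. The names FlexNode and flex_node_create are hypothetical and do not come from the code in the question.

```c
#include <stdlib.h>

/* Hypothetical C99 variant of the question's Node: next[] is a
   flexible array member, so it contributes no elements to sizeof(FlexNode). */
typedef struct FlexNode {
    int data;
    struct FlexNode *next[];   /* length is chosen at allocation time */
} FlexNode;

/* Allocate a node with room for `levels` forward pointers, all set to NULL.
   Returns NULL if the allocation fails; the caller frees the node. */
FlexNode *flex_node_create(int data, size_t levels) {
    FlexNode *n = malloc(sizeof *n + levels * sizeof n->next[0]);
    if (n == NULL)
        return NULL;
    n->data = data;
    for (size_t i = 0; i < levels; i++)
        n->next[i] = NULL;
    return n;
}
```

The allocation size is written out the same way as in the question (struct size plus space for the pointers), but no dummy one-element array is needed.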
For your final question - why does list->header->next[i] initially point to list->header rather than NULL? - without seeing more of the code it's hard to say. Many implementations of linked list structures use some sort of trick like this to avoid having to special-case on NULL in the implementation, and it's entirely possible that this sort of trick is getting used here as well.
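To make the "avoid special-casing NULL" idea concrete, here is a sketch of how a search might use the header as an end-of-list sentinel. The function skip_find is hypothetical; the rest of the asker's code is not shown, so this is only one plausible way the trick could be used.

```c
#include <stdlib.h>

#define MAXSKIPLEVEL 5

typedef struct Node {
    int data;
    struct Node *next[1];   /* over-allocated, as in the question */
} Node;

typedef struct SkipList {
    Node *header;
    int level;              /* highest level currently in use */
} SkipList;

/* Hypothetical search: the header doubles as the end-of-list marker,
   so arriving back at it means "no more nodes on this level". */
Node *skip_find(const SkipList *list, int key) {
    Node *x = list->header;
    for (int i = list->level; i >= 0; i--) {
        while (x->next[i] != list->header && x->next[i]->data < key)
            x = x->next[i];
    }
    x = x->next[0];
    return (x != list->header && x->data == key) ? x : NULL;
}
```

Because every next pointer always refers to a real node (possibly the header itself), the loop never has to test for NULL before following a pointer.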
The sizeof number is 16 because of structure padding.
Many architectures either require or strongly prefer their pointers to be aligned on a certain boundary (e.g., a 4-byte or 8-byte boundary). Misaligned accesses will either fail or perform slowly. Your C compiler is probably inserting 4 unused bytes between data and next so that the 8-byte pointer starts on an 8-byte boundary, which increases the structure size from 12 to 16 bytes.
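One way to see where the padding goes is to measure it with offsetof. The sketch below uses a hypothetical stand-in struct named PadDemo with the same members as the question's Node; the helper names are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in with the same members as the question's Node. */
typedef struct PadDemo {
    int data;
    struct PadDemo *next[1];
} PadDemo;

/* Padding bytes the compiler inserted between data and next. */
size_t pad_between(void) {
    return offsetof(PadDemo, next) - sizeof(int);
}

/* Padding bytes after the last member. */
size_t pad_after(void) {
    return sizeof(PadDemo) - offsetof(PadDemo, next) - sizeof(PadDemo *);
}
```

With a 4-byte int and 8-byte pointers, as reported in the question, one would expect pad_between() to be 4 and pad_after() to be 0, which accounts for the size of 16. The exact values depend on the compiler and target.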
There is more explanation available from the C FAQ.
I'm quite confused about the difference between flexible array members and pointers as struct members. Someone suggested that a struct with a pointer needs two malloc calls. However, consider the following code:
struct Vector {
size_t size;
double *data;
};
int len = 20;
struct Vector* newVector = malloc(sizeof *newVector + len * sizeof*newVector->data);
printf("%p\n",newVector->data);//print 0x0
newVector->data =(double*)((char*)newVector + sizeof*newVector);
// do sth
free(newVector);
One difference I found is that the value of the data member of Vector is not defined after the allocation; the programmer needs a cast to compute the exact address. However, if Vector is defined as:
struct Vector {
size_t size;
double data[];
};
Then the address of data is defined.
I am wondering whether it is safe to malloc a struct with a pointer like this, and what exactly the reason is that programmers call malloc twice when using a struct with pointers.
The difference is how the struct is stored. In the first example you over-allocate memory but that doesn't magically mean that the data pointer gets set to point at that memory. Its value after malloc is in fact indeterminate, so you can't reliably print it.
Sure, you can set that pointer to point beyond the part allocated by the struct itself, but that means potentially slower access since you need to go through the pointer each time. Also you allocate the pointer itself as extra space (and potentially extra padding because of it), whereas in a flexible array member sizeof doesn't count the flexible array member. Your first design is overall much more cumbersome than the flexible version, but other than that well-defined.
The reason why people malloc twice when using a struct with pointers could either be that they aren't aware of flexible array members or using C90, or alternatively that the code isn't performance-critical and they just don't care about the overhead caused by fragmented allocation.
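For comparison, here is a minimal sketch of the flexible array member version with a single allocation. The names FlexVector and flex_vector_create are hypothetical, and a C99 compiler is assumed.

```c
#include <stdlib.h>

/* Hypothetical flexible-array version of the question's Vector. */
struct FlexVector {
    size_t size;
    double data[];   /* storage follows the struct in the same allocation */
};

/* Allocate a vector of `len` doubles, all set to 0.0.
   Returns NULL on failure; a single free() releases everything. */
struct FlexVector *flex_vector_create(size_t len) {
    struct FlexVector *v = malloc(sizeof *v + len * sizeof v->data[0]);
    if (v == NULL)
        return NULL;
    v->size = len;
    for (size_t i = 0; i < len; i++)
        v->data[i] = 0.0;
    return v;
}
```

There is no pointer to set up after the allocation: v->data already refers to the storage directly behind the struct, and the compiler accounts for any padding needed in front of it.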
I am wondering whether it is safe to malloc a struct with a pointer like this, and what exactly the reason is that programmers call malloc twice when using a struct with pointers.
If you use the pointer method and call malloc only once, there is one extra thing you need to take care of in the calculation: alignment.
Let's add one extra field to the structure:
struct Vector {
size_t size;
uint32_t extra;
double *data;
};
Let's assume that we are on a system where each field is 4 bytes, there is no trailing padding in the struct, and its total size is 12 bytes. Let's also assume that double is 8 bytes and requires 8-byte alignment.
Now there is a problem: the expression (char*)newVector + sizeof*newVector no longer gives an address that is divisible by 8. There needs to be 4 bytes of manual padding between the structure and the data. This complicates both the malloc size calculation and the data pointer offset calculation.
So the main reason you see the single-malloc pointer version less often is that it is harder to get right. With a pointer and two mallocs, or with a flexible array member, the compiler and the allocator take care of the necessary alignment and padding so you don't have to.
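The manual padding calculation described above could look roughly like this. It is a sketch only: the names PtrVector, round_up and ptr_vector_create are hypothetical, and _Alignof assumes a C11 compiler.

```c
#include <stdint.h>
#include <stdlib.h>

struct PtrVector {           /* the three-field struct from the example above */
    size_t size;
    uint32_t extra;
    double *data;
};

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}

/* One malloc for header plus payload, with the payload offset padded
   so that data is correctly aligned for double. */
struct PtrVector *ptr_vector_create(size_t len) {
    size_t offset = round_up(sizeof(struct PtrVector), _Alignof(double));
    struct PtrVector *v = malloc(offset + len * sizeof(double));
    if (v == NULL)
        return NULL;
    v->size = len;
    v->extra = 0;
    v->data = (double *)((char *)v + offset);
    return v;
}
```

Both the allocation size and the data pointer use the rounded-up offset rather than sizeof directly, which is exactly the extra bookkeeping the flexible array member version avoids.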
I'm confused about how to access an array of structs.
simple case:
typedef struct node
{
int number;
struct node *left;
struct node *right;
} node;
node *nodeArray = malloc(sizeof(node));
nodeArray->number = 5;
So, that all makes sense. but the following doesn't work:
typedef struct node
{
int number;
struct node *left;
struct node *right;
} node;
node *nodeArray = malloc(511 * sizeof(node));
for(int i = 0; i < 511; i++)
{
nodeArray[i]->number = i;
}
However, nodeArray[i].number = i does seem to work. Can someone explain what's going on, and also what the difference is between node *nodeArray = malloc(511 * sizeof(node)); and node (*nodeArray) = malloc(511 * sizeof(node));?
In the first snippet, the following are all equivalent:
nodeArray->number = 5; // preferred
nodeArray[0].number = 5;
(*nodeArray).number = 5;
In the second snippet, the following are all equivalent:
(nodeArray + i)->number = i;
nodeArray[i].number = i; // preferred
(*(nodeArray + i)).number = i;
So, as you can see, there is a choice of three different syntaxes that all do the same thing. The arrow syntax (nodeArray->number) is preferred when dealing with a pointer to a single instance of the struct. The array indexing with dot notation (nodeArray[i].number) is preferred when dealing with a pointer to an array of structs. The third syntax (dereferencing the pointer and dot notation) is avoided by sensible programmers.
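A small sketch that fills the array using the preferred indexing syntax; the helper name make_nodes is hypothetical.

```c
#include <stdlib.h>

typedef struct node {
    int number;
    struct node *left;
    struct node *right;
} node;

/* Allocate n nodes and number them 0..n-1.
   Returns NULL on failure; the caller frees the array. */
node *make_nodes(size_t n) {
    node *nodeArray = malloc(n * sizeof *nodeArray);
    if (nodeArray == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        nodeArray[i].number = (int)i;   /* index, then dot: element is a struct */
        nodeArray[i].left = NULL;
        nodeArray[i].right = NULL;
    }
    return nodeArray;
}
```

After make_nodes(511), the expressions nodeArray[510].number, (nodeArray + 510)->number and (*(nodeArray + 510)).number all read the same member.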
When you allocate an array like this
node* nodeArray = malloc(511*sizeof(node));
nodeArray is a pointer to the first node. To get a pointer to an individual struct node, you just add an integer:
nodeArray + 1 gives a pointer to the second node
nodeArray + 1 can also be written as &nodeArray[1]
So to dereference the pointer and access a member, write
(*(nodeArray + 1)).number or, more simply, nodeArray[1].number
Alignment is sometimes suspected in cases like this, but it is not the cause here:
Your node structure contains an integer and two pointers, so its members need 12 bytes on most 32-bit architectures and 20 bytes on most 64-bit architectures; alignment constraints may make the compiler add padding on top of that (typically 4 bytes on 64-bit targets, for a total of 24).
sizeof(type) already includes that padding, including any trailing padding needed to keep every element of an array aligned.
So malloc(511 * sizeof(node)) allocates enough storage for 511 nodes. You can use calloc() if you also want the memory zero-filled:
Replace:
node *nodeArray = malloc(511 * sizeof(node));
by:
node *nodeArray = calloc(511, sizeof(node));
Both calls reserve the same amount of usable storage, padding included; calloc() additionally sets every byte to zero.
Both forms are portable.
Note that C11 and C++11 also provide alignof(type) to query the alignment requirement of a datatype; malloc() and calloc() already return memory suitably aligned for any object type with a fundamental alignment requirement.
Your sample code above does not overflow the buffer: the loop writes elements 0 through 510, which are exactly the 511 elements that were allocated.
With simple types such as int or double you normally see no padding at all, because their size is already a multiple of their alignment; padding appears in structures that mix members with different alignment requirements.
See the documentation of the alignof operator and the alignas specifier in C++11 (C11 offers the same through <stdalign.h>). There are many resources about them; for example:
https://en.cppreference.com/w/cpp/language/alignas
See as well the documentation of calloc()
(Don't assume that a simple flat 32-bit or 64-bit memory model tells you the layout, either: alignment requirements vary between platforms, and a misaligned access may be slow or may fault on some architectures.)
Remember that each datatype has its own size and its own alignment requirement. In C the size of a type is always a multiple of its alignment, so an array of n elements occupies exactly n * sizeof(type) bytes, with no extra padding between or after the elements.
Bit-fields are a special case: sizeof cannot be applied to a bit-field member, and adjacent bit-fields may be packed into the same storage unit; sizeof applied to the enclosing structure still reports its full size, padding included.
Finally, nodeArray[i] is a node object, not a pointer to one, so nodeArray[i]->anything is invalid: you need to replace the -> with a . (dot).
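The relationship between sizeof and array layout can be checked directly. The sketch below measures the distance between two consecutive array elements; node_stride is a hypothetical helper name.

```c
#include <stddef.h>

typedef struct node {
    int number;
    struct node *left;
    struct node *right;
} node;

/* Distance in bytes between two consecutive elements of a node array. */
size_t node_stride(void) {
    node pair[2];
    return (size_t)((char *)&pair[1] - (char *)&pair[0]);
}
```

On a conforming compiler node_stride() equals sizeof(node), so n * sizeof(node) bytes hold exactly n nodes whether the memory comes from malloc() or calloc().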
I want to know how to store custom objects (not their pointers) in C. I have created a custom structure called Node
#define MAXQ 100
typedef struct {
int state[MAXQ];
int height;
} Node;
(which works) and I want to store a few of these Nodes in a container (without using pointers, since they are not stored elsewhere) so I can access them later.
The internet seems to suggest something like calloc() so my last attempt was to make a container Neighbors following this example, with numNeighbors being just an integer:
Node Neighbors = (Node*)calloc(numNeighbors, sizeof(Node));
At compilation, I got an error from this line saying
initializing 'Node' with an expression of incompatible type 'void *'
and in places where I referenced to this container (as in Neighbors[i]) I got errors of
subscripted value is not an array, pointer, or vector
Since I'm spoiled by Python, I have no idea if I've got my syntax all wrong (it should tell you something that I'm still not there after scouring a ton of tutorials, docs, and stackoverflows on malloc(), calloc() and the like), or if I am on a completely wrong approach to storing custom objects (searching "store custom objects in C" on the internet gives irrelevant results dealing with iOS and C# so I would really appreciate some help).
EDIT: Thanks for the tips everyone, it finally compiled without errors!
You can create a regular array using your custom struct:
Node Neighbors[10];
You can then reference them like any other array, for example:
Neighbors[3].height = 10;
If your C implementation supports C99-style variable-length arrays (VLAs), simply define your array:
Node Neighbors[numNeighbors];
(Note that VLA has no error reporting mechanism. A failed allocation results in undefined behavior, which probably expresses itself as a crash.)
Otherwise, you will need dynamic allocation. calloc is suitable, but it returns a pointer representing the contiguous allocation.
Node *Neighbors = calloc(numNeighbors, sizeof(*Neighbors));
Note, do not cast the result of malloc/calloc/realloc when programming in C. It is not required, and in the worst case, can mask a fatal error.
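Put together, the dynamic version might look like the following sketch. The wrapper name make_neighbors is hypothetical; the Node type is the one from the question.

```c
#include <stdlib.h>

#define MAXQ 100

typedef struct {
    int state[MAXQ];
    int height;
} Node;

/* Allocate numNeighbors zero-initialized Nodes in one contiguous block.
   Returns NULL on failure; the caller releases the block with free(). */
Node *make_neighbors(size_t numNeighbors) {
    Node *Neighbors = calloc(numNeighbors, sizeof *Neighbors);
    return Neighbors;
}
```

The returned pointer can be indexed like an array, for example Neighbors[3].height = 10;, and the elements are the Node objects themselves, not pointers to them.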
I want to store a few of these Nodes in a container (without using pointers, since they are not stored elsewhere) so I can access them later.
If you know the amount of them at compile-time (or at the very least a reasonable maximum); then you can create an array of stack-allocated objects. For instance, say you are OK with a maximum of 10 objects:
#define MAX_NODES 10
Node nodes[MAX_NODES];
int number_nodes = 0;
Then when you add an object, you keep in sync number_nodes (so that you know where to put the next one). Technically, you will always have 10, but you only use the ones you want/need. Removing objects is similar, although more involved if you want to take out some in the middle.
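The bookkeeping described above could be sketched as follows. The helpers add_node and remove_node are hypothetical, and the array is placed at file scope here (an assumption made so that both helpers can share it; it could equally live inside a function).

```c
#include <stdbool.h>

#define MAXQ 100
#define MAX_NODES 10

typedef struct {
    int state[MAXQ];
    int height;
} Node;

static Node nodes[MAX_NODES];
static int number_nodes = 0;   /* how many slots are currently in use */

/* Append a copy of *n; returns false when the array is full. */
bool add_node(const Node *n) {
    if (number_nodes >= MAX_NODES)
        return false;
    nodes[number_nodes++] = *n;
    return true;
}

/* Remove the node at `index` by shifting the later nodes down one slot,
   which preserves their order. Returns false for an invalid index. */
bool remove_node(int index) {
    if (index < 0 || index >= number_nodes)
        return false;
    for (int i = index; i < number_nodes - 1; i++)
        nodes[i] = nodes[i + 1];
    number_nodes--;
    return true;
}
```

Removing from the middle costs a copy of every later element, which is the "more involved" part mentioned above; if order does not matter, overwriting the removed slot with the last element would be cheaper.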
However, if you don't know how many you will have (nor a maximum); or even if you know but they are way too many to fit in the stack; then you are forced to use the heap (typically with malloc() and free()):
int number_nodes; // unknown until runtime or too big
Node * nodes = malloc(sizeof(Node) * number_nodes);
...
free(nodes);
In any case, you will be using pointers in the dynamically allocated memory case, and most probably in the stack case as well.
Python is hiding and doing all this dance for you behind the scenes -- which is quite useful and time saving as you have probably already realized, as long as you do not need precise control over it (read: performance).
malloc and calloc are for dynamic allocation, and they need pointer variables. I don't see any reason for you to use dynamic allocation. Just define a regular array until you have a reason not to.
#define MAXQ 100
#define NUM_NEIGHBORS 50
typedef struct {
int state[MAXQ];
int height;
} Node;
int main(void)
{
Node Neighbors[NUM_NEIGHBORS];
Neighbors[0].state[0] = 0;
Neighbors[0].height = 1;
}
Here NUM_NEIGHBORS needs to be a compile-time constant, so the array size is fixed. If you want the size to be variable or dynamic, then you need dynamic allocation, and inevitably pointers:
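#include <stdlib.h>
#define MAXQ 100
typedef struct {
int state[MAXQ];
int height;
} Node;
int main(void)
{
int numNeighbors = 50;
/* calloc returns NULL on failure, so check before use */
Node *Neighbors = calloc(numNeighbors, sizeof *Neighbors);
if (Neighbors == NULL)
return 1;
Neighbors[0].state[0] = 0;
Neighbors[0].height = 1;
free(Neighbors); /* release the array when done */
return 0;
}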
typedef struct Node {
int data;
struct Node *next;
} node;
pointer->next = (node*)malloc(sizeof(node));
How many bytes of memory are dynamically given to pointer->next in the above code? For (int*)malloc(sizeof(int)), 2 bytes are given. Likewise, how many for node?
malloc will dynamically allocate sizeof(node) bytes.
node is a struct, and the size of a struct depends on the size of every member inside it.
In this case, the size of node will be at least: size of int + size of struct Node *
(If the members do not line up with the alignment the architecture requires, the compiler adds padding.)
Since malloc(sizeof(int)) gives you 2 bytes, your target appears to be a 16-bit architecture, where struct sizes are typically multiples of 2 (2, 4, 6, 8, etc.).
The size of int depends on the target you are working on. Since your architecture is 16 bits, the size of int is 2 bytes.
As for the size of struct Node *: on most platforms all object pointers have the same size regardless of the type they point to, although the C standard does not guarantee this. That size also depends on the architecture; on your 16-bit target it is most likely 2 bytes.
size of int = 2.
size of struct Node * = 2
Total memory assigned by malloc = 2 + 2 = 4 (assuming no padding is needed)
First, a suggestion: rewrite
pointer->next=(node*)malloc(sizeof(node));
as
pointer->next = malloc( sizeof *pointer->next );
You don't need the cast (unless you're working on a pre-ANSI implementation, in which case God help you), and using the dereferenced target as the operand of sizeof means you don't have to specify the type, potentially saving you some maintenance heartburn.
Also, a little whitespace goes a long way (although you don't need to put whitespace around the function arguments - that's my style, some people don't like it, but it makes things easier for me to read).
How much bytes of memory is dynamically given to pointer->next
It will be at least as big as sizeof (int) plus sizeof (struct Node *), and potentially may be bigger; depending on your platform, it could be as small as 4 bytes or as large as 16. C allows for "padding" bytes between struct members to satisfy alignment requirements for the underlying architecture. For example, a particular architecture may require that all multi-byte objects be aligned on addresses that are multiples of 4; if your data member is only 2 bytes wide, then there will be 2 unused bytes between it and the next member.
Without knowing a lot about your system, we just can't tell you. You can take that same code and try it on multiple compilers, and you'll get different answers. You have to check for yourself, using sizeof(node) or sizeof(struct Node); both name the same type here, so either works.
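A quick way to check on your own system is to print the sizes. The function name print_node_sizes is hypothetical; the struct is the one from the question.

```c
#include <stdio.h>

typedef struct Node {
    int data;
    struct Node *next;
} node;

/* Print the sizes that determine how much malloc(sizeof(node)) requests. */
void print_node_sizes(void) {
    printf("sizeof(int)           = %zu\n", sizeof(int));
    printf("sizeof(struct Node *) = %zu\n", sizeof(struct Node *));
    printf("sizeof(node)          = %zu\n", sizeof(node));
}
```

If the last number is larger than the sum of the first two, the difference is padding. The values printed will vary between compilers and targets.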
consider the code below:
#include "list.h"
struct List
{
int size;
int* data;
};
List *list_create()
{
List *list;
printf("%d %d",sizeof(list),sizeof(List));
list = malloc(sizeof(list));
assert(list != NULL);
if (list != NULL) {
list->size = 0;
}
return list;
}
The output is "4 8". I assume the 4 is the 4 bytes taken by "int size" in the List object, and the size of "int* data" is 0 because nothing has been assigned to data?
Or is the size of an int pointer also 4 bytes, so the type List takes 8 bytes in total? Or is something else going on? Can someone help me understand all this in detail?
Then malloc() gets 4 bytes from the heap and assigns the address to the pointer list? Later in main, if I do "list->data[i]=1;" I get a run-time error. Why? Is it because I can't change contents in the heap? But if I do "list->size++" it works;
isn't the whole list object in the heap?
I really need some help here.
Thanks in advance.
sizeof(List*) is the size of a pointer to a List struct.
sizeof(list) in your case, since variable list is of type List* is the same as sizeof(List*).
sizeof(List) instead is the size of the struct List, it contains two 32 bit variables (I assume you are using a 32 bit compiler obviously), an integer and a pointer and your compiler decided that the right size for your struct is 8 bytes.
Pointers to types are usually 4 byte in 32 bit compilers and 8 bytes in 64 bit compilers.
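As a side note, your code never initializes list->data, so list->data[i] = 1 writes through an indeterminate pointer, which explains the run-time error. Note also that malloc(sizeof(list)) allocates only the size of the pointer (4 bytes), not the size of the struct; it should be malloc(sizeof *list) or malloc(sizeof(List)).
Also, writing List without the struct keyword is valid C++ but not C (unless list.h provides a typedef); in C you should write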
typedef struct { ... } List; // This is C.
The sizeof operator is evaluated at compile time (except for variable-length arrays), not at run time; it only gives the size of a type.
You cannot, for example, find out how many elements a dynamically allocated array has with sizeof; sizeof(pointer) just gives you the size in bytes of the pointer type.
As something to read about what is a pointer and what is an array i would suggest you to read http://www.lysator.liu.se/c/c-faq/c-2.html or http://pw1.netcom.com/~tjensen/ptr/pointers.htm
Technically, your code has an error in it.
The code should read sizeof(struct List), or there should be a typedef struct List List; somewhere (for example in list.h).
But yes, sizeof(list) is the size of the variable list. Since list is a pointer, on your system/compiler this is the same as sizeof(void*), which is 4.
sizeof(struct List) is the size of the struct, which is sizeof(int) + sizeof(int*) + any padding added for alignment. The padding is often forgotten but is very important, as it can change the size of the struct in unexpected ways.
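For reference, a version of list_create that allocates the whole struct and also gives data something to point at might look like the sketch below. The capacity parameter and the list_destroy helper are assumptions added for illustration; they are not part of the original code.

```c
#include <stdlib.h>

typedef struct List {
    int size;
    int *data;
} List;

/* Allocate the struct itself (sizeof *list, not sizeof list) and a
   separate zero-filled buffer of `capacity` ints for data.
   Returns NULL if either allocation fails. */
List *list_create(size_t capacity) {
    List *list = malloc(sizeof *list);
    if (list == NULL)
        return NULL;
    list->data = calloc(capacity, sizeof *list->data);
    if (list->data == NULL && capacity != 0) {
        free(list);
        return NULL;
    }
    list->size = 0;
    return list;
}

/* Release the buffer first, then the struct that points to it. */
void list_destroy(List *list) {
    if (list != NULL) {
        free(list->data);
        free(list);
    }
}
```

With this version, list->data[i] = 1 is valid for any i below the capacity passed in, because data now points at memory that was actually allocated.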