c malloc array of struct - c

So far, I have dealt a bit with pointers and structs, but I'm not sure how to allocate an array of a structure at runtime - see below.
N.B. "user_size" is initialized at runtime.
typedef struct _COORDS
{
double x;
double y;
double area;
double circumference;
int index;
wchar_t name[16];
} COORDS, *PCOORDS;
PCOORDS pCoords = (PCOORDS)malloc(sizeof(COORDS)* user_size);
// NULL ptr check omitted
After that, can I just access pCoords[0] to pCoords[user_size-1] as with an ordinary array of ints?
More to the point: I don't understand how the compiler superimposes the layout of the structure on the alloc'ed memory? Does it even have to or am I overthinking this?

The compiler does not super-impose the structure on the memory -- you tell it to do so!
An array of structures is accessed by multiplying the index of one element by its total size. pCoords[3], for example, is "at" pCoords + 3*sizeof(COORDS) in memory.
A structure member is accessed by its offset (which is calculated by the sizes of the elements before it, taking padding into account). So member x is at an offset 0 from the start of its container, pCoords plus sizeof(COORDS) times the array element index; and y is sizeof(x) after that.
Since you tell the compiler that (1) you want a contiguous block of memory with a size for user_size times the size of a single COORD, and (2) then access this through pCoords[2].y, all it has to do is multiply and add, and then read the value (literally) in that memory address. Since the type of y is double, it reads and interprets the raw bytes as a double. And usually, it gets it right.
The only problem that can arise is when you have multiple pointers to that same area of memory. That could mean that the raw bytes "at" an address may need interpreting as different types (for instance, when one pointer tells it to expect an int and another a double).
With the provisio that the valid range is acutally 0..user_size - 1, your code is fine.

You are probably overthinking this. The compiler does not "superimpose" anything on the malloc'ed memory - that is just a bunch of bytes.
However, pointers are typed in C, and the type of the pointer determines how the memory is interpreted when the pointer is derefenced or used in pointer artihmetic. The compiler knows the memory layout of the struct. Each field has a defined size and an offset, and the overall size of the struct is known, too.
In your case, the expression pCoords[i].area = 42.0 is equivalent to
char *pByte = (char*)pCoords + sizeof(COORDS) * i + offsetof(COORDS, area);
double *pDouble = (pDouble*)pByte;
*pDouble = 42.0;

Related

the difference between struct with flexible arrays members and struct with pointer members

I'm quit confused with the difference between flexible arrays and pointer as struct members. Someone suggested, struct with pointers need malloc twice. However, consider the following code:
struct Vector {
size_t size;
double *data;
};
int len = 20;
struct Vector* newVector = malloc(sizeof *newVector + len * sizeof*newVector->data);
printf("%p\n",newVector->data);//print 0x0
newVector->data =(double*)((char*)newVector + sizeof*newVector);
// do sth
free(newVector);
I find a difference is that the address of data member of Vector is not defined. The programmer need to convert to "find" the exactly address. However, if defined Vector as:
struct Vector {
size_t size;
double data[];
};
Then the address of data is defined.
I am wondering whether it is safe and able to malloc struct with pointers like this, and what is the exactly reason programmers malloc twice when using struct with pointers.
The difference is how the struct is stored. In the first example you over-allocate memory but that doesn't magically mean that the data pointer gets set to point at that memory. Its value after malloc is in fact indeterminate, so you can't reliably print it.
Sure, you can set that pointer to point beyond the part allocated by the struct itself, but that means potentially slower access since you need to go through the pointer each time. Also you allocate the pointer itself as extra space (and potentially extra padding because of it), whereas in a flexible array member sizeof doesn't count the flexible array member. Your first design is overall much more cumbersome than the flexible version, but other than that well-defined.
The reason why people malloc twice when using a struct with pointers could either be that they aren't aware of flexible array members or using C90, or alternatively that the code isn't performance-critical and they just don't care about the overhead caused by fragmented allocation.
I am wondering whether it is safe and able to malloc struct with pointers like this, and what is the exactly reason programmers malloc twice when using struct with pointers.
If you use pointer method and malloc only once, there is one extra thing you need to care of in the calculation: alignment.
Let's add one extra field to the structure:
struct Vector {
size_t size;
uint32_t extra;
double *data;
};
Let's assume that we are on system where each field is 4 bytes, there is no trailing padding on struct and total size is 12 bytes. Let's also assume that double is 8 bytes and requires alignment to 8 bytes.
Now there is a problem: expression (char*)newVector + sizeof*newVector no longer gives address that is divisible by 8. There needs to be manual padding of 4 bytes between structure and data. This complicates the malloc size calculation and data pointer offset calculation.
So the main reason you see 1 malloc pointer version less, is that it is harder to get right. With pointer and 2 mallocs, or flexible array member, compiler takes care of necessary alignment calculation and padding so you don't have to.

C memory allocation sequence for struct data

I am reading a C scripts written by someone else. I don't understand this memory allocation part.
lda_suffstats* ss = malloc(sizeof(lda_suffstats));
ss->class_total = malloc(sizeof(double)*num_topics);
ss->class_word = malloc(sizeof(double*)*num_topics);
where lda_suffstats is a self-defined structure,
typedef struct
{
double** class_word;
double* class_total;
double alpha_suffstats;
int num_docs;
} lda_suffstats;
My question is regarding the first line of memory allocation. What is the size of lda_suffstats? Shouldn't the memory for each of its component be allocated before itself?
You can know how big lda_suffstats will be before you actually have one, just like you know how big of a bag you need to have with you in order to fit two cartons of milk and a dozen eggs. A size of lda_suffstats is a sum of sizes of double**, double*, double and int, no more, no less. They are not independent components, they'll all use the memory of the lda_suffstats. Now, the first two are pointers, which means the associated value is not right there but only pointed to, and allocating the target of the pointers is what the other two malloc lines are about.
lda_suffstats has four fields with the types of double**, double*, double, and int. The size of each of these is known at compile time. The sum of their sizes gives the size of lda_suffstats. The amount of memory allocated to the pointers does not change this because that memory is allocated outside of the struct.

Pointer to 2D arrays in C

I know there is several questions about that which gives good (and working) solutions, but none IMHO which says clearly what is the best way to achieve this.
So, suppose we have some 2D array :
int tab1[100][280];
We want to make a pointer that points to this 2D array.
To achieve this, we can do :
int (*pointer)[280]; // pointer creation
pointer = tab1; //assignation
pointer[5][12] = 517; // use
int myint = pointer[5][12]; // use
or, alternatively :
int (*pointer)[100][280]; // pointer creation
pointer = &tab1; //assignation
(*pointer)[5][12] = 517; // use
int myint = (*pointer)[5][12]; // use
OK, both seems to work well. Now I would like to know :
what is the best way, the 1st or the 2nd ?
are both equals for the compiler ? (speed, perf...)
is one of these solutions eating more memory than the other ?
what is the more frequently used by developers ?
//defines an array of 280 pointers (1120 or 2240 bytes)
int *pointer1 [280];
//defines a pointer (4 or 8 bytes depending on 32/64 bits platform)
int (*pointer2)[280]; //pointer to an array of 280 integers
int (*pointer3)[100][280]; //pointer to an 2D array of 100*280 integers
Using pointer2 or pointer3 produce the same binary except manipulations as ++pointer2 as pointed out by WhozCraig.
I recommend using typedef (producing same binary code as above pointer3)
typedef int myType[100][280];
myType *pointer3;
Note: Since C++11, you can also use keyword using instead of typedef
using myType = int[100][280];
myType *pointer3;
in your example:
myType *pointer; // pointer creation
pointer = &tab1; // assignation
(*pointer)[5][12] = 517; // set (write)
int myint = (*pointer)[5][12]; // get (read)
Note: If the array tab1 is used within a function body => this array will be placed within the call stack memory. But the stack size is limited. Using arrays bigger than the free memory stack produces a stack overflow crash.
The full snippet is online-compilable at gcc.godbolt.org
int main()
{
//defines an array of 280 pointers (1120 or 2240 bytes)
int *pointer1 [280];
static_assert( sizeof(pointer1) == 2240, "" );
//defines a pointer (4 or 8 bytes depending on 32/64 bits platform)
int (*pointer2)[280]; //pointer to an array of 280 integers
int (*pointer3)[100][280]; //pointer to an 2D array of 100*280 integers
static_assert( sizeof(pointer2) == 8, "" );
static_assert( sizeof(pointer3) == 8, "" );
// Use 'typedef' (or 'using' if you use a modern C++ compiler)
typedef int myType[100][280];
//using myType = int[100][280];
int tab1[100][280];
myType *pointer; // pointer creation
pointer = &tab1; // assignation
(*pointer)[5][12] = 517; // set (write)
int myint = (*pointer)[5][12]; // get (read)
return myint;
}
Both your examples are equivalent. However, the first one is less obvious and more "hacky", while the second one clearly states your intention.
int (*pointer)[280];
pointer = tab1;
pointer points to an 1D array of 280 integers. In your assignment, you actually assign the first row of tab1. This works since you can implicitly cast arrays to pointers (to the first element).
When you are using pointer[5][12], C treats pointer as an array of arrays (pointer[5] is of type int[280]), so there is another implicit cast here (at least semantically).
In your second example, you explicitly create a pointer to a 2D array:
int (*pointer)[100][280];
pointer = &tab1;
The semantics are clearer here: *pointer is a 2D array, so you need to access it using (*pointer)[i][j].
Both solutions use the same amount of memory (1 pointer) and will most likely run equally fast. Under the hood, both pointers will even point to the same memory location (the first element of the tab1 array), and it is possible that your compiler will even generate the same code.
The first solution is "more advanced" since one needs quite a deep understanding on how arrays and pointers work in C to understand what is going on. The second one is more explicit.
int *pointer[280]; //Creates 280 pointers of type int.
In 32 bit os, 4 bytes for each pointer. so 4 * 280 = 1120 bytes.
int (*pointer)[100][280]; // Creates only one pointer which is used to point an array of [100][280] ints.
Here only 4 bytes.
Coming to your question, int (*pointer)[280]; and int (*pointer)[100][280]; are different though it points to same 2D array of [100][280].
Because if int (*pointer)[280]; is incremented, then it will points to next 1D array, but where as int (*pointer)[100][280]; crosses the whole 2D array and points to next byte. Accessing that byte may cause problem if that memory doen't belongs to your process.
Ok, this is actually four different question. I'll address them one by one:
are both equals for the compiler? (speed, perf...)
Yes. The pointer dereferenciation and decay from type int (*)[100][280] to int (*)[280] is always a noop to your CPU. I wouldn't put it past a bad compiler to generate bogus code anyways, but a good optimizing compiler should compile both examples to the exact same code.
is one of these solutions eating more memory than the other?
As a corollary to my first answer, no.
what is the more frequently used by developers?
Definitely the variant without the extra (*pointer) dereferenciation. For C programmers it is second nature to assume that any pointer may actually be a pointer to the first element of an array.
what is the best way, the 1st or the 2nd?
That depends on what you optimize for:
Idiomatic code uses variant 1. The declaration is missing the outer dimension, but all uses are exactly as a C programmer expects them to be.
If you want to make it explicit that you are pointing to an array, you can use variant 2. However, many seasoned C programmers will think that there's a third dimension hidden behind the innermost *. Having no array dimension there will feel weird to most programmers.

Mapping varying memory block with a fixed array within a struct

struct image_struct {
unsigned int width;
unsigned int height;
char mode;
char depth;
unsigned char data[13];
}
image_struct* newImage( unsigned int width, unsigned int height, char depth ) {
image_struct* image = (image_struct*)malloc(
sizeof(image_struct) - 13 + width * height * depth );
return( image );
}
Visual Studio doesn't complain about accessing the fixed array beyond the 13 bytes, is this inadvisable? My intent was to avoid processing headers in file IO by using straight memory writes for structs with built-in headers. Apologies for the title. :\
There's a trick you can do where you define a zero-length array at the end of a struct. You can then allocate the sizeof the struct plus the size of the array you want and you get an array of any size you want, decided at run-time rather than compile-time. Here is some info on it:
http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
Points to note:
You must allocate the right amount of memory. You may be able to access the memory beyond the struct even if you didn't allocate it. But that's a bug in your code. The memory may be used for something else, or cross a boundary etc. Worst case it'll overwrite some other data and you won't discover it until some other part of your program behaves oddly. Never use heap memory you didn't allocate.
Once allocated you cannot resize the array without reallocing the entire struct + array size.
The array has to be the last element of the array
Make sure you know how long the array is meant to be. Maybe store the length in a field in the struct and do your own bounds checking to ensure you don't go wrong with your pointer arithmetic (/array index access).
This only applies to structs allocated on the heap, not automatic variables on the stack.

Increasing The Size of Memory Allocated to a Struct via Malloc

I just learned that it's possible to increase the size of the memory you'll allocate to a struct when using the malloc function. For example, you can have a struct like this:
struct test{
char a;
int v[1];
char b;
};
Which clearly has space for only 2 chars and 1 int (pointer to an int in reality, but anyway). But you could call malloc in such a way to make the struct holds 2 chars and as many ints as you wanted (let's say 10):
int main(){
struct test *ptr;
ptr = malloc (sizeof(struct test)+sizeof(int)*9);
ptr->v[9]=50;
printf("%d\n",ptr->v[9]);
return 0;
}
The output here would be "50" printed on the screen, meaning that the array inside the struct was holding up to 10 ints.
My questions for the experienced C programmers out there:
What is happening behind the scenes here? Does the computer allocate 2+4 (2 chars + pointer to int) bytes for the standard "struct test", and then 4*9 more bytes of memory and let the pointer "ptr" put whatever kind of data it wants on those extra bytes?
Does this trick only works when there is an array inside the struct?
If the array is not the last member of the struct, how does the computer manage the memory block allocated?
...Which clearly has space for only 2 chars and 1 int (pointer to an
int in reality, but anyway)...
Already incorrect. Arrays are not pointers. Your struct holds space for 2 chars and 1 int. There's no pointer of any kind there. What you have declared is essentially equivalent to
struct test {
char a;
int v;
char b;
};
There's not much difference between an array of 1 element and an ordinary variable (there's conceptual difference only, i.e. syntactic sugar).
...But you could call malloc in such a way to make it hold 1 char and as
many ints as you wanted (let's say 10)...
Er... If you want it to hold 1 char, why did you declare your struct with 2 chars???
Anyway, in order to implement an array of flexible size as a member of a struct you have to place your array at the very end of the struct.
struct test {
char a;
char b;
int v[1];
};
Then you can allocate memory for your struct with some "extra" memory for the array at the end
struct test *ptr = malloc(offsetof(struct test, v) + sizeof(int) * 10);
(Note how offsetof is used to calculate the proper size).
That way it will work, giving you an array of size 10 and 2 chars in the struct (as declared). It is called "struct hack" and it depends critically on the array being the very last member of the struct.
C99 version of C language introduced dedicated support for "struct hack". In C99 it can be done as
struct test {
char a;
char b;
int v[];
};
...
struct test *ptr = malloc(sizeof(struct test) + sizeof(int) * 10);
What is happening behind the scenes here? Does the computer allocate
2+4 (2 chars + pointer to int) bytes for the standard "struct test",
and then 4*9 more bytes of memory and let the pointer "ptr" put
whatever kind of data it wants on those extra bytes?
malloc allocates as much memory as you ask it to allocate. It is just a single flat block of raw memory. Nothing else happens "behind the scenes". There's no "pointer to int" of any kind in your struct, so any questions that involve "pointer to int" make no sense at all.
Does this trick only works when there is an array inside the struct?
Well, that's the whole point: to access the extra memory as if it belongs to an array declared as the last member of the struct.
If the array is not the last member of the struct, how does the computer manage the memory block allocated?
It doesn't manage anything. If the array is not the last member of the struct, then trying to work with the extra elements of the array will trash the members of the struct that declared after the array. This is pretty useless, which is why the "flexible" array has to be the last member.
No, that does not work. You can't change the immutable size of a struct (which is a compile-time allocation, after all) by using malloc ( ) at run time. But you can allocate a memory block, or change its size, such that it holds more than one struct:
int main(){
struct test *ptr;
ptr = malloc (sizeof(struct test) * 9);
}
That's just about all you can do with malloc ( ) in this context.
In addition to what others have told you (summary: arrays are not pointers, pointers are not arrays, read section 6 of the comp.lang.c FAQ), attempting to access array elements past the last element invokes undefined behavior.
Let's look at an example that doesn't involve dynamic allocation:
struct foo {
int arr1[1];
int arr2[1000];
};
struct foo obj;
The language guarantees that obj.arr1 will be allocated starting at offset 0, and that the offset of obj.arr2 will be sizeof (int) or more (the compiler may insert padding between struct members and after the last member, but not before the first one). So we know that there's enough room in obj for multiple int objects immediately following obj.arr1. That means that if you write obj.arr1[5] = 42, and then later access obj.arr[5], you'll probably get back the value 42 that you stored there (and you'll probably have clobbered obj.arr2[4]).
The C language doesn't require array bounds checking, but it makes the behavior of accessing an array outside its declared bounds undefined. Anything could happen -- including having the code quietly behave just the way you want it to. In fact, C permits array bounds checking; it just doesn't provide a way to handle errors, and most compilers don't implement it.
For an example like this, you're most likely to run into visible problems in the presence of optimization. A compiler (particularly an optimizing compiler) is permitted to assume that your program's behavior is well-defined, and to rearrange the generated code to take advantage of that assumption. If you write
int index = 5;
obj.arr1[index] = 42;
the compiler is permitted to assume that the index operation doesn't go outside the declared bounds of the array. As Henry Spencer wrote, "If you lie to the compiler, it will get its revenge".
Strictly speaking, the struct hack probably involves undefined behavior (which is why C99 added a well-defined version of it), but it's been so widely used that most or all compilers will support it. This is covered in question 2.6 of the comp.lang.c FAQ.

Resources