Mapping varying memory block with a fixed array within a struct - c

typedef struct image_struct {
unsigned int width;
unsigned int height;
char mode;
char depth;
unsigned char data[13];
} image_struct;
image_struct* newImage( unsigned int width, unsigned int height, char depth ) {
image_struct* image = (image_struct*)malloc(
sizeof(image_struct) - 13 + width * height * depth );
return( image );
}
Visual Studio doesn't complain about accessing the fixed array beyond the 13 bytes, is this inadvisable? My intent was to avoid processing headers in file IO by using straight memory writes for structs with built-in headers. Apologies for the title. :\

There's a trick you can do where you define a zero-length array at the end of a struct. You can then allocate the sizeof the struct plus the size of the array you want and you get an array of any size you want, decided at run-time rather than compile-time. Here is some info on it:
http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
Points to note:
You must allocate the right amount of memory. You may be able to access the memory beyond the struct even if you didn't allocate it. But that's a bug in your code. The memory may be used for something else, or cross a boundary etc. Worst case it'll overwrite some other data and you won't discover it until some other part of your program behaves oddly. Never use heap memory you didn't allocate.
Once allocated you cannot resize the array without reallocing the entire struct + array size.
The array has to be the last member of the struct.
Make sure you know how long the array is meant to be. Maybe store the length in a field in the struct and do your own bounds checking to ensure you don't go wrong with your pointer arithmetic (/array index access).
This only applies to structs allocated on the heap, not automatic variables on the stack.
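The points above can be sketched as follows. This is only an illustration, assuming a C99 flexible array member instead of the GCC zero-length extension; the names image, image_new, image_set_byte and the data_len field are hypothetical and not taken from the question.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical image header followed by pixel data in one allocation. */
typedef struct image {
    unsigned int width;
    unsigned int height;
    unsigned char depth;    /* assumed to mean bytes per pixel */
    size_t data_len;        /* number of bytes available in data[] */
    unsigned char data[];   /* flexible array member: must be last */
} image;

/* Returns NULL on size overflow or allocation failure. */
image *image_new(unsigned int width, unsigned int height, unsigned char depth)
{
    size_t len = (size_t)width;
    if (height != 0 && len > SIZE_MAX / height) return NULL;
    len *= height;
    if (depth != 0 && len > SIZE_MAX / depth) return NULL;
    len *= depth;
    if (len > SIZE_MAX - sizeof(image)) return NULL;

    image *img = malloc(sizeof *img + len);
    if (img == NULL) return NULL;
    img->width = width;
    img->height = height;
    img->depth = depth;
    img->data_len = len;    /* stored so later accesses can be checked */
    return img;
}

/* Bounds-checked write: returns 0 on success, -1 if index is out of range. */
int image_set_byte(image *img, size_t index, unsigned char value)
{
    if (index >= img->data_len) return -1;
    img->data[index] = value;
    return 0;
}
```

Because the header and the pixel bytes sit in one block, a single free(img) releases both.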

Related

Is it ok to create a large array in the heap when you aren't necessarily using all of it?

So I'm looking at a solution to some coding interview type questions, and there's an array inside a struct
#define MAX_SIZE 1000000
typedef struct _heap {
int data[MAX_SIZE];
int heap_size;
}heap;
heap* init(heap* h) {
h = (heap*)malloc(sizeof(heap));
h->heap_size = 0;
return h;
}
This heap struct is later created like so
heap* max_heap = NULL;
max_heap = init(max_heap);
First of all, I wish this were written in C++ style rather than C, but secondly, if I'm just concerned about the array, I assume it is equivalent to analyze only the array portion by changing the code like this:
int* data = NULL;
data = (int*)malloc(1000000 * sizeof(int));
Now in that case, are there any problems with declaring the array with the max size if you are probably just using a little bit of it?
I guess this boils down to the question of when an array is created in the heap, how does the system block out that portion of the memory? In which case does the system prevent you from accessing memory that is part of the array? I wouldn't want a giant array holding up space if I'm not using much of it.
are there any problems with declaring the array with the max size if you are probably just using a little bit of it?
Yes. The larger the allocation size the greater the risk of an out-of-memory error. If not here, elsewhere in code.
Yet some memory allocation systems handle this well as real memory allocations do not immediately occur, but later when needed.
I guess this boils down to the question of when an array is created in the heap, how does the system block out that portion of the memory?
That is an implementation defined issue not defined by C. It might happen immediately or deferred.
For maximum portability, code would take a more conservative approach and allocate large memory chunks only as needed, rather than rely on physical allocation occurring in a delayed fashion.
Alternative
In C, consider a struct with a flexible array member.
typedef struct _heap {
size_t heap_size;
int data[];
} heap;
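A rough sketch of how such a struct could be allocated with a capacity chosen at run time and grown only when needed. The capacity field and the helper names heap_create and heap_append are assumptions added for illustration; only the storage handling is shown, not the heap ordering logic.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct heap {
    size_t heap_size;   /* elements in use */
    size_t capacity;    /* elements allocated (field added for this sketch) */
    int data[];         /* flexible array member */
} heap;

/* Allocates a heap with room for `capacity` ints; NULL on failure. */
heap *heap_create(size_t capacity)
{
    if (capacity > (SIZE_MAX - sizeof(heap)) / sizeof(int)) return NULL;
    heap *h = malloc(sizeof *h + capacity * sizeof h->data[0]);
    if (h == NULL) return NULL;
    h->heap_size = 0;
    h->capacity = capacity;
    return h;
}

/* Appends a value, doubling the allocation when full. Returns the
   (possibly moved) heap, or NULL on failure, leaving h valid. */
heap *heap_append(heap *h, int value)
{
    if (h->heap_size == h->capacity) {
        size_t new_cap = h->capacity ? h->capacity * 2 : 16;
        if (new_cap < h->capacity ||
            new_cap > (SIZE_MAX - sizeof(heap)) / sizeof(int)) return NULL;
        heap *grown = realloc(h, sizeof *grown + new_cap * sizeof grown->data[0]);
        if (grown == NULL) return NULL;
        h = grown;
        h->capacity = new_cap;
    }
    h->data[h->heap_size++] = value;
    return h;
}
```

Since realloc may move the whole struct, callers have to keep the returned pointer rather than the one they passed in.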

the difference between struct with flexible arrays members and struct with pointer members

I'm quite confused about the difference between flexible arrays and pointers as struct members. Someone suggested that a struct with a pointer needs malloc twice. However, consider the following code:
struct Vector {
size_t size;
double *data;
};
int len = 20;
struct Vector* newVector = malloc(sizeof *newVector + len * sizeof*newVector->data);
printf("%p\n",newVector->data);//print 0x0
newVector->data =(double*)((char*)newVector + sizeof*newVector);
// do sth
free(newVector);
One difference I find is that the address stored in the data member of Vector is not defined. The programmer needs to compute the exact address manually. However, if Vector is defined as:
struct Vector {
size_t size;
double data[];
};
Then the address of data is defined.
I am wondering whether it is safe and possible to malloc a struct with pointers like this, and what exactly the reason is that programmers malloc twice when using a struct with pointers.
The difference is how the struct is stored. In the first example you over-allocate memory but that doesn't magically mean that the data pointer gets set to point at that memory. Its value after malloc is in fact indeterminate, so you can't reliably print it.
Sure, you can set that pointer to point beyond the part allocated by the struct itself, but that means potentially slower access since you need to go through the pointer each time. Also you allocate the pointer itself as extra space (and potentially extra padding because of it), whereas in a flexible array member sizeof doesn't count the flexible array member. Your first design is overall much more cumbersome than the flexible version, but other than that well-defined.
The reason why people malloc twice when using a struct with pointers could either be that they aren't aware of flexible array members or using C90, or alternatively that the code isn't performance-critical and they just don't care about the overhead caused by fragmented allocation.
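For comparison, here is a small sketch of the two designs side by side. The struct names VectorFam and VectorPtr and the helpers fam_new, ptr_new and ptr_free are hypothetical, chosen only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct VectorFam {      /* flexible array member version */
    size_t size;
    double data[];
};

struct VectorPtr {      /* pointer member version */
    size_t size;
    double *data;
};

/* One allocation: header and elements are contiguous. */
struct VectorFam *fam_new(size_t len)
{
    if (len > (SIZE_MAX - sizeof(struct VectorFam)) / sizeof(double)) return NULL;
    struct VectorFam *v = malloc(sizeof *v + len * sizeof v->data[0]);
    if (v != NULL) v->size = len;
    return v;
}

/* Two allocations: the struct itself, then the element storage. */
struct VectorPtr *ptr_new(size_t len)
{
    struct VectorPtr *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    /* Ask for at least one element so a NULL result always means failure. */
    v->data = calloc(len ? len : 1, sizeof *v->data);
    if (v->data == NULL) { free(v); return NULL; }
    v->size = len;
    return v;
}

/* The pointer version needs both blocks released. */
void ptr_free(struct VectorPtr *v)
{
    if (v != NULL) { free(v->data); free(v); }
}
```

The flexible version is released with a single free, while the pointer version needs the paired ptr_free.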
I am wondering whether it is safe and possible to malloc a struct with pointers like this, and what exactly the reason is that programmers malloc twice when using a struct with pointers.
If you use the pointer method and malloc only once, there is one extra thing you need to take care of in the calculation: alignment.
Let's add one extra field to the structure:
struct Vector {
size_t size;
uint32_t extra;
double *data;
};
Let's assume that we are on a system where each field is 4 bytes, there is no trailing padding in the struct, and its total size is 12 bytes. Let's also assume that double is 8 bytes and requires alignment to 8 bytes.
Now there is a problem: the expression (char*)newVector + sizeof*newVector no longer gives an address that is divisible by 8. There needs to be manual padding of 4 bytes between the structure and the data. This complicates both the malloc size calculation and the data pointer offset calculation.
So the main reason you see the single-malloc pointer version less often is that it is harder to get right. With a pointer and two mallocs, or with a flexible array member, the necessary alignment calculation and padding are taken care of for you.
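The manual padding described above could be computed along these lines. This is a sketch assuming C11 (_Alignof); the helper names round_up and vector_new are hypothetical. With the 12-byte struct from the example, round_up(12, 8) yields 16, which is the 4 bytes of padding mentioned.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct Vector {
    size_t size;
    uint32_t extra;
    double *data;
};

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Single allocation holding the struct, padding, and len doubles. */
struct Vector *vector_new(size_t len)
{
    /* Offset of the first double: struct size padded to double's alignment. */
    size_t offset = round_up(sizeof(struct Vector), _Alignof(double));
    if (len > (SIZE_MAX - offset) / sizeof(double)) return NULL;

    struct Vector *v = malloc(offset + len * sizeof(double));
    if (v == NULL) return NULL;
    v->size = len;
    v->extra = 0;
    v->data = (double *)((char *)v + offset);
    return v;
}
```

A single free(v) releases everything, since data points into the same block.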

c malloc array of struct

So far, I have dealt a bit with pointers and structs, but I'm not sure how to allocate an array of a structure at runtime - see below.
N.B. "user_size" is initialized at runtime.
typedef struct _COORDS
{
double x;
double y;
double area;
double circumference;
int index;
wchar_t name[16];
} COORDS, *PCOORDS;
PCOORDS pCoords = (PCOORDS)malloc(sizeof(COORDS)* user_size);
// NULL ptr check omitted
After that, can I just access pCoords[0] to pCoords[user_size-1] as with an ordinary array of ints?
More to the point: I don't understand how the compiler superimposes the layout of the structure on the alloc'ed memory? Does it even have to or am I overthinking this?
The compiler does not super-impose the structure on the memory -- you tell it to do so!
An array of structures is accessed by multiplying the index of an element by the element's total size. pCoords[3], for example, is "at" byte address (char*)pCoords + 3*sizeof(COORDS) in memory.
A structure member is accessed by its offset (which is calculated by the sizes of the elements before it, taking padding into account). So member x is at an offset 0 from the start of its container, pCoords plus sizeof(COORDS) times the array element index; and y is sizeof(x) after that.
Since you tell the compiler that (1) you want a contiguous block of memory with a size for user_size times the size of a single COORD, and (2) then access this through pCoords[2].y, all it has to do is multiply and add, and then read the value (literally) in that memory address. Since the type of y is double, it reads and interprets the raw bytes as a double. And usually, it gets it right.
The only problem that can arise is when you have multiple pointers to that same area of memory. That could mean that the raw bytes "at" an address may need interpreting as different types (for instance, when one pointer tells it to expect an int and another a double).
With the proviso that the valid range is actually 0..user_size - 1, your code is fine.
You are probably overthinking this. The compiler does not "superimpose" anything on the malloc'ed memory - that is just a bunch of bytes.
However, pointers are typed in C, and the type of the pointer determines how the memory is interpreted when the pointer is dereferenced or used in pointer arithmetic. The compiler knows the memory layout of the struct. Each field has a defined size and an offset, and the overall size of the struct is known, too.
In your case, the expression pCoords[i].area = 42.0 is equivalent to
char *pByte = (char*)pCoords + sizeof(COORDS) * i + offsetof(COORDS, area);
double *pDouble = (double*)pByte;
*pDouble = 42.0;
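To make the equivalence concrete, here is a small self-contained sketch. The helper name set_area_by_offset is hypothetical, and the struct tag is written without the leading underscore because identifiers of that form are reserved.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct COORDS {
    double x;
    double y;
    double area;
    double circumference;
    int index;
    wchar_t name[16];
} COORDS;

/* Stores value in element i's area member using explicit byte arithmetic,
   mirroring what the compiler does for base[i].area = value. */
void set_area_by_offset(COORDS *base, size_t i, double value)
{
    char *p = (char *)base + sizeof(COORDS) * i + offsetof(COORDS, area);
    *(double *)p = value;
}
```

After calling set_area_by_offset(pCoords, 2, 42.0) on an allocated array, reading pCoords[2].area gives 42.0, because both forms address the same bytes.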

Various length structure in C for memory manager?

I am practicing by implementing a memory manager in C.
I want a structure that has a variable length and is self-describing.
So I found something like this in a POSIX textbook:
struct layout
{
uint32_t size; // array size in bytes, include space after the struct
uchar_t data[1];
};
// But, is next line correct?
layout *val = malloc (array_memory_in_bytes + sizeof (uint32_t) - 1);
// Where does a static array keep the pointer for using it?
If I have several of these structures one after another in a contiguous piece of memory, and I want to be able to iterate through them, can I write something like this:
layout *val1 = pointer;
layout *val2 = val1 + val1.size + sizeof (val1.size);
Or can you recommend me a better approach?
The Standard C version of this is called flexible array member and it looks like:
struct layout
{
uint32_t size;
uchar_t data[];
};
// allocate one of these blocks (in a function)
struct layout *val = malloc( sizeof *val + number_of_bytes );
val->size = number_of_bytes;
The code val->data + val->size will get you a pointer one-past-the-end of the space you just malloc'd.
However you cannot iterate off the end of one malloc'd block and hope to hit another malloc'd block. To implement this idea you would have to malloc a large block and then place various struct layout objects throughout it, being careful about alignment.
In this approach, it's probably best to also store an index of where each struct layout is. In theory you could go through the list from the start each time, adding on size and then doing your alignment adjustment; but that would be slow and also it would mean you could not cope with a block in the middle being freed and re-"allocated".
If this is meant to be a drop-in replacement for malloc then there are in fact two alignment considerations:
alignment for struct layout
data must be aligned for any possible type
The simplest way to cope with this is to align struct layout for any possible type also. This could look like (note: #include <stdint.h> is required for uint64_t and #include <stddef.h> for max_align_t):
struct layout
{
uint64_t size; // may as well use 64 bits since they're there
_Alignas(max_align_t) uchar_t data[];
};
An alternative might be to keep size at 32-bit and throw in a pragma pack to prevent padding; then you'll need to use some extra complexity to make sure that the struct layout is placed 4 bytes before a max_align_t-byte boundary, and so on. I'd suggest doing it the easy way first and get your code running; then later you can go back and try this change in order to save a few bytes of memory if you want.
Alternative approaches:
Keep each instance of a struct layout plus its trailing data in a separate allocation.
Change data to be a pointer to malloc'd space; then you could keep all of the struct layout objects in an array.
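The idea of placing several variable-length blocks inside one large block, with alignment handled, might be sketched like this. It assumes C11 (max_align_t, _Alignas, _Alignof) and an arena that is itself suitably aligned, for example one obtained from malloc. The names block, block_stride, arena_put and block_next are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct block {
    size_t size;                                /* payload bytes in data[] */
    _Alignas(max_align_t) unsigned char data[]; /* aligned for any type */
} block;

/* Bytes from the start of one block to the start of the next. */
static size_t block_stride(size_t payload)
{
    size_t a = _Alignof(max_align_t);
    return (offsetof(block, data) + payload + a - 1) / a * a;
}

/* Places a block at offset *used inside arena and advances *used.
   Returns NULL if the block does not fit. */
block *arena_put(unsigned char *arena, size_t arena_size,
                 size_t *used, size_t payload)
{
    if (payload > arena_size) return NULL;      /* also avoids overflow below */
    size_t stride = block_stride(payload);
    if (stride > arena_size - *used) return NULL;
    block *b = (block *)(arena + *used);
    b->size = payload;
    *used += stride;
    return b;
}

/* Steps to the following block; the caller must know one exists. */
block *block_next(block *b)
{
    return (block *)((unsigned char *)b + block_stride(b->size));
}
```

Walking from the first block with block_next is the slow linear scan mentioned above; keeping a separate index of block addresses avoids it.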
The general idea will work, but that specific struct will only work if the most-severe boundary alignment case is an int.
A memory manager, particularly one that might be a back-end for an implementation of malloc(), must know what that worst-case boundary is. The actual start of data must be on that boundary in order to satisfy the general requirement that the allocated memory be suitably aligned for the storage of any data type.
The easiest way to get that done is to make the length allocation header described by the layout struct and the actual allocation sizes all multiples of that alignment unit.
No matter what, you can't describe the start of data as a struct member and have the size of that struct be the size of the header. C doesn't support zero-length fields. You should use something to place that array on an alignment boundary, and use the offsetof() macro from <stddef.h> to find where it starts.
Personally, I'd use a union, based on both old habits and occasional use of Visual C++ for C. But uint32_t is a C99 type and if you also have C11 support you can use _Alignas(). With that, your struct could look something like:
#define ALIGN_TYPE double /* if this is the worst-case type */
#define ALIGN_UNIT (sizeof(ALIGN_TYPE))
#define ALIGN_SIZE(n) (((size_t)(n) + ALIGN_UNIT - 1) & ~(ALIGN_UNIT-1))
typedef struct layout
{
size_t size; /* or use uint32_t if you prefer */
_Alignas(ALIGN_UNIT) char data[1];
} layout;
#define HEADER_SIZE (offsetof(layout, data))
That makes most everything symbolic except for the worst-case alignment type. You'd allocate the combined header plus data array with:
layout *ptr = (layout*) malloc(HEADER_SIZE + ALIGN_SIZE(number_of_bytes));
ptr->size = HEADER_SIZE;
ALIGN_SIZE is a macro rather than a named constant. Its result is a compile-time constant only when its argument is one, and because it relies on sizeof it cannot be used in preprocessor #if directives. You can hard-code a literal number, like 8 for a typical double, if that's a problem. Beware that long double has an awkward size on many x86 implementations (an 80-bit, 10-byte value, commonly padded to 12 or 16 bytes). If you're going to base the allocation unit on a type, then long double might not be your best choice.

Array of structure with struct hack

Suppose I have structure in C like this
struct A {
int len;
char s[1];
};
I want to have an array of above structure but char s[1] member of struct A can be of variable length. How can we do this? Even struct hack trick in C99 doesn't seem to work here. One solution is to have char * as a last member and do dynamic memory allocation, but I want all data of struct to be in contiguous locations as my implementation needs to be cache oblivious.
You can't have an array of variable size objects, so you can't have an array of structures using the struct hack. All the objects in an array must be the same size. And if they're all the same size, the size must be implied by the structure, so you won't be using the struct hack after all; there'll be a size other than 1 in the dimension of the array s in your structure (unless 1 is big enough for everything). The reason is that the storage location for a[i] (where a is the name of an array and i is an index into the array) must be computable as 'byte address of a plus (i times size of one object in array)'. So the size of the objects in the array (in this case, structures) must be known and fixed.
As an alternative, you can have an array of pointers to variable size objects; you simply have to arrange to allocate each object separately with the appropriate size, and save the pointer to that in the array.
Note that C99 does away with the 'struct hack' (which was never officially portable, though in practice it was) and introduces 'flexible array members' instead:
struct A {
int len;
char data[];
};
However, the advice above still applies.
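A minimal sketch of the array-of-pointers alternative using the flexible array member form. The helper name a_new is hypothetical.

```c
#include <stdlib.h>
#include <string.h>

struct A {
    int len;
    char data[];
};

/* Allocates one struct A holding a copy of the first len bytes of src.
   Returns NULL on a negative length or allocation failure. */
struct A *a_new(const char *src, int len)
{
    if (len < 0) return NULL;
    struct A *a = malloc(sizeof *a + (size_t)len);
    if (a == NULL) return NULL;
    a->len = len;
    memcpy(a->data, src, (size_t)len);
    return a;
}
```

An array such as struct A *items[100] then has fixed-size elements (the pointers), while each pointed-to object has its own length. The objects are separate allocations, so they are not guaranteed to be contiguous.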
If there is a maximum size for "s", you could use that instead of [1]. That keeps everything contiguous.
If you really don't want to use dynamic memory, then you can't do it with an array. You need your own "manager" that will use the struct hack trick on each member individually - but that means you can't do indexed lookups - you have to look at each element to see how big it is and jump the right number of bytes to the next element.
In C, array indexing involves multiplying the index by the compile-time-constant size of an individual element and adding the result to the base address. For that reason, you can't use built-in array support directly with the "struct hack": each s member gets exactly the 1 byte you declared, and indexing further past it will reach into the following S elements in the array (or go off the end completely, possibly crashing).
If you really need contiguous data for cache-access speed, you can pack it yourself. You can solve this (like most things) with an indirection: have a contiguous array of S*, and manually pack your data into another contiguous buffer (malloc() or stack-allocate enough memory for all your S objects, including the real data size of all s[] members). Your performance may suffer (or your program may crash) if the int len elements aren't properly aligned for your architecture, so you may need to manually pad between S instances.
S* index[100]             char data[10000];
(S*)(data) -------------> S with 14-byte s[] using data[0]..[17]
(S*)(data + 20) -----\    2 bytes padding so next S is 4-byte aligned
(S*)(data + 32) --\   \-> S with 7-byte s[] using data[20]..[30]
                   \      1 byte padding...
                    \---> ...
Unfortunately, this is quite an inflexible data layout: you can't just grow the amount of data in an element's s member without shuffling all the other data out of the way and patching the index. That's normal for arrays, though, so if you were already considering using them then perhaps this will suit you. Another hassle is calculating the total size of the S structs (including s[] and any padding) up front.
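The packed layout with an index could look roughly like this. It is a sketch under the assumptions that the buffer is suitably aligned for S (for example, obtained from malloc) and that S uses a flexible array member; the names s_stride and pack are hypothetical.

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

typedef struct S {
    int len;
    char s[];
} S;

/* Bytes occupied by one record, padded so the next S stays aligned. */
static size_t s_stride(size_t len)
{
    size_t a = _Alignof(S);
    return (offsetof(S, s) + len + a - 1) / a * a;
}

/* Packs count strings (without terminators) into buf and fills index[].
   Returns the number of bytes used, or 0 if buf is too small. */
size_t pack(char *buf, size_t buf_size, const char **strs,
            size_t count, S **index)
{
    size_t used = 0;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(strs[i]);
        if (len > INT_MAX || len > buf_size) return 0;
        size_t stride = s_stride(len);
        if (stride > buf_size - used) return 0;
        S *rec = (S *)(buf + used);
        rec->len = (int)len;
        memcpy(rec->s, strs[i], len);
        index[i] = rec;        /* indexed lookup goes through this pointer */
        used += stride;        /* padding keeps the next len field aligned */
    }
    return used;
}
```

With a 4-byte int, a 14-byte string followed by a 7-byte string lands at offsets 0 and 20, matching the diagram above.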

Resources