Allocating a dynamic array in a dynamically allocated struct (struct of arrays) - c

This question is really about how to use variable-length types in the Python/C API (PyObject_NewVar, PyObject_VAR_HEAD, PyTypeObject.tp_basicsize and .tp_itemsize), but I can ask this question without bothering with the details of the API. Just assume I need to use an array inside a struct.
I can create a list data structure in one of two ways. (I'll just talk about char lists for now, but it doesn't matter.) The first uses a pointer and requires two allocations. Ignoring #includes and error handling:
struct listptr {
size_t elems;
char *data;
};
struct listptr *listptr_new(size_t elems) {
size_t basicsize = sizeof(struct listptr), itemsize = sizeof(char);
struct listptr *lp;
lp = malloc(basicsize);
lp->elems = elems;
lp->data = malloc(elems * itemsize);
return lp;
}
The second way to create a list uses array notation and one allocation. (I know this second implementation works because I've tested it pretty thoroughly.)
struct listarray {
size_t elems;
char data[1];
};
struct listarray *listarray_new(size_t elems) {
size_t basicsize = offsetof(struct listarray, data), itemsize = sizeof(char);
struct listarray *la;
la = malloc(basicsize + elems * itemsize);
la->elems = elems;
return la;
}
In both cases, you then use lp->data[index] to access the array.
My question is why does the second method work? Why do you declare char data[1] instead of any of char data[], char data[0], char *data, or char data? In particular, my intuitive understanding of how structs work is that the correct way to declare data is char data with no pointer or array notation at all. Finally, are my calculations of basicsize and itemsize correct in both implementations? In particular, is this use of offsetof guaranteed to be correct for all machines?
Update
Apparently this is called a struct hack: In C99, you can use a flexible array member:
struct listarray2 {
size_t elems;
char data[];
};
with the understanding that you'll malloc enough space for data at runtime. Before C99, the data[1] declaration was common. So my question now is why declare char data[1] or char data[] instead of char *data or char data?

The reason you'd declare char data[1] or char data[] instead of char *data or char data is to keep your structure directly serializable and deserializable. This is important in cases where you'll be writing these sorts of structures to disk or over a network socket, etc.
Take for example your first code snippet that requires two allocations. Your listptr type is not directly serializable. i.e. listptr.elems and the data pointed to by listptr.data are not in a contiguous piece of memory. There is no way to read/write this structure to/from disk with a generic function. You need a custom function that is specific to your struct listptr type to do it. i.e. On serialize you'd have to first write elems to disk, and then write the data pointed to by the data pointer. On deserialization you'd have to read elems, allocate the appropriate space to listptr.data and then read the data from disk.
Using a flexible array member solves this problem because listarray.elems and listarray.data reside in one contiguous block of memory. So to serialize it you can simply write out the total allocated size for the structure and then the structure itself. On deserialization you first read the allocated size, allocate the needed space, and then read your listarray struct into that space.
You may wonder why you'd ever really need this, but it can be an invaluable feature. Consider a data stream of heterogeneous types. Provided you define a header that states which heterogeneous type you have and its size, and precede each item in the stream with this header, you can generically serialize and deserialize the data stream very elegantly and efficiently.
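The size-then-struct scheme described above can be sketched as follows. This is a minimal illustration, not a portable file format: the helper names listarray_size, listarray_write and listarray_read are hypothetical, and it assumes the reader and writer share the same size_t width, byte order and struct layout.

```c
#include <stdio.h>
#include <stdlib.h>

struct listarray {
    size_t elems;
    char data[];   /* C99 flexible array member */
};

/* Total bytes occupied by a list holding `elems` chars (hypothetical helper). */
size_t listarray_size(size_t elems)
{
    return sizeof(struct listarray) + elems * sizeof(char);
}

/* Write the total size, then the whole struct in one go; 0 on success. */
int listarray_write(FILE *fp, const struct listarray *la)
{
    size_t total = listarray_size(la->elems);
    if (fwrite(&total, sizeof total, 1, fp) != 1) return -1;
    if (fwrite(la, total, 1, fp) != 1) return -1;
    return 0;
}

/* Read the size, allocate that much, then read the struct; NULL on failure. */
struct listarray *listarray_read(FILE *fp)
{
    size_t total;
    if (fread(&total, sizeof total, 1, fp) != 1) return NULL;
    if (total < sizeof(struct listarray)) return NULL;

    struct listarray *la = malloc(total);
    if (la == NULL) return NULL;

    /* Reject input whose header disagrees with the element count it carries. */
    if (fread(la, total, 1, fp) != 1 || listarray_size(la->elems) != total) {
        free(la);
        return NULL;
    }
    return la;
}
```

Neither function needs to know anything about the payload beyond its size, which is what makes the same pair usable for any struct laid out this way. The listptr version cannot be handled like this because its data lives in a second block.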
The only reason I know of for choosing char data[1] over char data[] is if you are defining an API that needs to be portable between C99 and C++ since C++ does not have support for flexible array members.
Also, wanted to point out that in the char data[1] you can do the following to get the total needed structure size:
size_t totalsize = offsetof(struct listarray, data[elems]);
You also ask why you wouldn't use char data instead of char data[1] or char data[]. While technically possible to use just plain old char data, it would be (IMHO) morally shunned. The two main issues with this approach are:
You wanted an array of chars, but now you can't access the data member directly as an array. You need to point a pointer to the address of data to access it as an array. i.e.
char *as_array = &listarray.data;
Your structure definition (and your code's use of the structure) would be totally misleading to anyone reading the code. Why declare a single char when you really meant an array of char?
Given these two things, I don't know why anyone would use char data in favor of char data[1]. It just doesn't benefit anyone given the alternatives.

Related

How can I access items in a dynamically allocated array of structs?

I'm in a class right now that works with C and one of my assignments requires that I work with a struct that my professor wrote for us. It's actually two structs, with one struct basically containing an array of the first struct.
Here's what they look like:
typedef struct cityStruct
{
unsigned int zip;
char *town;
} city;
typedef struct zipTownsStruct
{
int *towns;
city **zips;
city *cities;
} zipTowns;
And here's my function for allocating memory for the zipTowns structure:
void getArrs(zipTowns *arrs, int size)
{
arrs->towns = malloc(sizeof(int) * size);
arrs->zips = malloc(sizeof(city **) * size);
arrs->cities = malloc(sizeof(city *) * size);
}
From what I understand, what I'm doing here is allocating space in memory for a certain number of ints, city pointers, and city structures, based on the size variable. I understand that this is basically what an array is.
I'm having trouble with understanding how I can access these arrays and manipulate items in it. Writing this gives me an error:
strcpy(arrs.cities[0]->town, "testTown\0");
You can see what I'm trying to do here. I want to access each "City" in the zipTowns struct by index and insert a value.
How can I access the items in these dynamically allocated array of structures?
Think of x->y as (*x).y.
arrs is not a structure, it's a pointer to a structure, and cities is not a pointer to a pointer to a structure, it's just a pointer to a structure.
Use arrs->cities[0].town instead of arrs.cities[0]->town.
However, you're still not allocating enough room for these structures. This should make it clearer what you're doing with the allocations, and should also give you enough room for your data:
arrs->towns = malloc(sizeof(*arrs->towns) * size);
arrs->zips = malloc(sizeof(*arrs->zips) * size);
arrs->cities = malloc(sizeof(*arrs->cities) * size);
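With the third, you were only allocating enough room for a pointer per element instead of a whole city struct. (The second asked for sizeof(city **) where the element type is city *; the two pointer sizes are normally equal, so it happened to be enough, but sizeof(*arrs->zips) says what you actually mean.)
With this approach, you will be able to access arrs->cities[0] through arrs->cities[size - 1], and you can access the members of each city with arrs->cities[<number>].<member>.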
You also do not need to intentionally null-terminate your strings. This is already done for you. Therefore, you can replace "testTown\0" with "testTown".
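Putting the corrections above together, a minimal sketch might look like this. The helper names alloc_arrs and set_town are hypothetical. It also makes explicit one assumption the answer does not spell out: each town pointer has to point at allocated memory before strcpy writes through it.

```c
#include <stdlib.h>
#include <string.h>

typedef struct cityStruct {
    unsigned int zip;
    char *town;
} city;

typedef struct zipTownsStruct {
    int *towns;
    city **zips;
    city *cities;
} zipTowns;

/* Allocate `size` elements for each array (size > 0 assumed); 0 on success. */
int alloc_arrs(zipTowns *arrs, size_t size)
{
    arrs->towns  = malloc(sizeof(*arrs->towns) * size);
    arrs->zips   = malloc(sizeof(*arrs->zips) * size);
    arrs->cities = malloc(sizeof(*arrs->cities) * size);
    if (!arrs->towns || !arrs->zips || !arrs->cities) {
        free(arrs->towns);
        free(arrs->zips);
        free(arrs->cities);
        return -1;
    }
    /* Start every city in a known state so set_town can free safely. */
    for (size_t i = 0; i < size; i++) {
        arrs->cities[i].zip = 0;
        arrs->cities[i].town = NULL;
    }
    return 0;
}

/* Store a heap copy of `name` in cities[i].town; 0 on success. */
int set_town(zipTowns *arrs, size_t i, const char *name)
{
    char *copy = malloc(strlen(name) + 1);   /* +1 for the terminator */
    if (copy == NULL) return -1;
    strcpy(copy, name);
    free(arrs->cities[i].town);              /* free(NULL) is a no-op */
    arrs->cities[i].town = copy;
    return 0;
}
```

After alloc_arrs(&arrs, 10) succeeds, set_town(&arrs, 0, "testTown") does what the original strcpy line was trying to do, and arrs.cities[0].town can be read back as an ordinary string.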

How to allocate memory dynamically for a struct [duplicate]

I have looked around but have been unable to find a solution to what must be a well asked question.
Here is the code I have:
#include <stdlib.h>
struct my_struct {
int n;
char s[]
};
int main()
{
struct my_struct ms;
ms.s = malloc(sizeof(char*)*50);
}
and here is the error gcc gives me:
error: invalid use of flexible array member
I can get it to compile if i declare the declaration of s inside the struct to be
char* s
and this is probably a superior implementation (pointer arithmetic is faster than arrays, yes?)
but I thought in c a declaration of
char s[]
is the same as
char* s
What you have written now used to be called the "struct hack", until C99 blessed it as a "flexible array member". The error you quote most likely comes from the assignment ms.s = malloc(...): an array member cannot be assigned to, however it is declared. The declaration itself also needs to be followed by a semicolon:
#include <stdlib.h>
struct my_struct {
int n;
char s[];
};
When you allocate space for this, you want to allocate the size of the struct plus the amount of space you want for the array:
struct my_struct *s = malloc(sizeof(struct my_struct) + 50);
In this case, the flexible array member is an array of char, and sizeof(char)==1, so you don't need to multiply by its size, but just like any other malloc you'd need to if it was an array of some other type:
struct dyn_array {
int size;
int data[];
};
struct dyn_array* my_array = malloc(sizeof(struct dyn_array) + 100 * sizeof(int));
Edit: This gives a different result from changing the member to a pointer. In that case, you (normally) need two separate allocations, one for the struct itself, and one for the "extra" data to be pointed to by the pointer. Using a flexible array member you can allocate all the data in a single block.
You need to decide what it is you are trying to do first.
If you want to have a struct with a pointer to an [independent] array inside, you have to declare it as
struct my_struct {
int n;
char *s;
};
In this case you can create the actual struct object in any way you please (like an automatic variable, for example)
struct my_struct ms;
and then allocate the memory for the array independently
ms.s = malloc(50 * sizeof *ms.s);
In fact, there's no general need to allocate the array memory dynamically
struct my_struct ms;
char s[50];
ms.s = s;
It all depends on what kind of lifetime you need from these objects. If your struct is automatic, then in most cases the array would also be automatic. If the struct object owns the array memory, there's simply no point in doing otherwise. If the struct itself is dynamic, then the array should also normally be dynamic.
Note that in this case you have two independent memory blocks: the struct and the array.
A completely different approach would be to use the "struct hack" idiom. In this case the array becomes an integral part of the struct. Both reside in a single block of memory. In C99 the struct would be declared as
struct my_struct {
int n;
char s[];
};
and to create an object you'd have to allocate the whole thing dynamically
struct my_struct *ms = malloc(sizeof *ms + 50 * sizeof *ms->s);
The size of memory block in this case is calculated to accommodate the struct members and the trailing array of run-time size.
Note that in this case you have no option to create such struct objects as static or automatic objects. Structs with flexible array members at the end can only be allocated dynamically in C.
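As a sketch of the single-block approach just described, the allocation and initialization can be wrapped in one function. The name my_struct_new is hypothetical, and storing the array length in n is an assumption made for the example; the question never says what n is for.

```c
#include <stdlib.h>
#include <string.h>

struct my_struct {
    int n;
    char s[];   /* flexible array member: must be the last member */
};

/* Allocate a struct my_struct whose trailing array holds a copy of `text`.
   Returns NULL if the allocation fails. */
struct my_struct *my_struct_new(const char *text)
{
    size_t len = strlen(text) + 1;   /* include the terminator */
    struct my_struct *ms = malloc(sizeof *ms + len * sizeof *ms->s);
    if (ms == NULL) return NULL;
    ms->n = (int)len;                /* assumption: n records the array length */
    memcpy(ms->s, text, len);
    return ms;
}
```

A single free(ms) releases both the fixed members and the array, because they were allocated as one block.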
Your assumption about pointer arithmetic being faster than arrays is absolutely incorrect. Arrays work through pointer arithmetic by definition, so they are basically the same. Moreover, a genuine array (not decayed to a pointer) is generally a bit faster than a pointer object. A pointer's value has to be read from memory, while the array's location in memory is "known" (or "calculated") from the array object itself.
The use of an array of unspecified size is only allowed at the end of a structure. It has been standard C since C99; in C89 it is only available as a compiler extension. Standard C++ does not have it, although several C++ compilers accept it as an extension.
The array will not be a separate allocation from the structure though. So you need to allocate all of my_struct, not just the array part.
What I do is simply give the array a small but non-zero size. Usually 4 for character arrays and 2 for wchar_t arrays to preserve 32 bit alignment.
Then you can take the declared size of the array into account, when you do the allocating. I often don't on the theory that the slop is smaller than the granularity that the heap manager works in in any case.
Also, I think you should not be using sizeof(char*) in your allocation.
This is what I would do.
struct my_struct {
int nAllocated;
char s[4]; // waste 32 bits to guarantee alignment and room for a null-terminator
};
int main()
{
struct my_struct * pms;
int cb = sizeof(*pms) + sizeof(pms->s[0])*50;
pms = (struct my_struct*) malloc(cb);
pms->nAllocated = (cb - sizeof(*pms) + sizeof(pms->s)) / sizeof(pms->s[0]);
}
I suspect the compiler doesn't know how much space it will need to allocate for s[], should you choose to declare an automatic variable with it.
I concur with what Ben said, declare your struct
struct my_struct {
int n;
char s[1];
};
Also, to clarify his comment about storage, declaring char *s won't put the struct on the stack (since it is dynamically allocated) and allocate s in the heap, what it will do is interpret the first sizeof(char *) bytes of your array as a pointer, so you won't be operating on the data you think you are, and probably will be fatal.
It is vital to remember that although the operations on pointers and arrays may be implemented the same way, they are not the same thing.
Arrays will resolve to pointers, and here you must define s as char *s. The struct basically is a container, and must (IIRC) be fixed size, so having a dynamically sized array inside of it simply isn't possible. Since you're mallocing the memory anyway, this shouldn't make any difference in what you're after.
Basically you're saying, s will indicate a memory location. Note that you can still access this later using notation like s[0].
pointer arithmetic is faster than arrays, yes?
Not at all - they're actually the same. arrays translate to pointer arithmetics at compile-time.
char test[100];
test[40] = 12;
// translates to: (test now indicates the starting address of the array)
*(test+40) = 12;
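A small sketch can make both halves of this concrete: indexing and pointer arithmetic reach the same element, yet an array and a pointer remain different kinds of object. The function names below are hypothetical and exist only for the illustration.

```c
#include <stddef.h>

/* Returns 1 if test[40] and *(test + 40) name the same element. */
int same_element(void)
{
    char test[100] = {0};
    test[40] = 12;
    return &test[40] == test + 40 && *(test + 40) == 12;
}

/* sizeof sees the whole array: 100 bytes for char[100]. */
size_t array_bytes(void)
{
    char test[100];
    return sizeof test;
}

/* sizeof a pointer is just the size of the pointer itself. */
size_t pointer_bytes(void)
{
    char *p = NULL;
    return sizeof p;
}
```

The sizeof difference is the reason char s[] and char *s are not interchangeable inside a struct: the first reserves storage in place, the second only reserves room for an address.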
Here is working code for storing an array inside a structure in C, and for storing values in the array elements. Please leave a comment if you have any doubts and I will clarify as best I can.
Structure Define:
struct process{
int process_id;
int tau;
double alpha;
int* process_time;
};
Memory Allocation for process structure:
struct process* process_mem_aloc = (struct process*) malloc(temp_number_of_process * sizeof(struct process));
Looping through the processes and, for each process, filling its process_time dynamic array:
int process_count = 0;
int tick_count = 0;
while(process_count < number_of_process){
//Memory allocation for this process's array, which holds number_of_ticks values
(process_mem_aloc + process_count)->process_time = (int*) malloc(number_of_ticks* sizeof(int));
//Read the file line by line, store each value in the process_time array, then print the stored value.
//This inner loop runs inside the per-process loop.
while(tick_count < number_of_ticks){
fgets(line, LINE_LENGTH, file);
*((process_mem_aloc + process_count)->process_time + tick_count) = convertToInteger(line);
printf("tick_count : %d , number_of_ticks %d\n",tick_count,*((process_mem_aloc + process_count)->process_time + tick_count));
tick_count++;
}
tick_count = 0;
process_count++;
}
The code generated will be identical (array and pointer), apart from the fact that the array version won't compile.
And by the way: do it in C++ and use vector.

Is there a way to initialize an array of strings in a struct when you don't know how many elements you will put in the string?

I have this struct:
typedef struct SomeStruct {
char someString[];
} SomeStruct;
This produces an error since someString's size is not defined when initialized.
I want to make someString an array of strings, but I will not know the size of the array at the time of initialization. (The elements that will be in the array will depend on user input later in the program).
Is it possible to initialize this as an array of strings without knowing the size of the array?
Yes, the C standard talks about this in section 6.7.2.1 (paragraph 18 onward in C11). What you are describing is known as a flexible array member of a struct. From the standard:
As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member.
Essentially what it is saying is, if the last member of the struct is an array of undefined size (as might be the case for runtime sizes), then when using the struct, you would allocate the appropriate size of your struct including how large you want the string to be. For example:
typedef struct SomeStruct {
char someString[];
} SomeStruct;
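has the flexible array member someString. (Strictly speaking, the passage quoted above requires at least one other named member before it, such as a length field.) A common way to use this is: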
SomeStruct *p = malloc(sizeof (SomeStruct) + str_size);
Assuming that the call to malloc succeeds, the object pointed to by p behaves, for most purposes, as if p had been declared as:
struct {char someString[str_size]; } *p;
Read the standard for more detail. Searching for the term "flexible array member" will also turn up a lot of information. Wikipedia is a good place to start.
You can use a structure with flexible array. For example
typedef struct SomeStruct
{
size_t n;
char someString[];
} SomeStruct;
where n is used to store the number of elements in the array.
Then you can create objects of the structure the following way
SomeStruct *s = malloc( sizeof( SomeStruct ) + 10 * sizeof( char[100] ) );
s->n = 10;
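Because the allocation above reserves 10 blocks of 100 bytes, one way to get an actual array of strings is to make each element of the flexible array a fixed-width row. The sketch below is an assumption about what was intended, not the answer's own code; StringList, MAX_LEN, string_list_new and string_list_set are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_LEN 100   /* hypothetical capacity of each string, terminator included */

typedef struct StringList {
    size_t n;                    /* number of rows */
    char strings[][MAX_LEN];     /* flexible array of fixed-width rows */
} StringList;

/* Allocate room for n empty strings; NULL on failure. */
StringList *string_list_new(size_t n)
{
    StringList *s = malloc(sizeof *s + n * sizeof s->strings[0]);
    if (s == NULL) return NULL;
    s->n = n;
    for (size_t i = 0; i < n; i++)
        s->strings[i][0] = '\0';
    return s;
}

/* Copy text into row i; -1 if i is out of range or text does not fit. */
int string_list_set(StringList *s, size_t i, const char *text)
{
    if (i >= s->n || strlen(text) >= MAX_LEN) return -1;
    strcpy(s->strings[i], text);
    return 0;
}
```

The row count is still chosen at run time, so it can come from user input. Only the per-string capacity is fixed at compile time; if strings of arbitrary length are needed, an array of char * with one allocation per string is the usual alternative.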
If you can't use a dynamic array (it sounds like this, if you get a compile error for it), you can actually overrun the array, as long as it's at the end of the struct, and as long as you can actually access that memory. Example:
#include <stdio.h>
#include <stdlib.h>
typedef struct SomeStruct {
char someString[10];
} SomeStruct;
int main (void)
{
// Allocate 4x space, so we have room to overrun
SomeStruct *p = malloc(sizeof(SomeStruct) * 4);
p->someString[38] = 'a';
printf("%c\n", p->someString[38]);
}
Of course, you still have to actually allocate the space, so it may not be so useful to you depending on your case.

Why does internal Lua strings store the way they do?

I wanted a simple string table that will store a bunch of constants and I thought "Hey! Lua does that, let me use some of their functions!"
This is mainly in the lstring.h/lstring.c files (I am using 5.2).
I will show the code I am curious about first. It's from lobject.h:
/*
** Header for string value; string bytes follow the end of this structure
*/
typedef union TString {
L_Umaxalign dummy; /* ensures maximum alignment for strings */
struct {
CommonHeader;
lu_byte reserved;
unsigned int hash;
size_t len; /* number of characters in string */
} tsv;
} TString;
/* get the actual string (array of bytes) from a TString */
#define getstr(ts) cast(const char *, (ts) + 1)
/* get the actual string (array of bytes) from a Lua value */
#define svalue(o) getstr(rawtsvalue(o))
As you see, the data is stored outside of the structure. To get the byte stream, you take the TString pointer and add 1; pointer arithmetic advances it by sizeof(TString) bytes, so the result points just past the header, and that is your char * pointer.
Isn't this bad coding though? It's been DRILLED into me in my C classes to make clearly defined structures. I know I might be stirring a nest here, but do you really lose that much speed/space defining a structure as a header for data rather than defining a pointer value for that data?
The idea is probably that you allocate the header and the data in one big chunk of data instead of two:
TString *str = (TString*)malloc(sizeof(TString) + <length_of_string>);
In addition to having just one call to malloc/free, you also reduce memory fragmentation and improve memory locality.
But answering your question, yes, these kind of hacks are usually a bad practice, and should be done with extreme care. And if you do, you'll probably want to hide them under a layer of macros/inline functions.
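The single-chunk layout and the "hide it behind a function" advice can be sketched like this. StrHeader, str_bytes and str_new are hypothetical names loosely modelled on the idea, not Lua's actual code, and the header is reduced to a length field for brevity.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical header; the string bytes follow it in the same allocation. */
typedef struct StrHeader {
    size_t len;   /* number of characters, terminator not counted */
} StrHeader;

/* The bytes start right after the header: (sh + 1) skips sizeof(StrHeader). */
char *str_bytes(StrHeader *sh)
{
    return (char *)(sh + 1);
}

/* Allocate header and characters in one block; NULL on failure. */
StrHeader *str_new(const char *text)
{
    size_t len = strlen(text);
    StrHeader *sh = malloc(sizeof *sh + len + 1);
    if (sh == NULL) return NULL;
    sh->len = len;
    memcpy(str_bytes(sh), text, len + 1);   /* copies the terminator too */
    return sh;
}
```

Callers only ever go through str_bytes, so the pointer arithmetic lives in exactly one place, and one free(sh) releases the header and the characters together.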
As rodrigo says, the idea is to allocate the header and string data as a single chunk of memory. It's worth pointing out that you also see the non-standard hack
struct lenstring {
unsigned length;
char data[0];
};
but C99 added flexible array members so it can be done in a standard compliant way as
struct lenstring {
unsigned length;
char data[];
};
If Lua's string were done in this way it'd be something like
typedef union TString {
L_Umaxalign dummy;
struct {
CommonHeader;
lu_byte reserved;
unsigned int hash;
size_t len;
const char data[];
} tsv;
} TString;
#define getstr(ts) ((ts)->tsv.data)
It relates to the complications arising from the more limited C language. In C++, you would just define a base class called GCObject which contains the garbage collection variables, then TString would be a subclass, and by using a virtual destructor both the TString and its accompanying const char * block would be freed properly.
When it comes to writing the same kind of functionality in C, it's a bit more difficult as classes and virtual inheritance do not exist.
What Lua is doing is implementing garbage collection by inserting the header required to manage the garbage collection status of the part of memory following it. Remember that free(void *) does not need to know anything other than the address of the memory block.
#define CommonHeader GCObject *next; lu_byte tt; lu_byte marked
Lua keeps a linked list of these "collectable" blocks of memory, in this case an array of characters, so that it can then free the memory efficiently without knowing the type of object it is pointing to.
If your TString pointed to another block of memory where the character array was, the garbage collector would have to determine the object's type and then delve into its structure to free the string buffer as well.
The pseudo code for this kind of garbage collection would be something like this:
GCHeader *next, *prev = NULL;
GCHeader *current = firstObject;
while(current)
{
next = current->next;
if (/* current is ready for deletion */)
{
free(current);
// relink previous to the next (singly-linked list)
if (prev)
prev->next = next;
else
firstObject = next; // the head of the list was deleted
}
else
prev = current; // store previous undeleted object
current = next;
}
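A compilable version of that sweep might look like the following. It is a sketch of the general technique only: the GCHeader layout, the marked flag and the names gc_alloc and gc_sweep are assumptions made for the example and are not taken from Lua's source.

```c
#include <stdlib.h>

/* Hypothetical header placed in front of every collectable block. */
typedef struct GCHeader {
    struct GCHeader *next;
    unsigned char marked;   /* nonzero: still reachable, keep it */
} GCHeader;

/* Allocate a header plus `payload` bytes and push it onto the list. */
GCHeader *gc_alloc(GCHeader **list, size_t payload)
{
    GCHeader *obj = malloc(sizeof *obj + payload);
    if (obj == NULL) return NULL;
    obj->marked = 0;
    obj->next = *list;
    *list = obj;
    return obj;
}

/* Free every unmarked object and clear the mark on survivors.
   Returns the number of objects freed. */
size_t gc_sweep(GCHeader **list)
{
    size_t freed = 0;
    GCHeader **link = list;          /* the pointer that refers to the current object */
    while (*link != NULL) {
        GCHeader *cur = *link;
        if (!cur->marked) {
            *link = cur->next;       /* unlink; this also handles the list head */
            free(cur);               /* header and payload go in one call */
            freed++;
        } else {
            cur->marked = 0;
            link = &cur->next;
        }
    }
    return freed;
}
```

The sweep never inspects the payload, which is the point made above: because the data sits in the same block as the header, freeing the header frees everything, whatever the object's type.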

How to save a dynamic struct to file

I have something like this, in fact more complex struct than this:
typedef struct _sample {
unsigned char type;
char *name;
test *first;
} sample;
typedef struct _test {
test *prev;
test *next;
char *name;
int total;
test_2 **list;
} test;
typedef struct _test_2 {
char *name;
unsigned int blabla;
} test_2;
sample *sample_var;
I want to backup this struct into a file and after restore it.
I also try with fwrite(sample_var, sizeof(sample), 1, file_handle); but the real problem is sizeof(sample) that return wrong size, not real variable size.
There is a way to save it into file & restore without knowing the size?
You are trying to serialize, or marshal the structure. You can't just fwrite the data (having pointers is the most obvious stopper). The sizeof problem is really minor when compared to storing pointers in a file (a pointer is meaningless outside the program where it originated).
You will have to define your own serialization / deserialization functions. You could either use your own simple format or use JSON, XML, XDR or something like that.
Personally I would go with JSON, since it's all the rage these days anyway.
As an aside, here is a C FAQ vaguely linked to your own question (though it discusses interoperability issues).
There is no easy approach to save such a structure into a file. For instance, even the sample.name field has the size of a pointer (4 or 8 bytes, depending on architecture), while what you probably want to save is the content of the memory pointed to by sample.name.
Here is a sample code that will do such a thing. You will have to duplicate the process to save the entire structure.
void saveToFile(FILE *fh, sample s)
{
fwrite(&s.type, sizeof s.type, 1, fh); // write the type byte
size_t nameSize = strlen(s.name); // get the length of the name field
fwrite(&nameSize, sizeof nameSize, 1, fh); // write the length of the name field
fwrite(s.name, sizeof(char), nameSize, fh); // write the content of the name field
// continue with other fields
}
The idea is to store the size of the next piece of data and then write its content. To get the information back from the file, you read the size, and then read that many bytes of data.
sizeof(sample) is not incorrect: it returns the size of a char followed by two pointers (plus any padding). If you need to save such a recursive data type, you have to follow and dereference the pointers manually.
It seems like what you really want to do is store the struct and what its pointers refer to, not the pointers themselves.
You will need to write some logic to determine the size of the data being pointed at, and write that data to the file instead of the pointers.
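For a single string field, the write-the-size-then-the-content idea and its reading counterpart can be sketched as below. The names save_string and load_string are hypothetical, and the format assumes the file is read back by a program with the same size_t width and byte order.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write the length of s, then its characters (no terminator); 0 on success. */
int save_string(FILE *fh, const char *s)
{
    size_t len = strlen(s);
    if (fwrite(&len, sizeof len, 1, fh) != 1) return -1;
    if (len > 0 && fwrite(s, sizeof(char), len, fh) != len) return -1;
    return 0;
}

/* Read a length, allocate len + 1 bytes, read the characters, terminate.
   Returns a heap string the caller must free, or NULL on failure. */
char *load_string(FILE *fh)
{
    size_t len;
    if (fread(&len, sizeof len, 1, fh) != 1) return NULL;
    if (len == SIZE_MAX) return NULL;        /* len + 1 would overflow */

    char *s = malloc(len + 1);
    if (s == NULL) return NULL;
    if (len > 0 && fread(s, sizeof(char), len, fh) != len) {
        free(s);
        return NULL;
    }
    s[len] = '\0';
    return s;
}
```

Saving the full structure would repeat this pattern for every pointed-to field and, for the linked test nodes, write a count or an end marker so the reader knows when to stop rebuilding the list.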
