I'm writing an application in C (as a beginner) and I'm struggling with corrupted data inside a struct that contains a variable-length array. I found similar issues described in forum posts on cprogramming.com and also on cert.og/secure-coding. I thought I had found the right solution, but it seems not.
The struct looks like this:
typedef struct {
    int a;
    int b;
} pair;
typedef struct {
    CommandType name;
    pair class;
    pair instr;
    pair p1;
    pair p2;
    pair p3;
    CommandType expected_next;
    char* desc;
    int size;
    pair sw1;
    pair sw2;
    pair* data;
} command;
With the problematic one being "command". For any given instance (or whatever the correct phrase would be) of "command", different fields are set, although in most cases the same fields are set across instances.
The problem I have is when trying to set the expected_next, name, sw1, sw2, size and data fields, and it's the data field that's getting corrupted. I'm allocating memory for the struct like this:
void *command_malloc(int desc_size, int data_size)
{
    return malloc(sizeof(command) +
                  desc_size*sizeof(char) +
                  data_size*sizeof(pair));
}
command *cmd;
cmd = command_malloc(0, file_size);
But when I (pretty) print the resulting cmd, the middle of the data field appears to be random garbage. I've stepped through with gdb and can see that the correct data is getting loaded into the field. It appears that it's only when the command gets passed to a different function that it gets corrupted. This code is called inside a function such as:
command* parse(char *line, command *context)
And the pretty-print happens in another function;
void pretty_print(char* line, command* cmd)
I had thought I was doing things correctly, but apparently not. As far as I can tell, I construct other instances of the struct okay (and I duplicated those approaches for this one), but they don't contain any variable-length array and their pretty-prints look fine. That concerns me, because they might also be broken, with the breakage just less obvious.
What I'm writing is actually a parser, so a command gets passed into the parse function (it describes the current state, giving the parser hints about what to expect next) and the next command (derived from the input "line") is returned. "context" is freed at the end of the parse function, with the new command getting returned, which would then be passed back into "parse" with the next "line" of input.
Can anyone suggest anything as to why this might be happening?
Many thanks.
When you allocate memory for the structure, only enough space for the pointer desc itself is allocated. You must also allocate the memory that desc points to (the array contents), as someone already pointed out. The purpose of my answer is to show a slightly different way of doing that.
Since a pointer member such as desc adds a word (the size of a pointer) to the structure, you can instead use a flexible array member (the "struct hack") to reduce the structure size.
Here's how your structure should look; notice that desc[] has been moved down to the end of the structure:
typedef struct {
    CommandType name;
    pair class;
    pair instr;
    pair p1;
    pair p2;
    pair p3;
    CommandType expected_next;
    int size;
    pair sw1;
    pair sw2;
    pair* data;
    char desc[];
} command;
Now,
1. Allocate memory for the command, including the array size:
command *cmd = malloc(sizeof(command) + desc_length);
2. Use desc:
cmd->desc[desc_length -1] = '\0';
This hack works only if the member is at the end of the structure. It reduces the structure size, saves a pointer indirection, and can be used when the array length is specific to each structure instance. Note that data is still a pointer here and needs its own allocation.
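A minimal sketch of the same idea, applied here to the data field (the one that was getting corrupted in the question) rather than desc. The trimmed-down struct command_fam and the helper command_fam_new are hypothetical names used only for illustration; they are not part of the original code.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int a;
    int b;
} pair;

/* Hypothetical, trimmed-down command: only the fields needed for the sketch. */
typedef struct {
    int size;      /* number of elements in data[] */
    pair sw1;
    pair sw2;
    pair data[];   /* flexible array member: must be the last member (C99) */
} command_fam;

/* Allocate the header and `count` pairs in one zero-initialized block. */
command_fam *command_fam_new(int count)
{
    assert(count >= 0);
    command_fam *cmd = calloc(1, sizeof *cmd + (size_t)count * sizeof(pair));
    if (cmd == NULL)
        return NULL;
    cmd->size = count;
    return cmd;
}
```

Because the header and the array live in one block, a single free(cmd) releases everything, and cmd->data[i] is valid for 0 <= i < cmd->size.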
You have to allocate desc and data separately.
When you allocate your struct command *cmd, memory is allocated only for the desc and data pointers themselves. The buffers they point to have to be malloc'ed separately.
So allocate your command:
command *cmd = malloc(sizeof(command));
Then allocate memory for data or desc.
Example for desc:
cmd->desc = malloc( sizeof(char )*100);
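The separate-allocation approach could be sketched as follows, with a matching free function so nothing leaks. The struct command_ptr and the helpers command_ptr_new and command_ptr_free are hypothetical names, and the struct is trimmed to the fields that matter here.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    int a;
    int b;
} pair;

/* Hypothetical, trimmed-down version of the question's struct. */
typedef struct {
    char *desc;   /* points to a separately allocated buffer */
    int size;     /* number of elements in data */
    pair *data;   /* points to a separately allocated array */
} command_ptr;

/* Release both buffers, then the struct itself. Safe to call with NULL. */
void command_ptr_free(command_ptr *cmd)
{
    if (cmd == NULL)
        return;
    free(cmd->desc);
    free(cmd->data);
    free(cmd);
}

/* Three allocations: the struct, the desc buffer and the data array. */
command_ptr *command_ptr_new(size_t desc_size, int data_count)
{
    assert(data_count >= 0);
    command_ptr *cmd = calloc(1, sizeof *cmd);
    if (cmd == NULL)
        return NULL;
    /* Allocate at least one element so a zero count does not yield NULL. */
    size_t n = data_count > 0 ? (size_t)data_count : 1;
    cmd->desc = calloc(desc_size + 1, sizeof(char));
    cmd->data = calloc(n, sizeof(pair));
    if (cmd->desc == NULL || cmd->data == NULL) {
        command_ptr_free(cmd);
        return NULL;
    }
    cmd->size = data_count;
    return cmd;
}
```

The cost compared with a single block is three calls to the allocator and three matching frees, but any number of members can vary in length this way.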
This question is really about how to use variable-length types in the Python/C API (PyObject_NewVar, PyObject_VAR_HEAD, PyTypeObject.tp_basicsize and .tp_itemsize), but I can ask this question without bothering with the details of the API. Just assume I need to use an array inside a struct.
I can create a list data structure in one of two ways. (I'll just talk about char lists for now, but it doesn't matter.) The first uses a pointer and requires two allocations. Ignoring #includes and error handling:
struct listptr {
    size_t elems;
    char *data;
};
struct listptr *listptr_new(size_t elems) {
    size_t basicsize = sizeof(struct listptr), itemsize = sizeof(char);
    struct listptr *lp;
    lp = malloc(basicsize);
    lp->elems = elems;
    lp->data = malloc(elems * itemsize);
    return lp;
}
The second way to create a list uses array notation and one allocation. (I know this second implementation works because I've tested it pretty thoroughly.)
struct listarray {
    size_t elems;
    char data[1];
};
struct listarray *listarray_new(size_t elems) {
    size_t basicsize = offsetof(struct listarray, data), itemsize = sizeof(char);
    struct listarray *la;
    la = malloc(basicsize + elems * itemsize);
    la->elems = elems;
    return la;
}
In both cases, you then use the same expression, lp->data[index] (or la->data[index]), to access the array.
My question is why does the second method work? Why do you declare char data[1] instead of any of char data[], char data[0], char *data, or char data? In particular, my intuitive understanding of how structs work is that the correct way to declare data is char data with no pointer or array notation at all. Finally, are my calculations of basicsize and itemsize correct in both implementations? In particular, is this use of offsetof guaranteed to be correct for all machines?
Update
Apparently this is called a struct hack: In C99, you can use a flexible array member:
struct listarray2 {
    size_t elems;
    char data[];
};
with the understanding that you'll malloc enough space for data at runtime. Before C99, the data[1] declaration was common. So my question now is why declare char data[1] or char data[] instead of char *data or char data?
The reason you'd declare char data[1] or char data[] instead of char *data or char data is to keep your structure directly serializable and deserializable. This is important in cases where you'll be writing these sorts of structures to disk or over a network socket, etc.
Take for example your first code snippet that requires two allocations. Your listptr type is not directly serializable. i.e. listptr.elems and the data pointed to by listptr.data are not in a contiguous piece of memory. There is no way to read/write this structure to/from disk with a generic function. You need a custom function that is specific to your struct listptr type to do it. i.e. On serialize you'd have to first write elems to disk, and then write the data pointed to by the data pointer. On deserialization you'd have to read elems, allocate the appropriate space to listptr.data and then read the data from disk.
Using a flexible array member solves this problem because listarray.elems and listarray.data reside in one contiguous block of memory. So to serialize it you can simply write out the total allocated size for the structure and then the structure itself. On deserialization you first read the allocated size, allocate the needed space and then read your listarray struct into that space.
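A rough sketch of that size-then-block scheme, using the listarray2 type from the question. The function names listarray2_write and listarray2_read are hypothetical, and the sketch assumes the data is read back on the same kind of machine that wrote it (same size_t width, byte order and struct layout).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

struct listarray2 {
    size_t elems;
    char data[];   /* C99 flexible array member */
};

/* Write the total size first, then the whole block. Returns 0 on success. */
int listarray2_write(FILE *fp, const struct listarray2 *la)
{
    assert(fp != NULL && la != NULL);
    size_t total = sizeof *la + la->elems;
    if (fwrite(&total, sizeof total, 1, fp) != 1)
        return -1;
    if (fwrite(la, total, 1, fp) != 1)
        return -1;
    return 0;
}

/* Read the size, allocate that much, then read the block into it. */
struct listarray2 *listarray2_read(FILE *fp)
{
    assert(fp != NULL);
    size_t total;
    if (fread(&total, sizeof total, 1, fp) != 1)
        return NULL;
    if (total < sizeof(struct listarray2))
        return NULL;   /* too small to hold even the header */
    struct listarray2 *la = malloc(total);
    if (la == NULL)
        return NULL;
    /* Reject blocks whose element count disagrees with the stored size. */
    if (fread(la, total, 1, fp) != 1 || la->elems != total - sizeof *la) {
        free(la);
        return NULL;
    }
    return la;
}
```

Neither function knows anything about the contents beyond the size prefix, which is what makes the approach generic. The pointer-based listptr could not be handled this way, because the bytes written would contain a pointer value rather than the characters.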
You may wonder why you'd ever really need this, but it can be an invaluable feature. Consider a data stream of heterogeneous types. Provided you define a header that identifies which type you have and its size, and precede each item in the stream with this header, you can serialize and deserialize the data stream generically, elegantly and efficiently.
The only reason I know of for choosing char data[1] over char data[] is if you are defining an API that needs to be portable between C99 and C++ since C++ does not have support for flexible array members.
Also, I wanted to point out that in the char data[1] case you can do the following to get the total needed structure size:
size_t totalsize = offsetof(struct listarray, data[elems]);
You also ask why you wouldn't use char data instead of char data[1] or char data[]. While technically possible to use just plain old char data, it would be (IMHO) morally shunned. The two main issues with this approach are:
1. You wanted an array of chars, but now you can't access the data member directly as an array. You need to point a pointer to the address of data to access it as an array, i.e.
char *as_array = &listarray.data;
2. Your structure definition (and your code's use of the structure) would be totally misleading to anyone reading the code. Why declare a single char when you really meant an array of char?
Given these two things, I don't know why anyone would use char data in favor of char data[1]. It just doesn't benefit anyone given the alternatives.
Currently, I'm facing a problem with my code and my understanding of pointers. Here's the code:
struct command
{
    int type;
    int *input;
    int *output;
    union {
        struct command *command[2];
        char **word;
    } u;
};
To my understanding, the member struct command *command[2] is an array of pointers to arrays of command. So I allocate the array like this:
cur_command->u.command[0] = malloc(sizeof(struct command[2]));
So it gives me a 2D array of command. However, my teacher told me that struct command *command[2] is a pointer to an array of command of size 2, so cur_command->u.command[0] gives the first command element instead of a pointer to a command array of size two. My question is, how can I allocate the memory to get this kind of behavior? Thanks.
First off, I would suggest changing the name of the one variable to, e.g. cmd instead of command to reduce confusion. That is:
....
union{
struct command *cmd[2];
char **word;
}u;
....
Now, as a couple other comments have pointed out, cur_command->u.cmd is an array of two pointers to struct command. cur_command->u.cmd[0] is the first of the two pointers, and cur_command->u.cmd[1] is the second. In order to use either of them, they should be initialized to be pointers to actual struct command objects:
cur_command->u.cmd[0] = malloc(sizeof(struct command));
cur_command->u.cmd[1] = malloc(sizeof(struct command));
Then, you can use either one in the same way you use your cur_command, which is also a pointer to struct command. That is, you can set some of the fields:
cur_command->u.cmd[0]->type = 1;
....
Don't forget to free memory when you're done with it:
free(cur_command->u.cmd[0]);
free(cur_command->u.cmd[1]);
Since the structure is recursive, you may need some recursive code to correctly free all the memory, depending on how deeply you chain these things together...
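Also note that in your posted code, malloc(sizeof(struct command[2])), the sizeof(...) bit isn't doing what you think it is. sizeof(struct command[2]) is the size of an array of two struct command objects, so the call allocates room for two structures and stores the address in cmd[0] alone. It does not create a 2D array, and cmd[1] is left uninitialized.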
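Putting the allocation and the recursive free together might look like the sketch below. The type tags SIMPLE_COMMAND and SEQUENCE_COMMAND and the helpers command_new and command_free are assumptions made for illustration; the original post does not say which values type takes or who owns input, output and word.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical type tags; they tell us which union member is in use. */
enum { SIMPLE_COMMAND, SEQUENCE_COMMAND };

struct command {
    int type;
    int *input;
    int *output;
    union {
        struct command *cmd[2];  /* used when type == SEQUENCE_COMMAND */
        char **word;             /* used when type == SIMPLE_COMMAND */
    } u;
};

/* Allocate one node with all of its pointers set to NULL. */
struct command *command_new(int type)
{
    assert(type == SIMPLE_COMMAND || type == SEQUENCE_COMMAND);
    struct command *c = malloc(sizeof *c);
    if (c == NULL)
        return NULL;
    c->type = type;
    c->input = NULL;
    c->output = NULL;
    if (type == SEQUENCE_COMMAND) {
        c->u.cmd[0] = NULL;
        c->u.cmd[1] = NULL;
    } else {
        c->u.word = NULL;
    }
    return c;
}

/* Free a node and, for sequence nodes, both of its children recursively. */
void command_free(struct command *c)
{
    if (c == NULL)
        return;
    if (c->type == SEQUENCE_COMMAND) {
        command_free(c->u.cmd[0]);
        command_free(c->u.cmd[1]);
    }
    /* input, output and word are assumed to be owned elsewhere here. */
    free(c);
}
```

The type field matters because only one union member is meaningful at a time; following u.cmd on a node that actually stores u.word would read a pointer of the wrong kind.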
I wanted a simple string table that will store a bunch of constants and I thought "Hey! Lua does that, let me use some of their functions!"
This is mainly in the lstring.h/lstring.c files (I am using 5.2)
I will show the code I am curious about first. It's from lobject.h
/*
** Header for string value; string bytes follow the end of this structure
*/
typedef union TString {
    L_Umaxalign dummy; /* ensures maximum alignment for strings */
    struct {
        CommonHeader;
        lu_byte reserved;
        unsigned int hash;
        size_t len; /* number of characters in string */
    } tsv;
} TString;
/* get the actual string (array of bytes) from a TString */
#define getstr(ts) cast(const char *, (ts) + 1)
/* get the actual string (array of bytes) from a Lua value */
#define svalue(o) getstr(rawtsvalue(o))
As you see, the data is stored outside of the structure. To get the byte stream, you take the TString pointer and add 1, which advances it by sizeof(TString) bytes, and you have the char* pointer.
Isn't this bad coding though? It's been DRILLED into me in my C classes to make clearly defined structures. I know I might be stirring a nest here, but do you really lose that much speed/space defining a structure as a header for data rather than defining a pointer value for that data?
The idea is probably that you allocate the header and the data in one big chunk of data instead of two:
TString *str = (TString*)malloc(sizeof(TString) + <length_of_string>);
In addition to having just one call to malloc/free, you also reduce memory fragmentation and improve memory locality.
But to answer your question: yes, these kinds of hacks are usually bad practice and should be done with extreme care. And if you do use them, you'll probably want to hide them under a layer of macros/inline functions.
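The single-chunk idea, hidden behind a macro as suggested, could be sketched like this. The type strhdr, the macro strhdr_chars and the function strhdr_new are hypothetical and much simpler than Lua's actual TString; they only illustrate the layout.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header; the string bytes follow the end of this structure. */
typedef struct {
    size_t len;   /* number of characters, excluding the terminating '\0' */
} strhdr;

/* (h) + 1 is pointer arithmetic: it advances by sizeof(strhdr) bytes. */
#define strhdr_chars(h) ((char *)((h) + 1))

/* Allocate header and characters in one block and copy the string in. */
strhdr *strhdr_new(const char *s)
{
    assert(s != NULL);
    size_t len = strlen(s);
    strhdr *h = malloc(sizeof *h + len + 1);
    if (h == NULL)
        return NULL;
    h->len = len;
    memcpy(strhdr_chars(h), s, len + 1);   /* includes the '\0' */
    return h;
}
```

Callers only ever go through strhdr_chars, so the layout trick stays in one place, and one free(h) releases both the header and the characters.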
As rodrigo says, the idea is to allocate the header and string data as a single chunk of memory. It's worth pointing out that you also see the non-standard hack
struct lenstring {
    unsigned length;
    char data[0];
};
but C99 added flexible array members so it can be done in a standard compliant way as
struct lenstring {
    unsigned length;
    char data[];
};
If Lua's string were done in this way it'd be something like
typedef union TString {
    L_Umaxalign dummy;
    struct {
        CommonHeader;
        lu_byte reserved;
        unsigned int hash;
        size_t len;
        const char data[];
    } tsv;
} TString;
#define getstr(ts) ((ts)->tsv.data)
It relates to the complications arising from the more limited C language. In C++, you would just define a base class called GCObject which contains the garbage collection variables, then TString would be a subclass and by using a virtual destructor, both the TString and its accompanying const char * blocks would be freed properly.
When it comes to writing the same kind of functionality in C, it's a bit more difficult as classes and virtual inheritance do not exist.
What Lua is doing is implementing garbage collection by inserting the header required to manage the garbage collection status of the part of memory following it. Remember that free(void *) does not need to know anything other than the address of the memory block.
#define CommonHeader GCObject *next; lu_byte tt; lu_byte marked
Lua keeps a linked list of these "collectable" blocks of memory, in this case an array of characters, so that it can then free the memory efficiently without knowing the type of object it is pointing to.
If your TString pointed to another block of memory where the character array was, then the garbage collector would have to determine the object's type and delve into its structure to free the string buffer as well.
The pseudo code for this kind of garbage collection would be something like this:
GCHeader *next, *prev = NULL;
GCHeader *current = firstObject;
while (current)
{
    next = current->next;
    if (/* current is ready for deletion */)
    {
        // unlink current from the singly-linked list before freeing it
        if (prev)
            prev->next = next;
        else
            firstObject = next; // current was the head of the list
        free(current);
    }
    else
        prev = current; // remember the last object that was kept
    current = next;
}
NOTE: I've rewritten the original question to make it much clearer.
I have a function called
VcStatus readVcard( FILE *const vcf, Vcard **const cardp )
vcf is an open file I will read, and cardp is a pointer to the start of an array of cards.
A file will have multiple cards in it.
readVcard reads the file a line at a time and calls the function parseVcProp to identify keywords in the line and assign them to the appropriate place in a structure.
Here are the structures
typedef struct {      // property (=contentline)
    VcPname name;     // property name
    // storage for 0-2 parameters (NULL if not present)
    char *partype;    // TYPE=string
    char *parval;     // VALUE=string
    char *value;      // property value string
    void *hook;       // reserved for pointer to parsed data structure
} VcProp;
typedef struct {      // single card
    int nprops;       // no. of properties
    VcProp prop[];    // array of properties
} Vcard;
typedef struct {      // vCard file
    int ncards;       // no. of cards in file
    Vcard **cardp;    // pointer to array of card pointers
} VcFile;
So a file contains multiple cards, a card contains multiple properties, etc.
The thing is, a single card can have any number of properties. It is not known how many until you are done reading them.
Here is what I do not understand.
How must I allocate the memory to use parseVcProp properly?
Each time I call parseVcProp, I obviously want it to store the data in a new structure, so how do I allocate this memory beforehand? Do I just malloc(sizeof(VcProp)*1)?
Vcard *getcards(int n) {
    Vcard *c = malloc(sizeof(Vcard) + sizeof(VcProp) * n);
    c->nprops = n;
    return c;
}
You really need to show us the particular line that's producing the error.
With that said, for a structure like Vcard that contains a flexible array member, you cannot usefully create plain variables of that type, because such a variable has no room for the array. You can only work through pointers to dynamically allocated storage. For instance:
Vcard *vc = malloc(sizeof(Vcard) + n*sizeof(VcProp));
At this point, vc->prop[0] through vc->prop[n-1] are valid array elements (each has type VcProp).
Note that a flexible array member is an array, not a pointer.
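Since the number of properties is not known until the card has been read, one option is to grow the card with realloc as each property arrives. The sketch below assumes a trimmed-down VcProp with a single field, and vcard_append is a hypothetical helper, not part of the original assignment's API.

```c
#include <assert.h>
#include <stdlib.h>

/* Trimmed-down VcProp for the sketch; the real one has more fields. */
typedef struct {
    char *value;
} VcProp;

typedef struct {
    int nprops;      /* no. of properties */
    VcProp prop[];   /* flexible array member */
} Vcard;

/*
 * Append one property, growing the card by one slot.
 * Pass NULL to start a new card. Returns the (possibly moved) card,
 * or NULL on failure, in which case the old card is still valid.
 */
Vcard *vcard_append(Vcard *card, VcProp p)
{
    int n = (card != NULL) ? card->nprops : 0;
    assert(n >= 0);
    Vcard *grown = realloc(card, sizeof(Vcard) + ((size_t)n + 1) * sizeof(VcProp));
    if (grown == NULL)
        return NULL;
    grown->prop[n] = p;
    grown->nprops = n + 1;
    return grown;
}
```

Always assign the result through a temporary, as in `Vcard *tmp = vcard_append(card, p); if (tmp) card = tmp;`, because realloc may move the block. Growing by one slot per call keeps the sketch short; growing capacity geometrically would reduce the number of reallocations.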
Sorry for the confusion everyone.
I figured out my error.
The reason things were going wacky is that propp is an output pointer, not an input pointer.
I was trying to pass Vcard->prop as the argument, when I actually had to create my own and pass its address.
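The output-pointer pattern described above might look like the following sketch. The real parseVcProp signature is not shown in the post, so parse_prop_sketch is a hypothetical stand-in that simply copies the line into the property, and VcProp is trimmed to one field.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Trimmed-down VcProp for the sketch; the real one has more fields. */
typedef struct {
    char *value;
} VcProp;

/*
 * Hypothetical stand-in for parseVcProp. The caller owns the VcProp
 * and passes its address; the function fills it in.
 * Returns 0 on success, -1 on allocation failure.
 */
int parse_prop_sketch(const char *line, VcProp *const propp)
{
    assert(line != NULL && propp != NULL);
    size_t len = strlen(line);
    propp->value = malloc(len + 1);
    if (propp->value == NULL)
        return -1;
    memcpy(propp->value, line, len + 1);   /* copy including the '\0' */
    return 0;
}
```

The caller declares a local `VcProp p;`, calls `parse_prop_sketch(line, &p)`, and only then copies p into the card's prop array, so the function never needs to know how the card is laid out.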