How to use zero length array in C

We can initialize a struct with zero length array as specified in the link:
Zero-Length.
I'm using the following structures:
typedef unsigned char UINT8;
typedef unsigned short UINT16;
typedef struct _CommandHeader
{
UINT16 len;
UINT8 payload[0];
} CommandHeader;
typedef struct _CmdXHeader
{
UINT8 len;
UINT8 payload[0];
} CmdXHeader;
Now CommandHeader.payload should contain a CmdXHeader struct, i.e. memory should look like:
-------------------------------------------------------------
| CommandHeader.len | CmdXHeader.len | CmdXHeader.payload ....|
-------------------------------------------------------------
I can easily malloc CmdXHeader / CommandHeader to a customized length. But how do I assign values to the CmdXHeader payload, or link a CmdXHeader object to CommandHeader.payload?
My Solution
Thanks for all the replies. I solved it in the following way:
// Get the buffer for CmdXHeader:
size_t cmdXHeader_len = sizeof(CmdXHeader) + custom_len;
CmdXHeader* cmdXHeader = (CmdXHeader*) malloc(cmdXHeader_len);
// Get a temporary buffer and assign the data to it
UINT8* p = (UINT8*) malloc(custom_len);
p[0] = 1;
p[1] = 2;
.......
// Now copy the memory of p to cmdXHeader
memcpy(cmdXHeader->payload, p, custom_len);
// allocate the buffer for CommandHeader
CommandHeader* commandHeader = (CommandHeader*) malloc (sizeof (CommandHeader) + cmdXHeader_len);
// populate the fields in commandHeader
commandHeader->len = custom_len;
memcpy(commandHeader->payload, cmdXHeader, cmdXHeader_len);
Now the commandHeader object has the desired memory layout and we can cast it whichever way we want...

A zero-length array at the end of a struct, or anywhere else, is actually illegal (more precisely a constraint violation) in standard C. It's a gcc-specific extension.
It's one of several forms of the "struct hack". A slightly more portable way to do it is to define an array of length 1 rather than 0.
Dennis Ritchie, creator of the C language, has called it "unwarranted chumminess with the C implementation".
The 1999 revision of the ISO C Standard introduced a feature called the "flexible array member", a more robust way to do this. Most modern C compilers support this feature (I suspect Microsoft's compiler doesn't, though).
This is discussed at length in question 2.6 of the comp.lang.c FAQ.
As for how you access it, whichever form you use, you can treat it like you'd treat any array. The name of the member decays to a pointer in most contexts, allowing you to index into it. As long as you've allocated enough memory, you can do things like:
CommandHeader *ch;
ch = malloc(computed_size);
if (ch == NULL) { /* allocation failed, bail out */ }
ch->len = 42;
ch->payload[0] = 10;
ch->payload[1] = 20;
/* ... */
Obviously this is only a rough outline.
Note that sizeof, when applied to the type CommandHeader or an object of that type, will give you a result that does not include the flexible array member.
Note also that identifiers starting with underscores are reserved to the implementation. You should never define such identifiers in your own code. There's no need to use distinct identifiers for the typedef name and the struct tag:
typedef struct CommandHeader
{
UINT16 len;
UINT8 payload[0];
} CommandHeader;
I'd also suggest using the standard types uint16_t and uint8_t, defined in <stdint.h> (assuming your compiler supports it; it's also new in C99).
(Actually the rules for identifiers starting with underscores are slightly more complex. Quoting N1570, the latest draft of the standard, section 7.1.3:
All identifiers that begin with an underscore and either an uppercase letter or another
underscore are always reserved for any use.
All identifiers that begin with an underscore are always reserved for use as identifiers
with file scope in both the ordinary and tag name spaces.
And there are several more classes of reserved identifiers.
But rather than working out which identifiers are safe to use at file scope and which are safe to use in other scopes, it's much easier just to avoid defining any identifiers that start with an underscore.)

I assume you've got some bytes in memory and you want to find the pointer to payload?
typedef struct _CmdXHeader
{
UINT8 len;
UINT8* payload;
} CmdXhHeader;
typedef struct _CommandHeader
{
UINT16 len;
CmdXhHeader xhead;
} CommandHeader;
You could then cast your memory to a pointer to CommandHeader
uint8_t* my_binary_data = { /* assume you've got some data */ };
CommandHeader* cmdheader = (CommandHeader*) my_binary_data;
// access the data
cmdheader->xhead.payload[0];
IMPORTANT! Unless you pack your struct, it will probably align on word boundaries and not be portable. See your compiler docs for specific syntax on how to pack the struct.
Also, I'd only do what you've shown if you are consuming bytes (i.e. reading from a file or from the wire). If you are the creator of the data, then I would heartily recommend against what you've shown.

struct _CommandHeader *commandHeader = malloc(sizeof(struct _CommandHeader)+
sizeof(struct _CmdXHeader));

In C99 it is better to use payload[] instead of payload[0].
Some C99 compilers discourage the use of zero-length arrays.
So, in case you get an error here:
typedef struct CommandHeader
{
UINT16 len;
UINT8 payload[0];
} CommandHeader;
you can always correct it as :
typedef struct CommandHeader
{
UINT16 len;
UINT8 payload[];
} CommandHeader;

Related

May I assume that struct fields are placed in order and with no padding?

Frankly, is such a code valid or does it invoke undefined behavior?
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
struct two_values
{
int some;
char value;
};
int main(void) {
int some = 5;
char value = 'a';
unsigned char *data = malloc(sizeof(struct two_values));
memcpy(data, &some, sizeof(int));
memcpy(data+sizeof(int), &value, sizeof(char));
struct two_values dest;
memcpy(&dest, data, sizeof(struct two_values));
printf("some = %d, value = %c\n", dest.some, dest.value);
return 0;
}
http://ideone.com/4JbrP9
Can I just put the binary representation of two struct field together and reinterpret this as the whole struct?

You had better not depend on how the compiler lays things out internally, as that leads to incorrect code and undefined behaviour. You can switch compilers, or just update your favourite one to a new version, and run into trouble.
The best way to store the two variables in the struct fields is to use the types C provides, through a pointer of the proper type. If you use
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
struct two_values
{
int some;
char value;
};
int main(void) {
int some = 5;
char value = 'a';
/* next instead of unsigned char *data = malloc(sizeof(struct two_values)); */
struct two_values *data = malloc(sizeof(struct two_values));
/* next instead of memcpy(data, &some, sizeof(int)); */
data->some = some;
/* next instead of memcpy(data+sizeof(int), &value, sizeof(char)); */
data->value = value;
struct two_values dest;
/* next instead of memcpy(&dest, data, sizeof(struct two_values)); */
dest = *data;
printf("some = %d, value = %c\n", dest.some, dest.value);
return 0;
}
You'll avoid all compiler alignment issues. It is always possible to do it with the language operators & (address of), * (dereference) and -> (member of the struct pointed to).
Anyway, if you prefer the memcpy approach (I don't see why, but it's your call) you can substitute:
data->some = some;
...
data->value = value;
...
dest = *data;
by
memcpy(&data->some, &some, sizeof data->some);
...
memcpy(&data->value, &value, sizeof data->value);
...
memcpy(&dest, data, sizeof dest);
And that will respect whatever alignment the compiler has chosen by itself.
Most compilers define some pragma or keyword to control alignment and packing. This is also nonportable, as you can switch compilers and then have to change the way you expressed things. C11 added standard alignment control (_Alignas and _Alignof), but it has no standard way to declare a packed struct. Packing is done mainly when you have to serialize some structure and don't want to deal with holes in it.
Serializing structs is not completely solved by just making them packed, as normally you have to deal with the serialized representations of integer, floating point or char data (which may or may not coincide with the internal representation used by the compiler), so you again face the problem of being compiler agnostic and have to think twice before using the internal representation of data externally.
My recommendation anyway is never to trust how the compiler stores data internally.

The padding is determined by the compiler. The order is guaranteed. If you need something similar to your code above, I would recommend the offsetof-macro in <stddef.h>.
memcpy(data + offsetof(struct two_values, value), &value, sizeof(char));
Or, if data is declared as a struct two_values * rather than an unsigned char *, without explicitly adding the offset at all:
memcpy(&data->value, &value, sizeof(char));

It depends on how your structure is aligned. You can check by examining sizeof(struct two_values): if it comes out as 5 (assuming sizeof(int) is 4), you are probably OK.
If it's more than that, it implies filler bytes have been inserted in your structure to align each element at the correct byte boundary.

May I assume that struct fields are placed in order
Yes, this is guaranteed by the standard. C11 6.2.5/20:
a structure is a type consisting of a sequence of members, whose
storage is allocated in an ordered sequence
and with no padding?
No, you cannot assume this. C11 6.7.2.1/15:
Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. /--/
There may be unnamed padding within a structure object, but not at its beginning.
Padding and alignment are implementation-defined behavior.
You are however guaranteed that two structs of the same type have the same padding. Copying from a struct to another struct of same type, as in your example, is safe and well-defined.

Setting the size of an array inside a struct with a value of another value within the same struct, in C

struct {
uint16 msg_length;
uint8 msg_type;
ProtocolVersion version;
uint16 cipher_spec_length;
uint16 session_id_length;
uint16 challenge_length;
V2CipherSpec cipher_specs[V2ClientHello.cipher_spec_length];
opaque session_id[V2ClientHello.session_id_length];
opaque challenge[V2ClientHello.challenge_length];
} V2ClientHello;
Is it possible to do something similar to the above (https://www.rfc-editor.org/rfc/rfc5246)? If so how do I go about coding this inside C?
To be more specific this line in the struct:
V2CipherSpec cipher_specs[V2ClientHello.cipher_spec_length];
Uses:
> V2ClientHello.cipher_spec_length
Which is defined in the same struct, to set the length of the array.

C does not support dynamic-sized arrays. To achieve your goal, you can use a pointer of type V2CipherSpec as structure variable and allocate memory at a later stage using the V2ClientHello.cipher_spec_length value.

Absolutely not. C does not have dynamic-size arrays. Instead, we can rely on tricks like this:
typedef struct {
uint16 msg_length;
uint8 msg_type;
ProtocolVersion version;
uint16 cipher_spec_length;
uint16 session_id_length;
uint16 challenge_length;
char extra[0]; // or 1 if your compiler hates this
} V2ClientHello;
Then, do not create instances of this struct directly, but rather via malloc():
V2ClientHello* hello = malloc(sizeof(V2ClientHello) +
len1*sizeof(V2CipherSpec) + len2 + len3);
Now you have a dynamically-allocated structure of the size you need. You can make accessor functions to get the "extra" fields:
V2CipherSpec* get_spec(V2ClientHello* hello, int idx) {
assert(idx < hello->cipher_spec_length);
return &((V2CipherSpec*)hello->extra)[idx]; // address of the idx-th spec
}
And of course you can wrap up the malloc() call inside a create routine which takes the sizes of all three dynamic parts and does everything in one place for robustness.

The shown code from the RFC is pseudo code, you can not implement it as shown.
You need to allocate these arrays manually depending on V2ClientHello.cipher_spec_length and the other length specification fields, once their values are known.

The value of V2ClientHello.cipher_spec_length is not available at compile time, and an array member of a struct cannot take its size at run time. Instead use:
V2CipherSpec *cipher_specs;
in struct and use malloc or calloc to allocate a block of memory at run time.
V2ClientHello.cipher_specs = (V2CipherSpec *)malloc(V2ClientHello.cipher_spec_length);

how to use flexible array in C to keep several values?

I have the following code:
typedef struct
{
int name;
int info[1];
} Data;
then I have five variables:
int a, b, c, d, e;
how can I use this as a flexible array to keep all the values of the five variables?

To do this properly, you should declare the flexible array member as an incomplete type:
typedef struct
{
int name;
int info[];
} Data;
Then allocate memory for it dynamically with
Data* data = malloc(sizeof(Data) + sizeof(int[N]));
for(int i=0; i<N; i++)
{
data->info[i] = something; // now use it just as any other array
}
EDIT
Ensure that you are using a C99 compiler for this to work, otherwise you will encounter various problems:
If you allocate an array of length 1, then you will malloc 1 item for the first element of the array together with the struct, and then append N items after that. Meaning you are actually allocating N+1 items. This is perhaps not what one intended to do, and it makes things needlessly complicated.
(To solve the above problem, GCC had a pre-C99 extension that allowed zero-length arrays, which isn't allowed in standard C.)
Pre-C99, or in any other context than as a flexible array member, C doesn't allow incomplete array types as the one shown in my code.
C99 guarantees that your program is well-defined when using a flexible array member. If you don't use C99, then the compiler might append "struct padding" bytes between the other struct members and the array at the end. Meaning that data->info[0] could point at a struct padding byte and not at the first item in your allocated array. This can cause all kinds of weird, unexpected behavior.
This is why flexible array members were called "struct hack" before C99. They weren't reliable, just a dirty hack which may or may not work.

That kind of structure is a somewhat common idiom in C; the idea is that you allocate extra space at the end of the struct, where the elements of info after the first are actually stored. The size-1 array member at the end of the struct then allows you to use array syntax to access this data.
If you want to store 5 elements you'll have to do:
Data * data=malloc(sizeof(Data)+sizeof(int)*4); /* 4 because the first element is
already included in the size of
the struct */
/* error checking omitted ... */
data->info[0]=a;
data->info[1]=b;
data->info[2]=c;
data->info[3]=d;
data->info[4]=e;
/* ... */
/* when you don't need data anymore remember to deallocate */
free(data);
You may also write a helper function to ease the allocation:
Data * AllocateData(size_t elements)
{
if(elements==0)
return NULL;
return malloc(sizeof(Data)+sizeof(int)*(elements-1));
}
and the example above would be
Data * data=AllocateData(5);
/* then as above */

These are called flexible array members and were introduced in C99. The older size-1 form is often called the struct hack.
In C99, the flexible array member should be declared without a size.
You need to dynamically allocate memory that can hold more memory than the size of the struct.
As the array is the last member in the struct, you can index it past its size, provided you allocated enough memory for it.
typedef struct
{
int name;
int info[1];
} Data;
Data *d = malloc(sizeof(*d) + (5 * sizeof(int))); //enough for the struct and 5 more ints.
//we have enough room for 6 elements in the info array now
//since the struct has room for 1 element, and we allocated room for another 5 ints
d->info[0] = 1;
d->info[1] = 2;
d->info[2] = 3;
d->info[3] = 4;
d->info[4] = 5;
d->info[5] = 6;
Using an array member of size 1, int info[1];, in this manner is technically undefined behavior, but it will work fine on many popular compilers. With a C99 compiler this is supported by a flexible array member declared as int info[];.

C struct size alignment

I want the size of a C struct to be multiple of 16 bytes (16B/32B/48B/..).
It does not matter which size it gets to; it only needs to be multiple of 16 bytes.
How could I enforce the compiler to do that?

For Microsoft Visual C++ (#pragma pack only caps member alignment and never rounds the size up, so use __declspec(align) instead):
__declspec(align(16)) struct _some_struct
{
...
};
For GCC:
struct _some_struct { ... } __attribute__ ((aligned (16)));
Example:
#include <stdio.h>
struct test_t {
int x;
int y;
} __attribute__((aligned(16)));
int main()
{
printf("%zu\n", sizeof(struct test_t));
return 0;
}
compiled with gcc -o main main.c will output 16. Clang accepts the same attribute syntax.

The size of a C struct depends on its members: their types and how many of them there are. There is really no standard way to force the compiler to make a struct's size a multiple of some value. Some compilers provide a pragma that lets you set the alignment boundary, but that is really a different thing. Some may also have a setting or pragma for the size itself.
However, if you insist on this, one method would be to allocate the struct dynamically and round the allocation size up to the next multiple of 16 bytes.
So if you had a struct like this.
struct _simpleStruct {
int iValueA;
int iValueB;
};
Then you could do something like the following.
{
struct _simpleStruct *pStruct = 0;
pStruct = malloc ((sizeof(*pStruct) + 15) / 16 * 16);
// use the pStruct for whatever
free(pStruct);
}
What this would do is to push the size up to the next 16 byte size so far as you were concerned. However what the memory allocator does may or may not be to give you a block that is actually that size. The block of memory may actually be larger than your request.
If you are going to do something special with this, for instance lets say that you are going to write this struct to a file and you want to know the block size then you would have to do the same calculation used in the malloc() rather than using the sizeof() operator to calculate the size of the struct.
So the next thing would be to write your own sizeof() operator using a macro such as.
#define SIZEOF16(x) ((sizeof(x) + 15) / 16 * 16)
As far as I know there is no dependable method for pulling the size of an allocated block from a pointer. Normally a pointer will have a memory allocation block that is used by the memory heap management functions that will contain various memory management information such as the allocated block size which may actually be larger than the requested amount of memory. However the format for this block and where it is located relative to the actual memory address provided will depend on the C compiler's run time.

This depends entirely on the compiler and other tools since alignment is not specified that deeply in the ISO C standard (it specifies that alignment may happen at the compiler's behest but does not go into detail as to how to enforce it).
You'll need to look into the implementation-specific stuff for your compiler toolchain. It may provide a #pragma pack (or align or some other thing) that you can add to your structure definition.
It may also provide this as a language extension. For example, gcc allows you to add attributes to a definition, one of which controls alignment:
struct mystruct { int val[7]; } __attribute__ ((aligned (16)));

You could perhaps do a double struct, wrapping your actual struct in a second one that can add padding:
struct payload {
int a; /*Your actual fields. */
float b;
char c;
double d;
};
struct payload_padded {
struct payload p;
char padding[16 - sizeof (struct payload) % 16]; /* brings the total size up to a multiple of 16 */
};
Then you can work with the padded struct:
struct payload_padded a;
a.p.d = 43.3;
Of course, you can make use of the fact that the first member of a structure starts 0 bytes from where the structure starts, and treat a pointer to struct payload_padded as if it's a pointer to a struct payload (because it is):
double d_plus_2(const struct payload *p)
{
return p->d + 2;
}
/* ... */
struct payload_padded b;
const double dp2 = d_plus_2((struct payload *) &b);

Why does internal Lua strings store the way they do?

I wanted a simple string table that will store a bunch of constants and I thought "Hey! Lua does that, let me use some of their functions!"
This is mainly in the lstring.h/lstring.c files (I am using 5.2).
I will show the code I am curious about first. It's from lobject.h
/*
** Header for string value; string bytes follow the end of this structure
*/
typedef union TString {
L_Umaxalign dummy; /* ensures maximum alignment for strings */
struct {
CommonHeader;
lu_byte reserved;
unsigned int hash;
size_t len; /* number of characters in string */
} tsv;
} TString;
/* get the actual string (array of bytes) from a TString */
#define getstr(ts) cast(const char *, (ts) + 1)
/* get the actual string (array of bytes) from a Lua value */
#define svalue(o) getstr(rawtsvalue(o))
As you see, the data is stored outside of the structure. To get the byte stream, you take the TString pointer and add 1 (pointer arithmetic, so it skips sizeof(TString) bytes), and you've got the char* pointer.
Isn't this bad coding though? It's been DRILLED into me in my C classes to make clearly defined structures. I know I might be stirring up a hornet's nest here, but do you really lose that much speed/space defining a structure as a header for the data rather than defining a pointer value for that data?

The idea is probably that you allocate the header and the data in one big chunk of data instead of two:
TString *str = (TString*)malloc(sizeof(TString) + <length_of_string>);
In addition to having just one call to malloc/free, you also reduce memory fragmentation and improve memory locality.
But answering your question, yes, these kinds of hacks are usually bad practice, and should be done with extreme care. And if you do, you'll probably want to hide them under a layer of macros/inline functions.

As rodrigo says, the idea is to allocate the header and string data as a single chunk of memory. It's worth pointing out that you also see the non-standard hack
struct lenstring {
unsigned length;
char data[0];
};
but C99 added flexible array members so it can be done in a standard compliant way as
struct lenstring {
unsigned length;
char data[];
};
If Lua's string were done in this way it'd be something like
typedef union TString {
L_Umaxalign dummy;
struct {
CommonHeader;
lu_byte reserved;
unsigned int hash;
size_t len;
const char data[];
} tsv;
} TString;
#define getstr(ts) ((ts)->tsv.data)

It relates to the complications arising from the more limited C language. In C++, you would just define a base class called GCObject which contains the garbage collection variables, then TString would be a subclass and by using a virtual destructor, both the TString and its accompanying const char * blocks would be freed properly.
When it comes to writing the same kind of functionality in C, it's a bit more difficult as classes and virtual inheritance do not exist.
What Lua is doing is implementing garbage collection by inserting the header required to manage the garbage collection status of the part of memory following it. Remember that free(void *) does not need to know anything other than the address of the memory block.
#define CommonHeader GCObject *next; lu_byte tt; lu_byte marked
Lua keeps a linked list of these "collectable" blocks of memory, in this case an array of characters, so that it can then free the memory efficiently without knowing the type of object it is pointing to.
If your TString pointed to another block of memory where the character array was, then the garbage collector would have to determine the object's type and delve into its structure to free the string buffer as well.
The pseudo code for this kind of garbage collection would be something like this:
GCHeader *next, *prev = NULL;
GCHeader *current = firstObject;
while(current)
{
next = current->next;
if (/* current is ready for deletion */)
{
free(current);
// relink previous to the next (singly-linked list)
if (prev)
prev->next = next;
}
else
prev = current; // store previous undeleted object
current = next;
}