How's flexible array implemented in c?

..
char arKey[1]; } Bucket;
The above is said to be a flexible array. How?

Often the last member of a struct is given a size of 0 or 1 (a size of 0 is not valid in standard C, but some compilers allow it as an extension because it has great value as a marker). As one would not normally create an array of size 0 or 1, this indicates to fellow coders that the field is used as the start of a variably sized array, extending from the final member into any available memory.
You may also find a member of the struct defining the exact length of the flexible array, just as you often find a member that contains the total size in bytes of the struct.
Links
http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
Is using flexible array members in C bad practice?
http://msdn.microsoft.com/en-us/library/6zxfydcs(VS.71).aspx
http://blogs.msdn.com/b/oldnewthing/archive/2004/08/26/220873.aspx
Example
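#include <stdlib.h>
#include <string.h>
typedef struct {
    size_t len;
    char arr[]; /* flexible array member, must be last */
} MyString;
size_t mystring_len(MyString const *ms) { return ms->len; }
MyString *mystring_new(char const *init)
{
    size_t len = strlen(init);
    MyString *rv = malloc(sizeof(MyString) + len + 1);
    if (rv == NULL)
        return NULL;
    rv->len = len;
    memcpy(rv->arr, init, len + 1); /* copies the terminating '\0' as well */
    return rv;
}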

In C99, a flexible array member is declared with empty brackets (char arKey[];). A size of 0 is a compiler extension (GCC's zero-length arrays), and a size of 1 is the pre-C99 idiom, which is now discouraged.
Basically, such flexible arrays are created by invoking malloc with sizeof(Bucket) + array_length, where array_length is the desired size of your array. Then, accessing elements of arKey (which must be the last member of your structure) will result in that extra memory being accessed, effectively implementing variable-sized objects.
See this page for more information:
http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
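The allocation pattern described in this answer can be sketched roughly as follows. The real Bucket definition is elided in the question, so the struct below is a hypothetical stand-in: the key_len field and the bucket_new helper are assumptions made for illustration, and the C99 spelling arKey[] is used instead of arKey[1].

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the elided Bucket struct: only the trailing
   array matters here. C99 spelling; pre-C99 code wrote arKey[1]. */
typedef struct {
    size_t key_len;   /* assumed field: number of bytes stored in arKey */
    char   arKey[];   /* flexible array member, must be last */
} Bucket;

/* Allocate one block holding the header plus key_len + 1 bytes of key.
   Returns NULL if the allocation fails. */
Bucket *bucket_new(const char *key)
{
    size_t key_len = strlen(key);
    Bucket *b = malloc(sizeof *b + key_len + 1);
    if (b == NULL)
        return NULL;
    b->key_len = key_len;
    memcpy(b->arKey, key, key_len + 1);  /* copies the terminating '\0' too */
    return b;
}
```

A single free(b) releases both the header and the key, which is the main convenience of keeping them in one allocation.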

Related

Is there a way to create a dynamic type assignment in C

I am working to create a set of functions in C that will allow a dynamically growing array. In this example I have created a struct with a variable titled len that stores the active length of the array, another variable titled size that stores the total length of the array assigned during initialization, and another variable titled array which is a pointer to the memory containing the array data. In this example the variable array is declared in the struct as a pointer to int. Within the function titled int_array I initialize the array and return the struct. Within that function I call the init_int_array function that does the heavy lifting. In addition, I have another function titled append_int_array that checks the memory allocation, assigns another chunk of memory if necessary, and then appends a new element to the array.
As you can see, this example is hard coded for an integer, and I will need to repeat these lines of code for every other data type if I want an array to contain that type of data. There has got to be a way to instantiate the struct so that the variable array can be a different data type, so that I do not have to repeat all lines of code for every data type, but I am not sure what that method is. Any help would be appreciated. The code is shown below.
NOTE: I also have a function to free the array memory after use, but I am omitting it since it is not relevant to the question.
array.h
#ifndef ARRAY_H
#define ARRAY_H
#include<stdlib.h>
#include<stdio.h>
typedef struct
{
int *array;
size_t len;
size_t size;
}Array;
void init_int_array(Array *array, size_t num_indices);
Array int_array(size_t num_indices);
void append_int_array(Array *array, int item);
#endif /* ARRAY_H */
Array.c
void init_int_array(Array *array, size_t num_indices) {
/* This function initializes the array with a guess for
the total array size (i.e. num_indices)
*/
int *int_pointer;
int_pointer = (int *)malloc(num_indices * sizeof(int));
if (int_pointer == NULL) {
printf("Unable to allocate memory, exiting.\n");
exit(EXIT_FAILURE);
}
else {
array->array = int_pointer;
array->len = 0;
array->size = num_indices;
}
}
Array int_array(size_t num_indices) {
/* This function calls init_int_array to initialize
the array and returns a struct containing the array
*/
Array array;
init_int_array(&array, num_indices);
return array;
}
void append_int_array(Array *array, int item) {
/* This function adds a data point/index to the array
and also doubles the memory allocation if necessary
to incorporate the new data point.
*/
array->len++;
if (array->len == array->size){
array->size *= 2;
int *int_pointer;
int_pointer = (int *)realloc(array->array, array->size * sizeof(int));
if (int_pointer == NULL) {
printf("Unable to reallocate memory, exiting.\n");
exit(EXIT_FAILURE);
}
else {
array->array = int_pointer;
array->array[array->len - 1] = item;
}
}
else
array->array[array->len - 1] = item;
}

A simple solution is to rewrite your header like this:
typedef struct
{
void *array; // buffer
size_t len; // amount used
size_t elem; // size of element
size_t size; // size of buffer
} Array;
void init_array(Array *, size_t num_indices, size_t elem);
Array array(size_t num_indices, size_t elem);
void append_array(Array *array, void *item);
The changes to your code would be as follows:
Remove references to int in the name.
Make all inputs be to arbitrary type using void *.
Use array.elem instead of sizeof(int).
The biggest change is that elements to append will be passed by pointer, not by value.
Cast the buffer to whatever type you need to access elements.
Cast the buffer to char * internally to do pointer math on it.
Here is a sample calling sequence you could use:
Array buf = array(10, sizeof(int));
for(int i = 0; i < 3; i++) {
append_array(&buf, &i); // Remember that buf knows sizeof(int)
}
printf("Second element (of %zu) is %d\n", buf.len, ((int *)buf.array)[1]);
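The header above leaves the function bodies to the reader. One possible way to fill them in is sketched below. It departs from the answer's prototypes in two places, both of which are assumptions of this sketch: the functions return int so that allocation failure can be reported instead of calling exit, and append_array takes a const void *.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    void  *array;  /* buffer */
    size_t len;    /* elements in use */
    size_t elem;   /* size of one element in bytes */
    size_t size;   /* capacity of the buffer, in elements */
} Array;

/* Returns 0 on success, -1 if the allocation fails. */
int init_array(Array *a, size_t num_indices, size_t elem)
{
    if (num_indices == 0)
        num_indices = 1;             /* so that doubling can make progress */
    a->array = malloc(num_indices * elem);
    if (a->array == NULL)
        return -1;
    a->len = 0;
    a->elem = elem;
    a->size = num_indices;
    return 0;
}

/* Copies a->elem bytes from item into the next free slot,
   doubling the buffer first when it is full. */
int append_array(Array *a, const void *item)
{
    if (a->len == a->size) {
        void *grown = realloc(a->array, 2 * a->size * a->elem);
        if (grown == NULL)
            return -1;               /* the old buffer is still valid */
        a->array = grown;
        a->size *= 2;
    }
    /* char * arithmetic: void * cannot be offset portably */
    memcpy((char *)a->array + a->len * a->elem, item, a->elem);
    a->len++;
    return 0;
}
```

Because the element is copied with memcpy, the library never needs to know the element type; only the caller casts the buffer back, as in ((int *)buf.array)[1].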

C is a strongly- and statically-typed language without polymorphism, so in fact no, there is no language-supported form of dynamic typing. Every object you declare, every function parameter, every struct and union member, every array element has a specific type declared in your source code.
Some of the things you can do:
use a typedef or a preprocessor macro to provide indirection of the data type in question. That would allow you to have (lexically) one structure type and one set of support functions that provide for your dynamically-adjustable array to have any one element type of the user's choice, per program.
use preprocessor macros to template the structure type and support functions so that users can get separate versions for any and all element types they want. This might be usefully combined with _Generic selection.
define and use a union type for use as the array's element type, allowing use of any of the union's members' types. With a little more work, this can be made a tagged union, so that objects of different types in the same array could be supported. The cost, however, is wasted space and worse memory efficiency when you use members having smaller types.
use void * or maybe uintmax_t or unsigned char[some_largish_number] as the element type, and implement conversions to and from that type. This has some of the disadvantages of the union alternative, plus some complications surrounding the needed conversions. Also, there is no type that can be guaranteed large enough to accommodate all other data types. Nor even all built-in data types, though this is a more realistic goal.
use void as the formal element type (possible only with dynamic allocation and pointers, not with an array-style declaration). Add a separate member that records the actual size of the elements. Implement wrappers / conversions that support use of that underlying structure in conjunction with various complete data types. This is described in more detail in another answer.
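As an illustration of the macro-template option in the list above, here is a minimal sketch. The macro name DEFINE_ARRAY and the generated names (IntArray_init, IntArray_append and so on) are hypothetical; each expansion produces an independent, fully typed struct and pair of functions.

```c
#include <stdlib.h>

/* Hypothetical macro: DEFINE_ARRAY(int, IntArray) generates a struct
   IntArray plus IntArray_init and IntArray_append for that element type.
   Both functions return 0 on success and -1 on allocation failure. */
#define DEFINE_ARRAY(T, Name)                                        \
    typedef struct { T *data; size_t len; size_t cap; } Name;        \
    static inline int Name##_init(Name *a, size_t cap)               \
    {                                                                \
        if (cap == 0) cap = 1;                                       \
        a->data = malloc(cap * sizeof(T));                           \
        if (a->data == NULL) return -1;                              \
        a->len = 0;                                                  \
        a->cap = cap;                                                \
        return 0;                                                    \
    }                                                                \
    static inline int Name##_append(Name *a, T item)                 \
    {                                                                \
        if (a->len == a->cap) {                                      \
            T *grown = realloc(a->data, 2 * a->cap * sizeof(T));     \
            if (grown == NULL) return -1;                            \
            a->data = grown;                                         \
            a->cap *= 2;                                             \
        }                                                            \
        a->data[a->len++] = item;                                    \
        return 0;                                                    \
    }

/* One expansion per element type the program needs. */
DEFINE_ARRAY(int, IntArray)
DEFINE_ARRAY(double, DoubleArray)
```

Compared with the void * approach, items are passed by value and accesses need no casts, at the cost of one copy of the code per element type.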

Multiple structures in a single malloc invoking undefined behaviour?

The page "Use the correct syntax when declaring a flexible array member" says that when malloc is used to allocate a header followed by flexible data, with data[1] hacked into the struct,
This example has undefined behavior when accessing any element other
than the first element of the data array. (See the C Standard, 6.5.6.)
Consequently, the compiler can generate code that does not return the
expected value when accessing the second element of data.
I looked up the C Standard 6.5.6, and could not see how this would produce undefined behaviour. I've used a pattern that I'm comfortable with, where the header is implicitly followed by data, using the same sort of malloc,
#include <stdlib.h> /* EXIT malloc free */
#include <stdio.h> /* printf */
#include <string.h> /* strlen memcpy */
struct Array {
size_t length;
char *array;
}; /* +(length + 1) char */
static struct Array *Array(const char *const str) {
struct Array *a;
size_t length;
length = strlen(str);
if(!(a = malloc(sizeof *a + length + 1))) return 0;
a->length = length;
a->array = (char *)(a + 1); /* UB? */
memcpy(a->array, str, length + 1);
return a;
}
/* Take a char off the end just so that it's useful. */
static void Array_to_string(const struct Array *const a, char (*const s)[12]) {
const int n = a->length ? a->length > 9 ? 9 : (int)a->length - 1 : 0;
sprintf(*s, "<%.*s>", n, a->array);
}
int main(void) {
struct Array *a = 0, *b = 0;
int is_done = 0;
do { /* Try. */
char s[12], t[12];
if(!(a = Array("Foo!")) || !(b = Array("To be or not to be."))) break;
Array_to_string(a, &s);
Array_to_string(b, &t);
printf("%s %s\n", s, t);
is_done = 1;
} while(0); if(!is_done) {
perror(":(");
} {
free(a);
free(b);
}
return is_done ? EXIT_SUCCESS : EXIT_FAILURE;
}
Prints,
<Foo> <To be or >
The compliant solution uses C99 flexible array members. The page also says,
Failing to use the correct syntax when declaring a flexible array
member can result in undefined behavior, although the incorrect syntax
will work on most implementations.
Technically, does this C90 code produce undefined behaviour, too? And if not, what is the difference? (Or the Carnegie Mellon Wiki is incorrect?) What is the factor on the implementations this will not work on?

This should be well defined:
a->array = (char *)(a + 1);
Because you create a pointer to one element past the end of an array of size 1 but do not dereference it. And because a->array now points to bytes that do not yet have an effective type, you can use them safely.
This only works however because you're using the bytes that follow as an array of char. If you instead tried to create an array of some other type whose size is greater than 1, you could have alignment issues.
For example, if you compiled a program for ARM with 32 bit pointers and you had this:
struct Array {
int size;
uint64_t *a;
};
...
struct Array *a = malloc(sizeof *a + (length * sizeof(uint64_t)));
a->size = length;
a->a= (uint64_t *)(a + 1); // misaligned pointer
a->a[0] = 0x1111222233334444ULL; // misaligned write
Your program would crash due to a misaligned write. So in general you shouldn't depend on this. Best to stick with a flexible array member which the standard guarantees will work.
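For comparison, a minimal sketch of the flexible array member version of that example. The names U64Array and u64array_new are made up for illustration. The compiler inserts whatever padding is needed after the header, so the array is suitably aligned for its element type without any pointer arithmetic.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct U64Array {
    int      length;
    uint64_t a[];     /* the compiler pads the header so this is aligned */
};

/* Allocates header and elements in one block; elements are zeroed.
   Returns NULL on a negative length or allocation failure. */
struct U64Array *u64array_new(int length)
{
    if (length < 0)
        return NULL;
    struct U64Array *arr =
        malloc(sizeof *arr + (size_t)length * sizeof arr->a[0]);
    if (arr == NULL)
        return NULL;
    arr->length = length;
    for (int i = 0; i < length; i++)
        arr->a[i] = 0;
    return arr;
}
```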

As an adjunct to @dbush's good answer, a way to get around alignment woes is to use a union. This ensures &p[1] is properly aligned for uint64_t* (see footnote 1). sizeof *p includes any needed padding, unlike sizeof *a.
union {
struct Array header;
uint64_t dummy;
} *p;
p = malloc(sizeof *p + length * sizeof *p->header.array);
struct Array *a = (struct Array *)&p[0]; // or = &(p->header);
a->length = length;
a->array = (uint64_t*) &p[1]; // or &p[1].dummy;
Or go with C99 and flexible array member.
1 As well as struct Array

Before the publication of C89, there were some implementations that would attempt to identify and trap upon out-of-bounds array accesses. Given something like:
struct foo {int a[4],b[4];} *p;
such implementations would squawk at an effort to access p->a[i] if i wasn't in the range 0 to 3. For programs that don't need to index the address of array-type lvalue p->a to access anything outside that array, being able to trap on such out-of-bounds accesses would be useful.
The authors of C89 were also almost certainly aware that it was common for programs to use the address of dummy-sized array at the end of a structure as a means of accessing storage beyond the structure. Using such techniques made it possible to do things that couldn't be done nearly as nicely otherwise, and part of the Spirit of C, according to the authors of the Standard, is "Don't prevent the programmer from doing what needs to be done".
Consequently, the authors of the Standard treated such accesses as something which implementations could support or not, at their leisure, presumably based upon what would be most useful for their customers. While it would often be helpful for implementations which would normally bounds-check accesses to structures in an array, to provide an option to omit such checks in cases where the last item of an indirectly-accessed structure is an array with one element (or, if they extend the language to waive a compile-time constraint, zero elements), people writing such implementations would presumably be capable of recognizing such things without the authors of the Standard having to tell them. The notion that "Undefined Behavior" was intended as some form of prohibition doesn't seem to have really taken hold until after the publication of C89's successor standard.
With regard to your example, having a pointer within a struct point to later storage in the same allocation should work, but with a couple of caveats:
If the allocation is passed to realloc, the pointer within it will become invalid.
The only real advantage of using a pointer versus a flexible array member is that it allows for the possibility of having it point somewhere else. That may be good if the only kind of "something else" will always be a constant object of static duration that never has to be freed, or perhaps if it is some other kind of object that won't have to be freed, but may be problematic if it could hold the only reference to something stored in a separate allocation.
Flexible array members have been available as an extension in some compilers before C89 was written, and were officially added in C99. Any decent compiler should support them.
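The realloc caveat can be made concrete with a small sketch. The names IntBuf and intbuf_resize are hypothetical. With a flexible array member the data is part of the allocation itself, so after realloc there is no stored pointer left pointing into the old block; with the pointer-in-struct layout, the member would have to be reassigned after every successful realloc.

```c
#include <stdlib.h>

struct IntBuf {
    size_t length;
    int    data[];   /* lives inside the allocation, so it moves with it */
};

/* Resize to new_length elements. Pass NULL to create a new buffer.
   Returns NULL on failure and leaves the old block untouched. */
struct IntBuf *intbuf_resize(struct IntBuf *buf, size_t new_length)
{
    struct IntBuf *grown =
        realloc(buf, sizeof *grown + new_length * sizeof grown->data[0]);
    if (grown == NULL)
        return NULL;
    grown->length = new_length;
    return grown;    /* grown->data is valid; nothing to fix up */
}
```

Callers still have to replace their own copy of the struct pointer with the returned one, as with any realloc.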

You can define struct Array as:
struct Array
{
size_t length;
char array[1];
}; /* +(length + 1) char */
then malloc( sizeof *a + length ). The "+1" element is in array[1] member. Fill structure with:
a->length = length;
strcpy( a->array, str );

How to allocate memory for an array and a struct in one malloc call without breaking strict aliasing?

When allocating memory for a variable sized array, I often do something like this:
struct array {
long length;
int *mem;
};
struct array *alloc_array( long length)
{
struct array *arr = malloc( sizeof(struct array) + sizeof(int)*length);
arr->length = length;
arr->mem = (int *)(arr + 1); /* dubious pointer manipulation */
return arr;
}
I then use the array like this:
int main()
{
struct array *arr = alloc_array( 10);
for( int i = 0; i < 10; i++)
arr->mem[i] = i;
/* do something more meaningful */
free( arr);
return 0;
}
This works and compiles without warnings. Recently however, I read about strict aliasing. To my understanding, the code above is legal with regard to strict aliasing, because the memory being accessed through the int * is not the memory being accessed through the struct array *. Does the code in fact break strict aliasing rules? If so, how can it be modified not to break them?
I am aware that I could allocate the struct and array separately, but then I would need to free them separately too, presumably in some sort of free_array function. That would mean that I have to know the type of the memory I am freeing when I free it, which would complicate code. It would also likely be slower. That is not what I am looking for.

The proper way to declare a flexible array member in a struct is as follows:
struct array {
long length;
int mem[];
};
Then you can allocate the space as before without having to assign anything to mem:
struct array *alloc_array( long length)
{
struct array *arr = malloc( sizeof(struct array) + sizeof(int)*length);
arr->length = length;
return arr;
}

Modern C officially supports flexible array members. So you can define your structure as follows:
struct array {
long length;
int mem[];
};
And allocate it as you do now, without the added hassle of dubious pointer manipulation. It will work out of the box, all the access will be properly aligned and you won't have to worry about dark corners of the language. Though, naturally, it's only viable if you have a single such member you need to allocate.
As for what you have now, since allocated storage doesn't have a declared type (it's a blank slate), you aren't breaking strict aliasing, since you haven't given that memory an effective type. The only issue is with possible mess-up of alignment. Though that's unlikely with the types in your structure.

I believe the code as written does violate strict aliasing rules when the standard is read in the strictest sense.
You are accessing an object of type int through a pointer to the unrelated type struct array. I believe an easy way out would be to take the starting address of the struct, convert it to char*, and perform pointer arithmetic on it. Example:
void* alloc = malloc(...);
array = alloc;
int* p_int = (int*)((char*)alloc + sizeof(struct array));

sizeof of string literal inside a struct - C

I have the following code:
struct stest
{
int x;
unsigned char data[];
} x =
{
1, {"\x10\x20\x30\x00\x10"}
};
int main()
{
printf( "x.data: %d\n", (int)x.data[0] );
return 0;
}
Which works fine. However, I need to use the size of the "data".
If I do:
printf( "sizeof x.data: %d\n", (int)sizeof(x.data) );
I get the error:
invalid application of ‘sizeof’ to incomplete type ‘char[]’
Is there a way to get the size of "data" in this situation, or maybe a suggestion of an alternative method I could use?
The compiler I am using is gcc 4.6.3.

If x.data is a null-terminated char array, you could just use the strlen function.
printf( "sizeof x.data: %zu\n", strlen((const char *)x.data)+1 );
This will not work correctly if the data itself contains a null byte, as the data in the question does (strlen stops at the \x00 in the middle). In that case you need to store the length of the array in a separate member of the struct.

As others have said, strlen will get you what you want.
The reason you're running into trouble isn't due to the string literal aspect, but rather the lack of a definite size of the array in your struct. It's an incomplete type. The compiler doesn't know how big the array is, and so sizeof can't figure it out.
Specifically, this is a "flexible array member". C99 added these. They are structs whose last member is an array of incomplete type (so it has room to grow).
Sizeof gets the size of the datatype.
Sizeof when applied to arrays, gets the size of the whole array.
Sizeof when applied to structs with a flexible array member, ignores the array.
Sizeof when applied to incomplete types simply fails. It gives up trying to figure out how big the array is.
So long story short, slap a number in your array. (Or pass along that information)
struct stest
{
    int x;
    int thesizeofmyarrayisfive; /* set this to 5 when initializing */
    unsigned char data[5];
};
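The sizeof rules listed in this answer can be checked with a short sketch. The struct packet type, its size member, and the packet_new helper are hypothetical names chosen for illustration. The point is that sizeof covers only the fixed part of the struct, so the payload length has to be stored explicitly, and strlen is not a substitute when the bytes may contain \x00.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct packet {
    int           x;
    size_t        size;     /* assumed field: number of bytes in data */
    unsigned char data[];   /* sizeof(struct packet) does not count this */
};

/* Copies n raw bytes (which may include '\0') into a new packet.
   Returns NULL if the allocation fails. */
struct packet *packet_new(int x, const void *bytes, size_t n)
{
    struct packet *p = malloc(sizeof *p + n);
    if (p == NULL)
        return NULL;
    p->x = x;
    p->size = n;            /* the only reliable record of the length */
    memcpy(p->data, bytes, n);
    return p;
}
```

With the five bytes from the question, p->size is 5, while strlen on the same data stops at the embedded \x00 and reports 3.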

Add a size field to the structure.
#define DATA ("\x10\x20\x30\x00\x10")
struct stest
{
int x;
size_t size;
unsigned char data[];
} x =
{
1, sizeof DATA, { DATA }
};

There are only two options:
Store a terminating byte. Often people use the null character for this. Of course, you must make sure that the terminating byte is not found in the valid data part of the array.
Add a length member to your struct.

Array out of bounds, index -1

I basically want to store an array of student names, based on a given number. For example, if the user wants to insert 5 names, then the array size will be 5. If the user wants to insert 10 names, then the array size will be 10.
I have a method like this to set a name to a specific element in an array.
void setNames(char *names){
strcpy(name[i], names);
}
Thing is, how do I do array bound checks? I heard that you can only add when the index is -1.

Arrays don't maintain their own size; you have to do that for them. This is part of the reason why vectors are so much easier to deal with, and why everyone will say "wtf, raw arrays? use a vector". An array is just a contiguous chunk of memory, that's it. A vector contains an array, and lets you use it like an array to some extent, but it handles a lot of the housekeeping details for you.
Anyway, if you really want to use a raw array, then you'll need to pass around size information along with it. C strings are a null-terminated array -- just a plain old array, but the last element is \0. This way you can read from it without knowing its size ahead of time, just don't read past the null character at the end (dragons be there).

EDIT (as the OP indicated he actually wants C):
C answer
What you can do is either create a char array:
char names[N][name_length];
where N - number "user wants" (I assume the user will somehow input it into your program), name_length - maximum length the name can have (a C-string, i.e. null-terminated string).
or create an array of your own structs (each holding a separate name and maybe some other information).
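For the two-dimensional char array variant, bounds checking comes down to passing the row count along and testing the index before writing. A minimal sketch, assuming a fixed maximum name length; NAME_LENGTH and set_name are hypothetical names, not part of the question's code:

```c
#include <stddef.h>
#include <string.h>

#define NAME_LENGTH 32   /* assumed maximum, including the '\0' */

/* Copies name into names[index]; count is the number of rows.
   Returns 0 on success, -1 if index is out of range or the name
   does not fit in a row. */
int set_name(char names[][NAME_LENGTH], size_t count,
             size_t index, const char *name)
{
    if (index >= count)
        return -1;                      /* out of bounds */
    if (strlen(name) >= NAME_LENGTH)
        return -1;                      /* would overflow the row */
    strcpy(names[index], name);         /* safe: length checked above */
    return 0;
}
```

Since index is a size_t, a caller passing -1 ends up with a very large unsigned value, which the first check rejects like any other out-of-range index.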
C++ answer
A typical way to do this in C++ is by using std::vector<std::string> (assuming you only want to store names, as std::string).
You then add new elements using the push_back() function. And, as vector is implemented as a dynamic array in C++, it grows as needed, so you won't have to track the size yourself when appending.

C code needs to keep track of the array size in another variable.
#include <stdlib.h>
#include <string.h>
typedef struct {
char **name;
size_t n;
} Names_T;
void Names_Set(Names_T *names, size_t index, const char *name) {
// See if it is a special value and then append to the array
if (index == (size_t) -1) {
index = names->n;
}
if (index >= names->n) {
size_t newsize = index + 1;
// OOM error handling omitted
names->name = realloc(names->name, newsize * sizeof *names->name);
while (names->n < newsize) {
names->name[names->n++] = NULL;
}
}
char *oldname = names->name[index];
names->name[index] = strdup(name);
free(oldname);
}
void Names_Delete(Names_T *names) {
while (names->n > 0) {
names->n--;
free(names->name[names->n]);
names->name[names->n] = NULL;
}
free(names->name);
names->name = NULL;
}
int main(void) {
Names_T names = { NULL, 0 };
Names_Set(&names, 3, "Sam"); // set array element 3
Names_Set(&names, (size_t) -1, "Thers"); // Append to array
Names_Delete(&names);
return 0;
}

When programming in C/C++ (unless using C++11 or newer), you will manipulate arrays as pointers. That means you won't know the size of an array unless you save it. What char str[10] really means is a block of 10 * sizeof(char) bytes starting at str's address. You are directly dealing with memory here.
If you want a high level approach for that, take a look at C++11. std::array and std::vector are there for you. From the documentation, look how std::array is defined:
template <
class T,
std::size_t N
> struct array;
It means the size N is part of the type, and the class has useful functions as well, such as size(), at(), back() etc.