Is this use of an array Undefined Behavior? [duplicate]

In a solution I posted, I got comments that the solution contains Undefined Behavior. However, I do not see how. The essence of the posted solution is:
typedef struct {
int n;
int a[1];
} t_x;
void example(void)
{
int i;
t_x *t= malloc (sizeof(t_x) + 99*sizeof(int));
t->n= 100;
for (i=0; i < t->n; i++)
t->a[i]= i;
free(t);
}
The comment of UB centered on whether the array now has 1 element (as declared) or has 100 elements (as allocated).
The parts of the standard quoted were 6.5.6 (pointer/int addition) and 6.5.2.1 (array subscripting):
"6.5.6 defines what happens when you add a pointer and an integer. The resulting pointer points to a corresponding element of the array, if such an element exists, or to one element past the end. The result is undefined otherwise."
"6.5.2.1 defines what a[n] means in terms of a+n. It follows that you cannot say a[n] if a doesn't have at least n+1 elements."
With both quotes the commenter seems to imply that element a[99] would not exist; however, looking at the memory layout, it clearly exists.
Please help me understand if/why this is UB and what types of UB I may expect.

This is a pretty popular trick in pre-C99 code. It works in many implementations, but is not, strictly speaking, legal (and thus not portable). The standard does not specify how a structure such as t_x is laid out in memory beyond the order of its members. See the C FAQ for details.
C99 introduced flexible array members, which are the preferred solution to this problem.
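As a rough illustration of the C99 alternative, here is a minimal sketch of the question's example rewritten with a flexible array member. The names t_fx and t_fx_new are hypothetical (they are not from the question), and the sketch assumes count is not negative.

```c
#include <stdlib.h>

/* Hypothetical C99 variant of the question's t_x. */
typedef struct {
    int n;
    int a[];   /* flexible array member: declared without a size */
} t_fx;

/* Allocates a t_fx with room for count ints and fills a[i] = i.
   Assumes count >= 0. Returns NULL if the allocation fails;
   the caller is responsible for calling free() on the result. */
t_fx *t_fx_new(int count)
{
    t_fx *t = malloc(sizeof *t + (size_t)count * sizeof t->a[0]);
    if (t == NULL)
        return NULL;
    t->n = count;
    for (int i = 0; i < t->n; i++)
        t->a[i] = i;
    return t;
}
```

A caller would use it as t_fx *t = t_fx_new(100); and release it with free(t);. Because the array is declared without a bound, indexing up to t->n - 1 stays within what was allocated.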

OK, so the problem is that you want to allocate memory for an array whose size is fixed in the code. As written, this array will always have size one, so you can store one object of type int in it.
If you want to allocate memory dynamically for a, then you should declare it as a pointer:
typedef struct {
int n;
int* a;
} t_x;
and then
t_x someStruct;
someStruct.a=(int*)malloc(numberOfElements*sizeof(int));
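A slightly fuller sketch of this pointer-member approach might look like the following. The names t_px, t_px_init and t_px_release are assumptions made for illustration, and the sketch assumes count is greater than zero.

```c
#include <stdlib.h>

/* Hypothetical struct with a separately allocated array. */
typedef struct {
    int n;
    int *a;
} t_px;

/* Allocates count ints for s->a and fills a[i] = i.
   Assumes count > 0. Returns 0 on success, -1 if malloc fails. */
int t_px_init(t_px *s, int count)
{
    s->a = malloc((size_t)count * sizeof *s->a);
    if (s->a == NULL) {
        s->n = 0;
        return -1;
    }
    s->n = count;
    for (int i = 0; i < s->n; i++)
        s->a[i] = i;
    return 0;
}

/* Frees the array and resets the struct to an empty state. */
void t_px_release(t_px *s)
{
    free(s->a);
    s->a = NULL;
    s->n = 0;
}
```

The trade-off compared with the single-allocation layout is one extra malloc/free pair and one extra pointer indirection on every access.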


Why would C have "fake arrays"? [closed]

I'm reading The UNIX-HATERS Handbook, and in chapter 9 there's something I don't really understand:
C doesn’t really have arrays either. It has something that looks like an array
but is really a pointer to a memory location.
I can't really imagine any way to store an array in memory other than using pointers to index memory locations. How does C implement "fake" arrays, anyway? Is there any truth to this claim?
I think the author’s point is that C arrays are really just a thin veneer on pointer arithmetic. The subscript operator is defined simply as a[b] == *(a + b), so you can easily say 5[a] instead of a[5] and do other horrible things like access the array past the last index.
Comparing to that, a “true array” would be one that knows its own size, doesn’t let you do pointer arithmetic, access past the last index without an error, or access its contents using a different item type. In other words, a “true array” is a tight abstraction that doesn’t tie you to a single representation – it could be a linked list instead, for example.
PS. To spare myself some trouble: I don’t really have an opinion on this, I’m just explaining the quote from the book.
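The subscript identity mentioned in this answer can be checked with a small sketch. The function name subscript_forms_agree is hypothetical; it only compares the different spellings of the same element.

```c
#include <stddef.h>

/* Returns 1 if a[i], i[a] and *(a + i) all designate the same element,
   0 otherwise. The caller must make sure i is a valid index into a. */
int subscript_forms_agree(const int *a, size_t i)
{
    return &a[i] == a + i          /* a[i] is defined as *(a + i) */
        && &i[a] == &a[i]          /* addition commutes, so i[a] is the same */
        && a[i] == *(a + i);
}
```

For an array int v[3] = {10, 20, 30};, the expression 1[v] therefore evaluates to 20, exactly like v[1].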
There is a difference between C arrays and pointers, and it can be seen by the output of sizeof() expressions. For example:
void sample1(const char * ptr)
{
/* s1 depends on pointer size of architecture */
size_t s1 = sizeof(ptr);
}
size_t sample2(const char arr[])
{
/* s2 also depends on pointer size of architecture, because arr decays to pointer */
size_t s2 = sizeof(arr);
return s2;
}
void sample3(void)
{
const char arr[3] = {0};
/* s3 = 3 * sizeof(char) = 3 */
size_t s3 = sizeof(arr);
}
void sample4(void)
{
const char arr[3];
/* s4 = output of sample2(arr) which... depends on pointer size of architecture, because arr decays to pointer */
size_t s4 = sample2(arr);
}
sample2 and sample4 in particular are probably why people tend to conflate C arrays with C pointers, because in other languages you can simply pass an array as an argument to a function and have it work 'just the same' as it did in the calling function. Similarly, because of how C works, you can pass pointers instead of arrays and this is 'valid', whereas in other languages with a clearer distinction between arrays and pointers it would not be.
You could also view the sizeof() output as a consequence of C's pass-by-value semantics (since C arrays decay to pointers).
C99 also added this syntax, which not every compiler supports:
void foo(const char arr[static 2])
{
/* the caller must pass a pointer to **at least** 2 elements; passing NULL is undefined behavior */
}
The statement you quoted is factually incorrect. Arrays in C are not pointers.
The idea of implementing arrays as pointers was used in the B and BCPL languages (ancestors of C), but it did not survive the transition to C. In the early days of C, "backward compatibility" with B and BCPL was considered somewhat important, which is why C arrays closely emulate the behavior of B and BCPL arrays (i.e. C arrays easily "decay" to pointers). Nevertheless, C arrays are not "pointers to a memory location".
The book quote is completely bogus. This misconception is rather widespread among C newbies. But how it managed to get into a book is beyond me.
The author probably means that arrays are constrained in ways which make them feel like second-class citizens from the programmer's point of view. For example, of these two functions, one is OK and the other is not:
int finefunction() {
int ret = 5;
return ret;
}
int[] wtffunction() {
int ret[1] = { 5 };
return ret;
}
You can work around this a bit by wrapping arrays in structs, but it just sort of emphasizes that arrays are different, they're not like other types.
struct int1 {
int a[1];
};
struct int1 finefunction2() {
struct int1 ret = { { 5 } };
return ret;
}
Another effect of this is that a function cannot recover the size of an array that was passed to it:
int my_sizeof(int a[]) {
int size = sizeof(a);
return size;
}
int main() {
int arr[5];
// prints 20 4 with 4-byte int and 4-byte pointers (20 8 with 8-byte pointers), not 20 20 as it would if arrays were 1st class things
printf("%zu %d\n", sizeof(arr), my_sizeof(arr));
}
Another way to say what the author says is that, in C (and C++) terminology, "array" means something different than in most other languages.
So, to your question of how a "true array" would be stored in memory: there is no one single kind of "true array". If you wanted true arrays in C, you have basically two options:
Use calloc to allocate a buffer, and store the pointer and item count in a struct:
struct intarrayref {
size_t count;
int *data;
};
This struct is basically a reference to an array, and you can pass it around nicely to functions etc. You will want to write functions to operate on it, such as one that creates a copy of the actual data.
Use a flexible array member, and allocate the whole struct with a single calloc:
struct intarrayobject {
size_t count;
int data[];
};
In this case, you allocate both the metadata (count) and the space for the array data in one go, but the price is that you can't pass this struct around by value any more, because that would leave the extra data behind. You have to pass a pointer to this struct to functions etc. So it is a matter of opinion whether one would consider this a "true array" or just a slightly enhanced normal C array.
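The second option could be sketched roughly as follows. The helper names intarrayobject_new and intarrayobject_sum are hypothetical, and the sketch ignores the possibility of the size computation overflowing for very large counts.

```c
#include <stdlib.h>

struct intarrayobject {
    size_t count;
    int data[];   /* flexible array member */
};

/* Allocates the count field and count zero-initialized ints in one block.
   Returns NULL on allocation failure; release the result with free(). */
struct intarrayobject *intarrayobject_new(size_t count)
{
    struct intarrayobject *obj =
        calloc(1, sizeof *obj + count * sizeof obj->data[0]);
    if (obj != NULL)
        obj->count = count;
    return obj;
}

/* Example of a function that takes the object by pointer and uses the
   stored count instead of a separately passed length. */
long intarrayobject_sum(const struct intarrayobject *obj)
{
    long total = 0;
    for (size_t i = 0; i < obj->count; i++)
        total += obj->data[i];
    return total;
}
```

Since the length travels with the data, callees such as intarrayobject_sum do not need an extra size parameter, which is the main thing a plain C array argument lacks.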
Like the entire book, it's a case of trolling, specifically, the type of trolling that involves stating something almost-true but wrong to solicit angry responses about why it's wrong. C most certainly does have actual arrays/array types, as evidenced by the way pointer-to-array types (and multi-dimensional arrays) work.
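The point about pointer-to-array types can be illustrated with a small sketch. The function names row_bytes and row_stride_bytes are made up for this example; they show that a pointer to an array of 5 ints knows the full size of what it points to.

```c
#include <stddef.h>

/* Size of the object a pointer-to-array points at: a whole row. */
size_t row_bytes(void)
{
    int grid[3][5];
    int (*row)[5] = grid;   /* grid decays to a pointer to its first row */
    return sizeof *row;     /* size of int[5], not of a pointer */
}

/* Distance in bytes covered by incrementing a pointer-to-array by one. */
size_t row_stride_bytes(void)
{
    int grid[3][5];
    int (*row)[5] = grid;
    return (size_t)((char *)(row + 1) - (char *)row);
}
```

Both functions return 5 * sizeof(int), which would not be possible if an array were nothing more than a pointer to a memory location.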

Array of size 0 at the end of struct [duplicate]

My professor of a systems programming course I'm taking told us today to define a struct with a zero-length array at the end:
struct array{
size_t size;
int data[0];
};
typedef struct array array;
This is a useful struct to define or initialize an array with a variable, i.e., something as follows:
array *array_new(size_t size){
array* a = malloc(sizeof(array) + size * sizeof(int));
if(a){
a->size = size;
}
return a;
}
That is, using malloc(), we also allocate memory for the array of size zero. This is completely new to me, and it seems odd, because, from my understanding, structs do not have their elements necessarily in continuous locations.
Why does the code in array_new allocate memory to data[0]? Why would it be legal to access then, say
array * a = array_new(3);
a->data[1] = 12;
?
From what he told us, it seems that an array defined as length zero at the end of a struct is ensured to come immediately after the last element of the struct, but this seems strange, because, again, from my understanding, structs could have padding.
I've also seen around that this is just a feature of gcc and not defined by any standard. Is this true?
Currently, there exists a standard feature, as mentioned in C11, chapter §6.7.2.1, called flexible array member.
Quoting the standard,
As a special case, the last element of a structure with more than one named member may
have an incomplete array type; this is called a flexible array member. In most situations,
the flexible array member is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more trailing padding than
the omission would imply. [...]
The syntax should be
struct s { int n; double d[]; };
where the last element has an incomplete type (no array dimension, not even 0).
So, your code would better look like
struct array{
size_t size;
int data[ ];
};
to be standard-conforming.
Now, coming to your example, of a 0-sized array, this was a legacy way ("struct hack") of achieving the same. Before C99, GCC supported this as an extension to emulate flexible array member functionality.
Your professor is confused. They should go read about what happens if you define a zero-size array. This is a non-standard GCC extension; it is not valid C and not something they should teach students to use (*).
Instead, use standard C flexible array member. Unlike your zero-size array, it will actually work, portably:
struct array{
size_t size;
int data[];
};
Flexible array members are guaranteed to count as zero when you use sizeof on the struct, allowing you to do things like:
malloc(sizeof(array) + sizeof(int[size]));
(*) Back in the 90s people used an unsafe exploit to add data after structs, known as the "struct hack". To provide a safe way to extend a struct, GCC implemented the zero-size array feature as a non-standard extension. It became obsolete in 1999 when the C standard finally provided a better way to do this.
Other answers explain that zero-length arrays are a GCC extension and that C provides flexible array members, but no one addressed your other questions.
from my understanding, structs do not have their elements necessarily in continuous locations.
Yes. The members of a struct are not necessarily stored in contiguous locations.
Why does the code in array_new allocate memory to data[0]? Why would it be legal to access then, say
array * a = array_new(3);
a->data[1] = 12;
?
You should note that one of the restrictions on a zero-length array is that it must be the last member of a structure. From this, the compiler knows that the struct can have a variable-length tail and that more memory will be needed at runtime.
But you shouldn't be confused by this line of reasoning: "since the zero-length array is the last member of the structure, the memory allocated for it must be added to the end of the structure, and since structs do not have their elements necessarily in continuous locations, how could that allocated memory be accessed?"
No, that's not a problem. The memory for structure members is not necessarily contiguous, as there may be padding between them, but the extra allocated memory is always accessed through the member data, which comes last. So padding has no effect here. The rule is:
§6.7.2.1/15
Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared.
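The ordering rule quoted above can be observed with offsetof. The following is only a sketch: struct padded and the function name are hypothetical, chosen so that padding between members is likely on common platforms.

```c
#include <stddef.h>

/* Hypothetical struct with members of different alignment, ending in a
   flexible array member. */
struct padded {
    char tag;
    double value;
    int data[];
};

/* Returns 1 if the members appear at increasing offsets in declaration
   order and the flexible array member starts no later than the end of
   the struct, 0 otherwise. */
int members_in_declaration_order(void)
{
    return offsetof(struct padded, tag) == 0
        && offsetof(struct padded, tag) < offsetof(struct padded, value)
        && offsetof(struct padded, value) < offsetof(struct padded, data)
        && offsetof(struct padded, data) <= sizeof(struct padded);
}
```

Whatever padding the compiler inserts, data is still the last member, so memory allocated beyond the fixed part of the struct is what data indexes into.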
I've also seen around that this is just a feature of gcc and not defined by any standard. Is this true?
Yes. As other answers already mentioned, zero-length arrays are not supported by standard C; they are an extension of GCC. C99 introduced the flexible array member. An example from the C standard (6.7.2.1):
After the declaration:
struct s { int n; double d[]; };
the structure struct s has a flexible array member d. A typical way to use this is:
int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));
and assuming that the call to malloc succeeds, the object pointed to by p behaves, for most purposes, as if p had been declared as:
struct { int n; double d[m]; } *p;
(there are circumstances in which this equivalence is broken; in particular, the offsets of member d might not be the same).
A more portable pre-C99 way would be to define your array with a size of 1 (although indexing past data[0] is, strictly speaking, still not guaranteed by the standard), as in:
struct array{
size_t size;
int data[1]; // <--- compiles with any C compiler
};
Then use the offset of the data member (not the size of the array) in the calculation:
array *array_new(size_t size){
array* a = malloc(offsetof(array, data) + size * sizeof(int));
if(a){
a->size = size;
}
return a;
}
This is effectively using array.data as a marker for where the extra data might go (depending on size).
The way I used to do it is without a dummy member at the end of the structure: the size of the structure itself tells you the address just past it. Adding 1 to the typed pointer goes there:
header * p = malloc (sizeof (header) + buffersize);
char * buffer = (char*)(p+1);
As for structs in general, you can rely on the fields being laid out in order. Being able to match some imposed structure needed by a file format, binary image, operating system call, or hardware is one advantage of using C. You have to know how the padding for alignment works, but the fields are in order and in one contiguous block.
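A minimal sketch of this "header plus trailing buffer" layout is shown below. The header type and the helper names are hypothetical. Note the assumption that the payload is char data: for payload types with stricter alignment than the header, p + 1 might not be suitably aligned.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical header; the payload bytes follow it in the same block. */
typedef struct {
    size_t length;   /* number of payload bytes, including the '\0' */
} header;

/* Allocates a header followed by a copy of text.
   Returns NULL on allocation failure; release with free(). */
header *header_new(const char *text)
{
    size_t len = strlen(text) + 1;
    header *p = malloc(sizeof(header) + len);
    if (p == NULL)
        return NULL;
    p->length = len;
    char *buffer = (char *)(p + 1);   /* first byte just past the header */
    memcpy(buffer, text, len);
    return p;
}

/* Returns a pointer to the payload stored after the header. */
const char *header_payload(const header *p)
{
    return (const char *)(p + 1);
}
```

For example, header_payload(header_new("abc")) points at the string "abc" stored directly behind the length field.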

going out of bounds in array of structs in C [duplicate]

Suppose I declare the following
typedef struct{
int age;
int weight;
} Man;
Then I make an array of Man such as
Man *manArr = malloc(sizeof(Man) * 2);
My understanding is that I now have two cells, each capable of holding a Man. But how am I able to do this, then?
manArr[45] = (Man) {33, 23};
I would have imagined that I would have segfaulted because only two cells exist, but I can printf the values of manArr[45]. What's a good way to for example go through struct arrays, do something to their fields, and move on to the next without "going out of bounds" per se?
Thanks
Accessing out-of-bounds is not guaranteed to segfault. It is defined by the C standard as undefined behavior, i.e. anything can happen, including seemingly error-free behavior.
What's a good way to for example go through struct arrays, do something to their fields, and move on to the next without "going out of bounds" per se?
Remember the size of the array.
const size_t manArrSize = 2;
Man *manArr = malloc(sizeof(Man) * manArrSize);
for (size_t index = 0; index < manArrSize; ++index)
{
// Access `manArr[index]`.
}
Going out of bounds of an array causes undefined behaviour. This means that anything could happen. If you're lucky you'll get a segfault, but you may also get cases where the memory location happens to be somewhere you can access.
As for move on to the next without "going out of bounds":
You could either use an extra variable to store the size of the array, or decide on a sentinel value in the array (and allocate one more slot for it) so that if a certain element is equal to the sentinel value, you know it is the end. For example, argv uses NULL as the sentinel value.
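The sentinel approach might be sketched like this. The convention that a negative age marks the end of the array is an assumption made purely for this example, as is the function name count_men.

```c
#include <stddef.h>

typedef struct {
    int age;
    int weight;
} Man;

/* Counts the entries before the sentinel.
   Assumed convention: an entry with age < 0 terminates the array,
   so the caller must allocate one extra slot for it. */
size_t count_men(const Man *men)
{
    size_t n = 0;
    while (men[n].age >= 0)
        n++;
    return n;
}
```

With Man men[3] = {{33, 23}, {40, 80}, {-1, 0}};, count_men(men) returns 2. The cost is one extra element and the loss of one otherwise valid value (here, any negative age).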

Declaring a zero sizes array [duplicate]

I tried declaring an array a of size 0:
int a[0];
My VC++ 6 compiler throws an error of not being able to create an array of zero size.
If I try the same of declaring inside a structure, I do not get any errors.
struct st
{
int a[0];
};
The code compiles and links without any errors. Can somebody help me understand how the compiler reacts in the above two cases? Thanks.
The struct is a special case. It is a common pattern to declare an empty array as the last member of a struct, where the struct is actually part of a larger block of memory of variable length. See Empty arrays in structs for more explanation.
Some compilers support the extension of using zero-sized arrays as the last element of the struct, to indicate your intent to allocate there an array whose size you don't know yet. Then you can use that struct member (the zero-sized array) to access the elements of that array.
Note that this is not a standard feature of C89; C99 offers an alternative solution:
struct st
{
int a[];
};

zero length arrays [duplicate]

Recently I came across a structure definition,
struct arr {
int cnt;
struct {
int size;
int *name;
} list[0];
};
and now I don't know the reason for list[0] being declared. What I am interested in is why is this used. Does it have any advantage? If yes, what is it?
The use is for dynamic-length arrays. You can allocate the memory using malloc(), and have the array reside at the end of the structure:
struct arr *my_arr = malloc(sizeof *my_arr + 17 * sizeof *my_arr->list);
my_arr->cnt = 17;
my_arr->list[0].size = 0;
my_arr->list[1].name = NULL; /* name is an int *, so it cannot hold a string literal */
Actually being able to use 0 for the length is (as pointed out in a comment) a GCC extension. In C99, you can leave out the size literal altogether for the same effect.
Before these things were implemented, you often saw this done with a length of 1, but that complicates the allocation a bit since you must compensate when computing the memory needed.
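The compensation needed for the length-1 variant can be sketched as a size computation. The names struct entry, struct arr1 and arr1_bytes are hypothetical, and the sketch assumes n is at least 1.

```c
#include <stddef.h>

/* Hypothetical named element type, so its size can be referred to. */
struct entry {
    int size;
    int *name;
};

/* Pre-C99 "struct hack" layout with a one-element trailing array. */
struct arr1 {
    int cnt;
    struct entry list[1];
};

/* Bytes to request for a struct arr1 holding n entries (n >= 1).
   One entry is already counted in sizeof(struct arr1), hence n - 1. */
size_t arr1_bytes(size_t n)
{
    return sizeof(struct arr1) + (n - 1) * sizeof(struct entry);
}
```

Forgetting the "- 1" only wastes one element of memory, but the adjustment is easy to get wrong in the other direction, which is part of why the zero-length and flexible forms were introduced.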
It is called "struct hack". You can search for it on SO or on the Net
http://www.google.com/search?q=struct+hack&sitesearch=stackoverflow.com/questions
Note that formally it is always illegal to declare arrays of size 0 in C. The code you provided formally is not even compilable. Most C compilers will accept 0-sized array declaration as an extension though, specifically because it is often used in "lazy" version of "struct hack" (it can rely on sizeof to determine how much memory to allocate, since 0-sized array supposedly does not affect the total size of the struct).
An arguably better implementation of the struct hack uses an array of size 1:
struct arr {
int cnt;
struct {
int size;
int *name;
} list[1];
};
It is "better" because it is formally compilable at least. In order to allocate memory for a struct with N elements in the list, the standard offsetof macro is used:
struct arr *a = malloc(offsetof(struct arr, list) + N * sizeof a->list[0]);
In the C99 version of the language specification, the "struct hack" is supported through a size-less array declaration (with empty []), since 0-sized array declarations are illegal in C99 as well.
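For comparison, a C99 version of the structure from the question might look like the sketch below. The names struct arr_fam and arr_fam_new are hypothetical, and the sketch assumes n is not negative.

```c
#include <stdlib.h>

/* Hypothetical C99 variant of struct arr using a flexible array member. */
struct arr_fam {
    int cnt;
    struct {
        int size;
        int *name;
    } list[];   /* empty brackets instead of [0] or [1] */
};

/* Allocates room for n list entries and initializes each one.
   Assumes n >= 0. Returns NULL on failure; release with free(). */
struct arr_fam *arr_fam_new(int n)
{
    struct arr_fam *a = malloc(sizeof *a + (size_t)n * sizeof a->list[0]);
    if (a == NULL)
        return NULL;
    a->cnt = n;
    for (int i = 0; i < n; i++) {
        a->list[i].size = 0;
        a->list[i].name = NULL;
    }
    return a;
}
```

Here sizeof *a covers only the fixed part of the struct, so no offsetof or "minus one" adjustment is needed in the allocation.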
Another advantage is if your structure describes on-disk/on-network data. If cnt is 0, the data size may only be the length of cnt.
I'm here just to confirm what I dreaded, that list[0] is not valid.
