Why does a non-constant offsetof expression work? - c

Why does this work:
#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <stddef.h>
typedef struct x {
int a;
int b[128];
} x_t;
int function(int i)
{
size_t a;
a = offsetof(x_t, b[i]);
return a;
}
int main(int argc, char **argv)
{
printf("%d\n", function(atoi(argv[1])));
}
If I remember the definition of offsetof correctly, it's a compile time construct. Using 'i' as the array index results in a non-constant expression. I don't understand how the compiler can evaluate the expression at compile time.
Why isn't this flagged as an error?

The C standard does not require this to work, but it likely works in some C implementations because offsetof(type, member) expands to something like:
type t; // Declare an object of type "type".
char *start = (char *) &t; // Find starting address of object.
char *p = (char *) &t.member; // Find address of member.
p - start; // Evaluate offset from start to member.
I have separated the above into parts to display the essential logic. The actual implementation of offsetof would be different, possibly using implementation-dependent features, but the core idea is that the address of a fictitious or temporary object would be subtracted from the address of the member within the object, and this results in the offset. It is designed to work for members but, as an unintended effect, it also works (in some C implementations) for elements of arrays in structures.
It works for these elements simply because the construction used to find the address of a member also works to find the address of an element of an array member, and the subtraction of the pointers works in a natural way.
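The subtraction idea above can be tried on a real object instead of a fictitious one. This is a minimal sketch, not how any particular implementation defines offsetof; the function name offset_of_b_elem is hypothetical, and x_t mirrors the struct from the question.

```c
#include <assert.h>
#include <stddef.h>

typedef struct x {
    int a;
    int b[128];
} x_t;

/* Hypothetical helper: offset of b[i], found by subtracting the address
 * of a real object from the address of the element inside it. */
size_t offset_of_b_elem(size_t i)
{
    static x_t t;                  /* a real object, so no null pointer is involved */
    assert(i < 128);               /* stay inside the array */
    char *start = (char *)&t;      /* starting address of the object */
    char *p = (char *)&t.b[i];     /* address of the element */
    return (size_t)(p - start);    /* offset from start to element */
}
```

Because the index is applied to an actual array inside an actual object, this form is well-defined for any i in range, whether or not the index is a constant.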

it's a compile time construct
AFAICS, there are no such constraints. All the standard says is:
[C99, 7.17]:
The macro...
offsetof(type, member-designator)
...
The type and member designator shall be such that given
static type t;
then the expression &(t.member-designator) evaluates to an address constant.
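If you want to stay within what the quoted text clearly covers, one option is to pass only the array member to offsetof and add the run-time part yourself. This is a sketch under that assumption; the function name offset_of_b_elem_portable is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct x {
    int a;
    int b[128];
} x_t;

/* Hypothetical helper: offsetof sees only the member b (a constant),
 * and the run-time index is added with ordinary arithmetic. */
size_t offset_of_b_elem_portable(size_t i)
{
    assert(i <= 128);   /* 128 yields the one-past-the-end offset */
    return offsetof(x_t, b) + i * sizeof(int);
}
```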

offsetof (type,member)
Return member offset: This macro with functional form returns the offset value in bytes of member member in the data structure or union type type.
http://www.cplusplus.com/reference/cstddef/offsetof/
(C, C++98 and C++11 standards)

I think I understand this now.
The offsetof() macro does not evaluate to a constant, it evaluates to a run-time expression that returns the offset. Thus as long as type.member is valid syntax, the compiler doesn't care what it is. You can use arbitrary expressions for the array index. I had thought it was like sizeof and had to be constant at compile time.

There has been some confusion on what exactly is permitted as a member-designator. Here are two papers I am aware of:
DR 496
Offsetof for Pointers to Members
However, even quite old versions of GCC, clang, and ICC support calculating array elements with dynamic offset. Based on Raymond's blog I guess that MSVC has long supported it too.
I believe this is rooted in pragmatism. For those not familiar, the "struct hack" and flexible array members use variable-length data in the last member of a struct:
struct string {
size_t size;
const char data[];
};
This type is often allocated with something like this:
struct string *string_alloc(size_t size) {
    struct string *s = malloc(offsetof(struct string, data[size]));
    if (s == NULL) return NULL; // Allocation failed.
    s->size = size;
    return s;
}
Admittedly, this latter part is just a theory. It's such a useful optimization that I imagine that initially it was permitted on purpose for such cases, or it was accidentally supported and then found to be useful for exactly such cases.

Related

C: Reading 8 bytes from a region of size 0 [-Wstringop-overread] [duplicate]

Just curious, what actually happens if I define a zero-length array int array[0]; in code? GCC doesn't complain at all.
Sample Program
#include <stdio.h>
int main() {
int arr[0];
return 0;
}
Clarification
I'm actually trying to figure out if zero-length arrays initialised this way, instead of being pointed at like the variable length in Darhazer's comments, are optimised out or not.
This is because I have to release some code out into the wild, so I'm trying to figure out if I have to handle cases where the SIZE is defined as 0, which happens in some code with a statically defined int array[SIZE];
I was actually surprised that GCC does not complain, which led to my question. From the answers I've received, I believe the lack of a warning is largely due to supporting old code which has not been updated with the new [] syntax.
Because I was mainly wondering about the error, I am tagging Lundin's answer as correct (Nawaz's was first, but it wasn't as complete). The others pointed out its actual use for tail-padded structures, which, while relevant, isn't exactly what I was looking for.
An array cannot have zero size.
ISO 9899:2011 6.7.6.2:
If the expression is a constant expression, it shall have a value greater than zero.
The above text applies to a plain array (paragraph 1). For a VLA (variable length array), the behavior is undefined if the expression's value is less than or equal to zero (paragraph 5). This is normative text in the C standard. A compiler is not allowed to implement it differently.
gcc -std=c99 -pedantic gives a warning for the non-VLA case.
As per the standard, it is not allowed.
However it's been current practice in C compilers to treat those declarations as a flexible array member (FAM) declaration:
C99 6.7.2.1, §16: As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member.
The standard syntax of a FAM is:
struct Array {
size_t size;
int content[];
};
The idea is that you would then allocate it so:
void foo(size_t x) {
    struct Array *array = malloc(sizeof *array + x * sizeof array->content[0]);
    if (array == NULL) return; // Allocation failed.
    array->size = x;
    for (size_t i = 0; i != x; ++i) {
        array->content[i] = 0;
    }
    free(array);
}
You might also use it statically (gcc extension):
struct Array a = { 3, { 1, 2, 3 } };
This is also known as tail-padded structures (this term predates the publication of the C99 Standard) or struct hack (thanks to Joe Wreschnig for pointing it out).
However, this syntax was standardized (and its effects guaranteed) only in C99. Before that, a constant size was necessary.
A size of 1 was the portable way to go, though it was rather strange.
A size of 0 was better at indicating intent, but it was not legal as far as the Standard was concerned and was supported only as an extension by some compilers (including gcc).
The tail padding practice, however, relies on storage being available (careful malloc), so it is not suited to stack usage in general.
In Standard C and C++, a zero-size array is not allowed.
If you're using GCC, compile with the -pedantic option. It will give a warning saying:
zero.c:3:6: warning: ISO C forbids zero-size array 'a' [-pedantic]
In the case of C++, it gives a similar warning.
It's totally illegal, and always has been, but a lot of compilers
neglect to signal the error. I'm not sure why you want to do this.
The one use I know of is to trigger a compile time error from a boolean:
char someCondition[ condition ];
If condition is false, then I get a compile time error. Because
compilers do allow this, however, I've taken to using:
char someCondition[ 2 * condition - 1 ];
This gives a size of either 1 or -1, and I've never found a compiler
which would accept a size of -1.
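The negative-size trick can be wrapped in a macro so each check gets a readable name. This is a sketch only: COMPILE_TIME_CHECK is a hypothetical macro name, and the C11 static_assert shown next to it is the standard alternative where a C11 compiler is available.

```c
#include <assert.h>   /* provides the static_assert macro in C11 */
#include <limits.h>

/* Hypothetical macro: declares an array type of size 1 when cond is true
 * and size -1 (a compile error) when cond is false. */
#define COMPILE_TIME_CHECK(name, cond) typedef char name[2 * !!(cond) - 1]

/* Pre-C11 style: fails to compile if the condition is false. */
COMPILE_TIME_CHECK(char_has_at_least_8_bits, CHAR_BIT >= 8);

/* C11 style: same idea, with a diagnostic message. */
static_assert(sizeof(int) >= 2, "int must be at least 16 bits");
```

Changing either condition to something false should stop compilation, which is the whole point of the technique.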
Another use of zero-length arrays is for making variable-length objects (pre-C99). Zero-length arrays are different from flexible array members, which are written with [] and no 0.
Quoted from gcc doc:
Zero-length arrays are allowed in GNU C. They are very useful as the last element of a structure that is really a header for a variable-length object:
struct line {
int length;
char contents[0];
};
struct line *thisline = (struct line *)
malloc (sizeof (struct line) + this_length);
thisline->length = this_length;
In ISO C99, you would use a flexible array member, which is slightly different in syntax and semantics:
Flexible array members are written as contents[] without the 0.
Flexible array members have incomplete type, and so the sizeof operator may not be applied.
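For comparison, the C99 flexible array member form of the same header-plus-payload idea might look like the sketch below. The function name line_new is hypothetical, and storing a NUL-terminated copy of the text is an assumption made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct line {
    size_t length;
    char contents[];   /* flexible array member: [] without the 0 */
};

/* Hypothetical helper: allocate the header and the payload in one block.
 * Returns NULL if the allocation fails. */
struct line *line_new(const char *text)
{
    assert(text != NULL);
    size_t len = strlen(text);
    struct line *l = malloc(sizeof *l + len + 1);   /* +1 for the terminator */
    if (l == NULL)
        return NULL;
    l->length = len;
    memcpy(l->contents, text, len + 1);
    return l;
}
```

The caller releases the whole object with a single free(), since header and payload share one allocation.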
A real-world example is zero-length arrays of struct kdbus_item in kdbus.h (a Linux kernel module).
I'll add that there is a whole page of the online gcc documentation on this topic.
Some quotes:
Zero-length arrays are allowed in GNU C.
In ISO C90, you would have to give contents a length of 1
and
GCC versions before 3.0 allowed zero-length arrays to be statically initialized, as if they were flexible arrays. In addition to those cases that were useful, it also allowed initializations in situations that would corrupt later data
so you could
int arr[0] = { 1 };
and boom :-)
Zero-size array declarations within structs would be useful if they were allowed, and if the semantics were such that (1) they would force alignment but otherwise not allocate any space, and (2) indexing the array would be considered defined behavior in the case where the resulting pointer would be within the same block of memory as the struct. Such behavior was never permitted by any C standard, but some older compilers allowed it before it became standard for compilers to allow incomplete array declarations with empty brackets.
The struct hack, as commonly implemented using an array of size 1, is dodgy and I don't think there's any requirement that compilers refrain from breaking it. For example, I would expect that if a compiler sees int a[1], it would be within its rights to regard a[i] as a[0]. If someone tries to work around the alignment issues of the struct hack via something like
typedef struct {
uint32_t size;
uint8_t data[4]; // Use four, to avoid having padding throw off the size of the struct
}
a compiler might get clever and assume the array size really is four:
; As written
foo = myStruct->data[i];
; As interpreted (assuming little-endian hardware)
foo = ((*(uint32_t*)myStruct->data) >> (i << 3)) & 0xFF;
Such an optimization might be reasonable, especially if myStruct->data could be loaded into a register in the same operation as myStruct->size. I know nothing in the standard that would forbid such optimization, though of course it would break any code which might expect to access stuff beyond the fourth element.
By the standard, you definitely can't have zero-sized arrays, but in practice all of the most popular compilers let you do that. So I will try to explain why it can be bad:
#include <cstdio>
int main() {
struct A {
A() {
printf("A()\n");
}
~A() {
printf("~A()\n");
}
int empty[0];
};
A vals[3];
}
A human would expect this output:
A()
A()
A()
~A()
~A()
~A()
Clang prints this:
A()
~A()
GCC prints this:
A()
A()
A()
This is very strange, so it is a good reason not to use empty arrays in C++ if you can.
There is also an extension in GNU C that lets you create a zero-length array in C, but as I understand it, there should be at least one other member before it in the structure, or you will get very strange results like the ones above if you use C++.

Is it valid to calculate element pointers by explicit arithmetic?

Is the following program valid? (In the sense of being well-defined by the ISO C standard, not just happening to work on a particular compiler.)
struct foo {
int a, b, c;
};
int f(struct foo *p) {
// should return p->c
char *q = ((char *)p) + 2 * sizeof(int);
return *((int *)q);
}
It follows at least some of the rules for well-defined use of pointers:
The value being loaded, is of the same type that was stored at the address.
The provenance of the calculated pointer is valid, being derived from a valid pointer by adding an offset, that gives a pointer still within the original storage instance.
There is no mixing of element types within the struct, that would generate padding to make an element offset unpredictable.
But I'm still not sure it's valid to explicitly calculate and use element pointers that way.
C is a low-level programming language. This code is well-defined but probably not portable.
It is not portable because it makes assumptions about the layout of the struct. In particular, you might run into fields being 64-bit aligned on a 64-bit platform where int is 32 bits.
A better way of doing it is to use the offsetof macro.
The C standard allows there to be arbitrary padding between elements of a struct (but not at the beginning of one). Real-world compilers won’t insert padding into a struct like that one, but the DeathStation 9000 is allowed to. If you want to do that portably, use the offsetof() macro from <stddef.h>.
*(int*)((char*)p + offsetof(struct foo, c))
is guaranteed to work. A difference, such as offsetof(struct foo, c) - offsetof(struct foo, b), is also well-defined. (Although, since offsetof() returns an unsigned value, it’s defined to wrap around to a large unsigned number if the difference underflows.)
In practice, of course, use &p->c.
An expression like the one in your original question is guaranteed to work for array elements, however, so long as you do not overrun your buffer. You can also generate a pointer one past the end of an array and compare that pointer to a pointer within the array, but dereferencing such a pointer is undefined behavior.
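The offsetof-based access described above can be written as a small function. This is a minimal sketch; the function name load_c is hypothetical, and struct foo is the one from the question.

```c
#include <assert.h>
#include <stddef.h>

struct foo {
    int a, b, c;
};

/* Hypothetical helper: returns p->c by explicit pointer arithmetic,
 * using offsetof instead of assuming there is no padding. */
int load_c(struct foo *p)
{
    assert(p != NULL);
    char *q = (char *)p + offsetof(struct foo, c);
    return *(int *)q;
}
```

In real code, p->c says the same thing more directly; the explicit form is mainly useful when the offset comes from a table.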
I think it likely that at least some authors of the Standard intended to allow a compiler given something like:
struct foo { unsigned char a[4], b[4]; } x;
int test(int i)
{
x.b[0] = 1;
x.a[i] = 2;
return x.b[0];
}
to generate code that would always return 1 regardless of the value of i. On the flip side, I think it is extremely likely that nearly all of the Committee would have intended that a function like:
struct foo { char a[4], b[4]; } x;
void put_byte(int);
void test2(unsigned char *p, int sz)
{
for (int i=0; i<sz; i++)
put_byte(p[i]);
}
be capable of outputting all of the bytes in x in a single invocation.
Clang and gcc will assume that any construct which applies the [] operator to a struct or union member will only be used to access elements of that member array, but the Standard defines the behavior of arrayLValue[index] as equivalent to (*((arrayLValue)+index)), and would define the address of x.a's first element, which is an unsigned char*, as equivalent to the address of x, cast to that type. Thus, if code calls test2((unsigned char*)&x, sizeof x), the expression p[i] would be equivalent to x.a[i], which clang and gcc would only support for subscripts in the range 0 to 3.
The only way I see of reading the Standard as satisfying both viewpoints would be to treat support for even the latter construct as a "quality of implementation" issue outside the Standard's jurisdiction, on the assumption that quality implementations would support constructs like the latter with or without a mandate, and there was thus no need to write sufficiently detailed rules to distinguish those two scenarios.

Structure of a book in C

I am new to C programming and in the development of this exercise I encountered this error that I cannot resolve:
Fields must have a constant size: 'variable length array in structure' extension will never be supported
#include <stdio.h>
#include <stdlib.h>
int main(int argc, const char * argv[]) {
int nChapters = 2;
typedef struct {
char title[50];
char author[50];
} Heading;
typedef struct {
char title[50];
int number_pages;
} Chapter;
typedef struct {
Heading heading;
Chapter chapters[nChapters]; //Fields must have a constant size: 'variable length array in structure' extension will never be supported
} Book;
printf("\n");
system("read -p 'Press enter to continue...' ");
printf("Hello, World!\n");
return 0;
}
If I replace chapters[nChapters] with an integer literal, as in chapters[2], the program runs without problems. Thanks in advance!
In C, an array member of a struct must be declared with a constant length, and your nChapters is indeed a variable. Note that adding the const keyword is not enough in C (unlike C++), because a const int is still not an integer constant expression:
const int nChapters = 2; // still not a constant expression in C
Instead, you can use the preprocessor directive #define:
#define nChapters 2
The issue is that you are assuming that, in
Chapter chapters[nChapters];
it is obvious to the compiler that the value of nChapters is 2.
It works that way for an array that is not within a struct or a union.
That case is supported by a feature called VLA, or variable length arrays, which is standard in C99 (optional since C11) and accepted by GCC as an extension in C90 mode. Using these, one can declare an array with automatic storage duration whose size is known only at run time.
Referring to the GCC documentation, section 6.20:
The storage is allocated at the point of declaration and deallocated when the block scope containing the declaration exits.
C99 offers a better way to deal with this requirement: a flexible array member.
§6.7.2.1 Structure and union specifiers
¶18 As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply. However, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.
So, that would change your struct to:
typedef struct {
Heading heading;
Chapter chapters[];
} Book;
And then allocate the memory dynamically from heap - using malloc.
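The allocation step might look like the sketch below. The nChapters field inside Book and the helper names book_new and book_chapter are assumptions added for the example; they are not part of the original code.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    char title[50];
    char author[50];
} Heading;

typedef struct {
    char title[50];
    int number_pages;
} Chapter;

typedef struct {
    Heading heading;
    size_t nChapters;     /* assumption: added so the count travels with the book */
    Chapter chapters[];   /* flexible array member, must be the last member */
} Book;

/* Hypothetical helper: allocate a zero-initialized Book with room for
 * n chapters. Returns NULL if the allocation fails. */
Book *book_new(size_t n)
{
    Book *book = calloc(1, sizeof *book + n * sizeof book->chapters[0]);
    if (book != NULL)
        book->nChapters = n;
    return book;
}

/* Hypothetical helper: bounds-checked access to chapter i. */
Chapter *book_chapter(Book *book, size_t i)
{
    assert(book != NULL && i < book->nChapters);
    return &book->chapters[i];
}
```

The whole book, chapters included, is released with one free(book).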
The size of the array member of the struct has to be a constant expression (skip "flexible member" case and GCC's VLA-in-struct extension).
In the C standard the only portable way to have a true named integer constant is using enums.
Just replace:
int nChapters = 2;
with this:
enum { nChapters = 2 };
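With the enum in place, the struct from the question should compile as written. A minimal sketch follows; the function name book_capacity is hypothetical and is there only to show that the array size is fixed at compile time.

```c
#include <assert.h>   /* provides the static_assert macro in C11 */
#include <stddef.h>

enum { nChapters = 2 };   /* a true integer constant expression */

typedef struct {
    char title[50];
    char author[50];
} Heading;

typedef struct {
    char title[50];
    int number_pages;
} Chapter;

typedef struct {
    Heading heading;
    Chapter chapters[nChapters];   /* accepted: the size is a constant */
} Book;

/* An enum constant can also be used in other compile-time contexts. */
static_assert(nChapters > 0, "a book needs at least one chapter");

/* Hypothetical helper: number of chapter slots in a Book. */
size_t book_capacity(void)
{
    Book b;   /* only used as a sizeof operand, never read */
    return sizeof b.chapters / sizeof b.chapters[0];
}
```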

Is it legal to implement inheritance in C by casting pointers between one struct that is a subset of another rather than first member?

Now I know I can implement inheritance by casting the pointer to a struct to the type of the first member of this struct.
However, purely as a learning experience, I started wondering whether it is possible to implement inheritance in a slightly different way.
Is this code legal?
#include <stdio.h>
#include <stdlib.h>
struct base
{
double some;
char space_for_subclasses[];
};
struct derived
{
double some;
int value;
};
int main(void) {
struct base *b = malloc(sizeof(struct derived));
b->some = 123.456;
struct derived *d = (struct derived*)(b);
d->value = 4;
struct base *bb = (struct base*)(d);
printf("%f\t%f\t%d\n", d->some, bb->some, d->value);
return 0;
}
This code seems to produce the desired results, but as we know, this is far from proving it is not UB.
The reason I suspect that such a code might be legal is that I can not see any alignment issues that could arise here. But of course this is far from knowing no such issues arise and even if there are indeed no alignment issues the code might still be UB for any other reason.
Is the above code valid?
If it's not, is there any way to make it valid?
Is char space_for_subclasses[]; necessary? With this line removed, the code still seems to behave.
As I read the standard, chapter §6.2.6.1/P5,
Certain object representations need not represent a value of the object type. If the stored
value of an object has such a representation and is read by an lvalue expression that does
not have character type, the behavior is undefined. [...]
So, as long as space_for_subclasses is a char (array-decays-to-pointer) member and you use it to read the value, you should be OK.
That said, to answer
Is char space_for_subclasses[]; necessary?
Yes, it is.
Quoting §6.7.2.1/P18,
As a special case, the last element of a structure with more than one named member may
have an incomplete array type; this is called a flexible array member. In most situations,
the flexible array member is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more trailing padding than
the omission would imply. However, when a . (or ->) operator has a left operand that is
(a pointer to) a structure with a flexible array member and the right operand names that
member, it behaves as if that member were replaced with the longest array (with the same
element type) that would not make the structure larger than the object being accessed; the
offset of the array shall remain that of the flexible array member, even if this would differ
from that of the replacement array. If this array would have no elements, it behaves as if
it had one element but the behavior is undefined if any attempt is made to access that
element or to generate a pointer one past it.
Remove that and you'd be accessing invalid memory, causing undefined behavior. However, in your case (the second snippet), you're not accessing value anyway, so that is not going to be an issue here.
This is more-or-less the same poor man's inheritance used by struct sockaddr, and it is not reliable with the current generation of compilers. The easiest way to demonstrate a problem is like this:
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
struct base
{
double some;
char space_for_subclasses[];
};
struct derived
{
double some;
int value;
};
double test(struct base *a, struct derived *b)
{
a->some = 1.0;
b->some = 2.0;
return a->some;
}
int main(void)
{
void *block = malloc(sizeof(struct derived));
if (!block) {
perror("malloc");
return 1;
}
double x = test(block, block);
printf("x=%g some=%g\n", x, *(double *)block);
return 0;
}
If a->some and b->some were allowed by the letter of the standard to be the same object, this program would be required to print x=2.0 some=2.0, but with some compilers and under some conditions (it won't happen at all optimization levels, and you may have to move test to its own file) it will print x=1.0 some=2.0 instead.
Whether the letter of the standard does allow a->some and b->some to be the same object is disputed. See http://blog.regehr.org/archives/1466 and the paper it links to.
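For contrast, the first-member approach mentioned in the question does have a guarantee behind it: the C standard says a pointer to a structure object, suitably converted, points to its initial member, and vice versa. Below is a minimal sketch of that layout; the function name derived_from_base is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct base {
    double some;
};

struct derived {
    struct base base;   /* must be the first member */
    int value;
};

/* Hypothetical helper: downcast, valid only when b really points to the
 * base member of a struct derived. */
struct derived *derived_from_base(struct base *b)
{
    assert(b != NULL);
    return (struct derived *)b;
}
```

Upcasting needs no cast at all (&d->base), and both types refer to the same struct base object, so the aliasing problem shown in test() does not arise for the shared field.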

How does the C offsetof macro work? [duplicate]

I read about this offsetof macro on the Internet, but it doesn't explain what it is used for.
#define offsetof(a,b) ((int)(&(((a*)(0))->b)))
What is it trying to do and what is the advantage of using it?
R.. is correct in his answer to the second part of your question: this code is not advised when using a modern C compiler.
But to answer the first part of your question, what this is actually doing is:
(
(int)( // 4.
&( ( // 3.
(a*)(0) // 1.
)->b ) // 2.
)
)
Working from the inside out, this is ...
1. Casting the value zero to the struct pointer type a*
2. Getting the struct field b of this (illegally placed) struct object
3. Getting the address of this b field
4. Casting the address to an int
Conceptually this is placing a struct object at memory address zero and then finding out what the address of a particular field is. This could allow you to figure out the offsets in memory of each field in a struct so you could write your own serializers and deserializers to convert structs to and from byte arrays.
Of course, if you actually dereferenced a zero pointer, your program would crash, but in practice everything happens in the compiler and no zero pointer is dereferenced at runtime.
On many of the systems that C originally ran on, an int was the same size as a pointer, so this actually worked.
It has no advantages and should not be used, since it invokes undefined behavior (and uses the wrong type - int instead of size_t).
The C standard defines an offsetof macro in stddef.h which actually works, for cases where you need the offset of an element in a structure, such as:
#include <stddef.h>
// Field type tags (illustrative; the original snippet left them undefined).
enum { INT, CHARPTR };
struct foo {
    int a;
    int b;
    char *c;
};
struct struct_desc {
    const char *name;
    int type;
    size_t off;
};
static const struct struct_desc foo_desc[] = {
    { "a", INT, offsetof(struct foo, a) },
    { "b", INT, offsetof(struct foo, b) },
    { "c", CHARPTR, offsetof(struct foo, c) },
};
which would let you programmatically fill the fields of a struct foo by name, e.g. when reading a JSON file.
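Filling a field by name from such a table might look like the sketch below. The enum field_type and the function foo_set_int are hypothetical names introduced for the example, and only int fields are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum field_type { INT, CHARPTR };   /* assumption: the type tags */

struct foo {
    int a;
    int b;
    char *c;
};

struct struct_desc {
    const char *name;
    int type;
    size_t off;
};

static const struct struct_desc foo_desc[] = {
    { "a", INT, offsetof(struct foo, a) },
    { "b", INT, offsetof(struct foo, b) },
    { "c", CHARPTR, offsetof(struct foo, c) },
};

/* Hypothetical helper: set an int field by name.
 * Returns 0 on success, -1 if no int field has that name. */
int foo_set_int(struct foo *f, const char *name, int value)
{
    assert(f != NULL && name != NULL);
    for (size_t i = 0; i < sizeof foo_desc / sizeof foo_desc[0]; i++) {
        if (foo_desc[i].type == INT && strcmp(foo_desc[i].name, name) == 0) {
            /* Copy the value to the field located off bytes into the struct. */
            memcpy((char *)f + foo_desc[i].off, &value, sizeof value);
            return 0;
        }
    }
    return -1;
}
```

A JSON reader could call foo_set_int(&obj, key, number) for each key it encounters, without a hand-written branch per field.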
It's finding the byte offset of a particular member of a struct. For example, if you had the following structure:
struct MyStruct
{
double d;
int i;
void *p;
};
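Then you'd have offsetof(struct MyStruct, d) == 0, offsetof(struct MyStruct, i) == 8, and offsetof(struct MyStruct, p) == 12 on a typical 32-bit platform (that is, the member named d is 0 bytes from the start of the structure, etc.). On a typical 64-bit platform, p would instead be at offset 16 because of alignment padding.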
The way that it works is it pretends that an instance of your structure exists at address 0 (the ((a*)(0)) part), and then it takes the address of the intended structure member and casts it to an integer. Although dereferencing an object at address 0 would ordinarily be an error, it's ok to take the address because the address-of operator & and the member dereference -> cancel each other out.
It's typically used for generalized serialization frameworks. If you have code for converting between some kind of wire data (e.g. bytes in a file or from the network) and in-memory data structures, it's often convenient to create a mapping from member name to member offset, so that you can serialize or deserialize values in a generic manner.
The implementation of the offsetof macro is really irrelevant.
The actual C standard defines it in 7.17, paragraph 3:
offsetof(type, member-designator)
which expands to an integer constant expression that has type size_t, the value of which is the offset in bytes, to the structure member (designated by member-designator), from the beginning of its structure (designated by type). The type and member designator shall be such that given static type t; then the expression &(t.member-designator) evaluates to an address constant.
Trust Adam Rosenfield's answer.
R is completely wrong, and it has many uses - especially being able to tell when code is non-portable among platforms.
(OK, it's C++, but we use it in static template compile time assertions to make sure our data structures do not change size between platforms/versions.)

Resources