void compute(int rows, int columns, double *data) {
    double (*data2D)[columns] = (double (*)[columns]) data;
    // do something with data2D
}
int main(void) {
    double data[25] = {0};
    compute(5, 5, data);
}
Sometimes, it'd be very convenient to treat a parameter as a multi-dimensional array, but it needs to be declared as a pointer into a flat array. Is it safe to cast the pointer to treat it as a multidimensional array, as compute does in the above example? I'm pretty sure the memory layout is guaranteed to work correctly, but I don't know if the standard allows pointers to be cast this way.
Does this break any strict aliasing rules? What about the rules for pointer arithmetic; since the data "isn't actually" a double[5][5], are we allowed to perform pointer arithmetic and indexing on data2D, or does it violate the requirement that pointer arithmetic not stray past the bounds of an appropriate array? Is data2D even guaranteed to point to the right place, or is it just guaranteed that we can cast it back and recover data? Standard quotes would be much appreciated.
I apologize in advance for a somewhat vague answer, as someone said these rules in the standard are quite hard to interpret.
C11 6.3.2.3 says
A pointer to an object type may be converted to a pointer to a
different object type. If the resulting pointer is not correctly
aligned for the referenced type, the behavior is undefined.
So the actual cast is fine, as long as both pointers have the same alignment.
And then regarding accessing the actual data through the pointer, C11 6.5 gives you a wall of gibberish text regarding "aliasing", which is quite hard to understand. I'll try to cite what I believe are the only relevant parts for this specific case:
"The effective type of an object for an access to its stored value is
the declared type of the object, if any." /--/
"An object shall have its stored value accessed only by an lvalue
expression that has one of the following types:
a type compatible with the effective type of the object,"
/--/
"an aggregate or union type that includes one of the aforementioned types among its
members"
(The above is sometimes referred to as the "strict aliasing rule", which isn't a formal C language term, but rather a term made up by compiler implementers.)
In this case, the effective type of the object is an array of 25 doubles. You are attempting to cast it to a pointer to an array of 5 doubles. Whether that counts as a type compatible with the effective type, or as an aggregate that includes the type, I'm not sure. But I'm quite sure it counts as one of those two valid cases.
So as far as I can see, this code doesn't violate 6.3.2.3 nor 6.5. I believe that the code is guaranteed to work fine and the behavior should be well-defined.
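If the reading above is right, the cast lets you index the flat buffer with two subscripts, where m[r][c] names the same double as data[r * columns + c]. A minimal sketch of that, assuming a C99-style variable-length row type (C11 made VLAs optional) and using a hypothetical fill_matrix helper rather than the question's compute:

```c
/* Hypothetical helper: view a flat buffer of rows * columns doubles as a
   rows x columns matrix through a pointer to a variable-length row type,
   then fill it row by row. */
void fill_matrix(int rows, int columns, double *data)
{
    double (*m)[columns] = (double (*)[columns]) data;

    for (int r = 0; r < rows; r++)
        for (int c = 0; c < columns; c++)
            m[r][c] = r * columns + c;   /* same slot as data[r * columns + c] */
}
```

Calling fill_matrix(3, 5, buf) on a double buf[15] should leave buf[i] == i for every i, which is exactly the layout assumption the question relies on.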
The safest way in such situations is to keep the elements in a flat one-dimensional array, but write accessor functions that read from and write to this array in a multidimensional way.
#include <stdio.h>
#include <string.h>

/* A const int is not a constant expression in C, so it cannot size a
   file-scope array; enumeration constants can. */
enum { rowCount = 10, columnCount = 10, dataSize = rowCount * columnCount };

double data[dataSize];

void setValue(const int x, const int y, double value)
{
    if (x >= 0 && x < columnCount && y >= 0 && y < rowCount) {
        data[x + y * columnCount] = value;
    }
}

double getValue(const int x, const int y)
{
    if (x >= 0 && x < columnCount && y >= 0 && y < rowCount) {
        return data[x + y * columnCount];
    } else {
        return 0.0;
    }
}

int main(void)
{
    memset(data, 0, sizeof data);   // redundant for a global, but harmless
    // set a value
    setValue(5, 2, 12.0);
    // get a value
    double value = getValue(2, 7);
    printf("%f\n", value);
    return 0;
}
The example uses global variables only for simplicity. You can either pass the data array as an additional parameter to the functions, or even create a context to work with.
In C++ you would wrap the data container in a class and use the two functions as its access methods.
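One way to do the "context" variant suggested above is to bundle the buffer with its dimensions and pass that to the accessors. This is only a sketch; struct grid and the grid_* functions are hypothetical names, and the bounds policy (ignore out-of-range writes, read 0.0) simply mirrors setValue/getValue:

```c
#include <stdlib.h>

/* Hypothetical context: the flat buffer plus its dimensions, passed
   explicitly instead of living in globals. */
struct grid {
    int rows, columns;
    double *cells;              /* rows * columns values, row-major */
};

/* Returns 0 on success, -1 if the allocation fails.
   calloc zero-fills, which reads as 0.0 on IEEE 754 platforms. */
int grid_init(struct grid *g, int rows, int columns)
{
    g->rows = rows;
    g->columns = columns;
    g->cells = calloc((size_t)rows * (size_t)columns, sizeof *g->cells);
    return g->cells ? 0 : -1;
}

void grid_free(struct grid *g)
{
    free(g->cells);
    g->cells = NULL;
}

/* Out-of-range writes are silently ignored. */
void grid_set(struct grid *g, int x, int y, double value)
{
    if (x >= 0 && x < g->columns && y >= 0 && y < g->rows)
        g->cells[x + y * g->columns] = value;
}

/* Out-of-range reads return 0.0. */
double grid_get(const struct grid *g, int x, int y)
{
    if (x >= 0 && x < g->columns && y >= 0 && y < g->rows)
        return g->cells[x + y * g->columns];
    return 0.0;
}
```

Because the dimensions travel with the data, several grids of different sizes can coexist, which the global version cannot do.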
Related
Is the following program valid? (In the sense of being well-defined by the ISO C standard, not just happening to work on a particular compiler.)
struct foo {
    int a, b, c;
};
int f(struct foo *p) {
    // should return p->c
    char *q = ((char *)p) + 2 * sizeof(int);
    return *((int *)q);
}
It follows at least some of the rules for well-defined use of pointers:
The value being loaded, is of the same type that was stored at the address.
The provenance of the calculated pointer is valid, being derived from a valid pointer by adding an offset, that gives a pointer still within the original storage instance.
There is no mixing of element types within the struct, that would generate padding to make an element offset unpredictable.
But I'm still not sure it's valid to explicitly calculate and use element pointers that way.
C is a low-level programming language. This code is well-defined but probably not portable.
It is not portable because it makes assumptions about the layout of the struct. In particular, you might run into fields being 64-bit aligned on a 64-bit platform where int is 32 bits.
A better way of doing it is to use the offsetof macro.
The C standard allows there to be arbitrary padding between elements of a struct (but not at the beginning of one). Real-world compilers won’t insert padding into a struct like that one, but the DeathStation 9000 is allowed to. If you want to do that portably, use the offsetof() macro from <stddef.h>.
*(int*)((char*)p + offsetof(struct foo, c))
is guaranteed to work. A difference, such as offsetof(struct foo, c) - offsetof(struct foo, b), is also well-defined. (Although, since offsetof() yields a size_t, which is unsigned, the difference is defined to wrap around to a large unsigned number if it underflows.)
In practice, of course, use &p->c.
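A sketch of the question's f() rewritten with offsetof (f_portable is a hypothetical name); it should agree with &p->c on any conforming implementation, whatever padding the compiler chooses:

```c
#include <stddef.h>

struct foo {
    int a, b, c;
};

/* Same idea as f() in the question, but the byte offset of c comes from
   offsetof, so any padding the implementation inserts is accounted for. */
int f_portable(struct foo *p)
{
    char *q = (char *)p + offsetof(struct foo, c);
    return *(int *)q;
}
```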
An expression like the one in your original question is guaranteed to work for array elements, however, so long as you do not overrun your buffer. You can also generate a pointer one past the end of an array and compare that pointer to a pointer within the array, but dereferencing such a pointer is undefined behavior.
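To illustrate the array case, here is a small sketch with hypothetical helpers: sum_ints forms the one-past-the-end pointer only to compare against it, and int_at does the same kind of char-pointer arithmetic as the question, but over an array where it is guaranteed to land on an element:

```c
#include <stddef.h>

/* Walks from the first element up to, but never through, the
   one-past-the-end pointer. */
long sum_ints(const int *arr, size_t n)
{
    const int *end = arr + n;      /* may be formed and compared, never dereferenced */
    long total = 0;

    for (const int *p = arr; p != end; p++)
        total += *p;               /* only in-bounds pointers are dereferenced */
    return total;
}

/* Byte-level equivalent of arr[i], in the style of the question's f();
   i must be less than the array length. */
int int_at(const int *arr, size_t i)
{
    const char *q = (const char *)arr + i * sizeof(int);
    return *(const int *)q;
}
```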
I think it likely that at least some authors of the Standard intended to allow a compiler given something like:
struct foo { unsigned char a[4], b[4]; } x;
int test(int i)
{
    x.b[0] = 1;
    x.a[i] = 2;
    return x.b[0];
}
to generate code that would always return 1 regardless of the value of i. On the flip side, I think it is extremely likely that nearly all of the Committee would have intended that a function like:
struct foo { unsigned char a[4], b[4]; } x;
void put_byte(int);
void test2(unsigned char *p, int sz)
{
    for (int i = 0; i < sz; i++)
        put_byte(p[i]);
}
be capable of outputting all of the bytes in x in a single invocation.
Clang and gcc will assume that any construct which applies the [] operator to a struct or union member will only be used to access elements of that member array, but the Standard defines the behavior of arrayLValue[index] as equivalent to (*((arrayLValue)+index)), and would define the address of x.a's first element, which is an unsigned char*, as equivalent to the address of x, cast to that type. Thus, if code calls test2((unsigned char*)&x, sizeof x), the expression p[i] would be equivalent to x.a[i], which clang and gcc would only support for subscripts in the range 0 to 3.
The only way I see of reading the Standard as satisfying both viewpoints would be to treat support for even the latter construct as a "quality of implementation" issue outside the Standard's jurisdiction, on the assumption that quality implementations would support constructs like the latter with or without a mandate, and there was thus no need to write sufficiently detailed rules to distinguish those two scenarios.
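For concreteness, the second scenario might look like the sketch below. put_byte is only declared in the original, so the recording definition here is a hypothetical stand-in, and the struct uses unsigned char members as in the first example. The call test2((unsigned char *)&x, (int)sizeof x) is the one that should emit every byte of x:

```c
#include <stddef.h>

struct foo { unsigned char a[4], b[4]; } x;

/* Hypothetical put_byte: records the bytes so they can be inspected. */
unsigned char out[sizeof x];
size_t out_len;

void put_byte(int b)
{
    if (out_len < sizeof out)
        out[out_len++] = (unsigned char)b;
}

/* Same loop as in the answer: reads sz bytes through a plain
   unsigned char pointer. */
void test2(unsigned char *p, int sz)
{
    for (int i = 0; i < sz; i++)
        put_byte(p[i]);
}
```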
I'm writing some library code that exposes a const pointer to users but during certain operations I need to change where this pointer points (behind the scenes switcheroo tricks). One idea I had to solve this problem without encountering UB or strict-aliasing violations was to use a union with a const member:
// the pointed-to objects (in production code, these are actually malloc'd blocks of mem)
int x = 0, y = 7;
typedef union { int * const cp; int * p; } onion;
onion o = { .cp = &x };
printf("%d\n", *o.cp); // <---------------------- prints: 0
o.p = &y;
printf("%d\n", *o.cp); // <---------------------- prints: 7
But I don't know if this is well-defined or not... anybody know if it is (or isn't) and why?
EDIT: I think I muddied the waters by mentioning I was building a library as lots of people have asked for clarifying details about that rather than answering the much simpler question I intended.
Below, I've simplified the code by changing the type from int* to just int and now my question is simply: is the following well-defined?
typedef union { int const cp; int p; } onion;
onion o = { .cp = 0 };
printf("%d\n", o.cp); // <---------------------- prints: 0
o.p = 7;
printf("%d\n", o.cp); // <---------------------- prints: 7
I think this is undefined as per C11 6.7.3 (equivalent paragraph is in all versions of the standard):
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.
o.cp is undoubtedly an object defined with a const-qualified type.
The modification of o.p does seem to me to count as an attempt to modify o.cp , since that is exactly why we are doing it!
Every programming book I've had told me the following.
static const int x = 7;
int *px = (int *)&x;
is not defined, but
static int x = 7;
const int *px1 = &x;
int *px2 = (int *)px1;
is defined. That is, you can always cast away the const-ness if the originating pointer (here the &x) wasn't const.
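A minimal sketch of the defined case (increment_via_cast is a hypothetical name): the object is declared without const, so writing through a pointer whose const was cast away is fine. The undefined variant, with static const int, is deliberately not exercised:

```c
/* The object itself is not const; only the pointer type adds it. */
static int value = 7;

int increment_via_cast(void)
{
    const int *ro = &value;     /* read-only view of a non-const object */
    int *rw = (int *)ro;        /* cast the const back off */
    *rw += 1;                   /* defined: value was not defined const */
    return value;
}
```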
Here I'm leaning on the lack of a contrary opinion from any quality source and not bothering to look up the standard (for which I'm not going to pay).
However, you're trying to export something const that isn't const. That is actually valid. The language allows
extern int *const p;
to be writable behind the scenes. The way to arrange this is to define it as int *p; in the file with the definition, so that file doesn't see it as const, and to carefully not include the const declaration in that file. This allows you to cast away the const with impunity. Writing to it would look like:
int x;
*((int **)&p) = &x;
Old compilers used to reject extern const volatile int machine_register; but modern compilers are fine.
If the interface is a const-declared pointer such as int *const (like you've indicated in your comment), then there's nothing you can do to change that that will not trigger UB.
If you're storing an int * somewhere (e.g., as static int *ip;) and are exposing its address via an int *const* pointer (e.g., int *const* ipcp = &ip;), then you can simply cast it back to (int**) (the original type of &ip in the example I gave) and use that to access the int* pointer.
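A sketch of that arrangement, with hypothetical names: the library owns a plain int * and hands out an int *const * to it; internally it casts back to int ** (the original type of &current) to retarget the pointer. The conversion from int ** to int *const * is implicit in C because it only adds const to the pointed-to type:

```c
static int first = 1, second = 2;
static int *current = &first;      /* the real pointer, never defined const */
int *const *exposed = &current;    /* read-only view handed to users */

/* Hypothetical library-internal switch: recover the original type and
   flip the target between the two objects. */
void switch_target(void)
{
    int **writable = (int **)exposed;
    *writable = (*writable == &first) ? &second : &first;
}
```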
The Standard uses the term "object" to refer to a number of concepts, including:
an exclusive association of a region of storage of static, automatic, or thread duration to a "stand-alone" named identifier, which will hold its value throughout its lifetime unless modified using an lvalue or pointer derived from it.
any region of storage identified by an lvalue.
Within block scope, a declaration struct s1 { int x,y; } v1; will cause the creation of an object called v1 which satisfies the first definition above. Within the lifetime of v1, no other named object which satisfies that definition will be observably associated with the same storage. An lvalue expression like v1.x would identify an object meeting the second definition, but not the first, since it would identify storage that is associated not just with the lvalue expression v1.x, but also with the named stand-alone object v1.
I don't think the authors of the Standard fully considered, or reached any sort of meaningful consensus on, the question of which meaning of "object" is described by the rule:
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.
It would certainly make sense that if an object of the first kind is defined with a const qualifier, the behavior of code that tries to modify it would be outside the Standard's jurisdiction. If one interprets the rule as applying more broadly to other kinds of objects as well, then actions that modify such objects within their lifetime would also fall outside the Standard's jurisdiction, but the Standard really doesn't meaningfully describe the lifetime of objects of the second type as being anything other than the lifetime of the underlying storage.
Interpreting the quoted text as applying only to objects of the first kind would yield clear and useful semantics; trying to apply it to other kinds of objects would yield semantics that are murkier. Perhaps such semantics could be useful for some purposes, but I don't see any advantage versus treating the text as applying to objects of the first type.
Now I know I can implement inheritance by casting the pointer to a struct to the type of the first member of this struct.
However, purely as a learning experience, I started wondering whether it is possible to implement inheritance in a slightly different way.
Is this code legal?
#include <stdio.h>
#include <stdlib.h>
struct base
{
    double some;
    char space_for_subclasses[];
};
struct derived
{
    double some;
    int value;
};
int main(void) {
    struct base *b = malloc(sizeof(struct derived));
    b->some = 123.456;
    struct derived *d = (struct derived*)(b);
    d->value = 4;
    struct base *bb = (struct base*)(d);
    printf("%f\t%f\t%d\n", d->some, bb->some, d->value);
    return 0;
}
This code seems to produce the desired results, but as we know this is far from proving it is not UB.
The reason I suspect that such code might be legal is that I cannot see any alignment issues that could arise here. But of course this is far from knowing that no such issues arise, and even if there are indeed no alignment issues, the code might still be UB for some other reason.
Is the above code valid?
If it's not, is there any way to make it valid?
Is char space_for_subclasses[]; necessary? Having removed this line the code still seems to be behaving itself
As I read the standard, chapter §6.2.6.1/P5,
Certain object representations need not represent a value of the object type. If the stored
value of an object has such a representation and is read by an lvalue expression that does
not have character type, the behavior is undefined. [...]
So, as long as space_for_subclasses is a char (array-decays-to-pointer) member and you use it to read the value, you should be OK.
That said, to answer
Is char space_for_subclasses[]; necessary?
Yes, it is.
Quoting §6.7.2.1/P18,
As a special case, the last element of a structure with more than one named member may
have an incomplete array type; this is called a flexible array member. In most situations,
the flexible array member is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more trailing padding than
the omission would imply. However, when a . (or ->) operator has a left operand that is
(a pointer to) a structure with a flexible array member and the right operand names that
member, it behaves as if that member were replaced with the longest array (with the same
element type) that would not make the structure larger than the object being accessed; the
offset of the array shall remain that of the flexible array member, even if this would differ
from that of the replacement array. If this array would have no elements, it behaves as if
it had one element but the behavior is undefined if any attempt is made to access that
element or to generate a pointer one past it.
Remove that and you'd be accessing invalid memory, causing undefined behavior. However, in your case (the second snippet), you're not accessing value anyway, so that is not going to be an issue here.
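For comparison, the conventional way to get this kind of inheritance, the "first member" approach mentioned in the question, embeds the base struct as the first member instead of duplicating its fields, so the conversion is backed by C11 6.7.2.1p15 rather than by a layout coincidence. A minimal sketch (derived_new and base_some are hypothetical helpers):

```c
#include <stdlib.h>

struct base {
    double some;
};

/* The base is embedded as the first member, so a pointer to the derived
   object, suitably converted, points to that member. */
struct derived {
    struct base base;
    int value;
};

struct derived *derived_new(double some, int value)
{
    struct derived *d = malloc(sizeof *d);
    if (d) {
        d->base.some = some;
        d->value = value;
    }
    return d;
}

/* Works on any "subclass" that embeds struct base first. */
double base_some(const struct base *b)
{
    return b->some;
}
```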
This is more-or-less the same poor man's inheritance used by struct sockaddr, and it is not reliable with the current generation of compilers. The easiest way to demonstrate a problem is like this:
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
struct base
{
    double some;
    char space_for_subclasses[];
};
struct derived
{
    double some;
    int value;
};
double test(struct base *a, struct derived *b)
{
    a->some = 1.0;
    b->some = 2.0;
    return a->some;
}
int main(void)
{
    void *block = malloc(sizeof(struct derived));
    if (!block) {
        perror("malloc");
        return 1;
    }
    double x = test(block, block);
    printf("x=%g some=%g\n", x, *(double *)block);
    return 0;
}
If a->some and b->some were allowed by the letter of the standard to be the same object, this program would be required to print x=2 some=2 (%g drops the trailing .0), but with some compilers and under some conditions (it won't happen at all optimization levels, and you may have to move test to its own file) it will print x=1 some=2 instead.
Whether the letter of the standard does allow a->some and b->some to be the same object is disputed. See http://blog.regehr.org/archives/1466 and the paper it links to.
I'm writing a C library that uses some simple object-oriented inheritance much like this:
struct Base {
    int x;
};
struct Derived {
    struct Base base;
    int y;
};
And now I want to pass a Derived* to a function that takes a Base* much like this:
int getx(struct Base *arg) {
    return arg->x;
}
int main() {
    struct Derived d;
    return getx(&d);
}
This works, and is typesafe of course, but the compiler doesn't know this. Is there a way to tell the compiler that this is typesafe? I'm focusing just on GCC and clang here so compiler-specific answers are welcome. I have vague memories of seeing some code that did this using __attribute__((inherits(Base)) or something of the sort but my memory could be lying.
This is safe in C except that you should cast the argument to Base *. The rule that prohibits aliasing (or, more precisely, that excludes it from being supported in standard C) is in C 2011 6.5, where paragraph 7 states:
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
…
This rule prevents us from taking a pointer to float, converting it to pointer to int, and dereferencing the pointer to int to access the float as an int. (More precisely, it does not prevent us from trying, but it makes the behavior undefined.)
It might seem that your code violates this since it accesses a Derived object using a Base lvalue. However, converting a pointer to Derived to a pointer to Base is supported by C 2011 6.7.2.1 paragraph 15, which states:
… A pointer to a structure object, suitably converted, points to its initial member…
So, when we convert the pointer to Derived to a pointer to Base, what we actually have is not a pointer to the Derived object using a different type than it is (which is prohibited) but a pointer to the first member of the Derived object using its actual type, Base, which is perfectly fine.
About the edit: Originally I stated function arguments would be converted to the parameter types. However, C 6.5.2.2 2 requires that each argument have a type that may be assigned to an object with the type of its corresponding parameter (with any qualifications like const removed), and 6.5.16.1 requires that, when assigning one pointer to another, they have compatible types (or meet other conditions not applicable here). Thus, passing a pointer to Derived to a function that takes a pointer to Base violates standard C constraints. However, if you perform the conversion yourself, it is legal. If desired, the conversion could be built into a preprocessor macro that calls the function, so that the code still looks like a simple function call.
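A sketch of the macro idea from the edit, assuming a wrapper named GETX (hypothetical): the explicit cast satisfies the assignment constraints of 6.5.16.1, and call sites still read like a plain function call. Note that the cast would equally accept unrelated pointer types, so this trades some type checking for convenience:

```c
struct Base {
    int x;
};

struct Derived {
    struct Base base;   /* must stay the first member for the cast to be valid */
    int y;
};

int getx(struct Base *arg)
{
    return arg->x;
}

/* Hypothetical wrapper: performs the Derived* -> Base* conversion
   explicitly so the call is constraint-clean. */
#define GETX(p) getx((struct Base *)(p))
```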
Give address of a base member (truly type-safe option):
getx(&d.base);
Or use void pointer:
int getx(void *arg) {
    struct Base *temp = arg;
    return temp->x;
}
int main() {
    struct Derived d;
    return getx(&d);
}
It works because C requires that there is never any padding before the first struct member. This won't increase type safety, but it removes the need for casting.
As noted above by user694733, you are probably best off conforming to standards and type safety by using the address of the base field, as in (repeated for future reference):
struct Base {
    int x;
};
struct Derived {
    int y;
    struct Base b; /* look ma, not the first field! */
};
struct Derived d = {0}, *pd = &d;
void getx(struct Base *b);
and now despite the base not being the first field you can still do
getx (&d.b);
or if you are dealing with a pointer
getx(&pd->b).
This is a very common idiom. You have to be careful if the pointer is NULL, however, because &pd->b just computes
(struct Base*)((char*)pd + offsetof(struct Derived, b))
so &((struct Derived*)NULL)->b becomes
((struct Base*)offsetof(struct Derived, b)) != NULL.
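Since &pd->b on a null pd cannot be relied on to give a null pointer, a helper that performs the upcast can check first. A minimal sketch, with derived_base as a hypothetical name:

```c
#include <stddef.h>

struct Base {
    int x;
};

struct Derived {
    int y;
    struct Base b;   /* deliberately not the first member */
};

/* Null-safe upcast: only forms &pd->b when pd actually points somewhere. */
struct Base *derived_base(struct Derived *pd)
{
    return pd ? &pd->b : NULL;
}
```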
IMO it is a missed opportunity that C has adopted anonymous structs but not adopted the plan9 anonymous struct model which is
struct Derived {
    int y;
    struct Base; /* look ma, no field name */
} d;
It allows you to just write getx(&d) and the compiler will adjust the Derived pointer to a base pointer i.e. it means exactly the same as getx(&d.b) in the example above. In other words it effectively gives you inheritance but with a very concrete memory layout model. In particular, if you insist on not embedding (== inheriting) the base struct at the top, you have to deal with NULL yourself. As you expect from inheritance it works recursively so for
struct TwiceDerived {
    struct Derived;
    int z;
} td;
you can still write getx(&td). Moreover, you may not need the getx as you can write d.x (or td.x or pd->x).
Finally, using the typeof and statement-expression gcc extensions, you can write a little macro for downcasting (i.e. casting to a more derived struct):
#define TO(T,p) \
({ \
    typeof(p) nil = (T*)0; \
    (T*)((char*)(p) - ((char*)nil - (char*)0)); \
})
so you can do things like
struct Base b = {0}, *pb = &b;
struct Derived* pd = TO(struct Derived, pb);
which is useful if you try to do virtual functions with function pointers.
On gcc you can use/experiment with the plan 9 extensions with -fplan9-extensions. Unfortunately it does not seem to have been implemented on clang.
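For comparison, a downcast that needs neither typeof nor the plan9 extensions is the container_of idiom known from the Linux kernel. This is a different technique from the TO macro: it names the member explicitly and subtracts its offsetof. The macro name CONTAINER_OF below is hypothetical, and the result is only meaningful if the pointer really points at that member of such a struct:

```c
#include <stddef.h>

struct Base {
    int x;
};

struct Derived {
    int y;
    struct Base b;
};

/* Step back from the member to the start of the enclosing struct. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))
```

Usage mirrors the TO example: given struct Base *pb = &d.b, CONTAINER_OF(pb, struct Derived, b) yields &d.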
Why does this work:
#include <sys/types.h>
#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>   // for atoi
typedef struct x {
    int a;
    int b[128];
} x_t;
int function(int i)
{
    size_t a;
    a = offsetof(x_t, b[i]);
    return a;
}
int main(int argc, char **argv)
{
    if (argc < 2)
        return 1;     // expects the index as the first argument
    printf("%d\n", function(atoi(argv[1])));
}
If I remember the definition of offsetof correctly, it's a compile time construct. Using 'i' as the array index results in a non-constant expression. I don't understand how the compiler can evaluate the expression at compile time.
Why isn't this flagged as an error?
The C standard does not require this to work, but it likely works in some C implementations because offsetof(type, member) expands to something like:
type t;                        // Declare an object of type "type".
char *start = (char *) &t;     // Find starting address of object.
char *p = (char *) &t.member;  // Find address of member.
p - start;                     // Evaluate offset from start to member.
I have separated the above into parts to display the essential logic. The actual implementation of offsetof would be different, possibly using implementation-dependent features, but the core idea is that the address of a fictitious or temporary object would be subtracted from the address of the member within the object, and this results in the offset. It is designed to work for members but, as an unintended effect, it also works (in some C implementations) for elements of arrays in structures.
It works for these elements simply because the construction used to find the address of a member also works to find the address of an element of an array member, and the subtraction of the pointers works in a natural way.
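The same subtraction can be written out with an ordinary object to see that it matches offsetof. This is only an illustration of the idea, not how any particular <stddef.h> defines the macro, and manual_offset is a hypothetical name:

```c
#include <stddef.h>

typedef struct x {
    int a;
    int b[128];
} x_t;

/* Offset of b[i] computed "by hand": address of the element minus the
   address of the enclosing object. No member is read, only addressed.
   Valid for 0 <= i <= 128. */
size_t manual_offset(int i)
{
    x_t t;
    char *start = (char *)&t;
    char *p = (char *)&t.b[i];
    return (size_t)(p - start);
}
```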
it's a compile time construct
AFAICS, there are no such constraints. All the standard says is:
[C99, 7.17]:
The macro...
offsetof(type, member-designator)
...
The type and member designator shall be such that given
static type t;
then the expression &(t.member-designator) evaluates to an address constant.
offsetof (type,member)
Return member offset: This macro with functional form returns the offset value in bytes of member member in the data structure or union type type.
http://www.cplusplus.com/reference/cstddef/offsetof/
(C, C++98 and C++11 standards)
I think I understand this now.
The offsetof() macro does not evaluate to a constant, it evaluates to a run-time expression that returns the offset. Thus as long as type.member is valid syntax, the compiler doesn't care what it is. You can use arbitrary expressions for the array index. I had thought it was like sizeof and had to be constant at compile time.
There has been some confusion on what exactly is permitted as a member-designator. Here are two papers I am aware of:
DR 496
Offsetof for Pointers to Members
However, even quite old versions of GCC, clang, and ICC support calculating array elements with dynamic offset. Based on Raymond's blog I guess that MSVC has long supported it too.
I believe it is based on pragmatism. For those not familiar, the "struct hack" and flexible array members use variable-length data in the last member of a struct:
#include <stddef.h>
#include <stdlib.h>
struct string {
    size_t size;
    const char data[];
};
This type is often allocated with something like this:
struct string *string_alloc(size_t size) {
    struct string *s = malloc(offsetof(struct string, data[size]));
    if (s)
        s->size = size;
    return s;
}
Admittedly, this latter part is just a theory. It's such a useful optimization that I imagine that initially it was permitted on purpose for such cases, or it was accidentally supported and then found to be useful for exactly such cases.
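The payoff of offsetof(struct string, data[size]) over sizeof(struct string) + size is that it does not include any trailing padding that sizeof(struct string) may add. A sketch under the same assumption as above (that the compiler accepts a non-constant index in offsetof, as GCC and Clang do); string_from is a hypothetical helper, and data is non-const here so it can be filled:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

struct string {
    size_t size;
    char data[];                  /* flexible array member */
};

/* Allocates exactly up to the end of data[size] and copies the text in.
   No terminating '\0' is stored; size carries the length. */
struct string *string_from(const char *text)
{
    size_t size = strlen(text);
    struct string *s = malloc(offsetof(struct string, data[size]));
    if (s) {
        s->size = size;
        memcpy(s->data, text, size);
    }
    return s;
}
```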