Is accessing an array within a temporary variable undefined behaviour? - c

typedef struct test{
char c_arr[1];
} test;
test array[1] = {{1}};
test get(int index){
return array[index];
}
int main(){
char* a = get(0).c_arr;
return *a;
}
In this post the behaviour is explained in C++: Accessing an array within a struct causes warnings with clang
Above code does not cause any warnings or errors when compiling with gcc or clang.
Is get(0).c_arr returning a pointer to a temporary variable which gets destroyed at the end of the expression? If so, is dereferencing it and returning its value UB? And if it is, what would be a good way to fix it, maybe this?
test* get(int index){
return &array[index];
}

char* a = get(0).c_arr;
return *a;
is clearly UB; the memory that a points to has been released from the stack by the time the return statement runs, and I would be really annoyed if it were otherwise. Imagine if somebody wrote:
while (true) {
char a = get(0).c_arr[0];
//...
a = get(0).c_arr[0];
//...
a = get(0).c_arr[0];
//...
a = get(0).c_arr[0];
//...
a = get(0).c_arr[0];
//...
}
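If every one of those temporaries had to stay alive until the end of the block, the wasted stack space would stink, and embedded and kernel programmers would be up in arms about it.
However, the following is valid in C: char a = get(0).c_arr[0]; This is because the temporary persists until the end of the full expression that contains it, which is long enough to read the element.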

Yes, it's undefined behavior. The relevant section of the C17 standard is "6.2.4 Storage durations of objects":
The lifetime of an object is the portion of program execution during which storage is guaranteed
to be reserved for it. An object exists, has a constant address, and retains its last-stored value
throughout its lifetime. If an object is referred to outside of its lifetime, the behavior is undefined.
A non-lvalue expression with structure or union type, where the structure or union contains a
member with array type (including, recursively, members of all contained structures and unions)
refers to an object with automatic storage duration and temporary lifetime. Its lifetime begins
when the expression is evaluated and its initial value is the value of the expression. Its lifetime ends
when the evaluation of the containing full expression ends.
The expression get(0) is not an lvalue, and struct test contains c_arr, a member with array type, so it has temporary lifetime. This means that return *a; is UB because it accesses it outside of its lifetime.
Also, a big hint that this isn't allowed is that had you used char c; instead of char c_arr[1];, then char* a = &get(0).c; would have been a compile-time error, since you can't take the address of an rvalue, and what you wrote is basically morally equivalent to trying to do that.
I filed GCC bug 101358 and LLVM bug 51002 about not receiving warnings for this.
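To make the lifetime rule concrete, here is a minimal sketch of three ways to read the array member without touching the temporary after its lifetime has ended. The struct mirrors the one in the question but is renamed test_t, and the function names (get_copy, get_ptr, read_in_expression, read_via_named_copy, read_via_pointer) are made up for this illustration.

```c
#include <assert.h>

/* Same layout as the question's struct; test_t is a hypothetical name. */
typedef struct test_t {
    char c_arr[1];
} test_t;

test_t array[1] = {{{1}}};

/* Returns a copy: the call expression is a non-lvalue with temporary lifetime. */
test_t get_copy(int index) {
    assert(index == 0);
    return array[index];
}

/* Returns a pointer into static storage, which outlives every caller. */
test_t *get_ptr(int index) {
    assert(index == 0);
    return &array[index];
}

/* Option 1: read the element inside the same full expression. */
char read_in_expression(void) {
    return get_copy(0).c_arr[0];
}

/* Option 2: copy the temporary into a named object, then point into that. */
char read_via_named_copy(void) {
    test_t copy = get_copy(0);
    char *a = copy.c_arr;   /* copy lives until the end of this block */
    return *a;
}

/* Option 3: never create a temporary; point at the original object. */
char read_via_pointer(void) {
    char *a = get_ptr(0)->c_arr;
    return *a;
}
```

Option 3 is the pointer-returning fix proposed in the question; note that the member access then has to use -> rather than the dot operator.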

Related

Union with const and non-const members

I'm writing some library code that exposes a const pointer to users but during certain operations I need to change where this pointer points (behind the scenes switcheroo tricks). One idea I had to solve this problem without encountering UB or strict-aliasing violations was to use a union with a const member:
// the pointed-to objects (in production code, these are actually malloc'd blocks of mem)
int x = 0, y = 7;
typedef union { int * const cp; int * p; } onion;
onion o = { .cp = &x };
printf("%d\n", *o.cp); // <---------------------- prints: 0
o.p = &y;
printf("%d\n", *o.cp); // <---------------------- prints: 7
But I don't know if this is well-defined or not... anybody know if it is (or isn't) and why?
EDIT: I think I muddied the waters by mentioning I was building a library as lots of people have asked for clarifying details about that rather than answering the much simpler question I intended.
Below, I've simplified the code by changing the type from int* to just int and now my question is simply: is the following well-defined?
typedef union { int const cp; int p; } onion;
onion o = { .cp = 0 };
printf("%d\n", o.cp); // <---------------------- prints: 0
o.p = 7;
printf("%d\n", o.cp); // <---------------------- prints: 7
I think this is undefined as per C11 6.7.3 (equivalent paragraph is in all versions of the standard):
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.
o.cp is undoubtedly an object defined with a const-qualified type.
The modification of o.p does seem to me to count as an attempt to modify o.cp , since that is exactly why we are doing it!
Every programming book I've had told me the following.
static const int x = 7;
int *px = (int *)&x;
is not defined, but
static int x = 7;
const int *px1 = &x;
int *px2 = (int *)px1;
is defined. That is, you can always cast away the const-ness if the originating pointer (here the &x) wasn't const.
Here I'm leaning on the lack of a contrary opinion from any quality source and not bothering to look up the standard (for which I'm not going to pay).
However you're trying to export something const that isn't const. That is actually valid. The language allows for
extern int *const p;
to be writable behind the scenes. The trick is that the file containing the definition must not see it as const: define it there as int *p; and carefully do not include the declaration in the file containing the definition. This allows you to cast away the const with impunity. Writing to it would look like:
int x;
*((int **)&p) = &x;
Old compilers used to reject extern const volatile machine_register; but modern compilers are fine.
If the interface is a const-declared pointer such as int *const (like you've indicated in your comment), then there's nothing you can do to change that that will not trigger UB.
If you're storing an int * somewhere (e.g., as a static int *ip;) and are exposing its address via an int *const* pointer (e.g., int *const* ipcp = &ip;), then you can simply cast it back to (int**) (the original type of &ip from the example I gave) and use that to access the int* pointer.
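A minimal sketch of that last arrangement, assuming the library owns a plain int * and only hands out an int *const * view of it. The names target_a, target_b, view, retarget and peek are hypothetical and not taken from the question.

```c
#include <assert.h>

int target_a = 0;
int target_b = 7;

/* The real object is a plain, modifiable pointer. */
int *ip = &target_a;

/* Users only ever see this read-only view of ip. */
int *const *const view = &ip;

/* Library-side switch: cast back to the original type of &ip.
 * This is fine because ip itself was not defined const. */
void retarget(int *const *handle, int *new_target) {
    assert(handle == &ip);   /* this sketch has exactly one handle */
    *(int **)handle = new_target;
}

/* User-side read through the const view. */
int peek(int *const *handle) {
    return **handle;
}
```

Reading through view before and after a call to retarget(view, &target_b) yields 0 and then 7, which is the behaviour the union was meant to provide, without any object being defined with a const-qualified type.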
The Standard uses the term "object" to refer to a number of concepts, including:
an exclusive association of a region of storage of static, automatic, or thread duration to a "stand-alone" named identifier, which will hold its value throughout its lifetime unless modified using an lvalue or pointer derived from it.
any region of storage identified by an lvalue.
Within block scope, a declaration struct s1 { int x,y; } v1; will cause the creation of an object called v1 which satisfies the first definition above. Within the lifetime of v1, no other named object which satisfies that definition will be observably associated with the same storage. An lvalue expression like v1.x would identify an object meeting the second definition, but not the first, since it would identify storage that is associated not just with the lvalue expression v1.x, but also with the named stand-alone object v1.
I don't think the authors of the Standard fully considered, or reached any sort of meaningful consensus on, the question of which meaning of "object" is described by the rule:
If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined.
It would certainly make sense that if an object of the first kind is defined with a const qualifier, the behavior of code that tries to modify it would be outside the Standard's jurisdiction. If one interprets the rule as applying more broadly to other kinds of objects as well, then actions that modify such objects within their lifetime would also fall outside the Standard's jurisdiction, but the Standard really doesn't meaningfully describe the lifetime of objects of the second type as being anything other than the lifetime of the underlying storage.
Interpreting the quoted text as applying only to objects of the first kind would yield clear and useful semantics; trying to apply it to other kinds of objects would yield semantics that are murkier. Perhaps such semantics could be useful for some purposes, but I don't see any advantage versus treating the text as applying to objects of the first type.

Can the address of a variable with automatic storage duration be taken in its definition?

Is it allowed to take the address of an object on the right hand-side of its definition, as happens in foo() below:
typedef struct { char x[100]; } chars;
chars make(void *p) {
printf("p = %p\n", p);
chars c;
return c;
}
void foo(void) {
chars b = make(&b);
}
If it is allowed, is there any restriction on its use, e.g., is printing it OK, can I compare it to another pointer, etc?
In practice it seems to compile on the compilers I tested, with the expected behavior most of the time (but not always), but that's far from a guarantee.
To answer the question in the title, with your code sample in mind, yes it may. The C standard says as much in §6.2.4:
The lifetime of an object is the portion of program execution during
which storage is guaranteed to be reserved for it. An object exists,
has a constant address, and retains its last-stored value throughout
its lifetime.
For such an object that does not have a variable length array type,
its lifetime extends from entry into the block with which it is
associated until execution of that block ends in any way.
So yes, you may take the address of a variable from the point of declaration, because the object has the address at this point and it's in scope. A condensed example of this is the following:
void *p = &p;
It serves very little purpose, but is perfectly valid.
As for your second question, what you can do with it: I can mostly say I wouldn't use that address to access the object until initialization is complete, because the order of evaluation for expressions in initializers is left unspecified (§6.7.9). You can easily find your foot shot off.
One place where this does come in handy is when defining all sorts of tabular data structures that need to be self-referential. For instance:
typedef struct tab_row {
// Useful data
struct tab_row *p_next;
} row;
row table[3] = {
[1] = { /*Data 1*/, &table[0] },
[2] = { /*Data 2*/, &table[1] },
[0] = { /*Data 0*/, &table[2] },
};
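The table above is schematic. A compilable variant might look like the sketch below, assuming a single int field stands in for the "useful data"; the data values and the ring_sum helper are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct tab_row {
    int data;                 /* stand-in for the useful data */
    struct tab_row *p_next;
} row;

/* Each initializer refers to the table that is being defined. */
row table[3] = {
    [1] = { 10, &table[0] },
    [2] = { 20, &table[1] },
    [0] = {  0, &table[2] },
};

/* Walks the ring once, starting and ending at start. */
int ring_sum(const row *start) {
    assert(start != NULL);
    int sum = 0;
    const row *r = start;
    do {
        sum += r->data;
        r = r->p_next;
    } while (r != start);
    return sum;
}
```

Only the addresses of the rows are used inside the initializer, never their values, so the unspecified evaluation order mentioned above does not matter here.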
6.2.1 Scopes of identifiers
Structure, union, and enumeration tags have scope that begins just after the appearance of
the tag in a type specifier that declares the tag. Each enumeration constant has scope that
begins just after the appearance of its defining enumerator in an enumerator list. Any
other identifier has scope that begins just after the completion of its declarator.
In
chars b = make(&b);
// ^^
the declarator is b, so it is in scope in its own initializer.
6.2.4 Storage durations of objects
For such an [automatic] object that does not have a variable length array type, its lifetime extends
from entry into the block with which it is associated until execution of that block ends in
any way.
So in
{ // X
chars b = make(&b);
}
the lifetime of b starts at X, so by the time the initializer executes, it is both alive and in scope.
As far as I can tell, this is effectively identical to
{
chars b;
b = make(&b);
}
There's no reason you couldn't use &b there.
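As a small check of that reasoning, the sketch below records the pointer that make receives and compares it with &b afterwards. It assumes make only stores the pointer and never reads through it; seen_address and address_matches are hypothetical names added for the demonstration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { char x[100]; } chars;

/* Records the pointer passed to make(); never dereferenced. */
const void *seen_address = NULL;

chars make(void *p) {
    seen_address = p;
    chars c = {{0}};
    return c;
}

/* Returns 1 if the address taken inside the initializer is b's address. */
int address_matches(void) {
    chars b = make(&b);   /* b is in scope and alive inside its own initializer */
    assert(b.x[0] == 0);
    return seen_address == (const void *)&b;
}
```

Because an object has a constant address throughout its lifetime, the comparison is expected to hold. What the pointer must not be used for inside make is reading b, since b has no determinate value yet at that point.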
The question has already been answered, but for reference, it doesn't make much sense. This is how you would write the code:
typedef struct { char x[100]; } chars;
chars make (void) {
chars c;
/* init c */
return c;
}
void foo(void) {
chars b = make();
}
Or perhaps preferably in case of an ADT or similar, return a pointer to a malloc:ed object. Passing structs by value is usually not a good idea.

Can we reuse allocated memory

This is a follow up to this question.
When explaining my problem, I declared that allocated memory could be reused because it has no declared type, and I was told that it was incorrect C.
Here is a code example illustrating the question:
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <string.h>
struct Elt {
int id;
char name[32];
};
struct Elt2 {
double val;
char name[16];
};
static_assert(sizeof(struct Elt2) < sizeof(struct Elt), "Incorrect sizes");
int main() {
struct Elt actual1 = { 1, "foo"};
struct Elt2 actual2 = {2.0, "bar"};
struct Elt* elt = malloc(sizeof(struct Elt));
memcpy(elt, &actual1, sizeof(*elt)); // populates the allocated memory
printf("elt: %d %s\n", elt->id, elt->name);
struct Elt2 *elt2 = (void *) elt; // declares a new pointer to a shorter type
memcpy(elt2, &actual2, sizeof(*elt2)); // effective type is now struct Elt2
printf("elt2: %g %s\n", elt2->val, elt2->name);
//printf("elt: %d %s\n", elt->id, elt->name); UB: storage now contains an Elt2 object
free(elt); // only legal use for elt
return 0;
}
I believe that 6.5 Expression §6 of draft n1570 allows it:
The effective type of an object for an access to its stored value is the declared type of the
object, if any.87) If a value is stored into an object having no declared type through an
lvalue having a type that is not a character type, then the type of the lvalue becomes the
effective type of the object for that access and for subsequent accesses that do not modify
the stored value. If a value is copied into an object having no declared type using
memcpy or memmove, or is copied as an array of character type, then the effective type
of the modified object for that access and for subsequent accesses that do not modify the
value is the effective type of the object from which the value is copied, if it has one.
with note 87:
87) Allocated objects have no declared type.
Question:
Is reusing allocated memory for storing a different object that can fit in that memory conformant C?
If not, that would be catastrophic. Many people use such tricks to implement their own fine-grained memory management on top of malloc.
So, yes, this is exactly what the paragraph in the standard that you are citing is about. Notice that it chooses its words carefully. It says
If a value is stored into an object having no declared type ...
this property of having no declared type doesn't change through the lifetime of the object, so the provision applies at any time a new value is written into it.
If, for some weird reason, the committee had wanted to say that the effective type is only changeable once, they would have said something like
If a value is stored into an object having no effective type ...
The only correct answer is one that is fully derived from the standard.
Without going through the standard, I would say, "Yes, your assumption is correct". I say that because without it, it would not be possible to implement your own memory manager. I think not even malloc could be implemented in C without it.
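The reuse pattern from the question can be condensed into one function, as in the sketch below. It follows the question's approach of copying from declared objects with memcpy so that the effective type of the block follows the source object; reuse_block is a hypothetical name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct Elt  { int id;     char name[32]; };
struct Elt2 { double val; char name[16]; };

_Static_assert(sizeof(struct Elt2) <= sizeof(struct Elt),
               "the second type must fit in the block");

/* Stores an Elt in a malloc'd block, then reuses the block for an Elt2.
 * Returns the Elt2 value read back, or -1.0 if allocation fails. */
double reuse_block(void) {
    const struct Elt  actual1 = { 1, "foo" };
    const struct Elt2 actual2 = { 2.0, "bar" };

    void *block = malloc(sizeof(struct Elt));
    if (block == NULL)
        return -1.0;

    struct Elt *elt = block;
    memcpy(elt, &actual1, sizeof *elt);    /* effective type: struct Elt */
    assert(elt->id == 1);

    struct Elt2 *elt2 = block;
    memcpy(elt2, &actual2, sizeof *elt2);  /* effective type: struct Elt2 */
    double result = elt2->val;             /* elt must not be read from here on */

    free(block);
    return result;
}
```

After the second memcpy the block is only read through elt2, which matches the commented-out line in the question's code.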

Is it legal to use memcpy with a destination structure with constant members?

For example, is the following function legal:
typedef struct two_int {
const int a, b;
} two_int;
void copy_two(const two_int *src, two_int *dest) {
memcpy(dest, src, sizeof(two_int));
}
It seems like at least some types of modifications of constant-defined values is not allowed, but it is not clear to me if this qualifies.
If the answer is "it is not allowed, in general", I'm also wondering about the special case where dest is newly allocated memory with malloc (and hence hasn't yet been assigned any value), such as:
two_int s = {.a = 1, .b = 2};
two_int *d = malloc(sizeof(two_int));
copy_two(&s, d);
Update: It seems the latter question has been answered in the affirmative (it's OK) for the case of a newly malloc'd structure, but the original, more general question still stands, I think.
Use of memcpy for such purposes would have defined behavior only if the actual destination object does not have static or automatic duration.
Given the code:
struct foo { double const x; };
void outsideFunction(struct foo *p);
double test(void)
{
struct foo s1 = {0.12345678};
outsideFunction(&s1);
return 0.12345678;
}
a compiler would be entitled to optimize the function to:
double test(void)
{
static struct foo const s1 = {0.12345678};
outsideFunction(&s1);
return s1.x;
}
On many processors, the only way to load a double with an arbitrary constant is to load its value from an object holding that constant, and in this case the compiler would conveniently know an object (s1.x) which must hold the constant 0.12345678. The net effect would be that code which used memcpy to write to s1.x could corrupt other uses of the numeric constant 0.12345678. As the saying goes, "variables won't; constants aren't". Nasty stuff.
The problem would not exist for objects of allocated duration because memcpy requires a compiler to "forget" everything about what had previously been stored in the destination storage, including any "effective type". The declared types of static and automatic objects exist independent of anything being stored into them, and cannot be erased by "memcpy" nor any other means, but allocated-duration objects only have an "effective type" which would get erased.
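For the allocated-storage case this answer describes as safe, a minimal sketch could look as follows. clone_two is a hypothetical helper, and the destination is always fresh malloc'd memory rather than a static or automatic object.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct two_int {
    const int a, b;
} two_int;

/* Copies *src into newly allocated storage, which has no declared type.
 * Returns NULL if allocation fails; the caller frees the result. */
two_int *clone_two(const two_int *src) {
    assert(src != NULL);
    two_int *dest = malloc(sizeof *dest);
    if (dest != NULL)
        memcpy(dest, src, sizeof *dest);
    return dest;
}
```

Calling clone_two on a two_int initialised with .a = 1 and .b = 2 gives a heap copy with the same member values. Using memcpy to overwrite a two_int that was declared as a local or static variable is the case the answer warns about.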
First note, that almost anything is allowed, in the wider sense of the word: It might simply have undefined behavior. You can take an arbitrary number, reinterpret_cast<> it as an address, and write to that address - perfectly "legal" C++. As for what it does - usually, no guarantees. Here be dragons.
However, as indicated in an answer to this question:
behavior of const_cast in C++
if the structure you're pointing to was not originally defined as const, and simply got const'ified on the way to your code, then const_cast'ing the reference or pointer to it and setting the pointed-to values is required to succeed as you might expect.

Plain old C: accessing member of a function returning struct

Please could you help me with a question of "legitimacy".
Assuming that foo() is a function returning a structure, is it officially acceptable to assign a member of the structure directly to a variable, for example
x = foo().member
Both the GNU-C compiler and an embedded C compiler (Keil) accept this without any grumbles, but is this actually legitimate according to the official C standard or is it just a relaxed attitude of these particular compilers? If it's legit, has it always been legit or is it a recent development?
Here's an example that compiles and runs OK:
typedef struct
{
int a;
int b;
} footype;
footype testfoo(void)
{
footype n;
n.a = 1;
n.b = 2;
return n;
}
int main()
{
printf("\nTest = %d\n", testfoo().a);
return 0;
}
Yes, this is standard C syntax.
The testfoo function returns a temporary instance of the structure. This instance can be used at once as shown in the example, and the temporary instance is valid until the full expression (i.e. the printf call in your case) is done.
It's equivalent to
{
footype temp = testfoo();
printf("\nTest = %d\n", temp.a);
}
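The two forms can be put side by side, as in the sketch below. member_direct, member_via_temp and check_forms are hypothetical names, and testfoo is the function from the question.

```c
#include <assert.h>

typedef struct {
    int a;
    int b;
} footype;

footype testfoo(void) {
    footype n;
    n.a = 1;
    n.b = 2;
    return n;
}

/* Member access applied directly to the returned value. */
int member_direct(void) {
    return testfoo().a;
}

/* The same read spelled out with an explicit named temporary. */
int member_via_temp(void) {
    footype temp = testfoo();
    return temp.a;
}

/* Both forms read the same member of the same returned value. */
void check_forms(void) {
    assert(member_direct() == member_via_temp());
}
```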
It's a standard case of expression evaluation result.
The flow:
testfoo().a
is a standard C expression where:
testfoo() result, what the function returns, is in our case a struct footype.
Then the compiler applies the . operator to this struct.
The dot operator accesses the field a and creates a new result.
The final expression result is used in the printf().
This is always valid.
I.e.:
int *bar(void); // bar is a function that returns an int pointer
...
i = bar()[10]; // We access the element at index 10 of the array pointed to by bar()
Out of interest I compiled your example and looked at the assembly generated. First, every compiler can solve this in its own way. I use VC2008.
In your example, the compiler returned the struct in its registers.
I changed the struct to include a large string, so the compiler would run out of registers. It now changed the signature of the function to receive a pointer, and had allocated an instance of the struct on the stack (caller's stackframe), then called the function with a pointer to this struct. The function copied the result to the struct on the stack using this pointer.
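The transformation described above can be imitated by hand in C. The sketch below is only an analogy for what that compiler did and not a statement about any particular ABI; bigfoo, make_big, make_big_lowered and call_lowered are hypothetical names.

```c
#include <assert.h>
#include <string.h>

/* A struct that is too large to be returned in registers. */
typedef struct {
    char text[256];
    int a;
} bigfoo;

/* Source-level form: returns the struct by value. */
bigfoo make_big(void) {
    bigfoo n;
    memset(&n, 0, sizeof n);
    n.a = 1;
    return n;
}

/* Hand-written analogue of the lowered form: the caller passes
 * a pointer to a slot in its own stack frame. */
void make_big_lowered(bigfoo *result) {
    assert(result != NULL);
    memset(result, 0, sizeof *result);
    result->a = 1;
}

/* Caller side of the lowered form. */
int call_lowered(void) {
    bigfoo slot;               /* lives in the caller's stack frame */
    make_big_lowered(&slot);
    return slot.a;
}
```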

Resources