C: does the assignment operator deep copy? - c

For scalar values, the assignment operator seems to copy the right-hand side value to the left. How does that work for composite data types? Eg, if I have a nested struct
struct inner {
int b;
};
struct outer {
struct inner a;
};
int main() {
struct outer s1 = { .a = {.b=1}};
struct outer s2 = s1;
}
does the assignment recursively deep copy the values?
does the same happen when passing the struct to a function?
By experimenting it seems like it does, but can anyone point to the specification of the behavior?

There is no "recursion"; it copies all the (value) bits of the value. Pointers are not magically followed of course, the assignment operator wouldn't know how to duplicate the pointed-to data.
You can think of
a = b;
as shorthand for
memcpy(&a, &b, sizeof a);
The sizeof is misleading of course, since we know the types are the same on both sides but I don't think __typeof__ helps.
The draft C11 spec says (in 6.5.16.1 Simple assignment, paragraph 2):
In simple assignment (=), the value of the right operand is converted to the
type of the assignment expression and replaces the value stored in the object
designated by the left operand.

does the assignment recursively deep copy the values?
Yes, just as if you would have used memcpy. Pointers are copied, but not what they point at. The term "deep copy" often means: also copy what the pointers point at (for example in a C++ copy constructor).
Except the values of any padding bytes may hold indeterminate values. (Meaning that memcmp on a struct might be unsafe.)
does the same happen when passing the struct to a function?
Yes. See the reference to 6.5.2.2 below.
By experimenting it seems like it does, but can anyone point to the specification of the behavior?
C17 6.5.16:
An assignment operator stores a value in the object designated by the left operand. An
assignment expression has the value of the left operand after the assignment, but is not
an lvalue. The type of an assignment expression is the type the left operand would have
after lvalue conversion.
(Lvalue conversion in this case isn't relevant, since both structs must be of 100% identical and compatible types. Simply put: two structs are compatible if they have exactly the same members.)
C17 6.5.16.1 Simple assignment:
the left operand has an atomic, qualified, or unqualified version of a structure or union
type compatible with the type of the right;
C17 6.5.2.2 Function calls, §7:
If the expression that denotes the called function has a type that does include a prototype,
the arguments are implicitly converted, as if by assignment, ...

Related

Can you proof why casting is important when I deference a void pointer?

Why is necessary do casting when I dereference a void pointer?
I have this example:
int x;
void* px = &x;
*px = 9;
Can you proof why this don't work?
By definition, a void pointer points to an I'm-not-sure-what-type-of-object.
By definition, when you use the unary * operator to access the object pointed to by a pointer, you must know (well, the compiler must know) what the type of the object is.
So we have just proved that we cannot directly dereference a void pointer using *; we must always explicitly cast the void pointer to some actual object pointer type first.
Now, in many people's minds, the "obvious" answer to "what type does/should a 'generic' pointer point to?" is "char". And, once upon a time, before the void type had been invented, character pointers were routinely used as "generic" pointers. So some compilers (including, notably, gcc) extend things a bit and let you do more (such as pointer arithmetic) with a void pointer than the standard requires.
So that might explain how code like that in your question might be able to "work". (In your case, though, since the pointed-to type was actually int, not char, if it "worked" it was only because you were on a little-endian machine.)
...And with that said, I find that the code in your question does not work for me, not even under gcc. It first gives me a non-fatal warning:
warning: dereferencing ‘void *’ pointer
But then it changes its mind and decides this is an error instead:
error: invalid use of void expression
A second compiler I tried said something similar:
error: incomplete type 'void' is not assignable
Addendum: To say a little more about why the pointed-to type is reuired when you dereference a pointer:
When you access a pointer using *, the compiler is going to emit code to fetch from (or maybe store to) the pointed-to location. But the compiler is going to have to emit code that accesses a certain number of bytes, and in many cases it may matter how those byte(s) are interpreted. Both the number and the interpretation of the bytes is determined by the type (that's what types are for), which is precisely why an actual, non-void type is required.
One of the best ways I know of appreciating this requirement is to consider code like
*p + 1
or, even better
*p += 1
If p points to a char, the compiler is probably going to emit some kind of an addb ("add byte") instruction.
If p points to an int, the compiler is going to emit an ordinary add instruction.
If p points to a float or double, the compiler is going to emit a floating-point addition instruction. And so on.
But if p is a void *, the compiler has no idea what to do. It complains (in the form of an error message) not just because the C standard says you can't dereference a void pointer, but more importantly, because the compiler simply doesn't know what to do with your code.
In short:
The target of an assignment expression must be a modifiable lvalue, which cannot be a void expression. This is because the void type does not represent any values - it denotes an absence of a value. You cannot create an object of type void.
If the expression px has type void *, then the expression *px has type void. Attempting to assign to *px is a constraint violation and the compiler is required to yell at you for it.
If you want to assign a new value to x through px, then you have to cast px to an int * before dereferencing:
*((int *)px) = 5;
Chapter and verse:
6.2.5 Types
...
19 The void type comprises an empty set of values; it is an incomplete object type that
cannot be completed.
...
6.3.2.1 Lvalues, arrays, and function designators
1 An lvalue is an expression (with an object type other than void) that potentially
designates an object;64) if an lvalue does not designate an object when it is evaluated, the
behavior is undefined. When an object is said to have a particular type, the type is
specified by the lvalue used to designate the object. A modifiable lvalue is an lvalue that
does not have array type, does not have an incomplete type, does not have a const-qualified type, and if it is a structure or union, does not have any member (including,
recursively, any member or element of all contained aggregates or unions) with a const-qualified type.
...
6.3.2.2 void
1 The (nonexistent) value of a void expression (an expression that has type void) shall not
be used in any way, and implicit or explicit conversions (except to void) shall not be
applied to such an expression. If an expression of any other type is evaluated as a void
expression, its value or designator is discarded. (A void expression is evaluated for its
side effects.)
...
6.3.2.3 Pointers
1 A pointer to void may be converted to or from a pointer to any object type. A pointer to
any object type may be converted to a pointer to void and back again; the result shall
compare equal to the original pointer.
...
6.5.3.2 Address and indirection operators
...
4 The unary * operator denotes indirection. If the operand points to a function, the result is
a function designator; if it points to an object, the result is an lvalue designating the
object. If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’. If an
invalid value has been assigned to the pointer, the behavior of the unary * operator is
undefined.102)
...
6.5.16 Assignment operators
...
Constraints
2 An assignment operator shall have a modifiable lvalue as its left operand.
More specifically, dereferencing a void pointer violates the wording of 6.5.3.2 Address and indirection operators, paragraph 4:
The unary * operator denotes indirection. If the operand points to a function, the result is a function designator; if it points to an object, the result is an lvalue designating the object. If the operand has type ''pointer to type'', the result has type ''type''. If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.
Since a pointer to void has no "type" - it can't be dereferenced. Note that this is beyond undefined behavior - it is a violation of the C language standard.
It probably doesn't work because it violates a rule in the ISO C standard which requires a diagnostic, and (I'm guessing) your compiler is treating that as a fatal situation.
According to ISO C99, as well as the C11 Draft (n1548), the only constraint on the use of the * dereferencing operator is "[t]he operand of the unary *operator shall have pointer type." [6.5.3.2¶2, n1548] The code we have here meets that constraint, and has no syntax error. Therefore no diagnostic is required for the use of the * operator.
However, what is the meaning of the * operator applied to a void * pointer?
"The unary * operator denotes indirection. If the operand points to a function, the result is a function designator; if it points to an object, the result is an lvalue designating the object. If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’. [6.5.3.2¶4, n1548]
The type void is neither a function nor an object type, so the middle sentence, which talks about producing a function or object designator, is not applicable to our case. The last sentence quoted above is applicable; it gives a requirement that an expression which dereferences a void * has void type.
Thus *px = 9; runs aground because it's assigning an int value to a void expression. An assignment requires an lvalue expression of object type; void is not an object type and the expression is certainly not an lvalue. The exact wording of the constraint is: "An assignment operator shall have a modifiable lvalue as its left operand." [6.5.16¶2, n1548] Violation of this constraint requires a diagnostic.
It appears from my perhaps naive reading of the standard that the expression *px as such is valid; only no attempt must be made to extract a result from it, or use it as the target of an assignment. If that is true, it could be used as an expression statement whose value is discarded: if (foo()) { *px; }, and it could be redundantly cast to void also: (void) *px. These apparently pointless situations might be somehow exploited by, or at least arise in, certain kinds of macros.
For instance, if we want to be sure that the argument of some macro is a pointer we can take advantage of the constraint that * requires a pointer operand:
#define MAC(NUM, PTR) ( ... (void) *(PTR) ...)
I.e. somewhere in the macro we dereference the pointer and throw away the result, which will diagnose if PTR isn't a pointer. It looks like ISO C allows this usage even if PTR is a void *, which is arguably useful.

can a struct assignment overlap similar to memmove() or is struct assignment like memcpy()?

There are two different functions for making a copy of a data area in the Standard C library, memmove() used for overlapping memory areas and memcpy() for disjoint, non-overlapping memory areas.
What does the C standards say about struct assignment as in:
struct thing myThing = {0};
struct thing *pmyThing = &myThing;
myThing = *pmyThing; // assign myThing to itself through a pointer dereference.
Does the struct assignment follow the rules for memmove() or for memcpy() or its own rules so far as overlapping memory areas are concerned?
Section 6.5.16.1 ("Simple assignment") of the C standard (reading from draft N1548) states:
In simple assignment (=), the value of the right operand is converted to the type of the assignment expression and replaces the value stored in the object designated by the left operand.
If the value being stored in an object is read from another object that overlaps in any way the storage of the first object, then the overlap shall be exact and the two objects shall have qualified or unqualified versions of a compatible type; otherwise, the behavior is undefined.
The C standard doesn't specify how the compiler implements the simple assignment. But overlap between the source and destination are permitted if the overlap is exact and the types are compatible. Self-assignment (whether through a pointer or not) meets this requirement, and thus the behavior is well-defined.

Invalid parse of ternary expression VC2008?

The following code gives me a compiler warning
warning C4133: ':' : incompatible types - from 'YTYPE *' to 'XTYPE *'
however, the expession seems OK to me. Any ideas?
struct XTYPE {
int x;
long y;
};
struct YTYPE {
long y;
int x;
};
extern void *getSomething(void);
void Test(void)
{
int b= 0;
struct XTYPE *pX;
struct YTYPE *pY;
void * (*pfFoo)(void);
pfFoo= getSomething;
if (b ? (pX= (*pfFoo)()) // error
: (pY= (*pfFoo)()) )
{
;
}
if (b ? ((pX= (*pfFoo)())!=0) // no error
: ((pY= (*pfFoo)())!=0) )
{
;
}
}
It's a constraint violation, simply put. To begin with, the type of an assignment expression is determined by the left hand side. So your case sees struct XTYPE* and struct YTYPE*.
6.5.16 Assignment operators - p3
An assignment operator stores a value in the object designated by the
left operand. An assignment expression has the value of the left
operand after the assignment,111) but is not an lvalue. The type of an
assignment expression is the type the left operand would have after
lvalue conversion. The side effect of updating the stored value of the
left operand is sequenced after the value computations of the left and
right operands. The evaluations of the operands are unsequenced.
And the types of the operands for a conditional expression must satisfy this constraint:
6.5.15 Conditional operator - p3
One of the following shall hold for the second and third operands:
both operands have arithmetic type;
both operands have the same structure or union type;
both operands have void type;
both operands are pointers to qualified or unqualified versions of compatible types;
one operand is a pointer and the other is a null pointer constant; or
one operand is a pointer to an object type and the other is a pointer to a qualified or unqualified version of void.
Since struct XTYPE* and struct YTYPE* are not pointers to compatible types (the only bullet which may even apply), and are in fact just pointers to unrelated types, your program is ill-formed.
A major point of contention here is that MSVC isn't a conforming C compiler (not C11 anyway). But the above rules haven't changed much since the last C version that MSVC does support, so there you have it.
Although you aren't using it, the compiler has to make a sensible type for the result of the tertiary operation by looking at the two RHS paths.
In the first case you gave it 2 unrelated pointer types X and Y, which it refuses to resolve directly to bool for the if statement. (bool doesn't exist as a specific type in C, so it would need to use some int type)
In the second case you have done this cast to bool, by comparing with 0/NULL/nullptr, though you have switched the logic from if set to if clear.
In c++ we have the concept of inheritance and related types, so the compiler can often find a common type to resolve these sorts of statements. In my experience different compilers have treated this situation differently, but with strict compliance settings it should always fail unless one of the operands is a subtype of the other.
One way I used to fix these issues in the old days was to use "double-pling" !! as a cast to boolean keeping the original sense:
if (b ? !!(pX= (*pfFoo)())
: !!(pY= (*pfFoo)()) )
{
;
}
The point of the double-pling is that it can be read as a single operation with a clear intent, while anything else has to be decoded by the reader.

Dereferencing null pointer valid and working (not sizeof) [duplicate]

Code sample:
struct name
{
int a, b;
};
int main()
{
&(((struct name *)NULL)->b);
}
Does this cause undefined behaviour? We could debate whether it "dereferences null", however C11 doesn't define the term "dereference".
6.5.3.2/4 clearly says that using * on a null pointer causes undefined behaviour; however it doesn't say the same for -> and also it does not define a -> b as being (*a).b ; it has separate definitions for each operator.
The semantics of -> in 6.5.2.3/4 says:
A postfix expression followed by the -> operator and an identifier designates a member
of a structure or union object. The value is that of the named member of the object to
which the first expression points, and is an lvalue.
However, NULL does not point to an object, so the second sentence seems underspecified.
Also relevant might be 6.5.3.2/1:
Constraints:
The operand of the unary & operator shall be either a function designator, the result of a
[] or unary * operator, or an lvalue that designates an object that is not a bit-field and is
not declared with the register storage-class specifier.
However I feel that the bolded text is defective and should read lvalue that potentially designates an object , as per 6.3.2.1/1 (definition of lvalue) -- C99 messed up the definition of lvalue, so C11 had to rewrite it and perhaps this section got missed.
6.3.2.1/1 does say:
An lvalue is an expression (with an object type other than void) that potentially
designates an object; if an lvalue does not designate an object when it is evaluated, the
behavior is undefined
however the & operator does evaluate its operand. (It doesn't access the stored value but that is different).
This long chain of reasoning seems to suggest that the code causes UB however it is fairly tenuous and it's not clear to me what the writers of the Standard intended. If in fact they intended anything, rather than leaving it up to us to debate :)
From a lawyer point of view, the expression &(((struct name *)NULL)->b); should lead to UB, since you could not find a path in which there would be no UB. IMHO the root cause is that at a moment you apply the -> operator on an expression that does not point to an object.
From a compiler point of view, assuming the compiler programmer was not overcomplicated, it is clear that the expression returns the same value as offsetof(name, b) would, and I'm pretty sure that provided it is compiled without error any existing compiler will give that result.
As written, we could not blame a compiler that would note that in the inner part you use operator -> on an expression than cannot point to an object (since it is null) and issue a warning or an error.
My conclusion is that until there is a special paragraph saying that provided it is only to take its address it is legal do dereference a null pointer, this expression is not legal C.
Yes, this use of -> has undefined behavior in the direct sense of the English term undefined.
The behavior is only defined if the first expression points to an object and not defined (=undefined) otherwise. In general you shouldn't search more in the term undefined, it means just that: the standard doesn't provide a meaning for your code. (Sometimes it points explicitly to such situations that it doesn't define, but this doesn't change the general meaning of the term.)
This is a slackness that is introduced to help compiler builders to deal with things. They may defined a behavior, even for the code that you are presenting. In particular, for a compiler implementation it is perfectly fine to use such code or similar for the offsetof macro. Making this code a constraint violation would block that path for compiler implementations.
Let's start with the indirection operator *:
6.5.3.2 p4:
The unary * operator denotes indirection. If the operand points to a function, the result is
a function designator; if it points to an object, the result is an lvalue designating the
object. If the operand has type "pointer to type", the result has type "type". If an
invalid value has been assigned to the pointer, the behavior of the unary * operator is
undefined. 102)
*E, where E is a null pointer, is undefined behavior.
There is a footnote that states:
102) Thus, &*E is equivalent to E (even if E is a null pointer), and &(E1[E2]) to ((E1)+(E2)). It is
always true that if E is a function designator or an lvalue that is a valid operand of the unary &
operator, *&E is a function designator or an lvalue equal to E. If *P is an lvalue and T is the name of
an object pointer type, *(T)P is an lvalue that has a type compatible with that to which T points.
Which means that &*E, where E is NULL, is defined, but the question is whether the same is true for &(*E).m, where E is a null pointer and its type is a struct that has a member m?
C Standard doesn't define that behavior.
If it were defined, new problems would arise, one of which is listed below. C Standard is correct to keep it undefined, and provides a macro offsetof that handles the problem internally.
6.3.2.3 Pointers
An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant. 66) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
This means that an integer constant expression with the value 0 is converted to a null pointer constant.
But the value of a null pointer constant is not defined as 0. The value is implementation defined.
7.19 Common definitions
The macros are
NULL
which expands to an implementation-defined null pointer constant
This means C allows an implementation where the null pointer will have a value where all bits are set and using member access on that value will result in an overflow which is undefined behavior
Another problem is how do you evaluate &(*E).m? Do the brackets apply and is * evaluated first. Keeping it undefined solves this problem.
First, let's establish that we need a pointer to an object:
6.5.2.3 Structure and union members
4 A postfix expression followed by the -> operator and an identifier designates a member
of a structure or union object. The value is that of the named member of the object to
which the first expression points, and is an lvalue.96) If the first expression is a pointer to
a qualified type, the result has the so-qualified version of the type of the designated
member.
Unfortunately, no null pointer ever points to an object.
6.3.2.3 Pointers
3 An integer constant expression with the value 0, or such an expression cast to type
void *, is called a null pointer constant.66) If a null pointer constant is converted to a
pointer type, the resulting pointer, called a null pointer, is guaranteed to compare unequal
to a pointer to any object or function.
Result: Undefined Behavior.
As a side-note, some other things to chew over:
6.3.2.3 Pointers
4 Conversion of a null pointer to another pointer type yields a null pointer of that type.
Any two null pointers shall compare equal.
5 An integer may be converted to any pointer type. Except as previously specified, the
result is implementation-defined, might not be correctly aligned, might not point to an
entity of the referenced type, and might be a trap representation.67)
6 Any pointer type may be converted to an integer type. Except as previously specified, the
result is implementation-defined. If the result cannot be represented in the integer type,
the behavior is undefined. The result need not be in the range of values of any integer
type.
67) The mapping functions for converting a pointer to an integer or an integer to a pointer are intended to be consistent with the addressing structure of the execution environment.
So even if the UB should happen to be benign this time, it might still result in some totally unexpected number.
Nothing in the C standard would impose any requirements on what a system could do with the expression. It would, when the standard was written, have been perfectly reasonable for it to to cause the following sequence of events at runtime:
Code loads a null pointer into the addressing unit
Code asks the addressing unit to add the offset of field b.
The addressing unit trigger a trap when attempting to add an integer to a null pointer (which should for robustness be a run-time trap, even though many systems don't catch it)
The system starts executing essentially random code after being dispatched through a trap vector that was never set because code to set it would have wasted been a waste of memory, as addressing traps shouldn't occur.
The very essence of what Undefined Behavior meant at the time.
Note that most of the compilers that have appeared since the early days of C would regard the address of a member of an object located at a constant address as being a compile-time constant, but I don't think such behavior was mandated then, nor has anything been added to the standard which would mandate that compile-time address calculations involving null pointers be defined in cases where run-time calculations would not.
No. Let's take this apart:
&(((struct name *)NULL)->b);
is the same as:
struct name * ptr = NULL;
&(ptr->b);
The first line is obviously valid and well defined.
In the second line, we calculate the address of a field relative to the address 0x0 which is perfectly legal as well. The Amiga, for example, had the pointer to the kernel in the address 0x4. So you could use a method like this to call kernel functions.
In fact, the same approach is used on the C macro offsetof (wikipedia):
#define offsetof(st, m) ((size_t)(&((st *)0)->m))
So the confusion here revolves around the fact that NULL pointers are scary. But from a compiler and standard point of view, the expression is legal in C (C++ is a different beast since you can overload the & operator).

Why can this C code run correctly? [duplicate]

This question already has answers here:
Why does sizeof(x++) not increment x?
(10 answers)
Closed 7 years ago.
The C code likes this:
#include <stdio.h>
#include <unistd.h>
#define DIM(a) (sizeof(a)/sizeof(a[0]))
struct obj
{
int a[1];
};
int main()
{
struct obj *p = NULL;
printf("%d\n",DIM(p->a));
return 0;
}
This object pointer p is NULL, so, i think this p->a is illegal.
But i have tested this code in Ubuntu14.04, it can execute correctly. So, I want to know why...
Note: the original code had int a[0] above but I've changed that to int a[1] since everyone seems to be hung up on that rather than the actual question, which is:
Is the expression sizeof(p->a) valid when p is equal to NULL?
Because sizeof is a compile time construction, it does not depend on evaluating the input. sizeof(p->a) gets evaluated based on the declared type of the member p::a solely, and becomes a constant in the executable. So the fact that p points to null makes no difference.
The runtime value of p plays absolutely no role in the expression sizeof(p->a).
In C and C++, sizeof is an operator and not a function. It can be applied to either a type-id or an expression. Except in the case that of an expression and the expression is a variable-length array (new in C99) (as pointed out by paxdiablo), the expression is an unevaluated operand and the result is the same as if you had taken sizeof against the type of that expression instead. (C.f. C11 references due to paxdiablo below, C++14 working draft 5.3.3.1)
First up, if you want truly portable code, you shouldn't be attempting to create an array of size zero1, as you did in your original question, now fixed. But, since it's not really relevant to your question of whether sizeof(p->a) is valid when p == NULL, we can ignore it for now.
From C11 section 6.5.3.4 The sizeof and _Alignof operators (my bold):
2/ The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.
Therefore no evaluation of the operand is done unless it's a variable length array (which your example is not). Only the type itself is used to figure out the size.
1 For the language lawyers out there, C11 states in 6.7.6.2 Array declarators (my bold):
1/ In addition to optional type qualifiers and the keyword static, the [ and ] may delimit an expression or *. If they delimit an expression (which specifies the size of an array), the expression shall have an integer type. If the expression is a constant expression, it shall have a value greater than zero.
However, since that's in the constraints section (where shall and shall not do not involve undefined behaviour), it simply means the program itself is not strictly conforming. It's still covered by the standard itself.
This code contains a constraint violation in ISO C because of:
struct obj
{
int a[0];
};
Zero-sized arrays are not permitted anywhere. Therefore the C standard does not define the behaviour of this program (although there seems to be some debate about that).
The code can only "run correctly" if your compiler implements a non-standard extension to allow zero-sized arrays.
Extensions must be documented (C11 4/8), so hopefully your compiler's documentation defines its behaviour for struct obj (a zero-sized struct?) and the value of sizeof p->a, and whether or not sizeof evaluates its operand when the operand denotes a zero-sized array.
sizeof() doesn't care a thing about the content of anything, it merely looks at the resulting type of the expression.
Since C99 and variable length arrays, it is computed at run time when a variable length array is part of the expression in the sizeof operand.Otherwise, the operand is not evaluated and the result is an integer constant
Zero-size array declarations within structs was never permitted by any C standard, but some older compilers allowed it before it became standard for compilers to allow incomplete array declarations with empty brackets(flexible array members).

Resources