It should set the current character to the next character. For example:
while (*foo) {
    if (baa(*foo)) *foo++ = *++foo;
    foo++;
}
But I get the following errors:
error: operation on ‘foo’ may be undefined [-Werror=sequence-point]
cc1: all warnings being treated as errors
Can anyone explain why? Isn't that valid C syntax?
You're incrementing foo on both sides of the assignment, with no sequence points in between. That's not allowed; you can only modify a value once between sequence points.
Let's take a closer look at this expression:
*foo++ = *++foo
*foo++ evaluates as *(foo++) (postfix ++ has higher precedence than unary *); you take the current value of foo, dereference it, and advance foo as a side effect. *++foo evaluates as *(++foo) (unary * and prefix ++ have the same precedence and group right-to-left); foo is advanced again as a side effect, and you dereference the incremented value. Then you assign the result of the second expression to the object designated by the first.
The problem is that the exact order in which all of those side effects are applied (assignment, postincrement, and preincrement) is unspecified; the compiler is free to reorder those operations as it sees fit. Because of this, expressions of the form *x++ = *++x will give different results with different compilers, or with the same compiler under different settings, or even depending on the surrounding code.
The language standard explicitly calls this out as undefined behavior so that compiler implementors are free to handle the situation any way they see fit, with no requirement to try to do the "right thing" (whatever the "right thing" may be). GCC obviously issues a diagnostic in this case, but it doesn't have to. For one thing, not all cases are as easy to detect as this one. Imagine a function like
void bar(int *a, int *b)
{
    *a++ = *++b;
}
Is this a problem? Only if a and b point to the same thing, but if the caller is in a separate translation unit, there's no way to know that at compile time.
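As for the original loop, here is a sketch of one well-defined way to write it, assuming the intent is simply to copy the following character over the current one when baa() matches; each object is modified only once per statement:

while (*foo) {
    if (baa(*foo))
        *foo = *(foo + 1);   /* read the next character, write the current one */
    foo++;                   /* advance exactly once per iteration */
}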
Related
I wrote something similar to this in my code:
const int x=1;
int *ptr;
ptr = &x;
*ptr = 2;
Does this work on all compilers? Why doesn't the GCC compiler notice that we are changing a constant variable?
const actually doesn't mean "constant". Something that's "constant" in C has a value that's determined at compile time; a literal 42 is an example. The const keyword really means read-only. Consider, for example:
const int r = rand();
The value of r is not determined until program execution time, but the const keyword means that you're not permitted to modify r after it's been initialized.
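A hedged illustration of the distinction: a const-qualified variable still isn't an integer constant expression, so it can't be used where C requires one, for example as a case label.

#include <stdio.h>

int main(void)
{
    const int n = 10;
    int x = 10;
    (void)n;                 /* silence "unused variable" warnings */

    switch (x) {
    case 10:                 /* fine: 10 is an integer constant expression */
        printf("ten\n");
        break;
    /* case n: break; */     /* error: in C, a const-qualified int is not a
                                constant expression, so it can't label a case */
    }
    return 0;
}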
In your code:
const int x=1;
int *ptr;
ptr = &x;
*ptr = 2;
the assignment ptr = &x; is a constraint violation, meaning that a conforming compiler is required to complain about it; you can't legally assign a const int* (pointer to const int) value to a non-const int* object. If the compiler generates an executable (which it needn't do; it could just reject it), then the behavior is not defined by the C standard.
For example, the generated code might actually store the value 2 in x -- but then a later reference to x might yield the value 1, because the compiler knows that x can't have been modified after its initialization. And it knows that because you told it so, by defining x as const. If you lie to the compiler, the consequences can be arbitrarily bad.
Actually, the worst thing that can happen is that the program behaves as you expect it to; that means you have a bug that's very difficult to detect. (But the diagnostic you should have gotten will have been a large clue.)
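To make that concrete, here is a minimal sketch; the cast is only there to silence the required diagnostic, and the write through ptr is still undefined behavior, so any output is possible:

#include <stdio.h>

int main(void)
{
    const int x = 1;
    int *ptr = (int *)&x;   /* the cast silences the diagnostic, but writing
                               through ptr is still undefined behavior */
    *ptr = 2;
    printf("%d %d\n", x, *ptr);   /* might print "1 2", "2 2", or anything else */
    return 0;
}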
Online C 2011 draft:
6.7.3 Type qualifiers
...
6 If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined. If an attempt is made to refer to an object defined with a volatile-qualified type through use of an lvalue with non-volatile-qualified type, the behavior is undefined. 133)

133) This applies to those objects that behave as if they were defined with qualified types, even if they are never actually defined as objects in the program (such as an object at a memory-mapped input/output address).
Emphasis added.
Since the behavior is left undefined, the compiler is not required to issue a diagnostic, nor is it required to halt translation. This would be difficult to catch in the general case; suppose you had a function like
void foo( int *p ) { *p = ...; }
defined in its own separate translation unit. During translation, the compiler has no way of knowing whether p could be pointing to a const-qualified object or not. If your call is something like
const int x;
foo( &x );
you may get a warning like parameter 1 of 'foo' discards qualifiers or something similarly illuminating.
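For concreteness, a minimal two-file sketch of that scenario (the file names and the value 42 are just for illustration):

/* foo.c -- a separate translation unit; the compiler sees only a plain int* here */
void foo(int *p)
{
    *p = 42;
}

/* main.c */
void foo(int *p);

int main(void)
{
    const int x = 1;
    foo(&x);    /* diagnostic here: passing &x discards the const qualifier */
    return 0;
}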
Also note that the const qualifier doesn't necessarily mean that the associated variable will be stored in read-only memory, so it's possible the above code would "work" (update the value in x) in that you'd successfully update x by doing an end-run around the const semantics. But then you might as well just not declare x to be const.
There is a good discussion of this here: Does the evil cast get trumped by the evil compiler?
I would expect gcc to compile this because:
ptr is allowed to point to x, otherwise reading it would be impossible, although as the comment below says it's not exactly brilliant code and the compiler should complain. Warning options will (I guess) affect whether or not it's actually warned about.
when you write to x, the compiler no longer "knows" that it is writing to a const, all this is in the coders hands in C. However, it does really know, so it may well warn you, depending on the warning options you've selected.
whether it works or not, however, will depend on how the code is compiled, the way const is implemented for the compile options selected and the target CPU and architecture. It may work. Or it may crash. Or you may write to a "random" bit of memory and cause (or not) some freaky effect. It's not a good coding strategy, but that wasn't your question :-)
Bad programmer. No Moon Pie!
If you need to modify a const, copy it to a non-const variable and then work with it. It's const for a reason. Trying to "sneak" around a const can cause serious runtime issues, e.g. the optimizer may have used the value inline.
const int x=1;
int non_const_x = x;
non_const_x = 2;
Why is there a difference between the next two code segments?
struct g {
    int m[100];
};

struct a {
    struct g ttt[40];
    struct g hhh[40];
} man;

extern int bar(int z);

// this code generates a call to memcpy
void foo1(int idx){
    bar(((idx == 5) ? man.hhh[idx+7] : man.ttt[idx+7]).m[idx+3]);
}

// this code doesn't generate a call to memcpy
void foo2(int idx){
    bar(((idx == 5) ? man.hhh[idx+7].m[idx+3] : man.ttt[idx+7].m[idx+3]));
}
In both code segments I want to pass the same field (depending on the conditional expression) to the bar function. However, the first one generates a call to memcpy (this can be seen clearly when compiling with clang for the PowerPC architecture). I wrote a little main and ran the two functions, and they gave me the same output (compiled with gcc 4.4.7).
This answer applies to C only - the question is dual-tagged but I am assuming OP is using C for reasons that will become clear later.
Here's the first expression again:
((idx == 5) ? man.hhh[idx+7] : man.ttt[idx+7]).m[idx+3]
The type of the conditional expression is struct g. However, the result of the conditional operator in C is not an lvalue. What is it then?
In C11 6.2.4p8 it's explicitly defined as a value of temporary lifetime.
In C90 the m[idx+3] is ill-formed: m is not an lvalue because the . operator only yields an lvalue if the left operand was an lvalue; and the array-pointer decay only applies to lvalues.
In C99 array-pointer decay happens to all values, but it's not explicitly stated where decayed m points.
Personally I think it's clear enough that in C99, something akin to the C11 behaviour was intended, so I would regard the code as well-defined in C99. Further discussion here. This is probably a moot point, as on all the compilers I tried, they gave the same result for -std=c99 as they did for -std=c11.
Moving forward then: In C11 (and probably C99), Snippet 1 should give the right result. Your compiler does that, but it seems that it optimizes the code poorly. It naively copies the whole value resulting from the conditional operator before indexing into it.
Testing with godbolt, I found that all versions of "x86 clang" and "PowerPC gcc 4.8" used memcpy; but "x86 gcc" was able to optimize the code.
In C++, the result of the conditional operator is an lvalue if the second and third operands were lvalues of the same type, so this problem shouldn't arise in that language.
To avoid this problem, use an alternative where the result of the conditional operator is not a struct or union value. For example you could just use Snippet 2; or either of:
bar( ((idx == 5) ? &man.hhh[idx+7] : &man.ttt[idx+7])->m[idx+3] );
bar( ((idx == 5) ? man.hhh : man.ttt)[idx+7].m[idx+3] );
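For instance, a sketch of foo1 rewritten along the lines of the first alternative, reusing the declarations from the question (foo1_ptr is just an illustrative name):

/* The conditional operator now yields a pointer to struct g rather than a
   struct g value, so there is no temporary object to copy. */
void foo1_ptr(int idx){
    struct g *p = (idx == 5) ? &man.hhh[idx+7] : &man.ttt[idx+7];
    bar(p->m[idx+3]);
}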
Background
The last time I asked about whether parentheses were causing an implicit cast (here), #pmg was nice enough to point out that "Nothing in C is done below int". But there, the discussion was about bitwise operators, and the parentheses turned out to be just a distraction.
Introduction
Below, the parentheses are the main attraction. Or, to be more boring but precise, the only operators I see are the parentheses and assignment operators.
At this reference about the C parentheses operator, I do not see anything about parentheses changing the type (outside of typecast syntax, which is not the case here).
Meanwhile, here's a reference that reminds us that there is automatic type conversion on assignment, but I don't think that explains the static analysis tool behavior I will describe here.
As in my previous question, "OK" means that the static analysis tool did not warn about an implicit type conversion, and "NOT OK" means that it did.
int main(void)
{
unsigned int ui;
int i;
ui = (256U); // NOT OK (*) (1)
i = (256U); // NOT OK (*) (2)
i = 256; // OK
i = 256U; // NOT OK
ui = 256U; // OK (3)
ui = 256; // NOT OK
return(0);
}
I can understand them all except the first two - what do the parentheses do? If they do nothing in the way of implicit typecasting, then I would expect (1) to be OK and (2) to be NOT OK. If they do automatic type promotion of types smaller than int up to int, then I would expect (1) to be NOT OK and (2) to be OK. But this tool says that both are NOT OK.
Is this a static analysis tool error, or is the tool correct and there's something else I need to learn about implicit type conversions in C?
(BTW I hope that the value 256 is small enough not to cause overflow on my machine ...)
First, let's clear up some terminology. Nothing can cause an "implicit cast", because there is no such thing. A cast is an explicit operator, consisting of a type name in parentheses preceding an expression, such as (double)42; it specifies a conversion. Conversions can be either explicit (specified by a cast operator) or implicit, as in double x = 42;. So what you're really asking is whether parentheses can cause an implicit conversion.
And the answer, at least in the code you've shown us, is no.
Quoting the C99 standard (3.7 MB PDF), section 6.5.1p5:
A parenthesized expression is a primary expression. Its type and value are identical to those of the unparenthesized expression. It is an lvalue, a function designator, or a void expression if the unparenthesized expression is, respectively, an lvalue, a function designator, or a void expression.
And since 256U is already a primary expression, the parentheses make no difference at all; parentheses generally indicate precedence, but in this case there is no precedence to indicate.
What static analysis tool are you using? You should probably submit a bug report.
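If you want to convince yourself, here is a small check; it needs a C11 compiler for _Generic (which is not in C99), but the typing rule quoted above is the same in both:

#include <stdio.h>

#define TYPE_NAME(e) _Generic((e), unsigned int: "unsigned int", \
                                   int:          "int",          \
                                   default:      "other")

int main(void)
{
    printf("%s\n", TYPE_NAME(256U));    /* unsigned int */
    printf("%s\n", TYPE_NAME((256U)));  /* unsigned int: the parentheses change nothing */
    return 0;
}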
The tool is confused somehow. There's no casting here. Those parentheses just indicate precedence.
Given this code:
#include <stdio.h>

typedef void (*Thunk)();
Thunk *gFP;
void foo(){ printf("Foo "); (*gFP)(); }
void bar(){ printf("Bar "); }
int main(void){
    Thunk Codex[] = { foo, bar };
    gFP = Codex;
    (*gFP++)();
}
Does the function call happen before or after the increment?
i.e: Will this print "Foo Foo Foo ..." or "Foo Bar"?
This is just my personal view. I'm not 100% convinced that this is correct.
So, please forgive me if my answer is wrong.
C99 6.5.2.2/10 Function calls says:
The order of evaluation of the function designator, the actual arguments, and subexpressions within the actual arguments is unspecified, but there is a sequence point before the actual call.
C99 6.5.2.4/2 Postfix increment and decrement operators says:
The side effect of updating the stored value of the operand shall occur between the previous and the next sequence point.
The side effect of the post-increment operator is completed somewhere before the next sequence point.
Assuming the expression f(x), I think there is a sequence point after the evaluation of f and x, and before the function call.
So, the side effect of gFP++ will be completed before the function call, and the code in the question is expected to print Foo Bar.
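A sketch that makes this reading explicit, splitting the original call into separately sequenced statements (Thunk, gFP, foo, and bar as in the question):

Thunk current = *gFP;   /* read the function pointer gFP currently points at (foo) */
gFP++;                  /* the side effect is certainly complete here */
current();              /* calls foo; inside foo, (*gFP)() now finds bar */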
Edit:
I removed the quotes from Annex C in C99 and C++ and added the quotes from C99. The previous quotes were probably not specific enough for this question.
The dereference happens first. This is the same thing as any other post-increment: the original value is used.
See, for instance, Post Increment with respect to Sequence Points
However, your question seems to be whether the function pointer use inside of foo() will call foo() or bar().
http://newsgroups.derkeiler.com/Archive/Comp/comp.std.c/2009-10/msg00053.html is a discussion in comp.std.c with the header "sequence point problem" that argues over precisely this point. I don't think it came to a consensus, but there were good arguments on both sides.
By my previous reading of the standard, this would invoke undefined behavior.
The call to a function acts as a sequence point, but Annex C only says it acts as a sequence point relative to the expressions passed in as arguments: they are guaranteed to be evaluated, but nothing else necessarily is (in f(i++) + g(j++), accessing i inside g() or j inside f() invokes undefined behavior).
However, 6.5.2.2 (p10) says:
There is a sequence point after the evaluations of the function designator and the actual arguments but before the actual call.
which means that it does sequence the ++.
The operator precedence table shows the order of operations for C.
In your example, gFP++ has the highest precedence, followed by *gFP
However, the increment is not done until all other operations have completed.
So what you will end up with is the dereference operating on gFP, then the function call, then the value of gFP being incremented.
So you will end up with a stack overflow.
§6.5.3.2 "Address and indirection operators" ¶3 says (relevant part only):
The unary & operator returns the address of its operand. ...
If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator. ...
This means that this:
#define NUM 10
int tmp[NUM];
int *i = tmp;
printf("%ti\n", (ptrdiff_t) (&*i - i) );
printf("%ti\n", (ptrdiff_t) (&i[NUM] - i) );
Should be perfectly legal, printing 0 and NUM (10). The standard seems very clear that both of those cases are required to be optimized.
However, it doesn't seem to require the following to be optimized:
struct { int a; short b; } tmp, *s = &tmp;
printf("%ti\n", (ptrdiff_t) (&s->b - s) );
This seems awfully inconsistent. I can see no reason that the above code shouldn't print the sizeof(int) plus (unlikely) padding (possibly 4).
Simplifying a &-> expression is going to be the same conceptually (IMHO) as &[], a simple address-plus-offset. It's even an offset that's going to be determinable at compile time, rather than potentially runtime with the [] operator.
Is there anything in the rationale about why this is so seemingly inconsistent?
In your example, &i[10] is actually not legal: it becomes i + 10, which becomes NULL + 10, and you can't perform arithmetic on a null pointer. (6.5.6/8 lists the conditions under which pointer arithmetic can be performed)
Anyway, this rule was added in C99; it was not present in C89. My understanding is that it was added in large part to make code like the following well-defined:
int* begin, * end;
int v[10];
begin = &v[0];
end = &v[10];
That last line is technically invalid in C89 (and in C++) but is allowed in C99 because of this rule. It was a relatively minor change that made a commonly used construct well-defined.
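A sketch of the kind of loop this blesses, using the v array from the snippet above; the one-past-the-end address is formed but never dereferenced:

for (int *p = &v[0]; p != &v[10]; ++p)   /* &v[10] is only compared, never dereferenced */
    *p = 0;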
Because you can't perform arithmetic on a null pointer, your example (&s->b) would be invalid anyway.
As for why there is this "inconsistency," I can only guess. It's likely that no one thought to make it consistent or no one saw a compelling use case for this. It's possible that this was considered and ultimately rejected. There are no remarks about the &* reduction in the Rationale. You might be able to find some definitive information in the WG14 papers, but unfortunately they seem to be quite poorly organized, so trawling through them may be tedious.
I think the rule wasn't added for optimization purposes (what does it bring that the as-if rule doesn't?) but to allow &t[sizeof(t)/sizeof(*t)] and &*(t+sizeof(t)/sizeof(*t)), which would be undefined behaviour without it (writing such things directly may seem silly, but add a layer or two of macros and it can make sense). I don't see a case where special-casing &p->m would bring such a benefit. Note that, as James pointed out, &p[10] with p a null pointer is still undefined behaviour; &p->m with p a null pointer would similarly have stayed invalid (and I must admit I don't see any use for the case where p is a null pointer).
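For example, a sketch of the macro layering alluded to here (ARRAY_LEN and ARRAY_END are illustrative names, not standard macros):

#define ARRAY_LEN(t) (sizeof(t) / sizeof(*(t)))
#define ARRAY_END(t) (&(t)[ARRAY_LEN(t)])     /* one past the last element */

double samples[64];
double *stop = ARRAY_END(samples);   /* well-defined under the C99 rule; never dereferenced */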
I believe that the compiler can choose to pack structs in different ways, possibly adding padding between members to increase memory access speed. This means that you can't say for sure that b will always be at an offset of 4. The single value does not have the same problem.
Also, the compiler may not know the layout of a struct in memory during the optimization phase, thus preventing any sort of optimization concerning struct member accesses and subsequent pointer casts.
edit:
I have another theory...
Many times the compiler will optimize the abstract syntax tree just after lexical analysis and parsing. This means it will find things like operators that cancel out and expressions that evaluate to a constant, and reduce those sections of the tree to one node. This also means that the information about structs is not available. Later optimization passes that occur after some code generation may be able to take this into account because they have additional information, but for things like trimming the AST, that information is not yet there.