Background
The last time I asked whether parentheses were causing an implicit cast (here), @pmg was kind enough to point out that "nothing in C is done below int." But there the discussion was about bitwise operators, and the parentheses turned out to be just a distraction.
Introduction
Below, the parentheses are the main attraction. Or, to be more boring but precise, the only operators I see are the parentheses and assignment operators.
At this reference about the C parentheses operator, I do not see anything about parentheses changing the type (outside of typecast syntax, which is not the case here).
Meanwhile, here is a reference that notes that there is automatic type conversion on assignment, but I don't think that explains the static analysis tool behavior I am about to describe.
As in my previous question, "OK" means that the static analysis tool did not warn about an implicit type conversion, and "NOT OK" means that it did.
int main(void)
{
unsigned int ui;
int i;
ui = (256U); // NOT OK (*) (1)
i = (256U); // NOT OK (*) (2)
i = 256; // OK
i = 256U; // NOT OK
ui = 256U; // OK (3)
ui = 256; // NOT OK
return(0);
}
I can understand them all except the first two - what do the parentheses do? If they do nothing in the way of implicit typecasting, then I would expect (1) to be OK and (2) to be NOT OK. If they do automatic type promotion of types smaller than int up to int, then I would expect (1) to be NOT OK and (2) to be OK. But this tool says that both are NOT OK.
Is this a static analysis tool error, or is the tool correct and there's something else I need to learn about implicit type conversions in C?
(BTW I hope that the value 256 is small enough not to be causing overflow on my machine ...)
First, let's clear up some terminology. Nothing can cause an "implicit cast", because there is no such thing. A cast is an explicit operator, consisting of a type name in parentheses preceding an expression, such as (double)42; it specifies a conversion. Conversions can be either explicit (specified by a cast operator) or implicit, as in double x = 42;. So what you're really asking is whether parentheses can cause an implicit conversion.
And the answer, at least in the code you've shown us, is no.
Quoting the C99 standard (3.7 MB PDF), section 6.5.1p5:
A parenthesized expression is a primary expression. Its type and value
are identical to those of the unparenthesized expression. It is an
lvalue, a function designator, or a void expression if the
unparenthesized expression is, respectively, an lvalue, a function
designator, or a void expression.
And since 256U is already a primary expression, the parentheses make no difference at all; parentheses generally indicate precedence, but in this case there is no precedence to indicate.
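If you want to convince yourself of this on your own machine, here is a minimal sketch using C11's _Generic to report the type of each expression (the TYPE_NAME macro is just made up for this illustration):
#include <stdio.h>

#define TYPE_NAME(x) _Generic((x), \
    int: "int",                    \
    unsigned int: "unsigned int",  \
    default: "other")

int main(void)
{
    printf("%s\n", TYPE_NAME(256));     /* int */
    printf("%s\n", TYPE_NAME(256U));    /* unsigned int */
    printf("%s\n", TYPE_NAME((256U)));  /* unsigned int: the parentheses change nothing */
    return 0;
}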
What static analysis tool are you using? You should probably submit a bug report.
The tool is confused somehow. There's no casting here. Those parentheses just indicate precedence.
Related
What does (void)var actually do?
I am reading this book and I have come across this code:
static void
task1(void *args) {
int i;
(void)args;
for (;;) {
gpio_toggle(GPIOC,GPIO13);
for (i = 0; i < 1000000; i++)
__asm__("nop");
}
}
I understand all of it (more or less) except for one line: what is (void)args; doing?
args is not used in the function body, and I know that if an argument is not used then one could write
static void
task2(void *args __attribute((unused))) {
// code here
}
which is not being done here. So what is (void) written like that doing?
In general, when a function argument is not used, the compiler is likely to warn you about it. After all, if you aren't going to use it, why put it there in the first place?
In effect, (void)args; is an artificial use of the argument that does nothing useful other than telling the compiler you know what you are doing, so "shut up already with the warning!"
(void)args works on all of the platforms that I have seen. I believe __attribute((unused)) is a gcc-specific thing.
Every expression has a type. Any expression can be turned into a statement by adding a semicolon. If an expression-statement yields a non-void value, that value is discarded.
Compilers will often warn about a value being discarded. Even a simple case like
printf("hello world\n");
quietly discards the int value returned by printf; a compiler warning might remind the programmer to test that value and take some action if it indicates that the call failed. (Most compilers will not warn in this particular case, since printf calls are usually used in a statement context, with the result ignored.)
Casting an expression to type void discards the result, but it does so explicitly. This is likely to silence a compiler warning.
As far as the language is concerned, a (void) cast converts a result to type void, which is equivalent to discarding that result. From a programmer's point of view, a (void) cast can silence a warning that a value is not used, since you're explicitly ignoring it and asserting to the compiler that you know what you're doing (even if you don't).
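As a concrete sketch of both uses (the on_timer callback and its fixed signature are made up for illustration):
#include <stdio.h>

/* Hypothetical callback whose signature is dictated by some API,
   so the unused parameter cannot simply be removed. */
static void on_timer(void *args)
{
    (void)args;               /* explicitly discard the unused parameter */
    puts("tick");
}

int main(void)
{
    (void)printf("hello\n");  /* explicitly discard printf's int return value */
    on_timer(NULL);
    return 0;
}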
Quoting the C standard (N1570 draft), 6.3.2.2:
The (nonexistent) value of a void expression (an expression that has
type void) shall not be used in any way, and implicit or
explicit conversions (except to void) shall not be applied to
such an expression. If an expression of any other type is evaluated as
a void expression, its value or designator is discarded. (A void
expression is evaluated for its side effects.)
and 6.2.5 paragraph 19:
The void type comprises an empty set of values; it is an
incomplete object type that cannot be completed.
It's a compiler warning (-Wunused-parameter, which -Wall -Wextra enables) that they're suppressing by using it. You are correct that __attribute__((unused)) is a GCC extension that does what you're asking, but it's a matter of preference. It's also not supported by all C compilers.
Sources:
http://www.keil.com/support/man/docs/armcc/armcc_chr1359124983230.htm
https://en.cppreference.com/w/cpp/compiler_support
void is an empty data type; it can be used in a number of different situations.
A function that "returns" void doesn't return anything (conceptually it is a procedure rather than a function).
A void * (pointer to void) is often used as a generic pointer (a pointer to a memory address with no particular data type associated with it). The compiler will let you assign this pointer to any other object pointer type without an explicit conversion.
malloc(), for example, returns void *, so you can write things like int *v = malloc(sizeof(int) * 10); without needing to explicitly cast the return value of malloc() to int *.
The other way around is also true: a function that takes a void * parameter will accept any object pointer type without an explicit conversion.
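A minimal sketch of both directions of that implicit conversion:
#include <stdlib.h>
#include <string.h>

int main(void)
{
    int *v = malloc(sizeof(int) * 10);  /* void * converts to int * implicitly */
    if (v == NULL)
        return 1;
    memset(v, 0, sizeof(int) * 10);     /* int * converts to the void * parameter implicitly */
    free(v);                            /* likewise, free() takes a void * */
    return 0;
}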
I am preparing for our programming test and I came across this long long A[(10,10)] declaration (it was in some previous test in our course), which I have no more information about. The only other thing I know about it is that it is not possible to assign to a variable declared this way with A[5][1] = something. Otherwise, I would assume it is some kind of 2D array.
It could also be the comma operator, but when I drop the parentheses the compiler actually rejects it:
abc.c:3:16: error: expected ']'
long long A[10,10];
^
abc.c:3:13: note: to match this '['
long long A[10,10];
Do you have any clue if it is a thing, or just some nonsense? (I was trying to Google it, but these things aren't that easy to find...)
Thank you.
In array declarators, a constant-expression is expected (in C89; C99 allows an assignment-expression there, which is how VLAs get in), and both are proper subsets of expression. In particular, the top-level comma operator is not part of the set.
Array declarators are a kind of direct-declarator:
direct-declarator: ... |
direct-declarator "[" constant-expression? "]";
constant-expression: conditional-expression;
expression: assignment-expression | expression "," assignment-expression;
assignment-expression: conditional-expression |
unary-expression assignment-operator assignment-expression;
So the grammar doesn't allow for a comma here.
To answer your question "Do you have any clue if it is a thing, or just some nonsense?": any declaration so unintuitive that experienced programmers have to consult cppreference is, IMHO, clearly nonsense.
I tested the expression long long A[(10,10)] with the Apple LLVM 8.0 compiler and the C99 language dialect, and it compiled. Consulting the cppreference page on the comma operator, one finds the following:
Top-level comma operator is also disallowed in array bounds
// int a[2,3]; // error
int a[(2,3)]; // OK, VLA array of size 3 (VLA because (2,3) is not a constant expression)
So long long A[(10,10)] seems to be equivalent to long long A[10], where the 10 is the second operand of the non-top-level comma expression (10,10). An interesting detail is that an array declared this way is treated as a VLA (variable-length array, whose size is determined at runtime).
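A small sketch to see this in practice (assuming a C99 compiler that supports VLAs):
#include <stdio.h>

int main(void)
{
    long long A[(10, 10)];  /* the bound is the comma expression (10, 10), which evaluates to 10 */
    /* A[5][1] = 1;            would not compile: A is one-dimensional */
    A[5] = 42;
    printf("%zu\n", sizeof A / sizeof A[0]);  /* prints 10 */
    return 0;
}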
In my C program, I have a line where I am using the == operator, and the two operands are cast to char, like so:
char top_level_domain[sizeof(char) * 128];
...
if((char)top_level_domain[i] == ':'
|| (char)top_level_domain[i] == '/')
Is this recommended/safe? If not, how should I go about checking the contents of a certain element in an array?
EDIT: added declaration and removed casts to the character literals
In general, it is safer and more effective to avoid casting when you can, because it allows the compiler to perform type checking. For example, spot the error:
// Let's pretend you forgot or mixed up the type here...
char **top_level_domain;
// ...
if ((char) top_level_domain[i] == (char) ':')
...
Or maybe...
char top_level_domain[sizeof(char) * 128];
...
// Whoops! forgot to type [i]
if((char)top_level_domain[i] == ':'
|| (char)top_level_domain == '/')
Whoops! You forgot to index the array, so you're comparing the pointer itself and getting garbage. The compiler would have given you a diagnostic message, but since you used a cast, the diagnostic is gone.
Note: This will actually cause a diagnostic message on some compilers because char is narrower than char *, but if we were using size_t instead of char then there would be no narrowing, but it would still be an error.
Why use casts?
There are quite a few situations where the C "integer promotions" and "usual arithmetic conversions" can cause undesired behavior. For example,
size_t align_to_16_bytes(size_t x)
{
// return x & ~15u; // WRONG
return x & ~(size_t) 15u; // RIGHT
}
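To make the failure mode concrete, here is a sketch assuming a typical LP64 platform where unsigned int is 32 bits and size_t is 64 bits:
#include <stdio.h>
#include <stddef.h>

int main(void)
{
    size_t x = 0x100000020u;           /* an address-sized value above 4 GiB */
    size_t wrong = x & ~15u;           /* ~15u is a 32-bit mask, zero-extended to 64 bits */
    size_t right = x & ~(size_t)15u;   /* widen first, then complement: a full 64-bit mask */
    printf("%zx %zx\n", wrong, right); /* typically prints "20 100000020" */
    return 0;
}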
However, in general it will only cause problems when you are using types wider than int or when you are mixing signed and unsigned types that are at least as wide as int.
Newer languages such as Java and C# largely avoid this problem by only allowing widening implicit conversions.
The casts are "safe" but useless, and very bad style. Generally in C, anything that needs a cast is at best bad style, and more often, invoking undefined behavior, so the presence of casts in a source file is a "code smell" - an indication that the author probably did something wrong and that the reader needs to be extra careful looking for bugs.
Just remove the casts and your code will be perfectly fine.
A cast is an explicit statement to the compiler that you want to override the default implicit type conversions (or account for the absence of them) that the language gives you. Generally speaking, these default implicit type conversions are well thought through by the language designers, and work with C's type safety, not against it.
A good example is void *, which, according to C11 Section 6.5.16.1.1, may implicitly be converted via assignment to or from "a pointer to any object type". This implies that you can not, for example, implicitly convert it to a pointer to a function. This is exactly how you would want it to work when calling malloc(), for instance - it must convert to some other type of pointer, since you obviously can't create objects of type void, but it makes no sense at all to dynamically allocate a block of memory for a function. Thus, the default implicit type conversions here do exactly what you'd want - let you convert to a pointer to any object type since that's the whole purpose, but loudly complain if you try to convert to anything else.
Some people seem to be of the view that casting the return from malloc() makes it "explicit" what you're trying to do, but (1) you never see those people doing things like int i = 1; double d = (double) i;, they seem to make a special case out of malloc(); and (2) it doesn't do this at all, since what the cast actually makes explicit is the fact that you want to override the type safety and default conversions that C gives you, when what you actually want to do in this case is to abide by them.
Alternatively, sometimes the implicit type conversions do not give you what you want, and a cast is necessary. The obvious example is integer division, which always gives you an integer result. The folks who made C could have provided another operator to perform floating-point division on integers if they wanted to, but they didn't, so if you want to divide two integers and integer division is not what you want, you have to cast one of them to a floating-point type to override the default behavior. If integer division is what you want in a particular case, then you obviously don't cast.
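A minimal illustration of that division case:
#include <stdio.h>

int main(void)
{
    int hits = 3, tries = 4;
    double wrong = hits / tries;          /* integer division happens first: 0.0 */
    double right = (double)hits / tries;  /* the cast forces floating-point division: 0.75 */
    printf("%f %f\n", wrong, right);
    return 0;
}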
So, as a general rule, when C gives you the result you want without casting - which is most of the time - don't cast. Only cast when C's default behavior does not give you what you want, and you're willing to explicitly abandon the type safety it gives you as a result.
The casts are unsafe, not only because of the compiler type-checking issue that Dietrich Epp describes, but also because the reduction of the value range can lead to false positives:
int myValue = 'a' + 768; //A number that is not equal to 'a'
assert(myValue != 'a'); //This is true...
assert((char)myValue == 'a'); //...but so is this!
Of course, this happens because I constructed myValue in a way that yields this false positive. However, 1/256th of all integers will compare equal to a given character if the integer is cast to a char, which is a lot of possible false positives. This won't happen if you let the compiler choose the integer representation for the comparison.
Section 6.5.3.2 "Address and indirection operators" ¶3 says (relevant portion only):
The unary & operator returns the address of its operand. ...
If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator. ...
This means that this:
#define NUM 10
int *i = NULL;
printf("%ti\n", (ptrdiff_t) (&*i - i) );
printf("%ti\n", (ptrdiff_t) (&i[NUM] - i) );
Should be perfectly legal, printing 0 and NUM (10). The standard seems very clear that both of those cases are required to be optimized.
However, it doesn't seem to require the following to be optimized:
struct { int a; short b; } *s = NULL;
printf("%ti\n", (ptrdiff_t) &s->b );
This seems awfully inconsistent. I can see no reason that the above code shouldn't print the sizeof(int) plus (unlikely) padding (possibly 4).
Simplifying a &-> expression is going to be the same conceptually (IMHO) as &[], a simple address-plus-offset. It's even an offset that's going to be determinable at compile time, rather than potentially runtime with the [] operator.
Is there anything in the rationale about why this is so seemingly inconsistent?
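Incidentally, the &s->b pattern above is exactly what the traditional offsetof trick relies on. A sketch for comparison (MY_OFFSETOF is just an illustrative name; real <stddef.h> implementations typically use a compiler built-in, precisely because the standard never blessed this form):
#include <stdio.h>
#include <stddef.h>

struct pair { int a; short b; };

/* Classic, technically non-portable implementation of offsetof:
   form &((type *)0)->member without evaluating the dereference,
   then convert the resulting address to an integer. */
#define MY_OFFSETOF(type, member) ((size_t) &((type *) 0)->member)

int main(void)
{
    printf("%zu %zu\n", MY_OFFSETOF(struct pair, b), offsetof(struct pair, b));
    return 0;
}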
In your example, &i[10] is actually not legal: it becomes i + 10, which becomes NULL + 10, and you can't perform arithmetic on a null pointer. (6.5.6/8 lists the conditions under which pointer arithmetic can be performed)
Anyway, this rule was added in C99; it was not present in C89. My understanding is that it was added in large part to make code like the following well-defined:
int* begin, * end;
int v[10];
begin = &v[0];
end = &v[10];
That last line is technically invalid in C89 (and in C++) but is allowed in C99 because of this rule. It was a relatively minor change that made a commonly used construct well-defined.
Because you can't perform arithmetic on a null pointer, your example (&s->b) would be invalid anyway.
As for why there is this "inconsistency," I can only guess. It's likely that no one thought to make it consistent or no one saw a compelling use case for this. It's possible that this was considered and ultimately rejected. There are no remarks about the &* reduction in the Rationale. You might be able to find some definitive information in the WG14 papers, but unfortunately they seem to be quite poorly organized, so trawling through them may be tedious.
I think that the rule hasn't been added for optimization purpose (what does it bring that the as-if rule doesn't?) but to allow &t[sizeof(t)/sizeof(*t)] and &*(t+sizeof(t)/sizeof(*t)) which would be undefined behaviour without it (writing such things directly may seem silly, but add a layer or two of macros and it can make sense). I don't see a case where special casing &p->m would bring such benefit. Note that as James pointed out, &p[10] with p a null pointer is still undefined behaviour; &p->m with p a null pointer would similarly have stayed invalid (and I must admit that I don't see any use when p is the null pointer).
I believe that the compiler can choose to pack structs in different ways, possibly adding padding between members to increase memory-access speed. This means that you can't say for sure that b will always be at an offset of 4. The single value does not have the same problem.
Also, the compiler may not know the layout of a struct in memory during the optimization phase, thus preventing any sort of optimization concerning struct member accesses and subsequent pointer casts.
Edit:
I have another theory...
Many times the compiler will optimize the abstract syntax tree just after lexical analysis and parsing. This means it will find things like operators that cancel out and expressions that evaluate to a constant, and reduce those sections of the tree to one node. This also means that information about structs is not available at that point. Later optimization passes that occur after some code generation may be able to take this into account because they have additional information, but for things like trimming the AST, that information is not yet there.
If I do:
int j = ({int x = 7; x+3;});
With i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5646) it compiles just fine. The block in question, ({int x = 7; x+3;}), returns the value of its last statement as the value of the block. If you remove the parentheses it doesn't compile. Can I expect this to work in most C compilers?
Additionally, what is the name for this construct? I have had a lot of trouble searching for it because search engines don't index () or {} and C is a terrible search term. I also haven't been able to find anything about it in any of my books, probably because I don't know what to look for.
It's a GCC extension:
A compound statement enclosed in parentheses may appear as an expression in GNU C. This allows you to use loops, switches, and local variables within an expression.
Recall that a compound statement is a sequence of statements surrounded by braces; in this construct, parentheses go around the braces. For example:
({ int y = foo (); int z;
if (y > 0) z = y;
else z = - y;
z; })
is a valid (though slightly more complex than necessary) expression for the absolute value of foo ().
The last thing in the compound statement should be an expression followed by a semicolon; the value of this subexpression serves as the value of the entire construct. (If you use some other kind of statement last within the braces, the construct has type void, and thus effectively no value.)...
If you remove the parentheses it doesn't compile.
Without the parentheses, the compiler will treat this as a braced initializer and will fail when it sees the int keyword: the initializer for a scalar must be a single expression (optionally enclosed in braces), and a declaration such as int x = 7; is not an expression.
6.7.8 Initialization
11 The initializer for a scalar shall be a single expression, optionally enclosed in braces. The
initial value of the object is that of the expression (after conversion); the same type
constraints and conversions as for simple assignment apply, taking the type of the scalar
to be the unqualified version of its declared type.
6.2.5 Types
21 Arithmetic types and pointer types are collectively called scalar types. Array and
structure types are collectively called aggregate types.
Can I expect this to work in most C compilers?
No. Looks like a non-standard GNU extension.
Additionally, what is the name for this construct?
I wonder if there is any. Actually, this is similar to what macros typically do.
You can expect it to work in most versions of GCC.
You can expect it to work almost nowhere else - it is a GCC extension.
The section of the GCC manual that describes the feature is titled 'Statements and Declarations in Expressions':
A compound statement enclosed in parentheses may appear as an expression in GNU C.
Later it says:
Any temporaries created within a statement within a statement expression will be destroyed
at the statement’s end. This makes statement expressions inside macros slightly
different from function calls.
So, 'statement expression' seems to be the name used in the documentation.
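To see why that note about macros matters, here is a sketch of the typical use of the extension: a macro that needs a local temporary so its arguments are evaluated only once (GCC/Clang-specific, not portable C; MAX is just an illustrative name):
#include <stdio.h>

/* Statement expression plus __typeof__ (both GNU extensions): the argument
   expressions are evaluated exactly once and stored in local temporaries. */
#define MAX(a, b) ({ __typeof__(a) _a = (a); __typeof__(b) _b = (b); _a > _b ? _a : _b; })

int main(void)
{
    int i = 3;
    /* With a naive #define MAX(a,b) ((a) > (b) ? (a) : (b)), i++ would be
       evaluated twice; here it is evaluated once. */
    printf("%d\n", MAX(i++, 5));  /* prints 5; i is now 4 */
    return 0;
}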