C language If statement

I am new to the C language and trying to figure out the meaning of the following code.
Here, is if (!msize) checking whether msize is zero, or whether msize is NULL?
if (!msize)
    msize = 1 / msize; /* provoke a signal */
//Example 1: A division-by-zero misuse, in lib/mpi/mpi-pow.c of the Linux kernel, where the entire code will be optimized away.
//Compilers, GCC 4.7 and Clang 3.1

It depends on the type of msize.
If msize is a pointer, it tests whether it is NULL.
If msize is not a pointer, it tests whether it is 0.
This distinction may seem pedantic, but it's important. While a null pointer is represented as all-bits-zero on most systems, the C standard allows its representation to be something else; either way, comparing a pointer against the constant 0 (a null pointer constant) still tests whether it is a null pointer.
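A tiny illustration of the two cases (a minimal sketch; the variable names are arbitrary):

#include <stddef.h>

int main(void) {
    int   n = 0;
    char *p = NULL;

    if (!n) { /* taken: n compares equal to 0 */ }
    if (!p) { /* taken: p is a null pointer, so !p yields 1 */ }
    return 0;
}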
I did some further reading, because I started to doubt whether my understanding above is correct.
Here are relevant parts of the C standard.
§6.5.3.3 Unary arithmetic operators
(5) The result of the logical negation operator ! is 0 if the value of its
operand compares unequal to 0, 1 if the value of its operand compares
equal to 0. The result has type int. The expression !E is equivalent
to (0==E).
§6.3.2.3 Pointers
(3) An integer constant expression with the value 0, or such an expression
cast to type void *, is called a null pointer constant. 66) If a null
pointer constant is converted to a pointer type, the resulting
pointer, called a null pointer, is guaranteed to compare unequal to a
pointer to any object or function.
(6) Any pointer type may be converted to an integer type. Except as
previously specified, the result is implementation-defined. If the
result cannot be represented in the integer type, the behavior is
undefined. The result need not be in the range of values of any integer
type.
Footnotes: 66 The macro NULL is defined in <stddef.h> (and other headers) as a null pointer constant; see 7.19.
As you can see, 0 is something of a magic number in C. Even on a system where a null pointer has a non-zero bit representation, !msize is still equivalent to (0 == msize), and comparing a pointer against the constant 0 compares it against a null pointer. In any case, this is all a bit nit-picky.
I tracked down the source of your example in the paper: Undefined Behavior: What Happened to My Code?. The text discussing your example states:
As mentioned earlier, at the instruction set level, x86 raises an
exception for a division by zero [17, 3.2], while MIPS [22, A.6] and
PowerPC [15, 3.3.38] silently ignore it. A division by zero in C is
undefined behavior [19, 6.5.5], and a compiler can thus simply assume
that the divisor is always non-zero.
Figure 1 shows a division-by-zero misuse in the Linux kernel. From the
programmer’s comment it is clear that the intention is to signal an
error in case msize is zero. When compiling with GCC, this code
behaves as intended on an x86, but not on a PowerPC, because it will
not generate an exception. When compiling with Clang, the result is
even more surprising. Clang assumes that the divisor msize must be
non-zero—on any system—since otherwise the division is undefined.
Combined with this assumption, the zero check !msize becomes always
false, since msize cannot be both zero and non-zero. The compiler
determines that the whole block of code is unreachable and removes it,
which has the unexpected effect of removing the programmer’s original
intention of guarding against the case when msize is zero.
So in your case, the answer you really needed was yes: it tests whether msize is 0.
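For comparison, a version of the check that does not rely on undefined behavior might look like the sketch below. The function names and the error handling are made up for illustration; this is not the kernel's actual code.

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch: report a zero msize through a defined mechanism
   instead of computing 1 / msize, which is undefined behavior and may
   simply be deleted by the compiler. */
static unsigned long checked_msize(unsigned long msize)
{
    if (!msize) {
        fprintf(stderr, "msize must be non-zero\n");
        abort();   /* a well-defined way to stop the program */
    }
    return msize;
}

int main(void) {
    unsigned long m = checked_msize(4);
    printf("msize = %lu\n", m);
    return 0;
}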


How does the logical NOT operator work in C?

How does the logical NOT operator ! actually work in C?
How does it turn every non-zero int into 0 and vice versa?
For example:
#include <stdio.h>

int main(void) {
    if (!(-76))
        printf("I won't print anything");
    if (!(2))
        printf("I will also not print anything");
    return 0;
}
doesn't print anything, which could mean -76 and 2 were turned into zero...
So, I tried this:
#include <stdio.h>

int main(void) {
    int x = !4;
    printf("%d", x);
    return 0;
}
which indeed printed 0.
Now I don't get how: is it flipping all the bits to 0, or what?
Most CPU architectures include an instruction to compare a value against zero, or they set a zero flag automatically as a result of most operations run on the processor. How this construct is implemented will differ from compiler to compiler.
For example, x86 has two instructions, JZ and JNZ (jump if zero and jump if not zero), which can be used if your test is an if statement: if the last value looked at was zero, jump (or don't jump) to a new instruction.
Given this, it's trivial to implement int x = !4; at the assembly level as a jump on whether 4 is zero or not, though this particular example would likely be calculated at compile time, since all values are constant.
Additionally, most versions of the x86 instruction set support the SETZ instruction, which sets a register directly to 1 or 0 based on whether the processor's zero flag is currently set. This can be used to implement the logical NOT operation directly.
6.5.3.3 Unary arithmetic operators
Constraints
1 The operand of the unary + or - operator shall have arithmetic type; of the ~ operator, integer type; of the ! operator, scalar type.
Semantics
...
5 The result of the logical negation operator ! is 0 if the value of its operand compares unequal to 0, 1 if the value of its operand compares equal to 0. The result has type int. The expression !E is equivalent to (0==E).
C 202x Working Draft
So, that's what the language definition says should happen: if the expression x evaluates to non-zero, then the expression !x evaluates to zero; if x evaluates to zero, then !x evaluates to 1. The bits of the operand are not affected.
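A short sketch demonstrating that behavior (the values are arbitrary):

#include <stdio.h>

int main(void) {
    int x = 4;              /* any non-zero value */
    printf("%d\n", !x);     /* prints 0: x compares unequal to 0 */
    printf("%d\n", !0);     /* prints 1: operand compares equal to 0 */
    printf("%d\n", !!x);    /* prints 1: double negation normalizes to 0 or 1 */
    printf("%d\n", x);      /* prints 4: the operand's bits are untouched */
    return 0;
}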
How that's accomplished in the machine code is up to the specific implementation; it depends on the available instruction set, the compiler, and various other factors such that no one answer works everywhere. It could translate to a branch statement, it could take advantage of specialized instructions, etc.

Using + to check if multiple pointers are all NULL

Syntactically it makes sense (although it looks like some other language, which I don't particularly enjoy); it can save a lot of typing and code space, but how bad is it?
if (p1 + (unsigned)p2 + (unsigned)p3 == NULL)
{
    // all pointers are NULL, exit
}
Using pointer arithmetic with a pointer rvalue, I don't see how it could give a false result (the entire expression evaluating to NULL even though not all pointers are NULL), but I don't know exactly how much evil this potentially hides. So, is it bad to use this uncommon way of checking whether several pointers are all NULL?
Regarding the original version of the question, which omitted the casts...
it can save a lot of typing and code space, but how bad is it?
Very, very bad. Its behavior is altogether undefined, and if your compiler fails to reject it then you should get yourself a better one. Subtraction of one pointer from another is defined under some circumstances (and yields an integer result), but it is never meaningful to add two pointers.
Inasmuch as it shouldn't even compile, every keystroke used to type it instead of something that works is wasted, so no, it doesn't save typing or code space.
I don't see how it could give a false result.
If the compiler actually accepts it, the result can be anything at all. It is undefined.
so is it bad to do this, not-common way of checking if plenty of pointers are all NULL?
Yes.
Regarding the modified question in which all but one of the pointers are cast to integer:
The casts do not rescue the code -- multiple problems remain.
If the remaining pointer does not point to a valid object, or if the sum of the integers is negative or greater than the number of elements in the array to which the pointer points then the result of the pointer addition is still undefined (where a pointer to a scalar is treated as a pointer to a one-element array). Of course, the integer sum can't be negative in this particular case, but that's of minimal advantage.
C does not guarantee that casting a null pointer to an integer yields the value 0. It is common for it to do so, but the language does not require it.
C does not guarantee that non-null pointers convert to nonzero integers, and with your particular code that's a genuine risk. The type unsigned is not necessarily large enough to afford a distinct value to every distinct pointer.
Even if all of the foregoing were not a problem for some particular implementation -- that is, if you could safely perform arithmetic on a NULL pointer, and NULL pointers reliably converted to integers as zero, and non-NULL pointers reliably converted to nonzero -- the test could still go wrong because two nonzero unsigned integers can sum to zero. That happens where the arithmetic sum of the two is equal to UINT_MAX + 1.
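For completeness, the portable way to express the intended test is the obvious chain of equality comparisons, which is well defined regardless of how null pointers are represented. A minimal sketch (the function name is made up):

#include <stddef.h>

/* Well defined on every conforming implementation, no matter how a
   null pointer is represented. */
static int all_null(const void *p1, const void *p2, const void *p3)
{
    return p1 == NULL && p2 == NULL && p3 == NULL;
}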
There are multiple reasons why this is not a reliable method.
First, when you add an integer to a pointer, the C standard does not say what happens if the result is outside of the array into which the pointer points. (For these purposes, pointing just one past the last element, the end of the array, counts as inside, not outside. Also, a pointer to a single object counts as an array of one object.) Note that the C standard does not just not say what the result of the addition is; it does not say what the behavior of the entire program is. So, once you execute an addition that goes outside of an array, you cannot predict (from the C standard) what your program will do at all.
One likely result is that the compiler will see pointer + integer + integer and reason (or, more technically, apply transformations as if this reasoning were used) that pointer + integer is valid only if pointer is not NULL, and then the result is never NULL, so the expression pointer + integer is never NULL. Similarly, pointer + integer + integer is never NULL. Therefore pointer + integer + integer == NULL is always false, and we can optimize the program by removing this code completely. Thus, the code to handle the case when all pointers are NULL will be silently removed from your program.
Second, even if the C standard did guarantee a result of the addition, this expression could, hypothetically, evaluate to NULL even if none of the pointers were NULL. For example, consider a 16-bit address space where the first pointer were represented with the address 0x7000, the second were 0x6000, and the third were 0x3000. (I will also suppose these are char * pointers, so one element is one byte.) If we add these, the mathematical result is 0x10000. In 16-bit arithmetic, that wraps, so the computed result is 0x0000. Thus, the expression could evaluate to zero, which is likely used for NULL.
Third, unsigned may be narrower than pointers (for example, it may be 32 bits while pointers are 64), so the cast may lose information—there may be non-zero bits in the bits that were lost during the conversion, so the test will fail to detect them.
There are situations where we want to optimize pointer tests, and there are legitimate but non-standard ways to do it. On some processors, branching can be expensive, so doing some arithmetic with one test and one branch may be faster than doing three tests and three branches. C provides an integer type intended for working with pointer representations: uintptr_t, declared in <stdint.h>. With that, we can write this code:
if (((uintptr_t) p1 | (uintptr_t) p2 | (uintptr_t) p3) == 0) …
What this does is convert each pointer to an unsigned integer of a width suitable for working with pointer representations. The C standard does not say what the result of this conversion is, but it is intended to be unsurprising, and C implementations for flat address spaces may document that the result is the memory address. They may also document that NULL is the zero address. Once we have these integers, we OR them together instead of adding them. The result of an OR has a bit set if either of the corresponding bits in its operands was set. Thus, if any one of the addresses is not zero, then the result will not be zero either. So this code, if executed in a suitable C implementation, will perform the test you desire.
(I have used such tests in special high-performance code to test whether all pointers were aligned as desired, rather than to test for NULL. In that case, I had direct access to the compiler developers and could ensure the compiler would behave as desired. This is not standard C code.)
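A self-contained sketch of that OR-based test (the function name is made up; remember that the pointer-to-integer conversion is implementation-defined, so this relies on the implementation documenting that a null pointer converts to 0):

#include <stdint.h>
#include <stddef.h>

/* Non-portable but common: one test and one branch instead of three. */
static int all_null_fast(const void *p1, const void *p2, const void *p3)
{
    return (((uintptr_t)p1 | (uintptr_t)p2 | (uintptr_t)p3) == 0);
}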
Using any sort of pointer arithmetic on non-array pointers is undefined behavior in C.

Signed integers' undefined behavior and Apple Secure Coding Guide

Apple Secure Coding Guide says the following (page 27):
Also, any bits that overflow past the length of an integer variable (whether signed or unsigned) are dropped.
However, regarding signed integer overflow, the C standard (C89) says:
An example of undefined behavior is the behavior on integer overflow.
and
If an exception occurs during the evaluation of an expression (that is, if the result is not mathematically defined or not representable), the behavior is undefined.
Is the Coding Guide wrong? Is there something here that I don't get? I am not convinced that the Apple Secure Coding Guide could get this wrong.
Here is a second opinion, from a static analyzer described as detecting undefined behavior:
int x;

int main(void) {
    x = 0x7fffffff + 1;
}
The analyzer is run so:
$ frama-c -val -machdep x86_32 t.c
And it produces:
[kernel] preprocessing with "gcc -C -E -I. t.c"
[value] Analyzing a complete application starting at main
...
t.c:4:[kernel] warning: signed overflow. assert 0x7fffffff+1 ≤ 2147483647;
...
[value] Values at end of function main:
NON TERMINATING FUNCTION
This means that the program t.c contains undefined behavior, and that no execution of it ever terminates without causing undefined behavior.
Let's take this example:
1 << 32
If we assume 32-bit int, C clearly says it is undefined behavior. Period.
But any implementation can define this undefined behavior.
gcc for example says (while not very explicit in defining the behavior):
GCC does not use the latitude given in C99 only to treat certain aspects of signed '<<' as undefined, but this is subject to change.
http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html
I don't know about clang, but I suspect that, as with gcc, the evaluation of an expression like 1 << 32 would give no surprise (that is, evaluate to 0).
But even if it is defined on implementations running in Apple operating systems, a portable program should not make use of expressions that invoke undefined behavior in the C language.
EDIT: I thought the Apple sentence dealt only with the bitwise << operator. It looks like it's more general, and in that case, for the C language, they are utterly wrong.
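As a concrete illustration of the shift example above: if the intent is to form bit 32, a portable pattern is to do the shift in a type that is wide enough, rather than rely on what a particular compiler happens to do with 1 << 32. A minimal sketch:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* 1 << 32 is undefined when int is 32 bits wide; shifting a 64-bit
       unsigned value is fully defined. */
    uint64_t bit = (uint64_t)1 << 32;
    printf("%llu\n", (unsigned long long)bit);   /* 4294967296 */
    return 0;
}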
The two statements are not mutually incompatible.
The standard does not define what behaviour each implementation is required to provide (so different implementations can do different things and still be standard conformant).
Apple is allowed to define the behaviour of its implementation.
You as a programmer would be well advised to treat the behaviour as undefined since your code may need to be moved to other platforms where the behaviour is different, and perhaps because Apple could, in theory, change its mind in the future and still conform to the standard.
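In practice that means checking for overflow before it happens, rather than testing the wrapped result afterwards. A minimal sketch, with a made-up helper name:

#include <limits.h>

/* Returns 1 and stores a + b in *result if the sum fits in an int,
   returns 0 otherwise; the addition is only performed when it is
   known not to overflow. */
static int checked_add(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *result = a + b;
    return 1;
}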
Consider the code
#include <stdint.h>

int32_t test(int mode)
{
    int32_t a = 0x12345678;
    int32_t b = mode ? a*0x10000 : a*0x10000LL;
    return b;
}
If this method is invoked with a mode value of zero, the code will compute the long long value 0x0000123456780000 and store it into b. The behavior of this is fully defined by the C standard: since bit 31 of the result is clear, it will lop off all but the bottom 32 bits and store the resulting (positive) integer into b. If bit 31 were set and the result were being stored to a 32-bit int rather than a variable of type int32_t, the implementation would have some latitude, but implementations are only allowed to define int32_t if they would perform such narrowing conversions according to the rules of two's-complement math.
If this method were invoked with a non-zero mode value, then the numerical computation would yield a result outside the range of the temporary expression value, and as such would cause Undefined Behavior. While the rules dictate what should happen if a calculation performed on a longer type is stored into a shorter one, they do not indicate what should happen if calculations don't fit in the type with which they are performed. A rather nasty gap in the standard (which should IMHO be plugged) occurs with:
#include <stdint.h>

uint16_t multiply(uint16_t x, uint16_t y)
{
    return x * y;
}
For all combinations of x and y values where the Standard says anything about what this function should do, the Standard requires that it compute and return the product mod 65536. If the Standard were to mandate that for all combinations of x and y values 0-65535 this method must return the arithmetical value of (x*y) mod 65536, it would be mandating behavior with which 99.99% of standards-compliant compilers would already be in conformance. Unfortunately, on machines where int is 32 bits, the Standard presently imposes no requirements with regard to this function's behavior in cases where the arithmetical product would be larger than 2147483647. Even though any portion of the intermediate result beyond the bottom 16 bits will be ignored, the code will try to evaluate the result using a 32-bit signed integer type; the Standard imposes no requirements on what should happen if a compiler recognizes that the product will overflow that type.
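A common defensive rewrite forces the multiplication to be done in unsigned arithmetic, so the intermediate product wraps instead of overflowing a signed int. A sketch (the function is renamed to keep it distinct from the version above):

#include <stdint.h>

uint16_t multiply_safe(uint16_t x, uint16_t y)
{
    /* The casts make the product an unsigned int computation, which is
       defined to wrap; the final conversion back to uint16_t then yields
       (x * y) mod 65536 on every conforming implementation. */
    return (uint16_t)((unsigned)x * (unsigned)y);
}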

How is {int i=999; char c=i;} different from {char c=999;}?

My friend says he read on some page on SO that they are different, but how could the two possibly be different?
Case 1
int i=999;
char c=i;
Case 2
char c=999;
In the first case, we initialize the integer i to 999, then initialize c with i, which is in fact 999. In the second case, we initialize c directly with 999. Truncation and loss of information aside, how on earth are these two cases different?
EDIT
Here's the link that I was talking of
why no overflow warning when converting int to char
One member commenting there says: "It's not the same thing. The first is an assignment, the second is an initialization."
So isn't it about a lot more than just compiler optimization?
They have the same semantics.
The constant 999 is of type int.
int i=999;
char c=i;
i is created as an object of type int and initialized with the int value 999, with the obvious semantics.
c is created as an object of type char, and initialized with the value of i, which happens to be 999. That value is implicitly converted from int to char.
The signedness of plain char is implementation-defined.
If plain char is an unsigned type, the result of the conversion is well defined. The value is reduced modulo CHAR_MAX+1. For a typical implementation with 8-bit bytes (CHAR_BIT==8), CHAR_MAX+1 will be 256, and the value stored will be 999 % 256, or 231.
If plain char is a signed type, and 999 exceeds CHAR_MAX, the conversion yields an implementation-defined result (or, starting with C99, raises an implementation-defined signal, but I know of no implementations that do that). Typically, for a 2's-complement system with CHAR_BIT==8, the result will be -25.
char c=999;
c is created as an object of type char. Its initial value is the int value 999 converted to char -- by exactly the same rules I described above.
If CHAR_MAX >= 999 (which can happen only if CHAR_BIT, the number of bits in a byte, is at least 10), then the conversion is trivial. There are C implementations for DSPs (digital signal processors) with CHAR_BIT set to, for example, 32. It's not something you're likely to run across on most systems.
You may be more likely to get a warning in the second case, since it's converting a constant expression; in the first case, the compiler might not keep track of the expected value of i. But a sufficiently clever compiler could warn about both, and a sufficiently naive (but still fully conforming) compiler could warn about neither.
As I said above, the result of converting a value to a signed type, when the source value doesn't fit in the target type, is implementation-defined. I suppose it's conceivable that an implementation could define different rules for constant and non-constant expressions. That would be a perverse choice, though; I'm not sure even the DS9K does that.
As for the referenced comment "The first is an assignment, the second is an initialization", that's incorrect. Both are initializations; there is no assignment in either code snippet. There is a difference in that one is an initialization with a constant value, and the other is not. Which implies, incidentally, that the second snippet could appear at file scope, outside any function, while the first could not.
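A small demonstration of both forms; on a typical CHAR_BIT == 8, signed-char, two's-complement implementation, both conversions are implementation-defined and commonly yield -25:

#include <stdio.h>

int main(void) {
    int  i  = 999;
    char c1 = i;      /* case 1: initialized from a non-constant int */
    char c2 = 999;    /* case 2: initialized from a constant expression */

    printf("%d %d\n", c1, c2);   /* typically prints "-25 -25" */
    return 0;
}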
Any optimizing compiler will just make the int i = 999 local variable disappear and assign the truncated value directly to c in both cases. (Assuming that you are not using i anywhere else)
It depends on your compiler and optimization settings. Take a look at the actual assembly listing to see how different they are. For GCC and reasonable optimizations, the two blocks of code are probably equivalent.
Aside from the fact that the first also defines an object i of type int, the semantics are identical.
i, which is in fact 999
No, i is a variable. Semantically, it doesn't have a value at the point of the initialization of c ... the value won't be known until runtime (even though we can clearly see what it will be, and so can an optimizing compiler). But in case 2 you're assigning 999 to a char, which doesn't fit, so the compiler issues a warning.

On a platform where NULL is represented as 0, has a compiler ever generated unexpected code for NULL <= p

In C99, equality == does not seem ever to be undefined. It can produce 1 by accident if you apply it to invalid addresses (for instance &x + 1 == &y may be true by accident). It does not produce undefined behavior. Many, but not all, invalid addresses are undefined to compute/use according to the standard, so that in p == &x with p a dangling pointer, or in &x + 2 == &y, the invalid address causes the undefined behavior, not ==.
On the other hand, >= and other comparisons are undefined when applied to pointers that do not point within the same object. That includes testing q >= NULL where q is a valid pointer. This test is the subject of my question.
I work on a static analyzer for low-level embedded code. It is normal for this kind of code to do things outside what the standard allows. As an example, an array of pointers may, in this kind of code, be initialized with memset(...,0,...), although the standard does not specify that NULL and 0 must have the same representation. In order to be useful, the analyzer must accept this kind of thing and interpret them the way the programmer expects. Warning the programmer would be perceived as a false positive.
So the analyzer is already assuming that NULL and 0 have the same representation (you are supposed to check your compiler against the analyzer to make sure they agree on this kind of assumptions). I am noticing that some programs compare valid pointers against NULL with >= (this library is an example). This works as intended as long as NULL is represented as 0 and pointer comparison is compiled as an unsigned integer comparison.
I only wish the analyzer to warn about this if, perhaps because of some aggressive optimization, it may be compiled into something different from what the programmer meant on conventional platforms. Hence my question: is there any example of a program not evaluating q >= NULL as 1, on a platform where NULL is represented as 0?
NOTE: this question is not about using 0 in a pointer context to get a null pointer. The assumption about the representation of NULL is a real assumption, because there is no conversion in the memset() example.
There are definitely pointers that, when you reinterpret them as a signed integer of pointer size, will be negative.
In particular, all kernel memory on Win32; and if you use "large address aware", then even 1 GB of user space, since you get 3 GB of user space.
I don't know the details of C pointer arithmetic, but I suspect that these might compare as < 0 in some compilers.
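For what it's worth, when the code is under your control the ordered comparison can simply be replaced by an equality test, which is well defined on every platform and is presumably what the original programmers meant. A sketch (the function name is made up):

#include <stddef.h>

/* Instead of the undefined "q >= NULL", test for a null pointer directly. */
static int pointer_is_set(const char *q)
{
    return q != NULL;
}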
