Why does GCC not detect overflow on variable initialization? - c

Why is it compiled without errors? What am I doing wrong?
#include <stdio.h>

int main (){
    int n1 = 90, n2 = 93, n3 = 95;
    int i = 2147483647;
    int ii = 2147483646;
    int iii = 2147483650;
    char c1[50] = {'\0'};
    char c2[50] = {'\0'};
    char c3[50] = {'\0'};

    n1 = sprintf(c1, "%d", i+i);
    n2 = sprintf(c2, "%d", ii);
    n3 = sprintf(c3, "%d", iii);
    printf("n1 = %d, n2 = %d, n3 = %d\n i = |%s| \n ii = |%s|\niii = |%s|\n", n1, n2, n3, c1, c2, c3);
    return 0;
}
gcc filename -Wall -Wextra -Werror
I guess %d can't hold more than an int, but it compiles anyway, and the result is:
n1 = 2, n2 = 10, n3 = 11
i = |-2|
ii = |2147483646|
iii = |-2147483646|
I was expecting a GCC error.

There is an error in your initialization of iii, where the constant provided does not fit in an int.
GCC will diagnose this issue if you enable -pedantic. From the documentation:
-Wpedantic
-pedantic
Issue all the warnings demanded by strict ISO C and ISO C++; reject all programs that use forbidden extensions, and some other programs that do not follow ISO C and ISO C++. For ISO C, follows the version of the ISO C standard specified by any -std option used.
When doing so, I get the error:
.code.tio.c: In function ‘main’:
.code.tio.c:7:15: error: overflow in conversion from ‘long int’ to ‘int’ changes value from ‘2147483650’ to ‘-2147483646’ [-Werror=overflow]
    int iii = 2147483650;
              ^~~~~~~~~~
cc1: all warnings being treated as errors
Try it online!
Other problems
Arithmetic leading to signed integer overflow
The arithmetic operation i+i triggers signed integer overflow, which has undefined behavior, so the compiler is free to do whatever it wants with that code.
Note that both operands to the + operator have type int, so if any result is generated, it would be an int. However, since signed integer overflow is undefined, no result may be generated (e.g., the program could just halt), or a random result may be generated, or some other overflow behavior may occur that matches your observation.
In the general case, there isn't any way for the compiler to know if any particular operation will actually cause overflow. In this case, static code analysis may have revealed it. I believe GCC does perform some rudimentary static code analysis, but it is not required to identify every instance of undefined behavior.
Using sprintf instead of snprintf
While it is safe to use sprintf in your particular context, it is generally preferable to use snprintf to guard against buffer overflow exploits. snprintf simply needs an extra parameter to indicate the size of the buffer, and it will NUL terminate the string for you.
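As a quick illustration (not part of the original code), here is how the ii line from the question might look with snprintf:
#include <stdio.h>

int main(void)
{
    int ii = 2147483646;
    char c2[50] = {'\0'};

    /* snprintf writes at most sizeof c2 bytes, including the terminating
       '\0', so an undersized buffer cannot be overrun. */
    int n2 = snprintf(c2, sizeof c2, "%d", ii);
    printf("n2 = %d, ii = |%s|\n", n2, c2);
    return 0;
}
snprintf returns the number of characters that would have been written (not counting the terminator), so a return value of sizeof c2 or more tells you the output was truncated.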

I can see how, even after jxh's excellent answer, somebody would still be saying "but whyyy?"
Here's why:
typedef int HANDLE;
#define HKEY_LOCAL_MACHINE ((HANDLE)0x80000001)
No, HANDLE isn't int anymore, but it was in 1994. Everybody and their brother depended on signed overflow just working at compile time. If you changed it you broke your platform headers. That didn't happen until the big 64 bit port.
The ancient compilers simply didn't check for constants out of range. They just parsed the constant with something analogous to strtol; the overflow was really a runtime overflow inside the compiler itself, and without code written to detect it, it simply didn't exist.
The static analysis didn't see 0x80000001; it saw -bignum. This used to bite people when cross compiling to different bitnesses; sometimes compile time constants were just wrong. One by one all this stuff got cleaned up, but there were too many places that depended on no warning on overflow (because the last thing you want is warnings in the platform headers), so it was left as is.

Related

How to filter the expression akin to `int c = 1/0*0` in C language?

OS: Ubuntu 18.04
GCC: 7.5.0
I'm writing an expression generator to test my simple debugger, and I want to filter out the expressions with division-by-zero behavior. However, I have run into a troubling problem.
A definite division by zero such as int c = 1/0 raises a signal, so I can handle those cases with signal(). Nevertheless, in a case like int c = 1/0*0, c is equal to 0 and the program never traps into the signal handler.
The test code is below.
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

void ss(int sig){
    printf("division by zero.\n");
    exit(0);
}

int main(){
    int c;
    signal(SIGFPE, ss);
    c = (1u/0u)*0u;
    printf("%d\n", c);
}
Here is the result.
gcc -o divide_0 divide_0.c
divide_0.c: In function ‘main’:
divide_0.c:15:11: warning: division by zero [-Wdiv-by-zero]
c = (1u/0u)*0u;
^
0
How can I capture the warning in this case?
Forcing the compiler to execute a division by zero is actually hard:
First, as you have noted, the compiler may evaluate 1/0 at compile-time. To prevent it from knowing the divisor is zero, we can use a volatile object, as in volatile int zero = 0; c = 1/zero;. The volatile qualifier tells the compiler the object may be changed by means unknown to it, so it cannot assume it is zero and must get the value when the expression is being evaluated.
Multiplying by zero in (1u/0u)*0u is counterproductive, even after we change the divisor to zero. The compiler can reason that any defined result of (1u/zero)*0u is zero, and therefore it is allowed to use zero as the union of the defined results (zero) and the undefined results (whatever the compiler likes), so it can replace (1u/zero)*0u with zero, 0. That is, it must still evaluate zero because it is volatile, but then it is free to just produce zero as the result without doing the division.
The compiler does not have to use a division instruction to evaluate a division operator. I tested with Apple Clang 11, and it was evaluating 1u/zero with instructions equivalent to zero == 1 ? 1 : 0;. In other words, it just did a compare and a set, not a division, presumably because compare and set is faster. When I changed this to 13u/zero, then the compiler used a divide instruction. I expect your best bet for getting a divide instruction would be to use volatile int unknown = 0; unknown = unknown/unknown;. Even then, a compiler would be allowed by the C standard to perform the division using any instructions it wanted, not a divide. But I presume compilers will generally generate a divide instruction in this case.
Then this code will execute a division at run-time. Whether that causes a signal and what happens with that signal depends on your computing platform.
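For concreteness, here is a minimal sketch (not from the original answer) combining the volatile-divisor idea with the question's signal handler, assuming a platform where integer division by zero raises SIGFPE (which, as noted below, is not guaranteed):
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>

static void ss(int sig){
    (void)sig;                 /* unused */
    printf("division by zero.\n");
    exit(0);
}

int main(void){
    volatile int zero = 0;     /* compiler cannot assume this stays 0 */
    int c;

    signal(SIGFPE, ss);
    c = 13 / zero;             /* division actually happens at run time */
    printf("%d\n", c);         /* not reached if SIGFPE fires */
    return 0;
}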
Division by zero is not guaranteed to generate a signal.
Whenever your expression evaluator performs a division, it needs to check if the divisor is 0, and if so perform an appropriate action.
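For the expression evaluator, an explicit check before dividing is more portable than relying on a signal. A minimal sketch (the helper name checked_div is illustrative, not from the question):
#include <stdbool.h>
#include <limits.h>

/* Stores x / y in *result and returns true when the division is defined;
   returns false on division by zero or on INT_MIN / -1, which overflows. */
static bool checked_div(int x, int y, int *result)
{
    if (y == 0 || (x == INT_MIN && y == -1))
        return false;
    *result = x / y;
    return true;
}
Your generator can then reject or rewrite any expression for which checked_div reports failure.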
I assume that the compiler somehow just removes the division. It is allowed to do that. So instead try this code:
int main(int argc, char **argv){
    int c;
    int d = argc - 1;
    signal(SIGFPE, ss);
    c = 1u/d;
    printf("%d\n", c);
}
Call it without arguments. The trick here is that the compiler cannot know how many arguments you will give the program, so it cannot optimize it away. Well, not as easily anyway.
The compiler itself complains about this since it can tell that your code is faulty and unsafe. However, you can try using the -Wno-div-by-zero option with gcc at your own risk.

C: How to best handle unsigned integers modulo arithmetic ("wrap-around") when using unsigned operands in calculations

There was this range checking function that required two signed integer parameters:
range_limit(long int lower, long int upper)
It was called with range_limit(0, controller_limit). I needed to expand the range check to also include negative numbers up to the 'controller_limit' magnitude.
I naively changed the call to
range_limit(-controller_limit, controller_limit)
Although it compiled without warnings, this did not work as I expected.
I had missed that controller_limit was an unsigned integer.
In C, simple integer calculations can lead to surprising results. For example, these calculations
0u - 1;
or more relevant
unsigned int ui = 1;
-ui;
result in 4294967295 of type unsigned int (aka UINT_MAX). As I understand it, this is due to the integer conversion rules and the modulo arithmetic of unsigned operands; see here.
By definition, unsigned arithmetic does not overflow but rather "wraps-around". This behavior is well defined, so the compiler will not issue a warning (at least not gcc) if you use these expressions calling a function:
#include <stdio.h>

void f_l(long int li) {
    printf("%li\n", li); // outputs: 4294967295
}

int main(void)
{
    unsigned int ui = 1;
    f_l(-ui);
    return 0;
}
Try this code for yourself!
So instead of passing a negative value I passed a ridiculously high positive value to the function.
My fix was to cast from unsigned integer into int:
range_limit(-(int)controller_limit, controller_limit);
Obviously, integer modulo behavior in combination with integer conversion rules allows for subtle mistakes that are hard to spot, especially as the compiler does not help in finding them.
As the compiler does not emit any warnings and you can come across these kind of calculations any day, I'd like to know:
If you have to deal with unsigned operands, how do you best avoid the unsigned integers modulo arithmetic pitfall?
Note:
While gcc does not provide any help in detecting unsigned modulo arithmetic (at the time of writing), clang does. The compiler flag "-fsanitize=unsigned-integer-overflow" enables detection of modulo arithmetic (using "-Wconversion" is not sufficient), although at runtime rather than at compile time. Try for yourself!
Further reading:
Seacord: Secure Coding in C and C++, Chapter 5, Integer Security
Using signed integers does not change the situation at all.
A C implementation is under no obligation to raise a run-time warning or error as a response to Undefined Behaviour. Undefined Behaviour is undefined, as it says; the C standard provides absolutely no requirements or guidance about the outcome. A particular implementation can choose any mechanism it sees fit in response to Undefined Behaviour, including explicitly defining the result. (If you rely on that explicit definition, your program is no longer portable to other compilers with different or undocumented behaviour. Perhaps you don't care.)
For example, GCC defines the result of out-of-bounds integer conversions and some bitwise operations in Implementation-defined behaviour section of its manual.
If you're worried about integer overflow (and there are lots of times you should be worried about it), it's up to you to protect yourself.
For example, instead of allowing:
unsigned_counter += 5;
to overflow, you could write:
if (unsigned_counter > UINT_MAX - 5) {
    /* Handle the error */
}
else {
    unsigned_counter += 5;
}
And you should do that in cases where integer overflow will get you into trouble. A common example, which can (and has!) lead to buffer-overflow exploits, comes from checking whether a buffer has enough room for an addition:
if (buffer_length + added_length >= buffer_capacity) {
    /* Reallocate buffer or fail */
}
memcpy(buffer + buffer_length, add_characters, added_length);
buffer_length += added_length;
buffer[buffer_length] = 0;
If buffer_length + added_length overflows -- in either signed or unsigned arithmetic -- the necessary reallocation (or failure) won't trigger and the memcpy will overwrite memory or segfault or do something else you weren't expecting.
It's easy to fix, so it's worth getting into the habit:
if (added_length >= buffer_capacity
    || buffer_length >= buffer_capacity - added_length) {
    /* Reallocate buffer or fail */
}
memcpy(buffer + buffer_length, add_characters, added_length);
buffer_length += added_length;
buffer[buffer_length] = 0;
Another similar case where you can get into serious trouble is when you are using a loop and your increment is more than one.
This is safe:
for (i = 0; i < limit; ++i) ...
This could lead to an infinite loop:
for (i = 0; i < limit; i += 2) ...
The first one is safe -- assuming i and limit are the same type -- because i + 1 cannot overflow if i < limit. The most it can be is limit itself. But no such guarantee can be made about i + 2, since limit could be INT_MAX (or whatever is the maximum value for the integer type being used). Again, the fix is simple: compare the difference rather than the sum.
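For illustration, a minimal sketch of one such rewrite of the step-of-2 loop (assuming i and limit are non-negative ints, as above):
for (i = 0; i < limit; ) {
    /* ... loop body ... */

    /* limit - i cannot overflow here (0 <= i < limit); stopping when at most
       two values remain avoids the i += 2 that could overflow near INT_MAX. */
    if (limit - i <= 2)
        break;
    i += 2;
}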
If you're using GCC and you don't care about full portability, you can use the GCC overflow-detection builtins to help you. They're also documented in the GCC manual.
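For example, a minimal sketch using __builtin_add_overflow (the variable names are illustrative):
#include <stdio.h>

int main(void)
{
    int a = 2000000000, b = 2000000000, sum;

    /* Returns nonzero if a + b does not fit in an int; in that case sum
       holds the wrapped result and should not be used. */
    if (__builtin_add_overflow(a, b, &sum))
        printf("a + b would overflow\n");
    else
        printf("a + b = %d\n", sum);
    return 0;
}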

Which "C" implementation(s) do not implement modulo arithmetic for signed integers?

In reference to C11 draft, section 3.4.3 and C11 draft, section H.2.2, I'm looking for "C" implementations that implement behaviour other than modulo arithmetic for signed integers.
Specifically, I am looking for instances where this is the default behaviour, possibly due to the underlying machine architecture.
Here's a code sample and terminal session that illustrates modulo arithmetic behaviour for signed integers:
overflow.c:
#include <stdio.h>
#include <limits.h>

int main(int argc, char *argv[])
{
    int a, b;
    printf ( "INT_MAX = %d\n", INT_MAX );
    if ( argc == 2 && sscanf(argv[1], "%d,%d", &a, &b) == 2 ) {
        int c = a + b;
        printf ( "%d + %d = %d\n", a, b, c );
    }
    return 0;
}
Terminal session:
$ ./overflow 2000000000,2000000000
INT_MAX = 2147483647
2000000000 + 2000000000 = -294967296
Even with a "familiar" compiler like gcc, on a "familiar" platform like x86, signed integer overflow can do something other than the "obvious" twos-complement wraparound behavior.
One amusing (or possibly horrifying) example is the following (see on godbolt):
#include <stdio.h>

int main(void) {
    for (int i = 0; i >= 0; i += 1000000000) {
        printf("%d\n", i);
    }
    printf("done\n");
    return 0;
}
Naively, you would expect this to output
0
1000000000
2000000000
done
And with gcc -O0 you would be right. But with gcc -O2 you get
0
1000000000
2000000000
-1294967296
-294967296
705032704
...
continuing indefinitely. The arithmetic is twos-complement wraparound, all right, but something seems to have gone wrong with the comparison in the loop condition.
In fact, if you look at the assembly output, you'll see that gcc has omitted the comparison entirely, and made the loop unconditionally infinite. It is able to deduce that if there were no overflow, the loop could never terminate, and since signed integer overflow is undefined behavior, it is free to have the loop not terminate in that case either. The simplest and "most efficient" legal code is therefore to never terminate at all, since that avoids an "unnecessary" comparison and conditional jump.
You might consider this either cool or perverse, depending on your point of view.
(For extra credit: look at what icc -O2 does and try to explain it.)
On many platforms, requiring that a compiler perform precise integer-size truncation would cause many constructs to run less efficiently than would be possible if they were allowed to use looser truncation semantics. For example, given int muldiv(int x, int y) { return x*y/60; }, a compiler that was allowed to use loose integer semantics could replace muldiv(x,240); with x<<2, but one which was required to use precise semantics would need to actually perform the multiplication and division. Such optimizations are useful, and generally won't pose problems if casting operators are used in cases where programs need mod-reduced arithmetic, and compilers process a cast to a particular size as implying truncation to that size.
Even when using unsigned values, the presence of a cast in (uint32_t)(uint32a-uint32b) > uint32c makes the programmer's intention clearer, and would be necessary to ensure that code operates the same on systems with 64-bit int as on those with 32-bit int. So if one wants to test for integer wraparound, even on a compiler that would define the behavior, I would regard (int)(x+someUnsignedChar) < x as superior to x+someUnsignedChar < x, because the cast lets a human reader know the code is deliberately treating values as something other than normal mathematical integers.
The big problem is that some compilers are prone to generate code which behaves nonsensically in case of integer overflow. Even a construct like unsigned mul_mod_65536(unsigned short x, unsigned short y) { return (x*y) & 0xFFFFu; }, which the authors of the Standard expected commonplace implementations to process in a way indistinguishable from unsigned math, will sometimes cause gcc to generate nonsensical code in cases where x would exceed INT_MAX/y.
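One common way to keep that construct out of signed int arithmetic is to cast an operand before multiplying; a sketch (not from the original answer):
unsigned mul_mod_65536(unsigned short x, unsigned short y)
{
    /* The cast makes the multiplication unsigned, so it wraps modulo
       UINT_MAX+1 instead of overflowing a signed int. */
    return ((unsigned)x * y) & 0xFFFFu;
}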

C: gcc implicitly converts signed char to unsigned char and vice versa?

I'm trying to learn C and got stuck on data type sizes at the moment.
Have a look at this code snippet:
#include <stdio.h>
#include <limits.h>

int main() {
    char a = 255;
    char b = -128;
    a = -128;
    b = 255;
    printf("size: %lu\n", sizeof(char));
    printf("min: %d\n", CHAR_MIN);
    printf("max: %d\n", CHAR_MAX);
}
The printf-output is:
size: 1
min: -128
max: 127
How is that possible? The size of char is 1 byte and the default char seems to be signed (-128...127). So how can I assign a value > 127 without getting an overflow warning (which I get when I try to assign -129 or 256)? Is gcc automatically converting to unsigned char? And then, when I assign a negative value, does it convert back? Why does it do so? I mean, all this implicitness wouldn't make it easier to understand.
EDIT:
Okay, it's not converting anything:
char a = 255;
char b = 128;
printf("%d\n", a); /* -1 */
printf("%d\n", b); /* -128 */
So it starts counting from the bottom up. But why doesn't the compiler give me a warning? And why does it do so when I try to assign 256?
See 6.3.1.3/3 in the C99 Standard
... the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
So, if you don't get a signal (if your program doesn't stop) read the documentation for your compiler to understand what it does.
gcc documents the behaviour ( in http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html#Integers-implementation ) as
The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (C90 6.2.1.2, C99 6.3.1.3).
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
how can I assign a value > 127
The result of converting an out-of-range integer value to a signed integer type is either an implementation-defined result or an implementation-defined signal (6.3.1.3/3). So your code is legal C, it just doesn't have the same behavior on all implementations.
without getting an overflow warning
It's entirely up to GCC to decide whether to warn or not about valid code. I'm not quite sure what its rules are, but I get a warning for initializing a signed char with 256, but not with 255. I guess that's because a warning for code like char a = 0xFF would normally not be wanted by the programmer, even when char is signed. There is a portability issue, in that the same code on another compiler might raise a signal or result in the value 0 or 23.
-pedantic enables a warning for this (thanks, pmg), which makes sense since -pedantic is intended to help write portable code. Or arguably doesn't make sense, since as R.. points out it's beyond the scope of merely putting the compiler into standard-conformance mode. However, the man page for gcc says that -pedantic enables diagnostics required by the standard. This one isn't, but the man page also says:
Some users try to use -pedantic to check programs for strict ISO C
conformance. They soon find that it does not do quite what they want:
it finds some non-ISO practices, but not all---only those for which
ISO C requires a diagnostic, and some others for which diagnostics
have been added.
This leaves me wondering what a "non-ISO practice" is, and suspecting that char a = 255 is one of the ones for which a diagnostic has been specifically added. Certainly "non-ISO" means more than just things for which the standard demands a diagnostic, but gcc obviously is not going so far as to diagnose all non-strictly-conforming code of this kind.
I also get a warning for initializing an int with ((long long)UINT_MAX) + 1, but not with UINT_MAX. Looks as if by default gcc consistently gives you the first power of 2 for free, but after that it thinks you've made a mistake.
Use -Wconversion to get a warning about all of those initializations, including char a = 255. Beware that will give you a boatload of other warnings that you may or may not want.
all this implicitness wouldn't make it easier to understand
You'll have to take that up with Dennis Ritchie. C is weakly-typed as far as arithmetic types are concerned. They all implicitly convert to each other, with various levels of bad behavior when the value is out of range depending on the types involved. Again, -Wconversion warns about the dangerous ones.
There are other design decisions in C that mean the weakness is quite important to avoid unwieldy code. For example, the fact that arithmetic is always done in at least an int means that char a = 1, b = 2; a = a + b involves an implicit conversion from int to char when the result of the addition is assigned to a. If you use -Wconversion, or if C didn't have the implicit conversion at all, you'd have to write a = (char)(a+b), which wouldn't be too popular. For that matter, char a = 1 and even char a = 'a' are both implicit conversions from int to char, since C has no literals of type char. So if it wasn't for all those implicit conversions either various other parts of the language would have to be different, or else you'd have to absolutely litter your code with casts. Some programmers want strong typing, which is fair enough, but you don't get it in C.
Simple solution:
A signed char can hold values from -128 to 127. So when you assign 129 to a char, the value wraps around:
127 (the last valid value) + 2 (the remaining excess) = -127
(assign char a = 129 and print it; the value comes out as -127).
Think of the char values as a circle:
..., 126, 127, -128, -127, -126, ..., -1, 0, 1, 2, ...
Whatever you assign, the final value comes from this wrap-around calculation.

Declaring an array of negative length

What happens in C when you create an array of negative length?
For instance:
int n = -35;
int testArray[n];

for(int i = 0; i < 10; i++)
    testArray[i] = i + 1;
This code will compile (and brings up no warnings with -Wall enabled), and it seems you can assign to testArray[0] without issue. Assigning past that gives either a segfault or an illegal instruction error, and reading anything from the array says "Abort trap" (I'm not familiar with that one). I realize this is somewhat academic, and would (hopefully) never come up in real life, but is there any particular way that the C standard says to treat such arrays, or does it vary from compiler to compiler?
It's undefined behaviour, because it breaks a "shall" constraint:
C99 §6.7.5.2:
If the size is an expression that is not an integer constant expression... each time it is evaluated it shall have a value greater than zero.
Undefined behavior, I believe, though don't quote me on that.
This gives "error: size of array 'testArray' is negative" in gcc:
int testArray[-35];
though, as you've seen:
int n = -35;
int testArray[n];
does not give an error even with both -Wall and -W.
However, if you use -pedantic flag, gcc will warn that ISO C90 forbids variable length array.
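Since the constraint is only checked (if at all) at run time, one defensive option is to validate the length yourself. A minimal sketch, reusing the question's names:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int n = -35;

    if (n <= 0) {                    /* a VLA size must be greater than zero */
        fprintf(stderr, "invalid array length: %d\n", n);
        return EXIT_FAILURE;
    }

    int testArray[n];
    for (int i = 0; i < n && i < 10; i++)
        testArray[i] = i + 1;
    printf("testArray[0] = %d\n", testArray[0]);
    return 0;
}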
Visual Studio gives an error message at compilation; you can use -1 to say an empty array. It expects an int and you are passing an int, so there is no compiler error.
