C: How to best handle unsigned integer modulo arithmetic ("wrap-around") when using unsigned operands in calculations

There was this range checking function that required two signed integer parameters:
range_limit(long int lower, long int upper)
It was called with range_limit(0, controller_limit). I needed to expand the range check to also include negative numbers up to the 'controller_limit' magnitude.
I naively changed the call to
range_limit(-controller_limit, controller_limit)
Although it compiled without warnings, this did not work as I expected.
I missed that controller_limit was an unsigned integer.
In C, simple integer calculations can lead to surprising results. For example these calculations
0u - 1;
or more relevant
unsigned int ui = 1;
-ui;
result in 4294967295 of type unsigned int (aka UINT_MAX, assuming a 32-bit unsigned int). As I understand it, this is due to the integer conversion rules and the modulo arithmetic of unsigned operands.
By definition, unsigned arithmetic does not overflow but rather "wraps around". This behavior is well defined, so the compiler will not issue a warning (at least gcc does not) if you use these expressions when calling a function:
#include <stdio.h>
void f_l(long int li) {
printf("%li\n", li); // outputs: 4294967295 where long is 64 bits (e.g. x86-64 Linux)
}
int main(void)
{
unsigned int ui = 1;
f_l(-ui);
return 0;
}
Try this code for yourself!
So instead of passing a negative value I passed a ridiculously high positive value to the function.
My fix was to cast from unsigned integer into int:
range_limit(-(int)controller_limit, controller_limit);
Obviously, integer modulo behavior in combination with the integer conversion rules allows for subtle mistakes that are hard to spot, especially as the compiler does not help in finding them.
As the compiler does not emit any warnings and you can come across these kinds of calculations any day, I'd like to know:
If you have to deal with unsigned operands, how do you best avoid the unsigned integers modulo arithmetic pitfall?
Note:
While gcc does not provide any help in detecting unsigned modulo arithmetic (at the time of writing), clang does. The compiler flag "-fsanitize=unsigned-integer-overflow" enables detection of unsigned wrap-around (using "-Wconversion" is not sufficient), though at runtime rather than at compile time. Try it for yourself!
Further reading:
Seacord: Secure Coding in C and C++, Chapter 5, Integer Security

Using signed integers does not change the situation at all.
A C implementation is under no obligation to raise a run-time warning or error as a response to Undefined Behaviour. Undefined Behaviour is undefined, as it says; the C standard provides absolutely no requirements or guidance about the outcome. A particular implementation can choose any mechanism it sees fit in response to Undefined Behaviour, including explicitly defining the result. (If you rely on that explicit definition, your program is no longer portable to other compilers with different or undocumented behaviour. Perhaps you don't care.)
For example, GCC defines the result of out-of-bounds integer conversions and some bitwise operations in the Implementation-defined behaviour section of its manual.
If you're worried about integer overflow (and there are lots of times you should be worried about it), it's up to you to protect yourself.
For example, instead of allowing:
unsigned_counter += 5;
to overflow, you could write:
if (unsigned_counter > UINT_MAX - 5) {
/* Handle the error */
}
else {
unsigned_counter += 5;
}
And you should do that in cases where integer overflow will get you into trouble. A common example, which can lead (and has led!) to buffer-overflow exploits, comes from checking whether a buffer has enough room for an addition:
if (buffer_length + added_length >= buffer_capacity) {
/* Reallocate buffer or fail*/
}
memcpy(buffer + buffer_length, add_characters, added_length);
buffer_length += added_length;
buffer[buffer_length] = 0;
If buffer_length + added_length overflows -- in either signed or unsigned arithmetic -- the necessary reallocation (or failure) won't trigger and the memcpy will overwrite memory or segfault or do something else you weren't expecting.
It's easy to fix, so it's worth getting into the habit:
if (added_length >= buffer_capacity
|| buffer_length >= buffer_capacity - added_length) {
/* Reallocate buffer or fail*/
}
memcpy(buffer + buffer_length, add_characters, added_length);
buffer_length += added_length;
buffer[buffer_length] = 0;
Another similar case where you can get into serious trouble is when you are using a loop and your increment is more than one.
This is safe:
for (i = 0; i < limit; ++i) ...
This could lead to an infinite loop:
for (i = 0; i < limit; i += 2) ...
The first one is safe -- assuming i and limit are the same type -- because i + 1 cannot overflow if i < limit. The most it can be is limit itself. But no such guarantee can be made about i + 2, since limit could be INT_MAX (or whatever is the maximum value for the integer type being used). Again, the fix is simple: compare the difference rather than the sum.
If you're using GCC and you don't care about full portability, you can use the GCC overflow-detection builtins to help you. They're also documented in the GCC manual.

Related

Which "C" implementation(s) do not implement modulo arithmetic for signed integers?

With reference to the C11 draft, sections 3.4.3 and H.2.2, I'm looking for "C" implementations that implement behaviour other than modulo arithmetic for signed integers.
Specifically, I am looking for instances where this is the default behaviour, possibly due to the underlying machine architecture.
Here's a code sample and terminal session that illustrates modulo arithmetic behaviour for signed integers:
overflow.c:
#include <stdio.h>
#include <limits.h>
int main(int argc, char *argv[])
{
int a, b;
printf ( "INT_MAX = %d\n", INT_MAX );
if ( argc == 2 && sscanf(argv[1], "%d,%d", &a, &b) == 2 ) {
int c = a + b;
printf ( "%d + %d = %d\n", a, b, c );
}
return 0;
}
Terminal session:
$ ./overflow 2000000000,2000000000
INT_MAX = 2147483647
2000000000 + 2000000000 = -294967296
Even with a "familiar" compiler like gcc, on a "familiar" platform like x86, signed integer overflow can do something other than the "obvious" twos-complement wraparound behavior.
One amusing (or possibly horrifying) example is the following (see on godbolt):
#include <stdio.h>
int main(void) {
for (int i = 0; i >= 0; i += 1000000000) {
printf("%d\n", i);
}
printf("done\n");
return 0;
}
Naively, you would expect this to output
0
1000000000
2000000000
done
And with gcc -O0 you would be right. But with gcc -O2 you get
0
1000000000
2000000000
-1294967296
-294967296
705032704
...
continuing indefinitely. The arithmetic is twos-complement wraparound, all right, but something seems to have gone wrong with the comparison in the loop condition.
In fact, if you look at the assembly output, you'll see that gcc has omitted the comparison entirely, and made the loop unconditionally infinite. It is able to deduce that if there were no overflow, the loop could never terminate, and since signed integer overflow is undefined behavior, it is free to have the loop not terminate in that case either. The simplest and "most efficient" legal code is therefore to never terminate at all, since that avoids an "unnecessary" comparison and conditional jump.
You might consider this either cool or perverse, depending on your point of view.
(For extra credit: look at what icc -O2 does and try to explain it.)
On many platforms, requiring that a compiler perform precise integer-size truncation would cause many constructs to run less efficiently than would be possible if they were allowed to use looser truncation semantics. For example, given int muldiv(int x, int y) { return x*y/60; }, a compiler that was allowed to use loose integer semantics could replace muldiv(x,240); with x<<2, but one which was required to use precise semantics would need to actually perform the multiplication and division. Such optimizations are useful, and generally won't pose problems if casting operators are used in cases where programs need mod-reduced arithmetic, and compilers process a cast to a particular size as implying truncation to that size.
Even when using unsigned values, the presence of a cast in (uint32_t)(uint32a-uint32b) > uint32c makes the programmer's intention clearer, and would be necessary to ensure that the code operates the same on systems with 64-bit int as on those with 32-bit int. So if one wants to test for integer wraparound, even on a compiler that would define the behavior, I would regard (int)(x+someUnsignedChar) < x as superior to x+someUnsignedChar < x, because the cast lets a human reader know the code is deliberately treating values as something other than normal mathematical integers.
The big problem is that some compilers are prone to generate code which behaves nonsensically in case of integer overflow. Even a construct like unsigned mul_mod_65536(unsigned short x, unsigned short y) { return (x*y) & 0xFFFFu; }, which the authors of the Standard expected commonplace implementations to process in a way indistinguishable from unsigned math, will sometimes cause gcc to generate nonsensical code in cases where x would exceed INT_MAX/y.

C safely taking absolute value of integer

Consider the following program (C99):
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
int main(void)
{
printf("Enter int in range %jd .. %jd:\n > ", INTMAX_MIN, INTMAX_MAX);
intmax_t i;
if (scanf("%jd", &i) == 1)
printf("Result: |%jd| = %jd\n", i, imaxabs(i));
}
Now as I understand it, this contains easily triggerable undefined behaviour, like this:
Enter int in range -9223372036854775808 .. 9223372036854775807:
> -9223372036854775808
Result: |-9223372036854775808| = -9223372036854775808
Questions:
Is this really undefined behaviour, as in "code is allowed to trigger any code path, any code that strikes the compiler's fancy", when the user enters the bad number? Or is it some other flavor of not-completely-defined?
How would a pedantic programmer go about guarding against this, without making any assumptions not guaranteed by standard?
(There are a few related questions, but I didn't find one which answers question 2 above, so if you suggest duplicate, please make sure it answers that.)
If the result of imaxabs cannot be represented, which can happen when using two's complement, the behavior is undefined.
7.8.2.1 The imaxabs function
The imaxabs function computes the absolute value of an integer j. If the result cannot
be represented, the behavior is undefined. 221)
221) The absolute value of the most negative number cannot be represented in two’s complement.
The check that makes no assumptions and is always defined is:
intmax_t i = ... ;
if( i < -INTMAX_MAX )
{
//handle error
}
(This if statement cannot be taken when using ones' complement or sign-magnitude representation, so the compiler might give an unreachable-code warning. The code itself is still defined and valid.)
How would a pedantic programmer go about guarding against this, without making any assumptions not guaranteed by standard?
One method is to use unsigned integers. The overflow behaviour of unsigned integers is well-defined as is the behaviour when converting from a signed to an unsigned integer.
So I think the following should be safe (it turns out it's horribly broken on some really obscure systems; see later in the post for an improved version):
uintmax_t j = i;
if (j > (uintmax_t)INTMAX_MAX) {
j = -j;
}
printf("Result: |%jd| = %ju\n", i, j);
So how does this work?
uintmax_t j = i;
This converts the signed integer into an unsigned one. If it's non-negative, the value stays the same; if it's negative, the value increases by 2^n (where n is the number of bits, so 2^n = UINTMAX_MAX + 1). This converts it to a large number (larger than INTMAX_MAX).
if (j > (uintmax_t)INTMAX_MAX) {
If the original number was positive (and hence less than or equal to INTMAX_MAX) this does nothing. If the original number was negative the inside of the if block is run.
j = -j;
The number is negated. The mathematical result of the negation is negative and so cannot be represented as an unsigned integer, so it is increased by 2^n.
So algebraically the result for negative i looks like
j = -(i + 2^n) + 2^n = -i
Clever, but this solution makes assumptions. It fails if INTMAX_MAX == UINTMAX_MAX, which is allowed by the C Standard.
Hmm, let's look at this. (I'm reading https://busybox.net/~landley/c99-draft.html, which is apparently the last C99 draft prior to standardisation; if anything changed in the final standard, please do tell me.)
When typedef names differing only in the absence or presence of the initial u are defined, they shall denote corresponding signed and unsigned types as described in 6.2.5; an implementation shall not provide a type without also providing its corresponding type.
In 6.2.5 I see
For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements.
In 6.2.6.2 I see
#1
For unsigned integer types other than unsigned char, the bits of the object representation shall be divided into two groups: value bits and padding bits (there need not be any of the latter). If there are N value bits, each bit shall represent a different power of 2 between 1 and 2^(N-1), so that objects of that type shall be capable of representing values from 0 to 2^N - 1 using a pure binary representation; this shall be known as the value representation. The values of any padding bits are unspecified.39)
#2
For signed integer types, the bits of the object representation shall be divided into three groups: value bits, padding bits, and the sign bit. There need not be any padding bits; there shall be exactly one sign bit. Each bit that is a value bit shall have the same value as the same bit in the object representation of the corresponding unsigned type (if there are M value bits in the signed type and N in the unsigned type, then M<=N). If the sign bit is zero, it shall not affect the resulting value.
So yes, it seems you are right: while the signed and unsigned types have to be the same size, it does seem to be valid for the unsigned type to have one more padding bit than the signed type.
OK, based on the analysis above, which revealed a flaw in my first attempt, I've written a more paranoid variant. It has two changes from my first version.
I use i < 0 rather than j > (uintmax_t)INTMAX_MAX to check for negative numbers. This means that the algorithm produces correct results for numbers greater than or equal to -INTMAX_MAX even when INTMAX_MAX == UINTMAX_MAX.
I add handling for the error case where INTMAX_MAX == UINTMAX_MAX, INTMAX_MIN == -INTMAX_MAX - 1 and i == INTMAX_MIN. This results in j == 0 inside the if block, which we can easily test for.
It can be seen from the requirements in the C standard that INTMAX_MIN cannot be smaller than -INTMAX_MAX -1 since there is only one sign bit and the number of value bits must be the same or lower than in the corresponding unsigned type. There are simply no bit patterns left to represent smaller numbers.
uintmax_t j = i;
if (i < 0) {
j = -j;
if (j == 0) {
printf("your platform sucks\n");
exit(1);
}
}
printf("Result: |%jd| = %ju\n", i, j);
@plugwash I think 2501 is correct. For example, the value -UINTMAX_MAX becomes 1: (-UINTMAX_MAX + (UINTMAX_MAX + 1)), and is not caught by your if. – hyde
Umm,
assuming INTMAX_MAX == UINTMAX_MAX and i = -INTMAX_MAX
uintmax_t j = i;
after this command j = -INTMAX_MAX + (UINTMAX_MAX + 1) = 1
if (i < 0) {
i is less than zero so we run the commands inside the if
j = -j;
after this command j = -1 + (UINTMAX_MAX + 1) = UINTMAX_MAX
which is the correct answer, so no need to trap it in an error case.
On two's complement systems, taking the absolute value of the most negative value is indeed undefined behavior, as the absolute value would be out of range. And the compiler can't help you with it, as the UB happens at run time.
The only way to protect against that is to compare the input against the most negative value for the type (INTMAX_MIN in the code you show).
So calculating the absolute value of an integer invokes undefined behaviour in one single case. Actually, while the undefined behaviour can be avoided, it is impossible to give the correct result in one case.
Now consider multiplication of an integer by 3: Here we have a much more serious problem. This operation invokes undefined behaviour in 2/3rds of all cases! And for two thirds of all int values x, finding an int with the value 3x is just impossible. That's a much more serious problem than the absolute value problem.
You may want to use some bit hacks:
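int v; // we want to find the absolute value of v
unsigned int r; // the result goes here
int const mask = v >> (sizeof(int) * CHAR_BIT - 1); // -1 if v < 0, else 0 (assumes an arithmetic right shift)
r = ((unsigned int)v + (unsigned int)mask) ^ (unsigned int)mask; // unsigned arithmetic cannot overflow
This works for every v, provided that >> on a negative int is an arithmetic shift (that is implementation-defined, though common compilers do it). Doing the addition in unsigned arithmetic matters: with plain (v + mask), v == INT_MIN would overflow, which is undefined behavior. In unsigned arithmetic, v == INT_MIN yields r == (unsigned int)INT_MAX + 1, the correct magnitude.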
You can also use bitwise operation to handle this on ones' complement and sign-magnitude systems.
Reference: https://graphics.stanford.edu/~seander/bithacks.html#IntegerAbs
According to the man page at http://linux.die.net/man/3/imaxabs:
Notes
Trying to take the absolute value of the most negative integer is not defined.
To handle the full range you could add something like this to your code
if (i != INTMAX_MIN) {
printf("Result: |%jd| = %jd\n", i, imaxabs(i));
} else { /* Code around undefined abs( INTMAX_MIN) */
printf("Result: |%jd| = %jd%jd\n", i, -(i/10), -(i%10));
}
Edit: As abs(INTMAX_MIN) cannot be represented on a two's complement machine, two values within the representable range are concatenated on output as a string.
Tested with gcc, though printf required %lld as %jd was not a supported format.
Is this really undefined behaviour, as in "code is allowed to trigger any code path, any code that strikes the compiler's fancy", when the user enters the bad number? Or is it some other flavor of not-completely-defined?
The behaviour of the program is only undefined when the bad number is successfully input and passed to imaxabs(), which on a typical two's complement system returns a negative result, as you observed.
That is the undefined behaviour in this case; the implementation would also be allowed to terminate the program with an overflow error if the ALU set status flags.
The reason for "undefined behaviour" in C is so compiler writers don't have to guard against overflow, so programs can run more efficiently. Whilst it is within the C standard for every C program using abs() to try to kill your first born just because you call it with a too-negative value, writing such code into the object file would simply be perverse.
The real problem with these undefined behaviours is that an optimising compiler can reason away naive checks, so in code like:
r = (i < 0) ? -i : i;
if (r < 0) { // This code may be pointless
// Do overflow recovery
doRecoveryProcessing();
} else {
printf("%jd", r);
}
As a compiler optimiser can reason that negative values are negated, it could in principle determine that (r < 0) is always false, so the attempt to trap the problem fails.
How would a pedantic programmer go about guarding against this, without making any assumptions not guaranteed by standard?
By far the best way, is simply to ensure that the program works on a valid range, so in this case validating the input suffices (disallow INTMAX_MIN).
Programs printing tables of abs() ought to avoid INT*_MIN and so on.
if (i != INTMAX_MIN) {
printf("Result: |%jd| = %jd\n", i, imaxabs(i));
} else { /* Code around undefined abs( INTMAX_MIN) */
printf("Result: |%jd| = %jd%jd\n", i, -(i/10), -(i%10));
}
This appears to write out abs(INTMAX_MIN) by fakery, allowing the program to live up to its promise to the user.

Is it bad to underflow then overflow an unsigned variable?

Kraaa.
I am a student at a programming school that requires us to write C functions with fewer than 25 lines of code. So, basically, every line counts. Sometimes I need to shorten assignments like so:
#include <stddef.h>
#include <stdio.h>
#define ARRAY_SIZE 3
int main(void)
{
int nbr_array[ARRAY_SIZE] = { 1, 2, 3 };
size_t i;
i = -1;
while (++i < ARRAY_SIZE)
printf("nbr_array[%zu] = %i\n", i, nbr_array[i]);
return (0);
}
The important part of this code is the size_t counter named i. In order to save several lines of code, I would like to pre-increment it in the loop's condition. But, insofar as the C standard defines size_t as an unsigned type, what I am basically doing here is underflowing the i variable (from 0 to a very big value), then overflowing it once (from that big value to 0).
My question is the following: regardless of the bad practises of having to shorten our code, is it safe to set an unsigned (size_t) variable to -1 then pre-increment it at each iteration to browse an array?
Thanks!
The i = -1; part of your program is fine.
Converting -1 to an unsigned integer type is defined in C, and results in a value that, if incremented, results in zero.
This said, you are not gaining any line of code with respect to the idiomatic for (i=0; i<ARRAY_SIZE; i++) ….
Your %zi format should probably be %zu.
Unsigned arithmetic never "overflows/underflows" (at least in the way the standard talks about the undefined behavior of signed arithmetic overflow). All unsigned arithmetic is actually modular arithmetic, and as such is safe (i.e. it won't cause undefined behavior in and of itself).
To be precise, the C standard guarantees two things:
Any integer conversion to an unsigned type is well defined (as if the signed number were represented in two's complement)
overflow/underflow of unsigned integers is well defined (modular arithmetic with 2^n)
Since size_t is an unsigned type, you are not doing anything evil.

Why is int rather than unsigned int used for C and C++ for loops?

This is a rather silly question but why is int commonly used instead of unsigned int when defining a for loop for an array in C or C++?
for(int i=0;i<arraySize;i++){}
for(unsigned int i=0;i<arraySize;i++){}
I recognize the benefits of using int when doing something other than array indexing and the benefits of an iterator when using C++ containers. Is it just because it does not matter when looping through an array? Or should I avoid it altogether and use a different type such as size_t?
Using int is more correct from a logical point of view for indexing an array.
unsigned semantics in C and C++ don't really mean "not negative"; they are more like "bitmask" or "modulo integer".
To understand why unsigned is not a good type for a "non-negative" number please consider these totally absurd statements:
Adding a possibly negative integer to a non-negative integer you get a non-negative integer
The difference of two non-negative integers is always a non-negative integer
Multiplying a non-negative integer by a negative integer you get a non-negative result
Obviously none of the above phrases make any sense... but it's how C and C++ unsigned semantic indeed works.
Actually using an unsigned type for the size of containers is a design mistake of C++ and unfortunately we're now doomed to use this wrong choice forever (for backward compatibility). You may like the name "unsigned" because it's similar to "non-negative" but the name is irrelevant and what counts is the semantic... and unsigned is very far from "non-negative".
For this reason when coding most loops on vectors my personally preferred form is:
for (int i=0,n=v.size(); i<n; i++) {
...
}
(of course assuming the size of the vector is not changing during the iteration and that I actually need the index in the body as otherwise the for (auto& x : v)... is better).
Running away from unsigned as soon as possible and using plain integers has the advantage of avoiding the traps that are a consequence of the unsigned size_t design mistake. For example, consider:
// draw lines connecting the dots
for (size_t i=0; i<pts.size()-1; i++) {
drawLine(pts[i], pts[i+1]);
}
the code above will have problems if the pts vector is empty because pts.size()-1 is a huge nonsense number in that case. Dealing with expressions where a < b-1 is not the same as a+1 < b even for commonly used values is like dancing in a minefield.
Historically the justification for having size_t unsigned is for being able to use the extra bit for the values, e.g. being able to have 65535 elements in arrays instead of just 32767 on 16-bit platforms. In my opinion even at that time the extra cost of this wrong semantic choice was not worth the gain (and if 32767 elements are not enough now then 65535 won't be enough for long anyway).
Unsigned values are great and very useful, but NOT for representing container size or for indexes; for size and index regular signed integers work much better because the semantic is what you would expect.
Unsigned values are the ideal type when you need the modulo arithmetic property or when you want to work at the bit level.
This is a more general phenomenon: often people don't use the correct types for their integers. Modern C has semantic typedefs that are much preferable to the primitive integer types. E.g. everything that is a "size" should just be typed as size_t. If you use the semantic types systematically for your application variables, loop variables come much more easily with these types, too.
And I have seen several bugs that were difficult to detect and that came from using int or similar: code that all of a sudden crashed on large matrices and stuff like that. Just coding correctly with the correct types avoids that.
It's purely laziness and ignorance. You should always use the right types for indices, and unless you have further information that restricts the range of possible indices, size_t is the right type.
Of course if the dimension was read from a single-byte field in a file, then you know it's in the range 0-255, and int would be a perfectly reasonable index type. Likewise, int would be okay if you're looping a fixed number of times, like 0 to 99. But there's still another reason not to use int: if you use i%2 in your loop body to treat even/odd indices differently, i%2 is a lot more expensive when i is signed than when i is unsigned...
Not much difference. One benefit of int is it being signed. Thus int i < 0 makes sense, while unsigned i < 0 doesn't much.
If indexes are calculated, that may be beneficial (for example, you might get cases where you will never enter a loop if some result is negative).
And yes, it is less to write :-)
Using int to index an array is legacy, but still widely adopted. int is just a generic number type and does not correspond to the addressing capabilities of the platform. If it happens to be narrower than the platform's addresses, you may encounter strange results when indexing a very large array whose size exceeds what int can represent.
On modern platforms, off_t, ptrdiff_t and size_t guarantee much more portability.
Another advantage of these types is that they give context to someone who reads the code. When you see the above types you know that the code will do array subscripting or pointer arithmetic, not just any calculation.
So, if you want to write bullet-proof, portable and context-sensible code, you can do it at the expense of a few keystrokes.
GCC even supports a typeof extension which relieves you from typing the same typename all over the place:
typeof(arraySize) i;
for (i = 0; i < arraySize; i++) {
...
}
Then, if you change the type of arraySize, the type of i changes automatically.
It really depends on the coder. Some coders prefer type perfectionism, so they'll use whatever type they're comparing against. For example, if they're iterating through a C string, you might see:
size_t sz = strlen("hello");
for (size_t i = 0; i < sz; i++) {
...
}
While if they're just doing something 10 times, you'll probably still see int:
for (int i = 0; i < 10; i++) {
...
}
I use int because it requires less typing and it doesn't matter: they take up the same amount of space, and unless your array has a few billion elements you won't overflow, provided you're not using a 16-bit compiler, which I'm usually not.
Because unless you have an array bigger than two gigabytes of type char, or 4 gigabytes of type short, or 8 gigabytes of type int, etc., it doesn't really matter whether the variable is signed or not.
So, why type more when you can type less?
Aside from the issue that it's shorter to type, the reason is that it allows negative numbers.
Since we can't say in advance whether a value can ever be negative, most functions that take integer arguments take the signed variety. Since most functions use signed integers, it is often less work to use signed integers for things like loops. Otherwise, you have the potential of having to add a bunch of typecasts.
As we move to 64-bit platforms, the non-negative range of a signed integer should be more than enough for most purposes. In these cases, there's not much reason not to use a signed integer.
Consider the following simple example:
int max = some_user_input; // or some_calculation_result
for(unsigned int i = 0; i < max; ++i)
do_something;
If max happens to be a negative value, say -1, the -1 will be regarded as UINT_MAX (when two integers with the same rank but different signedness are compared, the signed one is converted to the unsigned type). On the other hand, the following code would not have this issue:
int max = some_user_input;
for(int i = 0; i < max; ++i)
do_something;
Given a negative max input, the loop will be safely skipped.
Using a signed int is - in most cases - a mistake that could easily result in potential bugs as well as undefined behavior.
Using size_t, which can represent the size of any object (typically 64 bits on 64-bit systems and 32 bits on 32-bit systems), always allows for the correct range for the loop and minimizes the risk of an integer overflow.
The int recommendation came about to address an issue where reverse for loops were often written incorrectly by inexperienced programmers (of course, int might not cover the correct range for the loop):
/* a correct reverse for loop */
for (size_t i = count; i > 0;) {
--i; /* note that this is not part of the `for` statement */
/* code for loop where i is for zero based `index` */
}
/* an incorrect reverse for loop (bug on count == 0; index 0 is never visited either) */
for (size_t i = count - 1; i > 0; --i) {
/* if count == 0, i wraps to SIZE_MAX (well defined), and indexing with it is out of bounds */
}
In general, signed and unsigned variables shouldn't be mixed together, so at times using an int is unavoidable. However, as a rule, the correct type for a for loop is size_t.
There's a nice talk about this misconception that signed variables are better than unsigned variables, you can find it on YouTube (Signed Integers Considered Harmful by Robert Seacord).
TL;DR: Signed variables are more dangerous and require more code than unsigned variables (which should be preferred in almost all cases, and definitely whenever negative values aren't logically expected).
With unsigned variables the only concern is the overflow boundary which has a strictly defined behavior (wrap-around) and uses clearly defined modular mathematics.
This allows a single edge case test to catch an overflow and that test can be performed after the mathematical operation was executed.
However, with signed variables the overflow behavior is undefined (UB) and the negative range is actually larger than the positive range - things that add edge cases that must be tested for and explicitly handled before the mathematical operation can be executed.
E.g., what is INT_MIN * -1? (The preprocessor will protect you, but without it you're in a jam.)
P.S.
As for the example offered by @6502 in their answer, the whole thing is again an issue of trying to cut corners and of a simple missing if statement.
When a loop assumes at least 2 elements in an array, this assumption should be tested beforehand. i.e.:
// draw lines connecting the dots - forward loop
if(pts.size() > 1) { // first make sure there's enough dots
for (size_t i=0; i < pts.size()-1; i++) { // then loop
drawLine(pts[i], pts[i+1]);
}
}
// or compare i + 1 against the size: i + 1 is the index actually read in pts[i+1]
for (size_t i = 0; i + 1 < pts.size(); i++) { // then loop
drawLine(pts[i], pts[i+1]);
}
// or start i at 1 and look back at pts[i - 1]
for (size_t i = 1; i < pts.size(); i++) { // then loop
drawLine(pts[i - 1], pts[i]);
}

C: gcc implicitly converts signed char to unsigned char and vice versa?

I'm trying to learn C and got stuck on data type sizes at the moment.
Have a look at this code snippet:
#include <stdio.h>
#include <limits.h>
int main(void) {
char a = 255;
char b = -128;
a = -128;
b = 255;
printf("size: %zu\n", sizeof(char)); /* sizeof yields size_t, so %zu rather than %lu */
printf("min: %d\n", CHAR_MIN);
printf("max: %d\n", CHAR_MAX);
}
The printf-output is:
size: 1
min: -128
max: 127
How is that possible? The size of char is 1 Byte and the default char seems to be signed (-128...127). So how can I assign a value > 127 without getting an overflow warning (which I get when I try to assign -128 or 256)? Is gcc automatically converting to unsigned char? And then, when I assign a negative value, does it convert back? Why does it do so? I mean, all this implicitness wouldn't make it easier to understand.
EDIT:
Okay, it's not converting anything:
char a = 255;
char b = 128;
printf("%d\n", a); /* -1 */
printf("%d\n", b); /* -128 */
So it starts counting from the bottom up. But why doesn't the compiler give me a warning? And why does it warn when I try to assign 256?
See 6.3.1.3/3 in the C99 Standard
... the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
So, if you don't get a signal (i.e., your program doesn't stop), read the documentation for your compiler to understand what it does.
gcc documents the behaviour (in http://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html#Integers-implementation) as
The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (C90 6.2.1.2, C99 6.3.1.3).
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
how can I assign a value > 127
The result of converting an out-of-range integer value to a signed integer type is either an implementation-defined result or an implementation-defined signal (6.3.1.3/3). So your code is legal C, it just doesn't have the same behavior on all implementations.
without getting an overflow warning
It's entirely up to GCC to decide whether to warn or not about valid code. I'm not quite sure what its rules are, but I get a warning for initializing a signed char with 256, but not with 255. I guess that's because a warning for code like char a = 0xFF would normally not be wanted by the programmer, even when char is signed. There is a portability issue, in that the same code on another compiler might raise a signal or result in the value 0 or 23.
-pedantic enables a warning for this (thanks, pmg), which makes sense since -pedantic is intended to help write portable code. Or arguably doesn't make sense, since as R.. points out it's beyond the scope of merely putting the compiler into standard-conformance mode. However, the man page for gcc says that -pedantic enables diagnostics required by the standard. This one isn't, but the man page also says:
Some users try to use -pedantic to check programs for strict ISO C
conformance. They soon find that it does not do quite what they want:
it finds some non-ISO practices, but not all---only those for which
ISO C requires a diagnostic, and some others for which diagnostics
have been added.
This leaves me wondering what a "non-ISO practice" is, and suspecting that char a = 255 is one of the ones for which a diagnostic has been specifically added. Certainly "non-ISO" means more than just things for which the standard demands a diagnostic, but gcc obviously is not going so far as to diagnose all non-strictly-conforming code of this kind.
I also get a warning for initializing an int with ((long long)UINT_MAX) + 1, but not with UINT_MAX. Looks as if by default gcc consistently gives you the first power of 2 for free, but after that it thinks you've made a mistake.
Use -Wconversion to get a warning about all of those initializations, including char a = 255. Beware that will give you a boatload of other warnings that you may or may not want.
all this implicitness wouldn't make it easier to understand
You'll have to take that up with Dennis Ritchie. C is weakly-typed as far as arithmetic types are concerned. They all implicitly convert to each other, with various levels of bad behavior when the value is out of range depending on the types involved. Again, -Wconversion warns about the dangerous ones.
There are other design decisions in C that mean the weakness is quite important to avoid unwieldy code. For example, the fact that arithmetic is always done in at least an int means that char a = 1, b = 2; a = a + b involves an implicit conversion from int to char when the result of the addition is assigned to a. If you use -Wconversion, or if C didn't have the implicit conversion at all, you'd have to write a = (char)(a+b), which wouldn't be too popular. For that matter, char a = 1 and even char a = 'a' are both implicit conversions from int to char, since C has no literals of type char. So if it wasn't for all those implicit conversions either various other parts of the language would have to be different, or else you'd have to absolutely litter your code with casts. Some programmers want strong typing, which is fair enough, but you don't get it in C.
Simple solution:
A signed char can hold values from -128 to 127.
So when you assign 129 to a char, the value wraps around (on implementations such as gcc, which reduce the value modulo 2^8):
127 (the largest valid value) + 2 (the excess) = -127
(declare char a = 129 and print it: the value is -127).
Think of the values a char can hold as wrapping around like this:
...126, 127, -128, -127, -126, ..., -1, 0, 1, 2, ...
Whatever value you assign, the stored value follows this sequence.
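The counting-around rule can also be written as a formula. A sketch, assuming an 8-bit signed char on an implementation that reduces modulo 2^8 as gcc documents, with v the assigned value, r the stored value, and mod returning a result in [0, 255]:

$$ r = \big((v + 128) \bmod 256\big) - 128 $$

For v = 129: (257 mod 256) - 128 = 1 - 128 = -127. For v = 255: (383 mod 256) - 128 = 127 - 128 = -1. For v = 256: (384 mod 256) - 128 = 128 - 128 = 0.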

Resources