Is it bad to underflow then overflow an unsigned variable? - c

Kraaa.
I am a student in a programming school that requires us to write C functions with fewer than 25 lines of code. So, basically, every line counts. Sometimes, I need to shorten assignments like so:
#include <stddef.h>
#include <stdio.h>
#define ARRAY_SIZE 3
int main(void)
{
int nbr_array[ARRAY_SIZE] = { 1, 2, 3 };
size_t i;
i = -1;
while (++i < ARRAY_SIZE)
printf("nbr_array[%zu] = %i\n", i, nbr_array[i]);
return (0);
}
The important part of this code is the size_t counter named i. In order to save up several lines of code, I would like to pre-increment it in the loop's condition. But, insofar as the C standard defines size_t as an unsigned type, what I am basically doing here, is underflowing the i variable (from 0 to a very big value), then overflowing it once (from that big value to 0).
My question is the following: regardless of the bad practises of having to shorten our code, is it safe to set an unsigned (size_t) variable to -1 then pre-increment it at each iteration to browse an array?
Thanks!

The i = -1; part of your program is fine.
Converting -1 to an unsigned integer type is defined in C, and results in a value that, if incremented, results in zero.
This said, you are not gaining any line of code with respect to the idiomatic for (i=0; i<ARRAY_SIZE; i++) ….
Your %zi format should probably be %zu.

Unsigned arithmetic never "overflows/underflows" (at least in the way the standard talks about the undefined behavior of signed arithmetic overflow). All unsigned arithmetic is actually modular arithmetic, and as such is safe (i.e. it won't cause undefined behavior in and of itself).

To be precise, the C standard guarantees two things:
Any integer conversion to an unsigned type is well defined (the value is reduced modulo 2^n, which gives the same result as reinterpreting a two's-complement representation)
Overflow/underflow of unsigned integers is well defined (arithmetic modulo 2^n, where n is the width of the type)
Since size_t is an unsigned type, you are not doing anything evil.

Related

Why does this size_t i = size variable giving "garbage" value when used in for loop?

I have defined size as the passed value of 6, and tracing the value of size also produced 6. However, when I use size, or even plainly 6, to initialize i in the for-loop, the value of i goes to garbage.
In the case here I just initialize i with the value 6 for easier interpretation. To my best understanding, size_t is similar to an unsigned int or unsigned long int, depending on the compiler.
for (size_t i = 6 ; i >= 0; --i){
printf("%lu\n",i);
}
gcc -Wall -Wextra called and said hi:
warning: comparison of unsigned expression in '>= 0' is always true [-Wtype-limits]
Do yourself a favour and stop searching for bugs that the compiler already found, by following this advice: What compiler options are recommended for beginners learning C?
Now what happens in this case is that unsigned integers have well-defined wrap around. When going past 0, size_t will therefore get the value of a very large integer.
You then lie to printf by using %lu, which says that you are passing an unsigned long, when you are in fact passing a size_t. If the two types differ, this is strictly speaking undefined behavior and anything can happen. The correct conversion specifier to use is %zu.
In practice, on a system with an 8-byte long, you might get an output such as 18446744073709551615, but this isn't guaranteed since it's a bug. In either case it is an eternal loop which will hang the program.
i >= 0 is always true because i is a size_t and can't go below 0.
So the loop won't terminate; whether your compiler warns about this depends on your compilation options.
If you decrement a size_t which is equal to 0, this size_t will get the largest possible value for size_t.
You can use i > 0 instead or use
for(int i = 6;i >= 0;--i)
if you want your 0 to be printed.
PS: %zu is for size_t and %d for int (%i works too; the two are equivalent in printf and differ only in scanf, where %d accepts decimal input only)

C: How to best handle unsigned integers modulo arithmetic ("wrap-around") when using unsigned operands in calculations

There was this range checking function that required two signed integer parameters:
range_limit(long int lower, long int upper)
It was called with range_limit(0, controller_limit). I needed to expand the range check to also include negative numbers up to the 'controller_limit' magnitude.
I naively changed the call to
range_limit(-controller_limit, controller_limit)
Although it compiled without warnings, this did not work as I expected.
I missed that controller_limit was unsigned integer.
In C, simple integer calculations can lead to surprising results. For example these calculations
0u - 1;
or more relevant
unsigned int ui = 1;
-ui;
result in 4294967295 of type unsigned int (aka UINT_MAX, assuming a 32-bit unsigned int). As I understand it, this is due to the integer conversion rules and the modular arithmetic of unsigned operands.
By definition, unsigned arithmetic does not overflow but rather "wraps-around". This behavior is well defined, so the compiler will not issue a warning (at least not gcc) if you use these expressions calling a function:
#include <stdio.h>
void f_l(long int li) {
printf("%li\n", li); // outputs: 4294967295
}
int main(void)
{
unsigned int ui = 1;
f_l(-ui);
return 0;
}
Try this code for yourself!
So instead of passing a negative value I passed a ridiculously high positive value to the function.
My fix was to cast from unsigned integer into int:
range_limit(-(int)controller_limit, controller_limit);
Obviously, integer modulo behavior in combination with the integer conversion rules allows for subtle mistakes that are hard to spot, especially as the compiler does not help in finding them.
As the compiler does not emit any warnings and you can come across these kind of calculations any day, I'd like to know:
If you have to deal with unsigned operands, how do you best avoid the unsigned integers modulo arithmetic pitfall?
Note:
While gcc does not provide any help in detecting integer modulo arithmetic (at the time of writing), clang does. The compiler flag "-fsanitize=unsigned-integer-overflow" will enable detection of modulo arithmetic (using "-Wconversion" is not sufficient), however, not at compile time but at runtime. Try for yourself!
Further reading:
Seacord: Secure Coding in C and C++, Chapter 5, Integer Security
Using signed integers does not change the situation at all.
A C implementation is under no obligation to raise a run-time warning or error as a response to Undefined Behaviour. Undefined Behaviour is undefined, as it says; the C standard provides absolutely no requirements or guidance about the outcome. A particular implementation can choose any mechanism it sees fit in response to Undefined Behaviour, including explicitly defining the result. (If you rely on that explicit definition, your program is no longer portable to other compilers with different or undocumented behaviour. Perhaps you don't care.)
For example, GCC defines the result of out-of-bounds integer conversions and some bitwise operations in the Implementation-defined behaviour section of its manual.
If you're worried about integer overflow (and there are lots of times you should be worried about it), it's up to you to protect yourself.
For example, instead of allowing:
unsigned_counter += 5;
to overflow, you could write:
if (unsigned_counter > UINT_MAX - 5) {
/* Handle the error */
}
else {
unsigned_counter += 5;
}
And you should do that in cases where integer overflow will get you into trouble. A common example, which can lead (and has led!) to buffer-overflow exploits, comes from checking whether a buffer has enough room for an addition:
if (buffer_length + added_length >= buffer_capacity) {
/* Reallocate buffer or fail*/
}
memcpy(buffer + buffer_length, add_characters, added_length);
buffer_length += added_length;
buffer[buffer_length] = 0;
If buffer_length + added_length overflows -- in either signed or unsigned arithmetic -- the necessary reallocation (or failure) won't trigger and the memcpy will overwrite memory or segfault or do something else you weren't expecting.
It's easy to fix, so it's worth getting into the habit:
if (added_length >= buffer_capacity
|| buffer_length >= buffer_capacity - added_length) {
/* Reallocate buffer or fail*/
}
memcpy(buffer + buffer_length, add_characters, added_length);
buffer_length += added_length;
buffer[buffer_length] = 0;
Another similar case where you can get into serious trouble is when you are using a loop and your increment is more than one.
This is safe:
for (i = 0; i < limit; ++i) ...
This could lead to an infinite loop:
for (i = 0; i < limit; i += 2) ...
The first one is safe -- assuming i and limit are the same type -- because i + 1 cannot overflow if i < limit. The most it can be is limit itself. But no such guarantee can be made about i + 2, since limit could be INT_MAX (or whatever is the maximum value for the integer type being used). Again, the fix is simple: compare the difference rather than the sum.
If you're using GCC and you don't care about full portability, you can use the GCC overflow-detection builtins to help you. They're also documented in the GCC manual.

The purpose of size_t and its relationship with implementation [duplicate]

I am getting confused with size_t in C. I know that it is returned by the sizeof operator. But what exactly is it? Is it a data type?
Let's say I have a for loop:
for(i = 0; i < some_size; i++)
Should I use int i; or size_t i;?
From Wikipedia:
According to the 1999 ISO C standard (C99), size_t is an unsigned integer type of at least 16 bit (see sections 7.17 and 7.18.3).
size_t is an unsigned data type defined by several C/C++ standards, e.g. the C99 ISO/IEC 9899 standard, that is defined in stddef.h. It can be further imported by inclusion of stdlib.h as this file internally sub includes stddef.h.
This type is used to represent the size of an object. Library functions that take or return sizes expect them to be of type or have the return type of size_t. Further, the most frequently used compiler-based operator sizeof should evaluate to a constant value that is compatible with size_t.
As an implication, size_t is a type guaranteed to hold any array index.
size_t is an unsigned type. So, it cannot represent any negative values(<0). You use it when you are counting something, and are sure that it cannot be negative. For example, strlen() returns a size_t because the length of a string has to be at least 0.
In your example, if your loop index is never going to be negative, it might make sense to use size_t, or any other unsigned data type.
When you use a size_t object, you have to make sure that in all the contexts it is used, including arithmetic, you want non-negative values. For example, let's say you have:
size_t s1 = strlen(str1);
size_t s2 = strlen(str2);
and you want to find the difference of the lengths of str2 and str1. You cannot do:
int diff = s2 - s1; /* bad */
This is because s2 - s1 is computed with unsigned types, so its result is never negative: when s2 < s1 it wraps around to a very large value, and converting that value to int is implementation-defined. In this case, depending upon what your use case is, you might be better off using int (or long long) for s1 and s2.
There are some functions in C/POSIX that could/should use size_t, but don't because of historical reasons. For example, the second parameter to fgets should ideally be size_t, but is int.
size_t is a type that can hold any array index.
Depending on the implementation, it can be any of:
unsigned char
unsigned short
unsigned int
unsigned long
unsigned long long
Here's how size_t is defined in stddef.h of my machine:
typedef unsigned long size_t;
If you are the empirical type,
echo | gcc -E -xc -include 'stddef.h' - | grep size_t
Output for Ubuntu 14.04 64-bit GCC 4.8:
typedef long unsigned int size_t;
Note that stddef.h is provided by GCC and not glibc under src/gcc/ginclude/stddef.h in GCC 4.2.
Interesting C99 appearances
malloc takes size_t as an argument, so it determines the maximum size that may be allocated.
And since it is also returned by sizeof, I think it limits the maximum size of any array.
See also: What is the maximum size of an array in C?
The manpage for types.h says:
size_t shall be an unsigned integer type
To go into why size_t needed to exist and how we got here:
In pragmatic terms, size_t and ptrdiff_t are 64 bits wide on a typical 64-bit implementation, 32 bits wide on a typical 32-bit implementation, and so on. They could not force any existing type to mean that, on every compiler, without breaking legacy code.
A size_t or ptrdiff_t is not necessarily the same as an intptr_t or uintptr_t. They were different on certain architectures that were still in use when size_t and ptrdiff_t were added to the Standard in the late 1980s, and becoming obsolete when C99 added many new types but not gone yet (such as 16-bit Windows). The x86 in 16-bit protected mode had a segmented memory where the largest possible array or structure could be only 65,536 bytes in size, but a far pointer needed to be 32 bits wide, wider than the registers. On those, intptr_t would have been 32 bits wide but size_t and ptrdiff_t could be 16 bits wide and fit in a register. And who knew what kind of operating system might be written in the future? In theory, the i386 architecture offers a 32-bit segmentation model with 48-bit pointers that no operating system has ever actually used.
The type of a memory offset could not be long because far too much legacy code assumes that long is exactly 32 bits wide. This assumption was even built into the UNIX and Windows APIs. Unfortunately, a lot of other legacy code also assumed that a long is wide enough to hold a pointer, a file offset, the number of seconds that have elapsed since 1970, and so on. POSIX now provides a standardized way to force the latter assumption to be true instead of the former, but neither is a portable assumption to make.
It couldn’t be int because only a tiny handful of compilers in the ’90s made int 64 bits wide. Then they really got weird by keeping long 32 bits wide. The next revision of the Standard declared it illegal for int to be wider than long, but int is still 32 bits wide on most 64-bit systems.
It couldn’t be long long int, which anyway was added later, since that was created to be at least 64 bits wide even on 32-bit systems.
So, a new type was needed. Even if it weren’t, all those other types meant something other than an offset within an array or object. And if there was one lesson from the fiasco of 32-to-64-bit migration, it was to be specific about what properties a type needed to have, and not use one that meant different things in different programs.
Since nobody has yet mentioned it, the primary linguistic significance of size_t is that the sizeof operator returns a value of that type. Likewise, the primary significance of ptrdiff_t is that subtracting one pointer from another will yield a value of that type. Library functions that accept it do so because it will allow such functions to work with objects whose size exceeds UINT_MAX on systems where such objects could exist, without forcing callers to waste code passing a value larger than "unsigned int" on systems where the larger type would suffice for all possible objects.
size_t and int are not interchangeable. For instance on 64-bit Linux size_t is 64-bit in size (i.e. sizeof(void*)) but int is 32-bit.
Also note that size_t is unsigned. If you need signed version then there is ssize_t on some platforms and it would be more relevant to your example.
As a general rule I would suggest using int for most general cases and only use size_t/ssize_t when there is a specific need for it (with mmap() for example).
size_t is an unsigned integer data type that can hold only values of 0 and greater. It measures the size of any object in bytes and is the type returned by the sizeof operator.
const is not part of size_t; it is an optional qualifier, and the program runs the same without it.
const size_t number;
size_t is regularly used for array indexing and loop counting. With a 32-bit compiler it is typically an unsigned int; with a 64-bit compiler it is typically an unsigned long int or unsigned long long int. Therefore the maximum value of size_t depends on the compiler.
size_t is defined in the <stdio.h> header file, and also by the <stddef.h>, <stdlib.h>, <string.h>, <time.h>, and <wchar.h> headers.
Example (with const)
#include <stdio.h>
int main()
{
const size_t value = 200;
size_t i;
int arr[value];
for (i = 0 ; i < value ; ++i)
{
arr[i] = i;
}
size_t size = sizeof(arr);
printf("size = %zu\n", size);
}
Output: size = 800
Example (without const)
#include <stdio.h>
int main()
{
size_t value = 200;
size_t i;
int arr[value];
for (i = 0; i < value; ++i)
{
arr[i] = i;
}
size_t size = sizeof(arr);
printf("size = %zu\n", size);
}
Output: size = 800
size_t is a typedef which is used to represent the size of any object in bytes. (A typedef creates an additional name/alias for another data type, but does not create a new type.)
On some implementations (the underlying type varies by platform), you will find it defined in stddef.h as follows:
typedef unsigned long long size_t;
size_t is also defined in the <stdio.h>.
size_t is used as the return type by the sizeof operator.
Use size_t, in conjunction with sizeof, to define the data type of the array size argument as follows:
#include <stdio.h>
void disp_ary(const int *ary, size_t ary_size)
{
for (size_t i = 0; i < ary_size; i++) // the index has the same type as the size
{
printf("%d ", ary[i]);
}
}
int main(void)
{
int arr[] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 0};
size_t ary_size = sizeof(arr)/sizeof(arr[0]); // element count, kept as a size_t
disp_ary(arr, ary_size);
return 0;
}
size_t is guaranteed to be big enough to contain the size of the biggest object the host system can handle.
Note that the size limit of an array with automatic storage (a local variable) is really a function of the stack size limit of the system where this code is compiled and executed. You should be able to adjust the stack size at link time (see the ld command's --stack-size parameter).
To give you an idea of approximate stack sizes:
4K on an embedded device
1M on Win10
7.4M on Linux
Many C library functions like malloc, memcpy and strlen declare their arguments and return type as size_t.
size_t affords the programmer with the ability to deal with different types, by adding/subtracting the number of elements required instead of using the offset in bytes.
Let's get a deeper appreciation for what size_t can do for us by examining its usage in pointer arithmetic operations on a C string and an integer array:
Here's an example using a C string:
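#include <stdio.h>
#include <string.h>
const char* reverse(const char *orig)
{
size_t len = strlen(orig);
const char *rev = orig + len - 1;
while (rev >= orig)
{
printf("%c", *rev);
rev = rev - 1; // <= See below
}
return rev; // now points before orig: must not be dereferenced
}
int main(void) {
const char *string = "123";
reverse(string); // the returned pointer is out of bounds, so it is not printed
return 0;
}
// Output: 321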
0x7ff626939004 "123" // <= orig
0x7ff626939006 "3" // <= rev - 1 of 3
0x7ff626939005 "23" // <= rev - 2 of 3
0x7ff626939004 "123" // <= rev - 3 of 3
0x7ff6aade9003 "" // <= rev is indeterminate. This can be exploited as an out of bounds bug to read memory contents that this program has no business reading.
That's not very helpful in understanding the benefits of using size_t since a character is one byte, regardless of your architecture.
When we're dealing with numerical types, size_t becomes very beneficial.
size_t is an unsigned integer type wide enough to hold the size of any object; its width changes according to the platform for which the code is compiled.
Here's how we can leverage sizeof and size_t when passing an array of ints:
void print_reverse(int *orig, size_t ary_size)
{
int *rev = orig + ary_size - 1;
while (rev >= orig)
{
printf("%i", *rev);
rev = rev - 1;
}
}
int main()
{
int nums[] = {1, 2, 3};
print_reverse(nums, sizeof(nums)/sizeof(*nums));
return 0;
}
0x617d3ffb44 1 // <= orig
0x617d3ffb4c 3 // <= rev - 1 of 3
0x617d3ffb48 2 // <= rev - 2 of 3
0x617d3ffb44 1 // <= rev - 3 of 3
Above, we see that an int takes 4 bytes (and since there are 8 bits per byte, an int occupies 32 bits).
If we were to create an array of longs we'd discover that a long takes 64 bits on a linux64 operating system, but only 32 bits on a Win64 system. Hence, using size_t will save a lot of coding and potential bugs, especially when running C code that performs address arithmetic on different architectures.
So the moral of this story is "Use size_t and let your C-compiler do the error-prone work of pointer arithmetic."
size_t is an unsigned integer data type. On systems using the GNU C Library, this will be unsigned int or unsigned long int. size_t is commonly used for array indexing and loop counting.
In general, if you are starting at 0 and going upward, always use an unsigned type to avoid an overflow taking you into negative values. This is critically important: if your array bound is less than your loop maximum, and the loop maximum is greater than the maximum of your type, a signed counter will overflow (typically wrapping to a negative value) and you may experience a segmentation fault (SIGSEGV). So, in general, never use int for a loop starting at 0 and going upwards. Use an unsigned type.
size_t or any unsigned type might be seen used as loop variable as loop variables are typically greater than or equal to 0.
When we use a size_t object, we have to make sure that in all the contexts it is used, including arithmetic, we want only non-negative values. For instance, following program would definitely give the unexpected result:
// C program to demonstrate that size_t or
// any unsigned int type should be used
// carefully when used in a loop
#include<stdio.h>
int main()
{
const size_t N = 10;
int a[N];
// This is fine
for (size_t n = 0; n < N; ++n)
a[n] = n;
// But reverse cycles are tricky for unsigned
// types as can lead to infinite loop
for (size_t n = N-1; n >= 0; --n)
printf("%d ", a[n]);
}
Output
Infinite loop and then segmentation fault
This is a platform-specific typedef. For example, on a particular machine, it might be unsigned int or unsigned long. You should use this definition for more portability of your code.
From my understanding, size_t is an unsigned integer whose bit size is large enough to hold a pointer of the native architecture.
So:
sizeof(size_t) >= sizeof(void*)

Check if unsigned is less than zero

Playing with some sources found code like this:
void foo(unsigned int i)
{
if(i<0)
printf("Less then zero\n");
else
printf("greater or equ\n");
}
int main()
{
int bar = -2;
foo(bar);
return 0;
}
I think it makes no sense, but maybe there are some cases (security?) that make this check sensible?
An unsigned int cannot be less than 0 by definition. So, to more directly answer your question, you're right in thinking that this makes no sense. It is not a meaningful security item either unless you encounter something like a loop that accidentally decrements a signed int past 0 and then casts it as an unsigned int for use as an index into an array and therefore indexes memory outside of the array.
i will always be >= 0 because it is declared as unsigned and thus interpreted as an unsigned integer.
So your first test will always be false.
Your call foo(bar) actually converts an int into an unsigned int. This may be what confuses you. And "conversion" does not actually change the bytes/bits value of your integer, it is just a matter of formal typing and interpretation.
See this answer for examples of signed/unsigned conversions.
Here is a simple example (the exact output depends on the number of bytes of an unsigned int on your system; for me it is 4 bytes).
Code:
printf("%u\n", (unsigned int) -2);
Output:
4294967294

Why is int rather than unsigned int used for C and C++ for loops?

This is a rather silly question but why is int commonly used instead of unsigned int when defining a for loop for an array in C or C++?
for(int i = 0; i < arraySize; i++){}
for(unsigned int i = 0; i < arraySize; i++){}
I recognize the benefits of using int when doing something other than array indexing and the benefits of an iterator when using C++ containers. Is it just because it does not matter when looping through an array? Or should I avoid it all together and use a different type such as size_t?
Using int is more correct from a logical point of view for indexing an array.
unsigned semantic in C and C++ doesn't really mean "not negative" but it's more like "bitmask" or "modulo integer".
To understand why unsigned is not a good type for a "non-negative" number please consider these totally absurd statements:
Adding a possibly negative integer to a non-negative integer you get a non-negative integer
The difference of two non-negative integers is always a non-negative integer
Multiplying a non-negative integer by a negative integer you get a non-negative result
Obviously none of the above phrases make any sense... but it's how C and C++ unsigned semantic indeed works.
Actually using an unsigned type for the size of containers is a design mistake of C++ and unfortunately we're now doomed to use this wrong choice forever (for backward compatibility). You may like the name "unsigned" because it's similar to "non-negative" but the name is irrelevant and what counts is the semantic... and unsigned is very far from "non-negative".
For this reason when coding most loops on vectors my personally preferred form is:
for (int i=0,n=v.size(); i<n; i++) {
...
}
(of course assuming the size of the vector is not changing during the iteration and that I actually need the index in the body as otherwise the for (auto& x : v)... is better).
This running away from unsigned as soon as possible and using plain integers has the advantage of avoiding the traps that are a consequence of unsigned size_t design mistake. For example consider:
// draw lines connecting the dots
for (size_t i=0; i<pts.size()-1; i++) {
drawLine(pts[i], pts[i+1]);
}
the code above will have problems if the pts vector is empty because pts.size()-1 is a huge nonsense number in that case. Dealing with expressions where a < b-1 is not the same as a+1 < b even for commonly used values is like dancing in a minefield.
Historically the justification for having size_t unsigned is for being able to use the extra bit for the values, e.g. being able to have 65535 elements in arrays instead of just 32767 on 16-bit platforms. In my opinion even at that time the extra cost of this wrong semantic choice was not worth the gain (and if 32767 elements are not enough now then 65535 won't be enough for long anyway).
Unsigned values are great and very useful, but NOT for representing container size or for indexes; for size and index regular signed integers work much better because the semantic is what you would expect.
Unsigned values are the ideal type when you need the modulo arithmetic property or when you want to work at the bit level.
This is a more general phenomenon, often people don't use the correct types for their integers. Modern C has semantic typedefs that are much preferable over the primitive integer types. E.g everything that is a "size" should just be typed as size_t. If you use the semantic types systematically for your application variables, loop variables come much easier with these types, too.
And I have seen several bugs that were difficult to detect that came from using int or so. Code that all of a sudden crashed on large matrices and stuff like that. Just coding correctly with correct types avoids that.
It's purely laziness and ignorance. You should always use the right types for indices, and unless you have further information that restricts the range of possible indices, size_t is the right type.
Of course if the dimension was read from a single-byte field in a file, then you know it's in the range 0-255, and int would be a perfectly reasonable index type. Likewise, int would be okay if you're looping a fixed number of times, like 0 to 99. But there's still another reason not to use int: if you use i%2 in your loop body to treat even/odd indices differently, i%2 is a lot more expensive when i is signed than when i is unsigned...
Not much difference. One benefit of int is it being signed. Thus int i < 0 makes sense, while unsigned i < 0 doesn't much.
If indexes are calculated, that may be beneficial (for example, you might get cases where you will never enter a loop if some result is negative).
And yes, it is less to write :-)
Using int to index an array is legacy, but still widely adopted. int is just a generic number type and does not correspond to the addressing capabilities of the platform. In case it happens to be shorter or longer than that, you may encounter strange results when trying to index a very large array that goes beyond.
On modern platforms, off_t, ptrdiff_t and size_t guarantee much more portability.
Another advantage of these types is that they give context to someone who reads the code. When you see the above types you know that the code will do array subscripting or pointer arithmetic, not just any calculation.
So, if you want to write bullet-proof, portable and context-sensible code, you can do it at the expense of a few keystrokes.
GCC even supports a typeof extension which relieves you from typing the same typename all over the place:
typeof(arraySize) i;
for (i = 0; i < arraySize; i++) {
...
}
Then, if you change the type of arraySize, the type of i changes automatically.
It really depends on the coder. Some coders prefer type perfectionism, so they'll use whatever type they're comparing against. For example, if they're iterating through a C string, you might see:
size_t sz = strlen("hello");
for (size_t i = 0; i < sz; i++) {
...
}
While if they're just doing something 10 times, you'll probably still see int:
for (int i = 0; i < 10; i++) {
...
}
I use int cause it requires less physical typing and it doesn't matter - they take up the same amount of space, and unless your array has a few billion elements you won't overflow if you're not using a 16-bit compiler, which I'm usually not.
Because unless you have an array bigger than two gigabytes of type char, or 4 gigabytes of type short, or 8 gigabytes of type int, etc., it doesn't really matter if the variable is signed or not.
So, why type more when you can type less?
Aside from the issue that it's shorter to type, the reason is that it allows negative numbers.
Since we can't say in advance whether a value can ever be negative, most functions that take integer arguments take the signed variety. Since most functions use signed integers, it is often less work to use signed integers for things like loops. Otherwise, you have the potential of having to add a bunch of typecasts.
As we move to 64-bit platforms, the non-negative range of a signed integer should be more than enough for most purposes. In these cases, there's not much reason not to use a signed integer.
Consider the following simple example:
int max = some_user_input; // or some_calculation_result
for(unsigned int i = 0; i < max; ++i)
do_something;
If max happens to be a negative value, say -1, the -1 will be regarded as UINT_MAX (when two integers with the same rank but different signedness are compared, the signed one is converted to the unsigned type). On the other hand, the following code would not have this issue:
int max = some_user_input;
for(int i = 0; i < max; ++i)
do_something;
Given a negative max input, the loop will be safely skipped.
Using a signed int is - in most cases - a mistake that could easily result in potential bugs as well as undefined behavior.
Using size_t matches the system's word size (64 bits on 64 bit systems and 32 bits on 32 bit systems), always allowing for the correct range for the loop and minimizing the risk of an integer overflow.
The int recommendation comes to solve an issue where reverse for loops were often written incorrectly by inexperienced programmers (of course, int might not be in the correct range for the loop):
/* a correct reverse for loop */
for (size_t i = count; i > 0;) {
--i; /* note that this is not part of the `for` statement */
/* code for loop where i is for zero based `index` */
}
/* an incorrect reverse for loop (bug on count == 0) */
for (size_t i = count - 1; i > 0; --i) {
/* i might have overflowed and undefined behavior occurs */
}
In general, signed and unsigned variables shouldn't be mixed together, so at times using an int is unavoidable. However, the correct type for a for loop is as a rule size_t.
There's a nice talk about this misconception that signed variables are better than unsigned variables, you can find it on YouTube (Signed Integers Considered Harmful by Robert Seacord).
TL;DR;: Signed variables are more dangerous and require more code than unsigned variables (which should be preferred almost in all cases and definitely whenever negative values aren't logically expected).
With unsigned variables the only concern is the overflow boundary which has a strictly defined behavior (wrap-around) and uses clearly defined modular mathematics.
This allows a single edge case test to catch an overflow and that test can be performed after the mathematical operation was executed.
However, with signed variables the overflow behavior is undefined (UB) and the negative range is actually larger than the positive range - things that add edge cases that must be tested for and explicitly handled before the mathematical operation can be executed.
i.e., how much is INT_MIN * -1? (the pre-processor will protect you, but without it you're in a jam).
P.S.
As for the example offered by @6502 in their answer, the whole thing is again an issue of trying to cut corners and a simple missing if statement.
When a loop assumes at least 2 elements in an array, this assumption should be tested beforehand. i.e.:
// draw lines connecting the dots - forward loop
if(pts.size() > 1) { // first make sure there's enough dots
for (size_t i=0; i < pts.size()-1; i++) { // then loop
drawLine(pts[i], pts[i+1]);
}
}
// or test against i + 1 : which tests the desired pts[i+1]
for (size_t i = 0; i + 1 < pts.size(); i++) { // then loop
drawLine(pts[i], pts[i+1]);
}
// or start i as 1 : but note that `-` is slower than `+`
for (size_t i = 1; i < pts.size(); i++) { // then loop
drawLine(pts[i - 1], pts[i]);
}

Resources