Why is int rather than unsigned int used for C and C++ for loops?

This is a rather silly question, but why is int commonly used instead of unsigned int when defining a for loop for an array in C or C++?
for (int i = 0; i < arraySize; i++) {}
for (unsigned int i = 0; i < arraySize; i++) {}
I recognize the benefits of using int when doing something other than array indexing and the benefits of an iterator when using C++ containers. Is it just because it does not matter when looping through an array? Or should I avoid it altogether and use a different type such as size_t?

Using int is more correct from a logical point of view for indexing an array.
The semantics of unsigned in C and C++ don't really mean "not negative"; they are more like "bit mask" or "modulo integer".
To understand why unsigned is not a good type for a "non-negative" number, please consider these totally absurd statements:
Adding a possibly negative integer to a non-negative integer, you get a non-negative integer.
The difference of two non-negative integers is always a non-negative integer.
Multiplying a non-negative integer by a negative integer, you get a non-negative result.
Obviously none of the above statements makes any sense... but it is exactly how the unsigned semantics of C and C++ work.
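A minimal sketch of the first "absurd statement" in action (assuming a 32-bit unsigned int):
#include <stdio.h>

int main(void) {
    unsigned int items = 5;  /* a "non-negative" count */
    int delta = -10;
    /* delta is converted to unsigned before the addition, so the
       "non-negative" result wraps around instead of being -5 */
    unsigned int result = items + delta;
    printf("%u\n", result);  /* prints 4294967291, not -5 */
    return 0;
}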
Actually using an unsigned type for the size of containers is a design mistake of C++, and unfortunately we're now doomed to use this wrong choice forever (for backward compatibility). You may like the name "unsigned" because it resembles "non-negative", but the name is irrelevant; what counts is the semantics... and unsigned is very far from "non-negative".
For this reason, when coding most loops on vectors, my personally preferred form is:
for (int i=0,n=v.size(); i<n; i++) {
...
}
(of course assuming the size of the vector is not changing during the iteration and that I actually need the index in the body as otherwise the for (auto& x : v)... is better).
Running away from unsigned as soon as possible and using plain signed integers has the advantage of avoiding the traps that are a consequence of the unsigned size_t design mistake. For example, consider:
// draw lines connecting the dots
for (size_t i=0; i<pts.size()-1; i++) { // BUG: if pts is empty, 0-1 wraps to SIZE_MAX
    drawLine(pts[i], pts[i+1]);
}
the code above will misbehave if the pts vector is empty, because pts.size()-1 wraps around to a huge nonsense number in that case. Dealing with expressions where a < b-1 is not the same as a+1 < b, even for commonly used values, is like dancing in a minefield.
Historically the justification for having size_t unsigned is for being able to use the extra bit for the values, e.g. being able to have 65535 elements in arrays instead of just 32767 on 16-bit platforms. In my opinion even at that time the extra cost of this wrong semantic choice was not worth the gain (and if 32767 elements are not enough now then 65535 won't be enough for long anyway).
Unsigned values are great and very useful, but NOT for representing container sizes or indexes; for sizes and indexes, regular signed integers work much better because the semantics are what you would expect.
Unsigned values are the ideal type when you need the modulo arithmetic property or when you want to work at the bit level.
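For contrast, a small sketch of code where unsigned types are the right tool; the flag constants are hypothetical, and the hash is one step of the 32-bit FNV-1a function:
#include <stdint.h>

#define FLAG_READ  (1u << 0)  /* hypothetical permission bits */
#define FLAG_WRITE (1u << 1)

/* wrap-around on the multiply is well defined and exactly what we want */
uint32_t fnv1a_step(uint32_t hash, uint8_t byte) {
    return (hash ^ byte) * 16777619u;
}

int has_write(uint32_t flags) {
    return (flags & FLAG_WRITE) != 0;
}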

This is a more general phenomenon: often people don't use the correct types for their integers. Modern C has semantic typedefs that are much preferable to the primitive integer types. E.g., everything that is a "size" should just be typed as size_t. If you use the semantic types systematically for your application variables, loop variables come much more easily with these types, too.
And I have seen several hard-to-detect bugs that came from using int or the like: code that all of a sudden crashed on large matrices and such. Just coding with the correct types avoids that.
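A sketch of the kind of bug being described, assuming a matrix whose element count exceeds INT_MAX; clear_matrix is a hypothetical helper:
#include <stddef.h>

/* with "int i" the index would overflow (undefined behavior) once
   rows * cols exceeds INT_MAX; size_t covers any object size */
void clear_matrix(double *m, size_t rows, size_t cols) {
    for (size_t i = 0; i < rows * cols; i++)
        m[i] = 0.0;
}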

It's purely laziness and ignorance. You should always use the right types for indices, and unless you have further information that restricts the range of possible indices, size_t is the right type.
Of course, if the dimension was read from a single-byte field in a file, then you know it's in the range 0-255, and int would be a perfectly reasonable index type. Likewise, int would be okay if you're looping a fixed number of times, like 0 to 99. But there's still another reason not to use int: if you use i % 2 in your loop body to treat even and odd indices differently, i % 2 is a lot more expensive when i is signed than when it is unsigned, as sketched below...
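A sketch of the i % 2 point: C requires signed division and remainder to round toward zero, so (-3) % 2 is -1 and the compiler must emit a sign fix-up; the unsigned version typically compiles to a single AND:
/* signed: the result may be -1, so extra instructions are needed */
int parity_signed(int i) { return i % 2; }

/* unsigned: compiles to a plain "i & 1" */
unsigned int parity_unsigned(unsigned int i) { return i % 2u; }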

Not much difference. One benefit of int is that it is signed, so the test i < 0 makes sense for an int, while for an unsigned i it can never be true.
If indexes are calculated, that may be beneficial (for example, you never enter the loop at all if some computed count is negative); see the sketch below.
And yes, it is less to write :-)
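A sketch of the computed-index case, with hypothetical begin/end values:
int begin = 10, end = 3;          /* hypothetical, e.g. from user input */
int count = end - begin;          /* -7: clearly "nothing to do" */
for (int i = 0; i < count; i++) { /* safely skipped when count < 0 */
    /* ... */
}
/* with unsigned arithmetic, end - begin would wrap to a huge
   positive value and the loop would run billions of times */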

Using int to index an array is legacy, but still widely adopted. int is just a generic number type and does not correspond to the addressing capabilities of the platform. In case it happens to be shorter or longer than that, you may encounter strange results when trying to index a very large array that goes beyond its range.
On modern platforms, off_t, ptrdiff_t and size_t guarantee much more portability.
Another advantage of these types is that they give context to someone who reads the code. When you see the above types you know that the code will do array subscripting or pointer arithmetic, not just any calculation.
So, if you want to write bullet-proof, portable, and self-documenting code, you can do it at the expense of a few keystrokes.
GCC even supports a typeof extension (since standardized in C23) which relieves you from typing the same type name all over the place:
typeof(arraySize) i;
for (i = 0; i < arraySize; i++) {
...
}
Then, if you change the type of arraySize, the type of i changes automatically.

It really depends on the coder. Some coders prefer type perfectionism, so they'll use whatever type they're comparing against. For example, if they're iterating through a C string, you might see:
size_t sz = strlen("hello");
for (size_t i = 0; i < sz; i++) {
...
}
While if they're just doing something 10 times, you'll probably still see int:
for (int i = 0; i < 10; i++) {
...
}

I use int because it requires less physical typing and it doesn't matter - int and unsigned take up the same amount of space, and unless your array has a few billion elements you won't overflow, assuming you're not using a 16-bit compiler, which I'm usually not.

Because unless you have an array bigger than two gigabytes of type char, or 4 gigabytes of type short, or 8 gigabytes of type int, etc., it doesn't really matter whether the index variable is signed or not.
So, why type more when you can type less?

Aside from the issue that it's shorter to type, the reason is that it allows negative numbers.
Since we can't say in advance whether a value can ever be negative, most functions that take integer arguments take the signed variety. Since most functions use signed integers, it is often less work to use signed integers for things like loops. Otherwise, you have the potential of having to add a bunch of typecasts.
As we move to 64-bit platforms, the non-negative range of a signed integer should be more than enough for most purposes. In these cases, there's not much reason not to use a signed integer.
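A sketch of the typecast churn being described; process_item is a hypothetical function with a signed parameter:
void process_item(int index);  /* hypothetical API taking an int */

void process_all(unsigned int n) {
    for (unsigned int i = 0; i < n; i++)
        process_item((int)i);  /* cast needed to avoid conversion warnings */
}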

Consider the following simple example:
int max = some_user_input; // or some_calculation_result
for(unsigned int i = 0; i < max; ++i)
do_something;
If max happens to be a negative value, say -1, that -1 will be regarded as UINT_MAX (when two integers of the same rank but different signedness are compared, the signed one is converted to unsigned). On the other hand, the following code would not have this issue:
int max = some_user_input;
for(int i = 0; i < max; ++i)
do_something;
Given a negative max input, the loop is safely skipped.

Using a signed int is - in most cases - a mistake that could easily result in potential bugs as well as undefined behavior.
Using size_t matches the system's word size (64 bits on 64 bit systems and 32 bits on 32 bit systems), always allowing for the correct range for the loop and minimizing the risk of an integer overflow.
The int recommendation comes to solve an issue where reverse for loops were often written incorrectly by inexperienced programmers (and of course, int might not be in the correct range for the loop):
/* a correct reverse for loop */
for (size_t i = count; i > 0;) {
    --i; /* note that this is not part of the `for` statement */
    /* code for loop where i is the zero-based index */
}
/* an incorrect reverse for loop (bug when count == 0, and index 0 is never processed) */
for (size_t i = count - 1; i > 0; --i) {
    /* when count == 0, i wraps to SIZE_MAX and the loop indexes out of bounds */
}
In general, signed and unsigned variables shouldn't be mixed together, so at times using an int is unavoidable. However, the correct type for a for loop is, as a rule, size_t.
There's a nice talk about this misconception that signed variables are better than unsigned variables; you can find it on YouTube (Signed Integers Considered Harmful by Robert Seacord).
TL;DR: Signed variables are more dangerous and require more code than unsigned variables (which should be preferred in almost all cases and definitely whenever negative values aren't logically expected).
With unsigned variables the only concern is the overflow boundary which has a strictly defined behavior (wrap-around) and uses clearly defined modular mathematics.
This allows a single edge case test to catch an overflow and that test can be performed after the mathematical operation was executed.
However, with signed variables the overflow behavior is undefined (UB) and the negative range is actually larger than the positive range - things that add edge cases that must be tested for and explicitly handled before the mathematical operation can be executed.
e.g., what is INT_MIN * -1? (for constant expressions the compiler will warn you, but at run time you're in a jam).
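A minimal sketch of that before/after difference (function names are illustrative):
#include <limits.h>

/* unsigned: one test AFTER the operation suffices, since wrap-around is defined */
unsigned int add_u(unsigned int a, unsigned int b, int *overflowed) {
    unsigned int sum = a + b;
    *overflowed = (sum < a);
    return sum;
}

/* signed: the tests must run BEFORE the operation, since signed overflow is UB */
int add_s(int a, int b, int *overflowed) {
    *overflowed = (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
    return *overflowed ? 0 : a + b;
}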
P.S.
As for the example offered by 6502 in their answer above, the whole thing is again an issue of trying to cut corners and a simple missing if statement.
When a loop assumes at least 2 elements in an array, this assumption should be tested beforehand, e.g.:
// draw lines connecting the dots - forward loop
if (pts.size() > 1) { // first make sure there are enough dots
    for (size_t i = 0; i < pts.size() - 1; i++) { // then loop
        drawLine(pts[i], pts[i + 1]);
    }
}
// or test against i + 1, which tests the desired pts[i+1]
for (size_t i = 0; i + 1 < pts.size(); i++) {
    drawLine(pts[i], pts[i + 1]);
}
// or start i at 1; note that `-` may be marginally slower than `+`
for (size_t i = 1; i < pts.size(); i++) {
    drawLine(pts[i - 1], pts[i]);
}

What type should be used to loop through an array? [duplicate]

Let's have this array:
char arr[SIZE_MAX];
And I want to loop through it (and one past its last element):
char *x;
for (i = 0; i <= sizeof(arr); i++) /* i deliberately undeclared: its type is the question */
    x = &arr[i];
(Edited to add some valid code with pointers).
What is the most adequate type for i?
I accept non-standard types such as POSIX's types.
Related question:
Assign result of sizeof() to ssize_t
Let's have this array:
char arr[SIZE_MAX];
Note that many C implementations will reject that, SIZE_MAX being larger than any array they can actually support. SIZE_MAX is the maximum value of type size_t, but that's not the same thing as the (implementation-dependent) maximum size of an object.
And I want to loop through it (and one past its last element):
for (i = 0; i <= sizeof(arr); i++)
What is the most adequate type for i?
There is no type available that we can be confident of being able to represent SIZE_MAX + 1, and supposing that arr indeed has size SIZE_MAX, you need i's type to be able to represent that value in order to avoid looping infinitely. Unless you want to loop infinitely, which is possible, albeit unlikely.
On the other hand, if we have
char arr[NOT_SURE_HOW_LARGE_BUT_LESS_THAN_SIZE_MAX];
then the appropriate type for i is size_t. This is the type of the result of the sizeof operator, and if the only assumption we're making is that the object's size is smaller than SIZE_MAX then size_t is the only type we can be confident will serve the purpose.
If we can assume tighter bounds on the object's size, then it may be advantageous to choose a different type. In particular, if we can be confident that it is smaller than UINT_MAX then unsigned int may be a slightly more performant choice. Of course, signed types may be used for i, too, provided that they can in fact represent all the needed values, but this may elicit a warning from some compilers.
Hmm, interesting case. You want to loop over what is, hypothetically, the largest possible number of elements, so the counter must wrap around rather than the comparison ever returning false. Something like this might do the trick:
char arr[SIZE_MAX];
size_t i = 0;
do {
    /* whatever you wanna do here */
    ++i;
} while (i <= sizeof(arr) && i != 0);
Because a do-while will always perform the first iteration, we can put a condition such that the loop will end when i == 0, and it will only happen upon wrapping back around to 0 rather than on the first iteration. Add another condition for i <= sizeof(arr) to take care of cases where we don't have wrap-around.
size_t is your friend in this case. It guarantees you access to any array index.

About Using size_t

Suppose I wanted to initialize an array of characters, with the 'x' character, using a loop, should I use an int or a size_t?
char *pch = (char *) malloc(100);
for (size_t s = 0; s < 100; s++)
pch[s] = 'x';
Note: I know about memset and other 'tricks', my question is just about int VS size_t.
Since the variable is indexing an array, using size_t is more appropriate as an array index can't (normally) be negative.
Also, since array indexes are often compared against the result of a strlen call or sizeof operator, both of which yield a size_t, using an int could generate warnings for signed/unsigned comparisons.
However, if you're initializing all bytes of a buffer to a given value, you should instead use memset:
memset(pch, 'x', 100);
size_t is more appropriate for indexing arrays as it is guaranteed to have the correct range for all array sizes. Yet there are some issues to consider:
size_t is an unsigned type, so you must be careful to only compute positive values in tests.
For example iterating downwards this way will not work:
for (size_t i = size - 1; i >= 0; i--) {
array[i] = 0;
}
This naive approach has 2 problems: it iterates even if size is 0 and it loops forever because i >= 0 is always true.
A better approach is this:
for (size_t i = size; i-- > 0;) {
array[i] = 0;
}
size_t may be larger than int, so the format %d in printf is inappropriate for index values of this type. The standard conversion specifier is %zu (%zd is for the signed counterpart), but it is not supported on many older Windows systems. You may need to cast the size_t value to (int), or use %llu and cast the size_t value to unsigned long long.
Some people consider size_t inelegant, maybe because of the _ or just because they are not used to it.
For these and other reasons, it is quite common to see index variables defined as int. While it is OK for small arrays and short loops, it will cause hard to find bugs in code where these assumptions end up failing.
The generated code is likely to be identical, especially for such a simple case. With more complex math, signed types (if you can safely use them) are somewhat more optimizable because the compiler is allowed to assume they never overflow. With signed types, you also won't get unpleasant surprises if you decide to compare against a negative index.
So if you sum it up:
         very large arrays   negative-comparison safe   maybe faster
int             no                    yes                   yes
size_t          yes                   no                    no
it looks like int could be preferable with definitely-small (<2^15 or perhaps 2^31 if your architecture targets guarantee that) ranges unless you can think of another criterion where size_t wins.
The advantage of size_t is that it will definitely work for any array no matter the size as long as you aren't comparing against negative indices. (This may be much more important than negative-comparison-safety and potential speed gains from undefined overflow).
ssize_t (the POSIX signed counterpart of size_t) combines the best of both, unless you need every last bit of size_t (which you definitely don't on a 64-bit machine).
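A sketch of a downward loop using POSIX ssize_t, which sidesteps the i >= 0 trap shown earlier (assumes size <= SSIZE_MAX):
#include <stddef.h>
#include <sys/types.h>  /* POSIX ssize_t */

void zero_backwards(int *array, size_t size) {
    /* also correct when size == 0: the loop is simply skipped */
    for (ssize_t i = (ssize_t)size - 1; i >= 0; i--)
        array[i] = 0;
}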

Should I use the stdint.h integer types on 32/64 bit machines?

One thing that bugs me about the regular C integer declarations is that their names are strange, "long long" being the worst. I am only building for 32- and 64-bit machines, so I do not necessarily need the portability that the library offers; however, I like that the name of each type is a single word of similar length with no ambiguity in size.
// multiple word types are hard to read
// long integers can be 32 or 64 bits depending on the machine
unsigned long int foo = 64;
long int bar = -64;
// easy to read
// no ambiguity
uint64_t foo = 64;
int64_t bar = -64;
On 32 and 64 bit machines:
1) Can using a smaller integer such as int16_t be slower than something higher such as int32_t?
2) If I needed a for loop to run just 10 times, is it ok to use the smallest integer that can handle it instead of the typical 32 bit integer?
for (int8_t i = 0; i < 10; i++) {
}
3) Whenever I use an integer that I know will never be negative, is it ok to prefer using the unsigned version even if I do not need the extra range it provides?
// instead of the one above
for (uint8_t i = 0; i < 10; i++) {
}
4) Is it safe to use a typedef for the types included from stdint.h
typedef int32_t signed_32_int;
typedef uint32_t unsigned_32_int;
edit: both answers were equally good and I couldn't really lean towards one so I just picked the answerer with lower rep
1) Can using a smaller integer such as int16_t be slower than something higher such as int32_t?
Yes it can be slower. Use int_fast16_t instead. Profile the code as needed. Performance is very implementation dependent. A prime benefit of int16_t is its small, well defined size (also it must be 2's complement) as used in structures and arrays, not so much for speed.
The typedef name int_fastN_t designates the fastest signed integer type with a width of at least N. C11 §7.20.1.3 2
2) If I needed a for loop to run just 10 times, is it ok to use the smallest integer that can handle it instead of the typical 32 bit integer?
Yes but that savings in code and speed is questionable. Suggest int instead. Emitted code tends to be optimal in speed/size with the native int size.
3) Whenever I use an integer that I know will never be negative is it OK to prefer using the unsigned version even if I do not need the extra range it provides?
Using some unsigned type is preferred when the math is strictly unsigned (such as array indexing with size_t), yet code needs to watch for careless application like
for (unsigned i = 10 ; i >= 0; i--) // infinite loop
4) Is it safe to use a typedef for the types included from stdint.h
Almost always. Types like int16_t are optional. Maximum portability uses the required types uint_least16_t and uint_fast16_t for code to run on rare platforms that use bit widths like 9, 18, etc.
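A short sketch of those maximally portable spellings:
#include <stdint.h>

uint_least16_t samples[256]; /* required to exist: at least 16 bits */
uint_fast16_t counter;       /* required to exist: fastest type of >= 16 bits */
/* uint16_t itself is optional and may be absent on exotic platforms */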
Can using a smaller integer such as int16_t be slower than something higher such as int32_t?
Yes. Some CPUs do not have dedicated 16-bit arithmetic instructions; arithmetic on 16-bit integers must be emulated with an instruction sequence along the lines of:
r1 = r2 + r3
r1 = r1 & 0xffff
The same principle applies to 8-bit types.
Use the "fast" integer types in <stdint.h> to avoid this -- for instance, int_fast16_t will give you an integer that is at least 16 bits wide, but may be wider if 16-bit types are nonoptimal.
If I needed a for loop to run just 10 times, is it ok to use the smallest integer that can handle it instead of the typical 32 bit integer?
Don't bother; just use int. Using a narrower type doesn't actually save any space, and may cause you issues down the line if you decide to increase the number of iterations to over 127 and forget that the loop variable is using a narrow type.
Whenever I use an integer that I know will never be negative is it ok to prefer using the unsigned version even if I do not need the extra range in provides?
Best avoided. Certain C idioms do not work properly on unsigned integers; for instance, you cannot write a loop of the form:
for (i = 100; i >= 0; i--) { … }
if i is an unsigned type, because i >= 0 will always be true!
Is it safe to use a typedef for the types included from stdint.h
Safe from a technical perspective, but it'll annoy other developers who have to work with your code.
Get used to the <stdint.h> names. They're standardized and reasonably easy to type.
Absolutely possible, yes. On my laptop (Intel Haswell), in a microbenchmark that counts up and down between 0 and 65535 on two registers 2 billion times, this takes
1.313660150s - ax dx (16-bit)
1.312484805s - eax edx (32-bit)
1.312270238s - rax rdx (64-bit)
Minuscule but repeatable differences in timing. (I wrote the benchmark in assembly, because C compilers may optimize it to a different register size.)
It will work, but you'll have to keep it up to date if you change the bounds and the C compiler will probably optimize it to the same assembly code anyway.
As long as it's correct C, that's totally fine. Keep in mind that unsigned overflow is defined and signed overflow is undefined, and compilers do take advantage of that for optimization. For example,
void foo(int start, int count) {
    for (int i = start; i < start + count; i++) {
        // With unsigned arithmetic, this will execute 0 times if
        // "start + count" overflows to a number smaller than "start".
        // With signed arithmetic, that may happen, or the compiler
        // may assume this loop always runs "count" times.
        // For defined behavior, avoid signed overflow.
    }
}
Yes. Also, <inttypes.h> (standard C since C99) extends stdint.h with some useful functions and format macros.

Why do we use signed integers in for loops? [duplicate]

Indices are always nonnegative, so I use size_t wherever possible. Why is it common to use signed integers, then? What's the rationale behind it?
I think it's mainly due to a few reasons that work together:
History, I think int was basically thought of as "a machine word" back in the day (and probably still is, by many). So it's kind of "the default type", the one you use without much further thought.
Many loops are short, so the extra range given by unsigned types doesn't matter, which keeps people from thinking about using an unsigned type.
Ease of typing, int is much nicer on the hands than unsigned or (unsigned int).
Many people simply don't realize how much sense an unsigned type makes for iteration, or (the horror) don't care.
If you mean things like
for(int i=0; i<n; i++)
then there is no rationale. Sloppy, lazy programmers write sloppy code, simple as that.
Unless negative numbers are used, the type should indeed be some suitable unsigned type: size_t in case the iterator is used to index an array, otherwise uint_fastN_t, where N should be large enough to contain all values of the iteration.
Because unsigned types have a major discontinuity at 0, it is safer for programmers to use a signed type for some loops, such as those running indices downwards. We often see this:
for (int i = n - 1; i >= 0; i--) {
...
}
Using an unsigned type in the above loop fails. There are ways to write such a loop with an unsigned type, but many programmers will find it less intuitive:
for (unsigned i = n; i-- > 0; ) {
...
}
Furthermore, indices are not necessarily unsigned: negative indices can be used with C pointers and some algorithms are written to take advantage of this.
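A minimal sketch of a negative index through a pointer:
void negative_index_demo(void) {
    int data[10] = {0};
    int *mid = &data[5];
    mid[-2] = 42;  /* perfectly valid: refers to the same element as data[3] */
}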
The reason for the widespread use of int is historical: size_t was introduced late in the game and only became larger than unsigned quite recently, except for some older exotic systems.

What is the difference between #define and declaration

#include <stdio.h>
#include <math.h>
// #define LIMIT 600851475143

int isP(long i);
void run();

// 6857
int main()
{
    //int i = 6857;
    //printf("%d\n", isP(i));
    run();
}

void run()
{
    long LIMIT = 600851475143;
    // 3, 5
    // under 1000
    long i, largest = 1, temp = 0;
    for (i = 3; i <= 775147; i += 2)
    {
        temp = ((LIMIT / i) * i);
        if (LIMIT == temp)
            if (isP(i) == 1)
                largest = i;
    }
    printf("%ld\n", largest); /* %ld, not %d: largest is a long */
}

int isP(long i)
{
    long j;
    for (j = 3; j <= i / 2; j += 2)
        if (i == (i / j) * j)
            return 0;
    return 1;
}
I just met an interesting issue. As shown above, this piece of code is designed to calculate the largest prime factor of LIMIT. The program gave me an answer of 29, which is incorrect.
Yet, miraculously, when I used #define for the LIMIT value (instead of declaring it as a long), it gave me the correct value: 6857.
Could someone help me to figure out the reason? Thanks a lot!
A long on many platforms is a 4 byte integer, and will overflow at 2,147,483,647. For example, see Visual C++'s Data Type Ranges page.
When you use a #define, the compiler is free to choose a more appropriate type which can hold your very large number. This can cause it to behave correctly, and give you the answer you expect.
In general, however, I would recommend being explicit about the data type, and choosing a data type that will represent the number correctly without requiring compiler and platform specific behavior, if possible.
I would suspect a numeric type issue here.
#define is a preprocessor directive, so it would replace LIMIT with that number in the code before running the compiler. This leaves the door open for the compiler to interpret that number how it wants, which may not be as a long.
In your case, long probably isn't big enough, so the compiler chooses something else when you use #define. For consistent behavior, you should specify a type that you know has an appropriate range and not rely on the compiler to guess correctly.
You should also turn on full warnings on your compiler, it might be able to detect this sort of problem for you.
When you enter an expression like:
(600851475143 + 1)
everything is fine, as the compiler automatically promotes both of those constants to an appropriate type (like long long in your case) large enough to perform the calculation. You can do as many expressions as you want in this way. But when you write:
long n = 600851475143;
the compiler tries to assign a long long (or whatever the constant is implicitly converted to) to a long, which results in a problem in your case. Your compiler should warn you about this, gcc for example says:
warning: overflow in implicit constant conversion [-Woverflow]
Of course, if a long is big enough to hold that value, there's no problem, since the constant will be a type either the same size as long or smaller.
It is probably because 600851475143 is larger than LONG_MAX (2,147,483,647 on such platforms).
Try replacing the type of LIMIT with long long. The way it stands, it wraps around (try printing LIMIT as a long). With #define, the compiler sees the bare literal and gives it an appropriately large type.
Your code basically reduces to these two possibilities:
long LIMIT = 600851475143;
x = LIMIT / i;
vs.
#define LIMIT 600851475143
x = LIMIT / i;
The first one is equivalent to casting the constant into long:
x = (long)600851475143 / i;
while the second one will be precompiled into:
x = 600851475143 / i;
And here lies the difference: 600851475143 is too big for your compiler's long type, so if it is cast into long it overflows and goes crazy. But if it is used directly in the division, the compiler knows that it doesn't fit into a long, automatically interprets it as a long long literal, i is promoted, and the division is done as a long long.
Note, however, that even if the algorithm seems to work most of the time, you still have overflows elsewhere, so the code is incorrect. You should declare any variable that may hold these big values as long long.
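One way to observe this, a minimal sketch assuming a platform where long is 32 bits and long long is 64 (the printed values reflect that assumption):
#include <stdio.h>

int main(void) {
    printf("%zu\n", sizeof(long));          /* 4 on such a platform */
    printf("%zu\n", sizeof(600851475143));  /* 8: the literal is a long long */
    return 0;
}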
