Why do we use signed integers in for loops? [duplicate] - c

This question already has answers here:
Why is int rather than unsigned int used for C and C++ for loops?
(11 answers)
Closed 6 years ago.
Indices are always nonnegative so I use size_t where ever possible. Why is it common to use signed integers then? What's the rationale behind it?

I think it's mainly due to a few reasons that work together:
History, I think int was basically thought of as "a machine word" back in the day (and probably still is, by many). So it's kind of "the default type", the one you use without much further thought.
Many loops are short, so the better precision given by unsigned types doesn't matter which makes people not think about using an unsigned type.
Ease of typing, int is much nicer on the hands than unsigned or (unsigned int).
Many people simply don't realize how much sensical an unsigned type is for iteration, or (the horror) don't care.

If you mean things like
for(int i=0; i<n; i++)
then there is no rationale. Sloppy, lazy programmers write sloppy code, simple as that.
Unless negative numbers are used, the type should indeed be some suitable unsigned type. size_t in case the iterator is used to index an array. Otherwise uint_fastn_t, where "n" should be large enough to contain all values of the iteration.

unsigned types have a major discontinuity at 0, it is safer for programmers to use a signed type for some loops such as running indices downwards. We often see this:
for (int i = n - 1; i >= 0; i--) {
...
}
Using an unsigned type in the above loop fails. There are ways to write such a loop with an unsigned type, but many programmers will find it less intuitive:
for (unsigned i = n; i-- > 0; ) {
...
}
Furthermore, indices are not necessarily unsigned: negative indices can be used with C pointers and some algorithms are written to take advantage of this.
The reason for the widespread use of int is historical: size_t was introduced late in the game and only became larger than unsigned quite recently, except for some older exotic systems.

Related

What type should be used to loop through an array? [duplicate]

This question already has answers here:
What is the correct type for array indexes in C?
(9 answers)
For iterating though an array should we be using size_t or ptrdiff_t?
(3 answers)
Closed 3 years ago.
Let's have this array:
char arr[SIZE_MAX];
And I want to loop through it (and one past its last element):
char *x;
for (i = 0; i <= sizeof(arr); i++)
x = &arr[i];
(Edited to add some valid code with pointers).
What is the most adequate type for i?
I accept non-standard types such as POSIX's types.
Related question:
Assign result of sizeof() to ssize_t
Let's have this array:
char arr[SIZE_MAX];
Note that many C implementations will reject that, SIZE_MAX being larger than any array they can actually support. SIZE_MAX is the maximum value of type size_t, but that's not the same thing as the (implementation-dependent) maximum size of an object.
And I want to loop through it (and one past its last element):
for (i = 0; i <= sizeof(arr); i++)
What is the most adequate type for i?
There is no type available that we can be confident of being able to represent SIZE_MAX + 1, and supposing that arr indeed has size SIZE_MAX, you need i's type to be able to represent that value in order to avoid looping infinitely. Unless you want to loop infinitely, which is possible, albeit unlikely.
On the other hand, if we have
char arr[NOT_SURE_HOW_LARGE_BUT_LESS_THAN_SIZE_MAX];
then the appropriate type for i is size_t. This is the type of the result of the sizeof operator, and if the only assumption we're making is that the object's size is smaller than SIZE_MAX then size_t is the only type we can be confident will serve the purpose.
If we can assume tighter bounds on the object's size, then it may be advantageous to choose a different type. In particular, if we can be confident that it is smaller than UINT_MAX then unsigned int may be a slightly more performant choice. Of course, signed types may be used for i, too, provided that they can in fact represent all the needed values, but this may elicit a warning from some compilers.
Hmm, interesting case. You want to be able to loop with, let's just call it hypothetically the largest possible number, and thus wrap-around rather than ever having the comparison return false. Something like this might do the trick:
char arr[SIZE_MAX];
size_t i = 0;
do {
/* whatever you wanna do here */
++i;
}
while (i <= sizeof(arr) && i != 0);
Because a do-while will always perform the first iteration, we can put a condition such that the loop will end when i == 0, and it will only happen upon wrapping back around to 0 rather than on the first iteration. Add another condition for i <= sizeof(arr) to take care of cases where we don't have wrap-around.
size_t is your friend in this case. It guarantees you access to any array index.

About Using size_t

Suppose I wanted to initialize an array of characters, with the 'x' character, using a loop, should I use an int or a size_t?
char *pch = (char *) malloc(100);
for (size_t s = 0; s < 100; s++)
pch[s] = 'x';
Note: I know about memset and other 'tricks', my question is just about int VS size_t.
Since the variable is indexing an array, using size_t is more appropriate as an array index can't (normally) be negative.
Also, since array indexes are often compared against the result of a strlen call or sizeof operator, both of which yield a size_t, using an int could generate warnings for signed/unsigned comparisons.
However, if you're initializing all bytes of a buffer to a given value, you should instead use memset:
memset(pch, 'x', 100);
size_t is more appropriate for indexing arrays as it is guaranteed to have the correct range for all array sizes. Yet there are some issues to consider:
size_t is an unsigned type, so you must be careful to only compute positive values in tests.
For example iterating downwards this way will not work:
for (size_t i = size - 1; i >= 0; i--) {
array[i] = 0;
}
This naive approach has 2 problems: it iterates even if size is 0 and it loops forever because i >= 0 is always true.
A better approach is this:
for (size_t i = size; i-- > 0;) {
array[i] = 0;
}
size_t may be larger than int, so the format %d in printf is inappropriate for index values of this type. The standard conversion specifier is %zd, but it is not supported on many windows systems. You may need to cast the size_t value as (int) or use %llu and cast the size_t value as unsigned long long.
Some people consider size_t inelegant, maybe because of the _ or just because they are not used to it.
For these and other reasons, it is quite common to see index variables defined as int. While it is OK for small arrays and short loops, it will cause hard to find bugs in code where these assumptions end up failing.
The generated code is likely to be identical, especially for such a simple case. With more complex math, signed types (if you can safely use them) are somewhat more optimizable because the compiler is allowed to assume they never overflow. With signed types, you also won't get unpleasant surprises if you decide to compare against a negative index.
So if you sum it up:
very_large_arrays negative_comparison_safe maybe_faster
int no yes yes
size_t yes no no
it looks like int could be preferable with definitely-small (<2^15 or perhaps 2^31 if your architecture targets guarantee that) ranges unless you can think of another criterion where size_t wins.
The advantage of size_t is that it will definitely work for any array no matter the size as long as you aren't comparing against negative indices. (This may be much more important than negative-comparison-safety and potential speed gains from undefined overflow).
ssize_t (== signed size_t) combines the best of both, unless you need every last bit of size_t (you definitely don't on a 64 bit machine).

What's the point in specifying unsigned integers with "U"?

I have always, for as long as I can remember and ubiquitously, done this:
for (unsigned int i = 0U; i < 10U; ++i)
{
// ...
}
In other words, I use the U specifier on unsigned integers. Now having just looked at this for far too long, I'm wondering why I do this. Apart from signifying intent, I can't think of a reason why it's useful in trivial code like this?
Is there a valid programming reason why I should continue with this convention, or is it redundant?
First, I'll state what is probably obvious to you, but your question leaves room for it, so I'm making sure we're all on the same page.
There are obvious differences between unsigned ints and regular ints: The difference in their range (-2,147,483,648 to 2,147,483,647 for an int32 and 0 to 4,294,967,295 for a uint32). There's a difference in what bits are put at the most significant bit when you use the right bitshift >> operator.
The suffix is important when you need to tell the compiler to treat the constant value as a uint instead of a regular int. This may be important if the constant is outside the range of a regular int but within the range of a uint. The compiler might throw a warning or error in that case if you don't use the U suffix.
Other than that, Daniel Daranas mentioned in comments the only thing that happens: if you don't use the U suffix, you'll be implicitly converting the constant from a regular int to a uint. That's a tiny bit extra effort for the compiler, but there's no run-time difference.
Should you care? Here's my answer, (in bold, for those who only want a quick answer): There's really no good reason to declare a constant as 10U or 0U. Most of the time, you're within the common range of uint and int, so the value of that constant looks exactly the same whether its a uint or an int. The compiler will immediately take your const int expression and convert it to a const uint.
That said, here's the only argument I can give you for the other side: semantics. It's nice to make code semantically coherent. And in that case, if your variable is a uint, it doesn't make sense to set that value to a constant int. If you have a uint variable, it's clearly for a reason, and it should only work with uint values.
That's a pretty weak argument, though, particularly because as a reader, we accept that uint constants usually look like int constants. I like consistency, but there's nothing gained by using the 'U'.
I see this often when using defines to avoid signed/unsigned mismatch warnings. I build a code base for several processors using different tool chains and some of them are very strict.
For instance, removing the ‘u’ in the MAX_PRINT_WIDTH define below:
#define MAX_PRINT_WIDTH (384u)
#define IMAGE_HEIGHT (480u) // 240 * 2
#define IMAGE_WIDTH (320u) // 160 * 2 double density
Gave the following warning:
"..\Application\Devices\MartelPrinter\mtl_print_screen.c", line 106: cc1123: {D} warning:
comparison of unsigned type with signed type
for ( x = 1; (x < IMAGE_WIDTH) && (index <= MAX_PRINT_WIDTH); x++ )
You will probably also see ‘f’ for float vs. double.
I extracted this sentence from a comment, because it's a widely believed incorrect statement, and also because it gives some insight into why explicitly marking unsigned constants as such is a good habit.
...it seems like it would only be useful to keep it when I think overflow might be an issue? But then again, haven't I gone some ways to mitigating for that by specifying unsigned in the first place...
Now, let's consider some code:
int something = get_the_value();
// Compute how many 8s are necessary to reach something
unsigned count = (something + 7) / 8;
So, does the unsigned mitigate potential overflow? Not at all.
Let's suppose something turns out to be INT_MAX (or close to that value). Assuming a 32-bit machine, we might expect count to be 229, or 268,435,456. But it's not.
Telling the compiler that the result of the computation should be unsigned has no effect whatsoever on the typing of the computation. Since something is an int, and 7 is an int, something + 7 will be computed as an int, and will overflow. Then the overflowed value will be divided by 8 (also using signed arithmetic), and whatever that works out to be will be converted to an unsigned and assigned to count.
With GCC, arithmetic is actually performed in 2s complement so the overflow will be a very large negative number; after the division it will be a not-so-large negative number, and that ends up being a largish unsigned number, much larger than the one we were expecting.
Suppose we had specified 7U instead (and maybe 8U as well, to be consistent). Now it works.. It works because now something + 7U is computed with unsigned arithmetic, which doesn't overflow (or even wrap around.)
Of course, this bug (and thousands like it) might go unnoticed for quite a lot of time, blowing up (perhaps literally) at the worst possible moment...
(Obviously, making something unsigned would have mitigated the problem. Here, that's pretty obvious. But the definition might be quite a long way from the use.)
One reason you should do this for trivial code1 is that the suffix forces a type on the literal, and the type may be very important to produce the correct result.
Consider this bit of (somewhat silly) code:
#define magic_number(x) _Generic((x), \
unsigned int : magic_number_unsigned, \
int : magic_number_signed \
)(x)
unsigned magic_number_unsigned(unsigned) {
// ...
}
unsigned magic_number_signed(int) {
// ...
}
int main(void) {
unsigned magic = magic_number(10u);
}
It's not hard to imagine those function actually doing something meaningful based on the type of their argument. Had I omitted the suffix, the generic selection would have produced a wrong result for a very trivial call.
1 But perhaps not the particular code in your post.
In this case, it's completely useless.
In other cases, a suffix might be useful. For instance:
#include <stdio.h>
int
main()
{
printf("%zu\n", sizeof(123));
printf("%zu\n", sizeof(123LL));
return 0;
}
On my system, it will print 4 then 8.
But back to your code, yes it makes your code more explicit, nothing more.

Why is int rather than unsigned int used for C and C++ for loops?

This is a rather silly question but why is int commonly used instead of unsigned int when defining a for loop for an array in C or C++?
for(int i;i<arraySize;i++){}
for(unsigned int i;i<arraySize;i++){}
I recognize the benefits of using int when doing something other than array indexing and the benefits of an iterator when using C++ containers. Is it just because it does not matter when looping through an array? Or should I avoid it all together and use a different type such as size_t?
Using int is more correct from a logical point of view for indexing an array.
unsigned semantic in C and C++ doesn't really mean "not negative" but it's more like "bitmask" or "modulo integer".
To understand why unsigned is not a good type for a "non-negative" number please consider these totally absurd statements:
Adding a possibly negative integer to a non-negative integer you get a non-negative integer
The difference of two non-negative integers is always a non-negative integer
Multiplying a non-negative integer by a negative integer you get a non-negative result
Obviously none of the above phrases make any sense... but it's how C and C++ unsigned semantic indeed works.
Actually using an unsigned type for the size of containers is a design mistake of C++ and unfortunately we're now doomed to use this wrong choice forever (for backward compatibility). You may like the name "unsigned" because it's similar to "non-negative" but the name is irrelevant and what counts is the semantic... and unsigned is very far from "non-negative".
For this reason when coding most loops on vectors my personally preferred form is:
for (int i=0,n=v.size(); i<n; i++) {
...
}
(of course assuming the size of the vector is not changing during the iteration and that I actually need the index in the body as otherwise the for (auto& x : v)... is better).
This running away from unsigned as soon as possible and using plain integers has the advantage of avoiding the traps that are a consequence of unsigned size_t design mistake. For example consider:
// draw lines connecting the dots
for (size_t i=0; i<pts.size()-1; i++) {
drawLine(pts[i], pts[i+1]);
}
the code above will have problems if the pts vector is empty because pts.size()-1 is a huge nonsense number in that case. Dealing with expressions where a < b-1 is not the same as a+1 < b even for commonly used values is like dancing in a minefield.
Historically the justification for having size_t unsigned is for being able to use the extra bit for the values, e.g. being able to have 65535 elements in arrays instead of just 32767 on 16-bit platforms. In my opinion even at that time the extra cost of this wrong semantic choice was not worth the gain (and if 32767 elements are not enough now then 65535 won't be enough for long anyway).
Unsigned values are great and very useful, but NOT for representing container size or for indexes; for size and index regular signed integers work much better because the semantic is what you would expect.
Unsigned values are the ideal type when you need the modulo arithmetic property or when you want to work at the bit level.
This is a more general phenomenon, often people don't use the correct types for their integers. Modern C has semantic typedefs that are much preferable over the primitive integer types. E.g everything that is a "size" should just be typed as size_t. If you use the semantic types systematically for your application variables, loop variables come much easier with these types, too.
And I have seen several bugs that where difficult to detect that came from using int or so. Code that all of a sudden crashed on large matrixes and stuff like that. Just coding correctly with correct types avoids that.
It's purely laziness and ignorance. You should always use the right types for indices, and unless you have further information that restricts the range of possible indices, size_t is the right type.
Of course if the dimension was read from a single-byte field in a file, then you know it's in the range 0-255, and int would be a perfectly reasonable index type. Likewise, int would be okay if you're looping a fixed number of times, like 0 to 99. But there's still another reason not to use int: if you use i%2 in your loop body to treat even/odd indices differently, i%2 is a lot more expensive when i is signed than when i is unsigned...
Not much difference. One benefit of int is it being signed. Thus int i < 0 makes sense, while unsigned i < 0 doesn't much.
If indexes are calculated, that may be beneficial (for example, you might get cases where you will never enter a loop if some result is negative).
And yes, it is less to write :-)
Using int to index an array is legacy, but still widely adopted. int is just a generic number type and does not correspond to the addressing capabilities of the platform. In case it happens to be shorter or longer than that, you may encounter strange results when trying to index a very large array that goes beyond.
On modern platforms, off_t, ptrdiff_t and size_t guarantee much more portability.
Another advantage of these types is that they give context to someone who reads the code. When you see the above types you know that the code will do array subscripting or pointer arithmetic, not just any calculation.
So, if you want to write bullet-proof, portable and context-sensible code, you can do it at the expense of a few keystrokes.
GCC even supports a typeof extension which relieves you from typing the same typename all over the place:
typeof(arraySize) i;
for (i = 0; i < arraySize; i++) {
...
}
Then, if you change the type of arraySize, the type of i changes automatically.
It really depends on the coder. Some coders prefer type perfectionism, so they'll use whatever type they're comparing against. For example, if they're iterating through a C string, you might see:
size_t sz = strlen("hello");
for (size_t i = 0; i < sz; i++) {
...
}
While if they're just doing something 10 times, you'll probably still see int:
for (int i = 0; i < 10; i++) {
...
}
I use int cause it requires less physical typing and it doesn't matter - they take up the same amount of space, and unless your array has a few billion elements you won't overflow if you're not using a 16-bit compiler, which I'm usually not.
Because unless you have an array with size bigger than two gigabyts of type char, or 4 gigabytes of type short or 8 gigabytes of type int etc, it doesn't really matter if the variable is signed or not.
So, why type more when you can type less?
Aside from the issue that it's shorter to type, the reason is that it allows negative numbers.
Since we can't say in advance whether a value can ever be negative, most functions that take integer arguments take the signed variety. Since most functions use signed integers, it is often less work to use signed integers for things like loops. Otherwise, you have the potential of having to add a bunch of typecasts.
As we move to 64-bit platforms, the unsigned range of a signed integer should be more than enough for most purposes. In these cases, there's not much reason not to use a signed integer.
Consider the following simple example:
int max = some_user_input; // or some_calculation_result
for(unsigned int i = 0; i < max; ++i)
do_something;
If max happens to be a negative value, say -1, the -1 will be regarded as UINT_MAX (when two integers with the sam rank but different sign-ness are compared, the signed one will be treated as an unsigned one). On the other hand, the following code would not have this issue:
int max = some_user_input;
for(int i = 0; i < max; ++i)
do_something;
Give a negative max input, the loop will be safely skipped.
Using a signed int is - in most cases - a mistake that could easily result in potential bugs as well as undefined behavior.
Using size_t matches the system's word size (64 bits on 64 bit systems and 32 bits on 32 bit systems), always allowing for the correct range for the loop and minimizing the risk of an integer overflow.
The int recommendation comes to solve an issue where reverse for loops were often written incorrectly by unexperienced programmers (of course, int might not be in the correct range for the loop):
/* a correct reverse for loop */
for (size_t i = count; i > 0;) {
--i; /* note that this is not part of the `for` statement */
/* code for loop where i is for zero based `index` */
}
/* an incorrect reverse for loop (bug on count == 0) */
for (size_t i = count - 1; i > 0; --i) {
/* i might have overflowed and undefined behavior occurs */
}
In general, signed and unsigned variables shouldn't be mixed together, so at times using an int in unavoidable. However, the correct type for a for loop is as a rule size_t.
There's a nice talk about this misconception that signed variables are better than unsigned variables, you can find it on YouTube (Signed Integers Considered Harmful by Robert Seacord).
TL;DR;: Signed variables are more dangerous and require more code than unsigned variables (which should be preferred almost in all cases and definitely whenever negative values aren't logically expected).
With unsigned variables the only concern is the overflow boundary which has a strictly defined behavior (wrap-around) and uses clearly defined modular mathematics.
This allows a single edge case test to catch an overflow and that test can be performed after the mathematical operation was executed.
However, with signed variables the overflow behavior is undefined (UB) and the negative range is actually larger than the positive range - things that add edge cases that must be tested for and explicitly handled before the mathematical operation can be executed.
i.e., how much INT_MIN * -1? (the pre-processor will protect you, but without it you're in a jam).
P.S.
As for the example offered by #6502 in their answer, the whole thing is again an issue of trying to cut corners and a simple missing if statement.
When a loop assumes at least 2 elements in an array, this assumption should be tested beforehand. i.e.:
// draw lines connecting the dots - forward loop
if(pts.size() > 1) { // first make sure there's enough dots
for (size_t i=0; i < pts.size()-1; i++) { // then loop
drawLine(pts[i], pts[i+1]);
}
}
// or test against i + 1 : which tests the desired pts[i+1]
for (size_t i = 0; i + 1 < pts.size(); i++) { // then loop
drawLine(pts[i], pts[i+1]);
}
// or start i as 1 : but note that `-` is slower than `+`
for (size_t i = 1; i < pts.size(); i++) { // then loop
drawLine(pts[i - 1], pts[i]);
}

Should you always use 'int' for numbers in C, even if they are non-negative?

I always use unsigned int for values that should never be negative. But today I
noticed this situation in my code:
void CreateRequestHeader( unsigned bitsAvailable, unsigned mandatoryDataSize,
unsigned optionalDataSize )
{
If ( bitsAvailable – mandatoryDataSize >= optionalDataSize ) {
// Optional data fits, so add it to the header.
}
// BUG! The above includes the optional part even if
// mandatoryDataSize > bitsAvailable.
}
Should I start using int instead of unsigned int for numbers, even if they
can't be negative?
One thing that hasn't been mentioned is that interchanging signed/unsigned numbers can lead to security bugs. This is a big issue, since many of the functions in the standard C-library take/return unsigned numbers (fread, memcpy, malloc etc. all take size_t parameters)
For instance, take the following innocuous example (from real code):
//Copy a user-defined structure into a buffer and process it
char* processNext(char* data, short length)
{
char buffer[512];
if (length <= 512) {
memcpy(buffer, data, length);
process(buffer);
return data + length;
} else {
return -1;
}
}
Looks harmless, right? The problem is that length is signed, but is converted to unsigned when passed to memcpy. Thus setting length to SHRT_MIN will validate the <= 512 test, but cause memcpy to copy more than 512 bytes to the buffer - this allows an attacker to overwrite the function return address on the stack and (after a bit of work) take over your computer!
You may naively be saying, "It's so obvious that length needs to be size_t or checked to be >= 0, I could never make that mistake". Except, I guarantee that if you've ever written anything non-trivial, you have. So have the authors of Windows, Linux, BSD, Solaris, Firefox, OpenSSL, Safari, MS Paint, Internet Explorer, Google Picasa, Opera, Flash, Open Office, Subversion, Apache, Python, PHP, Pidgin, Gimp, ... on and on and on ... - and these are all bright people whose job is knowing security.
In short, always use size_t for sizes.
Man, programming is hard.
Should I always ...
The answer to "Should I always ..." is almost certainly 'no', there are a lot of factors that dictate whether you should use a datatype- consistency is important.
But, this is a highly subjective question, it's really easy to mess up unsigneds:
for (unsigned int i = 10; i >= 0; i--);
results in an infinite loop.
This is why some style guides including Google's C++ Style Guide discourage unsigned data types.
In my personal opinion, I haven't run into many bugs caused by these problems with unsigned data types — I'd say use assertions to check your code and use them judiciously (and less when you're performing arithmetic).
Some cases where you should use unsigned integer types are:
You need to treat a datum as a pure binary representation.
You need the semantics of modulo arithmetic you get with unsigned numbers.
You have to interface with code that uses unsigned types (e.g. standard library routines that accept/return size_t values.
But for general arithmetic, the thing is, when you say that something "can't be negative," that does not necessarily mean you should use an unsigned type. Because you can put a negative value in an unsigned, it's just that it will become a really large value when you go to get it out. So, if you mean that negative values are forbidden, such as for a basic square root function, then you are stating a precondition of the function, and you should assert. And you can't assert that what cannot be, is; you need a way to hold out-of-band values so you can test for them (this is the same sort of logic behind getchar() returning an int and not char.)
Additionally, the choice of signed-vs.-unsigned can have practical repercussions on performance, as well. Take a look at the (contrived) code below:
#include <stdbool.h>
bool foo_i(int a) {
return (a + 69) > a;
}
bool foo_u(unsigned int a)
{
return (a + 69u) > a;
}
Both foo's are the same except for the type of their parameter. But, when compiled with c99 -fomit-frame-pointer -O2 -S, you get:
.file "try.c"
.text
.p2align 4,,15
.globl foo_i
.type foo_i, #function
foo_i:
movl $1, %eax
ret
.size foo_i, .-foo_i
.p2align 4,,15
.globl foo_u
.type foo_u, #function
foo_u:
movl 4(%esp), %eax
leal 69(%eax), %edx
cmpl %eax, %edx
seta %al
ret
.size foo_u, .-foo_u
.ident "GCC: (Debian 4.4.4-7) 4.4.4"
.section .note.GNU-stack,"",#progbits
You can see that foo_i() is more efficient than foo_u(). This is because unsigned arithmetic overflow is defined by the standard to "wrap around," so (a + 69u) may very well be smaller than a if a is very large, and thus there must be code for this case. On the other hand, signed arithmetic overflow is undefined, so GCC will go ahead and assume signed arithmetic doesn't overflow, and so (a + 69) can't ever be less than a. Choosing unsigned types indiscriminately can therefore unnecessarily impact performance.
The answer is Yes. The "unsigned" int type of C and C++ is not an "always positive integer", no matter what the name of the type looks like. The behavior of C/C++ unsigned ints has no sense if you try to read the type as "non-negative"... for example:
The difference of two unsigned is an unsigned number (makes no sense if you read it as "The difference between two non-negative numbers is non-negative")
The addition of an int and an unsigned int is unsigned
There is an implicit conversion from int to unsigned int (if you read unsigned as "non-negative" it's the opposite conversion that would make sense)
If you declare a function accepting an unsigned parameter when someone passes a negative int you simply get that implicitly converted to a huge positive value; in other words using an unsigned parameter type doesn't help you finding errors neither at compile time nor at runtime.
Indeed unsigned numbers are very useful for certain cases because they are elements of the ring "integers-modulo-N" with N being a power of two. Unsigned ints are useful when you want to use that modulo-n arithmetic, or as bitmasks; they are NOT useful as quantities.
Unfortunately in C and C++ unsigned were also used to represent non-negative quantities to be able to use all 16 bits when the integers where that small... at that time being able to use 32k or 64k was considered a big difference. I'd classify it basically as an historical accident... you shouldn't try to read a logic in it because there was no logic.
By the way in my opinion that was a mistake... if 32k are not enough then quite soon 64k won't be enough either; abusing the modulo integer just because of one extra bit in my opinion was a cost too high to pay. Of course it would have been reasonable to do if a proper non-negative type was present or defined... but the unsigned semantic is just wrong for using it as non-negative.
Sometimes you may find who says that unsigned is good because it "documents" that you only want non-negative values... however that documentation is of any value only for people that don't actually know how unsigned works for C or C++. For me seeing an unsigned type used for non-negative values simply means that who wrote the code didn't understand the language on that part.
If you really understand and want the "wrapping" behavior of unsigned ints then they're the right choice (for example I almost always use "unsigned char" when I'm handling bytes); if you're not going to use the wrapping behavior (and that behavior is just going to be a problem for you like in the case of the difference you shown) then this is a clear indicator that the unsigned type is a poor choice and you should stick with plain ints.
Does this means that C++ std::vector<>::size() return type is a bad choice ? Yes... it's a mistake. But if you say so be prepared to be called bad names by who doesn't understand that the "unsigned" name is just a name... what it counts is the behavior and that is a "modulo-n" behavior (and no one would consider a "modulo-n" type for the size of a container a sensible choice).
Bjarne Stroustrup, creator of C++, warns about using unsigned types in his book The C++ programming language:
The unsigned integer types are ideal
for uses that treat storage as a bit
array. Using an unsigned instead of an
int to gain one more bit to represent
positive integers is almost never a
good idea. Attempts to ensure that
some values are positive by declaring
variables unsigned will typically be
defeated by the implicit conversion
rules.
I seem to be in disagreement with most people here, but I find unsigned types quite useful, but not in their raw historic form.
If you consequently stick to the semantic that a type represents for you, then there should be no problem: use size_t (unsigned) for array indices, data offsets etc. off_t (signed) for file offsets. Use ptrdiff_t (signed) for differences of pointers. Use uint8_t for small unsigned integers and int8_t for signed ones. And you avoid at least 80% of portability problems.
And don't use int, long, unsigned, char if you mustn't. They belong in the history books. (Sometimes you must, error returns, bit fields, e.g)
And to come back to your example:
bitsAvailable – mandatoryDataSize >= optionalDataSize
can be easily rewritten as
bitsAvailable >= optionalDataSize + mandatoryDataSize
which doesn't avoid the problem of a potential overflow (assert is your friend) but gets you a bit nearer to the idea of what you want to test, I think.
if (bitsAvailable >= optionalDataSize + mandatoryDataSize) {
// Optional data fits, so add it to the header.
}
Bug-free, so long as mandatoryDataSize + optionalDataSize can't overflow the unsigned integer type -- the naming of these variables leads me to believe this is likely to be the case.
You can't fully avoid unsigned types in portable code, because many typedefs in the standard library are unsigned (most notably size_t), and many functions return those (e.g. std::vector<>::size()).
That said, I generally prefer to stick to signed types wherever possible for the reasons you've outlined. It's not just the case you bring up - in case of mixed signed/unsigned arithmetic, the signed argument is quietly promoted to unsigned.
From the comments on one of Eric Lipperts Blog Posts (See here):
Jeffrey L. Whitledge
I once developed a system in which
negative values made no sense as a
parameter, so rather than validating
that the parameter values were
non-negative, I thought it would be a
great idea to just use uint instead. I
quickly discovered that whenever I
used those values for anything (like
calling BCL methods), they had be
converted to signed integers. This
meant that I had to validate that the
values didn't exceed the signed
integer range on the top end, so I
gained nothing. Also, every time the
code was called, the ints that were
being used (often received from BCL
functions) had to be converted to
uints. It didn't take long before I
changed all those uints back to ints
and took all that unnecessary casting
out. I still have to validate that the
numbers are not negative, but the code
is much cleaner!
Eric Lippert
Couldn't have said it better myself.
You almost never need the range of a
uint, and they are not CLS-compliant.
The standard way to represent a small
integer is with "int", even if there
are values in there that are out of
range. A good rule of thumb: only use
"uint" for situations where you are
interoperating with unmanaged code
that expects uints, or where the
integer in question is clearly used as
a set of bits, not a number. Always
try to avoid it in public interfaces.
Eric
The situation where (bitsAvailable – mandatoryDataSize) produces an 'unexpected' result when the types are unsigned and bitsAvailable < mandatoryDataSize is a reason that sometimes signed types are used even when the data is expected to never be negative.
I think there's no hard and fast rule - I typically 'default' to using unsigned types for data that has no reason to be negative, but then you have to take to ensure that arithmetic wrapping doesn't expose bugs.
Then again, if you use signed types, you still have to sometimes consider overflow:
MAX_INT + 1
The key is that you have to take care when performing arithmetic for these kinds of bugs.
No you should use the type that is right for your application. There is no golden rule. Sometimes on small microcontrollers it is for example more speedy and memory efficient to use say 8 or 16 bit variables wherever possible as that is often the native datapath size, but that is a very special case. I also recommend using stdint.h wherever possible. If you are using visual studio you can find BSD licensed versions.
If there is a possibility of overflow, then assign the values to the next highest data type during the calculation, ie:
void CreateRequestHeader( unsigned int bitsAvailable, unsigned int mandatoryDataSize, unsigned int optionalDataSize )
{
signed __int64 available = bitsAvailable;
signed __int64 mandatory = mandatoryDataSize;
signed __int64 optional = optionalDataSize;
if ( (mandatory + optional) <= available ) {
// Optional data fits, so add it to the header.
}
}
Otherwise, just check the values individually instead of calculating:
void CreateRequestHeader( unsigned int bitsAvailable, unsigned int mandatoryDataSize, unsigned int optionalDataSize )
{
if ( bitsAvailable < mandatoryDataSize ) {
return;
}
bitsAvailable -= mandatoryDataSize;
if ( bitsAvailable < optionalDataSize ) {
return;
}
bitsAvailable -= optionalDataSize;
// Optional data fits, so add it to the header.
}
You'll need to look at the results of the operations you perform on the variables to check if you can get over/underflows - in your case, the result being potentially negative. In that case you are better off using the signed equivalents.
I don't know if its possible in c, but in this case I would just cast the X-Y thing to an int.
If your numbers should never be less than zero, but have a chance to be < 0, by all means use signed integers and sprinkle assertions or other runtime checks around. If you're actually working with 32-bit (or 64, or 16, depending on your target architecture) values where the most significant bit means something other than "-", you should only use unsigned variables to hold them. It's easier to detect integer overflows where a number that should always be positive is very negative than when it's zero, so if you don't need that bit, go with the signed ones.
Suppose you need to count from 1 to 50000. You can do that with a two-byte unsigned integer, but not with a two-byte signed integer (if space matters that much).

Resources