Signed vs Unsigned operations in C - c

Very simple question:
I have a program doing lots and lots of mathematical computations over ints and long longs. To fit in an extra bit, I made the long longs unsigned, since I only dealt with positive numbers, and could now get a few more values.
Oddly enough, this gave me a 15% performance boost, which I confirmed to be in simply making all the long long's unsigned.
Is this possible? Are mathematical operations really faster with unsigned numbers? I remember reading that there would be no difference, and the compiler automatically picks out the fastest way to go whether signed or unsigned. Is this 15% boost really from making the vars unsigned, or could it be something else affected in my code?
And, if it really is from making the vars unsigned, should I aim to make everything (even ints) unsigned, as I never need negative numbers, and every second is important if I can save it.

In some operations, signed integers are faster, in others, unsigned are faster:
In C, signed integer operations can be assumed not to wrap. The compiler will take advantage of this in loop optimization, for example. Comparisons can be optimized away similarly. (This can also lead to subtle bugs if you don't expect this).
On the other hand, unsigned integers do not have this assumption. However, not having to deal with a sign is a big advantage for some operations, for example: division. Unsigned division by a constant power of two is a simple shift, but (depending on your rounding rules) there's a conditional off-by-1 for negative numbers.
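The division point can be made concrete with a small sketch. The function names are made up for illustration, and the shift-based version assumes that `>>` on a negative value is an arithmetic shift, which C leaves implementation-defined but which holds on mainstream compilers.

```c
#include <stdint.h>

/* Unsigned division by 8: equivalent to a plain logical shift. */
uint32_t udiv8(uint32_t x) {
    return x / 8u;              /* same result as x >> 3 */
}

/* Signed division by 8 truncates toward zero (C99 and later), so a bare
   arithmetic shift would round negative values the wrong way. */
int32_t sdiv8(int32_t x) {
    return x / 8;
}

/* Hand-written version of the adjustment a compiler typically emits.
   ASSUMPTION: >> on a negative int32_t is an arithmetic shift. */
int32_t sdiv8_by_shift(int32_t x) {
    int32_t bias = (x < 0) ? 7 : 0;   /* the conditional off-by-one fix */
    return (x + bias) >> 3;
}
```

The extra bias step in `sdiv8_by_shift` is the "conditional off-by-1" the answer refers to; the unsigned version needs nothing like it.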
Personally, I make a habit of only using unsigned integers unless I really, really do have a value which needs to be signed. It's not so much for performance as correctness.
You may see the effect magnified with long long, which (I'm guessing) is 64 bits in your case. The CPU usually doesn't have single instructions to deal with these types (in 32 bit mode), so the slight added complexity for signed operations will be more noticeable.

On a 32-bit processor, 64-bit integer operations are emulated; using unsigned instead of signed means the emulation library doesn't have to do extra work to propagate carry bits etc.

There are three cases where a compiler cares whether a variable is signed or unsigned:
- When the variable is converted to a longer type
- When the comparison operators (greater-than, etc.) are applied
- When overflows might occur
On some machines, conversion of signed variables to longer types requires extra code; on other machines, a conversion may be performed as part of a 'load' or 'move' instruction.
Some machines (mainly small embedded microcontrollers) require more instructions to perform a signed-versus-signed comparison than unsigned-versus-unsigned, but most machines have a full array of both signed and unsigned compare instructions available.
When overflows occur with unsigned types, the compiler may have to add code to ensure that the defined behavior actually occurs. No such code is required for signed types, because anything that might happen in the absence of such code would be permitted by the standard.

The compiler doesn't pick if it's going to be unsigned or signed. But, yes, in theory, unsigned with unsigned is faster than signed with signed. If you really want to slow things down, you'll go with signed with unsigned. And even worse: floats with integers.
It depends on the processor, of course.

Related

Integer memory representation

As has been said again and again, negative numbers are represented in two's complement, while unsigned integers don't reserve a bit for the sign. An integer can be either signed or unsigned. So how does the computer figure out which encoding scheme to use for a given integer?
Some operations (such as addition) work identically on both signed and unsigned integers.
But that's not the case for all operations. When right-shifting, we shift in zeroes for unsigned integers, and we shift in the sign bit for signed integers.
In these cases, the processor provides the means to achieve both operations. It's possible for the processor to offer two different instructions, or two variations of one.
But whatever the case, there is no decision making on the processor's part. The processor just executes the instructions selected by the compiler. It's up to the compiler to emit instructions that achieve the desired result based on the type of the values involved.
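As a rough illustration of the compiler choosing the instruction from the type, here are two functions whose source differs only in signedness. The function names are hypothetical. On x86, for example, a compiler would typically emit a logical shift (SHR) for the first and an arithmetic shift (SAR) for the second, though the C standard leaves the result of right-shifting a negative value implementation-defined.

```c
#include <stdint.h>

/* Unsigned right shift: zeroes are shifted in from the left.
   n must be less than 32. */
uint32_t shift_unsigned(uint32_t x, unsigned n) {
    return x >> n;
}

/* Signed right shift: for negative x the result is implementation-defined;
   most compilers shift in copies of the sign bit. n must be less than 32. */
int32_t shift_signed(int32_t x, unsigned n) {
    return x >> n;
}
```

The processor never inspects the value to decide; the choice is fixed when the code is compiled.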

What are the minimum-width integer types useful for?

From ISO/IEC 9899:
7.18.1.2 Minimum-width integer types
1 The typedef name int_leastN_t designates a signed integer type with a width of at least N, such that no signed integer type with lesser size has at least the specified width.
Thus, int_least32_t denotes a signed integer type with a width of at least 32 bits.
Why should I ever use these types?
When I'm deciding what type I should take for a variable I need, I ask myself:
"What will be the biggest value it could ever be carrying?"
So I'm going to find an answer, check what's the lowest 2^n which is bigger than that, and take the matching exact integer type.
So in this case I could use also a minimum-width integer type.
But why? As I already know: it will never be a greater value. So why take something that could sometimes cover even more than I need?
All other cases I can imagine were also invalid, e.g.:
"I have a type that will be at least size of..."
- The implementation can't know what will be the largest (for example) user input I will ever get, so adjusting the type at compile time won't help.
"I have a variable where I can't determine what size of values it will be holding on run time."
- So how can the compiler know at compile time? -> It can't find the fitting byte size either.
So what is the usage of these types?
Because your compiler knows best what is good for you. For example, on some CPU architectures, computations involving 8 or 16 bit types might be much slower than computations done in 32 bits due to extra instructions for masking operands and results to match their width.
The C implementation on a Cray Unicos for example has only an 8 bit char type, everything else (short, int, long, long long) is 64 bit. If you force a type to be int16_t or int32_t performance can suffer drastically due to the narrow stores requiring masking, oring and anding. Using int_least32_t would allow the compiler to use the native 64 bit type.
So why take something that could sometimes cover even more as I need?
Because there might not always be the size you need. For example, on a system where CHAR_BIT > 8, int8_t is not available, but int_least8_t is.
The idea is not that the compiler will guess how many bits you need. The idea is that the compiler will always have a type available that satisfies your size requirement, even if it cannot offer a type of exactly that size.
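A minimal sketch of that guarantee, using only `<stdint.h>` and `<limits.h>`. The names `sample_t`, `sample_bits` and `clamp_to_sample` are hypothetical and only serve the illustration.

```c
#include <limits.h>
#include <stdint.h>

/* int_least16_t always exists and holds at least 16 bits, even on a
   platform with no exact 16-bit type (where int16_t would be missing). */
typedef int_least16_t sample_t;   /* hypothetical name */

/* Size in bits of the storage the implementation actually picked. */
unsigned sample_bits(void) {
    return (unsigned)(sizeof(sample_t) * CHAR_BIT);
}

/* Clamp a wider value into whatever range sample_t really has. */
sample_t clamp_to_sample(long v) {
    if (v > INT_LEAST16_MAX) return INT_LEAST16_MAX;
    if (v < INT_LEAST16_MIN) return INT_LEAST16_MIN;
    return (sample_t)v;
}
```

On a typical desktop compiler `sample_bits()` reports 16, but code written this way would still compile on an implementation that can only offer something wider.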

About the use of signed integers in C family of languages

When using integer values in my own code, I always try to consider the signedness, asking myself if the integer should be signed or unsigned.
When I'm sure the value will never need to be negative, I then use an unsigned integer.
And I have to say this happens most of the time.
When reading other peoples' code, I rarely see unsigned integers, even if the represented value can't be negative.
So I asked myself: «is there a good reason for this, or do people just use signed integers because they don't care»?
I've searched on the subject, here and in other places, and I have to say I can't find a good reason not to use unsigned integers, when it applies.
I came across those questions: «Default int type: Signed or Unsigned?», and «Should you always use 'int' for numbers in C, even if they are non-negative?» which both present the following example:
for( unsigned int i = foo.Length() - 1; i >= 0; --i ) {}
To me, this is just bad design. Of course, it may result in an infinite loop, with unsigned integers.
But is it so hard to check if foo.Length() is 0, before the loop?
So I personally don't think this is a good reason for using signed integers all the way.
Some people may also say that signed integers may be useful, even for non-negative values, to provide an error flag, usually -1.
Ok, that's good to have a specific value that means «error».
But then, what's wrong with something like UINT_MAX, for that specific value?
I'm actually asking this question because it may lead to some huge problems, usually when using third-party libraries.
In such a case, you often have to deal with signed and unsigned values.
Most of the time, people just don't care about the signedness, and simply assign, for instance, an unsigned int to a signed int, without checking the range.
I have to say I'm a bit paranoid with the compiler warning flags, so with my setup, such an implicit cast will result in a compiler error.
For that kind of stuff, I usually use a function or macro to check the range, and then assign using an explicit cast, raising an error if needed.
This just seems logical to me.
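A range-checked conversion of the kind described could look roughly like this. The helper names are hypothetical; the question does not show the author's actual function or macro.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: store u in *out only if it fits in an int.
   Returns false (and leaves *out untouched) when the value is too large. */
bool uint_to_int_checked(unsigned int u, int *out) {
    if (u > (unsigned int)INT_MAX) {
        return false;
    }
    *out = (int)u;
    return true;
}

/* The opposite direction: a negative int has no unsigned counterpart. */
bool int_to_uint_checked(int s, unsigned int *out) {
    if (s < 0) {
        return false;
    }
    *out = (unsigned int)s;
    return true;
}
```

The explicit casts sit behind the range test, so the caller decides how to report a failure instead of silently getting a wrapped value.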
As a last example, as I'm also an Objective-C developer (note that this question is not related to Objective-C only):
- ( NSInteger )tableView: ( UITableView * )tableView numberOfRowsInSection: ( NSInteger )section;
For those not fluent with Objective-C, NSInteger is a signed integer.
This method actually retrieves the number of rows in a table view, for a specific section.
The result will never be a negative value (as the section number, by the way).
So why use a signed integer for this?
I really don't understand.
This is just an example, but I just always see that kind of stuff, with C, C++ or Objective-C.
So again, I'm just wondering if people just don't care about that kind of problems, or if there is finally a good and valid reason not to use unsigned integers for such cases.
Looking forward to hearing your answers : )
- A signed return value might yield more information (think error numbers: 0 is sometimes a valid answer and -1 indicates an error, see man read), which might be relevant especially for developers of libraries.
- If you are worrying about the one extra bit you gain when using unsigned instead of signed, then you are probably using the wrong type anyway (this is also a kind of "premature optimization" argument).
- Languages like Python, Ruby, JScript etc. are doing just fine without signed vs. unsigned. That might be an indicator...
When using integer values in my own code, I always try to consider the signedness, asking myself if the integer should be signed or unsigned.
When I'm sure the value will never need to be negative, I then use an unsigned integer.
And I have to say this happens most of the time.
To carefully consider which type that is most suitable each time you declare a variable is very good practice! This means you are careful and professional. You should not only consider signedness, but also the potential max value that you expect this type to have.
The reason why you shouldn't use signed types when they aren't needed has nothing to do with performance, but with type safety. There are lots of potential, subtle bugs that can be caused by signed types:
The various forms of implicit promotions that exist in C can cause your type to change signedness in unexpected and possibly dangerous ways: the integer promotion rule that is part of the usual arithmetic conversions, the lvalue conversion upon assignment, the default argument promotions used by, for example, variadic argument lists, and so on.
When using any form of bitwise operators or similar hardware-related programming, signed types are dangerous and can easily cause various forms of undefined behavior.
By declaring your integers unsigned, you automatically skip past a whole lot of the above dangers. Similarly, by declaring them as large as unsigned int or larger, you get rid of lots of dangers caused by the integer promotions.
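One small, commonly cited instance of the integer promotion hazard is sketched here. It assumes a platform where int is wider than 8 bits (true almost everywhere), and the function names are made up.

```c
#include <stdint.h>

/* Operands narrower than int are promoted to (signed) int before the
   operator is applied, so ~ on a uint8_t yields a negative int. */
int invert_promoted(uint8_t x) {
    return ~x;                 /* x = 0x0F gives -16, not 0xF0 */
}

/* Casting the result back restores the 8-bit value that was intended. */
uint8_t invert_narrow(uint8_t x) {
    return (uint8_t)~x;        /* x = 0x0F gives 0xF0 */
}
```

With `unsigned int` or a wider unsigned type, no promotion to a signed type takes place, which is the point made about choosing types at least as large as unsigned int.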
Both size and signedness are important when it comes to writing rugged, portable and safe code. This is the reason why you should always use the types from stdint.h and not the native, so-called "primitive data types" of C.
So I asked myself: «is there a good reason for this, or do people just use signed integers because they don't care»?
I don't really think it is because they don't care, nor because they are lazy, even though declaring everything int is sometimes referred to as "sloppy typing" - which means sloppily picked type more than it means too lazy to type.
I rather believe it is because they lack deeper knowledge of the various things I mentioned above. There's a frightening number of seasoned C programmers who don't know how implicit type promotions work in C, nor how signed types can cause poorly-defined behavior when used together with certain operators.
This is actually a very frequent source of subtle bugs. Many programmers find themselves staring at a compiler warning or a peculiar bug, which they can make go away by adding a cast. But they don't understand why, they simply add the cast and move on.
for( unsigned int i = foo.Length() - 1; i >= 0; --i ) {}
To me, this is just bad design
Indeed it is.
Once upon a time, down-counting loops would yield more efficient code, because the compiler could pick a "branch if zero" instruction instead of a "branch if larger/smaller/equal" instruction, and the former is faster. But this was at a time when compilers were really dumb and I don't believe such micro-optimizations are relevant any longer.
So there is rarely ever a reason to have a down-counting loop. Whoever made the argument probably just couldn't think outside the box. The example could have been rewritten as:
for(unsigned int i=0; i<foo.Length(); i++)
{
    unsigned int index = foo.Length() - i - 1;
    thing[index] = something;
}
This code should not have any impact on performance, but the loop itself became a whole lot easier to read, while at the same time fixing the bug that your example had.
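For cases where a down-counting loop is still wanted, a widely used alternative idiom (not taken from this answer) tests the index before decrementing it, so the condition never relies on an unsigned value going below zero. The function names are hypothetical.

```c
#include <stddef.h>

/* Fill an array from the last element down to the first using an
   unsigned index. "i-- > 0" compares first and decrements afterwards,
   so the body sees n-1 ... 0 and the loop stops cleanly. */
void fill_backwards(int *thing, size_t n, int something) {
    for (size_t i = n; i-- > 0; ) {
        thing[i] = something;
    }
}

/* Counts the iterations of the same loop shape; also safe for n == 0. */
size_t count_backwards_iterations(size_t n) {
    size_t count = 0;
    for (size_t i = n; i-- > 0; ) {
        count++;
    }
    return count;
}
```

Because the loop starts at `n` rather than `n - 1`, an empty container needs no special case.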
As far as performance is concerned nowadays, one should probably spend the time pondering about which form of data access that is most ideal in terms of data cache use, rather than anything else.
Some people may also say that signed integers may be useful, even for non-negative values, to provide an error flag, usually -1.
That's a poor argument. Good API design uses a dedicated error type for error reporting, such as an enum.
Instead of having some hobbyist-level API like
int do_stuff (int a, int b); // returns -1 if a or b were invalid, otherwise the result
you should have something like:
err_t do_stuff (int32_t a, int32_t b, int32_t* result);
// returns ERR_A if a is invalid, ERR_B if b is invalid, ERR_XXX if... and so on
// the result is stored in [result], which is allocated by the caller
// upon errors the contents of [result] remain untouched
The API would then consistently reserve the return of every function for this error type.
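To make the shape of such an API concrete, here is a minimal sketch. The enumerator names, the ERR_NULL case and the validity rules (a must be non-negative, b must be non-zero) are assumptions invented for the example; the answer only gives the prototype.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error type; the names are illustrative only. */
typedef enum {
    ERR_OK = 0,
    ERR_A,        /* a was invalid */
    ERR_B,        /* b was invalid */
    ERR_NULL      /* result pointer was NULL */
} err_t;

/* ASSUMED rule for this sketch: a must be non-negative, b non-zero.
   On success the quotient is stored in *result; on error the contents
   of *result remain untouched. */
err_t do_stuff(int32_t a, int32_t b, int32_t *result) {
    if (result == NULL) return ERR_NULL;
    if (a < 0)          return ERR_A;
    if (b == 0)         return ERR_B;
    *result = a / b;
    return ERR_OK;
}
```

A caller then checks the return value against ERR_OK before reading the result, and no value of the result's own range has to be sacrificed as an error marker.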
(And yes, many of the standard library functions abuse return types for error handling. This is because it contains lots of ancient functions from a time before good programming practice was invented, and they have been preserved the way they are for backwards-compatibility reasons. So just because you find a poorly-written function in the standard library, you shouldn't run off to write an equally poor function yourself.)
Overall, it sounds like you know what you are doing and giving signedness some thought. That probably means that knowledge-wise, you are actually already ahead of the people who wrote those posts and guides you are referring to.
The Google style guide for example, is questionable. Similar could be said about lots of other such coding standards that use "proof by authority". Just because it says Google, NASA or Linux kernel, people blindly swallow them no matter the quality of the actual contents. There are good things in those standards, but they also contain subjective opinions, speculations or blatant errors.
Instead I would recommend referring to real professional coding standards, such as MISRA-C. It enforces lots of thought and care for things like signedness, type promotion and type size, where less detailed/less serious documents just skip past it.
There is also CERT C, which isn't as detailed and careful as MISRA, but at least a sound, professional document (and more focused towards desktop/hosted development).
There is one heavy-weight argument against using unsigned integers widely:
Premature optimization is the root of all evil.
We have all been bitten by unsigned integers on at least one occasion. Sometimes like in your loop, sometimes in other contexts. Unsigned integers add a hazard, even though a small one, to your program. And you are introducing this hazard to change the meaning of one bit. One little, tiny, insignificant-but-for-its-sign-meaning bit. On the other hand, the integers we work with in bread and butter applications are often far below the range of integers, more in the order of 10^1 than 10^7. Thus, the different range of unsigned integers is in the vast majority of cases not needed. And when it's needed, it is quite likely that this extra bit won't cut it (when 31 is too little, 32 is rarely enough) and you'll need a wider or an arbitrary-width integer anyway. The pragmatic approach in these cases is to just use the signed integer and spare yourself the occasional underflow bug. Your time as a programmer can be put to much better use.
From the C FAQ:
The first question in the C FAQ is which integer type we should decide to use:
If you might need large values (above 32,767 or below -32,767), use long. Otherwise, if space is very important (i.e. if there are large arrays or many structures), use short. Otherwise, use int. If well-defined overflow characteristics are important and negative values are not, or if you want to steer clear of sign-extension problems when manipulating bits or bytes, use one of the corresponding unsigned types.
Another question concerns types conversions:
If an operation involves both signed and unsigned integers, the situation is a bit more complicated. If the unsigned operand is smaller (perhaps we're operating on unsigned int and long int), such that the larger, signed type could represent all values of the smaller, unsigned type, then the unsigned value is converted to the larger, signed type, and the result has the larger, signed type. Otherwise (that is, if the signed type can not represent all values of the unsigned type), both values are converted to a common unsigned type, and the result has that unsigned type.
You can find it here. So basically, using unsigned integers can complicate the situation, mostly because of the arithmetic conversions: you'll have to either make all your integers unsigned or risk confusing the compiler and yourself. As long as you know what you are doing, this is not really a risk per se; however, it could introduce simple bugs.
And when is it good to use unsigned integers? One situation is when using bitwise operations:
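The conversion rule quoted from the FAQ has a well-known consequence for comparisons, sketched below with hypothetical function names. The first function spells out the conversion that `s < u` performs implicitly when s is an int and u is an unsigned int.

```c
/* What "s < u" does implicitly: s is converted to unsigned int first,
   so -1 becomes UINT_MAX and the comparison comes out false. */
int converted_less(int s, unsigned int u) {
    return (unsigned int)s < u;       /* -1 vs 1u evaluates to 0 */
}

/* A more careful variant: handle the negative case before converting. */
int careful_less(int s, unsigned int u) {
    return (s < 0) || ((unsigned int)s < u);
}
```

This is the kind of simple bug that mixing signed and unsigned operands can introduce.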
The << operator shifts its first operand left by a number of bits
given by its second operand, filling in new 0 bits at the right.
Similarly, the >> operator shifts its first operand right. If the
first operand is unsigned, >> fills in 0 bits from the left, but if
the first operand is signed, >> might fill in 1 bits if the high-order
bit was already 1. (Uncertainty like this is one reason why it's
usually a good idea to use all unsigned operands when working with the
bitwise operators.)
taken from here
And I've seen this somewhere:
If it was best to use unsigned integers for values that are never negative, we would have started by using unsigned int in the main function int main(int argc, char* argv[]). One thing is sure, argc is never negative.
EDIT:
As mentioned in the comments, the signature of main is due to historical reasons and apparently it predates the existence of the unsigned keyword.
Unsigned integers are an artifact from the past. They come from a time when processors could do unsigned arithmetic a little bit faster.
This is a case of premature optimization which is considered evil.
Actually, in 2003, when AMD introduced x86_64 (or AMD64, as it was then called), the 64-bit architecture for x86, they brought the ghosts of the past back: if a signed integer is used as an index and the compiler cannot prove that it is never negative, it has to insert a 32-to-64-bit sign-extension instruction, because the default 32-to-64-bit extension is unsigned (the upper half of a 64-bit register gets cleared if you move a 32-bit value into it).
But I would recommend against using unsigned in any arithmetic at all, be it pointer arithmetic or just simple numbers.
for( unsigned int i = foo.Length() - 1; i >= 0; --i ) {}
Any recent compiler will warn about such a construct, with "condition is always true" or similar. By using a signed variable you avoid such pitfalls altogether. Use ptrdiff_t instead.
A problem might be the C++ library, which often uses an unsigned type for size_t. That is required because of some rare corner cases with very large sizes (between 2^31 and 2^32) on 32-bit systems with certain boot switches (/3GB on Windows).
There are many more; comparisons between signed and unsigned come to mind, where the signed value automagically gets converted to an unsigned one and thus becomes a huge positive number when it was a small negative one before.
One exception for using unsigned exists: For bit fields, flags, masks it is quite common. Usually it doesn't make sense at all to interpret the value of these variables as a magnitude, and the reader may deduce from the type that this variable is to be interpreted in bits.
The result will never be a negative value (as the section number, by the way). So why use a signed integer for this?
Because you might want to compare the return value to a signed value that is actually negative. The comparison should return true in that case, but the C standard specifies that the signed value gets converted to unsigned, and you will get false instead. I don't know about Objective-C though.

Unsigned version of lldiv in C?

I have a function which needs the quotient and remainder for an unsigned 64-bit division. It looks like lldiv and lldiv_t, while long long ints rather than ints, are signed. Is there an unsigned version? If not, what's the best way to handle this?
Speed is important (as usual, billions or trillions of operations), but the compiler might be smart enough to handle this properly -- I'm using gcc 4.3.3.
Just use the division and remainder operators. Any sane compiler will do a much better job optimizing them than a call to div, ldiv, or lldiv.
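A minimal sketch of what that looks like, assuming a helper of your own; the names `u64div_t` and `u64div` are hypothetical and not part of the standard library.

```c
#include <stdint.h>

/* Hypothetical unsigned counterpart of lldiv_t. */
typedef struct {
    uint64_t quot;
    uint64_t rem;
} u64div_t;

/* Caller must ensure den != 0. Compilers commonly compute / and % of
   the same operands with a single division. */
static inline u64div_t u64div(uint64_t num, uint64_t den) {
    u64div_t r;
    r.quot = num / den;
    r.rem  = num % den;
    return r;
}
```

Whether the two operators really collapse into one division depends on the compiler and target, so it is worth checking the generated code if the loop is hot.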

Calculating with a variable outside of its bounds in C

If I make a calculation with a variable where an intermediate part of the calculation goes higher than the bounds of that variable type, is there any hazard that some platforms may not like?
This is an example of what I'm asking:
int a, b;
a=30000;
b=(a*32000)/32767;
I have compiled this, and it does give the correct answer of 29297 (well, within truncating error, anyway). But the part that worries me is that 30,000*32,000 = 960,000,000, which is a 30-bit number, and thus cannot be stored in a 16-bit int. The end result is well within the bounds of an int, but I was expecting that whatever working part of memory would have the same size allocated as the largest source variables did, so an overflow error would occur.
This is just a small example to show my problem, I am trying to avoid using floating points by making the fraction be a fraction of the max amount able to be stored in that variable (in this case, a signed integer, so 32767 on the positive side), because the embedded system I'm using I believe does not have an FPU.
So how do most processors handle calculations out of the bounds of the source and destination variables?
On a 16-bit compiler/CPU, you can (almost) plan on that code giving incorrect results. This is a bit sad, since nearly every CPU (that has a multiply instruction at all) will produce and store the intermediate result, but no C compiler (of which I'm aware) will normally use it (and if you made a and b unsigned, it wouldn't be allowed to use it).
You have a few choices to deal with this. One is to write a small muldiv function in assembly language that does the multiplication (preserving the high word), then the division on that, and finally returns the value to C when it's been reduced back into range.
Another option is to do the math on unsigned integers, which at least allow you to figure out when a problem occurred. Unfortunately, none of the choices is what I'd call particularly appealing though...
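One way to "figure out when a problem occurred" with unsigned math is to test against the type's maximum before multiplying. This is a sketch with a hypothetical helper name, not code from the answer.

```c
#include <limits.h>
#include <stdbool.h>

/* Returns true and stores a*b in *out if the product fits in an
   unsigned int; returns false (leaving *out untouched) if it would wrap. */
bool umul_checked(unsigned int a, unsigned int b, unsigned int *out) {
    if (b != 0 && a > UINT_MAX / b) {
        return false;          /* a * b would exceed UINT_MAX */
    }
    *out = a * b;
    return true;
}
```

The check itself costs a division, which is part of why none of these options is especially appealing on a small processor.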
As far as I know, most if not all processors will hold results for a word * word multiplication in a double word -- meaning, an 8 bit * 8 bit is stored in a 16-bit register(s) on an 8-bit processor, a 32-bit * 32 bit operation is stored in a 64-bit register(s) on a 32-bit machine. (At least, that's how it's been on all the embedded microcontrollers I've used)
If that weren't the case, the processor would be severely crippled in the sense of only allowing half-word * half-word multiplication.
AFAIK this kind of thing is formally "undefined". You have to do the algebra necessary to prevent overflow. That's always your first choice. Numeric stability is no accident, it requires some care in deciding when and how to do division and multiplication.
Or, you have to guarantee that you'll use an intermediate result buffer that's big enough.
Using a large intermediate buffer is what some C compilers do anyway. The language, however, doesn't make any guarantees.
So, to be sure that it works, most folks do something like this.
short a = 30000;
long temp = a;                              // widen before multiplying
long temp2 = (temp * 32000L) / 32767L;      // intermediate product fits in a long
// here you can check for errors; if temp2 > 32767, you have overflow.
short b = (short)temp2;
Signed integer overflow is undefined behavior.
Almost any implementation you could possibly meet will wrap around on integer overflow, because (a) everyone uses 2's complement, in which arithmetic operations are bitwise identical for signed and unsigned types of the same size, and (b) wraparound is the defined behavior of unsigned types in C.
So, on an implementation with a 16 bit int, I would expect the result 0 for your calculation (and that is the result that it must have if you'd used an unsigned 16 bit int). But I'd code against the possibility it might throw a hardware exception, explode, etc.
Note that if you do the calculation with two 16 bit short variables on a machine with a 32 bit int, then you will generally get the "right" answer 29297, because the intermediate value (a*32000) is an int, and only gets truncated back to short at the end. I say "generally" because converting an out-of-bounds integer value to a signed integer type either gives an unspecified result or else raises a signal. But again, any implementation you'll encounter in polite company just takes a modulus.
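The two outcomes described above (0 with a 16-bit wraparound, 29297 with a wider intermediate) can be reproduced on any machine by emulating the 16-bit case with fixed-width unsigned types. The function names are hypothetical.

```c
#include <stdint.h>

/* Emulates a 16-bit unsigned int: the product is reduced modulo 65536
   before the division, as the unsigned wraparound rule requires. */
uint16_t scale_wrapped16(uint16_t a) {
    uint16_t product = (uint16_t)((uint32_t)a * 32000u);
    return (uint16_t)(product / 32767u);
}

/* Same formula with a 32-bit intermediate, wide enough for 960,000,000. */
uint16_t scale_wide32(uint16_t a) {
    uint32_t product = (uint32_t)a * 32000u;
    return (uint16_t)(product / 32767u);
}
```

For a = 30000 the wrapped product is 960,000,000 mod 65,536 = 28,672, which divides down to 0, while the wide version yields 29297.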
Are you sure your compiler has 16 bit integers? On most systems nowadays, ints are 32 bits. Another possible reason you aren't getting an error is that some compilers will recognize that it can compute something like this at compile time and will do so.
If you are really concerned that you will end up with overflow, you can sometimes reorder or factor the formula differently so that no intermediate terms will overflow. In your example that would be hard to do since all of your terms are near the limit of a 16 bit value. Do you need the number to be exactly right, or can you approximate? If you can, you can do something like this:
int a, b;
a=30000;
//b=(a*32000)/32767 ~= a * (32000/32768) = a *(125/128)
b = (a / 128) * 125; // if a=30000, b = 29250 - about 0.16% error
Another option would be to use larger sized types for intermediate terms. If your compiler had 16 bit ints and 32 bit longs, you could do something like this:
int a, b;
a=30000;
b=((long)a*32000L)/32767L;
Really, there's no set answer for how to handle overflow. You need to evaluate each case on its own and decide what the best solution is.
Your compiler and target processor both have to do with the sizes of the various data types.
Compilers will usually promote variables to the largest easy-to-work-with size during calculations and then convert the results to whatever size is needed for an assignment at the end.
There are also C rules that govern promoting to sizes which are more difficult to work with for some calculations. If you are compiling for an AVR, which has 8 bit registers but defines an int to be 16 bits, many calculations end up using more registers than you might think they need, because of this promotion and the fact that constant numbers in your code have to be thought of as being int or unsigned int unless the compiler can prove to itself that this won't affect the outcome of the calculations.
Try rewriting your code with various different sizes of integers (short, int, long, long long) and see how that goes. You may also want to write a simple program that prints out the sizeof( ) of the standard predefined types.
If you need to worry about the sizes of your integer variables and/or the intermediate results of your calculations, then you should include <stdint.h> and use things like uint32_t and int64_t for your declarations and type casting.
