Should I use the stdint.h integer types on 32/64 bit machines? - c

One thing that bugs me about the regular C integer declarations is that their names are strange, "long long" being the worst. I am only building for 32- and 64-bit machines, so I do not necessarily need the portability that the library offers; however, I like that the name for each type is a single word of similar length with no ambiguity in size.
// multiple word types are hard to read
// long integers can be 32 or 64 bits depending on the machine
unsigned long int foo = 64;
long int bar = -64;
// easy to read
// no ambiguity
uint64_t foo = 64;
int64_t bar = -64;
On 32 and 64 bit machines:
1) Can using a smaller integer such as int16_t be slower than something higher such as int32_t?
2) If I needed a for loop to run just 10 times, is it ok to use the smallest integer that can handle it instead of the typical 32 bit integer?
for (int8_t i = 0; i < 10; i++) {
}
3) Whenever I use an integer that I know will never be negative, is it ok to prefer using the unsigned version even if I do not need the extra range it provides?
// instead of the one above
for (uint8_t i = 0; i < 10; i++) {
}
4) Is it safe to use a typedef for the types included from stdint.h?
typedef int32_t signed_32_int;
typedef uint32_t unsigned_32_int;
edit: both answers were equally good and I couldn't really lean towards one so I just picked the answerer with lower rep

1) Can using a smaller integer such as int16_t be slower than something higher such as int32_t?
Yes, it can be slower. Use int_fast16_t instead. Profile the code as needed. Performance is very implementation dependent. A prime benefit of int16_t is its small, well-defined size (it must also be 2's complement) as used in structures and arrays, not so much speed.
The typedef name int_fastN_t designates the fastest signed integer type with a width of at least N. C11 §7.20.1.3 2
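For instance, a minimal sketch to check this on your own machine (the printed sizes are implementation dependent; on typical 32/64-bit desktop targets int_fast16_t is often wider than 16 bits):
#include <stdint.h>
#include <stdio.h>

int main(void) {
    // int16_t is exactly 16 bits; int_fast16_t is at least 16 bits,
    // and may be wider where that is faster.
    printf("int16_t:      %zu bytes\n", sizeof(int16_t));
    printf("int_fast16_t: %zu bytes\n", sizeof(int_fast16_t));
    return 0;
}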
2) If I needed a for loop to run just 10 times, is it ok to use the smallest integer that can handle it instead of the typical 32 bit integer?
Yes, but the savings in code and speed are questionable. Suggest int instead. Emitted code tends to be optimal in speed/size with the native int size.
3) Whenever I use an integer that I know will never be negative, is it OK to prefer using the unsigned version even if I do not need the extra range it provides?
Using some unsigned type is preferred when the math is strictly unsigned (such as array indexing with size_t), yet code needs to watch for careless application like
for (unsigned i = 10 ; i >= 0; i--) // infinite loop
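A common correct idiom for an unsigned countdown, for what it's worth, is to test before decrementing:
// Counts i = 9 down to 0; the loop exits cleanly when i reaches 0,
// because the test happens before the decrement.
for (unsigned i = 10; i-- > 0; ) {
    // use i here
}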
4) Is it safe to use a typedef for the types included from stdint.h?
Almost always. Types like int16_t are optional. Maximum portability uses the required types uint_least16_t and uint_fast16_t, so code can run on rare platforms that use bit widths like 9, 18, etc.

Can using a smaller integer such as int16_t be slower than something higher such as int32_t?
Yes. Some CPUs do not have dedicated 16-bit arithmetic instructions; arithmetic on 16-bit integers must be emulated with an instruction sequence along the lines of:
r1 = r2 + r3
r1 = r1 & 0xffff
The same principle applies to 8-bit types.
Use the "fast" integer types in <stdint.h> to avoid this -- for instance, int_fast16_t will give you an integer that is at least 16 bits wide, but may be wider if 16-bit types are nonoptimal.
If I needed a for loop to run just 10 times, is it ok to use the smallest integer that can handle it instead of the typical 32 bit integer?
Don't bother; just use int. Using a narrower type doesn't actually save any space, and may cause you issues down the line if you decide to increase the number of iterations to over 127 and forget that the loop variable is using a narrow type.
Whenever I use an integer that I know will never be negative is it ok to prefer using the unsigned version even if I do not need the extra range it provides?
Best avoided. Certain C idioms do not work properly on unsigned integers; for instance, you cannot write a loop of the form:
for (i = 100; i >= 0; i--) { … }
if i is an unsigned type, because i >= 0 will always be true!
Is it safe to use a typedef for the types included from stdint.h?
Safe from a technical perspective, but it'll annoy other developers who have to work with your code.
Get used to the <stdint.h> names. They're standardized and reasonably easy to type.

Absolutely possible, yes. On my laptop (Intel Haswell), in a microbenchmark that counts up and down between 0 and 65535 on two registers 2 billion times, this takes
1.313660150s - ax dx (16-bit)
1.312484805s - eax edx (32-bit)
1.312270238s - rax rdx (64-bit)
Minuscule but repeatable differences in timing. (I wrote the benchmark in assembly, because C compilers may optimize it to a different register size.)
It will work, but you'll have to keep it up to date if you change the bounds and the C compiler will probably optimize it to the same assembly code anyway.
As long as it's correct C, that's totally fine. Keep in mind that unsigned overflow is defined and signed overflow is undefined, and compilers do take advantage of that for optimization. For example,
void foo(int start, int count) {
    for (int i = start; i < start + count; i++) {
        // With unsigned arithmetic, this will execute 0 times if
        // "start + count" overflows to a number smaller than "start".
        // With signed arithmetic, that may happen, or the compiler
        // may assume this loop always runs "count" times.
        // For defined behavior, avoid signed overflow.
    }
}
Yes. Also, C99 provides inttypes.h, which extends stdint.h with some useful functions and format macros.
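For example, the format-specifier macros from inttypes.h take the guesswork out of printing fixed-width types (a minimal sketch):
#include <inttypes.h>   // includes stdint.h
#include <stdio.h>

int main(void) {
    int64_t big = INT64_C(9000000000);    // portable 64-bit constant
    uint32_t mask = UINT32_C(0xDEADBEEF);
    // PRId64 / PRIX32 expand to the correct printf length modifiers
    printf("big  = %" PRId64 "\n", big);
    printf("mask = %" PRIX32 "\n", mask);
    return 0;
}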

Related

C: Is using char faster than using int?

Since char is only 1 byte long, is it better to use char while dealing with 8-bit unsigned ints?
Example:
I was trying to create a struct for storing rgb values of a color.
struct color
{
    unsigned int r: 8;
    unsigned int g: 8;
    unsigned int b: 8;
};
Now since the fields are int-based, the struct takes 4 bytes of memory in my case. But if I replace them with unsigned char, they will take 3 bytes of memory as intended (on my platform).
No. Maybe a tiny bit.
First, this is a very platform dependent question.
However the <stdint.h> header was introduced to help with this.
Some hardware platforms are optimised for a particular size of operand and have an overhead to using smaller (in bit-size) operands even though logically the smaller operand requires less computation.
You should find that uint_fast8_t is the fastest unsigned integer type with at least 8 bits (#include <stdint.h> to use it).
That may be the same as unsigned char or unsigned int, depending on whether the answer to your question is 'yes' or 'no' respectively (*).
So the idea would be that if you're speed focused you'd use uint_fast8_t and the compiler will pick the fastest type fitting your purpose.
There are a couple of downsides to this scheme.
One is that if you create vast quantities of data, performance can be impaired (and limits reached) because you're using an 'oversized' type for the purpose.
On a platform where a 4-byte int is faster than a 1-byte char you're using about 4 times as much memory as you need.
If your platform is small or your scale large that can be a significant overhead.
Also you need to be careful that if the underlying type isn't the minimum size you expect then some calculations may be confounded.
Arithmetic 'clocks' (wraps) neatly for unsigned operands, but obviously at a different point if uint_fast8_t isn't in fact 8 bits.
It's platform dependent what the following returns:
#include <stdint.h>
//...
int foo(void) {
    uint_fast8_t x = 255;
    ++x;
    if (x == 0) {
        return 1;
    }
    return 0;
}
The overhead of dealing with potentially outsized types can claw back your gains.
I tend to agree with Knuth that "premature optimisation is the root of all evil" and would say you should only get into this sort of cleverness if you need it.
Write typedef uint8_t color_comp; for now, and get the application working before trying to shave off fractions of a second of performance later!
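For instance, a minimal sketch of that typedef:
#include <stdint.h>

typedef uint8_t color_comp;   // one line to change later if profiling favors uint_fast8_t

struct color {
    color_comp r, g, b;       // 3 bytes of data (plus any padding the platform adds)
};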
I don't know what your application does but it may be that it's not compute intensive in RGB channels and the bottleneck (if any) is elsewhere. Maybe you find some high load calculation where it's worth dealing with uint_fast8_t conversions and issues.
The wisdom of Knuth is that you probably don't know where that is until later anyway.
(*) It could be unsigned long or indeed some other type. The only constraint is that it is an unsigned type and at least 8 bits.

Efficient tiny boolean matrix multiplication

I have an unsigned 16-bit integer s which I'd like to map to an unsigned 32-bit integer r in such a way that each flipped bit in s flips at most one (given) bit in r; simply a mapping between 0..16 and 0..32, that is. So we can see this as a matrix equation
Ps = r
where P is a 32 x 16 boolean matrix, s is a 16 x 1 boolean vector and r is 32 x 1 boolean vector. I have a gut feeling there exists some super simple hack that I'm missing. Important note: the target machine is a 16 bit mcu!
Here's the best I can do:
static u16 P[32] = { /* someArrayOrWhatever */ };

u32 FsiPermutationHack(u16 s) {
    u32 r = 0;   /* must start at 0, since we only OR bits in */
    for (u16 i = 0; i < 32; i++)
    {
        r |= ((u32)((P[i] & s) > 0) << i);
    }
    return r;
}
The rationale is this: the i:th bit of r is 1 if and only if (P[i] & s) != 0x0000. I am too stupid to disassemble stuff, but I am guessing this would be like ~100 instructions IF we didn't have to do that stupid u32 cast. But then again, perhaps the compiler auto-splits the loop in two for us in which case it's looking pretty good for us.
Apologies for the tangent, just thought I'd share my attempted solution -- do you have a better one?
Inasmuch as you say,
"I am guessing this would be like ~100 instructions IF we didn't have to do that stupid u32 cast. But then again, perhaps the compiler auto-splits the loop in two for us in which case it's looking pretty good for us."
and
"I have a gut feeling there exists some super simple hack that I'm missing,"
I will interpret you to be asking how to minimize the use of 32-bit arithmetic in this code intended for a 16-bit processor.
You really ought to learn how to disassemble and check the compiled result to see whether the compiler does automatically split the loop as you hypothesize, but supposing that it does not, I don't see why you couldn't do the same manually:
static u16 P[32]; /* value assigned elsewhere */

u32 FsiPermutationHack(u16 s) {
    u16 *P_hi = P + 16;
    u16 r_lo = 0;
    u16 r_hi = 0;

    for (u16 i = 0; i < 16; i++) {
        r_lo |= ((P[i] & s) != 0) << i;
        r_hi |= ((P_hi[i] & s) != 0) << i;
    }
    return ((u32) r_hi << 16) + r_lo;
}
That supposes u16 and u32 to be unsigned 16-bit and 32-bit (respectively) integers with no padding bits.
Note also that the idea that performing arithmetic with type u16 instead of u32 should be an improvement assumes that type u32 has a higher integer promotion rank than unsigned int. Roughly speaking, that comes down to the implementation's unsigned int being a 16-bit type. That's entirely plausible for an implementation for a 16-bit processor. On a system whose int and unsigned int are instead 32-bit types, however, all narrower integer arithmetic arguments would be promoted to 32 bits anyway.
Update:
As far as the possibility of a better alternative algorithm, I observe that each bit of the result is computed from a different element of array P, that the whole value of each element is used, and that the element size is the same as the target machine's native word size. There seems then no scope for performing fewer 16-bit bitwise AND operations than there are array elements (but see below).
If we accept that each array element must be processed separately, then the provided implementation does a pretty good job of approaching it efficiently:
It performs only 16-bit computations until the time comes to assemble the final result;
It computes both the upper and lower halves of the result in the same loop, thus incurring only 16 iterations' worth of loop overhead instead of 32
It largely removes the extra indexing arithmetic that would otherwise have been required, by creating P_hi for accessing the upper half of the array
It would be possible to manually unroll the loop to possibly save a few more cycles, but that's the kind of optimization that you absolutely should rely on your compiler to perform for you.
As far as "bit twiddling hacks", the only scope I see for anything of that nature would be processing adjacent pairs of 16-bit array elements as 32-bit unsigned integers. That would allow performing one 32-bit bitwise AND in place of each two 16-bit ANDs. That would be coupled with two 32-bit comparisons (vs. two 16-bit comparisons in the above code). The 16-bit shift and bitwise OR operations of the above approach could be retained. Aside from that having formally undefined behavior as a result of violating the strict aliasing rule, that would involve 32-bit arithmetic, which presumably is about half as fast as 16-bit arithmetic on your 16-bit machine. Performance is better measured than predicted, but I don't see any reason to expect a significant win from that approach.

Defending "U" suffix after Hex literals

There is some debate between my colleague and me about the U suffix after hexadecimally represented literals. Note, this is not a question about the meaning of this suffix or about what it does. I have found several of those topics here, but I have not found an answer to my question.
Some background information:
We're trying to come to a set of rules that we both agree on, to use that as our style from that point on. We have a copy of the 2004 Misra C rules and decided to use that as a starting point. We're not interested in being fully Misra C compliant; we're cherry picking the rules that we think will most increase efficiency and robustness.
Rule 10.6 from the aforementioned guidelines states:
A “U” suffix shall be applied to all constants of unsigned type.
I personally think this is a good rule. It takes little effort, looks better than explicit casts and more explicitly shows the intention of a constant. To me it makes sense to use it for all unsigned constants, not just numerics, since enforcing a rule doesn't happen by allowing exceptions, especially for a commonly used representation of constants.
My colleague, however, feels that the hexadecimal representation doesn't need the suffix. Mostly because we almost exclusively use it to set micro-controller registers, and signedness doesn't matter when setting registers to hex constants.
My Question
My question is not one about who is right or wrong. It is about finding out whether there are cases where the absence or presence of the suffix changes the outcome of an operation. Are there any such cases, or is it a matter of consistency?
Edit: for clarification; Specifically about setting micro-controller registers by assigning hexadecimal values to them. Would there be a case where the suffix could make a difference there? I feel like it wouldn't. As an example, the Freescale Processor Expert generates all register assignments as unsigned.
Appending a U suffix to all hexadecimal constants makes them unsigned as you already mentioned. This may have undesirable side-effects when these constants are used in operations along with signed values, especially comparisons.
Here is a pathological example:
#define MY_INT_MAX 0x7FFFFFFFU // blindly applying the rule
if (-1 < MY_INT_MAX) {
    printf("OK\n");
} else {
    printf("OOPS!\n");
}
The C rules for signed/unsigned conversions are precisely specified, but somewhat counter-intuitive so the above code will indeed print OOPS.
The MISRA-C rule is precise as it states A “U” suffix shall be applied to all constants of unsigned type. The word unsigned has far reaching consequences and indeed most constants should not really be considered unsigned.
Furthermore, the C Standard makes a subtle difference between decimal and hexadecimal constants:
A hexadecimal constant is considered unsigned if its value can be represented by the unsigned integer type and not the signed integer type of the same size for types int and larger.
This means that on 32-bit 2's complement systems, 2147483648 is a long or a long long whereas 0x80000000 is an unsigned int. Appending a U suffix may make this more explicit in this case but the real precaution to avoid potential problems is to mandate the compiler to reject signed/unsigned comparisons altogether: gcc -Wall -Wextra -Werror or clang -Weverything -Werror are life savers.
Here is how bad it can get:
if (-1 < 0x8000) {
    printf("OK\n");
} else {
    printf("OOPS!\n");
}
The above code should print OK on 32-bit systems and OOPS on 16-bit systems. To make things even worse, it is still quite common to see embedded projects use obsolete compilers which do not even implement the Standard semantics for this issue.
For your specific question: the defined values for micro-controller registers, used specifically to set them via assignment (assuming these registers are memory-mapped), need not have the U suffix at all. The register lvalue should have an unsigned type, and the hex value will be signed or unsigned depending on its value, but the operation will proceed the same. The opcode for setting a signed number or an unsigned number is the same on your target architecture, and on any architecture I have ever seen.
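To illustrate with a hypothetical 32-bit memory-mapped register (the name and address here are made up; substitute your device header's definition):
#include <stdint.h>

/* Hypothetical peripheral register at a made-up address */
#define TIMER_CTRL (*(volatile uint32_t *)0x40001000u)

void timer_setup(void)
{
    TIMER_CTRL = 0x80000001;   /* already unsigned int: the value doesn't fit in int */
    TIMER_CTRL = 0x00000001u;  /* fits in int; here the U only documents the intent  */
}
Both assignments store exactly the bit pattern written, with or without the suffix.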
With all integer-constants
Appending u/U ensures the integer-constant will be some unsigned type.
Without a u/U
For a decimal-constant, the integer-constant will be some signed type.
For a hexadecimal/octal-constant, the integer-constant will be some signed or unsigned type, depending on its value and the integer type ranges.
Note: All integer-constants have positive values.
//      +--------- unary operator
//      |+-+------ integer-constant
int x = -123;
absence or presence of the suffix changes the outcome of an operation?
When is this important?
With various expressions, the sign-ness and width of the math need to be controlled, and preferably not surprising.
// Examples: assume 32-bit `unsigned`, `long`, 64-bit `long long`
// Bad: signed int overflow (UB)
unsigned a = 4000 * 1000 * 1000;
// OK
unsigned b = 4000u * 1000 * 1000;

// Bad: undefined behavior (1 << 31 overflows a signed int)
unsigned c = 1 << 31;
// OK
unsigned d = 1u << 31;
printf("Size %zu\n", sizeof(0xFFFFFFFF)); // 8 type is `long long`
printf("Size %zu\n", sizeof(0xFFFFFFFFu)); // 4 type is `unsigned`
// 2 ** 63
long long e = -9223372036854775808; // C99: bad "9223372036854775808" not representable
long long f = -9223372036854775807 - 1; // ok
long long g = -9223372036854775808u; // implementation defined behavior **
some_unsigned_type h_max = -1;  // OK, max value for the target type.
some_unsigned_type i_max = -1u; // OK, but not the max value for wide unsigned types
// when negating a negative `int`
unsigned j = 0 - INT_MIN; // typically int overflow or UB
unsigned k = 0u - INT_MIN; // Never UB
** or an implementation-defined signal is raised.
For the specific question, which was loading register(s): the U makes it an unsigned value, but whether the compiler treats the n-bit word pattern as signed or unsigned, it will move the same bit pattern, assuming there isn't any size extension that would propagate an MSB.
The difference that might matter is whether the register load operation sets any processor condition flags based on a signed or unsigned load. As an overall guide, if the processor supports storing a constant to a configuration register or a memory address, then loading a peripheral register is unlikely to set the processor's NEG condition flag. Loading a general-purpose register connected to an ALU, one that can be the target of an arithmetic operation like add, increment, or decrement, might set a negative flag on loading, so that e.g. a trailing "branch (if) negative" opcode would execute the branch. You would want to check the processor's references to be sure. Small instruction sets tend to have only a plain load-register instruction, while larger instruction sets are more likely to have a "load unsigned" variant that doesn't set the NEG bit in the processor's flags; but again, check the processor's references, and its errata (the boo-boo list) if you need a specific flag state.
All of this only tends to come up when an optimizing compiler rearranges code around an inline assembly instruction, and in other uncommon situations. Examine the generated assembly code, turn off some or all compiler optimizations for the module when needed, etc.

On embedded platforms, is it more efficient to use unsigned int instead of (implicitly signed) int?

I've got into this habit of always using unsigned integers where possible in my code, because the processor can do divides by powers of two on unsigned types, which it can't with signed types. Speed is critical for this project. The processor operates at up to 40 MIPS.
My processor has an 18 cycle divide, but it takes longer than the single cycle barrel shifter. So is it worth using unsigned integers here to speed things up or do they bring other disadvantages? I'm using a dsPIC33FJ128GP802 - a member of the dsPIC33F series by Microchip. It has single cycle multiply for both signed and unsigned ints. It also has sign and zero extend instructions.
For example, it produces this code when mixing signed and unsigned integers.
026E4 97E80F mov.b [w15-24],w0
026E6 FB0000 se w0,w0
026E8 97E11F mov.b [w15-31],w2
026EA FB8102 ze w2,w2
026EC B98002 mul.ss w0,w2,w0
026EE 400600 add.w w0,w0,w12
026F0 FB8003 ze w3,w0
026F2 100770 subr.w w0,#16,w14
I'm using C (GCC for dsPIC.)
I think we all need to know a lot more about the peculiarities of your processor to answer this question. Why can't it do divides by powers of two on signed integers? As far as I remember the operation is the same for both. I.e.
10/2 = 00001010 goes to 00000101
-10/2 = 11110110 goes to 11111011
Maybe you should write some simple code doing an unsigned divide and a signed divide and compare the compiled output.
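One subtlety to keep in mind when comparing that output: C division truncates toward zero, while an arithmetic right shift rounds toward negative infinity, so a compiler cannot replace a signed division by a power of two with a bare shift; it has to emit a small correction for negative operands. A quick demonstration:
#include <stdio.h>

int main(void) {
    int x = -11;
    printf("-11 / 2  = %d\n", x / 2);   /* -5: division truncates toward zero */
    printf("-11 >> 1 = %d\n", x >> 1);  /* typically -6: right-shifting a negative
                                           value is implementation-defined, and is
                                           usually an arithmetic shift */
    return 0;
}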
Also, benchmarking is a good idea. It doesn't need to be precise. Just have an array of a few thousand numbers, start a timer, and divide them a few million times, timing how long it takes. Maybe do a few billion times if your processor is fast. E.g.
int s_numbers[] = { /* etc. etc. */ };
int s_array_size = sizeof(s_numbers) / sizeof(s_numbers[0]);

unsigned int u_numbers[] = { /* etc. etc. */ };
unsigned int u_array_size = sizeof(u_numbers) / sizeof(u_numbers[0]);

int i;
int s_result;
unsigned int u_result;

/* Start timer. */
for(i = 0; i < 100000000; i++)
{
    s_result = s_numbers[i % s_array_size] / s_numbers[(i + 1) % s_array_size];
}
/* Stop timer and print difference. */

/* Repeat for unsigned integers. */
Written in a hurry to show the principle, please forgive any errors.
It won't give precise benchmarking but should give a general idea of which is faster.
I don't know much about the instruction set available on your processor, but a quick look makes me think it has instructions that can be used for both arithmetic and logical shifts, which should mean that shifting a signed value costs about the same as shifting an unsigned value, and that dividing by powers of 2 using shifts should also cost the same for each. (My knowledge of this comes from a quick glance at some intrinsic functions for a C compiler that targets your processor family.)
That being said, if you are working with values which are to be interpreted as unsigned then you might as well declare them as unsigned. For the last few years I've been using the types from stdint.h more and more, and usually I end up using the unsigned versions because my values are either inherently unsigned or I'm just using them as bit arrays.
Generate assembly both ways and count cycles.
I'm going to guess that unsigned divides by powers of two are faster, because the compiler can simply emit a right shift as needed without worrying about sign extension.
As for disadvantages: detecting arithmetic overflows, overflowing a signed type because you didn't realize it while using unsigned, etc. Nothing blocking, just different things to watch out for.

Should you always use 'int' for numbers in C, even if they are non-negative?

I always use unsigned int for values that should never be negative. But today I
noticed this situation in my code:
void CreateRequestHeader( unsigned bitsAvailable, unsigned mandatoryDataSize,
                          unsigned optionalDataSize )
{
    if ( bitsAvailable - mandatoryDataSize >= optionalDataSize ) {
        // Optional data fits, so add it to the header.
    }
    // BUG! The above includes the optional part even if
    // mandatoryDataSize > bitsAvailable.
}
Should I start using int instead of unsigned int for numbers, even if they
can't be negative?
One thing that hasn't been mentioned is that interchanging signed/unsigned numbers can lead to security bugs. This is a big issue, since many of the functions in the standard C-library take/return unsigned numbers (fread, memcpy, malloc etc. all take size_t parameters)
For instance, take the following innocuous example (from real code):
// Copy a user-defined structure into a buffer and process it
char* processNext(char* data, short length)
{
    char buffer[512];
    if (length <= 512) {
        memcpy(buffer, data, length);
        process(buffer);
        return data + length;
    } else {
        return NULL;
    }
}
Looks harmless, right? The problem is that length is signed, but is converted to unsigned when passed to memcpy. Thus setting length to SHRT_MIN will pass the <= 512 test, but cause memcpy to copy more than 512 bytes to the buffer; this allows an attacker to overwrite the function return address on the stack and (after a bit of work) take over your computer!
You may naively be saying, "It's so obvious that length needs to be size_t or checked to be >= 0, I could never make that mistake". Except, I guarantee that if you've ever written anything non-trivial, you have. So have the authors of Windows, Linux, BSD, Solaris, Firefox, OpenSSL, Safari, MS Paint, Internet Explorer, Google Picasa, Opera, Flash, Open Office, Subversion, Apache, Python, PHP, Pidgin, Gimp, ... on and on and on ... - and these are all bright people whose job is knowing security.
In short, always use size_t for sizes.
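A sketch of the same function with the size made unsigned (process() is assumed to be defined elsewhere):
#include <stddef.h>
#include <string.h>

void process(char *buffer);   /* assumed defined elsewhere */

char *processNext(char *data, size_t length)
{
    char buffer[512];
    if (length <= sizeof buffer) {
        memcpy(buffer, data, length);
        process(buffer);
        return data + length;
    }
    return NULL;   /* a char* function cannot meaningfully return -1 */
}
Now a caller passing a negative value gets it converted to a huge size_t, which fails the bounds check instead of sneaking past it.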
Man, programming is hard.
Should I always ...
The answer to "Should I always ..." is almost certainly 'no', there are a lot of factors that dictate whether you should use a datatype- consistency is important.
But, this is a highly subjective question, it's really easy to mess up unsigneds:
for (unsigned int i = 10; i >= 0; i--);
results in an infinite loop.
This is why some style guides including Google's C++ Style Guide discourage unsigned data types.
In my personal opinion, I haven't run into many bugs caused by these problems with unsigned data types; I'd say use assertions to check your code, and use unsigneds judiciously (and less when you're performing arithmetic).
Some cases where you should use unsigned integer types are:
You need to treat a datum as a pure binary representation.
You need the semantics of modulo arithmetic you get with unsigned numbers.
You have to interface with code that uses unsigned types (e.g. standard library routines that accept/return size_t values).
But for general arithmetic, the thing is, when you say that something "can't be negative," that does not necessarily mean you should use an unsigned type. Because you can put a negative value in an unsigned, it's just that it will become a really large value when you go to get it out. So, if you mean that negative values are forbidden, such as for a basic square root function, then you are stating a precondition of the function, and you should assert. And you can't assert that what cannot be, is; you need a way to hold out-of-band values so you can test for them (this is the same sort of logic behind getchar() returning an int and not a char).
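For example, with a signed parameter the precondition can actually be checked (a toy sketch; my_sqrt is a made-up name):
#include <assert.h>
#include <math.h>

double my_sqrt(int x)
{
    assert(x >= 0);   /* catches a caller passing -5 by mistake */
    return sqrt((double)x);
}

/* With an unsigned parameter, my_sqrt(-5) would silently receive
   4294967291 (with 32-bit unsigned) and the check could never fire. */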
Additionally, the choice of signed-vs.-unsigned can have practical repercussions on performance, as well. Take a look at the (contrived) code below:
#include <stdbool.h>

bool foo_i(int a)
{
    return (a + 69) > a;
}

bool foo_u(unsigned int a)
{
    return (a + 69u) > a;
}
Both foo's are the same except for the type of their parameter. But, when compiled with c99 -fomit-frame-pointer -O2 -S, you get:
.file "try.c"
.text
.p2align 4,,15
.globl foo_i
.type foo_i, #function
foo_i:
movl $1, %eax
ret
.size foo_i, .-foo_i
.p2align 4,,15
.globl foo_u
.type foo_u, #function
foo_u:
movl 4(%esp), %eax
leal 69(%eax), %edx
cmpl %eax, %edx
seta %al
ret
.size foo_u, .-foo_u
.ident "GCC: (Debian 4.4.4-7) 4.4.4"
.section .note.GNU-stack,"",#progbits
You can see that foo_i() is more efficient than foo_u(). This is because unsigned arithmetic overflow is defined by the standard to "wrap around," so (a + 69u) may very well be smaller than a if a is very large, and thus there must be code for this case. On the other hand, signed arithmetic overflow is undefined, so GCC will go ahead and assume signed arithmetic doesn't overflow, and so (a + 69) can't ever be less than a. Choosing unsigned types indiscriminately can therefore unnecessarily impact performance.
The answer is yes. The "unsigned" int type of C and C++ is not an "always positive integer," no matter what the name of the type looks like. The behavior of C/C++ unsigned ints makes no sense if you try to read the type as "non-negative"... for example:
The difference of two unsigned is an unsigned number (makes no sense if you read it as "The difference between two non-negative numbers is non-negative")
The addition of an int and an unsigned int is unsigned
There is an implicit conversion from int to unsigned int (if you read unsigned as "non-negative" it's the opposite conversion that would make sense)
If you declare a function accepting an unsigned parameter, then when someone passes a negative int you simply get that implicitly converted to a huge positive value; in other words, using an unsigned parameter type doesn't help you find errors either at compile time or at runtime.
Indeed unsigned numbers are very useful for certain cases because they are elements of the ring "integers-modulo-N" with N being a power of two. Unsigned ints are useful when you want to use that modulo-n arithmetic, or as bitmasks; they are NOT useful as quantities.
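A classic example of putting that modulo-2^N behavior to deliberate use is wraparound-safe sequence-number comparison, in the style of RFC 1982 serial number arithmetic (a sketch):
#include <stdbool.h>
#include <stdint.h>

/* true if sequence number a comes "after" b, even across the 2^32 wrap */
static bool seq_after(uint32_t a, uint32_t b)
{
    /* the subtraction wraps modulo 2^32; a "small" difference means after */
    return a != b && (uint32_t)(a - b) < 0x80000000u;
}
So seq_after(2, 0xFFFFFFFFu) is true: 2 really is three steps after 0xFFFFFFFF once the counter wraps.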
Unfortunately in C and C++ unsigned was also used to represent non-negative quantities, to be able to use all 16 bits when integers were that small... at that time being able to use 32k or 64k was considered a big difference. I'd classify it basically as a historical accident... you shouldn't try to read a logic into it because there was no logic.
By the way, in my opinion that was a mistake... if 32k is not enough then quite soon 64k won't be enough either; abusing the modulo integer just for one extra bit was, in my opinion, a cost too high to pay. Of course it would have been reasonable to do if a proper non-negative type had been present or defined... but the unsigned semantic is just wrong for use as non-negative.
Sometimes you may find people who say that unsigned is good because it "documents" that you only want non-negative values... however that documentation is of value only to people who don't actually know how unsigned works in C or C++. For me, seeing an unsigned type used for non-negative values simply means that whoever wrote the code didn't understand the language on that point.
If you really understand and want the "wrapping" behavior of unsigned ints, then they're the right choice (for example, I almost always use unsigned char when I'm handling bytes); if you're not going to use the wrapping behavior (and that behavior is just going to be a problem for you, as in the case of the difference you showed), then this is a clear indicator that the unsigned type is a poor choice and you should stick with plain ints.
Does this mean that the return type of C++ std::vector<>::size() is a bad choice? Yes... it's a mistake. But if you say so, be prepared to be called bad names by those who don't understand that the "unsigned" name is just a name... what counts is the behavior, and that is a "modulo-n" behavior (and no one would consider a "modulo-n" type a sensible choice for the size of a container).
Bjarne Stroustrup, creator of C++, warns about using unsigned types in his book The C++ Programming Language:
The unsigned integer types are ideal for uses that treat storage as a bit array. Using an unsigned instead of an int to gain one more bit to represent positive integers is almost never a good idea. Attempts to ensure that some values are positive by declaring variables unsigned will typically be defeated by the implicit conversion rules.
I seem to be in disagreement with most people here, but I find unsigned types quite useful, but not in their raw historic form.
If you consistently stick to the semantics a type represents for you, then there should be no problem: use size_t (unsigned) for array indices, data offsets, etc., and off_t (signed) for file offsets. Use ptrdiff_t (signed) for differences of pointers. Use uint8_t for small unsigned integers and int8_t for signed ones. And you avoid at least 80% of portability problems.
And don't use int, long, unsigned, or char if you mustn't. They belong in the history books. (Sometimes you must: error returns, bit fields, etc.)
And to come back to your example:
bitsAvailable - mandatoryDataSize >= optionalDataSize
can be easily rewritten as
bitsAvailable >= optionalDataSize + mandatoryDataSize
which doesn't avoid the problem of a potential overflow (assert is your friend) but gets you a bit nearer to the idea of what you want to test, I think.
if (bitsAvailable >= optionalDataSize + mandatoryDataSize) {
    // Optional data fits, so add it to the header.
}
Bug-free, so long as mandatoryDataSize + optionalDataSize can't overflow the unsigned integer type -- the naming of these variables leads me to believe this is likely to be the case.
You can't fully avoid unsigned types in portable code, because many typedefs in the standard library are unsigned (most notably size_t), and many functions return those (e.g. std::vector<>::size()).
That said, I generally prefer to stick to signed types wherever possible for the reasons you've outlined. It's not just the case you bring up - in case of mixed signed/unsigned arithmetic, the signed argument is quietly promoted to unsigned.
From the comments on one of Eric Lippert's blog posts (see here):
Jeffrey L. Whitledge
I once developed a system in which negative values made no sense as a parameter, so rather than validating that the parameter values were non-negative, I thought it would be a great idea to just use uint instead. I quickly discovered that whenever I used those values for anything (like calling BCL methods), they had to be converted to signed integers. This meant that I had to validate that the values didn't exceed the signed integer range on the top end, so I gained nothing. Also, every time the code was called, the ints that were being used (often received from BCL functions) had to be converted to uints. It didn't take long before I changed all those uints back to ints and took all that unnecessary casting out. I still have to validate that the numbers are not negative, but the code is much cleaner!
Eric Lippert
Couldn't have said it better myself. You almost never need the range of a uint, and they are not CLS-compliant. The standard way to represent a small integer is with "int", even if there are values in there that are out of range. A good rule of thumb: only use "uint" for situations where you are interoperating with unmanaged code that expects uints, or where the integer in question is clearly used as a set of bits, not a number. Always try to avoid it in public interfaces.
Eric
The situation where (bitsAvailable - mandatoryDataSize) produces an 'unexpected' result when the types are unsigned and bitsAvailable < mandatoryDataSize is a reason that sometimes signed types are used even when the data is expected to never be negative.
I think there's no hard and fast rule. I typically 'default' to using unsigned types for data that has no reason to be negative, but then you have to take care to ensure that arithmetic wrapping doesn't expose bugs.
Then again, if you use signed types, you still have to sometimes consider overflow:
INT_MAX + 1
The key is that you have to take care when performing arithmetic for these kinds of bugs.
No, you should use the type that is right for your application. There is no golden rule. Sometimes on small microcontrollers it is, for example, faster and more memory efficient to use 8- or 16-bit variables wherever possible, as that is often the native datapath size, but that is a very special case. I also recommend using stdint.h wherever possible. If you are using Visual Studio, you can find BSD-licensed versions.
If there is a possibility of overflow, then assign the values to the next highest data type during the calculation, ie:
void CreateRequestHeader( unsigned int bitsAvailable, unsigned int mandatoryDataSize,
                          unsigned int optionalDataSize )
{
    signed __int64 available = bitsAvailable;
    signed __int64 mandatory = mandatoryDataSize;
    signed __int64 optional = optionalDataSize;

    if ( (mandatory + optional) <= available ) {
        // Optional data fits, so add it to the header.
    }
}
Otherwise, just check the values individually instead of calculating:
void CreateRequestHeader( unsigned int bitsAvailable, unsigned int mandatoryDataSize,
                          unsigned int optionalDataSize )
{
    if ( bitsAvailable < mandatoryDataSize ) {
        return;
    }
    bitsAvailable -= mandatoryDataSize;

    if ( bitsAvailable < optionalDataSize ) {
        return;
    }
    bitsAvailable -= optionalDataSize;

    // Optional data fits, so add it to the header.
}
You'll need to look at the results of the operations you perform on the variables to check if you can get over/underflows - in your case, the result being potentially negative. In that case you are better off using the signed equivalents.
I don't know if it's possible in C, but in this case I would just cast the X - Y expression to an int.
If your numbers should never be less than zero, but have a chance to be < 0, by all means use signed integers and sprinkle assertions or other runtime checks around. If you're actually working with 32-bit (or 64, or 16, depending on your target architecture) values where the most significant bit means something other than "-", you should only use unsigned variables to hold them. It's easier to detect integer overflows where a number that should always be positive is very negative than when it's zero, so if you don't need that bit, go with the signed ones.
Suppose you need to count from 1 to 50000. You can do that with a two-byte unsigned integer, but not with a two-byte signed integer (if space matters that much).
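A minimal sketch:
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t sum = 0;
    // 50000 fits in uint16_t (max 65535) but not in int16_t (max 32767)
    for (uint16_t i = 1; i <= 50000u; i++) {
        sum += i;
    }
    printf("sum = %u\n", (unsigned)sum);   // 1250025000
    return 0;
}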
