Why cast like this? "(uint16_t) -1;"

Why cast like this? "(uint16_t) -1;" - c

I saw a line of code like this:
xxxxx = (uint16_t) -1;
Why cast -1 into a unsigned int? What is this code to get?
Thanks!

Obviously the answer is within reach of your fingertips. Just plug that code in and step through and you will see the value that you get when you cast -1 to a 16 bit unsigned integer.
The value turns out to be two's complement of -1, which is: 0xFFFF hex, or 65535 decimal.
As to the actual reason for using the code like that, it's simply a short-cut. Or maybe it's just to satisfy a type compatibility requirement.
If you're wondering how come -1 gets cast to 0xFFFF (65535) and not maybe 0 or 1, as one might expect, you have to understand that the reason for that is that the C language, although statically typed, is quite liberal when it comes to enforcing type restrictions. That means that it will happily cast - or interpret if you will - any memory location as whatever arbitrary type of data you tell it. This of course can have quite devastating consequences if used improperly but the trade-off is flexibility and a speed improvement due to the lack of strict sanity checks. This was very important a few decades ago when C was designed and it still is if you're writing code for very tiny processors.
That being said, if you think about a cast as simply saying: "disregard what you think you know about the data at this memory location, just tell me what it would mean if you read it as a <insert_your_type_here>" and if you know that computers usually represent negative numbers as two's complement (see above) then the answer should by now by pretty obvious: C is taking the value in memory and reading it back as an unsigned integer.
As an ending note. I should point out that C is not the only language that will cast -1 to 0xFFFF but even more modern languages that are capable of stronger type checks will do the same, probably for compatibility and continuity reasons, as well as for the reason that it makes it possible to reverse the cast: 0xFFFF back to a signed 16 bit integer is -1.

It will return the highest possible unsigned 16bit integer.

It returns 65535 which is the maximum value of a two byte (16 bit) unsigned integer.

Related

Can I use "MAX" macros to check whether all bits are set?

I recently had an interview where I had to propose a function that checks whether all bits from a uint32_t are set or not.
I wrote the following code :
int checkStatus(uint32_t val) {
return val == UINT32_MAX;
}
I assumed it would return 0 if one bit isn't set, and 1 if both of the values are the same.
As I didn't see this solution anywhere else, I assume it's wrong. But I don't understand why. When I did it, I was thinking : "if all bits are set to 1, then the value should be the maximum unsigned integer represented otherwise not." Could you tell me if it was incorrect and why?

The problem with using UINT32_MAX here is that "maximum" is an arithmetic concept, not a bitwise logical concept. Although UINT32_MAX does, in fact, represent the (unsigned) number 0xFFFFFFFF in most scenarios, this is just a consequence of the way numbers are represented in twos-complement arithmetic.
Using UINT32_MAX here has the potential to mislead the reader, because it has this connotation of arithmetic, rather than of bitwise manipulation.
In C programming, we're all using to representing bit patterns as hex numbers. The number "0xFFFFFFFF" immediately expresses the notion that all bits are set, in a way that UNIT32_MAX does not. Similarly, it may not be appropriate to use the value "0xFFFFFFFF" to represent "minus 1" in an arithmetic expression, even if the values are, in practice, the same.
Because I make most of my living looking for errors in other people's code, I'm acutely aware of the differences in expressive powers of different ways of expressing the same thing. Sometimes we have to write inexpressive things for reasons of inefficiency; but I don't think we should do it just for the sake of it.

Why not use complement of VAL instead of (-VAL -1)

I'm reading redis' source code from https://github.com/antirez/redis.
I saw such macros in src/ziplist.c
#define INT24_MAX 0x7fffff
#define INT24_MIN (-INT24_MAX - 1)
why not just do this?
#define INT24_MIN (~INT24_MAX)

A better question might be why do you think (~INT24_MAX) is better than (-INT24_MAX - 1)?
On a two's complement machine you get the same result from either expression, and both of them evaluate just as fast as the other (for a 32-bit target, the compiler will reduce both of them them to 0xff800000 at compile time). However, in my opinion the expression (-INT24_MAX - 1) models the numeric concept that the minimum value is one less than the negation of the maximum value better.
That might not be of any huge importance, but the expression (~INT24_MAX) isn't better in an objective way, and I'd argue that subjectively it might not be as good.
Basically, (-INT24_MAX - 1) might have been what the coder happened to think of (maybe since as I mentioned, it models what's numerically intended), and there's no reason to use something else.

Suppose, int is 32-bit and can hold 0x7fffff, then ~0x7fffff is going to be ~0x007fffff or, after all bits have been inverted, 0xff800000.
This bit pattern represents the negative value -0x7fffff-1 if negative integers use the 2's complement representation.
If they use the 1's complement representation, then this pattern represents the negative value -0x7fffff.
If they use the sign-magnitude representation, then this pattern represents the negtative value -0x7f800000.
As you can see, the value of ~0x7fffff is going to depend on the representation of negative integers and the size of the type that can hold the value 0x7fffff.
If you're trying to write portable C code, you should avoid such situations.

About the use of signed integers in C family of languages

a signed return value might yield more information (think error-numbers, 0 is sometimes a valid answer, -1 indicates error, see man read) ... which might be relevant especially for developers of libraries.
if you are worrying about the one extra bit you gain when using unsigned instead of signed then you are probably using the wrong type anyway. (also kind of "premature optimization" argument)
languages like python, ruby, jscript etc are doing just fine without signed vs unsigned. that might be an indicator ...

When using integer values in my own code, I always try to consider the signedness, asking myself if the integer should be signed or unsigned.
When I'm sure the value will never need to be negative, I then use an unsigned integer.
And I have to say this happen most of the time.
To carefully consider which type that is most suitable each time you declare a variable is very good practice! This means you are careful and professional. You should not only consider signedness, but also the potential max value that you expect this type to have.
The reason why you shouldn't use signed types when they aren't needed have nothing to do with performance, but with type safety. There are lots of potential, subtle bugs that can be caused by signed types:
The various forms of implicit promotions that exist in C can cause your type to change signedness in unexpected and possibly dangerous ways. The integer promotion rule that is part of the usual arithmetic conversions, the lvalue conversion upon assignment, the default argument promotions used by for example VA lists, and so on.
When using any form of bitwise operators or similar hardware-related programming, signed types are dangerous and can easily cause various forms of undefined behavior.
By declaring your integers unsigned, you automatically skip past a whole lot of the above dangers. Similarly, by declaring them as large as unsigned int or larger, you get rid of lots of dangers caused by the integer promotions.
Both size and signedness are important when it comes to writing rugged, portable and safe code. This is the reason why you should always use the types from stdint.h and not the native, so-called "primitive data types" of C.
So I asked myself: «is there a good reason for this, or do people just use signed integers because the don't care»?
I don't really think it is because they don't care, nor because they are lazy, even though declaring everything int is sometimes referred to as "sloppy typing" - which means sloppily picked type more than it means too lazy to type.
I rather believe it is because they lack deeper knowledge of the various things I mentioned above. There's a frightening amount of seasoned C programmers who don't know how implicit type promotions work in C, nor how signed types can cause poorly-defined behavior when used together with certain operators.
This is actually a very frequent source of subtle bugs. Many programmers find themselves staring at a compiler warning or a peculiar bug, which they can make go away by adding a cast. But they don't understand why, they simply add the cast and move on.
for( unsigned int i = foo.Length() - 1; i >= 0; --i ) {}
To me, this is just bad design
Indeed it is.
Once upon a time, down-counting loops would yield more effective code, because the compiler pick add a "branch if zero" instruction instead of a "branch if larger/smaller/equal" instruction - the former is faster. But this was at a time when compilers were really dumb and I don't believe such micro-optimizations are relevant any longer.
So there is rarely ever a reason to have a down-counting loop. Whoever made the argument probably just couldn't think outside the box. The example could have been rewritten as:
for(unsigned int i=0; i<foo.Length(); i++)
{
unsigned int index = foo.Length() - i - 1;
thing[index] = something;
}
This code should not have any impact on performance, but the loop itself turned a whole lot easier to read, while at the same time fixing the bug that your example had.
As far as performance is concerned nowadays, one should probably spend the time pondering about which form of data access that is most ideal in terms of data cache use, rather than anything else.
Some people may also say that signed integers may be useful, even for non-negative values, to provide an error flag, usually -1.
That's a poor argument. Good API design uses a dedicated error type for error reporting, such as an enum.
Instead of having some hobbyist-level API like
int do_stuff (int a, int b); // returns -1 if a or b were invalid, otherwise the result
you should have something like:
err_t do_stuff (int32_t a, int32_t b, int32_t* result);
// returns ERR_A is a is invalid, ERR_B if b is invalid, ERR_XXX if... and so on
// the result is stored in [result], which is allocated by the caller
// upon errors the contents of [result] remain untouched
The API would then consistently reserve the return of every function for this error type.
(And yes, many of the standard library functions abuse return types for error handling. This is because it contains lots of ancient functions from a time before good programming practice was invented, and they have been preserved the way they are for backwards-compatibility reasons. So just because you find a poorly-written function in the standard library, you shouldn't run off to write an equally poor function yourself.)
Overall, it sounds like you know what you are doing and giving signedness some thought. That probably means that knowledge-wise, you are actually already ahead of the people who wrote those posts and guides you are referring to.
The Google style guide for example, is questionable. Similar could be said about lots of other such coding standards that use "proof by authority". Just because it says Google, NASA or Linux kernel, people blindly swallow them no matter the quality of the actual contents. There are good things in those standards, but they also contain subjective opinions, speculations or blatant errors.
Instead I would recommend referring to real professional coding standards instead, such as MISRA-C. It enforces lots of thought and care for things like signedness, type promotion and type size, where less detailed/less serious documents just skip past it.
There is also CERT C, which isn't as detailed and careful as MISRA, but at least a sound, professional document (and more focused towards desktop/hosted development).

There is one heavy-weight argument against widely unsigned integers:
Premature optimization is the root of all evil.
We all have at least on one occasion been bitten by unsigned integers. Sometimes like in your loop, sometimes in other contexts. Unsigned integers add a hazard, even though a small one, to your program. And you are introducing this hazard to change the meaning of one bit. One little, tiny, insignificant-but-for-its-sign-meaning bit. On the other hand, the integers we work with in bread and butter applications are often far below the range of integers, more in the order of 10^1 than 10^7. Thus, the different range of unsigned integers is in the vast majority of cases not needed. And when it's needed, it is quite likely that this extra bit won't cut it (when 31 is too little, 32 is rarely enough) and you'll need a wider or an arbitrary-wide integer anyway. The pragmatic approach in these cases is to just use the signed integer and spare yourself the occasional underflow bug. Your time as a programmer can be put to much better use.

From the C FAQ:
The first question in the C FAQ is which integer type should we decide to use?
If you might need large values (above 32,767 or below -32,767), use long. Otherwise, if space is very important (i.e. if there are large arrays or many structures), use short. Otherwise, use int. If well-defined overflow characteristics are important and negative values are not, or if you want to steer clear of sign-extension problems when manipulating bits or bytes, use one of the corresponding unsigned types.
Another question concerns types conversions:
If an operation involves both signed and unsigned integers, the situation is a bit more complicated. If the unsigned operand is smaller (perhaps we're operating on unsigned int and long int), such that the larger, signed type could represent all values of the smaller, unsigned type, then the unsigned value is converted to the larger, signed type, and the result has the larger, signed type. Otherwise (that is, if the signed type can not represent all values of the unsigned type), both values are converted to a common unsigned type, and the result has that unsigned type.
You can find it here. So basically using unsigned integers, mostly for arithmetic conversions can complicate the situation since you'll have to either make all your integers unsigned, or be at the risk of confusing the compiler and yourself, but as long as you know what you are doing, this is not really a risk per se. However, it could introduce simple bugs.
And when it is a good to use unsigned integers? one situation is when using bitwise operations:
The << operator shifts its first operand left by a number of bits
given by its second operand, filling in new 0 bits at the right.
Similarly, the >> operator shifts its first operand right. If the
first operand is unsigned, >> fills in 0 bits from the left, but if
the first operand is signed, >> might fill in 1 bits if the high-order
bit was already 1. (Uncertainty like this is one reason why it's
usually a good idea to use all unsigned operands when working with the
bitwise operators.)
taken from here
And I've seen this somewhere:
If it was best to use unsigned integers for values that are never negative, we would have started by using unsigned int in the main function int main(int argc, char* argv[]). One thing is sure, argc is never negative.
EDIT:
As mentioned in the comments, the signature of main is due to historical reasons and apparently it predates the existence of the unsigned keyword.

Unsigned intgers are an artifact from the past. This is from the time, where processors could do unsigned arithmetic a little bit faster.
This is a case of premature optimization which is considered evil.
Actually, in 2005 when AMD introduced x86_64 (or AMD64, how it was then called), the 64 bit architecture for x86, they brought the ghosts of the past back: If a signed integer is used as an index and the compiler can not prove that it is never negative, is has to insert a 32 to 64 bit sign extension instruction - because the default 32 to 64 bit extension is unsigned (the upper half of a 64 bit register gets cleard if you move a 32 bit value into it).
But I would recommend against using unsigned in any arithmetic at all, being it pointer arithmetic or just simple numbers.
for( unsigned int i = foo.Length() - 1; i >= 0; --i ) {}
Any recent compiler will warn about such an construct, with condition ist always true or similar. With using a signed variable you avoid such pitfalls at all. Instead use ptrdiff_t.
A problem might be the c++ library, it often uses an unsigned type for size_t, which is required because of some rare corner cases with very large sizes (between 2^31 and 2^32) on 32 bit systems with certain boot switches ( /3GB windows).
There are many more, comparisons between signed and unsigned come to my mind, where the signed value automagically gets promoted to a unsigned and thus becomes a huge positive number, when it has been a small negative before.
One exception for using unsigned exists: For bit fields, flags, masks it is quite common. Usually it doesn't make sense at all to interpret the value of these variables as a magnitude, and the reader may deduce from the type that this variable is to be interpreted in bits.
The result will never be a negative value (as the section number, by the way). So why use a signed integer for this?
Because you might want to compare the return value to a signed value, which is actually negative. The comparison should return true in that case, but the C standard specifies that the signed get promoted to an unsigned in that case and you will get a false instead. I don't know about ObjectiveC though.

Is plain char usually/always unsigned on non-twos-complement systems?

Obviously the standard says nothing about this, but I'm interested more from a practical/historical standpoint: did systems with non-twos-complement arithmetic use a plain char type that's unsigned? Otherwise you have potentially all sorts of weirdness, like two representations for the null terminator, and the inability to represent all "byte" values in char. Do/did systems this weird really exist?

The null character used to terminate strings could never have two representations. It's defined like so (even in C90):
A byte with all bits set to 0, called the null character, shall exist in the basic execution character set
So a 'negative zero' on a ones-complement wouldn't do.
That said, I really don't know much of anything about non-two's complement C implementations. I used a one's-complement machine way back when in university, but don't remember much about it (and even if I cared about the standard back then, it was before it existed).

It's true, for the first 10 or 20 years of commercially produced computers (the 1950's and 60's) there were, apparently, some disagreements on how to represent negative numbers in binary. There were actually three contenders:
Two's complement, which not only won the war but also drove the others to extinction
One's complement, -x == ~x
Sign-magnitude, -x = x ^ 0x80000000
I think the last important ones-complement machine was probably the CDC-6600, at the time, the fastest machine on earth and the immediate predecessor of the first supercomputer.1.
Unfortunately, your question cannot really be answered, not because no one here knows the answer :-) but because the choice never had to be made. And this was for actually two reasons:
Two's complement took over simultaneously with byte machines. Byte addressing hit the world with the twos-complement IBM System/360. Previous machines had no bytes, only complete words had addresses. Sometimes programmers would pack characters inside these words and sometimes they would just use the whole word. (Word length varied from 12 bits to 60.)
C was not invented until a decade after the byte machines and two's complement transition. Item #1 happened in the 1960's, C first appeared on small machines in the 1970's and did not take over the world until the 1980's.
So there simply never was a time when a machine had signed bytes, a C compiler, and something other than a twos-complement data format. The idea of null-terminated strings was probably a repeatedly-invented design pattern thought up by one assembly language programmer after another, but I don't know that it was specified by a compiler until the C era.
In any case, the first actually standardized C ("C89") simply specifies "a byte or code of value zero is appended" and it is clear from the context that they were trying to be number-format independent. So, "+0" is a theoretical answer, but it may never really have existed in practice.
1. The 6600 was one of the most important machines historically, and not just because it was fast. Designed by Seymour Cray himself, it introduced out-of-order execution and various other elements later collectively called "RISC". Although others tried to claimed credit, Seymour Cray is the real inventor of the RISC architecture. There is no dispute that he invented the supercomputer. It's actually hard to name a past "supercomputer" that he didn't design.

I believe it would be almost but not quite possible for a system to have a one's-complement 'char' type, but there are four problems which cannot all be resolved:
Every data type must be representable as a sequence of char, such that if all the char values comprising two objects compare identical, the data objects containing in question will be identical.
Every data type must likewise be representable as a sequence of 'unsigned char'.
The unsigned char values into which any data type can be decomposed must form a group whose order is a power of two.
I don't believe the standard permits a one's-complement machine to special-case the value that would be negative zero and make it behave as something else.
It might be possible to have a standards-compliant machine with a one's-complement or sign-magnitude "char" type if the only way to get a negative zero would be by overlaying some other data type, and if negative zero compared unequal to positive zero. I'm not sure if that could be standards-compliant or not.
EDIT
BTW, if requirement #2 were relaxed, I wonder what the exact requirements would be when overlaying other data types onto 'char'? Among other things, while the standard makes it abundantly clear that one must be able to perform assignments and comparisons on any 'char' values that may result from overlaying another variable onto a 'char', I don't know that it imposes any requirement that all such values must behave as an arithmetic group. For example, I wonder what the legality would be of a machine in which every memory location was physically stored as 66 bits, with the top two bits indicating whether the value was a 64-bit integer, a 32-bit memory handle plus a 32-bit offset, or a 64-bit double-precision floating-point number? Since the standard allows implementations to do anything they like when an arithmetic computation exceeds the range of a signed type, that would suggest that signed types do not necessarily have to behave as a group.
For most signed types, there's no requirement that the type be unable to represent any numbers outside the range specified in limits.h; if limits.h specifies that the minimum "int" is -32767, then it would be perfectly legitimate for an implementation to in fact allow a value of -32768 since any program that tried to do so would invoke Undefined Behavior. The key question would probably be whether it would be legitimate for a 'char' value resulting from the overlay of some other type to yield a value outside the range specified in limits.h. I wonder what the standard says?

Calculating with a variable outside of its bounds in C

If I make a calculation with a variable where an intermediate part of the calculation goes higher then the bounds of that variable type, is there any hazard that some platforms may not like?
This is an example of what I'm asking:
int a, b;
a=30000;
b=(a*32000)/32767;
I have compiled this, and it does give the correct answer of 29297 (well, within truncating error, anyway). But the part that worries me is that 30,000*32,000 = 960,000,000, which is a 30-bit number, and thus cannot be stored in a 16-bit int. The end result is well within the bounds of an int, but I was expecting that whatever working part of memory would have the same size allocated as the largest source variables did, so an overflow error would occur.
This is just a small example to show my problem, I am trying to avoid using floating points by making the fraction be a fraction of the max amount able to be stored in that variable (in this case, a signed integer, so 32767 on the positive side), because the embedded system I'm using I believe does not have an FPU.
So how do most processors handle calculations out of the bounds of the source and destination variables?

On a 16-bit compiler/CPU, you can (almost) plan on that code giving incorrect results. This is a bit sad, since nearly every CPU (that has a multiply instruction at all) will produce and store the intermediate result, but no C compiler (of which I'm aware) will normally use it (and if you made a and b unsigned, it wouldn't be allowed to use it).
You have a few choices to deal with this. One is to write small muldiv function in assembly language that does the multiplication (preserving the high word) then the division on that, and finally returns the value to C when it's been reduced back into range.
Another option is to do the math on unsigned integers, which at least allow you to figure out when a problem occurred. Unfortunately, none of the choices is what I'd call particularly appealing though...

As far as I know, most if not all processors will hold results for a word * word multiplication in a double word -- meaning, an 8 bit * 8 bit is stored in a 16-bit register(s) on an 8-bit processor, a 32-bit * 32 bit operation is stored in a 64-bit register(s) on a 32-bit machine. (At least, that's how it's been on all the embedded microcontrollers I've used)
If that weren't the case, the processor would be severely crippled in the sense of only allowing half-word * half-word multiplication.

AFAIK this kind of thing is formally "undefined". You have to do the algebra necessary to prevent overflow. That's always your first choice. Numeric stability is no accident, it requires some care in deciding when and how to do division and multiplication.
Or, you have to guarantee that you'll use an intermediate result buffer that's big enough.
Using a large intermediate buffer is what some C compilers do anyway. The language, however, doesn't make any guarantees.
So, to be sure that it works, most folks do something like this.
short a= 30000;
int temp= a;
int temp2= (a*32000)/32767;
// here you can check for errors; if temp2 > 32767, you have overflow.
short b= a;

Signed integer overflow is undefined behavior.
Almost any implementation you could possibly meet will wrap around on integer overflow, because (a) everyone uses 2's complement, in which arithmetic operations are bitwise identical for signed and unsigned types of the same size, and (b) wraparound is the defined behavior of unsigned types in C.
So, on an implementation with a 16 bit int, I would expect the result 0 for your calculation (and that is the result that it must have if you'd used an unsigned 16 bit int). But I'd code against the possibility it might throw a hardware exception, explode, etc.
Note that if you do the calculation with two 16 bit short variables on a machine with a 32 bit int, then you will generally get the "right" answer 29297, because the intermediate value (a*32000) is an int, and only gets truncated back to short at the end. I say "generally" because converting an out-of-bounds integer value to a signed integer type either gives an unspecified result or else raises a signal. But again, any implementation you'll encounter in polite company just takes a modulus.

Are you sure your compiler has 16 bit integers? On most systems nowadays, ints are 32 bits. Another possible reason you aren't getting an error is that some compilers will recognize that it can compute something like this at compile time and will do so.
If you are really concerned that you will end up with overflow, you can sometimes reorder or factor the formula differently so that no intermediate terms will overflow. In your example that would be hard to do since all of your terms are near the limit of a 16 bit value. Do you need the number to be exactly right, or can you approximate? If you can, you can do something like this:
int a, b;
a=30000;
//b=(a*32000)/32767 ~= a * (32000/32768) = a *(125/128)
b = (a / 128) * 125 // if a=30000, b = 29250 - about 0.16% error
Another option would be to use larger sized types for intermediate terms. If your compiler had 16 bit ints and 32 bit longs, you could do something like this:
int a, b;
a=30000;
b=((long)a*32000L)/32767L;
Really, there's no set answer for how to handle overflow. You need to evaluate each case on its own and decide what the best solution is.

Your compiler and target processor both have to do with the sizes of the various data types.
Compilers will usually promote variables to the largest easy to work with size during calculations and then convert the results whatever size is needed for an assignment at the end.
There's also C rules that govern promoting to sizes which are more difficult to work with for some calculations. If you are compiling for an AVR, which has 8 bit registers but defines an int to be 16 bits, many calculations end up using more registers than you might think that they need because of this promotion and the fact that constant numbers in your code have to be thought of as being int or unsigned int unless the compiler can prove to itself that this won't effect the outcome of the calculations.
Try rewriting your code with various different sizes of integers (short, int, long, long long) and see how that goes. You may also want to write a simple program that prints out the sizeof( ) of the standard predefined types.
If you need to worry about the sizes of your integer variables and/or the intermediate results of your calculations then you should include and use things like uint32_t and int64_t for your declarations and type casting.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight