Can I use "MAX" macros to check whether all bits are set? - c

I recently had an interview where I had to propose a function that checks whether all bits from a uint32_t are set or not.
I wrote the following code :
int checkStatus(uint32_t val) {
return val == UINT32_MAX;
}
I assumed it would return 0 if one bit isn't set, and 1 if both of the values are the same.
As I didn't see this solution anywhere else, I assume it's wrong. But I don't understand why. When I did it, I was thinking : "if all bits are set to 1, then the value should be the maximum unsigned integer represented otherwise not." Could you tell me if it was incorrect and why?

The problem with using UINT32_MAX here is that "maximum" is an arithmetic concept, not a bitwise logical concept. Although UINT32_MAX does, in fact, represent the (unsigned) number 0xFFFFFFFF in most scenarios, this is just a consequence of the way numbers are represented in twos-complement arithmetic.
Using UINT32_MAX here has the potential to mislead the reader, because it has this connotation of arithmetic, rather than of bitwise manipulation.
In C programming, we're all using to representing bit patterns as hex numbers. The number "0xFFFFFFFF" immediately expresses the notion that all bits are set, in a way that UNIT32_MAX does not. Similarly, it may not be appropriate to use the value "0xFFFFFFFF" to represent "minus 1" in an arithmetic expression, even if the values are, in practice, the same.
Because I make most of my living looking for errors in other people's code, I'm acutely aware of the differences in expressive powers of different ways of expressing the same thing. Sometimes we have to write inexpressive things for reasons of inefficiency; but I don't think we should do it just for the sake of it.

Related

When is it useful to write "0 - x" rather than "-x"?

I've occasionally noticed some C code insisting on using 0 - x to get the additive complement of x, rather than writing -x. Now, I suppose these are not equivalent for types smaller in size than int (edit: Nope, apparently equivalent even then), but otherwise - is there some benefit to the former rather than the latter form?
tl;dr: 0-x is useful for scrubbing the sign of floating-point zero.
(As #Deduplicator points out in a comment:)
Many of us tend to forget that, in floating-point types, we have both a "positive zero" and a "negative zero" value - flipping the sign bit on and off leaves the same mantissa and exponent. Read more on this here.
Well, it turns out that the two expressions behave differently on positive-signed zero, and the same on negative-signed zero, as per the following:
value of x
value of 0-x
value of -x
-.0
0
0
0
0
-.0
See this on Coliru.
So, when x is of a floating-point type,
If you want to "forget the sign of zero", use 0-x.
If you want to "keep the sign of zero", use x.
For integer types it shouldn't matter.
On the other hand, as #NateEldredge points out the expressions should be equivalent on small integer types, due to integer promotion - -x translates into a promotion of x into an int, then applying the minus sign.
There is no technical reason to do this today. At least not with integers. And at least not in a way that a sane (according to some arbitrary definition) coder would use. Sure, it could be the case that it causes a cast. I'm actually not 100% sure, but in that case I would use an explicit cast instead to clearly communicate the intention.
As M.M pointed out, there were reasons in the K&R time, when =- was equivalent to -=. This had the effect that x=-y was equivalent to x=x-y instead of x=0-y. This was undesirable effect, so the feature was removed.
Today, the reason would be readability. Especially if you're writing a mathematical formula and want to point out that a parameter is zero. One example would be the distance formula. The distance from (x,y) to origo is sqrt(pow(0-x, 2), pow(0-y, 2))

Why is infinity = 0x3f3f3f3f?

In some situations, one generally uses a large enough integer value to represent infinity. I usually use the largest representable positive/negative integer. That usually yields more code, since you need to check if one of the operands is infinity before virtually all arithmetic operations in order to avoid overflows. Sometimes it would be desirable to have saturated integer arithmetic. For that reason, some people use smaller values for infinity, that can be added or multiplied several times without overflow. What intrigues me is the fact that it's extremely common to see (specially in programming competitions):
const int INF = 0x3f3f3f3f;
Why is that number special? It's binary representation is:
00111111001111110011111100111111
I don't see any specially interesting property here. I see it's easy to type, but if that was the reason, almost anything would do (0x3e3e3e3e, 0x2f2f2f2f, etc). It can be added once without overflow, which allows for:
a = min(INF, b + c);
But all the other constants would do, then. Googling only shows me a lot of code snippets that use that constant, but no explanations or comments.
Can anyone spot it?
I found some evidence about this here (original content in Chinese); the basic idea is that 0x7fffffff is problematic since it's already "the top" of the range of 4-byte signed ints; so, adding anything to it results in negative numbers; 0x3f3f3f3f, instead:
is still quite big (same order of magnitude of 0x7fffffff);
has a lot of headroom; if you say that the valid range of integers is limited to numbers below it, you can add any "valid positive number" to it and still get an infinite (i.e. something >=INF). Even INF+INF doesn't overflow. This allows to keep it always "under control":
a+=b;
if(a>INF)
a=INF;
is a repetition of equal bytes, which means you can easily memset stuff to INF;
also, as #Jörg W Mittag noticed above, it has a nice ASCII representation, that allows both to spot it on the fly looking at memory dumps, and to write it directly in memory.
I may or may not be one of the earliest discoverers of 0x3f3f3f3f. I published a Romanian article about it in 2004 (http://www.infoarena.ro/12-ponturi-pentru-programatorii-cc #9), but I've been using this value since 2002 at least for programming competitions.
There are two reasons for it:
0x3f3f3f3f + 0x3f3f3f3f doesn't overflow int32. For this some use 100000000 (one billion).
one can set an array of ints to infinity by doing memset(array, 0x3f, sizeof(array))
0x3f3f3f3f is the ASCII representation of the string ????.
Krugle finds 48 instances of that constant in its entire database. 46 of those instances are in a Java project, where it is used as a bitmask for some graphics manipulation.
1 project is an operating system, where it is used to represent an unknown ACPI device.
1 project is again a bitmask for Java graphics.
So, in all of the projects indexed by Krugle, it is used 47 times because of its bitpattern, once because of its ASCII interpretation, and not a single time as a representation of infinity.

Why not use complement of VAL instead of (-VAL -1)

I'm reading redis' source code from https://github.com/antirez/redis.
I saw such macros in src/ziplist.c
#define INT24_MAX 0x7fffff
#define INT24_MIN (-INT24_MAX - 1)
why not just do this?
#define INT24_MIN (~INT24_MAX)
A better question might be why do you think (~INT24_MAX) is better than (-INT24_MAX - 1)?
On a two's complement machine you get the same result from either expression, and both of them evaluate just as fast as the other (for a 32-bit target, the compiler will reduce both of them them to 0xff800000 at compile time). However, in my opinion the expression (-INT24_MAX - 1) models the numeric concept that the minimum value is one less than the negation of the maximum value better.
That might not be of any huge importance, but the expression (~INT24_MAX) isn't better in an objective way, and I'd argue that subjectively it might not be as good.
Basically, (-INT24_MAX - 1) might have been what the coder happened to think of (maybe since as I mentioned, it models what's numerically intended), and there's no reason to use something else.
Suppose, int is 32-bit and can hold 0x7fffff, then ~0x7fffff is going to be ~0x007fffff or, after all bits have been inverted, 0xff800000.
This bit pattern represents the negative value -0x7fffff-1 if negative integers use the 2's complement representation.
If they use the 1's complement representation, then this pattern represents the negative value -0x7fffff.
If they use the sign-magnitude representation, then this pattern represents the negtative value -0x7f800000.
As you can see, the value of ~0x7fffff is going to depend on the representation of negative integers and the size of the type that can hold the value 0x7fffff.
If you're trying to write portable C code, you should avoid such situations.

Why cast like this? "(uint16_t) -1;"

I saw a line of code like this:
xxxxx = (uint16_t) -1;
Why cast -1 into a unsigned int? What is this code to get?
Thanks!
Obviously the answer is within reach of your fingertips. Just plug that code in and step through and you will see the value that you get when you cast -1 to a 16 bit unsigned integer.
The value turns out to be two's complement of -1, which is: 0xFFFF hex, or 65535 decimal.
As to the actual reason for using the code like that, it's simply a short-cut. Or maybe it's just to satisfy a type compatibility requirement.
If you're wondering how come -1 gets cast to 0xFFFF (65535) and not maybe 0 or 1, as one might expect, you have to understand that the reason for that is that the C language, although statically typed, is quite liberal when it comes to enforcing type restrictions. That means that it will happily cast - or interpret if you will - any memory location as whatever arbitrary type of data you tell it. This of course can have quite devastating consequences if used improperly but the trade-off is flexibility and a speed improvement due to the lack of strict sanity checks. This was very important a few decades ago when C was designed and it still is if you're writing code for very tiny processors.
That being said, if you think about a cast as simply saying: "disregard what you think you know about the data at this memory location, just tell me what it would mean if you read it as a <insert_your_type_here>" and if you know that computers usually represent negative numbers as two's complement (see above) then the answer should by now by pretty obvious: C is taking the value in memory and reading it back as an unsigned integer.
As an ending note. I should point out that C is not the only language that will cast -1 to 0xFFFF but even more modern languages that are capable of stronger type checks will do the same, probably for compatibility and continuity reasons, as well as for the reason that it makes it possible to reverse the cast: 0xFFFF back to a signed 16 bit integer is -1.
It will return the highest possible unsigned 16bit integer.
It returns 65535 which is the maximum value of a two byte (16 bit) unsigned integer.

About the use of signed integers in C family of languages

When using integer values in my own code, I always try to consider the signedness, asking myself if the integer should be signed or unsigned.
When I'm sure the value will never need to be negative, I then use an unsigned integer.
And I have to say this happen most of the time.
When reading other peoples' code, I rarely see unsigned integers, even if the represented value can't be negative.
So I asked myself: «is there a good reason for this, or do people just use signed integers because the don't care»?
I've search on the subject, here and in other places, and I have to say I can't find a good reason not to use unsigned integers, when it applies.
I came across those questions: «Default int type: Signed or Unsigned?», and «Should you always use 'int' for numbers in C, even if they are non-negative?» which both present the following example:
for( unsigned int i = foo.Length() - 1; i >= 0; --i ) {}
To me, this is just bad design. Of course, it may result in an infinite loop, with unsigned integers.
But is it so hard to check if foo.Length() is 0, before the loop?
So I personally don't think this is a good reason for using signed integers all the way.
Some people may also say that signed integers may be useful, even for non-negative values, to provide an error flag, usually -1.
Ok, that's good to have a specific value that means «error».
But then, what's wrong with something like UINT_MAX, for that specific value?
I'm actually asking this question because it may lead to some huge problems, usually when using third-party libraries.
In such a case, you often have to deal with signed and unsigned values.
Most of the time, people just don't care about the signedness, and just assign a, for instance, an unsigned int to a signed int, without checking the range.
I have to say I'm a bit paranoid with the compiler warning flags, so with my setup, such an implicit cast will result in a compiler error.
For that kind of stuff, I usually use a function or macro to check the range, and then assign using an explicit cast, raising an error if needed.
This just seems logical to me.
As a last example, as I'm also an Objective-C developer (note that this question is not related to Objective-C only):
- ( NSInteger )tableView: ( UITableView * )tableView numberOfRowsInSection: ( NSInteger )section;
For those not fluent with Objective-C, NSInteger is a signed integer.
This method actually retrieves the number of rows in a table view, for a specific section.
The result will never be a negative value (as the section number, by the way).
So why use a signed integer for this?
I really don't understand.
This is just an example, but I just always see that kind of stuff, with C, C++ or Objective-C.
So again, I'm just wondering if people just don't care about that kind of problems, or if there is finally a good and valid reason not to use unsigned integers for such cases.
Looking forward to hear your answers : )
a signed return value might yield more information (think error-numbers, 0 is sometimes a valid answer, -1 indicates error, see man read) ... which might be relevant especially for developers of libraries.
if you are worrying about the one extra bit you gain when using unsigned instead of signed then you are probably using the wrong type anyway. (also kind of "premature optimization" argument)
languages like python, ruby, jscript etc are doing just fine without signed vs unsigned. that might be an indicator ...
When using integer values in my own code, I always try to consider the signedness, asking myself if the integer should be signed or unsigned.
When I'm sure the value will never need to be negative, I then use an unsigned integer.
And I have to say this happen most of the time.
To carefully consider which type that is most suitable each time you declare a variable is very good practice! This means you are careful and professional. You should not only consider signedness, but also the potential max value that you expect this type to have.
The reason why you shouldn't use signed types when they aren't needed have nothing to do with performance, but with type safety. There are lots of potential, subtle bugs that can be caused by signed types:
The various forms of implicit promotions that exist in C can cause your type to change signedness in unexpected and possibly dangerous ways. The integer promotion rule that is part of the usual arithmetic conversions, the lvalue conversion upon assignment, the default argument promotions used by for example VA lists, and so on.
When using any form of bitwise operators or similar hardware-related programming, signed types are dangerous and can easily cause various forms of undefined behavior.
By declaring your integers unsigned, you automatically skip past a whole lot of the above dangers. Similarly, by declaring them as large as unsigned int or larger, you get rid of lots of dangers caused by the integer promotions.
Both size and signedness are important when it comes to writing rugged, portable and safe code. This is the reason why you should always use the types from stdint.h and not the native, so-called "primitive data types" of C.
So I asked myself: «is there a good reason for this, or do people just use signed integers because the don't care»?
I don't really think it is because they don't care, nor because they are lazy, even though declaring everything int is sometimes referred to as "sloppy typing" - which means sloppily picked type more than it means too lazy to type.
I rather believe it is because they lack deeper knowledge of the various things I mentioned above. There's a frightening amount of seasoned C programmers who don't know how implicit type promotions work in C, nor how signed types can cause poorly-defined behavior when used together with certain operators.
This is actually a very frequent source of subtle bugs. Many programmers find themselves staring at a compiler warning or a peculiar bug, which they can make go away by adding a cast. But they don't understand why, they simply add the cast and move on.
for( unsigned int i = foo.Length() - 1; i >= 0; --i ) {}
To me, this is just bad design
Indeed it is.
Once upon a time, down-counting loops would yield more effective code, because the compiler pick add a "branch if zero" instruction instead of a "branch if larger/smaller/equal" instruction - the former is faster. But this was at a time when compilers were really dumb and I don't believe such micro-optimizations are relevant any longer.
So there is rarely ever a reason to have a down-counting loop. Whoever made the argument probably just couldn't think outside the box. The example could have been rewritten as:
for(unsigned int i=0; i<foo.Length(); i++)
{
unsigned int index = foo.Length() - i - 1;
thing[index] = something;
}
This code should not have any impact on performance, but the loop itself turned a whole lot easier to read, while at the same time fixing the bug that your example had.
As far as performance is concerned nowadays, one should probably spend the time pondering about which form of data access that is most ideal in terms of data cache use, rather than anything else.
Some people may also say that signed integers may be useful, even for non-negative values, to provide an error flag, usually -1.
That's a poor argument. Good API design uses a dedicated error type for error reporting, such as an enum.
Instead of having some hobbyist-level API like
int do_stuff (int a, int b); // returns -1 if a or b were invalid, otherwise the result
you should have something like:
err_t do_stuff (int32_t a, int32_t b, int32_t* result);
// returns ERR_A is a is invalid, ERR_B if b is invalid, ERR_XXX if... and so on
// the result is stored in [result], which is allocated by the caller
// upon errors the contents of [result] remain untouched
The API would then consistently reserve the return of every function for this error type.
(And yes, many of the standard library functions abuse return types for error handling. This is because it contains lots of ancient functions from a time before good programming practice was invented, and they have been preserved the way they are for backwards-compatibility reasons. So just because you find a poorly-written function in the standard library, you shouldn't run off to write an equally poor function yourself.)
Overall, it sounds like you know what you are doing and giving signedness some thought. That probably means that knowledge-wise, you are actually already ahead of the people who wrote those posts and guides you are referring to.
The Google style guide for example, is questionable. Similar could be said about lots of other such coding standards that use "proof by authority". Just because it says Google, NASA or Linux kernel, people blindly swallow them no matter the quality of the actual contents. There are good things in those standards, but they also contain subjective opinions, speculations or blatant errors.
Instead I would recommend referring to real professional coding standards instead, such as MISRA-C. It enforces lots of thought and care for things like signedness, type promotion and type size, where less detailed/less serious documents just skip past it.
There is also CERT C, which isn't as detailed and careful as MISRA, but at least a sound, professional document (and more focused towards desktop/hosted development).
There is one heavy-weight argument against widely unsigned integers:
Premature optimization is the root of all evil.
We all have at least on one occasion been bitten by unsigned integers. Sometimes like in your loop, sometimes in other contexts. Unsigned integers add a hazard, even though a small one, to your program. And you are introducing this hazard to change the meaning of one bit. One little, tiny, insignificant-but-for-its-sign-meaning bit. On the other hand, the integers we work with in bread and butter applications are often far below the range of integers, more in the order of 10^1 than 10^7. Thus, the different range of unsigned integers is in the vast majority of cases not needed. And when it's needed, it is quite likely that this extra bit won't cut it (when 31 is too little, 32 is rarely enough) and you'll need a wider or an arbitrary-wide integer anyway. The pragmatic approach in these cases is to just use the signed integer and spare yourself the occasional underflow bug. Your time as a programmer can be put to much better use.
From the C FAQ:
The first question in the C FAQ is which integer type should we decide to use?
If you might need large values (above 32,767 or below -32,767), use long. Otherwise, if space is very important (i.e. if there are large arrays or many structures), use short. Otherwise, use int. If well-defined overflow characteristics are important and negative values are not, or if you want to steer clear of sign-extension problems when manipulating bits or bytes, use one of the corresponding unsigned types.
Another question concerns types conversions:
If an operation involves both signed and unsigned integers, the situation is a bit more complicated. If the unsigned operand is smaller (perhaps we're operating on unsigned int and long int), such that the larger, signed type could represent all values of the smaller, unsigned type, then the unsigned value is converted to the larger, signed type, and the result has the larger, signed type. Otherwise (that is, if the signed type can not represent all values of the unsigned type), both values are converted to a common unsigned type, and the result has that unsigned type.
You can find it here. So basically using unsigned integers, mostly for arithmetic conversions can complicate the situation since you'll have to either make all your integers unsigned, or be at the risk of confusing the compiler and yourself, but as long as you know what you are doing, this is not really a risk per se. However, it could introduce simple bugs.
And when it is a good to use unsigned integers? one situation is when using bitwise operations:
The << operator shifts its first operand left by a number of bits
given by its second operand, filling in new 0 bits at the right.
Similarly, the >> operator shifts its first operand right. If the
first operand is unsigned, >> fills in 0 bits from the left, but if
the first operand is signed, >> might fill in 1 bits if the high-order
bit was already 1. (Uncertainty like this is one reason why it's
usually a good idea to use all unsigned operands when working with the
bitwise operators.)
taken from here
And I've seen this somewhere:
If it was best to use unsigned integers for values that are never negative, we would have started by using unsigned int in the main function int main(int argc, char* argv[]). One thing is sure, argc is never negative.
EDIT:
As mentioned in the comments, the signature of main is due to historical reasons and apparently it predates the existence of the unsigned keyword.
Unsigned intgers are an artifact from the past. This is from the time, where processors could do unsigned arithmetic a little bit faster.
This is a case of premature optimization which is considered evil.
Actually, in 2005 when AMD introduced x86_64 (or AMD64, how it was then called), the 64 bit architecture for x86, they brought the ghosts of the past back: If a signed integer is used as an index and the compiler can not prove that it is never negative, is has to insert a 32 to 64 bit sign extension instruction - because the default 32 to 64 bit extension is unsigned (the upper half of a 64 bit register gets cleard if you move a 32 bit value into it).
But I would recommend against using unsigned in any arithmetic at all, being it pointer arithmetic or just simple numbers.
for( unsigned int i = foo.Length() - 1; i >= 0; --i ) {}
Any recent compiler will warn about such an construct, with condition ist always true or similar. With using a signed variable you avoid such pitfalls at all. Instead use ptrdiff_t.
A problem might be the c++ library, it often uses an unsigned type for size_t, which is required because of some rare corner cases with very large sizes (between 2^31 and 2^32) on 32 bit systems with certain boot switches ( /3GB windows).
There are many more, comparisons between signed and unsigned come to my mind, where the signed value automagically gets promoted to a unsigned and thus becomes a huge positive number, when it has been a small negative before.
One exception for using unsigned exists: For bit fields, flags, masks it is quite common. Usually it doesn't make sense at all to interpret the value of these variables as a magnitude, and the reader may deduce from the type that this variable is to be interpreted in bits.
The result will never be a negative value (as the section number, by the way). So why use a signed integer for this?
Because you might want to compare the return value to a signed value, which is actually negative. The comparison should return true in that case, but the C standard specifies that the signed get promoted to an unsigned in that case and you will get a false instead. I don't know about ObjectiveC though.

Resources