I'm reading redis' source code from https://github.com/antirez/redis.
I saw such macros in src/ziplist.c
#define INT24_MAX 0x7fffff
#define INT24_MIN (-INT24_MAX - 1)
why not just do this?
#define INT24_MIN (~INT24_MAX)
A better question might be why do you think (~INT24_MAX) is better than (-INT24_MAX - 1)?
On a two's complement machine you get the same result from either expression, and both evaluate equally fast (for a 32-bit target, the compiler will reduce both to 0xff800000 at compile time). However, in my opinion the expression (-INT24_MAX - 1) better models the numeric concept that the minimum value is one less than the negation of the maximum value.
That might not be of any huge importance, but the expression (~INT24_MAX) isn't better in an objective way, and I'd argue that subjectively it might not be as good.
Basically, (-INT24_MAX - 1) might simply have been what the coder happened to think of (perhaps because, as I mentioned, it models what's numerically intended), and there's no reason to use something else.
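To illustrate the equivalence, here is a minimal sketch; the INT24_* names mirror the Redis macros, and the comments assume a two's complement target:

#include <stdio.h>

#define INT24_MAX 0x7fffff
#define INT24_MIN (-INT24_MAX - 1)

int main(void) {
    /* On a two's complement machine both expressions produce the same value. */
    printf("%d\n", INT24_MIN);               /* -8388608 */
    printf("%d\n", ~INT24_MAX);              /* -8388608 on two's complement */
    printf("%d\n", INT24_MIN == ~INT24_MAX); /* 1 on two's complement */
    return 0;
}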
Suppose int is 32-bit and can hold 0x7fffff; then ~0x7fffff is ~0x007fffff, or, after all bits have been inverted, 0xff800000.
This bit pattern represents the negative value -0x7fffff-1 if negative integers use the 2's complement representation.
If they use the 1's complement representation, then this pattern represents the negative value -0x7fffff.
If they use the sign-magnitude representation, then this pattern represents the negative value -0x7f800000.
As you can see, the value of ~0x7fffff is going to depend on the representation of negative integers and the size of the type that can hold the value 0x7fffff.
If you're trying to write portable C code, you should avoid such situations.
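To make the three readings concrete, here is a small sketch that decodes the 32-bit pattern 0xff800000 with plain arithmetic, so it prints the same three values no matter how the host machine itself represents negative numbers:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t pattern = 0xff800000u;             /* ~0x007fffff in 32 bits */
    uint32_t magnitude = pattern & 0x7fffffffu; /* value bits without the sign bit */

    /* Decode the same bit pattern under the three representations described above. */
    long long twos_complement = (long long)magnitude - 0x80000000LL;   /* -0x7fffff - 1 */
    long long ones_complement = -(long long)(~pattern & 0x7fffffffu);  /* -0x7fffff */
    long long sign_magnitude  = -(long long)magnitude;                 /* -0x7f800000 */

    printf("two's complement: %lld\n", twos_complement);
    printf("ones' complement: %lld\n", ones_complement);
    printf("sign-magnitude:   %lld\n", sign_magnitude);
    return 0;
}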
I recently had an interview where I had to propose a function that checks whether all bits from a uint32_t are set or not.
I wrote the following code :
int checkStatus(uint32_t val) {
    return val == UINT32_MAX;
}
I assumed it would return 0 if at least one bit isn't set, and 1 if the two values are the same.
As I didn't see this solution anywhere else, I assume it's wrong. But I don't understand why. When I did it, I was thinking : "if all bits are set to 1, then the value should be the maximum unsigned integer represented otherwise not." Could you tell me if it was incorrect and why?
The problem with using UINT32_MAX here is that "maximum" is an arithmetic concept, not a bitwise logical concept. Although UINT32_MAX does, in fact, have the value 0xFFFFFFFF, that equality is a consequence of the way unsigned numbers are represented in binary, not something the name itself conveys.
Using UINT32_MAX here has the potential to mislead the reader, because it has this connotation of arithmetic, rather than of bitwise manipulation.
In C programming, we're all used to representing bit patterns as hex numbers. The number "0xFFFFFFFF" immediately expresses the notion that all bits are set, in a way that UINT32_MAX does not. Similarly, it may not be appropriate to use the value "0xFFFFFFFF" to represent "minus 1" in an arithmetic expression, even if the values are, in practice, the same.
Because I make most of my living looking for errors in other people's code, I'm acutely aware of the differences in expressive power between different ways of expressing the same thing. Sometimes we have to write inexpressive things for reasons of efficiency; but I don't think we should do it just for the sake of it.
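For illustration, a minimal sketch of the check written three equivalent ways; the helper names are mine, not from the question:

#include <stdint.h>
#include <stdio.h>

/* Arithmetic flavour: compare against the maximum value. */
static int all_bits_set_max(uint32_t val) { return val == UINT32_MAX; }

/* Bitwise flavour: spell the pattern out. */
static int all_bits_set_hex(uint32_t val) { return val == 0xFFFFFFFFu; }

/* Bitwise flavour without a literal: complementing leaves no bit set. */
static int all_bits_set_not(uint32_t val) { return (uint32_t)~val == 0; }

int main(void) {
    printf("%d %d %d\n",
           all_bits_set_max(0xFFFFFFFFu),  /* 1 */
           all_bits_set_hex(0xFFFFFFFEu),  /* 0 */
           all_bits_set_not(0xFFFFFFFFu)); /* 1 */
    return 0;
}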
I have read:
"Like an unsigned int, but offset by −(2^(n−1) − 1), where n is the number of bits in the numeral. Aside:
Technically we could choose any bias we please, but the choice presented here is extraordinarily common." - http://inst.eecs.berkeley.edu/~cs61c/sp14/disc/00/Disc0.pdf
However, I don't get what the point is. Can someone explain this to me with examples? Also, when should I use it, given other options like one's complement, sign and magnitude, and two's complement?
Biased notation is a way of storing a range of values that doesn't start with zero.
Put simply, you take an existing representation that goes from zero to N, and then add a bias B to each number so it now goes from B to N+B.
Floating-point exponents are stored with a bias to keep the dynamic range of the type "centered" on 1.
Excess-three encoding is a technique for simplifying decimal arithmetic using a bias of three.
Two's complement notation could be considered as biased notation with a bias of 2^(n−1) (that is, −INT_MIN) and the most-significant bit flipped.
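As a minimal sketch of the idea, here is an 8-bit excess-127 encoder/decoder; the bias of 127 and the function names are just illustrative:

#include <stdint.h>
#include <stdio.h>

#define BIAS 127  /* excess-127: stored code = value + 127 */

/* Encode a value in [-127, 128] as a non-negative 8-bit code, and back. */
static uint8_t encode_biased(int value)    { return (uint8_t)(value + BIAS); }
static int     decode_biased(uint8_t code) { return (int)code - BIAS; }

int main(void) {
    int v = -5;
    uint8_t stored = encode_biased(v);
    printf("value %d is stored as %u and decodes back to %d\n",
           v, stored, decode_biased(stored)); /* value -5 is stored as 122 ... */
    return 0;
}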
A "representation" is a way of encoding information so that it easy to extract details or inferences from the encoded information.
Most modern CPUs "represent" numbers using "twos complement notation". They do this because it is easy to design digital circuits that can do what amounts to arithmetic on these values quickly (add, subtract, multiply, divide, ...). Twos complement also has the nice property that one can interpret the most significant bit as either a power-of-two (giving "unsigned numbers") or as a sign bit (giving signed numbers) without changing essentially any of the hardware used to implement the arithmetic.
Older machines used other bases; e.g., quite common in the 60s were machines that represented numbers as sets of binary-coded-decimal digits stuck in 4-bit addressable nibbles (the IBM 1620 and 1401 are examples of this). So, you can represent the same concept or value in different ways.
A bias just means that whatever representation you chose (for numbers), you have added a constant bias to that value. Presumably that is done to enable something to be done more effectively. I can't speak to "−(2^(n−1) − 1)" being "an extraordinarily common (bias)"; I do lots of assembly and C coding and pretty much never find a need to "bias" values.
However, there is a common example. Modern CPUs largely implement IEEE floating point, which stores floating point numbers with sign, exponent, mantissa. The exponent is a power of two, roughly symmetric around zero, but biased by 2^(N−1)−1 for an N-bit exponent (127 in single precision), if I recall correctly.
This bias allows floating point values with the same sign to be compared for equal/less/greater by using the standard machine two's complement instructions rather than a special floating point instruction, which means that sometimes the use of actual floating point compares can be avoided. (See http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm for dark-corner details.) [Thanks to @PotatoSwatter for noting the inaccuracy of my initial answer here and making me go dig this out.]
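As a concrete illustration of that exponent bias (assuming the common case of a 32-bit IEEE-754 float; the helper name is mine):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Extract the exponent of an IEEE-754 single-precision float and remove the
   bias of 127 (2^(8-1) - 1). Ignores denormals, infinities and NaNs. */
static int float_exponent(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);    /* copy out the bit pattern safely */
    int biased = (bits >> 23) & 0xFF;  /* 8-bit exponent field */
    return biased - 127;
}

int main(void) {
    printf("%d %d %d\n",
           float_exponent(1.0f),   /*  0 */
           float_exponent(8.0f),   /*  3 */
           float_exponent(0.5f));  /* -1 */
    return 0;
}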
it's idiomatic to initialize a block of memory to zero by
memset(p, 0, size_of_p);
when we want to initialize it to minus one, we can:
memset(p, -1, size_of_p);
no matter what type p is, because in two's complement representation, minus one is 0xff for an 8-bit integer, 0xffff for 16 bits, and 0xffffffff for 32 bits.
My concern is, is the two's complement representation universal among modern computers? Can I expect such code to be platform-independent and robust enough to port to other platforms?
Thanks in advance.
No, there are three schemes of representing negative numbers allowed by the ISO C standard:
two's complement;
ones' complement; and
sign/magnitude.
However, you should keep in mind that it's been a long time since I've seen a platform using the two less common schemes. I would say that all modern implementations use two's complement.
You only want to concern yourself with this if you're trying to be 100% portable. If you're the type of person who's happy to be 99.99999% portable, don't worry about it.
See also info on the ones' complement Unisys 2200 (still active as of 2010) and this answer explaining the layouts.
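A minimal sketch of the idiom under discussion; the assertion only holds on a two's complement platform, which is what you will meet in practice:

#include <assert.h>
#include <string.h>

int main(void) {
    int buf[8];

    /* Fill every byte with 0xff; on two's complement each int becomes -1. */
    memset(buf, -1, sizeof buf);

    for (size_t i = 0; i < sizeof buf / sizeof buf[0]; i++)
        assert(buf[i] == -1);
    return 0;
}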
The simple answer is yes, but the better answer is that you're prematurely optimising.
Write code that is obvious instead of code that you think is fast:
for(i = 0; i < p; i++)
array[i] = -1;
This will be automatically converted by an optimising compiler to the fastest possible form (in VS it will become a memset when p is large enough) that does what you want, without you having to think about whether this premature optimisation is always valid.
I saw a line of code like this:
xxxxx = (uint16_t) -1;
Why cast -1 to an unsigned int? What does this code get?
Thanks!
Obviously the answer is within reach of your fingertips. Just plug that code in and step through and you will see the value that you get when you cast -1 to a 16 bit unsigned integer.
The value turns out to be the two's complement representation of -1, which is 0xFFFF hex, or 65535 decimal.
As to the actual reason for using the code like that, it's simply a short-cut. Or maybe it's just to satisfy a type compatibility requirement.
If you're wondering why -1 gets cast to 0xFFFF (65535) and not to 0 or 1, as one might expect, you have to understand that the C language, although statically typed, is quite liberal when it comes to enforcing type restrictions. That means it will happily cast - or interpret, if you will - any memory location as whatever arbitrary type of data you tell it. This can of course have quite devastating consequences if used improperly, but the trade-off is flexibility and a speed improvement due to the lack of strict sanity checks. This was very important a few decades ago when C was designed, and it still is if you're writing code for very tiny processors.
That being said, if you think about a cast as simply saying "disregard what you think you know about the data at this memory location, just tell me what it would mean if you read it as a <insert_your_type_here>", and if you know that computers usually represent negative numbers as two's complement (see above), then the answer should by now be pretty obvious: C is taking the value in memory and reading it back as an unsigned integer.
As an ending note, I should point out that C is not the only language that will cast -1 to 0xFFFF; even more modern languages that are capable of stronger type checks will do the same, probably for compatibility and continuity reasons, as well as because it makes the cast reversible: 0xFFFF back to a signed 16-bit integer is -1.
It will return the highest possible unsigned 16-bit integer.
It returns 65535, which is the maximum value of a two-byte (16-bit) unsigned integer.
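For what it's worth, the C standard defines this conversion arithmetically (the value is reduced modulo 2^16), so the result is 0xFFFF regardless of how the machine stores negative numbers; a minimal sketch:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint16_t x = (uint16_t)-1;        /* -1 reduced modulo 2^16 -> 0xFFFF */

    printf("%u\n", (unsigned)x);      /* 65535 */
    printf("%d\n", x == UINT16_MAX);  /* 1 */

    /* Converting 0xFFFF back to int16_t yields -1 on typical implementations
       (strictly, that back-conversion is implementation-defined before C23). */
    printf("%d\n", (int16_t)x);
    return 0;
}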
The latter representation looks more natural and easier to understand. Why do most languages choose the former? I guess there must be some unique and thus advantageous characteristics of two's complement which make data operations easier.
Languages don't specify the number format; the hardware does. Ask Intel why they designed their ALU to do 2's complement.
The answer will be that the mathematical operations are more regular in 2's complement; positive and negative numbers need to be handled differently in 1's complement, which means double the hardware/microcode needed for basic math in the CPU.
From Wikipedia
The two's-complement system has the advantage that the fundamental arithmetical operations of addition, subtraction and multiplication are identical, regardless of whether the inputs and outputs are interpreted as unsigned binary numbers or two's complement (provided that overflow is ignored). This property makes the system both simpler to implement and capable of easily handling higher precision arithmetic. Also, zero has only a single representation, obviating the subtleties associated with negative zero, which exists in ones'-complement systems.
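A small sketch of that point: the same byte-level addition gives the right answer whether the operands are read as unsigned values or as two's complement signed values (the final reinterpretation back to int8_t assumes a two's complement platform):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    /* -3 as an 8-bit two's complement pattern is 0xFD. */
    uint8_t a = (uint8_t)-3;  /* 0xFD */
    uint8_t b = 5;            /* 0x05 */

    /* Plain unsigned byte addition, discarding the carry out of bit 7. */
    uint8_t sum = (uint8_t)(a + b);  /* 0x02 */

    printf("unsigned view: %u + %u = %u\n", a, b, sum);  /* 253 + 5 = 2 */
    printf("signed view:   -3 + 5 = %d\n", (int8_t)sum); /* 2 */
    return 0;
}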