I'm struggling with determining the limits (edge values) of each variable type. I can't understand the following problem.
To get the maximum value of char, for example, I use: ~0 >> 1
Which should work like this:
1. transfer 0 to binary: 0000 0000 (I assume that char is stored on 8 bits)
2. negate it: 1111 1111 (now I'm out of char max size)
3. move one place right: 0111 1111 (I get 127, which seems to be correct)
Now I want to print this result using the printf function.
Why exactly do I have to use cast like this:
printf("%d\n", (unsigned char)(~0) >> 1)?
I just don't get it. I assume that it has something to do with point 2 when I get out of char range, but I'm not sure.
I would be grateful for a more detailed explanation of this problem.
Please don't use these kinds of tricks. They might work on ordinary machines but they are possibly unportable and hard to understand. Instead, use the symbolic constants from the header file limits.h which contains the size limits for each of the basic types. For instance, CHAR_MAX is the upper bound for a char, CHAR_MIN is the lower bound. Further limits for the numeric types declared in stddef.h and stdint.h can be found in stdint.h.
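For example, a minimal program that simply prints a few of these constants (all of them are standard, so nothing here is machine-specific):

#include <stdio.h>
#include <limits.h>

int main(void) {
    /* The portable way: print the symbolic constants directly. */
    printf("char:          %d .. %d\n", CHAR_MIN, CHAR_MAX);
    printf("unsigned char: 0 .. %d\n", UCHAR_MAX);
    printf("int:           %d .. %d\n", INT_MIN, INT_MAX);
    return 0;
}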
Now for your question: Arithmetic is done on values of type int by default, unless you cause the operands involved to have a different type. This happens for various reasons, like one of the variables involved having a different type or you using a literal of a different type (like 1.0 or 1L or 1U). Even more importantly, the type of an arithmetic expression promotes from the inside to the outside. Thus, in the statement
char c = 1 + 2 + 3;
The expression 1 + 2 + 3 is evaluated with type int and only converted to char immediately before the assignment. Even more important is that in the C language, you can't do arithmetic on types smaller than int. For instance, in the expression c + 1, where c is of type char, the compiler inserts an implicit conversion from char to int before adding one to c. Thus, a statement like
c = c + 1;
actually behaves like this in C:
c = (char)((int)c + 1);
Thus, ~0 >> 1 actually evaluates to 0xffffffff (-1) on a usual 32 bit architecture, because the type int usually has 32 bits and right shifting of signed types usually replicates the sign bit, so the most significant bit stays a one. Casting to unsigned char causes truncation, with the result being 0xff (255). All arguments but the first to printf are part of a variable argument list, which is a bit complicated but basically means that all types smaller than int are converted to int, float is converted to double, and all other types are left unchanged.
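A small sketch of those three steps as printf calls, assuming a typical machine with 32-bit int, two's complement, and an arithmetic right shift:

#include <stdio.h>

int main(void) {
    printf("%d\n", ~0 >> 1);                  /* -1: the shift happens in int and
                                                 the sign bit is replicated        */
    printf("%d\n", (unsigned char)~0);        /* 255: truncation to 8 bits, then
                                                 promoted back to int for printf   */
    printf("%d\n", (unsigned char)(~0) >> 1); /* 127: the shift now operates on 255 */
    return 0;
}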
Now, how can we get this right? On an ordinary machine with two's complement and no padding bits one could use expressions like these to compute the largest and smallest char, assuming sizeof (char) < sizeof (int):
(1 << CHAR_BIT - 1) - 1; /* largest char */
-(1 << CHAR_BIT - 1); /* smallest char */
For other types, this is going to be slightly more difficult since we need to avoid overflow. Here is an expression that works for all signed integer types on an ordinary machine, where type is the type you want to have the limits of:
(type)(((uintmax_t)1 << sizeof (type) * CHAR_BIT - 1) - 1) /* largest */
(type)-((uintmax_t)1 << sizeof (type) * CHAR_BIT - 1) /* smallest */
For an unsigned type type, you could use this to get the maximum:
~(type)0
Please notice that all these tricks should not appear in portable code.
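For illustration only (these expressions rely on the non-portable assumptions above: two's complement, no padding bits), here is a small sketch that evaluates them for int and compares the results against the limits.h constants:

#include <stdio.h>
#include <stdint.h>
#include <limits.h>

int main(void) {
    int max = (int)(((uintmax_t)1 << (sizeof(int) * CHAR_BIT - 1)) - 1);
    int min = (int)-((uintmax_t)1 << (sizeof(int) * CHAR_BIT - 1));
    printf("%d %d\n", max, INT_MAX);   /* expected to match on an ordinary machine */
    printf("%d %d\n", min, INT_MIN);   /* converting the large unsigned value to int
                                          is implementation-defined                 */
    printf("%u\n", ~(unsigned)0);      /* UINT_MAX */
    return 0;
}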
The exact effect of your actions is different from what you assumed.
0 is not 0000 0000. 0 has type int, which means that it is most likely 0000 0000 0000 0000 0000 0000 0000 0000, depending on how many bits int has on your platform. (I will assume 32-bit int.)
Now, ~0 is, expectedly, 1111 1111 1111 1111 1111 1111 1111 1111, which still has type int and is a negative value.
When you shift it to the right, the result is implementation-defined. Right-shifting negative signed integer values in C does not guarantee that you will obtain 0 in the sign bit. Quite the opposite, most platforms will actually replicate the sign bit when right-shifting. Which means that ~0 >> 1 will still give you 1111 1111 1111 1111 1111 1111 1111 1111.
Note that even if you do this on a platform that shifts-in a 0 into the sign bit when right-shifting negative values, you will still obtain 0111 1111 1111 1111 1111 1111 1111 1111, which is, in the general case, not the maximum value of char you were trying to obtain.
If you want to make sure that a right-shift operation shifts-in 0 bits from the left, you have to either 1) shift an unsigned bit-pattern or 2) shift a signed, but positive bit-pattern. With negative bit patterns you risk running into the sign-extending behavior, meaning that for negative values 1 bits would be shifted-in from the left instead of 0 bits.
Since the C language does not have shifts that would work in the domain of the [unsigned/signed] char type (the operand is promoted to int anyway before the shift), what you can do is make sure that you are shifting a positive int value and make sure that your initial bit-mask has the correct number of 1s in it. That is exactly what you achieve by using (unsigned char) ~0 as the initial mask. (unsigned char) ~0 will participate in the shift as a value of type int equal to 0000 0000 0000 0000 0000 0000 1111 1111 (assuming 8-bit char). After the shift you will obtain 0000 0000 0000 0000 0000 0000 0111 1111, which is exactly what you were trying to obtain.
That only works with unsigned integers. For signed integers, right-shifting a negative number is implementation-defined, and so is the behaviour of bit-wise inversion of a negative value. Not only does it depend on the representation of negative values, but also on which CPU instruction the compiler uses to perform the right shift (some CPUs do not have an arithmetic right shift, for instance).
So, unless you place additional constraints on your implementation, it is not possible to determine the limits of signed integers this way. This implies there is no completely portable way (for signed integers).
Note that whether char is signed or unsigned is also implementation-defined, and that (unsigned char)(~0) >> 1 is subject to integer promotions, so it will not yield a character result, but an int (which makes the %d format specifier correct, although presumably unintended).
Use limits.h to get macros for your implementation's integer limits. This file has to be provided by any standard-compliant C compiler.
Does anyone know what &- does in C programming?
limit= address+ (n &- sizeof(uint));
This isn't really one operator, but two:
(n) & (-sizeof(uint))
i.e. this is performing a bitwise AND operation between n and -sizeof(uint).
What does this mean?
Let's assume sizeof(uint) is 4, so -sizeof(uint) is mathematically -4. Then, by two's complement representation (strictly speaking, unsigned wrap-around, which yields the same bit pattern), -sizeof(uint) is 0xFFFFFFFC or
1111 1111 1111 1111 1111 1111 1111 1100
We can see that this bitwise AND operation will zero out the last two bits of n. This effectively rounds n down to the nearest multiple of sizeof(uint).
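A quick sketch of that effect (assuming uint stands for unsigned int and sizeof(unsigned int) is 4):

#include <stdio.h>
#include <stddef.h>

int main(void) {
    /* Each n is rounded down to the nearest multiple of 4. */
    for (size_t n = 0; n < 9; n++)
        printf("%zu & -sizeof(unsigned int) = %zu\n", n, n & -sizeof(unsigned int));
    return 0;
}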
&- is the binary bitwise AND operator written together with - which is the unary minus operator. Operator precedence (binary & having lowest precedence here) gives us the operands of & as n and -sizeof(uint).
The purpose is to create a bit mask in a very obscure way, relying on unsigned integer arithmetic. Assuming uint is 4 bytes (don't use homebrewed types either btw, use stdint.h), then the code is equivalent to this
n & -(size_t)4
size_t being the type returned by sizeof, which is guaranteed to be a large, unsigned integer type. Applying unary minus on unsigned types is of course nonsense too. Though even if it is obscure, applying minus in unsigned arithmetic results in well-defined wrap-around 1), so in the case of the value 4, we get 0xFFFFFFFFFFFFFFFC on a typical PC where size_t is 64 bits.
n & 0xFFFFFFFFFFFFFFFC will mask out (clear) the 2 least significant bits of n, leaving all other bits untouched.
What the relation between these 2 bits and the size of the type used is, I don't know. I guess that the purpose is to store something equivalent to the type's size in bytes in that area. Something with 4 values will fit in the two least significant bits: binary 0, 1, 10, 11. (The purpose could maybe be masking out misaligned addresses or some such?)
Assuming I guessed correctly, we can write the same code without any obfuscation practices as far more readable code:
~(sizeof(uint32_t)-1)
Which gives us 4-1 = 0x3, ~0x3 = 0xFFFF...FC. Or in case of 8 byte types, 0xFFFF...F8. And so on.
So I'd rewrite the code as
#include <stdint.h>
uint32_t mask = ~(sizeof(uint32_t)-1);
limit = address + (n & mask);
1) C17 6.3.1.3
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. 60)
Where the foot note 60) says:
The rules describe arithmetic on the mathematical value, not the value of a given type of expression.
In this case, adding SIZE_MAX+1 to -4 once brings the value into the range of what can fit inside a size_t variable.
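As a quick sanity check (on a system where size_t is 64 bits), the original -sizeof(uint32_t) and the rewritten ~(sizeof(uint32_t) - 1) produce the same bit pattern:

#include <stdio.h>
#include <stdint.h>

int main(void) {
    printf("%zx\n", -sizeof(uint32_t));       /* unsigned wrap-around: fffffffffffffffc */
    printf("%zx\n", ~(sizeof(uint32_t) - 1)); /* same mask, written without unary minus */
    return 0;
}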
I have the following code:
unsigned char chr = 234; // 1110 1010
unsigned long result = 0;
result = chr << 24;
And now result will equal 18446744073340452864, which is 1111 1111 1111 1111 1111 1111 1111 1111 1110 1010 0000 0000 0000 0000 0000 0000 in binary.
Why is there sign extension being done, when chr is unsigned?
Also if I change the shift from 24 to 8 then result is 59904 which is 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 1110 1010 0000 0000 in binary. Why here is there no extension done here? (Any shift 23 or less doesn't have sign extension done to it)
Also on my current platform sizeof(long) is 8.
What are the rules for automatically casting to larger size types when shifting? It seems to me that if the shift is 23 or less, then chr gets cast to an unsigned type, and if it's 24 or more it gets cast to a signed type? (And why is sign extension even being done at all with a left shift?)
With chr = 234, the expression chr << 24 is evaluated in isolation: chr is promoted to (a 32-bit signed) int and shifted left 24 bits, yielding a negative int value. When you assign to a 64-bit unsigned long, the sign-bit is propagated through the most significant 32 bits of the 64-bit value. Note that the method of calculating chr << 24 is not itself affected by what the value is assigned to.
When the shift is just 8 bits, the result is a positive (32-bit signed) integer, and that sign bit (0) is propagated through the most significant 32-bits of the unsigned long.
To understand this it's easiest to think in terms of values.
Each integral type has a fixed range of representable values. For example, unsigned char usually ranges from 0 to 255; other ranges are possible and you can find your compiler's choice by checking UCHAR_MAX in limits.h.
When doing a conversion between integral types, if the value is representable in the destination type, then the result of the conversion is that value. (This may be a different bit-pattern, e.g. sign extension.)
If the value is not representable in the destination type then:
for signed destinations, the behaviour is implementation-defined (which may include raising a signal).
for unsigned destinations, the value is adjusted modulo the maximum value representable in the type, plus one.
Modern systems handle the signed out-of-range assignment by left-truncating excessive bits; and if it is still out-of-range then it retains the same bit-pattern, but the value changes to whatever value that bit-pattern represents in the destination type.
Moving onto your actual example.
In C, there is something called the integral promotions. With <<, this happens to the left-hand operand; with the arithmetic operators it happens to all operands. The effect of integral promotions is that any value of a type smaller than int is converted to the same value with type int.
Further, the definition of << 24 is multiplication by 2^24 (where this has the type of the promoted left operand), with undefined behaviour if this overflows. (Informally: shifting into the sign bit causes UB).
So, putting all the conversions explicitly, your code is
result = (unsigned long) ( ((int)chr) * 16777216 );
Now, the result of this calculation is 3925868544, which, if you are on a typical system with 32-bit int, is greater than INT_MAX (2147483647), so the behaviour is undefined.
If we want to explore results of this undefined behaviour on typical systems: what may happen is the same procedure I outlined earlier for out-of-range assignment. The bit-pattern of 3925868544 is of course 1110 1010 0000 0000 0000 0000 0000 0000. Treating this as the pattern of an int using 2's complement gives the int -369098752.
Finally we have the conversion of this value to unsigned long. -369098752 is out of range for unsigned long; and the rule for an unsigned destination is to adjust the value modulo ULONG_MAX+1. So the value you are seeing is 18446744073709551615 + 1 - 369098752.
If your intent was to do the calculation in unsigned long precision, you need to make one of the operands unsigned long; e.g. do ((unsigned long)chr) << 24. (Note: 24ul won't work, the type of the right-hand operand of << or >> does not affect the left-hand operand).
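A minimal sketch of that fix (the printed value is what one would expect on a typical system with 8-bit char and 64-bit unsigned long):

#include <stdio.h>

int main(void) {
    unsigned char chr = 234;
    /* Convert first, so the shift is done in unsigned long and no
       intermediate signed int overflow or sign extension can occur. */
    unsigned long result = (unsigned long)chr << 24;
    printf("%lu\n", result);   /* 3925868544 */
    return 0;
}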
We can assume an int is 32 bits in 2's complement
The only legal operators are: ! ~ & ^ | + << >>
At this point I am using brute force:
int a=0x01;
x=(x+1)>>1; //(have tried with just x instead of x+1 as well)
a = a+(!(!x));
...
with the last 2 statements repeated 32 times. This adds 1 to a every time x is shifted one place and is != 0, for all 32 bits.
Using the test compiler, it says my method fails on test case 0x7FFFFFFF (a 0 followed by 31 1's) and says this number requires 32 bits to represent. I don't see why this isn't 31 (which my method computes). Can anyone explain why? And what do I need to change to account for this?
0x7FFFFFFF does require 32 bits. It could be expressed as an unsigned integer in only 31 bits:
111 1111 1111 1111 1111 1111 1111 1111
but if we interpret that as a signed integer using two's complement, then the leading 1 would indicate that it's negative. So we have to prepend a leading 0:
0 111 1111 1111 1111 1111 1111 1111 1111
which then makes it 32 bits.
As for what you need to change: your current program actually has undefined behavior. If 0x7FFFFFFF (2^31 - 1) is the maximum allowed integer value, then 0x7FFFFFFF + 1 cannot be computed. It is likely to result in -2^31, but there's absolutely no guarantee: the standard allows compilers to do absolutely anything in this case, and real-world compilers do in fact perform optimizations that can happen to give shocking results when you violate this requirement. Similarly, there's no specific guarantee what ... >> 1 will mean if ... is negative, though in this case compilers are required, at least, to choose a specific behavior and document it. (Most compilers choose to produce another negative number by copying the leftmost 1 bit, but there's no guarantee of that.)
So really the only sure fix is either:
to rewrite your code as a whole, using an algorithm that doesn't have these problems; or
to specifically check for the case that x is 0x7FFFFFFF (returning a hardcoded 32) and the case that x is negative (replacing it with ~x, i.e. -(x+1), and proceeding as usual); a sketch of this second approach follows below.
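Here is a sketch of that idea using ordinary control flow rather than the puzzle's restricted operator set (bits_needed is just an illustrative name; the special case for 0x7FFFFFFF disappears because this version never computes x + 1):

#include <stdio.h>

/* How many bits are needed to represent x in two's complement?
   Avoids signed overflow and right shifts of negative values. */
static int bits_needed(int x) {
    if (x < 0)
        x = ~x;        /* ~x == -(x+1) >= 0 needs the same number of bits */
    int n = 1;         /* the sign bit is always needed */
    while (x != 0) {   /* count the magnitude bits */
        x >>= 1;
        n++;
    }
    return n;
}

int main(void) {
    printf("%d\n", bits_needed(0x7FFFFFFF));  /* 32 */
    printf("%d\n", bits_needed(-1));          /* 1  */
    printf("%d\n", bits_needed(5));           /* 4  */
    return 0;
}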
Please try this code to check whether a signed integer x can be fitted into n bits. The function returns 1 when it does and 0 otherwise.
// http://www.cs.northwestern.edu/~wms128/bits.c
int check_bits_fit_in_2s_complement(signed int x, unsigned int n) {
    int mask = x >> 31;  // all ones if x is negative, all zeros otherwise
                         // (assumes 32-bit int with arithmetic right shift)
    // (~x & mask) + (x & ~mask) selects ~x for negative x and x otherwise;
    // shifting that right by n - 1 (written n + ~0) must leave 0 if x fits in n bits.
    return !(((~x & mask) + (x & ~mask)) >> (n + ~0));
}
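A few spot checks showing how one might call it (this assumes 32-bit int and an arithmetic right shift, just as the function itself does):

#include <stdio.h>

int check_bits_fit_in_2s_complement(signed int x, unsigned int n); /* the function above */

int main(void) {
    printf("%d\n", check_bits_fit_in_2s_complement(5, 4));          /* 1: 0101 fits in 4 bits */
    printf("%d\n", check_bits_fit_in_2s_complement(8, 4));          /* 0: 8 needs 5 bits      */
    printf("%d\n", check_bits_fit_in_2s_complement(-8, 4));         /* 1: 1000 is -8          */
    printf("%d\n", check_bits_fit_in_2s_complement(0x7FFFFFFF, 31)); /* 0: needs 32 bits      */
    printf("%d\n", check_bits_fit_in_2s_complement(0x7FFFFFFF, 32)); /* 1                     */
    return 0;
}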
Assume I have the variable x initialized to 425. In binary, that is 110101001.
Bitshifting it to the right by 2 as follows: int a = x >> 2;, the answer is: 106. In binary that is 1101010. This makes sense as the two right-most bits are dropped and two zero's are added to the left side.
Bitshifting it to the left by 2 as follows: int a = x << 2;, the answer is: 1700. In binary this is 11010100100. I don't understand how this works. Why are the two leftmost bits preserved? How can I drop them?
Thank you,
This is because int is probably 32-bits on your system. (Assuming x is type int.)
So your 425, is actually:
0000 0000 0000 0000 0000 0001 1010 1001
When left-shifted by 2, you get:
0000 0000 0000 0000 0000 0110 1010 0100
Nothing gets shifted off until you go all the way past 32. (Strictly speaking, overflow of signed-integer is undefined behavior in C/C++.)
To drop the bits that are shifted off, you need to bitwise AND against a mask that's the original length of your number:
int a = (425 << 2) & 0x1ff; // 0x1ff is for 9 bits as the original length of the number.
First off, don't shift signed integers. The bitwise operations are only universally unambiguous for unsigned integral types.
Second, why shift if you can use * 4 and / 4?
Third, you only drop bits on the left when you exceed the size of the type. If you want to "truncate on the left" mathematically, perform a modulo operation:
(x * 4) % 256
The bitwise equivalent is AND with a bit pattern: (x << 2) & 0xFF
(That is, the fundamental unsigned integral types in C are always implicitly "modulo 2^n", where n is the number of bits of the type.)
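A tiny sketch of that equivalence, using the question's value 425 and an unsigned x:

#include <stdio.h>

int main(void) {
    unsigned int x = 425;
    /* Keep only the low 8 bits of the shifted result, two equivalent ways. */
    printf("%u\n", (x * 4) % 256);    /* 164 */
    printf("%u\n", (x << 2) & 0xFF);  /* 164 */
    return 0;
}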
Why would you expect them to be dropped? Your int (probably) consumes 4 bytes. You're shifting them into a space that it rightfully occupies.
The entire 4-byte space in memory is used during evaluation. You'd need to shift entirely out of that space to "drop" them.
Consider these definitions:
int x=5;
int y=-5;
unsigned int z=5;
How are they stored in memory? Can anybody explain the bit representation of these in memory?
Can int x=5 and int y=-5 have same bit representation in memory?
ISO C states what the differences are.
The int data type is signed and has a minimum range of at least -32767 through 32767 inclusive. The actual values are given in limits.h as INT_MIN and INT_MAX respectively.
An unsigned int has a minimal range of 0 through 65535 inclusive with the actual maximum value being UINT_MAX from that same header file.
Beyond that, the standard does not mandate two's complement notation for encoding the values; that's just one of the possibilities. The three allowed encodings would represent 5 and -5 as follows (using 16-bit data types):
   | two's complement    | ones' complement    | sign/magnitude      |
---+---------------------+---------------------+---------------------+
 5 | 0000 0000 0000 0101 | 0000 0000 0000 0101 | 0000 0000 0000 0101 |
-5 | 1111 1111 1111 1011 | 1111 1111 1111 1010 | 1000 0000 0000 0101 |
---+---------------------+---------------------+---------------------+
In two's complement, you get a negative of a number by inverting all bits then adding 1.
In ones' complement, you get a negative of a number by inverting all bits.
In sign/magnitude, the top bit is the sign so you just invert that to get the negative.
Note that positive values have the same encoding for all representations, only the negative values are different.
Note further that, for unsigned values, you do not need to use one of the bits for a sign. That means you get more range on the positive side (at the cost of no negative encodings, of course).
And no, 5 and -5 cannot have the same encoding regardless of which representation you use. Otherwise, there'd be no way to tell the difference.
As an aside, there are currently moves underway, in both C and C++ standards, to nominate two's complement as the only encoding for negative integers.
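For the curious, here is a small sketch that prints the stored bit pattern of an int by viewing it through an unsigned int of the same width (print_bits is a hypothetical helper; on a two's complement machine this shows the representation directly):

#include <stdio.h>
#include <limits.h>

static void print_bits(int v) {
    unsigned int u = (unsigned int)v;  /* conversion is modulo UINT_MAX + 1 */
    for (int i = (int)(sizeof u * CHAR_BIT) - 1; i >= 0; i--)
        putchar(((u >> i) & 1u) ? '1' : '0');
    putchar('\n');
}

int main(void) {
    print_bits(5);   /* 000...00101 */
    print_bits(-5);  /* 111...11011 on a two's complement machine */
    return 0;
}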
It's all just about memory: in the end, all numerical values are stored in binary.
A 32 bit unsigned integer can contain values from all binary 0s to all binary 1s.
When it comes to a 32 bit signed integer, one of its bits (the most significant) is a flag which marks the value as positive or negative.
The C standard specifies that unsigned numbers are stored in binary (with optional padding bits). Signed numbers can be stored in one of three formats: sign and magnitude, two's complement, or one's complement. Interestingly, that rules out certain other representations like excess-n or base −2.
However, most machines and compilers store signed numbers in 2's complement.
int is normally 16 or 32 bits. The standard says that int should be whatever is most efficient for the underlying processor; as long as it is >= short and <= long, it is allowed by the standard.
On some machines and OSs, however, history has caused int not to be the best size for the current iteration of hardware.
Here is a very nice link which explains the storage of signed and unsigned int in C:
http://answers.yahoo.com/question/index?qid=20090516032239AAzcX1O
Taken from the above article:
"process called two's complement is used to transform positive numbers into negative numbers. The side effect of this is that the most significant bit is used to tell the computer if the number is positive or negative. If the most significant bit is a 1, then the number is negative. If it's 0, the number is positive."
Assuming int is a 16-bit integer (which depends on the C implementation; most are 32-bit nowadays), the bit representations differ as follows:
5 = 0000000000000101
-5 = 1111111111111011
If the binary pattern 1111111111111011 were assigned to an unsigned int, it would be decimal 65531.
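A short sketch of that last observation, using the fixed-width types from stdint.h (int16_t and uint16_t are assumed to be available):

#include <stdio.h>
#include <stdint.h>

int main(void) {
    int16_t y = -5;
    uint16_t u = (uint16_t)y;    /* value converted modulo 65536: 65536 - 5 */
    printf("%u\n", (unsigned)u); /* prints 65531 */
    return 0;
}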