C find maximum two's complement integer

I am tasked with finding the maximum two's complement integer, or TMax. I am at a complete loss for how to do this. I know that the correct value is 0x7fffffff, or 2147483647, which is the maximum number for a 32-bit integer, but I do not know how to get to that result. I cannot use functions or conditionals, and I can use at most 4 operations. Can anyone help explain this to me? I know the way to find the maximum number for a certain bit count is 2^(bits - 1) - 1, so 2^31 - 1 = 2147483647.

Assuming you know that your machine uses two's complement representation, this is how you would do so in a standard compliant manner:
unsigned int x = ~0u;
x >>= 1;
printf("max int = %d\n", (int)x);
By using an unsigned int, you prevent any implementation defined behavior caused by right shifting a negative value.
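A complete program built from that snippet might look like this (a minimal sketch):
#include <stdio.h>

int main(void)
{
    unsigned int x = ~0u;   /* all value bits set, i.e. UINT_MAX */
    x >>= 1;                /* right shifting an unsigned value is well defined */
    printf("max int = %d\n", (int)x);
    return 0;
}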

int TMax = -1u >> 1; or int TMax = -1u / 2; is sufficient to find the maximum int when INT_MAX == UINT_MAX/2.
This "works" whether int is encoded as 2's complement, or the now rare ones' complement or sign-magnitude.
Better to use
#include <limits.h>
int TMax = INT_MAX;
Other tricks can involve undefined, implementation defined, unspecified behavior which are best avoided in C.
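For illustration, a minimal sketch combining both approaches (assuming INT_MAX == UINT_MAX/2 holds, as described above):
#include <limits.h>
#include <stdio.h>

int main(void)
{
    int a = -1u >> 1;   /* UINT_MAX >> 1; fits in int when INT_MAX == UINT_MAX/2 */
    int b = -1u / 2;    /* same value via division */
    int c = INT_MAX;    /* the portable, preferred way */
    printf("%d %d %d\n", a, b, c);
    return 0;
}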

There are two scenarios in which you may be looking for the maximum positive number: either given an integer data type, or given a number of bits. There are also two solutions.
Fill and shift right
Working in an integer data type of a size that exactly matches the size of the desired twos complement data type, you might be able to solve the problem by
((unsigned type) ~0) >> 1
or equivalently,
((unsigned type) ~0) / 2.
For example, on a machine where short is 16 bits,
(unsigned short) ~0 ==> 0xFFFF (65535)
((unsigned short) ~0) >> 1 ==> 0x7FFF (32767)
On a 32 bit data type, this method gives us 0x7FFFFFFF (2147483647).
In C, an integer type has a minimum size only; e.g., an int can be 16 bits, 32 bits, or larger. But the word size used in the calculation must exactly match that of the intended target.
Also, note that the data must be an unsigned type. The right shift for a signed type is usually implemented as a sign extended shift (the sign bit is copied into the result).
Set the sign bit only and subtract 1
The second technique, which works for any word size equal to or larger than the number of bits of the desired twos complement word size, is
((unsigned integer_type) 1 << (n - 1)) - 1
For example, in any integer word size equal to or larger than 16 bits, we can find the TMAX for 16 bits as
((unsigned integer_type) 1) << 15 ==> binary 1000 0000 0000 0000 (0x8000)
(((unsigned integer_type) 1) << 15) - 1 ==> 0111 1111 1111 1111 (0x7FFF)
This is robust and works in almost any scenario that provides an adequate word size.
Again the data type for the calculation has to be unsigned if the word size in the calculation is that of the target. This is not necessary for a larger word size.
Examples
In the first example, we show that the second method works for 32 bits, using long or long long types.
#include <stdio.h>
int main() {
printf( "%ld\n", (long) ( ( ((unsigned long) 1)<<31 ) - 1 ) );
printf( "%lld\n", (long long) ( ( ((unsigned long long) 1)<<31 ) - 1 ) );
}
Output:
2147483647
2147483647
And here we show that the first method, shift right from all bits set, fails when int is not exactly 32 bits, which, as noted, is not guaranteed in C.
#include <stdio.h>
int main() {
printf( "from long long %lld (%zu bits)\n", ( (unsigned long long) ~0 )>>1,
sizeof(unsigned long long)*8 );
printf( "from long %ld (%zu bits)\n", ( (unsigned long) ~0 )>>1,
sizeof(unsigned long)*8 );
printf( "from int %d (%zu bits)\n", ( (unsigned int) ~0 )>>1,
sizeof(unsigned int)*8 );
printf( "from short %d (%zu bits)\n", ( (unsigned short) ~0 )>>1,
sizeof(unsigned short)*8 );
}
Output:
from long long 9223372036854775807 (64 bits)
from long 9223372036854775807 (64 bits)
from int 2147483647 (32 bits)
from short 32767 (16 bits)
Again, recall that the C language only guarantees a minimum size for any integer data types. An int can be 16 bits or 32 bits or larger, depending on your platform.

Thanks for the help, everyone! Turns out I cannot use macros, unsigned, or longs. I came to this solution:
~(1 << 31)
That generates the correct output, so I will leave it at that!

Related

What does data type range really mean?

I have been reading some definitions on the internet that usually says that for example:
" ...Since on most computers “int” data type is of 2 bytes, or 16 bits, it can only store 2^16 numbers..."
and
" ... And since 2^16=65535, it can only hold that many numbers ... " - for unsigned int
I've also seen on some website that the maximum value that an int variable can hold is "2,147,483,647". So I've been wondering about the relation between the number 65535 and the number 2,147,483,647.
I did some tests, and I saw that the maximum value that I can store in an int variable is actually 2,147,483,647, so what does 65535 actually mean then?
link: https://www.quora.com/The-range-of-int-data-type-Is-32768-to-32767-What-does-this-actually-mean-What-range-of-numbers-I-can-store-in-int
Over the history of computers, byte and word sizes have varied considerably; you didn't always have a neat system of 8-bit bytes, 16-bit words, 32-bit longwords, etc. When C was being developed in the early 1970s, there were systems with 9-bit bytes and 36-bit words, systems that weren't byte-addressed at all, word sizes in excess of 40 bits, etc. Similarly, some systems had padding or guard bits that didn't contribute to representing the value - you could have an 18-bit type that could still only represent 2^16 values. Making all word sizes powers of 2 is convenient, but it isn't required.
Because the situation was somewhat variable, the C language standard only specifies the minimum range of values that a type must be able to represent. signed char must be able to represent at least the range -127...127, so it must be at least 8 bits wide. A short must be able to represent at least the range -32767...32767, so it must be at least 16 bits wide, etc. Also, representation of signed integers varied as well - two's complement is the most common, but you also had sign-magnitude and ones' complement representations, which encode two values for zero (positive and negative) - that's why the ranges don't go from -2^(N-1) to 2^(N-1)-1. The individual implementations then map those ranges onto the native word sizes provided by the hardware.
Now, it's not an accident that those particular ranges were specified - most hardware was already using 8-bit bytes, 16-bit words, 32-bit longwords, etc. Many of C's abstractions (including type sizes and behavior) are based on what the hardware already provides.
int is somewhat special - it's only required to represent at least the range -32767...32767, but it's also commonly set to be the same as the native word size, which since the late '80s has been 32 bits on most platforms.
To see what the actual ranges are on your platform, you can look at the macros defined in <limits.h>. Here's a little program I womped up to show what some of the size definitions are on my system:
#include <stdio.h>
#include <limits.h>
#define EXP(x) #x
#define STR(x) EXP(x)
#define DISPL(t,m) printf( "%30s = %2zu, %15s = %35s\n", "sizeof(" #t ")", sizeof(t), #m, STR(m) )
#define DISPL2(t,m1,m2) printf( "%30s = %2zu, %15s = %35s, %15s = %35s\n", "sizeof(" #t ")", sizeof(t), #m1, STR(m1), #m2, STR(m2) )
int main( void )
{
    DISPL(char, CHAR_BIT);
    DISPL2(char, CHAR_MIN, CHAR_MAX);
    DISPL2(signed char, SCHAR_MIN, SCHAR_MAX);
    DISPL(unsigned char, UCHAR_MAX);
    DISPL2(short, SHRT_MIN, SHRT_MAX);
    DISPL(unsigned short, USHRT_MAX);
    DISPL2(int, INT_MIN, INT_MAX);
    DISPL(unsigned int, UINT_MAX );
    DISPL2(long, LONG_MIN, LONG_MAX );
    DISPL(unsigned long, ULONG_MAX );
    DISPL2(long long, LLONG_MIN, LLONG_MAX );
    DISPL(unsigned long long, ULLONG_MAX );
    return 0;
}
And here's the result:
$ ./sizes
sizeof(char) = 1, CHAR_BIT = 8
sizeof(char) = 1, CHAR_MIN = (-127-1), CHAR_MAX = 127
sizeof(signed char) = 1, SCHAR_MIN = (-127-1), SCHAR_MAX = 127
sizeof(unsigned char) = 1, UCHAR_MAX = (127*2 +1)
sizeof(short) = 2, SHRT_MIN = (-32767 -1), SHRT_MAX = 32767
sizeof(unsigned short) = 2, USHRT_MAX = (32767 *2 +1)
sizeof(int) = 4, INT_MIN = (-2147483647 -1), INT_MAX = 2147483647
sizeof(unsigned int) = 4, UINT_MAX = (2147483647 *2U +1U)
sizeof(long) = 8, LONG_MIN = (-9223372036854775807L -1L), LONG_MAX = 9223372036854775807L
sizeof(unsigned long) = 8, ULONG_MAX = (9223372036854775807L *2UL+1UL)
sizeof(long long) = 8, LLONG_MIN = (-9223372036854775807LL-1LL), LLONG_MAX = 9223372036854775807LL
sizeof(unsigned long long) = 8, ULLONG_MAX = (9223372036854775807LL*2ULL+1ULL)
The size of an int is not necessarily the same on all implementations.
The C standard dictates that the range of an int must be at least -32767 to 32767, but it can be more. On most systems you're likely to come in contact with, an int will have range -2,147,483,648 to 2,147,483,647 i.e. 32-bit two's complement representation.
16 bits can have a certain number of unique combinations (see here for an explanation of this). To figure out the number, you just need to raise 2 to the power of the number of bits. 2^16 is 65536. Since counting starts at zero, that means that 65535 is the maximum. That's for an unsigned integer though.
A signed integer uses one of the bits to indicate whether the number is positive or negative (in the usual two's complement encoding, this is the most significant bit). That leaves only 15 bits with which to express the magnitude. 2^15 is 32768, meaning that in the positive direction the number can go from 0 to 32767. In the negative direction, the lowest the number can go is -32768. The total number of combinations is still 65536; the bits just mean something different in the case of a signed 16-bit integer.
With 32-bit integers, the logic is exactly the same. 2^32 is 4294967296, meaning the highest that a 32-bit unsigned number can go is 4294967295. There are exactly 2^32 combinations possible, but because counting starts at zero, the highest number is exactly one less than 2^32. If you reserve one bit for the sign, you can only go up to 2^31 - 1 (2^31 being 2147483648). So, using the same logic as for 16-bit numbers, we can figure out that the highest a signed 32-bit integer can go is 2147483647, and the lowest it can go is -2147483648.
As for which data types use 8, 16, 32 or 64 bits, that's platform-dependent. An int these days is almost always 32 bits, but that's not always been the case. If you want a data type to have a guaranteed size, you have to specifically choose one with an explicit size. In C, this can be done with the types defined in stdint.h. For instance, uint32_t would be an unsigned integer with 32 bits, meaning that on any platform, you can guarantee that the highest it can go is 4294967295.
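A minimal sketch of that, assuming the (optional but nearly universal) exact-width types are available:
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t u = UINT32_MAX;   /* exactly 32 bits on any platform: 4294967295 */
    int32_t  s = INT32_MAX;    /* exactly 32 bits on any platform: 2147483647 */
    printf("%" PRIu32 " %" PRId32 "\n", u, s);
    return 0;
}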

Is there a generic "isolate a single byte" bit mask for all systems, irrespective of CHAR_BIT?

If CHAR_BIT == 8 on your target system (most cases), it's very easy to mask out a single byte:
unsigned char lsb = foo & 0xFF;
However, there are a few systems and C implementations out there where CHAR_BIT is neither 8 nor a multiple thereof. Since the C standard only mandates a minimum range for char values, there is no guarantee that masking with 0xFF will isolate an entire byte for you.
I've searched around trying to find information about a generic "byte mask", but so far haven't found anything.
There is always the O(n) solution:
unsigned char mask = 1;
size_t i;
for (i = 0; i < CHAR_BIT; i++)
{
    mask |= (mask << i);
}
However, I'm wondering if there is any O(1) macro or line of code somewhere that can accomplish this, given how important this task is in many system-level programming scenarios.
The easiest way to extract an unsigned char from an integer value is simply to cast it to unsigned char:
(unsigned char) SomeInteger
Per C 2018 6.3.1.3 2, the result is the remainder of SomeInteger modulo UCHAR_MAX+1. (This is a non-negative remainder; it is always adjusted to be greater than or equal to zero and less than UCHAR_MAX+1.)
Assigning to an unsigned char has the same effect, as assignment performs a conversion (and initializing works too):
unsigned char x;
…
x = SomeInteger;
If you want an explicit bit mask, UCHAR_MAX is such a mask. This is so because unsigned integers are pure binary in C, and the maximum value of an unsigned integer has all value bits set. (Unsigned integer types in general may also have padding bits, but unsigned char may not.)
One difference can occur in very old or esoteric systems: If a signed integer is represented with sign-and-magnitude or one’s complement instead of today’s ubiquitous two’s complement, then the results of extracting an unsigned char from a negative value will differ depending on whether you use the conversion method or the bit-mask method.
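For example, a minimal sketch comparing the two methods (the values agree on a two's complement system):
#include <limits.h>
#include <stdio.h>

int main(void)
{
    int foo = -1000;
    unsigned char a = (unsigned char) foo;   /* conversion: foo modulo UCHAR_MAX+1 */
    unsigned char b = foo & UCHAR_MAX;       /* bit mask keeping the low "byte" */
    printf("%u %u\n", (unsigned) a, (unsigned) b);   /* identical on two's complement */
    return 0;
}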
On review (after accept), @Eric Postpischil's point about UCHAR_MAX makes for a preferable mask.
#define BYTE_MASK UCHAR_MAX
The value UCHAR_MAX shall equal 2^CHAR_BIT − 1. C11dr §5.2.4.2.1 2
unsigned char cannot have padding bits, so UCHAR_MAX is always the all-bits-set pattern of a character type and hence of a C "byte".
some_signed & some_unsigned is a problem on non-2's-complement systems, as some_signed is converted to unsigned before the &, thus changing the bit pattern for negative values. To avoid this, the all-ones mask needs to be signed when masking signed types; the problematic conversion typically occurs with code like foo & UINT_MAX.
Conclusion
Assume: foo is of some integer type.
If only 2's complement is of concern, use a cast - it does not change the bit pattern.
unsigned char lsb = (unsigned char) foo;
Otherwise, with any integer encoding and UCHAR_MAX <= INT_MAX
unsigned char lsb = foo & UCHAR_MAX;
Otherwise TBD
Shifting an unsigned 1 left by CHAR_BIT and then subtracting 1 will work even on esoteric non-2's-complement systems (credit: @Some programmer dude). Be sure to use unsigned math.
On such systems, this preserves the bit pattern, unlike an (unsigned char) cast on negative integers.
unsigned char mask = (1u << CHAR_BIT) - 1u;
unsigned char lsb = foo & mask;
Or make a define
#define BYTE_MASK ((1u << CHAR_BIT) - 1u)
unsigned char lsb = foo & BYTE_MASK;
To also handle those pesky cases where UINT_MAX == UCHAR_MAX where 1u << CHAR_BIT would be UB, shift in 2 steps.
#define BYTE_MASK (((1u << (CHAR_BIT - 1)) << 1u) - 1u)
UCHAR_MAX does not have to be equal to (1U << CHAR_BIT) - 1U
you actually need to AND with that calculated value, not with UCHAR_MAX:
value & ((1U << CHAR_BIT) - 1U)
Many real implementations (for example TI's) define UCHAR_MAX as 255 and emit code which behaves like that of machines having 8-bit bytes. This is done to preserve compatibility with code written for other targets.
For example
unsigned char x;
x++;
will generate code which checks whether the value of x is larger than UCHAR_MAX and, if so, zeroes x.
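A minimal sketch of that wrap-around behaviour:
#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned char x = UCHAR_MAX;
    x++;                             /* unsigned char arithmetic wraps modulo UCHAR_MAX+1 */
    printf("%u\n", (unsigned) x);    /* prints 0 */
    return 0;
}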

set most significant bit in C

I am trying to set the most significant bit in a long long unsigned, x.
To do that I am using this line of code:
x |= 1<<((sizeof(x)*8)-1);
I thought this should work, because sizeof gives the size in bytes, so I multiplied by 8 and subtracted one to set the final bit. But whenever I do that, the compiler gives this warning: "warning: left shift count >= width of type"
I don't understand why this error is occurring.
The 1 that you are shifting is a constant of type int, which means that you are shifting an int value by (sizeof(unsigned long long) * 8) - 1 bits. This shift can easily be more than the width of int, which is apparently what happened in your case.
If you want to obtain some bit-mask mask of unsigned long long type, you should start with an initial bit-mask of unsigned long long type, not of int type.
1ull << (sizeof(x) * CHAR_BIT - 1)
An arguably better way to build the same mask would be
~(-1ull >> 1)
or
~(~0ull >> 1)
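Put together, a minimal sketch of the fix in context (x is assumed to be the unsigned long long from the question):
#include <limits.h>   /* CHAR_BIT */
#include <stdio.h>

int main(void)
{
    unsigned long long x = 0;
    x |= 1ull << (sizeof x * CHAR_BIT - 1);   /* shift an unsigned long long, not an int */
    printf("%llx\n", x);                      /* 8000000000000000 for a 64-bit unsigned long long */
    return 0;
}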
use 1ULL << instead of 1 <<
Using just "1" makes you shift an integer. 1ULL will be an unsigned long long which is what you need.
An integer will probably be 32 bits and long long probably 64 bits wide. So shifting:
1 << ((sizeof(long long)*8)-1)
will be (most probably):
1 << 63
Since 1 is an integer which is (most probably) 32 bits you get a warning because you are trying to shift past the MSB of a 32 bit value.
The literal 1 you are shifting is not automatically an unsigned long long (but an int) and thus does not have as many bits as you need. Suffix it with ULL (i.e., 1ULL), or cast it to unsigned long long before shifting to make it the correct type.
Also, to be a bit safer for strange platforms, replace 8 with CHAR_BIT. Note that this is still not necessarily the best way to set the most significant bit, see, e.g., this question for alternatives.
You should also consider using a type such as uint64_t if you're assuming unsigned long long to be a certain width, or uint_fast64_t/uint_least64_t if you need at least a certain width, or uintmax_t if you need the largest available type.
Thanks to the two's complement representation of negative integers, the most negative integer has exactly the desired bit pattern, with only the MSB set. So x |= (unsigned long long) LLONG_MIN; should work too.

Difference between unsigned int and int

I read about twos complement on wikipedia and on stack overflow, this is what I understood but I'm not sure if it's correct
signed int
the left-most bit is interpreted as -2^31, and this is how we can have negative numbers
unsigned int
the left-most bit is interpreted as +2^31, and this is how we achieve large positive numbers
update
What will the compiler see when we store 3 vs -3?
I thought 3 is always 00000000000000000000000000000011
and -3 is always 11111111111111111111111111111101
example for 3 vs -3 in C:
unsigned int x = -3;
int y = 3;
printf("%d %d\n", x, y); // -3 3
printf("%u %u\n", x, y); // 4294967293 3
printf("%x %x\n", x, y); // fffffffd 3
Two's complement is a way to represent negative integers in binary.
First of all, here are the standard 32-bit integer ranges:
Signed = -(2 ^ 31) to ((2 ^ 31) - 1)
Unsigned = 0 to ((2 ^ 32) - 1)
In two's complement, a negative is represented by inverting the bits of its positive equivalent and adding 1:
10 which is 00001010 becomes -10 which is 11110110 (if the numbers were 8-bit integers).
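A minimal sketch of that invert-and-add-one rule, using 8-bit values:
#include <stdio.h>

int main(void)
{
    unsigned char pos = 10;                          /* 0000 1010 */
    unsigned char neg = (unsigned char) (~pos + 1);  /* invert the bits and add 1 */
    printf("%02X\n", (unsigned) neg);                /* F6, the 8-bit pattern of -10 */
    return 0;
}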
Also, the binary representation is only important if you plan on using bitwise operators.
If you're doing basic arithmetic, then this is unimportant.
The only time this may give unexpected results outside of the aforementioned cases is taking the absolute value of the most negative signed value, -(2^31), which overflows and still gives a negative result.
Your problem does not have to do with the representation, but the type.
A negative number stored in an unsigned integer keeps the same bit pattern; the difference is that it is interpreted as a very large positive number, since the type has no sign bit and all values are positive.
You should also realize that ((2^32) - 5) is the exact same thing as -5 if the value is unsigned, etc.
Therefore, the following holds true:
unsigned int x = UINT_MAX - 4; /* 2^32 - 5 */
unsigned int y = -5;
if (x == y) {
    printf("Negative values wrap around in unsigned integers on underflow.");
}
else {
    printf("Unsigned integer underflow is undefined!");
}
The numbers don't change, just the interpretation of the numbers. For most two's complement processors, add and subtract do the same math, but set a carry / borrow status assuming the numbers are unsigned, and an overflow status assuming the number are signed. For multiply and divide, the result may be different between signed and unsigned numbers (if one or both numbers are negative), so there are separate signed and unsigned versions of multiply and divide.
For 32-bit integers, for both signed and unsigned numbers, the n-th bit is always interpreted as +2^n.
For signed numbers with the 31st bit set, the result is adjusted by -2^32.
Example:
1111 1111 1111 1111 1111 1111 1111 1111 (binary) as an unsigned int is interpreted as 2^31 + 2^30 + ... + 2^1 + 2^0. The interpretation of this as a signed int would be the same MINUS 2^32, i.e. 2^31 + 2^30 + ... + 2^1 + 2^0 - 2^32 = -1.
(Well, it can be said that for signed numbers with the 31st bit set, this bit is interpreted as -2^31 instead of +2^31, like you said in the question. I find this way a little less clear.)
Your representation of 3 and -3 is correct: 3 = 0x00000003, -3 + 2^32 = 0xFFFFFFFD.
Yes, you are correct; allow me to explain a bit further for clarification purposes.
The difference between int and unsigned int is how the bits are interpreted. The machine processes unsigned and signed values in much the same way, but in the signed case one bit is reserved for the sign. Two's complement notation is very readable when dealing with related subjects.
Example:
The number 5 is 0101; its two's complement (its negation) is 1011.
In C++, it depends when you should use each data type. You should use unsigned values when functions or operators return those values. ALUs handle signed and unsigned variables very similarly.
The exact rules for writing a number in two's complement are as follows:
If the number is positive, write its binary value as usual (it can count up to 2^(32-1) - 1)
If it is 0, use all zeroes
For negatives, flip all the 1's and 0's, then add 1.
Example 2 (the beauty of two's complement):
-2 + 2 = 0 is computed as 0010 + 1110, which is 10000. Discarding the overflow bit at the end, we have our result: 0000.
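A minimal sketch of that 4-bit example, masking with 0xF so the dropped carry is mimicked:
#include <stdio.h>

int main(void)
{
    unsigned pos = 0x2u;                 /* 0010 */
    unsigned neg = (~pos + 1u) & 0xFu;   /* 1110, i.e. -2 in 4-bit two's complement */
    unsigned sum = (pos + neg) & 0xFu;   /* 1 0000 with the carry-out discarded */
    printf("%X\n", sum);                 /* prints 0 */
    return 0;
}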

Tilde C unsigned vs signed integer

For example:
unsigned int i = ~0;
Result: Max number I can assign to i
and
signed int y = ~0;
Result: -1
Why do I get -1? Shouldn't I get the maximum number that I can assign to y?
Both 4294967295 (a.k.a. UINT_MAX) and -1 have the same binary representation of 0xFFFFFFFF, or 32 bits all set to 1. This is because signed numbers are represented using two's complement. A negative number has its MSB (most significant bit) set to 1, and its magnitude is found by flipping all the bits, adding 1 and multiplying by -1. So if you have the MSB set to 1 and the rest of the bits also set to 1, you flip them (get 32 zeros), add 1 (get 1) and multiply by -1 to finally get -1.
This makes it easier for the CPU to do the math as it needs no special exceptions for negative numbers. For example, try adding 0xFFFFFFFF (-1) and 1. Since there is only room for 32 bits, this will overflow and the result will be 0 as expected.
See more at:
http://en.wikipedia.org/wiki/Two%27s_complement
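A minimal sketch of that overflow, pinning the width to 32 bits with a fixed-width type:
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t x = 0xFFFFFFFFu;     /* the 32-bit pattern of -1 */
    x = x + 1u;                   /* only 32 bits are kept, so the result wraps to 0 */
    printf("%u\n", (unsigned) x);
    return 0;
}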
unsigned int i = ~0;
Result: Max number I can assign to i
Usually, but not necessarily. The expression ~0 evaluates to an int with all (non-padding) bits set. The C standard allows three representations for signed integers,
two's complement, in which case ~0 = -1 and assigning that to an unsigned int results in (-1) + (UINT_MAX + 1) = UINT_MAX.
ones' complement, in which case ~0 is either a negative zero or a trap representation; if it's a negative zero, the assignment to an unsigned int results in 0.
sign-and-magnitude, in which case ~0 is INT_MIN == -INT_MAX, and assigning it to an unsigned int results in (UINT_MAX + 1) - INT_MAX, which is 1 in the unlikely case that unsigned int has a width (number of value bits for unsigned integer types, number of value bits + 1 [for the sign bit] for signed integer types) smaller than that of int and 2^(WIDTH - 1) + 1 in the common case that the width of unsigned int is the same as the width of int.
The initialisation
unsigned int i = ~0u;
will always result in i holding the value UINT_MAX.
signed int y = ~0;
Result: -1
As stated above, only if the representation of signed integers uses two's complement (which nowadays is by far the most common representation).
~0 is just an int with all bits set to 1. When interpreted as unsigned this will be equivalent to UINT_MAX. When interpreted as signed this will be -1.
Assuming 32 bit ints:
0 = 0x00000000 = 0 (signed) = 0 (unsigned)
~0 = 0xffffffff = -1 (signed) = UINT_MAX (unsigned)
Paul's answer is absolutely right. Instead of using ~0, you can use:
#include <limits.h>
signed int y = INT_MAX;
unsigned int x = UINT_MAX;
And now if you check values:
printf("x = %u\ny = %d\n", UINT_MAX, INT_MAX);
you can see max values on your system.
No, because ~ is the bitwise NOT operator, not the maximum value for type operator. ~0 corresponds to an int with all bits set to 1, which, interpreted as an unsigned gives you the max number representable by an unsigned, and interpreted as a signed int, gives you -1.
You must be on a two's complement machine.
Look up http://en.wikipedia.org/wiki/Two%27s_complement, and learn a little about Boolean algebra, and logic design. Also learning how to count in binary and addition and subtraction in binary will explain this further.
The C language uses this form of numbers, so to find the largest signed 32-bit value you use 0x7FFFFFFF (two hex F digits for each byte, with the leftmost byte being 7F). To understand this you need to look up hexadecimal numbers and how they work.
Now to explain the unsigned equivalent. In signed numbers the bottom half of the values are negative (0 is counted as positive, so the negative range extends one further than the positive range). Unsigned numbers are all positive. So in theory your highest number for a 32-bit int is 2^32, except that 0 still counts as a value, so it's actually 2^32 - 1. For signed numbers, half of those values are negative: dividing 2^32 by 2 gives 2^31 values on each side, and with 0 counted as positive the range of a signed 32-bit int is (-2^31, 2^31 - 1).
Now just comparing ranges:
unsigned 32 bit int: (0, 2^32-1)
signed 32 bit int: (-2^31, 2^31-1)
unsigned 16 bit int: (0, 2^16-1)
signed 16 bit int: (-2^15, 2^15-1)
you should be able to see the pattern here.
To explain the ~0 thing takes a bit more; it has to do with subtraction in binary. Subtraction is just flipping all the bits of the number being subtracted, adding 1, and then adding the two numbers together. C does this for you behind the scenes, and so do many processors (including the x86 and x64 lines of processors).
Because of this, it's best to store negative numbers as though they are counting down, and in two's complement the added 1 is also hidden. Because 0 is counted as positive, negative numbers can't have a value for 0, so they effectively have -1 (positive 1 after the bit flip) added to them; when decoding negative numbers we have to account for this.
