Really simple question here. I have a really simple program for adding two numbers and printing out the sum of those numbers (below). When running the program, it works as expected and prints out 40 000 for 20 000 + 20 000. But when I change int a, b and sum to short a,b and sum, I get -25 536 as an answer. Anyone who can explain why this happens? I have an idea, but would love to hear it from someone who knows it. Thanks for reading.
int a, b, sum;
a = 20000; b = 20000; sum = a+b;
printf("%d + %d = %d\n", a, b, sum);
On your system, short is presumably 16 bits, so the range of values is -32768 to 32767. 20000 + 20000 is larger than the maximum value. Strictly speaking, the addition itself is done in int (after integer promotion) and does not overflow; it is the conversion of 40000 back to short that is out of range, which is implementation-defined and on typical two's-complement systems wraps around to 40000 - 65536 = -25536.
If you change to unsigned short, the range becomes 0 to 65535, and the addition will work. In addition, overflow is well-defined with unsigned integers: it simply wraps around using modular arithmetic, e.g. (unsigned short)(65535 + 2) = 1.
The maximum value of a signed short is 32767
In binary, this is a 16-bit number, rather than a 32-bit number (as is typically the case with int). Because it's signed, it is represented as follows:
0 11111 11111 11111
If you add 1 to that, it becomes
1 00000 00000 00000
Which wraps around to -32768, the minimum value.
You probably get the idea.
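For a concrete check, here is the question's program with short, as a minimal sketch; strictly the out-of-range conversion is implementation-defined, but this is what typical two's-complement systems do:

#include <stdio.h>

int main(void)
{
    short a = 20000, b = 20000;
    /* a + b is computed in int (40000, no overflow there); converting it
       back to short is implementation-defined, and on typical
       two's-complement systems wraps modulo 65536: 40000 - 65536 = -25536 */
    short sum = (short)(a + b);
    printf("%d + %d = %d\n", a, b, sum);  /* typically prints -25536 */
    return 0;
}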
Related
Suppose, I'm trying to subtract 2 unsigned integers:
247 = 1111 0111
135 = 1000 0111
If we subtract these 2 binary numbers we get = 0111 0000
Is this an underflow, since we only need 7 bits now? Or how does that work?
Underflow in unsigned subtraction c = a - b occurs whenever b is larger than a.
However, that's somewhat of a circular definition, because the way many machines perform the a < b comparison is by subtracting the operands using wraparound arithmetic and then detecting the overflow from the two operands and the result.
Note also that in C we don't speak about "overflow", because there is no error condition: C unsigned integers provide that wraparound arithmetic that is commonly found in hardware.
So, given that we have the wraparound arithmetic, we can detect whether wraparound (or overflow, depending on point of view) has taken place in a subtraction.
What we need is the most significant bits from a, b and c. Let's call them A, B and C. From these, the overflow V is calculated like this:
A B C | V
------+--
0 0 0 | 0
0 0 1 | 1
0 1 0 | 1
0 1 1 | 1
1 0 0 | 0
1 0 1 | 0
1 1 0 | 0
1 1 1 | 1
This simplifies to
A'B + A'C + BC
In other words, overflow in the unsigned subtraction c = a - b happens whenever:
the msb of a is 0 and that of b is 1;
or the msb of a is 0 and that of c is 1;
or the msb of b is 1 and that of c is also 1.
Subtracting 247 - 135 = 112 is clearly not overflow, since 247 is larger than 135. Applying the rules above, A = 1, B = 0 and C = 0. The 1 1 0 row of the table has a 0 in the V column: no overflow.
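A small sketch of this rule in C, using 8-bit operands; the function name sub_overflow is my own illustration, not part of the answer:

#include <stdint.h>
#include <stdio.h>

/* Detect whether the unsigned subtraction c = a - b wrapped around,
   using only the most significant bits A, B, C of the operands and result. */
static int sub_overflow(uint8_t a, uint8_t b)
{
    uint8_t c = (uint8_t)(a - b);          /* wraparound subtraction */
    int A = a >> 7, B = b >> 7, C = c >> 7;
    return (!A & B) | (!A & C) | (B & C);  /* A'B + A'C + BC */
}

int main(void)
{
    printf("%d\n", sub_overflow(247, 135)); /* 0: 247 - 135 = 112, no wraparound */
    printf("%d\n", sub_overflow(135, 247)); /* 1: b > a, so the result wraps */
    return 0;
}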
Generally, “underflow” means the ideal mathematical result of a calculation is below what the type can represent. If 7 is subtracted from 5 in unsigned arithmetic, the ideal mathematical result would be −2, but an unsigned type cannot represent −2, so the operation underflows. Or, in an eight-bit signed type that can represent numbers from −128 to +127, subtracting 100 from −100 would ideally produce −200, but this cannot be represented in the type, so the operation underflows.
In C, unsigned arithmetic is said not to underflow or overflow because the C standard defines the operations to be performed using modulo arithmetic instead of real-number arithmetic. For example, with 32-bit unsigned arithmetic, subtracting 7 from 5 would produce 4,294,967,294 (in hexadecimal, 0xFFFFFFFE), because it has wrapped modulo 2^32 = 4,294,967,296. People may nonetheless use the terms “underflow” or “overflow” when discussing these operations, intending to refer to the mathematical issues rather than the defined C behavior.
In other words, for whatever type you are using for arithmetic there is some lower limit L and some upper limit U that the type can represent. If the ideal mathematical result of an operation is less than L, the operation underflows. If the ideal mathematical result of an operation is greater than U, the operation overflows. “Underflow” and “overflow” mean the operation has gone out of the bounds of the type. “Overflow” may also be used to refer to any exceeding of the bounds of the type, including in the low direction.
It does not mean that fewer bits are needed to represent the result. When binary 10000111 is subtracted from 11110111, the result, 01110000 = 1110000, is within bounds, so there is no overflow or underflow. The fact that it needs fewer bits to represent is irrelevant.
(Note: For integer arithmetic, “underflow” or “overflow” is defined relative to the absolute bounds L and U. For floating-point arithmetic, these terms have somewhat different meanings. They may be defined relative to the magnitude of the result, neglecting the sign, and they are defined relative to the finite non-zero range of the format. A floating-point format may be able to represent 0, then various finite non-zero numbers, then infinity. Certain results between 0 and the smallest non-zero number the format can represent are said to underflow even though they are technically inside the range of representable numbers, which is from 0 to infinity in magnitude. Similarly, certain results above the greatest representable finite number are said to overflow even though they are inside the representable range, since they are less than infinity.)
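As a quick check of the numbers above (this assumes a 32-bit unsigned int, as the answer does):

#include <stdio.h>

int main(void)
{
    unsigned int a = 5, b = 7;
    printf("%u\n", a - b);          /* 4294967294: wrapped modulo 2^32 */
    printf("0x%X\n", a - b);        /* 0xFFFFFFFE */
    printf("%u\n", 247u - 135u);    /* 112: within bounds, no wraparound */
    return 0;
}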
Long story short, this is what happens when you have:
#include <stdio.h>

int main(void)
{
    unsigned char n = 255; /* highest possible value for an unsigned char */
    n = n + 1; /* now n is "overflowing" (although the terminology is not correct) to 0 */
    printf("overflow: 255 + 1 = %u\n", n);
    n = n - 1; /* n will now "underflow" from 0 to 255 */
    printf("underflow: 0 - 1 = %u\n", n);
    n *= 2; /* n will now be (255 * 2) % 256 = 254; when the result is too high,
               modulo with 2 to the power of 8 is used for an 8-bit variable
               such as unsigned char */
    printf("large overflow: 255 * 2 = %u\n", n);
    n = n * (-2) + 100; /* n should now be -408, which is 104 in terms of unsigned char */
                        /* (logic is this: 408 % 256 = 152; 256 - 152 = 104) */
    printf("large underflow: 254 * -2 + 100 = %u\n", n);
    return 0;
}
The result of that is (compiled with gcc 11.1, flags -Wall -Wextra -std=c99):
overflow: 255 + 1 = 0
underflow: 0 - 1 = 255
large overflow: 255 * 2 = 254
large underflow: 254 * -2 + 100 = 104
Now the scientific version: The comments above represent just a mathematical model of what is going on. To better understand what is actually happening, the following rules apply:
Integer promotion:
Integer types smaller than int are promoted when an operation is
performed on them. If all values of the original type can be
represented as an int, the value of the smaller type is converted to
an int; otherwise, it is converted to an unsigned int.
So what actually happens in memory when the computer does n = 255; n = n + 1; is this:
First, the right side is evaluated as an int (signed), because unsigned char is promoted to int under the integer promotion rule (all values of unsigned char fit in an int). So the right side of the expression becomes, in binary: 0b00000000000000000000000011111111 + 0b00000000000000000000000000000001 = 0b00000000000000000000000100000000 (a 32-bit int).
Truncation
The 32-bit int loses the most significant 24 bits when being assigned
back to an 8-bit number.
So, when 0b00000000000000000000000100000000 is assigned to variable n, which is an unsigned char, the 32-bit value is truncated to an 8-bit value (only the right-most 8 bits are copied) => n becomes 0b00000000.
The same thing happens for each operation: the expression on the right side evaluates to a signed int, then it is truncated to 8 bits on assignment.
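To make the two steps visible (promotion to int, then truncation back to 8 bits), here is a small check; it assumes the usual 8-bit unsigned char and 32-bit int:

#include <stdio.h>

int main(void)
{
    unsigned char n = 255;
    printf("%zu\n", sizeof(n + 1));           /* sizeof(int), e.g. 4: the addition is done in int */
    printf("%d\n", n + 1);                    /* 256: no wraparound in the int expression */
    printf("%d\n", (unsigned char)(n + 1));   /* 0: truncated back to 8 bits by the conversion */
    return 0;
}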
I'm hoping that somebody can give me an understanding of why the code works the way it does. I'm trying to wrap my head around things but am lost.
My professor has given us this code snippet which we have to use in order to generate random numbers in C. The snippet in question generates a 64-bit integer, and we have to adapt it to also generate 32-bit, 16-bit, and 8-bit integers. I'm completely lost on where to start, and I'm not necessarily asking for a solution, just an explanation of how the original snippet works, so that I can adapt it from there.
long long rand64()
{
    int a, b;
    long long r;

    a = rand();
    b = rand();
    r = (long long)a;
    r = (r << 31) | b;
    return r;
}
Questions I have about this code are:
Why is it shifted 31 bits? I thought rand() generated a number between 0-32767 which is 16 bits, so wouldn't that be 48 bits?
Why do we say | (or) b on the second to last line?
I'm making the relatively safe assumption that, in your computer's C implementation, long long is a 64-bit data type.
The key here is that, since long long r is signed, any value with the highest bit set will be negative. Therefore, the code shifts r by 31 bits to avoid setting that bit.
The | is the bitwise OR operator; it combines the two values by setting in r all of the bits which are set in b.
EDIT:
After reading some of the comments, I realized that my answer needs correction. rand() returns a value no greater than RAND_MAX, which is typically 2^31-1, so a contributes at most 31 random bits. If you shifted r 32 bits to the left instead, bit 31 (counting from 0) of the result would always be zero.
rand() generates a random value in [0...RAND_MAX]. Its quality is of questionable repute - but let us set that reputation aside, assume rand() is good enough, and assume RAND_MAX is a
Mersenne number (a power of 2, minus 1).
Weakness in OP's code: if RAND_MAX == pow(2,31)-1, a common occurrence, then OP's rand64() only returns values in [0...pow(2,62)). (per Nate Eldredge)
Instead, loop as many times as needed.
To find how many random bits are returned with each call, we need the log2(RAND_MAX + 1). This fortunately is easy with an awesome macro from Is there any way to compute the width of an integer type at compile-time?
#include <limits.h> /* ULONG_MAX, used below */
#include <stdlib.h>
/* Number of bits in inttype_MAX, or in any (1<<k)-1 where 0 <= k < 2040 */
#define IMAX_BITS(m) ((m)/((m)%255+1) / 255%255*8 + 7-86/((m)%255+12))
#define RAND_MAX_BITWIDTH (IMAX_BITS(RAND_MAX))
Example: rand_ul() returns a random value in the [0...ULONG_MAX] range, whether unsigned long is 32-bit, 64-bit, etc.
unsigned long rand_ul(void) {
    unsigned long r = 0;
    for (int i = 0; i < IMAX_BITS(ULONG_MAX); i += RAND_MAX_BITWIDTH) {
        r <<= RAND_MAX_BITWIDTH;
        r |= rand();
    }
    return r;
}
My thoughts: if one declares an int it basically gets an unsigned int. So if I need a negative value I have to explicitly create a signed int.
I tried
int a = 0b10000101;
printf("%d", a); // i get 138 ,what i've expected
signed int b = 0b10000101; // here i expect -10, but i also get 138
printf("%d", b); // also tried %u
So am I wrong that a signed integer in binary is a negative value?
How can I create a negative value in binary format?
Edit: Even if I use 16/32/64 bits I get the same result. unsigned/signed doesn't seem to make a difference without manually shifting the bits.
If numbers are represented as two's complement you just need to have the sign bit set to ensure that the number is negative. That's the MSB. If an int is 32 bits, then 0b11111111111111111111111111111111 is -1, and 0b10000000000000000000000000000000 is INT_MIN.
To adjust for the size int(8|16|64)_t, just change the number of bits. The sign bit is still the MSB.
Keep in mind that, depending on your target, int could be 2 or 4 bytes. This means that int a=0b10000101 is not nearly enough bits to set the sign bit.
If your int is 4 bytes, you need 0b10000000 00000000 00000000 00000000 (spaces added for clarity).
For example on a 32-bit target:
int b = 0b11111111111111111111111111111110;
printf("%d\n", b); // prints -2
because int a = 0b10000101 has only 8 bits, where you need 16 or 32. Try this:
int a = 0b10000000000000000000000000000101;
that should create a negative number if your machine is 32-bit. If that does not work, try:
int a = 0b1000000000000101;
There are other ways to produce negative numbers (note the parentheses: + binds more tightly than <<):
int a = (0b1 << 31) + 0b101;
or, if you have a 16-bit system:
int a = (0b1 << 15) + 0b101;
or this one would work for both 32 and 16 bits:
int a = ~0b0 * 0b101;
or this is another one that would work on both if you want to get -5:
int a = ~0b101 + 1;
so 0b101 is 5 in binary, ~0b101 gives -6, so to get -5 you add 1.
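A quick sanity check of the last two expressions; note that 0b binary literals are a common compiler extension (standard only since C23), and a two's-complement int is assumed:

#include <stdio.h>

int main(void)
{
    int a = ~0b0 * 0b101;        /* ~0 is -1, so this is -5 */
    int b = ~0b101 + 1;          /* two's-complement negation of 5 */
    printf("%d %d\n", a, b);     /* -5 -5 */
    return 0;
}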
EDIT:
Since I now see that you are confused about what signed and unsigned numbers are, I will try to explain it as simply as possible.
So when you have:
int a = 5;
is the same as:
signed int a = 5;
and both of them would be positive. Now it would be the same as:
unsigned int a = 5;
because 5 is positive number.
On the other hand if you have:
int a = -5;
this would be the same as
signed int a = -5;
but it would not be the same as following:
unsigned int a = -5;
The first two would be -5; the third one is not the same. In fact, it would be the same as if you had entered 4294967291 (for a 32-bit unsigned int), because the two have the same binary form; the unsigned in front means the compiler stores the same bit pattern but treats it as a positive value.
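To see this concretely (assuming a 32-bit unsigned int):

#include <stdio.h>

int main(void)
{
    unsigned int a = -5;            /* converted modulo 2^32 */
    unsigned int b = 4294967291u;
    printf("%u %u\n", a, b);        /* 4294967291 4294967291 */
    printf("%d\n", a == b);         /* 1: same bit pattern, same value */
    return 0;
}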
How to create a negative binary number using signed/unsigned in C?
Simply negate a positive constant. Attempting to write the value out with many 1's instead, as in
... 1110110, assumes a bit width for int. Better to be portable.
#include <stdio.h>

int main(void) {
    #define NEGATIVE_BINARY_NUMBER (-0b1010)
    printf("%d\n", NEGATIVE_BINARY_NUMBER);
}
Output
-10
I'm completely stuck on how to do this homework problem and looking for a hint or two to keep me going. I'm limited to 20 operations (= doesn't count in this 20).
I'm supposed to fill in a function that looks like this:
/* Supposed to do x%(2^n).
For example: for x = 15 and n = 2, the result would be 3.
Additionally, if positive overflow occurs, the result should be the
maximum positive number, and if negative overflow occurs, the result
should be the most negative number.
*/
int remainder_power_of_2(int x, int n) {
    int twoToN = 1 << n;
    /* Magic...? How can I do this without looping? We are assuming it is a
       32 bit machine, and we can't use constants bigger than 8 bits
       (0xFF is valid for example).
       However, I can make a 32 bit number by ORing together a bunch of stuff.
       Valid operations are: << >> + ~ ! | & ^
    */
    return theAnswer;
}
I was thinking maybe I could shift the twoToN over left... until I somehow check (without if/else) that it is bigger than x, and then shift back to the right once... then xor it with x... and repeat? But I only have 20 operations!
Hint: In the decimal system, to compute a value modulo a power of 10, you just keep the last few digits and zero out the others. E.g. 12345 % 100 = 00045 = 45. Well, in a computer, numbers are binary, so you have to zero out the binary digits (bits). So look at the various bit-manipulation operators (&, |, ^) to do that.
Since binary is base 2, remainders mod 2^N are exactly represented by the rightmost bits of a value. For example, consider the following 32 bit integer:
00000000001101001101000110010101
This has the two's complement value of 3461525. The remainder mod 2 is exactly the last bit (1). The remainder mod 4 (2^2) is exactly the last 2 bits (01). The remainder mod 8 (2^3) is exactly the last 3 bits (101). Generally, the remainder mod 2^N is exactly the last N bits.
In short, you need to be able to take your input number, and mask it somehow to get only the last few bits.
A tip: say you're using mod 64. The value of 64 in binary is:
00000000000000000000000001000000
The modulus you're interested in is the last 6 bits. I'll provide you a sequence of operations that can transform that number into a mask (but I'm not going to tell you what they are, you can figure them out yourself :D)
00000000000000000000000001000000 // starting value
11111111111111111111111110111111 // ???
11111111111111111111111111000000 // ???
00000000000000000000000000111111 // the mask you need
Each of those steps equates to exactly one operation that can be performed on an int type. Can you figure them out? Can you see how to simplify my steps? :D
Another hint:
00000000000000000000000001000000 // 64
11111111111111111111111111000000 // -64
Since your divisor is always a power of two, it's easy.
uint32_t remainder(uint32_t number, uint32_t power)
{
    power = 1u << power;   /* 1u avoids signed overflow when power is 31 */
    return (number & (power - 1));
}
Suppose you input number as 5 and divisor as 2
`00000000000000000000000000000101` number
AND
`00000000000000000000000000000001` divisor - 1
=
`00000000000000000000000000000001` remainder (what we expected)
Suppose you input number as 7 and divisor as 4
`00000000000000000000000000000111` number
AND
`00000000000000000000000000000011` divisor - 1
=
`00000000000000000000000000000011` remainder (what we expected)
This only works as long as the divisor is a power of two, so use it carefully.
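For completeness, here is how the function above could be exercised against the question's example (x = 15, n = 2 should give 3). The main() is my addition, and the function is repeated here as remainder_pow2 (renamed to avoid clashing with the standard library's remainder()):

#include <stdint.h>
#include <stdio.h>

static uint32_t remainder_pow2(uint32_t number, uint32_t power)
{
    power = 1u << power;
    return (number & (power - 1));
}

int main(void)
{
    printf("%u\n", (unsigned)remainder_pow2(15, 2));  /* 15 % 4 = 3, the question's example */
    printf("%u\n", (unsigned)remainder_pow2(5, 1));   /* 5 % 2 = 1 */
    printf("%u\n", (unsigned)remainder_pow2(7, 2));   /* 7 % 4 = 3 */
    return 0;
}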
Due to the way conversions and operations are defined in C, it seems to rarely matter whether you use a signed or an unsigned variable:
uint8_t u; int8_t i;
u = -3; i = -3;
u *= 2; i *= 2;
u += 15; i += 15;
u >>= 2; i >>= 2;
printf("%u",u); // -> 2
printf("%u",i); // -> 2
So, is there a set of rules to tell under which conditions the signedness of a variable really makes a difference?
It matters in these contexts:
division and modulo: -2/2 = -1, but -2u/2 = UINT_MAX/2 = 2147483647 (for 32-bit unsigned int); -3 % 4 = -3, but -3u % 4 = 1
shifts. For negative signed values, the result of >> is implementation-defined and << is undefined behavior. For unsigned values, both are always defined.
relationals: -2 < 0 is true, but -2u > 0 is also true.
overflows. x+1 > x may be assumed by the compiler to be always true iff x has signed type.
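A few of these cases spelled out as a runnable check (assuming 32-bit int and unsigned int):

#include <stdio.h>

int main(void)
{
    printf("%d\n", -2 / 2);     /* -1 */
    printf("%u\n", -2u / 2);    /* 2147483647, i.e. UINT_MAX/2 */
    printf("%d\n", -3 % 4);     /* -3 */
    printf("%u\n", -3u % 4);    /* 1 */
    printf("%d\n", -2 < 0);     /* 1 */
    printf("%d\n", -2u > 0);    /* 1: -2u is a large unsigned value */
    return 0;
}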
Yes. Signedness will affect the result of Greater Than and Less Than operators in C. Consider the following code:
unsigned int a = -5;
unsigned int b = 7;

if (a < b)
    printf("Less");
else
    printf("More");
In this example, "More" is incorrectly output, because the -5 is converted to a very large positive number when it is stored in the unsigned variable.
This will also affect your arithmetic with different sized variables. Again, consider this example:
unsigned char a = -5;
signed short b = 12;
printf("%d", a+b);
The returned result is 263, not the expected 7. This is because -5 is actually stored as 251 in the unsigned char. Wraparound makes your operations work correctly for same-sized variables, but when widening, the compiler does not sign-extend unsigned variables, so they keep their positive value in the larger type. Study how two's complement works and you'll see where this result comes from.
It affects the range of values that you can store in the variable.
It is relevant mainly in comparison. (Assume here that u is an unsigned int and i an int, each holding 2; the narrow uint8_t from the question would be promoted to int first, which would hide the effect.)
printf("%d", (u-3) < 0); // -> 0
printf("%d", (i-3) < 0); // -> 1
Overflow on unsigned integers just wraps around. On signed values it is undefined behavior; anything can happen.
The signedness of 2's complement numbers is simply a matter of how you interpret the number. Imagine the 3-bit numbers:
000
001
010
011
100
101
110
111
If you think of 000 as zero and the numbers as they are natural to humans, you would interpret them like this:
000: 0
001: 1
010: 2
011: 3
100: 4
101: 5
110: 6
111: 7
This is called an "unsigned integer". You see everything as a number greater than or equal to zero.
Now, what if you want to have some numbers as negative? Well, 2's complement comes to the rescue. 2's complement is known to most people as just a formula, but in truth it's just congruency modulo 2^n, where n is the number of bits in your number.
Let me give you a few examples of congruency:
2 = 5 = 8 = -1 = -4 (mod 3)
-2 = 6 = 14 (mod 8)
Now, just for convenience, let's say you decide to have the left most bit of a number as its sign. So you want to have:
000: 0
001: positive
010: positive
011: positive
100: negative
101: negative
110: negative
111: negative
Viewing your numbers congruent modulo 2^3 (= 8), you know that:
4 = -4
5 = -3
6 = -2
7 = -1
Therefore, you view your numbers as:
000: 0
001: 1
010: 2
011: 3
100: -4
101: -3
110: -2
111: -1
As you can see, the actual bits for -3 and 5 (for example) are the same (if the number has 3 bits). Therefore, writing x = -3 or x = 5 gives you the same result.
Interpreting numbers congruent modulo 2^n has other benefits. If you sum 2 numbers, one negative and one positive, it could happen on paper that you have a carry that would be thrown away, yet the result is still correct. Why? That carry was a 2^n which is congruent to 0 modulo 2^n! Isn't that convenient?
Overflow is also another case of congruency. In our example, if you sum the two unsigned numbers 5 and 6, you get 3, because the true sum, 11, is congruent to 3 modulo 8.
So, why do you use signed and unsigned? For the CPU there is actually very little difference. For you however:
If the number has n bits, the unsigned represents numbers from 0 to 2^n-1
If the number has n bits, the signed represents numbers from -2^(n-1) to 2^(n-1)-1
So, for example, if you assign -1 to an unsigned number, it's the same as assigning 2^n-1 to it.
As per your example, that's exactly what you are doing. You are assigning -3 to a uint8_t; the value is converted modulo 2^8, so as far as the CPU is concerned you are assigning 253 to it. Then all the rest of the operations are the same for both types and you end up getting the same result.
There is however a point that your example misses: operator >> on a signed number extends the sign when shifting. Since the result of both of your operations is 9 before the shift, you don't notice this. If you didn't have the +15, you would have -6 in i and 250 in u, which after >> 2 would give -2 in i (bit pattern 11111110, i.e. 254 viewed as unsigned) and 62 in u. (See Peter Cordes' comment below for a few technicalities.)
To understand this better, take this example:
(signed)101011 (-21) >> 3 ----> 111101 (-3)
(unsigned)101011 ( 43) >> 3 ----> 000101 ( 5)
If you notice, floor(-21/8) is actually -3 and floor(43/8) is 5. However, -3 and 5 are not equal (and are not congruent modulo 64 (64 because there are 6 bits))
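The same 6-bit example reproduced with C types; the arithmetic right shift of a negative signed value is implementation-defined, but this is what typical two's-complement compilers do:

#include <stdio.h>

int main(void)
{
    signed char   s = -21;   /* bit pattern ...101011 */
    unsigned char u = 43;    /* same low bits */
    printf("%d\n", s >> 3);  /* -3: the sign bit is copied in */
    printf("%d\n", u >> 3);  /* 5: zeros are shifted in */
    return 0;
}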