How does printf know when it's being passed a signed int - c

I'm trying to figure out how variables really work in C, and I find it strange that the printf function seems to know the difference between different variables of the same size (I'm assuming they're both 16 bits).
#include <stdio.h>
int main(void) {
unsigned short int positive = 0xffff;
short int negative = 0xffff;
printf("%d\n%d\n", positive, negative);
return 0;
}
Output:
65535
-1

I think we have to distinguish more carefully between the type conversions on the different integer types on the one hand and the printf format specifiers (which tell printf how to interpret the data) on the other hand.
Systematically:
printf("%hd %hd\n", positive, negative);
// gives: -1 -1
Both values are interpreted as signed short int by printf, regardless of the declaration.
printf("%hu %hu\n", positive, negative);
// gives: 65535 65535
Both values are interpreted as unsigned short int by printf, regardless of the declaration.
However,
printf("%d %d\n", positive, negative);
// gives: 65535 -1
Both values are implicitly converted to (a longer) int, and each value, including its sign, is preserved.
Finally,
printf("%u %u\n", positive, negative);
// gives 65535 4294967295
Again, both values are implicitly converted to int with their values preserved, but then the negative value is interpreted as unsigned. As we can see here, plain int is actually 32-bit (on this system).
Curiously, only if I compile with gcc and -Wpedantic, it gives me a warning for the assignment short int negative = 0xffff;.

Related

Right Shifting Unsigned Integers in C Issue: Vacated Bits Not 0 When Shifting ~0?

To my understanding, when right-shifting an unsigned integer a number of places in C, the empty positions to the left are filled with zeroes.
However, I have been trying to right-shift ~0 (which would be 1111...1111) without any success at all. Trying to right-shift it any amount of places produces the same value except when I store the integer in a variable prior to the shift:
#include <stdio.h>
int main() {
// Using an expression:
printf("%u \n", ~0);
printf("%u \n", ~0 >> 2);
unsigned int temp = ~0;
// Using a variable to store the value before shift:
printf("%u \n", temp);
printf("%u \n", temp >> 2);
return 0;
}
Output:
4294967295
4294967295
4294967295
1073741823
Left-shifting ~0 is perfectly fine when performed in either manner. I'm not sure what's going on here.
The type of the expression 0 is int, not unsigned int. Therefore, ~0 is the int with value 0xffffffff (assuming 32-bit ints, which it looks like you have). On most systems, this means -1, and right-shifting a signed integer will sign-extend (keep it negative if it starts negative), giving 0xffffffff again.
When you print it with %u, the printf function reads this signed int with value -1 as an unsigned int, and prints what 0xffffffff means for an unsigned int.
Note that all of this is just what happens on most systems most of the time. This has multiple instances of undefined or implementation-defined behavior (bit pattern of negative ints, what right-shifting negative ints does, and what the %u format specifier means when you pass in an int instead of an unsigned int).
~0U >> 2
This will make sure that it is unsigned. Otherwise you are performing a signed operation.
An unsuffixed integer literal such as 0 has type int (signed int). When you right-shift the negative result of ~0, it is sign-extended.
%u expects unsigned int. You have passed a signed int to it. This is Undefined behavior.
The following proposed code corrects the OP's code and addresses the comments to the question, including the removal of the undefined behavior.
#include <stdio.h>
int main( void )
{
// Using an expression:
printf("%u \n", ~0U);
printf("%u \n", ~0U >> 2);
unsigned int temp = ~0U;
// Using a variable to store the value before shift:
printf("%u \n", temp);
printf("%u \n", temp >> 2);
return 0;
}
the output is:
4294967295
1073741823
4294967295
1073741823

Unsigned values in C

I have the following code:
#include <stdio.h>
int main() {
unsigned int a = -1;
int b = -1;
printf("%x\n", a);
printf("%x\n", b);
printf("%d\n", a);
printf("%d\n", b);
printf("%u\n", a);
printf("%u\n", b);
return 0;
}
The output is:
ffffffff
ffffffff
-1
-1
4294967295
4294967295
I can see that a value is interpreted as signed or unsigned according to the value passed to printf function. In both cases, the bytes are the same (ffffffff). Then, what is the unsigned word for?
Assign an int -1 to an unsigned: As -1 does not fit in the range [0...UINT_MAX], multiples of UINT_MAX+1 are added until the answer is in range. Evidently UINT_MAX is pow(2,32)-1 or 4294967295 on OP's machine, so a has the value of 4294967295.
unsigned int a = -1;
The "%x" and "%u" specifiers expect a matching unsigned argument. Since b is an int, these do not match: "If a conversion specification is invalid, the behavior is undefined. If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined." (C11 §7.21.6.1 ¶9). The printf specifier does not change b.
printf("%x\n", b); // UB
printf("%u\n", b); // UB
The "%d" specifier expects a matching int. Since these do not match, more UB.
printf("%d\n", a); // UB
Given undefined behavior, the conclusions are not supported.
both cases, the bytes are the same (ffffffff).
Even with the same bit pattern, different types may have different values. ffffffff as an unsigned has the value of 4294967295. As an int, depending on the signed integer encoding, it has the value of -1 (two's complement), -2147483647 (sign-magnitude) or negative zero (ones' complement). As a float it may be a NaN.
what is unsigned word for?
unsigned stores a whole number in the range [0 ... UINT_MAX]. It never has a negative value. If code needs a non-negative number, use unsigned. If code needs a counting number that may be +, - or 0, use int.
Update: to avoid a compiler warning about assigning a signed int to unsigned, use the below. This is an unsigned 1u being negated - which is well defined as above. The effect is the same as a -1, but conveys to the compiler direct intentions.
unsigned int a = -1u;
Having unsigned in a variable declaration is mostly useful for the programmers themselves: it says not to treat the variable as negative. As you've noticed, both -1 and 4294967295 have the exact same bit representation for a 4-byte integer. It's all about how you want to treat or see them.
The statement unsigned int a = -1; converts -1 to unsigned (on a two's complement machine the bit pattern is unchanged) and stores that representation in a. The printf() specifiers x, d and u show what the bit representation stored in a looks like in different formats.
When you initialize unsigned int a to -1, you are storing the two's complement representation of -1 in the memory of a.
Which is nothing but 0xffffffff or 4294967295.
Hence when you print it using %x or %u format specifier you get that output.
Specifying the signedness of a variable determines the minimum and maximum values that it can store.
For example, with a 32-bit unsigned int the range is from 0 to 4,294,967,295, and with a 32-bit int the range is from -2,147,483,648 to 2,147,483,647.

Sign extension query in case of short

Given,
unsigned short y = 0xFFFF;
When I print
printf("%x", y);
I get : 0xFFFF;
But when I print
printf("%x", (signed short)y);
I get : 0xFFFFFFFF
Whole program below:
#include <stdio.h>
int main() {
unsigned short y = 0xFFFF;
unsigned short z = 0x7FFF;
printf("%x %x\n", y,z);
printf("%x %x", (signed short)y, (signed short)z);
return 0;
}
Sign extension happens when we typecast lower to higher byte data type, but here we are typecasting short to signed short.
In both cases sizeof((signed short)y) or sizeof((signed short)z) prints 2 bytes. Short remains of 2 bytes, if sign bit is zero as in case of 0x7fff.
Any help is very much appreciated!
Output of the first printf is as expected. The second printf produces undefined behavior.
In the C language, when you pass a value smaller than int as a variadic argument, that value is always implicitly converted to type int. It is not possible to physically pass a short or char variadic argument. That implicit conversion to int is where your "sign extension" takes place.
For this reason, your printf("%x", y); is equivalent to printf("%x", (int) y);. The value that is passed to printf is 0xFFFF of type int. Technically, %x format requires an unsigned int argument, but a non-negative int value is also OK (unless I'm missing some technicality). The output is 0xFFFF.
Conversion to int happens in the second case as well. I.e. your printf("%x", (signed short) y); is equivalent to printf("%x", (int) (signed short) y);. The conversion of 0xFFFF to (signed short) is implementation-defined, because 0xFFFF is apparently out of range of signed short on your platform. But most likely it produces a negative value (-1). When converted to int it produces the same negative value of type int (again, -1 represented as 0xFFFFFFFF for a 32-bit int). The further behavior is undefined, since you are passing a negative int value for format specifier %x, which requires unsigned int argument. It is illegal to use %x with negative int values.
In other words, formally your second printf prints unpredictable garbage. But practically the above explains where that 0xFFFFFFFF came from.
Let's break it down and into smaller pieces:
Given,
unsigned short y = 0xFFFF;
Assuming two-bytes unsigned short maximum value is 2^16-1, that is indeed 0xFFFF.
When I print
printf("%x", y);
Due to default argument promotion (printf() is a variadic function), the value of y is implicitly promoted to type int. With the %x format specifier it is treated as unsigned int. Assuming the common two's complement representation and a four-byte int type, the most significant bit is zero, so the bit patterns of the int and the unsigned int are simply the same.
But when I print
printf("%x", (signed short)y);
What you have done is cast to a signed type that cannot represent the value 0xFFFF. Such a conversion is, as the standard says, implementation-defined, so the result can vary between implementations. After the implicit conversion to int you apparently have a bit pattern of 32 ones, which is printed as 0xFFFFFFFF.

When will an unsigned int variable becomes negative

I was going through the existing code and when debugging the UTC time which is declared as
unsigned int utc_time;
I could get some positive integer every time by which I would be sure that I get the time. But suddenly in the code I got a negative value for the variable which is declared as an unsigned integer.
Please help me to understand what might be the reason.
Unsigned integers, by their very nature, can never be negative.
You may end up with a negative value if you cast it to a signed integer, or simply assign the value to a signed integer, or even incorrectly treat it as signed, such as with:
#include <stdio.h>
int main (void) {
unsigned int u = 3333333333u;
printf ("unsigned = %u, signed = %d\n", u, u);
return 0;
}
which outputs:
unsigned = 3333333333, signed = -961633963
on my 32-bit integer system.
When it's cast or treated as a signed type. You probably printed your unsigned int as an int, and the bit sequence of the unsigned would have corresponded to a negative signed value.
ie. Perhaps you did:
unsigned int utc_time;
...
printf("%d", utc_time);
Where %d is for signed integers, compared to %u which is used for unsigned. Anyway if you show us the code we'll be able to tell you for certain.
There's no notion of positive or negative in an unsigned variable.
Make sure you are using
printf("%u", utc_time);
to display it
In response to the comment: %u displays the variable as an unsigned int, whereas %i or %d will display the variable as a signed int.
Negative numbers on most (all?) C implementations are represented in two's complement: invert the bits of the magnitude and add one. It's possible that your debugger or a program listing the values doesn't show it as an unsigned type, so you see its two's complement interpretation.

Printing unsigned long long using %d

Why do I get -1 when I print the following?
unsigned long long int largestIntegerInC = 18446744073709551615LL;
printf ("largestIntegerInC = %d\n", largestIntegerInC);
I know I should use llu instead of d, but why do I get -1 instead of 18446744073709551615LL?
Is it because of overflow?
In C (99), LLONG_MAX, the maximum value of long long int type is guaranteed to be at least 9223372036854775807. The maximum value of an unsigned long long int is guaranteed to be at least 18446744073709551615, which is 2^64-1 (0xffffffffffffffff).
So, initialization should be:
unsigned long long int largestIntegerInC = 18446744073709551615ULL;
(Note the ULL.) Since largestIntegerInC is of type unsigned long long int, you should print it with the right format specifier, which is "%llu":
$ cat test.c
#include <stdio.h>
int main(void)
{
unsigned long long int largestIntegerInC = 18446744073709551615ULL;
/* good */
printf("%llu\n", largestIntegerInC);
/* bad */
printf("%d\n", largestIntegerInC);
return 0;
}
$ gcc -std=c99 -pedantic test.c
test.c: In function ‘main’:
test.c:9: warning: format ‘%d’ expects type ‘int’, but argument 2 has type ‘long long unsigned int’
The second printf() above is wrong; it can print anything. You are using "%d", which means printf() is expecting an int, but gets an unsigned long long int, which is (most likely) not the same size as int. The reason you are getting -1 as your output is due to (bad) luck, and the fact that on your machine, numbers are represented using two's complement representation.
To see how this can be bad, let's run the following program:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
int main(int argc, char *argv[])
{
const char *fmt;
unsigned long long int x = ULLONG_MAX;
unsigned long long int y = 42;
int i = -1;
if (argc != 2) {
fprintf(stderr, "Need format string\n");
return EXIT_FAILURE;
}
fmt = argv[1];
printf(fmt, x, y, i);
putchar('\n');
return 0;
}
On my Macbook, running the program with "%d %d %d" gives me -1 -1 42, and on a Linux machine, the same program with the same format gives me -1 42 -1. Oops.
In fact, if you are trying to store the largest unsigned long long int number in your largestIntegerInC variable, you should include limits.h and use ULLONG_MAX. Or you can assign -1 to your variable:
#include <limits.h>
#include <stdio.h>
int main(void)
{
unsigned long long int largestIntegerInC = ULLONG_MAX;
unsigned long long int next = -1;
if (next == largestIntegerInC) puts("OK");
return 0;
}
In the above program, both largestIntegerInC and next contain the largest possible value for unsigned long long int type.
It's because you're passing a number with all the bits set to 1. When interpreted as a two's complement signed number, that works out to -1. In this case, it's probably only looking at 32 of those one bits instead of all 64, but that doesn't make any real difference.
In two's complement arithmetic, the signed value -1 is the same as the largest unsigned value.
Consider the bit patterns for negative numbers in two's complement (I'm using 8 bit integers, but the pattern applies regardless of the size):
0 - 0x00
-1 - 0xFF
-2 - 0xFE
-3 - 0xFD
So, you can see that negative 1 has the bit pattern of all 1's which is also the bit pattern for the largest unsigned value.
You used a format for a signed 32-bit number, so you got -1. printf() can't tell internally how big the number you passed in is, so it just pulls the first 32 bits from the varargs list and uses them as the value to be printed out. Since you gave a signed format, it prints it that way, and 0xffffffff is the two's complement representation of -1.
You can (should) see why in compiler warning. If not, try to set the highest warning level. With VS I've got this warning: warning C4245: 'initializing' : conversion from '__int64' to 'unsigned __int64', signed/unsigned mismatch.
No, there is no overflow. It's because it isn't printing the entire value:
18446744073709551615 is the same as 0xFFFFFFFFFFFFFFFF. When printf %d processes that, it grabs only 32 bits (or 64 bits if it's a 64-bit CPU) for conversion, and those are the signed value -1.
If the printf conversion had been %u instead, it would show either 4294967295 (32 bits) or 18446744073709551615 (64 bits).
An overflow is when a value increases to the point where it won't fit in the storage allocated. In this case, the value is allocated just fine, but isn't being completely retrieved.
