Signed/unsigned int, short and char in C

I am trying to understand the output of the code given at : http://phrack.org/issues/60/10.html
Quoting it here for reference:
#include <stdio.h>

int main(void)
{
    int l;
    short s;
    char c;

    l = 0xdeadbeef;
    s = l;
    c = l;

    printf("l = 0x%x (%d bits)\n", l, sizeof(l) * 8);
    printf("s = 0x%x (%d bits)\n", s, sizeof(s) * 8);
    printf("c = 0x%x (%d bits)\n", c, sizeof(c) * 8);
    return 0;
}
The output I get on my machine is:
l = 0xdeadbeef (32 bits)
s = 0xffffbeef (16 bits)
c = 0xffffffef (8 bits)
Here is my understanding:
The assignments s = l and c = l convert l to the narrower types, so s and c receive the last 16 bits (0xbeef) and the last 8 bits (0xef) of l respectively; when passed to printf they are promoted back to int.
printf interprets each of the above values (l, s and c) as an unsigned integer, since %x is the format specifier. From the output I see that sign extension has taken place. My doubt is: since %x represents an unsigned int, why does sign extension happen when printing s and c? Shouldn't the output for s be 0x0000beef and for c be 0x000000ef?

why has the sign extension taken place while printing s and c
Let's see the following code:
unsigned char ucr8bit; /* range is 0 to 255 on my machine */
signed char cr8bit;    /* range is -128 to 127 on my machine */
int i32bit;

cr8bit = -100;         /* bit pattern 0x9C in two's complement */
i32bit = cr8bit;       /* i32bit is -100, i.e. 0xFFFFFF9C */
As you can see, although the number -100 is the same, its representation in the wider type is not simply the original bits with 0s prepended; rather, the MSB (the sign bit) of the signed type is replicated, in both two's-complement and one's-complement systems.
In your example you are printing s and c as a wider type, and hence you see the sign-bit replication.
Also, your code contains several sources of undefined and implementation-defined behavior and thus may give different output on different compilers.
(For instance, you should use signed char instead of char, because plain char may behave as unsigned char on some implementations and as signed char on others.)
l = 0xdeadbeef; /* 0xdeadbeef does not fit in a 32-bit signed int, so
                   the conversion to int is implementation-defined */
s = l;          /* implicit conversion from a wider to a narrower type;
                   implementation-defined again, since the value does not fit */
printf("l = 0x%x (%d bits)\n", l, sizeof(l) * 8); /* %x used to print a
                   signed int, and %d used to print a size_t: undefined */
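For reference, here is a sketch of the same experiment with those issues fixed (the comments assume a 32-bit int); converting the negative narrow values to unsigned int is well defined and still shows the sign extension:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned int l = 0xdeadbeefu;   /* unsigned, so the constant fits */
    short s = (short)l;             /* implementation-defined narrowing */
    signed char c = (signed char)l;

    /* %zu matches size_t; the casts to unsigned int make the sign
       extension visible without lying to printf */
    printf("l = 0x%x (%zu bits)\n", l, sizeof l * CHAR_BIT);
    printf("s = 0x%x (%zu bits)\n", (unsigned int)s, sizeof s * CHAR_BIT);
    printf("c = 0x%x (%zu bits)\n", (unsigned int)c, sizeof c * CHAR_BIT);
    return 0;
}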

You are using a 32-bit signed integer. That means that only 31 bits can be used for positive numbers. 0xdeadbeef uses 32 bits. Therefore, assigning it to a 32-bit signed integer makes it a negative number.
When shown with an unsigned conversion specifier, %x, it looks like the negative number that it is (with the sign extension).
When copying it into a short or char, the property of it being a negative number is retained.
To further show this, try setting:
l = 0xef;
The output is now:
l = 0xef (32 bits)
s = 0xef (16 bits)
c = 0xffffffef (8 bits)
0xef uses 8 bits, which yields a positive value when placed into a 32-bit or 16-bit variable. But when you place an 8-bit number whose top bit is set into a signed 8-bit variable (char), you create a negative number.
To see the retention of the negative number, try the reverse:
c = 0xef;
s = c;
l = c;
The output is:
l = 0xffffffef (32 bits)
s = 0xffffffef (16 bits)
c = 0xffffffef (8 bits)
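If instead you want the zero-extended values the question expected (0x0000beef and 0x000000ef), convert to the unsigned counterpart of the narrow type before printing. A minimal sketch:

#include <stdio.h>

int main(void)
{
    short s = (short)0xbeef;        /* implementation-defined; commonly -16657 */
    signed char c = (signed char)0xef;

    /* unsigned short / unsigned char keep only the low bits and are
       then promoted to int without sign extension */
    printf("s = 0x%x\n", (unsigned short)s); /* prints 0xbeef */
    printf("c = 0x%x\n", (unsigned char)c);  /* prints 0xef   */
    return 0;
}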

Related

Output of the following C code

What will be the output of the following C code, assuming it runs on a little-endian machine where a short int takes 2 bytes and a char takes 1 byte?
#include <stdio.h>

int main() {
    short int c[5];
    int i = 0;
    for (i = 0; i < 5; i++)
        c[i] = 400 + i;
    char *b = (char *)c;
    printf("%d", *(b + 8));
    return 0;
}
On my machine it gave
-108
I don't know if my machine is Little endian or big endian. I found somewhere that it should give
148
as the output, because the low-order 8 bits of 404 (i.e. of element c[4]) are 148. But I thought that, due to "%d", it should read 2 bytes from memory starting at the address of c[4].
The code gives different outputs on different computers because on some platforms the char type is signed by default and on others it's unsigned by default. That has nothing to do with endianness. Try this:
char *b = (char *)c;
printf("%d\n", (unsigned char)*(b+8)); // always prints 148
printf("%d\n", (signed char)*(b+8)); // always prints -108 (=-256 +148)
The default value is dependent on the platform and compiler settings. You can control the default behavior with GCC options -fsigned-char and -funsigned-char.
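You can also detect the default from within the program; a small sketch using <limits.h>:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* CHAR_MIN is 0 when plain char is unsigned, negative when signed */
    printf("plain char is %s\n", CHAR_MIN < 0 ? "signed" : "unsigned");
    return 0;
}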
c[4] stores 404. In a two-byte little-endian representation, that means two bytes of 0x94 0x01, or (in decimal) 148 1.
b+8 addresses the memory of c[4]. b is a pointer to char, so the 8 means adding 8 bytes (which is 4 two-byte shorts). In other words, b+8 points to the first byte of c[4], which contains 148.
*(b+8) (which could also be written as b[8]) dereferences the pointer and thus gives you the value 148 as a char. What this does is implementation-defined: On many common platforms char is a signed type (with a range of -128 .. 127), so it can't actually be 148. But if it is an unsigned type (with a range of 0 .. 255), then 148 is fine.
The bit pattern for 148 in binary is 10010100. Interpreting this as a two's complement number gives you -108.
This char value (of either 148 or -108) is then automatically converted to int because it appears in the argument list of a variable-argument function (printf). This doesn't change the value.
Finally, "%d" tells printf to take the int argument and format it as a decimal number.
So, to recap: Assuming you have a machine where
a byte is 8 bits
negative numbers use two's complement
short int is 2 bytes
... then this program will output either -108 (if char is a signed type) or 148 (if char is an unsigned type).
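Putting that together, a portable rewrite of the program (a sketch; the commented output assumes a little-endian machine):

#include <stdio.h>

int main(void)
{
    short int c[5];
    for (int i = 0; i < 5; i++)
        c[i] = 400 + i;

    /* unsigned char makes the result independent of the platform's
       default signedness of plain char */
    const unsigned char *b = (const unsigned char *)c;
    printf("%d\n", b[8]); /* 148 on a little-endian machine */
    return 0;
}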
To see what sizes the types have on your system (%zu is the matching specifier for size_t):
printf("char      = %zu\n", sizeof(char));
printf("short     = %zu\n", sizeof(short));
printf("int       = %zu\n", sizeof(int));
printf("long      = %zu\n", sizeof(long));
printf("long long = %zu\n", sizeof(long long));
Change these lines in your program:
unsigned char *b = (unsigned char *)c;
printf("%d\n", *(b + 8));
And here is a simple endianness test. (I know this is not guaranteed by the standard, but all C compilers I know behave this way, and I do not care about old CDC or UNISYS machines that used different address and pointer representations for different data types.)
printf(" endianness test: %s\n", (*b + (unsigned)*(b + 1) * 0x100) == 400 ? "little" : "big");
One more remark: this test only works because c[0] == 400 in your program.
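A somewhat more robust variant that does not depend on c[0]: inspect the first byte of a known value (a sketch, assuming nothing exotic beyond 8-bit bytes):

#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned int one = 1;
    unsigned char first;
    memcpy(&first, &one, 1); /* copy the lowest-addressed byte */
    printf("endianness test: %s\n", first ? "little" : "big");
    return 0;
}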

Why is a variable declared as an unsigned int generating a negative value?

I wrote this bit of code to learn about bit shifting. To my surprise, even though I declared x to be an unsigned int, the output includes a negative number, namely when the leftmost bit is set to 1. My question: why? I thought an unsigned int was never negative. Per sizeof(x), x is 4 bytes wide.
Here is the code fragment:
#include <stdio.h>

int main(void)
{
    unsigned int x;
    x = 1;
    for (int i = 0; i < 32; i++)
    {
        printf("2^%i = %i\n", i, x);
        x <<= 1;
    }
    return 0;
}
Here is the output:
2^0 = 1
2^1 = 2
2^2 = 4
2^3 = 8
2^4 = 16
2^5 = 32
2^6 = 64
2^7 = 128
2^8 = 256
2^9 = 512
2^10 = 1024
2^11 = 2048
2^12 = 4096
2^13 = 8192
2^14 = 16384
2^15 = 32768
2^16 = 65536
2^17 = 131072
2^18 = 262144
2^19 = 524288
2^20 = 1048576
2^21 = 2097152
2^22 = 4194304
2^23 = 8388608
2^24 = 16777216
2^25 = 33554432
2^26 = 67108864
2^27 = 134217728
2^28 = 268435456
2^29 = 536870912
2^30 = 1073741824
2^31 = -2147483648
Just use the correct conversion specifier:
printf("2^%u = %u\n", i, x);
You're using the %i format specifier, which prints its argument as a signed int.
If you want to print as unsigned, use the %u format specifier.
printf("2^%i = %u\n", i, x);
When you talk about the sign of an integral value in C (or C++ and many other programming languages), you are really talking about how some data is interpreted.
What is stored inside an unsigned int is just bits, regardless of sign; the fact that they behave as "unsigned" when used is merely an interpretation of the value.
So by using the %i specifier you are treating the value as signed, regardless of how it is declared. Try %u, which specifies that you want it treated as unsigned.
According to the C++ reference page on printf, using %i in the string passed to printf means the corresponding argument is treated as a signed decimal integer. This means your unsigned int will be cast to a signed int.
In C++, casting unsigned to signed (and the reverse) only changes the interpretation, not the bit values. So setting the leftmost bit to 1 makes the number negative, because that is what the bit pattern corresponds to under the signed interpretation.
To achieve the expected number, use %u instead for unsigned integer interpretation.
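To see that only the interpretation changes, not the bits, print the same object both ways. A sketch (the signed result is the usual two's-complement outcome):

#include <stdio.h>

int main(void)
{
    unsigned int x = 1u << 31;  /* bit pattern 0x80000000 */
    printf("%u\n", x);          /* 2147483648 */
    printf("%d\n", (int)x);     /* implementation-defined; commonly -2147483648 */
    return 0;
}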

Bitwise shift for unsigned long long type

In a C program, I am trying to use the left-shift operator on a uint64_t variable, e.g.:
// with shift of 24 bits
uint64_t x = 0;
x = (((uint64_t)76) << (24));
Output is: x = 1275068416
---------------------------------------------
// with shift of 32 bits
uint64_t x = 0;
x = (((uint64_t)76) << (32));
Output is: x = 0
If I shift left by up to 24 bits it works fine, but at 32 bits the output is 0. Since uint64_t (i.e. unsigned long long) is 64 bits wide, shouldn't shifts up to 63 bits work?
You're using the wrong format specifier to print the output. The %d format specifier expects an int, which apparently is 32-bit on your system. So passing a 64-bit value (and an unsigned one at that) leads to undefined behavior.
You should use the PRIu64 macro to get the correct format specifier for an unsigned 64-bit value.
printf("%"PRIu64"\n", x);

Unsigned int from 32 bit to 64bit OS

This code snippet is excerpted from a Linux book. If it is not appropriate to post it here, please let me know and I will delete it. Thanks.
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char buf[30];
    char *p;
    int i;
    unsigned int index = 0;
    //unsigned long index = 0;

    printf("index-1 = %lx (sizeof %d)\n", index - 1, sizeof(index - 1));

    for (i = 'A'; i <= 'Z'; i++)
        buf[i - 'A'] = i;

    p = &buf[1];
    printf("%c: buf=%p p=%p p[-1]=%p\n", p[index - 1], buf, p, &p[index - 1]);
    return 0;
}
On a 32-bit OS:
This program works fine whether index is declared as unsigned int or as unsigned long.
On a 64-bit OS:
The same program dumps core if index is declared as unsigned int.
However, if I only change the data type of index from unsigned int to a) unsigned long or b) unsigned short, the program works fine too.
The book only says that the 64-bit build core-dumps because the index is a non-negative number, but it gives me no idea why unsigned long and unsigned short work while unsigned int does not.
What confuses me is that
p + (0u - 1) == p + UINT_MAX when index is unsigned int,
BUT
p + (0ul - 1) == &p[-1] when index is unsigned long.
I am stuck here.
If anyone can help to elaborate the details, it is highly appreciated!
Thank you.
Here are some results from my 32-bit (RHEL5.10, gcc 4.1.2 20080704) and 64-bit (RHEL6.3, gcc 4.4.6 20120305) machines. I am not sure whether the gcc version makes any difference here, so I include that information as well.
On 32 bit:
I tried two changes:
1) Modify unsigned int index = 0 to unsigned short index = 0.
2) Modify unsigned int index = 0 to unsigned char index = 0.
The program runs without problems in both cases.
index-1 = ffffffff (sizeof 4)
A: buf=0xbfbdd5da p=0xbfbdd5db p[-1]=0xbfbdd5da
It seems that the type of index is promoted to 4 bytes because of the -1.
On 64 bit:
I tried three changes:
1) Modify unsigned int index = 0 to unsigned char index = 0.
It works!
index-1 = ffffffff (sizeof 4)
A: buf=0x7fffef304ae0 p=0x7fffef304ae1 p[-1]=0x7fffef304ae0
2) Modify unsigned int index = 0 to unsigned short index = 0.
It works!
index-1 = ffffffff (sizeof 4)
A: buf=0x7fff48233170 p=0x7fff48233171 p[-1]=0x7fff48233170
3) Modify unsigned int index = 0 to unsigned long index = 0.
It works!
index-1 = ffffffff (sizeof 8)
A: buf=0x7fffb81d6c20 p=0x7fffb81d6c21 p[-1]=0x7fffb81d6c20
BUT only
unsigned int index = 0 runs into the core dump at the last printf:
index-1 = ffffffff (sizeof 4)
Segmentation fault (core dumped)
Do not lie to the compiler!
Passing printf an int where it expects a long (%ld) is undefined behavior.
(Creating a pointer pointing outside any valid object (and not just behind one) is UB too...)
Correct the format specifiers and the pointer arithmetic (that includes indexing as a special case) and everything will work.
UB includes "It works as expected" as well as "Catastrophic failure".
BTW: If you politely ask your compiler for all warnings, it would warn you. Use -Wall -Wextra -pedantic or similar.
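A corrected sketch of the book's program (assuming the intent is to read buf[0] through p[-1]): use a signed type for the index and matching format specifiers.

#include <stddef.h>
#include <stdio.h>

int main(void)
{
    char buf[30];
    char *p;
    ptrdiff_t index = 0; /* signed, so index - 1 really is -1 */

    printf("index-1 = %td (sizeof %zu)\n", index - 1, sizeof(index - 1));

    for (int i = 'A'; i <= 'Z'; i++)
        buf[i - 'A'] = (char)i;

    p = &buf[1];
    printf("%c: buf=%p p=%p\n", p[index - 1], (void *)buf, (void *)p);
    return 0;
}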
One other problem your code has is in your printf():
printf("index-1 = %lx (sizeof %d)\n", index-1, sizeof(index-1));
Let's simplify:
int i = 100;
printf("%lx", i - 100);
You are telling printf that the argument is a long, but in reality you are sending an int. clang does give the correct warning (I think gcc should also emit it). See:
test1.c:6:19: warning: format specifies type 'unsigned long' but the argument has type 'int' [-Wformat]
printf("%lx", i - 100);
~~~ ^~~~~~~
%x
1 warning generated.
The solution is simple: either pass a long to printf or tell printf to print an int:
printf("%lx", (long)(i-100) );
printf("%x", i-100);
You got lucky on 32-bit and your app did not crash. Porting it to 64-bit revealed a bug in your code, and now you can fix it.
Arithmetic on unsigned values is always defined, in terms of wrap-around. E.g. (unsigned)-1 is the same as UINT_MAX. So an expression like
p + (0u-1)
is equivalent to
p + UINT_MAX
(&p[0u-1] is equivalent to &*(p + (0u-1)) and p + (0u-1)).
Maybe this is easier to understand if we replace the pointers with unsigned integer types. Consider:
uint32_t p32; // say, this is a 32-bit "pointer"
uint64_t p64; // a 64-bit "pointer"
Assuming 16, 32, and 64 bits for short, int, and long, respectively (entries on the same line are equal):

p32 + (unsigned short)-1    p32 + USHRT_MAX     p32 + (UINT_MAX >> 16)
p32 + (0u-1)                p32 + UINT_MAX      p32 - 1
p32 + (0ul-1)               p32 + ULONG_MAX     p32 + UINT_MAX        p32 - 1
p64 + (0u-1)                p64 + UINT_MAX
p64 + (0ul-1)               p64 + ULONG_MAX     p64 - 1
You can always replace operands of addition, subtraction and multiplication on unsigned types by something congruent modulo the maximum value + 1. For example,

    -1 ≡ 0xffffffff (mod 2^32)

(0xffffffff is 2^32 - 1, i.e. UINT_MAX), and also

    0xffffffffffffffff ≡ 0xffffffff (mod 2^32)

(for a 32-bit unsigned type you can always truncate to the least significant 8 hex digits).
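A tiny demonstration of this wrap-around, which is fully defined for unsigned types:

#include <stdio.h>

int main(void)
{
    unsigned int u = 0u - 1; /* wraps around: equals UINT_MAX */
    printf("%x\n", u);       /* ffffffff with a 32-bit unsigned int */
    printf("%x\n", u + 1);   /* wraps again: 0 */
    return 0;
}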
Your examples:
32-bit
unsigned short index = 0;
In index - 1, index is promoted to int. The result has type int and value -1 (which is negative). Same for unsigned char.
64-bit
unsigned char index = 0;
unsigned short index = 0;
Same as for 32-bit. index is promoted to int, index - 1 is negative.
unsigned long index = 0;
The output
index-1 = ffffffff (sizeof 8)
is weird: this is your only correct use of %lx, yet it looks as though you printed it with %x (reading 4 bytes). On my 64-bit computer (with 64-bit long), %lx gives:
index-1 = ffffffffffffffff (sizeof 8)
0xffffffffffffffff is -1 modulo 2^64.
unsigned index = 0;
An int cannot hold every value an unsigned int can, so in index - 1 nothing is promoted to int; the result has type unsigned int and value -1 modulo 2^32 (which is positive, being the same as UINT_MAX, i.e. 0xffffffff, since the type is unsigned). For 32-bit addresses, adding this value is the same as subtracting one:
   bfbdd5db         bfbdd5db
 + ffffffff       -        1
 = 1bfbdd5da
 =  bfbdd5da      =  bfbdd5da
(Note the wrap-around/truncation.) For 64-bit addresses, however:
   00007fff b81d6c21
 + 00000000 ffffffff
 = 00008000 b81d6c20
with no wrap-around. This is trying to access an invalid address, so you get a segfault.
Maybe have a look at 2’s complement on Wikipedia.
Under my 64-bit Linux, using a specifier expecting a 32-bit value while passing a 64-bit type (or the other way round) seems to "work": only the 32 least significant bits are read. But use the correct specifiers: lx expects an unsigned long, unmodified x an unsigned int, and hx an unsigned short (an unsigned short is promoted to int when passed to printf as a variable argument, due to the default argument promotions). The length modifier for size_t is z, as in %zu:
printf("index-1 = %lx (sizeof %zu)\n", (unsigned long)(index-1), sizeof(index-1));
(The conversion to unsigned long doesn’t change the value of an unsigned int, unsigned short, or unsigned char expression.)
sizeof(index-1) could also have been written as sizeof(+index), the only effect on the size of the expression are the usual arithmetic conversions, which are also triggered by unary +.
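The effect of those conversions on sizeof can be observed directly; a sketch (the comments assume a 4-byte int and 2-byte short):

#include <stdio.h>

int main(void)
{
    unsigned short index = 0;
    printf("%zu\n", sizeof(index - 1)); /* sizeof(int): usual arithmetic conversions */
    printf("%zu\n", sizeof(+index));    /* same: unary + triggers integer promotion */
    printf("%zu\n", sizeof index);      /* sizeof(unsigned short), commonly 2 */
    return 0;
}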

Unsigned Char pointing to unsigned integer

I don't understand why the following code prints 7 2 3 0. I expected it to print 1 9 7 1. Can anyone explain why it prints 7 2 3 0?
unsigned int e = 197127;
unsigned char *f = (char *) &e;
printf("%ld\n", sizeof(e));
printf("%d ", *f);
f++;
printf("%d ", *f);
f++;
printf("%d ", *f);
f++;
printf("%d\n", *f);
Computers work with binary, not decimal, so 197127 is stored as a single binary number, not as a series of separate decimal digits:
197127 (decimal) = 00030207 (hex) = 0011 0000 0010 0000 0111 (binary)
Supposing your system uses little-endian byte order, 0x00030207 is stored in memory as 0x07 0x02 0x03 0x00, which prints as 7 2 3 0 when you print each byte, exactly as observed.
Because with your method you print out the internal representation of the unsigned int, not its decimal representation.
Integers, like any other data, are represented as bytes internally; unsigned char is just another term for "byte" in this context. If you had represented your integer as decimal digits inside a string,
char E[] = "197127";
and then done an analogous walk through the bytes, you would have seen the character codes of the digits.
The binary representation of 197127 is 0011 0000 0010 0000 0111.
The bytes look like 00000111 (7 in decimal), 00000010 (2), and 0011 (3); the rest is 0.
Why did you expect 1 9 7 1? The hex representation of 197127 is 0x00030207, so on a little-endian architecture, the first byte will be 0x07, the second 0x02, the third 0x03, and the fourth 0x00, which is exactly what you're getting.
The value 197127 in e is not a string representation; it is stored as a binary integer. So, in memory, e is allocated, say, 4 bytes on the stack, and at that memory location it is represented as 0x00030207. In binary it looks like 0011 0000 0010 0000 0111. Note that on a little-endian machine the byte order in memory is reversed. So when you point f at &e, you are referencing the first byte of the numeric value. If you want to represent the number as a string, you need
char *e = "197127";
This has to do with the way the integer is stored, more specifically byte ordering. Your system happens to have little-endian byte ordering, i.e. the first byte of a multi byte integer is least significant, while the last byte is most significant.
You can try this:
printf("%d\n", 7 + (2 << 8) + (3 << 16) + (0 << 24));
This will print 197127.
Read more about byte order endianness here.
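A compact way to dump the bytes in address order (a sketch; the commented output assumes a little-endian machine):

#include <stdio.h>

int main(void)
{
    unsigned int e = 197127; /* 0x00030207 */
    const unsigned char *f = (const unsigned char *)&e;
    for (size_t i = 0; i < sizeof e; i++)
        printf("%u ", f[i]); /* 7 2 3 0 on little-endian */
    printf("\n");
    return 0;
}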
The byte layout for the unsigned integer 197127 is [0x07, 0x02, 0x03, 0x00], and your code prints the four bytes.
If you want the decimal digits, then you need to break the number down into digits:
int digits[100];
int c = 0;
while (e > 0) { digits[c++] = e % 10; e /= 10; }
while (c > 0) { printf("%d\n", digits[--c]); }
The int type commonly occupies four bytes, which means 197127 is represented as 00000000 00000011 00000010 00000111 in memory. From the result, your machine is little-endian: the low byte 00000111 is stored at the lowest address, then 00000010 and 00000011, and finally 00000000. So when you first print *f, you obtain 7. After f++, f points to 00000010, and the output is 2. The rest can be deduced by analogy.
The underlying representation of the number e is in binary and if we convert the value to hex we can see that the value would be(assuming 32 bit unsigned int):
0x00030207
so when you iterate over the contents you are reading byte by byte through the unsigned char *. Each byte contains two 4-bit hex digits, and the byte order is little-endian, since the least significant byte (0x07) comes first; so in memory the contents look like this:
0x 07 02 03 00
   ^  ^  ^  ^
   |  |  |  '- fourth byte
   |  |  '---- third byte
   |  '------- second byte
   '---------- first byte
Note that sizeof returns size_t and the correct format specifier is %zu, otherwise you have undefined behavior.
You also need to fix this line:
unsigned char *f = (char *) &e;
to:
unsigned char *f = (unsigned char *) &e;
^^^^^^^^
Because e is an integer value (probably 4 bytes) and not a string (1 byte per character).
To get the result you expected, change the declaration and assignment of e to:
unsigned char *e = (unsigned char *)"197127";
unsigned char *f = e;
Or convert the integer value to a string (using sprintf()) and have f point to that instead:
char s[1000];
sprintf(s, "%u", e);
unsigned char *f = (unsigned char *)s;
Or use mathematical operations to extract single digits from your integer and print those.
Or, ...
