Unsigned int from 32-bit to 64-bit OS - C

This code snippet is excerpted from a Linux book.
If it is not appropriate to post it here, please let me know
and I will delete it. Thanks.
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char buf[30];
    char *p;
    int i;
    unsigned int index = 0;
    //unsigned long index = 0;

    printf("index-1 = %lx (sizeof %d)\n", index-1, sizeof(index-1));

    for (i = 'A'; i <= 'Z'; i++)
        buf[i - 'A'] = i;

    p = &buf[1];
    printf("%c: buf=%p p=%p p[-1]=%p\n", p[index-1], buf, p, &p[index-1]);
    return 0;
}
On a 32-bit OS environment:
This program works fine whether index is declared as unsigned int or unsigned long.
On a 64-bit OS environment:
The same program runs into a core dump if index is declared as unsigned int.
However, if I only change the data type of index to a) unsigned long or b) unsigned short,
the program works fine too.
The book only says that on 64-bit the core dump is caused by the non-negative number, but I have no idea exactly why unsigned long and unsigned short work while unsigned int does not.
What I am confused about is that
p + (0u - 1) == p + UINT_MAX when index is unsigned int,
BUT
p + (0ul - 1) == p[-1] when index is unsigned long.
I am stuck here.
If anyone can help to elaborate the details, it is highly appreciated!
Thank you.
Here are some results from my 32-bit machine (RHEL 5.10, gcc 4.1.2 20080704)
and my 64-bit machine (RHEL 6.3, gcc 4.4.6 20120305).
I am not sure whether the gcc version makes any difference here,
so I include that information as well.
On 32 bit:
I tried two changes:
1) Modify unsigned int index = 0 to unsigned short index = 0.
2) Modify unsigned int index = 0 to unsigned char index = 0.
The program can run without problem.
index-1 = ffffffff (sizeof 4)
A: buf=0xbfbdd5da p=0xbfbdd5db p[-1]=0xbfbdd5da
It seems that index is promoted to a 4-byte type because of the - 1.
On 64 bit:
I tried three changes:
1) Modify unsigned int index = 0 to unsigned char index = 0.
It works!
index-1 = ffffffff (sizeof 4)
A: buf=0x7fffef304ae0 p=0x7fffef304ae1 p[-1]=0x7fffef304ae0
2) Modify unsigned int index = 0 to unsigned short index = 0.
It works!
index-1 = ffffffff (sizeof 4)
A: buf=0x7fff48233170 p=0x7fff48233171 p[-1]=0x7fff48233170
3) Modify unsigned int index = 0 to unsigned long index = 0.
It works!
index-1 = ffffffff (sizeof 8)
A: buf=0x7fffb81d6c20 p=0x7fffb81d6c21 p[-1]=0x7fffb81d6c20
BUT, only
unsigned int index = 0 runs into the core dump at the last printf.
index-1 = ffffffff (sizeof 4)
Segmentation fault (core dumped)

Do not lie to the compiler!
Passing printf an unsigned int where it expects an unsigned long (%lx), or a size_t where it expects an int (%d), is undefined behavior.
(Creating a pointer pointing outside any valid object (and not just behind one) is UB too...)
Correct the format specifiers and the pointer arithmetic (that includes indexing as a special case) and everything will work.
UB includes "It works as expected" as well as "Catastrophic failure".
BTW: If you politely ask your compiler for all warnings, it will warn you about this. Use -Wall -Wextra -pedantic or similar.
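For reference, here is one way the snippet could be corrected (a sketch, not the book's version): give index a signed type so that p[index - 1] really means p[-1], cast to unsigned long for %lx, and print the size_t result of sizeof with %zu.

    #include <stdio.h>

    int main(void)
    {
        char buf[30];
        char *p;
        int i;
        long index = 0;   /* signed, so index - 1 really is -1 */

        printf("index-1 = %lx (sizeof %zu)\n",
               (unsigned long)(index - 1), sizeof(index - 1));

        for (i = 'A'; i <= 'Z'; i++)
            buf[i - 'A'] = i;

        p = &buf[1];
        /* index - 1 is -1, so p[index - 1] is p[-1], i.e. buf[0] */
        printf("%c: buf=%p p=%p p[-1]=%p\n",
               p[index - 1], (void *)buf, (void *)p, (void *)&p[index - 1]);
        return 0;
    }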

One other problem your code has is in your printf():
printf("index-1 = %lx (sizeof %d)\n", index-1, sizeof(index-1));
Let's simplify:
int i = 100;
printf("%lx", i - 100);
You are telling printf that it will receive a long, but in reality you are sending an int. clang gives the correct warning (I think gcc should also emit a similar warning). See:
test1.c:6:19: warning: format specifies type 'unsigned long' but the argument has type 'int' [-Wformat]
printf("%lx", i - 100);
~~~ ^~~~~~~
%x
1 warning generated.
The solution is simple: either pass a long to printf, or tell printf to print an int:
printf("%lx", (long)(i-100) );
printf("%x", i-100);
You got lucky on 32-bit and your app did not crash. Porting it to 64-bit revealed a bug in your code, and now you can fix it.

Arithmetic on unsigned values is always defined, in terms of wrap-around. E.g. (unsigned)-1 is the same as UINT_MAX. So an expression like
p + (0u-1)
is equivalent to
p + UINT_MAX
(&p[0u-1] is equivalent to &*(p + (0u-1)) and p + (0u-1)).
Maybe this is easier to understand if we replace the pointers with unsigned integer types. Consider:
uint32_t p32; // say, this is a 32-bit "pointer"
uint64_t p64; // a 64-bit "pointer"
Assuming 16, 32, and 64 bit for short, int, and long, respectively (entries on the same line equal):
p32 + (unsigned short)-1  ==  p32 + USHRT_MAX   ==  p32 + (UINT_MAX>>16)
p32 + (0u-1)              ==  p32 + UINT_MAX    ==  p32 - 1
p32 + (0ul-1)             ==  p32 + ULONG_MAX   ==  p32 + UINT_MAX  ==  p32 - 1
p64 + (0u-1)              ==  p64 + UINT_MAX
p64 + (0ul-1)             ==  p64 + ULONG_MAX   ==  p64 - 1
You can always replace operands of addition, subtraction and multiplication on unsigned types by something congruent modulo the maximum value + 1. For example,
-1 ≡ 0xffffffff (mod 2^32)
(0xffffffff is 2^32 - 1, or UINT_MAX), and also
0xffffffffffffffff ≡ 0xffffffff (mod 2^32)
(for a 32-bit unsigned type you can always truncate to the least-significant 8 hex digits).
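Here is a small sketch of that congruence, using fixed-width unsigned integers in place of the pointers (an illustration under the same assumptions about type sizes, not the original program):

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t p32 = 0xbfbdd5db;          /* pretend 32-bit "pointer" */
        uint64_t p64 = 0x00007fffb81d6c21;  /* pretend 64-bit "pointer" */

        /* 32-bit arithmetic wraps, so adding UINT32_MAX is subtracting 1 */
        printf("%" PRIx32 "\n", p32 + UINT32_MAX);   /* bfbdd5da */

        /* 64-bit arithmetic does not wrap here: adding UINT32_MAX overshoots */
        printf("%" PRIx64 "\n", p64 + UINT32_MAX);   /* 8000b81d6c20 */

        /* adding UINT64_MAX does wrap, which is subtracting 1 again */
        printf("%" PRIx64 "\n", p64 + UINT64_MAX);   /* 7fffb81d6c20 */
        return 0;
    }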
Your examples:
32-bit
unsigned short index = 0;
In index - 1, index is promoted to int. The result has type int and value -1 (which is negative). Same for unsigned char.
64-bit
unsigned char index = 0;
unsigned short index = 0;
Same as for 32-bit. index is promoted to int, index - 1 is negative.
unsigned long index = 0;
The output
index-1 = ffffffff (sizeof 8)
is weird, it’s your only correct use of %lx but looks like you’ve printed it with %x (expecting 4 bytes); on my 64-bit computer (with 64-bit long) and with %lx I get:
index-1 = ffffffffffffffff (sizeof 8)
0xffffffffffffffff is -1 modulo 2^64.
unsigned index = 0;
An int cannot hold every value an unsigned int can, so in index - 1 nothing is promoted to int; the result has type unsigned int and value -1 (which is positive, being the same as UINT_MAX or 0xffffffff, since the type is unsigned). For 32-bit addresses, adding this value is the same as subtracting one:
   bfbdd5db         bfbdd5db
 + ffffffff       -        1
 = 1bfbdd5da
 =  bfbdd5da      =  bfbdd5da
(Note the wrap-around/truncation.) For 64-bit addresses, however:
   00007fff b81d6c21
 +          ffffffff
 = 00008000 b81d6c20
with no wrap-around. This is trying to access an invalid address, so you get a segfault.
Maybe have a look at 2’s complement on Wikipedia.
Under my 64-bit Linux, using a specifier expecting a 32-bit value while passing a 64-bit type (and the other way round) seems to "work": only the 32 least-significant bits are read. But use the correct ones: %lx expects an unsigned long, an unmodified %x an unsigned int, and %hx an unsigned short (an unsigned short is promoted to int when passed to printf as a variable argument, due to the default argument promotions). The length modifier for size_t is z, as in %zu:
printf("index-1 = %lx (sizeof %zu)\n", (unsigned long)(index-1), sizeof(index-1));
(The conversion to unsigned long doesn’t change the value of an unsigned int, unsigned short, or unsigned char expression.)
sizeof(index-1) could also have been written as sizeof(+index), the only effect on the size of the expression are the usual arithmetic conversions, which are also triggered by unary +.
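A tiny sketch of that last point (assuming 16-bit short and 32-bit int):

    #include <stdio.h>

    int main(void)
    {
        unsigned short index = 0;
        /* unary + triggers the integer promotions, just as the subtraction does */
        printf("%zu %zu %zu\n",
               sizeof index,        /* 2: the type of index itself */
               sizeof(+index),      /* 4: promoted to int          */
               sizeof(index - 1));  /* 4: also promoted to int     */
        return 0;
    }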

Related

Output of the following C code

What will be the output of the following C code, assuming it runs on a little-endian machine where short int takes 2 bytes and char takes 1 byte?
#include <stdio.h>

int main() {
    short int c[5];
    int i = 0;
    for (i = 0; i < 5; i++)
        c[i] = 400 + i;
    char *b = (char *)c;
    printf("%d", *(b+8));
    return 0;
}
On my machine it gave
-108
I don't know whether my machine is little-endian or big-endian. I found somewhere that it should give
148
as the output, because the low-order 8 bits of 404 (i.e. element c[4]) are 148. But I think that, because of "%d", it should read 2 bytes from memory starting at the address of c[4].
The code gives different outputs on different computers because on some platforms the char type is signed by default and on others it's unsigned by default. That has nothing to do with endianness. Try this:
char *b = (char *)c;
printf("%d\n", (unsigned char)*(b+8)); // always prints 148
printf("%d\n", (signed char)*(b+8)); // always prints -108 (=-256 +148)
The default value is dependent on the platform and compiler settings. You can control the default behavior with GCC options -fsigned-char and -funsigned-char.
c[4] stores 404. In a two-byte little-endian representation, that means two bytes of 0x94 0x01, or (in decimal) 148 1.
b+8 addresses the memory of c[4]. b is a pointer to char, so the 8 means adding 8 bytes (which is 4 two-byte shorts). In other words, b+8 points to the first byte of c[4], which contains 148.
*(b+8) (which could also be written as b[8]) dereferences the pointer and thus gives you the value 148 as a char. What this does is implementation-defined: On many common platforms char is a signed type (with a range of -128 .. 127), so it can't actually be 148. But if it is an unsigned type (with a range of 0 .. 255), then 148 is fine.
The bit pattern for 148 in binary is 10010100. Interpreting this as a two's complement number gives you -108.
This char value (of either 148 or -108) is then automatically converted to int because it appears in the argument list of a variable-argument function (printf). This doesn't change the value.
Finally, "%d" tells printf to take the int argument and format it as a decimal number.
So, to recap: Assuming you have a machine where
a byte is 8 bits
negative numbers use two's complement
short int is 2 bytes
... then this program will output either -108 (if char is a signed type) or 148 (if char is an unsigned type).
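If you want to see the actual byte layout on your machine, here is a small sketch that prints each byte of the array as an unsigned value (the expected output in the comment assumes a little-endian machine with 2-byte shorts):

    #include <stdio.h>

    int main(void)
    {
        short int c[5];
        unsigned char *b = (unsigned char *)c;
        int i;

        for (i = 0; i < 5; i++)
            c[i] = 400 + i;

        /* prints: 90 1 91 1 92 1 93 1 94 1  -- byte 8 is 0x94, i.e. 148 */
        for (i = 0; i < (int)sizeof c; i++)
            printf("%x ", b[i]);
        printf("\n");
        return 0;
    }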
To see what sizes the types have on your system (note that sizeof yields a size_t, printed with %zu):
printf("char      = %zu\n", sizeof(char));
printf("short     = %zu\n", sizeof(short));
printf("int       = %zu\n", sizeof(int));
printf("long      = %zu\n", sizeof(long));
printf("long long = %zu\n", sizeof(long long));
Change these lines in your program:
unsigned char *b = (unsigned char *)c;
printf("%d\n", *(b + 8));
And a simple test (I know it is not guaranteed, but all C compilers I know of do it this way, and I do not care about old CDC or UNISYS machines which had different addresses and pointers for different types of data):
printf(" endianness test: %s\n", (*b + (unsigned)*(b + 1) * 0x100) == 400 ? "little" : "big");
Another remark: this test only works because in your program c[0] == 400.

Print wrong value of unsigned int variable in C

I have written this small program using C:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main()
{
    unsigned int un_i = 1112;
    printf("%d and %d", (1 - un_i), (1 - un_i)/10);
    return 0;
}
My expectation is: "-1111 and -111"
But my result is: "-1111 and 429496618"
I don't know why it prints 429496618 instead of -111. Please explain it to me.
I use gcc 4.4.7 and CentOS with kernel 2.6.32.
Thank you very much!
That is because un_i is of type unsigned int, which cannot represent negative values. If you expect the result to be negative, you will need a signed type such as int; try this:
unsigned int un_i = 1112;
printf ("PRINTF: un_i[%d] and u_i/10[%d]\n", (1 - un_i), (1 - (int)un_i)/10);
You expect to print -1111 and -111. However 1 - un_i produces a result that is also of type unsigned int; the result is also always non-negative. If unsigned int is 32 bits wide (as it is in your case), the result will be 4294966185; and that divided by 10 would result in 429496618.
The %d conversion expects a (signed) int, not an unsigned int. The C11 standard says that using a variable argument with the wrong type is undefined behaviour, except when
one type is a signed integer type, the other type is the corresponding unsigned integer type, and the value is representable in both types
Thus printing 429496618 with %d has defined behaviour, as this same value is representable as both signed int and unsigned int.
However 1 - 1112U will have the value UINT_MAX - 1110, which, when used as an argument to printf and converted with %d, leads to undefined behaviour, since the value UINT_MAX - 1110 is not representable as a signed int; getting -1111 printed just happens to be the (undefined) behaviour in this case.
Since you really want to use signed numbers, then you should declare your variable un_i as an int instead of unsigned int.
In case you expect to do signed math with numbers larger than int can hold, use long int or long long int, or better yet, types such as int64_t, instead of this unsigned trickery.
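A short sketch of those options (the cast makes the subtraction and division happen in signed arithmetic; the signed variables avoid the problem entirely):

    #include <stdio.h>

    int main(void)
    {
        unsigned int un_i = 1112;
        int          si   = 1112;
        long long    big  = 1112;

        printf("%d\n", (1 - (int)un_i) / 10);   /* -111: cast before dividing   */
        printf("%d\n", (1 - si) / 10);          /* -111: signed from the start  */
        printf("%lld\n", (1LL - big) / 10);     /* -111: wider signed type      */
        return 0;
    }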
Your issue comes from signed/unsigned conversion.
1. In the case of unsigned int un_i = 1112:
(1 - un_i) = 0xfffffba9 /* first bit is 1 */
Even though the first bit is 1, un_i is unsigned, so that bit is treated as a normal value bit, and it becomes 0 after the division:
(1 - un_i)/10 = 0x1999992a /* first bit is 0 */
printf ("%d", (1 - un_i)/10); /* 0x1999992a = 429496618 because Sign bit is 0 */
2. In the case of int sig_i = 1112, the first bit is treated as a sign bit and stays 1 (negative) after division by 10:
(1 - sig_i) = 0xfffffba9 /* first bit is 1 */
(1 - sig_i)/10 = 0xffffff91 /* first bit is 1 */
printf ("%d", (1 - sig_i)/10); /* 0xffffff91 = -111 because Sign bit is 1 */
Please run my code to see the detailed result:
unsigned int un_i = 1112;
int sig_i = 1112;
printf ("Unsigned \n%d [hex: %x]\n", (1 - un_i), (1 - un_i));
printf ("and %d [hex: %x]\n", (1 - un_i)/10, (1 - un_i)/10);
printf ("Signed \n%d[hex: %x]\n", (1 - sig_i), (1 - sig_i));
printf ("and %d [hex: %x]\n", (1 - sig_i)/10, (1 - sig_i)/10);
Result
Unsigned
-1111 [hex: fffffba9]
and 429496618 [hex: 1999992a]
Signed
-1111[hex: fffffba9]
and -111 [hex: ffffff91]

Signed/Unsigned int, short and char

I am trying to understand the output of the code given at : http://phrack.org/issues/60/10.html
Quoting it here for reference:
#include <stdio.h>

int main(void){
    int l;
    short s;
    char c;

    l = 0xdeadbeef;
    s = l;
    c = l;

    printf("l = 0x%x (%d bits)\n", l, sizeof(l) * 8);
    printf("s = 0x%x (%d bits)\n", s, sizeof(s) * 8);
    printf("c = 0x%x (%d bits)\n", c, sizeof(c) * 8);
    return 0;
}
The output I get on my machine is:
l = 0xdeadbeef (32 bits)
s = 0xffffbeef (16 bits)
c = 0xffffffef (8 bits)
Here is my understanding:
The assignments s = l and c = l will result in s and c being promoted to ints, and they will have the last 16 bits (0xbeef) and the last 8 bits (0xef) of l respectively.
printf tries to interpret each of the above values (l, s and c) as unsigned integers (as %x is passed as the format specifier). From the output I see that sign extension has taken place. My doubt is: since %x represents unsigned int, why has the sign extension taken place while printing s and c? Should the output for s not be 0x0000beef and for c 0x000000ef?
why has the sign extension taken place while printing s and c
Let's see the following code:
unsigned char ucr8bit; /* Range is 0 to 255 on my machine */
signed char   cr8bit;  /* Range is -128 to 127 on my machine */
int           i32bit;

cr8bit = -100;         /* 0x9C */
i32bit = cr8bit;       /* i32bit is -100 or 0xFFFFFF9C */
As you can see, although the number -100 is the same, its representation in the wider type is not merely the narrow value with 0s prepended; what gets prepended is the MSB (the sign bit) of the signed type, in both 2's complement and 1's complement systems.
In your example you are trying to print s and c as wider type and hence getting the sign bit replication.
Also your code contains many sources of undefined and unspecified behavior and thus may give different output on different compilers.
(For instance, you should use signed char instead of char, as char may behave as unsigned char on some implementations and as signed char on others.)
l = 0xdeadbeef; /* Initializing l from an unsigned value;
                   if l is 32 bits, UB as l is signed */
s = l;          /* Initializing with an undefined value. Moreover,
                   an implicit conversion from a wider to a narrower type */
printf("l = 0x%x (%d bits)\n", l, sizeof(l) * 8); /* Using %x to print
                   a signed number and %d to print a size_t */
You are using a 32-bit signed integer. That means that only 31 bits can be used for positive numbers. 0xdeadbeef uses 32 bits. Therefore, assigning it to a 32-bit signed integer makes it a negative number.
When shown with an unsigned conversion specifier, %x, it looks like the negative number that it is (with the sign extension).
When copying it into a short or char, the property of it being a negative number is retained.
To further show this, try setting:
l = 0xef;
The output is now:
l = 0xef (32 bits)
s = 0xef (16 bits)
c = 0xffffffef (8 bits)
0xef uses only 8 bits, so it is positive when placed into a 32-bit or 16-bit variable. But when you place it into a signed 8-bit variable (char), its top bit becomes the sign bit and you are creating a negative number.
To see the retention of the negative number, try the reverse:
c = 0xef;
s = c;
l = c;
The output is:
l = 0xffffffef (32 bits)
s = 0xffffffef (16 bits)
c = 0xffffffef (8 bits)
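If the goal is to see only the bits that actually fit in each variable, one way (a sketch, not the phrack original) is to convert to the matching unsigned type before printing, and to use %zu for the size_t results of sizeof:

    #include <stdio.h>

    int main(void)
    {
        int   l = 0xdeadbeef;   /* conversion to a 32-bit signed int is implementation-defined */
        short s = (short)l;
        char  c = (char)l;

        printf("l = 0x%x (%zu bits)\n", (unsigned int)l,   sizeof(l) * 8);
        printf("s = 0x%x (%zu bits)\n", (unsigned short)s, sizeof(s) * 8);
        printf("c = 0x%x (%zu bits)\n", (unsigned char)c,  sizeof(c) * 8);
        /* typically prints: 0xdeadbeef, 0xbeef, 0xef -- no sign extension */
        return 0;
    }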

C UINT16 How to get it right?

I'm new to C programming and I'm testing some code where I receive and process a UDP packet formatted as follows:
UINT16 port1
UINT16 port2
The corresponding values on this test are:
6005
5555
If I print the whole packet buffer I get something like this:
u^W³^U><9e>^D
So I thought that I would just have to break it up and process each field as a 16-bit unsigned int. So I've tried something like this:
int l = 0;
unsigned int *primaryPort = *(unsigned int) &buffer[l];
AddToLog(logInfo, "PrimaryPort: %u\n", primaryPort);
l += sizeof(primaryPort);
unsigned int *secondaryPort = *(unsigned int) &buffer[l];
AddToLog(logInfo, "SecondaryPort: %u\n", secondaryPort);
l += sizeof(secondaryPort);
But I get wrong numbers with 8 digits.
I even tried another approach, as follows, but I also get wrong numbers.
int l = 0;
unsigned char primaryPort[16];
snprintf(primaryPort, sizeof(primaryPort), "%u", &buffer[l]);
AddToLog(logInfo, "PrimaryPort: %d\n", primaryPort);
l += sizeof(primaryPort);
unsigned char secondaryPort[16];
snprintf(secondaryPort, sizeof(secondaryPort), "%u", &buffer[l]);
AddToLog(logInfo, "SecondaryPort: %d\n", secondaryPort);
l += sizeof(secondaryPort);
What am I doing wrong? Also, why do I have to free char string variables, but not int variables?
You are passing to AddToLog and snprintf pointers to the integers. So what you're seeing are the addresses of the integers, not the integers themselves.
You need to dereference your pointers -- for example, put an asterisk (*) in front of primaryPort in your calls to AddToLog in your first approach.
As #rileyberton suggests, most likely unsigned int is 4 bytes on your system, which is the C99 type uint32_t. For a 16-bit integer, use uint16_t. These are defined in stdint.h. These are traditionally called "short integers" or "half integers" and require the %hu qualifier in printf or similar functions, rather than just %u (which stands for unsigned int, whose size depends on the target machine.)
Also, as #igor-tandetnik suggests, you may need to switch the byte order of the integers in your packet, if for example the packet is using network order (big-endian) format and your target machine is using little-endian format.
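One fairly portable way to handle both issues (the extra pointer level and the byte order) is to memcpy the two bytes into a uint16_t and convert from network order; a sketch, assuming the packet really carries the ports as big-endian 16-bit values at offsets 0 and 2 of buffer (the helper name and the sample bytes are made up for illustration):

    #include <stdint.h>
    #include <string.h>
    #include <arpa/inet.h>   /* ntohs */
    #include <stdio.h>

    /* hypothetical helper; 'buffer' would be the received UDP payload */
    static void print_ports(const unsigned char *buffer)
    {
        uint16_t primaryPort, secondaryPort;

        memcpy(&primaryPort, buffer, sizeof primaryPort);          /* bytes 0..1 */
        memcpy(&secondaryPort, buffer + 2, sizeof secondaryPort);  /* bytes 2..3 */

        primaryPort   = ntohs(primaryPort);    /* network (big-endian) -> host */
        secondaryPort = ntohs(secondaryPort);

        printf("PrimaryPort: %u\n",   (unsigned)primaryPort);
        printf("SecondaryPort: %u\n", (unsigned)secondaryPort);
    }

    int main(void)
    {
        /* 6005 = 0x1775 and 5555 = 0x15b3, in big-endian byte order */
        unsigned char buffer[4] = { 0x17, 0x75, 0x15, 0xb3 };
        print_ports(buffer);   /* prints 6005 and 5555 */
        return 0;
    }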
unsigned int on your system is likely 4 bytes (uint32_t). You can use unsigned int here if you mask out the values in the correct endianness, or simply use a short.
int l = 0;
unsigned short *primaryPort = *(unsigned short) &buffer[l];
AddToLog(logInfo, "PrimaryPort: %u\n", primaryPort);
l += sizeof(*primaryPort);
unsigned short *secondaryPort = *(unsigned short) &buffer[l];
AddToLog(logInfo, "SecondaryPort: %u\n", secondaryPort);
l += sizeof(*secondaryPort);
You declared primaryPort and secondaryPort to be pointers to unsigned short.
But when you assign them values from a section of buffer, you already de-referenced the pointer. You don't need pointers-to-unsigned-short. You just need an unsigned short.
Change it to:
unsigned short primaryPort = *((unsigned short*) &buffer[l]);
unsigned short secondaryPort = *((unsigned short *) &buffer[l]);
Note the removal of a * in the variable declarations.
If you're still having problems, you'll need to examine buffer byte-by-byte, looking for the value you expect. You can expect that 6005 will show up as either hex 17 75 or 75 17, depending on your platform's endianness.
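For instance, a tiny sketch of such a byte dump (the buffer contents here are made up to show what 6005 and 5555 would look like on a little-endian machine):

    #include <stdio.h>

    int main(void)
    {
        /* stand-in for the received packet */
        unsigned char buffer[4] = { 0x75, 0x17, 0xb3, 0x15 };
        int k;

        for (k = 0; k < 4; k++)
            printf("%02x ", buffer[k]);   /* prints: 75 17 b3 15 */
        printf("\n");
        return 0;
    }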

cast without * operator

Could someone explain to me what's happening to "n" in this situation?
main.c
unsigned long temp0;
PLLSYS0_FWD_DIV_A_DECODE(n);
main.h
#define PLLSYS0_FWD_DIV_A_DECODE(n) ((((unsigned long)(n))>>8)& 0x0000000f)
I understand that n is being shifted 8 bits and then anded with 0x0000000f. So what does (unsigned long)(n) actually do?
#include <stdio.h>

int main(void)
{
    unsigned long test1 = 1;
    printf("test1 = %d \n", test1);
    printf("(unsigned long)test1 = %d \n", (unsigned long)(test1));
    return 0;
}
Output:
test1 = 1
(unsigned long)test1 = 1
In your code example, the cast doesn't make much sense because test1 is already an unsigned long, but it makes sense when the macro is used on a different type like unsigned char etc.
Also you should use %lu in printf to print unsigned long.
printf("(unsigned long)test1 = %lu\n", (unsigned long)(test1));
// ^^
It widens it to be the size of an unsigned long. Imagine if you called this with a char and shifted it 8 bits to the right, the anding wouldn't work the same.
Also, I just found this (look under the right-shift operator) for why it's unsigned. An unsigned value forces a logical shift, in which the vacated left-most bit is replaced with a zero for each position shifted, whereas a signed value may get an arithmetic shift, in which the vacated left-most bits are filled with copies of the sign bit.
Example:
11000011 ( unsigned, shifted to the right by 1 )
01100001
11000011 ( signed, shifted to the right by 1 )
11100001
Could someone explain to me what's happening to "n" in this situation?
You are casting n to unsigned long.
So what does (unsigned long)(n) actually do?
It will promote n to unsigned long.
Casting the input is all it's doing before the bit shift and the ANDing, being careful about order of operations and operator precedence. It's pretty ugly.
But it looks like they're avoiding hitting the sign bit, and by doing this as a macro instead of a function there's no type checking on n.
It's just ugly.
Better form would be a clean, clear function that does input type checking.
That ensures that n has the proper size (in bits) and, most importantly, is treated as unsigned. The shift operators perform sign extension: when a number is signed and negative, the extension is done with 1s, not zeros. This means that a negative number shifted right will always result in a negative number.
For example:
#include <stdio.h>

int main(void)
{
    long i = -1;
    long x, y;
    x = ((unsigned long)i) >> 8;
    y = i >> 8;
    printf("%ld %ld\n", x, y);
    return 0;
}
On my machine it outputs:
72057594037927935 -1
Because of the sign extension in y, the number continues to be -1.
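In hex, what the two statements compute (a sketch, assuming 64-bit long and an arithmetic right shift for signed values):

    x = (unsigned long)0xffffffffffffffff >> 8  ->  0x00ffffffffffffff  (72057594037927935)
    y =           (long)0xffffffffffffffff >> 8  ->  0xffffffffffffffff  (-1)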
