C and bit shifting in a char

I am new to C and having a hard time understanding why the code below prints out ffffffff when binary 11111111 should equal hex FF.
int i;
char num[8] = "11111111";
unsigned char result = 0;
for ( i = 0; i < 8; ++i ) {
result |= (num[i] == '1') << (7 - i);
}
printf("%X", bytedata);

You print bytedata, which is never set by the loop and may be uninitialized.
Replace
printf("%X", bytedata);
with
printf("%X", result);
Your code then runs fine.
Although it is legal in C, for good practice you should change
char num[8] = "11111111";
to
char num[9] = "11111111";
because in C the null character ('\0') is always appended to a string literal, and an array of 8 elements has no room for it. (It also would not compile as a C++ file with g++.)
EDIT
To answer your question
If I use char the result is FFFFFFFF but if I use unsigned char the result is FF.
Answer:
Case 1:
In C, the size of char is 1 byte on most implementations. If it is unsigned, all 8 bits hold the value, so it can hold at most 11111111 in binary, FF in hex (255 in decimal). When you print it with printf("%X", result);, the value is promoted for the variadic call and prints as FF in hex.
Case 2: But when you use plain char (signed on most platforms), the MSB is used as the sign bit, so only 7 bits are left for the magnitude and the range is -128 to 127 in decimal. Assigning FF (255 in decimal) stores the bit pattern of -1; when that value is promoted to int for printf, it is sign-extended, which is why FFFFFFFF is printed.
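Putting both fixes from this thread together, the corrected program looks like this and prints FF:

#include <stdio.h>

int main(void)
{
    int i;
    char num[9] = "11111111";   /* 9 elements: 8 digits plus the terminating '\0' */
    unsigned char result = 0;

    for (i = 0; i < 8; ++i)
        result |= (num[i] == '1') << (7 - i);

    printf("%X\n", result);     /* prints FF */
    return 0;
}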

Related

Char automatically converts to int (I guess)

I have the following code:
char temp[] = { 0xAE, 0xFF };
printf("%X\n", temp[0]);
Why is the output FFFFFFAE, not just AE?
I tried
printf("%X\n", 0b10101110);
And the output is correct: AE.
Suggestions?
The answer you're getting, FFFFFFAE, is a result of the char data type being signed. If you check the value, you'll notice that it's equal to -82, where -82 + 256 = 174, or 0xAE in hexadecimal.
The reason you get the correct output when you print 0b10101110 or even 174 is that you're using the literal values directly, whereas in your example you first put the 0xAE value in a signed char, where it is reinterpreted, modulo 256, into the range -128 to 127, if you want to think of it that way.
So in other words:
0 = 0 = 0x00
127 = 127 = 0x7F
128 = -128 = 0xFFFFFF80
129 = -127 = 0xFFFFFF81
174 = -82 = 0xFFFFFFAE
255 = -1 = 0xFFFFFFFF
256 = 0 = 0x00
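You can verify a row of that table directly (a sketch; assumes an 8-bit, two's-complement signed char, and the masking keeps the %X call well-defined):

#include <stdio.h>

int main(void)
{
    signed char sc = 174;        /* out of range: wraps to -82 on typical implementations */

    printf("%d\n", sc);          /* -82 */
    printf("%X\n", sc & 0xFF);   /* AE: masking strips the sign extension */
    return 0;
}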
To fix this "problem", you could declare the same array you initially did, just make sure to use an unsigned char type array and your values should print as you expect.
#include <stdio.h>
#include <stdlib.h>

int main()
{
    unsigned char temp[] = { 0xAE, 0xFF };

    printf("%X\n", temp[0]);
    printf("%d\n\n", temp[0]);

    printf("%X\n", temp[1]);
    printf("%d\n\n", temp[1]);

    return EXIT_SUCCESS;
}
Output:
AE
174
FF
255
https://linux.die.net/man/3/printf
According to the man page, %x or %X accept an unsigned integer. Thus it will read 4 bytes from the stack.
In any case, under most architectures you can't pass a parameter that is less than a word (i.e. int or long) in size, and in your case it will be converted to int.
In the first case, you're passing a char, so it is converted to int. Both are signed, so sign extension is performed, and that's why you see the preceding FFs.
In your second example, you're actually passing an int all the way, so no cast is performed.
If you'd try:
printf("%X\n", (char) 0b10101110);
You'd see that FFFFFFAE will be printed.
When you pass a smaller-than-int data type (as char is) to a variadic function (as printf(3) is), the parameter is converted to int if it is signed and to unsigned int if it is unsigned. What you observe is sign extension: because the most significant bit of the char variable is set, it is replicated into the three bytes needed to complete an int.
To solve this and to have the data in 8 bits, you have two possibilities:
Allow your signed char to convert to an int (with sign extension) then mask the bits 8 and above.
printf("%X\n", (int) my_char & 0xff);
Declare your variable as unsigned, so it is promoted without sign extension.
unsigned char my_char;
...
printf("%X\n", my_char);
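A short program contrasting the cases discussed (a sketch; the first line assumes plain char is signed, as on the asker's platform, and merely reproduces the problematic call, which typically prints the sign-extended pattern):

#include <stdio.h>

int main(void)
{
    char my_char = 0xAE;                       /* stored as -82 where char is signed */

    printf("%X\n", my_char);                   /* FFFFFFAE: sign-extended to int */
    printf("%X\n", (int) my_char & 0xff);      /* AE: masked after the conversion */
    printf("%X\n", (unsigned char) my_char);   /* AE: no sign extension */
    return 0;
}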
This code causes undefined behaviour. The argument to %X must have type unsigned int, but you supply char.
Undefined behaviour means that anything can happen; including, but not limited to, extra F's appearing in the output.

Confusion on printing strings as integers in C

I would like to know how a string is represented as an integer, so I wrote the following program.
#include <stdio.h>
int main(int argc, char *argv[]){
char name[4] = {"@"};
printf("integer name %d\n", *(int*)name);
return 0;
}
The output is:
integer name 64
This is understandable because @ is 64 in integer, i.e., 0x40 in hex.
Now I change the program into:
#include <stdio.h>
int main(int argc, char *argv[]){
char name[4] = {"@@"};
printf("integer name %d\n", *(int*)name);
return 0;
}
The output is:
integer name 16448
I don't understand this. Since @@ is 0x4040 in hex, it should be 2^12+2^6 = 4160.
If I count the '\0' at the end of the string, then it should be 2^16+2^10 = 66560
Could someone explain where 16448 comes from?
Your math is wrong: 0x4040 == 16448. The two fours are bits 6 and 14 respectively, i.e. 2^14 + 2^6 = 16384 + 64 = 16448.
Your code actually invokes undefined behavior because you must not alias a char * with an int *. This is known as the strict aliasing rule. To see just one reason why this should be disallowed, consider what would otherwise have to happen if the code is run on a little-endian and a big-endian machine.
If you want to see the hex pattern of the string, you should simply loop over its bytes and print out each byte.
void
print_string(const char *strp)
{
    printf("0x");
    do
        printf("%02X", (unsigned char) *strp);
    while (*strp++);   /* note: the terminating '\0' is printed too, as 00 */
    printf("\n");
}
Of course, instead of printing the bytes, you can shift them into an integer (that will very soon overflow) and only finally output that integer. While doing this, you'll be forced to take a stand on “your” endianness.
/* Interpreting as big endian. */
unsigned long
int_string(const char *strp)
{
    unsigned long value = 0UL;
    do
        value = (value << 8) | (unsigned char) *strp;
    while (*strp++);
    return value;
}
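A minimal driver for the two helpers above (compiled together with them; assumes ASCII, and note both include the terminating '\0' byte, which shows up as the trailing 00):

#include <stdio.h>

int main(void)
{
    print_string("@@");                  /* prints 0x404000 */
    printf("%lX\n", int_string("@@"));   /* prints 404000 */
    return 0;
}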
This is how 16448 comes about. 0x4040 looks like this in binary:
4    0    4    0    -> hex
0100 0000 0100 0000 -> binary
Bits 6 and 14 are set, so the value is 2^14 + 2^6 = 16384 + 64 = 16448.
Hope you got it :)

Unsigned Char pointing to unsigned integer

I don't understand why the following code prints out 7 2 3 0; I expected it to print out 1 9 7 1. Can anyone explain?
unsigned int e = 197127;
unsigned char *f = (char *) &e;
printf("%ld\n", sizeof(e));
printf("%d ", *f);
f++;
printf("%d ", *f);
f++;
printf("%d ", *f);
f++;
printf("%d\n", *f);
Computers work with binary, not decimal, so 197127 is stored as a binary number, not as a series of single decimal digits.
197127 (decimal) = 0x00030207 (hex) = 0011 0000 0010 0000 0111 (binary)
Supposing your system uses little endian, 0x00030207 would be stored in memory as 0x07 0x02 0x03 0x00, which prints as 7 2 3 0, as expected, when you print each byte.
Because with your method you print out the internal representation of the unsigned int, not its decimal representation.
Integers, like any other data, are represented as bytes internally. unsigned char is just another term for "byte" in this context. If you had represented your integer as decimal inside a string
char E[] = "197127";
and then done an analogous walk through the bytes, you would have seen the representation of the characters as numbers.
The binary representation of 197127 is "00110000001000000111".
The bytes look like "00000111" (7 in decimal), "00000010" (2), and "00000011" (3); the rest is 0.
Why did you expect 1 9 7 1? The hex representation of 197127 is 0x00030207, so on a little-endian architecture, the first byte will be 0x07, the second 0x02, the third 0x03, and the fourth 0x00, which is exactly what you're getting.
The value 197127 in e is not a string representation. It is stored as a 32-bit integer (it does not fit in 16 bits). So, in memory, e is allocated, say, 4 bytes on the stack, and is represented as 0x30207 (hex) at that memory location, or 110000001000000111 in binary. Note that on a little-endian machine the byte order is actually reversed; see this link about endianness. So, when you point f to &e, you are referencing the first byte of the numeric value. If you want to represent the number as a string, you should have
char *e = "197127";
This has to do with the way the integer is stored, more specifically with byte ordering. Your system happens to have little-endian byte ordering, i.e. the first byte of a multi-byte integer is the least significant, while the last byte is the most significant.
You can try this:
printf("%d\n", 7 + (2 << 8) + (3 << 16) + (0 << 24));
This will print 197127.
Read more about byte order endianness here.
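The same idea, generalized to a loop over the bytes (a sketch; the reconstruction assumes a little-endian machine, matching the weights above):

#include <stdio.h>

int main(void)
{
    unsigned int e = 197127;
    unsigned char *f = (unsigned char *) &e;
    unsigned int v = 0;

    for (size_t i = 0; i < sizeof e; i++)
        v |= (unsigned int) f[i] << (8 * i);   /* byte i carries weight 256^i */

    printf("%u\n", v);   /* prints 197127 */
    return 0;
}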
The byte layout for the unsigned integer 197127 is [0x07, 0x02, 0x03, 0x00], and your code prints the four bytes.
If you want the decimal digits, then you need to break the number down into digits:
int digits[100];
int c = 0;
while(e > 0) { digits[c++] = e % 10; e /= 10; }
while(c > 0) { printf("%d\n", digits[--c]); }
You know that the int type commonly occupies four bytes. That means 197127 is represented as 00000000 00000011 00000010 00000111 in memory. From the result, your machine is little-endian, which means the low byte 00000111 is stored at the lowest address, then 00000010 and 00000011, and finally 00000000. So when you dereference f the first time, you obtain 7. After f++, f points to 00000010, and the output is 2. The rest can be deduced by analogy.
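The same byte-by-byte walk can be written as a loop instead of repeated f++ (a sketch):

#include <stdio.h>

int main(void)
{
    unsigned int e = 197127;
    unsigned char *f = (unsigned char *) &e;

    for (size_t i = 0; i < sizeof e; i++)
        printf("%d ", f[i]);   /* 7 2 3 0 on a little-endian machine */
    printf("\n");
    return 0;
}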
The underlying representation of the number e is in binary and if we convert the value to hex we can see that the value would be(assuming 32 bit unsigned int):
0x00030207
so when you iterate over the contents you are reading byte by byte through the unsigned char *. Each byte contains two 4-bit hex digits, and the byte order of the number is little-endian, since the least significant byte (0x07) comes first; in memory the contents are laid out like so:
0x 07 02 03 00
   ^  ^  ^  ^- fourth byte
   |  |  |- third byte
   |  |- second byte
   |- first byte
Note that sizeof returns size_t and the correct format specifier is %zu, otherwise you have undefined behavior.
You also need to fix this line:
unsigned char *f = (char *) &e;
to:
unsigned char *f = (unsigned char *) &e;
^^^^^^^^
Because e is an integer value (probably 4 bytes) and not a string (1 byte per character).
To get the result you expect, you should change the declaration and assignment of e to:
char *e = "197127";
char *f = e;
(and print each character with %c rather than %d, or you will get the ASCII codes).
Or, convert the integer value to a string (using sprintf()) and have f point to that instead:
char s[1000];
sprintf(s, "%u", e);
char *f = s;
Or, use mathematical operations to extract single digits from your integer and print those out.
Or, ...
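Pulling the sprintf() route together into a complete program (a sketch; a 12-byte buffer comfortably holds any 32-bit value):

#include <stdio.h>

int main(void)
{
    unsigned int e = 197127;
    char s[12];

    sprintf(s, "%u", e);
    for (char *f = s; *f; ++f)
        printf("%c ", *f);   /* prints: 1 9 7 1 2 7 */
    printf("\n");
    return 0;
}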

C - unsigned int to unsigned char array conversion

I have an unsigned int number (2 bytes) and I want to convert it to the unsigned char type. From my search, I find that most people recommend doing the following:
unsigned int x;
...
unsigned char ch = (unsigned char)x;
Is this the right approach? I ask because unsigned char is 1 byte and we cast from 2-byte data to 1 byte.
To prevent any data loss, I want to create an array of unsigned char[] and save the individual bytes into the array. I am stuck at the following:
unsigned char ch[2];
unsigned int num = 272;
for(i=0; i<2; i++){
// how should the individual bytes from num be saved in ch[0] and ch[1] ??
}
Also, how would we convert the unsigned char[2] back to unsigned int?
Thanks a lot.
You can use memcpy in that case:
memcpy(ch, &num, 2); /* 2 matches both sizeof ch and the question's 2-byte int */
Also, how would we convert the unsigned char[2] back to unsigned int?
The same way, just reverse the arguments of memcpy.
How about:
ch[0] = num & 0xFF;
ch[1] = (num >> 8) & 0xFF;
The converse operation is left as an exercise.
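For reference, the converse might look like this (a sketch mirroring the byte order above, with ch[0] holding the low byte):

#include <stdio.h>

int main(void)
{
    unsigned char ch[2] = { 272 & 0xFF, (272 >> 8) & 0xFF };   /* 0x10, 0x01 */
    unsigned int num = ch[0] | ((unsigned int) ch[1] << 8);

    printf("%u\n", num);   /* prints 272 */
    return 0;
}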
How about using a union?
union {
unsigned int num;
unsigned char ch[2];
} theValue;
theValue.num = 272;
printf("The two bytes: %d and %d\n", theValue.ch[0], theValue.ch[1]);
It really depends on your goal: why do you want to convert this to an unsigned char? Depending on the answer to that there are a few different ways to do this:
Truncate: This is what was recommended. If you are just trying to squeeze data into a function which requires an unsigned char, simply cast uchar ch = (uchar)x (but, of course, beware of what happens if your int is too big).
Specific endian: Use this when your destination requires a specific format. Usually networking code likes everything converted to big endian arrays of chars:
int n = sizeof x;
for(int y=0; n-->0; y++)
ch[y] = (x>>(n*8))&0xff;
does that.
Machine endian. Use this when there is no endianness requirement, and the data will only occur on one machine. The order of the array will change across different architectures. People usually take care of this with unions:
union {int x; char ch[sizeof (int)];} u;
u.x = 0xf00;
//use u.ch
with memcpy:
uchar ch[sizeof(int)];
memcpy(&ch, &x, sizeof x);
or with a simple cast (well-defined for character types, since a char pointer may alias any object, though the byte order you observe is machine-dependent):
unsigned char *ch = (unsigned char *)&x;
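And going from the big-endian array back to the integer (a sketch matching the "specific endian" loop above):

#include <stdio.h>

int main(void)
{
    unsigned int x = 272;
    unsigned char ch[sizeof x];
    int n = sizeof x;

    /* split: most significant byte first */
    for (int y = 0; n-- > 0; y++)
        ch[y] = (x >> (n * 8)) & 0xff;

    /* rebuild: shift each byte back in, in the same order */
    unsigned int back = 0;
    for (size_t i = 0; i < sizeof x; i++)
        back = (back << 8) | ch[i];

    printf("%u\n", back);   /* prints 272 */
    return 0;
}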
Of course, an array of chars large enough to contain a larger value has to be exactly as big as that value itself.
So you can simply pretend that this larger value already is an array of chars:
unsigned int x = 12345678; /* well, for the question's 2-byte int it should be something smaller, like 1234 */
unsigned char* pChars;
pChars = (unsigned char*) &x;
pChars[0];//one byte is here
pChars[1];//another byte here
(Once you understand what's going on, it can be done without any variables, all just casting)
You just need to extract those bytes using the bitwise & operator. 0xFF is a hexadecimal mask to extract one byte. Please look at various bit operations here - http://www.catonmat.net/blog/low-level-bit-hacks-you-absolutely-must-know/
An example program is as follows:
#include <stdio.h>

int main()
{
    unsigned int i = 0x1122;
    unsigned char c[2];

    c[0] = i & 0xFF;          /* low byte */
    c[1] = (i >> 8) & 0xFF;   /* high byte */

    printf("c[0] = %x \n", c[0]);
    printf("c[1] = %x \n", c[1]);
    printf("i = %x \n", i);

    return 0;
}
Output:
$ gcc 1.c
$ ./a.out
c[0] = 22
c[1] = 11
i = 1122
$
Endorsing @abelenky's suggestion, using a union would be a more foolproof way of doing this.
union unsigned_number {
unsigned int value; // An int is 4 bytes long
unsigned char index[4]; // A char is 1 byte long
};
A characteristic of this type is that the compiler will allocate memory only for the biggest member of our data structure unsigned_number, which in this case is going to be 4 bytes, since both members (value and index) have the same size. Had you defined it as a struct instead, we would have 8 bytes allocated in memory, since the compiler allocates space for all the members of a struct.
Additionally, and here is where your problem is solved, the members of an union data structure all share the same memory location, which means they all refer to same data - think of that like a hard link on GNU/Linux systems.
So we would have:
union unsigned_number my_number;

// Assigning decimal value 202050300 to my_number,
// which is represented as 0xC0B0AFC in hex format
my_number.value = 0xC0B0AFC; // Representation: Binary - Decimal
                             // Byte 3: 00001100 - 12
                             // Byte 2: 00001011 - 11
                             // Byte 1: 00001010 - 10
                             // Byte 0: 11111100 - 252

// Printing out my_number one byte at a time
for (int i = 0; i < (sizeof(my_number.value)); i++)
{
    printf("index[%d]: %u, 0x%x\n",
           i, my_number.index[i], my_number.index[i]);
}

// Printing out my_number as an unsigned integer
printf("my_number.value: %u, 0x%x", my_number.value, my_number.value);
And the output is going to be:
index[0]: 252, 0xfc
index[1]: 10, 0xa
index[2]: 11, 0xb
index[3]: 12, 0xc
my_number.value: 202050300, 0xc0b0afc
And as for your final question, we wouldn't have to convert from unsigned char back to unsigned int, since the values are already there. You just have to choose which way you want to access them.
Note 1: I am using an integer of 4 bytes in order to ease the understanding of the concept. For the problem you presented you must use:
union unsigned_number {
unsigned short int value; // A short int is 2 bytes long
unsigned char index[2]; // A char is 1 byte long
};
Note 2: I have assigned byte 0 the value 252 in order to point out the unsigned characteristic of our index field. Had it been declared as a signed char, we would get index[0]: -4, 0xfc as output.
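Adapting the union from Note 1 into a complete program (a sketch; assumes a 2-byte unsigned short, and the output shown is for a little-endian machine):

#include <stdio.h>

union unsigned_number {
    unsigned short int value;   // A short int is 2 bytes long
    unsigned char index[2];     // A char is 1 byte long
};

int main(void)
{
    union unsigned_number n;

    n.value = 272;   /* 0x0110 */
    printf("index[0]: %u, index[1]: %u\n", n.index[0], n.index[1]);
    /* little-endian output: index[0]: 16, index[1]: 1 */
    return 0;
}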

OR operator in C not working

I don't understand why the final printf in the code below is not printing 255.
char c;
c = c & 0;
printf("The value of c is %d", (int)c);
int j = 255;
c = (c | j);
printf("The value of c is %d", (int)c);
In most implementations the char type is signed, so it ranges from -128 to 127.
This means 11111111 (which is 255 written in binary) is equal to -1, as it is interpreted as a value stored in two's complement.
To get what you expect, you need to declare c as an unsigned char, like so:
unsigned char c = 0;
int j = 255;
c = (c | j);
printf("The value of c is %d", (int)c);
It is probably printing -1. That is because
c = (c | j);
will evaluate as
c = (0 | 255) = (0 | 0xFF) = 0xFF
but, since c is signed, 0xFF will be -1 and not 255 as you expected. If you change c to unsigned char it will print 255, as you imagined.
Try replacing char c with unsigned char c. Basically, the (signed) char type supports values from -128 to 127; your result is bigger than that range and wraps around.
By default, char is signed char in C on most implementations. If you want 255 to be printed, use unsigned char. However, I will explain the output in the context of signed char so that the concept becomes clear.
If you write j = 127, it will print 127, as the bit representation is 01111111.
But if you write j = 128, it will print -128, because the bit representation 01111111 increased by 1 becomes 10000000.
Now if you write j = 255, the bit representation is 11111111, so it will print -1.
If you write j = 256, it will print 0, because when the bit representation 11111111 is increased by 1 it becomes 100000000; but only 1 byte is allocated for a char variable, so the leftmost bit is not stored and the bit representation becomes 00000000.
Further, if you write j = 257, that is also a case of wraparound: the bit representation becomes 00000001 and 1 will be printed.
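The whole progression can be checked with a short program (a sketch; assumes plain char is signed and 8 bits wide; the out-of-range conversions are implementation-defined but behave this way on typical platforms):

#include <stdio.h>

int main(void)
{
    int js[] = { 127, 128, 255, 256, 257 };

    for (size_t i = 0; i < sizeof js / sizeof js[0]; i++) {
        char c = js[i];   /* value wraps into char's range */
        printf("j = %d -> c = %d\n", js[i], (int) c);
    }
    /* typical output: 127 -> 127, 128 -> -128, 255 -> -1, 256 -> 0, 257 -> 1 */
    return 0;
}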
