convert chars to integer correct way [duplicate] - c

Just trying to make sure I've got this right.
On SO I encountered an answer to a question about how to store chars in an int, like this:
unsigned int final = 0;
final |= ( data[0] << 24 );
final |= ( data[1] << 16 );
final |= ( data[2] << 8 );
final |= ( data[3] );
But to my understanding this is wrong, isn't it?
Why: say data stores the integer in little-endian order (e.g., data[0] is the LSB of some int).
Then if the machine executing the code above is little endian, final will hold the correct value,
but if the machine running the code is big endian, it will hold a wrong value, won't it?
Just trying to make sure I've got this right; I'm not going to ask more questions in this direction for now.

Do not do this by hand when you have functions like htonl and ntohl available.
They take the hassle out of things.
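For example, a minimal sketch of reading four big-endian bytes this way (the helper name read_be32 is just for illustration; assumes a POSIX system providing arpa/inet.h):

#include <arpa/inet.h>  /* ntohl: network (big-endian) to host order */
#include <stdint.h>
#include <string.h>

uint32_t read_be32(const unsigned char *data) {
    uint32_t tmp;
    memcpy(&tmp, data, sizeof tmp); /* copy the raw bytes, no aliasing issues */
    return ntohl(tmp);              /* interpret them as big-endian */
}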

This code does not depend on the endianness of the platform: data[0] is always stored as the most significant byte of the int, followed by the rest, and data[3] is always the least significant byte.
Whether that's "right" or "wrong" depends on how the integer was encoded in the data array itself.
There is one problem, though: if data was declared with char rather than unsigned char, a negative data[i] will first be promoted to a signed int, and you will end up setting many more bits than you intended.
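A minimal sketch of that promotion problem (assuming a platform where plain char is signed):

char b = (char)0x80;     /* value -128 where plain char is signed */
unsigned int final = 0;
final |= b;              /* b promotes to int -128; final becomes 0xffffff80, not 0x80 */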

This code is problematic on little and big endian systems alike.
If the data elements are of type char, you need to cast them to unsigned char before the bitwise left shift, otherwise you may get sign extension on elements with negative values. The signedness of char is implementation-defined, and char can be a signed type.
Also, data[0] << 24 (and even (unsigned char)data[0] << 24) invokes undefined behavior when the promoted value has its high bit set, because the result is then not representable in an int, so you need a further cast to unsigned int.
Best is to declare data as an array of unsigned char and then cast each element to unsigned int before the left shift.
Now assuming you cast it correctly, this will work only if data[0] holds the most significant byte of your value.
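Putting those pieces together, a corrected sketch:

unsigned char data[4] = {0x12, 0x34, 0x56, 0x78}; /* most significant byte first */
unsigned int final = (unsigned int)data[0] << 24 |
                     (unsigned int)data[1] << 16 |
                     (unsigned int)data[2] << 8  |
                     (unsigned int)data[3];       /* final == 0x12345678 on any host */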

Besides the obvious problem of platform-specific byte-ordering (which other answers have addressed), you should be careful about promotion of data types.
I'm assuming that data is an array of type unsigned char. In that case, the expression
data[0] << 24
relies on the integer promotions: the 8-bit operand is promoted to int before the shift, so the result is an int, not an unsigned char. That happens to work when int is 32 bits, but it breaks down when int is 16 bits, and shifting a promoted value into the sign bit is undefined behavior. At best, it leaves too much to interpretation. A safer, more explicit way to do this is to bit-wise OR first, then shift:
final |= data[0]; final <<= 8;
final |= data[1]; final <<= 8;
final |= data[2]; final <<= 8;
final |= data[3]; /* no shift after the last byte */
or you could promote explicitly and then shift:
final |= ((unsigned int)data[0]) << 24;
final |= ((unsigned int)data[1]) << 16;
final |= ((unsigned int)data[2]) << 8;
final |= ((unsigned int)data[3]);
Of course, this doesn't deal with the endianness problem at all. But that may or may not be a problem, depending on where data came from.
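If you do need to know the host's byte order at runtime, one common (if blunt) sketch is to inspect the first byte of a known value:

unsigned int probe = 1;
int host_is_little_endian = (*(unsigned char *)&probe == 1); /* reading bytes via unsigned char* is well-defined */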

Related

shifting an unsigned char by more than 8 bits

I'm a bit troubled by this code:
typedef struct _slink {
    struct _slink* next;
    char type;
    void* data;
} slink;
assuming what this describes is a link in a file, where data is 4 bytes long, representing either an address or an integer (depending on the type of the link).
Now I'm looking at reformatting numbers in the file from little-endian to big-endian, so what I want to do is change the order of the bytes before writing back to the file, i.e.
for 0x01020304, I want to convert it to 0x04030201 so that when I write it back, its little-endian representation will look like the big-endian representation of 0x01020304. I do that by multiplying the i'th byte by 2^(8*(3-i)), where i is between 0 and 3. Now this is one way it was implemented, and what troubles me here is that this is shifting bytes by more than 8 bits. (L is of type _slink*)
int data = (((unsigned char*)&L->data)[0] << 24) + (((unsigned char*)&L->data)[1] << 16) +
           (((unsigned char*)&L->data)[2] << 8) + (((unsigned char*)&L->data)[3] << 0);
Can anyone please explain why this actually works, without these bytes having been explicitly cast to integers to begin with (since they're only 1 byte each but are shifted by up to 24 bits)?
Thanks in advance.
Any integer type smaller than int is promoted to type int when used in an expression.
So the shift is actually applied to an expression of type int instead of type char.
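One way to see the promotion in action (a C11 sketch; sizeof does not evaluate its operand, so the shift itself never runs):

char c = 0x12;
_Static_assert(sizeof(c << 24) == sizeof(int), "shift operand is promoted to int");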
Can anyone please explain why this actually works?
The shift does not occur on an unsigned char but on a value promoted to int¹, as @dbush says.
Reasons why the code still has issues:
32-bit int
Shifting a 1 into the sign bit is undefined behavior (UB). See also @Eric Postpischil.
((unsigned char*)&L->data)[0] << 24 // UB
16-bit int
Shifting by the bit width or more is UB as above, and even an unsigned type would lack the precision. Perhaps then OP would only have wanted a 2-byte endian swap?
Alternative
const uint8_t *p = (const uint8_t *)&L->data; /* &L->data is a void**, so a cast is needed */
uint32_t data = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
                (uint32_t)p[2] << 8  | (uint32_t)p[3] << 0;
For the pedantic
Had int used a non-2's-complement representation, the addition of a negative value from (((unsigned char*)&L->data)[0] << 24) would have messed up the bit pattern. Endian manipulations are best done using unsigned types.
from little-endian to big-endian
This code does not swap between those two endians. It is a big-endian-to-native-endian conversion. When it runs on a 32-bit little-endian machine, it is effectively a big/little swap; on a 32-bit big-endian machine, it could have been a no-op.
¹ ... or possibly an unsigned int on select platforms where UCHAR_MAX > INT_MAX.
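If an unconditional 32-bit byte swap is what was really wanted, a sketch:

uint32_t bswap32(uint32_t x) { /* 0x01020304 -> 0x04030201 */
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}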

32-Bit variable shifting in 8-bit MCU

I'm using a 32-bit variable for storing 8-bit values in one 32-bit value.
buf32[0] = cmd[9] << 16 | cmd[10] << 8 | cmd[11] << 0;
cmd is of unsigned char type with data
cmd[9] = 0xAA
cmd[10] = 0xBB
cmd[11] = 0xCC
However, when the 32-bit variable is printed, I get 0xFFFFBBCC.
Architecture- 8-bit AVR Xmega
Language- C
Can anyone figure out where I'm going wrong?
Your architecture uses a 16-bit int, so shifting by 16 places is undefined. Cast your cmd[9] to a wider type, e.g. (uint32_t)cmd[9] << 16 should work.
You should also apply this cast to the other operands: when you shift cmd[10] by 8 places, you can shift into the sign bit of the 16-bit signed int your operands are automatically promoted to, leading to more strange/undefined behavior.
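Putting both casts together, a corrected sketch of the original line (assuming buf32 is an array of uint32_t):

buf32[0] = (uint32_t)cmd[9] << 16 | (uint32_t)cmd[10] << 8 | (uint32_t)cmd[11];
/* yields 0x00AABBCC for cmd[9] = 0xAA, cmd[10] = 0xBB, cmd[11] = 0xCC */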
That is because you are trying to shift a value held in an 8-bit container (unsigned char) and get a 32-bit result. The 8-bit value will be promoted to int (16 bits here), but that is still not enough. You can solve the issue in many ways; one of them, for example, is to use the destination variable as an accumulator:
buf32[0] = cmd[9];
buf32[0] <<= 8;
buf32[0] |= cmd[10];
buf32[0] <<= 8;
buf32[0] |= cmd[11];

Why doesn't my bit manipulation construct work in C

What I'm trying to do is make a mask with a 1 bit all the way to the left side of the set of bits with the rest being zero, irrespective of variable size. I tried the following:
unsigned char x = ~(~0 >> 1);
which, to me, should work whether it's done on a char or an int, but it doesn't!
To me, the manipulation looks like this:
||||||||
0|||||||
|0000000
This is what it appears it should look like, here on a 16-bit integer:
|||||||| ||||||||
0||||||| ||||||||
|0000000 00000000
Why doesn't this construct work? It gives me zero whether I assign it to an unsigned char or an int.
I'm on about page 50 of K&R, so I'm pretty new. I don't know what a literal means, I'm not sure what an "arithmetic" shift is, I don't know how to use suffixes, and I damn sure can't use a structure.
~0 is the int zero with all bits inverted, which is the int consisting of all ones. On a 2s complement machine, this is a -1. Right shifting a -1 will cause sign extension, so ~0 >> 1 is still all ones.
What you want is to right shift an unsigned quantity, which will not invoke sign extension.
~0u >> 1
is an unsigned integer with the high order bit zero and all others set to 1, so
~(~0u >> 1)
is an unsigned integer with the high order bit of one and all others set to zero.
Now getting this to work for all data sizes is nontrivial because C converts the operands of integer arithmetic to int or unsigned int beforehand. For example,
~(unsigned char)0 >> 1
typically produces an int result of -1 (right-shifting a negative int is implementation-defined), because the unsigned char is "promoted" to int before the ~ is applied.
So to get what you want with all data types, the only way I can see is to use sizeof to see how many bytes (or octets) are in the data.
#include <stdio.h>
#include <limits.h>

/* 1u avoids the undefined behavior of shifting a signed 1 into the sign bit */
#define LEADING_ONE(X) (1u << (CHAR_BIT * sizeof(X) - 1))

int main(void) {
    printf("%x\n", LEADING_ONE(char));
    printf("%x\n", LEADING_ONE(int));
    return 0;
}
The general rule for C is that expressions are evaluated in a common type, in this case (signed) integer. The evaluation of (~0) and (~0 >> 1) are signed integers and the shift is an arithmetic shift. In your case that is being implemented with sign extension, so:
(0xffffffff >> 1) => (0xffffffff)
A logical shift will inject the zero on the left that you were expecting, so your problem is how to make the compiler do a logical shift. Try:
unsigned char a = ~0;
unsigned char b = a >> 1; // this should do a logical shift
unsigned char c = ~b;
There are better ways to do what you are trying, but this should get you over the current problem.
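One such better way, as a sketch, builds the mask from UCHAR_MAX so no shifting of signed values is needed at all:

#include <limits.h>

unsigned char mask = (unsigned char)~(UCHAR_MAX >> 1); /* 0x80 when char is 8 bits */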
There are two things that are giving you the unexpected result.
You are starting out with 0, which is treated as a signed int.
The intermediate results get converted to int.
If you work with unsigned char at strategic points, you should be OK.
unsigned char c = ((unsigned char)~0 >> 1);
c = ~c;

Assigning bits to a 64-bit variable

I am kinda new to bit operations. I am trying to store information in an int64_t variable like this:
int64_t u = 0;
int i;
for (i = 0; i < 44; i++)
    u |= 1 << i;
for (; i < 64; i++)
    u |= 0 << i;
int t = __builtin_popcountl(u);
and what I intended with this was to store 44 ones in variable u and make sure that the remaining positions are all 0, so that t comes out as 44. However, it is always 64. With other variable widths, e.g. int32_t, it also fails. Why?
The type of an expression is generally determined by the expression itself, not by the context in which it appears.
Your variable u is of type int64_t (incidentally, uint64_t would be better since you're performing bitwise operations).
In this line:
u |= 1 << i;
since 1 is of type int, 1 << i is also of type int. If, as is typical, int is 32 bits, this has undefined behavior for larger values of i.
If you change this line to:
u |= (uint64_t)1 << i;
it should do what you want.
You could also change the 1 to 1ULL. That gives it a type of unsigned long long, which is guaranteed to be at least 64 bits but is not necessarily the same type as uint64_t.
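For completeness, <stdint.h> also provides the UINT64_C macro, which yields an integer constant of type uint_least64_t:

u |= UINT64_C(1) << i; /* constant guaranteed to be at least 64 bits wide */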
__builtin_popcountl takes unsigned long as its parameter, which is not always a 64-bit integer. I personally use __builtin_popcountll, which takes unsigned long long; but it looks like that is not the problem in your case.
Integer literals have type int by default, and shifting an int by anything greater than or equal to its width in bits (32, to be precise, on your platform) is undefined behavior. Correct usage: u |= 1LL << i; here LL stands for long long.
ORing with zero does nothing. You can't set a bit to an arbitrary value that way: you should either OR with a mask (to set bits to 1) or AND with the mask's negation (to set bits to 0); negation is done with tilde (~).
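For illustration (k here is just an arbitrary bit index), setting or clearing a single bit of a 64-bit value might look like this sketch:

u |=  (uint64_t)1 << k;   /* set bit k */
u &= ~((uint64_t)1 << k); /* clear bit k */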
When you shift a 1 into the high bit of the 32-bit integer and then convert it to 64 bits, the sign bit extends through the upper 32 bits, which you then OR in, setting all 64 bits; this happens because your literal 1 is a signed 32-bit int by default. The shift itself cannot affect the upper 32 bits, since the value is only 32 bits wide, but the conversion to 64 bits does when the value being converted is negative.
This can be fixed by writing your first loop like this:
for(i=0;i<44;i++)
u |= (int64_t)1 << i;
Moreover, this loop does nothing since ORing with 0 will not alter the value:
for(;i<64;i++)
u |= 0 << i;

bytes are swapped when pointer cast is done in C

I have an array of unsigned short, i.e. 16 bits per element, in C. I have two unsigned short values which should be written back into the array in little-endian order, meaning the least significant element comes first. For example, if I have the following value:
unsigned int val = 0x12345678;
it should be stored in my array as:
unsigned short buff[10];
buff[0] = 0x5678;
buff[1] = 0x1234;
I have written code to store the value in one go, rather than extracting the upper and lower 16 bits of the int and writing them separately, since there might be atomicity problems. My code looks like this:
typedef unsigned int UINT32;
*((UINT32*)(buff)) = (val & 0xffff0000) + (val & 0xffff);
Surprisingly, the code above works correctly, and the results are:
buff[0] is 0x5678;
buff[1] is 0x1234;
The problem is, as shown, I am saving the unsigned short values in big-endian order, not little-endian as I wished. In other words, when I cast the pointer from unsigned short* to unsigned int*, the 16-bit elements get swapped automatically! Does anybody know what happens here and why the data gets swapped?
Your platform represents data in little-endian format, and by casting buff to (UINT32 *), you are telling the compiler that buff must now be interpreted as a pointer to unsigned int. The statement
*((UINT32*)(buff)) = (val & 0xffff0000) + (val & 0xffff);
just says "write (val & 0xffff0000) + (val & 0xffff) into this unsigned int (buff)". And that's what it does; how it stores the bytes is not your business. You're not supposed to access the lower or upper 16 bits separately, because it is platform-dependent which one comes first.
All you know is that if you access buff as an unsigned int, you will get back the same value you previously stored there; it is not safe to assume any particular byte order.
So basically your code has undefined behavior.
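A portable way to get the intended element order, as a sketch, is to store the two halves explicitly:

buff[0] = (unsigned short)(val & 0xffffu); /* least significant half first */
buff[1] = (unsigned short)(val >> 16);     /* most significant half second */

This is two separate 16-bit stores, so the atomicity concern from the question would have to be addressed by other means.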
