Bit shift and pointer oddities in C, looking for explanations - c

I discovered something odd that I can't explain. If someone here can see what or why this is happening I'd like to know. What I'm doing is taking an unsigned short containing 12 bits aligned high like this:
1111 1111 1111 0000
I then want to shift the bits so that each byte in the short holds 7 bits, with the MSB as a pad. The result for the value above should look like this:
0111 1111 0111 1100
What I have done is this:
unsigned short buf = 0xfff;
//align high
buf <<= 4;
buf >>= 1;
*((char*)&buf) >>= 1;
This gives me something that looks almost correct, but the last shift leaves the bits set like this:
0111 1111 1111 1100
Very odd. If I use an unsigned char as a temporary storage and shift that then it works, like this:
unsigned short buf = 0xfff;
buf <<= 4;
buf >>= 1;
unsigned char tmp = *((char*)&buf);
*((char*)&buf) = tmp >> 1;
The result of this is:
0111 1111 0111 1100
Any ideas what is going on here?

Yes, it looks like char is signed on your platform. If you did *((unsigned char*)&buf) >>= 1, it would work.

Let's break this down. I'll assume that your compiler treats short as 16 bits of memory.
unsigned short buf = 0xfff;
//align high
buf <<= 4;
is equivalent to:
unsigned short buf = 0xfff0;
... and
buf >>= 1;
should result in buf having the value 0x7ff8 (i.e. the bits shifted to the right by one bit). Now for your fancy line:
*((char*)&buf) >>= 1;
There's a lot going on here. First the left-hand side needs to be resolved. What you're saying is: take the address of buf and treat it as a pointer to 8 bits of memory (as opposed to its natural 16 bits). Which of buf's two bytes that pointer refers to depends on your machine's endianness (on a big-endian machine it points to 0x7f, on a little-endian machine it points to 0xf8). I'll assume you're on an Intel box, which is little-endian, so the pointer refers to 0xf8. Your statement then says: assign to that byte the value at that byte shifted to the right by one (and sign-extended, since char is signed), which is 0xfc. The other byte remains unchanged. If you want no sign extension, cast the address of buf to (unsigned char *).
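As a minimal sketch of the same packing without any pointer casts (so the result depends neither on endianness nor on whether char is signed), the two bytes can be split out with shifts and masks. The function name pack_7bit is made up for illustration and does not come from the question; it assumes a 16-bit input with the 12 bits aligned high.

```c
#include <stdint.h>

/* Hypothetical helper: takes 12 bits aligned high in a 16-bit value
 * (e.g. 0xfff0) and returns them as two 7-bit fields, one per byte,
 * with the MSB of each byte left as a zero pad. */
uint16_t pack_7bit(uint16_t v)
{
    v >>= 1;                            /* 0xfff0 -> 0x7ff8 */
    uint8_t hi = (uint8_t)(v >> 8);     /* 0x7f: pad bit is already clear */
    uint8_t lo = (uint8_t)(v & 0xffu);  /* 0xf8 */
    lo >>= 1;                           /* unsigned, so a 0 is shifted in: 0x7c */
    return (uint16_t)((hi << 8) | lo);
}
```

With the value from the question, pack_7bit(0xfff0) gives 0x7f7c, i.e. 0111 1111 0111 1100.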

Related

How to copy two elements of a list into one bigger element?

Here's an example :
I have a list of uint8_t element (It's not real code ^^) :
List[0] = 0010 1001
List[1] = 0100 0111
And have one unsigned short element (Twice the size of uint8_t).
I want my short element to be this after : 0010 1001 0100 0111
Can I move it like this :
ShortElement = (unsigned short) UnsignedInt8List[0];
Or do I have to use binary tweaks to move it?
Thank you :)
No, you can't simply cast the value. Since unsigned short can always represent the value in uint8_t, the value will remain unchanged. If you do ShortElement = (unsigned short) UnsignedInt8List[0];, ShortElement will be assigned the value 0000 0000 0010 1001.
You should use bitwise operations to combine the values:
ShortElement = ((unsigned short) UnsignedInt8List[0] << 8) |
((unsigned short) UnsignedInt8List[1] << 0);
This is a little more verbose than it needs to be, but I've included everything for clarity. It could be equivalently written as:
ShortElement = ((unsigned short) UnsignedInt8List[0] << 8) | UnsignedInt8List[1];
One note is to be careful with your types. You know the size of uint8_t, but you don't portably know that unsigned short is exactly 16 bits long, you only know that it must be at least 16 bits long to be able to hold the range [0, 65535]. This isn't a problem in anything you've shown because of that minimum of 16 bits, it's even often recommended to use non-fixed width types where safe and convenient, but it's something to keep in mind.
One thing that might be tempting but that you should not do is to use pointers to get the value:
// DO NOT DO THIS, IT IS THE WRONG WAY
ShortElement = *(unsigned short *) &UnsignedInt8List[0];
Don't do that, you'll get the wrong result on many systems, and it might outright cause a crash on some. It's undefined behaviour.
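For reference, here is the shift-and-OR approach wrapped in a small function, as a sketch only. The name combine_bytes is hypothetical, and uint16_t is used for the result on the assumption that exactly 16 bits are wanted.

```c
#include <stdint.h>

/* Hypothetical helper: list[0] becomes the high byte, list[1] the low byte. */
uint16_t combine_bytes(const uint8_t list[2])
{
    return (uint16_t)(((uint16_t)list[0] << 8) | list[1]);
}
```

With the bytes from the question (0010 1001 and 0100 0111, i.e. 0x29 and 0x47) this returns 0x2947, whatever the byte order of the machine.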

Converting array of characters to an array of uint32_t in c-- is this the proper way?

I am trying to convert an array of characters into an array of uint32_t in order to use that in a CRC calculation. I was curious if this is the correct way to do this or if it is dangerous? I have a habit of doing dangerous conversions and I am trying to learn better ways to convert things that are less dangerous :). I know that each char in the array is 8 bits. Should I sum 4 of the characters up and toss it into an index of the unsigned int array or is it ok just to place each character in its separate array? Would summing four 8 bit characters up change their values into the array? I have read something about shifting characters, however, I am not sure exactly how to shift the four characters into one index of the unsigned int array.
text[i] is my array of characters.
uint32_t inputText[512];
for( i = 0; i < 504; i++)
{
inputText[i] = (uint32_t)text[i];
}
The cast seems fine; although, I'm not sure why you say i < 504 when your array of uint32_ts is 512. (If you do want to only convert 504 values and you want a 512-length array, you might want to use array[512] = {0} to ensure the memory is zeroed out instead of the last 8 values being set to whatever was previously in the memory.) Nonetheless, it is perfectly safe to say: SomeArrayOfLargerType[i] = (largerType_t)SomeArrayOfSmallerType[i], but bear in mind that how it is now, your binary will end up looking something like:
0100 0001 -> 0000 0000 0000 0000 0000 0000 0100 0001
So, those 24 leading 0s might be an undesired result.
As for summing up four characters, that will almost definitely not work out how you want, unless you literally want the sum, like 0000 0001 (one) + 0000 0010 (two) = 0000 0011 (three). If you would instead want the previous example to produce 0000 0001 0000 0010, then yes, you would need to apply shifts.
UPDATE - Some information about shifting via example:
The following would be an example of shifting:
uint32_t valueArray[FINAL_LENGTH] = {0};
int i;
for(i=0; i < TEXT_LENGTH; i++){ // text_length is the initial message/text length (512 bytes or something)
int mode = i % 4; // 4-to-1 value storage ratio (4 uint8s being stored as 1 uint32)
int writeLocation = (int)(i/4); // values will be truncated, so something like 3/4 = 0 (which is desired)
switch(mode){
case(0):
// add to bottom 8-bits of index
valueArray[writeLocation] = (uint32_t)(unsigned char)text[i]; // cast so a signed char is not sign-extended
break;
case(1):
valueArray[writeLocation] |= ((uint32_t)(unsigned char)text[i] << 8); // shift to left by 8 bits to insert to second byte
break;
case(2):
valueArray[writeLocation] |= ((uint32_t)(unsigned char)text[i] << 16); // shift to left by 16 bits to insert to third byte
break;
case(3):
valueArray[writeLocation] |= ((uint32_t)(unsigned char)text[i] << 24); // shift to left by 24 bits to insert to fourth byte
break;
default:
printf("Some error occurred here... If source has been modified, please check to make sure the number of case handlers == the possible values for mode.\n");
}
}
You can see an example of that running here: https://ideone.com/OcDMoM (Note, there is some runtime error when executing that on IDEOne. I haven't looked intensely for that issue, though, as the output still seems to be accurate and the code is just meant to serve as an example.)
Essentially, because each byte is 8-bits, and you want to store the bytes in 4-byte chunks (32-bits each), you need four different cases for how far you shift. In the first case, the first 8-bits are filled in by a byte from the message. In the second case, the second 8-bits are filled in by the following byte in the message (which is left shifted by 8-bits because that is the offset for the binary position). And that continues for the remaining 2 bytes, and then it repeats starting at the next index of the initial message array.
When combining the bytes, |= is used because that will take what is already in uint32 and it will perform a bitwise OR on it, so the final values will combine into one single value.
So, to break down a simple example like what I had in my initial post, let's say I have 0000 0001 (one) and 0000 0010 (two), with an initial 16-bit integer to hold them 0000 0000 0000 0000. The first byte is assigned to the 16-bit integer making it 0000 0000 0000 0001. Then the second byte is left shifted by 8 making it 0000 0010 0000 0000. Finally, the two are combined via a bitwise OR, so the 16-bit integer becomes: 0000 0010 0000 0001.
In the case of a 32-bit integer to hold 4 bytes, that process repeats 2 more times, with shifts of 16 and 24 bits, and then it proceeds to the next uint32 to repeat the process.
Hopefully that all makes sense. If not, I can try to clarify further.
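The same packing can also be written without a switch, by computing the shift from the byte's position. This is only a sketch: the name pack_bytes_le and its signature are made up here, and it assumes the first byte of each group should land in the lowest 8 bits, as in the example above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: packs n bytes into 32-bit words, 4 bytes per word,
 * with the first byte of each group in the lowest 8 bits.
 * out must have room for (n + 3) / 4 words.
 * Returns the number of words written. */
size_t pack_bytes_le(const unsigned char *text, size_t n, uint32_t *out)
{
    size_t words = (n + 3) / 4;
    for (size_t w = 0; w < words; w++)
        out[w] = 0;                              /* start from all-zero words */
    for (size_t i = 0; i < n; i++) {
        unsigned shift = (unsigned)(i % 4) * 8;  /* 0, 8, 16 or 24 */
        out[i / 4] |= (uint32_t)text[i] << shift;
    }
    return words;
}
```

Taking the input as unsigned char and widening to uint32_t before shifting avoids sign extension and avoids shifting a bit into the sign bit of an int.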

C: bit operations on a variable-length bit string

I'm doing some bit operations on a variable-length bit string.
I defined a function setBits(char *res, int x, int y) that should work on that bit string passed by the *res variable, given a x and y (just to mention, I'm trying to implement something like a Bloom filter using 8 bits per x):
void setBits(char *res, int x, int y)
{
*res |= x << (y * 8);
}
E.g. given the following x-y-vectors {0,0} ; {0,1} ; {1,2} ; {2,3}, I expect a bit string like this (or vice-versa depending whether little- or big-endian, but that isn't important right now):
0000 0010 0000 0001 0000 0000 0000 0000
So the lowest 8 bits should come from {0,0}, the second 8 bits from {0,1}, the next 8 bits come from {1,2} and the last from {2,3}.
Unfortunately, and I don't seem to get the reason for that, setBits always returns only the last result (in this case i.e. the bit string from {2,3}). I debugged the code and realized that *res is always 0 - but why? What am I doing wrong? Is it that I chose char* that it doesn't work or am I completely missing something very stupid?
Assuming 8-bit chars, the maximum value you can store in *res is 0xff i.e. (1<<8)-1.
Consider what happens when you call setBits for x=1, y=1
x << (y * 8) == 1 << (1 * 8)
== 1 << 8
== 0x100
*res is an 8-bit value so can only store the bottom 8 bits of this calculation. For any non-zero value of y, the bits which can be stored in *res are guaranteed to be 0.
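One possible fix, sketched under the assumption that res points to a buffer of at least y + 1 bytes and that y is meant as a byte index into the bit string: index the buffer instead of shifting within a single char. The name set_bits and the changed parameter types are illustrative, not taken from the question.

```c
#include <stddef.h>

/* Hypothetical variant of setBits: ORs the low 8 bits of x into byte y
 * of the bit string. The caller must provide at least y + 1 bytes. */
void set_bits(unsigned char *res, unsigned x, size_t y)
{
    res[y] |= (unsigned char)(x & 0xffu);
}
```

For the vectors {0,0}, {0,1}, {1,2}, {2,3} and a zeroed 4-byte buffer, this leaves the bytes 0x00, 0x00, 0x01, 0x02 in positions 0 to 3.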

How to convert a 48-bit byte array into an 64-bit integer in C?

I have an unsigned char array whose size is 6. The content of the byte array is an integer (4096*number of seconds since Unix Time). I know that the byte array is big-endian.
Is there a library function in C that I can use to convert this byte array into int_64 or do I have to do it manually?
Thanks!
PS: just in case you need more information, yes, I am trying to parse a Unix timestamp. Here is the format specification of the timestamp that I am dealing with.
A C99 implementation may offer uint64_t (it doesn't have to provide it if there is no native fixed-width integer that is exactly 64 bits), in which case, you could use:
#include <stdint.h>
unsigned char data[6] = { /* bytes from somewhere */ };
uint64_t result = ((uint64_t)data[0] << 40) |
((uint64_t)data[1] << 32) |
((uint64_t)data[2] << 24) |
((uint64_t)data[3] << 16) |
((uint64_t)data[4] << 8) |
((uint64_t)data[5] << 0);
If your C99 implementation doesn't provide uint64_t you can still use unsigned long long or uint_least64_t, both of which C99 requires. This will work regardless of the native endianness of the host.
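The same conversion can be written as a loop, which may be easier to adapt to other widths. This is a sketch: read_be48 and be48_to_seconds are hypothetical names, and the division by 4096 relies only on the question's statement that the value is 4096 times the number of seconds (the fractional part is discarded).

```c
#include <stdint.h>

/* Hypothetical helper: reads 6 big-endian bytes into a 64-bit integer. */
uint64_t read_be48(const unsigned char data[6])
{
    uint64_t result = 0;
    for (int i = 0; i < 6; i++)
        result = (result << 8) | data[i];  /* most significant byte first */
    return result;
}

/* Assumes the value is 4096 * seconds, as stated in the question. */
uint64_t be48_to_seconds(const unsigned char data[6])
{
    return read_be48(data) / 4096;
}
```

Because only shifts and ORs on values are used, the result does not depend on the byte order of the host.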
Have you tried this:
unsigned char a [] = {0xaa,0xbb,0xcc,0xdd,0xee,0xff};
unsigned long long b = 0;
memcpy(&b,a,sizeof(a)*sizeof(char));
printf("%llx\n", b);
Or you can do it by hand which will avoid some architecture specific issues.
I would recommend using normal integer operation (sums and shifts) rather than trying to emulate memory block ordering which is no better than the solution above in term of compatibility.
I think the best way to do it is using a union.
union time_u{
uint8_t data[6];
uint64_t timestamp;
};
Then you can use that memory space as a byte array or uint64_t, by referencing
union time_u var_name;
var_name.data[i]
var_name.timestamp
Here is a method to convert it to 64 bits:
uint64_t
convert_48_to_64(uint8_t *val_ptr){
uint64_t ret = 0;
uint8_t *ret_ptr = (uint8_t *)&ret;
for (int i = 0; i < 6; i++) {
ret_ptr[5-i] = val_ptr[i];
}
return ret;
}
convert_48_to_64((uint8_t *)&temp); // temp holds the 48-bit value
For example, take num_in_48_bit = 77340723707904. In 48-bit binary this number is:
0100 0110 0101 0111 0100 1010 0101 1101 0000 0000 0000 0000
After conversion, the 64-bit binary will be:
0000 0000 0000 0000 0000 0000 0000 0000 0101 1101 0100 1010 0101 0111 0100 0110
Say val_ptr stores the base address of num_in_48_bit. Since the pointer is cast to uint8_t *, incrementing val_ptr gives you the next byte, so the loop copies the value byte by byte. Note that I am taking care of the network-to-host byte order conversion as well; the byte reversal assumes a little-endian host.
You can use pack option
#pragma pack(1)
or
__attribute__((packed))
depending on the compiler
typedef struct __attribute__((packed))
{
uint64_t u48: 48;
} uint48_t;
uint48_t data;
memcpy(&data, six_byte_array, 6);
uint64_t result = data.u48;
See
_int64 bit field
How can I create a 48-bit uint for bit mask
If a 32-bit integer overflows, can we use a 40-bit structure instead of a 64-bit long one?
Which C datatype can represent a 40-bit binary number?

C bitwise shift

I suppose sizeof(char) is one byte. Then when I write following code,
#include<stdio.h>
int main(void)
{
char x = 10;
printf("%d", x<<5);
}
The output is 320
My question is, if char is one byte long and value is 10, it should be:
0000 1010
When I shift by 5, shouldn't it become:
0100 0001
so why is output 320 and not 65?
I am using gcc on Linux and checked that sizeof(char) = 1
In C, all intermediates that are smaller than int are automatically promoted to int.
Therefore, your char is being promoted to larger than 8 bits.
So your 0000 1010 is being shifted up by 5 bits to get 320. (nothing is shifted off the top)
If you want to rotate, you need to do two shifts and a mask:
unsigned char x = 10;
x = (x << 5) | (x >> 3);
x &= 0xff;
printf("%d", x);
It's possible to do it faster using inline assembly or if the compiler supports it, intrinsics.
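A general 8-bit rotate can be sketched as below. The name rotl8 is not a standard function; it is a hypothetical helper, and it assumes the rotate count should wrap modulo 8.

```c
#include <stdint.h>

/* Hypothetical helper: rotates x left by n bit positions (n taken modulo 8). */
uint8_t rotl8(uint8_t x, unsigned n)
{
    n &= 7u;  /* keep the count in 0..7 */
    /* x is promoted to int, so nothing is lost until the final cast
     * truncates the result back to 8 bits. */
    return (uint8_t)((x << n) | (x >> ((8u - n) & 7u)));
}
```

rotl8(10, 5) gives 65 (0100 0001), which is the value the question expected.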
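Mysticial is right. If you store the shifted result back into a char and print it as a character,
unsigned char x = 10;
x = x << 5;
printf("%c", x);
it prints "@", which, if you check your ASCII table, is 64: the result was truncated to the low 8 bits.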
0000 1010 << 5 = 0001 0100 0000
You had overflow, but since it was promoted to an int, it just printed the number.
Because what you describe is a rotate, not a shift. 0 is always shifted in on left shifts.
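To see the promotion and the truncation side by side, here is a small sketch. Both function names are made up for illustration, and the second result assumes 8-bit chars.

```c
/* The char operand is promoted to int before the shift,
 * so no bits are shifted off the top. */
int shift_promoted(char x)
{
    return x << 5;
}

/* Converting the int result back to unsigned char keeps only the low
 * 8 bits (assuming CHAR_BIT == 8). */
unsigned char shift_truncated(unsigned char x)
{
    return (unsigned char)(x << 5);
}
```

For x = 10, shift_promoted returns 320 and shift_truncated returns 64. Neither is 65, because bits shifted out are not wrapped around.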

Resources