shifting an unsigned char by more than 8 bits - c

I'm a bit troubled by this code:
typedef struct _slink {
    struct _slink *next;
    char type;
    void *data;
} slink;
Assume this describes a link in a file, where data is 4 bytes long and represents either an address or an integer (depending on the type of the link).
Now I'm looking at reformatting numbers in the file from little-endian to big-endian, so what I want to do is change the order of the bytes before writing back to the file. For example, for 0x01020304 I want to produce 0x04030201, so that when I write it back, its little-endian representation looks like the big-endian representation of 0x01020304. I do that by multiplying the i'th byte by 2^(8*(3-i)), where i is between 0 and 3. This is one way it was implemented, and what troubles me here is that it shifts bytes by more than 8 bits (L is of type struct _slink*):
int data = (((unsigned char*)&L->data)[0]<<24) + (((unsigned char*)&L->data)[1]<<16) +
           (((unsigned char*)&L->data)[2]<<8)  + (((unsigned char*)&L->data)[3]<<0);
Can anyone please explain why this actually works, without these bytes having been explicitly cast to integers to begin with (since they are only 1 byte each but are shifted by up to 24 bits)?
Thanks in advance.

Any integer type smaller than int is promoted to type int when used in an expression.
So the shift is actually applied to an expression of type int instead of type char.
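As a minimal illustration (a sketch, assuming a typical platform with 32-bit int), the unsigned char below is promoted to int before the shift; the value is kept small so the promoted shift stays clear of the sign bit:

#include <stdio.h>

int main(void)
{
    unsigned char b = 0x01;
    int shifted = b << 24;  /* b is promoted to int, so this yields 0x01000000, not 0 */
    printf("%#x\n", (unsigned)shifted);
    return 0;
}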

Can anyone please explain why this actually works?
The shift does not occur on an unsigned char but on a value promoted to int [1], as @dbush noted.
Reasons why the code still has issues:
32-bit int
Shifting a 1 into the sign bit's position is undefined behavior (UB). See also @Eric Postpischil.
(((unsigned char*)&L->data)[0]<<24) // UB
16-bit int
With a 16-bit int, shifting by 24 is shifting by the bit width or more, which is UB just as above; and even if the type were unsigned, 16 bits would be insufficient precision for the result. Perhaps the OP would then have wanted only a 2-byte endian swap?
Alternative
const uint8_t *p = (const uint8_t *)&L->data;
uint32_t data = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
                (uint32_t)p[2] << 8  | (uint32_t)p[3] << 0;
For the pedantic
Had int used a non-2's-complement representation, adding a negative value from (((unsigned char*)&L->data)[0]<<24) would have messed up the bit pattern. Endian manipulations are best done using unsigned types.
from little-endian to big-endian
This code does not swap between those two endiannesses; it is a big-endian-to-native-endian conversion. When this code is run on a 32-bit little-endian machine, it is effectively a big/little swap. On a 32-bit big-endian machine, it could have been a no-op.
[1] ... or possibly unsigned int on select platforms where UCHAR_MAX > INT_MAX.
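If the goal is the OP's file conversion, one portable approach (a sketch with hypothetical helper names, not code from the original answer) is to read and write the bytes explicitly, so the host's endianness never enters the picture:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical helpers for illustration only. */
static uint32_t load_be32(const void *src)   /* read 4 bytes as a big-endian value */
{
    const uint8_t *p = src;
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8  | (uint32_t)p[3];
}

static void store_le32(void *dst, uint32_t v)  /* write the value back as little-endian bytes */
{
    uint8_t *p = dst;
    p[0] = (uint8_t)(v);
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

int main(void)
{
    uint8_t bytes[4] = { 0x01, 0x02, 0x03, 0x04 };
    uint32_t v = load_be32(bytes);   /* 0x01020304 regardless of host endianness */
    store_le32(bytes, v);            /* bytes become 04 03 02 01 */
    printf("%#010x -> %02x %02x %02x %02x\n",
           (unsigned)v, bytes[0], bytes[1], bytes[2], bytes[3]);
    return 0;
}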

Related

set most significant bit in C

I am trying to set the most significant bit in a long long unsigned, x.
To do that I am using this line of code:
x |= 1<<((sizeof(x)*8)-1);
I thought this should work, because sizeof gives the size in bytes, so I multiplied by 8 and subtracted one to set the final bit. Whenever I do that, the compiler gives this warning: "warning: left shift count >= width of type"
I don't understand why this warning occurs.
The 1 that you are shifting is a constant of type int, which means that you are shifting an int value by (sizeof(unsigned long long) * 8) - 1 bits. This shift can easily be more than the width of int, which is apparently what happened in your case.
If you want to obtain a bit-mask of unsigned long long type, you should start with an initial bit-mask of unsigned long long type, not of int type.
1ull << (sizeof(x) * CHAR_BIT - 1)
An arguably better way to build the same mask would be
~(-1ull >> 1)
or
~(~0ull >> 1)
Use 1ULL << instead of 1 <<.
Using just 1 makes you shift an int. 1ULL is an unsigned long long, which is what you need.
An int will probably be 32 bits wide and a long long probably 64 bits wide. So shifting:
1 << ((sizeof(long long)*8)-1)
will be (most probably):
1 << 63
Since 1 is an int, which is (most probably) 32 bits, you get a warning because you are trying to shift past the MSB of a 32-bit value.
The literal 1 you are shifting is not automatically an unsigned long long (but an int) and thus does not have as many bits as you need. Suffix it with ULL (i.e., 1ULL), or cast it to unsigned long long before shifting to make it the correct type.
Also, to be a bit safer for strange platforms, replace 8 with CHAR_BIT. Note that this is still not necessarily the best way to set the most significant bit, see, e.g., this question for alternatives.
You should also consider using a type such as uint64_t if you're assuming unsigned long long to be a certain width, or uint_fast64_t/uint_least64_t if you need at least a certain width, or uintmax_t if you need the largest available type.
Thanks to the 2's-complement representation of negative integers, the most negative integer has exactly the desired bit pattern, with only the MSB set. So x |= (unsigned long long)LLONG_MIN; should work too (LLONG_MIN is in <limits.h>).
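Putting the answers together, a small sketch (assuming a 64-bit unsigned long long) comparing the suggested ways of building the MSB mask:

#include <limits.h>
#include <stdio.h>

int main(void)
{
    unsigned long long x = 0;

    /* Three equivalent masks with only the MSB set; each starts from an
       unsigned long long constant, so the shift is done at full width. */
    unsigned long long m1 = 1ULL << (sizeof(x) * CHAR_BIT - 1);
    unsigned long long m2 = ~(-1ULL >> 1);
    unsigned long long m3 = ~(~0ULL >> 1);

    x |= m1;
    printf("%d %d\n", m1 == m2, m2 == m3);  /* prints: 1 1 */
    printf("%#llx\n", x);                   /* 0x8000000000000000 with 64-bit unsigned long long */
    return 0;
}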

bytes are swapped when pointer cast is done in C

I have an array of unsigned short, i.e. 16 bits per element, in C. I have two unsigned short values which should be written back into the array in little-endian order, which means the least significant element comes first. For example, if I have the following value:
unsigned int val = 0x12345678;
it should be stored in my array as:
unsigned short buff[10];
buff[0] = 0x5678;
buff[1] = 0x1234;
I have written code to write the value at once, rather than extracting the upper and lower 16 bits of the int value and writing them separately, since there might be atomicity problems. My code looks like this:
typedef unsigned int UINT32;
*((UINT32*)(buff)) = (value & 0xffff0000) + (value & 0xffff);
Surprisingly, the code above works correctly and the results will be:
buff[0] is 0x5678;
buff[1] is 0x1234;
The problem is, as shown, I am saving the unsigned short values in big-endian order and not little-endian as I wish. In other words, when I cast the pointer from unsigned short* to unsigned int*, the 16-bit elements are swapped automatically! Does anybody know what happens here and why the data gets swapped?
Your platform represents data in little-endian format, and by casting buff to (UINT32 *), you are telling the compiler that buff must now be interpreted as a pointer to unsigned int. The instruction
*((UINT32*)(buff)) = (value & 0xffff0000) + (value & 0xffff);
just says "write (value & 0xffff0000) + (value & 0xffff) into this unsigned int (buff)". And that's what it does; how it stores the bytes is not your business. You're not supposed to access either the lower or the upper 16 bits separately, because which one comes first is platform-dependent.
All you know is that if you access buff as an unsigned int, you will get the same value that you previously stored in there, but it is not safe to assume any particular byte order.
So basically your code has undefined behavior.
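If the requirement really is "least significant half first", here is a sketch of the explicit, order-independent way to fill the array (this does not address the atomicity concern from the question, which would need a platform-specific answer):

#include <stdio.h>

int main(void)
{
    unsigned int   val = 0x12345678;
    unsigned short buff[10];

    /* Store the halves explicitly; the layout no longer depends on the host's
       byte order or on type-punning through a UINT32 pointer. */
    buff[0] = (unsigned short)(val & 0xFFFFu);  /* 0x5678 */
    buff[1] = (unsigned short)(val >> 16);      /* 0x1234 */

    printf("%#06x %#06x\n", buff[0], buff[1]);
    return 0;
}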

Is it safe to compare a (uint32_t) with a hard-coded value?

I need to do bitwise operations on 32-bit integers (that actually represent chars, but whatever).
Is the following kind of code safe?
uint32_t input;
input = ...;
if (input & 0x03000000) {
    output = 0x40000000;
    output |= (input & 0xFC000000) >> 2;
}
I mean, in the if statement I am doing a bitwise operation with, on the left side, a uint32_t, and on the right side... I don't know what!
So do you know the type and size (by that I mean how many bytes it is stored in) of the hard-coded 0x03000000?
Is it possible that some systems consider 0x03000000 to be an int and hence encode it in only 2 bytes, which would be catastrophic?
Is the following kind of code safe?
Yes, it is.
So do you know the type and size (by that I mean how many bytes it is stored in) of the hard-coded 0x03000000?
0x03000000 is an int on a system with 32-bit int and a long on a system with 16-bit int.
(As uint32_t is present here, I assume two's complement and a CHAR_BIT of 8. Also, I don't know of any system with 16-bit int and 64-bit long.)
Is it possible that some systems consider 0x03000000 to be an int and hence encode it in only 2 bytes, which would be catastrophic?
See above: on a 16-bit-int system, 0x03000000 is a long and is at least 32 bits. A hexadecimal constant in C gets the first type from this list in which its value can be represented:
int, unsigned int, long, unsigned long, long long, unsigned long long
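For illustration, here is a C11 sketch (assuming a typical 32-bit-int platform) that uses _Generic to report which type from that list a hexadecimal constant actually gets:

#include <stdio.h>

#define TYPE_NAME(x) _Generic((x),            \
    int:                "int",                \
    unsigned int:       "unsigned int",       \
    long:               "long",               \
    unsigned long:      "unsigned long",      \
    long long:          "long long",          \
    unsigned long long: "unsigned long long", \
    default:            "other")

int main(void)
{
    printf("0x03000000 has type %s\n", TYPE_NAME(0x03000000)); /* int on a 32-bit-int system */
    printf("0xFC000000 has type %s\n", TYPE_NAME(0xFC000000)); /* unsigned int: it exceeds INT_MAX */
    return 0;
}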

Storing Variable in C

I am having some challenges with a basic concept in C. Help would be much obliged.
I went ahead and annotated the code with an explanation as well as the question I am trying to ask.
#include <stdio.h>

int main(void)
{
    printf("%x", (unsigned)((char) (0x0FF))); // I want to store just 0xFF
    /* Purpose of the next if-statement is to check whether the unsigned char, which is 255,
     * is the same as the unsigned int, which is also 255. How come the console doesn't print
     * "sup"? Ideally it should print "sup", since 0xFF == 0x000000FF.
     */
    if(((unsigned)(char) (0x0FF))==((int)(0x000000FF)))
        printf("%s","sup");
}
Thank you for your help.
You have gotten your parentheses wrong:
if(((unsigned)(char) (0x0FF))==((int)(0x000000FF)))
performs two casts on the left operand: first to char, usually(1) resulting in -1, and then that value is cast to unsigned int, usually(2) resulting in 2^32 - 1 = 4294967295.
(1) If char is signed, eight bits wide, and two's complement is used, the conversion just takes the least significant byte, as in the majority of hosted implementations. If char is unsigned, or wider than eight bits, the result will be 255.
(2) If the cast to char resulted in -1 and unsigned int is 32 bits wide.
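A minimal sketch of the fix (assuming a typical platform with a signed, 8-bit char): cast to unsigned char instead of char, so the value 0xFF survives the conversion:

#include <stdio.h>

int main(void)
{
    unsigned a = (unsigned char)0x0FF;   /* 0xFF: unsigned char keeps the value */
    unsigned b = (unsigned)(char)0x0FF;  /* usually 0xFFFFFFFF: char gives -1 first */

    printf("%#x %#x\n", a, b);
    if (a == 0x000000FF)
        printf("sup\n");                 /* prints with the corrected cast */
    return 0;
}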

variables of incompatible width

I am using the following code to simplify assigning large values to specific locations in memory:
int buffer_address = virtual_to_physical(malloc(BUFFER_SIZE));
unsigned long int ring_slot = buffer_address << 32 | BUFFER_SIZE;
However, the compiler complains: "warning: left shift count >= width of type". But an unsigned long int in C is 64 bits, so bit-shifting an int (32 bits) left by 32 bits should yield a 64-bit value, and hence the compiler shouldn't complain. But it does.
Is there something obvious I'm missing, or otherwise is there a simple workaround?
An unsigned long int is not necessarily 64 bits, but for simplicity let's assume it is.
buffer_address is of type int. An expression involving buffer_address and no "higher"-ranked type has type int. Thereby buffer_address << 32 has type int, not unsigned long, and the shift count is at least the width of int, so the compiler complains.
This should solve your issue though:
unsigned long ring_slot = ((unsigned long) buffer_address) << 32 | BUFFER_SIZE;
Please note, an unsigned long is not necessarily 64 bits, this depends on the implementation. Use this instead:
#include <stdint.h> // introduced in C99
uint64_t ring_slot = ((uint64_t) buffer_address) << 32 | BUFFER_SIZE;
buffer_address is a (32-bit) int, so buffer_address << 32 shifts it by an amount greater than or equal to its width.
unsigned long ring_slot = ((unsigned long) buffer_address << 32) | BUFFER_SIZE;
Note that unsigned long need not be 64 bits: it is not on Windows (32-bit ILP32 or 64-bit LLP64), nor is it on a 32-bit Unix machine (ILP32). To get a guaranteed (at least) 64-bit integer, you need unsigned long long.
There are few machines where int is a 64-bit quantity (ILP64); the DEC Alpha was one such, and I believe some Cray machines also used that (and the Crays also used 'big' char types, with more than 8 bits per char).
The result of the expression on the right-hand side of the = sign does not depend on what it is assigned to. You must cast to unsigned long first:
unsigned long int ring_slot = (unsigned long)buffer_address << 32 | BUFFER_SIZE;
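A self-contained sketch of the widened shift (with made-up values standing in for virtual_to_physical and malloc), also showing how the two halves unpack again:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t buffer_address = 0xDEADBEEF;  /* placeholder for the real physical address */
    uint32_t buffer_size    = 0x1000;      /* placeholder for BUFFER_SIZE */

    /* Widen before shifting so the shift happens in 64 bits. */
    uint64_t ring_slot = (uint64_t)buffer_address << 32 | buffer_size;

    /* Unpacking the two halves again. */
    uint32_t addr = (uint32_t)(ring_slot >> 32);
    uint32_t size = (uint32_t)(ring_slot & 0xFFFFFFFFu);

    printf("%#018llx -> addr %#x, size %#x\n",
           (unsigned long long)ring_slot, (unsigned)addr, (unsigned)size);
    return 0;
}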
