Bytes are swapped when pointer cast is done in C

I have an array of "unsigned short" i.e. 16-bits each element in C. I have two "unsigned short" values which should be written back in array in little endian order which means that least significant element will come first. For example, if I have following value:
unsigned int value = 0x12345678;
it should be stored in my array as:
unsigned short buff[10];
buff[0] = 0x5678;
buff[1] = 0x1234;
I have written code that writes the value in one go, rather than extracting the upper and lower 16 bits of the int value and writing them separately, since there might be atomicity problems. My code looks like this:
typedef unsigned int UINT32;
*((UINT32*)(buff)) = (value & 0xffff0000) + (value & 0xffff);
Surprisingly, the code above works correctly and the results will be:
buff[0] is 0x5678;
buff[1] is 0x1234;
The problem is that, as shown, I am writing the "unsigned short" values in big endian order and not in little endian order as I wish. In other words, when I cast the pointer from "unsigned short*" to "unsigned int*" the 16-bit elements are swapped automatically! Does anybody know what happens here and why the data gets swapped?

Your platform represents data in little endian format, and by casting buff to (UINT32 *), you are telling the compiler that buff must now be interpreted as pointer to unsigned int. The instruction
*((UINT32*)(buff)) = (value & 0xffff0000) + (value & 0xffff);
just says "write (value & 0xffff0000) + (value & 0xffff) into this unsigned int (buff)". And that's what it does; how it stores the value is not your business. You're not supposed to access either the lower or the upper 16 bits separately, because it is platform dependent which one comes first.
All you know is that if you access buff as an unsigned int, you will get the same value that you previously stored in there, but it is not safe to assume any particular byte order.
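So basically your code relies on platform-dependent byte order. Strictly speaking, it also has undefined behavior: writing to an array of unsigned short through an unsigned int lvalue violates the aliasing rules, and buff may not be suitably aligned for an unsigned int.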
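If the goal is simply "low half in buff[0], high half in buff[1]" on every platform, shifts and masks express that directly. The following is a minimal sketch; the function name is made up for illustration, and it uses the fixed-width types from <stdint.h> instead of the question's unsigned short and unsigned int. Note that it writes the two halves separately, so it does nothing for the atomicity concern raised in the question.

```c
#include <stdint.h>

/* Hypothetical helper: store a 32-bit value into two 16-bit elements,
   least significant half first. Only shifts and masks are used, so the
   result does not depend on the byte order of the host. */
static void store_u32_low_first(uint16_t *dst, uint32_t value)
{
    dst[0] = (uint16_t)(value & 0xFFFFu); /* least significant 16 bits */
    dst[1] = (uint16_t)(value >> 16);     /* most significant 16 bits */
}
```

Calling store_u32_low_first(buff, 0x12345678) leaves 0x5678 in buff[0] and 0x1234 in buff[1], whether the host is little or big endian.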

Related

shifting an unsigned char by more than 8 bits

I'm a bit troubled by this code:
typedef struct _slink{
struct _slink* next;
char type;
void* data;
}
assuming what this describes is a link in a file, where data is 4bytes long representing either an address or an integer(depending on the type of the link)
Now I'm looking at reformatting numbers in the file from little-endian to big-endian, and so what I wanna do is change the order of the bytes before writing back to the file, i.e.
for 0x01020304, I wanna convert it to 0x04030201 so when I write it back, its little endian representation is gonna look like the big endian representation of 0x01020304. I do that by multiplying the i'th byte by 2^(8*(3-i)), where i is between 0 and 3. Now this is one way it was implemented, and what troubles me here is that this is shifting bytes by more than 8 bits. (L is of type _slink*)
int data = (((unsigned char*)&L->data)[0]<<24) + (((unsigned char*)&L->data)[1]<<16) +
(((unsigned char*)&L->data)[2]<<8) + (((unsigned char*)&L->data)[3]<<0);
Can anyone please explain why this actually works? without having explicitly cast these bytes to integers to begin with(since they're only 1 bytes but are shifted by up to 24 bits)
Thanks in advance.
Any integer type smaller than int is promoted to type int when used in an expression.
So the shift is actually applied to an expression of type int instead of type char.
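The promotion can be checked at compile time. This is a small sketch that assumes a C11 compiler (for _Generic) and a platform where int is at least 32 bits wide; the function names are only for illustration.

```c
/* Returns 1 if the expression `c << 24` has type int, i.e. the
   unsigned char operand was promoted before the shift.
   _Generic only inspects the type; the shift is not evaluated here. */
static int shift_promotes_to_int(void)
{
    unsigned char c = 0x12;
    return _Generic(c << 24, int: 1, default: 0);
}

/* The value survives the shift because the arithmetic is done in a
   type wider than 8 bits. The explicit cast to unsigned int keeps the
   shift well defined for any byte value, assuming a 32-bit int. */
static unsigned int shifted_value(void)
{
    unsigned char c = 0x12;
    return (unsigned int)c << 24;
}
```

On a typical platform with 8-bit char and 32-bit int, shift_promotes_to_int() returns 1 and shifted_value() returns 0x12000000.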
Can anyone please explain why this actually works?
The shift does not occur as an unsigned char but as a type promoted to an int [1]. @dbush.
Reasons why code still has issues.
32-bit int
Shifting a 1 bit into the sign bit of an int is undefined behavior (UB). See also @Eric Postpischil.
((unsigned char*)&L->data)[0] << 24 // UB if the byte is 0x80 or more
16-bit int
Shifting by the bit width or more is insufficient precision even if the type was unsigned. As int it is UB like above. Perhaps then OP would have only wanted a 2-byte endian swap?
Alternative
const uint8_t *p = (const uint8_t *)&L->data;
uint32_t data = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
(uint32_t)p[2] << 8 | (uint32_t)p[3] << 0;
For the pedantic
Had int used non-2's complement, the addition of a negative value from ((unsigned char*)&L->data)[0]<<24 would have messed up the data pattern. Endian manipulations are best done using unsigned types.
from little-endian to big-endian
This code does not swap between those 2 endians. It is a big endian to native endian swap. When this code is run on a 32-bit unsigned little endian machine, it is effectively a big/little swap. On a 32-bit unsigned big endian machine, it could have been a no-op.
[1] ... or possibly an unsigned int on select platforms where UCHAR_MAX > INT_MAX.

Change char* to point a few bits offset

Say I have this code:
char num[2] = {15, 240};
char* p_num;
Now, if I have understood everything correctly, the bits in the array should be aligned like this:
00001111 11110000
My question is: Is there any possible way to make the pointer p_num to point to the four last bits in the first byte so that when I execute this code:
printf("%d", *p_num);
255 will be written?
I.e. p_num will point to the bits which the brackets below encloses:
0000[1111 1111]0000
No. The minimum addressable unit of memory is a byte (at best), though you could obtain the desired value using
((num[0] & 0xF) << 4) | ((num[1] >> 4) & 0xF)
For example,
unsigned char num[2] = {15, 240};
unsigned char combined = ((num[0] & 0xF) << 4) | ((num[1] >> 4) & 0xF);
printf("%d\n", (int)combined);
Note that I used unsigned char to store 240 and 255 since char can be signed or unsigned depending on the implementation.
No, for two reasons.
In C, the size of a char is defined to be 1. However, the unit itself is implementation dependent: sizeof counts in bytes, and C does not guarantee that a byte is 8 bits (CHAR_BIT may be larger). Granted, a byte is typically 8 bits. However, since this is not guaranteed, your premise that the bits in the bytes of the array will be arranged as you believe is not accurate. Though it would be unusual to have something else, based on the language you cannot rely on that arrangement of bits. Any workaround that relies on bit shifts, etc., will reliably work only on implementations with the same byte width and cannot be assumed to be portable.
A pointer to char will only point to a char. If you advance it using pointer arithmetic, it will advance by 1 of whatever the implementation-defined unit is. The same is true for pointers to any other type. There is no way provided by the language to advance a pointer by fractions of the size of the type that the pointer points to.
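Since a pointer cannot address individual bits, the usual workaround is a small helper that takes a bit offset and does the shifting itself. Below is a sketch under the assumption of 8-bit bytes (uint8_t only exists on such platforms); the function name and the "offset counted from the most significant bit of buf[0]" convention are choices made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read 8 bits starting bit_offset bits into buf,
   counting from the most significant bit of buf[0]. When the offset is
   not a multiple of 8, buf must have at least (bit_offset / 8) + 2
   elements, because the window spans two neighbouring bytes. */
static uint8_t read_8_bits_at(const uint8_t *buf, size_t bit_offset)
{
    size_t byte = bit_offset / 8;
    unsigned shift = (unsigned)(bit_offset % 8);

    if (shift == 0)
        return buf[byte]; /* byte-aligned: nothing to combine */

    /* Join the two bytes into 16 bits, then slide the window down. */
    unsigned pair = ((unsigned)buf[byte] << 8) | buf[byte + 1];
    return (uint8_t)(pair >> (8 - shift));
}
```

With the array {15, 240} from the question, read_8_bits_at(num, 4) yields 255, which is the value the question wanted *p_num to produce.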

convert chars to integer correct way [duplicate]

Just trying to make sure I got it right.
On SO I encountered an answer on a question: how to store chars in int like this:
unsigned int final = 0;
final |= ( data[0] << 24 );
final |= ( data[1] << 16 );
final |= ( data[2] << 8 );
final |= ( data[3] );
But to my understanding this is wrong isn't it?
Why: say data has stored the integer in little endian way (e.g., data[0]=LSB_ofSomeInt).
Then if machine executing above code is little endian, final will hold correct value,
else if the machine running above code is big endian it will hold a wrong value, isn't it?
Just trying to make sure I got this right, I am not going to ask more question in this direction for now.
Do not do this when you have functions like htonl etc.
Takes the hassle out of things
This code does not depend on the endianness of the platform: data[0] is always stored as the most significant byte of the int, followed by the rest, and data[3] is always the least significant byte.
Whether that's "right" or "wrong" depends on how the integer has been encoded in the data array itself.
There is one problem though: if data has been declared using char rather than unsigned char, the signed data[i] will be promoted first to a signed int and you end up setting many more bits than you intended.
This is wrong in little and big endian systems.
If data elements are of type char, you then need to cast all data elements to unsigned char before doing the bitwise left shift, otherwise you may encounter sign extension on data elements with negative values. The signedness of char is implementation defined and char can be a signed type.
Also data[0] << 24 (or even (unsigned char) data[0] << 24) will invoke undefined behavior if data[0] is a negative value as the resulting value is then not representable in an int and so you'll need an extra cast to unsigned int.
The best approach is to declare data as an unsigned char array and then cast each data element to unsigned int before the left shift.
Now assuming you cast it correctly, this will work only if data[0] holds the most significant byte of your value.
Besides the obvious problem of platform-specific byte-ordering (which other answers have addressed), you should be careful about promotion of data types.
I'm assuming that data is an array of type unsigned char. In which case, the expression
data[0] << 24
is not zero, even though it looks like an 8-bit operand shifted left by 24 bits. The integer promotions are applied to the operands of << first, so
data[0] << 24 is computed as an int, not as an unsigned char. The real hazard is that, with a 32-bit int, a data[0] of 0x80 or more shifts a 1 into the sign bit, which is undefined behavior. At best, it leaves too much to interpretation. A safer, more explicit way to do this is to bit-wise or first, then shift:
final |= data[0]; final <<= 8;
final |= data[1]; final <<= 8;
final |= data[2]; final <<= 8;
final |= data[3];
or you could promote explicitly and then shift:
final |= ((unsigned int)data[0]) << 24;
final |= ((unsigned int)data[1]) << 16;
final |= ((unsigned int)data[2]) << 8;
final |= ((unsigned int)data[3]);
Of course, this doesn't deal with the endianness problem at all. But that may or may not be a problem, depending on where data came from.
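To make the byte-order decision explicit, one option is a pair of small loader functions, one per encoding, so the caller picks the one that matches how data was produced. This is only a sketch; the names load_be32 and load_le32 are invented here and are not from any standard header.

```c
#include <stdint.h>

/* data[0] is the most significant byte (big endian / network order). */
static uint32_t load_be32(const unsigned char *data)
{
    return ((uint32_t)data[0] << 24) | ((uint32_t)data[1] << 16) |
           ((uint32_t)data[2] << 8)  |  (uint32_t)data[3];
}

/* data[0] is the least significant byte (little endian). */
static uint32_t load_le32(const unsigned char *data)
{
    return  (uint32_t)data[0]        | ((uint32_t)data[1] << 8) |
           ((uint32_t)data[2] << 16) | ((uint32_t)data[3] << 24);
}
```

Both functions give the same result on little-endian and big-endian hosts, because each byte is widened to uint32_t before the shift and no memory is reinterpreted. For the bytes 0x12 0x34 0x56 0x78, load_be32 gives 0x12345678 and load_le32 gives 0x78563412.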

Byte sequence change when cast char* to unsigned short* and dereference?

unsigned short* pname = (unsigned short*)(buf + buf_offset);/*sequence problem?*/
unsigned short pointer_offset = ntohs(*pname) & COMPRESSION_MASK;
Here, buf_offset == 0. the content of buf is [c0] [0c] .However, the *pname is [0x0cc0]. What is the problem? Thank you.
As others have pointed out, the code you have posted is not portable. It's not just for the reason of endianness. Casting from char * to unsigned short * may also cause bus errors, due to invalid alignment. Additionally, there may be padding bits in an unsigned short that cause your program to misbehave, or an unsigned short might be smaller or larger than "2 bytes" depending on CHAR_BIT and the choices of the implementation. The overall issue is internal representation of types, and you can avoid this issue by using operators that behave the same regardless of internal representation. Perhaps you meant:
unsigned short offset = (unsigned char) buf[0];
offset *= (UCHAR_MAX + 1);
offset += (unsigned char) buf[1];
offset &= COMPRESSION_MASK;
If you want big endian, you must explicitly state that you want it. By multiplying buf[0] and adding buf[1], I'm specifying that buf[0] is more significant than buf[1], hence I'm explicitly specifying that I want big endian. Multiplications and additions work the same in every C implementation, and there are no alignment issues.
Reversing the conversion:
unsigned char buf[2] = { offset / (UCHAR_MAX + 1), offset % (UCHAR_MAX + 1) };
It'd be nice to see more code written without care of internal representation!
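Wrapped into functions, the arithmetic approach above might look like the following sketch. The value of COMPRESSION_MASK is not given in the question; 0x3FFF is assumed here purely for illustration, and the function names are made up.

```c
#include <limits.h>

#define COMPRESSION_MASK 0x3FFFu /* assumption: not stated in the question */

/* Read a two-byte big-endian value from buf and apply the mask.
   Works on plain char buffers: each byte is converted to unsigned char
   first so that a negative char cannot sign-extend. */
static unsigned short read_offset_be(const char *buf)
{
    unsigned int offset = (unsigned char)buf[0];
    offset = offset * (UCHAR_MAX + 1u) + (unsigned char)buf[1];
    return (unsigned short)(offset & COMPRESSION_MASK);
}

/* Reverse direction: split a value into two bytes, high byte first. */
static void write_offset_be(unsigned char out[2], unsigned short offset)
{
    out[0] = (unsigned char)(offset / (UCHAR_MAX + 1u));
    out[1] = (unsigned char)(offset % (UCHAR_MAX + 1u));
}
```

For the bytes c0 0c from the question and the assumed mask, read_offset_be returns 0x000c (12) on any host.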
It depends on your platform (big-endian or little-endian).
You should change the byte order.
#ifdef _BIG_ENDIAN_
// reverse byte order
#endif
ntohs() converts from network byte order (big endian) to host byte order; on a little-endian host that means swapping the bytes.
As gabriel mentioned,
The ntohs() converts a u_short from TCP/IP network byte order to host byte order (which is little-endian on Intel processors).
The ntohs function returns the value in host byte order. If the parameter passed is already in host byte order, then this function will reverse it. It is up to the application to determine if the byte order must be reversed.
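If you want to keep using ntohs(), the pointer cast can be replaced with memcpy, which sidesteps the alignment and aliasing concerns while leaving the byte-order conversion to the library. A sketch, assuming a POSIX system where ntohs() is declared in <arpa/inet.h> (on Windows it comes from the Winsock headers instead); the function name is hypothetical.

```c
#include <arpa/inet.h> /* ntohs (POSIX) */
#include <stdint.h>
#include <string.h>

/* Read a 16-bit value stored in network byte order at buf.
   memcpy copies the two bytes as they are; ntohs then converts them
   to the host's order, which is a no-op on big-endian hosts. */
static uint16_t read_net_u16(const char *buf)
{
    uint16_t raw;
    memcpy(&raw, buf, sizeof raw);
    return ntohs(raw);
}
```

For a buffer holding c0 0c, raw is 0x0cc0 on a little-endian host (which is what the question observed) and read_net_u16 returns 0xc00c on either kind of host.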

Pointer indirection when pointer and data widths differ

I want to access a 32-bit data pointed to by an address in a hardware register (which is 64 bits, with only 40 LSb's set). So I do:
paddr_t address = read_hw(); // paddr_t is unsigned long long
unsigned int value = *(unsigned int*) address; // error: cast to pointer from integer of different size
unsigned int value2 = (unsigned int) *((paddr_t*) address); // error: cast to pointer from integer of different size
What would be the right way to do this without compiler error (I use -Werror)?
Nominally with C99 the first option is closest to correct,
uint32_t value = *(uint32_t*)address;
However you may also choose to use the other pointer/integer helpers,
uintptr_t address = read_hw();
uint32_t value = *(uint32_t*)address;
I'm not sure I understand the question.
"I want to access a 32-bit data pointed to by an address in a hardware register (which is 64 bits, with only 40 LSb's set)."
So you have a hardware register 64 bits wide, the least significant 40 of which should be interpreted as an address in memory which contains 32 bits of data?
Can you try
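uint32_t *pointer = (uint32_t *)(uintptr_t)((*(uint64_t *)register_address) & (~0ULL >> 24));
uint32_t value = *pointer;
Using ~0ULL keeps the shift unsigned, so >> is a logical shift and the mask covers the low 40 bits. This might still get more complicated depending on endianness and on whether a pointer on your target is wide enough to hold a 40-bit address.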
Although, really, I want to ask,
Does "I am using a cross-compiler, I don't have the luxury of printf" mean you can't actually run your code, or just that you have to do it on some hardware that lacks a convenient output channel?
What is your target architecture, that your pointers are 40 bits long?!
From what you have written, you have a 64-bit pointer of which only 40 bits are the pointer, and that pointer points to some data that is 32 bits in size.
Your code seems to be trying to mangle the 40 bit pointer into a 32 bit pointer.
What you should be doing is &'ing the relevant 40 bits within the 64 bit pointer so that it remains a 64 bit pointer, and then using that to access the data, which you can then similarly & to get the data. Otherwise you are (as the errors indicate) truncating the pointer.
Something like (I don't have 64 bit so I can't test this, but you get the idea):
address = address & 0x????????????????; // use the ?s to mask off the bits you
// want to ignore
value64 = *address; // value64 is 64 bits
value32 = (int)(value64 & 0x00000000ffffffff); // if the data is in the lower
// half of value64
or
value32 = (int)((value64 & 0xffffffff00000000) >> 32); // if the data is in the
// higher half of value64
where the ?'s are masking the bits as needed (depending on the endianness that you are working with).
You'll probably also need to change the (int) casts to suit (you want to instead cast it to whatever 32 bit data type the data represents - ie. the type of value32).
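As a concrete version of the masking step, here is a sketch that keeps the low 40 bits of the register value and converts the result to a pointer through uintptr_t, which is the integer type intended for such conversions. The names are invented for this example, and the conversion only preserves the address if pointers on the target are at least 40 bits wide; on a 32-bit target the upper bits are lost.

```c
#include <stdint.h>

/* Mask with the low 40 bits set. */
#define ADDR_MASK_40 ((UINT64_C(1) << 40) - 1)

/* Keep only the 40 address bits of the raw register value. */
static uint64_t mask_address(uint64_t raw)
{
    return raw & ADDR_MASK_40;
}

/* Convert the masked value to a pointer to 32-bit data. Going through
   uintptr_t avoids the "cast to pointer from integer of different
   size" diagnostic. volatile is used because the target is assumed to
   be hardware-visible memory. */
static volatile uint32_t *address_to_pointer(uint64_t raw)
{
    return (volatile uint32_t *)(uintptr_t)mask_address(raw);
}
```

Reading the data would then be uint32_t value = *address_to_pointer(read_hw()); which is only meaningful on the actual target where that address is mapped.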
check the real sizes for your pointers and paddr_t type:
printf("paddr_t size: %zu, pointer size: %zu\n",
sizeof(paddr_t), sizeof(unsigned int *));
what do you get?
update:
ARM is a 32-bit architecture, so you are trying to convert a 64-bit integer to a 32-bit pointer, and your compiler doesn't like it!
If you are sure that the value in paddr_t fits in a 32-bit pointer you can just cast it to an int first:
unsigned int *p = (unsigned int *)(int)addrs;

Resources