Byte sequence changes when casting char* to unsigned short* and dereferencing? - c

unsigned short* pname = (unsigned short*)(buf + buf_offset); /* sequence problem? */
unsigned short pointer_offset = ntohs(*pname) & COMPRESSION_MASK;
Here, buf_offset == 0 and the content of buf is [c0] [0c]. However, *pname is 0x0cc0. What is the problem? Thank you.

As others have pointed out, the code you have posted is not portable, and not only because of endianness. Casting from char * to unsigned short * may also cause bus errors due to invalid alignment. Additionally, an unsigned short may contain padding bits that cause your program to misbehave, or it may be smaller or larger than "2 bytes", depending on CHAR_BIT and the choices of the implementation. The overall issue is the internal representation of types, and you can avoid it by using operators that behave the same regardless of internal representation. Perhaps you meant:
unsigned short offset = (unsigned char) buf[0];  /* most significant byte          */
offset *= (UCHAR_MAX + 1);                       /* UCHAR_MAX comes from <limits.h> */
offset += (unsigned char) buf[1];                /* least significant byte         */
offset &= COMPRESSION_MASK;
If you want big endian, you must explicitly state that you want it. By multiplying buf[0] and adding buf[1], I'm specifying that buf[0] is more significant than buf[1], hence I'm explicitly specifying that I want big endian. Multiplications and additions work the same in every C implementation, and there are no alignment issues.
Reversing the conversion:
unsigned char buf[2] = { offset / (UCHAR_MAX + 1), offset % (UCHAR_MAX + 1) };
It'd be nice to see more code written without care of internal representation!
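For reference, here is a minimal, self-contained sketch of that round trip. The value of COMPRESSION_MASK is not shown in the question, so the DNS-style 0x3FFF used below is only an assumption:
#include <limits.h>
#include <stdio.h>

#define COMPRESSION_MASK 0x3FFF   /* assumed value; not shown in the question */

int main(void)
{
    unsigned char buf[2] = { 0xC0, 0x0C };

    /* Big-endian decode: buf[0] is the more significant byte. */
    unsigned short offset = buf[0];
    offset *= (UCHAR_MAX + 1);
    offset += buf[1];
    offset &= COMPRESSION_MASK;
    printf("offset = 0x%04x\n", (unsigned) offset);   /* prints 0x000c */

    /* Big-endian encode: most significant byte first. */
    unsigned char out[2] = { offset / (UCHAR_MAX + 1), offset % (UCHAR_MAX + 1) };
    printf("out = %02x %02x\n", (unsigned) out[0], (unsigned) out[1]);
    return 0;
}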

It depends on your platform: big-endian or little-endian.
You should change the byte order.
#ifdef _BIG_ENDIAN_
// reverse byte order
#endif

ntohs() converts from network byte order (big-endian) to host byte order; on a little-endian host that means swapping the two bytes.

As gabriel mentioned,
The ntohs() converts a u_short from TCP/IP network byte order to host byte order (which is little-endian on Intel processors).
The ntohs function returns the value in host byte order. If the parameter passed is already in host byte order, then this function will reverse it. It is up to the application to determine if the byte order must be reversed.
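If ntohs() is what you want to use on the original buffer, the alignment and aliasing problems can still be avoided by memcpy-ing the two bytes into a properly typed object first. A sketch, assuming buf holds network-order data; read_offset is just an illustrative helper name:
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* ntohs(); on Windows this lives in <winsock2.h> */

/* Read a 16-bit big-endian offset out of buf without an unaligned dereference. */
uint16_t read_offset(const char *buf, size_t buf_offset, uint16_t mask)
{
    uint16_t raw;
    memcpy(&raw, buf + buf_offset, sizeof raw);   /* sidesteps alignment and aliasing issues */
    return ntohs(raw) & mask;                     /* network (big-endian) to host order */
}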

Related

shifting an unsigned char by more than 8 bits

I'm a bit troubled by this code:
typedef struct _slink {
    struct _slink* next;
    char type;
    void* data;
} slink;
assuming what this describes is a link in a file, where data is 4 bytes long, representing either an address or an integer (depending on the type of the link).
Now I'm looking at reformatting numbers in the file from little-endian to big-endian, so what I want to do is change the order of the bytes before writing back to the file, i.e.
for 0x01020304 I want to convert it to 0x04030201, so that when I write it back, its little-endian representation will look like the big-endian representation of 0x01020304. I do that by multiplying the i'th byte by 2^(8*(3-i)), where i is between 0 and 3. This is one way it was implemented, and what troubles me here is that it shifts bytes by more than 8 bits. (L is of type _slink*)
int data = (((unsigned char*)&L->data)[0] << 24) + (((unsigned char*)&L->data)[1] << 16) +
           (((unsigned char*)&L->data)[2] << 8)  + (((unsigned char*)&L->data)[3] << 0);
Can anyone please explain why this actually works, without these bytes having been explicitly cast to integers to begin with (since they're only 1 byte each but are shifted by up to 24 bits)?
Thanks in advance.
Any integer type smaller than int is promoted to type int when used in an expression.
So the shift is actually applied to an expression of type int instead of type char.
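A tiny demonstration of that promotion (the values are arbitrary; the point is the type of the shift expression):
#include <stdio.h>

int main(void)
{
    unsigned char b = 0x04;

    /* b is promoted to int before <<, so the shift does not operate on a char. */
    printf("sizeof b        = %zu\n", sizeof b);                /* 1 */
    printf("sizeof (b << 0) = %zu\n", sizeof (b << 0));         /* sizeof(int), typically 4 */
    printf("b << 24         = 0x%x\n", (unsigned) (b << 24));   /* 0x4000000 */
    return 0;
}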
Can anyone please explain why this actually works?
The shift does not occur on an unsigned char but on a type promoted to int¹. @dbush.
Reasons why the code still has issues:
32-bit int
Shifting a 1 into the sign bit is undefined behavior (UB). See also @Eric Postpischil.
(((unsigned char*)&L->data)[0] << 24) // UB
16-bit int
Shifting by the bit width or more is a problem even if the type were unsigned (16 bits cannot hold the result), and as int it is UB like above. Perhaps the OP would then only have wanted a 2-byte endian swap?
Alternative
const uint8_t *p = (const uint8_t *) &L->data;   /* uint8_t, uint32_t come from <stdint.h> */
uint32_t data = (uint32_t) p[0] << 24 | (uint32_t) p[1] << 16 |
                (uint32_t) p[2] << 8  | (uint32_t) p[3] << 0;
For the pedantic
Had int used a non-2's-complement representation, the addition of a negative value from (((unsigned char*)&L->data)[0] << 24) would have messed up the data pattern. Endian manipulations are best done using unsigned types.
from little-endian to big-endian
This code does not swap between those two endians; it converts big-endian to native endian. When this code is run on a 32-bit little-endian machine, it is effectively a big/little swap. On a 32-bit big-endian machine, it could have been a no-op.
¹ ... or possibly an unsigned on select platforms where UCHAR_MAX > INT_MAX.
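If the goal really is to reverse the byte order unconditionally (little to big or back), a hedged sketch of an explicit 32-bit swap, here called swap32, might look like this:
#include <stdint.h>

/* Reverses the byte order of a 32-bit value, independent of host endianness.
   Example: swap32(0x01020304) == 0x04030201 on any host. */
static uint32_t swap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}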

Merge uint16_t and int16_t into int32_t for data transmission in C

I am confronted with the problem of transmitting an index (uint16_t) and data (int16_t) in an int32_t, but I have had no success yet.
What I tried is the following:
int32_t buf = (int16_t) data;
buf |= ((int32_t) idx) << 16;
But this works only for positive data, because two's complement sign extension fills the upper 16 bits of buf with ones when data is negative. How can I achieve this?
Merging integer types is a notorious problem in C due to finicky details about integer promotion, representations of integers, and the limits of the definitions of bit shifting.
First, there is rarely a good reason to merge a uint16_t and an int16_t into an int32_t for transmission. One ought to simply access the bytes of the objects using an unsigned char * and transmit the bytes. Often, gathering the bytes into a buffer for transmission is easily done using memcpy, which is equivalent to copying the bytes using an unsigned char *.
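As an illustration of that byte-copy approach, a sketch with a hypothetical pack helper (the field order inside the buffer is just one possible choice):
#include <stdint.h>
#include <string.h>

/* Copy the object representations of idx and data into a 4-byte buffer;
   no shifting or sign handling is needed. The bytes end up in host order,
   so both ends of the transmission must agree on the layout (or convert
   each field with htons() first). */
void pack(unsigned char out[4], uint16_t idx, int16_t data)
{
    memcpy(out,     &idx,  sizeof idx);    /* bytes 0..1: the index */
    memcpy(out + 2, &data, sizeof data);   /* bytes 2..3: the data  */
}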
If you must merge a uint16_t and an int16_t into an int32_t, with the former going into the high bytes, then a standard-conforming way to do it is:
uint32_t temporary = (uint32_t) idx << 16 | (uint16_t) data;
int32_t buf;
memcpy(&buf, &temporary, sizeof buf);
Explanation:
(uint32_t) idx widens idx to 32 bits, ensuring that it can be shifted by 16 bits without losing any data.
(uint16_t) data converts data from int16_t, producing a non-negative value of 16 bits that will, when widened to 32 bits, remain non-negative (and not be subject to sign extension).
Then these two pieces are assembled into a uint32_t. (In general, it is easier to work with bit operations using unsigned integers, as they have better semantics for bit operations in C than signed integers do.)
Finally, the resulting bits are copied into buf, side-stepping problems with signed integer overflow and conversion.
Note that a good compiler will optimize away the memcpy.
To do this without memcpy, it is essentially necessary to adjust the uint16_t to a signed value:
int32_t temporary = 0x8000 <= idx ? idx - 0x10000 : idx;
int32_t buf = temporary * 0x10000 | (uint16_t) data;
Or, if the order of data and idx in buf can be changed, an easier solution is:
int32_t buf = data * 65536 | idx;
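For completeness, the receiving side can recover the two fields in a similarly careful way; a sketch under the original layout assumption (idx in the high 16 bits), with a hypothetical unpack helper:
#include <stdint.h>
#include <string.h>

/* Recover idx and data from the 32-bit value built above. */
void unpack(int32_t buf, uint16_t *idx, int16_t *data)
{
    uint32_t temporary;
    memcpy(&temporary, &buf, sizeof temporary);        /* reinterpret the bits as unsigned */

    *idx = (uint16_t) (temporary >> 16);               /* high 16 bits */

    uint16_t low = (uint16_t) (temporary & 0xFFFFu);   /* low 16 bits */
    memcpy(data, &low, sizeof low);                    /* avoids out-of-range signed conversion */
}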
Try:
uint32_t buf = data & 0xffff;    /* mask off the sign-extended upper bits */
buf |= (uint32_t) idx << 16;     /* the cast avoids shifting into the sign bit of int */

Copying int to different memory location, receiving extra bytes than expected

I'm trying to prepend a 2-byte message length after getting the length in a 4-byte int. I use memcpy to copy 2 bytes of the int. When I look at the second byte I copied, it is as expected, but accessing the first byte actually prints 4 bytes.
I would expect dest[0] and dest[1] to each contain one byte of the int; whether or not it's the significant byte, or the order is switched, I can throw in an offset on the memcpy or swap 0 and 1. It does not have to be portable; I would just like it to work.
The same error happens on Windows with LoadRunner and on Ubuntu with GCC, so I have at least tried to rule out portability as a cause.
I'm not sure where I'm going wrong. I suspect it's related to my not having used pointers recently. Is there a better approach to cast an int to a short and then put it in the first 2 bytes of a buffer?
char* src;
char* dest;
int len = 2753; // Hex - 0xAC1

src = (char*) malloc(len);
dest = (char*) malloc(len + 2);

memcpy(dest, &len, 2);
memcpy(dest + 2, src, len);

printf("dest[0]: %02x", dest[0]);
// expected result: c1
// actual result:   ffffffc1

printf("dest[1]: %02x", dest[1]);
// expected result: 0a
// actual result:   0a
You cannot just take a random two bytes out of a four byte object and call it a cast to short.
You will need to copy your int into a two byte int before doing your memcpy.
But actually, that isn't the best way to do it either, because you have no control over the byte order of an integer.
Your code should look like this:
dest[0] = ((unsigned)len >> 8) & 0xFF;
dest[1] = ((unsigned)len) & 0xFF;
That writes it out in network byte order, a.k.a. big-endian. All of the standard network protocols use this byte order.
And I'd add something like:
assert( ((unsigned)len & 0xFFFF0000) == 0 ); // should be nothing in the high bytes
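Reading the length back out of that prefix follows the same pattern in reverse; a small sketch with a hypothetical decode_len helper:
/* Reconstruct the length from the two-byte big-endian prefix. */
unsigned decode_len(const char *dest)
{
    return ((unsigned) (unsigned char) dest[0] << 8) |
            (unsigned) (unsigned char) dest[1];
}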
Firstly, you are using printf incorrectly. This
printf("dest[0]: %02x", dest[0]);
uses the x format specifier in printf. The x format specifier requires an argument of type unsigned int. Not char, but unsigned int and only unsigned int (or, alternatively, an int with a non-negative value).
The argument you supplied has type char, which is probably signed on your platform. This means that your dest[0] contains -63. A variadic argument of type char is automatically promoted to type int, which turns 0xc1 into 0xffffffc1 (the representation of -63 in type int). Since printf expects an unsigned int value and you are passing a negative int instead, the behavior is undefined. The printout that you see is nothing more than a manifestation of that undefined behavior; it is meaningless.
One proper way to print dest[0] in this case would be
printf("dest[0]: %02x", (unsigned) dest[0]);
I'm pretty sure the output will still be ffffffc1, but in this case 0xffffffc1 is the perfectly expected result of converting the negative value -63 to unsigned int type. Nothing unusual here.
Alternatively you can do
printf("dest[0]: %02x", (unsigned char) dest[0]);
which should give you your desired c1 output. Note that the conversion to int takes place in this case as well, but since the original value is positive (193), the result of the conversion to int is positive too and printf works properly.
Finally, if you want to work with raw memory directly, the proper type to use would be unsigned char from the very beginning. Not char, but unsigned char.
Secondly, an object of type int may easily occupy more than two 8-bit bytes. Depending on the platform, the 0xA and 0xC1 values might end up in completely different portions of the memory region occupied by that int object. You should not expect that copying the first two bytes of an int object will copy the 0xAC1 portion specifically.
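One way to see what actually sits in those bytes is to dump the whole object representation of the int, for example:
#include <stdio.h>

int main(void)
{
    int len = 2753;   /* 0x0AC1 */
    const unsigned char *p = (const unsigned char *) &len;

    /* Print every byte of the int's object representation. */
    for (size_t i = 0; i < sizeof len; i++)
        printf("byte %zu: %02x\n", i, (unsigned) p[i]);   /* e.g. c1 0a 00 00 on little-endian x86 */
    return 0;
}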
You make the assumption that an "int" is two bytes. What justification do you have for that? Your code is highly unportable.
You make another assumption that "char" is unsigned. What justification do you have for that? Again, your code is highly unportable.
You make another assumption about the ordering of bytes in an int. What justification do you have for that? Again, your code is highly unportable.
Instead of the literal 2, use sizeof(int). Never hard-code the size of a type.
If this code is to be portable, you should not use int but a fixed-size data type.
If you need 16 bits, you could use int16_t.
Also, printing the chars requires a cast to unsigned. As written, the char is promoted to an int and the sign is extended, which produces the leading FFs.
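Putting those points together, a fixed-width sketch might look like this (assuming the length fits in 16 bits; the byte order in dest is still the host's, as in the original code, so see the big-endian variant above if a defined order is needed):
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    int len = 2753;                        /* 0x0AC1 */
    uint16_t len16 = (uint16_t) len;       /* explicit, fixed-size 2-byte prefix */
    unsigned char dest[2];

    memcpy(dest, &len16, sizeof len16);    /* copies exactly two bytes, in host byte order */

    printf("dest[0]: %02x\n", (unsigned) dest[0]);
    printf("dest[1]: %02x\n", (unsigned) dest[1]);
    return 0;
}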

bytes are swapped when pointer cast is done in C

I have an array of "unsigned short", i.e. 16 bits per element, in C. I have two "unsigned short" values which should be written back into the array in little-endian order, meaning that the least significant element comes first. For example, if I have the following value:
unsigned int value = 0x12345678;
it should be stored in my array as:
unsigned short buff[10];
buff[0] = 0x5678;
buff[1] = 0x1234;
I have written code to write the value all at once, rather than extracting the upper and lower 16 bits of the int value and writing them separately, since there might be atomicity problems. My code looks like this:
typedef unsigned int UINT32;
*((UINT32*)(buff)) = (value & 0xffff0000) + (value & 0xffff);
Surprisingly, the code above works correctly and the results will be:
buff[0] is 0x5678;
buff[1] is 0x1234;
The problem is, as shown, that I am saving the "unsigned short" values in big-endian order and not little-endian as I wish. In other words, when I cast the pointer from "unsigned short*" to "unsigned int*", the 16-bit elements are swapped automatically! Does anybody know what happens here and why the data gets swapped?
Your platform represents data in little-endian format, and by casting buff to (UINT32 *) you are telling the compiler that buff must now be interpreted as a pointer to unsigned int. The instruction
*((UINT32*)(buff)) = (value & 0xffff0000) + (value & 0xffff);
Just says "write (value & 0xffff0000) + (value & 0xffff) into this unsigned int (buff)". And that's what he does, how he stores it is not your business. You're not supposed to access either of the lower or upper 16 bits, because it is platform dependent which one comes first.
All you know is that if you access buff as an unsigned int, you will get the same value that you previously stored in there, but it is not safe to assume any particular byte order.
So basically your code has undefined behavior.
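If the little-endian layout described in the question is really required, the portable approach is to assign the two halves explicitly instead of type-punning through a pointer cast. A sketch (note that this is two separate 16-bit stores, so it gives up the single 32-bit write the question was aiming for):
#include <stdio.h>

typedef unsigned int UINT32;

int main(void)
{
    unsigned short buff[10];
    UINT32 value = 0x12345678u;

    buff[0] = (unsigned short) (value & 0xFFFFu);   /* least significant half first */
    buff[1] = (unsigned short) (value >> 16);       /* most significant half second */

    printf("buff[0] = 0x%04x, buff[1] = 0x%04x\n",
           (unsigned) buff[0], (unsigned) buff[1]);  /* 0x5678, 0x1234 */
    return 0;
}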

overcoming little-endian-ness when union'ing char[4] with int32

On my little-endian z80-esque processor I have a 32-bit long int msk = 0xFFFFFF00 (Subnet Mask).
I learned about endianness this morning when I tried passing
(unsigned char *)&msk to a
void bar(unsigned char * c); function that walks through the bytes at &msk and stores them in a database.
Unfortunately, due to the little-endianness of z80-esque processors, the database stores the values "backwards", and when another function reads the bytes back, it sees 0x00FFFFFF, which is not the correct subnet mask.
Is there any trivial way around this with unions? I'd like char[3] to map to the LSB of my long int msk, instead of what currently happens (char[0] gets the LSB).
In conclusion, Big-Endian is better.
To fix endian issues: whenever you serialize your integers to disk or to the network, convert them to a known byte order. Network order, a.k.a. big-endian, is the easiest because the htonl and htons functions already exist. Or you can do it manually, by repeatedly pulling off the low-order byte with value & 0xFF; value >>= 8, or by extracting the i'th byte with (value >> i*8) & 0xFF.
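Applied to the 32-bit mask in the question, the manual version might look like this (serialize_mask is just an illustrative name):
#include <stdint.h>

/* Serialize a 32-bit mask in big-endian (network) order, regardless of host endianness. */
void serialize_mask(unsigned char out[4], uint32_t msk)
{
    out[0] = (msk >> 24) & 0xFF;   /* most significant byte first */
    out[1] = (msk >> 16) & 0xFF;
    out[2] = (msk >>  8) & 0xFF;
    out[3] =  msk        & 0xFF;   /* least significant byte last */
}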
If you have a long int value and want the LSB of it, it is far more portable to use bit shift and mask operations rather than unions or casts.
ntohl converts a 32-bit integer from network (big-endian) byte order to host byte order, which swaps the bytes on a little-endian host.
