overcoming little-endian-ness when union'ing char[4] with int32 - c

On my little-endian z80-esque processor I have a 32-bit long int msk = 0xFFFFFF00 (Subnet Mask).
I learned about endianness this morning when I tried passing
(unsigned char *)&msk to a
void bar(unsigned char *c); function that walks through the bytes of msk and stores them to a database.
Unfortunately, due to the little-endianness of Z80-like processors, the database stores the values "backwards", and when another function reads the bytes back, it sees 0x00FFFFFF, which is not the correct subnet mask.
Is there any trivial way around this with unions? I'd like char[3] to map to the LSB of my long int msk, instead of what it currently is (char[0] gets the LSB).
In conclusion, Big-Endian is better.

To fix endian issues: whenever you serialize your integers to disk or to the network, convert them to a known byte order. Network order, a.k.a. big-endian, is the easiest because the htonl and htons functions already exist. Alternatively, you can do it manually by repeatedly pulling off the low-order byte (b = value & 0xFF; value >>= 8;) or by extracting each byte directly ((value >> (i*8)) & 0xFF).
If you have a long int value and want the LSB of it, it is far more portable to use bit shift and mask operations rather than unions or casts.
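As a concrete sketch of the manual approach (put_be32 is a hypothetical helper name, not from the original post):

#include <stdint.h>

/* Serialize a 32-bit value in network (big-endian) order,
   regardless of the host's endianness. */
void put_be32(unsigned char *out, uint32_t value)
{
    out[0] = (value >> 24) & 0xFF;  /* most significant byte first */
    out[1] = (value >> 16) & 0xFF;
    out[2] = (value >> 8) & 0xFF;
    out[3] = value & 0xFF;          /* least significant byte last */
}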

ntohl converts a 32-bit integer from network (big-endian) byte order to host byte order: it swaps the bytes on a little-endian machine and is a no-op on a big-endian one.
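A minimal round-trip sketch (assuming a POSIX system; demo is just an illustrative wrapper):

#include <arpa/inet.h>  /* htonl/ntohl on POSIX; on Windows they live in winsock2.h */
#include <stdint.h>

void demo(void)
{
    uint32_t msk  = 0xFFFFFF00;
    uint32_t wire = htonl(msk);   /* host order -> big-endian for storage */
    /* ... store the four bytes of 'wire', read them back later ... */
    uint32_t back = ntohl(wire);  /* big-endian -> host order; back == msk */
    (void)back;
}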

Related

shifting an unsigned char by more than 8 bits

I'm a bit troubled by this code:
typedef struct _slink {
    struct _slink *next;
    char type;
    void *data;
} slink;
assuming what this describes is a link in a file, where data is 4 bytes long, representing either an address or an integer (depending on the type of the link).
Now I'm looking at reformatting numbers in the file from little-endian to big-endian, so what I want to do is change the order of the bytes before writing back to the file, i.e.
for 0x01020304, I want to convert it to 0x04030201, so when I write it back its little-endian representation will look like the big-endian representation of 0x01020304. I do that by multiplying the i'th byte by 2^(8*(3-i)), where i is between 0 and 3. Now this is one way it was implemented, and what troubles me here is that this is shifting bytes by more than 8 bits. (L is of type slink *)
int data = (((unsigned char*)&L->data)[0] << 24) + (((unsigned char*)&L->data)[1] << 16) +
           (((unsigned char*)&L->data)[2] << 8) + (((unsigned char*)&L->data)[3] << 0);
Can anyone please explain why this actually works, without these bytes having been explicitly cast to integers to begin with (since they're only 1 byte each but are shifted by up to 24 bits)?
Thanks in advance.
Any integer type smaller than int is promoted to type int when used in an expression.
So the shift is actually applied to an expression of type int instead of type char.
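A tiny illustration of the promotion (a sketch; the operand's type changes, which you can observe through sizeof):

#include <stdio.h>

int main(void)
{
    unsigned char b = 0x12;
    /* b is promoted to int before the shift, so the expression
       has the width of int, not of unsigned char. */
    printf("%zu\n", sizeof(b << 16));  /* prints sizeof(int), e.g. 4 */
    return 0;
}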
Can anyone please explain why this actually works?
The shift does not occur on an unsigned char but on a type promoted to an int¹, as @dbush explains.
Reasons why the code still has issues:
32-bit int
Shifting a 1 into the sign bit is undefined behavior (UB). See also @Eric Postpischil's comments.
(((unsigned char*)&L->data)[0] << 24) // UB
16-bit int
Shifting by the bit width or more is UB, and even an unsigned 16-bit type would lack the precision to hold the result. Perhaps the OP would have only wanted a 2-byte endian swap?
Alternative
const uint8_t *p = (const uint8_t *)&L->data;
uint32_t data = (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
                (uint32_t)p[2] << 8 | (uint32_t)p[3] << 0;
For the pedantic
Had int used a non-2's-complement representation, the addition of a negative value from (((unsigned char*)&L->data)[0] << 24) would have messed up the data pattern. Endian manipulations are best done using unsigned types.
from little-endian to big-endian
This code does not swap between those two endians. It is a big-endian-to-native conversion. When this code is run on a 32-bit little-endian machine, it is effectively a big/little swap. On a 32-bit big-endian machine, it could have been a no-op.
¹ ... or possibly an unsigned int on select platforms where UCHAR_MAX > INT_MAX.

Define a C type in bits

I was wondering if you could define a type in bits.
Specifically, I want to define a 24-bit type, in order to store the cumulative number of packets lost in RTP.
If not, how can I memcpy 3 bytes from an int?
If I do this, I'm not sure how it'll end:
memcpy(pkg + 29, (&clamped_pkgs_lost)+(sizeof(char)), 3*sizeof (char));
You can define a type with at least 24 bits using a bitfield, but a bitfield must be a member of a struct:
struct {
    unsigned pkgs_lost : 24;
} rtp_stats;
Whether you use such a bitfield, or just a simple type with at least 24 bits (such as unsigned long) to store the value within your application, the simplest portable way to copy it into the RTP packet is one byte at a time. This is because the value in the RTP packet is always big-endian, and the endianness of your host is unknown.
Assuming that pkg is of type unsigned char *, you would do something like:
pkg[33] = pkgs_lost >> 16;  /* most significant byte first */
pkg[34] = pkgs_lost >> 8;
pkg[35] = pkgs_lost;        /* least significant byte last */
to place the 24-bit big endian number at byte position 33 in the outgoing packet.
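Reading the 24-bit value back out of a received packet is the mirror image (a sketch; assumes the same byte positions and an unsigned char *pkg):

unsigned long pkgs_lost = ((unsigned long)pkg[33] << 16) |
                          ((unsigned long)pkg[34] << 8) |
                          (unsigned long)pkg[35];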
In C you can define integer types only in terms of the fundamental types or bitfields thereof.
Bitfields are quirky. You can't take their address. And they won't save you any space if you need just 24 bits, but your platform only has fundamental types of 8, 16 and 32 bits. You'd still need to use either 3 8-bit integers or 1 32-bit integer (or 1 16-bit and 1 8-bit) to store those 24 bits of yours.
For something as simple as a counter, I'd just use a 32-bit integer. If I'm interested in limiting it to 24-bit values, I have two options (both sketched below):
zeroing out the 8 most significant bits, thus simulating wrap-around
limiting the value to 2^24 - 1, so it never grows beyond that nor wraps around
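A minimal sketch of the two options (bump_wrapping and bump_saturating are hypothetical helper names):

#include <stdint.h>

/* Option 1: let the counter wrap around modulo 2^24. */
uint32_t bump_wrapping(uint32_t pkgs_lost)
{
    return (pkgs_lost + 1) & 0xFFFFFF;  /* zero out the top 8 bits */
}

/* Option 2: saturate at 2^24 - 1 so the counter never wraps. */
uint32_t bump_saturating(uint32_t pkgs_lost)
{
    return (pkgs_lost < 0xFFFFFF) ? pkgs_lost + 1 : 0xFFFFFF;
}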
You can store a narrow integer in a larger integer. Just mask off the bits you want:
int main(void) {
    long data = 0x12345678;
    data &= 0xFFFFFF;  /* keep only the low 24 bits */
    return 0;
}
Or, you can define a bitfield on a structure member. But don't try to write the struct to disk and open it on a different system, because bitfield layouts are not standardized.
struct {
    long data : 24;
} record;

Byte sequence change when casting char* to unsigned short* and dereferencing?

unsigned short* pname = (unsigned short*)(buf + buf_offset);/*sequence problem?*/
unsigned short pointer_offset = ntohs(*pname) & COMPRESSION_MASK;
Here, buf_offset == 0 and the content of buf is [c0] [0c]. However, *pname is 0x0cc0. What is the problem? Thank you.
As others have pointed out, the code you have posted is not portable, and not just because of endianness. Casting from char * to unsigned short * may also cause bus errors due to invalid alignment. Additionally, there may be padding bits in an unsigned short that cause your program to misbehave, or an unsigned short might be smaller or larger than 2 bytes depending on CHAR_BIT and the choices of the implementation. The overall issue is the internal representation of types, and you can avoid it by using operators that behave the same regardless of internal representation. Perhaps you meant:
#include <limits.h>  /* for UCHAR_MAX */

unsigned short offset = (unsigned char) buf[0];  /* high-order byte */
offset *= (UCHAR_MAX + 1);                       /* portable "shift left by one byte" */
offset += (unsigned char) buf[1];                /* low-order byte */
offset &= COMPRESSION_MASK;
If you want big endian, you must explicitly state that you want it. By multiplying buf[0] and adding buf[1], I'm specifying that buf[0] is more significant than buf[1], hence I'm explicitly specifying that I want big endian. Multiplications and additions work the same in every C implementation, and there are no alignment issues.
Reversing the conversion:
unsigned char buf[2] = { offset / (UCHAR_MAX + 1), offset % (UCHAR_MAX + 1) };
It'd be nice to see more code written without care of internal representation!
It depends on your platform (big-endian or little-endian); you should change the byte order accordingly.
#ifdef _BIG_ENDIAN_
// reverse byte order
#endif
ntohs() converts the byte order from big-endian (network order) to host byte order.
As gabriel mentioned,
The ntohs() converts a u_short from TCP/IP network byte order to host byte order (which is little-endian on Intel processors).
The ntohs function returns the value in host byte order. If the parameter passed is already in host byte order, then this function will reverse it. It is up to the application to determine if the byte order must be reversed.

How are the values stored in the C unsigned shorts?

I'm trying to read a binary file into a C# struct. The file was created from C, and the following code creates 2 of the bytes in each 50+ byte row.
unsigned short nDayTimeBitStuffed = atoi( LPCTSTR( strInput) );
unsigned short nDayOfYear = (0x01FF & nDayTimeBitStuffed);
unsigned short nTimeOfDay = (0x01F & (nDayTimeBitStuffed >> 9) );
Binary values on the file are 00000001 and 00000100.
The expected values are 1 and 2, so I think some bit ordering/swapping is going on but not sure.
Any help would be greatly appreciated.
Thanks!
The answer is 'it depends' - most notably on the machine, and also on how the data is written to the file. Consider:
unsigned short x = 0x0102;
write(fd, &x, sizeof(x));
On some machines (Intel), the low-order byte (0x02) will be written before the high-order byte (0x01); on others (PPC, SPARC), the high-order byte will be written before the low-order one.
So, from a little-endian (Intel) machine, you'd see the bytes:
0x02 0x01
But from a big-endian (PPC) machine, you'd see the bytes:
0x01 0x02
Your bytes appear to be 0x01 and 0x04. Your calculation for 0x02 appears flawed.
The C code you show doesn't write anything. The value in nDayOfYear is the bottom 9 bits of the input value; the nTimeOfDay appears to be the next 5 bits (so 14 of the 16 bits are used).
For example, if the value in strInput is 12141 decimal, 0x2F6D, then the value in nDayOfYear would be 365 (0x16D) and the value in nTimeOfDay would be 23 (0x17).
It is a funny storage order; you can't simply compare the two values whereas if you packed the day of year in the more significant portion of the value and time into the less significant, then you could compare values as simple integers and get the correct comparison.
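A sketch of the comparison-friendly layout suggested above (pack_day_time is an illustrative name; day of year goes in the more significant bits 5-13, time of day in bits 0-4):

/* Packed values now compare in chronological order as plain integers. */
unsigned short pack_day_time(unsigned short day_of_year, unsigned short time_of_day)
{
    return (unsigned short)((day_of_year << 5) | (time_of_day & 0x1F));
}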
The expected file contents are very much related to the processor and compiler used to create the file, if it's binary.
I'm assuming a Windows machine here, which uses 2 bytes for a short and puts them in little endian order.
Your comments don't make much sense either. If it's two bytes then it should be using two chars, not shorts. The range of the first is going to be 1-365, so it definitely needs more than a single byte to represent. I'm going to assume you want the first 4 bytes, not the first 2.
This means that the first byte will be bits 0-7 of the DayOfYear, the second byte will be bits 8-15 of the DayOfYear, the third byte will be bits 0-7 of the TimeOfDay, and the fourth byte will be bits 8-15 of the TimeOfDay.

C Endian Conversion : bit by bit

I have a special unsigned long (32 bits) and I need to convert the endianness of it bit by bit - my long represents several things all smooshed together into one piece of binary.
How do I do it?
Endianness is a word-level concept: the bytes of a multi-byte value are stored either most-significant byte first (big endian) or least-significant byte first (little endian). Data transferred over a network is typically big endian (so-called network byte order). Data stored in memory on a machine can be in either order, with little endian being the most common given the prevalence of the Intel x86 architecture. Even though many computer architectures are big endian, x86 is so ubiquitous that you'll most often see little-endian data in memory.
Anyhow, the point of all that is that endianness is a very specific concept that only applies at the byte level, not the bit level. If ntohs(), ntohl(), htons(), and htonl() don't do what you want then what you're dealing with isn't endianness per se.
If you need to reverse the individual bits of your unsigned long or do anything else complicated like that, please post more information about what exactly you need to do.
Be careful to understand the meaning of 'endianness'. It refers to the order of bytes within the data, not bits within a byte. You may only need a function like htonl or ntohl to convert your d-word.
If you truly want to reverse the order of all bits in the 32-bit data type, you could write an iterative algorithm to mask and shift each bit into the appropriate reflected position.
A simple endianness conversion function for an unsigned long value could look like the following:
typedef union {
    unsigned long u32;   /* note: assumes unsigned long is 32 bits here */
    unsigned char u8[4];
} U32_U8;

unsigned long SwapEndian(unsigned long u)
{
    U32_U8 source;
    U32_U8 dest;

    source.u32 = u;
    dest.u8[0] = source.u8[3];
    dest.u8[1] = source.u8[2];
    dest.u8[2] = source.u8[1];
    dest.u8[3] = source.u8[0];

    return dest.u32;
}
To invert the bit order of an integer, you can shift out the bits in one direction and shift the bits to the destination in the opposite direction.
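A sketch of that approach (reverse_bits32 is a hypothetical name; uses a fixed-width type for portability):

#include <stdint.h>

/* Shift bits out of the low end of 'u' and into the low end of
   'result'; after 32 iterations their order is reversed. */
uint32_t reverse_bits32(uint32_t u)
{
    uint32_t result = 0;
    for (int i = 0; i < 32; i++) {
        result = (result << 1) | (u & 1);
        u >>= 1;
    }
    return result;
}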
