C Endian Conversion: bit by bit

I have a special unsigned long (32 bits) and I need to convert the endianness of it bit by bit - my long represents several things all smooshed together into one piece of binary.
How do I do it?

Endianness is a word-level concept where the bytes are either stored most-significant byte first (big endian) or least-significant byte first (little endian). Data transferred over a network is typically big endian (so-called network byte order). Data stored in memory on a machine can be in either order, with little endian being the most common given the prevalence of the Intel x86 architecture. Even though many computer architectures have historically been big endian, the x86 is so ubiquitous that you'll most often see little endian data in memory.
Anyhow, the point of all that is that endianness is a very specific concept that only applies at the byte level, not the bit level. If ntohs(), ntohl(), htons(), and htonl() don't do what you want then what you're dealing with isn't endianness per se.
If you need to reverse the individual bits of your unsigned long or do anything else complicated like that, please post more information about what exactly you need to do.

Be careful to understand the meaning of 'endianness'. It refers to the order of bytes within the data, not bits within the byte. You may only need to use a function like htonl or ntohl to convert your 32-bit word.
If you truly want to reverse the order of all bits in the 32-bit data type, you could write an iterative algorithm to mask and shift each bit into the appropriate reflected position.
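A minimal sketch of that iterative approach, assuming C99 `<stdint.h>` is available; the name `reverse_bits32` is hypothetical. Each of the 32 bit positions is tested with a mask, and a set bit i is written to the reflected position 31 - i. Using `uint32_t` keeps the width fixed at 32 bits regardless of how wide `unsigned long` is.

```c
#include <stdint.h>

/* Reverse the order of all 32 bits: bit 0 moves to bit 31, bit 1 to bit 30, ... */
uint32_t reverse_bits32(uint32_t x)
{
    uint32_t r = 0;
    for (unsigned i = 0; i < 32; i++) {
        if (x & ((uint32_t)1 << i))         /* mask out bit i ... */
            r |= (uint32_t)1 << (31 - i);   /* ... and set its reflected position */
    }
    return r;
}
```

For example, reverse_bits32(0x12345678) gives 0x1E6A2C48, whereas a byte swap (what ntohl does on a little-endian host) of the same value gives 0x78563412.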

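A simple endianness conversion function for a 32-bit value could look like the following. It uses uint32_t rather than unsigned long, because unsigned long is 64 bits on many platforms (for example 64-bit Linux), and the four-byte union would then cover only part of the value:
#include <stdint.h>

typedef union {
    uint32_t u32;
    unsigned char u8[4];
} U32_U8;

uint32_t SwapEndian(uint32_t u)
{
    U32_U8 source;
    U32_U8 dest;
    source.u32 = u;
    /* Copy the bytes in reverse order; reading dest.u32 afterwards
       reinterprets them (type punning through a union is allowed in C). */
    dest.u8[0] = source.u8[3];
    dest.u8[1] = source.u8[2];
    dest.u8[2] = source.u8[1];
    dest.u8[3] = source.u8[0];
    return dest.u32;
}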
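The same byte swap can also be sketched without a union, using only shifts and masks on a uint32_t (the name swap_endian32 is hypothetical). Because it operates on the value rather than on its memory layout, it gives the same result on little- and big-endian hosts.

```c
#include <stdint.h>

/* Swap the byte order of a 32-bit value using shifts and masks only,
   so the result does not depend on the host's own byte order. */
uint32_t swap_endian32(uint32_t u)
{
    return ((u & 0x000000FFu) << 24) |   /* byte 0 -> byte 3 */
           ((u & 0x0000FF00u) <<  8) |   /* byte 1 -> byte 2 */
           ((u & 0x00FF0000u) >>  8) |   /* byte 2 -> byte 1 */
           ((u & 0xFF000000u) >> 24);    /* byte 3 -> byte 0 */
}
```

swap_endian32(0x11223344) returns 0x44332211, and applying it twice returns the original value.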

To invert the bit order of an integer, you can shift out the bits in one direction and shift the bits to the destination in the opposite direction.
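A sketch of that shift-out/shift-in idea. As an assumption going beyond what the answer states, it is generalized to reverse only the low n bits, which may help when just one field of a packed value needs its bit order flipped. The name reverse_low_bits is hypothetical; n may range from 0 to 32, and bits above position n - 1 are ignored.

```c
#include <stdint.h>

/* Reverse the low n bits of x (0 <= n <= 32): bits leave the source on the
   right and enter the result from the right, so their order is mirrored. */
uint32_t reverse_low_bits(uint32_t x, unsigned n)
{
    uint32_t r = 0;
    for (unsigned i = 0; i < n; i++) {
        r = (r << 1) | (x & 1u);   /* make room in the result, append the source's lowest bit */
        x >>= 1;                   /* shift the consumed bit out of the source */
    }
    return r;
}
```

For example, reverse_low_bits(0xB, 4) turns 1011 into 1101 (0xD), and reverse_low_bits(x, 32) reverses the whole word.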

Related

fread and endianness confusion

So to provide context, my system is little endian and the file that I am reading from is big endian (MIDI format, for those that are interested). I am supposed to read a variety of data from the file, including unsigned integers (8 bit, 16 bit, and 32 bit), chars, and booleans.
So far I know that reading unsigned integers will be an issue with fread() because I would have to convert them from big endian to little endian. My first question is, although maybe stupid to some, do I need to convert chars and booleans as well?
My second question is regarding the entire file format. Since the file is in a different endian system, do I need to read the file from the end towards the beginning (since the MSB and LSB positions will be different)? Or do I need to read in the values from the start to the end, like I would normally, and then convert those to little endian?
Thanks for taking the time to read my post and for any answers that I might receive!
Endianness only reverses the byte order inside words of a certain length, usually 2, 4, or 8 bytes. If you're reading in a one-byte value such as a char or a bool, endianness has no effect. However, if you're reading in any value that is more than a byte, such as a 16- or 32-bit integer, endianness matters. You can still use fread, since endianness has nothing to do with file reading; read the file from start to end as usual, and just make sure to convert each multi-byte value from big endian to little endian.
When you read external data that isn't just a sequence of characters, you read it as a sequence of bytes and construct the actual data that you want from it.
If you expect a signed 16-bit number, followed by an unsigned 8-bit number, followed by an unsigned 32-bit number, you write one function that reads two bytes and returns them converted to a signed 16-bit number, one that reads one byte and returns it as an unsigned 8-bit number, and one that reads four bytes and returns them converted to an unsigned 32-bit number. Construct the 16- and 32-bit numbers using bit shifting.
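A minimal sketch of such readers for a big-endian file like MIDI, assuming C99 fixed-width types; the names read_u8, read_u16_be, read_s16_be and read_u32_be are hypothetical. Bytes are read in file order and combined with shifts, so no separate byte-swapping step is needed on a little-endian host.

```c
#include <stdint.h>
#include <stdio.h>

/* Each reader returns 0 on success and -1 if the file ends early. */
int read_u8(FILE *f, uint8_t *out)
{
    int c = fgetc(f);
    if (c == EOF)
        return -1;
    *out = (uint8_t)c;
    return 0;
}

int read_u16_be(FILE *f, uint16_t *out)
{
    uint8_t b0, b1;
    if (read_u8(f, &b0) || read_u8(f, &b1))
        return -1;
    *out = (uint16_t)(((unsigned)b0 << 8) | b1);   /* first byte is the most significant */
    return 0;
}

int read_s16_be(FILE *f, int16_t *out)
{
    uint16_t u;
    if (read_u16_be(f, &u))
        return -1;
    /* Map 0x8000..0xFFFF onto negative values without relying on
       implementation-defined unsigned-to-signed conversion. */
    *out = (u < 0x8000u) ? (int16_t)u : (int16_t)((int32_t)u - 65536);
    return 0;
}

int read_u32_be(FILE *f, uint32_t *out)
{
    uint8_t b[4];
    for (int i = 0; i < 4; i++)
        if (read_u8(f, &b[i]))
            return -1;
    *out = ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
    return 0;
}
```

Reading a signed 16-bit, then an 8-bit, then a 32-bit field becomes read_s16_be, read_u8, read_u32_be in that order; chars and booleans stored as single bytes go through read_u8 unchanged.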

Swapping an integer with a short using a generic function

Assume I have this generic function that swaps two variables:
#include <string.h>
void swap(void *v1, void *v2, size_t size){
    char buffer[size];  /* variable-length array: C99, optional since C11 */
    memcpy(buffer, v1, size);
    memcpy(v1, v2, size);
    memcpy(v2, buffer, size);
}
It works fine, but I was wondering in what cases this might break. One case that comes to mind is when we have two different data types and the size specified is not enough to capture the bigger data. For example:
int x = 4444;
short y = 5;
swap(&x, &y, sizeof(short));
I'd expect that when I run this it would give an incorrect result, because memcpy would work with only 2 bytes (rather than 4) and part of the data would be lost or changed when dealing with x.
Surprisingly though, when I run it, it gives the correct answer on both my Windows 7 and Ubuntu operating systems. I know that Ubuntu and Windows differ in endianness, but apparently that doesn't affect either of the two systems.
I want to know why the generic function works fine in this case.
To understand this fully you have to understand the C standard and the specifics of your machine and compiler. Starting with the C standard, here are some relevant snippets [The standard I'm using is WG14/N1256], summarized a little:
The object representation for a signed integer consists of value bits, padding bits, and a sign bit. [section 6.2.6.2, paragraph 2]
These bits are stored in a contiguous sequence of bytes. [section 6.2.6.1]
If there are N value bits, they represent powers of two from 2^0 to 2^(N-1). [section 6.2.6.2]
The sign bit can have one of three meanings, one of which is that it has value -2^N (two's complement). [section 6.2.6.2, paragraph 2]
When you copy bytes from a short to an int, you're copying the value bits, padding bits and the sign bit of the short to bits of the int, but not necessarily preserving the meaning of the bits. Somewhat surprisingly, the standard allows this except it doesn't guarantee that the int you get will be valid if your target implementation has so-called "trap representations" and you're unlucky enough to generate one.
In practice, you've found on your machine and your compiler:
a short is represented by 2 bytes of 8 bits each.
  The sign bit is bit 7 of the second byte.
  The value bits in ascending order of value are bits 0-7 of byte 0 and bits 0-6 of byte 1.
  There are no padding bits.
an int is represented by 4 bytes of 8 bits each.
  The sign bit is bit 7 of the fourth byte.
  The value bits in ascending order of value are bits 0-7 of byte 0, 0-7 of byte 1, 0-7 of byte 2, and 0-6 of byte 3.
  There are no padding bits.
You would also find out that both representations use two's complement.
In pictures (where SS is the sign bit, and the numbers N correspond to a bit that has value 2^N):
short:
07-06-05-04-03-02-01-00 | SS-14-13-12-11-10-09-08
int:
07-06-05-04-03-02-01-00 | 15-14-13-12-11-10-09-08 | 23-22-21-20-19-18-17-16 | SS-30-29-28-27-26-25-24
You can see from this that if you copy the bytes of a short to the first two bytes of a zero int, you'll get the same value if the sign bit is zero (that is, the number is non-negative), because the value bits correspond exactly. That is the situation in your test: 4444 (0x115C) fits in the low two bytes of x, so the upper two bytes of x are zero before and after the swap, and both 4444 and 5 fit in the value bits of a short. As a corollary, you can also predict you'll get a different value if you start with a negative-valued short, since the sign bit of the short has value -2^15 but the corresponding bit in the int has value 2^15.
The representation you've found on your machine is often summarized as "two's complement, little-endian", but the C standard provides a lot more flexibility in representations than that description suggests (even allowing a byte to have more than 8 bits), which is why portable code usually avoids relying on bit/byte representations of integral types.
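As has already been pointed out in the comments, the systems you are using are typically little-endian (least significant byte at the lowest address); on x86 hardware both Windows and Ubuntu are little-endian, since byte order comes from the processor rather than the operating system. Given that, the memcpy places the short's bytes in the lowest-addressed, least significant part of the int.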
You might enjoy looking at Bit Twiddling Hacks for 'generic' ways to do swap operations.
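One way to turn the mismatched-size case from the question into a compile-time error is a macro that compares both sizes before swapping. This is a sketch assuming a C11 compiler (for _Static_assert); the names swap_bytes and SWAP are hypothetical, and swap_bytes exchanges one byte at a time instead of using the question's memcpy into a variable-length buffer.

```c
#include <stddef.h>

/* Exchange size bytes between two non-overlapping objects. */
void swap_bytes(void *v1, void *v2, size_t size)
{
    unsigned char *a = v1, *b = v2;
    for (size_t i = 0; i < size; i++) {
        unsigned char tmp = a[i];
        a[i] = b[i];
        b[i] = tmp;
    }
}

/* Refuse at compile time to swap objects of different sizes.
   Both arguments must be lvalues; sizeof does not evaluate them. */
#define SWAP(a, b)                                                     \
    do {                                                               \
        _Static_assert(sizeof(a) == sizeof(b), "SWAP: size mismatch"); \
        swap_bytes(&(a), &(b), sizeof(a));                             \
    } while (0)
```

With int x and short y, SWAP(x, y) fails to compile wherever sizeof(int) != sizeof(short). The check only compares sizes, so it would still accept, say, an int and a float of the same size.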

Define a c type in bits

I was wondering if you could define a type in bits.
Specifically, I want to define a 24 bits type, in order to store the cumulative number of package lost in RTP.
If not, how can I memcpy 3 bytes from an int.
If I do this, I'm not sure how it'll end:
memcpy(pkg + 29, (&clamped_pkgs_lost)+(sizeof(char)), 3*sizeof (char));
You can define a type with at least 24 bits using a bitfield, but a bitfield must be a member of a struct:
struct rtp_counter {   /* tag added so the declaration actually declares a type */
    unsigned pkgs_lost : 24;
};
Whether you use such a bitfield, or just a simple type with at least 24 bits like unsigned long to store the value within your application, when you copy it to the RTP packet the simplest portable way to do it is to copy it a byte at a time. This is because the value in the RTP packet is always big-endian, and the endianness of your host is unknown.
Assuming that pkg is of type unsigned char *, you would do something like:
pkg[33] = pkgs_lost >> 16;
pkg[34] = pkgs_lost >> 8;
pkg[35] = pkgs_lost;
to place the 24-bit big endian number at byte position 33 in the outgoing packet.
In C you can define integer types only in terms of the fundamental types or bitfields thereof.
Bitfields are quirky. You can't take their address. And they won't save you any space if you need just 24 bits, but your platform only has fundamental types of 8, 16 and 32 bits. You'd still need to use either 3 8-bit integers or 1 32-bit integer (or 1 16-bit and 1 8-bit) to store those 24 bits of yours.
For something as simple as a counter, I'd just use a 32-bit integer. If I'm interested in limiting it to 24 bit values, I have two options:
zeroing out the 8 most significant bits and thus simulating a wrap around
limiting the value to 2^24 - 1, so it never grows beyond it nor wraps around
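Both options can be sketched on a 32-bit counter; PKGS_LOST_MAX and the two function names are hypothetical, and the counter is treated as unsigned, as in the answer above. The clamping version assumes count is already at most 2^24 - 1.

```c
#include <stdint.h>

#define PKGS_LOST_MAX 0xFFFFFFu   /* 2^24 - 1, the largest 24-bit value */

/* Option 1: wrap around by keeping only the low 24 bits. Any 32-bit
   overflow of count + n is harmless, since 2^24 divides 2^32. */
uint32_t pkgs_lost_add_wrap(uint32_t count, uint32_t n)
{
    return (count + n) & PKGS_LOST_MAX;
}

/* Option 2: saturate at 2^24 - 1 so the counter never wraps.
   Requires count <= PKGS_LOST_MAX. */
uint32_t pkgs_lost_add_clamp(uint32_t count, uint32_t n)
{
    return (n > PKGS_LOST_MAX - count) ? PKGS_LOST_MAX : count + n;
}
```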
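You can store a narrow integer in a larger integer. Just mask off the bits you want.
int main(void) {
    long data = 0x12345678L;   /* any value wider than 24 bits */
    data &= 0xFFFFFF;          /* keep only the low 24 bits: 0x345678 */
    return 0;
}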
Or, you can define a bitfield on a structure member. But don't try to write the struct to disk and open it on a different system, because bitfield layouts are not standardized.
struct counter {
    long data : 24;   /* bitfields of type long are an implementation-defined extension */
};

What is this C syntax?

I have no idea what to call it, so I have no idea how to search for it.
unsigned int odd : 1;
Edit:
To elaborate, it comes from this snippet:
struct bitField {
unsigned int odd : 1;
unsigned int padding: 15; // to round out to 16 bits
};
I gather this involves bits, but I'm still not all the way understanding.
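They are bitfields. odd and padding will be stored together in one unsigned int storage unit. On common compilers that allocate bitfields starting from the least significant bit, odd occupies the lowest bit and padding the next 15 bits; the C standard leaves that allocation order, and the size of unsigned int, to the implementation.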
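A small sketch of what the field widths mean in practice, reusing the struct from the question; the helper names store_odd and store_padding are hypothetical. Assigning to an unsigned bitfield keeps only as many low-order bits as the field is wide, so odd can only hold 0 or 1 and padding only 0 to 32767.

```c
struct bitField {
    unsigned int odd : 1;
    unsigned int padding : 15;   /* to round out to 16 bits */
};

/* Store v in the 1-bit field and read it back; v is reduced modulo 2^1. */
unsigned store_odd(unsigned v)
{
    struct bitField b = {0, 0};
    b.odd = v;
    return b.odd;
}

/* Same for the 15-bit field; v is reduced modulo 2^15. */
unsigned store_padding(unsigned v)
{
    struct bitField b = {0, 0};
    b.padding = v;
    return b.padding;
}
```

store_odd(3) returns 1 and store_padding(40000) returns 7232 (40000 - 32768).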
It's a bitfield - Check the C FAQ.
It's:
1 bit of "odd" (e.g. 1)
15 bits of "padding" (e.g. 000000000000001)
and (potentially) whatever other bits round out the unsigned int. On modern platforms where unsigned int is 32 bits, that is another 16 bits in memory that belong to no named member of the struct (in this case sizeof typically returns 4).
Bitfields can save memory but potentially add instructions to computations. Much of their behavior is implementation-defined: you can't make any assumptions about how the compiler will choose to actually lay out your bit field, and it can depend on the endianness of your platform.
The main thing I use bitfields for is when I know I will be doing a lot of copying of the data, and not necessarily a lot of computation on or reference of the specific fields in the bit field.

overcoming little-endian-ness when union'ing char[4] with int32

On my little-endian z80-esque processor I have a 32-bit long int msk = 0xFFFFFF00 (Subnet Mask).
I learned about endian-ness this morning when I tried passing (unsigned char *)&msk to a void bar(unsigned char * c); function that walks through the values of this &msk and stores them to a database.
Unfortunately due to the little-endian-ness of z80 processors, the database stores the values "backwards", and when another function reads the bytes back, it sees 0x00FFFFFF, which is not the correct subnet mask.
Is there any trivial way around this with unions? I'd like char[3] to map to the LSB of my long int msk, instead of what it currently is (char[0] gets the LSB).
In conclusion, Big-Endian is better.
To fix endian issues: whenever you serialize your integers to disk or to the network, convert them to a known byte order. Network order, aka big-endian, is the easiest because the htonl and htons functions already exist. Or you may do it manually, either by repeatedly pulling off the low-order byte with value & 0xFF; value >>= 8 (which yields the bytes least significant first), or by taking each byte from the high end with (value >> i*8) & 0xFF for i counting down from 3.
If you have a long int value and want the LSB of it, it is far more portable to use bit shift and mask operations rather than unions or casts.
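A sketch of the shift-and-mask approach for the subnet mask from the question, assuming the compiler provides C99 `<stdint.h>`; put_be32 and get_be32 are hypothetical names. Filling a byte array most-significant byte first before calling bar means the database receives FF FF FF 00 regardless of the processor's byte order.

```c
#include <stdint.h>

/* Write v into buf[0..3] most-significant byte first (network order),
   independent of the host's byte order. */
void put_be32(unsigned char buf[4], uint32_t v)
{
    buf[0] = (unsigned char)(v >> 24);
    buf[1] = (unsigned char)(v >> 16);
    buf[2] = (unsigned char)(v >> 8);
    buf[3] = (unsigned char)v;
}

/* Rebuild the value from four bytes stored most-significant byte first. */
uint32_t get_be32(const unsigned char buf[4])
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

Usage would then be along the lines of unsigned char bytes[4]; put_be32(bytes, 0xFFFFFF00u); bar(bytes);, with get_be32 on the reading side.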
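ntohl converts a 32-bit integer from network (big-endian) byte order to host byte order; on a little-endian host such as yours, that swaps its bytes.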
