Endian-dependent code in real applications? - c

I know the following C code is endian-dependent:
short s_endian = 0x4142;
char c_endian = *(char *)&s_endian;
On a big-endian machine, c_endian will be 'A' (0x41), while on a little-endian machine it will be 'B' (0x42).
But this code seems kind of ugly. So is there endian-dependent code in real applications? Or have you come across any application that needed a lot of changes when porting to a target with a different endianness?
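For reference, the same check from the question can be written without the pointer cast by copying the value into a byte array with memcpy. This is a minimal sketch; the function name host_is_little_endian is hypothetical, and it assumes the host is either big- or little-endian (no mixed-endian layouts).

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 on a little-endian host, 0 on a big-endian host.
   Assumes a plain big- or little-endian layout. */
static int host_is_little_endian(void)
{
    uint16_t probe = 0x4142;
    unsigned char bytes[sizeof probe];

    memcpy(bytes, &probe, sizeof probe);  /* inspect the in-memory byte order */
    return bytes[0] == 0x42;              /* low byte stored first => little-endian */
}
```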

Pretty much any code that deals with saving integers with more than 8 bits in binary format, or sends such integers over the network. For one extremely common example, many of the fields in the TCP header fall into this category.
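To make the TCP header case concrete: the first four bytes of a TCP header hold the source and destination ports, each a 16-bit big-endian value. The sketch below (the helper names read_be16 and tcp_ports are hypothetical) assembles them with shifts, which yields the same result on any host byte order. It assumes the buffer starts at the beginning of a TCP header.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a 16-bit big-endian (network order) value from a byte buffer. */
static uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Extract source and destination ports from the start of a TCP header.
   Returns 0 on success, -1 if the buffer is too short. */
static int tcp_ports(const unsigned char *hdr, size_t len,
                     uint16_t *src_port, uint16_t *dst_port)
{
    if (len < 4)
        return -1;
    *src_port = read_be16(hdr);      /* bytes 0-1 */
    *dst_port = read_be16(hdr + 2);  /* bytes 2-3 */
    return 0;
}
```

For example, a header beginning with the bytes 0x1F 0x90 0x00 0x50 yields source port 8080 and destination port 80.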

Networking code is endian-dependent: Internet protocol headers carry multi-byte fields in big-endian order ("network byte order"), even between two little-endian machines. Hence the need for functions like htons(), htonl(), ntohs(), and ntohl(), declared in <arpa/inet.h> on POSIX systems, which convert between host byte order and network byte order.
Hope this helps,
Jason
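A minimal sketch of the usual pattern with the POSIX functions from <arpa/inet.h>: convert to network byte order just before the bytes leave the process, and back to host order right after they arrive. The names encode_u32 and decode_u32 are hypothetical; memcpy is used so the buffer does not need to be aligned.

```c
#include <arpa/inet.h>   /* htonl, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Store a host-order value into buf in network (big-endian) order. */
static void encode_u32(unsigned char buf[4], uint32_t host_value)
{
    uint32_t net = htonl(host_value);
    memcpy(buf, &net, sizeof net);
}

/* Read a network-order value from buf and return it in host order. */
static uint32_t decode_u32(const unsigned char buf[4])
{
    uint32_t net;
    memcpy(&net, buf, sizeof net);
    return ntohl(net);
}
```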

I once collected data using a specialized DAQ card on a PC, and tried to analyze the file on a PowerPC Mac. It turns out the "file format" the thing used was a raw memory dump...
Little-endian on x86, big-endian on PowerPC. You figure it out.
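A sketch of how such a dump could be read portably, assuming it contains signed 16-bit two's-complement samples written by a little-endian x86 machine (the sample format and the function name decode_le16_samples are assumptions for illustration). The bytes are combined arithmetically instead of being reinterpreted in place, so the result is the same on a little- or big-endian reader.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode n little-endian signed 16-bit samples from a raw byte dump. */
static void decode_le16_samples(const unsigned char *raw, size_t n, int16_t *out)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t u = (uint16_t)(raw[2 * i] | (raw[2 * i + 1] << 8));
        /* Map 0..65535 onto -32768..32767 without relying on
           implementation-defined unsigned-to-signed conversion. */
        out[i] = (u <= INT16_MAX) ? (int16_t)u : (int16_t)((int32_t)u - 65536);
    }
}
```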

The short answer is yes. Anything that reads/writes raw binary to a file or socket needs to keep track of the endianness of the data.
For example, the IP protocol requires big-endian representation.
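One way to keep track of it, sketched under the assumption that a file is known to have been written with the opposite byte order: swap each 32-bit value after reading it. bswap32 is a hypothetical helper written in portable C; GCC and Clang also provide __builtin_bswap32 for the same job.

```c
#include <stdint.h>

/* Reverse the byte order of a 32-bit value (0x11223344 -> 0x44332211). */
static uint32_t bswap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) << 8)  |
           ((x & 0x00FF0000u) >> 8)  |
           ((x & 0xFF000000u) >> 24);
}
```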

When manipulating the internal representation of floating-point numbers, you could access the parts (or the full value) using an integer type. For example:
union float_u
{
float f;
unsigned short v[2];
};
int get_sign(float f)
{
union float_u u;
u.f = f;
return (u.v[0] & 0x8000) != 0; // Endian-dependent: v[0] holds the sign bit only on big-endian hosts
}
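An endian-independent variant, sketched under the assumption that float is a 32-bit IEEE 754 binary32 type with the sign in the most significant bit: copy the float into a uint32_t with memcpy and test bit 31. Because the test is done on the integer value rather than on a particular half in memory, byte order no longer matters. In real code, the signbit() macro from <math.h> does this without any assumptions about the representation.

```c
#include <stdint.h>
#include <string.h>

/* Returns 1 if the sign bit of f is set, 0 otherwise.
   Assumes float is 32 bits with the IEEE 754 layout. */
static int get_sign_portable(float f)
{
    uint32_t bits;
    _Static_assert(sizeof bits == sizeof f, "float must be 32 bits");
    memcpy(&bits, &f, sizeof bits);
    return (bits >> 31) != 0;
}
```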

If your program sends data to another system (either over a serial or network link, or by saving it to a file for something else to read) or reads data from another system, then you can have endianness issues.
I don't know whether static analysis would be able to detect such constructs, but having your programmers follow a coding standard where structure elements and variables are marked up to indicate their endianness could help.
For example, if all network data structures had _be appended to the names of multi-byte members, you could look for instances where you assigned a non-suffixed (host byte order) variable, or even a literal value (like 0x1234), to one of those members.
It would be great if we could capture endianness in our data types: uint32_be and uint32_le to go with uint32_t. Then the compiler could disallow assignments or operations between them. And the signature for htobe32 would be uint32_be htobe32(uint32_t n);.
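Plain C typedefs cannot do this, since a typedef is only an alias, but wrapping the bytes in a one-member struct gives a distinct type that the compiler refuses to mix with uint32_t. A minimal sketch; uint32_be, to_be32, from_be32 and struct msg_header are hypothetical names (not the glibc/BSD htobe32 API), and the conversion uses shifts so it is correct on any host.

```c
#include <stdint.h>

/* Distinct type holding 4 bytes in big-endian order. Assigning a
   uint32_t to it, or doing arithmetic on it, is a compile error. */
typedef struct { unsigned char b[4]; } uint32_be;

static uint32_be to_be32(uint32_t n)
{
    uint32_be v;
    v.b[0] = (unsigned char)(n >> 24);
    v.b[1] = (unsigned char)(n >> 16);
    v.b[2] = (unsigned char)(n >> 8);
    v.b[3] = (unsigned char)n;
    return v;
}

static uint32_t from_be32(uint32_be v)
{
    return ((uint32_t)v.b[0] << 24) | ((uint32_t)v.b[1] << 16) |
           ((uint32_t)v.b[2] << 8)  |  (uint32_t)v.b[3];
}

/* Network structure using the _be suffix convention. */
struct msg_header {
    uint32_be length_be;
};
/* h.length_be = 0x1234; would be rejected by the compiler. */
```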

Related

Define size of long and time_t to 4 bytes

I want to synchronize two Raspberry Pis with a C program. It works fine when the program runs only on the Pis, but for development I want to use my PC (where it's also easier to debug), and I send the timespec struct directly as binary over the wire. The Raspberry Pi uses 4 bytes for long and time_t, while my PC uses 8 bytes for each, so the two sides don't match.
Is it possible to make long and time_t 4 bytes each, only for this C program?
I know that the size of long, short, etc. is defined by the system.
Important: I only want to define it once in the program, not convert each value to uintXX_t or int every time.
In programming, it is not uncommon to need to treat network transmissions separately from in-memory handling. In fact, it is pretty much the norm. So converting the data to a network format with the proper byte order and sizes is strongly recommended, and will help with the abstractions for your interfaces.
You might also consider converting to plain text, if this is not a time-critical piece of data exchange. It makes debugging a lot easier.
C is probably not the best tool for the job here. It's much too low level to provide automatic data serialization like JavaScript, Python or similar more abstract languages.
You cannot assume the definition of timespec will be identical on different platforms. For one thing, the sizes of time_t and long differ between 32-bit and 64-bit architectures, and you can have endianness problems too.
When you want to exchange data structures between heterogeneous platforms, you need to define your own protocol with unambiguous data and a clear endianness convention.
One solution would be to send the numbers as ASCII. Not terribly efficient, but if it's just a couple of values, who cares?
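A sketch of the text approach for the timespec from the question, assuming both fields are sent as one line of decimal text. The helper names are hypothetical; snprintf, strtoll and strtol are standard C, and the casts to long long make the format independent of the width of time_t on each side.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Format ts as "seconds nanoseconds\n". Returns the length written,
   or -1 if buf was too small. */
static int timespec_to_text(const struct timespec *ts, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%lld %ld\n",
                     (long long)ts->tv_sec, (long)ts->tv_nsec);
    return (n < 0 || (size_t)n >= size) ? -1 : n;
}

/* Parse text produced by timespec_to_text. Returns 0 on success. */
static int timespec_from_text(const char *text, struct timespec *ts)
{
    char *end;
    long long sec = strtoll(text, &end, 10);
    if (end == text)
        return -1;
    const char *p = end;
    long nsec = strtol(p, &end, 10);
    if (end == p || nsec < 0 || nsec > 999999999L)
        return -1;
    ts->tv_sec = (time_t)sec;  /* may truncate if time_t is 32 bits */
    ts->tv_nsec = nsec;
    return 0;
}
```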
Another would be to create an exchange structure with (u)intXX_t fields.
You can assume a standard Raspberry Pi kernel will be little-endian like your PC, but if you're writing a small exchange protocol, you might as well add a couple of htonl/ntohl calls for good measure.
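A sketch of the binary alternative: a fixed 12-byte wire format with a signed 64-bit seconds field followed by a 32-bit nanoseconds field, both big-endian and converted with htonl/ntohl. The layout and the helper names are assumptions for illustration, not a standard format. Packing into a byte array instead of sending the struct avoids depending on padding and on the sizes of time_t and long.

```c
#include <arpa/inet.h>   /* htonl, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>
#include <time.h>

#define TIMESPEC_WIRE_SIZE 12

static void timespec_pack(const struct timespec *ts, unsigned char out[TIMESPEC_WIRE_SIZE])
{
    uint64_t sec = (uint64_t)(int64_t)ts->tv_sec;  /* widen to 64 bits */
    uint32_t parts[3] = {
        htonl((uint32_t)(sec >> 32)),              /* high half of seconds */
        htonl((uint32_t)sec),                      /* low half of seconds */
        htonl((uint32_t)ts->tv_nsec)               /* 0..999999999 fits in 32 bits */
    };
    memcpy(out, parts, sizeof parts);
}

static void timespec_unpack(const unsigned char in[TIMESPEC_WIRE_SIZE], struct timespec *ts)
{
    uint32_t parts[3];
    memcpy(parts, in, sizeof parts);
    uint64_t sec = ((uint64_t)ntohl(parts[0]) << 32) | ntohl(parts[1]);
    ts->tv_sec = (time_t)(int64_t)sec;  /* truncates if time_t is 32 bits */
    ts->tv_nsec = (long)ntohl(parts[2]);
}
```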

Generic structure to transfer from 32 bit machine to 64 bit machine

We use the structure below in code running on a 32-bit machine. If we have to transfer this structure to a 64-bit machine, is any change required?
struct test
{
int num;
char a;
double dd;
};
I have two machines on a network and need to transfer data stored in the structure above from the 32-bit machine to the 64-bit machine. How can I make this structure generic so that no data is lost? That is my question.
The layout of such a structure is completely platform-dependent and you can't even use it to transfer data between two instances of a 32 bit application compiled using different compilers, or different compile settings under the same compiler.
The only safe use for such a structure in data transfer is between multiple instances of the same executable. That is, the same build. You can't even guarantee in general that a later build will have the same structure.
To transfer binary data in a binary-compatible fashion, you need to use some kind of a binary stream that maintains a fixed binary structure, independent of the platform. Google Protocol Buffers are one example of such, another is Qt's QDataStream.
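As a hand-written alternative to such libraries, here is a minimal sketch that packs struct test into a fixed 13-byte big-endian layout (4 bytes num, 1 byte a, 8 bytes dd). The layout and helper names are hypothetical, and it assumes num fits in 32 bits and that double is IEEE 754 binary64 stored with the same byte order as uint64_t on both hosts.

```c
#include <stdint.h>
#include <string.h>

struct test
{
    int num;
    char a;
    double dd;
};

#define TEST_WIRE_SIZE 13  /* 4 (num) + 1 (a) + 8 (dd), no padding */

static void put_be(unsigned char *p, uint64_t v, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--) {
        p[i] = (unsigned char)(v & 0xFF);
        v >>= 8;
    }
}

static uint64_t get_be(const unsigned char *p, int nbytes)
{
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v = (v << 8) | p[i];
    return v;
}

static void test_pack(const struct test *t, unsigned char out[TEST_WIRE_SIZE])
{
    uint64_t bits;
    memcpy(&bits, &t->dd, sizeof bits);  /* raw IEEE 754 bit pattern */
    put_be(out, (uint32_t)(int32_t)t->num, 4);
    out[4] = (unsigned char)t->a;
    put_be(out + 5, bits, 8);
}

static void test_unpack(const unsigned char in[TEST_WIRE_SIZE], struct test *t)
{
    uint64_t bits = get_be(in + 5, 8);
    /* Converting values above INT32_MAX back to signed is
       implementation-defined, but wraps on common compilers. */
    t->num = (int32_t)(uint32_t)get_be(in, 4);
    t->a = (char)in[4];
    memcpy(&t->dd, &bits, sizeof bits);
}
```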
Generally the struct is not really adequate to use for network or persistency purposes, as it relies in too many ways on the C implementation (compiler + platform).
"Transferring" depends on what you're doing with the struct and contained elements.
These items should be on your checklist:
Check elements for value ranges. All used types may change in width. char may change in signedness.
Check the whole structure's size. This might be important for code relying on a specific size or some arbitrary bounds.
When leaving the process's address space (network or persistent storage), make sure that structs are properly converted, including endianness, size, and alignment.
Everything depends heavily on the used C implementations on the different platforms.
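Several items on that checklist can be turned into build-time checks with C11 _Static_assert, so a platform or compiler that breaks an assumption fails to compile instead of silently corrupting data. A sketch for struct test; the expected sizes and offsets below are assumptions you would replace with whatever your own format relies on.

```c
#include <limits.h>
#include <stddef.h>

struct test
{
    int num;
    char a;
    double dd;
};

/* Fail the build if assumptions baked into a data format do not hold. */
_Static_assert(CHAR_BIT == 8, "bytes must be 8 bits");
_Static_assert(sizeof(int) == 4, "int must be 32 bits");
_Static_assert(sizeof(double) == 8, "double must be 64 bits");

/* Layout checks: these document the padding the compiler chose and fail
   if a different compiler or platform lays the struct out differently. */
_Static_assert(offsetof(struct test, a) == 4, "unexpected offset of a");
_Static_assert(offsetof(struct test, dd) == 8, "unexpected offset of dd");
```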

Is it necessary to check for endianness with unsigned 8-bit integers?

I am sending data between a C TCP socket server and a C# TCP client.
From the C# client, the data being sent is an array of the .NET framework type System.Byte, which is an unsigned 8-bit integer.
From the C server, the data being sent is an array of the C type char, which is also an 8-bit integer (although plain char may be signed or unsigned, depending on the implementation).
From what I have read, endianness is an issue when dealing with 16+ bit integers, i.e. when you have more than 1 byte, then the order of bytes is either in Little Endian and Big Endian.
Since I am only transmitting 8-bit arrays, do I need to worry about endianness? I haven't been able to find a clear answer so far.
Your intuition is correct: endianness is irrelevant for 8-bit integers; it only comes into play for types that are wider than one byte.
The hardware takes care of bit endianness; more specifically, bit order is defined by the physical layer. For instance, if your packets are transmitted over Ethernet, each byte is sent least-significant bit first. Either way, once received, the bytes are reassembled by the physical layer exactly as you sent them.
Since you are working at a higher layer of the protocol stack, you only have to care about byte endianness, which means an 8-bit integer cannot cause problems.
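A small sketch of the distinction: a byte array has the same layout on every host, so it can be sent as is, whereas the in-memory bytes of a uint16_t depend on the host. Both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Copies a byte array into a send buffer unchanged. No conversion is
   needed because each element is a single byte. */
static size_t prepare_bytes(const uint8_t *data, size_t n, uint8_t *out)
{
    memcpy(out, data, n);
    return n;
}

/* For contrast: the first in-memory byte of a 16-bit value depends on the
   host (0x12 on big-endian, 0x34 on little-endian for 0x1234). */
static uint8_t first_byte_in_memory(uint16_t v)
{
    uint8_t b[2];
    memcpy(b, &v, sizeof b);
    return b[0];
}
```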
There are two kinds of endianness:
byte endianness: the order of bytes within a multi-byte value
bit endianness: the order of bits within a byte
Most of the time, "endianness" means byte endianness. That is because the order of bits within a byte is handled by the hardware and is not visible to software, whereas byte order varies across systems.
With byte endianness, you don't have to worry about data that is one byte wide, as you suspected.
I can't say how big a role bit endianness plays in TCP. From the other answers here, it looks like you don't have to worry about it at the transport layer.

Big Endian Vs Little Endian Padding Issue

In my code there is a structure which had padding issues. I fixed them, and my code runs fine on a little-endian machine. Could this structure cause a problem on a big-endian machine?
You need to keep the following in mind:
Whenever doing data communication, the endianness of the communication protocol is what matters. All data communication protocols have (should have) a specified endianness. Big endian is probably most common, because back in the days when CRC calculations were done with digital electronic gates rather than software, the checksum itself had to be big endian.
(This can lead to quite obscure protocols, like the industry-standard fieldbus CANopen, where all integers in the sent data must be little endian, but the identifier and checksum must be big endian.)
Struct padding will always cause issues when you are writing portable code. Code like send(&my_struct, sizeof(my_struct)) is never portable, because it sends the padding bytes along with the data, and padding may appear anywhere inside the struct, not just at the end. If you need truly portable code, you cannot use structs or unions for the data protocol; everything needs to be stored in arrays of bytes or similar, where the data is guaranteed to occupy adjacent cells. Struct padding has nothing to do with endianness; it is dictated by the CPU's alignment requirements.
(How much padding a CPU family needs is a property of its instruction set and ABI. Any apparent link between padding and byte order is a coincidence of which CPU families happen to use which byte order, not a consequence of endianness itself.)
A structure, in C, is a way of representing data in memory. (It gives "structure" to memory.)
Any conversion from "struct" to "sequence of bytes" that just casts the "struct" bit away, and uses whatever underlying byte representation C is using is going to be affected by endianness. (And padding. Maybe other issues too, like pointers, sizeof(some-integral-type), etc.)
I suspect you're doing something like this:
// Some non-standard way to get rid of padding in Foo
struct Foo
{
// Some fields...
};
// Meanwhile, in a function somewhere...
fwrite(&a_foo, sizeof(a_foo), 1, fp);
Maybe you're not calling fwrite, maybe it's send, but either way, if you're doing serialization like this, you are going to be affected by endianness.
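The usual fix is to write each field explicitly in a fixed byte order instead of dumping the struct. A minimal sketch; the two fields of Foo are hypothetical stand-ins for "Some fields...", and big-endian is an arbitrary choice of file byte order. Padding is never written because only the field values go into the buffer.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical fields standing in for "Some fields..." above. */
struct Foo
{
    uint16_t id;
    uint32_t value;
};

/* Write each field in big-endian order. Returns 0 on success. */
static int foo_write(const struct Foo *f, FILE *fp)
{
    unsigned char buf[6] = {
        (unsigned char)(f->id >> 8), (unsigned char)f->id,
        (unsigned char)(f->value >> 24), (unsigned char)(f->value >> 16),
        (unsigned char)(f->value >> 8), (unsigned char)f->value
    };
    return fwrite(buf, sizeof buf, 1, fp) == 1 ? 0 : -1;
}

/* Read back a record written by foo_write. Returns 0 on success. */
static int foo_read(struct Foo *f, FILE *fp)
{
    unsigned char buf[6];
    if (fread(buf, sizeof buf, 1, fp) != 1)
        return -1;
    f->id = (uint16_t)((buf[0] << 8) | buf[1]);
    f->value = ((uint32_t)buf[2] << 24) | ((uint32_t)buf[3] << 16) |
               ((uint32_t)buf[4] << 8)  |  (uint32_t)buf[5];
    return 0;
}
```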

endianness and integer variable

In C, I am a little worried about the concept of endianness. Suppose I declare an integer (2 bytes) on a little-endian machine as
int a = 1;
and it is stored as:
Address  Value
1000     1
1001     0
On a big-endian machine it should be stored the other way round. Now if I do &a, I should get 1000 on both machines.
If this is true then if I store int a=1 then I should get a=1 on little endian whereas 2^15 on big endian. Is this correct?
It should not matter to you how the data is represented as long as you don't transfer it between platforms (or access it through assembly code).
If you only use standard C, it's hidden from you and you shouldn't worry about it. If you pass data between unknown machines (for example, a client and a server application that communicate over a network), use the hton*() and ntoh*() functions, which convert from host to network byte order and from network to host byte order, to avoid problems.
If you access memory directly, then the problem would not only be endian but also packing, so be careful with it.
In both little endian and big endian, the address of the "first" byte is returned. Here by "first" I mean 1000 in your example and not "first" in the sense of most-significant or least significant byte.
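To see this on a real machine, you can look at the bytes of a through an unsigned char pointer, which C always permits. &a points to the first byte either way; what differs is which byte holds the 1. A minimal sketch; the function name index_of_low_byte is hypothetical.

```c
#include <stddef.h>

/* Returns the index of the byte holding 1 in the object representation
   of an int set to 1: 0 on a little-endian host, sizeof(int) - 1 on a
   big-endian host. The value of a itself is 1 on both. */
static size_t index_of_low_byte(void)
{
    int a = 1;
    const unsigned char *p = (const unsigned char *)&a;  /* address of the first byte */
    for (size_t i = 0; i < sizeof a; i++)
        if (p[i] == 1)
            return i;
    return sizeof a;  /* not reached on ordinary hosts */
}
```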
Now if I do &a then I should get 1000 on little endian and 1001 on big endian.
No; on both machines you will get the address 1000.
If this is true then if I store int a=1 then I should get a=1 on little endian whereas 2^15 on big endian. Is this correct?
No; because the preliminary condition is false. Also, if you assign 1 to a variable and then read the value back again, you get the result 1 back. Anything else would be a major cause of confusion. (There are occasions when this (reading back the stored value) is not guaranteed - for example, if multiple threads could be modifying the variable, or if it is an I/O register on an embedded system. However, you'd be aware of these if you needed to know about it.)
For the most part, you do not need to worry about endianness. There are a few places where you do. The socket networking functions are one such; writing binary data to a file which will be transferred between machines of different endianness is another. Pretty much all the rest of the time, you don't have to worry about it.
This is correct, but it's not only an issue in C: it affects any program that reads and writes binary data. If the data stays on a single computer, it will be fine.
Also, if you are reading/writing from text files this won't be an issue.
