Conversion required between two little endian machines - C

I was asked in an interview: "Is conversion (little to big endian and vice versa) required for data that has to be transferred between two little endian machines?"
As far as I know, little endian stores the lowest byte at the lowest address and big endian stores the highest byte at the lowest address. I am not sure how the transfer between two machines actually happens.
Does anyone have any idea on this?
Thanks

Functions like htons, htonl, ntohs and ntohl convert between host byte order and network byte order (big-endian). By convention, multi-byte values are converted to big-endian before being sent over the network, and converted back to the host byte order (which may be little endian) on receipt.

The convention is to transfer numeric values over the network in big-endian format.
In general, you should not make any assumptions about the destination machine's platform.
This way two machines can understand each other even if they use different formats for storing numbers internally.
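For illustration, a minimal sketch of that pattern, assuming sock is an already-connected TCP socket; error handling is omitted:

#include <stdint.h>
#include <sys/socket.h>  /* send, recv */
#include <arpa/inet.h>   /* htonl, ntohl */

/* Sender: convert from host order to network (big-endian) order before writing. */
void send_value(int sock, uint32_t value)
{
    uint32_t wire = htonl(value);       /* no-op on a big-endian host */
    send(sock, &wire, sizeof wire, 0);  /* error handling omitted */
}

/* Receiver: convert from network order back to whatever the host uses. */
uint32_t recv_value(int sock)
{
    uint32_t wire = 0;
    recv(sock, &wire, sizeof wire, MSG_WAITALL);  /* error handling omitted */
    return ntohl(wire);
}

On two little endian machines both conversions are byte swaps that cancel each other out, so the data arrives unchanged either way; following the convention simply removes any assumption about the peer's platform.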

Related

Is ntohl in x86 assembly necessary?

My goal is to send an integer value over TCP as little-endian bytes to a client on Windows; the client is written in x86 assembly.
If I were to send htonl()-encoded bytes to the client, would that be necessary given that the client is compiled for x86? For instance, wouldn't it be redundant to call ntohl() within my client's assembly code?
My overarching question is: do I need to call htonl() server-side and ntohl() client-side (x86 Windows client)? Or should I just let the server do the work by checking whether the server's architecture is big endian and, if so, swapping the integer bytes via __builtin_bswap32() and sending the little-endian bytes to the client? I'm asking because I've read that x86 is always little endian, so it seems redundant if I know the client is always going to be written in x86 assembly.
My overarching question is: do I need to call htonl() server-side and ntohl() client-side (x86 Windows client)?
No, that would convert to big-endian ("network" byte order), but you said you wanted to send data over the network in little-endian format. On x86, that already is the h ("host") order.
In x86 asm, your data in memory will already be little-endian integers / floats unless you did something unusual (like using bswap, movbe, or pshufb, or byte-at-a-time shift / store.)
To be compatible with that in C, use le32toh (on receive) and htole32 (before send) from GCC's / BSD <endian.h> instead of ntohl / htonl. i.e. use LE as your network format instead of the traditional BE.
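A minimal sketch of that LE-on-the-wire approach, assuming glibc's <endian.h> (the BSDs put these functions in <sys/endian.h>); buffer management is left out:

#define _DEFAULT_SOURCE   /* expose htole32 / le32toh in glibc */
#include <endian.h>
#include <stdint.h>
#include <string.h>

/* Before sending: store the value in little-endian wire order.
   On x86 this is just a plain store (the swap compiles away). */
void put_le32(unsigned char buf[4], uint32_t value)
{
    uint32_t wire = htole32(value);
    memcpy(buf, &wire, sizeof wire);
}

/* After receiving: convert little-endian wire order back to host order.
   Again a no-op on x86, a byte swap on a big-endian host. */
uint32_t get_le32(const unsigned char buf[4])
{
    uint32_t wire;
    memcpy(&wire, buf, sizeof wire);
    return le32toh(wire);
}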
call ntohl() within my client's assembly code?
That would be insane. If you did want that, just use the bswap or movbe instructions instead of actually setting up args for a function call. Normally those functions inline when you use them in C, although there is a stand-alone definition of ntohl in libc.
Also, no, you wouldn't want to do that. Your client doesn't want to have anything to do with big-endian, which is what those traditional functions call "network" byte order.
x86 asm with AVX2 vpshufb can byte-swap at memcpy speed (including on small buffers that fit in L1d cache), but it's even more efficient not to have to swap at all as part of the first step that reads the data.
The htonl function converts a 32 bit value from [h]ost byte order to [n]etwork byte order. Host byte order may be either big endian or little endian, and network byte order is big endian. The ntohl function does the reverse.
The purpose of these functions is to abstract away any possible conversion of the host's byte order and to have a known format for the network. So on a little endian system these functions reverse the byte order, while on a big endian system they return the original value unchanged.
I would recommend using the standard network byte order, i.e. big endian, for sending values over the network. Then your code would use htonl to prepare values for sending and ntohl to read received values, regardless of the host architecture.

No big endian and little endian in strings?

We know that machines with different byte orderings store objects in memory differently: some order the bytes from least significant to most significant, while others go from most to least, e.g. for a hexadecimal value of 0x01234567.
So if we write a C program that prints each byte starting from the object's address, big endian and little endian machines produce different results.
But for strings, the same result is obtained on any system using ASCII as its character code, independent of byte ordering and word size conventions. As a consequence, text data is more platform-independent than binary data.
So my question is: why do we differentiate big endian and little endian for binary data? We could make it the same as text data, which is platform-independent. What is the point of distinguishing big endian and little endian machines just for binary data?
Array elements are always addressed from low to high, regardless of endianness conventions.
ASCII and UTF-8 strings are arrays of char, which is not a multibyte type and is not affected by endianness conventions.
"Wide" strings, where each character is represented by wchar_t or another multibyte type, will be affected, but only for the individual elements, not the string as a whole.
So my question is: why do we differentiate big endian and little endian for binary data? We could make it the same as text data, which is platform-independent. What is the point of distinguishing big endian and little endian machines just for binary data?
In short: we already do: for example, a file format specification will dictate if a 32-bit integer should be serialized in big-endian or little-endian order. Similarly, network protocols will dictate the byte-order of multi-byte values (which is why htons is a thing).
However, if we're only concerned with the in-memory representation of binary data (and not serialized binary data), then it makes sense to store values using the fastest representation - i.e. the byte order natively preferred by the CPU and ISA. For x86 and x64 this is little-endian; 68k and many older ISAs prefer big-endian, and most non-x86 ISAs (including ARM and MIPS) now support both big-endian and little-endian modes.
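As a sketch of how a specification's byte order can be honoured regardless of the host's preference, a value can be serialized with shifts, which operate on the value rather than on its in-memory bytes (the function names here are just illustrative):

#include <stdint.h>

/* Write a 32-bit value into a buffer in big-endian order, as a file format
   or protocol spec might require. Correct on any host endianness. */
void write_u32_be(unsigned char out[4], uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)(v);
}

/* The matching reader. */
uint32_t read_u32_be(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}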
But for strings, the same result is obtained on any system using ASCII as its character code, independent of byte ordering and word size conventions. As a consequence, text data is more platform-independent than binary data.
So my question is: why do we differentiate big endian and little endian for binary data? We could make it the same as text data, which is platform-independent.
In short:
ASCII Strings are not integers.
Integers are not ASCII strings.
You're basically asking why we don't represent integer numbers in a base-10, big-endian format. We don't because base 10 is difficult for digital computers to work with; they work in base 2. The closest thing to what you're describing is binary-coded decimal (BCD), and the reason computers today don't normally use it is that it's slow and inefficient: only 4 bits are needed to represent a base-10 digit in base 2, so you could pack two digits into a single byte, but CPUs are generally fastest on word-sized (or at least byte-sized) values, not nibble-sized (half-byte) values. And BCD still doesn't solve the big-endian vs. little-endian problem: BCD values could be stored in either BE or LE order, and even char-based strings could be stored in reverse order without it affecting how they're processed.

Is it necessary to check for endianness with unsigned 8-bit integers?

I am sending data between a C TCP socket server and a C# TCP client.
From the C# client, the data being sent is an array of the .NET framework type System.Byte, which is an unsigned 8-bit integer.
From the C server, the data being sent is an array of the C type char, which is also an unsigned 8-bit integer.
From what I have read, endianness is an issue when dealing with 16+ bit integers, i.e. when you have more than one byte, the bytes can be ordered either little endian or big endian.
Since I am only transmitting 8-bit arrays, do I need to worry about endianness? I haven't been able to find a clear answer so far.
Thank you.
Your intuition is correct: endianness is irrelevant for 8-bit integers; it only comes into play for types that are wider than one byte.
The hardware takes care of bit endianness; more specifically, bit endianness is defined in the physical layer. For instance, if your packets are transmitted over Ethernet they are transmitted low bit first. In any case, once they have been received, the physical layer reassembles the bytes exactly the way you sent them.
Since you are dealing with a higher layer of the protocol stack, you only have to care about byte endianness, which means you cannot have problems with an 8-bit integer.
There are 2 kinds of endianness:
byte-endianness: the order of bytes within a multi-byte data
bit-endianness: the order of bits within a byte
Most of the time, when people say endianness they mean byte-endianness. That is because bit-endianness is almost always big-endian, while byte-endianness varies across systems.
With byte-endianness you don't have to worry about data that is one byte wide, as you suspect.
I can't say how much of a role bit-endianness plays in the TCP protocol, but from the other answers here it looks like you don't have to worry about bit-endianness at the transport layer.
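To make the byte-endianness point concrete, here is a sketch of a hypothetical framing (a 2-byte length prefix followed by the raw payload); only the multi-byte length field needs a defined byte order, while the 8-bit payload goes out as-is:

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>  /* htons */

/* Hypothetical framing: 2-byte length prefix in network order, then the payload.
   The payload is a sequence of single bytes, so its "order" is just array order. */
size_t frame_message(unsigned char *out, const unsigned char *payload, uint16_t len)
{
    uint16_t wire_len = htons(len);                /* 2 bytes: order matters */
    memcpy(out, &wire_len, sizeof wire_len);
    memcpy(out + sizeof wire_len, payload, len);   /* 1-byte units: order irrelevant */
    return sizeof wire_len + len;
}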

Endianness and integer variables

In C I am a little worried about the concept of endianness. Suppose I declare an integer (2 bytes) on a little endian machine as
int a = 1;
and it is stored as:
Address  Value
1000     1
1001     0
On big endian it should be stored the other way around. Now if I do &a then I should get 1000 on little endian and 1001 on big endian.
If this is true then if I store int a=1 then I should get a=1 on little endian whereas 2^15 on big endian. Is this correct?
It should not matter to you how the data is represented as long as you don't transfer it between platforms (or access it through assembly code).
If you only use standard C, it's hidden from you and you shouldn't bother yourself with it. If you pass data around between unknown machines (for example, a client and a server application that communicate over a network), use the hton and ntoh families of functions, which convert from host to network and from network to host byte order, to avoid problems.
If you access memory directly, then the problem is not only endianness but also structure packing, so be careful with it.
In both little endian and big endian, the address of the "first" byte is returned. Here by "first" I mean 1000 in your example and not "first" in the sense of most-significant or least significant byte.
Now if I do &a then I should get 1000 on little endian and 1001 on big endian.
No; on both machines you will get the address 1000.
If this is true then if I store int a=1 then I should get a=1 on little endian whereas 2^15 on big endian. Is this correct?
No; because the preliminary condition is false. Also, if you assign 1 to a variable and then read the value back again, you get the result 1 back. Anything else would be a major cause of confusion. (There are occasions when reading back the stored value is not guaranteed to give the same result - for example, if multiple threads could be modifying the variable, or if it is an I/O register on an embedded system. However, you'd be aware of these cases if you needed to know about them.)
For the most part, you do not need to worry about endianness. There are a few places where you do. The socket networking functions are one such; writing binary data to a file which will be transferred between machines of different endianness is another. Pretty much all the rest of the time, you don't have to worry about it.
This is correct, but it's not only an issue in C; it affects any program that reads and writes binary data. If the data stays on a single computer, then it will be fine.
Also, if you are reading/writing text files, this won't be an issue.

Endian-dependent code in real applications?

I know the following C code is endian-dependent:
short s_endian = 0x4142;
char c_endian = *(char *)&s_endian;
On a big-endian machine, c_endian will be 'A'(0x41); while on a little-endian machine, it will be 'B'(0x42).
But this code seems kind of ugly. So is there endian-dependent code in real applications? Or have you come across any application that needed a lot of changes when porting to a target with a different endianness?
Thanks.
Pretty much any code that deals with saving integers with more than 8 bits in binary format, or sends such integers over the network. For one extremely common example, many of the fields in the TCP header fall into this category.
Networking code is endian-dependent (data should always be transferred across the network in big-endian order, even on a little-endian machine), hence the need for functions like htons(), htonl(), ntohs(), and ntohl() in <arpa/inet.h> that make it easy to convert between host byte order and network byte order.
Hope this helps,
Jason
I once collected data using a specialized DAQ card on a PC, and tried to analyze the file on a PowerPC Mac. It turns out the "file format" the thing used was a raw memory dump...
Little endian on x86, big endian on PowerPC. You figure it out.
The short answer is yes. Anything that reads/writes raw binary to a file or socket needs to keep track of the endianness of the data.
For example, the IP protocol requires big-endian representation.
When manipulating the internal representation of floating-point numbers, you could access the parts (or the full value) using an integer type. For example:
union float_u
{
    float f;
    unsigned short v[2];
};

int get_sign(float f)
{
    union float_u u;
    u.f = f;
    /* Endian-dependent: on a big-endian machine (assuming 16-bit short and
       32-bit float) the sign bit is in v[0]; on little-endian it is in v[1]. */
    return (u.v[0] & 0x8000) != 0;
}
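If all you actually need is the sign, a portable alternative is C99's signbit() from <math.h>, which sidesteps the byte-order question entirely; a minimal sketch:

#include <math.h>

/* Portable: signbit() reports the sign without relying on how the
   bytes of the float are laid out in memory. */
int get_sign_portable(float f)
{
    return signbit(f) != 0;
}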
If your program sends data to another system (either over a serial or network link, or by saving it to a file for something else to read) or reads data from another system, then you can have endianness issues.
I don't know whether static analysis would be able to detect such constructs, but having your programmers follow a coding standard, where structure elements and variables are marked up to indicate their endianness, could help.
For example, if all network data structures had _be appended to the names of multi-byte members, you could look for instances where you assigned a non-suffixed (host byte order) variable or even a literal value (like 0x1234) to one of those members.
It would be great if we could capture endianness in our datatypes -- uint32_be and uint32_le to go with uint32_t. Then the compiler could disallow assignments or operations between the two. And the signature for htobe32 would be uint32_be htobe32(uint32_t n);.
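One way to approximate that in plain C today is a struct wrapper, so the compiler rejects accidental mixing of host-order and wire-order values. This is only a sketch; the names uint32_be, to_be32 and from_be32 are made up for illustration:

#include <stdint.h>
#include <arpa/inet.h>  /* htonl, ntohl */

/* A uint32_t that is known to hold a big-endian (network-order) value.
   Being a distinct struct type, it cannot be assigned to or from a plain
   host-order uint32_t without going through the conversion helpers. */
typedef struct { uint32_t raw; } uint32_be;

static inline uint32_be to_be32(uint32_t host)   /* host order -> tagged BE */
{
    return (uint32_be){ htonl(host) };
}

static inline uint32_t from_be32(uint32_be be)   /* tagged BE -> host order */
{
    return ntohl(be.raw);
}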
