Is ntohl in x86 assembly necessary? - c

My goal is to send an integer value over TCP as little-endian bytes to a client on Windows; the client is written in x86 assembly.
If I were to send htonl()-encoded bytes to the client, would that be necessary given that the client is compiled for x86? For instance, wouldn't it be redundant to call ntohl() within my client's assembly code?
My overarching question is: do I need to call htonl() server-side and ntohl() client-side (x86 Windows client)? Or should I just let the server do the work by checking whether the server's architecture is big endian and, if so, swapping the integer bytes via __builtin_bswap32() before sending the little-endian bytes to the client? I'm asking because I've read that x86 is always little endian, so the conversion seems redundant if I know the client is always going to be written in x86 assembly.

My overarching question is: do I need to call htonl() server-side and ntohl() client-side (x86 Windows client)?
No, that would convert to big-endian ("network" byte order), but you said you wanted to send data over the network in little-endian format. On x86, that already is the h ("host") order.
In x86 asm, your data in memory will already be little-endian integers / floats unless you did something unusual (like using bswap, movbe, or pshufb, or byte-at-a-time shift / store.)
To be compatible with that in C, use le32toh (on receive) and htole32 (before send) from glibc's / BSD's <endian.h> instead of ntohl / htonl. That is, use LE as your wire format instead of the traditional BE.
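For example, here is a minimal sketch of that approach, assuming a glibc/BSD system where <endian.h> provides htole32/le32toh (the socket handling around send()/recv() is left out, and the function names are just for illustration):

#include <endian.h>    /* htole32, le32toh (glibc; some BSDs use <sys/endian.h>) */
#include <stdint.h>
#include <string.h>

/* Server side: put the value into little-endian wire format before send(). */
void encode_u32_le(uint32_t value, unsigned char out[4])
{
    uint32_t le = htole32(value);   /* no-op on x86, byte swap on big-endian hosts */
    memcpy(out, &le, sizeof le);
}

/* C equivalent of the client side: read the wire bytes back after recv(). */
uint32_t decode_u32_le(const unsigned char in[4])
{
    uint32_t le;
    memcpy(&le, in, sizeof le);
    return le32toh(le);             /* also a no-op on x86 */
}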
call ntohl() within my client's assembly code?
That would be insane. If you did want that, just use the bswap or movbe instructions instead of actually setting up args for a function call. Normally those functions inline when you use them in C, although there is a stand-alone definition of ntohl in libc.
Also, no, you wouldn't want to do that. Your client doesn't want to have anything to do with big-endian, which is what those traditional functions call "network" byte order.
x86 asm with AVX2 vpshufb can byte-swap at memcpy speed (including on small buffers that fit in L1d cache), but it's even more efficient not to have to swap at all as part of the first step that reads the data.
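For reference, a tiny C sketch of what "those functions inline" means in practice: with a modern compiler, both of these typically compile down to a single bswap (or movbe) instruction on x86, so there is never a real function call to set up.

#include <arpa/inet.h>   /* ntohl */
#include <stdint.h>

uint32_t swap_with_ntohl(uint32_t x)   { return ntohl(x); }
uint32_t swap_with_builtin(uint32_t x) { return __builtin_bswap32(x); }  /* GCC/Clang builtin */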

The htonl function converts a 32 bit value from [h]ost byte order to [n]etwork byte order. Host byte order may be either big endian or little endian, and network byte order is big endian. The ntohl function does the reverse.
The purpose of these functions is to abstract away any possible conversion of the host's byte order and to have a known format for the network. So on a little endian system these functions reverse the byte order, while on a big endian system they return the original value unchanged.
I would recommend using the standard network byte order, i.e. big endian, for sending values over the network. Then your code would use htonl to prepare values for sending and ntohl to read received values, regardless of the host architecture.
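For example, a small sketch of that convention in C (sockfd is assumed to be an already-connected socket, and error handling is omitted for brevity):

#include <arpa/inet.h>   /* htonl, ntohl */
#include <stdint.h>
#include <sys/socket.h>

/* Sender: convert to network (big-endian) byte order before sending. */
void send_u32(int sockfd, uint32_t value)
{
    uint32_t wire = htonl(value);
    send(sockfd, &wire, sizeof wire, 0);
}

/* Receiver: convert back to host byte order after receiving. */
uint32_t recv_u32(int sockfd)
{
    uint32_t wire;
    recv(sockfd, &wire, sizeof wire, MSG_WAITALL);
    return ntohl(wire);
}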

Related

Conversion required for two little endian machines

I was asked in an interview: "Is the conversion (little to big endian and vice versa) required for data that has to be transferred between two little endian machines?"
As far as I know, little endian stores the lowest byte at the lowest address and big endian stores the highest byte at the lowest address. I am not sure how this happens for data transfer between two machines.
Does anyone have any idea about this?
Thanks
Functions like htons, htonl, ntohs and ntohl can convert little endian to network order (big-endian) and vice versa. Network protocols require data to be converted to big-endian before sending, and it is converted back to little endian on receipt if that is the host byte order.
The convention is to transfer numerics over the network in big-endian format.
In general, you should not make any assumptions about the destination machine's platform.
This way two machines can understand each other, even if they use different formats for storing numerics internally.
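A familiar example of the same convention from the sockets API is the port number in sockaddr_in, which is multi-byte and therefore stored in network order. A minimal sketch (the address 127.0.0.1 and port 8080 are just placeholder values):

#include <arpa/inet.h>    /* htons, inet_pton */
#include <netinet/in.h>
#include <string.h>

/* Fill in an IPv4 socket address; the values are placeholders for illustration. */
void make_addr(struct sockaddr_in *addr)
{
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_port   = htons(8080);    /* multi-byte field: host order -> network order */
    inet_pton(AF_INET, "127.0.0.1", &addr->sin_addr);
}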

C - how to make a variable big endian by default?

I want my program to use big endian by default (currently it is little endian).
That means that every time I declare a uint32_t/int, the value assigned to it will be stored in big-endian order.
Is that possible (without calling ntohl() every time)?
I have searched Google and Stack Overflow for 3 days and haven't found an answer.
I would greatly appreciate any help!
edit:
I have a server and a client.
The client works with big endian, and the server is little endian.
Now, I am sending the server an MD5 value for a byte array, ntohl()ing it on the server, and getting the valid MD5 value.
On the server, when I call the md5.c function (which is a DLL, by the way), I get a different value.
This value is not even similar in any way to the received value.
I assume it happens because of the endianness.
The byte array I send to the function is not changing, because those are single bytes, which are not sensitive to endianness, but the variables I declare in the function, which I use to manipulate my byte array, can cause a problem.
That is the reason big endian is so important to me.
I want my program to use big endian by default
You cannot. The endianness is a fixed property of the target processor (of your C compiler) and related to its instruction set architecture.
You might in principle (but with C, that is really not worth the trouble) use a C compiler for some virtual machine (like WebAssembly, or MMIX) of the appropriate endianness, or even define a bytecode VM for a machine of desired endianness.
A variant of that might be to cross-compile for some other instruction set architecture (e.g. MIPS, ARM, etc...) and use some emulator to run the executable (and the necessary OS).
Regarding your edit, you could consider sending the md5sum over the network in alphanumeric (hex text) form (e.g. like the output of the md5sum command).
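A minimal sketch of that idea (assuming the digest comes from whatever MD5 routine the server uses): the 16 raw digest bytes are turned into a 32-character hex string, which has no endianness at all because it is transmitted byte by byte.

#include <stdio.h>

/* Convert a 16-byte MD5 digest into its usual 32-character hex form. */
void md5_to_hex(const unsigned char digest[16], char out[33])
{
    for (int i = 0; i < 16; i++)
        sprintf(out + 2 * i, "%02x", digest[i]);
    out[32] = '\0';
}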
You can compile the program for different processors, some of which use big endian instead of little endian, or you can control how your code will be compiled in that aspect. For example, for MIPS: https://gcc.gnu.org/onlinedocs/gcc/MIPS-Options.html
Endianness is purely about how a processor performs multi-byte arithmetic. The only time a programmer needs to be aware of it is when serializing data or addressing parts of an integer.
So unless you can change how a processor performs arithmetic on multi-byte words (ARM allows you to change endianness), you are stuck with how the processor works.
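The usual way to handle that serialization case portably is to build the bytes explicitly with shifts, which behaves the same on any host. A small sketch (emitting big-endian here, but the same idea works for little-endian; the function names are just for illustration):

#include <stdint.h>

/* Write a 32-bit value as 4 big-endian bytes, independent of host endianness. */
void put_u32_be(uint32_t v, unsigned char out[4])
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Read it back; behaves identically on big- and little-endian hosts. */
uint32_t get_u32_be(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}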

Decide between htons and htonl

I have read this explanation and this manual page about the usage of htons, htonl, ntohl and ntohs, however I still do not fully understand their usage (I am quite new to the socket API and network programming in general). I do understand little and big endian and byte order, but I'm not sure how to correctly implement these functions in my chat application code so it works consistently across different machines (where one uses big endian and another uses little endian).
A little context for my specific situation: I have an (almost) fully working chat application, and have just been using htons, but after some research it seems this is unreliable. I realise that this question might seem quite similar; however, here I am asking about an implementation example, as I am already aware of the basic function of these calls.
The idea is to have some common representation for integer types. TCP/IP uses NBO, the network byte order. Whatever it is (big or little endian, or something more exotic), the way to send a 16-bit or 32-bit integer is to use htons or htonl before sending the data. Then, when receiving, you have to convert it back to your host representation with ntohs and ntohl:
The sender wants to send a uint32_t value a, so it sends the data returned by htonl(a), say d.
The receiver gets d, applies ntohl(d), and gets back the correct uint32_t value.
These are just encoding/decoding functions. The sender sends code(v), then the receiver computes decode(code(v))! You don't have to know what code(v) equals (it doesn't matter).
If it seems unreliable to you, it is because you are not using it as intended.
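As a concrete sketch for a chat protocol (the header layout and function names here are made up for illustration): say each message starts with a 16-bit type and a 32-bit payload length; then htons/htonl are used on the sending side and ntohs/ntohl on the receiving side.

#include <arpa/inet.h>   /* htons, htonl, ntohs, ntohl */
#include <stdint.h>
#include <string.h>

/* Encode a header into a 6-byte buffer in network byte order. */
void encode_header(uint16_t type, uint32_t length, unsigned char out[6])
{
    uint16_t t = htons(type);
    uint32_t l = htonl(length);
    memcpy(out,     &t, sizeof t);
    memcpy(out + 2, &l, sizeof l);
}

/* Decode it back to host byte order on the receiving side. */
void decode_header(const unsigned char in[6], uint16_t *type, uint32_t *length)
{
    uint16_t t;
    uint32_t l;
    memcpy(&t, in,     sizeof t);
    memcpy(&l, in + 2, sizeof l);
    *type   = ntohs(t);
    *length = ntohl(l);
}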

Is it necessary to check for endianness with unsigned 8-bit integers?

I am sending data between a C TCP socket server and a C# TCP client.
From the C# client, the data being sent is an array of the .NET framework type System.Byte, which is an unsigned 8-bit integer.
From the C server, the data being sent is an array of the C type char, which is also an unsigned 8-bit integer.
From what I have read, endianness is an issue when dealing with 16+ bit integers, i.e. when a value spans more than one byte, the order of those bytes is either little endian or big endian.
Since I am only transmitting 8-bit arrays, do I need to worry about endianness? I haven't been able to find a clear answer so far.
Thank you.
Your intuition is correct: endianness is irrelevant for 8-bit integers; it only comes into play for types that are wider than one byte.
The hardware takes care of the bit endianness, or more specifically, bit endianness is defined in the physical layer. For instance, if your packets are transmitted over Ethernet they will be transmitted low-bit first. In any case, after they have been received, the bytes will be reassembled by the physical layer in the same order you sent them.
Since you are dealing with a higher layer of the protocol stack, you only have to care about byte endianness, which means you cannot have problems with an 8-bit integer.
There are 2 kinds of endianness:
byte-endianness: the order of bytes within a multi-byte value
bit-endianness: the order of bits within a byte
Most of the time, when people say endianness they mean byte-endianness. That is because bit-endianness is almost always big-endian, but the byte-endianness varies across systems.
With byte-endianness you don't have to worry about data that is one byte wide, as you suspect.
I can't say how much of a role bit-endianness plays in the TCP protocol. From the other answers here, it looks like you don't have to worry about bit-endianness at the transport layer.

Endian dependent code in real application?

I know the following C code is endian-dependent:
short s_endian = 0x4142;
char c_endian = *(char *)&s_endian;
On a big-endian machine, c_endian will be 'A'(0x41); while on a little-endian machine, it will be 'B'(0x42).
But this code seems kind of ugly. So is there endian-dependent code in real applications? Or have you come across any application that needs a lot of changes when porting to a target with a different endianness?
Thanks.
Pretty much any code that deals with saving integers with more than 8 bits in binary format, or sends such integers over the network. For one extremely common example, many of the fields in the TCP header fall into this category.
Networking code is endian-dependent (by convention, data is transferred across the network as big-endian, even on a little-endian machine), hence the need for functions like htons(), htonl(), ntohs(), and ntohl() in <arpa/inet.h> that allow easy conversions from host-to-network byte order and network-to-host byte order.
I once collected data using a specialized DAQ card on a PC, and tried to analyze the file on a PowerPC mac. Turns out the "file format" the thing used was a raw memory dump...
Little endian on x86, big endian on PowerPC. You figure it out.
The short answer is yes. Anything that reads/writes raw binary to a file or socket needs to keep track of the endianness of the data.
For example, the IP protocol requires big-endian representation.
When manipulating the internal representation of floating-point numbers, you could access the parts (or the full value) using an integer type. For example:
union float_u
{
    float f;
    unsigned short v[2];
};

int get_sign(float f)
{
    union float_u u;
    u.f = f;
    return (u.v[0] & 0x8000) != 0; // Endian-dependent: on a big-endian host v[0]
                                   // holds the sign bit; on little-endian it is in v[1]
}
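For contrast, a portable version of the same check is sketched below. It assumes float is 32-bit IEEE 754 and that float and integer types share the same byte order (true on essentially all current platforms); copying the whole 32-bit pattern avoids indexing into one half of it. (In practice, signbit() from <math.h> is the standard way to do this.)

#include <stdint.h>
#include <string.h>

int get_sign_portable(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* whole bit pattern, no half-word indexing */
    return (bits >> 31) != 0;         /* the sign is always the top bit of the pattern */
}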
If your program sends data to another system (either over a serial or network link, or by saving it to a file for something else to read) or reads data from another system, then you can have endianness issues.
I don't know that static analysis would be able to detect such constructs, but having your programmers follow a coding standard, where structure elements and variables are marked up to indicate their endianness, could help.
For example, if all network data structures had _be appended to the names of multi-byte members, you could look for instances where you assigned a non-suffixed (host byte order) variable or even a literal value (like 0x1234) to one of those members.
It would be great if we could capture endianness in our datatypes -- uint32_be and uint32_le to go with uint32_t. Then the compiler could disallow assignments or operations between the two. And the signature for htobe32 would be uint32_be htobe32( uint32_t n);.
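A rough sketch of how that could look in plain C (the names be32_t, to_be32 and from_be32 are made up here; the Linux kernel does something similar with its __be32/__le32 types, checked by the sparse tool): wrapping the big-endian value in a distinct struct type makes the compiler reject accidental mixing with host-order integers.

#include <arpa/inet.h>   /* htonl, ntohl */
#include <stdint.h>

typedef struct { uint32_t raw; } be32_t;   /* always holds network (big-endian) order */

static inline be32_t   to_be32(uint32_t host) { return (be32_t){ htonl(host) }; }
static inline uint32_t from_be32(be32_t net)  { return ntohl(net.raw); }

struct packet_header {
    be32_t length_be;    /* assigning a plain uint32_t here is a compile error */
};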
