Decide between htons and htonl - c

I have read this explanation and this manual page about the usage of htons, htonl, ntohl and ntohs; however, I still do not fully understand their usage (I am quite new to the socket API and network programming in general). I do understand little- and big-endian byte order, but I'm not sure how to correctly use these functions in my chat application code so that it works consistently across different machines (where one uses big endian and another uses little endian).
A little context for my specific situation: I have an (almost) fully working chat application and have just been using htons, but after some research it seems this is unreliable. I realise that this question might seem quite similar; however, here I am asking for an implementation example, as I am already aware of the basic purpose of these calls.

The idea is to have some common representation for integer types. TCP/IP uses NBO, Network Byte Order. Whatever that is (big endian, little endian, or something more exotic), the way to send a 16-bit or 32-bit integer is to use htons or htonl before sending the data. Then, when receiving, you have to convert it back to your host representation with ntohs and ntohl:
The sender wants to send the value uint32_t a, so it sends the data returned by htonl(a), say d.
The receiver gets d, applies ntohl(d), and gets back the correct uint32_t value.
These are just coding/decoding functions. The sender sends code(v), and the receiver gets decode(code(v))! You don't have to know what code(v) equals; it doesn't matter.
If it seems unreliable to you, that is because you are not using it as intended.
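As a minimal sketch of that pattern (assuming a connected TCP socket fd and the POSIX send/recv calls; error handling omitted):

#include <stdint.h>
#include <arpa/inet.h>   /* htonl, ntohl */
#include <sys/socket.h>  /* send, recv */

/* Sender: convert to network byte order, then send the 4 bytes. */
void send_u32(int fd, uint32_t a)
{
    uint32_t d = htonl(a);             /* d is the wire representation */
    send(fd, &d, sizeof d, 0);
}

/* Receiver: read the 4 bytes, then convert back to host byte order. */
uint32_t recv_u32(int fd)
{
    uint32_t d;
    recv(fd, &d, sizeof d, MSG_WAITALL);
    return ntohl(d);                   /* back to the host's representation */
}

Both sides compile unchanged on little-endian and big-endian hosts; on big-endian machines the conversions are simply no-ops.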

Related

Is ntohl in x86 assembly necessary?

My goal is to send an integer value over TCP as little-endian bytes to a client on Windows; the client is written in x86 assembly.
If I were to send htonl()-encoded bytes to the client, would that be necessary, given that the client is compiled for x86? For instance, wouldn't it be redundant to call ntohl() within my client's assembly code?
My overarching question is: do I need to call htonl() server-side and ntohl() client-side (x86 Windows client)? Or should I just let the server do the work by checking whether the server's architecture is big-endian and, if so, swapping the integer bytes via __builtin_bswap32() and sending the little-endian bytes to the client? I'm asking because I've read that x86 is always little-endian, so it seems redundant if I know the client is always going to be written in x86 assembly.
My overarching question is do I need to call htonl() server-side and ntohl() client side (x86 windows client)?
No, that would convert to big-endian ("network" byte order), but you said you wanted to send data over the network in little-endian format. On x86, that already is the h ("host") order.
In x86 asm, your data in memory will already be little-endian integers / floats unless you did something unusual (like using bswap, movbe, or pshufb, or byte-at-a-time shift / store.)
To be compatible with that in C, use le32toh (on receive) and htole32 (before send) from GCC's / BSD <endian.h> instead of ntohl / htonl. i.e. use LE as your network format instead of the traditional BE.
call ntohl() within my client's assembly code?
That would be insane. If you did want that, just use the bswap or movbe instructions instead of actually setting up args for a function call. Normally those functions inline when you use them in C, although there is a stand-alone definition of ntohl in libc.
Also, no, you wouldn't want to do that. Your client doesn't want to have anything to do with big-endian, which is what those traditional functions call "network" byte order.
x86 asm with AVX2 vpshufb can byte-swap at memcpy speed (including on small buffers that fit in L1d cache), but it's even more efficient not to have to swap at all as part of the first step that reads the data.
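Concretely, the server side under this little-endian convention might look like the following sketch (assuming glibc's <endian.h>; BSDs put htole32 in <sys/endian.h>, and error handling is omitted):

#include <stdint.h>
#include <endian.h>      /* htole32 on glibc; BSDs use <sys/endian.h> */
#include <sys/socket.h>  /* send */

/* Put the value into little-endian wire format before sending.
 * On x86 this is a no-op; on a big-endian host it byte-swaps.
 * The x86 asm client can then load the 4 received bytes directly,
 * with no ntohl (or any swap) at all. */
void send_u32_le(int fd, uint32_t value)
{
    uint32_t wire = htole32(value);
    send(fd, &wire, sizeof wire, 0);
}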
The htonl function converts a 32-bit value from [h]ost byte order to [n]etwork byte order. Host byte order may be either big endian or little endian, and network byte order is big endian. The ntohl function does the reverse.
The purpose of these functions is to abstract away any possible conversion of the host's byte order and to have a known format for the network. So on a little endian system these functions reverse the byte order, while on a big endian system they return the original value unchanged.
I would recommend using the standard network byte order, i.e. big endian, for sending values over the network. Then your code would use htonl to prepare values for sending and ntohl to read received values, regardless of the host architecture.

Packing IP into a uint32

Is the packing of a char IP[4] (192.168.2.1) into a uint32 detailed by any RSI/ISO/standards body (i.e. anyone here: https://en.wikipedia.org/wiki/List_of_technical_standard_organisations)? I know it's standard across a range of languages and tools, but I'm wondering whether it's part of any international standard, and if so which one. Such a standard should, for example, specify the endianness used.
I need to know because I'm writing a specification, and I don't want to reinvent the wheel by describing the technique.
All IP addresses (and, in fact, all multi-byte numbers used in the standard networking stack) sent on the wire are in "network order" which is big-endian.
How four bytes of anything are represented within a program is up to the programmer, though they typically let the compiler decide, which of course uses whatever the native hardware uses.
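As a hedged sketch of what that packing looks like in C (assuming POSIX <arpa/inet.h>; inet_pton already produces the address in network byte order):

#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* inet_pton, htonl */
#include <netinet/in.h>  /* struct in_addr */

int main(void)
{
    /* Option 1: let the standard library parse and pack the dotted quad. */
    struct in_addr addr;
    inet_pton(AF_INET, "192.168.2.1", &addr);    /* addr.s_addr is already in network (big-endian) order */

    /* Option 2: pack the four octets by hand, then convert to network order. */
    uint8_t ip[4] = { 192, 168, 2, 1 };
    uint32_t host_order = ((uint32_t)ip[0] << 24) | ((uint32_t)ip[1] << 16)
                        | ((uint32_t)ip[2] << 8)  |  (uint32_t)ip[3];
    uint32_t wire = htonl(host_order);

    return memcmp(&wire, &addr.s_addr, 4) != 0;  /* 0: both packings produce the same bytes */
}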
From Why is network-byte-order defined to be big-endian? ...
RFC 1700 stated it must be so (and defined network byte order as big-endian). It has since been superseded by RFC 3232, but this part remains the same.
The convention in the documentation of Internet Protocols is to express numbers in decimal and to picture data in "big-endian" order [COHEN]. That is, fields are described left to right, with the most significant octet on the left and the least significant octet on the right.
The reference they make is to "On Holy Wars and a Plea for Peace" by D. Cohen, published in Computer. The abstract can be found at IEN-137 or on this IEEE page.
Summary:
Which way is chosen does not make too much difference. It is more important to agree upon an order than which order is agreed upon.
It concludes that either big-endian or little-endian would have been a workable choice. Neither scheme is better or worse, and either can be used in place of the other as long as it is consistent across the whole system/protocol.
This depends on where that IPv4 address is going.
For actual IP usage, i.e. in network packets, the packing always uses network byte order, i.e. big-endian.
If your usage is something different, then you're of course free to define the packing however you want.

Is it necessary to check for endianness with unsigned 8-bit integers?

I am sending data between a C TCP socket server and a C# TCP client.
From the C# client, the data being sent is an array of the .NET framework type System.Byte, which is an unsigned 8-bit integer.
From the C server, the data being sent is an array of the C type char, which is also an unsigned 8-bit integer.
From what I have read, endianness is an issue when dealing with integers of 16 or more bits, i.e. when you have more than one byte, the order of those bytes is either little endian or big endian.
Since I am only transmitting 8-bit arrays, do I need to worry about endianness? I haven't been able to find a clear answer so far.
Thank you.
Your intuition is correct: endianness is irrelevant for 8-bit integers; it only comes into play for types that are wider than one byte.
The hardware takes care of bit endianness; more specifically, bit endianness is defined in the physical layer. For instance, if your packets are transmitted over Ethernet, they will be transmitted low bit first. But in any case, once they have been received, the physical layer reassembles the bytes exactly as you sent them.
Since you are dealing with a higher layer of the protocol stack, you only have to care about byte endianness, which means you cannot have problems with an 8-bit integer.
There are 2 kinds of endianness:
byte-endianness: the order of bytes within a multi-byte data
bit-endianness: the order of bits within a byte
Most of the time when you say endianness, you mean byte-endianness. That is because bit-endianness is almost always big-endian, whereas byte-endianness varies across systems.
With byte-endianness you don't have to worry about data that is one byte wide, as you suspect.
I can't say how much of a role bit-endianness plays in the TCP protocol. From the other answers here, it looks like you don't have to worry about bit-endianness at the transport layer.
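To make the contrast concrete, here is a minimal sketch (the socket fd, payload, and length-prefixed framing are illustrative assumptions): the payload bytes need no conversion, but the 16-bit length that precedes them does.

#include <stdint.h>
#include <arpa/inet.h>   /* htons */
#include <sys/socket.h>  /* send */

/* Send a length-prefixed message: the uint16_t length needs htons,
 * while the payload bytes go out as-is, because single bytes have no byte order. */
void send_message(int fd, const uint8_t *payload, uint16_t len)
{
    uint16_t wire_len = htons(len);           /* multi-byte field: convert */
    send(fd, &wire_len, sizeof wire_len, 0);
    send(fd, payload, len, 0);                /* 8-bit data: no conversion needed */
}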

Is it a must to reverse byte order when sending numbers across the network?

In most of the examples on the web, authors usually change the byte order before sending a number from host byte order to network byte order. Then at the receiving end, authors usually restore the order back from network byte order to host byte order.
Q1: Considering that the architecture of the two systems is unknown, wouldn't it be more efficient if the authors simply checked the endianness of the machines before reversing the byte order?
Q2: Is it really necessary to reverse the byte order of numbers even if they are passed to & received by the same machine architecture?
In general, you can't know the architecture of the remote system. If everyone uses a specific byte order - network byte order, then there is no confusion. There is some cost to all the reversing, but the cost of re-engineering ALL network devices would be far greater.
A1: Suppose that we were going to try to establish the byte order of the remote systems. We need to establish communication between systems and determine what byte order the remote system has. How do we communicate without knowing the byte order?
A2: If you know that both systems have the same architecture, then no, you don't need to reverse bytes at each end. But in general you don't know. And if you do design a system like that, then you have made a network-architecture decision that excludes different CPU architectures in the future. Consider Apple switching from 68k to PPC to x86.
A1: No, that's the point. You don't want to have to care about the endianness of the machine on the other end. In most cases (outside of a strict, controlled environment, which is most definitely not the internet) you're not going to know. If everyone uses the same byte order, you don't have to worry about it.
A2: No. Except when that ends up not being the case down the road and everything breaks, someone is going to wonder why a well known best practice wasn't followed. Usually this will be the person signing off on your paycheck.
Little or big endian is platform-specific, but for network communication big endian is the common convention; see the wiki.
It's not about blind reversing.
Networks work on big endian. My computer [Linux + Intel i386] works on little endian, so I always reverse the order when I code for my computer. I think Macs work on big endian, as do some mobile phone platforms.
Network byte order is big-endian. If the sending or receiving architecture is big-endian too, you could skip the step on that end as the translation amounts to a nop. However, why bother? Translating everything is simpler, safer, and has close to no performance impact.
Q1: Yes, it would be more efficient if the sender & receiver tested their endianness (more precisely, communicated their endianness to each other and tested whether it is the same).
Q2: No, it is not always necessary to use a "standard" byte order. But it is simpler for code wanting to interoperate portably.
However, ease of coding might be more important than communication performance, and the network costs much more than swapping bytes unless you've got a very large amount of data.
Read about serialization and e.g. this question and my answer there
No need to test for endianness if you use the ntohl() and htonl(), etc. functions/macros. On big-endian machines, they'll already be no-ops.

Endian dependent code in real application?

I know the following C code is endian-dependent:
short s_endian = 0x4142;
char c_endian = *(char *)&s_endian;
On a big-endian machine, c_endian will be 'A'(0x41); while on a little-endian machine, it will be 'B'(0x42).
But this code seems kind of ugly. So is there endian-dependent code in real applications? Or have you come across any application that needed a lot of changes when porting to a different target with a different endianness?
Thanks.
Pretty much any code that deals with saving integers with more than 8 bits in binary format, or sends such integers over the network. For one extremely common example, many of the fields in the TCP header fall into this category.
Networking code is endian-dependent (data should always be transferred across the network as big-endian, even on a little-endian machine), hence the need for functions like htons(), htonl(), ntohs(), and ntohl() in <arpa/inet.h> that allow easy conversions from host-to-network byte order and network-to-host byte order.
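A familiar instance of this is filling in a socket address, sketched here under the assumption of a POSIX setup (the port number is a placeholder):

#include <string.h>
#include <arpa/inet.h>   /* htons, htonl */
#include <netinet/in.h>  /* struct sockaddr_in, INADDR_ANY */

struct sockaddr_in make_listen_addr(void)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family      = AF_INET;
    addr.sin_port        = htons(8080);        /* port must be in network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* so must the address */
    return addr;
}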
I once collected data using a specialized DAQ card on a PC, and tried to analyze the file on a PowerPC mac. Turns out the "file format" the thing used was a raw memory dump...
Little endian on x86, big endian on Power PC. You figure it out.
The short answer is yes. Anything that reads/writes raw binary to a file or socket needs to keep track of the endianness of the data.
For example, the IP protocol requires big-endian representation.
When manipulating the internal representation of floating-point numbers, you could access the parts (or the full value) using an integer type. For example:
union float_u
{
    float          f;
    unsigned short v[2];
};

int get_sign(float f)
{
    union float_u u;
    u.f = f;
    return (u.v[0] & 0x8000) != 0; // Endian-dependent: works only where v[0] holds the high half of the float
}
If your program sends data to another system (either over a serial or network link, or by saving it to a file for something else to read) or reads data from another system, then you can have endianness issues.
I don't know whether static analysis would be able to detect such constructs, but having your programmers follow a coding standard, where structure elements and variables are marked up to indicate their endianness, could help.
For example, if all network data structures had _be appended to the names of multi-byte members, you could look for instances where you assigned a non-suffixed (host byte order) variable, or even a literal value (like 0x1234), to one of those members.
It would be great if we could capture endianness in our datatypes -- uint32_be and uint32_le to go with uint32_t. Then the compiler could disallow assignments or operations between the two. And the signature for htobe32 would be uint32_be htobe32( uint32_t n);.
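A minimal sketch of that idea in plain C (the uint32_be wrapper, to_be32, from_be32, and packet_header names are hypothetical, not an existing API):

#include <stdint.h>
#include <arpa/inet.h>  /* htonl, ntohl */

/* Wrapper type: the value inside is always big-endian ("wire") order. */
typedef struct { uint32_t be; } uint32_be;

static inline uint32_be to_be32(uint32_t host)    { return (uint32_be){ htonl(host) }; }
static inline uint32_t  from_be32(uint32_be wire) { return ntohl(wire.be); }

struct packet_header { uint32_be length; };  /* illustrative on-the-wire struct */

void fill_header(struct packet_header *h, uint32_t len)
{
    /* h->length = len;  would not compile: a uint32_t cannot be assigned to a uint32_be */
    h->length = to_be32(len);  /* explicit conversion, checked by the type system */
}

Because the big-endian value is hidden inside a distinct struct type, accidentally mixing host-order and wire-order integers becomes a compile error instead of a silent bug.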
