Byte order with a large array of characters in C

I am doing some socket programming in C, and trying to wrestle with byte order problems. My request (send) is fine but when I receive data my bytes are all out of order. I start with something like this:
char * aResponse= (char *)malloc(512);
int total = recv(sock, aResponse, 511, 0);
When dealing with this response, each 16-bit word seems to have its bytes reversed (I'm using UDP). I tried to fix that by doing something like this:
unsigned short * _netOrder= (unsigned short *)aResponse;
unsigned short * newhostOrder= (unsigned short *)malloc(total);
for (i = 0; i < total; ++i)
{
    newhostOrder[i] = ntohs(_netOrder[i]);
}
This works OK when I am treating the data as shorts; however, if I cast the pointer back to a char, the bytes are reversed. What am I doing wrong?

OK, there seem to be problems with what you are doing on two different levels. Part of the confusion here seems to stem from your use of pointers, the type of objects they point to, and the interpretation of how values are encoded in the memory the pointer(s) point to.
The encoding of multi-byte entities in memory is what is referred to as endianness. The two common encodings are Little Endian (LE) and Big Endian (BE). With LE, a 16-bit quantity like a short is encoded least significant byte (LSB) first. With BE, the most significant byte (MSB) is encoded first.
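If you want to see which convention your own machine uses, here is a minimal sketch (not part of your code) that prints the in-memory byte order of a 16-bit value:
#include <stdio.h>

int main(void)
{
    unsigned short value = 0x1234;                 /* MSB is 0x12, LSB is 0x34 */
    unsigned char *bytes = (unsigned char *)&value;

    /* A BE machine prints "12 34"; an LE machine prints "34 12". */
    printf("%02x %02x\n", bytes[0], bytes[1]);
    return 0;
}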
By convention, network protocols normally encode things in what we call "network byte order" (NBO), which also happens to be the same as BE. If you are sending and receiving memory buffers only between big endian platforms, you will not run into conversion problems; however, your code would then depend on the BE convention. If you want to write portable code that works correctly on both LE and BE platforms, you should not assume the platform's endianness.
Achieving endian portability is the purpose of routines like ntohs(), ntohl(), htons(), and htonl(). These functions/macros are defined on a given platform to do the necessary conversions at the sending and receiving ends:
htons() - Convert short value from host order to network order (for sending)
htonl() - Convert long value from host order to network order (for sending)
ntohs() - Convert short value from network order to host order (after receive)
ntohl() - Convert long value from network order to host order (after receive)
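For example, a sending side might pack a small header into a wire buffer like this before calling send() (the field names here are made up for the sketch):
#include <arpa/inet.h>   /* htons, htonl */
#include <stdint.h>
#include <string.h>

/* Pack a hypothetical header (16-bit id, 32-bit length) into a wire buffer. */
void pack_header(unsigned char *buf, uint16_t id, uint32_t length)
{
    uint16_t id_net  = htons(id);       /* host order -> network order */
    uint32_t len_net = htonl(length);

    memcpy(buf, &id_net, sizeof id_net);
    memcpy(buf + sizeof id_net, &len_net, sizeof len_net);
}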
Understand that your comment about accessing the memory when cast back to characters has no effect on the actual order of entities in memory. That is, if you access the buffer as a series of bytes, you will see the bytes in whatever order they were actually encoded into memory, whether you have a BE or LE machine. So if you are looking at an NBO-encoded buffer after a receive, the MSB is always first. If you look at the output buffer after you have converted back to host order, then on a BE machine the byte order is unchanged, while on an LE machine all the bytes in the converted buffer are now reversed.
Finally, in your conversion loop, the variable total refers to bytes. However, you are accessing the buffer as shorts. Your loop guard should not be total, but should be:
total / sizeof( unsigned short )
to account for the double byte nature of each short.
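With that fix, the conversion from your question might look like this (a sketch that keeps your buffer layout; error handling omitted):
#include <arpa/inet.h>   /* ntohs */
#include <stdlib.h>

/* Convert a received buffer of 16-bit words from network to host order.
   'total' is the byte count returned by recv(). */
unsigned short *convert_words(const char *aResponse, int total)
{
    const unsigned short *netOrder = (const unsigned short *)aResponse;
    unsigned short *hostOrder = malloc(total);     /* total bytes hold total/2 shorts */
    size_t nwords = total / sizeof(unsigned short);

    for (size_t i = 0; i < nwords; ++i)
        hostOrder[i] = ntohs(netOrder[i]);

    return hostOrder;
}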

This works ok when I'm treating the data as a short, however if I cast the pointer to a char again the bytes are reversed.
That's what I'd expect.
What am I doing wrong?
You have to know what the sender sent: know whether the data is bytes (which don't need reversing), or shorts or longs (which do).
Google for tutorials on the ntohs, htons, ntohl, and htonl APIs.

It's not clear what aResponse represents (a string of characters? a struct?). Endianness is relevant only for numerical values, not for chars. You also need to make sure that on the sender's side, all numerical values are converted from host to network byte order (hton*).

Apart from your original question (which I think has already been answered), you should have a look at your malloc statement. malloc allocates bytes, and an unsigned short is most likely two bytes.
Your statement should look like:
unsigned short *ptr = (unsigned short*) malloc(total * sizeof(unsigned short));

Network byte order is big endian, so you need to convert it to little endian if your host is little endian and you want the values to make sense. But if it is only an array of bytes, it shouldn't make a fuss. How does the sender send its data?

For a single byte we don't need to care about byte ordering.

Related

TCP socket: When ntoh/hton conversion not needed?

I am using existing code that passes data (a union ibv_gid) through a TCP connection without converting the endianness. There is a comment inside: "The gid will be transfer byte by byte so no need to do "hton"". The code is correct and works, but I don't understand why the data is passed byte by byte (actually they pass the entire struct) and why there is no need for endianness conversion. The data type they pass is:
union ibv_gid {
    uint8_t raw[16];
    struct {
        uint64_t subnet_prefix;
        uint64_t interface_id;
    } global;
};
For other data types (int, etc.) they do convert the data before sending and after receiving.
//VL_sock_sync_data function synchronizes between the two sides
//by exchanging data between the two sides.
//size bytes are being copied from out_buf to the other side, and being saved in in_buf.
rc = VL_sock_sync_data(sock_p, sizeof(union ibv_gid), &local_gid, &tmp_gid);
Can you please explain why there is no need for endianness conversion? Thank you for any help.
The reasoning seems to be that there's no need to do endianness conversion here because the GID (in its canonical representation) is not two 64-bit integers; it is 16 bytes.
The complication is that two systems with different endianness will see different values in the subnet_prefix and interface_id fields. So if they were to write those values to strings, send the strings back and forth, and compare them, that would be a problem. If they were to compare GIDs based on which one had a greater subnet_prefix, and expected the comparison to be the same between systems, that would be a problem. If one generated only consecutive interface_ids, and the other expected them to be consecutive, that would be a problem. But as long as they're only being used as opaque arrays of bytes, there's no problem.
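As an illustration (the helper name is made up), an equality check through the raw byte view gives the same result regardless of either host's endianness:
#include <stdbool.h>
#include <string.h>

/* Uses the union ibv_gid shown above. Comparing the raw bytes treats the GID
   as an opaque 16-byte value, so the result does not depend on endianness;
   comparing global.subnet_prefix as a number would. */
static bool gid_equal(const union ibv_gid *a, const union ibv_gid *b)
{
    return memcmp(a->raw, b->raw, sizeof a->raw) == 0;
}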

Does endianness reverse an array?

I am receiving a buffer from the network, which is big endian, and I'm on a little endian system. Do I have to reverse the buffer before using it on my system? I'm not sure if the endianness only reverses the byte ordering for a single type or if it also applies to an entire buffer.
For instance, let's say I'm receiving a buffer of unsigned longs from the network, and I'm on a little endian system. The code would be this:
for (int i = 0; i < size; i++)
    system_buffer[i] = ntohl(network_buffer[i]);
Before I use system_buffer, do I also have to reverse it (so the last element becomes the first element and vice versa)?
reverse_buffer(system_buffer);
Furthermore, if I'm receiving an array of unsigned chars, do I need to compensate for endianness at all, or can I use the buffer as-is?
Thanks!
Endianness is only applicable to single variables, i.e. only to each array element's value, not to the order of elements in the array.
And 1-byte variables can't have their byte order reversed, so there is no need to do anything with them.
Endianness affects the byte order only of integers larger than one byte (i.e. it does not affect a byte or int8_t).
You need to read the protocol description to find out what byte order the integers are in.
Endianness only affects the order of bytes (not bits) in a data word. Since a char is always one byte in size, you don't have to do any adjustments to chars. The standard library function ntohl changes the byte order of a 32-bit unsigned integer ("l" for long) from network byte order ("n") to host byte order ("h"). Therefore you don't need to reverse the buffer before using it; you only need to change the byte order of each element in the buffer.
for (int i = 0; i < size; i++) {
    // network byte order to host byte order
    system_buffer[i] = ntohl(network_buffer[i]);
}
// process system_buffer

Negative Numbers over TCP socket C

Whenever I try to send a negative number over a TCP socket, when I print what was received, it reads "4.29497e+09". All I'm doing is this:
int i = -8;
int temp = htonl(i);
write(sock,&temp,4);
On the Server:
int temp;
read(sock, &temp,4);
int read = ntohl(temp);
cout << read << endl;
If anyone could help, it would be much appreciated.
The htonl/ntohl functions are specifically for unsigned 32-bit integers.
The htonl() function converts the unsigned integer hostlong from host byte order to network byte order.
When transferring data over a socket, you don't need to convert it to network endianness.
These functions are used to translate addresses, not actual data. They work with unsigned integers, so they don't match your argument (a signed integer).
You need to omit them.
If the second machine uses a different endianness, which is kind of rare (both 8086 and ARM architectures work with little endian), you need to swap the bytes when reading ints and shorts.
This is usually done at the receiving end.
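For reference, the conversion itself does round-trip a negative value on the usual two's-complement platforms if it is passed through uint32_t and cast back at the end (a sketch, separate from the asker's code):
#include <arpa/inet.h>   /* htonl, ntohl */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    int32_t  i    = -8;
    uint32_t wire = htonl((uint32_t)i);      /* value as it would go on the wire */
    int32_t  back = (int32_t)ntohl(wire);    /* what the receiver converts back  */

    printf("%d\n", (int)back);               /* prints -8 */
    return 0;
}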

Typecasting a char to an int (for socket)

I have probably asked this question twice since yesterday, but I have still not got a satisfactory answer. My problem is that I have an IP address which is stored in an unsigned char. Now I want to send this IP address via a socket from client to server. People have advised me to use htonl() and ntohl() for the network byte transfer, but the arguments for htonl() and ntohl() are integers... how can I use them in the case of an unsigned char? And if I can't use them, how can I make sure that if I send 130.191.166.230 in my buffer, the receiver will receive the same thing every time? Any inputs or guidance will be appreciated. Thanks in advance.
If you have an unsigned char array string (along the lines of "10.0.0.7") forming the IP address (and I'm assuming you do since there are very few 32-bit char systems around, making it rather difficult to store an IP address into a single character), you can just send that through as it is and let the other end use it (assuming you both encode characters the same way of course, such as with ASCII).
On the other hand, you may have a four byte array of chars (assuming chars are eight bits) containing the binary IP address.
The use of htonl and ntohl is to ensure that this binary data is sent through in an order that both big-endian and little-endian systems can understand.
To that end, network byte order (the order of the bytes "on the wire") is big-endian so these functions basically do nothing on big-endian systems. On little-endian systems, they swap the bytes around.
In other words, you may have the following binary data:
uint32_t ipaddress = 0x0a010203; // for 10.1.2.3
In big endian layout that would be stored as 0x0a,0x01,0x02,0x03, in little endian as 0x03,0x02,0x01,0x0a.
So, if you want to send it in network byte order (that any endian system will be able to understand), you can't just do:
write (fd, &ipaddress, 4);
since sending that from little endian system to a big endian one will end up with the bytes reversed.
What you need to do is:
uint32_t ipaddress = 0x0a010203; // for 10.1.2.3
uint32_t ip_netorder = htonl (ipaddress); // change if necessary.
write (fd, &ip_netorder, 4);
That forces it to be network byte order which any program at the other end can understand (assuming it uses ntohl to ensure it's correct for its purposes).
In fact, this scheme can handle more than just big and little endian. If you have a 32-bit integer coding scheme where ABCD (four bytes) is encoded as A,D,B,C or even where you have a bizarrely wild bit mixture forming your integers (like using even bits first then odd bits), this will still work since your local htonl and ntohl know about those formats and can convert them correctly to network byte order.
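A sketch of the matching receive side under the same assumptions (a connected descriptor fd, a 4-byte binary IPv4 address on the wire; error and short-read handling omitted):
#include <arpa/inet.h>   /* ntohl */
#include <stdint.h>
#include <unistd.h>      /* read */

/* Read a 4-byte IPv4 address sent in network byte order and return it in
   host byte order (e.g. 0x0a010203 for 10.1.2.3). */
uint32_t recv_ipaddress(int fd)
{
    uint32_t ip_netorder;

    /* Real code should loop until all 4 bytes arrive and check for errors. */
    read(fd, &ip_netorder, sizeof ip_netorder);

    return ntohl(ip_netorder);
}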
An array of chars has a defined ordering and is not endian dependent: the elements always run from low to high addresses by convention.
Do you have a string or 4 bytes?
An IPv4 address is 4 bytes (i.e. 4 chars), so you will have 4 unsigned chars in an array somewhere. Cast that array to send it across.
e.g. unsigned char IP[4];
Use ((char *)IP) as the data buffer to send, and send 4 bytes from it.
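A minimal sketch of that suggestion (the function name is made up), assuming a connected socket and the four octets of 130.191.166.230 already in the array:
#include <sys/socket.h>   /* send */

/* Send an IPv4 address stored as 4 individual octets, e.g. {130, 191, 166, 230}. */
void send_ip(int sock, const unsigned char IP[4])
{
    /* The octets already sit in address order, so no hton* call is needed. */
    send(sock, (const char *)IP, 4, 0);
}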

Sending the array of arbitrary length through a socket. Endianness

I'm fighting with socket programming now and I've encountered a problem which I don't know how to solve in a portable way.
The task is simple: I need to send an array of 16 bytes over the network, receive it in a client application, and parse it. I know there are functions like htonl, htons and so on to use with uint16 and uint32, but what should I do with chunks of data larger than that?
Thank you.
You say an array of 16 bytes. That doesn't really help: endianness only matters for things larger than a byte.
If it's really raw bytes, then just send them; you will receive them just the same.
If it's really a struct you want to send, e.g.
struct msg
{
    int foo;
    int bar;
    /* ..... */
};
Then you need to work through the buffer pulling out the values you want.
When you send, you must assemble the packet in a standard order:
unsigned char buff[64];                 /* outgoing packet buffer (size is illustrative) */
int off = 0;
*(int *)&buff[off] = htonl(foo);        /* write foo in network byte order */
off += sizeof(int);
*(int *)&buff[off] = htonl(bar);
off += sizeof(int);
...
When you receive, pull the values back out in the same order:
off = 0;                                /* start again at the front of the received buffer */
int foo = ntohl(*(int *)&buff[off]);    /* read the whole 4-byte value, then convert */
off += sizeof(int);
int bar = ntohl(*(int *)&buff[off]);
....
EDIT: I see you want to send an IPv6 address; those are always kept in network byte order, so you can just stream it raw.
Endianness is a property of multibyte variables such as 16-bit and 32-bit integers. It has to do with whether the high-order or low-order byte goes first. If the client application is processing the array as individual bytes, it doesn't have to worry about endianness, as the order of the bits within the bytes is the same.
htons, htonl, etc., are for dealing with a single data item (e.g. an int) that's larger than one byte. An array of bytes where each one is used as a single data item itself (e.g., a string) doesn't need to be translated between host and network byte order at all.
Bytes themselves don't have endianness, in the sense that any single byte transmitted by one computer will have the same value on the receiving computer. Endianness is only relevant these days for multibyte data types such as ints.
In your particular case it boils down to knowing what the receiver will do with your 16 bytes. If it will treat each of the 16 entries in the array as a discrete single-byte value, then you can just send them without worrying about endianness. If, on the other hand, the receiver will treat your 16-byte array as four 32-bit integers, then you'll need to run each integer through htonl() prior to sending.
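For that second case, a rough sketch of both directions, assuming the 16 bytes really are four 32-bit integers:
#include <arpa/inet.h>   /* htonl, ntohl */
#include <stdint.h>
#include <string.h>

/* Pack four host-order 32-bit values into a 16-byte wire buffer. */
void to_network(unsigned char out[16], const uint32_t host[4])
{
    for (int i = 0; i < 4; i++) {
        uint32_t n = htonl(host[i]);
        memcpy(out + i * sizeof n, &n, sizeof n);
    }
}

/* Unpack a 16-byte wire buffer back into four host-order 32-bit values. */
void to_host(uint32_t host[4], const unsigned char in[16])
{
    for (int i = 0; i < 4; i++) {
        uint32_t n;
        memcpy(&n, in + i * sizeof n, sizeof n);
        host[i] = ntohl(n);
    }
}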
Does that help?
