Packing IP into a uint32 - c

Is the packing of a char IP[4] (192.168.2.1) into a uint32 specified by any ANSI/ISO/standards body (i.e. anyone listed here: https://en.wikipedia.org/wiki/List_of_technical_standard_organisations)? I know it's standard across a range of languages and tools, but I'm wondering if it's part of any international standard, and if so, which one? That standard should detail, for example, the endianness used.
Need to know as I'm writing a specification, and I don't want to re-invent the wheel by describing the technique.

All IP addresses (and, in fact, all multi-byte numbers used in the standard networking stack) sent on the wire are in "network order" which is big-endian.
How four bytes of anything are represented within a program is up to the programmer, though typically the programmer lets the compiler decide, and the compiler of course uses whatever the native hardware uses.
From Why is network-byte-order defined to be big-endian? ...
RFC 1700 stated it must be so (and defined network byte order as big-endian). It has since been obsoleted by RFC 3232, but this part remains the same.
The convention in the documentation of Internet Protocols is to
express numbers in decimal and to picture data in "big-endian" order
[COHEN]. That is, fields are described left to right, with the most
significant octet on the left and the least significant octet on the
right.
The reference they make is to:
"On Holy Wars and a Plea for Peace", D. Cohen, IEEE Computer.
The abstract can be found at IEN-137 or on this IEEE page.
Summary:
Which way is chosen does not make too much
difference. It is more important to agree upon an order than which
order is agreed upon.
It concludes that either the big-endian or the little-endian scheme could have been chosen. Neither is better or worse, and either can be used in place of the other, as long as it is consistent across the whole system/protocol.

This depends on where that IPv4 address is going.
For actual IP usage, i.e. in network packets, the packing always uses network byte order, i.e. big-endian.
If your usage is something different, then you are of course free to define the packing however you want.
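If you do define your own packing, here is a minimal sketch (my own illustration, not taken from any standard) of packing the four octets most-significant-first, which matches network byte order regardless of the host's endianness:

```c
#include <stdint.h>
#include <stdio.h>

/* Pack four IPv4 octets, given most significant first (192.168.2.1 ->
 * ip[0] = 192), into a uint32_t. Shifting makes the result independent
 * of the host's native byte order. */
static uint32_t pack_ipv4(const unsigned char ip[4])
{
    return ((uint32_t)ip[0] << 24) |
           ((uint32_t)ip[1] << 16) |
           ((uint32_t)ip[2] << 8)  |
            (uint32_t)ip[3];
}

int main(void)
{
    const unsigned char ip[4] = { 192, 168, 2, 1 };
    printf("0x%08X\n", (unsigned)pack_ipv4(ip));  /* prints 0xC0A80201 */
    return 0;
}
```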

Related

standardising bit fields using unions (making them portable)

The page shown is from
"Programming Embedded Systems"
by Michael Barr, Anthony Massa,
published by O'Reilly,
ISBN: 0-596-00983-6, page 204.
I am asking for more details and explanations on this, such as:
Does this mean that the bit fields are going to be portable across all compilers?
For (different) architectures, does this work for bit fields with sizes of more than one byte (considering the endianness differences, which I don't think this method overcomes)?
For the (same) architecture, does this work for bit fields with sizes of more than one byte?
If they are standardised across all compilers as the book says, can we specify how they are going to be aligned?
Q1.2: If the bit fields are just one byte, the endianness problem won't affect them, right? So will such bit fields be portable across all compilers on different architectures and endiannesses?
Does this mean that the bit fields are going to be portable across all compilers?
No, the cited text about unions somehow making bit-fields portable is strange. union does not add any further portability guarantees whatsoever. There are many aspects of bit-fields that make them completely non-portable, because they are very poorly specified by the standard. Some examples here.
For example, using uint8_t or a char type for a bit-field is not covered by the standard. The book fails to mention this even though it makes such a non-standard example.
For (different) architectures, does this work for bit fields with sizes of more than one byte?
For the (same) architecture, does this work for bit fields with sizes of more than one byte?
No, there are no guarantees at all.
If they are standardised across all compilers as the book says, can we specify how they are going to be aligned?
They aren't; the book is misleading. My advice is to stop reading at "bit-fields are not portable", then forget that you ever heard about bit-fields. They are a 100% superfluous feature anyway. Instead, use bitwise operators.
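As a rough sketch of what "use bitwise operators" looks like in practice (the flag names and layout here are invented purely for illustration):

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical flag layout within one byte; the names are illustrative only. */
#define FLAG_READY  (1u << 0)
#define FLAG_ERROR  (1u << 1)
#define FLAG_BUSY   (1u << 2)

int main(void)
{
    uint8_t status = 0;

    status |= FLAG_READY;              /* set a bit   */
    status |= FLAG_BUSY;
    status &= (uint8_t)~FLAG_ERROR;    /* clear a bit */

    if (status & FLAG_BUSY)            /* test a bit  */
        printf("busy, status = 0x%02X\n", (unsigned)status);

    return 0;
}
```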
That is a very disturbing text; I think I would toss the whole book.
Please go read the spec. You don't have to pay for it: older versions and draft versions (which of course are not the final text) are available, and over the last couple of decades they are more alike than different.
If I could add more upvotes to Lundin's answer I would; it's not worth creating new users with email addresses just to do that...
I have (or will) possibly spark an argument on this, but... The spec does say that if you define some number of (non-zero-sized) bit-fields in a row in a struct or union, they will get packed, and there is a special zero-sized one that is used to break that packing, so that you can declare a bunch of bit-fields and group them without having to make some other struct.
Perhaps it says they will be aligned, but I would never assume that. I know for a fact that the same compiler will treat the two endiannesses differently and pack the fields at opposite ends (top bit down or bottom bit up). But there is no reason to assume that any compiler follows any convention other than packing, and I would assume (although perhaps this is also subject to interpretation) that they are packed in the order defined once you figure out where they start. So I wouldn't assume that six bits' worth of declarations are aligned either way; there could be up to six different alignments within a byte, assuming a byte is the size of the unit, and if the size of the unit is 32 or 64 bits I am not going to bother counting the combinations; it is more than one and that is all that matters. I know for a fact from gcc that when the 32- to 64-bit x86 transition happened, it caused problems for code making assumptions about where those bits landed.
I personally wouldn't even assume that the bits are in the declared order when they are packed together... Popular compilers tend to do that, but the spec says no more than that they are packed, and what does that mean? If I had a 1-bit, then an 8-bit, then a 1-bit, then an 8-bit, then a 6-bit field, I would hope the compiler aligned the 8-bit ones on a byte boundary and moved the two 1-bit fields next to the 6-bit one; that is, if I were ever to use a bit-field, which I don't...
The prime contention here is that, to me, the spec is very clear that the initial items of more than one declaration in a union only use the same memory if the order and size are the same, i.e. they are compatible types. A 1-bit unsigned int is not the same as a 32-bit unsigned int, so they are NOT compatible types IMO. The spec goes further and states that, for bit-fields, the types have to be the same type and size, so for a bit-field to share the same memory in a union you need two structures whose initial bit-field items have the same type and size, and only those items are, per the spec, going to share memory; what happens with the rest of the bits is a different story. So, from my reading of the spec, your example gives no reason to expect that the 8-bit char (using a made-up non-spec declaration) and the 8 declared bits of bit-field will line up with each other and share the same memory. Just because a compiler chooses to do that in some version does not mean you can assume it. The union in particular does not make that code portable or more portable in any way; in fact it is perhaps worse, as now you not only have a bit-field issue across compilers and versions, but also a union issue across compilers and versions.
As a general rule, NEVER use a structure across a compile domain (with or without bit-fields, and this includes unions). Never read a file into a structure: you have crossed a compile domain. Never point structures at hardware registers: you have crossed a compile domain. Never point structures at memory; don't point a structure at a char array that contains an Ethernet packet and use the struct and/or bit-fields to pick apart the IP header. Yes, these rules are widely used and abused and are a ticking time bomb. The primary reason the time bomb only rarely goes off is that the code keeps using the same compiler, or a couple of very popular compilers that currently have the same implementation. But struct pointing in general fails very, very often; bit-field failures are just a side effect of that, and, perhaps because of the horrible text in your book, unions are starting to show up a lot, making the time bomb nuclear instead of conventional.
So if you want to use a struct, a union, or a bit-field and have the code actually work without maintenance, then stay within the same compile domain (one program compiled at the same time with the same compiler and settings), pass structures across functions as structures, and do not point them at memory or other arrays, etc. For unions, never access across individually defined items: if you use a single variable, use only that variable until you are completely finished with it, and assume it is trash once you use another struct or variable in that union. With bit-fields, treat each variable as a standalone item independent of the variables next to it; you are only ATTEMPTING to save memory by using them, but you are actually wasting a lot of code overhead, performance, and code space by using them in the first place. Keep to that and your code is far more likely to work without maintenance, across compilers. If instead you want to use this as job security and have your code fail to build or function on every minor or major compiler release, then do those things: point structs across a compile domain, point bit-fields at hardware registers, etc. Other than your boss noting that you write horrible code that breaks often when some other employees' code doesn't, you will have to keep maintaining that code on a regular basis for the life of that product.
All the compiler does with your bit-field is generate masks and shifts. If you write those masks and shifts yourself, the code is MASSIVELY more portable. You may still have endianness issues (which can actually at times be easily solved in portable, endian-agnostic code), but you won't be pointing at completely the wrong thing when masking and shifting. It simply works, it does not produce more code, and it does not produce slower code. If you really need to, make macros for everything; using a macro to isolate a field within a "unit" is far more portable than using a bit-field.
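A hedged sketch of that macro approach; the 32-bit control word and its 3-bit "mode" field at bit 4 are made up purely for illustration:

```c
#include <stdint.h>
#include <stdio.h>

/* Generic helpers: extract or insert a field of 'width' bits starting at 'shift'. */
#define FIELD_MASK(shift, width)      (((UINT32_C(1) << (width)) - 1u) << (shift))
#define GET_FIELD(reg, shift, width)  (((reg) & FIELD_MASK(shift, width)) >> (shift))
#define SET_FIELD(reg, shift, width, val) \
    (((reg) & ~FIELD_MASK(shift, width)) | \
     (((uint32_t)(val) << (shift)) & FIELD_MASK(shift, width)))

int main(void)
{
    /* Imaginary 32-bit control word: bits 4..6 hold a 3-bit "mode" field. */
    uint32_t ctrl = 0;

    ctrl = SET_FIELD(ctrl, 4, 3, 5u);                        /* write mode = 5 */
    printf("mode = %u\n", (unsigned)GET_FIELD(ctrl, 4, 3));  /* prints 5 */
    return 0;
}
```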
Forget you ever read about bit-fields or heard about them, and never use them again. The need for them died decades ago. Unions somewhat fall into the same category: they do actually save memory, but you have to be careful to share that memory properly.
And I would toss that book as well; if the authors don't understand this simple topic, what else do they not understand? As with most folks, this may be a case of confusing a popular compiler's interpretation with reality.
There is a little 'bit' of confusion between the concepts of bit-fields and endianness.
Suppose you have a 32-bit MCU; that means the device's internal memory is organized in units (words) of 32 bits.
Now, as you might know, each MCU stores multi-byte values either least significant byte first (little-endian) or most significant byte first (big-endian);
see here for illustrations: Endians figure
As can be seen, the same data, 0x12345678 (a 32-bit value), is stored differently internally.
When you are reading and writing memory using a 32-bit pointer (the trivial case), the MCU handles it for you (there is no difference between the endiannesses). The problem arises when you are doing byte-by-byte manipulation, or when exporting to (or importing from) another MCU or a memory peripheral that also uses 8-bit / 1-byte manipulation.
A bit-field will be aligned to the byte, word, or long word type (as seen), so it can be misinterpreted when porting to another target.
Hence, the answers to your questions:
If it is only one byte that you are dividing into bits, it will port nicely.
If you define a multi-byte union, it will get you into trouble.
Answered in the introduction of this answer.
See answer no. 1.
See the figure I have attached for illustration.
Right in general.
1, 2: Not quite: it always depends on the platform (endianness) and the types you are using.
3: Yes they will always land on the same spot in memory.
4: Which alignment do you mean — the memory alignment or the field alignment?

Is C Endian neutral?

Is C endian-neutral?
Ok, another way of asking this question.
I am currently translating a lot of code from C to Matlab on the same platform (PC). Do I need to care about endianess?
Both are supposed to be endian-neutral languages: C (not so sure), Matlab (pretty sure).
By the same token I am also translating C to Python.
So my question: has anybody, in their experience of translating from C to another endian-neutral language, met an unexpected problem with big/little endianness?
Obviously we are only speaking about the core language; in this case I mentioned C99.
First, some background and clarification:
As I mentioned in a comment to the original question, byte order is often confused with bit order. Endianness refers to byte order only. Bit order is only relevant in documentation and when data is sent via some serial connection.
In arithmetic, in base B (with 2 ≤ B ∈ ℕ), the i'th digit D_i has value D_i × B^i. The least significant integral digit corresponds to i = 0, i.e. D_0. For binary, B = 2. For the ordinary decimal numbers most humans prefer, B = 10.
(This works for all reals, not just integers. The most significant fractional digit, the first digit on the other side of the decimal point, is D_-1, with more negative i's indicating less significant digits.)
Because 'bit' is a portmanteau of 'binary digit', we thus have a natural way of labeling bits, with bit 0 referring to the least significant (integer) bit (corresponding to value 1), bit 1 referring to the next one in significance (corresponding to value 2), and so on.
Some documentation for hardware using big-endian byte order insists on labeling the most significant bit in a word as "bit 0" (with bit numbers increasing from left to right -- contrary to most numeric representations, where digits grow more significant from right to left). This is just a labeling convention, as this convention does not follow the arithmetic rules. In fact, you need to know the width (number of bits) in that word, to even calculate the actual numeric value of such "bit 0"s.
Is C endian-neutral?
Yes, C (as in ISO C89, C99, and C11) is neutral with regards to byte order. The standards do not define any byte order; it is up to the implementation to decide. In practice, the compiler chooses the byte order suitable for the target architecture at compile time.
In theory, integer and floating-point types may very well have different byte order.
POSIX.1 adds networking support to C. Certain fields in network-related structures are defined to be in network byte order, most significant byte first. POSIX.1 provides htons(), htonl(), ntohs(), and ntohl() byteorder functions to convert from host to network byte order and vice versa.
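For example, here is a minimal sketch (assuming a POSIX system providing <arpa/inet.h> and <netinet/in.h>) of filling in a sockaddr_in, whose port and address fields are defined to be in network byte order; the port number is chosen arbitrarily:

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>   /* AF_INET */
#include <netinet/in.h>   /* struct sockaddr_in, INADDR_LOOPBACK */
#include <arpa/inet.h>    /* htons(), htonl(), ntohs() */

int main(void)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof addr);
    addr.sin_family      = AF_INET;
    addr.sin_port        = htons(8080);            /* port in network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* 127.0.0.1 in network byte order */

    printf("port as seen by the host: %u\n", (unsigned)ntohs(addr.sin_port));
    return 0;
}
```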
In addition to network byte order (which is often called big-endian), little-endian byte order (least significant byte first) is also very common, for example on Intel/AMD architectures. The PDP-endian byte order (where four-byte values are stored second-most significant byte first, followed by the most significant byte, followed by the least significant, followed by the second-least significant byte) is nowadays rare.
Finally, C has been implemented on a large number of architectures, with byte orders covering all three mentioned above, without any byte order issues. That should be practical proof enough.
I am currently translating a lot of code from C to Matlab [or Python] on the same platform (PC). Do I need to care about endianess?
No, I don't see any reason for you to care about endianness when porting code between C, Matlab, Python, or just about any high-level language.
However:
Language being endian-neutral does not mean you don't need to care about endianness in your programs. Data byte order matters. It boils down to how your programs transfer -- read and write -- data; be that via in-memory structures (using shared memory, or between different programming languages via library bindings), to/from files, via network connections, or via pipes from/to other programs.
If your programs transfer data in some text-based format, then all you need to worry about is that format, and possibly the character set used -- I prefer UTF-8 (see utf8everywhere.org).
If your programs transfer data in binary, then you must understand that in binary, multi-byte values always have some specific byte order. It can be network byte order (or big-endian), little-endian, or native byte order for the current architecture. Just because your programming language is endian-neutral, does not mean you get to ignore the storage byte order.
For example, Matlab and Octave fread() support a fifth parameter that specifies the byte order used: native, ieee-be (IEEE big-endian), or ieee-le (IEEE little-endian). Python struct module pack and unpack functions default to native byte order and C alignment (padding), but you can use < or > as the first character in the format string to indicate little-endian or big-endian/network-endian byte order with no padding.
It is very common for C code to store binary data in native byte order. However, some C code does not. I prefer to store in native byte order, but also to store known prototype values for each basic numeric type used, so that readers can trivially detect whether they need to permute the byte order to interpret the data correctly. There are also various libraries and formats, like NetCDF, that can be used for creating portable binary data files.
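A rough sketch of that prototype-value idea, using a file name and header layout I made up for illustration:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Known "prototype" values written in native byte order. A reader that
     * reads back 0x01020304 can use the rest of the data as-is; one that
     * reads 0x04030201 knows every 32-bit value needs byte swapping. */
    const uint32_t proto_u32 = UINT32_C(0x01020304);
    const double   proto_dbl = 1.0;   /* lets the reader check floating-point layout too */

    FILE *f = fopen("data.bin", "wb");
    if (!f)
        return 1;

    fwrite(&proto_u32, sizeof proto_u32, 1, f);
    fwrite(&proto_dbl, sizeof proto_dbl, 1, f);
    /* ... actual payload follows, also in native byte order ... */
    fclose(f);
    return 0;
}
```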
The most important thing is to understand what the C code does, first.
I don't see why someone would want to port code from C to Matlab or Python, unless the C code was really poor to begin with -- in which case I'd just rewrite the logic, not port the existing code.
Have you met an unexpected problem with big/little endianness?
No, never when porting code between high-level languages.
Yes, when storing/retrieving binary data between different systems.
While not related to endianness, for multi-dimensional data, it is important to remember that Fortran and Matlab (and OpenGL matrices) use column-major order (each column being consecutive in memory), while C uses row-major order (each row being consecutive in memory).

Is it a must to reverse byte order when sending numbers across the network?

In most of the examples on the web, authors convert a number from host byte order to network byte order before sending it. Then, at the receiving end, they restore the order from network byte order back to host byte order.
Q1: Considering that the architectures of the two systems are unknown, wouldn't it be more efficient if the authors simply checked the endianness of the machines before reversing the byte order?
Q2: Is it really necessary to reverse the byte order of numbers even if they are sent to and received by the same machine architecture?
In general, you can't know the architecture of the remote system. If everyone uses a specific byte order - network byte order, then there is no confusion. There is some cost to all the reversing, but the cost of re-engineering ALL network devices would be far greater.
A1: Suppose that we were going to try to establish the byte order of the remote systems. We need to establish communication between systems and determine what byte order the remote system has. How do we communicate without knowing the byte order?
A2: If you know that both systems have the same architecture, then no, you don't need to reverse bytes at each end. But in general you don't know. And if you do design a system like that, then you have made a network-architecture decision that excludes different CPU architectures in the future. Consider Apple switching from 68k to PPC to x86.
A1: No, that's the point. You don't want to have to care about the endianness of the machine on the other end. In most cases (outside of a strict, controlled environment ... which is most definitely not the internet) you're not going to know. If everyone uses the same byte order, you don't have to worry about it.
A2: No. Except when that ends up not being the case down the road and everything breaks, someone is going to wonder why a well known best practice wasn't followed. Usually this will be the person signing off on your paycheck.
Little- or big-endian is platform-specific, but for network communication big-endian is the common convention; see the wiki.
It's not about blind reversing.
All network traffic uses big-endian. My computer [Linux + Intel i386] works in little-endian, so I always reverse the order when I code for my computer. I think (older, PowerPC-based) Macs are big-endian, and some mobile phone platforms as well.
Network byte order is big-endian. If the sending or receiving architecture is big-endian too, you could skip the step on that end as the translation amounts to a nop. However, why bother? Translating everything is simpler, safer, and has close to no performance impact.
Q1: Yes, it would be more efficient if the sender and receiver tested their endianness (more precisely, communicated their endianness to each other and tested whether it is the same).
Q2: no, it is not always necessary to use a "standard" byte order. But it is simpler for code wanting to interoperate portably.
However, ease of coding might be more important than communication performance, and the network costs much more than swapping bytes, unless you've got a very large amount of data.
Read about serialization and e.g. this question and my answer there
No need to test for endianness if you use the ntohl() and htonl(), etc. functions/macros. On big-endian machines, they'll already be no-ops.
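As a small illustration of that pattern (assuming <arpa/inet.h> is available), both sides call the conversion functions unconditionally; on a big-endian host they compile to nothing:

```c
#include <stdint.h>
#include <stdio.h>
#include <arpa/inet.h>   /* htonl(), ntohl() */

/* Sender side: convert to network order before writing to the socket/buffer. */
static uint32_t to_wire(uint32_t host_value)
{
    return htonl(host_value);
}

/* Receiver side: convert back to host order after reading from the wire. */
static uint32_t from_wire(uint32_t wire_value)
{
    return ntohl(wire_value);
}

int main(void)
{
    uint32_t original = 0xDEADBEEF;
    uint32_t restored = from_wire(to_wire(original));

    printf("round trip ok: %s\n", original == restored ? "yes" : "no");
    return 0;
}
```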

data type ranges differing with operating systems

8-bit, 16-bit, 32-bit, and 64-bit operating systems have different data ranges for
integer, float, and double values.
Is it the compiler or the processor that makes the difference (8-bit, 16-bit, 32-bit, 64-bit)?
If, in a network, 16-bit integer data from one system is transferred to a 32-bit system
or vice versa, will the data be correctly represented in memory? Please help me to understand.
Ultimately, it is up to the compiler. The compiler is free to choose any data types it likes*, even if it has to emulate their behaviour with software routines. Of course, typically, for efficiency it will try to replicate the native types of the underlying hardware.
As to your second question, yes, of course, if you transfer the raw representation from one architecture to another, it may be interpreted incorrectly (endianness is another issue). That is why functions like ntohs() are used.
* Well, not literally anything it likes. The C standard places some constraints, such as that an int must be at least as large as a short.
The compiler (more properly the "implementation") is free to choose the sizes, subject to the limits in the C standard. The set of sizes offered by C for its various types depends in part on the hardware it runs on; i.e. the compiler makes the choice, but (except in cases like Java, where data types are explicitly independent of the underlying hardware) it is strongly influenced by what the hardware offers.
It depends not just on the compiler and operating system; it is also dictated by the architecture (the processor, at least).
When passing data between possibly different architectures, special fixed-size data types are used, e.g. uint64_t and uint32_t instead of int, short, etc.
But the size of integers is not the only concern when communicating between computers with different architectures; there is a byte order issue too (try googling big-endian and little-endian).
The size of a given type depends on the CPU and the conventions of the operating system.
If you want an int of a specific size, use the stdint.h header [wikipedia]. It defines int8_t, int16_t, int32_t, int64_t, some others, and their unsigned equivalents.
For communications between different computers, the protocol should define the sizes and byte order to use.
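A short sketch of that idea (the encoding functions are my own illustration, not from any particular protocol), combining <stdint.h> fixed-width types with an explicitly big-endian wire layout so the result does not depend on either machine's native sizes or byte order:

```c
#include <stdint.h>
#include <stdio.h>

/* Encode a 32-bit value big-endian (most significant byte first) into buf. */
static void put_u32_be(uint8_t buf[4], uint32_t v)
{
    buf[0] = (uint8_t)(v >> 24);
    buf[1] = (uint8_t)(v >> 16);
    buf[2] = (uint8_t)(v >> 8);
    buf[3] = (uint8_t)v;
}

/* Decode it again; works the same on little- and big-endian hosts. */
static uint32_t get_u32_be(const uint8_t buf[4])
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}

int main(void)
{
    uint8_t wire[4];
    put_u32_be(wire, 0x12345678u);
    printf("decoded: 0x%08X\n", (unsigned)get_u32_be(wire));  /* 0x12345678 on any host */
    return 0;
}
```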
In a network, the protocol has to define which data sizes you have. For endianness, it is highly recommended to use big-endian values.
If it weren't for the APIs, a compiler would be free to size its short, int, and long as it wants. But often, API calls are tied to these types; e.g. the open() function returns an int, whose size must be correct.
But the types might as well be part of the ABI definition.

When to worry about endianness?

I have seen countless references about endianness and what it means. I got no problems about that...
However, my coding project is a simple game to run on Linux and Windows, on standard "gamer" hardware.
Do I need to worry about endianness in this case? When do I need to worry about it?
My code is simple C and SDL+GL, the only complex data are basic media files (png+wav+xm), and the game data is mostly strings, integer booleans (for flags and such), and static-sized arrays. So far no user has had issues, so I am wondering whether adding checks is necessary (it will be done later, but there are more urgent issues IMO).
The times when you need to worry about endianness:
you are sending binary data between machines or processes (using a network or file). If the machines may have different byte order, or the protocol used specifies a particular byte order (which it should), you'll need to deal with endianness.
you have code that accesses memory through pointers of different types (say, you access an unsigned int variable through a char*).
If you do these things you're dealing with byte order whether you know it or not - it might be that you're dealing with it by assuming it's one way or the other, which may work fine as long as your code doesn't have to deal with a different platform.
In a similar vein, you generally need to deal with alignment issues in those same cases and for similar reasons. Once again, you might be dealing with it by doing nothing and having everything work fine because you don't have to cross platform boundaries (which may come back to bite you down the road if that does become a requirement).
If you mean a PC by "standard gamer hardware", then you don't have to worry about endianness as it will always be little endian on x86/x64. But if you want to port the project to other architectures, then you should design it endianness-independently.
Whenever you receive/transmit data over a network, remember to convert to/from network and host byte order. The C functions htons, htonl, etc., or equivalents in your language, should be used here.
Whenever you read multi-byte values (like UTF-16 characters or 32-bit ints) from a file, since that file might have originated on a system with different endianness. If the file is UTF-16 or UTF-32, it probably has a BOM (byte-order mark). Otherwise, the file format will have to specify endianness in some way.
You only need to worry about it if your game needs to run on different hardware architectures. If you are positive that it will always run on Intel hardware then you can forget about it. If it will run on Linux though many people use different architectures than Intel and you may end up having to think about it.
Are you distributing your game in source code form?
Because if you are distributing your game as a binary only, then you know exactly which processor families your game will run on. Also, the media files: are they user-generated (possibly via a level editor), or are they really only meant to be supplied by yourself?
If this is a truly closed environment (you distribute binaries and the game assets are not intended to be customized), then you know your own endianness risks, and I personally wouldn't fool with it.
However, if you are either distributing source and/or hoping people will customize their game, then you have a potential for concern. However, with most of the desktop/laptop computers around these days moving to x86 I would think this is a diminishing concern.
The problem occurs with networking (how the data is sent) and when you are doing bit fiddling on different processors, since different processors may store data differently in memory.
I believe PowerPC has the opposite endianness to Intel boards. You might be able to have a routine that sets the endianness depending on the architecture? I'm not sure if you can actually tell what the hardware architecture is in code... maybe someone smarter than me knows the answer to that question.
Now, in reference to your statement about "standard" gamer H/W: I would say that consumer off-the-shelf solutions are really what almost any standard gamer is using, so you are almost certainly going to get the same endianness across the board. I'm sure someone will disagree with me, but that's my $.02.
Ha... I just noticed that on the right there is a link showing up related to the suggestion I had above.
Find Endianness through a c program
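For reference, a minimal sketch of such a check, inspecting the lowest-addressed byte of a multi-byte integer through an unsigned char pointer:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const uint32_t probe = 1u;
    const unsigned char *bytes = (const unsigned char *)&probe;

    /* On a little-endian host the least significant byte is stored first. */
    if (bytes[0] == 1)
        printf("little-endian\n");
    else
        printf("big-endian\n");
    return 0;
}
```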
