When to worry about endianness? (C)

I have seen countless references to endianness and what it means, and I have no problem understanding the concept.
However, my coding project is a simple game to run on Linux and Windows, on standard "gamer" hardware.
Do I need to worry about endianness in this case? When should I need to worry about it?
My code is simple C and SDL+GL; the only complex data are basic media files (png+wav+xm), and the game data is mostly strings, integers, booleans (for flags and such) and static-sized arrays. So far no user has had issues, so I am wondering whether adding checks is necessary (it will be done later, but there are more urgent issues IMO).

The times when you need to worry about endianness:
you are sending binary data between machines or processes (over a network or via a file). If the machines may have different byte orders, or the protocol used specifies a particular byte order (which it should), you'll need to deal with endianness.
you have code that accesses memory through pointers of different types (say, you access an unsigned int variable through a char*; see the sketch at the end of this answer).
If you do these things you're dealing with byte order whether you know it or not - it might be that you're dealing with it by assuming it's one way or the other, which may work fine as long as your code doesn't have to deal with a different platform.
In a similar vein, you generally need to deal with alignment issues in those same cases and for similar reasons. Once again, you might be dealing with it by doing nothing and having everything work fine because you don't have to cross platform boundaries (which may come back to bite you down the road if that does become a requirement).
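For instance, here is a minimal sketch of the pointer-aliasing case: inspecting an unsigned int through an unsigned char*. What it prints depends entirely on the host's byte order.

    #include <stdio.h>

    int main(void)
    {
        unsigned int value = 0x01020304;
        unsigned char *bytes = (unsigned char *)&value; /* view raw memory */

        /* A little-endian host prints "04 03 02 01";
           a big-endian host prints "01 02 03 04". */
        for (size_t i = 0; i < sizeof value; i++)
            printf("%02x ", bytes[i]);
        printf("\n");
        return 0;
    }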

If you mean a PC by "standard gamer hardware", then you don't have to worry about endianness, as it will always be little-endian on x86/x64. But if you want to port the project to other architectures, then you should design it to be endianness-independent.

Whenever you receive/transmit data over a network, remember to convert to/from network and host byte order. The C functions htons, htonl, etc., or equivalents in your language, should be used here.
Whenever you read multi-byte values (like UTF-16 characters or 32-bit ints) from a file, since that file might have originated on a system with a different endianness. If the file is UTF-16 or UTF-32 it probably has a BOM (byte-order mark). Otherwise, the file format will have to specify endianness in some way.
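When the format does specify a byte order, one common, portable approach is to assemble the value from individual bytes with shifts. A sketch, assuming a little-endian file format (read_u32_le is a name made up for illustration):

    #include <stdint.h>
    #include <stdio.h>

    /* Read a 32-bit little-endian value from a file, regardless of the
       host's own byte order: the shifts assemble the value arithmetically,
       so no host-endianness check is needed. */
    static int read_u32_le(FILE *f, uint32_t *out)
    {
        unsigned char b[4];
        if (fread(b, 1, 4, f) != 4)
            return -1;
        *out = (uint32_t)b[0]
             | ((uint32_t)b[1] << 8)
             | ((uint32_t)b[2] << 16)
             | ((uint32_t)b[3] << 24);
        return 0;
    }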

You only need to worry about it if your game needs to run on different hardware architectures. If you are positive that it will always run on Intel hardware then you can forget about it. If it will run on Linux, though, many people use architectures other than Intel, and you may end up having to think about it.

Are you distributing your game in source code form?
Because if you are distributing your game as a binary only, then you know exactly which processor families your game will run on. Also, the media files: are they user-generated (possibly via a level editor), or are they really only meant to be supplied by yourself?
If this is a truly closed environment (you distribute binaries and the game assets are not intended to be customized), then you know your own endianness risks and I personally wouldn't fool with it.
However, if you are either distributing source and/or hoping people will customize their game, then you have a potential concern. However, with most of the desktop/laptop computers around these days moving to x86, I would think this is a diminishing concern.

The problem occurs with networking, where it matters how the data is sent, and when you are doing bit fiddling across different processors, since different processors may store the data differently in memory.

I believe PowerPC has the opposite endianness of the Intel boards. You might be able to have a routine that sets the endianness dependent on the architecture. I'm not sure if you can actually tell what the hardware architecture is in code... maybe someone smarter than me knows the answer to that question.
Now in reference to your statement about "standard" gamer hardware, I would say that most any standard gamer is using consumer off-the-shelf solutions, so you're almost certainly going to get the same endianness across the board. I'm sure someone will disagree with me, but that's my $.02.
As it happens, there is an existing question covering exactly the suggestion above: Find Endianness through a c program

Related

C - how to make a variable big-endian by default?

I want my program to use big-endian by default (currently it is little-endian).
That means every time I declare a uint32_t/int, the value assigned to it will be stored in big-endian order.
Is that possible (without calling ntohl() each time)?
I have researched on Google and Stack Overflow for 3 days and haven't got any answer.
I will strongly appreciate any help!
edit:
I have a server and a client.
The client works in big-endian, and the server is little-endian.
Now, I am sending the server an MD5 value for a byte array, ntohl()-ing it on the server, and getting the valid MD5 value.
On the server, when I call the md5.c function (which is in a DLL, by the way), I get a different value.
This value is not even similar in any way to the received value.
I assume it happens because of the endianness.
The byte array I send to the function is not changing, because those are single bytes, which are not sensitive to endianness, but the variables I declare in the function, and with which I manipulate my byte array, can cause a problem.
That is the reason big-endian is so important to me.
I want my program to use big-endian by default
You cannot. The endianness is a fixed property of the target processor (of your C compiler) and related to its instruction set architecture.
You might in principle (but with C, that is really not worth the trouble) use a C compiler for some virtual machine (like WebAssembly, or MMIX) of the appropriate endianness, or even define a bytecode VM for a machine of desired endianness.
A variant of that might be to cross-compile for some other instruction set architecture (e.g. MIPS, ARM, etc...) and use some emulator to run the executable (and the necessary OS).
Regarding your edit, you could consider sending on the network the md5sum in alphanumerical form (e.g. like the output of the md5sum command).
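A sketch of that idea (md5_to_hex is a helper name made up for illustration): render the raw 16-byte digest as a hex string before sending, so byte order never enters the picture.

    #include <stdio.h>

    /* Render a raw 16-byte MD5 digest as a 32-character hex string,
       like the output of the md5sum command. Text is immune to the
       byte-order difference between client and server. */
    static void md5_to_hex(const unsigned char digest[16], char out[33])
    {
        for (int i = 0; i < 16; i++)
            sprintf(out + 2 * i, "%02x", digest[i]);
        /* the final sprintf writes the terminating '\0' at out[32] */
    }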
You can compile the program for different processors, some of which use big endian instead of little endian, or you can control how your code will be compiled in that aspect. For example, for MIPS: https://gcc.gnu.org/onlinedocs/gcc/MIPS-Options.html
Endianness is purely about how a processor handles multi-byte values. The only time a programmer needs to be aware of it is when serializing data or addressing individual bytes of an integer.
So unless you can change how the processor handles multi-byte words (ARM allows you to change endianness), you are stuck with how the processor works.

Define size of long and time_t to 4 bytes

I want to synchronize two Raspberry Pis with a C program. It is working fine if the program is only running on the Pis, but for development I want to use my PC (where it's also easier to debug). However, I send the timespec struct directly as binary over the wire. A Raspberry Pi uses 4 bytes each for long and time_t; my PC uses 8 bytes each, so they do not match up.
Is it possible to set long and time_t to 4 bytes each, only for this C program?
I know that the size of long, short, etc. is defined by the system.
Important: I only want to define it once in the program, and not transform it to uintXX or int each time.
In programming, it is not uncommon to need to treat network transmissions as separate from in-memory handling. In fact, it is pretty much the norm. So converting the data to a network format with a well-defined byte order and size is really recommended, and will help with the abstraction of your interfaces.
You might as well consider transforming to plain text, if that is not a time-critical piece of data exchange. It makes for a lot easier debugging.
C is probably not the best tool for the job here. It's much too low level to provide automatic data serialization like JavaScript, Python or similar more abstract languages.
You cannot assume the definitions of timespec will be identical on different platforms. For one thing, the size of a long will be different depending on the 32/64-bit architecture, and you can have endianness problems too.
When you want to exchange data structures between heterogeneous platforms, you need to define your own protocol with unambiguous data and a clear endianness convention.
One solution would be to send the numbers as ASCII. Not terribly efficient, but if it's just a couple of values, who cares?
Another would be to create an exchange structure with (u)intXX_t fields, as sketched below.
You can assume a standard Raspberry Pi kernel will be little-endian like your PC, but if you're writing a small exchange protocol, you might as well add a couple of htonl/ntohl calls for good measure.
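A sketch of that fixed-width exchange structure (wire_timespec, to_wire and from_wire are names made up for illustration):

    #include <stdint.h>
    #include <time.h>
    #include <arpa/inet.h> /* htonl, ntohl (POSIX) */

    /* Wire format with fixed field sizes and a fixed (network) byte
       order, independent of the host's long/time_t widths. */
    struct wire_timespec {
        uint32_t sec;  /* note: truncates tv_sec on 64-bit hosts past 2038 */
        uint32_t nsec;
    };

    static struct wire_timespec to_wire(const struct timespec *ts)
    {
        struct wire_timespec w;
        w.sec  = htonl((uint32_t)ts->tv_sec);
        w.nsec = htonl((uint32_t)ts->tv_nsec);
        return w;
    }

    static void from_wire(const struct wire_timespec *w, struct timespec *ts)
    {
        ts->tv_sec  = (time_t)ntohl(w->sec);
        ts->tv_nsec = (long)ntohl(w->nsec);
    }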

Is it a must to reverse byte order when sending numbers across the network?

In most of the examples on the web, authors usually change the byte order before sending a number from host byte order to network byte order. Then at the receiving end, authors usually restore the order back from network byte order to host byte order.
Q1: Considering that the architectures of the two systems are unknown, wouldn't it be more efficient if the authors simply checked for the endianness of the machines before reversing the byte order?
Q2: Is it really necessary to reverse the byte order of numbers even if they are passed to and received by the same machine architecture?
In general, you can't know the architecture of the remote system. If everyone uses a specific byte order - network byte order, then there is no confusion. There is some cost to all the reversing, but the cost of re-engineering ALL network devices would be far greater.
A1: Suppose that we were going to try to establish the byte order of the remote systems. We need to establish communication between systems and determine what byte order the remote system has. How do we communicate without knowing the byte order?
A2: If you know that both systems have the same architecture, then no, you don't need to reverse bytes at each end. But you don't, in general, know. And if you do design a system like that, then you have made a network-architecture decision that excludes different CPU architectures in the future. Consider Apple switching from 68k to PPC to x86.
A1: No, that's the point. You don't want to have to care about the endianness of the machine on the other end. In most cases (outside of a strict, controlled environment... which is most definitely not the internet) you're not going to know. If everyone uses the same byte order, you don't have to worry about it.
A2: No. Except when that ends up not being the case down the road and everything breaks, someone is going to wonder why a well known best practice wasn't followed. Usually this will be the person signing off on your paycheck.
Little- or big-endian is platform-specific, but for network communication big-endian is the common convention; see the Wikipedia article on endianness.
It's not about blind reversing.
Networks work in big-endian. My computer [Linux + Intel i386] works in little-endian, so I always reverse the order when I code for my computer. I think (PowerPC) Macs work in big-endian, as do some mobile phone platforms.
Network byte order is big-endian. If the sending or receiving architecture is big-endian too, you could skip the step on that end as the translation amounts to a nop. However, why bother? Translating everything is simpler, safer, and has close to no performance impact.
Q1: yes, it would be more efficient if sender and receiver tested their endianness (more precisely, communicated their endianness to each other, and tested whether it is the same).
Q2: no, it is not always necessary to use a "standard" byte order. But it is simpler for code wanting to interoperate portably.
However, ease of coding might be more important than communication performance, and the network costs much more than swapping bytes, unless you've got a big lot of data.
Read about serialization; there are existing questions and answers on the subject.
No need to test for endianness if you use the ntohl() and htonl(), etc. functions/macros. On big-endian machines, they'll already be no-ops.
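A minimal round-trip, just to make the convention concrete:

    #include <stdint.h>
    #include <stdio.h>
    #include <arpa/inet.h> /* htonl, ntohl (POSIX) */

    int main(void)
    {
        uint32_t host_value = 0xDEADBEEF;

        /* Sender side: host order -> network (big-endian) order. */
        uint32_t wire_value = htonl(host_value);

        /* Receiver side: network order -> host order. On a big-endian
           machine both calls compile to no-ops. */
        uint32_t round_trip = ntohl(wire_value);

        printf("%s\n", round_trip == host_value ? "ok" : "mismatch");
        return 0;
    }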

Writing a portable C program - which things to consider?

For a project at university I need to extend an existing C application, which shall in the end run on a wide variety of commercial and non-commercial unix systems (FreeBSD, Solaris, AIX, etc.).
Which things do I have to consider when I want to write a C program which is most portable?
The best advice I can give, is to move to a different platform every day, testing as you go.
This will make the platform differences stick out like a sore thumb, and teach you the portability issues at the same time.
Saving the cross-platform testing for the end will lead to failure.
That aside:
Integer sizes can vary.
Floating-point numbers might be represented differently.
Integers can have different endianness.
Compilation options can vary.
Include file names can vary.
Bit-field implementations will vary.
It is generally a good idea to set your compiler warning level up as high as possible, to see the sorts of things the compiler can complain about.
I used to write C utilities that I would then support on 16 bit to 64 bit architectures, including some 60 bit machines. They included at least three varieties of "endianness," different floating point formats, different character encodings, and different operating systems (though Unix predominated).
Stay as close to standard C as you can. For functions/libraries not part of the standard, use as widely supported a code base as you can find. For example, for networking, use the BSD socket interface, with zero or minimal use of low level socket options, out-of-band signalling, etc. To support a wide number of disparate platforms with minimal staff, you'll have to stay with plain vanilla functions.
Be very aware of what's guaranteed by the standard versus what's typical implementation behavior. For instance, pointers are not necessarily the same size as integers, and pointers to different data types may have different lengths. If you must make implementation-dependent assumptions, document them thoroughly. Lint, or --strict, or whatever your development toolset has as an equivalent, is vitally important here.
Header files are your friend. Use implementation-defined macros and constants. Use header definitions and #ifdef to help isolate those instances where you need to cover a small number of alternatives.
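A tiny sketch of that isolation, gathering the platform #ifdefs in one header so the rest of the program sees a single interface (the platform branches shown are illustrative, not exhaustive):

    /* portability.h */
    #ifndef PORTABILITY_H
    #define PORTABILITY_H

    #if defined(__linux__)
    #  include <endian.h>      /* glibc byte-order macros */
    #elif defined(__FreeBSD__)
    #  include <sys/endian.h>  /* FreeBSD byte-order macros */
    #else
    #  error "port me: add byte-order detection for this platform"
    #endif

    #endif /* PORTABILITY_H */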
Don't assume the current platform uses EBCDIC characters and packed decimal integers. There are a fair number of ASCII - two's complement machines out there as well. :-)
With all that, if you avoid the temptation to write things multiple times and #ifdef major portions of code, you'll find that coding and testing across disparate platforms helps find bugs sooner. You'll end up producing more disciplined, understandable, maintainable programs.
Use at least two compilers.
Have a continuous build system in place, which preferably builds on the various target platforms.
If you do not need to work very low-level, try to use some library that provides abstraction. You are likely to find third-party libraries that provide good abstraction for the things you need. For example, for network and communication, there is ACE. Boost (e.g. filesystem) is also ported to several platforms. These are C++ libraries, but there may be C libraries too (like curl).
If you have to work at the low level, be aware that the platforms occasionally have different behavior even on things like posix where they are supposed to have the same behavior. You can have a look at the source code of the libraries above.
One particular issue that you may need to stay abreast of (for instance, if your data files are expected to work across platforms) is endianness.
Numbers are represented differently at the binary level on different architectures. Big-endian systems order the most significant byte first and little-endian systems order the least-significant byte first.
If you write some raw data to a file in one endianness and then read that file back on a system with a different endianness you will obviously have issues.
You should be able to get the endianness at compile-time on most systems from sys/param.h. If you need to detect it at runtime, one method is to use a union of an int and a char, then set the char to 1 and see what value the int has.
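A minimal version of that union check:

    #include <stdio.h>

    /* Store 1 into an int and look at its first byte: if the low-order
       byte comes first in memory, the host is little-endian. */
    static int is_little_endian(void)
    {
        union { int i; char c[sizeof(int)]; } u = { 1 };
        return u.c[0] == 1;
    }

    int main(void)
    {
        printf("%s-endian\n", is_little_endian() ? "little" : "big");
        return 0;
    }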
It is a very long list. The best thing to do is to read examples. The source of perl, for example: there you will see a gigantic process of constructing a header file that deals with about 50 platform issues.
Read it and weep, or borrow.
The list may be long, but it's not nearly as long as also supporting Windows and MSDOS, which is common with many utilities.
The usual technique is to separate core algorithm modules from those which deal with the operating system—basically a strategy of layered abstraction.
Differentiating amongst several flavors of unix is rather simple in comparison. Either stick to features for which they all use the same RTL names, or look at the majority convention for the platforms supported and #ifdef in the exceptions.
Continually refer to the POSIX standards for any library functions you use. Parts of the standard are ambiguous and some systems return different styles of error codes. This will help you to proactively find those really hard to find slightly different implementation bugs.

Run time Data and Code memory size estimate

I am working on a project, in the C programming language, to develop an application that can be ported to a number of different microcontroller platforms, such as ARM/Freescale/PIC microcontrollers. I am developing this application on Linux now, and then I will have to port it to the platforms above.
I would like to know: are there any tools (open source preferably) with which I can determine the "code" and the data memory footprint/size before porting to the new platform?
I have been searching on Google for it and have not found anything so far, not even for Linux.
Any help will be greatly appreciated.
-Vikas
For a small program, much of the size is determined by the libraries/DLL your program depends on. Since you refer to ARM/Freescale/Pic I assume you're dealing with compact, embedded applications where data size is measured in bytes rather than MBytes.
For your own code, size differences will determined by:
word size (i.e. 32-bit programs tend to be a bit larger and use more data than 8-bit ones)
architecture (i.e. Intel code versus ARM, Freescale, PIC)
In your case, I expect that PIC is the most critical part (for RAM/ROM constraints), so probably monitoring the PIC compile size during PC development is sufficient. The linker output will contain info on TEXT/DATA/BSS sizes, which you can monitor.
I generally work on embedded systems. In my work, much of the data size is known at design time (i.e. number of buffers * buffer size). For code size, I have rules of thumb on different architectures which help me to do a sanity check at design time. For instance, I maintain a suite of existing-code libraries, for which I know performance and size numbers on each architecture. This way I know what kind of ratio I can expect at design time. If the PC program has 1 MByte of data, it won't fit in an 8-bit PIC...
Nothing can tell you how much memory your application will need. You'll have to make some assumptions about how it will be used and try your application under different scenarios.
As you're testing, you can monitor the memory usage stats in the /proc file system or use the ps command to do the same.
The size of your text/code segment will depend on optimization level and back-end. GCC can be configured to generate that information for you.
Run-time is a little more difficult as Jeremy said. Besides his suggestion, you also might want to try gcov and/or gprof in order to analyse your program in the context of your most common use scenarios. This kind of instrumentation is focussed on complexity rather than size but at least you'll know better where to focus your memory analysis.
Your compiler can/will generate a map file. The map file will, generally speaking, have code and data size (or location ranges). There may be differences between different compilers for the different targets. And as pointed out in other posts here, your dependencies on supplied libraries will also impact overall memory usage.
