Detecting Endianness - c

I'm currently trying to create a C source code which properly handles I/O whatever the endianness of the target system.
I've selected "little endian" as my I/O convention, which means that, for big endian CPU, I need to convert data while writing or reading.
Conversion is not the issue. The problem I face is to detect endianness, preferably at compile time (since CPU do not change endianness in the middle of execution...).
Up to now, I've been using this :
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
...
#else
...
#endif
It's documented as a GCC pre-defined macro, and Visual seems to understand it too.
However, I've received report that the check fails for some big_endian systems (PowerPC).
So, I'm looking for a foolproof solution, which ensures that endianess is correctly detected, whatever the compiler and the target system. well, most of them at least...
[Edit] : Most of the solutions proposed rely on "run-time tests". These tests may sometimes be properly evaluated by compilers during compilation, and therefore cost no real runtime performance.
However, branching with some kind of << if (0) { ... } else { ... } >> is not enough. In the current code implementation, variable and functions declaration depend on big_endian detection. These cannot be changed with an if statement.
Well, obviously, there is fall back plan, which is to rewrite the code...
I would prefer to avoid that, but, well, it looks like a diminishing hope...
[Edit 2] : I have tested "run-time tests", by deeply modifying the code. Although they do their job correctly, these tests also impact performance.
I was expecting that, since the tests have predictable output, the compiler could eliminate bad branches. But unfortunately, it doesn't work all the time. MSVC is good compiler, and is successful in eliminating bad branches, but GCC has mixed results, depending on versions, kind of tests, and with greater impact on 64 bits than on 32 bits.
It's strange. And it also means that the run-time tests cannot be ensured to be dealt with by the compiler.
Edit 3 : These days, I'm using a compile-time constant union, expecting the compiler to solve it to a clear yes/no signal.
And it works pretty well :
https://godbolt.org/g/DAafKo

As stated earlier, the only "real" way to detect Big Endian is to use runtime tests.
However, sometimes, a macro might be preferred.
Unfortunately, I've not found a single "test" to detect this situation, rather a collection of them.
For example, GCC recommends : __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__ . However, this only works with latest versions, and earlier versions (and other compilers) will give this test a false value "true", since NULL == NULL. So you need the more complete version : defined(__BYTE_ORDER__)&&(__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
OK, now this works for newest GCC, but what about other compilers ?
You may try __BIG_ENDIAN__ or __BIG_ENDIAN or _BIG_ENDIAN which are often defined on big endian compilers.
This will improve detection. But if you specifically target PowerPC platforms, you can add a few more tests to improve even more detection. Try _ARCH_PPC or __PPC__ or __PPC or PPC or __powerpc__ or __powerpc or even powerpc. Bind all these defines together, and you have a pretty fair chance to detect big endian systems, and powerpc in particular, whatever the compiler and its version.
So, to summarize, there is no such thing as a "standard pre-defined macros" which guarantees to detect big-endian CPU on all platforms and compilers, but there are many such pre-defined macros which, collectively, give a high probability of correctly detecting big endian under most circumstances.

At compile time in C you can't do much more than trusting preprocessor #defines, and there are no standard solutions because the C standard isn't concerned with endianness.
Still, you could add an assertion that is done at runtime at the start of the program to make sure that the assumption done when compiling was true:
inline int IsBigEndian()
{
int i=1;
return ! *((char *)&i);
}
/* ... */
#ifdef COMPILED_FOR_BIG_ENDIAN
assert(IsBigEndian());
#elif COMPILED_FOR_LITTLE_ENDIAN
assert(!IsBigEndian());
#else
#error "No endianness macro defined"
#endif
(where COMPILED_FOR_BIG_ENDIAN and COMPILED_FOR_LITTLE_ENDIAN are macros #defined previously according to your preprocessor endianness checks)

Instead of looking for a compile-time check, why not just use big-endian order (which is considered the "network order" by many) and use the htons/htonl/ntohs/ntohl functions provided by most UNIX-systems and Windows. They're already defined to do the job you're trying to do. Why reinvent the wheel?

Try something like:
if(*(char *)(int[]){1}) {
/* little endian code */
} else {
/* big endian code */
}
and see if your compiler resolves it at compile-time. If not, you might have better luck doing the same with a union. Actually I like defining macros using unions that evaluate to 0,1 or 1,0 (respectively) so that I can just do things like accessing buf[HI] and buf[LO].

Notwithstanding compiler-defined macros, I don't think there's a compile-time way to detect this, since determining the endianness of an architecture involves analyzing the manner in which it stores data in memory.
Here's a function which does just that:
bool IsLittleEndian () {
int i=1;
return (int)*((unsigned char *)&i)==1;
}

As others have pointed out, there isn't a portable way to check for endianness at compile-time. However, one option would be to use the autoconf tool as part of your build script to detect whether the system is big-endian or little-endian, then to use the AC_C_BIGENDIAN macro, which holds this information. In a sense, this builds a program that detects at runtime whether the system is big-endian or little-endian, then has that program output information that can then be used statically by the main source code.
Hope this helps!

This comes from p. 45 of Pointers in C:
#include <stdio.h>
#define BIG_ENDIAN 0
#define LITTLE_ENDIAN 1
int endian()
{
short int word = 0x0001;
char *byte = (char *) &word;
return (byte[0] ? LITTLE_ENDIAN : BIG_ENDIAN);
}
int main(int argc, char* argv[])
{
int value;
value = endian();
if (value == 1)
printf("The machine is Little Endian\n");
else
printf("The machine is Big Endian\n");
return 0;
}

Socket's ntohl function can be used for this purpose. Source
// Soner
#include <stdio.h>
#include <arpa/inet.h>
int main() {
if (ntohl(0x12345678) == 0x12345678) {
printf("big-endian\n");
} else if (ntohl(0x12345678) == 0x78563412) {
printf("little-endian\n");
} else {
printf("(stupid)-middle-endian\n");
}
return 0;
}

My GCC version is 9.3.0, it's configured to support powerpc64 platform, and I've tested it and verified that it supports the following macros logic:
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
......
#endif
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
.....
#endif

As of C++20, no more hacks or compiler extensions are necessary.
https://en.cppreference.com/w/cpp/types/endian
std::endian (Defined in header <bit>)
enum class endian
{
little = /*implementation-defined*/,
big = /*implementation-defined*/,
native = /*implementation-defined*/
};
If all scalar types are little-endian, std::endian::native equals std::endian::little
If all scalar types are big-endian, std::endian::native equals std::endian::big

You can't detect it at compile time to be portable across all compilers. Maybe you can change the code to do it at run-time - this is achievable.

It is not possible to detect endianness portably in C with preprocessor directives.

I took the liberty of reformatting the quoted text
As of 2017-07-18, I use union { unsigned u; unsigned char c[4]; }
If sizeof (unsigned) != 4 your test may fail.
It may be better to use
union { unsigned u; unsigned char c[sizeof (unsigned)]; }

As most have mentioned, compile time is your best bet. Assuming you do not do cross compilations and you use cmake (it will also work with other tools such as a configure script, of course) then you can use a pre-test which is a compiled .c or .cpp file and that gives you the actual verified endianness of the processor you're running on.
With cmake you use the TestBigEndian macro. It sets a variable which you can then pass to your software. Something like this (untested):
TestBigEndian(IS_BIG_ENDIAN)
...
set(CFLAGS ${CFLAGS} -DIS_BIG_ENDIAN=${IS_BIG_ENDIAN}) // C
set(CXXFLAGS ${CXXFLAGS} -DIS_BIG_ENDIAN=${IS_BIG_ENDIAN}) // C++
Then in your C/C++ code you can check that IS_BIG_ENDIAN define:
#if IS_BIG_ENDIAN
...do big endian stuff here...
#else
...do little endian stuff here...
#endif
So the main problem with such a test is cross compiling since you may be on a completely different CPU with a different endianness... but at least it gives you the endianness at time of compiling the rest of your code and will work for most projects.

I provided a general approach in C with no preprocessor, but only runtime that compute endianess for every C type.
the output if this on my Linux x86_64 architecture is:
fabrizio#toshibaSeb:~/git/pegaso/scripts$ gcc -o sizeof_endianess sizeof_endianess.c
fabrizio#toshibaSeb:~/git/pegaso/scripts$ ./sizeof_endianess
INTEGER TYPE | signed | unsigned | 0x010203... | Endianess
--------------+---------+------------+-------------------------+--------------
int | 4 | 4 | 04 03 02 01 | little
char | 1 | 1 | - | -
short | 2 | 2 | 02 01 | little
long int | 8 | 8 | 08 07 06 05 04 03 02 01 | little
long long int | 8 | 8 | 08 07 06 05 04 03 02 01 | little
--------------+---------+------------+-------------------------+--------------
FLOATING POINT| size |
--------------+---------+
float | 4
double | 8
long double | 16
Get source at: https://github.com/bzimage-it/pegaso/blob/master/scripts/sizeof_endianess.c
This is a more general approach is to not detect endianess at compilation time (not possibile) nor assume any endianess escludes another one. In fact is important to remark that endianess is not a concept of the architecture/processor but regards single type. As argued by
#Christoph at https://stackoverflow.com/a/4712594/3280080 PDP-11 for example can have different endianess at the same time.
The approach consist to set an integer to be x = 0x010203... as long is it, then print them looking at casted-at-single-byte incrementing the address by one.
Can somebody test it please in a big endian and/or mixed endianess ?

I know I'm late to this party, but here is my take.
int is_big_endian() {
return 1 & *(uint16_t*)"01";
}
This is based on the fact that '0' is 48 in decimal and '1' 49, so '1' has the LSB bit set, while '0' not. I could make them '\x00' and '\x01' but I think my version makes it more readable.

#define BIG_ENDIAN ((1 >> 1 == 0) ? 0 : 1)

Related

How to tell gcc to not align function parameters on the stack?

I am trying to decompile an executable for the 68000 processor into C code, replacing the original subroutines with C functions one by one.
The problem I faced is that I don't know how to make gcc use the calling convention that matches the one used in the original program. I need the parameters on the stack to be packed, not aligned.
Let's say we have the following function
int fun(char arg1, short arg2, int arg3) {
return arg1 + arg2 + arg3;
}
If we compile it with
gcc -m68000 -Os -fomit-frame-pointer -S source.c
we get the following output
fun:
move.b 7(%sp),%d0
ext.w %d0
move.w 10(%sp),%a0
lea (%a0,%d0.w),%a0
move.l %a0,%d0
add.l 12(%sp),%d0
rts
As we can see, the compiler assumed that parameters have addresses 7(%sp), 10(%sp) and 12(%sp):
but to work with the original program they need to have addresses 4(%sp), 5(%sp) and 7(%sp):
One possible solution is to write the function in the following way (the processor is big-endian):
int fun(int bytes4to7, int bytes8to11) {
char arg1 = bytes4to7>>24;
short arg2 = (bytes4to7>>8)&0xffff;
int arg3 = ((bytes4to7&0xff)<<24) | (bytes8to11>>8);
return arg1 + arg2 + arg3;
}
However, the code looks messy, and I was wondering: is there a way to both keep the code clean and achieve the desired result?
UPD: I made a mistake. The offsets I'm looking for are actually 5(%sp), 6(%sp) and 8(%sp) (the char-s should be aligned with the short-s, but the short-s and the int-s are still packed):
Hopefully, this doesn't change the essence of the question.
UPD 2: It turns out that the 68000 C Compiler by Sierra Systems gives the described offsets (as in UPD, with 2-byte alignment).
However, the question is about tweaking calling conventions in gcc (or perhaps another modern compiler).
Here's a way with a packed struct. I compiled it on an x86 with -m32 and got the desired offsets in the disassembly, so I think it should still work for an mc68000:
typedef struct {
char arg1;
short arg2;
int arg3;
} __attribute__((__packed__)) fun_t;
int
fun(fun_t fun)
{
return fun.arg1 + fun.arg2 + fun.arg3;
}
But, I think there's probably a still cleaner way. It would require knowing more about the other code that generates such a calling sequence. Do you have the source code for it?
Does the other code have to remain in asm? With the source, you could adjust the offsets in the asm code to be compatible with modern C ABI calling conventions.
I've been programming in C since 1981 and spent years doing mc68000 C and assembler code (for apps, kernel, device drivers), so I'm somewhat familiar with the problem space.
It's not a gcc 'fault', it is 68k architecture that requires stack to be always aligned on 2 bytes.
So there is simply no way to break 2-byte alignment on the hardware stack.
but to work with the original program they need to have addresses
4(%sp), 5(%sp) and 7(%sp):
Accessing word or long values off the ODD memory address will immediately trigger alignment exception on 68000.
To get integral parameters passed using 2 byte alignment instead of 4 byte alignment, you can change the default int size to be 16 bit by -mshort. You need to replace all int in your code by long (if you want them to be 32 bit wide). The crude way to do that is to also pass -Dint=long to your compiler. Obviously, you will break ABI compatibility to object files compiled with -mno-short (which appears to be the default for gcc).

AVR GCC, assembly C stub functions, eor and the required constant value

I'm having this code:
uint16_t swap_bytes(uint16_t x)
{
asm volatile(
"eor, %A0, %B0" "\n\t"
"eor, %B0, %A0" "\n\t"
"eor, %A0, %B0" "\n\t"
: "=r" (x)
: "0" (x)
);
return x;
}
Which translates (by avr-gcc version 4.8.1 with -std=gnu99 -save-temps) to:
.global swap_bytes
.type swap_bytes, #function
swap_bytes:
/* prologue: function */
/* frame size = 0 */
/* stack size = 0 */
.L__stack_usage = 0
/* #APP */
; 43 "..\lib\own\ownlib.c" 1
eor, r24, r25
eor, r25, r24
eor, r24, r25
; 0 "" 2
/* #NOAPP */
ret
.size swap_bytes, .-swap_bytes
But then the compiler is complaining like that:
|65|Error: constant value required|
|65|Error: garbage at end of line|
|66|Error: constant value required|
|66|Error: garbage at end of line|
|67|Error: constant value required|
|67|Error: garbage at end of line|
||=== Build failed: 6 error(s), 0 warning(s) (0 minute(s), 0 second(s)) ===|
The mentioned lines are the ones with the eor commands. Why does the compiler having problems with that? The registers are even upper (>= r16) ones where nearly all operations are possible. constant value required sounds to me like it expects a literal... I dont get it.
Just to clarify for future googlers:
eor, r24, r25
has an extra comma after the eor. This should be written as:
eor r24, r25
I would also encourage you (again) to consider using gcc's __builtin_bswap16. In case you are not familiar with the gcc 'builtin' functions, they are functions that are built into the compiler, and (despite looking like functions) are typically inlined. They have been written and optimized by people who understand all the ins and outs of the various processors and can take into account things you may not have considered.
I understand the desire to keep code as small as possible. And I accept that it is possible that (somehow) this builtin on your specific processor is producing sub-optimal code (I assume you have checked?). On the other hand, it may produce exactly the same code. Or it may use some even more clever trick to do this. Or it might interleave instructions from the surrounding code to take advantage of pipelining (or some other avr-specific thing that I have never heard of because I don't speak 'avr').
What's more, consider this code:
int main()
{
return __builtin_bswap16(12345);
}
Your code always takes 3 instructions to process a swap. However with builtins, the compiler can recognize that the arg is constant and compute the value at compile time instead of at run time. Hard to be more efficient than that.
I could also point out the benefits of "easier to support." Writing inline asm is HARD to do correctly. And future maintainers hate to touch it cuz they're never quite sure how it works. And of course, the builtin is going to be more cross-platform portable.
Still not convinced? My last pitch: Even after you fix the commas, your inline asm code still isn't quite right. Consider this code:
int main(int argc, char *argv[])
{
return swap_bytes(argc) + swap_bytes(argc);
}
Because of the way you have written written swap_bytes (ie using volatile), gcc must compute the value twice (see the definition of volatile). Had you omitted volatile (or if you had used the builtin which does this correctly), it would have realized that argc doesn't change and re-used the output from the first call. Did I mention that correctly writing inline asm is HARD?
I don't know your code, constraints, level of expertise or requirements. Maybe your solution really is the best. The most I can do is to encourage you to think long and hard before using inline asm in production code.

what's the difference between __builtin_popcountll and_mm_popcnt_u64?

I was trying to how many 1 in 512MB memory and I found two possible methods, _mm_popcnt_u64() and __builtin_popcountll() in the gcc builtins.
_mm_popcnt_u64() is said to use the CPU introduction SSE4.2,which seems to be the fastest, and __builtin_popcountll() is excepted to use table lookup.
So, I think __builtin_popcountll() should be little slower than _mm_popcnt_u64().
However I got a result like this:
It took almost the same time for two methods. I highly doubt that they used the same way to work.
I also got this in popcntintrin.h
/* Calculate a number of bits set to 1. */
extern __inline int __attribute__((__gnu_inline__, __always_inline__, __artificial___))
_mm_popcnt_u32 (unsigned int __X)
{
return __builtin_popcount (__X);
}
#ifdef __x86_64__
extern __inline long long __attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm_popcnt_u64 (unsigned long long __X)
{
return __builtin_popcountll (__X);
}
#endif
So, I'm confused how __builtin_popcountll() works on earth
_mm_popcnt_u64 is part of <nmmintrin.h>, a header devised by Intel for utility functions for accessing SSE 4.2 instructions.
__builtin_popcountll is a GCC extension.
_mm_popcnt_u64 is portable to non-GNU compilers, and __builtin_popcountll is portable to non-SSE-4.2 CPUs. But on systems where both are available, both should compile to the exact same code.
If You compile without march flag, so with x86_64 default, builtin should be slower because it needs to dispatch function selecting between different architectures. This will cause no inlining and additional condition.

What is a "false positive"?

I noticed the following code on Understanding Strict Aliasing:
uint32_t
swap_words( uint32_t arg )
{
U32* in = (U32*)&arg;
uint16_t lo = in->u16[0];
uint16_t hi = in->u16[1];
in->u16[0] = hi;
in->u16[1] = lo;
return (in->u32);
}
According to the author,
The above source when compiled with GCC 4.0 with the
-Wstrict-aliasing=2 flag enabled will generate a warning. This warning is an example of a false positive. This type of cast is
allowed and will generate the appropriate code (see below). It is
documented clearly that -Wstrict-aliasing=2 may return false
positives.
I wonder what is a false positive, and should I pay attention to it when aliasing variables?
Here is the "appropriate code (see below)" mentioned above, if it's relevant:
Compiled with -fstrict-aliasing -O3 -Wstrict-aliasing -std=c99 on GNU C version 4.0.0 (Apple Computer, Inc. build 5026) (powerpc-apple-darwin8),
swap_words:
stw r3,24(r1) ; Store arg
lhz r0,24(r1) ; Load hi
lhz r2,26(r1) ; Load lo
sth r0,26(r1) ; Store result[1] = hi
sth r2,24(r1) ; Store result[0] = lo
lwz r3,24(r1) ; Load result
blr ; Return
Here a "false positive" means that the warning is incorrect, i.e. that there is no problem here. False positive is a standard term in science and medicine, denoting an error in an evaluation process resulting in mistaken detection of a condition tested. (A false negative would be failure to emit a warning when one were appropriate.)
If you are sure that a warning is a false positive, you don't need to fix your code to remove it. You might still want to restructure the code to avoid compilation warnings (or turn off the warning for that section of code, if possible). A normally warningless compile ensures that, when a real warning appears, one you should pay attention to, you don't miss it.
false positive in general means when something pretends the answer is yes / conforms to some condition when actually it's not true.
So here gcc gives you a warning about incorrect cast from integer to structure of the same size holding two 16-bit integers because it's dangerous operation. But you know that it's actually correct and generates correct code.
gcc uses special heuristics to find problems with the strict aliasing rules. These heuristics can have wrong positives, situations where they report a warning but there isn't anything to report.

Preferred idiom for endianess-agnostic reads

In the Plan 9 source code I often find code like this to read serialised data from a buffer with a well-defined endianess:
#include <stdint.h>
uint32_t le32read(uint8_t buf[static 4]) {
return (buf[0] | buf[1] << 8 | buf[2] << 16 | buf[3] << 24);
}
I expected both gcc and clang to compile this code into something as simple as this assembly on amd64:
.global le32read
.type le32read,#function
le32read:
mov (%rdi),%eax
ret
.size le32read,.-le32read
But contrary to my expectations, neither gcc nor clang recognize this pattern and produce complex assembly with multiple shifts instead.
Is there an idiom for this kind of operation that is both portable to all C99-implementations and produces good (i.e. like the one presented above) code across implementations?
After some research, I found (with the help of the terrific people in ##c on Freenode), that gcc 5.0 will implement optimizations for the kind of pattern described above. In fact, it compiles the C source listed in my question to the exact assembly I listed below.
I haven't found similar information about clang, so I filed a bug report. As of Clang 9.0, clang recognises both the read as well as the write idiom and turns it into fast code.
If you want to guaranty a conversions between a native platform order and a defined order (order on a network for example) you can let system libraries to the work and simply use the functions of <netinet/in.h> : hton, htons, htonl and ntoh, ntohs, nthol.
But I must admit that the include file is not guaranteed : under Windows I think it is winsock.h.
You could determine endianess like in this answer. Then use the O32_HOST_ORDER macro to decide whether to cast the byte array to an uint32_t directly or to use your bit shifting expression.
#include <stdint.h>
uint32_t le32read(uint8_t buf[static 4]) {
if (O32_HOST_ORDER == O32_LITTLE_ENDIAN) {
return *(uint32_t *)&buf[0];
}
return (buf[0] | buf[1] << 8 | buf[2] << 16 | buf[3] << 24);
}

Resources