On Windows, data can be loaded from DLLs, but doing so requires indirection through a pointer in the import address table. As a result, the compiler must know whether an object being accessed is imported from a DLL, which is indicated with the __declspec(dllimport) type specifier.
This is unfortunate because it means that a header for a Windows library designed to be used as either a static library or a dynamic library needs to know which version of the library the program is linking against. This requirement does not apply to functions, which are transparently emulated for DLLs with a stub that calls the real function through its address in the import address table.
On Linux the dynamic linker (ld.so) copies the values of all linked data objects from a shared object into a private mapped region for each process. This doesn't require indirection because the address of the private mapped region is local to the module, so its address is decided when the program is linked (and in the case of position independent executables, relative addressing is used).
Why doesn't Windows do the same? Is there a situation where a DLL might be loaded more than once, and thus require multiple copies of linked data? Even if that was the case, it wouldn't be applicable to read only data.
It seems that the MSVCRT handles this issue via the _DLL macro, which the compiler defines when targeting the dynamic C runtime library (the /MD or /MDd flag); the standard headers use it to conditionally declare all exported symbols with __declspec(dllimport). I suppose you could reuse this macro if you only support static linking with the static C runtime and dynamic linking with the dynamic C runtime.
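As a sketch of that approach (MYLIB_API and MYLIB_STATIC are names I made up, not part of any real library), a dual-mode header might look like this:

/* mylib.h -- hypothetical dual-mode header.
 * Assumption: the DLL build is only ever paired with the dynamic CRT,
 * so a defined _DLL means the import declarations are wanted. */
#if defined(_WIN32) && defined(_DLL) && !defined(MYLIB_STATIC)
#  define MYLIB_API __declspec(dllimport)
#else
#  define MYLIB_API
#endif

extern MYLIB_API unsigned char mylib_table[256]; /* data: dllimport is required */
MYLIB_API int mylib_frob(int x);                 /* function: dllimport is optional */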
References:
LNK4217 - Russ Keldorph's WebLog (emphasis mine)
__declspec(dllimport) can be used on both code and data, and its semantics are subtly different between the two. When applied to a routine call, it is purely a performance optimization. For data, it is required for correctness.
[...]
Importing data
If you export a data item from a DLL, you must declare it with __declspec(dllimport) in the code that accesses it. In this case, instead of generating a direct load from memory, the compiler generates a load through a pointer, resulting in one additional indirection. Unlike calls, where the linker will fix up the code correctly whether the routine was declared __declspec(dllimport) or not, accessing imported data requires __declspec(dllimport). If omitted, the code will wind up accessing the IAT entry instead of the data in the DLL, probably resulting in unexpected behavior.
Importing into an Application Using __declspec(dllimport)
Using __declspec(dllimport) is optional on function declarations, but the compiler produces more efficient code if you use this keyword. However, you must use __declspec(dllimport) for the importing executable to access the DLL's public data symbols and objects.
Importing Data Using __declspec(dllimport)
When you mark the data as __declspec(dllimport), the compiler automatically generates the indirection code for you.
Importing Using DEF Files (interesting historical notes about accessing the IAT directly)
How do I share data in my DLL with an application or with other DLLs?
By default, each process using a DLL has its own instance of all the DLL's global and static variables.
Linker Tools Warning LNK4217
What happens when you get dllimport wrong? (seems to be unaware of data semantics)
How do I export data from a DLL?
CRT Library Features (documents the _DLL macro)
Linux and Windows use different strategies for accessing data stored in dynamic libraries.
On Linux, an undefined reference to an object is resolved to a library at link time. The linker finds the size of the object and reserves space for it in the executable's .bss segment (a copy relocation). When executed, the dynamic linker (ld.so) resolves the symbol to a dynamic library (again), and copies the object from the dynamic library into the process's memory.
On Windows, an undefined reference to an object is resolved to an import library at link time, and no space is reserved for it. When the module is executed, the dynamic linker resolves the symbol to a dynamic library, and creates a copy on write memory map in the process, backed by a shared data segment in the dynamic library.
The advantage of a copy-on-write memory map is that if the linked data is never written, it can be shared with other processes. In practice this is a trifling benefit that greatly increases complexity, both for the toolchain and for programs using dynamic libraries. For objects that are actually written to, it is always less efficient.
I suspect, although I have no evidence, that this decision was made for a particular and now outdated use case. Perhaps it was common practice to use large (for the time) read only objects in dynamic libraries on 16-bit Windows (in official Microsoft programs or otherwise). Either way, I doubt anyone at Microsoft has the expertise and time to change it now.
In order to investigate the issue I created a program which writes to an object from a dynamic library. It writes one byte per page (4096 bytes) in the object, then writes the entire object, then retries the initial one byte per page write. If the object is reserved for the process before main is called, the first and third loops should take approximately the same time, and the second loop should take longer than both. If the object is a copy on write map to a dynamic library, the first loop should take at least as long as the second, and the third should take less time than both.
The results are consistent with my hypothesis, and analyzing the disassembly confirms that Linux accesses the dynamic library data at a link-time address, relative to the program counter. Surprisingly, Windows not only accesses the data indirectly; the pointer to the data and its length are reloaded from the import address table on every loop iteration, even with optimizations enabled. This was tested with Visual Studio 2010 on Windows XP, so maybe things have changed since, although I doubt it.
Here are the results for Linux:
$ dd bs=1M count=16 if=/dev/urandom of=libdat.dat
$ xxd -i libdat.dat libdat.c
$ gcc -O3 -g -shared -fPIC libdat.c -o libdat.so
$ gcc -O3 -g -no-pie -L. -ldat dat.c -o dat
$ LD_LIBRARY_PATH=. ./dat
local = 0x1601060
libdat_dat = 0x601040
libdat_dat_len = 0x601020
dirty= 461us write= 12184us retry= 456us
$ nm dat
[...]
0000000000601040 B libdat_dat
0000000000601020 B libdat_dat_len
0000000001601060 B local
[...]
$ objdump -d -j.text dat
[...]
400693: 8b 35 87 09 20 00 mov 0x200987(%rip),%esi # 601020 <libdat_dat_len>
[...]
4006a3: 31 c0 xor %eax,%eax # zero loop counter
4006a5: 48 8d 15 94 09 20 00 lea 0x200994(%rip),%rdx # 601040 <libdat_dat>
4006ac: 0f 1f 40 00 nopl 0x0(%rax) # align loop for efficiency
4006b0: 89 c1 mov %eax,%ecx # store data offset in ecx
4006b2: 05 00 10 00 00 add $0x1000,%eax # add PAGESIZE to data offset
4006b7: c6 04 0a 00 movb $0x0,(%rdx,%rcx,1) # write a zero byte to data
4006bb: 39 f0 cmp %esi,%eax # test loop condition
4006bd: 72 f1 jb 4006b0 <main+0x30> # continue loop if data is left
[...]
Here are the results for Windows:
$ cl /Ox /Zi /LD libdat.c /link /EXPORT:libdat_dat /EXPORT:libdat_dat_len
[...]
$ cl /Ox /Zi dat.c libdat.lib
[...]
$ dat.exe # note low resolution timer means retry is too small to measure
local = 0041EEA0
libdat_dat = 1000E000
libdat_dat_len = 1100E000
dirty= 20312us write= 3125us retry= 0us
$ dumpbin /symbols dat.exe
[...]
9000 .data
1000 .idata
5000 .rdata
1000 .reloc
17000 .text
[...]
$ dumpbin /disasm dat.exe
[...]
004010BA: 33 C0 xor eax,eax # zero loop counter
[...]
004010C0: 8B 15 8C 63 42 00 mov edx,dword ptr [__imp__libdat_dat] # store data pointer in edx
004010C6: C6 04 02 00 mov byte ptr [edx+eax],0 # write a zero byte to data
004010CA: 8B 0D 88 63 42 00 mov ecx,dword ptr [__imp__libdat_dat_len] # store data length in ecx
004010D0: 05 00 10 00 00 add eax,1000h # add PAGESIZE to data offset
004010D5: 3B 01 cmp eax,dword ptr [ecx] # test loop condition
004010D7: 72 E7 jb 004010C0 # continue loop if data is left
[...]
Here is the source code used for both tests:
#include <stdio.h>
#ifdef _WIN32
#include <windows.h>
typedef FILETIME time_l;
time_l time_get(void) {
    FILETIME ret; GetSystemTimeAsFileTime(&ret); return ret;
}
long long int time_diff(time_l const *c1, time_l const *c2) {
    /* FILETIME counts 100ns ticks; combine both halves, then divide by 10 for microseconds */
    long long int t1 = (((long long int)c1->dwHighDateTime << 32) | c1->dwLowDateTime) / 10;
    long long int t2 = (((long long int)c2->dwHighDateTime << 32) | c2->dwLowDateTime) / 10;
    return t2 - t1;
}
#else
#include <unistd.h>
#include <time.h>
#include <stdlib.h>
typedef struct timespec time_l;
time_l time_get(void) {
    time_l ret; clock_gettime(CLOCK_MONOTONIC, &ret); return ret;
}
long long int time_diff(time_l const *c1, time_l const *c2) {
    return (c2->tv_sec - c1->tv_sec) * 1000000LL + (c2->tv_nsec - c1->tv_nsec) / 1000;
}
#endif
#ifndef PAGESIZE
#define PAGESIZE 4096
#endif
#ifdef _WIN32
#define DLLIMPORT __declspec(dllimport)
#else
#define DLLIMPORT
#endif
extern DLLIMPORT unsigned char volatile libdat_dat[];
extern DLLIMPORT unsigned int libdat_dat_len;
unsigned int local[4096];
int main(void) {
    unsigned int i;
    time_l t1, t2, t3, t4;
    long long int d1, d2, d3;
    t1 = time_get();
    for(i = 0; i < libdat_dat_len; i += PAGESIZE) { /* dirty one byte per page */
        libdat_dat[i] = 0;
    }
    t2 = time_get();
    for(i = 0; i < libdat_dat_len; i++) {           /* write every byte */
        libdat_dat[i] = 0xFF;
    }
    t3 = time_get();
    for(i = 0; i < libdat_dat_len; i += PAGESIZE) { /* retry one byte per page */
        libdat_dat[i] = 0;
    }
    t4 = time_get();
    d1 = time_diff(&t1, &t2);
    d2 = time_diff(&t2, &t3);
    d3 = time_diff(&t3, &t4);
    printf("%-15s= %18p\n%-15s= %18p\n%-15s= %18p\n",
           "local", (void *)local,
           "libdat_dat", (void *)libdat_dat,
           "libdat_dat_len", (void *)&libdat_dat_len);
    printf("dirty=%9lldus write=%9lldus retry=%9lldus\n", d1, d2, d3);
    return 0;
}
I sincerely hope someone else benefits from my research. Thanks for reading!
Here is my C program with one static, two global, one local and one extern variable.
#include <stdio.h>

int gvar1;
int gvar2 = 12;
extern int evar = 1;

int main(void)
{
    int lvar;
    static int svar = 4;

    lvar = 2;
    gvar1 = 3;

    printf("global1-%d global2-%d local+1-%d static-%d extern-%d\n",
           gvar1, gvar2, (lvar + 1), svar, evar);
    return 0;
}
Note that gvar1, gvar2, evar, lvar and svar are all defined as integers.
I disassembled the code using objdump, and the .debug_str section shows the following:
Contents of section .debug_str:
0000 76617269 61626c65 732e6300 6c6f6e67 variables.c.long
0010 20756e73 69676e65 6420696e 74002f75 unsigned int./u
0020 73657273 2f686f6d 6534302f 72616f70 sers/home40/raop
0030 2f626b75 702f6578 616d706c 65730075 /bkup/examples.u
0040 6e736967 6e656420 63686172 00737661 nsigned char.sva
0050 72006d61 696e006c 6f6e6720 696e7400 r.main.long int.
0060 6c766172 0073686f 72742075 6e736967 lvar.short unsig
0070 6e656420 696e7400 67766172 31006776 ned int.gvar1.gv
0080 61723200 65766172 00474e55 20432034 ar2.evar.GNU C 4
0090 2e342e36 20323031 31303733 31202852 .4.6 20110731 (R
00a0 65642048 61742034 2e342e36 2d332900 ed Hat 4.4.6-3).
00b0 73686f72 7420696e 7400 short int.
Why is it showing the following?
unsigned char.svar
long int.lvar
short unsigned int.gvar1.gvar2.evar
How does GCC decide which type it should be stored as?
I am using GCC 4.4.6 20110731 (Red Hat 4.4.6-3)
Why is it showing the following?
Simple answer: it is not showing what you think; it is showing:
1 "variables.c"
2 "long unsigned int"
2a "unsigned int"
2b "int"
3 "/users/home40/raop/bkup/examples"
4 "unsigned char"
4a "char"
5 "svar"
6 "main"
7 "long int"
8 "lvar"
9 "short unsigned int"
10 "gvar1"
11 "gvar2"
12 "evar"
13 "GNU C 4.4.6 20110731 (Red Hat 4.4.6-3)"
14 "short int"
The section is named .debug_str; it contains a list of NUL-separated strings. The strings appear in no particular order and are referenced from the .debug_info section, so the fact that svar follows unsigned char has no meaning at all.
The .debug_info section contains the actual debugging information. This section does not contain strings. Instead it will contain information like this:
...
Item 123:
Type of information: Data type
Name: 2b /* String #2b in ".debug_str" is "int" */
Kind of data type: Signed integer
Number of bits: 32
... some more information ...
Item 124:
Type of information: Global variable
Name: 10 /* "gvar1" */
Data type defined by: Item 123
Stored at: Address 0x1234
... some more information ...
Begin item 125:
Type of information: Function
Name: 6 /* "main" */
... some more information ...
Item 126:
Type of information: Local variable
Name: 5 /* "svar" */
Data type defined by: Item 123
Stored at: Address 0x1238
... some more information ...
End item 125 /* Function "main" */
Item 127:
...
You can see this information using the following command:
readelf --debug-dump filename.o
Why does GCC store global and static int differently?
I compiled your example twice: Once with optimization and once without optimization.
Without optimization svar and gvar1 were stored exactly the same way: Data type int, stored on a fixed address. lvar was: Data type int, stored on the stack.
With optimization, lvar and svar were stored the same way: data type int, not stored at all; instead they are treated as constant values.
(This makes sense because the values of these variables never change.)
The C11 specification (read n1570), like older C standards, does not define at what addresses or offsets global or static variables are stored, so the implementation (your gcc compiler and your ld linker) is free to put them anywhere.
The organization and layout of the data segments is an implementation detail.
You may want to read more about DWARF to understand debug information, which is useful to the gdb debugger.
You may want to read more about linkers and loaders, and about the ELF format, if you want to understand how they are working. On Linux, there are several utilities to inspect elf(5) files, including objdump(1), readelf(1), nm(1).
Notice that your GCC 4.4 is an old and obsolete version of GCC. The current version is GCC 7, and GCC 8 will be released in a few weeks (spring 2018). I strongly recommend upgrading your compiler.
If you need to understand how and why the data segments are organized in such a way and why your implementation chooses such a layout, you could take advantage of the fact that both gcc and ld (from binutils) are free software, and study their source in detail. You'll need many years of work, since they are complex software (more than ten million lines of source code).
If you happen to start studying the internals of GCC, be sure to study a recent version. Most people in the GCC community have probably forgotten the details of GCC 4.4 (released in 2009). A lot of things have changed in GCC since that ancient release. A few years ago I wrote many slides about GCC internals; see the documentation of GCC MELT.
BTW, the layout of data segments, or of variables inside them, might vary with optimization options. It might happen that lvar does not sit in memory (e.g. stays in a register only); it could happen that a static variable is removed (using something like the as-if rule) etc.
For a single translation unit foo.c, you might compile it into assembler code using gcc -fverbose-asm -S -O foo.c and look into the emitted foo.s assembler code.
To understand more about how your ld linker works, you might look into some relevant linker script. You can find out how ld is invoked from gcc by using gcc -v (instead of gcc) in your compilation and linking command.
In most cases, you should not care about the particular offsets (in object files or executables) or addresses (in the virtual address space of your process) of global or static variables. Be also aware of ASLR. The proc(5) filesystem can be used to understand your process.
(your question is severely lacking some motivation and context)
I'm working on AIX with IBM's XL C compiler. I'm catching a compile error and I'm not sure how to proceed:
$ xlc -g3 -O0 -qarch=pwr8 -qaltivec fips197-p8.c -o fips197-p8.exe
"fips197-p8.c", line 59.16: 1506-754 (W) The parameter type is not valid for a function of this linkage type.
The relevant source code is shown below. The complete source code is available at fips197-p8.c. The source code is a test driver for the Power 8 __vcipher and __vcipherlast built-ins. It has a main and a few C functions. Effectively, it is a minimal complete working example for Power 8 AES.
$ cat -n fips197-p8.c
...
11 #if defined(__xlc__) || defined(__xlC__)
12 // #include <builtins.h>
13 #include <altivec.h>
14 typedef vector unsigned char uint8x16_p8;
15 typedef vector unsigned int uint64x2_p8;
16 #else
17 #include <altivec.h>
18 typedef vector unsigned char uint8x16_p8;
19 typedef vector unsigned long long uint64x2_p8;
20 #endif
...
52 uint8x16_p8 Load8x16(const uint8_t src[16])
53 {
54 #if defined(__xlc__) || defined(__xlC__)
55 /* IBM XL C/C++ compiler */
56 # if defined(__LITTLE_ENDIAN__)
57 return vec_xl_be(0, src);
58 # else
59 return vec_xl(0, src);
60 # endif
61 #else
62 /* GCC, Clang, etc */
63
64 #endif
65 }
The compiler version is shown below. We don't control the compiler, so this is what we have:
$ xlc -qversion
IBM XL C/C++ for AIX, V13.1.3 (5725-C72, 5765-J07)
Version: 13.01.0003.0000
vec_xl is fine on little-endian. vec_xl for big-endian is what's giving us the trouble.
What is the problem, and how do I fix it?
So a little guesswork (confirmed by OP comments, since the fix works) led me to think that this cryptic and obscure "The parameter type is not valid for a function of this linkage type." message (Google's first match for it is this very question!) could be a qualifier issue.
Since your contract is
uint8x16_p8 Load8x16(const uint8_t src[16])
it is possible that, given the options & the current endianness, the compiler/prototype believes that vec_xl_be expects a non-const parameter as src.
So passing a const violates the contract (and that's the nicest way xlc could find to notify you).
So either change to
uint8x16_p8 Load8x16(uint8_t src[16])
(with the risk of dropping constant constraints for all callers)
or drop the const by a non-const cast (like we do when the prototype lacks const, but the data is in fact not modified in the function):
vec_xl_be(0,(uint8_t*)src);
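Putting it together, the accessor can keep its const-qualified signature and shed the qualifier only at the intrinsic call (a sketch, under the assumption that vec_xl_be/vec_xl really do leave the buffer unmodified):

uint8x16_p8 Load8x16(const uint8_t src[16])
{
#if defined(__LITTLE_ENDIAN__)
    /* the cast only satisfies the intrinsic's non-const prototype; no write occurs */
    return vec_xl_be(0, (uint8_t *)src);
#else
    return vec_xl(0, (uint8_t *)src);
#endif
}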
Is there a C library for assembling a x86/x64 assembly string to opcodes?
Example code:
/* size_t assemble(char *string, int asm_flavor, char *out, size_t max_size); */
unsigned char bytes[32];
size_t size = assemble("xor eax, eax\n"
                       "inc eax\n"
                       "ret",
                       asm_x64, bytes, 32);
for (size_t i = 0; i < size; i++) {
    printf("%02x ", bytes[i]);
}
/* Output: 31 C0 40 C3 */
I have looked at asmpure; however, it needs modifications to run on non-Windows machines.
I actually both need an assembler and a disassembler, is there a library which provides both?
There is a library that is seemingly a ghost; its existence is widely unknown:
XED (X86 Encoder Decoder)
Intel wrote it: https://software.intel.com/sites/landingpage/pintool/docs/71313/Xed/html/
It can be downloaded with Pin: https://software.intel.com/en-us/articles/pintool-downloads
Sure - you can use llvm. Strictly speaking, it's C++, but there are C interfaces. It will handle both the assembling and disassembling you're trying to do, too.
Here you go:
http://www.gnu.org/software/lightning/manual/lightning.html
GNU Lightning is a C library which is designed to do exactly what you want. It uses a portable assembly language, though, rather than an x86-specific one. The portable assembly is compiled at run time to machine-specific code in a very straightforward manner.
As an added bonus, it is much smaller and simpler to start using than LLVM (which is rather big and cumbersome).
You might want libyasm (the backend YASM uses). You can use the frontends as examples (most particularly, YASM's driver).
I'm using fasm.dll: http://board.flatassembler.net/topic.php?t=6239
Don't forget to write "use32" at the beginning of code if it's not in PE format.
Keystone seems like a great choice now; however, it didn't exist when I asked this question.
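For reference, a minimal sketch using Keystone's C API, matching the 32-bit output from the question (error handling kept to a minimum):

#include <stdio.h>
#include <keystone/keystone.h>

int main(void) {
    ks_engine *ks;
    unsigned char *encode;
    size_t size, count;

    if (ks_open(KS_ARCH_X86, KS_MODE_32, &ks) != KS_ERR_OK)
        return 1;
    if (ks_asm(ks, "xor eax, eax; inc eax; ret", 0, &encode, &size, &count) == 0) {
        for (size_t i = 0; i < size; i++)
            printf("%02x ", encode[i]);   /* prints: 31 c0 40 c3 */
        printf("\n");
        ks_free(encode);
    }
    ks_close(ks);
    return 0;
}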
Write the assembly into its own file, and then call it from your C program using extern. You have to do a little bit of makefile trickery, but otherwise it's not so bad.
Your assembly code has to follow C conventions, so it should look like
global _myfunc
_myfunc:
    push ebp            ; create new stack frame for procedure
    mov ebp, esp        ;
    sub esp, 0x40       ; 64 bytes of local stack space
    mov ebx, [ebp+8]    ; first parameter to function
    ; some more code
    leave               ; return to C program's frame
    ret                 ; exit
To get at the contents of C variables, or to declare variables which C can access, you need only declare the names as GLOBAL or EXTERN. (Again, the names require leading underscores.) Thus, a C variable declared as int i can be accessed from assembler as
extern _i
mov eax,[_i]
And to declare your own integer variable which C programs can access as extern int j, you do this (making sure you are assembling in the _DATA segment, if necessary):
global _j
_j dd 0
Your C code should look like
extern void myasmfunc(variable a);

int main(void)
{
    myasmfunc(a);
}
Compile the files, then link them using
gcc mycfile.o myasmfile.o
I'm currently trying to create a C source code which properly handles I/O whatever the endianness of the target system.
I've selected "little endian" as my I/O convention, which means that, for big endian CPU, I need to convert data while writing or reading.
Conversion is not the issue. The problem I face is detecting endianness, preferably at compile time (since CPUs do not change endianness in the middle of execution...).
Up to now, I've been using this :
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
...
#else
...
#endif
It's documented as a GCC pre-defined macro, and Visual seems to understand it too.
However, I've received reports that the check fails on some big-endian systems (PowerPC).
So I'm looking for a foolproof solution, which ensures that endianness is correctly detected, whatever the compiler and the target system... well, most of them at least.
[Edit] : Most of the solutions proposed rely on run-time tests. These tests can sometimes be evaluated by the compiler during compilation, and therefore cost no real runtime performance.
However, branching with some kind of << if (0) { ... } else { ... } >> is not enough. In the current implementation, variable and function declarations depend on big-endian detection, and these cannot be switched with an if statement.
Well, obviously, there is a fall-back plan, which is to rewrite the code...
I would prefer to avoid that, but, well, it looks like a diminishing hope...
[Edit 2] : I have tested run-time tests by deeply modifying the code. Although they do their job correctly, they also impact performance.
I was expecting that, since the tests have predictable outcomes, the compiler could eliminate the dead branches. But unfortunately, it doesn't work all the time. MSVC is a good compiler and successfully eliminates the dead branches, but GCC has mixed results, depending on the version and the kind of test, with a greater impact on 64-bit than on 32-bit.
It's strange. And it also means that run-time tests cannot be guaranteed to be optimized out by the compiler.
[Edit 3] : These days, I'm using a compile-time constant union, expecting the compiler to fold it to a clear yes/no signal.
And it works pretty well:
https://godbolt.org/g/DAafKo
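For reference, the idiom looks roughly like this (a sketch from memory, not a copy of the linked godbolt code):

/* a constant union; optimizers typically fold the test to a constant */
static int is_little_endian(void)
{
    const union { unsigned u; unsigned char c[sizeof(unsigned)]; } one = { 1 };
    return one.c[0];   /* 1 on little-endian, 0 on big-endian */
}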
As stated earlier, the only "real" way to detect Big Endian is to use runtime tests.
However, sometimes, a macro might be preferred.
Unfortunately, I've not found a single "test" that detects this situation; rather, a collection of them.
For example, GCC recommends: __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__. However, this only works with recent versions, while earlier versions (and other compilers) will evaluate the test as true, because both undefined macros expand to 0 and 0 == 0. So you need the more complete version: defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
OK, now this works for the newest GCC, but what about other compilers?
You may try __BIG_ENDIAN__ or __BIG_ENDIAN or _BIG_ENDIAN which are often defined on big endian compilers.
This will improve detection. But if you specifically target PowerPC platforms, you can add a few more tests to improve detection even further. Try _ARCH_PPC or __PPC__ or __PPC or PPC or __powerpc__ or __powerpc or even powerpc. Bind all these defines together (see the sketch below), and you have a pretty fair chance of detecting big-endian systems, and PowerPC in particular, whatever the compiler and its version.
So, to summarize, there is no single "standard pre-defined macro" that is guaranteed to detect big-endian CPUs on all platforms and compilers, but there are many such pre-defined macros which, taken collectively, give a high probability of correctly detecting big endian under most circumstances.
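Bound together, the tests might look like this (a sketch; the DETECTED_BIG_ENDIAN name is mine, and note that the PowerPC group is only a heuristic, since little-endian PowerPC targets define some of these macros too):

/* combine the preprocessor tests listed above */
#if (defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)) || \
    defined(__BIG_ENDIAN__) || defined(__BIG_ENDIAN) || defined(_BIG_ENDIAN) || \
    defined(_ARCH_PPC) || defined(__PPC__) || defined(__PPC) || defined(PPC) || \
    defined(__powerpc__) || defined(__powerpc) || defined(powerpc)
#  define DETECTED_BIG_ENDIAN 1
#else
#  define DETECTED_BIG_ENDIAN 0
#endif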
At compile time in C you can't do much more than trusting preprocessor #defines, and there are no standard solutions because the C standard isn't concerned with endianness.
Still, you could add a runtime assertion at the start of the program to make sure the assumption made when compiling was true:
static inline int IsBigEndian(void)
{
    int i = 1;
    return ! *((char *)&i);
}

/* ... */

#if defined(COMPILED_FOR_BIG_ENDIAN)
assert(IsBigEndian());
#elif defined(COMPILED_FOR_LITTLE_ENDIAN)
assert(!IsBigEndian());
#else
#error "No endianness macro defined"
#endif
(where COMPILED_FOR_BIG_ENDIAN and COMPILED_FOR_LITTLE_ENDIAN are macros #defined previously according to your preprocessor endianness checks)
Instead of looking for a compile-time check, why not just use big-endian order (which is considered the "network order" by many) and use the htons/htonl/ntohs/ntohl functions provided by most UNIX-systems and Windows. They're already defined to do the job you're trying to do. Why reinvent the wheel?
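For example, a writer that always emits big-endian ("network order") on any host might look like this (a sketch; write_u32 is my name, not a standard function):

#include <stdint.h>
#include <stdio.h>
#include <arpa/inet.h>   /* htonl; use <winsock2.h> on Windows */

/* write v in network (big-endian) byte order regardless of host endianness */
static int write_u32(FILE *f, uint32_t v)
{
    uint32_t be = htonl(v);
    return fwrite(&be, sizeof be, 1, f) == 1 ? 0 : -1;
}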
Try something like:
if (*(char *)(int[]){1}) {
    /* little endian code */
} else {
    /* big endian code */
}
and see if your compiler resolves it at compile-time. If not, you might have better luck doing the same with a union. Actually I like defining macros using unions that evaluate to 0,1 or 1,0 (respectively) so that I can just do things like accessing buf[HI] and buf[LO].
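That HI/LO idiom might be sketched like this (the probe name is mine; the macros are run-time expressions, though compilers can usually fold them to constants):

/* index of the low/high byte of a 16-bit value in a 2-byte buffer */
static const union { unsigned short u; unsigned char c[2]; } byte_order_probe = { 0x0001 };
#define LO (byte_order_probe.c[0] ? 0 : 1)   /* little-endian: low byte comes first */
#define HI (1 - LO)

With these, buf[HI] and buf[LO] address the right bytes on either byte order.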
Notwithstanding compiler-defined macros, I don't think there's a compile-time way to detect this, since determining the endianness of an architecture involves analyzing the manner in which it stores data in memory.
Here's a function which does just that:
#include <stdbool.h>

bool IsLittleEndian(void) {
    int i = 1;
    return (int)*((unsigned char *)&i) == 1;
}
As others have pointed out, there isn't a portable way to check for endianness at compile-time. However, one option would be to use the autoconf tool as part of your build script to detect whether the system is big-endian or little-endian, then to use the AC_C_BIGENDIAN macro, which records the result. In a sense, this builds a program that detects at runtime whether the system is big-endian or little-endian, then has that program output information that can be used statically by the main source code.
Hope this helps!
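Concretely, invoking AC_C_BIGENDIAN in configure.ac defines WORDS_BIGENDIAN in config.h on big-endian targets, so the C side reduces to something like this sketch (byte_swap32 is a hypothetical helper):

#include "config.h"  /* generated by configure; defines WORDS_BIGENDIAN if big-endian */

#ifdef WORDS_BIGENDIAN
#  define SWAP_TO_LE(x) byte_swap32(x)   /* byte_swap32: hypothetical byte-swapping helper */
#else
#  define SWAP_TO_LE(x) (x)
#endif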
This comes from p. 45 of Pointers in C:
#include <stdio.h>

#define BIG_ENDIAN    0
#define LITTLE_ENDIAN 1

int endian()
{
    short int word = 0x0001;
    char *byte = (char *) &word;
    return (byte[0] ? LITTLE_ENDIAN : BIG_ENDIAN);
}

int main(int argc, char* argv[])
{
    int value;
    value = endian();
    if (value == 1)
        printf("The machine is Little Endian\n");
    else
        printf("The machine is Big Endian\n");
    return 0;
}
The ntohl function from the sockets API can be used for this purpose. Source
#include <stdio.h>
#include <arpa/inet.h>

int main() {
    if (ntohl(0x12345678) == 0x12345678) {
        printf("big-endian\n");
    } else if (ntohl(0x12345678) == 0x78563412) {
        printf("little-endian\n");
    } else {
        printf("(stupid)-middle-endian\n");
    }
    return 0;
}
My GCC version is 9.3.0, configured to support the powerpc64 platform, and I've tested and verified that it supports the following macro logic:
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
......
#endif
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
.....
#endif
As of C++20, no more hacks or compiler extensions are necessary.
https://en.cppreference.com/w/cpp/types/endian
std::endian (Defined in header <bit>)
enum class endian
{
little = /*implementation-defined*/,
big = /*implementation-defined*/,
native = /*implementation-defined*/
};
If all scalar types are little-endian, std::endian::native equals std::endian::little
If all scalar types are big-endian, std::endian::native equals std::endian::big
You can't detect it at compile time to be portable across all compilers. Maybe you can change the code to do it at run-time - this is achievable.
It is not possible to detect endianness portably in C with preprocessor directives.
As of 2017-07-18, I use union { unsigned u; unsigned char c[4]; }
If sizeof (unsigned) != 4 your test may fail.
It may be better to use
union { unsigned u; unsigned char c[sizeof (unsigned)]; }
As most have mentioned, compile time is your best bet. Assuming you do not cross-compile and you use cmake (it will also work with other tools, such as a configure script, of course), you can use a pre-test: a compiled .c or .cpp file that gives you the actual, verified endianness of the processor you're running on.
With cmake you include the TestBigEndian module and call test_big_endian(). It sets a variable which you can then pass to your software. Something like this (untested):
include(TestBigEndian)
test_big_endian(IS_BIG_ENDIAN)
...
set(CFLAGS ${CFLAGS} -DIS_BIG_ENDIAN=${IS_BIG_ENDIAN})     # C
set(CXXFLAGS ${CXXFLAGS} -DIS_BIG_ENDIAN=${IS_BIG_ENDIAN}) # C++
Then in your C/C++ code you can check that IS_BIG_ENDIAN define:
#if IS_BIG_ENDIAN
...do big endian stuff here...
#else
...do little endian stuff here...
#endif
So the main problem with such a test is cross compiling since you may be on a completely different CPU with a different endianness... but at least it gives you the endianness at time of compiling the rest of your code and will work for most projects.
I provided a general approach in C, with no preprocessor tricks: only runtime code that computes the endianness of every C type.
The output of this on my Linux x86_64 machine is:
fabrizio@toshibaSeb:~/git/pegaso/scripts$ gcc -o sizeof_endianess sizeof_endianess.c
fabrizio@toshibaSeb:~/git/pegaso/scripts$ ./sizeof_endianess
INTEGER TYPE | signed | unsigned | 0x010203... | Endianness
--------------+---------+------------+-------------------------+--------------
int | 4 | 4 | 04 03 02 01 | little
char | 1 | 1 | - | -
short | 2 | 2 | 02 01 | little
long int | 8 | 8 | 08 07 06 05 04 03 02 01 | little
long long int | 8 | 8 | 08 07 06 05 04 03 02 01 | little
--------------+---------+------------+-------------------------+--------------
FLOATING POINT| size |
--------------+---------+
float | 4
double | 8
long double | 16
Get source at: https://github.com/bzimage-it/pegaso/blob/master/scripts/sizeof_endianess.c
This is a more general approach: do not detect endianness at compilation time (not possible), and do not assume one endianness excludes another. In fact, it is important to remark that endianness is not a property of the architecture/processor as a whole, but of each single type. As argued by
@Christoph at https://stackoverflow.com/a/4712594/3280080, the PDP-11 for example could have different endianness at the same time.
The approach consists of setting an integer to x = 0x010203... for as many bytes as the type holds, then printing it byte by byte, incrementing the address by one.
Can somebody please test it on a big-endian and/or mixed-endian machine?
I know I'm late to this party, but here is my take.
#include <stdint.h>

int is_big_endian(void) {
    return 1 & *(uint16_t *)"01";
}
This is based on the fact that '0' is 48 in decimal and '1' is 49, so '1' has its LSB set while '0' does not. I could make them '\x00' and '\x01', but I think my version makes it more readable.
#define BIG_ENDIAN ((1 >> 1 == 0) ? 0 : 1)
(Note: this expression is a compile-time constant that does not depend on how bytes are stored in memory, so it cannot actually detect endianness.)