Struct layout in apcs-gnu ABI

For this code:
struct S { unsigned char ch[2]; };

int main(void)
{
    _Static_assert(sizeof(struct S) == 2, "size was not 2");
}
using GCC (various versions) for ARM with the apcs-gnu ABI (a.k.a. OABI, or EABI version 0), the assertion fails: it turns out the size of the struct is 4.
I can work around this with __attribute__((packed)), but my questions are:
What is the rationale for making this struct size 4?
Is there any documentation specifying the layout of structs in this ABI?
On the ARM website I found documentation for aapcs (EABI version 5) which does specify this struct as having a size of 2; but I could not find anything about apcs-gnu.

This is a GCC-specific decision to trade off size for performance. It can be overridden with -mstructure-size-boundary=8; a sketch follows the source excerpt below.
An excerpt from source code:
/* Setting STRUCTURE_SIZE_BOUNDARY to 32 produces more efficient code, but the
value set in previous versions of this toolchain was 8, which produces more
compact structures. The command line option -mstructure_size_boundary=<n>
can be used to change this value. For compatibility with the ARM SDK
however the value should be left at 32. ARM SDT Reference Manual (ARM DUI
0020D) page 2-20 says "Structures are aligned on word boundaries".
The AAPCS specifies a value of 8. */
#define STRUCTURE_SIZE_BOUNDARY arm_structure_size_boundary
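To make the trade-off concrete, here is a sketch (hypothetical invocations; the target triplet and the availability of the option vary across toolchains and GCC releases):

/* sketch.c: the struct from the question */
struct S { unsigned char ch[2]; };

/* arm-linux-gcc -mabi=apcs-gnu -c sketch.c
       => sizeof(struct S) == 4 (default boundary of 32 bits)
   arm-linux-gcc -mabi=apcs-gnu -mstructure-size-boundary=8 -c sketch.c
       => sizeof(struct S) == 2 (AAPCS-style packing) */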

Related

How to tell gcc to not align function parameters on the stack?

I am trying to decompile an executable for the 68000 processor into C code, replacing the original subroutines with C functions one by one.
The problem I faced is that I don't know how to make gcc use the calling convention that matches the one used in the original program. I need the parameters on the stack to be packed, not aligned.
Let's say we have the following function
int fun(char arg1, short arg2, int arg3) {
    return arg1 + arg2 + arg3;
}
If we compile it with
gcc -m68000 -Os -fomit-frame-pointer -S source.c
we get the following output
fun:
        move.b 7(%sp),%d0
        ext.w %d0
        move.w 10(%sp),%a0
        lea (%a0,%d0.w),%a0
        move.l %a0,%d0
        add.l 12(%sp),%d0
        rts
As we can see, the compiler assumed that the parameters are at addresses 7(%sp), 10(%sp) and 12(%sp), but to work with the original program they need to be at addresses 4(%sp), 5(%sp) and 7(%sp).
One possible solution is to write the function in the following way (the processor is big-endian):
int fun(int bytes4to7, int bytes8to11) {
    char arg1 = bytes4to7 >> 24;
    short arg2 = (bytes4to7 >> 8) & 0xffff;
    /* shift as unsigned so sign extension cannot clobber the top byte */
    int arg3 = ((bytes4to7 & 0xff) << 24) | ((unsigned)bytes8to11 >> 8);
    return arg1 + arg2 + arg3;
}
However, the code looks messy, and I was wondering: is there a way to both keep the code clean and achieve the desired result?
UPD: I made a mistake. The offsets I'm looking for are actually 5(%sp), 6(%sp) and 8(%sp) (the chars should be aligned like the shorts, but the shorts and the ints are still packed).
Hopefully, this doesn't change the essence of the question.
UPD 2: It turns out that the 68000 C Compiler by Sierra Systems gives the described offsets (as in UPD, with 2-byte alignment).
However, the question is about tweaking calling conventions in gcc (or perhaps another modern compiler).
Here's a way with a packed struct. I compiled it on an x86 with -m32 and got the desired offsets in the disassembly, so I think it should still work for an mc68000:
typedef struct {
    char arg1;
    short arg2;
    int arg3;
} __attribute__((__packed__)) fun_t;

int
fun(fun_t fun)
{
    return fun.arg1 + fun.arg2 + fun.arg3;
}
But, I think there's probably a still cleaner way. It would require knowing more about the other code that generates such a calling sequence. Do you have the source code for it?
Does the other code have to remain in asm? With the source, you could adjust the offsets in the asm code to be compatible with modern C ABI calling conventions.
I've been programming in C since 1981 and spent years doing mc68000 C and assembler code (for apps, kernel, device drivers), so I'm somewhat familiar with the problem space.
It's not a gcc 'fault'; it is the 68k architecture that requires the stack to always be aligned on 2 bytes.
So there is simply no way to break 2-byte alignment on the hardware stack.
but to work with the original program they need to have addresses
4(%sp), 5(%sp) and 7(%sp):
Accessing word or long values at an odd memory address immediately triggers an alignment exception on the 68000.
To get integral parameters passed using 2-byte alignment instead of 4-byte alignment, you can change the default int size to 16 bits with -mshort. You then need to replace every int in your code with long (if you want them to stay 32 bits wide). The crude way to do that is to also pass -Dint=long to your compiler. Obviously, this breaks ABI compatibility with object files compiled with -mno-short (which appears to be the default for gcc). A sketch of the idea follows.
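A minimal sketch of the -mshort rewrite (untested; assumes an m68k cross-gcc and that every 32-bit quantity is spelled long):

/* compile with: m68k-elf-gcc -mshort -m68000 -Os -S source.c
   under -mshort, int is 16-bit, so 32-bit values must be long */
long fun(char arg1, short arg2, long arg3)
{
    return arg1 + arg2 + arg3;
}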

Is there any difference between mmap vs mmap64?

On a 64-bit machine, is there any difference between mmap vs mmap64?
There are other such pairs, for example fstat64 vs fstat.
answer:
On a 64-bit Ubuntu 18 LTS, I verified that:
mmap and mmap64 have the same function address;
off_t and off64_t are both 64-bit;
fstat/stat are able to return file sizes > 2 GiB.
code:
#include <sys/mman.h>
#include <sys/stat.h>
#include <iostream>
using namespace std;
int main(){
    cout << sizeof(off_t) << endl;
    void *a = (void *)&mmap64;
    void *b = (void *)&mmap;
    cout << (a == b) << endl; // same addr
    a = (void *)&fstat64;
    b = (void *)&fstat;
    cout << (a == b) << endl; // diff addr, but both can return > 2 GiB sizes
}
On a 64-bit machine, is there any difference between mmap vs mmap64?
None.
The *64 interfaces were introduced to enable Large File Support on 32-bit systems. It makes no difference on 64-bit systems.
However, the 64-bit interfaces are not actually meant to be used directly (they are not part of POSIX), so you should not call the *64 interfaces yourself. If you happen to need them on 32-bit systems, use glibc's feature test macros instead (e.g., _FILE_OFFSET_BITS); a sketch follows the quotes below.
Macro: _FILE_OFFSET_BITS
This macro determines which file system interface shall be used, one replacing the other. Whereas _LARGEFILE64_SOURCE makes the 64 bit interface available as an additional interface, _FILE_OFFSET_BITS allows the 64 bit interface to replace the old interface.
If _FILE_OFFSET_BITS is undefined, or if it is defined to the value 32, nothing changes. The 32 bit interface is used and types like off_t have a size of 32 bits on 32 bit systems.
If the macro is defined to the value 64, the large file interface replaces the old interface. I.e., the functions are not made available under different names (as they are with _LARGEFILE64_SOURCE). Instead the old function names now reference the new functions, e.g., a call to fseeko now indeed calls fseeko64.
This macro should only be selected if the system provides mechanisms for handling large files. On 64 bit systems this macro has no effect since the *64 functions are identical to the normal functions.
This macro was introduced as part of the Large File Support extension (LFS).
The mmap64() function is identical to the mmap() function except that it can be used to map memory from files that are larger than 2 gigabytes into the process memory. The mmap64() function is a part of the large file extensions.
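A sketch of the recommended recipe (define the macro before any system header, or pass -D_FILE_OFFSET_BITS=64 on the command line):

/* With _FILE_OFFSET_BITS=64, off_t, mmap, fstat, etc. transparently become
   64-bit capable on 32-bit systems; on 64-bit systems this is a no-op. */
#define _FILE_OFFSET_BITS 64
#include <sys/stat.h>
#include <stdio.h>

int main(void)
{
    printf("sizeof(off_t) = %zu\n", sizeof(off_t)); /* 8 when LFS is in effect */
    return 0;
}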

What are the semantics of structure padding/packing in the Linux kernel?

I am interested in the semantics of structure padding and packing, specifically in relation to the structures returned from the Linux kernel.
For example, if a program and its stdlib are compiled so that structure padding doesn't take place, while the kernel is compiled so that structure padding does take place (which IIRC is the default for GCC anyway), surely the program cannot run, since the structures returned from the kernel are garbage from its point of view.
What if the compiler in question changed its padding semantics over time? Surely the same problem would crop up. The structures defined in /usr/include/linux/* and /usr/include/asm-generic/* do not appear to be packed, so they depend on the compiler used and on that compiler's alignment semantics, right?
But I can take a binary compiled years ago on a different computer, with different memory alignment requirements and presumably different padding semantics, and run it on my modern computer, and it appears to work fine.
How does it not see garbage? Is this just pure luck? Do compiler authors (say, of TCC and the like) take care to copy GCC's structure padding semantics? How is this potential problem dealt with in the real world?
The structures defined in /usr/include/linux/* and
/usr/include/asm-generic/* do not appear to be packed, so they
depend on the compiler used and the alignment semantics of said
compiler, right?
That's not true, generally. Here is an example from GCC on 64-bit Ubuntu (/usr/include/x86_64-linux-gnu/asm/stat.h):
struct stat {
    __kernel_ulong_t st_dev;
    __kernel_ulong_t st_ino;
    __kernel_ulong_t st_nlink;
    unsigned int st_mode;
    unsigned int st_uid;
    unsigned int st_gid;
    unsigned int __pad0;
    __kernel_ulong_t st_rdev;
    __kernel_long_t st_size;
    __kernel_long_t st_blksize;
    __kernel_long_t st_blocks; /* Number 512-byte blocks allocated. */
    __kernel_ulong_t st_atime;
    __kernel_ulong_t st_atime_nsec;
    __kernel_ulong_t st_mtime;
    __kernel_ulong_t st_mtime_nsec;
    __kernel_ulong_t st_ctime;
    __kernel_ulong_t st_ctime_nsec;
    __kernel_long_t __unused[3];
};
See __pad0? int is generally 4 bytes, but st_rdev is a long, which is 8 bytes, so it must be 8-byte aligned. However, it is preceded by three 4-byte ints = 12 bytes (after three 8-byte fields), so an explicit 4-byte __pad0 is added.
Essentially, the implementation of the stdlib takes care to hard-code its ABI; you can verify the resulting offset with the sketch below.
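A sketch of that check (the expected value is for x86-64 glibc; other ABIs differ):

/* Three 8-byte fields (24) + three 4-byte fields (12) + __pad0 (4) = 40. */
#include <stddef.h>
#include <stdio.h>
#include <sys/stat.h>

int main(void)
{
    printf("offsetof(struct stat, st_rdev) = %zu\n",
           offsetof(struct stat, st_rdev));
    return 0;
}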
BUT that isn't true for all APIs. Here is struct flock (from the same machine, /usr/include/asm-generic/fcntl.h) used by the fcntl() call:
struct flock {
    short l_type;
    short l_whence;
    __kernel_off_t l_start;
    __kernel_off_t l_len;
    __kernel_pid_t l_pid;
    __ARCH_FLOCK_PAD
};
As you can see, there is no padding between l_whence and l_start. And indeed, for the following C program, saved as abi.c:
#include <fcntl.h>
#include <string.h>

int main(int argc, char **argv)
{
    struct flock fl;
    int fd;

    fd = open("y", O_RDWR);
    memset(&fl, 0xff, sizeof(fl));
    fl.l_type = F_RDLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 200;
    fl.l_len = 1;
    fcntl(fd, F_SETLK, &fl);
}
We get:
$ cc -g -o abi abi.c && strace -e fcntl ./abi
fcntl(3, F_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=200, l_len=1}) = 0
+++ exited with 0 +++
$ cc -g -fpack-struct -o abi abi.c && strace -e fcntl ./abi
fcntl(3, F_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=4294967296, l_len=-4294967296}) = 0
+++ exited with 0 +++
As you can see, the fields following l_whence are indeed garbage.
Moreover, C itself has no ABI, so this fragile compatibility relies on implementations playing nice. struct stat above assumes that the compiler won't insert extra, gratuitous padding.
ANSI C says:
There may also be unnamed padding at the end of a structure or union, as necessary to achieve the appropriate alignment were the structure or union to be a member of an array.
There's no wording on how padding may be inserted in the middle of a struct for reasons other than alignment, however there's also:
Implementation-defined behavior
Each implementation shall document its behavior in each of the areas listed in this section. The following are implementation-defined:
...
The padding and alignment of members of structures. This should present no problem unless binary data written by one implementation are read by another.
On my Ubuntu machine, both the compiler and the standard library come from the GNU project, so they interoperate smoothly. Clang wants to grow, so it stays compatible with GNU libc. Everyone is just playing nice, most of the time.

Assembly label address incorrect on 32-bit processors

I have some simple code that finds the difference between two assembly labels:
#include <stdio.h>
static void foo(void){
    __asm__ __volatile__("_foo_start:");
    printf("Hello, world.\n");
    __asm__ __volatile__("_foo_end:");
}

int main(void){
    extern const char foo_start[], foo_end[];
    printf("foo_start: %p, foo_end: %p\n", foo_start, foo_end);
    printf("Difference = 0x%tx.\n", foo_end - foo_start);
    foo();
    return 0;
}
Now, this code works perfectly on 64-bit processors, just as you would expect. However, on 32-bit processors, the address of foo_start is the same as foo_end.
I'm sure it has to do with 32-bit vs 64-bit. On i386 the difference comes out as 0x0, on x86_64 as 0x7; on ARMv7 (32-bit) it is 0x0, while on ARM64 it is 0xC. (The 64-bit results are correct; I checked them with a disassembler.)
I'm using Clang+LLVM to compile.
I'm wondering if it has to do with non-lazy pointers. In the assembly output of both 32-bit processor archs mentioned above, they have something like this at the end:
L_foo_end$non_lazy_ptr:
.indirect_symbol _foo_end
.long 0
L_foo_start$non_lazy_ptr:
.indirect_symbol _foo_start
.long 0
However, this is not present in the assembly output of both x86_64 and ARM64. I messed with removing the non-lazy pointers and addressing the labels directly yesterday, but to no avail. Any ideas on why this happens?
EDIT:
It appears that when compiled for 32-bit processors, foo_start[] and foo_end[] point to main. I... I'm so confused.
I didn't check on real code, but I suspect you are a victim of instruction reordering. As long as you do not define proper memory barriers, the compiler is free to move your code around within the function as it sees fit, since there is no interdependency between the labels and the printf() call.
Try adding ::: "memory" to your asm statements, which should nail them where you wrote them; a sketch follows.
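A sketch of that suggestion (untested, since the answer itself is a hypothesis):

static void foo(void)
{
    /* the "memory" clobber stops the compiler from moving memory accesses,
       including the printf call's effects, across the asm statements */
    __asm__ __volatile__("_foo_start:" ::: "memory");
    printf("Hello, world.\n");
    __asm__ __volatile__("_foo_end:" ::: "memory");
}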
I finally found the solution (or an alternative, I suppose). Apparently the && operator can be used to take the address of a C label, removing the need for inline assembly at all. It's not in the C standard; it's GCC's "labels as values" extension, which Clang supports as well.
#include <stdio.h>

int main(void){
foo_start:
    printf("Hello, world.\n");
foo_end:
    printf("Foo has ended.");
    void *foo_start_ptr = &&foo_start;
    void *foo_end_ptr = &&foo_end;
    printf("foo_start: %p, foo_end: %p\n", foo_start_ptr, foo_end_ptr);
    /* %tx expects a ptrdiff_t, so subtract as char pointers */
    printf("Difference: 0x%tx\n", (char *)foo_end_ptr - (char *)foo_start_ptr);
    return 0;
}
Now, this only works if the labels are in the same function, but for what I intend to use this for, it's perfect. No more ASM, and it doesn't leave a symbol behind. It appears to work just how I need it to. (Not tested on ARM64)

Detecting Endianness

I'm currently trying to create a C source code which properly handles I/O whatever the endianness of the target system.
I've selected "little endian" as my I/O convention, which means that, for big endian CPU, I need to convert data while writing or reading.
Conversion is not the issue. The problem I face is detecting endianness, preferably at compile time (since CPUs do not change endianness in the middle of execution...).
Up to now, I've been using this :
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
...
#else
...
#endif
It's documented as a GCC pre-defined macro, and Visual Studio seems to understand it too.
However, I've received reports that the check fails on some big-endian systems (PowerPC).
So, I'm looking for a foolproof solution which ensures that endianness is correctly detected, whatever the compiler and the target system. Well, most of them at least...
[Edit]: Most of the proposed solutions rely on run-time tests. These tests may sometimes be properly evaluated by compilers during compilation, and therefore cost no real runtime performance.
However, branching with some kind of if (0) { ... } else { ... } is not enough. In the current code implementation, variable and function declarations depend on big-endian detection. These cannot be changed with an if statement.
Well, obviously, there is a fallback plan, which is to rewrite the code...
I would prefer to avoid that, but, well, it looks like a diminishing hope...
[Edit 2]: I have tested run-time tests by deeply modifying the code. Although they do their job correctly, these tests also impact performance.
I was expecting that, since the tests have predictable output, the compiler could eliminate the dead branches. But unfortunately, it doesn't work all the time. MSVC is a good compiler and successfully eliminates the dead branches, but GCC has mixed results, depending on version and kind of test, and with a greater impact on 64-bit than on 32-bit targets.
It's strange. And it also means that run-time tests cannot be guaranteed to be optimized out by the compiler.
[Edit 3]: These days, I'm using a compile-time constant union, expecting the compiler to resolve it to a clear yes/no signal.
And it works pretty well (a sketch follows the link):
https://godbolt.org/g/DAafKo
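For reference, a minimal sketch of that union trick, along the lines of the godbolt link above:

/* A constant union: decent optimizers fold this to a compile-time constant.
   It reads the first byte of the object representation of 1. */
static int is_little_endian(void)
{
    const union { unsigned u; unsigned char c[sizeof(unsigned)]; } one = { 1 };
    return one.c[0];
}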
As stated earlier, the only "real" way to detect Big Endian is to use runtime tests.
However, sometimes, a macro might be preferred.
Unfortunately, I've not found a single test that detects this situation; rather, a collection of them.
For example, GCC recommends: __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__. However, this only works with recent versions, and earlier versions (and other compilers) will evaluate this test as true, since undefined macros expand to 0 in #if, making the comparison 0 == 0. So you need the more complete version: defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)
OK, now this works for the newest GCC, but what about other compilers?
You may try __BIG_ENDIAN__ or __BIG_ENDIAN or _BIG_ENDIAN which are often defined on big endian compilers.
This improves detection. But if you specifically target PowerPC platforms, you can add a few more tests to improve it even further. Try _ARCH_PPC, __PPC__, __PPC, PPC, __powerpc__, __powerpc or even powerpc. OR all these defines together, and you have a pretty fair chance of detecting big-endian systems, and PowerPC in particular, whatever the compiler and its version.
So, to summarize, there is no "standard pre-defined macro" guaranteed to detect big-endian CPUs on all platforms and compilers, but many such pre-defined macros, taken collectively, give a high probability of correctly detecting big-endian under most circumstances; a combined sketch follows.
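A sketch of such a combined test (the macro name is hypothetical, and the PowerPC macros are only a heuristic, since little-endian PowerPC also exists):

/* OR together the compiler- and platform-specific hints discussed above. */
#if (defined(__BYTE_ORDER__) && (__BYTE_ORDER__ == __ORDER_BIG_ENDIAN__)) \
 || defined(__BIG_ENDIAN__) || defined(__BIG_ENDIAN) || defined(_BIG_ENDIAN) \
 || defined(_ARCH_PPC) || defined(__PPC__) || defined(__PPC) || defined(PPC) \
 || defined(__powerpc__) || defined(__powerpc) || defined(powerpc)
#  define DETECTED_BIG_ENDIAN 1
#else
#  define DETECTED_BIG_ENDIAN 0
#endif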
At compile time in C you can't do much more than trusting preprocessor #defines, and there are no standard solutions because the C standard isn't concerned with endianness.
Still, you could add an assertion that is done at runtime at the start of the program to make sure that the assumption done when compiling was true:
inline int IsBigEndian()
{
    int i = 1;
    return ! *((char *)&i);
}

/* ... */

#ifdef COMPILED_FOR_BIG_ENDIAN
assert(IsBigEndian());
#elif defined(COMPILED_FOR_LITTLE_ENDIAN)
assert(!IsBigEndian());
#else
#error "No endianness macro defined"
#endif
(where COMPILED_FOR_BIG_ENDIAN and COMPILED_FOR_LITTLE_ENDIAN are macros #defined previously according to your preprocessor endianness checks)
Instead of looking for a compile-time check, why not just use big-endian order (which is considered the "network order" by many) and use the htons/htonl/ntohs/ntohl functions provided by most UNIX systems and by Windows? They're already defined to do the job you're trying to do. Why reinvent the wheel? A sketch follows.
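A sketch of that approach: values are written in network byte order, so no endianness detection is needed at all.

#include <arpa/inet.h> /* htonl; on Windows this comes from winsock2.h */
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint32_t value = 0x12345678;
    uint32_t wire = htonl(value); /* big-endian on every host */
    fwrite(&wire, sizeof wire, 1, stdout);
    return 0;
}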
Try something like:
if (*(char *)(int[]){1}) {
    /* little endian code */
} else {
    /* big endian code */
}
and see if your compiler resolves it at compile time. If not, you might have better luck doing the same with a union. Actually, I like defining macros using unions that evaluate to 0,1 or 1,0 (respectively) so that I can just do things like accessing buf[HI] and buf[LO]; a sketch follows.
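A sketch of those HI/LO macros (the names are hypothetical; HI and LO index the high- and low-order bytes of a 2-byte buffer):

/* IS_LITTLE reads the first byte of a 2-byte 1: 1 on little endian, 0 on big. */
#define IS_LITTLE (*(const unsigned char *)(const unsigned short[]){1})
#define LO (IS_LITTLE ? 0 : 1) /* index of the low-order byte */
#define HI (1 - LO)            /* index of the high-order byte */

/* usage: store a 16-bit value into a buffer in native byte order */
/*   unsigned char buf[2]; buf[LO] = v & 0xff; buf[HI] = v >> 8;  */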
Notwithstanding compiler-defined macros, I don't think there's a compile-time way to detect this, since determining the endianness of an architecture involves analyzing the manner in which it stores data in memory.
Here's a function which does just that:
#include <stdbool.h> /* for bool */

bool IsLittleEndian () {
    int i = 1;
    return (int)*((unsigned char *)&i) == 1;
}
As others have pointed out, there isn't a portable way to check for endianness at compile time. However, one option is to use the autoconf tool as part of your build script, with the AC_C_BIGENDIAN macro, which detects whether the target is big-endian or little-endian (recording the result in the WORDS_BIGENDIAN define). In a sense, this builds a program that detects at runtime whether the system is big-endian or little-endian, then has that program output information that can be used statically by the main source code.
Hope this helps!
This comes from p. 45 of Pointers in C:
#include <stdio.h>

#define BIG_ENDIAN 0
#define LITTLE_ENDIAN 1

int endian()
{
    short int word = 0x0001;
    char *byte = (char *)&word;
    return (byte[0] ? LITTLE_ENDIAN : BIG_ENDIAN);
}

int main(int argc, char *argv[])
{
    int value;
    value = endian();
    if (value == 1)
        printf("The machine is Little Endian\n");
    else
        printf("The machine is Big Endian\n");
    return 0;
}
The sockets function ntohl can be used for this purpose:
// Soner
#include <stdio.h>
#include <arpa/inet.h>

int main() {
    if (ntohl(0x12345678) == 0x12345678) {
        printf("big-endian\n");
    } else if (ntohl(0x12345678) == 0x78563412) {
        printf("little-endian\n");
    } else {
        printf("(stupid)-middle-endian\n");
    }
    return 0;
}
My GCC version is 9.3.0, configured to support the powerpc64 platform, and I've tested and verified that it supports the following macro logic:
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
......
#endif
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
.....
#endif
As of C++20, no more hacks or compiler extensions are necessary.
https://en.cppreference.com/w/cpp/types/endian
std::endian (Defined in header <bit>)
enum class endian
{
little = /*implementation-defined*/,
big = /*implementation-defined*/,
native = /*implementation-defined*/
};
If all scalar types are little-endian, std::endian::native equals std::endian::little
If all scalar types are big-endian, std::endian::native equals std::endian::big
You can't detect it at compile time in a way that is portable across all compilers. Maybe you can change the code to do it at run-time; this is achievable.
It is not possible to detect endianness portably in C with preprocessor directives.
As of 2017-07-18, I use union { unsigned u; unsigned char c[4]; }.
If sizeof(unsigned) != 4, the test may fail. It may be better to use
union { unsigned u; unsigned char c[sizeof(unsigned)]; }
As most have mentioned, compile time is your best bet. Assuming you do not cross-compile and you use cmake (this also works with other tools, such as a configure script), you can use a pre-test: a compiled .c or .cpp file that reports the actual, verified endianness of the processor you're building on.
With cmake you use the TestBigEndian module. It sets a variable which you can then pass to your software. Something like this (untested):
include(TestBigEndian)
test_big_endian(IS_BIG_ENDIAN)
...
set(CFLAGS ${CFLAGS} -DIS_BIG_ENDIAN=${IS_BIG_ENDIAN})       # C
set(CXXFLAGS ${CXXFLAGS} -DIS_BIG_ENDIAN=${IS_BIG_ENDIAN})   # C++
Then in your C/C++ code you can check that IS_BIG_ENDIAN define:
#if IS_BIG_ENDIAN
...do big endian stuff here...
#else
...do little endian stuff here...
#endif
So the main problem with such a test is cross-compiling, since you may be targeting a completely different CPU with a different endianness... but at least it gives you the endianness at the time of compiling the rest of your code, and it will work for most projects.
I provide a general approach in C with no preprocessor involvement: runtime-only code that computes the endianness of every C type.
The output of this on my Linux x86_64 machine is:
fabrizio@toshibaSeb:~/git/pegaso/scripts$ gcc -o sizeof_endianess sizeof_endianess.c
fabrizio@toshibaSeb:~/git/pegaso/scripts$ ./sizeof_endianess
INTEGER TYPE | signed | unsigned | 0x010203... | Endianess
--------------+---------+------------+-------------------------+--------------
int | 4 | 4 | 04 03 02 01 | little
char | 1 | 1 | - | -
short | 2 | 2 | 02 01 | little
long int | 8 | 8 | 08 07 06 05 04 03 02 01 | little
long long int | 8 | 8 | 08 07 06 05 04 03 02 01 | little
--------------+---------+------------+-------------------------+--------------
FLOATING POINT| size |
--------------+---------+
float | 4
double | 8
long double | 16
Get source at: https://github.com/bzimage-it/pegaso/blob/master/scripts/sizeof_endianess.c
This is a more general approach: it does not detect endianness at compile time (not possible), nor does it assume that one type's endianness excludes another for a different type. In fact, it is important to remark that endianness is not a property of the architecture/processor as a whole, but of each individual type. As argued by @Christoph at https://stackoverflow.com/a/4712594/3280080, the PDP-11, for example, can have different endianness for different types at the same time.
The approach consists of setting an integer to x = 0x010203... for however many bytes it has, then printing it byte by byte, casting to a single byte and incrementing the address by one.
Can somebody please test it on a big-endian and/or mixed-endian system?
I know I'm late to this party, but here is my take.
#include <stdint.h> /* for uint16_t */

int is_big_endian() {
    return 1 & *(uint16_t *)"01";
}

This is based on the fact that '0' is 48 in decimal and '1' is 49, so '1' has its LSB set, while '0' does not. I could make them '\x00' and '\x01', but I think my version makes it more readable.
#define BIG_ENDIAN ((1 >> 1 == 0) ? 0 : 1)
(Beware: this parses as ((1 >> 1) == 0) and is a constant expression that always evaluates to 1. Shifts operate on values, not on the in-memory byte representation, so this macro cannot actually detect endianness.)
