I was reading this answer: C struct memory layout? and was curious to know why:
struct ST
{
    long long ll;
    char ch2;
    char ch1;
    short s;
    int i;
};
is still 24 bytes in size instead of 16. I was expecting the 2*char + short + int to fit into 8 bytes. Why is that?
EDIT:
Sorry for the confusion: I am running on a 64-bit system (Debian), gcc (Debian 4.4.5-8) 4.4.5. I already know it's due to padding. My question is why. One of the answers suggests:
char = 1 byte
char = 1 byte
short = 1 byte (why is this 1 and not 2?)
* padding of 5 bytes
My question is: why is this padding here? Why not just put the int straight after the short? It would still fit within 8 bytes.
The simple answer is: it isn't 24 bytes. Or you're running on the 64-bit s390 port of Linux, for which I haven't been able to find the ABI documentation. Every other 64-bit hardware that Debian can run on will have the size of this struct as 16 bytes.
I have dug up the ABI documentation for a bunch of different CPU ABIs and they all have more or less this wording (it seems they have all been copying from each other):
Structures and unions assume the alignment of their most strictly aligned component. Each member is assigned to the lowest available offset with the appropriate alignment. The size of any object is always a multiple of the object's alignment.
And all the architecture ABI documents I found (mips64, ppc64, amd64, ia64, sparc64, arm64) specify an alignment of 1 for char, 2 for short, 4 for int, and 8 for long long.
Even though operating systems are allowed to define their own ABI, almost every Unix-like system, and especially Linux, follows the System V ABI and its supplemental per-CPU documentation, which specifies this behavior very well. And Debian will definitely not change this behavior to be different from all other Linuxes.
Here's a quick verification (all on amd64/x86_64 which is what you're most likely running):
$ cat > foo.c
#include <stdio.h>
int
main(int argc, char **argv)
{
    struct {
        long long ll;
        char ch2;
        char ch1;
        short s;
        int i;
    } foo;
    printf("%d\n", (int)sizeof(foo));
    return 0;
}
MacOS:
$ cc -o foo foo.c && ./foo && uname -ms
16
Darwin x86_64
Ubuntu:
$ cc -o foo foo.c && ./foo && uname -ms
16
Linux x86_64
CentOS:
$ cc -o foo foo.c && ./foo && uname -ms
16
Linux x86_64
OpenBSD:
$ cc -o foo foo.c && ./foo && uname -ms
16
OpenBSD amd64
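To see exactly where each member lands, an offsetof printout works too (a quick sketch; the expected numbers in the comment are for the amd64 System V ABI discussed above):

#include <stdio.h>
#include <stddef.h>

struct ST {
    long long ll;
    char ch2;
    char ch1;
    short s;
    int i;
};

int main(void)
{
    /* Expected on amd64/System V: ll=0, ch2=8, ch1=9, s=10, i=12, size=16 */
    printf("ll=%zu ch2=%zu ch1=%zu s=%zu i=%zu size=%zu\n",
           offsetof(struct ST, ll), offsetof(struct ST, ch2),
           offsetof(struct ST, ch1), offsetof(struct ST, s),
           offsetof(struct ST, i), sizeof(struct ST));
    return 0;
}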
There's something else wrong with your compilation. Or that's not the struct you're testing or you're running on a very strange hardware architecture and specifying it as "64 bit" is equivalent to saying "I'm driving this government issued vehicle and it has very strange acceleration and the engine cuts out after 5 minutes" and not mentioning that you're talking about the space shuttle.
It's all about padding. In Visual Studio (and some other compilers) you can use #pragma pack(push, n) / #pragma pack(pop) to make it align as you wish.
#pragma pack(push, 1)
struct ST
{
    /*0x00*/ long long ll;
    /*0x08*/ char ch2;
    /*0x09*/ char ch1;
    /*0x0a*/ short s;
    /*0x0c*/ int i;
    /*0x10*/
};
#pragma pack(pop)
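A quick check that the packed layout really matches those offset comments (a sketch; GCC, Clang, and MSVC all honor this pragma):

#include <stdio.h>
#include <stddef.h>

#pragma pack(push, 1)
struct ST_packed {
    long long ll;
    char ch2;
    char ch1;
    short s;
    int i;
};
#pragma pack(pop)

int main(void)
{
    /* Expected: s at 0x0a, i at 0x0c, total size 0x10 (16). */
    printf("s=%zu i=%zu size=%zu\n",
           offsetof(struct ST_packed, s),
           offsetof(struct ST_packed, i),
           sizeof(struct ST_packed));
    return 0;
}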
Since you said it was coming up as size 24, I'm going to guess that the compiler is aligning you at 4 bytes by doing something like this:
struct ST
{
    /*0x00*/ long long ll;
    /*0x08*/ char ch2;
    /*0x09*/ char padding1[0x3];
    /*0x0c*/ char ch1;
    /*0x0d*/ char padding2[0x3];
    /*0x10*/ short s;
    /*0x12*/ char padding3[0x2];
    /*0x14*/ int i;
    /*0x18*/
};
(Sorry, I think in hex when doing this sort of thing. 0x10 is 16 in decimal and 0x18 is 24 in decimal.)
Usually, to minimize padding, it's normal practice to order the members from biggest to smallest. Could you try reordering them and see what comes out of it?
struct ST
{
    long long ll; // 8 bytes
    int i;        // 4 bytes
    short s;      // 2 bytes
    char ch2;     // 1 byte
    char ch1;     // 1 byte
};
The total is 16 bytes.
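A quick offsetof sanity check for the reordered version (a sketch; expected numbers are for a typical LP64 ABI):

#include <stdio.h>
#include <stddef.h>

struct ST2 {
    long long ll;
    int i;
    short s;
    char ch2;
    char ch1;
};

int main(void)
{
    /* Expected: i=8, s=12, ch2=14, ch1=15, size=16 -- every byte is used,
       so there is no padding at all. */
    printf("i=%zu s=%zu ch2=%zu ch1=%zu size=%zu\n",
           offsetof(struct ST2, i), offsetof(struct ST2, s),
           offsetof(struct ST2, ch2), offsetof(struct ST2, ch1),
           sizeof(struct ST2));
    return 0;
}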
Related
I am interested in the semantics of structure padding and packing, specifically in relation to the structures returned from the Linux kernel.
For example, if a program + stdlib is compiled so that structure padding doesn't take place, and a kernel is compiled so that structure padding does take place (which IIRC is the default for GCC anyway), surely the program cannot run, because the structures returned from the kernel are garbage from its point of view.
What if the compiler in question changed its padding semantics over time? Surely the same problem is likely to crop up. The structures defined in /usr/include/linux/* and /usr/include/asm-generic/* do not appear to be packed, so they depend on the compiler used and the alignment semantics of that compiler, right?
But I can take a binary compiled years ago on a different computer with different memory alignment requirements and presumably different padding semantics, and run it on my modern computer and it appears to work fine.
How does it not see garbage? Is this just pure luck? Do compiler authors (like say, TCC and the like) take care to copy GCC's structure padding semantics? How is this potential problem dealt with in the real world?
The structures defined in /usr/include/linux/* and /usr/include/asm-generic/* do not appear to be packed, so they depend on the compiler used and the alignment semantics of said compiler, right?
That's not true, generally. Here is an example from GCC on 64-bit Ubuntu (/usr/include/x86_64-linux-gnu/asm/stat.h):
struct stat {
    __kernel_ulong_t st_dev;
    __kernel_ulong_t st_ino;
    __kernel_ulong_t st_nlink;
    unsigned int st_mode;
    unsigned int st_uid;
    unsigned int st_gid;
    unsigned int __pad0;
    __kernel_ulong_t st_rdev;
    __kernel_long_t st_size;
    __kernel_long_t st_blksize;
    __kernel_long_t st_blocks; /* Number 512-byte blocks allocated. */
    __kernel_ulong_t st_atime;
    __kernel_ulong_t st_atime_nsec;
    __kernel_ulong_t st_mtime;
    __kernel_ulong_t st_mtime_nsec;
    __kernel_ulong_t st_ctime;
    __kernel_ulong_t st_ctime_nsec;
    __kernel_long_t __unused[3];
};
See __pad0? int is 4 bytes here, but st_rdev is a long, which is 8 bytes, so it must be 8-byte aligned. However, it is preceded by three ints (12 bytes), so an explicit 4-byte __pad0 is added.
Essentially, the standard library implementation takes care to hard-code its ABI.
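The same effect can be reproduced with a minimal struct (illustrative only, not the kernel header):

#include <stdio.h>
#include <stddef.h>

struct no_explicit_pad {
    unsigned long a, b, c;        /* 3 x 8 bytes */
    unsigned int mode, uid, gid;  /* 3 x 4 bytes, ending at offset 36 */
    /* no explicit pad: the compiler inserts 4 bytes here anyway */
    unsigned long rdev;           /* must be 8-byte aligned -> offset 40 */
};

int main(void)
{
    /* On LP64 this prints rdev=40 size=48: the compiler added the same
       4 bytes the kernel header spells out as __pad0. */
    printf("rdev=%zu size=%zu\n",
           offsetof(struct no_explicit_pad, rdev),
           sizeof(struct no_explicit_pad));
    return 0;
}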
BUT that isn't true for all APIs. Here is struct flock (from the same machine, /usr/include/asm-generic/fcntl.h) used by the fcntl() call:
struct flock {
    short l_type;
    short l_whence;
    __kernel_off_t l_start;
    __kernel_off_t l_len;
    __kernel_pid_t l_pid;
    __ARCH_FLOCK_PAD
};
As you can see, there is no padding between l_whence and l_start. And indeed, for the following C program, saved as abi.c:
#include <fcntl.h>
#include <string.h>

int main(int argc, char **argv)
{
    struct flock fl;
    int fd;

    fd = open("y", O_RDWR);
    memset(&fl, 0xff, sizeof(fl));
    fl.l_type = F_RDLCK;
    fl.l_whence = SEEK_SET;
    fl.l_start = 200;
    fl.l_len = 1;
    fcntl(fd, F_SETLK, &fl);
}
We get:
$ cc -g -o abi abi.c && strace -e fcntl ./abi
fcntl(3, F_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=200, l_len=1}) = 0
+++ exited with 0 +++
$ cc -g -fpack-struct -o abi abi.c && strace -e fcntl ./abi
fcntl(3, F_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=4294967296, l_len=-4294967296}) = 0
+++ exited with 0 +++
As you can see, the fields following l_whence are indeed garbage.
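offsetof makes the mismatch visible (a sketch; the numbers in the comment are what I'd expect on x86-64 glibc, where off_t is 8 bytes):

#include <stdio.h>
#include <stddef.h>
#include <fcntl.h>

int main(void)
{
    /* Normally l_start sits at offset 8 (two shorts plus 4 bytes of
       padding). Compiled with -fpack-struct it moves to offset 4, which
       no longer matches what the kernel expects. */
    printf("l_whence=%zu l_start=%zu l_len=%zu sizeof=%zu\n",
           offsetof(struct flock, l_whence),
           offsetof(struct flock, l_start),
           offsetof(struct flock, l_len),
           sizeof(struct flock));
    return 0;
}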
Moreover, C itself has no ABI, so this fragile compatibility relies on implementations playing nice. struct stat above assumes that the compiler won't insert extra random padding.
ANSI C says:
There may also be unnamed padding at the end of a structure or union, as necessary to achieve the appropriate alignment were the structure or union to be a member of an array.
There's no wording on how padding may be inserted in the middle of a struct for reasons other than alignment; however, there's also:
Implementation-defined behavior
Each implementation shall document its behavior in each of the areas listed in this section. The following are implementation-defined:
...
The padding and alignment of members of structures. This should present no problem unless binary data written by one implementation are read by another.
On my Ubuntu machine, both the compiler and the standard library come from the GNU project, so they interoperate smoothly. Clang wants to grow, so it's compatible with GNU libc. Everyone is just playing nice, most of the time.
I have a piece of code looking like this:
void update_clock(uint8_t *time_array)
{
    time_t time = *((time_t *) &time_array[0]); // <-- hangs
    /* ... more code ... */
}
Where time_array is an array of 4 bytes (i.e. uint8_t time_array[4]).
I'm using arm-none-eabi-gcc to compile this for an STM32L4 processor.
While compiling this a couple of months ago I got no errors and the code is running perfectly fine on all my test MCUs. I did some updates to my environment (OpenSTM32) when coming back to this project and now this piece of code is crashing on some MCUs while working fine on others.
I still have my binary from a couple of months ago and have confirmed that this code path works fine on all of my MCUs (I have about 5 to test on), but now it works on two of them while causing a crash on three of them.
I have mitigated the problem by rewriting the code like this:
time_t time = (
    ((uint32_t) time_array[0]) << 0 |
    ((uint32_t) time_array[1]) << 8 |
    ((uint32_t) time_array[2]) << 16 |
    ((uint32_t) time_array[3]) << 24
);
While this works for now, I think the old code looks cleaner, and I'm also worried that if this code path hangs, I probably have similar errors elsewhere.
Does anyone have any idea what can be causing this? Can I change anything in my setup to make the compiler work the old way again?
From version 7-2017-q4-major, Arm GCC ships with newlib compiled with time_t defined as a 64-bit (long long) integer, causing all sorts of problems with code that assumes it to be 32 bits. Your code is reading past the end of the source array, taking whatever is stored there as the high-order bits of the time value, possibly resulting in a date before the big bang or after the heat death of the universe, which might not be what your code expects.
If the source array is known to contain 32 bits of data, copy it to a 32-bit int32_t variable first; then you can assign it to a time_t. That way it will be properly converted, regardless of the size of time_t.
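A minimal sketch of that suggestion (assuming the array holds the timestamp in the target's native byte order, as the little-endian shift version above implies; the helper name is just for illustration):

#include <stdint.h>
#include <string.h>
#include <time.h>

time_t time_from_bytes(const uint8_t *time_array)
{
    int32_t t32;

    /* memcpy avoids both the unaligned access and the strict-aliasing
       problem of casting the byte pointer. */
    memcpy(&t32, time_array, sizeof t32);

    /* Going through int32_t converts correctly whether time_t is
       32 or 64 bits wide. */
    return (time_t)t32;
}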
Your development environment (OpenSTM32) may be using a GCC compiler. If so, GCC supports the following flag:
-fno-strict-aliasing
If you are using -O2, this flag might resolve your problem.
Using memcpy is the standard advice, and it is sometimes optimized away by the compiler:
memcpy(&time, time_array, sizeof time);
Finally, you can use gcc's typeof and a compound literal with a union to generate the following safe cast:
#define PUN_CAST4(a, x) ((union {uint8_t src[4]; typeof(x) dst;}){{a[0],a[1],a[2],a[3]}}).dst
time_t time = PUN_CAST4(time_array, time);
As an example, the following code is compiled at https://godbolt.org/g/eZRXxW:
#include <stdint.h>
#include <time.h>
#include <string.h>

time_t update_clock(uint8_t *time_array) {
    time_t t = *((time_t *) &time_array[0]); // assumes no alignment problem
    return t;
}

time_t update_clock2(uint8_t *time_array) {
    time_t t =
        (uint32_t)time_array[0] << 0 |
        (uint32_t)time_array[1] << 8 |
        (uint32_t)time_array[2] << 16 |
        (uint32_t)time_array[3] << 24;
    return t;
}

time_t update_clock3(uint8_t *time_array) {
    time_t t;
    memcpy(&t, time_array, sizeof t);
    return t;
}

#define PUN_CAST4(a, x) ((union {uint8_t src[4]; typeof(x) dst;}){{a[0],a[1],a[2],a[3]}}).dst

time_t update_clock4(uint8_t *time_array) {
    time_t t = PUN_CAST4(time_array, t);
    return t;
}
gcc 8.1 is good for all four examples: it generates the trivial code with -O2. But gcc 7.3 is bad for the 4th. Clang is also good for all four with -m32 for a 32-bit target, but fails on the 2nd and 4th without it.
Your issue is caused by unaligned access, or writing to the wrong area.
Compiling
#include "stdint.h"
#include "time.h"
time_t myTime;
void update_clock(uint8_t *time_array)
{
myTime = *((time_t *) &time_array[0]); // <-- hangs
/* ... more code ... */
}
with GCC 7.2.1 with the arguments -march=armv7-m -Os generates the following
update_clock(unsigned char*):
        ldr     r3, .L2
        ldrd    r0, [r0]
        strd    r0, [r3]
        bx      lr
.L2:
        .word   .LANCHOR0
myTime:
Because your time array is an array of an 8-bit type, there are no alignment requirements on it, so if the linker has not word-aligned it, then when you try to dereference it as a time_t *, the LDRD instruction is given a non-word-aligned address and causes a UsageFault.
The LDRD and STRD instructions are loading and storing 8 bytes, whereas your array is only 4 bytes long. I suggest you check sizeof(time_t) in your environment, and make an aligned area long enough to store it.
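If you want that check on sizeof(time_t) to happen automatically, a C11 static_assert (a sketch; requires <assert.h> and a C11 compiler) turns the hidden size assumption into a build error:

#include <time.h>
#include <assert.h>

/* Fails to compile with the newer newlib where time_t is 64-bit, pointing
   straight at the real problem instead of producing a runtime fault. */
static_assert(sizeof(time_t) == 4, "this code assumes a 32-bit time_t");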
The code:
#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>

typedef unsigned int uint32_t;

float average(int n_values, ...)
{
    va_list var_arg;
    int count;
    float sum = 0;

    va_start(var_arg, n_values);
    for (count = 0; count < n_values; count += 1) {
        sum += va_arg(var_arg, signed long long int);
    }
    va_end(var_arg);

    return sum / n_values;
}

int main(int argc, char *argv[])
{
    (void)argc;
    (void)argv;
    printf("hello world!\n");
    uint32_t t1 = 1;
    uint32_t t2 = 4;
    uint32_t t3 = 4;
    printf("result:%f\n", average(3, t1, t2, t3));
    return 0;
}
When I run it on Ubuntu (x86_64), it's OK.
lix@lix-VirtualBox:~/test/c$ ./a.out
hello world!
result:3.000000
lix@lix-VirtualBox:~/test/c$ uname -a
Linux lix-VirtualBox 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
lix@lix-VirtualBox:~/test/c$
But when I cross-compile it and run it on OpenWrt (ARM 32-bit), it's wrong.
[root@OneCloud_0723:/root/lx]# ./helloworld
hello world!
result:13952062464.000000
[root@OneCloud_0723:/root/lx]# uname -a
Linux OneCloud_0723 3.10.33 #1 SMP PREEMPT Thu Nov 2 19:55:17 CST 2017 armv7l GNU/Linux
I know one should not call va_arg with an argument of the incorrect type. But why do we get the right result on x86_64 and not on ARM?
Thank you.
On x86-64 Linux, each 32-bit arg is passed in a separate 64-bit register (because that's what the x86-64 System V calling convention requires).
The caller happens to have zero-extended the 32-bit arg into the 64-bit register. (This isn't required; the undefined behaviour in your program could bite you with a different caller that left high garbage in the arg-passing registers.)
The callee (average()) is looking for three 64-bit args, and looks in the same registers where the caller put them, so it happens to work.
On 32-bit ARM, long long doesn't fit in a single register, so the callee looking for long long args is definitely looking in different places than where the caller placed the uint32_t args.
The first 64-bit arg the callee sees is probably ((long long)t1<<32) | t2, or the other way around. But since the callee is looking for 6x 32 bits of args, it will be looking at registers / memory that the caller didn't intend as args at all.
(Note that this could cause corruption of the caller's locals on the stack, because the callee is allowed to clobber stack args.)
For the full details, look at the asm output of your code with your compiler + compile options to see what exactly what behaviour resulted from the C Undefined Behaviour in your source. objdump -d ./helloworld should do the trick, or look at compiler output directly: How to remove "noise" from GCC/clang assembly output?.
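If you want to keep reading long long in the callee, the caller has to actually pass long long; a small self-contained sketch (the function name and the double accumulator are my own choices, not the original code):

#include <stdio.h>
#include <stdarg.h>
#include <stdint.h>

/* Same reader as in the question: expects long long variadic args. */
static double average_ll(int n_values, ...)
{
    va_list ap;
    double sum = 0;

    va_start(ap, n_values);
    for (int i = 0; i < n_values; i++)
        sum += (double)va_arg(ap, long long);
    va_end(ap);

    return sum / n_values;
}

int main(void)
{
    uint32_t t1 = 1, t2 = 4, t3 = 4;

    /* Casting at the call site makes the passed argument type match what
       va_arg reads, so this is well-defined on x86-64 and 32-bit ARM alike. */
    printf("result:%f\n",
           average_ll(3, (long long)t1, (long long)t2, (long long)t3));
    return 0;
}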
On my system (x86_64)
#include <stdio.h>

int main(void)
{
    printf("%zu\n", sizeof(long long int));
    return 0;
}
this prints 8, which tells me that long long int is 64 bits wide; I don't know the size of a long long int on ARM.
Regardless, your va_arg call is wrong: you have to use the correct type, in this case uint32_t. Your function has undefined behaviour and only happens to get the correct values. average should look like this:
float average(int n_values, ...)
{
    va_list var_arg;
    int count;
    float sum = 0;

    va_start(var_arg, n_values);
    for (count = 0; count < n_values; count += 1) {
        sum += va_arg(var_arg, uint32_t);
    }
    va_end(var_arg);

    return sum / n_values;
}
Also, don't declare uint32_t yourself as
typedef unsigned int uint32_t;
This is not portable, because int is not guaranteed to be 4 bytes long across all architectures. The standard C library declares this type in stdint.h; you should use the types from there instead.
So your program should look like this:
#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>
#include <stdint.h>

float average(int n_values, ...)
{
    va_list var_arg;
    int count;
    float sum = 0;

    va_start(var_arg, n_values);
    for (count = 0; count < n_values; count += 1) {
        sum += va_arg(var_arg, uint32_t);
    }
    va_end(var_arg);

    return sum / n_values;
}

int main(void)
{
    printf("hello world!\n");
    uint32_t t1 = 1;
    uint32_t t2 = 4;
    uint32_t t3 = 4;
    printf("result:%f\n", average(3, t1, t2, t3));
    return 0;
}
this is portable and should yield the same results across different
architectures.
The book I am reading, Software Exorcism, has this example code for a buffer overflow:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BUFFER_SIZE 4

void victim(char *str)
{
    char buffer[BUFFER_SIZE];
    strcpy(buffer, str);
    return;
}

void redirected()
{
    printf("\tYou've been redirected!\n");
    exit(0);
    return;
}

void main()
{
    char buffer[] =
    {
        '1','2','3','4',
        '5','6','7','8',
        '\x0','\x0','\x0','\x0','\x0'
    };
    void *fptr;
    unsigned long *lptr;

    printf("buffer = %s\n", buffer);
    fptr = redirected;
    lptr = (unsigned long*)(&buffer[8]);
    *lptr = (unsigned long)fptr;
    printf("main()\n");
    victim(buffer);
    printf("main()\n");
    return;
}
I can get this to work in Windows with Visual Studio 2010 by specifying
Basic Runtime Checks -> Uninitialized variables
Buffer Security Check -> No
With those compile options, I get this behavior when running:
buffer = 12345678
main()
You've been redirected!
My question is about the code not working on Linux. Is there any clear reason why it is so?
Some info on what I've tried:
I've tried to run this with 32-bit Ubuntu 12.04 (downloaded from here), with these options:
[09/01/2014 11:46] root#ubuntu:/home/seed# sysctl -w kernel.randomize_va_space=0
kernel.randomize_va_space = 0
Getting:
[09/01/2014 12:03] seed#ubuntu:~$ gcc -fno-stack-protector -z execstack -o overflow overflow.c
[09/01/2014 12:03] seed#ubuntu:~$ ./overflow
buffer = 12345678
main()
main()
Segmentation fault (core dumped)
And with 64-bit CentOS 6.0, with these options:
[root]# sysctl -w kernel.randomize_va_space=0
kernel.randomize_va_space = 0
[root]# sysctl -w kernel.exec-shield=0
kernel.exec-shield = 0
Getting:
[root]# gcc -fno-stack-protector -z execstack -o overflow overflow.c
[root]# ./overflow
buffer = 12345678
main()
main()
[root]#
Is there something fundamentally different in Linux environment, which would cause the example not working, or am I missing something simple here?
Note: I've been through the related questions such as this one and this one, but haven't been able to find anything that would help on this. I don't think this is a duplicate of previous questions even though there are a lot of them.
Your example overflows a buffer on the stack, a small and predictable memory region, in an attempt to modify the return address of the function void victim(), which would then point to void redirected() instead of coming back to main().
It works with Visual Studio. But GCC is a different compiler and can use different stack allocation rules, making the exploit fail. C doesn't enforce a strict "stack memory layout", so compilers can make different choices.
A good way to test this hypothesis is to build your code with MinGW (i.e. GCC for Windows), showing that the behavior difference is not strictly related to the OS.
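To see how differently compilers lay out victim()'s frame, a small diagnostic sketch (purely illustrative; __builtin_return_address is a GCC/Clang extension):

#include <stdio.h>
#include <string.h>

#define BUFFER_SIZE 4

void victim(char *str)
{
    char buffer[BUFFER_SIZE];

    /* Where buffer sits relative to the saved return address (and hence how
       many bytes the overflow needs) differs between MSVC and GCC, and even
       between optimization levels of the same compiler. */
    printf("buffer is at %p, victim() returns to %p\n",
           (void *)buffer, __builtin_return_address(0));
    strcpy(buffer, str);
}

int main(void)
{
    victim("123");   /* short enough not to smash anything */
    return 0;
}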
#define BUFFER_SIZE 4

void victim(char *str)
{
    char buffer[BUFFER_SIZE];
    strcpy(buffer, str);
    return;
}
There is another potential problem here if optimizations are enabled. buffer in main() is 13 bytes, and it's passed as victim(buffer). Then, within victim, you try to copy well over 4 bytes into a 4-byte buffer with strcpy.
_FORTIFY_SOURCE should cause the program to abort on the call to strcpy. If the compiler can deduce the destination buffer size (which it should be able to in this case), it replaces strcpy with a "safer" version that knows the destination buffer size. If the number of bytes to copy exceeds the destination buffer size, the "safer" strcpy calls abort().
To turn off _FORTIFY_SOURCE, compile with -U_FORTIFY_SOURCE or -D_FORTIFY_SOURCE=0.
Given the following program:
/* Find the sum of all the multiples of 3 or 5 below 1000. */
#include <stdio.h>

unsigned long int method_one(const unsigned long int n);

int
main(int argc, char *argv[])
{
    unsigned long int sum = method_one(1000000000);
    if (sum != 0) {
        printf("Sum: %lu\n", sum);
    } else {
        printf("Error: Unsigned Integer Wrapping.\n");
    }
    return 0;
}

unsigned long int
method_one(const unsigned long int n)
{
    unsigned long int i;
    unsigned long int sum = 0;
    for (i = 1; i != n; ++i) {
        if (!(i % 3) || !(i % 5)) {
            unsigned long int tmp_sum = sum;
            sum += i;
            if (sum < tmp_sum)
                return 0;
        }
    }
    return sum;
}
On a Mac OS X system (Xcode 3.2.3), if I use cc for compilation with the -std=c99 flag, everything seems just right:
nietzsche:problem_1 robert$ cc -std=c99 problem_1.c -o problem_1
nietzsche:problem_1 robert$ ./problem_1
Sum: 233333333166666668
However, if I use c99 to compile it this is what happens:
nietzsche:problem_1 robert$ c99 problem_1.c -o problem_1
nietzsche:problem_1 robert$ ./problem_1
Error: Unsigned Integer Wrapping.
Can you please explain this behavior?
c99 is a wrapper of gcc. It exists because POSIX requires it. c99 will generate a 32-bit (i386) binary by default.
cc is a symlink to gcc, so it takes whatever default configuration gcc has. gcc produces a binary in native architecture by default, which is x86_64.
unsigned long is 32 bits on i386 on OS X, and 64 bits on x86_64. Therefore, the c99 build hits the unsigned integer wrapping, while the cc -std=c99 build does not.
You can force c99 to generate a 64-bit binary on OS X with the -W 64 flag.
c99 -W 64 problem_1.c -o problem_1
(Note: by gcc I mean the actual gcc binary like i686-apple-darwin10-gcc-4.2.1.)
Under Mac OS X, cc is a symlink to gcc (defaults to 64-bit), and c99 is not (defaults to 32-bit).
/usr/bin/cc -> gcc-4.2
And they use different default byte-sizes for data types.
/** sizeof.c
 */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    printf("sizeof(unsigned long int)==%d\n", (int)sizeof(unsigned long int));
    return EXIT_SUCCESS;
}
cc -std=c99 sizeof.c
./a.out
sizeof(unsigned long int)==8
c99 sizeof.c
./a.out
sizeof(unsigned long int)==4
Quite simply, you are overflowing (aka wrapping) your integer variable when using the c99 compiler.
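A quick way to see why the 32-bit build has no chance (a sketch; the "correct" value is the one the 64-bit run printed above):

#include <stdio.h>

int main(void)
{
    unsigned long long correct = 233333333166666668ULL; /* 64-bit result */
    unsigned long long max32   = 4294967295ULL;         /* UINT32_MAX    */

    /* The true sum is roughly 54 million times larger than the biggest
       value a 32-bit unsigned long can hold, so the 32-bit build must wrap. */
    printf("correct / max32 = %llu\n", correct / max32);
    return 0;
}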