Memory alignment and padding — difference between 32 and 64 bits [duplicate] - c

This question already has answers here:
Why isn't sizeof for a struct equal to the sum of sizeof of each member?
(13 answers)
Closed 6 years ago.
I would like to understand the results obtained with "gcc -m32" and "gcc -m64" when compiling the following small program:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h> // uint64_t
int main() {
struct MixedData
{
char Data1;
short Data2;
int Data3;
char Data4;
};
struct X {
char c;
uint64_t x;
};
printf("size of struct MixedData = %zu\n", sizeof(struct MixedData));
printf("size of struct X = %zu\n", sizeof(struct X));
printf("size of uint64_t = %zu\n", sizeof(uint64_t));
return 0;
}
With "gcc -m32", the output is:
size of struct MixedData = 12
size of struct X = 12
size of uint64_t = 8
Is size of struct X equal to 12 because compiler sets the following padding?
struct X {
char c; // 1 byte
char d[3]; // 3 bytes
uint64_t x; // 8 bytes
};
If this is the case, what's the size of a single word with 32 bits compilation (4 bytes?)? If it is equal to 4 bytes, this would be consistent because 12 is a multiple of 4.
Now concerning the size of MixedData with "gcc -m32" compilation, I get "size of struct MixedData = 12". I don't understand this value because I read that the total size of a structure has to be a multiple of the size of its largest member. For example, in struct MixedData the largest member is int Data3 with sizeof(Data3) = 4 bytes; why don't we get the following padding instead:
struct MixedData
{
char Data1; // 1 byte
char Datatemp1[3]; // 3 bytes
short Data2; // 2 bytes
short Data2temp; // 2 bytes
int Data3; // 4 bytes
char Data4; // 1 byte
char Data4temp[3]; // 3 bytes
};
So the total size of struct MixedData would be equal to 16 bytes and not 12 bytes like I get.
Can anyone see what's wrong about these 2 interpretations?
A similar issue is about "gcc -m64" compilation; the output is:
size of struct MixedData = 12
size of struct X = 16
size of uint64_t = 8
The size of struct X (16 bytes) seems to be consistent because I think that compiler in 64 bits mode sets the following padding:
struct X {
char c; // 1 byte
char d[7]; // 7 bytes
uint64_t x; // 8 bytes
};
But I don't understand the size of struct MixedData (12 bytes). I don't know how the compiler sets the padding in this case, because 12 is not a multiple of the memory word size in 64-bit mode (supposing it is equal to 8 bytes). Could you tell me the padding generated by "gcc -m64" in this last case (for struct MixedData)?

This one is a curiosity
struct
{
char Data1;
short Data2;
int Data3;
char Data4;
} x;
unsigned fun ( void )
{
x.Data1=1;
x.Data2=2;
x.Data3=3;
x.Data4=4;
return(sizeof(x));
}
Compiling then disassembling
64
0000000000000000 <fun>:
0: 55 push %rbp
1: 48 89 e5 mov %rsp,%rbp
4: c6 05 00 00 00 00 01 movb $0x1,0x0(%rip) # b <fun+0xb>
b: 66 c7 05 00 00 00 00 movw $0x2,0x0(%rip) # 14 <fun+0x14>
12: 02 00
14: c7 05 00 00 00 00 03 movl $0x3,0x0(%rip) # 1e <fun+0x1e>
1b: 00 00 00
1e: c6 05 00 00 00 00 04 movb $0x4,0x0(%rip) # 25 <fun+0x25>
25: b8 0c 00 00 00 mov $0xc,%eax
2a: 5d pop %rbp
2b: c3 retq
32
00000000 <fun>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: c6 05 00 00 00 00 01 movb $0x1,0x0
a: 66 c7 05 02 00 00 00 movw $0x2,0x2
11: 02 00
13: c7 05 04 00 00 00 03 movl $0x3,0x4
1a: 00 00 00
1d: c6 05 08 00 00 00 04 movb $0x4,0x8
24: b8 0c 00 00 00 mov $0xc,%eax
29: 5d pop %ebp
2a: c3 ret
Understand that -m32 and -m64 are perhaps poorly described: one basically targets the 32-bit processor with 32-bit registers (ebx, eax, ax, ah, but not rbx or rax), and the other targets the 64-bit processor with 64-bit registers (rbx, ebx, bx, bh, bl).
There doesn't have to be a connection between the size or construction of structs and the instruction set chosen.
The interesting thing here is that the members add up to 1+2+4+1 = 8 bytes, so they could have done it in 8 bytes. Now they probably wanted the int aligned, which would pad it by a byte, and perhaps they wanted the whole thing aligned on a 32-bit boundary, adding 3 more, so that is probably what happened. The 32-bit code makes this a bit clearer: not only did they align the int, they also aligned the short. So they pad between Data1 and Data2 to align Data2 on a 16-bit boundary, which then puts Data3 on a 32-bit boundary, and Data4 is a byte so it can't be unaligned. Then they pad the end to align the next thing in .data (the struct's size is rounded up to a multiple of its 4-byte alignment).
The 64-bit code looks broken (every rip-relative offset is 0x0); perhaps they want the linker to patch that one up.
00000000004004d6 <fun>:
4004d6: 55 push %rbp
4004d7: 48 89 e5 mov %rsp,%rbp
4004da: c6 05 57 0b 20 00 01 movb $0x1,0x200b57(%rip) # 601038 <x>
4004e1: 66 c7 05 50 0b 20 00 movw $0x2,0x200b50(%rip) # 60103a <x+0x2>
4004e8: 02 00
4004ea: c7 05 48 0b 20 00 03 movl $0x3,0x200b48(%rip) # 60103c <x+0x4>
4004f1: 00 00 00
4004f4: c6 05 45 0b 20 00 04 movb $0x4,0x200b45(%rip) # 601040 <x+0x8>
4004fb: b8 0c 00 00 00 mov $0xc,%eax
400500: 5d pop %rbp
400501: c3 retq
Ahh, I see, yes, that is what they were doing: they aligned both Data2 and Data3. I guess I should have made it generate the address of the whole struct...
struct
{
char Data1;
short Data2;
int Data3;
char Data4;
} x;
unsigned fun ( void )
{
unsigned long long z;
z=(unsigned long long)&x;
x.Data1=1;
x.Data2=2;
x.Data3=3;
x.Data4=4;
return(sizeof(x));
}
int main ( void )
{
fun();
}
producing
00000000004004d6 <fun>:
4004d6: 55 push %rbp
4004d7: 48 89 e5 mov %rsp,%rbp
4004da: 48 c7 45 f8 38 10 60 movq $0x601038,-0x8(%rbp)
4004e1: 00
4004e2: c6 05 4f 0b 20 00 01 movb $0x1,0x200b4f(%rip) # 601038 <x>
4004e9: 66 c7 05 48 0b 20 00 movw $0x2,0x200b48(%rip) # 60103a <x+0x2>
4004f0: 02 00
4004f2: c7 05 40 0b 20 00 03 movl $0x3,0x200b40(%rip) # 60103c <x+0x4>
4004f9: 00 00 00
4004fc: c6 05 3d 0b 20 00 04 movb $0x4,0x200b3d(%rip) # 601040 <x+0x8>
400503: b8 0c 00 00 00 mov $0xc,%eax
400508: 5d pop %rbp
400509: c3 retq
confirming the base address 0x601038.
The struct is not tied to the instruction set. Change to this
struct
{
char Data1;
short Data2;
int Data3;
char Data4;
} __attribute__((packed)) x;
unsigned fun ( void )
{
unsigned long long z;
z=(unsigned long long)&x;
x.Data1=1;
x.Data2=2;
x.Data3=3;
x.Data4=4;
return(sizeof(x));
}
int main ( void )
{
fun();
}
and we get this
00000000004004d6 <fun>:
4004d6: 55 push %rbp
4004d7: 48 89 e5 mov %rsp,%rbp
4004da: 48 c7 45 f8 38 10 60 movq $0x601038,-0x8(%rbp)
4004e1: 00
4004e2: c6 05 4f 0b 20 00 01 movb $0x1,0x200b4f(%rip) # 601038 <x>
4004e9: 66 c7 05 47 0b 20 00 movw $0x2,0x200b47(%rip) # 601039 <x+0x1>
4004f0: 02 00
4004f2: c7 05 3f 0b 20 00 03 movl $0x3,0x200b3f(%rip) # 60103b <x+0x3>
4004f9: 00 00 00
4004fc: c6 05 3c 0b 20 00 04 movb $0x4,0x200b3c(%rip) # 60103f <x+0x7>
400503: b8 08 00 00 00 mov $0x8,%eax
400508: 5d pop %rbp
400509: c3 retq
the size of the struct is now 8 bytes, and they generated unaligned accesses.

Related

open syscall mode argument number confusion

On x64, ubuntu 20 machine, I wrote a simple C program
#include <stdio.h>
#include <stdlib.h> // exit
#include <fcntl.h>  // open, O_CREAT, O_RDONLY
#include <unistd.h> // close
int main()
{
// assume that foo.txt is already created
int fd1 = open("foo.txt", O_CREAT | O_RDONLY, 0770);
close(fd1);
exit(0);
}
I am trying to understand the hex value generated by the 0770 file mode argument. So I objdump the binary and got the below:
0000000000001189 <main>:
1189: f3 0f 1e fa endbr64
118d: 55 push rbp
118e: 48 89 e5 mov rbp,rsp
1191: 48 83 ec 10 sub rsp,0x10
1195: ba f8 01 00 00 mov edx,0x1f8
119a: be 40 00 00 00 mov esi,0x40
119f: 48 8d 3d 5e 0e 00 00 lea rdi,[rip+0xe5e] # 2004 <_IO_stdin_used+0x4>
11a6: b8 00 00 00 00 mov eax,0x0
11ab: e8 d0 fe ff ff call 1080 <open@plt>
It is clear that 0x1f8 is the mode argument. However, it corresponds to 504 in decimal.
How did 0770 convert to 504 (or 0x1f8 in hex)?
How did 0770 convert to 504
Just as you would convert any other octal number: multiply each digit by its corresponding power of 8:
0770(8) = 7 * 8^2 + 7 * 8^1 + 0 * 8^0 = 7*64 + 7*8 = 504(10)

Variable length bitfield

tl;dr: How can I properly generate a variable length bitfield in C?
I have a structure containing a variable number of elements for instance:
struct s{
int v1;
int v2;
/* ... */
int vn;
};
I'd like to create an anonymous bitfield of the size of the struct. Since I want low coupling between my two structs, I do it like this:
struct x{
unsigned long a:sz(struct s);
};
This code works but leads to issues:
What can I do if the struct has a size above 64 bits?
I can use several ints to store my bitfield but that gets messy and overkill.
I could also use a pointer to a void* but I suspect performance loss
By design, every operation on my bitfield generates an AND, which degrades performance, as seen in the following code:
int main() {
560: 41 54 push %r12
}
__fortify_function int
printf (const char *__restrict __fmt, ...)
{
return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, __va_arg_pack ());
562: 4c 8d 25 cb 01 00 00 lea 0x1cb(%rip),%r12 # 734 <_IO_stdin_used+0x4>
569: 55 push %rbp
56a: bd 64 00 00 00 mov $0x64,%ebp
56f: 53 push %rbx
570: bb 04 00 00 00 mov $0x4,%ebx
575: 0f 1f 00 nopl (%rax)
struct x x = { .a = 3, };
for(int i=0;i <100; ++i){
++x.a;
printf("%u\n", x.a);
578: 0f b6 d3 movzbl %bl,%edx
57b: 31 c0 xor %eax,%eax
57d: 83 c3 01 add $0x1,%ebx
580: 4c 89 e6 mov %r12,%rsi
583: bf 01 00 00 00 mov $0x1,%edi
588: 83 e3 0f and $0xf,%ebx
58b: e8 b0 ff ff ff callq 540 <__printf_chk@plt>
for(int i=0;i <100; ++i){
590: 83 ed 01 sub $0x1,%ebp
593: 75 e3 jne 578 <main+0x18>
}
return 0;
}
595: 5b pop %rbx
596: 31 c0 xor %eax,%eax
598: 5d pop %rbp
599: 41 5c pop %r12
59b: c3 retq
59c: 0f 1f 40 00 nopl 0x0(%rax)
Ideally I'd like to do something similar to the following pseudo-code (but this is obviously invalid in C).
#define varint(sz) (sz > 32) ? \
((sz>64) ? "#error size too huge" : int64_t) : \
((sz>16) ? int32_t: int16_t)
varint(sizeof(s)/sizeof(int)) mybitfield = 0;
How can I properly generate a variable length bitfield in C?

LD_PRELOAD a function with enums and struct

I am trying to LD_PRELOAD a function with declaration like
// header1.h
typedef enum { ... } enum1;
// header2.h
typedef enum { ... } enum2;
typedef struct { ... } Structure1;
enum1 foo(Structure1* str, enum2 val);
Is it possible to use, say, unsigned int instead of the enums and void* instead of Structure1*?
I tried a simple code like this, but it doesn't seem to work. Would it be because of a type mismatch?
#define _GNU_SOURCE
#include <stdio.h>
#include <stdarg.h>
#include <dlfcn.h>
typedef unsigned int (*foo_t)(void* ptr, unsigned int e2);
unsigned int foo(void* handle, unsigned int e2)
{
printf ("foo\n");
foo_t foo_f = (foo_t) dlsym(RTLD_NEXT, "foo");
unsigned int result = foo_f(handle, e2); // forward the wrapper's own parameters
return result;
}
EDIT :
To get to the actual use case,
I am trying to load
CURLcode Curl_setopt(struct Curl_easy *data, CURLoption option,
va_list param)
from here https://github.com/curl/curl/blob/curl-7_55_1/lib/url.c
but when I run nm, it doesn't seem to find this function
$ nm -D /usr/lib/x86_64-linux-gnu/libcurl.so.4.4.0 | grep setopt
000000000002fc80 T curl_easy_setopt
0000000000037ac0 T curl_multi_setopt
000000000003ad60 T curl_share_setopt
I tried objdump of curl_easy_setopt which calls Curl_setopt, but there is no call to Curl_setopt here
objdump -D -S -C /usr/lib/x86_64-linux-gnu/libcurl.so.4.4.0 --start-address 0x02fc80 --stop-address 0x02fd36
/usr/lib/x86_64-linux-gnu/libcurl.so.4.4.0: file format elf64-x86-64
Disassembly of section .text:
000000000002fc80 <curl_easy_setopt@@CURL_OPENSSL_3>:
2fc80: 48 81 ec d8 00 00 00 sub $0xd8,%rsp
2fc87: 84 c0 test %al,%al
2fc89: 48 89 54 24 30 mov %rdx,0x30(%rsp)
2fc8e: 48 89 4c 24 38 mov %rcx,0x38(%rsp)
2fc93: 4c 89 44 24 40 mov %r8,0x40(%rsp)
2fc98: 4c 89 4c 24 48 mov %r9,0x48(%rsp)
2fc9d: 74 37 je 2fcd6 <curl_easy_setopt@@CURL_OPENSSL_3+0x56>
2fc9f: 0f 29 44 24 50 movaps %xmm0,0x50(%rsp)
2fca4: 0f 29 4c 24 60 movaps %xmm1,0x60(%rsp)
2fca9: 0f 29 54 24 70 movaps %xmm2,0x70(%rsp)
2fcae: 0f 29 9c 24 80 00 00 movaps %xmm3,0x80(%rsp)
2fcb5: 00
2fcb6: 0f 29 a4 24 90 00 00 movaps %xmm4,0x90(%rsp)
2fcbd: 00
2fcbe: 0f 29 ac 24 a0 00 00 movaps %xmm5,0xa0(%rsp)
2fcc5: 00
2fcc6: 0f 29 b4 24 b0 00 00 movaps %xmm6,0xb0(%rsp)
2fccd: 00
2fcce: 0f 29 bc 24 c0 00 00 movaps %xmm7,0xc0(%rsp)
2fcd5: 00
2fcd6: 64 48 8b 04 25 28 00 mov %fs:0x28,%rax
2fcdd: 00 00
2fcdf: 48 89 44 24 18 mov %rax,0x18(%rsp)
2fce4: 31 c0 xor %eax,%eax
2fce6: 48 85 ff test %rdi,%rdi
2fce9: b8 2b 00 00 00 mov $0x2b,%eax
2fcee: 74 2e je 2fd1e <curl_easy_setopt@@CURL_OPENSSL_3+0x9e>
2fcf0: 48 8d 84 24 e0 00 00 lea 0xe0(%rsp),%rax
2fcf7: 00
2fcf8: 48 89 e2 mov %rsp,%rdx
2fcfb: c7 04 24 10 00 00 00 movl $0x10,(%rsp)
2fd02: c7 44 24 04 30 00 00 movl $0x30,0x4(%rsp)
2fd09: 00
2fd0a: 48 89 44 24 08 mov %rax,0x8(%rsp)
2fd0f: 48 8d 44 24 20 lea 0x20(%rsp),%rax
2fd14: 48 89 44 24 10 mov %rax,0x10(%rsp)
2fd19: e8 e2 e9 fe ff callq 1e700 <curl_formget@@CURL_OPENSSL_3+0xf2e0>
2fd1e: 48 8b 4c 24 18 mov 0x18(%rsp),%rcx
2fd23: 64 48 33 0c 25 28 00 xor %fs:0x28,%rcx
2fd2a: 00 00
2fd2c: 75 08 jne 2fd36 <curl_easy_setopt@@CURL_OPENSSL_3+0xb6>
2fd2e: 48 81 c4 d8 00 00 00 add $0xd8,%rsp
2fd35: c3 retq
Curl_setopt() is not an externally provided symbol so you can't LD_PRELOAD it. Consider replacing curl_easy_setopt instead, which is the public and always accessible symbol.
As a second reason, the function Curl_setopt() doesn't even exist in more recent libcurls.

compiler over-optimization causing data run time and debugging inconsistency

I have the following code:
struct cre_eqEntry *
cre_eventGet(struct cre_eqObj *eq_obj)
{
struct cre_eqEntry *eqe = cre_queueTailNode(&eq_obj->q);
Memcpy(&tmpEqo, eq_obj, sizeof(struct cre_eqObj));
volatile u32 ddd = 0;
ddd = ((struct cre_eqEntry *)(eq_obj->q.dma_mem.virtaddr + 4 * eq_obj->q.tail))->evt;
CPUMemFenceReadWrite();
if (!ddd) {
tmp = eq_obj->q.tail;
assert(0);
return NULL;
}
}
It is a piece of kernel code. When I ran it, it failed at assert(0), so apparently ddd was 0. But when I used GDB to debug the core dump and printed out '((struct cre_eqEntry *)(eq_obj->q.dma_mem.virtaddr + 4 * eq_obj->q.tail))->evt', surprisingly, the value was not 0.
So I started suspecting a problem with compiler over-optimization. Here's the disassembly:
00000000000047ec <cre_eventGet>:
47ec: 55 push %rbp
47ed: 48 89 fe mov %rdi,%rsi
47f0: ba 80 00 00 00 mov $0x80,%edx
47f5: 53 push %rbx
47f6: 48 89 fb mov %rdi,%rbx
47f9: 48 83 ec 18 sub $0x18,%rsp
47fd: 0f b7 6f 24 movzwl 0x24(%rdi),%ebp
4801: 0f b7 47 28 movzwl 0x28(%rdi),%eax
4805: 0f af e8 imul %eax,%ebp
4808: 48 63 ed movslq %ebp,%rbp
480b: 48 03 6f 18 add 0x18(%rdi),%rbp
480f: 48 8d 3d 00 00 00 00 lea 0x0(%rip),%rdi # 4816 <cre_eventGet+0x2a>
4816: e8 00 00 00 00 callq 481b <cre_eventGet+0x2f>
481b: 0f b7 43 28 movzwl 0x28(%rbx),%eax
481f: 48 8b 53 18 mov 0x18(%rbx),%rdx
4823: c7 44 24 0c 00 00 00 movl $0x0,0xc(%rsp)
482a: 00
482b: c1 e0 02 shl $0x2,%eax
482e: 48 98 cltq
4830: 8b 04 02 mov (%rdx,%rax,1),%eax
4833: 89 44 24 0c mov %eax,0xc(%rsp)
4837: 0f ae f0 mfence
483a: 8b 44 24 0c mov 0xc(%rsp),%eax
483e: 85 c0 test %eax,%eax
4840: 74 14 je 4856 <cre_eventGet+0x6a>
As far as I can see, the assembly code does the same thing as the C code.
So now I have run out of ideas about what is causing the inconsistency of 'ddd'.
Please kindly give me some hints!
ddd = ((struct cre_eqEntry *)(eq_obj->q.dma_mem.virtaddr + 4 * eq_obj->q.tail))->evt;
Simplify your code. Perform address/boundary checks/validation. Your problem is likely that you are de-referencing some random, uninitialized, address within your process/thread's address space.
ddd = ((struct cre_eqEntry *)(eq_obj->q.dma_mem.virtaddr + 4 * eq_obj->q.tail))->evt; probably violates the strict aliasing rule (can't say 100% for sure without seeing the whole code).
If using gcc/clang, compile with -fno-strict-aliasing unless you want to rewrite your code to comply with the standard.
To do the latter, use memcpy((u32 *)&ddd, &((struct cre_eqEntry *)(eq_obj->q.dma_mem.virtaddr + 4 * eq_obj->q.tail))->evt, sizeof ddd); (note the extra parentheses: -> binds tighter than the cast). I guess your codebase may have similar violations in many places, so as a first step, using the compiler flag would be a way to see if this really is the problem.
The magic number 4 is suspicious too, review your code to check if this really is the correct offset and also check that it is not out of bounds of allocated memory.

Is GCC's option -O2 breaking this small program or do I have undefined behavior [duplicate]

This question already has answers here:
Decrementing a pointer out of bounds; incrementing it into bounds [duplicate]
(3 answers)
Why is out-of-bounds pointer arithmetic undefined behaviour?
(7 answers)
Closed 8 years ago.
I found this problem in a very large application, have made an SSCCE from it. I don't know whether the code has undefined behavior or -O2 breaks it.
When compiling it with gcc a.c -o a.exe -O2 -Wall -Wextra -Werror it prints 5.
But it prints 25 when compiling without -O2 (eg -O1) or uncommenting one of the 2 commented lines (prevent inlining).
#include <stdio.h>
#include <stdlib.h>
// __attribute__((noinline))
int f(int* todos, int input) {
int* cur = todos-1; // fixes the ++ at the beginning of the loop
int result = input;
while(1) {
cur++;
int ch = *cur;
// printf("(%i)\n", ch);
switch(ch) {
case 0:;
goto end;
case 1:;
result = result*result;
break;
}
}
end:
return result;
}
int main() {
int todos[] = { 1, 0}; // 1:square, 0:end
int input = 5;
int result = f(todos, input);
printf("=%i\n", result);
printf("end\n");
return 0;
}
Is GCC's option -O2 breaking this small program or do I have undefined behavior somewhere?
int* cur = todos-1;
invokes undefined behavior. todos - 1 is an invalid pointer address.
Emphasis mine:
(C99, 6.5.6p8) "If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined."
In supplement to @ouah's answer, this explains what the compiler is doing.
Generated assembler for reference:
400450: 48 83 ec 18 sub $0x18,%rsp
400454: be 05 00 00 00 mov $0x5,%esi
400459: 48 8d 44 24 fc lea -0x4(%rsp),%rax
40045e: c7 44 24 04 00 00 00 movl $0x0,0x4(%rsp)
400465: 00
400466: 48 83 c0 04 add $0x4,%rax
40046a: 8b 10 mov (%rax),%edx
However if I add a printf in main():
400450: 48 83 ec 18 sub $0x18,%rsp
400454: bf 84 06 40 00 mov $0x400684,%edi
400459: 31 c0 xor %eax,%eax
40045b: 48 89 e6 mov %rsp,%rsi
40045e: c7 04 24 01 00 00 00 movl $0x1,(%rsp)
400465: c7 44 24 04 00 00 00 movl $0x0,0x4(%rsp)
40046c: 00
40046d: e8 ae ff ff ff callq 400420 <printf@plt>
400472: 48 8d 44 24 fc lea -0x4(%rsp),%rax
400477: be 05 00 00 00 mov $0x5,%esi
40047c: 48 83 c0 04 add $0x4,%rax
400480: 8b 10 mov (%rax),%edx
Specifically (in the printf version), these two instructions populate the todos array:
40045e: c7 04 24 01 00 00 00 movl $0x1,(%rsp)
400465: c7 44 24 04 00 00 00 movl $0x0,0x4(%rsp)
This is conspicuously missing from the non-printf version, which for some reason only assigns the second element:
40045e: c7 44 24 04 00 00 00 movl $0x0,0x4(%rsp)
