GCC vs Clang copying struct flexible array member - c

Consider the following code snippet.
#include <stdio.h>
typedef struct s {
int _;
char str[];
} s;
s first = { 0, "abcd" };
int main(int argc, const char **argv) {
s second = first;
printf("%s\n%s\n", first.str, second.str);
}
When I compile this with GCC 7.2, I get:
$ gcc-7 -o tmp tmp.c && ./tmp
abcd
abcd
But when I compile this with Clang (Apple LLVM version 8.0.0 (clang-800.0.42.1)), I get the following:
$ clang -o tmp tmp.c && ./tmp
abcd
# Nothing here
Why does the output differ between the compilers? I would expect the string not to be copied, as it's a flexible array member (similar to this question). Why does GCC actually copy it?
Edit
Some comments and an answer suggested this might be due to optimization. GCC may make second an alias of first, so updating second should disallow GCC from doing that optimization. I added the line:
second._ = 1;
But this doesn't change the output.

Here's the real answer of what's going on with gcc. second is allocated on the stack, just as you'd expect. It is not an alias for first. This is easily verified by printing their addresses.
Additionally, the declaration s second = first; corrupts the stack: (a) gcc allocates only the minimum amount of storage for second, but (b) it copies all of first, including the string, into second.
Here is a modified version of the original code which shows this:
#include <stdio.h>
typedef struct s {
int _;
char str[];
} s;
s first = { 0, "abcdefgh" };
int main(int argc, const char **argv) {
char v[] = "xxxxxxxx";
s second = first;
printf("%p %p %p\n", (void *) v, (void *) &first, (void *) &second);
printf("<%s> <%s> <%s>\n", v, first.str, second.str);
}
On my 32-bit Linux machine, with gcc, I get the following output:
0xbf89a303 0x804a020 0xbf89a2fc
<defgh> <abcdefgh> <abcdefgh>
As you can see from the addresses, v and second are on the stack, and first is in the data section. It is also clear that the initialization of second has overwritten v on the stack: instead of the expected <xxxxxxxx>, it shows <defgh>.
This seems like a gcc bug to me. At the very least, it should warn that the initialization of second will corrupt the stack, since it clearly has enough information to know this at compile time.
Edit: I tested this some more, and obtained essentially equivalent results by splitting the declaration of second into:
s second;
second = first;
The real problem is the assignment. It's copying all of first, rather than the minimal common part of the structure type, which is what I believe it should do. In fact, if you move the static initialization of first into a separate file, the assignment does what it should do, v prints correctly, and second.str is undefined garbage. This is the behavior gcc should be producing, regardless of whether the initialization of first is visible in the same compilation unit or not.

So, for an answer: both compilers are behaving correctly, but the output you are getting is the result of undefined behavior.
GCC
Because you never modify second, GCC is simply making second an alias of first in its lookup table. Modify second and GCC cannot make that optimization, and you’ll get the same answer/crash as Clang.
Clang
Clang does not automatically apply the same optimization, it seems. So when it copies the structure, it does so correctly: It copies the single int and nothing else.
You were lucky that there was a zero value on the stack after your local second variable, terminating your unknown character string. Basically, you are using an uninitialized pointer. Were there no zero, you could have gotten a lot of garbage and a memory fault.
The purpose of this thing is to do low-level stuff, like implement a memory manager, etc, by casting some memory to your structure. The compiler is under no obligation to understand what you are doing; it is only under obligation to act as if you know what you are doing. If you fail to cast the structure type over memory that actually has data of that type in it, all bets are off.
edit
So, using godbolt.org and looking at the assembly:
.LC0:
.string "%s\n%s\n"
main:
sub rsp, 24
mov eax, DWORD PTR first[rip]
mov esi, OFFSET FLAT:first+4
lea rdx, [rsp+16]
mov edi, OFFSET FLAT:.LC0
mov DWORD PTR [rsp+12], eax
xor eax, eax
call printf
xor eax, eax
add rsp, 24
ret
first:
.long 0
.string "abcd"
We see that GCC is, actually, doing exactly what I said with the OP’s original code: treating second as an alias of first.
Tom Karzes has significantly modified the code, and so is experiencing a different issue. What he reports does appear to be a bug; I haven’t time ATM to figure out what is really happening with his stack-corrupting assignment.

Related

GCC produces code with Unaligned Address Error on m68k when defining a local string

I am declaring a string that is 5 bytes long (including the null terminator) in a function:
int main(){
char fstring[] = "AAAA";
...
Which generates the following asm code:
main:
link.w %fp,#-16
move.l #1094795585,-13(%fp)
clr.b -9(%fp)
...
A move.l or move.w to an odd address causes an Address Error exception on 68k.
Is there any way to force the compiler to emit correct code?
Since you're compiling for the original 68000, this is definitely a bug. Writing a word or longword to an odd memory address is always an alignment fault and the compiler should be smart enough to know this.

C return address of stack variable = NULL?

In C, when you have a function that returns a pointer to one of its local (on the stack) variables, the calling function gets null returned instead. Why does that happen?
I can do this in C on my hardware
void A() {
int A = 5;
}
void B() {
// B will be 5 even when uninitialised due to the B stack frame using
// the old memory layout of A
int B;
printf("%d\n", B);
}
int main() {
A();
B();
}
This works because the stack frame memory doesn't get reset, so B's frame overlays the memory that A's frame occupied on the stack.
However I can't do
int* C() {
int C = 10;
return &C;
}
int main() {
// D will be null ?
int* D = C();
}
I know I shouldn't write code like this: it's UB, it behaves differently on different hardware, compilers could optimize it and change the behaviour of the example, and the value would get clobbered by the next function call in this example anyway.
But I was wondering why specifically D is null when compiled with GCC and why I get a segmentation fault if I try and access that memory address, shouldn't the bits still be there?
Is it the compiler doing this?
GCC sees the undefined behaviour (UB) visible at compile time and decides to just return NULL on purpose. This is good: noisy failure right away on first use of a value is easier to debug. Returning NULL was a new feature somewhere around GCC5; as @P__J__'s answer shows on Godbolt, GCC4.9 prints non-null stack addresses.
Other compilers may behave differently, but any decent compiler will warn about this error. See also What Every C Programmer Should Know About Undefined Behavior.
Or with optimization disabled, you could use a tmp variable to hide the UB from the compiler. Like int *p = &C; return p; because gcc -O0 doesn't optimize across statements. (Or with optimization enabled, make that pointer variable volatile to launder a value through it, hiding the source of the pointer value from the optimizer.)
#include <stdio.h>
int* C() {
int C = 10;
int *volatile p = &C; // volatile pointer to plain int
return p; // still UB, but hidden from the compiler
}
int main()
{
int* D = C();
printf("%p\n", (void *)D);
if (D){
printf("%#x\n", *D); // in theory should be passing an unsigned int for %x
}
}
Compiling and running on the Godbolt compiler explorer, with gcc10.1 -O3 for x86-64:
0x7ffcdbf188e4
0x7ffc
Interestingly, the dead store to int C was optimized away, although the variable does still have an address. It has its address taken, but the var holding the address doesn't escape the function until int C goes out of scope at the same time that address is returned. Thus no well-defined accesses to the 10 value are possible, and it is valid for the compiler to make this optimization. Making int C volatile as well would give us the value.
The asm for C() is:
C:
lea rax, [rsp-12] # address in the red-zone, below RSP
mov QWORD PTR [rsp-8], rax # store to a volatile local var, also in the red zone
mov rax, QWORD PTR [rsp-8] # reload it as return value
ret
The version that actually runs is inlined into main and behaves similarly. It's loading some garbage value from the callstack that was left there, probably the top half of an address. (x86-64's 64-bit addresses only have 48 significant bits. The low half of the canonical range always has 16 leading zero bits).
But it's memory that wasn't written by main, so perhaps an address used by some function that ran before main.
// B will be 5 even when uninitialised due to the B stack frame using
// the old memory layout of A
int B;
Nothing about that is guaranteed. It's just luck that that happens to work out when optimization is disabled. With a normal level of optimization like -O2, reading an uninitialized variable might just read as 0 if the compiler can see that at compile time. Definitely no need for it to load from the stack.
And the other function would have optimized away a dead store.
GCC also warns for use-uninitialized.
It is undefined behaviour (UB), but many modern compilers (for example newer versions of gcc), when they detect a function returning a reference to an automatic storage variable, return NULL instead as a precaution.
example here:
https://godbolt.org/z/H-zU4C

How to dereference zero address with GCC? [duplicate]

Suppose I need to write to zero address (e.g. I've mmapped something there and want to access it, for whatever reason including curiosity), and the address is known at compile time. Here are some variants I could think of to obtain the pointer; one of these works and the other three don't:
#include <stdint.h>
void testNullPointer()
{
// Obviously UB
unsigned* p=0;
*p=0;
}
void testAddressZero()
{
// doesn't work for zero, GCC detects it as NULL
uintptr_t x=0;
unsigned* p=(unsigned*)x;
*p=0;
}
void testTrickyAddressZero()
{
// works, but the resulting assembly is not as terse as it could be
unsigned* p;
asm("xor %0,%0\n":"=r"(p));
*p=0;
}
void testVolatileAddressZero()
{
// p is updated, but the code doesn't actually work
unsigned*volatile p=0;
*p=0; // because this doesn't dereference p! // EDIT: pointee should also be volatile, then this will work
}
I compile this with
gcc test.c -masm=intel -O3 -c -o test.o
and then objdump -d test.o -M intel --no-show-raw-insn gives me (alignment bytes are skipped here):
00000000 <testNullPointer>:
0: mov DWORD PTR ds:0x0,0x0
a: ud2a
00000010 <testAddressZero>:
10: mov DWORD PTR ds:0x0,0x0
1a: ud2a
00000020 <testTrickyAddressZero>:
20: xor eax,eax
22: mov DWORD PTR [eax],0x0
28: ret
00000030 <testVolatileAddressZero>:
30: sub esp,0x10
33: mov DWORD PTR [esp+0xc],0x0
3b: mov eax,DWORD PTR [esp+0xc]
3f: add esp,0x10
42: ret
Here the testNullPointer obviously has UB since it dereferences what is null pointer by definition.
The principle of testAddressZero would give the expected code for any other than 0 address, e.g. 1, but for zero GCC appears to detect that address zero corresponds to null pointer, so also generates UD2.
The asm way of getting the zero address certainly inhibits the compiler's checks, but the price is that one has to write different assembly code for each architecture, even though the principle of testAddressZero could have worked everywhere (i.e. the same flat memory model on each arch) were it not for UD2 and similar traps. Also, the code is not as terse as in the above two variants.
The volatile pointer approach would seem to be the best, but the generated code does not actually dereference the address for some reason, so it's also broken.
The question now: if I'm targeting GCC, how can I seamlessly access zero address without any traps or other consequences of UB, and without the need to write in assembly?
As a workaround you can use the GCC option -fno-delete-null-pointer-checks, which stops the compiler from assuming that a null pointer is never safely dereferenced.
The optimization it disables, -fdelete-null-pointer-checks, assumes that no code or data lives at address zero; turning it off is intended for targets where that assumption does not hold, which is the case here.
I would put the pointer into a global variable:
const uintptr_t zero = 0;
unsigned* zeroAddress= (unsigned *)zero;
void testZeroAddressPointer()
{
*zeroAddress=0;
}
Provided you expose the address beyond the scope of optimization (so the compiler can't figure out you don't set it somewhere else), that should do the trick, albeit slightly less efficiently.
Edit: make this code independent of implicit zero to null conversion.
The 0 address is the C99 NULL pointer (actually the "implementation" of the null pointer, which you can often write as 0....) on all the architectures I know about.
The null pointer has a very specific status in hosted C99: when a pointer can be (or was) dereferenced, it is guaranteed (by the language specification) to not be NULL (otherwise, it is undefined behavior).
Hence, the GCC compiler has the right to optimize (and actually will optimize)
int *p = something();
int x = *p;
/// the compiler is permitted to skip the following
/// because p has been dereferenced so cannot be NULL
if (p == NULL) { doit(); return; };
In your case, you might want to compile for the freestanding subset of the C99 standard. So compile with gcc -ffreestanding (beware, this option can bring some infelicities).
BTW, you might declare some extern char strange[] __attribute__((weak)); (perhaps even add asm("0") ...) and have some assembler or linker trick to make that strange have a 0 address. The compiler would not know that such a strange symbol is in fact at the 0 address...
My strong suggestion is to avoid dereferencing the 0 address.... See this. If you really need to dereference the address 0, be prepared to suffer.... (so code some asm, lower the optimization, etc...).
(If you have mmap-ed the first page, just avoid using its first byte at address 0; that is often not a big deal.)
(IIRC, you are touching a grey area of GCC optimizations - and perhaps even of the C99 language specification, and you certainly want the freestanding flavor of C; notice that -O3 optimization for freestanding C is not well tested in the GCC compiler and might have residual bugs....)
You could consider changing the GCC compiler so that the null pointer has the numerical address 42. That would take some work.

C assembler function casting

I came across this piece of code (for the whole program see this page, see the program named "srop.c").
My question is regarding how func is used in the main method. I have only kept the code which I thought could be related.
It is the line *ret = (int)func +4; that confuses me.
There are three questions I have regarding this:
1. func(void) is a function. Should it not be called with func() (note the brackets)?
2. Accepting that this might be some way of calling a function that is unknown to me, how can it be cast to an int when it should return void?
3. I understand that the author doesn't want to save the frame pointer or update it (the prologue), as his comment indicates. How is this skipping two lines ahead achieved by casting the function to an int and adding four?
(gdb) disassemble func
Dump of assembler code for function func:
0x000000000040069b <+0>: push %rbp
0x000000000040069c <+1>: mov %rsp,%rbp
0x000000000040069f <+4>: mov $0xf,%rax
0x00000000004006a6 <+11>: retq
0x00000000004006a7 <+12>: pop %rbp
0x00000000004006a8 <+13>: retq
End of assembler dump.
Possibly relevant is that when compiled gcc tells me the following:
warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
Please see below for the code.
void func(void)
{
asm("mov $0xf,%rax\n\t");
asm("retq\n\t");
}
int main(void)
{
unsigned long *ret;
/*...*/
/* overflowing */
ret = (unsigned long *)&ret + 2;
*ret = (int)func +4; //skip gadget's function prologue
/*...*/
return 0;
}
[Edit] Following the very helpful advice, here are some further information:
calling func returns a pointer to the start of the function: 0x400530
casting this to an int is dangerous (in hex) 400530
casting this to an int in decimal 4195632
safe cast to unsigned long 4195632
size of void pointer: 8
size of int: 4
size of unsigned long: 8
[Edit 2:] @cmaster: Could you please point me to some more information regarding how to put the assembler function in a separate file and link to it? The original program will not compile because it doesn’t know what the function prog (when put in the assembler file) is, so it must be added either before or during compilation?
Additionally, gcc -S, when run on a C file that includes only the assembly commands, seems to add a lot of extra information. Could func(void) not be represented by the following assembler code?
func:
mov $0xf,%rax
retq
This code assumes a lot more than what is good for it. Anyway, the snippet that you have shown only tries to produce a pointer to the assembler function body, it does not attempt to call it. Here is what it does, and what it assumes:
func by itself produces a pointer to the function.
Assumption 1:
The pointer actually points to the start of the assembler code for func. That assumption is not necessarily right, there are architectures where a function pointer is actually a pointer to a pair of pointers, one of which points to the code, the other of which points to a data segment.
func + 4 increments this pointer to point to the first instruction of the body of the function.
Assumption 2:
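Function pointers can be incremented, and their increment is in terms of bytes. Arithmetic on function pointers is not defined by the C standard; GNU C allows it as an extension by treating the size of a function as 1. Note, however, that in (int)func + 4 the cast binds more tightly than the +, so the addition in the code as written is performed on the integer result of the cast.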
Assumption 3:
The prolog that is inserted by the compiler is precisely four bytes long. There is absolutely nothing that dictates what kind of prolog the compiler should emit, there is a multitude of variants allowed, with very different lengths. The code you've given tries to control the length of the prolog by not passing/returning any parameters, but still there can be compilers that produce a different prolog. Worse, the size of the prolog may depend on the optimization level.
The resulting pointer is cast to an int.
Assumption 4:
sizeof(void (*)(void)) == sizeof(int). This is false on most 64 bit systems: on these systems int is usually still four bytes while a pointer occupies eight bytes. On such a system, the pointer value will be truncated. When the int is cast back into a function pointer and called, this will likely crash the program.
My advice:
If you really want to program in assembler, compile a file with only an empty function with gcc -S. This will give you an assembler source file with all the cruft that's needed for the assembler to produce a valid object file and show you where you can add the code for your own function. Modify that file in any way you like, and then compile it together with some calling C code as normal. That way you avoid all these dangerous little assumptions.
The name of a function is a pointer to the start of the function. So the author is not calling the function at that point. Just saving a reference to the start of it.
It's not a void. It's a function pointer. More precisely in this case it is of type: void (*)(void). A pointer is just an address so can be cast to an int (but the address may be truncated if compiled for a 64 bit system as ints are 32 bits in that case).
The first instruction of the function pushes the fp onto the stack. By adding 4, that instruction is skipped. Note that in the snippets you gave the function has not been called. It's probably part of the code that you have not included.

Printf the current address in C program

Imagine I have the following simple C program:
int main() {
int a=5, b= 6, c;
c = a +b;
return 0;
}
Now, I would like to know the address of the expression c=a+b, that is the program address
where this addition is carried out. Is there any possibility that I could use printf?
Something along the line:
int main() {
int a=5, b= 6, c;
printf("Address of printf instruction in memory: %x", current_address_pointer_or_something);
c = a +b;
return 0;
}
I know how I could find the address by using gdb and then info line file.c:line. However, I would like to know whether I could also do that directly with printf.
In gcc, you can take the address of a label using the && operator. So you could do this:
#include <stdio.h>
int main()
{
int a=5, b= 6, c;
sum:
c = a+b;
printf("Address of sum label in memory: %p", &&sum);
return 0;
}
The result of &&sum is the target of the jump instruction that would be emitted if you did a goto sum. So, while it's true that there's no one-to-one address-to-line mapping in C/C++, you can still say "get me a pointer to this code."
Visual C++ has the _ReturnAddress intrinsic, which can be used to get some info here.
For instance:
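#include <intrin.h>
#include <stdio.h>
#pragma intrinsic(_ReturnAddress)
__declspec(noinline) void PrintCurrentAddress()
{
printf("%p", _ReturnAddress()); // address of the instruction after the call
}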
Which will give you an address close to the expression you're looking at. In the event of some optimizations, like tail folding, this will not be reliable.
Tested in Visual Studio 2008:
int addr;
__asm
{
call _here
_here: pop eax
; eax now holds the PC.
mov [addr], eax
}
printf("%x\n", addr);
Credit to this question.
Here's a sketch of an alternative approach:
Assume that you haven't stripped debug symbols, and in particular you have the line number to address table that a source-level symbolic debugger needs in order to implement things like single step by source line, set a break point at a source line, and so forth.
Most tool chains use reasonably well documented debug data formats, and there are often helper libraries that implement most of the details.
Given that and some help from the preprocessor macro __LINE__ which evaluates to the current line number, it should be possible to write a function which looks up the address of any source line.
Advantages are that no assembly is required, portability can be achieved by calling on platform-specific debug information libraries, and it isn't necessary to directly manipulate the stack or use tricks that break the CPU pipeline.
A big disadvantage is that it will be slower than any approach based on directly reading the program counter.
For x86:
int test()
{
__asm {
mov eax, [esp]
}
}
__declspec(noinline) int main() // or whatever noinline feature your compiler has
{
int a = 5;
int aftertest;
aftertest = test()+3; // aftertest = disasms to 89 45 F8 mov dword ptr [a],eax.
printf("%i", a+9);
printf("%x", test());
return 0;
}
I don't know the details, but there should be a way to make a call to a function that can then crawl the return stack for the address of the caller, and then copy and print that out.
Using gcc on i386 or x86-64:
#include <stdio.h>
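/* lea/mov $ take the label's address; a plain "mov 1b, %0" would load the bytes stored there */
#if defined(__x86_64__)
#define ADDRESS_HERE() ({ void *p; __asm__("1: lea 1b(%%rip), %0" : "=r" (p)); p; })
#else
#define ADDRESS_HERE() ({ void *p; __asm__("1: mov $1b, %0" : "=r" (p)); p; })
#endif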
int main(void) {
printf("%p\n", ADDRESS_HERE());
return 0;
}
Note that due to the presence of compiler optimizations, the apparent position of the expression might not correspond to its position in the original source.
The advantage of using this method over the &&foo label method is it doesn't change the control-flow graph of the function. It also doesn't break the return predictor unit like the approaches using call :)
On the other hand, it's very much architecture-dependent... and because it doesn't perturb the CFG there's no guarantee that jumping to the address in question would make any sense at all.
If the compiler is any good, this addition happens in registers and is never stored in memory, at least not in the way you are thinking. Actually, a good compiler will see that your program does nothing: manipulating values within a function but never sending those values anywhere outside the function can result in no code.
If you were to:
c = a+b;
printf("%u\n",c);
Then a good compiler will still never store the value c in memory; it will stay in registers, although that depends on the processor as well. If, for example, compilers for that processor use the stack to pass arguments to functions, then the value for c will be computed using registers (a good compiler will see that c is always 11 and just assign it) and the value will be put on the stack while being passed to the printf function. Naturally the printf function may well need temporary storage in memory due to its complexity (it can't fit everything it needs to do in registers).
Where I am heading is that there is no answer to your question. It is heavily dependent on the processor, compiler, etc. There is no generic answer. I have to wonder what the root of the question is, if you were hoping to probe with a debugger, then this is not the question to ask.
Bottom line: disassemble your program and look at it. For that compile, on that day, with those settings, you will be able to see where the compiler has placed intermediate values. Even if the compiler assigns a memory location for the variable, that doesn't mean the program will ever store the variable in that location. It depends on optimizations.
