How to dereference zero address with GCC? [duplicate] - c

This question already has answers here:
C standard compliant way to access null pointer address?
(5 answers)
Closed 7 years ago.
Suppose I need to write to address zero (e.g. I've mmapped something there and want to access it, for whatever reason, including curiosity), and the address is known at compile time. Here are some variants I could think of for obtaining the pointer; one of them works and the other three don't:
#include <stdint.h>
void testNullPointer()
{
// Obviously UB
unsigned* p=0;
*p=0;
}
void testAddressZero()
{
// doesn't work for zero, GCC detects it as NULL
uintptr_t x=0;
unsigned* p=(unsigned*)x;
*p=0;
}
void testTrickyAddressZero()
{
// works, but the resulting assembly is not as terse as it could be
unsigned* p;
asm("xor %0,%0\n":"=r"(p));
*p=0;
}
void testVolatileAddressZero()
{
// p is updated, but the code doesn't actually work
unsigned*volatile p=0;
*p=0; // because this doesn't dereference p! // EDIT: pointee should also be volatile, then this will work
}
I compile this with
gcc test.c -masm=intel -O3 -c -o test.o
and then objdump -d test.o -M intel --no-show-raw-insn gives me (alignment bytes are skipped here):
00000000 <testNullPointer>:
0: mov DWORD PTR ds:0x0,0x0
a: ud2a
00000010 <testAddressZero>:
10: mov DWORD PTR ds:0x0,0x0
1a: ud2a
00000020 <testTrickyAddressZero>:
20: xor eax,eax
22: mov DWORD PTR [eax],0x0
28: ret
00000030 <testVolatileAddressZero>:
30: sub esp,0x10
33: mov DWORD PTR [esp+0xc],0x0
3b: mov eax,DWORD PTR [esp+0xc]
3f: add esp,0x10
42: ret
Here testNullPointer obviously has UB, since it dereferences what is a null pointer by definition.
The approach in testAddressZero would give the expected code for any address other than 0, e.g. 1, but for zero GCC appears to detect that address zero corresponds to the null pointer, so it also generates UD2.
The asm way of getting the zero address certainly inhibits the compiler's checks, but the price is that one has to write different assembly code for each architecture, even though the approach in testAddressZero could have worked everywhere (i.e. with the same flat memory model on each arch) were it not for UD2 and similar traps. Also, the code is not as terse as in the two variants above.
The volatile-pointer approach would seem to be the best, but the code generated here does not dereference the address for some reason, so it's also broken.
The question now: if I'm targeting GCC, how can I seamlessly access zero address without any traps or other consequences of UB, and without the need to write in assembly?

As a workaround you can use the GCC option -fno-delete-null-pointer-checks, which stops the compiler from assuming that a null pointer is never dereferenced and from optimizing on that assumption.
The optimization it disables (-fdelete-null-pointer-checks) assumes that no code or data lives at address zero, so turning it off is appropriate in specific cases such as this one.
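A minimal sketch of how the flag might be combined with a volatile access, assuming a hypothetical helper named poke32 that is not part of the question. The pointee is volatile-qualified so the store has to be emitted. Whether a store to address 0 actually succeeds still depends on something being mapped there.

```c
#include <stdint.h>

/* Hypothetical helper: store `value` at the numeric address `addr`.
 * Intended build line: gcc -O3 -fno-delete-null-pointer-checks ...
 * The pointee is volatile so the store is not optimized away. */
void poke32(uintptr_t addr, unsigned value)
{
    volatile unsigned *p = (volatile unsigned *)addr;
    *p = value;
}
```

With the flag in place, a call such as poke32(0, 0) would be the zero-address store the question asks about.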

I would put the pointer into a global variable:
const uintptr_t zero = 0;
unsigned* zeroAddress= (unsigned *)zero;
void testZeroAddressPointer()
{
*zeroAddress=0;
}
Provided you expose the address beyond the scope of optimization (so the compiler can't figure out you don't set it somewhere else), that should do the trick, albeit slightly less efficiently.
Edit: make this code independent of implicit zero to null conversion.

The 0 address is the C99 NULL pointer (actually the "implementation" of the null pointer, which you can often write as 0....) on all the architectures I know about.
The null pointer has a very specific status in hosted C99: when a pointer can be (or was) dereferenced, it is guaranteed (by the language specification) to not be NULL (otherwise, it is undefined behavior).
Hence, the GCC compiler has the right to optimize (and actually will optimize)
int *p = something();
int x = *p;
/// the compiler is permitted to skip the following
/// because p has been dereferenced so cannot be NULL
if (p == NULL) { doit(); return; };
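For contrast, a small sketch of the ordering that keeps the check meaningful: the pointer is tested before the first dereference, so the compiler has no earlier access from which to infer that it is non-null. The function name read_or_default is an assumption made for illustration.

```c
#include <stddef.h>

/* Hypothetical helper: test the pointer *before* the first dereference,
 * so the null check cannot be removed as redundant. */
int read_or_default(const int *p, int fallback)
{
    if (p == NULL)
        return fallback;
    return *p;
}
```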
In your case, you might want to compile for the freestanding subset of the C99 standard. So compile with gcc -ffreestanding (beware, this option can bring some infelicities).
BTW, you might declare some extern char strange[] __attribute__((weak)); (perhaps even add asm("0") ...) and have some assembler or linker trick to make that strange have a 0 address. The compiler would not know that such a strange symbol is in fact at the 0 address...
My strong suggestion is to avoid dereferencing the 0 address.... See this. If you really need to dereference the address 0, be prepared to suffer.... (so code some asm, lower the optimization, etc...).
(If you have mmap-ed the first page, just avoid using its first byte at address 0; that is often not a big deal.)
(IIRC, you are touching a grey area of GCC optimizations - and perhaps even of the C99 language specification, and you certainly want the free standing flavor of C; notice that -O3 optimization for free standing C is not well tested in the GCC compiler and might have residual bugs....)
You could consider changing the GCC compiler so that the null pointer has the numerical address 42. That would take some work.

Related

C return address of stack variable = NULL?

In C, when a function returns a pointer to one of its local (on the stack) variables, the calling function gets null returned instead. Why does that happen?
I can do this in C on my hardware
#include <stdio.h>
void A() {
int A = 5;
}
void B() {
// B will be 5 even when uninitialised due to the B stack frame using
// the old memory layout of A
int B;
printf("%d\n", B);
}
int main() {
A();
B();
}
Due to the fact that the stack frame memory doesn't get reset and B overlays A's memory record in the stack.
However I can't do
int* C() {
int C = 10;
return &C;
}
int main() {
// D will be null ?
int* D = C();
}
I know I shouldn't write code like this: it's UB, it behaves differently on different hardware, compilers could optimize it in ways that change the behaviour of the example, and the value would get clobbered the next time we call another function anyway.
But I was wondering why specifically D is null when compiled with GCC and why I get a segmentation fault if I try and access that memory address, shouldn't the bits still be there?
Is it the compiler doing this?
GCC sees the undefined behaviour (UB) visible at compile time and decides to just return NULL on purpose. This is good: noisy failure right away on first use of a value is easier to debug. Returning NULL was a new feature somewhere around GCC5; as @P__J__'s answer shows on Godbolt, GCC4.9 prints non-null stack addresses.
Other compilers may behave differently, but any decent compiler will warn about this error. See also What Every C Programmer Should Know About Undefined Behavior
Or with optimization disabled, you could use a tmp variable to hide the UB from the compiler. Like int *p = &C; return p; because gcc -O0 doesn't optimize across statements. (Or with optimization enabled, make that pointer variable volatile to launder a value through it, hiding the source of the pointer value from the optimizer.)
#include <stdio.h>
int* C() {
int C = 10;
int *volatile p = &C; // volatile pointer to plain int
return p; // still UB, but hidden from the compiler
}
int main()
{
int* D = C();
printf("%p\n", (void *)D);
if (D){
printf("%#x\n", *D); // in theory should be passing an unsigned int for %x
}
}
Compiling and running on the Godbolt compiler explorer, with gcc10.1 -O3 for x86-64:
0x7ffcdbf188e4
0x7ffc
Interestingly, the dead store to int C was optimized away, although the variable does still have an address. It has its address taken, but the var holding the address doesn't escape the function until int C goes out of scope, at the same time that address is returned. Thus no well-defined accesses to the 10 value are possible, and it is valid for the compiler to make this optimization. Making int C volatile as well would give us the value.
The asm for C() is:
C:
lea rax, [rsp-12] # address in the red-zone, below RSP
mov QWORD PTR [rsp-8], rax # store to a volatile local var, also in the red zone
mov rax, QWORD PTR [rsp-8] # reload it as return value
ret
The version that actually runs is inlined into main and behaves similarly. It's loading some garbage value from the callstack that was left there, probably the top half of an address. (x86-64's 64-bit addresses only have 48 significant bits. The low half of the canonical range always has 16 leading zero bits).
But it's memory that wasn't written by main, so perhaps an address used by some function that ran before main.
// B will be 5 even when uninitialised due to the B stack frame using
// the old memory layout of A
int B;
Nothing about that is guaranteed. It's just luck that that happens to work out when optimization is disabled. With a normal level of optimization like -O2, reading an uninitialized variable might just read as 0 if the compiler can see that at compile time. Definitely no need for it to load from the stack.
And the other function would have optimized away a dead store.
GCC also warns for use-uninitialized.
It is undefined behaviour (UB), but many modern compilers (for example newer versions of gcc), when they detect that a function returns a reference to an automatic storage variable, return NULL instead as a precaution.
example here:
https://godbolt.org/z/H-zU4C
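For comparison, a sketch of two well-defined ways to hand a value back to the caller without returning the address of a dead local. The function names make_value and static_value are assumptions made for illustration.

```c
/* Hypothetical alternatives; neither returns a pointer to a dead local. */

/* 1. The caller owns the storage and passes its address in. */
void make_value(int *out)
{
    *out = 10;
}

/* 2. The object has static storage duration, so it outlives the call. */
int *static_value(void)
{
    static int value = 10;
    return &value;
}
```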

Provoke stack underflow in C

I would like to provoke a stack underflow in a C function to test security measures in my system. I could do this using inline assembler. But C would be more portable. However I can not think of a way to provoke a stack underflow using C since stack memory is safely handled by the language in that regard.
So, is there a way to provoke a stack underflow using C (without using inline assembler)?
As stated in the comments: Stack underflow means having the stack pointer to point to an address below the beginning of the stack ("below" for architectures where the stack grows from low to high).
There's a good reason why it's hard to provoke a stack underflow in C. The reason is that standards-compliant C does not have a stack.
Have a read of the C11 standard and you'll find that it talks about scopes but it does not talk about stacks. The reason for this is that the standard tries, as far as possible, to avoid forcing any design decisions on implementations. You may be able to find a way to cause stack underflow in pure C for a particular implementation, but it will rely on undefined behaviour or implementation-specific extensions and won't be portable.
You can't do this in C, simply because C leaves stack handling to the implementation (compiler). Similarly, you cannot write a bug in C where you push something on the stack but forget to pop it, or vice versa.
Therefore, it is impossible to produce a "stack underflow" in pure C. You cannot pop from the stack in C, nor can you set the stack pointer from C. The concept of a stack is something on an even lower level than the C language. In order to directly access and control the stack pointer, you must write assembler.
What you can do in C is to purposely write out of bounds of the stack. Suppose we know that the stack starts at 0x1000 and grows upwards. Then we can do this:
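#include <stddef.h>
#include <stdint.h>
void scribble(size_t n, uint8_t garbage) // hypothetical wrapper so that n and garbage are defined
{
    volatile uint8_t* const STACK_BEGIN = (volatile uint8_t*)0x1000;
    for(volatile uint8_t* p = STACK_BEGIN; p<STACK_BEGIN+n; p++)
    {
        *p = garbage; // write outside the stack area, at whatever memory comes next
    }
}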
Why you would need to test this in a pure C program that doesn't use assembler, I have no idea.
In case someone incorrectly got the idea that the above code invokes undefined behavior, this is what the C standard actually says, normative text C11 6.5.3.2/4 (emphasis mine):
The unary * operator denotes indirection. If the operand points to a function, the result is
a function designator; if it points to an object, the result is an lvalue designating the
object. If the operand has type ‘‘pointer to type’’, the result has type ‘‘type’’. If an
invalid value has been assigned to the pointer, the behavior of the unary * operator is
undefined 102)
The question is then what's the definition of an "invalid value", as this is no formal term defined by the standard. Foot note 102 (informative, not normative) provides some examples:
Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an
address inappropriately aligned for the type of object pointed to, and the address of an object after the
end of its lifetime.
In the above example we are clearly not dealing with a null pointer, nor with an object that has passed the end of its lifetime. The code may indeed cause a misaligned access - whether this is an issue or not is determined by the implementation, not by the C standard.
And the final case of "invalid value" would be an address that is not supported by the specific system. This is obviously not something that the C standard mentions, because memory layouts of specific systems are not covered by the C standard.
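The misalignment case can be checked up front. The sketch below assumes a hypothetical helper is_aligned_u32 and a flat-memory target where converting a pointer to uintptr_t yields its numeric address; that conversion is implementation-defined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if `p` is suitably aligned for uint32_t.
 * Assumes the pointer-to-integer conversion yields the numeric address,
 * which is implementation-defined but usual on flat-memory targets. */
bool is_aligned_u32(const volatile void *p)
{
    return ((uintptr_t)p % _Alignof(uint32_t)) == 0;
}
```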
It is not possible to provoke stack underflow in C. In order to provoke underflow the generated code should have more pop instructions than push instructions, and this would mean the compiler/interpreter is not sound.
In the 1980s there were implementations of C that ran C by interpretation, not by compilation. Really some of them used dynamic vectors instead of the stack provided by the architecture.
stack memory is safely handled by the language
Stack memory is not handled by the language, but by the implementation. It is possible to run C code and not to use stack at all.
Neither the ISO 9899 nor K&R specifies anything about the existence of a stack in the language.
It is possible to play tricks and smash the stack, but this will not work on every implementation, only on some. The return address is kept on the stack and you have write permission to modify it, but this is neither underflow nor portable.
Regarding already existing answers: I don't think that talking about undefined behaviour in the context of exploitation mitigation techniques is appropriate.
Clearly, if an implementation provides a mitigation against stack underflows, a stack is provided. In practice, void foo(void) { char crap[100]; ... } will end up having the array on the stack.
A note prompted by comments to this answer: undefined behaviour is a thing and in principle any code exercising it can end up being compiled to absolutely anything, including something not resembling the original code in the slightest. However, the subject of exploit mitigation techniques is closely tied to the target environment and what happens in practice. In practice, the code below should "work" just fine. When dealing with this kind of stuff you always have to verify generated assembly to be sure.
Which brings me to what in practice will give an underflow (volatile added to prevent the compiler from optimising it away):
static void
underflow(void)
{
volatile char crap[8];
int i;
for (i = 0; i != -256; i--)
crap[i] = 'A';
}
int
main(void)
{
underflow();
}
Valgrind nicely reports the problem.
By definition, a stack underflow is a type of undefined behaviour, and thus any code which triggers such a condition must be UB. Therefore, you can't reliably cause a stack underflow.
That said, the following abuse of variable-length arrays (VLAs) will cause a controllable stack underflow in many environments (tested with x86, x86-64, ARM and AArch64 with Clang and GCC), actually setting the stack pointer to point above its initial value:
#include <stdint.h>
#include <stdio.h>
#include <string.h>
int main(int argc, char **argv) {
uintptr_t size = -((argc+1) * 0x10000);
char oops[size];
strcpy(oops, argv[0]);
printf("oops: %s\n", oops);
}
This allocates a VLA with a "negative" (very very large) size, which will wrap the stack pointer around and result in the stack pointer moving upwards. argc and argv are used to prevent optimizations from taking out the array. Assuming that the stack grows down (default on the listed architectures), this will be a stack underflow.
strcpy will either trigger a write to an underflowed address when the call is made, or when the string is written if strcpy is inlined. The final printf should not be reachable.
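The "negative" size can be looked at in isolation. This sketch mirrors the size expression from the snippet above in a hypothetical function wrapped_vla_size: the int result is negative, and converting it to the unsigned type uintptr_t wraps it into a very large value.

```c
#include <stdint.h>

/* Hypothetical helper mirroring the VLA size expression.
 * The int value is negative; conversion to uintptr_t wraps it
 * modulo UINTPTR_MAX + 1 into a huge positive value. */
uintptr_t wrapped_vla_size(int argc)
{
    int negative = -((argc + 1) * 0x10000);
    return (uintptr_t)negative;
}
```

For argc == 1 the result is UINTPTR_MAX - 0x20000 + 1, which is why subtracting it from the stack pointer effectively adds 0x20000.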
Of course, this all assumes a compiler which doesn't just make the VLA some kind of temporary heap allocation - which a compiler is completely free to do. You should check the generated assembly to verify that the above code does what you actually expect it to do. For example, on ARM (gcc -O):
8428: e92d4800 push {fp, lr}
842c: e28db004 add fp, sp, #4, 0
8430: e1e00000 mvn r0, r0 ; r0 = ~argc = -(argc+1)
8434: e1a0300d mov r3, sp
8438: e0433800 sub r3, r3, r0, lsl #16 ; r3 = sp - (-(argc+1)) * 0x10000
843c: e1a0d003 mov sp, r3 ; sp = r3
8440: e1a0000d mov r0, sp
8444: e5911004 ldr r1, [r1]
8448: ebffffc6 bl 8368 <strcpy@plt> ; strcpy(sp, argv[0])
This assumption:
C would be more portable
is not true. C doesn't tell anything about a stack and how it is used by the implementation. On your typical x86 platform, the following (horribly invalid) code would access the stack outside of the valid stack frame (until it is stopped by the OS), but it would not actually "pop" from it:
#include <stdarg.h>
#include <stdio.h>
int underflow(int dummy, ...)
{
va_list ap;
va_start(ap, dummy);
int sum = 0;
for(;;)
{
int x = va_arg(ap, int);
fprintf(stderr, "%d\n", x);
sum += x;
}
return sum;
}
int main(void)
{
return underflow(42);
}
So, depending on what exactly you mean by "stack underflow", this code does what you want on some platforms. But since, from a C point of view, it just invokes undefined behavior, I wouldn't suggest using it. It's not "portable" at all.
Is it possible to do it reliably in standard compliant C? No
Is it possible to do it on at least one practical C compiler without resorting to inline assembler? Yes
void * foo(char * a) {
return __builtin_return_address(0);
}
void * bar(void) {
char a[100000];
return foo(a);
}
typedef void (*baz)(void);
int main() {
void * a = bar();
((baz)a)();
}
Build that on gcc with "-O2 -fomit-frame-pointer -fno-inline"
https://godbolt.org/g/GSErDA
Basically the flow in this program goes as follows
main calls bar.
bar allocates a bunch of space on the stack (thanks to the big array),
bar calls foo.
foo takes a copy of the return address (using a gcc extension). This address points into the middle of bar, between the "allocation" and the "cleanup".
foo returns the address to bar.
bar cleans up its stack allocation.
bar returns the return address captured by foo to main.
main calls the return address, jumping into the middle of bar.
the stack cleanup code from bar runs, but bar doesn't currently have a stack frame (because we jumped into the middle of it). So the stack cleanup code underflows the stack.
We need -fno-inline to stop the optimiser inlining stuff and breaking our carefully laid-down structure. We also need the compiler to free the space on the stack by calculation rather than by use of a frame pointer; -fomit-frame-pointer is the default on most gcc builds nowadays, but it doesn't hurt to specify it explicitly.
I believe this technique should work for gcc on pretty much any CPU architecture.
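The capture step on its own is harmless and can be tried separately. The sketch below uses the same GCC/Clang builtin as the program above; the function name where_called_from is an assumption, and nothing here jumps to the captured address.

```c
#include <stddef.h>

/* GCC/Clang extension: __builtin_return_address(0) yields the address the
 * current function will return to. noinline keeps a real call in place.
 * Hypothetical helper: it only captures the address, it never jumps to it. */
__attribute__((noinline)) void *where_called_from(void)
{
    return __builtin_return_address(0);
}
```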
There is a way to underflow the stack, but it is very complicated. The only way that I can think of is define a pointer to the bottom element then decrement its address value. I.e. *(ptr)--. My parentheses may be off, but you want to decrement the value of the pointer, then dereference the pointer.
Generally the OS is just going to see the error and crash. I am not sure what you are testing. I hope this helps. C allows you to do bad things, but it tries to look after the programmer. Most ways to get around this protection are through manipulation of pointers.
Do you mean stack overflow? Putting more things into the stack than the stack can accommodate? If so, recursion is the easiest way to accomplish that.
void foo(void)
{ foo(); }
If you mean attempting to remove things from an empty stack, then please post your question to the stackunderflow web site, and let me know where you've found that! :-)
There are older library functions in C that are not protected. strcpy is a good example: it copies one string to another until it reaches a null terminator. One funny thing to do is to pass a program that uses it a string with the null terminator removed; it will run amok until it reaches a null terminator somewhere. Or have a string copy to itself. Going back to what I was saying before, C supports pointers to just about anything. You can make a pointer to the last element in the stack, then use C's pointer arithmetic to decrement the address so that it refers to a location preceding that element, and then pass that element to the pop. If you are doing this to the operating system's process stack, the result depends heavily on the compiler and operating system implementation. In most cases a function pointer to main and a decrement should work to underflow the stack. I have not tried this in C; I have only done it in assembly language, and great care has to be taken when working like this. Most operating systems have gotten good at stopping this, since it was an attack vector for a long time.

GCC vs Clang copying struct flexible array member

Consider the following code snippet.
#include <stdio.h>
typedef struct s {
int _;
char str[];
} s;
s first = { 0, "abcd" };
int main(int argc, const char **argv) {
s second = first;
printf("%s\n%s\n", first.str, second.str);
}
When I compile this with GCC 7.2, I get:
$ gcc-7 -o tmp tmp.c && ./tmp
abcd
abcd
But when I compile this with Clang (Apple LLVM version 8.0.0 (clang-800.0.42.1)), I get the following:
$ clang -o tmp tmp.c && ./tmp
abcd
# Nothing here
Why does the output differ between the compilers? I would expect the string not to be copied, as it's a flexible array member (similar to this question). Why does GCC actually copy it?
Edit
Some comments and an answer suggested this might be due to optimization. GCC may make second an alias of first, so updating second should disallow GCC from doing that optimization. I added the line:
second._ = 1;
But this doesn't change the output.
Here's the real answer of what's going on with gcc. second is allocated on the stack, just as you'd expect. It is not an alias for first. This is easily verified by printing their addresses.
Additionally, the declaration s second = first; is corrupting the stack, because (a) gcc is allocating the minimum amount of storage for second but (b) it is copying all of first into second, corrupting the stack.
Here is a modified version of the original code which shows this:
#include <stdio.h>
typedef struct s {
int _;
char str[];
} s;
s first = { 0, "abcdefgh" };
int main(int argc, const char **argv) {
char v[] = "xxxxxxxx";
s second = first;
printf("%p %p %p\n", (void *) v, (void *) &first, (void *) &second);
printf("<%s> <%s> <%s>\n", v, first.str, second.str);
}
On my 32-bit Linux machine, with gcc, I get the following output:
0xbf89a303 0x804a020 0xbf89a2fc
<defgh> <abcdefgh> <abcdefgh>
As you can see from the addresses, v and second are on the stack, and first is in the data section. Further, it is also clear that the initialization of second has overwritten v on the stack, with the result that instead of the expected <xxxxxxxx>, it is instead showing <defgh>.
This seems like a gcc bug to me. At the very least, it should warn that the initialization of second will corrupt the stack, since it clearly has enough information to know this at compile time.
Edit: I tested this some more, and obtained essentially equivalent results by splitting the declaration of second into:
s second;
second = first;
The real problem is the assignment. It's copying all of first, rather than the minimal common part of the structure type, which is what I believe it should do. In fact, if you move the static initialization of first into a separate file, the assignment does what it should do, v prints correctly, and second.str is undefined garbage. This is the behavior gcc should be producing, regardless of whether the initialization of first is visible in the same compilation unit or not.
So, for an answer, both compilers are behaving correctly, but the answers you are getting are undefined behavior.
GCC
Because you never modify second, GCC is simply making second an alias of first in its lookup table. Modify second and GCC cannot make that optimization, and you'll get the same answer/crash as Clang.
Clang
Clang does not automatically apply the same optimization, it seems. So when it copies the structure, it does so correctly: It copies the single int and nothing else.
You were lucky that there was a zero value on the stack after your local second variable, terminating your unknown character string. Basically, you are using an uninitialized pointer. Were there no zero, you could have gotten a lot of garbage and a memory fault.
The purpose of this thing is to do low-level stuff, like implement a memory manager, etc, by casting some memory to your structure. The compiler is under no obligation to understand what you are doing; it is only under obligation to act as if you know what you are doing. If you fail to cast the structure type over memory that actually has data of that type in it, all bets are off.
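As a sketch of the intended usage pattern, the struct can be allocated with room for its flexible array member and copied explicitly with memcpy instead of by assignment. The helper names s_new and s_clone are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>

typedef struct s {
    int _;
    char str[];   /* flexible array member */
} s;

/* Hypothetical helper: allocate an `s` with room for `text` and copy it in. */
s *s_new(int value, const char *text)
{
    size_t len = strlen(text) + 1;      /* include the terminator */
    s *obj = malloc(sizeof *obj + len);
    if (obj == NULL)
        return NULL;
    obj->_ = value;
    memcpy(obj->str, text, len);
    return obj;
}

/* Hypothetical helper: deep copy, including the flexible array contents. */
s *s_clone(const s *src)
{
    return s_new(src->_, src->str);
}
```

Both objects come from malloc, so the caller is expected to free each of them.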
edit
So, using godbolt.org and looking at the assembly:
.LC0:
.string "%s\n%s\n"
main:
sub rsp, 24
mov eax, DWORD PTR first[rip]
mov esi, OFFSET FLAT:first+4
lea rdx, [rsp+16]
mov edi, OFFSET FLAT:.LC0
mov DWORD PTR [rsp+12], eax
xor eax, eax
call printf
xor eax, eax
add rsp, 24
ret
first:
.long 0
.string "abcd"
We see that GCC is, actually, doing exactly what I said with the OP’s original code: treating second as an alias of first.
Tom Karzes has significantly modified the code, and so is experiencing a different issue. What he reports does appear to be a bug; I haven’t time ATM to figure out what is really happening with his stack-corrupting assignment.

C assembler function casting

I came across this piece of code (for the whole program see this page, see the program named "srop.c").
My question is regarding how func is used in the main method. I have only kept the code which I thought could be related.
It is the line *ret = (int)func +4; that confuses me.
There are three questions I have regarding this:
func(void) is a function, should it not be called with func() (note the brackets)
Accepting that this might be some way of calling a function that is unknown to me, how can it be cast to an int when it should return void?
I understand that the author doesn't want to save the frame pointer nor update it (the prologue), as his comment indicates. How is this skipping-two-lines ahead achieved with casting the function to an int and adding four?
(gdb) disassemble func
Dump of assembler code for function func:
0x000000000040069b <+0>: push %rbp
0x000000000040069c <+1>: mov %rsp,%rbp
0x000000000040069f <+4>: mov $0xf,%rax
0x00000000004006a6 <+11>: retq
0x00000000004006a7 <+12>: pop %rbp
0x00000000004006a8 <+13>: retq
End of assembler dump.
Possibly relevant is that when compiled gcc tells me the following:
warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
Please see below for the code.
void func(void)
{
asm("mov $0xf,%rax\n\t");
asm("retq\n\t");
}
int main(void)
{
unsigned long *ret;
/*...*/
/* overflowing */
ret = (unsigned long *)&ret + 2;
*ret = (int)func +4; //skip gadget's function prologue
/*...*/
return 0;
}
[Edit] Following the very helpful advice, here are some further information:
calling func returns a pointer to the start of the function: 0x400530
casting this to an int is dangerous (in hex) 400530
casting this to an int in decimal 4195632
safe cast to unsigned long 4195632
size of void pointer: 8
size of int: 4
size of unsigned long: 8
[Edit 2:] @cmaster: Could you please point me to some more information regarding how to put the assembler function in a separate file and link to it? The original program will not compile because it doesn't know what the function prog (when put in the assembler file) is, so it must be added either before or during compilation?
Additionally, gcc -S, when run on a C file containing only the assembly commands, seems to add a lot of extra information. Could func(void) not be represented by the following assembler code?
func:
mov $0xf,%rax
retq
This code assumes a lot more than what is good for it. Anyway, the snippet that you have shown only tries to produce a pointer to the assembler function body, it does not attempt to call it. Here is what it does, and what it assumes:
func by itself produces a pointer to the function.
Assumption 1:
The pointer actually points to the start of the assembler code for func. That assumption is not necessarily right, there are architectures where a function pointer is actually a pointer to a pair of pointers, one of which points to the code, the other of which points to a data segment.
func + 4 increments this pointer to point to the first instruction of the body of the function.
Assumption 2:
Function pointers can be incremented, and their increment is in terms of bytes. I believe that this is not covered by the C standard, but I may be wrong on that one.
Assumption 3:
The prolog that is inserted by the compiler is precisely four bytes long. There is absolutely nothing that dictates what kind of prolog the compiler should emit, there is a multitude of variants allowed, with very different lengths. The code you've given tries to control the length of the prolog by not passing/returning any parameters, but still there can be compilers that produce a different prolog. Worse, the size of the prolog may depend on the optimization level.
The resulting pointer is cast to an int.
Assumption 4:
sizeof(void (*)(void)) == sizeof(int). This is false on most 64 bit systems: on these systems int is usually still four bytes while a pointer occupies eight bytes. On such a system, the pointer value will be truncated. When the int is cast back into a function pointer and called, this will likely crash the program.
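Assumption 4 can be sidestepped by using an integer type wide enough for a pointer. The sketch below uses uintptr_t in a hypothetical helper code_address; sample_gadget is a placeholder function, and the offset is still only meaningful if the assumptions about the prolog hold.

```c
#include <stdint.h>

/* Placeholder function standing in for the gadget; hypothetical. */
void sample_gadget(void) {}

/* Hypothetical helper: numeric address of `fn` plus a byte offset.
 * The result of converting a pointer to an integer is implementation-defined,
 * and uintptr_t is only guaranteed to round-trip object pointers; on common
 * flat-memory targets it nevertheless holds the full code address,
 * unlike int, which may truncate it. */
uintptr_t code_address(void (*fn)(void), unsigned skip_bytes)
{
    return (uintptr_t)fn + skip_bytes;
}
```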
My advice:
If you really want to program in assembler, compile a file with only an empty function with gcc -S. This will give you an assembler source file with all the cruft that's needed for the assembler to produce a valid object file and show you where you can add the code for your own function. Modify that file in any way you like, and then compile it together with some calling C code as normal. That way you avoid all these dangerous little assumptions.
The name of a function is a pointer to the start of the function. So the author is not calling the function at that point. Just saving a reference to the start of it.
It's not a void. It's a function pointer. More precisely in this case it is of type: void (*)(void). A pointer is just an address so can be cast to an int (but the address may be truncated if compiled for a 64 bit system as ints are 32 bits in that case).
The first instruction of the function pushes the fp onto the stack. By adding 4, that instruction is skipped. Note that in the snippets you gave the function has not been called. It's probably part of the code that you have not included.

How to post-modify a C pointer?

In modern processors it is possible to load a register from memory and then post-modify the indexing pointer by a desired value. For example, in our embedded processor, this will be done by:
ldr r0, [r1], +12
which means - load the value pointed to by r1 into r0 and then increment r1 by 12:
r0 = [r1]
r1 = r1 + 12
In the C language, using pointer arithmetic, one can assign a value using a pointer and then advance the pointer by 1:
char i, *p, a[3]={10, 20, 30};
p = &(a[0]);
i = *p++;
// now i==10 and p==&(a[1]).
I am looking for a way to dereference a pointer while post-modifying it by an offset other than 1. Is this possible in C, so it maps nicely to the similar asm instruction?
Note that:
i = *p+=2;
increases the value in a[0] w/o modifying the pointer, and:
i = *(p+=2);
pre-modifies the pointer, so in this case i==30.
Yes this is possible.
You shouldn't be doing weird pointer math to make it happen.
Not only is it about optimization settings, your GCC back-end needs to tell GCC that it has such a feature (i.e. when GCC itself is being compiled). Based on this knowledge, GCC automatically combines the relevant sequence into a single instruction.
i.e. if your back-end is written right, even something like:
a = *ptr;
ptr += SOME_CONST;
should become a single post-modify instruction.
How to correctly set this up when writing a back-end? (ask your friendly neighbourhood GCC back-end developer to do it for you):
If your GCC back-end is called foo:
In the GCC source tree, the back-end description and hooks will be located at gcc/config/foo/.
Among the files there (which get compiled along with GCC), there is usually a header foo.h which contains a lot of #defines describing machine features.
GCC expects that a back-end which supports post-increment define the macro HAVE_POST_INCREMENT to evaluate to true, and if it supports post-modify, then define the macro HAVE_POST_MODIFY_DISP to true. (post-increment => ptr++, post-modify => ptr += CONST). Maybe there are a few other things to be handled as well.
Assuming that your processor's back-end has got this right, let's move to what happens when you compile your code containing said post-modify sequence:
There is a specific GCC optimization pass that goes through instruction pairs that fall into this category and combines them. The source for that pass is here, and has a rather clear description of what GCC will do and how to get it to do it.
But this, in the end, is not in your control as a GCC user. It is in the control of the developer who wrote your GCC back-end. All you should be doing, like the most upvoted comment says, is:
a = *ptr;
ptr += SOME_CONST;
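The two statements can be wrapped in a small helper to make the intent explicit. This is a sketch with an assumed name, load_post_modify; whether it compiles to a single post-modify instruction still depends on the back-end, as described above.

```c
#include <stddef.h>

/* Hypothetical helper: return the pointed-to value, then advance the
 * caller's pointer by `step` elements. Plain load followed by add; the
 * compiler is free to fuse them where the target supports it. */
static inline int load_post_modify(const int **pp, ptrdiff_t step)
{
    int value = **pp;
    *pp += step;
    return value;
}
```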
You can do it this way, but don't do it:
i = *((p += 2) - 2);
(not exactly post-modify)
The closest I can think of:
#define POST_INDEX_ASSIGN(lhs, ptr, index) (lhs = *(ptr), (ptr) += (index))
POST_INDEX_ASSIGN(i, p, 2);
i = *p;
p = (void*)((unsigned char*)p + 12); // cast back via void* so the assignment is valid for any object pointer type
where i is any kind of type and p is a pointer to that type.
If you don't add the typecast, the pointer increment will be done in steps with size == sizeof(*p), which would make the code completely different from the posted assembler.
For example, had p been an int* on a 32-bit system, the pointer would have been incremented 4*12 bytes without the typecast.
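The difference between element steps and byte steps can be sketched as follows, assuming a hypothetical helper advance_bytes that moves an int pointer by a byte count.

```c
#include <stddef.h>

/* Hypothetical helper: advance an int pointer by a byte count rather than
 * an element count. `nbytes` should be a multiple of sizeof(int), otherwise
 * the result is misaligned for int. */
const int *advance_bytes(const int *p, size_t nbytes)
{
    return (const int *)((const unsigned char *)p + nbytes);
}
```

Advancing by 3 * sizeof(int) bytes lands on the same element as p + 3, whereas p + 12 would move 12 elements.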
