Segfault calling c function from assembly

I am attempting to set up some pointers in an assembly program (AT&T syntax, running on x86_64 Linux), then pass them to a C function that adds the values they point to. Of course, this isn't the most effective way of accomplishing the end result, but I'm trying to understand how to make something like this work in order to build on it. The C program looks as follows:
#include <stdio.h>
extern void iplus(long *a, long *b, long *c){
printf("Starting\n");
long r= *a + *b;
printf("R setup\n");
*c=r;
printf("Done\n");
}
This takes three long pointers, adds the value of the first two, then stores that value in the third. As shown above, it prints a message regarding its status at each point, in order to track where the segmentation fault occurs.
The assembly program referencing this function is as follows:
.extern exit
.extern malloc
.data
vars: .zero 24 /*stores pointer addresses*/
.text
FORI: .ascii "%d\0" /*format for printing integer*/
.global main
main:
and $~0xf, %rsp /*16-byte align the stack*/
movq $8,%rdi
call malloc
movq %rax,(vars+0) /*allocate 8 bytes and put its address into the variable*/
movq $8,%rdi
call malloc
movq %rax,(vars+8)
movq $8,%rdi
call malloc
movq %rax,(vars+16)
movq $3,((vars+0)) /*first addend 3*/
movq $7,((vars+8)) /*second addend 7*/
movq $0,((vars+16))
movq (vars),%rdi
movq (vars+8),%rsi
movq (vars+16),%rdx
call iplus /*call the function with these values*/
movq $FORI,%rdi
movq ((vars+16)),%rsi
call printf /*print the sum, "10" expected*/
call exit
Upon making then executing the above program, I get this output:
Starting
Segmentation fault (core dumped)
Meaning the function seems to be called successfully, but something about the first operation within that function, long r = *a + *b;, or something earlier that only becomes a problem at that point, is causing a segfault. What I expect to happen is that each of the three 8-byte slots in the 24-byte vars stores an address returned by malloc (which allocates 8 bytes each time). Each address then points to an 8-byte integer, and these are set to 3, 7, and 0. The addresses of these integers (i.e. the values held in vars) are passed to iplus in order to sum them, then the sum is printed using printf. For a reason I cannot identify, this instead causes a segfault.
Why is the segfault occurring? Is it possible to perform this addition using the C function call with the structure of basically a double pointer still being used?

You can't use a pointer residing in memory directly; you must first load it into a register. The double parentheses you wrote are simply ignored by the assembler, so this:
movq $3,((vars+0)) /*first addend 3*/
movq $7,((vars+8)) /*second addend 7*/
movq $0,((vars+16))
is the same as this:
movq $3,(vars+0) /*first addend 3*/
movq $7,(vars+8) /*second addend 7*/
movq $0,(vars+16)
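Those stores overwrite the pointers saved in vars with the integers 3, 7, and 0, so iplus later dereferences address 3 and crashes. Instead, load each pointer into a register and store through it (for each value):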
movq (vars+0), %rax
movq $3, (%rax)

C fibers crashing on printf

I am in the process of creating a fiber threading system in C, following https://graphitemaster.github.io/fibers/ . I have a function to set and restore context, and what I am trying to accomplish is launching a function as a fiber with its own stack. Linux, x86_64 SysV ABI.
extern void restore_context(struct fiber_context*);
extern void create_context(struct fiber_context*);
void foo_fiber()
{
printf("Called as a fiber");
exit(0);
}
int main()
{
const uint32_t stack_size = 4096 * 16;
const uint32_t red_zone_abi = 128;
char* stack = aligned_alloc(16, stack_size);
char* sp = stack + stack_size - red_zone_abi;
struct fiber_context c = {0};
c.rip = (void*)foo_fiber;
c.rsp = (void*)sp;
restore_context(&c);
}
where restore_context code is as follows:
.type restore_context, @function
.global restore_context
restore_context:
movq 8*0(%rdi), %r8
# Load new stack pointer.
movq 8*1(%rdi), %rsp
# Load preserved registers.
movq 8*2(%rdi), %rbx
movq 8*3(%rdi), %rbp
movq 8*4(%rdi), %r12
movq 8*5(%rdi), %r13
movq 8*6(%rdi), %r14
movq 8*7(%rdi), %r15
# Push RIP to stack for RET.
pushq %r8
xorl %eax, %eax
ret
So basically I am creating a new stack on the heap, and since the stack grows downwards, I take the end address minus 128 bytes of red zone (which is necessary in the ABI). What restore_context does is simply switch %rsp to my new stack, push the address of foo_fiber onto it, and then ret to jump into foo_fiber. (It also loads some registers from the fiber_context structure, but that should not matter now.)
From what I'm seeing in GDB, the program manages to jump properly to foo_fiber and into printf, and then it crashes in __vfprintf_internal on movaps %xmm1, 0x10(%rsp).
0x7ffff7e2f389 <__vfprintf_internal+153> movdqu (%rax),%xmm1
0x7ffff7e2f38d <__vfprintf_internal+157> movups %xmm1,0x128(%rsp)
0x7ffff7e2f395 <__vfprintf_internal+165> mov 0x10(%rax),%rax
>0x7ffff7e2f399 <__vfprintf_internal+169> movaps %xmm1,0x10(%rsp)
I find that extremely odd, since it managed movups %xmm1, 0x128(%rsp), which is at a much higher offset from the stack pointer. What is going on there?
If I change the code of foo_fiber to do something else, for example allocate and randomly fill a char[100], it works.
I am kind of at a loss about what is going on. At first I thought I might have alignment issues, since the vector xmm instructions are crashing, so I changed malloc to aligned_alloc. The crash I am getting is a SIGSEGV, but 0x10

Agree with comments: your stack alignment is incorrect.
It is true that the stack must be aligned to 16 bytes. However, the question is when? The normal rule is that the stack pointer must be a multiple of 16 at the site of a call instruction that calls an ABI-compliant function.
Well, you don't use a call instruction, but what that really means is that on entry to an ABI-compliant function, the stack pointer must be 8 less than a multiple of 16, or in other words an odd multiple of 8, since the function assumes it was called with a call instruction that pushed an 8-byte return address. That is just the opposite of what your code does, and so the stack is misaligned for the rest of your program, which makes printf crash when it tries to use aligned move instructions. movaps faults on an address that is not a multiple of 16, while movups accepts any address, which is why the earlier movups to 0x128(%rsp) succeeded.
You could subtract 8 from the sp computed in your C code.
Or, I'm not really sure why you go to the trouble of loading the destination address into a register, then pushing and ret, when an indirect jump or call would do. (Unless you are deliberately trying to fool the indirect branch predictor?) An indirect call will also kill the stack-alignment bird, by pushing the return address (even though it will never be used). So you could leave the rest of your code alone, and replace all the r8/ret stuff in restore_context with just
callq *(8*0)(%rdi)

pointer to function points the jmp instruction in c

I viewed the disassembly of my C code and found that a pointer to a function actually points to a jmp instruction and doesn't point to the real start of the function in memory (it doesn't point to the push ebp instruction that represents the start of the function's frame).
I have the following function (that does basically nothing, it's just an example):
int func2(int a, int b)
{
return 1;
}
I tried to print the address of the function: printf("%p", &func2);
I looked at the disassembly of my code and found that the printed address is the address of a jmp instruction in the assembly code. I would like to get the address that represents the start of the function's frame. Is there any way to calculate it from the given address of the jmp instruction?
Moreover, I have the bytes that represent the jmp instruction.
011A11EF E9 CC 08 00 00 jmp func2 (011A1AC0h)
How can I get the address that represents the start of the function's frame in memory (011A1AC0h in this case) using only the address of the jmp instruction and the bytes that represent the instruction itself? I read up on this and found that it is a relative jmp, which means I need to add the value that the jmp holds to the address of the jmp instruction itself. I'm not sure whether that's the right direction, and if it is, how can I get the value that the jmp holds?

E9 (hexadecimal) is the Intel 64 and IA-32 opcode for a jmp instruction with a rel32 offset. The next four bytes contain the offset. Your disassembler shows them as “CC 08 00 00”, which is little-endian order (least significant byte first); the offset is 0x000008CC, which is 2252 in decimal. The offset is a signed 32-bit value that is added to the EIP register to obtain the address of the jump target. At that point EIP contains the address of the next instruction to be executed, i.e. the byte just past the 5-byte jmp.
So, in this specific case, take the address of the byte just beyond the jump instruction and add the 32-bit offset.
However:
I count 11 forms of jmp instruction in Intel 64 and IA-32 manual. Who knows what the compiler may use when you make a slight change to source or compiler switches and recompile? You would need to be prepared to decode any form of the jmp instruction, or perhaps other instructions the compiler might use.
Intel has some legacy segment features in its architecture. The code segment on your system might be one big thing so you do not have to worry about that, but I cannot provide assurance.
Your compiler might have used this jmp instruction as a convenient way to create a value for the pointer rather than using the routine’s entry point (the proper term for the instruction where function execution normally begins, not frame) because it makes the linker do the relocation work instead of requiring the compiler to insert instructions to do that work at run-time (specifically, at the time the function address must be evaluated so it can be assigned to the pointer). This is somewhat of a guess, but the compiler might do something else next time. You are treading significantly outside normal computing.

I'm not sure I understand your question, but take this sample:
#include <stdio.h>
int foo(int x)
{
return x+1;
}
int main(int argc, char** argv)
{
printf("foo = %p\n", foo);
return 0;
}
Which produces the following disassembly:
foo(int):
pushq %rbp
movq %rsp, %rbp
movl %edi, -4(%rbp)
movl -4(%rbp), %eax
addl $1, %eax
popq %rbp
ret
.LC0:
.string "foo = %p\n"
main:
pushq %rbp
movq %rsp, %rbp
subq $16, %rsp
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
movl foo(int), %esi # pass the label argument (2) to printf
movl $.LC0, %edi # pass the format argument (1) to printf
movl $0, %eax
call printf
movl $0, %eax
leave
ret
As you can see, only the label is passed to printf. This label is resolved to an address by the assembler and linker.
Also note that it will be hard to predict the absolute address in a running binary: ASLR (Address Space Layout Randomization) will choose a random base address for the binary. The offsets inside the binary still hold, hence relative calls.

On x86 machines, E9 is the opcode for JMP rel16/32, so the CPU uses the value 0x000008CC as the jump offset. The base address is the address of the instruction following the JMP instruction.

Variable types in C and who keeps track of it

I am taking the MOOC CS50 from Harvard. In one of the first lectures we learned about variables of different data types: int, char, etc.
What I understand is that a statement such as int a = 5 (say, within the main function) reserves a number of bytes (4 for the most part) of memory on the stack and puts there a sequence of zeros and ones which represents 5.
The same sequence of zeros and ones also could mean a certain character. So somebody needs to keep track of the fact that the sequence of zeros and ones in the memory place reserved for a is to be read as an integer (and not as a character).
The question is who does keep track of it? The computer's memory by sticking a tag to this place in memory saying "hey, whatever you find in these 4 bytes read as an integer"? Or the C compiler, which knows (looking at the type int of a) that when my code asks it to do something (more precisely, to produce a machine code doing something) with the value of a it needs to treat this value as an integer?
I would really appreciate an answer tailored to a C beginner.

With the C language, it's the compiler.
At run-time, there's only the 32 bits = 4 bytes on the stack.
You ask "The computer's memory by sticking a tag to this place...": that's impossible (with current computer architectures - thanks for the hint from @Ivan). The memory itself is just 8 bits (each being 0 or 1) per byte. There is no place in memory that can tag a memory cell with any additional info.
There are other languages (e.g. LISP, and to some degree also Java and C#) that store an integer as a combination of the 32 bits for the number plus a few bits or bytes that contain some bit-encoded tagging that here we have an integer. So they need e.g. 6 bytes for a 32-bit integer. But with C, that's not the case. You need knowledge from the source code to correctly interpret the bits found in memory - they don't explain themselves. And there have been special architectures that supported tagging in hardware.

In C, memory is untyped; no information beyond its value is stored there. All type information is computed at compile time from the type of an expression (a variable name, a value computation, a pointer dereferencing etc.) This computation depends on the information the programmer provides through declarations (also in headers) or casts. If that information is wrong, e.g. because a function prototype's parameters are declared wrong, all bets are off. The compiler warns about or prevents mis-declarations in the same "translation unit" (file with headers), but between translation units there are no (or not many?) protections. That's one reason why C has headers: They share common type information between translation units.
C++ keeps this idea but additionally offers run time type information (as opposed to compile time type information) for polymorphic types. It's obvious that every polymorphic object must carry extra information somewhere (not necessarily close to the data though). But that is C++, not C.

For the main part it's the C compiler that keeps track.
During the compilation process the compiler builds up a large data structure called the parse tree. It also keeps track of all variables, functions, types, ... everything with a name (i.e. identifier); this is called the symbol table.
The nodes of both the parse tree and the symbol table have an entry in which the type is recorded. They keep track of all the types.
With mainly these two data structures in hand, the compiler can check if your code does not violate type rules. It allows the compiler to warn you if you use incompatible values or variable names.
C does allow implicit conversion between types. You can, for example, assign an int to a double. But in memory these are completely different bit patterns for the same value.
In earlier (higher abstraction level) phases of the compilation process, the compiler does not deal with bit patterns yet (or too much), and makes conversions and checks at a higher level.
But during the assembly code generating process, the compiler needs to finally figure it all out in bits. So for an int to double conversion:
int i = 5;
double d = i; // Conversion.
the compiler will generate code to make this conversion happen.
In C, however, it's very easy to make mistakes and mess things up. This is because C is not a very strongly typed language and is rather flexible, so the programmer also needs to be careful.
Because C no longer keeps track of types after compilation, a running program can often silently continue with the wrong data after executing one of your mistakes. And if you're 'lucky' enough that the program crashes, the error message you get is not (very) informative.

You have a stack pointer which gives an absolute offset for the topmost stack frame in memory.
For a given scope of execution, the compiler knows where each variable is located relative to this stack pointer and emits accesses to these variables as an offset from the stack pointer. So it is primarily the compiler that maps the variables, but it's the processor that applies this mapping.
You can easily write programs which compute or remember a memory address which used to be valid, or is just outside of a valid region. The compiler doesn't stop you from doing so, only higher level languages with reference counting and strict boundary checks do at runtime.

The compiler keeps track of all type information during translation, and it will generate the proper machine code to deal with data of different types or sizes.
Let's take the following code:
#include <stdio.h>
int main( void )
{
long long x, y, z;
x = 5;
y = 6;
z = x + y;
printf( "x = %lld, y = %lld, z = %lld\n", x, y, z );
return 0;
}
After running that through gcc -S, the assignment, addition, and print statements are translated to:
movq $5, -24(%rbp)
movq $6, -16(%rbp)
movq -16(%rbp), %rax
addq -24(%rbp), %rax
movq %rax, -8(%rbp)
movq -8(%rbp), %rcx
movq -16(%rbp), %rdx
movq -24(%rbp), %rsi
movl $.LC0, %edi
movl $0, %eax
call printf
movl $0, %eax
leave
ret
movq is the mnemonic for moving values into 64-bit words ("quadwords"). %rax is a general-purpose 64-bit register that's being used as an accumulator. Don't worry too much about the rest of it for now.
Now let's see what happens when we change those long longs to shorts:
#include <stdio.h>
int main( void )
{
short x, y, z;
x = 5;
y = 6;
z = x + y;
printf( "x = %hd, y = %hd, z = %hd\n", x, y, z );
return 0;
}
Again, we run it through gcc -S to generate the machine code, et voila:
movw $5, -6(%rbp)
movw $6, -4(%rbp)
movzwl -6(%rbp), %edx
movzwl -4(%rbp), %eax
leal (%rdx,%rax), %eax
movw %ax, -2(%rbp)
movswl -2(%rbp),%ecx
movswl -4(%rbp),%edx
movswl -6(%rbp),%esi
movl $.LC0, %edi
movl $0, %eax
call printf
movl $0, %eax
leave
ret
Different mnemonics - instead of movq we get movw and movswl, we're using %eax, which is the lower 32 bits of %rax, etc.
Once more, this time with floating-point types:
#include <stdio.h>
int main( void )
{
double x, y, z;
x = 5;
y = 6;
z = x + y;
printf( "x = %f, y = %f, z = %f\n", x, y, z );
return 0;
}
gcc -S again:
movabsq $4617315517961601024, %rax
movq %rax, -24(%rbp)
movabsq $4618441417868443648, %rax
movq %rax, -16(%rbp)
movsd -24(%rbp), %xmm0
addsd -16(%rbp), %xmm0
movsd %xmm0, -8(%rbp)
movq -8(%rbp), %rax
movq -16(%rbp), %rdx
movq -24(%rbp), %rcx
movq %rax, -40(%rbp)
movsd -40(%rbp), %xmm2
movq %rdx, -40(%rbp)
movsd -40(%rbp), %xmm1
movq %rcx, -40(%rbp)
movsd -40(%rbp), %xmm0
movl $.LC2, %edi
movl $3, %eax
call printf
movl $0, %eax
leave
ret
New mnemonics (movsd), new registers (%xmm0).
So basically, after translation, there's no need to tag the data with type information; that type information is "baked in" to the machine code itself.

malloc pointer address in main and in other function difference [duplicate]

I have the following question. Why is there a difference in the addresses of the two pointers in following example? This is the full code:
#include <stdio.h>
#include <stdlib.h>
void *mymalloc(size_t bytes){
void * ptr = malloc(bytes);
printf("Address1 = %zx\n",(size_t)&ptr);
return ptr;
}
void main (void)
{
unsigned char *bitv = mymalloc(5);
printf("Address2 = %zx\n",(size_t)&bitv);
}
Result:
Address1 = 7ffe150307f0
Address2 = 7ffe15030810

It's because you are printing the address of the pointer variable, not the value of the pointer. Remove the ampersand (&) from bitv and ptr in your printfs.
printf("Address1 = %zx\n",(size_t)ptr);
and
printf("Address2 = %zx\n",(size_t)bitv);
Also, use %p for pointers (and then cast to void * rather than size_t).
WHY?
In this line of code:
unsigned char *bitv = mymalloc(5);
bitv is a pointer and its value is the address of the newly allocated block of memory. But that value also needs to be stored somewhere, and &bitv is the address of where that value is stored. If you have two variables storing the same pointer value, each still has its own address, which is why &ptr and &bitv have different values.
But, as you expected, ptr and bitv will have the same value when you change your code.

Why is there a difference in the addresses of the two pointers
Because the two pointers are two different pointer(-variable)s, each having its own address.
The values those two pointer(-variable)s carry are in fact the same.
To prove this print their value (and not their address) by changing:
printf("Address1 = %zx\n",(size_t)&ptr);
to be
printf("Address1 = %p\n", (void*) ptr);
and
printf("Address2 = %zx\n",(size_t)&bitv);
to be
printf("Address2 = %p\n", (void*) bitv);

In your code, you used the following to print the pointer's address:
printf("%zx", (size_t)&p);
This doesn't print the address of the variable it points to; it prints the address of the pointer itself.
You could print an address using the '%p' format:
printf("%p", &n); // PRINTS ADDRESS OF 'n'
Here's an example which explains printing addresses:
int n;
int *v;
n = 54;
v = &n;
printf("%p", v); // PRINTS ADDRESS OF 'n'
printf("%p", &v); // PRINTS ADDRESS OF pointer 'v'
printf("%p", &n); // PRINTS ADDRESS OF 'n'
printf("%d", *v); // PRINTS VALUE OF 'n'
printf("%d", n); // PRINTS VALUE OF 'n'
So your code should be written like this:
#include <stdio.h>
#include <stdlib.h>
void * get_mem(int size)
{
void * buff = malloc(size); // allocate memory
// buff now holds the address returned by malloc(size)
if (!buff) return NULL; // when malloc returns NULL, end the function
// otherwise print the value of the pointer (the address of the block)
printf("ADDRESS->%p\n", buff);
return buff;
}
int main(void)
{
void * buff = get_mem(54);
printf("ADDRESS->%p\n", buff);
free(buff);
return 0;
}

(In addition to the other answers, which you should read first and which will probably help you more ...)
Read a good C programming book. Pointers and addresses are very difficult to explain, and I'm not even trying to. So the address of a pointer &ptr is generally not the same as the value of a pointer (however, you could code ptr= &ptr; but you often don't want to do that)... Look also at the picture explaining virtual address space.
Then read more documentation about malloc: malloc(3) Linux man page, this reference documentation, etc... Here is fast, standard conforming, but disappointing implementation of malloc.
read also documentation about printf: printf(3) man page, printf reference, etc... It should mention %p for printing pointers...
Notice that you don't print a pointer (see Alk's answer), you don't even print its address (of an automatic variable on the call stack), you print some cast to size_t (which might not have the same bit width as a pointer, even if on my Linux/x86-64 it does).
Read also more about C dynamic memory allocation and about pointer aliasing.
At last, read the C11 standard specification n1570.
(I can't see why you would expect the two outputs to be the same; actually, it could happen if a compiler optimizes the call to mymalloc by inlining a tail call.)
So I did not expect the output to be the same in general. However, with gcc -O2 antonis.c -o antonis I got (with a tiny modification of your code)....
a surprise
However, if you declare the first void *mymalloc(size_t bytes) as a static void*mymalloc(size_t bytes) and compile with GCC 7 on Linux/Debian/x86-64 with optimizations enabled, you do get the same output; because the compiler inlined the call and used the same location for bitv and ptr; here is the generated assembler code with gcc -S -O2 -fverbose-asm antonis.c:
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "Address1 = %zx\n"
.LC1:
.string "Address2 = %zx\n"
.section .text.startup,"ax",@progbits
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB22:
.cfi_startproc
pushq %rbx #
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
# antonis.c:5: void * ptr = malloc(bytes);
movl $5, %edi #,
# antonis.c:11: {
subq $16, %rsp #,
.cfi_def_cfa_offset 32
# antonis.c:6: printf("Address1 = %zx\n",(size_t)&ptr);
leaq 8(%rsp), %rbx #, tmp92
# antonis.c:5: void * ptr = malloc(bytes);
call malloc@PLT #
# antonis.c:6: printf("Address1 = %zx\n",(size_t)&ptr);
leaq .LC0(%rip), %rdi #,
# antonis.c:5: void * ptr = malloc(bytes);
movq %rax, 8(%rsp) # tmp91, ptr
# antonis.c:6: printf("Address1 = %zx\n",(size_t)&ptr);
movq %rbx, %rsi # tmp92,
xorl %eax, %eax #
call printf@PLT #
# antonis.c:13: printf("Address2 = %zx\n",(size_t)&bitv);
leaq .LC1(%rip), %rdi #,
movq %rbx, %rsi # tmp92,
xorl %eax, %eax #
call printf@PLT #
# antonis.c:14: }
addq $16, %rsp #,
.cfi_def_cfa_offset 16
popq %rbx #
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE22:
.size main, .-main
BTW, if I compile your unmodified source (without static) with gcc -fwhole-program -O2 -S -fverbose-asm I'm getting the same assembler as above.
If you don't add static and don't compile with -fwhole-program, Address1 and Address2 stay different.
two run outputs
I ran that antonis executable and got, the first time:
/tmp$ ./antonis
Address1 = 7ffe2b07c148
Address2 = 7ffe2b07c148
and the second time:
/tmp$ ./antonis
Address1 = 7ffc441851a8
Address2 = 7ffc441851a8
If you want to guess why the outputs are different from one run to the next one, think of ASLR.
BTW, a very important notion when coding in C is that of undefined behavior (see also this and that answers and the references I gave there). You don't have any in your question (it is just unspecified behavior), but as my contrived answer shows, you should not expect a particular behavior in that precise case.
PS. I believe (but I am not entirely sure) that a standard-conforming C implementation could output Address1= hello world and likewise for Address2. After all, the behavior of printf with %p is implementation-defined. And surely you could get 0xdeadbeef for both. More seriously, an address does not always have the same bit width as a size_t or an int, and the standard defines intptr_t in <stdint.h>.

How does the stack works?

From what I understood, the stack is used in a function to store all the local variables that are declared.
I also understood that the bottom of the stack correspond to the largest address, and the top to the smallest ones.
So, let's say I have this C program:
#include <stdio.h>
#include <unistd.h>
int main(int argc, char *argv[]){
FILE *file1 = fopen("~/file.txt", "rt");
char buffer[10];
printf(argv[1]);
fclose(file1);
return 0;
}
Where would be pointer named "file1" in the stack compared to pointer named "buffer" ? would it be with upper in the stack (smaller address), or down (larger address) ?
Also, I know that printf() when giving format args (like %d, or %s) will read on the stack, but in this example where will it start to read ?

Wiki article:
http://en.wikipedia.org/wiki/Stack_(abstract_data_type)
The wiki article makes an analogy to a stack of objects, where the top of the stack is the only object you can see (peek) or remove (pop), and where you would add (push) another object onto.
For a typical implementation of a stack, the stack starts at some address and the address decreases as elements are pushed onto the stack. A push typically decrements the stack pointer before storing an element onto the stack, and a pop typically loads an element from the stack and increments the stack pointer after.
However, a stack could also grow upwards, where a push stores an element then increments the stack pointer after, and a pop would decrement the stack pointer before, then load an element from the stack. This is a common way to implement a software stack using an array, where the stack pointer could be a pointer or an index.
Back to the original question, there's no rule on the ordering of local variables on a stack. Typically the total size of all local variables is subtracted from the stack pointer, and the local variables are accessed as offsets from the stack pointer (or a register copy of the stack pointer, such as bp, ebp, or rbp in the case of a X86 processor).

The C language definition does not specify how objects are to be laid out in memory, nor does it specify how arguments are to be passed to functions (the words "stack" and "heap" don't appear anywhere in the language definition itself). That is entirely a function of the compiler and the underlying platform. The answer for x86 may be different from the answer for M68K which may be different from the answer for MIPS which may be different from the answer for SPARC which may be different from the answer for an embedded controller, etc.
All the language definition specifies is lifetime of objects (when storage for an object is allocated and how long it lasts) and the linkage and visibility of identifiers (linkage controls whether multiple instances of the same identifier refer to the same object, visibility controls whether that identifier is usable at a given point).
Having said all that, almost any desktop or server system you're likely to use will have a runtime stack. Also, C was initially developed on a system with a runtime stack, and much of its behavior certainly implies a stack model. A C compiler would be a bugger to implement on a system that didn't use a runtime stack.
I also understood that the bottom of the stack correspond to the largest address, and the top to the smallest ones.
That doesn't have to be true at all. The top of the stack is simply the place something was most recently pushed. Stack elements don't even have to be consecutive in memory (such as when using a linked-list implementation of a stack). On x86, the runtime stack grows "downwards" (towards decreasing addresses), but don't assume that's universal.
Where would be pointer named "file1" in the stack compared to pointer named "buffer" ? would it be with upper in the stack (smaller address), or down (larger address) ?
First, the compiler is not required to lay out distinct objects in memory in the same order that they were declared; it may re-order those objects to minimize padding and alignment issues (struct members must be laid out in the order declared, but there may be unused "padding" bytes between members).
Secondly, only file1 is a pointer. buffer is an array, so space will only be allocated for the array elements themselves - no space is set aside for any pointer.
Also, I know that printf() when giving format args (like %d, or %s) will read on the stack, but in this example where will it start to read ?
It may not read arguments from the stack at all. For example, Linux on x86-64 uses the System V AMD64 ABI calling convention, which passes the first six integer or pointer arguments via registers.
If you're really curious how things look on a particular platform, you need to a) read up on that platform's calling conventions, and b) look at the generated machine code. Most compilers have an option to output a machine code listing. For example, we can take your program and compile it as
gcc -S file.c
which creates a file named file.s containing the following (lightly edited) output:
.file "file.c"
.section .rodata
.LC0:
.string "rt"
.LC1:
.string "~/file.txt"
.text
.globl main
.type main, @function
main:
.LFB2:
pushq %rbp ;; save the current base (frame) pointer
.LCFI0:
movq %rsp, %rbp ;; make the stack pointer the new base pointer
.LCFI1:
subq $48, %rsp ;; allocate an additional 48 bytes on the stack
.LCFI2:
movl %edi, -36(%rbp) ;; since we use the contents of the %rdi(%edi) and %rsi(esi) registers
movq %rsi, -48(%rbp) ;; below, we need to preserve their contents on the stack frame before overwriting them
movl $.LC0, %esi ;; Write the *second* argument of fopen to esi
movl $.LC1, %edi ;; Write the *first* argument of fopen to edi
call fopen ;; arguments to fopen are passed via register, not the stack
movq %rax, -8(%rbp) ;; save the result of fopen to file1
movq $0, -32(%rbp) ;; zero out the elements of buffer (I added
movw $0, -24(%rbp) ;; an explicit initializer to your code)
movq -48(%rbp), %rax ;; copy the pointer value stored in argv to rax
addq $8, %rax ;; offset 8 bytes (giving us the address of argv[1])
movq (%rax), %rdi ;; copy the value rax points to to rdi
movl $0, %eax
call printf ;; like with fopen, arguments to printf are passed via register, not the stack
movq -8(%rbp), %rdi ;; copy file1 to rdi
call fclose ;; again, arguments are passed via register
movl $0, %eax
leave
ret
Now, this is for my specific platform, which is Linux (SLES-10) on x86-64. This does not apply to different hardware/OS combinations.
EDIT
Just realized that I left out some important stuff.
The notation N(reg) means offset N bytes from the address stored in register reg (basically, reg acts as a pointer). %rbp is the base (frame) pointer - it basically acts as the "handle" for the current stack frame. Local variables and function arguments (assuming they are present on the stack) are accessed by offsetting from the address stored in %rbp. On x86, local variables typically have a negative offset from %rbp, while function arguments have a positive offset.
The memory for file1 starts at -8(%rbp) (pointers on x86-64 are 64 bits wide, so we need 8 bytes to store it). That's fairly easy to determine based on the lines
call fopen
movq %rax, -8(%rbp)
On x86, function return values are written to %rax or %eax (%eax is the lower 32 bits of %rax). So the result of fopen is written to %rax, and we copy the contents of %rax to -8(%rbp).
The location for buffer is a little trickier to determine, since you don't do anything with it. I added an explicit initializer (char buffer[10] = {0};) just to generate some instructions that access it, and those are
movq $0, -32(%rbp)
movw $0, -24(%rbp)
From this, we can determine that buffer starts at -32(%rbp). It occupies 10 bytes, ending at -22(%rbp), so there are 14 bytes of unused "padding" space between the end of buffer and the beginning of file1 at -8(%rbp).
Again, this is how things play out on my specific system; you may see something different.

This is very implementation dependent, but the two objects will still be close to each other. In fact, that proximity is crucial to setting up buffer-overflow attacks.
