Regarding variables in a process memory - c

In the following code:
int main()
{
int i;
char* s = "Hello";
i = 10;
}
In memory:
10 should go on the stack
the address of "Hello" should go on the stack
"Hello" should be stored in read-only memory
In the process memory, where are i and s? Where do they reside?

The variable names are just a convenience for the programmer, so that the code can refer to them. The values themselves are stored wherever the compiler sees fit to place them, but the names are discarded.
If the optimizer decides that a certain variable has a small enough scope and there are enough registers available, the variable you refer to as i may not even have a storage place in the process memory, because it can be kept in a register instead.
So where a given variable goes depends mostly on the compiler's decisions. Static and global variables are always in the process memory, but local variables need not be on the stack.

For this program:
The locally-scoped variables i and s are on the stack.
The string "Hello" is stored in the program .rodata section.
The value 10 ($10) and the address of the "Hello" string (.LC0) are stored into i (at -12(%rbp)) and s (at -8(%rbp)) by the two mov instructions that follow the main function prologue.
You can see this all happening with e.g. gcc -S -o bar.s bar.c which will output the assembly language code for bar.c:
.file "bar.c"
.section .rodata
.LC0:
.string "Hello"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movq $.LC0, -8(%rbp)
movl $10, -12(%rbp)
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3"
.section .note.GNU-stack,"",#progbits

Here i and s are local variables, so they are stored in the stack segment. The names i and s themselves are just representations in the language; they do not exist in the running process.

Related

Understanding a few of the 'helper' gnu-as directives

I have compiled a program main.c with about two lines of code to see what directives gcc / gas add to the unoptimized assembly file, using:
gcc -o main.s main.c -S
I can look up the concise description of each directive on the gas directive page, but was hoping someone could give a bit more context to some of these directives and what its practical usage is (for example, in gdb or the linker or wherever downstream). Here is the full assembly file with the items in question below:
.file "main.c"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $4, -8(%rbp)
movl $6, -4(%rbp)
movl -8(%rbp), %edx
movl -4(%rbp), %eax
addl %edx, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
.section .note.GNU-stack,"",#progbits
.file: it seems this is halfway obsolete, based on "This statement may go away in future: it is only recognized to be compatible with old as programs." But given that it is still there, where or how is this currently being used?
.ident: it seems like this gives the same thing as doing gcc --version. Is this used at all beyond giving helper information on the 'gcc' that was used to issue the command, or how is this used?
.section .note...: I have seen .section .text, .section .bss, ... but I've never come across a .note, and doing a ctrl-f to search for note doesn't give anything on this page. What is this line doing with the three arguments? And the #progbits?
.size: given that the directives take up no space, this seems to give the distance from the first statement within main (pushq %rbp) to the end of the last statement (ret), which is the length of the main function. But again, what is this used for? Also, it says on the as page that "It is only permitted inside .def/.endef pairs.", but this isn't inside those pairs, right?
.section .text.startup,"ax",#progbits: what is text.startup? The ax looks like it means allocatable+executable, but what or where is text.startup?

How is x=x+1 evaluated by the compiler and how is it represented in assembly?

I'm trying to understand how the compiler "sees" the i+1 part of the expression i=i+1. I understand that i=3 means putting the value 3 in the memory location of variable i.
My guess about i=i+1 is that the compiler expects a value on the right side of the "=" operator, so it gets the value from the memory location of variable i (which is 3, after the assignment), adds 1 to it, and stores the final result of the "i+1" expression (3+1=4) back into the memory location of variable i. Is that correct?
And if it is, it means that any variable or combination of variables and literals on the right side of an "=" operator will always be replaced with the values stored in them, and those values can be added, subtracted, etc. with the values of other variables/literals (as in the x+1 expression), while the final result of those calculations will also be a value (e.g. 5, a literal string, etc.) that is stored in the single variable on the left side of the "=" operator.
I'm also curious how this code looks in assembly, and what the main operations of this increment of i (i = i+1) are:
#include <stdio.h>
int main()
{
int i = 3;
i = i + 1; // i should have the value of 4 stored back in it;
return 0;
}
This is not answerable for the general case. It depends on the target platform. If you want to inspect the assembly, you can do so with the -S parameter with gcc. When I did that to your code, it gave me this:
/tmp$ cat main.s
.file "main.c"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $3, -4(%rbp)
addl $1, -4(%rbp)
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Debian 9.2.1-8) 9.2.1 20190909"
.section .note.GNU-stack,"",#progbits
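A brief explanation of what is happening here. First we push the caller's base pointer (%rbp) onto the stack, so that it can be restored before we return. (The return address itself was already pushed by the call instruction that invoked main.)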
.cfi_startproc
pushq %rbp
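Then we set up the stack frame with this code: %rbp now points at the base of main's frame, and locals such as i are addressed relative to it. The .cfi_* lines are unwind-information directives, not instructions.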
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
Then we have this. Comments are mine.
movl $3, -4(%rbp) # i = 3;
addl $1, -4(%rbp) # i = i + 1;
Lastly, we return from the main function
movl $0, %eax # Store 0 in the "return register"
popq %rbp # Restore the caller's base pointer
.cfi_def_cfa 7, 8
ret # return
Note that there is no one-to-one relationship between source lines and instructions, not even for very simple lines.
Please also note that C imposes requirements on the observable behavior of the program and not on the generated assembly. So, for instance, a compiler might remove the whole body of the main function because the variable i is not used in an observable way. And it will if you enable optimization. When I recompiled your code with -O3 I got this instead:
/tmp/$ cat main.s
.file "main.c"
.text
.section .text.startup,"ax",#progbits
.p2align 4
.globl main
.type main, #function
main:
.LFB11:
.cfi_startproc
xorl %eax, %eax
ret
.cfi_endproc
.LFE11:
.size main, .-main
.ident "GCC: (Debian 9.2.1-8) 9.2.1 20190909"
.section .note.GNU-stack,"",#progbits
Notice how much got removed from main. It is also interesting that movl $0, %eax has changed to xorl %eax, %eax. If you think about it, it's pretty obvious that this is a "set to zero" operation. One could reasonably ask why anyone would write it like that. Well, the optimizer certainly does not optimize for readability. There are a few reasons why xor is better. You can read about them here: What is the best way to set a register to zero in x86 assembly: xor, mov or and?

GCC assembly code shows 32bit registers on 64bit machine

I am trying to learn how to use ptrace library for tracing all system calls and their arguments. I am stuck in getting the arguments passed to system call.
I went through many online resources and SO questions and figured out that on a 64-bit machine the arguments are stored in the registers rax (syscall number), rdi, rsi, rdx, r10, r8, r9, in that order.
Just to confirm this I wrote a simple C program as follows
#include <stdio.h>
#include <fcntl.h>
int main() {
printf("some print data");
open("/tmp/sprintf.c", O_RDWR);
}
and generated assembly code for this using gcc -S t.c but assembly code generated is as below
.file "t.c"
.section .rodata
.LC0:
.string "some print data"
.LC1:
.string "/tmp/sprintf.c"
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl $.LC0, %edi
movl $0, %eax
call printf
movl $2, %esi
movl $.LC1, %edi
movl $0, %eax
call open
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4"
.section .note.GNU-stack,"",#progbits
As you can see, this code is storing the parameters in esi and edi instead.
Why is this happening?
Also, please guide me on the best way to access these passed arguments from these registers/memory locations from C code. How can I figure out whether the contents of a register are the argument itself or a memory location where the actual argument is stored?
Thanks!
"this code is storing parameters on esi and edi"
32-bit instructions are smaller, and thus preferred when possible. Writing a 32-bit register such as edi zero-extends the result into the full 64-bit rdi, so movl $.LC0, %edi still sets all of rdi. See also Why do most x64 instructions zero the upper part of a 32 bit register.
"How can I figure out if the contents of register is the argument itself or is it a memory location where actual argument is stored?"
The AMD64 SystemV calling convention never implicitly replaces a function arg with a hidden pointer. Integer / pointer args in the C prototype always go in the arg-passing registers directly.
structs / unions passed by value go in one or more registers, or on the stack.
The full details are documented in the ABI. See more links in the x86 tag wiki. http://www.x86-64.org/documentation.html is down right now, so I linked the current revision on github.

Why does Linux not crash but output a random string?

char* getChar()
{
//char* pStr = "TEST!!!";
char str[10] = "TEST!!!";
return str;
}
int main(int argc, char *argv[])
{
double *XX[2];
printf("STR is %s.\n", getChar());
return (0);
}
I know that a pointer to a temporary variable on the stack should NOT be returned.
In practice it outputs an unpredictable string.
When does Linux crash, apart from a NULL pointer dereference?
You've got some undefined behavior. Read also this answer to have an idea of what that could mean.
If you wish an explanation, you need to dive into implementation specific details.
Here it goes....
This carp.c file (very similar to yours, I renamed getChar to carp and included <stdio.h>)
#include <stdio.h>
char *carp() {
char str[10] = "TEST!!!";
return str;
}
int main(int argc, char**argv)
{
printf("STR is %s.\n", carp());
return 0;
}
is compiled by gcc -O -fverbose-asm -S carp.c into a good warning
carp.c: In function 'carp':
carp.c:4:8: warning: function returns address of local variable [-Wreturn-local-addr]
return str;
^
and into this assembler code (GCC 4.9.1 on Debian/Sid/x86-64)
.text
.Ltext0:
.globl carp
.type carp, #function
carp:
.LFB11:
.file 1 "carp.c"
.loc 1 2 0
.cfi_startproc
.loc 1 5 0
leaq -16(%rsp), %rax #, tmp85
ret
.cfi_endproc
.LFE11:
.size carp, .-carp
.section .rodata.str1.1,"aMS",#progbits,1
.LC0:
.string "STR is %s.\n"
.text
.globl main
.type main, #function
main:
.LFB12:
.loc 1 7 0
.cfi_startproc
.LVL0:
subq $24, %rsp #,
.cfi_def_cfa_offset 32
.loc 1 8 0
movq %rsp, %rsi #,
.LVL1:
movl $.LC0, %edi #,
.LVL2:
movl $0, %eax #,
call printf #
.LVL3:
.loc 1 10 0
movl $0, %eax #,
addq $24, %rsp #,
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE12:
.size main, .-main
As you notice, the bad carp function is returning the stack pointer minus 16 bytes. And main would print what happens to be there. And what happens to be at that location probably depends upon a lot of factors (your environment environ(7), the ASLR used for the stack, etc....). If you are interested in understanding what exactly is the memory (and address space) at entry into main, dive into execve(2), ld.so(8), your compiler's crt0, your kernel's source code, your dynamic linker source code, your libc source code, the x86-64 ABI, etc.... My life is too short to take many hours to explain all of this.
BTW, notice that the initialization of local str to "TEST!!!" has been rightly optimized out by my compiler.
Read also signal(7): your process can be terminated in many cases (I won't call that "Linux crashing" like you do), e.g. when dereferencing a pointer outside its address space in virtual memory (see also this), executing an invalid machine instruction, etc...
It didn't crash because you got lucky; since you have no way of knowing just how long this string that gets printed is, you have no idea what parts of memory it will venture into.

long compile time when using big arrays in the extern block

Why does gcc take a long time to compile C code if it has a big array in the extern block?
#define MAXNITEMS 100000000
int buff[MAXNITEMS];
int main (int argc, char *argv[])
{
return 0;
}
I suspect a bug somewhere. There is no reason for the compile to take longer, no matter how big the array is: since the array has no initializer, the compiler just emits its size (a .comm directive here, which ends up in .bss) and none of its contents. Proof:
.file "big.c"
.comm buff,4000000000000000000,32
.text
.globl main
.type main, #function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
movl %edi, -4(%rbp)
movq %rsi, -16(%rbp)
movl $0, %eax
popq %rbp
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.7.3-1ubuntu1) 4.7.3"
.section .note.GNU-stack,"",#progbits
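As you can see, the only thing left of the array in the assembly is .comm buff,4000000000000000000,32: a name, a size in bytes, and an alignment. (That size is far larger than the 400000000 bytes that MAXNITEMS 100000000 with 4-byte ints would give, so this listing evidently comes from a test with a bigger value.)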
I suggest you run gcc with -S to see the assembler code. Maybe your version of GCC has a bug. I tested with GCC 4.7.3 and the compile times here are the same, no matter which value I use.
Related: Where are static variables stored (in C/C++)?

Resources