why is not const char put in rodata segment? - c

Having this:
#include <stdio.h>
#include <stdlib.h>
void f(const char *str){
char *p = (char*)str;
*p=97;
}
int main(){
char c;
f(&c);
char *p = malloc(10);
if (p) { f(p); printf("p:%s\n",p); free(p); }
const char d = 0; //only this part in interest
f(&d); // here the function modifies the char, but since it is NOT in rodata, no problem
printf("d:%c\n",d);
printf("c:%c\n",c);
}
Will generate gas:
...
.L3:
# a.c:16: const char d = 0;
movb $0, -10(%rbp) #, d
# a.c:17: f(&d);
leaq -10(%rbp), %rax #, tmp98
movq %rax, %rdi # tmp98,
call f #
# a.c:18: printf("d:%c\n",d);
movzbl -10(%rbp), %eax # d, d.0_1
movsbl %al, %eax # d.0_1, _2
movl %eax, %esi # _2,
leaq .LC1(%rip), %rdi #,
movl $0, %eax #,
call printf@PLT #
# a.c:20: printf("c:%c\n",c);
...
Here, the const char variable d is simply stored on the stack; it is not placed in .section .rodata and addressed relative to rip. Why is that, when it has the const qualifier? A char* string literal is placed in rodata automatically (it does not even need the const qualifier). I have read somewhere that constness is inherited, meaning that once a variable is declared const, casting the constness away does not change the fact that the object is const. But here the const qualifier is not even taken into account: d is manipulated directly on the stack, as arrays are. Why?

The variable d is not static, but a function-local variable. If the function containing it is called multiple times (recursively, or concurrently in multiple threads), you get multiple instances of the variable (within the stack frame of the function), each of which has its own individual address, even though all of them contain the same data. The C standard requires these instances to be distinct. If you define the variable as static, the compiler might move it into the .rodata section, so that you only get one instance.
String literals (e.g. "foo") however are not required to have individual addresses when they appear in (recursive) functions (unless they are used to initialize a char array), so the compiler usually places them into a .rodata section.

Related

What's different between pointer with array in c? [duplicate]

I tried to google this topic, but could not find a clear explanation. I tried the code below:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char * argv[]){
char * p1 = "dddddd";
const char * p2 = "dddddd";
char p3[] = "dddddd";
char * p4 =(char*)malloc(sizeof("dddddd")+1);
strcpy(p4, "dddddd");
//*(p1+2) = 'b'; // test_1
//Output >> Bus error: 10
// *(p2+2) = 'b'; // test_2
// Output >> char_point.c:11:13: error: read-only variable is not assignable
*(p3+2) = 'b'; // test_3
// Output >>
//d
//dddddd
//dddddd
//ddbddd
*(p4+2) = 'k'; // test_4
// Output >>
//d
//dddddd
//dddddd
//ddbddd
//ddkddd
printf("%c\n", *(p1+2));
printf("%s\n", p1);
printf("%s\n", p2);
printf("%s\n", p3);
printf("%s\n", p4);
return 0;
}
I ran these tests, but only test_3 and test_4 pass. I know const char *p2 is read-only, because it points to a constant value, but I don't know why p1 can't be modified. In which section of memory is it laid out? BTW, I compiled it on my Mac with GCC.
I compiled it to assembly with gcc -S and got this:
.section __TEXT,__text,regular,pure_instructions
.macosx_version_min 10, 13
.globl _main
.p2align 4, 0x90
_main: ## @main
.cfi_startproc
## BB#0:
pushq %rbp
Lcfi0:
.cfi_def_cfa_offset 16
Lcfi1:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Lcfi2:
.cfi_def_cfa_register %rbp
subq $48, %rsp
movl $8, %eax
movl %eax, %ecx
leaq L_.str(%rip), %rdx
movl $0, -4(%rbp)
movl %edi, -8(%rbp)
movq %rsi, -16(%rbp)
movq %rdx, -24(%rbp)
movq %rdx, -32(%rbp)
movl L_main.p3(%rip), %eax
movl %eax, -39(%rbp)
movw L_main.p3+4(%rip), %r8w
movw %r8w, -35(%rbp)
movb L_main.p3+6(%rip), %r9b
movb %r9b, -33(%rbp)
movq %rcx, %rdi
callq _malloc
xorl %r10d, %r10d
movq %rax, -48(%rbp)
movl %r10d, %eax
addq $48, %rsp
popq %rbp
retq
.cfi_endproc
.section __TEXT,__cstring,cstring_literals
L_.str: ## @.str
.asciz "dddddd"
L_main.p3: ## @main.p3
.asciz "dddddd"
.subsections_via_symbols
I want to know, for every pointer I declared, which section it is in.
"Why p1 can't be modified?"
Roughly speaking, p1 points to a string literal, and attempts to modify string literals cause undefined behavior in C.
More specifically, according to §6.4.5 ¶6 of the C11 Standard, string literals are:
used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char....
Concerning objects with static storage duration, §5.1.2 ¶1 states that
All objects with static storage duration shall be initialized (set to their initial values) before program startup. The manner and timing of such initialization are otherwise unspecified.
"Which section of memory it's layout?"
But, the Standard does not specify any specific memory layouts that an implementation must follow.
What the Standard does say about the arrays of char which are created from string literals is that (§6.4.5 ¶7):
It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
So
char * p1 = "dddddd";
this should be
const char * p1 = "dddddd";
String literals (the ones in quotes) usually reside in read-only memory, and
modifying one is undefined behavior in any case. Even if you don't use the
const keyword in the declaration of the variable, p1 still points to that memory. So
*(p1+2) = 'b'; // test_1
is going to fail.
Here
*(p2+2) = 'b'; // test_2
// Output >> char_point.c:11:13: error: read-only variable is not assignable
the compiler tells you that you cannot do that, because you declared p2 as a pointer to const.
The difference between the first test and this one is that test_1 compiles and
fails only at run time, whereas test_2 is rejected at compile time.
Now this:
char * p4 =(char*)malloc(sizeof("dddddd")+1);
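First, do not cast the result of malloc & friends; in C the conversion from
void * is implicit. Second: the sizeof operator returns the number of bytes
needed to store its operand. "dddddd" is a string literal, and a string literal
is an array of char, not a pointer, so sizeof("dddddd") is 7: six characters
plus the terminating '\0'. The extra +1 is therefore harmless but unnecessary.
When the string is only reachable through a pointer, sizeof gives the size of
the pointer instead, and strlen is the right tool: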
char * p4 = malloc(strlen("dddddd")+1);
Note that in this case
char txt[] = "Hello world";
printf("%zu\n", sizeof(txt));
will print 12 and not 11. C strings are '\0'-terminated, that means that txt
holds all these characters plus the '\0'-terminating byte. In this case
sizeof doesn't return the number of bytes for a pointer, because txt is an
array.
void foo(char *txt)
{
printf("%zu\n", sizeof(txt));
}
void bar(void)
{
char txt[] = "Hello world";
foo(txt);
}
Here you won't get 12 like before, most probably 8 (today's common size for a
pointer). Even though txt in bar is an array, the txt in foo is a
pointer.
An array is not a pointer, but in most expressions its name converts to a pointer to its first element, and the array itself cannot be reassigned. You can change its elements, though.
A pointer variable, in contrast, can be made to point somewhere else. Whether the elements it points to can be changed depends on what it points to: a modifiable array is fine, a string literal is not.
For example, consider this code:
int main(void){
int a[] = {1,2,3};
int *ptr = a; // ptr points to a[0]
//a[0] == *(a+0)
//a[1] == *(a+1)
// a += 1; // this is wrong (compile error), because an array cannot be assigned to
ptr += 1; // this is correct, now the pointer ptr points to the next element, which is 2
a[0] += 2; // this is correct, now a[0] becomes 3
*ptr += 2; // this is also correct, a[1] becomes 4, because ptr points into a modifiable array
return 0;
}

Variable types in C and who keeps track of it

I am taking a MOOC course CS50 from Harvard. In one of the first lectures we learned about variables of different data types: int,char, etc.
What I understand is that command (say, within main function) int a = 5 reserves a number of bytes (4 for the most part) of memory on the stack and puts there a sequence of zeros and ones which represent 5.
The same sequence of zeros and ones also could mean a certain character. So somebody needs to keep track of the fact that the sequence of zeros and ones in the memory place reserved for a is to be read as an integer (and not as a character).
The question is who does keep track of it? The computer's memory by sticking a tag to this place in memory saying "hey, whatever you find in these 4 bytes read as an integer"? Or the C compiler, which knows (looking at the type int of a) that when my code asks it to do something (more precisely, to produce a machine code doing something) with the value of a it needs to treat this value as an integer?
I would really appreciate an answer tailored to a C beginner.
With the C language, it's the compiler.
At run-time, there's only the 32 bits = 4 bytes on the stack.
You ask "The computer's memory by sticking a tag to this place...": that's impossible (with current computer architectures - thanks for the hint from @Ivan). The memory itself is just 8 bits (each being 0 or 1) per byte. There is no place in memory that can tag a memory cell with whatever additional info.
There are other languages (e.g. LISP, and to some degree also Java and C#) that store an integer as a combination of the 32 bits for the number plus a few bits or bytes that contain some bit-encoded tagging that here we have an integer. So they need e.g. 6 bytes for a 32-bit integer. But with C, that's not the case. You need knowledge from the source code to correctly interpret the bits found in memory - they don't explain themselves. And there have been special architectures that supported tagging in hardware.
In C, memory is untyped; no information beyond its value is stored there. All type information is computed at compile time from the type of an expression (a variable name, a value computation, a pointer dereferencing etc.) This computation depends on the information the programmer provides through declarations (also in headers) or casts. If that information is wrong, e.g. because a function prototype's parameters are declared wrong, all bets are off. The compiler warns about or prevents mis-declarations in the same "translation unit" (file with headers), but between translation units there are no (or not many?) protections. That's one reason why C has headers: They share common type information between translation units.
C++ keeps this idea but additionally offers run time type information (as opposed to compile time type information) for polymorphic types. It's obvious that every polymorphic object must carry extra information somewhere (not necessarily close to the data though). But that is C++, not C.
For the main part it's the C compiler that keeps track.
During the compilation process the compiler builds up a large data structure called the parse tree. It also keeps track of all variables, functions, types, ... everything with a name (i.e. identifier); this is called the symbol table.
The nodes of both the parse tree and the symbol table have an entry in which the type is recorded. They keep track of all the types.
With mainly these two data structures in hand, the compiler can check if your code does not violate type rules. It allows the compiler to warn you if you use incompatible values or variable names.
C does allow implicit conversion between types. You can for example assign an int to a double. But in memory these are completely different bit patterns for the same value.
In earlier (higher abstraction level) phases of the compilation process, the compiler does not deal with bit patterns yet (or too much), and makes conversions and checks at a higher level.
But during the assembly code generating process, the compiler needs to finally figure it all out in bits. So for an int to double conversion:
int i = 5;
double d = i; // Conversion.
the compiler will generate code to make this conversion happen.
In C however it's very easy to make mistakes and mess things up. This is because C is not a very strongly typed language and is rather flexible. So a programmer also needs to be aware.
Because C does not keep track of types after compilation, a running program can often silently continue with the wrong data after executing one of your mistakes. And if you're 'lucky' and the program crashes, the error message you get is not (very) informative.
You have a stack pointer which gives an absolute offset for the topmost stack frame in memory.
For a given scope of execution, the compiler knows where each variable is located relative to this stack pointer and emits accesses to these variables as an offset from the stack pointer. So it is primarily the compiler mapping the variables, but it's the processor which is applying this mapping.
You can easily write programs which compute or remember a memory address which used to be valid, or is just outside of a valid region. The compiler doesn't stop you from doing so, only higher level languages with reference counting and strict boundary checks do at runtime.
The compiler keeps track of all type information during translation, and it will generate the proper machine code to deal with data of different types or sizes.
Let's take the following code:
#include <stdio.h>
int main( void )
{
long long x, y, z;
x = 5;
y = 6;
z = x + y;
printf( "x = %lld, y = %lld, z = %lld\n", x, y, z );
return 0;
}
After running that through gcc -S, the assignment, addition, and print statements are translated to:
movq $5, -24(%rbp)
movq $6, -16(%rbp)
movq -16(%rbp), %rax
addq -24(%rbp), %rax
movq %rax, -8(%rbp)
movq -8(%rbp), %rcx
movq -16(%rbp), %rdx
movq -24(%rbp), %rsi
movl $.LC0, %edi
movl $0, %eax
call printf
movl $0, %eax
leave
ret
movq is the mnemonic for moving values into 64-bit words ("quadwords"). %rax is a general-purpose 64-bit register that's being used as an accumulator. Don't worry too much about the rest of it for now.
Now let's see what happens when we change those long longs to shorts:
#include <stdio.h>
int main( void )
{
short x, y, z;
x = 5;
y = 6;
z = x + y;
printf( "x = %hd, y = %hd, z = %hd\n", x, y, z );
return 0;
}
Again, we run it through gcc -S to generate the machine code, et voila:
movw $5, -6(%rbp)
movw $6, -4(%rbp)
movzwl -6(%rbp), %edx
movzwl -4(%rbp), %eax
leal (%rdx,%rax), %eax
movw %ax, -2(%rbp)
movswl -2(%rbp),%ecx
movswl -4(%rbp),%edx
movswl -6(%rbp),%esi
movl $.LC0, %edi
movl $0, %eax
call printf
movl $0, %eax
leave
ret
Different mnemonics - instead of movq we get movw and movswl, we're using %eax, which is the lower 32 bits of %rax, etc.
Once more, this time with floating-point types:
#include <stdio.h>
int main( void )
{
double x, y, z;
x = 5;
y = 6;
z = x + y;
printf( "x = %f, y = %f, z = %f\n", x, y, z );
return 0;
}
gcc -S again:
movabsq $4617315517961601024, %rax
movq %rax, -24(%rbp)
movabsq $4618441417868443648, %rax
movq %rax, -16(%rbp)
movsd -24(%rbp), %xmm0
addsd -16(%rbp), %xmm0
movsd %xmm0, -8(%rbp)
movq -8(%rbp), %rax
movq -16(%rbp), %rdx
movq -24(%rbp), %rcx
movq %rax, -40(%rbp)
movsd -40(%rbp), %xmm2
movq %rdx, -40(%rbp)
movsd -40(%rbp), %xmm1
movq %rcx, -40(%rbp)
movsd -40(%rbp), %xmm0
movl $.LC2, %edi
movl $3, %eax
call printf
movl $0, %eax
leave
ret
New mnemonics (movsd), new registers (%xmm0).
So basically, after translation, there's no need to tag the data with type information; that type information is "baked in" to the machine code itself.

malloc pointer address in main and in other function difference [duplicate]

I have the following question. Why is there a difference in the addresses of the two pointers in following example? This is the full code:
#include <stdio.h>
#include <stdlib.h>
void *mymalloc(size_t bytes){
void * ptr = malloc(bytes);
printf("Address1 = %zx\n",(size_t)&ptr);
return ptr;
}
void main (void)
{
unsigned char *bitv = mymalloc(5);
printf("Address2 = %zx\n",(size_t)&bitv);
}
Result:
Address1 = 7ffe150307f0
Address2 = 7ffe15030810
It's because you are printing the address of the pointer variable, not the value of the pointer. Remove the ampersand (&) from bitv and ptr in your printfs.
printf("Address1 = %zx\n",(size_t)ptr);
and
printf("Address2 = %zx\n",(size_t)bitv);
Also, use %p for pointers (and then cast to void * instead of size_t)
WHY?
In this line of code:
unsigned char *bitv = mymalloc(5);
bitv is a pointer and its value is the address of the newly allocated block of memory. But that address also needs to be stored, and &bitv is the address of where that value is stored. If you have two variables storing the same pointer, they will still each have their own address, which is why &ptr and &bitv have different values.
But, as you expected, ptr and bitv will have the same value when you change your code.
Why is there a difference in the addresses of the two pointers
Because the two pointers are two different pointer(-variable)s, each having its own address.
The value those two pointer(-variable)s carry in fact are the same.
To prove this print their value (and not their address) by changing:
printf("Address1 = %zx\n",(size_t)&ptr);
to be
printf("Address1 = %p\n", (void*) ptr);
and
printf("Address2 = %zx\n",(size_t)&bitv);
to be
printf("Address2 = %p\n", (void*) bitv);
In your code you printed the pointer's own address with the following code:
printf("%zx", (size_t)&p);
It doesn't print the address of the variable it's pointing to; it prints the address of the pointer itself.
You could print an address using the '%p' format:
printf("%p", (void *)&n); // PRINTS ADDRESS OF 'n'
There's an example which explains printing addresses
int n;
int *v;
n = 54;
v = &n;
printf("%p", (void *)v); // PRINTS ADDRESS OF 'n'
printf("%p", (void *)&v); // PRINTS ADDRESS OF pointer 'v'
printf("%p", (void *)&n); // PRINTS ADDRESS OF 'n'
printf("%d", *v); // PRINTS VALUE OF 'n'
printf("%d", n); // PRINTS VALUE OF 'n'
So your code should be written like this:
#include <stdio.h>
#include <stdlib.h>
void * get_mem(size_t size)
{
void * buff = malloc(size); // allocation of memory
// buff is pointing to result of malloc(size)
if (!buff) return NULL; //when malloc returns NULL end function
//else print the value of the pointer (the address of the allocated block)
printf("ADDRESS->%p\n", buff);
return buff;
}
int main(void)
{
void * buff = get_mem(54);
printf("ADDRESS->%p\n", buff);
free(buff);
return 0;
}
(In addition to other answers, which you should read first and which probably will help you more ...)
Read a good C programming book. Pointers and addresses are very difficult to explain, and I'm not even trying to. So the address of a pointer &ptr is generally not the same as the value of a pointer (however, you could code ptr= &ptr; but you often don't want to do that)... Look also at the picture explaining virtual address space.
Then read more documentation about malloc: malloc(3) Linux man page, this reference documentation, etc... Here is a fast, standard-conforming, but disappointing implementation of malloc.
Read also documentation about printf: printf(3) man page, printf reference, etc... It should mention %p for printing pointers...
Notice that you don't print a pointer (see Alk's answer), you don't even print its address (of an automatic variable on the call stack), you print some cast to size_t (which might not have the same bit width as a pointer, even if on my Linux/x86-64 it does).
Read also more about C dynamic memory allocation and about pointer aliasing.
At last, read the C11 standard specification n1570.
(I can't believe why you would expect the two outputs to be the same; actually it could happen if a compiler is optimizing the call to mymalloc by inlining a tail call)
So I did not expect the output to be the same in general. However, with gcc -O2 antonis.c -o antonis I've got (with a tiny modification of your code)....
a surprise
However, if you declare the first void *mymalloc(size_t bytes) as a static void*mymalloc(size_t bytes) and compile with GCC 7 on Linux/Debian/x86-64 with optimizations enabled, you do get the same output; because the compiler inlined the call and used the same location for bitv and ptr; here is the generated assembler code with gcc -S -O2 -fverbose-asm antonis.c:
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "Address1 = %zx\n"
.LC1:
.string "Address2 = %zx\n"
.section .text.startup,"ax",@progbits
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB22:
.cfi_startproc
pushq %rbx #
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
# antonis.c:5: void * ptr = malloc(bytes);
movl $5, %edi #,
# antonis.c:11: {
subq $16, %rsp #,
.cfi_def_cfa_offset 32
# antonis.c:6: printf("Address1 = %zx\n",(size_t)&ptr);
leaq 8(%rsp), %rbx #, tmp92
# antonis.c:5: void * ptr = malloc(bytes);
call malloc@PLT #
# antonis.c:6: printf("Address1 = %zx\n",(size_t)&ptr);
leaq .LC0(%rip), %rdi #,
# antonis.c:5: void * ptr = malloc(bytes);
movq %rax, 8(%rsp) # tmp91, ptr
# antonis.c:6: printf("Address1 = %zx\n",(size_t)&ptr);
movq %rbx, %rsi # tmp92,
xorl %eax, %eax #
call printf@PLT #
# antonis.c:13: printf("Address2 = %zx\n",(size_t)&bitv);
leaq .LC1(%rip), %rdi #,
movq %rbx, %rsi # tmp92,
xorl %eax, %eax #
call printf@PLT #
# antonis.c:14: }
addq $16, %rsp #,
.cfi_def_cfa_offset 16
popq %rbx #
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE22:
.size main, .-main
BTW, if I compile your unmodified source (without static) with gcc -fwhole-program -O2 -S -fverbose-asm I'm getting the same assembler as above.
If you don't add static and don't compile with -fwhole-program, Address1 and Address2 stay different.
two run outputs
I ran that antonis executable and got the first time:
/tmp$ ./antonis
Address1 = 7ffe2b07c148
Address2 = 7ffe2b07c148
and the second time:
/tmp$ ./antonis
Address1 = 7ffc441851a8
Address2 = 7ffc441851a8
If you want to guess why the outputs are different from one run to the next one, think of ASLR.
BTW, a very important notion when coding in C is that of undefined behavior (see also this and that answers and the references I gave there). You don't have any in your question (it is just unspecified behavior), but as my contrived answer shows, you should not expect a particular behavior in that precise case.
PS. I believe (but I am not entirely sure) that a standard conforming C implementation could output Address1= hello world and likewise for Address2. After all, the behavior of printf with %p is implementation defined. And surely you could get 0xdeadbeef for both. More seriously, an address does not always have the same bit width as a size_t or an int, and the standard defines intptr_t in <stdint.h>.

In which data segment is the C string stored?

I'm wondering what's the difference between char s[] = "hello" and char *s = "hello".
After reading this and this, I'm still not very clear on this question.
As I know, there are five data segments in memory, Text, BSS, Data, Stack and Heap.
From my understanding,
in case of char s[] = "hello":
"hello" is in Text.
s is in Data if it is a global variable or in Stack if it is a local variable.
We also have a copy of "hello" where the s is stored, so we can modify the value of this string via s.
in case of char *s = "hello":
"hello" is in Text.
s is in Data if it is a global variable or in Stack if it is a local variable.
s just points to "hello" in Text and we don't have a copy of it, therefore modifying the value of string via this pointer should cause "Segmentation Fault".
Am I right?
You are right that "hello" is a mutable string in the first case and an immutable string in the second. In both cases the initial characters come from the program image, which may keep them in read-only memory.
In the first case the mutable array is initialized with (copied from) those characters. In the second case the pointer refers directly to the immutable string.
For first case wikipedia says,
The values for these variables are initially stored within the
read-only memory (typically within .text) and are copied into the
.data segment during the start-up routine of the program.
Let us examine segment.c file.
char*s = "hello"; // string
char sar[] = "hello"; // string array
char content[32];
int main(int argc, char*argv[]) {
char psar[] = "parhello"; // local/private string array
char*ps = "phello"; // private string
content[0] = 1;
sar[3] = 1; // OK
// sar++; // not allowed
// s[2] = 1; // segmentation fault
s = sar;
s[2] = 1; // OK
psar[3] = 1; // OK
// ps[2] = 1; // segmentation fault
ps = psar;
ps[2] = 1; // OK
return 0;
}
Here is the assembly generated for the segment.c file. Note that both s and sar are globals in the .data segment. sar is not a pointer at all: it is an array, and the label sar names the six bytes of mutable, initialized memory themselves. One consequence is that sizeof(sar), which is 6, differs from sizeof(s), which is 8 here. "hello" and "phello" are in the read-only (.rodata) section and are effectively immutable.
.file "segment.c"
.globl s
.section .rodata
.LC0:
.string "hello"
.data
.align 8
.type s, @object
.size s, 8
s:
.quad .LC0
.globl sar
.type sar, @object
.size sar, 6
sar:
.string "hello"
.comm content,32,32
.section .rodata
.LC1:
.string "phello"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $64, %rsp
movl %edi, -52(%rbp)
movq %rsi, -64(%rbp)
movq %fs:40, %rax
movq %rax, -8(%rbp)
xorl %eax, %eax
movl $1752326512, -32(%rbp)
movl $1869376613, -28(%rbp)
movb $0, -24(%rbp)
movq $.LC1, -40(%rbp)
movb $1, content(%rip)
movb $1, sar+3(%rip)
movq $sar, s(%rip)
movq s(%rip), %rax
addq $2, %rax
movb $1, (%rax)
movb $1, -29(%rbp)
leaq -32(%rbp), %rax
movq %rax, -40(%rbp)
movq -40(%rbp), %rax
addq $2, %rax
movb $1, (%rax)
movl $0, %eax
movq -8(%rbp), %rdx
xorq %fs:40, %rdx
je .L2
call __stack_chk_fail
.L2:
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
Again for local variable in main, the compiler does not bother to create a name. And it may keep it in register or in stack memory.
Note that local variable value "parhello" is optimized into 1752326512 and 1869376613 numbers. I discovered it by changing the value of "parhello" to "parhellp". The diff of the assembly output is as follows,
39c39
< movl $1886153829, -28(%rbp)
---
> movl $1869376613, -28(%rbp)
So there is no separate immutable store for psar . It is turned into integers in the code segment.
answer to your first question:
char s[] = "hello";
s is an array of type char. An array is not a pointer, but its name cannot be assigned to or incremented, so you cannot change s itself (i.e. s++ does not compile). The data aren't const, though, so you can change them.
See this example C code:
#include <stdio.h>
void reverse(char *p){
char c;
char* q = p;
while (*q) q++;
q--; // point to the end
while (p < q) {
c = *p;
*p++ = *q;
*q-- = c;
}
}
int main(){
char s[] = "DCBA";
reverse( s);
printf("%s\n", s); // ABCD
}
which reverses the text "DCBA" and produces "ABCD".
char *p = "hello"
p is a pointer to a char. You can do pointer arithmetic (p++ will compile), and the compiler typically puts the string data in a read-only part of memory (const data).
Using p[0]='a'; is undefined behavior and typically results in a run-time error:
#include <stdio.h>
int main(){
char* s = "DCBA";
s[0]='D'; // compiles, but undefined behavior: typically crashes at run time
printf("%s\n", s); // usually not reached
}
This compiles, but typically does not run to completion.
const char* const s = "DCBA";
With a const char* const, you can change neither s nor the data it points to (i.e. "DCBA"), so both the data and the pointer are const:
#include <stdio.h>
int main(){
const char* const s = "DCBA";
s[0]='D'; // compile error
printf("%s\n", s); // would print DCBA if the line above were removed
}
The Text segment is normally the segment where your code is stored and is const; i.e. unchangeable. In embedded systems, this is the ROM, PROM, or flash memory; in a desktop computer, it can be in RAM.
The Stack is RAM memory used for local variables in functions.
The Heap is RAM memory used for dynamically allocated data (e.g. from malloc). Initialized global and static variables live in the Data segment instead.
BSS contains all global and static variables that are initialized to zero or not explicitly initialized.
For more information, see the relevant Wikipedia and this relevant Stack Overflow question
With regards to s itself: The compiler decides where to put it (in stack space or CPU registers).
For more information about memory protection and access violations or segmentation faults, see the relevant Wikipedia page
This is a very broad topic, and ultimately the exact answers depend on your hardware and compiler.

Pass in pointer of c struct to x86-32 assembly becomes automatically dereference

I need to pass an address to an assembly function, but seems like I'm not able to do that.
Here's the c file:
int asm_func(void *arg);
struct foo {
int len;
char *buf;
};
int bar(int size, char *buf){
struct foo arg_to_asm_function;
arg_to_asm_function.len = size;
arg_to_asm_function.buf = buf;
return asm_func(&arg_to_asm_function);
}
Here's the assembly:
.global asm_func
asm_func:
pushl %esi
movl 8(%ebp), %esi
/* do something with &arg_to_asm_function, which is in esi */
popl %esi
ret
If I invoke the c function bar with arguments bar(5, "hello world"), and I stepi into the instruction
movl 8(%ebp), %esi
I get the value 5 in %esi (value of first field in the struct foo).
The expected value in %esi is the pointer to the struct foo that I declared, i.e. &arg_to_asm_function, not the value inside that address.
Why is this happening? Does the compiler automatically dereference the pointer for me? How would I pass in the address of the struct into %esi?
You didn't set up the stack frame in the assembly function, so 8(%ebp) won't give you the correct value. Because ebp still has the value from your C function, you're seeing the value of the first argument passed to that function instead.
You need to set up the stack frame at the start of the function (before pushing %esi), so that 8(%ebp) refers to the first argument:
push %ebp
mov %esp, %ebp
...
pop %ebp
This is assuming that the calling convention passes the function parameters on the stack - otherwise you'll need to get the parameter value from a register.
