What is the difference between a pointer and an array in C? [duplicate]

I tried to google this topic, but could not find a clear explanation. I tried the code below:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char * argv[]){
char * p1 = "dddddd";
const char * p2 = "dddddd";
char p3[] = "dddddd";
char * p4 =(char*)malloc(sizeof("dddddd")+1);
strcpy(p4, "dddddd");
//*(p1+2) = 'b'; // test_1
//Output >> Bus error: 10
// *(p2+2) = 'b'; // test_2
// Output >> char_point.c:11:13: error: read-only variable is not assignable
*(p3+2) = 'b'; // test_3
// Output >>
//d
//dddddd
//dddddd
//ddbddd
*(p4+2) = 'k'; // test_4
// Output >>
//d
//dddddd
//dddddd
//ddbddd
//ddkddd
printf("%c\n", *(p1+2));
printf("%s\n", p1);
printf("%s\n", p2);
printf("%s\n", p3);
printf("%s\n", p4);
return 0;
}
I tried four tests, but only test_3 and test_4 pass. I know const char *p2 is read-only, because it points to a constant value, but I don't know why p1 can't be modified. In which section of memory is it laid out? BTW, I compiled it on my Mac with GCC.
I compiled it to assembly with gcc -S and got this:
.section __TEXT,__text,regular,pure_instructions
.macosx_version_min 10, 13
.globl _main
.p2align 4, 0x90
_main: ## @main
.cfi_startproc
## BB#0:
pushq %rbp
Lcfi0:
.cfi_def_cfa_offset 16
Lcfi1:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Lcfi2:
.cfi_def_cfa_register %rbp
subq $48, %rsp
movl $8, %eax
movl %eax, %ecx
leaq L_.str(%rip), %rdx
movl $0, -4(%rbp)
movl %edi, -8(%rbp)
movq %rsi, -16(%rbp)
movq %rdx, -24(%rbp)
movq %rdx, -32(%rbp)
movl L_main.p3(%rip), %eax
movl %eax, -39(%rbp)
movw L_main.p3+4(%rip), %r8w
movw %r8w, -35(%rbp)
movb L_main.p3+6(%rip), %r9b
movb %r9b, -33(%rbp)
movq %rcx, %rdi
callq _malloc
xorl %r10d, %r10d
movq %rax, -48(%rbp)
movl %r10d, %eax
addq $48, %rsp
popq %rbp
retq
.cfi_endproc
.section __TEXT,__cstring,cstring_literals
L_.str: ## @.str
.asciz "dddddd"
L_main.p3: ## @main.p3
.asciz "dddddd"
.subsections_via_symbols
For every pointer I declared, I want to know which section it ends up in.

"Why p1 can't be modified?"
Roughly speaking, p1 points to a string literal, and attempts to modify string literals cause undefined behavior in C.
More specifically, according to §6.4.5 ¶6 of the C11 Standard, string literals are:
used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char....
Concerning objects with static storage duration, §5.1.2 ¶1 states that
All objects with static storage duration shall be initialized (set to their initial values) before program startup. The manner and timing of such initialization are otherwise unspecified.
"Which section of memory it's layout?"
The Standard does not specify any particular memory layout that an implementation must follow.
What the Standard does say about the arrays of char which are created from string literals is that (§6.4.5 ¶7):
It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

So this declaration
char * p1 = "dddddd";
should be
const char * p1 = "dddddd";
String literals (the ones in quotes) usually reside in read-only memory, and
modifying them is undefined behavior. Even if you don't use the const keyword
in the declaration of the variable, p1 still points to the literal. So
*(p1+2) = 'b'; // test_1
is going to fail, here with a bus error at run time.
Here
*(p2+2) = 'b'; // test_2
// Output >> char_point.c:11:13: error: read-only variable is not assignable
the compiler tells you that you cannot do that, because you declared p2 as a
pointer to const. The difference from the first test is that here the write is
rejected at compile time, whereas in test_1 the code compiles, tries to modify
a character at run time, and fails.
Now this:
char * p4 =(char*)malloc(sizeof("dddddd")+1);
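First, do not cast malloc & friends. Second: the sizeof-operator yields the
number of bytes needed to store its operand in memory. "dddddd" is a string
literal whose type is array of 7 char (six characters plus the terminating
'\0'), and an array does not decay to a pointer when it is the operand of
sizeof. So sizeof("dddddd") is 7 and the call allocates 8 bytes, one more than
needed. That is harmless, and it matches the movl $8 in your assembly.
The more common idiom, which also works when you only have a pointer to the
string, is strlen:
char * p4 = malloc(strlen("dddddd")+1);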
Note that in this case
char txt[] = "Hello world";
printf("%zu\n", sizeof(txt));
will print 12 and not 11. C strings are '\0'-terminated, that means that txt
holds all these characters plus the '\0'-terminating byte. In this case
sizeof doesn't return the number of bytes for a pointer, because txt is an
array.
void foo(char *txt)
{
printf("%zu\n", sizeof(txt));
}
void bar(void)
{
char txt[] = "Hello world";
foo(txt);
}
Here you won't get 12 like before, most probably 8 (today's common size for a
pointer). Even though txt in bar is an array, the txt in foo is a
pointer.

Arrays are not pointers, but in most expressions an array is converted to a pointer to its first element. You cannot assign to an array or change where it "points", but you can change its elements.
A pointer variable can be made to point somewhere else, and the elements it points to can be changed as well, unless they are const or belong to a string literal.
For example, consider this code:
int main(void){
int a[] = {1,2,3};
int * ptr = a; // ptr points to a[0]
//a[0] == *(a+0)
//a[1] == *(a+1)
//a += 1; // error: an array cannot be assigned to
ptr += 1; // OK: ptr now points to the next element, which is 2
a[0] += 2; // OK: a[0] becomes 3
*ptr += 2; // OK: a[1] becomes 4, because the int it points to is not const
return 0;
}


Please explain the output, where a C character pointer was assigned to new memory without a malloc call

#include <stdio.h>
int
main ()
{
char *a = "Hello";
a = "Hello_World";
printf ("%s", a);
return 0;
}
This program ran correctly and printed “Hello_World”.
But I remember reading that to change a string pointer once it has been initialised, I must use malloc to allocate memory and then store the new value of the string.
Please explain. In particular, where is the memory for the new value of the string allocated, and what happens to the old memory?
Use gcc -S to see the generated assembler.
You will see something like this
.LC0:
.string "Hello"
.LC1:
.string "Hello_World"
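Both literals are emitted as constant data at compile time (typically in a read-only section such as .rodata), so nothing is allocated at run time. The assignment only stores a different address in the pointer; the old literal stays where it was.
The addresses are then used like this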
movq $.LC0, -8(%rbp)
movq $.LC1, -8(%rbp)
movq -8(%rbp), %rax
movq %rax, %rsi
movl $.LC2, %edi

Why is a const char not put in the rodata segment?

Having this:
#include <stdio.h>
#include <stdlib.h>
void f(const char *str){
char *p = (char*)str;
*p=97;
}
int main(){
char c;
f(&c);
char *p = malloc(10);
if (p) { f(p); printf("p:%s\n",p); free(p); }
const char d = 0; //only this part in interest
f(&d); // here the function modifies the char, but since it is NOT in rodata, no problem
printf("d:%c\n",d);
printf("c:%c\n",c);
}
Will generate gas:
...
.L3:
# a.c:16: const char d = 0;
movb $0, -10(%rbp) #, d
# a.c:17: f(&d);
leaq -10(%rbp), %rax #, tmp98
movq %rax, %rdi # tmp98,
call f #
# a.c:18: printf("d:%c\n",d);
movzbl -10(%rbp), %eax # d, d.0_1
movsbl %al, %eax # d.0_1, _2
movl %eax, %esi # _2,
leaq .LC1(%rip), %rdi #,
movl $0, %eax #,
call printf@PLT #
# a.c:20: printf("c:%c\n",c);
...
Here, the const char variable d is only stored on the stack; it is not placed in .section .rodata. Why is that, given that it has the const qualifier? A char* string literal is placed in rodata automatically (the pointer does not even need the const qualifier). I have read somewhere that constness is inherited (meaning that once a variable is declared const, even a cast that casts the constness away does not change it). But here the const qualifier on the char does not seem to be taken into account at all (the variable is manipulated directly on the stack, as arrays are). Why?
The variable d is not static, but a function-local variable. If the function containing it is called multiple times (recursively, or concurrently in multiple threads), you get multiple instances of the variable (within the stack frame of the function), each of which has its own individual address, even though all of them contain the same data. The C standard requires these instances to be distinct. If you define the variable as static, the compiler might move it into the .rodata section, so that you only get one instance.
String literals (e.g. "foo") however are not required to have individual addresses when they appear in (recursive) functions (unless they are used to initialize a char array), so the compiler usually places them into a .rodata section.

malloc pointer address in main and in other function difference [duplicate]

I have the following question. Why is there a difference between the addresses of the two pointers in the following example? This is the full code:
#include <stdio.h>
#include <stdlib.h>
void *mymalloc(size_t bytes){
void * ptr = malloc(bytes);
printf("Address1 = %zx\n",(size_t)&ptr);
return ptr;
}
void main (void)
{
unsigned char *bitv = mymalloc(5);
printf("Address2 = %zx\n",(size_t)&bitv);
}
Result:
Address1 = 7ffe150307f0
Address2 = 7ffe15030810
It's because you are printing the address of the pointer variable, not the value stored in it. Remove the ampersand (&) from bitv and ptr in your printfs.
printf("Address1 = %zx\n",(size_t)ptr);
and
printf("Address2 = %zx\n",(size_t)bitv);
Also, use %p for pointers (and then cast to void * instead of size_t)
WHY?
In this line of code:
unsigned char *bitv = mymalloc(5);
bitv is a pointer and its value is the address of the newly allocated block of memory. But that address also needs to be stored somewhere, and &bitv is the address of the place where that value is stored. If you have two variables storing the same pointer, they will still each have their own address, which is why &ptr and &bitv have different values.
But, as you expected, ptr and bitv will have the same value when you change your code.
Why is there a difference in the addresses of the two pointers
Because the two pointers are two different pointer(-variable)s, each having its own address.
The values those two pointer(-variable)s carry are in fact the same.
To prove this, print their values (and not their addresses) by changing:
printf("Address1 = %zx\n",(size_t)&ptr);
to be
printf("Address1 = %p\n", (void*) ptr);
and
printf("Address2 = %zx\n",(size_t)&bitv);
to be
printf("Address2 = %p\n", (void*) bitv);
In your code you printed the pointer's own address with the following code:
printf("%zx", (size_t)&p);
It doesn't print the address of the variable it is pointing to; it prints the address of the pointer itself.
You can print an address using the '%p' format:
printf("%p", &n); // PRINTS ADDRESS OF 'n'
Here is an example that shows how to print addresses:
int n;
int *v;
n = 54;
v = &n;
printf("%p", v); // PRINTS ADDRESS OF 'n'
printf("%p", &v); // PRINTS ADDRESS OF pointer 'v'
printf("%p", &n); // PRINTS ADDRESS OF 'n'
printf("%d", *v); // PRINTS VALUE OF 'n'
printf("%d", n); // PRINTS VALUE OF 'n'
So your code should be written like this:
#include <stdio.h>
#include <stdlib.h>
void * get_mem(size_t size)
{
void * buff = malloc(size); // allocate memory
// buff now holds the address returned by malloc(size)
if (!buff) return NULL; // malloc failed: give up
// otherwise print the value of the pointer (the address of the block)
printf("ADDRESS->%p\n", buff);
return buff;
}
int main(void)
{
void * buff = get_mem(54);
printf("ADDRESS->%p\n", buff);
free(buff);
return 0;
}
(In addition to the other answers, which you should read first and which will probably help you more ...)
Read a good C programming book. Pointers and addresses are very difficult to explain, and I'm not even trying to. So the address of a pointer &ptr is generally not the same as the value of a pointer (however, you could code ptr= &ptr; but you often don't want to do that)... Look also at the picture explaining virtual address space.
Then read more documentation about malloc: malloc(3) Linux man page, this reference documentation, etc... Here is a fast, standard-conforming, but disappointing implementation of malloc.
Read also the documentation about printf: printf(3) man page, printf reference, etc... It should mention %p for printing pointers...
Notice that you don't print a pointer (see Alk's answer), you don't even print its address (of an automatic variable on the call stack), you print some cast to size_t (which might not have the same bit width as a pointer, even if on my Linux/x86-64 it does).
Read also more about C dynamic memory allocation and about pointer aliasing.
Finally, read the C11 standard specification n1570.
(I can't see why you would expect the two outputs to be the same; actually it could happen if a compiler optimizes the call to mymalloc by inlining a tail call)
So I did not expect the output to be the same in general. However, with gcc -O2 antonis.c -o antonis I've got (with a tiny modification of your code)....
a surprise
However, if you declare the first void *mymalloc(size_t bytes) as a static void*mymalloc(size_t bytes) and compile with GCC 7 on Linux/Debian/x86-64 with optimizations enabled, you do get the same output; because the compiler inlined the call and used the same location for bitv and ptr; here is the generated assembler code with gcc -S -O2 -fverbose-asm antonis.c:
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "Address1 = %zx\n"
.LC1:
.string "Address2 = %zx\n"
.section .text.startup,"ax",@progbits
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB22:
.cfi_startproc
pushq %rbx #
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
# antonis.c:5: void * ptr = malloc(bytes);
movl $5, %edi #,
# antonis.c:11: {
subq $16, %rsp #,
.cfi_def_cfa_offset 32
# antonis.c:6: printf("Address1 = %zx\n",(size_t)&ptr);
leaq 8(%rsp), %rbx #, tmp92
# antonis.c:5: void * ptr = malloc(bytes);
call malloc@PLT #
# antonis.c:6: printf("Address1 = %zx\n",(size_t)&ptr);
leaq .LC0(%rip), %rdi #,
# antonis.c:5: void * ptr = malloc(bytes);
movq %rax, 8(%rsp) # tmp91, ptr
# antonis.c:6: printf("Address1 = %zx\n",(size_t)&ptr);
movq %rbx, %rsi # tmp92,
xorl %eax, %eax #
call printf@PLT #
# antonis.c:13: printf("Address2 = %zx\n",(size_t)&bitv);
leaq .LC1(%rip), %rdi #,
movq %rbx, %rsi # tmp92,
xorl %eax, %eax #
call printf@PLT #
# antonis.c:14: }
addq $16, %rsp #,
.cfi_def_cfa_offset 16
popq %rbx #
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE22:
.size main, .-main
BTW, if I compile your unmodified source (without static) with gcc -fwhole-program -O2 -S -fverbose-asm I'm getting the same assembler as above.
If you don't add static and don't compile with -fwhole-program, Address1 and Address2 stay different.
two run outputs
I ran that antonis executable and the first time got:
/tmp$ ./antonis
Address1 = 7ffe2b07c148
Address2 = 7ffe2b07c148
and the second time:
/tmp$ ./antonis
Address1 = 7ffc441851a8
Address2 = 7ffc441851a8
If you want to guess why the outputs are different from one run to the next one, think of ASLR.
BTW, a very important notion when coding in C is that of undefined behavior (see also this and that answers and the references I gave there). You don't have any in your question (it is just unspecified behavior), but as my contrived answer shows, you should not expect a particular behavior in that precise case.
PS. I believe (but I am not entirely sure) that a standard conforming C implementation could output Address1= hello world and likewise for Address2. After all, the behavior of printf with %p is implementation defined. And surely you could get 0xdeadbeef for both. More seriously, an address does not always have the same bit width as a size_t or an int, which is why the standard defines intptr_t in <stdint.h>

C optimization: Why does the compiler treat an object not as constant?

Compiling the following C module
static const int i = 1;
void f (const int *i);
int g (void)
{
f (&i);
return i;
}
using gcc -S -O3 on an x86_64 machine yields the following assembly for the function g:
g:
leaq i(%rip), %rdi
subq $8, %rsp
call f@PLT
movl $1, %eax # inlined constant as an immediate
addq $8, %rsp
ret
In other words, the return statement is compiled to moving the constant $1 into the return register %eax, which makes sense because i is declared constant.
However, if I remove that const so that I have
static int i = 1;
void f (const int *i);
int g (void)
{
f (&i);
return i;
}
the output of gcc -S -O3 suddenly becomes:
g:
leaq i(%rip), %rdi
subq $8, %rsp
call f@PLT
movl i(%rip), %eax # reload i
addq $8, %rsp
ret
That is, the return value is explicitly loaded from memory after the call to f.
Why is this so? The argument to f is declared to be a pointer to a constant int, so f should not be allowed to alter i. Furthermore, f cannot call a function that modifies i through a non-const reference because the only such function could be g as i is declared static.
It is not undefined behavior to cast a pointer to const to a pointer to non-const and modify the referenced object, as long as the referenced object is not declared const.
6.7.3p6 says: "If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined."
When the body of f is known to the compiler, the situation changes. No special actions are required:
static int i = 1;
__attribute__((noinline)) void f (int *i)
{
*i *=2;
}
int g (void)
{
f (&i);
return i;
}
f:
sal DWORD PTR [rdi]
ret
g:
mov edi, OFFSET FLAT:i
call f
mov eax, DWORD PTR i[rip]
ret
i:

In which data segment is the C string stored?

I'm wondering what's the difference between char s[] = "hello" and char *s = "hello".
After reading this and this, I'm still not very clear on this question.
As far as I know, there are five segments in memory: Text, BSS, Data, Stack and Heap.
From my understanding,
in case of char s[] = "hello":
"hello" is in Text.
s is in Data if it is a global variable or in Stack if it is a local variable.
We also have a copy of "hello" where the s is stored, so we can modify the value of this string via s.
in case of char *s = "hello":
"hello" is in Text.
s is in Data if it is a global variable or in Stack if it is a local variable.
s just points to "hello" in Text and we don't have a copy of it, therefore modifying the value of string via this pointer should cause "Segmentation Fault".
Am I right?
You are right that in the first case "hello" is a mutable string and in the second case an immutable one. In both cases the initial contents are kept in read-only memory before initialization.
In the first case the mutable memory is initialized by copying from the immutable string. In the second case the pointer refers to the immutable string itself.
For the first case, Wikipedia says:
The values for these variables are initially stored within the
read-only memory (typically within .text) and are copied into the
.data segment during the start-up routine of the program.
Let us examine segment.c file.
char*s = "hello"; // string
char sar[] = "hello"; // string array
char content[32];
int main(int argc, char*argv[]) {
char psar[] = "parhello"; // local/private string array
char*ps = "phello"; // private string
content[0] = 1;
sar[3] = 1; // OK
// sar++; // not allowed
// s[2] = 1; // segmentation fault
s = sar;
s[2] = 1; // OK
psar[3] = 1; // OK
// ps[2] = 1; // segmentation fault
ps = psar;
ps[2] = 1; // OK
return 0;
}
Here is the assembly generated for the segment.c file. Note that both s and sar are global and live in the .data segment. sar behaves somewhat like a const pointer to mutable, initialized memory, but it is not a pointer at all: it is an array. One consequence is that sizeof(sar), which is 6, differs from sizeof(s), which is 8. The literals "hello" and "phello" are in the read-only (.rodata) section and are effectively immutable.
.file "segment.c"
.globl s
.section .rodata
.LC0:
.string "hello"
.data
.align 8
.type s, @object
.size s, 8
s:
.quad .LC0
.globl sar
.type sar, @object
.size sar, 6
sar:
.string "hello"
.comm content,32,32
.section .rodata
.LC1:
.string "phello"
.text
.globl main
.type main, @function
main:
.LFB0:
.cfi_startproc
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset 6, -16
movq %rsp, %rbp
.cfi_def_cfa_register 6
subq $64, %rsp
movl %edi, -52(%rbp)
movq %rsi, -64(%rbp)
movq %fs:40, %rax
movq %rax, -8(%rbp)
xorl %eax, %eax
movl $1752326512, -32(%rbp)
movl $1869376613, -28(%rbp)
movb $0, -24(%rbp)
movq $.LC1, -40(%rbp)
movb $1, content(%rip)
movb $1, sar+3(%rip)
movq $sar, s(%rip)
movq s(%rip), %rax
addq $2, %rax
movb $1, (%rax)
movb $1, -29(%rbp)
leaq -32(%rbp), %rax
movq %rax, -40(%rbp)
movq -40(%rbp), %rax
addq $2, %rax
movb $1, (%rax)
movl $0, %eax
movq -8(%rbp), %rdx
xorq %fs:40, %rdx
je .L2
call __stack_chk_fail
.L2:
leave
.cfi_def_cfa 7, 8
ret
.cfi_endproc
.LFE0:
.size main, .-main
.ident "GCC: (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3"
.section .note.GNU-stack,"",@progbits
Again, for the local variables in main the compiler does not bother to create a name; it may keep them in registers or in stack memory.
Note that the value of the local variable, "parhello", is turned into the immediate numbers 1752326512 and 1869376613. I discovered this by changing the value "parhello" to "parhellp". The diff of the assembly output is as follows,
39c39
< movl $1886153829, -28(%rbp)
---
> movl $1869376613, -28(%rbp)
So there is no separate immutable store for psar. Its initializer is turned into integer immediates in the code segment.
Answer to your first question:
char s[] = "hello";
s is an array of char. An array is not a pointer, but it behaves somewhat like a const pointer: you cannot change s itself with pointer arithmetic (e.g. s++ does not compile). The data is not const, though, so you can change it.
See this example C code:
#include <stdio.h>
void reverse(char *p){
char c;
char* q = p;
while (*q) q++;
q--; // point to the end
while (p < q) {
c = *p;
*p++ = *q;
*q-- = c;
}
}
int main(){
char s[] = "DCBA";
reverse( s);
printf("%s\n", s); // ABCD
}
which reverses the text "DCBA" and produces "ABCD".
char *p = "hello"
p is a pointer to char. You can do pointer arithmetic (p++ will compile), but the string literal it points to is usually placed in a read-only part of memory (const data).
Writing through it, for example p[0]='a';, is undefined behavior and typically results in a run-time error:
#include <stdio.h>
int main(){
char* s = "DCBA";
s[0]='D'; // compiles, but undefined behavior: typically a crash at run time
printf("%s\n", s); // usually not reached
}
This compiles, but typically crashes when run.
const char* const s = "DCBA";
With a const char* const, you can change neither s nor the data it points to (i.e. "DCBA"), so both the data and the pointer are const:
#include <stdio.h>
int main(){
const char* const s = "DCBA";
s[0]='D'; // compile error: assignment to a read-only location
printf("%s\n", s);
}
The Text segment is normally the segment where your code is stored and is const, i.e. unchangeable. In embedded systems, this is the ROM, PROM, or flash memory; in a desktop computer, it can be in RAM.
The Stack is RAM used for local variables in functions.
The Heap is RAM used for dynamically allocated memory (malloc and friends). Global and static variables with a nonzero initializer live in the Data segment.
BSS contains all global and static variables that are initialized to zero or not explicitly initialized.
For more information, see the relevant Wikipedia and this relevant Stack Overflow question
With regards to s itself: The compiler decides where to put it (in stack space or CPU registers).
For more information about memory protection and access violations or segmentation faults, see the relevant Wikipedia page
This is a very broad topic, and ultimately the exact answers depend on your hardware and compiler.
