malloc pointer address in main and in other function difference [duplicate] - c

This question already has answers here:
Printing pointer addresses in C [two questions]
(5 answers)
Closed 5 years ago.
I have the following question. Why is there a difference between the addresses of the two pointers in the following example? This is the full code:
#include <stdio.h>
#include <stdlib.h>
void *mymalloc(size_t bytes){
void * ptr = malloc(bytes);
printf("Address1 = %zx\n",(size_t)&ptr);
return ptr;
}
void main (void)
{
unsigned char *bitv = mymalloc(5);
printf("Address2 = %zx\n",(size_t)&bitv);
}
Result:
Address1 = 7ffe150307f0
Address2 = 7ffe15030810

It's because you are printing the address of the pointer variable, not the value stored in it. Remove the ampersand (&) from bitv and ptr in your printf calls:
printf("Address1 = %zx\n",(size_t)ptr);
and
printf("Address2 = %zx\n",(size_t)bitv);
Also, use %p for pointers (and then cast to void * instead of size_t).
WHY?
In this line of code:
unsigned char *bitv = mymalloc(5);
bitv is a pointer and its value is the address of the newly allocated block of memory. But that value also needs to be stored somewhere, and &bitv is the address of the place where it is stored. If you have two variables storing the same pointer value, each still has its own address, which is why &ptr and &bitv have different values.
But, as you expected, ptr and bitv will have the same value when you change your code.

Why is there a difference in the addresses of the two pointers
Because the two pointers are two different pointer(-variable)s, each having its own address.
The value those two pointer(-variable)s carry in fact are the same.
To prove this print their value (and not their address) by changing:
printf("Address1 = %zx\n",(size_t)&ptr);
to be
printf("Address1 = %p\n", (void*) ptr);
and
printf("Address2 = %zx\n",(size_t)&bitv);
to be
printf("Address2 = %p\n", (void*) bitv);

In your code you used the following to print the pointer's address:
printf("%zx", (size_t)&p);
This doesn't print the address of the variable p points to; it prints the address of the pointer p itself.
You can print an address using the '%p' format:
printf("%p", (void *)&n); // PRINTS ADDRESS OF 'n'
Here's an example that shows the different cases:
int n;
int *v;
n = 54;
v = &n;
printf("%p", (void *)v); // PRINTS ADDRESS OF 'n'
printf("%p", (void *)&v); // PRINTS ADDRESS OF pointer 'v'
printf("%p", (void *)&n); // PRINTS ADDRESS OF 'n'
printf("%d", *v); // PRINTS VALUE OF 'n'
printf("%d", n); // PRINTS VALUE OF 'n'
So your code should be written like this:
#include <stdio.h>
#include <stdlib.h>
void * get_mem(size_t size)
{
void * buff = malloc(size); // allocate memory
// buff holds the result of malloc(size)
if (!buff) return NULL; // malloc failed: nothing to print
// otherwise print the value of the pointer, i.e. the address of the block
printf("ADDRESS->%p\n", buff);
return buff;
}
int main(void)
{
void * buff = get_mem(54);
printf("ADDRESS->%p\n", buff);
free(buff);
return 0;
}

(In addition to other answers, which you would read first and probably should help you more ...)
Read a good C programming book. Pointers and addresses are very difficult to explain, and I'm not even trying to. So the address of a pointer &ptr is generally not the same as the value of a pointer (however, you could code ptr= &ptr; but you often don't want to do that)... Look also at the picture explaining virtual address space.
Then read more documentation about malloc: the malloc(3) Linux man page, this reference documentation, etc. Here is a fast, standard-conforming, but disappointing implementation of malloc.
Also read the documentation about printf: the printf(3) man page, the printf reference, etc. It should mention %p for printing pointers.
Notice that you don't print a pointer (see Alk's answer), you don't even print its address (of an automatic variable on the call stack), you print some cast to size_t (which might not have the same bit width as a pointer, even if on my Linux/x86-64 it does).
Read also more about C dynamic memory allocation and about pointer aliasing.
Finally, read the C11 standard specification n1570.
(I don't see why you would expect the two outputs to be the same, although it could happen if the compiler optimizes the call to mymalloc by inlining it.)
So I did not expect the outputs to be the same in general. However, with gcc -O2 antonis.c -o antonis I got (with a tiny modification of your code)....
a surprise
However, if you declare the first void *mymalloc(size_t bytes) as a static void*mymalloc(size_t bytes) and compile with GCC 7 on Linux/Debian/x86-64 with optimizations enabled, you do get the same output; because the compiler inlined the call and used the same location for bitv and ptr; here is the generated assembler code with gcc -S -O2 -fverbose-asm antonis.c:
.section .rodata.str1.1,"aMS",@progbits,1
.LC0:
.string "Address1 = %zx\n"
.LC1:
.string "Address2 = %zx\n"
.section .text.startup,"ax",@progbits
.p2align 4,,15
.globl main
.type main, @function
main:
.LFB22:
.cfi_startproc
pushq %rbx #
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
# antonis.c:5: void * ptr = malloc(bytes);
movl $5, %edi #,
# antonis.c:11: {
subq $16, %rsp #,
.cfi_def_cfa_offset 32
# antonis.c:6: printf("Address1 = %zx\n",(size_t)&ptr);
leaq 8(%rsp), %rbx #, tmp92
# antonis.c:5: void * ptr = malloc(bytes);
call malloc@PLT #
# antonis.c:6: printf("Address1 = %zx\n",(size_t)&ptr);
leaq .LC0(%rip), %rdi #,
# antonis.c:5: void * ptr = malloc(bytes);
movq %rax, 8(%rsp) # tmp91, ptr
# antonis.c:6: printf("Address1 = %zx\n",(size_t)&ptr);
movq %rbx, %rsi # tmp92,
xorl %eax, %eax #
call printf@PLT #
# antonis.c:13: printf("Address2 = %zx\n",(size_t)&bitv);
leaq .LC1(%rip), %rdi #,
movq %rbx, %rsi # tmp92,
xorl %eax, %eax #
call printf@PLT #
# antonis.c:14: }
addq $16, %rsp #,
.cfi_def_cfa_offset 16
popq %rbx #
.cfi_def_cfa_offset 8
ret
.cfi_endproc
.LFE22:
.size main, .-main
BTW, if I compile your unmodified source (without static) with gcc -fwhole-program -O2 -S -fverbose-asm I'm getting the same assembler as above.
If you don't add static and don't compile with -fwhole-program, Address1 and Address2 stay different.
two run outputs
I ran that antonis executable and got, the first time:
/tmp$ ./antonis
Address1 = 7ffe2b07c148
Address2 = 7ffe2b07c148
and the second time:
/tmp$ ./antonis
Address1 = 7ffc441851a8
Address2 = 7ffc441851a8
If you want to guess why the outputs are different from one run to the next one, think of ASLR.
BTW, a very important notion when coding in C is that of undefined behavior (see also this and that answers and the references I gave there). You don't have any in your question (it is just unspecified behavior), but as my contrived answer shows, you should not expect a particular behavior in that precise case.
PS. I believe (but I am not entirely sure) that a standard-conforming C implementation could output Address1 = hello world, and likewise for Address2. After all, the output of printf with %p is implementation-defined. And surely you could get 0xdeadbeef for both. More seriously, an address does not always have the same bit width as a size_t or an int, which is why the standard defines intptr_t and uintptr_t in <stdint.h>.

Related

How does C know the size of an Integer array? [duplicate]

How does c find at run time the size of array? where is the information about array size or bounds of array stored ?
sizeof(array) is implemented entirely by the C compiler. By the time the program gets linked, what looks like a sizeof() call to you has been converted into a constant.
Example: when you compile this C code:
#include <stdlib.h>
#include <stdio.h>
int main(int argc, char** argv) {
int a[33];
printf("%d\n", (int)sizeof(a));
}
you get
.file "sz.c"
.section .rodata
.LC0:
.string "%d\n"
.text
.globl main
.type main, @function
main:
leal 4(%esp), %ecx
andl $-16, %esp
pushl -4(%ecx)
pushl %ebp
movl %esp, %ebp
pushl %ecx
subl $164, %esp
movl $132, 4(%esp)
movl $.LC0, (%esp)
call printf
addl $164, %esp
popl %ecx
popl %ebp
leal -4(%ecx), %esp
ret
.size main, .-main
.ident "GCC: (GNU) 4.1.2 (Gentoo 4.1.2 p1.1)"
.section .note.GNU-stack,"",@progbits
The $132 in the middle is the size of the array, 132 = 4 * 33. Notice that there's no call sizeof instruction - unlike printf, which is a real function.
sizeof is pure compile time in C++ and C prior to C99. Starting with C99 there are variable length arrays:
// returns n + 3
int f(int n) {
char v[n + 3];
// not purely a compile time construct anymore
return sizeof v;
}
That will evaluate the sizeof operand, because n is not yet known at compile time. That only applies to variable length arrays: Other operands or types still make sizeof compute at compile time. In particular, arrays with dimensions known at compile time are still handled like in C++ and C89. As a consequence, the value returned by sizeof is not a compile time constant (constant expression) anymore. You can't use it where such a value is required - for example when initializing static variables, unless a compiler specific extension allows it (the C Standard allows an implementation to have extensions to what it treats as constant).
sizeof() will only give the array size for a fixed-size array (which can be static, stack based or in a struct).
If you apply it to a pointer to memory obtained from malloc (or new in C++), you will always get the size of the pointer, not the size of the allocation.
And yes, this is based on compile time information.
sizeof gives the size of the variable, not the size of the object that you're pointing to (if there is one.) sizeof(arrayVar) will return the array size in bytes if and only if arrayVar is declared in scope as an array and not a pointer.
For example:
char myArray[10];
char* myPtr = myArray;
printf("%zu\n", sizeof(myArray)); // prints 10
printf("%zu\n", sizeof(myPtr)); // prints 4 (on a 32-bit machine)
sizeof(Array) is looked up at compile time, not at run time. The information is not stored.
Are you perhaps interested in implementing bounds checking? If so, there are a number of different ways to go about that.

Segfault calling c function from assembly

I am attempting to set up some pointers in an assembly program(AT&T syntax running on x86_64 linux), then pass them to a C program to essentially add their values. Of course, this isn't the most effective way of accomplishing the end result, but I'm trying to understand how to make something like this work in order to further build off of it. The C program looks as follows:
#include <stdio.h>
extern void iplus(long *a, long *b, long *c){
printf("Starting\n");
long r= *a + *b;
printf("R setup\n");
*c=r;
printf("Done\n");
}
This takes three long pointers, adds the value of the first two, then stores that value in the third. As shown above, it prints a message regarding its status at each point, in order to track where the segmentation fault occurs.
The assembly program referencing this function is as follows:
.extern exit
.extern malloc
.data
vars: .zero 24 /*stores pointer addresses*/
.text
FORI: .ascii "%d\0" /*format for printing integer*/
.global main
main:
and $~0xf, %rsp /*16-byte align the stack*/
movq $8,%rdi
call malloc
movq %rax,(vars+0) /*allocate 8 bytes and put its address into the variable*/
movq $8,%rdi
call malloc
movq %rax,(vars+8)
movq $8,%rdi
call malloc
movq %rax,(vars+16)
movq $3,((vars+0)) /*first addend 3*/
movq $7,((vars+8)) /*second addend 7*/
movq $0,((vars+16))
movq (vars),%rdi
movq (vars+8),%rsi
movq (vars+16),%rdx
call iplus /*call the function with these values*/
movq $FORI,%rdi
movq ((vars+16)),%rsi
call printf /*print the sum, "10" expected*/
call exit
Upon making then executing the above program, I get this output:
Starting
Segmentation fault (core dumped)
Meaning the function seems to be successfully called, but something about the first operation within that function, long r = *a + *b;, or something earlier that only becomes a problem at that point, is causing a segfault. What I expect to happen is that, for the three 8-byte values held by the 24-byte vars, the address returned by malloc (which allocates 8 bytes each time), is stored. This address then points to an 8-byte integer, which are set to 3, 7, and 0. The addresses of these integers(i.e. the values held in vars), are passed to iplus in order to sum them, then the sum is printed using printf. For a reason I cannot identify, this instead causes a segfault.
Why is the segfault occurring? Is it possible to perform this addition using the C function call with the structure of basically a double pointer still being used?
You can't store through a pointer that resides in memory directly; you must first load it into a register. The double parentheses you wrote are simply ignored, so this:
movq $3,((vars+0)) /*first addend 3*/
movq $7,((vars+8)) /*second addend 7*/
movq $0,((vars+16))
is the same as this:
movq $3,(vars+0) /*first addend 3*/
movq $7,(vars+8) /*second addend 7*/
movq $0,(vars+16)
Instead you need to do (for each value):
movq (vars+0), %rax
movq $3, (%rax)

What's different between pointer with array in c? [duplicate]

This question already has answers here:
What is the difference between char s[] and char *s?
(14 answers)
Closed 5 years ago.
I tried to google this topic, but couldn't find a clear explanation. I tried the code below:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(int argc, char * argv[]){
char * p1 = "dddddd";
const char * p2 = "dddddd";
char p3[] = "dddddd";
char * p4 =(char*)malloc(sizeof("dddddd")+1);
strcpy(p4, "dddddd");
//*(p1+2) = 'b'; // test_1
//Output >> Bus error: 10
// *(p2+2) = 'b'; // test_2
// Output >> char_point.c:11:13: error: read-only variable is not assignable
*(p3+2) = 'b'; // test_3
// Output >>
//d
//dddddd
//dddddd
//ddbddd
*(p4+2) = 'k'; // test_4
// Output >>
//d
//dddddd
//dddddd
//ddbddd
//ddkddd
printf("%c\n", *(p1+2));
printf("%s\n", p1);
printf("%s\n", p2);
printf("%s\n", p3);
printf("%s\n", p4);
return 0;
}
I tried these tests, but only test_3 and test_4 pass. I know p2 is read-only because it is declared as const char *, but I don't know why p1 can't be modified. Which section of memory is it laid out in? BTW, I compiled it on my Mac with GCC.
I also compiled it to assembly with gcc -S and got this:
.section __TEXT,__text,regular,pure_instructions
.macosx_version_min 10, 13
.globl _main
.p2align 4, 0x90
_main: ## @main
.cfi_startproc
## BB#0:
pushq %rbp
Lcfi0:
.cfi_def_cfa_offset 16
Lcfi1:
.cfi_offset %rbp, -16
movq %rsp, %rbp
Lcfi2:
.cfi_def_cfa_register %rbp
subq $48, %rsp
movl $8, %eax
movl %eax, %ecx
leaq L_.str(%rip), %rdx
movl $0, -4(%rbp)
movl %edi, -8(%rbp)
movq %rsi, -16(%rbp)
movq %rdx, -24(%rbp)
movq %rdx, -32(%rbp)
movl L_main.p3(%rip), %eax
movl %eax, -39(%rbp)
movw L_main.p3+4(%rip), %r8w
movw %r8w, -35(%rbp)
movb L_main.p3+6(%rip), %r9b
movb %r9b, -33(%rbp)
movq %rcx, %rdi
callq _malloc
xorl %r10d, %r10d
movq %rax, -48(%rbp)
movl %r10d, %eax
addq $48, %rsp
popq %rbp
retq
.cfi_endproc
.section __TEXT,__cstring,cstring_literals
L_.str: ## @.str
.asciz "dddddd"
L_main.p3: ## @main.p3
.asciz "dddddd"
.subsections_via_symbols
I want to know, for every pointer I declared, which section it is in.
"Why p1 can't be modified?"
Roughly speaking, p1 points to a string literal, and attempts to modify string literals cause undefined behavior in C.
More specifically, according to §6.4.5 ¶6 of the C11 Standard, string literals are:
used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char....
Concerning objects with static storage duration, §5.1.2 ¶1 states that
All objects with static storage duration shall be initialized (set to their initial values) before program startup. The manner and timing of such initialization are otherwise unspecified.
"Which section of memory it's layout?"
The Standard does not specify any particular memory layout that an implementation must follow.
What the Standard does say about the arrays of char which are created from string literals is that (§6.4.5 ¶7):
It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.
So
char * p1 = "dddddd";
this should be
const char * p1 = "dddddd";
String literals (the ones in quotes) typically reside in read-only memory. Even if you
don't use the const keyword in the declaration of the variable, p1 still
points to memory that you must not modify. So
*(p1+2) = 'b'; // test_1
is undefined behavior and, as here, usually crashes.
Here
*(p2+2) = 'b'; // test_2
// Output >> char_point.c:11:13: error: read-only variable is not assignable
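the compiler tells you that you cannot do that, because you declared p2 as a
pointer to const. The difference from the first test is that here the error is
caught at compile time, whereas in the first test the code compiles and only
fails at run time, when it tries to modify a character.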
Now this:
char * p4 =(char*)malloc(sizeof("dddddd")+1);
First, do not cast malloc & friends. Second: the sizeof operator returns the
number of bytes needed to store its operand in memory. "dddddd" is a string
literal, which has array type char[7], so sizeof("dddddd") is 7: six
characters plus the terminating '\0'. sizeof("dddddd")+1 therefore allocates
8 bytes, one more than needed (the movl $8 in your assembly shows this).
That works for a literal, but for a string you only hold a pointer to, sizeof
would give the size of the pointer. The usual idiom is strlen:
char * p4 = malloc(strlen("dddddd")+1);
Note that in this case
char txt[] = "Hello world";
printf("%zu\n", sizeof(txt));
will print 12 and not 11. C strings are '\0'-terminated, that means that txt
holds all these characters plus the '\0'-terminating byte. In this case
sizeof doesn't return the number of bytes for a pointer, because txt is an
array.
void foo(char *txt)
{
printf("%zu\n", sizeof(txt));
}
void bar(void)
{
char txt[] = "Hello world";
foo(txt);
}
Here you won't get 12 like before, most probably 8 (today's common size for a
pointer). Even though txt in bar is an array, the txt in foo is a
pointer.
An array is not a pointer, but in most expressions its name converts ("decays") to a pointer to its first element. The array name itself is not a modifiable lvalue, so you can't make it refer to different memory, although you can change its elements.
A pointer variable, by contrast, can be reassigned. Whether you may modify what it points to depends on the object it points to: elements of an ordinary array are fine, a string literal is not.
For example, consider this code:
int main(void){
int a[] = {1,2,3};
int * ptr = a; // ptr points to a[0]
//a[0] == *(a+0)
//a[1] == *(a+1)
// a += 1; // error: an array name is not assignable
ptr += 1; // fine: ptr now points to the next element, which is 2
a[0] += 2; // fine: a[0] becomes 3
*ptr += 2; // fine: a[1] becomes 4, because ptr points into a modifiable array
return 0;
}

Why does GCC seg fault where clang does not on a textbook exercise?

I am learning C using "System Programming with C and Unix" by Adam Hoover. I have come across a question from Chapter 4 that puzzles me greatly. The question is as follows:
In the following code, the first printf() reached
produces the output "14," but the second printf()
can cause a bus error or a segmentation fault. Why?
The original code from the book:
main()
{
int *p;
funct(p);
printf("%d\n",*p);
}
funct(int *p2)
{
p2=(int *)malloc(4);
*p2=14;
printf("%d\n",*p2);
}
My slightly modified "debugging" (printf all the things) version:
#include <stdio.h>
#include <stdlib.h>
void funct(int *p2);
int main(){
int *p;
printf("main p - address: %p\n", p);
funct(p);
printf("main p - address: %p\n", p);
printf("main p value: %d\n", *p);
}
void funct(int *p2){
printf("funct (pre malloc) p2 - address: %p\n", p2);
p2 = (int *)malloc(4);
printf("funct (post malloc) p2 - address: %p\n", p2);
*p2 = 14;
printf("funct p2 value: %d\n", *p2);
}
I have compiled this sample using both gcc and clang (on ubuntu linux) and clang does not produce a seg fault for code that is supposed to do just that. I have puzzled over this for awhile now and can not imagine the why or how of this. Any insight welcome.
Thanks.
int *p;
funct(p);
printf("%d\n",*p);
This is wrong. p is passed by value, so whatever modification is made to p2 inside the function doesn't affect p in main. And dereferencing an uninitialized pointer is undefined behaviour.
What you actually need to do is -
funct(&p) ; // in main
void funct( int **p ){
*p = malloc(sizeof(int));
// ...
}
This is undefined behaviour and doesn't have to result in a crash (or any other specific behaviour). A compiler is free to produce any code it likes for such cases. Since you asked why the code produced by clang doesn't crash, we'll need to dig into that code. Here's what clang trunk produces when compiling with -O3 on x86_64:
main: # @main
pushq %rbp
movq %rsp, %rbp # Build stack frame
movl $.L.str, %edi
movl $14, %esi
xorb %al, %al # no XMM registers used by varargs call
callq printf # printf(%edi = "%d\n", %esi = 14)
movl $.L.str, %edi
xorb %al, %al # no XMM registers used by varargs call
callq printf # printf(%edi = "%d\n", %esi = ?)
xorl %eax, %eax
popq %rbp
ret # return %eax = 0
Since p is uninitialized, clang has chosen, today, to compile the expression *p to nothing at all. This is a legitimate transformation, because clang can prove that the expression has undefined behaviour. The value being printed is then whatever ends up in the %esi register at the time of the printf call (on my machine, that happens to be -1). This may not be what you expected, but that is the nature of undefined behaviour!

Resources