Printing a malloc'd pointer always gives the same address - c

Am I printing it wrong?
#include <stdio.h>
#include <stdlib.h>
int
main( void )
{
int * p = malloc(100000);
int * q;
printf("%p\n%p\n", (void *)p, (void *)q);
(void)getchar(); /* to run several instances at same time */
free(p);
return 0;
}
Whether I run it sequentially or in multiple terminals simultaneously, it always prints "0x60aa00000800" for p (q is different, though).
EDIT: Thanks for the answers, one of the reasons I was confused was because it used to print a different address each time. It turns out that a new compiler option I started using, -fsanitize=address, caused this change. Whoops.

The value of q is uninitialized garbage, since you never assign a value to it.
It's not surprising that you get the same address for p each time you run the program. That address is almost certainly a virtual address, so it applies only to the memory space of the currently running program (process).
Virtual address 0x60aa00000800 as seen from one program and virtual address 0x60aa00000800 as seen from another program are distinct physical addresses. The operating system maps virtual addresses to physical addresses, and vice versa, so there's no conflict. (If different programs could read and write the same physical memory, it would be a security nightmare.)
It also wouldn't be surprising if they were different each time. For example, some operating systems randomize stack addresses to prevent some code exploits. I'm not sure whether heap addresses are also randomized, but they certainly could be.
https://en.wikipedia.org/wiki/Virtual_memory

This behavior is not entirely surprising. The malloc operation is simply returning a pointer to user addressable + allocated memory in the process. It's completely reasonable for the first memory request of the same size to return the same address through different invocations of a process
The behavior for q doesn't contradict this. You have given q no value hence it gets whatever the last value written to that portion of the stack was. It's unsurprising that undefined behavior would be different through different invocations of the same process (after all, it's undefined)

Your code is fine.
The same code, and the same algorithm for obtaining memory with malloc() is run each time, so there's no reason the addresses should be different.
Some malloc implementations could randomize the start of memory allocations, yours does not.
This is because of virtual memory. The physical memory address for memory of q is different, but your operating system provides each process with a virtual view of memory, mapping different physical memory addresses to the same virtual addresses in your processes. So all processes have a similar view of the memory (and cannot see the memory of other processes)

Heap allocators are not required to provide distinct / unique addresses each time you run the program. There is no guarantee either way on this, but it's entirely reasonable for an implementation of malloc() to have deterministic behavior and give you the same pointer each time you run the program.
The stack, on the other hand, usually is (but is not required to be) located at a different address. This is a protection measure against buffer-overflow exploits. By making the stack location non-deterministic, they make it more difficult for an attacker to inject direct memory addresses of code via buffer-overflow attack.
Finally note that all pointers in a program are virtual memory addresses, not physical addresses. So even though two concurrent processes might have the same memory address in a pointer, those two processes still have distinct memory in separate areas of physical memory. The operating system takes care of this via its virtual memory manager and page translation. Each process has its own virtual address space, various pieces of which are mapped transparently by the OS to physical memory as needed.

Related

C accessing another programs memory?

I wrote this code so I can see the address of variable foo.
#include <stdio.h>
#include <stdlib.h>
int main(){
char* foo=(char*) malloc(1);
*foo='s';
printf(" foo addr : %p\n\n" ,&foo);
int pause;
scanf("%d",&pause);
return 0;
}
Then pause it and use the address of foo in here:
#include <stdio.h>
int main(){
char * ptr=(char *)0x7ffebbd57fc8; //this was the output from the first code
printf("\n\n\n\n%c\n\n\n",*ptr);
}
but I keep getting segmentation fault. Why is this code not working?
This is not a C question/problem but a matter of runtime support. On most OS programs run in a virtual environment, especially concerning their memory space. In such case memory is a virtual memory which means that when a program access a given address x the real (physical) memory is computed as f(x). f is a function implemented by the OS to ensure that a given process (object which represent the running of a code in the OS) have its own reserved memory separated from memory dedicated to other processes. This is called virtual memory.
Oups, your problem is not related to C language, but really depends of the OS, if any.
First let us read it from a pure C language point of view:
char * ptr=(char *)0x7ffebbd57fc8;
your are converting an unsigned integer to a char *. As you get the integer value from the other program, you can be sure that is has an acceptable range, so you indeed get a pointer pointing to that address. As it is a char * pointer, you can use it to read the byte representation of any object that will lie at that address. Still fine until there. But common systems use virtual addresses and limit each process to access only its own pages, so by default a process cannot access the memory of another process. In addition, with the common usage of virtual memory, there are no reasons that any two non kernel processes share common addresses. Exceptions for real addresses are:
real memory OS (MS/DOS and derivatives like FreeDOS, CP/M, and other anthic systems)
kernel mode: the kernel can access the whole memory of the system - who could load your program?
special functions: some OS provide special API to let one process read the memory of another one (Windows does), but it not as simple as directly reading an address...
As I assume that you are not in any of the first two cases, nothing is mapped at that address from the current process, hence the error.
On systems use virtual memory, you have a range of logical addresses that are available to user processes. These address ranges are subdivided into units called PAGES whose size depends upon the processor (512b to 1MB).
Pages are not valid until they are mapped into the process. The operating system will have system calls that allow the application to map pages. If you try to access a page that is not valid you get some kind of exception.
While the operating system only allocates memory pages, applications are used to calls, such as malloc(), that allocate memory blocks of arbitrary sizes.
Behind the scenes, malloc() is mapping pages (ie making them valid) to create a pool of memory that is uses to return small amounts of memory.
When you omit your malloc(), the memory is not being mapped.
Note that each process has its own range of logical addresses. One process's page containing 0x7ffebbd57fc8 is likely to be mapped to a different physical page frame than another process's 0x7ffebbd57fc8. If that were not the case, one user could muck with another. (There is always a range of addresses shared by all processes but this is can only be accessed in kernel mode.)
Your problem is further complicated by the fact that many systems these days randomly map processes to different locations in the logical address space. On such systems you could run your first program multiple times and get different addresses.
Why is this code not working?
You would need to call the system service on your operating system that maps memory into the process and make the page containing 0x7ffebbd57fc8 accessible.

what is the benefit of storing virtual address in a pointer rather than physical address?

I have gone through below link and it says that on most Operating Systems , pointers store the virtual address rather than physical address but I am unable to get the benefit of storing virtual address in a pointer .
when at the end we can modify the contents of a particular memory location directly through a pointer so what is the issue whether it is a virtual address or a physical address ?
Also during the time the code executes , most of the time the data segment will also remain in memory ,so we are dealing with physical memory location only so how the virtual address is useful ?
C pointers and the physical address
Security issues (as noted before) aside, there is another big advantage:
Globals and functions (and your stack) can always be found at fixed addresses (so the assembler can hardcode them), independently of where the instance of your program is loaded.
If you'd really like your code to run from any address, you have to make it position independent (with gcc you'd use the -fPIC argument). This question might be an interresting read wrt the -fPIC and virtual addressing: GCC -fPIC option
Actually, pointers normally hold LOGICAL ADDRESSES.
There are several reasons for using logical addressing, including:
Ease of memory management. The OS never has to allocate contiguous physical page frames to a process.
Security. Each process has access to its own logical address space and cannot mess with another processes address space.
Virtual Memory. Logical address translation is a prerequisite for implementing virtual memory.
Page protection. Aids security by limiting access to system pages to higher processor modes. Helps error (and virus) trapping by limiting the types of access to pages (e.g., not allowing writes to or executes of data pages).
This is not a complete list.
The same virtual address can point to different physical addresses in different moments.
If your physical memory is full, your data is swapped out from the memory to your HDD. When your program wants to access this data again - it is currently not stored in memory - the data is swapped in back to memory, but it often will be a different location as before.
The Page Table, which stores the assignment of virtual to physical addresses, is updated with the new physical address. So your virtual address remains the same, while the physical address may change.

How can we specify physical address for variable?

Any suggestions/discussions are welcome!
The question is actually brief as title, but I'll explain why I need physical address.
Background:
These days I'm fascinated by cache and multi-core architecture, and now I'm quite curious how cache influence our programs, under the parallel environment.
In some CPU models (for example, my Intel Core Duo T5800), the L2 cache is shared among cores. So, if program A is accessing memory at physical address like
0x00000000, 0x20000000, 0x40000000...
and program B accessing data at
0x10000000, 0x30000000, 0x50000000...
Since these addresses share the same suffix, the related set in L2 cache will be flushed frequently. And we're expected to see two programs fighting with each other, reading data slowly from memory instead of cache, although, they are separated in different cores.
Then I want to verify the result in practice. In this experiment, I have to know the physical address instead of virtual address. But how can I cope with this?
The first attempt:
Eat a large space from heap, mask, and get the certain address.
My CPU has a L2 cache with size=2048KB and associativity=8, so physical addressess like 0x12340000, 0x12380000, 0x123c0000 will be related to the first set in L2 cache.
int HEAP[200000000]={0};
int *v[2];
int main(int argc, char **argv) {
v[0] = (int*)(((unsigned)(HEAP)+0x3fffc) & 0xfffc0000);
v[1] = (int*) ((unsigned)(v[0]) + 0x40000);
// one program pollute v[0], another polluting v[1]
}
Sadly, with the "help" of virtual memory, variable HEAP is not always continuous inside physical memory. v[0] and v[1] might be related to different cache sets.
The second attempt
access /proc/self/mem, and try to get memory information.
Hmm... seems that the results are still about virtual memory.
Your understanding of memory and these addresses is incomplete/incorrect. Essentially, what you're trying to test is futile.
In the context of user-mode processes, pretty much every single address you see is a virtual address. That is, an address that makes sense only in the context of that process. The OS manages the mapping of where this virtual memory space (unique to a process) maps to memory pages. These memory pages at any given time may map to pages that are paged-in (i.e. reside in physical RAM) - or they may be paged-out, and exist only in the swap file on disk.
So to address the Background example, those addresses are from two different processes - it means absolutely nothing to try and compare them. Whether or not their code is present in any of the caches depends on a number of things, including the cache-replacement strategy of the processor, the caching policies enabled by the OS, the number of other processes (including kernel-mode threads), etc.
In your first attempt, again you aren't going to get anywhere as far as actually testing CPU cache directly. First of all, your large buffer is not going to be on the heap. It is going to be part of the data section (specifically the .bss) of the executable. The heap is used for the malloc() family of memory allocations. Secondly, it doesn't really matter if you allocate some huge 1GB region, because although it is contiguous in the virtual address space of your process, it is up to the OS to allocate pages of virtual memory wherever it deems fit - which may not actually be contiguous. Again, you have pretty much no control over memory allocation from userspace. "Is there a way to allocate contiguous physical memory from userspace in linux?" The short answer is No.
/proc/$pid/maps isn't going to get you anywhere either. Yes there are plenty of addresses listed in there, but again, they are all in the virtual address space of process $pid. Some more information on these: How do I read from /proc/$pid/mem under Linux?

C accessing memory location

I mean the physical memory, the RAM.
In C you can access any memory address, so how does the operating system then prevent your program from changing memory address which is not in your program's memory space?
Does it set specific memory adresses as begin and end for each program, if so how does it know how much is needed.
Your operating system kernel works closely with memory management (MMU) hardware, when the hardware and OS both support this, to make it impossible to access memory you have been disallowed access to.
Generally speaking, this also means the addresses you access are not physical addresses but rather are virtual addresses, and hardware performs the appropriate translation in order to perform the access.
This is what is called a memory protection. It may be implemented using different methods. I'd recommend you start with a Wikipedia article on this subject — http://en.wikipedia.org/wiki/Memory_protection
Actually, your program is allocated virtual memory, and that's what you work with. The OS gives you a part of the RAM, you can't access other processes' memory (unless it's shared memory, look it up).
It depends on the architecture, on some it's not even possible to prevent a program from crashing the system, but generally the platform provides some means to protect memory and separate address space of different processes.
This has to do with a thing called 'paging', which is provided by the CPU itself. In old operating systems, you had 'real mode', where you could directly access memory addresses. In contrast, paging gives you 'virtual memory', so that you are not accessing the raw memory itself, but rather, what appears to your program to be the entire memory map.
The operating system does "memory management" often coupled with TLB's (Translation Lookaside Buffers) and Virtual Memory, which translate any address to pages, which the operation system can tag readable or executable in the current processes context.
The minimum requirement for a processors MMU or memory management unit is in current context restrict the accessable memory to a range which can be only set in processors registers in supervisor mode (as opposed to user mode).
The logical address is generated by the CPU which is mapped to the physical address by the memory mapping unit. Unlike the physical address space the logical address is not restricted by memory size and you just get to work with the logical address space. The address binding is done by MMU. So you never deal with the physical address directly.
Most computers (and all PCs since the 386) have something called the Memory Management Unit (or MMU). It's job is to translate local addresses used by a program into the physical addresses needed to fetch real bytes from real memory. It's the operating system's job to program the MMU.
As a result of this, programs can be loaded into any region of memory and appear, from that program's point of view while executing, to be be any any other address. It's common to find that the code for all programs appear (locally) to be at the same address and their data always appears (locally) to be at the same address even though physically they will be in different locations. With each memory access, the MMU transparently translates from the local address space to the physical one.
If a program trys to access a memory address that has not been mapped into its local address space, the hardware generates an exception and typically gets flagged as a "segmentation violation", followed by the forcible termination of the program. This protects from accessing the memory of other processes.
But that doesn't have to be the case! On systems with "virtual memory" and current resource demands on RAM that exceed the amount of physical memory, some pages (just blocks of memory of a common size, often on the order of 4-8kB) can be written out to disk and given as RAM to a program trying to allocate and use new memory. Later on, when that page is needed by whatever program owns it, the memory access causes an exception and the OS swaps out some other memory page and re-loads the needed one from disk. The program that "page-faulted" gets delayed while this happens but otherwise notices nothing.
There are lots of other tricks the MMU/OS can do as well, like sharing memory between processes, making a disk file appear to be direct-memory-accessible, setting some pages as "NX" so they can't be treated as executable code, using arbitrary sections of the logical memory space regardless of how much and at what address the physical ram uses, and more.

Can malloc return same address in two different processes?

Suppose I have two process a and b on Linux. and in both process I use malloc() to allocate a memory,
Is there any chances that malloc() returns the same starting address in two processes?
If no, then who is going to take care of this.
If yes, then both process can access the same data at this address.
Is there any chances that malloc() return same starting address in two process.
Yes, but this is not a problem.
What you're not understanding is that operating systems firstly handle your physical space for you - programs etc only see virtual addresses. There is only one virtual address space, however, the operating system (let's stick with 32-bit for now) divides that up. On Windows, the top half (0xA0000000+) belongs to the kernel and the lower half to user mode processes. This is referred to as the 2GB/2GB split. On Linux, the divide is 3GB/1GB - see this article:
Kernel memory is defined to start at PAGE_OFFSET,which in x86 is 0XC0000000, or 3 gigabytes. (This is where the 3gig/1gig split is defined.) Every virtual address above PAGE_OFFSET is the kernel, any address below PAGE_OFFSET is a user address.
Now, when a process switch (as opposed to a context switch) occurs, all of the pages belonging to the current process are unmapped from virtual memory (not necessarily paging them) and all of the pages belonging to the to-be-run process are copied in (disclaimer: this might not exactly be true; one could mark pages dirty etc and copy on access instead, theoretically).
The reason for the split is that, for performance reasons, the upper half of the virtual memory space can remained mapped to the operating system kernel.
So, although malloc might return the same value in two given processes, that doesn't matter because:
physically, they're not the same address.
the processes don't share virtual memory anywhere.
For 64-bit systems, since we're currently only using 48 of those bits there is a gulf between the bottom of user mode and kernel mode which is not addressable (yet).
Yes, malloc() can return the same pointer value in separate processes, if the processes run in separate address spaces, which is achieved via virtual memory. But they won't access the same physical memory location in that case and the data at the address need not be the same, obviously.
Process is a collection of threads plus an address-space. This address-space is referred as virtual because every byte of it is not necessarily backed by physical memory. Segments of a virtual address-space will be eventually backed by physical memory if the application in the process ends up by using effectively this memory.
So, malloc() may return an identical address for two process, but it is no problem since these malloced memories will be backed by different segments of physical memory.
Moreover malloc() implementation is moslty not reentrant, therefore calling malloc() in differents threads sharing the same address-space hopefully won't result in returning the same virtual address.

Resources