Assume we have two programs written in C. One program allocates memory with malloc and launches the second program, passing the address of the allocated memory and its size as arguments.
Is it possible for the second program to cast the first argument to a pointer and read from or write to that memory? Why or why not?
For the sake of simplicity assume Linux as the underlying OS.
No. On modern operating systems, processes running in user mode see virtual memory: the same virtual address translates to a different physical address or page file location in each process.
Fortunately, most operating systems do have APIs that allow for inter-process communication, so you can research those methods. This question seems to be a good place to start, since you claim to be working on Linux.
In C, when you get the address of a variable, is that address one that really exists in the RAM of the computer, or just an address in a fake memory in the C compiler (if that's how it really works)? Can you explain in layman's terms?
Yes and no. When you take the address of a variable and perform some operations on it (assuming the compiler doesn't optimize it out), it will correspond to an address in RAM. However, because of virtual memory, the address used in your program is almost certainly not the address of the variable in physical RAM. The kernel remaps which virtual addresses (what your program sees) refer to which physical addresses (what the memory sees), so that different processes can be loaded into memory at the same time, yet not be able to access each other's memory. Additionally, your process's memory can be paged out (written to disk) if it has not been used recently and/or something else needs more memory, and later reloaded at a completely different physical address, yet the virtual address will remain the same.
So yes, when you access a pointer, that address corresponds to an address in memory. But that address doesn't correspond to the actual address in RAM, and the address it corresponds to can change over time.
The short answer is "neither".
In general terms, the address of a variable in memory is in the context of a running program's address space.
What differs is how the program's address space is mapped to hardware by the host system.
With modern hardware that has a memory management unit (MMU), and operating systems (or their device drivers) that use the MMU, a program's address space is mapped to physical storage, which may be RAM or backing store such as a swap file on a hard drive. The operating system uses the MMU to isolate programs from each other (so two processes cannot access each other's address space) and also to support swapping of data between RAM and swap. The running process generally cannot tell where its data is in physical memory, because the operating system and MMU specifically prevent it from doing so. Over time, the operating system and MMU may migrate memory used by a program to different areas of RAM or to swap, but the program cannot detect this, since the operating system and MMU take care of mapping an address in the program (which never changes as far as the program is concerned) to the actual address. This covers most modern versions of Windows, Unix, and various realtime operating systems. (Those systems also typically provide means of programmatically accessing physical memory, but only for programs that are running with higher privileges or for kernel-mode drivers.)
Older hardware did not have an MMU, so operating systems were not able to give programs separate address spaces. On such systems, the address as seen by a program had a one-to-one correspondence to a location in physical memory.
Somewhere in between was hardware that had separate areas of physical memory (e.g. provided by distinct banks of memory chips). On those systems, with support of special drivers, a host system could implement a partial mapping between addresses in a program's address space, and locations in particular areas of physical memory. This is why some target systems, and compilers that support them, support more than one pointer type (e.g. with names like near, far, and huge) as a compiler extension. In those cases, a pointer could refer to a location in a particular area of memory, and there may be some mapping of values, for each pointer type, from the value of a pointer seen by a program to the actual location within a corresponding area of physical memory.
The C compiler does not become a part of the executable program it builds (otherwise, to install any built program, it would be necessary to also install and execute the compiler used to build it, or the program would not run). Typically, a compiler is no longer running when a program is executed (or, at least, a program cannot rely on it being present). A program therefore cannot access addresses within the compiler's address space.
In an interpreted environment (e.g. C code is interpreted by another program - the interpreter) the interpreter acts as an intermediary between the program and the hardware, and handles mapping between a program's address space, the interpreter's address space, and physical memory. C interpreters are relatively rare in practice, compared with toolchains that use compilers and linkers.
On ancient OSes, the MMU isn't present on the target processor, or not used (even if the processor allows it).
In that case, physical addresses are used, which is simpler to understand but also annoying because when you're debugging an assembly program or trying to decode a traceback, you have to know where the program was loaded or the post-mortem traceback is useless.
Without MMU, you can do very hacky & simple things. Shared memory can be coded in a few lines, you can inspect the whole memory very easily, etc...
On modern OSes, which rely on the processor's MMU and address translation, executables run in virtual memory. That isn't an issue, since they cannot access other executables' memory anyway.
The good side is that if you're running/debugging the same executable many times, you get the same addresses each time, as long as address space layout randomization is not in effect (debuggers such as gdb typically disable it for the debugged program). Useful on long debugging sessions where you have to restart the debugger many times.
Also, some languages/compilers (like GNAT Ada compiler) provide a traceback with addresses when the program does something illegal. Using addr2line on the executable, you're able to get the exact traceback even after the process has ended and memory has been released.
The exception I know of is Windows shared libraries (DLL) which are almost never loaded at the same address, since this address is potentially shared between several executables. In those cases, for instance, a post-mortem traceback will be useless because the declared symbol address has an offset from the actual traceback address.
In a multi-process environment, where multiple processes run at the same time, the linker cannot decide the physical addresses of variables at build time.
The reason is simple: if you assigned dedicated physical addresses to variables, you would limit the number of processes that can run on your system.
So the linker assigns virtual addresses to the variables, and those addresses are translated to physical addresses at run time with the help of the OS and the processor.
One example of such a system is Linux running on an x86 CPU.
In other cases, where only one process or application runs on a processor, the linker can assign actual physical addresses to variables.
Example: embedded systems performing dedicated tasks, such as an oven.
I wrote this code so I can see the address of variable foo.
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char *foo = (char *)malloc(1);
    *foo = 's';
    printf(" foo addr : %p\n\n", (void *)&foo);
    int pause;
    scanf("%d", &pause);
    return 0;
}
Then pause it and use the address of foo in here:
#include <stdio.h>

int main(void) {
    char *ptr = (char *)0x7ffebbd57fc8; /* this was the output from the first program */
    printf("\n\n\n\n%c\n\n\n", *ptr);
    return 0;
}
But I keep getting a segmentation fault. Why is this code not working?
This is not a C question/problem but a matter of runtime support. On most operating systems, programs run in a virtual environment, especially concerning their memory space. In that case memory is virtual: when a program accesses a given address x, the real (physical) address is computed as f(x). f is a function implemented by the OS to ensure that a given process (the object that represents running code in the OS) has its own reserved memory, separated from the memory dedicated to other processes. This is called virtual memory.
Oops, your problem is not related to the C language; it really depends on the OS, if any.
First let us read it from a pure C language point of view:
char * ptr=(char *)0x7ffebbd57fc8;
You are converting an integer to a char *. As you got the integer value from the other program, you can be sure that it has an acceptable range, so you do get a pointer pointing to that address. As it is a char * pointer, you can use it to read the byte representation of any object that lies at that address. Still fine up to there. But common systems use virtual addresses and limit each process to its own pages, so by default a process cannot access the memory of another process. In addition, with the common usage of virtual memory, there is no reason for any two non-kernel processes to share common addresses. Exceptions where real addresses are used:
real-memory OSes (MS-DOS and derivatives like FreeDOS, CP/M, and other ancient systems)
kernel mode: the kernel can access the whole memory of the system - but who could load your program?
special functions: some OSes provide a special API to let one process read the memory of another one (Windows does), but it is not as simple as directly reading an address...
As I assume that you are not in any of the first two cases, nothing is mapped at that address from the current process, hence the error.
On systems that use virtual memory, you have a range of logical addresses that are available to user processes. These address ranges are subdivided into units called PAGES, whose size depends upon the processor (512 bytes to 1 MB).
Pages are not valid until they are mapped into the process. The operating system will have system calls that allow the application to map pages. If you try to access a page that is not valid you get some kind of exception.
While the operating system only allocates memory pages, applications are used to calls, such as malloc(), that allocate memory blocks of arbitrary sizes.
Behind the scenes, malloc() maps pages (i.e. makes them valid) to create a pool of memory from which it hands out small amounts of memory.
When you omit your malloc(), the memory is not being mapped.
Note that each process has its own range of logical addresses. One process's page containing 0x7ffebbd57fc8 is likely to be mapped to a different physical page frame than another process's 0x7ffebbd57fc8. If that were not the case, one user could muck with another. (There is always a range of addresses shared by all processes, but it can only be accessed in kernel mode.)
Your problem is further complicated by the fact that many systems these days randomly map processes to different locations in the logical address space. On such systems you could run your first program multiple times and get different addresses.
Why is this code not working?
You would need to call the system service on your operating system that maps memory into the process and make the page containing 0x7ffebbd57fc8 accessible.
I was reading a paragraph from the "The Linux Kernel Module Programming Guide" and I have a couple of doubts related to the following paragraph.
The reason for copy_from_user or get_user is that Linux memory (on
Intel architecture, it may be different under some other processors)
is segmented. This means that a pointer, by itself, does not reference
a unique location in memory, only a location in a memory segment, and
you need to know which memory segment it is to be able to use it.
There is one memory segment for the kernel, and one for each of the
processes.
However it is my understanding that Linux uses paging instead of segmentation and that virtual addresses at and above 0xc0000000 have the kernel mapping in.
1. Do we use copy_from_user in order to accommodate older kernels?
2. Do the current Linux kernels use segmentation in any way at all? If so, how?
3. If (1) is not true, are there any other advantages to using copy_from_user?
Yeah. I don't like that explanation either. The details are essentially correct in a technical sense (see also Why does Linux on x86 use different segments for user processes and the kernel?) but as you say, Linux typically maps the memory so that kernel code could access it directly, so I don't think it's a good explanation for why copy_from_user, etc. actually exist.
IMO, the primary reason for using copy_from_user / copy_to_user (and friends) is simply that there are a number of things to be checked (dangers to be guarded against), and it makes sense to put all of those checks in one place. You wouldn't want every place that needs to copy data in and out from user-space to have to re-implement all those checks. Especially when the details may vary from one architecture to the next.
For example, it's possible that a user-space page is actually not present when you need to copy to or from that memory and hence it's important that the call be made from a context that can accommodate a page fault (and hence being put to sleep).
Also, user-space data pointers need to be checked carefully to ensure that they actually point to user-space and that they point to data regions, and that the copy length doesn't wrap beyond the end of the valid regions, and so forth.
Finally, it's possible that user-space actually doesn't share the same page mappings with the kernel. There used to be a Linux patch for 32-bit x86 that made the complete 4G of virtual address space available to user-space processes. In that case, kernel code could not make the assumption that a user-space pointer was directly accessible, and those functions might need to map individual user-space pages one at a time in order to access them. (See 4GB/4GB Kernel VM Split)
I've written a program that uses dynamic memory allocation. I do not call free to release the memory, so the variable's value is still present at that address.
Now I want to reuse this value, and I want to see all the variables' values that are present in RAM from another process.
Is it possible?
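#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int *fptr;
    fptr = malloc(sizeof(int));
    if (fptr == NULL)
        return 1;
    *fptr = 4;
    /* %p with a void * cast is the portable way to print a pointer */
    printf("%d\t%p\n", *fptr, (void *)fptr);
    while (1) {
        /* infinite loop keeps the process alive */
    }
}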
I want another program to read the value of a, because it is still in memory since main never returns. How can I do this?
This question shows misconceptions on at least two topics.
The first is virtual address spaces and memory protection, as already addressed by RobertL in his answer. On a modern operating system, you just can't access memory belonging to another process (even if you knew the physical address, which you don't because all user space processes work on addresses in their private virtual address space). The result of trying to access something not mapped into your address space will be a segmentation fault. Read more on Wikipedia.
The second is the scope of the C standard. It doesn't know about processes. C is defined in terms of an abstract machine executing your program (and only this program). Scopes and lifetimes of variables are defined and the respective maximum is the global scope and a static storage duration. So yes, your variable will continue to live as long as your program runs, but its scope will be this program.
When you understand that, you see: even on a platform using a single global address space and no memory protection at all, you could never access the variable of another program in terms of the C standard. You could probably pass a pointer value somehow to your other program and use that, maybe it would work, but it would be undefined behavior.
That's why operating systems provide means for inter process communication like shared memory (which comes close to what you seem to want) and pipes.
When you return from main(), the process frees all acquired resources, so you can't. Have a look at this.
First of all, when you close your process, the memory you allocated in it will be freed entirely. You can work around this by having one of your processes write to a file (like a .dat file) and the other read from it. There are other ways, too.
But generally speaking in normal cases, when your process terminates the existing memory will be freed.
If you try accessing memory from another process from within your process, you will most likely get a segmentation fault, as most operating systems protect processes from messing with each other's memory.
It depends on the operating system. Most modern multi-tasking operating systems protect processes from each other. Hardware is set up to prevent processes from seeing other processes' memory. On Linux and Unix, for example, to communicate between programs in memory you will need to use operating system services for "inter-process" communication, such as shared memory, semaphores, or pipes.
Am I printing it wrong?
#include <stdio.h>
#include <stdlib.h>

int
main( void )
{
    int * p = malloc(100000);
    int * q;

    printf("%p\n%p\n", (void *)p, (void *)q);
    (void)getchar(); /* to run several instances at same time */
    free(p);
    return 0;
}
Whether I run it sequentially or in multiple terminals simultaneously, it always prints "0x60aa00000800" for p (q is different, though).
EDIT: Thanks for the answers, one of the reasons I was confused was because it used to print a different address each time. It turns out that a new compiler option I started using, -fsanitize=address, caused this change. Whoops.
The value of q is uninitialized garbage, since you never assign a value to it.
It's not surprising that you get the same address for p each time you run the program. That address is almost certainly a virtual address, so it applies only to the memory space of the currently running program (process).
Virtual address 0x60aa00000800 as seen from one program and virtual address 0x60aa00000800 as seen from another program refer to distinct physical addresses. The operating system maps virtual addresses to physical addresses, so there's no conflict. (If different programs could read and write the same physical memory, it would be a security nightmare.)
It also wouldn't be surprising if they were different each time. For example, some operating systems randomize stack addresses to prevent some code exploits. I'm not sure whether heap addresses are also randomized, but they certainly could be.
https://en.wikipedia.org/wiki/Virtual_memory
This behavior is not entirely surprising. The malloc operation is simply returning a pointer to user-addressable, allocated memory in the process. It's completely reasonable for the first memory request of the same size to return the same address across different invocations of a process.
The behavior for q doesn't contradict this. You have given q no value, hence it gets whatever value was last written to that portion of the stack. It's unsurprising that undefined behavior would be different across different invocations of the same process (after all, it's undefined).
Your code is fine.
The same code, and the same algorithm for obtaining memory with malloc() is run each time, so there's no reason the addresses should be different.
Some malloc implementations could randomize the start of memory allocations, yours does not.
This is because of virtual memory. The physical memory address for memory of q is different, but your operating system provides each process with a virtual view of memory, mapping different physical memory addresses to the same virtual addresses in your processes. So all processes have a similar view of the memory (and cannot see the memory of other processes).
Heap allocators are not required to provide distinct / unique addresses each time you run the program. There is no guarantee either way on this, but it's entirely reasonable for an implementation of malloc() to have deterministic behavior and give you the same pointer each time you run the program.
The stack, on the other hand, usually is (but is not required to be) located at a different address. This is a protection measure against buffer-overflow exploits. By making the stack location non-deterministic, they make it more difficult for an attacker to inject direct memory addresses of code via buffer-overflow attack.
Finally note that all pointers in a program are virtual memory addresses, not physical addresses. So even though two concurrent processes might have the same memory address in a pointer, those two processes still have distinct memory in separate areas of physical memory. The operating system takes care of this via its virtual memory manager and page translation. Each process has its own virtual address space, various pieces of which are mapped transparently by the OS to physical memory as needed.