Is it possible to launch two completely independent programs so that they share a single memory area?
For example, I have skype.exe and opera.exe, and I want to launch them in a way that allows them to share common memory. Sounds like threading to me.
There are really several questions rolled into one here; let me try to dissect them:
Having its own virtual address space is part of the definition of a process on a modern OS, so running two processes in the same address space can't happen without a modification to the OS that allows exactly that.
Even if such a modification were available, it would be a less-than-perfect idea: access to memory shared between threads is governed by synchronisation primitives explicitly built into the program. There is no such mechanism to manage memory access between two processes that have not been explicitly designed to cooperate.
Sharing memory between processes that are designed for it does not require them to run in the same virtual address space in their totality: shared memory segments exist in virtually all modern OSes to facilitate exactly that. Again, those processes have to be explicitly designed to use this feature; a minimal sketch follows.
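For illustration, here is a minimal sketch of that last point, assuming POSIX shared memory is available (Linux/Unix; the name "/demo_shm" is made up for this example). A second process that maps the same name sees the same bytes:

    #include <fcntl.h>      /* O_CREAT, O_RDWR */
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>   /* shm_open, mmap */
    #include <unistd.h>     /* ftruncate, close */

    int main(void)
    {
        /* create (or open) a named shared memory object; the name is
           the rendezvous point both processes must agree on */
        int fd = shm_open("/demo_shm", O_CREAT | O_RDWR, 0600);
        if (fd == -1) { perror("shm_open"); return 1; }
        ftruncate(fd, 4096);

        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        strcpy(p, "hello from process A");   /* visible to the other process */
        printf("wrote: %s\n", p);

        munmap(p, 4096);
        close(fd);
        /* shm_unlink("/demo_shm") removes the name when both sides are done */
        return 0;
    }

(On older glibc you may need to link with -lrt.) Note this gives you shared bytes, not synchronisation; per the previous point, you still need a mutex or semaphore to coordinate access.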
If they are two independent programs, you have to pass the data between them in an equally independent way. Say the two programs are running and the first computes some data that the second needs. The simplest approach is to have the first program write the data to a file, ending with a status line that indicates it is safe for the other program to start reading. The second program then runs a loop that periodically checks the last line of that file, as sketched below.
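For example, a minimal sketch of the reader side, assuming the writer finishes its output file with a literal "DONE" line (both the file name and the marker are made up for this example):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>   /* sleep */

    int main(void)
    {
        char line[128], last[128] = "";

        for (;;) {
            FILE *f = fopen("results.txt", "r");
            if (f) {
                while (fgets(line, sizeof line, f))  /* remember the last line */
                    strcpy(last, line);
                fclose(f);
                if (strcmp(last, "DONE\n") == 0)     /* writer's status marker */
                    break;
            }
            sleep(1);   /* poll once per second */
        }
        printf("data is complete, safe to read\n");
        return 0;
    }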
The other option is to use a library like MPI, which has message-passing protocols already implemented.
Related
I'm trying to create a C/C++ program that dumps as much uninitialized memory as possible.
The program has to be run by a local user, i.e., in user mode.
It does not work to use malloc:
Why does malloc initialize the values to 0 in gcc?
The goal is not to use this data as a seed for randomness.
Does the OS always make sure that you can't see "leftovers" from other processes?
If possible, I would like references to implementations or further explanation.
The most common multi-user operating systems (modern Windows, Linux, other Unix variants, VMS--probably all OSes with a concept of virtual memory) try to isolate processes from one another for security. If process A could read process B's leftover memory, it might get access to user data it shouldn't have, so these operating systems will clear pages of memory before they become available to a new process. You would probably have to have elevated privileges to get at uninitialized RAM, and the solution would likely depend on which operating system it was.
Embedded OSes, DOS, and ancient versions of Windows generally don't have the facilities for protecting memory. But they also don't have a concept of virtual memory or of strong process isolation. On these, just allocating memory through the usual methods (e.g., malloc) would give you uninitialized memory without you having to do anything special.
For more information on Windows, you can search for "Windows zero page thread" to learn about the OS thread whose only job is to write zeros to unused pages so that they can be doled out again. Also, Windows has a feature called SuperFetch which fills up unused RAM with files that Windows predicts you'll want to open soon. If you allocated memory and Windows decided to give you a SuperFetch page, there would be a risk that you'd see the contents of a file you don't have permission to read. This is another reason why pages must be cleared before they can be allocated to a process.
You got uninitialized memory. It contains indeterminate values. In your case those values are all 0. Nothing unexpected. If you want pseudo-random numbers use a PRNG. If you want real random numbers/entropy, use a legitimate random source like your operating system's random number device (e.g. /dev/urandom) or API.
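For example, a minimal sketch of reading the OS entropy device on Linux (on other systems you would use the equivalent API, e.g. getrandom()):

    #include <stdio.h>

    int main(void)
    {
        unsigned char buf[16];
        FILE *f = fopen("/dev/urandom", "rb");

        if (!f || fread(buf, 1, sizeof buf, f) != sizeof buf) {
            perror("/dev/urandom");
            return 1;
        }
        fclose(f);

        for (size_t i = 0; i < sizeof buf; i++)  /* print 16 random bytes */
            printf("%02x", buf[i]);
        putchar('\n');
        return 0;
    }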
No operating system in its right mind is going to provide uninitialized memory to a process.
The closest thing you are going to find is the stack. That memory will have been initialized when it was mapped into the process, but much of it will have been overwritten since, as the sketch below shows.
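A minimal sketch of that effect within a single process (reading an uninitialized local array is formally undefined behavior, and an optimizer may well defeat it, so treat it as a curiosity and compile with -O0):

    #include <stdio.h>
    #include <string.h>

    void fill(void)
    {
        char buf[4096];
        memset(buf, 0xAA, sizeof buf);            /* leave a recognizable pattern */
        fprintf(stderr, "filled: %d\n", buf[0]);  /* keep the buffer from being elided */
    }

    void peek(void)
    {
        unsigned char buf[4096];                  /* deliberately uninitialized */
        size_t hits = 0;
        for (size_t i = 0; i < sizeof buf; i++)
            if (buf[i] == 0xAA)
                hits++;
        printf("%zu of %zu bytes still carry the previous frame's pattern\n",
               hits, sizeof buf);
    }

    int main(void)
    {
        fill();   /* this frame dirties the stack... */
        peek();   /* ...and this one reads the leftovers */
        return 0;
    }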
It's common sense. We don't need to document that 1+1=2 either.
An operating system that leaks secrets between processes would be useless for many applications, so a general-purpose operating system will isolate processes. Keeping track of which pages might contain secrets and which are safe would be too much work and too error-prone, so we assume that every page that has ever been used is dirty and contains secrets. Initializing new pages with garbage is slower than initializing them with a single value, so random garbage isn't used. The most useful value is zero (for calloc or the BSS segment, for example), so new pages are zeroed to clear them.
There's really no other way to do it.
There might be special-purpose operating systems that don't do this and do leak secrets between processes (it might be necessary for real-time requirements, for example). Some older operating systems didn't have decent memory management and privilege isolation. Also, malloc will reuse previously freed memory within the same process, which is why malloc is documented as returning memory that may contain uninitialized garbage. But that doesn't mean you'll ever be able to obtain uninitialized memory from another process on a general-purpose operating system.
I guess a simple rule of thumb is: if your operating system ever asks you for a password, it will not give uninitialized pages to a process; and since zeroing is the only reasonable way to initialize pages, they will be zeroed.
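You can verify the zeroing yourself; a minimal sketch on Linux, asking the kernel for fresh anonymous pages and counting non-zero bytes (it will print 0):

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 1 << 20;   /* 1 MiB, large enough to come straight from the kernel */
        unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        size_t nonzero = 0;
        for (size_t i = 0; i < len; i++)
            if (p[i] != 0)
                nonzero++;

        printf("%zu non-zero bytes out of %zu\n", nonzero, len);
        munmap(p, len);
        return 0;
    }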
I want to add functions to the Linux kernel to write and read data, but I don't know how or where to store it so that other programs can read, overwrite, or delete it.
Program A calls uf_obj_add(param, param, param), which stores information in memory.
Program B does the same.
Program C calls uf_obj_get(param); the kernel checks whether the operation is allowed and, if it is, returns the data.
Do I just need to malloc() memory, or is it more difficult?
And how can uf_obj_get() access the memory that uf_obj_add() writes to?
Where should I store the memory-location information so that both functions can access the same data?
As pointed out by commentators on your question, achieving this in userspace would probably be much safer. However, if you insist on modifying kernel code, one way to go is to implement a new device driver with read and write functions that you implement according to your needs, so that your processes can access some common memory space. Your processes can then work, as you described, by reading from and writing to the same space more or less as if they were reading from/writing to a regular file.
I would recommend reading quite a bit of material before diving into kernel code, though. A good resource on device drivers is Linux Device Drivers. Even though a significant portion of its information may no longer be up to date, you may find here a version of the source code used in the book, ported to Linux 3.x. You may find what you are looking for under the scull directory.
Again, as pointed out by commentators on your question, I do not think you should jump right into modifying kernel code. However, for educational purposes, scull can serve as a good starting point for reading kernel code and seeing how to achieve results similar to what you described.
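To give a taste of what scull boils down to, here is a minimal sketch of a character device backed by a kernel buffer, for recent kernels. The device name uf_obj is borrowed from your question; real code would add locking and the permission checks you described:

    #include <linux/module.h>
    #include <linux/miscdevice.h>
    #include <linux/fs.h>

    #define BUF_SIZE 4096
    static char shared_buf[BUF_SIZE];   /* the kernel-side storage */
    static size_t data_len;

    static ssize_t uf_read(struct file *f, char __user *ubuf,
                           size_t count, loff_t *ppos)
    {
        /* a real uf_obj_get() would check permissions here first */
        return simple_read_from_buffer(ubuf, count, ppos, shared_buf, data_len);
    }

    static ssize_t uf_write(struct file *f, const char __user *ubuf,
                            size_t count, loff_t *ppos)
    {
        ssize_t ret = simple_write_to_buffer(shared_buf, BUF_SIZE, ppos, ubuf, count);
        if (ret > 0)
            data_len = *ppos;
        return ret;
    }

    static const struct file_operations uf_fops = {
        .owner = THIS_MODULE,
        .read  = uf_read,
        .write = uf_write,
    };

    static struct miscdevice uf_dev = {
        .minor = MISC_DYNAMIC_MINOR,
        .name  = "uf_obj",              /* appears as /dev/uf_obj */
        .fops  = &uf_fops,
    };

    module_misc_device(uf_dev);
    MODULE_LICENSE("GPL");

Processes A, B, and C then simply open /dev/uf_obj and use read()/write(), and all of them reach the same shared_buf inside the kernel.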
I'd like to share certain memory areas across multiple computers in a C/C++ project. When something on computer B accesses a memory area that currently lives on computer A, that area has to be locked on A and sent to B. I'm fine with it being Linux-only.
Thanks in advance :D
You cannot do this for a simple C/C++ project.
Common computer hardware does not have the physical properties that support this directly: Memory on one system cannot be read by another system.
In order to make it appear to C/C++ programs on different machines that they are sharing memory, you have to write software that provides this function. Typically, you would need to do something like this (a minimal sketch of the core write-detection trick follows the list):
Allocate some pages in the virtual memory address space (of each process).
Mark those pages read-only.
Set a handler to receive the exception that occurs when the process attempts to write to the read-only memory. (This handler might be in the operating system, as some sort of kernel extension, or it might be a signal handler in your process.)
When the exception is received, determine what the process was attempting to write to memory. Write that to the page (perhaps by writing it through a separate mapping in virtual memory to the same physical memory, with this extra mapping marked writeable).
Send a message by network communications to the other machine telling it that memory has changed.
Resume execution in the process after the instruction that wrote to memory.
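Here is a minimal userspace sketch of the write-detection steps on Linux, using mprotect and a SIGSEGV handler. The networking part is left as a comment, and note that calling mprotect from a signal handler is not formally async-signal-safe, though it is a common idiom for this technique:

    #include <signal.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    static char *page;
    static size_t page_size;

    static void on_segv(int sig, siginfo_t *info, void *ctx)
    {
        (void)sig; (void)ctx;
        char *addr = (char *)info->si_addr;
        if (addr >= page && addr < page + page_size) {
            /* a real DSM layer would record the page as dirty here and later
               send its contents to the other machine over the network */
            mprotect(page, page_size, PROT_READ | PROT_WRITE);  /* let the write proceed */
        } else {
            _exit(1);   /* a genuine crash somewhere else */
        }
    }

    int main(void)
    {
        page_size = (size_t)sysconf(_SC_PAGESIZE);
        page = mmap(NULL, page_size, PROT_READ,          /* the shared area, read-only */
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page == MAP_FAILED) { perror("mmap"); return 1; }

        struct sigaction sa = {0};                       /* install the exception handler */
        sa.sa_sigaction = on_segv;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);

        page[0] = 42;   /* faults, handler unprotects, instruction restarts */
        printf("wrote %d\n", page[0]);
        return 0;
    }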
Additionally, you need to determine what to do about memory coherence: If two processes write to the same address in memory at nearly the same time, what happens? If process A writes to location X and then reads location Y while, at nearly the same time, process B writes to location Y and reads X, what do they see? Is it okay if the two processes see data that cannot possibly be the result of a single time sequence of writes to memory?
On top of all that, this is hugely expensive in time: Stores to memory that require exception handling and network operations take many thousands, likely hundreds of thousands, times as long as normal stores to memory. Your processes will execute excruciatingly slowly whenever they write to this shared memory.
There are software solutions, as noted in the comments. These use the paging hardware in the processors on a node to detect access, and use your local network fabric to disseminate the changes to the memory. One hardware alternative is reflective memory - you can read more about it here:
https://en.wikipedia.org/wiki/Reflective_memory
http://www.ecrin.com/embedded/downloads/reflectiveMemory.pdf
http://www.dolphinics.com/solutions/embedded-system-reflective-memory.html (this link replaces an older page that is now broken)
Reflective memory provides low latency (about one microsecond per hop) in either a ring or tree configuration.
I am building an application that takes an executable as its input, executes it, and keeps track of dynamic memory allocations (among other things) to help track down memory errors.
After reading the name of the executable, I create a child process, link the executable with my module (which includes my version of the malloc family of functions), and execute the executable provided by the user. The parent process will consist of a GUI (using the Qt framework) where I want to display warnings, errors, and the number of allocations.
I need to communicate the number of mallocs/frees and a series of warning messages to the parent process in real time. After the user's application has finished executing, I wish to display the number of memory leaks. (I have taken care of all the backend coding needed for this in the shared library I link against.)
Real-Time:
I thought of two different approaches to communicate this information:
The child process writes to two pipes (one for signalling whether an allocation/free happened, and another for writing a single integer to denote a warning message).
Alternatively, the child simply sends a signal to denote that an allocation has happened, with additional signals for each of the warning messages; I would map these to the actual warning strings in the parent process.
Is the signal version as efficient as using a pipe? Is it feasible? Is there any better choice, as I do care about efficiency? :)
After the user's application finishes executing:
I need to send the whole data structure I use to keep track of memory leaks here. This could possibly be very large so I am not sure which IPC method would be the most efficient.
Thanks for your time
I would suggest a Unix-domain socket. It's a little more flexible than a pipe, can be configured for datagram mode (which saves you having to find message boundaries), and makes it easy to move to a network interface later.
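A minimal sketch, assuming the parent forks the instrumented child itself so socketpair() can be used instead of a named socket (the "malloc 1024" event format is made up for this example):

    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <unistd.h>

    int main(void)
    {
        int sv[2];
        /* SOCK_DGRAM keeps message boundaries: one datagram per event */
        if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) == -1) {
            perror("socketpair");
            return 1;
        }

        if (fork() == 0) {                     /* child: the traced application */
            close(sv[0]);
            const char *msg = "malloc 1024";   /* hypothetical event format */
            send(sv[1], msg, strlen(msg), 0);
            _exit(0);
        }

        close(sv[1]);                          /* parent: the GUI/reader side */
        char buf[256];
        ssize_t n = recv(sv[0], buf, sizeof buf - 1, 0);
        if (n >= 0) {
            buf[n] = '\0';
            printf("event: %s\n", buf);
        }
        return 0;
    }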
Signals are definitely not the way to do this. In general, signals are best avoided whenever possible.
A pipe solution is fine. You could also use shared memory, but that would be more vulnerable to accidental corruption by the target application.
I suggest a combination of shared memory and a socket. Have a shared memory area, say 1 MB, and log all your information in some standard format in that buffer. If/when the buffer fills or the process terminates, you send a message via the socket to the reader. After the reader ACKs, you can clear the buffer and carry on.
To answer caf's concern about target-application corruption, just use the mprotect system call to remove permissions (set PROT_NONE) on the shared memory area before giving control to your target process. Naturally this means you'll have to set PROT_READ|PROT_WRITE before updating your log on each allocation; I'm not sure whether it is still a performance win with the mprotect calls thrown in. A sketch follows below.
EDIT: in case it isn't blindingly obvious, you can have multiple buffers (or one divided into N parts) so you can pass control back to the target process immediately and not wait for the reader to ACK. Also, given enough computation resources the reader can run as often as it wants reading the currently active buffer and performing real-time updates to the user or whatever it's reading for.
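A minimal single-process sketch of the mprotect trick (in the real tool the buffer would be a shared mapping inherited across fork, and the reader would be notified through the socket; both are elided here):

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    #define LOG_SIZE (1 << 20)
    static char *log_buf;
    static size_t log_used;

    static void log_event(const char *msg)   /* called from the malloc wrappers */
    {
        size_t len = strlen(msg);
        mprotect(log_buf, LOG_SIZE, PROT_READ | PROT_WRITE);  /* open the window */
        if (log_used + len < LOG_SIZE) {
            memcpy(log_buf + log_used, msg, len);
            log_used += len;
        }
        mprotect(log_buf, LOG_SIZE, PROT_NONE);               /* close it again */
    }

    int main(void)
    {
        /* MAP_SHARED so a forked child and the parent see the same pages */
        log_buf = mmap(NULL, LOG_SIZE, PROT_NONE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (log_buf == MAP_FAILED) { perror("mmap"); return 1; }

        log_event("malloc 1024\n");   /* hypothetical event record */

        mprotect(log_buf, LOG_SIZE, PROT_READ);   /* the reader side drains the buffer */
        fwrite(log_buf, 1, log_used, stdout);
        return 0;
    }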
I work on Linux for ARM processor for cable modem. There is a tool that I have written that sends/storms customized UDP packets using raw sockets. I form the packet from scratch so that we have the flexibility to play with different options. This tool is mainly for stress testing routers.
I actually have multiple interfaces created. Each interface will obtain IP addresses using DHCP. This is done in order to make the modem behave as virtual customer premises equipment (vcpe).
When the system comes up, I start as many of these processes as requested. Every process I start continuously sends packets, so process 0 sends packets using interface 0 and so on. Each of these sender processes allows configuration (changes to UDP parameters and other options) at run time. That's the reason I decided to have separate processes.
I start these processes using fork and exec from the provisioning process of the modem.
The problem now is that each process takes up a lot of memory. Starting just 3 such processes causes the system to crash and reboot.
I have tried the following:
I have always assumed that pushing more code into shared libraries would help. But when I tried moving many functions into a shared library and keeping minimal code in the processes, to my surprise it made no difference. I also removed all arrays and made them use the heap; that made no difference either. Maybe that's because the processes run continuously, so it doesn't matter whether the memory is stack or heap? I suspect the process from which I call fork is huge, and that is why the processes I create end up huge as well. I am not sure how else to go about it: say process A is huge, and I start process B by fork and exec, so B inherits A's memory area. Then having A start C, which in turn starts B, will not help either, since C still inherits from A? I used vfork as an alternative, which did not help either, and I do wonder why.
I would appreciate if someone give me tips to help me reduce the memory used by each independent child processes.
Given this is a test tool, the most efficient thing to do is to add more memory to the testing machine.
Failing that:
How are you measuring memory usage? Some methods don't give accurate results.
Check you don't have any memory leaks. e.g. with Valgrind on Linux x86.
You could try running the different testers in a single process as different threads, or even multiplexed in a single thread, since the network should be the limiting factor (see the sketch after this list).
exec() will shrink the process's memory size, as the new executable gets a fresh memory map.
If you can't add physical memory, then maybe you can add swap, maybe just for testing?
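A minimal sketch of the threads idea, assuming one sender thread per interface (the actual raw-socket sending loop is elided); compile with -pthread:

    #include <pthread.h>
    #include <stdio.h>

    #define NSENDERS 3

    static void *sender(void *arg)
    {
        int ifindex = *(int *)arg;
        /* open a raw socket bound to interface `ifindex` and loop sending
           the customized UDP packets here (elided) */
        printf("sender for interface %d running\n", ifindex);
        return NULL;
    }

    int main(void)
    {
        pthread_t tid[NSENDERS];
        int ids[NSENDERS];

        /* threads share the code, heap and libraries, so each extra sender
           costs roughly one stack rather than one whole process image */
        for (int i = 0; i < NSENDERS; i++) {
            ids[i] = i;
            pthread_create(&tid[i], NULL, sender, &ids[i]);
        }
        for (int i = 0; i < NSENDERS; i++)
            pthread_join(tid[i], NULL);
        return 0;
    }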
Not technically answering your question, but providing a couple of alternative solutions:
If you are using Linux, have you considered using pktgen? It is a flexible tool for sending UDP packets from the kernel as fast as the interface allows. This is much faster than a userspace tool.
Oh, and a shameless plug: I have made a multi-threaded network testing tool which can be used to spam the network with UDP packets. It can operate in multi-process mode (using fork) or multi-thread mode (using pthreads). The pthreads mode might use less RAM, so it might be better for you. If nothing else it might be worth looking at the source, as I've spent many years improving this code, and it's been able to generate enough packets to saturate a 10 Gbps interface.
What could be happening is that process A occupies a significant amount of RAM + swap (if any). Thus, when you call fork() from this process, the kernel must reserve enough RAM and swap for the child process to have its own copy (copy-on-write, actually) of the parent process's writable private memory, namely its stack and heap. When you call exec() from the child process, that memory is no longer needed and your child process can have its own, smaller private working set.
So, the first thing to make sure of is that you don't have more than one process at a time in the state between fork() and exec(). During this state, the child process must have a duplicate of its parent process's virtual memory space.
Second, try using the overcommit settings, which allow the kernel to reserve more memory than actually exists. These are /proc/sys/vm/overcommit*. You can get away with using overcommit because your child processes only need the extra VM space until they call exec, and shouldn't actually touch the duplicated address space of the parent process.
Third, in your parent process you can allocate the largest blocks using shared memory rather than the stack or heap, which are private. Thus, when you fork, those shared memory regions will be shared with the child process rather than duplicated copy-on-write; a sketch follows.
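A minimal sketch of that third point: put the big buffer in a MAP_SHARED mapping so fork() shares the pages instead of reserving a copy-on-write duplicate (the size and the exec'd program are placeholders):

    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        size_t big = 64 << 20;   /* 64 MiB that would otherwise live on the heap */
        char *buf = mmap(NULL, big, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (buf == MAP_FAILED) { perror("mmap"); return 1; }
        memset(buf, 'x', big);

        pid_t pid = fork();      /* no 64 MiB copy-on-write reservation needed */
        if (pid == 0) {
            /* the mapping disappears at exec anyway; the win is the cheap fork */
            execlp("true", "true", (char *)NULL);   /* placeholder for the real sender */
            _exit(127);
        }
        waitpid(pid, NULL, 0);
        munmap(buf, big);
        return 0;
    }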