I am building an application which takes as it's input an executable , executes it and keeps track of dynamic memory allocations among others to help track down memory errors.
After reading the name of the executable I create a child process,link the executable with my module ( which includes my version of malloc family of functions) and execute the executable provided by the user. The parent process will consist of a GUI ( using QT framework ) where I want to display warnings/errors/number of allocations.
I need to communicate the number of mallocs/frees and a series of warning messages to the parent process in real-time. After the users application has finished executing I wish to display the number of memory leaks. ( I have taken care of all the backend coding needed for this in the shared library I link against).
Real-Time:
I though of 2 different approaches to communicate this information.
Child process will write to 2 pipes ( 1 for writing whether allocation/free happened and another for writing a single integer to denote a warning message).
I though of simply sending a signal to denote whether an allocation has happened. Also create signals for each of the warning messages. I will map these to the actual warnings (strings) in the parent process.
Is the signal version as efficient as using a pipe? Is it feasible ? Is there any better choice , as I do care about efficiency:)
After user's application finishes executing:
I need to send the whole data structure I use to keep track of memory leaks here. This could possibly be very large so I am not sure which IPC method would be the most efficient.
Thanks for your time
I would suggest a unix-domain socket, it's a little more flexible than a pipe, can be configured for datagram mode which save you having to find message boundaries, and makes it easy to move to a network interface later.
Signals are definitely not the way to do this. In general, signals are best avoided whenever possible.
A pipe solution is fine. You could also use shared memory, but that would be more vulnerable to accidental corruption by the target application.
I suggest a combination of shared memory and a socket. Have a shared memory area, say 1MB, and log all your information in some standard format in that buffer. If/when the buffer fills or the process terminates you send a message, via the socket, to the reader. After the reader ACKs you can clear the buffer and carry on.
To answer caf's concern about target application corruption, just use the mprotect system call to remove permissions (set PROT_NONE) from the shared memory area before giving control to your target process. Naturally this means you'll have to set PROT_READ|PROT_WRITE before updating your log on each allocation, not sure if this is a performance win with the mprotect calls thrown in.
EDIT: in case it isn't blindingly obvious, you can have multiple buffers (or one divided into N parts) so you can pass control back to the target process immediately and not wait for the reader to ACK. Also, given enough computation resources the reader can run as often as it wants reading the currently active buffer and performing real-time updates to the user or whatever it's reading for.
Related
Assume that I have a page of memory that is read-only (e.g., set through mmap/mprotect). How do I modify one word (8 bytes) on this page at the lowest possible overhead?
Some context: I assume x86-64, Linux as my runtime environment. The modifications happen rarely but frequently enough so that I have to worry about overhead. The page is read only to protect some important data that must be read by the program frequently against rogue/illegal modifications. There are only few places that are allowed to modify the data on the page and I know all the locations of these places and the address of the page statically. The problem I'm trying to solve is protecting some data against memory safety bugs in the program with a few authorized places where I need to make modifications to the data. The modifications are not frequent but frequent enough so that several kernel-roundtrips (through system calls) are too costly.
So far, I thought of the following solutions:
mprotect
ptrace
shared memory
new system call
mprotect
mprotect(addr, 4096, PROT_WRITE | PROT_READ);
addr[12] = 0xc0fec0fe;
mprotect(addr, 4096, PROT_READ);
The mprotect solution is clean, simple, and straight-forward. Unfortunately, it involves two round trips into the kernel and will result in some overhead. In addition, the whole page will be writable during that time frame, allowing for some other thread to modify that memory area concurrently.
ptrace
Unfortunately, ptraceing yourself is no longer possible (as a ptraced-process needs to be stopped. So the solution is to fork, ptrace the child process, then use PTRACE_POKETEXT to write to the child processes memory.
This option has the drawback of spawning a parent process and will result in problems if the tracee uses multiple processes. The overhead per write is at least one system call for PTRACE plus the required synchronization between the processes.
shared memory
Shared memory is similar to the ptrace solution except that it reduces the system call. Both processes set up shared memory with different permissions (RW in the child, R in the parent). The two processes still need to synchronize on each write that is then carried out by the parent. Shared memory has similar drawbacks in complexity as the ptrace solution and incompatibilities with multiple communicating processes.
new system call
Adding a new system call to the kernel would solve the problem and would only require a single system call to modify one word in the process without having to change the page tables or the requirement to set up multiple communicating processes.
Is there anything that is faster than the 4 discussed/sketched solutions? Could I rely on any debug features? Are there any other neat low-level systems tricks?
This is a really ugly question.
I have a C++ program which does the following in a loop:
Waits for a JMS message
Calculates some data
Sends a JMS message in response
My program (let's call it "Bob") has a rather severe memory leak. The memory leak is located in a shared library that someone else wrote, which I must use, but the source code to which I do not have access.
This memory leak causes Bob to crash during the "calculates some data" phase of the loop. This is a problem, because another program is awaiting Bob's response, and will be very upset if it does not receive one.
Due to various restrictions (yes, this is an X/Y problem, I told you it was ugly), I have determined that my only viable strategy is to modify Bob so that it does the following in its loop:
Waits for a JMS message
Calculates some data
Sends a JMS message in response
Checks to see whether it's in danger of using "too much" memory
If so, forks and execs another copy of itself, and gracefully exits
My question is as follows:
What is the best (reliable but not too inefficient) way to detect whether we're using "too much" memory? My current thought is to compare getrlimit(RLIMIT_AS) rlim_cur to getrusage(RUSAGE_SELF) ru_maxrss; is that correct? If not, what's a better way? Bob runs in a Linux VM on various host machines, all with different amounts of memory.
Assuming the memory leak occurs in the "Calculates some data" phase, I think it might make more sense to just refactor that portion into a separate program and fork out to execute that in its own process. That way you can at least isolate the offending code and make it easier to replace it in the future, rather than just masking the problem by having the program restart itself when it runs low on memory.
The "Calculates some data" part can either be a long-running process that waits for requests from the main program and restarts itself when necessary, or (even simpler) it could be a one-and-done program that just takes its data in *argv and sends its results to stdout. Then your main loop can just fork out and exec it every time through, and read the results when they come back. I would go with the simpler option if possible, but that will of course depend on what your needs are.
If you make the program itself restart or fork the "Calculates some data" section to a separate process, in any case you'll need to check for the memory consumption. Since you're on Linux, an easy way to check this is to get the pid number of the process of interest, and read the contents of the file /proc/$PID/statm. The second number is the size of the resident set.
Reading these proc files are the way tools like top and htop get the data about processes. Reading a ~30-byte in-memory file periodically to check the memory leak doesn't sound too inefficient.
If the leak is regular and you want to make it a bit more sophisticated, you could even keep track of the rate of growth and adjust your rate of checks accordingly.
Is it possible to launch two completely independent programs into one scope of memory area?
For example, I have skype.exe and opera.exe and I want to launch them on a way that will allows them to share common memory. Sounds like threading to me.
These are quite some questions at the same time, let me try to dissect:
It is the definition of a process on a modern OS to have its own virtual address space. So running two processes in the same address space can't happen without a modification to the OS to allow exactly that.
Even if such a modification were available, it would be a less than perfect idea: Access to memory shared between threads is governed by synchronisation primitives explicitly built into them. There is no such mechanism to manage memory access between two processes, that have not explicitly been designed so
Sharing memory if so designed between processes does not at all need them to run in the same virtual address space in their totality: Shared memory segments exist in virtually all modern OS to facilitate exactly that. Again, those processes have to be explicitly designed to use this feature.
If they are two independent programs running then you have to ensure that the data is passed in an independent way between them. Let's say the two programs are running, the first program compute some data that the second program needs. The simplest thing to do is print the data from the first program into a file with some status at the end of the file (to indicate that it is safe for the other program to start reading it). From the other program you have a while loop that checks the status of the last line in that file every period of time.
The other option is to use some library like MPI which has protocols for message passing implemented.
I am working on a fuse program, when a read() call in fuse is called, it will read the specific file A and save it to its buffer. In my case, I let fuse send a message to my program, and it retrieves data from remote server and save it to this file A, then fuse read this file to get the data.
I am wondering is there a way to let my program save the data right into the buffer of fuse, and avoid I/O operations. Does named pipe a good option? I mean does it store its data in the memory? Or could I change this buffer to a shared memory? I know how to create a shared memory, but do not know if I could convert it. It seems a privates one.
Thanks your guys.
Oh i think here you want to make some communication between two different process then the idea of IPC(Interprocess Communication) comes..
there are 5 ways of doing that
1 Shared memory permits processes to communicate by simply reading and
writing to a specified memory location.
2 Mapped memory is similar to shared memory, except that it is associated with a
file in the filesystem.
3 Pipes permit sequential communication from one process to a related process.
4 FIFOs are similar to pipes, except that unrelated processes can communicate
because the pipe is given a name in the filesystem.
5 Sockets support communication between unrelated processes even on different
computers.
i think here shared memory will be good option.
1> 1st declare some shared memory in your program then attache it with fuse
2> when fuse send a message then your program should get data from server
and save it to that shared memory
3> make some signaling methods(to avoid any race condition) so after that
fuse can use that data
Perhaps you could write into some mmap of /proc/1234/mem where 1234 is the pid of the target process, but I don't recommend doing that (and there are certainly permission issues). And that still becomes a weird (and Linux specific) way of sharing memory. (I am not sure it should work).
But Mr32's answer is more appropriate.
I work on Linux for ARM processor for cable modem. There is a tool that I have written that sends/storms customized UDP packets using raw sockets. I form the packet from scratch so that we have the flexibility to play with different options. This tool is mainly for stress testing routers.
I actually have multiple interfaces created. Each interface will obtain IP addresses using DHCP. This is done in order to make the modem behave as virtual customer premises equipment (vcpe).
When the system comes up, I start those processes that are asked to. Every process that I start will continuously send packets. So process 0 will send packets using interface 0 and so on. Each of these processes that send packets would allow configuration (change in UDP parameters and other options at run time). Thats the reason I decide to have separate processes.
I start these processes using fork and excec from the provisioning processes of the modem.
The problem now is that each process takes up a lot of memory. Starting just 3 such processes, causes the system to crash and reboot.
I have tried the following:
I have always assumed that pushing more code to the Shared Libraries will help. So when I tried moving many functions into shared library and keeping minimum code in the processes, it made no difference to my surprise. I also removed all arrays and made them use the heap. However it made no difference. This maybe because the processes runs continuously and it makes no difference if it is stack or heap? I suspect the process from I where I call the fork is huge and that is the reason for the processes that I make result being huge. I am not sure how else I could go about. say process A is huge -> I start process B by forking and excec. B inherits A's memory area. So now I do this -> A starts C which inturn starts B will also not help as C still inherits A?. I used vfork as an alternative which did not help either. I do wonder why.
I would appreciate if someone give me tips to help me reduce the memory used by each independent child processes.
Given this is a test tool, then the most efficient thing to do is to add more memory to the testing machine.
Failing that:
How are you measuring memory usage? Some methods don't get accurate results.
Check you don't have any memory leaks. e.g. with Valgrind on Linux x86.
You could try running the different testers in a single process, as different threads, or even multiplexed in a single thread - since the network should be the limiting factor?
exec() will shrink the processes memory size as the new execution gets a fresh memory map.
If you can't add physical memory, then maybe you can add swap, maybe just for testing?
Not technically answering your question, but providing a couple of alternative solutions:
If you are using Linux have you considered using pktgen? It is a flexible tool for sending UDP packets from kernel as fast as the interface allows. This is much faster than a userspace tool.
oh and a shameless plug. I have made a multi-threaded network testing tool, which could be used to spam the network with UDP packets. It can operate in multi-process mode (by using fork), or multi-thread mode (by using pthreads). The pthreads might use less RAM, so might be better for you to use. If anything it might be worth looking at the source as I've spent many years improving this code, and its been able to generate enough packets to saturate a 10gbps interface.
What could be happening is that the fork call in process A requires a significant amount of RAM + swap (if any). Thus, when you call fork() from this process the kernel must reserve enough RAM and swap for the child process to have it's own copy (copy-on-write, actually) of the parent process's writable private memory, namely it's stack and heap. When you call exec() from the child process, that memory is no longer needed and your child process can have it's own, smaller private working set.
So, first thing to make sure is that you don't have more than one process at a time in the state between fork() and exec(). During this state is where the child process must have a duplicate of it's parent process virtual memory space.
Second, try using the overcommit settings which will allow the kernel to reserve more memory than actually exists. These are /proc/sys/vm/overcommit*. You can get away with using overcommit because your child processes only need the extra VM space until they call exec, and shouldn't actually touch the duplicated address space of the parent process.
Third, in your parent process you can allocate the largest blocks using shared memory, rather than the stack or heap, which are private. Thus, when you fork, those shared memory regions will be shared with the child process rather than duplicated copy-on-write.