I am porting a program from GNU/Linux to VxWorks and I am having a problem with fork(): I can't find an alternative. VxWorks' API provides two useful calls, taskSpawn() and rtpSpawn(), to spawn a task/RTP, but these APIs do NOT duplicate the calling process (fork does). Does anyone have an idea about porting or working around fork() on VxWorks?
VxWorks API Reference
If I remember my VxWorks correctly, you can't. fork() requires virtual memory management, something I believe VxWorks 5.5 does not provide, at least not with the full semantics needed to implement fork. (It was added in VxWorks 6, though, if I am not mistaken.)
I don't know anything about VxWorks memory model but it may be impossible to port fork. The reason for this is that when a process is forked, the memory of the original process is copied into the new process. Importantly, the two processes must use the same internal virtual addresses otherwise things like pointers are going to break.
Obviously, the two processes must use different physical addresses, which means that in order to fork, the platform must have a memory management unit (MMU) and the kernel must support a memory model that allows two processes to use the same virtual addresses. This is also why there is no fork-like equivalent for creating a new thread.
In addition to this, copying a large process can be very expensive. So Linux uses what is called copy-on-write. This means all fork does is mark all memory pages read-only. When a write is attempted, an interrupt is generated and only then is the memory page copied.
It is unlikely that a real-time operating system (RTOS) will support copy-on-write, because it means that memory write times are not bounded, which violates the real-time guarantees of the OS.
It is therefore much easier not to support fork at all and just implement APIs for spawning brand-new processes without the duplication.
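If it helps with the port, here is a hedged sketch (not verbatim VxWorks code) of how a fork()+exec() pair is usually replaced on VxWorks 6.x: spawn a brand-new RTP directly. The binary path, priority and stack size are made-up example values, and the exact rtpSpawn() prototype should be checked against rtpLib.h for your VxWorks version.

```c
/* Hedged sketch: spawning a fresh RTP instead of duplicating the caller.
 * Values and the binary path are hypothetical; verify rtpSpawn() in rtpLib.h. */
#include <vxWorks.h>
#include <rtpLib.h>

RTP_ID start_worker_rtp(void)
{
    const char *argv[] = { "/romfs/worker.vxe", "--child", NULL };  /* hypothetical RTP binary */
    const char *envp[] = { NULL };

    /* No duplication of the caller: the new RTP starts fresh at its own entry point. */
    return rtpSpawn(argv[0], argv, envp,
                    100,        /* priority             */
                    0x10000,    /* user stack size      */
                    0,          /* RTP options          */
                    0);         /* initial task options */
}
```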
Related
In the man pages I've been reading, it seems popen, system, etc. tend to call fork(). In turn, fork() copies the process's entire memory state. This seems really heavy, especially when in many situations a child from a call to fork() uses little if any of the memory allocated for the parent.
So, my question is, can I get fork() like behavior without duplicating the whole memory state of the parent process? Or is there something I am missing, such that fork() is not as heavy as it appears (like, maybe calls tend to be optimized to avoid unnecessary memory duplication)?
fork(2) is, like all syscalls, a primitive operation from the point of view of a user-space application (though some C libraries use clone(2) to implement it). It is mostly a single machine instruction, SYSCALL or SYSENTER, to switch from user mode to kernel mode; the (recent versions of the) Linux kernel then does quite significant processing.
It is in practice quite efficient (e.g. less than a millisecond, and sometimes even less than a tenth of it) because the kernel is extensively using lazy copy-on-write techniques to share pages between parent & child processes. The actual copying would happen later, on page faults, when overwriting a shared page.
And forking has a huge advantage, since starting some other program is delegated to execve(2): it is conceptually simple, because the only difference between the parent and child processes is the return value of fork.
BTW, on POSIX systems such as Linux, fork(2) (or the suitable clone(2) equivalent) is the only way to create a process (there are a few weird exceptions that you can generally ignore: the kernel creates some processes, such as /sbin/init, itself), since vfork(2) is obsolete.
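A minimal sketch of that fork() + execve() pattern: the child's only difference from the parent is fork()'s return value, and execve() then replaces its image.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

extern char **environ;

int main(void)
{
    pid_t pid = fork();

    if (pid < 0) {                      /* fork failed */
        perror("fork");
        return EXIT_FAILURE;
    }
    if (pid == 0) {                     /* child: replace this image with /bin/ls */
        char *argv[] = { "ls", "-l", NULL };
        execve("/bin/ls", argv, environ);
        perror("execve");               /* only reached if execve failed */
        _exit(127);
    }

    int status;                         /* parent: wait for the child to finish */
    waitpid(pid, &status, 0);
    printf("child exited with status %d\n", WEXITSTATUS(status));
    return EXIT_SUCCESS;
}
```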
The problem is that to run the main function of a standardly linked executable, you need to call execve, and exec replaces the whole process image and so you need a new address space, which is what fork is for.
You can get around this by having your callee expose its main functionality in a shared library (but then it must not be called main); you can then load the function with the main functionality without having to fork (provided there are no symbol conflicts).
That would be a more efficient alternative to system (basically with the efficiency of a function call).
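For illustration, a hedged sketch of that shared-library approach, assuming the callee exports a hypothetical entry point run_tool() from a hypothetical libtool_impl.so (link with -ldl on glibc):

```c
/* Call the "callee" in-process via dlopen()/dlsym() instead of fork()+exec(). */
#include <stdio.h>
#include <dlfcn.h>

int call_tool(int argc, char **argv)
{
    void *handle = dlopen("libtool_impl.so", RTLD_NOW);   /* hypothetical library */
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    /* look up the exported entry point instead of exec'ing a program's main() */
    int (*run_tool)(int, char **) =
        (int (*)(int, char **))dlsym(handle, "run_tool");
    if (!run_tool) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }

    int rc = run_tool(argc, argv);   /* runs with roughly the cost of a function call */
    dlclose(handle);
    return rc;
}
```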
Now popen involves pipes and to use pipes you need to have the pipe ends in different schedulable units. Threads, which use the same address space, can be used here as a lighter alternative to separate processes.
As you alluded to, fork() is a bit of a mad syscall that has kind of stuck around for historical reasons. There's a great article about its flaws here, and also this post goes into some details and potential workarounds.
Although on Linux fork() is optimised to use copy-on-write for the memory, it's still not "free" because:
It still has to do some memory-related admin (new page tables, etc.)
If you're using RAII (e.g. in C++ or possibly Rust) then all the objects that are copied will be cleaned up twice. That might even lead to logic errors (e.g. deleting temporary files twice).
It's likely that the parent process will keep running, probably modifying lots of its memory, and then it will have to be copied.
The alternatives appear to be:
vfork()
clone()
posix_spawn()
vfork() was created for the common use case of doing fork() and then execve() to run a program. execve() replaces all of the memory of the current process with a new set, so there's no point copying the parent process's memory if you're just about to obliterate it.
So vfork() doesn't do that. Instead it runs in the same memory space as the parent process and pauses it until it gets to execve(). The Linux man page for vfork() says that doing just about anything except vfork() then execve() is undefined behaviour.
posix_spawn() is basically a nice wrapper around vfork() and then execve().
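A minimal posix_spawn() sketch, launching /bin/ls without duplicating the parent's memory:

```c
#include <stdio.h>
#include <stdlib.h>
#include <spawn.h>
#include <sys/wait.h>

extern char **environ;

int main(void)
{
    pid_t pid;
    char *argv[] = { "ls", "-l", NULL };

    /* NULL file actions and attributes: inherit stdio and default settings */
    int rc = posix_spawn(&pid, "/bin/ls", NULL, NULL, argv, environ);
    if (rc != 0) {
        fprintf(stderr, "posix_spawn failed: %d\n", rc);
        return EXIT_FAILURE;
    }

    int status;
    waitpid(pid, &status, 0);   /* reap the spawned child */
    return EXIT_SUCCESS;
}
```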
clone() is similar to fork() but allows you to exactly specify what is copied (file descriptors, memory, etc.). It has a load of options, including one (CLONE_VM) which lets the child process run in the same address space as the parent, which is pretty wild! I guess that is the lightest weight way to make a new process because it doesn't involve any copying of memory at all!
But in practice I think in most situations you should either:
Use threads, or
Use posix_spawn().
(Note, I am just researching this now; I'm not an expert so I might have got some things wrong.)
When a process forks, will the child process have the custom shared library (.so file) in its address space?
If so, will the address of the shared library be the same as or different from that in its parent process (due to ASLR)?
Would a function that runs before main, marked __attribute__((constructor)), be executed again in the child process? What about threads?
Yes, the child will retain the parent's mappings. Ordinarily, Linux's virtual memory system will actually share the page between the two processes, up until either one tries to write new data. At that point, a copy will be made and each process will have its own unique version - at a different physical address but retaining the same virtual address. This is referred to as "copy on write" and is a substantial efficiency and resources advantage over systems which cannot support this, particularly running code which forks frequently.
Address Space Layout Randomization (ASLR) can't apply for libraries or objects which are already allocated virtual addresses, as to do so would break any pointers held anywhere in the code - something that a system running non-managed code can't know enough about to account for.
Since all previously constructed objects already exist in memory, constructors are not called again just because of the fork. Any objects which need to be duplicated because they are being uniquely modified have this done invisibly by the VM system behind the scenes - they don't really know that they are being cloned, and you could very well end up having a pair of objects where part of the implementation continues to share a physical page with identical contents while another part has been invisibly bifurcated into distinct physical pages with differing contents for each process.
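A small demo of that point (assuming GCC/Clang for __attribute__((constructor))): the constructor runs once before main and is not run again in the forked child.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

__attribute__((constructor))
static void init_once(void)
{
    printf("constructor runs in pid %d\n", getpid());
    fflush(stdout);   /* flush so the buffered line is not duplicated across fork() */
}

int main(void)
{
    pid_t pid = fork();
    if (pid == 0) {
        /* the child inherits the already-initialised state; init_once() does not run again */
        printf("child %d sees the inherited state\n", getpid());
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return 0;
}
```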
You also asked about threads, and that is an area where things get complicated. Normally, only the thread which called fork() will exist in live form in the child (though data belonging to the others will exist in shared mappings, since it can't be known what might be shared with the forking thread). If you need to try to fork a multithreaded program, you will need to see the documentation of your threading implementation. For the common pthreads implementation on Linux, pay particular attention to pthread_atfork().
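A hedged pthread_atfork() sketch, using a hypothetical log_lock to show the usual prepare/parent/child pattern (take the lock before fork(), release it on both sides afterwards, so the child never inherits a lock held by a thread that no longer exists there):

```c
#include <pthread.h>

static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;   /* hypothetical lock */

static void before_fork(void) { pthread_mutex_lock(&log_lock); }
static void after_fork(void)  { pthread_mutex_unlock(&log_lock); }  /* runs in parent and child */

int install_fork_handlers(void)
{
    /* prepare runs just before fork(); the other two run just after, in parent and child */
    return pthread_atfork(before_fork, after_fork, after_fork);
}
```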
I'm working with semaphores in C, specifically to control access to a shared memory zone on Linux, but there is one thing that I can't understand.
I am using a mutex to control access to a specific zone because I have 2 processes that must read/write from that zone. The thing is, when we use fork() to create a new child process, the whole program is "copied" into another program as if they were two separate programs, right? So, when I do V(mutex) in one process, how does the other one know it can't access the zone?
I know it's a noob question, but nobody has been able to explain this to me until now.
After the fork neither process is going to know about the memory actions of the other because they are separate copies. You have to put your shared variables in shared memory, including mutexes and semaphores. Then all the processes are operating on the same resource.
For unrelated (i.e. non-forked) process there are usually system facilities (e.g. named semaphores) that each process can open based on a path name or similar method that each can use to find and use the resource.
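For example, a minimal sketch of a named POSIX semaphore that unrelated processes can open by the (hypothetical) name "/demo_sem" (on older glibc, link with -pthread):

```c
#include <fcntl.h>
#include <semaphore.h>
#include <stdio.h>

int use_named_semaphore(void)
{
    /* create (or open, if it already exists) with an initial count of 1 */
    sem_t *sem = sem_open("/demo_sem", O_CREAT, 0600, 1);
    if (sem == SEM_FAILED) {
        perror("sem_open");
        return -1;
    }

    sem_wait(sem);          /* P(): enter the critical section */
    /* ... access the shared resource ... */
    sem_post(sem);          /* V(): leave the critical section */

    sem_close(sem);
    return 0;
}
```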
Your synchronisation objects must be placed in process-shared memory, for example created with mmap(... MAP_ANONYMOUS ...). In addition, they must have the PTHREAD_PROCESS_SHARED attribute set, for example by using pthread_mutexattr_setpshared.
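A minimal sketch of that, assuming the mutex is created in anonymous shared memory before fork() so both processes see the same object:

```c
#include <pthread.h>
#include <sys/mman.h>
#include <stdlib.h>

static pthread_mutex_t *make_shared_mutex(void)
{
    /* place the mutex itself in memory shared across fork() */
    pthread_mutex_t *m = mmap(NULL, sizeof *m, PROT_READ | PROT_WRITE,
                              MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (m == MAP_FAILED)
        return NULL;

    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);  /* usable across processes */
    pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);

    return m;   /* after fork(), parent and child lock/unlock the same mutex */
}
```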
See here:
Semaphores and Mutex for Thread and Process Synchronization
In practice a mutex is most often used between threads, which makes sharing it trivial. For processes, however, the mutex can be stored as part of the shared memory.
For semaphores, however, Linux has a built-in facility that identifies system-wide semaphores by keys. See below.
http://beej.us/guide/bgipc/output/html/multipage/semaphores.html
Or you can use other IPC to sync. Signals, for example.
Hope this helps.
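For completeness, a hedged sketch of the key-based (System V) semaphores the guide describes; error handling is omitted, and both processes must derive the same key:

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* glibc requires the caller to define this union for semctl() */
union semun { int val; struct semid_ds *buf; unsigned short *array; };

int sysv_sem_demo(void)
{
    key_t key   = ftok("/tmp", 'S');                 /* both processes derive the same key */
    int   semid = semget(key, 1, IPC_CREAT | 0600);  /* a set containing one semaphore */

    union semun arg = { .val = 1 };
    semctl(semid, 0, SETVAL, arg);                   /* initialise it to 1 (binary semaphore) */

    struct sembuf down = { 0, -1, 0 };               /* P(): decrement, block if already zero */
    struct sembuf up   = { 0, +1, 0 };               /* V(): increment */

    semop(semid, &down, 1);
    /* ... critical section on the shared memory zone ... */
    semop(semid, &up, 1);
    return 0;
}
```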
I am working over an embedded http server written in C which was originally using fork() for handling each client request.
I switched it to use pthread_create instead of fork().
While comparing memory usage between the fork() and threaded versions, I observed that there is a change in %VSZ utilization as listed by top. The fork() version reports a higher %VSZ than the pthread_create() version.
Can anyone explain why this change is there? As far as I can tell, all the changes I have made are related to creating threads; I can't determine how that changed the virtual memory size of the process.
Basically, fork() creates another process, which means that it gets its own memory space, which in turn means that the memory used is multiplied.
A thread, on the other hand, shares its memory space with the process that created it, so your memory usage will be much smaller, but you have to worry about race conditions and deadlocks if you access the same variable from multiple threads. (That does not happen with processes unless you use shared memory constructs.)
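A rough sketch of the threaded version of such a server, with a hypothetical handle_client() run in a detached thread per accepted connection (no new address space is created per request):

```c
#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

static void *handle_client(void *arg)
{
    int fd = *(int *)arg;
    free(arg);
    /* ... read the request and write the response on fd ... */
    close(fd);
    return NULL;
}

void serve_client(int client_fd)          /* called from the accept() loop */
{
    int *arg = malloc(sizeof *arg);
    if (!arg) {
        close(client_fd);
        return;
    }
    *arg = client_fd;

    pthread_t tid;
    if (pthread_create(&tid, NULL, handle_client, arg) == 0) {
        pthread_detach(tid);              /* no join needed; resources freed on exit */
    } else {
        free(arg);
        close(client_fd);
    }
}
```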
A thread is "lightweight" because most of the overhead has already been accomplished through the creation of its process.
I found this in one of the tutorials.
Can somebody elaborate what it exactly means?
The claim that threads are "lightweight" is - depending on the platform - not necessarily reliable.
An operating system thread has to support the execution of native code, e.g. written in C. So it has to provide a decent-sized stack, usually measured in megabytes. So if you started 1000 threads (perhaps in an attempt to support 1000 simultaneous connections to your server) you would have a memory requirement of 1 GB in your process before you even start to do any real work.
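The megabyte-sized figure is the default stack reservation; a sketch of setting it explicitly per thread, which only mitigates rather than removes the cost (64 KiB is an arbitrary example value):

```c
#include <pthread.h>
#include <limits.h>

int spawn_small_stack_thread(void *(*fn)(void *), void *arg, pthread_t *tid)
{
    pthread_attr_t attr;
    pthread_attr_init(&attr);

    size_t stack = 64 * 1024;            /* arbitrary example size */
    if (stack < PTHREAD_STACK_MIN)       /* never go below the platform minimum */
        stack = PTHREAD_STACK_MIN;
    pthread_attr_setstacksize(&attr, stack);

    int rc = pthread_create(tid, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;
}
```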
This is a real problem in highly scalable servers, so they don't use threads as if they were lightweight at all. They treat them as heavyweight resources. They might instead create a limited number of threads in a pool, and let them take work items from a queue.
As this means that the threads are long-lived and small in number, it might be better to use processes instead. That way you get address space isolation and there isn't really an issue with running out of resources.
In summary: be wary of "marketing" claims made on behalf of threads. Parallel processing is great (increasingly it's going to be essential), but threads are only one way of achieving it.
Process creation is "expensive", because it has to set up a complete new virtual memory space for the process with it's own address space. "expensive" means takes a lot of CPU time.
Threads don't need to do this, just change a few pointers around, so it's much "cheaper" than creating a process. The reason threads don't need this is because they run in the address space, and virtual memory of the parent process.
Every process must have at least one thread. So if you think about it, creating a process means creating the process AND creating a thread. Obviously, creating only a thread will take less time and work by the computer.
In addition, threads are "lightweight" because threads can interact without the need of inter-process communication. Switching between threads is "cheaper" than switching between processes (again, just moving some pointers around). And inter-process communication requires more expensive communication than threads.
Threads within a process share the same virtual memory space, but each has a separate stack, and possibly "thread-local storage" if implemented. They are lightweight because a context switch is simply a case of switching the stack pointer and program counter and restoring other registers, whereas a process context switch involves switching the MMU context as well.
Moreover, communication between threads within a process is lightweight because they share an address space.
process:
process id
environment
working directory
registers
stack
heap
file descriptors
shared libraries
inter-process communication mechanisms (pipes, semaphores, queues, shared memory, etc.)
OS-specific resources
thread:
stack
registers
attributes (for the scheduler, like priority, policy, etc.)
thread-specific data
OS-specific resources
A process contains one or more threads, and a thread can do anything a process can do. Threads within a process share the same address space, so the cost of communication between threads is low: they use the same code section, data section, and OS resources. All of these features make a thread a "lightweight process".
Simply because the threads share a common memory space: the memory allocated in the main thread is shared by all the other child threads.
Whereas in the case of processes, each child process needs to have its own separate memory space allocated.