A thread is "lightweight" because most of the overhead has already been accomplished through the creation of its process.
I found this in one of the tutorials.
Can somebody elaborate on what exactly it means?
The claim that threads are "lightweight" is - depending on the platform - not necessarily reliable.
An operating system thread has to support the execution of native code, e.g. written in C. So it has to provide a decent-sized stack, usually measured in megabytes. So if you started 1000 threads (perhaps in an attempt to support 1000 simultaneous connections to your server) you would have a memory requirement of 1 GB in your process before you even start to do any real work.
This is a real problem in highly scalable servers, so they don't use threads as if they were lightweight at all. They treat them as heavyweight resources. They might instead create a limited number of threads in a pool, and let them take work items from a queue.
As this means that the threads are long-lived and small in number, it might be better to use processes instead. That way you get address space isolation and there isn't really an issue with running out of resources.
In summary: be wary of "marketing" claims made on behalf of threads. Parallel processing is great (increasingly it's going to be essential), but threads are only one way of achieving it.
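To make the per-thread stack cost concrete, here is a minimal sketch (assuming POSIX threads; the 64 KiB figure is an illustrative choice, not a recommendation) of how a server that really must create many threads can shrink the default stack with pthread_attr_setstacksize():

    #include <pthread.h>
    #include <stdio.h>

    static void *worker(void *arg)
    {
        /* ... handle one connection ... */
        return NULL;
    }

    int main(void)
    {
        pthread_attr_t attr;
        pthread_attr_init(&attr);

        /* Request a small stack instead of the multi-megabyte default.
           The value is illustrative; it must cover your deepest call chain
           and be at least PTHREAD_STACK_MIN. */
        pthread_attr_setstacksize(&attr, 64 * 1024);

        pthread_t tid;
        if (pthread_create(&tid, &attr, worker, NULL) != 0) {
            fprintf(stderr, "pthread_create failed\n");
            return 1;
        }

        pthread_attr_destroy(&attr);
        pthread_join(tid, NULL);
        return 0;
    }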
Process creation is "expensive" because it has to set up a complete new virtual memory space for the process, with its own address space. Here, "expensive" means it takes a lot of CPU time.
Threads don't need to do this, just change a few pointers around, so creating one is much "cheaper" than creating a process. The reason threads don't need this is that they run in the address space and virtual memory of their parent process.
Every process must have at least one thread. So if you think about it, creating a process means creating the process AND creating a thread. Obviously, creating only a thread takes less time and work from the computer.
In addition, threads are "lightweight" because they can interact without the need for inter-process communication: they communicate through the memory they share. Switching between threads is also "cheaper" than switching between processes (again, just moving some pointers around).
Threads within a process share the same virtual memory space, but each has a separate stack, and possibly "thread-local storage" if implemented. They are lightweight because a context switch is simply a case of switching the stack pointer and program counter and restoring the other registers, whereas a process context switch involves switching the MMU context as well.
Moreover, communication between threads within a process is lightweight because they share an address space.
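As a minimal illustration of that shared address space (a sketch assuming POSIX threads; the names are placeholders), two threads can read and write the same global variable without any IPC machinery:

    #include <pthread.h>
    #include <stdio.h>

    int shared_counter = 0;   /* lives in the process's data segment */
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

    static void *increment(void *arg)
    {
        pthread_mutex_lock(&lock);    /* shared data still needs locking */
        shared_counter++;
        pthread_mutex_unlock(&lock);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, increment, NULL);
        pthread_create(&t2, NULL, increment, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("%d\n", shared_counter);   /* prints 2: both threads saw the same variable */
        return 0;
    }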
process:
process id
environment
current working directory
registers
stack
heap
file descriptors
shared libraries
inter-process communication mechanisms (pipes, semaphores, queues, shared memory, etc.)
OS-specific resources
thread:
stack
registers
attributes (used by the scheduler: priority, policy, etc.)
thread-specific data
OS-specific resources
A process contains one or more threads, and a thread can do anything a process can do. Threads within a process share the same address space, so the cost of communication between threads is low: they use the same code section, data section, and OS resources. All of these features make a thread a "lightweight process".
Simply because threads share a common memory space: memory allocated in the main thread is visible to all other threads of the process.
With processes, by contrast, each child process has to be given its own separate memory space.
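As a sketch of that difference (plain POSIX; the variable name is illustrative): after fork() the child works on its own copy of the parent's memory, so a write in the child is invisible to the parent, unlike the thread case above:

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int value = 1;

    int main(void)
    {
        pid_t pid = fork();
        if (pid == 0) {        /* child: writes to its own copy */
            value = 42;
            return 0;
        }
        wait(NULL);            /* wait for the child to finish */
        printf("%d\n", value); /* prints 1: the parent's copy is untouched */
        return 0;
    }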
I have a little paging problem on my realtime system, and wanted to know how exactly Linux should behave in my particular case.
Among various other things, my application spawns 2 threads using pthread_create(), which operate on a set of shared buffers.
The first thread, let's call it A, reads data from a device, performs some calculations on it, and writes the results into one of the buffers.
Once that buffer is full, thread B will read all the results and send them to a PC via Ethernet, while thread A writes into the next buffer.
I have noticed that each time thread A starts writing into a previously unused buffer, I miss some interrupts and lose data (there is an ID in the header of each packet, and if that increments by more than one, I have missed interrupts).
So if I use n buffers, I get exactly n bursts of missed interrupts at the start of my data acquisition (therefore the problem is definitely caused by paging).
To fix this, I used mlock() and memset() on all of the buffers to make sure they are actually paged in.
This fixed my problem, but I was wondering where in my code the correct place to do this would be: in my main application, or in one/both of the threads? (Currently I do it in both threads.)
According to the libc documentation (section 3.4.2 "Locked Memory Details"), memory locks are not inherited by child processes created using fork().
So what about pthreads? Do they behave the same way, or would they inherit those locks from my main process?
Some background information about my system, even though I don't think it matters in this particular case:
It is an embedded system powered by a SoC with a dual-core Cortex-A9, running Linux 4.1.22 with PREEMPT_RT.
The interrupt frequency is 4 kHz.
The thread priorities (as shown in htop) are -99 for the interrupt, -98 for thread A (both of which are higher than the standard priority of -51 for all other interrupts), and -2 for thread B.
EDIT:
I have done some additional tests, calling my page locking function from different threads (and in main()).
If I lock the pages in main() and then try to lock them again in one of the threads, I would expect to see a large number of page faults for main() but none for the thread itself (because the pages should already be locked). However, htop tells a different story: I see a large number of page faults (MINFLT column) for each and every thread that locks those pages.
To me, that suggests that pthreads actually do have the same limitation as child processes spawned using fork(). And if that is the case, locking the pages in both threads (but not in main()) would be the correct procedure.
Threads share the same memory management context. If a page is resident for one thread, it's resident for all threads in the same process.
The implication of this is that memory locking is per-process, not per-thread.
You are probably still seeing minor faults on the first write because a fault is used to mark the page dirty. You can avoid this by also writing to each page after locking.
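A minimal sketch of that advice (buffer count and size are illustrative): lock the region once, then write to every page so the first real-time write never takes a fault:

    #include <string.h>
    #include <sys/mman.h>

    #define NUM_BUFFERS 4
    #define BUF_SIZE    (1 << 20)

    static char buffers[NUM_BUFFERS][BUF_SIZE];

    /* Call once, e.g. from main() before starting the real-time threads;
       since locking is per-process, one call covers all threads. */
    static int prefault_buffers(void)
    {
        if (mlock(buffers, sizeof buffers) != 0)
            return -1;
        /* Touch every page with a write so it is resident AND dirty,
           avoiding the minor fault otherwise taken on first write. */
        memset(buffers, 0, sizeof buffers);
        return 0;
    }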
Is a thread dynamically allocated memory?
I have been researching and have a fair understanding of threads and how they are used. I have specifically looked at the POSIX API for threads.
I am trying to understand thread creation and how it differs from a simple malloc call.
I understand that threads share certain memory segments with the parent process, but each has its own stack.
Any resources I can read on this topic would be appreciated. Thanks!
Thread creation and a malloc() call are completely different concepts. A malloc() call dynamically allocates a requested chunk of bytes from the heap for the use of the program.
A thread, on the other hand, can be considered a "lightweight process". A thread is an entity within a process, and every process has at least one thread to carry out its execution. The threads of a process share the process's virtual address space and all of its resources. When you create new threads in a process, each new thread gets its own user stack and is scheduled independently by the scheduler. For threads to run concurrently, each also has its own context, which stores the state of the thread just before preemption, i.e. the contents of all the registers.
"Is a thread dynamically allocated memory?"
No, nothing of the sort. Threads have memory uniquely associated with them -- at least a stack -- but such memory is not the thread itself.
"I am trying to understand thread creation and how it differs from a simple malloc call."
New thread creation is not even the same kind of thing as memory allocation. The two are not at all comparable.
Threading implementations that have direct OS support (not all do) are unlikely to rely on the C library to obtain memory for their stack, kernel data structures, or any other thread-implementation-associated data. On the other hand, those that do not have OS support, such as Linux's old "green" threads, are more likely to allocate memory via the C library. Even threading implementations without direct OS support have the option of using a system call to obtain the memory they need, just as malloc() itself must do. In any case, the memory obtained is not itself the thread.
Note also that the difference between threading systems with and without OS support is orthogonal to the threading API. For example, Linux's green threads and the now-ubiquitous, kernel-supported NPTL threads both implement the POSIX thread API.
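To make the distinction concrete, here is a hedged sketch (POSIX; the sizes and names are illustrative) in which the program itself allocates a thread's stack with posix_memalign() and hands it to pthread_create() via pthread_attr_setstack(); the dynamically allocated memory and the thread remain clearly separate things:

    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define STACK_SIZE (256 * 1024)   /* illustrative; must be >= PTHREAD_STACK_MIN */

    static void *worker(void *arg)
    {
        puts("running on a caller-allocated stack");
        return NULL;
    }

    int main(void)
    {
        void *stack;
        /* Allocate page-aligned memory to serve as the stack. */
        if (posix_memalign(&stack, 4096, STACK_SIZE) != 0)
            return 1;

        pthread_attr_t attr;
        pthread_attr_init(&attr);
        pthread_attr_setstack(&attr, stack, STACK_SIZE);   /* the memory... */

        pthread_t tid;
        pthread_create(&tid, &attr, worker, NULL);         /* ...and the thread */
        pthread_join(tid, NULL);

        pthread_attr_destroy(&attr);
        free(stack);   /* safe only after the thread has been joined */
        return 0;
    }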
I am porting a program from GNU/Linux to VxWorks, and I am having a problem with fork(): I can't find an alternative. VxWorks' API provides two useful calls, taskSpawn() and rtpSpawn(), to spawn a task or RTP, but these APIs do NOT duplicate the calling process (fork() does). Does anyone have an idea for porting or working around fork() on VxWorks?
VxWorks API Reference
If I remember my VxWorks correctly, you can't. fork() requires virtual memory management, something I believe VxWorks 5.5 does not provide, at least not with the full semantics needed to implement fork() (it was added in VxWorks 6, though, if I am not mistaken).
I don't know anything about VxWorks' memory model, but it may be impossible to port fork(). The reason is that when a process is forked, the memory of the original process is copied into the new process. Importantly, the two processes must use the same internal virtual addresses, otherwise things like pointers would break.
Obviously, the two processes must use different physical addresses, which means that forking requires a platform with a memory management unit (MMU) and a kernel memory model that lets different processes use the same virtual addresses. This is also why there is no fork()-like call for creating a new thread: threads share one address space, so fork()'s copy semantics don't apply.
In addition to this, copying a large process can be very expensive, so Linux uses what is called copy-on-write. This means all fork() does is mark the memory pages read-only; when a write is attempted, a page fault is raised, and only then is the memory page actually copied.
It is unlikely that a real-time operating system (RTOS) will support copy-on-write, because it means that memory write times are not bounded, which violates the realtime guarantees of the OS.
It is therefore much easier not to support fork() at all and to just implement APIs for spawning brand-new processes without the duplication.
I am trying to work out a way to synchronize two processes which share data.
Basically I have two processes linked using shared memory. I need process A to set some data in the shared memory area, then process B to read that data and act on it.
The sequence of events I am looking to have is:
B blocks waiting for data available signal
A writes data
A signals data available
B reads data
B blocks waiting for data not available signal
A signals data not available
All goes back to the beginning.
In other terms, B would block until it got a "1" signal, get the data, then block again until that signal went to "0".
I have managed to emulate it OK using purely shared memory, but either I block using a while loop which consumes 100% of CPU time, or I use a while loop with a nanosleep in it which sometimes misses some of the signals.
I have tried using semaphores, but I can only find a way to wait for a zero, not for a one, and trying to use two semaphores just didn't work. I don't think semaphores are the way to go.
There will be numerous processes all accessing the same shared memory area, and all processes need to be notified when that shared memory has been modified.
It's basically trying to emulate a hardware data and control bus, where events are edge rather than level triggered. It's the transitions between states I am interested in, rather than the states themselves.
So, any ideas or thoughts?
Linux has its own eventfd(2) facility that you can incorporate into your normal poll/select loop. You can pass the eventfd file descriptor from process to process through a UNIX socket the usual way, or just inherit it with fork(2).
Edit 0:
After re-reading the question, I think one of your options is signals and process groups: start your "listening" processes under the same process group (setpgid(2)), then signal them all with a negative pid argument to kill(2) or sigqueue(2). Again, Linux provides signalfd(2) for polling and avoiding slow signal trampolines.
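A minimal sketch of the eventfd(2) approach (error handling omitted; in a real program the descriptor would be inherited across fork(2) or passed over a UNIX socket, and watched from a poll()/select() loop):

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/eventfd.h>
    #include <unistd.h>

    int main(void)
    {
        int efd = eventfd(0, 0);       /* counter starts at 0 */

        /* Writer side (process A, after filling shared memory): */
        uint64_t one = 1;
        write(efd, &one, sizeof one);  /* "data available" */

        /* Reader side (process B): read() blocks while the counter is zero,
           then returns its value and resets it to zero. */
        uint64_t val;
        read(efd, &val, sizeof val);
        printf("woke up, counter was %llu\n", (unsigned long long)val);

        close(efd);
        return 0;
    }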
If 2 processes are involved, you can use a file, shared memory, or even networking to pass the flag or signal. But if more processes are involved, there may be suitable solutions that involve modifying the kernel. There is one shared memory area in your question, right? How are the signals passed now?
In Linux, all POSIX control structures (mutexes, condition variables, read-write locks, semaphores) have an option so that they can also be used between processes if they reside in shared memory. For the setup that you describe, a classic mutex/condition-variable pair seems to fit the job well. Look into the man pages of the ..._init functions for these structures.
Linux also has lower-level utilities, such as futex, to handle this even more efficiently, but these are probably not the right tools to start with.
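As a sketch of the process-shared option described above (the struct layout and names are illustrative, and mapping the struct into shared memory with shm_open()/mmap() is left out), the key detail is setting PTHREAD_PROCESS_SHARED on the attributes before the ..._init calls:

    #include <pthread.h>

    /* This struct is assumed to live in shared memory. */
    struct shared_area {
        pthread_mutex_t mtx;
        pthread_cond_t  cond;
        int             data_available;   /* the "level" whose transitions matter */
    };

    /* Run once by the process that creates the shared segment. */
    void init_shared(struct shared_area *s)
    {
        pthread_mutexattr_t ma;
        pthread_condattr_t  ca;

        pthread_mutexattr_init(&ma);
        pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
        pthread_mutex_init(&s->mtx, &ma);
        pthread_mutexattr_destroy(&ma);

        pthread_condattr_init(&ca);
        pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
        pthread_cond_init(&s->cond, &ca);
        pthread_condattr_destroy(&ca);

        s->data_available = 0;
    }

    /* Process B: block until the 0 -> 1 transition. */
    void wait_for_data(struct shared_area *s)
    {
        pthread_mutex_lock(&s->mtx);
        while (!s->data_available)
            pthread_cond_wait(&s->cond, &s->mtx);
        pthread_mutex_unlock(&s->mtx);
    }

    /* Process A: publish data and wake every waiting process. */
    void signal_data(struct shared_area *s)
    {
        pthread_mutex_lock(&s->mtx);
        s->data_available = 1;
        pthread_cond_broadcast(&s->cond);
        pthread_mutex_unlock(&s->mtx);
    }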
1 Single Reader & Single Writer
This can be implemented using semaphores.
In the POSIX semaphore API, sem_wait() blocks while the semaphore's count is zero; once the count is incremented using sem_post() from the other process, the wait finishes and the count is decremented again.
In this case you have to use 2 semaphores for synchronization (initialize sem1 to 0 and sem2 to 1, so that the writer runs first):
process 1 (reader):

    sem_wait(sem1);   /* block until the writer has posted new data */
    .......           /* read the shared data */
    sem_post(sem2);   /* hand the buffer back to the writer */

process 2 (writer):

    sem_wait(sem2);   /* wait until the buffer is free */
    .......           /* write the shared data */
    sem_post(sem1);   /* signal the reader that data is available */
In this way you can achieve synchronization in shared memory.
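A self-contained version of that handshake using named POSIX semaphores (a sketch: the names /sem_data and /sem_free are illustrative, and error handling is trimmed), with sem1 starting at 0 and sem2 at 1 as noted above:

    #include <fcntl.h>
    #include <semaphore.h>

    /* Both processes open the same named semaphores; the initial values
       (0 = "no data yet", 1 = "buffer free") take effect on first creation. */

    void reader(void)
    {
        sem_t *sem_data = sem_open("/sem_data", O_CREAT, 0600, 0);
        sem_t *sem_free = sem_open("/sem_free", O_CREAT, 0600, 1);

        sem_wait(sem_data);   /* block until the writer posts */
        /* ... read the shared data ... */
        sem_post(sem_free);   /* hand the buffer back */
    }

    void writer(void)
    {
        sem_t *sem_data = sem_open("/sem_data", O_CREAT, 0600, 0);
        sem_t *sem_free = sem_open("/sem_free", O_CREAT, 0600, 1);

        sem_wait(sem_free);   /* wait until the buffer is free */
        /* ... write the shared data ... */
        sem_post(sem_data);   /* wake the reader */
    }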
How does one implement a multithreaded, single-process model in Linux (Fedora) in C, where a single scheduler thread runs on a "main" core watching I/O availability (e.g. TCP/IP, UDP), and a single thread per core (started at init), the "execution thread", parses the data and then writes a small amount of updated information to a shared memory space? (It is my understanding that pthreads share data within a single process.)
I believe my options are:
pthreads, or the Linux OS scheduler
I have a naive model in mind, consisting of a certain number of these execution threads plus a single scheduler thread.
What is the best solution one could think of, given that I can use this sort of model?
Completing Benoit's answer: in order to communicate between your master and your worker threads, you could use a condition variable. The workers do something like this:
    while (true)
    {
        pthread_mutex_lock(workQueueMutex);
        while (workQueue.empty())
            pthread_cond_wait(workQueueCond, workQueueMutex);
        /* if we get here then (a) we have work and (b) we hold workQueueMutex */
        work = pop(workQueue);
        pthread_mutex_unlock(workQueueMutex);
        /* do work */
    }
and the master:
    /* I/O received */
    pthread_mutex_lock(workQueueMutex);
    push(workQueue, work);
    pthread_cond_signal(workQueueCond);
    pthread_mutex_unlock(workQueueMutex);
This would wake up one idle worker to process the request immediately. If no worker is available, the work simply stays in the queue and is processed later.
Modifying the Linux scheduler is quite a tough job; I would just forget about it. Pthreads are usually preferred. If I understand correctly, you want one core dedicated to the control plane, and a pool of other cores dedicated to data-plane processing? Then create a pool of threads from your master thread and set the core affinity of these worker threads with pthread_setaffinity_np(...), as sketched below.
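A minimal sketch of that affinity setup (Linux/glibc-specific; the assumption that core 0 is reserved for the master thread is mine): pin each worker thread to its own core after creating it:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    static void *worker(void *arg)
    {
        /* ... pull work items from the queue, as in the snippet above ... */
        return NULL;
    }

    /* Create one worker per data-plane core and pin it there. */
    int spawn_pinned_workers(pthread_t *tids, int ncores)
    {
        for (int core = 1; core < ncores; core++) {
            pthread_create(&tids[core], NULL, worker, NULL);

            cpu_set_t set;
            CPU_ZERO(&set);
            CPU_SET(core, &set);
            pthread_setaffinity_np(tids[core], sizeof set, &set);
        }
        return 0;
    }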
Indeed, threads of a process share the same address space, and global variables are accessible by any thread of that process.
It looks to me like you have a version of the producer-consumer problem, with a single consumer aggregating the results of n producers. This is a pretty standard problem, so I definitely think pthreads are more than enough for you. You don't need to go and mess around with the scheduler.
As one of the answers states, a thread-safe queue like the one described here works nicely for this sort of issue. Your original idea of spawning a bunch of threads is a good one. You seem to be worried that the threads' ability to share global state will cause you problems. I don't think that's an issue if you keep shared state to a minimum and use a sane locking discipline. Sharing state is fine as long as you do so responsibly.
Finally, unless you really know what you're doing, I would advise against manually messing with thread affinity. Just spawn the threads and let the scheduler handle when and on what core a thread runs. The thing to optimize is the number of threads you use. One for each core may not actually be the fastest approach if other threads are running.
Generally speaking, this is more or less exactly what the POSIX select() and Linux-specific epoll facilities are for.
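For completeness, a skeletal epoll loop for the master thread (Linux-specific; listen_fd is an assumed, already-created socket, and error handling is omitted):

    #include <sys/epoll.h>

    void master_loop(int listen_fd)
    {
        int epfd = epoll_create1(0);

        struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
        epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

        struct epoll_event events[64];
        for (;;) {
            int n = epoll_wait(epfd, events, 64, -1);  /* block until I/O is ready */
            for (int i = 0; i < n; i++) {
                /* read the data, push a work item onto the queue, and
                   pthread_cond_signal() a worker as shown earlier */
            }
        }
    }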