Run 3 parallel threads on a single-core machine

I have a question to ask. I have a program (Process 1) which has three threads:
Thread 1 runs continuously, receives packets from a local socket (AF_UNIX, non-blocking) and copies them to a buffer.
Thread 2 reads from the buffer and writes the information received to a file (disk).
Thread 3 compresses the file if the file grows larger than 5 MB
There is another process (Process 2) which continuously sends packets to the local socket read by Process 1. The rate can be as high as 3000-5000 packets (of around 100 bytes each) per second. This setup runs on embedded hardware with an ARM v9 controller.
I have to ensure that none of the packets are lost and that all of them are written to disk. With the current implementation, Process 2 gets send errors from "sendto" (Resource unavailable) every now and then.
I disabled all locks and mutexes meant to avoid race conditions (removed all checks that prevent writing during a read and vice versa); even then I get send errors from "sendto".
Then, as a second step, I disabled the writing to disk. Now Thread 1 of Process 1 can read from the local socket as fast as possible, and there are no send errors. My guess is that since the threads run on an ARM controller with no hyperthreading, only one thread executes at any point in time and the OS handles the scheduling of threads.
My question here is,
Is it possible to run the three threads in parallel (each of them executing simultaneously)? Is there a gcc construct or compiler flag which can force the threads to run in parallel (in the foreground)? Can I change something in the program to achieve this without splitting the functionality into multiple programs and using shared memory for the buffer?
Regards,
Anupam

No. You can't force any kind of thread ordering. To your first question: is it possible for them to execute simultaneously? Yes. How can you make it happen? You can't; the operating system decides that. You can set priorities and things like that, but I still think Linux (or Windows) will switch threads fairly unpredictably, without telling you or letting you control the scheduler. Think about all the threads from all the programs running on your computer: which ones can execute, and when? The answer is, who knows! There is no way to tell when your thread will block, even if it's holding a lock (which is probably why you're getting a "resource unavailable" response). So how do you stop this from happening? Make sure you check whether the resource is still locked before trying to use it! Then it doesn't matter when the threads lock a resource.
Also, if it's IPC, why are you using sockets? Why not try a pipe? Then it doesn't matter when you lock it (unless more than one thread writes to the resource at a time).
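A minimal sketch of a pipe carrying messages between two threads of one process (names are illustrative). Writes of up to PIPE_BUF bytes are atomic, which is why short messages need no extra locking:

```c
#include <pthread.h>
#include <string.h>
#include <unistd.h>

static int fds[2];

static void *producer(void *arg) {
    (void)arg;
    const char msg[] = "hello";
    write(fds[1], msg, sizeof msg);       /* 6 bytes incl. the '\0'; atomic */
    return NULL;
}

/* Create the pipe, run the producer, and receive its message.
 * Returns the number of bytes read into buf. */
ssize_t demo(char *buf, size_t cap) {
    pthread_t t;
    if (pipe(fds) != 0)
        return -1;
    pthread_create(&t, NULL, producer, NULL);
    ssize_t n = read(fds[0], buf, cap);   /* blocks until data arrives */
    pthread_join(t, NULL);
    return n;
}
```

The reader simply blocks in read() until data is available, so no explicit synchronization is needed for a single writer.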

In this case the sender is faster than the receiver, so the non-blocking option on the socket may cause the send error (it returns an error whenever the sender would need to block). I have the following two suggestions:
In the sender (Process 2), resend the packets that resulted in a send error.
Or remove the non-blocking option from the socket.
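The first suggestion might look roughly like this (function name and back-off interval are illustrative): retry sendto() whenever it fails with EAGAIN/EWOULDBLOCK, which is what a full receive buffer produces on a non-blocking socket.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Resend until the packet is accepted or a real error occurs.
 * 'sock', 'addr' and 'addrlen' are assumed to be set up elsewhere. */
ssize_t send_reliably(int sock, const void *buf, size_t len,
                      const struct sockaddr *addr, socklen_t addrlen)
{
    for (;;) {
        ssize_t n = sendto(sock, buf, len, 0, addr, addrlen);
        if (n >= 0)
            return n;                    /* sent */
        if (errno == EAGAIN || errno == EWOULDBLOCK) {
            usleep(1000);                /* buffer full: back off 1 ms, retry */
            continue;
        }
        return -1;                       /* real error: give up */
    }
}
```

Note that the back-off effectively turns the non-blocking send into a blocking one, which is why simply removing the non-blocking option is the simpler fix if the sender has nothing else to do.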

Related

Should I close a single fifo that's written to by multiple threads after they are done?

I'm experimenting with a fictional server/client application where the client side launches request threads for a (possibly very large) period of time, with small delays in between. Each request thread writes the contents of the request to the 'public' fifo (known by all client and server threads), and receives the server's answer in a 'private' fifo that is created by the server with a name that is implicitly known (in my case, it's 'tmp/processId.threadId').
The public fifo is opened once in the main (request thread spawner) thread so that all request threads may write to it.
Since I don't care about the return value of my request threads and I can't make sure how many request threads I create (so that I store their ids and join them later), I opted to create the threads in a detached state, exit the main thread when the specified timeout expires and let the already spawned threads live on their own.
All of this is fine, however, I'm not closing the public fifo anywhere after all spawned request threads finish: after all, I did exit the main thread without waiting. Is this a small kind of disaster, in which case I absolutely need to count the active threads (perhaps with a condition variable) and close the fifo when it's 0? Should I just accept that the file is not explicitly getting closed, and let the OS do it?
Supposing that you genuinely mean a FIFO, such as might be created via mkfifo(), no, it's not a particular issue that the process does not explicitly close it. If any open handles on it remain when the process terminates, they will be closed. Depending on the nature of the termination, it might be that pending data are not flushed, but that is of no consequence if the FIFO is used only for communication among the threads of one process.
But it possibly is an issue that the process does not remove the FIFO. A FIFO has filesystem persistence. Once you create one, it lives until it no longer has any links to the filesystem and is no longer open in any process (like any other file). Merely closing it does not cause it to be removed. Aside from leaving clutter on your filesystem, this might cause issues for concurrent or future runs of the program.
If indeed you are using your FIFOs only for communication among the threads of a single process, then you would probably be better served by pipes.
I managed to solve this issue by setting up a cleanup routine with atexit(), which is called when the process terminates, i.e. when all threads have finished their work.
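That cleanup might look roughly like this (the path and function names are illustrative): register an atexit() handler that unlink()s the FIFO, so the filesystem entry is removed however the process exits normally.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

#define PUBLIC_FIFO "/tmp/public_fifo"   /* hypothetical path */

static void cleanup(void)
{
    unlink(PUBLIC_FIFO);        /* remove the FIFO from the filesystem */
}

/* Call once at startup: create the FIFO and register the cleanup
 * handler, which runs when the process exits (exit() or return
 * from main). */
int setup_public_fifo(void)
{
    if (mkfifo(PUBLIC_FIFO, 0666) != 0 && errno != EEXIST)
        return -1;
    atexit(cleanup);
    return 0;
}
```

atexit() handlers do not run if the process is killed by a signal, so long-lived daemons may also want to unlink stale FIFOs at startup (the EEXIST check above tolerates that case).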

Are locked pages inherited by pthreads?

I have a little paging problem on my realtime system, and wanted to know how exactly linux should behave in my particular case.
Among various other things, my application spawns 2 threads using pthread_create(), which operate on a set of shared buffers.
The first thread, let's call it A, reads data from a device, performs some calculations on it, and writes the results into one of the buffers.
Once that buffer is full, thread B will read all the results and send them to a PC via ethernet, while thread A writes into the next buffer.
I have noticed that each time thread A starts writing into a previously unused buffer, I miss some interrupts and lose data (there is an ID in the header of each packet, and if it increments by more than one, I have missed interrupts).
So if I use n buffers, I get exactly n bursts of missed interrupts at the start of my data acquisition (therefore the problem is definitely caused by paging).
To fix this, I used mlock() and memset() on all of the buffers to make sure they are actually paged in.
This fixed my problem, but I was wondering where the correct place in my code to do this would be: in my main application, or in one/both of the threads? (Currently I do it in both threads.)
According to the libc documentation (section 3.4.2 "Locked Memory Details"), memory locks are not inherited by child processes created using fork().
So what about pthreads? Do they behave the same way, or would they inherit those locks from my main process?
Some background information about my system, even though I don't think it matters in this particular case:
It is an embedded system powered by a SoC with a dual-core Cortex-A9 running Linux 4.1.22 with PREEMPT_RT.
The interrupt frequency is 4kHz
The thread priorities (as shown in htop) are -99 for the interrupt, -98 for thread A (both of which are higher than the standard priority of -51 for all other interrupts) and -2 for thread B
EDIT:
I have done some additional tests, calling my page-locking function from different threads (and in main).
If I lock the pages in main() and then try to lock them again in one of the threads, I would expect to see a large number of page faults for main() but none for the thread itself (because the pages should already be locked). However, htop tells a different story: I see a large number of page faults (MINFLT column) for each and every thread that locks those pages.
To me, that would suggest that pthreads actually do have the same limitation as child processes spawned using fork(). And if that is the case, locking them in both threads (but not in main) would be the correct procedure.
Threads share the same memory management context. If a page is resident for one thread, it's resident for all threads in the same process.
The implication of this is that memory locking is per-process, not per-thread.
You are probably still seeing minor faults on the first write because a fault is used to mark the page dirty. You can avoid this by also writing to each page after locking.
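Putting the two points together, a sketch of what the one-time setup in main() might look like (buffer size is illustrative): lock once for the whole process, then write-touch every page so the first real-time write doesn't take a dirtying minor fault.

```c
#include <string.h>
#include <sys/mman.h>

#define BUF_SIZE (16 * 1024)          /* example size */
static char buffer[BUF_SIZE];

/* Locking is per-process, so calling this once in main() covers every
 * thread. The memset write-touches each page so later writes from the
 * real-time threads don't fault. */
int prepare_buffers(void)
{
    if (mlock(buffer, sizeof buffer) != 0)
        return -1;    /* check RLIMIT_MEMLOCK / CAP_IPC_LOCK on failure */
    memset(buffer, 0, sizeof buffer);
    return 0;
}
```

On a PREEMPT_RT system, mlockall(MCL_CURRENT | MCL_FUTURE) at startup is a common alternative when you want everything locked, not just specific buffers.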

pthread pause/wait conditions

I have two threads that are communicating with a serial port. One thread continuously polls the serial port. The polling is done by first writing to the serial port, write("do you have a message"), and then reading the reply. These two steps have to happen consecutively.
I have another thread that is used to send commands over the serial port. So all this thread is doing is writing a message to the port, and then reading a confirmation that the message was sent.
So, while one thread is used to read and another used to write, they both do a read() and write().
My issue is that my current implementation uses a read mutex and a write mutex, which of course means you can get a write, write, read, read ordering, which is not the behavior I want.
I tried using only one lock encompassing each write/read pair, but this causes one thread (the polling thread) to never let the other thread grab the lock, or it only happens randomly, even if I put sleeps after the unlock in one thread.
So, is there a way to pause one thread, say after it's done a read/write combo, and let the other thread do its thing? What's the best way of going about this? I do not want to use pthread_join because 1) it would pause the main code that is creating these threads, and 2) the polling thread is never meant to exit unless there is an error.
Thanks
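One way to get this behaviour without pthread_join is a "command pending" flag that makes the poller back off whenever the command thread wants the port; a minimal sketch under assumed names (the actual serial read/write calls are left as comments):

```c
#include <pthread.h>
#include <stdatomic.h>
#include <unistd.h>

static pthread_mutex_t port_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int command_pending;

/* Polling thread: parks while a command is waiting, so the command
 * thread is never starved of the port lock. */
void poll_once(void)
{
    while (atomic_load(&command_pending))
        usleep(100);                     /* back off until command done */
    pthread_mutex_lock(&port_lock);
    /* write("do you have a message?"); read(reply); */
    pthread_mutex_unlock(&port_lock);
}

/* Command thread: announces itself first (no lock needed for the
 * atomic flag), then takes the port for its write/read pair. */
void send_command(void)
{
    atomic_store(&command_pending, 1);
    pthread_mutex_lock(&port_lock);      /* poller stops re-acquiring */
    /* write(command); read(confirmation); */
    atomic_store(&command_pending, 0);
    pthread_mutex_unlock(&port_lock);
}
```

Each read/write pair stays atomic with respect to the other thread because both pairs happen under the same mutex, while the flag fixes the fairness problem that a bare mutex has on a tight polling loop.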

Using one process to signal multiple other processes "simultaneously"

I have two different applications that have to work together. Process 1 acts as a time source and Process 2 performs actions according to the time source provided by Process 1. I need to run multiple copies of Process 2. The goal is to have one time-source process signaling 5-10 other processes at the same time, so that they all perform their work simultaneously.
Currently, I have this implemented in the following way:
The time-source program starts, creates a shared memory segment, creates an empty list of PIDs, then unlocks the segment.
Each time one of the client programs starts, it goes to the shared memory, adds its own PID to the list, and then unlocks it.
The time source has a timer that goes off every 10 ms. Every time the timer goes off, it cycles through the PID list and sends a signal to everyone in it, back to back.
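The tick step described above might be sketched roughly like this (the struct layout and the choice of SIGUSR1 are assumptions, not the poster's actual code):

```c
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_CLIENTS 16

/* Hypothetical layout of the shared-memory PID list. */
struct pid_list {
    int   count;
    pid_t pids[MAX_CLIENTS];
};

/* Called from the 10 ms timer in the time-source process:
 * signal every registered client back to back. */
void tick(struct pid_list *list)
{
    for (int i = 0; i < list->count; i++)
        kill(list->pids[i], SIGUSR1);
}

/* In each client: a minimal handler that just notes the tick. */
static volatile sig_atomic_t ticked;
static void on_tick(int sig) { (void)sig; ticked = 1; }
```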
This approach mostly works well, but I am hoping that it can be improved. I currently have two sticking points:
Very rarely, the signal delivered to one of the client processes will be skewed by ~2 milliseconds or so. The end result is: | 12ms | 8ms | instead of | 10ms | 10ms |.
The second issue is that all of the client programs are actually multithreaded and doing a lot of work (though only the original thread is responsible for handling the signal). If I have multiple client processes running at once, the delivery of the signals gets more sporadic and skewed, as if they are harder to deliver when the system is more taxed (even if the client process is ready and waiting for the interrupt).
What other approaches should I consider for doing this type of thing? I have considered the following (all in the shared memory segment):
Using volatile uint8_t flags (set by the time-source process, cleared by the client).
Using semaphores, but if the time-source process is running and the client hasn't started yet, how do I keep from incrementing the semaphore over and over?
Condition variables, though there doesn't seem to be a solution that can be used in shared memory between unrelated processes.
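For what it's worth, POSIX condition variables can be used in shared memory between unrelated processes if both the mutex and the condition variable are initialized with the PTHREAD_PROCESS_SHARED attribute; a minimal sketch (struct layout is illustrative):

```c
#include <pthread.h>

/* This struct would live inside the shared memory segment. */
struct shared_sync {
    pthread_mutex_t mtx;
    pthread_cond_t  cv;
    int             tick;   /* incremented by the time source */
};

/* Run once by whichever process creates the segment. */
int init_shared_sync(struct shared_sync *s)
{
    pthread_mutexattr_t ma;
    pthread_condattr_t  ca;

    pthread_mutexattr_init(&ma);
    pthread_mutexattr_setpshared(&ma, PTHREAD_PROCESS_SHARED);
    if (pthread_mutex_init(&s->mtx, &ma) != 0)
        return -1;

    pthread_condattr_init(&ca);
    pthread_condattr_setpshared(&ca, PTHREAD_PROCESS_SHARED);
    if (pthread_cond_init(&s->cv, &ca) != 0)
        return -1;

    s->tick = 0;
    return 0;
}
```

Each client would then lock the mutex, wait on the condvar until `tick` changes, and do its work, while the time source increments `tick` and calls pthread_cond_broadcast() every 10 ms.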
Even if a process is in the waiting state, ready to receive a signal, that does not mean the kernel is going to schedule it immediately, especially when there are more tasks in the running state than there are available CPU cores.
Adjusting the priority (or nice level) of processes and threads will influence the kernel scheduler.
You can also play around with the different schedulers that are available in your kernel, and their parameters.
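As a concrete example of the last point, a process can request the SCHED_FIFO real-time policy so it preempts normal tasks promptly; a sketch (requires root or CAP_SYS_NICE, so the call may fail with EPERM for ordinary users):

```c
#include <sched.h>
#include <stdio.h>

/* Ask the kernel to run the calling process under the FIFO
 * real-time policy at the given static priority. */
int make_realtime(int priority)
{
    struct sched_param sp = { .sched_priority = priority };
    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
        perror("sched_setscheduler");    /* typically EPERM without root */
        return -1;
    }
    return 0;
}
```

Valid priorities range from sched_get_priority_min(SCHED_FIFO) to sched_get_priority_max(SCHED_FIFO) (1 to 99 on Linux).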

Asynchronous File I/O using threads in C

I'm trying to understand how asynchronous file operations are emulated using threads. I've found next to nothing to read about the subject.
Is it possible that:
a process uses a thread to open a regular file (HDD).
the parent gets the file descriptor from the thread, now it may close the thread.
the parent uses the file descriptor with a new thread, reading X bytes from the file.
the parent gets the file descriptor with the seek-position of the current file state.
the parent may repeat these operations, without the need to open, or seek, every time it wishes to "continue" reading a new chunk of the file?
This is just a wild guess of mine; I would appreciate it if anybody could shed more light on how it's emulated efficiently.
UPDATE:
By efficient I actually mean that I don't want the thread to "wait" from the moment the file is opened. Think of an HTTP non-blocking daemon serving a huge file to a client: you want to use a thread to read chunks of the file without blocking the daemon, but you don't want to keep the thread busy "waiting" while the actual transfer takes place; you want to use the thread for other blocking operations of other clients.
To understand asynchronous I/O better, it may be helpful to think in terms of overlapping operations. That is, the number of pending operations (operations that have been started but not yet completed) can simultaneously go above one.
A diagram that explains asynchronous I/O might look like this: http://msdn.microsoft.com/en-us/library/aa365683(VS.85).aspx
If you are using the asynchronous I/O capabilities provided by the underlying operating system, then it is possible to asynchronously read from multiple files without spawning an equal number of threads.
If your underlying operating system does not provide asynchronous I/O, or if you decide not to use it (in other words, you wish to emulate asynchronous operation using only blocking I/O, the regular read/write provided by the operating system), then it is necessary to spawn as many threads as the number of simultaneous I/O operations. This is because when a thread makes a blocking I/O call, it cannot continue executing until the operation finishes. In order to start another blocking I/O operation, that operation has to be issued from another thread that is not already occupied.
When you open/create a file, fire up a thread, and store that thread's ID/pointer as your file handle.
Basically the thread will do nothing except sit in a loop waiting for an "event"; a semaphore would be good here. When you want to do a read, you add the read command to a queue (remember to put a critical section around the queue add), return a unique ID, and then increment the semaphore. If the thread is asleep it will now wake up, grab the first message off the queue and process it. When it has completed, you remove the command from the queue.
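A rough sketch of that worker/queue scheme (all names are illustrative; the actual file I/O and completion signalling are omitted):

```c
#include <pthread.h>
#include <semaphore.h>

#define QUEUE_LEN 64

struct cmd { int id; /* plus offset, length, destination buffer... */ };

static struct cmd queue[QUEUE_LEN];
static unsigned head, tail;
static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
static sem_t pending;               /* sem_init(&pending, 0, 0) at startup */

/* Caller side: enqueue a command and wake the worker. */
void submit(struct cmd c)
{
    pthread_mutex_lock(&qlock);     /* critical section around the add */
    queue[tail++ % QUEUE_LEN] = c;
    pthread_mutex_unlock(&qlock);
    sem_post(&pending);             /* wake the worker if it is asleep */
}

/* Worker thread: sleeps on the semaphore, processes commands in order. */
void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        sem_wait(&pending);         /* sleep until a command arrives */
        pthread_mutex_lock(&qlock);
        struct cmd c = queue[head++ % QUEUE_LEN];
        pthread_mutex_unlock(&qlock);
        /* perform the blocking read for c here, then mark c.id done */
    }
    return NULL;
}
```

The semaphore counts pending commands, so the worker never misses a wake-up even if several commands are submitted while it is busy.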
To poll whether a file read has completed, you can simply check whether it is still in the command queue. If it's not there, the command has completed.
Furthermore, if you want to allow synchronous reads as well, you can wait, after sending the message through, for an "event" triggered by the completion. You then check whether the unique ID is still in the queue; if it isn't, you return control. If it still is, you go back to a wait state until the relevant unique ID has been processed.
