Why is forking slowing down my application

Why is forking slowing down my application - c

My application takes a checkpoint every few 100 milliseconds by using the fork system call. However, I notice that my application slows down significantly when using checkpointing (forking). I tested the time taken by fork call and it came out to be 1 to 2 ms. So why is fork slowing down my application so much. Note that I only keep 1 checkpoint (forked process) at a time and kill the previous checkpoint whenever I take a new one. Also, my computer has a huge RAM.
Notice that my forked process just sleeps after creation. It is only awoken when rollback needs to be done. So, it should not be scheduled by the OS. One thing that comes to my mind is that since fork is a copy-on-write mechanism, there are page faults occuring whenever my application modifies a page. But should that slow down the application significantly? Without checkpointing (forking), my application finishes in approximately 3.1 seconds and with it, it takes around 3.7 seconds. Any idea, what is slowing down my application?

You are probably observing the cost of the copy-on-write mechanism, as you hypothesize. That's actually quite expensive -- it is the reason vfork still exists. (The main cost is not the extra page faults themselves, but the memcpy of each page as it is touched, and the associated cache and TLB flushes.) It's not showing up as a cost of fork because the page faults don't happen inside the system call.
You can confirm the hypothesis by looking at the times reported by getrusage -- if this is correct, the extra time elapsed should be nearly all "system" time (CPU burnt inside the kernel). oprofile or perf will let you pin down the problem more specifically... if you can get them to work at all, which is nontrivial, alas.
Unfortunately, copy-on-write is also the reason why your checkpoint mechanism works in the first place. Can you get away with taking checkpoints at longer intervals? That's the only quick fix I can think of.

I suggest using oprofile to find out.
oprofile is believed to be able to profile a system (and not only a single process).
You could compare with what other checkpointing packages do, e.g. BLCR

Forking is by nature very expensive, as you're creating a copy of the existing process as an entirely new process. If speed is important to you, you should use threads.
Additionally, you say that the forked process sleeps until a 'rollback' is needed. I'm not sure what you mean by rollback, but provided its something that you can put in a function, you ought to just place it in a function and then create a thread that just runs that function and exits when you detect the need for the rollback. As an added bonus, if you use that method you only create the thread if you need it.

Related

What is best way to have several child processes in C

Working on a project that request to download about 300 pics from different locations by using wget every 20 minutes.
I wrote a C program that reads the database for all the Ids and locations into an array.
For each entry in the array, I call the external wget command to download it.
It works but is slow because it is doing one by one.
My thinking is to use either Multi-process, multi-thread or openMP to create several children.
Any suggestion for how to do this is appreciate.

Multiple Processes
An error in one process cannot crash another process. This is particularly useful when you will host third-party code (e.g. plugins), and this is the approach that (among others) Google Chrome takes. The disadvantage is that N processes use more system resources than N threads.
Multiple Threads
Uses fewer system resources than an equivalent number of processes. Thread programming is more error prone for many developers, and an error in one thread can affect other threads.
Best Option
For what you are doing, you are unlikely to see a significant difference in resource utilization. Use whichever model you can write fast in high quality.

Personally I would go for multi process. The wget's do not need to share any memory or communicate (other than an exit status which is only needed by the root) so a thread will not provide any additional benefit (in my opinion). As well as this creating them as processed allows the OS scheduler to best decide when to run each process.

Programming a relatively large, threaded application for old systems

Today my boss and I were having a discussion about some code I had written. My code downloads 3 files from a given HTTP/HTTPS link. I had multi-threaded the download so that all 3 files are downloading simultaneously in 3 separate threads. During this discussion, my boss tells me that the code is going to be shipped to people who will most likely be running old hardware and software (I'm talking Windows 2000).
Until this time, I had never considered how a threaded application would scale on older hardware. I realize that if the CPU has only 1 core, threads are useless and may even worsen performance. I have been wondering if this download task is an I/O operation. Meaning, if an API is blocked waiting for information from the HTTP/HTTPS server, will another thread that wants to do some calculation be scheduled meanwhile? Do older OSes do such scheduling?
Another thing he said: Since the code is going to be run on old machines, my application should not eat the CPU. He said use Sleep() calls after CPU intensive tasks to allow other programs some breathing space. Now I was always under the impression that using Sleep() is terrible in any program. Am I wrong? When is using Sleep() justified?
Thanks for looking!

I have been wondering if this download task is an I/O operation.
Meaning, if an API is blocked waiting for information from the
HTTP/HTTPS server, will another thread that wants to do some
calculation be scheduled meanwhile? Do older OSes do such scheduling?
Yes they do. That's the joke of having blocked IO. The thread is suspended and other calculations (threads) take place until an event wakes up the blocked thread. That's why it makes completely sense to split it up into threads even for single core machines instead of doing some poor man scheduling between the downloads yourself in a single thread.
Of course your downloads affect each other regarding bandwith, so threading won't help to speedup the download :-)
Another thing he said: Since the code is going to be run on old
machines, my application should not eat the CPU. He said use Sleep()
calls after CPU intensive tasks to allow other programs some breathing
space.
Actually using sleep AFTER the task finished won't help here. Doing Sleep after a certain time of calculation (doing sort of time slicing) before going on with the calculation could help. But this is only true for cooperative systems (e.g. like Windows 3.11). This does not play a role for preemptive systems where the scheduler uses time slicing to allocate calculation time to threads. Here it would be more important to think about lowering the priority for CPU intensive tasks in order to give other tasks precedence...
Now I was always under the impression that using Sleep() is terrible
in any program. Am I wrong? When is using Sleep() justified?
This really depends on what you are doing. If you implement sort of busy waiting for a certain flag being set which is set maybe after few seconds it's better to recheck if it's set after going to sleep for a while in order to give up your scheduled time slice instead of just buring CPU power with checking for a flag never being set.
In modern systems there is no sense in introducing Sleep in a calculation as it will only slow down your calculation.
Scheduling is subject to the OS's scheduler. He's the one with the "big picture". In my opinion every approach to "do it better" is only valid inside the scope of a specific application where you have the overview over certain relationships that are not obvious to the scheduler.
Addendum:
I did some research and found that Windows supports preemptive multitasking from Windows 95. The Windows NT-line (where Windows 2000 belongs to) always supported preemptive multitasking.

Polling a database versus triggering program from database?

I have a process wherein a program running in an application server must access a table in an Oracle database server whenever at least one row exists in this table. Each row of data relates to a client requesting some number crunching performed by the program. The program can only perform this number crunching serially (that is, for one client at a time rather than multiple clients in parallel).
Thus, the program needs to be informed of when data is available in the database for it to process. I could either
have the program poll the database, or
have the database trigger the program.
QUESTION 1: Is there any conventional wisdom why one approach might be better than the other?
QUESTION 2: I wonder if programs have any issues "running" for months at a time (would any processes in the server stop or disrupt the program from running? -- if so I don't know how I'd learn there was a problem unless from angry customers). Anyone have experience running programs on a server for a long time without issues? Or, if the server does crash, is there a way to auto-start a (i.e. C language executable) program on it after the server re-boots, thus not requiring a human to start it specifically?
Any advice appreciated.
UPDATE 1: Client is waiting for results, but a couple seconds additional delay (from polling) isn't a deal breaker.

I would like to give a more generic answer...
There is no right answer that applies every time. Some times you need a trigger, and some times is better to poll.
But… 9 out of 10 times, polling is much more efficient, safe and fast than triggering.
It's really simple. A trigger needs to instantiate a single program, of whatever nature, for every shot. That is just not efficient most of the time. Some people will argue that that is required when response time is a factor, but even then, half of the times polling is better because:
1) Resources: With triggers, and say 100 messages, you will need resources for 100 threads, with 1 thread processing a packet of 100 messages you need resources for 1 program.
2) Monitoring: A thread processing packets can report time consumed constantly on a defined packet size, clearly indicating how it is performing and when and how is performance being affected. Try that with a billion triggers jumping around…
3) Speed: Instantiating threads and allocating their resources is very expensive. And don’t get me started if you are opening a transaction for each trigger. A simple program processing a say 100 meessage packet will always be much faster that initiating 100 triggers…
3) Reaction time: With polling you can not react to things on line. So, the only exception allowed to use polling is when a user is waiting for the message to be processed. But then you need to be very careful, because if you have lots of clients doing the same thing at the same time, triggering might respond LATER, than if you where doing fast polling.
My 2cts. This has been learned the hard way ..

1) have the program poll the database, since you don't want your database to be able to start host programs (because you'd have to make sure that only "your" program can be started this way).
The classic (and most convenient IMO) way for doing this in Oracle would be through the DBMS_ALERT package.
The first program would signal an alert with a certain name, passing an optional message. A second program which registered for the alert would wait and receive it immediatly after the first program commits. A rollback of the first program would cancel the alert.
Of cause you can have many sessions signaling and waiting for alerts. However, an alert is a serialization device, so if one program signaled an alert, other programs signaling the same alert name will be blocked until the first one commits or rolls back.
Table DBMS_ALERT_INFO contains all the sessions which have registered for an alert. You can use this to check if the alert-processing is alive.
2) autostarting or background execution depends on your host platform and OS. In Windows you can use SRVANY.EXE to run any executable as a service.

I recommend using a C program to poll the database and a utility such as monit to restart the C program if there are any problems. Your C program can touch a file once in a while to indicate that it is still functioning properly, and monit can monitor the file. Monit can also check the process directly and make sure it isn't using too much memory.
For more information you could see my answer of this other question:
When a new row in database is added, an external command line program must be invoked
Alternatively, if people aren't sitting around waiting for the computation to finish, you could use a cron job to run the C program on a regular basis (e.g. every minute). Then monit would be less needed because your C program will start and stop all the time.

You might want to look into Oracle's "Change Notification":
http://docs.oracle.com/cd/E11882_01/appdev.112/e25518/adfns_cqn.htm
I don't know how well this integrates with a "regular" C program though.
It's also available through .Net and Java/JDBC
http://docs.oracle.com/cd/E11882_01/win.112/e23174/featChange.htm
http://docs.oracle.com/cd/E11882_01/java.112/e16548/dbchgnf.htm

There are simple job managers like gearman that you can use to send a job message from the database to a worker. Gearman has among others a MySQL user defined function interface, so it is probably easy to build one for oracle as well.

Linux automatically restarting application on crash - Daemons

I have an system running embedded linux and it is critical that it runs continuously. Basically it is a process for communicating to sensors and relaying that data to database and web client.
If a crash occurs, how do I restart the application automatically?
Also, there are several threads doing polling(eg sockets & uart communications). How do I ensure none of the threads get hung up or exit unexpectedly? Is there an easy to use watchdog that is threading friendly?

You can seamlessly restart your process as it dies with fork and waitpid as described in this answer. It does not cost any significant resources, since the OS will share the memory pages.
Which leaves only the problem of detecting a hung process. You can use any of the solutions pointed out by Michael Aaron Safyan for this, but a yet easier solution would be to use the alarm syscall repeatedly, having the signal terminate the process (use sigaction accordingly). As long as you keep calling alarm (i.e. as long as your program is running) it will keep running. Once you don't, the signal will fire.
That way, no extra programs needed, and only portable POSIX stuff used.

The gist of it is:
You need to detect if the program is still running and not hung.
You need to (re)start the program if the program is not running or is hung.
There are a number of different ways to do #1, but two that come to mind are:
Listening on a UNIX domain socket, to handle status requests. An external application can then inquire as to whether the application is still ok. If it gets no response within some timeout period, then it can be assumed that the application being queried has deadlocked or is dead.
Periodically touching a file with a preselected path. An external application can look a the timestamp for the file, and if it is stale, then it can assume that the appliation is dead or deadlocked.
With respect to #2, killing the previous PID and using fork+exec to launch a new process is typical. You might also consider making your application that runs "continuously", into an application that runs once, but then use "cron" or some other application to continuously rerun that single-run application.
Unfortunately, watchdog timers and getting out of deadlock are non-trivial issues. I don't know of any generic way to do it, and the few that I've seen are pretty ugly and not 100% bug-free. However, tsan can help detect potential deadlock scenarios and other threading issues with static analysis.

You could create a CRON job to check if the process is running with start-stop-daemon from time to time.

use this script for running your application
#!/bin/bash
while ! /path/to/program #This will wait for the program to exit successfully.
do
echo “restarting” # Else it will restart.
done
you can also put this script on your /etc/init.d/ in other to start as daemon

Suspend and Resume thread (Windows, C)

I'm currently developing a heavily multi-threaded application, dealing with lots of small data batch to process.
The problem with it is that too many threads are being spawns, which slows down the system considerably. In order to avoid that, I've got a table of Handles which limits the number of concurrent threads. Then I "WaitForMultipleObjects", and when one slot is being freed, I create a new thread, with its own data batch to handle.
Now, I've got as many threads as I want (typically, one per core). Even then, the load incurred by multi-threading is extremely sensible. The reason for this: the data batch is small, so I'm constantly creating new threads.
The first idea I'm currently implementing is simply to regroup jobs into longer serial lists. Therefore, when I'm creating a new thread, it will have 128 or 512 data batch to handle before being terminated. It works well, but somewhat destroys granularity.
I was asked to look for another scenario: if the problem comes from "creating" threads too often, what about "pausing" them, loading data batch and "resuming" the thread?
Unfortunately, I'm not too successful.
The problem is: when a thread is in "suspend" mode, "WaitForMultipleObjects" does not detect it as available. In fact, I can't efficiently distinguish between an active and suspended thread.
So I've got 2 questions:
How to detect "suspended thread", so that i can load new data into it and resume it?
Is it a good idea? After all, is "CreateThread" really a ressource hog?
Edit
After much testings, here are my findings concerning Thread Pooling and IO Completion Port, both advised in this post.
Thread Pooling is tested using the older version "QueueUserWorkItem".
IO Completion Port requires using CreateIoCompletionPort, GetQueuedCompletionStatus and PostQueuedCompletionStatus;
1) First on performance : Creating many threads is very costly, and both thread pooling and io completion ports are doing a great job to avoid that cost. I am now down to 8-jobs per batch, from an earlier 512-jobs per batch, with no slowdown. This is considerable. Even when going to 1-job per batch, performance impact is less than 5%. Truly remarkable.
From a performance standpoint, QueueUserWorkItem wins, albeit by such a small margin (about 1% better) that it is almost negligible.
2) On usage simplicity :
Regarding starting threads : No question, QueueUserWorkItem is by far the easiest to setup. IO Completion port is heavyweight in comparison.
Regarding ending threads : Win for IO Completion Port.
For some unknown reason, MS provides no function in C to know when all jobs are completed with QueueUserWorkItem. It requires some nasty tricks to successfully implement this basic but critical function. There is no excuse for such a lack of feature.
3) On resource control : Big win for IO Completion Port, which allows to finely tune the number of concurrent threads, while there is no such control with QueueUserWorkItem, which will happily spend all CPU cycles from all available cores. That, in itself, could be a deal breaker for QueueUserWorkItem.
Note that newer version of Completion Port seems to allow that control, but are only available on Windows Vista and later.
4) On compatibility : small win for IO Completion Port, which is available since Windows NT4. QueueUserWorkItem only exists since Windows 2000. This is however good enough. Newer version of Completion Port is a no-go for Windows XP.
As can be guessed, I'm pretty much tied between the 2 solutions. They both answer correctly to my needs.
For a general situation, I suggest I/O Completion Port, mostly for resource control.
On the other hand, QueueUserWorkItem is easier to setup. Quite a pity that it loses most of this simplicity on requiring the programmer to deal alone with end-of-jobs detection.

Instead of implementing your own, consider using CreateThreadpool(). The OS will do the work for you, and you don't have to worry about getting it right.

Yes, there's a fair amount of overhead involved with CreateThread. One solution is to use a thread pool, QueueUserWorkItem. Another is to just start a set of threads and have them retrieve a 'job item' from a thread-safe queue.

If you want to also support Windows XP, you cannot use CreateThreadpool -- otherwise, if Vista and newer is sufficient, Windows thread pools are the easiest way.
If Windows XP support is needed, spawn a number of threads and assign them to an IO completion port, then have each thread block on GetQueuedCompletionStatus(). Completion ports let you post events to the port which will wake exactly one thread per event, and they are very efficient. They use a LIFO strategy on waking threads to keep caches warm, too.
In any case, you will never want to suspend a thread. Never ever. Block, wait, but don't suspend.
The reason is that with suspend you get the problem that you describe, plus you will create deadlocks, e.g. if your thread is within a critical section or mutex. Aside from a debugger, nobody should ever need to suspend a thread.