How to speed up consecutive program startup under Linux?

I've written two relatively small programs in C. They communicate with each other using textual data. Program A generates some problems from given input; B evaluates them and creates input for another iteration of A.
Here's a bash script that I currently use:
for i in {1..1000}
do
./A data > data2;
./B data2 > data;
done
The problem is that, since what A and B do is not very time consuming, most of the time is spent (I suppose) starting the apps up. When I measure the time the script takes to run I get:
$ time ./bash.sh
real 0m10.304s
user 0m4.010s
sys 0m0.113s
So my main question is: is there any way to communicate data between those two apps faster? I don't want to integrate them into one application, because I'm trying to build a toolset of independent, easily communicating tools (as suggested in "The Art of Unix Programming", from which I'm learning how to write reusable software).
PS. The data and data2 files contain sets of data that each application needs whole, all at once (so communicating, e.g., one line of data at a time is impossible).
Thanks for any suggestions.
cheers,
kajman

Can you create named pipes?
mkfifo data1
mkfifo data2
./A data1 > data2 &
./B data2 > data1
If your application is reading and writing in a loop, this could work :)
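One wrinkle worth flagging: opening a FIFO for writing blocks until a reader appears, so the two commands above can deadlock at startup, and something has to prime A's first input. A hedged sketch of one way around both issues, assuming A and B are reworked to loop internally on complete data sets (they will never see EOF here), that "seed" is a hypothetical file holding the initial input, and that the shell supports read-write opens of FIFOs (Linux does):
exec 3<>data1 4<>data2   # hold each FIFO open read-write so open() never blocks
./B data2 > data1 &      # B loops: read one whole data set from data2, write the result to data1
cat seed > data1 &       # prime A's first read
./A data1 > data2        # A loops likewise
Because fds 3 and 4 keep a writer alive on each FIFO, neither program ever sees EOF, which is exactly why both must loop on complete data sets rather than read-until-EOF.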

If you used pipes to transfer the stdout of program A to the stdin of program B you would remove the need to write the file "data2" each loop.
./A data | ./B > data
Program B would need to be able to read its input from stdin rather than from a named file. Be aware, though, that redirecting B's output straight back onto the file A reads is racy: the shell truncates data before A may have read it, so in practice you'd write to a temporary file and rename it, as sketched below.
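A hedged sketch of the revised loop, assuming B is changed to read stdin:
for i in {1..1000}
do
    ./A data | ./B > data.tmp && mv data.tmp data
done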

If you want to make a program run faster, you need to understand what is making the program run slowly. The field of computer science dedicated to measuring the performance of a running program is called profiling.
Once you discover which internal portion of your program is running slow, you can generally speed it up. How you go about speeding up that item depends heavily on what "the slow part" is doing and how it is "being done".
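For example (assuming a GCC toolchain; the file names are placeholders), gprof gives a per-function breakdown:
gcc -pg -O2 -o A A.c        # -pg adds profiling instrumentation
./A data > data2            # running the program drops a gmon.out file
gprof ./A gmon.out | less   # flat profile plus call graph, hottest functions first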
Several people have recommended pipes for moving the data directly from the output of one program into the input of another program. Assuming you rewrite your tools to handle input and output in a piped manner, this might improve performance. Again, it depends on what you are doing and how you are doing it.
For example, if your tool just fixes windows style end-of-lines into unix style end-of-lines, the program might read in one line, waiting for it to be available, check the end-of-line and write out the line with the desired end-of-line. Or the tool might read in all of the data, do a replacement call on each "wrong" end-of-line in memory, and then write out all of the data. With the first solution, piping speeds things up. With the second solution piping doesn't speed up anything.
The reason it is truly so hard to answer such a question is that the fix you need really depends on the code you have, the problem you are trying to solve, and the means by which you are solving it now. In the end, there isn't always a 100% guarantee that the code can be sped up; however, virtually every piece of code has opportunities to be sped up. Use profiling to speed up the parts that are slow, instead of wasting your time working on a part of your program that is only called once and represents 0.001% of the program's runtime.
Remember if you speed up something that is 0.001% of your program's runtime by 50%, you actually only sped up your entire program by 0.0005%. Use profiling to determine the block of code that's taking up 90% of your runtime and concentrate on it.

I do have to wonder why, if A and B depend on each other to run, do you want them to be part of an independent toolset.
One solution is a compromise between the two (a minimal sketch follows the list):
Create a library that contains A.
Create a library that contains B.
Create a program that spawns two threads, 1 containing A and 2 containing B.
Create a semaphore that tells A to run and another that tells B to run.
After the function that calls A in 1, increment B's semaphore.
After the function that calls B in 2, increment A's semaphore.
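A minimal sketch of that ping-pong arrangement using POSIX unnamed semaphores; run_A() and run_B() are hypothetical stand-ins for the library entry points (build with -lpthread):
#include <pthread.h>
#include <semaphore.h>

#define ITERATIONS 1000

static sem_t sem_a, sem_b;                 /* "your turn" signals for A and B */

static void run_A(void) { /* ... generate problems ... */ }
static void run_B(void) { /* ... evaluate them ... */ }

static void *thread_a(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        sem_wait(&sem_a);                  /* wait until it's A's turn */
        run_A();
        sem_post(&sem_b);                  /* hand control to B */
    }
    return NULL;
}

static void *thread_b(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        sem_wait(&sem_b);
        run_B();
        sem_post(&sem_a);
    }
    return NULL;
}

int main(void)
{
    pthread_t ta, tb;
    sem_init(&sem_a, 0, 1);                /* A runs first */
    sem_init(&sem_b, 0, 0);
    pthread_create(&ta, NULL, thread_a, NULL);
    pthread_create(&tb, NULL, thread_b, NULL);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return 0;
}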
Another possibility is to use file locking in your programs (sketched after the steps below):
Make both A and B execute in infinite loops (or however many times you're processing data)
Add code to attempt to lock both files at the beginning of the infinite loop in A and B (if not, sleep and try again so that you don't do anything until you have the lock).
Add code to unlock and sleep for longer than the sleep in step 2 at the end of each loop.
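A hedged sketch of A's side of that scheme, using flock() (one of several possible locking APIs) and locking only the input file for brevity:
#include <stdio.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical main loop for A; B would be the mirror image, with the
 * sleep lengths staggered as described in the steps above. */
int main(void)
{
    for (;;) {
        FILE *in = fopen("data", "r");
        if (!in)
            return 1;
        if (flock(fileno(in), LOCK_EX) == 0) {
            /* ... read "data", compute, write "data2" ... */
            flock(fileno(in), LOCK_UN);
        }
        fclose(in);
        usleep(200000);   /* back off so the other program can take the lock */
    }
}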
Either of these solves the problem of paying the program-launch overhead between runs.

It's almost certainly not application startup which is the bottleneck. Linux will end up caching large portions of your programs, which means that launching will progressively get faster (to a point) the more times you start your program.
You need to look elsewhere for your bottleneck.

Related

Logging Events---Looking For a Good Way

I created a program that monitors for events.
I want to log these events "in the right way".
Currently I have a string array, log[500][100].
Each line is a string of characters (up to 100) that report something about the event.
I have it set up so that only the last 500 events are saved in the array.
After that, new events overwrite the oldest events.
Currently I just keep revolving through the array until the program terminates, then I write the array to a file.
Going forward I would like to view the log in real time, any time I wish, without disturbing the event processing and logging process.
I considered opening the file for "appending" but here are my concerns:
(1) The program is running on a Raspberry Pi which has a flash memory as a "disk drive". I believe flash memories have a limited number of write cycles before problems can occur. This program runs 24/7 "forever" so I am afraid the "disk drive" will "wear out".
(2) I am using pretty much all the CPU capacity of the RPi so I don't want to add a lot of overhead/CPU cycles.
How would experienced programmers attack this problem?
Please go easy on me, this is my first C program.
[EDIT]
I began reviewing all the information and became intrigued by Mark A's suggestion of tmpfs. I looked into it more and I am sure this answers my question. It allows the creation of files in RAM, not on the SD card. They are lost on power down, but I don't care.
In order to keep the files from growing too large I created a double-buffer approach. First I write 500 events to file A, then switch to file B. When 500 events have been written to file B, I close and reopen file A (to delete its contents and start at 0 events) and switch back to writing to file A. I found I needed to fflush(file...) after each write or else the file was empty until fclose.
Normally that would be OK, but right now I am fighting a nasty segmentation fault, so I want as much insight as possible into what is going on. When I hit the fault, I never get to my fclose statements.
Welcome to Stack Overflow and to C programming! A wonderful world of possibilities awaits you.
I have several thoughts in response to your situation.
The very short summary is to use stdout and delegate the output-file management to the shell.
The longer, rambling answer full of my personal musing is as follows:
1 : A very typical thing for C programs to do is not be in charge of how outputs are kept. You might have heard of the "built in" file handles, stdin, stdout, and stderr. These file handles are (under normal circumstances) always available to your program for input (from stdin) and output (stdout and stderr). As you might guess from their names stdout is customarily used for regular output and stderr is customarily used for error / exception output. It is exceedingly typical for a C program to simply read from stdin and output to stdout and stderr, and let something else (e.g., the shell) take care of what those actually are.
For example, reading from stdin means that your program can be used for keyboard entry and for file reading, without needing to change your program's code. The same goes for stdout and stderr; simply output to those file handles, and let the user decide whether those should go to the screen or be redirected to a file. And, because stdout and stderr are separate file handles, the user can have them go to separate 'destinations'.
In your case, to implement this, drop the array entirely, and simply
fprintf(stdout, "event notice : %s\n", eventdetailstring);
(or similar) every time your program has something to say. Take a look at fflush(), too, because of potential output buffering.
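Putting those two pieces together, a minimal sketch; log_event() and eventdetailstring are stand-ins for whatever your monitor already has:
#include <stdio.h>

/* Emit one line per event and flush, so a redirected logfile or a
 * `tail -f` reader sees it immediately despite stdio buffering. */
void log_event(const char *eventdetailstring)
{
    fprintf(stdout, "event notice : %s\n", eventdetailstring);
    fflush(stdout);
}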
2a : This gets you continuous output. This itself can help with your concern about memory wear on the Pi's flash disk. If you do something like:
eventmonitor > logfile
then logfile will be appended to during the lifetime of your program, which will tend to write to new parts of the flash disk. Of course, if you only ever append, you will eventually run out of space on the disk, so you might set up a cron job to kill the currently running eventmonitor and restart it every day at midnight. Done with the above command, that would cause it to overwrite logfile once per day. This prevents endless growth, and it might even use a new physical area of the flash drive for the new file (even though it's the same name; underneath, it's a different file, with a different inode, etc.). But even if it reuses the exact same area of the flash drive, now you are down to worrying whether this will last more than 10,000 days, instead of 10,000 writes. I'm betting that within 10,000 days, new options will be available -- worst case, you buy a new Pi every 27 years or so!
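A rough sketch of that cron idea; the paths, user field, and entry are hypothetical:
# Hypothetical /etc/crontab entry ("pi" is the user field): restart
# eventmonitor at midnight; the fresh > redirection truncates logfile.
0 0 * * * pi  pkill -x eventmonitor; nohup /home/pi/eventmonitor > /home/pi/logfile 2>&1 &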
There are other possible variations on this theme, as well. E.g., you could have a sophisticated script kicked off by cron every day at midnight that kills any currently running eventmonitor, deletes output files older than a week, and starts a new eventmonitor outputting to a file whose name is based partly on the date, so that past days' files aren't overwritten. But all of this is in the realm of using your program. You can make your program easier to use by writing it to use stdin, stdout, and stderr.
2b : Or, you can just have stdout go to the screen, which is typically how it already is when a program is started from an interactive shell / terminal window. I imagine you could have the Pi running headless most of the time, and when you want to see what your program is outputting, hook up a monitor. Generally, things will stay running between disconnecting and reconnecting your monitor. This avoids affecting the flash drive at all.
3 : Another approach is to have your event monitoring program send its output somewhere off-system. This is getting into more advanced programming territory, so you might want to save this for a later enhancement, after you've mastered more of the basics. But, your program could establish a network connection to, say, a JSON API and send event information there. This would let you separate the functions of event monitoring from event reporting.
You will discover as you learn more programming that this idea of separation of concerns is an important concept, and applies at various levels of a program or a system of interoperating programs. In this case, the Pi is a good fit for the data monitoring aspect because it is a lightweight solution, and some other system with more capacity and more stable storage can cover the data collection aspect.

Execute Large C Program By Generating Intermediate Stages

I have an algorithm that takes 7 days to run to completion (and a few more algorithms, too).
Problem: in order to run the program successfully, I need a continuous power supply. If I'm out of luck and there is a power loss in the middle, I need to restart it from the beginning.
So I would like to ask for a way by which I can make my program execute in phases (say each phase generates results A, B, C, ...) so that in case of a power loss I can somehow use these intermediate results and continue/resume the run from that point.
Problem 2: How do I prevent a file from being reopened every time a loop iterates? (fopen was placed in a loop that runs nearly a million times; this was needed as the file is changed with each iteration.)
You can separate it into several source files and use make.
When each result phase is complete, branch off to a new universe. If the power fails in the new universe, destroy it and travel back in time to the point at which you branched. Repeat until all phases are finished, and then merge your results into the original universe via a transcendental wormhole.
Well, a couple of options, I guess:
You split your algorithm along sensible lines, with a defined output from each phase that can be the input to the next phase. Then configure your algorithm as a workflow (ideally soft-configured through some declaration file).
You add logic to your algorithm by which it knows what it has successfully completed (committed). Then, on failure, you can restart the algorithm, and it bins all uncommitted data and resumes from the last commit point.
Note that both of these options may draw out your 7-day run time further!
So, to improve the overall runtime, you could also separate your algorithm so that it has "worker" components that can work on "jobs" in parallel. This usually means drawing out some "dumb" but intensive logic (such as a computation) that can be parameterised. Then you have the option of running your algorithm on a grid/space/cloud/whatever. It doesn't even need to be a grid... just use queues (IBM MQ Series has a C interface) and have listeners on other boxes listening to your jobs queue, processing the jobs, and persisting the results. At least you have options to reduce the run time, and you can still phase the algorithm as discussed above, too.
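A hedged sketch of the commit idea from the second option: persist a small state struct by writing a temp file and renaming it over the old checkpoint, then resume from it at startup. The struct fields are hypothetical stand-ins, and a production version would also fsync before the rename:
#include <stdio.h>

/* "iteration" and "partial" stand in for whatever state your
 * algorithm actually needs in order to resume. */
struct state { long iteration; double partial; };

/* rename() is atomic on POSIX filesystems, so a power cut leaves
 * either the old checkpoint or the new one, never a torn file. */
static int save_checkpoint(const struct state *s)
{
    FILE *f = fopen("checkpoint.tmp", "wb");
    if (!f)
        return -1;
    if (fwrite(s, sizeof *s, 1, f) != 1) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return rename("checkpoint.tmp", "checkpoint.dat");
}

static void load_checkpoint(struct state *s)
{
    FILE *f = fopen("checkpoint.dat", "rb");
    if (f) {
        if (fread(s, sizeof *s, 1, f) != 1)
            s->iteration = 0;              /* corrupt or empty: start fresh */
        fclose(f);
    }
}

int main(void)
{
    struct state s = { 0, 0.0 };
    load_checkpoint(&s);                   /* resume if a checkpoint exists */
    for (; s.iteration < 1000000; s.iteration++) {
        s.partial += 1.0;                  /* stand-in for the real work */
        if (s.iteration % 10000 == 0)      /* commit periodically, not every pass */
            save_checkpoint(&s);
    }
    save_checkpoint(&s);
    return 0;
}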
Problem 2: Opening the file on each iteration of the loop because it's changed
I may not be best qualified to answer this, but doing fopen (and fclose) on each iteration certainly seems wasteful and slow. To answer, or to have anyone more qualified answer, I think we'd need to know more about your data.
For instance:
Is it text or binary?
Are you processing records or a stream of text? That is, is it a file of records or a stream of data? (you aren't cracking genes are you? :-)
I ask because, judging by your comment "because it's changed each iteration", you might be better off using a random-access file. I'm guessing you're re-opening in order to fseek to a point you may already have passed (in your stream of data) and make a change. However, if you open the file in binary update mode, you can seek anywhere in it using fseek and fsetpos; that is, you can "seek" backwards.
Additionally, if your data is record-based or somehow organised, you could create an index for it. With this, you could use fsetpos to set the file pointer to the record you're interested in and traverse from there, saving time in finding the area of data to change. You could even persist your index in an accompanying index file.
Note that you can write plain text to a binary file. Perhaps worth investigating?
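A hedged sketch of the open-once, seek-in-place idea; the fixed-size record layout is an assumption:
#include <stdio.h>

int main(void)
{
    FILE *f = fopen("data.bin", "rb+");   /* binary update: read/write, no truncation */
    if (!f)
        return 1;
    for (long i = 0; i < 1000000; i++) {
        long value = 0;
        long rec = i % 1000;              /* assumed: 1000 fixed-size records */
        long off = rec * (long)sizeof value;
        fseek(f, off, SEEK_SET);
        if (fread(&value, sizeof value, 1, f) != 1)
            break;
        value++;                          /* stand-in for the real change */
        fseek(f, off, SEEK_SET);          /* a seek is required between read and write */
        fwrite(&value, sizeof value, 1, f);
    }
    fclose(f);
    return 0;
}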
Sounds like a classical batch-processing problem to me.
You will need to define checkpoints in your application and store the intermediate data until a checkpoint is reached.
Checkpoints could be the row number in a database, or the position inside a file.
Your processing might take longer than now, but it will be more reliable.
In general you should think about where the bottleneck in your algorithm is.
For problem 2, use two files; your application might be days faster if you call fopen a million times less often...

Simulating file system access

I am designing a file system in user space and need to test it. I do not want to use the available benchmarking tools, as my requirements are different. So, to test the file system, I wish to simulate file access operations. To do this, I first use the ftw() function to walk through one of my existing (experimental) file systems and list all the files and directories in a file.
Then I invoke a simulator to simulate file access by a number of processes. The simulator randomly starts a process, i.e. it forks a thread which does what a real process would have done. The thread randomly selects a file operation (read, write, rename, etc.) and selects arguments to this operation from the list (generated by ftw()). The thread does a number of such file operations and then exits, marking the end of a process. The simulator continues to spawn threads; thread executions can overlap just as real processes do. As operations are performed by threads, files get inserted, deleted, and renamed, and this is updated in the list of files.
I have not yet started coding. Does the plan seem sane? I am also not sure how to code the simulator... how will it spawn threads over a period of time? Should I be using some random delay to do this?
Thanks
Yep, that seems fairly reasonable to me. I would consider attempting to impose a statistical distribution over your file operations (and accesses to particular files) that is somehow matched to your expected workload. You might be able to find some statistics about typical filesystem workloads as a starting point.
That sounds about right for a decent test case, just to make sure it's working. You could use sleep() to wait between spawning threads, or just spawn them all at once and have each one do an operation, wait a bit, do another operation, and so on. IMO, if you hit it hard with a lot of requests and it works, there's a good chance your filesystem will do just fine. Take an example from PostMark, which just appends like crazy to different files, and from other benchmarks that do random-access reads/writes in different locations to make sure that pages have to be read from disk.
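A hedged sketch combining the two answers above: spawn "process" threads with exponentially distributed inter-arrival delays, which gives Poisson arrivals. The arrival rate is an assumed workload parameter; build with -lpthread -lm:
#include <math.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical "process": picks random file operations from the
 * ftw()-generated list, performs a few, then exits. */
static void *process_thread(void *arg)
{
    (void)arg;
    /* ... perform randomly chosen read/write/rename operations ... */
    return NULL;
}

int main(void)
{
    const double arrivals_per_sec = 5.0;   /* assumed workload intensity */
    srand((unsigned)time(NULL));

    for (int i = 0; i < 100; i++) {
        pthread_t t;
        pthread_create(&t, NULL, process_thread, NULL);
        pthread_detach(t);

        /* Exponential inter-arrival delay => Poisson arrival process. */
        double u = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
        double delay = -log(u) / arrivals_per_sec;          /* seconds */
        struct timespec ts = { (time_t)delay,
                               (long)((delay - (long)delay) * 1e9) };
        nanosleep(&ts, NULL);
    }
    /* Crude drain: real code would track and join outstanding threads. */
    struct timespec drain = { 2, 0 };
    nanosleep(&drain, NULL);
    return 0;
}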

Program communicating with itself between executions

I want to write a C program that will sample something every second (an extension to screen). I can't do it in a loop since screen waits for the program to terminate every time, and I have to access the previous sample in every execution. Is saving the value in a file really my best bet?
You could use a named pipe (if available), which might allow the data to remain "in flight", i.e. not actually hit disk. Still, the code isn't any simpler, and hitting disk twice a second won't break the bank.
You could also use a named shared memory region (again, if available). That might result in simpler code.
You're losing some portability either way.
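A minimal sketch of the shared-memory variant using POSIX shm_open(); the region name and payload are made up, and older glibc needs -lrt:
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* "/sampler_state" is a made-up region name; on Linux it lives in
     * /dev/shm and survives between runs until reboot or shm_unlink(). */
    int fd = shm_open("/sampler_state", O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return 1;
    ftruncate(fd, sizeof(double));
    double *prev = mmap(NULL, sizeof(double), PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    if (prev == MAP_FAILED)
        return 1;

    double sample = 42.0;                  /* stand-in for the real sample */
    printf("previous=%f current=%f\n", *prev, sample);
    *prev = sample;                        /* visible to the next execution */

    munmap(prev, sizeof(double));
    close(fd);
    return 0;
}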
Is saving the value in a file really my best bet?
Unless you want to write some complicated client/server model communicating with another instance of the program just for the heck of it, yes: reading and writing a file is the preferred method.

printf slows down my program

I have a small C program to calculate hashes (for hash tables). The code looks quite clean I hope, but there's something unrelated to it that's bugging me.
I can easily generate about one million hashes in about 0.2-0.3 seconds (benchmarked with /usr/bin/time). However, when I'm printf()ing them in the for loop, the program slows down to about 5 seconds.
1. Why is this?
2. How can I make it faster? mmap()ing stdout maybe?
3. How is the standard C library designed with regard to this, and how could it be improved?
4. How could the kernel support this better? How would it need to be modified to make the throughput on local "files" (sockets, pipes, etc.) REALLY fast?
I'm looking forward to interesting and detailed replies. Thanks.
PS: this is for a compiler construction toolset, so don't be shy to get into details. While that has nothing to do with the problem itself, I just wanted to point out that details interest me.
Addendum
I'm looking for more programmatic approaches for solutions and explanations. Indeed, piping does the job, but I don't have control over what the "user" does.
Of course, the testing I'm doing right now wouldn't be done by "normal users". BUT that doesn't change the fact that a simple printf() slows down a process, which is the problem I'm trying to find an optimal programmatic solution for.
Addendum - Astonishing results
The reference time is for plain printf() calls inside a TTY and takes about 4 mins 20 secs.
Testing under a /dev/pts (e.g. Konsole) speeds up the output to about 5 seconds.
Using setbuffer() in my test code with a size of 16384, it takes about the same amount of time; almost the same for 8192: about 6 seconds.
In other words, setbuffer() apparently has no effect: the run takes the same amount of time either way (on a TTY about 4 mins, on a PTS about 5 seconds).
The astonishing thing is, if I'm starting the test on TTY1 and then switch to another TTY, it does take just the same as on a PTS: about 5 seconds.
Conclusion: the kernel does something which has to do with accessibility and user friendliness. HUH!
Normally, it should be equally slow whether you stare at the TTY while it's active or switch over to another TTY.
Lesson: when running output-intensive programs, switch to another TTY!
Unbuffered output is very slow.
By default stdout is fully buffered; however, when attached to a terminal, stdout is either unbuffered or line-buffered.
Try switching on buffering for stdout using setvbuf(), like this:
static char buffer[8192];   /* must stay alive as long as stdout is used */
setvbuf(stdout, buffer, _IOFBF, sizeof buffer);
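A complete toy version for comparison; the multiplicative constant is just a stand-in for a real hash function:
#include <stdio.h>

int main(void)
{
    /* Must be installed before the first output, and the buffer must
     * outlive all use of stdout; "static" takes care of that. */
    static char buffer[1 << 16];
    setvbuf(stdout, buffer, _IOFBF, sizeof buffer);

    for (unsigned i = 0; i < 1000000; i++)
        printf("%08x\n", i * 2654435761u);   /* stand-in for a real hash */
    return 0;
}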
You could store your strings in a buffer and output them to a file (or console) at the end or periodically, when your buffer is full.
If outputting to a console, scrolling is usually a killer.
If you are printf()ing to the console it's usually extremely slow. I'm not sure why but I believe it doesn't return until the console graphically shows the outputted string. Additionally you can't mmap() to stdout.
Writing to a file should be much faster (but still orders of magnitude slower than computing a hash, all I/O is slow).
You can try redirecting the output in the shell from the console to a file. Done that way, logs gigabytes in size can be created in just seconds.
I/O is always slow in comparison to straight computation. The system has to wait for more components to be available in order to use them. It then has to wait for the response before it can carry on. Conversely, if it's simply computing, then it's only really moving data between the RAM and CPU registers.
I've not tested this, but it may be quicker to append your hashes onto a string, and then just print the string at the end. Although if you're using C, not C++, this may prove to be a pain!
3 and 4 are beyond me I'm afraid.
As I/O is always much slower than CPU computation, you might first store all values in the fastest storage available: use RAM if you have enough; fall back to files if not, though they are much slower than RAM.
Printing the values can then be done afterwards, or in parallel by another thread, so the calculating thread(s) need not wait for printf to return.
I discovered long ago using this technique something that should have been obvious.
Not only is I/O slow, especially to the console, but formatting decimal numbers is not fast either. If you can put the numbers in binary into big buffers, and write those to a file, you'll find it's a lot faster.
Besides, who's going to read them? There's no point printing them all in a human-readable format if nobody needs to read all of them.
Why not create the strings on demand rather than at the point of construction? There is no point in outputting 40 screens of data in one second; how could you possibly read it? Why not create the output as required, display just the last screenful, and render more as the user scrolls?
Why not use sprintf to print to a string and then build a concatenated string of all the results in memory and print at the end?
By switching to sprintf you can clearly see how much time is spent on the format conversion and how much on displaying the result to the console, and change the code appropriately.
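A hedged sketch of that approach: format everything with sprintf into one big static buffer, then emit it with a single fwrite. The buffer sizing assumes 8 hex digits plus a newline per hash, and the constant is again a stand-in for a real hash:
#include <stdio.h>

int main(void)
{
    static char out[1000000 * 9 + 1];   /* 8 hex digits + '\n' per hash */
    size_t len = 0;

    for (unsigned i = 0; i < 1000000; i++)
        len += sprintf(out + len, "%08x\n", i * 2654435761u);

    fwrite(out, 1, len, stdout);        /* one write instead of a million */
    return 0;
}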
Console output is by definition slow; creating a hash only manipulates a few bytes of memory. Console output needs to go through many layers of the operating system, which include code to handle thread/process locking etc., before it eventually gets to the display driver, which may be a 9600 baud device! Or a large bitmap display, where simple functions like scrolling the screen can involve manipulating megabytes of memory.
I guess the terminal is using some buffered output operations, so when you do a printf the output does not appear in split microseconds; it is stored in the buffer memory of the terminal subsystem.
This could also be impacted by other things causing a slowdown; perhaps there's a memory-intensive operation running other than your program. In short, there are far too many things that could all be happening at the same time: paging, swapping, heavy I/O by another process, the configuration of the memory utilised, and so on.
It might be better to concatenate the strings until a certain limit is reached and then write them all out at once, or even to use pthreads to carry out the desired processing.
Edited:
As for 2 and 3, they are beyond me. For 4: I am not familiar with Sun, but I do know of and have messed with Solaris. There may be a kernel option to use a virtual tty... I'll admit it's been a while since I messed with kernel configs and recompiled, so my memory may not be great on this; have a root around in the options to see.
user#host:/usr/src/linux $ make menuconfig   # or "make xconfig" if running under X
This will fire up the kernel menu; have a dig around in the video settings section under the devices sub-tree.
Edited:
but there may be a tweak you can put into the kernel by adding a file to the proc filesystem (if such a thing exists), or possibly a switch passed to the kernel, something like this (this is imaginary and does not imply it actually exists): fastio
Hope this helps,
Best regards,
Tom.
