I need to get code coverage information for a number of C programs. I only need to know whether or not each line is executed. However, some of the programs never end because of infinite loops. Most tools, e.g. gcov and llvm-cov, can only produce the coverage information after the program ends.
I set a time limit for all the programs. If a program hasn't ended when its execution time exceeds the limit, its process is killed. However, when the process is killed, all the information stored in memory is lost, so I can't get the code coverage information for those programs. How can I do that?
A simple solution would be to add a little timer interrupt inside your program that raises an exception after a few seconds of "bad behaviour" and causes the program to terminate nicely.
When searching and probing in the temporary fashion you seem to be after, it is completely legitimate to "hack it", as long as you remove the hack as soon as you've found the error.
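For a gcov-instrumented build, that hack can be as small as an alarm(). Here is a minimal sketch, assuming a 10-second budget and that a plain exit() is acceptable; gcov writes its .gcda coverage data on a normal process exit, so the counters survive:

/* terminate "nicely" after a timeout so the coverage data still gets written;
 * calling exit() from a signal handler is not strictly async-signal-safe,
 * but for a temporary testing hack it is usually good enough */
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

static void give_up(int sig)
{
    (void)sig;
    exit(124);              /* normal exit -> gcov flushes its .gcda files */
}

int main(void)
{
    signal(SIGALRM, give_up);
    alarm(10);              /* stop the run after 10 seconds of "bad behaviour" */

    for (;;) {
        /* stand-in for the possibly non-terminating program under test */
    }
}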
For example, if we program a computer to check and update some variables every 5 minutes, does that mean the computer actually checks on every tick whether the condition matches (whether the 5 minutes are up so it can execute the program)? In my view, that would mean the larger the number of conditionals or timers (or both), the heavier the load on the processor, even though the processor is only checking whether the time is up or whether the condition matches.
My reasoning here is that the processor can't really put a task away, forget about it for 5 minutes, and then just remember it and execute the program. It has to keep track of time (counting seconds or ticks or whatever), keep track of the timers that are currently active, and check whether the time on every timer is up or not.
That makes every timer a conditional statement. Right?
So the main question is: am I correct in all of these statements, or is the reality a bit different, and if so, how?
Thank you.
I am assuming here that you have a basic understanding of how processes are managed by the CPU.
Most programming languages implement some form of wait() function that causes the CPU to stop executing instructions from that thread until it is interrupted, allowing it to work on other tasks in the meantime. Waiting for an interrupt does not use much of the system's resources, and it is much more efficient than the polling method you were describing.
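As a rough illustration of the difference (not tied to any particular OS), compare a busy-poll loop with a blocking sleep(); with the blocking call, the kernel takes the thread off the run queue and a timer interrupt wakes it up later:

#include <time.h>
#include <unistd.h>

/* Polling: the CPU re-checks the clock on every iteration and burns cycles. */
static void wait_polling(time_t deadline)
{
    while (time(NULL) < deadline) {
        /* doing nothing useful, but still keeping the CPU busy */
    }
}

/* Blocking: the thread is suspended; the OS is free to run other tasks. */
static void wait_blocking(unsigned int seconds)
{
    sleep(seconds);         /* returns once the timer fires */
}

int main(void)
{
    wait_blocking(300);     /* "come back in 5 minutes" */
    (void)wait_polling;     /* shown only for comparison */
    return 0;
}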
This is a pretty basic explanation, but if you want to learn more, look up preemptive multitasking.
I have a simulation code written in C, parallelized with MPI, running on a Linux cluster that kills jobs after 12h of wall time. Jobs that last longer than 12h must then be restarted from a file written by the program.
My code currently writes these 'restart files' every N steps of my simulation. It is important that each node is at the same simulation step before writing the restart file.
In my case, these files are big (> 1GB/process), so I cannot write them as often as I would need to (it takes too much time and space).
Also, the execution time of one simulation step depends on what is going on within the simulation, so it is quite difficult to predict how many steps my simulation will have done within the 12h. So I can't decide in advance to write the restart file after the number of steps I think will be done just before the 12h of run time either.
As a result, when my job is killed, the last restart file may have been written several hours earlier, which then forces me to redo a substantial part of the last 12h of execution.
Therefore, I am looking for a way to write a restart file as a function of the elapsed run time. I have thought of using MPI_Wtime(); however, for a given run time, say 11:50:00, all processes won't necessarily be at the same simulation step, which is not good. Is there a simple solution to this problem?
Once your processes hit the 11:50:00 mark (or some other suitable deadline), have them AllReduce the number of iterations completed using MPI_MAX. Then they can catch up to exactly that number of iterations, and wait for everyone else to do the same with a simple Barrier. They can then start writing the restart file.
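A minimal sketch of that idea; do_step() and write_restart() are hypothetical stand-ins for the real simulation code, and the 11h50m deadline is just an example value:

#include <mpi.h>
#include <stdio.h>

static void do_step(void)          { /* one simulation step */ }
static void write_restart(long n)  { printf("restart written at step %ld\n", n); }

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    const double deadline = 11.0 * 3600 + 50 * 60;   /* 11h50m in seconds */
    double t0 = MPI_Wtime();
    long iter = 0, target = 0;
    int checkpointed = 0;

    while (iter < 1000000) {                         /* stand-in for the real loop */
        do_step();
        iter++;

        if (!checkpointed && MPI_Wtime() - t0 > deadline) {
            /* every rank eventually reaches this collective; it returns the
             * highest step count any rank has completed so far */
            MPI_Allreduce(&iter, &target, 1, MPI_LONG, MPI_MAX, MPI_COMM_WORLD);
            while (iter < target) { do_step(); iter++; }   /* catch up */
            MPI_Barrier(MPI_COMM_WORLD);                   /* everyone aligned */
            write_restart(iter);
            checkpointed = 1;
        }
    }

    MPI_Finalize();
    return 0;
}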
I've written two relatively small programs in C. Both of them communicate with each other using textual data. Program A generates some problems from a given input, B evaluates them and creates input for another iteration of A.
Here's a bash script that I currently use:
for i in {1..1000}
do
    ./A data > data2
    ./B data2 > data
done
The problem is that, since what A and B do is not very time consuming, most of the time is spent (I suppose) starting the apps up. When I measure the time the script takes to run, I get:
$ time ./bash.sh
real 0m10.304s
user 0m4.010s
sys 0m0.113s
So my main question is: is there any way to communicate data between those two apps faster? I don't want to integrate them into one application, because I'm trying to build a toolset with independent, easily communicating tools (as suggested in "The Art of Unix Programming", from which I'm learning how to write reusable software).
PS. The data and data2 files contain sets of data that each application needs in whole, all at once (so communicating, e.g., one line of data at a time is impossible).
Thanks for any suggestions.
cheers,
kajman
Can you create a named pipe?
mkfifo data1
mkfifo data2
./A data1 > data2 &
./B data2 > data1
If your application is reading and writing in a loop, this could work :)
If you used pipes to transfer the stdout of program A to the stdin of program B you would remove the need to write the file "data2" each loop.
./A data1 | ./B > data1
Program B would need to have the capability of using input from stdin rather than a specified file.
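For example, a small sketch of how B could fall back to stdin when no file name is given (the actual parsing of the data is left as a comment):

#include <stdio.h>

int main(int argc, char **argv)
{
    FILE *in = stdin;                       /* default: read whatever is piped in */

    if (argc > 1 && (in = fopen(argv[1], "r")) == NULL) {
        perror(argv[1]);
        return 1;
    }

    /* ... read and evaluate the problems from 'in' as before ... */

    if (in != stdin)
        fclose(in);
    return 0;
}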
If you want to make a program run faster, you need to understand what is making the program run slowly. The field of computer science dedicated to measuring the performance of a running program is called profiling.
Once you discover which internal portion of your program is running slow, you can generally speed it up. How you go about speeding up that item depends heavily on what "the slow part" is doing and how it is "being done".
Several people have recommended pipes for moving the data directly from the output of one program into the input of another program. Assuming you rewrite your tools to handle input and output in a piped manner, this might improve performance. Again, it depends on what you are doing and how you are doing it.
For example, if your tool just fixes windows style end-of-lines into unix style end-of-lines, the program might read in one line, waiting for it to be available, check the end-of-line and write out the line with the desired end-of-line. Or the tool might read in all of the data, do a replacement call on each "wrong" end-of-line in memory, and then write out all of the data. With the first solution, piping speeds things up. With the second solution piping doesn't speed up anything.
The reason it is truly so hard to answer such a question is that the fix you need really depends on the code you have, the problem you are trying to solve, and the means by which you are solving it now. In the end, there isn't always a 100% guarantee that the code can be sped up; however, virtually every piece of code has opportunities to be sped up. Use profiling to speed up the parts that are slow, instead of wasting your time working on a part of your program that is only called once and represents 0.001% of the program's runtime.
Remember: if you speed up something that is 0.001% of your program's runtime by 50%, you have only sped up your entire program by 0.0005%. Use profiling to determine the block of code that's taking up 90% of your runtime and concentrate on it.
I do have to wonder why, if A and B depend on each other to run, you want them to be part of an independent toolset.
One solution is a compromise between the two (a rough sketch follows these steps):
Create a library that contains A.
Create a library that contains B.
Create a program that spawns two threads, 1 containing A and 2 containing B.
Create a semaphore that tells A to run and another that tells B to run.
After the function that calls A in 1, increment B's semaphore.
After the function that calls B in 2, increment A's semaphore.
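A rough sketch of that layout, assuming POSIX threads and semaphores; run_A() and run_B() are placeholders for the library entry points, which are not part of the original question:

#include <pthread.h>
#include <semaphore.h>

#define ITERATIONS 1000

static sem_t sem_a, sem_b;

static void run_A(void) { /* generate problems from the current data */ }
static void run_B(void) { /* evaluate them and produce the next input */ }

static void *thread_A(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        sem_wait(&sem_a);   /* wait until it is A's turn */
        run_A();
        sem_post(&sem_b);   /* hand over to B */
    }
    return NULL;
}

static void *thread_B(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        sem_wait(&sem_b);
        run_B();
        sem_post(&sem_a);   /* hand back to A */
    }
    return NULL;
}

int main(void)
{
    pthread_t ta, tb;
    sem_init(&sem_a, 0, 1);     /* A goes first */
    sem_init(&sem_b, 0, 0);
    pthread_create(&ta, NULL, thread_A, NULL);
    pthread_create(&tb, NULL, thread_B, NULL);
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return 0;
}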
Another possibility is to use file locking in your programs (see the sketch after these steps):
Make both A and B execute in infinite loops (or however many times you're processing data)
Add code at the beginning of the infinite loop in A and B that attempts to lock both files (if the lock cannot be acquired, sleep and try again, so that you don't do anything until you hold the lock).
Add code at the end of each loop that unlocks the files and then sleeps for longer than the sleep in the previous step.
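A loose sketch of that lock-and-sleep loop, assuming Linux flock() and that the data and data2 files already exist; do_work() is a placeholder for the program's real processing:

#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

static void do_work(void) { /* read input, write output */ }

int main(void)
{
    int fd1 = open("data",  O_RDWR);
    int fd2 = open("data2", O_RDWR);
    if (fd1 < 0 || fd2 < 0)
        return 1;

    for (;;) {
        if (flock(fd1, LOCK_EX | LOCK_NB) != 0) {   /* the other program holds it */
            usleep(1000);
            continue;
        }
        if (flock(fd2, LOCK_EX | LOCK_NB) != 0) {
            flock(fd1, LOCK_UN);
            usleep(1000);
            continue;
        }
        do_work();
        flock(fd2, LOCK_UN);
        flock(fd1, LOCK_UN);
        usleep(5000);   /* sleep longer than the retry sleep so the other side gets the locks */
    }
}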
Either of these solves the problem of the overhead of launching the programs between runs.
It's almost certainly not application startup which is the bottleneck. Linux will end up caching large portions of your programs, which means that launching will progressively get faster (to a point) the more times you start your program.
You need to look elsewhere for your bottleneck.
I have a small C program to calculate hashes (for hash tables). The code looks quite clean, I hope, but there's something unrelated to it that's bugging me.
I can easily generate about one million hashes in about 0.2-0.3 seconds (benchmarked with /usr/bin/time). However, when I'm printf()ing them in the for loop, the program slows down to about 5 seconds.
1. Why is this?
2. How can I make it faster? mmap()ing stdout maybe?
3. How is the standard C library designed in this regard, and how could it be improved?
4. How could the kernel support it better? How would it need to be modified to make the throughput on local "files" (sockets, pipes, etc.) REALLY fast?
I'm looking forward to interesting and detailed replies. Thanks.
PS: this is for a compiler construction toolset, so don't be shy about getting into details. While that has nothing to do with the problem itself, I just wanted to point out that details interest me.
Addendum
I'm looking for more programmatic approaches for solutions and explanations. Indeed, piping does the job, but I don't have control over what the "user" does.
Of course, I'm doing testing right now, which wouldn't be done by "normal users". BUT that doesn't change the fact that a simple printf() slows down a process, which is the problem I'm trying to find an optimal programmatic solution for.
Addendum - Astonishing results
The reference time is for plain printf() calls inside a TTY and takes about 4 mins 20 secs.
Testing under a /dev/pts (e.g. Konsole) speeds up the output to about 5 seconds.
Using setbuffer() in my testing code with a size of 16384 takes about the same amount of time, and almost the same for 8192: about 6 seconds.
setbuffer() apparently has no effect: it takes the same amount of time (on a TTY about 4 mins, on a PTS about 5 seconds).
The astonishing thing is that if I start the test on TTY1 and then switch to another TTY, it takes just the same as on a PTS: about 5 seconds.
Conclusion: the kernel does something which has to do with accessibility and user friendliness. HUH!
Normally, it should be equally slow no matter whether you stare at the TTY while it's active or you switch over to another TTY.
Lesson: when running output-intensive programs, switch to another TTY!
Unbuffered output is very slow.
By default stdout is fully buffered; however, when attached to a terminal, stdout is either unbuffered or line-buffered.
Try to switch on buffering for stdout using setvbuf(), like this:
static char buffer[8192];   /* must outlive all use of stdout, hence static */
setvbuf(stdout, buffer, _IOFBF, sizeof(buffer));
You could store your strings in a buffer and output them to a file (or console) at the end or periodically, when your buffer is full.
If outputting to a console, scrolling is usually a killer.
If you are printf()ing to the console, it's usually extremely slow. I'm not sure why, but I believe it doesn't return until the console graphically shows the output string. Additionally, you can't mmap() stdout.
Writing to a file should be much faster (but still orders of magnitude slower than computing a hash, all I/O is slow).
You can try to redirect the output in the shell from the console to a file. Using this, logs gigabytes in size can be created in just seconds.
I/O is always slow in comparison to straight computation. The system has to wait for more components to be available in order to use them. It then has to wait for the response before it can carry on. Conversely, if it's simply computing, then it's only really moving data between the RAM and CPU registers.
I've not tested this, but it may be quicker to append your hashes onto a string, and then just print the string at the end. Although if you're using C, not C++, this may prove to be a pain!
Questions 3 and 4 are beyond me, I'm afraid.
As I/O is always much slower than CPU computation, you might first store all values in the fastest I/O available. So use RAM if you have enough; use files if not, although they are much slower than RAM.
Printing out the values can then be done afterwards, or in parallel by another thread, so the calculation thread(s) need not wait until printf has returned.
I discovered long ago using this technique something that should have been obvious.
Not only is I/O slow, especially to the console, but formatting decimal numbers is not fast either. If you can put the numbers in binary into big buffers, and write those to a file, you'll find it's a lot faster.
Besides, who's going to read them? There's no point printing them all in a human-readable format if nobody needs to read all of them.
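A quick sketch of that approach: dump the raw values with a single fwrite() and skip the decimal formatting entirely (the hash values here are made-up placeholders):

#include <stdio.h>

int main(void)
{
    enum { N = 1000000 };
    static unsigned h[N];

    for (unsigned i = 0; i < N; i++)
        h[i] = i * 2654435761u;            /* placeholder "hashes" */

    FILE *f = fopen("hashes.bin", "wb");
    if (!f)
        return 1;
    fwrite(h, sizeof h[0], N, f);          /* one bulk binary write, no formatting */
    fclose(f);
    return 0;
}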
Why not create the strings on demand rather than at the point of construction? There is no point in outputting 40 screens of data in one second; how could you possibly read it? Why not create the output as required, just display the last screenful, and then produce more as the user scrolls?
Why not use sprintf to print to a string, build one concatenated string of all the results in memory, and print that at the end?
By switching to sprintf you can clearly see how much time is spent in the format conversion and how much is spent displaying the result to the console, and change the code appropriately.
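Something along those lines, sketched below; hash() is only a placeholder, and all the formatting goes into one in-memory buffer so the console sees a single large write at the end:

#include <stdio.h>

#define N 1000000

static unsigned hash(unsigned i) { return i * 2654435761u; }   /* placeholder */

int main(void)
{
    static char out[N * 12];    /* worst case: 10 digits + '\n' per value, plus slack */
    size_t len = 0;

    for (unsigned i = 0; i < N; i++)
        len += sprintf(out + len, "%u\n", hash(i));   /* formatting only, no I/O */

    fwrite(out, 1, len, stdout);                      /* one big write at the end */
    return 0;
}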
Console output is by definition slow; creating a hash only manipulates a few bytes of memory. Console output needs to go through many layers of the operating system, which will have code to handle thread/process locking, etc. Once it eventually gets to the display driver, which may be a 9600 baud device (or a large bitmap display), simple functions like scrolling the screen may involve manipulating megabytes of memory.
I guess the terminal is using some buffered output operations, so when you do a printf the output does not appear within split microseconds; it is stored in the buffer memory of the terminal subsystem.
This could also be impacted by other things that cause a slowdown; perhaps there's a memory-intensive operation running other than your program. In short, there are far too many things that could all be happening at the same time: paging, swapping, heavy I/O by another process, the configuration of the memory utilized, maybe a memory upgrade, and so on.
It might be better to concatenate the strings until a certain limit is reached and then write them all out at once. Or even use pthreads to carry out the desired process execution.
Edited:
As for 2 and 3, they are beyond me. For 4, I am not familiar with Sun, but I do know of and have messed with Solaris. There may be a kernel option to use a virtual tty... I'll admit it's been a while since I messed with kernel configs and recompiled one, so my memory may not be great on this; have a root around the options to see.
user@host:/usr/src/linux $ make; make menuconfig   (or make xconfig if running under X)
This will fire up the kernel menu; have a dig around in the video settings section under the devices sub-tree.
Edited:
But there's a tweak you can put into the kernel by adding a file to the proc filesystem (if such a thing exists), or possibly a switch passed to the kernel, something like this (this is imaginative and does not imply it actually exists): fastio
Hope this helps,
Best regards,
Tom.