I want to measure the execution time of a C code segment on Linux.
I take one timestamp at the beginning of the code segment and one at the end.
But I don't know how to protect the code against IRQs and context switches to higher-priority tasks. The program runs in user space!
The code segment is short, so don't worry about it hosing the system.
Does anyone know an easy solution for this kind of protection?
You can use getrusage(2) to get the CPU time used, rather than just measuring real time. That should get you the answer you want without having to resort to funny business like blocking other programs from running.
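For example, here is a minimal sketch of that approach, assuming the segment to be timed lives in a hypothetical code_segment() function:

#include <stdio.h>
#include <sys/resource.h>

static void code_segment(void) { /* the code being timed */ }

int main(void) {
    struct rusage a, b;
    getrusage(RUSAGE_SELF, &a);
    code_segment();
    getrusage(RUSAGE_SELF, &b);
    /* ru_utime = user CPU time, ru_stime = system CPU time */
    double user = (b.ru_utime.tv_sec - a.ru_utime.tv_sec)
                + (b.ru_utime.tv_usec - a.ru_utime.tv_usec) / 1e6;
    double sys  = (b.ru_stime.tv_sec - a.ru_stime.tv_sec)
                + (b.ru_stime.tv_usec - a.ru_stime.tv_usec) / 1e6;
    printf("user %.6f s, sys %.6f s\n", user, sys);
    return 0;
}

Since only CPU time charged to your own process is counted, time spent running other tasks largely doesn't show up in the result.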
For the C function system(), would it affect the hardware counters if you are trying to see how the command you ran performed?
For example, let's say I'm using the Performance API (PAPI) and the program is a precompiled matrix multiplication application:
int events[7] = { PAPI_TOT_CYC, PAPI_TOT_INS, PAPI_FP_INS, PAPI_FP_OPS,
                  PAPI_LD_INS, PAPI_SR_INS, PAPI_BR_TKN };
long long values[7];

PAPI_start_counters(events, 7);
system("./matmul");
PAPI_read_counters(values, 7);
// Print out values
PAPI_stop_counters(values, 7);
I am obviously missing a bit, but what I am trying to find out is whether it is possible, through the use of said counters, to get the performance of a program I'm running.
From my tests I would get wild numbers like the ones below. They are obviously wrong; I just want to find out why:
Total Cycles =========== 140733358872510
Instructions Completed =========== 4203968
Floating Point Instructions =========== 0
Floating Point Operations =========== 4196867
Loads =========== 140733358872804
Stores =========== 4204037
Branches Taken =========== 15774436
system() is a very slow function in general. On Linux, it spawns /bin/sh (forking and executing a full shell process), which parses your command, and spawns the second program. Loading these two programs requires loading the code to memory, initializing all their libraries, executing startup code, etc. Only then will the program code actually start executing.
Because of the unpredictability of disk access and Linux process scheduling, timing system() calls has a very high inherent variability. Therefore, you won't get accurate results even if you use a high-performance counter.
The better solution would be to compile the target program as a library instead. Load it before initializing your counters, then just execute the main function from the library. That way, all the code executes in your process, and you have negligible startup time. Your performance numbers will be much more precise this way.
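A rough sketch of what that could look like, assuming matmul has been rebuilt as a shared object libmatmul.so exporting a hypothetical entry point matmul_main() (link the host program with -ldl):

#include <dlfcn.h>
#include <stdio.h>

int main(void) {
    /* Load the library up front, so loading isn't counted. */
    void *lib = dlopen("./libmatmul.so", RTLD_NOW);
    if (!lib) { fprintf(stderr, "%s\n", dlerror()); return 1; }

    int (*matmul_main)(void) = (int (*)(void))dlsym(lib, "matmul_main");
    if (!matmul_main) { fprintf(stderr, "%s\n", dlerror()); return 1; }

    /* ... start PAPI counters here ... */
    matmul_main();   /* runs inside this process: no fork/exec/shell */
    /* ... read and stop PAPI counters here ... */

    dlclose(lib);
    return 0;
}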
Do you have access to the code of matmul? If so, it's much more precise to instrument and measure only the code you're interested in. That means you wrap only those instructions (or C statements) in counters that you want to measure.
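For instance, if you could edit matmul's source, a sketch like this counts only the multiply loop itself (n, a, b, c are placeholders for the real matrices):

int events[2] = { PAPI_TOT_CYC, PAPI_FP_OPS };
long long values[2];

PAPI_start_counters(events, 2);
for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
        for (int k = 0; k < n; k++)
            c[i][j] += a[i][k] * b[k][j];
PAPI_stop_counters(values, 2);
// values[0] holds cycles, values[1] FP operations, for the loop alone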
For more information see:
Related discussion here
Intel® Performance Counter Monitor here
Performance measurements with x86 RDTSC instruction here
As stated above, measuring using PAPI to wrap system() invocations carries way too much process overhead to give you any idea of how fast your math code is actually running.
The numbers you are getting are odd, but not necessarily wrong. The huge disparity between the instructions completed and the cycles probably indicates that the executable "matmul" spends a lot of time waiting for external operations (e.g. disk I/O) to complete. I do not know the specifics of how PAPI distinguishes FP Instructions from FP Operations, but if it is displaying those values differently, PAPI has a reason.
What is interesting is that the loads and cycles are obviously connected, as are the instructions/FP ops and stores.
I would have to know about the internals of "matmul" in order to give you a better description.
I am wondering if it is possible to force a cache flush in C on Linux/x86. I have read several answers explaining how to do this from the shell or using asm/cache.h (which would require me to write a Linux kernel module...).
I am using the PAPI library, which allows me to get very close to the exact number of clock cycles that a given block of code takes to execute. However, since I want to time some extremely short functions, I need to run them many times for accurate statistics (the timing function call takes longer than the code within the blocks takes to execute). By running the code multiple times, the cache speeds up successive calls of the same block of code, and I would like to prevent this!
I don't know of any standard way to do this other than loading other data into the cache. My usual workaround is simply to process something large enough to "cool down" the cache, say a matrix multiplication.
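A minimal sketch of such a "cool down" pass, assuming 64 MiB comfortably exceeds the last-level cache on your machine (adjust the size to your hardware):

#include <stdlib.h>

static void cool_down_cache(void) {
    static volatile unsigned char *buf;
    const size_t size = 64 * 1024 * 1024;   /* assumed > LLC size */
    if (!buf)
        buf = malloc(size);
    if (!buf)
        return;   /* allocation failed */
    /* Touch one byte per 64-byte cache line to evict earlier data. */
    for (size_t i = 0; i < size; i += 64)
        buf[i]++;
}

Call cool_down_cache() between timed iterations so each run starts from a (mostly) cold cache.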
I have an assignment where I am analyzing the runtime of various sorting algorithms. I have written the code but I think it's an unfair comparison.
My code basically grabs the clock time before and after the sorting is finished to compute the elapsed time. However, what if the OS decides to interrupt more frequently during the runtime of a specific sorting algorithm, or decides that some other background application should be given more CPU time when its thread comes back up?
I am not a CS major, so I may not be entirely correct here, but from what I've read previously I was concerned this might have an impact on the results.
I also realize that if OS scheduling is suspended and the program hangs, then there might be a serious problem; I am just wondering if it is possible.
Normally, there's no real need for it. Scheduler activity will slightly increase the measured execution time, but if the code runs for a few seconds, the difference will be tiny.
So unless you're running heavy applications on the same computer, the amount of noise this will add to your tests is negligible.
In Linux, you can use the isolcpus kernel parameter to mark CPUs that won't be used by the scheduler. You can find information here. I'm not sure what the minimal kernel version is.
If you use it, you'll need to call sched_setaffinity to put your thread on an isolated CPU, because the scheduler won't put it there on its own.
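A minimal sketch, assuming the kernel was booted with isolcpus=3 so that CPU 3 is isolated:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(3, &set);   /* the isolated CPU from isolcpus=3 */

    /* pid 0 means the calling thread */
    if (sched_setaffinity(0, sizeof set, &set) == -1) {
        perror("sched_setaffinity");
        return 1;
    }

    /* ... run the code to be timed here ... */
    return 0;
}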
It is not possible, not in user space code. Otherwise, any malicious process could steal the CPU from others.
If you want precise time counting for your process only, I suggest using the time command. You can read about it here: What do 'real', 'user' and 'sys' mean in the output of time(1)?
Quick answer: you are most likely interested in user time, assuming your code doesn't make heavy use of syscalls (which would be rather strange for a sorting algorithm).
On an up-to-date POSIX system (basically Linux) you can use clock_gettime with CLOCK_PROCESS_CPUTIME_ID or CLOCK_THREAD_CPUTIME_ID if you make sure the process doesn't migrate between CPUs (you can set its affinity for example).
The difference between the times returned by clock_gettime with those arguments gives the exact time the process/thread spent executing. The only pitfall, as I mentioned, is process migration, as the man page says:
The CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID clocks are realized on many platforms using timers from the CPUs (TSC on i386, AR.ITC on Itanium). These registers may differ between CPUs and as a consequence these clocks may return bogus results if a process is migrated to another CPU.
This means that you don't really need to suspend all other processes just to measure the execution time of your program.
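A minimal sketch of that measurement (on older glibc you may need to link with -lrt):

#include <stdio.h>
#include <time.h>

int main(void) {
    struct timespec t0, t1;

    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t0);
    /* ... the sorting code to be measured ... */
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t1);

    double cpu = (t1.tv_sec - t0.tv_sec)
               + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("CPU time consumed: %.9f s\n", cpu);
    return 0;
}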
As a part of my academic project I have to execute a C program.
I want to get the execution time of the program. For that I have to suspend all other processes in Linux for a few seconds. Is there any method for doing that?
(I have tried using the time command in Linux, but it is not working properly: it shows a different execution time each time I execute the same program. So I am computing the execution time from the difference between the start time and the end time.)
About the best way I can think of is to drop to single-user mode, which you get with
# init 1
on pretty much any distribution. This will also stop X, you'll be on a raw console. Handling interrupts from stray mouse movement is likely to be one of the reasons for whatever variability you're seeing, so that's a good thing.
When you want your full system back, init 3 is probably the one, that or init 5.
The usual way to do this is to try to quiesce the machine as much as possible, then take several measurements and average them. It's advisable to discard the first reading, as that's likely to involve population of caches.
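A sketch of that methodology, using CLOCK_MONOTONIC for wall-clock readings and discarding the first, cold-cache run (the run count of 10 is arbitrary):

#include <stdio.h>
#include <time.h>

#define RUNS 10

int main(void) {
    double sum = 0.0;
    for (int i = 0; i < RUNS; i++) {
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        /* ... the workload being measured ... */
        clock_gettime(CLOCK_MONOTONIC, &t1);
        double dt = (t1.tv_sec - t0.tv_sec)
                  + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        if (i == 0)
            continue;   /* discard the first reading (cold caches) */
        sum += dt;
    }
    printf("mean of warm runs: %.6f s\n", sum / (RUNS - 1));
    return 0;
}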
It is impossible to get the exact execution time of a process on a system in which the scheduler switches from one process to another.
Intel processors provide a register that counts clock cycles (the TSC, read via RDTSC), but even so it is impossible to measure the time exactly.
There is a book that you can find as a PDF on Google, "Computer Systems: A Programmer's Perspective" -- in this book a whole chapter is dedicated to time measurements.
Use the time command. The sum user + sys will give you the time your program used the CPU directly, plus the time the system used the CPU on behalf of your program. I think that is what you want to know.
There will always be some variation in execution time no matter how many processes you shut down: polling, I/O, and background daemons all affect execution priority.
The academic approach would be to run a sizeable sample and take statistics. You might also want to take a look at sar to log the background activity, so you can invalidate any readings taken during noisy periods.
Try executing your application with nice -n -20 (the highest priority, which requires root). It may help keep the other processes from getting in the way. Note that positive nice values lower your program's priority; negative ones raise it.
nice man page
I want to measure the running time of a specific system call. For example, I want to know how much time a pread needs, on both CPU and I/O.
Which function should I use?
Right now I use times, and it works.
gettimeofday returns the current wall-clock time, so it may not isolate the running time of a specific process, right?
clock returns the CPU time this program has used so far; does this include the I/O time? If there are other programs running, will they influence the value this function returns? I mean something like the running process being switched out.
getrusage seems like an ideal one, but it also returns only the CPU time of a specific process.
Does anyone know how benchmark tools like iozone calculate system call times? I've read its code, and still have no idea.
You're looking for times(2).
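A minimal sketch of using it to time a batch of preads; "testfile" is a hypothetical path, and the call is repeated because times(2) ticks are coarse (typically 100 per second):

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/times.h>

int main(void) {
    char buf[4096];
    long hz = sysconf(_SC_CLK_TCK);
    int fd = open("testfile", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct tms t0, t1;
    clock_t w0 = times(&t0);   /* return value is wall-clock ticks */
    for (int i = 0; i < 100000; i++)
        pread(fd, buf, sizeof buf, 0);
    clock_t w1 = times(&t1);

    printf("wall %.2f s, user %.2f s, sys %.2f s\n",
           (double)(w1 - w0) / hz,
           (double)(t1.tms_utime - t0.tms_utime) / hz,
           (double)(t1.tms_stime - t0.tms_stime) / hz);
    return 0;
}

The wall time minus (user + sys) approximates the time the calls spent blocked on I/O.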