Time to run instructions of a for loop - c

I want to measure a 125 μs interval in order to implement a TDM (Time Division Multiplexing) scheme. However, I cannot get this interval to within ±5 μs on Linux. I am using DPDK, which runs on Ubuntu and Intel hardware. If I read the time with clock_gettime(CLOCK_REALTIME), the cost of the call into the kernel is added to the measurement, which makes the duration inaccurate.
Therefore, I dedicated a CPU core to timing without asking the kernel for the time. I run a for loop for a fixed maximum number of instructions (8000000), measure the time spent, and from that derive the number of instructions that must be executed to fill 125 μs (i.e. (125*8000000)/timespent).
However, this also gives inaccurate results: the value differs from run to run, by about 1000 instructions.
Does anybody know why I am getting inaccurate results even though I am dedicating a CPU core to this?
Is there a way to measure a very short duration (around 125 μs) without making a call to the kernel? Thanks!

You are getting inaccurate results because you are on a multitasking operating system. You cannot do this on modern computers; you can only do it on an embedded microcontroller where you control 100% of the CPU time. The operating system still needs to manage your process, even if you have a dedicated CPU, and handling the mouse and keyboard takes time as well. You have to run the process on 'bare metal'.
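For comparison with the counted loop in the question, here is a minimal sketch of a busy-wait that polls a monotonic clock instead of counting iterations. It does not remove the interrupt and scheduling jitter described above; it only avoids tying the slot length to an iteration count. The helper names now_ns and spin_for_ns are hypothetical, and whether clock_gettime actually enters the kernel depends on the platform (on typical x86-64 Linux it is usually served in user space through the vDSO, but treat that as an assumption to verify on your system).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Current CLOCK_MONOTONIC reading in nanoseconds (hypothetical helper). */
static int64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Spin until at least slot_ns nanoseconds have elapsed.
 * Returns the elapsed time actually observed, so the caller can
 * log how far each slot overshot the target.
 */
static int64_t spin_for_ns(int64_t slot_ns)
{
    const int64_t start = now_ns();
    int64_t elapsed;

    do {
        elapsed = now_ns() - start;
    } while (elapsed < slot_ns);

    return elapsed;
}
```

Calling spin_for_ns(125000) waits for one 125 μs slot; subtracting 125000 from the return value gives the overshoot, which is the quantity to compare against the ±5 μs budget.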

Related

Are two successive calls to getrusage guaranteed to produce increasing results?

In a program that calls getrusage() twice in order to obtain the time of a task by subtraction, I have once seen an assertion, saying that the time of the task should be nonnegative, fail. This, of course, cannot easily be reproduced, although I could write a specialized program that might reproduce it more easily.
I have tried to find a guarantee that the values returned by getrusage() increase during execution, but neither the man page on my system (Linux on x86-64) nor this system-independent description says so explicitly.
The behavior was observed on a physical computer, with several cores, and NTP running.
Should I report a bug against the OS I am using? Am I asking too much when I expect getrusage() to increase with time?
On many systems rusage (I presume you mean ru_utime and ru_stime) is not calculated accurately, it's just sampled once per clock tick which is usually as slow as 100Hz and sometimes even slower.
The primary reason for that is that many machines have clocks that are incredibly expensive to read, and you don't want to do this accounting precisely (you'd have to read the clock twice for every system call). You could easily end up spending more time reading clocks than doing anything else in programs that make many system calls.
The counters should never go backwards, though. I saw that happen many years ago on a system where the total running time of the process was tracked on context switches (which was relatively cheap), and getrusage calculated utime by using samples for stime and subtracting that from the total running time. The clock used in that case was the wall clock instead of a monotonic clock, so when you changed the time on the machine, the running time of processes could go backwards. But that was of course a bug.

Is there a way to suspend OS scheduling for the duration of a program?

I have an assignment where I am analyzing the runtime of various sorting algorithms. I have written the code but I think it's an unfair comparison.
My code basically grabs the clock time before and after the sort to compute the elapsed time. However, what if the OS interrupts more frequently during the run of one particular sorting algorithm, or decides that some other background application should get more CPU time when its thread comes back up?
I am not a CS major so I may not be entirely correct here, but from what I've read previously I was concerned this might have an impact on the results.
I also realize that if OS scheduling is suspended and the program hangs then there might be a serious problem; I am just wondering if it is possible.
Normally, there's no real reason for it. The scheduler will slightly increase the execution time, but if the code runs for a few seconds, the change will be tiny.
So unless you're running heavy applications on the same computer, the amount of noise this will add to your tests is negligible.
In Linux, you can use the isolcpus kernel parameter to mark CPUs that won't be used by the scheduler. You can find information here. I'm not sure what the minimum kernel version is.
If you use it, you'll need to call sched_setaffinity to put your thread on an isolated CPU, because the scheduler won't put it there.
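As a minimal sketch of the sched_setaffinity step, the helper below pins the calling thread to a single CPU. The function name pin_to_cpu is hypothetical, and the CPU number you pass is an assumption: it should be whichever CPU you listed in isolcpus on your machine.

```c
#define _GNU_SOURCE   /* required for sched_setaffinity and the CPU_* macros */
#include <sched.h>

/*
 * Pin the calling thread to one CPU.
 * Returns 0 on success, -1 on error (errno is set by sched_setaffinity).
 */
static int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);

    /* A pid of 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

For example, if the kernel was booted with isolcpus=3, calling pin_to_cpu(3) at the start of the program moves the thread onto that CPU before any timing begins.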
It is not possible, not in user space code. Otherwise, any malicious process could steal the CPU from others.
If you want precise time counting for your process only, I suggest using the time command. You can read about it here: What do 'real', 'user' and 'sys' mean in the output of time(1)?
Quick answer: you are most likely interested in user time, assuming your code doesn't make a heavy use of syscalls (which would be rather strange for a sorting algorithm)
On an up-to-date POSIX system (basically Linux) you can use clock_gettime with CLOCK_PROCESS_CPUTIME_ID or CLOCK_THREAD_CPUTIME_ID if you make sure the process doesn't migrate between CPUs (you can set its affinity for example).
The difference between the times returned by clock_gettime with those arguments gives the exact time the process/thread spent executing. The only pitfall, as I mentioned, is process migration; as the man page says:
The CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID clocks are realized on many platforms using timers from the CPUs (TSC on i386, AR.ITC on Itanium). These registers may differ between CPUs and as a consequence these clocks may return bogus results if a process is migrated to another CPU.
This means that you don't really need to suspend all other processes just to measure the execution time of your program.
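A minimal sketch of that approach, assuming the code under test is a call to the standard qsort on an array of ints. The helper names cmp_int and cpu_seconds_for_sort are hypothetical; substitute your own sorting routine for the qsort call.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <time.h>

/* Three-way comparison of two ints for qsort. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);
}

/*
 * Sort n ints in place and return the CPU time, in seconds, that the
 * process consumed while doing so. Time spent descheduled is not counted.
 */
static double cpu_seconds_for_sort(int *data, size_t n)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t0);
    qsort(data, n, sizeof data[0], cmp_int);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &t1);

    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

In a multithreaded program, CLOCK_THREAD_CPUTIME_ID can be swapped in to count only the calling thread.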

Measure time complexity of a program in any programming language

I am searching for a standard way to identify running time complexity of a program.
As described here, I am not looking for a way to analyze the complexity by inspecting the code, but rather through some other parameters measured at program runtime.
Consider a program which requires the user to convert a binary string to its decimal equivalent. The time complexity for such a program should be O(n) at worst, when each binary digit is processed at a time. With some intelligence, the running time can be reduced to O(n/4) (process 4 digits from the binary string at a time, assume that the binary string has 4k digits for all k=1,2,3...)
I wrote this program in C and measured its running time with both the time command and a function that uses gettimeofday, on a Linux box with a 64-bit quad-core processor (each core at 800 MHz), under two conditions:
When system is under normal load (core usage 5-10%)
When system is under heavy load (core usage 80-90%)
Following are the readings for O(n) algorithm, length of binary string is 100000, under normal load:
Time spent in User (ms) - 216
Time Spent in Kernel (ms) - 8
Timed using gettimeofday (ms) - 97
Following are the readings for O(n) algorithm, length of binary string is 200000, under high load:
Time spent in User (ms) - 400
Time Spent in Kernel (ms) - 48
Timed using gettimeofday (ms) - 190
What I am looking for:
If I am using time command, which output should I consider? real, user or sys?
Is there a standard method to calculate the running time of a program?
Every time I execute these commands, I get a different reading. How many times should I sample so that the average will always be the same, given that the code does not change?
What if I want to use multiple threads and measure the time in each thread by calling execve on such programs?
From the research I have done, I have not come across any standard approach. Also, whatever command or method I use gives me a different output each time (I understand this is because of context switches and CPU cycles). We can assume here that I can even make do with a solution that is machine dependent.
To answer your questions:
Which component of the output of time is significant depends on what your code is doing. This question deals with what those components mean. If the code you're timing doesn't make system calls, the "user" time is probably sufficient. I'd probably just use the "real" time.
What's wrong with time? If you need better granularity (i.e. you just want to time a section of code instead of the entire program), you can always get the start time before the block of code you are profiling, run the code, get the end time, and then calculate the difference to give you the runtime. NEVER use gettimeofday for this, as its time does not increase monotonically: the system time can be changed by the administrator or an NTP process. You should use clock_gettime with a monotonic clock such as CLOCK_MONOTONIC instead.
To minimise the runtime differences from run to run, I would check that cpu frequency scaling is OFF especially if you're getting very wildly differing results. This has caught me out before.
Once you start getting into multiple threads, you might want to start looking at a profiler. gprof is a good place to start.
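The start/stop pattern from the second answer can be sketched as follows, using CLOCK_MONOTONIC. Repeating the run and keeping the shortest time is one common way to damp run-to-run noise; it is an assumption of this sketch, not something the answers above prescribe. The names busy_sum, seconds_between and min_elapsed_seconds are hypothetical, and busy_sum only stands in for the code you actually want to time.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Placeholder workload: sums 0..n-1, where arg points to a long n. */
static void busy_sum(void *arg)
{
    long n = *(long *)arg;
    volatile long acc = 0;   /* volatile keeps the loop from being optimized away */
    for (long i = 0; i < n; i++)
        acc += i;
}

/* Seconds elapsed from *a to *b. */
static double seconds_between(const struct timespec *a, const struct timespec *b)
{
    return (double)(b->tv_sec - a->tv_sec)
         + (double)(b->tv_nsec - a->tv_nsec) / 1e9;
}

/*
 * Run work(arg) `runs` times and return the shortest wall-clock time
 * in seconds, or -1.0 if runs is not positive.
 */
static double min_elapsed_seconds(void (*work)(void *), void *arg, int runs)
{
    double best = -1.0;

    for (int i = 0; i < runs; i++) {
        struct timespec t0, t1;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        work(arg);
        clock_gettime(CLOCK_MONOTONIC, &t1);

        double s = seconds_between(&t0, &t1);
        if (best < 0.0 || s < best)
            best = s;
    }
    return best;
}
```

A call such as min_elapsed_seconds(busy_sum, &n, 10) with long n = 1000000 times ten runs of the placeholder and reports the fastest one.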

CPU load of my program

How can we find out how much load our program puts on the CPU?
I tried to find it using htop, but htop won't give the CPU load. It actually gives the CPU utilization percentage of my program (by pid).
I am using C programming, Linux environment.
The function you are probably looking for is getrusage. It fills struct rusage. There are two members of the struct you are interested in:
ru_utime - user CPU time used
ru_stime - system CPU time used
You can call the function at regular intervals of time and based on the results you can estimate the cpu load (e.g. in percentage) of your own process.
If you want to get it at the system level, then you need to read (and parse) the /proc/stat file (also at regular intervals), see here.
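A minimal sketch of the per-process estimate described above: sample ru_utime + ru_stime and a wall clock at two points, then divide the CPU time consumed by the wall time elapsed. The helper names are hypothetical, and using CLOCK_MONOTONIC for the wall-clock samples is an assumption of this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

/* User + system CPU seconds used so far by this process, or -1.0 on error. */
static double process_cpu_seconds(void)
{
    struct rusage ru;

    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1.0;

    return (double)ru.ru_utime.tv_sec + (double)ru.ru_utime.tv_usec / 1e6
         + (double)ru.ru_stime.tv_sec + (double)ru.ru_stime.tv_usec / 1e6;
}

/* Monotonic wall-clock reading in seconds. */
static double wall_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * CPU load in percent between two samples: CPU seconds consumed
 * divided by wall seconds elapsed. Returns 0.0 for an empty interval.
 */
static double cpu_load_percent(double cpu0, double cpu1,
                               double wall0, double wall1)
{
    double wall = wall1 - wall0;
    return wall > 0.0 ? 100.0 * (cpu1 - cpu0) / wall : 0.0;
}
```

To use it, record process_cpu_seconds() and wall_seconds() at the start of an interval, record both again at the end, and pass the four values to cpu_load_percent. A multithreaded process can report more than 100% because its threads accumulate CPU time in parallel.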

Does the frequency of the machine affect the execution time of my code?

I have written code which performs a specific task. If I run it on a different machine (with a different frequency), will it take a different amount of time?
Question
If my code has one printf function, then will its required number of machine cycles be fixed for all machines, or will it depend on the system?
My system frequency is 2.0GHz, what does it mean?
The performance time of the code will depend on the frequency of the CPU, amongst many other things. All other things being equal, a faster CPU will take less time to execute the same instructions. But the number of other things that can affect the timing is vast, including O/S, compiler, memory chips, disk and so on.
If the machines have the same basic architecture, then the number of machine cycles is fixed. However, modern CPU architectures are very complex, and there could easily be variations depending on what else is running on the machine at the same time. If the machines have different chip types (even within a family such as Intel Core 2 Duo), then the results could be different. If the machines are of different architectures (Intel vs SPARC or PowerPC, say), then all bets are off.
If the 'frequency is 2.0 GHz', then it means that the main CPU clock cycles at 2.0 GHz. How many instructions are executed in that time depends on the instructions, and the parallelism (how many cores), and the CPU type, etc. The CPU frequency is separate from the bus frequency which controls how fast memory can be read (so, I'm using a 2.0 GHz CPU but the memory bus runs at 1067 MHz).
Clock speed of a computer of course has its influence on the execution time of a program, but just stating that the processor runs at 2 GHz is absolutely not enough to determine how long exactly the program will run because there are huge differences in "efficiency" between the processor families - an Intel Core family processor will just do a lot more work per time unit than its predecessor, the Pentium 4, when both run at the same speed.
So yes, CPU speed has a serious influence on the execution time of a program but just the GHz value is absolutely not enough. That's why various benchmarks were set up, to be able to compare the work a processor can do in a time unit. These benchmarks will run a mix of instructions that can be considered a typical workload in a chosen scenario, and time how long their execution will take. Check out Whetstone and Dhrystone for some older but relatively easy to understand benchmarks.
The fact that there are tons of benchmarks only proves that it's not easy at all to obtain a comparable value on whose relevance everybody can agree, it remains a topic for debate...
The frequency of the CPU defines how much work it can do within a certain time. The code is the same on all machines (i.e. it's compiled code) so yes the frequency will affect the time it takes to run your program.
