I am making an emulator for the 6502 CPU and I want to control the frequency at which it is running.
Without any delay, my maximum average frequency is about 70-80 MHz. However, when I try to use nanosleep to control the frequency, it only works down to a 1 ms delay (1,000,000 ns).
If I try to set the delay any lower, for example to 0.1 ms, it does not change the average execution timing. What is strange to me is that if I set the delay to 0.1 ms, the actual delay turns out to be ~0.008 ms, and it does not change for any value in the range 1-99999 ns.
I have tried compiling the code in both debug and release mode.
Am I using nanosleep wrong? It is my first time using nanosleep; I have only used sleep and usleep before, so I am a bit confused.
Is there any better way to control the frequency of code execution?
How do I correctly use nanosleep() to run code at a specific frequency?
I would say it's impossible to use nanosleep alone to run code at a specific frequency (or rather, it's only possible up to a certain accuracy, and with some drift). From man nanosleep (emphasis mine, but the whole passage is relevant):
The nanosleep() function shall cause the current thread to be suspended from execution until either the time interval specified
by the rqtp argument has elapsed or a signal is delivered to the calling thread, and its action is to invoke a signal-catching
function or to terminate the process. The suspension time may be longer than requested because the argument value is rounded up
to an integer multiple of the sleep resolution or because of the scheduling of other activity by the system. But, except for the
case of being interrupted by a signal, the suspension time shall not be less than the time specified by rqtp, as measured by the
system clock CLOCK_REALTIME.
Is there any better way to control the frequency of code execution?
Use timers and try to depend on the kernel to properly schedule periodic execution, rather than doing it yourself. Use timer_create to request an interrupt after a specified timeout and run your code on that timeout, and you may also consider making your process a real-time process (a sketch of that follows the pseudocode below). In pseudocode, I think I would:
#include <signal.h>
#include <time.h>
#include <unistd.h>

void timer_thread(union sigval s) {
    do_your_code();   /* whatever you want to run at the chosen frequency */
}

int main() {
    timer_t t;
    timer_create(CLOCK_MONOTONIC, &(struct sigevent){
        .sigev_notify = SIGEV_THREAD,
        .sigev_notify_function = timer_thread,
    }, &t);
    /* arm the timer: first expiry after 1 s, then every 1 s; link with -lrt on older glibc */
    timer_settime(t, 0, &(struct itimerspec){
        .it_value    = { .tv_sec = 1 },
        .it_interval = { .tv_sec = 1 },
    }, NULL);
    while (1) {
        // just do nothing - code will be triggered in a separate thread by the timer
        pause();
    }
}
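On the real-time suggestion above: a minimal sketch of what that could look like on Linux, assuming the process runs as root or has CAP_SYS_NICE (the helper name and the priority value 10 are just for illustration):

    #include <sched.h>
    #include <stdio.h>

    /* Illustrative helper: ask the kernel to schedule us with the SCHED_FIFO
     * real-time policy, which reduces scheduling latency for the timer callbacks. */
    static int make_realtime(void)
    {
        struct sched_param sp = { .sched_priority = 10 };  /* arbitrary RT priority */
        if (sched_setscheduler(0, SCHED_FIFO, &sp) == -1) {
            perror("sched_setscheduler");
            return -1;
        }
        return 0;
    }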
I am writing a C program on Linux. I have a main thread which continuously updates the values of two variables, and another thread writes those variable values into a file every 20 milliseconds. I have used usleep to achieve this time interval. Sample code is below.
int main()
{
    ...
    pthread_create(...write_file..); /* start another thread running write_file */
    while (variable1)
    {
        /* update the values of the variables */
    }
    return 0;
}
void write_file()
{
    ...
    fp = fopen("sample.txt", "a");
    while (variable2)
    {
        fprintf(fp, " %d \n", somevariable);
        usleep(20 * 1000);
    }
    fclose(fp);
}
Is it suitable to use the usleep function to achieve a 20 millisecond time interval, or should I use some other method, like a timer?
Is usleep accurate enough? Does this sleep function affect the main thread in any way?
Using the sleep() family of functions often results in imprecise timing, especially when the process has many CPU-consuming threads and the required intervals are relatively small, like 20 ms. So you shouldn't assume that a *sleep() call blocks execution for exactly the specified time. In the situation described above, the actual sleep duration may be twice as long as specified, or more (assuming the kernel is not a real-time one). As a result you should implement some kind of compensation logic that adjusts the sleep duration for subsequent calls.
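Not from the original answer, just a rough illustration of that compensation idea: measure how long the sleep actually took with clock_gettime() and shorten the next request by the overshoot. A minimal sketch, assuming a 20 ms target period (the names are made up):

    #include <time.h>

    #define PERIOD_NS (20 * 1000000L)   /* 20 ms target period */

    static long long ts_diff_ns(const struct timespec *a, const struct timespec *b)
    {
        return (long long)(b->tv_sec - a->tv_sec) * 1000000000LL
             + (b->tv_nsec - a->tv_nsec);
    }

    void periodic_loop(void)
    {
        long long overshoot = 0;          /* how much the previous sleep overslept */
        struct timespec before, after, req;

        for (;;) {
            /* do_work();  -- whatever must run every 20 ms */

            long long want = PERIOD_NS - overshoot;
            if (want < 0)
                want = 0;
            req.tv_sec = want / 1000000000LL;
            req.tv_nsec = want % 1000000000LL;

            clock_gettime(CLOCK_MONOTONIC, &before);
            nanosleep(&req, NULL);
            clock_gettime(CLOCK_MONOTONIC, &after);

            /* carry the oversleep forward so the average period stays near 20 ms */
            overshoot = ts_diff_ns(&before, &after) - want;
        }
    }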
A more precise (but of course not ideal) approach is to use POSIX timers; see timer_create(). The most precise timers are the ones that use SIGEV_SIGNAL or SIGEV_THREAD_ID notification (the latter is Linux-only). As the signal number you can use one of the real-time signals (SIGRTMIN to SIGRTMAX), but be aware that pthread implementations often use a few of these signals internally, so you should choose the actual number carefully. Also, doing something in signal-handler context requires extra attention, because not every library function may be used safely there. You can find the list of async-signal-safe functions here.
P.S. Also note that select() called with empty sets is a fairly portable way to sleep with subsecond precision.
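For reference, the select()-with-empty-sets trick looks roughly like this (a sketch; the 500 µs timeout and the helper name are arbitrary):

    #include <sys/select.h>

    /* Sleep for about 500 microseconds without touching any file descriptors. */
    void subsecond_sleep(void)
    {
        struct timeval tv = { .tv_sec = 0, .tv_usec = 500 };
        select(0, NULL, NULL, NULL, &tv);
    }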
Sleeping: sleep() and usleep()
Now, let me start with the easier timing calls. For delays of multiple seconds, your best bet is probably to use sleep(). For delays of at least tens of milliseconds (about 10 ms seems to be the minimum delay), usleep() should work. These functions give the CPU to other processes ("sleep"), so CPU time isn't wasted. See the manual pages sleep(3) and usleep(3) for details.
For delays of under about 50 milliseconds (depending on the speed of your processor and machine, and the system load), giving up the CPU takes too much time, because the Linux scheduler (for the x86 architecture) usually takes at least about 10-30 milliseconds before it returns control to your process. Due to this, in small delays, usleep(3) usually delays somewhat more than the amount that you specify in the parameters, and at least about 10 ms.
nanosleep()
In the 2.0.x series of Linux kernels, there is a new system call, nanosleep() (see the nanosleep(2) manual page), that allows you to sleep or delay for short times (a few microseconds or more).
For delays <= 2 ms, if (and only if) your process is set to soft real time scheduling (using sched_setscheduler()), nanosleep() uses a busy loop; otherwise it sleeps, just like usleep().
The busy loop uses udelay() (an internal kernel function used by many kernel drivers), and the length of the loop is calculated using the BogoMips value (the speed of this kind of busy loop is one of the things that BogoMips measures accurately). See /usr/include/asm/delay.h for details on how it works.
Source: http://tldp.org/HOWTO/IO-Port-Programming-4.html
Try using nanosleep() instead of usleep(); it should be more accurate for a 20 ms interval.
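For a 20 ms interval that could look like the sketch below (the function name is just for illustration); the EINTR loop simply restarts the sleep with the remaining time if a signal interrupts it:

    #include <errno.h>
    #include <time.h>

    /* Sleep for 20 ms, restarting if a signal interrupts the sleep. */
    void sleep_20ms(void)
    {
        struct timespec req = { .tv_sec = 0, .tv_nsec = 20 * 1000000L };
        struct timespec rem;

        while (nanosleep(&req, &rem) == -1 && errno == EINTR)
            req = rem;   /* continue with whatever time was left */
    }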
I want to implement a delay function using null loops, but the amount of time needed to complete one loop iteration is compiler- and machine-dependent. I want my program to determine the time on its own and delay for the specified amount of time. Can anyone give me any idea how to do this?
N.B. There is a function named delay() which suspends the system for the specified number of milliseconds. Is it possible to suspend the system without using this function?
First of all, you should never sit in a loop doing nothing. Not only does it waste energy, as it keeps your CPU 100% busy counting your loop counter; in a multitasking system it also decreases overall system performance, because your process keeps getting time slices while it appears to be doing something.
Next point is ... I don't know of any delay() function. This is not standard C. In fact, until C11, there was no standard at all for things like this.
POSIX to the rescue: there are usleep(3) (deprecated) and nanosleep(2). If you're on a POSIX-compliant system, you'll be fine with those. They block (meaning the scheduler of your OS knows they have nothing to do and schedules them again only after the end of the call), so you don't waste CPU power.
If you're on Windows, for a direct delay in code, you only have Sleep(). Note that THIS function takes milliseconds, but normally only has a precision of around 15 ms. Often good enough, but not always. If you need better precision on Windows, you can request more timer interrupts using timeBeginPeriod(): timeBeginPeriod(1); will request a timer interrupt each millisecond. Don't forget to call timeEndPeriod() with the same value as soon as you don't need the precision any more, because more timer interrupts come with a cost: they keep the system busy, thus wasting more energy.
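For what it's worth, that pattern could be wrapped up as in the sketch below (link against winmm for timeBeginPeriod()/timeEndPeriod(); 1 ms is typically the finest period the multimedia timer accepts, and the wrapper name is made up):

    #include <windows.h>

    /* Temporarily raise the timer interrupt rate so Sleep() gets ~1 ms granularity. */
    void precise_sleep_ms(DWORD ms)
    {
        timeBeginPeriod(1);   /* request 1 ms scheduler/timer resolution */
        Sleep(ms);
        timeEndPeriod(1);     /* restore the previous resolution */
    }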
I had a somewhat similar problem developing a little game recently: I needed constant ticks at 10 ms intervals, and this is what I came up with for POSIX-compliant systems and for Windows. The ticker_wait() function in that code just suspends until the next tick; maybe this is helpful if your original intent was some timing issue.
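The linked code isn't reproduced here, but a minimal sketch of what such a ticker_wait() might look like on a POSIX system, assuming clock_nanosleep() with an absolute 10 ms deadline, is:

    #include <time.h>

    #define TICK_NS (10 * 1000000L)   /* 10 ms per tick */

    static struct timespec next_tick;

    void ticker_init(void)
    {
        clock_gettime(CLOCK_MONOTONIC, &next_tick);
    }

    /* Suspend until the next 10 ms boundary; absolute deadlines avoid cumulative drift. */
    void ticker_wait(void)
    {
        next_tick.tv_nsec += TICK_NS;
        if (next_tick.tv_nsec >= 1000000000L) {
            next_tick.tv_nsec -= 1000000000L;
            next_tick.tv_sec += 1;
        }
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next_tick, NULL);
    }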
Unless you're on a real-time operating system, anything you program yourself directly is not going to be accurate. You need to use a system function to sleep for some amount of time like usleep in Linux or Sleep in Windows.
Because the operating system could interrupt the process sooner or later than the exact time expected, you should get the system time before and after you sleep to determine how long you actually slept for.
Edit:
On Linux, you can get the current system time with gettimeofday, which has microsecond resolution (whether the actual clock is that accurate is a different story). On Windows, you can do something similar with GetSystemTimeAsFileTime:
#include <windows.h>    /* struct timeval comes from <winsock2.h>, pulled in via windows.h project setups */

/* Minimal gettimeofday() shim for Windows, built on GetSystemTimeAsFileTime(). */
int gettimeofday(struct timeval *tv, struct timezone *tz)
{
    /* microseconds between the Windows epoch (1601) and the Unix epoch (1970) */
    const unsigned __int64 epoch_diff = 11644473600000000;
    unsigned __int64 tmp;
    FILETIME t;

    if (tv) {
        GetSystemTimeAsFileTime(&t);

        tmp = 0;
        tmp |= t.dwHighDateTime;
        tmp <<= 32;
        tmp |= t.dwLowDateTime;

        tmp /= 10;             /* 100-nanosecond intervals -> microseconds */
        tmp -= epoch_diff;     /* Windows epoch -> Unix epoch */

        tv->tv_sec = (long)(tmp / 1000000);
        tv->tv_usec = (long)(tmp % 1000000);
    }
    return 0;
}
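With gettimeofday() available on both platforms (natively, or via the shim above on Windows), checking how long a sleep really took is then just a matter of sampling the time on both sides of the call; a rough sketch:

    #include <stdio.h>
    #include <sys/time.h>   /* gettimeofday() on POSIX; use the shim above on Windows */
    #include <unistd.h>

    void timed_sleep_demo(void)
    {
        struct timeval before, after;

        gettimeofday(&before, NULL);
        usleep(20 * 1000);                     /* request a 20 ms sleep */
        gettimeofday(&after, NULL);

        long elapsed_us = (after.tv_sec - before.tv_sec) * 1000000L
                        + (after.tv_usec - before.tv_usec);
        printf("asked for 20000 us, actually slept %ld us\n", elapsed_us);
    }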
You could do something like record the current time at some point and then spin in a while loop that rechecks the time until it reaches whatever time you want; then the loop breaks out and the rest of your program continues executing. I'm not sure I see much benefit in busy-looping rather than just using a sleep function, though.
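For completeness, such a busy-wait loop could look like the sketch below (the helper name is made up); note that it burns CPU the whole time, which is exactly the drawback discussed in the other answers:

    #include <time.h>

    /* Spin until 'delay_ns' nanoseconds have passed, according to CLOCK_MONOTONIC. */
    void busy_delay_ns(long long delay_ns)
    {
        struct timespec start, now;
        clock_gettime(CLOCK_MONOTONIC, &start);

        do {
            clock_gettime(CLOCK_MONOTONIC, &now);
        } while ((now.tv_sec - start.tv_sec) * 1000000000LL
                 + (now.tv_nsec - start.tv_nsec) < delay_ns);
    }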
I'm trying to determine the time that it takes for a machine to receive a packet, process it and give back an answer.
This machine, that I'll call 'server', runs a very simple program, which receives a packet (recv(2)) in a buffer, copies the received content (memcpy(3)) to another buffer and sends the packet back (send(2)). The server runs NetBSD 5.1.2.
My client measures the round-trip time a number of times (pkt_count):
struct timespec start, end;

for (i = 0; i < pkt_count; ++i)
{
    printf("%d ", i + 1);

    clock_gettime(CLOCK_MONOTONIC, &start);
    send(sock, send_buf, pkt_size, 0);
    recv(sock, recv_buf, pkt_size, 0);
    clock_gettime(CLOCK_MONOTONIC, &end);

    //struct timespec nsleep = {.tv_sec = 0, .tv_nsec = 100000};
    //nanosleep(&nsleep, NULL);

    printf("%.3f ", timespec_diff_usec(&end, &start));
}
I removed error checks and other minor things for clarity. The client runs on an Ubuntu 12.04 64-bit. Both programs run in real-time priority, although only the Ubuntu kernel is real time (-rt). The connection between the programs is TCP. This works fine and gives me an average of 750 microseconds.
However, if I enable the commented-out nanosleep call (with a sleep of 100 µs), my measurements drop by 100 µs, giving an average of 650 µs. If I sleep for 200 µs, the measurements drop to 550 µs, and so on. This goes on until a sleep of 600 µs, giving an average of 150 µs. Then, if I raise the sleep to 700 µs, my measurements jump up to 800 µs on average. I confirmed my program's measurements with Wireshark.
I can't figure out what is happening. I already set the TCP_NODELAY socket option in both client and server, no difference. I used UDP, no difference (same behavior). So I guess this behavior is not due to the Nagle algorithm. What could it be?
[UPDATE]
Here's a screenshot of the output of the client together with Wireshark. Now, I ran my server in another machine. I used the same OS with the same configuration (as it is a Live System in a pen drive), but the hardware is different. This behaviour didn't show up, everything worked as expected. But the question remains: why does it happen in the previous hardware?
[UPDATE 2: More info]
As I said before, I tested my pair of programs (client/server) in two different server computers. I plotted the two results obtained.
The first server (the weird one) is a RTD Single Board Computer, with a 1Gbps Ethernet interface. The second server (the normal one) is a Diamond Single Board Computer with a 100Mbps Ethernet interface. Both of them run the SAME OS (NetBSD 5.1.2) from the SAME pendrive.
From these results, I do believe that this behaviour is due either to the driver or to the NIC itself, although I still can't imagine why it happens...
OK, I reached a conclusion.
I tried my program using Linux, instead of NetBSD, on the server. It ran as expected, i.e., no matter how much I [nano]sleep at that point of the code, the result is the same.
This fact tells me that the problem might lie in the NetBSD's interface driver. To identify the driver, I read the dmesg output. This is the relevant part:
wm0 at pci0 dev 25 function 0: 82801I mobile (AMT) LAN Controller, rev. 3
wm0: interrupting at ioapic0 pin 20
wm0: PCI-Express bus
wm0: FLASH
wm0: Ethernet address [OMITTED]
ukphy0 at wm0 phy 2: Generic IEEE 802.3u media interface
ukphy0: OUI 0x000ac2, model 0x000b, rev. 1
ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
So, as you can see, my interface is called wm0. According to this (page 9) I should check which driver is loaded by consulting the file sys/dev/pci/files.pci, line 625 (here). It shows:
# Intel i8254x Gigabit Ethernet
device wm: ether, ifnet, arp, mii, mii_bitbang
attach wm at pci
file dev/pci/if_wm.c wm
Then, searching through the driver source code (dev/pci/if_wm.c, here), I found a snippet of code that might change the driver behavior:
/*
* For N interrupts/sec, set this value to:
* 1000000000 / (N * 256). Note that we set the
* absolute and packet timer values to this value
* divided by 4 to get "simple timer" behavior.
*/
sc->sc_itr = 1500; /* 2604 ints/sec */
CSR_WRITE(sc, WMREG_ITR, sc->sc_itr);
Then I changed this 1500 value to 1 (trying to increase the number of interrupts per second allowed) and to 0 (trying to eliminate the interrupt throttling altogether), but both of these values produced the same result:
Without nanosleep: latency of ~400 us
With a nanosleep of 100 us: latency of ~230 us
With a nanosleep of 200 us: latency of ~120 us
With a nanosleep of 260 us: latency of ~70 us
With a nanosleep of 270 us: latency of ~60 us (minimum latency I could achieve)
With a nanosleep of anything above 300 us: ~420 us
This is, at least, better behaved than the previous situation.
Therefore, I concluded that the behavior is due to the interface driver of the server. I am not willing to investigate it further in order to find other culprits, as I am moving from NetBSD to Linux for the project involving this Single Board Computer.
This is a (hopefully educated) guess, but I think it might explain what you're seeing.
I'm not sure how real-time the Linux kernel is. It might not be fully preemptive... So, with that disclaimer, continuing :)...
Depending on the scheduler, a task will possibly have what is called a "quanta", which is just an amount of time it can run for before another task of the same priority will be scheduled in. If the kernel is not fully preemptive, this might also be the point where a higher-priority task can run. This depends on the details of the scheduler, which I don't know enough about.
Anywhere between your first gettime and second gettime your task may be pre-empted. This just means that it is "paused" and another task gets to use the CPU for a certain amount of time.
The loop without the sleep might go something like this
clock_gettime(CLOCK_MONOTONIC, &start);
send(sock, send_buf, pkt_size, 0);
recv(sock, recv_buf, pkt_size, 0);
clock_gettime(CLOCK_MONOTONIC, &end);
printf("%.3f ", timespec_diff_usec(&end, &start));
clock_gettime(CLOCK_MONOTONIC, &start);
<----- PREEMPTION .. your task's quanta has run out and the scheduler kicks in
... another task runs for a little while
<----- PREEMPTION again and you're back on the CPU
send(sock, send_buf, pkt_size, 0);
recv(sock, recv_buf, pkt_size, 0);
clock_gettime(CLOCK_MONOTONIC, &end);
// Because you got pre-empted, your time measurement is artificially long
printf("%.3f ", timespec_diff_usec(&end, &start));
clock_gettime(CLOCK_MONOTONIC, &start);
<----- PREEMPTION .. your task's quanta has run out and the scheduler kicks in
... another task runs for a little while
<----- PREEMPTION again and you're back on the CPU
and so on....
When you put the nanosecond sleep in, this is most likely a point where the scheduler is able to run before the current task's quanta expires (the same would apply to recv() too, which blocks). So perhaps what you get is something like this
clock_gettime(CLOCK_MONOTONIC, &start);
send(sock, send_buf, pkt_size, 0);
recv(sock, recv_buf, pkt_size, 0);
clock_gettime(CLOCK_MONOTONIC, &end);
struct timespec nsleep = {.tv_sec = 0, .tv_nsec = 100000};
nanosleep(&nsleep, NULL);
<----- PREEMPTION .. nanosleep allows the scheduler to kick in because this is a pre-emption point
... another task runs for a little while
<----- PREEMPTION again and you're back on the CPU
// Now it so happens that because your task got preempted where it did, the time
// measurement has not been artificially increased. Your task then can finish the rest of
// its quanta
printf("%.3f ", timespec_diff_usec(&end, &start));
clock_gettime(CLOCK_MONOTONIC, &start);
... and so on
Some kind of interleaving will then occur where sometimes you are preempted between the two gettime() calls and sometimes outside of them, because of the nanosleep. Depending on the sleep duration, you might hit a sweet spot where you happen (by chance) to get your preemption point, on average, to be outside your time measurement block.
Anyway, that's my two-pennies worth, hope it helps explain things :)
A little note on "nanoseconds" to finish with...
I think one needs to be cautious with "nanoseconds" sleep. The reason I say this is that I think it is unlikely that an average computer can actually do this unless it uses special hardware.
Normally an OS will have a regular system "tick", generated perhaps every 5 ms. This is an interrupt generated by, say, an RTC (Real Time Clock, just a bit of hardware). Using this "tick" the system then generates its internal time representation. Thus, the average OS will only have a time resolution of a few milliseconds. The reason that this tick is not faster is that there is a balance to be achieved between keeping very accurate time and not swamping the system with timer interrupts.
I'm not sure if I'm a little out of date with your average modern PC... I think some of them do have higher-resolution timers, but still not into the nanosecond range, and they might even struggle at 100 µs.
So, in summary, keep in mind that the best time resolution you're likely to get is normally in the milliseconds range.
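Rather than guessing, you can ask the system what resolution its clocks claim to have via clock_getres(); a quick check might look like this sketch:

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        struct timespec res;

        /* Report the resolution of the monotonic clock used for sleeps/timers. */
        if (clock_getres(CLOCK_MONOTONIC, &res) == 0)
            printf("CLOCK_MONOTONIC resolution: %ld s %ld ns\n",
                   (long)res.tv_sec, res.tv_nsec);
        return 0;
    }

Keep in mind the reported clock resolution is not the same thing as the shortest sleep the scheduler will actually honour, but it is a useful data point.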
EDIT: Just revisiting this and thought I'd add the following... it doesn't explain what you're seeing, but it might provide another avenue for investigation...
As mentioned the timing accuracy of the nanosleep is unlikely to be better than milliseconds. Also your task can be pre-empted which will also cause timing problems. There is also the problem that the time taken for a packet to go up the protocol stack can vary, as well as network delay.
One thing you could try, if your NIC supports it, is IEEE 1588 (aka PTP). If your NIC supports it, it can timestamp PTP event packets as they leave and enter the PHY. This will give you the best possible estimate of network delay, and eliminates any problems you might have with software preemption etc. I know squat about Linux PTP, I'm afraid, but you could try http://linuxptp.sourceforge.net/
I think the 'quanta' theory is the best explanation.
On Linux, this is governed by the context-switch frequency.
The kernel gives a process its quantum of time, but the process is preempted in three situations:
the process calls a system procedure
its quantum runs out
a hardware interrupt comes in (from the network, HDD, USB, clock, etc.)
Unused quantum time is assigned to another ready-to-run process, according to priorities, real-time class, etc.
Actually the context-switch frequency is configured at 10000 times per second, which gives about 100 µs per quantum, but context switching itself takes time and is CPU-dependent; see this:
http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html
I don't understand why the context-switch frequency is that high, but that is a discussion for a Linux kernel forum.
You can find a partially similar problem here:
https://serverfault.com/questions/14199/how-many-context-switches-is-normal-as-a-function-of-cpu-cores-or-other
If the amount of data being sent by the application is large and fast enough, it could be filling the kernel buffers, which leads to a delay on each send(). Since the sleep is outside the measured section, it would then be eating the time that would otherwise be spent blocking on the send() call.
One way to help check for this case would be to run with a relatively small number of iterations, and then with a moderate number of iterations. If the problem occurs even with a small number of iterations (say 20) and small packet sizes (say < 1 KB), then this is likely an incorrect diagnosis.
Keep in mind that your process and the kernel can easily overwhelm the network adapter and the wire-speed of the ethernet (or other media type) if sending data in a tight loop like this.
I'm having trouble reading the screenshots. If Wireshark shows a constant rate of transmission on the wire, then it suggests this is the correct diagnosis. Of course, doing the math (dividing the wire speed by the packet size plus headers) should give an idea of the maximum rate at which the packets can be sent; for example, at 100 Mbps a 1000-byte packet plus roughly 66 bytes of Ethernet/IP/TCP overhead occupies the wire for about 85 µs, so at most roughly 11,700 such packets can be sent per second.
As for the 700 microseconds leading to increased delay, that's harder to determine. I don't have any thoughts on that one.
I have some advice on how to create a more accurate performance measurement.
Use the RDTSC instruction (or, even better, the __rdtsc() intrinsic). This involves reading a CPU counter without leaving ring 3 (no system call).
The gettime functions almost always involve a system call which slows things down.
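A minimal sketch of the intrinsic on GCC/Clang (it comes from <x86intrin.h>; MSVC has it in <intrin.h>). Note that the raw TSC counts CPU cycles, not nanoseconds, and can behave inconsistently across cores or frequency changes on older hardware:

    #include <stdio.h>
    #include <x86intrin.h>   /* __rdtsc() on GCC/Clang; <intrin.h> on MSVC */

    void measure_block(void)
    {
        unsigned long long t0 = __rdtsc();

        /* ... code under measurement ... */

        unsigned long long t1 = __rdtsc();
        printf("elapsed: %llu cycles\n", t1 - t0);
    }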
Your code is a little tricky as it involves two system calls (send/recv), but in general it is better to call sleep(0) before the first measurement to ensure that the very short measurement doesn't receive a context switch. Of course, the time-measurement (and Sleep()) code should be disabled/enabled via macros in performance-sensitive functions.
Some operating systems can be tricked into raising your process priority by having your process release its execution time window (e.g. sleep(0)). On the next scheduling tick, the OS (not all of them) will raise the priority of your process, since it didn't finish running its execution time quota.
I need a way to get the elapsed time (wall-clock time) since a program started, in a way that is resilient to users meddling with the system clock.
On Windows, the non-standard clock() implementation doesn't do the trick, as it appears to work just by calculating the difference from the time sampled at start-up, so I get negative values if I "move the clock hands back".
On UNIX, clock/getrusage refer to processor time, whereas using functions such as gettimeofday to sample timestamps has the same problem as using clock on Windows.
I'm not really interested in precision, and I've hacked together a solution by having a half-second-resolution timer spinning in the background to counter the clock skews when they happen
(if the difference between the sampled time and the expected time exceeds 1 second, I use the expected time as the new baseline), but I think there must be a better way.
I guess you can always start some kind of timer. For example, under Linux a thread could run a loop like this:
#include <time.h>

static void *timer_thread(void *arg)
{
    struct timespec delay;
    unsigned int msecond_delay = ((app_state_t *)arg)->msecond_delay;

    /* assumes msecond_delay < 1000 so tv_nsec stays below one second */
    delay.tv_sec = 0;
    delay.tv_nsec = msecond_delay * 1000000L;

    while (1) {
        some_global_counter_increment();
        nanosleep(&delay, NULL);
    }
    return NULL;   /* not reached */
}
Here app_state_t is an application structure of your choice where you store variables. If you want to prevent tampering, you need to be sure no one kills your thread.
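For completeness, starting that thread could look like the sketch below; app_state_t, timer_thread() and the wrapper name are just the placeholder names from the snippet above:

    #include <pthread.h>

    typedef struct {
        unsigned int msecond_delay;   /* tick period in milliseconds (< 1000 here) */
    } app_state_t;

    int start_ticker(app_state_t *state)
    {
        pthread_t tid;
        /* timer_thread() is the loop shown above; link with -lpthread */
        return pthread_create(&tid, NULL, timer_thread, state);
    }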
For POSIX, use clock_gettime() with CLOCK_MONOTONIC.
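A minimal sketch: record a CLOCK_MONOTONIC timestamp at program start and subtract it later. Since the monotonic clock is not affected by anyone setting the system time, the result survives the user moving the clock hands (the helper names here are made up):

    #include <time.h>

    static struct timespec program_start;

    void elapsed_init(void)
    {
        clock_gettime(CLOCK_MONOTONIC, &program_start);
    }

    /* Seconds of wall-clock time since elapsed_init(), immune to clock changes. */
    double elapsed_seconds(void)
    {
        struct timespec now;
        clock_gettime(CLOCK_MONOTONIC, &now);
        return (now.tv_sec - program_start.tv_sec)
             + (now.tv_nsec - program_start.tv_nsec) / 1e9;
    }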
I don't think you'll find a cross-platform way of doing that.
On Windows what you need is GetTickCount (or maybe QueryPerformanceCounter and QueryPerformanceFrequency for a high resolution timer). I don't have experience with that on Linux, but a search on Google gave me clock_gettime.
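On Windows, the QueryPerformanceCounter route could look like this sketch (same made-up helper names as above); that counter is likewise unaffected by changes to the system clock:

    #include <windows.h>

    static LARGE_INTEGER qpc_freq, qpc_start;

    void elapsed_init(void)
    {
        QueryPerformanceFrequency(&qpc_freq);   /* counts per second */
        QueryPerformanceCounter(&qpc_start);
    }

    double elapsed_seconds(void)
    {
        LARGE_INTEGER now;
        QueryPerformanceCounter(&now);
        return (double)(now.QuadPart - qpc_start.QuadPart) / (double)qpc_freq.QuadPart;
    }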
Wall-clock time can be calculated with the time() call.
If you have a network connection, you can always acquire the time from an NTP server. This will obviously not be affected in any way by the local clock.
/proc/uptime on Linux maintains the number of seconds that the system has been up (and the number of seconds it has been idle), which should be unaffected by changes to the clock, as it's maintained by the system interrupt (jiffies / HZ). Perhaps Windows has something similar?
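Reading it from C is straightforward; a small sketch (the function name is made up):

    #include <stdio.h>

    /* Returns system uptime in seconds as reported by /proc/uptime, or -1.0 on error. */
    double system_uptime_seconds(void)
    {
        double up, idle;
        FILE *f = fopen("/proc/uptime", "r");

        if (!f)
            return -1.0;
        if (fscanf(f, "%lf %lf", &up, &idle) != 2)
            up = -1.0;
        fclose(f);
        return up;
    }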