Timers and Sleep on Windows - C

I am trying to port some Linux code that uses nanosleep() and clock_gettime() to Windows. As far as I have read, there really aren't that many high-resolution timers on Windows, and there are no real sleep functions other than Sleep(). I found QueryPerformanceCounter and QueryPerformanceFrequency on Windows, but how would I adapt those to build a fast sleep function? The code I am trying to port is located at this StackOverflow post: https://stackoverflow.com/a/13559213/1161270
Overall I'm trying to port Linux code to Windows that uses nanosleep(), clock_gettime() and struct timespec, but there seems to be no real equivalent. I am also open to other ideas on how to add throttle delays. I've read into the PdhGetFormattedCounterArray() function and have working code to monitor the computer's outbound bandwidth in bytes, but I am unsure how to use this data to create a delay that throttles data sending back to a specific KB/s speed, and would much rather use the method described in the linked post.
Thank you for your time.

Bear in mind that the ACTUAL precision of sleeps on most OSes is often in milliseconds (or even some multiple thereof). It may well appear that you can sleep for shorter periods, but in reality either the OS doesn't actually put the process to sleep, or the sleep period is rounded up to whole "ticks". This is true for both Linux (depending on kernel configuration) and Windows.
Microsoft explains timeouts here (and Sleep is simply a timeout waiting for nothing to happen):
http://msdn.microsoft.com/en-gb/library/windows/desktop/ms687069%28v=vs.85%29.aspx#waitfunctionsandtime-outintervals
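For what it's worth, one common workaround on Windows is a hybrid sleep: let Sleep() cover most of the interval and then spin on QueryPerformanceCounter() for the remainder. A minimal sketch, assuming a microsecond-level delay is what you're after (the sleep_us name and the 1 ms safety margin are my own choices, not an official API):

    #include <windows.h>

    /* Hybrid sleep sketch: coarse Sleep() plus a short busy-wait on the
       performance counter. Not a drop-in nanosleep() replacement. */
    static void sleep_us(LONGLONG usec)
    {
        LARGE_INTEGER freq, start, now;
        QueryPerformanceFrequency(&freq);
        QueryPerformanceCounter(&start);

        LONGLONG target = usec * freq.QuadPart / 1000000;

        /* Let the OS sleep away all but the last millisecond or so,
           since Sleep() itself is only accurate to the scheduler tick. */
        if (usec > 2000)
            Sleep((DWORD)((usec - 1000) / 1000));

        /* Spin for the remainder to tighten the interval. */
        do {
            QueryPerformanceCounter(&now);
        } while (now.QuadPart - start.QuadPart < target);
    }

For throttling to a target KB/s, you would compute after each send how far ahead of schedule you are and sleep away the difference with something like the helper above.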

Related

What is the reference for timing calculations in Linux?

I want to understand how timers behave in Linux.
I know that on microcontrollers, timers/counters use the execution time of machine instructions as their reference, so we can loop for however long we need for a sleep/timer/counter.
But in Linux, where and how does it get its reference, so that if I use sleep(5), exactly 5 seconds elapse? If anyone knows, please clarify.
Every operating system kernel (that I know of) has a machine-independent framework for timers. This is pretty much one of the most central things a kernel must have, because we need timers for everything: process scheduling, dealing with hardware errors, select/poll timeouts, network protocols, etc. At any point in time your kernel has dozens, if not thousands, of timers waiting to be executed at some point in the future. Most of them will be canceled and never executed.
The simplest framework that pretty much everyone uses sets up one of the many clocks in a machine to generate an interrupt at a set interval. 100Hz is the most common, Windows (at least in the past) sets it to 64Hz (but it can be changed by any application), some systems experimented with 1024Hz. The timer interrupt fires and the interrupt handler checks if there's anything queued up to do at that time and if there is, it is executed. There has been some work for Linux to improve this so that we can get shorter or longer intervals than 10ms depending on the next scheduled timer, both to improve the precision of the timers and to save power, but in general it works as described above.
If I understand your question correctly, you think that there is something that measures how long a certain sequence of instructions takes and then loops until some amount of time passes. This is almost never done, because it wastes power, it blocks anything else from running at the same time, and it is quite unreliable. It is still done in modern kernels, but very rarely and only when high precision is required when talking to really, really stupid hardware. The last time I had to do it was 17 years ago, to talk to some ethernet controller where you had to manually implement MII by bit-banging in software; it was terrible and hung the system for quite a long time every time you (un-)plugged an ethernet cable. Nobody builds hardware that requires this anymore because it really ruins the performance of modern systems.
So in your question, sleep(5) will be implemented by registering a function in the timer framework to be called in 5 seconds from now and then putting the process to sleep. 5 seconds later the timer fires and the process gets awakened again.
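As a userspace illustration of that "register a timer, sleep, get woken" flow (not the kernel internals themselves), a timerfd on Linux does exactly this: you arm a timer and then block until the kernel's timer framework expires it. A minimal sketch:

    #include <stdint.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/timerfd.h>

    int main(void)
    {
        /* Arm a one-shot 5 second timer, then block until it fires. */
        int fd = timerfd_create(CLOCK_MONOTONIC, 0);
        struct itimerspec its = { .it_value = { .tv_sec = 5 } };
        timerfd_settime(fd, 0, &its, NULL);

        uint64_t expirations;
        read(fd, &expirations, sizeof(expirations));   /* sleeps ~5 s here */
        printf("timer fired %llu time(s)\n", (unsigned long long)expirations);
        close(fd);
        return 0;
    }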

Reading a 4 µs long +5V TTL from a parallel port -- when to use kernel interrupts

I've got an experimental box of tricks running that, every 100 ms or so, will spit out a 4 microsecond long +5V pulse of electricity on a TTL line. The exact time that this happens is not known ahead of time, but it's important -- so I'd like to use the Red Hat 5.3 computer that essentially runs the experiment to service this TTL, and create a glorified timestamp.
At the moment, what I've done is wire the TTL into pin 13 of the parallel port (STATUS_SELECT, one of the input lines) on the Linux box, spawn a process when the experiment starts, use chrt to change its scheduling priority to 99 -- i.e. high -- and then just poll the parallel port repeatedly in a while loop until the pin goes high. I then create an accurate timestamp and write it to disk in a non-blocking way.
Obviously, this is inefficient -- sometimes the process is suspended, and a TTL will be missed. As the computer is, itself, busy doing other things (namely acquiring data from my experimental bit of kit -- an MRI scanner!) this happens quite often. Polling is easy, but probably bad.
My question is this: doing something quickly when a TTL occurs seems like the bread-and-butter of computing, but, as far as I can tell, it's only possible to deal with interrupts on Linux if you're a kernel module. The parallel port can generate interrupts, and libraries like paraport let you build kernel modules relatively quickly, where you have to supply your own handler.
Is the best way to deal with this problem and create accurate (±25 ms) timestamps for an experiment whenever that TTL comes in -- to write a kernel module that provides a list of recent interrupts to somewhere in /proc, and then read them out with a normal process later? Is that approach not going to work, and be very CPU inefficient -- or open a bag of worms to do with interrupt priority I'm not aware of?
Most importantly, this seems like it should be a solved problem -- is it, and if so do any wise people wish to point me in the right direction? Writing a kernel module seems like, frankly, a lot of hard, risky work for something that feels as if it should perhaps be simple.
The premise that "it's only possible to deal with interrupts on linux if you're a kernel module" dismisses some fairly common and effective strategies.
The simple course of action for responding to interrupts in userspace (especially infrequent ones) is to have a driver that creates a kernel device (or in some cases a sysfs node) where either a read() or perhaps a custom ioctl() from userspace will block until the interrupt occurs. You'd have to check if the default parallel port driver supports this, but it's extremely common with the GPIO drivers on embedded-type boards, and the basic scheme could be borrowed for the parallel port - provided that the hardware supports true interrupts.
If very precise timing is the goal, you might do better to customize the kernel module to record the timestamp there, and implement a mechanism where a read() from userspace blocks until the interrupt occurs, and then obtains the kernel's already recorded timestamp as the read data - thus avoiding the variable latency of waking userspace and calling back into the kernel to get the time.
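The userspace side of that scheme can stay very small. A sketch, where /dev/ttl_irq is a hypothetical device node exposed by such a driver (the stock parport driver may not give you this; you'd have to check or adapt a module):

    #include <fcntl.h>
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/ttl_irq", O_RDONLY);   /* hypothetical device */
        char dummy;

        for (;;) {
            /* Blocks without using CPU until the driver sees the TTL edge. */
            if (read(fd, &dummy, 1) < 0)
                break;

            /* Or, better, have the driver hand back its own timestamp here. */
            struct timespec ts;
            clock_gettime(CLOCK_REALTIME, &ts);
            printf("pulse at %ld.%09ld\n", (long)ts.tv_sec, ts.tv_nsec);
        }
        close(fd);
        return 0;
    }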
You might also look at true local-bus serial ports (if present) as an alternative interrupt-capable interface, in cases where the available parallel port is some partial or indirect implementation that doesn't support true interrupts.
In situations where your only available interface is something indirect and high-latency such as USB, or where you want a lot of host- and operating-system independence, it may indeed make sense to use an external microcontroller. In that case, you would probably try to set the micro's clock from the host system, and then have it give you timestamp messages every time it sees an event. If your experiment only needs the timestamps to be relative to each other within a given experimental session, this should work well. But if you need to establish an absolute time synchronization across the USB latency, you may have to do some careful round-trip measurement and then estimate the latency in order to compensate for it (see NTP for an extreme example).

Programming a relatively large, threaded application for old systems

Today my boss and I were having a discussion about some code I had written. My code downloads 3 files from a given HTTP/HTTPS link. I had multi-threaded the download so that all 3 files are downloading simultaneously in 3 separate threads. During this discussion, my boss tells me that the code is going to be shipped to people who will most likely be running old hardware and software (I'm talking Windows 2000).
Until this time, I had never considered how a threaded application would scale on older hardware. I realize that if the CPU has only 1 core, threads are useless and may even worsen performance. I have been wondering if this download task is an I/O operation. Meaning, if an API is blocked waiting for information from the HTTP/HTTPS server, will another thread that wants to do some calculation be scheduled meanwhile? Do older OSes do such scheduling?
Another thing he said: Since the code is going to be run on old machines, my application should not eat the CPU. He said use Sleep() calls after CPU intensive tasks to allow other programs some breathing space. Now I was always under the impression that using Sleep() is terrible in any program. Am I wrong? When is using Sleep() justified?
Thanks for looking!
I have been wondering if this download task is an I/O operation. Meaning, if an API is blocked waiting for information from the HTTP/HTTPS server, will another thread that wants to do some calculation be scheduled meanwhile? Do older OSes do such scheduling?
Yes, they do. That's the whole point of blocking I/O: the thread is suspended and other threads get to run until an event wakes the blocked thread up. That's why it makes complete sense to split the work into threads, even on single-core machines, instead of doing your own poor man's scheduling between the downloads in a single thread.
Of course your downloads affect each other's bandwidth, so threading won't speed up the overall download :-)
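To illustrate the blocking-I/O point: in a sketch like the one below, each thread spends almost all of its time asleep inside the (hypothetical) download_one_file() call, so three threads cost essentially nothing even on a single-core box. The function name and URLs are placeholders, not your actual code:

    #include <windows.h>

    /* Placeholder for real blocking HTTP I/O (WinInet, WinHTTP, sockets...). */
    static void download_one_file(const char *url)
    {
        (void)url;
        Sleep(2000);   /* stands in for time spent blocked on the network */
    }

    static DWORD WINAPI worker(LPVOID arg)
    {
        download_one_file((const char *)arg);   /* thread sleeps while blocked */
        return 0;
    }

    int main(void)
    {
        const char *urls[3] = { "http://host/a", "http://host/b", "http://host/c" };
        HANDLE threads[3];
        for (int i = 0; i < 3; i++)
            threads[i] = CreateThread(NULL, 0, worker, (LPVOID)urls[i], 0, NULL);
        WaitForMultipleObjects(3, threads, TRUE, INFINITE);
        return 0;
    }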
Another thing he said: Since the code is going to be run on old machines, my application should not eat the CPU. He said use Sleep() calls after CPU intensive tasks to allow other programs some breathing space.
Actually, using Sleep() AFTER the task has finished won't help here. Sleeping after a certain amount of calculation (a sort of manual time slicing) before going on with the calculation could help, but only on cooperative systems (e.g. Windows 3.11). It plays no role on preemptive systems, where the scheduler uses time slicing to allocate CPU time to threads. There it is more useful to lower the priority of CPU-intensive tasks so that other tasks take precedence...
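For example, a minimal sketch of the priority idea (SetThreadPriority is a real Win32 call; the surrounding thread function is just illustrative):

    #include <windows.h>

    static DWORD WINAPI calc_thread(LPVOID arg)
    {
        (void)arg;
        /* Let interactive work preempt us instead of sprinkling Sleep() around. */
        SetThreadPriority(GetCurrentThread(), THREAD_PRIORITY_BELOW_NORMAL);
        /* ... long-running calculation ... */
        return 0;
    }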
Now I was always under the impression that using Sleep() is terrible in any program. Am I wrong? When is using Sleep() justified?
This really depends on what you are doing. If you are busy-waiting on a flag that may only be set after a few seconds, it's better to sleep for a while between checks, giving up your scheduled time slice, instead of burning CPU power polling a flag that hasn't been set yet.
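Something along these lines, where the flag and the 10 ms interval are just illustrative:

    #include <windows.h>

    volatile LONG g_flag = 0;   /* set by some other thread eventually */

    void wait_for_flag(void)
    {
        while (g_flag == 0)
            Sleep(10);   /* give the time slice back instead of burning CPU */
    }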
On modern systems there is no point in introducing Sleep() into a calculation, as it will only slow the calculation down.
Scheduling is the job of the OS's scheduler; it is the one with the "big picture". In my opinion, every attempt to "do it better" is only valid within the scope of a specific application, where you have an overview of relationships that are not obvious to the scheduler.
Addendum:
I did some research and found that Windows has supported preemptive multitasking since Windows 95. The Windows NT line (to which Windows 2000 belongs) has always supported preemptive multitasking.

Real-time intervals in C/C++

Is it possible to produce real-time intervals in a non-real-time Linux application in C/C++?
I'm writing an ADC simulator: an application that generates packets at a certain frequency. It is important that the packet generation frequency correspond as closely as possible to the sampling rate of the ADC, which is why I don't want to use sleep() and usleep() to set the packet generation intervals.
Thanks.
Is it possible to produce real-time intervals in a non-real-time Linux application in C/C++?
No... if it were, it would be a Real-Time Linux system.
That said, you can probably get very close, so it depends on your intervals and tolerances. Your only serious option for sub-timeslice precision is to nail the sending thread to a core and let it spin, while keeping other processing off that core, but that's very wasteful of hardware....
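A sketch of that spinning approach, assuming a 100 µs period and a hypothetical send_packet(); pthread_setaffinity_np() pins the sender to core 1 so nothing else competes with the spin loop:

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <time.h>

    static void send_packet(void) { /* hypothetical: emit one ADC sample packet */ }

    static void *sender(void *arg)
    {
        (void)arg;
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(1, &set);
        pthread_setaffinity_np(pthread_self(), sizeof(set), &set);

        struct timespec next;
        clock_gettime(CLOCK_MONOTONIC, &next);
        for (;;) {
            next.tv_nsec += 100000;                  /* 100 us period */
            if (next.tv_nsec >= 1000000000) {
                next.tv_nsec -= 1000000000;
                next.tv_sec++;
            }
            struct timespec now;
            do {                                     /* burn the core until the deadline */
                clock_gettime(CLOCK_MONOTONIC, &now);
            } while (now.tv_sec < next.tv_sec ||
                     (now.tv_sec == next.tv_sec && now.tv_nsec < next.tv_nsec));
            send_packet();
        }
        return NULL;
    }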
If you can afford to have latencies long enough for your sending code to be re-scheduled then you can look at setting up alarms & signal handlers, but that's potentially massively higher latency, perhaps only on relatively rare occasions where the cores have all been otherwise utilised. To assess how well this works, you've got to do real measurements under realistic system loads.
The packet generator shouldn't be coupled to the packet sender.
If you want the packets to be sent on time, you should create the packets beforehand and hand them to the packet sender.
So you need a thread with a work queue, and a sleep on that thread to send the packets on time (you can look at boost's sleep()).
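In plain C the same idea can be done with clock_nanosleep(TIMER_ABSTIME), which sleeps until an absolute deadline and so doesn't accumulate drift the way repeated relative sleeps do. Sketch only -- dequeue_packet() and transmit() are hypothetical stand-ins for the work-queue side:

    #include <time.h>

    void *dequeue_packet(void);          /* hypothetical: packets built earlier */
    void transmit(void *pkt);            /* hypothetical: hands off to the NIC  */

    void sender_loop(struct timespec next, long period_ns)
    {
        for (;;) {
            /* Sleep until the packet's absolute deadline, then send it. */
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
            transmit(dequeue_packet());

            next.tv_nsec += period_ns;   /* advance the absolute deadline */
            while (next.tv_nsec >= 1000000000) {
                next.tv_nsec -= 1000000000;
                next.tv_sec++;
            }
        }
    }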

Not able to kill bad kernel running on NVIDIA GPU

I am in a real fix. Please help. It's urgent.
I have a host process that spawns multiple host (CPU) threads (pthreads). These threads in turn launch CUDA kernels. The CUDA kernels are written by external users, so they might be bad kernels that enter an infinite loop. To overcome this I have put in a 2-minute timeout that kills the corresponding CPU thread.
Will killing the CPU thread also kill the kernel running on the GPU? As far as I have tested, it doesn't.
How can I kill all the threads currently running in the GPU?
Edit: The reason I am using CPU threads that call the kernels is that the server has two Tesla GPUs, so the threads schedule kernels on the two GPU devices alternately.
Thanks,
Arvind
It doesn't seem to. I ran a broken kernel and locked up one of my devices seemingly indefinitely (until reboot). I'm not sure how to kill a running kernel. I think there is a way to limit kernel execution time via the driver, though, so that might be the way to go.
Unless there's a larger part of this I'm not really getting, you might be better off using the CUDA streams API for multi-device tasking, but YMMV.
As for the killing: if you're running the cards with a display (and X server) attached, they will automatically time out after 5 seconds (again, YMMV).
Assuming that this isn't the case, check out calling cudaDeviceReset() (see the API Reference) from the 'parent' thread after your own prescribed 'kill' timeout.
I have not implemented this function in my own code yet, so I honestly have no idea if it'll work in your situation, but it's worth investigating.
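A rough sketch of that idea in C with the CUDA runtime API (error handling trimmed; note that a reset tears down everything on that device, not just the offending kernel):

    #include <stdio.h>
    #include <cuda_runtime.h>

    /* Called by the watchdog after its 2-minute timeout expires. */
    void kill_device(int device_id)
    {
        cudaSetDevice(device_id);            /* select the GPU this thread drives */
        cudaError_t err = cudaDeviceReset(); /* destroys the context and its kernels */
        if (err != cudaSuccess)
            fprintf(stderr, "cudaDeviceReset failed: %s\n", cudaGetErrorString(err));
    }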
Will killing the CPU thread also kill the kernel running on the GPU? As far as I have tested, it doesn't.
Probably not. On Linux you can use cuda-gdb to figure that out.
I don't see the point of sending multiple kernels to the GPU using threads. I wonder what happens if you send multiple kernels to the GPU at the same time -- will the GPU's scheduler deal with that?
