I have a process that feeds a piece of hardware (a data transmission device) with a specific buffer size. What can I reasonably expect from the Windows scheduler to ensure I do not get a buffer underflow?
My buffer is 32K in size and gets consumed at ~800k bytes per second.
If I fill it in 16k byte batches, that is one batch every 20ms. However, what is my lower limit for filling it? If, say, I call sleep(0) in my filling loop, what is my reasonable worst-case scheduling interval?
OS = Windows XP SP3
Dual Core 2.2 GHz
Note, I am making an API call to check the buffer fill level and a call to the driver API to pass it the data. I am assuming these are scheduling points that Windows could make use of in addition to the sleep(0).
I would like to (as a process) play nice and still meet my realtime deadline. The machine is dedicated to this task but needs to receive the data over the network and send it to the IO device.
What can I expect for scheduler performance?
What else do I need to take into account?

There is no guaranteed worst case. Losing the CPU for hundreds of milliseconds is quite possible. You are subject to whatever kernel threads are doing, and they'll always run with a higher priority than you can ever get. Unless you can control the hardware, running into a misbehaving NIC, USB or audio driver is a problem you'll constantly be fighting.
If you can survive occasional under-runs, then make sure that the I/O request you use to get the device data is one you can wait on (a waitable event). Windows favors a thread that was blocked on an I/O request that has just completed, scheduling it ahead of other threads. Polling with a Sleep() is not a good strategy: it burns CPU cycles needlessly and the scheduler won't favor the thread at all.
If you can't survive the under-runs then you need to consider a device driver.
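To put numbers on the question, here is a rough latency budget worked out from the buffer size and drain rate. It is only a sketch: it assumes "32K" and "16k" mean 32 * 1024 and 16 * 1024 bytes, that "~800k bytes per second" means 800,000 bytes/s, and that the device drains at a constant rate. The function name drain_time_ms is made up for illustration.

```python
def drain_time_ms(bytes_available: int, drain_rate_bytes_per_s: float) -> float:
    """Milliseconds until `bytes_available` bytes are consumed at a constant rate."""
    return 1000.0 * bytes_available / drain_rate_bytes_per_s


BUFFER_BYTES = 32 * 1024   # assumption: "32K" means 32 KiB
BATCH_BYTES = 16 * 1024    # assumption: "16k" means 16 KiB
DRAIN_RATE = 800_000       # assumption: "~800k bytes per second"

# Interval between refills when writing one batch at a time.
batch_period_ms = drain_time_ms(BATCH_BYTES, DRAIN_RATE)
# Longest stall a completely full buffer can absorb.
full_buffer_ms = drain_time_ms(BUFFER_BYTES, DRAIN_RATE)
# Slack left if the refill is issued when the level has dropped to one batch.
slack_ms = drain_time_ms(BUFFER_BYTES - BATCH_BYTES, DRAIN_RATE)

print(f"{batch_period_ms:.2f} {full_buffer_ms:.2f} {slack_ms:.2f}")
# prints "20.48 40.96 20.48"
```

Under these assumptions the writer can never be away from the CPU for more than about 41 ms, and in practice about 20 ms, which is well inside the "hundreds of milliseconds" that can be lost.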

There is no guarantee: Windows is not a real-time O/S.
What else do I need to take into account?
What else is running on the machine (something high priority might preempt you)
How much RAM you have (system performance changes a lot when RAM is in short supply)
Whether you're doing I/O (because you might e.g. stall while waiting for disk or network access)
I would like to (as a process) play nice and still meet my realtime deadline. The machine is dedicated to this task but needs to receive the data over the network and send it to the IO device.
Consider setting the priority of your process and/or thread at "real time priority".


I've got an experimental box of tricks running that, every 100 ms or so, will spit out a 4 microsecond long +5V pulse of electricity on a TTL line. The exact time that this happens is not known ahead of time, but it's important -- so I'd like to use the Red Hat 5.3 computer that essentially runs the experiment to service this TTL, and create a glorified timestamp.
At the moment, what I've done is wire the TTL into pin 13 of the parallel port (STATUS_SELECT, one of the input lines on a parallel port) on the Linux box, spawn a process when the experiment starts, use chrt to change its scheduling priority to 99 -- i.e. high -- and then just poll the parallel port repeatedly in a while loop until the pin goes high. I then create an accurate timestamp and write it to disk in a non-blocking way.
Obviously, this is inefficient -- sometimes the process is suspended, and a TTL will be missed. As the computer is, itself, busy doing other things (namely acquiring data from my experimental bit of kit -- an MRI scanner!) this happens quite often. Polling is easy, but probably bad.
My question is this: doing something quickly when a TTL occurs seems like the bread-and-butter of computing, but, as far as I can tell, it's only possible to deal with interrupts on linux if you're a kernel module. The parallel port can generate interrupts, and libraries like paraport let you build kernel modules relatively quickly, where you have to supply your own handler.
Is the best way to deal with this problem, and create accurate (±25 ms) timestamps for an experiment whenever that TTL comes in, to write a kernel module that provides a list of recent interrupts somewhere in /proc, and then read them out with a normal process later? Or is that approach not going to work, be very CPU inefficient, or open a can of worms to do with interrupt priority that I'm not aware of?
Most importantly, this seems like it should be a solved problem -- is it, and if so do any wise people wish to point me in the right direction? Writing a kernel module seems like, frankly, a lot of hard, risky work for something that feels as if it should perhaps be simple.
The premise that "it's only possible to deal with interrupts on linux if you're a kernel module" dismisses some fairly common and effective strategies.
The simple course of action for responding to interrupts in userspace (especially infrequent ones) is to have a driver which creates a kernel device (or in some cases a sysfs node) where either a read() or perhaps a custom ioctl() from userspace will block until the interrupt occurs. You'd have to check whether the default parallel port driver supports this, but it's extremely common with the GPIO drivers on embedded-type boards, and the basic scheme could be borrowed for the parallel port, provided that the hardware supports true interrupts.
If very precise timing is the goal, you might do better to customize the kernel module to record the timestamp there, and implement a mechanism where a read() from userspace blocks until the interrupt occurs, and then obtains the kernel's already recorded timestamp as the read data - thus avoiding the variable latency of waking userspace and calling back into the kernel to get the time.
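The userspace half of that scheme could look roughly like the sketch below. Everything about the "device" here is an assumption made for illustration: an os.pipe stands in for the device node, a helper thread plays the part of the interrupt handler, and the record format (one unsigned 64-bit nanosecond timestamp) is invented. Python is used only to keep the example short.

```python
import os
import struct
import threading
import time

RECORD = struct.Struct("<Q")  # assumed record format: one 64-bit ns timestamp


def wait_for_event(fd: int) -> int:
    """Block in read() until a full record arrives; return its timestamp in ns."""
    data = b""
    while len(data) < RECORD.size:
        chunk = os.read(fd, RECORD.size - len(data))
        if not chunk:
            raise EOFError("device closed")
        data += chunk
    return RECORD.unpack(data)[0]


def fake_interrupt(fd: int, delay_s: float) -> None:
    """Stand-in for the kernel side: stamp the time first, then wake the reader."""
    time.sleep(delay_s)
    os.write(fd, RECORD.pack(time.monotonic_ns()))


read_fd, write_fd = os.pipe()  # stand-in for the hypothetical device node
irq = threading.Thread(target=fake_interrupt, args=(write_fd, 0.05))
irq.start()

stamp_ns = wait_for_event(read_fd)  # the process sleeps here; no polling
woke_ns = time.monotonic_ns()
print(f"userspace woke {(woke_ns - stamp_ns) / 1000:.0f} us after the stamp")

irq.join()
os.close(read_fd)
os.close(write_fd)
```

The point of the design is that the wake-up delay printed at the end no longer matters: the timestamp was taken before the reader was woken.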
You might also look at true local-bus serial ports (if present) as an alternate interrupt-capable interface in cases where the available parallel port is some partial or indirect implementation which doesn't support interrupts.
In situations where your only available interface is something indirect and high latency such as USB, or where you want a lot of host and operating-system independence, then it may indeed make sense to use an external microcontroller. In that case, you would probably try to set the micro's clock from the host system, and then have it give you timestamp messages every time it sees an event. If your experiment only needs the timestamps to be relative to each other within a given experimental session, this should work well. But if you need to establish an absolute time synchronization across the USB latency, you may have to do some careful round-trip measurement and then estimate the latency in order to compensate for it (see NTP for an extreme example).
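The round-trip estimate can be sketched with the usual four-timestamp calculation that NTP-style protocols use. This assumes the outbound and return latencies are equal, which USB does not guarantee, so treat the result as an estimate to be averaged over many exchanges. The function name and the sample numbers are invented for illustration.

```python
def estimate_offset(t0: float, t1: float, t2: float, t3: float) -> tuple:
    """
    t0: host sends the request        (host clock)
    t1: micro receives the request    (micro clock)
    t2: micro sends the reply         (micro clock)
    t3: host receives the reply       (host clock)

    Returns (offset, round_trip), where offset is micro clock minus host clock
    under the assumption that both directions take the same time.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    round_trip = (t3 - t0) - (t2 - t1)
    return offset, round_trip


# Made-up exchange: micro clock 5 s ahead, 1 ms each way, 0.2 ms on the micro.
offset, round_trip = estimate_offset(t0=10.0000, t1=15.0010, t2=15.0012, t3=10.0022)
print(f"offset ~ {offset:.4f} s, round trip ~ {round_trip * 1000:.1f} ms")
```

A micro timestamp T can then be mapped to host time as T - offset.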

Linux already contains all the interrupt handling for network data. You don't have to do anything regarding this. Data arrives, Linux will process it (in the kernel) and pass it to the process waiting for the data. Do not write interrupt handlers for network devices. You don't have to, because all the interrupt handlers needed are already provided by Linux. Just have your program read from the opened socket.
I want to know the time at which the kernel starts executing after the interrupt. Could someone help me find out the time at which the kernel starts executing?
How do I copy the time when the interrupt occurs and send it back as a response to the client?
This time will vary according to what the machine is currently doing: if it is in a critical section where interrupts are masked, the handler will have to wait. Hopefully these critical sections are short.
You can use a logic analyser to look at that (I did it a long, long time ago on a Windows NT machine with a 100 MHz Pentium; the usual interrupt latency was a few microseconds, while with an IDE drive busy at the same time it was often 100 ms). I bet that with a recent machine and a standard Linux kernel it should always be a few microseconds, less than 30, but that's just a guess. A real-time Linux kernel will have a consistent response time.
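If a timestamp taken in userspace is close enough, the "read from the socket and reply with the time" idea can be sketched as follows. Note what it does and does not measure: the time is taken when the process wakes up from recv(), not when the interrupt fired. socket.socketpair() is used here only as a stand-in for a real accepted client connection, and the 8-byte reply format is an assumption.

```python
import socket
import struct
import time


def serve_one(conn: socket.socket) -> None:
    """Block until a request arrives, then reply with the userspace receive time."""
    request = conn.recv(64)        # blocks; the kernel's own handlers do the interrupt work
    received_ns = time.time_ns()   # taken after userspace wakes, not at the interrupt
    if request:
        conn.sendall(struct.pack("!Q", received_ns))


server, client = socket.socketpair()  # stand-in for an accepted TCP connection
sent_ns = time.time_ns()
client.sendall(b"ping")
serve_one(server)
(reply_ns,) = struct.unpack("!Q", client.recv(8))
print(f"request stamped {(reply_ns - sent_ns) / 1000:.0f} us after it was sent")

server.close()
client.close()
```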

Today my boss and I were having a discussion about some code I had written. My code downloads 3 files from a given HTTP/HTTPS link. I had multi-threaded the download so that all 3 files are downloading simultaneously in 3 separate threads. During this discussion, my boss tells me that the code is going to be shipped to people who will most likely be running old hardware and software (I'm talking Windows 2000).
Until this time, I had never considered how a threaded application would scale on older hardware. I realize that if the CPU has only 1 core, threads are useless and may even worsen performance. I have been wondering if this download task is an I/O operation. Meaning, if an API is blocked waiting for information from the HTTP/HTTPS server, will another thread that wants to do some calculation be scheduled meanwhile? Do older OSes do such scheduling?
Another thing he said: Since the code is going to be run on old machines, my application should not eat the CPU. He said use Sleep() calls after CPU intensive tasks to allow other programs some breathing space. Now I was always under the impression that using Sleep() is terrible in any program. Am I wrong? When is using Sleep() justified?
Thanks for looking!
I have been wondering if this download task is an I/O operation.
Meaning, if an API is blocked waiting for information from the
HTTP/HTTPS server, will another thread that wants to do some
calculation be scheduled meanwhile? Do older OSes do such scheduling?
Yes, they do. That's the whole point of blocking I/O. The thread is suspended and other calculations (threads) take place until an event wakes up the blocked thread. That's why it makes complete sense to split the work into threads even on single-core machines, instead of doing some poor man's scheduling between the downloads yourself in a single thread.
Of course your downloads affect each other regarding bandwidth, so threading won't help to speed up the download :-)
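A small sketch of the effect: three threads that spend their time blocked finish in roughly the time of the longest wait, not the sum, regardless of core count. The downloads are faked with time.sleep (file names and durations are made up), so this models only the waiting, not the shared bandwidth mentioned above.

```python
import threading
import time


def fake_download(name: str, seconds: float, results: dict) -> None:
    """Stand-in for a blocking HTTP download: the thread just waits."""
    time.sleep(seconds)
    results[name] = seconds


results: dict = {}
jobs = {"a.bin": 0.2, "b.bin": 0.3, "c.bin": 0.1}  # hypothetical files and wait times

start = time.monotonic()
threads = [
    threading.Thread(target=fake_download, args=(name, secs, results))
    for name, secs in jobs.items()
]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

print(f"sum of waits {sum(jobs.values()):.1f} s, wall time {elapsed:.2f} s")
```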
Another thing he said: Since the code is going to be run on old
machines, my application should not eat the CPU. He said use Sleep()
calls after CPU intensive tasks to allow other programs some breathing space.
Actually, using Sleep AFTER the task has finished won't help here. Calling Sleep after a certain amount of calculation (a sort of manual time slicing) before going on with the calculation could help, but only on cooperative systems (e.g. Windows 3.11). It plays no role on preemptive systems, where the scheduler uses time slicing to allocate CPU time to threads. There it would be more important to think about lowering the priority of CPU intensive tasks in order to give other tasks precedence...
Now I was always under the impression that using Sleep() is terrible
in any program. Am I wrong? When is using Sleep() justified?
This really depends on what you are doing. If you implement a sort of busy waiting for a flag that may only be set after a few seconds, it's better to sleep for a while between checks and give up your scheduled time slice than to burn CPU power checking a flag that is not yet set.
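The two ways of waiting for such a flag can be sketched like this. The worker and its 0.2 s delay are invented for illustration; the first loop is the "check, then sleep" pattern described above, and the second blocks on the flag directly, which avoids the polling altogether where the API allows it.

```python
import threading
import time

flag = threading.Event()


def worker() -> None:
    time.sleep(0.2)  # pretend work that eventually sets the flag
    flag.set()


# Option 1: poll, but sleep between checks so the time slice is given up.
threading.Thread(target=worker).start()
checks = 0
while not flag.is_set():
    checks += 1
    time.sleep(0.01)

# Option 2: block until signalled, with no polling loop at all.
flag.clear()
threading.Thread(target=worker).start()
signalled = flag.wait(timeout=2.0)

print(f"polled {checks} times; blocking wait signalled: {signalled}")
```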
In modern systems there is no sense in introducing Sleep in a calculation as it will only slow down your calculation.
Scheduling is the job of the OS's scheduler. It's the one with the "big picture". In my opinion, every approach to "do it better" is only valid inside the scope of a specific application where you have the overview of certain relationships that are not obvious to the scheduler.
I did some research and found that Windows has supported preemptive multitasking since Windows 95. The Windows NT line (to which Windows 2000 belongs) has always supported preemptive multitasking.

How can I make my pthreads execute a function each time they are rescheduled by the kernel?
I need to identify on which physical CPU/socket (not logical core) my thread is being scheduled at and cannot afford to do this all the time.
Can the wakeup routine be hooked somehow to make the necessary updates to TLS only when the thread is actually being rescheduled?
As to why I need this: I have code which executes AMOs approximately every 70ns per thread, which is fine if the address is not cached on another socket. Deploying the same code on two sockets gives a 15 times performance hit because of frequent cache invalidations. I intend to allocate memory especially for this which is only shared among threads sharing the same L3 cache. So I need to identify on which socket I am running and address the correct memory block. I could obviously call sched_getcpu and compare this to the physical CPU ID in /proc/cpuinfo, but this is a rather big overhead. I cannot afford to allocate thread-private memory for each thread though; that is too expensive.
From what I have read in Linux Kernel Development, Third Edition, there is no service or interface provided by the kernel for what you want. Using pthread_setaffinity_np (as suggested above by @osgx; this is the name glibc exports, the _np suffix marking it as non-portable) or caching a TLS key per CPU socket at the beginning (as suggested above by @caf) are perhaps the best methods to use in that direction.
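The "pin once, look up the socket once, cache it" idea might be sketched as below. This is Linux-only and makes several assumptions: it uses Python's os.sched_setaffinity as a stand-in for pthread_setaffinity_np, it picks the first allowed CPU arbitrarily, and it reads the socket number from the sysfs file physical_package_id, which may be absent in some virtualised environments (hence the None fallback). The helper names are made up.

```python
import os
from typing import Optional


def package_id_of(cpu: int) -> Optional[int]:
    """Read the physical package (socket) ID of a logical CPU from sysfs, if exposed."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/physical_package_id"
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def pin_and_cache_socket() -> tuple:
    """Pin the caller to one allowed CPU and return (cpu, socket_id) for caching."""
    cpu = sorted(os.sched_getaffinity(0))[0]  # arbitrary choice: first allowed CPU
    os.sched_setaffinity(0, {cpu})            # 0 means "the caller"
    return cpu, package_id_of(cpu)


original = os.sched_getaffinity(0)
cpu, socket_id = pin_and_cache_socket()
pinned = os.sched_getaffinity(0)
print(f"pinned to CPU {cpu}, socket {socket_id}")
os.sched_setaffinity(0, original)             # restore, so the demo has no side effects
```

Once the thread cannot migrate, the cached socket ID stays valid and nothing needs to be re-checked on the hot path.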

Is it possible to produce real-time intervals in a non-real-time Linux application in C/C++?
I'm writing an ADC simulator. This is an application that generates packets at a certain frequency. It is important that the frequency of packet generation corresponds as closely as possible to the sampling rate of the ADC. That is why I don't want to use sleep() and usleep() to set the packet generation intervals.
Is it possible to produce real-time intervals in a non-real-time Linux application in C/C++?
No... if it were, it would be a Real-Time Linux system.
That said, you can probably get very close, so it depends on your intervals and tolerances. Your only serious option for sub-timeslice precision is to nail the sending thread to a core and let it spin, while keeping other processing off that core, but that's very wasteful of hardware....
If you can afford to have latencies long enough for your sending code to be re-scheduled then you can look at setting up alarms & signal handlers, but that's potentially massively higher latency, perhaps only on relatively rare occasions where the cores have all been otherwise utilised. To assess how well this works, you've got to do real measurements under realistic system loads.
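As a way of doing the "real measurements" suggested above, here is a small sketch that fires ticks on an absolute schedule and records how late each one is. It is a middle ground I am adding for illustration, not something proposed above: it sleeps for most of each interval and spins only for the last millisecond. The function name, the 5 ms period and the 1 ms spin margin are all arbitrary assumptions; Python is used for brevity, and a C version would do the same with clock_gettime and clock_nanosleep.

```python
import time


def run_periodic(period_s: float, count: int, spin_margin_s: float = 0.001) -> list:
    """Fire `count` ticks on an absolute schedule; return each tick's lateness in seconds."""
    lateness = []
    next_deadline = time.monotonic() + period_s
    for _ in range(count):
        # Sleep for most of the interval, then spin for the final stretch.
        remaining = next_deadline - time.monotonic()
        if remaining > spin_margin_s:
            time.sleep(remaining - spin_margin_s)
        while time.monotonic() < next_deadline:
            pass
        lateness.append(time.monotonic() - next_deadline)
        next_deadline += period_s  # absolute schedule, so errors do not accumulate
    return lateness


late = run_periodic(period_s=0.005, count=50)
print(f"max lateness {max(late) * 1e6:.0f} us, "
      f"mean {sum(late) / len(late) * 1e6:.0f} us")
```

Running it on an idle machine and again under a realistic load gives a first idea of whether the tolerances are achievable without dedicating a core.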
The packet generator shouldn't be in the same place as the packet sender.
If you want the packets to be sent on time, you should create the packets beforehand and hand them to the packet sender.
So you need a thread with a work queue, and a sleep on that thread to send the packets on time. (You can look at Boost's sleep().)
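A minimal sketch of that split, assuming a 2 ms period and made-up packet contents: the generator fills a queue ahead of time, and a separate sender thread releases one packet per period. The "send" is just an append to a list here; a real sender would write to its socket or device at that point.

```python
import queue
import threading
import time


def sender(q: queue.Queue, period_s: float, sent: list) -> None:
    """Pop pre-built packets and release them on a fixed schedule."""
    deadline = time.monotonic()
    while True:
        packet = q.get()   # blocks until the generator has produced something
        if packet is None: # sentinel: the generator is finished
            break
        deadline += period_s
        delay = deadline - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        sent.append((time.monotonic(), packet))  # a real sender would transmit here


packets: queue.Queue = queue.Queue()
sent: list = []
worker = threading.Thread(target=sender, args=(packets, 0.002, sent))
worker.start()

for seq in range(20):      # the generator runs ahead of the sender
    packets.put(f"sample-{seq}".encode())
packets.put(None)
worker.join()

print(len(sent), "packets sent")
```

Because the packets already exist when their deadline arrives, the time spent building them no longer adds to the send jitter.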
