Timeout for KVM userspace guest

I am building a custom VMM, and I am trying to implement a timeout without using signals (which are "sent" to the whole process) or threads (I'm not going to use threads).
Now, one idea is to implement the LAPIC and just before executing the guest code we could program the LAPIC TIMER to trigger after a certain time. It should be possible to have a fairly decent timeout with this. However, this solution is fairly painful to do just for simple timeout behavior.
Is there no other, better way to get KVM to interrupt itself after a certain amount of time? I was really hoping for an argument to KVM_RUN or just about anything, really.
As should be plain from the title, the guest is executing in userspace most of the time. There is a razor thin kernel layer. I don't really want to install a LAPIC unless I absolutely have to. Ideas?

Using KVM_CREATE_IRQCHIP in combination with KVM_SET_LAPIC we can utilize the emulated LVT timer to get per-thread execution timeouts on KVM without any trouble. It is expensive to call KVM_SET_LAPIC, but it is necessary in order to avoid exposing the MSRs and device to the guest.
I tried alternatively to write the MSRs using KVM API, however even that is not possible (I'm guessing without the proper CPUID bits). Either way, the LAPIC timer works no matter how many features you disable in the guest.
KVM_SET_LAPIC costs around 3 microseconds (which is extreme) on my machine, so I'm still looking for alternatives.
I speculate that, provided you trust ring-0 in the kernel, writing just the x2APIC TIMER and INITCNT MSRs might be cheaper.
One thing to remember is to also set the CURRENTCNT register at the same time, because KVM_SET_LAPIC is explicit, and if you end up with a CURRENTCNT > INITCNT you will get a dmesg log entry, which can be expensive.
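As a rough sketch of the register setup described above: the helper below fills in the timer fields of a 1 KiB LAPIC register page such as the `regs` array of `struct kvm_lapic_state`. The offsets follow the standard xAPIC register layout; the function names, the chosen vector and the divide-by-1 setting are illustrative assumptions, not taken from the original post. In a real VMM the buffer would presumably be obtained with KVM_GET_LAPIC first (so the rest of the LAPIC state is preserved) and then written back with `ioctl(vcpu_fd, KVM_SET_LAPIC, &state)`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* xAPIC register offsets inside the 1 KiB register page. */
#define APIC_LVT_TIMER  0x320  /* vector in bits 0-7, mask bit 16, mode bits 17-18 */
#define APIC_TIMER_INIT 0x380  /* initial count (INITCNT) */
#define APIC_TIMER_CUR  0x390  /* current count (CURRENTCNT) */
#define APIC_TIMER_DIV  0x3E0  /* divide configuration */
#define APIC_REGS_SIZE  0x400  /* size of the register page */

/* Store a 32-bit register value at the given offset. */
void apic_write32(uint8_t *regs, size_t off, uint32_t val)
{
    assert(off + sizeof(val) <= APIC_REGS_SIZE);
    memcpy(regs + off, &val, sizeof(val));
}

/* Load a 32-bit register value from the given offset. */
uint32_t apic_read32(const uint8_t *regs, size_t off)
{
    uint32_t val;
    assert(off + sizeof(val) <= APIC_REGS_SIZE);
    memcpy(&val, regs + off, sizeof(val));
    return val;
}

/* Hypothetical helper: arm a one-shot, unmasked timer of `ticks` counts. */
void lapic_arm_oneshot(uint8_t *regs, uint8_t vector, uint32_t ticks)
{
    apic_write32(regs, APIC_LVT_TIMER, vector);  /* mode 00 = one-shot, mask bit clear */
    apic_write32(regs, APIC_TIMER_DIV, 0xB);     /* 0b1011 = divide by 1 */
    apic_write32(regs, APIC_TIMER_INIT, ticks);
    apic_write32(regs, APIC_TIMER_CUR, ticks);   /* keep CURRENTCNT <= INITCNT */
}
```

Setting CURRENTCNT together with INITCNT mirrors the advice above about avoiding a CURRENTCNT > INITCNT state.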

Related

Reading a 4 µs long +5V TTL from a parallel port -- when to use kernel interrupts

I've got an experimental box of tricks running that, every 100 ms or so, will spit out a 4 microsecond long +5V pulse of electricity on a TTL line. The exact time that this happens is not known ahead of time, but it's important -- so I'd like to use the Red Hat 5.3 computer that essentially runs the experiment to service this TTL, and create a glorified timestamp.
At the moment, what I've done is wire the TTL into pin 13 of the parallel port (STATUS_SELECT, one of the input lines on a parallel port) on the Linux box, spawn a process when the experiment starts, use chrt to change its scheduling priority to 99 -- i.e. high -- and then just poll the parallel port repeatedly in a while loop until the pin goes high. I then create an accurate timestamp and write it to disk in a non-blocking way.
Obviously, this is inefficient -- sometimes the process is suspended, and a TTL will be missed. As the computer is, itself, busy doing other things (namely acquiring data from my experimental bit of kit -- an MRI scanner!) this happens quite often. Polling is easy, but probably bad.
My question is this: doing something quickly when a TTL occurs seems like the bread-and-butter of computing, but, as far as I can tell, it's only possible to deal with interrupts on linux if you're a kernel module. The parallel port can generate interrupts, and libraries like paraport let you build kernel modules relatively quickly, where you have to supply your own handler.
Is the best way to deal with this problem and create accurate (±25 ms) timestamps for an experiment whenever that TTL comes in -- to write a kernel module that provides a list of recent interrupts to somewhere in /proc, and then read them out with a normal process later? Is that approach not going to work, and be very CPU inefficient -- or open a bag of worms to do with interrupt priority I'm not aware of?
Most importantly, this seems like it should be a solved problem -- is it, and if so do any wise people wish to point me in the right direction? Writing a kernel module seems like, frankly, a lot of hard, risky work for something that feels as if it should perhaps be simple.

The premise that "it's only possible to deal with interrupts on linux if you're a kernel module" dismisses some fairly common and effective strategies.
The simple course of action for responding to interrupts in userspace (especially infrequent ones) is to have a driver which creates a kernel device (or in some cases a sysfs node) where either a read() or perhaps a custom ioctl() from userspace will block until the interrupt occurs. You'd have to check if the default parallel port driver supports this, but it's extremely common with the GPIO drivers on embedded-type boards, and the basic scheme could be borrowed for the parallel port, provided that the hardware supports true interrupts.
If very precise timing is the goal, you might do better to customize the kernel module to record the timestamp there, and implement a mechanism where a read() from userspace blocks until the interrupt occurs, and then obtains the kernel's already recorded timestamp as the read data - thus avoiding the variable latency of waking userspace and calling back into the kernel to get the time.
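A minimal sketch of the userspace side of that scheme, assuming a hypothetical driver whose device node delivers one `struct timespec` (the kernel-recorded timestamp) per interrupt on each read(). The record format and function name are assumptions for illustration; a real driver would define its own.

```c
#include <assert.h>
#include <errno.h>
#include <time.h>
#include <unistd.h>

/*
 * Block until the (hypothetical) driver delivers one timestamp record.
 * Returns 0 on success, -1 on EOF or on a read error other than EINTR.
 */
int read_event_timestamp(int fd, struct timespec *ts)
{
    char *buf = (char *)ts;
    size_t got = 0;

    assert(ts != NULL);
    while (got < sizeof(*ts)) {
        ssize_t n = read(fd, buf + got, sizeof(*ts) - got);
        if (n > 0)
            got += (size_t)n;      /* partial record: keep reading */
        else if (n == 0)
            return -1;             /* device closed */
        else if (errno != EINTR)
            return -1;             /* real error */
    }
    return 0;
}
```

The calling process simply sleeps inside read() until the interrupt fires, so no polling loop or elevated priority is needed to avoid missing an event.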
You might also look at true local-bus serial ports (if present) as an alternate interrupt-capable interface in cases where the available parallel port is some partial or indirect implementation which doesn't support them.
In situations where your only available interface is something indirect and high latency such as USB, or where you want a lot of host and operating-system independence, then it may indeed make sense to use an external microcontroller. In that case, you would probably try to set the micro's clock from the host system, and then have it give you timestamp messages every time it sees an event. If your experiment only needs the timestamps to be relative to each other within a given experimental session, this should work well. But if you need to establish an absolute time synchronization across the USB latency, you may have to do some careful roundtrip measurement and then estimation of the latency in order to compensate for it (see NTP for an extreme example).
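The roundtrip compensation mentioned above can be sketched with the usual NTP-style arithmetic. This assumes four timestamps in microseconds: `t0` (host sends request), `t1` (device receives it), `t2` (device sends reply), `t3` (host receives reply), and it assumes the latency is roughly symmetric in both directions. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Estimated offset of the device clock relative to the host clock. */
int64_t clock_offset_us(int64_t t0, int64_t t1, int64_t t2, int64_t t3)
{
    return ((t1 - t0) + (t2 - t3)) / 2;
}

/* Time spent on the wire (both directions), excluding device processing. */
int64_t round_trip_us(int64_t t0, int64_t t1, int64_t t2, int64_t t3)
{
    assert(t3 >= t0 && t2 >= t1);
    return (t3 - t0) - (t2 - t1);
}
```

For example, with t0 = 1000, t1 = 5600, t2 = 5700 and t3 = 1300, the device clock is estimated to be 4500 µs ahead of the host, with 200 µs spent in transit.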

Programming a relatively large, threaded application for old systems

Today my boss and I were having a discussion about some code I had written. My code downloads 3 files from a given HTTP/HTTPS link. I had multi-threaded the download so that all 3 files are downloading simultaneously in 3 separate threads. During this discussion, my boss tells me that the code is going to be shipped to people who will most likely be running old hardware and software (I'm talking Windows 2000).
Until this time, I had never considered how a threaded application would scale on older hardware. I realize that if the CPU has only 1 core, threads are useless and may even worsen performance. I have been wondering if this download task is an I/O operation. Meaning, if an API is blocked waiting for information from the HTTP/HTTPS server, will another thread that wants to do some calculation be scheduled meanwhile? Do older OSes do such scheduling?
Another thing he said: Since the code is going to be run on old machines, my application should not eat the CPU. He said use Sleep() calls after CPU intensive tasks to allow other programs some breathing space. Now I was always under the impression that using Sleep() is terrible in any program. Am I wrong? When is using Sleep() justified?
Thanks for looking!

"I have been wondering if this download task is an I/O operation. Meaning, if an API is blocked waiting for information from the HTTP/HTTPS server, will another thread that wants to do some calculation be scheduled meanwhile? Do older OSes do such scheduling?"
Yes, they do. That's the point of blocking I/O. The thread is suspended and other calculations (threads) take place until an event wakes up the blocked thread. That's why it makes complete sense to split the work up into threads even on single-core machines, instead of doing some poor man's scheduling between the downloads yourself in a single thread.
Of course your downloads affect each other regarding bandwidth, so threading won't help to speed up the download :-)
"Another thing he said: Since the code is going to be run on old machines, my application should not eat the CPU. He said use Sleep() calls after CPU intensive tasks to allow other programs some breathing space."
Actually, using Sleep() AFTER the task has finished won't help here. Calling Sleep() after a certain amount of calculation (doing a sort of time slicing) before going on with the calculation could help. But this is only true for cooperative systems (e.g. Windows 3.11). It does not play a role on preemptive systems, where the scheduler uses time slicing to allocate calculation time to threads. There it would be more important to think about lowering the priority of CPU intensive tasks in order to give other tasks precedence...
"Now I was always under the impression that using Sleep() is terrible in any program. Am I wrong? When is using Sleep() justified?"
This really depends on what you are doing. If you implement a sort of busy waiting for a certain flag, which may only be set after a few seconds, it's better to go to sleep for a while between checks in order to give up your scheduled time slice, instead of just burning CPU power checking a flag that isn't set yet.
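A minimal sketch of that "check, then sleep" pattern. It uses POSIX nanosleep() so that it is self-contained; on Windows the same shape would call Sleep(interval_ms) instead. The function name and parameters are illustrative assumptions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <time.h>

/*
 * Poll `flag` up to `max_polls` times, sleeping `interval_ms` between
 * checks so the time slice is given up instead of burned.
 * Returns 1 if the flag was seen set, 0 if polling gave up.
 */
int wait_for_flag(const volatile sig_atomic_t *flag, long interval_ms, int max_polls)
{
    struct timespec pause_time;

    assert(flag != NULL && interval_ms >= 0);
    pause_time.tv_sec = interval_ms / 1000;
    pause_time.tv_nsec = (interval_ms % 1000) * 1000000L;

    for (int i = 0; i < max_polls; i++) {
        if (*flag)
            return 1;
        nanosleep(&pause_time, NULL);   /* yield the CPU between checks */
    }
    return *flag != 0;
}
```

Where the platform offers a real blocking primitive (an event, condition variable or semaphore), waiting on that is usually preferable to any polling interval.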
In modern systems there is no sense in introducing Sleep in a calculation as it will only slow down your calculation.
Scheduling is the job of the OS's scheduler. It's the one with the "big picture". In my opinion every approach to "do it better" is only valid inside the scope of a specific application, where you have an overview of certain relationships that are not obvious to the scheduler.
Addendum:
I did some research and found that Windows has supported preemptive multitasking since Windows 95. The Windows NT line (to which Windows 2000 belongs) has always supported preemptive multitasking.

Is there a difference between a real time system and one that is just deterministic?

At work we're discussing the design of a new platform and one of the upper management types said it needed to run our current code base (C on Linux) but be real time because it needed to respond in less than a second to various inputs. I pointed out that:
1. That point doesn't mean it needs to be "real time", just that it needs a faster clock and more streamlining in its interrupt handling.
2. One of the key points to consider is the OS that's being used. They wanted to stick with embedded Linux; I pointed out we need an RTOS. Using Linux will prevent "real time" because of the kernel/user space memory split, thus I/O is done via files and sockets, which introduce a delay.
3. What we really need to determine is if it needs to be deterministic (needs to respond to input in <200ms 90% of the time, for example).
Really in my mind if point 3 is true, then it needs to be a real time system, and then point 2 is the biggest consideration.
I felt confident answering, but then I was thinking about it later... What do others think? Am I on the right track here or am I missing something?
Is there any difference that I'm missing between a "real time" system and one that is just "deterministic"? And besides a RTC and a RTOS, am I missing anything major that is required to execute a true real time system?
Look forward to some great responses!
EDIT:
Got some good responses so far, looks like there's a little curiosity about my system and requirements so I'll add a few notes for those who are interested:
My company sells units in the 10s of thousands, so I don't want to go overkill on the price
Typically we sell a main processor board and an independent display. There's also an attached network of other CAN devices.
The board (currently) runs the devices and also acts as a webserver sending basic XML docs to the display for end users
The requirements come in here where management wants the display to be updated "quickly" (<1s), however the true constraints IMO come from the devices that can be attached over CAN. These devices are frequently motor controlled devices with requirements including "must respond in less than 200ms".

You need to distinguish between:
Hard realtime: there is an absolute limit on response time that must not be breached (a breach counts as a failure). This is appropriate, for example, when you are controlling robotic motors or medical devices, where failure to meet a deadline could be catastrophic.
Soft realtime: there is a requirement to respond quickly most of the time (perhaps 99.99%+), but it is acceptable for the time limit to be occasionally breached, provided the response on average is very fast. This is appropriate, for example, when performing realtime animation in a computer game: missing a deadline might cause a skipped frame but won't fundamentally ruin the gaming experience.
Soft realtime is readily achievable in most systems as long as you have adequate hardware and pay sufficient attention to identifying and optimising the bottlenecks. With some tuning, it's even possible to achieve in systems that have non-deterministic pauses (e.g. the garbage collection in Java).
Hard realtime requires dedicated OS support (to guarantee scheduling) and deterministic algorithms (so that once scheduled, a task is guaranteed to complete within the deadline). Getting this right is hard and requires careful design over the entire hardware/software stack.
It is important to note that most business apps don't require either: in particular I think that targeting a <1sec response time is far away from what most people would consider a "realtime" requirement. Having said that, if a response time is explicitly specified in the requirements then you can regard it as soft realtime with a fairly loose deadline.

From the definition of the real-time tag:
A task is real-time when the timeliness of the activities' completion is a functional requirement and correctness condition, rather than merely a performance metric. A real-time system is one where some (though perhaps not all) of the tasks are real-time tasks.
In other words, if something bad will happen if your system responds too slowly to meet a deadline, the system needs to be real-time and you will need a RTOS.
A real-time system does not need to be deterministic: if the response time randomly varies between 50ms and 150ms but the response time never exceeds 150ms then the system is non-deterministic but it is still real-time.
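To make the distinction concrete, here is a small sketch that checks a set of measured response times against a deadline and separately reports their spread. The function names and the sample values are illustrative assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if every measured response time is within the deadline. */
int meets_deadline(const int *response_ms, size_t n, int deadline_ms)
{
    for (size_t i = 0; i < n; i++)
        if (response_ms[i] > deadline_ms)
            return 0;
    return 1;
}

/* Spread between the slowest and fastest response (0 for a constant response time). */
int jitter_ms(const int *response_ms, size_t n)
{
    assert(n > 0);
    int lo = response_ms[0], hi = response_ms[0];
    for (size_t i = 1; i < n; i++) {
        if (response_ms[i] < lo) lo = response_ms[i];
        if (response_ms[i] > hi) hi = response_ms[i];
    }
    return hi - lo;
}
```

With samples of 50, 120, 150 and 75 ms, the 150 ms deadline is met even though the jitter is 100 ms: the timing varies, yet the real-time requirement holds. Measured samples can only illustrate this; a hard real-time guarantee has to come from the design, not from observation.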

Maybe you could try RTLinux or RTAI if you have sufficient time to experiment. With these, you can keep the non-realtime applications on Linux, while the realtime applications are moved to the RTOS part. In that case, you will (or at least might) achieve a <1 second response time.
The advantages are -
Large amount of code can be re-used
You can manually partition realtime and non-realtime tasks and try to achieve the response <1s as you desire.
I think migration time will not be very high, since most of the code will be in linux
Just on a sidenote be careful about the hardware drivers that you might need to run on the realtime part.
The following architecture of RTLinux might help you to understand how this can be possible.

It sounds like you're on the right track with the RTOS. Different RTOSs prioritize different things, such as robustness or speed. You will need to figure out if you need a hard or soft RTOS and, based on what you need, how your scheduler is going to be driven. One thing is for sure: there is a serious difference between using a regular OS and an RTOS.
Note: perhaps for the truest real time system you will need hard event based resolution so that you can guarantee that your processes will execute when you expect them to.

An RTOS, or real-time operating system, is designed for embedded applications. In a multitasking system that handles critical applications, the operating system must:
1. be deterministic in memory allocation,
2. allot CPU time to different threads, tasks and processes,
3. have a non-preemptive kernel, which means a context switch must happen only after the end of task execution, etc.
So normal Windows or Linux cannot be used.
Examples of an RTOS in an embedded system: satellites, Formula 1 cars, car navigation systems.
Embedded system: a system which is designed to perform a single or a few dedicated functions.
System with an RTOS: can also be an embedded system, but naturally an RTOS will be used in a real-time system which needs to perform many functions.
Real-time system: a system which can provide the output in a definite/predicted amount of time. This does not mean that real-time systems are faster.
Difference between the two:
1. Normal embedded systems are not real-time systems.
2. Systems with an RTOS are real-time systems.

Suspend and Resume thread (Windows, C)

I'm currently developing a heavily multi-threaded application, dealing with lots of small data batch to process.
The problem with it is that too many threads are being spawned, which slows down the system considerably. In order to avoid that, I've got a table of handles which limits the number of concurrent threads. Then I call "WaitForMultipleObjects", and when one slot is freed, I create a new thread with its own data batch to handle.
Now, I've got as many threads as I want (typically, one per core). Even then, the load incurred by multi-threading is very noticeable. The reason for this: the data batch is small, so I'm constantly creating new threads.
The first idea I'm currently implementing is simply to regroup jobs into longer serial lists. Therefore, when I'm creating a new thread, it will have 128 or 512 data batch to handle before being terminated. It works well, but somewhat destroys granularity.
I was asked to look for another scenario: if the problem comes from "creating" threads too often, what about "pausing" them, loading data batch and "resuming" the thread?
Unfortunately, I'm not too successful.
The problem is: when a thread is in "suspend" mode, "WaitForMultipleObjects" does not detect it as available. In fact, I can't efficiently distinguish between an active and suspended thread.
So I've got 2 questions:
How to detect "suspended thread", so that i can load new data into it and resume it?
Is it a good idea? After all, is "CreateThread" really a resource hog?
Edit
After much testing, here are my findings concerning Thread Pooling and IO Completion Port, both advised in this post.
Thread Pooling is tested using the older version "QueueUserWorkItem".
IO Completion Port requires using CreateIoCompletionPort, GetQueuedCompletionStatus and PostQueuedCompletionStatus;
1) First on performance : Creating many threads is very costly, and both thread pooling and io completion ports are doing a great job to avoid that cost. I am now down to 8-jobs per batch, from an earlier 512-jobs per batch, with no slowdown. This is considerable. Even when going to 1-job per batch, performance impact is less than 5%. Truly remarkable.
From a performance standpoint, QueueUserWorkItem wins, albeit by such a small margin (about 1% better) that it is almost negligible.
2) On usage simplicity :
Regarding starting threads : No question, QueueUserWorkItem is by far the easiest to setup. IO Completion port is heavyweight in comparison.
Regarding ending threads : Win for IO Completion Port.
For some unknown reason, MS provides no function in C to know when all jobs are completed with QueueUserWorkItem. It requires some nasty tricks to successfully implement this basic but critical function. There is no excuse for such a lack of feature.
3) On resource control : Big win for IO Completion Port, which allows fine tuning of the number of concurrent threads, while there is no such control with QueueUserWorkItem, which will happily spend all CPU cycles from all available cores. That, in itself, could be a deal breaker for QueueUserWorkItem.
Note that newer version of Completion Port seems to allow that control, but are only available on Windows Vista and later.
4) On compatibility : small win for IO Completion Port, which is available since Windows NT4. QueueUserWorkItem only exists since Windows 2000. This is however good enough. Newer version of Completion Port is a no-go for Windows XP.
As can be guessed, I'm pretty much tied between the 2 solutions. They both answer correctly to my needs.
For a general situation, I suggest I/O Completion Port, mostly for resource control.
On the other hand, QueueUserWorkItem is easier to setup. Quite a pity that it loses most of this simplicity on requiring the programmer to deal alone with end-of-jobs detection.

Instead of implementing your own, consider using CreateThreadpool(). The OS will do the work for you, and you don't have to worry about getting it right.

Yes, there's a fair amount of overhead involved with CreateThread. One solution is to use a thread pool, QueueUserWorkItem. Another is to just start a set of threads and have them retrieve a 'job item' from a thread-safe queue.
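A minimal sketch of the second option: a fixed set of worker threads that block on a thread-safe queue instead of being created per job or suspended and resumed. POSIX threads are used here only to keep the sketch self-contained; that is a substitution for illustration, and on Windows the same shape would use CRITICAL_SECTION plus an event or condition variable, or an I/O completion port. All names, the fixed capacity and the "square the job number" work are assumptions.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_JOBS 1024
#define MAX_WORKERS 16

typedef struct {
    int jobs[MAX_JOBS];
    int head, tail;          /* next job to take, next free slot */
    int closed;              /* set once no more jobs will arrive */
    long sum;                /* stand-in for the accumulated results */
    pthread_mutex_t lock;
    pthread_cond_t ready;
} job_queue;

static void queue_push(job_queue *q, int job)
{
    pthread_mutex_lock(&q->lock);
    assert(q->tail < MAX_JOBS);
    q->jobs[q->tail++] = job;
    pthread_cond_signal(&q->ready);      /* wake one blocked worker */
    pthread_mutex_unlock(&q->lock);
}

static void *worker(void *arg)
{
    job_queue *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        while (q->head == q->tail && !q->closed)
            pthread_cond_wait(&q->ready, &q->lock);   /* block, don't suspend */
        if (q->head == q->tail) {                     /* closed and drained */
            pthread_mutex_unlock(&q->lock);
            return NULL;
        }
        int job = q->jobs[q->head++];
        pthread_mutex_unlock(&q->lock);

        long result = (long)job * job;                /* the "small data batch" */

        pthread_mutex_lock(&q->lock);
        q->sum += result;
        pthread_mutex_unlock(&q->lock);
    }
}

/* Process jobs 1..njobs on nworkers long-lived threads; returns the summed results. */
long run_jobs(int nworkers, int njobs)
{
    job_queue q = { .head = 0, .tail = 0, .closed = 0, .sum = 0 };
    pthread_t threads[MAX_WORKERS];

    assert(nworkers > 0 && nworkers <= MAX_WORKERS && njobs <= MAX_JOBS);
    pthread_mutex_init(&q.lock, NULL);
    pthread_cond_init(&q.ready, NULL);
    for (int i = 0; i < nworkers; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, &q);
        assert(rc == 0);
        (void)rc;
    }
    for (int j = 1; j <= njobs; j++)
        queue_push(&q, j);

    pthread_mutex_lock(&q.lock);
    q.closed = 1;
    pthread_cond_broadcast(&q.ready);    /* let idle workers exit */
    pthread_mutex_unlock(&q.lock);

    for (int i = 0; i < nworkers; i++)
        pthread_join(threads[i], NULL);
    pthread_cond_destroy(&q.ready);
    pthread_mutex_destroy(&q.lock);
    return q.sum;
}
```

Because the threads are created once and reused, the per-job cost is a lock and a wake-up rather than a thread creation, which is what keeps single-job granularity affordable.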

If you want to also support Windows XP, you cannot use CreateThreadpool -- otherwise, if Vista and newer is sufficient, Windows thread pools are the easiest way.
If Windows XP support is needed, spawn a number of threads and assign them to an IO completion port, then have each thread block on GetQueuedCompletionStatus(). Completion ports let you post events to the port which will wake exactly one thread per event, and they are very efficient. They use a LIFO strategy on waking threads to keep caches warm, too.
In any case, you will never want to suspend a thread. Never ever. Block, wait, but don't suspend.
The reason is that with suspend you get the problem that you describe, plus you will create deadlocks, e.g. if your thread is within a critical section or mutex. Aside from a debugger, nobody should ever need to suspend a thread.

Best way to ensure accurate timing with C

I am a beginning C programmer (though not a beginning programmer) looking to dive into a project to teach myself C. My project is music-based, and because of this I am curious whether there are any 'best practices' per-se, when it comes to timing functions.
Just to clarify, my project is pretty much an attempt to build some barebones music notation/composition software (remember, emphasis on barebones). I was originally thinking about using OSX as my platform, but I want to do it in C, not Obj-C (though I know it would probably be easier...CoreAudio looked like a pretty powerful tool for this kind of stuff). So even though I don't have to build OSX apps in Obj-C, I will probably end up building this on a linux system (probably debian...).
Thanks everyone, for your great answers.

There are two accurate methods for timing functions:
Single process execution.
Timer event handler / callback
Single Process Execution
Most modern computers execute more than one program simultaneously. Actually, they execute pieces of many programs, swapping them out based on priorities and other metrics to look like more than one program is executing at the same time. This overhead affects timing in programs. Either the program gets delayed in reading the time or the OS gets delayed in setting its own time variables.
The solution in this case is to stop as many other tasks as possible from running. The ideal environment for best accuracy is to have your program as the sole program running. Some OSes provide APIs for superuser applications to block all other programs or kill them.
Timer event handling / callback
Since the OS can't be trusted to execute your program with high precision, most OS's will provide Timer APIs. Many of these APIs include the ability to call one of your functions when the timer expires. This is known as a callback function. Other OS's may send a message or generate an event when the timer expires. These fall under the class of timer handlers. The callback process has less overhead than the handlers and thus is more accurate.
Music Hardware
Although you may have your program send music to the speakers, many computers now have separate processors that play music. This frees up the main processor and provides more continuous notes, rather than sounds separated by silent gaps caused by the platform overhead of your program sending the next sounds to the speaker.
A quality music processor has at least these two functions:
Start Playing
End Music Notification
Start Playing
This is the function where you tell the music processor where your data is and the size of the data. The processor will start playing the music.
End Music Notification
You provide the processor with a pointer to a function that it will call when the music data has been processed. Nice processors will call the function early so there will be no gaps in the sounds while reloading.
All of this is platform dependent and may not be standard across platforms.
Hope this helps.

This is quite a vast area, and, depending on exactly what you want to do, potentially very difficult.
You don't give much away by saying your project is "music based".
Is it a musical score typesetting program?
Is it processing audio?
Is it filtering MIDI data?
Is it sequencing MIDI data?
Is it generating audio from MIDI data?
Does it only perform playback?
Does it need to operate in a real time environment?
Your question though hints at real time operation, so in that case...
The general rule when working in a real time environment is don't do anything which may block the real time thread. This includes:
Calling free/malloc/calloc/etc (dynamic memory allocation/deallocation).
File I/O.
Use of spinlocks/semaphores/mutexes upon threads.
Calls to GUI code.
Calls to printf.
Bearing these considerations in mind for a real time music application, you're going to have to learn how to do multi-threading in C and how to pass data from the UI/GUI thread to the real time thread WITHOUT breaking ANY of the above restrictions.
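One common way to pass data to the real time thread under those restrictions is a single-producer/single-consumer ring buffer built on atomics, so that neither side ever locks, allocates or blocks. The following is a minimal sketch using C11 `<stdatomic.h>`; the type name, capacity and `int` payload are illustrative assumptions, and it is only valid with exactly one producer thread and one consumer thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 256   /* capacity; must be a power of two */
static_assert((RING_SIZE & (RING_SIZE - 1)) == 0, "RING_SIZE must be a power of two");

typedef struct {
    int items[RING_SIZE];
    atomic_size_t head;   /* advanced only by the consumer (real time thread) */
    atomic_size_t tail;   /* advanced only by the producer (UI thread) */
} spsc_ring;

void ring_init(spsc_ring *r)
{
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side. Returns 1 on success, 0 if the ring is full (never blocks). */
int ring_push(spsc_ring *r, int value)
{
    size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == RING_SIZE)
        return 0;
    r->items[tail % RING_SIZE] = value;
    /* release: the item is visible before the new tail is */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 1;
}

/* Consumer side. Returns 1 and fills *out, or 0 if the ring is empty (never blocks). */
int ring_pop(spsc_ring *r, int *out)
{
    assert(out != NULL);
    size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return 0;
    *out = r->items[head % RING_SIZE];
    /* release: the slot is free for reuse only after it has been read */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}
```

The real time thread calls ring_pop() once per processing cycle and simply carries on if nothing is available.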
For an open source real time audio (and MIDI) (routing) server take a look at http://jackaudio.org

gettimeofday() is the best for wall clock time. getrusage() is the best for CPU time, although it may not be portable. clock() is more portable for CPU timing, but it may have integer overflow.
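A short sketch of how the values returned by these calls are typically turned into durations. The helper names are illustrative; the inputs are assumed to come from two gettimeofday() calls (wall clock) or two clock() calls (CPU time) taken around the code being measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>
#include <time.h>

/* Wall-clock microseconds between two gettimeofday() readings. */
int64_t elapsed_us(const struct timeval *start, const struct timeval *end)
{
    assert(start != NULL && end != NULL);
    return (int64_t)(end->tv_sec - start->tv_sec) * 1000000
         + (int64_t)(end->tv_usec - start->tv_usec);
}

/* CPU seconds between two clock() readings. */
double cpu_seconds(clock_t start, clock_t end)
{
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Doing the subtraction in 64-bit microseconds avoids the overflow that a 32-bit counter would hit on long measurements.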

This is pretty system-dependent. What OS are you using?
You can take a look at gettimeofday() for fairly high granularity. It should work OK if you just need to read the time once in a while.
SIGALRM/setitimer can be used to receive an interrupt periodically. Additionally, some systems have higher level libraries for dealing with time.
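A minimal sketch of the SIGALRM/setitimer approach on a POSIX system: a periodic ITIMER_REAL timer whose handler only counts ticks, with the main code sleeping in pause() between them. The function name and the tick-counting design are illustrative assumptions.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static volatile sig_atomic_t ticks;

/* Signal handler: keep it trivial, just count the tick. */
static void on_alarm(int sig)
{
    (void)sig;
    ticks++;
}

/*
 * Start a periodic timer firing every interval_ms and wait until at least
 * `count` ticks have arrived. Returns the number of ticks seen, or -1 on error.
 */
int wait_for_ticks(long interval_ms, int count)
{
    struct sigaction sa;
    struct itimerval period, off;

    assert(interval_ms > 0);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    period.it_interval.tv_sec = interval_ms / 1000;
    period.it_interval.tv_usec = (interval_ms % 1000) * 1000;
    period.it_value = period.it_interval;      /* first expiry after one interval */
    ticks = 0;
    if (setitimer(ITIMER_REAL, &period, NULL) != 0)
        return -1;

    while (ticks < count)
        pause();                               /* sleep until a signal arrives */

    memset(&off, 0, sizeof(off));
    setitimer(ITIMER_REAL, &off, NULL);        /* a zero value disarms the timer */
    return ticks;
}
```

The interval is a request rather than a guarantee: on a general-purpose OS the handler may run late when the system is busy.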
