INTEGRITY RTOS latency - benchmarking

I am surveying RTOSes for an evaluation. Does anyone have benchmarks of interrupt response latency on the INTEGRITY RTOS? I have found benchmarks for VxWorks, PREEMPT_RT Linux, RTAI, Neutrino, ...
http://wiki.csie.ncku.edu.tw/embedded/xenomai/rtlws_paper.pdf
https://content.sciendo.com/configurable/contentpage/journals$002fcjece$002f11$002f2$002farticle-p35.xml
Does anyone have a similar benchmark report for the INTEGRITY RTOS?

Related

Profiling cache coherence latency

Is there a tool that makes it possible to monitor the time spent managing cache coherence by the MESIF protocol on Skylake servers (or their successors) under Linux? I am also interested in programmatic ways to do this in C, if possible.

Do efficiency cores support the same instructions as performance cores?

When writing a program that requires high computational performance, it is often necessary to use multiple threads, SIMD vectorization, or other CPU extensions. One can query the CPU using CPUID to find out which instruction sets it supports. However, since the programmer has no control over which cores actually execute the different threads, it could be a problem if different cores support different instruction sets.
If one queries the CPU at the start of the program, is it safe to assume all threads will support the same instruction set? If not, does this break programs that assume all cores support the same instructions, or are CPUs clever enough to realize they shouldn't use those cores?
Does one need to query CPUID on each thread separately?
Is there any way a program can avoid running on E-cores?
If the instruction sets are the same, then where is the 'Efficiency'? Is it with less cache, lower clock speed, or something else?
This question is posed out of curiosity, but the answers may affect how I write programs in the future. I would appreciate any informed comments on these questions but please don't just share your thoughts and opinions on how you think it works if you don't know with high confidence. Thanks.
I have tried to find information on the internet, but found nothing sufficiently low-level to answer these questions adequately.
Do efficiency cores support the same instructions as performance cores?
Yes (for Intel's Alder Lake, and also for ARM's big.LITTLE).
For Alder Lake, operating systems were "deemed unable" to handle heterogeneous CPUs, so Intel nerfed extensions that already existed in the performance cores (primarily AVX-512) to match the features present in the efficiency cores.
Sadly, supporting heterogeneous CPUs isn't actually hard in some cases (e.g. hypervisors that don't give all CPUs to a single guest) and is solvable in the general case; and failing to provide a way to re-enable the disabled extensions (if an OS does support heterogeneous CPUs) prevents an OS from trying to support heterogeneous CPUs in the future, essentially turning a temporary solution into a permanent problem.
Does one need to query CPUID on each thread separately?
Not for the purpose of determining feature availability. If you have highly optimized code (e.g. code tuned differently for different CPU types), you might still want to (even though it's not a strict need); but then you will also need to pin the thread to a specific CPU or group of CPUs.
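For illustration, here is a minimal sketch of such a feature query using the cpuid.h wrapper shipped with GCC/Clang; the bits tested (AVX2 and AVX-512F in leaf 7, subleaf 0) are just examples:

    #include <stdio.h>
    #include <cpuid.h>  /* GCC/Clang wrapper around the CPUID instruction */

    int main(void) {
        unsigned int eax, ebx, ecx, edx;
        /* Leaf 7, subleaf 0: structured extended feature flags */
        if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
            printf("AVX2:     %s\n", (ebx & (1u << 5))  ? "yes" : "no");
            printf("AVX-512F: %s\n", (ebx & (1u << 16)) ? "yes" : "no");
        }
        return 0;
    }

On current hybrid parts both core types report the same (lowest-common-denominator) feature set, so the result does not depend on which core the query happens to run on.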
Is there any way a program can avoid running on E-cores?
Potentially, via CPU affinity. Typically it just makes things worse, though (it's better to run on an E-core than not to run at all because the P-cores are already busy).
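As a sketch of the affinity approach on Linux, assuming pthread_setaffinity_np and hypothetical P-core IDs 0-3 (on hybrid Intel parts with recent kernels, /sys/devices/cpu_core/cpus is one place to look the real IDs up; verify on your machine):

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>
    #include <stdio.h>
    #include <string.h>

    /* Pin the calling thread to the given cores. Which IDs are P-cores
     * is machine-specific; the values used below are assumptions. */
    static int pin_to_cores(const int *cores, int n) {
        cpu_set_t set;
        CPU_ZERO(&set);
        for (int i = 0; i < n; i++)
            CPU_SET(cores[i], &set);
        return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }

    int main(void) {
        int pcores[] = {0, 1, 2, 3};  /* hypothetical P-core IDs */
        int rc = pin_to_cores(pcores, 4);
        if (rc != 0)
            fprintf(stderr, "pthread_setaffinity_np: %s\n", strerror(rc));
        return 0;
    }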
If the instruction sets are the same, then where is the 'Efficiency'? Is it with less cache, lower clock speed, or something else?
Lower clock, shorter pipeline, less aggressive speculative execution, ...

Can the kernel set the interval of the "hardware timer" of the CPU, and does the CPU have a dedicated hardware timer for scheduling?

Based on my understanding, the CPU has a "hardware timer" that fires an interrupt when its interval expires.
The kernel uses this hardware timer to implement the scheduling mechanism for processes: if the hardware timer fires, say, interrupt number 123, the kernel maps that interrupt number to an interrupt handler that executes the scheduler code (which decides which process to execute next).
I have two questions:
Can the kernel set the interval of the hardware timer, or is the interval a fixed number that can't be changed programmatically?
Does the CPU have a dedicated hardware timer for scheduling, or are there many hardware timers from which the kernel can choose whichever it wants to use for scheduling?
Edit: The hardware architecture I am most interested in is the PC, but I would like to know whether other architectures (for example a mobile phone, a Raspberry Pi, etc.) work in a similar way.
Details are hardware-specific (they may differ across motherboards, chipsets, and processors; read about the southbridge). Read about the High Precision Event Timer (and the APIC).
See also the OSDev wiki, notably its Programmable Interval Timer page.
(So the answer is usually yes to both questions.)
From early on, IBM-compatible PCs had PITs (Programmable Interval Timers): IBM PC and IBM PC XT had the Intel 8253, the IBM PC AT introduced the Intel 8254.
From the IBM PC Technical Reference from April 1984, page 1-11:
System Timers
Three programmable timer/counters are used by the system as follows: Channel 0 is a general-purpose timer providing a constant time base for implementing a time-of-day clock, Channel 1 times and requests refresh cycles from the Direct Memory Access (DMA) channel, and Channel 2 supports the tone generation for the speaker. [...]
Channel 0 is exactly the "constant time base," the "interval" you are asking about. And, to answer your first question, it is changeable; it is the Programmable Interval Timer.
However, the CPU built into the original IBM PC was the Intel 8088, basically an Intel 8086 with an 8-bit data bus. Real Mode was the state of the art back then; Protected Mode was introduced some years later with the Intel 80286. So effective multitasking, let alone preemptive multitasking or multithreading, was of no concern in those days when DOS ruled the market.
Fast-forwarding to the IBM PC AT, the world was blessed with a Protected Mode-capable CPU, the Intel 80286, and the Intel 8254 was introduced, a "[...] superset of the 8253" (from the 8254 PIT datasheet). If you really want an in-depth understanding of the PITs, read the 8253/8254 datasheets linked at the bottom. It might also be worth looking at Linux. Since the latest kernels are way too complicated to really understand the relevant parts in a matter of twenty minutes, I suggest you look at Linux 0.01, the very first release. _timer_interrupt in kernel/system_calls.s might be interesting, and from there you can go wherever you want.
Regarding your second question: there are multiple timer sources, but only one is suitable for interval timing, namely channel 0. IBM compatibles still comply with the system-timer layout shown above; they retain the same functionality, but may add more on top of that or change how the hardware works and how it is packaged. Nowadays additional timers do exist, such as high-resolution timers, but using them for interval timing instead would break compatibility.
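To make the "programmable" part concrete, here is a minimal freestanding sketch of reprogramming channel 0 as the 8253/8254 datasheets describe. The outb helper is a common inline-assembly idiom rather than a standard library function, and this only works in kernel or boot context, not from a user-space program:

    #include <stdint.h>

    /* Write one byte to an x86 I/O port (kernel/boot context only). */
    static inline void outb(uint16_t port, uint8_t val) {
        __asm__ volatile ("outb %0, %1" : : "a"(val), "Nd"(port));
    }

    #define PIT_CH0_DATA 0x40
    #define PIT_COMMAND  0x43
    #define PIT_INPUT_HZ 1193182u  /* the PIT's fixed ~1.193182 MHz input clock */

    /* Reprogram channel 0 to fire IRQ0 at roughly the given rate:
     * channel 0, lobyte/hibyte access, mode 3 (square wave), binary. */
    static void pit_set_frequency(uint32_t hz) {
        uint16_t divisor = (uint16_t)(PIT_INPUT_HZ / hz);
        outb(PIT_COMMAND, 0x36);            /* command byte: ch 0, lo/hi, mode 3 */
        outb(PIT_CH0_DATA, divisor & 0xFF); /* low byte of the divisor */
        outb(PIT_CH0_DATA, divisor >> 8);   /* high byte of the divisor */
    }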
Intel 8253 Datasheet
Intel 8254 Datasheet
IBM PC Technical Reference
IBM PC AT Technical Reference
Can the kernel set the interval of the hardware timer, or is the interval a fixed number that can't be changed programmatically?
Your questions are ENTIRELY processor specific. Some processors have controllable timers. Others have timers that go off at fixed intervals. Most processors you are likely to encounter have adjustable timers, however.
Does the CPU have a dedicated hardware timer for scheduling or is there many hardware timers, and the kernel can choose whichever timer it wants to use for scheduling?
Some processors have only one timer. Most processors these days have multiple timers.

Is the shared L2 cache in multicore processors multiported? [duplicate]

The Intel Core i7 has per-core L1 and L2 caches and a large shared L3 cache. I need to know what kind of interconnect connects the multiple L2s to the single L3. I am a student and need to write a rough behavioral model of the cache subsystem.
Is it a crossbar? A single bus? A ring? The references I came across mention structural details of the caches, but none of them mention what kind of on-chip interconnect exists.
Modern i7s use a ring. From Tom's Hardware:
Earlier this year, I had the chance to talk to Sailesh Kottapalli, a senior principal engineer at Intel, who explained that he'd seen sustained bandwidth close to 300 GB/s from the Xeon 7500-series' LLC, enabled by the ring bus. Additionally, Intel confirmed at IDF that every one of its products currently in development employs the ring bus.
Your model will be very rough, but you may be able to glean more information from public information on i7 performance counters pertaining to the L3.
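If it helps, here is a toy C sketch of the ring idea for a behavioral model: each core and L3 slice gets a ring stop, and an access pays a fixed slice latency plus one hop cost per ring stop traversed in the shorter direction. All constants are invented placeholders, not measured i7 numbers:

    #include <stdio.h>
    #include <stdlib.h>

    #define N_STOPS        8   /* assumed number of ring stops (cores/slices) */
    #define SLICE_LATENCY 26   /* assumed cycles to access an L3 slice */
    #define HOP_LATENCY    1   /* assumed cycles per ring hop */

    /* Hops between two stops on a bidirectional ring: take the shorter way. */
    static int ring_hops(int from, int to) {
        int d = abs(from - to);
        return d < N_STOPS - d ? d : N_STOPS - d;
    }

    static int l3_access_cycles(int core, int slice) {
        return SLICE_LATENCY + HOP_LATENCY * ring_hops(core, slice);
    }

    int main(void) {
        for (int s = 0; s < N_STOPS; s++)
            printf("core 0 -> slice %d: %d cycles\n", s, l3_access_cycles(0, s));
        return 0;
    }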

What's a good system test for keeping a deadline?

Reading about RTOSes: the defining characteristic of a "hard" RTOS is that it can keep a deadline deterministically. But how do we test or prove that the system actually fulfils the requirements?
The MicroC/OS-II RTOS is characterized as a hard RTOS, but how can I validate that claim? If I have some C code and an ISR for my FPGA that can run C programs and make context switches between threads with semaphores, similar to what an RTOS does, how can I know whether the OS/RTOS is a "hard" or a "soft" RTOS?
Does it depend on the application? Must it have a timer, so that using the built-in hardware timer (e.g. the Altera DE2 has a 50 MHz oscillator) with hardware interrupts is preferred? And do we then just test whether threads and processes can be scheduled according to a deadline, and check whether the deadline was met?
Or is there some general practice for what must be included to make the difference between an operating system, a real-time operating system, and a hard versus soft RTOS?
Is there some "typical test" with a typical requirement for the label "hard RTOS"?
It is hard to answer this question, because your premise is wrong.
A system classified as hard realtime is distinguished from a soft realtime system only by the severity of a missed deadline. In hard RT, a missed deadline is classified as a system failure, which may or may not cause harm to hardware and people, while soft realtime usually means that a missed deadline only degrades system performance but does not bring the system to a grinding halt.
A typical example of a hard RT system would be a watchdog that shuts down a system on overheating: if it fails to meet its deadline, the system breaks. General safety-related systems in power plants or airplanes also fall into this category.
A soft RT example would be video streaming, where a missed deadline causes degraded visual quality or stuttering, but does not necessarily cause a failure of the system.
Long story short: hard and soft RT are characteristics of complete software systems, measured against their specifications and fault models. So typically it is the application running on the operating system that meets the hard/soft RT criteria; the OS merely provides interfaces with predictable timing behaviour that allow the application to make timing assumptions.
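That said, for the empirical (not proof-grade) part of your question, a common check is a cyclictest-style probe: run a periodic task, record how late each wakeup is, and report the worst case, ideally under worst-case load. Below is a minimal Linux sketch assuming clock_nanosleep is available; a real qualification run would add SCHED_FIFO priority, mlockall, and sustained stress load:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <time.h>

    #define PERIOD_NS  1000000L   /* 1 ms period */
    #define ITERATIONS 10000

    static long diff_ns(struct timespec a, struct timespec b) {
        return (b.tv_sec - a.tv_sec) * 1000000000L + (b.tv_nsec - a.tv_nsec);
    }

    int main(void) {
        struct timespec next, now;
        long worst = 0;
        clock_gettime(CLOCK_MONOTONIC, &next);
        for (int i = 0; i < ITERATIONS; i++) {
            next.tv_nsec += PERIOD_NS;
            while (next.tv_nsec >= 1000000000L) {   /* normalize timespec */
                next.tv_nsec -= 1000000000L;
                next.tv_sec++;
            }
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
            clock_gettime(CLOCK_MONOTONIC, &now);
            long late = diff_ns(next, now);   /* wakeup latency this cycle */
            if (late > worst)
                worst = late;
        }
        printf("worst-case observed wakeup latency: %ld ns\n", worst);
        return 0;
    }

Note that such a measurement can only falsify a hard-RT claim (one missed deadline is enough); it can never prove it, which is why hard-RT qualification also relies on worst-case execution time analysis.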
