using a timer for reading run time data

using a timer for reading run time data - timer

I want to use a timer to read the data from a simulink block to the workspace during simulation.
I made a simple mdl model composed of a clock connected to a scope.
Then I wrote this simple code:
t=timer('period', 1, 'taskstoexecute', 10, 'executionmode', 'fixedrate');
t.Timerfcn={#TimeStep};
start(t)
function time = TimeStep (~,~)
load_system('mymodel');
set_param('mymodel','SimulationCommand','start');
block='mymodel/Clock';
rto=get_param(block,'runtimeObject');
time=rto.OutputPort(1).Data;
disp(time);
The problem is that when I run the code for simulation time 10, it shows me "0" in work space and repeat it ten times. I assume that it should show me the time from 1 to 10. I have also modifies the solver to a discrete solver with time step=1.
The other thing I do not understand is that when I put a ramp function instead of the clock and change it to:
block='mymodel/Ramp';'
then I receive an error of "too many inputs".
I would appreciate any help.

You have two things that count time and seem to think that one of them is controlling the time in the other. It isn't.
More specifically, you have
A 'Timer' in MATLAB that you have asked to run a certain piece of code once per second over 10 seconds. (Both times are measured in wall clock time.)
Some MATLAB code that loads a Simulink model (if it isn't already loaded); starts the model (if it isn't already started); and gets the value on the output of a Clock block that is in the model (it does this only once each time the code is executed). As with every Simulink model, it will execute as fast as it can until the simulation end time is reached (or something else stops it).
So, in your case, each time the Timer executes, the simulation is started, the value at the output of the clock is obtained/printed (which since it happens very quickly after the start of the model it prints out that the simulation time is 0); and then (because you have a very simple simulation that takes no time at all to finish) the simulation terminates.
The above happens 10 times, each time printing the value of the clock at the start of the simulation, i.e. 0.
To see other values you need to make your simulation run longer - taking at least 1 second of wall clock time. For instance, if you change the solver to fixed step and put in a very small step size, something like 0.000001, then the simulation will probably take several seconds (of wall clock time) to execute.
Now you should see the Timer print different times as sometimes the model will still be executing when the code is called (1 second of wall clock time later).
But fundamentally you need to understand that the Timer is not controlling, and is not dependent upon, the simulation time, and vice-versa.
(I'm not sure about the issue with using Ramp but suspect it's got to with Clock being a fundamental block while Ramp is a masked subsystem.)

Related

Is it acceptable to measure time elapsed by iteratively blocking a thread for a fixed period and then multiply said period by the loop count?

I work for a company that produces automatic machines, and I help maintain their software that controls the machines. The software runs on a real-time operating system, and consists of multiple threads running concurrently. The code bases are legacy, and have substantial technical debts. Among all the issues that the code bases exhibit, one stands out as being rather bizarre to me; most of the timing algorithms that involve the computation of time elapsed to realize common timed features such as timeouts, delays, recording time spent in a particular state, and etc., basically take the following form:
unsigned int shouldContinue = 1;
unsigned int blockDuration = 1; // Let's say 1 millisecond.
unsigned int loopCount = 0;
unsigned int elapsedTime = 0;
while (shouldContinue)
{
.
. // a bunch of statements, selections and function calls
.
blockingSystemCall(blockDuration);
.
. // a bunch of statements, selections and function calls
.
loopCount++;
elapsedTime = loopCount * blockDuration;
}
The blockingSystemCall function can be any operating system's API that suspends the current thread for the specified blockDuration. The elapsedTime variable is subsequently computed by basically multiplying loopCount by blockDuration or by any equivalent algorithm.
To me, this kind of timing algorithm is wrong, and is not acceptable under most circumstances. All the instructions in the loop, including the condition of the loop, are executed sequentially, and each instruction requires measurable CPU time to execute. Therefore, the actual time elapsed is strictly greater than the value of elapsedTime in any given instance after the loop starts. Consequently, suppose the CPU time required to execute all the statements in the loop, denoted by d, is constant. Then, elapsedTime lags behind the actual time elapsed by loopCount • d for any loopCount > 0; that is, the deviation grows according to an arithmetic progression. This sets the lower bound of the deviation because, in reality, there will be additional delays caused by thread scheduling and time slicing, depending on other factors.
In fact, not too long ago, while testing a new data-driven predictive maintenance feature which relies on the operation time of a machine, we discovered that the operation time reported by the software lagged behind that of a standard reference clock by a whopping three hours after the machine was in continuous operation for just over two days. It was through this test that I discovered the algorithm outlined above, which I swiftly determined to be the root cause.
Coming from a background where I used to implement timing algorithms on bare-metal systems using timer interrupts, which allows the CPU to carry on with the execution of the business logic while the timer process runs in parallel, it was shocking for me to have discovered that the algorithm outlined in the introduction is used in the industry to compute elapsed time, even more so when a typical operating system already encapsulates the timer functions in the form of various easy-to-use public APIs, liberating the programmer from the hassle of configuring a timer via hardware registers, raising events via interrupt service routines, etc.
The kind of timing algorithm as illustrated in the skeleton code above is found in at least two code bases independently developed by two distinct software engineering teams from two subsidiary companies located in two different cities, albeit within the same state. This makes me wonder whether it is how things are normally done in the industry or it is just an isolated case and is not widespread.
So, the question is, is the algorithm shown above common or acceptable in calculating elapsed time, given that the underlying operating system already provides highly optimized time-management system calls that can be used right out of the box to accurately measure elapsed time or even used as basic building blocks for creating higher-level timing facilities that provide more intuitive methods similar to, e.g., the Timer class in C#?

You're right that calculating elapsed time that way is inaccurate -- since it assumes that the blocking call will take exactly the amount of time indicated, and that everything that happens outside of the blocking system call will take no time at all, which would only be true on an infinitely-fast machine. Since actual machines are not infinitely fast, the elapsed-time calculated this way will always be somewhat less than the actual elapsed time.
As to whether that's acceptable, it's going to depend on how much timing accuracy your program needs. If it's just doing a rough estimate to make sure a function doesn't run for "too long", this might be okay. OTOH if it is trying for accuracy (and in particular accuracy over a long period of time), then this approach won't provide that.
FWIW the more common (and more accurate) way to measure elapsed time would be something like this:
const unsigned int startTime = current_clock_time();
while (shouldContinue)
{
loopCount++;
elapsedTime = current_clock_time() - startTime;
}
This has the advantage of not "drifting away" from the accurate value over time, but it does assume that you have a current_clock_time() type of function available, and that it's acceptable to call it within the loop. (If current_clock_time() is very expensive, or doesn't provide some real-time performance guarantees that the calling routine requires, that might be a reason not to do it this way)

I don't think these loops do what you think they do.
In a RTOS, the purpose of a loop like this is usually to perform a task at regular intervals.
blockingSystemCall(N) probably does not just sleep for N milliseconds like you think it does. It probably sleeps until N milliseconds after the last time your thread woke up.
More accurately, all the sleeps your thread has performed since starting are added to the thread start time to get the time at which the OS will try to wake the thread up. If your thread woke up due to an I/O event, then the last one of those times could be used instead of the thread start time. The point is that the inaccuracies in all these start times are corrected, so your thread wakes up at regular intervals and the elapsed time measurement is perfectly accurate according to the RTOS master clock.
There could also be very good reasons for measuring elapsed time by the RTOS master clock instead of a more accurate wall clock time, in addition to simplicity. This is because all of the guarantees that an RTOS provides (which is the reason you are using a RTOS in the first place) are provided in that time scale. The amount of time taken by one task can affect the amount of time you are guaranteed to have available for other tasks, as measured by this clock.
It may or may not be a problem that your RTOS master clock runs slow by 3 hours every 2 days...

Does FreeRTOS guarantee the first timer tick to be exactly 1ms?

I am implementing software watchdogs to ensure that a 1kHz task is executing within its allotted deadline (i.e. 1ms). But I am wondering if there's exactly 1ms between the 1kHz starting and tick 1.
From my understanding, this is what happens when FreeRTOS starts
vPortSetupTimerInterrupt(); // Tick 0 starts
...
prvPortStartFirstTick(); // Context switch
// After the context switch, the 1kHz task starts
Between tick 0 and tick 1, the 1kHz task doesn't get a full 1ms to do useful work because some time was spent on calling vPortSetupTimerInterrupt() to prvPortStartFirstTick(). Is this correct? And if so, is this a cause for concern? Or is the extra delay time so short that it is negligible?
I am developing on ARM Cortex M4 (STM32F302 series).

You are correct in that executing from vPortSetupTimerInterrupt() to prvPortStartFirstTick() (not sure what that function is, not a FreeRTOS supplied one) does take some time - as does executing any instruction. If it is a concern or not depends on your application - but if it is a concern then you are probably not going to every get exactly 1ms to the first tick. Think about doing it the other way around - say the first task starts before the timer is started - then the first task will have to be the one that starts the time - so again you are going to spend some time in the task doing something other than what you want the task to be doing.

Simulink Clock Synchronisation

In simulink if I run any simulation, it follows an internal clock. I want to run these simulations in real time.
Example: if I use a PWM pulse generator and give it a sample time of 1 second, I expect that it will generate a sample at the end of every one second real-time but the simulink clock moves very very fast (every one second real time corresponds to about 1e6 seconds smulink time). Is there any way to synchronize the simulink clock with the real time clock?
I actually need to give input to hardware at the end of every 2 seconds in a loop and that is why this kind of synchronization is needed.

Firstly note that Simulink is not a real-time environment, so anything you do related to this is not guaranteed to be anything but approximate in the timing achieved.
If your model runs faster than real-time then it can be paused at each time step until clock-time and simulation time are (approximately) equal. This is achieved by writing an S-Function.
There are several examples of doing this. For instance here or here.

Determining cause of delay/pause - kernel scheduler etc

System is an embedded Linux/Busybox core on a small embedded board with a web server (Boa) running.
We are seeing some high latency in responses from the web server - sometimes >500ms for no good reason, so I've been digging...
On liberally scattering debug prints throughout the code it seems to come down to the entire process just... stopping for a bit, in a way which I can only assume must be the process/thread being interrupted by another process.
Using print statements and clock_gettime() to calculate time taken to process a request, I can see the code reach the bottom of a while() loop (parsing input), print something like "Time so far: 5ms" and then the next line at the top of the loop will print "Time so far: 350ms" - and all that the code does between the bottom of the loop and the 1st print back at the top is a basic check along the lines of while(position < end), it has nothing complicated that could hold it up.
There's no IO blocking, the data it's parsing has all arrived already, and it's not making any external calls or wandering off into complex functions.
I then looked into whether the kernel scheduler (CFS in our case) might be holding things up, adding calls to clock() (processor time rather than wall-clock) and again calculating time differences Vs processor time used I can see that the wall-clock time delay may run beyond 300ms from one loop to the next, but the reported processor time taken (which seems to have a ~10ms resolution) is more like 50ms.
So, that suggests the task scheduler is holding the process up for hundreds of milliseconds at a time. I've checked the scheduler granularity and max delay and it's nowhere near 100ms, scheduler latency is set at 6ms for example.
Any advice on what I can do now to try and track down the problem - identifying processes which could hog the CPU for >100ms, measuring/tracking what the scheduler is doing, etc.?

First you should try and run your program using strace to see if there are any system calls holding things up.
If that is ambiguous or does not help I would suggest you try and profile the kernel. You could try OProfile
This will create a call graph that you can analyze and see what is happening.

Generic Microcontroller Delay Function

Come someone please tell me how this function works? I'm using it in code and have an idea how it works, but I'm not 100% sure exactly. I understand the concept of an input variable N incrementing down, but how the heck does it work? Also, if I am using it repeatedly in my main() for different delays (different iputs for N), then do I have to "zero" the function if I used it somewhere else?
Reference: MILLISEC is a constant defined by Fcy/10000, or system clock/10000.
Thanks in advance.
// DelayNmSec() gives a 1mS to 65.5 Seconds delay
/* Note that FCY is used in the computation. Please make the necessary
Changes(PLLx4 or PLLx8 etc) to compute the right FCY as in the define
statement above. */
void DelayNmSec(unsigned int N)
{
unsigned int j;
while(N--)
for(j=0;j < MILLISEC;j++);
}

This is referred to as busy waiting, a concept that just burns some CPU cycles thus "waiting" by keeping the CPU "busy" doing empty loops. You don't need to reset the function, it will do the same if called repeatedly.
If you call it with N=3, it will repeat the while loop 3 times, every time counting with j from 0 to MILLISEC, which is supposedly a constant that depends on the CPU clock.

The original author of the code have timed and looked at the assembler generated to get the exact number of instructions executed per Millisecond, and have configured a constant MILLISEC to match that for the for loop as a busy-wait.
The input parameter N is then simply the number of milliseconds the caller want to wait and the number of times the for-loop is executed.
The code will break if
used on a different or faster micro controller (depending on how Fcy is maintained), or
the optimization level on the C compiler is changed, or
c-compiler version is changed (as it may generate different code)
so, if the guy who wrote it is clever, there may be a calibration program which defines and configures the MILLISEC constant.

This is what is known as a busy wait in which the time taken for a particular computation is used as a counter to cause a delay.
This approach does have problems in that on different processors with different speeds, the computation needs to be adjusted. Old games used this approach and I remember a simulation using this busy wait approach that targeted an old 8086 type of processor to cause an animation to move smoothly. When the game was used on a Pentium processor PC, instead of the rocket majestically rising up the screen over several seconds, the entire animation flashed before your eyes so fast that it was difficult to see what the animation was.
This sort of busy wait means that in the thread running, the thread is sitting in a computation loop counting down for the number of milliseconds. The result is that the thread does not do anything else other than counting down.
If the operating system is not a preemptive multi-tasking OS, then nothing else will run until the count down completes which may cause problems in other threads and tasks.
If the operating system is preemptive multi-tasking the resulting delays will have a variability as control is switched to some other thread for some period of time before switching back.
This approach is normally used for small pieces of software on dedicated processors where a computation has a known amount of time and where having the processor dedicated to the countdown does not impact other parts of the software. An example might be a small sensor that performs a reading to collect a data sample then does this kind of busy loop before doing the next read to collect the next data sample.