CC3200 RTOS MultiThreading - c

I'm new to the RTOS method of creating tasks. Previously, I was using pthreads on the Raspberry Pi, which let me run two tasks concurrently:
1) send data through sockets every 2 seconds
2) receive data through sockets whenever a client sends it
I'd like to do the same thing, but the CC3200 runs an RTOS, and I read that you can only pause one task and run another, whereas I need both running at the same time.
I tried to do this:
osi_TaskCreate( WlanAPMode, (const signed char *)"wireless LAN in AP mode",
                OSI_STACK_SIZE, NULL, 1, NULL );
osi_TaskCreate( SendAnalogInputToClient, (const signed char *)"Analog Input to Client",
                OSI_STACK_SIZE, NULL, 1, NULL );
osi_start();
But it seems that my 2nd task isn't running. Does anyone have experience with this?

I think what you're misunderstanding is the general concept of concurrency and execution of tasks/threads.
Both the Raspberry Pi and the CC3200 have a single-core processor - that is, their processing units can execute only one instruction at a time. Unlike modern computers, which may have several cores and can sometimes run twice as many threads by using hyperthreading, a single-core processor cannot execute more than one instruction at any given time by design.
What you did on the Raspberry Pi was simply running multiple "threads" that the OS (Raspbian, I assume) executed "concurrently". I put those in quotation marks because they were not truly parallel threads and the concurrency wasn't real. You just had an impression of concurrency: both programs shared processor time, but the core executed only one of them at a time. It is the OS's job to switch between the two "threads" and create that impression of concurrency. This is the so-called context switch, where the processing unit switches to another task and loads that task's context into its registers.
The same is happening in the case of the CC3200 and the SYS/BIOS TI-RTOS. Only one task is executed at any given moment. It's the programmer's (or architect's) job to design the system so that every task gets as much processing time as it needs to execute properly.
Your code isn’t really helpful here, as you’re starting the tasks correctly. It’s the task source codes that are the problem. I assume your first task never sleeps/delays/blocks and it consumes 100% of the processing time. That’s why your second task never gets a chance to run.
This is a good place to start: http://processors.wiki.ti.com/index.php/SYS/BIOS_Online_Training

Related

Effective software scheduling

For example, in code like the one below,
while (1) {
    task1();
    task2();
}
there should be cooperation between task1() and task2(), which are executed in round-robin fashion. However, if task1() is implemented as follows:
task1() {
    while (1);
}
Is there a way to build a scheduler that avoids monopolization of resources by task1() by relying only on software (for example, switching tasks every 500 ms)?
Assume that only plain C/assembly is available, with no external scheduler/OS to rely on.
Is there a way to build a scheduler that avoids monopolization of resources by task1() by relying only on software (for example, switching tasks every 500 ms)?
Yes it's possible; but it probably isn't possible in plain C because (at a minimum) you'd need to switch between different stacks during task switches.
However, you should know that just switching tasks every 500 ms is very "not effective". Specifically; when one task has to wait for anything (time delay, data received from network, user input, data to be fetched from disk, a mutex, ...) you want to keep the CPU busy by switching to a different task (if there are any other tasks).
To do that; you either need fully asynchronous interfaces for everything (which C does not have), or you need to control all of the code (e.g. write an OS).
Of course the majority of task switches are caused by "task has to wait for something" or "something task was waiting for occurred"; and switching tasks every 500 ms is relatively irrelevant (it only matters for rare tasks that don't do any IO), and even when it is relevant it's a bad idea (in a "10 half finished jobs vs. 5 finished jobs and 5 unstarted jobs" way).
One easy way is to use:
a hardware timer,
a queue of tasks to run,
a scheduler,
a dispatcher, and
a pre-allocated stack for each task.
The timer interrupt handler triggers the scheduler. The scheduler determines which task is to run next and triggers the dispatcher.
The dispatcher performs a 'context' switch between the previously running task and the next task to run: it places the previously running task back into the queue, restores the 'context' of the next task, and then transfers execution control to that task.
The queue is composed of control blocks. Each control block contains a copy of all the registers and the address of the entry point for the task.
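A rough sketch of those pieces in C, assuming a generic bare-metal target. All names (tcb_t, scheduler_tick(), dispatch()) are invented for illustration, and the actual register save/restore cannot be written in portable C, so it is only declared here:
/* Hypothetical task control block: one per task, kept in a circular queue. */
typedef struct tcb {
    void        *sp;           /* saved stack pointer; the register context lives on the task's pre-allocated stack */
    void       (*entry)(void); /* task entry point */
    struct tcb  *next;         /* next control block in the run queue */
} tcb_t;

static tcb_t *current;         /* control block of the task now running */

/* CPU-specific dispatcher, normally a few lines of assembly:
 * saves prev's registers onto its stack, restores next's registers,
 * and resumes execution of next. */
extern void dispatch(tcb_t *prev, tcb_t *next);

/* Called from the hardware timer ISR (e.g. every 500 ms).
 * Plays the role of the scheduler: pick the next task round-robin
 * and hand over to the dispatcher. */
void scheduler_tick(void)
{
    tcb_t *prev = current;
    current = current->next;   /* round-robin: next control block in the queue */
    dispatch(prev, current);   /* prev goes "back into the queue" by staying linked in it */
}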

Why are there many schedule() calls in different places?

I am tracing Linux 0.11
https://mirrors.edge.kernel.org/pub/linux/kernel/Historic/old-versions/
I see there are many schedule() calls in different places, not just the one inside do_timer().
A few questions here:
Will do_timer() (in sched.c) be called every time the timer times out? Is this timer based on an x86 interrupt?
Since there are many schedule() calls outside of do_timer(), can I say that this is a kind of preemption? Or what is their purpose?
Any operation that blocks calls schedule() to yield control.
Some tasks' state has changed and needs to be updated, which schedule() takes care of.
Some tasks are still working and have a lot of work left to do; schedule() is called to balance CPU time between them.
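As an illustration of the first point, a blocking primitive in the spirit of Linux 0.11's sleep_on() looks roughly like this (a simplified sketch from memory, not the exact source; it relies on kernel definitions such as struct task_struct, current and schedule()):
void sleep_on(struct task_struct **p)
{
    struct task_struct *tmp;

    tmp = *p;
    *p = current;                           /* put the current task at the head of the wait list */
    current->state = TASK_UNINTERRUPTIBLE;  /* mark it as blocked */
    schedule();                             /* yield the CPU to another task */
    /* execution resumes here only after someone wakes this task up */
    *p = tmp;
    if (tmp)
        tmp->state = TASK_RUNNING;          /* wake whoever was waiting before us */
}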
Since there are many schedule() calls outside of do_timer(), can I say that this is a kind of preemption? Or what is their purpose?
For a real OS; most task switches occur because a task blocks waiting for something (user input, network packet, disk IO, ..) or a task unblocks because something it was waiting for happened (and the unblocked task has higher priority and preempts the currently running lower priority task).
The whole "task switch caused by timer IRQ" thing is mostly just a fallback to guard against malicious CPU hogs (denial of service attacks); and for normal software under normal conditions you could disable it (delete the schedule() from the timer IRQ handler) and nobody would notice or care. Note: Some people will say it's also for "non-malicious" CPU bound tasks, but CPU bound tasks are relatively rare, and (ignoring the fact that the Linux scheduler has never been good for task priorities) for CPU bound tasks it's better to rely on an effective system of task priorities (e.g. give the CPU bound tasks a low priority so that almost everything will preempt them).
Also note that various courses on OS theory start with "so simple it never actually happens in practice" concepts, which is almost always a pure round-robin scheduler with tasks that never block (often with "Hey, we can accurately predict the future and know exactly how long each task will run for" nonsense), which is mostly fine as a first step (in a "learn to walk before you run" way) but sucks big salty dog balls if it's not followed by more realistic and more complex concepts (better scheduling algorithms, task priorities, multiple simultaneous scheduling algorithms/"scheduler policies", multi-CPU, interactive/latency sensitive tasks, ..) because it leaves the student/victim with little more than misinformation (e.g. the ever re-occurring "all tasks switches are caused by timer IRQ" misconception).
Will do_timer() (in sched.c) be called every time the timer times out? Is this timer based on an x86 interrupt?
I'm guessing that the timer was the raw PIT chip's IRQ (given that Linux version 0.11 was "absolute beginner developer with no intention of making it portable" historical memorabilia from before thousands of volunteers fixed half of the worst parts).
Also don't forget that the scheduler uses time for two different things - the "current task has used too much CPU time" thing that almost never matters, and figuring out when tasks that are blocked/sleeping (e.g. because they called sleep()) should unblock/wake up. The do_timer() might be for either of these things and might be for both (I don't know without looking at it).
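For reference, the tick handler's "used too much CPU time" path in Linux 0.11 looks roughly like this (again a simplified sketch, with timekeeping and alarm handling omitted): it decrements the running task's time slice and calls schedule() only when the slice is exhausted and the interrupted code was running in user mode.
void do_timer(long cpl)                /* cpl: privilege level of the interrupted code */
{
    /* ... timekeeping, alarms, etc. omitted ... */

    if ((--current->counter) > 0)      /* current task still has time slice left */
        return;
    current->counter = 0;
    if (!cpl)                          /* interrupted kernel-mode code: don't switch */
        return;
    schedule();                        /* time slice used up: pick another task */
}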

Determining cause of delay/pause - kernel scheduler etc

System is an embedded Linux/Busybox core on a small embedded board with a web server (Boa) running.
We are seeing some high latency in responses from the web server - sometimes >500ms for no good reason, so I've been digging...
After liberally scattering debug prints throughout the code, it seems to come down to the entire process just... stopping for a bit, in a way that I can only assume means the process/thread is being interrupted by another process.
Using print statements and clock_gettime() to calculate time taken to process a request, I can see the code reach the bottom of a while() loop (parsing input), print something like "Time so far: 5ms" and then the next line at the top of the loop will print "Time so far: 350ms" - and all that the code does between the bottom of the loop and the 1st print back at the top is a basic check along the lines of while(position < end), it has nothing complicated that could hold it up.
There's no IO blocking, the data it's parsing has all arrived already, and it's not making any external calls or wandering off into complex functions.
I then looked into whether the kernel scheduler (CFS in our case) might be holding things up. Adding calls to clock() (processor time rather than wall-clock) and again calculating time differences vs. processor time used, I can see that the wall-clock delay from one loop iteration to the next may exceed 300ms, while the reported processor time used (which seems to have ~10ms resolution) is more like 50ms.
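That kind of measurement might look like the following sketch, assuming clock_gettime(CLOCK_MONOTONIC) for wall-clock time and clock() for consumed CPU time; parse_one_chunk() is a placeholder for the real loop body:
#include <stdio.h>
#include <time.h>

/* Wall-clock time in milliseconds (monotonic, unaffected by time-of-day changes). */
static double wall_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000.0 + ts.tv_nsec / 1e6;
}

/* CPU time consumed by this process, in milliseconds. */
static double cpu_ms(void)
{
    return (double)clock() * 1000.0 / CLOCKS_PER_SEC;
}

int main(void)
{
    for (int i = 0; i < 10; i++) {
        double w0 = wall_ms(), c0 = cpu_ms();

        /* parse_one_chunk();   placeholder for one iteration of the parsing loop */

        /* A large wall-clock delta with a small CPU-time delta means the
         * process spent most of that interval off the CPU (preempted or blocked). */
        printf("wall %.3f ms, cpu %.3f ms\n", wall_ms() - w0, cpu_ms() - c0);
    }
    return 0;
}
Note that clock() typically has coarse (~10ms) resolution; if available, clock_gettime(CLOCK_PROCESS_CPUTIME_ID, ...) gives a much finer CPU-time reading.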
So, that suggests the task scheduler is holding the process up for hundreds of milliseconds at a time. I've checked the scheduler granularity and maximum delay, and they're nowhere near 100ms; scheduler latency is set at 6ms, for example.
Any advice on what I can do now to try and track down the problem - identifying processes which could hog the CPU for >100ms, measuring/tracking what the scheduler is doing, etc.?
First, try running your program under strace to see if there are any system calls holding things up.
If that is ambiguous or does not help, I would suggest profiling the kernel. You could try OProfile.
This will create a call graph that you can analyze to see what is happening.

How do I decide between taskSpawn(), period(), and watchdogs?

We are using embedded C for the VxWorks real time operating system.
Currently, all of our UDP connections are started with taskSpawn().
This routine creates and activates a new task with a specified
priority and options and returns a system-assigned ID.
We specify the task size, a priority, and pass in an entry point.
These are continuous connections, and thus every entry point contains an infinite loop where we delay before the next iteration.
Then I discovered period().
period spawns a task to call a function periodically.
period() sounds like what we should be using instead, but I can't find any information on when you would prefer this function over taskSpawn(). period() also doesn't allow specifying the task size or the priority, so how are those decided? Is the task size dynamic? What will the priority be?
There are also watchdogs.
Any task may create a watchdog timer and use it to run a specified
routine in the context of the system-clock ISR, after a specified
delay.
Again, this seems to be in line with the goal of processing data at a particular rate. Which do I choose when a task must continuously execute code at the same rate (i.e. in real time)?
What are the differences between these 3 methods?
Here is a little clarification:
taskSpawn(..) creates a task that you're free to do anything you like with.
Watchdogs should only be used to monitor time constraints. Remember that the watchdog's callback is executed within the context of the system clock ISR, which has many limitations (e.g. limited free stack size, never use blocking function calls in an ISR, ...). Additionally, executing "a lot of code" in the system clock ISR slows down your entire system.
period(..) is intended to be a helper for the VxWorks shell and not to be used by a program.
With that being said, your only option is to use taskSpawn(..), unless you're doing something very simple, in which case period(..) might be OK to use.
If you need to do things cyclically within a specific time frame, you might look at timers, or taskDelay(..) in combination with sysClkRateSet(..).
Another option is to create two tasks: one that gives a semaphore at a specific time interval, and a "worker" task that waits for this semaphore before doing its work. With that approach you separate "timing" from "action", which has proved beneficial in my experience. You might also want to monitor the execution time of the "worker" task using a watchdog.
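A sketch of that two-task pattern, assuming the standard taskSpawn()/semBCreate()/taskDelay() calls; task names, priorities, and stack sizes are arbitrary examples:
#include <vxWorks.h>
#include <taskLib.h>
#include <semLib.h>
#include <sysLib.h>

static SEM_ID tickSem;

/* "Timing" task: gives the semaphore once per second. */
static void timingTask(void)
{
    for (;;)
    {
        taskDelay(sysClkRateGet());   /* delay for one second's worth of clock ticks */
        semGive(tickSem);
    }
}

/* "Worker" task: waits for the semaphore, then does the actual work. */
static void workerTask(void)
{
    for (;;)
    {
        semTake(tickSem, WAIT_FOREVER);
        /* doPeriodicWork();   placeholder for the real processing */
    }
}

void startTasks(void)
{
    tickSem = semBCreate(SEM_Q_FIFO, SEM_EMPTY);

    taskSpawn("tTiming", 100, 0, 4096, (FUNCPTR)timingTask,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
    taskSpawn("tWorker", 101, 0, 4096, (FUNCPTR)workerTask,
              0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
}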

average time between kernel launch and execution?

If I understand correctly, when you launch a CUDA kernel asynchronously, it may begin execution immediately or it may wait for previous asynchronous calls (transfers, kernels, etc) to complete first. (I also understand that kernels can run concurrently in some cases, but I want to ignore that for now).
How can I find out the time between launching a kernel ("queuing") and when it actually begins execution? In fact, I really just want to know the average "queued time" over all launches in a single run of my program (generally tens or hundreds of thousands of kernel launches).
I can easily calculate the average execution time per kernel with events (~500us). I tried to simulate it - I recorded the result of clock() every time a kernel was launched, with the idea that I could then determine how long the launch queue was when each kernel was launched. But clock() does not have high enough precision (0.01s) - sometimes as many as 60 kernels appear to be launched at a single time, when of course in reality many are not.
Rather than clock(), use QueryPerformanceCounter(), which counts based on machine clock cycles (a usage sketch is shown below).
Secondly, the profiling tool (Visual Profiler) only measures serial launches [see page 24] and [see post number 3].
Thus the best option is to (1) use QueryPerformanceCounter (or the Visual Profiler) to get an accurate measurement of a single launch, and (2) use QueryPerformanceCounter to time multiple launches and observe whether the timing results suggest that asynchronous launching took place.
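A minimal sketch of that host-side timestamping, assuming a Windows host; launch_kernel_async() is a stand-in for the real asynchronous kernel launch, not a real API:
#include <windows.h>
#include <stdio.h>

/* Placeholder: in the real program this would queue an asynchronous kernel launch. */
static void launch_kernel_async(void)
{
}

int main(void)
{
    LARGE_INTEGER freq, t0, t1;
    QueryPerformanceFrequency(&freq);    /* counter ticks per second */

    QueryPerformanceCounter(&t0);
    launch_kernel_async();               /* returns as soon as the launch is queued */
    QueryPerformanceCounter(&t1);

    double launch_us = (double)(t1.QuadPart - t0.QuadPart) * 1e6 / freq.QuadPart;
    printf("launch call took %.2f us\n", launch_us);

    /* Repeating this for every launch and comparing the launch timestamps against
     * the ~500us average execution time (measured with events) shows whether
     * launches are queuing up faster than they execute. */
    return 0;
}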
