STM32 - Read I/O from multiple Tasks (C)

I'm using FreeRTOS on my STM32F4-based board, and I have read about communication between tasks with queues and semaphores; that was easy to understand and apply.
But I can't find anything in the documentation about whether it is safe to call the same function from different tasks, for example:
void DefaultTask(void const * argument)
{
    uint8_t pin = 10;
    uint16_t analog = ADC_GetAnalog(pin);
    uint32_t encoder = Encoder_GetCount(1);
}

void SecondTask(void const * argument)
{
    uint8_t pin = 14;
    uint16_t analog = ADC_GetAnalog(pin);
    uint32_t encoder = Encoder_GetCount(2);
}
The ADC_GetAnalog() function:
uint16_t ADC_GetAnalog(uint8_t PinNumber)
{
    if((PinNumber >= 1) && (PinNumber <= 18))
    {
        return ADC_Pin[PinNumber].AnalogValue;
    }
    else
        return 0;
}
I also have multiple encoders in my system (interrupts that increment/decrement the CNT register of htim#), and I call the read function in the same way as the ADC one, also from different tasks:
uint32_t Encoder_GetCount(uint8_t encoder_num)
{
    volatile __IO uint32_t count = 0;
    switch(encoder_num)
    {
        case 1:
            count = htim1.Instance->CNT;
            break;
        case 2:
            count = htim3.Instance->CNT;
            break;
        case 3:
            count = htim5.Instance->CNT;
            break;
        default:
            break;
    }
    return (uint32_t)count;
}
This is how I do it today, but I would like to know whether it's the best (safest) way.

From what you have shown, the functions that can be called simultaneously only read data, they don't write it, so you are good to go. Even if they did write, local variables are fine (each task gets its own copy on its own stack).
You only need to care about synchronization when you write to global variables, or to certain peripherals (e.g. a serial flash chip: you don't want two tasks using it at the same time). One way to deal with that is semaphores/mutexes; preferably (if possible) give only one task access to that peripheral. A clean design is key to a maintainable system.
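If you do end up sharing a writable peripheral between tasks, the usual FreeRTOS pattern is to wrap the access in a mutex. The following is only a minimal sketch of that idea; Flash_ReadByte() and Flash_ReadByteSafe() are hypothetical names, not something from your code:
#include <stdint.h>
#include "FreeRTOS.h"
#include "semphr.h"

uint8_t Flash_ReadByte(uint32_t address); // hypothetical low-level driver call

static SemaphoreHandle_t flashMutex;

void Flash_InitLock(void)
{
    flashMutex = xSemaphoreCreateMutex(); // create once, before the tasks start running
}

uint8_t Flash_ReadByteSafe(uint32_t address)
{
    uint8_t value = 0;
    if (xSemaphoreTake(flashMutex, portMAX_DELAY) == pdTRUE) // block until the peripheral is free
    {
        value = Flash_ReadByte(address); // only one task talks to the flash chip at a time
        xSemaphoreGive(flashMutex);
    }
    return value;
}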

It depends on what the function does.
Your functions do not change any global variables, so it should be safe to call them from different tasks.
For example, if you had a function that writes to a global variable, e.g. a buffer, a second call would overwrite the changes made by the first call. If the buffer is used to send data, both tasks could (depending on timing) send the same bytes.
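To make that concrete, here is a hedged sketch of the kind of function that would need protection, with one possible fix using a FreeRTOS critical section (the buffer and function names are made up for illustration):
#include <stdint.h>
#include <string.h>
#include "FreeRTOS.h"
#include "task.h"

static uint8_t txBuffer[64]; // shared global buffer -- this is what makes the function unsafe
static size_t  txLength;

void Comms_QueueMessage(const uint8_t *data, size_t len) // called from several tasks
{
    if (len > sizeof(txBuffer))
    {
        len = sizeof(txBuffer);
    }
    taskENTER_CRITICAL();          // or take a mutex, as in the sketch above
    memcpy(txBuffer, data, len);   // without protection, two tasks could interleave here
    txLength = len;
    taskEXIT_CRITICAL();
}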

Related

Best way to read from a sensor that doesn't have interrupt pin and requires some time before the measurement is ready

I'm trying to interface a pressure sensor (MS5803-14BA) with my board (NUCLEO-STM32L073RZ).
According to the datasheet (page 3), the pressure sensor requires some milliseconds before the measurement is ready to be read. For my project, I would be interested in the highest resolution that requires around 10 ms for the conversion of the raw data.
Unfortunately, this pressure sensor doesn't have an interrupt pin that could be used to see when the measurement is ready, so I temporarily solved the problem by putting a delay after the request for new data.
I don't like my current solution, since in those 10 ms I could have the MCU working on something else (I have several other sensors attached to my board), but without an interrupt pin I'm not sure what the best way to solve this is.
Another solution came into my mind: Using a timer that triggers every say 20 ms and performs the following operations:
1.a Read the current value stored in the registers (discarding the first value)
1.b Ask for a new value
In this way, at the next iteration I would just need to read the value requested at the end of the previous iteration.
What I don't like is that my measurement would always be 20 ms old. As long as the delay stays at 20 ms it should still be fine, but if I need to reduce the rate, the "age" of the reading with my solution would increase.
Do you have any other idea about how to deal with this?
Thank you.
Note: Please let me know if you would need to see my current implementation.
How to do high-resolution, timestamp-based, non-blocking, single-threaded cooperative multi-tasking
This isn't a "how to read a sensor" problem, this is a "how to do non-blocking cooperative multi-tasking" problem. Assuming you are running bare-metal (no operating system, such as FreeRTOS), you have two good options.
First, the datasheet shows you need to wait up to 9.04 ms, or 9040 us.
Now, here are your cooperative multi-tasking options:
1. Send a command to tell the device to do an ADC conversion (ie: to take an analog measurement), then configure a hardware timer to interrupt you exactly 9040 us later. In your ISR you then can either set a flag to tell your main loop to send a read command to read the result, OR you can just send the read command right inside the ISR.
2. Use non-blocking time-stamp-based cooperative multi-tasking in your main loop. This will likely require a basic state machine. Send the conversion command, then move on, doing other things. When your time stamp indicates it's been long enough, send the read command to read the converted result from the sensor.
Number 1 above is my preferred approach for time-critical tasks. This isn't time-critical, however, and a little jitter won't make any difference, so Number 2 above is my preferred approach for general, bare-metal cooperative multi-tasking, so let's do that.
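Before that, for completeness, here is roughly what Option 1 could look like. This is only a sketch: startConversion(), readConvertedResult(), startOneShotTimer_us(), and the ISR name are placeholders, not any specific vendor API:
#include <stdbool.h>
#include <stdint.h>

void startConversion(void);             // placeholder: tell the sensor to start converting
uint32_t readConvertedResult(void);     // placeholder: read the finished conversion
void startOneShotTimer_us(uint32_t us); // placeholder: arm a one-shot hardware timer

static volatile bool conversionReady = false;

static void sensor_requestSample(void)
{
    startConversion();
    startOneShotTimer_us(9040);         // interrupt us once the max conversion time has elapsed
}

void TIMER_IRQHandler(void)             // fires 9040 us after the request
{
    conversionReady = true;             // keep the ISR short: just set a flag for the main loop
}

int main(void)
{
    sensor_requestSample();
    while (1)
    {
        if (conversionReady)
        {
            conversionReady = false;
            uint32_t sensorVal = readConvertedResult();
            (void)sensorVal;            // use the value here
            sensor_requestSample();     // immediately start the next conversion
        }
        // other cooperative tasks run here in the meantime
    }
}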
Here's a sample program to demonstrate the principle of time-stamp-based bare-metal cooperative multi-tasking for your specific case where you need to:
request a data sample (start ADC conversion in your external sensor)
wait 9040 us for the conversion to complete
read in the data sample from your external sensor (now that the ADC conversion is complete)
Code:
#include <stdbool.h>
#include <stdint.h>

typedef enum sensorState_t
{
SENSOR_START_CONVERSION,
SENSOR_WAIT,
SENSOR_GET_CONVERSION
} sensorState_t;
int main(void)
{
doSetupStuff();
configureHardwareTimer(); // required for getMicros() to work
while (1)
{
//
// COOPERATIVE TASK #1
// Read the under-water pressure sensor as fast as permitted by the datasheet
//
static sensorState_t sensorState = SENSOR_START_CONVERSION; // initialize state machine
static uint32_t task1_tStart; // us; start time
static uint32_t sensorVal; // the sensor value you are trying to obtain
static bool newSensorVal = false; // set to true whenever a new value arrives
switch (sensorState)
{
case SENSOR_START_CONVERSION:
{
startConversion(); // send command to sensor to start ADC conversion
task1_tStart = getMicros(); // get a microsecond time stamp
sensorState = SENSOR_WAIT; // next state
break;
}
case SENSOR_WAIT:
{
const uint32_t DESIRED_WAIT_TIME = 9040; // us
uint32_t tNow = getMicros();
if (tNow - task1_tStart >= DESIRED_WAIT_TIME)
{
sensorState = SENSOR_GET_CONVERSION; // next state
}
break;
}
case SENSOR_GET_CONVERSION:
{
sensorVal = readConvertedResult(); // send command to read value from the sensor
newSensorVal = true;
sensorState = SENSOR_START_CONVERSION; // next state
break;
}
}
//
// COOPERATIVE TASK #2
// use the under-water pressure sensor data right when it comes in (this will be an event-based task
// whose running frequency depends on the rate of new data coming in, for example)
//
if (newSensorVal == true)
{
newSensorVal = false; // reset this flag
// use the sensorVal data here now for whatever you need it for
}
//
// COOPERATIVE TASK #3
//
//
// COOPERATIVE TASK #4
//
// etc etc
} // end of while (1)
} // end of main
For another really simple timestamp-based multi-tasking example see Arduino's "Blink Without Delay" example here.
General time-stamp-based bare-metal cooperative multi-tasking architecture notes:
Depending on how you do it all, in the end, you basically end up with this type of code layout, which simply runs each task at fixed time intervals. Each task should be non-blocking to ensure it does not conflict with the run intervals of the other tasks. Non-blocking on bare metal means simply "do not use clock-wasting delays, busy-loops, or other types of polling, repeating, counting, or busy delays!". (This is opposed to "blocking" on an operating-system-based (OS-based) system, which means "giving the clock back to the scheduler to let it run another thread while this task 'sleeps'." Remember: bare metal means no operating system!). Instead, if something isn't quite ready to run yet, simply save your state via a state machine, exit this task's code (this is the "cooperative" part, as your task must voluntarily give up the processor by returning), and let another task run!
Here's the basic architecture, showing a simple timestamp-based way to get 3 Tasks to run at independent, fixed frequencies withOUT relying on any interrupts, and with minimal jitter, due to the thorough and methodical approach I take to check the timestamps and update the start time at each run time.
1st, the definition for the main() function and main loop:
int main(void)
{
doSetupStuff();
configureHardwareTimer();
while (1)
{
doTask1();
doTask2();
doTask3();
}
}
2nd, the definitions for the doTask() functions:
// Task 1: Let's run this one at 100 Hz (every 10ms)
void doTask1(void)
{
const uint32_t DT_DESIRED_US = 10000; // 10000us = 10ms, or 100Hz run freq
static uint32_t t_start_us = getMicros();
uint32_t t_now_us = getMicros();
uint32_t dt_us = t_now_us - t_start_us;
// See if it's time to run this Task
if (dt_us >= DT_DESIRED_US)
{
// 1. Add DT_DESIRED_US to t_start_us rather than setting t_start_us to t_now_us (which many
// people do) in order to ***avoid introducing artificial jitter into the timing!***
t_start_us += DT_DESIRED_US;
// 2. Handle edge case where it's already time to run again because just completing one of the main
// "scheduler" loops in the main() function takes longer than DT_DESIRED_US; in other words, here
// we are seeing that t_start_us is lagging too far behind (more than one DT_DESIRED_US time width
// from t_now_us), so we are "fast-forwarding" t_start_us up to the point where it is exactly
// 1 DT_DESIRED_US time width back now, thereby causing this task to instantly run again the
// next time it is called (trying as hard as we can to run at the specified frequency) while
// at the same time protecting t_start_us from lagging farther and farther behind, as that would
// eventually cause buggy and incorrect behavior when the (unsigned) timestamps start to roll over
// back to zero.
dt_us = t_now_us - t_start_us; // calculate new time delta with newly-updated t_start_us
if (dt_us >= DT_DESIRED_US)
{
t_start_us = t_now_us - DT_DESIRED_US;
}
// PERFORM THIS TASK'S OPERATIONS HERE!
}
}
// Task 2: Let's run this one at 1000 Hz (every 1ms)
void doTask2(void)
{
const uint32_t DT_DESIRED_US = 1000; // 1000us = 1ms, or 1000Hz run freq
static uint32_t t_start_us = getMicros();
uint32_t t_now_us = getMicros();
uint32_t dt_us = t_now_us - t_start_us;
// See if it's time to run this Task
if (dt_us >= DT_DESIRED_US)
{
t_start_us += DT_DESIRED_US;
dt_us = t_now_us - t_start_us; // calculate new time delta with newly-updated t_start_us
if (dt_us >= DT_DESIRED_US)
{
t_start_us = t_now_us - DT_DESIRED_US;
}
// PERFORM THIS TASK'S OPERATIONS HERE!
}
}
// Task 3: Let's run this one at 10 Hz (every 100ms)
void doTask3(void)
{
const uint32_t DT_DESIRED_US = 100000; // 100000us = 100ms, or 10Hz run freq
static uint32_t t_start_us = getMicros();
uint32_t t_now_us = getMicros();
uint32_t dt_us = t_now_us - t_start_us;
// See if it's time to run this Task
if (dt_us >= DT_DESIRED_US)
{
t_start_us += DT_DESIRED_US;
dt_us = t_now_us - t_start_us; // calculate new time delta with newly-updated t_start_us
if (dt_us >= DT_DESIRED_US)
{
t_start_us = t_now_us - DT_DESIRED_US;
}
// PERFORM THIS TASK'S OPERATIONS HERE!
}
}
The code above works perfectly, but as you can see it is pretty redundant and a bit irritating when setting up new tasks. This job can be automated and made much, much easier by simply defining a macro, CREATE_TASK_TIMER(), as follows, to do all of the redundant timing work and timestamp variable creation for us:
/// \brief A function-like macro to get a certain set of events to run at a desired, fixed
/// interval period or frequency.
/// \details This is a timestamp-based time polling technique frequently used in bare-metal
/// programming as a basic means of achieving cooperative multi-tasking. Note
/// that getting the timing details right is difficult, hence one reason this macro
/// is so useful. The other reason is that this macro significantly reduces the number of
/// lines of code you need to write to introduce a new timestamp-based cooperative
/// task. The technique used herein achieves a perfect desired period (or freq)
/// on average, as it centers the jitter inherent in any polling technique around
/// the desired time delta set-point, rather than always lagging as many other
/// approaches do.
///
/// USAGE EX:
/// ```
/// // Create a task timer to run at 500 Hz (every 2000 us, or 2 ms; 1/0.002 sec = 500 Hz)
/// const uint32_t PERIOD_US = 2000; // 2000 us pd --> 500 Hz freq
/// bool time_to_run;
/// CREATE_TASK_TIMER(PERIOD_US, time_to_run);
/// if (time_to_run)
/// {
/// run_task_2();
/// }
/// ```
///
/// Source: Gabriel Staples
/// https://stackoverflow.com/questions/50028821/best-way-to-read-from-a-sensors-that-doesnt-have-interrupt-pin-and-require-some/50032992#50032992
/// \param[in] dt_desired_us The desired delta time period, in microseconds; note: pd = 1/freq;
/// the type must be `uint32_t`
/// \param[out] time_to_run A `bool` whose scope will enter *into* the brace-based scope block
/// below; used as an *output* flag to the caller: this variable will
/// be set to true if it is time to run your code, according to the
/// timestamps, and will be set to false otherwise
/// \return NA--this is not a true function
#define CREATE_TASK_TIMER(dt_desired_us, time_to_run) \
{ /* Use scoping braces to allow multiple calls of this macro all in one outer scope while */ \
/* allowing each variable created below to be treated as unique to its own scope */ \
time_to_run = false; \
\
/* set the desired run pd / freq */ \
const uint32_t DT_DESIRED_US = dt_desired_us; \
static uint32_t t_start_us = getMicros(); \
uint32_t t_now_us = getMicros(); \
uint32_t dt_us = t_now_us - t_start_us; \
\
/* See if it's time to run this Task */ \
if (dt_us >= DT_DESIRED_US) \
{ \
/* 1. Add DT_DESIRED_US to t_start_us rather than setting t_start_us to t_now_us (which many */ \
/* people do) in order to ***avoid introducing artificial jitter into the timing!*** */ \
t_start_us += DT_DESIRED_US; \
/* 2. Handle edge case where it's already time to run again because just completing one of the main */ \
/* "scheduler" loops in the main() function takes longer than DT_DESIRED_US; in other words, here */ \
/* we are seeing that t_start_us is lagging too far behind (more than one DT_DESIRED_US time width */ \
/* from t_now_us), so we are "fast-forwarding" t_start_us up to the point where it is exactly */ \
/* 1 DT_DESIRED_US time width back now, thereby causing this task to instantly run again the */ \
/* next time it is called (trying as hard as we can to run at the specified frequency) while */ \
/* at the same time protecting t_start_us from lagging farther and farther behind, as that would */ \
/* eventually cause buggy and incorrect behavior when the (unsigned) timestamps start to roll over */ \
/* back to zero. */ \
dt_us = t_now_us - t_start_us; /* calculate new time delta with newly-updated t_start_us */ \
if (dt_us >= DT_DESIRED_US) \
{ \
t_start_us = t_now_us - DT_DESIRED_US; \
} \
\
time_to_run = true; \
} \
}
Now, there are multiple ways to use it, but for the sake of this demo, in order to keep the really clean main() loop code which looks like this:
int main(void)
{
doSetupStuff();
configureHardwareTimer();
while (1)
{
doTask1();
doTask2();
doTask3();
}
}
Let's use the CREATE_TASK_TIMER() macro like this. As you can see, the code is now much cleaner and easier to set up a new task. This is my preferred approach, because it creates the really clean main loop shown just above, with just the various doTask() calls, which are also easy to write and maintain:
// Task 1: Let's run this one at 100 Hz (every 10ms, or 10000us)
void doTask1(void)
{
bool time_to_run;
const uint32_t DT_DESIRED_US = 10000; // 10000us = 10ms, or 100Hz run freq
CREATE_TASK_TIMER(DT_DESIRED_US, time_to_run);
if (time_to_run)
{
// PERFORM THIS TASK'S OPERATIONS HERE!
}
}
// Task 2: Let's run this one at 1000 Hz (every 1ms)
void doTask2(void)
{
bool time_to_run;
const uint32_t DT_DESIRED_US = 1000; // 1000us = 1ms, or 1000Hz run freq
CREATE_TASK_TIMER(DT_DESIRED_US, time_to_run);
if (time_to_run)
{
// PERFORM THIS TASK'S OPERATIONS HERE!
}
}
// Task 3: Let's run this one at 10 Hz (every 100ms)
void doTask3(void)
{
bool time_to_run;
const uint32_t DT_DESIRED_US = 100000; // 100000us = 100ms, or 10Hz run freq
CREATE_TASK_TIMER(DT_DESIRED_US, time_to_run);
if (time_to_run)
{
// PERFORM THIS TASK'S OPERATIONS HERE!
}
}
Alternatively, however, you could structure the code more like this, which works equally as well and produces the same effect, just in a slightly different way:
#include <stdbool.h>
#include <stdint.h>
#define TASK1_PD_US (10000) // 10ms pd, or 100 Hz run freq
#define TASK2_PD_US (1000) // 1ms pd, or 1000 Hz run freq
#define TASK3_PD_US (100000) // 100ms pd, or 10 Hz run freq
// Task 1: Let's run this one at 100 Hz (every 10ms, or 10000us)
void doTask1(void)
{
// PERFORM THIS TASK'S OPERATIONS HERE!
}
// Task 2: Let's run this one at 1000 Hz (every 1ms)
void doTask2(void)
{
// PERFORM THIS TASK'S OPERATIONS HERE!
}
// Task 3: Let's run this one at 10 Hz (every 100ms)
void doTask3(void)
{
// PERFORM THIS TASK'S OPERATIONS HERE!
}
int main(void)
{
doSetupStuff();
configureHardwareTimer();
while (1)
{
bool time_to_run;
CREATE_TASK_TIMER(TASK1_PD_US, time_to_run);
if (time_to_run)
{
doTask1();
}
CREATE_TASK_TIMER(TASK2_PD_US, time_to_run);
if (time_to_run)
{
doTask2();
}
CREATE_TASK_TIMER(TASK3_PD_US, time_to_run);
if (time_to_run)
{
doTask3();
}
}
}
Part of the art (and fun!) of embedded bare-metal microcontroller programming is the skill and ingenuity involved in deciding exactly how you want to interleave each task and get them to run together, all as though they were running in parallel. Use one of the above formats as a starting point, and adapt to your particular circumstances. Message-passing can be added between tasks or between tasks and interrupts, tasks and a user, etc, as desired, and as required for your particular application.
Here's an example of how to configure a timer for use as a timestamp-generator on an STM32F2 microcontroller.
This shows functions for configureHardwareTimer() and getMicros(), used above:
// Timer handle to be used for Timer 2 below
TIM_HandleTypeDef TimHandle;
// Configure Timer 2 to be used as a free-running 32-bit hardware timer for general-purpose use as a 1-us-resolution
// timestamp source
void configureHardwareTimer()
{
// Timer clock must be enabled before you can configure it
__HAL_RCC_TIM2_CLK_ENABLE();
// Calculate prescaler
// Here are some references to show how this is done:
// 1) "STM32Cube_FW_F2_V1.7.0/Projects/STM32F207ZG-Nucleo/Examples/TIM/TIM_OnePulse/Src/main.c" shows the
// following (slightly modified) equation on line 95: `Prescaler = (TIMxCLK/TIMx_counter_clock) - 1`
// 2) "STM32F20x and STM32F21x Reference Manual" states the following on pg 419: "14.4.11 TIMx prescaler (TIMx_PSC)"
// "The counter clock frequency CK_CNT is equal to fCK_PSC / (PSC[15:0] + 1)"
// This means that TIMx_counter_clock_freq = TIMxCLK/(prescaler + 1). Now, solve for prescaler and you
// get the exact same equation as above: `prescaler = TIMxCLK/TIMx_counter_clock_freq - 1`
// Calculating TIMxCLK:
// - We must divide SystemCoreClock (returned by HAL_RCC_GetHCLKFreq()) by 2 because TIM2 uses clock APB1
// as its clock source, and on my board this is configured to be 1/2 of the SystemCoreClock.
// - Note: To know which clock source each peripheral and timer uses, you can look at
// "Table 25. Peripheral current consumption" in the datasheet, p86-88.
const uint32_t DESIRED_TIMER_FREQ = 1e6; // 1 MHz clock freq --> 1 us pd per tick, which is what I want
uint32_t Tim2Clk = HAL_RCC_GetHCLKFreq() / 2;
uint32_t prescaler = Tim2Clk / DESIRED_TIMER_FREQ - 1; // Don't forget the minus 1!
// Configure timer
// TIM2 is a 32-bit timer; See datasheet "Table 4. Timer feature comparison", p30-31
TimHandle.Instance = TIM2;
TimHandle.Init.Period = 0xFFFFFFFF; // Set pd to max possible for a 32-bit timer
TimHandle.Init.Prescaler = prescaler;
TimHandle.Init.ClockDivision = TIM_CLOCKDIVISION_DIV1;
TimHandle.Init.CounterMode = TIM_COUNTERMODE_UP;
TimHandle.Init.RepetitionCounter = 0; // NA (has no significance) for this timer
// Initialize the timer
if (HAL_TIM_Base_Init(&TimHandle) != HAL_OK)
{
// handle error condition
}
// Start the timer
if (HAL_TIM_Base_Start(&TimHandle) != HAL_OK)
{
// handle error condition
}
}
// Get the 1 us count value on Timer 2.
// This timer will be used for general purpose hardware timing that does NOT rely on interrupts.
// Therefore, the counter will continue to increment even with interrupts disabled.
// The count value increments every 1 microsecond.
// Since it is a 32-bit counter it overflows every 2^32 counts, which means the highest value it can
// store is 2^32 - 1 = 4294967295. Overflows occur every 2^32 counts / 1 count/us / 1e6us/sec
// = ~4294.97 sec = ~71.6 min.
uint32_t getMicros()
{
return __HAL_TIM_GET_COUNTER(&TimHandle);
}
References:
https://www.arduino.cc/en/tutorial/BlinkWithoutDelay
Doxygen: What's the right way to reference a parameter in Doxygen?
Enum-based error codes for error handling: Error handling in C code
Other architectural styles in C, such as "object-based" C via opaque pointers: Opaque C structs: various ways to declare them
See also:
A full, runnable Arduino example with an even better version of my CREATE_TASK_TIMER() macro from above:
My answer for C and C++, including microcontrollers and Arduino (or any other system): Full coulomb counter example demonstrating the above concept with timestamp-based, single-threaded, cooperative multi-tasking
First of all thank you for your suggestions. I tried to analyze every single possible solution you proposed.
The solution proposed by Peter seemed very interesting, but I have to say that, after having gone through the datasheet several times, I don't believe it is feasible. My reasoning is based on the following facts.
Using a scope, I can see that the acknowledge is received right after sending the conversion command. See the following image concerning the temperature conversion:
The acknowledge bit right after the command seems quite clear to me. After that the SDA line (yellow) goes high, so I don't see how I could exploit it to detect when the conversion is ready.
Concerning the SPI-based solution: yes, SDO remains low during the conversion, but I cannot use it; I need to stick with I2C. Furthermore, I have other sensors attached to that SPI bus, and I agree with what Gabriel Staples says.
After these considerations I went for the solution proposed by Gabriel Staples (keeping in mind that, in order to read the pressure value, I also need to read and convert the temperature).
My current solution is based on a state machine with 6 states. In my solution I distinguish between the wait time for the pressure conversion and the wait time for the temperature conversion, with the idea that I could try to see how much the pressure reading degrades if I use a less precise temperature reading.
Here is my current solution. The following function is called inside the main while loop:
void MS5803_update()
{
static uint32_t tStart; // us; start time
switch (sensor_state)
{
case MS5803_REQUEST_TEMPERATURE:
{
MS5803_send_command(MS5803_CMD_ADC_CONV + TEMPERATURE + baro.resolution);
tStart = HAL_GetTick();
sensor_state = MS5803_WAIT_RAW_TEMPERATURE;
break;
}
case MS5803_WAIT_RAW_TEMPERATURE:
{
uint32_t tNow = HAL_GetTick();
if (tNow - tStart >= conversion_time)
{
sensor_state = MS5803_CONVERTING_TEMPERATURE;
}
break;
}
case MS5803_CONVERTING_TEMPERATURE:
{
MS5803_send_command(MS5803_CMD_ADC_READ);
uint8_t raw_value[3]; // Read 24 bit
MS5803_read_value(raw_value,3);
temperature_raw = ((uint32_t)raw_value[0] << 16) + ((uint32_t)raw_value[1] << 8) + raw_value[2];
sensor_state = MS5803_REQUEST_PRESSURE;
break;
}
case MS5803_REQUEST_PRESSURE:
{
MS5803_send_command(MS5803_CMD_ADC_CONV + PRESSURE + baro.resolution);
tStart = HAL_GetTick();
sensor_state = MS5803_WAIT_RAW_PRESSURE;
break;
}
case MS5803_WAIT_RAW_PRESSURE:
{
uint32_t tNow = HAL_GetTick();
if (tNow - tStart >= conversion_time)
{
sensor_state = MS5803_CONVERTING_PRESSURE;
}
break;
}
case MS5803_CONVERTING_PRESSURE:
{
MS5803_send_command(MS5803_CMD_ADC_READ);
uint8_t raw_value[3]; // Read 24 bit
MS5803_read_value(raw_value,3);
pressure_raw = ((uint32_t)raw_value[0] << 16) + ((uint32_t)raw_value[1] << 8) + raw_value[2];
// Now I have both temperature and pressure raw and I can convert them
MS5803_updateMeasurements();
// Reset the state machine to perform a new measurement
sensor_state = MS5803_REQUEST_TEMPERATURE;
break;
}
}
}
I don't pretend that my solution is better; I just post it in order to get your opinions. Note: I'm still working on it, therefore I cannot guarantee it is bug-free!
For PeterJ_01: I could agree that this is not strictly a teaching portal, but I believe that everybody around here asks questions to learn something new or to improve themselves. Therefore, if you believe that the solution using the ack is better, it would be great if you could show us a draft of your idea. For me it would be something new to learn.
Any further comment is appreciated.

ARM USB functions take too long to execute

I just started working on an XMC4500 microcontroller. I'm currently implementing USB CDC communication and I ran into a problem.
After calling the function "USBD_VCOM_SendData" the program waits for a frame to send the data.
More precisely, the program waits in the "usbd_endpoint_stream_xmc4000.c" file, in "Endpoint_Write_Stream_LE". There it waits for the endpoint to be ready in the "Endpoint_WaitUntilReady()" function.
#define USB_STREAM_TIMEOUT_MS 100
uint8_t Endpoint_WaitUntilReady(void)
{
#if (USB_STREAM_TIMEOUT_MS < 0xFF)
uint8_t TimeoutMSRem = USB_STREAM_TIMEOUT_MS;
#else
uint16_t TimeoutMSRem = USB_STREAM_TIMEOUT_MS;
#endif
uint16_t PreviousFrameNumber = USB_Device_GetFrameNumber();
for (;;)
{
if (Endpoint_GetEndpointDirection() == ENDPOINT_DIR_IN)
{
if (Endpoint_IsINReady())
{
return ENDPOINT_READYWAIT_NoError;
}
}
else
{
if (Endpoint_IsOUTReceived())
{
return ENDPOINT_READYWAIT_NoError;
}
}
uint8_t USB_DeviceState_LCL = USB_DeviceState;
if (USB_DeviceState_LCL == DEVICE_STATE_Unattached)
{
return ENDPOINT_READYWAIT_DeviceDisconnected;
}
else if (USB_DeviceState_LCL == DEVICE_STATE_Suspended)
{
return ENDPOINT_READYWAIT_BusSuspended;
}
else if (Endpoint_IsStalled())
{
return ENDPOINT_READYWAIT_EndpointStalled;
}
uint16_t CurrentFrameNumber = USB_Device_GetFrameNumber();
if (CurrentFrameNumber != PreviousFrameNumber)
{
PreviousFrameNumber = CurrentFrameNumber;
if (!(TimeoutMSRem--))
{
return ENDPOINT_READYWAIT_Timeout;
}
}
}
}
This wait time is approximately 100-150 us, which is too long. When I was working on STM32 microcontrollers the execution time was significantly smaller.
Has anybody dealt with this problem before?
Is there a way to write the data to a buffer and then let the peripheral take care of the transaction without the need for processor time?
Or at least trigger an interrupt when the endpoint is ready for the transaction.
The loop terminates instantly if Endpoint_IsINReady() returns true. Perhaps you should try to run that macro in your code, and only send messages to the USB host if it evaluates to true.
Also, to ensure that the library does not use its blocking behavior, you'll have to think about how much data the endpoint buffers can hold at once, and make sure you don't send any more than that at a time. (You can make your own buffer and send it to the USB library piecemeal if you need to.)
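A rough sketch of that approach is below. The ring-buffer helper myFifo_Pop() is hypothetical; Endpoint_IsINReady() and USBD_VCOM_SendData() are the calls from the code above (check your stack's exact signatures, and note that you may need to select the CDC IN endpoint before polling it):
#include <stdint.h>

// Include your USB stack's CDC and endpoint headers here.

#define USB_CHUNK_MAX 64 // assumption: send at most one endpoint buffer's worth per call

uint32_t myFifo_Pop(uint8_t *dst, uint32_t maxLen); // hypothetical: pops up to maxLen queued bytes

void usb_service(void) // call this from the main loop; it never busy-waits
{
    if (!Endpoint_IsINReady())
    {
        return; // endpoint still busy: try again on the next main-loop pass
    }
    uint8_t chunk[USB_CHUNK_MAX];
    uint32_t n = myFifo_Pop(chunk, USB_CHUNK_MAX);
    if (n > 0)
    {
        USBD_VCOM_SendData((int8_t *)chunk, (uint16_t)n); // should now return without waiting for a frame
    }
}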

Uart receives correct Bytes but in chaotic order

Using Atmel studio 7, with STK600 and 32UC3C MCU
I'm pulling my hair over this.
I'm sending strings of variable size over UART once every 5 seconds. Each string consists of one letter as an opcode, followed by two chars that tell the length of the following data string (without a terminating zero; there is never a zero at the end of any of those strings). In most cases the string will be 3 chars in size, because it carries no data ("p00").
After investigation I found out that what was supposed to be "p00" was in fact "0p0" or "00p" (and only on the first try after restarting the micro was it "p00"). I looked it up in the memory view of the debugger. Then I started hTerm and confirmed that the data sent was in fact "p00". So after a while hTerm showed me "p00p00p00p00p00p00p00..." while the memory of my circular UART buffer reads "p000p000p0p000p0p000p0p0..."
edit: Actually "0p0" and "00p" are alternating.
The baud rate is 9600. In the past I was only sending single letters, and everything was running well.
This is the code of the Receiver Interrupt:
I tried different variations of the code that all do the same thing in different ways, but all of them showed the exact same behavior.
lastWebCMDWritePtr is a uint8_t* type and so is lastWebCMDRingstartPtr.
lastWebCMDRingRXLen is a uint8_t type.
__attribute__((__interrupt__))
void UartISR_forWebserver()
{
*(lastWebCMDWritePtr++) = (uint8_t)((&AVR32_USART0)->rhr & 0x1ff);
lastWebCMDRingRXLen++;
if(lastWebCMDWritePtr - lastWebCMDRingstartPtr > lastWebCMDRingBufferSIZE)
{
lastWebCMDWritePtr = lastWebCMDRingstartPtr;
}
// Variation 2:
// advanceFifo((uint8_t)((&AVR32_USART0)->rhr & 0x1ff));
// Variation 3:
// if(usart_read_char(&AVR32_USART0, getReadPointer()) == USART_RX_ERROR)
// {
// usart_reset_status(&AVR32_USART0);
// }
//
};
I welcome any of your ideas and advice.
Regards, Someo
P.S. I put the Atmel studio tag in case this has something to do with the myriad of debugger bugs of AS.
For a complete picture you would have to show where and how lastWebCMDWritePtr, lastWebCMDRingRXLen, lastWebCMDRingstartPtr and lastWebCMDRingBufferSIZE are used elsewhere (on the consuming side).
Also, I would first try a simpler ISR with no dependencies on other software modules, to rule out a hardware or register-handling problem.
Approach:
#define USART_DEBUG
#define DEBUG_BUF_SIZE 30
__attribute__((__interrupt__))
void UartISR_forWebserver()
{
uint8_t rec_byte;
#ifdef USART_DEBUG
static volatile uint8_t usart_debug_buf[DEBUG_BUF_SIZE]; //circular buffer for debugging
static volatile int usart_debug_buf_index = 0;
#endif
rec_byte = (uint8_t)((&AVR32_USART0)->rhr & 0x1ff);
#ifdef USART_DEBUG
usart_debug_buf_index = usart_debug_buf_index % DEBUG_BUF_SIZE;
usart_debug_buf[usart_debug_buf_index] = rec_byte;
usart_debug_buf_index++;
if (!(usart_debug_buf_index < DEBUG_BUF_SIZE)) {
usart_debug_buf_index = 0; //candidate for a breakpoint to see what happened in the past
}
#endif
//uart_recfifo_enqueue(rec_byte);
};
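For reference, the consuming side that pairs safely with such an ISR usually looks something like this minimal single-producer/single-consumer ring buffer (generic names, not the asker's actual variables; overflow handling omitted for brevity):
#include <stdint.h>

#define RING_SIZE 64u

static volatile uint8_t ring[RING_SIZE];
static volatile uint8_t head = 0; // written only by the ISR (producer)
static uint8_t tail = 0;          // written only by the main loop (consumer)

void ring_put_from_isr(uint8_t byte) // call from the UART RX ISR
{
    ring[head] = byte;
    head = (uint8_t)((head + 1u) % RING_SIZE);
}

int ring_get(uint8_t *out) // call from the main loop; returns 1 if a byte was read
{
    if (tail == head) // single snapshot of the volatile head index
    {
        return 0; // buffer empty
    }
    *out = ring[tail];
    tail = (uint8_t)((tail + 1u) % RING_SIZE);
    return 1;
}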

Improve performance of reading volatile memory

I have a function reading from some volatile memory which is updated by a DMA. The DMA never operates on the same memory location as the function. My application is performance-critical, and I realized the execution time improves by approx. 20% if I do not declare the memory as volatile. Within the scope of my function the memory is effectively non-volatile. However, I have to be sure that the next time the function is called, the compiler knows that the memory may have changed.
The memory is two two-dimensional arrays:
volatile uint16_t memoryBuffer[2][10][20] = {0};
The DMA operates on the opposite "matrix" than the program function:
void myTask(uint8_t indexOppositeOfDMA)
{
for(uint8_t n=0; n<10; n++)
{
for(uint8_t m=0; m<20; m++)
{
//Do some stuff with memory (readings only):
foo(memoryBuffer[indexOppositeOfDMA][n][m]);
}
}
}
Is there a proper way to tell my compiler that memoryBuffer is non-volatile inside the scope of myTask(), but may be changed the next time I call myTask(), so I can obtain the 20% performance improvement?
Platform Cortex-M4
The problem without volatile
Let's assume that volatile is omitted from the data array. Then the C compiler
and the CPU do not know that its elements change outside the program-flow. Some
things that could happen then:
The whole array might be loaded into the cache when myTask() is called for
the first time. The array might stay in the cache forever and is never
updated from the "main" memory again. This issue is more pressing on multi-core
CPUs if myTask() is bound to a single core, for example.
If myTask() is inlined into the parent function, the compiler might decide
to hoist loads outside of the loop even to a point where the DMA transfer
has not been completed.
The compiler might even be able to determine that no write happens to
memoryBuffer and assume that the array elements stay at 0 all the time
(which would again trigger a lot of optimizations). This could happen if
the program was rather small and all the code is visible to the compiler
at once (or LTO is used).
Remember: After all the compiler does not know anything about the DMA
peripheral and that it is writing "unexpectedly and wildly into memory"
(from a compiler perspective).
If the compiler is dumb/conservative and the CPU not very sophisticated (single core, no out-of-order execution), the code might even work without the volatile declaration. But it also might not...
The problem with volatile
Making
the whole array volatile is often a pessimisation. For speed reasons you
probably want to unroll the loop. So instead of loading from the
array and incrementing the index alternatingly such as
load memoryBuffer[m]
m += 1;
load memoryBuffer[m]
m += 1;
load memoryBuffer[m]
m += 1;
load memoryBuffer[m]
m += 1;
it can be faster to load multiple elements at once and increment the index
in larger steps such as
load memoryBuffer[m]
load memoryBuffer[m + 1]
load memoryBuffer[m + 2]
load memoryBuffer[m + 3]
m += 4;
This is especially true, if the loads can be fused together (e.g. to perform
one 32-bit load instead of two 16-bit loads). Further you want the
compiler to use SIMD instruction to process multiple array elements with
a single instruction.
These optimizations are often prevented if the load happens from
volatile memory because compilers are usually very conservative with
load/store reordering around volatile memory accesses.
Again the behavior differs between compiler vendors (e.g. MSVC vs GCC).
Possible solution 1: fences
So you would like to make the array non-volatile but add a hint for the compiler/CPU saying "when you see this line (execute this statement), flush the cache and reload the array from memory". In C11 you could insert an atomic_thread_fence at the beginning of myTask(). Such fences prevent the re-ordering of loads/stores across them.
Since we do not have a C11 compiler, we use intrinsics for this task. The ARMCC compiler has a __dmb() intrinsic (data memory barrier). For GCC you may want to look at __sync_synchronize() (doc).
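As a minimal sketch of Solution 1 (GCC intrinsic shown; foo()'s signature is assumed from the question, and whether a barrier alone is enough also depends on whether the buffer lives in cached memory on your particular system):
#include <stdint.h>

extern void foo(uint16_t value); // assumed signature, matching the usage in the question

uint16_t memoryBuffer[2][10][20]; // note: no volatile any more

void myTask(uint8_t indexOppositeOfDMA)
{
    // Full barrier: the compiler may not reuse values of memoryBuffer read before this point,
    // and a DMB is emitted so the CPU observes the DMA's completed writes.
    __sync_synchronize(); // ARMCC: __dmb(0xF); C11: atomic_thread_fence(memory_order_acquire)

    for (uint8_t n = 0; n < 10; n++)
    {
        for (uint8_t m = 0; m < 20; m++)
        {
            foo(memoryBuffer[indexOppositeOfDMA][n][m]); // loads can now be fused/unrolled freely
        }
    }
}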
Possible solution 2: atomic variable holding the buffer state
We use the following pattern a lot in our codebase (e.g. when reading data from
SPI via DMA and calling a function to analyze it): The buffer is declared as
plain array (no volatile) and an atomic flag is added to each buffer, which
is set when the DMA transfer has finished. The code looks something
like this:
typedef struct Buffer
{
uint16_t data[10][20];
// Flag indicating if the buffer has been filled. Only use atomic instructions on it!
int filled;
// C11: atomic_int filled;
// C++: std::atomic_bool filled{false};
} Buffer_t;
Buffer_t buffers[2];
Buffer_t* volatile currentDmaBuffer; // using volatile here because I'm lazy
void setupDMA(void)
{
for (int i = 0; i < 2; ++i)
{
int bufferFilled;
// Atomically load the flag.
bufferFilled = __sync_fetch_and_or(&buffers[i].filled, 0);
// C11: bufferFilled = atomic_load(&buffers[i].filled);
// C++: bufferFilled = buffers[i].filled;
if (!bufferFilled)
{
currentDmaBuffer = &buffers[i];
... configure DMA to write to buffers[i].data and start it
}
}
// If you end up here, there is no free buffer available because the
// data processing takes too long.
}
void DMA_done_IRQHandler(void)
{
// ... stop DMA if needed
// Atomically set the flag indicating that the buffer has been filled.
__sync_fetch_and_or(&currentDmaBuffer->filled, 1);
// C11: atomic_store(&currentDmaBuffer->filled, 1);
// C++: currentDmaBuffer->filled = true;
currentDmaBuffer = 0;
// ... possibly start another DMA transfer ...
}
void myTask(Buffer_t* buffer)
{
for (uint8_t n=0; n<10; n++)
for (uint8_t m=0; m<20; m++)
foo(buffer->data[n][m]);
// Reset the flag atomically.
__sync_fetch_and_and(&buffer->filled, 0);
// C11: atomic_store(&buffer->filled, 0);
// C++: buffer->filled = false;
}
void waitForData(void)
{
// ... see setupDma(void) ...
}
The advantage of pairing the buffers with an atomic is that you are able to detect when the processing is too slow meaning that you have to buffer more,
make the incoming data slower or the processing code faster or whatever is
sufficient in your case.
Possible solution 3: OS support
If you have an (embedded) OS, you might resort to other patterns instead of using volatile arrays. The OS we use features memory pools and queues. The latter can be filled from a thread or an interrupt and a thread can block on
the queue until it is non-empty. The pattern looks a bit like this:
MemoryPool pool; // A pool to acquire DMA buffers.
Queue bufferQueue; // A queue for pointers to buffers filled by the DMA.
void* volatile currentBuffer; // The buffer currently filled by the DMA.
void setupDMA(void)
{
currentBuffer = MemoryPool_Allocate(&pool, 20 * 10 * sizeof(uint16_t));
// ... make the DMA write to currentBuffer
}
void DMA_done_IRQHandler(void)
{
// ... stop DMA if needed
Queue_Post(&bufferQueue, currentBuffer);
currentBuffer = 0;
}
void myTask(void)
{
void* buffer = Queue_Wait(&bufferQueue);
[... work with buffer ...]
MemoryPool_Deallocate(&pool, buffer);
}
This is probably the easiest approach to implement but only if you have an OS
and if portability is not an issue.
Here you say that the buffer is non-volatile:
"memoryBuffer is non-volatile inside the scope of myTask"
But here you say that it must be volatile:
"but may be changed next time i call myTask"
These two sentences are contradictory. Clearly the memory area must be volatile, or the compiler can't know that it may be updated by the DMA.
However, I rather suspect that the actual performance loss comes from accessing this memory region repeatedly through your algorithm, forcing the compiler to read it back over and over again.
What you should do is to take a local, non-volatile copy of the part of the memory you are interested in:
void myTask(uint8_t indexOppositeOfDMA)
{
for(uint8_t n=0; n<10; n++)
{
for(uint8_t m=0; m<20; m++)
{
volatile uint16_t* data = &memoryBuffer[indexOppositeOfDMA][n][m];
uint16_t local_copy = *data; // this access is volatile and won't get optimized away
foo(&local_copy); // optimizations possible here
// if needed, write back again:
*data = local_copy; // optional
}
}
}
You'll have to benchmark it, but I'm pretty sure this should improve performance.
Alternatively, you could first copy the whole part of the array you are interested in, then work on that, before writing it back. That should help performance even more.
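A sketch of that "copy first, then process" variant; every access to the shared array still goes through a volatile-qualified lvalue, so nothing is cast away, and foo() is then called on the plain local copy as in the question's original loop:
// memoryBuffer (volatile, as in the question) and foo() are assumed to be declared elsewhere.
void myTask(uint8_t indexOppositeOfDMA)
{
    uint16_t local[10][20]; // plain, non-volatile working copy

    // Copy through volatile-qualified reads, element by element.
    for (uint8_t n = 0; n < 10; n++)
    {
        for (uint8_t m = 0; m < 20; m++)
        {
            local[n][m] = memoryBuffer[indexOppositeOfDMA][n][m];
        }
    }

    // From here on the compiler may optimize freely (unroll, vectorize, keep values in registers).
    for (uint8_t n = 0; n < 10; n++)
    {
        for (uint8_t m = 0; m < 20; m++)
        {
            foo(local[n][m]);
        }
    }
}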
You're not allowed to cast away the volatile qualifier1.
If the array must be defined holding volatile elements then the only two options, "that let the compiler know that the memory has changed", are to keep the volatile qualifier, or use a temporary array which is defined without volatile and is copied to the proper array after the function call. Pick whichever is faster.
1 (Quoted from: ISO/IEC 9899:201x 6.7.3 Type qualifiers 6)
If an attempt is
made to refer to an object defined with a volatile-qualified type through use of an lvalue
with non-volatile-qualified type, the behavior is undefined.
It seems to me that you are passing half of the buffer to myTask, and each half does not need to be volatile. So I wonder if you could solve your issue by defining the buffer as such, and then passing a pointer to one of the half-buffers to myTask. I'm not sure whether this will work, but maybe something like this...
typedef struct memory_buffer {
uint16_t buffer[10][20];
} memory_buffer ;
volatile memory_buffer double_buffer[2];
void myTask(memory_buffer *mem_buf)
{
for(uint8_t n=0; n<10; n++)
{
for(uint8_t m=0; m<20; m++)
{
//Do some stuff with memory:
foo(mem_buf->buffer[n][m]);
}
}
}
I don't know your platform/MCU/SoC, but usually DMAs have an interrupt that triggers on a programmable threshold.
What I can imagine is removing the volatile keyword and using the interrupt as a semaphore for the task.
In other words:
The DMA is programmed to interrupt when the last byte of the buffer has been written.
The task blocks on a semaphore/flag, waiting for the flag to be released.
When the DMA interrupt routine runs, it switches the buffer the DMA points to for the next transfer and changes the flag, which unblocks the task so it can process the data.
Something like:
uint16_t memoryBuffer[2][10][20];
volatile uint8_t PingPong = 0;
void interrupt ( void )
{
// Change current DMA pointed buffer
PingPong ^= 1;
}
void myTask(void)
{
static uint8_t lastPingPong = 0;
if (lastPingPong != PingPong)
{
for (uint8_t n = 0; n < 10; n++)
{
for (uint8_t m = 0; m < 20; m++)
{
//Do some stuff with memory:
foo(memoryBuffer[PingPong][n][m]);
}
}
lastPingPong = PingPong;
}
}

Can't write to SC1DRL register on 68HC12 board--what am I missing?

I am trying to use the multiple serial interface (SCI) on the 68HC12 but can't get it to talk. I think I've isolated the problem to not being able to write to the SC1DRL register (SCI Data Register Low).
The following is from my SCI ISR:
else if (HWRegPtr->SCI.sc1sr1.bit.tdre) {
/* Transmit the next byte in TX_Buffer. */
if (TX_Buffer.in != TX_Buffer.out || TX_Buffer.full) {
HWRegPtr->SCI.sc1drl.byte = TX_Buffer.buffer[TX_Buffer.out];
TX_Buffer.out++;
if (TX_Buffer.out >= SCI_Buffer_Size) {
TX_Buffer.out = 0;
}
TX_Buffer.full = 0;
}
/* Disable the transmit interrupt if the buffer is empty. */
if (TX_Buffer.in == TX_Buffer.out && !TX_Buffer.full) {
Disable_SCI_TX();
}
}
TX_Buffer.buffer has the right thing at index TX_Buffer.out when its contents are being written to HWRegPtr->SCI.sc1drl.byte, but my debugger doesn't show a change, and no data is being transmitted over the serial interface.
Anybody know what I'm missing?
edit:
HWRegPtr is defined as:
extern HARDWARE_REGISTER *HWRegPtr;
HARDWARE_REGISTER is a giant struct with all the registers in it, and is volatile.
It's likely that SC1DRL is a write-only register (check the official register docs to be sure -- google isn't turning up the right PDF for me). That means you can't read it back (even with an in-target debugger) to verify your code.
How is HWRegPtr defined? Does it have volatile in the right places to ensure the compiler treats every write through that pointer as something which must happen immediately?
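"Volatile in the right places" usually means the pointed-to registers are volatile-qualified, not (only) the pointer variable itself. Here is a cut-down, hypothetical sketch of where the qualifier goes (a made-up two-register layout and a placeholder address, not the real 68HC12 register map):
#include <stdint.h>

typedef struct {
    uint8_t sc1sr1; // status register (made-up layout, for illustration only)
    uint8_t sc1drl; // data register
} SCI_REGS;

volatile SCI_REGS *SCIRegPtr = (volatile SCI_REGS *)0x00C0; // placeholder base address

void sci_send_byte(uint8_t b)
{
    SCIRegPtr->sc1drl = b; // with the pointee volatile, this store must actually be emitted
}
If HARDWARE_REGISTER already declares its members volatile (as the edit says), writes through HWRegPtr are treated as volatile accesses, which points back to the write-only-register explanation above as the likely reason the debugger shows no change.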
