Suppose having the following code elements working on a fifo buffer:
static uint_fast32_t buffer_start;
static uint_fast32_t buffer_end;
static mutex_t buffer_guard;
(...)
void buffer_write(uint8_t* data, uint_fast32_t len)
{
uint_fast32_t pos;
mutex_lock(buffer_guard);
pos = buffer_end;
buffer_end = buffer_end + len;
(...) /* Wrap around buffer_end, fill in data */
mutex_unlock(buffer_guard);
}
bool buffer_isempty(void)
{
bool ret;
mutex_lock(buffer_guard);
ret = (buffer_start == buffer_end);
mutex_unlock(buffer_guard);
return ret;
}
This code might be running on an embedded system, with a RTOS, with the buffer_write() and buffer_isempty() functions called from different threads. The compiler has no means to know that the mutex_lock() and mutex_unlock() functions provided by the RTOS are working with a critical sections.
As the code is above, due to buffer_end being a static variable (local to the compilation unit), the compiler might choose to reorder accesses to it around function calls (at least as far as I understand the C standard, this seems possible to happen). So potentially the code performing buffer_end = buffer_end + len line have a chance to end up before the call to mutex_lock().
Using volatile on these variables (like static volatile uint_fast32_t buffer_end;) seems to resolve this as then they would be constrained by sequence points (which a mutex_lock() call is, due to being a function call).
Is my understanding right on these?
Is there a more appropriate means (than using volatile) of dealing with this type of problem?
Related
I'm using microcontroller to make some ADC measurements. I have an issue when I try to compile following code using -O2 optimization, MCU freezes when PrintVal() function is present in code. I did some debugging and it turns out that when I add -fno-inline compiler flag, the code will run fine even with PrintVal() function.
Here is some background:
AdcIsr.c contains interrupt that is executed when ADC finishes it's job. This file also contains ISRInit() function that initializes variable that will hold value after conversion. In main loop will wait for interrupt and only then access AdcMeas.value.
AdcIsr.c
static volatile uin16_t* isrVarPtr = NULL;
ISR()
{
uint8_t tmp = readAdc();
*isrVarPtr = tmp;
}
void ISRInit(volatile uint16_t *var)
{
isrVarPtr = var;
}
AdcMeas.c
typedef struct{
uint8_t id;
volatile uint16_t value;
}AdcMeas_t;
static AdcMeas_t AdcMeas = {0};
const AdcMeas_t* AdcMeasGetStructPtr()
{
return &AdcMeas;
}
main.c
void PrintVal(const AdcMeas_t* data)
{
printf("AdcMeas %d value: %d\r\n", data->id, data->value);
}
void StartMeasurement()
{
...
AdcOn();
...
}
int main()
{
ISRInit(AdcMeasGetStructPtr()->value);
while(1)
{
StartMeasurement();
WaitForISR();
PrintVal(AdcMeasGetStructPtr());
DelayMs(1000);
}
}
Questions:
Is there something wrong with usage of const AdcMeas_t* data as argument of the PrintVal() function? I understand that AdcMeas.value may change inside interrupt and PrintVal() may be outdated.
AdcMeas contains a 'generic getter'. Is this a good practice to use this sort of function to allow read-only access to static structure? or should I implement AdcMeasGetId() and AdcMeasGetValue functions (note that this struct has only 2 members, what if it has 8 members)?
I know this code is a bit dumb (waiting for interrupt in while loop), this is just an example.
Some bugs:
You have no header files, neither library include or your own ones. This means that everything is hopelessly broken until you fix that. You cannot do multiple file projects in C without header files.
*isrVarPtr = tmp; Here you write to a variable without protection from race conditions. If the main program reads this variable in several steps, you risk getting incorrect data. You need to protect against race conditions or guarantee atomic access.
const AdcMeasGetStructPtr() is gibberish and there is no way that the return &AdcMeas; inside it would compile with a conforming C compiler.
If you have an old but conforming C90 compiler, the return type will get treated as int. Otherwise, if you have a modern C compiler, not even the function definition will compiler. So it would seem that something is very wrong with your compiler, which is a greater concern than this bug.
Declaring the typedef struct in the C file and then returning a pointer to it doesn't make any sense. You need to re-design this module. You could have a getter function returning an instance to a private struct, if there is only ever going to be 1 instance of it (singleton). However, as mentioned, it needs to handle race conditions.
Stylistic concerns:
Empty parenthesis () in a function declaration is almost always wrong in C. This is obsolete style and means "accept any parameter". C++ is different here.
int main() doesn't make any sense at all in a microcontroller system. You should use some implementation-defined form suitable for freestanding programs. The most commonly supported form is void main (void).
DelayMs(1000); is highly questionable code in any embedded system. There should never be a reason why you'd want to hang up your MCU being useless, with max current consumption, for a whole second.
Overall it seems you would benefit from a "continuous conversion" ADC. ADCs that support continuous conversion just dump their latest read in the data register and you can pick it up with polling whenever you need it. Catching all ADC interrupts is really just for hard realtime systems, signal processing and similar.
I've been having an implementation discussion where the idea that a CPU can choose to completely reorder the storing of memory has come up.
I was initializing a static array in C using code similar to:
static int array[10];
static int array_initialized = 0;
void initialize () {
array[0] = 1;
array[1] = 2;
...
array_initialized = -1;
}
and it is used later similar to:
int get_index(int index) {
if (!array_initialized) initialize();
if (index < 0 || index > 9) return -1;
return array[index];
}
is it possible for the CPU to reorder memory access in a multi-core intel architecture (or other architecture) such that it sets array_initialized before the initialize function has finished setting the array elements? or so that another execution thread can see array_initialized as non-zero before the entire array has been initialized in its view of the memory?
TL:DR: to make lazy-init safe if you don't do it before starting multiple threads, you need an _Atomic flag.
is it possible for the CPU to reorder memory access in a multi-core Intel (x86) architecture
No, such reordering is possible at compile time only. x86 asm effectively has acquire/release semantics for normal loads/stores. (seq_cst + a store buffer with store forwarding).
https://preshing.com/20120625/memory-ordering-at-compile-time/
(or other architecture)
Yes, most other ISAs have a weaker asm memory model that does allow StoreStore reordering and LoadLoad reordering. (Effectively memory_order_relaxed, or sort of like memory_order_consume on ISAs other than Alpha AXP, but compilers don't try to maintain data dependencies.)
None of this really matters from C because the C memory model is very weak, allowing compile-time reordering and simultaneous read/write or write+write of any object is data-race UB.
Data Race UB is what lets a compiler keep static variables in registers for the life of a function / inside a loop when compiling for "normal" ISAs.
Having 2 threads run this function is C data-race UB if array_initialized isn't already set before either of them run. (e.g. by having the main thread run it once before starting any more threads). And remove the array_initialized flag entirely, unless you have a use for the lazy-init feature before starting any more threads.
It's 100% safe for a single thread, regardless of how many other threads are running: the C programming model guarantees that a single thread always sees its own operations in program order. (Just like asm for all normal ISAs; other than explicit parallelism in ISAs like Itanium, you always see your own operations in order. It's only other threads seeing your operations where things get weird).
Starting a new thread is (I think) always a "full barrier", or in C terms "synchronizes with" the new thread. Stuff in the new thread can't happen before anything in the parent thread. So just calling get_index once from the main thread makes it safe with no further barriers for other threads to run get_index after that.
You could make lazy init thread-safe with an _Atomic flag
This is similar to what gcc does for function-local static variables with non-constant initializers. Check out the code-gen for that if you're curious: a read-only check of an already-init flag and then a call to an init function that makes sure only one thread runs the initializer.
This requires an acquire load in the fast-path for the already-initialized state. That's free on x86 and SPARC-TSO (same asm as a normal load), but not on weaker ISAs. AArch64 has an acquire load instruction, other ISAs need some barrier instructions.
Turn your array_initialized flag into a 3-state _Atomic variable:
init not started (e.g. init == 0). Check for this with an acquire load.
init started but not finished (e.g. init == -1)
init finished (e.g. init == 1)
You can leave static int array[10]; itself non-atomic by making sure exactly 1 thread "claims" responsibility for doing the init, using atomic_compare_exchange_strong (which will succeed for exactly one thread). And then have other threads spin-wait for the INIT_FINISHED state.
Using initial state == 0 lets it be in the BSS, hopefully next to the data. Otherwise we might prefer INIT_FINISHED=0 for ISAs where branching on an int from memory being (non)zero is slightly more efficient than other numbers. (e.g. AArch64 cbnz, MIPS bne $reg, $zero).
We could get the best of both worlds (cheapest possible fast-path for the already-init case) while still having the flag in the BSS: Have the main thread write it with INIT_NOTSTARTED = -1 before starting any more threads.
Having the flag next to the array is helpful for a small array where the flag is probably in the same cache line as the data we want to index. Or at least the same 4k page.
#include <stdatomic.h>
#include <stdbool.h>
#ifdef __x86_64__
#include <immintrin.h>
#define SPINLOOP_BODY _mm_pause()
#else
#define SPINLOOP_BODY /**/
#endif
#ifdef __GNUC__
#define unlikely(expr) __builtin_expect(!!(expr), 0)
#define likely(expr) __builtin_expect(!!(expr), 1)
#define NOINLINE __attribute__((noinline))
#else
#define unlikely(expr) (expr)
#define likely(expr) (expr)
#define NOINLINE /**/
#endif
enum init_states {
INIT_NOTSTARTED = 0,
INIT_STARTED = -1,
INIT_FINISHED = 1 // optional: make this 0 to speed up the fast-path on some ISAs, and store an INIT_NOTSTARTED before the first call
};
static int array[10];
static _Atomic int array_initialized = INIT_NOTSTARTED;
// called either before or during init.
// One thread claims responsibility for doing the init, others spin-wait
NOINLINE // this is rare, make sure it doesn't bloat the fast-path
void initialize(void) {
bool winner = false;
// check read-only if another thread has already claimed init
if (array_initialized == INIT_NOTSTARTED) {
int expected = INIT_NOTSTARTED;
winner = atomic_compare_exchange_strong(&array_initialized, &expected, INIT_STARTED);
// seq_cst memory order is fine. Weaker might be ok but it only has to run once
}
if (winner) {
array[0] = 1;
// ...
atomic_store_explicit(&array_initialized, INIT_FINISHED, memory_order_release);
} else {
// spin-wait for the winner in other threads
// yield(); optional.
// Or use some kind of mutex or condition var if init is really slow
// otherwise just spin on a seq_cst load. (Or acquire is fine.)
while(array_initialized != INIT_FINISHED)
SPINLOOP_BODY; // x86 only
// winner's release store syncs with our load:
// array[] stores Happened Before this point so we can read it without UB
}
}
int get_index(int index) {
// atomic acquire load is fine, doesn't need seq_cst. Cheaper than seq_cst on PowerPC
if (unlikely(atomic_load_explicit(&array_initialized, memory_order_acquire) != INIT_FINISHED))
initialize();
if (unlikely(index < 0 || index > 9)) return -1;
return array[index];
}
This does compile to correct-looking and efficient asm on Godbolt. Without unlikely() macros, gcc/clang think that at least the stand-alone version of get_index has initialize() and/or return -1 as the most likely fast-path.
And compilers wanted to inline the init function, which would be silly because it only runs once per thread at most. Hopefully profile-guided optimization would correct that.
A structure TsMyStruct is given as parameter to some functions :
typedef struct
{
uint16_t inc1;
uint16_t inc2;
}TsMyStruct;
void func1(TsMyStruct* myStruct)
{
myStruct->inc1 += 1;
}
void func2(TsMyStruct* myStruct)
{
myStruct->inc1 += 2;
myStruct->inc2 += 3;
}
func1 is called under non-interrupt context and func2 is called under interrupt context. Call stack of func2 has an interrupt vector as origin. C compiler does not know func2 can be called (but code isn't considered as "unused" code as linker needs it in interrupt vector table memory section), so some code reading myStruct->inc2 outside func2 can be possibly optimized preventing myStruct->inc2 to be reloaded from ram. It is true for C basic types, but is it true for inc2 structure member or some array...? Is it true for function parameters?
As a general rule, can I say "every memory zone (of basic type? or not?) modified in interrupt context and read elsewhere must be declared as volatile"?
Yes, any memory that is used both inside and outside of an interrupt handler should be volatile, including structs and arrays, and pointers passed as function parameters. Assuming that you are targeting a single-core device, you do not need additional synchronization.
Still, you have to consider that func1 could be interrupted anywhere, which may lead to inconsistent results if you're not careful. For instance, consider this:
void func1(volatile TsMyStruct* myStruct)
{
myStruct->inc1 += 1;
if (myStruct->inc1 == 4)
{
print(myStruct->inc1); // assume "print" exists
}
}
void func2(volatile TsMyStruct* myStruct)
{
myStruct->inc1 += 2;
myStruct->inc2 += 3;
}
Since interrupts are asynchronous, this could print numbers different from 4. That would happen, for instance, if func1 is interrupted after the check but before the print call.
no. volatile is not enough. You have both to set an optimization barrier for the compiler (which can be volatile) and for the processor. E.g. when a CPU core writes data, this can go into some cache and won't be visible for another core.
Usually, you need some locking in your code (spin locks, or mutex). Such functions contain usually an optimization barrier so you do not need an volatile.
Your code is racy, with proper locking it would look like
void func1(TsMyStruct* myStruct)
{
lock();
myStruct->inc1 += 1;
unlock();
}
void func2(TsMyStruct* myStruct)
{
lock();
myStruct->inc1 += 2;
unlock();
myStruct->inc1 += 3;
}
and the lock() + unlock() functions contain optimization barriers (e.g. __asm__ __volatile__("" ::: "memory") or just a call to a global function) which will cause the compiler to reload myStruct.
For nitpicking: lock() and unlock() are expected to do the right thing (e.g. disable irqs). Real world implementations would be e.g. spin_lock_irqsave() + spin_lock_irqrestore() in linux.
I have a function reading from some volatile memory which is updated by a DMA. The DMA is never operating on the same memory-location as the function. My application is performance critical. Hence, I realized the execution time is improved by approx. 20% if I not declare the memory as volatile. In the scope of my function the memory is non-volatile. Hovever, I have to be sure that next time the function is called, the compiler know that the memory may have changed.
The memory is two two-dimensional arrays:
volatile uint16_t memoryBuffer[2][10][20] = {0};
The DMA operates on the opposite "matrix" than the program function:
void myTask(uint8_t indexOppositeOfDMA)
{
for(uint8_t n=0; n<10; n++)
{
for(uint8_t m=0; m<20; m++)
{
//Do some stuff with memory (readings only):
foo(memoryBuffer[indexOppositeOfDMA][n][m]);
}
}
}
Is there a proper way to tell my compiler that the memoryBuffer is non-volatile inside the scope of myTask() but may be changed next time i call myTask(), so I could optain the performance improvement of 20%?
Platform Cortex-M4
The problem without volatile
Let's assume that volatile is omitted from the data array. Then the C compiler
and the CPU do not know that its elements change outside the program-flow. Some
things that could happen then:
The whole array might be loaded into the cache when myTask() is called for
the first time. The array might stay in the cache forever and is never
updated from the "main" memory again. This issue is more pressing on multi-core
CPUs if myTask() is bound to a single core, for example.
If myTask() is inlined into the parent function, the compiler might decide
to hoist loads outside of the loop even to a point where the DMA transfer
has not been completed.
The compiler might even be able to determine that no write happens to
memoryBuffer and assume that the array elements stay at 0 all the time
(which would again trigger a lot of optimizations). This could happen if
the program was rather small and all the code is visible to the compiler
at once (or LTO is used).
Remember: After all the compiler does not know anything about the DMA
peripheral and that it is writing "unexpectedly and wildly into memory"
(from a compiler perspective).
If the compiler is dumb/conservative and the CPU not very sophisticated (single core, no out-of-order execution), the code might even work without the volatile declaration. But it also might not...
The problem with volatile
Making
the whole array volatile is often a pessimisation. For speed reasons you
probably want to unroll the loop. So instead of loading from the
array and incrementing the index alternatingly such as
load memoryBuffer[m]
m += 1;
load memoryBuffer[m]
m += 1;
load memoryBuffer[m]
m += 1;
load memoryBuffer[m]
m += 1;
it can be faster to load multiple elements at once and increment the index
in larger steps such as
load memoryBuffer[m]
load memoryBuffer[m + 1]
load memoryBuffer[m + 2]
load memoryBuffer[m + 3]
m += 4;
This is especially true, if the loads can be fused together (e.g. to perform
one 32-bit load instead of two 16-bit loads). Further you want the
compiler to use SIMD instruction to process multiple array elements with
a single instruction.
These optimizations are often prevented if the load happens from
volatile memory because compilers are usually very conservative with
load/store reordering around volatile memory accesses.
Again the behavior differs between compiler vendors (e.g. MSVC vs GCC).
Possible solution 1: fences
So you would like to make the array non-volatile but add a hint for the compiler/CPU saying "when you see this line (execute this statement), flush the cache and reload the array from memory". In C11 you could insert an atomic_thread_fence at the beginning of myTask(). Such fences prevent the re-ordering of loads/stores across them.
Since we do not have a C11 compiler, we use intrinsics for this task. The ARMCC compiler has a __dmb() intrinsic (data memory barrier). For GCC you may want to look at __sync_synchronize() (doc).
Possible solution 2: atomic variable holding the buffer state
We use the following pattern a lot in our codebase (e.g. when reading data from
SPI via DMA and calling a function to analyze it): The buffer is declared as
plain array (no volatile) and an atomic flag is added to each buffer, which
is set when the DMA transfer has finished. The code looks something
like this:
typedef struct Buffer
{
uint16_t data[10][20];
// Flag indicating if the buffer has been filled. Only use atomic instructions on it!
int filled;
// C11: atomic_int filled;
// C++: std::atomic_bool filled{false};
} Buffer_t;
Buffer_t buffers[2];
Buffer_t* volatile currentDmaBuffer; // using volatile here because I'm lazy
void setupDMA(void)
{
for (int i = 0; i < 2; ++i)
{
int bufferFilled;
// Atomically load the flag.
bufferFilled = __sync_fetch_and_or(&buffers[i].filled, 0);
// C11: bufferFilled = atomic_load(&buffers[i].filled);
// C++: bufferFilled = buffers[i].filled;
if (!bufferFilled)
{
currentDmaBuffer = &buffers[i];
... configure DMA to write to buffers[i].data and start it
}
}
// If you end up here, there is no free buffer available because the
// data processing takes too long.
}
void DMA_done_IRQHandler(void)
{
// ... stop DMA if needed
// Atomically set the flag indicating that the buffer has been filled.
__sync_fetch_and_or(¤tDmaBuffer->filled, 1);
// C11: atomic_store(¤tDmaBuffer->filled, 1);
// C++: currentDmaBuffer->filled = true;
currentDmaBuffer = 0;
// ... possibly start another DMA transfer ...
}
void myTask(Buffer_t* buffer)
{
for (uint8_t n=0; n<10; n++)
for (uint8_t m=0; m<20; m++)
foo(buffer->data[n][m]);
// Reset the flag atomically.
__sync_fetch_and_and(&buffer->filled, 0);
// C11: atomic_store(&buffer->filled, 0);
// C++: buffer->filled = false;
}
void waitForData(void)
{
// ... see setupDma(void) ...
}
The advantage of pairing the buffers with an atomic is that you are able to detect when the processing is too slow meaning that you have to buffer more,
make the incoming data slower or the processing code faster or whatever is
sufficient in your case.
Possible solution 3: OS support
If you have an (embedded) OS, you might resort to other patterns instead of using volatile arrays. The OS we use features memory pools and queues. The latter can be filled from a thread or an interrupt and a thread can block on
the queue until it is non-empty. The pattern looks a bit like this:
MemoryPool pool; // A pool to acquire DMA buffers.
Queue bufferQueue; // A queue for pointers to buffers filled by the DMA.
void* volatile currentBuffer; // The buffer currently filled by the DMA.
void setupDMA(void)
{
currentBuffer = MemoryPool_Allocate(&pool, 20 * 10 * sizeof(uint16_t));
// ... make the DMA write to currentBuffer
}
void DMA_done_IRQHandler(void)
{
// ... stop DMA if needed
Queue_Post(&bufferQueue, currentBuffer);
currentBuffer = 0;
}
void myTask(void)
{
void* buffer = Queue_Wait(&bufferQueue);
[... work with buffer ...]
MemoryPool_Deallocate(&pool, buffer);
}
This is probably the easiest approach to implement but only if you have an OS
and if portability is not an issue.
Here you say that the buffer is non-volatile:
"memoryBuffer is non-volatile inside the scope of myTask"
But here you say that it must be volatile:
"but may be changed next time i call myTask"
These two sentences are contradicting. Clearly the memory area must be volatile or the compiler can't know that it may be updated by DMA.
However, I rather suspect that the actual performance loss comes from accessing this memory region repeatedly through your algorithm, forcing the compiler to read it back over and over again.
What you should do is to take a local, non-volatile copy of the part of the memory you are interested in:
void myTask(uint8_t indexOppositeOfDMA)
{
for(uint8_t n=0; n<10; n++)
{
for(uint8_t m=0; m<20; m++)
{
volatile uint16_t* data = &memoryBuffer[indexOppositeOfDMA][n][m];
uint16_t local_copy = *data; // this access is volatile and wont get optimized away
foo(&local_copy); // optimizations possible here
// if needed, write back again:
*data = local_copy; // optional
}
}
}
You'll have to benchmark it, but I'm pretty sure this should improve performance.
Alternatively, you could first copy the whole part of the array you are interested in, then work on that, before writing it back. That should help performance even more.
You're not allowed to cast away the volatile qualifier1.
If the array must be defined holding volatile elements then the only two options, "that let the compiler know that the memory has changed", are to keep the volatile qualifier, or use a temporary array which is defined without volatile and is copied to the proper array after the function call. Pick whichever is faster.
1 (Quoted from: ISO/IEC 9899:201x 6.7.3 Type qualifiers 6)
If an attempt is
made to refer to an object defined with a volatile-qualified type through use of an lvalue
with non-volatile-qualified type, the behavior is undefined.
It seems to me that you a passing half of the buffer to myTask and each half does not need to be volatile. So I wonder if you could solve your issue by defining the buffer as such, and then passing a pointer to one of the half-buffers to myTask. I'm not sure whether this will work but maybe something like this...
typedef struct memory_buffer {
uint16_t buffer[10][20];
} memory_buffer ;
volatile memory_buffer double_buffer[2];
void myTask(memory_buffer *mem_buf)
{
for(uint8_t n=0; n<10; n++)
{
for(uint8_t m=0; m<20; m++)
{
//Do some stuff with memory:
foo(mem_buf->buffer[n][m]);
}
}
}
I don't know you platform/mCU/SoC, but usually DMAs have interrupt that trigger on programmable threshold.
What I can imagine is to remove volatile keyword and use interrupt as semaphore for task.
In other words:
DMA is programmed to interrupt when last byte of buffer is written
Task is block on a semaphore/flag waiting that the flag is released
When DMA calls the interrupt routine cange the buffer pointed by DMA for the next reading time and change the flag that unlock the task that can elaborate data.
Something like:
uint16_t memoryBuffer[2][10][20];
volatile uint8_t PingPong = 0;
void interrupt ( void )
{
// Change current DMA pointed buffer
PingPong ^= 1;
}
void myTask(void)
{
static uint8_t lastPingPong = 0;
if (lastPingPong != PingPong)
{
for (uint8_t n = 0; n < 10; n++)
{
for (uint8_t m = 0; m < 20; m++)
{
//Do some stuff with memory:
foo(memoryBuffer[PingPong][n][m]);
}
}
lastPingPong = PingPong;
}
}
I've been reading through a lot of answers, and there are a lot of opinions about this but I wasn't able to find a code that answers my question (I found a lot of code that answers "how to share variables by declaring")
Here's the situation:
Working with embedded systems
Using IAR workbench systems
STM32F4xx HAL drivers
Declaring global variables is not an option (Edit: Something to do with keeping the memory size small, so local variables disappear at the end of scope but the global variable stay around. The local variables were sent out as outputs, so we discard them right away as we don't need them)
C language
in case this is important: 2 .c files, and 1 .h is included in both
Now that's out of the way, let me write an example.
file1.c - Monitoring
void function(){
uint8_t varFlag[10]; // 10 devices
for (uint8_t i = 0; i < 10; i++)
{
while (timeout <= 0){
varFlag[i] = 1;
// wait for response. We'll know by the ack() function
// if response back from RX,
// then varFlag[i] = 0;
}
file2.c - RX side
// listening... once indicated, this function is called
// ack is not called in function(), it is called when
// there's notification that there is a received message
// (otherwise I would be able to use a pointer to change
// the value of varFlag[]
void ack(uint8_t indexDevice)
{
// indexDevice = which device was acknowledged? we have 10 devices
// goal here is to somehow do varFlag[indexDevice] = 0
// where varFlag[] is declared in the function()
}
You share values or data, not variables. Stricto sensu, variables do not exist at runtime; only the compiler knows them (at most, with -g, it might put some metadata such as offset & type of locals in the debugging section -which is usually stripped in production code- of the executable). Ther linker symbol table (for global variables) can, and often is, stripped in a embedded released ELF binary. At runtime you have some data segment, and probably a call stack made of call frames (which hold some local variables, i.e. their values, in some slots). At runtime only locations are relevant.
(some embedded processors have severe restrictions on their call stack; other have limited RAM, or scratchpad memory; so it would be helpful to know what actual processor & ISA you are targeting, and have an idea of how much RAM you have)
So have some global variables keeping these shared values (perhaps indirectly thru some pointers and data structures), or pass these values (perhaps indirectly, likewise...) thru arguments.
So if you want to share the ten bytes varFlag[10] array:
it looks like you don't want to declare uint8_t varFlag[10]; as a global (or static) variable. Are you sure you really should not (these ten bytes have to sit somewhere, and they do consume some RAM anyway, perhaps in your call stack....)?
pass the varFlag (array, decayed to pointer when passed as argument) as an argument, so perhaps declare:
void ack(uint8_t indexDevice, uint8_t*flags);
and call ack(3,varFlag) from function...
or declare a global pointer:
uint8_t*globflags;
and set it (using globflags = varFlag;) at the start of the function declaring varFlag as a local variable, and clear if (using globflags = NULL;) at the end of that function.
I would suggest you to look at the assembler code produced by your compiler (with GCC you might compile with gcc -S -Os -fverbose-asm -fstack-usage ....). I also strongly suggest you to get your code reviewed by a colleague...
PS. Perhaps you should use GCC or Clang/LLVM as a cross-compiler, and perhaps your IAR is actually using such a compiler...
Your argument for not using global variables:
Something to do with keeping the memory size small, so local variables disappear at the end of scope but the global variable stay around. The local variables were sent out as outputs, so we discard them right away as we don't need them
confuses lifetime with scope. Variables with static lifetime occupy memory permanently regardless of scope (or visibility). A variable with global scope happens to also be statically allocated, but then so is any other static variable.
In order to share a variable across contexts it must necessarily be static, so there is no memory saving by avoiding global variables. There are however plenty of other stronger arguments for avoiding global variables and you should read A Pox on Globals by Jack Ganssle.
C supports three-levels of scope:
function (inside a function)
translation-unit (static linkage, outside a function)
global (external linkage)
The second of these allows variable to be directly visible amongst functions in the same source file, while external linkage allows direct visibility between multiple source files. However you want to avoid direct access in most cases since that is the root of the fundamental problem with global variables. You can do this using accessor functions; to use your example you might add a file3.c containing:
#include "file3.h"
static uint8_t varFlag[10];
void setFlag( size_t n )
{
if( n < sizeof(varFlag) )
{
varFlag[n] = 1 ;
}
}
void clrFlag( size_t n )
{
if( n < sizeof(varFlag) )
{
varFlag[n] = 0 ;
}
}
uint8_t getFlag( size_t n )
{
return varFlag[n] == 0 ? 0 : 1 ;
}
With an associated header file3.h
#if !defined FILE3_INCLUDE
#define FILE3_INCLUDE
void setFlag( size_t n ) ;
void clrFlag( size_t n ) ;
uint8_t getFlag( size_t n ) ;
#endif
which file1.c and file2.c include so they can access varFlag[] via the accessor functions. The benefits include:
varFlag[] is not directly accessible
the functions can enforce valid values
in a debugger you can set a breakpoint catch specifically set, clear or read access form anywhere in the code.
the internal data representation is hidden
Critically the avoidance of a global variable does not save you memory - the data is still statically allocated - because you cannot get something for nothing varFlag[] has to exist, even if it is not visible. That said, the last point about internal representation does provide a potential for storage efficiency, because you could change your flag representation from uint8_t to single bit-flags without having to change interface to the data or the accessing the accessing code:
#include <limits.h>
#include "file3.h"
static uint16_t varFlags ;
void setFlag( size_t n )
{
if( n < sizeof(varFlags) * CHAR_BIT )
{
varFlags |= 0x0001 << n ;
}
}
void clrFlag( size_t n )
{
if( n < sizeof(varFlags) * CHAR_BIT )
{
varFlags &= ~(0x0001 << n) ;
}
}
uint8_t getFlag( size_t n )
{
return (varFlags & (0x0001 << n)) == 0 ? 0 : 1 ;
}
There are further opportunities to produce robust code, for example you might make only the read accessor (getter) publicly visible and hide the so that all but one translation unit has read-only access.
Put the functions into a seperate translation unit and use a static variable:
static type var_to_share = ...;
void function() {
...
}
void ack() {
...
}
Note that I said translation unit, not file. You can do some #include magic (in the cleanest way possible) to keep both function definitions apart.
Unfortunately you can't in C.
The only way to do such thing is with assemply.