Suppose you have a function that makes several read accesses to a shared variable whose access is atomic. Everything runs in the same process; imagine the readers as threads of a process, or as software running on a bare-metal platform with no MMU.
As a requirement, you must ensure that the value read is consistent for the whole length of the function, so the code must not re-read the memory location; it has to keep the value in a local variable or in a register. How can we ensure that this behaviour is respected?
As an example...
shared is the only shared variable
extern uint32_t a, b, shared;

void useless_function()
{
    __ASM volatile ("" ::: "memory");
    uint32_t value = shared;
    a = value * 2;
    b = value << 3;
}
Can value be optimized away and replaced by direct reads of the shared variable in some contexts? If yes, how can I be sure this cannot happen?
As a requirement you must ensure that the value read is consistent for the whole length of the function, so the code must not re-read the memory location and has to keep the value in a local variable or in a register. How can we ensure that this behaviour is respected?
You can do that with the READ_ONCE macro from the Linux kernel:
/*
* Prevent the compiler from merging or refetching reads or writes. The
* compiler is also forbidden from reordering successive instances of
* READ_ONCE and WRITE_ONCE, but only when the compiler is aware of some
* particular ordering. One way to make the compiler aware of ordering is to
* put the two invocations of READ_ONCE or WRITE_ONCE in different C
* statements.
*
* These two macros will also work on aggregate data types like structs or
* unions. If the size of the accessed data type exceeds the word size of
* the machine (e.g., 32 bits or 64 bits) READ_ONCE() and WRITE_ONCE() will
* fall back to memcpy(). There's at least two memcpy()s: one for the
* __builtin_memcpy() and then one for the macro doing the copy of variable
* - '__u' allocated on the stack.
*
* Their two major use cases are: (1) Mediating communication between
* process-level code and irq/NMI handlers, all running on the same CPU,
* and (2) Ensuring that the compiler does not fold, spindle, or otherwise
* mutilate accesses that either do not require ordering or that interact
* with an explicit memory barrier or atomic instruction that provides the
* required ordering.
*/
E.g.:
uint32_t value = READ_ONCE(shared);
The READ_ONCE macro essentially casts the object you read to volatile, because the compiler cannot emit extra reads or writes for volatile objects.
The above is equivalent to:
uint32_t value = *(uint32_t volatile*)&shared;
Alternatively:
uint32_t value;
memcpy(&value, &shared, sizeof value);
memcpy breaks the dependency between shared and value, so that the compiler cannot re-load shared instead of loading value.
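For illustration, here is the example rewritten around a minimal READ_ONCE-style helper. This is a sketch only: the macro name READ_ONCE_U32 is made up here, and it is not the kernel macro, which handles arbitrary types and sizes.
#include <stdint.h>

/* Sketch of a READ_ONCE-style helper (not the kernel macro): reading
 * through a pointer to volatile forces exactly one load of the object. */
#define READ_ONCE_U32(x) (*(volatile uint32_t *)&(x))

extern uint32_t a, b, shared;

void useless_function(void)
{
    uint32_t value = READ_ONCE_U32(shared); /* single load of shared */
    a = value * 2;                          /* both uses see the same value */
    b = value << 3;
}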
In the example as originally given, the variable value was not used in the function at all, so it would definitely have been optimised out.
Also, as mentioned in the comments, in a multitasking system the value of shared can change while the function is executing.
What I need is that shared is read only once, its local value kept for the whole length of the function and not re-evaluated
I would suggest something like this below.
extern uint32_t a, b, shared;

void useless_function()
{
    __ASM volatile ("" ::: "memory");
    uint32_t value = shared;
    a = value * 2;
    b = value << 3;
}
Here shared is read only once in the function. It will be read again on the next call of the function.
Related
I know that global variables are stored in memory and local variables are saved on the stack, and that the stack frame is destroyed after each use.
But I do not know how an array is saved in memory.
I tried to declare a global array:
int tab[5]={10,9,12,34,30};
and at the end I read the contents of the memory; I mean that after the execution of the code, I read the contents of the memory (e.g. I'm working on a microcontroller, and I know where the data is saved). When I declare a global variable, for example a = 10, and then read the contents of the memory, I find the value 10; but I do not find the contents of the array, which are 10, 9, 12, 34, 30.
I want to understand where the array contents are saved in memory.
I work on an Infineon Aurix, use Hightec as the compiler, and execute my code directly on the Aurix. I read the memory like this:
#include <stddef.h>

const volatile unsigned char *mem_start = (const volatile unsigned char *)0xd0000000;
#define size ((ptrdiff_t) 0xfff)
unsigned char bufferf[size];

int main(void) {
    code();
    ...
    for (int e = 0; e < sizeof(bufferf); e++)
        bufferf[e] = *(mem_start + e); // read memory
}
The answer to the question of where arrays, or indeed any variables, are stored depends on whether you are considering the abstract machine or the real hardware. The C standard specifies how things work on the abstract machine, but a conforming compiler is free to do something else on the real hardware (e.g., due to optimization) if the observable effects are the same.
(On the abstract machine, arrays are generally stored in the same place as other variables declared in the same scope would be.)
For example, a variable might be placed in a register instead of the stack, or it might be optimized away completely. As a programmer you generally shouldn't care about this, since you can also just consider the abstract machine. (Admittedly there may be some cases where you do care about it, and on microcontrollers with very limited RAM one reason might be that you have to be very frugal about using the stack, etc.)
As for your code for reading the memory: it cannot possibly work. If size is the size of the memory available to variables, you cannot fit the array bufferf[size] in that memory together with everything else.
Fortunately, copying the contents of the memory to a separate buffer is not needed. Consider your line bufferf[e] = *(mem_start + e) – since you can already read an arbitrary index e from memory at mem_start, you can use *(mem_start + e) (or, better, mem_start[e], which is exactly equivalent) directly everywhere you would use bufferf[e]! Just treat mem_start as pointing to the first element of an array of size bytes.
Also note that if you are programmatically searching for the contents of the array tab, its elements are ints, so they are more than one byte each – you won't simply find five adjacent bytes with those values.
(Then again, you can also just take the address of tab[0] and find out where it is stored, i.e., ((unsigned char *) tab) - mem_start is the index of tab in mem_start. However, here observing it may change things due to the aforementioned optimization.)
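For example, assuming some form of printf output is available on the target, a sketch along these lines (the function name dump_tab_bytes is made up) would show where tab ends up and what its raw bytes look like:
#include <stdio.h>
#include <stddef.h>

extern int tab[5]; /* the global array in question */

void dump_tab_bytes(void)
{
    const unsigned char *mem_start = (const unsigned char *)0xd0000000;
    const unsigned char *p = (const unsigned char *)tab;

    /* Offset of tab within the memory region (only meaningful if the
     * linker really placed tab in that region). */
    printf("tab is at %p, offset 0x%lx from mem_start\n",
           (void *)tab, (unsigned long)(p - mem_start));

    /* Raw bytes: each int occupies sizeof(int) bytes. */
    for (size_t i = 0; i < sizeof tab; i++)
        printf("%02x ", (unsigned)p[i]);
    printf("\n");
}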
global variable is stored in memory, and the local variable is saved in the stack
This is false.
Global variables are sometimes kept in registers, not only in static memory.
If a function is recursive, the compiler may or may not use the stack. If the function is tail recursive there is no need to use the stack, because there is no continuation after the call returns and the current frame can be reused for the next call.
There are mechanical methods to convert code into continuation-passing style, and in this equivalent form it can be evaluated without a stack.
There are many mathematical models of computation, and not all of them use a stack. Code can be converted from one model to another while keeping the same result after evaluation.
This information comes from books written in the 70s and 80s; in the meantime the process of evaluating code has improved considerably (using methods that were theoretical in the 1930s but are implemented in real systems today).
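As a small illustration of the tail-call point (a sketch; whether the call is actually turned into a jump depends on the compiler and the optimization level, e.g. gcc -O2):
/* Tail-recursive sum: the recursive call is the last thing the function
 * does, so an optimizing compiler can reuse the current frame (compile
 * the call into a jump) instead of growing the stack. */
static unsigned long sum_to(unsigned long n, unsigned long acc)
{
    if (n == 0)
        return acc;
    return sum_to(n - 1, acc + n);
}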
This program may get you closer to what you want:
int tab[] = {10, 9, 12, 34, 30};

// Used to get the number of elements in an array
#define ARRAY_SIZE(array) \
    (sizeof(array) / sizeof(array[0]))

// Used to suppress compiler warnings about unused variables
#define UNUSED(x) \
    (void)(x)

int main(void) {
    int *tab_ptr;
    int tab_copy[ARRAY_SIZE(tab)];

    tab_ptr = tab;
    UNUSED(tab_ptr);

    for (int index = 0; index < ARRAY_SIZE(tab_copy); index++) {
        tab_copy[index] = tab[index];
    }

    return 0;
}
I don't have the Hightec environment to test this, nor do I have an Aurix to run it on. However, this may give you a starting point that helps you obtain the information you desire.
It looks like the Infineon Aurix is a 32-bit little-endian machine. This means that an int will be 32 bits wide and the lowest-addressed byte will hold the least significant 8 bits of the value. For example, the value 10 would be stored like this:
0a 00 00 00
If you try to read this memory as char, you will get 0x0a, 0x00, 0x00, and 0x00.
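If you want to see this byte layout for yourself, a small sketch like the following (assuming printf output is available in your setup) dumps the bytes of an int:
#include <stdio.h>

int main(void)
{
    int x = 10;
    const unsigned char *p = (const unsigned char *)&x;

    /* On a 32-bit little-endian target this prints: 0a 00 00 00 */
    for (unsigned i = 0; i < sizeof x; i++)
        printf("%02x ", (unsigned)p[i]);
    printf("\n");
    return 0;
}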
I'm writing code for a Cortex-M0 CPU with gcc. I have the following structure:
struct {
    volatile unsigned flag1: 1;
    unsigned flag2: 1;
    unsigned foo; // something else accessed in main loop
} flags;
flag1 is read and written from both GPIO interrupt handler and main loop. flag2 is only read and written in main loop.
The ISR looks like this:
void handleIRQ(void) {
    if (!flags.flag1) {
        flags.flag1 = 1;
        // enable some hw timer
    }
}
The main loop looks like this:
for (;;) {
    // disable IRQ
    if (flags.flag1) {
        // handle IRQ
        flags.flag1 = 0;
        // access (rw) flag2 many times
    }
    // wait for interrupt, enable IRQ
}
When accessing flag2 in the main loop, will the compiler optimize access to it so that it won't be fetched from or stored to memory every time it is read or written in the code?
It's not clear to me, because to set flag1 in the ISR, the compiler will need to load the whole storage unit, set the bit, and store it back.
My reading of the C11 standard is that it is not proper to use a bit-field for this, even if both fields were declared volatile. The following excerpt is from 3.14 Memory location:
Memory location
Either an object of scalar type, or a maximal sequence of adjacent bit-fields all having nonzero width
NOTE 1 Two threads of execution can update and access separate memory locations without interfering with each other.
NOTE 2 It is not safe to concurrently update two non-atomic bit-fields in the same structure if all
members declared between them are also (non-zero-length) bit-fields, no matter what the sizes of those
intervening bit-fields happen to be.
There is no exception given for volatile. Thus it wouldn't be safe to use the above bit-field if the two threads of execution (i.e. the main loop and the ISR) each update one of the flags. The solution given is to add an unnamed bit-field member of width 0 in between, which forces the flags into different memory locations. But then each flag would consume at least one byte of memory anyway, so it is simpler to just use a non-bit-field unsigned char or bool for them:
struct {
    volatile bool flag1;
    bool flag2;
    unsigned foo; // something else accessed in main loop
} flags;
Now they will be placed in different memory locations and they can be updated without them interfering with each other.
However, volatile on flag1 is still strictly necessary, because otherwise updates to flag1 would appear side-effect free in the main thread, and the compiler could deduce that it can keep that field in a register only, or that nothing needs to be updated at all.
However, one needs to note that under C11, even the guarantees of volatile might not be enough: 5.1.2.3p5:
When the processing of the abstract machine is interrupted by receipt of a signal, the values of objects that are neither lock-free atomic objects nor of type volatile sig_atomic_t are unspecified, as is the state of the floating-point environment. The value of any object modified by the handler that is neither a lock-free atomic object nor of type volatile sig_atomic_t becomes indeterminate when the handler exits, as does the state of the floating-point environment if it is modified by the handler and not restored to its original state.
Thus, if full conformance is required, flag1 ought to be, for example, of type volatile _Atomic bool; it might even be possible to use an _Atomic bit-field. Both of these require a C11 compiler, however.
Then again, you can check your compiler's manual to see whether it guarantees that accesses to such volatile objects are also atomic.
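As a sketch of the C11 variant (untested on the specific Cortex-M0 toolchain in question, and assuming plain loads and stores of a bool-sized atomic are lock-free there):
#include <stdatomic.h>
#include <stdbool.h>

/* flag1 is shared with the ISR; plain reads and writes of an _Atomic
 * object are themselves atomic operations. */
static volatile _Atomic bool flag1;

void handleIRQ(void)
{
    if (!flag1) {       /* atomic load */
        flag1 = true;   /* atomic store */
        /* enable some hw timer */
    }
}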
The volatile qualifier on just one bit isn't all that meaningful, and is possibly even harmful. What the compiler might do in practice is allocate two chunks of memory, possibly each 32 bits wide, because the volatile qualifier prevents it from combining the two bits into the same allocated unit when there is no bit-level access instruction available.
When accessing flag2 in the main loop, will the compiler optimize access to it so that it won't be fetched from or stored to memory every time it is read or written in the code?
That's hard to tell; it depends on how many data registers are available. Disassemble the code and see.
Overall, bit-fields are not recommended since they are so poorly defined by the standard. And in this case, the individual volatile bit might lead to extra memory getting allocated.
Instead, you should do this:
volatile bool flag1;
bool flag2;
Assuming those flags aren't part of a hardware register, in which case the code was incorrect from the start and they should both be volatile.
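Put together, a sketch of how the recommended declarations fit the original ISR and main loop (the IRQ disable/enable and wait-for-interrupt steps are left as comments, as in the question):
#include <stdbool.h>

volatile bool flag1;  /* shared with the GPIO interrupt handler */
bool flag2;           /* touched only by the main loop */

void handleIRQ(void)
{
    if (!flag1) {
        flag1 = true;
        /* enable some hw timer */
    }
}

int main(void)
{
    for (;;) {
        /* disable IRQ */
        if (flag1) {          /* volatile: re-read from memory every pass */
            flag1 = false;
            /* access (rw) flag2 many times; the compiler may keep it in
               a register because it is not volatile */
        }
        /* wait for interrupt, enable IRQ */
    }
}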
Code Snippet:
int secret_foo(void)
{
    int key = get_secret();
    /* use the key to do highly privileged stuff */
    /* ... */

    /* Need to clear the value of key on the stack before exit */
    key = 0;
    /* Any half decent compiler would probably optimize out the statement above */
    /* How can I convince it not to do that? */
    return result;
}
I need to clear the value of a variable key from the stack before returning (as shown in the code).
In case you are curious, this was an actual customer requirement (embedded domain).
You can use volatile (emphasis mine):
Every access (both read and write) made through an lvalue expression of volatile-qualified type is considered an observable side effect for the purpose of optimization and is evaluated strictly according to the rules of the abstract machine (that is, all writes are completed at some time before the next sequence point). This means that within a single thread of execution, a volatile access cannot be optimized out or reordered relative to another visible side effect that is separated by a sequence point from the volatile access.
volatile int key = get_secret();
volatile might sometimes be overkill, as it would also affect all other uses of the variable.
Use memset_s (since C11): http://en.cppreference.com/w/c/string/byte/memset
memset may be optimized away (under the as-if rules) if the object modified by this function is not accessed again for the rest of its lifetime. For that reason, this function cannot be used to scrub memory (e.g. to fill an array that stored a password with zeroes). This optimization is prohibited for memset_s: it is guaranteed to perform the memory write.
int secret_foo(void)
{
    int key = get_secret();
    /* use the key to do highly privileged stuff */
    /* ... */

    memset_s(&key, sizeof(int), 0, sizeof(int));
    return result;
}
You can find other solutions for various platforms/C standards here: https://www.securecoding.cert.org/confluence/display/c/MSC06-C.+Beware+of+compiler+optimizations
Addendum: have a look at this article Zeroing buffer is insufficient which points out other problems (besides zeroing the actual buffer):
With a bit of care and a cooperative compiler, we can zero a buffer — but that's not what we need. What we need to do is zero every location where sensitive data might be stored. Remember, the whole reason we had sensitive information in memory in the first place was so that we could use it; and that usage almost certainly resulted in sensitive data being copied onto the stack and into registers.
Your key value might have been copied into another location (like a register or temporary stack/memory location) by the compiler and you don't have any control to clear that location.
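Where memset_s is not available, a commonly used portable fallback (a sketch in the spirit of the solutions on the CERT page linked above, and not a guarantee against copies the compiler may have made elsewhere) is to write through a pointer to volatile so the stores cannot be proven dead:
#include <stddef.h>

/* Scrub n bytes at p; the volatile-qualified pointer prevents the
 * compiler from eliminating the stores as dead writes. */
static void secure_zero(void *p, size_t n)
{
    volatile unsigned char *vp = (volatile unsigned char *)p;
    while (n--)
        *vp++ = 0;
}
In the function above it would then be called as secure_zero(&key, sizeof key); just before the return.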
If you go with dynamic allocation you can control wiping that memory and not be bound by what the system does with the stack.
int secret_foo(void)
{
    int *key = malloc(sizeof(int));
    *key = get_secret();
    // ... other magical things done with the key ...
    memset(key, 0, sizeof(int));
    free(key);
    return result;
}
One solution is to disable compiler optimizations for the section of the code that you don't want optimized:
int secret_foo(void) {
    int key = get_secret();
#pragma GCC push_options
#pragma GCC optimize ("O0")
    key = 0;
#pragma GCC pop_options
    return result;
}
I was asked this question in an interview, and I have no experience with this kind of thing.
Suppose we have two registers, one at address 0x11111111 and the other at 0x22222222, and we want to read and write them. The first one is a 32-bit register while the second one is 64-bit. How do we do that in C? Can anyone give me an example?
You can use some kind of pointer or other, for example:
#include <stdint.h>
uint32_t volatile * p = (uint32_t volatile *) 0x11111111;
uint64_t volatile * q = (uint64_t volatile *) 0x22222222;
++*p; // read-modify-write
(Note that this specific example is almost certainly bogus, since neither address seems to be aligned properly for the respective type.)
As you say, qualifying the pointers as volatile is necessary if the values stored at those addresses can change from outside your program; with volatile you tell the compiler that no assumptions may be made about the value (e.g. constant propagation or common subexpression elimination may not be done for volatile values).
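For completeness, here is a sketch showing both a read and a write of each register; the addresses below are made up and properly aligned, unlike the ones in the question:
#include <stdint.h>

/* Hypothetical, properly aligned register addresses for illustration. */
#define REG32 (*(volatile uint32_t *)0x40000000u)
#define REG64 (*(volatile uint64_t *)0x40000008u)

void example(void)
{
    uint32_t v32 = REG32;     /* read the 32-bit register */
    REG32 = v32 | 0x1u;       /* write it back with bit 0 set */

    uint64_t v64 = REG64;     /* read the 64-bit register */
    REG64 = v64 & ~0xFFull;   /* write it back with the low byte cleared */
}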
I am writing a program for ARM in a Linux environment. It's not a low-level program; it's at the application level.
Can you clarify for me what the difference is between
int iData;
vs
volatile int iData;
Does it have a hardware-specific impact?
Basically, volatile tells the compiler "the value here might be changed by something external to this program".
It's useful when you're (for instance) dealing with hardware registers, that often change "on their own", or when passing data to/from interrupts.
The point is that it tells the compiler that each access of the variable in the C code must generate a "real" access to the relevant address; the value can't be buffered or held in a register, since then you wouldn't "see" changes made by external parties.
For regular application-level code, volatile should never be needed unless (of course) you're interacting with something a lot lower-level.
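A minimal sketch of the kind of lower-level situation meant here (the variable and function names are made up): without volatile, the compiler could legally load data_ready once and then spin forever on the value cached in a register.
#include <stdbool.h>

volatile bool data_ready;  /* set from an interrupt handler or another context */

void wait_for_data(void)
{
    /* Because data_ready is volatile, each iteration re-reads it from
     * memory instead of reusing a value cached in a register. */
    while (!data_ready)
        ;
}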
The volatile keyword specifies that the variable can be modified at any moment by something other than the program.
If we are talking about embedded systems, that can be e.g. a hardware status register, whose value may be modified by the hardware at any unpredictable moment.
That is why, from the compiler's point of view, the compiler is forbidden from applying optimizations that cache this variable's value: any assumption about it would be wrong and could cause unpredictable results during the program's execution.
By making a variable volatile, you force the compiler to fetch it from memory every time it is accessed, rather than reusing a copy it has already loaded into a register. This matters when something outside the normal flow of the code, such as another thread or an interrupt handler, can change the variable: without volatile, a stale value could be reused. Note, however, that volatile by itself does not make accesses atomic or provide ordering between threads; proper synchronization or atomics are still needed for that.
Generally speaking, the volatile keyword is intended to prevent the compiler from applying any optimizations on the code that assume values of variables cannot change "on their own."
(from Wikipedia)
Now, what does this mean?
If you have a variable whose contents could change at any time, usually because another thread (or an interrupt, or a piece of hardware) acts on it while your main code is referencing it, you may wish to mark it as volatile. Normally a variable's contents can be depended on within the scope and context in which it is used; but if code outside your scope or thread also affects the variable, your program needs to expect this and query the variable's true contents whenever necessary, rather than relying on a previously fetched value.
This is a simplification of what is going on, of course, but I doubt you will need to use volatile in most programming tasks.
In the following example, global_data is not explicitly modified anywhere in the program, so when optimizing, the compiler assumes it is never going to be modified, effectively treats it as 0, and uses 0 wherever global_data is used.
But global_data can actually be updated through some other mechanism (say through ptrace). By using volatile you force the compiler to always read it from memory, so you get the updated value.
#include <stdio.h>

volatile int global_data = 0;

int main(void)
{
    printf("\n Address of global_data: %p \n", (void *)&global_data);
    while (1)
    {
        if (global_data == 0)
        {
            continue;
        }
        else if (global_data == 2)
        {
            break;
        }
    }
    return 0;
}
The volatile keyword can be used when the object is a memory-mapped I/O port.
An 8-bit memory-mapped I/O port at physical address 0x15 can be declared as
char *const ptr = (char *) 0x15;
Suppose that we want to change the value at that port at periodic intervals.
*ptr = 0;
while (*ptr) {
    *ptr = 4;  // Setting a value
    *ptr = 0;  // Clearing after setting
}
It may get optimized to
*ptr = 0;
while (0) {
}
volatile suppresses this compiler optimization: the compiler assumes that the value can be changed at any time, even if no explicit code modifies it.
volatile char *const ptr = (volatile char *) 0x15;
volatile is also used when the object is modified by an ISR. Sometimes an ISR may change values used in the mainline code:
static int num;

void interrupt(void) {
    ++num;
}

int main(void) {
    int val;

    val = num;
    while (val != num)
        val = num;

    return val;
}
Here the compiler performs some optimizations on the while statement: it produces code in which the value of num is always read from a CPU register instead of from memory, so the while condition is always false. But in the actual scenario the value of num may be changed in the ISR, and that change is reflected in memory. If the variable is declared volatile, the compiler knows that the value must always be read from memory.
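In other words, the fix for the sketch above is a one-line change to the declaration:
static volatile int num;  /* every read of num in main() now goes to memory */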
volatile means that the variable's value could be changed at any time by an external source. In GCC, if we don't use volatile, the compiler optimizes the code, which sometimes gives unwanted behaviour.
For example, if we try to get the time from an external real-time clock and we don't use volatile, the compiler may keep showing the value already held in a CPU register, so the code will not work the way we want. If we use the volatile keyword, the clock is read again every time, which serves our purpose.
But since you said you are not dealing with any low-level hardware programming, I don't think you need to use volatile anywhere.