Erasing sensitive information from memory - c

After reading this question I'm curious how one would do this in C. When receiving the information from another program, we probably have to assume that the memory is writable.
I have found this stating that a regular memset may be optimized out, and this comment stating that memset is the wrong way to do it.

The example you have provided is not quite valid: the compiler can optimize out a variable setting operation when it can detect that there are no side effects and the value is no longer used.
So, if your code uses some shared buffer, accessible from multiple locations, the memset would work fine. Almost.
Different processors use different caching policies, so you might have to use memory barriers to ensure the data (zeros) has reached the memory chip from the cache.
So, if you are not worried about hardware-level details, making sure the compiler can't optimize out the operation is sufficient. For example, a memset of a block just before releasing it would normally be executed, although an aggressive optimizer may still treat it as a dead store and remove it.
If you want to ensure the data is removed from all hardware items, you need to check how data caching is implemented on your platform and use appropriate code to force a cache flush, which can be non-trivial on a multi-core machine.
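As a minimal sketch of "making sure the compiler can't optimize out the operation", the buffer can be cleared through a volatile-qualified pointer. The name secure_zero is hypothetical, not a standard function. Where the platform provides one, a dedicated routine such as explicit_bzero (glibc, BSDs) or the optional C11 memset_s may be preferable. This only addresses the compiler; it says nothing about caches or copies held elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a standard function): zero a buffer through a
 * volatile-qualified pointer, so that every store is an access the
 * compiler is not supposed to discard as dead. */
static void secure_zero(void *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    volatile unsigned char *p = (volatile unsigned char *)buf;
    while (len-- > 0) {
        *p++ = 0;
    }
}
```

A caller would typically invoke it right before the buffer goes out of scope or is freed, for example secure_zero(password, sizeof password).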

Related

Does labelling a block of memory volatile imply the cache is always bypassed? [duplicate]

The cache is controlled by cache hardware transparently to the processor, so if we use volatile variables in a C program, how is it guaranteed that my program reads data each time from the actual memory address specified and not from the cache?
My understanding is that,
Volatile keyword tells compiler that the variable references shouldn't be optimized and should be read as programmed in the code.
Cache is controlled by cache hardware transparently, hence when processor issues an address, it doesn't know whether the data is coming from cache or the memory.
So, if I have a requirement to read a memory address every time it is needed, how can I make sure that it is read from the required address and not from the cache?
Somehow, these two concepts are not fitting together well. Please clarify how it's done.
(Imagining we have write-back policy in cache (if required for analyzing the problem))
Thank you,
Microkernel :)
Firmware developer here. This is a standard problem in embedded programming, and one that trips up many (even very experienced) developers.
My assumption is that you are attempting to access a hardware register, and that register value can change over time (be it interrupt status, timer, GPIO indications, etc.).
The volatile keyword is only part of the solution, and in many cases may not be necessary. This causes the variable to be re-read from memory each time it is used (as opposed to being optimized out by the compiler or stored in a processor register across multiple uses), but whether the "memory" being read is an actual hardware register versus a cached location is unknown to your code and unaffected by the volatile keyword. If your function only reads the register once then you can probably leave off volatile, but as a general rule I will suggest that most hardware registers should be defined as volatile.
The bigger issue is caching and cache coherency. The easiest approach here is to make sure your register is in uncached address space. That means every time you access the register you are guaranteed to read/write the actual hardware register and not cache memory. A more complex but potentially better performing approach is to use cached address space and have your code manually force cache updates for specific situations like this. For both approaches, how this is accomplished is architecture-dependent and beyond the scope of the question. It could involve MTRRs (for x86), MMU, page table modifications, etc.
Hope that helps. If I've missed something, let me know and I'll expand my answer.
From your question there is a misconception on your part.
Volatile keyword is not related to the cache as you describe.
When the keyword volatile is specified for a variable, it gives a hint to the compiler not to do certain optimizations as this variable can change from other parts of the program unexpectedly.
What is meant here, is that the compiler should not reuse the value already loaded in a register, but access the memory again as the value in register is not guaranteed to be the same as the value stored in memory.
The rest concerning the cache memory is not directly related to the programmer.
I mean the synchronization of any cache memory of CPU with the RAM is an entirely different subject.
My suggestion is to mark the page as non-cached by the virtual memory manager.
In Windows, this is done through setting PAGE_NOCACHE when calling VirtualProtect.
For a somewhat different purpose, the SSE 2 instructions have the _mm_stream_xyz instructions to prevent cache pollution, although I don't think they apply to your case here.
In either case, there is no portable way of doing what you want in C; you have to use OS functionality.
Wikipedia has a pretty good article about MTRR (Memory Type Range Registers) which apply to the x86 family of CPUs.
To summarize: starting with the Pentium Pro, Intel (and AMD, which copied the design) provided these MTRRs, which can set uncached, write-through, write-combining, write-protect or write-back attributes on ranges of memory.
Starting with the Pentium III (but, as far as I know, only really useful on the 64-bit processors), the MTRRs are still honored but can be overridden by the Page Attribute Table, which lets the CPU set a memory type for each page of memory.
A major use of the MTRRs that I know of is graphics RAM. It is much more efficient to mark it as write-combining. This lets the cache store up the writes, and it relaxes all of the memory write ordering rules to allow very high-speed burst writes to a graphics card.
But for your purposes you would want either an MTRR or a PAT setting of either uncached or write-through.
As you say cache is transparent to the programmer. The system guarantees that you always see the value that was last written to if you access an object through its address. The "only" thing that you may incur if an obsolete value is in your cache is a runtime penalty.
volatile makes sure that data is read every time it is needed, but it takes no account of any cache between the CPU and memory. If you need to read the actual data from memory and not cached data, you have two options:
Make a board where said data is not cached. This may already be the case if you address some I/O device,
Use specific CPU instructions that bypass the cache. This is used when you need to scrub memory to expose possible SEU errors.
The details of the second option depend on the OS and/or CPU.
Using the _Uncached keyword may help in an embedded OS such as MQX:
#define MEM_READ(addr) (*((volatile _Uncached unsigned int *)(addr)))
#define MEM_WRITE(addr,data) (*((volatile _Uncached unsigned int *)(addr)) = (data))

What are the good implementation practices to minimize RAM consumption

I run C code on an ARM-based Linux device that has very little RAM (16MB). My program is often killed (SIGKILL) by the kernel with an 'out of memory' message. I ran the program under Valgrind, and it does not look like there is a memory leak. I ran it under gdb as well but could not identify any mistake in the code. I will try to optimize my code by going through it several more times.
In general, what are good implementation practices for minimizing memory usage?
one might be to use functions as much as possible(?), but I guess gcc already optimizes the code to decrease the source usage.
to avoid dynamic memory allocations
what else?
Be careful about the scope of objects, and make sure you deallocate memory once an object is no longer needed. I'm not sure I understand your suggestion to use functions as much as possible(?). Functions carry overhead: every call takes up a little extra memory, because a few pointers and a little information about the call have to be stored on the call stack. So, while that may help keep your source code clean, it won't lower your memory usage (it'll probably increase it). One way to get the best of both worlds in C is to use inline functions, which suggest to the compiler that it should not create an actual function call but rather insert that block of code wherever it is used. Keep in mind that efficient code usually has a more machine-level look to it (meaning repetition, pointers, and often developer-managed array indices) rather than relying on broad-purpose, function-heavy objects. Thank goodness for smart compilers, so you don't have to know every optimization. However, in a lower-level language like C, which gives you so much ability to manipulate everything, you need to be careful not to make costly mistakes.
If you have this kind of problem on Linux you can disable memory overcommit. That makes sure that all allocated memory is backed by physical memory, so the kernel will be less likely to kill your program. Then be sure to test the result of every malloc, because allocations will start to fail at some point when you run out of memory. You can find more information here: http://www.etalabs.net/overcommit.html
You can also disable some programs on your embedded system to free memory. Maybe you don't use cron or don't need six TTYs at startup.
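To illustrate "test the result of all mallocs", here is a small sketch. The helper name copy_buffer and its return convention (0 on success, -1 on allocation failure) are assumptions made for this example, not part of any library.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate `len` bytes from `src` into a freshly
 * allocated block. Returns 0 and stores the block in *out on success,
 * or returns -1 (with *out set to NULL) if malloc fails. */
static int copy_buffer(const void *src, size_t len, void **out)
{
    assert(src != NULL && out != NULL);

    void *p = malloc(len > 0 ? len : 1);   /* avoid a zero-size request */
    if (p == NULL) {
        *out = NULL;
        return -1;                         /* let the caller degrade gracefully */
    }
    memcpy(p, src, len);
    *out = p;
    return 0;
}
```

With overcommit disabled, the -1 path is the one the program will actually take when memory runs out, so the caller should have a real recovery strategy there (free caches, refuse the request, and so on) rather than just logging.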

multithreaded C/C++ variable no cache (Linux)

I use 2 pthreads, where one thread "notifies" the other one of an event, and for that there is a variable ( normal integer ), which is set by the second thread.
This works, but my question is, is it possible that the update is not seen immediately by the first (reading) thread, meaning the cache is not updated directly? And if so, is there a way to prevent this behaviour, e.g. like the volatile keyword in java?
(the frequency which the event occurs is approximately in microsecond range, so more or less immediate update needs to be enforced).
/edit: 2nd question: is it possible to enforce that the variable is held in the cache of the core where thread 1 runs, since that thread is reading it all the time?
It sounds to me as though you should be using a pthread condition variable as your signaling mechanism. This takes care of all the issues you describe.
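A rough sketch of that approach, assuming a simple counted event: the struct event type and the event_* function names are invented for this example. The mutex and condition variable calls are standard POSIX threads; older toolchains may need the -pthread flag when compiling and linking.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical event object: a counter protected by a mutex, with a
 * condition variable to wake the waiting thread. */
struct event {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             pending;   /* number of posted, unconsumed events */
};

static void event_init(struct event *ev)
{
    assert(ev != NULL);
    pthread_mutex_init(&ev->lock, NULL);
    pthread_cond_init(&ev->cond, NULL);
    ev->pending = 0;
}

/* Called by the notifying thread. */
static void event_post(struct event *ev)
{
    pthread_mutex_lock(&ev->lock);
    ev->pending++;
    pthread_cond_signal(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}

/* Called by the waiting thread; blocks until an event is available. */
static void event_wait(struct event *ev)
{
    pthread_mutex_lock(&ev->lock);
    while (ev->pending == 0) {             /* loop guards against spurious wakeups */
        pthread_cond_wait(&ev->cond, &ev->lock);
    }
    ev->pending--;
    pthread_mutex_unlock(&ev->lock);
}

/* Thread entry point used to demonstrate a post from another thread. */
static void *notifier(void *arg)
{
    event_post((struct event *)arg);
    return NULL;
}
```

Because the counter is only touched with the mutex held, the waiting thread cannot miss an event that was posted before it started waiting, and no volatile qualifier is needed.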
It may not be immediately visible by the other processors but not because of cache coherence. The biggest problems of visibility will be due to your processor's out-of-order execution schemes or due to your compiler re-ordering instructions while optimizing.
In order to avoid both these problems, you have to use memory barriers. I believe that most pthread primitives are natural memory barriers which means that you shouldn't expect loads or stores to be moved beyond the boundaries formed by the lock and unlock calls. The volatile keyword can also be useful to disable a certain class of compiler optimizations that can be useful when doing lock-free algorithms but it's not a substitute for memory barriers.
That being said, I recommend you don't do this manually, as there are quite a few pitfalls associated with lock-free algorithms. Leaving these headaches to library writers should make you a happier camper (unless you're like me and you love headaches :) ). So my final recommendation is to ignore everything I said and use what vromanov or David Heffman suggested.
The most appropriate way to pass a signal from one thread to another should be to use the runtime library's signalling mechanisms, such as mutexes, condition variables, semaphores, and so forth.
If these have too high an overhead, my first thought would be that there was something wrong with the structure of the program. If it turned out that this really was the bottleneck, and restructuring the program was inappropriate, then I would use atomic operations provided by the compiler or a suitable library.
Using plain int variables, or even volatile-qualified ones is error prone, unless the compiler guarantees they have the appropriate semantics. e.g. MSVC makes particular guarantees about the atomicity and ordering constraints of plain loads and stores to volatile variables, but gcc does not.
A better way is to use atomic variables. For example, you can use libatomic. The volatile keyword is not enough.

Concurrent access to struct member

I'm using 32-bit microcontroller (STR91x). I'm concurrently accessing (from ISR and main loop) struct member of type enum. Access is limited to writing to that enum field in the ISR and checking in the main loop. Enum's underlying type is not larger than integer (32-bit).
I would like to make sure that I'm not missing anything and I can safely do it.
Provided that 32 bit reads and writes are atomic, which is almost certainly the case (you might want to make sure that your enum's word-aligned) then that which you've described will be just fine.
As paxdiablo & David Knell said, generally speaking this is fine. Even if your bus is < 32 bits, chances are the instruction's multiple bus cycles won't be interrupted, and you'll always read valid data.
What you stated, and what we all know, but it bears repeating, is that this is fine for a single-writer, N-reader situation. If you had more than one writer, all bets are off unless you have a construct to protect the data.
If you want to make sure, find the compiler switch that generates an assembly listing and examine the assembly for the write in the ISR and the read in the main loop. Even if you are not familiar with ARM assembly, you should be able to tell quickly whether or not the reads and writes are atomic.
ARM supports 32-bit aligned reads that are atomic as far as interrupts are concerned. However, make sure your compiler doesn't try to cache the value in a register! Either mark it as a volatile, or use an explicit memory barrier - on GCC this can be done like so:
int tmp = yourvariable;
__sync_synchronize(yourvariable);
Note, however, that current versions of GCC perform a full memory barrier for __sync_synchronize, rather than just for the one variable, so volatile is probably better for your needs.
Further, note that your variable will be aligned automatically unless you are doing something weird (i.e., explicitly specifying the location of the struct in memory, or requesting a packed struct). Unaligned variables on ARM cannot be read atomically, so make sure it's aligned, or disable interrupts while reading.
Well, it depends entirely on your hardware but I'd be surprised if an ISR could be interrupted by the main thread.
So probably the only thing you have to watch out for is if the main thread could be interrupted halfway through a read (so it may get part of the old value and part of the new).
It should be a simple matter of consulting the specs to ensure that interrupts are only processed between instructions (this is likely since the alternative would be very complex) and that your 32-bit load is a single instruction.
An aligned 32 bit access will generally be atomic (unless it were a particularly ludicrous compiler!).
However the rock-solid solution (and one generally applicable to non-32 bit targets too) is to simply disable the interrupt temporarily while accessing the data outside of the interrupt. The most robust way to do this is through an access function to statically scoped data rather than making the data global where you then have no single point of access and therefore no way of enforcing an atomic access mechanism when needed.
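A sketch of that access-function pattern follows. The irq_disable and irq_enable functions here are stand-ins written for this example (they only count nesting); on real hardware they would be replaced by the vendor's or compiler's interrupt-control intrinsics. The enum and its values are likewise hypothetical.

```c
#include <assert.h>

/* Stand-ins for the platform's interrupt control (hypothetical names).
 * On a real target these would mask and unmask interrupts. */
static int irq_disable_depth;
static void irq_disable(void) { irq_disable_depth++; }
static void irq_enable(void)
{
    assert(irq_disable_depth > 0);
    irq_disable_depth--;
}

enum device_state { STATE_IDLE, STATE_BUSY, STATE_DONE };

/* Statically scoped: reachable only through the accessors below, so
 * there is a single place to enforce the access discipline. */
static volatile enum device_state current_state = STATE_IDLE;

/* Called from the ISR (the single writer). */
static void state_set_from_isr(enum device_state s)
{
    current_state = s;
}

/* Called from the main loop: read with interrupts masked. */
static enum device_state state_get(void)
{
    irq_disable();
    enum device_state s = current_state;
    irq_enable();
    return s;
}
```

In production code the getter would usually save and restore the previous interrupt state instead of unconditionally re-enabling, so that it stays correct when called with interrupts already masked.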

Is there a way to force a variable to be stored in the cache in C?

I just had a phone interview where I was asked this question. I am aware of ways to store in register or heap or stack, but cache specifically?
Not in C as a language. In GCC as a compiler - look for __builtin_prefetch.
You might be interested in reading What every programmer should know about memory.
Edit:
Just to clear some confusion - caches are physically separate memories in hardware, but not in software abstraction of the machine. A word in a cache is always associated with address in main memory. This is different from the CPU registers, which are named/addressed separately from the RAM.
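For reference, a small sketch of how __builtin_prefetch (a GCC and Clang builtin) might be used. The function name and the prefetch distance of 16 elements are arbitrary choices for illustration. The builtin is only a hint: it asks the CPU to start loading a line early and does not pin anything in the cache.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array while hinting that elements a little further ahead will
 * be read soon. Arguments to the builtin: address, 0 = read access,
 * 3 = high temporal locality. */
static long sum_with_prefetch(const int *data, size_t n)
{
    assert(data != NULL || n == 0);

    const size_t ahead = 16;   /* prefetch distance: a guess, tune by measuring */
    long total = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + ahead < n) {
            __builtin_prefetch(&data[i + ahead], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

For a simple linear scan like this the hardware prefetcher usually does the job already; explicit hints tend to pay off only for access patterns the hardware cannot predict.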
In C, as in as defined by the C standard? No.
In C, as in some specific implementation on a specific platform? Maybe.
As cache is a CPU concept and is meaningless to the C language (and C also targets processors that have no cache, which is unlikely today but was quite common in the old days), definitely no.
Trying to optimize such things by hand is also usually a quite bad idea.
What you can do is keep the job easy for the compiler by keeping loops very short and doing only one thing (good for the instruction cache), iterating over memory blocks in the right order (prefer accesses to consecutive cells in memory over sparse accesses), avoiding reuse of the same variables for different purposes (it introduces read-after-write dependencies), etc. If you are attentive to such details, the program is more likely to be optimized efficiently by the compiler and its memory accesses are more likely to be cached.
But it will still depend on actual hardware and even compiler may not guarantee it.
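The "right order" point can be sketched with a matrix stored as one flat, row-major array (the function name and layout are assumptions for this example). Walking it with the column index in the inner loop touches consecutive addresses, which is the pattern caches handle best; swapping the two loops would stride through memory by a whole row each step.

```c
#include <assert.h>
#include <stddef.h>

/* Sum a rows x cols matrix stored row by row in a flat array.
 * The inner loop advances through adjacent memory cells. */
static long sum_row_major(const int *m, size_t rows, size_t cols)
{
    assert(m != NULL || rows * cols == 0);

    long total = 0;
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            total += m[r * cols + c];
        }
    }
    return total;
}
```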
It depends on the platform, so if you were speaking to a company targeting current-generation consoles, you would need to know the PowerPC data cache intrinsics/instructions. On various platforms, you would also need to know the false sharing rules. Also, you can't cache from memory marked explicitly as uncached.
Without more context about the actual job or company or question, this would probably be best answered by talking about what not to do to keep memory references in the data cache.
If you are trying to force something to be stored in the CPU cache, I would recommend that you avoid trying to do so unless you have an overwhelmingly good reason. Manually manipulating the CPU cache can have all sorts of unintended consequences, not the least among them being coherency in multi-core or multi-CPU applications. This is something that is done by the CPU at run-time and is generally transparent to the programmer and the compiler for a good reason.
The specific answer will depend on your compiler and platform. If you are targeting a MIPS architecture, there is a CACHE instruction (assembly) which allows you to do CPU cache manipulations.
