Is CryptGenRandom() thread-safe? - c

Is CryptGenRandom() thread-safe with a single global program-wide HCRYPTPROV instance?
MSDN appears to lack any info on this: https://msdn.microsoft.com/en-us/library/windows/desktop/aa379942(v=vs.85).aspx
Creating a separate HCRYPTPROV per thread and destroying it again would significantly complicate matters (and also risk more security-relevant bugs on my side), so this would be really useful to know. Sharing one global HCRYPTPROV would be a lot easier for sure.
So does anyone here know about the thread-safety of CryptGenRandom(), particularly with a single HCRYPTPROV instance?

Creating a separate HCRYPTPROV per thread doesn't make much sense. In all current implementations the handle is a pointer to a heap-allocated memory block that mainly stores pointers to the CSP entry points used to call the actual provider implementation (CPGenRandom in our case). The handle itself does not contain CSP state, unlike, for example, an HCRYPTKEY, which contains the actual key state. So even if you create a separate HCRYPTPROV for every thread, this changes nothing.
The CSP may use some global variables or data internally during this call; this is unknown, however, since these would be implementation details. Of course, we can serialize calls to CryptGenRandom in our own code. However, we cannot prevent some other DLL in our process from also calling CryptGenRandom concurrently. So serializing all calls to CryptGenRandom is impossible as well.
As a result, I think CPGenRandom must be designed to be thread-safe, and in my tests with a well-known Microsoft CSP this is true. The function uses internal synchronization when it needs to access global data, and if multiple threads call CPGenRandom concurrently, every thread receives unique random data.
So my conclusion: CryptGenRandom is thread-safe, at least for all Microsoft CSPs.

Related

Calling convention which only allows one instance of a function at a time

Say I have multiple threads and all threads call the same function at approximately the same time.
Is there a calling convention which would only allow one instance of the function at any time? What I mean is that the function called by the second thread would only start after the function called by the first thread had returned.
Or are these calling conventions compiler specific? I don't have a whole lot of experience using them.
(Skip to the bottom if you don't care about the threading mumbo-jumbo)
As mentioned before, this is not a "calling convention" but a general problem of computing: concurrency. And the particular case where two or more threads can enter a shared zone at a time, and have a different outcome, is called a race condition (and also extends to/from electronics, and other areas).
The hard thing about threading is that computing is such a deterministic affair, but when threading gets involved, it adds a degree of uncertainty, which varies per platform/OS.
A single thread is guaranteed to do all its tasks in the same order, always. But when you have multiple threads, the order depends on how fast each one can complete a task and on other applications wanting to use the CPU, so the underlying hardware affects the results.
There's no single "sure-fire way to do threading"; rather, there are techniques, tools and libraries to deal with individual cases.
Locking in
The most well-known technique is using semaphores (or locks), and the most well-known kind is the mutex, which allows only one thread at a time to access a shared space, by having a sort of "flag" that is raised once a thread has entered.
if (locked == NO)
{
    locked = YES;
    // Do ya' thing
    locked = NO;
}
The code above, although it looks like it could work, does not guard against the case where both threads pass the if () and then both set the variable (which threads can easily do). So there's hardware support for this kind of operation, which guarantees that only one thread can execute it: the test-and-set operation, which checks the variable and, if it is available, sets it in a single atomic step.
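To make the test-and-set idea concrete, here is a minimal sketch of a spinlock built on C11's atomic_flag, assuming a compiler with <stdatomic.h> and POSIX threads. The names spin_lock, spin_unlock and run_spinlock_demo are made up for illustration; this is not meant as a production lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical names for illustration only. */
static atomic_flag spin_flag = ATOMIC_FLAG_INIT;
static long shared_counter = 0;

static void spin_lock(void)
{
    /* Atomically sets the flag and returns its previous value,
       so we keep looping while some other thread already held it. */
    while (atomic_flag_test_and_set_explicit(&spin_flag, memory_order_acquire))
        ; /* busy-wait */
}

static void spin_unlock(void)
{
    atomic_flag_clear_explicit(&spin_flag, memory_order_release);
}

static void *worker(void *arg)
{
    long iterations = *(long *)arg;
    for (long i = 0; i < iterations; i++) {
        spin_lock();
        shared_counter++;          /* critical section */
        spin_unlock();
    }
    return NULL;
}

/* Runs up to 16 workers and returns the final counter value. */
long run_spinlock_demo(int nthreads, long iterations)
{
    pthread_t threads[16];
    assert(nthreads > 0 && nthreads <= 16);
    shared_counter = 0;
    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, &iterations);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(threads[i], NULL);
    return shared_counter;
}
```

Unlike the if/set version, the check and the set cannot be separated here, so two threads can never both see the flag as free.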
In the same vein as locks and semaphores, there's also the read-write lock, which allows either multiple readers or a single writer at a time, especially useful for data with low volatility. There are many other variations, such as semaphores that admit up to a fixed number of threads.
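As a rough illustration of the read-write lock, here is a sketch using POSIX pthread_rwlock_t. The variable and function names (config_lock, read_config, write_config) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>

/* Hypothetical shared setting that is read often and written rarely. */
static pthread_rwlock_t config_lock = PTHREAD_RWLOCK_INITIALIZER;
static int config_value = 0;

/* Many threads may hold the read lock at the same time. */
int read_config(void)
{
    int value;
    int rc = pthread_rwlock_rdlock(&config_lock);
    assert(rc == 0);
    (void)rc;
    value = config_value;
    pthread_rwlock_unlock(&config_lock);
    return value;
}

/* The write lock is exclusive: it waits until no reader or writer holds the lock. */
void write_config(int value)
{
    int rc = pthread_rwlock_wrlock(&config_lock);
    assert(rc == 0);
    (void)rc;
    config_value = value;
    pthread_rwlock_unlock(&config_lock);
}
```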
But overall, locks are lame, since they are basically forcing serialisation of multi-threading, where threads actually need to get stuck trying to get a lock (or just testing it and leaving). Kinda defeats the purpose of having multiple threads, doesn't it?
The best solution in terms of threading is to minimise the amount of shared space that threads need to use, possibly eliminating it completely. Maybe use rwlocks when volatility is low, or try to have "try and leave" kinds of threads, which check whether the lock is available and go away if it isn't, etc.
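A "try and leave" thread could look roughly like this with POSIX pthread_mutex_trylock. The names work_lock, work_done and try_do_work are made up for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_mutex_t work_lock = PTHREAD_MUTEX_INITIALIZER;
static int work_done = 0;

/* Returns 1 if the work was done, 0 if the lock was busy and we left. */
int try_do_work(void)
{
    int rc = pthread_mutex_trylock(&work_lock);
    if (rc == EBUSY)
        return 0;                  /* someone else is in there: go away */
    assert(rc == 0);
    work_done++;                   /* critical section */
    pthread_mutex_unlock(&work_lock);
    return 1;
}
```

The caller never blocks; it can go do something else and come back later.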
As my OS teacher once said (in Zen-like fashion): "The best kind of locking is the one you can avoid".
Thread Pools
Now, threading is hard, no way around it, that's why there are patterns to deal with such kind of problems, and the Thread Pool Pattern is a popular one, at least in iOS since the introduction of Grand Central Dispatch (GCD).
Instead of having a bunch of threads running amok and getting enqueued all over the place, let's have a set of threads, waiting for tasks in a "pool", and having queues of things to do, ideally, tasks that shouldn't overlap each other.
Now, the thread pool pattern doesn't solve the problems discussed before, but it changes the paradigm to make it easier to deal with, mentally. Instead of having to think about "threads that need to execute such and such", you just switch the focus to "tasks that need to be executed", and the matter of which thread is doing it becomes irrelevant.
Again, pools won't solve all your problems, but they will make them easier to understand. And easier to understand may lead to better solutions.
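For plain C, a very small pool can be sketched on top of pthreads. Everything here (struct pool, pool_start, pool_submit, pool_stop, add_one, the fixed sizes) is a hypothetical illustration, not an existing API: a fixed number of workers wait on a condition variable and pull tasks from a bounded queue.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define POOL_THREADS 4
#define QUEUE_CAP    64

typedef void (*task_fn)(void *);
struct task { task_fn fn; void *arg; };

struct pool {
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    struct task     queue[QUEUE_CAP];      /* ring buffer of pending tasks */
    int             head, count, shutting_down;
    pthread_t       threads[POOL_THREADS];
};

static void *pool_worker(void *p)
{
    struct pool *pool = p;
    for (;;) {
        pthread_mutex_lock(&pool->lock);
        while (pool->count == 0 && !pool->shutting_down)
            pthread_cond_wait(&pool->not_empty, &pool->lock);
        if (pool->count == 0) {            /* shutting down and queue drained */
            pthread_mutex_unlock(&pool->lock);
            return NULL;
        }
        struct task t = pool->queue[pool->head];
        pool->head = (pool->head + 1) % QUEUE_CAP;
        pool->count--;
        pthread_mutex_unlock(&pool->lock);
        t.fn(t.arg);                       /* run the task outside the lock */
    }
}

void pool_start(struct pool *pool)
{
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->not_empty, NULL);
    pool->head = pool->count = pool->shutting_down = 0;
    for (int i = 0; i < POOL_THREADS; i++) {
        int rc = pthread_create(&pool->threads[i], NULL, pool_worker, pool);
        assert(rc == 0);
        (void)rc;
    }
}

/* Returns 0 on success, -1 if the queue is full. */
int pool_submit(struct pool *pool, task_fn fn, void *arg)
{
    int result = -1;
    pthread_mutex_lock(&pool->lock);
    if (pool->count < QUEUE_CAP) {
        int tail = (pool->head + pool->count) % QUEUE_CAP;
        pool->queue[tail] = (struct task){ fn, arg };
        pool->count++;
        result = 0;
        pthread_cond_signal(&pool->not_empty);
    }
    pthread_mutex_unlock(&pool->lock);
    return result;
}

/* Lets already queued tasks finish, then joins all workers. */
void pool_stop(struct pool *pool)
{
    pthread_mutex_lock(&pool->lock);
    pool->shutting_down = 1;
    pthread_cond_broadcast(&pool->not_empty);
    pthread_mutex_unlock(&pool->lock);
    for (int i = 0; i < POOL_THREADS; i++)
        pthread_join(pool->threads[i], NULL);
    pthread_mutex_destroy(&pool->lock);
    pthread_cond_destroy(&pool->not_empty);
}

/* Example task: atomically add 1 to the counter passed as arg. */
void add_one(void *arg) { atomic_fetch_add((atomic_int *)arg, 1); }
```

The caller only thinks in terms of tasks (pool_submit) and never cares which worker picks them up.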
All the theoretical things mentioned above are already implemented at the POSIX level (semaphore.h, pthread.h, etc.; pthreads has a very nice set of r/w locking functions), so try reading about them.
(Edit: I thought this thread was about Obj-C, not plain C, edited out all the Foundation and GCD stuff)
Calling convention defines how stack & registers are used to implement function calls. Because each thread has its own stack & registers, synchronising threads and calling convention are separate things.
To prevent multiple threads from executing the same code at the same time, you need a mutex. In your example of a function, you'd typically put the mutex lock and unlock inside the function's code, around the statements you don't want your threads to be executing at the same time.
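As a sketch of what that looks like with POSIX threads (the names total_lock, add_to_total and run_total_demo are made up for the example), the lock and unlock go inside the function, around the statements that must not run concurrently:

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;
static long total = 0;

/* Any thread may call this; the mutex lets only one at a time
   run the statement between lock and unlock. */
void add_to_total(long amount)
{
    pthread_mutex_lock(&total_lock);
    total += amount;
    pthread_mutex_unlock(&total_lock);
}

static void *adder(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++)
        add_to_total(1);
    return NULL;
}

/* Starts four threads that all call add_to_total and returns the result. */
long run_total_demo(void)
{
    pthread_t threads[4];
    total = 0;
    for (int i = 0; i < 4; i++) {
        int rc = pthread_create(&threads[i], NULL, adder, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 4; i++)
        pthread_join(threads[i], NULL);
    return total;
}
```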
In general terms: Plain code, including function calls, does not know about threads, the operating system does. By using a mutex you tap into the system that manages the running of threads. More details are just a Google search away.
Note that C11, the new C standard revision, does include multi-threading support. But this does not change the general concept; it simply means that you can use C library functions instead of operating system specific ones.

Erlang NIF from single process storing structs in memory

I'm writing an Erlang C NIF that will only be used by one Erlang process. I want to create a struct that will hold an array of pointers. I need this to exist in between the process' calls to the NIF.
What I need insight into is the proper way to approach this from the Erlang NIF side of things. I'm thinking of declaring a struct outside of all the functions so it's accessible to all of them. When I create it in one call to the NIF, and then come back and use it in another call to the NIF, it seems to work just fine.
I'm worried that this could be because the process is staying local to the scheduling thread and therefore does not have to move the struct and underlying array in memory.
Should I be using erlang:memalloc from within a function and avoiding globals altogether, or staying as is with global structs?
Possibly return a pointer to a single array containing all my data?
You could certainly return a pointer to a single array containing your data; to do that, look at ErlNifResourceType. You would pass this back to the calling erlang process, and it in turn would pass it back to you on subsequent NIF calls. This would ensure that only one thread was operating on your data at a time (assuming only one process had a copy of the resource; it's not something you want to share, especially if it contains pointers).
You could also encode it as an erlang list, but that would probably be very inefficient.
That being said, you can use shared memory from a NIF. For example, here's an ets-like database implemented as a NIF using shared data.
You just have to keep in mind that you're accessing shared resources. The NIF API provides thread creation, thread specific data, mutexes, conditions, and read/write locks. You can even send a message to an erlang process from a NIF-created thread (in the event of a long-running NIF call, this is actually how you'd want to implement it to prevent scheduling problems).
Given your requirements, you're probably better off using ErlNifResourceType rather than messing with multithreading and shared resource controls. Technically, if you're only using one erlang process, you could leave it as a global variable (read: shared resource) without any harmful side effects. That being said, things change, and you don't want to be the cause of someone's headache down the road when they try to use your code from multiple processes. Whichever method you wind up using, make sure it's thread safe.

Why thread specific data is required in pthread?

All threads share memory. For example, a change to a global variable in one thread is visible in another thread. Since each thread has its own stack, the local variables created inside a thread are unique to it. In that case, why do we need a thread-specific data mechanism? Can't it be achieved with automatic storage variables inside the thread function?
Kindly clarify!!!.
BR
Rj
Normal globals are shared between threads. Local variables are specific to a particular invocation of a function. If you want something that (for example) is visible to a number of functions running in the same thread, but unique to that thread, then thread specific data is what you're looking for.
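A minimal sketch of that with the POSIX thread-specific data calls (pthread_key_create, pthread_setspecific, pthread_getspecific). The request-id scenario and all function names are invented for illustration: several functions in the same thread see the value without it being passed down, and each thread sees only its own.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static pthread_key_t request_id_key;
static pthread_once_t key_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    pthread_key_create(&request_id_key, NULL);   /* no destructor needed */
}

/* Deep in the call chain: no parameter is needed to find this thread's value. */
static intptr_t current_request_id(void)
{
    return (intptr_t)pthread_getspecific(request_id_key);
}

static intptr_t helper_level_2(void) { return current_request_id(); }
static intptr_t helper_level_1(void) { return helper_level_2(); }

static void *serve(void *arg)
{
    pthread_once(&key_once, make_key);           /* create the key exactly once */
    pthread_setspecific(request_id_key, arg);    /* visible to this thread only */
    return (void *)helper_level_1();
}

/* Starts two threads with different ids; returns 1 if each saw only its own id. */
int run_tsd_demo(void)
{
    pthread_t a, b;
    void *seen_a, *seen_b;
    int rc_a = pthread_create(&a, NULL, serve, (void *)(intptr_t)11);
    int rc_b = pthread_create(&b, NULL, serve, (void *)(intptr_t)22);
    assert(rc_a == 0 && rc_b == 0);
    (void)rc_a;
    (void)rc_b;
    pthread_join(a, &seen_a);
    pthread_join(b, &seen_b);
    return (intptr_t)seen_a == 11 && (intptr_t)seen_b == 22;
}
```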
It's not required but it's rather handy. Some functions like rand and strtok use static storage duration information which is likely to be problematic when shared among threads.
Say you have a random number function where you want to maintain a different sequence (hence seed) for each thread. You have two approaches.
You can use something like the kludgy:
unsigned int seed = (unsigned int) time (NULL);
int r = rand_r (&seed);
where the seed has to be created by the caller and passed in each time.
Or you can use the rather nicer, ISO-compliant:
srand (time (NULL));
int r = rand();
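where the implementation can use thread-local storage to maintain a thread-specific seed (the C standard itself does not require rand to be thread-safe, so check your library). The same goes for the information strtok keeps about its position within the string it's processing.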
That way, you don't have to muck about with changing your code between threaded and non-threaded versions.
Now you could create that information in the thread function, but how is the rand function going to know about its address without it being passed down? And what if rand is called 87 stack levels down? That's an awful lot of levels to be transferring a pointer through.
And, even if you do something like:
void *pthread_fn (void *unused) {
    int seed;
    rand_set_seed_location (&seed);
    :
}
and rand subsequently uses that value regardless of how deep it is in the stack, that's still a code change from the standard. It may work but so may writing an operating system in COBOL. That doesn't make it a good idea :-)
Yes, the stack is one way of allocating thread-local storage (including handles to heap allocations local to the particular thread).
The best example of thread-specific data is errno. When a call to some function in the C library fails, errno is set, and you can check it to find the reason for the failure. Without thread-specific data, it would be impossible to port these functions to a multi-threaded environment, because errno could be set by other threads before you check it.
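A small sketch to convince yourself that errno really is per-thread on a POSIX system (function names are made up): the second thread writes to errno and checks that its errno lives at a different address than the main thread's.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Receives the address of the creating thread's errno and reports
   whether this thread's errno is a different object. */
static void *check_errno(void *creator_errno_addr)
{
    errno = ERANGE;    /* writes this thread's own errno only */
    return (void *)(intptr_t)(&errno != (int *)creator_errno_addr);
}

/* Returns 1 if the new thread had its own errno, 0 otherwise. */
int errno_is_per_thread(void)
{
    pthread_t t;
    void *result;
    int rc = pthread_create(&t, NULL, check_errno, &errno);
    assert(rc == 0);
    (void)rc;
    pthread_join(t, &result);
    return (int)(intptr_t)result;
}
```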
As a general rule, most uses of TSD should be avoided in new APIs. If a function needs some information, it should be passed to it.
However, sometimes you need TSD to 'paper over' an API defect. A good example is 'gmtime'. The 'gmtime' function returns a pointer to a structure that is valid until the next call to 'gmtime'. But that would make 'gmtime' awfully hard to use in a multi-threaded program. What if some library called 'gmtime' when you didn't expect it, trashing your structure? One simple workaround is make the structure returned thread-specific. (The long-term solution, of course, is to create a more suitable API such as 'gmtime_r'.)
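For comparison, this is roughly what the reentrant variant looks like in use, assuming a POSIX system that provides gmtime_r (utc_year is a made-up helper): the caller owns the struct tm, so no other call can trash it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Returns the UTC year for a timestamp, or -1 on failure. */
int utc_year(time_t when)
{
    struct tm parts;                       /* caller-owned storage */
    if (gmtime_r(&when, &parts) == NULL)
        return -1;
    return parts.tm_year + 1900;           /* tm_year counts from 1900 */
}
```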
One case where it's perfectly reasonable to use TSD in new designs is for information that won't be accessed frequently that would clutter the API. For example, if a critical error is discovered, it might be nice to log certain context information from higher-level code (Which client were you serving? What command did they send?). Your choices are basically to pass this context information from function to function to function (which isn't even always possible if some of the functions are outside your control) or to store it in TSD.

Tips to write thread-safe UNIX code?

What are the guidelines to write thread-safe UNIX code in C and C++?
I know only a few:
Don't use globals
Don't use static local storage
What others are there?
The simple thing to do is read a little. The following list contains some stuff to look at and research.
Spend time reading the Open Group Base Specification, particularly the General Information section and the subsection on threads. This is the basic information for multithreading under most UN*X-alike systems.
Learn the difference between a mutex and a semaphore
Realize that everything that is shared MUST be protected. This applies to global variables, static variables, and any shared dynamically allocated memory.
Replace global state flags with condition variables. These are implemented using pthread_cond_init and related functions.
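To illustrate the last point, here is a sketch of a "ready" flag guarded by a mutex and a condition variable instead of being polled as a bare global. The names ready, payload, producer and wait_for_payload are hypothetical, and the static initializers are used here in place of pthread_cond_init.

```c
#include <assert.h>
#include <pthread.h>

static pthread_mutex_t ready_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  ready_cond = PTHREAD_COND_INITIALIZER;
static int ready = 0;      /* the predicate, only touched under ready_lock */
static int payload = 0;

static void *producer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&ready_lock);
    payload = 42;
    ready = 1;
    pthread_cond_signal(&ready_cond);      /* wake the waiter, if any */
    pthread_mutex_unlock(&ready_lock);
    return NULL;
}

/* Starts the producer, sleeps until it signals, and returns the payload. */
int wait_for_payload(void)
{
    pthread_t t;
    int value;
    int rc;
    ready = 0;
    payload = 0;
    rc = pthread_create(&t, NULL, producer, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_mutex_lock(&ready_lock);
    while (!ready)                         /* loop guards against spurious wakeups */
        pthread_cond_wait(&ready_cond, &ready_lock);
    value = payload;
    pthread_mutex_unlock(&ready_lock);
    pthread_join(t, NULL);
    return value;
}
```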
Once you understand the basics, learn about the common problems so that you can identify them when they occur:
Lock inversion deadlocks
Priority inversion - if you are interested in a real life scenario, then read this snippet about the Mars Pathfinder
It really comes down to shared state; globals and static locals are examples of shared state. If you don't share state, you won't have a problem. Other examples of shared state include multiple threads writing to a file or socket.
Any shared resource will need to be managed properly - that might mean making something mutex protected, opening another file, or intelligently serializing requests.
If two threads are reading and writing from the same struct, you'll need to handle that case.
Beware of the sem_t functions: they may return early, without completing, when interrupted by a signal (SIGCHLD, I/O signals, etc.). If you need them, be sure to always handle that case.
pthread_mutex_t and pthread_cond_t functions are safe with respect to EINTR.
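Handling the interrupted case usually means retrying, along these lines (sem_wait_retry is a made-up wrapper name around the POSIX sem_wait):

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>

/* Waits on sem, retrying if a signal handler interrupts the call.
   Returns 0 on success, -1 on any error other than EINTR. */
int sem_wait_retry(sem_t *sem)
{
    int rc;
    do {
        rc = sem_wait(sem);
    } while (rc == -1 && errno == EINTR);
    return rc;
}
```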
A good open book about concurrency in general can be found here: Little Book of Semaphores
It presents various problems that are solved step by step and includes solutions to common concurrency issues like starvation, race conditions, etc.
It is not language-specific but contains short chapters about implementing the solutions in C with the Pthread-Library or Python.

Thread local storage used anywhere else?

Is thread local storage used anywhere else other than making global and static variables local to a thread? Is it useful in any new code that we write?
TLS can certainly be useful in new code. If you ever want a global variable which needs to be specific to each thread, (like errno in C/C++), thread-local-storage is the way to go.
Thread-specific singleton objects? In a multi-threaded web server where each thread handles one request, there is a good chance of having some TLS data (such as the request URL or database connections, essentially resources intended to be available at any point during request handling) so that it can be easily accessed anywhere in the code when required.
These days errno is typically put in thread-local storage.
There are some situations (eg: shared libraries like DLLs that require startup code) where using thread-local storage can be a problem.
I've only needed it for thread-specific error handling, and optimization (in C):
__thread int cpfs_errno;
static __thread struct Cpfs *g_cpfs;
In this example, this saves me passing a context pointer of struct Cpfs * through dozens of functions in which it never changes.
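A runnable sketch of the same idea, assuming GCC or Clang (which support the __thread extension) and POSIX threads. Here struct context is a made-up stand-in for struct Cpfs, and the function names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

struct context { int id; };                 /* hypothetical stand-in for struct Cpfs */

static __thread struct context *g_ctx;      /* one pointer per thread */

/* Neither function takes a context parameter; both find it via TLS. */
static int leaf(void)   { return g_ctx->id * 10; }
static int middle(void) { return leaf(); }

static void *thread_main(void *arg)
{
    struct context ctx = { (int)(intptr_t)arg };
    g_ctx = &ctx;                           /* set once near the top of the thread */
    return (void *)(intptr_t)middle();
}

/* Runs two threads with ids 1 and 2; returns 1 if each used its own context. */
int run_tls_demo(void)
{
    pthread_t a, b;
    void *ra, *rb;
    int rc_a = pthread_create(&a, NULL, thread_main, (void *)(intptr_t)1);
    int rc_b = pthread_create(&b, NULL, thread_main, (void *)(intptr_t)2);
    assert(rc_a == 0 && rc_b == 0);
    (void)rc_a;
    (void)rc_b;
    pthread_join(a, &ra);
    pthread_join(b, &rb);
    return (intptr_t)ra == 10 && (intptr_t)rb == 20;
}
```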

Resources