Why do HashTable read operations need to be locked? - nshashtable

Read operations on CopyOnWriteArrayList do not need a lock, because each write copies the array and never modifies the array that readers are still using.
So what would happen if Hashtable reads were not locked? If a reader merely sees data that other threads have already modified, that is not a problem in some business scenarios.
Is this why ConcurrentHashMap appeared later, with locked writes, lock-free reads whose visibility is guaranteed through volatile, and better read and write performance at the same time?
Could an expert please enlighten me?

Hashtable is an old class. You will find that the old JDK classes handle thread synchronization by simply locking everything (their methods are declared synchronized). It is a relatively simple scheme, so that is what was done.
As Java became more and more popular and the community wanted better performance from it, various other thread synchronization solutions appeared from JDK 5 onward. So this is just a historical issue, and there is no need to delve into it.
You can focus more on the design ideas of the newer solutions, such as the ones you mentioned: CopyOnWriteArrayList and ConcurrentHashMap.

Related

Do mutexes only function correctly if all relevant threads attempt to acquire the locks they should be acquiring, prior to utilizing a resource?

I'm just learning about locks for the first time prior to taking an OS class for the first time. I originally thought that locks would literally "lock some resource" where you would need to specify the resource (perhaps by pointer to the address of the resource in memory), but after reading through a couple really basic implementations of spin-locks (say, the unix-like training OS "xv6"'s version):
http://pages.cs.wisc.edu/~skobov/cs537/P3/xv6/kernel/spinlock.h
http://pages.cs.wisc.edu/~skobov/cs537/P3/xv6/kernel/spinlock.c
As well as this previous stack overflow question: (What part of memory does a mutex lock? (pthreads))
I think I had it all wrong.
It seems to me instead that a lock is effectively just a boolean-flag-like variable that temporarily (or indefinitely) blocks execution of code that would use a resource, but only when another thread also attempts to acquire the lock (blocking that second thread has the side effect that it cannot use the resource until the first thread releases the lock). So now I'm wondering: if a poorly designed thread uses no mutexes and simply tries to use a resource on which a well-designed thread holds a lock, can the poorly designed thread access the resource regardless (by simply ignoring the mutex, which I now think acts as a flag that a thread should look at but has the opportunity to ignore)?
If that's the case, then why do we implement locks as sophisticated boolean variables such that all threads must use the locks as opposed to a lock that instead prevents access to a memory region?
Since I'm relatively new to all this, I'd appreciate any reasonable terminology corrections if I'm stating my question incorrectly, as well as an answer!
Thank you very much!
--edit, Thank you all for the prompt and helpful responses!
If that's the case, then why do we implement locks as sophisticated boolean variables such that all threads must use the locks as opposed to a lock that instead prevents access to a memory region?
A lot of reasons:
What if the thing you're controlling access to isn't a memory region? What if it's a file or a network connection?
How would the compiler know when it was going to access a region of protected memory? Would the compiler have to assume that any memory access anywhere might synchronize with other threads? That would make many optimizations impossible, including storing possibly shared variables in registers which is pretty critical.
Would hardware have to support locking memory on any granularity? How would it know what memory is associated with an object? Consider a linked list. Would you have to lock every bit of memory associated with that linked list and every object in it? When you add or remove an object from the list, do you have to change what memory is protected? Won't that be both expensive and extremely difficult to use?
How would it know when to release a lock? Say you access some area of memory that needs protection and then later you access some other area of memory. How would the implementation know whether other threads could be allowed to access that area in-between those two accesses? The implementation would need to know whether the code accessing that region was or wasn't relying on a consistent view of the shared state over those two accesses. How could it know that? Get it wrong by keeping the lock and concurrency suffers. Get it wrong by releasing the lock in-between the two accesses, and the code can behave unpredictably.
And so on.
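To make the "a mutex only protects what cooperating threads agree it protects" point concrete, here is a minimal sketch using pthreads. The names (counter, counter_lock, polite_worker, run_polite_workers) are made up for illustration. Nothing in the language or the OS ties counter_lock to counter; the association exists only because every worker takes the lock before touching the counter.

```c
#include <pthread.h>

#define THREADS 4
#define ITERATIONS 100000

static long counter = 0;   /* the shared "resource" */
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* A cooperating thread: takes the lock before touching the counter.
 * A thread that simply wrote `counter++` without the lock would still
 * compile and run; the mutex cannot stop it, and updates could be lost. */
static void *polite_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&counter_lock);
        counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs THREADS cooperating workers and returns the final count. */
static long run_polite_workers(void)
{
    pthread_t tid[THREADS];

    counter = 0;
    for (int i = 0; i < THREADS; i++)
        pthread_create(&tid[i], NULL, polite_worker, NULL);
    for (int i = 0; i < THREADS; i++)
        pthread_join(tid[i], NULL);
    return counter;
}
```

As long as every thread follows the convention, run_polite_workers() returns THREADS * ITERATIONS. The guarantee disappears as soon as one thread skips the lock.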

Reader-Writer using semaphores and shared memory in C

I'm trying to make a simple reader/writer program using POSIX named semaphores. It's working, but on some systems it halts immediately on the first semaphore and that's it... I'm really desperate by now. Can anyone help, please? It works fine on my system, so I can't track the problem down with ltrace. (Sorry for the comments, I'm from the Czech Republic.)
https://www.dropbox.com/s/hfcp44u2r0jd7fy/readerWriter.c
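POSIX semaphores are not well suited for application code since waits on them are interruptible: sem_wait can return early with errno set to EINTR when a signal is delivered. Basically, any signal that reaches your processes (for example, one triggered by I/O) can mess up your signalling. Please have a look at this post.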
So you'd have to be really careful to interpret all error returns from the sem_ functions properly. In the code that you posted there is no such thing.
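As a rough illustration of what checking those error returns can look like, here is a small sketch. The helper name sem_wait_retry is hypothetical; sem_wait, errno and EINTR are standard POSIX. It retries only when the wait was interrupted by a signal and passes every other error back to the caller.

```c
#include <errno.h>
#include <semaphore.h>

/* Hypothetical helper: wait on a semaphore, retrying when a signal
 * handler interrupts the wait. Returns 0 on success, or -1 on a real
 * error with errno left as set by sem_wait. */
static int sem_wait_retry(sem_t *sem)
{
    int rc;

    do {
        rc = sem_wait(sem);
    } while (rc == -1 && errno == EINTR);
    return rc;
}
```

The same pattern applies to the other sem_ calls that can fail. A program that ignores the return value will carry on as if it had acquired the semaphore when it has not.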
If your implementation of POSIX supports them, just use rwlocks, they are made for this, are much higher level and don't encounter that difficulty.
In computer science, the readers-writers problems are examples of a common computing problem in concurrency. There are at least three variations of the problems, which deal with situations in which many threads try to access the same shared memory at one time. Some threads may read and some may write, with the constraint that no process may access the share for either reading or writing, while another process is in the act of writing to it. (In particular, it is allowed for two or more readers to access the share at the same time.) A readers-writer lock is a data structure that solves one or more of the readers-writers problems.
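A minimal sketch of a readers-writer lock in use, assuming the platform provides pthread_rwlock_t. The record type and the function names (shared_config, config_set, config_area) are invented for the example.

```c
#define _POSIX_C_SOURCE 200809L   /* ask strict-mode compilers for rwlocks */
#include <pthread.h>

/* Hypothetical shared record guarded by a readers-writer lock. */
struct shared_config {
    pthread_rwlock_t lock;
    int width;
    int height;
};

static struct shared_config config = {
    .lock = PTHREAD_RWLOCK_INITIALIZER, .width = 0, .height = 0
};

/* Writers take the lock exclusively. */
static void config_set(int width, int height)
{
    pthread_rwlock_wrlock(&config.lock);
    config.width = width;
    config.height = height;
    pthread_rwlock_unlock(&config.lock);
}

/* Readers share the lock with each other, but never with a writer,
 * so width and height are always read as a matching pair. */
static int config_area(void)
{
    pthread_rwlock_rdlock(&config.lock);
    int area = config.width * config.height;
    pthread_rwlock_unlock(&config.lock);
    return area;
}
```

Any number of threads can be inside config_area() at once. A call to config_set() waits until they have all left and keeps new readers out while it runs.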

Reading Critical Section Data using pthreads

I have a multi-threaded application, I'm using pthreads with the pthread_mutex_lock function. The only data I need to protect is in one data structure. Is it safe if I apply the lock only when I write to the data structure? Or should I apply the lock whenever I read or write?
I found a question similar to this, but it was for Windows. From that question it would seem that the answer to mine is that it is OK. I just want to make sure, though.
EDIT
Follow-up: So I want to pass in a command line argument and only read from it (from different threads). Do I still have to use pthread_mutex_lock?
You could use a pthread_rwlock_t to allow "one-writer OR N-readers" concurrency. But if you stick with the general pthread_mutex_lock, it needs to be acquired for ANY access to the shared data structure it's protecting, so you're cutting things down to "one reader-or-writer" concurrency.
It is necessary to apply the lock when you read as well, unless you can guarantee atomic writes (at which point you don't even need to lock on write). The problem arises from writes that are not completed in a single indivisible step.
Imagine you write 8 bytes as two 4-byte writes. If the other thread runs after the value has only been half written, its read will return invalid data. It's very uncommon that this happens, but when it does it's a hell of a bug to track down.
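The two-halves scenario can be sketched as follows. All names (split_value, writer, reader, count_torn_reads) are illustrative. The writer always changes both halves together under the mutex, and the reader copies both halves under the same mutex, so it can never observe one new half next to one old half.

```c
#include <pthread.h>
#include <stdint.h>

#define ROUNDS 100000

/* An 8-byte value stored as two 4-byte halves, as described above. */
struct split_value {
    uint32_t lo;
    uint32_t hi;
};

static struct split_value shared = {0, 0};
static pthread_mutex_t shared_lock = PTHREAD_MUTEX_INITIALIZER;

/* Writer: both halves always change together, under the lock. */
static void *writer(void *arg)
{
    (void)arg;
    for (uint32_t i = 1; i <= ROUNDS; i++) {
        pthread_mutex_lock(&shared_lock);
        shared.lo = i;
        shared.hi = i;
        pthread_mutex_unlock(&shared_lock);
    }
    return NULL;
}

/* Reader: copies both halves under the same lock and counts torn
 * reads, i.e. snapshots in which the halves disagree. */
static void *reader(void *arg)
{
    long *torn = arg;

    for (int i = 0; i < ROUNDS; i++) {
        pthread_mutex_lock(&shared_lock);
        struct split_value snapshot = shared;
        pthread_mutex_unlock(&shared_lock);
        if (snapshot.lo != snapshot.hi)
            (*torn)++;
    }
    return NULL;
}

/* Runs one writer against one reader; returns the torn-read count. */
static long count_torn_reads(void)
{
    pthread_t w, r;
    long torn = 0;

    pthread_create(&w, NULL, writer, NULL);
    pthread_create(&r, NULL, reader, &torn);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return torn;
}
```

With the lock taken on both sides the count is always zero. Dropping the lock from reader() turns the copy into a data race, and a snapshot taken between the two stores could then see mismatched halves.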
Yes, you need to be locked for reads as well as writes.
Compilers and CPUs do not necessarily write to a field in a structure atomically. In addition your code may not write atomically, and the structure may at certain points be out of sync with regards to itself.
If all you need to share is a single integer value, you might choose to use atomic integers. GCC has atomic builtins you can use. This is not as portable as using pthreads locks.
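For the single-integer case, a small sketch is shown below. Note that it uses the standard C11 <stdatomic.h> interface rather than the GCC-specific facilities the answer refers to, and the names (hits, count_hits, run_atomic_counters) are made up for the example.

```c
#include <pthread.h>
#include <stdatomic.h>

#define THREADS 4
#define ITERATIONS 100000

static atomic_int hits = 0;

/* Each increment is one indivisible read-modify-write, so no mutex
 * is needed around it. */
static void *count_hits(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++)
        atomic_fetch_add(&hits, 1);
    return NULL;
}

/* Runs THREADS incrementing threads and returns the final value. */
static int run_atomic_counters(void)
{
    pthread_t tid[THREADS];

    atomic_store(&hits, 0);
    for (int i = 0; i < THREADS; i++)
        pthread_create(&tid[i], NULL, count_hits, NULL);
    for (int i = 0; i < THREADS; i++)
        pthread_join(tid[i], NULL);
    return atomic_load(&hits);
}
```

This only works because the shared state is one integer. As soon as two fields have to stay consistent with each other, a lock (or a more elaborate lock-free design) is needed again.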

Is lock free multithreaded programming making anything easier?

I have only read a little about this topic, but it seems that the only benefit is getting around contention problems. It will not have any important effect on the deadlock problem, because the code that is lock-free is so small and fundamental (FIFOs, LIFOs, hashes) that there was never a deadlock problem in it.
So it's all about performance - is this right?
Lock-free programming is (as far as I can see) always about performance, otherwise using a lock is in most cases much simpler, and therefore preferable.
Note however that with lock-free programming you can end up trading deadlock for live-lock, which is a lot harder to diagnose since no tools that I know of are designed to diagnose it (although I could be wrong there).
I'd say, only go down the path of lock-free if you have to; that is, you have a scenario where you have a heavily contended lock that is hurting your performance. (If it ain't broke, don't fix it).
Couple of issues.
We will soon be facing desktop systems with 64, 128 and 256 cores. Parallelism in this domain is unlike our current experience of 2, 4, 8 cores; the algorithms which run successfully on such small systems will run slower on highly parallel systems due to contention.
In this sense, lock-free is important since it contributes strongly to solving scalability.
There are also some very specific areas where lock-free is extremely convenient, such as the Windows kernel, where there are modes of execution where sleeps of any kind (such as waits) are forbidden, which obviously is very limiting with regard to data structures, but where lock-free provides a good solution.
Also, lock-free data structures often do not have failure modes; they cannot actually fail, where lock-based data structures can of course fail to obtain their locks. Not having to worry about failures simplifies code.
I've written a library of lock free data structures which I'll be releasing soon. I think if a developer can get hold of a well-proven API, then he can just use it - doesn't matter if it's lock-free or not, he doesn't need to worry about the complexity in the underlying implementation - and that's the way to go.
It's also about scalability. In order to get performance gains these days, you'll have to parallelise the problems you're working on so you can scale them across multiple cores - the more, the merrier.
The traditional way of doing this is by locking data structures that require parallel access, but the more threads you can run truly in parallel, the bigger a bottleneck this becomes.
So yes, it is about performance...
For preemptive threading, threads suspended while holding a lock can block threads that would otherwise be making forward progress. Lock-free doesn't have that problem since by Herlihy's definition, some other thread can always make forward progress.
For non-preemptive threading, it doesn't matter that much since even spin lock based solutions are lock-free by Herlihy's definition.
This is about performance, but also about the ability to handle multi-threaded loads:
locks grant an exclusive access to a portion of code: while a thread has a lock, other threads are spinning (looping while trying to acquire the lock) or blocked, sleeping until the lock is released (which usually happens if spinning lasts too long);
atomic operations grant an exclusive access to a resource (usually a word-sized variable or a pointer) by using uninterruptible intrinsic CPU instructions.
As locks BLOCK other threads' execution, a program is slowed down.
As atomic operations execute serially (one after another), there is no blocking*.
(*) as long as the number of concurrent CPUs trying to access the same resource does not create a bottleneck, but we don't have enough CPU cores yet to see this as a problem.
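To give a feel for what building on atomic operations looks like in practice, here is a sketch of a lock-free push onto a shared stack using a C11 compare-and-swap loop. This is a generic textbook pattern, not the code of any library mentioned in this thread, and all names (node, stack_top, stack_push, stack_drain, run_pushers) are invented for the example.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define THREADS 4
#define PUSHES 10000

struct node {
    int value;
    struct node *next;
};

static _Atomic(struct node *) stack_top = NULL;

/* Push without a lock: retry the compare-and-swap until no other
 * thread has changed stack_top between our read and our write. */
static void stack_push(int value)
{
    struct node *n = malloc(sizeof *n);

    if (n == NULL)
        abort();
    n->value = value;
    n->next = atomic_load(&stack_top);
    while (!atomic_compare_exchange_weak(&stack_top, &n->next, n))
        ;   /* on failure n->next now holds the current top; retry */
}

/* Detach the whole list with one atomic exchange, then free and count
 * the nodes. Taking everything at once sidesteps the ABA problem that
 * a per-node lock-free pop would have to deal with. */
static int stack_drain(void)
{
    struct node *n = atomic_exchange(&stack_top, NULL);
    int count = 0;

    while (n != NULL) {
        struct node *next = n->next;
        free(n);
        n = next;
        count++;
    }
    return count;
}

static void *pusher(void *arg)
{
    (void)arg;
    for (int i = 0; i < PUSHES; i++)
        stack_push(i);
    return NULL;
}

/* Runs THREADS concurrent pushers and returns how many nodes arrived. */
static int run_pushers(void)
{
    pthread_t tid[THREADS];

    for (int i = 0; i < THREADS; i++)
        pthread_create(&tid[i], NULL, pusher, NULL);
    for (int i = 0; i < THREADS; i++)
        pthread_join(tid[i], NULL);
    return stack_drain();
}
```

No thread ever sleeps while holding something another thread needs. A thread that loses the race simply retries, and at least one thread succeeds on every round.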
I have worked on the matter to write a wait-free (lock-free without wait states) Key-Value store for the server I am working on.
Libraries like Tokyo Cabinet (even TC-FIXED, a simple array) rely on locks to preserve the integrity of a database:
"while a writing thread is operating the database, other reading threads and writing threads are blocked" (Tokyo Cabinet documentation)
The results of a test without concurrency (a one-thread test):
SQLite time: 56.4 ms (a B-tree)
TC time: 10.7 ms (a hash table)
TC-FIXED time: 1.3 ms (an array)
G-WAN KV time: 0.4 ms (something new which works, but I am not sure a name is needed)
With concurrency (several threads writing and reading in the same DB), only the G-WAN KV survived the same test because (by contrast with the others) it never ever blocks.
So, yes, this KV store is easier for developers to use, since they do not have to care about threading issues. Making it work this way was not trivial, however.
I believe I saw an article that mathematically proved that any algorithm can be written in a wait free manner (which basically means that you can be assured of each thread always making progress towards its goal). This means that it can be applied to any large scale application (after all, a program is just an algorithm with many, many parameters) and because wait free ensures that neither dead/live-lock occurs within it (as long as it doesn't have bugs which preclude it from being truly wait free), it does simplify that side of the program. On the other hand, a mathematical proof is a far cry from actually implementing the code itself (AFAIK, there isn't even a fully lock-free linked list that can run on PCs, I've seen ones that cover most parts, but they usually either can't handle some common functions, or some functions require the structure to be locked).
On a side note, I've also found another proof that showed any lock-free algorithm can actually be considered wait-free due to the laws of probability and various other factors.
Scalability is a really important issue in efficient multi/manycore programming. The greatest limiting factor is actually the section of code that must be executed serially (see Amdahl's Law). However, contention on locks is also very problematic.
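For reference, Amdahl's Law is usually written as below, where p is the fraction of the work that can run in parallel and n is the number of cores. The value p = 0.95 is purely illustrative, not a measurement.

```latex
S(n) = \frac{1}{(1 - p) + \frac{p}{n}},
\qquad
S(64)\Big|_{p = 0.95} = \frac{1}{0.05 + 0.95/64} \approx 15.4,
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p} = 20 .
```

So even with only 5% serial code, 64 cores give roughly a 15x speedup, and no number of cores can exceed 20x.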
Lock-free algorithms address the scalability problem that traditional locks have. So I would say lock-free is mostly about performance, not about reducing the possibility of deadlock.
However, keep in mind that with the current x86 architecture, writing a general lock-free algorithm is impossible. This is because we can't atomically exchange data of arbitrary size on current x86 (this is also true for other architectures, except for Sun's ROCK). So current lock-free data structures are quite limited and very specialized for specific uses.
I think current lock-free data structures will no longer be used in a decade. I strongly expect that a hardware-assisted general lock-free mechanism (yes, that is transactional memory, TM) will be implemented within a decade. If any kind of TM is implemented, then although it can't perfectly solve the problems of locks, many problems (including priority inversion and deadlock) will be eliminated. However, implementing TM in hardware is still very challenging, and for x86 only a draft has just been proposed.
That's still too long, so here is a two-sentence summary.
Lock-free data structures are not a panacea for lock-based multithreaded programming (even TM is not). If you seriously need scalability and have trouble with lock contention, then consider lock-free data structures.

Safety nets in complex multi-threaded code?

As a developer who has just finished writing thousands of lines of complex multi-threaded 'C' code in a project, and which is going to be enhanced, modified etc. by several other developers unfamiliar with this code in the future, I wanted to find out what kind of safety nets do you guys try to put in such code? As an example I could do these:
Define accessor macros for lock-protected structure members, which assert that the corresponding lock is held. This makes it clear to anyone unfamiliar with this code that these members are lock-protected.
Functions which are supposed to be called with some spinlock held assert that the spinlock is being held.
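The two ideas above could look roughly like this. It is only a sketch: it uses a pthread mutex instead of a spinlock, tracks "held" with a per-thread flag (C11 _Thread_local), and every name in it (account_lock, account_lock_held, ACCOUNT_BALANCE, account_deposit_locked) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t account_lock = PTHREAD_MUTEX_INITIALIZER;

/* Per-thread flag: true while the calling thread holds account_lock.
 * Each thread only ever touches its own copy, so it needs no locking. */
static _Thread_local bool account_lock_held = false;

static void account_lock_acquire(void)
{
    pthread_mutex_lock(&account_lock);
    account_lock_held = true;
}

static void account_lock_release(void)
{
    account_lock_held = false;
    pthread_mutex_unlock(&account_lock);
}

struct account {
    long balance_;   /* lock-protected: use ACCOUNT_BALANCE() only */
};

/* Accessor macro: asserts that the lock is held, then yields the member. */
#define ACCOUNT_BALANCE(acc) \
    (*(assert(account_lock_held), &(acc)->balance_))

/* Must be called with account_lock held; the assert documents that
 * requirement and enforces it in debug builds. */
static void account_deposit_locked(struct account *acc, long amount)
{
    assert(account_lock_held);
    ACCOUNT_BALANCE(acc) += amount;
}

/* Public entry point that takes the lock itself. */
static void account_deposit(struct account *acc, long amount)
{
    account_lock_acquire();
    account_deposit_locked(acc, amount);
    account_lock_release();
}
```

A developer who reaches for ACCOUNT_BALANCE() or account_deposit_locked() without taking the lock gets an assertion failure on the first debug run, instead of a rare race in production.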
What kind of safety nets have you put into multi-threaded code that you have written?
What kind of problems have you faced when other developers modified such code?
What kind of debugging aids have you put into such code?
Thanks for your comments.
There are a number of things we do in our product (a hypervisor designed to help you find concurrency bugs in applications) that are more generally useful. Note that we do these in our code itself (because it's a highly concurrent piece of software) and that some of these are useful whether or not you are writing concurrent code.
Like you, we have the ability to assert(lock_held(...)) and use it.
We also (because we have our own scheduler) can assert(single_threaded()) for those (rare) situations where we count on no other thread being active in the system.
Memory corruption from one thread to another is pretty common (and hard to debug), so we do two things to address this: sprinkled throughout our thread stack are some magic cookies. We periodically (in our get_thread_id() function) invoke a "validate_thread_stack()" function that checks these cookies to make sure the stack is not corrupted.
Our malloc sticks magic cookies before and after a malloc block of memory and checks these on free. If anyone overruns their data these can be used to find the corruption early.
On free() we blast a well known pattern (in our case 0xdddd...) over the memory. This nicely corrupts anyone else who had a dangling pointer left over to that memory region.
We have a guard page (a memory page not mapped into the address space) near the bottom of the thread stack. If the thread overruns its stack, we catch it via page fault and drop into our debugger.
Our locks are witnessed. Check out the FreeBSD lock witness code. It's like that, but homebrew. Basically the witness code is a lightweight way of detecting potential deadlocks by looking at cycles in the lock acquisition graph.
Our locks are also wrapped with accessors that record the file/line number of acquisition and release. For double unlocks or double locks, you get pretty debug information on your screwup.
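A rough sketch of such a wrapper is shown below. It is not the hypervisor's actual code. It assumes a POSIX error-checking mutex (PTHREAD_MUTEX_ERRORCHECK), which reports a double lock as EDEADLK and an unlock by a non-owner as EPERM, and the names debug_lock, DEBUG_LOCK and DEBUG_UNLOCK are made up.

```c
#define _POSIX_C_SOURCE 200809L   /* for PTHREAD_MUTEX_ERRORCHECK */
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical debug lock: remembers where it was last acquired. */
struct debug_lock {
    pthread_mutex_t mutex;
    const char *file;   /* site of the most recent acquisition */
    int line;
};

static int debug_lock_init(struct debug_lock *l)
{
    pthread_mutexattr_t attr;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    int rc = pthread_mutex_init(&l->mutex, &attr);
    pthread_mutexattr_destroy(&attr);
    l->file = NULL;
    l->line = 0;
    return rc;
}

static int debug_lock_at(struct debug_lock *l, const char *file, int line)
{
    int rc = pthread_mutex_lock(&l->mutex);

    if (rc == EDEADLK)   /* this thread already holds the lock */
        fprintf(stderr, "%s:%d: double lock, already taken at %s:%d\n",
                file, line, l->file, l->line);
    if (rc != 0)
        return rc;
    l->file = file;      /* written only while the mutex is held */
    l->line = line;
    return 0;
}

static int debug_unlock_at(struct debug_lock *l, const char *file, int line)
{
    int rc = pthread_mutex_unlock(&l->mutex);

    if (rc != 0)         /* EPERM: the caller does not hold the lock */
        fprintf(stderr, "%s:%d: unlock of a lock this thread does not hold\n",
                file, line);
    return rc;
}

/* Call sites use the macros so that file and line are filled in. */
#define DEBUG_LOCK(l)   debug_lock_at((l), __FILE__, __LINE__)
#define DEBUG_UNLOCK(l) debug_unlock_at((l), __FILE__, __LINE__)
```

A double DEBUG_LOCK() then reports both the offending line and the line that took the lock first, which is usually enough to find the mistake without a debugger.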
Our locks are also profiled. Once you get your code working you want it working well. We track the usual things like how many acquisitions, how long it took to acquire it.
In our system, we have an expectation that locks are not contended (we carefully designed the code this way). So if you wait for a spin lock longer than a second or two in our system, you get dropped into the debugger, because it's most likely not a good thing.
Our variables that are meant to be updated atomically are wrapped inside C structs. The reason for this is to prevent sloppy code where you mix good use, atomic_increment(&var);, and bad use, var++. We make it very hard to write the latter code.
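The struct-wrapping trick can be sketched like this. The answer does not say which atomic primitives its code base uses, so the sketch assumes C11 <stdatomic.h>, and the type and helper names (atomic_counter, atomic_read) are hypothetical. Only atomic_increment is a name taken from the text.

```c
#include <stdatomic.h>

/* Hypothetical wrapper type: because the value lives inside a struct,
 * writing `var++` or `var = 0` on an atomic_counter does not compile. */
typedef struct {
    atomic_long value;
} atomic_counter;

/* The only sanctioned way to modify the counter. */
static inline void atomic_increment(atomic_counter *c)
{
    atomic_fetch_add(&c->value, 1);
}

/* The only sanctioned way to read it. */
static inline long atomic_read(atomic_counter *c)
{
    return atomic_load(&c->value);
}
```

Someone can still reach inside with var.value++, but that stands out in a code review in a way that a bare var++ never does.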
"volatile" is forbidden in our code base because it's ambiguously implemented by compilers. It's a bad way to try and cobble together synchronization.
And of course code reviews. If you can't explain your concurrency assumptions and locking discipline to a colleague, then there are definitely issues with the code :-)
Make everything absolutely obvious, so that other developers cannot miss the synchronization scope when they view subsections of the code in isolation.
for example: don't hold a lock in code that spans multiple files.
Seems like you've answered your own question: put lots of assertions into the code. They will tell other developers what invariants and preconditions must hold.

Resources