Operating Systems Deadlocks - timer

Consider the following procedure to eliminate deadlock: When a process requests a resource, it specifies a time limit. If the process blocks because the resource is not available, a timer is started. If the time limit is exceeded, the process is released and allowed to run again. Does this eliminate deadlock? Why or why not?

Possibly, because deadlock in this case depends not only on the resources already acquired by the processes involved: the processes also use a clock/timer mechanism to perform a cascading rollback until the deadlock is removed.

That does not prevent deadlocks. It only stops a process from staying blocked indefinitely once one has occurred; unless the timed-out process also gives up the resources it already holds, the other processes in the cycle remain deadlocked, and the released process will typically just request the resource again.
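To make the failure mode concrete, here is a minimal sketch of my own (not from the thread) of the timeout scheme using POSIX threads; pthread_mutex_timedlock plays the role of the timed resource request, and the two lock orders stand in for two processes requesting resources in opposite order:
#include <pthread.h>
#include <time.h>

static pthread_mutex_t res_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t res_b = PTHREAD_MUTEX_INITIALIZER;

// Acquire 'first', then try 'second' with a one-second time limit.
static void *worker(void *arg)
{
    pthread_mutex_t *first  = ((pthread_mutex_t **)arg)[0];
    pthread_mutex_t *second = ((pthread_mutex_t **)arg)[1];

    for (;;) {
        pthread_mutex_lock(first);

        struct timespec deadline;
        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += 1;                     // the requested time limit

        if (pthread_mutex_timedlock(second, &deadline) == 0) {
            // ... use both resources ...
            pthread_mutex_unlock(second);
            pthread_mutex_unlock(first);
            return NULL;
        }

        // Timed out: released and allowed to run again, but we go straight
        // back to requesting the same resources, so the circular-wait
        // pattern can simply repeat.
        pthread_mutex_unlock(first);
    }
}

int main(void)
{
    pthread_mutex_t *order1[] = { &res_a, &res_b };
    pthread_mutex_t *order2[] = { &res_b, &res_a };
    pthread_t t1, t2;

    pthread_create(&t1, NULL, worker, order1);
    pthread_create(&t2, NULL, worker, order2);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}
Whether this pair ever completes is a matter of timing; the time limit only converts an indefinite block into a retry, it does not remove the circular request pattern.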

Related

Does SQLite need an explicit lock?

I have read the sqlite.org documentation to learn about the SQLite locking mechanism.
As per my understanding, SQLite uses its own locking mechanism to prevent multiple processes/threads from writing to the same file at once.
https://www.sqlite.org/lockingv3.html
https://www.sqlite.org/wal.html
Do we need an explicit lock (like a semaphore or mutex) for write queries across one or more processes?
With WAL mode enabled, if process-1 is executing a read query while process-2 is writing to the DB, will process-1 get a db_locked/db_busy error?
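As a rough illustration (a sketch of mine; the file name example.db and table t are made up), a writer normally relies on SQLite's own locking plus a busy timeout rather than an external semaphore or mutex; per the WAL documentation linked above, readers and writers generally do not block each other in WAL mode, but two concurrent writers can still see a busy error, which the timeout absorbs:
#include <sqlite3.h>
#include <stdio.h>

int main(void)
{
    sqlite3 *db;
    char *err = NULL;

    if (sqlite3_open("example.db", &db) != SQLITE_OK) {
        fprintf(stderr, "open failed: %s\n", sqlite3_errmsg(db));
        return 1;
    }

    // Let SQLite retry internally for up to 2 seconds instead of returning
    // SQLITE_BUSY immediately when another connection holds the write lock.
    sqlite3_busy_timeout(db, 2000);

    // Enable WAL mode for this database.
    sqlite3_exec(db, "PRAGMA journal_mode=WAL;", NULL, NULL, &err);

    sqlite3_exec(db, "CREATE TABLE IF NOT EXISTS t(x INTEGER);", NULL, NULL, &err);

    if (sqlite3_exec(db, "INSERT INTO t(x) VALUES (1);", NULL, NULL, &err) != SQLITE_OK) {
        fprintf(stderr, "write failed: %s\n", err);   // e.g. still busy after the timeout
        sqlite3_free(err);
    }

    sqlite3_close(db);
    return 0;
}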

How do you prevent a user program from leaving kernel resources locked?

Let's consider a case where a user program calls a system call that takes a lock in the kernel. The simplest example would be:
rwlock_t lock; // assume it is initialized properly, e.g. with DEFINE_RWLOCK()
write_lock(&lock);
// do something...
write_unlock(&lock);
Now, what happens when the user program terminates after locking lock but before releasing it is that the lock stays locked forever, which we do not want to happen. What we want the kernel to do is detect any such hanging locks and release them. But detecting them can incur too much overhead, as the system would need to periodically record and check every synchronizing action of every task.
Or perhaps we could centralize the code in another kernel thread and do the synchronization work there. But handing the work off to another thread still requires some form of synchronization, so I don't think it is possible to completely remove synchronization code from the path the user program executes.
I have put a lot of thought into this and searched for information, but I couldn't find anything that sheds light on it. Any help would be very much appreciated. Thank you.
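One common pattern, offered here only as a sketch under my own assumptions (every mydev_* name is made up): kernel code releases a lock like the one above before it ever returns to user space, and any longer-lived per-process state is attached to the open file, so that the driver's .release callback, which the kernel runs even when the process is SIGKILLed, frees it:
#include <linux/fs.h>
#include <linux/module.h>
#include <linux/slab.h>

struct mydev_state {
    void *resource;   /* whatever this process acquired from the driver */
};

static int mydev_open(struct inode *inode, struct file *filp)
{
    struct mydev_state *st = kzalloc(sizeof(*st), GFP_KERNEL);

    if (!st)
        return -ENOMEM;
    filp->private_data = st;
    return 0;
}

/* Called when the last reference to the file goes away, including when the
 * owning process is killed; this is where "hanging" state gets released. */
static int mydev_release(struct inode *inode, struct file *filp)
{
    struct mydev_state *st = filp->private_data;

    kfree(st->resource);
    kfree(st);
    return 0;
}

static const struct file_operations mydev_fops = {
    .owner   = THIS_MODULE,
    .open    = mydev_open,
    .release = mydev_release,
};
/* mydev_fops would be registered with the usual char/misc device machinery. */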

How to trigger spurious wake-up within a Linux application?

Some background:
I have an application that relies on third party hardware and a closed source driver. The driver currently has a bug in it that causes the device to stop responding after a random period of time. This is caused by an apparent deadlock within the driver and interrupts proper functioning of my application, which is in an always-on 24/7 highly visible environment.
What I have found is that attaching GDB to the process, and immediately detaching GDB from the process results in the device resuming functionality. This was my first indication that there was a thread locking issue within the driver itself. There is some kind of race condition that leads to a deadlock. Attaching GDB was obviously causing some reshuffling of threads and probably pushing them out of their wait state, causing them to re-evaluate their conditions and thus breaking the deadlock.
The question:
My question is simply this: is there a clean way for an application to trigger all threads within the program to interrupt their wait state? One thing that definitely works (at least on my implementation) is to send a SIGSTOP followed immediately by a SIGCONT from another process (e.g. from bash):
kill -19 `cat /var/run/mypidfile` ; kill -18 `cat /var/run/mypidfile`
This triggers a spurious wake-up within the process and everything comes back to life.
I'm hoping there is an intelligent method to trigger a spurious wake-up of all threads within my process. Think pthread_cond_broadcast(...) but without having access to the actual condition variable being waited on.
Is this possible, or is relying on a program like kill my only approach?
The way you're doing it right now is probably the most correct and simplest. There is no "wake all waiting futexes in a given process" operation in the kernel, which is what you would need to achieve this more directly.
Note that if the failure-to-wake "deadlock" is in pthread_cond_wait but interrupting it with a signal breaks out of the deadlock, the bug cannot be in the application; it must actually be in the implementation of pthread condition variables. glibc has known unfixed bugs in its condition variable implementation; see http://sourceware.org/bugzilla/show_bug.cgi?id=13165 and related bug reports. However, you might have found a new one, since I don't think the existing known ones can be fixed by breaking out of the futex wait with a signal. If you can report this bug to the glibc bug tracker, it would be very helpful.
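For completeness, the shell one-liner above translated into a small external watchdog program in C (the pidfile path is the one from the question; error handling is minimal):
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

int main(void)
{
    FILE *f = fopen("/var/run/mypidfile", "r");
    int pid;

    if (!f || fscanf(f, "%d", &pid) != 1) {
        perror("reading /var/run/mypidfile");
        return 1;
    }
    fclose(f);

    // Stop and immediately continue the whole process; in the asker's
    // experience this forces the blocked threads out of their wait state
    // and breaks the driver's deadlock.
    kill((pid_t)pid, SIGSTOP);
    kill((pid_t)pid, SIGCONT);
    return 0;
}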

Why is forking slowing down my application?

My application takes a checkpoint every few hundred milliseconds by using the fork system call. However, I notice that my application slows down significantly when checkpointing (forking) is enabled. I timed the fork call itself and it came out to 1 to 2 ms. So why is fork slowing down my application so much? Note that I only keep one checkpoint (forked process) at a time and kill the previous checkpoint whenever I take a new one. Also, my computer has plenty of RAM.
Notice that my forked process just sleeps after creation. It is only woken when a rollback needs to be done, so it should not be scheduled by the OS. One thing that comes to mind is that since fork uses a copy-on-write mechanism, page faults occur whenever my application modifies a page. But should that slow down the application significantly? Without checkpointing (forking), my application finishes in approximately 3.1 seconds; with it, it takes around 3.7 seconds. Any idea what is slowing down my application?
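For reference, the checkpoint scheme described above looks roughly like this (a sketch based only on the question's description; the checkpoint() helper is made up, and the timings in the comments are the ones quoted in the question):
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pid_t checkpoint_pid = -1;

// Take a new checkpoint: kill the previous one, then fork a child that just
// sleeps, holding a copy-on-write snapshot of the parent's address space.
static void checkpoint(void)
{
    if (checkpoint_pid > 0) {
        kill(checkpoint_pid, SIGKILL);
        waitpid(checkpoint_pid, NULL, 0);   // reap so no zombies pile up
    }

    pid_t pid = fork();                     // 1-2 ms by itself, per the question
    if (pid == 0) {
        pause();                            // child: wait until rollback (or SIGKILL)
        _exit(0);
    }
    checkpoint_pid = pid;
    // From here on, every page the parent writes takes a copy-on-write
    // fault, which is likely where the extra ~0.6 s (3.1 s vs 3.7 s) goes.
}

int main(void)
{
    for (int i = 0; i < 10; i++) {
        checkpoint();
        usleep(100 * 1000);                 // the application's real work goes here
    }
    if (checkpoint_pid > 0) {
        kill(checkpoint_pid, SIGKILL);
        waitpid(checkpoint_pid, NULL, 0);
    }
    return 0;
}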
You are probably observing the cost of the copy-on-write mechanism, as you hypothesize. That's actually quite expensive -- it is the reason vfork still exists. (The main cost is not the extra page faults themselves, but the memcpy of each page as it is touched, and the associated cache and TLB flushes.) It's not showing up as a cost of fork because the page faults don't happen inside the system call.
You can confirm the hypothesis by looking at the times reported by getrusage -- if this is correct, the extra time elapsed should be nearly all "system" time (CPU burnt inside the kernel). oprofile or perf will let you pin down the problem more specifically... if you can get them to work at all, which is nontrivial, alas.
Unfortunately, copy-on-write is also the reason why your checkpoint mechanism works in the first place. Can you get away with taking checkpoints at longer intervals? That's the only quick fix I can think of.
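A quick way to test that hypothesis, as suggested above, is to compare user time, system time, and the minor-fault count with and without checkpointing; a minimal sketch (report_usage is a made-up helper):
#include <stdio.h>
#include <sys/resource.h>

// If copy-on-write is the culprit, the checkpointing run should show the
// extra elapsed time as system time and a much larger minor-fault count.
static void report_usage(void)
{
    struct rusage ru;

    getrusage(RUSAGE_SELF, &ru);
    printf("user   %ld.%06ld s\n", (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec);
    printf("system %ld.%06ld s\n", (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
    printf("minor page faults: %ld\n", ru.ru_minflt);
}

int main(void)
{
    // ... run the workload here, with and without checkpointing ...
    report_usage();
    return 0;
}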
I suggest using oprofile to find out.
oprofile is able to profile the whole system, not only a single process.
You could also compare with what other checkpointing packages do, e.g. BLCR.
Forking is by nature very expensive, as you're creating a copy of the existing process as an entirely new process. If speed is important to you, you should use threads.
Additionally, you say that the forked process sleeps until a 'rollback' is needed. I'm not sure what you mean by rollback, but provided it's something that you can put in a function, you ought to just place it in a function and then create a thread that runs that function and exits when you detect the need for the rollback. As an added bonus, with that method you only create the thread if you need it.

Shared POSIX objects cleanup on process end / death

Is there any way to clean up shared POSIX synchronization objects, especially on process crash? Unblocking locked POSIX semaphores is the most desired thing, but automatically 'collected' queues / shared memory regions would be nice too. Another thing to keep in mind is that we can't, in general, use signal handlers, because SIGKILL cannot be caught.
I see only one alternative: some external daemon that accepts subscriptions and 'keep-alive' requests and works as a watchdog, so that when it stops receiving notifications about some object it can close / unlock the object according to a registered policy.
Does anyone have a better alternative / proposal? I have never worked seriously with POSIX shared objects before (sockets were enough for all my needs and, in my opinion, are much more useful) and I did not find any applicable article. I'd gladly use sockets here, but can't for historical reasons.
Rather than using semaphores you could use file locking to coordinate your processes. The big advantage of file locks is that they are released if the process terminates. You can map each semaphore onto a lock for a byte in a shared file and know that the locks will get released on exit; in most versions of Unix the bytes you lock don't even have to exist. There is code for this in Marc Rochkind's book Advanced Unix Programming, 1st edition; I don't know if it's in the latest 2nd edition though.
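A sketch of that approach (the lock-file path and the single 'semaphore' at byte offset 0 are made up): each semaphore maps to one byte in a shared file, a blocking F_SETLKW plays the role of the wait operation, and the kernel drops any locks the process still holds when it exits or crashes:
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

// Lock or unlock one byte of the shared lock file; any lock still held is
// released automatically by the kernel when the process terminates.
static int byte_lock(int fd, off_t which, short type /* F_WRLCK or F_UNLCK */)
{
    struct flock fl = {
        .l_type   = type,
        .l_whence = SEEK_SET,
        .l_start  = which,
        .l_len    = 1,
    };
    return fcntl(fd, F_SETLKW, &fl);
}

int main(void)
{
    int fd = open("/tmp/example.lockfile", O_RDWR | O_CREAT, 0666);
    if (fd < 0) { perror("open"); return 1; }

    byte_lock(fd, 0, F_WRLCK);   // "wait" on semaphore #0
    // ... critical section ...
    byte_lock(fd, 0, F_UNLCK);   // "post"; also happens implicitly on exit

    close(fd);
    return 0;
}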
I know this question is old, but another great solution is POSIX robust mutexes. They automatically unlock and enter an "inconsistent flag" state when the owner dies, and the next thread to attempt locking the mutex gets an EOWNERDEAD error but succeeds in becoming the new owner of the mutex. It's then able to clean up whatever state the mutex was protecting (which could be in a very bad inconsistent state due to asynchronous termination of the previous owner!) and mark the mutex as consistent again before unlocking it.
See the documentation on robust mutexes here:
http://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_mutex_lock.html
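A sketch of a robust, process-shared mutex living in shared memory (the shm name /example_robust and the bare-bones setup are made up; error handling is omitted, and in real code only the creating process would run pthread_mutex_init):
#include <errno.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    // Shared memory that holds the mutex, mapped by every cooperating process.
    int fd = shm_open("/example_robust", O_RDWR | O_CREAT, 0666);
    ftruncate(fd, sizeof(pthread_mutex_t));
    pthread_mutex_t *m = mmap(NULL, sizeof(*m), PROT_READ | PROT_WRITE,
                              MAP_SHARED, fd, 0);

    // Robust + process-shared attributes; only the creator should do this.
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED);
    pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST);
    pthread_mutex_init(m, &attr);

    if (pthread_mutex_lock(m) == EOWNERDEAD) {
        // The previous owner died while holding the lock: repair whatever
        // state the mutex protects, then mark the mutex usable again.
        pthread_mutex_consistent(m);
    }
    // ... critical section ...
    pthread_mutex_unlock(m);
    return 0;
}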
The usual way is to work with signal handlers. Just catch the signals and call the cleanup functions.
But your watchdog daemon has some merit, too. It would surely make the system simpler to understand and manage. To make it simpler to administer, your application should start the daemon when it's not running, and the daemon should be able to clean up any residue from the last crash.
