Thread not acquiring available mutex - c

The following is a bit of output from gdb. It seems to me that the thread running here is not actually acquiring the lock.
120 pthread_mutex_lock(&queue_p->lock);
(gdb) p queue_p->lock->__data->__owner
$45 = 0 //This makes sense; the instruction hasn't run yet.
(gdb) n
121 if (queue_p->back == NULL) {
(gdb) p queue_p->lock->__data->__owner
$46 = 0 // What?
There are no other threads running when this happens. It's just a single line of code, but it's not operating the way I expect. Which probably means my expectations are wrong. Can anyone un-befuddle me?
EDIT: Hoping this may be of use to others. What I missed is that the main thread continues execution after it creates threads. This is obvious to me now, but at the time I posted this question, I was under the influence of a lifetime spent writing single-threaded procedural code.
So my code created threads, each of which attempted to acquire this mutex. But before they could, the main thread had run its course, which included destroying the mutex. I was focussed on the code shown and didn't fully comprehend that it wasn't the only code running.
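The fix that follows from this is small. Here is a hedged sketch (illustrative names, not the original queue code) in which main joins every worker before destroying the mutex, so no thread can ever try to lock a destroyed mutex:

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical shared state guarded by a mutex; names are
   illustrative, not taken from the question's code. */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int shared_count = 0;

static void *worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);   /* safe: main joins before destroying */
    shared_count++;
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Returns the final count; the key point is joining every thread
   BEFORE tearing the mutex down. */
int run_workers(int nthreads)
{
    pthread_t tids[16];
    if (nthreads > 16)
        nthreads = 16;
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, worker, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);  /* main waits here...           */
    pthread_mutex_destroy(&lock);     /* ...so this is now safe       */
    return shared_count;
}
```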

Related

Reading an update across threads

In my application, I have a block of shared memory, which one thread periodically writes to, and another thread periodically gets (and then sets to 0).
Thread 1:
#onevent:
__atomic_store_n(addr, val, __ATOMIC_SEQ_CST);
Thread 2:
while((val = __atomic_exchange_n(addr, 0, __ATOMIC_SEQ_CST)) == 0);
... work on val
I find that occasionally thread 2 spins forever.
In addition, if I place any kind of debugging statement, say a print of addr after the atomic store or after each atomic exchange, everything works fine (so it looks like some kind of race condition).
I'm really stuck, since I tried this in a separate isolated program and it seems to work fine. Any help would be much appreciated. For reference, I am running on a high-core-count dual-socket node.
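For reference, the handoff pattern described above reduces to a minimal two-thread sketch (all names illustrative). If an isolated version like this behaves but the full application spins, it suggests the bug is in something the sketch omits, e.g. the two sides not agreeing on the same address, or the store never actually happening:

```c
#include <pthread.h>
#include <stddef.h>

/* One thread publishes a nonzero value with an atomic store; the
   other claims it with an atomic exchange that resets the slot to 0. */
static int slot = 0;

static void *producer(void *arg)
{
    int val = *(int *)arg;
    __atomic_store_n(&slot, val, __ATOMIC_SEQ_CST);
    return NULL;
}

/* Spins until it observes a nonzero value, then returns it. */
int consume(void)
{
    int val;
    while ((val = __atomic_exchange_n(&slot, 0, __ATOMIC_SEQ_CST)) == 0)
        ;  /* spin until the producer publishes something */
    return val;
}

int demo_handoff(void)
{
    int v = 42;
    pthread_t tid;
    pthread_create(&tid, NULL, producer, &v);
    int got = consume();
    pthread_join(tid, NULL);
    return got;
}
```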

Stepping through a multithreaded application with GDB

First foray into using pthreads to create a multithreaded application.
I'm trying to debug with gdb but am getting some strange, unexpected behaviour.
Trying to ascertain whether it's me or gdb at fault.
Scenario:
Main thread creates a child thread.
I place a breakpoint on a line in the child thread fn
gdb stops on that breakpoint no problem
I confirm there are now 2 threads with info threads
I also check that the 2nd thread is starred, i.e. it is the current thread for gdb's purposes
Here is the problem: when I now hit n to step to the next line in the thread fn, the parent thread (thread 1) simply resumes and completes, and gdb exits.
Is this the correct behaviour?
How can I step through the thread fn code that is being executed in the 2nd thread line by line with gdb?
In other words, even though thread 2 is confirmed as the current thread by gdb, hitting n seems to be the equivalent of hitting c in the parent thread: the parent thread (thread 1) just resumes execution, completes and exits.
At a loss as to how to debug multiple threads with gdb behaving as it is currently
I am using gdb from within emacs25, i.e. M-x gud-gdb
What GDB does here depends on your settings, and also your system (some vendors patch this area).
Normally, in all-stop mode, when the inferior stops, GDB stops all the threads. This gives you the behavior that you'd "expect" -- you can switch freely between threads and see what is going on in each one.
When the inferior continues, including via next or step, GDB lets all threads run. So, if your second thread doesn't interact with your first thread in any way (no locks, etc), you may well see it exit.
However, you can control this using set scheduler-locking. Setting this to on will make it so that only the current thread can be resumed. And, setting it to step will make it so that only the current thread can be resumed by step and next, but will let all threads run freely on continue and the like.
The default mode here is replay, which is basically off, except when using record-and-replay mode. However, the Fedora GDB is built with the default as step; I am not sure if other distros followed this, but you may want to check.
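Assuming a stock GDB, a session using the setting described above might look like this (the comments describe the expected effect, not GDB output):

```
(gdb) set scheduler-locking step
(gdb) thread 2
(gdb) next          # only thread 2 advances; thread 1 stays stopped
(gdb) continue      # with "step", continue still resumes all threads
```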
Yes, this is the correct behaviour of gdb. You are only debugging the currently active thread; the other threads are executing normally behind the scenes. Think about it: how else would the other threads make progress?
But your code has a bug: your parent thread should not exit before the child thread is done. The best way to ensure this is to join the child thread in the main thread before exiting.

thread handling

Suppose thread A creates a thread B, and after some time thread B crashes with an issue. Is there any possibility that control moves back to thread A, in C?
Sort of an exception handling.
No. "Control passes back" doesn't make a lot of sense at all, since they are executing independently anyway -- usually, Thread A isn't going to sit around waiting for Thread B to finish, but it will be doing something else.
Incidentally, threads can, of course, check whether another thread is still running. Check your thread library or the system functions that you are using.
However, that will only work for something one could call a "soft crash"; a lot of crashes screw up a lot more than just the thread doing the bad thing, such as hardware exceptions that kill the entire process, or corrupting memory. So, trying to catch crashes in another thread is going to be a good amount of work with little benefit, if any at all. Better spend that time fixing the crashes.
No. They're separate threads of execution. Once thread A has created and started thread B, both A and B can execute independently.
Of course if thread B crashes the whole process, thread A won't exist any more...
Threads cannot call other threads, only signal them. The 'normal' function/method call/return mechanism is stack-based and each thread has its own stack, (it is very common for several threads to run exactly the same code using different stack auto-variables).
If a thread cannot call another thread, then there is no 'return' from one thread to another either.
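What thread A can do instead is collect thread B's exit status with pthread_join once B finishes. A minimal sketch, in which a "soft crash" is modeled as an error exit value (all names illustrative):

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Thread B cannot return control to A, but it can report failure
   through its exit value, which A collects with pthread_join. */
static void *thread_b(void *arg)
{
    (void)arg;
    /* ... some work here fails (a "soft crash") ... */
    return (void *)(intptr_t)-1;   /* report failure to the joiner */
}

/* A creates B, waits for it, and returns B's status:
   0 on success, nonzero on failure. */
intptr_t wait_for_b(void)
{
    pthread_t tid;
    void *status = NULL;
    pthread_create(&tid, NULL, thread_b, NULL);
    pthread_join(tid, &status);    /* A blocks here until B is done */
    return (intptr_t)status;
}
```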

Which thread holds the lock

I am working on C and I have a core dump of a multithreaded (two threads) process that I am debugging.
I see in gdb that the mutex lock appears to be acquired by both threads in a rare situation. Is there a way to check which thread possesses the lock in gdb?
I am running a flavor of Linux.
Also, I am not allowed to post the code, since it's proprietary.
On every line that acquires and releases the lock in question, do the following (changing the printf text accordingly):
break file:line
commands
printf "acquiring lock"
info threads
cont
end
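Alternatively, if the program links against glibc's NPTL, a locked mutex records its owner's kernel TID in an internal field, the same field inspected in the first question above. These internals are not a stable interface and may differ across glibc versions, and my_mutex here is a placeholder for your mutex variable:

```
(gdb) print my_mutex.__data.__owner   # kernel TID of the owner, 0 if unlocked
(gdb) info threads                    # match that TID against the LWP numbers
```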

Problems running program using shared memory; seg fault sometimes; shmmax and shmall have something to do with it?

Hi,
I have a program in which a master process spawns N workers, each of which inverts one row of an image, giving me an inverted image at the end. The program uses shared memory and POSIX semaphores (unnamed sems, more specifically), and I use shmctl with IPC_RMID, plus sem_close and sem_destroy, in the terminate() function.
However, when I run the program several times, it sometimes gives me a segmentation fault, in the first shmget. I've already modified the shmmax value in the kernel, but I can't do the same for the shmall value; I don't know why.
Can someone please help me? Why does this happen, and why not all the time? The code seems fine, gives me what I want, is efficient and so on... but sometimes I have to reboot Ubuntu to be able to run it again, even though I'm freeing the resources.
Please enlighten me!
EDIT:
Here are the 3 files needed to run the code + the makefile:
http://pastebin.com/JqTkEkPv
http://pastebin.com/v7fQXyjs
http://pastebin.com/NbYFAGYq
http://pastebin.com/mbPg1QJm
You have to run it like this ./invert someimage.ppm outimage.ppm
(test with a small one for now please)
Here are some values that may be important:
$ ipcs -lm
------ Shared Memory Limits --------
max number of segments = 4096
max seg size (kbytes) = 262144
max total shared memory (kbytes) = 8388608
min seg size (bytes) = 1
$ ipcs -ls
------ Semaphore Limits --------
max number of arrays = 128
max semaphores per array = 250
max semaphores system wide = 32000
max ops per semop call = 32
semaphore max value = 32767
EDIT: the seg fault was solved! I was allocating an **array in shared memory, and that was a little bit odd. So I've allocated a segment for an *array only, and voilà. If you want, check the new code and comment.
If all your sem_t POSIX semaphores are unnamed, you should only use sem_init and sem_destroy on them, and never sem_close.
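A minimal sketch of that lifecycle (pshared = 0 here, i.e. semaphore shared between threads of one process; for fork'ed workers the sem_t must live in shared memory and be initialized with pshared = 1):

```c
#include <semaphore.h>

/* Lifecycle of an unnamed semaphore: sem_init ... sem_destroy,
   never sem_close (that one is for named semaphores). */
int unnamed_sem_roundtrip(void)
{
    sem_t sem;
    if (sem_init(&sem, 0, 1) != 0)   /* pshared=0: threads of one process */
        return -1;
    if (sem_wait(&sem) != 0)         /* count 1 -> 0 */
        return -1;
    if (sem_post(&sem) != 0)         /* count 0 -> 1 */
        return -1;
    return sem_destroy(&sem);        /* not sem_close for unnamed sems */
}
```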
Now that you posted your code we can say a bit more.
Without having read it all in detail, I think the cleanup phase of your main looks suspicious. In fact, it seems to me that all your worker processes will perform that cleanup phase too.
After the fork you should distinguish more clearly what main does and what the workers do. Alternatives:
- Your main process could just wait on the pids of the workers and only then do the rest of the processing and cleanup.
- All the worker processes could return in main after the call to worker.
- Call exit at the end of the worker function.
Edit after your code update:
I still think a better solution would be to do a classical wait for all the processes.
Now let's look into your worker processes. In fact, they never terminate; there is no break statement in the while (1) loop. I think what is happening once there is no more work to be done is this:
- the worker is stuck in sem_wait(sem_remaining_lines)
- your main process gets notified of the termination
- it destroys sem_remaining_lines
- the worker returns from sem_wait and continues
- since mutex3 is also already destroyed (or maybe even unmapped), the wait on it returns immediately
- now it tries to access the data, and depending on how far the main process got with destruction, the data is mapped or not and the worker crashes (or not)
As you can see, you have many problems in there. What I would do to clean up this mess:
- waitpid before destroying the shared data
- sem_trywait instead of the 1 in while (1). But perhaps I didn't completely understand your control flow; in any case, give the workers a termination condition
- capture all returns from system functions, in particular the sem_t family. These can be interrupted by signals, so you definitely must check for EINTR on them
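Two of those suggestions can be sketched directly. This is a hedged sketch with illustrative names, assuming the spawning code recorded the worker pids: an EINTR-safe semaphore wait, and reaping every worker before any shared state is destroyed.

```c
#include <errno.h>
#include <semaphore.h>
#include <sys/wait.h>
#include <unistd.h>

/* EINTR-safe wrapper: sem_wait can be interrupted by a signal,
   so retry until it either succeeds or fails for a real reason. */
int sem_wait_retry(sem_t *sem)
{
    int r;
    do {
        r = sem_wait(sem);
    } while (r == -1 && errno == EINTR);
    return r;
}

/* Reap every worker BEFORE destroying shared data; pids/nworkers
   are whatever the spawning code recorded. */
int reap_workers(pid_t *pids, int nworkers)
{
    for (int i = 0; i < nworkers; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) == -1 && errno != ECHILD)
            return -1;
    }
    return 0;  /* only now is it safe to sem_destroy/shmctl(IPC_RMID) */
}
```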
