Best way to invoke gdb from inside program to print its stacktrace? - c

Using a function like this:
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>
void print_trace() {
char pid_buf[30];
sprintf(pid_buf, "--pid=%d", getpid());
char name_buf[512];
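ssize_t name_len = readlink("/proc/self/exe", name_buf, sizeof name_buf - 1);
if (name_len < 0) return; /* could not resolve our own executable path */
name_buf[name_len] = 0; /* readlink does not null-terminate */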
int child_pid = fork();
if (!child_pid) {
dup2(2,1); // redirect output to stderr
fprintf(stdout,"stack trace for %s pid=%s\n",name_buf,pid_buf);
execlp("gdb", "gdb", "--batch", "-n", "-ex", "thread", "-ex", "bt", name_buf, pid_buf, NULL);
abort(); /* If gdb failed to start */
} else {
waitpid(child_pid,NULL,0);
}
}
I see the details of print_trace in the output.
What are other ways to do it?

You mentioned on my other answer (now deleted) that you also want to see line numbers. I'm not sure how to do that when invoking gdb from inside your application.
But I'm going to share with you a couple of ways to print a simple stacktrace with function names and their respective line numbers without using gdb. Most of them came from a very nice article from Linux Journal:
Method #1:
The first method is to disseminate it
with print and log messages in order
to pinpoint the execution path. In a
complex program, this option can
become cumbersome and tedious even if,
with the help of some GCC-specific
macros, it can be simplified a bit.
Consider, for example, a debug macro
such as:
#define TRACE_MSG fprintf(stderr, "%s() [%s:%d] here I am\n", \
__func__, __FILE__, __LINE__)
You can propagate this macro quickly
throughout your program by cutting and
pasting it. When you do not need it
anymore, switch it off simply by
defining it to no-op.
Method #2: (It doesn't say anything about line numbers, but I do on method 4)
A nicer way to get a stack backtrace,
however, is to use some of the
specific support functions provided by
glibc. The key one is backtrace(),
which navigates the stack frames from
the calling point to the beginning of
the program and provides an array of
return addresses. You then can map
each address to the body of a
particular function in your code by
having a look at the object file with
the nm command. Or, you can do it a
simpler way--use backtrace_symbols().
This function transforms a list of
return addresses, as returned by
backtrace(), into a list of strings,
each containing the function name,
the offset within the function, and the
return address. The list of strings is
allocated from your heap space (as if
you called malloc()), so you should
free() it as soon as you are done with
it.
I encourage you to read it since the page has source code examples. In order to convert an address to a function name you must compile your application with the -rdynamic option.
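For readers who cannot reach the article, here is a minimal sketch of the backtrace()/backtrace_symbols() pair described in Method #2. It assumes glibc on Linux; print_backtrace and MAX_FRAMES are hypothetical names chosen for illustration, not taken from the article.

```c
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

#define MAX_FRAMES 32

/* Hypothetical helper: prints the current call stack to `out` and
 * returns the number of frames captured, or -1 on failure. */
int print_backtrace(FILE *out)
{
    void *frames[MAX_FRAMES];
    int count = backtrace(frames, MAX_FRAMES);

    /* All strings live in one malloc'd block, so a single free() is enough. */
    char **symbols = backtrace_symbols(frames, count);
    if (symbols == NULL) {
        return -1;
    }

    for (int i = 0; i < count; i++) {
        fprintf(out, "[bt] #%d %s\n", i, symbols[i]);
    }

    free(symbols);
    return count;
}
```

Calling print_backtrace(stderr) from any function prints one line per frame. Function names only show up for the executable's own functions if it was linked with -rdynamic.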
Method #3: (A better way of doing method 2)
An even more useful application for
this technique is putting a stack
backtrace inside a signal handler and
having the latter catch all the "bad"
signals your program can receive
(SIGSEGV, SIGBUS, SIGILL, SIGFPE and
the like). This way, if your program
unfortunately crashes and you were not
running it with a debugger, you can
get a stack trace and know where the
fault happened. This technique also
can be used to understand where your
program is looping in case it stops
responding.
An implementation of this technique is available here.
Method #4:
A small improvement I've made to method #3 to print line numbers. The same change can be applied to method #2.
Basically, I followed a tip that uses addr2line to
convert addresses into file names and
line numbers.
The source code below prints line numbers for all local functions. If a function from another library is called, you might see a couple of ??:0 instead of file names.
#include <stdio.h>
#include <stdlib.h>
#include <signal.h>
#include <execinfo.h>
void bt_sighandler(int sig, struct sigcontext ctx) {
void *trace[16];
char **messages = (char **)NULL;
int i, trace_size = 0;
if (sig == SIGSEGV)
printf("Got signal %d, faulty address is %p, "
"from %p\n", sig, ctx.cr2, ctx.eip);
else
printf("Got signal %d\n", sig);
trace_size = backtrace(trace, 16);
/* overwrite sigaction with caller's address */
trace[1] = (void *)ctx.eip;
messages = backtrace_symbols(trace, trace_size);
/* skip first stack frame (points here) */
printf("[bt] Execution path:\n");
for (i=1; i<trace_size; ++i)
{
printf("[bt] #%d %s\n", i, messages[i]);
/* find first occurrence of '(' or ' ' in messages[i] and assume
* everything before that is the file name. (Don't go beyond 0 though,
* the string terminator.) */
size_t p = 0;
while(messages[i][p] != '(' && messages[i][p] != ' '
&& messages[i][p] != 0)
++p;
char syscom[256];
/* the precision argument of %.*s must be an int, not a size_t */
snprintf(syscom, sizeof syscom, "addr2line %p -e %.*s", trace[i], (int)p, messages[i]);
//last parameter is the file name of the symbol
system(syscom);
}
exit(0);
}
int func_a(int a, char b) {
char *p = (char *)0xdeadbeef;
a = a + b;
*p = 10; /* CRASH here!! */
return 2*a;
}
int func_b() {
int res, a = 5;
res = 5 + func_a(a, 't');
return res;
}
int main() {
/* Install our signal handler */
struct sigaction sa;
sa.sa_handler = (void *)bt_sighandler;
sigemptyset(&sa.sa_mask);
sa.sa_flags = SA_RESTART;
sigaction(SIGSEGV, &sa, NULL);
sigaction(SIGUSR1, &sa, NULL);
/* ... add any other signal here */
/* Do something */
printf("%d\n", func_b());
}
This code should be compiled as: gcc sighandler.c -o sighandler -rdynamic
The program outputs:
Got signal 11, faulty address is 0xdeadbeef, from 0x8048975
[bt] Execution path:
[bt] #1 ./sighandler(func_a+0x1d) [0x8048975]
/home/karl/workspace/stacktrace/sighandler.c:44
[bt] #2 ./sighandler(func_b+0x20) [0x804899f]
/home/karl/workspace/stacktrace/sighandler.c:54
[bt] #3 ./sighandler(main+0x6c) [0x8048a16]
/home/karl/workspace/stacktrace/sighandler.c:74
[bt] #4 /lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe6) [0x3fdbd6]
??:0
[bt] #5 ./sighandler() [0x8048781]
??:0
Update 2012/04/28: for recent Linux kernel versions, the signal handler signature used above is obsolete. I also improved it a bit by grabbing the executable name from this answer. Here is an up-to-date version:
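char* exe = 0;
int initialiseExecutableName()
{
char link[1024];
exe = malloc(1024); /* malloc rather than C++ new, since this is C */
snprintf(link,sizeof link,"/proc/%d/exe",getpid());
ssize_t len = exe ? readlink(link,exe,1023) : -1;
if(len==-1) {
fprintf(stderr,"Could not resolve the executable name\n");
exit(1);
}
exe[len] = 0; /* readlink does not null-terminate */
printf("Executable name initialised: %s\n",exe);
return 0;
}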
const char* getExecutableName()
{
if (exe == 0)
initialiseExecutableName();
return exe;
}
/* get REG_EIP from ucontext.h */
#define __USE_GNU
#include <ucontext.h>
void bt_sighandler(int sig, siginfo_t *info,
void *secret) {
void *trace[16];
char **messages = (char **)NULL;
int i, trace_size = 0;
ucontext_t *uc = (ucontext_t *)secret;
/* Do something useful with siginfo_t */
if (sig == SIGSEGV)
printf("Got signal %d, faulty address is %p, "
"from %p\n", sig, info->si_addr,
uc->uc_mcontext.gregs[REG_EIP]);
else
printf("Got signal %d\n", sig);
trace_size = backtrace(trace, 16);
/* overwrite sigaction with caller's address */
trace[1] = (void *) uc->uc_mcontext.gregs[REG_EIP];
messages = backtrace_symbols(trace, trace_size);
/* skip first stack frame (points here) */
printf("[bt] Execution path:\n");
for (i=1; i<trace_size; ++i)
{
printf("[bt] %s\n", messages[i]);
/* find first occurrence of '(' or ' ' in messages[i] and assume
* everything before that is the file name. (Don't go beyond 0 though,
* the string terminator.) */
size_t p = 0;
while(messages[i][p] != '(' && messages[i][p] != ' '
&& messages[i][p] != 0)
++p;
char syscom[256];
/* the precision argument of %.*s must be an int, not a size_t */
snprintf(syscom, sizeof syscom, "addr2line %p -e %.*s", trace[i], (int)p, messages[i]);
//last parameter is the filename of the symbol
system(syscom);
}
exit(0);
}
and initialise like this:
int main() {
/* Install our signal handler */
struct sigaction sa;
sa.sa_sigaction = (void *)bt_sighandler;
sigemptyset (&sa.sa_mask);
sa.sa_flags = SA_RESTART | SA_SIGINFO;
sigaction(SIGSEGV, &sa, NULL);
sigaction(SIGUSR1, &sa, NULL);
/* ... add any other signal here */
/* Do something */
printf("%d\n", func_b());
}

If you're using Linux, the standard C library includes a function called backtrace, which populates an array with frames' return addresses, and another function called backtrace_symbols, which will take the addresses from backtrace and look up the corresponding function names. These are documented in the GNU C Library manual.
Those won't show argument values, source lines, and the like, and they only apply to the calling thread. However, they should be a lot faster (and perhaps less flaky) than running GDB that way, so they have their place.

nobar posted a fantastic answer. In short:
So you want a stand-alone function that prints a stack trace with all of the features that gdb stack traces have and that doesn't terminate your application. The answer is to automate the launch of gdb in a non-interactive mode to perform just the tasks that you want.
This is done by executing gdb in a child process, using fork(), and scripting it to display a stack-trace while your application waits for it to complete. This can be performed without the use of a core-dump and without aborting the application.
I believe that this is what you are looking for, @Vi

Isn't abort() simpler?
That way if it happens in the field the customer can send you the core file (I don't know many users who are involved enough in my application to want me to force them to debug it).

Related

Why the printf( ) is working strangely after reopening stdout stream

After reopening the stdout stream, a message no longer appears on my screen if I call printf() like this:
printf("The message disappeared\n");
A code snippet that demonstrates the problem:
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <stdarg.h>
#include <errno.h>
int main(void)
{
printf("Display a message\n");
int fd, fd_copy, new_fd;
FILE *old_stream = stdout;
fd = STDOUT_FILENO;
fd_copy = dup(fd);
fclose(old_stream);
new_fd = dup2(fd_copy, fd);
close(fd_copy);
FILE *new_stream = fdopen(fd, "w");
stdout = new_stream;
printf("test %d\n", 1);
fprintf(stdout, "test 2\n");
int rc = printf("test 3\n");
printf("Test 4 Why the message disappeared\n");
printf("Error message is [%s]\n", strerror(errno));
return 0;
}
Why is test 4 the only message that does not appear on my screen? Don't they all use stdout as output?
Output:
# gcc main.c; ./a.out
Display a message
test 1
test 2
test 3
Error message is [Bad file descriptor]
The code snippet above comes from the LVM2 library function.
int reopen_standard_stream(FILE **stream, const char *mode)
/* https://github.com/lvmteam/lvm2/blob/master/lib/log/log.c */
The dynamic library I designed:
I wrapped the LVM dynamic library in a dynamic library of my own for other processes to use. One of its functions is this one, which outputs all PVs in the system:
char global_buffer[0x1000];
void show_pvs_clbk_fn(int level, const char *file, int line,
int dm_errno, const char *format)
{
/* Extract and process output here rather than printing it */
if (level != LVM2_LOG_PRINT)
return;
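/* append at the end: using global_buffer as both source and destination is undefined */
size_t used = strlen(global_buffer);
snprintf(global_buffer + used, sizeof global_buffer - used, "%s\n", format);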
}
int show_all_PV(char *out_buffer)
{
void *handle = lvm2_init();
lvm2_log_fn(show_pvs_clbk_fn);
int rc = lvm2_run(handle, "pvs");
lvm2_exit(handle);
if (rc != LVM2_COMMAND_SUCCEEDED) {
return -1;
}
strcpy(out_buffer, global_buffer);
return 0;
}
A caller may call the show_all_PV() API like this:
int main(void)
{
char tmp[0x1000];
if (!show_all_PV(tmp)) {
printf("====== PVS are ======\n");
printf("%s\n", tmp);
}
}
Output:
====== PVS are ======
PV VG Fmt Attr PSize PFree
/dev/nvme1n1p1 vg1 lvm2 a-- <1.2t 1.1t
Some callers may mess up stdout:
I found something strange: if the caller defines a function that contains a vfprintf(stdout, ...) call, it never gets any output from a normal printf() call.
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <stdarg.h>
#if 1
int a_function_never_be_called(const char *formatP, ...)
{
va_list ap;
va_start(ap, formatP);
vfprintf(stdout, formatP, ap);
va_end(ap);
return 0;
}
#endif
int main(void)
{
char tmp[0x1000];
if (!show_all_PV(tmp)) {
printf("====== PVS are ======\n");
printf("%s\n", tmp);
}
}
The string "====== PVS are ======" disappeared and the caller got the I/O error "Bad file descriptor".
Output:
PV VG Fmt Attr PSize PFree
/dev/nvme1n1p1 vg1 lvm2 a-- <1.2t 1.1t
Assigning to stdout (or stdin or stderr) is Undefined Behaviour. And in the face of undefined behaviour, odd things happen.
Technically, no more needs to be said. But after I wrote this answer, @zwol noted in a comment that the glibc documentation claims to allow reassignment of standard IO streams. In those terms, this behaviour is a bug. I accept this fact, but the question was not predicated on the use of glibc, and there are many other standard library implementations which don't make this guarantee. In some of them, assigning to stdout will raise an error at compile time; in others, it will simply not work or not work consistently. In other words, regardless of glibc, assigning to stdout is Undefined Behaviour, and software which attempts to do so is, at best, unportable. (And, as we see, even on glibc it can lead to unpredictable output.)
But my curiosity was aroused so I investigated a bit. The first thing is to look at the actual code generated by gcc and see what library function is actually being called by each of those output calls:
printf("test %d\n", 1); /* Calls printf("test %d\n", 1); */
fprintf(stdout, "test 2\n"); /* Calls fwrite("test 2\n", 1, 7, stdout); */
int rc = printf("test 3\n"); /* Calls printf("test 3\n"); */
printf("Test 4 Why the message disappeared\n");
/* Calls puts("Test 4...disappeared"); */
printf("Error message is [%s]\n", strerror(errno));
/* Calls printf("..."); */
Note that GCC is trying hard to optimise the calls. In lines 2 and 4, it is able to find a non-printf library call, avoiding run-time parsing of the format string.
But note that it does not do that in the case of line 3, which looks the same as line 4. Why not? Because you are using the return value of printf, which is the number of characters sent to stdout. But that's not the same as the return value of puts, which just returns a "non-negative number" on success. So the substitution is impossible.
Suppose we remove int rc = from line 3, and recompile. Now we get this:
printf("test %d\n", 1); /* Calls printf("test %d\n", 1); */
fprintf(stdout, "test 2\n"); /* Calls fwrite("test 2\n", 1, 7, stdout); */
printf("test 3\n"); /* Calls puts("test 3"); */
printf("Test 4 Why the message disappeared\n");
/* Calls puts("Test 4...disappeared"); */
printf("Error message is [%s]\n", strerror(errno));
/* Calls printf("..."); */
So without the use of the return value, GCC can substitute printf with puts. (Note also that when it does that substitution, it also removes the \n from the string literal, because puts automatically adds a newline to the end of its output.)
When we run the modified program, we see this:
Display a message
test 1
test 2
Error message is [Bad file descriptor]
Now, two lines have disappeared, which are precisely the two lines for which GCC used puts.
After the shenanigans at the beginning, puts no longer works, presumably because it relies on stdout not having been reassigned. Which it's allowed to do, because reassigning stdout is Undefined Behaviour. (You can use freopen if you want to reopen stdout.)
Final note:
Unsurprisingly, it turns out that the glibc team did accept it as a bug; it was reported as bug 24051 and a similar issue with stdin as bug 24153. Both were fixed in glibc v2.30, released in August of 2019. So if you have a recently upgraded Linux install, or you are reading this answer years after I wrote it, you might not see this bug.
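The freopen route mentioned in this answer can be sketched as follows. This is only an illustration under stated assumptions, not the LVM2 code: redirect_stdout is a hypothetical helper, and the target path is whatever the caller chooses.

```c
#include <stdio.h>

/* Hypothetical helper: points stdout at `path` without ever assigning to
 * the stdout variable. Returns 0 on success, -1 on failure. */
int redirect_stdout(const char *path)
{
    /* Push out anything still buffered for the old destination. */
    fflush(stdout);

    /* freopen closes the old file and reuses the same FILE object, so
     * printf, puts and fwrite all keep agreeing on what stdout is. */
    return freopen(path, "w", stdout) == NULL ? -1 : 0;
}
```

Because the FILE object itself is reused, it makes no difference whether the compiler turns a printf call into puts: both end up writing to the new destination.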

Partial write due to SIGINT

I have a write_full which should write the buffer fully to standard output even if it was interrupted by a signal.
I have a loop which keeps calling write_full with a string until quit is changed by a signal handler. Here's the code:
#include <signal.h>
#include <unistd.h>
#include <errno.h>
volatile sig_atomic_t quit = 0;
void sigint_handler(int s)
{
quit = 1;
}
int write_full(char *buf, size_t len)
{
while (len > 0) {
ssize_t written = write(STDOUT_FILENO, buf, len);
if (written == -1) {
if (errno == EINTR) {
continue;
}
return -1;
}
buf += written;
len -= (size_t)written;
}
return 0;
}
int main(void)
{
struct sigaction act = {
.sa_handler = sigint_handler
};
sigaction(SIGINT, &act, NULL);
while (!quit) {
write_full("loop\n", 5);
}
write_full("cleanup", 7);
return 0;
}
I expect the program to write the "loop" fully before printing "cleanup", but I see an output like this:
loop
loop
loop
l^C
cleanup
Why is this happening? I expect it to be something like this:
loop
loop
loop
l^Coop
cleanup
because the write_full should continue writing the "oop\n" part even after the first write was a short write due to interrupt. I put a breakpoint at the signal handler and stepped and it seems like write is reporting that it has written 4 characters even when it only wrote "l" to the stdout. So instead of writing "oop\n" next, it only writes "\n".
l^C
Program received signal SIGINT, Interrupt.
Breakpoint 1, sigint_handler (s=2) at src/main.c:9
9 quit = 1;
(gdb) next
10 }
(gdb) next
write_full (buf=0x4020a0 "loop\n", len=5) at src/main.c:16
16 if (written == -1) {
(gdb) print written
$1 = 4
Why is this happening? And how can I fix this?
Ctrl-C messes up terminal output. The program writes all it should write, but by default the terminal driver cuts the line after Ctrl-C. This is not under control of your program. The cut happens if the driver sees Ctrl-C in the middle of copying of the complete line buffer to the physical device. Input line buffer is also discarded. This pertains to other signal-generating characters as well.
This is described rather briefly in the stty(1) and termios(3) manual pages.
This behaviour can be disabled with stty noflsh command.
You can also redirect to a file to see the complete output.
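If you would rather change the setting from inside the program than run stty, the same flag can be set through termios. A minimal sketch, assuming a POSIX terminal; disable_flush_on_signal is a hypothetical name, and the setting stays in effect on the terminal after the program exits unless you save and restore the original attributes.

```c
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: sets NOFLSH on the terminal behind `fd`, which is
 * the flag that `stty noflsh` turns on. Returns 0 on success, -1 if `fd`
 * is not a terminal or its attributes cannot be changed. */
int disable_flush_on_signal(int fd)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) == -1) {
        return -1;
    }

    /* With NOFLSH set, the driver no longer discards queued input and
     * output when INTR, QUIT or SUSP is typed. */
    tio.c_lflag |= NOFLSH;
    return tcsetattr(fd, TCSANOW, &tio);
}
```

Calling disable_flush_on_signal(STDOUT_FILENO) before the loop would be the natural place for it in the question's program.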

Thread safe, reentrant, async-signal safe putenv

I apologise in advance for what will be a bit of a code dump, I've trimmed as much unimportant code as possible:
// Global vars / mutex stuff
extern char **environ;
pthread_mutex_t env_mutex = PTHREAD_MUTEX_INITIALIZER;
int
putenv_r(char *string)
{
int len;
int key_len = 0;
int i;
sigset_t block;
sigset_t old;
sigfillset(&block);
pthread_sigmask(SIG_BLOCK, &block, &old);
// This function is thread-safe
len = strlen(string);
for (int i=0; i < len; i++) {
if (string[i] == '=') {
key_len = i; // Thanks Klas for pointing this out.
break;
}
}
// Need a string like key=value
if (key_len == 0) {
errno = EINVAL; // putenv doesn't normally return this err code
return -1;
}
// We're moving into environ territory so start locking stuff up.
pthread_mutex_lock(&env_mutex);
for (i = 0; environ[i] != NULL; i++) {
if (strncmp(string, environ[i], key_len) == 0) {
// Pointer assignment, so if string changes so does the env.
// This behaviour is POSIX conformant, instead of making a copy.
environ[i] = string;
pthread_mutex_unlock(&env_mutex);
return(0);
}
}
// If we get here, the env var didn't already exist, so we add it.
// Note that malloc isn't async-signal safe. This is why we block signals.
environ[i] = malloc(sizeof(char *));
environ[i] = string;
environ[i+1] = NULL;
// This ^ is possibly incorrect, do I need to grow environ some how?
pthread_mutex_unlock(&env_mutex);
pthread_sigmask(SIG_SETMASK, &old, NULL);
return(0);
}
As the title says, I'm trying to code a thread safe, async-signal safe reentrant version of putenv. The code works in that it sets the environment variable like putenv would, but I do have a few concerns:
My method for making it async-signal safe feels a bit ham-handed, just blocking all signals (except SIGKILL/SIGSTOP of course). Or is this the most appropriate way to go about it?
Is the location of my signal blocking too conservative? I know strlen isn't guaranteed to be async-signal safe, meaning that my signal blocking has to occur beforehand, but perhaps I'm mistaken.
I'm fairly sure that it is thread safe, considering that all the functions are thread-safe and that I lock interactions with environ, but I'd love to be proven otherwise.
I'm really not too sure about whether it's reentrant or not. While not guaranteed, I imagine that if I tick the other two boxes it'll most likely be reentrant?
I found another solution to this question here, in which they just set up the appropriate signal blocking and mutex locking (sick rhymes) and then call putenv normally. Is this valid? If so, it's obviously far simpler than my approach.
Sorry about the large block of code, I hope I've established a MCVE. I'm missing a bit of error checking in my code for brevity's sake. Thanks!
Here is the rest of the code, including a main, if you wish to test the code yourself:
#include <string.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <stdio.h>
#include <signal.h>
// Prototypes
static void thread_init(void);
int putenv_r(char *string);
int
main(int argc, char *argv[]) {
int ret = putenv_r("mykey=myval");
printf("%d: mykey = %s\n", ret, getenv("mykey"));
return 0;
}
This code is a problem:
// If we get here, the env var didn't already exist, so we add it.
// Note that malloc isn't async-signal safe. This is why we block signals.
environ[i] = malloc(sizeof(char *));
environ[i] = string;
It creates a char * on the heap, assigns the address of that char * to environ[i], then overwrites that value with the address contained in string. That's not going to work. It doesn't guarantee that environ is NULL-terminated afterwards.
This matters because char **environ is a pointer to an array of pointers. The last pointer in the array is NULL; that's how code can tell it has reached the end of the list of environment variables.
Something like this should work better:
unsigned int envCount;
for ( envCount = 0; environ[ envCount ]; envCount++ )
{
/* empty loop */;
}
/* since environ[ envCount ] is NULL, the environ array
of pointers has envCount + 1 elements in it */
envCount++;
/* grow the environ array by one pointer */
char ** newEnviron = realloc( environ, ( envCount + 1 ) * sizeof( char * ) );
/* add the new envval */
newEnviron[ envCount - 1 ] = newEnvval;
/* NULL-terminate the array of pointers */
newEnviron[ envCount ] = NULL;
environ = newEnviron;
Note that there's no error checking, and it assumes the original environ array was obtained via a call to malloc() or similar. If that assumption is wrong, the behavior is undefined.
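If you cannot rely on that assumption, one alternative is to copy the pointers into a freshly allocated array instead of calling realloc() on environ. The following is a sketch of that swapped-in technique, not the code above with checks added: env_append is a hypothetical name, it does not replace an existing key, and it is not thread-safe by itself, so the caller would still hold the mutex from the question.

```c
#include <stdlib.h>
#include <string.h>

extern char **environ;

/* Hypothetical helper: appends `entry` ("KEY=value") to the environment by
 * building a new array. Returns 0 on success, -1 if allocation fails.
 * The old array is deliberately not freed, because its origin is unknown. */
int env_append(char *entry)
{
    size_t count = 0;

    if (environ != NULL) {
        while (environ[count] != NULL) {
            count++;
        }
    }

    /* Room for the existing entries, the new one and the final NULL. */
    char **grown = malloc((count + 2) * sizeof *grown);
    if (grown == NULL) {
        return -1;
    }

    if (count > 0) {
        memcpy(grown, environ, count * sizeof *grown);
    }
    grown[count] = entry;
    grown[count + 1] = NULL;
    environ = grown;
    return 0;
}
```

The cost of this design is that every call leaks the previous array. That is acceptable for a handful of variables but not for a loop that adds thousands.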

How to make several threads read several files without interference?

I am studying mutexes and I am stuck in an exercise. For each file in a given directory, I have to create a thread to read it and display its contents (no problem if order is not correct).
So far, the threads are running this function:
void * reader_thread (void * arg)
{
char * file_path = (char*)arg;
FILE * f;
char temp[20];
int value;
f=fopen(file_path, "r");
printf("Opened %s.\n",file_path);
while (fscanf(f, "%s",temp)!=EOF)
if (!get_number (temp, &value)) /*Gets int value from given string (if numeric)*/
printf("Thread %lu -> %s: %d\n", pthread_self(), file_path, value );
fclose(f);
pthread_exit(NULL);
}
Being called by a function that receives a DIR pointer, previously created by opendir().
(I have omitted some error checking here to make it cleaner, but I get no error at all.)
int readfiles (DIR * dir, char * path)
{
struct dirent * temp = NULL;
char * file_path;
pthread_t thList [MAX_THREADS];
int nThreads=0, i;
memset(thList, 0, sizeof(pthread_t)*MAX_THREADS);
file_path=malloc((257+strlen(path))*sizeof(char));
while((temp = readdir (dir))!=NULL && nThreads<MAX_THREADS) /*Reads files from dir*/
{
if (temp->d_name[0] != '.') /*Ignores the ones beginning with '.'*/
{
get_file_path(path, temp->d_name, file_path); /*Computes path (overwritten every iteration)*/
printf("Got %s.\n", file_path);
pthread_create(&thList[nThreads], NULL, reader_thread, (void * )file_path);
nThreads++;
}
}
printf("readdir: %s\n", strerror (errno )); /*Just in case*/
for (i=0; i<nThreads ; i++)
pthread_join(thList[i], NULL);
if (file_path)
free(file_path);
return 0;
}
My problem here is that, although paths are computed perfectly, the threads don't seem to receive the correct argument. They all read the same file. This is the output I get:
Got test/testB.
Got test/testA.
readdir: Success
Opened test/testA.
Thread 139976911939328 -> test/testA: 3536
Thread 139976911939328 -> test/testA: 37
Thread 139976911939328 -> test/testA: -38
Thread 139976911939328 -> test/testA: -985
Opened test/testA.
Thread 139976903546624 -> test/testA: 3536
Thread 139976903546624 -> test/testA: 37
Thread 139976903546624 -> test/testA: -38
Thread 139976903546624 -> test/testA: -985
If I join the threads before the next one begins, it works OK. So I assume there is a critical section somewhere, but I don't really know how to find it. I have tried mutexing the whole thread function:
void * reader_thread (void * arg)
{
pthread_mutex_lock(&mutex_file);
/*...*/
pthread_mutex_unlock(&mutex_file);
}
And also, mutexing the while loop in the second function. Even both at the same time. But it won't work in any way. By the way, mutex_file is a global variable, which is init'd by pthread_mutex_init() in main().
I would really appreciate a piece of advice with this, as I don't really know what I'm doing wrong. I would also appreciate some good reference or book, as mutexes and System V semaphores are feeling a bit difficult to me.
Thank you very much.
Well, you are passing exactly the same pointer as the file path to both threads. As a result, they read the file name from the same string and end up reading the same file. Actually, you got a little bit lucky here, because in reality you have a race condition: you update the contents of the string pointed to by file_path while firing up threads that read from that pointer, so you may end up with a thread reading that memory while it is being changed. What you have to do is allocate an argument for each thread separately (i.e. call malloc and the related logic in your while loop), and then free those arguments once each thread has exited.
Looks like you're using the same file_path buffer for all threads, just loading it over and over again with the next name. You need to allocate a new string for each thread, and have each thread delete the string after using it.
edit
Since you already have an array of threads, you could just make a parallel array of char[], each holding the filename for the corresponding thread. This would avoid malloc/free.
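A minimal sketch of the per-thread allocation suggested in these answers. The names copy_path and start_readers are hypothetical, and the file reading itself is left out; the point is only that each thread receives its own copy of the path, which is freed after the thread has been joined.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of `path`, or NULL on failure. */
static char *copy_path(const char *path)
{
    size_t size = strlen(path) + 1;
    char *copy = malloc(size);
    if (copy != NULL) {
        memcpy(copy, path, size);
    }
    return copy;
}

/* The thread works on its private copy and hands it back to the joiner,
 * which frees it once the thread has exited. */
static void *reader_thread(void *arg)
{
    char *file_path = arg;
    /* ... open and read file_path here ... */
    return file_path;
}

/* Hypothetical helper: starts one thread per path and returns how many
 * threads were started. */
int start_readers(const char **paths, int count, pthread_t *threads)
{
    int started = 0;

    for (int i = 0; i < count; i++) {
        char *copy = copy_path(paths[i]);
        if (copy == NULL) {
            break;
        }
        if (pthread_create(&threads[started], NULL, reader_thread, copy) != 0) {
            free(copy);
            break;
        }
        started++;
    }
    return started;
}
```

The caller joins each thread with pthread_join(threads[i], &result) and then calls free(result). No mutex is needed, because no two threads ever share a buffer.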

Reason for Segmentation Fault

I have written a program using clone() system call having CLONE_VM and CLONE_FILES set.
I am not able to understand why the output is showing Segmentation Fault. Can somebody please correct my code and tell me the reason for the same.
#include<stdio.h>
#include<unistd.h>
#include<fcntl.h>
#include<sys/types.h>
#include<sys/stat.h>
#include<sched.h>
#include<stdlib.h>
int variable, fd;
int do_something() {
// sleep(100);
variable = 42;
close(fd);
_exit(0);
}
int main(int argc, char *argv[]) {
void **child_stack;
char tempch;
variable = 9;
fd = open("test.file", O_RDONLY);
child_stack = (void **) malloc(16384);
printf("The variable was %d\n", variable);
clone(do_something, child_stack, CLONE_VM|CLONE_FILES, NULL);
// sleep(100);
printf("The variable is now %d\n", variable);
if (read(fd, &tempch, 1) < 1) {
perror("File Read Error");
exit(1);
}
printf("We could read from the file\n");
return 0;
}
You need to know which direction the stack grows on your processor, and which end of the stack you must pass to clone().
From man clone:
Stacks grow downwards on all processors that run Linux (except the
HP PA processors), so child_stack usually points to the topmost
address of the memory space set up for the child stack.
You are not passing the topmost address, you are passing the bottommost address, and you are not (I am guessing) on HP-PA.
Fix:
child_stack = (void **) malloc(16384) + 16384 / sizeof(*child_stack);
P.S. I am astonished by the number of obviously wrong non-answers here.
No, close on an invalid file descriptor does not crash on any UNIX and Linux system in existence.
No, void* vs. void** has nothing at all to do with the problem.
No, you don't need to take an address of do_something, the compiler will do that automatically for you.
And finally, yes: calling close, _exit, or any other libc routine in the clone()d thread is potentially unsafe, although it does not cause the problem here.
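To make the fix concrete, here is a sketch that passes the top of the allocation to clone(). It assumes Linux with glibc and a downward-growing stack; run_clone_child and CHILD_STACK_SIZE are hypothetical names, and SIGCHLD is added to the flags (it is not in the question's code) so that a plain waitpid() can reap the child.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>

#define CHILD_STACK_SIZE 16384

static int variable;

/* Runs in the child; with CLONE_VM it writes to the parent's memory. */
static int do_something(void *arg)
{
    (void)arg;
    variable = 42;
    return 0;
}

/* Hypothetical helper: runs do_something in a clone()d child and returns
 * the value of `variable` afterwards, or -1 on error. */
int run_clone_child(void)
{
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL) {
        return -1;
    }

    variable = 9;

    /* Pass the highest address of the block, not the pointer malloc
     * returned: the child's stack grows downwards from there. */
    pid_t pid = clone(do_something, stack + CHILD_STACK_SIZE,
                      CLONE_VM | SIGCHLD, NULL);
    if (pid == -1 || waitpid(pid, NULL, 0) == -1) {
        free(stack);
        return -1;
    }

    free(stack);
    return variable;
}
```

Using a char pointer for the stack keeps the arithmetic in bytes, which avoids the division by sizeof(*child_stack) needed with void **.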
The way to fix it is to have the child stack actually on the stack, i.e.
char child_stack [16384];
I suspect that the stack pointer can't point to the data segment or something like that...
And even then, it works with -g but crashes with -O!
