I am experiencing a strange problem with the popen and fgets library functions on a Linux system.
A short program demonstrating the problem is below. It:
Installs a signal handler for SIGUSR1.
Creates a secondary thread to repeatedly send SIGUSR1 to the main thread.
In the main thread, repeatedly executes a very simple shell command via popen(), gets the output via fgets(), and checks to see if the output is of the expected length.
The output is unexpectedly truncated intermittently. Why?
Command-line invocation example:
$ gcc -Wall test.c -lpthread && ./a.out
iteration 0
iteration 1
iteration 2
iteration 3
iteration 4
iteration 5
unexpected length: 0
Details of my machine (the program will also compile and run with this online C compiler):
$ cat /etc/redhat-release
CentOS release 6.5 (Final)
$ uname -a
Linux localhost.localdomain 2.6.32-431.17.1.el6.x86_64 #1 SMP Wed May 7 23:32:49 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
# gcc 4.4.7
$ gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
# glibc 2.12
$ ldd --version
ldd (GNU libc) 2.12
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
The program:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <signal.h>
#include <pthread.h>
#include <errno.h>
void dummy_signal_handler(int signal);
void* signal_spam_task(void* arg);
void echo_and_verify_output();
char* fgets_with_retry(char *buffer, int size, FILE *stream);
static pthread_t main_thread;
/**
* Prints an error message and exits if the output is truncated, which happens
* about 5% of the time.
*
* Installing the signal handler with the SA_RESTART flag, blocking SIGUSR1
* during the call to fgets(), or sleeping for a few milliseconds after the
* call to popen() will completely prevent truncation.
*/
int main(int argc, char **argv) {
// install signal handler for SIGUSR1
struct sigaction sa, osa;
sa.sa_handler = dummy_signal_handler;
sigemptyset(&sa.sa_mask);
sa.sa_flags = 0;
sigaction(SIGUSR1, &sa, &osa);
// create a secondary thread to repeatedly send SIGUSR1 to main thread
main_thread = pthread_self();
pthread_t spam_thread;
pthread_create(&spam_thread, NULL, signal_spam_task, NULL);
// repeatedly execute simple shell command until output is unexpected
unsigned int i = 0;
for (;;) {
printf("iteration %u\n", i++);
echo_and_verify_output();
}
return 0;
}
void dummy_signal_handler(int signal) {}
void* signal_spam_task(void* arg) {
for (;;)
pthread_kill(main_thread, SIGUSR1);
return NULL;
}
void echo_and_verify_output() {
// run simple command
FILE* stream = popen("echo -n hello", "r");
if (!stream)
exit(1);
// count the number of characters in the output
unsigned int length = 0;
char buffer[BUFSIZ];
while (fgets_with_retry(buffer, BUFSIZ, stream) != NULL)
length += strlen(buffer);
if (ferror(stream) || pclose(stream))
exit(1);
// double-check the output
if (length != strlen("hello")) {
printf("unexpected length: %i\n", length);
exit(2);
}
}
// version of fgets() that retries on EINTR
char* fgets_with_retry(char *buffer, int size, FILE *stream) {
for (;;) {
if (fgets(buffer, size, stream))
return buffer;
if (feof(stream))
return NULL;
if (errno != EINTR)
exit(1);
clearerr(stream);
}
}
If a read error occurs on a FILE stream during fgets, the array contents are indeterminate and fgets returns NULL (7.19.7.2 of the C99 spec), so there is no guarantee about whether bytes already read end up in the buffer. So if the SIGUSR1 signal arrives during the fgets call and causes an EINTR, it's possible that some characters may be lost from the stream.
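The comment in the question's program already notes that installing the handler with SA_RESTART avoids the truncation. With that flag the kernel restarts an interrupted read(2) itself, so fgets never sees EINTR. A minimal sketch of that variant follows; the names install_restarting_handler and noop_handler are made up for illustration and are not from the original program:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Empty handler, playing the role of dummy_signal_handler() in the question. */
static void noop_handler(int signo) { (void)signo; }

/* Install `handler` for `signo` with SA_RESTART set, so that a read(2)
 * interrupted by this signal is restarted by the kernel instead of
 * failing with EINTR. Returns 0 on success, -1 on error (errno is set). */
static int install_restarting_handler(int signo, void (*handler)(int))
{
    struct sigaction sa;

    assert(handler != NULL);
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* the question's program uses 0 here */
    return sigaction(signo, &sa, NULL);
}
```

In the question's main() this would replace the sigaction() call, e.g. install_restarting_handler(SIGUSR1, noop_handler).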
The upshot is that you can't use stdio functions to read/write FILE objects if the underlying system calls might have recoverable error returns (such as EINTR or EAGAIN), as there's no guarantee the standard library won't lose some data from the buffer when that happens. You can claim that this is a "bug" in the standard library implementation, but it is a bug that the C standard allows.
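If the handler has to stay without SA_RESTART, one way around the stdio problem is to bypass the FILE buffer and call read(2) on the underlying descriptor, because a read that fails with EINTR has transferred no data and can simply be retried. This is only a sketch under that assumption; read_all_retry and capture_command are hypothetical helper names, and the stream must not also be read through stdio:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Read from fd until EOF or until cap-1 bytes are stored, retrying on
 * EINTR. NUL-terminates buf. Returns bytes stored, or -1 on a real error. */
static ssize_t read_all_retry(int fd, char *buf, size_t cap)
{
    size_t used = 0;

    assert(cap > 0);
    while (used + 1 < cap) {
        ssize_t n = read(fd, buf + used, cap - 1 - used);
        if (n > 0)
            used += (size_t)n;
        else if (n == 0)
            break;                  /* EOF */
        else if (errno != EINTR)
            return -1;              /* real error */
        /* n == -1 with EINTR: nothing was read, so just retry */
    }
    buf[used] = '\0';
    return (ssize_t)used;
}

/* Run cmd via popen() and capture its output without going through the
 * stdio read buffer. Returns the output length, or -1 on failure. */
static ssize_t capture_command(const char *cmd, char *buf, size_t cap)
{
    FILE *stream = popen(cmd, "r");
    ssize_t n;

    if (!stream)
        return -1;
    n = read_all_retry(fileno(stream), buf, cap);
    if (pclose(stream) != 0)
        return -1;
    return n;
}
```

For the question's test, capture_command("echo -n hello", buffer, sizeof buffer) would take the place of the fgets_with_retry loop.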
I was having a problem with one application, so I went back to basics and grabbed the sem_timedwait example from the Ubuntu Focal online manpages. I modified it slightly to reproduce the problem.
CASE: sem_post before sem_timedwait
EXPECTED: sem_timedwait to succeed immediately
OBTAINED: sem_timedwait times out
The problem initially showed up in a Docker container (WSL disabled) running Ubuntu 20.04 (g++ 9 multilib).
I then tried WSL Debian 9 (g++ 6 multilib) and WSL Ubuntu 20.04 (g++ 9 multilib), both freshly installed from PowerShell.
I also installed a fresh Ubuntu 20.04 VM with g++ 9 multilib on Hyper-V.
I ran apt update && apt upgrade to make sure I was on the latest packages, and at one point completely removed g++ 9 and all its dependencies and used g++ 10 instead (which comes with libasan.so.6 instead of libasan.so.5).
Original sem_timedwait example from Ubuntu
In the modified version below, I added a sleep before sem_timedwait so that the call to sem_timedwait always happens after the sem_post. I also print the result of sem_getvalue to verify that the semaphore counter is correctly incremented to 1.
[File: test_sem.cpp]
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <semaphore.h>
#include <time.h>
#include <assert.h>
#include <errno.h>
#include <signal.h>
sem_t sem;
#define handle_error(msg) \
do { perror(msg); exit(EXIT_FAILURE); } while (0)
static void
handler(int sig)
{
write(STDOUT_FILENO, "sem_post() from handler\n", 24);
if (sem_post(&sem) == -1) {
write(STDERR_FILENO, "sem_post() failed\n", 18);
_exit(EXIT_FAILURE);
}
}
int
main(int argc, char *argv[])
{
struct sigaction sa;
struct timespec ts;
int s;
if (argc != 3) {
fprintf(stderr, "Usage: %s <alarm-secs> <wait-secs>\n",
argv[0]);
exit(EXIT_FAILURE);
}
if (sem_init(&sem, 0, 0) == -1)
handle_error("sem_init");
/* Establish SIGALRM handler; set alarm timer using argv[1] */
sa.sa_handler = handler;
sigemptyset(&sa.sa_mask);
sa.sa_flags = 0;
if (sigaction(SIGALRM, &sa, NULL) == -1)
handle_error("sigaction");
alarm(atoi(argv[1]));
/* Calculate relative interval as current time plus
number of seconds given argv[2] */
if (clock_gettime(CLOCK_REALTIME, &ts) == -1)
handle_error("clock_gettime");
ts.tv_sec += atoi(argv[2]);
//this is a cancellation point when the alarm goes off
sleep(atoi(argv[1]) + 2);
int value = 0;
sem_getvalue(&sem, &value);
printf("sem_getvalue(): %d\n", value);
sleep(2);
printf("main() about to call sem_timedwait()\n");
while ((s = sem_timedwait(&sem, &ts)) == -1 && errno == EINTR)
continue; /* Restart if interrupted by handler */
/* Check what happened */
if (s == -1) {
if (errno == ETIMEDOUT)
printf("sem_timedwait() timed out\n");
else
perror("sem_timedwait");
} else
printf("sem_timedwait() succeeded\n");
exit((s == 0) ? EXIT_SUCCESS : EXIT_FAILURE);
}
To compile this example I used the following:
g++ -std=gnu++17 -m32 -fsanitize=address -fsanitize-recover=address -fsanitize-address-use-after-scope -fno-omit-frame-pointer test_sem.cpp -lstdc++ -lpthread -lasan
To run it, simply use ./a.out 2 5
What I obtain is the following unexpected result:
sem_post() from handler
sem_getvalue(): 1
main() about to call sem_timedwait()
sem_timedwait() timed out
The same code compiled WITHOUT the -m32 flag (g++ -std=gnu++17 -fsanitize=address -fsanitize-recover=address -fsanitize-address-use-after-scope -fno-omit-frame-pointer test_sem.cpp -lstdc++ -lpthread -lasan) gives me the following expected result:
sem_post() from handler
sem_getvalue(): 1
main() about to call sem_timedwait()
sem_timedwait() succeeded
The same code compiled WITH the -m32 flag but WITHOUT the address sanitizer flags (g++ -std=gnu++17 -m32 test_sem.cpp -lstdc++ -lpthread -lasan) gives me the following expected result:
sem_post() from handler
sem_getvalue(): 1
main() about to call sem_timedwait()
sem_timedwait() succeeded
For the sake of completeness, I also replaced the signal handler with a second thread to get the same sem_post-before-sem_timedwait ordering, and I obtained exactly the same result. I also tried the non-POSIX sem_clockwait with both CLOCK_REALTIME and CLOCK_MONOTONIC and got exactly the same result.
I also tried completely removing g++ 9 and installed g++ 10 (which uses libasan.so.6 instead of libasan.so.5)
Right now I don't know if it is something on my side, but Docker Ubuntu 20.04 (no WSL), Debian 9 on WSL 2, Ubuntu 20.04 on WSL 2, and a full Hyper-V virtual machine with Ubuntu 20.04 all give me the same result.
I tried everything I could think of to no avail.
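For anyone who wants to try the same thing without the alarm and the sleeps, here is a smaller sketch of the case described above: post once, then call sem_timedwait with a deadline computed right before the wait. The helper names deadline_in and post_then_timedwait are made up for this sketch. It makes no claim about what happens under -m32 with the sanitizer, it only isolates the call sequence so it can be built with each flag combination:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* Absolute CLOCK_REALTIME deadline `secs` seconds from now. */
static struct timespec deadline_in(time_t secs)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_REALTIME, &ts);

    assert(rc == 0);
    (void)rc;
    ts.tv_sec += secs;
    return ts;
}

/* Post once, then wait with a timeout. Returns 0 if the wait succeeded,
 * otherwise the errno value reported by the failing call (for example
 * ETIMEDOUT from sem_timedwait). */
static int post_then_timedwait(time_t wait_secs)
{
    sem_t sem;
    struct timespec ts;
    int s, err;

    if (sem_init(&sem, 0, 0) == -1)
        return errno;
    if (sem_post(&sem) == -1) {
        err = errno;
        sem_destroy(&sem);
        return err;
    }
    ts = deadline_in(wait_secs);
    while ((s = sem_timedwait(&sem, &ts)) == -1 && errno == EINTR)
        continue;               /* restart if interrupted by a handler */
    err = (s == 0) ? 0 : errno;
    sem_destroy(&sem);
    return err;
}
```

A main() that prints the return value of post_then_timedwait(5) should be enough to compare the builds.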
UPDATE 1: This question has been updated to eliminate the multithreading, simplifying its scope. The original problem popened in the main thread, and pclosed the child process in a different thread. The problem being asked about is reproducible much more simply, by doing the popen and pclose in the same (main) thread.
Update 2: With help from responders at How to check libc version?, I think I've identified that the libc being used is uClibc 0.9.30.
The following code popens a script in the main thread, waits a little bit, then pcloses the child process in the same main thread. This program is cross-compiled for several cross-targets.
The executable's code:
// mybin.c
#include <stdio.h>
#include <stdlib.h>
#include <stdarg.h>
#include <stdbool.h>
#include <string.h>
#include <time.h>
#include <errno.h>
#include <unistd.h>
static FILE* logFile = NULL;
static void logInit( const char* fmt );
static void log_( const char* file, int line, const char* fmt, ... );
static void logCleanup();
#define log(fmt, ...) log_( __FILE__, __LINE__, fmt, ##__VA_ARGS__ )
int main( int argc, char* argv[] )
{
logInit( "./mybin.log" );
{
bool success = false;
FILE* f;
if ( ! (f = popen( "./myscript", "r" )) )
{
log( "popen error: %d (%s)", errno, strerror( errno ) );
goto end;
}
log( "before sleep" );
sleep( 1 );
log( "after sleep" );
pclose( f );
log( "after pclose" );
success = true;
}
end:
log( "At end" );
logCleanup();
return 0;
}
/** Initializes logging */
static void logInit( const char* file )
{
logFile = fopen( file, "a" );
}
/** Logs timestamp-prefixed, newline-suffixed printf-style text */
static void log_( const char* file, int line, const char* fmt, ... )
{
//static FILE* logOut = logFile ? logFile : stdout;
FILE* logOut = logFile ? logFile : stdout;
time_t t = time( NULL );
char fmtTime[16] = { '\0' };
struct tm stm = *(localtime( &t ));
char logStr[1024] = { '\0' };
va_list args;
va_start( args, fmt );
vsnprintf( logStr, sizeof logStr, fmt, args );
va_end( args );
strftime( fmtTime, sizeof fmtTime, "%Y%m%d_%H%M%S", &stm );
fprintf( logOut, "%s %s#%d %s\n", fmtTime, file, line, logStr );
}
/** Cleans up after logInit() */
static void logCleanup()
{
if ( logFile ) { fclose( logFile ); }
logFile = NULL;
}
The script:
#! /bin/bash
# mybin
rm -f ./myscript.log
for i in {1..10}; do echo "$(date +"%Y%m%d_%H%M%S") script is running" >> ./myscript.log; sleep 1; done
The expected behavior is that the compiled executable spawns execution of the script in a child process, waits for its completion, then exits. This is met on many cross-targets including x86, x64, and ARM. Below is an example architecture on which the expected behavior is met, compilation, and corresponding logs:
$ uname -a
Linux linuxbox 5.4.8-200.fc31.x86_64 #1 SMP Mon Jan 6 16:44:18 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Compilation:
$ gcc --version && gcc -g ./mybin.c -lpthread -o mybin
gcc (GCC) 9.2.1 20190827 (Red Hat 9.2.1-1)
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$
mybin.log:
20200705_200950 ./mybin.c#33 before sleep
20200705_200951 ./mybin.c#35 after sleep
20200705_201000 ./mybin.c#37 after pclose
20200705_201000 ./mybin.c#44 At end
myscript.log:
20200705_200950 script is running
20200705_200951 script is running
20200705_200952 script is running
20200705_200953 script is running
20200705_200954 script is running
20200705_200955 script is running
20200705_200956 script is running
20200705_200957 script is running
20200705_200958 script is running
20200705_200959 script is running
However, on one target, an odd thing occurs: pclose returns early: after the script has started running, but well before it has completed running -- why? Below is the problem architecture on which the unexpected behavior is observed, cross-compiler flags, and corresponding logs:
$ uname -a
Linux hostname 2.6.33-arm1 #2 Wed Jul 1 23:05:25 UTC 2020 armv7ml GNU/Linux
Cross-compilation:
$ /path/to/toolchains/ARM-cortex-m3-4.4/bin/arm-uclinuxeabi-gcc --version
arm-uclinuxeabi-gcc (Sourcery G++ Lite 2010q1-189) 4.4.1
Copyright (C) 2009 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ /path/to/toolchains/ARM-cortex-m3-4.4/bin/arm-uclinuxeabi-gcc -O2 -Wall -fno-strict-aliasing -Os -D__uClinux__ -fno-strict-aliasing -mcpu=cortex-m3 -mthumb -g -ffunction-sections -fdata-sections -I/path/to/toolchains/ARM-cortex-m3-4.4/usr/include/ -Wl,--gc-sections -Wl,-elf2flt=-s -Wl,-elf2flt=8192 -I/path/to/toolchains/ARM-cortex-m3-4.4/sysroot/usr/include -I/path/to/libs/ARM-cortex-m3-4.4/usr/include/ -L/path/to/toolchains/ARM-cortex-m3-4.4/sysroot/usr/lib -lrt -L/path/to/libs/ARM-cortex-m3-4.4/usr/lib -L/path/to/libs/ARM-cortex-m3-4.4/lib -o mybin ./mybin.c -lrt -lpthread
$
mybin.log:
20200705_235632 ./mybin.c#33 before sleep
20200705_235633 ./mybin.c#35 after sleep
20200705_235633 ./mybin.c#37 after pclose
20200705_235633 ./mybin.c#44 At end
myscript.log:
20200705_235632 script is running
The gist of my question is: why does pclose return prematurely, and why only on this one cross-target?
Comments and research have me circling the notion that this is a bug in the variant/version of libc - it'd be great if someone knowledgeable on the subject could help confirm if that is the case.
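One thing mybin.c above does not do is look at what pclose returns. A sketch of how that could be logged is below, on the assumption that the return value and errno would help narrow down whether the wait itself failed on the problem target. The function name run_and_report is hypothetical, and like the original program it does not read the command's output, so the command is assumed to write little or nothing to stdout:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/* Run cmd through popen(), wait for it with pclose(), and decode the
 * wait status. Returns the command's exit code (0..255), -1 if popen()
 * or pclose() failed (details go to stderr), or -2 if the command did
 * not exit normally. */
static int run_and_report(const char *cmd)
{
    FILE *f;
    int status;

    assert(cmd != NULL);
    f = popen(cmd, "r");
    if (!f) {
        fprintf(stderr, "popen: %d (%s)\n", errno, strerror(errno));
        return -1;
    }
    status = pclose(f);
    if (status == -1) {
        /* pclose() could not obtain the child's status */
        fprintf(stderr, "pclose: %d (%s)\n", errno, strerror(errno));
        return -1;
    }
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        fprintf(stderr, "child killed by signal %d\n", WTERMSIG(status));
    return -2;
}
```

In mybin.c the same checks could go around the existing pclose( f ) call, using log() instead of fprintf.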
Not a dup of pclose() prematurely returning in a multi-threaded environment (Solaris 11)
I recently started learning Linux device driver programming, and I found this small piece of code that prints "Hello World" using the printk() function.
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
MODULE_LICENSE("Dual BSD/GPL");
static int hello_init(void)
{
printk(KERN_ALERT "Hello World!!!\n");
return 0;
}
static void hello_exit(void)
{
printk(KERN_ALERT "Goodbye Hello World!!!\n");
}
module_init(hello_init);
module_exit(hello_exit);
After compiling the code with make and loading the driver with insmod, I'm not getting "Hello World" printed on screen; instead it is only written to the log file /var/log/kern.log. But I want printk to print on my Ubuntu terminal. I'm using Ubuntu 14.04. Is it possible?
It isn't possible to redirect kernel logs and messages to gnome-terminal; there you have to use dmesg.
But in a virtual terminal (open one with Ctrl+Alt+F1 through F6) you can have them printed on the console.
First determine the tty number by running the tty command in the virtual terminal. The output will be one of /dev/tty1 through /dev/tty6.
Compile and run this code with that number as its argument.
/*
* setconsole.c -- choose a console to receive kernel messages
*
* Copyright (C) 1998,2000,2001 Alessandro Rubini
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <sys/ioctl.h>
int main(int argc, char **argv)
{
char bytes[2] = {11,0}; /* 11 is the TIOCLINUX cmd number */
if (argc==2) bytes[1] = atoi(argv[1]); /* the chosen console */
else {
fprintf(stderr, "%s: need a single arg\n",argv[0]); exit(1);
}
if (ioctl(STDIN_FILENO, TIOCLINUX, bytes)<0) { /* use stdin */
fprintf(stderr,"%s: ioctl(stdin, TIOCLINUX): %s\n",
argv[0], strerror(errno));
exit(1);
}
exit(0);
}
For example, if the output of the tty command was /dev/tty1, then type these two commands:
gcc setconsole.c -o setconsole
sudo ./setconsole 1
This will set your tty to receive kernel messages.
Then compile and run this code.
/*
* setlevel.c -- choose a console_loglevel for the kernel
*
* Copyright (C) 1998,2000,2001 Alessandro Rubini
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA.
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <errno.h>
#include <sys/klog.h>
int main(int argc, char **argv)
{
int level;
if (argc==2) {
level = atoi(argv[1]); /* the chosen loglevel */
} else {
fprintf(stderr, "%s: need a single arg\n",argv[0]); exit(1);
}
if (klogctl(8,NULL,level) < 0) {
fprintf(stderr,"%s: syslog(setlevel): %s\n",
argv[0],strerror(errno));
exit(1);
}
exit(0);
}
There are 8 levels of kernel messages; the KERN_ALERT you use in your code is one of them. To make the console receive all of them, run the above code with 8 as the argument.
gcc setlevel.c -o setlevel
sudo ./setlevel 8
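To see why 8 lets everything through: a message is shown on the console only if its level number is lower than the console loglevel, and the levels run from 0 (KERN_EMERG) to 7 (KERN_DEBUG). Here is a small user-space sketch of that rule. The LVL_* names and both function names are made up for illustration; the current console loglevel is the first number in /proc/sys/kernel/printk:

```c
#include <assert.h>
#include <stdio.h>

/* Numeric severities behind the KERN_* macros (0 is the most severe). */
enum {
    LVL_EMERG = 0, LVL_ALERT, LVL_CRIT, LVL_ERR,
    LVL_WARNING, LVL_NOTICE, LVL_INFO, LVL_DEBUG
};

/* A message reaches the console only if its level is numerically
 * lower than the console loglevel. */
static int reaches_console(int msg_level, int console_loglevel)
{
    assert(msg_level >= LVL_EMERG && msg_level <= LVL_DEBUG);
    return msg_level < console_loglevel;
}

/* Read the current console loglevel, i.e. the first number in
 * /proc/sys/kernel/printk. Returns -1 if it cannot be read. */
static int read_console_loglevel(void)
{
    FILE *f = fopen("/proc/sys/kernel/printk", "r");
    int level = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%d", &level) != 1)
        level = -1;
    fclose(f);
    return level;
}
```

So reaches_console(LVL_ALERT, read_console_loglevel()) tells you whether the module's KERN_ALERT messages would be shown with the current setting.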
Now you can insert your module to kernel and see kernel logs in console.
By the way, both programs are from the LDD3 examples.
printk prints to the kernel log. There is no "screen" as far as the kernel is concerned. If you want to see the output of printk in real time, you can open a terminal and type the following dmesg -w. Note that the -w flag is only supported by recent versions of dmesg (which is provided by the util-linux package).
My code is segfaulting and I have no idea what is wrong. I've simplified it as far as I can but still can't find a problem.
C File test.c:
#include <stdlib.h>
#include <stdio.h>
struct container {
void *A[3], *B[3], *C[3], *D[3];
int x, y, z;
};
int main (int argc, char* argv[]) {
struct container *cont = malloc (sizeof cont);
FILE* fh = fopen( argv[1], "r" );
if( fh == NULL ) return 0;
fscanf(fh, "%d %d", &cont->y, &cont->z);
fclose( fh );
free( cont );
return 0;
}
Contents of test.txt
1 1
Executing and running through gdb:
$ gcc --version
gcc (Ubuntu/Linaro 4.8.1-10ubuntu9) 4.8.1
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
$ gcc -Wall -g test.c && gdb a.out
GNU gdb (GDB) 7.6.1-ubuntu
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/dberg/ITX/Cells/test/a.out...done.
(gdb) break 26
Breakpoint 1 at 0x400739: file test.c, line 26.
(gdb) run test.txt
Starting program: /home/dberg/ITX/Cells/test/a.out test.txt
Breakpoint 1, main (argc=2, argv=0x7fffffffdf48) at test.c:26
26 fclose( fh );
(gdb) n
Program received signal SIGSEGV, Segmentation fault.
__GI___libc_free (mem=0x1) at malloc.c:2892
2892 malloc.c: No such file or directory.
(gdb)
Deleting any one of the unused struct members allows the code to execute without error. Moving any of the unused struct members to the end of the struct, or decreasing the size of any one or all of the arrays, also allows the code to execute successfully. The presence of the fscanf() call is also necessary for the segfault.
Where is my syntax wrong and why is the size of the struct so critical to this bug?
There's a * missing in struct container *cont = malloc (sizeof cont);, you need sizeof *cont.
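To spell that out: sizeof cont is the size of the pointer (8 bytes on x86_64), while y and z sit after twelve pointers, so fscanf writes well past the end of the allocation. That stray write lands in neighbouring heap memory, which would explain both the crash inside fclose and why changing the struct layout moves or hides it. A minimal sketch of the corrected allocation, where container_new is just an illustrative name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct container {
    void *A[3], *B[3], *C[3], *D[3];
    int x, y, z;
};

/* y lives beyond the first sizeof(pointer) bytes, so a block of only
 * sizeof(struct container *) bytes can never hold it. */
static_assert(offsetof(struct container, y) >= sizeof(struct container *),
              "y lies outside a pointer-sized allocation");

/* Allocate room for the whole struct: sizeof *cont, not sizeof cont.
 * Returns NULL if malloc fails. */
static struct container *container_new(void)
{
    struct container *cont = malloc(sizeof *cont);
    return cont;
}
```

Writing sizeof *cont instead of sizeof(struct container) keeps the size correct even if the type of cont changes later.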
Bzzzt! It is not fclose that is failing, it is that you are not malloc'ing enough space to hold a (struct container) type. Which is a semantic rather than syntactic problem.
Suppose you have a file, called "stuff", containing:
1,2,3
And your program is named doit.c, and it reads this file (checking for enough arguments, checking return values from fopen and malloc, etc),
#include <stdio.h>
#include <stdlib.h>
//you might want to carry a shorter name around,
typedef struct container_s
{
void *A[3], *B[3], *C[3], *D[3];
int x, y, z;
} container;
//how big is a (struct container)? depends. How big is a (void*) or an (int)?
//Suppose 32-bit, then you have 12x4+3*4=60 bytes.
//Suppose 64-bit pointer, and 32-bit integer, then you have 12x8+3*4=108 bytes (typically padded to 112).
int main (int argc, char* argv[])
{
container* cont;
FILE* fh = NULL;
char* filename=NULL;
//you really should not examine argv[1] if there is no argument...
if(argc<2) {
printf("usage: doit <filename>\n");
exit(EXIT_FAILURE);
}
filename=argv[1];
//allocate space for a (struct container_s)
if( !(cont = malloc(sizeof *cont)) ) {
printf("error: cannot allocate container\n");
exit(EXIT_FAILURE);
}
//check that file opens successfully,
if(!(fh=fopen(filename,"r" ))) {
printf("error: cannot open %s\n",filename);
return 0;
}
//read your vector (x,y,z),
fscanf(fh,"%d,%d,%d",&(cont->x),&(cont->y),&(cont->z));
//for fun, print the (x,y,z) coordinates,
printf("stuff(%d,%d,%d)\n",cont->x,cont->y,cont->z);
fclose(fh);
free(cont);
return 0;
}
Compile and run the above, and you get,
./doit stuff
stuff(1,2,3)
Please check the return values from library functions (fopen, malloc) and bounds check arrays (such as argv[]). Oh, and you might want to give a symbolic name for A[], B[], C[], and D[] in your container.
I have a test program:
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>
#include <mqueue.h>
#include <errno.h>
#include <fcntl.h>
int main() {
struct mq_attr attrs;
attrs.mq_maxmsg = 10;
attrs.mq_msgsize = sizeof(int);
const char name[] = "/test-queue";
mqd_t q = mq_open(name, O_CREAT | O_RDWR, 0600, &attrs);
if (q == (mqd_t)-1) {
perror("mq_open");
exit(EXIT_FAILURE);
}
mq_unlink(name); // it doesn't matter if I do this at the end or not
if (fork()) {
int msg = 666;
if (mq_send(q, (const char *)&msg, sizeof(msg), 1)) {
perror("mq_send");
exit(EXIT_FAILURE);
}
} else {
int msg;
unsigned priority;
if (mq_receive(q, (char *)&msg, sizeof(msg), &priority) == -1) {
perror("mq_receive");
exit(EXIT_FAILURE);
}
printf("%d\n", msg);
}
mq_close(q);
return 0;
}
I compile this program using gcc -std=c99 -Wall -o mqtest mqtest.c -lrt on two platforms:
Linux kallikanzarid-desktop 3.8.0-31-generic #46-Ubuntu SMP Tue Sep 10 20:03:44 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
FreeBSD bsd.localhost 9.2-RELEASE FreeBSD 9.2-RELEASE #0 r255898: Thu Sep 26 22:50:31 UTC 2013 root@bake.isc.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
On Linux, everything works. On FreeBSD, I get mq_receive: Bad file descriptor. Moving the mq_unlink call to the end of main() doesn't help. Is there a way to fix this, or do I have to postpone marking the queue for deletion and reopen it after the fork?
FreeBSD does preserve message queue descriptors. See mq_open(2):
FreeBSD implements message queue based on file descriptor. The descriptor is inherited by child after fork(2). The descriptor is closed in a new image after exec(3). The select(2) and kevent(2) system calls are supported for message queue descriptor.
Edit:
The structure that mqd_t points to does contain a descriptor. But if you test that file descriptor just after the fork() using fcntl(), it also returns EBADF.
This is a bug in FreeBSD. But whether the bug is in the docs or in the implementation I cannot say.
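Until that is sorted out, the workaround the question already hints at is to have the child open the queue by name itself and to postpone mq_unlink until the child is done. A rough sketch of that ordering is below. It is untested on FreeBSD, and send_via_reopened_queue is a made-up name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <mqueue.h>
#include <signal.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send `value` from parent to child over the queue `name`. The child
 * opens the queue by name instead of using the inherited descriptor.
 * Returns 0 if the child received `value`, -1 otherwise. */
static int send_via_reopened_queue(const char *name, int value)
{
    struct mq_attr attrs;
    mqd_t q;
    pid_t pid;
    int status = 0, ok;

    assert(name != NULL && name[0] == '/');
    memset(&attrs, 0, sizeof attrs);
    attrs.mq_maxmsg = 10;
    attrs.mq_msgsize = sizeof(int);

    q = mq_open(name, O_CREAT | O_RDWR, 0600, &attrs);
    if (q == (mqd_t)-1)
        return -1;

    pid = fork();
    if (pid == -1) {
        mq_close(q);
        mq_unlink(name);
        return -1;
    }
    if (pid == 0) {
        /* child: get a fresh descriptor by name */
        int msg = 0;
        unsigned priority;
        mqd_t cq = mq_open(name, O_RDONLY);

        if (cq == (mqd_t)-1)
            _exit(2);
        if (mq_receive(cq, (char *)&msg, sizeof msg, &priority) == -1)
            _exit(3);
        mq_close(cq);
        _exit(msg == value ? 0 : 4);
    }

    /* parent: send, then wait for the child before unlinking the name */
    ok = mq_send(q, (const char *)&value, sizeof value, 1) == 0;
    if (!ok)
        kill(pid, SIGKILL);     /* child would block in mq_receive forever */
    if (waitpid(pid, &status, 0) == -1)
        ok = 0;
    mq_close(q);
    mq_unlink(name);
    return (ok && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

The important part is only the order: the name must still exist when the child calls mq_open, so the unlink happens after waitpid.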