Can't open /dev/input/js file descriptor after inotify event - c

I'm creating a Linux module for a game library that lets you hotplug multiple joysticks. It uses inotify to watch /dev/input.
I am testing it with 3 joysticks:
First I connect 2 joysticks.
Then I start the application; the joysticks work and I don't get an error.
After that I connect the third joystick, perror gives: /dev/input/js1: Permission denied.
When I check ls -l /proc/<pid-of-process>/fd it lists /dev/input/js0 and /dev/input/js2.
All the joysticks work fine when I run it as root.
This is how it's initialized:
static void createGamepad(char *locName){
char dirName[30];
int fd;
snprintf(dirName, 30, "/dev/input/%s", locName);
fd = open(dirName, O_RDONLY | O_NONBLOCK, 0);
if(fd < 0){
perror(dirName);
}
}
struct dirent *dir;
DIR *d;
int i, notifyfd, watch;
// Attach notifications to check if a device connects/disconnects
notifyfd = inotify_init();
watch = inotify_add_watch(notifyfd, "/dev/input", IN_CREATE | IN_DELETE);
d = opendir("/dev/input");
i = 0;
while((dir = readdir(d)) != NULL){
if(*dir->d_name == 'j' && *(dir->d_name + 1) == 's'){
createGamepad(dir->d_name);
i++;
}
}
closedir(d);
After that inotify handles it like this in the while(1) loop:
static bool canReadINotify(){
fd_set set;
struct timeval timeout;
FD_ZERO(&set);
FD_SET(notifyfd, &set);
timeout.tv_sec = 0;
timeout.tv_usec = 0;
return select(notifyfd + 1, &set, NULL, NULL, &timeout) > 0 &&
FD_ISSET(notifyfd, &set);
}
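// Inside the event loop
// The buffer must hold whole events, including the variable-length name
char nbuf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
while(canReadINotify()){
ssize_t nlen = read(notifyfd, nbuf, sizeof nbuf);
for(char *p = nbuf; nlen > 0 && p < nbuf + nlen; ){
struct inotify_event *ne = (struct inotify_event *)p;
p += sizeof(struct inotify_event) + ne->len;
if(ne->len == 0 || ne->name[0] != 'j' || ne->name[1] != 's'){
continue;
}
if(ne->mask & IN_CREATE){
createGamepad(ne->name);
}
}
}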
Is it even possible with inotify or should I use udev? And if it's possible, how can I solve this?

It is very likely a race condition. You see, you get the inotify event when the device node is created (by udev using a mknod() call), but the access permissions are set by udev using a separate chown() call, just a tiny bit later.
See systemd src/udev/udev-node.c, node_permissions_apply(). In this particular case, /dev/input/jsX is not a symlink, but the actual device node; at least with systemd the device node access mode gets set sometime later, after the actual node is created.
One robust solution would be to modify your createGamepad() function, so that instead of failing completely at fd == -1 && errno == EACCES, you instead retry after a short while; at least a few times, say for up to a second or two.
However, ninjalj pointed out a better suggestion: also use the access permission change as a trigger to check the device node. This is trivially accomplished by using IN_CREATE | IN_DELETE | IN_ATTRIB in the inotify_add_watch() call.
(You'll also want to ignore open()==-1, errno==EACCES errors in createGamepad(), as they are likely caused by this race condition, and the following IN_ATTRIB inotify event will yield access to the same device.)
Prior to ninjalj's comment, I'd personally have used an array of input devices, and another for "possible" input devices that can/need to be retried after a short timeout to decide whether they are available or not, but I think his suggestion is much better.
Need/want an example?
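Since an example was offered, here is a minimal sketch of the IN_ATTRIB approach. The helper names (is_js_name, watch_dir, try_open_gamepad, drain_events) are made up for illustration and are not part of the asker's library. The sketch assumes device names of the form jsN, and it reads events into a buffer large enough for several events instead of into a single struct inotify_event.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/inotify.h>
#include <sys/types.h>
#include <unistd.h>

/* Nonzero if name is "js" followed by one or more digits (assumed naming). */
static int is_js_name(const char *name)
{
    if (name[0] != 'j' || name[1] != 's' || name[2] == '\0')
        return 0;
    for (name += 2; *name; name++)
        if (*name < '0' || *name > '9')
            return 0;
    return 1;
}

/* Watch dir for node creation, removal, and permission changes.
   Returns a non-blocking inotify descriptor, or -1 with errno set. */
static int watch_dir(const char *dir)
{
    int nfd = inotify_init1(IN_NONBLOCK | IN_CLOEXEC);
    if (nfd == -1)
        return -1;
    if (inotify_add_watch(nfd, dir, IN_CREATE | IN_DELETE | IN_ATTRIB) == -1) {
        const int saved_errno = errno;
        close(nfd);
        errno = saved_errno;
        return -1;
    }
    return nfd;
}

/* Try to open /dev/input/<name>. EACCES is not reported, because the
   IN_ATTRIB event that follows the permission change prompts a retry. */
static int try_open_gamepad(const char *name)
{
    char path[288];
    int fd, n;

    n = snprintf(path, sizeof path, "/dev/input/%s", name);
    if (n < 0 || (size_t)n >= sizeof path) {
        errno = ENAMETOOLONG;
        return -1;
    }
    fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd == -1 && errno != EACCES)
        perror(path);
    return fd;
}

/* Drain all pending events and call on_node (may be NULL) for each js* entry.
   Returns the number of js* events seen, or -1 on a read error. */
static int drain_events(int nfd, void (*on_node)(const char *name, uint32_t mask))
{
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    int count = 0;

    for (;;) {
        ssize_t len = read(nfd, buf, sizeof buf);
        if (len == -1 && errno == EINTR)
            continue;
        if (len == -1)
            return errno == EAGAIN ? count : -1; /* EAGAIN: queue is empty */
        if (len == 0)
            return count;
        for (char *p = buf; p < buf + len; ) {
            const struct inotify_event *ev = (const struct inotify_event *)p;
            if (ev->len > 0 && is_js_name(ev->name)) {
                if (on_node)
                    on_node(ev->name, ev->mask);
                count++;
            }
            p += sizeof(struct inotify_event) + ev->len;
        }
    }
}
```

In the event loop, the callback would call try_open_gamepad() on IN_CREATE or IN_ATTRIB for any name that is not already open, and close the matching descriptor on IN_DELETE.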

Related

Faking an input device for testing purpose

What I want to do
I'm writing a daemon that listens to input devices for key presses and sends signals via D-Bus. The main goal is to manage the audio volume and screen backlight level by requesting changes or reporting them.
I use libevdev to handle the input device events.
I wrote a function for opening an input device located at a specified path:
Device device_open(const char *path);
That function works well, but while I'm writing unit tests for it, I wanted to create file fixtures with different properties (existence of the file, read access, etc.) to check the error handling of my function and memory management (as I store data in a structure).
What I have already done
But testing it with a real input device (located at /dev/input/event*) requires root access. Setting read access for everyone on the /dev/input/event* files works but seems risky to me. Running my tests as root is worse!
Creating a device using mknod works but needs to be done as root.
I also tried using other character special files (because input devices are character devices) that are readable by everyone (like /dev/random, /dev/zero, /dev/null, and even the terminal device I'm currently using, /dev/tty2).
But those devices do not handle the ioctl requests needed by libevdev: EVIOCGBIT is the first request to fail, with the error "Inappropriate ioctl for device".
What I'm looking for
I want to be able to create device files as a regular user (the user executing the unit tests). Then, by setting access rights I should be able to test my function behavior for different kinds of file (read only, no read allowed, bad device type, etc.).
If that turns out to be impossible, I will refactor my function using private helpers. But how should I do that? Are there any examples?
Thanks.
Edit: I tried to express my needs better.
Create a group for users who are allowed to access the device, and an udev rule to set the ownership of that input event device to that group.
I use teensy (system) group:
sudo groupadd -r teensy
and add each user into it using e.g.
sudo usermod -a -G teensy my-user-name
or whatever graphical user interface I have available.
By managing which users and service daemons belong to the teensy group, you can easily manage the access to the devices.
For my Teensy microcontrollers (that have native USB, and I use for HID testing), I have the following /lib/udev/rules.d/49-teensy.rules:
ATTRS{idVendor}=="16c0", ATTRS{idProduct}=="04[789B]?", ENV{ID_MM_DEVICE_IGNORE}="1"
ATTRS{idVendor}=="16c0", ATTRS{idProduct}=="04[789A]?", ENV{MTP_NO_PROBE}="1"
SUBSYSTEMS=="usb", ATTRS{idVendor}=="16c0", ATTRS{idProduct}=="04[789ABCD]?", GROUP:="teensy", MODE:="0660"
KERNEL=="ttyACM*", ATTRS{idVendor}=="16c0", ATTRS{idProduct}=="04[789B]?", GROUP:="teensy", MODE:="0660"
You only need the third line (SUBSYSTEMS=="usb", one) for HID devices, though. Make sure the idVendor and idProduct match your USB HID device. You can use lsusb to list the currently connected USB devices vendor and product numbers. The matching uses glob patterns, just like file names.
After adding the above, don't forget running sudo udevadm control --reload-rules && sudo udevadm trigger to reload the rules. Next time you plug in your USB HID device, all members of your group (teensy in the above) can access it directly.
Note that by default in most distributions, udev also creates persistent symlinks in /dev/input/by-id/ using the USB device type and serial. In my case, one of my Teensy LC's (serial 4298820) with a combined keyboard-mouse-joystick device provides /dev/input/by-id/usb-Teensyduino_Keyboard_Mouse_Joystick_4298820-event-kbd for the keyboard event device, /dev/input/by-id/usb-Teensyduino_Keyboard_Mouse_Joystick_4298820-if01-event-mouse for the mouse event device, and /dev/input/by-id/usb-Teensyduino_Keyboard_Mouse_Joystick_4298820-if03-event-joystick and /dev/input/by-id/usb-Teensyduino_Keyboard_Mouse_Joystick_4298820-if04-event-joystick for the two joystick interfaces.
(By "persistent", I do not mean these symlinks always exist; I mean that whenever that particular device is plugged in, the symlink of exactly that name exists, and points to the actual Linux input event character device.)
The Linux uinput device can be used to implement a virtual input event device using a simple privileged daemon.
The process to create a new virtual USB input event device goes as follows.
Open /dev/uinput for writing (or reading and writing):
fd = open("/dev/uinput", O_RDWR);
if (fd == -1) {
fprintf(stderr, "Cannot open /dev/uinput: %s.\n", strerror(errno));
exit(EXIT_FAILURE);
}
The above requires superuser privileges. However, immediately after opening the device, you can drop all privileges, and have your daemon/service run as a dedicated user instead.
Use the UI_SET_EVBIT ioctl for each event type allowed.
You will want to allow at least EV_SYN; and EV_KEY for keyboards and mouse buttons, and EV_REL for mouse movement, and so on.
if (ioctl(fd, UI_SET_EVBIT, EV_SYN) == -1 ||
ioctl(fd, UI_SET_EVBIT, EV_KEY) == -1 ||
ioctl(fd, UI_SET_EVBIT, EV_REL) == -1) {
fprintf(stderr, "Uinput event types not allowed: %s.\n", strerror(errno));
close(fd);
exit(EXIT_FAILURE);
}
I personally use a static constant array with the codes, for easier management.
Use the UI_SET_KEYBIT ioctl for each key code the device may emit, and UI_SET_RELBIT ioctl for each relative movement code (mouse code). For example, to allow space, left mouse button, horizontal and vertical mouse movement, and mouse wheel:
if (ioctl(fd, UI_SET_KEYBIT, KEY_SPACE) == -1 ||
ioctl(fd, UI_SET_KEYBIT, BTN_LEFT) == -1 ||
ioctl(fd, UI_SET_RELBIT, REL_X) == -1 ||
ioctl(fd, UI_SET_RELBIT, REL_Y) == -1 ||
ioctl(fd, UI_SET_RELBIT, REL_WHEEL) == -1) {
fprintf(stderr, "Uinput event codes not allowed: %s.\n", strerror(errno));
close(fd);
exit(EXIT_FAILURE);
}
Again, static const arrays (one for UI_SET_KEYBIT and one for UI_SET_RELBIT codes) are much easier to maintain.
Define a struct uinput_user_dev, and write it to the device.
If you have name containing the device name string, vendor and product with the USB vendor and product ID numbers, version with a version number (0 is fine), use
struct uinput_user_dev dev;
memset(&dev, 0, sizeof dev);
strncpy(dev.name, name, UINPUT_MAX_NAME_SIZE);
dev.id.bustype = BUS_USB;
dev.id.vendor = vendor;
dev.id.product = product;
dev.id.version = version;
if (write(fd, &dev, sizeof dev) != sizeof dev) {
fprintf(stderr, "Cannot write an uinput device description: %s.\n", strerror(errno));
close(fd);
exit(EXIT_FAILURE);
}
Later kernels have an ioctl to do the same thing (apparently being involved in systemd development causes this kind of drain bamage);
struct uinput_setup dev;
memset(&dev, 0, sizeof dev);
strncpy(dev.name, name, UINPUT_MAX_NAME_SIZE);
dev.id.bustype = BUS_USB;
dev.id.vendor = vendor;
dev.id.product = product;
dev.id.version = version;
if (ioctl(fd, UI_DEV_SETUP, &dev) == -1) {
fprintf(stderr, "Cannot write an uinput device description: %s.\n", strerror(errno));
close(fd);
exit(EXIT_FAILURE);
}
The idea seems to be that instead of using the former, you can try the latter first, and if it fails, do the former instead. You know, because a single interface might some day not be enough. (That's what the documentation and commit say, anyway.)
I might sound a bit cranky, here, but that's just because I do subscribe to both the Unix philosophy and the KISS principle (or minimalist approach), and see such warts completely unnecessary. And too often coming from the same loosely related group of developers. Ahem. No personal insult intended; I just think they are doing poor job.
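The "try the ioctl first, then fall back to write()" idea described above could look roughly like this. It is only a sketch: uinput_describe is a made-up helper name, and falling back on any ioctl failure (rather than on specific errno values) is an assumption, not something the kernel documentation prescribes in detail.

```c
#include <errno.h>
#include <linux/uinput.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Describe the device on an open /dev/uinput descriptor.
   Returns 0 on success, -1 with errno set on failure. */
static int uinput_describe(int fd, const char *name, unsigned int vendor,
                           unsigned int product, unsigned int version)
{
    size_t namelen;

    if (!name || !*name || strlen(name) >= UINPUT_MAX_NAME_SIZE) {
        errno = EINVAL;
        return -1;
    }
    namelen = strlen(name);

#ifdef UI_DEV_SETUP
    {
        /* Newer interface: a single ioctl. */
        struct uinput_setup setup;
        memset(&setup, 0, sizeof setup);
        memcpy(setup.name, name, namelen); /* rest stays zeroed */
        setup.id.bustype = BUS_USB;
        setup.id.vendor = vendor;
        setup.id.product = product;
        setup.id.version = version;
        if (ioctl(fd, UI_DEV_SETUP, &setup) == 0)
            return 0;
        /* Assumed policy: on any failure, fall through to the older way. */
    }
#endif

    {
        /* Older interface: write a struct uinput_user_dev. */
        struct uinput_user_dev dev;
        memset(&dev, 0, sizeof dev);
        memcpy(dev.name, name, namelen);
        dev.id.bustype = BUS_USB;
        dev.id.vendor = vendor;
        dev.id.product = product;
        dev.id.version = version;
        if (write(fd, &dev, sizeof dev) != (ssize_t)sizeof dev)
            return -1;
    }
    return 0;
}
```

The #ifdef keeps the sketch compiling against kernel headers that predate UI_DEV_SETUP; with such headers only the write() path is built.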
Create the virtual device, by issuing an UI_DEV_CREATE ioctl:
if (ioctl(fd, UI_DEV_CREATE) == -1) {
fprintf(stderr, "Cannot create the virtual uinput device: %s.\n", strerror(errno));
close(fd);
exit(EXIT_FAILURE);
}
At this point, the kernel will construct the device, provide the corresponding event to the udev daemon, and the udev daemon will construct the device node and symlink(s) according to its configuration. All this will take a bit of time -- a fraction of a second in the real world, but enough that trying to emit events immediately might cause some of them to be lost.
Emit the input events (struct input_event) by writing to the uinput device.
You can write one or more struct input_events at a time, and should never see short writes (unless you try to write a partial event structure). Partial event structures are completely ignored. (See drivers/input/misc/uinput.c:uinput_write() uinput_inject_events() for how the kernel handles such writes.)
Many actions consist of more than one struct input_event. For example, you might move the mouse diagonally (emitting both { .type = EV_REL, .code = REL_X, .value = xdelta } and { .type = EV_REL, .code = REL_Y, .value = ydelta } for that single movement). The synchronization events ({ .type = EV_SYN, .code = 0, .value = 0 }) are used as a sentinel or separator, denoting the end of related events.
Because of this, you'll need to append a { .type = EV_SYN, .code = 0, .value = 0 } input event after each individual action (mouse movement, key press, key release, and so on). Think of it as the equivalent of a newline, for line-buffered input.
For example, the following code moves the mouse diagonally down right by a single pixel.
struct input_event event[3];
memset(event, 0, sizeof event);
event[0].type = EV_REL;
event[0].code = REL_X;
event[0].value = +1; /* Right */
event[1].type = EV_REL;
event[1].code = REL_Y;
event[1].value = +1; /* Down */
event[2].type = EV_SYN;
event[2].code = 0;
event[2].value = 0;
if (write(fd, event, sizeof event) != sizeof event)
fprintf(stderr, "Failed to inject mouse movement event.\n");
The failure case is not fatal; it only means the events were not injected (although I don't see how that could happen in current kernels; better be defensive, just in case). You can simply retry the same again, or ignore the failure (but letting the user know, so they can investigate, if it ever happens). So log it or output a warning, but no need for it to cause the daemon/service to exit.
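Key presses follow the same pattern. As a sketch (fill_key_tap and emit_key_tap are hypothetical helper names), a single key "tap" is a press and a release, each followed by its own synchronization event. The key code must have been enabled with UI_SET_KEYBIT beforehand.

```c
#include <linux/input.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill ev[0..3] with press, sync, release, sync for one key code. */
static void fill_key_tap(struct input_event ev[4], unsigned short code)
{
    memset(ev, 0, 4 * sizeof ev[0]);

    ev[0].type = EV_KEY;
    ev[0].code = code;
    ev[0].value = 1;          /* 1 = key pressed */

    ev[1].type = EV_SYN;
    ev[1].code = SYN_REPORT;  /* SYN_REPORT is 0 */
    ev[1].value = 0;

    ev[2].type = EV_KEY;
    ev[2].code = code;
    ev[2].value = 0;          /* 0 = key released */

    ev[3].type = EV_SYN;
    ev[3].code = SYN_REPORT;
    ev[3].value = 0;
}

/* Write the four events to an uinput descriptor.
   Returns 0 if all were injected, -1 otherwise (not fatal, see above). */
static int emit_key_tap(int fd, unsigned short code)
{
    struct input_event ev[4];

    fill_key_tap(ev, code);
    return write(fd, ev, sizeof ev) == (ssize_t)sizeof ev ? 0 : -1;
}
```

For example, emit_key_tap(fd, KEY_SPACE) would inject one press and release of the space key on a device created with the key codes allowed earlier.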
Destroy the device:
ioctl(fd, UI_DEV_DESTROY);
close(fd);
The device does get automatically destroyed when the last duplicate of the original opened descriptor gets closed, but I recommend doing it explicitly as above.
Putting steps 1-5 in a function, you get something like
#define _POSIX_C_SOURCE 200809L
#define _GNU_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <linux/uinput.h>
#include <string.h>
#include <errno.h>
#include <stdio.h>
static const unsigned int allow_event_type[] = {
EV_KEY,
EV_SYN,
EV_REL,
};
#define ALLOWED_EVENT_TYPES (sizeof allow_event_type / sizeof allow_event_type[0])
static const unsigned int allow_key_code[] = {
KEY_SPACE,
BTN_LEFT,
BTN_MIDDLE,
BTN_RIGHT,
};
#define ALLOWED_KEY_CODES (sizeof allow_key_code / sizeof allow_key_code[0])
static const unsigned int allow_rel_code[] = {
REL_X,
REL_Y,
REL_WHEEL,
};
#define ALLOWED_REL_CODES (sizeof allow_rel_code / sizeof allow_rel_code[0])
static int uinput_open(const char *name, const unsigned int vendor, const unsigned int product, const unsigned int version)
{
struct uinput_user_dev dev;
int fd;
size_t i;
if (!name || strlen(name) < 1 || strlen(name) >= UINPUT_MAX_NAME_SIZE) {
errno = EINVAL;
return -1;
}
fd = open("/dev/uinput", O_RDWR);
if (fd == -1)
return -1;
memset(&dev, 0, sizeof dev);
strncpy(dev.name, name, UINPUT_MAX_NAME_SIZE);
dev.id.bustype = BUS_USB;
dev.id.vendor = vendor;
dev.id.product = product;
dev.id.version = version;
do {
for (i = 0; i < ALLOWED_EVENT_TYPES; i++)
if (ioctl(fd, UI_SET_EVBIT, allow_event_type[i]) == -1)
break;
if (i < ALLOWED_EVENT_TYPES)
break;
for (i = 0; i < ALLOWED_KEY_CODES; i++)
if (ioctl(fd, UI_SET_KEYBIT, allow_key_code[i]) == -1)
break;
if (i < ALLOWED_KEY_CODES)
break;
for (i = 0; i < ALLOWED_REL_CODES; i++)
if (ioctl(fd, UI_SET_RELBIT, allow_rel_code[i]) == -1)
break;
if (i < ALLOWED_REL_CODES)
break;
if (write(fd, &dev, sizeof dev) != sizeof dev)
break;
if (ioctl(fd, UI_DEV_CREATE) == -1)
break;
/* Success. */
return fd;
} while (0);
/* FAILED: */
{
const int saved_errno = errno;
close(fd);
errno = saved_errno;
return -1;
}
}
static void uinput_close(const int fd)
{
ioctl(fd, UI_DEV_DESTROY);
close(fd);
}
which seems to work fine, and requires no libraries (other than the standard C library).
It is important to realize that the Linux input subsystem, including uinput and struct input_event, are binary interfaces to the Linux kernel, and therefore will be kept backwards compatible (except for pressing technical reasons, like security issues or serious conflicts with other parts of the kernel). (The desire to wrap everything under the freedesktop.org or systemd umbrella is not one.)

Read chardevice with libevent

I wrote a chardevice that passes some messages received from the network to an user space application. The user space application has to both read the chardevice and send/receive messages via TCP sockets to other user-space applications. Both read and receiving should be blocking.
Since Libevent is able to handle multiple events at the same time, I thought registering an event for the file created by the chardevice and an event for a socket would just work, but I was wrong.
But a chardevice creates a "character special file", and libevent seems to not be able to block. If I implement a blocking mechanism inside the chardevice, i.e. mutex or semaphore, then the socket event blocks too, and the application cannot receive messages.
The user space application has to accept outside connections at any time.
Do you know how to make it work? Maybe also using another library, I just want a blocking behaviour for both socket and file reader.
Thank you in advance.
Update: Thanks to @Ahmed Masud for the help. This is what I've done:
Kernel module chardevice:
Implement a poll function that waits until new data is available
struct file_operations fops = {
...
.read = kdev_read,
.poll = kdev_poll,
};
I have a global variable to handle if the user space has to stop, and a wait queue:
static int working = 1;
static wait_queue_head_t access_wait;
This is the read function. It returns a negative error code if copy_to_user fails, a positive byte count if everything went well, and 0 if the module has to stop. used_buf is atomic because it tracks the fill level of a buffer that is read by the user application and written by the kernel module.
ssize_t
kdev_read(struct file* filep, char __user* buffer, size_t len, loff_t* offset)
{
size_t llen;
if (signal_pending(current) || !working) { // user called sigint
return 0;
}
llen = sizeof(struct user_msg) + msg_buf[first_buf]->size;
if (llen > len)
return -EINVAL; // user buffer is too small for the message
// copy_to_user returns the number of bytes that could NOT be copied
if (copy_to_user(buffer, (char*)msg_buf[first_buf], llen) != 0) {
paxerr("send fewer characters to the user");
return -EFAULT;
}
atomic_dec(&used_buf); // the message was consumed
first_buf = (first_buf + 1) % BUFFER_SIZE;
return llen;
}
When there is data to read, I simply increment used_buf and call wake_up_interruptible(&access_wait).
This is the poll function. I just wait until used_buf is > 0:
unsigned int
kdev_poll(struct file* file, poll_table* wait)
{
poll_wait(file, &access_wait, wait);
if (atomic_read(&used_buf) > 0)
return POLLIN | POLLRDNORM;
return 0;
}
Now, the problem here is that if I unload the module while the user space application is waiting, the latter will go into a blocked state and it won't be possible to stop it. That's why I wake up the application when the module is unloaded
void
kdevchar_exit(void)
{
working = 0;
atomic_inc(&used_buf); // bump the counter so that the application is unblocked
wake_up_interruptible(&access_wait); // wake up application, but this time read will return 0 since working = 0;
... // unregister everything
}
User space application
Libevent by default uses polling, so simply create an event_base and a reader event.
base = event_base_new();
filep = open(fname, O_RDWR | O_NONBLOCK, 0);
evread = event_new(base, filep, EV_READ | EV_PERSIST,
on_read_file, base);
where on_read_file simply reads the file, no poll call is made (libevent handles that):
static void
on_read_file(evutil_socket_t fd, short event, void* arg)
{
struct event_base* base = arg;
int len = read(...);
if (len < 0)
return;
if (len == 0) {
printf("Stopped by kernel module\n");
event_base_loopbreak(base);
return;
}
... // handle message
}
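If the goal is simply to block until either the character device or a socket is readable, the same .poll handler can also be driven without libevent. The following is a minimal sketch using plain poll(2); wait_readable and its return convention are assumptions made for this example, not part of libevent or of the code above.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h> /* read(), close() for the callers */

/* Block until dev_fd or sock_fd is readable, or until timeout_ms elapses
   (use -1 to wait forever).
   Returns a bitmask (1 = dev_fd ready, 2 = sock_fd ready),
   0 on timeout, or -1 on error with errno set. */
static int wait_readable(int dev_fd, int sock_fd, int timeout_ms)
{
    struct pollfd fds[2] = {
        { .fd = dev_fd,  .events = POLLIN },
        { .fd = sock_fd, .events = POLLIN },
    };
    const short ready = POLLIN | POLLHUP | POLLERR;
    int rc;

    /* A signal may interrupt poll(); simply wait again. */
    do {
        rc = poll(fds, 2, timeout_ms);
    } while (rc == -1 && errno == EINTR);

    if (rc <= 0)
        return rc;

    return ((fds[0].revents & ready) ? 1 : 0) |
           ((fds[1].revents & ready) ? 2 : 0);
}
```

The caller then calls read() on the device, or accept()/recv() on the socket, only for the descriptors reported as ready, so neither source can starve the other.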

mq_open giving "too many open files"

I created a message queue with the following code. The first few times it works properly.
int main()
{
mqd_t mqdes;
char mq_name[10] = "/mq";
int oflag = O_CREAT | O_RDWR, ret;
struct mq_attr attr;
attr.mq_maxmsg = 1024;
attr.mq_msgsize = 2048;
mqdes = mq_open(mq_name, oflag, 0766, &attr);
if(mqdes == -1) {
perror("mq_open");
if(errno == EMFILE)
perror("EMFILE");
exit(1);
}
printf("mqueue created, mq_descriptor: %d\n", mqdes);
ret = mq_close(mqdes);
if(ret == -1) {
perror("mq_close");
exit(2);
}
printf(" mq closed successful\n");
return 0;
}
After that, it gives the following error:
mq_open: Too many open files
EMFILE: Too many open files
But why am I getting this error? How can I list POSIX message queues, the way ipcs does for System V?
Try to set the resource limits:
#include <sys/resource.h>
struct rlimit rlim;
memset(&rlim, 0, sizeof(rlim));
rlim.rlim_cur = RLIM_INFINITY;
rlim.rlim_max = RLIM_INFINITY;
setrlimit(RLIMIT_MSGQUEUE, &rlim);
I had the same issue while trying something. If you have accidentally left too many message queues on your system, you can try deleting them from the /dev/mqueue directory. This worked for me.
You might also want to call mq_unlink(const char *name) after mq_close() to ensure that the queue is removed from the system (see the mq_unlink(3) man page).
I had the same problem and I solved it by increasing RLIMIT_MSGQUEUE via setrlimit.
If the hard limit (rlim_max) is too low as well (which was the case for me), you will have to give your process the CAP_SYS_RESOURCE privilege so that you can set the hard limit before you set the process limit (rlim_cur). Either run $ setcap 'CAP_SYS_RESOURCE=+ep' /path/to/executable over an executable or edit /etc/security/capability.conf to give CAP_SYS_RESOURCE to a user/group.
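When the hard limit is already high enough, no extra privilege is needed: an unprivileged process may raise its soft limit up to the hard limit. A minimal sketch of that case follows (raise_msgqueue_soft_limit is a made-up name). Note that RLIMIT_MSGQUEUE is counted in bytes across all queues of the user, so whether this is sufficient depends on mq_maxmsg and mq_msgsize.

```c
#include <sys/resource.h>

/* Raise the soft RLIMIT_MSGQUEUE limit to the current hard limit.
   Returns 0 on success, -1 with errno set on failure. */
static int raise_msgqueue_soft_limit(void)
{
    struct rlimit rlim;

    if (getrlimit(RLIMIT_MSGQUEUE, &rlim) == -1)
        return -1;

    if (rlim.rlim_cur == rlim.rlim_max)
        return 0; /* already at the maximum allowed without privileges */

    rlim.rlim_cur = rlim.rlim_max;
    return setrlimit(RLIMIT_MSGQUEUE, &rlim);
}
```

Call it once at startup, before mq_open(). If mq_open() still fails with EMFILE afterwards, the hard limit itself is too low and the CAP_SYS_RESOURCE route described above applies.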

cat terminal , check usb removed (perror)?

For an assignment we have to create a C program that functions similarly to the cat command. The first hand-in requires it to mimic only the most basic operations of cat, i.e. printing to output and redirection. The issue I'm having is that one requirement is to print an error if an output file residing on a USB drive is lost, i.e. the drive is pulled out while stdout is being redirected to it.
How do I catch such an error, and how can I write a test case for that particular error?
Many thanks... I really have no idea.
UPDATE CODE TEMP
int main(){
int c; // int, not char, so that EOF can be told apart from a valid byte
while((c = getchar()) != EOF){
putchar(c);
// Ensure newly created file exists
}
return EXIT_SUCCESS;
}
Assuming you are using fprintf(), from the man pages:
On success, the total number of characters written is returned.
So:
store the size of the char array you will write into a variable x
if fprintf() is less than x, the writing was interrupted.
exit gracefully
EDIT:
There are 2 things I'm thinking of:
1: When putchar() fails, it indicates an error when writing to the file. Writing one byte doesn't take very long, so a failure in the middle of a write should be unlikely, and once the byte is written you are in a safe state (or so you assume).
You can do this like so
if(putchar(c) == EOF){
//write error
}
2: If you're asked to quit the instant you detect a file removal, then you need to monitor the directory. Luckily, you're only looking at one directory. However, that while loop gets in the way because getchar() is a blocking function (it cannot return until something happens). You should use inotify to monitor the directory, then probably poll() to wait on the inotify file descriptor. When I did this I used select() because we were forced to.
Here is a rough idea of how to monitor a directory with inotify:
#define EVENT_SIZE (sizeof(struct inotify_event))
#define EVENT_BUF_LEN (1024 * (EVENT_SIZE + 16))
int fd, wd, length, i = 0;
char buffer[EVENT_BUF_LEN];
memset(buffer, 0, EVENT_BUF_LEN*sizeof(char));
//init inotify
fd = inotify_init();
if(fd < 0){
perror("inotify init");
}
//add directory to watch list (path is the directory that holds the output file)
wd = inotify_add_watch(fd, path , IN_DELETE |
IN_DELETE_SELF | IN_MODIFY | IN_MOVE_SELF | IN_MOVED_FROM | IN_MOVED_TO);
if(wd < 0){
perror("inotify add watch");
}
//wait for an event; read() blocks until one arrives
length = read( fd, buffer, EVENT_BUF_LEN );
if ( length < 0 ) {
perror("read");
}
struct inotify_event *event;
while (i < length){
//interpret the bytes at the current offset as an event
event = (struct inotify_event*) &buffer[i];
if (event->len){
//this was a custom function of mine
storeEvent(event);
}
i += EVENT_SIZE + event->len;
}
You'll have to check which flags to use when adding a directory (like IN_DELETE or IN_MODIFY), since they determine what triggers an inotify event. Note that this code calls read() only once, so it handles only the events returned by that single call, and it blocks at the read() statement.
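Returning to the first option (checking every write), a minimal sketch of a cat-style copy loop might look like this. copy_stream is a made-up name. The sketch also checks fflush(), because stdio buffers output and a write error often only shows up when the buffer is flushed.

```c
#include <stdio.h>

/* Copy everything from in to out.
   Returns 0 on success, -1 on a read or write error (errno is set by the
   failing call). */
static int copy_stream(FILE *in, FILE *out)
{
    int c; /* int, not char, so that EOF is distinguishable */

    while ((c = getc(in)) != EOF)
        if (putc(c, out) == EOF)
            return -1;      /* write error while filling the buffer */

    if (ferror(in))
        return -1;          /* getc() stopped on an error, not end of input */

    /* Buffered bytes reach the device only now. */
    if (fflush(out) == EOF)
        return -1;

    return 0;
}
```

In main() this would be used as if (copy_stream(stdin, stdout) == -1) { perror("cat"); return EXIT_FAILURE; }. For a test case that does not need a real USB drive, redirecting output to /dev/full on Linux makes every flush fail with ENOSPC. That is a different errno from the one a removed drive would produce, but it exercises the same error path.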

close() is not closing socket properly

I have a multi-threaded server (thread pool) that is handling a large number of requests (up to 500/sec for one node), using 20 threads. There's a listener thread that accepts incoming connections and queues them for the handler threads to process. Once the response is ready, the threads then write out to the client and close the socket. All seemed to be fine until recently, a test client program started hanging randomly after reading the response. After a lot of digging, it seems that the close() from the server is not actually disconnecting the socket. I've added some debugging prints to the code with the file descriptor number and I get this type of output.
Processing request for 21
Writing to 21
Closing 21
The return value of close() is 0, or there would be another debug statement printed. After this output with a client that hangs, lsof is showing an established connection.
SERVER 8160 root 21u IPv4 32754237 TCP localhost:9980->localhost:47530 (ESTABLISHED)
CLIENT 17747 root 12u IPv4 32754228 TCP localhost:47530->localhost:9980 (ESTABLISHED)
It's as if the server never sends the shutdown sequence to the client, and this state hangs until the client is killed, leaving the server side in a close wait state
SERVER 8160 root 21u IPv4 32754237 TCP localhost:9980->localhost:47530 (CLOSE_WAIT)
Also if the client has a timeout specified, it will timeout instead of hanging. I can also manually run
call close(21)
in the server from gdb, and the client will then disconnect. This happens maybe once in 50,000 requests, but might not happen for extended periods.
Linux version: 2.6.21.7-2.fc8xen
Centos version: 5.4 (Final)
socket actions are as follows
SERVER:
int client_socket;
struct sockaddr_in client_addr;
socklen_t client_len = sizeof(client_addr);
while(true) {
client_socket = accept(incoming_socket, (struct sockaddr *)&client_addr, &client_len);
if (client_socket == -1)
continue;
/* insert into queue here for threads to process */
}
Then the thread picks up the socket and builds the response.
/* get client_socket from queue */
/* processing request here */
/* now set to blocking for write; was previously set to non-blocking for reading */
int flags = fcntl(client_socket, F_GETFL);
if (flags < 0)
abort();
if (fcntl(client_socket, F_SETFL, flags & ~O_NONBLOCK) < 0)
abort();
server_write(client_socket, response_buf, response_length);
server_close(client_socket);
server_write and server_close.
void server_write( int fd, char const *buf, ssize_t len ) {
printf("Writing to %d\n", fd);
while(len > 0) {
ssize_t n = write(fd, buf, len);
if(n <= 0)
return;// I don't really care what error happened, we'll just drop the connection
len -= n;
buf += n;
}
}
void server_close( int fd ) {
for(uint32_t i=0; i<10; i++) {
int n = close(fd);
if(!n) {//closed successfully
return;
}
usleep(100);
}
printf("Close failed for %d\n", fd);
}
CLIENT:
Client side is using libcurl v 7.27.0
CURL *curl = curl_easy_init();
CURLcode res;
curl_easy_setopt( curl, CURLOPT_URL, url);
curl_easy_setopt( curl, CURLOPT_WRITEFUNCTION, write_callback );
curl_easy_setopt( curl, CURLOPT_WRITEDATA, write_tag );
res = curl_easy_perform(curl);
Nothing fancy, just a basic curl connection. The client hangs in transfer.c (in libcurl) because the socket is not perceived as being closed. It's waiting for more data from the server.
Things I've tried so far:
Shutdown before close
shutdown(fd, SHUT_WR);
char buf[64];
while(read(fd, buf, 64) > 0);
/* then close */
Setting SO_LINGER to close forcibly in 1 second
struct linger l;
l.l_onoff = 1;
l.l_linger = 1;
if (setsockopt(client_socket, SOL_SOCKET, SO_LINGER, &l, sizeof(l)) == -1)
abort();
These have made no difference. Any ideas would be greatly appreciated.
EDIT -- This ended up being a thread-safety issue inside a queue library causing the socket to be handled inappropriately by multiple threads.
Here is some code I've used on many Unix-like systems (e.g SunOS 4, SGI IRIX, HPUX 10.20, CentOS 5, Cygwin) to close a socket:
int getSO_ERROR(int fd) {
int err = 1;
socklen_t len = sizeof err;
if (-1 == getsockopt(fd, SOL_SOCKET, SO_ERROR, (char *)&err, &len))
FatalError("getSO_ERROR");
if (err)
errno = err; // set errno to the socket SO_ERROR
return err;
}
void closeSocket(int fd) { // *not* the Windows closesocket()
if (fd >= 0) {
getSO_ERROR(fd); // first clear any errors, which can cause close to fail
if (shutdown(fd, SHUT_RDWR) < 0) // secondly, terminate the 'reliable' delivery
if (errno != ENOTCONN && errno != EINVAL) // SGI causes EINVAL
Perror("shutdown");
if (close(fd) < 0) // finally call close()
Perror("close");
}
}
But the above does not guarantee that any buffered writes are sent.
Graceful close: It took me about 10 years to figure out how to close a socket. But for another 10 years I just lazily called usleep(20000) for a slight delay to 'ensure' that the write buffer was flushed before the close. This obviously is not very clever, because:
The delay was too long most of the time.
The delay was too short some of the time--maybe!
A signal such as SIGCHLD could occur to end usleep() (but I usually called usleep() twice to handle this case--a hack).
There was no indication whether this works. But this is perhaps not important if a) hard resets are perfectly ok, and/or b) you have control over both sides of the link.
But doing a proper flush is surprisingly hard. Using SO_LINGER is apparently not the way to go; see for example:
http://msdn.microsoft.com/en-us/library/ms740481%28v=vs.85%29.aspx
https://www.google.ca/#q=the-ultimate-so_linger-page
And SIOCOUTQ appears to be Linux-specific.
Note shutdown(fd, SHUT_WR) doesn't stop writing, contrary to its name, and maybe contrary to man 2 shutdown.
This code flushSocketBeforeClose() waits until a read of zero bytes, or until the timer expires. The function haveInput() is a simple wrapper for select(2), and is set to block for up to 1/100th of a second.
bool haveInput(int fd, double timeout) {
int status;
fd_set fds;
struct timeval tv;
FD_ZERO(&fds);
FD_SET(fd, &fds);
tv.tv_sec = (long)timeout; // cast needed for C++
tv.tv_usec = (long)((timeout - tv.tv_sec) * 1000000); // 'suseconds_t'
while (1) {
if (!(status = select(fd + 1, &fds, 0, 0, &tv)))
return FALSE;
else if (status > 0 && FD_ISSET(fd, &fds))
return TRUE;
else if (status > 0)
FatalError("I am confused");
else if (errno != EINTR)
FatalError("select"); // tbd EBADF: man page "an error has occurred"
}
}
bool flushSocketBeforeClose(int fd, double timeout) {
const double start = getWallTimeEpoch();
char discard[99];
ASSERT(SHUT_WR == 1);
if (shutdown(fd, 1) != -1)
while (getWallTimeEpoch() < start + timeout)
while (haveInput(fd, 0.01)) // can block for 0.01 secs
if (!read(fd, discard, sizeof discard))
return TRUE; // success!
return FALSE;
}
Example of use:
if (!flushSocketBeforeClose(fd, 2.0)) // can block for 2s
printf("Warning: Cannot gracefully close socket\n");
closeSocket(fd);
In the above, my getWallTimeEpoch() is similar to time(), and Perror() is a wrapper for perror().
Edit: Some comments:
My first admission is a bit embarrassing. The OP and Nemo challenged the need to clear the internal so_error before close, but I cannot now find any reference for this. The system in question was HPUX 10.20. After a failed connect(), just calling close() did not release the file descriptor, because the system wished to deliver an outstanding error to me. But I, like most people, never bothered to check the return value of close. So I eventually ran out of file descriptors (ulimit -n), which finally got my attention.
(very minor point) One commentator objected to the hard-coded numerical arguments to shutdown(), rather than e.g. SHUT_WR for 1. The simplest answer is that Windows uses different #defines/enums e.g. SD_SEND. And many other writers (e.g. Beej) use constants, as do many legacy systems.
Also, I always, always, set FD_CLOEXEC on all my sockets, since in my applications I never want them passed to a child and, more importantly, I don't want a hung child to impact me.
Sample code to set CLOEXEC:
static void setFD_CLOEXEC(int fd) {
int status = fcntl(fd, F_GETFD, 0);
if (status >= 0)
status = fcntl(fd, F_SETFD, status | FD_CLOEXEC);
if (status < 0)
Perror("Error getting/setting socket FD_CLOEXEC flags");
}
Great answer from Joseph Quinsey. I have comments on the haveInput function. Wondering how likely it is that select returns an fd you did not include in your set. This would be a major OS bug IMHO. That's the kind of thing I would check if I wrote unit tests for the select function, not in an ordinary app.
if (!(status = select(fd + 1, &fds, 0, 0, &tv)))
return FALSE;
else if (status > 0 && FD_ISSET(fd, &fds))
return TRUE;
else if (status > 0)
FatalError("I am confused"); // <--- fd unknown to function
My other comment pertains to the handling of EINTR. In theory, you could get stuck in an infinite loop if select kept returning EINTR, as this error lets the loop start over. Given the very short timeout (0.01), it appears highly unlikely to happen. However, I think the appropriate way of dealing with this would be to return errors to the caller (flushSocketBeforeClose). The caller can keep calling haveInput as long as its timeout hasn't expired, and declare failure for other errors.
ADDITION #1
flushSocketBeforeClose will not exit quickly in case of read returning an error. It will keep looping until the timeout expires. You can't rely on the select inside haveInput to anticipate all errors. read has errors of its own (ex: EIO).
while (haveInput(fd, 0.01))
if (!read(fd, discard, sizeof discard)) /* <-- a return value of -1 does not end the loop */
return TRUE;
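A sketch that folds in the points raised above: read errors end the wait at once, EINTR restarts it, and the caller's deadline bounds the whole operation. It uses poll(2) and clock_gettime() in place of the original haveInput() and getWallTimeEpoch() helpers, and the name flush_before_close is made up for this example; treat it as an illustration rather than a drop-in replacement.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Milliseconds elapsed since *start on the monotonic clock. */
static long elapsed_ms(const struct timespec *start)
{
    struct timespec now;

    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - start->tv_sec) * 1000L +
           (now.tv_nsec - start->tv_nsec) / 1000000L;
}

/* Send FIN, then discard incoming data until the peer closes its side.
   Returns true if end-of-file was seen before timeout_ms elapsed,
   false on timeout or on any error. */
static bool flush_before_close(int fd, int timeout_ms)
{
    struct timespec start;
    char discard[256];

    if (shutdown(fd, SHUT_WR) == -1)
        return false;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (;;) {
        struct pollfd p = { .fd = fd, .events = POLLIN };
        long left = timeout_ms - elapsed_ms(&start);
        int rc;
        ssize_t n;

        if (left <= 0)
            return false;               /* deadline passed */

        rc = poll(&p, 1, (int)left);
        if (rc == 0)
            return false;               /* timed out waiting */
        if (rc == -1) {
            if (errno == EINTR)
                continue;               /* interrupted: recompute and retry */
            return false;
        }

        n = read(fd, discard, sizeof discard);
        if (n == 0)
            return true;                /* peer closed: flush complete */
        if (n == -1 && errno != EINTR && errno != EAGAIN)
            return false;               /* e.g. EIO or ECONNRESET */
    }
}
```

It is used the same way as before: if (!flush_before_close(fd, 2000)) printf("Warning: Cannot gracefully close socket\n"); followed by the actual close.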
This sounds to me like a bug in your Linux distribution.
The GNU C library documentation says:
When you have finished using a socket, you can simply close its file
descriptor with close
Nothing about clearing any error flags or waiting for the data to be flushed or any such thing.
Your code is fine; your O/S has a bug.
Include the header that declares close():
#include <unistd.h>
This should help solve the close() problem.
