Is it legal to read a file descriptor into NULL? - c

Recently, I've been fixing the timestep for the sake of a library that I am writing. Following some research, suppose that I ended up with this prototype, precise and easy to combine with the generic event system of my library:
#include <stdio.h>
#include <unistd.h>
#include <sys/timerfd.h>
#include <poll.h>
struct pollfd fds[1];
struct itimerspec its;
int main(void) {
fds[0] = (struct pollfd) {timerfd_create(CLOCK_MONOTONIC, 0), POLLIN, 0}; //long live clarity
its.it_interval = (struct timespec) {0, 16666667};
its.it_value = (struct timespec) {0, 16666667};
timerfd_settime(fds[0].fd, 0, &its, NULL);
while(1) {
poll(fds, 1, -1);
if(fds[0].revents == POLLIN) {
long long buffer;
read(fds[0].fd, &buffer, 8);
printf("ROFL\n");
} else {
printf("BOOM\n");
break;
}
}
close(fds[0].fd);
return 0;
}
However, it severely hurt me that I've had to pollute my CPU caches with a whole precious 8 bytes of data in order to make the timer's file descriptor reusable. Because of that, I've tried to replace the read() call with lseek(), as follows:
lseek(fds[0].fd, 0, SEEK_END);
Unfortunately, both that and even lseek(fds[0].fd, 8, SEEK_CUR); gave me ESPIPE errors and would not work. But then, I found out that the following actually did its job, despite of giving EFAULTs:
read(fds[0].fd, NULL, 8);
Is it legal, defined behavior to offset the file descriptor like this? If it is not (as the EFAULTs suggested to me, strongly enough to refrain from using that piece of genius), does there exist a function that would discard the read data, without ever writing it down, or otherwise offset my timer's file descriptor?

The POSIX specification of read(2) does not specify the consequences of passing a null pointer as the buffer argument. No specific error code is given, nor does it say whether any data will be read from the descriptor.
The Linux man page has this error, though:
EFAULT buf is outside your accessible address space.
It doesn't say that it will read the 8 bytes and discard them when this happens, though.
So I don't think you can depend on this working as you desire.

Related

Synchronize with sigev_notify_function()

I would like to read (asynchronously) BLOCK_SIZE bytes of one file, and the BLOCK_SIZE bytes of the second file, printing what has been read to the buffer as soon as the respective buffer has been filled. Let me illustrate what I mean:
// in main()
int infile_fd = open(infile_name, O_RDONLY); // add error checking
int maskfile_fd = open(maskfile_name, O_RDONLY); // add error checking
char* buffer_infile = malloc(BLOCK_SIZE); // add error checking
char* buffer_maskfile = malloc(BLOCK_SIZE); // add error checking
struct aiocb cb_infile;
struct aiocb cb_maskfile;
// set AIO control blocks
memset(&cb_infile, 0, sizeof(struct aiocb));
cb_infile.aio_fildes = infile_fd;
cb_infile.aio_buf = buffer_infile;
cb_infile.aio_nbytes = BLOCK_SIZE;
cb_infile.aio_sigevent.sigev_notify = SIGEV_THREAD;
cb_infile.aio_sigevent.sigev_notify_function = print_buffer;
cb_infile.aio_sigevent.sigev_value.sival_ptr = buffer_infile;
memset(&cb_maskfile, 0, sizeof(struct aiocb));
cb_maskfile.aio_fildes = maskfile_fd;
cb_maskfile.aio_buf = buffer_maskfile;
cb_maskfile.aio_nbytes = BLOCK_SIZE;
cb_maskfile.aio_sigevent.sigev_notify = SIGEV_THREAD;
cb_maskfile.aio_sigevent.sigev_notify_function = print_buffer;
cb_maskfile.aio_sigevent.sigev_value.sival_ptr = buffer_maskfile;
and the print_buffer() function is defined as follows:
void print_buffer(union sigval sv)
{
printf("%s\n", __func__);
printf("buffer address: %p\n", sv.sival_ptr);
printf("buffer: %.128s\n", (char*)sv.sival_ptr);
}
By the end of the program I do the usual clean up, i.e.
// clean up
close(infile_fd); // add error checking
close(maskfile_fd); // add error checking
free(buffer_infile);
printf("buffer_inline freed\n");
free(buffer_maskfile);
printf("buffer_maskfile freed\n");
The problem is, every once in a while buffer_inline gets freed before print_buffer manages to print its contents to the console. In a usual case I would employ some kind of pthread_join() but as far as I know this is impossible since POSIX does not specify that sigev_notify_function must be implemented using threads, and besides, how would I get the TID of such thread to call pthread_join() on?
Don't do it this way, if you can avoid it. If you can, just let process termination take care of it all.
Otherwise, the answer indicated in Andrew Henle's comment above is right on. You need to be sure that no more sigev_notify_functions will improperly reference the buffers.
The easiest way to do this is simply to countdown the number of expected notifications before freeing the buffers.
Note: your SIGEV_THREAD function is executed in a separate thread, though not necessarily a new thread each time. (POSIX.1-2017 System Interfaces §2.4.2) Importantly, you are not meant to manage this thread's lifecycle: it is detached by default, with PTHREAD_CREATE_JOINABLE explicitly noted as undefined behavior.
As an aside, I'd suggest never using SIGEV_THREAD in robust code. Per spec, the signal mask of the sigev_notify_function thread is implementation-defined. Yikes. For me, that makes it per se unreliable. In my view, SIGEV_SIGNAL and a dedicated signal-handling thread are much safer.

MMAP segmentation fault

int fp, page;
char *data;
if(argc > 1){
printf("Read the docs");
exit(1);
}
fp = open("log.txt", O_RDONLY); //Opening file to read
page = getpagesize();
data = mmap(0, page, PROT_READ, 0,fp, 0);
initscr(); // Creating the ncurse screen
clear();
move(0, 0);
printw("%s", data);
endwin(); //Ends window
fclose(fp); //Closing file
return 0;
Here is my code I keep getting a segmentation fault for some reason.
All my header files have been included so that's not the problem (clearly, because its something to do with memory). Thanks in advance.
Edit: Got it - it wasn't being formatted as a string. and also had to use stat() to get the file info rather than getpagesize()
You can't fclose() a file descriptor you got from open(). You must use close(fp) instead. What you do is passing a small int that gets treated as a pointer. This causes a segmentation fault.
Note that your choice of identifier naming is unfortunate. Usually fp would be a pointer-to-FILE (FILE*, as used by the standard IO library), while fd would be a file descriptor (a small integer), used by the kernel's IO system calls.
Your compiler should have told you that you pass an int where a pointer-to-FILE was expected, or that you use fclose() without a prototype in scope. Did you enable the maximum warning level of your compiler?
Another segfault is possible if the data pointer does not point to a NUL (0) terminated string. Does your log.txt contain NUL-terminated strings?
You should also check if mmap() fails returning MAP_FAILED.
Okay so here is the code that got it working
#include <sys/stat.h>
int status;
struct stat s;
status = stat(file, &s);
if(status < 0){
perror("Stat:");
exit(1);
data = mmap(NULL, s.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
Before i was using 'getpagesize();' thanks beej !!!
mmap's man page gives you information on the parameters:
void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
As you can see, your second argument may be wrong (except you really want to exactly map a part of the file fitting into a single page).
Also: Probably 0 is not a valid flag value? Let's have a look again at the man page:
The flags argument determines whether updates to the mapping are
visible to other processes mapping the same region, and whether
updates are carried through to the underlying file. This behavior is
determined by including exactly one of the following values in flags: MAP_SHARED or MAP_PRIVATE
So you could try something like
data = mmap(0, size, PROT_READ, MAP_SHARED, fp, 0);
Always use the provided flags, as the underlying value may differ from machine to machine.
Also, the mapped area should not be larger than the underlying file. Check the size of log.txt beforehand.
The second argument to mmap should not be page size, it should be the size of your file. Here is a nice example.

Trouble Calling ioctl from user-space C

I'm trying to implement a program to access memory on an embedded system. I need to access some control register so I think that ioctl is the best way to do it. I have added the ioctl to the fops:
struct file_operations aes_fops = {
read: aes_read,
write: aes_write,
unlocked_ioctl: aes_ioctl,
open: aes_open,
release: aes_release
};
And have set up the function:
int aes_ioctl(struct inode *inode,
struct file *file,
unsigned int ioctl_num,
unsigned long ioctl_param){
printk(KERN_INFO "in ioctl\n");
....
}
But I am not getting inside of this function. Here is my user space code. Please help me understand if I am doing this totally wrong.
int main(int argc, char* argv[]){
int fd = fopen("/dev/aes", "r+");
ioctl(fd, 0, 1);
fclose(fd);
}
Some of the code is apparently for older kernels, because I am compiling for an embedded system where an older version of Linux has been modified.
The problem with your code is the request number you are using - 0. The kernel has some request number reserved for internal use. The kernel regards the request number as a struct, separates it to fields and calls the right subsystem for it.
See Documentation/ioctl/ioctl-number.txt (from Linux 3.4.6):
Code Seq#(hex) Include File Comments
========================================================
0x00 00-1F linux/fs.h conflict!
0x00 00-1F scsi/scsi_ioctl.h conflict!
0x00 00-1F linux/fb.h conflict!
0x00 00-1F linux/wavefront.h conflict!
0x02 all linux/fd.h
0x03 all linux/hdreg.h
...
Depending on what you are during, you'd have to follow the kernel guidelines for adding new ioctls():
If you are adding new ioctl's to the kernel, you should use the _IO
macros defined in <linux/ioctl.h>:
_IO an ioctl with no parameters
_IOW an ioctl with write parameters (copy_from_user)
_IOR an ioctl with read parameters (copy_to_user)
_IOWR an ioctl with both write and read parameters.
See the kernel's own Documentation/ioctl/ioctl-decoding.txt document for further details on how those numbers are structured.
In practice, if you pick Code 1, which means starting from 0x100 up until 0x1ff, you'd be fine.
Is your setup of the aes_fops structure correct? I've never seen it done that way. All the code I have is:
.unlocked_ioctl = aes_ioctl,
rather than:
unlocked_ioctl: aes_ioctl,
Colons within a structure (as you have in your setup) fields are used for bit fields as far as I'm aware (and during definition), not initialising the individual fields.
In other words, try:
struct file_operations aes_fops = {
.read = aes_read,
.write = aes_write,
.unlocked_ioctl = aes_ioctl,
.open = aes_open,
.release = aes_release
};
Note: It appears that gcc once did allow that variant of structure field initialisation but it's been obsolete since gcc 2.5 (see here, straight from the GCC documentation). You really should be using the proper method (ie, the one blessed by the ISO standard) to do this.
Without knowing the error returned, it's hard to say... My first though is your permissions on your file descriptor. I've seen similar issues before. First, you can get some more information about the failure if you take a look at the return of the ioctl:
#include <errno.h>
int main(int argc, char* argv[])
{
long ret;
int fd = fopen("/dev/aes", "r+");
ret = ioctl(fd, 0, 1);
if (ret < 0)
printf("ioctl failed. Return code: %d, meaning: %s\n", ret, strerror(errno));
fclose(fd);
}
Check the return values and this should help give you something to search on. Why check? See the bottom of the post...
Next in order to check if it is permissions issue, run the command:
ls -l /dev/aes
if you see something like:
crw------- 1 root root 10, 57 Aug 21 10:24 /dev/aes
Then just issue a:
sudo chmod 777 /dev/aes
And give it a try again. I bet it will work for you. (Note I ran that with root permissions since root is the owner of my version of your mod)
If the permissions are already OK, then I have a few more suggestions:
1) To me, the use of fopen/fclose is strange. You really only need to do:
int fd = open("/dev/aes");
close(fd);
My system doesn't even let your code compile as is.
2) Your IOCTL parameter list is old, I don't know what kernel version your compiling on, but recent kernels use this format:
long aes_ioctl(struct file *file, unsigned int ioctl_num, unsigned long ioctl_param){
Note the removal of the inode and the change of the return type. When I ran your code on my system, I made these changes.
Best of luck!
Note: Why should we check the return when we're "not getting into the ioctl"? Let me give you an example:
//Kernel Code:
//assume include files, other fops, exit, miscdev struct, etc. are present
long hello_ioctl(struct file *file, unsigned long ioctl_num, unsigned long ioctl_param) {
long ret = 0;
printk("in ioctl");
return ret;
}
static const struct file_operations hello_fops = {
owner: THIS_MODULE,
read: hello_read,
unlocked_ioctl: hello_ioctl,
};
static int __init hello_init(void) {
int ret;
printk("hello!\n");
ret = misc_register(&hello_dev); //assume it worked...
return ret;
}
User space code:
//assume includes
void main() {
int fd;
long ret;
fd = open("/dev/hello");
if(fd) {
c = ioctl(fd, 0, 1);
if (c < 0)
printf("error: %d, errno: %d, meaning: %s\n", c, errno, strerror(errno));
close(fd);
}
return;
}
So what's the output? Lets assume bad file permissions on /dev/hello (meaning our user space program can't access /dev/hello).
The dmesg | tail shows:
[ 2388.051660] Hello!
So it looks like we didn't "get in to" the ioctl. What's the output from the program?
error: -1, errno: 9, meaning: Bad file descriptor
Lots of useful output. Clearly the ioctl call did something, just not what we wanted. Now changing the permissions and re-running we can see the new dmesg:
[ 2388.051660] Hello!
[ 2625.025339] in ioctl

Increasing limit of FD_SETSIZE and select

I want to increase FD_SETSIZE macro value for my system.
Is there any way to increase FD_SETSIZE so select will not fail
Per the standards, there is no way to increase FD_SETSIZE. Some programs and libraries (libevent comes to mind) try to work around this by allocating additional space for the fd_set object and passing values larger than FD_SETSIZE to the FD_* macros, but this is a very bad idea since robust implementations may perform bounds-checking on the argument and abort if it's out of range.
I have an alternate solution that should always work (even though it's not required to by the standards). Instead of a single fd_set object, allocate an array of them large enough to hold the max fd you'll need, then use FD_SET(fd%FD_SETSIZE, &fds_array[fd/FD_SETSIZE]) etc. to access the set.
I also suggest using poll if possible. And there exist several "event" processing libraries like libevent or libev (or the event abilities of Glib from GTK, or QtCore, etc) which should help you. There are also things like epoll. And your problem is related to C10k
It would be better (and easy) to replace with poll. Generally poll() is a simple drop-in replacement for select() and isn't limited by the 1024 of FD_SETSIZE...
fd_set fd_read;
int id = 42;
FD_ZERO(fd_read);
FD_SET(id, &fd_read);
struct timeval tv;
tv.tv_sec = 5;
tv.tv_usec = 0;
if (select(id + 1, &fd_read, NULL, NULL, &tv) != 1) {
// Error.
}
becomes:
struct pollfd pfd_read;
int id = 42;
int timeout = 5000;
pfd_read.fd = id;
pfd_read.events = POLLIN;
if (poll(&pfd_read, 1, timeout) != 1) {
// Error
}
You need to include poll.h for the pollfd structure.
If you need to write as well as read then set the events flag as POLLIN | POLLOUT.
In order to use a fd_set larger than FD_SETSIZE, it is possible to define an extended one like this :
#include <sys/select.h>
#include <stdio.h>
#define EXT_FD_SETSIZE 2048
typedef struct
{
long __fds_bits[EXT_FD_SETSIZE / 8 / sizeof(long)];
} ext_fd_set;
int main()
{
ext_fd_set fd;
int s;
printf("FD_SETSIZE:%d sizeof(fd):%ld\n", EXT_FD_SETSIZE, sizeof(fd));
FD_ZERO(&fd);
while ( ((s=dup(0)) != -1) && (s < EXT_FD_SETSIZE) )
{
FD_SET(s, &fd);
}
printf("select:%d\n", select(EXT_FD_SETSIZE,(fd_set*)&fd, NULL, NULL, NULL));
return 0;
}
This prints :
FD_SETSIZE:2048 sizeof(fd):256
select:2045
In order to open more than 1024 filedescriptors, it is needed to increase the limit using for instance ulimit -n 2048.
Actually there IS a way to increase FD_SETSIZE on Windows. It's defined in winsock.h and per Microsoft themselves you can increase it by simply defining it BEFORE you include winsock.h:
See Maximum Number of Sockets an Application Can Use (old link), or the more recent page Maximum Number of Sockets Supported.
I do it all the time and have had no problems. The largest value I have used was around 5000 for a server I was developing.

mmap, msync and linux process termination

I want to use mmap to implement persistence of certain portions of program state in a C program running under Linux by associating a fixed-size struct with a well known file name using mmap() with the MAP_SHARED flag set. For performance reasons, I would prefer not to call msync() at all, and no other programs will be accessing this file. When my program terminates and is restarted, it will map the same file again and do some processing on it to recover the state that it was in before the termination. My question is this: if I never call msync() on the file descriptor, will the kernel guarantee that all updates to the memory will get written to disk and be subsequently recoverable even if my process is terminated with SIGKILL? Also, will there be general system overhead from the kernel periodically writing the pages to disk even if my program never calls msync()?
EDIT: I've settled the problem of whether the data is written, but I'm still not sure about whether this will cause some unexpected system loading over trying to handle this problem with open()/write()/fsync() and taking the risk that some data might be lost if the process gets hit by KILL/SEGV/ABRT/etc. Added a 'linux-kernel' tag in hopes that some knowledgeable person might chime in.
I found a comment from Linus Torvalds that answers this question
http://www.realworldtech.com/forum/?threadid=113923&curpostid=114068
The mapped pages are part of the filesystem cache, which means that even if the user process that made a change to that page dies, the page is still managed by the kernel and as all concurrent accesses to that file will go through the kernel, other processes will get served from that cache.
In some old Linux kernels it was different, that's the reason why some kernel documents still tell to force msync.
EDIT: Thanks RobH corrected the link.
EDIT:
A new flag, MAP_SYNC, is introduced since Linux 4.15, which can guarantee the coherence.
Shared file mappings with this flag provide the guarantee that
while some memory is writably mapped in the address space of
the process, it will be visible in the same file at the same
offset even after the system crashes or is rebooted.
references:
http://man7.org/linux/man-pages/man2/mmap.2.html search MAP_SYNC in the page
https://lwn.net/Articles/731706/
I decided to be less lazy and answer the question of whether the data is written to disk definitively by writing some code. The answer is that it will be written.
Here is a program that kills itself abruptly after writing some data to an mmap'd file:
#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
typedef struct {
char data[100];
uint16_t count;
} state_data;
const char *test_data = "test";
int main(int argc, const char *argv[]) {
int fd = open("test.mm", O_RDWR|O_CREAT|O_TRUNC, (mode_t)0700);
if (fd < 0) {
perror("Unable to open file 'test.mm'");
exit(1);
}
size_t data_length = sizeof(state_data);
if (ftruncate(fd, data_length) < 0) {
perror("Unable to truncate file 'test.mm'");
exit(1);
}
state_data *data = (state_data *)mmap(NULL, data_length, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_POPULATE, fd, 0);
if (MAP_FAILED == data) {
perror("Unable to mmap file 'test.mm'");
close(fd);
exit(1);
}
memset(data, 0, data_length);
for (data->count = 0; data->count < 5; ++data->count) {
data->data[data->count] = test_data[data->count];
}
kill(getpid(), 9);
}
Here is a program that validates the resulting file after the previous program is dead:
#include <stdint.h>
#include <string.h>
#include <stdlib.h>
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <assert.h>
typedef struct {
char data[100];
uint16_t count;
} state_data;
const char *test_data = "test";
int main(int argc, const char *argv[]) {
int fd = open("test.mm", O_RDONLY);
if (fd < 0) {
perror("Unable to open file 'test.mm'");
exit(1);
}
size_t data_length = sizeof(state_data);
state_data *data = (state_data *)mmap(NULL, data_length, PROT_READ, MAP_SHARED|MAP_POPULATE, fd, 0);
if (MAP_FAILED == data) {
perror("Unable to mmap file 'test.mm'");
close(fd);
exit(1);
}
assert(5 == data->count);
unsigned index;
for (index = 0; index < 4; ++index) {
assert(test_data[index] == data->data[index]);
}
printf("Validated\n");
}
I found something adding to my confusion:
munmap does not affect the object that was mappedthat is, the call to munmap
does not cause the contents of the mapped region to be written
to the disk file. The updating of the disk file for a MAP_SHARED
region happens automatically by the kernel's virtual memory algorithm
as we store into the memory-mapped region.
this is excerpted from Advanced Programming in the UNIX® Environment.
from the linux manpage:
MAP_SHARED Share this mapping with all other processes that map this
object. Storing to the region is equiva-lent to writing to the
file. The file may not actually be updated until msync(2) or
munmap(2) are called.
the two seem contradictory. is APUE wrong?
I didnot find a very precise answer to your question so decided add one more:
Firstly about losing data, using write or mmap/memcpy mechanisms both writes to page cache and are synced to underlying storage in background by OS based on its page replacement settings/algo. For example linux has vm.dirty_writeback_centisecs which determines which pages are considered "old" to be flushed to disk. Now even if your process dies after the write call has succeeded, the data would not be lost as the data is already present in kernel pages which will eventually be written to storage. The only case you would lose data is if OS itself crashes (kernel panic, power off etc). The way to absolutely make sure your data has reached storage would be call fsync or msync (for mmapped regions) as the case might be.
About the system load concern, yes calling msync/fsync for each request is going to slow your throughput drastically, so do that only if you have to. Remember you are really protecting against losing data on OS crashes which I would assume is rare and probably something most could live with. One general optimization done is to issue sync at regular intervals say 1 sec to get a good balance.
Either the Linux manpage information is incorrect or Linux is horribly non-conformant. msync is not supposed to have anything to do with whether the changes are committed to the logical state of the file, or whether other processes using mmap or read to access the file see the changes; it's purely an analogue of fsync and should be treated as a no-op except for the purposes of ensuring data integrity in the event of power failure or other hardware-level failure.
According to the manpage,
The file may not actually be
updated until msync(2) or munmap() is called.
So you will need to make sure you call munmap() prior to exiting at the very least.

Resources