Thread Safety of Reading a File

So my end goal is to allow multiple threads to read the same file from start to finish. For example, if the file was 200 bytes:
Thread A 0-> 200 bytes
Thread B 0-> 200 bytes
Thread C 0-> 200 bytes
etc.
Basically have each thread read the entire file. The software is only reading that file, no writing.
so I open the file:
fd = open(filename, O_RDWR|O_SYNC, 0);
and then in each thread simply loop over the file. Because I only create one file descriptor, I also create a clone of the file descriptor in each thread using dup.
Here is a minimal example of a thread function:
void ThreadFunction(){
int file_desc= dup(fd);
uint32_t nReadBuffer[1000];
int numBytes = -1;
while (numBytes != 0) {
numBytes = read(file_desc, &nReadBuffer, sizeof(nReadBuffer));
//processing on the bytes goes here
}
}
However, I'm not sure each thread is correctly looping through the entire file; I suspect the threads are instead somehow daisy chaining through the file.
Is this approach correct? I inherited this software for a project I am working on. The file descriptor also gets used in an mmap call, so I am not entirely sure whether O_RDWR or O_SYNC matter.

As other folks have mentioned, a duplicated file descriptor won't work here: dup returns a descriptor that refers to the same open file description, so all the duplicates share a single file offset and the threads end up advancing each other's position. However, there is a thread-safe alternative, which is to use pread. pread reads a file at a given offset and doesn't change the implicit offset in the file description.
This does mean that you have to manually manage the offset in each thread, but that shouldn't be too much of a problem with your proposed function.
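A minimal sketch of what that could look like, assuming the descriptor was opened once and is shared by all threads. The names read_whole_file and thread_function are only illustrative, and the buffer size is arbitrary:

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Reads the file behind fd from start to finish using pread().
 * The offset lives in a local variable, so the descriptor's shared
 * file offset is never used or modified.
 * Returns the number of bytes read, or -1 on error. */
static long long read_whole_file(int fd)
{
    uint8_t buf[4096];
    off_t offset = 0;
    ssize_t n;

    while ((n = pread(fd, buf, sizeof buf, offset)) > 0) {
        /* processing on the n bytes in buf goes here */
        offset += n;
    }
    if (n == -1) {
        perror("pread");
        return -1;
    }
    return (long long)offset;
}

/* Illustrative thread entry point with the signature pthread_create()
 * expects. arg points to the shared descriptor; no dup() is needed. */
static void *thread_function(void *arg)
{
    int fd = *(int *)arg;
    long long total = read_whole_file(fd);

    printf("read %lld bytes\n", total);
    return NULL;
}
```

Each thread can be started with pthread_create(&tid, NULL, thread_function, &fd). Since every thread keeps its own offset, they all see the file from byte 0 to the end regardless of how they are scheduled.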

Related

clear a Pipe in C

I'm sending data from one process to another through a pipe and I want to clear the pipe after reading.
Is there a function in C that can do this ?

Yes. It's just the read function, which is a POSIX call declared in <unistd.h> (not part of the stdio library). You have to invoke it as many times as needed to be sure the pipe is empty.
As the documentation says, the read function attempts to read count bytes from an I/O channel (a pipe in your case) whose file descriptor you pass as the first argument, and places what it read into a buffer that must be large enough to hold it.
Recall that read may return fewer bytes than you requested. This is perfectly fine; it simply means there were fewer bytes available than you asked for.
Also remember that reading from a pipe blocks if there's nothing to read and the writer has not yet closed its end of the pipe, which means you won't get EOF until the other side closes its descriptor. You would therefore get stuck while attempting to read from the pipe. If you want to avoid that, I suggest the solution below, which uses the poll function to check whether there's data to read from a file descriptor:
#include <poll.h>
struct pollfd pfd;
int main(void)
{
/* your operations */
pfd.fd = pipe_fd;
pfd.events = POLLIN;
while (poll(&pfd, 1, 0) == 1)
{
/* there's available data, read it */
}
return 0;
}
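To make the loop body concrete, here is a rough sketch of a helper that discards whatever is currently buffered. The name drain_pipe and the 4096-byte buffer are my own choices, and it assumes this process is the only reader of the pipe (otherwise another reader could take the data between poll and read):

```c
#include <poll.h>
#include <unistd.h>

/* Discards whatever is currently buffered in the pipe and returns the
 * number of bytes thrown away, or -1 on a read error. It does not wait
 * for new data: poll() is called with a zero timeout, and read() is only
 * called after poll() has reported POLLIN. */
static long drain_pipe(int pipe_fd)
{
    struct pollfd pfd = { .fd = pipe_fd, .events = POLLIN };
    char buf[4096];
    long discarded = 0;

    while (poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN)) {
        ssize_t n = read(pipe_fd, buf, sizeof buf);
        if (n == -1)
            return -1;
        if (n == 0)          /* writer closed its end: EOF */
            break;
        discarded += n;
    }
    return discarded;
}
```

Checking revents matters: once the writer has closed its end, poll can return 1 because of POLLHUP alone, with nothing left to read.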

Linux select() not blocking

I'm trying to understand the difference between select() and poll() better. For this I tried to implement a simple program that opens a file as write-only, adds its file descriptor to the read set and then executes select, in the hope that the function will block until read permission is granted.
As this didn't work (and as far as I understood, this is intended behaviour) I tried to block access to the file using flock before the select() call. Still, the program did not block its execution.
My sample code is as follows:
#include <stdio.h>
#include <poll.h>
#include <sys/file.h>
#include <errno.h>
#include <sys/select.h>
int main(int argc, char **argv)
{
printf("[+] Select minimal example\n");
int max_number_fds = FOPEN_MAX;
int select_return;
int cnt_pollfds;
struct pollfd pfds_array[max_number_fds];
struct pollfd *pfds = pfds_array;
fd_set fds;
int fd_file = open("./poll_text.txt", O_WRONLY);
struct timeval tv;
tv.tv_sec = 10;
tv.tv_usec = 0;
printf("\t[+] Textfile fd: %d\n", fd_file);
//create and set fds set
FD_ZERO(&fds);
FD_SET(fd_file, &fds);
printf("[+] Locking file descriptor!\n");
if(flock(fd_file,LOCK_EX) == -1)
{
int error_nr = errno;
printf("\t[+] Errno: %d\n", error_nr);
}
printf("[+] Executing select()\n");
select_return = select(fd_file+1, &fds, NULL, NULL, &tv);
if(select_return == -1){
int error_nr = errno;
printf("[+] Select Errno: %d\n", error_nr);
}
printf("[+] Select return: %d\n", select_return);
}
Can anybody see my error in this code? Also: I first tried to execute this code with two FDs added to the read list. When trying to lock them I had to use flock(fd_file,LOCK_SH) as I cannot exclusively lock two FDs with LOCK_EX. Is there a difference on how to lock two FDs of the same file (compared to only one fd)
I'm also not sure why select will not block when a file, that is added to the Read-set is opened as Write-Only. The program can never (without a permission change) read data from the fd, so in my understanding select should block the execution, right?
As a clarification: My "problem" I want to solve is that I have to check if I'm able to replace existing select() calls with poll() (existing in terms of: i will not re-write the select() call code, but will have access to the arguments of select.). To check this, I wanted to implement a test that will force select to block its execution, so I can later check if poll will act the same way (when given similar instructions, i.e. the same FDs to check).
So my "workflow" would be: write tests for different select behaviors (i.e. block and not block), write similar tests for poll (also block, not block) and check if/how poll can be forced do exactly what select is doing.
Thank you for any hints!

When select tells you that a file descriptor is ready for reading, this doesn't necessarily mean that you can read data. It only means that a read call will not block. A read call will also not block when it returns an EOF or error condition.
In your case I expect that read will immediately return -1 and set errno to EBADF (fd is not a valid file descriptor or is not open for reading) or maybe EINVAL (fd is attached to an object which is unsuitable for reading...)
Edit: Additional information as requested in a comment:
A file can be in a blocking state if a physical operation is needed that will take some time, e.g. if the read buffer is empty and (new) data has to be read from the disk, if the file is connected to a terminal and the user has not yet entered any (more) data or if the file is a socket or a pipe and a read would have to wait for (new) data to arrive...
The same applies for write: If the send buffer is full, a write will block. If the remaining space in the send buffer is smaller than your amount of data, it may write only the part that currently fits into the buffer.
If you set a file to non-blocking mode, a read or write will not block but tell you that it would block.
If you want a blocking situation for testing purposes, you need control over the process or hardware that provides or consumes the data. I suggest reading from a terminal (stdin) without entering any data, or from a pipe whose writing process does not write any data. You can also fill the write buffer of a pipe whose reading process does not read from it.
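As a starting point for such a test, here is a small sketch with two helpers that wait for readability on one descriptor, one using select and one using poll. The helper names and the millisecond timeout parameter are assumptions of mine, not anything from your existing code:

```c
#include <poll.h>
#include <sys/select.h>
#include <unistd.h>

/* Waits up to timeout_ms for fd to become readable using select().
 * Returns the number of ready descriptors (1), 0 on timeout, -1 on error. */
static int select_readable(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv = {
        .tv_sec = timeout_ms / 1000,
        .tv_usec = (timeout_ms % 1000) * 1000,
    };

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    return select(fd + 1, &rfds, NULL, NULL, &tv);
}

/* Same wait expressed with poll(). Returns 1 if fd has an event,
 * 0 on timeout, -1 on error. */
static int poll_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    return poll(&pfd, 1, timeout_ms);
}
```

With the read end of an empty pipe whose write end is still open, both calls should wait for the full timeout and return 0; after a byte is written to the pipe, both should return 1 immediately. On a regular file, both return immediately whatever the lock state.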

How much information is actually stored in a file descriptor?

This may sound like an odd question, but when I go and open a file:
int fd;
fd = open("/dev/somedevice", O_RDWR);
What exactly am I getting back? I can see the man page says:
The open() function shall return a file descriptor for the named file that is the lowest file descriptor not currently open for that process
But is that it? Is it just an int or is there data attached to it behind the scenes? The reason I'm asking is I found some code (Linux/C) where we're opening the file from user space:
//User space code:
int fdC;
if ((fdC = open(DEVICE, O_RDWR)) < 0) {
printf("Error opening device %s (%s)\n", DEVICE, strerror(errno));
goto error_exit;
}
while (!fQuit) {
if ((nRet = read(fdC, &rx_message, 1)) > 0) {
then on the kernel end, the file operations for this module (which supplies the fd) map reads to the n_read() function:
struct file_operations can_fops = {
owner: THIS_MODULE,
lseek: NULL,
read: n_read,
Then the file descriptor is used in the n_read(), but it's being accessed to get data:
int n_read(struct file *file, char *buffer, size_t count, loff_t *loff)
{
data_t * dev;
dev = (data_t*)file->private_data;
So... I figure what's happening here is either:
A) a file descriptor returned from open() contains more data than just a descriptive integer value
Or
B)The mapping between a call to "read" in the user space isn't as simple as I'm making it out to be and there's some code missing in this equation.
Any input that might help direct me?

A file descriptor is just an int. The kernel uses it as an index into a per-process table containing all the related information, including file position, file ops (kernel functions that provide the read(), write(), mmap() etc. syscalls), and so on.
When you open() a file or device, the kernel creates a new file descriptor entry for your process, and populates the internal data, including the file ops.
When you use read(), write(), mmap(), etc. with a valid file descriptor, the kernel simply looks up the correct in-kernel function to call based on the file ops in the file descriptor table it has (and which the file descriptor indexes). It really is that simple.
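To illustrate the idea, here is a user-space toy model of that lookup. It is not kernel code: every name in it (toy_file, toy_fd_table, toy_open, toy_read and so on) is invented for this sketch, and the real kernel structures are considerably more involved.

```c
#include <stddef.h>
#include <string.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_file;

struct toy_file_ops {
    long (*read)(struct toy_file *f, char *buf, size_t count);
};

struct toy_file {
    const struct toy_file_ops *ops; /* which "driver" handles this file */
    void *private_data;             /* driver-specific state */
    long pos;                       /* file position */
};

#define TOY_MAX_FDS 16
static struct toy_file *toy_fd_table[TOY_MAX_FDS]; /* indexed by descriptor */

/* "Driver" read function: serves bytes from a string in private_data. */
static long toy_string_read(struct toy_file *f, char *buf, size_t count)
{
    const char *data = f->private_data;
    size_t left = strlen(data) - (size_t)f->pos;

    if (count > left)
        count = left;
    memcpy(buf, data + f->pos, count);
    f->pos += (long)count;
    return (long)count;
}

static const struct toy_file_ops toy_string_ops = { .read = toy_string_read };

/* Like open(): fills the lowest free slot and returns its index. */
static int toy_open(struct toy_file *f, const char *data)
{
    for (int fd = 0; fd < TOY_MAX_FDS; fd++) {
        if (toy_fd_table[fd] == NULL) {
            f->ops = &toy_string_ops;
            f->private_data = (void *)data;
            f->pos = 0;
            toy_fd_table[fd] = f;
            return fd;
        }
    }
    return -1;
}

/* Like read(): the int is only used to find the entry and dispatch. */
static long toy_read(int fd, char *buf, size_t count)
{
    if (fd < 0 || fd >= TOY_MAX_FDS || toy_fd_table[fd] == NULL)
        return -1;
    struct toy_file *f = toy_fd_table[fd];
    return f->ops->read(f, buf, count);
}
```

The caller only ever holds the integer; the position, the driver state and the function that services the read all live in the table entry, which is roughly the role file->private_data plays in the n_read() from the question.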

In addition to the existing good answer by @Nominal Aminal: it is an integer, but it refers to an entry in a kernel structure called the file descriptor table. That is at least the case with Linux. Each entry in that table is, in essence, a pointer to the kernel's object for the open file:
struct file *file; // the open file: position, flags, file ops, reference count etc.
This kernel struct file is not the same thing as the C library's FILE type, which is a user-space stream wrapped around a descriptor. You might be interested in the following APIs which, given one of FILE * or a descriptor, return the other:
How to obtain FILE * from fd and vice versa

I think that it is just an int.
From Wikipedia:
Generally, a file descriptor is an index for an entry in a kernel-resident data structure containing the details of all open files. In POSIX this data structure is called a file descriptor table, and each process has its own file descriptor table.

file descriptor polling

I have created the following program, in which I wish to poll on the file descriptor of the file that I am opening in the program.
#define FILE "help"
int main()
{
int ret1;
struct pollfd fds[1];
ret1 = open(FILE, O_CREAT);
fds[0].fd = ret1;
fds[0].events = POLLIN;
while(1)
{
poll(fds,1,-1);
if (fds[0].revents & POLLIN)
printf("POLLING");
}
return 0;
}
It goes into an infinite loop. I expected the loop body to run only when some operation happens to the file. (It's an ASCII file.)
Please help.

poll() doesn't do anything useful on regular files. Since a read() on a regular file never blocks (at end of file it simply returns 0), poll() will always report that the file can be read without blocking.
This would (almost) work on character devices*, named pipes** or sockets, though, since those block when you read() from them when there is no data available. (you also need to actually read that data then, or else poll will tell again and again that data is available)
To "poll" a growing/shrinking file, see man inotify or implement your own polling using fstat() in a loop.
* block devices are a story apart; while technically a read from a harddisk can block for 10 ms or longer, this is not perceived as blocking I/O in linux.
** see also how to flush a named pipe using bash
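For the fstat() variant, a rough sketch could look like the following. The function name, the interval and the check limit are all illustrative choices of mine, and it only watches the file size (a rewrite that keeps the size identical would go unnoticed; comparing st_mtime as well would catch more cases):

```c
#include <poll.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Checks the size of fd every interval_ms, at most max_checks times.
 * Returns 1 and stores the new size in *size as soon as it differs from
 * the value passed in, 0 if it never changed, -1 if fstat() fails. */
static int wait_for_size_change(int fd, off_t *size,
                                int interval_ms, int max_checks)
{
    struct stat st;

    for (int i = 0; i < max_checks; i++) {
        if (fstat(fd, &st) == -1) {
            perror("fstat");
            return -1;
        }
        if (st.st_size != *size) {
            *size = st.st_size;
            return 1;
        }
        poll(NULL, 0, interval_ms);  /* sleep without busy-waiting */
    }
    return 0;
}
```

inotify avoids the periodic wake-ups entirely, so this is mainly worth considering where inotify is not available.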

No idea if this is the cause of your problems (probably not), but it is a particularly bad idea to redefine the standard macro FILE.
Didn't your compiler complain about this?

Does Linux's splice(2) work when splicing from a TCP socket?

I've been writing a little program for fun that transfers files over TCP in C on Linux. The program reads a file from a socket and writes it to file (or vice versa). I originally used read/write and the program worked correctly, but then I learned about splice and wanted to give it a try.
The code I wrote with splice works perfectly when reading from stdin (redirected file) and writing to the TCP socket, but fails immediately with splice setting errno to EINVAL when reading from socket and writing to stdout. The man page states that EINVAL is set when neither descriptor is a pipe (not the case), an offset is passed for a stream that can't seek (no offsets passed), or the filesystem doesn't support splicing, which leads me to my question: does this mean that TCP can splice from a pipe, but not to?
I'm including the code below (minus error handling code) in the hopes that I've just done something wrong. It's based heavily on the Wikipedia example for splice.
static void splice_all(int from, int to, long long bytes)
{
long long bytes_remaining;
long result;
bytes_remaining = bytes;
while (bytes_remaining > 0) {
result = splice(
from, NULL,
to, NULL,
bytes_remaining,
SPLICE_F_MOVE | SPLICE_F_MORE
);
if (result == -1)
die("splice_all: splice");
bytes_remaining -= result;
}
}
static void transfer(int from, int to, long long bytes)
{
int result;
int pipes[2];
result = pipe(pipes);
if (result == -1)
die("transfer: pipe");
splice_all(from, pipes[1], bytes);
splice_all(pipes[0], to, bytes);
close(from);
close(pipes[1]);
close(pipes[0]);
close(to);
}
On a side note, I think that the above will block on the first splice_all when the file is large enough due to the pipe filling up(?), so I also have a version of the code that forks to read and write from the pipe at the same time, but it has the same error as this version and is harder to read.
EDIT: My kernel version is 2.6.22.18-co-0.7.3 (running coLinux on XP.)

What kernel version is this? Linux has had support for splicing from a TCP socket since 2.6.25 (commit 9c55e01c0), so if you're using an earlier version, you're out of luck.

You need to alternate: after each single splice from the descriptor "from" into pipes[1], call splice_all from pipes[0] into the descriptor "to", for exactly the number of bytes that last single splice returned. Reason: the pipe is a finite kernel memory buffer, so if bytes is larger than its capacity, you'll block forever in your splice_all(from, pipes[1], bytes).
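Roughly, the interleaved version could be sketched like this. It is Linux-only and untested against your setup; transfer_interleaved and CHUNK are names I made up, and CHUNK is set to 64 KiB on the assumption that the pipe has the usual default capacity:

```c
#define _GNU_SOURCE            /* splice() is Linux-specific */
#include <fcntl.h>
#include <unistd.h>

#define CHUNK 65536            /* assumed default pipe capacity */

/* Moves up to `bytes` bytes from `from` to `to` through a pipe, draining
 * the pipe after every fill so it can never stay full.
 * Returns the number of bytes moved, or -1 on error. */
static long long transfer_interleaved(int from, int to, long long bytes)
{
    const unsigned int flags = SPLICE_F_MOVE | SPLICE_F_MORE;
    int pipes[2];
    long long moved = 0;
    int failed = 0;

    if (pipe(pipes) == -1)
        return -1;

    while (!failed && moved < bytes) {
        long long remaining = bytes - moved;
        size_t want = remaining > CHUNK ? CHUNK : (size_t)remaining;

        /* One fill: source -> pipe. */
        ssize_t in = splice(from, NULL, pipes[1], NULL, want, flags);
        if (in == -1) {
            failed = 1;
            break;
        }
        if (in == 0)           /* end of input */
            break;

        /* Drain exactly what was just queued: pipe -> destination. */
        for (ssize_t left = in; left > 0; ) {
            ssize_t out = splice(pipes[0], NULL, to, NULL, (size_t)left, flags);
            if (out <= 0) {
                failed = 1;
                break;
            }
            left -= out;
        }
        if (!failed)
            moved += in;
    }

    close(pipes[0]);
    close(pipes[1]);
    return failed ? -1 : moved;
}
```

Unlike the transfer() in the question, this leaves closing from and to to the caller. It does not help with the EINVAL itself, which depends on the kernel version as the other answer explains.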
