Read file stored in segment and send bytes - c

From process A, I stored a file in a structure in a shared memory segment. Now, from process B, I need to get this file and convert it to bytes so I can send it (or send it while reading its bytes). What would be an ideal way of doing this? See below:
typedef struct mysegment_struct_t {
    FILE *stream;
    size_t size;
} mysegment_struct_t;
So I have the mapping to the segment and all; I'm just not sure how to get the file now:
size_t bytes_sent = 0;
struct mysegment_struct_t *fileinfo =
    (struct mysegment_struct_t *)mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
// read stream into a byte array? (how can this be done in C)
// FILE *f = fopen(fileinfo->stream, "w+b"); // I'm a bit lost here, the file is in the segment already
// send bytes
while (bytes_sent < fileinfo->size) {
    bytes_sent += send_to_client(buffer, size); // some buffer containing bytes?
}
I am kind of new to C programming, but I can't find anything like "read the file in memory into a byte array", for example.
Thanks
from blog https://www.softprayog.in/programming/interprocess-communication-using-posix-shared-memory-in-linux
There has to be a way I can share the file between processes using shared memory.

You simply can't do this. The pointer stream points to objects that only exist in the memory of process A, and are not in the shared memory area (and even if they were, they wouldn't typically be mapped at the same address). You're going to have to design something else.
One possibility is to send the file descriptor over a Unix domain socket, see Portable way to pass file descriptor between different processes. However, it is probably worth stepping back and thinking about why you want to pass an open file between processes in the first place, and whether there is a better way to achieve your overall goal.
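As a rough illustration of the file descriptor approach, here is a minimal sketch that passes an open descriptor over a Unix domain socket using SCM_RIGHTS ancillary data. The function names send_fd and recv_fd are made up for this example, and it assumes the two processes are already connected by an AF_UNIX socket (for instance from socketpair() before a fork()).

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send the open descriptor fd_to_send over the
 * Unix domain socket sock. Returns 0 on success, -1 on error. */
int send_fd(int sock, int fd_to_send)
{
    char byte = 'F';   /* one byte of ordinary data travels with the descriptor */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;   /* forces suitable alignment of buf */
    } ctrl;
    struct msghdr msg;

    assert(sock >= 0 && fd_to_send >= 0);
    memset(&ctrl, 0, sizeof ctrl);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;   /* "this control message carries descriptors" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive a descriptor sent by send_fd().
 * Returns the new descriptor, or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    struct msghdr msg;
    int fd = -1;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets its own descriptor number that refers to the same open file, so process B can read() from it directly. If only the file's contents matter, copying the bytes themselves into the shared segment may be simpler.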

Related

Passing a struct through a pipe in C

Consider the following struct:
struct msg {
    int id;
    int size;
    double *data;
};
Now, this struct is to be used to communicate through a pipe between a Producer process and a Consumer process.
As it is, it won't work because of the data pointer, so it must be changed to hold actual data (not a pointer to data). But the complication arises from the fact that the Producer must be able to send ANY amount of data (and the receiver must work accordingly).
Can anyone please point me to a solution?
Specifically:
What is the best solution for defining the data structures?
Is a union with a char* c_data (passing it to write) the way to go?
How should read be implemented to account for the size?
Thank you very much for your feedback.
There unfortunately is no native way of sending arbitrary objects through a pipe. However, you can achieve what you want pretty easily by sending raw data with the help of fread() and fwrite() as a very simple way of serializing the data in binary form.
Please keep in mind that in order for the following to work, both the producer and the consumer programs need to be compiled on the same machine, using the same data structure definitions and possibly the same compiler flags.
Here's a simple solution:
Create a common definition of a header structure to be used by both the producer and the receiver:
struct msg_header {
int id;
int size;
};
This will hold information about the real data. I would suggest using size_t to store the size, as it is unsigned and more suitable for this purpose.
In the producer, prepare the data to be sent along with the correct header, for example:
struct msg_header header = {.id = 0, .size = 4};
double data[] = {1.23, 2.34, 3.45, 4.56};
It obviously doesn't need to be declared like this; it could even be dynamically sized and allocated through malloc(). The important thing is that you know the size.
Still in the producer, send the header followed by the data through the pipe:
// Use fdopen() if you don't already have a FILE*, otherwise skip this line.
FILE *pipe = fdopen(pipe_fd, "w");
// Send the header through the pipe.
fwrite(&header, sizeof(header), 1, pipe);
// Send the data through the pipe.
fwrite(data, sizeof(*data), header.size, pipe);
In the consumer, read the header and then use the .size value to read the correct amount of data:
// Use fdopen() if you don't already have a FILE*, otherwise skip this line.
FILE *pipe = fdopen(pipe_fd, "r");
struct msg_header header;
double *data;
// Read the header from the pipe.
fread(&header, sizeof(header), 1, pipe);
// Allocate the memory needed to hold the data.
data = malloc(sizeof(*data) * header.size);
// Read the data from the pipe.
fread(data, sizeof(*data), header.size, pipe);
Keep in mind that you have to check for errors after each of the above function calls. I did not add error checking in my examples just to make the code simpler. Refer to the manual pages for more information.
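To make the error checking concrete, here is one possible sketch of the same header-then-data protocol using the raw read() and write() calls on the pipe descriptors, with loops that cope with short reads and writes. The helper names (write_all, read_all, send_msg, recv_msg) are invented for this example, and the header uses size_t for the element count as suggested above.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Same idea as the header above, with size_t for the element count. */
struct msg_header {
    int id;
    size_t size;   /* number of doubles that follow the header */
};

/* Write exactly len bytes, retrying after short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; fails on error or premature end of file. */
static int read_all(int fd, void *buf, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;   /* writer closed the pipe too early */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Producer side: send the header, then count doubles. Returns 0 or -1. */
int send_msg(int fd, int id, const double *data, size_t count)
{
    struct msg_header header;

    assert(data != NULL || count == 0);
    memset(&header, 0, sizeof header);   /* also clears padding bytes */
    header.id = id;
    header.size = count;
    if (write_all(fd, &header, sizeof header) != 0)
        return -1;
    return write_all(fd, data, count * sizeof *data);
}

/* Consumer side: fill *header and return a malloc()ed array of
 * header->size doubles (caller frees it), or NULL on error. */
double *recv_msg(int fd, struct msg_header *header)
{
    if (read_all(fd, header, sizeof *header) != 0)
        return NULL;
    size_t bytes = header->size * sizeof(double);
    double *data = malloc(bytes > 0 ? bytes : 1);
    if (data == NULL)
        return NULL;
    if (read_all(fd, data, bytes) != 0) {
        free(data);
        return NULL;
    }
    return data;
}
```

The same caveat applies as before: this sends the in-memory representation, so both ends need to agree on the struct layout and on the format of double.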

Thread Safety of Reading a File

So my end goal is to allow multiple threads to read the same file from start to finish. For example, if the file was 200 bytes:
Thread A 0-> 200 bytes
Thread B 0-> 200 bytes
Thread C 0-> 200 bytes
etc.
Basically have each thread read the entire file. The software is only reading that file, no writing.
so I open the file:
fd = open(filename, O_RDWR|O_SYNC, 0);
and then in each thread I simply loop through the file. Because I only create one file descriptor, I also create a clone of the file descriptor in each thread using dup.
Here is a minimal example of a thread function:
void ThreadFunction(){
int file_desc= dup(fd);
uint32_t nReadBuffer[1000];
int numBytes = -1;
while (numBytes != 0) {
numBytes = read(file_desc, &nReadBuffer, sizeof(nReadBuffer));
//processing on the bytes goes here
}
}
However, I'm not sure each thread is correctly looping through the entire file; the threads may instead be somehow daisy-chaining through the file.
Is this approach correct? I inherited this software for a project I am working on. The file descriptor also gets used in an mmap call, so I am not entirely sure whether O_RDWR or O_SYNC matter.
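As other folks have mentioned, it isn't possible to use a duplicated file descriptor here: descriptors created with dup() refer to the same open file description and therefore share a single file offset, so each thread would only see part of the file. However, there is a thread-safe alternative, which is to use pread. pread reads from a file at a given offset and doesn't change the offset stored in the open file description.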
This does mean that you have to manually manage the offset in each thread, but that shouldn't be too much of a problem with your proposed function.
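A minimal sketch of what such a thread body might look like, assuming one descriptor shared by all threads. The function name read_entire_file and the byte-sum "processing" are placeholders for this example only.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the whole file behind fd with pread(), keeping the offset in a
 * local variable so nothing is shared between threads. Stores a simple
 * byte sum in *checksum and returns the number of bytes read, or -1. */
off_t read_entire_file(int fd, unsigned long *checksum)
{
    unsigned char buf[4096];
    off_t offset = 0;   /* private to the calling thread */
    unsigned long sum = 0;

    assert(checksum != NULL);
    for (;;) {
        ssize_t n = pread(fd, buf, sizeof buf, offset);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;   /* end of file */
        for (ssize_t i = 0; i < n; i++)
            sum += buf[i];   /* stand-in for the real processing */
        offset += n;
    }
    *checksum = sum;
    return offset;
}
```

Because pread() never moves the descriptor's own offset, every thread that calls this function on the same fd sees the file from byte 0 to the end, and no dup() is needed.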

How to write data into an offset which is not 512*n bytes using linux native AIO?

I'm writing an app like a BitTorrent client that downloads a file from the network and writes it to a local file. The data arrives in pieces, and I write each piece to the file as it comes in.
For example, when downloading a 1 GB file, I may get: offset 100, 312 bytes of data; offset 1000000, 12345 bytes; offset 4000000, 888 bytes.
I'm using Linux native AIO (io_setup, io_submit, io_getevents), and I found this:
When using linux kernel AIO, files are required to be opened in O_DIRECT mode. This introduces further requirements of all read and write operations to have their file offset, memory buffer and size be aligned to 512 bytes.
So how can I write data into some offset which is not 512 aligned?
For example, first I write 4 bytes to a file, so I have to do something like this:
fd = open("a.txt", O_CREAT | O_RDWR | O_DIRECT, 0666);
struct iocb cb;
char data[512] = "asdf";
cb.aio_buf = ALIGN(data, 512);
cb.aio_offset = 512;
cb.aio_nbytes = 512;
Then I would like to append data after asdf:
struct iocb cb2;
char data2[512] = "ghij";
cb2.aio_buf = ALIGN(data2, 512);
cb2.aio_offset = 5;
cb2.aio_nbytes = 512;
The write fails with this error:
Invalid argument (-22)
So how can I do it?
You have to do what the driver would do if you weren't using O_DIRECT. That is, read the whole block, update the portion you want, and write it back. Block devices simply don't allow smaller accesses.
Doing it yourself can be more efficient (for example, you can update a number of disconnected sequences in the same block for the cost of one read and write). However, since you aren't letting the driver do the work you also aren't getting any atomicity guarantees across the read-modify-write operation.
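The read-modify-write idea can be sketched as follows. This is only an illustration: it uses synchronous pread()/pwrite() rather than io_submit() to stay short, the name write_unaligned and the 512-byte BLOCK_SIZE are assumptions for the example, and the real alignment requirement depends on the device and filesystem. With native AIO, the same aligned buffer, offset and length would go into the iocb instead.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 512   /* assumed alignment for this example */

/* Write len bytes at an arbitrary offset by reading, patching and
 * rewriting every aligned block the range touches. Returns 0 or -1. */
int write_unaligned(int fd, off_t offset, const void *data, size_t len)
{
    off_t start = offset - (offset % BLOCK_SIZE);              /* round down */
    off_t end = offset + (off_t)len;
    end = (end + BLOCK_SIZE - 1) / BLOCK_SIZE * BLOCK_SIZE;    /* round up */
    size_t span = (size_t)(end - start);
    void *buf = NULL;

    assert(data != NULL && offset >= 0);
    if (posix_memalign(&buf, BLOCK_SIZE, span) != 0)
        return -1;
    memset(buf, 0, span);   /* blocks past the end of the file count as zeros */

    /* A short read is fine here: the rest of the buffer stays zeroed. */
    if (pread(fd, buf, span, start) < 0) {
        free(buf);
        return -1;
    }
    memcpy((char *)buf + (offset - start), data, len);   /* patch the new bytes in */

    ssize_t written = pwrite(fd, buf, span, start);
    free(buf);
    return written == (ssize_t)span ? 0 : -1;
}
```

Note that this pads the file out to a multiple of the block size, so a downloader would probably want to ftruncate() the file to its true length once all pieces have arrived. As the answer says, nothing here makes the read-modify-write atomic, so two writers touching the same block need their own coordination.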
You don't. The Linux AIO API is not useful, especially not for what you're trying to do. It was added for the sake of Oracle, who wanted to bypass the kernel's filesystems and block device buffer layer for Reasons™. It does not have anything to do with POSIX AIO or other things reasonable people mean when they talk about "AIO".

Is it possible to write directly from file to the socket?

I'm interested in the basic principles of Web-servers, like Apache or Nginx, so now I'm developing my own server.
When my server gets a request, it searches for the file (e.g. index.html). If the file exists, the server reads all of its content into a buffer (content) and then writes it to the socket. Here's the simplified code:
int return_file(char* content, char* fullPath) {
file = open(fullPath, O_RDONLY);
if (file > 0) { // File was found, OK
while ((nread = read(file, content, 2048)) > 0) {}
close(file);
return 200;
}
}
The question is pretty simple: is it possible to avoid using buffer and write file content directly to the socket?
Thanks for any tips :)
There is no standardized system call which can write directly from a file to a socket.
However, some operating systems do provide such a call. For example, both FreeBSD and Linux implement a system call called sendfile, but the precise details differ between the two systems. (In both cases, you need the underlying file descriptor for the file, not the FILE* pointer, although on both these platforms you can use fileno() to extract the fd from the FILE*.)
For more information:
FreeBSD sendfile()
Linux sendfile()
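For the Linux variant, a minimal sketch might look like the following. The wrapper name send_whole_file is made up for this example, and the code assumes a blocking socket and a regular file; the FreeBSD call takes different arguments and is not covered here.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux-specific: send the whole regular file open on file_fd to
 * sock_fd without copying it through a user-space buffer.
 * Returns 0 on success, -1 on error. */
int send_whole_file(int sock_fd, int file_fd)
{
    struct stat st;
    off_t offset = 0;   /* sendfile() advances this, not the file position */

    assert(sock_fd >= 0 && file_fd >= 0);
    if (fstat(file_fd, &st) != 0)
        return -1;

    while (offset < st.st_size) {
        ssize_t n = sendfile(sock_fd, file_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            break;   /* file shrank while we were sending it */
    }
    return offset == st.st_size ? 0 : -1;
}
```

A single sendfile() call may transfer fewer bytes than requested, which is why the call sits in a loop. A real server would still write the HTTP headers with an ordinary write() before handing the body to sendfile().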
What you can do is write each "chunk" you read to the client immediately.
In order to write the content, you MUST read it, so you can't avoid that. But you can use a smaller buffer and write the contents as you read them, eliminating the need to read the whole file into memory.
For instance, you could do this:
unsigned char byte;
// FIXME: store the return value to allow
// choosing the right action on error.
//
// Note that `0' is not really an error.
while (read(file, &byte, 1) > 0) {
if (write(client, &byte, 1) <= 0) {
// Handle error.
}
}
But then, unsigned char byte; could be unsigned char byte[A_REASONABLE_BUFFER_SIZE]; (passing the buffer and the number of bytes actually read to write()), which would be better, and you still don't need to store ALL the content in memory.
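The buffered version of that loop could look roughly like this. It is a sketch only: copy_fd and CHUNK_SIZE are names chosen for this example, and the buffer size is arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

#define CHUNK_SIZE 4096   /* arbitrary "reasonable" buffer size */

/* Copy everything from in_fd to out_fd through one small buffer.
 * Returns the number of bytes copied, or -1 on error. */
long copy_fd(int in_fd, int out_fd)
{
    unsigned char buf[CHUNK_SIZE];
    long total = 0;

    assert(in_fd >= 0 && out_fd >= 0);
    for (;;) {
        ssize_t nread = read(in_fd, buf, sizeof buf);
        if (nread < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (nread == 0)
            return total;   /* end of file: not an error */

        /* write() may accept only part of the chunk, so loop until done. */
        ssize_t off = 0;
        while (off < nread) {
            ssize_t nwritten = write(out_fd, buf + off, (size_t)(nread - off));
            if (nwritten < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += nwritten;
        }
        total += nread;
    }
}
```

In the server from the question, in_fd would be the opened file and out_fd the client socket, so memory use stays at CHUNK_SIZE bytes regardless of the file size.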
No, it is not. There must be intermediate storage that you use for reading and writing the data.
There is one edge case: when you use memory-mapped files, the mapped region of the file can be passed directly to the socket write. Internally, though, the system would still read the file into a memory buffer.

How much information is actually stored in a file descriptor?

This may sound like an odd question, but when I go and open a file:
int fd;
fd = open("/dev/somedevice", O_RDWR);
What exactly am I getting back? I can see the man page says:
The open() function shall return a file descriptor for the named file that is the lowest file descriptor not currently open for that process
But is that it? Is it just an int or is there data attached to it behind the scenes? The reason I'm asking is I found some code (Linux/C) where we're opening the file from user space:
//User space code:
int fdC;
if ((fdC = open(DEVICE, O_RDWR)) < 0) {
printf("Error opening device %s (%s)\n", DEVICE, strerror(errno));
goto error_exit;
}
while (!fQuit) {
if ((nRet = read(fdC, &rx_message, 1)) > 0) {
then on the kernel end, the file operations for this module (which supplies the fd) map reads to the n_read() function:
struct file_operations can_fops = {
owner: THIS_MODULE,
lseek: NULL,
read: n_read,
Then the file descriptor is used in the n_read(), but it's being accessed to get data:
int n_read(struct file *file, char *buffer, size_t count, loff_t *loff)
{
data_t * dev;
dev = (data_t*)file->private_data;
So... I figure what's happening here is either:
A) a file descriptor returned from open() contains more data than just a descriptive integer value
Or
B) The mapping between a call to "read" in the user space isn't as simple as I'm making it out to be and there's some code missing in this equation.
Any input that might help direct me?
A file descriptor is just an int. The kernel uses it as an index to a table containing all the related information, including file position, file ops (kernel functions that provide the read(), write(), mmap() etc. syscalls), and so on.
When you open() a file or device, the kernel creates a new file descriptor entry for your process, and populates the internal data, including the file ops.
When you use read(), write(), mmap(), etc. with a valid file descriptor, the kernel simply looks up the correct in-kernel function to call based on the file ops in the file descriptor table it has (and which the file descriptor indexes). It really is that simple.
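One way to observe from user space that the state lives in the kernel rather than in the int is to compare descriptors that share a table entry with ones that do not. The helper below is a hypothetical example (offset_after_read is not a standard function): it reads through one descriptor and reports the file position seen through another.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Read n bytes through fd_a, then return the current file offset as
 * seen through fd_b, or (off_t)-1 on error. */
off_t offset_after_read(int fd_a, int fd_b, size_t n)
{
    char buf[64];

    assert(n <= sizeof buf);
    if (read(fd_a, buf, n) < 0)
        return (off_t)-1;
    return lseek(fd_b, 0, SEEK_CUR);   /* query fd_b's position without moving it */
}
```

If fd_b was created with dup(fd_a), the two integers differ but the reported offset moves with every read on fd_a, because both refer to the same kernel entry. If fd_b comes from a separate open() of the same path, it has its own entry and its offset stays where it was.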
In addition to the existing good answer by @Nominal Aminal: it is an integer, but it indexes an entry in a kernel structure called the file descriptor table. That is at least the case with Linux. Of the several fields kept there, an interesting one is the pointer to the kernel's open-file object (a struct file, which is distinct from the C library's FILE type):
struct file *file; // the open file the descriptor refers to, including reference counts etc.
On the user-space side, you might be interested in the following APIs which, given either a FILE * or a descriptor, return the other:
How to obtain FILE * from fd and vice versa
I think that it is just an int.
From Wikipedia:
Generally, a file descriptor is an index for an entry in a kernel-resident data structure containing the details of all open files. In POSIX this data structure is called a file descriptor table, and each process has its own file descriptor table.

Resources