Is it possible to write directly from file to the socket? - c

I'm interested in the basic principles of Web-servers, like Apache or Nginx, so now I'm developing my own server.
When my server gets a request, it searches for a file (e.g. index.html); if the file exists, it reads all the content into the buffer (content) and then writes it to the socket. Here's simplified code:
int return_file(char* content, char* fullPath) {
    int file = open(fullPath, O_RDONLY);
    ssize_t nread;
    if (file >= 0) { // File was found, OK (0 is a valid descriptor too)
        while ((nread = read(file, content, 2048)) > 0) {}
        close(file);
        return 200;
    }
    return 404; // File was not found
}
The question is pretty simple: is it possible to avoid using buffer and write file content directly to the socket?
Thanks for any tips :)

There is no standardized system call which can write directly from a file to a socket.
However, some operating systems do provide such a call. For example, both FreeBSD and Linux implement a system call called sendfile, but the precise details differ between the two systems. (In both cases, you need the underlying file descriptor for the file, not the FILE* pointer, although on both these platforms you can use fileno() to extract the fd from the FILE*.)
For more information:
FreeBSD sendfile()
Linux sendfile()
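As a minimal sketch of the Linux variant only (the FreeBSD call takes different arguments), the function below sends a whole regular file to an already-open descriptor. The name send_file_by_path is hypothetical, and the sketch assumes out_fd is a blocking socket or pipe.

```c
#include <sys/sendfile.h>   /* Linux-specific header */
#include <sys/stat.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: send the regular file at `path` to `out_fd`
 * without copying it through a user-space buffer.
 * Returns 0 on success, -1 on failure. Assumes out_fd is blocking. */
int send_file_by_path(int out_fd, const char *path)
{
    struct stat st;
    off_t offset = 0;
    int result = 0;
    int in_fd = open(path, O_RDONLY);

    if (in_fd == -1)
        return -1;
    if (fstat(in_fd, &st) == -1) {
        close(in_fd);
        return -1;
    }

    while (offset < st.st_size) {
        /* sendfile() advances `offset` by the number of bytes it sent. */
        ssize_t n = sendfile(out_fd, in_fd, &offset,
                             (size_t)(st.st_size - offset));
        if (n > 0)
            continue;
        if (n == -1 && errno == EINTR)
            continue;
        result = -1;   /* real error, or the file shrank while sending */
        break;
    }

    close(in_fd);
    return result;
}
```

The loop is there because sendfile() may transfer fewer bytes than requested in one call.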

What you can do is write the "chunk" you read immediately to the client.
In order to write the content, you MUST read it, so you can't avoid that, but you can use a smaller buffer, and write the contents as you read them eliminating the need to read the whole file into memory.
For instance, you could
unsigned char byte;
// FIXME: store the return value to allow
// choosing the right action on error.
//
// Note that `0' is not really an error.
while (read(file, &byte, 1) > 0) {
if (write(client, &byte, 1) <= 0) {
// Handle error.
}
}
But then unsigned char byte; could become unsigned char byte[A_REASONABLE_BUFFER_SIZE];, which would be better: you read a chunk, write exactly the number of bytes that read() returned, and still don't need to store ALL the content in memory.
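A sketch of that buffered version might look like the following. The name copy_fd and the 4096-byte buffer size are illustrative assumptions; the inner loop exists because write() may accept fewer bytes than it was given.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: copy everything from in_fd to out_fd through a
 * small fixed buffer. Returns 0 on success, -1 on error (errno is set). */
int copy_fd(int in_fd, int out_fd)
{
    char buf[4096];   /* illustrative size, not a tuned value */
    ssize_t nread;

    while ((nread = read(in_fd, buf, sizeof buf)) != 0) {
        ssize_t off = 0;

        if (nread == -1) {
            if (errno == EINTR)
                continue;      /* interrupted by a signal: retry */
            return -1;
        }
        /* Write out the whole chunk, however many calls it takes. */
        while (off < nread) {
            ssize_t nw = write(out_fd, buf + off, (size_t)(nread - off));
            if (nw == -1) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += nw;
        }
    }
    return 0;   /* read() returned 0: end of file */
}
```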

No, it is not. There must be an intermediate storage that you use for reading/writing the data.
There is one edge case: when you use memory-mapped files, the mapped region of the file can be passed directly to the socket write call. Internally, though, the system still has to read the file into a memory buffer.

Related

Read file stored in segment and send bytes

I have a file that process A stored in a structure in a shared memory segment. Now, from process B, I need to get this file and convert it to bytes so I can send it, or send it while reading its bytes. What would be an ideal way of doing this? See below:
typedef struct mysegment_struct_t {
    FILE *stream;
    size_t size;
} mysegment_struct_t;
so I have the mapping to the segment and all just not sure how to get it now
size_t bytes_sent;
struct mysegment_struct_t *fileinfo =
(struct mysegment_struct_t *)mmap(NULL,size,PROT_READ | PROT_WRITE, MAP_SHARED, fd,0);
//read stream into a byte array? (how can this be done in c)
//FILE *f = fopen(fileinfo->stream, "w+b"); //i a bit lost here, the file is in the segment already
//send bytes
while (bytes_sent < fileinfo->size) {
bytes_sent +=send_to_client(buffer, size); //some buffer containing bytes?
}
I am kind of new to C programming but I cant find something like read the file in memory to a byte array for example.
Thanks
from blog https://www.softprayog.in/programming/interprocess-communication-using-posix-shared-memory-in-linux
there has to be a way i can share the file between processes using the shared memory.
You simply can't do this. The pointer stream points to objects that only exist in the memory of process A, and are not in the shared memory area (and even if they were, they wouldn't typically be mapped at the same address). You're going to have to design something else.
One possibility is to send the file descriptor over a Unix domain socket, see Portable way to pass file descriptor between different processes. However, it is probably worth stepping back and thinking about why you want to pass an open file between processes in the first place, and whether there is a better way to achieve your overall goal.
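For illustration, passing a descriptor over a connected Unix domain socket uses an SCM_RIGHTS control message. This is only a sketch: the names send_fd and recv_fd are hypothetical, and it assumes both processes already share a connected AF_UNIX socket.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: send descriptor `fd` over Unix domain socket `sock`.
 * Returns 0 on success, -1 on failure. */
int send_fd(int sock, int fd)
{
    char dummy = 'x';   /* one byte of ordinary data carries the message */
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {             /* union keeps the control buffer aligned */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof u);
    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive a descriptor sent by send_fd().
 * Returns the new descriptor, or -1 on failure. */
int recv_fd(int sock)
{
    char dummy;
    struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    if (recvmsg(sock, &msg, 0) <= 0)
        return -1;
    cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets its own descriptor number referring to the same open file, so process B can read() from it and send the bytes on.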

What is the best way to read input of unpredictable and indeterminate (ie no EOF) size from stdin in C?

This must be a stupid question because this should be a very common and simple problem, but I haven't been able to find an answer anywhere, so I'll bite the bullet and ask.
How on earth should I go about reading from the standard input when there is no way of determining the size of the data? Obviously if the data ends in some kind of terminator like a NUL or EOF then this is quite trivial, but my data does not. This is simple IPC: the two programs need to talk back and forth and ending the file streams with EOF would break everything.
I thought this should be fairly simple. Clearly programs talk to each other over pipes all the time without needing any arcane tricks, so I hope there is a simple answer that I'm too stupid to have thought of. Nothing I've tried has worked.
Something obvious like (ignoring necessary realloc's for brevity):
int size = 0, max = 8192;
unsigned char *buf = malloc(max);
while (fread((buf + size), 1, 1, stdin) == 1)
++size;
won't work since fread() blocks and waits for data, so this loop won't terminate. As far as I know nothing in stdio allows nonblocking input, so I didn't even try any such function. Something like this is the best I could come up with:
struct mydata {
unsigned char *data;
int slen; /* size of data */
int mlen; /* maximum allocated size */
};
...
struct mydata *buf = xmalloc(sizeof *buf);
buf->data = xmalloc((buf->mlen = 8192));
buf->slen = 0;
int nread = read(0, buf->data, 1);
if (nread == (-1))
err(1, "read error");
buf->slen += nread;
fcntl(0, F_SETFL, oflags | O_NONBLOCK);
do {
if (buf->slen >= (buf->mlen - 32))
buf->data = xrealloc(buf->data, (buf->mlen *= 2));
nread = read(0, (buf->data + buf->slen), 1);
if (nread > 0)
buf->slen += nread;
} while (nread == 1);
fcntl(0, F_SETFL, oflags);
where oflags is a global variable containing the original flags for stdin (cached at the start of the program, just in case). This dumb way of doing it works as long as all of the data is present immediately, but fails otherwise. Because this sets read() to be non-blocking, it just returns -1 if there is no data. The program communicating with mine generally sends responses whenever it feels like it, and not all at once, so if the data is at all large this exits too early and fails.
How on earth should I go about reading from the standard input when there is no way of determining the size of the data?
There always has to be a way to determine the size. Otherwise, the program would require infinite memory and would thus be impossible to run on a physical computer.
Think about it this way: even in the case of a never-ending stream of data, there must be some chunks or points where you have to process it. For instance, a live-streamed video has to decode a portion of it (e.g. a frame). Or a video game which processes messages one by one, even if the game has undetermined length.
This holds true regardless of the type of I/O you decide to use (blocking/non-blocking, synchronous/asynchronous...). For instance, if you want to use typical blocking synchronous I/O, what you have to do is process the data in a loop: each iteration, you read as much data as is available, and process as much as you can. Whatever you can not process (because you have not received enough yet), you keep for the next iteration. Then, the rest of the loop is the rest of the logic of the program.
In the end, regardless of what you do, you (or someone else, e.g. a library, the operating system, the hardware buffers...) have to buffer incoming data until it can be processed.
Basically, you have two choices -- synchronous or asynchronous -- and both have their advantages and disadvantages.
For synchronous, you need either delimiters or a length field embedded in the record (or fixed-length records, but that is pretty inflexible). This works best for synchronous protocols like synchronous RPC or simplex client-server interactions where only one side talks at a time while the other side waits. For ASCII/text based protocols, it is common to use a control-character delimiter like NL/EOL or NUL or CTX to mark the end of messages. Binary protocols more commonly use an embedded length field -- the receiver first reads the length and then reads the full amount of (expected) data.
For asynchronous, you use non-blocking mode. It IS possible to use non-blocking mode with stdio streams; it just requires some care. Out-of-data conditions show up to stdio as error conditions, so you need to use ferror and clearerr on the FILE * as appropriate.
It's possible for both to be used -- for example in client-server interactions, the clients may use synchronous (they send a request and wait for a reply) while the server uses asynchronous (to be robust in the presence of misbehaving clients).
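To make the embedded-length approach concrete, here is a sketch of a blocking reader. The wire format (a 4-byte big-endian length followed by the payload), the names read_full and read_message, and the MAX_MESSAGE_LEN cap are all assumptions chosen for illustration, not part of any protocol mentioned above.

```c
#include <arpa/inet.h>   /* htonl, ntohl */
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

#define MAX_MESSAGE_LEN (1u << 20)   /* hypothetical sanity limit */

/* Read exactly len bytes. Returns 0 on success, -1 on error or early EOF. */
static int read_full(int fd, void *buf, size_t len)
{
    char *p = buf;

    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n == -1 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read one message: a 4-byte big-endian length, then that many bytes.
 * Returns a malloc()ed payload (caller frees) or NULL on failure. */
unsigned char *read_message(int fd, uint32_t *len_out)
{
    uint32_t netlen;
    unsigned char *payload;

    if (read_full(fd, &netlen, sizeof netlen) == -1)
        return NULL;
    *len_out = ntohl(netlen);
    if (*len_out > MAX_MESSAGE_LEN)
        return NULL;                           /* refuse absurd lengths */

    payload = malloc(*len_out ? *len_out : 1); /* avoid malloc(0) */
    if (payload == NULL)
        return NULL;
    if (read_full(fd, payload, *len_out) == -1) {
        free(payload);
        return NULL;
    }
    return payload;
}
```

Because the receiver knows how many bytes to expect, it returns as soon as one complete message has arrived, and the pipe never has to be closed to signal the end of a message.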
The read API on Linux and the ReadFile API on Windows return as soon as some data is available when reading from a pipe or socket; they do not wait for the requested number of bytes to fill the buffer. read then returns the number of bytes actually read.
This means that, when reading from a pipe, you pick a buffer size, read, and process however many bytes were returned. You then read the next chunk. The only time you are blocked is if there is no data available at all.
This differs from fread, which only returns once the desired number of bytes has been read or the stream determines that doing so is impossible (e.g. at EOF).

Write atomically to a file using Write() with snprintf()

I want to be able to write atomically to a file, I am trying to use the write() function since it seems to grant atomic writes in most linux/unix systems.
Since I have variable string lengths and multiple printf's, I was told to use snprintf() and pass it as an argument to the write function in order to be able to do this properly, upon reading the documentation of this function I did a test implementation as below:
int file = open("file.txt", O_CREAT | O_WRONLY, 0644); // O_CREAT requires a mode argument
if(file < 0)
    perror("Error:");
char buf[200] = "";
int numbytes = snprintf(buf, sizeof(buf), "Example string %s", stringvariable);
write(file, buf, numbytes);
From my tests it seems to have worked but my question is if this is the most correct way to implement it since I am creating a rather large buffer (something I am 100% sure will fit all my printfs) to store it before passing to write.
No, write() is not atomic, not even when it writes all of the data supplied in a single call.
Use advisory record locking (fcntl(fd, F_SETLKW, &lock)) in all readers and writers to achieve atomic file updates.
fcntl()-based record locks work over NFS on both Linux and BSDs; flock()-based file locks may not, depending on system and kernel version. (If NFS locking is disabled like it is on some web hosting services, no locking will be reliable.) Just initialize the struct flock with .l_whence = SEEK_SET, .l_start = 0, .l_len = 0 to refer to the entire file.
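A minimal sketch of such a whole-file lock, using a hypothetical helper name lock_whole_file:

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: apply an advisory lock to the entire file.
 * type is F_RDLCK, F_WRLCK or F_UNLCK. F_SETLKW blocks until the lock
 * is granted. Returns 0 on success, -1 on failure (errno is set, e.g.
 * EINTR if a signal interrupted the wait). */
int lock_whole_file(int fd, short type)
{
    struct flock lock = {
        .l_type = type,
        .l_whence = SEEK_SET,
        .l_start = 0,
        .l_len = 0,      /* 0 means "to the end of the file" */
    };

    return fcntl(fd, F_SETLKW, &lock) == -1 ? -1 : 0;
}
```

A writer would call lock_whole_file(fd, F_WRLCK) before writing and lock_whole_file(fd, F_UNLCK) afterwards; readers would take F_RDLCK. The locks are advisory, so they only protect against processes that also take them.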
Use asprintf() to print to a dynamically allocated buffer:
char *buffer = NULL;
int length;
length = asprintf(&buffer, ...);
if (length == -1) {
/* Out of memory */
}
/* ... Have buffer and length ... */
free(buffer);
After adding the locking, do wrap your write() in a loop:
{
const char *p = (const char *)buffer;
const char *const q = (const char *)buffer + length;
ssize_t n;
while (p < q) {
n = write(fd, p, (size_t)(q - p));
if (n > 0)
p += n;
else
if (n != -1) {
/* Write error / kernel bug! */
} else
if (errno != EINTR) {
/* Error! Details in errno */
}
}
}
Although there are some local filesystems that guarantee write() does not return a short count unless you run out of storage space, not all do; especially not the networked ones. Using a loop like above lets your program work even on such filesystems. It's not too much code to add for reliable and robust operation, in my opinion.
In Linux, you can take a write lease on a file to exclude any other process opening that file for a while.
Essentially, you cannot block a file open, but you can delay it for up to /proc/sys/fs/lease-break-time seconds, typically 45 seconds. The lease is granted only when no other process has the file open, and if any other process tries to open the file, the lease owner gets a signal. (If the lease owner does not release the lease, for example by closing the file, the kernel will automagically break the lease after the lease-break-time is up.)
Unfortunately, these only work in Linux, and only on local files, so they are of limited use.
If readers do not keep the file open, but open, read, and close it every time they read it, you can write a full replacement file (it must be on the same filesystem; I recommend using a lock-subdirectory for this) and rename() it over the old file. rename() replaces the target atomically, whereas link() fails if the target name already exists.
All readers will see either the old file or the new file, but those that keep their file open will never see any changes.
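The replacement approach could be sketched as below. Note the assumptions: the helper name replace_file is hypothetical, the temporary file is created next to the target rather than in a separate lock-subdirectory, and the final step uses rename(), which atomically replaces an existing name on the same filesystem.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: replace the file at `path` with `len` bytes of
 * `data` so that readers see either the old or the new content.
 * Returns 0 on success, -1 on failure. */
int replace_file(const char *path, const void *data, size_t len)
{
    char tmp[4096];
    const char *p = data;
    int n, fd;

    /* Temporary name beside the target, so both are on one filesystem. */
    n = snprintf(tmp, sizeof tmp, "%s.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;
    fd = mkstemp(tmp);            /* creates the file with mode 0600 */
    if (fd == -1)
        return -1;

    while (len > 0) {             /* handle short writes */
        ssize_t w = write(fd, p, len);
        if (w == -1) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += w;
        len -= (size_t)w;
    }
    if (fsync(fd) == -1)          /* flush data before publishing it */
        goto fail;
    if (close(fd) == -1) {
        fd = -1;
        goto fail;
    }
    fd = -1;
    if (rename(tmp, path) == -1)  /* atomic switch to the new content */
        goto fail;
    return 0;

fail:
    if (fd != -1)
        close(fd);
    unlink(tmp);
    return -1;
}
```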

Using mmap to receive file from server

Say I'm sending a file from a server to a client using read & send system call.
Now, I want to receive the data at the client side using mmap system call. how do I do that?
given the following send_file function (Server side) :
(sd is the socket descriptor associated with the client)
int send_file (int sd, const char* file_name) {
    int fd;
    ssize_t n;
    char buf[1024];
    if ( (fd = get_fd(file_name)) > 0) {
        while ((n = read(fd, buf, sizeof buf)) > 0) {
            // Send only the bytes actually read, not the whole buffer
            if (send(sd, buf, (size_t)n, 0) < 0) {
                perror("send");
                close(fd);
                return -1;
            }
        }
        close(fd);
        return 0;
    }
    return -1;
}
Again, I want to create the file at the client's side and then use MMAP to store the file from the server.
How do I do that? would love to get some suggestions.
thanks in advance
You can't receive a file from a socket using mmap. mmap is used to map files (or anonymous memory) into your process's virtual address space. Quoting the man page:
mmap() creates a new mapping in the virtual address space of the
calling process.
So you have to use socket calls to receive the file at the client side.
I'm not sure I understand why you want to do that, but here is a way to use mmap to write into the file at the client side. First you have to open the file with open() (lseek and mmap take a file descriptor, not the FILE* that fopen() returns). You can then use lseek to "enlarge" the file:
The lseek() function allows the file offset to be set beyond the
end of the file (but this does not change the size of the file). If
data is later written at this point, subsequent reads of the data in
the gap (a "hole") return null bytes ('\0') until data is actually
written into the gap.
And finally you can mmap it and copy the content received through network.
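A sketch of that idea, under stated assumptions: the client already knows the file size (for instance because the server sent it first), the name recv_to_mapped_file is hypothetical, and the file is grown with ftruncate() instead of the lseek-and-write trick, since it sets the size in a single call.

```c
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: receive exactly `size` bytes (size > 0) from
 * socket `sd` into a new file at `path`, calling recv() straight into
 * a shared mapping of that file. Returns 0 on success, -1 on failure. */
int recv_to_mapped_file(int sd, const char *path, size_t size)
{
    char *map;
    size_t got = 0;
    int result = -1;
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);

    if (fd == -1)
        return -1;
    /* Give the file its final size first: touching mapped pages that
     * lie beyond the end of the file raises SIGBUS. */
    if (size == 0 || ftruncate(fd, (off_t)size) == -1)
        goto out_close;

    map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out_close;

    while (got < size) {
        ssize_t n = recv(sd, map + got, size - got, 0);
        if (n == -1 && errno == EINTR)
            continue;
        if (n <= 0)
            break;               /* error, or the peer closed early */
        got += (size_t)n;
    }
    if (got == size)
        result = 0;

    munmap(map, size);
out_close:
    close(fd);
    return result;
}
```

The data still arrives through recv(); the mapping only replaces the separate buffer and the write() call on the file.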

Does Linux's splice(2) work when splicing from a TCP socket?

I've been writing a little program for fun that transfers files over TCP in C on Linux. The program reads a file from a socket and writes it to file (or vice versa). I originally used read/write and the program worked correctly, but then I learned about splice and wanted to give it a try.
The code I wrote with splice works perfectly when reading from stdin (redirected file) and writing to the TCP socket, but fails immediately with splice setting errno to EINVAL when reading from socket and writing to stdout. The man page states that EINVAL is set when neither descriptor is a pipe (not the case), an offset is passed for a stream that can't seek (no offsets passed), or the filesystem doesn't support splicing, which leads me to my question: does this mean that TCP can splice from a pipe, but not to?
I'm including the code below (minus error handling code) in the hopes that I've just done something wrong. It's based heavily on the Wikipedia example for splice.
static void splice_all(int from, int to, long long bytes)
{
long long bytes_remaining;
long result;
bytes_remaining = bytes;
while (bytes_remaining > 0) {
result = splice(
from, NULL,
to, NULL,
bytes_remaining,
SPLICE_F_MOVE | SPLICE_F_MORE
);
if (result == -1)
die("splice_all: splice");
bytes_remaining -= result;
}
}
static void transfer(int from, int to, long long bytes)
{
int result;
int pipes[2];
result = pipe(pipes);
if (result == -1)
die("transfer: pipe");
splice_all(from, pipes[1], bytes);
splice_all(pipes[0], to, bytes);
close(from);
close(pipes[1]);
close(pipes[0]);
close(to);
}
On a side note, I think that the above will block on the first splice_all when the file is large enough due to the pipe filling up(?), so I also have a version of the code that forks to read and write from the pipe at the same time, but it has the same error as this version and is harder to read.
EDIT: My kernel version is 2.6.22.18-co-0.7.3 (running coLinux on XP.)
What kernel version is this? Linux has had support for splicing from a TCP socket since 2.6.25 (commit 9c55e01c0), so if you're using an earlier version, you're out of luck.
After every single splice from from to pipes[1], you need to splice_all from pipes[0] to to, for the number of bytes that the last splice just read. Reason: pipes represents a finite kernel memory buffer, so if bytes is larger than that buffer, you'll block forever in your splice_all(from, pipes[1], bytes).
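The interleaving described here might be sketched as follows. The function name splice_relay and the 65536-byte request size are assumptions; the point is that each chunk spliced into the pipe is drained before the next one is read, so the pipe can never fill up.

```c
#define _GNU_SOURCE      /* splice() is Linux-specific */
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: move up to `bytes` bytes from `from` to `to`
 * through an intermediate pipe. Returns the number of bytes moved,
 * or -1 on error. */
long long splice_relay(int from, int to, long long bytes)
{
    int pipes[2];
    long long total = 0;
    ssize_t in, out, left;
    size_t want;

    if (pipe(pipes) == -1)
        return -1;

    while (total < bytes) {
        want = (bytes - total > 65536) ? 65536 : (size_t)(bytes - total);

        /* Step 1: one splice from the source into the pipe. */
        in = splice(from, NULL, pipes[1], NULL, want,
                    SPLICE_F_MOVE | SPLICE_F_MORE);
        if (in == -1)
            goto fail;
        if (in == 0)
            break;               /* end of input */

        /* Step 2: drain exactly what was just queued. */
        left = in;
        while (left > 0) {
            out = splice(pipes[0], NULL, to, NULL, (size_t)left,
                         SPLICE_F_MOVE | SPLICE_F_MORE);
            if (out <= 0)
                goto fail;
            left -= out;
        }
        total += in;
    }

    close(pipes[0]);
    close(pipes[1]);
    return total;

fail:
    close(pipes[0]);
    close(pipes[1]);
    return -1;
}
```

Unlike the original transfer(), this sketch leaves from and to open so the caller decides when to close them.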
