I'm interested in the basic principles of Web-servers, like Apache or Nginx, so now I'm developing my own server.
When my server gets a request, it searches for a file (e.g. index.html). If the file exists, it reads all of the content into a buffer (content) and then writes it to the socket. Here's the simplified code:
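int return_file(char* content, char* fullPath) {
    // content must be large enough to hold the whole file
    int file = open(fullPath, O_RDONLY);
    if (file < 0)
        return 404; // File was not found
    ssize_t nread;
    size_t total = 0;
    // Append each chunk after the previous one instead of overwriting it
    while ((nread = read(file, content + total, 2048)) > 0)
        total += (size_t)nread;
    close(file);
    return 200; // File was found, OK
}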
The question is pretty simple: is it possible to avoid using buffer and write file content directly to the socket?
Thanks for any tips :)
There is no standardized system call which can write directly from a file to a socket.
However, some operating systems do provide such a call. For example, both FreeBSD and Linux implement a system call called sendfile, but the precise details differ between the two systems. (In both cases, you need the underlying file descriptor for the file, not the FILE* pointer, although on both these platforms you can use fileno() to extract the fd from the FILE*.)
For more information:
FreeBSD sendfile()
Linux sendfile()
What you can do is write the "chunk" you read immediately to the client.
In order to write the content, you MUST read it, so you can't avoid that, but you can use a smaller buffer, and write the contents as you read them eliminating the need to read the whole file into memory.
For instance, you could write:
unsigned char byte;
// FIXME: store the return value to allow
// choosing the right action on error.
//
// Note that `0' is not really an error.
while (read(file, &byte, 1) > 0) {
if (write(client, &byte, 1) <= 0) {
// Handle error.
}
}
Better still, unsigned char byte; could become unsigned char byte[A_REASONABLE_BUFFER_SIZE]; so that each call moves a whole chunk, and you still don't need to store ALL the content in memory.
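A sketch of that chunked version might look like this. The names copy_fd and write_all and the 4096-byte buffer size are choices made for this example, not anything from the question; the inner loop is there because write() is allowed to accept fewer bytes than it was given.

```c
#include <errno.h>
#include <unistd.h>

/* Write exactly len bytes to fd, retrying after partial writes. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: copy everything from file_fd to client_fd using a
 * small fixed buffer, so the whole file is never held in memory.
 * Returns 0 on success, -1 on a read or write error. */
int copy_fd(int file_fd, int client_fd)
{
    char buf[4096]; /* assumed "reasonable" size */
    ssize_t nread;

    while ((nread = read(file_fd, buf, sizeof buf)) != 0) {
        if (nread == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(client_fd, buf, (size_t)nread) == -1)
            return -1;
    }
    return 0; /* read() returned 0: end of file */
}
```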
No, it is not. There must be an intermediate storage that you use for reading/writing the data.
There is one edge case: when you use memory-mapped files, the mapped region of the file can be passed directly to the call that writes to the socket. Internally, though, the system still has to read the file's data into memory.
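To make the memory-mapped edge case concrete, here is a minimal sketch for a POSIX system. The function name send_mapped is invented for the example, and out_fd is assumed to be a connected socket (any writable descriptor behaves the same way here).

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map the file at path and write the mapping to out_fd.
 * No explicit read() buffer is used, but the kernel still pages the file
 * into memory behind the scenes. Returns 0 on success, -1 on error. */
int send_mapped(int out_fd, const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1)
        return -1;

    struct stat st;
    if (fstat(fd, &st) == -1) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) { /* nothing to send; mmap() rejects length 0 */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    char *map = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (map == MAP_FAILED)
        return -1;

    size_t done = 0;
    int rc = 0;
    while (done < len) {
        ssize_t n = write(out_fd, map + done, len - done);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            rc = -1;
            break;
        }
        done += (size_t)n;
    }
    munmap(map, len);
    return rc;
}
```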
This is for a Linux system, in C. It involves network programming. It is for a file transfer program.
I've been having a problem where this piece of code works unpredictably: either it succeeds completely, or the while loop in the client never ends. I discovered that this is because the fileLength variable is sometimes a huge (negative or positive) value, which I attributed to some mistake with ntohl. When I put in a print statement, it seemed to work perfectly, without error.
Here is the client code:
//...here includes relevant header files
int main (int argc, char *argv[]) {
//socket file descriptor
int sockfd;
if (argc != 2) {
fprintf (stderr, "usage: client hostname\n");
exit(1);
}
//...creates socket file descriptor, connects to server
//create buffer for filename
char name[256];
//receive filename into name buffer, bytes received stored in numbytes
if((numbytes = recv (sockfd, name, 255 * sizeof (char), 0)) == -1) {
perror ("recv");
exit(1);
}
//Null terminator after the filename
name[numbytes] = '\0';
//length of the file to receive from server
long fl;
memset(&fl, 0, sizeof fl);
//receive filelength from server
if((numbytes = recv (sockfd, &fl, sizeof(long), 0)) == -1) {
perror ("recv");
exit(1);
}
//convert filelength to host format
long fileLength = ntohl(fl);
//check to make sure file does not exist, so that the application will not overwrite existing files
if (fopen (name, "r") != NULL) {
fprintf (stderr, "file already present in client directory\n");
exit(1);
}
//open file called name in write mode
FILE *filefd = fopen (name, "wb");
//variable stating amount of data received
long bytesTransferred = 0;
//Until the file is received, keep receiving
while (bytesTransferred < fileLength) {
printf("transferred: %ld\ntotal: %ld\n", bytesTransferred, fileLength);
//set counter at beginning of unwritten segment
fseek(filefd, bytesTransferred, SEEK_SET);
//buffer of 256 bytes; 1 byte for byte-length of segment, 255 bytes of data
char buf[256];
//receive segment from server
if ((numbytes = recv (sockfd, buf, sizeof buf, 0)) == -1) {
perror ("recv");
exit(1);
}
//first byte of buffer, stating number of bytes of data in received segment
//converting from char to short requires adding 128, since the char ranges from -128 to 127
short bufLength = buf[0] + 128;
//write buffer into file, starting after the first byte of the buffer
fwrite (buf + 1, 1, bufLength * sizeof (char), filefd);
//add number of bytes of data received to bytesTransferred
bytesTransferred += bufLength;
}
fclose (filefd);
close (sockfd);
return 0;
}
This is the server code:
//...here includes relevant header files
int main (int argc, char *argv[]) {
if (argc != 2) {
fprintf (stderr, "usage: server filename\n");
exit(1);
}
//socket file descriptor, file descriptor for specific client connections
int sockfd, new_fd;
//...get socket file descriptor for sockfd, bind sockfd to predetermined port, listen for incoming connections
//...reaps zombie processes
printf("awaiting connections...\n");
while(1) {
//...accepts any incoming connections, gets file descriptor and assigns to new_fd
if (!fork()) {
//close socket file descriptor, only need file descriptor for specific client connection
close (sockfd);
//open a file for reading
FILE *filefd = fopen (argv[1], "rb");
//send filename to client
if (send (new_fd, argv[1], strlen (argv[1]) * sizeof(char), 0) == -1)
{ perror ("send"); }
//put counter at end of selected file, and find length
fseek (filefd, 0, SEEK_END);
long fileLength = ftell (filefd);
//convert length to network form and send it to client
long fl = htonl(fileLength);
//Are we sure this is sending all the bytes??? TEST
if (send (new_fd, &fl, sizeof fl, 0) == -1)
{ perror ("send"); }
//variable stating amount of data unsent
long len = fileLength;
//Until file is sent, keep sending
while(len > 0) {
printf("remaining: %ld\ntotal: %ld\n", len, fileLength);
//set counter at beginning of unread segment
fseek (filefd, fileLength - len, SEEK_SET);
//length of the segment; 255 unless last segment
short bufLength;
if (len > 255) {
len -= 255;
bufLength = 255;
} else {
bufLength = len;
len = 0;
}
//buffer of 256 bytes; 1 byte for byte-length of segment, 255 bytes of data
char buf[256];
//Set first byte of buffer as the length of the segment
//converting short to char requires subtracting 128
buf[0] = bufLength - 128;
//read file into the buffer starting after the first byte of the buffer
fread(buf + 1, 1, bufLength * sizeof(char), filefd);
//Send data to client
if (send (new_fd, buf, sizeof buf, 0) == -1)
{ perror ("send"); }
}
fclose (filefd);
close (new_fd);
exit (0);
}
close (new_fd);
}
return 0;
}
Note: I've simplified the code a bit, to make it clearer I hope.
Anything beginning with //... represents a bunch of code
You seem to be assuming that each send() will either transfer the full number of bytes specified or will error out, and that each one will pair perfectly with a recv() on the other side, such that the recv() receives exactly the number of bytes sent by the send() (or errors out), no more and no less. Those are not safe assumptions.
You don't show the code by which you set up the network connection. If you're using a datagram-based protocol (i.e. UDP) then you're more likely to get the send/receive boundary matching you expect, but you need to account for the possibility that packets will be lost or corrupted. If you're using a stream-based protocol (i.e. TCP) then you don't have to be too concerned with data loss or corruption, but you have no reason at all to expect boundary-matching behavior.
You need at least three things:
An application-level protocol on top of the network layer. You've got parts of that already, such as in how you transfer the file length first to advise the client about how much content to expect, but you need to do the same for all transferred data that is not of a pre-determined, fixed length. Alternatively, invent another means to communicate data boundaries.
Every send() / write() that aims to transfer more than one byte must be performed in a loop to accommodate transfers being broken into multiple pieces. The return value tells you how many of the requested bytes were transferred (or at least how many were handed off to the network stack), and if that's fewer than requested you must loop back to try to transfer the rest.
Every recv() / read() that aims to transfer more than one byte must be performed in a loop to accommodate transfers being broken into multiple pieces. I recommend structuring that along the same lines as described for send(), but you also have the option of receiving data until you see a pre-arranged delimiter. The delimiter-based approach is more complicated, however, because it requires additional buffering on the receiving side.
Without those measures, your server and client can easily get out of sync. Among the possible results of that are that the client interprets part of the file name or part of the file content as the file length.
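The two loops described above could be sketched roughly as follows. The names send_all and recv_all are chosen for this example (they are not library functions), and the sketch assumes a blocking stream socket.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Keep calling send() until all len bytes have been handed to the network
 * stack. Returns 0 on success, -1 on error. */
int send_all(int fd, const void *data, size_t len)
{
    const char *p = data;
    while (len > 0) {
        ssize_t n = send(fd, p, len, 0);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Keep calling recv() until exactly len bytes have arrived. Returns 0 on
 * success, -1 on error or if the peer closed the connection early. */
int recv_all(int fd, void *data, size_t len)
{
    char *p = data;
    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1; /* connection closed before len bytes arrived */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

With these, a fixed-size field such as the file length is read with a single recv_all(sockfd, &field, sizeof field) call, and it either arrives whole or the call reports failure.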
Even though you removed it from that code, I'll make an educated guess and assume that you're using TCP or some other stream protocol here. This means that the data the server sends is a stream of bytes, and the amount of data each recv call gets will not correspond to the amount passed to each send call.
It is equally legal for your first recv call to just get one byte of data, as it is to get the file name, file size and half of the file.
You say
When I put in a print statement,
but you don't say where. I'll make another educated guess here and guess that you did it on the server before sending the file length. And that happened to shake things enough that the data amounts that were sent on the connection just accidentally happened to match what you were expecting on the client.
You need to define a protocol. Maybe start with a length of the filename, then the filename, then the length of the file. Or always send 256 bytes for the filename regardless of how long it is. Or send the file name as a 0-terminated string and try to figure out the data from that. But you can never assume that just because you called send with X bytes that the recv call will get X bytes.
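As one possible shape for the first option (name length, then name, then file length), here is a sketch that only builds and parses the header bytes; actually moving them over the socket is left out. The layout, the function names encode_header and decode_header, and the choice of 4-byte lengths are all assumptions made for this example.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire layout: [u32 name length][name bytes][u32 file length],
 * with both integers in network byte order.
 * Returns the header size in bytes, or 0 if out is too small. */
size_t encode_header(unsigned char *out, size_t cap,
                     const char *name, uint32_t file_len)
{
    size_t name_len = strlen(name);
    size_t total = 4 + name_len + 4;
    if (total > cap)
        return 0;

    uint32_t n = htonl((uint32_t)name_len);
    uint32_t f = htonl(file_len);
    memcpy(out, &n, 4);
    memcpy(out + 4, name, name_len);
    memcpy(out + 4 + name_len, &f, 4);
    return total;
}

/* Parses a header produced by encode_header(). Returns the number of bytes
 * consumed, or 0 if the input is incomplete or the name (plus its NUL
 * terminator) does not fit in name_cap. */
size_t decode_header(const unsigned char *in, size_t len,
                     char *name, size_t name_cap, uint32_t *file_len)
{
    uint32_t n, f;
    if (len < 4)
        return 0;
    memcpy(&n, in, 4);
    n = ntohl(n);
    if (n >= name_cap || len - 4 < n || len - 4 - n < 4)
        return 0;

    memcpy(name, in + 4, n);
    name[n] = '\0';
    memcpy(&f, in + 4 + n, 4);
    *file_len = ntohl(f);
    return 4 + (size_t)n + 4;
}
```

Because the receiver learns the name length before the name, it always knows how many more bytes to wait for, regardless of how TCP happens to split the stream.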
I believe the issue is actually a combination of everything you and others have said. In the server code you send the name of the file like this:
send (new_fd, argv[1], strlen (argv[1]) * sizeof(char), 0);
and receive it in the client like this:
recv (sockfd, name, 255 * sizeof (char), 0);
This will cause an issue when the filename length is anything less than 255. Since TCP is a stream protocol (as mentioned by @Art), there are no real boundaries between the sends and recvs, which can cause you to receive data in odd places where you are not expecting them.
My recommendation would be to first send the length of the filename, eg:
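// server
// uint32_t (from stdint.h) is exactly 4 bytes on every platform, unlike long
uint32_t namelen = htonl(strlen(argv[1]));
send (new_fd, &namelen, 4, 0);
send (new_fd, argv[1], strlen (argv[1]) * sizeof(char), 0);
// client (error checks and partial-transfer loops omitted for brevity)
uint32_t namelen;
recv (sockfd, &namelen, 4, 0);
namelen = ntohl(namelen);
recv (sockfd, name, namelen * sizeof (char), 0);
name[namelen] = '\0';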
This will ensure that you are always aware of exactly how long your filename is and makes sure that you aren't accidentally reading your file length from somewhere in the middle of your file (which is what I expect is happening currently).
Edit:
Also, be cautious when sending integers whose size varies between platforms. If you use sizeof on them, the two sides may send and receive different numbers of bytes (a long is 4 bytes on some systems and 8 on others). This is why I hard-coded the sizes in the send and recv for the name length, so that there is no confusion on either side.
Well, after some testing, I discovered that the issue causing the problem did have something to do with htonl(), though I had still read the data incorrectly in the beginning. It wasn't that htonl() wasn't working at all, but that I didn't realize a 'long' has different lengths depending on system architecture (thanks @tofro). That is to say, the length of a 'long' integer on 32-bit and 64-bit operating systems is 4 bytes and 8 bytes, respectively, and the htonl() function (from arpa/inet.h) is for 4-byte integers. I was using a 64-bit OS, which explains why the value was being fudged. I fixed the issue by using the int32_t type (from stdint.h) to store the file length. So the main issue in this case was not that it was becoming out of sync (I think). But as for everyone's advice towards developing an actual protocol, I think I know exactly what you mean, I definitely understand why it's important, and I'm currently working towards it. Thank you all for all your help.
EDIT: Well now that it has been several years, and I know a little more, I know that this explanation doesn't make sense. All that would result from long being larger than I expected (8 bytes rather than 4) is that there's some implicit casting going on. I used sizeof(long) in the original code rather than hardcoding it to assume 4 bytes, so that particular (faulty) assumption of mine shouldn't have produced the bug I saw.
The problem is almost certainly what everyone else said: one call to recv was not getting all of the bytes representing the file length. At the time I doubted this was the real cause of the behaviour I saw, because the file name (of arbitrary length) I was sending through was never partially received (i.e. the client always created a file with the correct filename). Only the file length was messed up. My hypothesis at the time was that recv mostly respected message boundaries, and that while recv could possibly return only part of the data, it was more likely that it was returning all of it and there was another bug in my code. I now know this isn't true at all, and TCP doesn't care.
I'm a little curious as to why I didn't see other unexpected behaviour as well (e.g. the file name being wrong on the receiving end), and I wanted to investigate further, but despite managing to find the files, I can't seem to reproduce the problem now. I suppose I'll never know, but at least I understand the main issue here.
Can someone kindly explain how I can implement an mread function by using the read() system call? This function needs to read the contents of the mmapped file into a buffer. I have access to both the mmapped file and the buffer by means of pointers (i.e. void *addr and void *buff).
Your help is very much appreciated.
Try so far:
int fd;
if ((fd = open("file.hole", O_RDWR)) < 0) { // open() takes flags (and a mode_t with O_CREAT), not an fopen-style "rb" string
perror("create .hole file error");
exit(EXIT_FAILURE);
}
if (write(fd, addr, count)!= count) {
perror("Cannot write from address");
exit(EXIT_FAILURE);
}
buff = malloc(count); // count bytes, not count pointers
lseek(fd, 0, SEEK_SET); // rewind: write() left the file offset after the data
if (read(fd, buff, count) != count) {
perror("Cannot read from file descriptor to the buffer");
exit(EXIT_FAILURE);
}
Sorry, this might not be a complete solution, but I don't have sufficient reputation to add comments.
If you need to read data from an mmapped file (assuming that you have called mmap() on the file), you don't need the read() system call; you just need to copy the content from addr to buff.
If you need the read() system call to copy data from a file to a buffer, you don't need mmap(); you should just open() the file to get an fd and then read() the data from the fd into the buffer.
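For the first case, a read()-like wrapper over the mapping can be little more than a bounds check and a memcpy. The signature below is a guess for illustration: the question does not say how mread tracks the mapping length or the current position, so map_len and offset are assumed parameters.

```c
#include <string.h>
#include <sys/types.h>

/* Hypothetical mread(): copy up to count bytes from the mapped region at
 * addr into buff, starting at *offset and advancing it, in the same way
 * read() advances a file offset. Returns the number of bytes copied, or 0
 * once the end of the mapping has been reached. */
ssize_t mread(const void *addr, size_t map_len, size_t *offset,
              void *buff, size_t count)
{
    if (*offset >= map_len)
        return 0; /* behaves like read() at end of file */

    size_t avail = map_len - *offset;
    if (count > avail)
        count = avail; /* short read near the end of the mapping */

    memcpy(buff, (const char *)addr + *offset, count);
    *offset += count;
    return (ssize_t)count;
}
```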
I am running into problems when I try to transfer a file between the simple server and client applications that I have written. The file gets transferred successfully, but the file size is different at the receiving side (server side).
I open the file on the client side and use fseek() to find the size of the file. Then I use fread() to read it into a buffer of char type. I send this buffer using sendto(), as I have to use UDP sockets.
On the server side, I use recvfrom() to store this and use fwrite() to write it into another file. But when I check the size of the file, it is much bigger. Also I am not able to open it even though it is supposed to be a text file.
Can you give me some pointers as to where I might be going wrong? Also is this the best way to send files over sockets? Are there better methods to send files?
Thanks
Code for client side
//Writing code to open file and copy it into buffer
fseek(fp, 0, SEEK_END);
size_t file_size = ftell(fp);
fseek(fp, 0, SEEK_SET);
if(fread(file_buffer, file_size, 1, fp)<=0)
{
printf("Unable to copy file into buffer! \n");
exit(1);
}
//Sending file buffer
if(sendto(sock, file_buffer, strlen(file_buffer), 0, (struct sockaddr *) &serv_addr, serv_len)<0)
{
printf("Error sending the file! \n");
exit(1);
}
bzero(file_buffer, sizeof(file_buffer));
Code on the server side to receive the file
//Receiving file from client
char file_buffer[BUFSIZE];
if(recvfrom(sock, file_buffer, BUFSIZE, 0, (struct sockaddr *) &client_addr, &client_addr_size)<0)
{
printf("Error receiving file.");
exit(1);
}
char new_file_name[] = "copied_";
strcat(new_file_name,file_name);
FILE *fp;
fp = fopen(new_file_name,"w+");
if(fwrite(file_buffer, 1, sizeof(file_buffer), fp)<0)
{
printf("Error writing file! \n");
exit(1);
}
There are several problems with your code.
In the sender:
How do you allocate file_buffer? What if the file is bigger than the buffer? (Probably this is not causing the problem at hand.)
You only check to see if the return value from fread is <= 0. In fact, if an error or EOF occurs, the return value can be any value less than the full size of the file. (Probably this is not causing the problem at hand.)
You pass strlen(file_buffer) instead of file_size to the sendto system call. strlen looks for NUL bytes which the file may or may not contain. It probably doesn't contain any, since you say it's a text file.
If the file contains at least one NUL byte, the packet will be truncated before the first NUL byte and you will not transmit the full contents of the file.
If the file contains no NUL bytes, strlen will scan beyond the end of the file as read into the buffer. Either your program will crash because strlen scans into unallocated memory addresses, or you will send additional garbage past the end of the file.
In the receiver:
You ignore the return value from recvfrom which is the length of the payload of the packet that was received. After that, there is no way for you to know how much data you received.
When you fwrite the result, you pass sizeof(file_buffer) as the size instead of the actual amount of data that was received. This is a fixed value (BUFSIZE) which is probably bigger than your file. The file written on disk will contain garbage beyond the end of the file.
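A sketch of both sides with those length fixes applied might look like this. It assumes the whole file fits in one datagram of at most MAX_PAYLOAD bytes (an arbitrary value picked for the example), and the function names are invented here; larger files would need to be split across datagrams, with some scheme for detecting loss and reordering.

```c
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>

#define MAX_PAYLOAD 1024 /* assumed limit for this example */

/* Sender: transmit exactly the bytes fread() returned, not strlen(buffer).
 * Returns the number of bytes sent, or -1 on error. */
ssize_t send_file_datagram(int sock, FILE *fp,
                           const struct sockaddr *dest, socklen_t dest_len)
{
    char buf[MAX_PAYLOAD];
    size_t n = fread(buf, 1, sizeof buf, fp);
    if (ferror(fp))
        return -1;
    return sendto(sock, buf, n, 0, dest, dest_len);
}

/* Receiver: write exactly the bytes recvfrom() reported, not sizeof(buffer).
 * Returns the number of bytes written to out, or -1 on error. */
ssize_t recv_file_datagram(int sock, FILE *out)
{
    char buf[MAX_PAYLOAD];
    ssize_t n = recvfrom(sock, buf, sizeof buf, 0, NULL, NULL);
    if (n == -1)
        return -1;
    if (fwrite(buf, 1, (size_t)n, out) != (size_t)n)
        return -1;
    return n;
}
```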
I've been writing a little program for fun that transfers files over TCP in C on Linux. The program reads a file from a socket and writes it to a file (or vice versa). I originally used read/write and the program worked correctly, but then I learned about splice and wanted to give it a try.
The code I wrote with splice works perfectly when reading from stdin (redirected file) and writing to the TCP socket, but fails immediately with splice setting errno to EINVAL when reading from socket and writing to stdout. The man page states that EINVAL is set when neither descriptor is a pipe (not the case), an offset is passed for a stream that can't seek (no offsets passed), or the filesystem doesn't support splicing, which leads me to my question: does this mean that TCP can splice from a pipe, but not to?
I'm including the code below (minus error handling code) in the hopes that I've just done something wrong. It's based heavily on the Wikipedia example for splice.
static void splice_all(int from, int to, long long bytes)
{
long long bytes_remaining;
long result;
bytes_remaining = bytes;
while (bytes_remaining > 0) {
result = splice(
from, NULL,
to, NULL,
bytes_remaining,
SPLICE_F_MOVE | SPLICE_F_MORE
);
if (result == -1)
die("splice_all: splice");
bytes_remaining -= result;
}
}
static void transfer(int from, int to, long long bytes)
{
int result;
int pipes[2];
result = pipe(pipes);
if (result == -1)
die("transfer: pipe");
splice_all(from, pipes[1], bytes);
splice_all(pipes[0], to, bytes);
close(from);
close(pipes[1]);
close(pipes[0]);
close(to);
}
On a side note, I think that the above will block on the first splice_all when the file is large enough due to the pipe filling up(?), so I also have a version of the code that forks to read and write from the pipe at the same time, but it has the same error as this version and is harder to read.
EDIT: My kernel version is 2.6.22.18-co-0.7.3 (running coLinux on XP.)
What kernel version is this? Linux has had support for splicing from a TCP socket since 2.6.25 (commit 9c55e01c0), so if you're using an earlier version, you're out of luck.
After every single splice from the from descriptor into pipes[1], you need to splice_all from pipes[0] into the to descriptor, moving exactly the number of bytes that the last single splice just read. Reason: pipes represents a finite kernel memory buffer, so if bytes is larger than that buffer, you'll block forever in splice_all(from, pipes[1], bytes).
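A sketch of that interleaved structure, assuming a Linux kernel where splice is supported for both descriptors, could look like the following. The function name splice_relay and the 64 KiB per-call limit are choices made for this example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* splice() is a Linux extension */
#endif
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: move bytes from `from` to `to` through a pipe,
 * draining the pipe after every fill so it can never block on a full pipe.
 * Returns 0 on success, -1 on error or premature end of input. */
int splice_relay(int from, int to, long long bytes)
{
    int pipes[2];
    int rc = 0;

    if (pipe(pipes) == -1)
        return -1;

    while (bytes > 0 && rc == 0) {
        size_t want = bytes > 65536 ? 65536 : (size_t)bytes;

        /* Fill: a single splice into the pipe; it may move less than want. */
        ssize_t in = splice(from, NULL, pipes[1], NULL, want,
                            SPLICE_F_MOVE | SPLICE_F_MORE);
        if (in <= 0) {
            rc = -1;
            break;
        }
        bytes -= in;

        /* Drain: push exactly what was just read out to the destination. */
        while (in > 0) {
            ssize_t out = splice(pipes[0], NULL, to, NULL, (size_t)in,
                                 SPLICE_F_MOVE | SPLICE_F_MORE);
            if (out <= 0) {
                rc = -1;
                break;
            }
            in -= out;
        }
    }

    close(pipes[0]);
    close(pipes[1]);
    return rc;
}
```

Unlike the transfer function in the question, this leaves from and to open, so the caller decides when to close them.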