Best way to read a POST request from a TCP socket? - C

I am writing a web server in C. I am working with TCP sockets and I want to know how to read a POST request and put it into a buffer.
What I can do is first read the POST header into a fixed-length buffer, then create a dynamic buffer based on the Content-Length and read the body into that. But since the size of the POST header can vary, after the read the buffer holding the header may also contain part of the body. I could parse that out, but would doing so be efficient?
Is there a better way to do this?

Well, in your case, I would say that the 'best way' is to separate each step into its own processing stage.
First, in order to parse the message, you have to continuously read raw bytes from the underlying device (file, socket, pipe, etc.) and 'feed' the parser until the message is complete.
Parsing the message could also be divided into different steps:
parsing the start line
parsing the headers
parsing the body (if any)
Since each step processes data of unknown size (delimiter-separated rather than length-prefixed, unless we have already parsed the Content-Length header and know the body size), each step has its own buffering needs, e.g.
the start line is composed of <method> <url> <version> \r\n
the header is composed of <key> : <value> \r\n
the composition of the body is determined by the headers
and the complete message has the form <headers> \r\n <body>, where the body is optional (Content-Length is zero or absent)
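For example, a minimal POST request as it appears on the wire (the URL and host here are made up) could be:

POST /upload HTTP/1.1\r\n
Host: example.com\r\n
Content-Length: 11\r\n
\r\n
hello world

The blank line (the second half of the \r\n\r\n pair) separates the headers from the 11-byte body.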
So, long story short: divide your message parser into distinct processing steps (highly recommended if you want to avoid spaghetti code) and decide where to use fixed-size versus reallocatable buffers and how to reuse them (if they are not strictly separated per step). Once you've done that and have a presentable implementation, we can debate how to reduce memory allocations and improve buffer reuse.
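As a minimal sketch of that step-by-step idea (the function name is mine; it assumes a blocking socket, and the Content-Length matching is deliberately simplified and case-sensitive, unlike real HTTP):

#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Sketch: read one request into a single growing buffer.
   Error handling is minimal on purpose. */
char *read_request(int fd) {
    size_t cap = 4096, used = 0, header_len, body_len = 0;
    char *buf = malloc(cap), *p;
    if (!buf) return NULL;
    buf[0] = '\0';

    /* 1. Accumulate bytes until the end-of-headers marker appears. */
    while ((p = strstr(buf, "\r\n\r\n")) == NULL) {
        if (used + 1 >= cap) {
            char *tmp = realloc(buf, cap * 2);
            if (!tmp) { free(buf); return NULL; }
            buf = tmp;
            cap *= 2;
        }
        ssize_t n = recv(fd, buf + used, cap - used - 1, 0);
        if (n <= 0) { free(buf); return NULL; }
        used += (size_t)n;
        buf[used] = '\0';            /* keep it a C string for strstr() */
    }
    header_len = (size_t)(p - buf) + 4;

    /* 2. Parse Content-Length (treated as 0 if absent). */
    if ((p = strstr(buf, "Content-Length:")) != NULL && p < buf + header_len)
        body_len = strtoul(p + 15, NULL, 10);

    /* 3. Part of the body may already be in the buffer; read the rest. */
    if (cap < header_len + body_len + 1) {
        char *tmp = realloc(buf, header_len + body_len + 1);
        if (!tmp) { free(buf); return NULL; }
        buf = tmp;
    }
    while (used < header_len + body_len) {
        ssize_t n = recv(fd, buf + used, header_len + body_len - used, 0);
        if (n <= 0) { free(buf); return NULL; }
        used += (size_t)n;
    }
    buf[used] = '\0';
    return buf;                      /* headers + body; caller free()s */
}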

Related

Why should I use circular buffers when reading and writing to sockets in C?

I'm doing an assignment where the goal is to create a basic FTP server in C capable of handling multiple clients at once.
The subject tells us to "wisely use circular buffers", but I don't really understand why or how.
I'm already using select() to know when I can read or write to my socket without blocking, as I'm not allowed to use recv, send, or O_NONBLOCKING.
Each connection has a structure where I store everything related to that client, like the communication file descriptor, the network information, and the buffers.
Why can't I just read() from my socket into a fixed-size buffer and then pass that buffer to the parsing function?
Same goes for writing: why can't I just dprintf() my response into the socket?
From my point of view, using a circular buffer adds a useless layer of complexity just to be translated back into a string to parse the command or to send back the response.
Did I misunderstand the subject? Instead of storing individual characters, should I store commands and responses as circular buffers of strings?
Why should I use circular buffers when reading and writing to sockets in C?
The socket interface does not itself provide a reason for using circular buffers (a.k.a. ring buffers). You should be looking instead at the protocol requirements of the application using the socket -- the FTP protocol in this case. This will be colored by the characteristics of the underlying network protocol (TCP for FTP) and their effect on the behavior of the socket layer.
Why can't I just read() from my socket into a fixed-size buffer and then pass that buffer to the parsing function?
You surely could do without circular buffers, but that wouldn't be as simple as you seem to suppose. And that's not the question you should be asking anyway: it's not whether circular buffers are required, but what benefit they can provide that you might not otherwise get. More on that later.
Also, you surely can have fixed size circular buffers -- "circular" and "fixed size" are orthogonal characteristics. However, it is usually among the objectives of using a circular buffer to minimize or eliminate any need for dynamically adjusting the buffer size.
Same goes for writing: why can't I just dprintf() my response into the socket?
Again, you probably could do as you describe. The question is what do you stand to gain from interposing a circular buffer? Again, more later.
From my point of view, using a circular buffer adds a useless layer of complexity just to be translated back into a string to parse the command or to send back the response.
Did I misunderstand the subject?
That you are talking about translating to and from strings makes me think that you did indeed misunderstand the subject.
Instead of storing individual characters, should I store commands and responses as circular buffers of strings?
Again, where do you think "of strings" comes into it? Why are you supposing that the elements of the buffer(s) would represent (whole) messages?
A circular buffer is more a manner of use of an ordinary, flat, usually fixed-size buffer than it is a separate data structure of its own. There is a little bit of extra bookkeeping data involved, however, so I won't quibble with anyone who wants to call it a data structure in its own right.
Circular buffers for input
Among the main contexts for circular buffers' usefulness is data arriving with stream semantics (such as TCP provides) rather than with message semantics (such as UDP provides). With respect to your assignment, consider this: when the server reads command input, how does it know where the command ends? I suspect you're supposing that you will get one complete command per read(), but that is in no way a safe assumption, regardless of the implementation of the client. You may get partial commands, multiple commands, or both on each read(), and you need to be prepared to deal with that.
So suppose, for example, that you receive one and a half control messages in one read(). You can parse and respond to the first, but you need to read more data before you can act on the second. Where do you put that data? Ok, you read it into the end of the buffer. And what if on the next read() you get not only the rest of a message, but also part of another message?
You cannot keep on indefinitely adding data at the end of the buffer, not even if you dynamically allocate more space as needed. You could at some point move the unprocessed data from the tail of the buffer to the beginning, thus opening up space at the end, but that is costly, and at this point we are well past the simplicity you had in mind. (That simplicity was always imaginary.) Alternatively, you can perform your reads into a circular buffer, so that consuming data from the (logical) beginning of the buffer automatically makes space available at the (logical) end.
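A minimal sketch of the input side, fitting the select()-driven setup from the question (the struct ring type and its field names are mine, not any standard API, and I stick to read()/write() since recv()/send() are off-limits in the assignment):

#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define RING_SIZE 4096

struct ring {
    char   data[RING_SIZE];
    size_t head;                /* next byte to consume */
    size_t tail;                /* next free slot */
    size_t count;               /* bytes currently stored */
};

/* Read into the free region of the ring when select() says readable.
   Filling all the free space may take two calls when it wraps. */
ssize_t ring_fill(struct ring *r, int fd) {
    if (r->count == RING_SIZE) return 0;     /* ring is full */
    size_t chunk = (r->tail >= r->head)
                 ? RING_SIZE - r->tail       /* free space up to the end */
                 : r->head - r->tail;        /* free gap before head */
    ssize_t n = read(fd, r->data + r->tail, chunk);
    if (n > 0) {
        r->tail = (r->tail + (size_t)n) % RING_SIZE;
        r->count += (size_t)n;
    }
    return n;
}

/* Consume one byte from the logical front (caller checks count first);
   the command parser calls this until it has a complete command. */
char ring_get(struct ring *r) {
    char c = r->data[r->head];
    r->head = (r->head + 1) % RING_SIZE;
    r->count--;
    return c;
}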
Circular buffers for output
Similar applies on the writing side with a stream-oriented network protocol. Consider that you cannot write() an arbitrary amount of data at a time, and it is very hard to know in advance exactly how much you can write. That's more likely to bite you on the data connection than on the control connection, but in principle, it applies to both. If you have only one client to feed at a time then you can keep write()ing in a loop until you've successfully transferred all the data, and this is what dprintf() would do. But that's potentially a blocking operation, so it undercuts your responsiveness when you are serving multiple clients at the same time, and maybe even with just one if (as with FTP) there are multiple connections per client.
You need to buffer data on the server, especially for the data connection, and now you have pretty much the same problem that you did on the reading side: when you've written only part of the data you want to send, and the socket is not ready for you to send more, what do you do? You could just track where you are in the buffer, and send more pieces as you can until the buffer is empty. But then you are wasting opportunities to read more data from the source file, or to buffer more control responses, until you work through the buffer. Once again, a circular buffer can mitigate that, by giving you a place to buffer more data without requiring it to start at the beginning of the buffer or being limited by the available space before the physical end of the buffer.
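And the matching sketch for the output side, reusing the same hypothetical struct ring: producers append at the logical end as space frees up, and this drains whatever the socket will accept each time select() reports it writable:

/* Write as much pending data as the socket will take right now.
   Data that wraps past the physical end goes out on the next call. */
ssize_t ring_drain(struct ring *r, int fd) {
    if (r->count == 0) return 0;             /* nothing pending */
    size_t chunk = (r->head < r->tail)
                 ? r->tail - r->head         /* contiguous pending data */
                 : RING_SIZE - r->head;      /* data up to the end */
    ssize_t n = write(fd, r->data + r->head, chunk);
    if (n > 0) {
        r->head = (r->head + (size_t)n) % RING_SIZE;
        r->count -= (size_t)n;
    }
    return n;
}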

Sending stream terminating bytes to indicate end of TCP stream

Say, for example, I wish to stream something over TCP, for instance a file share (the bytes of a file being read) or the output of a process (stdin, stderr, stdout) via a pipe using CreateProcess or _popen. I wouldn't be able to calculate the size in bytes of the streamed output in advance. If I were to send a specific set of special characters to indicate the stream is over, to be parsed by the server/client, would this be practical and secure?
If I were to send a specific set of special characters to indicate the stream is over, to be parsed by the server/client, would this be practical and secure?
If the contents of the file can be arbitrary binary data, then, no, this is not by itself a “safe” way of doing it, because the “specific set of special characters” you use could appear somewhere in the file, so they would be sent over the connection, and the receiver would think the file has ended when it has not.
(I say this is not “safe,” meaning it is not a guaranteed way to communicate the file correctly, because it is not clear what you mean by “practical” or “secure.” Secure from what? Is there some attacker giving you files to transmit, so they can craft them in malicious ways? If all your files are known to contain only plain text ASCII characters, you could “safely” use a non-ASCII character as a sentinel.)
There are myriad ways of designing a protocol to send arbitrary data:
Send the length first (although then you need to decide how many bits to use for the length, or design some scheme to send an arbitrarily large length).
Send the data in blocks of, say, 512 bytes, each block preceded by its length. As long as a block's length is 512 bytes, another block for the same data follows. Once the length is less than 512 bytes (including 0), it is the last block.
Send the data as a quoted string: start with ", then send the bytes of the data, except that each " in the data is sent as \" and each \ is sent as \\; the end of the data is indicated by a " not preceded by an escaping \.
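As a sketch of the second scheme (the function name is mine; it also assumes each write() transfers the full count, which a real version would loop to guarantee):

#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK 512

/* Send data as length-prefixed blocks: a 2-byte big-endian count,
   then the block. A block shorter than BLOCK (even empty) ends it. */
int send_blocks(int fd, const char *data, size_t len) {
    size_t off = 0, n;
    do {
        n = (len - off < BLOCK) ? len - off : BLOCK;
        uint8_t hdr[2] = { (uint8_t)(n >> 8), (uint8_t)n };
        if (write(fd, hdr, 2) != 2) return -1;
        if (n > 0 && write(fd, data + off, n) != (ssize_t)n) return -1;
        off += n;
    } while (n == BLOCK);   /* an exactly-BLOCK block means more is coming */
    return 0;
}

Note that when len is an exact multiple of BLOCK, the loop correctly emits a final zero-length block so the receiver knows the data has ended.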

detecting end of http header with \r\n\r\n

Using recv() I want to get the HTTP header so I can parse it for a Content-Length. However, I'm having trouble detecting the line break. Or do I even have to detect the line break at all, or will the first read into the buffer always contain the complete header (assuming the buffer is long enough)?
This is written in C.
edit: looking at some of the related questions one of the things I am worried about is
"...the "\r\n" of the header break might be pulled into your buffer by two different calls to recv() which would prevent your code from recognizing the header break."
You should call recv() repeatedly; each time it gives you x bytes, advance the buffer pointer you pass to it by x bytes (and decrease the byte count it is allowed to write by x as well). Do this until your buffer either contains a \r\n\r\n or is completely full, in which case you just close the socket and ignore the malicious client from then on. A buffer size of about 3000 bytes should do.
But this ignores the more general problem that your server seems to be a polling server. Once you have some experience, you should try to build an epoll-based server instead.
In addition to the problem of identifying "\r\n\r\n" across packet boundaries, you have the problem of identifying "Content-Length: xxxx\r\n" across packet boundaries. I suggest receiving and parsing one byte at a time: when you get a recv() of '\r' followed by a recv() of '\n', followed by a recv() of '\r' followed by a recv() of '\n', you can be sure the header has ended. Once you've grasped this, adapt your solution to receive and parse n bytes at a time, where n is a preprocessor definition defined to 1 initially, and then change n.
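A sketch of that byte-at-a-time idea as a tiny state machine, so the terminator is recognized no matter how the four bytes are split across recv() calls (function and variable names are mine):

/* Scan a chunk for the end of "\r\n\r\n". *state counts how many
   pattern bytes have matched so far and persists across recv() calls.
   Returns the offset just past the terminator, or 0 if not found yet. */
size_t scan_header_end(const char *buf, size_t len, int *state) {
    static const char pat[] = "\r\n\r\n";
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == pat[*state])
            ++*state;
        else
            *state = (buf[i] == '\r') ? 1 : 0;   /* restart the match */
        if (*state == 4)
            return i + 1;                        /* body starts here */
    }
    return 0;
}

Initialize the state to 0 once per connection and call this on every received chunk; a nonzero return marks where the body begins within the current chunk.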
In the end I did something like this:
char buf[4096];
size_t used = 0;
ssize_t n;
while ((n = recv(sock, buf + used, sizeof buf - used - 1, 0)) > 0) {
    used += (size_t)n;
    buf[used] = '\0';                 /* terminate so strstr() is safe */
    if (strstr(buf, "\r\n\r\n")) {
        /* look for Content-Length; output an error if it is missing */
        break;
    }
    /* else: keep on reading into the buffer */
}
and then once the header is found I keep reading for the message body.
Anyway, thanks guys. I ended up getting my answer in a really inefficient way, but what must be done is done.

What do I use as replacement of GetFileSize() for pipes?

See title.
On the client side of a named pipe, I want to determine the size of the content to be read from the pipe, in order to allocate memory for a buffer to hold it.
The MSDN help says:
You cannot use the GetFileSize function with a handle of a nonseeking device such as a pipe or a communications device. To determine the file type for hFile, use the GetFileType function.
Hmmm. Okay. But if I cannot use GetFileSize to determine the amount of data readable from a pipe, what shall I use instead? Currently, I do
length = GetFileSize(pipehandle, 0);
while (length == 0) {
    Sleep(10);                           // wait a bit
    length = GetFileSize(pipehandle, 0); // and try again
}
Sooner or later, length does become greater than zero, but the waiting seems a bit bad to me.
Background: I have a pipe server (roughly the Multithreaded Pipe Server from the MSDN example) that waits for the client to connect. Upon connection, the server reads the content of a file and passes that to the client using the pipe connection.
More Background: The overall reason why I want to do that is that I'm working with an external library that implements an XML parser. Originally, the parser opens a file, then CreateFileMapping is applied to that file and finally MapViewOfFile is being called in order to get the file content.
Now the project rules have changed and we're no longer allowed to create files on disk, so we need another way to pass the information from App1 (the pipe server) to App2 (the pipe client). To change as little as possible, I decided to use pipes to pass the data, because at first glance opening a pipe is the same as opening any other file, and I assume only a few changes are needed to switch from reading files to reading from pipes.
Currently, I determine the size of the data in the pipe (I know the pipe is used only once, to pass the input file from App1 to App2), then malloc a buffer and read the whole content of the pipe into it.
If I'm completely off track, I'd also be open to any suggestions for doing things better.
Clearly you want PIPE_TYPE_BYTE in this case, since the amount of data is unpredictable. Treat it just like a regular file in the client, calling ReadFile() repeatedly with a small buffer of, say, 4096 bytes. If you need to store the content in a single array, you could simply write an integer length first so that the client knows how big to make the array.
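A sketch of that byte-pipe approach (the function name is mine; errors other than end-of-pipe are not separated out):

#include <windows.h>
#include <stdlib.h>
#include <string.h>

/* Read the whole pipe content in 4096-byte chunks until the writer
   closes its end (ReadFile then fails with ERROR_BROKEN_PIPE). */
char *read_pipe(HANDLE pipe, DWORD *total) {
    char  chunk[4096], *buf = NULL;
    DWORD got = 0;
    *total = 0;
    while (ReadFile(pipe, chunk, sizeof chunk, &got, NULL) && got > 0) {
        char *tmp = realloc(buf, *total + got);
        if (!tmp) { free(buf); return NULL; }
        buf = tmp;
        memcpy(buf + *total, chunk, got);
        *total += got;
    }
    return buf;                /* NULL if nothing was received */
}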
If you created your pipe with PIPE_TYPE_MESSAGE, you can use the PeekNamedPipe function to find out how large the next complete message in the pipe is before reading it.
The main differences between PIPE_TYPE_MESSAGE and PIPE_TYPE_BYTE are:
with PIPE_TYPE_MESSAGE, the system manages the length of each value sent into the pipe; just ask to read one message and you get the whole message (useful for messages that are not too large, so you avoid filling all the memory)
with PIPE_TYPE_BYTE, you have to manage the length of the data you send through the pipe yourself. A TLV (type-length-value) protocol could be a good way to know the size of your "messages" (the T/type part may be unnecessary here): read the content in two parts, first the leading bytes, which give you the size of the message, then the message itself, in chunks if you don't want to overfill the memory.
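With a message-type pipe (opened for reading with PIPE_READMODE_MESSAGE), sizing the buffer before the read could look roughly like this sketch (function name mine):

#include <windows.h>
#include <stdlib.h>

/* Ask how large the next message is, then read it in one call. */
char *read_message(HANDLE pipe, DWORD *len) {
    DWORD avail = 0, left = 0;
    /* NULL buffer: we only want the size fields filled in. */
    if (!PeekNamedPipe(pipe, NULL, 0, NULL, &avail, &left) || left == 0)
        return NULL;            /* error, or no complete message yet */
    char *buf = malloc(left);
    if (buf && !ReadFile(pipe, buf, left, len, NULL)) {
        free(buf);
        return NULL;
    }
    return buf;
}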

How to find out how much I should read from a socket?

In .NET there is the DataAvailable property in the network stream and the Available property in the tcp client.
However silverlight lacks those.
Should I send a header with the length of the message? I'd rather not waste network resources.
Is there any other way?
You are micro-optimizing. Why do you think that another 4 bytes would affect the performance?
In other words: Use a length header.
Update
I saw your comment on the other answer. You are using BeginRead in the wrong way: it will never block or wait until the entire buffer has been filled.
You should declare a buffer which can receive your entire message. The return value from EndRead will report the number of bytes received.
You should also know that TCP is stream based. There is no guarantee that your entire JSON message will be received at once (or that only your first message is received). Therefore you must have some way of knowing when a message is complete.
And I say it again: A length header will hardly affect the performance.
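The receiving side of a length header is the same in any stream API; in C terms (the running language of these threads), the sketch is just "loop until exactly the promised number of bytes has arrived":

#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read exactly `len` bytes, the amount a length header promised.
   Returns 0 on success, -1 on error or if the peer closed early. */
int read_fully(int fd, char *buf, size_t len) {
    size_t got = 0;
    while (got < len) {
        ssize_t n = read(fd, buf + got, len - got);
        if (n <= 0) return -1;
        got += (size_t)n;
    }
    return 0;
}

Read the fixed-size header with the same loop, decode the length, then call it again for the payload.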
What do you mean by 'waste network resources'? Every network read API I am aware of returns the actual number of bytes read, somehow. What's the actual problem here?
