Parsing HTTP Headers - c

I've had a newfound interest in building a small, efficient web server in C, and I've had some trouble parsing POST data from an HTTP request. Would anyone have any advice on how to retrieve the name/value pairs from the "posted" data?
POST /test HTTP/1.1
Host: test-domain.com:7017
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.1) Gecko/2008070208 Firefox/3.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://test-domain.com:7017/index.html
Cookie: __utma=43166241.217413299.1220726314.1221171690.1221200181.16; __utmz=43166241.1220726314.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none)
Cache-Control: max-age=0
Content-Type: application/x-www-form-urlencoded
Content-Length: 25

field1=asfd&field2=a3f3f3
// ^-this
I see no tangible way to retrieve the bottom line as a whole and ensure that it works every time. I'm not a fan of hard-coding in anything.

You can retrieve the name/value pairs by searching for newline newline or more specifically \r\n\r\n (after this, the body of the message will start).
Then you can simply split the list by the &, and then split each of those returned strings between the = for name/value pairs.
See the HTTP 1.1 RFC.
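The approach above can be sketched in C as follows. This is a minimal illustration, not a complete parser: it assumes the whole request is already in one NUL-terminated, writable buffer, and it does not percent-decode values such as %20 or '+'. The names form_pair, find_body and parse_form_body are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical pair type for this sketch; pointers refer into the body buffer. */
struct form_pair {
    char *name;
    char *value;
};

/* Find the body in a NUL-terminated request; NULL if the blank line is missing. */
static char *find_body(char *request)
{
    char *sep = strstr(request, "\r\n\r\n");
    return sep ? sep + 4 : NULL;
}

/*
 * Split an application/x-www-form-urlencoded body in place.
 * The '&' and '=' separators are overwritten with '\0'.
 * Returns the number of pairs stored (at most max_pairs).
 */
static size_t parse_form_body(char *body, struct form_pair *pairs, size_t max_pairs)
{
    size_t count = 0;
    char *cursor = body;

    while (cursor != NULL && *cursor != '\0' && count < max_pairs) {
        char *next = strchr(cursor, '&');
        if (next != NULL)
            *next++ = '\0';                 /* terminate this pair */

        char *eq = strchr(cursor, '=');
        if (eq != NULL) {
            *eq = '\0';
            pairs[count].value = eq + 1;
        } else {
            pairs[count].value = cursor + strlen(cursor);  /* empty value */
        }
        pairs[count].name = cursor;
        count++;
        cursor = next;
    }
    return count;
}
```

For the request in the question, find_body would point at field1=asfd&field2=a3f3f3 and parse_form_body would yield the pairs field1/asfd and field2/a3f3f3.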

Once you have Content-Length in the header, you know the number of bytes to read right after the blank line. If Content-Length is not in the header of a request (GET or POST) and there is no Transfer-Encoding header either, there is nothing to read after the blank line (CRLF).
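A rough sketch of looking up that value, assuming the header block is NUL-terminated and uses \r\n line endings. The function name content_length is invented for this example, and strncasecmp is a POSIX call (header names are case-insensitive), so a non-POSIX platform would need its own equivalent.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/*
 * Look up Content-Length in a NUL-terminated header block.
 * Returns the value, or -1 if the header is absent or malformed.
 */
static long content_length(const char *headers)
{
    static const char name[] = "Content-Length:";
    const size_t name_len = sizeof name - 1;
    const char *line = headers;

    while (line != NULL && *line != '\0') {
        if (line[0] == '\r' && line[1] == '\n')
            break;                              /* blank line: end of headers */
        if (strncasecmp(line, name, name_len) == 0) {
            const char *p = line + name_len;
            while (*p == ' ' || *p == '\t')     /* optional whitespace */
                p++;
            if (!isdigit((unsigned char)*p))
                return -1;
            return strtol(p, NULL, 10);
        }
        const char *eol = strstr(line, "\r\n");
        line = eol ? eol + 2 : NULL;            /* advance to the next line */
    }
    return -1;
}
```

A return value of -1 on a request without Transfer-Encoding would then be treated as "no body"; otherwise exactly that many bytes follow the blank line.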

You need to keep parsing the stream as headers until you see the blank line. The rest is the POST data.
You need to write a little parser for the post data. You can use C library routines to do something quick and dirty, like index, strtok, and sscanf. If you have room for it in your definition of "small", you could do something more elaborate with a regular expression library, or even with flex and bison.
At least, I think this kind of answers your question.

IETF RFC notwithstanding, here is a more to-the-point answer. Assuming that you realize that there is always an extra \r\n after the last header line (here, Content-Length), you should be able to isolate the body into a char* variable named data. This is where we start.
char *data = "f1=asfd&f2=a3f3f3";
char f1[100];
char f2[100];
// %s would read past the '&', so use scansets that stop at the delimiter
sscanf(data, "%99[^&]&%99[^&]", f1, f2); // get the field tuples
char f1_name[50];
char f1_data[50];
sscanf(f1, "%49[^=]=%49[^&]", f1_name, f1_data); // split name and value at '='
char f2_name[50];
char f2_data[50];
sscanf(f2, "%49[^=]=%49[^&]", f2_name, f2_data);

Related

CURL and my little http client give different results

I have this little C program implementing an HTTP client. Basically I used this reference code with slight modifications, mainly for debugging.
My problem is that the content I get back using this is not quite the same as what I get when I use curl, for example.
Basically what the code does is:
Constructs the HTTP headers for a GET request
Establishes a TCP connection with the server
Sends the HTTP headers
Receives back a response
The read part looks like this:
while((recived_len = recv(sock, BUF, BUFSIZ-1, 0)) > 0)
{
BUF[recived_len] = '\0';
response = (char*)realloc(response, strlen(response) + strlen(BUF) + 1);
sprintf(response, "%s%s", response, BUF);
}
In particular, I always get 4 bytes at the beginning of the body, and I don't understand where they come from:
HTTP/1.1 200 OK
Date: Tue, 20 Apr 2021 09:17:54 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
Server: gws
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Set-Cookie: 1P_JAR=2021-04-20-09; expires=Thu, 20-May-2021 09:17:54 GMT; path=/; domain=.google.com; Secure
Set-Cookie: NID=213=ts4T4alAR2ODEf4vlWrmoZj-cjJcAXACbbxf64Zte4lEbuvUgik6TUgKkdY5OVHDQuTWM59DekV3ayNXDl08TcETU-WwztPVmMFz9BXegk93QFyno5WCS9fJDGq3sSrbFsFjxPOPCLTOx-b8H3a4Ed_HbI9lXBocjGu07ULo8PY; expires=Wed, 20-Oct-2021 09:17:54 GMT; path=/; domain=.google.com; HttpOnly
Accept-Ranges: none
Vary: Accept-Encoding
Connection: close
Transfer-Encoding: chunked
5225 // what is this?
<!doctype html><html dir="rtl" itemscope="" itemtype="http://schema.org/WebPage" lang="iw"><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script nonce="ks4dIW3TspucDhFX4XRWZA==">(function(){window.google={kEI:'Qpx-YKSmGIeSaPf3mJgP',
The headers part looks fine, so I don't think it's a problem with the request's headers, and the beginning of the content (apart from these 4 bytes) also looks fine. However, further down the stream things get messier and the content differs considerably from the curl response, which seems much more reasonable.
Can someone tell me why this is happening?
If I try to send a GET request to
EDIT
OK thanks, I understand these 4 bytes are the length of the chunk, since the Transfer-Encoding is chunked. From what I've read here about it, chunks are sent one after another, with each one preceded by (or followed by, depending on where you look from) \r\n<chunk_length>\r\n
Basically what I'm trying to do is implement an MP3 player, so I want to be able to read an MP3 stream from a streaming server and play it.
When I use curl to get the stream, I can play it. I can even see that the data is sent in chunks as described above.
However, when I use my program, the data I'm getting is for some reason not structured properly.
My final goal in all this is to use an ESP8266 module as a radio player. I'm trying to understand all the moving parts involved in making this work. First I want to implement this on my PC before I move on to the ESP8266.
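For reference, curl removes the chunk framing before writing the body, which a hand-written client has to do itself. Below is a minimal sketch of such a decoder, assuming the complete chunked body is already in memory; a streaming player would need an incremental version. The chunk size is a hexadecimal number, so the 5225 above means 0x5225 bytes. The function name decode_chunked is made up for this example. It takes an explicit length instead of relying on strlen, because MP3 data can contain zero bytes.

```c
#include <stddef.h>
#include <string.h>

/*
 * Decode a complete chunked body held in memory.
 * in/in_len: raw bytes that follow the response headers.
 * out: destination of at least in_len bytes (decoded data is never longer).
 * Returns the decoded length, or -1 if the input is malformed or truncated.
 */
static long decode_chunked(const char *in, size_t in_len, char *out)
{
    size_t pos = 0, out_len = 0;

    for (;;) {
        /* Parse the hexadecimal chunk size. */
        size_t size = 0, digits = 0;
        while (pos < in_len) {
            char c = in[pos];
            int v;
            if (c >= '0' && c <= '9') v = c - '0';
            else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
            else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
            else break;
            size = size * 16 + (size_t)v;
            digits++;
            pos++;
        }
        if (digits == 0 || digits > 8)
            return -1;

        /* Skip the rest of the size line, including any chunk extension. */
        while (pos + 1 < in_len && !(in[pos] == '\r' && in[pos + 1] == '\n'))
            pos++;
        if (pos + 1 >= in_len)
            return -1;
        pos += 2;

        if (size == 0)
            return (long)out_len;   /* last chunk; trailers are ignored here */

        /* Chunk data must be followed by CRLF. */
        if (size > in_len - pos || in_len - pos - size < 2)
            return -1;
        memcpy(out + out_len, in + pos, size);
        out_len += size;
        pos += size;
        if (in[pos] != '\r' || in[pos + 1] != '\n')
            return -1;
        pos += 2;
    }
}
```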
EDIT #2
Thanks to @Emanuel P I noticed that I get back from the server content type of text/html; charset=ISO-8859-1, so I added a Content-Type: */* just as curl does, and I do get back a Content-Type: audio/mpeg, as expected, but it still doesn't work for some reason - the MP3 player doesn't read the file well:
$ ./mp3player.out stream.mp3
Note: Illegal Audio-MPEG-Header 0x04e248be at offset 2258.
Note: Trying to resync...
Note: Skipped 87 bytes in input.
Warning: Big change from first (MPEG version, layer, rate). Frankenstein stream?

Raw http server: send image issue

I'm working with a kind of IoT device. I finally got a simple httpd server to work, and simple HTML pages work like a charm, but the browser does not recognise images. I think this is an HTTP header issue, but I do not know what exactly is wrong.
For example, my test page looks like this:
<html>
<head><title>test page</title></head>
<body>
hello world!
<img src="img.png">
</body>
</html>
If I go to http://de.vi.ce.ip/, 2 requests are generated:
GET / HTTP/1.1\r\n
Accept text/html, application/xhtml+xml, */*\r\n
Accept-Language: en-EN\r\n
...
GET /img.png HTTP/1.1\r\n
Accept image/png, image/svg+xml, image/*;q=0.8, */*;q=0.5\r\n
Accept-Language: en-EN\r\n
...
To which my server responds with:
HTTP/1.0 200 OK\r\n
Content-Type: text/html\r\n
Content-Length: 131\r\n
\r\n
<page data>
HTTP/1.0 200 OK\r\n
Content-Type: image/png\r\n
Content-Length: 5627\r\n
\r\n
<image binary data>
As a result I can see the text, but the images are broken.
I've tried a few more parameters like Connection: close, Accept-Ranges: bytes, Content-Location (path).
I've tried a JPEG image under Content-Type: image/jpeg with no luck. I'm certain that the image is sent correctly.
I've made exactly the same thing, a raw HTTP server for IoT, and your response looks absolutely correct. Try checking the following:
Make sure you correctly flush the socket before closing it. If you call close() right after send(), you will likely encounter this problem: the data has not been completely written.
The Content-Length should be exactly the size of your file. Make sure you are not counting the \r\n bytes of the HTTP response headers. Otherwise the browser may still wait for trailing bytes.
Finally, get the browser network logs :)
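On the first point, one common cause is that send() may accept fewer bytes than requested, which matters more for a 5627-byte image than for a short page. A minimal sketch of a send loop, assuming POSIX sockets (the embedded stack on the device may differ); the name send_all is made up for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Send exactly len bytes; returns 0 on success, -1 on error (errno set by send). */
static int send_all(int sock, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(sock, p, len, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        p += n;                     /* skip what was accepted */
        len -= (size_t)n;
    }
    return 0;
}
```

The length passed in should be the byte count of the file, not strlen() of the buffer, since image data contains zero bytes.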
The request is asking for png
GET /img.png HTTP/1.1\r\n
Why not return the correct content type;
Content-Type: image/png\r\n
I was running into a very similar problem.
In my case when I thought I was using \r\n line terminators, I was actually only using \n; which worked fine in chromium for serving the text/html page, but was throwing net::ERR_INVALID_HTTP_RESPONSE error when serving the image/jpeg. So the page loaded, but the images were broken.
My fix was to make sure that everything was using \r\n as it was supposed to.
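One way to avoid stray \n terminators is to build the header from a single format string in which every line ends in \r\n. A small sketch under that assumption; build_header is a hypothetical helper name, and the status line and header set simply mirror the response shown in the question.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Write a minimal response header into out.
 * Returns the header length, or -1 if it does not fit.
 */
static int build_header(char *out, size_t out_size,
                        const char *content_type, size_t body_len)
{
    int n = snprintf(out, out_size,
                     "HTTP/1.0 200 OK\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %zu\r\n"
                     "\r\n",                    /* blank line ends the headers */
                     content_type, body_len);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}
```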

How to extract data from HTTP header in C?

Today I am asking how to extract the data section from the received buffer in my recv() in C (not C++).
I just need some suggestions, how would I get
HTTP/1.1 200 OK\r\n
Date: Mon, 23 May 2005 22:38:34 GMT\r\n
Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)\r\n
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT\r\n
ETag: "3f80f-1b6-3e1cb03b"\r\n
Content-Type: text/html; charset=UTF-8\r\n
Content-Length: 131\r\n
Connection: close\r\n
\r\n
<html>
<head>
<title>An Example Page</title>
</head>
<body>
Hello World, this is a very simple HTML document.
</body>
</html>
That is, the part after the headers above. The response is stored in my buffer, and I specifically just want to extract the data (the source code of the page). Any ideas?
The header ends with \r\n\r\n. If the whole response is in the receive buffer and you put a '\0' at the end of the response, then you can use the following code to find the start of the data section
char *data = strstr( buffer, "\r\n\r\n" );
if ( data != NULL )
{
data += 4;
// do something with the data
}
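If the buffer cannot be NUL-terminated, or the body may be binary, a length-based search is an alternative to strstr. A minimal sketch; the name find_body_offset is made up for this example.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the offset of the first body byte in buf[0..len),
 * or -1 if the "\r\n\r\n" terminator has not arrived yet.
 */
static long find_body_offset(const char *buf, size_t len)
{
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);
    }
    return -1;
}
```

A return value of -1 can be treated as "keep calling recv() and try again", since the terminator may be split across two reads.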
You need to actually parse the data in order to know where the headers end and the message data begins, and where the message data ends. The headers end with a \r\n\r\n (CRLF+CRLF, 0x0D 0x0A 0x0D 0x0A) byte sequence. So you have to keep reading until you encounter that terminator. Then you have to parse the headers to know how the rest of the message is encoded and how it is terminated. Refer to RFC 2616 Section 4.4 Message Length for the rules. That will tell you HOW to read the remaining data and WHEN to stop reading it. The data might be chunked or compressed or self-terminating. The Content-Type and Transfer-Encoding headers tell you how to interpret the message data.
In your particular example, after reading the headers, per Section 4.4 you would retrieve the value of the Content-Length header and then read exactly 131 bytes, stop reading, and close the socket because of the Connection: close header. You would then retrieve the value of the Content-Type header and know that the data is UTF-8 encoded HTML and process it accordingly.
See the pseudo-code I posted in an earlier answer:
Receiving Chunked HTTP Data With Winsock

Transfer-Encoding: chunked-- Browser does not respond

I have made a very simple web server on my Linux machine using TCP socket programming in C. I am sending it an HTTP GET request from a browser (both Chrome and Mozilla) on the local machine.
The problem is that when I do not set the header
Transfer-Encoding: chunked in the response, the browser successfully displays the webpage.
But when I keep this header, the browser does not respond; it says NO DATA IS AVAILABLE.
EDIT: It works for Firefox now after I added the chunk size (446 bytes) as pointed out by @RomanK.
But Chrome becomes unresponsive.
Here is the code
responseIndex = add(response,"HTTP/1.1 200 OK",responseIndex);
responseIndex = add(response,"Transfer-Encoding: chunked",responseIndex);
responseIndex = add(response,"Content-Type: text/html",responseIndex);
response[responseIndex++]='\r';
response[responseIndex++]='\n';
updateIndex = add(response,"446",updateIndex);
responseIndex = add(response,filebuffer,responseIndex);
response[responseIndex++]='\0';
send(clntSock, response, strlen(response), 0) ;
close(clntSock);
exit(0);
Here, add is a function to append the second argument to response and then append "\r\n".
response is a string.
responseIndex is just an int to keep track of the current length of response.
filebuffer is a string which contains all the text of the html file to be sent.
Response :
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: text/html
446 (or 1EB)
<html>
BODY
</html>
The error code given by chrome is : ERR_INVALID_CHUNKED_ENCODING
Content-Length and chunked transfer encoding are mutually exclusive.
You should omit Content-Length and rather add the chunk size at the start of each chunk as per the Wikipedia article.
Or, in other words, you need to output the chunk size in hexadecimal before this line
responseIndex = add(response,filebuffer,responseIndex);
EDIT: Note that you need to provide the size of the chunk only, not of the entire HTTP response. In your case it should be the size of the HTML body only; for example it looks like your sample body would be 30 or 31 in size in hex (not sure about the whitespace).
So, 3 points:
a) Use hex
b) Use lowercase
c) Use size of the chunk (in your case, the body, as you have a single chunk). Do not include the size of the HTTP meta-data.
It's also a bit questionable that you use chunks in the first place; they should be used only in cases where you do not know the response size when you start generating the response. Here you know the response size at the start and can use Content-Length without Transfer-Encoding: chunked.
The point of chunked transfer is (sorry for tautology) to send data in chunks. The browser doesn't know how many chunks to expect, so you need to tell it that some chunk is the last one. The protocol specifies that the last chunk should be of size 0:
HTTP/1.1 200 OK\r\n
Transfer-Encoding: chunked\r\n
Content-Type: text/html\r\n
\r\n
1be\r\n
Precisely 446 (0x1be) bytes of data\r\n
0\r\n
\r\n
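A sketch of producing such a response for a single chunk, with the size written in hex and the zero-length last chunk appended. The function name build_chunked_response is invented for this example, and the status line and headers are copied from the question; it is an illustration under those assumptions rather than a general-purpose writer.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Format a complete chunked response with a single chunk into out.
 * Returns the total length, or -1 if body_len is 0 or out is too small.
 * The result is not NUL-terminated; send it using the returned length.
 */
static long build_chunked_response(char *out, size_t out_size,
                                   const char *body, size_t body_len)
{
    /* end of chunk data, last chunk, end of message */
    static const char tail[] = "\r\n0\r\n\r\n";
    const size_t tail_len = sizeof tail - 1;

    if (body_len == 0)
        return -1;                  /* a size of 0 would mean "last chunk" */

    int head = snprintf(out, out_size,
                        "HTTP/1.1 200 OK\r\n"
                        "Transfer-Encoding: chunked\r\n"
                        "Content-Type: text/html\r\n"
                        "\r\n"
                        "%zx\r\n",              /* chunk size in hex */
                        body_len);
    if (head < 0 || (size_t)head >= out_size)
        return -1;
    if (body_len > out_size - (size_t)head ||
        out_size - (size_t)head - body_len < tail_len)
        return -1;

    memcpy(out + head, body, body_len);
    memcpy(out + head + body_len, tail, tail_len);
    return (long)((size_t)head + body_len + tail_len);
}
```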

HTTP POST multipart "BAD REQUEST"

I'm trying to upload a file using POST
here's my request :
POST /upload.php HTTP/1.1
Host: localhost
Content-Type: multipart/form-data; boundary=---------------------------552335022525
Content-Length: 192
-----------------------------552335022525
Content-Disposition: form-data; name="userfile"; filename="12.txt"
Content-Type: text/plain
blabla
-----------------------------552335022525--
Using the HTTP Live Headers Firefox plugin, everything works,
but when I put it in a char *buffer and send it with the Winsock API I get a 400 Bad Request error.
You need a blank line between the headers and the payload.
Content-Length: 192
-----------------------------552335022525
This is part of the HTTP protocol. HTTP request headers end with the first empty line (CR-LF by itself.) What you are sending is resulting in the string
-----------------------------552335022525
being taken (along with the following two lines) as a request header which, of course, it isn't. The server can't make head or tail of that, so it responds with 400 Bad Request.
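Also, make sure the Content-Length value is exactly the byte count of the payload, as a wrong value could create problems. The MIME multipart format is self-describing, but an HTTP/1.1 request still needs Content-Length (or chunked transfer encoding) for the server to know where the body ends.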
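A sketch of assembling the request so that the blank line separates the headers from the payload. It formats the body first and derives Content-Length from it instead of hard-coding a number, so the value cannot drift. The function name build_upload_request and the fixed 1024-byte body buffer are assumptions for illustration, the path, host and field name are taken from the question, and only text content without zero bytes is handled.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Build a multipart/form-data POST with one text file part.
 * Returns the request length, or -1 if a buffer is too small.
 */
static int build_upload_request(char *out, size_t out_size,
                                const char *boundary,
                                const char *filename,
                                const char *text)
{
    char body[1024];    /* arbitrary limit for this sketch */
    int body_len = snprintf(body, sizeof body,
                            "--%s\r\n"          /* delimiter: "--" + boundary */
                            "Content-Disposition: form-data; name=\"userfile\"; filename=\"%s\"\r\n"
                            "Content-Type: text/plain\r\n"
                            "\r\n"              /* blank line ends the part headers */
                            "%s\r\n"
                            "--%s--\r\n",       /* closing delimiter */
                            boundary, filename, text, boundary);
    if (body_len < 0 || (size_t)body_len >= sizeof body)
        return -1;

    int n = snprintf(out, out_size,
                     "POST /upload.php HTTP/1.1\r\n"
                     "Host: localhost\r\n"
                     "Content-Type: multipart/form-data; boundary=%s\r\n"
                     "Content-Length: %d\r\n"
                     "\r\n"                     /* blank line ends the request headers */
                     "%s",
                     boundary, body_len, body);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    return n;
}
```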

Resources