How to extract data from HTTP header in C?

Today I am asking how to extract the data section from the received buffer in my recv() in C (not C++).
I just need some suggestions, how would I get
HTTP/1.1 200 OK\r\n
Date: Mon, 23 May 2005 22:38:34 GMT\r\n
Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)\r\n
Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT\r\n
ETag: "3f80f-1b6-3e1cb03b"\r\n
Content-Type: text/html; charset=UTF-8\r\n
Content-Length: 131\r\n
Connection: close\r\n
\r\n
<html>
<head>
<title>An Example Page</title>
</head>
<body>
Hello World, this is a very simple HTML document.
</body>
</html>
the part that follows the header above? The whole response is stored in my buffer, and I just want to extract the data (the source code of the page). Any ideas?

The header ends with \r\n\r\n. If the whole response is in the receive buffer and you put a '\0' at the end of the response, then you can use the following code to find the start of the data section
char *data = strstr( buffer, "\r\n\r\n" );
if ( data != NULL )
{
    data += 4;
    // do something with the data
}
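If the buffer cannot be NUL-terminated, or the body may contain zero bytes, the same search can be done on the byte count returned by recv() instead of relying on strstr(). A minimal sketch, assuming the bytes received so far are in one contiguous buffer; find_body_offset is a hypothetical helper name:

```c
#include <stddef.h>
#include <string.h>

/* Return the offset of the first body byte in buf[0..len), or -1 if the
 * "\r\n\r\n" header terminator has not been received yet. */
long find_body_offset(const char *buf, size_t len)
{
    size_t i;

    for (i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);   /* body starts right after the terminator */
    }
    return -1;
}
```

A return value of -1 simply means "keep calling recv() and try again with the larger length".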

You need to actually parse the data in order to know where the headers end and the message data begins, and where the message data ends. The headers end with a \r\n\r\n (CRLF+CRLF, 0x0D 0x0A 0x0D 0x0A) byte sequence, so you have to keep reading until you encounter that terminator. Then you have to parse the headers to know how the rest of the message is encoded and how it is terminated. Refer to RFC 2616 Section 4.4 Message Length (superseded by RFC 7230 Section 3.3.3) for the rules. That will tell you HOW to read the remaining data and WHEN to stop reading it. The data might be chunked or compressed or self-terminating. The Content-Type and Transfer-Encoding headers tell you how to interpret the message data.
In your particular example, after reading the headers, per Section 4.4 you would retrieve the value of the Content-Length header and then read exactly 131 bytes, stop reading, and close the socket because of the Connection: close header. You would then retrieve the value of the Content-Type header and know that the data is UTF-8 encoded HTML and process it accordingly.
See the pseudo-code I posted in an earlier answer:
Receiving Chunked HTTP Data With Winsock
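For the Content-Length case described above, the header lookup could be sketched as follows. This is only an illustration under simple assumptions (the whole header block is already in memory, no folded header lines, no duplicate headers); parse_content_length and name_matches are hypothetical names:

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Compare the first n bytes case-insensitively (HTTP header names are
 * case-insensitive). */
static int name_matches(const char *s, const char *name, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (tolower((unsigned char)s[i]) != tolower((unsigned char)name[i]))
            return 0;
    }
    return 1;
}

/* Scan the header block hdr[0..hdr_len) line by line for "Content-Length:"
 * and return its value, or -1 if the header is absent or has no digits. */
long parse_content_length(const char *hdr, size_t hdr_len)
{
    static const char name[] = "content-length:";
    const size_t name_len = sizeof name - 1;
    const char *p = hdr;
    const char *end = hdr + hdr_len;

    while (p < end) {
        const char *eol = memchr(p, '\n', (size_t)(end - p));
        size_t line_len = eol ? (size_t)(eol - p) : (size_t)(end - p);

        if (line_len > name_len && name_matches(p, name, name_len)) {
            char digits[32];
            size_t n = 0;
            const char *v = p + name_len;
            const char *line_end = p + line_len;

            while (v < line_end && (*v == ' ' || *v == '\t'))
                v++;                              /* skip optional whitespace */
            while (v < line_end && isdigit((unsigned char)*v)
                   && n < sizeof digits - 1)
                digits[n++] = *v++;               /* copy the decimal digits */
            if (n == 0)
                return -1;
            digits[n] = '\0';
            return strtol(digits, NULL, 10);
        }
        if (eol == NULL)
            break;
        p = eol + 1;                              /* move to the next line */
    }
    return -1;
}
```

With the example response from the question this would yield 131, which is the number of body bytes to read after the blank line.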

Related

CURL and my little http client give different results

I have this little C program implementing a HTTP client. Basically I used this reference code with slight modifications, mainly for debugging.
My problem is that the content I get back with this program is not quite the same as what I get when I use curl, for example.
Basically what the code does is:
Constructs the HTTP headers for a GET request
Establishes a TCP connection with the server
Sends the HTTP headers
Receives the response
The read part looks like this:
while((recived_len = recv(sock, BUF, BUFSIZ-1, 0)) > 0)
{
BUF[recived_len] = '\0';
response = (char*)realloc(response, strlen(response) + strlen(BUF) + 1);
sprintf(response, "%s%s", response, BUF);
}
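A note on the read loop above: it handles the response as a C string (strlen, sprintf with %s), so any zero byte in the body, which is common in binary data such as MP3, cuts the copy short. Passing response as both the destination and a source of sprintf is also undefined behaviour. A minimal sketch of a length-tracking alternative, where struct byte_buf and byte_buf_append are hypothetical names and not part of the original program:

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Growable byte buffer that tracks its own length instead of relying on
 * a terminating '\0'. Start with data = NULL and len = 0. */
struct byte_buf {
    char  *data;
    size_t len;
};

/* Append n raw bytes. Returns 0 on success, -1 if realloc fails (in which
 * case the buffer is left unchanged). */
int byte_buf_append(struct byte_buf *b, const void *src, size_t n)
{
    char *grown;

    if (n == 0)
        return 0;
    grown = realloc(b->data, b->len + n);
    if (grown == NULL)
        return -1;
    memcpy(grown + b->len, src, n);   /* copy bytes as-is, zeros included */
    b->data = grown;
    b->len += n;
    return 0;
}
```

Inside the loop this would be called as byte_buf_append(&resp, BUF, (size_t)recived_len), using the count returned by recv() rather than strlen(BUF).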
In particular, at the beginning of the body I always get 4 bytes whose origin I don't understand:
HTTP/1.1 200 OK
Date: Tue, 20 Apr 2021 09:17:54 GMT
Expires: -1
Cache-Control: private, max-age=0
Content-Type: text/html; charset=ISO-8859-1
P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
Server: gws
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Set-Cookie: 1P_JAR=2021-04-20-09; expires=Thu, 20-May-2021 09:17:54 GMT; path=/; domain=.google.com; Secure
Set-Cookie: NID=213=ts4T4alAR2ODEf4vlWrmoZj-cjJcAXACbbxf64Zte4lEbuvUgik6TUgKkdY5OVHDQuTWM59DekV3ayNXDl08TcETU-WwztPVmMFz9BXegk93QFyno5WCS9fJDGq3sSrbFsFjxPOPCLTOx-b8H3a4Ed_HbI9lXBocjGu07ULo8PY; expires=Wed, 20-Oct-2021 09:17:54 GMT; path=/; domain=.google.com; HttpOnly
Accept-Ranges: none
Vary: Accept-Encoding
Connection: close
Transfer-Encoding: chunked
5225 // what is this?
<!doctype html><html dir="rtl" itemscope="" itemtype="http://schema.org/WebPage" lang="iw"><head><meta content="text/html; charset=UTF-8" http-equiv="Content-Type"><meta content="/images/branding/googleg/1x/googleg_standard_color_128dp.png" itemprop="image"><title>Google</title><script nonce="ks4dIW3TspucDhFX4XRWZA==">(function(){window.google={kEI:'Qpx-YKSmGIeSaPf3mJgP',
The headers part looks fine, so I don't think it's a problem with the request's headers, and the beginning of the content (except for these 4 bytes) also looks fine. However, further down the stream things get messier and the content differs quite a bit from the curl response, which seems much more reasonable.
Can someone tell me why this happening?
If I try to send a GET request to
EDIT
OK thanks, I understand that these 4 bytes are the length of the chunk, since the Transfer-Encoding is chunked. From what I've read here about it, chunks are sent one after another, each preceded by (or followed by, depending on where you look from) \r\n<chunk_length>\r\n
Basically what I'm trying to do is implement a MP3 player, so I want to be able to read a stream of MP3 from a streaming server and play it.
When I use curl to get the stream, I can play it. I can even see that the data is sent in chunks as described above.
However, when I use my program, the data I'm getting is for some reason not structured properly..
My final goal in all this is to use ESP8266 module as a Radio player. I try to understand all the moving parts that should be involved in order to make this work. First I want to implement this on my PC before I move to working on the ESP8266
EDIT #2
Thanks to @Emanuel P I noticed that the server was returning a content type of text/html; charset=ISO-8859-1, so I added a Content-Type: */* header just as curl does, and I now get back Content-Type: audio/mpeg, as expected. It still doesn't work for some reason, though; the MP3 player doesn't read the file well:
$ ./mp3player.out stream.mp3
Note: Illegal Audio-MPEG-Header 0x04e248be at offset 2258.
Note: Trying to resync...
Note: Skipped 87 bytes in input.
Warning: Big change from first (MPEG version, layer, rate). Frankenstein stream?
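The chunk framing described in the first edit (a hexadecimal size line, the chunk data, then \r\n, repeated until a zero-size chunk) has to be stripped before the audio bytes are handed to a player; otherwise the size lines end up inside the MP3 data. A minimal decoding sketch, assuming the complete chunked body is already in memory and ignoring trailers; decode_chunked is a hypothetical name:

```c
#include <stddef.h>
#include <string.h>

/* Decode a complete chunked body in[0..in_len) into out (capacity out_cap).
 * Returns the number of decoded bytes, or -1 if the input is malformed or
 * truncated, or if out is too small. */
long decode_chunked(const char *in, size_t in_len, char *out, size_t out_cap)
{
    size_t pos = 0, written = 0;

    for (;;) {
        unsigned long size = 0;
        size_t digits = 0;

        /* Chunk-size line: hexadecimal digits in either case. */
        while (pos < in_len) {
            int c = (unsigned char)in[pos];
            int v;

            if (c >= '0' && c <= '9')      v = c - '0';
            else if (c >= 'a' && c <= 'f') v = c - 'a' + 10;
            else if (c >= 'A' && c <= 'F') v = c - 'A' + 10;
            else break;
            size = size * 16 + (unsigned long)v;
            digits++;
            pos++;
        }
        if (digits == 0 || digits > 8)
            return -1;

        /* Skip any chunk extension up to the end of the size line. */
        while (pos < in_len && in[pos] != '\n')
            pos++;
        if (pos >= in_len)
            return -1;
        pos++;                              /* step over '\n' */

        if (size == 0)
            return (long)written;           /* last chunk reached */

        if (size > in_len - pos || size > out_cap - written)
            return -1;
        memcpy(out + written, in + pos, size);
        written += size;
        pos += size;

        /* Every chunk's data is followed by CRLF. */
        if (in_len - pos < 2 || in[pos] != '\r' || in[pos + 1] != '\n')
            return -1;
        pos += 2;
    }
}
```

For an endless radio stream the same state machine would have to run incrementally on each recv() result rather than on a complete buffer, which this sketch does not attempt.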

Raw http server: send image issue

I'm working with a kind of IoT device. I finally got a simple httpd server to work, and simple HTML pages work like a charm, but the browser does not recognise images. I think this is an HTTP header issue, but I do not know exactly what is wrong.
For example, my test page looks like this:
<html>
<head><title>test page</title></head>
<body>
hello world!
<img src="img.png">
</body>
</html>
If I go to http://de.vi.ce.ip/, 2 requests are generated:
GET / HTTP/1.1\r\n
Accept text/html, application/xhtml+xml, */*\r\n
Accept-Language: en-EN\r\n
...
GET /img.png HTTP/1.1\r\n
Accept image/png, image/svg+xml, image/*;q=0.8, */*;q=0.5\r\n
Accept-Language: en-EN\r\n
...
To which my server responds with:
HTTP/1.0 200 OK\r\n
Content-Type: text/html\r\n
Content-Length: 131\r\n
\r\n
<page data>
HTTP/1.0 200 OK\r\n
Content-Type: image/png\r\n
Content-Length: 5627\r\n
\r\n
<image binary data>
As a result I can see the text, but the images are broken.
I've tried a few more parameters like Connection: close, Accept-Ranges: bytes, Content-Location (path).
I've tried a JPEG image under Content-Type: image/jpeg with no luck. I'm certain that the image is sent correctly.
I've built exactly the same thing, a raw HTTP server for an IoT device, and your response looks absolutely correct. Try checking the following:
Make sure you flush the socket correctly before closing it. If you call close() right after send(), you will likely encounter this problem, because the data has not been completely written.
The Content-Length should be exactly the size of your file. Make sure you are not counting the \r\n bytes of the HTTP headers, or the browser may keep waiting for trailing bytes.
Finally, get the browser network logs :)
The request is asking for png
GET /img.png HTTP/1.1\r\n
Why not return the correct content type;
Content-Type: image/png\r\n
I was running into a very similar problem.
In my case when I thought I was using \r\n line terminators, I was actually only using \n; which worked fine in chromium for serving the text/html page, but was throwing net::ERR_INVALID_HTTP_RESPONSE error when serving the image/jpeg. So the page loaded, but the images were broken.
My fix was to make sure that everything was using \r\n as it was supposed to.
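Both points raised in the answers (a Content-Length that counts only the file bytes, and \r\n on every header line) can be handled by building the header in one place. A minimal sketch, assuming the body length is known up front; build_response_header is a hypothetical name, and the header set mirrors the response shown in the question:

```c
#include <stddef.h>
#include <stdio.h>

/* Write a minimal HTTP/1.0 response header into out. body_len is the size
 * of the body only (e.g. the image file), not including the header.
 * Returns the header length in bytes, or -1 if it does not fit. */
int build_response_header(char *out, size_t out_cap,
                          const char *content_type, unsigned long body_len)
{
    int n = snprintf(out, out_cap,
                     "HTTP/1.0 200 OK\r\n"
                     "Content-Type: %s\r\n"
                     "Content-Length: %lu\r\n"
                     "\r\n",                 /* blank line ends the headers */
                     content_type, body_len);

    if (n < 0 || (size_t)n >= out_cap)
        return -1;
    return n;
}
```

The returned length is what should be passed to send() for the header; the body is then sent separately with its own byte count, never with strlen(), since image data contains zero bytes.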

Transfer-Encoding: chunked-- Browser does not respond

I have made a very simple web server on my Linux machine using TCP socket programming in C. I am sending it an HTTP GET request from a browser (both Chrome and Mozilla Firefox) on the local machine.
The problem is that when I do not set the header
Transfer-Encoding: chunked in the response, the browser successfully displays the webpage.
But when I keep this header, the browser does not respond; it says NO DATA IS AVAILABLE.
EDIT: It works in Firefox now after I added the chunk size (446 bytes), as pointed out by @RomanK.
But Chrome becomes unresponsive.
Here is the code
responseIndex = add(response,"HTTP/1.1 200 OK",responseIndex);
responseIndex = add(response,"Transfer-Encoding: chunked",responseIndex);
responseIndex = add(response,"Content-Type: text/html",responseIndex);
response[responseIndex++]='\r';
response[responseIndex++]='\n';
updateIndex = add(response,"446",updateIndex);
responseIndex = add(response,filebuffer,responseIndex);
response[responseIndex++]='\0';
send(clntSock, response, strlen(response), 0) ;
close(clntSock);
exit(0);
Here, add is a function that appends the second argument to response and then appends "\r\n".
response is a string.
responseIndex is just an int to keep track of the current length of response.
filebuffer is a string which contains all the text of the html file to be sent.
Response :
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Content-Type: text/html
446 (or 1EB)
<html>
BODY
</html>
The error code given by chrome is : ERR_INVALID_CHUNKED_ENCODING
Content-Length and chunked transfer encoding are mutually exclusive.
You should omit Content-Length and rather add the chunk size at the start of each chunk as per the Wikipedia article.
Or, in other words, you need to output the chunk size in hexadecimal before this line
responseIndex = add(response,filebuffer,responseIndex);
EDIT: Note that you need to provide the size of the chunk only, not of the entire HTTP response. In your case it should be the size of the HTML body only; for example it looks like your sample body would be 30 or 31 in size in hex (not sure about the whitespace).
So, 3 points:
a) Use hex
b) Use lowercase
c) Use size of the chunk (in your case, the body, as you have a single chunk). Do not include the size of the HTTP meta-data.
It's also a bit questionable that you use chunks in the first place; they should be used only in cases where you do not know the response size when you start generating the response. Here you know the response size at the start and can use Content-Length without Transfer-Encoding: chunked.
The point of chunked transfer is (sorry for tautology) to send data in chunks. The browser doesn't know how many chunks to expect, so you need to tell it that some chunk is the last one. The protocol specifies that the last chunk should be of size 0:
HTTP/1.1 200 OK\r\n
Transfer-Encoding: chunked\r\n
Content-Type: text/html\r\n
\r\n
1be\r\n
Precisely 446 (hex 1be) bytes of data\r\n
0\r\n
\r\n
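Putting the advice together, a body sent as a single chunk consists of the chunk size in hexadecimal, \r\n, the data, \r\n, and then the terminating zero-size chunk followed by a final \r\n. A minimal sketch of that framing, written into a caller-supplied buffer; encode_single_chunk is a hypothetical name and the response headers are assumed to be sent separately:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Frame body[0..body_len) as one chunk followed by the last (zero-size)
 * chunk. Returns the number of bytes written to out, or -1 if out is
 * too small. */
long encode_single_chunk(char *out, size_t out_cap,
                         const char *body, size_t body_len)
{
    static const char tail[] = "\r\n0\r\n\r\n";
    const size_t tail_len = sizeof tail - 1;
    size_t total;
    /* Chunk size goes out as lowercase hex, e.g. 446 bytes -> "1be". */
    int head = snprintf(out, out_cap, "%lx\r\n", (unsigned long)body_len);

    if (head < 0 || (size_t)head >= out_cap)
        return -1;
    total = (size_t)head + body_len + tail_len;
    if (total > out_cap)
        return -1;
    memcpy(out + head, body, body_len);             /* chunk data */
    memcpy(out + head + body_len, tail, tail_len);  /* CRLF + last chunk */
    return (long)total;
}
```

The return value, not strlen(), is the length to pass to send(), so the framing stays correct even if the body contains zero bytes.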

Arduino Ethercard - return content of website

I am trying to access a website, and then return whatever it outputs in the body -> eg. "Success" or "Failed".
When I try with my code, I am getting the following back.
<<< REQ >>>
HTTP/1.1 200 OK
Date: Sat, 30 Aug 2014 17:36:31 GMT
Content-Type: text/html
Connection: close
Set-Cookie: __cfduid=d8a4fc3c84849b6786c6ca890b92e2cc01409420191023; expires=Mon, 23-Dec-2019 23:50:00 GMT; path=/; domain=.japseyz.com; HttpOnly
Vary: Accept-Encoding
X-Powered-By: PHP/5.3.28
Server.
My code is: http://pastebin.com/WwWbnLNn
If all you want to know is whether the HTTP transaction succeeded or failed, then you need to examine the HTTP Response code... which is in the first line of the response. In your example it is "200"... the human readable interpretation of it is "OK".
Here is a link to most of the HTTP 1.1 response codes: w3.org-rfc2616 RespCodes
Your question indicated you wanted to extract this information from the "body"...
... but that information is not located in the "body", it is in the first response
header, as described above.
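Reading the status code from the first line could look roughly like this. It is a sketch that assumes the start of the response is available as a NUL-terminated string and that sscanf is usable on the target; parse_status_code is a hypothetical name and not part of the EtherCard library:

```c
#include <stdio.h>

/* Extract the numeric status code from a status line such as
 * "HTTP/1.1 200 OK". Returns the code, or -1 if the line does not
 * look like an HTTP status line. */
int parse_status_code(const char *response)
{
    int major, minor, code;

    if (sscanf(response, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return -1;
    if (code < 100 || code > 599)
        return -1;
    return code;
}
```

A result in the 200 range would then indicate that the HTTP transaction itself succeeded.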
Have you tried the EtherCard samples? There is a webclient sample containing a procedure called CALLBACK, in which you can process the data stored in the buf variable.
In your case you need to look for the first empty line, which marks the end of the headers; the page content (e.g. what PHP writes to the page) follows it.
How familiar are you with pointers? How deeply do you need to process the page output? Is OK or ERROR enough, or do you need to pass some parameters back to the Arduino?

Getting body part headers with IMAP C-Client

I am using UW IMAP c-client v. 2007e and I couldn't find the following function: I need to retrieve a header of certain body part. I.e., the mail message contains multiple body parts, and one of them is an attachment looking like this:
--_004_57D6D2035A40B8ECEBA59CB9C13F52A1334093MBXC18_
Content-Type: text/plain; name="func.txt"
Content-Description: func.txt
Content-Disposition: attachment; filename="func.txt"; size=604;
creation-date="Thu, 12 Aug 2010 18:41:40 GMT";
modification-date="Thu, 12 Aug 2010 18:41:40 GMT"
Content-Transfer-Encoding: base64
...blah-blah-blah in base64...
Now, I can retrieve the blah-blah part with imap_fetchbody() but I don't see the function that can give me only the headers part (or headers+body, that would be OK too). I can get whole message text, but then I'd have to parse it myself into parts, which I don't want to do. Is there a way I could make c-client retrieve this header for certain part?
You want the MIME section of the part. From the IMAP RFC:
The MIME part specifier refers to the [MIME-IMB] header for
this part.
The c-client function for doing a FETCH is:
long imap_msgdata (MAILSTREAM *stream,unsigned long msgno,char *section,
unsigned long first,unsigned long last,STRINGLIST *lines,
long flags)
So if you want the headers from part 2.3 of message #8 (for instance), you'd call it as:
imap_msgdata(stream, 8, "2.3.MIME", 0, 0, NIL, FT_PEEK);
(FT_PEEK keeps the message from being marked as \Seen. If you want to mark the message read, pass 0 as the last argument.)
