Remove HTTP Header Info - c

In C is there a way to exclude the HTTP header information that comes with the data when using recv() on a socket? I am trying to read some binary data and all I want is the actual binary information, not the HTTP header information. The current data received looks like this:
HTTP/1.1 200 OK
Content-Length: 3314
Content-Type: image/jpeg
Last-Modified: Tue, 20 Mar 2012 14:51:34 GMT
Accept-Ranges: bytes
ETag: "45da99f1a86cd1:6b9"
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Date: Mon, 20 Aug 2012 14:10:08 GMT
Connection: close
╪ α
I would like only to read the binary portion of the file. (That's obviously not all the binary, only that much was displayed since I printed the output from my recv loop as a string and the first NULL char is after that small binary string).
I just need to get rid of the header portion, is there a simple way to do this?

You would be better off using an HTTP client library such as libcurl.
If you want to do it yourself:
You can search for '\r\n\r\n' (two \r\n) which separates HTTP headers and contents, and use string/buffer after that.
Also, you need to get Content-Length from header and read that many bytes as http content.
Something like:
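/* http_resp holds the data read by recv(); it must be NUL-terminated
   (http_resp[n] = '\0' after receiving n bytes) for strstr() to be safe */
char *httpbody = strstr(http_resp, "\r\n\r\n");
if(httpbody)
    httpbody += 4; /* move ahead 4 chars, past the empty line */
/* now httpbody points at just the data, with the HTTP headers stripped */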
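Note: make sure strstr does not overrun the memory. It only works on a NUL-terminated buffer, so reserve one byte for the terminator. A length-bounded search avoids the issue: strnstr exists on BSD and macOS (not in glibc), and memmem is available as a GNU/BSD extension.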
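Because the body here is binary and strstr() needs a NUL-terminated buffer, a length-bounded search is the safer option. A minimal portable sketch using memcmp(); find_body_offset is a hypothetical helper name, not a library function:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: find the end of the HTTP header block in a buffer
 * that may contain binary data (and therefore NUL bytes). Returns the
 * offset of the first body byte, or -1 if "\r\n\r\n" is not (yet) within
 * the first len bytes, e.g. because more data still has to be received. */
static long find_body_offset(const char *buf, size_t len)
{
    static const char sep[] = "\r\n\r\n";
    const size_t seplen = sizeof sep - 1;   /* 4, without the NUL */

    for (size_t i = 0; i + seplen <= len; i++) {
        if (memcmp(buf + i, sep, seplen) == 0)
            return (long)(i + seplen);
    }
    return -1;
}
```

If it returns off >= 0, the body starts at buf + off and the number of body bytes received so far is len - off.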

I think you need to extract the value of the Content-Length to know the size of the binary data to be read otherwise it will be impossible to know whether all data has been received. A simple approach to consume, and mostly ignore, the header portion is to read the incoming data byte-by-byte until "\r\n\r\n" is encountered, which indicates the end of the header section and the beginning of the content.
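A minimal sketch of that approach, assuming a POSIX system (on Windows, Winsock's recv() behaves similarly but needs different headers and a SOCKET handle). read_http_headers, content_length and recv_exact are hypothetical helper names. Reading one byte per recv() call guarantees that no body bytes are consumed together with the headers, at the cost of many system calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <strings.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: recv() one byte at a time until the empty line
 * ("\r\n\r\n") that ends the HTTP headers. The headers are stored
 * NUL-terminated in hdr. Returns their length, or -1 on error, on EOF,
 * or if they do not fit in cap - 1 bytes. No body byte is consumed. */
static long read_http_headers(int fd, char *hdr, size_t cap)
{
    size_t n = 0;
    while (n + 1 < cap) {
        char c;
        if (recv(fd, &c, 1, 0) != 1)
            return -1;                  /* error or connection closed */
        hdr[n++] = c;
        if (n >= 4 && memcmp(hdr + n - 4, "\r\n\r\n", 4) == 0) {
            hdr[n] = '\0';
            return (long)n;
        }
    }
    return -1;                          /* headers too large */
}

/* Hypothetical helper: value of the Content-Length header, or -1 if absent.
 * Header names are case-insensitive, hence strncasecmp(). */
static long content_length(const char *hdr)
{
    const char *line = hdr;
    while ((line = strstr(line, "\r\n")) != NULL) {
        line += 2;                      /* start of the next header line */
        if (strncasecmp(line, "Content-Length:", 15) == 0)
            return strtol(line + 15, NULL, 10);
    }
    return -1;
}

/* Read exactly len body bytes; recv() may return fewer than requested. */
static int recv_exact(int fd, char *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        ssize_t r = recv(fd, buf + got, len - got, 0);
        if (r <= 0)
            return -1;                  /* error or connection closed early */
        got += (size_t)r;
    }
    return 0;
}
```

After read_http_headers() succeeds, call content_length() on the header buffer and, if it returns a non-negative value, recv_exact() that many bytes into a buffer of that size.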


Redirecting to permanently moved page with response code 302

I am sending this request from my C code:
char * request = "GET / HTTP/1.1\r\n" \
"Host: www.some.com\r\n" \
"Connection: keep-alive\r\n" \
"User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36\r\n" \
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\r\nAccept-Language: en-US,en;q=0.9\r\nAccept-Encoding: gzip, deflate\r\n\r\n";
But I get this response after sending the above request:
HTTP/1.1 302 Found
Location: https://www.some.com/?gws_rd=ssl
Cache-Control: private
Content-Type: text/html; charset=UTF-8
BFCache-Opt-In: unload
Date: Thu, 24 Feb 2022 06:17:10 GMT
Server: gws
Content-Length: 231
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Set-Cookie: 1P_JAR=2022-02-24-06; expires=Sat, 26-Mar-2022 06:17:10 GMT; path=/; domain=.some.com; Secure; SameSite=none
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
here.
</BODY></HTML>
This is the full message. So if I follow https://www.some.com/?gws_rd=ssl I don't get any data; it's as if the request is sent but no data is received. I am sending this request to https://www.some.com/?gws_rd=ssl:
char *x="GET / https://www.some.com/?gws_rd=ssl\r\n\r\n";
Why is that? What's wrong with my HTTP/HTTPS?
I am using OpenSSL.
So after I send the initial request, the server redirects to a new URL. When I follow the new URL, nothing happens: no response data.
Code:
/* filename nossl.c */
#include <stdio.h>
#include <string.h>
#include <openssl/ssl.h>
#include <openssl/bio.h>
#include <openssl/err.h>
#include "openssl/err.h"
int main()
{
BIO * bio;
char resp[1024];
int ret;
//char * request = "GET /cas/login?service=https%3A%2F%2Fweb.corp.ema-tech.com%3A8888%2F HTTP/1.1\x0D\x0AHost: web.corp.ema-tech.com\x0D\x0A\x43onnection: Close\x0D\x0A\x0D\x0A";
char * request = "GET / HTTP/1.1\r\n" \
"Host: www.yoursite.com\r\n" \
"Connection: keep-alive\r\n" \
"User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36\r\n" \
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\r\nAccept-Language: en-US,en;q=0.9\r\nAccept-Encoding: gzip, deflate\r\n\r\n";
char *x="GET / https://www.yoursite.com/?gws_rd=ssl\r\n\r\n";
/* Set up the library */
ERR_load_BIO_strings();
SSL_load_error_strings();
OpenSSL_add_all_algorithms();
/* Create and setup the connection */
//bio = BIO_new_connect("web.corp.ema-tech.com:8888");
printf("___________________________+\n");
bio = BIO_new_connect("www.yoursite.com:80");
if(bio == NULL) {
printf("====___________________________-\n");
printf("BIO is null\n");
return 1;
}
if(BIO_do_connect(bio) <= 0) {
printf("+++++___________________________#\n");
BIO_free_all(bio);
return 1; /* bio has been freed, so it must not be used below */
}
printf("___________________________#^\n");
/* Send the request */
BIO_write(bio, request, strlen(request));
printf("___________________________0\n");
/* Read in the response */
for(;;) {
ret = BIO_read(bio, resp, 1023);
printf("----%d\n",ret);
if(ret <= 0) break;
resp[ret] = 0;
printf("%s\n", resp);
}
BIO_write(bio, x, strlen(x)); /* sizeof would also send the terminating NUL byte */
for(;;) {
ret = BIO_read(bio, resp, 1023);
printf("----%d\n",ret);
if(ret <= 0) break;
resp[ret] = 0;
printf("%s\n", resp);
}
/* Close the connection and free the context */
BIO_free_all(bio);
return 0;
}
If your first request was plain HTTP (not HTTPS), the server is essentially telling you to use HTTPS instead. That means connecting with TLS on port 443 (your code connects to port 80 without TLS), and your request would be
char * request = "GET /?gws_rd=ssl HTTP/1.1\r\n" \
"Host: www.some.com\r\n" ...
The /?gws_rd=ssl is the local resource name (/) and a query string (?gws_rd=ssl) from https://www.some.com/?gws_rd=ssl, while the host name www.some.com goes to the "Host:" header.
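That split can be sketched as a small function. build_get_request is a hypothetical helper; it only handles URLs of the form scheme://host/path?query (no explicit port, no user info), and it sends "Connection: close", an assumption that keeps a first version simple. For an https:// URL the resulting request still has to be sent over a TLS connection to port 443.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: split an absolute URL such as
 * "https://www.some.com/?gws_rd=ssl" into the host ("www.some.com") and
 * the resource ("/?gws_rd=ssl"), then format an HTTP/1.1 GET request.
 * Returns the snprintf() result, or -1 if there is no "://" in the URL. */
static int build_get_request(const char *url, char *out, size_t outsz)
{
    const char *host = strstr(url, "://");
    if (host == NULL)
        return -1;
    host += 3;                          /* skip "://" */

    const char *resource = strchr(host, '/');
    size_t hostlen = resource ? (size_t)(resource - host) : strlen(host);
    if (resource == NULL)
        resource = "/";                 /* an empty path means "/" */

    return snprintf(out, outsz,
                    "GET %s HTTP/1.1\r\n"
                    "Host: %.*s\r\n"
                    "Connection: close\r\n"
                    "\r\n",
                    resource, (int)hostlen, host);
}
```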
Some servers only accept the connection if you use the Server Name Indication (SNI) TLS extension (OpenSSL: "SSL_set_tlsext_host_name") to supply the host name as well.
You could also consider using an existing C library for an HTTPS client, for example:
libcurl (https://curl.se/libcurl/ - libcurl is one of the most used HTTP/HTTPS client libraries in C)
CivetWeb (https://github.com/civetweb/civetweb/blob/master/docs/api/mg_connect_client_secure.md - actually a server with some additional client functions; disclaimer: I am in the maintainer team of this server).
Both are open source under permissive MIT-style licenses.
Edit:
Actually I need to know difference between openssl and https
HTTPS is a communication protocol (HyperText Transfer Protocol Secure).
OpenSSL is a crypto library.
The HTTP protocol stack looks like this:
HTTP:  [HTTP]
       [TCP/IP]
The HTTPS stack looks like this:
HTTPS: [HTTP]
       [TLS (= SSL)]
       [TCP/IP]
SSL stands for Secure Sockets Layer, and TLS (Transport Layer Security) is the successor of SSL. OpenSSL implements all versions of TLS (1.0, 1.1, 1.2 and, since OpenSSL 1.1.1, 1.3). Older releases also supported SSL 2.0 and 3.0 (both deprecated; SSL 2.0 support was removed in OpenSSL 1.1.0).
OpenSSL can provide the middle part of the HTTPS stack, but you still need the top and bottom parts. Those are the same as for HTTP, so TLS (the protocol), or rather OpenSSL (a library implementing it), is simply inserted in the middle.
To see this in action, try reading from www.google.com with the OpenSSL command-line client (-crlf makes the Enter key send "\r\n", as HTTP expects):
$ openssl s_client -connect www.google.com:443 -crlf
The server will provide some information, in particular the server certificate. Then you type:
GET / HTTP/1.1
Host: www.google.com
Connection: close
After you enter the empty line at the end, the server will send a response header:
HTTP/1.1 200 OK
Date: ..
Server: gws
Connection: close
This is followed by an empty line and finally an HTML page.
This OpenSSL command-line client implements the TLS layer and uses the TCP/IP layer of the operating system. But you have to provide the HTTP layer on top: the four lines of text (GET ..., Host ..., Connection ... and the empty line at the end) are a valid HTTP request.
The full source of s_client can be found here: https://github.com/openssl/openssl/blob/master/apps/s_client.c
The source is lengthy because it provides a hundred different options.
A much smaller client example with more explanation can be found here:
https://wiki.openssl.org/index.php/SSL/TLS_Client
You will find the same four lines for the HTTP protocol in this example:
BIO_puts(web, "GET " HOST_RESOURCE " HTTP/1.1\r\n"
"Host: " HOST_NAME "\r\n"
"Connection: close\r\n\r\n");
In your code you used "Connection: keep-alive". That's perfectly fine if you want to make multiple HTTP requests over the same connection. Just make sure the last request you make uses "Connection: close". Also be aware that an HTTP server may decide to close the connection at any time, for example by sending a "Connection: close" header.
"Connection: close" is easier to begin with.
If you only want to download a web page, these four lines of text are usually enough, unless you need a login/cookies/access token/... for a specific web site. More complex requests such as POST (e.g., submitting a web form) will require more code on top of OpenSSL. If you need this, you should consider using an additional library instead of implementing it on your own.
The response of the server needs to be split into header (everything above the first empty line) and body (everything below). Depending on the header, it might be required to interpret the body data differently.
For example, www.google.com sends a "Transfer-Encoding: chunked" header line (instead of "Content-Length: ####"). These are two different ways a server can tell the client how long the body is. If you get a "Content-Length: 1234" header, you know that your HTTP implementation has to read exactly 1234 bytes of body.
If you get a "Transfer-Encoding: chunked" header, the server first sends a hex number followed by "\r\n", then as many bytes as that hex number states, followed by another "\r\n". Then comes another hex number, "\r\n", more data, and so on. Finally a hex number "0" indicates the end of the data. The hex numbers and the "\r\n" separators are not part of the HTML page; you need to remove them (if you keep them, you will end up with broken HTML, or whatever else you are downloading).
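The chunk decoding described above can be sketched as follows, assuming the whole chunked body is already in memory (a streaming decoder would follow the same steps but read incrementally). dechunk is a hypothetical helper; note that every chunk's data is itself followed by "\r\n", which must also be skipped.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode a complete "Transfer-Encoding: chunked" body
 * that is already in memory. Each chunk is "<hex size>[;ext]\r\n<data>\r\n";
 * a chunk of size 0 ends the body (trailer headers after it are ignored).
 * Writes the decoded bytes to out and returns their count, or -1 if the
 * input is malformed or truncated, or the result does not fit in outcap. */
static long dechunk(const char *in, size_t inlen, char *out, size_t outcap)
{
    size_t pos = 0, outlen = 0;

    for (;;) {
        /* 1. Parse the hexadecimal chunk size. */
        size_t size = 0;
        int digits = 0;
        for (; pos < inlen; pos++, digits++) {
            char c = in[pos];
            unsigned v;
            if (c >= '0' && c <= '9')      v = (unsigned)(c - '0');
            else if (c >= 'a' && c <= 'f') v = (unsigned)(c - 'a' + 10);
            else if (c >= 'A' && c <= 'F') v = (unsigned)(c - 'A' + 10);
            else break;
            if (size > (SIZE_MAX >> 4))
                return -1;                 /* size would overflow */
            size = size * 16 + v;
        }
        if (digits == 0)
            return -1;

        /* 2. Skip optional chunk extensions up to the CRLF ending the line. */
        while (pos + 1 < inlen && !(in[pos] == '\r' && in[pos + 1] == '\n'))
            pos++;
        if (pos + 1 >= inlen)
            return -1;
        pos += 2;

        if (size == 0)
            return (long)outlen;           /* last chunk reached */

        /* 3. Copy the chunk data, which must be followed by CRLF. */
        if (size > inlen - pos || inlen - pos - size < 2 || size > outcap - outlen)
            return -1;
        memcpy(out + outlen, in + pos, size);
        outlen += size;
        pos += size;
        if (in[pos] != '\r' || in[pos + 1] != '\n')
            return -1;
        pos += 2;
    }
}
```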
If a server neither sends "Content-Length:" nor "Transfer-Encoding:" then you need to read until the server closes the connection.
This is also part of the HTTP protocol that has to be implemented on top of OpenSSL for an HTTPS client. You will have to implement all three cases in an HTTP or HTTPS client, unless you only need to communicate with one server and you know it only uses "Content-Length: ####".
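Choosing between the three cases can be sketched like this. choose_framing and the enum are hypothetical names, and the function makes the simplifications listed in its comment.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <string.h>
#include <strings.h>

/* The three ways a response body can be delimited. */
enum body_framing { FRAMING_CHUNKED, FRAMING_LENGTH, FRAMING_UNTIL_CLOSE };

/* Hypothetical helper: given the NUL-terminated response header block
 * (status line plus header lines, each ending in "\r\n"), decide how the
 * body has to be read. A chunked Transfer-Encoding wins over
 * Content-Length when both are present. *length is set only for
 * FRAMING_LENGTH. Simplifications: any Transfer-Encoding value mentioning
 * "chunked" counts as chunked, and responses that never have a body
 * (HEAD requests, 1xx, 204, 304) are not special-cased. */
static enum body_framing choose_framing(const char *hdr, long *length)
{
    enum body_framing result = FRAMING_UNTIL_CLOSE;
    const char *line = hdr;

    while ((line = strstr(line, "\r\n")) != NULL) {
        line += 2;                                   /* next header line */
        const char *eol = strstr(line, "\r\n");
        if (eol == NULL)
            eol = line + strlen(line);

        if (strncasecmp(line, "Transfer-Encoding:", 18) == 0) {
            for (const char *p = line + 18; p + 7 <= eol; p++)
                if (strncasecmp(p, "chunked", 7) == 0)
                    return FRAMING_CHUNKED;
        } else if (strncasecmp(line, "Content-Length:", 15) == 0) {
            *length = strtol(line + 15, NULL, 10);
            result = FRAMING_LENGTH;
        }
    }
    return result;
}
```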

File descriptor does write data in C

I am using the joyent http-parser and libevent to create my own simple web server in C.
I have it connecting to the port and receiving the HTTP requests fine, however, I am not able to send a response using the socket.
So in one window I enter:
$ curl localhost:5555/a.txt
and my server receives and correctly handles the request, up to the point where I do the following:
/* http-parser's on_message_complete callback has type http_cb: int (*)(http_parser*) */
int message_complete_req_callback(http_parser* parser)
{
ssize_t i;
http_request_t* http_request = parser->data;
http_response_t* http_response = generate_response(http_request);
printf("Writing %zu:\n%s", strlen(http_response->full_message), http_response->full_message);
i = write(http_request->fd, http_response->full_message, strlen(http_response->full_message));
/* no fsync() here: it is meant for files and fails with EINVAL on a socket */
printf("Wrote: %zd\n", i);
return 0;
}
Which prints the following:
Writing 96:
HTTP/1.0 200 OK
Tue, 04 Aug 2015 10:20:58 AEST
Server: mywebserver/jnd
Connection: close
Wrote: 96
However my curl instance doesn't receive anything. Any ideas?
Thanks in advance.
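Your response contains no data, just headers, and by default curl prints only the content and hides the headers (use curl -i to see them). Not only that, but you're responding with HTTP/1.0, which is a long way out of date. In HTTP/1.0 the connection is closed after each response by default, so Connection: close only really matters in 1.1, where connections are kept open by default. Note also that your date line is missing its "Date:" field name, and that the header section must end with an empty line.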
For curl to report anything you need to send some content. I'd expect the output to be something like:
Writing 132:
HTTP/1.1 200 OK
Date: Tue, 04 Aug 2015 00:20:58 GMT
Server: mywebserver/jnd
Connection: close
Content-Length: 12

Hello World
Wrote: 132
Which would trigger curl to print:
Hello World
Notice the 12 character content length includes 1 character for the line feed. The content is Hello World<lf>.
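To keep the Content-Length and the empty line from going wrong, the server can build the response with snprintf() and compute the length from the body. A minimal sketch: build_response is a hypothetical helper, the header values are just the ones from the question, and the Date value is passed in by the caller (HTTP dates are expressed in GMT).

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format a complete HTTP/1.1 response into out.
 * Content-Length is computed from the body, so it cannot get out of sync,
 * and the empty line ("\r\n") that ends the headers is always present.
 * Returns the total length, or -1 if it does not fit in outsz. */
static int build_response(char *out, size_t outsz,
                          const char *date, const char *body)
{
    int n = snprintf(out, outsz,
                     "HTTP/1.1 200 OK\r\n"
                     "Date: %s\r\n"
                     "Server: mywebserver/jnd\r\n"
                     "Content-Type: text/plain\r\n"
                     "Content-Length: %zu\r\n"
                     "Connection: close\r\n"
                     "\r\n"
                     "%s",
                     date, strlen(body), body);
    if (n < 0 || (size_t)n >= outsz)
        return -1;
    return n;
}
```

The caller would then write() all n bytes to the socket (looping, since write() may send fewer bytes than requested) and close the socket, which tells curl that the response is complete.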

Microchip webserver, don't receive ack after HTTP 200 OK frame

My current project is a bare-metal webserver. For this I'm using no libraries, and programming directly onto the chip. What I'm trying to do right now is send a single piece of HTTP data:
HTTP/1.1 200 OK\r\n
Content-Length: 47\r\n
Content-Type: text/html\r\n
Server: microserver\r\n
Connection: close\r\n
\r\n
<!DOCTYPE html><html>Hello<br>world!<hr></html>
My server tries to go through the following steps:
Receive SYN
Send [SYN+ACK]
Receive ACK
Receive ACK containing HTTP GET
Send [ACK,PUSH,FIN] with HTTP data (this one changed a lot: I've tried sending ACK, PUSH and FIN separately (with the content in PUSH), and I've also tried [ACK+PUSH],FIN and ACK,[PUSH+FIN]).
Receive [ACK+FIN] <<--- Here's where it goes wrong, this one is never even sent, according to wireshark.
Send ACK.
As I said, it goes wrong at step 6, every single time, no matter what combination of ACK, PUSH and FIN I use in step 5. Looking at it in Wireshark, all the SEQ and ACK numbers are correct.
My server is able to close connections once the [FIN+ACK] finally does get sent, which sometimes happens on sockets that are kept open by the browser synchronously.
Pcap file of what wireshark records: https://www.dropbox.com/s/062k9hkgkenaqij/httpdump.pcap with as filter: (tcp.dstport == 80 || tcp.srcport == 80) && (ip.dst == 169.254.100.100 || ip.src == 169.254.100.100)
I know there is a 4 year old very similar question, Building a webserver, client doesn't acknowledge HTTP 200 OK frame, but I've tried pretty much everything that was suggested in there, and it didn't seem to get me any further, so I'm out of ideas.
EDIT:
I have found the problem after studying Wireshark captures for hours on end. The mistake was that I did not include the payload data when calculating the TCP checksum. But well, found the solution.
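For reference, the TCP checksum (RFC 793) is the 16-bit one's complement of the one's complement sum over a pseudo-header, the TCP header and the payload; leaving out the payload is exactly the mistake described above. A minimal sketch for IPv4; sum16, fold and tcp_checksum are hypothetical names, and the 32-bit accumulator is large enough for segments up to 64 KiB.

```c
#include <stddef.h>
#include <stdint.h>

/* Add 16-bit big-endian words to a running one's complement sum
 * (RFC 1071). An odd trailing byte is padded with a zero byte. */
static uint32_t sum16(const uint8_t *data, size_t len, uint32_t sum)
{
    while (len > 1) {
        sum += (uint32_t)((data[0] << 8) | data[1]);
        data += 2;
        len -= 2;
    }
    if (len == 1)
        sum += (uint32_t)(data[0] << 8);
    return sum;
}

/* Fold the carries back into 16 bits and take the one's complement. */
static uint16_t fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* TCP checksum over the 12-byte IPv4 pseudo-header (source IP, destination
 * IP, zero, protocol 6, TCP length) plus the whole TCP segment, i.e. the
 * header AND the payload. The checksum field (segment bytes 16-17) must be
 * zero while computing; store the result there high byte first. */
static uint16_t tcp_checksum(const uint8_t src_ip[4], const uint8_t dst_ip[4],
                             const uint8_t *segment, size_t seg_len)
{
    const uint8_t pseudo[12] = {
        src_ip[0], src_ip[1], src_ip[2], src_ip[3],
        dst_ip[0], dst_ip[1], dst_ip[2], dst_ip[3],
        0, 6, (uint8_t)(seg_len >> 8), (uint8_t)seg_len
    };
    uint32_t sum = sum16(pseudo, sizeof pseudo, 0);
    sum = sum16(segment, seg_len, sum);
    return fold(sum);
}
```

A quick self-check: recomputing the checksum over a segment that already contains the correct checksum yields 0.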

Meaning of libcurl messages and execution process

I am using the libcurl library to fetch an abc-1.tar file from a server. I want to know the meaning of the messages that are displayed, and how libcurl's execution produces these messages.
For example, I include some messages below. I know the basic meaning of some of them, e.g. Content-Length is the length of the downloaded file.
I want the meaning of all the messages, particularly those that start with * (e.g. Connection #0 to host (nil) left intact).
* Re-using existing connection! (#0) with host (nil)
* Connected to (nil) (182.72.67.14) port 65101 (#0)
GET /...... HTTP/1.1
Host: 182.72.67.14:65101
Accept: */*
Connection:keep-alive
< HTTP/1.1 200 OK
< Cache-Control: private
< Content-Length: 186368
< Content-Type: application/x-tar
< Server: Microsoft-IIS/7.5
< Content-Disposition: attachment; filename=abc-1.tar
< X-AspNet-Version: 4.0.30319
< X-Powered-By: ASP.NET
< Date: Tue, 01 Oct 2013 06:29:00 GMT
<
* Connection #0 to host (nil) left intact
cURL's Man Page specifies three types of "special" verbose output:
A line starting with '>' means "header data" sent by curl, '<' means "header data" received by curl that is hidden in normal cases, and a line starting with '*' means additional info provided by curl.
You can read about HTTP header fields in the official HTTP specification. Any other output lines displayed by cURL belong to the HTTP body carried by the corresponding message.
So what is the actual meaning of these informational lines starting with *, you ask? They inform you about the status of the transfer's TCP connection with the host. For instance:
"Connected to (nil) (182.72.67.14) port 65101 (#0)" means that a TCP connection has been established with the server (in your case 182.72.67.14). The #0 is the connection number, an identifier used only internally by cURL. The (nil) appears where cURL would normally print the host name; since this request was made to a bare IP address (see the Host: header), there was no host name to show.
"Connection #0 to host (nil) left intact" means that although the transfer is over, the TCP connection itself is still open (i.e. no FIN/ACK exchange has taken place), so it can be reused for further transfers, which saves the time needed to open a new TCP connection.
The message "Re-using existing connection! (#0) with host (nil)" shows exactly that: cURL is riding an existing TCP connection left over from a previous transfer.
Lines marked with < are HTTP response headers; you can read in detail about HTTP headers and their meaning in the HTTP specification.
Lines marked with * are verbose information provided by curl, which is displayed on stderr.

C program to convert Dollar to Rupee

Is there a way to write a C program to convert, say, US dollars to Indian rupees (or vice versa)? The conversion rate should not be hard-coded but dynamic. More precisely, it should get the latest rupee/dollar rate automatically (from the Internet).
Step 1 would be to get the latest conversion rate. You can use a web service for that; there are many available. You can try this one:
Request:
GET /CurrencyConvertor.asmx/ConversionRate?FromCurrency=INR&ToCurrency=USD HTTP/1.1
Host: www.webservicex.net
Response:
HTTP/1.1 200 OK
Content-Type: text/xml; charset=utf-8
Content-Length: length
<?xml version="1.0" encoding="utf-8"?>
<double xmlns="http://www.webserviceX.NET/">SOME_RATE_IN_DOUBLE</double>
For sending the request you can make use of cURL.
Once you have the response, just parse it to get the rate. Once you have the rate, you can easily write the conversion itself.
EDIT:
If you are not comfortable using cURL, you can use good old system() and wget. For this you first need to construct the URL, like:
www.webservicex.net/CurrencyConvertor.asmx/ConversionRate?FromCurrency=INR&ToCurrency=USD
then from the C program you can do:
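char cmd[200];
char URL[] = "www.webservicex.net/CurrencyConvertor.asmx/ConversionRate?FromCurrency=INR&ToCurrency=USD";
/* quote the URL so the shell does not interpret the & character;
   double quotes work in POSIX shells and in the Windows cmd.exe */
snprintf(cmd, sizeof cmd, "wget -O result.html \"%s\"", URL);
if (system(cmd) != 0)
    fprintf(stderr, "wget failed\n");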
After this the conversion rate is in the file result.html as XML. Just open it and parse it.
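A minimal sketch of that parsing step, assuming the body looks like the sample response above and the file contents have been read into a string. parse_rate is a hypothetical helper that does plain string matching rather than real XML parsing; strtod() expects a '.' decimal point, which holds in the default C locale.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extract the number from a body of the form
 *   <double xmlns="http://www.webserviceX.NET/">SOME_RATE_IN_DOUBLE</double>
 * Returns 0 and stores the value in *rate, or -1 if the <double element
 * or a number inside it is not found. */
static int parse_rate(const char *xml, double *rate)
{
    const char *p = strstr(xml, "<double");
    if (p == NULL)
        return -1;
    p = strchr(p, '>');                 /* end of the opening tag */
    if (p == NULL)
        return -1;
    p++;

    char *end;
    double v = strtod(p, &end);
    if (end == p)
        return -1;                      /* no number after the tag */
    *rate = v;
    return 0;
}
```

With the request shown above (FromCurrency=INR&ToCurrency=USD), multiplying an amount in rupees by the parsed rate gives the amount in dollars.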
If you are using windows, you need to install wget for windows if you don't have it. You can get it here.
First, you need to find a server that provides the conversion rate. Then you write your program to fetch the rates from that server and use that information in the rest of your program.
This site, http://www.csharphelp.com/2007/01/currency-converter-server-with-c/, provides a tutorial for C# and the web, but it can still give you a general technical idea of how to do it.
