Meaning of libcurl messages and execution process

I am using the libcurl library to fetch the file abc-1.tar from a server. I want to know the meaning of the messages that are displayed, and the steps libcurl goes through to produce them.
For example, I list some messages below. I know the meaning of the basic ones, e.g. Content-Length is the length of the file being downloaded.
I want the meaning of all the messages, particularly those that start with * (e.g. Connection #0 to host (nil) left intact):
* Re-using existing connection! (#0) with host (nil)
* Connected to (nil) (182.72.67.14) port 65101 (#0)
GET /...... HTTP/1.1
Host: 182.72.67.14:65101
Accept: */*
Connection:keep-alive
< HTTP/1.1 200 OK
< Cache-Control: private
< Content-Length: 186368
< Content-Type: application/x-tar
< Server: Microsoft-IIS/7.5
< Content-Disposition: attachment; filename=abc-1.tar
< X-AspNet-Version: 4.0.30319
< X-Powered-By: ASP.NET
< Date: Tue, 01 Oct 2013 06:29:00 GMT
<
* Connection #0 to host (nil) left intact

cURL's man page specifies three types of "special" verbose output:
A line starting with '>' means "header data" sent by curl, '<' means "header data" received by curl that is hidden in normal cases, and a line starting with '*' means additional info provided by curl.
You can read about HTTP header fields in the official HTTP specification. Any other output lines displayed by cURL belong to the HTTP body carried by the corresponding message.
So what is the actual meaning of the informational lines starting with *, you ask? They inform you about the status of the transfer's TCP connection with the host. For instance:
"Connected to (nil) (182.72.67.14) port 65101 (#0)" means that a TCP connection has been established with the server (in your case: 182.72.67.14). The #0 is the connection number (which is used only by cURL). The (nil) is printed where the host name would normally appear; it means cURL had no host name to display for this connection (had one been available, it would have appeared instead of (nil)).
"Connection #0 to host (nil) left intact" means that although the transfer is over, the TCP connection itself is still open (i.e. no FIN/ACK exchange has taken place), allowing you to keep reusing the same TCP connection for multiple transfers (which is useful if you don't want to spend time opening a new TCP connection).
The message "Re-using existing connection! (#0) with host (nil)" confirms this: it indicates that cURL is doing exactly that, reusing an existing TCP connection from a previous transfer.

Lines marked with < are HTTP response headers; their meaning is described in the HTTP specification.
Lines marked with * are verbose information provided by curl, which is written to stderr.

Related

Redirecting to permanently moved page with response code 302

I am sending this request from my C code:
char * request = "GET / HTTP/1.1\r\n" \
"Host: www.some.com\r\n" \
"Connection: keep-alive\r\n" \
"User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36\r\n" \
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\r\nAccept-Language: en-US,en;q=0.9\r\nAccept-Encoding: gzip, deflate\r\n\r\n";
But I get this response after sending the above request:
HTTP/1.1 302 Found
Location: https://www.some.com/?gws_rd=ssl
Cache-Control: private
Content-Type: text/html; charset=UTF-8
BFCache-Opt-In: unload
Date: Thu, 24 Feb 2022 06:17:10 GMT
Server: gws
Content-Length: 231
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Set-Cookie: 1P_JAR=2022-02-24-06; expires=Sat, 26-Mar-2022 06:17:10 GMT; path=/; domain=.some.com; Secure; SameSite=none
<HTML><HEAD><meta http-equiv="content-type" content="text/html;charset=utf-8">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
here.
</BODY></HTML>
That is the whole response. If I then follow https://www.some.com/?gws_rd=ssl I don't get any data; it looks as if the request is sent but nothing is received. This is the request I send for https://www.some.com/?gws_rd=ssl:
char *x="GET / https://www.some.com/?gws_rd=ssl\r\n\r\n";
Why is that? What is wrong with my HTTP/HTTPS handling?
I am using OpenSSL.
So after the initial request, the server says the resource has moved to a new URL. When I then request the new URL, nothing happens and no response data arrives.
Code:
/* filename nossl.c */
#include "stdio.h"
#include "string.h"
#include "openssl/ssl.h"
#include "openssl/bio.h"
#include "openssl/err.h"
int main()
{
BIO * bio;
char resp[1024];
int ret;
//char * request = "GET /cas/login?service=https%3A%2F%2Fweb.corp.ema-tech.com%3A8888%2F HTTP/1.1\x0D\x0AHost: web.corp.ema-tech.com\x0D\x0A\x43onnection: Close\x0D\x0A\x0D\x0A";
char * request = "GET / HTTP/1.1\r\n" \
"Host: www.yoursite.com\r\n" \
"Connection: keep-alive\r\n" \
"User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36\r\n" \
"Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9\r\nAccept-Language: en-US,en;q=0.9\r\nAccept-Encoding: gzip, deflate\r\n\r\n";
char *x="GET / https://www.yoursite.com/?gws_rd=ssl\r\n\r\n";
/* Set up the library */
ERR_load_BIO_strings();
SSL_load_error_strings();
OpenSSL_add_all_algorithms();
/* Create and setup the connection */
//bio = BIO_new_connect("web.corp.ema-tech.com:8888");
printf("___________________________+\n");
bio = BIO_new_connect("www.yoursite.com:80");
if(bio == NULL) {
printf("====___________________________-\n");
printf("BIO is null\n");
}
if(BIO_do_connect(bio) <= 0) {
printf("+++++___________________________#\n");
BIO_free_all(bio);
}
printf("___________________________#^\n");
/* Send the request */
BIO_write(bio, request, strlen(request));
printf("___________________________0\n");
/* Read in the response */
for(;;) {
ret = BIO_read(bio, resp, 1023);
printf("----%d\n",ret);
if(ret <= 0) break;
resp[ret] = 0;
printf("%s\n", resp);
}
BIO_write(bio,x,sizeof("GET / https://www.yoursite.com/?gws_rd=ssl\r\n\r\n"));
for(;;) {
ret = BIO_read(bio, resp, 1023);
printf("----%d\n",ret);
if(ret <= 0) break;
resp[ret] = 0;
printf("%s\n", resp);
}
/* Close the connection and free the context */
BIO_free_all(bio);
return 0;
}

If your first request was HTTP (not HTTPS) then the server is mainly telling you to use HTTPS instead of HTTP. Your request would be
char * request = "GET /?gws_rd=ssl HTTP/1.1\r\n" \
"Host: www.some.com\r\n" ...
The /?gws_rd=ssl is the local resource name (/) and a query string (?gws_rd=ssl) from https://www.some.com/?gws_rd=ssl, while the host name www.some.com goes to the "Host:" header.
Some servers only accept the connection if you use the TLS Server Name Indication extension (OpenSSL: "SSL_set_tlsext_host_name") to supply the host name during the handshake as well.
You could also think about using an existing C library for an HTTPS client, for example:
libcurl (https://curl.se/libcurl/ - libcurl is one of the most used HTTP/HTTPS client libraries in C)
CivetWeb (https://github.com/civetweb/civetweb/blob/master/docs/api/mg_connect_client_secure.md - actually a server with some additional client functions; disclaimer: I am in the maintainer team of this server).
Both are open source under permissive licenses (CivetWeb is MIT licensed; curl uses its own MIT-style license).
Edit:
Actually I need to know difference between openssl and https
HTTPS is a communication protocol (HyperText Transfer Protocol Secure).
OpenSSL is a crypto library.
The protocol stack for HTTP looks like:
HTTP:   [HTTP]
        [TCP/IP]
The stack for HTTPS looks like:
HTTPS:  [HTTP]
        [TLS (= SSL)]
        [TCP/IP]
SSL stands for Secure Sockets Layer, and TLS (Transport Layer Security) is the successor of SSL. OpenSSL has implemented SSL versions 2 and 3 (both deprecated; SSL 2 support was removed in OpenSSL 1.1.0) as well as all versions of TLS (1.0, 1.1, 1.2 and 1.3).
OpenSSL can provide the middle part of the HTTPS stack, but you still need the top and bottom parts. They are identical to HTTP, so TLS (the protocol), or OpenSSL (a library implementing that protocol), is inserted in the middle.
To see this live in action try to read from www.google.com using the OpenSSL command line:
$ openssl s_client -connect www.google.com:443
The server will provide some information, in particular the server certificate. Then you type:
GET / HTTP/1.1
Host: www.google.com
Connection: close
After the empty line at the bottom, the server will send a header:
HTTP/1.1 200 OK
Date: ..
Server: gws
Connection: close
Followed by an empty line and finally a HTML page.
This OpenSSL command line client implements the TLS layer and uses the TCP/IP layer from the operating system. But you have to provide the HTTP layer on top: the four lines of text (GET ..., Host ..., Connection ... and the empty line at the end) form a valid HTTP request.
The full source of s_client can be found here: https://github.com/openssl/openssl/blob/master/apps/s_client.c
The source is lengthy because it provides a hundred different options.
A much smaller client example with more explanation can be found here:
https://wiki.openssl.org/index.php/SSL/TLS_Client
You will find the same four lines for the HTTP protocol in this example:
BIO_puts(web, "GET " HOST_RESOURCE " HTTP/1.1\r\n"
"Host: " HOST_NAME "\r\n"
"Connection: close\r\n\r\n");
In your code you used "Connection: keep-alive". That's perfectly fine if you want to make multiple HTTP requests over the same connection. Just make sure the last request you want to make uses "Connection: close". Also be aware that an HTTP server may decide to close the connection at any time by sending a "Connection: close" header.
"Connection: close" is easier to begin with.
If you only want to download a web page, these four lines of code are usually enough - unless you need a login/cookies/access token/... for a specific web site. Additional requests such as POST (e.g, submitting a web form) will require more lines on top of OpenSSL. If you need this, you should consider using an additional library instead of implementing it on your own.
The response of the server needs to be split into header (everything above the first empty line) and body (everything below). Depending on the header, it might be required to interpret the body data differently.
For example, www.google.com will send the header line "Transfer-Encoding: chunked" (instead of "Content-Length: ####"). These are two different ways a server can let the client know how long the body data is supposed to be. If you get a "Content-Length: 1234" header, you know that you have to read 1234 bytes of body in your HTTP protocol implementation.
If you get a "Transfer-Encoding: chunked" header, the server will first send a hex number followed by "\r\n", then as many bytes as that hex number states, followed by "\r\n". Then comes another hex number, "\r\n" and more data. Finally, a hex number "0" indicates the end of the data. The hex numbers and the "\r\n" sequences around them are not part of the HTML page, so you need to remove them (if you keep them, you will end up with broken HTML or whatever else you wanted to download).
If a server sends neither "Content-Length:" nor "Transfer-Encoding:", then you need to read until the server closes the connection.
This is also part of the HTTP protocol that has to be implemented on top of OpenSSL for an HTTPS client. You will have to implement all three in an HTTP or HTTPS client, unless you only need to communicate with one server and you know it only uses "Content-Length: ####".

Curl doesn't send entire form-data in HTTP POST request

Edit:
Problem 2 and Problem 3 are solved by following @melpomene's comment, i.e. by using the number of bytes read when printing the buffer.
But I am still stuck on Problem 1.
I have written a TCP server-client program. Later, out of curiosity, I wanted to learn about HTTP servers.
My previous question: Simple TCP server can't output to web browser
Now I'm just looking at what data is transferred to the server, and how, when using GET and POST (form-data and x-www-form-urlencoded for now).
I'm following How to cURL POST from the Command Line to send POST requests.
When I send x-www-form-urlencoded as:
curl -d "data=example1&data2=example2" localhost:8080
Output on Server:
POST / HTTP/1.1
Host: localhost:8080
User-Agent: curl/7.54.0
Accept: */*
Content-Length: 28
Content-Type: application/x-www-form-urlencoded
data=example1&data2=example2
This is as expected.
Problem: 1
Now comes the problem. When I try to send form-data, the output is not expected.
When I send form-data as:
curl -X POST -F "name=user" -F "password=test" localhost:8080
Output on server:
POST / HTTP/1.1
Host: localhost:8080
User-Agent: curl/7.54.0
Accept: */*
Content-Length: 244
Expect: 100-continue
Content-Type: multipart/form-data; boundary=------------------------78b7f8917ad1992c
I'm getting the boundary but I'm not getting the next part like the data I'm sending.
Problem: 2
One more odd thing is when I try to send x-www-form-urlencoded after sending form-data.
When I send x-www-form-urlencoded after form-data as:
curl -d "data=example1&data2=example2" localhost:8080
Output on server:
POST / HTTP/1.1
Host: localhost:8080
User-Agent: curl/7.54.0
Accept: */*
Content-Length: 28
Content-Type: application/x-www-form-urlencoded
data=example1&data2=example2------------78b7f8917ad1992c
Why am I getting boundary here?
Problem: 3
And also while sending GET as:
curl localhost:8080
Output on server:
GET / HTTP/1.1
Host: localhost:8080
User-Agent: curl/7.54.0
Accept: */*
ontent-Length: 28
Content-Type: application/x-www-form-urlencoded
data=example1&data2=example2------------78b7f8917ad1992c
I'm getting Content-Type and x-www-form-urlencoded data along with boundary.
What am I doing wrong?
Is something wrong with my code or with my understanding?
Server.c:
// Server side C program to demonstrate Socket programming
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>
#include <stdlib.h>
#include <netinet/in.h>
#include <string.h>
#define PORT 8080
int main(int argc, char const *argv[])
{
int server_fd, new_socket; long valread;
struct sockaddr_in address;
int addrlen = sizeof(address);
char buffer[1024] = {0};
char *hello = "HTTP/1.1 200 OK\nContent-Type: text/plain\nContent-Length: 12\n\nHello world!";
// Creating socket file descriptor
if ((server_fd = socket(AF_INET, SOCK_STREAM, 0)) == 0)
{
perror("In socket");
exit(EXIT_FAILURE);
}
address.sin_family = AF_INET;
address.sin_addr.s_addr = INADDR_ANY;
address.sin_port = htons( PORT );
if (bind(server_fd, (struct sockaddr *)&address, sizeof(address))<0)
{
perror("In bind");
exit(EXIT_FAILURE);
}
if (listen(server_fd, 10) < 0)
{
perror("In listen");
exit(EXIT_FAILURE);
}
while(1)
{
printf("\n+++++++ Waiting for new connection ++++++++\n\n");
if ((new_socket = accept(server_fd, (struct sockaddr *)&address, (socklen_t*)&addrlen))<0)
{
perror("In accept");
exit(EXIT_FAILURE);
}
valread = read( new_socket , buffer, 1024);
printf("%s\n",buffer );
write(new_socket , hello , strlen(hello));
printf("------------------Hello message sent-------------------\n");
close(new_socket);
}
return 0;
}

Problem 1
I have checked the request headers and found Expect: 100-continue header. This is the first time I have seen this header.
A simple search in Google shows this is causing the problem.
'Expect: 100-Continue' Issues and Risks (I'm just gonna paste everything to avoid a dead link)
How the Expect: 100-Continue Header Works
When Expect: 100-Continue is NOT present, HTTP follows approximately the following flow (from the client's point of view):
1. The request initiates a TCP connection to the server.
2. When the connection to the server is established, the full request--which includes both the request headers and the request body--is transmitted to the server.
3. The client waits for a response from the server (comprised of response headers and a response body).
4. If HTTP keep-alives are supported, the request is optionally repeated from step 2.
When the client is using the Expect: 100-Continue feature, the following events occur:
1. The request initiates a TCP connection to the server.
2. When the connection to the server is established, the request--including the headers, the Expect: 100-Continue header, without the request body--is then transmitted to the server.
3. The client then waits for a response from the server. If the status code is a final status code, using the prior steps above the client retries the request without Expect: 100-Continue header. If the status code is 100-Continue, the request body is sent to the server.
4. The client will then wait for a response from the server (comprised of response headers and a response body).
5. If HTTP keep-alives are supported, the request is optionally repeated from step 2.
Why use Expect: 100-Continue?
API POST requests that include the Expect: 100-Continue header save bandwidth between the client and the server, because the server can reject the API request before the request body is even transmitted. For API POST requests with very large request bodies (such as file uploads), the server can, for example, check for invalid authentication and reject the request before the push body was sent, resulting in significant bandwidth savings.
Without Expect: 100-Continue:
Without the Expect: 100-Continue feature, the entire API request, including the (potentially large) push body would have to be transmitted before the server could even determine if the syntax or authentication is valid. However, since the majority of our API requests have small POST bodies, the benefits of separating the request header from the request body is negligible.
Problems when the request header and body are sent separately
Because of the high volume of requests that Urban Airship handles, many levels of complexity exist between our clients and the servers responsible for responding to API requests. This is not an abnormal phenomenon for most server configurations and strategies, but it does introduce a risk of elevated request failures to any API POST requests using the Expect: 100-Continue header. This is due to the fact that the request header and the request body are sent separately from one another, and must travel through the same connection throughout the entire API server infrastructure.
With the number of proxies, load-balancing servers, and back-end request processing servers that are implemented, requests with the Expect: 100-Continue header have an increased probability of becoming separated from one another, and hence returning with an error.
What To Expect:
We've always attempted to support Expect: 100-Continue. However, we have determined that our customers that use Expect: 100-Continue are receiving a sub-optimal quality of service due to elevated request failures.
Additionally, the majority of our API requests have small POST bodies, and as a result the benefits of separating the request header from the request body are negligible. These reasons have motivated us to disable support for Expect: 100-Continue service-wide.
Our Recommendations:
We recommend against the use of Expect: 100-Continue. If you receive an HTTP Error 417 (Expectation failed), retry the request without Expect: 100-Continue.
So, to prevent the Expect: 100-continue header in a form-data POST, include -H 'Expect:' in your curl command:
curl -X POST -F "name=user" -F "password=test" localhost:8080 -H 'Expect:'
Now you can receive your entire data in one go (just like with Postman), as you said in the comments.
Problem 2 & 3
As @melpomene said in the comments, read() doesn't put a \0 after the data it reads. That's why you are seeing data from previous requests.
So either use valread to print only the bytes that were actually read, or declare the buffer inside your while loop, as I said in the comments.
Code:
while(1)
{
printf("\n+++++++ Waiting for new connection ++++++++\n\n");
if ((new_socket = accept(server_fd, (struct sockaddr *)&address, (socklen_t*)&addrlen))<0)
{
perror("In accept");
exit(EXIT_FAILURE);
}
// A fresh, zero-filled buffer for every connection, so no data from a previous request can show up.
char buffer[30000] = {0};
// Read at most sizeof(buffer) - 1 bytes so that the last byte always stays '\0'.
valread = read( new_socket , buffer, sizeof(buffer) - 1);
printf("%s\n",buffer );
write(new_socket , hello , strlen(hello));
// valread is a (signed) long, so print it with %ld.
printf("------------------Hello message sent-------------------%ld\n", valread);
close(new_socket);
}

File descriptor does write data in C

I am using the joyent http-parser and libevent to create my own simple web server in C.
I have it connecting to the port and receiving the HTTP requests fine; however, I am not able to send a response using the socket.
So in one window I enter:
$ curl localhost:5555/a.txt
and my server receives the request and handles it correctly, up to the point where I do the following:
http_data_cb* message_complete_req_callback(http_parser* parser)
{
int i;
http_request_t* http_request = parser->data;
http_response_t* http_response = generate_response(http_request);
printf("Writing %d:\n%s", strlen(http_response->full_message), http_response->full_message);
i = write(http_request->fd, http_response->full_message, strlen(http_response->full_message));
fsync(http_request->fd);
printf("Wrote: %d\n", i);
return 0;
}
Which prints the following:
Writing 96:
HTTP/1.0 200 OK
Tue, 04 Aug 2015 10:20:58 AEST
Server: mywebserver/jnd
Connection: close
Wrote: 96
However my curl instance doesn't receive anything. Any ideas?
Thanks in advance.

Your response contains no data, just headers. Curl strips off the headers and just prints the content. Not only that but you're responding with HTTP/1.0 which is a long way out of date. As it happens, the Connection: close only makes sense in 1.1 as 1.0 does not support keeping a connection open.
For curl to report anything you need to send some content. I'd expect the output to be something like:
Writing 128:
HTTP/1.1 200 OK
Tue, 04 Aug 2015 10:20:58 AEST
Server: mywebserver/jnd
Connection: close
Content-Length: 12
Hello World
Wrote: 128
Which would trigger curl to print:
Hello World
Notice that the 12-character content length includes 1 character for the line feed: the content is Hello World<lf>. Also note that the headers must be separated from the content by an empty line.

Microchip webserver, don't receive ack after HTTP 200 OK frame

My current project is a bare-metal webserver. For this I'm using no libraries, and programming directly onto the chip. What I'm trying to do right now is send a single piece of HTTP data:
HTTP/1.1 200 OK\r\n
Content-Length: 45\r\n
Content-Type: text/html\r\n
Server: microserver\r\n
Connection: close\r\n
\r\n
<!DOCTYPE html><html>Hello<br>world!<hr></html>
My server tries to go through the following steps:
Receive SYN
Send [SYN+ACK]
Receive ACK
Receive ACK containing HTTP GET
Send [ACK,PUSH,FIN] with HTTP data (this one changed a lot: I've tried sending ACK, PUSH and FIN separately (with the content in PUSH), and tried [ACK+PUSH],FIN and ACK,[PUSH+FIN] as well).
Receive [ACK+FIN] <<--- Here's where it goes wrong, this one is never even sent, according to wireshark.
Send ACK.
As said, it goes wrong at step 6, every single time, no matter what combination of ACK, PUSH and FIN I use in step 5. When looking at it with wireshark, all the SEQ and ACK numbers are correct.
My server is able to close connections once the [FIN+ACK] finally does get sent, which sometimes happens on sockets that are kept open by the browser synchronously.
Pcap file of what wireshark records: https://www.dropbox.com/s/062k9hkgkenaqij/httpdump.pcap with as filter: (tcp.dstport == 80 || tcp.srcport == 80) && (ip.dst == 169.254.100.100 || ip.src == 169.254.100.100)
I know there is a 4 year old very similar question, Building a webserver, client doesn't acknowledge HTTP 200 OK frame, but I've tried pretty much everything that was suggested in there, and it didn't seem to get me any further, so I'm out of ideas.
EDIT:
I have found the problem, after studying sections of wireshark captures for hours on end. Apparently the mistake was that I did not include the payload data when calculating the TCP checksum... But well, found the solution.

Remove HTTP Header Info

In C is there a way to exclude the HTTP header information that comes with the data when using recv() on a socket? I am trying to read some binary data and all I want is the actual binary information, not the HTTP header information. The current data received looks like this:
HTTP/1.1 200 OK
Content-Length: 3314
Content-Type: image/jpeg
Last-Modified: Tue, 20 Mar 2012 14:51:34 GMT
Accept-Ranges: bytes
ETag: "45da99f1a86cd1:6b9"
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Date: Mon, 20 Aug 2012 14:10:08 GMT
Connection: close
╪ α
I would like to read only the binary portion of the file. (That's obviously not all of the binary data; only that much was displayed because I printed the output of my recv loop as a string, and the first NUL character comes right after that short binary string.)
I just need to get rid of the header portion, is there a simple way to do this?

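You would be better off using an HTTP library such as libcurl.
If you want to do it yourself:
You can search for "\r\n\r\n" (two consecutive \r\n), which separates the HTTP headers from the content, and use the part of the buffer after that.
Also, you need to get Content-Length from the headers and read that many bytes as the HTTP content.
Something like:
/* http_resp holds the data read with recv(); it must be '\0'-terminated for strstr */
httpbody = strstr(http_resp, "\r\n\r\n");
if(httpbody)
httpbody += 4; /* move ahead 4 chars, past the "\r\n\r\n" */
/* now httpbody points at the data only, with the HTTP headers stripped */
Note: make sure strstr does not run past the end of the received data. recv() does not '\0'-terminate the buffer, so either terminate it yourself or use a length-bounded search (strnstr exists on BSD systems but is not standard C; memmem is a common extension). Binary content may itself contain '\0' bytes, so do not use string functions on the body.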

I think you need to extract the value of the Content-Length to know the size of the binary data to be read otherwise it will be impossible to know whether all data has been received. A simple approach to consume, and mostly ignore, the header portion is to read the incoming data byte-by-byte until "\r\n\r\n" is encountered, which indicates the end of the header section and the beginning of the content.
