I am making an HTTP client socket in C. So far I have written a custom URL parser, and now the problem is connecting to absolute URLs. The program works fine with relative URLs but not absolute ones.
Here is a sample output for both an absolute and a relative URL:
absolute url: http://www.google.com
relative url: http://techpatterns.com/downloads/firefox/useragentswitcher.xml
For the absolute URL I get a 301/302 status code, while for the relative URL the status is 200 OK.
Here is sample code of the key areas:
char ip[100], *path, *domain, *abs_domain, *proto3;
char *user_agent = "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0";
char *accept_type = "Accept: text/html, application/xhtml+xml, */*\r\nAccept-Language: en-US\r\n";
char *encoding = "Accept-Encoding: gzip, deflate\r\n";
char *proxy_conn = "Proxy-Connection: Keep-Alive\r\n";
char hostname[1000];

url:
fgets(hostname, sizeof(hostname), stdin);
for (i = 0; i < strlen(hostname); i++) { // remove trailing newline
    if (hostname[i] == '\n') {
        hostname[i] = '\0';
    }
}

proto3 = get_protocol(hostname); // get protocol, i.e. http, ftp, etc.
// get domain, i.e. http://mail.google.com/index -> mail.google.com
//                  http://www.google.com/ssl_he -> www.google.com
domain = get_domain(hostname);
if (strlen(domain) == 0) {
    printf("invalid url\n\n");
    goto url;
}
abs_domain = get_abs_domain(hostname); // gets abs domain: google.com, facebook.com, etc.
path = get_path(hostname);

// resolve the hostname to an IP address
if ((he = gethostbyname(abs_domain)) == NULL) {
    printf("gethostbyname failed : %d", WSAGetLastError());
    goto url;
}
// h_addr_list holds the addresses in binary form; cast it to struct in_addr **
addr_list = (struct in_addr **) he->h_addr_list;
for (i = 0; addr_list[i] != NULL; i++) {
    // take the first one
    strcpy(ip, inet_ntoa(*addr_list[i]));
    break;
}

clientService.sin_addr.s_addr = inet_addr(ip);
clientService.sin_family = AF_INET;
clientService.sin_port = htons(80);

sprintf(sendbuf, "GET /%s HTTP/1.1\r\n%sUser-Agent: %s\r\nHost: %s\r\n\r\n",
        path, accept_type, user_agent, abs_domain);
Brief explanation of the code:
i.e. if the URL entered by the user is http://mail.deenze.com/control_panel/index.php:
the protocol will be -> http
the domain will be -> mail.deenze.com
the abs_domain will be -> deenze.com
the path will be -> control_panel/index.php
Finally, these values, in conjunction with the user agent, are used to build and send the request.
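The helpers behave roughly like this minimal sketch (simplified illustrative implementations with static buffers and no validation; get_abs_domain is omitted, and the answer below explains why stripping the subdomain is a bad idea anyway):

#include <stdio.h>
#include <string.h>

// Hypothetical stand-ins for the question's helpers, for illustration only.
static char proto_buf[16], domain_buf[256], path_buf[512];

char *get_protocol(const char *url) {     // "http://x/y" -> "http"
    const char *sep = strstr(url, "://");
    size_t n = sep ? (size_t)(sep - url) : 0;
    snprintf(proto_buf, sizeof(proto_buf), "%.*s", (int)n, url);
    return proto_buf;
}

char *get_domain(const char *url) {       // "http://a.b.c/d" -> "a.b.c"
    const char *start = strstr(url, "://");
    start = start ? start + 3 : url;
    size_t n = strcspn(start, "/");
    snprintf(domain_buf, sizeof(domain_buf), "%.*s", (int)n, start);
    return domain_buf;
}

char *get_path(const char *url) {         // "http://a.b.c/d/e" -> "d/e"
    const char *start = strstr(url, "://");
    start = start ? start + 3 : url;
    const char *slash = strchr(start, '/');
    snprintf(path_buf, sizeof(path_buf), "%s", slash ? slash + 1 : "");
    return path_buf;
}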
301 and 302 status codes are redirects, not errors. They indicate that you should try the request at a different URL instead.
In this case, it looks like despite the fact that you entered the URL http://www.google.com/, the Host header you are sending only includes google.com. Google is sending you back a redirect telling you to use www.google.com instead.
I notice that you seem to have a get_abs_domain function that is stripping the www off; there is no reason you should do this. www.google.com and google.com are different hostnames, and may give you entirely different contents. In practice, most sites will give you the same result for them, but you can't depend on that; some will redirect from one to the other, some will simply serve up the same content, and some may only work at one or the other.
Instead of trying to rewrite one to the other, you should just follow whatever redirect you are returned.
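For example, here is a minimal sketch of extracting the redirect target by hand, assuming the raw response is already in a NUL-terminated buffer (the header parsing is deliberately simplistic, and fetch_url is a hypothetical helper):

#include <stdio.h>
#include <string.h>

// Extract the Location header from a raw HTTP response; returns 1 on success.
// Simplistic: real header names are case-insensitive and may be folded.
int get_redirect_target(const char *response, char *out, size_t outsize) {
    const char *loc = strstr(response, "\r\nLocation: ");
    if (loc == NULL)
        return 0;
    loc += strlen("\r\nLocation: ");
    size_t n = strcspn(loc, "\r\n");
    if (n >= outsize)
        n = outsize - 1;
    memcpy(out, loc, n);
    out[n] = '\0';
    return 1;
}

// Usage: after reading a 301/302 response into buf, fetch the new URL instead.
// if (status == 301 || status == 302) {
//     char target[1024];
//     if (get_redirect_target(buf, target, sizeof(target)))
//         fetch_url(target);   // hypothetical: re-parse and re-request
// }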
I would recommend using an existing HTTP client library rather than trying to write your own (unless this is just an exercise for your own edification). For example, there's cURL if you want to be portable or HttpClient if you only need to work on Windows (based on your screenshots, I'm assuming that's the platform you're using). There is a lot of complexity in writing an HTTP client that can actually handle most of the web; SSL, compression, redirects, chunked transfer encoding, etc.
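For comparison, fetching a page with libcurl's easy interface takes only a few lines; this sketch follows redirects automatically and writes the body to stdout (libcurl's default when no write callback is set):

#include <stdio.h>
#include <curl/curl.h>

int main(void) {
    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *curl = curl_easy_init();
    if (curl) {
        curl_easy_setopt(curl, CURLOPT_URL, "http://www.google.com/");
        curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L); // follow 301/302
        CURLcode res = curl_easy_perform(curl);             // body goes to stdout
        if (res != CURLE_OK)
            fprintf(stderr, "curl_easy_perform: %s\n", curl_easy_strerror(res));
        curl_easy_cleanup(curl);
    }
    curl_global_cleanup();
    return 0;
}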
@Brian Campbell, I think the problem was the www, because if I use www.google.com it gives me a redirect URL, https://www.google.com/?gws_rd=ssl, same as my browser. But because it is HTTPS, I think I will have to use SSL. Thanks for your answer.
I can't copy-paste the text from my terminal, but I have increased the font size for visibility purposes.
Related
I am making a POST request from my ESP32-S2 Kaluga kit. I have tested the HTTP request while running a server program on my LAN. I am using esp_http_client_handle_t and esp_http_client_config_t from esp_http_client.h to do this.
Now I have an HTTPS API set up in AWS API Gateway, and I get the following error with HTTPS:
E (148961) esp-tls-mbedtls: No server verification option set in esp_tls_cfg_t structure. Check esp_tls API reference
E (148961) esp-tls-mbedtls: Failed to set client configurations, returned [0x8017] (ESP_ERR_MBEDTLS_SSL_SETUP_FAILED)
E (148971) esp-tls: create_ssl_handle failed
E (148981) esp-tls: Failed to open new connection
E (148981) TRANSPORT_BASE: Failed to open a new connection
E (148991) HTTP_CLIENT: Connection failed, sock < 0
How can I solve this? Thank you
Edit:
Following is the code I use. I create an HTTP client for the POST request:
esp_err_t client_event_get_handler(esp_http_client_event_handle_t evt)
{
    switch (evt->event_id)
    {
    case HTTP_EVENT_ON_DATA:
        // evt->data is not NUL-terminated, so bound the print with evt->data_len
        printf("HTTP GET EVENT DATA: %.*s", evt->data_len, (char *)evt->data);
        break;
    default:
        break;
    }
    return ESP_OK;
}
static void post_rest_function(char *payload, int len)
{
    esp_http_client_config_t config_post = {
        .url = SERVER_URL,
        .method = HTTP_METHOD_POST,
        .event_handler = client_event_get_handler,
        .auth_type = HTTP_AUTH_TYPE_NONE,
        .transport_type = HTTP_TRANSPORT_OVER_TCP
    };
    esp_http_client_handle_t client = esp_http_client_init(&config_post);
    esp_http_client_set_post_field(client, payload, len);
    esp_http_client_set_header(client, "Content-Type", "image/jpeg");
    esp_http_client_perform(client);
    esp_http_client_cleanup(client);
}
and I use it in main with an image payload:
void app_main(){
    ....
    post_rest_function((char *)pic->buf, pic->len);
    ....
}
You need a certificate to make HTTPS requests. In case you don't want to implement this, just edit your sdkconfig: set "Allow potentially insecure options" -> true and "Skip server certificate verification by default" -> true.
Careful, this is unsafe.
Additionally, you may choose to include the certificates to make sure that your transfer is safe (i.e. the server is valid).
You can obtain the root SSL certificate of your host as shown in the video linked here (watch through to the 56-minute mark for a complete explanation). Or you may use the certificate bundle that Espressif provides in the IDF framework; for that:
In your code, add #include "esp_crt_bundle.h", and in your esp_http_client_config_t add these:
.transport_type = HTTP_TRANSPORT_OVER_SSL,    //Specify transport type
.crt_bundle_attach = esp_crt_bundle_attach,   //Attach the certificate bundle
after which the process remains quite the same.
The video I linked above is quite helpful; I recommend you watch the whole thing :)
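Applied to the post_rest_function from the question, the configuration would look roughly like this (a sketch, assuming the IDF certificate bundle is enabled in menuconfig; SERVER_URL must now be an https:// URL):

#include "esp_http_client.h"
#include "esp_crt_bundle.h"

static void post_rest_function(char *payload, int len)
{
    esp_http_client_config_t config_post = {
        .url = SERVER_URL,                            // assumed https:// endpoint
        .method = HTTP_METHOD_POST,
        .event_handler = client_event_get_handler,
        .transport_type = HTTP_TRANSPORT_OVER_SSL,    // TLS instead of plain TCP
        .crt_bundle_attach = esp_crt_bundle_attach,   // use Espressif's CA bundle
    };
    esp_http_client_handle_t client = esp_http_client_init(&config_post);
    esp_http_client_set_post_field(client, payload, len);
    esp_http_client_set_header(client, "Content-Type", "image/jpeg");
    esp_http_client_perform(client);
    esp_http_client_cleanup(client);
}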
I am unable to receive the responses to multiple HTTP requests when I attempt to enqueue data to send to a server.
We are able to establish a connection to a server and immediately issue an HTTP request inside the connected_callback() function (called as soon as a connection to the server is established) using the tcp_write() function. However, if I attempt to generate two or more HTTP requests using the following syntax:
err_t connected_callback(void *arg, struct tcp_pcb *tpcb, err_t err) {
    xil_printf("Connected to JUPITER server\n\r");
    LWIP_UNUSED_ARG(arg);

    /* set callback values & functions */
    tcp_sent(tpcb, sent_callback);
    tcp_recv(tpcb, recv_callback);

    if (err == ERR_OK) {
        char* request = "GET /circuits.json HTTP/1.1\r\n"
                        "Host: jupiter.info.polymtl.ca\r\n\r\n";
        (void) tcp_write(tpcb, request, 100, 1);

        request = "GET /livrable1/simulation.dee HTTP/1.1\r\n"
                  "Host: jupiter.info.polymtl.ca\r\n\r\n";
        (void) tcp_write(tpcb, request, 100, 1);

        tcp_output(tpcb);
        xil_printf("tcp_write \n");
    } else {
        xil_printf("Unable to connect to server");
    }
    return err;
}
I manage to send all of the data to the server, but I never receive any data for the second HTTP request. I manage to print the payload for the first request (the JSON file), but I never receive anything for the .dee file. Are there any specific instructions for enqueueing HTTP requests together with lwIP, or am I missing something?
If you require any more code to accurately analyze my problem, feel free to say so.
Thanks!
The problem I see is that you have a double \r\n combination at the end of your request header. You need \r\n\r\n only at the end of the header; right now you have it twice. Remove it from the first write.
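As a side note, the hardcoded length in tcp_write(tpcb, request, 100, 1) also looks suspect: each string literal is only about 60-70 bytes long, so writing 100 bytes queues garbage from past the end of the literal. A sketch of the same writes with the lengths computed instead (my assumption about a second problem, not something confirmed by the asker; this fragment goes inside connected_callback and needs string.h):

const char *req1 = "GET /circuits.json HTTP/1.1\r\n"
                   "Host: jupiter.info.polymtl.ca\r\n\r\n";
const char *req2 = "GET /livrable1/simulation.dee HTTP/1.1\r\n"
                   "Host: jupiter.info.polymtl.ca\r\n\r\n";

/* TCP_WRITE_FLAG_COPY (1) makes lwIP copy the data into its own buffers */
(void) tcp_write(tpcb, req1, strlen(req1), TCP_WRITE_FLAG_COPY);
(void) tcp_write(tpcb, req2, strlen(req2), TCP_WRITE_FLAG_COPY);
tcp_output(tpcb);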
Now I have got a URL list and I want to fetch all the webpages. Here is what I have done:
for each url:
    getaddrinfo(hostname, port, &hints, &res); // DNS
    // create socket
    sockfd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    connect(sockfd, res->ai_addr, res->ai_addrlen);
    creatGET();
    /* for example:
       GET / HTTP/1.1\r\n
       Host: stackoverflow.cn\r\n
       ...
    */
    writeHead(); // send GET head to host
    recv();      // get the webpage content
end
I have noted that many URLs are under the same host, for example:
http://job.01hr.com/j/f-6164230.html
http://job.01hr.com/j/f-6184336.html
http://www.012yy.com/gangtaiju/32692/
http://www.012yy.com/gangtaiju/35162/
so I wonder, can I connect only once to each host and then just creatGET(), writeHead() and recv() for each URL? That could save a lot of time. So I changed my program like this:
split urls into groups by their host
for each group:
    get hostname of the group;
    getaddrinfo(hostname, port, &hints, &res);
    sockfd = socket(res->ai_family, res->ai_socktype, res->ai_protocol);
    connect(sockfd, res->ai_addr, res->ai_addrlen);
    for each url in the group:
        creatGET();
        writeHead();
        recv();
    end
end
Unfortunately, I find my program can only get the first webpage in each group back; the rest all return empty files.
Am I missing something? Maybe the sockfd needs some kind of reset for each recv()?
Thank you for your generous help.
HTTP/1.1 connections are persistent, meaning that after e.g. a GET/POST -> 200 OK sequence, the next request-response exchange can reuse the already established TCP connection.
But this is not mandatory: the connection can close at any time, so you should code for that as well.
Also, it seems to me that you are trying to implement your own HTTP client. I am not sure why you would want to do that, but anyway, if you must, you should read up on the HTTP RFC to understand the various headers that keep the underlying TCP connection open as long as possible.
Of course, if your server is an old HTTP/1.0 server, you should not expect any connection reuse unless it is explicitly indicated via keep-alive headers.
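One likely reason only the first page in each group comes back is that the responses are not being delimited: on a persistent connection you must consume exactly one response per request before sending the next. A sketch of the idea, assuming HTTP/1.1 responses that carry a Content-Length header (chunked transfer encoding, which a complete client must also handle, is ignored here):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Read exactly one HTTP response from sockfd, assuming a Content-Length
   header. Reads byte-by-byte for simplicity; a real client would buffer. */
static int read_one_response(int sockfd) {
    char headers[8192];
    size_t n = 0;

    /* read until the blank line that terminates the headers */
    while (n < sizeof(headers) - 1) {
        if (read(sockfd, &headers[n], 1) <= 0)
            return -1;
        headers[++n] = '\0';
        if (n >= 4 && memcmp(&headers[n - 4], "\r\n\r\n", 4) == 0)
            break;
    }

    /* parse Content-Length (case-sensitive here, a simplification),
       then consume exactly that many body bytes */
    const char *cl = strstr(headers, "Content-Length:");
    long remaining = cl ? strtol(cl + strlen("Content-Length:"), NULL, 10) : 0;
    char buf[4096];
    while (remaining > 0) {
        size_t want = remaining < (long)sizeof(buf) ? (size_t)remaining : sizeof(buf);
        ssize_t got = read(sockfd, buf, want);
        if (got <= 0)
            return -1;
        /* ... store or process buf[0..got) here ... */
        remaining -= got;
    }
    return 0;
}

After read_one_response() returns, the same sockfd can be reused for the next request to that host, provided the server did not send Connection: close.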
I am trying to load a URL and I get this error:
DownloadError: ApplicationError: 2 Too many repeated redirects
This is the code I am using:
headers = { 'User-Agent' : 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_8; de-at) AppleWebKit/533.21.1 (KHTML, like Gecko) Version/5.0.5 Safari/533.21.1' }
url = "http://www.cafebonappetit.com/menu/your-cafe/collins-cmc/cafes/details/50/collins-bistro"
cmcHTM = urlfetch.fetch(url=url)
cmcHTML = str(cmcHTM.content)
I checked the redirects of this website at http://www.internetofficer.com/seo-tool/redirect-check/ and found that this site redirects to itself! So urlfetch seems to be going in circles trying to load this page.
Meanwhile, the page loads just fine in my browser.
So I tried using this code:
cmcHTM = urlfetch.fetch(url=url,
                        follow_redirects=False,
                        deadline=100)
This just returns nothing though. Is there any way of getting this HTML?
Sorry for the delayed response. I found this that worked:
import urllib, urllib2, Cookie
from google.appengine.api import urlfetch

class URLOpener:
    def __init__(self):
        self.cookie = Cookie.SimpleCookie()

    def open(self, url, data=None):
        if data is None:
            method = urlfetch.GET
        else:
            method = urlfetch.POST
        while url is not None:
            response = urlfetch.fetch(url=url,
                                      payload=data,
                                      method=method,
                                      headers=self._getHeaders(self.cookie),
                                      allow_truncated=False,
                                      follow_redirects=False,
                                      deadline=10)
            data = None  # Next request will be a GET, so no need to send the data again.
            method = urlfetch.GET
            self.cookie.load(response.headers.get('set-cookie', ''))  # Load the cookies from the response
            url = response.headers.get('location')
        return response

    def _getHeaders(self, cookie):
        headers = {
            'Host': 'www.google.com',
            'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 (.NET CLR 3.5.30729)',
            'Cookie': self._makeCookieHeader(cookie)
        }
        return headers

    def _makeCookieHeader(self, cookie):
        cookieHeader = ""
        for value in cookie.values():
            cookieHeader += "%s=%s; " % (value.key, value.value)
        return cookieHeader
I guess the key is the while loop: following the redirects manually, based on the Location header of each response...
I think this is a problem with the site, not with your code. The site seems designed so that it redirects to itself when it doesn't detect some header that is customarily sent by a browser. E.g. when I try accessing it with curl I get an empty body with a 302 redirect to itself, but in the browser I get a page. (Note, by the way, that your first snippet builds a headers dict but never passes it to urlfetch.fetch, so your User-Agent is never actually sent.) You'd have to ask the site owner what they are checking for...
I'm working on an old-school Unix variant (QNX, to be exact) and need a way to grab a web page (no cookies or login; the target URL is just a text file) using nothing but sockets and arrays.
Anyone got a snippet for this?
Note: I don't control the server, and I've got very little to work with besides what is already on the box (adding additional libraries is not really "easy" given the constraints, although I do love libcurl).
I'd look at libcurl if you want SSL support or anything fancy.
However, if you just want to fetch a simple webpage from port 80, then just open a TCP socket, send "GET /index.html HTTP/1.0\r\n\r\n", and parse the output.
I do have some code, but it also supports (Open)SSL, so it's a bit long to post here. In essence:
parse the URL (split out the URL scheme, host name, port number, and scheme-specific part)
create the socket:
    s = socket(PF_INET, SOCK_STREAM, proto);
populate a sockaddr_in structure with the remote IP and port
connect the socket to the far end:
    err = connect(s, &addr, sizeof(addr));
make the request string:
    n = snprintf(headers, sizeof(headers), "GET /%s HTTP/1.0\r\nHost: %s\r\n\r\n", ...);
send the request string:
    write(s, headers, n);
read the data:
    while ((n = read(s, buffer, bufsize)) > 0) {
        ...
    }
close the socket:
    close(s);
NB: the pseudo-code above collects both the response headers and the data; the split between the two is the first blank line.
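Putting those steps together, here is a minimal self-contained sketch (HTTP/1.0, no SSL, no redirect handling; the host and path are illustrative placeholders):

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void) {
    const char *host = "example.com";   /* illustrative target */
    const char *path = "file.txt";      /* illustrative path   */

    /* resolve the host name (gethostbyname is dated but widely available) */
    struct hostent *he = gethostbyname(host);
    if (he == NULL) { fprintf(stderr, "cannot resolve %s\n", host); return 1; }

    int s = socket(PF_INET, SOCK_STREAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    /* populate sockaddr_in with the remote IP and port */
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(80);
    memcpy(&addr.sin_addr, he->h_addr_list[0], he->h_length);

    if (connect(s, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        perror("connect");
        return 1;
    }

    /* build and send the request */
    char headers[512];
    int n = snprintf(headers, sizeof(headers),
                     "GET /%s HTTP/1.0\r\nHost: %s\r\n\r\n", path, host);
    write(s, headers, n);

    /* dump the response (headers + body) to stdout;
       the body starts after the first blank line */
    char buffer[4096];
    while ((n = read(s, buffer, sizeof(buffer))) > 0)
        fwrite(buffer, 1, n, stdout);

    close(s);
    return 0;
}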