How are HTTP downloads segmented inside TCP packets?

I'm trying to figure out how an HTTP server encodes/splits a file during an HTTP download.
When I capture the download in Wireshark I can find four HTTP headers (see below) and a bunch of TCP packets without any headers. I would like to know how the TCP packets are formed and whether I can retrieve any concrete data from them (like the name of the file, an ID, or something else substantial).
First header :
GET /upload/toto.test HTTP/1.1
Host: 192.168.223.167:90
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36
Accept-Encoding: gzip,deflate,sdch
Accept-Language: fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4
Range: bytes=3821-3821
If-Range: "40248-5800428-4fab43ec800ce"
Second header :
HTTP/1.1 206 Partial Content
Date: Sat, 31 May 2014 21:25:31 GMT
Server: Apache/2.2.22 (Debian)
Last-Modified: Sat, 31 May 2014 15:59:21 GMT
ETag: "40248-5800428-4fab43ec800ce"
Accept-Ranges: bytes
Content-Length: 1
Content-Range: bytes 3821-3821/92275752
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Third :
GET /upload/toto.test HTTP/1.1
Host: 192.168.223.167:90
Connection: keep-alive
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36
Accept-Encoding: gzip,deflate,sdch
Accept-Language: fr-FR,fr;q=0.8,en-US;q=0.6,en;q=0.4
Range: bytes=3821-92275751
If-Range: "40248-5800428-4fab43ec800ce"
Last one :
HTTP/1.1 206 Partial Content
Date: Sat, 31 May 2014 21:25:31 GMT
Server: Apache/2.2.22 (Debian)
Last-Modified: Sat, 31 May 2014 15:59:21 GMT
ETag: "40248-5800428-4fab43ec800ce"
Accept-Ranges: bytes
Content-Length: 92271931
Content-Range: bytes 3821-92275751/92275752
Keep-Alive: timeout=5, max=99
Connection: Keep-Alive
TCP Packet following the fourth http header (in ASCII) :
PV)?FEM##cZU:P-O"-~zLW^2&$Z$f5APzve~BuH5/}`z2MI"{lIQCBmTO-ah6O)497Kro+gS((R
8n8_lMXusDp{Qs1g?j~iZaB.ADI|yp((t3#4SA4[MV#N1(2He|a9}Dw`'=k^C;G%#KUD``Sw:KXYG1{pxP,*`BSAMO0?FlFb(~X/|Ub=H[b7Y'NAP])IARH(g*LI}AE%BzFOzN5Xf7$D|.Hw00AUh[lE)ovKAUmcSuFnzQS+T0=z7;#nKX2!>ik)p73a5{h2ZZo~etin"UCFc+#ZjgB60y()-1{e|XRj9r:zDM(ulcSAayGeZCks7Nnz{L8(&L8Ew?J9}WA/t?^xS{sbnw8J7/%Iqt0i4_h*D6?|[&3zFngl~ku>#RVp+:`'RdtKh(",MPJqx5
tov&pZV8)'X?iW(J1d-!]FM>_Q\V=&xYH C9G?dp6&
\td|k$AY!D^`HnW=OsMcbV(*(RQL-xhWPa\:C>-M'oH fGwr:0=\K7!lMoPH)fB2OSUrg89
For the curious, this file is an image of Android (sample for the question).
EDIT for CodeCaster :
I'm trying to limit the outgoing bandwidth generated by a download requested from a Node.js server. The catch is that I have to do this at the network level (with iptables, actually) and not at the code level. Because it is a per-user limit, I need a distinctive string (ASCII or hexadecimal) that I could use to filter packets and limit that user's download bandwidth. My original question is about how the content is formatted/encoded. I'm not looking for another way (I know there are some); this is a constraint of the context.

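TCP is the transport-layer protocol (layer 4 in OSI terms), and a PDU (protocol data unit, loosely a "packet") is processed at each layer on its way down the stack. Each layer adds its own header: HTTP writes its header once at the start of the response, TCP then puts its own header on every segment it sends, and each segment goes on to the network layer for further processing. Because HTTP sends its header only once and the file follows as a plain byte stream, the TCP packets that follow the response header carry nothing but raw file bytes. That is why you see them "without any headers".
As for the data size of each PDU, that depends on the link's MTU (maximum transmission unit). For instance, Ethernet's MTU is 1500 bytes, which leaves about 1460 bytes of payload per TCP segment once the 20-byte IP and 20-byte TCP headers (without options) are subtracted.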
As for getting data out: if you mean from the HTTP header, it's simple enough to write code that searches for certain fields (like Content-Length or Server). If you mean from the data PDUs, that is generally not a good idea unless it is for analysis, in which case Wireshark should work. (If I recall correctly; it's been a long time since I used Wireshark.)
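To make the segmentation concrete, here is a minimal sketch in Python. It is only an illustration: the `segment_stream` helper, the 1460-byte segment size and the 4000-byte body are assumptions for the example, and a real TCP stack picks segment boundaries itself (negotiated MSS, TCP options, offloading). It shows that only the first piece starts with the HTTP header and everything after it is opaque file content.

```python
def segment_stream(stream: bytes, mss: int = 1460) -> list[bytes]:
    """Split an application byte stream into TCP-payload-sized pieces.

    Hypothetical helper: 1460 assumes an Ethernet MTU of 1500 minus
    20 bytes of IP header and 20 bytes of TCP header (no options).
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [stream[i:i + mss] for i in range(0, len(stream), mss)]


# Made-up response: one HTTP header block followed by arbitrary file bytes.
header = (
    b"HTTP/1.1 206 Partial Content\r\n"
    b"Content-Length: 4000\r\n"
    b"Content-Range: bytes 3821-7820/92275752\r\n"
    b"\r\n"
)
body = bytes(range(256)) * 15 + bytes(160)  # 4000 arbitrary bytes
segments = segment_stream(header + body)

for index, payload in enumerate(segments):
    kind = "starts with HTTP header" if payload.startswith(b"HTTP/") else "raw file bytes"
    print(index, len(payload), kind)
```

Since the later segments contain no file name or ID, the only thing that ties them to a given download is the TCP/IP header itself (source and destination address and port).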

Related

Too many OPTIONS requests

In my application, the front end (ReactJS using axios, if that matters) makes some API calls to the back end (Node/Express, again if that matters). In all of its responses, the server sends Access-Control-Allow-Origin: * (this is a test environment; appropriate changes will be made to allow specific origins in production).
In the Network tab of Chrome Developer Tools, I observe that for every request, say POST /assets, POST /filters, PUT /media, etc., a preflight OPTIONS request is sent. I do understand the reason for those, and that's fine.
OPTIONS Request Headers
OPTIONS /api/v1/content/bb54fbf52909f78e015f/f91659797e93cba7ae9b/asset/all HTTP/1.1
Host: XX.X.XX.XXX:5000
Connection: keep-alive
Access-Control-Request-Method: POST
Origin: http://localhost:3000
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36
Access-Control-Request-Headers: authorization,content-type
Accept: */*
DNT: 1
Referer: http://localhost:3000/main/93f1ced0f15f35024402/assets
Accept-Encoding: gzip, deflate
Accept-Language: en,en-US;q=0.8,mr;q=0.6
Response Headers
HTTP/1.1 204 No Content
Access-Control-Allow-Origin: *
Access-Control-Allow-Methods: GET,HEAD,PUT,PATCH,POST,DELETE
Vary: Access-Control-Request-Headers
Access-Control-Allow-Headers: authorization,content-type
Date: Sat, 05 Aug 2017 10:09:16 GMT
Connection: keep-alive
My observation is that this is sent for literally every request, and repeatedly, i.e. even if the same request is made again (immediately or otherwise).
My questions are
Is this necessarily a bad thing (i.e. would it cause any performance issues, even minor)?
Why doesn't the browser remember the header responses for the same server and the same request?
Is there anything I am missing to configure on the front end or backend for making this sticky?
You need to send the Access-Control-Max-Age header to tell the browser that it’s OK to cache your other Access-Control-* headers for that many seconds:
Access-Control-Max-Age: 600
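As a rough sketch of what the preflight response would then contain, here it is in Python (not the Express code from the question). The `build_preflight_response` helper is hypothetical; it just mirrors the headers shown in the question and adds Access-Control-Max-Age.

```python
def build_preflight_response(requested_headers: str, max_age_seconds: int = 600) -> list[tuple[str, str]]:
    """Return the headers for a 204 reply to a CORS preflight OPTIONS request.

    Hypothetical helper: the values copy the response in the question,
    plus Access-Control-Max-Age so the browser may cache the result.
    """
    return [
        ("Access-Control-Allow-Origin", "*"),
        ("Access-Control-Allow-Methods", "GET,HEAD,PUT,PATCH,POST,DELETE"),
        ("Access-Control-Allow-Headers", requested_headers),
        ("Vary", "Access-Control-Request-Headers"),
        ("Access-Control-Max-Age", str(max_age_seconds)),
    ]


response_headers = build_preflight_response("authorization,content-type")
for name, value in response_headers:
    print(f"{name}: {value}")
```

Note that browsers may cap the value they actually honor, so a very large max-age does not necessarily mean the preflight is cached for that long.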

UNIX C HTTP request returning 301 Moved Permanently

I am familiar with the 301 status code but new to HTTP requests and formatting them correctly.
In my program I need to retrieve my school's homepage, but I get a 301 Moved Permanently header. The header's Location says where the page moved to, but even that new location won't work for me, probably because I didn't format it correctly.
Initially I send this request:
GET / HTTP/1.1\r\nHost: www.cs.uregina.ca\r\nConnection: close\r\n\r\n
And receive this header:
Received: HTTP/1.1 301 Moved Permanently
Date: Tue, 04 Nov 2014 05:38:42 GMT
Server: Apache
Location: http://www.cs.uregina.ca/
Connection: close
Transfer-Encoding: chunked
Content-Type: text/html; charset=utf-8
What should my new HTTP request look like to get the above moved webpage?
If I try the location of the moved page as it suggests, then I get the following 400 Bad Request response:
GET / HTTP/1.1\r\nHost: http://www.cs.uregina.ca\r\nConnection: close\r\n\r\n
Received: HTTP/1.1 400 Bad Request
Date: Tue, 04 Nov 2014 05:52:36 GMT
Server: Apache
Content-Length: 334
Connection: close
Content-Type: text/html; charset=iso-8859-1
Initially i send this request:
GET / HTTP/1.1\r\nHost: www.cs.uregina.ca\r\nConnection: close\r\n\r\n
And receive this header:
Received: HTTP/1.1 301 Moved Permanently
...
Location: http://www.cs.uregina.ca/
...
This is exactly what I get when I request cs.uregina.ca. You have probably connected to cs.uregina.ca (or some subdomain other than www), or to an IP address that does not correspond to www.cs.uregina.ca.
If i try the location of the moved page like it suggests then i get the following 400 Bad Request Response:
GET / HTTP/1.1\r\nHost: http://www.cs.uregina.ca\r\nConnection: close\r\n\r\n
Received: HTTP/1.1 400 Bad Request
...
This is not surprising. The Host: header takes only the host name (plus an optional port), so you must remove the http:// scheme prefix from it, e.g.:
GET / HTTP/1.1\r\nHost: www.cs.uregina.ca\r\nConnection: close\r\n\r\n
In general, when requesting a URL such as the following:
http://domain.example:80/path/to/resource/?query#fragment
----   -------------- ==------------------------
protocol    host      |           path
                     port
you would:
Resolve the host name to an IP address, and connect to that IP address on port (if present in the URL) or on the default port associated with the protocol.
Communicate with the server using a mechanism specific to protocol. In this case, an HTTP request.
Request path from the server with an appropriate Host: header (in case there are multiple hosts on the same IP).
The fragment identifier is used with (X)HTML and is not actually sent to the server.
The request should (at a minimum) look like this:
GET /path/to/resource/?query HTTP/1.1
Host: domain.example
Connection: close
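The steps above can be sketched in Python with the standard library's `urllib.parse.urlsplit`. The `build_get_request` helper and `DEFAULT_PORTS` table are made up for this illustration and only handle plain http/https URLs with ASCII host names; the point is that the scheme never ends up in the Host: header and the fragment never ends up in the request.

```python
from urllib.parse import urlsplit

# Default ports per protocol (only the two schemes this sketch supports).
DEFAULT_PORTS = {"http": 80, "https": 443}


def build_get_request(url: str) -> tuple[str, int, bytes]:
    """Return (host to connect to, port, raw request bytes) for a URL."""
    parts = urlsplit(url)
    if parts.scheme not in DEFAULT_PORTS or not parts.hostname:
        raise ValueError(f"unsupported or incomplete URL: {url!r}")

    port = parts.port or DEFAULT_PORTS[parts.scheme]

    # Request target: path plus query; the fragment is never sent.
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query

    # Host header: host name only, with the port if it is not the default.
    host_header = parts.hostname
    if port != DEFAULT_PORTS[parts.scheme]:
        host_header += f":{port}"

    request = (
        f"GET {target} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")
    return parts.hostname, port, request


host, port, request = build_get_request(
    "http://domain.example:80/path/to/resource/?query#fragment"
)
print(host, port)
print(request.decode("ascii"))
```

For the redirect in the question, passing the Location value http://www.cs.uregina.ca/ to this helper gives Host: www.cs.uregina.ca and a request target of /, which is the request that should be sent to that host on port 80.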
The full details can be found in:
RFC 7230: Hypertext Transfer Protocol (HTTP/1.1): Message Syntax and Routing.
RFC 7231: Hypertext Transfer Protocol (HTTP/1.1): Semantics and Content.
RFC 7232: Hypertext Transfer Protocol (HTTP/1.1): Conditional Requests.
RFC 7233: Hypertext Transfer Protocol (HTTP/1.1): Range Requests.
RFC 7234: Hypertext Transfer Protocol (HTTP/1.1): Caching.
RFC 7235: Hypertext Transfer Protocol (HTTP/1.1): Authentication.
If you just want the homepage, download nc and type "nc www.cs.uregina.ca 80"
When nc starts type the following and then hit return twice:
GET http://www.cs.uregina.ca HTTP/1.0

How to live stream video using a C program? What should the HTTP reply be? How can I use chunked encoding, if possible?

(the actual question has been edited because I was successful doing live streaming, BUT NOW I DO NOT UNDERSTAND THE COMMUNICATION between client and my C code.)
Okay I finally did live streaming using my C code. BUT I COULD NOT UNDERSTAND HOW "HTTP" IS WORKING HERE.
I studied the communication b/w my browser and the server at the link http://www.flumotion.com/demosite/webm/ using wireshark.
I found that the client first sends this GET request
GET /ahiasfhsasfsafsgfg.webm HTTP/1.1
Host: localhost
Connection: keep-alive
Referer: file:///home/anirudh/Desktop/anitom.html
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.98 Safari/534.13
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Range: bytes=0-1024
to this get request the server responds by sending this reply
HTTP/1.0 200 OK
Date: Tue, 01 Mar 2011 06:14:58 GMT
Connection: close
Cache-control: private
Content-type: video/webm
Server: FlumotionHTTPServer/0.7.0.1
and then the server sends the data until the client disconnects. The client disconnects once it has received a certain amount of data. The client then opens a new connection to the server (from a new port) and sends the same GET request. The server again gives the same reply, but this time the client does not disconnect and keeps reading packets until the server disconnects. I wrote C code with a server socket that replicates the above behavior. (Thanks to Wireshark, Flumotion and Stack Overflow.)
BUT BUT BUT, I could not understand why the client needs to send two requests, why it resets the connection on the first request, and why it then sends the same request again from a new port, this time reading the data as if it were being live streamed.
Also, I do not know how I can live stream using chunked encoding.
The same thing in detail is available here : http://systemsdaemon.blogspot.com/2011/03/live-streaming-video-tutorial-for.html
and here http://systemsdaemon.blogspot.com/2011/03/http-streaming-video-using-program-in-c.html
Please help me out. Thanks in advance.
The first request only asks for the beginning of the stream (Range: bytes=0-1024) in order to test that it is actually a valid video source and not, say, a 600MB Windows executable.
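On the chunked encoding part of the question: the Flumotion reply above is HTTP/1.0 with Connection: close, so the body simply runs until the server closes the socket. With HTTP/1.1 the server can instead send Transfer-Encoding: chunked and frame each piece of data itself. Below is a minimal sketch of that framing in Python (not C, and not taken from the linked tutorial); `encode_chunk`, `chunked_body` and `RESPONSE_HEAD` are names invented for the example.

```python
from typing import Iterable, Iterator

# Response head announcing a chunked body (HTTP/1.1 only).
RESPONSE_HEAD = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: video/webm\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
)

# A zero-length chunk followed by an empty line ends the body.
LAST_CHUNK = b"0\r\n\r\n"


def encode_chunk(data: bytes) -> bytes:
    """Frame one piece of data: size in hex, CRLF, the data, CRLF."""
    if not data:
        raise ValueError("an empty chunk would end the body; send LAST_CHUNK instead")
    return f"{len(data):x}".encode("ascii") + b"\r\n" + data + b"\r\n"


def chunked_body(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Yield each non-empty piece as a chunk, then the terminating chunk."""
    for piece in pieces:
        if piece:
            yield encode_chunk(piece)
    yield LAST_CHUNK


wire_bytes = RESPONSE_HEAD + b"".join(chunked_body([b"first video bytes", b"more"]))
print(wire_bytes)
```

For a live stream the server would write RESPONSE_HEAD once and then one encoded chunk per block of encoder output, sending LAST_CHUNK only when the stream ends.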

HTTP POST: why doesn't it print anything?

I'm writing a sniffer program with the pcap library that checks HTTP traffic. It works when I'm looking for GET messages or status codes, but I don't know why it doesn't work for POST requests.
I tried Wireshark and saw that for POST requests, in addition to the HTTP protocol, there is also a Line-based text data: application/x-www-form-urlencoded "protocol".
When I try to print the content of the payload I get no results, or I get strange characters, so I was thinking that maybe the problem is this "Line-based.." stuff.
Any idea of the possible cause?
The strange characters may come from POST bodies that are UTF-8 encoded rather than ASCII encoded. It also depends on which applications you are trying to capture, as some Flash apps use POST requests but encrypt them to prevent tampering.
EDIT: See my answer to your other question
This is what I'm capturing with tcpdump. What do you see?
POST /xml/crud/posttest.cgi HTTP/1.1
Host: www.snee.com
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.12) Gecko/20101027 Fedora/3.6.12-1.fc13 Firefox/3.6.12
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Referer: http://www.snee.com/xml/crud/posttest.html
Content-Type: application/x-www-form-urlencoded
Content-Length: 21
fname=test&lname=test
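For comparison, here is a small Python sketch (not pcap/C code) of how a captured payload like the one above can be split into header lines and form fields. `split_http_message` is a hypothetical helper and the sample bytes are a shortened copy of the capture. It assumes the whole request sits in one unencrypted TCP payload; in a real capture the body can arrive in a later segment than the headers, so the stream may need to be reassembled first.

```python
from urllib.parse import parse_qs


def split_http_message(payload: bytes) -> tuple[list[str], bytes]:
    """Split a raw HTTP message at the blank line that ends the headers."""
    head, separator, body = payload.partition(b"\r\n\r\n")
    if not separator:
        raise ValueError("no blank line found; the headers may continue in a later segment")
    return head.decode("iso-8859-1").split("\r\n"), body


# Shortened copy of the captured request (some headers omitted).
raw = (
    b"POST /xml/crud/posttest.cgi HTTP/1.1\r\n"
    b"Host: www.snee.com\r\n"
    b"Content-Type: application/x-www-form-urlencoded\r\n"
    b"Content-Length: 21\r\n"
    b"\r\n"
    b"fname=test&lname=test"
)

header_lines, body = split_http_message(raw)
fields = parse_qs(body.decode("ascii"))

print(header_lines[0])
print(len(body), fields)
```

The "Line-based text data" entry in Wireshark is just its way of displaying this body; on the wire it is nothing more than the bytes after the blank line.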

Google Mobile Ads in Win Phone 7 Applications

I'm interested in using Google Ads in my Win Phone 7 Application. I've created a custom control that currently uses AdMob services to load ads, and I'm interested in incorporating a Google Ads provider (as well as any others I can). You can see the source for this control here: https://bitbucket.org/jacob4u2/moads/wiki/Home.
The best-case scenario for me would be information about some kind of REST-based JSON service that I could call to get back information such as the image URL, ad text, and ad link URL. I've already done some research on the JavaScript that is added to a website and calls out to such a service to get ads; I would just like to know about the legality and feasibility of using this underlying service myself.
Here's a look at the underlying service request and response from the Google Mobile Website AdSense JavaScript, captured with Fiddler:
GET http://googleads.g.doubleclick.net/pagead/ads?oe=utf8&ad_type=text_image&client=[someclientstring]&color_bg=FFFFFF&color_border=336699&color_link=0000FF&color_text=000000&color_url=008000&correlator=1283032525791&dt=1283032525791&ea=0&flash=0&format=320x50_mb&frm=1&js=afmc-v1.1&output=html&u_ah=738&u_aw=1366&u_cd=32&u_h=768&u_w=1366&u_his=1&u_tz=-240&url=http%3A%2F%2Flocalhost%3A53339%2F&dtd=5 HTTP/1.1
Host: googleads.g.doubleclick.net
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.127 Safari/533.4
Referer: http://localhost:53339/
Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en-US,en;q=0.8
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Cookie: __gads=ID=2ca5d68be0ad9c24:T=1276802611:S=ALNI_Mb20Pe5DhybgSn6XMox3s10fBFcgw; VWCUK200=L070410/Q46888_8658_5_070410_2_123110_188666x187920x070410x1x2/Q46885_8658_5_062810_1_123110_188672x187926x062910x1x1; id=ca99132260000f4|1782317/496326/14815|t=1272328868|et=730|cs=w4txjauw
HTTP/1.1 200 OK
P3P: policyref="http://googleads.g.doubleclick.net/pagead/gcn_p3p_.xml", CP="CURa ADMa DEVa TAIo PSAo PSDo OUR IND UNI PUR INT DEM STA PRE COM NAV OTC NOI DSP COR"
Content-Type: text/html; charset=UTF-8
X-Content-Type-Options: nosniff
Date: Sat, 28 Aug 2010 21:54:25 GMT
Server: cafe
Cache-Control: private, x-gzip-ok=""
Content-Length: 603
X-XSS-Protection: 1; mode=block
<html><body style="background-color:transparent"></body></html>
Looks like a lot of parameters, hopefully I've removed any confidential stuff. Anyone ever looked into anything like this?
I would contact Google to see if this is within their terms of service. It would be a shame to do the coding and then find out that you get no revenue from them.
I would also consider how the ads are chosen if this is not a web page. Typically the ads are chosen based on the page context. In Silverlight apps on the phone there is no web page context.
