How can I tell that the content of this URL is gzip-encoded? - mime-types

I am downloading a Helm chart from https://kubernetes-charts.storage.googleapis.com/redis-0.5.1.tgz. (The fact that it is Redis or related to Helm or anything in particular is irrelevant to this question, which is just about things like Content-Encoding and so on.)
When I check its headers like this:
$ curl -H "Accept-Encoding: gzip" -I https://kubernetes-charts.storage.googleapis.com/redis-0.5.1.tgz
…I do not see a Content-Encoding header in the output, and the Content-Type is listed as being application/x-tar:
HTTP/1.1 200 OK
X-GUploader-UploadID: AEnB2UqBzSXfTToMAdMARXSjJeN0on3jaNY3u74eXcWfvqsOwRpi38Xc6T0XrrmY4otPeySaYRwXyHccHYtChoPAgFQwYZhQMhcpZRWtZURRANGdfRJoupI
Expires: Tue, 27 Jun 2017 00:21:59 GMT
Date: Mon, 26 Jun 2017 23:21:59 GMT
Cache-Control: public, max-age=3600
Last-Modified: Fri, 05 May 2017 03:03:41 GMT
ETag: "e4184c81a58fb731283847222a1f4005"
x-goog-generation: 1493953421241613
x-goog-metageneration: 1
x-goog-stored-content-encoding: identity
x-goog-stored-content-length: 3550
x-goog-meta-goog-reserved-file-mtime: 1493953414
Content-Type: application/x-tar
x-goog-hash: crc32c=bQHveg==
x-goog-hash: md5=5BhMgaWPtzEoOEciKh9ABQ==
x-goog-storage-class: STANDARD
Accept-Ranges: bytes
Content-Length: 3550
Server: UploadServer
Alt-Svc: quic=":443"; ma=2592000; v="39,38,37,36,35"
The resulting file, when downloaded, is a gzipped tar archive.
What is the proper way of programmatically detecting that the payload is in fact gzipped? Or is this a problem with the web server in question?

I think the server is misconfigured. Since .tgz is just an abbreviation for .tar.gz, it should get the content type application/gzip.

Content-Type: application/x-tar
This header tells you the container type, but it does not tell you whether the payload is gzipped. See https://superuser.com/questions/901962/what-is-the-correct-mime-type-for-a-tar-gz-file for what the correct MIME type of a .tar.gz file should be.
For a way to identify it programmatically, see the accepted answer to "How to check if a file is gzip compressed?": a gzip stream always begins with the magic bytes 0x1f 0x8b.
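As a minimal sketch of that programmatic check (Python; the function names and the Range-request shortcut are my own assumptions, and the chart URL from the question may no longer be reachable), testing for those two magic bytes works regardless of what the server claims in its headers:

import urllib.request

GZIP_MAGIC = b"\x1f\x8b"  # every gzip member starts with these two bytes

def is_gzipped_file(path):
    # Check a file already downloaded to disk.
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC

def is_gzipped_url(url):
    # Ask the server for only the first two bytes; read(2) stays cheap
    # even if the Range header is ignored and a full 200 comes back.
    req = urllib.request.Request(url, headers={"Range": "bytes=0-1"})
    with urllib.request.urlopen(req) as resp:
        return resp.read(2) == GZIP_MAGIC

print(is_gzipped_url(
    "https://kubernetes-charts.storage.googleapis.com/redis-0.5.1.tgz"))

The server in question advertises Accept-Ranges: bytes, so the range request only transfers two bytes; a file already fetched with curl can be checked the same way with is_gzipped_file.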

Related

Gatling - extract token from response headers

I'm new to Gatling and I created a POST login request which returns the following response headers:
HTTP/1.1 302
Set-Cookie: JSESSIONID=ECA5F6FEA172B13BF5D445399C9C0962; Path=/; HttpOnly
Location: http://localhost:20001/index;jsessionid=ECA5F6FEA172B13BF5D445399C9C0962
Content-Language: en-US
Content-Length: 0
Date: Thu, 06 May 2021 16:01:20 GMT
I need to extract JSESSIONID value and use it in other requests.
I tried:
.check(regex("JSESSIONID=(.*?);").find.saveAs("token"))
however got an error
> regex(JSESSIONID=(.*?);).findAll.exists, found nothing 1 (100.0%)
Any help would be greatly appreciated!
You need to use headerRegex. The plain regex check runs against the response body, which is empty here (Content-Length: 0), so it finds nothing; headerRegex is applied to the response headers instead:
.check(headerRegex("Set-Cookie", """JSESSIONID=(.*?);""").saveAs("token"))
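For what it's worth, the same pattern can be checked outside Gatling against the raw header value; a quick Python illustration (the header string is copied from the question's response):

import re

set_cookie = "JSESSIONID=ECA5F6FEA172B13BF5D445399C9C0962; Path=/; HttpOnly"
match = re.search(r"JSESSIONID=(.*?);", set_cookie)
print(match.group(1))  # ECA5F6FEA172B13BF5D445399C9C0962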

Gmail automatically urlencodes links in mail

I'm trying to send an email to a gmail account containing a link which looks like this:
http://www.example.com/#/something#param=1
This is a link to an AngularJS application which needs the second '#' as a separator.
The problem is that Gmail changes the second '#' to '%23', and this causes the application not to recognize the character as a separator.
Is there anything I can do about this?
Thanks.
I did a quick test using Gmail to try to figure out what happens.
Here is the raw message:
MIME-Version: 1.0
Received: by 10.96.50.232 with HTTP; Mon, 19 Jan 2015 05:44:30 -0800 (PST)
Date: Mon, 19 Jan 2015 14:44:30 +0100
Delivered-To: *******@gmail.com
Message-ID: <CPZt8dX3Q17wXmU7UT2iXp7q4tSn1UsDmyiUbXFVK7xE2Q0C10A@mail.gmail.com>
Subject: test
From: <name> <*******@gmail.com>
To: <name> <*******@gmail.com>
Content-Type: multipart/alternative; boundary=20cf303b41179134a6050d0185f4
--20cf303b41179134a6050d0185f4
Content-Type: text/plain; charset=UTF-8
http://www.example.com/#/something#param=1
--20cf303b41179134a6050d0185f4
Content-Type: text/html; charset=UTF-8
<div dir="ltr">http://www.example.com/#/something#param=1<br></div>
--20cf303b41179134a6050d0185f4--
The original message shows the correct values, but I also noticed that my actual Gmail client shows the second # as the percent-encoded %23.
Surprisingly enough, and in contrast to what I suggested in my comment, using plain text will actually give the desired result.
MIME-Version: 1.0
Received: by 10.96.50.232 with HTTP; Mon, 19 Jan 2015 06:06:39 -0800 (PST)
Date: Mon, 19 Jan 2015 15:06:39 +0100
Delivered-To: *****@gmail.com
Message-ID: <KMBt8bX2CrmEL66iRFAJ+_s_1W2eodD=9X=bMdsBK_13qzh6DaA@mail.gmail.com>
Subject: test4
From: <name> <*****@gmail.com>
To: <name> <*****@gmail.com>
Content-Type: text/plain; charset=UTF-8
http://www.example.com/#/something#param=1
I don't know how your AngularJS application reads the link from the email, so plain text may not be an option, but the link in the email above is mapped to http://www.example.com/#/something#param=1 in my Gmail client.
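If you control how the message is sent, forcing a plain-text-only body is straightforward. A minimal sketch with Python's standard library (the SMTP host, credentials and addresses are placeholders, not anything from the original question):

import smtplib
from email.mime.text import MIMEText

# Plain-text part only: no HTML alternative for Gmail to rewrite.
body = "http://www.example.com/#/something#param=1"
msg = MIMEText(body, "plain", "utf-8")
msg["Subject"] = "link test"
msg["From"] = "sender@example.com"        # placeholder
msg["To"] = "recipient@example.com"       # placeholder

with smtplib.SMTP("smtp.example.com", 587) as smtp:   # placeholder relay
    smtp.starttls()
    smtp.login("sender@example.com", "app-password")  # placeholder credentials
    smtp.send_message(msg)

Whether the receiving client turns the bare URL into a clickable link (and what it does to the second '#') is still up to that client; the plain-text source itself stays untouched.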

What should be the logic behind parsing a file containing (HTTP and data) and (ETag and data) and (HTTP, ETag and data)

I am trying to parse a file containing multiple headers and data, so that a header part like
HTTP/1.1 302 Found
Cache-Control: no-store, no-cache, private
Pragma: no-cache
Expires: Sat, 15 Nov 2008 16:00:00 GMT
P3P: policyref="http://cdn.adnxs.com/w3c/policy/p3p.xml", CP="NOI DSP COR ADM PSAo PSDo OURo SAMo UNRo OTRo BUS COM NAV DEM STA PRE"
X-XSS-Protection: 0
Location: http://ib.adnxs.com/tt?id=868557&size=728x90&referrer=facebook.com
Date: Thu, 31 Jan 2013 05:49:01 GMT
Content-Length: 0
Content-Type: text/html; charset=ISO-8859-1
can be written into a separate file (.header) and the content after the header information can be stored in a data file (.data).
The problem comes when a header part contains an ETag field, or when only an ETag line appears in place of the HTTP status line.
What should the logic be so that such a file can be parsed? A sketch of one possible approach follows.
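As a sketch of one possible approach (Python; it assumes each record is a header block terminated by a blank line and followed by its data, that a header block begins either with an HTTP status line or with an ETag line, and that those markers do not also appear verbatim inside the data; the file names are made up):

import re

# A header block starts with either an HTTP status line or a bare ETag line.
HEADER_START = re.compile(r"^(HTTP/\d\.\d \d{3}|ETag:)", re.IGNORECASE)

def parse_capture(path):
    """Return a list of (header_lines, data_lines) records."""
    records, header, data, in_header = [], [], [], False
    with open(path, errors="replace") as f:
        for line in f:
            if HEADER_START.match(line) and not in_header:
                if header or data:                # flush the previous record
                    records.append((header, data))
                header, data, in_header = [line], [], True
            elif in_header:
                if line.strip() == "":            # blank line ends the header
                    in_header = False
                else:
                    header.append(line)
            else:
                data.append(line)
    if header or data:
        records.append((header, data))
    return records

for i, (header, data) in enumerate(parse_capture("capture.txt")):
    with open("msg%d.header" % i, "w") as h:
        h.writelines(header)
    with open("msg%d.data" % i, "w") as d:
        d.writelines(data)

Because an ETag line seen while already inside a header block is simply appended to that block, the same loop handles all three cases from the title: HTTP only, ETag only, and HTTP plus ETag.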

Mocking or Stubbing mtime for File::Stat

My object GETs a file over HTTP.
It does so using the If-Modified-Since header: if the file has not been modified since the time in the header, a 304 Not Modified response is returned and the file should not be fetched and written. Like so:
require 'net/http'
require 'time'   # for Time#rfc2822

class YouTube
  # ...
  def parse
    uri = URI("http://i.ytimg.com/vi/#{@id}/0.jpg")
    req = Net::HTTP::Get.new(uri.request_uri)
    if File.exists? thumbname
      # Ask the server to skip the body if the thumbnail is unchanged
      stat = File.stat thumbname
      req['If-Modified-Since'] = stat.mtime.rfc2822
    end
    res = Net::HTTP.start(uri.hostname, uri.port) { |http|
      http.request(req)
    }
    # Only write the file on 200 OK; a 304 Not Modified leaves it untouched
    File.open(thumbname, 'wb') do |f|
      f.write res.body
    end if res.is_a?(Net::HTTPSuccess)
  end
  # ...
end
I want to test both cases (in both cases, a file on disk exists). To do so, I'd need to stub either something in Net::HTTP, or stub File.stat to return an mtime for which I am sure the online resource will return either new content or a Not Modified response.
Should I stub (or even mock) Net::HTTP? And if so, what?
Or should I stub mtime to return a date far in the past or far in the future, to force or suppress the Not Modified response?
Edit: Diving deeper into the matter, I learned that the i.ytimg.com domain does not support these headers, so I'll need to solve this by inspecting JSON from the YouTube API. However, the problem of what and how to mock when testing If-Modified-Since headers still stands.
Here is how I conclude the domain does not support this:
$ curl -I --header 'If-Modified-Since: Sun, 24 Mar 2013 17:33:29 +0100' -L http://i.ytimg.com/vi/D80QdsFWdcQ/0.jpg
HTTP/1.1 200 OK
Content-Type: image/jpeg
Date: Sun, 24 Mar 2013 16:31:10 GMT
Expires: Sun, 24 Mar 2013 22:31:10 GMT
X-Content-Type-Options: nosniff
Server: sffe
Content-Length: 13343
X-XSS-Protection: 1; mode=block
Cache-Control: public, max-age=21600
Age: 822
There is no "Last-Modified" header there. This is illustrated with another call, to example.com, which does support the If-Modified-Since header:
$ curl -I --header 'If-Modified-Since: Sun, 24 Mar 2013 17:33:29 +0100' -L example.com
HTTP/1.0 302 Found
Location: http://www.iana.org/domains/example/
Server: BigIP
Connection: Keep-Alive
Content-Length: 0
HTTP/1.1 302 FOUND
Date: Sun, 24 Mar 2013 16:47:47 GMT
Server: Apache/2.2.3 (CentOS)
Location: http://www.iana.org/domains/example
Connection: close
Content-Type: text/html; charset=utf-8
HTTP/1.1 304 NOT MODIFIED
Date: Sun, 24 Mar 2013 16:47:47 GMT
Server: Apache/2.2.3 (CentOS)
Connection: close
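The question is about Ruby/RSpec, but the two stubbing strategies it weighs are language-agnostic. As a rough analogue in Python with unittest.mock (fetch_thumbnail and its injected collaborators are hypothetical stand-ins for the parse method above, not part of the original code), stubbing the HTTP layer directly makes both branches deterministic:

import unittest
from email.utils import formatdate
from unittest import mock

def fetch_thumbnail(url, path, opener, getmtime):
    # Hypothetical stand-in: send If-Modified-Since based on the local
    # file's mtime, and only (re)write the file on a 200 response.
    headers = {"If-Modified-Since": formatdate(getmtime(path), usegmt=True)}
    status, body = opener(url, headers)            # injected HTTP call
    if status == 200:
        with open(path, "wb") as f:
            f.write(body)
    return status

class FetchThumbnailTest(unittest.TestCase):
    def test_not_modified_leaves_file_alone(self):
        opener = mock.Mock(return_value=(304, b""))    # stubbed HTTP layer
        getmtime = mock.Mock(return_value=0)           # stubbed mtime
        with mock.patch("builtins.open", mock.mock_open()) as opened:
            status = fetch_thumbnail("http://example.com/0.jpg", "0.jpg",
                                     opener, getmtime)
        self.assertEqual(status, 304)
        opened.assert_not_called()                     # nothing was written

    def test_modified_rewrites_file(self):
        opener = mock.Mock(return_value=(200, b"jpeg-bytes"))
        getmtime = mock.Mock(return_value=0)
        with mock.patch("builtins.open", mock.mock_open()) as opened:
            fetch_thumbnail("http://example.com/0.jpg", "0.jpg",
                            opener, getmtime)
        opened().write.assert_called_once_with(b"jpeg-bytes")

if __name__ == "__main__":
    unittest.main()

The same idea maps back to Ruby: stub the HTTP call (for example with WebMock or an RSpec double on Net::HTTP) to return a 200 or a 304 rather than trying to pick an mtime the real server will honour, and separately stub File.stat so the If-Modified-Since value itself is under the test's control.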

setting a Content-Type in CakePHP when the response is large-ish (>4kB)

Quite simply, I'm trying to generate and download a CSV file from a CakePHP controller. No problem generating the CSV, and everything works until the response is >= 4096 bytes.
The following controller illustrates the problem:
class TestTypeController extends Controller {
    public function csv($size = 100) {
        # set the content type
        Configure::write('debug', 0);
        $this->autoRender = false;
        $this->response->type('csv');
        # send the response
        for ($i = 0; $i < $size; $i++)
            echo 'x';
    }
}
When I call http://baseurl/test_type/csv/4095, I'm prompted to save the file, and the Content-Type header is text/csv. The response headers are:
HTTP/1.1 200 OK
Date: Tue, 05 Jun 2012 14:28:56 GMT
Server: Apache/2.2.22 (Ubuntu)
X-Powered-By: PHP/5.3.10-1ubuntu3.1
Content-Length: 4095
Keep-Alive: timeout=5, max=98
Connection: Keep-Alive
Content-Type: text/csv; charset=UTF-8
When I call http://baseurl/test_type/csv/4096, the file is printed to the screen, and the response headers are:
HTTP/1.1 200 OK
Date: Tue, 05 Jun 2012 14:28:53 GMT
Server: Apache/2.2.22 (Ubuntu)
X-Powered-By: PHP/5.3.10-1ubuntu3.1
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 38
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html
Obviously, 4 kB is the threshold at which the response starts being gzipped. I'm not familiar with how the Content-Type is meant to react, but I'd obviously prefer it to remain text/csv.
The same problem occurs using the RequestHandlerComponent to manage the type of the response.
I'm using CakePHP 2.2.0-RC1, but I've verified the problem exists with stable 2.1.3. Any ideas? Pointers in the right direction?
The answer was pretty simple: the controller should return the CSV data instead of echoing it. Echoed output bypasses CakePHP's response object, and once it grows past PHP's output buffer (output_buffering defaults to 4096 bytes, which matches the 4095/4096 boundary observed above), the buffer is flushed and default headers go out before the text/csv Content-Type set on the response is ever sent.
