Great tool; it does everything I need. I love the Transform tab, which allows compressing the response body. But what about the request? It seems like a simple thing, but I don't see that functionality. Am I missing something?
Fiddler Web Debugger, V2.3.4.4.
You can write a bit of script to compress the request body. Click Rules > Customize Rules, and add something like this:
static function OnBeforeRequest(oSession: Session) {
    // Only compress requests that actually have a body
    if (oSession.requestBodyBytes != null && oSession.requestBodyBytes.Length > 0) {
        oSession.requestBodyBytes = Utilities.GzipCompress(oSession.requestBodyBytes);
        oSession.oRequest["Content-Length"] = oSession.requestBodyBytes.Length.ToString();
        oSession.oRequest["Content-Encoding"] = "gzip";
    }
}
However, I'm not aware of any servers that actually support compressed requests. There's no good way for a server to advertise that it accepts compressed request bodies, and zip-bomb attacks are a real threat to servers.
Background: I need to obtain the handles of the Twitter users tagged in a Tweet's attached media. Unfortunately there's no current API method to do that (see https://twittercommunity.com/t/how-to-get-tags-of-a-media-in-a-tweet/185614 and https://github.com/twitterdev/open-evolution/issues/34).
I have no choice but to scrape. This is an example URL: https://twitter.com/justinwood_/status/1626275168157851650/media_tags. It is the page which pops up when you click the tags link under the media of the parent Tweet: https://twitter.com/justinwood_/status/1626275168157851650/
The React-generated DOM is deep and ugly but would be scrapeable; however, I do not want to log in with any account and risk getting it banned. Unfortunately, when you visit https://twitter.com/justinwood_/status/1626275168157851650/media_tags in an Incognito window, the popup shows up dead empty. But when I dig into the network requests, the /TweetDetail GraphQL response is full of messages about the anonymous page visit, yet it still contains the list of handles I need.
So what I need is a scraper which is able to process JavaScript and to capture the response of that specific GraphQL call. Selenium can drive a headless Chrome, so it is able to process JavaScript, and Selenium-Wire offers the ability to capture the response.
Unfortunately, my Selenium-Wire script only sees the TweetResultByRestId and UsersByRestId GraphQL requests; the TweetDetail request is missing. I don't know what to tweak to make all the requests happen, and I have already iterated over a ton of Chrome options. Here is a variation of my script:
from seleniumwire import webdriver
from selenium.webdriver.chrome.service import Service

chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument("--disable-extensions")
chrome_options.add_argument("--disable-gpu")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--headless")  # for Jenkins
chrome_options.add_argument("--disable-dev-shm-usage")  # Jenkins
chrome_options.add_argument('--start-maximized')
chrome_options.add_argument('--window-size=1900,1080')
chrome_options.add_argument('--ignore-certificate-errors-spki-list')
chrome_options.add_argument('--ignore-ssl-errors')

selenium_options = {
    'request_storage_base_dir': '/tmp',  # use /tmp to store captured data
    'exclude_hosts': ''
}

ser = Service('/usr/bin/chromedriver')
ser.service_args = ["--verbose", "--log-path=test.log"]

driver = webdriver.Chrome(service=ser, options=chrome_options,
                          seleniumwire_options=selenium_options)

tweet_id = "1626275168157851650"
twitter_media_url = f"https://twitter.com/justinwood_/status/{tweet_id}/media_tags"
driver.get(twitter_media_url)

# Raises TimeoutException if the request is never captured
driver.wait_for_request("/TweetDetail", timeout=10)
Any ideas?
Update: it looks like I actually need to scrape the parent Tweet URL https://twitter.com/justinwood_/status/1626275168157851650/ instead; now the GraphQL call I'm after does happen. I probably got confused while trying a hundred combinations.
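Building on that update, here is a minimal, untested sketch of how the capture could look with Selenium-Wire, reusing the driver from the script above (sw_decode is Selenium-Wire's response-body decoder; the JSON structure of the response is an assumption to verify):
import json
from seleniumwire.utils import decode as sw_decode

# Visit the parent Tweet instead of /media_tags, per the update above
driver.get(f"https://twitter.com/justinwood_/status/{tweet_id}")

# Blocks until the GraphQL call is captured, or raises after 10 seconds
request = driver.wait_for_request("/TweetDetail", timeout=10)

# Selenium-Wire stores the raw (possibly gzip- or br-encoded) body
body = sw_decode(request.response.body,
                 request.response.headers.get("Content-Encoding", "identity"))
data = json.loads(body.decode("utf-8"))
# ...walk `data` to find the tagged users' screen names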
Does Flink support the Side Outputs feature in the DataSet (batch) API? If not, how should valid and invalid records be separated when loading from a file?
You can always do something like this:
DataSet<EventOrInvalidRecord> goodAndBadTogether = input.map(new CreateObjectIfPossible());
goodAndBadTogether.filter(new KeepOnlyGood())...
goodAndBadTogether.filter(new KeepOnlyBad())...
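For context, a hedged sketch of what those supporting classes might look like (EventOrInvalidRecord, CreateObjectIfPossible, KeepOnlyGood, and the Event parser are all hypothetical names taken from the snippet above):
import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;

public static class EventOrInvalidRecord {
    public Event event;  // null when parsing failed
    public String raw;   // the original line, kept for the invalid path
}

public static class CreateObjectIfPossible
        implements MapFunction<String, EventOrInvalidRecord> {
    @Override
    public EventOrInvalidRecord map(String line) {
        EventOrInvalidRecord result = new EventOrInvalidRecord();
        result.raw = line;
        try {
            result.event = Event.parse(line);  // hypothetical parser
        } catch (Exception e) {
            result.event = null;               // mark the record as invalid
        }
        return result;
    }
}

public static class KeepOnlyGood implements FilterFunction<EventOrInvalidRecord> {
    @Override
    public boolean filter(EventOrInvalidRecord r) {
        return r.event != null;
    }
}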
Another reasonable option in some cases is to go ahead and use the DataStream API, even if you don't have streaming sources.
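In the DataStream API, side outputs are available through a ProcessFunction. A minimal sketch, again assuming a hypothetical Event type whose parse method throws on bad input:
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.functions.ProcessFunction;
import org.apache.flink.util.Collector;
import org.apache.flink.util.OutputTag;

// The anonymous subclass is required so Flink can capture the type information
final OutputTag<String> invalidTag = new OutputTag<String>("invalid") {};

SingleOutputStreamOperator<Event> parsed = lines
    .process(new ProcessFunction<String, Event>() {
        @Override
        public void processElement(String line, Context ctx, Collector<Event> out) {
            try {
                out.collect(Event.parse(line));
            } catch (Exception e) {
                ctx.output(invalidTag, line);  // invalid records go to the side output
            }
        }
    });

DataStream<String> invalid = parsed.getSideOutput(invalidTag);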
I'm here because, as the title mentions, I'm searching for a way to issue a query from GeoTools (through GeoServer) to get features from a Solr index.
To be more precise:
I saw in the GeoServer user manual that I can query Solr over HTTP like this:
http://localhost:8080/geoserver/wfs?service=WFS&version=1.1.0&request=GetFeature
&typeName=mySolrLayer
&format="xxx"
&viewparams=q:"mySolrQuery"
The important part of this URL is viewparams, which I want to set directly from GeoTools.
I have already tried this (here is part of my code):
URL url = new URL(
    "http://localhost:8080/geoserver/wfs?request=GetCapabilities&VERSION=1.1.0");

Map<String, Object> params = new HashMap<>();
params.put(WFSDataStoreFactory.URL.key, url);
params.put("viewparams", "q:myquery");

Hints hints = new Hints();
hints.put(Hints.VIRTUAL_TABLE_PARAMETERS, "q:myquery");
query.setHints(hints);

...

featureSource.getFeatures(query);
But it doesn't seem to work: the URL sent to GeoServer is a plain GetFeature request without the viewparams parameter.
I tried this with GeoTools 12.2, 13.2, and 15-SNAPSHOT, but I didn't succeed in passing the query; GeoServer sends me all the features in my database and ignores viewparams.
I need to do it like this because the query actually comes from another program, and I want an easy way to forward it to another part of the project...
Can someone help me?
There doesn't currently seem to be a way to do this with GeoTools' WFSDataStore implementations, because the GetFeature request is constructed from the URL provided by the GetCapabilities document. This is what the standard requires, but it may be worth filing a feature-enhancement request to let clients override that URL (as QGIS does, for example); that would let you specify the additional parameter in your base URL, which would then be passed to the server as you need.
Unfortunately, the WFS module lives in unsupported land at present, so unless you have the resources to work on this issue yourself and can provide a PR to implement it, there is not a great chance of it being implemented.
When uploading to GCS (Google Cloud Storage) using the Blobstore's createUploadURL function, I can provide a callback together with header data that will be POSTed to the callback URL.
There doesn't seem to be a way to do that with GCS's signed URLs.
I know there is Object Change Notification, but that won't allow the user to provide upload-specific information in the header of a POST, the way createUploadURL's callback does.
My feeling is that if createUploadURL can do it, there must be a way to do it with signed URLs, but I can't find any documentation on it. I was wondering if anyone knows how createUploadURL achieves that callback behavior.
PS: I'm trying to move away from createUploadURL because of the __BlobInfo__ entities it creates, which I do not need for my specific use case, and which somehow seem to be indelible and are wasting storage space.
Update: It worked! Here is how:
Short answer: it cannot be done with PUT, but it can be done with POST.
Long answer:
If you look at the signed-URL page, in front of HTTP_Verb, under Description, there is a subtle note that the page is only relevant to GET, HEAD, PUT, and DELETE; POST is a completely different game. I had missed this, but it turned out to be very important.
There is a whole page of HTTP headers that does not list an important header usable with POST: that header is success_action_redirect, as voscausa correctly answered.
On the POST page, Google "strongly recommends" using PUT unless you are dealing with form data. However, POST has a few nice features that PUT does not. Maybe they worry that POST gives us too many strings to hang ourselves with.
But I'd say it is totally worth dropping createUploadURL and writing your own code to redirect to a callback. Here is how:
Code:
If you are working in Python, voscausa's code is very helpful.
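If it helps, here is a rough, untested Python equivalent of the JavaScript below, assuming the App Engine app_identity service (the key, bucket, and redirect-URL values are placeholders):
import base64
import json
from datetime import datetime, timedelta
from google.appengine.api import app_identity

key = 'uploads/my-object'                    # placeholder object name
bucket = 'my-bucket'                         # placeholder bucket name
redirect_url = 'https://example.com/test2/'  # placeholder callback URL

expires = (datetime.utcnow() + timedelta(minutes=100)).isoformat() + 'Z'
policy = base64.b64encode(json.dumps({
    'expiration': expires,
    'conditions': [
        ['starts-with', '$key', key],
        {'Expires': expires},
        {'bucket': bucket},
        {'success_action_redirect': redirect_url},
    ],
}))
# sign_blob returns (key_name, signature); the signature is base64-encoded too
_, signature = app_identity.sign_blob(policy)
signature = base64.b64encode(signature)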
I'm using apejs to write javascript in a Java app, so my code looks like this:
var exp = new Date();
exp.setTime(exp.getTime() + 1000 * 60 * 100); // 100 minutes

json['GoogleAccessId'] = String(appIdentity.getServiceAccountName());
json['key'] = keyGenerator();
json['bucket'] = bucket;
json['Expires'] = exp.toISOString();
json['success_action_redirect'] = "https://" + request.getServerName() + "/test2/";
json['uri'] = 'https://' + bucket + '.storage.googleapis.com/';

var policy = {
    'expiration': json.Expires,
    'conditions': [
        ["starts-with", "$key", json.key],
        {'Expires': json.Expires},
        {'bucket': json.bucket},
        {"success_action_redirect": json.success_action_redirect}
    ]
};

var plain = StringToBytes(JSON.stringify(policy));
json['policy'] = String(Base64.encodeBase64String(plain));

var result = appIdentity.signForApp(Base64.encodeBase64(plain, false));
json['signature'] = String(Base64.encodeBase64String(result.getSignature()));
The code above first provides the relevant fields.
Then it creates a policy object, stringifies it, and converts it into a byte array (you can use .getBytes in Java; I had to write a function for JavaScript, included at the end of this answer).
A base64-encoded version of this array populates the policy field.
Then the policy is signed using the appidentity package. Finally, the signature is base64-encoded, and we are done.
On the client side, all members of the json object will be added to the form, except uri, which is the form's target address.
var formData = new FormData(document.forms.namedItem('upload'));
var blob = new Blob([thedata], {type: 'application/json'});

// Every signed field except 'uri' goes into the form
var keys = ['GoogleAccessId', 'key', 'bucket', 'Expires',
            'success_action_redirect', 'policy', 'signature'];
for (var field in keys)
    formData.append(keys[field], url[keys[field]]);
formData.append('file', blob);

var rest = new XMLHttpRequest();
rest.open('POST', url.uri);
rest.onload = callback_function;
rest.send(formData);
If you do not provide a redirect, the response status will be 204 on success; if you do redirect, the status will be 200. If you get a 403 or 400, something about the signature or policy may be wrong. Look at the responseText; it is often helpful.
A few things to note:
Both POST and PUT have a signature field, but they mean slightly different things. In the case of POST, it is a signature of the policy.
PUT has a base URL which contains the key (object name), but the URL used for POST may only include the bucket name.
PUT requires the expiration as seconds from the UNIX epoch, but POST wants it as an ISO string.
A PUT signature should be URL-encoded (in Java: by wrapping it in a URLEncoder.encode call). But for POST, Base64 encoding suffices.
By extension, for POST use Base64.encodeBase64String(result.getSignature()), and do not use the Base64.encodeBase64URLSafeString function.
You cannot pass extra headers with the POST; only those listed on the POST page are allowed.
If you provide a URL for success_action_redirect, it will receive a GET with the key, bucket, and eTag.
The other benefit of using POST is that you can set size limits. With PUT, however, if a file breaches your size restriction, you can only delete it after it has been fully uploaded, even if it is multiple terabytes.
What is wrong with createUploadURL?
The method above is a manual createUploadURL.
But:
You don't get those __BlobInfo__ objects, which create many indexes and are indelible. This irritates me, as it wastes a lot of space (which reminds me of a separate issue: issue 4231. Please go give it a star).
You can provide your own object name, which helps create folders in your bucket.
You can provide different expiration dates for each link.
For the very, very few JavaScript App Engine developers:
function StringToBytes(sz) {
    var map = function(x) { return x.charCodeAt(0); };
    return sz.split('').map(map);
}
You can include success_action_redirect in a policy document when you use the GCS POST Object API.
Docs: https://cloud.google.com/storage/docs/xml-api/post-object
Python example here: https://github.com/voscausa/appengine-gcs-upload
Example callback result:
def ok(self):
    """ GCS upload success callback """
    logging.debug('GCS upload result : %s' % self.request.query_string)
    bucket = self.request.get('bucket', default_value='')
    key = self.request.get('key', default_value='')
    key_parts = key.rsplit('/', 1)
    folder = key_parts[0] if len(key_parts) > 1 else None
A solution I am using is to turn on Object Change Notifications. Any time an object is added, a POST is sent to a URL; in my case, a servlet in my project.
In doPost() I get all the info about the object added to GCS, and from there I can do whatever I need.
This has worked great in my App Engine project.
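A hedged sketch of what such a servlet might look like (the class name and parsing step are hypothetical; the notification arrives as JSON in the POST body):
import java.io.IOException;
import java.util.stream.Collectors;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class GcsNotificationServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        // The notification payload describes the added object (bucket, name, ...)
        String payload = req.getReader().lines().collect(Collectors.joining("\n"));
        // ...parse the JSON and act on the new object here...
        resp.setStatus(HttpServletResponse.SC_OK); // acknowledge so GCS stops retrying
    }
}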
I recently started using bottle and the GAE blobstore, and while I can upload files to the blobstore, I cannot seem to find a way to download them from it.
I followed the examples from the documentation but was only successful with the uploading part. I cannot integrate the example into my app since I'm using a different framework from webapp/webapp2.
How would I go about creating an upload handler and a download handler so that I can get the key of the uploaded blob, store it in my data model, and use it later in the download handler?
I tried using BlobInfo.all() to query the blobstore, but I'm not able to get the key-name field value of the entities.
This is my first interaction with the blobstore so I wouldn't mind advice on a better approach to the problem.
For serving a blob, I would recommend looking at the source code of BlobstoreDownloadHandler. It should be easy to port to bottle, since there's nothing very framework-specific about it.
Here is an example of how to use BlobInfo.all():
for info in blobstore.BlobInfo.all():
    self.response.out.write(
        'Name:%s Key: %s Size:%s Creation:%s ContentType:%s<br>' %
        (info.filename, info.key(), info.size, info.creation, info.content_type))
For downloads, you only really need to generate a response that includes the header X-AppEngine-BlobKey: [your_blob_key], along with anything else you need, like a Content-Disposition header if desired; a sketch follows below. Or, if it's an image, you should probably just use the high-performance image-serving API: generate a URL and redirect to it... done.
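As a rough sketch, a bottle download handler along those lines might look like this (the route and filename are assumptions; you would look the key up from your data model):
from bottle import route, response

@route('/download/<blob_key>')
def download(blob_key):
    # App Engine serves the blob itself when it sees this header,
    # so the response body can stay empty.
    response.set_header('X-AppEngine-BlobKey', blob_key)
    response.set_header('Content-Disposition',
                        'attachment; filename="download.bin"')
    return ''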
For uploads, besides writing a handler for App Engine to call once the upload is safely in the blobstore (that's in the docs), you need a way to find the blob info in the incoming request. I have no idea what the request looks like in bottle. BlobstoreUploadHandler has a get_uploads method, and as far as I can tell there's really no reason it needs to be an instance method. So here's a generic implementation of it that expects a WebOb request; for bottle you would need to write something similar that is compatible with bottle's request object.
import cgi
from google.appengine.ext import blobstore

def get_uploads(request, field_name=None):
    """Get uploads for this request.

    Args:
        request: A WebOb-style request object.
        field_name: Only select uploads that were sent as a specific field.

    Returns:
        A list of BlobInfo records corresponding to each upload, or an
        empty list if there are no blob-info records for field_name.

    Stolen from the SDK since they only provide a way to get to this
    crap through their crappy webapp framework.
    """
    if not getattr(request, "__uploads", None):
        request.__uploads = {}
        for key, value in request.params.items():
            if isinstance(value, cgi.FieldStorage):
                if 'blob-key' in value.type_options:
                    request.__uploads.setdefault(key, []).append(
                        blobstore.parse_blob_info(value))

    if field_name:
        try:
            return list(request.__uploads[field_name])
        except KeyError:
            return []
    else:
        results = []
        for uploads in request.__uploads.itervalues():
            results += uploads
        return results
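A hypothetical usage inside an upload-callback handler, assuming a WebOb-style request and a form field named 'file':
# 'file' is the assumed upload form field name
infos = get_uploads(request, field_name='file')
if infos:
    blob_key = infos[0].key()
    # persist blob_key in your data model for the download handler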
For anyone looking for this answer in future, to do this you need bottle (d'oh!) and defnull's multipart module.
Since creating upload URLs is simple enough and covered in the GAE docs, I'll just cover the upload handler.
from bottle import request
from multipart import parse_options_header
from google.appengine.ext.blobstore import BlobInfo

def get_blob_info(field_name):
    try:
        field = request.files[field_name]
    except KeyError:
        # Maybe form isn't multipart or file wasn't uploaded, or some such error
        return None
    # The blob key is stashed in the field's content-type options
    blob_data = parse_options_header(field.content_type)[1]
    try:
        return BlobInfo.get(blob_data['blob-key'])
    except KeyError:
        # Malformed request? Wrong field name?
        return None
Sorry if there are any errors in the code, it's off the top of my head.
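For completeness, a hypothetical bottle handler wiring the helper above into an upload callback (the route and field name are assumptions):
from bottle import post, abort

@post('/upload_callback')
def upload_callback():
    info = get_blob_info('file')  # 'file' is the assumed form field name
    if info is None:
        abort(400, 'No blob found in the upload')
    # Store info.key() in your data model; use it later in the download handler
    return str(info.key())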