How to get partial results from Google App Engine's urlfetch? - google-app-engine

When I'm using google.appengine.api.urlfetch.fetch (or the asynchronous variant via make_rpc) to fetch a URL that steadily streams data, after a while I get a google.appengine.api.urlfetch_errors.DeadlineExceededError, as expected. Since it is a stream that I want to sample, setting a higher deadline can never help unless the stream finishes (which I do not expect to happen).
It seems there is no way to get the partially downloaded result; at least the API doesn't offer anything. Is it possible to
either retrieve the part that was already downloaded,
or ask to download only a certain amount of data (since I can estimate the stream's rate)?
[Clarification: Since it is a stream, requests with a Range header will be answered with 200 OK and not 206 Partial Content.]
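To make the setup concrete, here is a minimal sketch of the failure mode (the stream URL is a placeholder; the synchronous fetch behaves the same way):

    from google.appengine.api import urlfetch
    from google.appengine.api.urlfetch_errors import DeadlineExceededError

    rpc = urlfetch.create_rpc(deadline=10)  # give the stream 10 seconds
    urlfetch.make_fetch_call(rpc, 'http://example.com/stream')
    try:
        result = rpc.get_result()  # never returns for an endless stream
    except DeadlineExceededError:
        # The bytes received so far are not exposed on the exception,
        # which is exactly the problem described above.
        pass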

In your call to urlfetch.fetch, you can set HTTP headers. The Range header is how you specify a partial-download request in HTTP:
resp = urlfetch.fetch(
    url=whatever,
    headers={'Range': 'bytes=100-199'})
if those are the 100 bytes you want. The HTTP status code you get should be 206 for such a partial download (none of that is GAE-specific). See e.g. http://en.wikipedia.org/wiki/Byte_serving for details.
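As a sketch tying that to the clarification above (placeholder URL, untested): request a fixed-size prefix and check whether the server honored the range:

    from google.appengine.api import urlfetch

    resp = urlfetch.fetch(
        url='http://example.com/stream',      # placeholder
        headers={'Range': 'bytes=0-65535'},   # ask for the first 64 KiB only
        deadline=30)
    if resp.status_code == 206:
        sample = resp.content  # server honored the range: exactly these bytes
    else:
        # A 200 means the server ignored the Range header; for the endless
        # stream in the question, the fetch would instead have hit the
        # deadline before ever returning.
        sample = resp.content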

Related

How to get Content-Length from response headers in CN1?

I implemented this but the response headers don't include Content-Length, even though I made sure the server sends that header specifically. I also verified the response outside CN1, and there it includes Content-Length. The full list of headers captured in readHeaders is (as seen from Android): null, Alt-Svc, Cache-Control, Connection, Content-Type, Date, ETag, Server, Transfer-Encoding, Vary, X-Android-Received-Millis, X-Android-Response-Source, X-Android-Selected-Protocol, X-Android-Sent-Millis, X-Cloud-Trace-Context, X-Powered-By.
Right now, to estimate download sizes, I (1) call the endpoint to get the total size, then (2) call the endpoint for the actual download and use a NetworkManager progress listener. It would be nice to track progress with only one request (by using Content-Length). The vanilla RequestBuilder doesn't expose response headers, so direct usage of ConnectionRequest with readHeaders is needed, but Content-Length is missing from getHeaderFieldNames.
Note:
The reason this wasn't working is that by default Android/CN1 sends the request with the header Accept-Encoding: gzip. That yields a chunked response, which doesn't include the Content-Length header. I can't guarantee this behavior matches every server, but it does in my case (Node.js + Express).
To force the server to return a non-chunked response, set the header to "compress", "identity", or "deflate".
Example:
Rest.post(url).header("Accept-Encoding", "compress").fetchAsJsonMap(resp -> {...
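The same effect is easy to confirm outside CN1. A quick sketch with Python's requests library (placeholder URL; assumes the server, like the Node.js/Express one above, only omits Content-Length for gzip/chunked replies):

    import requests

    # The default Accept-Encoding (gzip) often yields Transfer-Encoding:
    # chunked with no Content-Length; 'identity' requests the unencoded body.
    r = requests.get('http://example.com/file',  # placeholder URL
                     headers={'Accept-Encoding': 'identity'})
    print(r.headers.get('Content-Length'))  # present if the server cooperates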

How can an HTTP 403 be returned from an apache web server input filter?

I have written an apache 2.x module that attempts to scan request bodies, and conditionally return 403 Forbidden if certain patterns match.
My first attempt used ap_hook_handler to intercept the request, scan it, and then return DECLINED so the real handler could take over (or 403 if the conditions were met).
The problem with that approach is that when I read the POST body of the request (using ap_get_client_block and friends), it apparently consumed the body, so that if the request was subsequently handled by mod_proxy, the body was gone.
I think the right way to scan the body would be an input filter, except an input filter can only return APR_SUCCESS or fail; any return code other than APR_SUCCESS gets translated into HTTP 400 Bad Request.
I think maybe I can store a flag in the request notes when the input filter wants to fail the request, but I'm not sure which later hook should check that flag.
Turned out to be pretty easy. Just drop an error bucket into the brigade:

    /* Build a brigade holding an error bucket carrying the 403,
       followed by EOS, and send it down the filter chain. */
    apr_bucket_brigade *brigade = apr_brigade_create(f->r->pool,
                                                     f->r->connection->bucket_alloc);
    apr_bucket *bucket = ap_bucket_error_create(403, NULL, f->r->pool,
                                                f->r->connection->bucket_alloc);
    APR_BRIGADE_INSERT_TAIL(brigade, bucket);

    /* The end-of-stream bucket marks the response as complete. */
    bucket = apr_bucket_eos_create(f->r->connection->bucket_alloc);
    APR_BRIGADE_INSERT_TAIL(brigade, bucket);
    ap_pass_brigade(f->next, brigade);

Akka http file streaming not writing bytes more than 'n' size from different clients

Akka HTTP 10.0.6; max-content-length is set to 6000m
I am using Akka file streaming to upload huge files (sent as octet-stream to my service), accepting the incoming bytes and writing them to a file sink. Here is what I observe from my experiments. My limited understanding from reading the documentation is that the client should be able to keep sending data unless Akka HTTP explicitly applies a backpressure mechanism. I have been searching online but cannot yet explain the following behavior. Is there something I am missing in the code? How can I debug this further? Also, via scalatest the upload works; if someone can throw more light on the difference in behavior between scalatest and curl/HTTP clients, thanks.
Through "curl", I can stream max of 1KB sized file. Anything more than this, it will just hang and this message is given after timeout no matter how long I wait(20 seconds, 5 mins, 10 mins, etc) "Sending an 2xx 'early' response before end of request was received... Note that the connection will be closed after this response. Also, many clients will not read early responses! Consider only issuing this response after the request data has been completely read!"
 
Through "Apache http client", max of 128KB sized file streaming happens. Anything more than this, it will hang and same message as in #1 from akka http service
 
Through "python client", max of ~26KB sized file streaming happens. Anything more than this, it will hang and same message as in #1 from akka http service
 
Through scalatest inside the service, was able to upload files like 200MB, 400MB and more also
Here's the code:
put {
  withoutSizeLimit {
    extractDataBytes { bytes =>
      implicit val system = ActorSystem()
      implicit val materializer = ActorMaterializer()
      // Tried system dispatcher also
      implicit val executionContext = system.dispatchers.lookup("dispatcher")

      val sink = FileIO.toPath(Paths.get("/file.out"))
      val action = bytes.runWith(sink).map {
        case ior if ior.wasSuccessful =>
          complete(StatusCodes.OK, s"${ior.count} bytes written")
        case ior =>
          complete(StatusCodes.EnhanceYourCalm, ior.getError.toString)
      }
      Await.result(action, 300.seconds)
    }
  }
}

Consuming response headers in Apache Output filter

I am writing an Apache module output filter that needs to consume a couple of internal-only response headers. These response headers are set by a Perl-based application running on the backend. The APR function I am using in my output filter is:
apr_table_get(r->headers_out, "x-my-response-header");
However, in my output filter I do not see the above response header set until the third or fourth bucket brigade, which is unfortunately already too late: I need the value of x-my-response-header to compute a new response header and set that on the response to the browser.
I insert my output filter this way:
ap_hook_insert_filter(insertOutputFilterHook, NULL, NULL, APR_HOOK_FIRST);
ap_register_output_filter(myFiltersName, myOutputFilter, NULL, AP_FTYPE_CONTENT_SET);
What I have verified:
The internal-only headers do appear in the HTTP response in my browser (I haven't unset them yet)
The first two bucket brigades' buckets contain HTML page text
Questions:
Why would the internal-only response header not be set/visible in the first call to my output filter (the first bucket brigade)?
Is it possible to instead accumulate the first few bucket brigades and only start flushing them out once the internal-only response header's value is known?

Why are these deferred tasks not being executed in the order in which they were added?

I'm using Twilio to send SMS messages with App Engine. Twilio doesn't accept SMS messages longer than 160 characters, so I have to split them. I am splitting the messages and sending them as follows:
def send_sms_via_twilio(mobile_number, message_text):
    client = TwilioRestClient(twilio_account_sid, twilio_auth_token)
    message = client.sms.messages.create(to=mobile_number,
                                         from_=my_twilio_number,
                                         body=message_text)

split_list = split_sms(long_message)
for each_message in split_list:
    send_sms_via_twilio(mobile_number, each_message)
However I found that the sending order varied. For example, sometimes I'd receive message 2/5, then 1/5, then 4/5, etc., and other times the order would be correct. The order of split_list is definitely correct. To overcome the incorrect order of the messages I tried
for each_message in split_list:
    deferred.defer(send_sms_via_twilio, mobile_number, each_message,
                   _countdown=1)
However I encountered the same problem. I then tried
for each_message in split_list:
    deferred.defer(send_sms_via_twilio, mobile_number, each_message,
                   _countdown=1, _queue="send-text-message")
and defined my queue as
- name: send-text-message
  rate: 1/s
  bucket_size: 10
  max_concurrent_requests: 1
  retry_parameters:
    task_retry_limit: 5
I thought the issue was concurrency (this runs on python27) and that limiting max_concurrent_requests would solve it. However, the issue is still present, i.e. the texts still get sent in the wrong order. I checked the logs but couldn't see any notification of task failure; the tasks just seem to execute in the wrong order.
Is there something I am missing? How can I fix this issue?
Note that SMS messaging (specifically the underlying protocols like SMPP) is asynchronous by definition, which means there is no way to specify the delivery order of distinct SMS messages.
There is a way to specify the order of the SMS parts by using the UDH (User Data Header) in the binary body of those messages. But this works only for long SMS messages, those that are too long to be sent in one message. For example, if your message exceeds 160 GSM-7 characters or 70 UCS-2 characters, it will be sent as more than one message with a UDH.
In that case the mobile phone won't show the message parts as they arrive. It collects them in memory until the last one arrives and then assembles them in the right order. For the end user this is just a longer message than usual, and you don't have to write "1/3", "2/3", ... in the message.
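For illustration, the concatenation UDH referred to above is a fixed 6-byte prefix on each part's payload: 05 00 03, then a reference number shared by all parts, the total part count, and the part's 1-based index (the values below are made up):

    def concat_udh(ref, total, seq):
        # 05 = UDH length, 00 = IEI "concatenated SMS, 8-bit reference",
        # 03 = IE length, then reference number, part count, part index.
        return bytearray([0x05, 0x00, 0x03, ref, total, seq])

    # Three parts sharing reference 0x42: the handset buffers them and
    # reassembles by index, regardless of the order they arrive in.
    headers = [concat_udh(0x42, 3, seq) for seq in (1, 2, 3)]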
Disclaimer: I work for a company that enables you to send and receive both multiple binary messages with user-specified headers (UDH) and/or standard long messages.
If you are not tied to Twilio, try SMSified. They automatically split the message for you, ensure it is in the correct order, and add "1/2", "2/2", ... to the end of each part. In other words, you just send the complete message to their REST API, no matter the length, and they handle the rest. Since they also use a REST API, you can continue to use Python.
