Why is urlfetch throwing Download Errors when calling some Google services? - google-app-engine

I've noticed that some Google services are blocking requests from App Engine servers, resulting in a urlfetch DownloadError. An example would be a feedproxy.google.com url (http://feedproxy.google.com/~r/blabbermouth/~3/cAk78LX4gJE/news.aspx, for example).
This occurs on all the apps I've tried it on, including app IDs I've never used for any kind of url fetching before. This behavior also doesn't occur on the local SDK, which leads me to believe it's a result of making the request from a GAE IP address.
The weird thing is that it throws a DownloadError instead of returning an error status_code on a successfully retrieved response. Using urlfetch or httplib locally works just fine, so I don't yet grok this DownloadError; or it's just a bug, in which case I'll file a ticket.

Without having a look at your code I'll be guessing, but since the URL you are fetching is going to redirect, are you allowing redirects in your call? Note the follow_redirects=True argument.
e.g. urlfetch.fetch(url, payload=None, method=GET, headers={}, allow_truncated=False, follow_redirects=True, deadline=None)
http://code.google.com/appengine/docs/python/urlfetch/fetchfunction.html
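For example, a minimal sketch of catching the DownloadError and checking the status code (the timeout value and logging calls are illustrative, not from the original question):

import logging
from google.appengine.api import urlfetch

url = "http://feedproxy.google.com/~r/blabbermouth/~3/cAk78LX4gJE/news.aspx"
try:
    # Let urlfetch follow the FeedProxy redirect to the final article
    result = urlfetch.fetch(url, follow_redirects=True, deadline=10)
    if result.status_code != 200:
        # An error status still arrives as a normal response object
        logging.warning("Fetch returned status %d", result.status_code)
except urlfetch.DownloadError:
    # Raised when the fetch itself fails (blocked, timed out, connection reset)
    logging.exception("Could not fetch %s", url)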

Related

Network request failed from fetch in reactjs app

I am using fetch in a NodeJS application. Technically, I have a ReactJS front-end calling the NodeJS backend (as a proxy), and then the proxy calls out to backend services on a different domain.
However, from logging errors from consumers (I haven't been able to reproduce this issue myself) I see that a lot of these proxy calls (using fetch) throw an error that just says Network Request Failed, which is of no help. Some context:
This only occurs on a subset of all total calls (let's say 5% of traffic)
Users that encounter this error can often make the same call again some time later (next couple minutes/hours/days) and it will go through
From Application Insights, I can see no correlation between browsers, locations, etc
Calls often return fast, like < 100 ms
All calls are HTTPS, none are HTTP
We have a fetch polyfill from fetch-ponyfill that will take over if fetch is not available (Internet Explorer). I did test this package itself and the calls went through fine. I should also mention that this error does occur on browsers that do support fetch, so I don't think the polyfill is the cause.
Fetch settings for all requests
Method is set per request, but I've seen it fail on different types (GET, POST, etc)
Mode is set to 'same-origin'. I thought this was odd, since we were sending a request from one domain to another, but I tried to set it differently and it didn't affect anything. Also, why would some requests work for some, but not for others?
Body is set per request, based on the data being sent.
Headers is usually just Accept and Content-Type, both set to JSON.
I have tried researching this topic before, but most posts I found referenced React native applications running on iOS, where you have to set some security permissions in the plist file to allow HTTP requests or something to do with transport security.
I have implemented logging at specific points for the data in Application Insights, and I can see that fetch() was called, but then() was never reached; it went straight to the .catch(). So it's not even reaching the code that parses the response, because apparently no response came back (we then parse the JSON response and call other functions, but like I said, it doesn't even reach this point).
Which is also odd, since the response never comes back, yet it fails (often) within 100 ms.
My suspicions:
Some consumers have some sort of add-on for their browser that is messing with the request. However, I run with uBlock Origin and HTTPS Everywhere and I have not seen this error. I'm not sure what else could be modifying requests in a way that would cause them to fail immediately.
The call goes through, which then reaches an Azure Application Gateway, which might fail for some reason (too many connected clients, not enough ports, etc) and returns a response that immediately fails the fetch call without running the .then() on the response.
For #2, I remember I had traced a network call that failed and returned Network Request Failed: Made it through the proxy -> made it through the Application Gateway -> hit the backend services -> backend services sent a response. I am currently requesting access to backend service logs in order to verify this on some more recent calls (last time I did this, I did it through a screenshare with a backend developer), and hopefully clear up the path back to the client (the ReactJS application). I do remember though that it made it to the backend services successfully.
So I'm honestly not sure what's going on here. Does anyone have any insight?
Based on your excellent description and detective work, it's clear that the problem is between your Node app and the other domain. The other domain is throwing an error and your proxy has no choice but to say that there's an error on the server. That's why it's always throwing a 500-series error, the Network Request Failed error that you're seeing.
It's an intermittent problem, so the error is inconsistent. It's a waste of your time to continue to look at the browser because the problem will have been created beyond that, either in your proxy translating that request or on the remote server. You have to find that error.
Here's what I'd do...
Implement brute-force logging in your Node app. You can use Bunyan, or Winston, or just require('fs') and write out to some file when an error occurs. Then look at the results. Only log it out when the response code from the other server is in the 400 or 500 ranges. Log the request object and the response object.
Something like this with Bunyan:
let res; // will hold the response from the other domain, for logging
fetch(urlToRemoteServer)
  .then(remoteRes => {
    res = remoteRes;
    return res.json();
  })
  .then(body => whateverElseYoureDoing(body))
  .catch(err => {
    // Log our request to the other domain (req), the response we got back (res), and the error
    log.info({request: req, response: res, err: err});
  });
where the res in this case is the response we just got from the other domain and req is our request to them.
The logs on your Azure server will then have the entire request and response. From this you can find commonalities and (🤞) the cause of the problem.

Google App Engine may be discarding http header

I'm testing an application on Google App Engine.
I use the Flexible environment with a custom python runtime.
In my application I need a "session.id" header on HTTP requests.
My web application code extracts the value of session.id and validates it; if it's invalid or missing, the web application returns HTTP 401.
When I issue a HTTP POST to my GAE web application using curl and setting the header, e.g.:
curl -v -X POST --data "echo" -H "session.id: someweirdandlargestring" https://*****.appspot.com/echo
it seems the GAE proxy removes the "session.id" header, and the web app then returns 401. If I send "session.id" as a Cookie everything works fine and the server returns 200. I've debugged the application and the header does not reach the web application.
I've read the docs (https://cloud.google.com/appengine/docs/flexible/python/how-requests-are-handled) and they describe headers that are expected to be removed or added from requests before they reach our actual server. But they say:
For security purposes, some headers are sanitized or amended by intermediate proxies before they reach the application.
Which makes me believe GAE is removing the HTTP header I've set.
My questions:
Is this an expected behavior from GAE?
What would you suggest to fix it? In other words: how do I make my header entry reach the web server application code?
Note that using a cookie is not an option in short term.
As @viz suggested, I've tested a header name without the dot. It works.
If I use, let's say, "sessionid", then the header reaches the server.
GAE web servers are discarding malformed http headers.
The best I've found about HTTP header conventions are on NGINX docs http://nginx.org/en/docs/http/ngx_http_core_module.html#ignore_invalid_headers:
Valid names are composed of English letters, digits, hyphens, and possibly underscores
If this can help, let me bring this September 2021 update, as I bumped into the same issue with PHP 7, where GAE was seemingly discarding my "api_secret" header (not really RFC-ish, is it?). I deployed a little debugging tool which dumped the headers, and carried out some experiments.
Outcome:
headers are forced to camel case;
apisecret became Apisecret;
api-secret became Api-Secret;
api.secret was discarded;
I finally chose X-Api-Secret, to be in line with RFC2047 recommendation for user-defined headers.
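For illustration, here is a minimal sketch of reading such a header server-side; the Flask framework, route, header name and is_valid helper are assumptions for the example, not from the original question:

from flask import Flask, request, abort

app = Flask(__name__)

@app.route("/echo", methods=["POST"])
def echo():
    # A hyphenated name such as X-Session-Id survives the App Engine proxies;
    # a dotted name like "session.id" would be dropped before reaching this code.
    session_id = request.headers.get("X-Session-Id")
    if not session_id or not is_valid(session_id):  # is_valid: your own check (hypothetical)
        abort(401)
    return request.get_data()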

Why is CORS Disabled by Default?

Alright, first of all, I am absolutely aware that we have a bunch of answers on this and there is a plethora of articles on the topic. I just read these answers a second before typing this:
Why is CORS without credentials forbidden?.
Is CORS considered bad practice?
Etc. My particular situation is this: I just set up WebAPI2 for my practice project, the front end for which is running via gulp browser-sync. I have no idea how these ports get picked, but let's say the Web API is running on http://localhost:1234/ and browser-sync generates the website on http://localhost:4321/. So I hit the API via Angular's $http and get the famous CORS error (the API controller method does get hit), so I am guessing it's the API returning "not allowed". Edit: I fixed this by installing a CORS for Web API package via NuGet (Article Here) before asking this Q, just referencing it for anyone who might need it later.
So, I was thinking, if I deployed this, ANY request would get denied, unless I am missing something. Or would it not be denied because of something I don't understand? Is disallowing CORS just a throwback from the MVC days? Or is there some purpose to it with APIs?
Maybe I am just ranting, but this confuses the **** out of me.
CORS is based on the response headers returned from the API. It is not the API that refuses to respond to the request; the web browser explicitly disallows handling the response. The API will process the request as normal.
When dealing with anything other than a GET, CORS also requires a "preflight" request to the API first, to ensure subsequent requests are allowed. This, along with sending the headers back, is what the Web API NuGet package provides.
CORS is off by default for security purposes.
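To make the mechanics concrete, here is a rough sketch of what a CORS-enabled endpoint returns (shown in Python/Flask purely for illustration; the route, origin and payload are assumptions, and the Web API NuGet package wires up the equivalent headers for you):

from flask import Flask, request, make_response

app = Flask(__name__)
ALLOWED_ORIGIN = "http://localhost:4321"  # the browser-sync origin from the question

@app.route("/api/things", methods=["GET", "POST", "OPTIONS"])
def things():
    if request.method == "OPTIONS":
        # Preflight: the browser asks in advance whether the real request is allowed
        resp = make_response("", 204)
        resp.headers["Access-Control-Allow-Methods"] = "GET, POST"
        resp.headers["Access-Control-Allow-Headers"] = "Content-Type"
    else:
        resp = make_response('{"ok": true}')
    # Without this header the browser discards the response it has already received
    resp.headers["Access-Control-Allow-Origin"] = ALLOWED_ORIGIN
    return resp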

GAE cloud endpoints - Api not updating after deploy

I'm starting to use cloud endpoints in my GAE project but have been running into issues with the api not updating on the server.
localhost:8888/_ah/api/explorer is ok.
But when I deploy, nothing changes.
myapp.appspot.com:8888/_ah/api/explorer is bad
Further investigation shows the url end points update
example: https://myapp.appspot.com/_ah/api/myapp/v1/foo/list
But the loaded client api is still incorrect.
example: gapi.client.load('myapp', 'v1', callback, url);
gapi.client.myapp.foo.list();
If I changed the call from foo/list to foo/list2, the REST URL would update, but the API package would not.
I'll try to cover the two cases people could run into:
Client Side:
The Google APIs Explorer web app aggressively caches, so you'll need to clear your cache or force a refresh when you update your API server side to see the changes in the client.
Server Side (In Deployed Production App Engine App):
If you're having deployment issues, there are two places to look when debugging:
Check your Admin Logs (https://appengine.google.com/adminlogs?&app_id=s~YOUR-APP-ID) after deployment. After a successful deployment of your application code, you should see the message:
Completed update of a new default version
and shortly after that you should see:
Successfully updated API configuration
If you see a message indicating that the API configuration update failed, you should deploy again. If said error is persistent, you should notify us of a bug. If you don't see any message about your API configuration, you should check that the path /_ah/spi/.* is explicitly named in your routing config (app.yaml for Python, web.xml for Java).
Check your Application Logs (https://appengine.google.com/logs?&app_id=s~YOUR-APP-ID) after deployment. After the deployment finishes, Google's API infrastructure makes a request to /_ah/spi/BackendService.getApiConfigs in your application so that your API configuration (as JSON) can be registered with Google's API infrastructure and all the discovery-related configs can be created. If this request does not complete with a 200, then your API changes will not show up since Google's API infrastructure will have nothing to register.
If you are consistently getting a 302 redirect for requests to /_ah/spi/BackendService.getApiConfigs, it is because you (or your generated API config) have specified a "bns adapter" that uses http: as the protocol in your API root, but your web.xml (Java) or app.yaml (Python) requires that paths under /_ah/spi are secure. This makes requests using http: as the protocol get redirected (using 302) to the same page with https: as the protocol. This was discussed on the Trusted Tester forum before going to Experimental.
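For reference, a minimal sketch of what the explicit /_ah/spi routing could look like in a Python app.yaml (the module name is hypothetical; secure: always matches the https: requirement described above):

handlers:
- url: /_ah/spi/.*
  script: myapi.application   # hypothetical WSGI app created with endpoints.api_server([...])
  secure: always              # forces https:, matching the redirect behaviour above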
This is what happened to me.
I tested my endpoint on localhost and it worked fine.
I deployed my endpoint on appspot and when I made requests to it I received in the browser the message 'Not found'.
So I looked in the logs, and when I made requests to the endpoint I saw a 404 HTTP error code for the favicon file. In effect, I had forgotten to put that file in my deploy.
So I redeployed my WAR with the favicon file, the 404 HTTP code disappeared, and the endpoint worked fine on appspot too!
I realize that this may sound silly, but it is what I experienced. (I apologize for my poor English.)
I noticed that if you upload your app for the first time without the following in your web.xml:
<security-constraint>
  <web-resource-collection>
    <url-pattern>/_ah/spi/*</url-pattern>
  </web-resource-collection>
  <user-data-constraint>
    <transport-guarantee>CONFIDENTIAL</transport-guarantee>
  </user-data-constraint>
</security-constraint>
Then your bns adapter will be set to http going forward. When I added the above afterwards, I got a 302 HTTP code on /_ah/spi/BackendService.getApiConfigs and the endpoints never updated.
So now I have reverted to not using https on /_ah/spi and my endpoints are updating. I guess those who see their endpoints not being updated should revert back to the first SSL configuration they had for /_ah/spi/.
Yaw.
I had the same error Not Found (the 404 error code) when I was calling my API using this URL
https://MY_APP_ID.appspot.com/_ah/api/MY_SERVICE/v1/user
I tried everything and finally fixed it by removing the discovery files from WEB-INF, keeping only MY_SERVICE-v1.api, and then redeploying the API. It works fine now.
I was also getting a stale API discovery doc after deploying a new version; it took a couple of minutes for GAE to start serving the new one to me.
I had the same problem, and I checked the admin logs, other logs etc... but still my API wasn't updating to the latest version.
So I decided to check in the API code for the last method I had written (I am writing in Java 7). And I found out that GAE doesn't like statements like:
if (!blocked){ .... }
I switched that to:
if (blocked == false) { ... }
And it worked like a charm. So by the looks of it, GAE scans the new API methods and doesn't accept some shortcuts.

403 Responses for Authenticated User:Friends Twitter calls from Google App Engine

I am using the tweepy-gae library to do authenticated (oauth) calls to the twitter api (user:friends). The calls are working when running from my local machine and are failing with a 403 Forbidden: "The request is understood, but it has been refused. An accompanying error message will explain why. This code is used when requests are being denied due to update limits." The app has read / write access but only does reads (not posting anything).
I'm aware of the white listing issues with Google App Engine and Twitter, and how GAE uses the same set of IP addresses that are hitting the limits collectively.
But these are authenticated calls to a method (user_friends) for which authentication is optional, and Twitter's documentation says that rate limiting in this case is based on the authenticated user (350 calls/hour). And I'm only doing a couple of calls per hour here and there.
Any idea what can be the issue? Any help or hints would be appreciated :)
I figured out what was wrong in this case. The Tweepy GAE library hasn't been updated in a while and was calling the API under http://twitter.com/[] instead of http://api.twitter.com/[]
So the Twitter API was behaving oddly, probably because an old, outdated version of the API was being hit.
