Google App Engine Error: DNS lookup failed for URL: http://metadata.google.internal - google-app-engine

I've been working on a small Google App Engine (standard environment) project that uses Cloud Endpoints v2. My code is largely based on the quickstart provided by Google.
Everything was working fine, but I re-deployed today after having not looked at it for a few weeks, and I'm getting the following error when I attempt to call the endpoint:
error: An error occured while connecting to the server: DNS lookup failed for URL: metadata.google.internal
This wasn't happening before. It seems to be happening when the endpoints package is being imported by Python.
My endpoint doesn't do anything fancy - I haven't changed the source from the sample EchoApi. The error ends up in the GCP Logging console no matter if I try to access the API through the API Explorer or via Curl.
I don't get any errors during deployment.
Edit #1
Some further information:
The error originates from within Google's code that is included with the google-endpoints package which I've included in my lib folder, per
the documentation. Specifically, the error occurs on line 54 of google/api/control/wsgi.py.
Basically, it's making a request to metadata.google.internal using urllib2.
I'm guessing this address is only available from within the Google Cloud, and that for whatever reason, the instance that's hosting my app can't do a DNS lookup on it.
Edit: #2
Dug a bit further.
It seems that the error originates in the google-endpoints-api-management package. Changes committed to that package on October 19th seem to have introduced additional platform reporting. metadata.google.internal is queried to check if the code is running within the Google Container Engine, then it blows up, because the metadata address doesn't resolve.
Here's the commit:
https://github.com/cloudendpoints/endpoints-management-python/commit/0a37d0e443091053ed03e455e06d3a0ae770999f
The google-endpoints package only requires google-endpoints-api-management >=1.0.0b1. On my end, things were working fine on version 1.0.0b2, but then I built a new lib folder, which brought down 1.0.0b5, and things went sideways. Required packages haven't changed between b2 and b5, so I'm thinking I may be able to just downgrade back to b2 for the time being. Haven't tried it yet.
Sent the Google Dev an email. Perhaps he'll chime in with further tips.
Edit: 2016-11-07
Tested downgrading the google-endpoints-api-management package to 1.0.0b2. Seems to be working, kludgy a fix as it is. If you're using the lib folder, the following will scrub the newer error-prone wsgi.py file and put back the older one:
pip install -t lib google-endpoints-api-management==1.0.0b2 --upgrade
Not pretty, but it may just get you back in business.
On a side note, the Google engineer promptly replied saying that he would take a look at this issue soon. With luck, endpoints v2 will eventually come out of beta, 'cause I'm really liking it so far.

This will be fixed in an upcoming patch to the google-endpoints-api-management package (which will be 1.0.0b6). It will probably be released sometime on Monday, 11/6.
If you'd like to continue testing right away and this error is blocking you, you can go back to 1.0.0b4 until 1.0.0b6 comes out. Everything should still work as normal with that version.
Thanks for bringing this to our attention! We're doing our best to iron out all of these wrinkles now during beta in preparation for our first general release.
EDIT: 1.0.0b6 has been released and resolves this issue. Thanks for your patience during our beta phase!

(Posted solution on behalf of the OP).
Google has released version 1.0.0b6 of the google-endpoints-api-management package to address this issue. It solved the problem for me. For anyone who is encountering this problem, clean out your lib folder and re-install the google-endpoints package. This will bring down the new google-endpoints-api-management package with it.
Thanks to Brad at Google for really quick action on this.

Related

Appengine runs a stale version of the code -- and stack traces don't match source code

I have a python27 appengine application. My application generates a 500 error early in the code initialization, and I can inspect the stack trace in the StackDriver debugger in the GCP console.
I've since patched the code, and I've re-deployed under the same service name and version name (i.e. gcloud app deploy --version=SAME). Unfortunately, the old error still comes up, and line numbers in the stack traces reflect the files in the buggy deployment. If I use the code viewer to debug the error, I am however brought to the updated patched code in the online viewer -- and there is a mismatch. It behave as if the app instance is holding on to a previous snapshot of the code.
I'm fuzzy on the freshness and eventual consistency guarantees of GAE. Do I have to wait to get everything to serve the latest deployed version? Can I force it to use the newer code right away?
Things I've tried:
I initially assumed the problem had to do with versioning, i.e. maybe requests being load-balanced between instances with the same version, but each with slightly different code. I'm a bit fuzzy on the actual rules that govern which GAE instance gets chosen for a new request (esp whether GAE tries to reuse previous instances based on a source IP). I'm also fuzzy on whether or not active instances get destroyed right away when different code is redeployed under the same version name.
To take that possibility out of the equation, I tried pushing to a new version name, and then deleting all previous versions (using gcloud app versions list to get the list). But it doesn't help -- I still get stack traces from the old code, despite the source being up to date in the GCP console debugger. Waiting a couple hours doesn't do anything either.
I've tried two things:
disabling and re-enabling the application in GAE->Settings
I'd also noticed that there were some .pyc files uploaded in the snapshot, so I removed those and re-deployed.
I discovered that (1) is a very effective way to stop all running appengine instances. When you deploy a new version of a project, a traffic split is created (i.e. 0% for the old version and 100% for the new), but in my experience old instances might still be running if they've been used recently (despite them being configured to receive 0% of traffic). Toggling kills them all immediately. I unfortunately found that my stale code was still being used after re-enabling.
(2) did the trick. It wasn't obvious that .pyc were being uploaded. I discovered it by looking at GCP->StackDriver->Debug and I saw .pyc files in the tree snapshot.
I had recently updated my .gitignore to ignore locally installed pip runtime dependencies for the project (output of pip install -t lib requirements.txt). I don't want those in git, but they do need to ship as part of my appengine project. I had removed the #!.gitignore special include line from .gcloudignore. However, I forgot to re-add *.pyc into my .gcloudignore.
Another way to see the complete set of files included in an app deployment is to increase the verbosity to info on the gcloud app deploy command -- you see a giant json manifest with checksums. I don't typically leave that on because it's hard to visually inspect, but I would have spotted the .pyc in there.

Updating AppEngine deployed version on GitHub push

Is it possible to auto-deploy to AppEngine when Github receives a new commit? I found a bunch of dead documentation links that suggest it is, however I still have no idea how to set it up. There are some mentions for creating a release pipeline, but I don't see any way to do that in the cloud console anymore.
I've got my code mirroring the GitHub repository already, however I can't figure out how to link this to the deployment pipeline or even how to create a new version. Am I missing something obvious? This seems like it should be incredibly straightforward to do...
The entry on the Google Cloud Platform blog that's supposed to explain how to achieve this is blank. It seems Google quietly killed this feature. Some people have complained that it suddenly stopped working. The only way to do this now is to use Continous Delivery/Integration tools such as Travis CI.

Bluemix Monitoring and Analytics: Resource Monitoring - JsonSender request error

I am having problems with the Bluemix Monitoring and Analytics service.
I have 2 applications with bindings to a single Monitoring and Analytics service. Every ~1 minute I get the following log line in both apps:
ERR [Resource Monitoring][ERROR]: JsonSender request error: Error: unsupported certificate purpose
When I remove the bindings, the log message does not appear. I also greped my code for anything related to "JsonSender" or "Resource Monitoring" and did not find anything.
I am doing some major refactoring work on our server, which might have broken things. However, our code does not use the Monitoring service directly (we don't have a package that connects to the monitoring server or something like that) - so I will be very surprised if the problem is due to the refactoring changes. I did not check the logs before doing the changes.
Any ideas will help.
Bluemix have 3 production environments: ng, eu-gb, au-syd, and I tested with ng, and eu-gb, both using 2 applications with same M&A service, and tested with multiple instances. They are all work fine.
Meanwhile, I received a similar problem that claim they are using Node.js 4.2.6.
So there are some more information we need to know to identify the problem:
1. Which version of Node.js are you using (Bluemix Default or any other one)
2. Which production environment are you using? (ng, eu-gb, au-syd)
3. Is there any environment variables are you using in your application?
(either the creating in code one, or the one using USER-DEFINED Variables)
4. One more thing, could you please try to delete the M&A service, and create it again, in case we are trapped in a previous fault of M&A.
cf ds <your M&A service name>
cf cs MonitoringAndAnalytics <plan> <your M&A service name>
NodeJS versions 4.4.* all appear to work
NodeJS uses openssl and apparently did/does not like how one of the M&A server certificates were constructed.
Unfortunately NodeJS does not expose the openssl verify purpose API.
Please consider upgrading to 4.4 while we consider how to change the server's certificates in the least disruptive manner as there are other application types that do not have an issue with them (e.g. Liberty and Ruby)
setting node js version 4.2.4 in package.json worked for me, however this is an alternative by-passing solution. Actual fix is being handled by core team. Thanks.

gwt sample project rpc call failure

When i try to run gwt sample project it gives "RPC call failure.An error occurred while attempting to contact the server. Please check your network connection and try again". It was running good before but after i update programs and libraries it gives that error. Which update causes this error or there is other things?
Appengine version:1.7.0
GWT version:2.4.0
Eclipse version:4.2(juno)
JDK version:1.7.0_05
This may not be the problem you are facing but it seems to be the most frequent problem.
Let me predict - you last tried GWT four years ago and you hoarded the sample project hoping to pull it out one day.
And yesterday, you did pull it out. It kinda worked and then you decided to upgrade to the latest 2.4.0. (Actually the latest is 2.5.0-rc1).
Ooops. The web-inf/lib of your project is still faithfully using pre-version 2.2 gwt-servlet jar.
Nope, you can't do that. GWT-RPC data transfer format is version-unstable. Not guaranteed to be compatible from one version to the next.
Simple solution - recreate a new GWT project using the new Google plugin.
Then copy the src and web.xml of your project into the new project.
Or replace the gwt-servlet.jar with the latest. And if you usegwt-servlet-deps.jar, you would need to upgrade that too (but I doubt so, because if you did use gwt-servlet.deps.jar you wouldn't be asking this question).
But why would you keep the gwt sample from an old project?
The samples have remained quite the same over the years. Why not use the samples from the new GWT 2.4.0 download. You don't have to keep the samples. You should try to construct the projects for the samples afresh.
The GWT directory is found under Eclipse's plugins directory, under a long name. Like
plugins/com.google.gwt.eclipse.sdkbundle_2.4.0.v201205091048-rel-r37/gwt-2.4.0.
In which you will find the samples directory.

GAE SDK 1.6.4 on Mac with django-nonrel: syncdb won't create the datastore file

first question from newbie. Haven't needed to post a question for months now -- so many great answers already posted. I am stuck on this one, though.
Developing on a Mac, Python 2.7, Django with the django-nonrel project, GAE datastore. Up to and including GAE SDK 1.6.3, all goodness. After upgrading to GAE SDK 1.6.4, noticed 3 strange things:
Dev server (python manage.py runserver) fails immediately with "Error: No module named webob." No other errors or output. I rooted around under /usr/local/google_appengine/lib/... and, indeed, no module named webob. There are two close matches -- webob_0_9 and webob_1_1_1. I made a symlink webob -> webob_1_1_1 to get past the error.
Startup messages from the dev server include an INFO: message that the SDK version is later than the advertised version. Google has 1.6.4 on their download site, so not seeing how my 1.6.4 is later than the latest.
Django's syncdb command (python manage.py syncdb) will no longer create the .gaedata/datastore file. It says it's creating tables, it prompts me for the superuser creds, it even says it installed a bunch of objects from my fixture file. It gives no errors, but when it completes, it has done none of these things -- the .gaedata/datastore file doesn't even exist.
Prior to 1.6.4, syncdb worked great, including loading fixture data. I tried starting a fresh project with only bare bones files in it and a simple model (one class having one field) to see if some complexity I had introduced might be the cause of the problem. Even in simple-land, syncdb wouldn't create the datastore.
My only solution was to fall back to GAE SDK 1.6.3 -- everything works once again. Anyone else seeing similar symptoms with SDK 1.6.4? Are there obvious diagnostic steps I should be taking?
Fixed in latest django-nonrel. Either sync to latest code, or manually integrate this pull request:
https://github.com/django-nonrel/djangoappengine/pull/24
Haven't seen this problem, don't know.
There's a workaround, but it's not merged into the master branch yet, you'll have to manually integrate the changes.
https://github.com/django-nonrel/djangoappengine/pull/25

Resources