We are considering using Google Cloud Storage as an alternative to AWS, so we are planning to do some performance testing on GCS. One of the features we would like to test is searching for files at a certain path. Unfortunately, the SDK does not support searching by prefix, so we are forced to use the Java client API instead. Here is the relevant code, which is failing:
// Requires the appengine-gcs-client library (com.google.appengine.tools.cloudstorage).
GcsService gcsService = GcsServiceFactory.createGcsService(RetryParams.getDefaultInstance());
AppIdentityService appIdentity = AppIdentityServiceFactory.getAppIdentityService();
ListOptions.Builder b = new ListOptions.Builder();
b.setRecursive(true);
b.setPrefix("folder/");
ListResult result = gcsService.list("rms-test-bucket", b.build()); // NullPointerException thrown here
Specifically, the code falls over on the call to gcsService.list() with a NullPointerException. I attached all the sources in IntelliJ, stepped through the code, and found the cause: a call to ApiProxy.getDelegate() returns null when it should return a non-null value.
We suspect that there is a configuration problem somewhere, although it is not clear what it might be.
Where are you running that code from? This code should be run on App Engine standard or App Engine flexible compat runtimes (as that API is App Engine specific). For all other cases you should use the google-cloud-java client. In fact, I would suggest using that client even on App Engine, as it is supported on all platforms and is much richer in functionality. For more information see here.
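For reference, listing objects under a prefix with the google-cloud-java client looks roughly like the sketch below (the bucket name and prefix are taken from your snippet; the iteration style may vary slightly between client versions):

import com.google.cloud.storage.Blob;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class ListByPrefix {
    public static void main(String[] args) {
        Storage storage = StorageOptions.getDefaultInstance().getService();
        // Iterate over every object whose name starts with "folder/".
        for (Blob blob : storage.list("rms-test-bucket",
                Storage.BlobListOption.prefix("folder/")).iterateAll()) {
            System.out.println(blob.getName());
        }
    }
}

Since this client talks to the plain HTTP/JSON API rather than the App Engine service bridge, it runs the same way on App Engine, GCE, or your workstation.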
I'm not entirely sure what's wrong with your example, but if your goal is strictly to test GCS performance when searching for files at a certain path, the gsutil command-line utility contains a solid implementation of that logic (e.g. gsutil ls gs://your-bucket/folder/), and you could use it to evaluate performance. If you're testing from a GCE instance, it's already preinstalled.
I have a website on App Engine that is 99% static. It is running on the Python 2.7 runtime. Now the time has come to evolve this webapp, and since I have almost no Python code in it, I'd prefer to write it in Go instead.
Can I change runtime from Python 2.7 to Go, while keeping the project intact? Specifically, I want to keep the same app-ID, the same custom domain attached to it, the same SSL certificate, and so on.
What do I have to do in order to do that? I surely have to change the runtime in app.yaml. Is there anything else?
Bonus question: will such change happen without a downtime?
I'd be grateful for any links to documentation on exactly that (swapping runtime on a live app). I can't find any.
Specify the new runtime, as well as a new value for version, in app.yaml. When deployed, you'll have an older version that is Python and a newer version that is Go. There won't be any downtime (the same as when deploying a newer version of the Python app).
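For illustration, a minimal app.yaml for the Go standard runtime of that era might look like this (the version id is made up; pick whatever fits your versioning scheme):

application: your-app-id
version: go-1        # new version id, deployed alongside the existing Python version
runtime: go
api_version: go1
handlers:
- url: /.*
  script: _go_app

Deploying this creates the Go version next to the existing Python one; traffic stays wherever you route it.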
Rather than trusting links/docs (which may be out of date or not match 100% what you're trying to do), why not create a new GAE-Std project for testing purposes and try it yourself? Having a GAE-Std test project is good for testing new functionality (especially by other testers who won't have access to the dev environment on your laptop).
The GAE services offer complete code isolation, so it should be possible to simply deploy a new version of the service, which can be written in a different language or even use a different GAE (standard/flex) environment. Personally I didn't go through a language change, but I did go through a split of a single-service app into a multi-service one, and I see no reason why the same principles wouldn't apply.
Maybe develop the new version as a separate app first, to be able to test it properly without risking an accidental impact on the old version, and only after that bring the code in as a new version of the old app. That would make use of GAE's project-level isolation. You can, in fact, test the entire version migration as a separate app, if you so desire, without even touching the existing app. I use this technique (a separate app ID) to implement a staging environment for my app, completely isolated from my production app; see How to copy / clone entire Google App Engine Project.
Make sure not to switch traffic to the new version at deployment time; this keeps the app working with the old version. First test that the new version works as expected using targeted routing (e.g. the version-specific URL https://<version>-dot-<app-id>.appspot.com). Then maybe use traffic splitting across multiple versions to perform A/B testing, with just a small percentage of the traffic going to the new version. Finally, when happy with the results, switch all traffic to the new version.
You need to pay special attention to the app-level configs (dispatch, cron, queue, datastore indexes), which are shared by all services/versions. They need to be functionally equivalent in the two versions. Service isolation doesn't apply to them; only project isolation can ensure no impact on the old version.
There should be no need to make any change to the app ID, custom domain mapping or SSL config. The above mentioned tests should confirm that.
A few potentially interesting posts related to re-working services/modules:
Converting App Engine frontend versions to modules
Google App Engine upgrading part by part
Migrating to app engine modules, test versions first?
Advantages of implementing CI/CD environments at GAE project/app level vs service/module level?
I have implemented a simple API in Go on Google App Engine Standard using just:
import "net/http"

func init() {
	http.HandleFunc("/api/v1/resource", submitResource)
}
Nothing special. However, I want to port this code to Cloud Endpoints in order to get better monitoring and diagnostics.
Is it even possible with STANDARD instances, or must I move to FLEXIBLE?
I can't find any documentation on this, nor answers to this seemingly simple question. At the moment I half wish I had chosen Python, because its support seems more mature. I chose Go because it seemed more appropriate for API-like code; my minimal research suggested Go offered better performance.
If it is possible, are there any pointers to how please?
Only Python and Java are supported on GAE Standard via the Endpoints Frameworks. However, Go is supported on GAE Flexible.
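To give a sense of what that support looks like on the Java side, a minimal Endpoints Frameworks API is just an annotated class (the names below are made up for illustration):

import com.google.api.server.spi.config.Api;
import com.google.api.server.spi.config.ApiMethod;
import com.google.api.server.spi.config.Named;

@Api(name = "myapi", version = "v1")
public class MyApi {
    // Rough Java equivalent of the /api/v1/resource handler above.
    @ApiMethod(name = "submitResource", httpMethod = ApiMethod.HttpMethod.POST)
    public void submitResource(@Named("id") String id) {
        // ... handle the resource ...
    }
}

Nothing comparable exists for Go on the standard environment.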
Here is the Go GAE Flexible sample:
https://github.com/GoogleCloudPlatform/golang-samples/tree/master/endpoints/getting-started
After much research and trial and error, the simple answer is "No", as of Dec 2016.
The longer answer is that it's possible if you want to put far too much effort into maintaining up-to-date libraries of your own. There is basically no support, even in alpha, for the current Google Cloud Endpoints using Go with Google App Engine Standard.
It's possible to run Go + Endpoints on the GAE Standard environment; however, the libraries might be outdated by now.
Libraries and sample app can be found on github:
https://github.com/GoogleCloudPlatform/go-endpoints
I have successfully deployed "Greetings" as an App Engine standard app, and it works.
I am having trouble using Push Queues on Google App Engine's Flexible Environment (formerly named the Managed VM Environment). I am receiving numerous 404 Instance Unavailable errors.
After a bit of sleuthing, I believe these errors occur because I add a task to a task queue and then deploy a new version of the flexible VM instance. The task that I previously enqueued is bound to the older version and can no longer run. Is this how task queues work with the flexible VM? If so, how does one use push task queues with the flexible VM?
I was 90% done migrating to flexible env when I came across this same problem. After extensive research, I concluded there are three options:
REST API (experimental)
Use the beta REST API for task queues (this, like all other Google APIs called from the flexible env, is external, so you need to handle auth appropriately).
REST API reference: https://cloud.google.com/appengine/docs/python/taskqueue/rest/
Note, this is external and experimental. There is, for example, a Java SDK without any meaningful documentation here: https://developers.google.com/api-client-library/java/apis/ (current version: https://developers.google.com/api-client-library/java/apis/taskqueue/v1beta2)
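For what it's worth, enqueuing through that client would look roughly like the sketch below. This is a guess at the generated v1beta2 Java client's surface (com.google.api.services.taskqueue); the class and method names follow the standard generated-client pattern, so double-check them against the javadoc, and note that the credential must come from your own auth flow:

import java.util.Base64;
import com.google.api.client.googleapis.javanet.GoogleNetHttpTransport;
import com.google.api.client.http.HttpRequestInitializer;
import com.google.api.client.json.jackson2.JacksonFactory;
import com.google.api.services.taskqueue.Taskqueue;
import com.google.api.services.taskqueue.model.Task;

public class RestEnqueue {
    // "credential" comes from your auth flow (e.g. a service account).
    public static void enqueue(HttpRequestInitializer credential, byte[] payload)
            throws Exception {
        Taskqueue client = new Taskqueue.Builder(
                GoogleNetHttpTransport.newTrustedTransport(),
                JacksonFactory.getDefaultInstance(),
                credential)
            .setApplicationName("my-app")       // illustrative
            .build();
        Task task = new Task()
                .setQueueName("my-pull-queue")  // illustrative queue name
                .setPayloadBase64(Base64.getEncoder().encodeToString(payload));
        client.tasks().insert("my-project", "my-pull-queue", task).execute();
    }
}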
Compat runtime
Build your own flexible environment image based on a -compat runtime. This offers the old App Engine API in a container suitable for the flexible env:
https://cloud.google.com/appengine/docs/flexible/custom-runtimes/build (look for images with "YES" in the last column)
e.g.: https://cloud.google.com/appengine/docs/flexible/java/dev-jetty9-and-apis
https://cloud.google.com/appengine/docs/flexible/java/migrating-an-existing-app
Note: I spent two weeks in blistered frustration, pleading with God almighty to help me get this to work, following container rabbit holes into the depths of Lucifer's soul and across unexplored dimensions. I eventually had to give in. I just can't get this to work to a satisfying degree.
Proxy service
Kind of a hacky alternative, but it gets the job done: create a very thin standard-environment wrapper service which proxies tasks onto / off of your queue and passes them to your own app however you want (a rough sketch follows below). ¯\_(ツ)_/¯
The downside is that you are now spinning up extra instances and burning extra minutes.
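A minimal sketch of such a wrapper, as a standard-environment servlet using the native Task Queue API (the queue name and worker URL are made up):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;

public class TaskProxyServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws IOException {
        Queue queue = QueueFactory.getQueue("my-push-queue");
        queue.add(TaskOptions.Builder
                .withUrl("/worker")   // push-queue handler that does the real work
                .payload(readBody(req)));
        resp.setStatus(HttpServletResponse.SC_NO_CONTENT);
    }

    private static byte[] readBody(HttpServletRequest req) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        InputStream in = req.getInputStream();
        byte[] buf = new byte[4096];
        for (int n; (n = in.read(buf)) > 0; ) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}

The flexible-env app then POSTs to this servlet instead of calling the task queue API directly.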
I ended up with a variation of this, where I'm using a proxy service in standard env, but just ported my eventual task handler to AWS Lambda (so it's completely off GAE). It's a different disaster, but a more manageable one.
Good luck!
I'm the tech lead and manager for this product.
There are two distinct answers to your question.
For the first: it seems like you have a version routing issue. As you say, tasks cannot run against a VM because you launched a new version. By default, tasks are assigned to run on the version from which they were enqueued, to avoid version mismatches. You should be able to override the version by re-configuring the target in your queue.yaml (or queue.xml). Documentation for that can be found here. You might also need to look at your
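For illustration, pinning a queue to a version with the target directive might look like this in queue.yaml (the queue name and version id are hypothetical):

queue:
- name: my-push-queue
  rate: 5/s
  target: v2   # tasks from this queue run against version "v2"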
From a broader perspective, building a migration path away from standard/MVM-only support for task queues is currently our highest priority.
The replacement is Cloud Tasks, which exposes the same interface but can be used fully independently from App Engine. It exists in the same universe as AppEngine Task Queues, so you will be able to add tasks to existing queues (both push and pull). It is currently available in closed alpha. You can sign up to join the alpha here.
We strongly recommend against writing new code against the REST API. It is unsupported, and the Cloud Tasks alpha is already substantially more feature-complete.
I'm upvoting hraban's answer (he did wrestle with the devil after all) but providing an additional answer here.
Keep in mind that the Flexible Environment (Managed VMs) is still just a Compute Engine instance, with Google doing a good job of extracting features from App Engine to make them reachable in a transparent manner. Task queues didn't quite make it. Keep a sharp eye on the cloud library; that's the mechanism by which the Datastore becomes usable (for Java, go to http://googlecloudplatform.github.io/google-cloud-java/0.3.0/index.html). If you go to that link you can also select other languages. I have it on official word that task queues are still on the roadmap (but with no ETA).
As of now you can't use the REST API to enqueue onto push queues. The way I decided to tackle this problem was to use the REST API with a pull queue to put tasks in. Then I poll that queue inside an App Engine service (i.e. module) and move each task into a push task queue. Why do I go to all that trouble? Because I need scheduled execution, which is a feature that task queues alone give you on App Engine. So I package my task in an envelope, then unpack and re-push it into a push queue; see the sketch below. Sounds crazy? It's been working reliably for me. Don't be scared off by the fact that the REST API is "alpha".
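A rough sketch of that envelope technique, using the native GAE Java Task Queue API from the standard-env polling service (the queue names, worker URL, and batch size of 10 are illustrative):

import java.util.List;
import java.util.concurrent.TimeUnit;
import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskHandle;
import com.google.appengine.api.taskqueue.TaskOptions;

public class PullToPushRelay {
    public void relayOnce() {
        Queue pull = QueueFactory.getQueue("my-pull-queue");
        Queue push = QueueFactory.getQueue("my-push-queue");
        // Lease up to 10 tasks for 60 seconds.
        List<TaskHandle> leased = pull.leaseTasks(60, TimeUnit.SECONDS, 10);
        for (TaskHandle handle : leased) {
            // Unpack the envelope and re-enqueue it onto the push queue.
            push.add(TaskOptions.Builder
                    .withUrl("/worker")
                    .payload(handle.getPayload()));
            // Delete from the pull queue only after a successful re-enqueue.
            pull.deleteTask(handle);
        }
    }
}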
I will say if you're starting something new take a good look at the Pub/Sub API.
I have started trying to use Google Cloud Datalab. While I understand it is a beta product, I find the docs very frustrating, to say the least.
The questions here, the lack of responses, and the lack of new revisions or docs over the several months the project has been available make me wonder if there is any commitment to the product.
A good beginning would be a notebook that shows data ingestion from external sources into both the Datastore system and the BigQuery system; that is a common use case. I'd like to use my own data, and it would be great to have a notebook showing how to ingest it. It seems that should be doable without huge effort, and it would get me (and others) out of this mess of trying to link the various terse docs from various products and workspaces up and working together.
That, in addition to a better explanation of the GitHub connection process (see my prior question).
For BigQuery, see here: https://github.com/GoogleCloudPlatform/datalab/blob/master/content/datalab/tutorials/BigQuery/Importing%20and%20Exporting%20Data.ipynb
For GCS, see here: https://github.com/GoogleCloudPlatform/datalab/blob/master/content/datalab/tutorials/Storage/Storage%20Commands.ipynb
Those are the only two storage options currently supported in Datalab (which, in any event, should not be used for large-scale data transfers; these are for small-scale transfers that can fit in memory in the Datalab VM).
For Git support, see https://github.com/GoogleCloudPlatform/datalab/blob/master/content/datalab/intro/Using%20Datalab%20-%20Managing%20Notebooks%20with%20Git.ipynb. Note that this has nothing to do with Github, however.
As for the low level of activity recently, that is because we have been heads down getting ready for GCP Next (which happens this coming week). Once that is over we should be able to migrate a number of new features over to Datalab and get a new public release out soon.
Datalab isn't running on your local machine; just the presentation part is in your browser. So if you mean the browser client machine, that wouldn't be a good solution: you'd be moving data from the local machine to the VM running the Datalab Python code (which has limited storage space), and then moving it again to the real destination. Instead, you should use the Cloud Console or (preferably) the gcloud command line on your local machine for this.
We have a library with very complex logic implemented in C. It has a command-line interface with not-too-complex string-based arguments. To expose this, we would like to wrap the library so that it can be accessed via simple XML-RPC or even straightforward HTTP POST calls.
Having some experience with Java, my first idea would be
Wrap the library in JNI/JNA (a sketch follows below)
Use a thin WS stack and a servlet engine
Proxy requests through Apache to the servlet engine
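For illustration, step 1 with JNA might look roughly like this; "complexlib" and run_command are stand-ins for the real library name and its command-style entry point:

import com.sun.jna.Library;
import com.sun.jna.Native;

public interface ComplexLib extends Library {
    ComplexLib INSTANCE = (ComplexLib) Native.loadLibrary("complexlib", ComplexLib.class);

    // Maps to a hypothetical C entry point:
    // int run_command(const char *args, char *out, int out_len);
    int run_command(String args, byte[] out, int outLen);
}

The servlet layer would then just marshal request parameters into the args string and copy the output buffer into the HTTP response.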
I believe there should already be something simple that could be used, so I am posting this question here. A solution has the following requirements:
It should be deployable to a current Linux distribution, preferably one already available via package management
It should integrate with a standard web server (as in my example Apache)
Small changes to the library's interface should be manageable
End-to-end (HTTP-WS-library-WS-HTTP), the solution should not incur too much overhead, but reliability is very important
As an alternative to the JNI/JNA proposal, I think that in the C# world it should not be too difficult to write a web service and call the unmanaged code module, but I hope someone can give me some pointers that are feasible with regard to the requirements.
If you're going with web services, perhaps Soaplab would be useful. It's basically a tool to wrap existing command-line applications in SOAP web services. The web services it generates look a bit weird, but it is quite a popular way to make something like this work.
Creating an Apache module is quite easy, and since you're familiar with XML-RPC you should check out mod-xmlrpc2. You can easily add your C code to this Apache module and have a running XML-RPC server in minutes.
I think you could also publish it as a SOAP-based web service. gSOAP can be used to provide the service interface on top of the library. Have you explored gSOAP? See http://www.cs.fsu.edu/~engelen/soap.html
It depends on what technology you're comfortable with, what you already have installed and working on your servers, and what your load requirements are.
How about raw CGI? Assuming the C code is stateless between requests, you can do this without modifying the library at all. Write a simple script which pulls the request parameters out of the CGI environment, perhaps sanitises the input, calls the library via the command-line interface, and packages the result into whatever HTTP response you want. Then configure Apache to dispatch the relevant URL(s) to this script. Python, for example, has library support for XML-RPC, and so does every other scripting language used on the web.
Servlets sound like overkill, but if, for instance, you want multiple requests per CGI process instantiation and don't feel like getting involved in Apache configuration, it might be easiest to stick with what you know.
I'm doing a similar thing with C++ at the moment. In my case, I'm writing a PHP module to allow PHP scripts to access the logic in my C++ library.
I can then use whatever format I want to expose it to the rest of the world: initially it will just be through a PHP web application, but I'll also be developing an XML-RPC interface.
If you're going down the JNI route, check out SWIG.
http://www.swig.org/Doc1.3/Java.html
Assuming you have headers to generate bindings from, SWIG is pretty easy to work with.