I was handed an assignment but I don't know where to start.
The aim is to have 2 piece of code running. One will run in Open stack private cloud and perform the task of indexing two sets of text, with another running in EC2 with the task of matching the two indexed tests.
I want to access them via google app engine.
Ideally, I would like to click a button or perform an action on Google app engine, which then sends a request to Openstack to run the code and retrieve the output of a txt file.
That outputted text files will then be forwarded onto EC2 where the matching will occur and the results sent back to Google App Engine.
My question is, how can I send the files between the systems using REST requests?
FrankN --
EC2, GAE and OpenStack are disparate cloud computing platforms. To integrate them might include, say, using one platform while saving backups to another.
CloudU.Rackspace.com is a vendor-neutral education site about cloud computing (note: It'll take six or so hours to finish it all). This might help.
Disclaimer: I work for Rackspace.
Firstly, not really sure what your requirements are, why your code does or why are you trying to mix cloud providers in that way.
That said, I would suggest taking the upload from GAE and push it to AWS S3 where you can then retrieve and use as you please from EC2.
Not sure what functionality you are trying to get out of OpenStack that is not present in AWS; however, I would suggest building whatever you are building in EC2 first, then duplicate in on OpenStack services to avoid future vendor lock in.
Related
I am deploying a Plotly Dash app on Google App Engine but meet some difficulties. The data source to be queried in the dashboard is a Bigquery table, whose content is changing. I hope that the data in the App can always be the latest.
What I tried is at the beginning of the main.py code, I read in the table from Bigquery by Bigquery Python API, but after the App being deployed onto GAE, I found the data was fixed; even I deleted the Bigquery table, the App was not affected. May I know what is the correct way to get data from BigQuery to App Engine? Thanks.
How to connect to BigQuery from GAE using Python might be a bit more of a task than a single question can answer, but here are some hints:
Everything Google Cloud can (in my opinion) be best understood through the repositories on Github. For instance, the python docs samples contain several examples, out of which I think the client example is probably the easiest and most basic. Bigquery Python Samples are here. That will basically answer your question, except for a few gotchas I will mention.
You will of course need to download the client library to do development on a local environment. That is straightforward, but if something seems not to be working make sure you have enabled the API service account for your project--that can be a little confusing.
Something that is critically important to remember is that your GAE app will not be able to easily communicate with BigQuery if it is in a different region, and, in fact, once you set up a GAE app you cannot move or delete it! So, do pay attention to what you are doing as you set up, and if you have a locations mismatch you will need to migrate your BQ instance to the matching location.
I have to maintain a database on the Google Cloud Platform and along with it put in a script(preferably in python) that is automated to put in new values from an API on a daily basis.
I'm confused as to how to go about this. Any suggestions?
You can take advantage of the App Engine platform which allow you to deploy a python application. It can be set to simply await instructions from your API or fetch the information directly. With the help of CRON, you can schedule task that should take care of pushing the object within your Database.
Another option would be the Cloud Functions. Currently Cloud Functions only handles the Nodejs runtime but it allows you to run a backend application that only runs when triggered. With a simple HTTP trigger from your API, your function should handle the data received and organize it before storing it in your Database.
Other options are available like Cloud Endpoints, Database (Spanner, Cloud SQL, Cloud PostgreSQL, Bigtable,) API, etc. All depends of semantics of your project (Will it be run only once daily, how fast does the whole operation has to be completed, etc.). I would suggest to review all of Google CLoud products in order to find the right solution for you.
In my app (Google App Engine Standard Python 2.7) I have some flags in global variables that are initialized (read values from memcache/Datastore) when the instance start (at the first request). That variables values doesn't change often, only once a month or in case of emergencies (i.e. when google app engine Taskqueue or Memcache service are not working well, that happened not more than twice a year as reported in GC Status but affected seriously my app and my customers: https://status.cloud.google.com/incident/appengine/15024 https://status.cloud.google.com/incident/appengine/17003).
I don't want to store these flags in memcache nor Datastore for efficiency and costs.
I'm looking for a way to send a message to all instances (see my previous post GAE send requests to all active instances ):
As stated in https://cloud.google.com/appengine/docs/standard/python/how-requests-are-routed
Note: Targeting an instance is not supported in services that are configured for auto scaling or basic scaling. The instance ID must be an integer in the range from 0, up to the total number of instances running. Regardless of your scaling type or instance class, it is not possible to send a request to a specific instance without targeting a service or version within that instance.
but another solution could be:
1) Send a shutdown message/command to all instances of my app or a service
2) Send a restart message/command to all instances of my app or service
I use only automatic scaling, so I'cant send a request targeted to a specific instance (I can get the list of active instances using GAE admin API).
it's there any way to do this programmatically in Python GAE? Manually in the GCP console it's easy when having a few instances, but for 50+ instances it's a pain...
One possible solution (actually more of a workaround), inspired by your comment on the related post, is to obtain a restart of all instances by re-deployment of the same version of the app code.
Automated deployments are also possible using the Google App Engine Admin API, see Deploying Your Apps with the Admin API:
To deploy a version of your app with the Admin API:
Upload your app's resources to Google Cloud Storage.
Create a configuration file that defines your deployment.
Create and send the HTTP request for deploying your app.
It should be noted that (re)deploying an app version which handles 100% of the traffic can cause errors and traffic loss due to:
overwriting the app files actually being in use (see note in Deploying an app)
not giving GAE enough time to spin up sufficient instances fast enough to handle high income traffic rates (more details here)
Using different app versions for the deployments and gradually migrating traffic to the newly deployed apps can completely eliminate such loss. This might not be relevant in your particular case, since the old app version is already impaired.
Automating traffic migration is also possible, see Migrating and Splitting Traffic with the Admin API.
It's possible to use the Google Cloud API to stop all the instances. They would then be automatically scaled back up to the required level. My first attempt at this would be a process where:
The config item was changed
The current list of instances was enumerated from the API
The instances were shutdown over a time period that allows new instances to be spun up and replace them, and how time sensitive the config change is. Perhaps close on instance per 60s.
In terms of using the API you can use the gcloud tool (https://cloud.google.com/sdk/gcloud/reference/app/instances/):
gcloud app instances list
Then delete the instances with:
gcloud app instances delete instanceid --service=s1 --version=v1
There is also a REST API (https://cloud.google.com/appengine/docs/admin-api/reference/rest/v1/apps.services.versions.instances/list):
GET https://appengine.googleapis.com/v1/{parent=apps/*/services/*/versions/*}/instances
DELETE https://appengine.googleapis.com/v1/{name=apps/*/services/*/versions/*/instances/*}
We have Java servlets up and running on GAE, using blobstore, datastore and other cloud services.
Currently, we're starting a migration process to cloud Endpoints and we've hit an issue: if we use a different GAE project, we would not be able to query regarding current datastore entities (to the best of my knowledge, Google doesn't want you to do this - see
this question
and the GAE terms of service - section 3.3d), so we need to use the same project for both.
I looked up whether it's possible to have one GAE instance running Java servlets and one instance running Endpoints, but I found no conclusive answer anywhere.
If we try to implement and something goes wrong, we're looking at a potentially major issue for our users, so we need to be sure beforehand.
Has anyone tried something similar, and can assure us that this works?
You have 2 options to run the old and the new code inside the same app (thus with no issues sharing access to the datastore) but as separate engine instances, so they can be developed/deployed/managed independently:
as different versions of the same app/module(s):
the old version remains the default, the new one can be accessed at a different URL during development (possibly via URL routing)
you can use traffic splitting to do live A/B testing on the new code and for gradual final migration until you make the new version the default
as different modules of the same app:
both can run (fully functional) side by side indefinitely, but you need to be more careful during development
traffic is routed to the modules in several possible ways
final migration is done by publishing the new URLs, eventually re-directing the old URLs and finally bringing down the old module code
The 2 approaches can even be combined, if needed, as the final solution described by the OP's in this somehow similar question (for the python environment, but java equivalents exist): Google App Engine upgrading part by part
I am currently using google app engine as my mobile application back end. I have a few tasks that can not be performed in the gae environment (mainly image recognition using opencv). My intention is to retain gae and use AWS to perform these specific tasks.
Is there a simple way to pass specific tasks from gae to AWS? E.g. A task queue?
You could either push tasks from GAE towards AWS, or have your AWS instances pull tasks from GAE.
If you push tasks from GAE towards AWS, you could use URLFetch to push your data towards your AWS instances.
If you prefer to have your AWS instances pull tasks from GAE, you could have your GAE instances put their tasks in the GAE Pull Queue, and then have your AWS instances use the Task Queue REST API to lease tasks from the queue.
In either case, the AWS instance could report back the processing result through a simple POST request to your GAE servlets, or through inserting tasks via the abovementioned REST API which would later be leased by your GAE instances. The latter could be useful if you want to control the rate of which your GAE app process the results.
Disclaimer: I'm a lead developer on the AppScale project.
One way that you could go is with AppScale - it's an open source implementation of the App Engine APIs that runs over Amazon EC2 (as well as other clouds). Since it's open source, you could alter the AppServer that we ship with it to enable OpenCV to be used. This would require you to run your App Engine app in AWS, but you could get creative and have a copy of your app running with Google, and have it send Task Queue requests to the version of your app running in AWS only when you need to use the OpenCV libraries.
Have you considered using amazon simple queue service ? http://aws.amazon.com/sqs/
You should be able to add items to the queue from gae using a standard http clint.
Sure. AppEngine has a Task Queue, where you can put in your tasks by simply implementing DeferredTask. In that task you can make requests to AWS.
Your intention to retain the application in GAE and use AWS to perform a few tasks, that can not be performed in the GAE, seems for me a right scenario.
I'd like to share a few ideas along with some resources to answer the main part of your question:
Is there a simple way to pass specific tasks from gae to AWS? E.g. A task queue?
If you need GAE and AWS to perform the task all the time (24/7) then your application will definitely depend on batch schedule or task queue. They are available by GAE.
However if you could arrange to pull the task in GAE and perform by AWG on interval basis (say twice a day of less than an hour each), you may no need to use them as long you can manage the GAE to put the data on Google Cloud Storage (GCS) as public.
For this scenario, you need to setup AWS EC2 Instance for On/Off Schedule and let the instance to run a boot script using cloud-init to collect the data through your domain that pointed to GCS (c.storage.googleapis.com) like so:
wget -q --read-timeout=0.0 --waitretry=5 --tries=400 \\
--background http://your.domain.com/yourfile?q=XXX...
By having the data from GCS, then AWS can perform these specific tasks. Let it fire up GAE to clean the data and put the result back to GCS to be ready to be used as your mobile application back end.
Following are some options to consider:
You should note that not all of the EC2 types are suitable for On/Off Schedule. I recommend to use EC2-VPC/EBS if you want to setup AWS EC2 Instance for On/Off Schedule
You may no need to setup EC2 if you can set AWS Lambda to perform the task without EC2. The cost is cheaper, a task running twice a day for typically less than 3 seconds with memory consumption up to 128MB typically costs less than $0.0004 USD/month
As outcome of rearranging you your application in GAE and set AWG to perform some of the tasks, it might finally rise your billing rates, try to to optimize the instance class in GAE.