Automate function + database on Google Cloud Platform

I have to maintain a database on the Google Cloud Platform and, along with it, a script (preferably in Python) that automatically puts new values from an API into the database on a daily basis.
I'm confused as to how to go about this. Any suggestions?

You can take advantage of the App Engine platform, which allows you to deploy a Python application. It can be set to simply await instructions from your API or fetch the information directly. With the help of cron, you can schedule a task that takes care of pushing the objects into your database.
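As a minimal sketch of that approach (the API URL, the row layout, and the SQLite stand-in for your actual GCP database are all assumptions for illustration): a small Flask handler that App Engine's cron service would hit once a day via an entry in cron.yaml.

# Hypothetical daily cron handler for App Engine (Python runtime).
# cron.yaml would point a "schedule: every 24 hours" job at /tasks/daily-sync.
# SQLite is only a stand-in here for whatever database you run on GCP.
import sqlite3
import requests
from flask import Flask

app = Flask(__name__)
API_URL = "https://api.example.com/daily-values"  # placeholder endpoint

@app.route("/tasks/daily-sync")
def daily_sync():
    rows = requests.get(API_URL, timeout=30).json()  # fetch the new values
    with sqlite3.connect("values.db") as conn:       # swap in your Cloud SQL client here
        conn.execute("CREATE TABLE IF NOT EXISTS daily_values (name TEXT, value REAL, day TEXT)")
        conn.executemany(
            "INSERT INTO daily_values (name, value, day) VALUES (?, ?, ?)",
            [(r["name"], r["value"], r["day"]) for r in rows],
        )
    return "stored %d rows" % len(rows), 200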
Another option would be Cloud Functions. Currently, Cloud Functions only handles the Node.js runtime, but it allows you to run a backend application that only runs when triggered. With a simple HTTP trigger from your API, your function can handle the data it receives and organize it before storing it in your database.
Other options are available, such as Cloud Endpoints, databases (Spanner, Cloud SQL, Cloud SQL for PostgreSQL, Bigtable), APIs, etc. It all depends on the semantics of your project (will it run only once daily, how fast does the whole operation have to complete, etc.). I would suggest reviewing all of the Google Cloud products in order to find the right solution for you.

Related

Google App Engine streaming data into Bigquery: GCP architecture

I'm working with a Django web app deployed on Google App Engine flexible environment.
I'm streaming my data while processing requests in my views using bigquery.Client(), but I think this is not the best way to do it. Do I need to delegate this process outside of the view (using Pub/Sub, tasks, Cloud Functions, etc.)? If so, please suggest a suitable architecture: which GCP product I should use, how to connect the pieces, and what to read.
Based on your comment, I can recommend Cloud Run.
Cloud Run is a serverless, container-based product. You write a web server (that handles your POST requests), wrap it in a container, and deploy it on Cloud Run.
With a fairly new feature named always-on CPU, the CPU is not throttled after the response is sent (which is the normal behavior). With always-on CPU, you keep the full CPU until the Cloud Run instance is offloaded (usually after 15 minutes, but it can be quicker).
The benefit of the feature is the capacity to return the response to the client immediately, and then to continue processing your data asynchronously and store it in BigQuery (in streaming mode).
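As a rough sketch of that pattern (not a drop-in implementation; the table ID and payload shape are assumptions): the handler below acknowledges the request right away and hands the streaming insert off to a background thread, which is exactly what always-on CPU keeps alive after the response.

# Hypothetical Cloud Run service: respond first, stream to BigQuery afterwards.
# Relies on the always-on CPU setting so the background thread keeps running.
import threading
from flask import Flask, request
from google.cloud import bigquery

app = Flask(__name__)
bq = bigquery.Client()
TABLE_ID = "my-project.my_dataset.events"  # placeholder table

def stream_to_bigquery(rows):
    errors = bq.insert_rows_json(TABLE_ID, rows)  # streaming insert
    if errors:
        print("BigQuery insert errors:", errors)

@app.route("/ingest", methods=["POST"])
def ingest():
    payload = request.get_json(force=True)
    threading.Thread(target=stream_to_bigquery, args=([payload],)).start()
    return "accepted", 202  # the client gets this immediately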

Connecting API to database

I am using this API to build an app (Xcode), and the maximum number of calls per day is 5,000. The way I have currently built the app for testing purposes is to call the API every time the user refreshes the data, so I am running out of calls per day. I was wondering how to connect an API to a database like Firebase and then update the data in the database maybe 4 times a day at specific times. When the user refreshes, they would pull data from the database instead. I'm new to programming and am not sure if this is the best solution; I would appreciate it if anyone could direct me to more resources. Thanks!
This is the api I am using: https://projects.propublica.org/api-docs/congress-api/?
Edit: Would something like this also mean I would build a REST API? https://github.com/unitedstates/congress It is a repository that includes data-importing scripts and scrapers. I'm guessing this isn't compatible with Swift, but is compatible with building a REST API in AWS or Firebase?
You can use AWS (Amazon Web Services). Their free tier includes many of their services for free (for 12 months, within usage limits), including the ones I would recommend for this project:
Make an AWS account.
Use S3 storage buckets to host a datafile.
Use API Gateway to make an API.
Use Lambda to run Python/JavaScript in the cloud, which connects the API with the S3 bucket (your data).
Use IAM to create roles and permissions so that the S3 bucket, the API, and the Lambda scripts can communicate.
Here's how you set up the API: https://www.youtube.com/watch?v=uFsaiEhr1zs
Here's how you read the S3 bucket: https://www.youtube.com/watch?v=6LvtSmJhVRE
You can also work with these tools to set up an API that PUTs data to the S3 bucket and updates the data regularly.
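As a hedged sketch of the Lambda step above: a handler behind an API Gateway proxy integration that serves a JSON datafile previously cached in S3, so the app reads the bucket instead of burning API calls. The bucket and key names are placeholders, not anything from the question.

# Hypothetical Lambda handler behind an API Gateway proxy integration.
# It serves a JSON datafile that a scheduled job has already written to S3.
import boto3

s3 = boto3.client("s3")
BUCKET = "my-congress-cache"   # placeholder bucket
KEY = "latest/members.json"    # placeholder object key

def lambda_handler(event, context):
    obj = s3.get_object(Bucket=BUCKET, Key=KEY)
    body = obj["Body"].read().decode("utf-8")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": body,
    }

A second, scheduled Lambda (for example, an EventBridge rule firing four times a day) would call the upstream API and put_object the fresh JSON under that key, which keeps you well under the 5,000-call limit.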

What are the costs involved in triggering functions in Google App Engine?

I am interested in triggering notifications into my Salesforce system when reviews are posted on my Google Business profile.
Using this article, it describes being able to call the Salesforce API by executing Python code using Google's App Engine, leaving very little for me to code myself, but I'm not familiar with how Google works in terms of billing.
Essentially, when a topic/subscription is triggered, Google will execute a small Python script to call the Salesforce API. I want to know how much it will cost to do this, or how I can calculate how this will be billed.
You must be referring to Cloud Functions, because that's the product used in the article (also, Cloud Functions has a Pub/Sub event trigger). App Engine and Cloud Functions are both serverless products, but they have different use cases. If you wish to know the difference between them, here's a good SO answer.
You are billed depending on how long the function takes to finish, how many times you invoke it, and the type of resources configured on the function (the higher the CPU and memory, the higher the cost). There are additional charges as well if your function performs outbound data transfer when it is invoked.
You can estimate your monthly cost with the GCP pricing calculator. Also, there are several products involved in the article (such as Secret Manager and Pub/Sub), so take note to add these products to your estimate. For additional pricing details, check out the docs: https://cloud.google.com/functions/pricing
I also have another piece of advice, though this is not about the pricing when invoking the function but rather about the additional charges when deploying it. From the GCP docs:
When you deploy your function's source code to Cloud Functions, that source is stored in a Cloud Storage bucket. Cloud Build then automatically builds your code into a container image and pushes that image to Container Registry. Cloud Functions accesses this image when it needs to run the container to execute your function.
This applies to newer runtimes. It may be noticeable on your bill if you frequently deploy or update a function, because Cloud Build has to re-build those images and the images have to be stored in a Cloud Storage bucket. Functions also take time to deploy, so it's best practice to test your functions locally first. A solution is to use the Functions Framework.
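To make the local-testing suggestion concrete, here is a minimal sketch of a Pub/Sub-triggered function written against the Functions Framework; the Salesforce call itself is left as a hypothetical placeholder since it depends on your org's setup.

# Hypothetical Pub/Sub-triggered function, runnable locally with:
#   functions-framework --target=notify_salesforce --signature-type=cloudevent
import base64
import functions_framework

@functions_framework.cloud_event
def notify_salesforce(cloud_event):
    # Pub/Sub delivers the message payload base64-encoded inside the CloudEvent.
    message = base64.b64decode(cloud_event.data["message"]["data"]).decode("utf-8")
    print("New review event:", message)
    # Placeholder: call the Salesforce API here (e.g. with requests or simple_salesforce).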

EC2, OpenStack, Google App Engine (GAE) and REST

I was handed an assignment but I don't know where to start.
The aim is to have 2 pieces of code running. One will run in an OpenStack private cloud and perform the task of indexing two sets of text, with another running in EC2 with the task of matching the two indexed texts.
I want to access them via Google App Engine.
Ideally, I would like to click a button or perform an action on Google App Engine, which then sends a request to OpenStack to run the code and retrieve the output as a txt file.
That output text file will then be forwarded on to EC2, where the matching will occur and the results will be sent back to Google App Engine.
My question is, how can I send the files between the systems using REST requests?
FrankN --
EC2, GAE and OpenStack are disparate cloud computing platforms. To integrate them might include, say, using one platform while saving backups to another.
CloudU.Rackspace.com is a vendor-neutral education site about cloud computing (note: It'll take six or so hours to finish it all). This might help.
Disclaimer: I work for Rackspace.
Firstly, I'm not really sure what your requirements are, what your code does, or why you are trying to mix cloud providers in that way.
That said, I would suggest taking the upload from GAE and pushing it to AWS S3, where you can then retrieve it and use it as you please from EC2.
I'm not sure what functionality you are trying to get out of OpenStack that is not present in AWS; however, I would suggest building whatever you are building on EC2 first, then duplicating it on OpenStack services to avoid future vendor lock-in.
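If you go that route, a hedged sketch of the hand-off could look like this: one side uploads the indexed text file to S3 with boto3, and the EC2 side downloads it for matching. The bucket and key names are invented for illustration.

# Hypothetical hand-off via S3: upload the indexed text, then pull it on EC2.
import boto3

s3 = boto3.client("s3")
BUCKET = "indexing-handoff"        # placeholder bucket
KEY = "runs/run-001/indexed.txt"   # placeholder key

def upload_indexed_text(local_path):
    # Runs on the indexing side once the txt file is ready.
    with open(local_path, "rb") as f:
        s3.put_object(Bucket=BUCKET, Key=KEY, Body=f)

def download_on_ec2(local_path):
    # Runs on EC2 before the matching step.
    s3.download_file(BUCKET, KEY, local_path)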

Using Amazon Web Services as a Google App Engine back end

I am currently using Google App Engine as my mobile application back end. I have a few tasks that cannot be performed in the GAE environment (mainly image recognition using OpenCV). My intention is to retain GAE and use AWS to perform these specific tasks.
Is there a simple way to pass specific tasks from gae to AWS? E.g. A task queue?
You could either push tasks from GAE towards AWS, or have your AWS instances pull tasks from GAE.
If you push tasks from GAE towards AWS, you could use URLFetch to push your data towards your AWS instances.
If you prefer to have your AWS instances pull tasks from GAE, you could have your GAE instances put their tasks in the GAE Pull Queue, and then have your AWS instances use the Task Queue REST API to lease tasks from the queue.
In either case, the AWS instance could report the processing result back through a simple POST request to your GAE servlets, or by inserting tasks via the above-mentioned REST API, which would later be leased by your GAE instances. The latter could be useful if you want to control the rate at which your GAE app processes the results.
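As a rough sketch of the push variant on the first-generation App Engine Python runtime (the AWS endpoint URL and payload are assumptions):

# Hypothetical push from GAE to an AWS worker endpoint using URLFetch
# (first-generation App Engine Python runtime).
import json
from google.appengine.api import urlfetch

AWS_ENDPOINT = "https://worker.example.com/tasks"  # placeholder AWS instance URL

def push_task_to_aws(image_ref):
    payload = json.dumps({"image": image_ref, "callback": "/aws/result"})
    result = urlfetch.fetch(
        url=AWS_ENDPOINT,
        payload=payload,
        method=urlfetch.POST,
        headers={"Content-Type": "application/json"},
        deadline=60,
    )
    return result.status_code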
Disclaimer: I'm a lead developer on the AppScale project.
One way that you could go is with AppScale - it's an open source implementation of the App Engine APIs that runs over Amazon EC2 (as well as other clouds). Since it's open source, you could alter the AppServer that we ship with it to enable OpenCV to be used. This would require you to run your App Engine app in AWS, but you could get creative and have a copy of your app running with Google, and have it send Task Queue requests to the version of your app running in AWS only when you need to use the OpenCV libraries.
Have you considered using Amazon Simple Queue Service? http://aws.amazon.com/sqs/
You should be able to add items to the queue from GAE using a standard HTTP client.
Sure. AppEngine has a Task Queue, where you can put in your tasks by simply implementing DeferredTask. In that task you can make requests to AWS.
Your intention to retain the application in GAE and use AWS to perform a few tasks that cannot be performed in GAE seems to me like the right scenario.
I'd like to share a few ideas along with some resources to answer the main part of your question:
Is there a simple way to pass specific tasks from gae to AWS? E.g. A task queue?
If you need GAE and AWS to perform the task all the time (24/7), then your application will definitely depend on a batch schedule or task queue; both are available in GAE.
However, if you can arrange for GAE to prepare the task and for AWS to perform it on an interval basis (say twice a day, for less than an hour each), you may not need to use them, as long as you can have GAE put the data on Google Cloud Storage (GCS) as a public object.
For this scenario, you need to set up an AWS EC2 instance on an on/off schedule and let the instance run a boot script using cloud-init to collect the data through your domain that points to GCS (c.storage.googleapis.com), like so:
wget -q --read-timeout=0.0 --waitretry=5 --tries=400 \
--background http://your.domain.com/yourfile?q=XXX...
Once it has the data from GCS, AWS can perform these specific tasks, then fire up GAE to clean the data and put the result back on GCS, ready to be used as your mobile application back end.
Following are some options to consider:
You should note that not all of the EC2 types are suitable for an on/off schedule. I recommend using EC2-VPC/EBS if you want to set up an AWS EC2 instance on an on/off schedule.
You may not need to set up EC2 at all if you can have AWS Lambda perform the task without EC2 (see the sketch after this list). The cost is lower: a task running twice a day, typically for less than 3 seconds with memory consumption up to 128 MB, typically costs less than $0.0004 USD/month.
As an outcome of rearranging your application in GAE and setting AWS to perform some of the tasks, your billing rates might ultimately rise, so try to optimize the instance class in GAE.
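A minimal sketch of that Lambda option, assuming the public GCS-backed URL from the wget example above and leaving the actual task as a placeholder:

# Hypothetical Lambda (scheduled twice a day) replacing the EC2 instance:
# it pulls the public file that GAE wrote to Cloud Storage and processes it.
import urllib.request

SOURCE_URL = "http://your.domain.com/yourfile?q=XXX"  # placeholder GCS-backed URL

def lambda_handler(event, context):
    with urllib.request.urlopen(SOURCE_URL, timeout=30) as resp:
        data = resp.read()
    # Placeholder: perform the specific task on `data`,
    # then post the result back where GAE can pick it up.
    return {"bytes_processed": len(data)}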

Resources