Long running script in Google Cloud

I have a piece of Node.js code that does not serve any HTTP requests but monitors some online systems and sends report emails.
This code is run by a shell script and keeps running 24x7.
Which Google Cloud offering is best suited to host this?
I tried App Engine, but after one hour of console inactivity, the console exits and the script stops running.
I am not sure whether Compute Engine would be best for this. I could host it on AWS EC2, and it would work there, but I'm wondering about Google.
Any tips appreciated.
Thanks

This can be done with a simple Python app running in the App Engine standard environment. See this post for details.

If you can modify it so that it runs periodically, you could run it on AWS Lambda with a scheduled trigger and use SES to send out the emails.
Alternatively, if you have control over the "online systems", you could use CloudWatch custom metrics and create alerts based on the thresholds of your metrics.
If you must use Google Cloud, you could use Google Cloud Functions instead of AWS Lambda, and Google Cloud Monitoring / Logging.
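The suggestion above of running the monitor periodically instead of in a 24x7 loop can be sketched as follows. This is a minimal sketch in Python (the question's script is Node.js, but the structure carries over), and everything in it is an assumption: the check names and the `run_checks_once`, `build_report`, and `scheduled_entry_point` functions are hypothetical, and actually sending the email (SES or anything else) is left out. Each scheduled invocation runs every check once, builds a report, and exits, which is what lets it fit inside a function's time limit.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical check type: returns None when healthy, or a problem description.
Check = Callable[[], Optional[str]]


def run_checks_once(checks: Dict[str, Check]) -> List[str]:
    """Run every check a single time and collect the problems found.

    Meant to be called once per scheduled invocation (e.g. a Lambda or
    Cloud Functions schedule) instead of looping forever.
    """
    problems: List[str] = []
    for name, check in checks.items():
        try:
            result = check()
        except Exception as exc:  # a crashing check is itself worth reporting
            result = f"check raised {exc!r}"
        if result is not None:
            problems.append(f"{name}: {result}")
    return problems


def build_report(problems: List[str]) -> Optional[str]:
    """Return an email body when there is something to report, else None."""
    if not problems:
        return None
    return "Problems detected:\n" + "\n".join(f"- {p}" for p in problems)


def scheduled_entry_point() -> Optional[str]:
    # Hypothetical checks; replace with real probes of the monitored systems.
    checks: Dict[str, Check] = {
        "api": lambda: None,
        "db": lambda: "latency above threshold",
    }
    # Email delivery is intentionally omitted; the body is returned instead.
    return build_report(run_checks_once(checks))
```

Because nothing is kept in memory between runs, any state the 24x7 version relied on (for example "already alerted") would have to move to external storage in this design.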

The second generation of Google Cloud Functions is built on Google Cloud Run and can run HTTP-triggered functions for up to 60 minutes.
To sum up the maximum run time per invocation on GCP:
Google Cloud Functions (1st gen): 9 minutes
Google Cloud Functions (2nd gen): 60 minutes
Google Cloud Run: 60 minutes
Google App Engine: 10 minutes (automatic scaling) and 24 hours (basic scaling)
Google Compute Engine: unlimited (you manage the VM)
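To make the comparison above concrete, here is a small Python sketch that encodes those per-invocation limits and lists which services can accommodate a job of a given length. The numbers are copied from the summary and may change over time; the `MAX_MINUTES` table and `services_that_fit` helper are illustrative names, not part of any Google API.

```python
import math
from typing import Dict, List

# Per-invocation limits in minutes, taken from the summary above.
# These can change; check the current product documentation.
MAX_MINUTES: Dict[str, float] = {
    "Cloud Functions (1st gen)": 9,
    "Cloud Functions (2nd gen)": 60,  # HTTP-triggered functions
    "Cloud Run": 60,
    "App Engine (automatic scaling)": 10,
    "App Engine (basic scaling)": 24 * 60,
    "Compute Engine": math.inf,  # you manage the VM, so no platform limit
}


def services_that_fit(job_minutes: float) -> List[str]:
    """Return the services whose per-invocation limit covers the job."""
    return [name for name, limit in MAX_MINUTES.items() if job_minutes <= limit]
```

For a script that must run 24x7, `services_that_fit(math.inf)` leaves only Compute Engine, which matches the conclusion of the answer; a 30-minute job would also fit on Cloud Functions (2nd gen), Cloud Run, and App Engine with basic scaling.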

Related

Which GCP component to use to fetch data from an API?

I'm a little confused about which GCP component to use. Here is my use case:
Daily, I need to fetch data from an external API (the API returns JSON data), store it in GCS, and then load it into BigQuery. I have already written the Python script that fetches the data and stores it in GCS, and I'm not sure which component to use for deployment:
Cloud Run: from the docs, it is used for deploying services, so I think it's a bad choice.
Cloud Functions: I think it works, but it is meant for event-based processing (through single-purpose functions...).
Composer: I'll use Composer to orchestrate tasks (such as preprocessing files in GCS, loading them into BigQuery, and transferring them to an archive bucket), so I could create a task that triggers the data-fetching script through the KubernetesPodOperator.
Compute Engine: I don't think it's the best choice, since there are better options.
App Engine: I also don't think it's a good idea, since it is used to deploy and scale web apps.
(Correct me if I'm wrong about any of this.) So my question is: which GCP component is used for this kind of task?

Cloud run : from the doc it is used for deploying services
app engine: also I don't think it a good idea since it is used to deploy and scale web app ...
I think you've misunderstood. Both Cloud Run and Google App Engine (GAE) are serverless offerings from Google Cloud. You deploy your code to either of them and invoke its URL, which in turn causes your code to execute and do things like fetch data from somewhere and save it somewhere else.
Google App Engine standard with automatic scaling has a shorter request timeout than Cloud Run (10 minutes versus up to 60 minutes). So, if your code will take a long time to run, you don't want to use Google App Engine (unless you make it a background task), and if you don't need a UI, then you don't need GAE.
For your specific scenario, you can deploy your code to Cloud Run and use Cloud Scheduler to invoke it at specific times. We have that architecture running in a similar scenario: we have a task that runs once daily, it's deployed to Cloud Run, Cloud Scheduler invokes the endpoint, and the task runs and saves data to the Datastore linked to an App Engine app. We wrote a blog article on deploying to Cloud Run and another on securing your Cloud Run service (based on our experience in the scenario described above).
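A minimal sketch of the Cloud Run side of that architecture, using only the Python standard library: a WSGI app that runs the daily job when Cloud Scheduler sends it an HTTP request. Assumptions to flag: `fetch_and_store` is a hypothetical stub standing in for the asker's existing fetch/GCS/BigQuery script, the scheduler job is assumed to be configured to send POST requests, and authentication between Cloud Scheduler and the service is not shown. The one platform detail relied on is that Cloud Run tells the container which port to listen on through the `PORT` environment variable.

```python
import json
import os
from typing import Callable, Dict, Iterable
from wsgiref.simple_server import make_server


def fetch_and_store() -> Dict[str, int]:
    """Hypothetical daily job: fetch from the external API, write to GCS,
    then load into BigQuery. Stubbed so the sketch runs locally."""
    return {"records": 0}


def app(environ: Dict, start_response: Callable) -> Iterable[bytes]:
    # Assumes the Cloud Scheduler job is configured to send an HTTP POST.
    if environ.get("REQUEST_METHOD") != "POST":
        start_response("405 Method Not Allowed", [("Content-Type", "text/plain")])
        return [b"use POST"]
    body = json.dumps(fetch_and_store()).encode("utf-8")
    start_response("200 OK", [("Content-Type", "application/json"),
                              ("Content-Length", str(len(body)))])
    return [body]


def serve() -> None:
    """Container entry point: Cloud Run passes the listening port in $PORT."""
    port = int(os.environ.get("PORT", "8080"))
    with make_server("", port, app) as server:
        server.serve_forever()
```

In a real deployment you would call `serve()` as the container's entry point (or put a production WSGI server such as gunicorn in front of `app`); keeping the job inside one request means the 60-minute Cloud Run limit applies to the whole daily run.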
GAE Timeout:
Every request to Google App Engine (standard environment) must complete within 1 to 10 minutes for automatic scaling and within 24 hours for basic scaling (see documentation). For the Google App Engine flexible environment, the timeout is 60 minutes (documentation).

Who starts Google Cloud Scheduler?

I can't find any documentation on how GCP Cloud Scheduler works under the hood. An App Engine app is needed in the project, so I assume that the HTTP calls or Pub/Sub messages are sent from App Engine.
However, I can currently use Cloud Scheduler even without an App Engine app in the project. Apparently a Compute Engine setup with a permanently running VM is also sufficient. Could someone confirm my assumptions, or does anyone have sources on this?

I can't tell you how Cloud Scheduler works under the hood. I can only tell you that it works well!
I'm sure there is a VM, or a cluster of VMs, in Google's serverless environment, and your Cloud Scheduler job runs on it. It's serverless; what happens under the hood doesn't matter. It works, and that's what I want!
Now, the relationship with App Engine can be confusing. In fact, there is no longer a relationship between the products, but you need the App Engine API activated on your project to use Cloud Scheduler. This strange thing is normal if you have been using Google Cloud for a while. At the beginning, only App Engine existed, and Datastore, Cloud Tasks, and Cloud Scheduler were all "modules" of App Engine. Year after year, Google has refactored and extracted these modules to create the independent products you see today. However, some relationships are still present, like the API activation.

Where to run continuous jobs on Google Cloud Platform?

I have a job that involves continuously listening to one or more WebSocket/MQTT feeds and forwarding this data to an event queue. This job is written in JavaScript and would run 24/7 in a continuous loop.
The most obvious solution is to run this job on a Compute Engine VM, but I was wondering if there is a more elegant solution. Azure, for example, has WebJobs, which is well suited to this kind of task. It even restarts the script if there is an error.
Is there some other component on GCP that can run this job in a "managed" way?

Google Cloud does not have a product similar to Azure WebJobs at the moment. Neither the standard nor the flexible environment of Google App Engine currently supports WebSockets. To use WebSockets, you can use Compute Engine or Kubernetes Engine.
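On Compute Engine or Kubernetes Engine, the restart-on-error behaviour that the question mentions for Azure WebJobs would usually come from the platform, for example a systemd unit with a restart policy on the VM, or a Kubernetes Deployment, which restarts crashed containers. A process can additionally restart its own work loop. The following is a minimal Python sketch of that in-process pattern with exponential backoff; `supervise` and its parameters are illustrative names, and the job itself (the WebSocket/MQTT listener) is assumed to be a callable that runs until it fails.

```python
import logging
import time
from typing import Callable


def supervise(job: Callable[[], None], max_restarts: int = 5,
              base_delay: float = 1.0, max_delay: float = 60.0,
              sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `job`, restarting it after an exception with exponential backoff.

    Returns the number of restarts used if the job eventually returns
    normally. Once max_restarts is exhausted, the last exception is
    re-raised so an outer supervisor (systemd, Kubernetes) can take over.
    """
    restarts = 0
    while True:
        try:
            job()
            return restarts
        except Exception:
            if restarts >= max_restarts:
                raise
            # Delays grow 1s, 2s, 4s, ... up to max_delay.
            delay = min(max_delay, base_delay * 2 ** restarts)
            logging.exception("job crashed; restarting in %.1fs", delay)
            sleep(delay)
            restarts += 1
```

Re-raising after a bounded number of attempts, rather than retrying forever, is a deliberate choice here: it lets the process exit so the platform-level restart policy handles persistent failures and they show up in logs and monitoring.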

Running a Fully-Managed, Always-Available node.js script on Google Cloud Platform

I am a big fan of particle.io and was very excited when they added a Google Cloud Platform (GCP) integration so I can save my IoT data into a GCP "DataStore".
I've followed their tutorial and got it working but I need some advice on implementing this so it can scale on GCP.
My current implementation is like so:
https://docs.particle.io/tutorials/integrations/google-cloud-platform/#example-use-cases
Basically I have a GCP "Compute Engine" instance which runs a node.js script that listens for the PubSub events (sent by my IoT devices) and saves it to DataStore.
Now because I want it to scale, ideally this node.js script should run on a managed service that can respond to spikes automatically. But GCP does not seem to have anything like this.
In AWS I could do this:
IoT Data -> Particle.io AWS WebHook -> AWS API Gateway Endpoint -> AWS Lambda -> AWS DynamoDB
All the AWS points are managed.
What's the best way to keep that node.js script running in a fully managed, always-available way on GCP? I need something that can run my node.js script, listen for Pub/Sub events, save them to Datastore, and automatically scale as load increases.
Any help/advice will be appreciated.
Thanks very much,
Mark

You have a number of options:
1- As someone else mentioned, there is Cloud Functions. It's basically a Node.js function you deploy and Google Cloud takes care of scaling it up/down for you.
2- You can deploy your Node.js app to App Engine Flex which has autoscaling enabled by default.
3- If you want to stay on Compute Engine, you can configure autoscaling yourself using a managed instance group.
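To illustrate option 1, here is a minimal sketch of a Pub/Sub-triggered background function using the 1st gen Python signature `(event, context)`, where Pub/Sub delivers the message payload base64-encoded in `event["data"]`. It is written in Python to match the other examples here, although the question uses Node.js. Assumptions: devices publish JSON, the `device_id` and `value` field names are hypothetical, and `save_entity` is an in-memory stand-in for a real Datastore write (a deployed function would use the Datastore client library there).

```python
import base64
import json
from typing import Any, Callable, Dict, List

# In-memory stand-in for Datastore so the sketch runs locally (hypothetical).
SAVED: List[Dict[str, Any]] = []


def save_entity(entity: Dict[str, Any]) -> None:
    SAVED.append(entity)


def on_device_event(event: Dict[str, Any], context: Any = None,
                    save: Callable[[Dict[str, Any]], None] = save_entity) -> None:
    """Handle one Pub/Sub message per invocation.

    There is no long-running listener: the platform invokes this function
    once per message and adds instances as the message rate grows.
    """
    payload = base64.b64decode(event["data"]).decode("utf-8")
    reading = json.loads(payload)  # assumes devices publish JSON
    # Hypothetical field names; adapt to the actual Particle payload.
    save({"device": reading.get("device_id"), "value": reading.get("value")})
```

The design choice this reflects is the one the question is after: the always-running node.js listener disappears, and each event becomes an independent invocation that Google Cloud scales up and down.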

Google App Engine cron job scheduling setup and pricing

I want to trigger an HTTPS endpoint every minute. I was using cron-job.org, but it is not very reliable and goes down often. I have looked at two options: Microsoft Azure Scheduler and the Google App Engine cron scheduler. Microsoft's scheduler pricing is very clear; however, I don't understand how to set up a Google cron job to run every minute, or what it would cost.

To use Google's cron scheduler, you will have to pay for an App Engine app running 24x7, whereas Azure Scheduler is a true microservice and you only pay based on the number of jobs/job collections, not the underlying resources consumed.

Unlike Microsoft's scheduler which appears to be an independently configurable and billable service, the GAE cron service can only be a part of a GAE app.
A standard environment GAE app is charged by instance-hours plus the various services it uses. See App Engine Pricing. But it also comes with fairly generous free daily quotas.
A simple app which would only make a few requests per minute - like the one you describe - should have no problems staying within the free daily limits.
Check the Quickstart to see how to get a basic app skeleton running. You already have the cron service doc; you only need the cron.yaml reference to add a cron service to your app.
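As a rough sketch of what such an app could contain: a cron.yaml entry pointing at a handler URL with a schedule such as `every 1 minutes`, plus a handler that calls the external endpoint. The Python WSGI handler below relies on App Engine adding the `X-Appengine-Cron: true` header to cron requests (and stripping it from external traffic), which shows up in the WSGI environ as `HTTP_X_APPENGINE_CRON`. The target URL, `ping`, and `cron_app` names are hypothetical, and the handler path you would register in cron.yaml is up to you.

```python
import urllib.request
from typing import Callable, Dict, Iterable

TARGET_URL = "https://example.com/endpoint"  # hypothetical endpoint to trigger


def ping(url: str = TARGET_URL, timeout: float = 30.0) -> int:
    """Call the external HTTPS endpoint and return its status code."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status


def cron_app(environ: Dict, start_response: Callable,
             trigger: Callable[[], int] = ping) -> Iterable[bytes]:
    # Only App Engine cron sends X-Appengine-Cron: true, so outside callers
    # cannot use this handler to hit the endpoint.
    if environ.get("HTTP_X_APPENGINE_CRON") != "true":
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"cron only"]
    status = trigger()
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"endpoint returned {status}".encode("utf-8")]
```

A handler this small does very little work per minute, which is why the answer above expects it to stay within the free daily quotas; whether it actually does depends on your instance settings.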