Can different microservices in GAE own dedicated cron jobs?
Background
We have written multiple services in a GAE microservices application.
One microservice, say Service1 (default) [Java in the GAE standard environment], has 10 cron jobs, whereas another microservice, say Service2 [Python in the GAE flexible environment], has 5 other cron jobs.
When we deploy both services, the cron jobs get replaced with those of the most recently deployed service.
I know that Task Queues are a shared resource across GAE microservices, and hence cron jobs may be shared too. But is it impossible to let each microservice have its own dedicated cron jobs, scoped to that service, and upload them to the server so that all cron jobs can co-exist?
A timely response is highly appreciated.
The cron configuration is also an application-level configuration, not a module/service-level one, which is why deploying it for one service overwrites the one previously deployed from another service.
You need to combine the cron jobs for all your services into a single cron configuration file and deploy that one instead, preferably using the specific cron deployment command rather than uploading it together with a particular service (sometimes that fails for multi-service apps).
There are other such app-level configurations as well; see https://stackoverflow.com/a/42361987/4495081
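As an illustration (the descriptions, handler URLs, and schedules below are placeholders), a single combined cron.yaml covering jobs for both services could look like this, using the target field to route a job to the service that should handle it:

    cron:
    # jobs handled by the default service (Service1)
    - description: "Service1 nightly cleanup"
      url: /tasks/cleanup
      schedule: every 24 hours
    # jobs routed to Service2 via the target field
    - description: "Service2 hourly sync"
      url: /tasks/sync
      schedule: every 1 hours
      target: service2

Deploying just this file updates the cron configuration without touching either service:

    gcloud app deploy cron.yaml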
Related
I want to deploy a service (a Python script that uses Apache Beam) on my GCP project, with execution times sometimes up to 24 hours. I need this service with the data pipeline to always be running. I also have a web application that is going to use the results from the data pipeline. My solution for this was to deploy the web app on GCP App Engine and the Python script on a K8s cluster, because the job can last up to 24 hours and App Engine is serverless, so everything in serverless should be a short-lived job, something like up to 15 minutes. Am I on the right track, or can you suggest other, better-suited GCP services?
If you are using Apache Beam, my advice is to deploy the pipeline on Dataflow. The service is fully managed by GCP, and in fact this is the product that was open-sourced as the Apache Beam project, so using it should be straightforward.
Once the data has been processed by Dataflow, you can write your results to several possible destinations, such as BigQuery, GCS, Pub/Sub, or Datastore, and consume these results from your web app. Please see the relevant documentation.
Just pay attention to the required processing time: Dataflow will scale as needed, but even so, a job that takes 24 hours to run is certainly something you should test and study carefully. Also review the possible associated costs.
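As a rough sketch (the project, region, and bucket names are placeholders, and the two transforms stand in for your real pipeline), an existing Beam Python pipeline mainly needs its options pointed at the Dataflow runner:

    # Minimal sketch: run an existing Apache Beam (Python) pipeline on Dataflow.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(
        runner="DataflowRunner",
        project="my-gcp-project",
        region="us-central1",
        temp_location="gs://my-bucket/tmp",
    )

    with beam.Pipeline(options=options) as p:
        (p
         | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/*.txt")
         | "Write" >> beam.io.WriteToText("gs://my-bucket/output/result"))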
I'm new to GCP, and I'm currently trying to deploy all my applications on its services.
For an application in a single container I use Cloud Run, which I already really like.
Now I want to deploy an application that uses Angular in the frontend and Spring in the backend, plus a SQL database (Postgres). Every part of this is in its own separate container.
Should I also use Cloud Run for this purpose, or does GCP have more suitable services I should consider if I want to host a scalable and serverless application? In other words, is there such a thing as a best practice for frontend/backend applications on GCP?
I recommend that you use these services:
Cloud SQL to host the database. It's managed for you and efficient.
Cloud Run for the business tier (the Spring application).
Be careful: with Spring, the cold start can take several seconds. I wrote an article on this (the article is quite old and performance is now better on Cloud Run, but the latency on the first request still exists, and takes 5-7 seconds for a hello-world container). Add several CPUs (4 is a good number) to speed up the cold start, or use the --min-instances parameter for this (or other solutions that you can find in one of my articles); see the gcloud sketch after this list.
For the frontend, I recommend hosting the static files on Cloud Storage.
To serve this on the internet, put a load balancer in front of it:
Create a serverless network endpoint group (NEG) for the Cloud Run service.
Create a Cloud Storage backend to serve the static files.
Use the domain that you want and serve it over SSL.
Optionally, enable Cloud CDN to cache your static files.
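A rough sketch of the corresponding gcloud commands for the Cloud Run tuning and the two load-balancer backends (service, project, bucket, and region names are placeholders; the full load-balancer setup also needs a URL map, target proxy, and forwarding rule, which are omitted here):

    # Cloud Run backend with more CPU and one warm instance to soften cold starts
    gcloud run deploy backend-api \
      --image=gcr.io/my-project/backend-api \
      --cpu=4 \
      --min-instances=1 \
      --region=europe-west1

    # Serverless NEG pointing the load balancer at the Cloud Run service
    gcloud compute network-endpoint-groups create backend-api-neg \
      --region=europe-west1 \
      --network-endpoint-type=serverless \
      --cloud-run-service=backend-api

    # Backend bucket serving the Angular static files, with Cloud CDN enabled
    gcloud compute backend-buckets create frontend-static \
      --gcs-bucket-name=my-frontend-bucket \
      --enable-cdn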
Cloud Run runs stateless containers. It doesn't make a distinction between frontend, backend, or worker jobs.
You can run your frontend, backend, and admin code bases each as a Cloud Run service.
Next to these, you set up Cloud SQL for your operational database and connect the Cloud Run services to it with the Cloud SQL connector so they can use it for read/write queries.
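For example (the service, image, and instance names are placeholders), attaching a Cloud SQL Postgres instance to a Cloud Run service is a single flag at deploy time; the service can then reach the database through the Cloud SQL connection:

    gcloud run deploy my-backend \
      --image=gcr.io/my-project/my-backend \
      --add-cloudsql-instances=my-project:europe-west1:my-postgres \
      --region=europe-west1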
After reading the docs:
Cloud Scheduler - https://cloud.google.com/scheduler/
GAE cron jobs - https://cloud.google.com/appengine/docs/flexible/nodejs/scheduling-jobs-with-cron-yaml
Cloud Functions Pub/Sub trigger - https://cloud.google.com/functions/docs/calling/pubsub
I think they are mostly the same.
I can use a GAE cron job + Pub/Sub + a Cloud Function to implement the same functionality that Cloud Scheduler has.
In my understanding, there seem to be some differences between them:
Cloud Scheduler is more convenient for adjusting the frequency. To update the frequency of a GAE cron job, you must update the config (e.g. schedule: every 1 hours in cron.yaml) and redeploy.
There is no need to implement the cron job architecture (integrating GAE, the GAE cron service, Pub/Sub, Cloud Functions, etc.) yourself, which means you no longer need to write code to glue them together.
Am I correct? Or are there any other differences?
You're right in that Google Cloud Scheduler is essentially an evolution of the GAE cron job mechanism, making it more user-friendly and flexible. You can see that they are still related, since the Cloud Scheduler doc specifies:
To use Cloud Scheduler your project must contain an App Engine app
that is located in one of the supported regions. If your project does
not have an App Engine app, you must create one.
Historically, the GAE cron job was the only cron service offered by the platform. You could only target a GAE handler to receive the request from cron. From there you could indeed perform actions like publishing on Pub/Sub, calling an HTTP Cloud Function, or launching a Dataflow job, but you always had to deploy a GAE service to handle it, which wasn't optimal.
The new Cloud Scheduler makes it more straightforward to use with Pub/Sub and Cloud Functions, but also with any publicly available HTTP endpoint (which may be on-premises), and of course with App Engine handlers. More targets may be added in the future for more use cases.
Finally, as you mentioned, the API exposed to manage it decouples it from App Engine and its cron.yaml file and makes it easier to create and update cron jobs dynamically.
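For instance (the job name, URL, and schedules are placeholders), creating and later retuning an HTTP-target job is just two gcloud calls, with no redeploy involved:

    # Create a job that hits an HTTP endpoint once an hour
    gcloud scheduler jobs create http my-refresh-job \
      --schedule="0 * * * *" \
      --uri="https://example.com/tasks/refresh"

    # Change the frequency later without redeploying anything
    gcloud scheduler jobs update http my-refresh-job \
      --schedule="*/10 * * * *"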
If I write a cron job which fetches some external data and saves it on the GAE instance, will it become available across all instances?
I am using the Node.js flexible environment.
While you do have access to the GAE instance disk, the data on an instance is not replicated to the rest. This is why it is recommended that you write all information to a Cloud Storage bucket; that way you can use this shared location to make the data available across instances.
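A minimal sketch of that approach (in Python for illustration; the @google-cloud/storage client for Node.js works the same way, and the bucket name and URL are placeholders):

    # Cron handler sketch: fetch external data and persist it to a shared bucket
    # so every instance can read the same copy.
    import requests
    from google.cloud import storage

    def fetch_and_store():
        data = requests.get("https://example.com/data.json").text
        bucket = storage.Client().bucket("my-shared-bucket")
        bucket.blob("latest/data.json").upload_from_string(data)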
I believe you are referring to a cron job created directly on the flexible instances, not to the App Engine Cron Service. So the answer is no, it won't be available to all instances. It will only be stored on the instance(s) that executed that cron job at the time, meaning that if you have autoscaling enabled, the new instances will not contain the external data.
I want to trigger an HTTPS endpoint every minute. I was using cron-job.org, but it is not that reliable and goes down often. I have looked at two options: Microsoft Azure Scheduler and the Google App Engine cron scheduler. Microsoft's scheduler pricing is very clear; however, I don't understand how to set up a Google cron job, or how the pricing works for running the cron job every minute.
To use Google's cron scheduler, you will have to pay for the App Engine app running 24x7, whereas Azure Scheduler is a true microservice and you only pay based on the number of jobs/job collections, not the underlying resources consumed.
Unlike Microsoft's scheduler, which appears to be an independently configurable and billable service, the GAE cron service can only be part of a GAE app.
A standard environment GAE app is charged by instance-hours plus the various services it uses. See App Engine Pricing. But it also comes with fairly generous free daily quotas.
A simple app which would only make a few requests per minute - like the one you describe - should have no problems staying within the free daily limits.
Check the Quickstart to see how to get a basic app skeleton running. You already have the cron service doc; you only need the cron.yaml reference to add a cron service to your app.
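A minimal cron.yaml for a one-minute schedule could look like the sketch below. Note that GAE cron can only call handlers inside your own app, so /tasks/ping here is a placeholder handler that would in turn call the external HTTPS endpoint:

    cron:
    - description: "relay to the external endpoint every minute"
      url: /tasks/ping
      schedule: every 1 minutes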