How to schedule a Cron job on the Google Cloud Platform? - google-app-engine

Does anybody know of a barebones example for scheduling a Cron job on the Google Cloud Platform?
I have a simple python script (developed in a virtualenv in visual code) and I'm struggling to follow the examples provided by google.
The script make use of the Google Cloud client libraries (like google.cloud.storage and google.cloud.bigquery).
Any help much appreciated.

One way of doing it would be to
create a VM in the Compute Engine - read this
connect to your instance - read this
set up your cron job by modify the /etc/crontab file: $ nano /etc/crontab
When you are actually interested in starting a process once a new file is uploaded to the Storage, then you could look into setting up a Cloud Function which would do that: https://cloud.google.com/functions/docs/calling/storage

You can now use Google Cloud Scheduler which is a fully managed (and cheap!) and much easier than setting up 'proper' Cron jobs. You can find it here: https://cloud.google.com/scheduler/

Related

Amazon SageMaker Model Monitor for Batch Transform jobs

Couldn't find the right place to ask this, so doing it here.
Does Model Monitor support monitoring Batch Transform jobs, or only endpoints? The documentation seems to only reference endpoints...
We just launched the support.
Here are the sample notebook:
https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker_model_monitor/model_monitor_batch_transform
Here is the what's new post:
https://aws.amazon.com/about-aws/whats-new/2022/10/amazon-sagemaker-model-monitor-batch-transform-jobs/

Migrating from Google App Engine Mapreduce to Apache Beam

I have been a long-time user of Google App Engine's Mapreduce library for processing data in the Google Datastore. Google no longer supports it and it doesn't work at all in Python 3. I'm trying to migrate our older Mapreduce jobs to Google's Dataflow / Apache Beam runner, but the official documentation is awful, it just describes Apache Beam, it does not tell you how to migrate.
In particular, the issues are this:
in Mapreduce, the jobs would run on your existing deployed application. However in Beam you have to create and deploy a custom Docker image to build the environment for Dataflow, is this right?
To create a new job template in Mapreduce, you just need to edit a yaml file and deploy it. To create one in Apache beam, you need to create custom runner code, a template file deployed to google cloud storage, and link up with the docker image, is this right?
Is the above accurate? If so, is it generally the case that working with Dataflow is much more difficult than Mapreduce? Are there any libraries or tips for making this easier?
In technical terms that's what is happening, but unless you have some specific advanced use-cases, you won't need to set any custom Docker images manually. Dataflow does some work in the background to run your user code and dependencies on a custom container so that it can execute your user-written code and dependencies on their VMs.
In Dataflow, writing a job template mainly requires writing some pipeline code in your chosen language (Java or Python), and possibly writing some metadata. Once your code is written, creating and staging the template itself isn't much different than running a normal Dataflow job. There's a page documenting the process.
I agree the page on Mapreduce to Beam migration is very sparse and unhelpful, although I think I understand why that is. Migrating from Mapreduce to Beam isn't a straightforward 1:1 migration where only the syntax changes. It's a different pipeline model and most likely will require some level of rewriting your code for the migration. A migration guide that fully covered everything would end up repeating most of the existing documentation.
Since it sounds like most of your questions are around setting up and executing Beam pipelines, I encourage you to begin with the Dataflow quickstart in your chosen language. It won't teach you how to write pipelines, but will teach you how to set up your environment to write and run pipelines. There are links in the quickstarts which direct you to Apache Beam tutorials that teach you the Beam API and how to write your own pipelines, and those will be useful for rewriting your Mapreduce code in Beam.

How to run python file repeatedly every day in GCP(google cloud platform)?

I have written python code to move data from firestore to Bigquery.
How can I run this code at a specified time every day?
Please help beginner
The most economical way would be to use Google Cloud Scheduler. It can initiate a job that runs on a schedule, similar to corn. Then via Pub/Sub it can invoke Google Cloud Function with your code.
Here is the tutorial, describing exactly that: https://cloud.google.com/scheduler/docs/tut-pub-sub Just use Python, instead of JavaScript for runtime.

How can the Google Cloud Datastore Emulator verify our datastore-index.xml?

We use the Google Cloud Datastore Emulator. It autogenerates indexes.yaml. But as we did with the old Google Plugin for Eclipse, we want to get missing-index messages in the local development environment, and not later in cloud deployment. So, we want the Emulator to use our manually-maintained datastore-indexes.xml
How do we configure the use of a specific datastore-indexes.xml in the Google Cloud Datastore Emulator? I don't see any relevant command-line switches in the help text.
EDIT:
My answer was based on the dev_appserver emulator, not the current one. After running some tests, it appears that the emulator only has endpoints for a subset of the Datastore API methods, and index building (nor export/import for that matter) are available.
Leaving my previous answer to avoid repeated answers with the same wrong info:
_________
According to the docs, if autoGenerate="false" is in your datastore-indexes.xml, the development server should ignore the contents of WEB-INF/appengine-generated/datastore-indexes-auto.xml.
I think this might be what you're looking for, although I have not yet tested it.

How to launch app from the web/cloud

I have developed an app in Twilio which I would like to run from the cloud. I tried learning about AWS and Google App Engine but am quite confused at this stage:
I have 2 questions which I hope to get your help on:
1) How can I store my scripts and database in the cloud? Right now, everything is running out of my local machine but I would like to transfer the scripts and db to another server and run my app at a predetermined time of day. What would be the best way to do this?
2) How can I write a batch file to run my app at a predetermined time of day in the cloud?
I understand this does not have code, but I really hope someone can point me to the right direction. I have spent lots of time trying to understand this myself but still am unsure. Tks in adv.
Update: The application is a Twilio app that makes calls to people, the script simply applies an algorithm to make calls in a certain fashion and the database is a mysql db that provides the details of people to be called.
This is quite difficult to provide an exact answer without understanding what is the application, what is the DB or what is the script that you wish to run.
I can give you a couple of ideas that might be helpful in such cases.
OpsWorks (http://aws.amazon.com/opsworks/) is a managed service for managing applications. You can define your stack (multiple layers like web, workers, DB...) and what are the chef recipes that should run in various points in the life of the instances in each layer (startup, shutdown, app deployment or stack modification..). Then you can use the ability to add instances to each layer in specific days and hours, to implement the functionality of running at predetermined times as you requested.
In such a solution you can either have some of your instances (like DB) always on, or even to bootstrap them using the chef recipes every day, with restore from snapshot on start and create snapshot on shutdown.
Another AWS service that you use is Data Pipeline (http://aws.amazon.com/datapipeline/). It is designed to move data periodically between data sources, for example from a MySQL database to Amazon Redshift, the Data warehouse service. But you can use it to trigger scripts and run random shell scripts that you wish (http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-shellcommandactivity.html), and schedule it to run in various conditions like every hour/day or specific times (http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-concepts-schedules.html).
A simple path here would be just to create an EC2 instance in AWS, and put the components needed to run your app there. A thorough walk through is here:
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/get-set-up-for-amazon-ec2.html
Essentially you will create an EC2 virtual machine, which you can for most purposes treat just like any other Linux server. You can install MySQL on it, copy your script there, and run it. Of course whatever container or support libraries your code requires will need to be installed as well.
You don't say what OS you are using locally, but if it is Mac or Linux, you should be able to follow almost the same process to get your script running on an EC2 instance that you used on your local machine.
As you get to know AWS, there are sophisticated services you can use for deployment, infrastructure orchestration, database services, and so on. But just to get started running a script from a virtual machine should be pretty straightforward.
I recently developed a Twilio application using Ruby on Rails for the backend and found Heroku extremely simple to setup and launch. While Heroku does cost more than AWS, I found that the time I saved using Heroku more than made up this. As an early stage startup, we wanted to spend our time developing important features, and not "wasting" time optimizing our AWS cloud.
However, while I believe Heroku is ideal for early-stage websites/startups I do believe hosting should be reevaluated once a company reaches a certain size. At some point it becomes economically viable to devote resources into optimizing an AWS cloud solution because it will be cheaper than Heroku in the long run.

Resources