How do I use cron to update a file in App Engine?

So I have this GitHub repo that updates every 24 hours, and I want to pull a file from that repo into my Google Cloud App Engine environment every 24 hours. Is there any way I can pull that file on that schedule using the App Engine cron?
The app is running on App Engine.

Yes, it's possible, but you haven't given us enough info to help. For instance, what language are you using?
If you have a URL to the file, you can fetch it from your app, save it to a GCS storage bucket, and access it there from your app.
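For example, if the app were Python, a minimal sketch of such a cron handler might look like this (the repo URL, bucket name, and handler path are placeholders, not from the question):

    # cron.yaml (schedules the handler below; runs once a day):
    #   cron:
    #   - description: pull the updated file from the repo
    #     url: /tasks/pull-file
    #     schedule: every 24 hours

    import urllib.request
    from google.cloud import storage

    # Placeholder values; substitute your own repo file and bucket.
    FILE_URL = "https://raw.githubusercontent.com/owner/repo/main/data.json"
    BUCKET_NAME = "my-app-data"

    def pull_file():
        """Fetch the raw file over HTTP and store it in Cloud Storage."""
        data = urllib.request.urlopen(FILE_URL).read()
        bucket = storage.Client().bucket(BUCKET_NAME)
        bucket.blob("data.json").upload_from_string(data)

The handler just has to be exposed at the URL named in cron.yaml; App Engine's cron service then hits it on schedule.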

You can use Cloud Build triggers fired by GitHub commits to pick up the updated file and redeploy the app to GAE. If the file is updated at the exact same time every day, you can instead use Cloud Scheduler to run the pull like a cron job.

Related

Which GCP component to use to fetch data from an API

I'm a little bit confused between GCP components. Here is my use case:
Daily, I need to fetch data from an external API (the API returns JSON data), store it in GCS, then load it into BigQuery. I have already created the Python script that fetches the data and stores it in GCS, and I'm confused about which component to use for deployment:
Cloud Run: from the docs it is used for deploying services, so I think it's a bad choice
Cloud Functions: I think it would work, but it is meant for event-based processing (through single-purpose functions...)
Composer: I'll use Composer to orchestrate tasks (such as preprocessing files in GCS, loading them into BQ, and transferring them to an archive bucket), so I could create a task through KubernetesPodOperator that triggers the script to get the data
Compute Engine: I don't think it's the best choice since there are better ones
App Engine: I also don't think it's a good idea since it is used to deploy and scale web apps...
(Correct me if I'm wrong in what I said.) So my question is: which GCP component is meant for this kind of task?
Cloud Run: from the docs it is used for deploying services
App Engine: I also don't think it's a good idea since it is used to deploy and scale web apps...
I think you've misunderstood. Both Cloud Run and Google App Engine (GAE) are serverless offerings from Google Cloud. You deploy your code to either of them, and you can then invoke their URLs, which in turn causes your code to execute and do things like fetch data from somewhere and save it somewhere.
Google App Engine has a shorter timeout than Cloud Run (I can't remember offhand whether Cloud Run has a timeout). So if your code takes a long time to run, you don't want to use Google App Engine (unless you make it a background task), and if you don't need a UI, then you don't need GAE.
For your specific scenario, you can deploy your code to Cloud Run and use Cloud Scheduler to invoke it at specific times. We have that architecture running in a similar scenario (we have a task that runs once daily; it's deployed to Cloud Run; Cloud Scheduler invokes the endpoint; it runs and saves data to a Datastore linked to an App Engine app). We wrote a blog article on deploying to Cloud Run and another on securing your Cloud Run service (based on our experience in the scenario described above).
GAE timeout:
Every request to a Google App Engine (standard environment) app must complete within 1-10 minutes for automatic scaling, or up to 24 hours for basic scaling (see the documentation). For the App Engine flexible environment, the timeout is 60 minutes (documentation).
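For illustration, a minimal sketch of that architecture in Python, assuming a Flask endpoint deployed to Cloud Run and invoked by Cloud Scheduler (the API URL, bucket, and route are placeholders, not from the question):

    import urllib.request

    from flask import Flask
    from google.cloud import storage

    app = Flask(__name__)

    API_URL = "https://api.example.com/data"  # placeholder external API
    BUCKET = "my-raw-data"                    # placeholder GCS bucket

    @app.route("/fetch", methods=["POST"])
    def fetch():
        """Called by Cloud Scheduler; pulls JSON from the API into GCS."""
        payload = urllib.request.urlopen(API_URL).read()
        blob = storage.Client().bucket(BUCKET).blob("daily/data.json")
        blob.upload_from_string(payload, content_type="application/json")
        return "ok", 200

Cloud Scheduler can then POST to /fetch on a cron schedule, and a downstream step (Composer, or a plain BigQuery load job) can pick the file up from GCS.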

Python Telegram bot persistence with Google App Engine

I am deploying a Telegram bot using the Python Telegram Bot library on the Google App Engine flexible environment, and I have enabled bot and conversation persistence.
The challenge I am facing is that if I update the app and deploy it again, the conversation has to be restarted, since I am unable to copy the persistence file over from the previous version of the app.
How do I ensure that the same persistence files are used whenever I deploy a new version of the app?
Any help is appreciated.
Thank you
If you are writing to the /tmp directory or storing that file inside App Engine's file system, that is expected to happen.
The docs for the standard environment explain this as well: when you do a new deploy, the instance you were using gets deleted, and you lose your persistence file with the logs of the chat.
You should consider moving the file to Cloud Storage (or any other storage system) and uploading/downloading it on a regular basis in order to avoid that.
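For illustration, a rough sketch of that approach, assuming python-telegram-bot's PicklePersistence and the google-cloud-storage client (the bucket name and object path are placeholders):

    import os
    from google.cloud import storage

    BUCKET = "my-bot-persistence"       # hypothetical bucket name
    BLOB = "conversation.pickle"        # hypothetical object path
    LOCAL = "/tmp/conversation.pickle"  # instance-local copy, wiped on redeploy

    client = storage.Client()
    bucket = client.bucket(BUCKET)

    def download_persistence():
        """Restore the persistence file from GCS at startup, if it exists."""
        blob = bucket.blob(BLOB)
        if blob.exists():
            blob.download_to_filename(LOCAL)

    def upload_persistence():
        """Push the local persistence file back to GCS after updates."""
        if os.path.exists(LOCAL):
            bucket.blob(BLOB).upload_from_filename(LOCAL)

    # At startup, before building the bot:
    #   download_persistence()
    #   persistence = PicklePersistence(filename=LOCAL)
    # (the keyword is filepath in newer library versions)
    # Then call upload_persistence() periodically, e.g. from a job queue,
    # so the file survives the next deploy.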

Can I delete default buckets on Google App Engine Storage?

When I create a new application on Google Cloud App Engine, these buckets in Google Storage show up as well:
bucket_1: <region>.artifacts.<app_id>.appspot.com
bucket_2: staging.<app_id>.appspot.com
bucket_3: <app_id>.appspot.com
I've only added 300MB to bucket_3 and never added anything to bucket_1. Nonetheless, bucket_1 is currently occupying 3.9GB. Why do I need bucket_1? Can I delete all of its content, or even delete the whole bucket?
Thanks in advance.
When you create a new App Engine application, these buckets are created in Google Storage:
bucket_2: staging.<app_id>.appspot.com
bucket_3: <app_id>.appspot.com
Bucket bucket_1: <region>.artifacts.<app_id>.appspot.com is created when you run the command gcloud app deploy. This is the Container Registry bucket where App Engine stores container images. You can delete this bucket; however, the next time you deploy a new version with gcloud app deploy, the bucket will be recreated.
I did some testing on my side and observed that when you deploy your first App Engine standard version, 48 images are created in the us.artifacts.your-project.appspot.com/containers/images folder. From then on, every time you deploy a new App Engine version, 3 more images are added to this folder. I am not sure about the internal implementation, but I think it caches the images in this folder.
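If you do want to reclaim the space, a hedged sketch of emptying and deleting the artifacts bucket with the google-cloud-storage client (the bucket name is a placeholder, and remember the bucket will be recreated on your next deploy):

    from google.cloud import storage

    client = storage.Client()
    # Placeholder name: substitute your region prefix and project ID.
    bucket = client.bucket("us.artifacts.my-project.appspot.com")

    # Delete every cached container image blob, then the bucket itself.
    bucket.delete_blobs(list(bucket.list_blobs()))
    bucket.delete()

An alternative is to set a lifecycle rule on the bucket so old images are deleted automatically instead of by hand.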

In GCP, is there a canonical way to scrape data from an API?

I'm building an application that will periodically pull data from several APIs and write it to Cloud Storage for later processing by Dataflow. There are many different ways to do this, so I wanted to sanity-check my plan before jumping in.
My plan is this:
For each API, Cloud Scheduler will hit an endpoint for an App Engine app
The app will create a Compute Engine VM instance with a startup script that pulls the data from the API and writes it to storage
When done, the VM will hit another endpoint on the App Engine app that shuts down the VM.
Is this a reasonable way to perform this sort of action? Is there a simpler or more straightforward method? Thank you in advance for the replies.
Cloud Scheduler can schedule Compute Engine without App Engine; however, it seems that you cannot create and delete the VM that way.
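If you do keep the VM approach from the question, the App Engine endpoints could create and delete the instance through the Compute Engine API. A rough sketch with google-api-python-client (project, zone, machine type, and image are placeholder choices):

    import googleapiclient.discovery

    PROJECT, ZONE = "my-project", "us-central1-a"  # placeholders
    compute = googleapiclient.discovery.build("compute", "v1")

    def start_scraper_vm(startup_script: str):
        """Create a short-lived VM that runs the scrape in its startup script."""
        body = {
            "name": "scraper-vm",
            "machineType": f"zones/{ZONE}/machineTypes/e2-small",
            "disks": [{
                "boot": True,
                "autoDelete": True,
                "initializeParams": {
                    "sourceImage": "projects/debian-cloud/global/images/family/debian-11",
                },
            }],
            "networkInterfaces": [{
                "network": "global/networks/default",
                "accessConfigs": [{"type": "ONE_TO_ONE_NAT", "name": "External NAT"}],
            }],
            "metadata": {"items": [{"key": "startup-script", "value": startup_script}]},
        }
        return compute.instances().insert(project=PROJECT, zone=ZONE, body=body).execute()

    def delete_scraper_vm():
        """Called by the VM's final callback to tear itself down."""
        return compute.instances().delete(
            project=PROJECT, zone=ZONE, instance="scraper-vm").execute()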
You can just use App Engine cron jobs to schedule the tasks. Your App Engine cron handler can simply run the script that pulls data from the APIs. Maybe I am missing something, but why do you need a Compute Engine instance to run the script?

Google Cloud Platform Cron Jobs per project or per app?

I have a Google Cloud Platform application that runs on several Google App Engine standard services. One app has a cron.xml with some cron jobs defined, pointing to REST endpoints on that app.
Now I want to create a cron job in another app, so I created another cron.xml and all the REST endpoints, and deployed. After deployment I realized the deployment had erased the cron jobs defined by the first app. I read somewhere that you can only have one cron.xml defined PER PROJECT and not PER APP. Is this correct? I have been reading the documentation and a book on Google App Engine and could not find an answer.
cron.yaml/cron.xml applies per application (that is, per project), not per service.
I just got an answer from Google Cloud support. Apparently there can be only one cron file PER PROJECT; each upload of a cron file overrides the previous one.
A workaround for this is to add a TARGET tag for the URLs that are not in the same app (service) as the cron file. Adding a target routes the request to the named service.
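For example, a single project-level cron.xml could route one job to another service via target (the service names and URLs here are hypothetical):

    <?xml version="1.0" encoding="UTF-8"?>
    <cronentries>
      <cron>
        <url>/tasks/sync</url>
        <description>Runs in the default service</description>
        <schedule>every 24 hours</schedule>
      </cron>
      <cron>
        <url>/tasks/cleanup</url>
        <description>Routed to the "reports" service via target</description>
        <schedule>every day 03:00</schedule>
        <target>reports</target>
      </cron>
    </cronentries>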
