I can't find any documentation on how GCP Cloud Scheduler works under the hood. An App Engine app is needed in the project, so I assume that the HTTP calls or Pub/Sub messages are initiated from App Engine.
Currently I can use Cloud Scheduler even without an App Engine app in the project. Apparently a Compute Engine setup that contains a permanently running VM is also sufficient. Could someone confirm my assumptions, or does anyone have sources on this?
I can't tell you how Cloud Scheduler works under the hood. I can only tell you that it works well!
I'm sure there is a VM, or a cluster of VMs, in Google's serverless environment, and your Cloud Scheduler job is set up on it. It's serverless: what's under the hood doesn't matter, it works, and that's what I want!
Now, the relation with App Engine can be confusing. In fact, there is no longer any relation between the products, but you need the App Engine API activated on your project to use Cloud Scheduler. This strange requirement makes sense if you have been using Google Cloud for a while. At the beginning, only App Engine existed, and Datastore, Cloud Tasks, and Cloud Scheduler were all "modules" of App Engine. Year after year, Google refactored and extracted these modules into the independent products you see today. However, some relations are still present, like the API activation.
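For what it's worth, once the App Engine API is activated, creating a job looks the same as for any other standalone product. Below is a minimal sketch using the Python client library; the project ID, region, schedule, and target URL are placeholders, not anything specific to your setup.

```python
# Minimal sketch: create an HTTP Cloud Scheduler job with the Python client.
# Project, region, schedule and target URL below are placeholders, and the
# App Engine API must already be enabled on the project.
from google.cloud import scheduler_v1

client = scheduler_v1.CloudSchedulerClient()
parent = "projects/my-project/locations/us-central1"  # placeholder project/region

job = scheduler_v1.Job(
    name=f"{parent}/jobs/hourly-ping",
    schedule="0 * * * *",  # unix-cron format: every hour
    time_zone="Etc/UTC",
    http_target=scheduler_v1.HttpTarget(
        uri="https://example.com/endpoint",  # placeholder target
        http_method=scheduler_v1.HttpMethod.GET,
    ),
)

client.create_job(parent=parent, job=job)
```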
Related
I'm a little bit confused between GCP components. Here is my use case:
Daily, I need to fetch data from an external API (the API returns JSON data), store it in GCS, then load it into BigQuery. I have already created the Python script that fetches the data and stores it in GCS, and I'm confused about which component to use for deployment:
Cloud Run: from the docs it is used for deploying services, so I think it's a bad choice
Cloud Functions: I think it would work, but it is meant for event-based processing (through single-purpose functions...)
Composer: (I'll use Composer to orchestrate tasks, such as preprocessing files in GCS, loading them into BQ, and transferring them to an archive bucket) through KubernetesPodOperator, create a task that triggers the script to get the data
Compute Engine: I don't think it's the best choice since there are better ones
App Engine: also I don't think it's a good idea since it is used to deploy and scale web apps...
(Correct me if I'm wrong in what I said.) So my question is: which GCP component should be used for this kind of task?
Cloud Run: from the docs it is used for deploying services
App Engine: also I don't think it's a good idea since it is used to deploy and scale web apps...
I think you've misunderstood. Both Cloud Run and Google App Engine (GAE) are serverless offerings from Google Cloud. You deploy your code to either of them, and you can invoke their URLs, which in turn causes your code to execute and do things like fetch data from somewhere and save it somewhere.
Google App Engine has a shorter timeout than Cloud Run (I can't remember whether Cloud Run has a timeout). So, if your code will take a long time to run, you don't want to use Google App Engine (unless you make it a background task), and if you don't need a UI, then you don't need GAE.
For your specific scenario, you can deploy your code to Cloud Run and use Cloud Scheduler to schedule it to be invoked at specific times. We have that architecture running in a similar scenario (we have a task that runs once daily; it's deployed to Cloud Run; Cloud Scheduler invokes the endpoint; it runs and saves data to Datastore linked to an App Engine app). We wrote a blog article on deploying to Cloud Run and another on securing your Cloud Run service (based on our experience in the scenario described above).
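To make the Cloud Run + Cloud Scheduler setup concrete, here is a minimal sketch of the service side. The endpoint path, environment variables, bucket, and BigQuery table names are placeholders, and it assumes the external API returns newline-delimited JSON; adapt the staging and load steps to your actual payload.

```python
# Minimal sketch of a Cloud Run service that Cloud Scheduler invokes on a
# schedule. SOURCE_API_URL, GCS_BUCKET and BQ_TABLE are placeholder names.
import os

import requests
from flask import Flask
from google.cloud import bigquery, storage

app = Flask(__name__)

API_URL = os.environ.get("SOURCE_API_URL", "https://example.com/data")  # placeholder
BUCKET = os.environ.get("GCS_BUCKET", "my-staging-bucket")               # placeholder
BQ_TABLE = os.environ.get("BQ_TABLE", "my_dataset.raw_data")             # placeholder


@app.route("/run", methods=["POST"])
def run_job():
    # 1. Fetch the JSON payload from the external API.
    payload = requests.get(API_URL, timeout=60).text

    # 2. Stage it in Cloud Storage (assumes newline-delimited JSON).
    blob = storage.Client().bucket(BUCKET).blob("imports/data.json")
    blob.upload_from_string(payload, content_type="application/json")

    # 3. Load the staged file into BigQuery.
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,
    )
    bigquery.Client().load_table_from_uri(
        f"gs://{BUCKET}/imports/data.json", BQ_TABLE, job_config=job_config
    ).result()

    return "ok", 200


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```

A Cloud Scheduler job then simply POSTs to the service's /run URL on the desired cron schedule.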
GAE timeout:
Every request to Google App Engine (Standard) must complete within 1 to 10 minutes for automatic scaling, and up to 24 hours for basic scaling (see the documentation). For Google App Engine Flexible, the timeout is 60 minutes (documentation).
I have a job that involves continuously listening to one or more websocket/MQTT feeds and forwarding the data to an event queue. The job is written in JavaScript and would run 24/7 in a continuous loop.
The most obvious solution is to run this job on a VM with Compute Engine, but I was wondering if there is a more elegant solution. Azure, for example, has WebJobs, which is well suited to this kind of task. It even restarts the script if there is an error.
Is there some other component on GCP that can run this job in a "managed" way?
Google Cloud does not have a product similar to Azure WebJobs at the moment. Neither the standard nor the flexible environment of Google App Engine currently supports websockets. In order to use websockets you can use Compute Engine or Kubernetes Engine.
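For illustration, here is a minimal sketch of what such a forwarder could look like running on a Compute Engine VM, using the `websockets` library and Pub/Sub as the event queue. The feed URL, project and topic are placeholders, and your job is JavaScript; this Python version only shows the shape of the loop (connect, forward each message, reconnect on failure).

```python
# Minimal sketch of a 24/7 feed forwarder meant to run on a Compute Engine VM.
# The websocket URL, project ID and Pub/Sub topic are placeholders.
import asyncio

import websockets
from google.cloud import pubsub_v1

FEED_URL = "wss://example.com/feed"  # placeholder feed
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "feed-events")  # placeholders


async def forward_feed():
    while True:  # reconnect loop: restart the connection if it drops
        try:
            async with websockets.connect(FEED_URL) as ws:
                async for message in ws:
                    # Forward each message to the event queue as bytes.
                    data = message if isinstance(message, bytes) else message.encode("utf-8")
                    publisher.publish(topic_path, data)
        except Exception as err:
            print(f"connection lost ({err}), retrying in 5 seconds")
            await asyncio.sleep(5)


if __name__ == "__main__":
    asyncio.run(forward_feed())
```

Running it under systemd (or in a container with a restart policy) gives you the restart-on-error behaviour that Azure WebJobs provides out of the box.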
I have a machine learning project, and for this project I have to get data from a website every 15 minutes. I decided to use Google Cloud Platform to do it. I've written a Python script that does the job (gets the data from the website and writes it to a CSV file), and when I run this script on my computer, it works well. I need to run this script for a couple of weeks, so it should run on Google Cloud's machines and keep running when I close my computer. How can I do this?
I could also use another cloud service if required, but Google Cloud would be preferable.
Disclaimer: I'm with Google Cloud Platform Support
Google Compute Engine is defined as an Infrastructure as a Service. It basically provides access to virtual machines (VMs), disks and networking functionality. By using this product you are able to configure your resources from scratch, defining one or multiple VM instances, configuring your work environment, etc. It might require more configuration and boilerplate than needed, but it offers the most control. You can always use some resources for free, but in my opinion it is a lot to build from scratch.
Google App Engine is defined as a Platform as a Service. It is basically a managed app platform, and the management can be automated to a certain degree. It is based on Compute Engine, in the sense that it provides functionality, a platform, on top of the infrastructure defined by Compute Engine VMs. You can thus deploy your Python script in an App Engine Flexible Python environment. You can define your whole application as a collection of interrelated microservices, i.e. one service gets the data from a website, maybe another writes CSV files and another might trigger ML jobs.
App Engine also provides the possibility to schedule jobs as cron jobs. So if your application needs to run jobs periodically or at a specific time, this is the tool to use. App Engine pricing is correlated with the resources used, but you can estimate eventual budgets by using the Google Cloud Platform Pricing Calculator.
You can store the CSV files in Google Cloud Storage as objects in buckets, or as data in Datastore, Cloud SQL or BigQuery. Components of Google Cloud Platform can communicate with each other via service accounts. This allows your App Engine deployment, for example, to perform CRUD operations on your Cloud SQL instance programmatically. Or... to trigger a Cloud Machine Learning job.
Your question is very broad and can be addressed in multiple ways. I would initially deploy the Python script in App Engine Flexible, add a cron job if needed to fetch the data every 15 minutes, and upload the CSV files to Google Cloud Storage buckets. I would then use the Cloud Machine Learning Python client to trigger machine learning jobs programmatically.
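As a starting point, here is a minimal sketch of the fetch-and-store step that such a cron job (or Cloud Scheduler) would trigger every 15 minutes. The source URL and bucket name are placeholders, and it assumes the website returns a JSON list of flat records.

```python
# Minimal sketch of the periodic fetch-and-store step described above.
# SOURCE_URL and BUCKET_NAME are placeholders.
import csv
import io
from datetime import datetime, timezone

import requests
from google.cloud import storage

SOURCE_URL = "https://example.com/data.json"  # placeholder website/API
BUCKET_NAME = "my-ml-raw-data"                # placeholder bucket


def fetch_and_store():
    # 1. Fetch the data from the website (assumed to be a JSON list of dicts).
    rows = requests.get(SOURCE_URL, timeout=30).json()

    # 2. Write it out as CSV in memory.
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=sorted(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)

    # 3. Upload the CSV to a Cloud Storage bucket, one object per run.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
    blob = storage.Client().bucket(BUCKET_NAME).blob(f"raw/{stamp}.csv")
    blob.upload_from_string(buffer.getvalue(), content_type="text/csv")


if __name__ == "__main__":
    fetch_and_store()
```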
There are other products that might interest you:
Cloud Dataflow - configure stream/batch data processing
Cloud Dataprep - transform/clean raw data
Cloud Pub/Sub - global real-time messaging.
All these products and components can communicate with each other, and processes can easily be automated in the Cloud. So the whole project can run on Google's Cloud infrastructure when you close your computer. But, of course, you have to configure it beforehand in your Google Cloud Platform project(s).
I am aware that I met your broad question with a broad answer. For any specific issues along your path of implementing the project in the Cloud, the community will be here to provide support.
Good luck!
I'm creating a Node.js website that probably won't get loads of traffic, and I was looking into cheap solutions to host the site. I came across Google Cloud's free tier, which offers free usage of their services within limits. An f1-micro is more than enough for my needs, but I will happily pay for some usage if it goes over by any chance.
I wanted to set up a Linux CentOS 7 instance on GCE (which I already did) and run my application and REST API on it. Now here comes the problem.
I tried to use Google's Datastore service, but it spun up an App Engine instance, and without it Datastore won't work.
Is Datastore entirely reliant on App Engine to function? The docs said that if you use any of the client APIs, it requires App Engine. What can I do to avoid the client API and still query data? I don't want to use App Engine at the moment, or is Datastore just not for me then?
Thanks for any help!
Some of the underlying infrastructure of Cloud Datastore and App Engine is still tied together for creation, etc. So while creating a Cloud Datastore database also defines an App Engine instance for the project, it doesn't require you to use it. You don't get charged for App Engine either, unless you decide to deploy an app using it.
You should be totally fine using the Google Cloud Node client library on the f1-micro instance.
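For example, you can talk to Datastore directly from the VM with nothing deployed to App Engine. The sketch below uses the Python client for brevity (the Node.js client follows the same pattern), and the kind and property names are placeholders.

```python
# Minimal sketch: Cloud Datastore from a plain Compute Engine VM, with no
# App Engine code deployed. Kind and property names are placeholders.
from google.cloud import datastore

client = datastore.Client()  # uses the VM's default service account

# Write an entity.
entity = datastore.Entity(key=client.key("Visit"))  # placeholder kind
entity.update({"page": "/home", "count": 1})         # placeholder properties
client.put(entity)

# Read it back.
print(list(client.query(kind="Visit").fetch(limit=10)))
```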
I am pretty new to this whole idea of cloud and started off with Google App Engine. I was able to create the basic 'hello world' program.
When I tried to understand the difference between a cloud and a server, I learned that a cloud gives you access to a virtual instance created exclusively for you, where you are free to choose and install software of your choice.
But I don't see such an option with Google Cloud / App Engine. What if I have a Tomcat-based application server which I would like to deploy to the cloud? Will Google App Engine be of any help, or should I try other cloud service providers such as Amazon EC2, HP Cloud, etc.?
/DJ
The cloud type you are referring to is called an Infrastructure as a Service (IaaS) cloud.
OTOH, Google App Engine is a Platform as a Service (PaaS) cloud.
The difference is that IaaS gives you a bunch of virtual machines that you need to set up yourself (OS + app stack), while PaaS typically comes with its own API: you write your app against that API and the rest (software stack + scalability) is taken care of.
App Engine comes with its own servlet container (Tomcat is also a servlet container), so from this standpoint you could run your code on App Engine. But the problem lies elsewhere: App Engine imposes a set of limitations on apps:
the app must use GAE-provided databases
the app cannot write to the filesystem
the app cannot have listening sockets
requests must finish within 60 seconds (e.g. no Comet or WebSockets -> no push)
You might want to review the FAQ.
To add to Peter's excellent answer, note that Google also has an IaaS service called Google Compute Engine.
Regarding the other-cloud query:
Before you settle on one cloud, you might try other options. Currently, deploying an application is very easy on almost all of these services.
A few of them are:
Jelastic, Heroku, Rackspace, Nimbus, OpenShift, etc.
The difference between a cloud and a server has already been explained very well.
Since you mentioned a Tomcat-based application: I have worked with Jelastic for the same and found it very easy to work with.
http://jelastic.com/docs/tomcat
http://jelastic.com/tomcat-hosting
Try all the possible options; it will help you decide.