Is it possible to run postgres (essentially, a non-HTTP service) in a custom Google App Engine Flexible container? Or will I be forced to use Google's Cloud SQL solution?
TL;DR: You could do that, but don’t. It’s better to externalize the persistent data storage.
Yes, it is possible to run a PostgreSQL database as a microservice (simply called a 'service' in Google Cloud Platform) in a custom Google App Engine Flexible container. However, that raises another important question: why would you want to run an SQL database inside a container? It is a risky solution unless you are perfectly sure about what you are doing and how to manage it.
Typical container orchestration is based on stateless services, which means they are not intended to store persistent data. Such containers do sometimes have some form of storage, like NoSQL databases for caches or user session information, but that data is not persistent: it can be lost during restarts or the destruction of instances in an agile containerized application environment. PostgreSQL databases, on the other hand, are typically run as stateful services and do not fit that model. Putting such a database into a container can lead to problems like data corruption or concurrent access conflicts on a shared data directory. Also, in Google App Engine Flexible it is not possible to add a shared persistent disk; volumes are attached to instances and destroyed together with them. A much safer solution is to keep the SQL database in external, durable storage, such as the Cloud SQL service you mentioned. There are numerous blog posts and articles that elaborate on the stateless/stateful service issue, like this one.
It should be mentioned that if you intend to use the container in a local environment or for test/development (and you are not looking for durable database state), putting a PostgreSQL database inside a container is perfectly fine. Also, if you design a special way of splitting your data across instances this can work well, as the guys did with their MySQL servers in this article. So, once again, the idea of putting a PostgreSQL database in a container should be carefully thought out, especially since there are so many options for safely externalizing such a service.
And just as a side note, you are not forced to use Cloud SQL. The database can be hosted on Compute Engine, with another cloud provider, on premises, or it can be managed by a third-party vendor. If you host it on Compute Engine, the application can communicate with the database within the same project using the internal IP of the Compute Engine instance. Using Cloud Launcher you can quickly deploy PostgreSQL and other popular databases to Compute Engine. Check these Google docs for more information about using third-party databases.
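To illustrate, connecting to an externalized PostgreSQL database from the App Engine Flexible service is plain client code. Below is a minimal, hedged sketch using psycopg2; the DB_HOST, DB_NAME, DB_USER and DB_PASS values are hypothetical placeholders you would supply from your own configuration, whether the database lives in Cloud SQL, on a Compute Engine instance's internal IP, or on premises.

    import os
    import psycopg2  # standard PostgreSQL driver; works regardless of where the DB is hosted

    # Hypothetical settings, e.g. the internal IP of a Compute Engine instance
    # in the same project, or whatever address your external database exposes.
    conn = psycopg2.connect(
        host=os.environ.get("DB_HOST", "10.128.0.2"),  # placeholder internal IP
        dbname=os.environ.get("DB_NAME", "appdb"),
        user=os.environ.get("DB_USER", "appuser"),
        password=os.environ.get("DB_PASS", ""),
    )

    with conn, conn.cursor() as cur:
        cur.execute("SELECT version();")
        print(cur.fetchone()[0])

    conn.close()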
Related
I would like to save hyperspectral images using Python, but I don't know where I can persist the data. I have thought about HDFS. I need to do it on my local server without using cloud providers.
Is there a way to make it easy, and do you recommend any particular database?
HDFS generally requires you to build and administer your own Hadoop cluster.
I'd consider cloud object storage such as AWS S3 or Google Cloud Storage for the following reasons:
Relatively cheap
Fully managed
No restriction on file size or number of files
Easy Python APIs (see the sketch after this list)
Durable - data can be replicated across multiple regions (all handled automatically for you), so you don't need to worry about losing anything if a server dies.
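To illustrate the "Easy Python APIs" point, here is a minimal, hedged sketch using the official google-cloud-storage client library; the bucket name and object/file paths are hypothetical placeholders.

    from google.cloud import storage  # pip install google-cloud-storage

    client = storage.Client()  # picks up your default GCP credentials
    bucket = client.bucket("my-hyperspectral-data")  # hypothetical bucket name

    # Upload a local file as an object in the bucket.
    blob = bucket.blob("raw/scene_001.hdr")
    blob.upload_from_filename("scene_001.hdr")

    # Download it back later.
    blob.download_to_filename("/tmp/scene_001.hdr")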
Could someone explain the benefits/issues with hosting a database in Kubernetes via a persistent volume claim combined with a storage volume over using an actual cloud database resource?
It's essentially a trade-off: convenience vs control. Take a concrete example: let's say you pay Amazon money to use Athena, which is really just a nicely packaged version of Facebook Presto that AWS kindly operates for you in exchange for $$$. You could run Presto on EKS yourself, but why would you?
Now, let's say you want or need to use Apache Drill or Apache Impala. Amazon doesn't offer them. Nor do any of the other big public cloud providers at the time of writing, as far as I know.
Another thought: what if you want to migrate off of AWS? Your data has gravity as well.
Could someone explain the benefits/issues with hosting a database in Kubernetes ... over using an actual cloud database resource?
As previous excellent answer noted:
It's essentially a trade-off: convenience vs control
In addition to the previous example (Athena), take a look at RDS as well and see what you would need to handle yourself (and why would you, as said already):
Automatic backups
Multizone deployments
Snapshots
Engine upgrades
Read replicas
and other bells and whistles that come with a managed service as opposed to a self-hosted/managed one.
But there is more to it than just convenience/control, which is what this post is trying to shed light on:
Kubernetes adds another layer of abstraction (pods, services...), and depending on how storage (persistent volumes) is handled you have two additional considerations:
Access speed (depending on your use case this can be negligible or a showstopper).
The storage you have at hand might not be optimized for relational-database I/O, or might restrict how efficiently pods can be scheduled, for the very same reasons you are advised not to run a database on NFS, for example (see the PVC sketch at the end of this answer).
There are several recent Kubernetes conference talks pointing out that databases are a big no-no on Kubernetes (although this is highly opinionated; we do run average-load MySQL and PostgreSQL databases in k8s), and large-load/fast-I/O workloads are somewhat of a challenge to get right on k8s, as opposed to a managed cloud solution where somebody has already fine-tuned everything for you.
In conclusion:
It is all about convenience, control, and capabilities.
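For reference, here is a minimal, hedged sketch of the "persistent volume claim" part of the question, using the official kubernetes Python client with a plain dict manifest; the claim name and size are hypothetical, and the storage class plus the StatefulSet/Deployment that would actually mount the claim for the database pod are deliberately left out.

    from kubernetes import client, config  # pip install kubernetes

    config.load_kube_config()  # or config.load_incluster_config() inside the cluster
    core = client.CoreV1Api()

    # A minimal PersistentVolumeClaim that a database pod would mount as its data directory.
    pvc_manifest = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "postgres-data"},  # hypothetical claim name
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": "10Gi"}},  # hypothetical size
        },
    }

    core.create_namespaced_persistent_volume_claim(namespace="default", body=pvc_manifest)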
Sorry if this is a naive question, but I've watched a bunch of talks from Google's staff and I still don't understand why on earth I would use AE instead of CF.
If I understood it correctly, the whole concept of both of these services is to build a "microservice architecture":
both CF and AE are stateless
both are supposed to execute during a limited period of time
both can interact with DBs and other GCP APIs.
Though, AE must be wrapped in its own server. Basically it adds a lot of complexity on top of the same capabilities as CF. So, when should I use it instead of CF?
Cloud Functions (CFs) and Google App Engine (GAE) are different tools for different jobs. Using the right tool for the job is usually a good idea.
Driving a nail using pliers might be possible, but it won't be as convenient as using a hammer. Similarly building a complex app using CFs might be possible, but building it using GAE would definitely be more convenient.
CFs have several disadvantages compared to GAE (in the context of building more complex applications, of course):
they're limited to Node.js, Python, Go, Java, .NET Core, and Ruby. GAE supports several other popular programming languages
they're really designed for lightweight, standalone pieces of functionality; attempting to build complex applications from such components quickly becomes "awkward" (see the sketch after this list). Yes, the inter-relationship context for every individual request must be restored on GAE just as well, but GAE benefits from more convenient means of doing that which aren't available on CFs, for example user session management, as discussed in other comments
GAE apps have an app context that survives across individual requests; CFs don't have that. Such a context makes access to certain Google services more efficient/performant (or even possible at all) for GAE apps, but not for CFs, for example memcache.
the availability of the app context for GAE apps can support more efficient/performant client libraries for other services which can't operate on CFs. For example, accessing the datastore using the ndb client library (only available to standard-env GAE Python apps) can be more efficient/performant than using the generic datastore client library.
GAE can be more cost effective as it's "wholesale" priced (based on instance-hours, regardless of how many requests a particular instance serves) compared to "retail" pricing of CFs (where each invocation is charged separately)
response times are typically shorter for GAE apps than for CFs, since the app instance handling the request is usually already running, thus:
the GAE app context doesn't need to be loaded/restored since it's already available; CFs need to load/restore it
(most of the time) the handling code is already loaded; CFs' code still needs to be loaded. Not too sure about this one; I guess it depends on the underlying implementation.
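To make the "standalone function vs. application" distinction above concrete, here is a minimal, hedged sketch; the function and route names are hypothetical, the Cloud Function part assumes an HTTP-triggered Python function, and the GAE part assumes a standard-environment app served with Flask.

    # Cloud Function: one standalone, single-purpose HTTP handler,
    # deployed on its own, with no surrounding application or shared in-process state.
    def greet(request):  # hypothetical function name; 'request' is a Flask request object
        name = request.args.get("name", "world")
        return f"Hello, {name}!"

    # App Engine: one app bundling many inter-related handlers, shared modules,
    # session handling, cron handlers, etc. (Flask shown here).
    from flask import Flask

    app = Flask(__name__)

    @app.route("/greet")  # hypothetical route
    def greet_handler():
        return "Hello from the app!"

    @app.route("/admin/report")  # another handler living in the same app context
    def report():
        return "Reports live alongside the other handlers."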
App Engine is better suited to applications that have numerous pieces of functionality behaving in various inter-related (or even unrelated) ways, while Cloud Functions are single-purpose functions that respond to some event and perform some specific action.
App Engine offers numerous choices of language, and more management options, while cloud functions are limited in those areas.
You could easily replicate Cloud Functions on App Engine, but replicating a large-scale App Engine application using a bunch of discrete Cloud Functions would be complicated. For example, the backend of Spotify is App Engine based.
Another way to put this is that for a significantly large application, starting with a more complex system like App Engine can lead to a codebase which is less complex, or at least, easier to manage or understand.
Ultimately these both run on similar underlying infrastructure at Google, and it's up to you to decide which one works for the task at hand. Furthermore, there is nothing stopping you from mixing elements of both in a single project.
Google Cloud Functions are simple, single-purpose functions that are fired in response to events.
These functions remove the need to build your own application servers to handle lightweight APIs.
Main use cases:
Data processing / ETL: Listen and respond to Cloud Storage events, e.g. a file created, changed, or removed (see the sketch after this list)
Webhooks: Via a simple HTTP trigger, respond to events originating from 3rd-party systems like GitHub
Lightweight APIs: Compose applications from lightweight, loosely coupled bits of logic
Mobile backend: Listen and respond to events from Firebase Analytics, Realtime Database, Authentication, and Storage
IoT: Thousands of devices streaming events, which in turn call Cloud Functions to transform and store the data
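As a hedged illustration of the Data processing / ETL use case above, here is a minimal sketch of a background Cloud Function triggered by Cloud Storage object changes (1st-gen event signature); the function name and the processing step are hypothetical.

    # Triggered when an object is finalized in the configured bucket, e.g. deployed with
    # gcloud functions deploy ... --trigger-resource <bucket> --trigger-event google.storage.object.finalize
    def process_new_file(event, context):  # hypothetical function name
        bucket = event["bucket"]
        name = event["name"]
        print(f"New object gs://{bucket}/{name}; starting transformation...")
        # ... download, transform, and store the data elsewhere ...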
App Engine is meant for building highly scalable applications on a fully managed serverless platform. It helps you focus more on code; infrastructure and security are provided by AE.
It supports many popular programming languages, and you can bring any framework to App Engine by supplying a Docker container.
Use cases:
Modern web applications: quickly reach customers with zero-config deployment and zero server management.
Scalable mobile backends: seamless integration with Firebase provides an easy-to-use frontend mobile platform along with a scalable and reliable backend.
Refer to the official documentation pages for Cloud Functions and App Engine.
As both Cloud Functions and App Engine are serverless services, this is how I see it:
For microservices - we can go with either CFs or App Engine, though I prefer CFs.
For monolithic apps - App Engine suits well.
The main differentiator, as #Cameron points out, is that Cloud Functions reliably respond to events. E.g. if you want to execute a script on a change in a Cloud Storage bucket, there is a dedicated trigger for Cloud Functions. Replicating this logic would be much more cumbersome in GAE. The same goes for Firestore collection changes.
Additionally, GAE's B-machines (backend machines for basic or manual scaling) have conveniently longer run times of up to 24 hours, while Cloud Functions currently run for 9 minutes at most. Further, GAE allows you to encapsulate cron jobs as YAML files next to your application code. This makes developing a serverless, event-driven service much cleaner.
Of course, the other answers covered these aspects better than mine, but I wanted to point out that the main advantage of Cloud Functions is the trigger options. If you want functions or services to communicate with each other, GAE is probably the better choice.
What are the best practices for production and test (staging) environments in Google App Engine? Is it a good idea to setup separate projects?
We also use Google Cloud Storage and Cloud SQL. I'd like to prevent accidents where someone is mistakenly working in production when they intend to work in test.
We'll be storing a lot of stuff in GCS. From my understanding, GCS storage is separate between projects, which can be desirable for us. But if we want to copy production data to test, is it possible to duplicate GCS data from one app to another?
Looking forward to hearing how others do this.
Bruyere's answer is technically right: you can either version your app or use separate projects.
In practice, I've done both, and you always end up needing to separate the projects, for a ton of good reasons:
You might not want the same people to have the rights to update the staging env (all the devs would have this ability, for example) and the production env (typically this would be restricted to the tech lead, the QA team, or your continuous integration server)
Isolating two app engine versions is not that easy, in particular when you deal with cron jobs, email or XMPP reception
You might not want the same people to be able to read/write the staging data and the prod data
You want to make sure that the App Engine prod app does not write to the staging Cloud Storage bucket. If they are part of the same project, by default this is possible
My recommendation is to store the environment-related data (Cloud Storage bucket, Cloud SQL URL, etc.) in a configuration file that is loaded by the application. If you use Java, I personally use a properties file that is populated by Maven based on two profiles (dev and prod, dev being the default one).
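The same idea in Python could look like the minimal, hedged sketch below: select per-environment settings from a single place at startup. The bucket and Cloud SQL connection names are hypothetical placeholders, and the APP_ENV environment variable is just one of several ways (env_variables in app.yaml, separate config files, etc.) to pick the profile.

    import os

    # Hypothetical per-environment settings; dev is the default profile.
    CONFIGS = {
        "dev": {
            "gcs_bucket": "myapp-staging-bucket",
            "cloudsql_instance": "my-project-staging:europe-west1:staging-db",
        },
        "prod": {
            "gcs_bucket": "myapp-prod-bucket",
            "cloudsql_instance": "my-project-prod:europe-west1:prod-db",
        },
    }

    def load_config():
        env = os.environ.get("APP_ENV", "dev")  # e.g. set via env_variables in app.yaml
        return CONFIGS[env]

    config = load_config()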
Another important point is to separate environments from the start. Once you've started assuming that both environments will live in the same application, a lot of your code will be developed based on that assumption and it will be harder to move back to two different projects.
I was also wondering if there was another option than separate projects for UAT/prod environments; I found this article from the Google documentation, which says you should use different projects.
Best Practices for Enterprise Organizations
https://cloud.google.com/docs/enterprise/best-practices-for-enterprise-organizations
We recommend that you spend some time planning your project IDs for manageability. A typical project ID naming convention might use the following pattern:
[company tag]-[group tag]-[system name]-[environment (dev, test, uat, stage, prod)]
For example, the development environment for the human resources department's compensation system might be named acmeco-hr-comp-dev.
I can see two ways to do this, depending on your needs:
1) Using versions of your app, a different Cloud SQL instance, and a different bucket name for GCS, you can use the same project. You just have to be super careful about which target each of your calls goes to and about redirecting them when it goes live.
2) Using a separate project is probably the safer option, but either way you will need to use unique bucket names, since bucket names must be globally unique across all of GCS.
It's quite easy to copy from one bucket to another once you get your permissions set up. Using gsutil you can copy from bucket to bucket.
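If you prefer doing the copy from code rather than with gsutil, here is a minimal, hedged sketch using the google-cloud-storage Python client; the bucket names are hypothetical, and it assumes the account running it can read the source bucket and write to the destination bucket.

    from google.cloud import storage  # pip install google-cloud-storage

    client = storage.Client()
    src = client.bucket("myapp-prod-bucket")     # hypothetical production bucket
    dst = client.bucket("myapp-staging-bucket")  # hypothetical test/staging bucket

    # Copy every object from the production bucket into the test bucket.
    for blob in client.list_blobs(src):
        src.copy_blob(blob, dst, new_name=blob.name)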
Is it possible to run the Google App Engine development server on my own server? How well can the development server datastore handle high load, and what amount of data will cripple it?
Some options for running an App Engine app without App Engine:
TyphoonAE, which runs Python apps using a stack of popular open-source components
appscale, which runs Python or Java apps off of Amazon's EC2 cloud
I haven't tried either. See this question for some additional discussion of both.
How well will the datastore perform if you simply spin up dev_appserver.py on a public IP? If you have a lot of data, poorly. When using the dev server, the entire datastore is held in memory, so as you insert data, Python's memory usage will climb. Once you've added enough data to cause your system to start swapping, your app will become unusably slow. There's an option in the dev server to use a SQLite datastore stub instead of the in-memory stub. This makes performance tolerable with large amounts of data, but it's not nearly as efficient as the production datastore, so datastore access is relatively slow even with small amounts of data. Certainly much slower than the in-memory datastore with small amounts of data.
Running the dev server as a stand-alone production server is just generally a bad idea. The API stubs provided with the dev server are designed for use by developers, not users. E.g. sending mail just writes a log entry instead of actually sending mail; logging in as an administrator entails clicking a checkbox that says "log in as administrator".
If you want to move an existing app off App Engine, use one of the options above. If you're developing an app from scratch, use Django or some other framework that's designed to run on generic hardware. The development server is intended for just that: development.
YES: with a lot of missing features (parallel queues, cron jobs, mail, XMPP, ...), some hidden security issues, and poor performance and stability, it is technically possible.
As you probably guessed, it's a bad idea.
Take for example the HTTP server: using the development server, you would be putting into production an undocumented BaseHTTPServer that is nearly impossible to configure and probably has some hidden security flaws ready to be exploited.
As #Drew well said, there are better choices out there to run your Google App Engine code in a production-ready environment that is not GAE.
Although this is a 2+ year old thread, just adding my info: http://www.jboss.org/capedwarf