Problem: Cloud SQL instances run indefinitely and are expensive to host.
Goal: Save money while not compromising on database availability.
It has been almost four years and Google Cloud still has not fulfilled this feature request, which AWS has already implemented with Aurora on RDS.
Since it does not seem that on demand Cloud SQL that auto-scales to zero is coming any time soon, will the following strategy work?
Have two Cloud SQL instances, a Baby and a Papa. They follow the master/slave replica principle, with a twist. The Baby instance is small, with few vCPUs and little memory; it always runs, but does so cheaply. The Papa instance is expensive, with high vCPU and high memory, but runs only when needed.
To begin, only the Baby Cloud SQL instance is running so it is the master that accepts reads/writes. The Papa Cloud SQL instance is not running.
Since I am using standard App Engine, which auto-scales to zero with no traffic, schedule a cron job that checks every 10 minutes whether any App Engine instances exist. If none exist, the application has no traffic. If instances do exist, the Papa Cloud SQL instance is started. Once started, the Papa instance becomes the master that accepts reads/writes, while the Baby instance becomes a slave replica capable of only reads.
If the cron job detects that App Engine has zero instances running, there is no traffic, so the Papa Cloud SQL instance is stopped and the Baby Cloud SQL replica is promoted to master, again accepting reads/writes.
In this way, the expensive Papa instance runs on demand. If there is a traffic spike when the Papa instance is stopped or rebooting, the Baby instance will still be able to respond to requests.
This strategy ensures that the expensive Papa Cloud SQL instance only runs with traffic. Is this Baby-Papa dynamic possible on Google Cloud?
Cloud SQL has an Admin API that can be used to manipulate your Cloud SQL instances in such a way. You could build pieces of what you are describing using Cloud Scheduler to trigger a Cloud Function which uses the API to start and stop instances, or even promote/demote them to master.
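For illustration only, a Cloud Scheduler job could invoke a Cloud Function along the lines of the sketch below, which uses the Admin API's activation policy to start or stop the big instance. The project and instance names are placeholders, and error handling and polling of the returned operation are omitted:

    from googleapiclient import discovery  # google-api-python-client

    PROJECT = "my-project"           # placeholder
    PAPA_INSTANCE = "papa-instance"  # placeholder

    def scale_papa(request):
        """HTTP-triggered Cloud Function: start or stop the large instance."""
        sqladmin = discovery.build("sqladmin", "v1beta4")

        # The desired state arrives in the request body, e.g. {"state": "up"}.
        desired = (request.get_json(silent=True) or {}).get("state", "down")
        policy = "ALWAYS" if desired == "up" else "NEVER"

        # Starting/stopping a Cloud SQL instance is done by patching its
        # activation policy; the call returns a long-running operation.
        op = sqladmin.instances().patch(
            project=PROJECT,
            instance=PAPA_INSTANCE,
            body={"settings": {"activationPolicy": policy}},
        ).execute()
        return op["name"]

Promoting a replica is a separate call (instances().promoteReplica), and promotion is one-way: the promoted replica becomes a standalone primary, so the replication relationship has to be recreated afterwards.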
However, it's probably a bad idea. These operations can take several minutes to complete and would dramatically increase cold-start times for requests. Additionally, SQL servers prefer to be long-running for a reason: they use resources to cache data and optimize queries to improve performance. Starting, stopping, and resizing instances can cause you to lose these benefits.
It's better to consider - do you actually need a relational database? If not, it's probably better to use something like Firestore, which is a serverless product.
If you determine that you do indeed need a relational database, can you optimize your usage to fit a smaller Cloud SQL instance? Can you cache query results using Memorystore or Firestore as mentioned above, or instead use those services to export the results on a timed basis in a form that is easier for your app to consume?
Would it be better to stop and start your Cloud SQL instance when there is no traffic? If your traffic is based around certain predictable times, you could schedule your instance to resize at the start and end of those time periods.
Finally, if cost is really the concern, you could run your own SQL server on a GCE instance. This means you have to do pretty much all of the management yourself (installation, updates, maintenance, etc.), but it would be cheaper.
All of these are probably much more functional solutions than trying to shoehorn non-serverless infrastructure to match a serverless workload.
Related
I'm using the Google Cloud Run service to host my Spring application in a Docker container. The database is running in the Google Cloud SQL service. My problem is that the requests from the application to the database can take up to 2 minutes; the slow requests are clearly visible in the Cloud Run log.
The database is quite empty; it contains about 20 tables, but each contains only a few rows, so no request is bigger than a few kB. And to make it stranger, after re-deploying the application the requests are fast again. But after a few minutes, hours, or even after a whole day, the requests slow down again. When I start the application on my local machine the requests are always fast (to both my local SQL and the Google Cloud SQL instance); I never had any slow connections. All actions within my application that don't require any DB request are still fast and take only a few ms.
Both services are running in the same region (europe-west), and the CPU usage of the Cloud Run service is never higher than 15%, while the Cloud SQL instance's never goes above 3%. The Cloud SQL instance has 1 vCPU and 3.75 GB of RAM; the Cloud Run service has 4 GB of RAM and 2 vCPUs. But increasing the power of the Cloud Run service and Cloud SQL doesn't improve the request latency. Cloud SQL is using MySQL 5.7 (like my local DB).
Looking at the logs, only warnings are shown in the filtered Cloud SQL log (I really don't know why this happens). I don't think my DB connection settings in the Spring config have any impact either; the config works perfectly when connecting my local application to my local SQL instance or to the Cloud SQL instance.
But maybe one of you has an idea?
While not a real answer, there is a bug filed at Google that is tracking the issue:
https://issuetracker.google.com/issues/186313545
This is really hurting our customers' experience and makes us lose trust in the service quality of Cloud Run, even more so when there is no feedback from Google to know whether they are even addressing the issue.
Edit:
The issue now seems to be resolved, according to the interactions in https://issuetracker.google.com/issues/186313545
We're migrating our environment over to AWS from a colo facility. As part of that we are upgrading our two SQL Server 2005 servers to 2014. The two are currently mirrored and we'd like to keep it that way, or find other ways to make the servers redundant. The number of transactions/server use is light for our app, but it's in production, requires high availability, and, as a result, requires some kind of failover.
We have already set up one EC2 instance and put SQL Server 2014 on it (as opposed to using RDS, for licensing reasons) and are now exploring what to do next to achieve this.
What suggestions do people have to achieve the redundancy we need?
I've seen two options thus far from here and googling around. I list them below - we're very open to other options!
First, use the RDS mirroring service, but I can't tell if that only applies when the principal server is also on RDS; it also doesn't help with licensing.
Second, use multiple availability zones. What are the pros/cons of this versus using different regions altogether (e.g., bandwidth issues) etc? And does multi-AZ actually give redundancy (if AWS goes down in Oregon, for example, then doesn't everything go down)?
Thanks for the help!
The Multi-AZ capability of Amazon RDS (Relational Database Service) is designed to offer high-availability for a database.
From Amazon RDS Multi-AZ Deployments:
When you provision a Multi-AZ DB Instance, Amazon RDS automatically creates a primary DB Instance and synchronously replicates the data to a standby instance in a different Availability Zone (AZ). Each AZ runs on its own physically distinct, independent infrastructure, and is engineered to be highly reliable. In case of an infrastructure failure (for example, instance hardware failure, storage failure, or network disruption), Amazon RDS performs an automatic failover to the standby, so that you can resume database operations as soon as the failover is complete. Since the endpoint for your DB Instance remains the same after a failover, your application can resume database operation without the need for manual administrative intervention.
Multiple Availability Zones are recommended to improve availability of systems. Each AZ is a separate physical facility such that any disaster that should befall one AZ should not impact another AZ. This is normally considered sufficient redundancy rather than having to run across multiple Regions. It also has the benefit that data can be synchronously replicated between AZs due to low-latency connections, while this might not be possible between Regions since they are located further apart.
One final benefit... The Multi-AZ capability of Amazon RDS can be activated by simply selecting "Yes" when the database is launched. Running your own database and using mirroring services requires you to do considerably more work on an on-going basis.
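As a small illustration (not from the original discussion), enabling Multi-AZ programmatically is a single flag when the instance is created; a minimal boto3 sketch with placeholder identifiers and sizes might look like this:

    import boto3

    rds = boto3.client("rds", region_name="us-west-2")  # region is a placeholder

    # Create a SQL Server instance with a synchronously replicated standby
    # in a different Availability Zone; RDS manages the standby and failover.
    rds.create_db_instance(
        DBInstanceIdentifier="app-db",            # placeholder name
        Engine="sqlserver-se",                    # Standard Edition
        LicenseModel="license-included",
        DBInstanceClass="db.m5.large",            # illustrative size
        AllocatedStorage=100,
        MasterUsername="admin",
        MasterUserPassword="change-me",           # placeholder; use Secrets Manager in practice
        MultiAZ=True,
    )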
I'm interested in taking advantage of Amazon's managed database (RDS), but, at the same time, I'd like my web application to run on-premises or on another cloud provider that offers data centers near me (less latency, as my application not always has to fetch data from the DB).
Is this scenario common? Would it make sense, or is Amazon RDS supposed to be run with instances running in Amazon?
If you're looking to reduce latency, this is probably not your best option, as DB performance is going to be pretty bad (very large latencies between the web application and the DB server, basically cancelling out any advantages of having the app server as close to your clients as possible).
I've actually had to test a similar configuration, with a DB server in Europe and an app server in the US and the performance was much worse than having both in any of the two regions.
I've recently moved the database for our web application to RDS whilst still having our application servers hosted on-premise. Eventually the entire application will live on AWS, but it was too big a move to do all components at the same time. I'm only dealing with latency between Perth and Sydney in Australia, which is about 50ms, and this is working fine for us.
I don't recommend that you adopt this as a permanent configuration. You'll get much better performance from hosting your entire stack at AWS as opposed to keeping parts separated.
I'm researching cloud services to host an e-commerce site. And I'm trying to understand some basics on how they are able to scale things.
From what I can gather from AWS, Rackspace, etc documentation:
Setup 1:
You can get an instance of a webserver (AWS - EC2, Rackspace - Cloud Server) up. Then you can grow that instance to have more resources or make replicas of that instance to handle more traffic. And it seems like you can install a database local to these instances.
Setup 2:
You can have instance(s) of a webserver (AWS - EC2, Rackspace - Cloud Server) up. You can also have instance(s) of a database (AWS - RDS, Rackspace - Cloud Database) up. So the webserver instances can communicate with the database instances through a single access point.
When I use the term instances, I'm just thinking of replicas that can be accessed through a single access point, with data synchronized across each replica in the background. This could be the wrong mental image, but it's the best I have right now.
I can understand how setup 2 can be scalable. Webserver instances don't change at all since they're just the source code, so all the HTTP requests are distributed to the different webserver instances and load balanced. The data queries have a single access point and are then distributed to the different database instances and load balanced, and all the data writes are synced between all database instances in a way that is transparent to the application/webserver instance(s).
But for setup 1, where a database is set up locally within each webserver instance, how is the data synchronized across the databases local to the other webserver instances? Since the instances of each webserver can't talk to each other, how can you spin up multiple instances to scale the app? Is this setup mainly for sites with static content where the data inside the database is not changing? So with an e-commerce site where orders are written to the database, would this architecture just not be feasible? Or is there some way to get each webserver instance to update its local database from some master copy?
Sorry for such a simple question. I'm guessing the documentation doesn't say it plainly because it's so simple or I just wasn't able to find the correct document/page.
Thank you for your time!
Update:
Moved question to here:
https://webmasters.stackexchange.com/questions/32273/cloud-architecture
We have one server set up as the application server, and our database installed across a cluster of separate machines on AWS in the same availability zone (initially three, but scalable). We set it up with "k-safe" replication. This is scalable, as the data is distributed across the machines and duplicated such that one machine could disappear entirely and the site would continue to function. This also allows queries to be distributed.
(Another configuration option was to duplicate all the data on each of the database machines)
Relating to setup #1, you're right: if you duplicate the entire database on each machine with load balancing, you need to worry about replicating the data between the nodes, which is complex and takes a toll on performance; otherwise you need to sacrifice consistency, or synchronize everything to a single big database, in which case you lose the effect of clustering. Also keep in mind that when throughput increases, adding an additional server is a manual operation that can take hours, so you can't respond to throughput changes on demand.
Relating to setup #2, scaling the application is easy and the cloud providers do that for you automatically, but the database will become the bottleneck, as you are aware. If the cloud provider scales up your application and all those application instances talk to the same database, you'll get more throughput for the application, but the database will quickly run out of capacity. It has been suggested to solve this by setting up a MySQL cluster on the cloud, which is a valid option, but keep in mind that if throughput suddenly increases you will need to reconfigure the MySQL cluster, which is complex; you won't have auto-scaling for your data.
Another way to do this is a cloud database as a service, there are several options on both the Amazon and RackSpace clouds. You mentioned RDS but it has the same issue because in the end it's limited to one database instance with no auto-scaling. Another MySQL database service is Xeround, which spreads the load over several database nodes, and there is a load balancer that manages the connection between those nodes and synchronizes the data between the partitions automatically. There is a single access point and a round-robin DNS that sends the requests to up to thousands of database nodes. So this might answer your need for a single access point and scalability of the database, without needing to setup a cluster or change it every time there is a scale operation.
I run a very high traffic(10m impressions a day)/high revenue generating web site built with .net. The core meta data is stored on a SQL server. My team and I have a unique caching strategy that involves querying the database for new meta data at regular intervals from a middle tier server, serializing the data to files and sending those to the web nodes. The web application uses the data in these files (some are actually serialized objects) to instantiate objects and caches those in memory to use for real time requests.
The advantage of this model is that it:
Allows the web nodes to cache all data in memory and not incur any IO overhead querying a database.
If the database ever goes down either unexpectedly or for maintenance windows, the web servers will continue to run and generate revenue. You can even fire up a web server without having to retrieve its initial data from the DB because all the data it needs are in files on its own disks.
Allows us to be completely horizontally scalable. If throughput suffers, we can just add a web server.
The disadvantage is that this caching and persistence layer adds complexity to the code that queries the database, packages the data, and unpackages it on the web server. Any time our domain model requires us to add entities, more of this "plumbing" has to be coded. This architecture has been in place for four years and there are probably better ways to tackle this.
One strategy I have been considering is using replication to replicate our master SQL Server database to local database instances installed on each web server. The web server application would use normal SQL/ORM techniques to instantiate objects. Here, we could still sustain a master database outage, and we would not have to write specialized caching code; we could instead use NHibernate to handle the persistence.
This seems like a more elegant solution, and I would like to see what others think, or whether anyone has alternatives to suggest.
I think you're overthinking this. SQL Server already has mechanisms available to you to handle these kinds of things.
First, implement a SQL Server cluster to protect your main database. You can fail over from node to node in the cluster without losing data, and downtime is a matter of seconds, max.
Second, implement database mirroring to protect from a cluster failure. Depending on whether you use synchronous or asynchronous mirroring, your mirrored server will either be updated in realtime or a few minutes behind. If you do it in realtime, you can fail over to the mirror automatically inside your app - SQL Server 2005 & above support embedding the mirror server's name in the connection string, so you don't even have to lift a finger. The app just connects to whatever server's live.
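For example (server and database names here are placeholders), a SqlClient-style connection string naming the mirror might look like the following, so the client can retry against the partner when the principal is unavailable:

    Data Source=SQL-PRIMARY;Failover Partner=SQL-MIRROR;Initial Catalog=AppDb;Integrated Security=True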
Between these two things, you're protected from just about any main database failure short of a datacenter-wide power outage or network outage, and there's none of the complexity of the replication stuff. That covers your high availability issue, and lets you answer the scaling question separately.
My favorite starting point for scaling is using three separate connection strings in your application, and choosing the right one based on the needs of your query:
Realtime - Points directly at the one master server. All writes go to this connection string, and only the most mission-critical reads go here.
Near-Realtime - Points at a load balanced pool of read-only SQL Servers that are getting updated by replication or log shipping. In your original design, these lived on the web servers, but that's dangerous practice and a maintenance nightmare. SQL Server needs a lot of memory (not to mention money for licensing) and you don't want to be tied into adding a database server for every single web server.
Delayed Reporting - In your environment right now, it's going to point to the same load-balanced pool of subscribers, but down the road you can use a technology like log shipping to have a pool of servers 8-24 hours behind. These scale out really well, but the data's far behind. It's great for reporting, search, long-term history, and other non-realtime needs.
If you design your app to use those 3 connection strings from the start, scaling is a lot easier, and doesn't involve any coding complexity - just pick the right connection string.
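As a language-agnostic sketch of that idea (shown in Python purely for illustration, with placeholder server names), the application code just declares its intent and the right connection string is looked up:

    # Hypothetical connection strings; the point is that callers declare intent.
    CONNECTION_STRINGS = {
        "realtime":  "Server=sql-master;Database=AppDb;...",     # writes + mission-critical reads
        "near":      "Server=sql-readpool;Database=AppDb;...",   # load-balanced read-only subscribers
        "reporting": "Server=sql-delayed;Database=AppDb;...",    # hours-behind copies for reports
    }

    def connection_string(intent: str = "near") -> str:
        """Return the connection string for the given intent (realtime, near, reporting)."""
        return CONNECTION_STRINGS[intent]

    # Usage: writes always go to the master, reports go to the delayed pool.
    write_conn = connection_string("realtime")
    report_conn = connection_string("reporting")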
Have you considered memcached? Since it is:
in memory
can run locally
fully scalable horizontally
prevents the need to re-cache on each web server
It may fit the bill. Check out Google for lots of details and usage stories.
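As a small, hedged illustration of the cache-aside pattern memcached enables (shown with the Python pymemcache client and placeholder names; equivalent .NET clients exist), reads try the cache first and fall back to the database:

    import json
    from pymemcache.client.base import Client

    cache = Client(("memcached-host", 11211))  # placeholder host

    def get_product(product_id, load_from_db):
        """Cache-aside lookup: try memcached first, fall back to the database."""
        key = f"product:{product_id}"
        cached = cache.get(key)
        if cached is not None:
            return json.loads(cached)
        value = load_from_db(product_id)  # your existing data-access call
        cache.set(key, json.dumps(value).encode("utf-8"), expire=300)  # cache for 5 minutes
        return value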
Just some additions to what RickNZ proposed above.
Since the master data you are caching doesn't change very frequently, and probably only during some maintenance window, here is what you could do first on the database side:
Create snapshot replication for the master tables you want to cache. Adding new entities will be equally easy.
On all the web servers, install SQL Server Express and subscribe to this publication.
Since this is not frequently changing data, you can rest assured there won't be much of a server resource usage issue, apart from the network trips for the master data.
All the caching that was available via the previous mechanism is still available, minus the headache that comes with adding new entities.
Next, you can leverage the .NET mechanisms suggested above. You won't face a memcached-style cluster failure unless your web server itself goes down. There is a lot available in .NET that a .NET pro can point out after this stage.
It seems to me that Windows Server AppFabric (AKA "Velocity") is exactly what you are looking for. From the introductory documentation:
Windows Server AppFabric provides a distributed in-memory application cache platform for developing scalable, available, and high-performance applications. AppFabric fuses memory across multiple computers to give a single unified cache view to applications. Applications can store any serializable CLR object without worrying about where the object gets stored. Scalability can be achieved by simply adding more computers on demand. The cache also allows for copies of data to be stored across the cluster, thus protecting data against failures. It runs as a service accessed over the network. In addition, Windows Server AppFabric provides seamless integration with ASP.NET that enables ASP.NET session objects to be stored in the distributed cache without having to write to databases. This increases both the performance and scalability of ASP.NET applications.
Have you considered using SqlDependency caching?
You could also write the data to the local disk at the web tier, if you're concerned about initial start-up time or DB outages. But at least with a SqlDependency, you shouldn't have to poll the DB to look for changes. It can also be made relatively transparent.
In my experience, adding a DB instance on web servers generally doesn't work out too well from a scalability or performance perspective.
If you're concerned about performance and scalability, you might consider partitioning your data tier. The specifics depend on your app, but as an example, you could move read-only data onto a couple of SQL Express servers that are populated with replication.
In case it helps, I talk about this subject at length in my book (Ultra-Fast ASP.NET).