Make microservice application resilient to db downtime

We have a microservice application that saves its data to an Oracle DB.
So far the DB is our single point of failure, which we want to improve (we are using a single Oracle DB with a cold failover instance).
Now the company is asking us to upgrade the Oracle DB; the issue is that the upgrade requires downtime.
For that reason we were thinking about the following:
Add a global/geo-replicated cache layer (e.g. Redis) between the microservice and the DB.
For each new record that should be saved to the DB:
add the record to the cache (persisting the entries to disk in case the whole cache layer crashes);
publish an event to a queue (we have RabbitMQ). On the other side of the queue we can create a new service that consumes the events and adds them to the DB asynchronously.
This is basically adding a write-behind cache layer.
In the above scenario we are confident that we can easily keep one week of data in the cache, or more.
If the DB is down, the new service listening to the queue will simply keep retrying the insert; as soon as an event has been written to the DB it can be acked and the next one consumed. This way, DB downtime or maintenance should not affect the main application: the users can still "save" data and retrieve it (subject to the one-week maximum while the DB is down).
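To make the retry-then-ack behaviour concrete, here is a minimal sketch in Python. It does not use RabbitMQ or Oracle: the queue is a plain in-memory deque, and `FlakyDb`, `DatabaseDown` and `drain` are hypothetical names made up for illustration. The only point it shows is that an event leaves the queue after the insert has succeeded, never before.

```python
from collections import deque


class DatabaseDown(Exception):
    """Raised by the (hypothetical) DB client while the database is unavailable."""


class FlakyDb:
    """Stand-in for the real database: fails the first `failures` inserts."""

    def __init__(self, failures):
        self.failures = failures
        self.rows = []

    def insert(self, row):
        if self.failures > 0:
            self.failures -= 1
            raise DatabaseDown()
        self.rows.append(row)


def drain(queue, db, max_attempts=10):
    """Consume events in order; drop (ack) an event only after its insert succeeded."""
    attempts = 0
    while queue and attempts < max_attempts:
        event = queue[0]          # peek: the event stays queued until it is acked
        try:
            db.insert(event)
        except DatabaseDown:
            attempts += 1         # a real consumer would sleep / back off here
            continue
        queue.popleft()           # "ack": the row is in the DB, safe to drop
        attempts = 0
    return len(queue)             # number of events still waiting


events = deque([{"id": 1}, {"id": 2}, {"id": 3}])
db = FlakyDb(failures=4)          # the DB is "down" for the first four attempts
remaining = drain(events, db)
```

With a real broker the same event can be delivered again if the ack is lost after a successful insert, so the insert should probably be idempotent (for example keyed on an event id).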
The downside is that the architecture is more complex and the data is now only eventually consistent.
Is there another design pattern that deals better with database downtime without the users feeling that something is wrong?
Do you know of any existing tools that we can use to automatically read an event from RabbitMQ and save it in the DB? (We are already doing this with Logstash to automatically forward some RabbitMQ events to Elasticsearch.)
The next step would be to have a DB cluster (Cassandra, MongoDB, etc.), but for now we do not have the capacity for that.

Adding a cache to increase availability is probably an awkward solution, as you will eventually run into the same issue of keeping the cache available. Also, handling cold caches is not a simple task.
I am not familiar with Oracle, but most databases support replication, and you have options for synchronous, asynchronous and semi-synchronous patterns.
A quick search led me to "Oracle Data Guard", which seems to be the tool you need. The docs say that Data Guard supports data replication and failover.
As for using Cassandra, I highly recommend evaluating it carefully first: Oracle gives you ACID properties and joins, which makes application code much simpler. The consistency patterns will also be different. There are lots of details to think about.
My general recommendation is to look into your data layer (Oracle in this case) and follow the vendor's recommendations for achieving high availability. Oracle is a mature product, and availability is well supported.

Related

When is SQL Server as a distributed caching mechanism worthwhile?

I have 2 web servers, and I'm running into an issue where I need to prematurely expire (remove) a cached item. Since I'm currently using IMemoryCache, a Remove(key) call only removes the cached item from one server. I don't have the ability to leverage Redis, NCache, etc., but the app is already using SQL Server. I can easily set up distributed caching with a cache table, but it seems counter-intuitive, because what I'm caching is user data that I don't want to hit the database for on every call (e.g., I cache 50 items of user data every 5 minutes, which has cut out 500 trips to the database). Is there something I'm missing which would make using SQL Server as my distributed cache backend actually beneficial?

Sounds like you are having the typical problem of cache invalidation and expiry. You can use a grid cache for distributed caching (e.g. Redis, Hazelcast), but it doesn't solve the invalidation problem. You may want to consider vendors like ScaleArc or Heimdall Data. They provide the caching logic: you choose the storage (in-memory, Redis, etc.) and the product handles query caching and invalidation. There is a SQL Server blog post on it: https://www.itprotoday.com/industry-perspectives/reduce-sql-server-costs-heimdall-data-caching

Best practice or design to scale out/horizontal scale database for microservices

The main benefit of microservices is that one service "type" can be scaled out by running multiple container instances behind a load balancer to improve throughput.
But one thing is that multiple instances (i.e. containers) of a service type share the same database instance, and this could lead to a performance bottleneck when multiple instances read from and write to that database instance.
Traditionally, we would scale up the processing power of that database instance to meet high demand.
The main question for me is: what is the current best practice/design/solution to scale out (scale horizontally), so that we can have multiple instances of that database and gain performance?
In particular, what I want to achieve is:
If one instance is down, another instance can handle the load -> high availability
Load-balance reads, or maybe even writes, across multiple database instances
Maintain the persistence and consistency of the data in case I want to create more database instances
As far as I know:
One solution is Microsoft SQL Server, which provides high availability for SQL Server containers and can meet most of the requirements above (https://learn.microsoft.com/en-us/sql/linux/sql-server-linux-container-ha-overview?view=sql-server-2017). But I wonder: is there a better solution that avoids technology lock-in?
Another solution I'm thinking of is replicating to multiple instances by using CDC to stream data from a master database instance to multiple replicas. This allows reads from the replicas.
But I'm still not convinced, because to guarantee consistency every service instance would have to write to the master database instance, which could also lead to a bottleneck on the master.

There are 3 possible architectures for databases at a broad level:
Single leader (e.g. RDBMS)
Multi-leader (e.g. RDBMS in multiple DCs)
Leaderless (e.g. Riak, Cassandra)
As you go from top to bottom in the above list, the potential for horizontal scalability increases, but consistency becomes weaker.
The scalability potential increases because more nodes can accept writes as you go down the list. Consistency becomes weaker because writes take time to propagate or replicate to all the nodes responsible for the data. Conflicts arise when the same record is written on two different nodes at almost the same time, so that at replication time the system does not know which one is correct.
There are various conflict resolution strategies, and different databases use different ones. You need to study these strategies to understand which one suits your use case, and pick your DB based on that.
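As an illustration of one of the simplest such strategies, here is a sketch of "last write wins" in Python. The `Version` type, its fields and the sample values are assumptions made up for the example; real databases differ in how they timestamp writes and break ties.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    timestamp: float  # time of the write (assumed comparable across nodes)
    node_id: str      # tie-breaker when two timestamps are equal


def last_write_wins(a: Version, b: Version) -> Version:
    """Keep the version with the larger (timestamp, node_id) pair."""
    return max(a, b, key=lambda v: (v.timestamp, v.node_id))


# The same record was written on two nodes at almost the same time.
from_node_a = Version("alice@old.example", timestamp=100.0, node_id="a")
from_node_b = Version("alice@new.example", timestamp=100.5, node_id="b")
winner = last_write_wins(from_node_a, from_node_b)
```

Note the cost of this strategy: the losing write is silently discarded, and the result depends on how well the node clocks agree. That may or may not be acceptable for your use case.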

There is always a trade-off when making these choices. A database has its limits, and in many cases you can avoid a performance hit by following simple best practices rather than by scaling the database. You can't leave it to the database alone to handle a high request rate; scaling a database is an expensive option, and you will eventually hit its limits if you don't plan properly, so design the whole system rather than just the database.
Coming to your point: having one master for writes and slaves for reads is a very common approach, but you have to accept eventual consistency; SQL Server Always On is something you can look at. You can cache the most frequently accessed data. If you have a very high request rate, you may need to consider queues, where you enqueue the requests and process them later, to avoid a database performance hit.
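A rough sketch of the master/slave split on the application side, in Python. `ReadWriteRouter` and the node names are hypothetical; a real setup would hand out connections or connection strings rather than plain names, and the `require_fresh` flag is just one assumed way of coping with replication lag.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and spread reads over the replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Round-robin over the replicas; None if there are no replicas.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def for_write(self):
        return self.primary

    def for_read(self, require_fresh=False):
        # Replicas may lag behind the primary (eventual consistency), so a
        # caller that must see its own latest write can ask for the primary.
        if require_fresh or self._replicas is None:
            return self.primary
        return next(self._replicas)


router = ReadWriteRouter("db-primary", ["db-replica-1", "db-replica-2"])
reads = [router.for_read() for _ in range(3)]
```

All writes still go through one node, so this scales reads only; the write bottleneck on the master that the question worries about remains.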

Solr master-master replication alternatives?

Currently we have 2 servers with a load-balancer before them. We want to be able to turn 1 machine off and later on, without the user noticing it.
Our application also uses Solr, and I now want to install and configure Solr on both servers. The question is: how do I configure master-master replication?
After my initial research I found out that it's not possible :(
But what are my options here? I want both indices to stay in sync, so that when a document is committed on one server it also goes to the other.
Thanks for your help!

I'm not certain of your specific use case (why turn one server off and on?), but there is no specific "master-master" replication. Solr does, however, support distributed indexing and querying via SolrCloud. From the documentation for SolrCloud:
Replication ensures redundancy for your data, and enables you to send
an update request to any node in the shard. If that node is a
replica, it will forward the request to the leader, which then
forwards it to all existing replicas, using versioning to make sure
every replica has the most up-to-date version. This architecture
enables you to be certain that your data can be recovered in the event
of a disaster, even if you are using Near Real Time searching.
It's a bit complex, so I'd suggest you spend some time going through the documentation, as it's not quite as simple as setting up a couple of masters and load balancing between them. It is a big step up from the previous master/slave replication that Solr used, so even if it's not a perfect fit it will be a lot closer to what you need.
https://cwiki.apache.org/confluence/display/solr/SolrCloud
https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud

You can just create a simple master-slave replication as described here:
https://cwiki.apache.org/confluence/display/solr/Index+Replication
Be sure to send your inserts, deletes and updates directly to the master; selects can go through the load balancer.
The other alternative is to set up a third server as the master, with the two existing servers as slaves and the load balancer in front of the two slaves.

Managing high-volume writes to SQL Server database

I have a web service that is used to manage files on a filesystem that are also tracked in a Microsoft SQL Server database. We have a .NET system service that watches for files that are added using the FileSystemWatcher class. When a file-added callback comes from FileSystemWatcher, metadata about the file is added to our database, and it works fairly well.
I've now come to a bit of a scalability problem. I'm adding large quantities of files to the filesystem in rapid succession, and this ends up hammering the database with file adds which results in locking up my web front-end.
I have yet to work on database scalability issues, so I'm trying to come up with mitigation tactics. I was thinking of perhaps caching file adds and only writing them to the database every five minutes or so, but I'm not sure how practical that is. This is data that needs to find its way into our database at some point anyway, so the database is going to get hammered at some point. Maybe I could limit the number of file entries written per second to a certain amount, but then I risk having that amount be less than the rate at which files are added. How can I best tackle this?

Have you thought about using something like SQL Server Service Broker? That way you could push through tons of entries in a burst and it would level out the inserts into your database.
Basically you'd be pushing messages onto a queue which would then be consumed by a receiver stored procedure that would perform the insert for you. You could limit the maximum number of receivers executing to help with the responsiveness issues in your web interface.
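The shape of that idea, sketched in Python rather than in Service Broker itself (which is set up in T-SQL; as far as I know the receiver cap there is the MAX_QUEUE_READERS activation setting). `run_receivers` and `insert_file_row` are hypothetical names, and the "insert" only appends to a list:

```python
import queue
import threading


def run_receivers(messages, handle, max_receivers=2):
    """Drain `messages` using at most `max_receivers` concurrent workers."""
    q = queue.Queue()
    for message in messages:
        q.put(message)            # the burst is absorbed by the queue

    def worker():
        while True:
            try:
                message = q.get_nowait()
            except queue.Empty:
                return            # queue drained, this receiver stops
            handle(message)       # stands in for the INSERT

    threads = [threading.Thread(target=worker) for _ in range(max_receivers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


inserted = []
lock = threading.Lock()


def insert_file_row(name):
    # Hypothetical stand-in for the database insert.
    with lock:
        inserted.append(name)


burst = [f"file{i}.txt" for i in range(100)]
run_receivers(burst, insert_file_row, max_receivers=2)
```

However large the burst, no more than `max_receivers` inserts run at once, which is what keeps the database responsive for the web front-end.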
There's a nice intro paper here. Although it's for 2005, not much has changed between 2005 and the newer versions of SQL Server.

You have a performance problem and you should approach it with a performance investigation methodology like Waits and Queues. Once you identify the actual problem, we can discuss solutions.
This is just a guess, but assuming the notification 'update metadata' code is a straightforward insert, the likely problem is that you're generating one transaction per notification. This results in commit flush waits; see Diagnosing Transaction Log Performance. Batch commit (aggregating multiple notifications before committing) is the canonical solution.
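A small sketch of batch commit, using Python's built-in sqlite3 module purely as a stand-in for SQL Server; the `files` table, the `insert_batched` helper and the batch size are assumptions for illustration. The idea carries over: one commit per batch instead of one per notification.

```python
import sqlite3


def insert_batched(conn, names, batch_size=100):
    """Insert file names, committing once per batch; returns the commit count."""
    commits = 0
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        conn.executemany(
            "INSERT INTO files (name) VALUES (?)",
            [(name,) for name in batch],
        )
        conn.commit()             # one log flush for the whole batch
        commits += 1
    return commits


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT NOT NULL)")
commits = insert_batched(conn, [f"file{i}.txt" for i in range(250)], batch_size=100)
row_count = conn.execute("SELECT COUNT(*) FROM files").fetchone()[0]
```

Here 250 notifications cost 3 commits instead of 250. The trade-off is that a crash can lose the notifications of the batch that was not yet committed, so the batch size (or a time limit) should be chosen with that in mind.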

The first option is to use caching to handle high-volume data, or to use clusters for analysing high-volume data. Please click here for more information.

Scaling out SQL Server for the web (Single Writer Multiple Readers)

Has anyone had any experience scaling out SQL Server in a multi-reader, single-writer fashion? If not, can anyone suggest a suitable alternative for a read-intensive web application that they have experience with?

It depends on probably 2 things:
How big is each single write?
Do readers need real time data?
A write will block readers when writing, but if each write is small and fast then readers won't notice.
If you offload, say, end-of-day reporting, then you batch your load onto a separate server, because those readers do not require real-time data. This makes sense.
Otherwise, a write on your primary server must be synched to your offload secondary server, where it will block as part of the synch process anyway, and you add overhead to manage the synch.
Most apps are 95%+ read anyway all the time. For example, an update or delete is a read followed by a write.
My choice would be (probably, based on the low write volume and it's a web app) to scale up and stuff as much RAM as I could in the DB server with separate disk paths for the data and log files of the database.

I don't have any experience with scaling out SQL Server for your scenario.
However, for a read-intensive application I would look at reducing the load on the database by employing a caching strategy, using something like Memcached or MS Velocity.
There are two approaches that I'm aware of:
Have the entire database loaded into the Cache and manage Adding and Updating of items in the cache.
Add items to the cache only when they are requested and remove them when a write operation is performed.
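The second approach (often called cache-aside) might look roughly like this in Python. `CacheAside`, the loader/saver callables and the dict standing in for the database are all hypothetical; a real deployment would use a shared cache such as the ones named above instead of a local dict.

```python
class CacheAside:
    """Load on first request, invalidate on write."""

    def __init__(self, load_from_db, save_to_db):
        self._load = load_from_db
        self._save = save_to_db
        self._cache = {}

    def get(self, key):
        if key not in self._cache:          # miss: go to the database once
            self._cache[key] = self._load(key)
        return self._cache[key]

    def put(self, key, value):
        self._save(key, value)              # write to the database first
        self._cache.pop(key, None)          # then drop the stale cache entry


db = {"user:1": "Ann"}                      # stand-in for the real database
db_reads = []


def load(key):
    db_reads.append(key)                    # record every trip to the database
    return db[key]


store = CacheAside(load, db.__setitem__)
first = store.get("user:1")                 # miss, reads the database
second = store.get("user:1")                # hit, served from the cache
store.put("user:1", "Anna")                 # write invalidates the entry
third = store.get("user:1")                 # miss again, sees the new value
```

Three reads cost two trips to the database here; on a read-heavy workload with few writes the ratio gets much better.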

Some kind of replication would do the trick.
http://msdn.microsoft.com/en-us/library/ms151827.aspx
You of course need to change your app code.
Some people use partitioned tables, with different row ranges stored on different servers and united with views. This would be invisible to the app. I think this practice is called federation.
By designing your database, application and server configuration (SQL particulars - location of data/log/system/sql binaries/tempdb), you should be able to handle a pretty good load. Try not to complicate things if you don't have to.