How to have AWS RDS synchronous read replication?

For AWS RDS there are two ways to create a "clone" of your DB:
1/ Read replica: create a read replica; replication is asynchronous, meaning there is a small delay.
2/ Multi-AZ standby: create a standby DB; replication is synchronous, meaning the data is exactly the same at all times, but this is for failover only and the standby cannot be used unless the main DB is down.
So the "synchronous" capability is already there, but I can't find any option to get a synchronous read-only replica.
In my case, I want a read replica to reduce the read load on the main DB, but the data is very sensitive, so I cannot afford to read stale data at all. Any suggestions for my case with the AWS RDS service? For example, making the "standby" readable.

If you're using Postgres or MySQL, you can deploy to Aurora rather than standard RDS. It uses a shared data storage layer, so it gives you synchronous read replicas, in addition to improved data durability and automatic failover.
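If it helps, here is a minimal boto3 sketch of provisioning such a cluster; the identifiers, region, instance class, and credentials are all placeholders, not recommendations:

```python
import boto3

rds = boto3.client("rds", region_name="us-east-1")  # region is an assumption

# Create the Aurora cluster (the shared storage layer lives at cluster level).
rds.create_db_cluster(
    DBClusterIdentifier="my-aurora-cluster",   # hypothetical identifier
    Engine="aurora-postgresql",
    MasterUsername="masteruser",               # placeholder credentials
    MasterUserPassword="change-me",
)

# The first instance in the cluster becomes the writer.
rds.create_db_instance(
    DBInstanceIdentifier="my-aurora-writer",
    DBInstanceClass="db.r6g.large",            # size to your workload
    Engine="aurora-postgresql",
    DBClusterIdentifier="my-aurora-cluster",
)

# Additional instances become Aurora Replicas reading from the same
# storage volume, so they can serve reads on behalf of the primary.
rds.create_db_instance(
    DBInstanceIdentifier="my-aurora-reader-1",
    DBInstanceClass="db.r6g.large",
    Engine="aurora-postgresql",
    DBClusterIdentifier="my-aurora-cluster",
)
```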

There is a newer option in RDS that allows having two readable standbys with synchronous replication: https://aws.amazon.com/rds/features/multi-az/#Amazon_RDS_Multi-AZ_with_two_readable_standbys. It's a relatively new offering, so it's worth testing whether there really is no lag when you read from the replicas.
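Either way (Aurora or a Multi-AZ DB cluster), your application sees a writer endpoint and a separate reader endpoint. A rough sketch of splitting traffic with psycopg2, assuming hypothetical endpoint hostnames and a trivial orders table:

```python
import psycopg2

# Hypothetical endpoints copied from the RDS console: the cluster endpoint
# always points at the writer; the reader endpoint load-balances across
# the readable standbys / replicas.
WRITER_DSN = ("host=my-cluster.cluster-abc123.us-east-1.rds.amazonaws.com "
              "dbname=app user=app password=change-me")
READER_DSN = ("host=my-cluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com "
              "dbname=app user=app password=change-me")

def save_order(order_id, amount):
    # All writes go to the writer endpoint.
    with psycopg2.connect(WRITER_DSN) as conn, conn.cursor() as cur:
        cur.execute("INSERT INTO orders (id, amount) VALUES (%s, %s)",
                    (order_id, amount))

def load_order(order_id):
    # Reads go to the reader endpoint, offloading the primary.
    with psycopg2.connect(READER_DSN) as conn, conn.cursor() as cur:
        cur.execute("SELECT id, amount FROM orders WHERE id = %s", (order_id,))
        return cur.fetchone()
```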

Related

How can we sync one Postgres database in RDS to another Postgres database in RDS?

There is one Postgres database in RDS. We need to sync it to another Postgres database in RDS.
I have connected one database with pgAdmin but I'm not sure how to synchronise it with the other one.
Is this a recurring task or a one-time task? If it's a one-time thing, you can do a dump/backup and restore it to a new instance. If this is a recurring thing, then...
Why do you want to replicate the data? What is your actual goal? If it's for scaling, you should likely use the native RDS functionality for that. If it's for testing, then you will likely want to sanitize data as part of the copy.
As smac2020 said in their answer, AWS provides a migration service you can use for several use cases, but if this is a one-time thing, that is likely overkill. Just do a backup and restore that to another server.
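For the one-time path, something like the following is usually enough; this sketch drives pg_dump/pg_restore from Python, with hypothetical endpoints, and assumes the Postgres client tools are installed and credentials come from ~/.pgpass or PGPASSWORD:

```python
import subprocess

# Hypothetical RDS endpoints.
SOURCE = "source-db.abc123.us-east-1.rds.amazonaws.com"
TARGET = "target-db.def456.us-east-1.rds.amazonaws.com"

# Dump the source database in custom format (compressed, usable by pg_restore).
subprocess.run(
    ["pg_dump", "-h", SOURCE, "-U", "app", "-d", "appdb", "-Fc", "-f", "appdb.dump"],
    check=True,
)

# Restore into the target instance. --no-owner avoids ownership errors when
# the target uses a different master role, which is common on RDS.
subprocess.run(
    ["pg_restore", "-h", TARGET, "-U", "app", "-d", "appdb", "--no-owner", "appdb.dump"],
    check=True,
)
```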
You can also leverage change data capture (CDC) to replicate data out, but that will likely take a lot of effort to get working well and will only be worth it in certain circumstances. We leverage CDC to copy data to our warehouse, for example. Here is an example that streams data to Kinesis. Here is another example that streams to Kinesis using the migration service.

AWS DMS Migration Questions

I am new to AWS DMS and trying to understand some details, but I am unable to find answers, so any help on this is highly appreciated.
Q1 - If you have a distributed database at your corporate data center (on prem), do you need to create a DMS task for each of the distributed databases? If so, does it sync all of them when it does CDC?
Q2 - Can DMS replicate from the standby database?
Q1) Assuming you use a single URL to connect to the database, you should only need that single set of connection information to replicate the databases.
Q2) If you are just doing a full load and no ongoing replication, then yes, this is possible. If you are talking about ongoing replication, it depends on the database, but it usually requires additional logging to be enabled. For example, Oracle requires the addition of supplemental logging, and MySQL requires row-level binary logging (bin logging). Often, standby databases don't have those enabled but, assuming they are enabled on your instance, it should be possible.
References:
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Task.CDC.html
Q1) Create a single DMS endpoint pointing at the master node (or at any replica if you don't have a master) of your distributed database. That's enough for your data migration.
Q2) Yes, for a full-load migration. If you need ongoing replication, you have to enable LogMiner (Oracle) or binlogs (MySQL) on your data source first.
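For reference, a minimal boto3 sketch of what the endpoint-plus-task setup looks like. All identifiers, credentials, and ARNs here are placeholders, and a pre-existing replication instance and target endpoint are assumed:

```python
import json
import boto3

dms = boto3.client("dms", region_name="us-east-1")

# Source endpoint pointing at the single connection URL of the database.
source = dms.create_endpoint(
    EndpointIdentifier="onprem-mysql-source",   # hypothetical identifier
    EndpointType="source",
    EngineName="mysql",
    ServerName="db.corp.example.com",           # placeholder host
    Port=3306,
    Username="repl_user",
    Password="change-me",
)

# Migrate everything in the `app` schema: full load first, then CDC.
table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-app-schema",
        "object-locator": {"schema-name": "app", "table-name": "%"},
        "rule-action": "include",
    }]
}

dms.create_replication_task(
    ReplicationTaskIdentifier="app-full-load-and-cdc",
    SourceEndpointArn=source["Endpoint"]["EndpointArn"],
    TargetEndpointArn="arn:aws:dms:...:endpoint:TARGET",    # pre-existing target
    ReplicationInstanceArn="arn:aws:dms:...:rep:INSTANCE",  # pre-existing instance
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps(table_mappings),
)
```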

AWS RDS Read Replica but with different storage class/type

I have a master DB in one region, and I want to create a read replica of it in another region, just for disaster recovery purposes.
I do not want it to be too costly, but I want the replication to work.
My current master DB is a db.t2.medium.
My question is:
What instance type should I use for my read replica? Is db.t2.small fine for my replica?
It should not have much effect, as read replica (RR) replication is asynchronous:
Amazon RDS then uses the asynchronous replication method for the DB engine to update the read replica whenever there is a change to the primary DB instance.
This means that your RR will always lag behind the master. Exactly how much depends on your setup, so you should monitor the lag as shown in Monitoring read replication. This matters because you may find that the lag is unacceptably large for the RR to be useful for DR purposes (i.e. a large RPO).
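As a starting point for that monitoring, here is a sketch that pulls the ReplicaLag metric from CloudWatch; the instance identifier and region are placeholders:

```python
from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client("cloudwatch", region_name="eu-west-1")  # replica's region

# ReplicaLag is reported in seconds for RDS read replicas.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/RDS",
    MetricName="ReplicaLag",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "my-dr-replica"}],  # placeholder
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average", "Maximum"],
)

for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], f'avg={point["Average"]:.1f}s', f'max={point["Maximum"]:.1f}s')
```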

Load balancer and multiple database instances design

The current single application server can handle about 5,000 concurrent requests. However, the user base will be over a million, and I may need two application servers to handle the requests.
So the design is to have a load balancer, in the hope that it will handle over 10,000 concurrent requests. However, each user's data is stored in one single database. So, to have two or more servers, should I do the following?
Have two instances of the database
Sync the two databases in real time
Is this correct?
And if so, will the sync process lower the performance of the servers,
as database replication seems costly?
Thank you.
You probably want to think of your service in "tiers". In this instance, you've got two tiers: the application tier and the database tier.
Typically, your application tier is going to be considerably easier to scale horizontally (i.e. by adding more application servers behind a load balancer) than your database tier.
With that in mind, the best approach is probably to overprovision your database (i.e. put it on its own, meaty server) and have your application servers all connect to that same database. Depending on the database software you're using, you could also look at using read replicas (AWS docs) to reduce the strain on your database.
You can also look at caching via Memcached / Redis to reduce the amount of load you're placing on the database.
So, tl;dr: put your DB on its own big server, and spread your application code across many small servers, all connecting to that same DB server.
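To illustrate the caching suggestion, here is a sketch of a read-through cache with Redis; the host, table, and TTL are assumptions, and db_conn stands for an open psycopg2-style connection to your shared DB:

```python
import json
import redis  # assumes the redis client library is installed

cache = redis.Redis(host="cache.example.com", port=6379)  # hypothetical host

def get_user(user_id, db_conn, ttl_seconds=60):
    """Read-through cache: serve from Redis when possible, else hit the DB."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)

    # Cache miss: fall through to the shared database server.
    with db_conn.cursor() as cur:
        cur.execute("SELECT id, name, email FROM users WHERE id = %s", (user_id,))
        row = cur.fetchone()

    user = {"id": row[0], "name": row[1], "email": row[2]}
    # A short TTL keeps stale reads bounded while absorbing repeat traffic.
    cache.set(key, json.dumps(user), ex=ttl_seconds)
    return user
```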
The most cost-effective option could be synchronizing a standby node with data from the active node, since this is achievable with an open-source relational database (e.g. MariaDB).
Do not store computed results and statistics that can easily be derived at run time; this helps keep the data size down.
If historical data is not needed urgently for queries, it can be written to a text file in a format that is easy to import back into the database (e.g. .csv).
Data objects that are updated very often can be kept in an in-memory database as key-value pairs, with a scheduled task performing batch updates/inserts to the relational database to achieve persistence (see the sketch after this list).
Implement retry logic for the database batch-update tasks to handle DB downtime or network errors.
Consider writing data to the relational database as serialized objects.
Cache configuration data in memory from the database, refreshing the changing parts either periodically or via an API.
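Here is a sketch of the in-memory counter plus scheduled batch flush pattern from the list above, using Redis and Postgres as stand-ins (hosts, table, and schema are hypothetical):

```python
import time
import redis      # assumes the redis client library is installed
import psycopg2

r = redis.Redis(host="cache.example.com", port=6379)  # hypothetical host
PG_DSN = "host=db.example.com dbname=app user=app password=change-me"

def record_page_view(page_id):
    # Hot counters live in Redis as key-value pairs; no DB write per hit.
    r.hincrby("page_views", page_id, 1)

def flush_counters(max_attempts=3):
    """Scheduled task: batch-upsert the in-memory counters into the
    relational DB, retrying to ride out DB downtime or network errors."""
    counts = r.hgetall("page_views")
    if not counts:
        return
    for attempt in range(max_attempts):
        try:
            conn = psycopg2.connect(PG_DSN)
            with conn, conn.cursor() as cur:
                for page_id, n in counts.items():
                    cur.execute(
                        "INSERT INTO page_views (page_id, views) VALUES (%s, %s) "
                        "ON CONFLICT (page_id) DO UPDATE "
                        "SET views = page_views.views + EXCLUDED.views",
                        (page_id.decode(), int(n)),
                    )
            conn.close()
            r.delete("page_views")  # clear only after the batch committed
            return
        except psycopg2.OperationalError:
            time.sleep(5 * (attempt + 1))  # back off; counters stay safe in Redis
```

One caveat: hits arriving between the hgetall and the delete would be lost here; a production version would rename the hash before reading it.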

Realtime streaming of SQL Server (RDS) transactions to NoSQL

I have a situation where I want to stream all the updates, deletes and inserts from my AWS RDS SQL Server to a NoSQL DB such as DynamoDB or RethinkDB.
What I am trying to achieve is to divide my users into critical and non-critical databases, reducing the load on my RDS server, and to use technologies like RethinkDB or DynamoDB Streams to send the other (non-critical) set of data to the front end.
I have thought of various ways to do this:
One, the most obvious: just asynchronously make the entry in both databases, though I could end up in a situation where one of the entries fails.
Two: use RabbitMQ or a queueing service like AWS SQS to queue the second entry and make sure that it gets inserted.
Three (which is what I want to achieve): have a Node.js service somehow listen to the MSSQL change stream and push the content to the NoSQL store.
What can be done in a situation like this?
The benefit I am looking for is to store a dataset in NoSQL that can be served to over 100k users, since they all want to see the same data with just some WHERE-clause changes, in real time. This in turn will reduce the RDS server transactions to a minimum of reads and writes.
You can use one of the two approaches below:
AWS DMS
Or, a combination of EMR, Amazon Kinesis, and Lambda (with custom scripts)
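To make the second approach concrete, here is a sketch of a Lambda handler that applies change records arriving on Kinesis to DynamoDB. The table name and key schema are hypothetical, and the record layout (row under "data", change type under "metadata") is the shape DMS typically writes to Kinesis; verify it against your own setup:

```python
import base64
import json
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("noncritical-users")  # hypothetical table, keyed on "id"

def handler(event, context):
    """Lambda triggered by the Kinesis stream that DMS (or a custom CDC
    script) writes to; mirrors each change into DynamoDB."""
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        row = payload["data"]                       # the changed row
        operation = payload["metadata"]["operation"]  # load/insert/update/delete

        if operation in ("load", "insert", "update"):
            table.put_item(Item=row)
        elif operation == "delete":
            table.delete_item(Key={"id": row["id"]})
```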
