Confused about AWS RDS read replica databases. Why can I edit rows?

Edit: I'm not trying to edit the read replica. I'm saying I did edit it and I'm confused on why I was able to.
I have a database in US-West. I made a read replica in Mumbai, so the users in India don't experience slowness. Out of curiosity, I tried to edit a row in the Mumbai read-replica database hoping to get a security error rejecting my write attempt (since after all, it is a READ replica). But the write operation was successful. Why is that? Shouldn't this be a read-only database?
I then went to the master database hoping the write would at least have been synchronized there, but my change wasn't present. The master database was now different from the replica.
I also tried editing data in the master database, hoping it would replicate to the slave database, but that failed as well.
Obviously, I'm not understanding something.

Take a look at this link from Amazon Web Services to get an idea:
How do I configure my Amazon RDS DB instance read replica to be modifiable?
Your read replica probably has the parameter read_only = 0 (false), which is why your write succeeded.
To make a replica modifiable, AWS has you attach a custom DB parameter group to it and change read_only in that group:
In the navigation pane, choose Parameter Groups. The available DB parameter groups appear in a list.
In the list, select the parameter group you want to modify.
Choose Edit Parameters and set the following parameter to the specified value:
read_only = 0
Choose Save Changes.
To get the read-only behaviour you expected, set read_only back to 1 (the default value resolves to read-only on replicas) and apply the change.
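To confirm what the replica is actually enforcing, you can check the effective setting from a client session on the replica itself. A minimal sketch, assuming a MySQL or MariaDB engine (the question doesn't say which):
-- Run against the read replica endpoint.
-- 1 (ON) means ordinary writes are rejected; 0 (OFF) explains why the UPDATE succeeded.
SELECT @@global.read_only;
-- The same setting as exposed by the parameter group:
SHOW GLOBAL VARIABLES LIKE 'read_only';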
I think you should read a little about Cross region read replicas and how they work.
Working with Read Replicas of MariaDB, MySQL, and PostgreSQL DB Instances
Read Replica lag is influenced by a number of factors including the load on both the primary and secondary instances, the amount of data being replicated, the number of replicas, whether they are within the same region or cross-region, etc. Lag can stretch to seconds or minutes, though typically it is under one minute.
Reference: https://stackoverflow.com/a/44442233/1715121
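To watch that lag from the replica itself (in addition to the ReplicaLag metric RDS publishes to CloudWatch), a minimal MySQL sketch:
-- Run on the read replica; Seconds_Behind_Master is the current lag in seconds,
-- NULL means replication is not running at all.
-- (Newer MySQL versions use SHOW REPLICA STATUS / Seconds_Behind_Source instead.)
SHOW SLAVE STATUS;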
Facts to remember about RDS Read Replicas
A read replica is created from a snapshot of the primary database.
Read replicas are available in Amazon RDS for MySQL, MariaDB, and PostgreSQL.
Read replicas in Amazon RDS for MySQL, MariaDB, and PostgreSQL provide a complementary availability mechanism to Amazon RDS Multi-AZ deployments.
All traffic between the source and destination database is encrypted for read replicas.
You need to enable automated backups before creating read replicas. This is done by setting the backup retention period to a value other than 0.
Amazon RDS for MySQL, MariaDB and PostgreSQL currently allow you to create up to five read replicas for a given source DB instance.
It is possible to create a read replica of another read replica. You can create a second-tier Read Replica from an existing first-tier Read Replica. By creating a second-tier Read Replica, you may be able to move some of the replication load from the master database instance to a first-tier Read Replica.
Even though a read replica is updated from the source database, the target replica can still become out of sync due to various reasons.
You can delete a read replica at any point in time.

I had the same issue. (Old question, but I couldn't find an answer anywhere else and this was my exact issue)
I had created a cross-region read replica; when it completed, all the original data was there, but no new updates were synchronised between the two regions.
The issue was the parameter groups.
In my case, I had changed my primary from the default parameter group to one which allowed case insensitive tables. The parameter group is not copied over to the new region, so the replication was failing with:
Error 'Table 'MY_TABLE' doesn't exist' on query. Default database: 'mydb'. Query: 'UPDATE MY_TABLE SET ....''
So in short, create a parameter group in your new region that matches the primary region and assign that parameter group as soon as the replica is created.
I also ran this script on the newly created replica:
CALL mysql.rds_start_replication
I am unsure if this was required, but I ran it anyway.
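If you hit the same symptoms, you can verify the mismatch from the databases themselves before touching parameter groups. A minimal MySQL sketch, assuming the case-sensitivity setting involved is lower_case_table_names (the answer above only says "case insensitive tables"):
-- On both the primary and the replica: the values should match.
SHOW GLOBAL VARIABLES LIKE 'lower_case_table_names';
-- On the replica: Slave_SQL_Running and Last_SQL_Error show whether and why the SQL thread stopped.
SHOW SLAVE STATUS;
-- Once the parameter groups agree, restart replication with the RDS-provided procedure.
CALL mysql.rds_start_replication;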

I think adding indexes is about the only change you should make on a slave DB in Amazon RDS. If you put the read replica in write mode, it will stay in write mode until you change the parameter back to read_only = 1 and apply it immediately.

Related

AWS RDS Read Replica but with different storage class/type

I have a master db in one region, and I want to create a read replica of it in another region just for disaster recovery purposes.
I do not want it to be that costly, but I want the replication to work.
My current master db is a db.t2.medium.
My question is:
What instance type should I use for my read replica? Is db.t2.small fine for my replica?
It should not have much effect as read replica (RR) replication is asynchronous:
Amazon RDS then uses the asynchronous replication method for the DB engine to update the read replica whenever there is a change to the primary DB instance.
This means that your RR will always be lagging behind the master. Exactly how much depends on your setup, so you should monitor the lag as shown in Monitoring read replication. This is needed because you may find the lag is unacceptably large for the RR to be useful for DR purposes (i.e. a large RPO).

Azure SQL - Automatic Tuning with Geo-replication - Server in unspecified state and query store has reached its capacity limit

I have a primary db and a secondary geo-replicated db.
On the primary, server automatic tuning is turned on.
On the replica, when I try to do the same, I encounter the following issues.
The database is inheriting settings from the server, but the server is in the unspecified state. Please specify the automatic tuning state on the server.
And
Automated recommendation management is disabled because Query Store has reached its capacity limit and is not collecting new data. Learn more about the retention policies to maintain Query Store so new data can be collected.
However, on the server, tuning options are on, so I don't understand that "unspecified state". Moreover, when I look at the Query Store setup in both databases' properties in SSMS, they are exactly the same, with 9 MB of space available out of 10 MB.
Note: both databases are set up on the 5 DTU Basic pricing tier.
UPDATE
While the primary db's Query Store Operation Mode is Read Write, the replica's is Read Only. It seems I cannot change it (I couldn't from the properties dialog of the db in SSMS).
Fair enough, but how can the same query be 10 times faster on the primary than on the replica? Aren't optimizations copied across?
UPDATE 2
Actually, the Query Stores are viewable in SSMS and I can see that they are identical in both databases. I think the difference in response times that I observe is not related.
UPDATE 3
I marked #vCillusion's post as the answer as he/she deserves the credit. However, it's more detailed than the actual issue requires.
My replica is read-only and as such cannot be auto-tuned, as this would require writing to the query store. Azure not being able to collect any data into the read-only query store led to a misleading (and wrong) error message about the query store reaching its capacity.
We get this message only when the Query Store is in read-only mode. Double check your query store configuration.
According to MSDN, you might need to consider the following:
To recover the Query Store, try explicitly setting read-write mode and recheck the actual state.
ALTER DATABASE [QueryStoreDB]
SET QUERY_STORE (OPERATION_MODE = READ_WRITE);
GO
SELECT actual_state_desc, desired_state_desc, current_storage_size_mb,
max_storage_size_mb, readonly_reason, interval_length_minutes,
stale_query_threshold_days, size_based_cleanup_mode_desc,
query_capture_mode_desc
FROM sys.database_query_store_options;
If the problem persists, it indicates that the corruption of the Query Store data is persisted on the disk. The Query Store can be recovered by executing the sp_query_store_consistency_check stored procedure within the affected database.
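A minimal T-SQL sketch of that step, assuming you are on a version where the procedure is available (run it inside the affected database):
-- Check and repair Query Store on-disk consistency in the current database.
EXEC sp_query_store_consistency_check;
GO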
If that did not help, you could try to clear Query Store before requesting read-write mode.
ALTER DATABASE [QueryStoreDB]
SET QUERY_STORE CLEAR;
GO
ALTER DATABASE [QueryStoreDB]
SET QUERY_STORE (OPERATION_MODE = READ_WRITE);
GO
SELECT actual_state_desc, desired_state_desc, current_storage_size_mb,
max_storage_size_mb, readonly_reason, interval_length_minutes,
stale_query_threshold_days, size_based_cleanup_mode_desc,
query_capture_mode_desc
FROM sys.database_query_store_options;
If you checked it and it's in read-write mode, then we might be dealing with some bug here. Please provide feedback to Microsoft on this.
Additional Query Store limitations:
Also, note that the Query Store feature was introduced to monitor performance and is still evolving. There are certain known limitations around it.
As of now, it does not work on read-only databases (including read-only AG replicas). Since readable secondary replicas are read-only, the query store on those secondary replicas is also read-only. This means runtime statistics for queries executed on those replicas are not recorded into the query store.
Check that the database is not read-only.
Query Store does not work for system databases like master or tempdb.
Check that the PRIMARY filegroup has enough space, since the Query Store data is stored only in the PRIMARY filegroup.
The supported scenario is that automatic tuning needs to be enabled on the primary only. Indexes automatically created on the primary will be automatically replicated to the secondary. This process takes the usual sync-up time between the primary and the secondary. At this time it is not possible to have a read-only secondary replica tuned differently for performance than the primary. The Query Store error message is due to its read-only state, as noted above, and should be disregarded. The performance difference on your secondary replica most likely has some other cause that needs to be explored.
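If you would rather confirm those states from T-SQL than from the portal, a minimal sketch against the catalog views (run it on each database; on the readable secondary the Query Store will report READ_ONLY):
-- Automatic tuning: desired vs actual state per option, and the reason they differ.
SELECT name, desired_state_desc, actual_state_desc, reason_desc
FROM sys.database_automatic_tuning_options;
GO
-- Query Store: on a read-only replica actual_state_desc is READ_ONLY and readonly_reason explains why.
SELECT actual_state_desc, desired_state_desc, readonly_reason
FROM sys.database_query_store_options;
GO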

db replication vs mirroring

Can anyone explain the differences between a replicated DB and a mirrored DB server?
I have huge reports to run. I want to use a secondary database server to run my reports so I can offload work from the primary server.
Should I setup a replication server or a mirrored server and why?
For your requirements, replication is the way to go (assuming you're talking about transactional replication). As stated before, mirroring will "mirror" the whole database, but you won't be able to query it unless you create snapshots from it.
The good point of replication is that you can select which objects to replicate and you can also filter them, and since the DB will be open you can delete info that isn't required (just be careful, as this can lead to problems maintaining the replication itself), or create specific indexes for the reports which are not needed in "production". I maintained this kind of solution for a long time with no issues.
(Assuming you are referring to Transactional Replication)
The biggest differences are: 1) Replication operates on an object-by-object basis whereas mirroring operates on an entire database. 2) You can't query a mirrored database directly - you have to create snapshots based on the mirrored copy.
In my opinion, mirroring is easier to maintain, but the constant creation of snapshots may prove to be a hassle.
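For context, reporting against a mirror goes through a database snapshot, roughly like this. The database name, logical file name, and paths below are placeholders, not anything from the question:
-- On the mirror server: create a point-in-time, read-only snapshot of the mirrored database.
-- NAME must match the logical data file name of the source database.
CREATE DATABASE SalesDB_Report_Snapshot
ON (NAME = SalesDB_Data, FILENAME = 'D:\Snapshots\SalesDB_Report_Snapshot.ss')
AS SNAPSHOT OF SalesDB;
GO
-- Reports then query the snapshot, not the mirror itself.
SELECT COUNT(*) AS OrderCount FROM SalesDB_Report_Snapshot.dbo.Orders;
The snapshot is static, so it has to be dropped and recreated whenever fresher data is needed, which is the "constant creation of snapshots" hassle mentioned above.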
As mentioned here
Database mirroring and database replication are two high data availability techniques for database servers. In replication, data and database objects are copied and distributed from one database to another. It reduces the load from the original database server, and all the servers on which the database was copied are as active as the master server. On the other hand, database mirroring creates copies of a database in two different server instances (principal and mirror). These mirror copies work as standby copies and are not always active like in the case of data replication.
This question can also be helpful, or have a look at the MS documentation.

What type of database replication should I use?

I have 2 databases, one on local server and one on a remote server.
I created a transactional replication publication on the local DB, which feeds the remote DB every minute with whatever updates it gets. So far, this is working perfectly.
However, the local DB needs to be cleaned (all its information deleted) daily. THIS is the part I'm having trouble with: I was expecting a replication mode that would only feed the remote DB with the inserts and ignore the step where the local DB gets cleaned. At the moment, the remote DB is also getting cleaned.
Would a different kind of replication help me achieve what I want, or is replication no longer the way to do it?
Have a look at this SO question here
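One option worth evaluating (a sketch of a common pattern, not something taken from the linked question) is to keep transactional replication but configure the article so DELETE commands are not propagated to the subscriber. Publication and table names here are made up:
-- Run on the publisher when (re)creating the article.
EXEC sp_addarticle
    @publication   = N'LocalToRemote',   -- hypothetical publication name
    @article       = N'Readings',        -- hypothetical article/table name
    @source_owner  = N'dbo',
    @source_object = N'Readings',
    @type          = N'logbased',
    @del_cmd       = N'NONE';            -- do not replicate DELETEs, so the daily clean-up stays local
GO
Be aware that the publisher and subscriber then intentionally diverge, so reinitialising the subscription would overwrite the extra rows kept on the remote side.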

Is it possible to have secondary server available read-only in a log shipping scenario?

I am looking into using log shipping in a SQL Server 2005 environment. The idea was to set up frequent log shipping to a secondary server. The intent: Use the secondary server to serve report queries, thereby offloading the primary db server.
I came across this on a sqlservercentral forum thread:
When you create the log shipping you have 2 choices. You can configure the restore log operation to be done with the norecovery option or with the standby option. If you use the norecovery option, you cannot issue select statements on it. If instead of norecovery you use the standby option, you can run select queries on the database.
Bear in mind that with the standby option, when log file restores occur, users will be kicked out without warning by the restore process. Actually, when you configure log shipping with the standby option, you can also select between 2 choices - kill all processes in the secondary database and perform the log restore, or don't perform the log restore if the database is being used. Of course, if you select the second option, the restore operation might never run if someone opens a connection to the database and doesn't close it, so it is better to use the first option.
So my questions are:
Is the above true? Can you really not use log shipping in the way I intend?
If it is true, could someone explain why you cannot execute SELECT statements to a database while the transaction log is being restored?
EDIT:
The first question is a duplicate of this serverfault question. But I would still like the second question answered: Why is it not possible to execute SELECT statements while the transaction log is being restored?
could someone explain why you cannot execute SELECT statements to a database while the transaction log is being restored?
The short answer is that the RESTORE statement takes an exclusive lock on the database being restored.
For writes, I hope there is no need for me to explain why they are incompatible with a restore. Why does it not allow reads either? First of all, there is no way to know whether a session that has a lock on a database is going to do a read or a write. But even if that were possible, restore (log or backup) is an operation that updates the data pages in the database directly. Since these updates go straight to the physical location (the page) and do not follow the logical hierarchy (metadata-partition-page-row), they would not honor possible intent locks held by other data readers, and thus could change structures as they are being read. A SELECT table scan following the page next-prev pointers would be thrown into disarray, resulting in a corrupted read.
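For reference, the two restore modes described in the quoted forum post look like this in T-SQL; the database name and file paths are placeholders:
-- NORECOVERY: the database stays in a restoring state, further logs can be applied, no SELECTs possible.
RESTORE LOG ReportsDB
FROM DISK = N'D:\LogShip\ReportsDB_1200.trn'
WITH NORECOVERY;
GO
-- STANDBY: uncommitted work is saved to an undo file, so the database is readable between restores.
RESTORE LOG ReportsDB
FROM DISK = N'D:\LogShip\ReportsDB_1210.trn'
WITH STANDBY = N'D:\LogShip\ReportsDB_undo.tuf';
GO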
Well yes and no.
You can do exactly what you wish to do, in that you may offload reporting workloads to a secondary server by configuring log shipping to a read-only copy of a database. I have set this type of architecture up on a number of occasions previously and it works very well indeed.
The caveat is that in order to perform a restore of a transaction log backup file, there must be no other connections to the database in question. Hence the two choices: when the restore process runs, it will either fail, thereby prioritising user connections, or it will succeed by disconnecting all user connections in order to perform the restore.
Depending on your restore frequency, this is not necessarily a problem. You simply educate your users to the fact that, say every hour at 10 past the hour, there is a possibility that your report may fail. If this happens, simply re-run the report.
EDIT: You may also want to evaluate alternative architecture solutions for your business need, for example Transactional Replication or Database Mirroring with a Database Snapshot.
If you have the Enterprise edition, you can use database mirroring + a snapshot to create a read-only copy of the database, available for reporting, etc. Mirroring uses "continuous" log shipping "under the hood". It is frequently used in the scenario you have described.
Yes it's true.
I think the following happens:
While the transaction log is being restored, the database is locked, as large portions of it are being updated.
This is for performance reasons more than anything else.
I can see two options:
Use database mirroring.
Schedule the log shipping to only occur when the reporting system is not in use.
There is slight confusion here: the NORECOVERY flag on the restore means your database is not going to be brought out of a recovery state and into an online state - that is why the SELECT statements will not work - the database is offline. The NORECOVERY flag is there to allow you to restore multiple log files in a row (in a DR-type scenario) without bringing the database back online.
If you do not want to log ship or live with its disadvantages, you could swap to one-way transactional replication, but the overhead/set-up will be more complex overall.
Would peer-to-peer replication work? Then you could run queries on one instance and so reduce the load on the original instance.
