AWS DMS Replication Task Swap Usage issue - database

I have created AWS DMS Replication Task which is stuck at 70% with a message in log, the task used to run smoothly before and have been stuck for the last week.
2021-01-13T12:19:20:148589 [TASK_MANAGER ]D: Swap usage stats not available yet (replicationtask_cmd.c:1625)
I have checked Replication Instance Swap Memory and I see no usage of it in cloudformation at all, I have increased Source DB RAM so that there is no issue there, not sure which swap usage this is referring to. Please help debug this

Answer: This was a swap issue in the Target DB and not in Replication Instance/Source DB

Related

AWS RDS Backups

So I recently started using AWS and Elastic Beanstalk with RDS.
I wonder whats the best practices for creating database backups?
So far my setup is this.
Enable automatic backups.
bash script that creates manual snapshots everyday and removes manual snapshots older than 8 days.
bash script that creates a sql dump of the database and uploads it to S3.
Reason why I am creating the manual snapshots is if I were to delete the database by misstake, then I still have snapshots.
The bash scripts is on an EC2 instance launched with a IAM role which is allowed to execute these scripts.
Am I on the right track here?
I really appreciate answers, thanks.
A bit of context...
That automated backups are not saved after DB deletion is a very important technical gotcha. I've seen it catch devs on the team unawares, so thanks for bringing this up.
After the DB instance is deleted, RDS retains this final DB snapshot and all other manual DB snapshots indefinitely. However, all automated backups are deleted and cannot be recovered when you delete a DB instance. source
I suspect for most folks, the final snapshot is sufficient.
Onto the question at hand...
Yes. 110%. Absolutely.
I wouldn't create manual snapshots; rather, copy the automated ones.
Option 1: You already have the automated snapshots available. Why not just copy the automated snapshot (less unnecessary DB load; though, admittedly less of an issue if you're multi-AZ since you'll be snapshoting from the replica), which created a manual snapshot. I'd automate this using the aws sdk and a cron job.
Option 2: Requires manual adherence. Simply copy your automated snapshots (to create manual snapshots) before terminating a DB.
Unclear why you'd need the s3 dump if you have the snapshots.
For schema changes?: If you're doing it for schema changes, these should really be handled with migrations (we use knex.js, but pick your poison). If that's a bridge too far, remember that there's an option for schema-only dumps (pg_dump --schema-only). Much more manageable.
Getting at the data?: Your snapshots are already on s3, see faq. You can always load a snapshot, and sql dump it if you choose. I don't see an immediately obvious reason for purposely duplicating your data.

Postgres RDS read replica takes too long

We currently running PostgresRDS on AWS with two read replica, the instance is db.m4.4xlarge. We created new read replica with similar size yesterday to help with the load but it's been in this status: "Streaming replication has stopped." for over a day and the lag time only goes up.
Previously when we created read replica, after creation they were in similar state but it caught up after few hours. Any idea if there is a better way to get the real status of the read replica? why is the lag only grows even when we stopped writing to the master postgres.
It's not the first time we encounter the problem and the lack of real status/state is really annoying.
Thanks for the help.

Symmetric DS taking too long on DataExtractorService

I'm using SDS to migrate data from a SQL server to a Mysql database. My tests of moving the data of a database that was not in use worked correctly though they took like 48 hours to migrate all the existing data. I configured dead triggers to move all current data and triggers to move the new added data.
When moving to a live database that it is in use the data is being migrated too slow. On the log file I keep getting the message:
[corp-000] - DataExtractorService - Reached one sync byte threshold after 1 batches at 105391240 bytes. Data will continue to be synchronized on the next sync
I have like 180 tables and I have created 15 channels for the dead triggers and 6 channels for the triggers. For the configuration file I have:
job.routing.period.time.ms=2000
job.push.period.time.ms=5000
job.pull.period.time.ms=5000
I have none foreign key configuration so there wont be an issue with that. What I would like to know is how to make this process faster. Should I reduce the number of channels?
I do not know what could be the issue since the first test I ran went very well. Is there a reason why the threshold is not being clreared.
Any help will be apreciated.
Thanks.
How large are your tables? How much memory does the SymmetricDS instance have?
I've used SymmetricDS for a while, and without having done any profiling on it I believe that reloading large databases went quicker once I increased available memory (I usually run it in a Tomcat container).
That being said, SymmetricDS isn't by far as quick as some other tools when it comes to the initial replication.
Have you had a look at the tmp folder? Can you see any progress in file size. That is, the files which SymmetricDS temporarily writes to locally before sending the batch off to the remote side? Have you tried turning on more fine grained logging to get more details? What about database timeouts? Could it be that the extraction queries are running too long, and that the database just cuts them off?

Is writing server log files to a database a good idea?

After reading an article about the subject from O'Reilly, I wanted to ask Stack Overflow for their thoughts on the matter.
Write locally to disk, then batch insert to the database periodically (e.g. at log rollover time). Do that in a separate, low-priority process. More efficient and more robust...
(Make sure that your database log table contains a column for "which machine the log event came from" by the way - very handy!)
I'd say no, given that a fairly large percentage of server errors involve problems communicating with databases. If the database were on another machine, network connectivity would be another source of errors that couldn't be logged.
If I were to log server errors to a database, it would be critical to have a backup logger that wrote locally (to an event log or file or something) in case the database was unreachable.
Log to DB if you can and it doesn't slow down your DB :)
It's way way way faster to find anything in DB then in log files. Especially if you think ahead what you will need. Logging in db let's you query log table like this:
select * from logs
where log_name = 'wcf' and log_level = 'error'
then after you find error you can see the whole path that lead to this error
select * from logs
where contextId = 'what you get from previous select' order by timestamp
How will you get this info if you log it in text files?
Edit:
As JonSkeet suggested this answer would be better if I stated that one should consider making logging to db asynchronous. So I state it :) I just didn't need it. For example how to do it you can check "Ultra Fast ASP.NET" by Richard Kiessig.
If the database is production database, this is a horrible idea.
You will have issues with backups, replication, recovery. Like more storage for DB itself, replicas, if any, and backups. More time to setup and restore replication, more time to verify backups, more time to recover DB from backups.
It probably isn't a bad idea if you want the logs in a database but I would say not to follow the article's advice if you have a lot of log file entries. The main issue is that I've seen file systems have issues keeping up with logs coming from a busy site let alone a database. If you really want to do this I would look at loading the log files into the database after they are first written to disk.
Think about a properly setup database that utilizes RAM for reads and writes? This is so much faster than writing to disk and would not present the disk IO bottleneck you see when serving large numbers of clients that occurs when threads start locking down due to the OS telling them to wait on currently executing threads that are using all available IO handles.
I don't have any benchmarks to prove this data, although my latest application is rolling with database logging. This will have a failsafe as mentioned in one of these responses. If the database connection can not be created, create local database (h2 maybe?) and write to that. Then we can have a periodic check of database connectivity that will re-establish the connection, dump the local database, and push it to the remote database.
This could be done during off hours if you do not have a H-A site.
Sometime in the future I hope to develop benchmarks to demonstrate my point.
Good Luck!

SQL Server transactional replication for very large tables

I have set up transactional replication between two SQL Servers on different ends of a relatively slow VPN connection. The setup is your standard "load snapshot immediately" kind of thing where the first thing it does after initializing the subscription is to drop and recreate all tables on the subscriber side and then start doing a BCP of all the data. The problem is that there are a few tables with several million rows in them, and the process either a) takes a REALLY long time or b) just flat out fails. The messages I keep getting when I look in Replication Monitor are:
The process is running and is waiting for a response from the server.
Query timeout expired
Initializing
It then tries to restart the bulk loading process (skipping any BCP files that it has already loaded).
I am currently stuck where it just keeps doing this over and over again. It's been running for a couple days now.
My questions are:
Is there something I could do to improve this situation given that the network connection is so slow? Maybe some setting or something? I don't mind waiting a long time as long as the process doesn't keep timing out.
Is there a better way to do this? Perhaps make a backup, zip it, copy it over and then restore? If so, how would the replication process know where to pick up when it starts applying the transactions, since updates will be occurring between the time I make the backup and get it restored and running on the other side.
Yes.
You can apply the initial snapshot manually.
It's been a while for me, but the link (into BOL) has alternatives to setting up the subscriber.
Edit: From BOL How-tos, Initialize a Transactional Subscriber from a Backup
In SQL 2005, you have a "compact snapshot" option, that allow you to reduce the total size of the snapshot. When applied over a network, snapshot items "travel" compacted to the suscriber, where they are then expanded.
I think you can easily figure the potential speed gain by comparing sizes of standard and compacted snapshots.
By the way, there is a (quite) similar question here for merge replication, but I think that at the snapshot level there is no difference.

Resources