AWS EC2 rsync between regions xtrabackup folder - database

Just to give you an idea: we have a DR DB server in another AWS region (Oregon) replicating from the master (Virginia). Replication broke, and we have to do a dump and restore. We are talking about 3 TB of data, so making a backup, creating an AMI, moving it across, copying it back to a volume and then restoring is a lot of work. I am doing an rsync over SSH, and it is taking forever; I estimate 2 days for the task to complete. The data is an xtrabackup, so basically all the DB tables and files.
Has anyone come across this issue, and what is the best way to transfer such a massive amount of data in the shortest amount of time? Believe me, I have thought of S3 etc., but I don't have experience with transfer speeds to/from buckets across regions. Any ideas?

First, I made an xtrabackup using these commands:
xtrabackup -u root -H 127.0.0.1 -p 'supersecretpassword' --backup --datadir=/data/mysql/ --target-dir=/xtrabackup/
xtrabackup -u root -H 127.0.0.1 -p 'supersecretpassword' --prepare --datadir=/data/mysql/ --target-dir=/xtrabackup/
Then uploaded to S3 bucket using this command:
aws s3 sync /xtrabackup s3://tmp-restore-bucket/
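If transfer speed is a concern, the AWS CLI's S3 transfer settings can be raised before running either sync. This is optional tuning, and the values here are only illustrative starting points, not recommendations for this particular setup:
# optional: increase S3 transfer parallelism (illustrative values)
aws configure set default.s3.max_concurrent_requests 20
aws configure set default.s3.multipart_chunksize 64MB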
From the DR server in the other region, ran this command to download the xtrabackup straight to the db data folder after removing the existing db data files. This is the fastest way.
aws s3 sync s3://tmp-restore-bucket /data/mysql/
Finally, start MySQL on the DR server and start your slave sync again using the replication coordinates recorded in one of the files xtrabackup created.
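For a standard binlog-position slave, that means reading the binary log file name and position from xtrabackup_binlog_info and plugging them into CHANGE MASTER TO. The host, user, password, and coordinates below are placeholders, not values from this setup:
cat /data/mysql/xtrabackup_binlog_info   # prints e.g. mysql-bin.000123  456789
mysql -u root -p -e "CHANGE MASTER TO MASTER_HOST='master.example.com', MASTER_USER='repl', MASTER_PASSWORD='replpass', MASTER_LOG_FILE='mysql-bin.000123', MASTER_LOG_POS=456789; START SLAVE;"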
Super easy and the best and fastest way I've found.

Related

Do I need to dump databases from a volume before backing them up?

There are plenty of resources on how to dump Postgres/MariaDB/MySQL/etc. databases from a volume/container; my question is whether I need to do so before backing them up. More explicitly, is it safe to stop my MariaDB container, copy the contents of the volume to another folder, and back that folder up directly? Are there consequences I should be aware of?
My current export code:
mkdir -p $HOME/backup/mariadb_backup
docker run --rm -v mariadb_volume:/data -v $HOME/backup:/backup ubuntu cp -aruT /data /backup/mariadb_backup
I then run borg on the backup folder.
It is safe to back up the files of a stopped database.
People usually don't want to shut down a database that's providing some service, so they come up with methods to avoid doing that.
One is to run a dump operation that exports the contents of the database while it continues serving other requests.
Another is a filesystem snapshot: atomically take a snapshot of the files underlying the database, so that all files retain their content from a single point in time, and then back that snapshot up.
The only thing you should not do is back up the files of a running database one by one. You will get an inconsistent copy if you do that.
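If you do want snapshot-based backups without stopping the container, something like the following works when the Docker volume sits on an LVM logical volume. The volume group, logical volume, and borg repository names are hypothetical, and a snapshot of a running MariaDB is only crash-consistent (InnoDB replays its log on restore):
# hypothetical names: volume group vg0, logical volume dbdata, borg repo /backup/repo
lvcreate --snapshot --size 5G --name dbsnap /dev/vg0/dbdata    # point-in-time copy of the data
mkdir -p /mnt/dbsnap && mount -o ro /dev/vg0/dbsnap /mnt/dbsnap
borg create /backup/repo::mariadb-$(date +%F) /mnt/dbsnap      # back up the frozen files
umount /mnt/dbsnap && lvremove -f /dev/vg0/dbsnap              # discard the snapshot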

How to push data from local SQL Server to Tableau Server on AWS

We are developing Tableau dashboards and deploying the workbooks on an EC2 Windows instance in AWS. One of the data sources is the company SQL Server, which sits inside the firewall. The server is managed by IT, and we only have read permission to one of the databases. The current solution is to build the workbooks locally in Tableau Desktop by connecting to the company SQL Server. Before the workbooks are published to Tableau Server, the data is extracted from the data sources, and the static extract is uploaded along with the workbook when it is published.
Instead of linking to a static extract on Tableau Server, we would like to set up a database on AWS (e.g. PostgreSQL), probably on the same instance, and push the data from the company SQL Server to the AWS database.
There may be a way to push directly from SQL Server to Postgres on AWS, but since we don't have much control of the server, and the IT folks are probably not willing to push data to an external destination, this is not an option. What I can think of is the following:
1. Set up Postgres on the AWS instance and create tables with the same schemas as the ones in SQL Server.
2. Extract data from SQL Server and save it as CSV files, one table per file.
3. Enable file sharing on the AWS Windows instance, so the instance can read files from the local file system directly.
4. Load the data from the CSV files into the Postgres tables.
5. Set up the data connection on Tableau Server on AWS to read data from Postgres.
I don't know if others have come across a situation like this and what their solutions were, but I think this is not an uncommon scenario. One variation would be to have both the local Tableau Desktop and the AWS Tableau Server connect to Postgres on AWS, although I'm not sure whether local Tableau Desktop could reach Postgres on AWS.
We also want to automate the whole process as much as possible. On the local server, I can probably run a Python script as a scheduled job to regularly export data from SQL Server and save it to CSV files. On the AWS side, something similar would run to load the data from CSV into Postgres. If the files are big, though, importing from CSV into Postgres may be pretty slow, and since it is a Windows instance I don't see a better way to transfer the files to the EC2 instance programmatically.
I am open to any suggestions.
A. Platform choice
If you use a database other than SQL Server on AWS (say Postgres), you need to perform one (or maybe two) conversions:
In the integration from the on-prem SQL Server to the AWS database you need to map SQL Server datatypes to Postgres datatypes.
I don't know much about Tableau, but if it is currently pointing at SQL Server, you probably need some kind of conversion to point it at Postgres instead.
These two steps alone might make it worth your while to investigate a SQL Server Express RDS instance. SQL Express has no licencing cost, but obviously Windows does. You can also run SQL Express on Linux, which would have no licencing costs but would require a lot of fiddling about to get running (i.e. I doubt there is a SQL Express on Linux RDS offering).
B. Integration Approach
Any process external to your network (i.e. in the cloud) that pulls data from your network will need the firewall opened. Assuming this is not an option, that leaves us only with push-from-on-prem options.
Just as an aside on this point, Power BI achieves its desktop data integration by using a desktop 'gateway' that coordinates data transfer, meaning that cloud Power BI doesn't need an open inbound port to get what it needs; it uses the desktop gateway to push the data out.
Given that we only have push options, we need something on-prem to push the data out. Yes, this could be a cron job on Linux or a Windows scheduled task. Please note, this is where you start creating shadow IT.
To get data out of SQL Server to be pushed to the cloud, the easiest way is to use BCP.EXE to generate flat files. If these are going into another SQL Server, they should be in native format (to save complexity); if they are going to Postgres they should be tab delimited.
If the files are being uploaded to SQL Server, it's just another BCP command to push the native files into the target tables (prior to this you need to run a SQLCMD.EXE command to truncate the target tables).
So for three tables, assuming you'd installed the free* SQL Server client tools, you'd have a batch file something like this:
REM STEP 1: Clear staging folder
DEL /Q C:\Staging\*.TXT
REM STEP 2: Generate the export files
REM -T uses Windows authentication against the local server; -N exports in native format
BCP database.dbo.Table1 OUT C:\Staging\Table1.TXT -T -S LocalSQLServer -N
BCP database.dbo.Table2 OUT C:\Staging\Table2.TXT -T -S LocalSQLServer -N
BCP database.dbo.Table3 OUT C:\Staging\Table3.TXT -T -S LocalSQLServer -N
REM STEP 3: Clear target tables
REM Your SQL RDS is unlikely to support single sign on
REM so need to use user/pass here
SQLCMD -U username -P password -S RDSSQLServerName -d databasename -Q"TRUNCATE TABLE Table1; TRUNCATE TABLE Table2; TRUNCATE TABLE Table3;"
REM STEP 4: Push data in
BCP database.dbo.Table1 IN C:\Staging\Table1.TXT -U username -P password -S RDSSQLServerName -N
BCP database.dbo.Table2 IN C:\Staging\Table2.TXT -U username -P password -S RDSSQLServerName -N
BCP database.dbo.Table3 IN C:\Staging\Table3.TXT -U username -P password -S RDSSQLServerName -N
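To run this on a schedule from the on-prem side, a Windows scheduled task pointed at the batch file is enough. The task name, script path, and time below are placeholders, not part of the original setup:
REM hypothetical task name, script path and schedule
SCHTASKS /Create /TN "PushDataToRDS" /TR "C:\Scripts\push_to_rds.bat" /SC DAILY /ST 02:00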
(I'm pretty sure that BCP and SQLCMD are free... not sure but you can certainly download the free SQL Server tools and see)
If you wanted to push to Postgres instead,
in step 2, you'd drop the -N option, which makes the file plain text, tab delimited, readable by anything
in steps 3 and 4 you'd use the corresponding Postgres command-line tool, but you'd need to deal with data types etc. (which can be a pain; ambiguous date formats alone are always a huge problem)
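For reference, the Postgres side of steps 3 and 4 would look roughly like the sketch below using psql and \copy. The host, database, and table names are placeholders, and the file is assumed to be the tab-delimited output from step 2 (psql's default text format expects tab-delimited input):
REM hypothetical connection details; \copy streams the local file up to the server
psql -h awshostname -U username -d databasename -c "TRUNCATE TABLE Table1;"
psql -h awshostname -U username -d databasename -c "\copy Table1 FROM 'C:/Staging/Table1.TXT'"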
Also note that the AWS RDS instance is just another database with a hostname, login, and password. The only thing you have to do is make sure the firewall is open on the AWS side to accept incoming connections from your IP address.
There are many more layers of sophistication you can build into your integration (differential replication, retries, etc.), but given the 'shadow IT' status this might not be worth it.
Also be aware that I think AWS charges for data uploads, so if you are replicating a 1 GB database every day, that's going to add up. (Azure doesn't charge for uploads, but I'm sure you'll pay in some other way!)
For this type of problem I would strongly recommend using SymmetricDS - https://www.symmetricds.org/
The main caveat is that the SQL Server side requires the addition of some triggers to track changes, but from that point SymmetricDS will handle pushing the data.
An alternative approach, similar to what you suggested, would be to have a script export the data into CSV files, upload them to S3, and then have a bucket event trigger on the S3 bucket that kicks off a Lambda to load the data when it arrives.

Unable to DELETE or GET couchdb2 databases

I have a testing script that creates and deletes testing databases. At some point today it started failing. Digging further it looks like several of my testing databases are in an inconsistent state.
The databases appear in Fauxton with the message "This database failed to load." I am unable to view the database contents in this interface. Their names, which are usually links, are now plain text.
Issuing GET and DELETE commands with curl shows the following errors:
$ curl -s -X DELETE http://username:password@0.0.0.0:5984/dbname
{"error":"error","reason":"internal_server_error"}
$ curl -s -X GET http://username:password@0.0.0.0:5984/dbname
{"error":"internal_server_error","reason":"No DB shards could be opened.","ref":2413987899}
I have looked inside the couchdb2 data directory and I do see that shards exist for these databases.
What can I do to delete these databases? I am not sure if I can do this by manually deleting files in the couchdb2 data directory.
Have you solved your issue yet? I had this same problem, and ultimately ended up just installing a new CouchDB 2.1.0 instance and replicating to it before taking down the original. I suspect it might have had something to do with CouchDB not liking its default choice of "couchdb@localhost" as the name for a node, because it was constantly telling me that was an illegal hostname.
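If you go the same route, replication between the old and new instances can be kicked off with a single POST to the _replicate endpoint. The hosts, credentials, and database name below are placeholders:
curl -X POST http://admin:password@newhost:5984/_replicate \
     -H 'Content-Type: application/json' \
     -d '{"source":"http://admin:password@oldhost:5984/dbname","target":"http://admin:password@newhost:5984/dbname","create_target":true}'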

How to operate a postgres database from 1 hard disk on multiple systems?

My issue is that I work from various systems yet require access to a single, large (50 GB) database that is always up to date. If it were a smaller database, dumping and restoring it onto the external disk would be fine, e.g. via $ pg_dump mydb > /path../db.sql to save it and then, on the other computer, $ psql -d mydb -f /path../db.sql to recover the data, and so on...
In this case, however, that option will not work, as I don't have 50 GB of free space on both machines. So I'd like the files for this particular DB to live on a single external drive.
How would I do this?
I've been using pg_ctlcluster for this purpose, e.g.
$ cp -rp /var/lib/postgresql/9.1/main /media/newdest # copy the data
$ pg_ctlcluster 9.1 main stop # stop the postgres server (on ubuntu: add -- [ctl args])
$ /usr/lib/postgresql/9.1/bin/initdb -D /media/newdest # initialise instance in new place
(On ubuntu, pg_ctlcluster is used instead of pg_ctl to allow multiple db clusters and this should allow pg_ctlcluster 9.1 main start -- -D /media/newdest to replace the last line of code, I think)
I suspect this approach is not the best solution because a) it's not working at present after various tries and b) I'm not sure I'll be able to access the cluster from another computer.
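For comparison, the order usually suggested on Debian/Ubuntu seems to be: stop the cluster first, copy the data with ownership preserved, then point the cluster at the new directory instead of re-running initdb. A rough sketch with placeholder paths:
$ pg_ctlcluster 9.1 main stop
$ cp -rp /var/lib/postgresql/9.1/main /media/newdest/main
$ # set data_directory = '/media/newdest/main' in /etc/postgresql/9.1/main/postgresql.conf
$ pg_ctlcluster 9.1 main start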
Database software is designed to handle large datasets, and moving data around is a common task, so I am baffled that there is so little info on this on the internet:
One question basically says "don't use TABLESPACES to do it".
Another just solves a permissions problem and links to a (useful) IBM page on the matter that talks about moving the entire setup, not just one database as I want to.

I have an 18 MB MySQL table backup. How can I restore such a large SQL file?

I use a WordPress plugin called 'Shopp'. It stores product images in the database rather than the filesystem as standard; I didn't think anything of this until now.
I have to move servers, and so I made a backup, but restoring the backup is proving a horrible task. I need to restore one table, called wp_shopp_assets, which is 18 MB.
Any advice is hugely appreciated.
Thanks,
Henry.
For large operations like this it is better to go to command line. phpMyAdmin gets tricky when lots of data is involved because there are all sorts of timeouts in PHP that can trip it up.
If you can SSH into both servers, then you can do a sequence like the following:
Log in to server1 (your current server) and dump the table to a file using "mysqldump" ---
mysqldump --add-drop-table -uSQLUSER -pPASSWORD -hSQLSERVERDOMAIN DBNAME TABLENAME > BACKUPFILE
Do a secure copy of that file from server1 to server2 using "scp" ---
scp BACKUPFILE USER@SERVER2DOMAIN:FOLDERNAME
Log out of server 1
Log into server 2 (your new server) and import that file into the new DB using "mysql" --- mysql -uSQLUSER -pPASSWORD DBNAME < BACKUPFILE
You will need to replace the UPPERCASE text with your own info. Just ask in the comments if you don't know where to find any of these.
It is worthwhile getting to know some of these command line tricks if you will be doing this sort of admin from time to time.
Try HeidiSQL http://www.heidisql.com/
Connect to your server and choose the database.
Go to the menu "Import > Load SQL file", or simply paste the SQL into the SQL tab.
Execute the SQL (F9).
HeidiSQL is an easy-to-use interface and a "working-horse" for web-developers using the popular MySQL-Database. It allows you to manage and browse your databases and tables from an intuitive Windows® interface.
EDIT: Just to clarify, this is a desktop application; you connect to your database server remotely. You won't be limited by PHP's maximum script runtime or upload size limit.
Use BigDump.
Create a folder on your server which is not easy to guess, like "BigDump_D09ssS" or whatever.
Download the importer file from http://www.ozerov.de/bigdump.php and add it to that directory after reading the instructions and filling out your config information.
FTP the .sql file to that folder alongside the BigDump script, then go to your browser and navigate to that folder.
Selecting the file you uploaded will start importing the SQL in split chunks, which is a much faster method!
Or, if that is an issue, I recommend the other answer about SSH and the mysql command-line method.
Even though this is an old post, I would like to add that it is recommended not to use database storage for images once you have more than about 10 product images.
Instead of exporting and importing such a huge file, it would be better to move the Shopp installation to file storage for images before transferring.
You can use this free plug-in to help you. Always back up your files and database before performing this action.
What I do is open the file in a code editor, then copy and paste it into a SQL window within phpMyAdmin. Sounds silly, but I swear by it for large files.
