I am just wondering if this is the safest way, in terms of the database, to copy my production setup to a development environment?
ssh user#app.com pg_dump app-production | psql app-development
I just want to make sure that this command doesn't or can't have any unintended side effects on the database being dumped.
It will impose a considerable load on the production to read all of the data from disk and send it over the net. It will also lock each object, sometimes in ways that can potentially interfere with the operation of the production system.
I think the least-impact method is to hook into whatever backup system you already have in place for the production system. If you use pg_dump for your backup, restore from the most recent one of those without touching production at all. If you use wal archiving for your backup, "restore" from that to get your clone, again without touching production at all.
It won't make any changes to the production database, however it might have a noticeable effect on production database performance.
It will increase the general load as its obviously going to access all the tables and the large objects.
However, the thing I'd be more concerned about is the way you're using the network. By piping direct through the connection you're relying on an open network connection throughout the process of the pg_dump and also keeping the access open until the load is completed at app-development.
Also, if there is a network drop or anything, you'd have to restart completely.
I'd recommend you dump to a file if you can. Something like
pg_dump -Fc --file=app-production.backup app-production
And then transfer app-production.backup with sftp to your dev box.
That way you can utilise the custom format "-Fc" which compresses the data so your ssh hit will be smaller. Also once you sftp the file to your local dev box, you can then load, reload, reload again as often as you want without revisiting your production database.
PG Dump documentation
Related
I'm using PostgreSQL v9.1 for my organization. The database is hosted in Amazon Web Services (EC2 instance) below a Django web-framework which performs tasks on the database (read/write data). The problem is, to backup this database in a periodic fashion in a specified format (see Requirements).
Requirements:
A standby server is available for backup purposes.
The master-db is to be backed up every hour. Once the hour is ticked, the db is quickly backed up in entirety and then copied to slave in a file-system archive.
Along with hourly backups, I need to perform a daily backup of the database at midnight and a weekly backup on midnight of every Sunday.
Weekly-backups will be the final backups of the db. All weekly-backups will be saved. Daily-backups of the last week will only be saved and Hourly-backups of the last day will only be saved.
But I have the following constraints too.
Live data comes into the server every day (rate of insertion is per 2 seconds).
The database now hosting critical customer data which implies that it cannot be turned off.
Usually, data stops coming into the db during nights, but there's a good chance that data might be coming into master-db during some nights for which I have no control over to stop the insertions (Customer-data will be lost)
If I use traditional backup mechanisms/software (example, barman), I've to configuring archiving mode in postgresql.conf and authenticate users in pg_hba.conf which implies I need a server-restart to turn it on which again, stops the incoming data for some minutes. This is not permitted (see above constraint).
Is there a clever way to backup the master-db for my needs? Is there a tool which can automate this job for me?
This is a very crucial requirement as data has begun to appear into the master-db since few days and I need to make sure there's replication of master-db on some standby-server all the time.
Use EBS snapshots
If, and only if, your entire database including pg_xlog, data, pg_clog, etc is on a single EBS volume, you can use EBS snapshots to do what you describe because they are (or claim to be) atomic. You can't do this if you stripe across multiple EBS volumes.
The general idea is:
Take an EBS snapshot using the EBS APIs using command line AWS tools or a scripting interface like the wonderful boto Python library.
Once the snapshot completes, use AWS API commands to create a volume from it and attach the volume your instance, or preferably to a separate instance, and then mount it.
On the EBS snapshot you will find a read-only copy of your database from the point in time you took the snapshot, as if your server crashed at that moment. PostgreSQL is crashsafe, so that's fine (unless you did something really stupid like set fsync=off in postgresql.conf). Copy the entire database structure to your final backup, e.g archive it to S3 or whatever.
Unmount, unlink, and destroy the volume containing the snapshot.
This is a terribly inefficient way to do what you want, but it will work.
It is vitally important that you regularly test your backups by restoring them to a temporary server and making sure they're accessible and contain the expected information. Automate this, then check manually anyway.
Can't use EBS snapshots?
If your volume is mapped via LVM, you can do the same thing at the LVM level in your Linux system. This works for the lvm-on-md-on-striped-ebs configuration. You use lvm snapshots instead of EBS, and can only do it on the main machine, but it's otherwise the same.
You can only do this if your entire DB is on one file system.
No LVM, can't use EBS?
You're going to have to restart the database. You do not need to restart it to change pg_hba.conf, a simple reload (pg_ctl reload, or SIGHUP the postmaster) is sufficient, but you do indeed have to restart to change the archive mode.
This is one of the many reasons why backups are not an optional extra, they're part of the setup you should be doing before you go live.
If you don't change the archive mode, you can't use PITR, pg_basebackup, WAL archiving, pgbarman, etc. You can use database dumps, and only database dumps.
So you've got to find a time to restart. Sorry. If your client applications aren't entirely stupid (i.e. they can handle waiting on a blocked tcp/ip connection), here's how I'd try to do it after doing lots of testing on a replica of my production setup:
Set up a PgBouncer instance
Start directing new connections to the PgBouncer instead of the main server
Once all connections are via pgbouncer, change postgresql.conf to set the desired archive mode. Make any other desired restart-only changes at the same time, see the configuration documentation for restart-only parameters.
Wait until there are no active connections
SIGSTOP pgbouncer, so it doesn't respond to new connection attempts
Check again and make sure nobody made a connection in the interim. If they did, SIGCONT pgbouncer, wait for it to finish, and repeat.
Restart PostgreSQL
Make sure I can connect manually with psql
SIGCONT pgbouncer
I'd rather explicitly set pgbouncer to a "hold all connections" mode, but I'm not sure it has one, and don't have time to look into it right now. I'm not at all certain that SIGSTOPing pgbouncer will achieve the desired effect, either; you must experiment on a replica of your production setup to ensure that this is the case.
Once you've restarted
Use WAL archiving and PITR, plus periodic pg_dump backups for extra assurance.
See:
WAL-E
PgBarman
... and of course, the backup chapter of the user manual, which explains your options in detail. Pay particular attention to the "SQL Dump" and "Continuous Archiving and Point-in-Time Recovery (PITR)" chapters.
PgBarman automates PITR option for you, including scheduling, and supports hooks for storing WAL and base backups in S3 instead of local storage. Alternately, WAL-E is a bit less automated, but is pre-integrated into S3. You can implement your retention policies with S3, or via barman.
(Remember that you can use retention policies in S3 to shove old backups into Glacier, too).
Reducing future pain
Outages happen.
Outages of single-machine setups on something as unreliable as Amazon EC2 happen a lot.
You must get failover and replication in place. This means that you must restart the server. If you do not do this, you will eventually have a major outage, and it will happen at the worst possible time. Get your HA setup sorted out now, not later, it's only going to get harder.
You should also ensure that your client applications can buffer writes without losing them. Relying on a remote database on an Internet host to be available all the time is stupid, and again, it will bite you unless you fix it.
In a server with a single postgres database, is it possible to migrate the whole database onto a different server (running the same OS, etc) without going through the usual time-consuming way of dumping and importing (pg_dump)?
After all, everything must still be in the filesystem?
Assumptions are the postgres service is not running, and the servers are running Ubuntu.
Also, if you want, you can use pg_basebackup which will connect over the network connection and request a copy of all files. This is preferable where the architecture, OS, etc. is not changing. For more complex cases, see barman which will manage this process for you.
I'm a new to PostgreSQL and I'm looking to backup the database. I understand that there are 3 methods pg_dump, snapshot and copy and using WAL. Which one do you suggest for full backup of the database? If possible, provide code snippets.
It depends a lot more on your operational requirements than anything else.
All three will require shelling out to an external program. libpq doesn't provide those facilities directly; you'll need to invoke the pg_basebackup or pg_dump via execv or similar.
All three have different advantages.
Atomic snapshot based backups are useful if the filesystem supports them, but become useless if you're using tablespaces since you then need a multivolume atomic snapshot - something most systems don't support. They can also be a pain to set up.
pg_dump is simple and produces compact backups, but requires more server resources to run and doesn't support any kind of point-in-time recovery or incremental backup.
pg_basebackup + WAL archiving and PITR is very useful, and has a fairly low resource cost on the server, but is more complex to set up and manage. Proper backup testing is imperative.
I would strongly recommend allowing the user to control the backup method(s) used. Start with pg_dump since you can just invoke it as a simple command line and manage a single file. Use the -Fc mode and pg_restore to restore it where needed. Then explore things like configuring the server for WAL archiving and PITR once you've got the basics going.
what's the best way to dump a large(terabytes) db? are there other faster/efficient way besides mysqldump? this is intended to be zipped, unzipped, and then reimported into another mysql db on another server.
If it's possible for you to stop the database server, the best way is probably for you to:
Stop the database
Do a file copy of the files (including appropriate transaction logs, etc) to a new file system.
Restart the database.
Then move the copied files to the new server and bring up the database on top of the files. It's a bit complicated to do this, but it's by far the fastest way.
I used to be a DBA for a terabyte+ database in MySQL and this is one of the ways we'd do nightly backups of the database. mysqldump would've never worked for data that large. We'd stop the database each night and file copy the underlying files.
Since your intent seems to be having two copies of the DB, why not set up replication to do this?
That will ensure that both copies of the DB remain in an identical state (in terms of data anyway).
And, if you want a snapshot to be exported, you can:
wait for a quiet time.
disable replication.
back up the slave copy.
re-enable replication.
I need to do some structural changes to a database (alter tables, add new columns, change some rows etc) but I need to make sure that if something goes wrong i can rollback to initial state:
All needed changes are inside a SQL script file.
I don't have administrative access to database.
I really need to ensure the backup is done on server side since the BD has more than 30 GB of data.
I need to use sqlplus (under a ssh dedicated session over a vpn)
Its not possible to use "flashback database"! It's off and i can't stop the database.
Am i in really deep $#$%?
Any ideas how to backup the database using sqlplus and leaving the backup on db server?
better than exp/imp, you should use rman. it's built specifically for this purpose, it can do hot backup/restore and if you completely screw up, you're still OK.
One 'gotcha' is that you have to backup the $ORACLE_HOME directory too (in my experience) because you need that locally stored information to recover the control files.
a search of rman on google gives some VERY good information on the first page.
An alternate approach might be to create a new schema that contains your modified structures and data and actually test with that. That presumes you have enough space on your DB server to hold all the test data. You really should have a pretty good idea your changes are going to work before dumping them on a production environment.
I wouldn't use sqlplus to do this. Take a look at export/import. The export utility will grab the definitions and data for your database (can be done in read consistent mode). The import utility will read this file and create the database structures from it. However, access to these utilities does require permissions to be granted, particularly if you need to backup the whole database, not just a schema.
That said, it's somewhat troubling that you're expected to perform the tasks of a DBA (alter tables, backup database, etc) without administrative rights. I think I would be at least asking for the help of a DBA to oversee your approach before you start, if not insisting that the DBA (or someone with appropriate privileges and knowledge) actually perform the modifications to the database and help recover if necessary.
Trying to back up 30GB of data through sqlplus is insane, It will take several hours to do and require 3x to 5x as much disk space, and may not be possible to restore without more testing.
You need to use exp and imp. These are command line tools designed to backup and restore the database. They are command line tools, which if you have access to sqlplus via your ssh, you have access to imp/exp. You don't need administrator access to use them. They will dump the database (with al tables, triggers, views, procedures) for the user(s) you have access to.