Copying a tablespace from one PostgreSQL instance to another

I'm looking for a way to quickly "clone" a database from one PostgreSQL server to another.
Assuming...
I have a postgresql server running on HostA, serving 2 databases
I have 2 devices mounted on HostA, each storing the data for one of the databases (i.e. 1 database => 1 tablespace => 1 device)
I am able to obtain a "safe" point-in-time snapshot through some careful method
I am able to consistently and safely produce a clone of either of the 2 devices (you could even assume both databases only ever receive reads)
The hosts involved run CentOS 5.4
Is it possible to host a second PostgreSQL server on HostB, mount one of the cloned devices, and have the corresponding database "pop" into existence? (Kind of like when you copy MyISAM table files in MySQL.)
If this is at all possible, what is the mechanism (i.e. what DDL or pg commands should I look into)?
It's important for me to be able to move individual databases in isolation from each other. But if that isn't possible, would a similar approach work at the server level (cloning and respawning a whole server by copying the data directory over to a host with the same PostgreSQL install)?

Not easily, because a number of files (the shared catalogs, transaction status files, and so on) are shared between all databases in the same install, so every database depends on that cluster-wide state.
You can do it at the server (whole-cluster) level, but not at the individual database level. Just be sure to copy/clone the whole data directory and all external tablespaces. As long as you can make the clone atomically (either on the same filesystem or with a system that can do atomic clones across filesystems), you don't even need to stop the database on HostA to do it.
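A minimal sketch of the whole-cluster approach, assuming a brief planned outage (hostnames, paths, and the tablespace mount point are illustrative; a stop-free copy additionally requires the atomic snapshot capability described above):

# On HostA: stop the cluster cleanly so every file is consistent on disk
pg_ctl -D /var/lib/pgsql/data stop -m fast

# Copy the data directory AND every external tablespace to HostB.
# Entries under pg_tblspc/ are symlinks to absolute paths, so the
# tablespace mount points must be identical on HostB.
rsync -a /var/lib/pgsql/data/ hostb:/var/lib/pgsql/data/
rsync -a /mnt/tblspc_db1/ hostb:/mnt/tblspc_db1/

pg_ctl -D /var/lib/pgsql/data start

# On HostB: fix ownership and permissions, then start the same major version
chown -R postgres:postgres /var/lib/pgsql/data /mnt/tblspc_db1
chmod 700 /var/lib/pgsql/data
pg_ctl -D /var/lib/pgsql/data start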

Related

How to set up an existing Postgres DB on a new server?

I have a PostgreSQL DB on an AWS instance. For some reason the instance is now damaged, and the only thing I can do is detach the disk volume and attach it to a new instance.
The challenge now is: how do I set up the PostgreSQL DB from the damaged instance's volume on the new instance without losing any data?
I tried to attach the damaged instance's volume as the main volume on the new instance, but it doesn't boot. So I mounted the volume as a secondary disk instead, and now I can see the information on it, including the "data" folder where the Postgres DB information is supposed to be, but I don't know what to do to enable the DB on this new instance.
The files in the /path/to/data/ directory are all that a PostgreSQL instance needs in order to start up, provided the directory's permissions are set to 0700 and it is owned by the user starting the process (usually postgres, but sometimes others). The other things to bear in mind are:
The destination OS needs to be the same as the one the data/ directory came from (OS-level variations may corrupt your data or prevent Postgres from starting up)
The destination filesystem needs to be the same as the one the data/ directory came from (for similar reasons)
The Postgres major version needs to be the same as the one that produced the data/ directory (the on-disk format changes between major versions)
If these conditions are met, you should be able to bring up the database and use it.
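A rough sequence for the secondary-disk case, assuming the data directory lives on the attached volume (the device name, mount point, paths, and 9.3 version are illustrative; match whatever is actually on the disk):

# Mount the old volume somewhere convenient
mount /dev/xvdf1 /mnt/olddisk

# Restore the ownership and 0700 permissions that Postgres requires
chown -R postgres:postgres /mnt/olddisk/var/lib/postgresql/9.3/main
chmod 700 /mnt/olddisk/var/lib/postgresql/9.3/main

# Start a server of the SAME major version against that data directory
sudo -u postgres /usr/lib/postgresql/9.3/bin/pg_ctl \
    -D /mnt/olddisk/var/lib/postgresql/9.3/main start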

How to achieve consistency among multiple databases? [duplicate]

What is database clustering? If you allow the same database to be on 2 different servers, how do they keep the data synchronized between them? And how does this differ from load balancing, from a database server perspective?
Database clustering is a bit of an ambiguous term: some vendors consider a cluster to be two or more servers sharing the same storage, while others call a cluster a set of replicated servers.
Replication is the method by which a set of servers stays synchronized without sharing storage, which allows them to be geographically dispersed. There are two main ways of going about it:
master-master (or multi-master) replication: any server can update the database. It is usually handled by a separate module within the database (or, in some cases, a whole different piece of software running on top of it).
The downside is that it is very hard to do well, and some systems lose ACID properties in this mode of replication.
The upside is that it is flexible: you can survive the failure of any server and still have a writable database.
master-slave replication: there is only a single copy of the authoritative data, which is then pushed to the slave servers (a minimal PostgreSQL sketch follows this list).
The downside is that it is less fault tolerant: if the master dies, no further changes reach the slaves.
The upside is that it is easier to do than multi-master, and it usually preserves ACID properties.
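As a concrete illustration, a minimal master-slave (streaming replication) setup in PostgreSQL looks roughly like this; the user name, address, and paths are illustrative, and the exact settings vary by version:

# postgresql.conf on the master
wal_level = hot_standby        # called "replica" on 9.6 and later
max_wal_senders = 3

# pg_hba.conf on the master: let the slave connect for replication
host  replication  replicator  192.0.2.10/32  md5

# On the slave: clone the master; -R writes the standby settings for you
pg_basebackup -h master.example.com -U replicator \
    -D /var/lib/pgsql/data -X stream -R
pg_ctl -D /var/lib/pgsql/data start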
Load balancing is a different concept: it consists of distributing the queries sent to those servers so the load is spread as evenly as possible. It is usually done at the application layer (or with a connection pool). The only direct relation between replication and load balancing is that you need some form of replication to be able to load balance; otherwise you'd have a single server.
From a SQL Server point of view:
Clustering gives you an active-passive configuration. In a 2-node cluster, one node is active (serving) and the other is passive (waiting to take over when the active node fails). It provides high availability from a hardware point of view.
You can have an active-active cluster, but it requires multiple instances of SQL Server running on each node (i.e. instance 1 on node A failing over to instance 2 on node B, and instance 1 on node B failing over to instance 2 on node A).
Load balancing, at least from a SQL Server point of view, does not exist (at least not in the same sense as web server load balancing); you can't balance load that way. However, you can split your application so that it runs against some databases on server 1 and some databases on server 2, and so on. This is the primary means of "load balancing" in the SQL world.
Clustering uses shared storage of some kind (a drive cage or a SAN, for example) and puts two database front ends on it. The front-end servers share an IP address and a cluster network name that clients use to connect, and they decide between themselves who is currently in charge of serving client requests.
If you're asking about a particular database server, add that to your question and we can add details on its implementation, but at its core, that's what clustering is.
Database clustering is essentially a mode of synchronous replication between two or more nodes, with fault tolerance added to the system, in a shared-nothing architecture. "Shared nothing" means the individual nodes don't share any physical resources such as disk or memory.
As far as keeping the data synchronized is concerned, there is a management server to which all the data nodes and the SQL nodes are connected to achieve this (speaking specifically about MySQL Cluster).
As for the differences: load balancing is just one outcome that can be achieved through clustering; the others include high availability, scalability, and fault tolerance.

Easiest way to replicate (copy? export and import?) a large, rarely changing PostgreSQL database

I have imported about 200 GB of census data into a PostgreSQL 9.3 database on a Windows 7 box. The import process involved many files and was complex and time-consuming. I'm just using the database as a convenient container. The existing data will rarely if ever change, and I'll update it with external data at most once a quarter (though I'll be adding and modifying intermediate-result columns much more frequently). I'll call the database on my desktop the "master." All queries will come from the same machine, not remote terminals.
I would like to put copies of all that data on three other machines: two laptops (one Windows 7 and one Windows 8) and an Ubuntu virtual machine on my Windows 7 desktop. I have installed PostgreSQL 9.3 on each of these machines, currently empty of data. I need to be able to do both reads and writes on the copies. It is OK, and indeed I would prefer it, if changes in the daughter databases do not propagate back to the master database on my desktop. I'd want to update the daughters from the master 1 to 4 times a year. If this wiped out intermediate results on the daughter databases, it would not bother me.
Most of the replication techniques I have read about seem to be concerned with transaction-by-transaction replication of a live and constantly changing server, and with a perfect history of queries and changes. That is overkill for me. Is there a way to replicate by just copying certain files from one PostgreSQL instance to another? (If replication is the name of a specific form of copying, I'm trying to ask the more generic question.) Or maybe by restoring each (empty) instance from a backup file of the master? Or by asking PostgreSQL to create and export (ideally onto an external hard drive) some kind of binary form of the data that another PostgreSQL instance can import, without my having to define all the tables and data types and so forth again?
This question is also motivated by my desire to work around a home wifi/LAN setup that is very slow (a tenth or less of the speed of file copies to an external hard drive). So if there is a straightforward way to get the imported data from one machine to another by transferring (ideally compressed) binary files, that would work best for my situation.
While you could perhaps copy the data directory directly, as mentioned by Nick Barnes in the comments above, I would recommend a combination of pg_dump and pg_restore, which produces a self-contained dump that can then be distributed to the other copies.
You can run pg_dump on the master to get a dump of the DB. I would recommend the options -Fd -j3, which use the directory format (instead of dumping plain SQL; this is compressed, so much smaller, and it is the only format that supports parallel dumping) and dump 3 tables at once (adjust the number up or down depending on your machine's disk throughput and core count). Note that pg_dump's parallel -j option requires the directory format; the single-file custom format (-Fc) can only be dumped serially.
Then on each copy you run dropdb to remove the old copy, createdb to recreate an empty DB of the same name, and pg_restore against that new empty DB to restore the dump. You would want the options -d <dbname> -j3, passing the dump directory as the final argument (note that pg_restore's -f option names an output file, not the input), again adjusting -j to the abilities of the machine.
When you want to refresh the copies with new content from the master DB, simply repeat the above steps.
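Putting it together, a refresh cycle might look like this (the database name and dump path are illustrative; run the restore steps on each daughter machine):

# On the master: parallel dump in directory format, e.g. to an external drive
pg_dump -Fd -j3 -f /mnt/external/census_dump census

# On each copy: drop the stale DB, recreate it empty, restore in parallel
dropdb census
createdb census
pg_restore -d census -j3 /mnt/external/census_dump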

PostgreSQL: direct replication on demand

I have two PostgreSQL databases on different servers: let's call them stable and unstable. The stable one is for read purposes only. From time to time, stable is updated to match the current unstable.
I would like to improve the update process. Dumping to a file and restoring is no longer possible due to the size of the database. I have found many replication tools designed for a master-slave configuration, but that is not exactly what I'm doing, as the slave (stable) is updated on demand. The tools seem more complicated than the problem I want them to solve.
Ideally, I would like a simple tool with an interface like pg_replicate source target, which would make the target DB an exact clone of the source, with the data transferred directly between the two database servers (they are connected by a local network) without creating large temporary files. The database is 50 GB and growing, so transferring SQL inserts might not suffice.
So, is there a tool that can do this? Or anything close?
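For what it's worth, the stock pg_basebackup utility comes fairly close to that hypothetical pg_replicate interface: it streams a full physical copy of the source cluster directly over the network, with no intermediate dump files. A sketch, assuming replication access is already configured on the source (the hostname, replicator role, and paths are illustrative):

# On the stable server: stop it and discard the old copy
pg_ctl -D /var/lib/pgsql/data stop -m fast
rm -rf /var/lib/pgsql/data

# Pull a physical clone straight from the unstable server
pg_basebackup -h unstable.example.com -U replicator \
    -D /var/lib/pgsql/data -X stream -P

pg_ctl -D /var/lib/pgsql/data start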

Access Newly Created PostgreSQL Cluster

I am looking to create a new database cluster in PostgreSQL because I want this cluster to point to a different (and larger) data directory than my current cluster on localhost:5432. To create a new cluster, I ran the command below. However, after restarting PostgreSQL, I don't see the new cluster in pgAdmin and can't connect to a server on port 5435. How do I connect to this new cluster? Alternatively, I thought I could create a new tablespace within the old cluster pointing to this larger data directory, but I'm assuming that populating a database using this tablespace would still result in files being written to the cluster data directory? I'm running PostgreSQL 9.3 on Ubuntu 12.04.
$ pg_createcluster -d /home/foo 9.3 test_env
Creating new cluster 9.3/test_env ...
config /etc/postgresql/9.3/test_env
data /home/foo/
locale en_US.UTF-8
port 5435
You need to specify the --start option if you want pg_createcluster to actually start the cluster; without it, the command just creates the data directory.
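For example, with the same arguments as above (-p pins the port rather than letting it be auto-assigned):

pg_createcluster --start -d /home/foo -p 5435 9.3 test_env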
However, using tablespaces would probably be a better solution, since running multiple clusters introduces the overhead of completely separate postmaster processes. If you create a database in a tablespace, all files associated with the database go into the associated directory. The system catalogue will still be in the cluster's data directory, but it is unlikely to be large enough to cause you any problems.
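A sketch of the tablespace approach (the directory and names are illustrative; the location must exist, be empty, and be owned by the postgres user):

mkdir -p /home/foo/pg_tblspc
chown postgres:postgres /home/foo/pg_tblspc
sudo -u postgres psql -c "CREATE TABLESPACE bigdisk LOCATION '/home/foo/pg_tblspc'"
sudo -u postgres psql -c "CREATE DATABASE bigdb TABLESPACE bigdisk"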
