I know there are two ways of making a copy of a database.
One is to export the database as a giant SQL file, then load it as a separate database:
pg_dump <database> | psql <new database>
Another way is to pass the database name as a template to a database creation argument:
createdb -T <database> <new database>
What is the difference between these two methods, if any?
Are there any benefits of using one over another, such as performance?
Using CREATE DATABASE/createdb with a template makes a directory-level copy, whereas pg_dump + psql has to serialize and deserialize the whole database, send it on a round trip to the client, and run everything through the transaction and write-ahead logging machinery. So the former method should be much faster.
The disadvantage is that CREATE DATABASE locks the template database while it's being copied. So if you want to create copies of a live database, that won't work so well. But if you want to quickly make copies of an inactive/template database, then using CREATE DATABASE is probably the right solution.
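For illustration, the same template copy can be issued through psql rather than createdb (a sketch; the database names are placeholders, and it assumes no other sessions are connected to the source database):
# run as a superuser or as the owner of source_db
psql -d postgres -c "CREATE DATABASE new_db TEMPLATE source_db;"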
According to the current docs:
Although it is possible to copy a database other than template1 by
specifying its name as the template, this is not (yet) intended as a
general-purpose "COPY DATABASE" facility. The principal limitation is
that no other sessions can be connected to the template database while
it is being copied. CREATE DATABASE will fail if any other connection
exists when it starts; otherwise, new connections to the template
database are locked out until CREATE DATABASE completes.
Apart from that mild warning, which goes back to at least version 8.2, you can make certain kinds of changes when using createdb, things like changing the collation, encoding, etc. (within limits).
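For example, something like this copies a template while choosing an encoding and collation (a sketch; the names and locale values are placeholders, and in practice changing encoding/collation is only unrestricted when the template is template0):
# create a new database from template0 with an explicit encoding and collation
createdb -T template0 -E UTF8 --lc-collate=en_US.UTF-8 --lc-ctype=en_US.UTF-8 new_db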
Personally, I'd have a hard time justifying the use of createdb, which takes a full database lock, to copy a production database.
I think the other main difference is that "dump and load" is a fully supported way of copying a database. Also, you can carry a copy of the dump to an isolated development computer for testing if you need to. (With createdb, the source and the new copy have to be on the same server at the same time.) But I have not used createdb to make copies, so I could be wrong.
I have a PostgreSQL instance A with 10 tables, and another instance B hosted on a different box, which contains the same 10 tables but also many others. I'd like to clone all 10 tables from the small database A to overwrite their equivalents in the larger database B. What's a good way to do this?
One path I'm considering is to do a full pg_dump of A, copy that dump file to B's host, then pg_restore it into B. It seems like it should work since I do want every single table on A to overwrite the table of the same name on B, but I'm just a bit nervous doing a pg_restore of a full database dump, and I'm also not very familiar with pg_dump and pg_restore so it would be great to have that plan validated by someone more knowledgeable.
You can use a plain format pg_dump with the --clean option and specify the tables you want to dump with -t.
Then you get an SQL script that contains only the tables you want replaced, and each table is preceded by a DROP TABLE.
You can check the script before using it.
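For example, something like this produces and applies such a script (a sketch; the hosts, user, database, and table names are placeholders):
# dump only the wanted tables from A, with DROP TABLE statements included
pg_dump -h hostA -U someuser --clean -t table1 -t table2 dbA > tables.sql
# after reviewing tables.sql, apply it to B
psql -h hostB -U someuser -d dbB -f tables.sql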
We have a large (>40 GB) filestream-enabled db in production. I would like to automatically make a backup of this db and restore it to staging, for testing deployments. The nature of our environment is such that the filestream data is >90% of the data and I don't need it in staging.
Is there a way that I can make a backup of the db without the filestream data, as this would drastically reduce my staging disk and network requirements, while still enabling me to test a (somewhat) representative sample of prod?
I am assuming you have a fairly recent version of SQL Server. Since this is production, I am assuming you are using the full recovery model.
You can't just exclude individual tables from a backup. Backup and restore do not work like that. The only possibility I can think of is to do a backup of just the filegroups that do not contain the filestream data. I am not 100% sure whether you will be able to restore it, though, since I have never tried it. Spend some time researching partial backups and restoring a filegroup, and give it a try.
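A starting point for that research might look something like this (a sketch only; the server name, database name, filegroup, and backup path are assumptions, and you still need to verify that the restore behaves the way you want):
# back up only the filegroup(s) that do not contain the FILESTREAM data
sqlcmd -S "yourserver" -Q "BACKUP DATABASE YourDb FILEGROUP = 'PRIMARY' TO DISK = 'D:\backups\YourDb_primary.bak'"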
You can use the Generate Scripts interface and do one of the following:
copy all SQL objects and the data (without the filestream tables) and recreate the database
copy all SQL objects without the data; create the objects in a new database on the current SQL instance; copy the data that you need directly from the first database;
The first is the lazy option and probably will not work well with a big database. The second will work for sure, but you need to sync the data on your own (one way to do that with bcp is sketched at the end of this answer).
In both cases, open the Generate Scripts interface (right-click the database in SSMS, then Tasks > Generate Scripts), choose all objects and all tables except the big ones, and use the scripting options to control whether the data is extracted (skipped or included).
I guess it will be best to script all the objects without the data. Then create a model database. You can even add some sample data to your model database. When you change the production database (create a new object, delete an object, etc.), apply these changes to your model database, too. Having such a model database means you have a copy of your production database with all supported functionality, and you can restore this model database on every test SQL instance you want.
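If you go with the second option, one common way to copy the needed data directly between the two instances is the bcp utility (a rough sketch, separate from the Generate Scripts interface itself; the server, database, and table names are assumptions):
# export one table from the source instance in native format, using Windows authentication
bcp SourceDb.dbo.SomeTable out SomeTable.dat -S sourceServer -T -n
# import it into the freshly created database on the target instance
bcp TargetDb.dbo.SomeTable in SomeTable.dat -S targetServer -T -n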
We are creating a Dockerfile that can spin up a Postgres database container, using this image as a basis
https://hub.docker.com/_/postgres/
Every time we test we want to create a fresh database with the production database as a template - we want to copy the database structures without copying the data in tables etc.
Can someone provide an example of doing this? I need something concrete with database urls etc.
There are some good examples here, but some are a bit nebulous to a Postgres newb.
I see examples like this:
pg_dump production-db | psql test-db
I don't know what "production-db" / "test-db" refer to (are they URL strings?), so I am lost. Also, I believe this copies over all the data in the DB, and we really just want to copy the database structures (tables, views, etc).
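My best guess is something like the following, but I don't know whether the flags and connection URLs are right (the users, passwords, hosts, and database names here are just placeholders):
# copy only the schema (no table data) from production into the test container
pg_dump --schema-only "postgresql://produser:secret@prod-host:5432/proddb" | psql "postgresql://postgres:postgres@localhost:5432/testdb"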
I maintain a Django project with a database that has several model constraints that have fallen out of sync with the actual database. So, for example, some model fields have null=False set, but the database permits NULLs for the corresponding database column.
I'm curious if there is a utility, either in Django or a third-party Python script, that will take the SHOW CREATE TABLE output (in this case, using MySQL syntax) for each table and compare it with the python manage.py sql output, to highlight the discrepancies.
Granted, in an ideal situation, the database wouldn't fall out of sync with the Django model code in the first place, but since that's where I am, I'm curious if there's a solution to this problem before I write one myself or do the comparison manually.
./manage.py inspectdb generates a models file corresponding to the tables that exist in the database.
You can diff it with your current model files using a standard unix diff or any other fancy diffing tool to find the difference and plan your migration strategy.
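For example (a sketch; the path to your models file is an assumption):
# dump the models Django would infer from the live database
./manage.py inspectdb > introspected_models.py
# compare them with what the code says the models should be
diff introspected_models.py myapp/models.py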
While the former seems simpler and better, you can also see the diff at the SQL level: ./manage.py sqlall <app> generates the SQL for the current models, and SHOW CREATE TABLE <table-name> shows the SQL used to create the corresponding table in the database.
You might want to refer to http://code.google.com/p/django-evolution/, which automatically migrates the state of the database to match the current models. Note, however, that this project is old and seems abandoned.
I did come up with a quick and dirty means of doing what I described. It's not perfect, but if you run ./manage.py testserver, the test database will be created based on the model code. Then (using MySQL-specific syntax), you can dump the schema for the regular database and the test database to files:
$ mysqldump -uroot -p [database_name] --no-data=true > schema.txt
$ mysqldump -uroot -p [test_database_name] --no-data=true > test_schema.txt
Then you can simply diff schema.txt and test_schema.txt and find the differences.
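The final comparison is then just a plain diff:
# show where the live schema and the model-derived schema disagree
diff schema.txt test_schema.txt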
For PostgreSQL, do a manage.py syncdb on a temporary empty database, then dump production and temporary databases with pg_dump -sOx and compare the resulting files. Among visual diff tools, at least GNOME Meld seems to cope well with PostgreSQL dumps.
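Put together, that workflow might look roughly like this (a sketch; the database names are placeholders, and it assumes your settings point syncdb at the temporary database):
# build the schema implied by the current models in a scratch database
./manage.py syncdb
# dump both schemas without owners or privileges
pg_dump -sOx production_db > prod_schema.sql
pg_dump -sOx scratch_db > model_schema.sql
# compare, e.g. with Meld
meld prod_schema.sql model_schema.sql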
Due to an employee quitting, I've been given a project that is outside my area of expertise.
I have a product where each customer will have their own copy of a database. The UI for creating the database (licensing, basic info collection, etc) is being outsourced, so I was hoping to just have a single stored procedure they can call, providing a few parameters, and have the SP create the database. I have a script for creating the database, but I'm not sure the best way to actually execute the script.
From what I've found, this seems to be outside the scope of what an SP can easily do. Is there any sort of "best practice" for handling this sort of program flow?
Generally speaking, SQL scripts - both DML and DDL - are what you use for database creation and population. SQL Server has a command-line interface called SQLCMD that these scripts can be run through; see the MSDN SQLCMD tutorial.
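For instance, the outsourced UI (or whatever calls your procedure) could shell out to something like this (a sketch; the server name, script file, and scripting variable are assumptions):
# run the database-creation script, passing the customer database name as a scripting variable
# inside create_customer_db.sql, reference the variable as $(DbName)
sqlcmd -S "yourserver\instance" -d master -i create_customer_db.sql -v DbName="Customer123"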
Assuming there's no customization to the tables or columns involved, you could get away with using either detach/attach or backup/restore. These would require that a baseline database exists with no customer data. Then you use either of those methods to capture the database as-is. Backup/restore is preferable because detach/attach requires taking the database offline. But users need to be synced before they can access the new database.
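As a rough sketch of the backup/restore route (the database, file, and logical names are assumptions; the logical names in the MOVE clauses have to match what is actually in the baseline backup):
# capture the baseline database once, with no customer data in it
sqlcmd -S "yourserver" -Q "BACKUP DATABASE BaselineDb TO DISK = 'D:\backups\BaselineDb.bak'"
# restore it under each new customer's name, relocating the data and log files
sqlcmd -S "yourserver" -Q "RESTORE DATABASE Customer123 FROM DISK = 'D:\backups\BaselineDb.bak' WITH MOVE 'BaselineDb' TO 'D:\data\Customer123.mdf', MOVE 'BaselineDb_log' TO 'D:\data\Customer123_log.ldf'"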
If you already have the script to create the database, it is easy for them to run it from within their program. If you have any specific prerequisites for creating the database and setting permissions accordingly, you can wrap all of the scripts into one script file to execute.