I am attempting to import a 40GB SQL backup on one of my servers and the software I used to automate the backups apparently included the "information_schema". Are there any tools/scripts/etc. that could be used to remove this data?
Due to the size of the SQL file, Notepad++ cannot open it (file too large), and the other text editors I have tried make it very difficult to tell which statements belong to information_schema.
About at my wit's end and hoping there is something that could simplify removing this data from the SQL dump. I tried running the import with "-f" to force past it but it made what appears to be a bit of a mess.
I have tried to figure this out, but the only idea I have is to use grep to remove the information_schema TABLES, like this (this example removes 3 tables from dump.sql):
egrep -v '(GLOBAL_VARIABLES|CHARACTER_SETS|COLLATIONS)' dump.sql > new_dump.sql
In practice, around 40 table names would need to be listed.
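If you don't want to type them all by hand, one rough way (a sketch, and still crude, since it drops any line matching those names anywhere in the dump, so check the result) is to let a MySQL server of the same version list the information_schema table names for you and build the pattern from that:
PATTERN=$(mysql -N -B -e "SELECT table_name FROM information_schema.tables WHERE table_schema = 'information_schema'" | paste -sd'|' -)
egrep -v "($PATTERN)" dump.sql > new_dump.sql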
I hope you succeed.
FYI: one database is 40GB? How many tables are in there? To restore a dump file quickly, consider later separating big tables or databases from each other and loading them concurrently.
Related
I need to back up a Drupal database and it is huge. It has over 1500 tables (don't blame me, it's a Drupal thing) and is 10GB in size.
I couldn't do it with phpMyAdmin; I just got an error when it started to build the .sql file.
I want to make sure I won't break anything or take the server down when I try to back it up.
I was going to attempt a mysqldump on my server and then copy the file down locally, but realised that this may cause unforeseen problems. So my question to you is: is it safe to use mysqldump on so many tables at once, and even if it is safe, are there any problems such a huge file could lead to in the future when rebuilding the database?
Thanks for the input guys.
is it safe to use mysqldump on so many tables at once
I run daily backups with mysqldump on servers literally 10x this size: 15000+ tables, 100+ GB.
If you have not examined the contents of a file produced by mysqldump ... you should, because to see its output is to understand why it is an intrinsically safe backup utility:
The backups are human-readable, and consist entirely of the necessary SQL statements to create a database exactly like the one you backed up.
In this form, their content is easily manipulated with ubiquitous tools like sed and grep and perl, which can be used to pluck out just one table from a file for restoration, for example.
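For example, here is a quick sketch of plucking one table out of a dump; the table and file names are placeholders, and it relies on the `-- Table structure for table ...` comment that mysqldump writes above each table:
sed -n '/^-- Table structure for table `my_table`/,/^-- Table structure for table `/p' full_backup.sql > my_table_only.sql
# the last line captured is the next table's header comment; delete it, then restore with:
mysql my_database < my_table_only.sql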
If a restoration fails, the error will indicate the line number within the file where the error occurred. This is usually related to buggy behavior in the version of the server where the backup was created (e.g. MySQL Server 5.1 allowed you to create views in some situations where the server itself would not accept the output of its own SHOW CREATE VIEW statement. The create statement was not considered -- by the same server -- to be a valid view definition, but this was not a defect in mysqldump, or in the backup file, per se.)
Restoring from a mysqldump-created backup is not lightning fast, because the server must execute all of those SQL statements, but from the perspective of safety, I would argue that there isn't a safer alternative, since it is the canonical backup tool and any bugs are likely to be found and fixed by virtue of the large user base, if nothing else.
Do not use the --force option, except in emergencies. It will cause the backup to skip over any errors encountered on the server while the backup is running, causing your backup to be incomplete with virtually no warning. Instead, find and fix any errors that occur. Typical errors during backup are related to views that are no longer valid because they reference tables or columns that have been renamed or dropped, or where the user who originally created the view has been removed from the server. Fix these by redefining the view, correctly.
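For example, if the backup stops on a view whose base column was renamed, the fix is usually just redefining it; the view and column names below are made up:
CREATE OR REPLACE VIEW v_active_users AS
  SELECT id, email FROM users WHERE deleted_at IS NULL;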
Above all, test your backups by restoring them to a different server. If you haven't done this, you don't really have backups.
The output file can be compressed, usually substantially, with gzip/pigz, bzip2/pbzip2, xz/pixz, or zpaq. These are listed in approximate order by amount of space saved (gzip saves the least, zpaq saves the most) and speed (gzip is the fastest, zpaq is the slowest). pigz, pbzip2, pixz, and zpaq will take advantage of multiple cores, if you have them. The others can only use a single core at a time.
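As a sketch of the compressed round trip with pigz (the database and file names are placeholders, and --single-transaction assumes InnoDB tables):
mysqldump --single-transaction drupal_db | pigz > drupal_db.sql.gz
pigz -dc drupal_db.sql.gz | mysql drupal_db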
Use mysqlhotcopy; it works well with large databases.
It works only with MyISAM and ARCHIVE tables.
It works only on the server where the database is stored.
This utility was deprecated in MySQL 5.6.20 and removed in MySQL 5.7.
I have imported about 200 GB of census data into a PostgreSQL 9.3 database on a Windows 7 box. The import process involves many files and has been complex and time-consuming. I'm just using the database as a convenient container. The existing data will rarely if ever change, and I will be updating it with external data at most once a quarter (though I'll be adding and modifying intermediate result columns on a much more frequent basis). I'll call the data in the database on my desktop the “master.” All queries will come from the same machine, not remote terminals.
I would like to put copies of all that data on three other machines: two laptops (one Windows 7 and one Windows 8) and an Ubuntu virtual machine on my Windows 7 desktop as well. I have installed copies of PostgreSQL 9.3 on each of these machines, currently empty of data. I need to be able to do both reads and writes on the copies. It is OK, and indeed I would prefer it, if changes in the daughter databases do not propagate back to the primary database on my desktop. I'd want to update the daughters from the master 1 to 4 times a year. If this wiped out intermediate results on the daughter databases, that would not bother me.
Most of the replication techniques I have read about seem to be concerned with transaction-by-transaction replication of a live and constantly changing server, and with a perfect history of queries and changes. That is overkill for me. Is there a way to replicate by just copying certain files from one PostgreSQL instance to another? (If replication is the name of a specific form of copying, I'm trying to ask the more generic question.) Or maybe by restoring each (empty) instance from a backup file of the master? Or by asking PostgreSQL to create and export (ideally onto an external hard drive) some kind of PostgreSQL binary of the data that another instance of PostgreSQL can import, without my having to define all the tables and data types and so forth again?
This question is also motivated by my desire to work around a home wifi/lan setup that is very slow – a tenth or less of the speed of file copies to an external hard drive. So if there is a straightforward way to get the imported data from one machine to another by transference of (ideally compressed) binary files, this would work best for my situation.
While you could perhaps copy the data directory directly as mentioned by Nick Barnes in the comments above, I would recommend using a combination of pg_dump and pg_restore, which will dump a self-contained file which can then be dispersed to the other copies.
You can run pg_dump on the master to get a dump of the DB. I would recommend using one of the binary archive formats instead of dumping plain SQL; the output should be much smaller and perhaps faster to produce as well. With -Fc you get a single custom-format file, and with -Fd (directory format) you can also add -j3 to dump 3 tables at once (this can be adjusted up or down depending on the disk throughput capabilities of your machine and the number of cores that it has); note that parallel dump requires the directory format.
Then you run dropdb on the copies, createdb to recreate an empty DB of the same name, and then run pg_restore against that new empty DB to restore the dump to it. You would want to use the options -d <dbname> -j3 and pass the dump file (or dump directory) as the last argument, again adjusting the number for -j according to the abilities of the machine.
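Roughly, the whole cycle could look like this (a sketch; the database name census and the path /mnt/usb/census.dump are placeholders):
# on the master: directory-format dump, 3 parallel jobs
pg_dump -Fd -j3 -f /mnt/usb/census.dump census
# on each copy: drop, recreate, and restore in parallel
dropdb census
createdb census
pg_restore -d census -j3 /mnt/usb/census.dump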
When you want to refresh the copies with new content from the master DB, simply repeat the above steps.
I just started using HeidiSQL to manage my databases at work. I was previously using NaviCat, and I was able to simply drag and drop a database from one server to a database on another server and copy all the tables and data--piece of cake, backed up. Similar functionality isn't obvious in Heidi, but I decided using mysqldump is a standard way of backing up a database and I should get to know it.
I used Heidi to perform the dump--creating databases, creating tables, inserting data, and making one single SQL file. This was performed on one single database that is 801.7 MB (checksum: 1755734665). When I executed the dump file on my local doppelganger database it appeared to work flawlessly, however the database size is 794.0 MB (checksum: 2937674450).
So, by creating a new database from a mysqldump of a database I lost 7.7 MB of something (and the databases have different checksums--not sure what that means though). I searched and found a post saying that performing a mysqldump can optimize a database, but could not find any conclusive information.
Am I being a helicopter parent? Is there any easy way to make sure I didn't lose data/structure somehow?
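One sanity check I can think of (a sketch; my_table and the database names are placeholders, and note that the result also depends on the row format, so identical data stored under a different engine or version can still checksum differently) would be to run the same CHECKSUM TABLE on both servers and compare:
mysql -e "CHECKSUM TABLE my_table" work_db
mysql -e "CHECKSUM TABLE my_table" doppelganger_db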
Thank you,
Kai
I inherited a project that uses a PostgreSQL database, and it has 19 stored procedures (functions) and some 70 views.
Recently we did an update on the live database, and because the functions had changed, a Postgres limitation meant we had to drop and recreate all the functions and views, so we spent quite some time doing that.
Is there an automated way of changing functions and views in Postgres that takes care of dependencies and does it in the proper order?
We have basic views that are then used to build upper-level views ... it's a bit of a complex database, at least for me :)
Thanks
I think the easiest way to do this is to back up the database to a text file:
pg_dump database_name > database_name.pg_dump
They'll be in proper dependency order, as otherwise restoring a database from backup would be hard. You can edit the function and view definitions in the backup file and restore it to a new database.
If the backup file is too big to edit in your editor, then starting with Postgres 9.2 you can split it into 3 sections:
pg_dump --section=pre-data database_name > database_name.1.pg_dump
pg_dump --section=data database_name > database_name.2.pg_dump
pg_dump --section=post-data database_name > database_name.3.pg_dump
You'll edit only the first section, which will be small. In older versions you could use, for example, the split utility.
If you cannot afford the downtime required for a backup and restore, it gets trickier. But I'd still recommend working with the backup file. Remember that Postgres supports DDL in transactions: if you import functions and views in a transaction and there is an error, you can simply roll back all the changes, make corrections, and try again.
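For example (a sketch; the file and database names are placeholders), psql's --single-transaction flag wraps the whole file in one transaction, so any error rolls everything back automatically:
psql --single-transaction -f database_name.1.pg_dump new_database_name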
There is no "easy" way. The best approach IMO is to be prepared first and the set up a way to do this using SQL scripts and version control.
What we do in LedgerSMB is we keep the function definitions in a series of .sql files, which are tracked in subversion. We then have a script that reloads them. This will take some work to set up if you haven't done so before. The easiest way to do this is:
pg_dump -s mydb > ddl_statements_for_mydb.sql
Then you can copy/paste the function definitions (changing CREATE to CREATE OR REPLACE, or adding a DROP ... IF EXISTS where appropriate). Then you will want to modularize them into usable chunks and have a script that reloads all chunks in the right order into your db. The time and effort that goes into setting this up now will save many times that in the future, because you can apply changes in a predictable way to testing, staging, and production accounts with no appreciable downtime (perhaps even no downtime at all, depending on how you structure it).
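As a rough illustration of such a reload script (the directory layout and ordering here are made up; adapt them to your own dependencies), it can be as simple as feeding the chunks to psql in order inside one transaction:
#!/bin/sh
# reload_ddl.sh: reload function and view definitions in dependency order
set -e
(
  echo "BEGIN;"
  cat functions/*.sql views/base/*.sql views/reporting/*.sql
  echo "COMMIT;"
) | psql -v ON_ERROR_STOP=1 mydb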
I have an SQL file generated by "mysqldump --all-databases". There are many databases in it. What I want to do is update my local database, but only a specific one, not all of them. I tried to use "mysql --database=db_name < file.sql" but it updated all databases. Is there a way to skip all databases except the one that I want?
You can try doing:
mysql -D example_database -o < dump.sql
This will only execute the SQL commands for the specified database and will skip the commands for all other databases. The -o ("one database") option is critical to this working as expected (it tells mysql to ignore statements related to other databases).
dump.sql is the result of executing mysqldump --all-databases
It's getting dirty now, so stop reading here, or be warned ;)
It's not pretty, and maybe there is a better (correct) way to achieve this. But assuming the dumps you are working with aren't too big, you might be quicker importing the full dump into a temporary DB and creating a fresh dump of just the database you'd like to import. (As I said, not pretty.)
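Something along these lines (a sketch; the scratch host and database names are placeholders):
# load the full dump into a scratch MySQL instance, then re-dump just the one database
mysql -h scratch-host < full_dump.sql
mysqldump -h scratch-host example_database > example_database_only.sql
mysql example_database < example_database_only.sql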
Additionally, you should really make sure that you're able to restore backups you make (in any imaginable way). This could get really embarrassing the day you need them urgently.
A couple of ideas:
You can edit the dump file (it's just a text file, after all), extracting only the items related to the database you want to restore. I think each database is written to the file individually, so this is a matter of identifying the one big block related to the one you want and deleting the bits before and after it, i.e. not as much of a chore as it sounds. Look for the CREATE DATABASE and USE statements; there's also (at least in my version, with my options) a banner comment "Current Database: foo" at the top of each section (see the sketch after these ideas). This would be pretty easy to do with vi or anything else that lets you do large operations easily. You can then search through the result, ensuring that there are no cross-references to the DBs you don't want updated.
You can back up the databases you don't want updated, do the update, then restore them. I mean, before you do this sort of thing, you have a backup anyway, right? ;-)
You can restore to a blank MySQL instance, then grab the backup of the one you want. Blech.
(Variation on #3) You can do a rename on all of your current databases, import the file, then drop the ones you don't want and rename the originals back. Also blech.
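For idea #1, a rough sketch of carving one database's section out of the file using that banner comment (database and file names are placeholders, and the exact comment text can vary between mysqldump versions):
sed -n '/^-- Current Database: `example_database`/,/^-- Current Database: `/p' full_dump.sql > example_database_only.sql
# the last line captured is the next database's banner; delete it, review the file, then import:
mysql < example_database_only.sql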
I'd probably go with #1 (after a full backup). Good luck with it.
Edit: No, I'd go with codaddict's -D databasename -o solution (after a full backup). Nice one.