We're using Solr 3.6 replication with two servers - a master and a slave - and we're currently looking for a way to do clean backups.
As the wiki says, we can use an HTTP command to create a snapshot of the master, like this: http://myMasterHost/solr/replication?command=backup
But we still have some questions:
What is the benefit of the backup command over a classic shell script that copies the index files?
The command only backs up the index; is it possible to also copy the spellchecker folder? Is it needed?
Can we create the snapshot while the application is running, i.e. while there may be index updates?
When we have to restore the servers from the backup, what do we have to do on the slave?
just copy the snapshot into its index folder, and remove the replication.properties file (or not)?
ask for a fetchindex through the HTTP command http://mySlave/solr/replication?command=fetchindex ?
just empty the slave index folder, in order to force a full replication from the master?
You can use the backup command provided by the ReplicationHandler. It's an asynchronous operation, and it takes time if your index is big, but this way you don't need to shut down Solr. Afterwards you'll find within the index directory a new directory named backup.yyyymmddHHMMSS with the backup date. You can also configure how many old backups you want to keep.
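A minimal sketch of triggering and monitoring the backup with curl, using the same host name as in the question; numberToKeep is optional and, if your version supports it, limits how many old backups are retained:

    # Trigger an asynchronous backup on the master, keeping at most 3 old backups
    curl "http://myMasterHost/solr/replication?command=backup&numberToKeep=3"

    # Poll the handler; the response reports the last backup once it has completed
    curl "http://myMasterHost/solr/replication?command=details"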
After that, of course, it's better to move the backup to a safe location, probably on a different server.
I don't think it's possible to back up the spellchecker, though I'm not completely sure.
Of course the command is meant to be run while the application is running. The only caveat is that the backup will probably not include documents committed after the backup was started.
You can also have a look at the Lucene CheckIndex tool: once you've backed up the index, you can verify that the copy is OK.
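For instance, a rough sketch of running CheckIndex against the backup copy (the jar name and backup path are placeholders; use the lucene-core jar that matches your Solr version):

    # Verify the backed-up index; only add -fix if you accept losing corrupt segments
    java -cp lucene-core-3.6.0.jar org.apache.lucene.index.CheckIndex /path/to/backup/index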
I personally wouldn't use the backups to restore the index on the slaves if you already have a good index on the master. Copying the index is automatic with the standard replication process (it's really a copy of the index segments); you don't need to copy anything manually unless the backup contains better data than the master.
Related
I removed some indexes on a very large table and then realized I needed them. Instead of adding them back concurrently, which would take a very long time, could I just restore them from a database copy that was taken before the indexes were removed?
If by "database copy" you mean a copy of the Postgres DB directory at file level (with Postgres not running to get a consistent state), then yes, such a snapshot includes everything, indexes too. You could copy that back on file level, and then start Postgres - falling back to the previous state, of course.
If, OTOH, you mean a backup made with the standard Postgres tools pg_dump or pg_dumpall, then no, indexes are not included physically - just the instructions to build them. It would not make sense to include huge chunks of functionally dependent values. Building them from the restored data may be about as fast.
Either way, you could not add an index from an older snapshot back to a live DB after changes have been made to the table anyway. That's logically impossible, so there is no alternative to rebuilding the index one way or another.
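To make the first case concrete, here is a rough sketch of a file-level snapshot and restore with the cluster stopped; the data directory path and the pg_ctl invocation are just examples for a typical layout:

    # Take a consistent file-level snapshot (Postgres must not be running)
    pg_ctl -D /var/lib/postgresql/data stop -m fast
    rsync -a /var/lib/postgresql/data/ /backups/pgdata-snapshot/
    pg_ctl -D /var/lib/postgresql/data start

    # Restoring later means putting the whole snapshot back - indexes included -
    # not grafting a single index onto a newer database
    pg_ctl -D /var/lib/postgresql/data stop -m fast
    rsync -a --delete /backups/pgdata-snapshot/ /var/lib/postgresql/data/
    pg_ctl -D /var/lib/postgresql/data start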
I'll answer for MySQL. You tagged your question with both mysql and postgresql so I don't know which one you really use.
If your backup was a physical backup made with a backup solution like Percona XtraBackup or MySQL Enterprise Backup, it will include the indexes, so restoring it will be quicker.
If your backup was a logical backup made with mysqldump or mydumper, then the backup includes only data. Restoring it will have to rebuild the indexes anyway. It will not save any time.
If you made the mistake of making a "backup" only by copying files out of the data directory, those are sort of like the physical backup, but unless you copied the files while the MySQL Server was shut down, the backup is probably not viable.
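A hedged sketch of the difference, using the tools mentioned above; target directories are placeholders and credentials are omitted:

    # Physical backup (Percona XtraBackup): copies data files and indexes as-is,
    # so a restore does not need to rebuild anything
    xtrabackup --backup --target-dir=/backups/full
    xtrabackup --prepare --target-dir=/backups/full

    # Logical backup (mysqldump): data plus CREATE statements only,
    # so every index is rebuilt from scratch during the restore
    mysqldump --single-transaction --all-databases > /backups/full.sql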
I have a cluster of 3 sharded Solr 4.1 nodes. There is a replicated cluster, but its data is quite out of sync; I stopped polling on those secondary nodes a long time ago.
Now I want to start the replication again, but I'm afraid it would take too long to replicate 400GB of index data on each node.
If I manually copy over the index files from the master to the slave node, will it work?
Thanks
Yes, that should work just fine - as long as you don't write to the index while copying it (or you copy it from a snapshot). In fact, that's what the replication does in the background (by replicating the segment files that need replicating).
In older versions of Solr the replication was just shell scripts triggered to copy the index to other servers after an update happened.
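As a rough sketch of such a manual copy (host names and paths are placeholders; it assumes indexing is paused on the master or you copy from a snapshot, and that the slave is not serving updates while the copy runs):

    # On the slave: pull the segment files straight from the master's index directory
    rsync -av --delete master-host:/path/to/solr/data/index/ /path/to/solr/data/index/
    # Restart the slave and re-enable polling so future updates flow through replication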
What's the best way to dump a large (terabytes) DB? Are there faster/more efficient ways besides mysqldump? The dump is intended to be zipped, unzipped, and then reimported into another MySQL DB on another server.
If it's possible for you to stop the database server, the best way is probably for you to:
Stop the database
Do a file copy of the files (including appropriate transaction logs, etc) to a new file system.
Restart the database.
Then move the copied files to the new server and bring up the database on top of the files. It's a bit complicated to do this, but it's by far the fastest way.
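A minimal sketch of that approach, assuming a Linux host where MySQL runs as a systemd service; the service name, paths, and target host are placeholders:

    # Cold copy of the data directory while the server is stopped
    systemctl stop mysql
    cp -a /var/lib/mysql /backups/mysql-snapshot
    systemctl start mysql

    # Compress and ship the copy to the other server in one pass
    tar -C /backups -czf - mysql-snapshot | ssh other-server 'tar -C /restore -xzf -'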
I used to be a DBA for a terabyte+ database in MySQL and this is one of the ways we'd do nightly backups of the database. mysqldump would've never worked for data that large. We'd stop the database each night and file copy the underlying files.
Since your intent seems to be having two copies of the DB, why not set up replication to do this?
That will ensure that both copies of the DB remain in an identical state (in terms of data anyway).
And, if you want a snapshot to be exported, you can:
wait for a quiet time.
disable replication.
back up the slave copy.
re-enable replication.
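A small sketch of that export step, assuming a MySQL replica and using mysqldump for the actual backup; credentials and the output path are placeholders:

    # On the replica: pause replication, dump a consistent copy, then resume
    mysql -e "STOP SLAVE;"
    mysqldump --all-databases --routines | gzip > /backups/replica-snapshot.sql.gz
    mysql -e "START SLAVE;"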
I wonder how to back up (dump) a Solr database.
If it is only a matter of copying some files, please specify which files (filename, location, etc.).
Thanks
We use Solr Replication to do our backup.
You can either have a slave that is dedicated to being a backup, or use the "backup" command to make a backup on the master (I've never used that last method).
Typically, the index is stored in $SOLR_HOME/data.
Back up that entire folder.
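For example, a minimal sketch of archiving that folder (the $SOLR_HOME path and backup location are placeholders; it assumes indexing is paused or the copy is taken on a dedicated backup slave):

    # Archive the whole data folder (index plus any spellchecker subfolders)
    tar -C "$SOLR_HOME" -czf /backups/solr-data-$(date +%Y%m%d).tar.gz data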
In Solr 8/9, backup and restore are available via the replication handler.
It creates a snapshot of the data which you can also restore later.
You can find more useful information on the Solr documentation page:
https://solr.apache.org/guide/8_9/making-and-restoring-backups.html#standalone-mode-backups
So this can be used with the newer 8/9 versions if someone is looking for it.
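A short sketch of what that looks like against a standalone core, based on the page above; the core name, port, snapshot name, and location are placeholders:

    # Take a named snapshot of the core
    curl "http://localhost:8983/solr/mycore/replication?command=backup&name=nightly&location=/backups/solr"

    # Restore that snapshot later
    curl "http://localhost:8983/solr/mycore/replication?command=restore&name=nightly&location=/backups/solr"

    # Check the progress of the restore
    curl "http://localhost:8983/solr/mycore/replication?command=restorestatus"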
We would like to make a nightly copy/backup/snapshot of a production database so that we can import it into the dev environment.
We don't want to log-ship to the dev environment, because we need something we can reset whenever we like to the last copy taken of the production database.
We need to be able to clear certain logging and/or otherwise useless or heavy tables that would just bloat the copy.
We prefer the attach/detach method over something like the SQL Server Publishing Wizard because of how much faster an attach is than an import.
I should mention we only have SQL Server Standard, so some features won't be available.
What's the best way to do this?
MSDN
I'd say use those procedures inside a SQL Agent job (use master.xp_cmdshell to perform the copy).
You might want to put the big tables in their own partition and have that partition belong to a different filegroup. You would then back up and restore only the main filegroup.
You might also want to consider differential backups: say, a full backup every weekend and a differential every night. I haven't done filegroup backups, so I don't know how well these work together.
I'm guessing that you are already doing regular backups of your production database? If you aren't, stop reading this reply and go set it up right now.
I'd recommend that you write a script that automatically runs, say once a day, that:
Drops your current test database.
Restores your current production backup to your test environment.
You can write a simple script to do this and execute it using the isql.exe command line tool.
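A rough sketch of how that nightly script could look; the server name, credentials, database name, logical file names, and paths are all placeholders, and sqlcmd can be substituted for isql on newer installs:

    # refresh_test.sql would contain roughly the following T-SQL:
    #   ALTER DATABASE TestDb SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
    #   DROP DATABASE TestDb;
    #   RESTORE DATABASE TestDb FROM DISK = 'D:\backups\ProdDb.bak'
    #       WITH MOVE 'ProdDb_Data' TO 'D:\data\TestDb.mdf',
    #            MOVE 'ProdDb_Log'  TO 'D:\data\TestDb.ldf';

    # Execute the script against the test server from a scheduled task
    isql -S testserver -U sa -P secret -i refresh_test.sql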