How to properly make SOLR backups - solr

Is there a way to make Solr backups daily, without restarting Solr?
I mean using Solr's built-in features.

The backup of your index (which contains the documents) can be started via an HTTP request:
http://localhost:8983/solr/yourcore/replication?command=backup
See Making and Restoring Backups of SolrCores
and
Solr 5.2: quick look on Solr backup functionality
for more information.
So if you want a daily backup, set up a cron job to call this URL regularly.
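For example, a crontab entry along these lines (the core name, host, and schedule are placeholders) would trigger a backup every night at 03:00:
0 3 * * * curl -s "http://localhost:8983/solr/yourcore/replication?command=backup" > /dev/null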

Related

Creating efficient, fast, incremental InfluxDB database backups

I have a Raspberry Pi (4B) running Raspbian Linux, collecting IoT data around the house and feeding it into an InfluxDB 1.8.3 (open source) database. This works fine so far.
I also have a backup which runs daily like this:
influxd backup -portable /home/pi/influx-backup/
Question:
This backup process takes almost 30 minutes, during which InfluxDB is almost unusable, system load climbs above 7, and my Pi cannot collect data. Each backup is a complete backup. Can I somehow create a faster incremental backup daily? The documentation only mentions a -since parameter, but you'd have to specify this manually, which would be risky.
Alternatively, the whole system is backed up daily using borgbackup anyway. Stopping Influx, making a rsync copy of /var/lib/influxdb/data as backup, and restarting it is much, much faster than influxd backup. Is this a good alternative idea to backup the database?
What other alternatives exist to perform regular, quick, (if possible online) backups of Influx databases?
Thanks!
According to this site:
InfluxDB also has support for incremental backups. Snapshotting from the server now creates a full backup if one does not exist and creates numbered incremental backups after that.
DataSource
If this is the case but you are still having a problem, perhaps you could downsample your data by running continuous queries and applying a data retention policy.
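If you'd rather automate the -since parameter than specify it by hand, a minimal (untested) wrapper script could record the timestamp of the last successful run, so each backup only covers data written since then; paths here are placeholders:
#!/bin/sh
# Sketch: incremental InfluxDB 1.x backup via -since.
# NOW is captured before the backup starts, so data written while
# the backup runs is not skipped by the next run.
STATE=/home/pi/influx-backup/.last-backup
NOW=$(date -u +%Y-%m-%dT%H:%M:%SZ)
if [ -f "$STATE" ]; then
    influxd backup -portable -since "$(cat "$STATE")" /home/pi/influx-backup/
else
    influxd backup -portable /home/pi/influx-backup/
fi && echo "$NOW" > "$STATE"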

Problems with Solr in CKAN

I have a problem with Solr and CKAN.
I understand that Solr is not directly linked to PostgreSQL; the Solr index is maintained by the CKAN code itself.
I've lost all of Solr's data because it broke, so now I can't run queries in Solr. How can I recover all the data in Solr?
Is there any crawling method that can help me? Or is it enough to dump my CKAN database and export/import it again?
You can use the search-index command of CKAN's CLI to rebuild the Solr index:
Rebuilds the search index. This is useful to prevent search indexes from getting out of sync with the main database.
For example:
paster --plugin=ckan search-index rebuild --config=/etc/ckan/std/std.ini
This default behaviour will clear the index and rebuild it with all datasets. If you want to rebuild it for only one dataset, you can provide a dataset name:
paster --plugin=ckan search-index rebuild test-dataset-name --config=/etc/ckan/std/std.ini
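If you're on CKAN 2.9 or later, where the paster CLI was replaced by the ckan command, the equivalent calls (assuming the same config path) should be:
ckan -c /etc/ckan/std/std.ini search-index rebuild
ckan -c /etc/ckan/std/std.ini search-index rebuild test-dataset-name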

Backup strategy with master-slave solr 3.6 servers

We're using Solr 3.6 replication with 2 servers (a master and a slave) and we're currently looking for a way to do clean backups.
As the wiki says, we can use an HTTP command to create a snapshot of the master, like this: http://myMasterHost/solr/replication?command=backup
But we still have some questions:
What is the benefit of the backup command over a classic shell script that copies the index files?
The command only backs up the indexes; is it possible to also copy the spellchecker folder? Is it needed?
Can we create the snapshot while the application is running, i.e. while there are potential index updates?
When we have to restore the servers from the backup, what do we have to do on the slave?
just copy the snapshot into its index folder, and remove the replication.properties file (or not)?
ask for a fetchindex through the HTTP command http://mySlave/solr/replication?command=fetchindex ?
just empty the slave's index folder, in order to force a full replication from the master?
You can use the backup command provided by the ReplicationHandler. It's an asynchronous operation, and it takes time if your index is big, but this way you don't need to shut down Solr. Afterwards you'll find within the index directory a new directory named snapshot.yyyymmddHHMMSS, reflecting the backup date. You can also configure how many old backups you want to keep.
After that of course it's better if you move the backup to a safe location, probably to a different server.
I don't think it's possible to back up the spellchecker, though I'm not completely sure.
The command is of course meant to be run while the application is running. The only problem is that the backup will probably miss the documents you committed after the backup started.
You can also have a look at the Lucene CheckIndex tool. Once you've backed up the index, you can use it to verify that the index is ok.
I wouldn't personally use the backups to restore the index on the slaves if you already have a good index on the master. The copy of the index would be automatic using the standard replication process (it's really a copy of the index segments), you don't need to copy them manually unless the backup contains better data than the master.
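To move a finished backup off the master, a simple follow-up along these lines could work (paths, hostnames, and the snapshot directory name are placeholders; the exact name may vary by version):
curl "http://myMasterHost/solr/replication?command=backup"
# once the snapshot directory has been fully written:
rsync -a /path/to/solr/data/snapshot.* backupuser@backuphost:/backups/solr/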

How to backup a Solr database?

I wonder how to back up (dump) a Solr database?
If it is only to copy some files, then please specify which files (filename, location etc).
Thanks
We use Solr Replication to do our backup.
You can either have a slave that is dedicated to being a backup, or use the "backup" command to make a backup on the master (I have never used the latter method).
Typically, the index is stored in $SOLR_HOME/data.
Back up that entire folder.
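If you go the file-copy route, something along these lines works (paths are placeholders), but stop Solr or pause commits first so you don't copy an index mid-write:
rsync -a "$SOLR_HOME/data/" "/backups/solr-data-$(date +%F)/"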
In Solr 8/9, backup and restore are available via the replication handler.
It creates a snapshot of the data which you can restore later.
You can find more useful information on the Solr documentation page:
https://solr.apache.org/guide/8_9/making-and-restoring-backups.html#standalone-mode-backups
So this can be used with the newer 8/9 versions if someone is looking for it.
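As a quick illustration (core name, backup location, and backup name are placeholders; the location must be writable by Solr), a backup and a later restore would look like:
curl "http://localhost:8983/solr/yourcore/replication?command=backup&location=/var/backups/solr&name=nightly"
curl "http://localhost:8983/solr/yourcore/replication?command=restore&location=/var/backups/solr&name=nightly"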

Sync solr documents with database records

I wonder whether there is a proper way to keep Solr documents in sync with database records. I usually run into this problem: there are documents in Solr whose corresponding database records no longer exist. It seems some DB records have been deleted, but nothing triggered an update in Solr. I want to write a rake task, run periodically, to remove such documents from Solr.
Any suggestions?
Chamnap
Yes, there is one.
You have to use the DataImportHandler with the delta import feature.
Basically, you specify a query that updates only the rows that have been modified, instead of rebuilding the whole index. Here's an example.
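As a rough sketch (assuming a DIH handler registered at /dataimport, with yourcore as a placeholder core name), a delta import can then be triggered over HTTP, typically from a cron job:
curl "http://localhost:8983/solr/yourcore/dataimport?command=delta-import"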
Otherwise you can add a feature to your application that simply triggers the removal of the documents via HTTP, in both your DB and in your index.
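For the removal route, a minimal example (placeholder host, core, and document id) using Solr's update handler:
curl "http://localhost:8983/solr/yourcore/update?commit=true" -H "Content-Type: text/xml" --data-binary "<delete><id>123</id></delete>"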
I'm using Java + Java DB + Lucene (which Solr is based on) for my text search and database records. My solution is to back up and then recreate (delete + create) the Lucene index to sync it with my records in Java DB. This seems to be the easiest approach; the only problem is that it's not advisable to run it often. This also means that your records are not updated in real time. I run my batch job nightly so that all changes are reflected the next day. Hope this helps.
Also read an article about syncing Solr and DB records here, under "No synchronization". It states that it's not easy, but possible in some cases. It would be helpful if you specified your programming language so more people can help you.
In addition to the above, "soft" deletion by setting a deleted or deleted_at column is a great approach. That way you can run a script to periodically clear out deleted records from your Solr index as needed.
You mention using a rake task; is this a Rails app you're working with? Most Solr clients for Rails apps should support deleting records via an after_destroy hook.
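If you do index the soft-delete flag into Solr, the periodic cleanup can be a single delete-by-query (this assumes a deleted boolean field exists in your Solr schema; host and core are placeholders):
curl "http://localhost:8983/solr/yourcore/update?commit=true" -H "Content-Type: text/xml" --data-binary "<delete><query>deleted:true</query></delete>"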
