I have a problem with Solr and CKAN.
I understand that Solr is not directly linked to PostgreSQL; the Solr index is maintained by the CKAN code itself.
I've lost all of Solr's data because the index is broken, so now I can't run queries against Solr. How can I recover all the data in Solr?
Is there any crawling method that can help me? Or is it enough to dump my CKAN database and export/import it again?
You can use the search-index command from CKAN's CLI to rebuild the Solr index:
Rebuilds the search index. This is useful to prevent search indexes from getting out of sync with the main database.
For example:
paster --plugin=ckan search-index rebuild --config=/etc/ckan/std/std.ini
By default this will clear the index and rebuild it with all datasets. If you want to rebuild the index for only one dataset, you can provide the dataset name:
paster --plugin=ckan search-index rebuild test-dataset-name --config=/etc/ckan/std/std.ini
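Note that on newer CKAN versions (2.9 and later) the paster command has been replaced by the ckan CLI; assuming the same config path as above, the equivalent would be:
ckan -c /etc/ckan/std/std.ini search-index rebuild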
I have a cluster with HBASE+Phoenix.
I've installed SOLR on it.
Now I'm trying to set up hbase replication for the cluster, following this manual:
https://community.hortonworks.com/articles/1181/hbase-indexing-to-solr-with-hdp-search-in-hdp-23.html
I started the hbase-indexer server, added an hbase-indexer, put data via the HBase shell, and requested a commit via the browser.
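(For reference, the commit request was something like the following; the host and collection name are placeholders for my own:)
http://myserver:8983/solr/my-collection/update?commit=true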
But there are no changes in the SOLR collection: zero new records.
The status 'replication' command in the HBase shell shows sizeOfLogQueue increasing with each PUT to the indexed table.
When grepping the HBase log (hbase-hbase-regionserver-myserver.log) I found lots of records like this:
Indexer_hbaseindexer: Total replicated edits: 0, currently replicating
from:
hdfs://HDP-Test/apps/hbase/data/WALs/myserver,16020,1519204674681/myserver%2C16020%2C1519204674681.default.1519204995372
at position: 45671433
The position here never changes.
The author of an issue at this link says that HBase replication stops when the WAL codec is changed to IndexedWALEditCodec.
Is it really the case that IndexedWALEditCodec stops HBase replication from working correctly? That shouldn't be true.
If not, what might the problem be? Any hint would be appreciated.
env:
HDFS 2.7.3
HBASE 1.1.2
SOLR 5.5.2
HBASE INDEXER 2.2.8
P.S. When I restart HBase and then request a Solr commit, the changes appear. But after that, nothing new is replicated.
Is there a way to make daily SOLR backups without restarting SOLR?
I mean a built-in SOLR feature.
A backup of your index (which contains the documents) can be started via an HTTP request:
http://localhost:8983/solr/yourcore/replication?command=backup
See Making and Restoring Backups of SolrCores
and
Solr 5.2: quick look on Solr backup functionality
for more information.
So if you want a daily backup, set up a cronjob to call this URL regularly.
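For example, a crontab entry along these lines takes a backup every night at 2 AM (the host, core name, and backup directory are placeholders; numberToKeep tells the replication handler how many snapshots to retain):
0 2 * * * curl -s "http://localhost:8983/solr/yourcore/replication?command=backup&location=/var/backups/solr&numberToKeep=7" > /dev/null
The target directory must already exist and be writable by the Solr process.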
I recently recovered a SOLR database that uses SOLR cloud to shard an index. I now have that database running on a single machine, but the data is still sharded, and the sharding is now unnecessary.
How can I stop using SOLR cloud and merge these shards into a single collection?
I ended up using the Lucene Merge Index tool. The SOLR approaches did not work for me (obtuse errors).
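In case it helps anyone, the invocation looks roughly like this (the jar versions and index paths are placeholders; stop Solr and back up the shard directories first, since the tool writes a brand-new index):
java -cp lucene-core-4.3.0.jar:lucene-misc-4.3.0.jar org.apache.lucene.misc.IndexMergeTool /path/to/merged/index /path/to/shard1/data/index /path/to/shard2/data/index
The first argument is the destination index; the remaining arguments are the source indexes to merge into it.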
We are using Solr Cloud (4.3) for indexing data. We have a 2-shard / 2-replica setup in Solr Cloud.
We tried executing the query on an individual shard and it shows correct results.
When we execute the same query (*:*) from the Solr Admin Console, it displays inconsistent results (the number of records found is different each time).
What could be wrong? How can we troubleshoot it?
How is a query executed across the different shards/replicas, and how are the results combined? Is there any document which explains the details of this?
I believe that you have to make sure Solr is doing soft commits to push information to the other replicas. This needs to be set to the frequency at which you need the data to stay "current":
solr.autoSoftCommit.maxDocs=<max number of uncommitted documents before soft commit>
solr.autoSoftCommit.maxTime=<max time in ms before soft commit>
http://wiki.apache.org/solr/SolrConfigXml
SOLR autoCommit vs autoSoftCommit
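As a sketch, these properties are typically wired into the updateHandler section of solrconfig.xml like this (the fallback values after the colons are placeholders):
<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxDocs>${solr.autoSoftCommit.maxDocs:10000}</maxDocs>
  <maxTime>${solr.autoSoftCommit.maxTime:5000}</maxTime>
</autoSoftCommit>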
Do a commit operation on Solr Cloud after you index your data, then refresh your results. Once or twice it might show you different results, but after that it should be pretty consistent.
I wonder if there is a proper way to keep Solr documents in sync with database records. I usually have this problem: there are documents in Solr that reference database records which no longer exist. It seems some DB records have been deleted, but no trigger ran to update Solr. I want to write a rake task that runs periodically to remove such documents from Solr.
Any suggestions?
Chamnap
Yes, there is one.
You have to use the DataImportHandler with the delta import feature.
Basically, you specify a query that selects only the rows that have been modified, instead of rebuilding the whole index. Here's an example.
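A minimal data-config.xml sketch of a delta import (the table, column names, and connection details are hypothetical, and the handler itself has to be registered in solrconfig.xml; deletedPkQuery takes care of rows that were removed from the database):
<dataConfig>
  <dataSource driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost/mydb"
              user="solr" password="secret"/>
  <document>
    <!-- query: full import; deltaQuery: find changed rows; deltaImportQuery: re-fetch them -->
    <entity name="item" pk="id"
            query="SELECT id, title FROM items"
            deltaQuery="SELECT id FROM items WHERE updated_at > '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT id, title FROM items WHERE id = '${dih.delta.id}'"
            deletedPkQuery="SELECT id FROM deleted_items WHERE deleted_at > '${dataimporter.last_index_time}'"/>
  </document>
</dataConfig>
The delta import is then triggered with /dataimport?command=delta-import, which you could also run from a cron job.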
Otherwise you can add a feature to your application that simply triggers the removal of documents via HTTP in both your DB and in your index.
I'm using Java + Java DB + Lucene (which Solr is based on) for my text search and database records. My solution is to back up and then recreate (delete + create) the Lucene index to sync it with my records in Java DB. This seems to be the easiest approach; the only problem is that it's not advisable to run it often. It also means that your records are not updated in real time. I run my batch job nightly so that all changes are reflected the next day. Hope this helps.
Also read an article about syncing Solr and DB records here under "No synchronization". It states that it's not easy, but possible in some cases. It would be helpful if you specified your programming language so more people could help you.
In addition to the above, "soft" deletion by setting a deleted or deleted_at column is a great approach. That way you can run a script to periodically clear out deleted records from your Solr index as needed.
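A sketch of the Solr side of such a script (the core name and IDs are placeholders; in practice the IDs would come from a query for rows where deleted_at is set):
curl "http://localhost:8983/solr/yourcore/update?commit=true" -H "Content-Type: text/xml" --data-binary "<delete><id>123</id><id>456</id></delete>"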
You mention using a rake task; is this a Rails app you're working with? Most Solr clients for Rails apps should support deleting records via an after_destroy hook.