About Solr documents not being committed based on a condition

I am trying to find a way to block a Solr commit via the Solr API based on a certain condition.
Currently, every Solr document in the index has a unique id. How could I change the code below so that it does not add/commit the document to the Solr index if that id is already present?
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrInputDocument;

// Build the document, send it to Solr, then make it visible with a commit.
SolrInputDocument solrDoc = new SolrInputDocument();
solrDoc.addField("id", indexUrl);
solrDoc.addField("price", 100);
HttpSolrServer server = new HttpSolrServer(endpoint);
UpdateResponse response = server.add(solrDoc);
server.commit();
Thanks

You're mistaking Solr for an SQL server... Solr commit is nothing like SQL commit.
The idea behind Solr commit is to reduce the number of writes to disk. Solr is not transactional... you don't have a rollback ability - except maybe erasing the tlog folder manually. You'll need to rewrite the commit feature entirely to do what you want.
You can query Solr for the ids before sending the documents, but you have no ACID guarantee that by the time they are sent and committed there won't be other documents with the same id already in the index.
You could maybe get an ACID guarantee by using a shared SQL server to generate an ID for you, or by using ZooKeeper to do the same (although that is harder to configure).
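For illustration, here is a rough SolrJ sketch of that check-before-add idea. It is only a sketch: the endpoint and id values are stand-ins, and as noted above there is no guarantee that another writer doesn't insert the same id between the query and the commit.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class AddIfAbsent {
    public static void main(String[] args) throws Exception {
        String endpoint = "http://localhost:8983/solr/collection1"; // assumed endpoint
        String indexUrl = "http://example.com/page";                // assumed id value
        HttpSolrServer server = new HttpSolrServer(endpoint);

        // Ask Solr whether a document with this id is already indexed.
        SolrQuery query = new SolrQuery("id:\"" + indexUrl + "\"");
        QueryResponse rsp = server.query(query);

        if (rsp.getResults().getNumFound() == 0) {
            // No document with this id yet: add it and commit.
            SolrInputDocument solrDoc = new SolrInputDocument();
            solrDoc.addField("id", indexUrl);
            solrDoc.addField("price", 100);
            server.add(solrDoc);
            server.commit();
        }
        server.shutdown();
    }
}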

Related

Alfresco 6.2 Solr waiting for updated index in ITs

We are currently upgrading from Alfresco 5.x to Alfresco 6.2, but we are having trouble with our integration tests, especially the ones which create and then search for nodes.
The integration tests were using the NO INDEX Solr, which made created nodes searchable immediately, but now, with the separate Solr instance in Alfresco 6, we have to wait until Solr has indexed the new nodes.
Correct me if I'm wrong, but as far as I know Alfresco only triggers Solr indexing for committed transactions, which means either
a @Test has to succeed to successfully commit a transaction, or
I have to begin a new transaction with the RetryingTransactionHelper in which I create the new nodes.
Using the RetryingTransactionHelper works, but afterwards I have to wait until the new data has been indexed.
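For reference, a minimal sketch of that RetryingTransactionHelper pattern (the ServiceRegistry wiring and the node-creation helper are assumptions, not code from the question):

import org.alfresco.repo.transaction.RetryingTransactionHelper;
import org.alfresco.service.ServiceRegistry;
import org.alfresco.service.cmr.repository.NodeRef;

public class NodeCreationHelper {
    private final ServiceRegistry serviceRegistry; // assumed to be injected in the IT

    public NodeCreationHelper(ServiceRegistry serviceRegistry) {
        this.serviceRegistry = serviceRegistry;
    }

    // Creates the node in its own committed transaction so that Alfresco
    // can pick it up for Solr indexing independently of the test transaction.
    public NodeRef createInNewTransaction() {
        RetryingTransactionHelper txHelper = serviceRegistry.getRetryingTransactionHelper();
        // readOnly = false, requiresNew = true
        return txHelper.doInTransaction(() -> createTestNode(), false, true);
    }

    private NodeRef createTestNode() {
        // hypothetical placeholder: create the node via NodeService/FileFolderService
        throw new UnsupportedOperationException("illustrative only");
    }
}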
(tl;dr) How do I check whether new nodes have been indexed in Solr, so that I can use the org.alfresco.service.cmr.search.SearchService in my integration tests?
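One common way to do that (a sketch under assumptions, not an official Alfresco recipe: the FTS query, polling interval, and timeout are all placeholders) is to poll the SearchService until the node shows up:

import org.alfresco.service.cmr.repository.StoreRef;
import org.alfresco.service.cmr.search.ResultSet;
import org.alfresco.service.cmr.search.SearchService;

public class SolrWaitHelper {
    private final SearchService searchService; // assumed to be injected in the IT

    public SolrWaitHelper(SearchService searchService) {
        this.searchService = searchService;
    }

    // Polls Solr until the FTS query returns at least one hit or the timeout expires.
    public boolean waitForIndexing(String ftsQuery, long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            ResultSet results = searchService.query(
                    StoreRef.STORE_REF_WORKSPACE_SPACESSTORE,
                    SearchService.LANGUAGE_FTS_ALFRESCO,
                    ftsQuery);
            try {
                if (results.length() > 0) {
                    return true;
                }
            } finally {
                results.close();
            }
            Thread.sleep(500); // arbitrary polling interval
        }
        return false;
    }
}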

Data doesn't get replicated in Solr 8.4 CDCR

I have configured Solr 8.4 on 3+3, i.e. 6 systems across 2 data centers, with an external ZooKeeper ensemble in each region.
I was able to configure solrconfig.xml for the test collection as per the Solr manual, and I followed the instructions on the Solr website to start SolrCloud in sequence.
When I insert a single record (document) into the primary, it doesn't get replicated. Replication happens only after I restart SolrCloud on all servers. This is a new collection I defined; I have added records manually through the UI and have not indexed the collection.
Do I have to restart Solr every time I update? Why do records get replicated only when restarting?
Please let me know if you have come across this.
Note: I didn't run an indexing job; it was an empty collection using the default configset. I added a record from the Documents section of the UI.
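If it helps with diagnosis: in Solr, the CDCR replicator on the source collection is stopped by default and must be started explicitly, and its queues can be inspected over HTTP. A minimal sketch (host, port, and collection name are assumptions):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class CdcrCheck {
    public static void main(String[] args) throws Exception {
        String base = "http://localhost:8983/solr/test"; // assumed source host and collection
        // Start the CDCR replicator on the source collection...
        System.out.println(get(base + "/cdcr?action=START"));
        // ...then inspect the update-log queues feeding the target.
        System.out.println(get(base + "/cdcr?action=QUEUES"));
    }

    private static String get(String url) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            StringBuilder sb = new StringBuilder();
            for (String line; (line = in.readLine()) != null; ) {
                sb.append(line);
            }
            return sb.toString();
        }
    }
}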

hbase-indexer+Phoenix: hbase replication not working?

I have a cluster with HBASE+Phoenix.
I've installed SOLR on it.
Now I'm trying to set up hbase replication for the cluster, following this manual:
https://community.hortonworks.com/articles/1181/hbase-indexing-to-solr-with-hdp-search-in-hdp-23.html
Started the hbase-indexer server, added an indexer, put data via the hbase shell, and requested a commit via the browser.
But there are no changes in the collection in SOLR - zero new records.
The status 'replication' command in the hbase shell shows sizeOfLogQueue increasing with each PUT to the indexed table.
When grepping the HBase log (hbase-hbase-regionserver-myserver.log) I found lots of records like this:
Indexer_hbaseindexer: Total replicated edits: 0, currently replicating
from:
hdfs://HDP-Test/apps/hbase/data/WALs/myserver,16020,1519204674681/myserver%2C16020%2C1519204674681.default.1519204995372
at position: 45671433
The position here never changes.
The author of the issue at this link says that when the WAL codec is changed to IndexedWALEditCodec, HBase replication stops.
Can it really be that IndexedWALEditCodec stops HBase replication from working correctly? That shouldn't be true.
What might the problem be then? Any hint would be appreciated.
env:
HDFS 2.7.3
HBASE 1.1.2
SOLR 5.5.2
HBASE INDEXER 2.2.8
P.S. When I restart HBase and then request a Solr commit, the changes appear. But after that it doesn't do anything.

Solr Cloud: Inconsistent Result

We are using Solr Cloud (4.3) for indexing data. We have 2 shard / 2 replica servers in Solr Cloud.
We tried executing the query on an individual shard and it shows correct results.
When we execute the same query (*:*) from the Solr Admin Console, it displays inconsistent results (the number of records found is different each time).
What could be wrong? How can we troubleshoot it?
How is a query executed on the different shards/replicas and how are the results combined? Is there any document which explains the details of this?
I believe you have to make sure that Solr is doing soft commits to push information to the other replicas. This needs to be set to the frequency at which you need the data to stay "current":
solr.autoSoftCommit.maxDocs=<max number of uncommitted documents before soft commit>
solr.autoSoftCommit.maxTime=<max time in ms before soft commit>
http://wiki.apache.org/solr/SolrConfigXml
SOLR autoCommit vs autoSoftCommit
Do a commit operation on Solr Cloud after you index your data, then refresh your results. Once or twice it might show you different results, but after that it should be pretty consistent.
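For illustration, a minimal SolrJ sketch of issuing an explicit commit after indexing (the endpoint is an assumption; the three-argument overload of commit requests a soft commit):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class CommitExample {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // assumed
        // Hard commit: flushes segments to disk and opens a new searcher.
        server.commit();
        // Soft commit: makes recent documents searchable without a full flush.
        // Signature: commit(waitFlush, waitSearcher, softCommit)
        server.commit(true, true, true);
        server.shutdown();
    }
}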

Sync solr documents with database records

I wonder if there is a proper way to sync Solr documents with database records. I usually have this problem: there are Solr documents whose referenced database records no longer exist. It seems some DB records have been deleted, but nothing triggered the corresponding update in Solr. I want to write a rake task that runs periodically to remove such documents from Solr.
Any suggestions?
Chamnap
Yes, there is one.
You have to use the DataImportHandler with the delta import feature.
Basically, you specify a query that updates only the rows that have been modified, instead of rebuilding the whole index. Here's an example.
Otherwise you can add a feature to your application that simply triggers the removal of the documents via HTTP, in both your DB and your index.
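A rough SolrJ sketch of that removal step (the endpoint and the id are illustrative assumptions; the matching DB-side delete is left out):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class DeleteFromIndex {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // assumed
        // Remove the document that no longer has a matching DB record...
        server.deleteById("42"); // illustrative id
        // ...and commit so the deletion becomes visible to searches.
        server.commit();
        server.shutdown();
    }
}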
I'm using Java + Java DB + Lucene (which Solr is based on) for my text search and database records. My solution is to back up and then recreate (delete + create) the Lucene index to sync with my records in Java DB. This seems to be the easiest approach; the only problem is that it is not advisable to run often. This also means that your records are not updated in real time. I run my batch job nightly so that all changes are reflected the next day. Hope this helps.
Also read an article about syncing Solr and DB records here under "No synchronization". It states that it's not easy, but possible in some cases. It would be helpful if you specified your programming language so more people can help you.
In addition to the above, "soft" deletion by setting a deleted or deleted_at column is a great approach. That way you can run a script to periodically clear out deleted records from your Solr index as needed.
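Sketching what such a periodic cleanup script could look like in Java (every name here is an assumption: the JDBC URL, the records table, the deleted_at column, and the Solr endpoint):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class SolrCleanupTask {
    public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // assumed
        try (Connection db = DriverManager.getConnection("jdbc:postgresql://localhost/app", "user", "pass");
             Statement st = db.createStatement();
             ResultSet rs = st.executeQuery("SELECT id FROM records WHERE deleted_at IS NOT NULL")) {
            List<String> staleIds = new ArrayList<>();
            while (rs.next()) {
                staleIds.add(rs.getString("id"));
            }
            if (!staleIds.isEmpty()) {
                // Drop the soft-deleted records from the index and commit.
                server.deleteById(staleIds);
                server.commit();
            }
        } finally {
            server.shutdown();
        }
    }
}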
You mention using a rake task — is this a Rails app you're working with? Most Solr clients for Rails apps should support deleting records via an after_destroy hook.
