We are currently upgrading from Alfresco 5.x to Alfresco 6.2, but we are having trouble with our integration tests, especially the ones that create and search for nodes.
The integration tests used to rely on the NO INDEX Solr subsystem, which made created nodes searchable immediately, but with the separate Solr instance in Alfresco 6 we now have to wait until Solr has indexed the new nodes.
Correct me if I'm wrong, but as far as I know Alfresco only submits committed transactions to Solr for indexing, which means either
a @Test has to succeed to successfully commit a transaction, or
I have to open a new transaction with the RetryingTransactionHelper in which I create the new nodes.
Using the RetryingTransactionHelper works, but afterwards I still have to wait until the new data has been indexed.
(tl;dr) How do I check whether new nodes have been indexed by Solr, so that I can use the org.alfresco.service.cmr.search.SearchService in my integration tests?
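What I have come up with so far is a polling helper along these lines (only a sketch; the waitForIndex name, the sys:node-uuid FTS query and the 500 ms back-off are assumptions of mine), but I am not sure it is the right approach:

import org.alfresco.service.cmr.repository.NodeRef;
import org.alfresco.service.cmr.repository.StoreRef;
import org.alfresco.service.cmr.search.ResultSet;
import org.alfresco.service.cmr.search.SearchService;

// Polls Solr until the node shows up in search results or the timeout expires.
// Assumes the creating transaction has already been committed
// (e.g. via the RetryingTransactionHelper).
boolean waitForIndex(SearchService searchService, NodeRef nodeRef, long timeoutMs)
        throws InterruptedException {
    String query = "=sys:node-uuid:\"" + nodeRef.getId() + "\"";
    long deadline = System.currentTimeMillis() + timeoutMs;
    while (System.currentTimeMillis() < deadline) {
        ResultSet results = searchService.query(
                StoreRef.STORE_REF_WORKSPACE_SPACESSTORE,
                SearchService.LANGUAGE_FTS_ALFRESCO, query);
        try {
            if (results.length() > 0) {
                return true;            // node is searchable now
            }
        } finally {
            results.close();
        }
        Thread.sleep(500);              // back off before polling again
    }
    return false;                       // not indexed within the timeout
}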
Related
I have configured Solr 8.4 on 3+3, i.e. 6 systems across 2 data centers, using an external ZooKeeper ensemble in each region.
I was able to configure solrconfig.xml for a test collection as described in the Solr manual, and I followed the instructions on the Solr website to start SolrCloud in the proper sequence.
When I insert a single record (document) on the primary, it doesn't get replicated. Replication only happens after I restart SolrCloud on all servers. This is a new collection I defined, and I have updated records manually through the UI; I have not indexed the collection.
Do I have to restart Solr every time I update? Why do records only get updated after a restart?
Please let me know if you have come across this.
Note: I didn't run an index; it was an empty collection using the default configset. I added the record from the UI in the Documents section.
Is there any way I can insert data into only one node or shard of Solr 5.x and have it replicated to all the other nodes linked to it via ZooKeeper?
Thanks,
Ravi
This is what Solr does by default when running in SolrCloud mode (which is whenever it's using ZooKeeper).
As long as you index to one of the nodes, the nodes will figure out where the document should go (i.e. which server hosts the collection) and which other servers it should be replicated to.
You control this when creating or changing a collection through the replicationFactor setting.
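For illustration, creating such a collection from SolrJ could look roughly like this (class name, ZooKeeper hosts, configset and collection name are placeholders, and the exact builder API depends on your SolrJ version):

import java.util.Arrays;
import java.util.Optional;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CreateCollectionExample {
    public static void main(String[] args) throws Exception {
        // Connect through ZooKeeper (hosts are placeholders for your ensemble).
        CloudSolrClient client = new CloudSolrClient.Builder(
                Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"), Optional.empty())
                .build();
        // 2 shards, replicationFactor 3: an update sent to any node is routed
        // to the right shard leader and replicated to the other replicas.
        CollectionAdminRequest
                .createCollection("test", "_default", 2, 3)
                .process(client);
        client.close();
    }
}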
I am trying to find a way to block the Solr commit through the Solr API based on a certain condition.
Currently, every Solr document in the index has a unique id. How could I change the code below so that it does not commit to the Solr index if that id is already present?
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.UpdateResponse;
import org.apache.solr.common.SolrInputDocument;

// Build the document, send it to Solr, and make it visible with a commit.
SolrInputDocument solrDoc = new SolrInputDocument();
solrDoc.addField("id", indexUrl);
solrDoc.addField("price", 100);
HttpSolrServer server = new HttpSolrServer(endpoint);
UpdateResponse response = server.add(solrDoc);
server.commit();
Thanks
You're mistaking Solr for an SQL server... a Solr commit is nothing like an SQL commit.
The idea behind a Solr commit is to reduce the number of writes to disk. Solr is not transactional; you don't have a rollback ability, except maybe erasing the tlog folder manually. You would need to rewrite the commit feature entirely to do what you want.
You can query Solr for the ids before sending them, but you have no ACID guarantee that, by the time they are sent and committed, there won't be other documents with the same id already in the index.
You could maybe get an ACID-like guarantee by using a shared SQL server to generate an ID for you, or by using ZooKeeper to do the same (although that is harder to configure).
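As a rough sketch of the "query first, then add" approach mentioned above (again, with no ACID guarantee), in the same SolrJ style as the question:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

HttpSolrServer server = new HttpSolrServer(endpoint);

// Ask Solr whether a document with this id is already indexed.
SolrQuery query = new SolrQuery("id:\"" + indexUrl + "\"");
long existing = server.query(query).getResults().getNumFound();

if (existing == 0) {
    // Only add and commit when the id is not present yet. Another client
    // could still index the same id in between; there is no locking here.
    SolrInputDocument solrDoc = new SolrInputDocument();
    solrDoc.addField("id", indexUrl);
    solrDoc.addField("price", 100);
    server.add(solrDoc);
    server.commit();
}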
I recently recovered a Solr database that uses SolrCloud to shard an index. I now have that database running on a single machine, but the data is still sharded, and this is now unnecessary.
How can I stop using SolrCloud and merge these shards into a single collection?
I ended up using the Lucene merge index tool; the Solr-level approaches did not work for me (they failed with obtuse errors).
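For reference, the merge can also be done directly with Lucene's IndexWriter.addIndexes, which is essentially what the merge tool does under the hood. The sketch below is only illustrative; the class name and the index paths are placeholders, and the exact imports depend on your Lucene version:

import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class MergeShards {
    public static void main(String[] args) throws Exception {
        // Target directory for the merged index (paths are placeholders).
        Directory merged = FSDirectory.open(Paths.get("/data/merged-index"));
        IndexWriter writer = new IndexWriter(merged,
                new IndexWriterConfig(new StandardAnalyzer()));

        // Add each shard's Lucene index directory; addIndexes copies the
        // segments over, effectively merging the shards into one index.
        writer.addIndexes(
                FSDirectory.open(Paths.get("/data/shard1/index")),
                FSDirectory.open(Paths.get("/data/shard2/index")));

        writer.forceMerge(1);   // optional: collapse into a single segment
        writer.close();
    }
}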
I wonder if there is a proper way to keep Solr documents in sync with database records. The problem I usually have: there are Solr documents whose referenced database records no longer exist. It seems some DB records have been deleted, but nothing triggered an update in Solr. I want to write a rake task that runs periodically and removes such documents from Solr.
Any suggestions?
Chamnap
Yes, there is one.
You have to use the DataImportHandler with its delta import feature.
Basically, you specify a query that selects only the rows that have been modified since the last import, instead of rebuilding the whole index. The DataImportHandler documentation has an example of this.
Otherwise, you can add a feature to your application that simply triggers the removal of the documents via HTTP both from your DB and from your index.
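A minimal sketch of that second approach with SolrJ (the removeFromIndex helper and its parameters are just placeholders of mine):

import org.apache.solr.client.solrj.impl.HttpSolrServer;

// Hypothetical hook called after a record has been removed from the database:
// delete the matching Solr document so the index and the DB stay in sync.
void removeFromIndex(String solrUrl, String recordId) throws Exception {
    HttpSolrServer server = new HttpSolrServer(solrUrl);
    server.deleteById(recordId);   // remove the document with this unique id
    server.commit();               // make the deletion visible to searches
}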
I'm using Java + Java DB + Lucene (which Solr is based on) for my text search and database records. My solution is to back up and then recreate (delete + create) the Lucene index to sync it with my records in Java DB. This seems to be the easiest approach; the only problem is that it is not advisable to run this often, and it also means that your records are not updated in real time. I run my batch job nightly so that all changes are reflected the next day. Hope this helps.
Also read the article about syncing Solr and DB records here, under "No synchronization". It states that it's not easy, but possible in some cases. It would be helpful if you specified your programming language so more people can help you.
In addition to the above, "soft" deletion by setting a deleted or deleted_at column is a great approach. That way you can run a script that periodically clears deleted records out of your Solr index as needed.
You mention using a rake task; is this a Rails app you're working with? Most Solr clients for Rails apps support deleting records via an after_destroy hook.
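If you end up doing the periodic cleanup outside of Rails, a purge could look roughly like this in SolrJ (the purgeDeleted helper is hypothetical, and deletedIds would come from a DB query for soft-deleted rows):

import java.util.List;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

// Hypothetical periodic cleanup: remove documents whose DB rows were
// soft-deleted (deleted/deleted_at set) or no longer exist.
void purgeDeleted(HttpSolrServer server, List<String> deletedIds) throws Exception {
    if (!deletedIds.isEmpty()) {
        server.deleteById(deletedIds);  // batch delete by unique id
        server.commit();                // make the removals visible to searches
    }
}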