SolrCloud indexing hang

I am working with SolrCloud, but I am facing a problem that can cause the indexing process to hang.
My deployment is a single collection with 5 shards running on 5 machines. Every day we do a full index of about 50M docs using DataImportHandler, and we trigger indexing on one of the 5 machines, relying on SolrCloud's distributed indexing.
I have found that sometimes one of the 5 machines dies because of:
2013-01-08 10:43:35,879 ERROR core.SolrCore - java.io.FileNotFoundException: /home/admin/index/core_p_shard2/index/_31xu.fnm (No such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:222)
at org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:232)
at org.apache.lucene.codecs.lucene40.Lucene40FieldInfosReader.read(Lucene40FieldInfosReader.java:52)
at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:101)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:57)
at org.apache.lucene.index.ReadersAndLiveDocs.getReader(ReadersAndLiveDocs.java:120)
at org.apache.lucene.index.BufferedDeletesStream.applyDeletes(BufferedDeletesStream.java:267)
at org.apache.lucene.index.IndexWriter.applyAllDeletes(IndexWriter.java:3010)
at org.apache.lucene.index.DocumentsWriter.applyAllDeletes(DocumentsWriter.java:180)
at org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:310)
at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:386)
at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1445)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:210)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:448)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:325)
at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:230)
at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:157)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
I have checked the index dir, and it indeed does not contain _31xu.fnm. I am wondering whether there is some concurrency bug in distributed indexing?
As far as I know, distributed indexing works like this: you can send docs to any shard, and each doc is forwarded to the correct shard according to a hash of its id; DataImportHandler forwards docs to the correct shard through the update handler, and finally docs are flushed to disk via DocumentsWriterPerThread. I am wondering whether too many update requests sent from the shard that triggered indexing caused the problem. My guess is based on the fact that the machine which died had a lot of index segments, each of them very small.
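For what it's worth, here is how I picture the routing step. This is only a simplified illustration of the idea (real SolrCloud hashes the unique key and maps the hash onto per-shard hash ranges; the class and modulo logic below are made up for the sketch):

import java.util.List;

public class RoutingSketch {
    // Pick a target shard for a document id. Real SolrCloud uses MurmurHash3 and
    // per-shard hash ranges; hashCode plus modulo is only a stand-in here.
    static String pickShard(String docId, List<String> shards) {
        return shards.get(Math.floorMod(docId.hashCode(), shards.size()));
    }

    public static void main(String[] args) {
        List<String> shards = List.of("shard1", "shard2", "shard3", "shard4", "shard5");
        // Whichever node receives the doc computes the same target and forwards it there.
        System.out.println(pickShard("doc-42", shards));
    }
}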
I am not very familiar with Solr, so maybe my guess makes no sense at all. Does anyone have an idea? Thanks.

Related

Using Solr master-slave configuration with TYPO3?

We have some sites which use Solr as an internal search. This is done with the extension ext:solr from DKD. Within the extension there is an install script which provides cores for multiple languages.
This is working well on most systems.
Meanwhile we have some bigger sites, and because of some special requirements we run into problems:
We have sites which import data on a regular basis from outside of TYPO3. To keep the Solr index up to date we need to rebuild the complete index (at night). But as the site gets bigger the reindex takes longer and longer, and if an error occurs the index is broken the next day.
You could say: no problem, just refresh all records, but that would leave information in the index for records which were deleted in the meantime (there is no 'delete' information in the import, except that a deleted record is no longer present in it). So a complete delete of all records before the import (or special marking and explicit deletion afterwards) is necessary.
Anyway, the reindex takes very long and can't be triggered any time. And an error leaves the index incomplete.
In theory there is the option to work with two indices: one which is built up anew while the other one is used for search requests. In this way you always have a complete index, even if it might not be fully up to date. After the new index is built you can swap the indices and rebuild the older one.
That needs to be triggered from inside of TYPO3, but I have not found anything about such a configuration.
Another theoretical option might be a master-slave configuration, but as far as I can tell:
when the index of the master is reset for the rebuild, this reset would be replicated to the slave, which would lose all the information it should provide until the rebuild is complete.
(I think the problem is independent of a specific TYPO3 or solr version, so no version tag)
Do you know about our read and write connection concept introduced in EXT:Solr 9? https://docs.typo3.org/p/apache-solr-for-typo3/solr/11.0/en-us/Releases/solr-release-9-0.html#support-to-differ-between-read-and-write-connections
Isn't that something for your case?
The only thing you need is to set it up properly in your deployment.
Once your fresh index is finalized and not broken, you just switch the read core to point to the previous write core.
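Independent of the EXT:Solr feature, the plain-Solr building block for the two-core approach is the CoreAdmin SWAP command. A minimal sketch using SolrJ's CoreAdminRequest helper (the core names and node URL are placeholders):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class SwapCoresSketch {
    public static void main(String[] args) throws Exception {
        // CoreAdmin requests are sent to the node URL, not to a specific core.
        try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
            // Exchange the two cores: queries against core_live now hit the freshly
            // rebuilt index, while core_rebuild keeps the previous one as a fallback.
            CoreAdminRequest.swapCore("core_live", "core_rebuild", client);
        }
    }
}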

Deleting fetched records automatically when FETCH_ERROR occurs with Solr and StormCrawler integration

I have Solr and StormCrawler integrated. I need to handle deletion of a document from the Solr index once its FETCH_ERROR status gets converted into ERROR after a number of successive attempts, which is not happening right now.
I read that in the case of Elasticsearch there are AbstractStatusUpdaterBolt and DeletionBolt to take care of that.
Is there a similar deletion bolt for the Solr integration that, along with the StatusUpdaterBolt, could delete the record from the Solr index?
Any direction would help. Thanks.
Currently, with StormCrawler 1.15, we don't have a DeletionBolt for SOLR. Writing one should not be too difficult; you could use the one for ES as an example. The logic of sending tuples to the deletion stream is already handled by the AbstractStatusUpdaterBolt, so there is nothing to do on that front.
Feel free to open an issue to ask for this to be added, or even better, contribute a pull request if you can.
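As a starting point, here is a rough sketch of what such a bolt could look like, modeled loosely on the ES DeletionBolt. The class name, the hard-coded collection URL and the assumption that the page URL doubles as the Solr document id are all placeholders, not part of the existing SOLR module:

import java.io.IOException;
import java.util.Map;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class SolrDeletionBolt extends BaseRichBolt {

    // SolrClient is not serializable, so build it in prepare(), not in the constructor.
    private transient SolrClient client;
    private transient OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        // In a real bolt the collection URL would come from the crawler config.
        client = new HttpSolrClient.Builder("http://localhost:8983/solr/docs").build();
    }

    @Override
    public void execute(Tuple tuple) {
        // The status updater emits the URL of the page to remove on the deletion stream.
        String url = tuple.getStringByField("url");
        try {
            client.deleteById(url); // assumes the URL is used as the document id
            collector.ack(tuple);
        } catch (SolrServerException | IOException e) {
            collector.fail(tuple);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // terminal bolt: nothing to emit downstream
    }

    @Override
    public void cleanup() {
        try {
            client.close();
        } catch (IOException e) {
            // ignore on shutdown
        }
    }
}

Wire it to the deletion stream of the status updater bolt in your topology, roughly the same way the ES example does.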

Posting large directory of files to SOLR using post tool, how to commit after every file

I am using the Java post tool for Solr to upload and index a directory of documents. There are several thousand documents. Solr only does a commit at the very end of the process, and sometimes things stop before it completes, so I lose all the work.
Does anyone have a technique to fetch the name of each doc and call post on it, so you get a commit for each document rather than one large commit of all the docs at the end?
From the help page for the post tool:
Other options:
..
-params "<key>=<value>[&<key>=<value>...]" (values must be URL-encoded; these pass through to Solr update request)
This should allow you to use -params "commitWithin=1000" to make sure each document shows up within one second of being added to the index.
Committing after each document is overkill and bad for performance; in any case it's quite strange that you have to resubmit everything from the start if something goes wrong. I seriously suggest changing the indexing strategy you're using instead of investigating a different way to commit.
That said, if you have no other option than changing the commit configuration, I suggest configuring autoCommit in your Solr collection/index or using the commitWithin parameter, as suggested by #MatsLindh. Just check whether the tool you're using allows adding this parameter.
autoCommit
These settings control how often pending updates will be automatically pushed to the index. An alternative to autoCommit is to use commitWithin, which can be defined when making the update request to Solr (i.e., when pushing documents), or in an update RequestHandler.
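For comparison, the same commitWithin behaviour is available when indexing programmatically through SolrJ instead of the post tool; a minimal sketch (collection name and URL are placeholders):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CommitWithinSketch {
    public static void main(String[] args) throws Exception {
        try (SolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {
            List<SolrInputDocument> docs = new ArrayList<>();
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            doc.addField("title", "example");
            docs.add(doc);
            // commitWithin = 1000 ms: ask Solr to make these docs visible within a second
            // without issuing an explicit commit for every request.
            client.add(docs, 1000);
        }
    }
}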

Solr 6.4: Cannot unload core via API or Admin Panel

The problem is: I tried to replace a core by creating a new one with a different name, swapping them, and then UNLOADing the old one, but it failed.
Now, even when trying to clean everything up manually (unloading the cores with the AdminPanel or via curl using deleteIndexDir=true&deleteInstanceDir=true, and deleting the physical directories of both cores), nothing works.
If I UNLOAD the cores using the AdminPanel, then I don't see the cores listed anymore. But the STATUS command still returns me this:
$ curl -XGET 'http://localhost:8983/solr/admin/cores?action=STATUS&core=mycore&wt=json'
{"responseHeader":{"status":0,"QTime":0},"initFailures":{},"status":{"mycore":{"name":"mycore","instanceDir":"/var/solr/data/mycore","dataDir":"data/","config":"solrconfig.xml","schema":"schema.xml","isLoaded":"false"}}}
But, if I try to UNLOAD the core via curl:
$ curl -XGET 'http://localhost:8983/solr/admin/cores?action=UNLOAD&deleteIndexDir=true&deleteInstanceDir=true&core=mycore&wt=json'
{"responseHeader":{"status":0,"QTime":0}}
and there is no effect: I still see the core listed in the AdminPanel, STATUS returns exactly the same thing, and of course if I try to access the cores, errors start popping up telling me that solrconfig.xml doesn't exist. Of course, nothing exists.
I know that if I restart Solr everything will be fine. But I cannot restart Solr in production every time it gets into this dirty state on its own (and it does, very often).
Some time ago I made a comment here but I didn't get any useful reply.
Now, the real problem is that in production there are other cores working, and restarting Solr takes about half an hour, which is not OK at all.
So, the question is how to clean up unloaded cores properly WITHOUT restarting Solr. Please, before saying "no, it's not possible", try to understand the business requirement. It MUST be possible somehow. If you know the reason why it's not possible, let's start thinking together about how it could be possible.
UPDATE
I'm adding here some errors I've found looking at the logs, I hope it helps:
Solr init error
Solr create error
Solr duplicate requestid error (my script tried twice using the same id)
Solr closing index writer error
Solr error opening new searcher
I've just noticed that the error opening the searcher and the one creating the core are related; both contain Caused by: java.nio.file.FileAlreadyExistsException: /var/solr/data/mycore/data/index/write.lock

solr update/json hangs semi-randomly

I'm a total solr noob so I'm probably missing important information here.
Solr version: 4.10.2
Platform: Mac OS X
I'm attempting to add about 5000 documents to an empty index. Documents have 4 fields:
id (string, indexed, stored)
title (solr.TextField, indexed, not stored)
keywords (solr.TextField, multi-valued, indexed, not stored)
content (solr.TextField, indexed, not stored)
I'm using update/json to insert the documents in batches of 100 in a tight loop (making a new HTTP request to the update/json endpoint for each batch). The problem gets better if I add, e.g., a 100ms delay between each request. If I delay a full second it goes away completely, but this is obviously unacceptably slow.
I have worked around it by adding very short timeouts for my HTTP requests (1 second), and implementing some retry logic. It works, but of course I get annoying delays all the time as it retries.
My process often hangs waiting for Solr to respond at some point during indexing. For instance, if I start with a fresh core and test it right now, these are my results for each run in turn:
hang on the 45th batch, solr admin shows 3,280 documents
hang on the 52nd batch, solr admin shows 3,788 documents
hang on the 14th batch, solr admin shows 3,788 documents
hang on the 17th batch, solr admin shows 3,788 documents
successfully completes all batches, solr admin shows 4,043 documents
The log in solr admin shows no output during any of these runs. At any point after a failed or successful run I can query the index and get back reasonable results considering the data that has been added.
The update/json request handler is the one that is "implicitly added" -- it is not specified in my solrconfig.xml.
I have tried switching my locking mechanism from native to simple with no change in behavior.
Any help you can offer would be greatly appreciated. I'm not sure where to start.
Additional info:
1: It seems to hang forever. By "hang" I mean Solr never responds to the HTTP request. If I cancel the request and send it again, it generally works fine right away. I have let it wait up to about 10 minutes for a response.
2: My solrconfig.xml has this:
<updateHandler class="solr.DirectUpdateHandler2">
<updateLog>
<str name="dir">${solr.ulog.dir:}</str>
</updateLog>
<autoCommit>
<maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
<openSearcher>false</openSearcher>
</autoCommit>
<autoSoftCommit>
<maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
</autoSoftCommit>
</updateHandler>
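For reference, the batching-with-short-timeouts-and-retry workaround described above looks roughly like this when written against a recent SolrJ API; it is only a sketch with made-up names, not the code actually used:

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BatchedIndexerSketch {
    public static void main(String[] args) throws Exception {
        try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr/mycore")
                .withConnectionTimeout(1000) // fail fast instead of hanging on the request
                .withSocketTimeout(1000)
                .build()) {
            List<SolrInputDocument> batch = new ArrayList<>();
            for (int i = 0; i < 5000; i++) {
                SolrInputDocument doc = new SolrInputDocument();
                doc.addField("id", "doc-" + i);
                doc.addField("title", "title " + i);
                batch.add(doc);
                if (batch.size() == 100) {       // send in batches of 100
                    addWithRetry(client, batch, 3);
                    batch.clear();
                }
            }
            if (!batch.isEmpty()) {
                addWithRetry(client, batch, 3);
            }
            client.commit();
        }
    }

    // Resend a batch a few times if the request times out or fails;
    // re-adding the same ids is safe because Solr overwrites by unique key.
    static void addWithRetry(SolrClient client, List<SolrInputDocument> batch, int maxAttempts)
            throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                client.add(batch);
                return;
            } catch (Exception e) {
                if (attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(100); // brief pause before retrying
            }
        }
    }
}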
You did not describe the actual 'hang'. Is it hanging for a period of time or forever? That makes quite a difference.
I am assuming your actual documents (the content fields?) are quite large.
There might be a couple of things:
Garbage collection. If you allocated a lot of memory to Solr, the GC pause when it hits the limit could be quite long. There are Java flags to enable GC reporting during a test run.
Index merging. Watch the data/index directory and see whether the files start moving around.
Look also in the server logs, not just in the WebUI. The server logs have constant chatter about what's going on; the UI only shows the issues.
It's also worth checking what your commit and soft-commit settings are (in solrconfig.xml).
