Solr throwing error org.apache.solr.common.SolrException: Attempted an atomic/partial update to a child doc without indicating the _root_ somehow

We seem to get this error with an existing record in Solr
org.apache.solr.common.SolrException: Attempted an atomic/partial update to a child doc without indicating the _root_ somehow.
If I change the reference id to something that doesn't exist, the document is added fine, and I can then happily fire the same request in again with no issue. I don't really understand this message, and reading the Solr documentation hasn't made it any clearer.
Shouldn't it just replace the whole document? I thought it would be as simple as that. Is the record perhaps corrupt?
How might I debug the issue in Solr, or what should I look for?
Our solrconfig.xml has these defaults in it, but I'm unsure whether they could cause this error.
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema" default="${update.autoCreateFields:true}" processor="uuid,remove-blank,field-name-mutating,parse-boolean,parse-long,parse-double,parse-date,add-schema-fields">
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.DistributedUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
The URL we call is the following, with the JSON object passed in the body:
POST /solr/mycore/update?commit=true
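For what it's worth, Solr decides between a full replace and an atomic update based on the shape of the field values: if every value is a plain value or an array, the document is replaced wholesale, but if any field value is a JSON object whose keys are update modifiers (set, add, inc, remove), the whole request is handled as a partial update, and on a schema with nested-document support that path requires the document's _root_. A minimal sketch with made-up field names, so hedge accordingly.
A full replace, which simply overwrites any existing document with the same id:
{"id": "booking-123", "Status": "Confirmed", "Nights": 3}
An atomic update of the same document; the object-valued Status field is what switches Solr into partial-update mode, and _root_ (the document's own id for a standalone doc) is what the error message is asking for:
{"id": "booking-123", "Status": {"set": "Cancelled"}, "_root_": "booking-123"}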
I think these two issues may be relevant, but they both describe rather low-level problems:
https://issues.apache.org/jira/browse/SOLR-15468
https://issues.apache.org/jira/browse/SOLR-14923
We don't store child objects; we only store single values or arrays of strings/numbers. We do, however, store the original JSON object as a string in the Bookings field, which has lots of extra data that we otherwise stripped out. This is a text_en field, so I wonder if that's the issue; I presume it gets indexed, which sounds dodgy as it can be quite a large object.

My current workaround was to delete the record via the core's Documents tab; after that we could submit the request with no problems!
I do wonder if the issue stems from upgrading from 7.6 to 8.11.1: the person who migrated simply copied the data folder from the 7.6 installation into the 8.11.1 one. That seems rather dodgy, and I imagine a proper backup and restore would have been a better idea.
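Since _root_ is indexed (though typically not stored) in the default schemas, one cheap check, assuming the child-document bookkeeping really is the culprit, is to search on it for the problem record and see whether Solr is tracking block/child state for it (YOUR_ID is a placeholder):
GET /solr/mycore/select?q=_root_:YOUR_ID&fl=id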


Using nested EL for dynamically reading numbered header values does not work any more

As a minimal example, consider an exchange with a few header values like this:
header.test0
header.test1
header.test2
...
I realise that numbered header fields like this are not ideal, but this is not something I can change.
In the past (up to Camel 2.17) I was able to loop over these values like this (XML DSL):
<loop>
  <simple>${header.myCounter}</simple>
  <log message="${header.test${header.CamelLoopIndex}}"/>
</loop>
However, since upgrading to Camel 2.21 this results in the following error:
org.apache.camel.language.simple.types.SimpleIllegalSyntaxException: expected symbol functionEnd but was eol at location 16 test${header
The documentation states that simple expressions can still be nested, so I am unsure why it stopped working, and how I can fix this. I suppose that my issue is caused by the fact that I am combining static text with dynamic values, whereas existing examples always take the complete dynamic value as name.
So my question is, what is the correct way for reading the values of header.test0, header.test1, ... in a loop dynamically?
For anyone who has similar issues:
The issue is not the nested expressions. The issue is that a SimpleIllegalSyntaxException is thrown if you use a nested expression (like in the example) and the header field does not exist. In older versions of Camel a non-existing header field would just evaluate to null; in more recent versions an exception is thrown.
Given the example above, an exception will be thrown if you loop from 0 to 5 and header.test5 doesn't exist, for instance.
The null-safe "?." operator doesn't help here. The "solution" is to use a try/catch (yes, you can do that in XML DSL, as sketched below) and to do the appropriate thing in the catch part when the header value does not exist. It's clumsy, but it lets you replicate the original behaviour of your routes in newer Camel versions.
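A minimal XML DSL sketch of that workaround; the exception class caught here is deliberately broad, since the exact type that surfaces at runtime may vary by Camel version:
<loop>
  <simple>${header.myCounter}</simple>
  <doTry>
    <log message="${header.test${header.CamelLoopIndex}}"/>
    <doCatch>
      <exception>java.lang.Exception</exception>
      <log message="header.test${header.CamelLoopIndex} is not set"/>
    </doCatch>
  </doTry>
</loop>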

Solr: how to know which request handler is being used while you are inside a search component

I'm trying to make a solr plugin to report different statistics about solr queries, including things like number of results, what terms were used, and which core and request handler was used.
I thought of doing this as a custom SearchComponent, and adding it as a last component in all RequestHandlers, but I have one issue: I can't seem to find out which request handler is currently being used inside my search component. That is, in my plugin class that extends SearchComponent, how can I find the request handler?
I feel like this is probably easily accessible in some field and that I'm just blind. Any tips?
Edit:
One thing I could do is configure one search component for each request handler, where they each get some field with the request handler name/id. Although that isn't very pretty.
I also have a mild feeling my entire approach is wrong since I don't know Solr very well.
You can place an attribute naming the handler within each request handler's invariants (you could also place it within the defaults section, but since it is effectively an invariant, it makes sense to put it there) and then read req.getParams() in your component to find that parameter (handlerName):
<requestHandler name="/myhandler" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="handlerName">myhandler</str>
    ...
  </lst>
  ...
</requestHandler>
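On the component side, reading that parameter back is then straightforward. A minimal sketch (the statistics logic itself is left out; only the handler-name lookup is shown):
import java.io.IOException;

import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;

public class StatsComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
        // Nothing to do before the query is executed.
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
        // The invariant configured above arrives as a plain request parameter.
        String handlerName = rb.req.getParams().get("handlerName");
        // ... record number of results, query terms, core, handlerName ...
    }

    @Override
    public String getDescription() {
        return "Reports statistics about queries, per request handler";
    }
}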

SolrJ not setting request handler

I have a SolrCloud collection set up with multiple request handlers, and I would like to access a non-default request handler called /all which is defined in solrconfig.xml. This handler works fine when I search from the browser (screenshot of the /all handler results omitted).
However, when I search from SolrJ using SolrQuery.setRequestHandler("all"), I get 0 results. SolrJ just puts qt=/all into the query, and the browser returns the same empty result for that query with qt=/all (screenshot omitted).
The same behavior is observed for all of our other handlers. If a handler is not defined, Solr throws a different error when there is a leading '/', or defaults to select when there isn't, so we know that isn't the problem.
So my question is: how can I get this to work in SolrJ? Select has the default settings in solrconfig.xml and needs to stay the default handler. Searching around, this error supposedly happens when there are duplicate IDs or the id field is not stored, but if that were the case, none of the searches would work, so I think something else must be going on here.
It's true that the visible effect in the "setRequestHandler" method is to just set the qt parameter. But that's not the end of the story with SolrJ.
When a SolrJ request is processed, if the qt parameter contains a string starting with a forward slash, SolrJ will change "/select" in the URL path to the value contained in that parameter before it sends the request to Solr. It will also send the qt parameter as-is to Solr, but the parameter itself usually doesn't matter.
If you send an actual request to Solr on the /select handler with qt set to "/all", Solr should ignore the qt parameter -- unless you set handleSelect to true in the requestDispatcher section of the solrconfig.xml. This is not recommended -- because it means that you can change the index through the /select handler, simply by setting qt to "/update".
https://cwiki.apache.org/confluence/display/solr/RequestDispatcher+in+SolrConfig#RequestDispatcherinSolrConfig-handleSelectElement
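To make the path rewriting concrete, here is a minimal SolrJ sketch; the base URL and collection name are assumptions, not taken from the question:
import java.io.IOException;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class AllHandlerQuery {
    public static void main(String[] args) throws SolrServerException, IOException {
        HttpSolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build();
        SolrQuery query = new SolrQuery("*:*");
        // With the leading slash, SolrJ rewrites the /select path to /all
        // before sending the request; without it, qt=all is only passed as
        // a parameter and the default /select handler answers.
        query.setRequestHandler("/all");
        QueryResponse rsp = client.query("mycollection", query);
        System.out.println(rsp.getResults().getNumFound());
        client.close();
    }
}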
It's also possible that there's a bug in newer Solr, where using qt with handleSelect="true" doesn't work right. I'm not sure whether that would actually be considered to be a bug or not. It certainly wouldn't be recommended config.
What is the handleSelect setting in solrconfig.xml when you receive that exception?
Looks like the issue just resolved itself. No changes were made, but everything works properly now. We think there could have been some indexing going on in the background, or maybe something happened with Solr itself. We may never know, but I am no longer able to reproduce the issue, so I guess that works.
Thank you to everyone who tried to help.

Sync SOLR with another repository

I need to keep Solr indexes in sync with another repository (a SQL database), with Solr as the source: all operations (update, delete, insert) on documents are done in Solr, fired from a third-party application over which I have no control.
I had to do it quickly, so I did 'something that just works':
Two scheduled jobs:
The first is for newly inserted and updated docs in Solr: a simple search query brings me the docs that need to be synchronized, so it is easy to apply the same changes to my database.
The second is for deletes: it gets all IDs in Solr and compares them to the ones in the DB, and the extra ones are deleted.
I keep these as separate jobs for more flexibility (they can be enabled/disabled through config), and also because the sync schedule is different for each one.
I am not satisfied with my solution; I did not have much time to dive deep into the Solr documentation back then.
But now I am wondering if there are better ways to do it, ideally getting near-real-time sync and firing it on demand.
Maybe event handlers in the Solr configuration?
I think updates won't be an issue with an event handler, if I can hook into the update event and fire the same operation against the DB. (Could anyone confirm whether this is the best approach?)
Deletes are the most pressing part, because the comparison I currently do between IDs is heavy (huge DB and huge document sets in Solr).
Is there any event handler in Solr that would let me know which documents are being deleted (when a delete query is submitted)?
I thought asking here might save me time.
Also, pointers to some samples would be great.
(Preferably using .NET, but I am open to doing it in Java, since that is what Solr is written in, or mixing the two.)
Thanks.
There is an update hook available in Solr that allows you to run a binary; it is configured as a listener inside the <updateHandler> section of solrconfig.xml:
<!-- The RunExecutableListener executes an external command.
     exe  - the name of the executable to run
     dir  - dir to use as the current working directory. default="."
     wait - the calling thread waits until the executable returns. default="true"
     args - the arguments to pass to the program. default=nothing
     env  - environment variables to set. default=nothing
-->
<!-- A postCommit event is fired after every commit -->
<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">snapshooter</str>
  <str name="dir">solr/bin</str>
  <bool name="wait">true</bool>
  <!--
  <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
  <arr name="env"> <str>MYVAR=val1</str> </arr>
  -->
</listener>
See the Solr documentation for details.
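If you need per-document delete notifications rather than a post-commit trigger, another extension point worth considering is a custom update request processor registered in the update chain; it sees every add and delete command as it flows through. A sketch, not tested against any particular Solr version, with the DB calls left as comments:
import java.io.IOException;

import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.DeleteUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class DbSyncProcessorFactory extends UpdateRequestProcessorFactory {
    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
            SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
            @Override
            public void processAdd(AddUpdateCommand cmd) throws IOException {
                // cmd.getSolrInputDocument() holds the incoming document;
                // mirror the insert/update into the SQL DB here.
                super.processAdd(cmd);
            }

            @Override
            public void processDelete(DeleteUpdateCommand cmd) throws IOException {
                // For delete-by-id, cmd.getId() is the document id; for
                // delete-by-query, cmd.query holds the query string.
                super.processDelete(cmd);
            }
        };
    }
}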

Preventing certain docs from being indexed in CLucene

I am building a search index with CLucene and I want to make sure docs containing any offensive terms never get added to the index. Using a StandardAnalyzer with a stop list is not good enough, since the offensive doc still gets added and would be returned for non-offensive searches.
Instead I am hoping to build up a document, then check whether it contains any offensive words, and add it only if it doesn't.
Cheers!
You can't really access that type of data in a Document.
What you can do is run the analysis chain manually on the text and check each token individually. You can do this in a stupid loop, or by adding another analyzer to the chain that just raises a flag you check later.
This introduces some more work, but it's the best way to achieve that, IMO.
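As an illustration of that manual check, here is a sketch against the Java Lucene API, which CLucene mirrors; in CLucene itself the same idea is expressed through Analyzer::tokenStream and the token iteration methods of TokenStream, so treat the exact names below as Java-side only:
import java.io.IOException;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class OffensiveTermCheck {
    // Returns true if any analyzed token from the text appears in the block list.
    public static boolean containsOffensiveTerm(Analyzer analyzer, String fieldName,
            String text, Set<String> blockList) throws IOException {
        try (TokenStream ts = analyzer.tokenStream(fieldName, text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                if (blockList.contains(term.toString())) {
                    return true;
                }
            }
            ts.end();
        }
        return false;
    }
}
Run this over each field's text before calling addDocument, and only add the document when it returns false.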
