Getting an error when using atomic update in Solr

I'm getting the following error in Solr 5.2.1:
RunUpdateProcessor has received an AddUpdateCommand containing a document that appears to still contain Atomic document update operations, most likely because DistributedUpdateProcessorFactory was explicitly disabled from this updateRequestProcessorChain
I tried both SolrCloud and standalone mode. I guess it must be something in my solrconfig.xml; can someone please post an example of a file that works?
In solrconfig.xml I have the following, but I have also tried other configurations:
<initParams path="/update/**">
  <lst name="defaults">
    <str name="update.chain">add-unknown-fields-to-the-schema</str>
  </lst>
</initParams>
<updateRequestProcessorChain name="add-unknown-fields-to-the-schema">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
When I try the example at http://yonik.com/solr/atomic-updates/ it works fine, but that example uses dynamic fields.
BTW, I got the same error with SolrJ and with a curl command (with the XML in a file).
Thanks.
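For reference, a minimal atomic update in Solr's XML update format (the kind of file the question posts with curl) might look like the sketch below. The document id doc1 and the fields title and tags are hypothetical placeholders. A field that carries an update attribute modifies the existing document with that id instead of replacing the whole document.

```xml
<!-- Hypothetical atomic update, posted to the core's /update handler.
     "doc1", "title" and "tags" are placeholder values/field names. -->
<add>
  <doc>
    <!-- uniqueKey value: identifies the existing document to modify -->
    <field name="id">doc1</field>
    <!-- "set" replaces the current value of title -->
    <field name="title" update="set">New title</field>
    <!-- "add" appends a value to a multivalued field -->
    <field name="tags" update="add">solr</field>
  </doc>
</add>
```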

It turned out that the following was missing from schema.xml. Strangely, I didn't read anything about it being a requirement.
<uniqueKey>id</uniqueKey>
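A minimal schema.xml sketch of that fix, assuming the key is a plain string field named id (the field type and attributes are assumptions; adjust them to the existing schema). Atomic updates identify the document to modify through this key.

```xml
<!-- schema.xml: the key field (type "string" is an assumption) -->
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
<!-- declares which field uniquely identifies a document -->
<uniqueKey>id</uniqueKey>
```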

Related

Solr deduplication error while indexing nutch data

I have integrated Nutch 2.3.1 with Solr 6.5, and with this setup I can push data to Solr and get it indexed. Now I want to remove duplicate documents, and for this I made the following modifications in schema.xml and solrconfig.xml:
<field name="signatureField" type="string" stored="true" indexed="true" multiValued="false" />
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">id</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">id,content,date,url</str> <!-- changing to id <str name="fields">name,features,cat</str>-->
    <str name="signatureClass">solr.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<requestHandler name="/update" class="solr.UpdateRequestHandler" >
  <lst name="defaults">
    <str name="update.chain">dedupe</str>
  </lst>
</requestHandler>
But after indexing with bin/nutch solrindex http://localhost:8983/solr/testcore -all, I get an error!
Please help me sort out this issue.
Thanking you in advance :)

This issue might be related to the schema update. If you already have data in Solr and you update the schema while that data exists in the core, Nutch will treat it as a schema mismatch. The best way to fix this is to re-crawl the web pages with the updated schema; keep in mind that any change to the schema could cause issues with your existing index.
The post is already old, so this is for future reference for anyone who runs into the same issue.
Best :)
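If the existing index has to be discarded before re-crawling, one way is to post a delete-by-query to the core's update handler. The sketch below uses Solr's XML update syntax; the core name testcore is taken from the question. A commit (for example a separate <commit/> message, or commit=true on the request) is still needed before the deletion becomes visible, and because it removes every document in the core it should only be used on an index you intend to rebuild.

```xml
<!-- Post to http://localhost:8983/solr/testcore/update (core name from the question).
     Deletes ALL documents in the core; follow it with a commit. -->
<delete>
  <query>*:*</query>
</delete>
```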

Why does Solr 6.1 turn JSON single values into arrays?

I'm in the process of upgrading from 4.7 to 6.1. Previously I was specifying fields in solrconfig.xml, but I wanted to move to the managed schema approach so I can add JSON with new fields whenever I want.
The problem is that the 6.1 managed schema turns string values, numbers, etc. into arrays. This breaks sorting, since Solr cannot sort on multivalued fields, and it turns my single-value dates into arrays with a single element.
The 6.1 solrconfig.xml has this:
<processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
  <str name="defaultFieldType">strings</str>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Boolean</str>
    <str name="fieldType">booleans</str>
  </lst>
  <lst name="typeMapping">
    <str name="valueClass">java.util.Date</str>
    <str name="fieldType">tdates</str>
  </lst>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Long</str>
    <str name="valueClass">java.lang.Integer</str>
    <str name="fieldType">tlongs</str>
  </lst>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Number</str>
    <str name="fieldType">tdoubles</str>
  </lst>
</processor>
I tried making the data types singular (for example strings -> string), but that didn't work.
Thanks!

Fields already created are the issue
(sorry to answer my own question but I found out the answer before anyone else did)
Changing the above snippet to singular data types works BUT...
If you have already created fields dynamically with a different solrconfig.xml and then reload it with singular field types, the defaults will work as expected for new fields, BUT the fields that already exist keep their original (multivalued) definitions.
To remedy this, I unloaded the core, deleted it, recreated it, changed solrconfig.xml to the desired settings, and then added the docs again.
It worked fine after that.
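A sketch of the processor from the question with the singular types the answer describes. It assumes the managed-schema still defines the single-valued field types string, boolean, tdate, tlong and tdouble alongside their plural counterparts, as the Solr 6.x data-driven example schema does; check your own schema before relying on it.

```xml
<!-- Same processor as above, mapped to single-valued field types.
     Assumes these fieldTypes exist in managed-schema. -->
<processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
  <!-- fallback type for values that match no mapping below -->
  <str name="defaultFieldType">string</str>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Boolean</str>
    <str name="fieldType">boolean</str>
  </lst>
  <lst name="typeMapping">
    <str name="valueClass">java.util.Date</str>
    <str name="fieldType">tdate</str>
  </lst>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Long</str>
    <str name="valueClass">java.lang.Integer</str>
    <str name="fieldType">tlong</str>
  </lst>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Number</str>
    <str name="fieldType">tdouble</str>
  </lst>
</processor>
```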
UPDATE
I recommend editing the managed-schema file found in /var/solr/data/CORE_NAME/conf and predefining the fields you want, leaving the default behavior for everything else. You can also do this through the admin interface by adding fields.
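As an example of predefining a field in managed-schema, a single-valued date field might be declared as below; the field name published is hypothetical. Because the field then already exists, the add-unknown-fields processor does not create it, and the core needs to be reloaded for a hand edit to take effect.

```xml
<!-- managed-schema: hypothetical field "published", single-valued so it can be sorted on -->
<field name="published" type="tdate" indexed="true" stored="true" multiValued="false" />
```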

Is it possible to include special characters in a field name with Solr in schemaless mode?

I want my dynamic field names to be able to include hash characters. Is this possible when Solr is in schemaless mode?

Found it. In solrconfig.xml I changed the following block:
<processor class="solr.FieldNameMutatingUpdateProcessorFactory">
  <str name="pattern">[^\w-\.]</str>
  <str name="replacement">_</str>
</processor>
to:
<processor class="solr.FieldNameMutatingUpdateProcessorFactory">
  <str name="pattern">[^\w-\.\#]</str>
  <str name="replacement">_</str>
</processor>
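To illustrate the effect: the pattern matches every character in a field name that is not a word character, hyphen or dot (and, after the change, not #) and replaces it with _. With the modified pattern, a document like the hypothetical one below keeps the field name price#usd; with the original pattern the field would be renamed price_usd. The id value and field name are placeholders.

```xml
<!-- Hypothetical document; "price#usd" survives only with the modified pattern -->
<add>
  <doc>
    <field name="id">doc1</field>
    <field name="price#usd">10</field>
  </doc>
</add>
```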

Custom UpdateRequestProcessorFactory

I have written a custom UpdateRequestProcessorFactory to parse my data before it gets indexed, but the data is not getting committed: when I restart the server, all the data is gone. I have also used what I believe is the correct config:
<updateRequestProcessorChain name="mytestupdatehandler" default="true">
  <processor class="com.solr.handler.interceptor"></processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<requestHandler name="/MypdateHandler" class="solr.UpdateRequestHandler" >
  <lst name="defaults">
    <str name="update.chain">mytestupdatehandler</str>
  </lst>
</requestHandler>
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <!-- See below for information on defining
       updateRequestProcessorChains that can be used by name
       on each Update Request
  -->
  <lst name="defaults">
    <str name="maxThreads">50</str>
    <str name="handlerType">asyncXML</str>
    <str name="sharedError">false</str>
  </lst>
</requestHandler>
Also, the default /update handler uses my update.chain as well. How can I prevent that?

You have default="true", which makes your chain the one used by every update handler that does not name a chain explicitly. Remove that.
You seem to be missing the class name in your processor definition, unless your class really is called interceptor in the com.solr.handler package: <processor class="com.solr.handler.interceptor.CLASSNAME?"></processor>
Are you getting any errors in the console log if you start Solr from the command line? That might give you a hint.
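A sketch of the question's configuration with both suggestions applied. The factory class name com.solr.handler.interceptor.MyInterceptorFactory is a hypothetical placeholder for the real fully qualified class. With default="true" removed, only the handler that sets update.chain uses the custom chain, and /update falls back to Solr's default chain.

```xml
<!-- no default="true": only handlers that name this chain will use it -->
<updateRequestProcessorChain name="mytestupdatehandler">
  <!-- hypothetical fully qualified factory class; replace with the real one -->
  <processor class="com.solr.handler.interceptor.MyInterceptorFactory" />
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<!-- handler name kept exactly as in the question -->
<requestHandler name="/MypdateHandler" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">mytestupdatehandler</str>
  </lst>
</requestHandler>
```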

Nutch not deleting duplicates from Solr

When Nutch finishes its crawl, it recognises that there are duplicates to delete, reports "deleting xxx duplicates", and completes with no problems. The only problem is that it hasn't actually deleted the duplicates, even though it says it has.
I've also tried using the dedup command on its own and the result is the same.
I have Solr and Nutch set up as shown on my blog, if you wish to delve a little deeper; each stage is in a separate post:
http://amac4.blogspot.co.uk/2013/07/setting-up-solr-with-apache-tomcat-be.html
http://amac4.blogspot.co.uk/2013/07/setting-up-nutch-to-crawl-filesystem.html

In the signatureField setting I had "id" instead of "signature":
<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <bool name="overwriteDupes">true</bool>
    <str name="signatureField">signature</str>
    <str name="fields">id</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
Works perfectly now.
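The chain writes the hash into a field called signature, so the schema needs such a field, and the update handler has to route documents through the dedupe chain. A sketch, assuming a single-valued string field (the type and attributes are an assumption, mirroring the field defined in the earlier deduplication question):

```xml
<!-- schema.xml: field named by the chain's signatureField parameter -->
<field name="signature" type="string" stored="true" indexed="true" multiValued="false" />

<!-- solrconfig.xml: send updates on /update through the dedupe chain -->
<requestHandler name="/update" class="solr.UpdateRequestHandler">
  <lst name="defaults">
    <str name="update.chain">dedupe</str>
  </lst>
</requestHandler>
```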
