I have a problem with automatic UUID generation in Solr. I want Solr to automatically generate UUIDs for the data imported by the DataImportHandler.
Here's what I did:
In schema.xml
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
<field name="id" type="uuid" indexed="true" stored="true" required="true" multiValued="false" />
In solrconfig.xml
I added:
<updateRequestProcessorChain name="uuid">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">id</str>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
I modified:
<requestHandler name="/update" class="solr.UpdateRequestHandler">
<!-- See below for information on defining
updateRequestProcessorChains that can be used by name
on each Update Request
-->
<!--
<lst name="defaults">
<str name="update.chain">dedupe</str>
</lst>
-->
<lst name="defaults">
<str name="update.chain">uuid</str>
</lst>
</requestHandler>
I also did not comment out or remove the uniqueKey, and I removed everything related to QueryElevation.
But I keep getting this error, and I have no idea where it comes from:
org.apache.solr.common.SolrException: Invalid UUID String: '1'
at org.apache.solr.schema.UUIDField.toInternal(UUIDField.java:89)
at org.apache.solr.schema.FieldType.readableToIndexed(FieldType.java:393)
at org.apache.solr.schema.FieldType.readableToIndexed(FieldType.java:398)
at org.apache.solr.update.AddUpdateCommand.getIndexedId(AddUpdateCommand.java:98)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:717)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:557)
at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:71)
at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:235)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:512)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:331)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:239)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:464)
BTW, I am using Solr 4.8. Thanks very much for any reply; I really appreciate your help!
My guess is that a field named id is already arriving from DIH, and the UUID update request processor (URP) does not overwrite a value that is already present.
Try adding IgnoreFieldUpdateProcessorFactory in front of it and see if the problem goes away. If it does, you can start figuring out where DIH is picking the field up from. For example, if you are getting data from a database and use select *, DIH will automatically map any columns whose names match fields in your schema.
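A minimal sketch of that suggestion, assuming the colliding field is id (the stack trace shows the value '1' being parsed as a UUID) and that the import runs through a /dataimport handler. The handler definition and the data-config.xml file name are assumptions, not taken from the question:

```xml
<!-- Sketch only: drop any incoming "id" before generating a UUID for it. -->
<updateRequestProcessorChain name="uuid">
  <!-- Removes the id value that DIH mapped from the source data -->
  <processor class="solr.IgnoreFieldUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <!-- Fills id with a generated UUID when the field is absent -->
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<!-- Assumed handler definition: DIH requests do not go through /update,
     so the chain is named in this handler's defaults as well. -->
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <str name="update.chain">uuid</str>
  </lst>
</requestHandler>
```

If the error disappears with this chain in place, the id value was most likely coming from the import itself, for example from a column named id in the query.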
Related
I'm having a problem using context filtering with the suggester. I want to filter the suggestions based on the url field.
Here is my searchComponent:
<lst name="suggester">
<str name="name">AnalyzingInfixSuggester</str>
<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">main_title</str>
<str name="weightField">main_title</str>
<str name="contextField">url</str>
<str name="suggestAnalyzerFieldType">text_general</str>
</lst>
Here are the fields in my schema:
<field name="main_title" type="string" indexed="true" stored="true"/>
<field name="url" type="string" indexed="true" stored="true"/>
Example:
I'm searching for "aacsb" and I get two results, which is correct: one in English and one in German. I want to filter out the English one and show only the German result.
My URLs look like this:
https://www.myWebsite.com/aacsb-dog-lion?german
https://www.myWebsite.com/aacsb-dog-lion?english
Here are my queries:
http://localhost:8983/solr/myCore/suggest?&q=aacsb&suggest.dictionary=AnalyzingInfixSuggester&suggest.cfq=-url:english
http://localhost:8983/solr/myCore/suggest?&q=aacsb&suggest.dictionary=AnalyzingInfixSuggester&suggest.cfq=-english
With these I receive both results, whether or not the field name is included.
When I tried these:
http://localhost:8983/solr/myCore/suggest?&q=aacsb&suggest.dictionary=AnalyzingInfixSuggester&suggest.cfq=url:english
http://localhost:8983/solr/myCore/suggest?&q=aacsb&suggest.dictionary=AnalyzingInfixSuggester&suggest.cfq=english
I don't receive any results.
I read the documentation several times, but I still can't make it work.
Any help is welcomed.
Thanks!
EDIT:
I pasted the wrong queries; these are the correct ones:
http://localhost:8983/solr/myCore/suggest?&q=aacsb&suggest.dictionary=AnalyzingInfixSuggester&suggest.cfq=url:*english*
http://localhost:8983/solr/myCore/suggest?&q=aacsb&suggest.dictionary=AnalyzingInfixSuggester&suggest.cfq=*english*
I have integrated Nutch 2.3.1 with Solr 6.5, so I can push data to Solr and get it indexed. Now I want to remove duplicate documents, so I made the following modifications in schema.xml and solrconfig.xml:
<field name="signatureField" type="string" stored="true" indexed="true" multiValued="false" />
<updateRequestProcessorChain name="dedupe">
<processor class="solr.processor.SignatureUpdateProcessorFactory">
<bool name="enabled">true</bool>
<str name="signatureField">id</str>
<bool name="overwriteDupes">false</bool>
<str name="fields">id,content,date,url</str> <!-- changing to id <str name="fields">name,features,cat</str>-->
<str name="signatureClass">solr.processor.Lookup3Signature</str>
</processor>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<requestHandler name="/update" class="solr.UpdateRequestHandler" >
<lst name="defaults">
<str name="update.chain">dedupe</str>
</lst>
</requestHandler>
But indexing with bin/nutch solrindex http://localhost:8983/solr/testcore -all fails with an error.
Please help me sort out this issue.
Thanks in advance :)
This issue might be related to the schema update. If data already existed in the core when you changed the schema, Nutch will treat it as a schema mismatch. The best way to fix this is to re-crawl the pages with the updated schema. Keep in mind that any schema change can cause issues with your existing index.
The post is old, so this is mainly for future reference for people who run into the same issue.
Best :)
I am trying to use the Solr auto-delete feature with Solr 6. I have made the following changes in my managed-schema.xml and solrconfig.xml.
managed-schema
<!--expiration date field-->
<field name="eDate" type="date" multiValued="false" indexed="true" stored="true"/>
<field name="ttl" type="string" multiValued="false" indexed="true" stored="true" default="+90SECONDS"/>
solrconfig
<processor class="solr.processor.DocExpirationUpdateProcessorFactory">
<int name="autoDeletePeriodSeconds">30</int>
<str name="ttlFieldName">ttl</str>
<str name="expirationFieldName">eDate</str>
</processor>
I am able to use auto delete feature as expected if I explicitly set the ttl field either in the incoming document or if I set the ttl request parameter in the update request.
However, I want to use a default value for ttl as specified in the managed-schema if I do not explicitly set the ttl field. When I try this, ttl field is generated with the default value but the corresponding eDate field is not generated.
Is it possible to do what I am trying to do?
If yes, then how can I do this? Please leave a comment if you need any further details.
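I couldn't get it working via the default attribute in the field definition, but I got it working by adding solr.DefaultValueUpdateProcessorFactory. Schema defaults are applied only after the update processor chain has run, so DocExpirationUpdateProcessorFactory never sees the default ttl value; setting the default inside the chain, ahead of the expiration processor, makes the value visible to it.
In my update chain I have this: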
<processor class="solr.DefaultValueUpdateProcessorFactory">
<str name="fieldName">ttl</str>
<str name="value">+15SECONDS</str>
</processor>
<processor class="solr.processor.DocExpirationUpdateProcessorFactory">
<int name="autoDeletePeriodSeconds">5</int>
<str name="ttlFieldName">ttl</str>
<str name="expirationFieldName">eDate</str>
</processor>
I changed the values to make testing quicker :)
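For context, here is a sketch of how those two processors might sit inside a complete chain, using the values from the question rather than the shortened test values. The chain name "expire" and the choice to mark it as the default chain are assumptions for illustration:

```xml
<!-- Sketch only: the chain name "expire" is hypothetical. -->
<updateRequestProcessorChain name="expire" default="true">
  <!-- Runs first so ttl is present when the expiration processor looks for it -->
  <processor class="solr.DefaultValueUpdateProcessorFactory">
    <str name="fieldName">ttl</str>
    <str name="value">+90SECONDS</str>
  </processor>
  <!-- Computes eDate from ttl and periodically deletes expired documents -->
  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
    <int name="autoDeletePeriodSeconds">30</int>
    <str name="ttlFieldName">ttl</str>
    <str name="expirationFieldName">eDate</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

With the default supplied by the chain, the default attribute on the ttl field in the schema should no longer be needed.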
I am trying to configure Solr 4 to work with UUIDs, and so far I have been unsuccessful.
From reading the documentation I have seen two different ways to configure schema.xml to work with UUIDs (neither works).
For both I need to write:
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
option 1:
add:
<field name="id" type="uuid" indexed="true" stored="true" default="NEW" multiValued="false"/>
and make sure to remove the line
<uniqueKey>id</uniqueKey>
option 2
add:
<field name="id" type="uuid" indexed="true" stored="true" required="true" multiValued="false" />
Neither option works; both return:
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error initializing QueryElevationComponent.
I also tried adding the following configuration to solrconfig.xml:
<updateRequestProcessorChain name="uuid">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">uniqueKey</str>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
Thanks,
Shimon
After some work, here is the solution:
In schema.xml, add (or edit) the id field:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
In solrconfig.xml, define the chain and add it to the handlers (for example, /update/extract):
<updateRequestProcessorChain name="uuid">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">id</str>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<str name="lowernames">true</str>
<str name="uprefix">ignored_</str>
<str name="captureAttr">true</str>
<str name="fmap.a">links</str>
<str name="fmap.div">ignored_</str>
<str name="update.chain">uuid</str>
</lst>
</requestHandler>
You may want to remove the QueryElevationComponent if you are not using it.
QueryElevationComponent requires a unique key to be defined, and that key must be a string field; there is a JIRA issue about this.
However, it was fixed in Solr 4.0 alpha, so it depends on which Solr version you are using.
This limitation is documented in the Solr wiki.
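As a rough sketch, removing the component usually means commenting out both its definition and the handler that references it. The element names and values below follow the stock Solr 4 example solrconfig.xml and are assumptions about your configuration:

```xml
<!-- Sketch only: names follow the stock Solr 4 example solrconfig.xml,
     which may differ from your configuration. -->
<!--
<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>

<requestHandler name="/elevate" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
  <arr name="last-components">
    <str>elevator</str>
  </arr>
</requestHandler>
-->
```

If any other request handler lists elevator among its components, that reference would need to be removed as well.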
I run a simple query as follows:
http://....:8983/solr/vault/select?q=*:*
I don't see all the fields that I declared as stored="true" and required="true".
For instance, I have defined the following field, which is not displayed in the results:
<field name="Comments" type="text_en" indexed="true" stored="true" required="true"/>
(I can see it in
http://...:8983/solr/#/vault/schema
and I see that it was loaded in the SQL profiler.)
Why is that?
The fields returned are determined by the fl parameter, which can also be configured as a default for the SearchHandler in solrconfig.xml.
To see all field values, append &fl=* to the URL:
http://....:8983/solr/vault/select?q=*:*&fl=*
To return all fields by default, update solrconfig.xml (refer to the sample config in the Solr example configuration):
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="df">text</str>
<str name="fl">*,score</str> <!-- field entry added-->
</lst>
</requestHandler>
Can you see any of the data that you have loaded? If not, make sure you have committed your updates to the index.
You can issue a hard commit via http://....:8983/solr/vault/update?commit=true