Solr auto delete using default value of ttl field - solr

I am trying to use Solr's auto-delete feature with Solr 6. I have made the following changes to my managed-schema and solrconfig.xml.
managed-schema
<!--expiration date field-->
<field name="eDate" type="date" multiValued="false" indexed="true" stored="true"/>
<field name="ttl" type="string" multiValued="false" indexed="true" stored="true" default="+90SECONDS"/>
solrconfig
<processor class="solr.processor.DocExpirationUpdateProcessorFactory">
<int name="autoDeletePeriodSeconds">30</int>
<str name="ttlFieldName">ttl</str>
<str name="expirationFieldName">eDate</str>
</processor>
Auto-delete works as expected if I explicitly set ttl, either as a field in the incoming document or as the ttl request parameter of the update request.
However, if I do not set ttl explicitly, I want it to fall back to the default value specified in the managed-schema. When I try this, the ttl field is generated with the default value, but the corresponding eDate field is not generated.
Is it possible to do what I am trying to do?
If so, how? Please leave a comment if you need any further details.

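I couldn't get it to work via the default attribute in the field definition, but I got it working by adding solr.DefaultValueUpdateProcessorFactory. A likely reason is that schema defaults are filled in only when the document is finally indexed, after the update processors have run, so DocExpirationUpdateProcessorFactory never sees the ttl value.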
In my update chain I have this:
<processor class="solr.DefaultValueUpdateProcessorFactory">
<str name="fieldName">ttl</str>
<str name="value">+15SECONDS</str>
</processor>
<processor class="solr.processor.DocExpirationUpdateProcessorFactory">
<int name="autoDeletePeriodSeconds">5</int>
<str name="ttlFieldName">ttl</str>
<str name="expirationFieldName">eDate</str>
</processor>
I changed the values to make testing quicker :) Link to the working code
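For context, here is a minimal sketch of how those two processors might sit inside a complete chain in solrconfig.xml, using the values from the question. The chain name add-ttl-default and the default="true" attribute are assumptions for illustration and are not part of the original answer. The ordering is the point: the default ttl has to be added before the expiration processor runs.

```xml
<!-- Hypothetical chain name; default="true" is assumed so the chain
     applies to updates that do not name a chain explicitly. -->
<updateRequestProcessorChain name="add-ttl-default" default="true">
  <!-- Adds ttl only when the incoming document has no ttl value -->
  <processor class="solr.DefaultValueUpdateProcessorFactory">
    <str name="fieldName">ttl</str>
    <str name="value">+90SECONDS</str>
  </processor>
  <!-- Computes eDate from ttl and periodically deletes expired documents -->
  <processor class="solr.processor.DocExpirationUpdateProcessorFactory">
    <int name="autoDeletePeriodSeconds">30</int>
    <str name="ttlFieldName">ttl</str>
    <str name="expirationFieldName">eDate</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

With a chain like this, the default attribute on the ttl field in the schema should no longer be needed.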

Related

Solr server Context Filtering in Auto suggester not working

I'm having a problem using context filtering with the suggester. I want to filter the suggestions based on the url field.
Here is my searchComponent:
<lst name="suggester">
<str name="name">AnalyzingInfixSuggester</str>
<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">main_title</str>
<str name="weightField">main_title</str>
<str name="contextField">url</str>
<str name="suggestAnalyzerFieldType">text_general</str>
</lst>
Here are the fields in my schema:
<field name="main_title" type="string" indexed="true" stored="true"/>
<field name="url" type="string" indexed="true" stored="true"/>
Example:
I'm searching for "aacsb" and I get two results, which is correct. One is in English and one in German. I want to filter them and show only the German result.
My URLs look like this:
https://www.myWebsite.com/aacsb-dog-lion?german
https://www.myWebsite.com/aacsb-dog-lion?english
Here are my queries:
http://localhost:8983/solr/myCore/suggest?&q=aacsb&suggest.dictionary=AnalyzingInfixSuggester&suggest.cfq=-url:english
http://localhost:8983/solr/myCore/suggest?&q=aacsb&suggest.dictionary=AnalyzingInfixSuggester&suggest.cfq=-english
With these I'm receiving both results. It doesn't matter if we have the field name or not.
When I tried these
http://localhost:8983/solr/myCore/suggest?&q=aacsb&suggest.dictionary=AnalyzingInfixSuggester&suggest.cfq=url:english
http://localhost:8983/solr/myCore/suggest?&q=aacsb&suggest.dictionary=AnalyzingInfixSuggester&suggest.cfq=english
I don't receive any results.
I have read the documentation several times (LINK), but I still can't make it work.
Any help is welcomed.
Thanks!
EDIT:
I pasted the wrong queries; these are the correct ones:
http://localhost:8983/solr/myCore/suggest?&q=aacsb&suggest.dictionary=AnalyzingInfixSuggester&suggest.cfq=url:*english*
http://localhost:8983/solr/myCore/suggest?&q=aacsb&suggest.dictionary=AnalyzingInfixSuggester&suggest.cfq=*english*

Solr ClassificationUpdateProcessorFactory Bayes: problem with labels

I have encountered some very strange behaviour. To test the classification function in Solr, I have defined the following processor chain:
<updateRequestProcessorChain name="classification">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">id</str>
</processor>
<processor class="solr.ClassificationUpdateProcessorFactory">
<str name="inputFields">content</str>
<str name="classField">cat_knn</str>
<str name="predictedClassField.maxCount">2</str>
<str name="algorithm">knn</str>
<str name="knn.k">10</str>
<str name="knn.minTf">1</str>
<str name="knn.minDf">1</str>
</processor>
<processor class="solr.ClassificationUpdateProcessorFactory">
<str name="inputFields">content</str>
<str name="classField">cat_bayes</str>
<str name="predictedClassField.maxCount">2</str>
<str name="algorithm">bayes</str>
</processor>
<processor class="solr.LogUpdateProcessorFactory"/>
<processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
As a test set I am using news categories, such as "business", "entertainment" etc.
The relevant fields are defined as follows:
<field name="cat_knn" type="text_en" indexed="true" stored="true" multiValued="true"/>
<field name="cat_bayes" type="text_en" indexed="true" stored="true" multiValued="true"/>
For the training set cat_knn and cat_bayes contain exactly the same category labels.
However, when I use the above chain to classify new documents, cat_knn is filled with the full label, i.e. "business" or "entertainment", whereas with the bayes algorithm the labels are cut off and appear as "busi" or "entertain". At the same time, a label like "sport" is recorded correctly as "sport".
Any idea what might be going on here?
What you are seeing are the stemmed tokens of the field rather than its stored values. The SolrClassification wiki page says this about the class field:
The field that contains the class of the document. It must appear in the indexed documents. If knn algorithm it must be stored. If bayes algorithm it must be indexed and ideally not heavily analysed.
This indicates that bayes uses the actual tokens, while knn uses the stored text for the field when outputting the class.
Change the field type to string or strings (single valued vs multivalued), or a text field with minimal analysis (maybe a KeywordTokenizer with only a LowercaseFilter or similar).
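As a rough sketch of the second suggestion, a lightly analysed field type could look like the following. The type name class_label is made up for illustration. The field definition reuses cat_bayes from the question, and it is assumed that lowercasing the labels is acceptable.

```xml
<!-- Hypothetical field type: the whole label is kept as one token
     and only lowercased, so no stemming can shorten it. -->
<fieldType name="class_label" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- The class field from the question, switched to the new type -->
<field name="cat_bayes" type="class_label" indexed="true" stored="true" multiValued="true"/>
```

Documents already in the index would need to be reindexed after a change like this, because the existing tokens were produced by the old analysis chain.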

Solr keeps giving errors in automatic uuid generation

I have a problem with automatic UUID generation in Solr. I want Solr to generate UUIDs automatically for the data imported by DataImportHandler.
Here's what I did:
In schema.xml
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
<field name="id" type="uuid" indexed="true" stored="true" required="true" multiValued="false" />
In solrconfig.xml
I added:
<updateRequestProcessorChain name="uuid">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">id</str>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
I modified:
<requestHandler name="/update" class="solr.UpdateRequestHandler">
<!-- See below for information on defining
updateRequestProcessorChains that can be used by name
on each Update Request
-->
<!--
<lst name="defaults">
<str name="update.chain">dedupe</str>
</lst>
-->
<lst name="defaults">
<str name="update.chain">uuid</str>
</lst>
I did not comment out or remove the uniqueKey, and I removed everything related to QueryElevation.
But I keep getting this error, and I have no idea where it comes from:
org.apache.solr.common.SolrException: Invalid UUID String: '1'
at org.apache.solr.schema.UUIDField.toInternal(UUIDField.java:89)
at org.apache.solr.schema.FieldType.readableToIndexed(FieldType.java:393)
at org.apache.solr.schema.FieldType.readableToIndexed(FieldType.java:398)
at org.apache.solr.update.AddUpdateCommand.getIndexedId(AddUpdateCommand.java:98)
at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:717)
at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:557)
at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
at org.apache.solr.handler.dataimport.SolrWriter.upload(SolrWriter.java:71)
at org.apache.solr.handler.dataimport.DataImportHandler$1.upload(DataImportHandler.java:235)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:512)
at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:416)
at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:331)
at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:239)
at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:411)
at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:464)
BTW, I am using Solr 4.8. Thanks very much, I really appreciate your help!
My guess is that a field with that name is already coming in from DIH, and the UUID URP does not override a value that is already present.
Try adding IgnoreFieldUpdateProcessorFactory in front and see if the problem goes away. If it does, you can start figuring out where DIH is picking it up from. For example, if you are getting data from the database and use select *, DIH will automatically try to map any fields with the identical names to what you have in schema.
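A minimal sketch of that suggestion, based on the uuid chain from the question, might look like this. The /dataimport handler block and the data-config.xml file name are assumptions, since the question only shows update.chain being set on /update. If the import runs through a separate DIH handler, the chain may need to be selected there as well.

```xml
<updateRequestProcessorChain name="uuid">
  <!-- Drop any id value arriving from DIH so the next processor can fill it -->
  <processor class="solr.IgnoreFieldUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<!-- Assumed handler definition; data-config.xml is a placeholder name -->
<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <str name="update.chain">uuid</str>
  </lst>
</requestHandler>
```

If the error disappears with the ignore processor in place, that points to DIH mapping a source column named id onto the schema field.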

Configuring Solr to use UUID as a key

I am trying to configure Solr 4 to work with UUIDs, and so far I have been unsuccessful.
From reading the documentation I have seen two different ways to configure schema.xml to work with UUIDs (neither works).
For both I need to write:
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
option 1:
add:
<field name="id" type="uuid" indexed="true" stored="true" default="NEW" multiValued="false"/>
and make sure to remove the line
<uniqueKey>id</uniqueKey>
option 2
add:
<field name="id" type="uuid" indexed="true" stored="true" required="true" multiValued="false" />
Neither option works; both return:
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error initializing QueryElevationComponent.
I also tried adding this configuration to the solrconfig.xml file:
<updateRequestProcessorChain name="uuid">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">uniqueKey</str>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
Thanks,
Shimon
After some work here is the solution:
In schema.xml, add (or edit) the id field:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
In solrconfig.xml, define the chain and add it to the handlers (example for /update/extract):
<updateRequestProcessorChain name="uuid">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">id</str>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<str name="lowernames">true</str>
<str name="uprefix">ignored_</str>
<str name="captureAttr">true</str>
<str name="fmap.a">links</str>
<str name="fmap.div">ignored_</str>
<str name="update.chain">uuid</str>
</lst>
</requestHandler>
You may want to remove the QueryElevationComponent if you are not using it.
QueryElevationComponent requires a unique key to be defined, and that key must be a string field (tracked in JIRA).
However, this was fixed in the Solr 4.0 alpha, so it depends on which Solr version you are using.
The limitation is documented in the Solr wiki.

solr - defined stored="true" and required="true" and the fields are not displayed in * search

I do a simple query as follows
http://....:8983/solr/vault/select?q=*:*
I don't see all the fields that I declared as stored="true" and required="true".
For instance, I have defined the following field, which is not displayed in the results:
<field name="Comments" type="text_en" indexed="true" stored="true" required="true"/>
(I can see it in
http://...:8983/solr/#/vault/schema
and I see it was loaded in the SQL profiler)
Why is that?
Fields to be displayed are defined by 'fl' parameter and can be configured against the SearchHandler in solrconfig.xml.
If you want to see the field values after hitting the URL, append '&fl=*' to the URL.
http://....:8983/solr/vault/select?q=*:*&fl=*
If you want all fields returned by default, update the solrconfig.xml file (refer to the sample config file in the example Solr config):
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="df">text</str>
<str name="fl">*,score</str> <!-- field entry added-->
</lst>
</requestHandler>
Can you see any of the data that you have loaded? If not, make sure you have committed your updates to the index.
You can issue a hard commit via http://....:8983/solr/vault/update?commit=true
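As an alternative sketch, the same hard commit can be sent as an XML update message in the body of a POST to the core's /update handler. This assumes the request is sent with an XML content type such as text/xml.

```xml
<!-- Body of a POST to the /update handler: hard-commits pending changes -->
<commit/>
```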
