Why does Solr 6.1 turn JSON single values into arrays?

I'm in the process of upgrading from 4.7 to 6.1. Previously I was specifying fields in solrconfig.xml, but I wanted to move to the managed schema so I can add JSON with new fields whenever I want.
The problem is that the 6.1 managed schema turns single string values, numbers, etc. into arrays. This breaks sorting, since Solr cannot sort on array (multivalued) values, and it turns my single-value dates into arrays with a single value.
Solr 6.1's solrconfig.xml has this:
<processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
<str name="defaultFieldType">strings</str>
<lst name="typeMapping">
<str name="valueClass">java.lang.Boolean</str>
<str name="fieldType">booleans</str>
</lst>
<lst name="typeMapping">
<str name="valueClass">java.util.Date</str>
<str name="fieldType">tdates</str>
</lst>
<lst name="typeMapping">
<str name="valueClass">java.lang.Long</str>
<str name="valueClass">java.lang.Integer</str>
<str name="fieldType">tlongs</str>
</lst>
<lst name="typeMapping">
<str name="valueClass">java.lang.Number</str>
<str name="fieldType">tdoubles</str>
</lst>
</processor>
I tried making the data types singular, such as strings -> string, but that didn't work.
Thanks!

Fields already created are the issue
(sorry to answer my own question but I found out the answer before anyone else did)
Changing the above snippet to singular data types works, BUT...
If fields were already created dynamically under a different solrconfig.xml, reloading with the singular types makes the defaults work as expected for new fields, BUT the existing fields keep the (multivalued) types they were originally created with.
To remedy this, I unloaded the core, deleted it, recreated it, changed solrconfig.xml to the desired settings, and then re-added the documents.
It worked fine after that.
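For reference, a sketch of the processor above switched to singular types. It assumes the single-valued field types string, boolean, tdate, tlong and tdouble exist in your managed-schema (the stock 6.x data-driven configset defines them, but check your own):

```xml
<processor class="solr.AddSchemaFieldsUpdateProcessorFactory">
  <!-- Unknown fields now become single-valued "string" instead of "strings" -->
  <str name="defaultFieldType">string</str>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Boolean</str>
    <str name="fieldType">boolean</str>
  </lst>
  <lst name="typeMapping">
    <str name="valueClass">java.util.Date</str>
    <str name="fieldType">tdate</str>
  </lst>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Long</str>
    <str name="valueClass">java.lang.Integer</str>
    <str name="fieldType">tlong</str>
  </lst>
  <lst name="typeMapping">
    <str name="valueClass">java.lang.Number</str>
    <str name="fieldType">tdouble</str>
  </lst>
</processor>
```

The trade-off: a field guessed this way is single-valued, so a later document that sends a JSON array for it will be rejected instead of being indexed.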
UPDATE
I recommend editing the managed-schema file found in /var/solr/data/CORE_NAME/conf to predefine the fields you want, leaving the default behavior for everything else. You can also do this through the admin interface by adding fields.
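Predefining fields amounts to adding explicit single-valued field entries to managed-schema. The field names below are hypothetical placeholders; reload the core after editing the file by hand:

```xml
<!-- Hypothetical field names: declare the fields you sort on as single-valued -->
<field name="title" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="price" type="tdouble" indexed="true" stored="true" multiValued="false"/>
<field name="created_date" type="tdate" indexed="true" stored="true" multiValued="false"/>
```

Fields that are not declared still fall through to the guessing done by AddSchemaFieldsUpdateProcessorFactory.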

Related

Default operator AND using SOLR on Coldfusion

I just want the default operator to be AND rather than OR for every basic search. For a particular collection, I set defaultOperator to AND in the schema.xml and solrconfig.xml files (it makes no difference), set mm to 100%, and restarted the CF Add-on Server services, but there is still no difference when searching. I am on ColdFusion 2018.
<cfsearch
name='qHearings'
collection='hearings_collection'
criteria='conflicts of interest'
/>
returns documents with the words 'conflicts' OR 'interest'. If I change it to:
<cfsearch
name='qHearings'
collection='hearings_collection'
criteria='conflicts AND of AND interest'
/>
returns documents with the words 'conflicts' AND 'interest'. This is good, but my users don't like being told to use AND, and I hear endless comments about why it can't be like Google search :(
I have been reading up on Solr, and it seems many people have the same problem, but whenever I try the suggestions I still get an OR search result.
Has anyone got basic Solr search to default to AND?
Thank you @MatsLindh, your comments led me to the right path! I was setting
<solrQueryParser q.op="AND"/>
in schema.xml, thinking that was where I was supposed to do it (of course, it made no difference; I still got an OR search result).
I couldn't find a Solr log for ColdFusion, but I played around with the solrconfig.xml file for one particular collection. After re-reading your comments, I added
<str name="q.op">AND</str>
to the "standard" handler and it worked! I am somewhat embarrassed, because doing it that way wasn't obvious to me, and for all my googling I didn't see it done that way (I only saw it passed as a query parameter).
So my standard handler looks like this:
<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
<!-- default values for query parameters -->
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="hl.fl">summary title </str>
<str name="df">contents</str>
<str name="q.op">AND</str>
<str name="mm">100%</str>
<!-- omp = Only More Popular -->
<str name="spellcheck.onlyMorePopular">false</str>
<!-- exr = Extended Results -->
<str name="spellcheck.extendedResults">false</str>
<!-- The number of suggestions to return -->
<str name="spellcheck.count">1</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
Super embarrassing for me that the solution was so simple.
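A likely reason mm=100% alone made no difference: mm is a parameter of the dismax/edismax query parsers, and the handler above uses the standard (lucene) parser, which ignores it. If you would rather rely on mm, a sketch of the handler switched to edismax follows. It assumes ColdFusion sends plain keyword criteria; edismax parses some syntax differently from the standard parser, so test your existing criteria strings first. The spellcheck defaults are omitted for brevity:

```xml
<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <!-- Extended DisMax honors the mm (minimum should match) parameter -->
    <str name="defType">edismax</str>
    <!-- Field(s) that edismax searches -->
    <str name="qf">contents</str>
    <!-- Require every query term to match -->
    <str name="mm">100%</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

With either approach, a stopword-free analyzer means "of" must match too, exactly as with an explicit AND.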

Solr suggester: Context Filter incorrectly applied to FileDictionaryFactory

In the docs, it says that context filtering only comes into effect "when using AnalyzingInfixLookupFactory or BlendedInfixLookupFactory, when backed by a DocumentDictionaryFactory".
However, I have found that the context filtering is applied when a FileDictionaryFactory is used. This doesn't work, as there are no documents for the context filter to be applied to.
http://localhost:8983/solr/mycore/suggest?qt=suggest&suggest.dictionary=location&q=russia
> Returns ["Russia"]
http://localhost:8983/solr/mycore/suggest?qt=suggest&suggest.dictionary=location&q=russia&cfq=a
> Returns []
This is my suggester config:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">location</str>
<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name="dictionaryImpl">FileDictionaryFactory</str>
<str name="sourceLocation">tdwg.txt</str>
<str name="suggestAnalyzerFieldType">text_general</str>
<str name="highlight">false</str>
</lst>
<lst name="suggester">
<str name="name">common-name</str>
<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">region.vernacular_names_t</str>
<str name="indexPath">common_name_suggest</str>
<str name="contextField">searchable.context_ss</str>
<str name="suggestAnalyzerFieldType">text_general</str>
<str name="highlight">false</str>
</lst>
</searchComponent>
As you can see, for one of the suggesters I do want context filtering (and it is working correctly). So I can't simply remove the suggest.cfq parameter from my request.
Is there anything I can change about my configuration so that the context filter is not applied to my FileDictionaryFactory suggester?
I have faced this issue before: when suggest.cfq is present in the request, context filtering is applied to every (enabled) lookup implementation that supports it (AnalyzingInfix and BlendedInfix).
There seems to be no other solution than switching the dictionary you don't want filtered to a lookup implementation other than these two.
For example you can try to use the FuzzyLookupFactory for the "location" suggester and the context filter won't be applied.
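A sketch of the "location" suggester rewritten that way, keeping the same file source and analyzer type. The highlight option is left out because it is documented for the infix lookups:

```xml
<lst name="suggester">
  <str name="name">location</str>
  <!-- Not a context-aware lookup, so suggest.cfq is not applied here -->
  <str name="lookupImpl">FuzzyLookupFactory</str>
  <str name="dictionaryImpl">FileDictionaryFactory</str>
  <str name="sourceLocation">tdwg.txt</str>
  <str name="suggestAnalyzerFieldType">text_general</str>
</lst>
```

The "common-name" suggester stays on AnalyzingInfixLookupFactory, so it keeps honoring suggest.cfq in the same request.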
NB: this is a workaround, as it's not possible to get infix matches with FuzzyLookup or AnalyzingLookup implementations (only the whole prefix from the input token(s) is taken into account).
If you really need infix matches for both suggesters, it's likely you will have to make 2 parallel requests before merging the suggestions :/.

AnalyzingInfixLookupFactory implementation in Solr Suggester not returning suggestion results

My requirement is to provide automatic suggestions to users on asset names as per their project.
I have tried using AnalyzingInfixLookupFactory and BlendedInfixLookupFactory, as these are the only ones that support context filtering.
But no suggestion results are being returned.
Below is extract from solrconfig.xml:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">mySuggester</str>
<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">assetname_str</str>
<str name="indexPath">/home/suggest_index</str>
<str name="contextField">projectid</str>
<str name="weightField">weight</str>
<str name="suggestAnalyzerFieldType">string</str>
<str name="buildOnStartup">false</str>
<str name="buildOnCommit">false</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<str name="suggest">true</str>
<str name="suggest.count">10</str>
<str name="suggest.dictionary">mySuggester</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
However, if I use FuzzyLookupFactory as the lookupImpl, suggestion results are returned as expected (but the problem is that FuzzyLookupFactory does not support context filtering).
URL used:
http://ipaddress:port/solr/collection_name/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&wt=json&suggest.q=Com&suggest.cfq=1234
(I know this is an old issue, but in case others stumble across it with the same problem...)
I spent a couple of days dealing with the same empty results. You don't say what the type is of the field you're using as the source for suggestions. You've set suggestAnalyzerFieldType to string.
In many out-of-the-box schema.xml examples, string is a fieldType with no analysis. A key concept, only vaguely hinted at in the Suggester page of the Solr manual, is that the suggestAnalyzerFieldType of lookupImpls like AnalyzingInfixLookupFactory and BlendedInfixLookupFactory need not be the type of the field you generate suggestions from; rather, it must be a type with suitable analyzer elements, such as solr.WhitespaceTokenizerFactory.
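For reference, a whitespace-only analyzer type along the lines of the text_ws type shipped in many example schemas looks like this (check your own schema, since definitions vary):

```xml
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Split on whitespace only: no lowercasing, stopwords, or stemming -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```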
In my case, I was trying to suggest from a multivalued string field, because I wanted the field to have no tokenization. But until I changed suggestAnalyzerFieldType from string to text_ws (a fieldType whose analyzer is only solr.WhitespaceTokenizerFactory), I got empty results.
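Applied to the configuration in the question, the fix is a one-line change. This sketch keeps every other parameter as posted and assumes a text_ws type is defined in the schema:

```xml
<lst name="suggester">
  <str name="name">mySuggester</str>
  <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
  <str name="dictionaryImpl">DocumentDictionaryFactory</str>
  <str name="field">assetname_str</str>
  <str name="indexPath">/home/suggest_index</str>
  <str name="contextField">projectid</str>
  <str name="weightField">weight</str>
  <!-- Was "string": the infix lookup needs a type with a tokenizing analyzer -->
  <str name="suggestAnalyzerFieldType">text_ws</str>
  <str name="buildOnStartup">false</str>
  <str name="buildOnCommit">false</str>
</lst>
```

Since text_ws does not lowercase, a query like suggest.q=Com still matches case-sensitively.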
For what it's worth, if you use a multivalued string field for suggestions and many documents contain the same string values in that field, BlendedInfixLookupFactory seems to produce better results, with no duplicate suggestions.

Solr SuggestComponent - Building dictionaries based on certain filters?

I am currently using the solr suggest component for an autocomplete feature. Now, according to user permissions and which area of the site I am on, I want to offer the user different suggestions. Now I assumed it would easily be possible to only consider certain explicit entries for building my dictionary (i.e.: dict1 is built only from entries where type=t1 and locale=en, dict 2 where type=t1 and locale=de, dict3 where type=t2 and locale=en, etc...). But I can't figure out where I would do such a thing. The system is running solr 4.6.
Do you know of any solution or have a possible workaround?
I am not currently able to update solr on the system or change the way documents are indexed apart from the solr configuration, so unfortunately context filtering is not available to me. This would only be a last resort if nothing else works.
Since you're using the old Solr 4.6, which has neither context filtering nor even multiple dictionaries, you would need to define a separate dictionary (that is, a separate searchComponent) for each entry:
<searchComponent class="solr.SpellCheckComponent" name="suggest-en">
<lst name="spellchecker">
<str name="name">suggest-en</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
<str name="field">name</str> <!-- the indexed field to derive suggestions from -->
<float name="threshold">0.005</float>
<str name="buildOnCommit">true</str>
<!-- alternatively, build from a file instead of the field above:
<str name="sourceLocation">american-english</str>
-->
</lst>
</searchComponent>
And then define request handlers, like this:
<requestHandler class="org.apache.solr.handler.component.SearchHandler"
name="/suggest-en">
<lst name="defaults">
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">suggest-en</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.count">5</str>
<str name="spellcheck.collate">true</str>
</lst>
<arr name="components">
<str>suggest-en</str>
</arr>
</requestHandler>
The workaround is to define suggest-en, suggest-de, and a component and request handler for every other combination, and then point each client to the correct request handler depending on its profile.
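For example, a German dictionary could sit alongside the English one. This sketch assumes the German entries are exported to a hypothetical file suggest-de.txt in the core's conf directory, one entry per line; with the old Suggester, setting sourceLocation makes it build from that file instead of an indexed field, which avoids having to filter the index at build time:

```xml
<searchComponent class="solr.SpellCheckComponent" name="suggest-de">
  <lst name="spellchecker">
    <str name="name">suggest-de</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <!-- Hypothetical file in conf/, one suggestion per line -->
    <str name="sourceLocation">suggest-de.txt</str>
  </lst>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest-de">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest-de</str>
    <str name="spellcheck.count">5</str>
  </lst>
  <arr name="components">
    <str>suggest-de</str>
  </arr>
</requestHandler>
```

A file-based dictionary has to be built once, for example with a request to /suggest-de carrying spellcheck.build=true.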

Solr - Suggest Component with 2 different field types

I'm having trouble finding a way to have two differently structured fields in one suggest component (https://cwiki.apache.org/confluence/display/solr/Suggester).
The goal is to have an autocomplete module with these fields.
A field where StandardTokenizer is used
example output: This is a title
A field where a custom tokenizer is used (basically a regex to get the base domain of a full URL)
example output: thisisatitle.com
That way, the request handler containing the suggest component can show both strings in the results array: thisisatitle.com and This is a title
Things I've tried:
Multiple suggest components
I've googled, and the only solution I've found so far is using shards, since they allow different schemas to be combined. To my mind that is rather inefficient, as running two servers would be a waste of resources and maintainability would also suffer.
Any suggestions/workarounds are welcome.
To use multiple suggestion dictionaries (that can have different analyzers applied), you can use the "multiple dictionaries" configuration as shown in the documentation:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">mySuggester</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">cat</str>
<str name="weightField">price</str>
<str name="suggestAnalyzerFieldType">string</str>
</lst>
<lst name="suggester">
<str name="name">altSuggester</str>
<str name="dictionaryImpl">DocumentExpressionDictionaryFactory</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="field">product_name</str>
<str name="weightExpression">((price * 2) + ln(popularity))</str>
<str name="sortField">weight</str>
<str name="sortField">price</str>
<str name="storeDir">suggest_fuzzy_doc_expr_dict</str>
<str name="suggestAnalyzerFieldType">text_en</str>
</lst>
</searchComponent>
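Both dictionaries can then be queried in one request by passing suggest.dictionary more than once, which the same documentation shows as repeated URL parameters. A sketch of a handler that does this by default (the /suggest name is an assumption):

```xml
<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <!-- Repeating the parameter queries both suggesters in one request -->
    <str name="suggest.dictionary">mySuggester</str>
    <str name="suggest.dictionary">altSuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

The response lists suggestions separately under each dictionary name, so the client merges the two lists itself.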
