I am trying to integrate Solr's auto-suggest functionality into my project. I used this as my starting point and changed the search fields accordingly.
My schema.xml:
<field name="name" type="text_suggest" indexed="true" stored="true"/>
<field name="manu" type="text_suggest" indexed="true" stored="true"/>
<field name="popularity" type="int" indexed="true" stored="true" />
<!-- A variant of textsuggest which only matches from the very left edge -->
<copyField source="name" dest="textnge"/>
<field name="textnge" type="autocomplete_edge" indexed="true" stored="false" />
<!-- A variant of name which matches from the left edge of all terms (implicit truncation) -->
<copyField source="name" dest="textng"/>
<field name="textng" type="autocomplete_ngram" indexed="true" stored="false" omitNorms="true" omitTermFreqAndPositions="true" />
My request handler in solrconfig.xml:
<requestHandler class="solr.SearchHandler" name="/ac" default="true" >
<lst name="defaults">
<str name="defType">edismax</str>
<str name="rows">10</str>
<str name="fl">*,score</str>
<str name="qf">name^50 manu^20.0 textng^50.0</str>
<str name="pf">textnge^50.0</str>
<str name="bf">product(log(sum(popularity,1)),100)^20</str>
<str name="debugQuery">false</str>
</lst>
</requestHandler>
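To get a feel for how much the bf boost contributes, here is a small plain-Java sketch that evaluates product(log(sum(popularity,1)),100) for a few popularity values. It assumes Solr's log() function query is base 10, and it leaves out the ^20 boost and the text-relevance part of the edismax score, so treat the numbers as illustrative only.

```java
// Plain-Java sketch of the bf function used in the /ac handler:
//   product(log(sum(popularity,1)),100)
// Assumes Solr's log() is base 10, so Math.log10 is used here. The ^20 boost on
// the bf parameter and the rest of the edismax score are deliberately left out.
final class BoostFunctionSketch {

    // Value of product(log(sum(popularity,1)),100) for a single document.
    static double boostValue(int popularity) {
        return Math.log10(popularity + 1) * 100;
    }

    public static void main(String[] args) {
        for (int popularity : new int[] {0, 9, 99, 999}) {
            System.out.printf("popularity=%d -> %.1f%n", popularity, boostValue(popularity));
        }
    }
}
```

Because of the logarithm, each tenfold increase in popularity adds only 100 to the function value, so the popularity boost grows slowly.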
The problem is that my "/ac" handler is acting more like the "/select" handler. When I type "moni", I get nothing. But when I type "monitor", it returns the documents containing "monitor".
I have been trying this for a whole day and nothing seems to work. Any help would be deeply appreciated.
Well, when you search for "moni", you are specifically saying that you're looking for the keyword "moni". Try a wildcard (prefix) query by appending "*", such as q=moni*.
You can also query the fields that use the other fieldType analyzers, autocomplete_edge (q=textnge:moni) or autocomplete_ngram (q=textng:moni), for more results.
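To see why a partial term like "moni" can match textng but not name, here is a plain-Java sketch (not Lucene code) of the tokens the two autocomplete fields might index. The autocomplete_edge and autocomplete_ngram fieldType definitions aren't shown, so this assumes, based on the schema comments, that autocomplete_edge treats the whole value as one token while autocomplete_ngram splits it into terms first. The gram sizes and the sample value "Samsung Monitor" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java sketch of edge n-gram tokens, inferred from the schema comments.
// MIN_GRAM and MAX_GRAM are hypothetical; the real fieldTypes are not shown.
final class NGramSketch {
    static final int MIN_GRAM = 2;  // hypothetical minGramSize
    static final int MAX_GRAM = 15; // hypothetical maxGramSize

    // Left-edge grams of one token, e.g. "monitor" -> mo, mon, moni, ...
    static List<String> edgeGrams(String token) {
        List<String> grams = new ArrayList<>();
        for (int len = MIN_GRAM; len <= Math.min(MAX_GRAM, token.length()); len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    // textnge-style: the whole value is one token, so only prefixes of the full string exist.
    static List<String> leftEdgeOfValue(String value) {
        return edgeGrams(value.toLowerCase(Locale.ROOT));
    }

    // textng-style: split into terms first, so a prefix of any term is indexed.
    static List<String> leftEdgeOfEachTerm(String value) {
        List<String> grams = new ArrayList<>();
        for (String term : value.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!term.isEmpty()) {
                grams.addAll(edgeGrams(term));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        String name = "Samsung Monitor"; // hypothetical product name
        System.out.println(leftEdgeOfValue(name).contains("moni"));    // false
        System.out.println(leftEdgeOfEachTerm(name).contains("moni")); // true
    }
}
```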
I think you need to specify a search component in solrconfig.xml, like below:
<searchComponent class="solr.SpellCheckComponent" name="ac">
<lst name="spellchecker">
<str name="name">suggest</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookupFactory</str>
<str name="field">yourfieldname</str> <!-- the indexed field to derive suggestions from -->
<float name="threshold">0.005</float>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
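For intuition about two settings in that component, here is a plain-Java sketch. As I understand the Suggester documentation, threshold is the minimum fraction of documents a term must appear in to enter the dictionary (0.005 means 0.5%). The sketch uses a TreeMap prefix scan in place of the ternary search tree behind TSTLookupFactory, and the term counts are made-up sample data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of threshold filtering plus prefix lookup. A TreeMap stands in for the
// ternary search tree used by TSTLookup; the document frequencies are sample data.
final class ThresholdPrefixSketch {

    // Keep only terms that occur in at least `threshold` (a fraction) of all documents.
    static TreeMap<String, Integer> buildDictionary(Map<String, Integer> docFreqs, int numDocs, double threshold) {
        TreeMap<String, Integer> dict = new TreeMap<>();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            if ((double) e.getValue() / numDocs >= threshold) {
                dict.put(e.getKey(), e.getValue());
            }
        }
        return dict;
    }

    // All dictionary terms starting with `prefix`, in sorted order. Terms sharing a
    // prefix are contiguous in sorted order, so we can stop at the first mismatch.
    static List<String> lookup(TreeMap<String, Integer> dict, String prefix) {
        List<String> out = new ArrayList<>();
        for (String term : dict.tailMap(prefix, true).keySet()) {
            if (!term.startsWith(prefix)) {
                break;
            }
            out.add(term);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreqs = Map.of("monitor", 120, "money", 40, "monique", 3, "mouse", 80);
        TreeMap<String, Integer> dict = buildDictionary(docFreqs, 1000, 0.005);
        System.out.println(lookup(dict, "mon")); // [money, monitor] ("monique" is below the threshold)
    }
}
```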
Related
I have a question about autocomplete in Solr. Say there is a multi-word string "nice cheap laptop" that should be suggested to users when they type 'nice', 'cheap' or 'laptop'. How can I achieve that with Solr?
I'm trying to migrate code that currently works with Elasticsearch to Solr. For ES, the mapping uses the 'completion' type, for which I configure all permutations of the terms in the phrase as inputs to search against, and the output is the original phrase. I couldn't find in the docs if/how this is possible with Solr.
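Instead of indexing every permutation as in the ES completion mapping, Solr's SuggestComponent offers infix-capable lookups such as AnalyzingInfixLookupFactory (and BlendedInfixLookupFactory), which, as far as I know, match the typed prefix against any word of the suggestion. The plain-Java sketch below only illustrates the matching behaviour being asked for; it is not how those lookups are implemented (they build an auxiliary Lucene index).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch of "infix" suggestion matching: a phrase is suggested when the typed
// prefix matches the start of ANY of its words, not just the first one.
final class InfixSuggestSketch {

    static List<String> suggest(List<String> phrases, String typed) {
        String prefix = typed.toLowerCase(Locale.ROOT);
        List<String> out = new ArrayList<>();
        for (String phrase : phrases) {
            for (String word : phrase.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (word.startsWith(prefix)) {
                    out.add(phrase); // return the original phrase, like the ES "output"
                    break;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> phrases = List.of("nice cheap laptop", "cheap phone");
        System.out.println(suggest(phrases, "lap"));  // [nice cheap laptop]
        System.out.println(suggest(phrases, "chea")); // [nice cheap laptop, cheap phone]
    }
}
```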
EDIT:
I tried adding the following to solrconfig.xml:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">mySuggester</str>
<str name="lookupImpl">FuzzyLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">name</str>
<!--str name="weightField">price</str-->
<str name="suggestAnalyzerFieldType">string</str>
<str name="buildOnStartup">false</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="solr.SearchHandler"
startup="lazy" >
<lst name="defaults">
<str name="suggest.dictionary">mySuggester</str>
<str name="suggest">true</str>
<str name="suggest.count">10</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>
And the following to the managed schema:
<field name="productNameId" type="string" indexed="true" stored="true"/>
<field name="aspectId" type="pint" indexed="true" stored="true"/>
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="categoryId" type="string" indexed="true" stored="true"/>
Then I indexed 3 documents with SolrJ:
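import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.response.UpdateResponse;
// addBean(), commit() and close() can throw SolrServerException/IOException; handle or declare them.
String urlString = "http://localhost:8983/solr/aspects";
HttpSolrClient client = new HttpSolrClient.Builder(urlString).build();
client.setParser(new XMLResponseParser());
// ProductAspects is my own bean class (not shown); addBean() maps its @Field-annotated properties.
ProductAspects pa1 = new ProductAspects();
pa1.setId("1");
pa1.setAspectId(1);
pa1.setName("alice");
ProductAspects pa2 = new ProductAspects();
pa2.setId("2");
pa2.setAspectId(2);
pa2.setName("alza");
ProductAspects pa3 = new ProductAspects();
pa3.setId("3");
pa3.setAspectId(3);
pa3.setName("alza bob");
final UpdateResponse res1 = client.addBean(pa1);
final UpdateResponse res2 = client.addBean(pa2);
final UpdateResponse res3 = client.addBean(pa3);
// Commit so the documents become visible to searches.
UpdateResponse res = client.commit();
client.close();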
After that, I would expect that typing 'alz' would return just 2 docs, but it returns all 3 docs:
http://localhost:8983/solr/aspects/suggest?suggest.dictionary=mySuggester&suggest=true&suggest.build=true&suggest.q=alz
Can you please help me find the correct config for autocomplete with Solr?
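One likely reason 'alz' also returns 'alice' is that FuzzyLookupFactory tolerates typos: I believe its default maxEdits is 1, and 'ali' is a single substitution away from 'alz'. The sketch below is a simplified model of that behaviour (it ignores other FuzzySuggester settings such as the non-fuzzy prefix length), not Lucene's automaton-based implementation. If only exact prefix matches are wanted, a non-fuzzy lookup such as AnalyzingLookupFactory may be closer to the goal.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of fuzzy prefix suggestion: a suggestion matches if SOME
// prefix of it is within maxEdits (Levenshtein distance) of the typed text.
final class FuzzyPrefixSketch {

    // Classic two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    static boolean fuzzyPrefixMatch(String suggestion, String typed, int maxEdits) {
        for (int len = 0; len <= suggestion.length(); len++) {
            if (levenshtein(typed, suggestion.substring(0, len)) <= maxEdits) return true;
        }
        return false;
    }

    static List<String> suggest(List<String> suggestions, String typed, int maxEdits) {
        List<String> out = new ArrayList<>();
        for (String s : suggestions) {
            if (fuzzyPrefixMatch(s, typed, maxEdits)) out.add(s);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> names = List.of("alice", "alza", "alza bob");
        System.out.println(suggest(names, "alz", 0)); // [alza, alza bob]
        System.out.println(suggest(names, "alz", 1)); // [alice, alza, alza bob]
    }
}
```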
I have a list of cities in a MySQL DB that is hooked up to a UI for autocompletion. I am currently using Solr 5.3.0. Data import happens through scheduled delta imports. I have the following questions:
I want to add a spell checker to this feature. I tried:
DirectSolrSpellChecker
IndexBasedSpellChecker
FileBasedSpellChecker
Of these three, only FileBasedSpellChecker gives suggestions that exist solely in the DB. For example, when searching for cologne I got results like:
{
"responseHeader":{
"status":0,
"QTime":4,
"params":{
"q":"searchfield:kolakata",
"indent":"true",
"spellcheck":"true",
"wt":"json"}},
"response":{"numFound":0,"start":0,"docs":[]
},
"spellcheck":{
"suggestions":[
"cologne",{
"numFound":4,
"startOffset":12,
"endOffset":19,
"suggestion":["Cologne",
"Bologna",
"Cogne",
"Bastogne"]}],
"collations":[
"collation","searchfield:Cologne"]}}
These cities are pretty accurate and exist in the DB/file.
But when I use the other two, I get results like:
{
"responseHeader":{
"status":0,
"QTime":4,
"params":{
"q":"searchfield:kolakata",
"indent":"true",
"spellcheck":"true",
"wt":"json"}},
"response":{"numFound":0,"start":0,"docs":[]
},
"spellcheck":{
"suggestions":[
"cologne",{
"numFound":4,
"startOffset":12,
"endOffset":19,
"suggestion":["Cologne",
"Cologn",
"Colognei"]}],
"collations":[
"collation","searchfield:Cologne"]}}
These cities are not present in my DB.
Although FileBasedSpellChecker gives satisfactory results, I am a little apprehensive about using it, because I would need to update the file manually every time a city is added or removed. Also, using FileBasedSpellChecker is generally not advisable.
I also need the suggestions to be searchable. Currently I access the docs returned in "response":{"docs":[<some-format>]} to search for results in that city, but now I want the suggester to return results in the same <some-format> instead of plain strings, so that it integrates properly with the UI.
One minor change requested is to sort the suggestions in ascending order of edit (Levenshtein) distance. This is not a hard requirement and is negotiable.
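Since the ordering requirement is negotiable, one low-risk option is to re-sort the returned suggestions on the client. Below is a plain-Java sketch, assuming the suggestion strings have already been extracted from the spellcheck response; the city names come from the output above, but the input order is shuffled here to show the effect.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Client-side sketch: re-order spellcheck suggestions by ascending edit distance
// to the term the user typed (case-insensitive). Ties keep the original order
// because List.sort is stable.
final class EditDistanceSortSketch {

    // Classic two-row dynamic-programming Levenshtein distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // prev now holds row i
        }
        return prev[b.length()];
    }

    static List<String> sortByEditDistance(List<String> suggestions, String typed) {
        String t = typed.toLowerCase(Locale.ROOT);
        List<String> sorted = new ArrayList<>(suggestions);
        sorted.sort(Comparator.comparingInt(s -> levenshtein(t, s.toLowerCase(Locale.ROOT))));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> suggestions = List.of("Bastogne", "Cogne", "Bologna", "Cologne");
        System.out.println(sortByEditDistance(suggestions, "cologne")); // [Cologne, Cogne, Bologna, Bastogne]
    }
}
```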
EDIT:
My solrconfig looks like this:
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<int name="rows">10</int>
<str name="df">searchfield</str>
<str name="spellcheck">true</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.dictionary">file</str>
<str name="spellcheck.maxCollationTries">5</str>
<str name="spellcheck.count">5</str>
</lst>
<arr name="last-components">
<str>spellcheck</str>
</arr>
</requestHandler>
and
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
<str name="queryAnalyzerFieldType">text_ngram</str>
<lst name="spellchecker">
<str name="name">file</str>
<str name="classname">solr.FileBasedSpellChecker</str>
<str name="sourceLocation">spellings.txt</str>
<str name="spellcheckIndexDir">./spellchecker</str>
</lst>
</searchComponent>
The schema looks like this:
<field name="name" type="string" indexed="true" stored="true" multiValued="false" />
<field name="latlng" type="location" indexed="true" stored="true" multiValued="false" />
<field name="citycode" type="string" indexed="true" stored="true" multiValued="false" />
<field name="country" type="string" indexed="true" stored="true" multiValued="false" />
<field name="searchscore" type="float" indexed="true" stored="true" multiValued="false" />
<field name="searchfield" type="text_ngram" indexed="true" stored="false" multiValued="true" omitNorms="true" omitTermFreqAndPositions="true" />
<defaultSearchField>searchfield</defaultSearchField>
<solrQueryParser defaultOperator="OR"/>
<copyField source="name" dest="searchfield"/>
I am looking for alternative ways to sort on a multivalued field.
I know this question has been asked before, and the solutions talk about min and max, but that is not the strategy I am looking for.
Is there a way to copy the multivalued field to another field that can be used for sorting?
For example, like this:
<field name="cat" type="string" indexed="true" stored="true"
multiValued="true"/>
<copyField source="cat" dest="firstcat"/>
<field name="firstcat" type="string" indexed="true" stored="false"
multiValued="false"/>
Answering my own question:
The copyField above will not work: it throws an exception as soon as the multivalued field has more than one value, because the destination field is single-valued. I mean, duh. Obviously.
One working solution is to define an updateRequestProcessorChain in solrconfig.xml and attach it to the update handlers.
Here is a sample:
<updateRequestProcessorChain name="concatFields">
<processor class="solr.CloneFieldUpdateProcessorFactory">
<str name="source">str1</str>
<str name="dest">str2</str>
</processor>
<processor class="solr.ConcatFieldUpdateProcessorFactory">
<str name="fieldName">str2</str>
<str name="delimiter">_</str>
</processor>
<processor class="solr.CloneFieldUpdateProcessorFactory">
<str name="source">str2</str>
<str name="dest">str3</str>
</processor>
<processor class="solr.LogUpdateProcessorFactory" />
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
Then attach the chain to the update paths:
<initParams path="/update/**">
<lst name="defaults">
<str name="update.chain">concatFields</str>
</lst>
</initParams>
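To make the data flow of the concatFields chain concrete, here is a plain-Java sketch that applies the same three steps to one document. It models the document as a map from field name to values (a simplification, not SolrInputDocument), assumes str2 and str3 start out empty, and uses made-up values.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the concatFields chain on one document, modelled as
// field name -> list of values. Field names str1/str2/str3 come from the config.
final class ConcatChainSketch {

    static Map<String, List<String>> apply(Map<String, List<String>> doc) {
        Map<String, List<String>> out = new LinkedHashMap<>(doc);
        List<String> source = doc.get("str1");
        if (source == null || source.isEmpty()) {
            return out; // nothing to clone, so the document is left alone
        }
        // Clone str1 -> str2, then concat str2's values into one value with "_".
        String joined = String.join("_", source);
        out.put("str2", List.of(joined));
        // Clone the single joined str2 value -> str3.
        out.put("str3", List.of(joined));
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = Map.of("str1", List.of("books", "music"));
        System.out.println(apply(doc).get("str3")); // [books_music]
    }
}
```

Because the joined value starts with the first value of str1, sorting on the copy (assuming str3 is a single-valued string field in the schema) orders documents by their first value and, roughly, uses later values as tie-breakers.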
I am trying to configure Solr 4 to work with UUIDs, and so far I have been unsuccessful.
From reading the documentation, I have seen two different ways to configure schema.xml to work with UUIDs (neither works).
For both, I need to add:
<fieldType name="uuid" class="solr.UUIDField" indexed="true" />
Option 1:
Add:
<field name="id" type="uuid" indexed="true" stored="true" default="NEW" multiValued="false"/>
and make sure to remove the line
<uniqueKey>id</uniqueKey>
Option 2:
Add:
<field name="id" type="uuid" indexed="true" stored="true" required="true" multiValued="false" />
Neither option works; both return:
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error initializing QueryElevationComponent.
I also tried adding the following configuration to solrconfig.xml:
<updateRequestProcessorChain name="uuid">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">uniqueKey</str>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
Thanks,
Shimon
After some work, here is the solution:
In schema.xml, add (or edit) the id field:
<field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" />
In solrconfig.xml, update the chain and add it to the handlers (for example, /update/extract):
<updateRequestProcessorChain name="uuid">
<processor class="solr.UUIDUpdateProcessorFactory">
<str name="fieldName">id</str>
</processor>
<processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
<requestHandler name="/update/extract"
startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<str name="lowernames">true</str>
<str name="uprefix">ignored_</str>
<str name="captureAttr">true</str>
<str name="fmap.a">links</str>
<str name="fmap.div">ignored_</str>
<str name="update.chain">uuid</str>
</lst>
</requestHandler>
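The net effect of the uuid chain on each incoming document can be sketched in plain Java as below. The map-based document model is a simplification; as far as I know, UUIDUpdateProcessorFactory only fills the field when the document does not already have a value, and since id is declared as a string field, it holds the UUID's string form.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Sketch of the effect of UUIDUpdateProcessorFactory with fieldName=id:
// documents without an id get a random UUID string; existing ids are kept.
final class UuidDefaultSketch {

    static Map<String, Object> ensureId(Map<String, Object> doc) {
        Map<String, Object> out = new HashMap<>(doc);
        out.putIfAbsent("id", UUID.randomUUID().toString());
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ensureId(Map.of("title", "report")).get("id")); // a new random UUID
        System.out.println(ensureId(Map.of("id", "abc")).get("id"));       // abc
    }
}
```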
You may want to remove the QueryElevationComponent if you are not using it.
QueryElevationComponent requires a uniqueKey to be defined, and it has to be a string field; see the corresponding JIRA issue.
However, this was fixed in Solr 4.0 alpha, so it depends on which Solr version you are using.
This limitation is documented in the Solr wiki.
I am starting to integrate with Solr and have run across what I perceive as an issue. I uploaded a simple spreadsheet using the Java API. Here is an excerpt:
- Document, id, value
- Excel3, name, steelers
- Excel3, subject, pirates
- Excel3, description, penguins
- Excel3, comments, panthers
- Excel3, author, panthers
I used the first column as the "document name", the second column as the field in the document to index, and the third column as the indexed data. All of these fields already existed in schema.xml; here is how they are set up:
<field name="id" type="string" indexed="true" stored="true" required="true" />
<field name="name" type="text_general" indexed="true" stored="true"/>
<field name="subject" type="text_general" indexed="true" stored="true"/>
<field name="description" type="text_general" indexed="true" stored="true"/>
<field name="comments" type="text_general" indexed="true" stored="true"/>
<field name="author" type="text_general" indexed="true" stored="true"/>
Now here is where my problem comes in. If I search for, say, steelers, the result comes back fine, but if I search for penguins, or the values of many of the other fields, I get no results. However, if I search for description:penguins, the result comes back as expected.
Can anyone please help me understand why the part before the : is required for some fields, but not others?
Example searches:
solr/select?indent=on&q=penguins&wt=xml (doesn't return any results)
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
<lst name="params">
<str name="indent">on</str>
<str name="q">penguins</str>
<str name="wt">xml</str>
</lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>
solr/select?indent=on&q=description:penguins&wt=xml
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">18</int>
<lst name="params">
<str name="indent">on</str>
<str name="q">description:penguins</str>
<str name="wt">xml</str>
</lst>
</lst>
<result name="response" numFound="1" start="0">
<doc>
<str name="author">panthers</str>
<str name="comments">panthers</str>
<str name="description">penguins</str>
<str name="id">Excel3</str>
<str name="name">steelers</str>
<str name="subject">pirates</str>
</doc>
</result>
</response>
The default query parser searches unqualified terms only in the default field, which can be specified in schema.xml as described here: http://wiki.apache.org/solr/SchemaXml#The_Default_Search_Field. A bare term such as penguins therefore only matches fields whose content ends up in that default field.
I think @Frank Famer's comment about using the DisMax parser is the real solution to this problem. That said, here are two work-arounds I've seen in practice:
1. Create an additional field that is indexed but not stored, copy (with copyField) the values of all the fields you want to search into it, and then specify that field as the default. It would look something like this in your schema.xml file:
<field name="myhugedefaultfield" type="text" indexed="true" stored="false" multiValued="true"/>
<copyField source="name" dest="myhugedefaultfield"/>
<copyField source="subject" dest="myhugedefaultfield"/>
<copyField source="description" dest="myhugedefaultfield"/>
<defaultSearchField>myhugedefaultfield</defaultSearchField>
2. Rewrite the user-entered query, turning a query for penguins into (name:penguins) OR (subject:penguins) OR (description:penguins).
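Work-around 2 can be sketched in plain Java as a small helper that expands one bare term into a per-field OR query. The helper name and field list are illustrative; real code should escape user input first (SolrJ provides ClientUtils.escapeQueryChars for this). The dismax/edismax qf parameter performs a similar per-field expansion for you, scored as a disjunction max rather than a plain OR.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch: expand a single (already escaped) term into "(f1:term) OR (f2:term) ...".
final class MultiFieldQuerySketch {

    static String expand(String term, List<String> fields) {
        return fields.stream()
                .map(field -> "(" + field + ":" + term + ")")
                .collect(Collectors.joining(" OR "));
    }

    public static void main(String[] args) {
        System.out.println(expand("penguins", List.of("name", "subject", "description")));
        // (name:penguins) OR (subject:penguins) OR (description:penguins)
    }
}
```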