I am using Apache Solr to search records in my current application.
I was able to filter the suggestions by documentType by configuring the context field.
Now I want to add another context field, departmentType, but I am not sure how to configure the suggester for multiple context fields.
This is the suggester I use with a single context field, and it works fine:
<searchComponent name="suggest" class="solr.SuggestComponent">
<lst name="suggester">
<str name="name">suggesterByName</str>
<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">fullName</str>
<str name="contextField">documentType</str>
<str name="suggestAnalyzerFieldType">text_general</str>
<str name="buildOnStartup">false</str>
</lst>
</searchComponent>
I referred to this issue:
https://issues.apache.org/jira/browse/SOLR-7888
but it is still not clear to me how to configure multiple context fields in a single suggester.
You have to create a new field in your schema.xml, for example context_field, and use it as the suggester's contextField.
This field should have multiValued="true":
<field name="context_field" type="text_suggest" multiValued="true" indexed="true" stored="true"/>
Then, when indexing into Solr, send context_field as a JSON list that holds both values:
"context_field" : ["some document type", "some department type"]
After indexing, you can request suggestions like this:
suggest.q=b&suggest.cfq=context_documentType AND context_departmentType
Hope it works
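As a rough illustration of the two steps above, here is a minimal Python sketch that builds one document to index and the matching suggest request parameters. The id value, the sample context values (invoice, finance) and the helper names are made up for the example. It also assumes the suggester's contextField points at context_field and that the context values contain no whitespace or query-syntax characters, since no escaping is attempted.

```python
import json
from urllib.parse import urlencode


def build_doc(doc_id, full_name, document_type, department_type):
    """Return a Solr document whose context_field holds both context values."""
    return {
        "id": doc_id,  # hypothetical uniqueKey field
        "fullName": full_name,
        "context_field": [document_type, department_type],
    }


def build_suggest_params(prefix, document_type, department_type):
    """Return the URL-encoded query string for a suggest request."""
    return urlencode({
        "suggest": "true",
        "suggest.dictionary": "suggesterByName",
        "suggest.q": prefix,
        # Require both context values, mirroring the suggest.cfq example above.
        "suggest.cfq": f"{document_type} AND {department_type}",
    })


doc = build_doc("1", "Bob Brown", "invoice", "finance")
print(json.dumps([doc]))  # JSON body for an update request
print(build_suggest_params("b", "invoice", "finance"))
```

The query string can be appended to whatever handler the suggest component is registered on.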
I am running a Solr streaming expression and trying to use the /export handler to fetch all of the results. Consider the following query:
search(main, q=*:*, fl="SSRN",qt="/export",sort="SSRN asc")
I configured my schema.xml for the SSRN field as follows:
<field name="SSRN" type="int" indexed="true" stored="true" required="false" multiValued="false" docValues="true" />
Since the SSRN field has docValues enabled, it should work. However, the results are just the standard 10 documents. This is running in a SolrCloud environment with just one node and one shard.
Thanks in advance!
I fixed the issue. It seems that SOLR-8426 ("Enable /export, /stream and /sql handlers by default and remove them from example configs") removed the need to add the /export handler to solrconfig.xml. If you do add it, it doesn't work. The solution is just to remove this code from solrconfig.xml:
<requestHandler name="/export" class="solr.SearchHandler">
<lst name="invariants">
<str name="rq">{!xport}</str>
<str name="wt">xsort</str>
<str name="distrib">false</str>
</lst>
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="wt">json</str>
<str name="indent">true</str>
</lst>
</requestHandler>
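With the stock handler in place, the expression from the question can be sent to the /stream handler. The sketch below only builds the request in Python; the host, port and helper names are assumptions for illustration, and actually sending it is left to whichever HTTP client you use.

```python
from urllib.parse import urlencode


def build_expr(collection, field):
    """Return the search() streaming expression from the question."""
    return (
        f'search({collection}, q=*:*, fl="{field}",'
        f'qt="/export",sort="{field} asc")'
    )


def build_stream_request(collection, field):
    """Return (url, form_body) for a POST to the collection's /stream handler."""
    url = f"http://localhost:8983/solr/{collection}/stream"  # assumed host and port
    body = urlencode({"expr": build_expr(collection, field)})
    return url, body


url, body = build_stream_request("main", "SSRN")
print(url)
print(body)
```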
I am trying to crawl data using Nutch and index that data in Solr.
I have followed the steps from Using Nutch with Solr and the Nutch Wiki Tutorial.
I've successfully indexed data using the solrindex command:
bin/nutch solrindex http://127.0.0.1:8983/solr/ crawl/crawldb -linkdb crawl/linkdb crawl/segments/*
but I can't find the indexed data in the results.
I want a result like the image below, but I can't see any result data on the right side.
If you want some data to be returned with the search response, check that the targeted fields are stored by Solr. Then you can set the list of fields to return in your query using the fl param (with stored field names as values). You can also set default fl values in solrconfig.xml.
For example, let's say you want the content field to be returned. In your schema.xml, in the <fields> declaration, this field should have the option stored="true", like so:
<field name="content" type="text" indexed="true" stored="true"/>
Then in solrconfig.xml, declare the default fl params in the requestHandler definition; you can set specific fields (space-separated field names). The XML sample (grabbed from the tutorial) should look like this if we just want the data stored in the url and content fields to be returned.
<requestHandler name="/nutch" class="solr.SearchHandler" >
<lst name="defaults">
<str name="defType">dismax</str>
<str name="echoParams">explicit</str>
<float name="tie">0.01</float>
<str name="qf">
content^0.5 anchor^1.0 title^1.2
</str>
<str name="pf">
content^0.5 anchor^1.5 title^1.2 site^1.5
</str>
<str name="fl">
url content
</str>
<str name="mm">
2&lt;-1 5&lt;-2 6&lt;90%
</str>
<int name="ps">100</int>
<bool name="hl">true</bool>
<str name="q.alt">*:*</str>
<str name="hl.fl">title url content</str>
<str name="f.title.hl.fragsize">0</str>
<str name="f.title.hl.alternateField">title</str>
<str name="f.url.hl.fragsize">0</str>
<str name="f.url.hl.alternateField">url</str>
<str name="f.content.hl.fragmenter">regex</str>
</lst>
</requestHandler>
You can override these defaults right in the query. A common use case is to put "*,score" in the fl area in solr query interface so that you can see all stored fields (using wildcard character *) along with the score in the results. You might also want to specify the query type parameter (qt) according to the targeted request handler (should be "/nutch").
Helpful links :
http://wiki.apache.org/solr/SchemaXml#Common_field_options
http://wiki.apache.org/solr/CommonQueryParameters#fl
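To make the fl override concrete, here is a small Python sketch that builds the query string for such a request. Only the parameter names (q, fl, qt) come from the answer above; the helper name and the sample query are illustrative.

```python
from urllib.parse import urlencode


def build_query_params(query, fields=("*", "score"), handler="/nutch"):
    """Return URL-encoded Solr query parameters with an explicit fl list."""
    return urlencode({
        "q": query,
        "fl": ",".join(fields),  # overrides the default fl from solrconfig.xml
        "qt": handler,           # request handler named in the answer
    })


print(build_query_params("solr"))
print(build_query_params("solr", fields=("url", "content")))
```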
I've looked through a ton of examples and other questions here and from them, I've got my config very close to what I need but I'm missing one last little bit that I'm having a heck of a time working out. I'm searching on values like:
solar powered
solar glass
solar globe
solar lights
solar magic
solid brass
solid copper
What I want:
If I search for sol the result should include all these values. This works.
If I search for solar I should get just the first five. This works.
If I search for solar gl I should get only solar glass and solar globe. This does not work. Instead, I get one set of matches for solar and a second set of matches for gl.
In a nutshell, I want to consider the input string as a whole, regardless of any whitespace. I gather this is accomplished by creating a separate query (versus index) analyzer, but I've not been able to make it work. Can anyone suggest a configuration that will get me what I'm looking for?
I've (unsuccessfully) tried:
Querying with "solar gl"
Querying with mm=100%
Defining separate query and index analyzers both using KeywordTokenizerFactory. (I don't know what the heck I thought that would do.)
Defining an index analyzer but not a query analyzer.
Defining a query analyzer with no tokenizer.
Here's my current schema:
<field name="suggest_phrase" type="suggest_phrase"
indexed="true" stored="false" multiValued="false" />
And the field definition:
<fieldType name="suggest_phrase" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory" />
<filter class="solr.LowerCaseFilterFactory" />
</analyzer>
</fieldType>
And the config:
<searchComponent name="suggest_phrase" class="solr.SpellCheckComponent">
<lst name="spellchecker">
<str name="name">suggest_phrase</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
<str name="field">suggest_phrase</str>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest_phrase">
<lst name="defaults">
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">suggest_phrase</str>
<str name="spellcheck.onlyMorePopular">true</str>
<str name="spellcheck.count">10</str>
<str name="spellcheck.collate">false</str>
</lst>
<arr name="components">
<str>suggest_phrase</str>
</arr>
</requestHandler>
Found the answer, finally! I knew I was really close. Turns out my configuration above was correct and I simply needed to change my query.
Use KeywordTokenizerFactory so that the strings get indexed as a whole.
Use SpellCheckComponent for the request handler.
The piece I was missing -- don't query with q=<string> but with spellcheck.q=<string>.
Given the source strings noted above and a query of spellcheck.q=solar+gl this yields the desired results:
solar glass
solar globe
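To see why matching against the whole string gives this result, here is a plain-Python simulation (not Solr code) of a prefix lookup over lowercased, untokenized phrases, followed by the query string for the request. The function names are made up for the example.

```python
from urllib.parse import urlencode

PHRASES = [
    "solar powered", "solar glass", "solar globe", "solar lights",
    "solar magic", "solid brass", "solid copper",
]


def suggest(prefix, phrases=PHRASES):
    """Mimic a prefix lookup over whole, lowercased phrases."""
    needle = prefix.lower()
    return [p for p in phrases if p.lower().startswith(needle)]


def build_params(text):
    """Build the query string that uses spellcheck.q instead of q."""
    return urlencode({"spellcheck.q": text})


print(suggest("solar gl"))       # ['solar glass', 'solar globe']
print(build_params("solar gl"))  # spellcheck.q=solar+gl
```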
You may use the AnalyzingInfixLookupFactory or the FreeTextLookupFactory.
AnalyzingInfixLookupFactory returns the entire content of the field.
FreeTextLookupFactory returns a defined number of tokens.
You will find more details and other suggester algorithms here: http://alexbenedetti.blogspot.de/2015/07/solr-you-complete-me.html
Solr Configuration
<lst name="suggester">
<str name="name">AnalyzingInfixSuggester</str>
<str name="lookupImpl">AnalyzingInfixLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">title</str>
<str name="weightField">price</str>
<str name="suggestAnalyzerFieldType">text_en</str>
</lst>
<lst name="suggester">
<str name="name">FreeTextSuggester</str>
<str name="lookupImpl">FreeTextLookupFactory</str>
<str name="dictionaryImpl">DocumentDictionaryFactory</str>
<str name="field">title</str>
<str name="ngrams">3</str>
<str name="separator"> </str>
<str name="suggestFreeTextAnalyzerFieldType">text_general</str>
</lst>
I've tried this many times and came to the conclusion that it is not possible out of the box.
I found a workaround:
I indexed the data with special characters inserted between the words so that they would not be tokenized.
For example:
solarzzzzzzpowered
solarzzzzzzglass
solarzzzzzzglobe
Then, when you compose your query, make sure you put the same filler characters between the words you type; for example, solar gl becomes solarzzzzzzgl.
This will achieve the behaviour you are asking for.
Another option would be not to use the autosuggestion component and to build a custom field yourself, but then you have to manage the wildcard search and all the indexing yourself, which is not very convenient in terms of time and performance.
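The filler trick can be sketched in a few lines of Python. The helper names are invented for illustration, and the same conversion would have to be applied both to the indexed phrases and to the text the user types.

```python
SEPARATOR = "zzzzzz"  # filler from the example above; any unlikely run of characters would do


def encode_phrase(text):
    """Join the words of a phrase or query with the filler instead of spaces."""
    return SEPARATOR.join(text.split())


def decode_phrase(text):
    """Turn a stored suggestion back into a readable phrase."""
    return text.replace(SEPARATOR, " ")


indexed = [encode_phrase(p) for p in ["solar powered", "solar glass", "solar globe"]]
query = encode_phrase("solar gl")
# Plain prefix match standing in for the suggester lookup.
matches = [decode_phrase(p) for p in indexed if p.startswith(query)]
print(query)
print(matches)
```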
When using highlighting, we get the output in two sections:
<result name="response" numFound="2345" start="0">
<doc>...</doc>
<doc>...</doc>
</result>
<lst name="highlighting">
<lst name="08dcc4e3">...highlighted fields with <em> tag...</lst>
<lst name="12e47c63">...highlighted fields with <em> tag...</lst>
</lst>
Is it possible to have 'em' tags within the fields inside the documents?
For example if query is for 'engine', I should get something like -
<doc>
<str name="content">
Build your own search <em>engine</em> using solr
</str>
.
.
.
</doc>
If that is not possible, can the highlighting output for document id xyz be placed inside the document alongside its other fields, something like -
<doc>
<str name="id">xyz</str>
.
.
.
<lst name="highlighting">
<arr name="content">
<str>...</str>
</arr>
</lst>
</doc>
If this is not possible with simple config changes and default highlighting, is it possible with a custom highlighting module (for example, by extending the default highlighter)?
I am currently using Solr 3.4, but let me know if it is possible in later Solr versions.
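If a client-side merge is acceptable, the second shape can be approximated after the response arrives. The sketch below is Python working on a response parsed from wt=json; it assumes the uniqueKey field is called id and that the highlighting section is keyed by that id, and the sample data is invented.

```python
def merge_highlighting(response):
    """Attach each document's highlighting entry to the document itself."""
    highlights = response.get("highlighting", {})
    merged = []
    for doc in response["response"]["docs"]:
        combined = dict(doc)  # copy so the original response is left untouched
        combined["highlighting"] = highlights.get(doc["id"], {})
        merged.append(combined)
    return merged


sample = {
    "response": {"numFound": 1, "start": 0, "docs": [{"id": "xyz"}]},
    "highlighting": {
        "xyz": {"content": ["Build your own search <em>engine</em> using solr"]}
    },
}
merged_docs = merge_highlighting(sample)
print(merged_docs)
```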
I'm indexing rich text documents into SOLR 3.4 using ExtractingRequestHandler and I'm having trouble getting it to behave like I want it to.
I would like to store creation date as a field to use for faceted search later and have defined the following in schema.xml:
<field name="creation_date" type="date" indexed="true" stored="true"/>
I index like this:
curl -s "http://localhost:8983/solr/update/extract?literal.id=myid&resource.name=myfile.xls&commit=true" -F myfile=@/path/to/myfile.xls
I get the dynamic field attr_creation_date (other rules take care of that), but I don't get creation_date. I have also unsuccessfully tried using copyField like so:
<copyField source="attr_creation_date" dest="creation_date"/>
Yet another try was putting this in solrconfig.xml, but no luck:
<str name="fmap.Creation-Date">creation_date</str>
I'm pretty sure I'm missing something basic here. Any help is most appreciated!
Settings for ExtractingRequestHandler in solrconfig.xml:
<requestHandler name="/update/extract" startup="lazy"
class="solr.extraction.ExtractingRequestHandler" >
<lst name="defaults">
<str name="fmap.content">text</str>
<str name="fmap.Last-Save-Date">last_save_date</str>
<str name="fmap.Creation-Date">creation_date</str>
<str name="fmap.Content-Type">content_type</str>
<str name="lowernames">true</str>
<str name="uprefix">attr_</str>
<str name="captureAttr">true</str>
<str name="fmap.a">links</str>
</lst>
</requestHandler>
My schema.xml file (lots of default stuff): https://gist.github.com/1358002