How to use Solr Spellchecker at query time - solr

I have setup my solr schema as required by spellchecker to function. When I run a query with a wrongly spelled word, I get the following alongside the docs:
"spellcheck":{
"suggestions":[
"jordun",{
"numFound":1,
"startOffset":0,
"endOffset":6,
"origFreq":0,
"suggestion":[{
"word":"jordan",
"freq":33}]}],
"correctlySpelled":true,
"collations":[
"collation","jordan"]}
However, the docs still return results for the incorrectly spelled word. Is there any way to return results for the first option in spellcheck suggestions without running another query?

You can add another field to index which lists the options to the query which match the field of the spelling suggestions with the value of the word with the misspelling.

Related

Solr 3.6.2 spellcheck multi-word phrase: how to get collations without ignored stopwords?

I'm having a problem with the Solr 3.6.2 default (field based) spellchecker configured with query time parameters
spellcheck.onlyMorePopular=true
spellcheck.count=5
spellcheck.collate=true
spellcheck.maxCollations=5
spellcheck.maxCollationTries=5
on a field type which has a solr.StopFilterFactory filter on its analyzers.
The suggestion phase works as intended :
the indexed field does not contain any stopword
no suggestion is provided for a given stopword
But the resulting collation always contains the ignored stopwords, which I don't want: I'd prefer a raw suggestion of combined terms over something which looks like a "sort of" natural language answer.
For instance, searching for "handfum of perries", I'd prefer "handful berry" over "handful of berry".
I don't think that the stopwords excluded from spellchecking suggestions because of the field query analyzer are "marked" for preservation like the official documentation goes about other query elements :
Note that the non-spellcheckable terms such as those for range
queries, prefix queries etc. are detected and excluded for
spellchecking. Such non-spellcheckable terms are preserved in the
collated output so that the original query can be run again, as is.
It seems two solutions would be
either having a custom query converter so the stopwords are ignored right from the start: not sure it is possible in 3.6.2
or having a custom spellchecker that would not try to find any suggestion for a stopword (or would always suggest an "empty" string), without messing up the collation process
Am I missing something ?
Regards

Solr 6.1 - Get all token counts for a Facet field across all documents

I have a TextField in my solr schema.xml on which I want to run faceting and find out counts for every tokens in that field across all the documents. Is there a way to get this? I tried following and I thought it was working until I found out that it's not a complete list of tokens that I am getting form this query:
http://solrnode1:8983/solr/mycollection/select?facet.field=PRODUCT_NAME&facet=on&indent=on&q=*:*&wt=json&rows=0
For example, there is one document in that field that says "Education Services 2014" but I don't see any facet token for 'education' with its count. Interestingly if I change my query parameter to q=PRODUCT_NAME:*education* instead of q=*:* then it shows up in faceting with count! I am not sure what's happening here. Am I missing something here?
I think it's a facet.limit which is by default 100. I increased it and its getting more tokens and counts now.

/select with 'q' parameter does not work

Whenever i query with q=: it shows all the documents but when i query with q=programmer 0 docs found.(contents is the default search field)
my schema has: id(unique),author,title,contents fields
Also query works fine for:
q=author:"Value" or q=title:"my book" etc, only for contents field no results.
Also when i query using spell checker(/spell?q=programmer) output shows spelling suggestions for this word,when 'programmer' is the right word and present in many documents.
I referred the example docs for configurations.
All of a sudden i am getting this,initially it worked fine.
I guess there some problem only in the contents field,but cannot figure it out.
Is it because indexes are not created properly for contents field?
(I am using solr 4.2 on Windows 7 with tomcat as webserver)
Please help.Thanks a lot in advance.
Are you sure you set the default search field? The reason you have this problem might be because you didn't set the <defaultSearchField> field in your schema.xml file. This is why "q=author:value" works while q=WHATEVER doesn't.
The Is used by Solr when parsing queries to
identify which field name should be searched in queries where an
explicit field name has not been used.
But also consider this:
The is used by Solr when parsing queries to
identify which field name should be searched in queries where an
explicit field name has not been used. It is preferable to not use or
rely on this setting; instead the request handler or query LocalParams
for a search should specify the default field(s) to search on. This
setting here can be omitted and it is being considered for
deprecation.
Do you have any data in your instance. try q=*:* and see what it returns. "for" is a stop word, may be it was filtered out. Look for something else as value to test.

Solr Query with LIKE Clause

I'm working with Solr and I'd like to know if it is possible to have a LIKE clause in the query. For example, I want to know all organizations with "New York" in the title. In SQL, this would be written like Name LIKE 'New York%'.
My question - how do you write a LIKE query in Solr?
I'm using the SolrNet library, if that makes a difference.
You just search for "New York", but first you need to properly configure your field's analyzer. For example you might want to start with a field type like text_general as defined in the default Solr schema. This field type will tokenize on whitespace and other common word separators, then apply a filter of stopwords, then lowercase the terms in order to make searches case-insensitive.
More information about analyzers in the Solr wiki.
If you're using solr 3.1 or newer, have a look at the Extended DisMax Query Parser, which supports wildcard queries. You can enable it using <str name="defType">edismax</str> in the request handler configuration.
Then you can use a query like title:New York* with the same behaviour as a query with like clause. The main difference between my answer and the accepted one is that you can even search for fragment of words using wildcards. For example New Yorkers would match in this case.
Unfortunately you could have problems with case-sensitive queries even if you're using a LowerCaseFilterFactory. Have a look here to know more. Most of those problems will be fixed with the solr 3.6 release since the SOLR-2438 issue has been solved.

Solr Index appears to be valid - but returns no results

Solr newbie here.
I have created a Solr index and write a whole bunch of docs into it. I can see
from the Solr admin page that the docs exist and the schema is fine as well.
But when I perform a search using a test keyword I do not get any results back.
On entering * : *
into the query (in Solr admin page) I get all the results.
However, when I enter any other query (e.g. a term or phrase) I get no results.
I have verified that the field being queried is Indexed and contains the values I am searching for.
So I am confused what I am doing wrong.
Probably you don't have a <defaultSearchField> correctly set up. See this question.
Another possibility: your field is of type string instead of text. String fields, in contrast to text fields, are not analyzed, but stored and indexed verbatim.
I had the same issue with a new setup of Solr 8. The accepted answer is not valid anymore, because the <defaultSearchField> configuration will be deprecated.
As I found no answer to why Solr does not return results from any fields despite being indexed, I consulted the query documentation. What I found is the DisMax query parser:
The DisMax query parser is designed to process simple phrases (without complex syntax) entered by users and to search for individual terms across several fields using different weighting (boosts) based on the significance of each field. Additional options enable users to influence the score based on rules specific to each use case (independent of user input).
In contrast, the default Lucene parser only speaks about searching one field. So I gave DisMax a try and it worked very well!
Query example:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video
You can also specify which fields to search exactly to prevent unwanted side effects. Multiple fields are separated by spaces which translate to + in URLs:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features+text
Last but not least, give the fields a weight:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features^20.0+text^0.3
If you are using pysolr like I do, you can add those parameters to your search request like this:
results = solr.search('search term', **{
'defType': 'dismax',
'qf': 'features text'
})
In my case the problem was the format of the query. It seems that my setup, by default, was looking and an exact match to the entire value of the field. So, in order to get results if I was searching for the sit I had to query *sit*, i.e. use wildcards to get the expected result.
With solr 4, I had to solve this as per Mauricio's answer by defining type="text_en" to the field.
With solr 6, use text_general.

Resources