Incorrect results for Solr search with multiple terms - solr

Perhaps someone can enlighten me on how Solr matches terms. I have a string attribute named assignedBy, and I query it with the value "Aaron Mason" (no quotes). Solr returns more matches than I anticipated because the term "Mason" also matches documents whose other fields contain the word "Mason". After turning on the debugging feature (from the Solr admin UI), I see that Solr breaks the query into two attribute queries: "aaron" against assignedBy and "mason" against the catch-all text field (see below). Is this the correct behavior? How do I ensure that it only finds matches against the attribute I specify? Thanks.
"debug":{
"rawquerystring":"assignedBy:Aaron Mason",
"querystring":"assignedBy:Aaron Mason",
"parsedquery":"assignedBy:aaron _text_:mason",
"parsedquery_toString":"assignedBy:aaron _text_:mason",

Yes, you are correct. When you send q=assignedBy:Aaron Mason,
the query is parsed (using the query analyzers defined in your schema file) into
assignedBy:aaron and _text_:mason.
If you don't specify a field name, the query term is searched in the default field, which is set in solrconfig.xml; look for <str name="df">text</str> under the /select handler. In your case it is probably _text_.
So Solr searches its index and returns the combined results: all documents whose assignedBy field contains the term "aaron", plus all documents whose _text_ field contains the term "mason".
You have probably used copyField to copy some field values into the text field; check for it.
You can use dismax/edismax, where you specify the fields in which all of your terms should be searched.
Example:
q=Aaron Mason&wt=json&debugQuery=on&defType=dismax&qf=assignedBy
This only finds matches against the field "assignedBy" specified in qf.
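A minimal Python sketch of the two ways to keep both terms on one field: the dismax request from the example above, and, as an alternative with the default lucene parser, grouping the terms in parentheses so the field prefix covers all of them. The helper names and the core URL are hypothetical; only the request parameters come from the answer.

```python
from urllib.parse import urlencode


def dismax_params(user_query, field):
    """Request parameters that restrict every term to one field via qf."""
    return {
        "q": user_query,
        "defType": "dismax",
        "qf": field,
        "wt": "json",
        "debugQuery": "on",
    }


def grouped_field_query(user_query, field):
    """Lucene-parser alternative: field:(term1 term2) applies the field to each term."""
    return {"q": f"{field}:({user_query})", "wt": "json", "debugQuery": "on"}


# Hypothetical core URL (assumption); adjust host, port and core name.
BASE_URL = "http://localhost:8983/solr/mycore/select"

dismax_url = BASE_URL + "?" + urlencode(dismax_params("Aaron Mason", "assignedBy"))
grouped_url = BASE_URL + "?" + urlencode(grouped_field_query("Aaron Mason", "assignedBy"))

print(dismax_url)
print(grouped_url)
```

With debugQuery=on, the parsedquery entry in the response shows whether both terms now target assignedBy.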

Related

How does Solr process the query string when using edismax qf parameter and specify field in query

All:
[UPDATE]
After reading the debug explain output, it seems that qf is applied only
to keywords that do not specify a field.
===================================================================
When I was learning to use the edismax query parser, the documentation said the qf parameter is:
Query Fields: specifies the fields in the index on which to perform
the query. If absent, defaults to df.
Its purpose is to combine every listed field with the query terms.
However, if we already specify the field in the query (the q parameter), what happens when I specify a different field in qf?
For example:
q=title:epic
defType=edismax
qf=content
Could anyone explain how Solr interprets this query?
Thanks
When you specify qf, you are telling Solr to search for whatever is in the "q" parameter in the fields listed in "qf". So your first and third lines contradict each other:
q=title:epic
defType=edismax
qf=content
If you want to search for any document whose content field contains anything matching your search terms, put these search terms as tokens in "q", separated by +OR+,
like this:
q=I+OR+like+OR+books+OR+and+OR+games
defType=edismax
qf=content
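As a rough sketch, the OR-joined q value above can be built from a list of terms in Python. The helper name is hypothetical; urlencode turns the spaces into the + signs seen in the request.

```python
from urllib.parse import urlencode


def or_query_params(terms, field):
    """Join the terms with OR and let qf name the field to search."""
    return {"q": " OR ".join(terms), "defType": "edismax", "qf": field}


params = or_query_params(["I", "like", "books", "and", "games"], "content")
query_string = urlencode(params)
print(query_string)
```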
When q=title:epic, you have already fixed the query field to title, so setting qf to "content" conflicts with it and is unlikely to give you the results you expect. Either leave the qf parameter empty or set it to "title".

Solr dynamicField not searched in query without field name

I'm experimenting with the Example database in Solr 4.10 and not understanding how dynamicFields work. The schema defines
<dynamicField name="*_s" type="string" indexed="true" stored="true"/>
If I add a new item with a new field name (say "example_s":"goober" in JSON format), a query like
?q=goober
returns no matches, while
?q=example_s:goober
will find the match. What am I missing?
I would like to see the SearchHandler definition from the solrconfig.xml file that you are using to execute the query above.
A SearchHandler generally has default query fields, i.e. the qf parameter.
Check that your dynamic field example_s is present in that query field list in solrconfig.xml; otherwise you can pass it when sending the query to the search handler.
Hope this helps resolve your problem.
If you are using the default schema, here's what's happening:
You are probably using the default endpoint (/select), so the search type and parameters come from its definition. That means it is the default (lucene) search and the field searched is text.
The text field is an aggregate and is populated by copyField instruction from other fields.
Your dynamic field definition for *_s lets you index text under any name ending in _s, such as example_s. It is indexed (so you can search against it directly) and stored (so you can see it when you ask for all fields). It will not, however, be searched as general text. Note that (unlike in ElasticSearch) Solr strings have to be matched fully and completely. If the field holds multi-word text, there is barely any point searching it. "goober" is one word, so it is not a very good example for understanding the difference here.
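To illustrate the difference between a string field and a tokenized text field, here is a simplified Python model. It is not Solr code: the function names are hypothetical, and the "text" analyzer is reduced to lowercasing and splitting on whitespace, which is only an assumption standing in for a real analyzer chain.

```python
def matches_string_field(stored_value, query_term):
    """A string field is indexed as one untokenized term: only the whole value matches."""
    return stored_value == query_term


def matches_text_field(stored_value, query_term):
    """Simplified text analyzer (assumption): lowercase, then split on whitespace."""
    tokens = stored_value.lower().split()
    return query_term.lower() in tokens


# A single-word value behaves the same in both models.
print(matches_string_field("goober", "goober"))
print(matches_text_field("goober", "goober"))

# A multi-word value shows the difference.
print(matches_string_field("goober peas", "goober"))
print(matches_text_field("Goober peas", "goober"))
```

The multi-word case is where the two field types diverge: the string model only matches the complete value "goober peas", while the tokenized model matches either word.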
The easiest solution for you is to add another copyField instruction:
<copyField source="*_s" dest="text"/>. Then all your *_s dynamic fields would also be searchable. Note that the search analyzers will not be the ones from the *_s definition but the ones from the text field's definition, which is not string but text_general, defined elsewhere in the file.
As to Solr vs. ElasticSearch, they err on different sides of magic. Solr makes you configure the system and makes it very easy to see the exact current configuration. ElasticSearch hides all of the configuration, but you have to rediscover it the second you want to change away from the default behaviour. In the end, the result is probably similar and meets somewhere in the middle.

solr use both n-gram search and default search

I'm trying to build a corpus using Solr. I have a field named "content", and I need to index and search bigrams and trigrams. I also need to index and search using the default search.
How do I configure this?
You'll have to add the ShingleFilterFactory to your field definition, after the tokenization has been performed. You can configure the ShingleFilter to generate bigrams or trigrams.
There is no such thing as "default searching", but the bundled schema includes a field type named text_general that might be a good match for regular search. You'll have two different fields, one for searching shingles (where you'd probably want to match the whole bigram / trigram), and one for the "regular search".
You can add the same content to both fields by using a copyField directive, such as <copyField source="content" dest="content_ngrams" />. You can use qf when querying to say which field you want to query, or if you want to score the fields differently for matches (i.e. boosting a match in a bi/trigram). You could also query for a direct match with fieldname:value, depending on how you need to query the index.
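To make the bigram/trigram idea concrete, here is a rough Python model of what a shingle filter produces from an already tokenized sentence. It is only an illustration: the function name and parameters are hypothetical, it is not Solr's implementation, and it leaves out the single-word tokens that such a filter may also emit depending on configuration.

```python
def shingles(tokens, min_size=2, max_size=3):
    """Return word n-grams (shingles) of min_size..max_size tokens, in position order."""
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size <= len(tokens):
                out.append(" ".join(tokens[start:start + size]))
    return out


tokens = "please divide this sentence".split()
for shingle in shingles(tokens):
    print(shingle)
```

Each shingle becomes a single term in the n-gram field, which is why a query against that field needs to match the whole bigram or trigram.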

Solr DisMax query equivalent

I am trying to set up the elevate handler in Solr 3.5.0, and I need the dismax equivalent of the query below, which applies different boost values to the same field based on the match type (an exact match gets 200, whereas a wildcard match gets 100).
q=name:(foo*^100.0 OR foo^200.0)
This is one way to solve this problem.
Keep a text field with only WhitespaceTokenizer (and maybe LowerCaseFilter, depending on your case-sensitivity needs). Use this field for the exact match. Let's call this field name_ws.
Instead of using a wild-card query on name_ws, use a text-type copy field with EdgeNGramTokenizer in your analyzer chain, which will output tokens like:
food -> f, fo, foo, food
Let's call this field name_edge.
Then you can issue this dismax query:
q=foo&defType=dismax&qf=name_ws^200+name_edge^100
(Add debugQuery=on to verify if the scoring works the way you want.)
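The following Python sketch models the idea behind the two fields: edge n-grams turn prefix matching into plain term matching, so a query for "foo" hits name_edge for both "foo" and "food" but hits name_ws only for the exact token. The function names are hypothetical, the analysis is simplified to lowercasing and whitespace splitting, and the result lists which boosted fields match; it does not reproduce Solr's scoring.

```python
def edge_ngrams(token, min_gram=1, max_gram=None):
    """Prefixes of the token, from min_gram characters up to max_gram (or the full token)."""
    limit = len(token) if max_gram is None else min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, limit + 1)]


def matching_boosts(name, query, boosts):
    """Return the boost of every field whose indexed terms contain the query term."""
    ws_tokens = name.lower().split()
    edge_tokens = [gram for tok in ws_tokens for gram in edge_ngrams(tok)]
    hits = {}
    if query in ws_tokens:
        hits["name_ws"] = boosts["name_ws"]
    if query in edge_tokens:
        hits["name_edge"] = boosts["name_edge"]
    return hits


boosts = {"name_ws": 200, "name_edge": 100}
print(edge_ngrams("food"))
print(matching_boosts("foo", "foo", boosts))
print(matching_boosts("food", "foo", boosts))
```

An exact match collects both boosts while a prefix-only match collects just the name_edge boost, which is what ranks exact matches higher.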

Solr Ngram Synonyms Dismax

I have two ngram-indexed fields (columns in the database), and a third one is my full-text field. My default text field is the full-text field, and while querying I use the dismax handler, specifying both ngrammed fields and the full-text field, each with its own boost value.
My problem: if I don't use dismax and just search the full-text field (i.e. the default field specified in the schema), synonyms work correctly, i.e. "ca" returns all results containing "california". If I use dismax, "ca" is also searched in the ngrammed fields and returns partial matches of "ca", and the synonym expansion is not applied at all.
I want to use synonyms in every case, so how should I go about it?
Make sure you have correctly configured the SynonymFilterFactory filter in the query analyzer of your ngram fields.
If it still doesn't work, the analysis screen in the Solr admin UI shows the details of each tokenizer/filter step, so you can check whether the synonym part works as expected.
