Why does Dismax not work in simple query? - solr

All:
I am pretty new to Solr. I uploaded some documents that have "season" in the content field (stored but not indexed, copied to the text field) and in the title field (stored and indexed, copied to the text field).
When I use a basic query without dismax, like:
http://localhost:8983/solr/collection1/select?q=season&rows=5&wt=json&indent=true
It works very well and returns correct results. But when I want to boost the documents that have more occurrences of "season" in content rather than title, I use dismax like this (I guess the way I use it is totally wrong, because content is not indexed, but I at least expected some results back, even incorrect ones):
http://localhost:8983/solr/collection1/select?q=season&rows=5&wt=json&indent=true&defType=dismax&qf=content%5E100+title%5E1
No matching results are returned. Could anyone help me with this, or show me how to use dismax correctly?
Thanks

In your second query you specify the "content" field as the one and only query field but earlier you write that this field is stored but not indexed. If a field is not indexed you cannot search against it.
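A minimal sketch of the same request restricted to indexed fields, assuming the copy-field target `text` from the question is indexed. The helper name `dismax_url` is hypothetical and the boost values are arbitrary; only the standard library is used to build the URL.

```python
from urllib.parse import urlencode

def dismax_url(base, q, boosts, rows=5):
    """Build a dismax select URL; `boosts` maps indexed field names to weights."""
    # qf is a space-separated list of field^boost entries
    qf = " ".join(f"{field}^{weight}" for field, weight in boosts.items())
    params = {"q": q, "rows": rows, "wt": "json", "defType": "dismax", "qf": qf}
    # urlencode turns spaces into '+' and '^' into '%5E'
    return f"{base}/select?{urlencode(params)}"

# Assumption: 'title' and 'text' are both indexed in this collection
url = dismax_url("http://localhost:8983/solr/collection1", "season",
                 {"title": 1, "text": 2})
print(url)
```

Boosting on `content` itself would only become possible after marking that field as indexed in the schema and reindexing.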

I faced the same problem. I tracked it down to the schema definition: for dismax to work, the field type should be a text type and not string,
e.g. text_general, text_en_splitting, text_en.
It's because of the tokenizers used for these field types.

Related

Solr dismax Query Over Multiple Fields

I am trying to do a solr dismax query over multiple fields, and am a little confused with the syntax.
My core contains a whole load of podcast episodes. The fields in the index are EPISODE_ID, EPISODE_TITLE, EPISODE_DESC, and EPISODE_KEYWORDS.
Now, when I do a query I would like to search for the query term in the EPISODE_TITLE, EPISODE_DESC, and EPISODE_KEYWORDS fields, with different boosts for the different fields.
So when I search for 'jedi', the query I've built looks like this:
http://localhost:8983/solr/episode_core/select?
&defType=dismax&q=jedi&fl=EPISODE_ID,EPISODE_TITLE,EPISODE_DESC,EPISODE_KEYWORDS
&qf=EPISODE_TITLE^3.0+EPISODE_DESC^2.0+EPISODE_KEYWORDS
However, this doesn't seem to work - it returns zero records.
When I put a default field like below, it now works, but this is kind of crap because it means I'm not getting results from searching all of the 3 fields:
http://localhost:8983/solr/episode_core/select?&df=EPISODE_DESC
&defType=dismax&q=jedi&fl=EPISODE_ID,EPISODE_TITLE,EPISODE_DESC,EPISODE_KEYWORDS
&qf=EPISODE_TITLE^3.0+EPISODE_DESC^2.0+EPISODE_KEYWORDS
Is there something I am missing here? I thought that you could search over multiple fields, and I thought that the 'qf' parameter would mean you didn't need to supply the default field parameter?
All help much appreciated...
Your idea is correct. If you've defined qf (query fields) for Dismax, there shouldn't be any need to specify a df (default field).
Can you be more specific about what isn't working?
Also, read up on Configuration Invariants in solrconfig.xml as it is possible your configuration could be sending some different parameters than you've specified in the URL.
(E.g. if you're seeing a specific error message asking you to provide a df)
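One way to narrow this down is to rebuild the request with proper URL encoding and add Solr's `debugQuery=true` parameter, which includes the parsed query in the response. This is a sketch only; the host and core name are taken from the question, and nothing is sent over the network here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

params = {
    "defType": "dismax",
    "q": "jedi",
    "fl": "EPISODE_ID,EPISODE_TITLE,EPISODE_DESC,EPISODE_KEYWORDS",
    # Spaces separate the qf entries; urlencode converts them to '+'
    "qf": "EPISODE_TITLE^3.0 EPISODE_DESC^2.0 EPISODE_KEYWORDS",
    # Ask Solr to report how it parsed the query
    "debugQuery": "true",
}
url = "http://localhost:8983/solr/episode_core/select?" + urlencode(params)

# Round-trip check: decode the query string to see what Solr would receive
decoded = parse_qs(urlsplit(url).query)
print(decoded["qf"][0])
```

If the decoded `qf` looks right but the debug output shows different fields, a default or invariant in solrconfig.xml may be overriding the request.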

How does Solr process the query string when using edismax qf parameter and specify field in query

All:
[UPDATE]
After reading the debug explain, it seems that the qf will expand only
the keywords without specifying field.
===================================================================
When I was learning to use the edismax query parser, the documentation described the qf parameter as:
Query Fields: specifies the fields in the index on which to perform
the query. If absent, defaults to df.
Its purpose is to generate all combinations of the fields with the query terms.
However, if we already specify a field in the query (the q parameter), I wonder what happens when I specify a different field in qf?
For example:
q=title:epic
defType=edismax
qf=content
Could anyone explain how Solr interprets this query?
Thanks
When you specify qf it means you want Solr to search for whatever is in the "q" parameter in these "qf" fields. So your first and third lines contradict each other:
q=title:epic
defType=edismax
qf=content
If you want to find any document whose content field contains anything matching your search terms, put these search terms as tokens in "q", separated by +OR+,
like this...
q=I+OR+like+OR+books+OR+and+OR+games
defType=edismax
qf=content
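The OR-joined form suggested above can be generated rather than typed by hand. A small sketch, assuming the terms are plain words with no characters that need escaping; `or_query` is a hypothetical helper name.

```python
from urllib.parse import urlencode

def or_query(terms):
    """Join plain search terms with the OR operator."""
    return " OR ".join(terms)

params = {
    "q": or_query(["I", "like", "books", "and", "games"]),
    "defType": "edismax",
    "qf": "content",
}
# Spaces become '+' in the encoded query string
query_string = urlencode(params)
print(query_string)
```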
When q=title:epic, the query field is already fixed to title, so qf should not be set to "content"; in that case you will get no results for sure. Either leave the qf parameter empty or set it to "title".

Solr terms component complete field match

I am new to Solr.
I am working with the terms component to get the top terms from a field.
For example:
I have a field "Firm" that contains many kinds of firms with the endings "gmbh" and "ag".
But I need the terms to be the complete values of this field.
For example: Mustermann gmbh, max gmbh, etc.
I've tried many different fieldtypes in the schema.xml but nothing worked.
Thank you in advance.
Best regards,
Lorenzo :-)
You can use Facets in your request to get the "Top X of field Y"
E.g.
q=*:*&facet=true&facet.field=Firm&facet.limit=50&facet.mincount=1
When you use facet.limit you get the top X results.
Your field Firm in the schema.xml should not use a tokenizer, because otherwise you would get "mustermann" and "gmbh" instead of "mustermann gmbh". (I think "string" is the standard field type without a tokenizer.)
Don't forget to reindex if you have to change field values.
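With the default JSON output, Solr returns each facet field as a flat list that alternates between value and count. The sketch below pairs them up; the `sample` response is made-up data for illustration, and `top_terms` is a hypothetical helper name.

```python
def top_terms(response, field):
    """Turn Solr's flat [value, count, value, count, ...] list into pairs."""
    flat = response["facet_counts"]["facet_fields"][field]
    # Even positions hold the values, odd positions hold their counts
    return list(zip(flat[0::2], flat[1::2]))

# Made-up response fragment, shaped like Solr's default JSON facet output
sample = {
    "facet_counts": {
        "facet_fields": {"Firm": ["mustermann gmbh", 12, "max gmbh", 7]}
    }
}

pairs = top_terms(sample, "Firm")
print(pairs)
```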

Solr Index appears to be valid - but returns no results

Solr newbie here.
I have created a Solr index and written a whole bunch of docs into it. I can see
from the Solr admin page that the docs exist and the schema is fine as well.
But when I perform a search using a test keyword I do not get any results back.
On entering *:* into the query (in the Solr admin page) I get all the results.
However, when I enter any other query (e.g. a term or phrase) I get no results.
I have verified that the field being queried is Indexed and contains the values I am searching for.
So I am confused what I am doing wrong.
Probably you don't have a <defaultSearchField> correctly set up. See this question.
Another possibility: your field is of type string instead of text. String fields, in contrast to text fields, are not analyzed, but stored and indexed verbatim.
I had the same issue with a new setup of Solr 8. The accepted answer is not valid anymore, because the <defaultSearchField> setting has since been deprecated.
As I found no answer to why Solr does not return results from any fields despite being indexed, I consulted the query documentation. What I found is the DisMax query parser:
The DisMax query parser is designed to process simple phrases (without complex syntax) entered by users and to search for individual terms across several fields using different weighting (boosts) based on the significance of each field. Additional options enable users to influence the score based on rules specific to each use case (independent of user input).
In contrast, the default Lucene parser only speaks about searching one field. So I gave DisMax a try and it worked very well!
Query example:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video
You can also specify which fields to search exactly to prevent unwanted side effects. Multiple fields are separated by spaces which translate to + in URLs:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features+text
Last but not least, give the fields a weight:
http://localhost:8983/solr/techproducts/select?defType=dismax&q=video&qf=features^20.0+text^0.3
If you are using pysolr like I do, you can add those parameters to your search request like this:
results = solr.search('search term', **{
    'defType': 'dismax',
    'qf': 'features text'
})
In my case the problem was the format of the query. It seems that my setup, by default, was looking for an exact match on the entire value of the field. So, in order to get results when searching for sit, I had to query *sit*, i.e. use wildcards to get the expected result.
With solr 4, I had to solve this as per Mauricio's answer by defining type="text_en" to the field.
With solr 6, use text_general.

How do I return only a truncated portion of a field in SOLR?

I have a really large (5000+ characters) text field in SOLR named Description. So far it works great for searching and highlighting. If I perform a search and there are no highlighted portions then I just show the first 300 characters. What I would like to do is just return the 300 characters in the result from SOLR.
I would like to do this because when testing I get improved performance if I return a smaller result. This is probably because the XML doc is smaller so less time on the wire and then the processing is faster because the doc is smaller.
I have thought of using a new field that just stored the first 300 characters. I think this would work, but I was wondering if there was a better or more native solution.
What you're looking for is the highlighting hl.maxAlternateFieldLength (http://wiki.apache.org/solr/HighlightingParameters#hl.maxAlternateFieldLength).
You will need to define the field as its own alternate field. If you want to highlight the field Description, the highlight query parameters would be:
hl=true
hl.fl=Description
f.Description.hl.alternateField=Description
hl.maxAlternateFieldLength=300
Finally, to omit the Description field from the query result, you will have to exclude it from the fl query parameter:
fl=score,url,title,date,othermetadata
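Put together, the parameters above could be assembled like this. This is only a sketch: `snippet_params` is a hypothetical helper, the query text is a placeholder, and the `fl` list is shortened from the example in the answer.

```python
from urllib.parse import urlencode

def snippet_params(field, max_chars=300):
    """Highlighting parameters that fall back to the start of `field`."""
    return {
        "hl": "true",
        "hl.fl": field,
        # Use the field as its own alternate when nothing is highlighted
        f"f.{field}.hl.alternateField": field,
        "hl.maxAlternateFieldLength": max_chars,
    }

params = {
    "q": "some search",          # placeholder query
    "fl": "score,url,title",     # Description is left out on purpose
    **snippet_params("Description"),
}
query_string = urlencode(params)
print(query_string)
```

The truncated text then arrives in the highlighting section of the response rather than in the document fields.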
When using the Unified Highlighter, hl.alternateField is not available as a query parameter. Instead you can use the hl.defaultSummary query parameter (available since Solr 4.5):
hl.defaultSummary
If true, use the leading portion of the text as a snippet if a proper highlighted snippet can’t otherwise be generated. The default is false.

Resources