How can SOLR be made to boost within result set? - solr

I have indexed some documents that have title, content and keyword (multi-value).
I want to search on title and content, and then, within these results boost by keyword.
I have set up my qf as such:
<str name="qf">
content^0.5 title^1.0
</str>
And my bq as such:
<str name="bq">keyword:(*.*)^1.0</str>
But I'm fairly sure that this boosts on all keywords, not just the ones matching my search term.
Does anyone know how to achieve what I want? (I'm using the DisMax query request handler, by the way.)

I don't think that's how the boost works. A boost specifies the importance of a match on a specific field.
So by setting qf to something like content^0.5 title^1.0 keyword^5.0, you can make your queries give extra importance to matches in the keyword field.
You might be able to force it with a more complex query. For instance, you can use the "+" operator to make a clause required. So something like this, if you were searching for "query":
+(content:query title:query) keyword:query
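The same idea can be expressed with DisMax parameters: qf restricts matching to title and content, and bq adds an optional clause on keyword that uses the user's own search term instead of a wildcard. The following is a minimal sketch in Python that only builds the request parameters. The helper name build_boosted_params and the boost value are hypothetical, and the sketch assumes the search term contains no characters that need escaping.

```python
from urllib.parse import urlencode


def build_boosted_params(user_query, boost=5.0):
    """Build DisMax parameters: match on title/content, boost keyword hits.

    Hypothetical helper; assumes user_query has no quotes or backslashes.
    """
    return {
        "defType": "dismax",
        "q": user_query,
        # Fields that decide which documents match, taken from the question.
        "qf": "content^0.5 title^1.0",
        # bq is an optional clause: it changes scores, not the result set.
        "bq": f'keyword:"{user_query}"^{boost}',
    }


params = build_boosted_params("query")
print(urlencode(params))
```

Because bq is optional, documents that match only on title or content still appear; those that also carry the term as a keyword should score higher.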

Related

solr: Boosting documents that match all terms

I would like documents that match all terms to rank highest, followed by the partial matches. Among the full matches and among the partial matches, the documents should be ranked by the default behavior (TF-IDF). I figured (correct me if I'm wrong) that the best way to do this would be with boost queries, but I am not sure what the correct syntax is.
Here are some of my handler settings:
<str name="defType">edismax</str>
<str name="qf">parent_and_self_description details^0.0001 info^0.0001 code^10000</str>
And let's say an example query is q=cheese apple
How should I set my bq? I guessed something like bq=(cheese AND apple)^100 or bq=(+cheese +apple)^100 but obviously this is not working so it must be syntactically wrong. Thank you.
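One thing that may be worth checking: bq is parsed as an ordinary Lucene-style query, so unqualified terms such as cheese and apple are looked up in the default field rather than in the qf fields. A minimal sketch, assuming that is the cause here, builds a bq that names the field explicitly. The helper name all_terms_boost is hypothetical, and the choice of parent_and_self_description as the field to check is an assumption.

```python
def all_terms_boost(field, terms, boost=100):
    """Build a boost query that requires every term in one named field.

    Hypothetical helper; assumes the terms need no escaping.
    """
    required = " ".join(f"+{term}" for term in terms)
    return f"{field}:({required})^{boost}"


bq = all_terms_boost("parent_and_self_description", ["cheese", "apple"])
print(bq)
```

The resulting string would be sent as the bq parameter next to q=cheese apple; documents containing both terms in that field get the extra boost, while partial matches keep their normal score.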

Solr 5 how to search in specific field

I am using Solr version 5 to search data. I am using the query below, which searches for the keywords in all fields.
http://localhost:8983/solr/document/select?q=keyword1+keyword2&wt=json
Can anyone suggest a query that searches for a keyword only in the title field?
Thanks.
Use
http://localhost:8983/solr/document/select?q=title:*yourkeyword*&wt=json
or for exact match
http://localhost:8983/solr/document/select?q=title:"yourkeyword"&wt=json
You cannot search for a keyword in all fields without some extra work:
How can I search all field in SOLR that contain the keywords,.?
The "q" parameter contains the query string. For the standard parser, this means that you must specify the field with a colon, as in
fieldname:searchterm
or the standard parser will use the default field. The default field is specified in the "df" parameter. If you did not change your solrconfig.xml, you will search in the "text" field, because the file contains something like
<requestHandler name="/select" class="solr.SearchHandler">
<lst name="defaults">
<str name="df">text</str>
</lst>
</requestHandler>
P.S. If you want to search in all fields, you either have to copy the content of all fields into one field, or you must use a specific query parser such as the dismax parser, where you can list all your fields in the "qf" parameter.
P.P.S. You cannot search in all fields, but you can highlight in all fields :-)
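The two options described above (naming the field in q, or leaving q bare and setting df) can be sketched as request URLs. This is a minimal illustration in Python that only builds the URLs; the helper names are hypothetical, and the base URL is the one from the question.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr/document/select"


def field_query_url(field, value):
    """Option 1: name the field explicitly as fieldname:"value" in q."""
    params = {"q": f'{field}:"{value}"', "wt": "json"}
    return f"{BASE_URL}?{urlencode(params)}"


def default_field_url(field, value):
    """Option 2: keep q bare and point the default field (df) at the field."""
    params = {"q": value, "df": field, "wt": "json"}
    return f"{BASE_URL}?{urlencode(params)}"


explicit_url = field_query_url("title", "yourkeyword")
default_url = default_field_url("title", "yourkeyword")
print(explicit_url)
print(default_url)
```

urlencode percent-encodes the colon and the quotes, which avoids problems when the value contains spaces or other reserved URL characters.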
The easiest way is to run the query from the Admin console. When you run it, the console also shows the request URL that was actually executed. Just copy that URL and use it.
About the question of searching for a specific field value in Solr: in the Admin console, look for the 'q' text box and enter yourfield:value. Hit the 'Execute Query' button. The generated URL is shown at the top right.
Generated Query: ......select?indent=on&q=YOURFIELD:"VALUE"&wt=json

LUCENE: search for terms that match a regex

I need to search for any terms in the Lucene index that match a particular regex. I know that I can do it using the TermsComponent in Solr, if it is configured like this:
<searchComponent name="terms" class="solr.TermsComponent"/>
<!-- A request handler for demonstrating the terms component -->
<requestHandler name="/terms" class="solr.SearchHandler" startup="lazy">
<lst name="defaults">
<bool name="terms">true</bool>
<bool name="distrib">false</bool>
</lst>
<arr name="components">
<str>terms</str>
</arr>
</requestHandler>
For example, I want to fetch any terms containing "surface defects". Using solr I can do this:
http://localhost:8983/solr/core1/terms?terms.fl=content&
terms.regex=^(.*?(\bsurface%20defects\b)[^$]*)$&
terms.sort=count&
terms.limit=10000
But my question is: how can I achieve the same using the Lucene API, not Solr? I looked into the org.apache.solr.handler.component.TermsComponent class, but it is not very obvious to me.
You can use a RegexpQuery:
Query query = new RegexpQuery(new Term("myField", myRegex));
Or the QueryParser:
String queryString = "/" + myRegex + "/";
QueryParser parser = new QueryParser("myField", new KeywordAnalyzer());
Query query = parser.parse(queryString);
Now, my question is: Are you sure that regex works in Solr?
I haven't tried the TermsComponent regex functionality, so maybe it's doing some fancy SpanQuery footwork here, or running regexes on the retrieved stored fields, or something like that. However, you are using regex syntax that is not supported by Lucene, and you may be making some general assumptions about how regexes work in Lucene that are not accurate.
The big one: a Lucene regex query must match the whole term. If your field is not analyzed, the general idea here should work. If it is analyzed with, say, StandardAnalyzer, you cannot use a regex query to search like this, since "surface defects" would be split into multiple terms. On the plus side, in that case a simple PhraseQuery would probably work just fine, as well as being faster and easier. (In general, on Lucene regex queries: you probably don't need them, and if you do, you probably should have analyzed better.)
^ and $ won't work. You are attempting to match terms, and the pattern must match the whole term in order to match. As such, these anchors don't serve any purpose and aren't supported.
.*? is not really wrong, but reluctant matching isn't supported, so the ? is redundant. .* does the same thing here.
[^$]* is fine if you are trying not to match dollar signs; otherwise, I'm not sure what regex engine would support this. $ in a character class is just a dollar sign.
\b has no support in Lucene regexes. The whole idea of analysis is that the content should already be split on word breaks, so what purpose would this serve?
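The whole-term rule can be illustrated without Lucene. The sketch below uses Python's re.fullmatch as a stand-in for term-level matching. Python's regex engine is not Lucene's and supports different syntax, so this only shows why the same pattern matches an unanalyzed value but none of the terms of an analyzed one. The sample text and the whitespace tokenization are assumptions.

```python
import re


def matching_terms(terms, pattern):
    """Return the terms that the pattern matches in full.

    A term-level regex query behaves like fullmatch on each indexed term.
    """
    compiled = re.compile(pattern)
    return [term for term in terms if compiled.fullmatch(term)]


text = "minor surface defects found"

# Unanalyzed field: the whole value is indexed as a single term.
unanalyzed_terms = [text]
# Analyzed field (simplified here to whitespace splitting): one term per word.
analyzed_terms = text.split()

pattern = r".*surface defects.*"
unanalyzed_hits = matching_terms(unanalyzed_terms, pattern)
analyzed_hits = matching_terms(analyzed_terms, pattern)
print(unanalyzed_hits)
print(analyzed_hits)
```

With the analyzed terms, no single term contains both words, so the pattern cannot match; that is the case where a phrase query is the better tool.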

Querying across multiple fields with different boosts in Solr

In Solr, what is the best way of querying across different fields where each query on each field has a different weighting?
We're using C# and ASP.NET, with SolrNet being used to query Solr. Our index looks a bit like this:
document_id
title
text_content
tags
[some more fields...]
This is then queried using keywords, where each keyword has a different weight. So, for example, "ipad" might have a weight of 40, but "android" might have a weight of 25.
In conjunction with this, each field has a different base weight. For example, keywords are more valuable than the page title, which is more valuable than the text content.
So, we end up with something like the following:
title^25
text_content^10
tags^50
And the following keywords:
ipad^25
apple^22
microsoft^15
windows^15
software^20
computer^18
So, each search query has a different weighting, and each field has a different weight. As a result, we end up with search criteria that looks like this:
title:ipad^50
title:apple^47
title:microsoft^40
[more titles...]
text_content:ipad^35
text_content:apple^32
text_content:microsoft^25
[lots more...]
This translates into a very, very long search query, which exceeds the limit allowed. It also seems like a very inefficient way of doing things, and I was wondering if there's a better way of achieving this.
Effectively, we have a list of keywords with varied weights, and a list of fields in Solr which also have varied weights, and the idea is to query the index to retrieve the most relevant documents.
Further complicating this matter, though it may be out of the scope of this question, is that the query also includes filters to filter out documents. This is done using the following type of query:
&fq=(-document_id:4f845eb321c90b0aec5ee0eb)&fq=(-document_id:4f845cd421c90b0aec5ee041)&fq=(-document_id:4f845cea21c90b0aec5ee049)&fq=(-document_id:4f845cf821c90b0aec5ee04d)&fq=(-document_id:4f845d0e21c90b0aec5ee056)&fq=(-document_id:4f845d3521c90b0aec5ee064)&fq=(-document_id:4f845d3921c90b0aec5ee065)&fq=(-document_id:4f845d4921c90b0aec5ee06b)&fq=(-document_id:4f845d7521c90b0aec5ee07b)&fq=(-document_id:4f845d9021c90b0aec5ee084)&fq=(-document_id:4f845dac21c90b0aec5ee08e)&fq=(-document_id:4f845dbc21c90b0aec5ee093)
These can also add a lot of characters to the search query, and it would be good if there was also a better way to handle this as well.
Any help or advice is most appreciated. Thanks.
I suggest adding those default parameters to your request handler configuration in solrconfig.xml. They are always the same, right?
<requestHandler name="standard" class="solr.SearchHandler" default="true">
<lst name="defaults">
<str name="defType">edismax</str>
<str name="qf">title^25 text_content^10 tags^50</str>
</lst>
</requestHandler>
You should be able to add your static filters and so on in the same place, so that you don't have to specify those values unless you want to do something different from the default. You end up with much shorter URLs.
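With the field weights moved into the handler defaults, the client only has to send the weighted keywords, and the many exclusion filters can be collapsed into a single fq clause. The following is a minimal sketch in Python that builds those two parameters. The helper names are hypothetical, and the keywords and document ids are sample values copied from the question.

```python
from urllib.parse import urlencode


def build_query(keyword_weights):
    """Turn {"ipad": 25, ...} into "ipad^25 ..."; field weights come from qf."""
    return " ".join(f"{kw}^{weight}" for kw, weight in keyword_weights.items())


def build_exclusion_filter(field, ids):
    """Collapse many -field:id filters into one negative clause."""
    return f"-{field}:({' OR '.join(ids)})"


keyword_weights = {"ipad": 25, "apple": 22, "microsoft": 15}
excluded_ids = ["4f845eb321c90b0aec5ee0eb", "4f845cd421c90b0aec5ee041"]

params = {
    "q": build_query(keyword_weights),
    "fq": build_exclusion_filter("document_id", excluded_ids),
}
print(urlencode(params))
```

One difference from the hand-built query in the question: there the field and keyword weights were added (title:ipad^50), whereas a boost on a term combined with a boost in qf is multiplied, so the absolute numbers may need retuning.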

Solr search query over multiple fields

Is it possible to search in Solr over two fields using two different words and get back only those results that contain both of them?
For example, if I have the fields "type" and "location", I want only those results that have type='furniture' and location='office'.
You can use boolean operators and search on individual fields.
q=type:furniture AND location:office
If the values are fixed, it is better to use filter queries for performance.
fq=type:furniture AND location:office
The suggested solutions have the drawback that you have to take care of escaping special characters.
If the user searches for "type:d'or AND location:coffee break", the query will fail.
I suggest combining two edismax queries in one request handler:
<requestHandler name="/combine" class="solr.SearchHandler" default="false">
<lst name="invariants">
<str name="q">
(_query_:"{!edismax qf='type' v=$uq1}"
AND _query_:"{!edismax qf='location' v=$uq2}")
</str>
</lst>
</requestHandler>
Call the request handler like this:
http://localhost:8983/solr/collection1/combine?uq1=furniture&uq2=office
Explanation
The variables $uq1 and $uq2 will be replaced by the request parameters uq1 and uq2.
The result of the first edismax query (uq1) is combined by a logical AND with the second edismax query (uq2).
Solr Docs
https://wiki.apache.org/solr/LocalParams
You can also use the bq (boost query) parameter of the DisMax request handler:
defType=dismax&bq=type:furniture AND location:office
fq=type:furniture AND location:office
Instead of using AND, this could also be split into two filter queries.
fq=type:furniture
fq=location:office
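When the filter values come from user input, one way to sidestep most escaping problems is to send each value as a quoted phrase, where only double quotes and backslashes need escaping, and to let the HTTP layer URL-encode the result. The following is a minimal sketch in Python under that assumption; the helper names are hypothetical. On an analyzed text field the quoted value is treated as a phrase, on a string field as an exact value.

```python
from urllib.parse import urlencode


def phrase(value):
    """Quote a value as a phrase, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def filter_params(filters):
    """Build one fq parameter per field so each filter is a separate clause."""
    return [("fq", f"{field}:{phrase(value)}") for field, value in filters.items()]


pairs = filter_params({"type": "d'or", "location": "coffee break"})
print(pairs)
print(urlencode(pairs))
```

Passing a list of pairs to urlencode keeps both fq parameters in the query string instead of overwriting one with the other.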
