SOLR: finding exactly which field(s) matched a query

After a fielded query is executed for the given search term(s), the SOLR APIs return the matching doc IDs.
My question: is there a way to fetch only the minimal set of fields that actually contain the end user's search terms?
For example, I have a SOLR document with nearly 200 attributes.
My query is (name:SOLR* OR Description:LUCENE*) AND (Publisher:Print* OR AUTHOR:ERIC etc)
In the above example, if the name field matches, I want only name returned, and so on for the other fields.

Related

How to see Solr explain for a document not returned by Solr query

I am using Solr's explain to debug my Solr query. I can see explain results for everything that Solr query returns, but not for the documents the query has not returned.
There are documents which I think should be returned by a query but are not.
I want to see how the Solr score is calculated for those documents to be able to compare with other documents.
I was able to find the answer to this question.
There's a query parameter called explainOther. You can specify a query in this parameter and, in addition to the explain output for the documents matched by the main query, Solr will show the explain output for any record that matches the explainOther query.
Here is the explanation of that parameter from Solr reference guide:
The explainOther Parameter
(From: https://lucene.apache.org/solr/guide/6_6/common-query-parameters.html#CommonQueryParameters-TheexplainOtherParameter)
The explainOther parameter specifies a Lucene query in order to identify a set of documents. If this parameter is included and is set to a non-blank value, the query will return debugging information, along with the "explain info" of each document that matches the Lucene query, relative to the main query (which is specified by the q parameter). For example:
q=supervillians&debugQuery=on&explainOther=id:juggernaut
The query above allows you to examine the scoring explain info of the top matching documents, compare it to the explain info for documents matching id:juggernaut, and determine why the rankings are not as you expect.
The default value of this parameter is blank, which causes no extra "explain info" to be returned.
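As a rough sketch, the request from the reference guide could be assembled in plain Java like this. The base URL (host, port, and the collection name mycollection) is a placeholder and not from the original post; only the three query parameters come from the quoted example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class ExplainOtherUrl {
    // Builds a /select URL that asks Solr to also explain the documents
    // matching explainOtherQuery, relative to the main query.
    static String build(String baseUrl, String mainQuery, String explainOtherQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", mainQuery);
        params.put("debugQuery", "on");
        params.put("explainOther", explainOtherQuery);
        String queryString = params.entrySet().stream()
            .map(e -> e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
        return baseUrl + "/select?" + queryString;
    }

    public static void main(String[] args) {
        // Placeholder base URL; substitute your own Solr host and collection.
        System.out.println(build("http://localhost:8983/solr/mycollection",
                                 "supervillians", "id:juggernaut"));
    }
}
```

The explainOther value is URL-encoded because it contains a colon; the explain entries for the extra documents then appear in the debug section of the response.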

Apache SOLR eDisMax: multiple keywords, and the order of those keywords, yielding different results

I am trying to understand the root cause of an issue with my SOLR search query. The code below is SolrJ client code.
query.setStart(0);
query.setRows(1000);
query.set("debugQuery", true);
query.set("defType", "edismax");
query.setQuery("title:business OR statistics) OR (name:business OR statistics)");
query.add("fq", "bsuiness_id:(101 102)");
query.add("tie", "0.1");
query.set("bq","weight:[0 TO 500]^1 weight:[501 TO 1000]^3");
returns 200 search results
query.setStart(0);
query.setRows(1000);
query.set("debugQuery", true);
query.set("defType", "edismax");
query.setQuery("title:statistics OR business) OR (name:statistics OR business)");
query.add("fq", "bsuiness_id:(101 102)");
query.add("tie", "0.1");
query.set("bq","weight:[0 TO 500]^1 weight:[501 TO 1000]^3");
returns 100 search results
My understanding is that the keywords "business statistics" and "statistics business" should yield the same results. However, as you can see above, they do not.
Can someone please provide any pointers about what is missing?
The two queries are not the same. (And you're missing a ( at the start.)
title:business OR statistics) OR (name:business OR statistics)
searches for business in the title field and statistics in the default search field (since it doesn't seem like you have a qf parameter), or business in the name field and again, statistics in the default search field.
So in effect:
title = business or name = business or statistics in default search field
Your second query:
title:statistics OR business) OR (name:statistics OR business)
.. searches for statistics in the title field, or business in the default search field, or statistics in the name field, or business (again) in the default search field. In effect:
title = statistics or name = statistics or business in the default search field
.. as you can see, these two queries are not the same. The field: prefix is only valid for the token that follows right behind it - not for those other tokens.
Since you're using the edismax handler, I suggest you rewrite this to use the qf parameter (query fields) instead, which tells Solr which fields to query. Your two examples can then be simplified to:
q=statistics business&qf=name title
.. search for statistics and business in the two fields named in the qf parameter. You can use q.op=OR to get hits where any of the terms are present (as in your example), or q.op=AND to get hits where both are present.
In that case statistics business and business statistics as the query will give you the same result.
If you want to use the explicit syntax (aka the Lucene syntax), you can use the form field:(term1 OR term2) - title:(business OR statistics) OR name:(business OR statistics) - but since you're already using the edismax handler, I recommend using the built-in support for more natural queries and using qf to say which fields to search. You can also use weights with qf to weigh hits in the two fields differently - qf=name^3 title will give three times the weight to any hits in the name field.
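To make the grouping rule concrete, here is a minimal stdlib-only Java sketch that builds the explicit field:(term1 OR term2) form for several fields. The class and method names are made up for illustration; the resulting string is what you would pass to query.setQuery(...).

```java
import java.util.List;
import java.util.stream.Collectors;

class FieldGroupedQuery {
    // Wraps all terms in parentheses so the field prefix applies to each of them:
    // field:(t1 OR t2) instead of field:t1 OR t2.
    static String fieldClause(String field, List<String> terms) {
        return field + ":(" + String.join(" OR ", terms) + ")";
    }

    // ORs together one grouped clause per field.
    static String anyFieldAnyTerm(List<String> fields, List<String> terms) {
        return fields.stream()
            .map(f -> fieldClause(f, terms))
            .collect(Collectors.joining(" OR "));
    }

    public static void main(String[] args) {
        System.out.println(anyFieldAnyTerm(List.of("title", "name"),
                                           List.of("business", "statistics")));
        // prints: title:(business OR statistics) OR name:(business OR statistics)
    }
}
```

Because every term is now bound to every field, swapping the term order changes the string but not the set of documents it describes.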

Sorting of solr documents based on search term in solr

I would like to sort Solr documents based on the search term. For example, if the search term is "stringABC",
then the order of the results should be
stringABC,
stringABCxxxx,
xxxxstringABCxxxx
The Solr document will contain a lot of fields, e.g. title, description, path, article No, product code, etc.
The default field will be populated from more than one field, e.g. title, description and path.
So a Solr doc will only be returned when the search term matches one of the fields that feed the default field.
Use three fields - one with the exact string, one with an EdgeNGramTokenizer and one with an NGramTokenizer. You can then use qf=field1^10 field2^5 field3 to score hits in these fields according to how you want to prioritize them relative to each other.
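The following plain-Java toy does not use Solr at all; it only mimics which of the three fields a value would match (exact, prefix for the edge n-gram field, any substring for the n-gram field) and applies the boosts 10, 5 and 1 from the qf example. Real Solr scores also depend on term statistics and on the configured gram sizes, so treat this purely as an illustration of the intended ordering.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class MatchTierDemo {
    // Returns the highest boost among the (simulated) fields that would match.
    static int boost(String value, String term) {
        if (value.equals(term)) return 10;     // exact-string field
        if (value.startsWith(term)) return 5;  // edge n-gram field (prefixes only)
        if (value.contains(term)) return 1;    // n-gram field (any substring)
        return 0;                              // no field matches
    }

    // Keeps only matching values and orders them by descending boost.
    static List<String> rank(List<String> values, String term) {
        return values.stream()
            .filter(v -> boost(v, term) > 0)
            .sorted(Comparator.comparingInt((String v) -> boost(v, term)).reversed())
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> values = List.of("xxxxstringABCxxxx", "stringABCxxxx", "other", "stringABC");
        System.out.println(rank(values, "stringABC"));
        // prints: [stringABC, stringABCxxxx, xxxxstringABCxxxx]
    }
}
```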

Difference between Solr Facet Fields and Filter Queries

I am using SolrMeter to test the Apache Solr search engine. The difference between facet fields and filter queries is not clear to me. The SolrMeter tutorial lists this as an example of facet fields:
content
category
fileExtension
and this as an example of filter queries:
category:animal
category:vegetable
category:vegetable price:[0 TO 10]
category:vegetable price:[10 TO *]
I am having a hard time wrapping my head around it. Could somebody explain by example? Can I use SolrMeter without specifying either facets or filters?
Facet fields are used to get statistics about the returned documents - specifically, for each value of that field, how many returned documents have that value. So for example, if you have 10 products matching a query for "soft rug" and you facet on "origin," you might get 6 documents for "Oklahoma" and 4 for "Texas." The facet field query will give you the numbers 6 and 4.
Filter queries on the other hand are used to filter the returned results by adding another constraint. The thing to remember is that the query when used in filtering results doesn't affect the scoring or relevancy of the documents. So for example, you might search your index for a product, but you only want to return results constrained by a geographic area or something.
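The distinction can be mimicked on an in-memory list, without Solr, as in the sketch below. The Product class, the sample data and the price constraint are all invented for illustration; the 6/4 split mirrors the "soft rug" example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class FacetVsFilterDemo {
    // Hypothetical stand-in for an indexed document.
    static final class Product {
        final String origin;
        final int price;
        Product(String origin, int price) { this.origin = origin; this.price = price; }
    }

    // Ten made-up hits: 6 from Oklahoma (prices 10..15), 4 from Texas (prices 20..23).
    static List<Product> sampleResults() {
        List<Product> results = new ArrayList<>();
        for (int i = 0; i < 6; i++) results.add(new Product("Oklahoma", 10 + i));
        for (int i = 0; i < 4; i++) results.add(new Product("Texas", 20 + i));
        return results;
    }

    // Facet analogue: count results per value of the origin field.
    static Map<String, Long> facetByOrigin(List<Product> results) {
        return results.stream()
            .collect(Collectors.groupingBy(p -> p.origin, TreeMap::new, Collectors.counting()));
    }

    // Filter query analogue: keep only results satisfying an extra constraint.
    static List<Product> filterByMaxPrice(List<Product> results, int maxPrice) {
        return results.stream().filter(p -> p.price <= maxPrice).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Product> results = sampleResults();
        System.out.println(facetByOrigin(results));                        // {Oklahoma=6, Texas=4}
        System.out.println(facetByOrigin(filterByMaxPrice(results, 12)));  // {Oklahoma=3}
    }
}
```

The facet call only reports counts over whatever result set it is given, while the filter call changes which documents are in that set.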
A facet is a field (type) of the document, so category is the field. As Ansari said, facets are used to get statistics and provide grouping capabilities. You could apply grouping on the category field to show everything vegetable as one group.
Edit: The parts about searching inside of a specific field are wrong. It will not search inside of the field only. It should be 'adding a constraint to the search' instead.
Performing a filter query of category:vegetable will search for vegetable in the category field and no other fields of the document. It is used to search just specific fields rather than every field. Sometimes you know that the term you want only is in one field so you can search just that one field.

How to index a URL in SOLR so I can boost results by website

I have thousands of documents indexed in my SOLR instance which represent data crawled from different websites. One of the fields of a document is SourceURL, which contains the URL of the webpage that I crawled and indexed into this document.
I want to boost results from a specific website using a boost query.
For example, I have 4 documents, each containing the following data in SourceURL:
https://meta.stackoverflow.com/page1
http://www.stackoverflow.com/page2
https://stackoverflow.com/page3
https://stackexchange.com/page1
I want to boost all results that are from stackoverflow.com, and not from its subdomains (in this case results 2 and 3).
Do you know how I can index the URL field and then use a boost query to identify all the documents from a specific website, as in the case above?
One way would be to parse the URL prior to index time and record whether it is on the primary domain (a primarydomain boolean field in your schema.xml file, for example).
Then you can boost on the primarydomain field in your query. See the DisMaxQParserPlugin page on the Solr Wiki for an example of how to boost fields at query time.
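A minimal sketch of the index-time parsing step, using only java.net.URI, might look like this. The helper name is made up, and treating the www. alias as part of the primary domain is an assumption taken from the question (results 2 and 3 should be boosted, result 1 should not).

```java
import java.net.URI;
import java.util.Locale;

class PrimaryDomainFlag {
    // True when the URL's host is the bare domain or its "www." alias;
    // any other subdomain (e.g. meta.) returns false.
    static boolean isPrimaryDomain(String url, String domain) {
        String host = URI.create(url).getHost();
        if (host == null) return false;
        host = host.toLowerCase(Locale.ROOT);
        return host.equals(domain) || host.equals("www." + domain);
    }

    public static void main(String[] args) {
        String[] urls = {
            "https://meta.stackoverflow.com/page1",
            "http://www.stackoverflow.com/page2",
            "https://stackoverflow.com/page3",
            "https://stackexchange.com/page1"
        };
        for (String url : urls) {
            // The boolean would be stored in the primarydomain field at index time.
            System.out.println(url + " -> " + isPrimaryDomain(url, "stackoverflow.com"));
        }
    }
}
```

With the flag stored on each document, a boost query such as bq=primarydomain:true^5 (the weight 5 is arbitrary) could then favour those documents at query time.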
