Query one document or another depending on specific field - solr

Imagine a Solr index with documents similar to this:
[
{
ProductId: 123,
Contract: abc
},
{
ProductId: 123,
Contract: def
},
{
ProductId: 123
},
{
ProductId: 567
},
{
ProductId: 567,
Contract: bar
}
]
For every ProductId there is always exactly one document without a Contract.
Additionally, there may be 0 to n documents with a Contract for that ProductId.
I need a query that takes a Contract and returns one document per ProductId: the document with the given Contract if it exists, otherwise the single document without a Contract.
For example, I would query with Contract: def (somehow) and it should give me this:
[
{
ProductId: 123,
Contract: def
},
{
ProductId: 567
}
]
The document with Contract:abc is not part of the result
The document with ProductId:123 but without Contract is not part of the result
The document with ProductId:567 and no Contract is part of the result, because there is no document with this ProductId and Contract:def
In other words, what I need is something like
Give me one document per ProductId, matching Contract:X XOR -Contract:*, but not both.

Step 1 Write your query so that it returns both the records without a contract and the records with the matching contract, with the matching-contract records scoring highest. This gets around the problem that you will sometimes want items in your results that don't match the contract value: q=Contract:"def" OR (*:* -Contract:[* TO *]). The clause (*:* -Contract:[* TO *]) matches all records without a contract, and Contract:"def" matches records with the correct contract. The records matching Contract:"def" should naturally score higher than those with no contract, but if there's any trouble or you just want to be sure, you can add a boost to that clause: Contract:"def"^2.
Step 2 Add Result Grouping to the query, configured so that you are requesting only the highest scoring record for any given ProductId:
q=Contract:"def" OR (*:* -Contract:[* TO *])&group=true&group.field=ProductId
This requires that the ProductId field be configured in your schema.xml as multiValued="false", as multiValued fields cannot be used as groups. I'm also assuming that you are using the Standard Query Parser, either set as a default in your solrconfig.xml or by adding the argument defType=lucene when you make the query.
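As a rough sketch, the request from Steps 1 and 2 could be assembled in Python as follows. The helper name build_grouped_params and its arguments are illustrative assumptions, not part of the answer; the Solr parameters themselves (q, group, group.field, group.limit, rows) are standard Result Grouping parameters.

```python
from urllib.parse import urlencode


def build_grouped_params(contract, rows=10):
    """Hypothetical helper: parameters for one document per ProductId."""
    # Escape backslashes and double quotes so the value stays inside the phrase.
    safe = contract.replace("\\", "\\\\").replace('"', '\\"')
    return {
        # The boosted clause makes the matching contract outrank the
        # contract-less document of the same ProductId.
        "q": f'Contract:"{safe}"^2 OR (*:* -Contract:[* TO *])',
        "group": "true",
        "group.field": "ProductId",
        "group.limit": 1,  # Solr's default: only the top document per group
        "rows": rows,      # maximum number of groups (ProductIds) returned
    }


params = build_grouped_params("def")
query_string = urlencode(params)  # append this to .../select?
```

Passing the parameters through urlencode takes care of escaping the spaces, quotes, and brackets in q, which is easy to get wrong when the URL is built by hand.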
The results should look something like this:
'grouped'=>{
'ProductId'=>{
'matches'=>5,
'groups'=>[{
'groupValue'=>123,
'doclist'=>{'numFound'=>3,'start'=>0,'docs'=>[
{
'ProductId'=>123,
'Contract'=>'def'}]
}},
{
'groupValue'=>567,
'doclist'=>{'numFound'=>2,'start'=>0,'docs'=>[
{
'ProductId'=>567}]
}}]}}}
Note that neither the matches nor the numFound values in the result set will tell you how many groups have been returned, but the argument rows=XX can be used to define the maximum number of desired groups (in this case ProductIds).
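On the client side, the grouped response can be flattened back into a plain list of documents. The sketch below assumes the response was requested as JSON and parsed into a Python dict with the same shape as the result shown above; the function name top_docs is made up for illustration.

```python
def top_docs(response, field="ProductId"):
    """Return the first (highest scoring) document of every group."""
    groups = response["grouped"][field]["groups"]
    return [g["doclist"]["docs"][0] for g in groups if g["doclist"]["docs"]]


# Sample data shaped like the grouped result above.
sample = {
    "grouped": {
        "ProductId": {
            "matches": 5,
            "groups": [
                {
                    "groupValue": 123,
                    "doclist": {
                        "numFound": 3,
                        "start": 0,
                        "docs": [{"ProductId": 123, "Contract": "def"}],
                    },
                },
                {
                    "groupValue": 567,
                    "doclist": {
                        "numFound": 2,
                        "start": 0,
                        "docs": [{"ProductId": 567}],
                    },
                },
            ],
        }
    }
}

docs = top_docs(sample)
group_count = len(docs)  # number of groups on this page of results
```

Counting the flattened list gives the number of groups in the current page, which matches and numFound do not provide.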

Related

Can Solr JSON Facet API stat functions be combined with a domain filter to limit the results of the function?

My use case is to make a query to Solr, and to extract counts of unique terms for certain fields within the result set. The trick is that within my counts, I need to limit the output to only terms that match a certain input string--without adjusting the main Solr query. E.g., "Solr, give me results for 'War and Peace', and give me the first ten facets on author where the author field has 'doge' in it, and give me a count of all unique author values in the result set where the author field has 'doge' in it."
The Solr JSON Facet API allows me to facet using stat functions; in this case, I'm interested in using the unique() function to get the counts I need. So, e.g.,
{
"author_count": "unique(author)"
}
...tells me the total number of unique values for 'author' in the result set. This is good.
I can limit the output of a facet using the domain change option, like so:
{
"author_facet": {
"type": "terms",
"field": "author",
"mincount": 1,
"limit": 10,
"offset": 0,
"domain": {
"filter": "author:doge"
}
}
}
This is also good.
The problem I'm having is that when I send both of these choices, the result of the unique() call (in author_count) is a count of all unique author values in the base result set, regardless of whether the author contains 'doge'. The author_facet results do correctly limit the output to only authors with 'doge' in them. But I need to also apply that limit to the results of the unique() function.
I cannot alter the base query, because it represents user input that is independent of the facet filtering input. E.g, the user will have searched for "War and Peace," and now want to see only those facets where the author is 'doge', with a count of the total authors matching 'doge'.
If it is meaningful to the answer, I am running Solr 9.0.0.
Is there a way to apply domain filtering to Solr stat functions in the JSON Facet API, such as unique()?
EDIT: To clarify: The number of authors with 'doge' may be very large, and so would exceed the number of actual facets that should be returned. I'm limiting the facet response to 100, but there could be 978 authors with 'doge'. I want to inform the user of that 978 count while only returning the top 100.

Combine solr's document score with a static, indexed score in solr 7.x

I have people indexed into solr based on structured documents. For simplicity's sake, let's say they have the following schema
{
personName: text,
games :[ { gamerScore: int, game: text } ]
}
An example of the above would be
{
personName: john,
games: [
{ gamerScore: 80, game: Zelda },
{ gamerScore: 20, game: Space Invader },
{ gamerScore: 60, game: Tetris},
]
}
'gamerScore' is a value between 1 and 100 that indicates how good the person is at the specified game.
Relevance matching in solr is all done through the Text field 'game'. However, I want my final result list to be a combination of relevance to the query as provided by solr and my own gamerScore. Namely, I need to re-rank the results based on the following formula:
personFinalScore = (0.8 * solrScore) + (0.2 * gamerScore)
What I am trying to achieve is the combination of two different scores in a weighted manner in solr. This question was asked a long time ago, and I was wondering if there is something in solr v7.x that can tackle this.
I can change the schema around if a solution requires it.
In effect your formula can be simplified to adding your gamerScore with a weight of 0.25 (0.2 / 0.8). The absolute value of the score is irrelevant; what matters is how much the gamerScore field affects the score of the document relative to the Solr score.
The dismax-based handlers support bf:
The bf parameter specifies functions (with optional boosts) that will
be used to construct FunctionQueries which will be added to the user’s
main query as optional clauses that will influence the score.
Since bf is an additive boost, you can use bf=product(gamerScore,0.25) to make the gamerScore count for 20% of the total score.
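To see why the 0.8 / 0.2 formula reduces to an additive weight of 0.25: dividing the whole formula by 0.8 gives solrScore + 0.25 * gamerScore, and dividing every score by the same constant does not change the ordering. The sketch below checks this on made-up scores. The names and numbers are illustrative assumptions, and the params dict assumes that gamerScore is available as a single-valued numeric field on the matched document and that q and qf are placeholders for the real request.

```python
def weighted(solr_score, gamer_score):
    """The formula from the question."""
    return 0.8 * solr_score + 0.2 * gamer_score


def additive(solr_score, gamer_score):
    """What bf=product(gamerScore,0.25) adds on top of the Solr score."""
    return solr_score + 0.25 * gamer_score


# Made-up (solrScore, gamerScore) pairs, for illustration only.
people = {"john": (3.2, 80), "mary": (4.1, 20), "alex": (2.5, 60)}

by_weighted = sorted(people, key=lambda p: weighted(*people[p]), reverse=True)
by_additive = sorted(people, key=lambda p: additive(*people[p]), reverse=True)

# Request parameters for an edismax query using the additive boost.
params = {
    "defType": "edismax",
    "q": "Zelda",   # placeholder user query
    "qf": "game",   # placeholder: the text field used for relevance
    "bf": "product(gamerScore,0.25)",
}
```

Both orderings come out the same, so the simpler additive form can stand in for the weighted formula when only the ranking matters.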

Grouped records with aggregate fields

I'm running an instance of Solr 6.2. One of the use cases I'm exploring is to return records grouped by a field, including summed columns (facets) and sorted by those columns. I realize Solr is not meant to be utilized as a relational database, but is this possible?
Using the JSON API, I send the following data payload to the query endpoint of my Solr instance:
{
query: "*:*",
filter: ["status:1", "date:[2016-10-11T00:00:00Z-7DAYS/DAY TO 2016-10-11T00:00:00Z]"],
limit: 10,
params: {
group: true,
group.field: name,
group.facet: true
},
facet: {
funcs: {
type: terms,
field: name,
sort: { sum_v1: desc },
limit: 10,
facet: {
sum_v1: "sum(v1)",
sum_v2: "sum(v2)",
sum_v3: "sum(v3)"
}
}
}
}
This returns 10 records at a time in both the groups key and facets key of the response JSON. However, the sorted facet buckets do not match up with the grouped records. How can I get the facet counts with the relevant groups?
The only workaround I can come up with is to do a query for the grouped records first, then do another query using the id's from that query to get the facet counts. However, the downside is that I'd lose the ability to sort or filter by any of the facet counts.

Solr facet query filtering

I'm trying to build a facet query on the manufacturer field when the search term = "LENS", but I want to eliminate all those manufacturers that have no lens.
For example: I need the following output, but want to eliminate "Kodak" since there is no lens from that manufacturer.
"facet_fields": {
"manu" : [
"Canon USA": 25,
"Olympus": 21,
"Sony": 12,
"Panasonic": 9,
"Nikon": 4,
"Kodak":0
],
http://localhost/solr/collection1/select?q=lens&rows=0&wt=json&indent=true&facet=true&facet.query=lens&facet.field=manu
does not yield the correct result
You can use facet.mincount to only retrieve facet keys that have a count at or above a certain threshold. This is 0 by default.
facet.mincount=1
You can also supply the value on a per-field basis if you're doing multiple facets in a single request, f.manu.facet.mincount=1.
Additionally, there should be no need to do a facet.query when you're already performing the same query as the actual query. The facet.query is useful if you want to do arbitrary queries for a facet, within the same document set already returned by your query.
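Put together, the request from the question might look like the following sketch, with facet.query dropped and facet.mincount=1 added. The host and collection name are taken from the URL in the question; building the URL in Python is only for illustration.

```python
from urllib.parse import urlencode

params = [
    ("q", "lens"),
    ("rows", 0),
    ("wt", "json"),
    ("facet", "true"),
    ("facet.field", "manu"),
    ("facet.mincount", 1),  # drop manufacturers with a count of 0
]

url = "http://localhost/solr/collection1/select?" + urlencode(params)
```

With this request, a manufacturer such as "Kodak" with a count of 0 should no longer appear in facet_fields.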

Restrict multi field facet calculation to subset of possible values

I have a non-trivial SOLR query, which already involves a filter query and facet calculations over multiple fields. One of the facet fields is a multi-valued integer field that is used to store categories. There are many possible categories and new ones are created dynamically, so using multiple fields is not an option.
What I want to do, is to restrict facet calculation over this field to a certain set of integers (= categories). So for example I want to calculate facets of this field, but only taking categories 3,7,9 and 15 into account. All other values in that field should be ignored.
How do I do that? Is there some built-in functionality which can be used to solve this? Or do I have to write a custom search component?
The facet.prefix parameter can be defined for each field specified by the facet.field parameter by adding a parameter like this: f.<field_name>.facet.prefix.
I don't know about any way to define the facet base that should be different from the result, but one can use the facet.query to explicitly define each facet filter, e.g.:
facet.query={!key=3}category:3&facet.query={!key=7}category:7&facet.query={!key=9}category:9&facet.query={!key=15}category:15
Given the solr schema/data from this gist, the results will have something like this:
"facet_counts": {
"facet_queries": {
"3": 1,
"7": 1,
"9": 0,
"15": 0
},
"facet_fields": {
"category": [
"2",
2,
"1",
1,
"3",
1,
"7",
1,
"8",
1
]
},
"facet_dates": {},
"facet_ranges": {}
}
Thus giving the needed facet result.
I have some doubts about performance here (especially when there are more than 4 categories and the initial query returns a lot of results), so it is better to do some benchmarking before using this in production.
Not exactly the answer to my own question, but the solution we are using now: the numbers I want to filter on form distinct groups. So we can prefix the id with a group id like this:
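When the list of categories is dynamic, the facet.query parameters above can be generated rather than written by hand. This is a minimal sketch; the helper name category_facet_queries is hypothetical, and the field name category is taken from the example.

```python
from urllib.parse import urlencode


def category_facet_queries(categories, field="category"):
    """One facet.query per category, keyed by the category id."""
    return [("facet.query", f"{{!key={c}}}{field}:{c}") for c in categories]


# A list of pairs keeps the repeated facet.query parameter intact.
params = [("facet", "true")] + category_facet_queries([3, 7, 9, 15])
query_string = urlencode(params)
```

Each generated entry has the form {!key=3}category:3, so the counts come back under facet_queries keyed by the plain category id.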
1.3
1.8
1.9
2.4
2.5
2.11
...
Having the data like this in SOLR, we can use facet prefixes to facet only over a single group: http://wiki.apache.org/solr/SimpleFacetParameters#facet.prefix
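A small sketch of this prefix scheme follows. It assumes the prefixed values are indexed in a string field (called category here for illustration), since a prefix match works on terms rather than on integers; both helper names are hypothetical.

```python
def prefixed(group_id, category_id):
    """Value to index, e.g. group 1 and category 3 become '1.3'."""
    return f"{group_id}.{category_id}"


def prefix_facet_params(group_id, field="category"):
    """Facet only over the values that belong to one group."""
    return {
        "facet": "true",
        "facet.field": field,
        # The trailing dot keeps group 1 from also matching group 11.
        f"f.{field}.facet.prefix": f"{group_id}.",
    }


values = [prefixed(1, 3), prefixed(1, 8), prefixed(2, 11)]
params = prefix_facet_params(1)
```

The returned facet values still carry the group prefix, so the client has to strip it again before displaying the category id.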
