Solr faceting based on function result

I'm trying to perform faceting based on a dynamic value. Basically, I want behavior identical to the def function, but def doesn't seem to be available with faceting.
Consider these two "products":
{
  "id": "product1",
  "defaultPrice": 19.99,
  "overridePrice": 14.99
},
{
  "id": "product2",
  "defaultPrice": 49.99
}
Note that overridePrice is just an example. The actual field is dynamic and depends on the context in which a search is performed, and there may be many overridden prices, so I can't simply derive the price at index time.
For the response, I'm doing something like this for fl:
fl=price:def(overridePrice, defaultPrice)
and I'm using the same def function to sort on price. This works fine.
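For reference, the relevant request parameters together might look like this (a sketch; the match-all q is just a placeholder):
q=*:*&fl=id,price:def(overridePrice,defaultPrice)&sort=def(overridePrice,defaultPrice) asc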
So now I want to apply the same logic to facets. I've tried using json.facet, which seemed like it would work:
json.facet={
  price: "def(overridePrice, defaultPrice)"
}
I've tried other variations as well, such as field:def(overridePrice, defaultPrice) and field:price, but def doesn't seem to be an available function for faceting, and the derived price field is not available when faceting.
So the question: How can I perform faceting based on a default field like I'm doing for fl and sorting? Will this require a custom aggregation function, or is there a clever way I can do this without a custom function? It would be much more preferable to be able to do this with built-in Solr functionality.

I was able to do a hacky solution based on a tip in another question.
We can use two facets, each with a query to filter documents depending on whether a field exists.
Example:
{
  price_override: {
    type: query,
    q: "overridePrice:[* TO *]",
    facet: {
      price_override: {
        type: terms,
        field: overridePrice
      }
    }
  },
  price_standard: {
    type: query,
    q: "-overridePrice:[* TO *] AND defaultPrice:[* TO *]",
    facet: {
      price_standard: {
        type: terms,
        field: defaultPrice
      }
    }
  }
}
Explanation:
price_override: {
  type: query,
  q: "overridePrice:[* TO *]"
This [* TO *] range query selects only documents that have an overridePrice field.
price_standard: {
  type: query,
  q: "-overridePrice:[* TO *] AND defaultPrice:[* TO *]"
-overridePrice:[* TO *] excludes documents that have an overridePrice field, and defaultPrice:[* TO *] selects documents that have a defaultPrice field.
And the facet response:
"facets":{
"count":2,
"price_override":{
"count":1,
"price_override":{
"buckets":[{
"val":14.99,
"count":1}]}},
"price_standard":{
"count":1,
"price_standard":{
"buckets":[{
"val":49.99,
"count":1}]}}}
This does require manually combining price_override and price_standard into a single facet group on the client side, but the results are as expected. This could also easily be tweaked into a range facet, which is my actual use case (see the sketch below).
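For the range use case, the same pattern works with range sub-facets instead of terms. A minimal sketch (the start/end/gap values are made up):
{
  price_override: {
    type: query,
    q: "overridePrice:[* TO *]",
    facet: {
      prices: {type: range, field: overridePrice, start: 0, end: 100, gap: 20}
    }
  },
  price_standard: {
    type: query,
    q: "-overridePrice:[* TO *] AND defaultPrice:[* TO *]",
    facet: {
      prices: {type: range, field: defaultPrice, start: 0, end: 100, gap: 20}
    }
  }
}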

Related

Solr Conditional Highlighting: How to highlight with conditions?

In a Solr implementation, I am trying to do conditional highlighting that depends on fields other than the one we search on.
I want the match in the "content" field highlighted only if Solr indicates that this field may be exposed for this element.
Given a Solr base populated with :
[{ firstname:"Roman",
content: "A quick response is the best",
access:"" },
{ "firstname":"Roman",
"content": "Responsive is important",
"access":"contentAuthorized" }
]
I would like to get both documents in my answer, with the highlight on the "content" field only for the one with "access":"contentAuthorized", so I am executing the query:
q:(firstname:r* OR (+access:contentAuthorized AND +content:r*))
The expected answer would be:
...
[
  { "firstname": "Roman" },
  { "firstname": "Roman" }
],
"highlighting": {
  "0f278cb5-7150-42f9-8dca-81bfa68a9c6e": {
    "firstname": ["<em>Roman</em>"]
  },
  "105c6464-0350-4873-9936-b46c39c88647": {
    "firstname": ["<em>Roman</em>"],
    "content": ["<em>Responsive</em> is important"]
  }
}
But I actually get:
...
[
  { "firstname": "Roman" },
  { "firstname": "Roman" }
],
"highlighting": {
  "0f278cb5-7150-42f9-8dca-81bfa68a9c6e": {
    "firstname": ["<em>Roman</em>"],
    "content": ["A quick <em>response</em> is the best"]
  },
  "105c6464-0350-4873-9936-b46c39c88647": {
    "firstname": ["<em>Roman</em>"],
    "content": ["<em>Responsive</em> is important"]
  }
}
So I get "content" in the highlighting for the element without "access":"contentAuthorized", even though (+access:contentAuthorized AND +content:r*) is false for it.
Does anyone have an idea how I could do conditional highlighting with Solr?
Thank you for reading this and for taking the time to think about it :D
If you want highlighting to be applied to certain fields only, you need to set the query parameter hl.fl to those fields, in your case hl.fl=content. You should then set hl.requireFieldMatch=true.
Refer to Solr Highlighting documentation:
By default, false, all query terms will be highlighted for each field to be highlighted (hl.fl) no matter what fields the parsed query refer to. If set to true, only query terms aligning with the field being highlighted will in turn be highlighted.
For further info on how to use the query parameters: https://solr.apache.org/guide/8_6/highlighting.html
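Putting it together, the request might look something like this (a sketch, using the access field from the example documents above):
q=firstname:r* OR (access:contentAuthorized AND content:r*)&hl=true&hl.fl=content&hl.requireFieldMatch=true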

Solr facet Filtering

I have these fields in Solr
"IsFair": "True",
"IsHeight" : "True",
"IsFat" : "false",
"IsManly" : "False"
But while filtering data I want them displayed as Fair, Height, Fat, Manly in a single field on the front end.
Something like a filter named "Appearance Type" that contains "Fair", "Height", "Fat", "Manly" as filters. Someone suggested I use a hybrid filter, but I didn't understand how to achieve this.
I think the best way to implement this is to create a multivalued string field appearance_type and generate a facet on appearance_type; later, when applying the filter, you can use the same field.
So your example document will have,
{
  "id": "doc1",
  "appearance_type": ["fair", "height"]
}
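Faceting and filtering on that field is then done with plain Solr parameters, e.g. (a sketch):
facet=true&facet.field=appearance_type&fq=appearance_type:fair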

Multilevel facet query in Solr

I'm trying to get a two-level facet count from our solr server. The documents look like this (shortened to only the relevant fields):
{
  "id": "100071F5",
  "datasource": "ABC",
  "mediatype": "ISSBD"
}
With a query like this ...
http://localhost:8983/solr/select?wt=json&fl=id&indent=true&facet=true&facet.field=datasource&facet.field=mediatype
... I get as a result like this:
"facet_fields":{
"datasource":[
"ABC",75286,
"DEF",47257],
"mediatype":[
"POSTCARD",75286,
"RARE",12033,
"MANUSCRIPT",9418,
"BOOK",5849,
"OTHER",4747,
"UNKNOWN",2603,
"MAP",1033,
"GLOBE",704,
"MIXED",13,
"ISSUE",1]} ...
But what I really want is something like this:
"facet_fields":{
"datasource":[
"ABC",75286,
"mediatype":[
"POSTCARD",1234,
"RARE",1,
"BOOK",533,
"OTHER",47],
"DEF",47257,
"mediatype":[
"POSTCARD",7286,
"RARE",1203,
"MANUSCRIPT",918,
"BOOK",549,
"OTHER",4747,
"UNKNOWN",2603,
"MAP",1033,
"GLOBE",704,
"MIXED",13,
"ISSUE",1]} ...
In words: I'd like a facet over one field, and then for each of its buckets a sub-facet over another field. Is this possible in Solr?
You can use Pivot Faceting:
&facet.pivot=datasource,mediatype
It should give you a similar structure back:
[{
  "field": "datasource",
  "value": "ABC",
  "count": 75286,
  "pivot": [{
    "field": "mediatype",
    "value": "POSTCARD",
    "count": 34919
  }, { ....
  }]
}]
You can also use the JSON Facet API for even more detailed facet aggregations and sub-facets. Example adapted from the reference guide:
json.facet={
  top_datasource: {
    type: terms,
    field: datasource,
    facet: {
      top_mediatype: {
        type: terms,  // a nested terms facet on mediatype is calculated for each parent (datasource) bucket
        field: mediatype
      }
    }
  }
}
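The response then contains one top_mediatype bucket list nested inside each datasource bucket, along these lines (the counts are illustrative):
"top_datasource": {
  "buckets": [{
    "val": "ABC",
    "count": 75286,
    "top_mediatype": {
      "buckets": [
        {"val": "POSTCARD", "count": 34919},
        ...
      ]
    }
  },
  ...
  ]
}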

Query specifically indexed value in multivalued field

I have a multivalued field which is filled by an array of strings. Now I want to find all documents that have, for example, foo as the second (!) string in this field. Is this possible?
If it is not, what would you recommend to achieve this?
For Solr, you can use an UpdateRequestProcessor to copy the field and add a position prefix, so you end up with values like 2_91. You can use a StatelessScriptUpdateProcessor for that.
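A minimal sketch of such a script (assuming a StatelessScriptUpdateProcessorFactory is configured in the update chain; the phone_no and phone_no_pos field names are just examples):
function processAdd(cmd) {
  var doc = cmd.solrDoc;                        // the incoming SolrInputDocument
  var values = doc.getFieldValues("phone_no");  // the original multivalued field
  if (values !== null) {
    var pos = 1;
    var it = values.iterator();
    while (it.hasNext()) {
      // index a position-prefixed copy, e.g. "2_91" for the second value
      doc.addField("phone_no_pos", pos + "_" + it.next());
      pos++;
    }
  }
}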
Alternatively, you could send this information as multiple fields and have dynamic field definition to map them.
Basically, for both Solr and ES, underlying Lucene stores a multivalued string field as one long string, with a large token position gap between the last token of one value and the first token of the next. So absolute positions require some sort of hack. Runtime hacks (e.g. the Elasticsearch example in the other answer) are expensive at query time. Content-modifying hacks (e.g. the URP in this example) cost additional disk space or a more complex schema.
In Elasticsearch, you can achieve this using a script filter. Here is a sample; consider a mapping for phone_no:
{
  "index": {
    "mappings": {
      "type": {
        "properties": {
          "phone_no": {
            "type": "string"
          }
        }
      }
    }
  }
}
put a document (first):
POST index/type
{
  "phone_no": ["91", "92210"]
}
and the second one:
POST index/type
{
  "phone_no": ["92210", "91"]
}
So, if you want to find documents whose second value equals 91, here is the query:
POST index/type/_search
{
  "filter": {
    "script": {
      "script": "_source.phone_no[1].equals(val)",
      "params": {
        "val": "91"
      }
    }
  }
}
Here val can be user defined. Note that the script above does no bounds checking (if the array does not have at least two values, it may sometimes throw an exception), so modify the script to fit your needs. Thanks,
Hope this helps!
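For example, a bounds-checked variant of the filter might look like this (a sketch, with the same legacy dynamic-scripting assumptions as above):
POST index/type/_search
{
  "filter": {
    "script": {
      "script": "_source.phone_no.size() > 1 && _source.phone_no[1].equals(val)",
      "params": {
        "val": "91"
      }
    }
  }
}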

mongodb - retrieve array subset

What seemed a simple task turned out to be a challenge for me.
I have the following mongodb structure:
{
  (...)
  "services": {
    "TCP80": {
      "data": [{
        "status": 1,
        "delay": 3.87,
        "ts": 1308056460
      }, {
        "status": 1,
        "delay": 2.83,
        "ts": 1308058080
      }, {
        "status": 1,
        "delay": 5.77,
        "ts": 1308060720
      }]
    }
  }
}
Now, the following query returns the whole document:
{ 'services.TCP80.data.ts': {$gt: 1308067020} }
I wonder: is it possible to receive only those "data" array entries matching the $gt criteria (a kind of shrunken doc)?
I was considering MapReduce, but could not locate even a single example of how to pass external arguments (the timestamp) to the Map() function. (This feature was added in 1.1.4: https://jira.mongodb.org/browse/SERVER-401)
Also, there's always the alternative of writing a stored JS function, but since we are speaking of large quantities of data, DB locks can't be tolerated here.
Most likely I'll have to redesign the structure to something 1-level deep, like:
{
  status: 1, delay: 3.87, ts: 1308056460, service: "TCP80"
}, {
  status: 1, delay: 2.83, ts: 1308058080, service: "TCP80"
}, {
  status: 1, delay: 5.77, ts: 1308060720, service: "TCP80"
}
but the DB will grow dramatically, since "service" is only one of many fields that would be appended to each document.
Please advise!
Thanks in advance
In version 2.1 with the aggregation framework you are now able to do this:
1: db.test.aggregate(
2: {$match : {}},
3: {$unwind: "$services.TCP80.data"},
4: {$match: {"services.TCP80.data.ts": {$gte: 1308060720}}}
5: );
You can use custom criteria in line 2 to filter the parent documents. If you don't want to filter them, just leave line 2 out.
This is not currently supported. By default you will always receive the whole document/array unless you use field restrictions or the $slice operator. Currently these tools do not allow filtering the array elements based on the search criteria.
You should watch this request for a way to do this: https://jira.mongodb.org/browse/SERVER-828
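For completeness, field restrictions with $slice select array elements by position, not by the search criteria, e.g. (a sketch):
// returns only the last element of the data array, regardless of its ts value
db.test.find(
  { "services.TCP80.data.ts": { $gt: 1308067020 } },
  { "services.TCP80.data": { $slice: -1 } }
)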
I'm attempting to do something similar. I tried your suggestion of using the GROUP function, but I couldn't keep the embedded documents separate or was doing something incorrectly.
I needed to pull/get a subset of embedded documents by ID. Here's how I did it using Map/Reduce:
db.parent.mapReduce(
  function(parent_id, child_ids) {
    if (this._id == parent_id)
      emit(this._id, {children: this.children, ids: child_ids});
  },
  function(key, values) {
    var toReturn = [];
    values[0].children.forEach(function(child) {
      if (values[0].ids.indexOf(child._id.toString()) != -1)
        toReturn.push(child);
    });
    return {children: toReturn};
  },
  {
    mapparams: [
      "4d93b112c68c993eae000001", // example parent id
      ["4d97963ec68c99528d000007", "4debbfd5c68c991bba000014"] // example embedded children ids
    ]
  }
).find()
I've abstracted my collection name to 'parent' and it's embedded documents to 'children'. I pass in two parameters: The parent document ID and an array of the embedded document IDs that I want to retrieve from the parent. Those parameters are passed in as the third parameter to the mapReduce function.
In the map function I find the parent document in the collection (which I'm pretty sure uses the _id index) and emit its id and children to the reduce function.
In the reduce function, I take the passed in document and loop through each of the children, collecting the ones with the desired ID. Looping through all the children is not ideal, but I don't know of another way to find by ID on an embedded document.
I also assume in the reduce function that there is only one document emitted, since I'm searching by ID. If you expect more than one parent_id to match, then you will have to loop through the values array in the reduce function.
I hope this helps someone out there, as I googled everywhere with no results. Hopefully we'll see a built in feature soon from MongoDB, but until then I have to use this.
Fadi, as for "keeping embedded documents separate" - group should handle this with no issues
function getServiceData(collection, criteria) {
  var res = db[collection].group({
    cond: criteria,
    initial: {vals: [], globalVar: 0},
    reduce: function(doc, out) {
      // push whichever nested value you need; globalVar tracks the position across docs
      if (out.globalVar % 2 == 0)
        out.vals.push(doc.whatever.kind.and.depth);
      out.globalVar++;
    },
    finalize: function(out) {
      if (out.vals.length == 0)
        out.vals = 'sorry, no data';
      return out.vals;
    }
  });
  return res[0];
}
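Called, for example, like this (the collection name is made up):
getServiceData("monitoring", { "services.TCP80.data.ts": { $gt: 1308060720 } });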
