I'm trying to get a two-level facet count from our solr server. The documents look like this (shortened to only the relevant fields):
{
"id":"100071F5",
"datasource":"ABC",
"mediatype":"ISSBD"
}
With a query like this ...
http://localhost:8983/solr/select?wt=json&fl=id&indent=true&facet=true&facet.field=datasource&facet.field=mediatype
... I get a result like this:
"facet_fields":{
"datasource":[
"ABC",75286,
"DEF",47257],
"mediatype":[
"POSTCARD",75286,
"RARE",12033,
"MANUSCRIPT",9418,
"BOOK",5849,
"OTHER",4747,
"UNKNOWN",2603,
"MAP",1033,
"GLOBE",704,
"MIXED",13,
"ISSUE",1]} ...
But what I really want is something like this:
"facet_fields":{
"datasource":[
"ABC",75286,
"mediatype":[
"POSTCARD",1234,
"RARE",1,
"BOOK",533,
"OTHER",47],
"DEF",47257,
"mediatype":[
"POSTCARD",7286,
"RARE",1203,
"MANUSCRIPT",918,
"BOOK",549,
"OTHER",4747,
"UNKNOWN",2603,
"MAP",1033,
"GLOBE",704,
"MIXED",13,
"ISSUE",1]} ...
In words: I'd like to facet over one field and then, for each of its buckets, get a sub-facet over another field. Is this possible in Solr?
You can use Pivot Faceting:
&facet.pivot=datasource,mediatype
It should give you a similar structure back:
[{
"field":"datasource",
"value":"ABC",
"count":75286,
"pivot":[{
"field":"mediatype",
"value":"POSTCARD",
"count":34919
}, { ....
}]
}]
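Combined with the request from the question, the full call might look like this (facet=true is still required alongside facet.pivot):

```
http://localhost:8983/solr/select?wt=json&indent=true&facet=true&facet.pivot=datasource,mediatype
```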
You can also use the JSON Facet API for even more detailed facet aggregations and sub-facets. Example adapted from the reference guide:
top_datasource:{
type: terms,
field: datasource,
facet:{
top_mediatype:{
type: terms, // nested terms facet on mediatype is calculated for each parent bucket (datasource)
field: mediatype
}
}
}
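A complete request using this facet could be sent with curl; a sketch, assuming the same local server as in the question and rows=0 so only the facet counts come back:

```
curl http://localhost:8983/solr/select -d 'q=*:*&rows=0&json.facet=
{
  top_datasource: {
    type: terms,
    field: datasource,
    facet: {
      top_mediatype: { type: terms, field: mediatype }
    }
  }
}'
```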
Related
I'm trying to perform faceting based on a dynamic value. Basically I want identical behavior to the def function, but that doesn't seem to be available with faceting.
Consider these two "products":
{
"id":"product1",
"defaultPrice":19.99,
"overridePrice":14.99
},
{
"id":"product2",
"defaultPrice":49.99
}
I want to add that overridePrice is just an example. The actual field is a dynamic value that will depend on what context a search is performed in, and there may be many overridden prices, so I can't just derive price at index time.
For the response, I'm doing something like this for fl:
fl=price:def(overridePrice, defaultPrice)
and I use the same def function to sort on price. This works fine.
So now I want to apply the same logic to facets. I've tried using json.facet, which seemed like it would work:
json.facet={
price: "def(overridePrice, defaultPrice)"
}
I've tried other variations as well, such as using field:def(overridePrice, defaultPrice) as well as field:price, but def doesn't seem to be an available function for faceting, and the price derived field is not available when faceting.
So the question: How can I perform faceting based on a default field like I'm doing for fl and sorting? Will this require a custom aggregation function, or is there a clever way I can do this without a custom function? It would be much more preferable to be able to do this with built-in Solr functionality.
I was able to do a hacky solution based on a tip in another question.
We can use two facets with a query to filter documents depending on whether a field exists.
Example:
{
price_override: {
type: query,
q: "overridePrice:[* TO *]",
facet: {
price_override:{
type:terms,
field: overridePrice
}
}
},
price_standard: {
type: query,
q: "-overridePrice:[* TO *] AND defaultPrice:[* TO *]",
facet: {
price_standard: {
type: terms,
field: defaultPrice
}
}
}
}
Explanation:
price_override: {
type: query,
q: "overridePrice:[* TO *]"
This range query only selects documents that have an overridePrice field.
price_standard: {
type: query,
q: "-overridePrice:[* TO *] AND defaultPrice:[* TO *]"
-overridePrice:[* TO *] omits documents with the overridePrice field, and selects documents with a defaultPrice field.
And the facet response:
"facets":{
"count":2,
"price_override":{
"count":1,
"price_override":{
"buckets":[{
"val":14.99,
"count":1}]}},
"price_standard":{
"count":1,
"price_standard":{
"buckets":[{
"val":49.99,
"count":1}]}}}
This does require manually grouping price_override and price_standard into a single facet group, but the results are as expected. This could also pretty easily be tweaked into a range facet, which is my use case.
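For the range use case mentioned above, the nested terms facets could presumably be swapped for range facets; a sketch, where the start/end/gap values are made-up placeholders:

```
{
  price_override: {
    type: query,
    q: "overridePrice:[* TO *]",
    facet: {
      prices: { type: range, field: overridePrice, start: 0, end: 100, gap: 10 }
    }
  },
  price_standard: {
    type: query,
    q: "-overridePrice:[* TO *] AND defaultPrice:[* TO *]",
    facet: {
      prices: { type: range, field: defaultPrice, start: 0, end: 100, gap: 10 }
    }
  }
}
```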
In a Solr implementation, I am trying to do conditional highlighting that depends on fields other than the one we search on.
I want the matching "content" field highlighted only if Solr indicates that this field can be exposed for this element.
Given a Solr index populated with:
[{
  "firstname": "Roman",
  "content": "A quick response is the best",
  "access": ""
},
{
  "firstname": "Roman",
  "content": "Responsive is important",
  "access": "contentAuthorized"
}]
I would like to get both documents in my answer, with the highlight on the "content" field only for the one with "access":"contentAuthorized", so I am executing the query:
q:(firstname:r* OR (+access:contentAuthorized AND +content:r*))
The expected answer would be:
...
{
  { "firstname":"Roman" },
  { "firstname":"Roman" }
},
"highlighting":{
  "0f278cb5-7150-42f9-8dca-81bfa68a9c6e":{
    "firstname":["<em>Roman</em>"]
  },
  "105c6464-0350-4873-9936-b46c39c88647":{
    "firstname":["<em>Roman</em>"],
    "content":["<em>Responsive</em> is important"]
  }
}
But I actually get:
...
{
  { "firstname":"Roman" },
  { "firstname":"Roman" }
},
"highlighting":{
  "0f278cb5-7150-42f9-8dca-81bfa68a9c6e":{
    "firstname":["<em>Roman</em>"],
    "content":["A quick <em>response</em> is the best"]
  },
  "105c6464-0350-4873-9936-b46c39c88647":{
    "firstname":["<em>Roman</em>"],
    "content":["<em>Responsive</em> is important"]
  }
}
So I get "content" highlighted even for the element where (+access:contentAuthorized AND +content:r*) is false.
Does anyone have an idea how I could do conditional highlighting with Solr?
Thank you for reading this and for taking the time to think about it :D
If you want highlighting to be applied to certain fields only, you need to set the hl.fl query parameter to those fields; in your case, hl.fl=content. You should then also set hl.requireFieldMatch=true.
Refer to Solr Highlighting documentation:
By default, false, all query terms will be highlighted for each field to be highlighted (hl.fl) no matter what fields the parsed query refer to. If set to true, only query terms aligning with the field being highlighted will in turn be highlighted.
For further info on how to use the query parameters: https://solr.apache.org/guide/8_6/highlighting.html
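Put together, the query parameters might look like this; a sketch using the field names from the question, not tested against your schema:

```
q=(firstname:r* OR (+access:contentAuthorized AND +content:r*))&hl=true&hl.fl=content&hl.requireFieldMatch=true
```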
I am using Node-RED to communicate with Cloudant, and each time my flow runs I might have a different number of IDs coming in msg.payload. Later I want to use these IDs to display all the relevant objects. Is it possible to search for multiple IDs in some way? Or do you have any other solution? I can't find anything about this online at the moment.
It looks like Node-RED supports querying by _id, a search index, or all documents. When you use _id, there does not seem to be a way to specify more than one ID. You can, however, use a search index to query for multiple IDs.
Create a search index in Cloudant similar to the following:
{
"_id": "_design/allDocSearch",
"views": {},
"language": "javascript",
"indexes": {
"byId": {
"analyzer": "standard",
"index": "function (doc) {\n index(\"id\", doc._id);\n}"
}
}
}
This corresponds to the following when using the Cloudant dashboard:
design doc = allDocSearch
index name = byId
index function =
function (doc) {
index("id", doc._id);
}
To search for multiple IDs your query would look something like this:
id:"1" OR id:"2"
In Node-RED, set up your Cloudant node to point to the appropriate database, specify a "Search by" of search index, and configure your design document and index name (in this case, allDocSearch/byId).
You can test with a simple inject node with a payload similar to the search query above: id:"1" OR id:"2"
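To build that query string from a variable-length array of IDs, a Node-RED function node placed before the Cloudant node could do something like this; a minimal sketch assuming msg.payload arrives as an array of ID strings:

```javascript
// Turn an array of document IDs into a Lucene-style OR query
// for the "byId" search index, e.g. ["1", "2"] -> id:"1" OR id:"2"
function buildIdQuery(ids) {
  return ids.map(function (id) {
    return 'id:"' + id + '"';
  }).join(" OR ");
}

var msg = { payload: ["1", "2", "3"] };
msg.payload = buildIdQuery(msg.payload);
console.log(msg.payload); // id:"1" OR id:"2" OR id:"3"
```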
I am using the Cloudant database and would like to retrieve all the documents in the db that match specific fields.
I mean, I want to get only those documents whose fields have the specific values I supply.
Could you please help me with an example of code I can test?
Thanks in advance
A good, but quite general question.
You can achieve this in several ways. The most canonical CouchDB way would be to create a map-reduce view (secondary index) keyed on the field you wish to be able to query on, for example:
function (doc) {
if (doc && doc.surname) {
emit(doc.surname, 1);
}
}
You can create such views using the Cloudant dashboard.
You can now query this for all documents with a particular surname:
curl 'https://ACCOUNT.cloudant.com/examples/_design/example/_view/by_surname?limit=100&reduce=false&include_docs=true&startkey="kruger"&endkey="kruger0"'
If you want to be able to query on a combination of fields, you can create a vector-valued key:
function (doc) {
if (doc && doc.surname && doc.firstname) {
emit([doc.surname, doc.firstname], 1);
}
}
which is queried as this:
curl 'https://ACCOUNT.cloudant.com/examples/_design/example/_view/by_name?limit=100&reduce=false&include_docs=true&startkey=\["kruger", "stefan"\]&endkey=\["kruger","stefan0"\]'
If you're new to Cloudant, another way to query is by using the aptly named Cloudant Query (a.k.a. Mango), which is a json-based declarative query language.
It is well documented (https://console.bluemix.net/docs/services/Cloudant/api/cloudant_query.html), but the gist of it is queries of the type:
{
"selector": {
"year": {
"$gt": 2010
}
},
"fields": ["_id", "_rev", "year", "title"],
"sort": [{"year": "asc"}],
"limit": 10,
"skip": 0
}
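Such a query is POSTed to the database's _find endpoint; a sketch, with ACCOUNT and the examples database as placeholders:

```
curl 'https://ACCOUNT.cloudant.com/examples/_find' \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"selector": {"year": {"$gt": 2010}}, "fields": ["_id", "year", "title"], "limit": 10}'
```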
What seemed a simple task turned out to be a challenge for me.
I have the following mongodb structure:
{
(...)
"services": {
"TCP80": {
"data": [{
"status": 1,
"delay": 3.87,
"ts": 1308056460
},{
"status": 1,
"delay": 2.83,
"ts": 1308058080
},{
"status": 1,
"delay": 5.77,
"ts": 1308060720
}]
}
}}
Now, the following query returns whole document:
{ 'services.TCP80.data.ts':{$gt:1308067020} }
I wonder: is it possible to receive only those "data" array entries matching the $gt criteria (a kind of shrunken doc)?
I was considering MapReduce, but could not locate a single example of how to pass external arguments (the timestamp) to the Map() function. (This feature was added in 1.1.4: https://jira.mongodb.org/browse/SERVER-401)
There's also the alternative of writing a stored JS function, but since we're speaking of large quantities of data, DB locks can't be tolerated here.
Most likely I'll have to redesign the structure to something 1-level deep, like:
{
status:1,delay:3.87,ts:1308056460,service:TCP80
},{
status:1,delay:2.83,ts:1308058080,service:TCP80
},{
status:1,delay:5.77,ts:1308060720,service:TCP80
}
but the DB will grow dramatically, since "service" is only one of many options that will be appended to each document.
Please advise!
Thanks in advance
In version 2.1 with the aggregation framework you are now able to do this:
1: db.test.aggregate(
2: {$match : {}},
3: {$unwind: "$services.TCP80.data"},
4: {$match: {"services.TCP80.data.ts": {$gte: 1308060720}}}
5: );
You can use custom criteria in line 2 to filter the parent documents. If you don't want to filter them, just leave line 2 out.
This is not currently supported. By default you will always receive the whole document/array unless you use field restrictions or the $slice operator. Currently these tools do not allow filtering the array elements based on the search criteria.
You should watch this request for a way to do this: https://jira.mongodb.org/browse/SERVER-828
I'm attempting to do something similar. I tried your suggestion of using the GROUP function, but I couldn't keep the embedded documents separate or was doing something incorrectly.
I needed to pull/get a subset of embedded documents by ID. Here's how I did it using Map/Reduce:
db.parent.mapReduce(
function(parent_id, child_ids){
if(this._id == parent_id)
emit(this._id, {children: this.children, ids: child_ids})
},
function(key, values){
var toReturn = [];
values[0].children.forEach(function(child){
if(values[0].ids.indexOf(child._id.toString()) != -1)
toReturn.push(child);
});
return {children: toReturn};
},
{
mapparams: [
"4d93b112c68c993eae000001", //example parent id
["4d97963ec68c99528d000007", "4debbfd5c68c991bba000014"] //example embedded children ids
]
}
).find()
I've abstracted my collection name to 'parent' and it's embedded documents to 'children'. I pass in two parameters: The parent document ID and an array of the embedded document IDs that I want to retrieve from the parent. Those parameters are passed in as the third parameter to the mapReduce function.
In the map function I find the parent document in the collection (which I'm pretty sure uses the _id index) and emit its id and children to the reduce function.
In the reduce function, I take the passed-in document and loop through each of the children, collecting the ones with the desired IDs. Looping through all the children is not ideal, but I don't know of another way to find an embedded document by ID.
I also assume in the reduce function that only one document is emitted, since I'm searching by ID. If you expect more than one parent_id to match, then you will have to loop through the values array in the reduce function.
I hope this helps someone out there, as I googled everywhere with no results. Hopefully we'll see a built in feature soon from MongoDB, but until then I have to use this.
Fadi, as for "keeping embedded documents separate": group should handle this with no issues.
function getServiceData(collection, criteria) {
var res=db[collection].group({
cond: criteria,
initial: {vals:[],globalVar:0},
reduce: function(doc, out) {
if (out.globalVar%2==0)
out.vals.push(doc.whatever.kind.and.depth);
out.globalVar++;
},
finalize: function(out) {
if (out.vals.length==0)
out.vals='sorry, no data';
return out.vals;
}
});
return res[0];
};