A way to bring facet results together based on common id - arrays

I'm doing a MongoDB aggregation with two facets. Each facet is a different operation performed on the same collection. Each facet's results have two fields per object: the id and the operation result. I want to combine each facet's results based on the common id.
The desired result is like this:
[
{
"id":"1",
"bind":"xxx",
"pres":"xxx"
},
{
"id":"2",
......
}
]
I would like unfound areas to be zero or not be included if that is supported.
I've started with
const combined_agg = [
{
"$facet":{
"bind":opp_bind,
"pres":opp_pres,
}
}
];
Where the two opp are the variables for the two operations. The above gives me:
[
{
"bind":
[
{"binding":6,"id":"xxxx"},
....
],
"pres":
[
{"presenting":4,"id":"xxxx"},
....
]
}
]
From here, I am running into trouble.
I have tried to concatenate the arrays with
{
"$project":{"result":{"$concatArrays":["$bind","$pres"]}}
}
which gives me one object with one large array. I tried to $unwind that large array so that the objects are at the root, but $unwind only gives me the first 20 items of the array.
I tried using $group within the result array, but that gives me an id field with an array of all the ids and two other fields with arrays of their values.
{
"$group":{
"_id":"$result.id",
"fields":{
"$push":{"bind":"$result.bind","pres":"$result.pres"}
}
}
}
I don't know how to separate them out so I can recombine them. I also saw some somewhat similar problems using map but I couldn't wrap my head around it.

I was able to figure out how to do it. I used lookup with a pipeline to get the right format.
Lookup added the result to every object of the original query. Then I used project and filter to find the correct value from the second query. Then I used addFields and arrayElementAt to get the value I wanted along with another project to get only the values I needed. It wasn't very pretty though.
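For comparison, the same merge can be done in application code once the $facet stage has returned. This is only a sketch, not the $lookup pipeline described above: mergeFacetsById is a hypothetical helper, and it assumes the facet output has the shape shown in the question (a "bind" array with a "binding" value and a "pres" array with a "presenting" value). Missing values default to zero.

```javascript
// Merge the two facet arrays into one array keyed by the shared id.
// `facetResult` is assumed to be the single document returned by $facet:
// { bind: [{ id, binding }], pres: [{ id, presenting }] }.
function mergeFacetsById(facetResult) {
  const byId = new Map();

  // Return the merged entry for an id, creating it with zero defaults.
  const entryFor = (id) => {
    if (!byId.has(id)) {
      byId.set(id, { id, bind: 0, pres: 0 });
    }
    return byId.get(id);
  };

  for (const row of facetResult.bind) {
    entryFor(row.id).bind = row.binding;
  }
  for (const row of facetResult.pres) {
    entryFor(row.id).pres = row.presenting;
  }
  return Array.from(byId.values());
}

// Made-up sample data in the shape shown in the question.
const sampleFacetResult = {
  bind: [{ binding: 6, id: "1" }, { binding: 2, id: "2" }],
  pres: [{ presenting: 4, id: "1" }, { presenting: 9, id: "3" }],
};
console.log(mergeFacetsById(sampleFacetResult));
```

This keeps the aggregation itself simple at the cost of doing the join on the client, which is reasonable only when both facet arrays fit comfortably in memory.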

Saving conditions whether a user has done something in MongoDB: which field type should I use?

So I have this scenario where I want to save a condition for whether a user "has joined the website onboarding tour". The question I want to ask is not how to save the field to MongoDB, but rather what field type I should use for it.
I want it to be reusable for another certain condition such as "has joined a campaign A", or "has visited special page B". Now I have 3 cases that you can look:
Case 1: Just a single boolean field for every case: One field for one condition, such as hasJoinedNewOnboarding: true or hasJoinedCampaignA: false. All I need to do is search by key: true.
Case 2: Use an array: One array for many conditions, such as ['hasJoinedNewOnboarding','hasJoinedCampaignA']. Let's say the field name is meta_data. All I need to do is search using $elemMatch like { meta_data: { $elemMatch: { $eq: 'hasJoinedNewOnboarding' } } }
Case 3: Use an object: One object for many conditions, such as { hasJoinedNewOnboarding: true, hasJoinedCampaignA: false }. Let's say the field name is meta_data. All I need to do is search it like { 'meta_data.hasJoinedNewOnboarding': true }
With that said, which one do you think is the best way to store the conditions in the database? Or do you have something in mind that is better than these 3 cases?
Thanks
There's not much difference when you want to query your data. It will always be either:
db.col.find({hasJoinedNewOnboarding: true})
or for the second approach:
db.col.find({arrayName: "hasJoinedNewOnboarding"})
Both ways are easy. However, I would recommend storing such events in an array, because it's easier to aggregate the data when you don't need to refer to multiple key names in MongoDB.
For example, if you have a document like:
{
events: [
"hasJoinedNewOnboarding",
"hasJoinedCampaignA"
]
}
You can dynamically count how many users have done something by running the following query:
db.collection.aggregate([
{
$unwind: "$events"
},
{
$group: {
_id: "$events",
count: { $sum: 1 }
}
}
])
Alternatively, if you decide to use the first or third approach, the name of the event is represented by the name of the key in the MongoDB document. You can still easily count occurrences of a single event, but if you want to group all events dynamically you need the $objectToArray operator, which is more cumbersome.
So the recommended approach would be to keep them as an array of strings or an array of objects like:
{ events: [ { eventType: "NewOnboarding", date: ... } ] }
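To make the "more cumbersome" point concrete, here is a sketch of what the dynamic count might look like for the object shape from Case 3. buildFlagCountPipeline is a hypothetical helper that only builds the pipeline; the field name meta_data comes from the question, and the sketch assumes a server version that provides $objectToArray and that the flags are stored as booleans.

```javascript
// Build an aggregation pipeline that counts, per flag name, how many
// documents have that flag set to true inside an object-valued field.
// `flagsField` is the name of that field (e.g. "meta_data").
function buildFlagCountPipeline(flagsField) {
  return [
    // Turn { flagA: true, flagB: false } into [{ k: "flagA", v: true }, ...].
    { $project: { flags: { $objectToArray: "$" + flagsField } } },
    // One document per key/value pair.
    { $unwind: "$flags" },
    // Keep only the flags that are set.
    { $match: { "flags.v": true } },
    // Count documents per flag name.
    { $group: { _id: "$flags.k", count: { $sum: 1 } } },
  ];
}

// Usage (mongo shell or driver): db.collection.aggregate(pipeline)
const pipeline = buildFlagCountPipeline("meta_data");
console.log(JSON.stringify(pipeline, null, 2));
```

Compared with the array version, this needs two extra stages just to recover the event names from the keys.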

Mongodb query to find element value type of nested array or object in a field

I have a MongoDB data model where some array fields contain embedded objects or arrays. The field in question has some inconsistencies because I've tweaked my application logic. Initially, my model looked like this:
Initial Setup of Results collection
"competition" : "competition1",
"stats" : [
{
"stat1" : [],
"stat2" : []
}
]
However, I saw that this wasn't the best setup for my needs. So I changed it to the following:
New Setup of Results collection
"competition" : "competition1",
"stats" : [
{
"stat1" : 3,
"stat2" : 2
}
]
My problem now is that documents that have the initial setup cause an error. So what I want is to find all documents that have the initial setup and convert them to have the new setup.
How can I accomplish this in mongodb?
Here is what I've tried, but I'm stuck...
db.getCollection('results').find({"stats.0": { "$exists": true }})
But what I want is to be able to do something like
db.getCollection('results').find({"stats.0".stat1: { "$type": Array}})
Basically I want to get documents where the value of stats[0].stat1 is of type array and override the entire stats field to be an empty array.
This would fix the errors I'm getting.
The $type operator works a little differently for arrays in versions before 3.6 than it does in 3.6.
This will work in 3.6
db.getCollection('results').find( { "stats.0.stat1" : { $type: "array" } } )
For lower versions you can do it in a couple of ways, depending on what you are looking for.
For empty arrays you can just check
{"stats.0.stat1":{$size:0}}
For non empty arrays
{"stats.0.stat1": {$elemMatch:{ "$exists": true }}}
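Combine both using $or to find both empty and non-empty arrays.
For your use case you can use the update below. The multi option makes it apply to every matching document rather than only the first.
db.getCollection('results').update({"stats.0.stat1":{$size:0}}, {$set:{"stats":[]}}, {multi:true})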
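The $or combination mentioned above could be written out as follows. This is a sketch only: buildLegacyArrayFilter is a hypothetical helper, and it simply joins the two conditions from this answer, so it inherits whatever behaviour those conditions have on your server version.

```javascript
// Build a filter that matches documents where the value at `path` is an
// array, whether empty or not, using the two pre-3.6 checks from the answer.
function buildLegacyArrayFilter(path) {
  return {
    $or: [
      // Empty array.
      { [path]: { $size: 0 } },
      // Array with at least one element.
      { [path]: { $elemMatch: { $exists: true } } },
    ],
  };
}

const legacyFilter = buildLegacyArrayFilter("stats.0.stat1");
console.log(JSON.stringify(legacyFilter));

// Usage in the mongo shell (multi: true updates every matching document):
// db.getCollection('results').update(legacyFilter, { $set: { stats: [] } }, { multi: true })
```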

Return only one element from strings array in elasticsearch

I have an array of strings in one field, "strArray":
strArray: ['browser:IE', 'device:PC', 'country:USA', 'state:CA']
I need to aggregate by browser (or device, country, or state). That is not a problem if I know the order of these values in the strArray field.
I could use this structure:
"aggs": {
"deviceAggs": {
"terms": {
"script": "doc['strArray'][1]"
}
}
}
But the problem is that the order in which these strings are inserted can vary.
How can I do this? I have thought of several ways:
Scripting - use a function such as substring and get only the "correct" values.
Filtering - filter out the one value (the one containing the string "device:") from the array.
Sorting - sort the strArray values into a definite order, but "sort" gives me a strange result: it returns only one element (without any filtering).
Don't ask me why I have this structure (it was not my choice); if we had a key: value structure, we would not have this problem.
Scripting is the only approach that works directly here.
To get an idea of how to use scripting in aggregations, you can refer to this blog.
Something like the script below should work:
for(element in doc['strArray'].values){
if(element.startsWith('browser')){
return element;
}
};
return null;
Both sorting and filtering are done at the document level, not the element level.
At the element level, filtering is possible if you make this array nested. To do that, you first need to change the structure to:
strArray: [
{ "name" : 'browser:IE' } ,
{ "name" : 'device:PC' }
]
And then map the strArray field as nested.
In that case you can apply a nested filter based on a prefix query (using a query filter) and then run a nested aggregation on the data.
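The selection logic of the script earlier in this answer does not depend on the position of the values. The sketch below restates it in plain JavaScript so it can be tried outside Elasticsearch; pickByPrefix is a hypothetical helper, not an Elasticsearch API.

```javascript
// Return the first element that starts with `prefix`, or null if none does.
// The order of `values` does not matter, which is the point of the approach.
function pickByPrefix(values, prefix) {
  for (const element of values) {
    if (element.startsWith(prefix)) {
      return element;
    }
  }
  return null;
}

// Same values as in the question, deliberately shuffled.
const strArray = ["state:CA", "device:PC", "browser:IE", "country:USA"];
console.log(pickByPrefix(strArray, "device:"));
console.log(pickByPrefix(strArray, "os:"));
```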

Using CouchDB-lucene how can I index an array of objects (not values)

Hello everyone and thanks in advance for any ideas, suggestions or answers.
First, the environment: I am using CouchDB (currently developing on 1.0.2) and couchdb-lucene 0.7. Obviously, I am using couchdb-lucene ("c-l" hereafter) to provide full-text searching within couchdb.
Second, let me provide everyone with an example couchdb document:
{
"_id": "5580c781345e4c65b0e75a220232acf5",
"_rev": "2-bf2921c3173163a18dc1797d9a0c8364",
"$type": "resource",
"$versionids": [
"5580c781345e4c65b0e75a220232acf5-0",
"5580c781345e4c65b0e75a220232acf5-1"
],
"$usagerights": [
{
"group-administrators": 31
},
{
"group-users": 3
}
],
"$currentversionid": "5580c781345e4c65b0e75a220232acf5-1",
"$tags": [
"Tag1",
"Tag2"
],
"$created": "/Date(1314973405895-0500)/",
"$creator": "administrator",
"$modified": "/Date(1314973405895-0500)/",
"$modifier": "administrator",
"$checkedoutat": "/Date(1314975155766-0500)/",
"$checkedoutto": "administrator",
"$lastcommit": "/Date(1314973405895-0500)/",
"$lastcommitter": "administrator",
"$title": "Test resource"
}
Third, let me explain what I want to do. I am trying to figure out how to index the '$usagerights' property. I am using the word index very loosely because I really do not care about being able to search it, I simply want to 'store' it so that it is returned with the search results. Anyway, the property is an array of json objects. Now, these json objects that compose the array will always have a single json property.
Based on my understanding of couchdb-lucene, I need to reduce this array to a comma separated string. I would expect something like "group-administrators:31,group-users:3" to be a final output.
Thus, my question is essentially: How can I reduce the $usagerights json array above to a comma separated string of key:value pairs within the couchdb design document as used by couchdb-lucene?
A previous question I posted regarding indexing of tagging in a similar situation, provided for reference: How-to index arrays (tags) in CouchDB using couchdb-lucene
Finally, if you need any additional details, please just post a comment and I will provide it.
Maybe I am missing something, but the only difference I see from your previous question, is that you should iterate on the objects. Then the code should be:
function(doc) {
var result = new Document(), usage, right;
for(var i in doc.$usagerights) {
usage = doc.$usagerights[i];
for(right in usage) {
result.add(right + ":" + usage[right]);
}
}
return result;
}
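If a single comma-separated string is really what is wanted, as described in the question, the same iteration can build it. This is a plain JavaScript sketch that can be run outside CouchDB; usageRightsToString is a hypothetical helper, written in older-style syntax on the assumption that the design-document engine may not support newer language features.

```javascript
// Flatten an array of single-property objects into "key:value,key:value".
function usageRightsToString(usageRights) {
  var pairs = [];
  var list = usageRights || []; // tolerate documents without the property
  for (var i = 0; i < list.length; i++) {
    var usage = list[i];
    for (var right in usage) {
      if (Object.prototype.hasOwnProperty.call(usage, right)) {
        pairs.push(right + ":" + usage[right]);
      }
    }
  }
  return pairs.join(",");
}

// The $usagerights value from the example document in the question.
console.log(usageRightsToString([
  { "group-administrators": 31 },
  { "group-users": 3 }
]));
// prints "group-administrators:31,group-users:3"
```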
There's no requirement to convert to a comma-separated list of values (I'd be intrigued to know where you picked up that idea).
If you simply want the $usagerights item returned with your results, do this;
ret.add(JSON.stringify(doc.$usagerights),
{"index":"no", "store":"yes", "field":"usagerights"});
Lucene stores strings, not JSON, so you'll need to JSON.parse the string on query.

mongodb - retrieve array subset

What seemed a simple task turned out to be a challenge for me.
I have the following mongodb structure:
{
(...)
"services": {
"TCP80": {
"data": [{
"status": 1,
"delay": 3.87,
"ts": 1308056460
},{
"status": 1,
"delay": 2.83,
"ts": 1308058080
},{
"status": 1,
"delay": 5.77,
"ts": 1308060720
}]
}
}}
Now, the following query returns the whole document:
{ 'services.TCP80.data.ts':{$gt:1308067020} }
I wonder: is it possible to receive only those "data" array entries matching the $gt criteria (a kind of shrunken document)?
I was considering MapReduce, but could not locate even a single example on how to pass external arguments (timestamp) to Map() function. (This feature was added in 1.1.4 https://jira.mongodb.org/browse/SERVER-401)
Also, there's always an alternative to write storedJs function, but since we speak of large quantities of data, db-locks can't be tolerated here.
Most likely I'll have to redesign the structure to something 1-level deep, like:
{
status:1,delay:3.87,ts:1308056460,service:TCP80
},{
status:1,delay:2.83,ts:1308058080,service:TCP80
},{
status:1,delay:5.77,ts:1308060720,service:TCP80
}
but the DB will grow dramatically, since "service" is only one of many options that would be appended to each document.
Please advise!
Thanks in advance.
In version 2.1 with the aggregation framework you are now able to do this:
1: db.test.aggregate(
2: {$match : {}},
3: {$unwind: "$services.TCP80.data"},
4: {$match: {"services.TCP80.data.ts": {$gte: 1308060720}}}
5: );
You can use a custom criteria in line 2 to filter the parent documents. If you don't want to filter them, just leave line 2 out.
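On later servers (3.2 and newer, to my knowledge) the $filter operator can trim the array in place, so each parent document comes back once instead of once per array element. The sketch below only builds such a pipeline: buildRecentDataPipeline is a hypothetical helper and the field path is taken from the question.

```javascript
// Build a pipeline that returns each matching parent document with its
// services.TCP80.data array reduced to entries whose ts is greater than minTs.
function buildRecentDataPipeline(minTs) {
  const path = "services.TCP80.data";
  return [
    // Skip parents that have no matching entry at all.
    { $match: { [path + ".ts"]: { $gt: minTs } } },
    // Keep only the matching array elements.
    {
      $project: {
        [path]: {
          $filter: {
            input: "$" + path,
            as: "entry",
            cond: { $gt: ["$$entry.ts", minTs] },
          },
        },
      },
    },
  ];
}

// Usage (mongo shell or driver): db.test.aggregate(recentPipeline)
const recentPipeline = buildRecentDataPipeline(1308058000);
console.log(JSON.stringify(recentPipeline, null, 2));
```

The $unwind approach is still the better fit when one output document per array entry is what you want.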
This is not currently supported. By default you will always receive the whole document/array unless you use field restrictions or the $slice operator. Currently these tools do not allow filtering the array elements based on the search criteria.
You should watch this request for a way to do this: https://jira.mongodb.org/browse/SERVER-828
I'm attempting to do something similar. I tried your suggestion of using the GROUP function, but I couldn't keep the embedded documents separate or was doing something incorrectly.
I needed to pull/get a subset of embedded documents by ID. Here's how I did it using Map/Reduce:
db.parent.mapReduce(
function(parent_id, child_ids){
if(this._id == parent_id)
emit(this._id, {children: this.children, ids: child_ids})
},
function(key, values){
var toReturn = [];
values[0].children.forEach(function(child){
if(values[0].ids.indexOf(child._id.toString()) != -1)
toReturn.push(child);
});
return {children: toReturn};
},
{
mapparams: [
"4d93b112c68c993eae000001", //example parent id
["4d97963ec68c99528d000007", "4debbfd5c68c991bba000014"] //example embedded children ids
]
}
).find()
I've abstracted my collection name to 'parent' and its embedded documents to 'children'. I pass in two parameters: the parent document ID and an array of the embedded document IDs that I want to retrieve from the parent. Those parameters are passed in as the third parameter to the mapReduce function.
In the map function I find the parent document in the collection (which I'm pretty sure uses the _id index) and emit its id and children to the reduce function.
In the reduce function, I take the passed in document and loop through each of the children, collecting the ones with the desired ID. Looping through all the children is not ideal, but I don't know of another way to find by ID on an embedded document.
I also assume in the reduce function that there is only one document emitted, since I'm searching by ID. If you expect more than one parent_id to match, then you will have to loop through the values array in the reduce function.
I hope this helps someone out there, as I googled everywhere with no results. Hopefully we'll see a built in feature soon from MongoDB, but until then I have to use this.
Fadi, as for "keeping embedded documents separate" - group should handle this with no issues
function getServiceData(collection, criteria) {
var res=db[collection].group({
cond: criteria,
initial: {vals:[],globalVar:0},
reduce: function(doc, out) {
if (out.globalVar%2==0)
out.vals.push(doc.whatever.kind.and.depth);
out.globalVar++;
},
finalize: function(out) {
if (out.vals.length==0)
out.vals='sorry, no data';
return out.vals;
}
});
return res[0];
};
