Find documents whose array field contains some subsets in MongoDB

The "users" collection has documents with an array field.
Example documents:
{
"_id" :1001,
"properties" : ["A", "B", "C", "D", "E", "F", "G", "H", "I"]
}
{
"_id" : 1002,
"properties" : ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]
}
How can I build a query that returns the documents satisfying the following condition?
Get only the documents that have the properties:
[ "3" AND ("A" OR "1") AND ("B" OR "2") ]
or, put another way:
"3" AND "A" AND "B"
OR
"3" AND "A" AND "2"
OR
"3" AND "1" AND "B"
OR
"3" AND "1" AND "2"
For the example above, the query should return only this document:
{
"_id" : 1002,
"properties" : ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]
}
The collection has 4 million documents, and the "properties" array has an average length of 15 elements. The query I am looking for must perform well on a collection of this size.

Stephan's answer is OK. Here are other ways to achieve the result, using the $in and $all operators:
db.users.find(
{
$and:[
{"properties":"3"},
{"properties" : {$in: ["A", "1"]}},
{"properties" : {$in: ["B", "2"]}}
]
}
);
(translation of your first description of the subset)
And
db.users.find(
{
$or: [
{"properties" : {$all: ["3", "A", "B"]}},
{"properties" : {$all: ["3", "A", "2"]}},
{"properties" : {$all: ["3", "1", "B"]}},
{"properties" : {$all: ["3", "1", "2"]}}
]
}
);
(translation of your second description of the subset)
I'm afraid I can't tell which one will perform best. I hope that you have an index on properties.
You can try the queries on a smaller collection with explain() to see the execution plan.
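To sanity-check that the two query shapes select the same documents, the condition can be mirrored in plain JavaScript outside MongoDB. This is only a sketch: matchesAndOfIn and matchesOrOfAll are hypothetical helper names, and the array methods merely stand in for $in and $all; they say nothing about how the server evaluates or indexes the query.

```javascript
// Sample documents from the question.
const users = [
  { _id: 1001, properties: ["A", "B", "C", "D", "E", "F", "G", "H", "I"] },
  { _id: 1002, properties: ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"] }
];

// Mirrors the $and / $in form: "3" AND ("A" OR "1") AND ("B" OR "2").
function matchesAndOfIn(doc) {
  const has = (v) => doc.properties.includes(v);
  return has("3") && ["A", "1"].some(has) && ["B", "2"].some(has);
}

// Mirrors the $or / $all form: any one of the four full combinations.
function matchesOrOfAll(doc) {
  const combos = [
    ["3", "A", "B"], ["3", "A", "2"], ["3", "1", "B"], ["3", "1", "2"]
  ];
  return combos.some((combo) => combo.every((v) => doc.properties.includes(v)));
}

const idsAndOfIn = users.filter(matchesAndOfIn).map((d) => d._id);
const idsOrOfAll = users.filter(matchesOrOfAll).map((d) => d._id);
console.log(idsAndOfIn, idsOrOfAll);
```

For the sample data both lists contain only 1002, which is the document the question expects.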

Try this:
db.users.find(
{
$or: [
{$and: [{ "properties": "3" }, { "properties": "A" }, { "properties": "B" }]},
{$and: [{ "properties": "3" }, { "properties": "A" }, { "properties": "2" }]},
{$and: [{ "properties": "3" }, { "properties": "1" }, { "properties": "B" }]},
{$and: [{ "properties": "3" }, { "properties": "1" }, { "properties": "2" }]}
]
}
);
or
db.users.find(
{
$and: [
{"properties": "3" },
{$or: [ { "properties": "A" }, { "properties": "1" } ]},
{$or: [ { "properties": "B" }, { "properties": "2" } ]}
]
}
);
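Writing out every combination by hand does not scale when there are more alternatives. As a sketch, the compact $and form can be generated from a list of alternative groups. The names buildSubsetQuery and groups are hypothetical, and the function only builds the filter object; it does not run it.

```javascript
// Build { $and: [...] } from groups of alternatives.
// Each group means "at least one of these values must be in the array".
function buildSubsetQuery(field, groups) {
  return {
    $and: groups.map((alternatives) =>
      alternatives.length === 1
        ? { [field]: alternatives[0] }        // single required value
        : { [field]: { $in: alternatives } }  // any one of several values
    )
  };
}

const query = buildSubsetQuery("properties", [["3"], ["A", "1"], ["B", "2"]]);
console.log(JSON.stringify(query));
```

The resulting object has the same shape as the $and / $in query shown earlier and could be passed to db.users.find(query) in the shell.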

Related

How do I merge arrays from multiple documents without duplicates in MongoDB aggregation?

I have 3 documents:
{
"id": 1,
"user": "Brian1",
"configs": [
"a",
"b",
"c",
"d"
]
}
----
{
"id": 2,
"user": "max_en",
"configs": [
"a",
"h",
"i",
"j"
]
}
----
{
"id": 3,
"user": "userX",
"configs": [
"t",
"u",
"s",
"b"
]
}
I want to merge all the "configs" arrays into one array without duplicates, like this:
{
"configs": [
"a",
"b",
"c",
"d",
"h",
"i",
"j",
"t",
"u",
"s"
]
}
I've tried the following:
Aggregation.group("").addToSet("configs").as("configs") and { _id: "", 'configs': { $addToSet: '$configs' } }
The first one gives an error because I've left the fieldname empty (I don't know what to put there).
The second one returns a merged array but with duplicates.
When you want to group all the documents together, use _id: null in the $group stage.
That puts every document into a single group.
You probably need this:
db.collection.aggregate([
{
"$unwind": "$configs"
},
{
$group: {
_id: null,
configs: {
"$addToSet": "$configs"
}
}
}
])
But be cautious about running this on a large collection without a preceding $match stage.
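The effect of the two stages can be illustrated in plain JavaScript. This is a sketch outside MongoDB, with flatMap standing in for $unwind and a Set standing in for $addToSet; the variable names are made up for the illustration. It also hints at why the attempt without $unwind did not deduplicate: $addToSet then adds each whole array as a single element rather than its individual values.

```javascript
// Documents from the question.
const docs = [
  { id: 1, user: "Brian1", configs: ["a", "b", "c", "d"] },
  { id: 2, user: "max_en", configs: ["a", "h", "i", "j"] },
  { id: 3, user: "userX", configs: ["t", "u", "s", "b"] }
];

// flatMap plays the role of $unwind: one entry per array element.
const unwound = docs.flatMap((doc) => doc.configs);

// A Set plays the role of $addToSet: repeated values are dropped.
const merged = { configs: [...new Set(unwound)] };

console.log(merged.configs);
```

A JavaScript Set keeps insertion order, whereas the order of the array produced by $addToSet is not specified, so sort the result if the order matters.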

jq - remove duplicate entries within all arrays inside the JSON file

I have the following JSON file.
[
{
"name": "first",
"Arrays": {
"dddd0001": [
"A",
"A",
"B",
"B",
"C",
"C",
"C",
"C",
"D",
"E",
"F"
]
}
},
{
"name": "second",
"Arrays": {
"dddd0002": [
"AA",
"AA",
"BA",
"BB",
"CC",
"CC",
"CC",
"CC",
"DD",
"DD",
"FF"
]
}
},
{
"name": "third",
"Arrays": {
"dddd0003": [
"1",
"1",
"2",
"3",
"3",
"4",
"4",
"4",
"0",
"0",
"0"
]
}
}
]
I need to remove duplicates inside every array in the JSON file, so the result should look like the following:
[
{
"name": "first",
"Arrays": {
"dddd0001": [
"A",
"B",
"C",
"D",
"E",
"F"
]
}
},
{
"name": "second",
"Arrays": {
"dddd0002": [
"AA",
"BA",
"BB",
"CC",
"DD",
"FF"
]
}
},
{
"name": "third",
"Arrays": {
"dddd0003": [
"1",
"2",
"3",
"4",
"0"
]
}
}
]
Array key names are not known in advance. There might be multiple arrays inside the Arrays object.
I tried to use unique_by but it requires the key name.
This algorithm - search for every array inside the Arrays object, for every such array apply unique function, re-assign results back to the array - should be fairly easy to implement, but I am stuck.
Thanks.
walk( if type == "array" then unique else . end)
If the original order should be preserved, you can use "def uniques" as defined in the answers to "How do I get jq to return unique results when json has multiple identical entries?"
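For comparison, the same idea (visit every value recursively and de-duplicate each array while keeping first-occurrence order) can be sketched in JavaScript. walkUnique and uniqueInOrder are hypothetical names, not jq built-ins, and the Set-based de-duplication only treats primitives such as strings and numbers as equal; nested objects are compared by identity, unlike jq's unique.

```javascript
// Order-preserving de-duplication: keep the first occurrence of each value.
function uniqueInOrder(arr) {
  return [...new Set(arr)];
}

// Recursively visit a JSON value, de-duplicating every array on the way.
function walkUnique(value) {
  if (Array.isArray(value)) {
    return uniqueInOrder(value.map(walkUnique));
  }
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, walkUnique(v)])
    );
  }
  return value; // strings, numbers, booleans, null
}

// The "third" item from the question.
const input = [
  { name: "third", Arrays: { dddd0003: ["1", "1", "2", "3", "3", "4", "4", "4", "0", "0", "0"] } }
];
const output = walkUnique(input);
console.log(JSON.stringify(output));
```

Here "dddd0003" becomes ["1", "2", "3", "4", "0"], which matches the order in the expected result of the question.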
You can use unique and |=:
$ jq '.[].Arrays[] |= unique' file.json
[
{
"name": "first",
"Arrays": {
"dddd0001": [
"A",
"B",
"C",
"D",
"E",
"F"
]
}
},
{
"name": "second",
"Arrays": {
"dddd0002": [
"AA",
"BA",
"BB",
"CC",
"DD",
"FF"
]
}
},
{
"name": "third",
"Arrays": {
"dddd0003": [
"0",
"1",
"2",
"3",
"4"
]
}
}
]
$
The only "problem" is that unique sorts the elements of the array, so, for example, the contents of the "dddd0003" array are not in the same order as in your expected result. I don't know whether this is a problem for you.
If the "Arrays" property can also contain non-array values, extra care is needed to filter those out so that unique doesn't complain.
select(type == "array") can be used (output omitted):
$ jq '(.[].Arrays[] | select(type == "array")) |= unique' file.json
...
or arrays:
$ jq '(.[].Arrays[] | arrays) |= unique' file.json
...
These last two solutions better reflect your algorithm.
var jsonArr = [
{
"name": "first",
"Arrays": {
"dddd0001": [
"A",
"A",
"B",
"B",
"C",
"C",
"C",
"C",
"D",
"E",
"F"
]
}
},
{
"name": "second",
"Arrays": {
"dddd0002": [
"AA",
"AA",
"BA",
"BB",
"CC",
"CC",
"CC",
"CC",
"DD",
"DD",
"FF"
]
}
},
{
"name": "third",
"Arrays": {
"dddd0003": [
"1",
"1",
"2",
"3",
"3",
"4",
"4",
"4",
"0",
"0",
"0"
]
}
}
]
// For each item, de-duplicate every array under "Arrays" (first occurrence wins),
// keeping the key names intact.
for (var i = 0; i < jsonArr.length; i++)
{
var arrays = jsonArr[i].Arrays;
Object.keys(arrays).forEach(function (key) {
if (Array.isArray(arrays[key])) {
arrays[key] = arrays[key].filter((v, p, arr) => arr.indexOf(v) == p);
}
});
console.log(jsonArr[i]);
}

How to filter a collection in Mongodb

Let's say I have three documents in a collection, like so:
[
{"_id": "101", parts: ["a", "b"]},
{"_id": "102", parts: ["a", "c"]},
{"_id": "103", parts: ["a", "z"]},
]
What query do I have to write so that, given the input ["a","b","c"]
(i.e. every item in each document's parts array must be present in ["a","b","c"]), the output is:
[
{"_id": "101", parts: ["a", "b"]},
{"_id": "102", parts: ["a", "c"]}
]
Is this even possible? Any ideas?
The solution below may not be the best, but it works. The idea is to find all documents that have no items in parts outside the input array. This can be done with a combination of $not, $elemMatch and $nin:
db.collection.find({
parts: {
$not: {
"$elemMatch": {
$nin: ["a", "b", "c"]
}
}
}
})
Mongo Playground
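The logic of that filter, "no element of parts lies outside the allowed list", can be sketched in plain JavaScript. partsWithin is a hypothetical helper used only for illustration and runs outside MongoDB.

```javascript
// Documents from the question.
const collection = [
  { _id: "101", parts: ["a", "b"] },
  { _id: "102", parts: ["a", "c"] },
  { _id: "103", parts: ["a", "z"] }
];

// True when no element of doc.parts falls outside `allowed`.
function partsWithin(doc, allowed) {
  const allowedSet = new Set(allowed);
  return doc.parts.every((p) => allowedSet.has(p));
}

const matchedIds = collection
  .filter((doc) => partsWithin(doc, ["a", "b", "c"]))
  .map((doc) => doc._id);
console.log(matchedIds);
```

Note that every() is true for an empty array, so a document with an empty parts array passes this check. It is worth testing how the MongoDB query treats empty or missing parts fields if such documents exist in the collection.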
Thanks to #prasad_. I have tried to come up with a solution which is similar to what I wanted. I have used $setDifference here.
db.collection.aggregate([
{
$project: {
diff: {
$setDifference: [
"$parts",
[
"a",
"b",
"c"
]
]
},
document: "$$ROOT"
}
},
{
$match: {
"diff": {
$eq: []
}
}
},
{
$project: {
"diff": 0
}
},
])
output:
[
{
"_id": "101",
"document": {
"_id": "101",
"parts": [
"a",
"b"
]
}
},
{
"_id": "102",
"document": {
"_id": "102",
"parts": [
"a",
"c"
]
}
}
]
Mongo Playground

Filter for array type of field in loopback

I am using LoopBack v3 with a MongoDB database and implementing a filter on an array-type field.
The inq operator is not working.
I have an array of objects like the one below:
[
{
"name":"name1",
"title": "title1",
"category": ["a", "b","c"]
},
{
"name":"name2",
"title": "title2",
"category": ["x", "y","z"]
},
{
"name":"name3",
"title": "title3",
"category": ["b", "d","e"]
}
]
Now I want the list of items whose category contains "b".
So I am using the filter below:
filter: {where:{category:{inq:["b"]}}}
I think inq doesn't work for this case; it gives an empty response.
Output : [ ]
How can I get my desired output?
Desired output:
[
{
"name":"name1",
"title": "title1",
"category": ["a", "b","c"]
},
{
"name":"name3",
"title": "title3",
"category": ["b", "d","e"]
}
]
Below are my properties:
"properties": {
"name": {
"type": "string"
},
"title": {
"type": "string"
},
"category": {
"type": [
"string"
]
}
},
Please suggest.
Thanks
For me the above scenario works fine. One thing I noticed in your code, though: the array closing brackets should be ] instead of }.
How did you set up your model for this?
"properties": {
"name": {
"type": "string"
},
"title": {
"type": "string"
},
"category": {
"type": [
"string"
]
}
},
Do your model properties look like this?
"role": {
"type": "array",
"default": [
"patient"
]
}
let filter = {role:{in:['doctor']}}
this.find({
where: filter
}, cb);

MongoDB working with indexed array

For example, I have a collection "test" with an index on the array field "numbers", and I have two documents there:
db.test.createIndex({"numbers": 1})
db.test.insert({"title": "A", "numbers": [1,4,9]})
db.test.insert({"title": "B", "numbers": [2,3,7]})
1) How can I get all results sorted by "numbers" (using index), so for each value from an array I get a full document? Like this:
{"_id": "...", "title": "A", "numbers": [1,4,9]}
{"_id": "...", "title": "B", "numbers": [2,3,7]}
{"_id": "...", "title": "B", "numbers": [2,3,7]}
{"_id": "...", "title": "A", "numbers": [1,4,9]}
{"_id": "...", "title": "B", "numbers": [2,3,7]}
{"_id": "...", "title": "A", "numbers": [1,4,9]}
2) How can I get such results (sorry for no explanation, but I think it's clear what I'm trying to achieve here):
{"_id": "...", "title": "A", "numbers": 1}
{"_id": "...", "title": "B", "numbers": 2}
{"_id": "...", "title": "B", "numbers": 3}
{"_id": "...", "title": "A", "numbers": 4}
{"_id": "...", "title": "B", "numbers": 7}
{"_id": "...", "title": "A", "numbers": 9}
3) How can I get similar results, but ordering by the second element in each array?:
{"_id": "...", "title": "B", "numbers": 3}
{"_id": "...", "title": "A", "numbers": 4}
Also I care about the performance, so it'd be great if you explain which technique is faster / slower (if there is more than one way to do it, of course). Thanks.
UPD: Let me clarify. We have an index on the "numbers" array. I want to iterate over this index from the min to the max value and, for each value, get the document to which it belongs. So some documents will appear in the results N times, where N = number of elements in the array ("numbers" in this case).
Simply use the array index in the sort via "dot notation":
db.collection.find().sort({ "numbers.0": 1 })
This is the fastest way if you know the position you want, so just use the "index" (starting at 0, of course). The same applies to any indexed position of the array.
If you want the "smallest" value in an array to sort by, then that takes more work, using .aggregate() to work that out:
db.collection.aggregate([
{ "$unwind": "$numbers" },
{ "$group": {
"_id": "$_id",
"numbers": { "$push": "$numbers" },
"min": { "$min": "$numbers" }
}},
{ "$sort": { "min": 1 } }
])
Naturally, that takes more time to execute than the earlier form because of the extra work. It requires the $unwind to de-normalize the array elements into individual documents, and then the $group with $min to find the smallest value. Then there is the basic $sort you need.
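The difference between the two orderings can be illustrated in plain JavaScript. This is a sketch outside MongoDB: sortByPosition and sortByMin are hypothetical helpers that mimic what the positional sort and the $min pipeline compute, and they say nothing about index usage or speed on the server.

```javascript
// Documents from the question.
const test = [
  { title: "A", numbers: [1, 4, 9] },
  { title: "B", numbers: [2, 3, 7] }
];

// Like sorting on "numbers.<pos>": order by the element at a fixed position.
function sortByPosition(docs, pos) {
  return [...docs].sort((x, y) => x.numbers[pos] - y.numbers[pos]);
}

// Like the $unwind / $group / $min / $sort pipeline: order by the smallest element.
function sortByMin(docs) {
  return [...docs].sort(
    (x, y) => Math.min(...x.numbers) - Math.min(...y.numbers)
  );
}

const bySecond = sortByPosition(test, 1).map((d) => d.title);
const byMin = sortByMin(test).map((d) => d.title);
console.log(bySecond, byMin);
```

Sorting by the second element puts B (3) before A (4), as in part 3 of the question, while sorting by the minimum puts A (1) before B (2).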
For the full thing then you can basically do this:
db.test.aggregate([
{ "$project": {
"title": 1,
"numbers": 1,
"copy": "$numbers"
}},
{ "$unwind": "$copy" },
{ "$group": {
"_id": {
"_id": "$_id",
"number": "$copy"
},
"numbers": { "$first": "$numbers" }
}},
{ "$sort": { "_id.number": 1 } }
])
Which produces:
{
"_id" : {
"_id" : ObjectId("560545d64d64216d6de78edb"),
"number" : 1
},
"numbers" : [ 1, 4, 9 ]
}
{
"_id" : {
"_id" : ObjectId("560545d74d64216d6de78edc"),
"number" : 2
},
"numbers" : [ 2, 3, 7 ]
}
{
"_id" : {
"_id" : ObjectId("560545d74d64216d6de78edc"),
"number" : 3
},
"numbers" : [ 2, 3, 7 ]
}
{
"_id" : {
"_id" : ObjectId("560545d64d64216d6de78edb"),
"number" : 4
},
"numbers" : [ 1, 4, 9 ]
}
{
"_id" : {
"_id" : ObjectId("560545d74d64216d6de78edc"),
"number" : 7
},
"numbers" : [ 2, 3, 7 ]
}
{
"_id" : {
"_id" : ObjectId("560545d64d64216d6de78edb"),
"number" : 9
},
"numbers" : [ 1, 4, 9 ]
}
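What this pipeline computes can be mirrored in plain JavaScript as a rough check of the expected ordering. The sketch below runs outside MongoDB and carries the title along for readability, which the pipeline above does not; the variable names are made up for the illustration.

```javascript
// Documents from the question.
const test = [
  { title: "A", numbers: [1, 4, 9] },
  { title: "B", numbers: [2, 3, 7] }
];

// One output row per (document, array element) pair, like $unwind on a copy
// of the array while the full "numbers" array is kept on each row.
const rows = test.flatMap((doc) =>
  doc.numbers.map((n) => ({ title: doc.title, number: n, numbers: doc.numbers }))
);

// Order the rows by the unwound element, like the final $sort.
rows.sort((x, y) => x.number - y.number);

const order = rows.map((r) => `${r.title}:${r.number}`);
console.log(order);
```

Each document appears once per element of its array, in the order A:1, B:2, B:3, A:4, B:7, A:9, which is the sequence asked for in parts 1 and 2 of the question.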

Resources