I have a simple collection to understand how sort works in MongoDB.
My documents are:
{
"_id" : ObjectId("54b94985d74d670613e4fd35"),
"tag" : [
"A",
"B",
"Z"
]
}
{
"_id" : ObjectId("54b949c9d74d670613e4fd36"),
"tag" : [
"D",
"E",
"F"
]
}
{
"_id" : ObjectId("54b949dfd74d670613e4fd37"),
"tag" : [
"G",
"H",
"I"
]
}
When I sort by tag, I get these results:
db.candy.find().sort({tag:1})
{
"_id" : ObjectId("54b94985d74d670613e4fd35"),
"tag" : [
"A",
"B",
"Z"
]
}
{
"_id" : ObjectId("54b949c9d74d670613e4fd36"),
"tag" : [
"D",
"E",
"F"
]
}
{
"_id" : ObjectId("54b949dfd74d670613e4fd37"),
"tag" : [
"G",
"H",
"I"
]
}
Instead, with tag:-1 I get:
db.candy.find().sort({tag:-1})
{
"_id" : ObjectId("54b94985d74d670613e4fd35"),
"tag" : [
"A",
"B",
"Z"
]
}
{
"_id" : ObjectId("54b949dfd74d670613e4fd37"),
"tag" : [
"G",
"H",
"I"
]
}
{
"_id" : ObjectId("54b949c9d74d670613e4fd36"),
"tag" : [
"D",
"E",
"F"
]
}
The results are very similar: the first document is the same, and only the second and third change places. I get the same results with an array of objects.
My question is: how does the sort work?
I know that A is the first letter of the alphabet (in ASCII order) and Z is the last. Does MongoDB check each element (or object) of the array?
And why is the order inside the array the same whether I use tag:1 or tag:-1? I expected something like this.
With tag:1:
{
"_id" : ObjectId("54b94985d74d670613e4fd35"),
"tag" : [
"A",
"B",
"Z"
]
}
And with tag:-1:
{
"_id" : ObjectId("54b94985d74d670613e4fd35"),
"tag" : [
"Z",
"A",
"B"
]
}
The sort operator, when sorting documents by an array field, does the following:
When sorting descending, it takes the biggest element from each array and compares it with the others.
When sorting ascending, it takes the smallest element from each array and compares it with the others.
These values are used only to order the documents, which is why the order inside each array stays the same.
With arrays, a less-than comparison or an ascending sort compares the smallest element of arrays, and a greater-than comparison or a descending sort compares the largest element of the arrays. As such, when comparing a field whose value is a single-element array (e.g. [ 1 ]) with non-array fields (e.g. 2), the comparison is between 1 and 2. A comparison of an empty array (e.g. [ ]) treats the empty array as less than null or a missing field.
http://docs.mongodb.org/manual/reference/method/cursor.sort/
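Applied to the sample documents above, this means the effective sort key per document is the smallest array element for tag:1 and the largest for tag:-1 (the comments below are purely illustrative):
// tag:1  -> keys "A", "D", "G" (smallest element of each array)
// tag:-1 -> keys "Z", "I", "F" (largest element of each array)
// The ["A","B","Z"] document therefore comes first in both directions,
// since "A" is the smallest key overall and "Z" is the largest.
db.candy.find().sort({tag: 1})   // document order: A/B/Z, D/E/F, G/H/I
db.candy.find().sort({tag: -1})  // document order: A/B/Z, G/H/I, D/E/F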
Just to reiterate and elaborate on marcinn's answer.
When the statement below is issued, it asks MongoDB to sort the documents matched by the query passed to find() (in this case, all documents in the collection) by the tag field.
A point to note here is that the field tag is an array, not a simple scalar field.
db.candy.find().sort({tag:1})
If it were a simple field, the documents would simply be sorted by the value of tag.
Since it is an array, MongoDB still needs a single value per document to sort by. To get that value, MongoDB does the following:
It checks whether tag is an array. If it is, it must choose one element from the array to serve as the sort key (the "weight") for that document.
It then checks whether the sort specified is ascending or descending.
If ascending, it picks the smallest element of the tag array; if descending, the largest.
The documents are then sorted by the element chosen for each of them (a conceptual sketch follows below).
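A rough way to see this for yourself is to compute that key explicitly in an aggregation. This is only a conceptual sketch of the semantics, not how the server actually implements sort:
db.candy.aggregate([
  // use $max instead of $min to mimic the key a descending sort would pick
  { $addFields: { sortKey: { $min: "$tag" } } },
  { $sort: { sortKey: 1 } }
])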
One important point to note here is that the sort operation only changes the order of the root documents retrieved; in other words, it acts like an ORDER BY clause in SQL.
It does not change the order of the tag array elements within each document.
As a rule of thumb, a find() query with limit and sort operations chained to it does not change the structure of the retrieved documents. To manipulate the structure of the documents you need to perform an aggregation.
What you expect in your question is achieved by manipulating the fields of each document, which only an aggregation operation can do.
So if you aggregate it as,
db.candy.aggregate([
{$match:{"_id":ObjectId("54b94985d74d670613e4fd35")}},
{$unwind:"$tag"},
{$sort:{"tag":-1}},
{$group:{"_id":"$_id","tag":{$push:"$tag"}}}
])
then you could get your result as:
{
"_id" : ObjectId("54b94985d74d670613e4fd35"),
"tag" : [
"Z",
"A",
"B"
]
}
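For completeness, using {$sort:{"tag":1}} in the same pipeline would order the tags ascending inside the document, giving back ["A","B","Z"] in this case:
db.candy.aggregate([
  {$match:{"_id":ObjectId("54b94985d74d670613e4fd35")}},
  {$unwind:"$tag"},
  {$sort:{"tag":1}},
  {$group:{"_id":"$_id","tag":{$push:"$tag"}}}
])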
Related
I have added some array data to MongoDB using PyMongo. I've also created indexes for the array elements. The general approach is shown below.
from pymongo import MongoClient

# placeholder connection and collection names; adjust to your environment
my_collection = MongoClient()["test"]["my_collection"]

data = [
    {"d": ["a", "b", "c"]},
    {"d": ["a", "c", "b"]},
    {"d": ["b", "a", "c"]},
]
my_collection.insert_many(data)
my_collection.create_index("d.0")
my_collection.create_index("d.1")
my_collection.create_index("d.2")
The last element in the arrays can be matched by running,
[x for x in my_collection.find({"d.2": "c"})]
As I understand things, the below command will show that the indexes are being used.
my_collection.find({"d.2": "c"}).explain()
I would like to reference array elements with negative indices. For example,
[x for x in my_collection.find({"d.-1": "c"})]
The above command doesn't work. However, the following equivalent command produces the desired results.
[x for x in my_collection.find({"$expr": {"$eq": [ { "$arrayElemAt": [ "$d", -1 ] }, "c" ] }})]
However, running,
my_collection.find({"$expr": {"$eq": [ { "$arrayElemAt": [ "$d", -1 ] }, "c" ] }}).explain()
seems to indicate that indexes will not be used. Is there a way to use both negative array references and indexes?
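For what it's worth, one way to compare the plans (a hypothetical check in the mongo shell, using the same collection and field names as above) is to look at each query's winning plan; the positional-index query should show an IXSCAN stage, while the $expr query typically ends up as a COLLSCAN:
db.my_collection.find({"d.2": "c"}).explain().queryPlanner.winningPlan
db.my_collection.find({$expr: {$eq: [{$arrayElemAt: ["$d", -1]}, "c"]}}).explain().queryPlanner.winningPlan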
I have documents with an array of event objects:
{
"events": [
{
"name": "A"
},
{
"name": "C"
},
{
"name": "D"
},
{
"name": "B"
},
{
"name": "E"
}
]
},
{
"events": [
{
"name": "A"
},
{
"name": "B"
},
{
"name": "S"
},
{
"name": "C"
}
]
}
In these arrays, I want to count the number of events that appear in a given order, allowing intervening events. For example, looking for the order [A,B,C]: with the array [A,x,x,B,x] I should count 2, and with [A,B,x,x,C] I should count 3 (x is just a placeholder for anything else).
I want to summarize this information for all my documents as an array holding the number of matches for each element. With the previous example that would give me [2,2,1]: 2 matches for A, 2 matches for B, 1 match for C.
My current aggregation is generated in JavaScript and follows this pattern:
Match documents with the event array containing A
Slice the array from A to the end of the array
Count the number of documents
Append the count of matching document to the summarizing array
Match documents with the event array containing B
Slice the array from B to the end of the array
Count the number of documents
etc
However, when an event does not appear in any of the arrays, this falls short: since no documents remain, I have nowhere to store the summarizing array. For example, with the event arrays [A,x,x,B,x] and [A,B,x,x,C], trying to match [A,B,D], I would expect [2,2,0], but I get [], because nothing comes up when matching D and the aggregation cannot continue.
Here is the aggregation I'm working with: https://mongoplayground.net/p/rEdQD4FbyC4
Change the matching letter on line 75 to something not in the array to reproduce the problematic behavior.
So is there a way not to lose my data when there is no match, for example by bypassing aggregation stages? I could not find anything related to bypassing stages in the MongoDB documentation.
Or are you aware of another way of doing this?
We ended up using a $reduce to solve our problem.
The reduce runs over the events array: for every event, we try to match it against the element of the sequence at the position given by the current size of the accumulator; if it matches, it is appended to the accumulator, otherwise it is skipped, and so on.
here is the mongo playground : https://mongoplayground.net/p/ge4nlFWuLsZ\
The sequence we want to match is in the field "sequence"
The matched elements are in the "matching" field
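For illustration, a minimal sketch of that $reduce (assuming the target sequence has already been added to each document in a sequence field, as in the playground) could look like this:
db.collection.aggregate([
  { $project: {
      matching: {
        $reduce: {
          input: "$events",
          initialValue: [],
          in: {
            $cond: [
              { $and: [
                  // stop once the whole sequence has been matched
                  { $lt: [ { $size: "$$value" }, { $size: "$sequence" } ] },
                  // does this event match the next expected element of the sequence?
                  { $eq: [ "$$this.name",
                           { $arrayElemAt: [ "$sequence", { $size: "$$value" } ] } ] }
              ] },
              { $concatArrays: [ "$$value", [ "$$this.name" ] ] },
              "$$value"
            ]
          }
        }
      }
  } }
])
The size of matching then tells you, per document, how far into the sequence that document got.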
I need to parse the following hash of 2D arrays, where the first array holds the keys and the rest of the arrays hold the values.
input = {
"result": [
[
"id",
"name",
"address"
],
[
"1",
"Vishnu",
"abc"
],
[
"2",
"Arun",
"def"
],
[
"3",
"Arjun",
"ghi"
]
]
}
This is the result I came up with.
input[:result].drop(1).collect{|arr| Hash[input[:result].first.zip arr]}
Here I iterate through the result array, ignoring its first sub-array (the one that contains the keys), then zip the key array with each value array to build a hash, and finally collect the hashes into another array.
The above solution gives me what I want, which is an array of hashes:
[{"id"=>"1", "name"=>"Vishnu", "address"=>"abc"}, {"id"=>"2", "name"=>"Arun", "address"=>"def"}, {"id"=>"3", "name"=>"Arjun", "address"=>"ghi"}]
Is there a better way to achieve the same result?
zip is the correct tool here, so your code is fine.
I'd use Ruby's array decomposition feature to extract keys and values, and to_h instead of Hash[]:
keys, *values = input[:result]
values.map { |v| keys.zip(v).to_h }
Or, if you prefer a "one-liner" (harder to understand, IMO):
input[:result].yield_self { |k, *vs| vs.map { |v| k.zip(v).to_h } }
I have a static list of values that is in a JSONArray. Here is my example array:
JSONArray json = new JSONArray()
json = ["B", "E", "C", "Z", "A", "X", "F", "H"]
I need to sort this json array in a custom way. I need to put "E" first, "F" second, and then sort the rest by alphabetical order.
I want my end result to be this:
json = ["E", "F", "A", "B", "C", "H", X", "Z"]
Groovy has basic sort functionality, so I can sort alphabetically or reverse the order using:
json.sort()
or
json.reverse()
I'm looking for an easy way to do a custom sort.
In my 5-minute experiment I used weights:
def json = ["B", "E", "C", "Z", "A", "X", "F", "H"]
def weights = [ E:10, F:9 ]
json.sort{
it.charAt( 0 ) - ( weights[ it ] ?: 0 )
}
assert '[E, F, A, B, C, H, X, Z]' == json.toString()
You might want to include some error checking.
You can use closures if you define your own sort method, but what you're actually asking for is some array splitting with a little normal sorting.
json.findAll{ it == 'E' } + json.findAll{ it == 'F' } + json.findAll{ !(it in ['E', 'F']) }.sort()
If you're worried about the efficiency of looping through your json 3 times you can iterate through your json once, adding it to different arrays as you go.
The example below is a little fancier. The inject method iterates over a collection, passing a value between iterations (in our case a list of 3 lists: the first will hold our E's, the second our F's, and the third everything else). After sorting our catch-all list we use .flatten() to transform the 3 lists back into one list.
List organizedList = json.inject([[], [], []]) { List<List> result, String jsonValue ->
    switch (jsonValue) {
        case 'E':
            result.get(0).add(jsonValue) // Could have just added 'E' but I like symmetry
            break
        case 'F':
            result.get(1).add(jsonValue)
            break
        default:
            result.get(2).add(jsonValue)
    }
    return result // Gets passed to each iteration over json
}
organizedList.get(2).sort() // sort on a list modifies the original list
organizedList.flatten() // => [E, F, A, B, C, H, X, Z]
It's also possible using sort with a closure where you define your own sorting; but as you can see, it doesn't flow quite as easily.
json.sort { String a, String b ->
    if (a == b) return 0 // For efficiency's sake
    def letterFirst = { String priority -> // Closure to help sort against a hardcoded value
        if (a == priority) return -1
        if (b == priority) return 1
        return 0
    }
    def toReturn = letterFirst('E')
    if (!toReturn) toReturn = letterFirst('F') // groovy evaluates 0 as false
    if (!toReturn) toReturn = a <=> b
    return toReturn
}
I found two different syntaxes for querying in mongo where an array is non-empty. I imagine the crux of this may actually be my data rather than the query, but in case mongo is doing some null-shenanigans I don't understand, I wanted to check here: what is the preferred way of selecting documents where the 'institution.tags' array is populated and not []?
First option---check that the 0-th item of the array exists:
> db.coll.find({'institution.tags.0': {'$exists':true}}).count()
7330
Second option---check that this list field is not null:
> db.coll.find({'institution.tags': {"$ne":null}}).count()
28014
In theory, all fields named 'institution.tags' are arrays; I don't expect any to be dictionaries, strings, or numbers. But I do see dramatically different counts, so I'm wondering what I should be expecting, and which query is better, both semantically (is it doing what I think it is doing) and for performance.
The following snippet from the Mongo shell should clarify your question:
> db.coll.insert({_id:1})
> db.coll.insert({_id:2, array: []})
> db.coll.insert({_id:3, array: ["entry"]})
> db.coll.insert({_id:4, array: "no_array"})
> db.coll.find().pretty()
{ "_id" : 1 }
{ "_id" : 2, "array" : [ ] }
{ "_id" : 3, "array" : [ "entry" ] }
{ "_id" : 4, "array" : "no_array" }
> db.coll.count({array: {$exists:true}})
3
> db.coll.count({array: {$ne:null}})
3
> db.coll.count({"array.0": {$exists:true}})
1
> db.coll.count({$where: "Array.isArray(this.array) && this.array.length > 0"})
1
Your first option is probably the right way to go, but if you want to be as explicit as possible, have a look at the last query in my example, which uses $where to check both that the field is an array and that it is non-empty. Your second option only checks that the field exists and is not null, so you will not know whether it is an array or, if so, whether it is empty.