I have added some array data to a MongoDB collection using PyMongo. I've also created indexes for the array elements. The general approach is shown below.
data = [
{"d": ["a", "b", "c"],},
{"d": ["a", "c", "b"],},
{"d": ["b", "a", "c"],},
]
my_collection.insert_many(data)
my_collection.create_index('d.0')
my_collection.create_index('d.1')
my_collection.create_index('d.2')
The last element in the arrays can be matched by running,
[x for x in my_collection.find({"d.2": "c"})]
As I understand things, the below command will show that the indexes are being used.
my_collection.find({"d.2": "c"}).explain()
I would like to reference array elements with negative integers. For example,
[x for x in my_collection.find({"d.-1": "c"})]
The above command doesn't work. However, the following equivalent command produces the desired results.
[x for x in my_collection.find({"$expr": {"$eq": [ { "$arrayElemAt": [ "$d", -1 ] }, "c" ] }})]
However, running,
my_collection.find({"$expr": {"$eq": [ { "$arrayElemAt": [ "$d", -1 ] }, "c" ] }}).explain()
seems to indicate that indexes will not be used. Is there a way to use both negative array references and indexes?
I need to parse the following hash of 2D arrays, where the first array holds the keys and the remaining arrays hold the values.
input = {
"result": [
[
"id",
"name",
"address"
],
[
"1",
"Vishnu",
"abc"
],
[
"2",
"Arun",
"def"
],
[
"3",
"Arjun",
"ghi"
]
]
}
This is the result I came up with.
input[:result].drop(1).collect{|arr| Hash[input[:result].first.zip arr]}
Here I'm iterating through the result array, skipping its first sub-array (the one that contains the keys). I zip the key array with each value array to make a hash, and collect those hashes into another array.
The above solution gives me what I want, which is an array of hashes:
[{"id"=>"1", "name"=>"Vishnu", "address"=>"abc"}, {"id"=>"2", "name"=>"Arun", "address"=>"def"}, {"id"=>"3", "name"=>"Arjun", "address"=>"ghi"}]
Is there a better way to achieve the same result?
zip is the correct tool here, so your code is fine.
I'd use Ruby's array decomposition feature to extract keys and values, and to_h instead of Hash[]:
keys, *values = input[:result]
values.map { |v| keys.zip(v).to_h }
Or, if you prefer a "one-liner" (harder to understand, IMO):
input[:result].yield_self { |k, *vs| vs.map { |v| k.zip(v).to_h } }
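For reference, here is a self-contained sketch of the decomposition approach, run against the sample input from the question. The method name rows_to_hashes is only an illustrative name I made up, not part of any library.

```ruby
# Sample input from the question; the `"result":` syntax creates the symbol key :result.
input = {
  "result": [
    ["id", "name", "address"],
    ["1", "Vishnu", "abc"],
    ["2", "Arun", "def"],
    ["3", "Arjun", "ghi"]
  ]
}

# Hypothetical helper: the first row holds the keys, every other row holds values.
def rows_to_hashes(rows)
  keys, *values = rows
  values.map { |row| keys.zip(row).to_h }
end

puts rows_to_hashes(input[:result]).inspect
```

If only the header row is present, values is empty and the method returns an empty array.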
I'm trying to understand arrays in Firebase a bit more. Currently, I'm storing maps in arrays, where one of the fields inside the map is a position that I can use in my mobile app to sort the array with on retrieval and show results in the order of position.
The docs on Firebase say:
Arrays are sorted by elements. If elements are equal, the arrays are sorted by length.
For example, [1, 2, 3] < [1, 2, 3, 1] < [2].
And then there's a section describing how maps are sorted as well:
Key ordering is always sorted. For example, if you write {c: "foo", a: "bar", b: "qux"} the map is sorted by key and saved as {a: "bar", b: "qux", c: "foo"}.
Map fields are sorted by key and compared by key-value pairs, first comparing the keys and then the values. If the first key-value pairs are equal, the next key-value pairs are compared, and so on. If two maps start with the same key-value pairs, then map length is considered. For example, the following maps are in ascending order:
{a: "aaa", b: "baz"}
{a: "foo", b: "bar"}
{a: "foo", b: "bar", c: "qux"}
{a: "foo", b: "baz"}
{b: "aaa", c: "baz"}
{c: "aaa"}
But then I tried this in Firestore: I jumbled up the order of the maps in the above example, and stored them in an array:
data = [{"c": "aaa"}, {"a": "aaa", "b": "baz"}, {"a": "foo", "b": "baz"}, {"b": "aaa", "c": "baz"}, {"a": "foo", "b": "bar", "c": "qux"}, {"a": "foo", "b": "bar"}]
And upon inserting into a Firestore document, the array did not get sorted! While the keys themselves do get sorted within a single Map, the elements in the array stay in the same order.
So does sorting in arrays even work when elements are Maps? Here's an example of what I'm storing in Firestore:
{
"car_collection": {
"models": {
"data": [
{
"model": "Honda",
"color": "black",
"position": 0
},
{
"model": "Hyundai",
"color": "red",
"position": 1
},
{
"model": "Chevrolet",
"color": "yellow",
"position": 2
}
]
}
}
}
I'm storing an additional field called "position", and the order of the maps stays the same on every retrieval. I'm wondering whether I even need to store this field, or whether the data will always come back in the order in which I stored it.
I submitted a ticket to Google to improve the documentation for the Array type. The updated text looks helpful and accurate to me, based on some smoke testing.
https://firebase.google.com/docs/firestore/manage-data/data-types
Copy-pasting the current version here:
An array cannot contain another array value as one of its elements.
Within an array, elements maintain the position assigned to them. When sorting two or more arrays, arrays are ordered based on their element values.
When comparing two arrays, the first elements of each array are compared. If the first elements are equal, then the second elements are compared and so on until a difference is found. If an array runs out of elements to compare but is equal up to that point, then the shorter array is ordered before the longer array.
For example, [1, 2, 3] < [1, 2, 3, 1] < [2]. The array [2] has the greatest first element value. The array [1, 2, 3] has elements equal to the first three elements of [1, 2, 3, 1] but is shorter in length.
So it seems you can safely expect the order of elements to be maintained in Firestore, while understanding the effects of addition/removal as well.
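To make the distinction concrete, here is a plain-Ruby sketch of the comparison rules quoted in this thread. It does not call Firestore; it only mimics the documented ordering, and the method names compare_arrays and compare_maps are my own. The rules describe how whole arrays or maps compare against each other, not a reordering of the elements stored inside one array.

```ruby
# Arrays: compare element by element; if one array is a prefix of the other,
# the shorter one sorts first. Ruby's Array#<=> behaves the same way.
def compare_arrays(a, b)
  a <=> b
end

# Maps: compare as lists of [key, value] pairs taken in key order.
# Hash#sort yields those pairs sorted by key.
def compare_maps(a, b)
  a.sort <=> b.sort
end

arrays = [[2], [1, 2, 3, 1], [1, 2, 3]]
p arrays.sort { |a, b| compare_arrays(a, b) } # [[1, 2, 3], [1, 2, 3, 1], [2]]

# The jumbled maps from the question, with string keys.
maps = [
  { "c" => "aaa" },
  { "a" => "aaa", "b" => "baz" },
  { "a" => "foo", "b" => "baz" },
  { "b" => "aaa", "c" => "baz" },
  { "a" => "foo", "b" => "bar", "c" => "qux" },
  { "a" => "foo", "b" => "bar" }
]
maps.sort { |a, b| compare_maps(a, b) }.each { |m| p m }
```

Sorting the jumbled maps this way reproduces the ascending order listed in the documentation, while the maps array itself is left exactly as it was written.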
Arrays are not sorted in either the Realtime Database (RTD) or Firestore; objects, however, are sorted by their keys.
You will have to sort your array before posting it to Firestore, or sort the arrays on the client side.
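Sorting on the client side can be as small as the sketch below. It uses plain Ruby hashes as a stand-in for the retrieved "models" array (deliberately listed out of order), and sort_by_position is a hypothetical helper name, not a Firebase API.

```ruby
# Hypothetical helper: order the retrieved maps by their "position" field.
def sort_by_position(models)
  models.sort_by { |model| model["position"] }
end

# Stand-in for data read back from the document in the question.
models = [
  { "model" => "Chevrolet", "color" => "yellow", "position" => 2 },
  { "model" => "Honda", "color" => "black", "position" => 0 },
  { "model" => "Hyundai", "color" => "red", "position" => 1 }
]

puts sort_by_position(models).map { |model| model["model"] }.inspect
```

sort_by returns a new array, so the retrieved data itself is left as it was.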
I have an array of arrays and want to append elements to the sub-arrays. += does what I want, but I'd like to understand why push does not.
Behavior I expect (and works with +=):
b = Array.new(3,[])
b[0] += ["apple"]
b[1] += ["orange"]
b[2] += ["frog"]
b #=> [["apple"], ["orange"], ["frog"]]
With push I get the pushed element appended to EACH sub-array (why?):
a = Array.new(3,[])
a[0].push("apple")
a[1].push("orange")
a[2].push("frog")
a #=> [["apple", "orange", "frog"], ["apple", "orange", "frog"], ["apple", "orange", "frog"]]
Any help on this much appreciated.
The issue here is that b = Array.new(3, []) uses the same object as the base value for all the array cells:
b = Array.new(3, [])
b[0].object_id #=> 28424380
b[1].object_id #=> 28424380
b[2].object_id #=> 28424380
So when you use b[0].push, it adds the item to "each" sub-array because they are all, in fact, the same array.
So why does b[0] += ["value"] work? Well, looking at the ruby docs:
ary + other_ary → new_ary
Concatenation — Returns a new array built by concatenating the two arrays together to produce a third array.
[ 1, 2, 3 ] + [ 4, 5 ] #=> [ 1, 2, 3, 4, 5 ]
a = [ "a", "b", "c" ]
c = a + [ "d", "e", "f" ]
c #=> [ "a", "b", "c", "d", "e", "f" ]
a #=> [ "a", "b", "c" ]
Note that
x += y
is the same as
x = x + y
This means that it produces a new array. As a consequence, repeated use of += on arrays can be quite inefficient.
So when you use +=, it replaces the array entirely, meaning the array in b[0] is no longer the same as b[1] or b[2].
As you can see:
b = Array.new(3, [])
b[0].push("test")
b #=> [["test"], ["test"], ["test"]]
b[0].object_id #=> 28424380
b[1].object_id #=> 28424380
b[2].object_id #=> 28424380
b[0] += ["foo"]
b #=> [["test", "foo"], ["test"], ["test"]]
b[0].object_id #=> 38275912
b[1].object_id #=> 28424380
b[2].object_id #=> 28424380
If you're wondering how to ensure each array is unique when initializing an array of arrays, you can do so like this:
b = Array.new(3) { [] }
This different syntax lets you pass a block of code which gets run for each cell to calculate its original value. Since the block is run for each cell, a separate array is created each time.
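Here is a short side-by-side sketch of the two forms. The helper names build_shared and build_distinct are made up for this illustration.

```ruby
# Every cell refers to the single array passed as the default value.
def build_shared(size)
  Array.new(size, [])
end

# The block runs once per cell, so each cell gets its own array.
def build_distinct(size)
  Array.new(size) { [] }
end

shared = build_shared(3)
distinct = build_distinct(3)

shared[0].push("apple")
distinct[0].push("apple")

p shared                          # [["apple"], ["apple"], ["apple"]]
p distinct                        # [["apple"], [], []]
p shared[0].equal?(shared[1])     # true: same object
p distinct[0].equal?(distinct[1]) # false: separate objects
```

Here equal? compares object identity, which is the same check as comparing the object_id values shown earlier.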
That's because, in the second code section, you're selecting a sub-array and pushing to it. If you want an array of arrays, you need to push each array onto the main array:
a = Array.new(3,[])
a.push(["apple"])
a.push(["orange"])
a.push(["frog"])
to get the same result as the first one.
EDIT: I forgot to mention that, because you initialize the array with empty arrays as elements, you will have three empty elements in front of the pushed elements.
I had to update an array, and I used += and << in different runs of code inside a block passed to Enumerable#each_with_object:
Code 1
(1..5).each_with_object([]) do |i, a|
puts a.inspect
a += [i]
end
Output:
[]
[]
[]
[]
[]
Code 2
(1..5).each_with_object([]) do |i, a|
puts a.inspect
a << i
end
Output:
[]
[1]
[1, 2]
[1, 2, 3]
[1, 2, 3, 4]
The += operator does not update the original array. Why? What am I missing here?
In each_with_object, the so-called memo object is common among the iterations. You need to modify that object in order to do something meaningful. The += operator is syntax sugar for + and assignment, which does not modify the receiver, hence the iteration has no effect. If you use methods like << or push, then it will have effect.
On the other hand, in inject, the so-called memo object is the return value of the block, and you don't need to modify the object, but you need to return the value you want for the next iteration.
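A small sketch of the three cases, wrapped in throwaway methods whose names I picked only for this illustration:

```ruby
# each_with_object: mutate the memo in place; the block's return value is ignored.
def collect_with_append(range)
  range.each_with_object([]) { |i, memo| memo << i }
end

# += only rebinds the block-local variable to a new array,
# so the memo passed to each_with_object stays empty.
def collect_with_plus_equals(range)
  range.each_with_object([]) { |i, memo| memo += [i] }
end

# inject: the block's return value becomes the memo for the next iteration,
# so a non-mutating + is enough.
def collect_with_inject(range)
  range.inject([]) { |memo, i| memo + [i] }
end

p collect_with_append(1..5)      # [1, 2, 3, 4, 5]
p collect_with_plus_equals(1..5) # []
p collect_with_inject(1..5)      # [1, 2, 3, 4, 5]
```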
It is clear to me that += operator is not updating the original array. Why?
Because the documentation says so (emphasis mine):
ary + other_ary → new_ary
Concatenation — Returns a new array built by concatenating the two arrays together to produce a third array.
[ 1, 2, 3 ] + [ 4, 5 ] #=> [ 1, 2, 3, 4, 5 ]
a = [ "a", "b", "c" ]
c = a + [ "d", "e", "f" ]
c #=> [ "a", "b", "c", "d", "e", "f" ]
a #=> [ "a", "b", "c" ]
Note that
x += y
is the same as
x = x + y
This means that it produces a new array. As a consequence, repeated use of += on arrays can be quite inefficient.
See also #concat.
Compare to <<
ary << obj → ary
Append—Pushes the given object on to the end of this array. This expression returns the array itself, so several appends may be chained together.
[ 1, 2 ] << "c" << "d" << [ 3, 4 ]
#=> [ 1, 2, "c", "d", [ 3, 4 ] ]
The documentation of Array#+ clearly says that a new array is returned (no less than four times, actually). This is consistent with other uses of the + method in Ruby, e.g. Bignum#+, Fixnum#+, Complex#+, Rational#+, Float#+, Time#+, String#+, BigDecimal#+, Date#+, Matrix#+, Vector#+, Pathname#+, Set#+, and URI::Generic#+.
I have a simple collection that I'm using to understand sorting in MongoDB.
My documents are:
{
"_id" : ObjectId("54b94985d74d670613e4fd35"),
"tag" : [
"A",
"B",
"Z"
]
}
{
"_id" : ObjectId("54b949c9d74d670613e4fd36"),
"tag" : [
"D",
"E",
"F"
]
}
{
"_id" : ObjectId("54b949dfd74d670613e4fd37"),
"tag" : [
"G",
"H",
"I"
]
}
When I sort by tag, I get these results:
db.candy.find().sort({tag:1})
{
"_id" : ObjectId("54b94985d74d670613e4fd35"),
"tag" : [
"A",
"B",
"Z"
]
}
{
"_id" : ObjectId("54b949c9d74d670613e4fd36"),
"tag" : [
"D",
"E",
"F"
]
}
{
"_id" : ObjectId("54b949dfd74d670613e4fd37"),
"tag" : [
"G",
"H",
"I"
]
}
With tag:-1 instead:
db.candy.find().sort({tag:-1})
{
"_id" : ObjectId("54b94985d74d670613e4fd35"),
"tag" : [
"A",
"B",
"Z"
]
}
{
"_id" : ObjectId("54b949dfd74d670613e4fd37"),
"tag" : [
"G",
"H",
"I"
]
}
{
"_id" : ObjectId("54b949c9d74d670613e4fd36"),
"tag" : [
"D",
"E",
"F"
]
}
The results are very similar: the first document is the same, and only the second and third change places.
I get the same results with an array of objects.
My question is:
How does the sort work?
I know that A is the first letter of the alphabet (in ASCII order) and Z is the last.
Does MongoDB check each element (or object) of the array?
And why is the order inside the array the same whether I use tag:-1 or tag:1? I expected something like
tag:1
{
"_id" : ObjectId("54b94985d74d670613e4fd35"),
"tag" : [
"A",
"B",
"Z"
]
}
And tag:-1
{
"_id" : ObjectId("54b94985d74d670613e4fd35"),
"tag" : [
"Z",
"A",
"B"
]
}
When sorting documents by an array field, the sort operator does the following:
When sorting in descending order, it takes the largest element from each array and compares it with the others.
When sorting in ascending order, it takes the smallest element from each array and compares it with the others.
These elements are used only to order the documents, which is why the order inside each array stays the same.
With arrays, a less-than comparison or an ascending sort compares the smallest element of arrays, and a greater-than comparison or a descending sort compares the largest element of the arrays. As such, when comparing a field whose value is a single-element array (e.g. [ 1 ]) with non-array fields (e.g. 2), the comparison is between 1 and 2. A comparison of an empty array (e.g. [ ]) treats the empty array as less than null or a missing field.
http://docs.mongodb.org/manual/reference/method/cursor.sort/
Just to reiterate and elaborate on @marcinn's answer.
The statement below asks MongoDB to sort, by the tag field, the documents matched by the query passed to find(). In this case, find() returns every document in the collection.
A point to note here is, the field tag is of type array and not a simple field.
db.candy.find().sort({tag:1})
If it were a simple field, the documents would have been sorted by the value in the tag field.
Anyway, mongodb needs a value by which it can sort the documents. To get the value of the field, mongodb does the following:
Checks if tag is an array. If it is an array, it needs to choose an element from the array whose value it can assume to be the weight of the particular document.
Now it checks if the sort specified is in ascending or descending order.
If it is in ascending order, it finds the smallest element in the tag array, else the largest.
With the element chosen from the tag array, for each document, the sort is applied.
One important point to note here is that the sort operation only changes the order of the root documents retrieved. In other words, it acts like an ORDER BY clause in SQL terms.
It does not change the order of the tag array elements for each document.
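The rule can be mimicked in a few lines of plain Ruby. This is only a simulation of the behaviour described above, not MongoDB's implementation. The method name sort_by_array_field and the simplified integer _id values are assumptions made for the sketch.

```ruby
# Ascending sorts use each array's smallest element as the sort key;
# descending sorts use each array's largest element.
def sort_by_array_field(docs, field, direction)
  if direction == 1
    docs.sort { |a, b| a[field].min <=> b[field].min }
  else
    docs.sort { |a, b| b[field].max <=> a[field].max }
  end
end

# The three documents from the question, with simplified ids.
docs = [
  { "_id" => 1, "tag" => %w[A B Z] },
  { "_id" => 2, "tag" => %w[D E F] },
  { "_id" => 3, "tag" => %w[G H I] }
]

p sort_by_array_field(docs, "tag", 1).map { |doc| doc["_id"] }  # [1, 2, 3]
p sort_by_array_field(docs, "tag", -1).map { |doc| doc["_id"] } # [1, 3, 2]
p docs.first["tag"]                                             # ["A", "B", "Z"]
```

In the descending case the keys are "Z", "F" and "I", so the first document stays on top and the other two swap, which matches the output in the question. The tag arrays themselves are never reordered.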
As a rule of thumb, a find() query, with limit, sort operations chained to it, does not change the structure of the retrieved documents. To manipulate the structure of the documents you need to perform an aggregate operation.
What you expect in your question, is achieved by manipulating the fields in each document, which only an aggregation operation can do.
So if you aggregate it as,
db.candy.aggregate([
{$match:{"_id":ObjectId("54b94985d74d670613e4fd35")}},
{$unwind:"$tag"},
{$sort:{"tag":-1}},
{$group:{"_id":"$_id","tag":{$push:"$tag"}}}
])
then you would get the tag array in descending order:
{
"_id" : ObjectId("54b94985d74d670613e4fd35"),
"tag" : [
"Z",
"B",
"A"
]
}