ArangoDB Indexes and arrays - arrays

I am trying to use a document collection for fast lookups. Sample document:
document Person {
...
groups: ["admin", "user", "godmode"],
contacts: [
{
label: "main office",
items: [
{ type: "phone", value: '333444222' },
{ type: "phone", value: '555222555' },
{ type: "email", value: 'bob#gmail.com' }
]
}
]
...
}
1. Created a hash index on the "groups" field.
Query: For P in Person FILTER "admin" IN P.groups RETURN P
Result: the query works, but explain shows that no index is used.
Question: how can I filter on arrays in a query and still use an index? Performance is the main factor.
2. Created a hash index on "contacts[].items[].value".
Query: For P in Person FILTER "333444222" == P.contacts[*].items[*].value RETURN P
Result: is double use of the wildcard not supported? The index is not used and the query returns nothing.
Question: how can I organize fast lookups for this structure with indexes?
P.S. I also tried the MATCHES function and multi-level FOR ... IN loops; hash indexes on arrays are never used.
ArangoDB version 2.6.8

Array indexes can be used from ArangoDB version 2.8 onward.
For the first query (FILTER "admin" IN p.groups), an array hash index on field groups[*] will work:
db._create("persons");
db.persons.insert(personDataFromOriginalExample);
db.persons.ensureIndex({ type: "hash", fields: [ "groups[*]" ] });
This type of index does not exist in versions prior to 2.8.
With an array index in place, the query will produce the following execution plan (showing that the index is actually used):
Execution plan:
Id NodeType Est. Comment
1 SingletonNode 1 * ROOT
6 IndexNode 1 - FOR p IN persons /* hash index scan */
3 CalculationNode 1 - LET #1 = "admin" in p.`groups` /* simple expression */ /* collections used: p : persons */
4 FilterNode 1 - FILTER #1
5 ReturnNode 1 - RETURN p
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
6 hash persons false false 100.00 % [ `groups[*]` ] "admin" in p.`groups`
The second query will not be supported by array indexes, as it contains multiple levels of nesting. The array indexes in 2.8 are restricted to one level, e.g. groups[*] or contacts[*].label will work, but not contacts[*].items[*].value.
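Because the 2.8 array indexes cover only one level, one possible workaround is to copy the nested values into a flat, top-level array when the document is written, and index that array instead. The sketch below is only an illustration of that idea: the helper flattenContactValues and the field name contactValues are hypothetical, and the index and query in the trailing comments are assumptions modeled on the groups[*] example above rather than tested commands.

```javascript
// Hypothetical helper (not part of ArangoDB): gathers every
// contacts[].items[].value of one document into a single flat array.
function flattenContactValues(person) {
  var values = [];
  (person.contacts || []).forEach(function (contact) {
    (contact.items || []).forEach(function (item) {
      values.push(item.value);
    });
  });
  return values;
}

var person = {
  groups: ["admin", "user", "godmode"],
  contacts: [{
    label: "main office",
    items: [
      { type: "phone", value: "333444222" },
      { type: "phone", value: "555222555" },
      { type: "email", value: "bob#gmail.com" }
    ]
  }]
};

// contactValues is an assumed field name, stored next to the original data.
person.contactValues = flattenContactValues(person);

// Assumed follow-up in arangosh (2.8+), mirroring the groups[*] index:
//   db.persons.ensureIndex({ type: "hash", fields: [ "contactValues[*]" ] });
//   FOR p IN persons FILTER "333444222" IN p.contactValues RETURN p
```

The trade-off is that the flat array has to be kept in sync whenever contacts change.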

About 1.): this is work in progress and will be included in one of the next releases (most likely 2.8).
We have not yet decided on the AQL syntax to retrieve the array, but FILTER "admin" IN P.groups is among the most likely candidates.
About 2.): once 1. is implemented, this will work out of the box as well; the index will be able to cover several depths of nesting.
Neither of the above can be properly indexed in the current release (2.6).
The only alternative I can offer is to externalize the values and use edges instead of arrays.
In your code the data would be the following (in arangosh).
I used fixed _key values for simplicity, works without them as well:
db._create("groups"); // saves the group elements
db._create("contacts"); // saves the contact elements
db.contacts.ensureHashIndex("value"); // Index on contacts.value
db._create("Person"); // You already have this
db._createEdgeCollection("isInGroup"); // Save relation group -> person
db._createEdgeCollection("hasContact"); // Save relation item -> person
db.Person.save({_key: "user"}) // The remainder of the object you posted
// Now the items
db.contacts.save({_key:"phone1", type: "phone", value: '333444222' });
db.contacts.save({_key:"phone2", type: "phone", value: '555222555' });
db.contacts.save({_key:"mail1", type: "email", value: 'bob#gmail.com'});
// And the groups
db.groups.save({_key:"admin"});
db.groups.save({_key:"user"});
db.groups.save({_key:"godmode"});
// Finally the relations
db.hasContact.save("contacts/phone1", "Person/user", {label: "main office"});
db.hasContact.save("contacts/phone2", "Person/user", {label: "main office"});
db.hasContact.save("contacts/mail1", "Person/user", {label: "main office"});
db.isInGroup.save("groups/admin", "Person/user", {});
db.isInGroup.save("groups/godmode", "Person/user", {});
db.isInGroup.save("groups/user", "Person/user", {});
Now you can execute the following queries:
Fetch all admins:
RETURN NEIGHBORS(groups, isInGroup, "admin")
Get all users having a contact with value 333444222:
FOR x IN contacts FILTER x.value == "333444222" RETURN NEIGHBORS(contacts, hasContact, x)

Related

How to insert/delete an item by index in an "array" in Firebase's realtime database?

In my app I have 3 vehicles in array, ordered by an index.
So my array looks like this:
// My array
const array = [
{ name: 'car', index: 1 },
{ name: 'aeroplane', index: 2 },
{ name: 'bus', index: 3 }
];
// How I push to the database
database.ref('/').push(array[0]);
database.ref('/').push(array[1]);
database.ref('/').push(array[2]);
// How my realtime db looks right after
{
"-M9UBf4uJa9cy0_ZT4LF" : {
"index" : 1,
"name" : "car"
},
"-M9UBf50Jdyo4GmXO23l" : {
"index" : 2,
"name" : "aeroplane"
},
"-M9UBf526cmgjUw99dWw" : {
"index" : 3,
"name" : "bus"
}
}
No problem so far.
However, I'm struggling when it comes to deleting or inserting vehicles in Firebase, because the indexes remain as they were. For example, if I delete 'aeroplane' above with the following code:
// Delete second item
database.ref('/-M9UBf50Jdyo4GmXO23l').remove();
...I get an array like this:
[{name: 'car', index: 1}, {name: 'bus', index: 3}].
I need the bus's index to become '2' instead of remaining '3'. I realize there's no built-in way to achieve this using Firebase's API (and on purpose, since it would create problems in real time applications). However, I absolutely need the indexes to remain.
I thought about manually .update()-ing every index greater than the one I deleted but in cases where I have millions and millions of vehicles, wouldn't the performance be horrible?
Is there any way I could do this with Firebase's realtime database, or should I go with another database?
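For reference, the manual re-indexing idea mentioned above could be sketched roughly as follows. The helper buildDeleteUpdate is hypothetical; it only builds a plain object, on the assumption that it would then be passed to a single update() call on the parent reference, where a null value removes that path.

```javascript
// Hypothetical helper: builds one multi-path update that removes
// `keyToDelete` and decrements the index of every later vehicle.
function buildDeleteUpdate(vehicles, keyToDelete) {
  var removedIndex = vehicles[keyToDelete].index;
  var updates = {};
  updates[keyToDelete] = null; // null deletes this path when passed to update()
  Object.keys(vehicles).forEach(function (key) {
    if (key !== keyToDelete && vehicles[key].index > removedIndex) {
      updates[key + "/index"] = vehicles[key].index - 1;
    }
  });
  return updates;
}

// Same data as in the question, as it would come back from a snapshot.
var vehicles = {
  "-M9UBf4uJa9cy0_ZT4LF": { index: 1, name: "car" },
  "-M9UBf50Jdyo4GmXO23l": { index: 2, name: "aeroplane" },
  "-M9UBf526cmgjUw99dWw": { index: 3, name: "bus" }
};

var updates = buildDeleteUpdate(vehicles, "-M9UBf50Jdyo4GmXO23l");
// Assumed usage: database.ref('/').update(updates);
```

This still rewrites every item after the deleted one, so it does not remove the cost the question is worried about; it only applies the deletion and the re-indexing in one write.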

Apache solr filter based on array values

I have two documents, shown below:
{
id: intro.original
publicationID: TRENTXWB_EM_R1
hasBeenModifiedBy: [DAL]
isModificationFor: null
text: ... intro ...
}
{
id: intro.dal
publicationID: TRENTXWB_EM_R1
hasBeenModifiedBy: []
isModificationFor: DAL
text: ... intro ...
}
I need to develop a filter query that checks "hasBeenModifiedBy". If the array contains 'DAL', the query has to omit that dataset (i.e. ignore it). So, in this case, we should get only the second dataset, which doesn't have "DAL" in its "hasBeenModifiedBy" array.
Please suggest an approach.
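One approach that may fit is a negative filter query, i.e. an fq clause prefixed with "-", which excludes every document that matches the clause. The sketch below only builds the request parameters; it assumes hasBeenModifiedBy is an indexed, multi-valued field, and the match-all main query is a placeholder.

```javascript
// Builds Solr request parameters; nothing is sent anywhere.
var params = new URLSearchParams();
params.set("q", "*:*");                      // placeholder main query (assumption)
params.set("fq", "-hasBeenModifiedBy:DAL");  // leading "-" excludes matching documents

// Query string that could be appended to a core's /select URL.
var queryString = params.toString();
```

With the two documents above, this filter would be expected to drop intro.original and keep intro.dal.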

Using $rename in MongoDB for an item inside an array of objects

Consider the following MongoDB collection of a few thousand Objects:
{
_id: ObjectId("xxx")
FM_ID: "123"
Meter_Readings: Array
0: Object
Date: 2011-10-07
Begin_Read: true
Reading: 652
1: Object
Date: 2018-10-01
Begin_Reading: true
Reading: 851
}
The wrong key was entered for 2018 into the array and needs to be renamed to "Begin_Read". I have a list using another aggregate of all the objects that have the incorrect key. The objects within the array don't have an _id value, so are hard to select. I was thinking I could iterate through the collection and find the array index of the errored Readings and using the _id of the object to perform the $rename on the key.
I am trying to get the index of the array, but cannot seem to select it correctly. The following aggregate is what I have:
[
{
'$match': {
'_id': ObjectId('xxx')
}
}, {
'$project': {
'index': {
'$indexOfArray': [
'$Meter_Readings', {
'$eq': [
'$Meter_Readings.Begin_Reading', True
]
}
]
}
}
}
]
Its result is always -1 which I think means my expression must be wrong as the expected result would be 1.
I'm using Python for this script (can use javascript as well), if there is a better way to do this (maybe a filter?), I'm open to alternatives, just what I've come up with.
I fixed this myself. I was close with the aggregate, but needed to look at a different field; for some reason the original one did not work:
{
'$project': {
'index': {
'$indexOfArray': [
'$Meter_Readings.Water_Year', 2018
]
}
}
}
What I did learn is that, to find an object within an array, you can reference one of its fields through the array's field path in $indexOfArray. I hope that might help someone else.
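To illustrate the idea, here is a plain-JavaScript emulation (not MongoDB code) of what a path such as '$Meter_Readings.Reading' followed by $indexOfArray does: the path is first resolved to an array of field values, and the search value is then looked up in that array. The helper name indexOfArrayByField is hypothetical, and the emulation assumes the field exists on every element, which is why it uses Reading rather than Begin_Reading.

```javascript
// Hypothetical emulation: project one field out of each element,
// then return the position of the first matching value (or -1).
function indexOfArrayByField(elements, field, value) {
  var projected = elements.map(function (element) {
    return element[field];
  });
  return projected.indexOf(value);
}

// Sample data from the question.
var meterReadings = [
  { Date: "2011-10-07", Begin_Read: true, Reading: 652 },
  { Date: "2018-10-01", Begin_Reading: true, Reading: 851 }
];

var position = indexOfArrayByField(meterReadings, "Reading", 851);
var missing = indexOfArrayByField(meterReadings, "Reading", 999);
```

Here position is 1 and missing is -1. The original pipeline probably returned -1 for a related reason: the { $eq: [...] } object is evaluated to a boolean first, and that boolean is then searched for in an array of objects.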

Ruby way of summing an array of objects by field

I have an array of objects that I'd like to group by field1 and sum by field2. An example would be a class product that has a title field and a price field.
In an array of products, I have multiple gloves with different prices, and multiple hats with different prices. I'd like to have an array with distinct titles, that aggregate all the prices under the same title.
There's an obvious solution with iterating over the array and using a hash, but I was wondering if there was a "ruby way" of doing something like this? I've seen a lot of examples where Ruby has some unique functionality that applies well to certain scenarios and being a Ruby newbie I'm curious about this.
Thanks
The transform_values method was added in Ruby 2.4 (it is also available if you require 'active_support/all'). With it you can do something like this:
products = [
{type: "hat", price: 1, name: "fedora"},
{type: "hat", price: 2, name: "sombrero"},
{type: "glove", price: 3, name: "mitten"},
{type: "glove", price: 4, name: "wool"}
]
result = products
.group_by { |product| product[:type] }
.transform_values { |vals| vals.sum { |val| val[:price] } }
# => {"hat"=>3, "glove"=>7}
It's a little unclear to me from the question as asked what your data looks like, so I ended up with this:
Product = Struct.new(:title, :price)
products = [
Product.new("big hat", 1),
Product.new("big hat", 2),
Product.new("small hat", 3),
Product.new("small hat", 4),
Product.new("mens glove", 8),
Product.new("mens glove", 9),
Product.new("kids glove", 1),
Product.new("kids glove", 2)
]
Given that data, this is how I'd go about building a data structure which contains the sum of all the prices for a given title:
sum_by_title = products.inject({}) do |sums, product|
if sums[product.title]
sums[product.title] += product.price
else
sums[product.title] = product.price
end
sums
end
This produces:
{"big hat"=>3, "small hat"=>7, "mens glove"=>17, "kids glove"=>3}
To explain:
Ruby inject takes an initial value and passes that to the iteration block as a "memo". Here, {} is the initial value. The return value from the block is passed into the next iteration as the memo.
The product.title is used as a hash key and the running sum is stored in the hash value. An if statement is necessary because the first time a product title is encountered, the stored value for that title is nil and cannot be incremented.
I probably wouldn't ship this code due to the hidden magic of the hash default value constructor but it's possible to write the same code without the if statement:
sum_by_title = products.inject(Hash.new { 0 }) do |sums, product|
sums[product.title] += product.price
sums
end
Hope you enjoy Ruby as much as I do!

Remove all items from an array NOT in array in MongoDB

I have two collections in Mongo. For simplicity, I'm providing a minimal example.
Template Collection:
{
templateId:0,
values:[
{data:"a"},
{data:"b"},
{data:"c"},
{data:"e"}
]
}
Data Collection:
{
dataId:0,
templateId:0,
values:[
{data:"a",
value: 10},
{data:"b",
value: 120},
{data:"c",
value: 3220},
{data:"d",
value: 0}
]
}
I want to sync from the Template collection to the Data collection, between template 0 and all documents using that template. In the example, that would mean:
1. Copy {data:"e"} into the arrays of all documents with templateId: 0
2. Remove {data:"d"} from the arrays of all documents with templateId: 0
BUT do not touch the rest of the items. I can't simply replace the array, because those values have to be kept.
I've found a solution for 1.
db.getCollection('data').update({templateId:0},
{$addToSet: {values: {
$each:[
{data:"a"},
{data:"b"},
{data:"c"},
{data:"e"}
]
}}}, {
multi: true}
)
And a partial solution for 2.
I got it. First tried with $pullAll, but the normal $pull seems to work together with the $nin operator
db.getCollection('data').update({templateId:"QNXC4bPAF9J6r9FQu"},
{$pull:{values: { $nin:[
{data:"a"},
{data:"b"},
{data:"c"},
{data:"e"}]
}}}, {
multi: true}
)
This will remove {data:"d"} from all document arrays, but it seems to overwrite the complete array, which is not what I want, as those value entries need to be preserved.
But how can I perform a query like "remove everything from an array EXCEPT / NOT IN [a,b,c,d,...]"?
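A likely cause is that $nin is being compared against whole array elements: an element such as {data:"a", value: 10} is not equal to {data:"a"}, so every element counts as "not in" the list and is pulled. A sketch of an alternative, under that assumption, is to apply $nin to the data field of each element instead. The update document below has not been run against a server here; pullNotAllowed is a hypothetical plain-JavaScript emulation of the intended effect.

```javascript
// Values of `data` that should survive (taken from the template).
var allowed = ["a", "b", "c", "e"];

// Assumed update document: pull every element whose `data` is NOT allowed.
var update = { $pull: { values: { data: { $nin: allowed } } } };
// Assumed usage in the mongo shell:
//   db.getCollection('data').update({templateId: 0}, update, {multi: true})

// Hypothetical emulation of the intended result on a plain array.
function pullNotAllowed(values, allowedData) {
  return values.filter(function (element) {
    return allowedData.indexOf(element.data) !== -1;
  });
}

var values = [
  { data: "a", value: 10 },
  { data: "b", value: 120 },
  { data: "c", value: 3220 },
  { data: "d", value: 0 }
];

var remaining = pullNotAllowed(values, allowed);
```

In the emulation, remaining keeps the a, b and c entries with their value fields intact and drops only the d entry.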
