How to use indexes in ArangoDB graph search?

I'm evaluating ArangoDB for my application.
I have a data model like a file system, with an Items document collection and an ItemsParents edge collection holding the parent-child relations between Items.
Now I would like to find all children of a specific item that have a specific attribute value.
Example: all children of A with Properties.Age.Value = 20
So I created a hash index on Items.Properties.Age.Value and wrote this AQL query:
FOR item IN GRAPH_NEIGHBORS('ItemsGraph', 'Items/A', {
    direction: 'outbound',
    includeData: true,
    neighborExamples: { 'Properties.Age.Value': 20 }
})
    RETURN { Id: item._key, Name: item.Name }
The above query works, but no index is used, so it performs a full scan of the Items collection to evaluate the Properties.Age.Value filter.
How can I design the query so that it uses the index and avoids a collection scan?
Thanks

Currently ArangoDB can only utilize edge indices in graph operations.
Not using GRAPH_NEIGHBORS may allow the index to be used, but then you would have to filter for neighbours yourself.
Vertex-centric indices, which would offer that kind of index support, may arrive in one of the next two ArangoDB releases.
[edit]
Meanwhile, this is possible with newer ArangoDB releases.
GRAPH_NEIGHBORS was integrated into the traversal engine. You would now create a combined index on Age and _from. You should use db._explain() to inspect the usage of indices.
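As a rough illustration of the edit above, this is how the original lookup might be written as a native AQL traversal in newer releases. It is an untested sketch: the graph, collection and attribute names are taken from the question, and the Python wrapper only holds the query text and bind parameters so they could be handed to a driver. Whether an index is actually picked up still has to be checked with db._explain().

```python
# AQL traversal standing in for GRAPH_NEIGHBORS (sketch; names come from the
# question). Nothing here talks to a database.
TRAVERSAL_QUERY = """
FOR item IN 1..1 OUTBOUND @start GRAPH 'ItemsGraph'
    FILTER item.Properties.Age.Value == @age
    RETURN { Id: item._key, Name: item.Name }
"""

# Bind parameters: the start vertex and the attribute value to filter on.
BIND_VARS = {"start": "Items/A", "age": 20}
```

The depth range 1..1 restricts the traversal to direct children, which matches what the neighbours call returned.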

Related

Neo4j How do I get this query to use an index?

I have this query and it refuses to use an index. I don't know if it's because of the "Expand" stage in the pipeline or something else, but I can't get it to use an index in this form. In particular, the ORDER BY clause still produces a "Sort" stage in the plan, and I'd like to avoid it.
The index is on the createdAt property.
PROFILE
MATCH (u:User {user_id: '61c84762da4e457d55656efa'})-[follows:FOLLOWS]->(following:User)-[relatedTo:POSTED|SHARED]->(everything)
WHERE relatedTo.createdAt > datetime("2000-02-12T15:42:10.866+00:00")
RETURN u, relatedTo, everything
ORDER BY relatedTo.createdAt DESC
Here is a picture of the planner
The only way it does what I want is if I remove everything before the last relationship, which obviously defeats the point of the query, but it was just for testing.
PROFILE
MATCH (following:User)-[relatedTo:POSTED|SHARED]->(everything)
WHERE relatedTo.createdAt > datetime("2000-02-12T15:42:10.866+00:00")
RETURN relatedTo, everything
ORDER BY relatedTo.createdAt DESC
Now it uses the index.
Any ideas on how to get it to use an index for both the query and the sort?

I'm not entirely clear on why you want to use an index here.
In your first query an index is used to find the :User node, and then relationship pointers are followed to find the other nodes of interest. In Neo4j, following relationship pointers is always faster than trying to use an index to find nodes (unlike in a relational database). Typically, you only want to use an index to find the start nodes of a path, which is what your first query is doing.
If you really want the index search to start in a different part of the path, you could split the query into multiple parts using WITH.

Is there any way to sort on a nested value in Azure Cognitive Search?

My use case is that I have a database of songs that are associated with dances that one can dance to that song. Users can vote on the danceability of a dance to a song, so there is a numeric vote tally for each song/dance combination. A core part of the functionality for the search is to be able to do an arbitrary search and sort the results by the popularity of a particular dance.
I am currently modeling this by creating a new top level field with a decorated name (e.g. DNC_Salsa or DNC_Waltz) for each dance. This works. But aside from being clumsy, I can't associate other information with a dance. In addition, I have to dynamically add the dance fields, so I have to use the generic SearchDocument type in the C# library rather than using a POCO type.
I'd much prefer to model this with the dance fields as an array of subdocuments where the subdocuments contain a dance name, a vote count and the other information I'd like to associate with a dance.
A simplified example record would look something like this:
{
    "title": "Baby, It's Cold Outside",
    "artist": "Seth MacFarlane",
    "tempo": 119.1,
    "dances": [
        { "name": "cha cha", "votes": 1 },
        { "name": "foxtrot", "votes": 4 }
    ]
}
I gave this a try and received:
{"error":{"code":"OperationNotAllowed","message":"The request is invalid.","details":[{"code":"CannotEnableFieldForSorting","message":"The field 'Votes' cannot be enabled for sorting because it is directly or indirectly contained in a collection, which makes it a multi-valued field. Sorting is not allowed on multi-valued fields. Parameters: definition"}]}}
It looks like Elasticsearch will do what I want:
Sort search results | Elasticsearch Guide [7.17] | Elastic
If I'm reading the Elasticsearch documentation correctly, you can basically say "I'd like to sort on the dances subdocument by first filtering for name == "cha cha" and then sorting on the votes field."
Is there anything like this in Azure Cognitive Search? Or even something more restrictive? I don't need to do arbitrary sorting on anything in the subdocument. I would be happy to only ever sort on the vote count (although I'd have to be able to do that for any dance name).

It's not clear to me what your records or data model look like. However, from the error message you provided, it's clear that you are trying to sort on a multi-valued property. That is logically impossible without first reducing the property to a single value.
Imagine a property Color that can contain colors like 'Red' or 'Blue'. If you sort by Color, you would get your blue values before the reds. If you instead had 'Colors' that can contain multiple values, like both 'Red' and 'Blue', how would you sort it? You can't.
So, if you actually want to sort by a property, that property has to contain a single value.
That said, I have a feeling you are really asking about ranking/boosting, not sorting. Have a look at the examples with boosting and scoring profiles for different genres of music. I believe the approach in these examples could help you solve your use case.
https://learn.microsoft.com/en-us/azure/search/index-add-scoring-profiles#extended-example
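The point about single values can be made concrete with a small sketch in plain Python. It uses made-up sample data and is not Azure Cognitive Search code. A list of vote counts has no natural order, but once a rule picks one value per song (here "the votes for one named dance", as in the Elasticsearch approach the question mentions), ordinary sorting works again:

```python
# Plain-Python model of the question's data; song titles are invented.
songs = [
    {"title": "Song A", "dances": [{"name": "cha cha", "votes": 1},
                                   {"name": "foxtrot", "votes": 4}]},
    {"title": "Song B", "dances": [{"name": "cha cha", "votes": 3}]},
    {"title": "Song C", "dances": [{"name": "foxtrot", "votes": 2}]},
]


def votes_for(song, dance_name):
    """Return the vote count for one dance, or None if the song lacks it."""
    for dance in song["dances"]:
        if dance["name"] == dance_name:
            return dance["votes"]
    return None


def sort_by_dance(songs, dance_name):
    """Keep songs that have the dance, then sort by its votes, highest first."""
    matching = [s for s in songs if votes_for(s, dance_name) is not None]
    return sorted(matching, key=lambda s: votes_for(s, dance_name), reverse=True)


cha_cha_order = [s["title"] for s in sort_by_dance(songs, "cha cha")]
foxtrot_order = [s["title"] for s in sort_by_dance(songs, "foxtrot")]
```

The same songs come back in a different order for each dance, which is why a sort on the whole dances collection, with no dance selected, has no single answer.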

MongoDB: $lookup on an indexed property vs $in on a non-indexed property

I am currently using MongoDB 3.5. I have two collections (users, items). Each user has a list of items:
//users
{
    _id: ObjectId('userObjId1'),
    itemArray: [
        { ObjectId('itemA'), specialId: '123-this-is-unique' },
        { ObjectId('itemB'), specialId: '456-this-is-unique' },
        { ObjectId('itemC'), specialId: '789-this-is-unique' },
    ]
}
and items
//items
{
    _id: ObjectId('itemA'),
    specialId: '123-this-is-unique',
    owner: ObjectId('userObjId1')
}
One of my operations involves querying for users, given an array of specialIds.
In my items collection, the items' specialIds are indexed.
Which one would be better practice (and potentially perform better)?
A) Query the array of specialIds in the users collection using the $in operator.
Pros: The query stays within the same collection.
Cons: The itemArray itself in each user is not indexed; from my understanding this may affect performance.
B) Query the items collection, project the owner, and use it to run $lookup against the users collection.
Pros: Newer syntax; since specialId is already indexed in the items collection, it should perform better.
Cons: Needs to access two collections in one query.

It depends on how many users you have, and how many items each user has.
Plan A will work well if you have a small number of users (dozens, perhaps hundreds), or if you can create an index on {"itemArray.specialId":1}.
Plan B will use the index on specialId to select the items, and then the _id index in the users collection during the lookup, which should perform fairly well.
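The two indexed steps of Plan B can be sketched with in-memory stand-ins. This is plain Python rather than MongoDB code: the dictionaries play the role of the specialId and _id indexes, and the sample documents (including the name field and the second user) are made up for illustration.

```python
# In-memory stand-ins for the two collections (sample data is made up).
users = {
    "userObjId1": {"_id": "userObjId1", "name": "first user"},
    "userObjId2": {"_id": "userObjId2", "name": "second user"},
}
items = [
    {"_id": "itemA", "specialId": "123-this-is-unique", "owner": "userObjId1"},
    {"_id": "itemB", "specialId": "456-this-is-unique", "owner": "userObjId1"},
    {"_id": "itemD", "specialId": "999-this-is-unique", "owner": "userObjId2"},
]

# Stand-in for the index on items.specialId: one dict lookup per value.
items_by_special_id = {item["specialId"]: item for item in items}


def users_for_special_ids(special_ids):
    """Plan B: find items via the specialId index, then owners via _id."""
    owner_ids = []
    for special_id in special_ids:
        item = items_by_special_id.get(special_id)
        if item is not None and item["owner"] not in owner_ids:
            owner_ids.append(item["owner"])
    # Second step mirrors the $lookup on users._id (also an indexed lookup).
    return [users[owner_id] for owner_id in owner_ids]


found = users_for_special_ids(["123-this-is-unique", "456-this-is-unique"])
```

Neither step scans a whole collection, which is the property Plan A only gets if an index on itemArray.specialId is added.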

Neo4j - How to use createRelationshipIndex / createNodeIndex in full-text search

So I understand that Neo4j 3.5 and above implements full-text search in Cypher queries via createNodeIndex(), e.g.:
CALL db.index.fulltext.createNodeIndex("myIndex", ["PersonNode"], ["name"])
where myIndex is an arbitrary name I make up for the index, PersonNode is the name of my node label, and name is one of the attributes of PersonNode on which I want the full-text search performed.
And to actually perform the search by name, I can do something like the following:
CALL db.index.fulltext.queryNodes("myIndex", "Charlie")
But now assume that PersonNode has a relationship of type PURCHASED_ITEM, which is connected to another node label ProductNode as follows:
PersonNode-[:PURCHASED_ITEM]->ProductNode
And assume further that ProductNode has an attribute called productTitle indicating the display title name for each product.
My question is, I would like to set up an index for this relationship (using, presumably, createRelationshipIndex()), and perform a full-text search by productTitle and return a list of all PersonNode that purchased the given product. How can I do this?
Addendum: I understand that the above could be done by first getting a list of all ProductNode instances matching the given title, then performing a normal Cypher query to extract all related PersonNode instances. I also understand that for the above example, a normal Cypher query would be all that I need. But the reason I'm asking this question is that I eventually need to implement a single search bar that lets the user input any text, including possible misspellings, and performs a search across multiple attributes and/or relationships of PersonNode, with the results sorted by some kind of relevance score. To do this, I feel I first need to grasp exactly how relationship queries work in Neo4j.

Here is an example of how to create a full-text index for the productTitle property of PURCHASED_ITEM relationships:
CALL db.index.fulltext.createRelationshipIndex("myRelIndex", ["PURCHASED_ITEM"], ["productTitle"])
And here is a snippet showing the use of that index:
CALL db.index.fulltext.queryRelationships("myRelIndex", "Hula Hoop") YIELD relationship, score
...

productTitle is a property of the ProductNode, not of the PURCHASED_ITEM relationship.
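Following the comment above, one possible arrangement is to put the full-text index on ProductNode.productTitle and then walk the PURCHASED_ITEM relationship back to the buyers. This is an untested sketch using the Neo4j 3.5-style procedures named in the question. The index name productIndex is hypothetical, and the Python wrapper only holds the statements as strings.

```python
# Cypher statements kept as plain strings; nothing here talks to a database.
# "productIndex" is a hypothetical index name chosen for this sketch.
CREATE_INDEX = (
    'CALL db.index.fulltext.createNodeIndex('
    '"productIndex", ["ProductNode"], ["productTitle"])'
)

# Search products by title, then follow PURCHASED_ITEM back to the buyers.
FIND_BUYERS = """
CALL db.index.fulltext.queryNodes("productIndex", $title) YIELD node, score
MATCH (person:PersonNode)-[:PURCHASED_ITEM]->(node)
RETURN person, node.productTitle AS productTitle, score
ORDER BY score DESC
"""

# Query parameter: the text typed into the search bar.
PARAMS = {"title": "Hula Hoop"}
```

Because the score comes from the full-text match on the product, every buyer of the same product receives the same score. Ranking across several attributes would need additional indexes or a combined query.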

MongoDB sorting order and unique fields

I'm using a mongoose schema like this one:
{
    numero: {
        type: Number,
        required: true,
        unique: true
    },
    capacidad: {
        type: Number,
        required: true
    }
}
When I retrieve the collection's documents (e.g. using Model.find({})), I get the documents sorted by _id.
My questions are:
Does MongoDB create an index to handle the unique: true requirement but not use it as the default sorting mechanism?
If I do Model.find({}).sort("numero"), does this use the index created for uniqueness, or must I build another one for my query?
If I define my own index (schema.index({ numero: 1 })), am I duplicating work?
In summary, what are the best practices for keeping a collection sorted for querying?

First, let's note that you are asking more specifically about how Mongoose does things as a mediator with MongoDB. What I mean by this is that, in a Mongoose schema, defining something like this:
numero: {
    type: Number,
    required: true,
    unique: true
}
actually means that (by using unique: true) you ARE creating a unique index on the field numero; the index: true part is optional as per the documentation.
So you do not need to create another index.
That should also answer your question about Model.find({}).sort("numero") using the index.
As far as best practices go, you should review the way you are querying your data and, very importantly, what type of data you have, in order to figure out what kind of index you need. For example, if you have lat/lng data you should probably be using a geospatial index.
You also do not want to go crazy with indexing, since that brings other issues as well.
Another very important tool for reviewing what MongoDB is doing is the explain operator, which gives you information on the query plan.
You should use it often to analyze and optimize your queries and figure out where your bottlenecks are.
You can also view your collection statistics via
db.getCollection('your-collection-name').stats()
That gives you good information on the current indexes on the collection, their size, etc.
Hope this helps.
