Exclude a vertex and its children in a traversal [Gremlin API] - graph-databases

In my graph, a product vertex can have a composed_of edge from another product vertex.
I am trying to exclude a vertex and its children (connected by a composed_of edge) when selecting all vertices with the label product.
Initially I have the id of the vertex to be excluded, but I don't know how to exclude it and its children when selecting all the product vertices in one query.
Seed db:
//add product vertex
g.addV('product').property('id', 'product1').property('pk', 'product1');
g.addV('product').property('id', 'product2').property('pk', 'product2');
g.addV('product').property('id', 'product3').property('pk', 'product3');
g.addV('product').property('id', 'product4').property('pk', 'product4');
g.addV('product').property('id', 'product5').property('pk', 'product5');
//add composed_of edge
g.V('product1').addE('composed_of').to(g.V('product2'))
g.V('product1').addE('composed_of').to(g.V('product3'))
Now I want to be able to select product4 and product5 by excluding product1 and its children connected by a composed_of edge.
Note: I'm sorry if these commands won't work in your Gremlin console; I first started learning Gremlin using Cosmos DB.

I think this is what you're looking for:
g.V().hasLabel('product').where(
  __.not(coalesce(
    hasId('product1'),
    __.in('composed_of').hasId('product1')
  ))
)
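To make the filter logic easier to follow, here is a minimal plain-Python sketch of what the traversal does against the seed data. The `products` list, the `composed_of` pair list and the `exclude_with_children` helper are hypothetical stand-ins for the graph, not part of any Gremlin driver. Like the traversal, it only looks at the direct composed_of parent, so deeper descendants would not be excluded.

```python
# Hypothetical in-memory copy of the seed data: product ids plus
# composed_of edges stored as (parent, child) pairs.
products = ["product1", "product2", "product3", "product4", "product5"]
composed_of = [("product1", "product2"), ("product1", "product3")]


def exclude_with_children(products, edges, excluded_id):
    """Return products that are neither excluded_id nor a direct child of it."""
    # Equivalent of __.in('composed_of').hasId(excluded_id): direct children only.
    children = {child for parent, child in edges if parent == excluded_id}
    return [p for p in products if p != excluded_id and p not in children]


remaining = exclude_with_children(products, composed_of, "product1")
print(remaining)  # ['product4', 'product5']
```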

Related

Recursively get all the vertices emerging from a group of vertices

I have a vertex label "group" and a group can have multiple "categories".
For example, a group named "food" can have multiple categories like "Seafood, Chinese, Indian" which are connected by an edge labelled "label1".
Now, a category can have further categories; for example, "Seafood" can have "fish", "prawns", and so on. The depth is arbitrary, and all these further categories are connected by edges labelled "label2".
food --label1--> seafood --label2--> fish --label2--> jellyfish --label2--> so on
                                          --label2--> starfish
                         --label2--> prawns
                         --label2--> crab
     --label1--> Indian
     --label1--> Chinese
I want to recursively traverse all the vertices and get the data.
I hope you understood the question. Please help me out.
It's as easy as:
g.V(food).out("label1").
emit().
repeat(out("label2"))
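For readers new to emit() placed before repeat(), here is a rough plain-Python sketch of the same idea: start from the label1 neighbours, emit each vertex as it is reached, then keep following label2 edges. The adjacency dictionaries are hypothetical sample data (placing starfish under fish is an assumption), and the visiting order here is breadth-first for illustration only; the order Gremlin returns may differ.

```python
from collections import deque

# Hypothetical sample data mirroring the example hierarchy.
label1 = {"food": ["seafood", "Indian", "Chinese"]}
label2 = {
    "seafood": ["fish", "prawns", "crab"],
    "fish": ["jellyfish", "starfish"],  # assumption: starfish sits under fish
}


def all_categories(group):
    """Collect every category reachable from a group, at any depth."""
    found = []
    queue = deque(label1.get(group, []))  # out("label1")
    while queue:
        current = queue.popleft()
        found.append(current)  # emit(): keep the vertex before going deeper
        queue.extend(label2.get(current, []))  # repeat(out("label2"))
    return found


categories = all_categories("food")
print(categories)
```

The sketch assumes the hierarchy is a tree; with cycles in the data you would need to track visited vertices.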

Creating a one-to-many relationship in Drupal 7

I have a content-type Team Members. Each Team Member has a Position. Each Position has a Category.
E.g.:
Category 1
    Position 1
    Position 2
Category 2
    Position 3
    Position 4
If you create a new piece of content using the Team Member content type, you should be able to choose one and only one Position.
I then want to render all the Team Members in the order of the example above: first the title of the first Category, then the title of the first Position, then all team members with that position, and so on.
Can anyone tell me what's the best solution for this?
It sounds like you need to:
create a multilevel taxonomy
add a term reference field to your Team Members content type (you can try the Simple Hierarchical Select module if you have a long or complex hierarchy)
use Views (possibly a combination of Views Tree and Views Field View) to display it the way you want

ArangoDB: Insert as function of query by example

Part of my graph is constructed using a giant join between two large collections, and I run it every time I add documents to either collection.
The query is based on an older post.
FOR fromItem IN fromCollection
FOR toItem IN toCollection
FILTER fromItem.fromAttributeValue == toItem.toAttributeValue
INSERT { _from: fromItem._id, _to: toItem._id, otherAttributes: {}} INTO edgeCollection
This takes about 55,000 seconds to complete for my dataset. I would absolutely welcome suggestions for making that faster.
But I have two related issues:
I need an upsert. Normally, upsert would be fine, but in this case, since I have no way of knowing the key up front, it wouldn't help me. To get the key up front, I would need to query by example to find the key of the otherwise identical, existing edge. That seems reasonable as long as it doesn't kill my performance, but I don't know how in AQL to construct my query conditionally so that it inserts an edge if the equivalent edge does not exist yet, but does nothing if the equivalent edge does exist. How can I do this?
I need to run this every time data gets added to either collection. I need a way to run this only on the newest data so that it doesn't try to join the entire collection. How can I write AQL that allows me to join only the newly inserted records? They're added with Arangoimp, and I have no guarantees on which order they'll be updated in, so I cannot create the edges at the same time as I create the nodes. How can I join only the new data? I don't want to spend 55k seconds every time a record is added.
If you run your query as written without any indexes, then it will have to do two nested full collection scans, as can be seen by looking at the output of
db._explain(<your query here>);
which shows something like:
 1   SingletonNode                1   * ROOT
 2   EnumerateCollectionNode      3     - FOR fromItem IN fromCollection   /* full collection scan */
 3   EnumerateCollectionNode      9       - FOR toItem IN toCollection   /* full collection scan */
 4   CalculationNode              9         - LET #3 = (fromItem.`fromAttributeValue` == toItem.`toAttributeValue`)   /* simple expression */   /* collections used: fromItem : fromCollection, toItem : toCollection */
 5   FilterNode                   9         - FILTER #3
...
If you do
db.toCollection.ensureIndex({ type: "hash", fields: ["toAttributeValue"], unique: false });
then there will be a single full collection scan in fromCollection, and for each item found there is a hash lookup in toCollection, which will be much faster. Everything will happen in batches, so this should already improve the situation. The db._explain() output will show this:
 1   SingletonNode                1   * ROOT
 2   EnumerateCollectionNode      3     - FOR fromItem IN fromCollection   /* full collection scan */
 8   IndexNode                    3       - FOR toItem IN toCollection   /* hash index scan */
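To illustrate why the index helps, here is a small plain-Python sketch contrasting the two plans: a nested scan that compares every pair, and a lookup through a dictionary that plays the role of the hash index on toAttributeValue. The sample documents and both helper functions are hypothetical; this is an analogy for the query plans, not ArangoDB code.

```python
from collections import defaultdict

# Hypothetical sample documents.
from_items = [
    {"_id": "from/1", "fromAttributeValue": "a"},
    {"_id": "from/2", "fromAttributeValue": "b"},
    {"_id": "from/3", "fromAttributeValue": "z"},
]
to_items = [
    {"_id": "to/1", "toAttributeValue": "a"},
    {"_id": "to/2", "toAttributeValue": "a"},
    {"_id": "to/3", "toAttributeValue": "b"},
]


def nested_scan_join(from_items, to_items):
    """Two nested full scans: compares every from/to pair."""
    return [
        (f["_id"], t["_id"])
        for f in from_items
        for t in to_items
        if f["fromAttributeValue"] == t["toAttributeValue"]
    ]


def indexed_join(from_items, to_items):
    """One scan of from_items plus a hash lookup per item."""
    index = defaultdict(list)  # stands in for the hash index on toAttributeValue
    for t in to_items:
        index[t["toAttributeValue"]].append(t["_id"])
    return [
        (f["_id"], to_id)
        for f in from_items
        for to_id in index.get(f["fromAttributeValue"], [])
    ]


print(indexed_join(from_items, to_items))
```

Both functions return the same pairs; the second one does work proportional to the two collection sizes plus the number of matches, instead of their product.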
To only work on recently inserted items in fromCollection is relatively easy: Simply add a timestamp of the import time to all vertices, and use:
// @lastRun is a bind parameter holding the timestamp of the previous run
FOR fromItem IN fromCollection
  FILTER fromItem.timeStamp > @lastRun
  FOR toItem IN toCollection
    FILTER fromItem.fromAttributeValue == toItem.toAttributeValue
    INSERT { _from: fromItem._id, _to: toItem._id, otherAttributes: {} } INTO edgeCollection
and of course put a skiplist index on the timeStamp attribute in fromCollection.
This should work beautifully to discover new vertices in the fromCollection. It will "overlook" new vertices in the toCollection that are linked to old vertices in fromCollection.
You can discover these by interchanging the roles of fromCollection and toCollection in your query (do not forget the index on fromAttributeValue in fromCollection) and remembering to only insert edges if the from vertex is old, as in:
// @lastRun is a bind parameter holding the timestamp of the previous run
FOR toItem IN toCollection
  FILTER toItem.timeStamp > @lastRun
  FOR fromItem IN fromCollection
    FILTER fromItem.fromAttributeValue == toItem.toAttributeValue
    FILTER fromItem.timeStamp <= @lastRun
    INSERT { _from: fromItem._id, _to: toItem._id, otherAttributes: {} } INTO edgeCollection
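As a sanity check on the two-pass idea, here is a plain-Python sketch of the same logic on hypothetical in-memory documents. The `incremental_edges` helper and the sample data are made up for illustration; the point is that the second pass only pairs new to-items with old from-items, so no edge is produced twice.

```python
# Hypothetical documents; timeStamp is the import time, as suggested above.
from_items = [
    {"_id": "from/old", "fromAttributeValue": "a", "timeStamp": 1},
    {"_id": "from/new", "fromAttributeValue": "a", "timeStamp": 5},
]
to_items = [
    {"_id": "to/old", "toAttributeValue": "a", "timeStamp": 1},
    {"_id": "to/new", "toAttributeValue": "a", "timeStamp": 5},
]


def incremental_edges(from_items, to_items, last_run):
    """Return (_from, _to) pairs for edges that involve at least one new document."""
    edges = []
    # Pass 1: new from-items joined against every to-item (old or new).
    for f in from_items:
        if f["timeStamp"] > last_run:
            for t in to_items:
                if f["fromAttributeValue"] == t["toAttributeValue"]:
                    edges.append((f["_id"], t["_id"]))
    # Pass 2: new to-items joined against old from-items only,
    # so pairs already produced by pass 1 are not repeated.
    for t in to_items:
        if t["timeStamp"] > last_run:
            for f in from_items:
                if (f["timeStamp"] <= last_run
                        and f["fromAttributeValue"] == t["toAttributeValue"]):
                    edges.append((f["_id"], t["_id"]))
    return edges


edges = incremental_edges(from_items, to_items, last_run=3)
print(edges)
```

With this data the result contains three pairs; the old/old pair is skipped because that edge would already exist from an earlier run.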
These two together should do what you want. Please find the fully worked example here.

Common vertex to two relationships in OrientDb

I am trying to pick up OrientDb and am trying out a few test queries to get a feel for the syntax and power of graph databases. In particular, I am having difficulty finding a common vertex that satisfies two independent relationships to two other, unrelated vertices.
For example, assume I have a Person vertex with a name attribute, a Car vertex with a model attribute, and a Location vertex with a zip attribute, with the following edge dependencies:
Person --- owns --> Car
Person --- lives --> Location
I am trying to find all the Person vertices that own a particular model of car and live at a particular zip.
I am not sure exactly what I am missing in my efforts, but any help would be greatly appreciated.
Thanks
Let's assume this domain model:
Car <--- Owns ---| Person |--- Lives ---> Location
 *  <--- Owns ---|1      *|--- Lives ---> 1
All the persons who own a particular car model:
select expand(in('Owns')) from Car where model = 'Volvo'
All the persons who live at a particular zip:
select expand(in('Lives')) from Location where zip = '10770'
Let's combine the above to get all the persons who own a particular car model and live at a particular zip:
select from (
  select expand(in('Owns')) from Car where model = 'Volvo'
) where out('Lives') contains (zip = '10770')
Minor:
The function expand() transforms the result set so that you get more than just the record IDs in the output.
Passing Owns/Lives to the in/out functions ensures that only the edges of class types Owns/Lives are traversed.
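If it helps to see the combined query as plain logic, here is a small Python sketch of the same two steps: first collect the owners of the given model (the inner select), then keep those whose location has the given zip (the outer where). The `owns` pairs, the `lives` mapping and the person names are hypothetical sample data, not OrientDB API calls.

```python
# Hypothetical sample data: Owns edges as (person, car model) pairs and
# Lives edges as a person -> zip mapping (each person lives in one location).
owns = [("alice", "Volvo"), ("bob", "Volvo"), ("carol", "Saab")]
lives = {"alice": "10770", "bob": "90210", "carol": "10770"}


def owners_living_at(model, zip_code):
    """Persons who own a car of the given model and live at the given zip."""
    # Inner query: expand(in('Owns')) from Car where model = ...
    owners = {person for person, car_model in owns if car_model == model}
    # Outer filter: out('Lives') contains (zip = ...)
    return sorted(p for p in owners if lives.get(p) == zip_code)


result = owners_living_at("Volvo", "10770")
print(result)  # ['alice']
```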

Solr Search Facets: How do i make them count products and NOT product varieties

The shop I'm working on sells clothing. Each item of clothing comes in multiple varieties. For example, Shirt A might come in: Red Large, Red Medium, Blue Large, Blue Medium, White Large, and White Medium.
At first I had added each variety as a solr doc. So for the above product I added 6 solr docs, each with the same Product ID. I got solr to group the results by Product ID and everything worked perfectly.
However, the facet counts were all variety counts and not product counts. For example, limiting it to just the one product above (say it were the only product in the system), the facet counts would show:
Red (2)
Blue (2)
White (2)
Which was correct, there were 2 documents added for each color. But really what i want to see is this:
Red (1)
Blue (1)
White (1)
As there is only 1 product for each color.
So now I'm thinking that in order to do that I need to make each Solr document a product.
In that case I would add the product and add the field "color" 3 times (one red, one blue, one white), and add the field "size" 3 times as well. But then Solr doesn't really know which size goes with which color. Maybe I only have white in small.
What is the correct way to go about this to make the facet counts as they should be?
It turns out I could do this using grouping (field collapsing), described here:
http://wiki.apache.org/solr/FieldCollapsing#Request_Parameters
specifically with these parameters added to the query:
group=true
group.field=product_id
group.limit=100
group.facet=true
group.ngroups=true
group.facet is the one that really makes the facets work with the groups the way I wanted them to.
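To show the difference between the two kinds of counts, here is a plain-Python sketch that counts colors once per document and once per distinct product_id. The `docs` list is hypothetical sample data for Shirt A; the code only mimics the counting idea and does not call Solr.

```python
from collections import Counter, defaultdict

# Hypothetical index content: one document per variety of a single product.
docs = [
    {"product_id": "shirt_a", "color": "Red", "size": "Large"},
    {"product_id": "shirt_a", "color": "Red", "size": "Medium"},
    {"product_id": "shirt_a", "color": "Blue", "size": "Large"},
    {"product_id": "shirt_a", "color": "Blue", "size": "Medium"},
    {"product_id": "shirt_a", "color": "White", "size": "Large"},
    {"product_id": "shirt_a", "color": "White", "size": "Medium"},
]

# Plain facet: counts documents (varieties) per color.
variety_counts = Counter(doc["color"] for doc in docs)

# Grouped facet: counts distinct product_id values per color.
products_by_color = defaultdict(set)
for doc in docs:
    products_by_color[doc["color"]].add(doc["product_id"])
product_counts = {color: len(ids) for color, ids in products_by_color.items()}

print(dict(variety_counts))  # {'Red': 2, 'Blue': 2, 'White': 2}
print(product_counts)        # {'Red': 1, 'Blue': 1, 'White': 1}
```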
I think that you have 2 options.
Option 1:
Once you get the list of facet values (Red, Blue & White in the given example), then fire the original query again with each facet value as a filter. For example, if the original query was q=xyz&group.field=ProductID then fire q=xyz&group.field=ProductID&group.ngroups=true&fq=color:Red. The ngroups value in the response will give you the required count for Red. Similarly, fire a separate query for Blue and White.
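A rough sketch of how the follow-up queries for Option 1 could be built, using only Python's standard library. The `count_query` helper is hypothetical, and the parameter names are taken from the example query above; `group=true` is added here as an assumption, since the example query string omits it and grouping is normally switched on with that parameter.

```python
from urllib.parse import urlencode


def count_query(base_query, color):
    """Build the query string that returns ngroups for one facet value."""
    params = {
        "q": base_query,
        "group": "true",  # assumption: not present in the example query above
        "group.field": "ProductID",
        "group.ngroups": "true",
        "fq": f"color:{color}",  # filter by one facet value
    }
    return urlencode(params)


queries = {color: count_query("xyz", color) for color in ["Red", "Blue", "White"]}
for color, query in queries.items():
    print(color, query)
```

Each query string would be sent to the select handler of your Solr core, and the ngroups value in each response is the product count for that color.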
Option 2:
Create a separate field called Product_Color which includes both the ProductID and the color. For example, if a product's ID is ABC123 and its color is Red, then Product_Color will be ABC123_Red. Now, to get the facets for color, fire a separate query which groups by Product_Color instead of ProductID, and you will get the required facets with the correct values. Remember to set group.truncate=true for this to work.
You can also try looking into Facet Pivot, which would allow you to have a single document and a tree-like facet with proper counts and filtering.
