GraphDB: Node as a property of a relationship

I am trying to analyze graphDB as an alternative to RDBMS for a problem domain. Here is the analogy of a problem I am trying to solve.
P:Michael and P:Angela r:like_to_eat G:Apple and G:Bread. G:Apple and G:Bread are r:available_in S:Walmart and S:Whole Foods.
So far it's straightforward. Here is an image that I think best expresses the graph.
The problem arises when I try to specify that Angela likes Apples from Whole Foods and Bread from Walmart, while Michael likes to eat Apples from Walmart and Bread from Whole Foods. How can I represent something like that in a graph? It sounds like I need the concept of a hypergraph to solve this problem, but I have also heard that any hypergraph problem can be solved with a property graph too. Can this be solved using standard graph solutions like Neo4j or CosmosDB? Can someone please help me with this?

You can "reify" the 3-way relationship (between Person, Grocery, and Store) in a Preference node (say), resulting in a data model like this:
In neo4j, you can use this Cypher query to represent "Angela likes Apples from Whole Foods and Bread from Walmart":
MERGE (angela:Person {name: 'Angela'})
MERGE (apple:Grocery {name: 'Apple'})
MERGE (bread:Grocery {name: 'Bread'})
MERGE (wf:Store {name: 'Whole Foods'})
MERGE (wm:Store {name: 'Walmart'})
CREATE
(angela)-[:LIKES]->(pref1:Preference),
(pref1)-[:ITEM]->(apple),
(pref1)-[:AT_STORE]->(wf),
(angela)-[:LIKES]->(pref2:Preference),
(pref2)-[:ITEM]->(bread),
(pref2)-[:AT_STORE]->(wm)
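To see why the reified node answers the question, here is a minimal in-memory sketch in Python (not Neo4j code). Each `Preference` record plays the role of one Preference node with its three relationships; the class and the `stores_for` helper are hypothetical names used only for illustration, and Michael's preferences are taken from the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Preference:
    """One reified 3-way fact: person LIKES item AT_STORE store."""
    person: str
    item: str
    store: str


# The four facts from the question, one record per Preference node.
preferences = [
    Preference("Angela", "Apple", "Whole Foods"),
    Preference("Angela", "Bread", "Walmart"),
    Preference("Michael", "Apple", "Walmart"),
    Preference("Michael", "Bread", "Whole Foods"),
]


def stores_for(person, item):
    """Return the stores from which `person` likes `item`."""
    return sorted(
        p.store for p in preferences if p.person == person and p.item == item
    )
```

Because every preference is its own record, two people can like the same item from different stores without the facts interfering with each other.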

Another alternative is to store the store ("Whole Foods" in "Angela likes Apples from Whole Foods") as a property of the "likes to eat" edge, which makes this a true "property graph". Here is the data model:
In Nebula Graph (a graph database solution), you can use the following nGQL statements for the modelling:
// Define the schema
create tag person(name string)
create tag grocery(name string)
create tag store(name string)
create edge likes(storeID int)
create edge sells()
// Insert the vertices
INSERT VERTEX person(name) VALUES 100:("Michael");
INSERT VERTEX person(name) VALUES 101:("Angela");
INSERT VERTEX grocery(name) VALUES 200:("Apple");
INSERT VERTEX grocery(name) VALUES 201:("Bread");
INSERT VERTEX store(name) VALUES 300:("Walmart");
INSERT VERTEX store(name) VALUES 301:("Whole Foods");
// Insert the edges
INSERT EDGE likes(storeID) VALUES 101->200:(301);
INSERT EDGE likes(storeID) VALUES 101->201:(300);
INSERT EDGE sells() VALUES 300->200:();
INSERT EDGE sells() VALUES 300->201:();
INSERT EDGE sells() VALUES 301->200:();
INSERT EDGE sells() VALUES 301->201:();
To find which store's apples Angela likes:
> GO FROM 101 OVER likes WHERE likes._dst == 200 YIELD likes.storeID AS storeID | FETCH PROP ON store $-.storeID
To find the groceries that Angela likes at Walmart (counting the rows gives how many):
> GO FROM 101 OVER likes WHERE likes.storeID == 300
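For comparison, the same edge-property model can be sketched in plain Python. This is only an in-memory illustration of the two lookups above, reusing the vertex IDs from the nGQL statements; the dictionaries and function names are hypothetical.

```python
# Vertex ID -> name, using the same IDs as the INSERT VERTEX statements.
vertices = {
    100: "Michael", 101: "Angela",
    200: "Apple", 201: "Bread",
    300: "Walmart", 301: "Whole Foods",
}

# likes edges: (person ID, grocery ID) -> storeID property on the edge.
likes = {
    (101, 200): 301,
    (101, 201): 300,
}


def store_for(person_id, grocery_id):
    """Follow the likes edge, then resolve its storeID to a store name."""
    store_id = likes.get((person_id, grocery_id))
    return None if store_id is None else vertices[store_id]


def groceries_liked_at(person_id, store_id):
    """List groceries the person likes whose edge carries this storeID."""
    return sorted(
        vertices[dst]
        for (src, dst), sid in likes.items()
        if src == person_id and sid == store_id
    )
```

Note that in this sketch the store is referenced only by the integer stored on the edge, so resolving it takes a second lookup, much like the `FETCH PROP` step in the first query.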
Hope that helps.

Related

Traverse graph database from random seed nodes

I am tasked with writing a query for a front-end application that visualizes a Neptune graph database. Say there are two main kinds of vertices: items and users. A user can create an item. There are also item-to-item relationships that show items derived from another item, as in the case of media clips cut out of an original media clip. The first set of items created should be stored in a vertex such as a SERVER, which they are grouped by in the UI.
The following is the requirement:
Find (Y) seed nodes that are not connected by any ITEM-ITEM relationships on the graph (relationships via USERs etc... are fine)
Populate the graph with all relationships from these (Y) seed nodes with no limits on the relationships that are followed (relationships through USERs for example is fine).
Stop populating the graph once the number of nodes (not records limit) hits the limit specified by (X)
Here is a visual representation of the graph.
https://drive.google.com/file/d/1YNzh4wbzcdC0JeloMgD2C0oS6MYvfI4q/view?usp=sharing
Sample code to reproduce this graph is below. The real graph could be deeper; this is just a simple example (see the diagram):
g.addV('SERVER').property(id, 'server1')
g.addV('SERVER').property(id, 'server2')
g.addV('ITEM').property(id, 'item1')
g.addV('ITEM').property(id, 'item2')
g.addV('ITEM').property(id, 'item3')
g.addV('ITEM').property(id, 'item4')
g.addV('ITEM').property(id, 'item5')
g.addV('USER').property(id, 'user1')
g.V('item1').addE('STORED IN').to(g.V('server1'))
g.V('item2').addE('STORED IN').to(g.V('server2'))
g.V('item2').addE('RELATED TO').to(g.V('item1'))
g.V('item3').addE('DERIVED FROM').to(g.V('item2') )
g.V('item3').addE('CREATED BY').to(g.V('user1'))
g.V('user1').addE('CREATED').to(g.V('item4'))
g.V('item4').addE('RELATED TO').to(g.V('item5'))
The result should be in the form below if possible:
[
[
{
"V1": {},
"E": {},
"V2": {}
}
]
]
We have an API with an endpoint that allows for open-ended gremlin queries. We call this endpoint in our client app to fetch the data that is rendered visually. I have written a query that I do not think is quite right. Moreover, I would like to know how to filter the number of nodes traversed and stop at X nodes.
g.V().hasLabel('USER','SERVER').sample(5).aggregate('v1').repeat(__.as('V1').bothE().dedup().as('E').otherV().hasLabel('USER','SERVER').as('V2').aggregate('x').by(select('V1', 'E', 'V2'))).until(out().count().is(0)).as('V1').bothE().dedup().as('E').otherV().hasLabel(without('ITEM')).as('V2').aggregate('x').by(select('V1', 'E', 'V2')).cap('v1','x','v1').coalesce(select('x').unfold(),select('v1').unfold().project('V1'))
I would appreciate if I can get a single query that will fetch this dataset if it is possible. If vertices in the result are not connected to anything, I would want to retrieve them and render them like that on the UI.
I have looked at this again and came up with this query
g.V().hasLabel(without('ITEM')).sample(2).aggregate('v1').
repeat(__.as('V1').bothE().dedup().as('E').otherV().as('V2').
aggregate('x').by(select('V1', 'E', 'V2'))).
until(out().count().is(0)).
as('V1').bothE().dedup().as('E').otherV().as('V2').
aggregate('x').
by(select('V1', 'E', 'V2')).
cap('v1','x','v1').
coalesce(select('x').unfold(),select('v1').unfold().project('V1')).limit(5)
To meet the criterion of a node count rather than a record count (or limit), I can pass to limit half the number the user supplies as the node count, and then exclude the edge E and vertex V2 of the last record from what is rendered on the UI.
I would appreciate any suggestions on a better way.
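To make the node-count requirement concrete, here is a rough Python sketch of the traversal logic on the sample graph. It is not a Gremlin query and not a tested Neptune solution. It assumes that "seed nodes" means vertices that are not an endpoint of any ITEM-ITEM edge, and the names `seed_nodes` and `expand` are made up for the illustration. Picking Y random seeds could be done with `random.sample` on the result of `seed_nodes()`.

```python
from collections import deque

# Vertex ID -> label, and (from, edge label, to) triples from the sample graph.
LABELS = {
    "server1": "SERVER", "server2": "SERVER", "user1": "USER",
    "item1": "ITEM", "item2": "ITEM", "item3": "ITEM",
    "item4": "ITEM", "item5": "ITEM",
}
EDGES = [
    ("item1", "STORED IN", "server1"),
    ("item2", "STORED IN", "server2"),
    ("item2", "RELATED TO", "item1"),
    ("item3", "DERIVED FROM", "item2"),
    ("item3", "CREATED BY", "user1"),
    ("user1", "CREATED", "item4"),
    ("item4", "RELATED TO", "item5"),
]


def seed_nodes():
    """Vertices that are not an endpoint of any ITEM-ITEM edge (assumption)."""
    linked = set()
    for src, _, dst in EDGES:
        if LABELS[src] == "ITEM" and LABELS[dst] == "ITEM":
            linked.update((src, dst))
    return sorted(v for v in LABELS if v not in linked)


def expand(seeds, max_nodes):
    """Breadth-first expansion that stops adding vertices at max_nodes."""
    adjacency = {}
    for edge in EDGES:
        adjacency.setdefault(edge[0], []).append(edge)
        adjacency.setdefault(edge[2], []).append(edge)

    visited = list(seeds[:max_nodes])  # keeps discovery order
    seen = set(visited)
    queue = deque(visited)
    records, emitted = [], set()

    while queue:
        node = queue.popleft()
        for edge in adjacency.get(node, []):
            src, label, dst = edge
            other = dst if src == node else src
            if other not in seen:
                if len(seen) >= max_nodes:
                    continue  # node budget used up: skip edges to new vertices
                seen.add(other)
                visited.append(other)
                queue.append(other)
            if edge not in emitted:
                emitted.add(edge)
                records.append({"V1": src, "E": label, "V2": dst})
    return visited, records
```

The budget counts vertices, not records, so edges between already collected vertices are still returned after the limit is reached, and seeds without any edges stay in `visited` so they can be rendered on their own.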

How to check that a cycle exists in a Neo4j database

I'm trying to learn Neo4j and graph databases, using a test setup where I'm representing users who want to trade fruits.
I'm trying to find a situation where there exists a "3 person trade", or a direct cycle between 3 or more persons in the system.
This is the scenario I'm trying to store:
userA has apples , wants cherries
userB has bananas, wants apples
userC has cherries , wants bananas
So a trade is possible in the above scenario if all 3 parties are involved in the trade. I need a query that will return the names of the traders/persons.
I need help representing this and writing the code to solve this query. For the scenario, this is the Cypher I'm using:
(userA)-[:has]->(apples), (userA)-[:wants]->(cherries),
(userB)-[:has]->(bananas), (userB)-[:wants]->(apples),
(userC)-[:has]->(cherries), (userC)-[:wants]->(bananas)
I also tried using this:
find the group in Neo4j graph db, but that query didn't work.
Thanks for any info that can help!
The initial approach would be something like this:
MATCH (userA:User)
WHERE (userA)-[:WANTS]->() AND (userA)-[:HAS]->()
MATCH (userA)-[:WANTS]->()<-[:HAS]-(userB)-[:WANTS]->()<-[:HAS]-(userC)-[:WANTS]->()<-[:HAS]-(userA)
RETURN DISTINCT userA, userB, userC
That said, you may need to adjust this based on how big your graph is, and how fast the query runs on your graph.
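The pattern in that MATCH can be checked by hand with a small Python sketch. This is an illustration only, assuming each user has exactly one fruit and wants exactly one fruit as in the scenario; `three_way_trades` is a hypothetical helper, not part of any Neo4j API.

```python
# The scenario from the question: what each user has and wants.
has = {"userA": "apples", "userB": "bananas", "userC": "cherries"}
wants = {"userA": "cherries", "userB": "apples", "userC": "bananas"}


def three_way_trades(has, wants):
    """Find groups of 3 distinct users where a wants what b has,
    b wants what c has, and c wants what a has."""
    owners = {}
    for user, fruit in has.items():
        owners.setdefault(fruit, set()).add(user)

    trades = set()
    for a in has:
        for b in owners.get(wants.get(a), ()):
            for c in owners.get(wants.get(b), ()):
                closes_cycle = a in owners.get(wants.get(c), ())
                if closes_cycle and len({a, b, c}) == 3:
                    # frozenset collapses the three rotations of one cycle
                    trades.add(frozenset((a, b, c)))
    return trades
```

The same cycle is found once per starting user, which is why the Cypher query uses `RETURN DISTINCT` and why the sketch stores each group as a set.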

Neo4j apply relationship multiple times in a match-query

I have two nodes in the database, arrival_airport and departure_airport, and one relationship between the two airports.
So, when I want to select all flights between 2 destinations (BOJ->SFX), I do the following:
MATCH (da:Departure_Airport {airport:'BOJ'})-[f:FlightInfo]->(aa:Arrival_Airport {airport: 'SFX'})
RETURN f, da, aa
The question is, how can I apply FlightInfo multiple times in order to also get all flights with several legs (for example: BOJ->FRA->SFX)?
Maybe the query should look similar to this one (with an asterisk):
MATCH (da:Departure_Airport {airport:'BOJ'})-[f:FlightInfo]*->(aa:Arrival_Airport {airport: 'SFX'})
RETURN f, da, aa
UPDATE - Solution
Thanks for all the answers and comments. I had to create the relationships between airports properly. My query for the airport import and the automatic creation of relationships (flights) looks as follows:
USING PERIODIC COMMIT 1000
LOAD CSV FROM "file:///airports.csv" AS line FIELDTERMINATOR ";"
MERGE (departure_airport: Airport {name:line[0]})
MERGE (arrival_airport: Airport {name: line[1]})
MERGE (departure_airport)-[f:Flight {departure_time:line[2], arrival_time:line[3], carrier_code:line[4], service_class:line[5], overall_conti:line[6]}]-(arrival_airport)
ON CREATE SET departure_airport.name=line[0],arrival_airport.name=line[1], f.departure_time=line[2], f.arrival_time=line[3], f.carrier_code=line[4]
As a result, you are able to match flights as described in the answer below.
Of course I don't know all your requirements, but I assume a slightly adapted graph model works better for you. It could be easier if the airport type (arrival / departure) is determined by the incoming or outgoing relationship to another airport or flight, rather than by the node or its label. Therefore I'd like to suggest changing your graph model in the following way:
CREATE
(boj:Airport {name: 'BOJ'}),
(sfx:Airport {name: 'SFX'}),
(fra:Airport {name: 'FRA'})
CREATE
(boj)-[:FLIGHT_INFO]->(sfx),
(boj)-[:FLIGHT_INFO]->(fra),
(fra)-[:FLIGHT_INFO]->(sfx);
Your desired query would be in this case:
MATCH
flightPaths = (departure:Airport {name: 'BOJ'})-[:FLIGHT_INFO*]->(arrival:Airport {name: 'SFX'})
RETURN DISTINCT
flightPaths;
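As a rough illustration of what the variable-length match enumerates on this three-airport model, here is a small Python sketch. The `flights` dictionary mirrors the CREATE statement above; `flight_paths` is a hypothetical helper, and the rule that a route never visits the same airport twice is a choice made for this sketch.

```python
# Outgoing FLIGHT_INFO relationships per airport, as in the CREATE statement.
flights = {
    "BOJ": ["SFX", "FRA"],
    "FRA": ["SFX"],
    "SFX": [],
}


def flight_paths(flights, origin, destination, path=None):
    """Depth-first search for every route from origin to destination."""
    path = (path or []) + [origin]
    if origin == destination:
        return [path]
    found = []
    for next_airport in flights.get(origin, []):
        if next_airport not in path:  # sketch-only rule: no repeated airports
            found.extend(flight_paths(flights, next_airport, destination, path))
    return found
```

Calling `flight_paths(flights, "BOJ", "SFX")` yields both the direct flight and the route via FRA, which is the result the question asks for.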

Neo4J: How do I check each disjoint subgraph in a Neo4J query?

After I query through my database using Neo4J, I get a bunch of disjoint subgraphs like 'islands of nodes'.
What I want though is to get the most recent node for each 'island' (I have date values on each node).
How do I go about doing that?
First you need to compute your islands, as you said.
To do that, you can use the neo4j-graph-algo library with the procedure algo.unionFind: https://neo4j-contrib.github.io/neo4j-graph-algorithms/#_community_detection_connected_components
Then, for each island, order the nodes by date and take the first one.
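The two steps can be sketched in plain Python. This is a hand-written union-find for illustration, not the `algo.unionFind` procedure itself; the function name, the node names, and the dates in the usage note are all made up.

```python
from datetime import date


def latest_per_island(nodes, edges):
    """nodes: name -> date, edges: (name, name) pairs.
    Returns the most recent node of each connected component."""
    parent = {n: n for n in nodes}

    def find(n):
        # Walk up to the root, shortening the path along the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    # Step 1: merge the components joined by each edge.
    for a, b in edges:
        parent[find(a)] = find(b)

    # Step 2: keep the node with the latest date per component root.
    latest = {}
    for name, when in nodes.items():
        root = find(name)
        if root not in latest or when > nodes[latest[root]]:
            latest[root] = name
    return sorted(latest.values())
```

For example, with nodes `a` (2020-01-01), `b` (2021-05-01), `c` (2019-03-03), `d` (2022-02-02), `e` (2018-01-01) and edges `a-b` and `c-d`, there are three islands and the function returns `b`, `d` and `e`.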

Simple search by value?

I would like to store some information as follows (note, I'm not wedded to this data structure at all, but this shows you the underlying information I want to store):
{ user_id: 12345, page_id: 2, country: 'DE' }
In these records, user_id is a unique field, but the page_id is not.
I would like to translate this into a Redis data structure, and I would like to be able to run efficient searches as follows:
For user_id 12345, find the related country.
For page_id 2, find all related user_ids and their countries.
Is it actually possible to do this in Redis? If so, what data structures should I use, and how should I avoid the possibility of duplicating records when I insert them?
It sounds like you need two key types: a HASH key to store your user's data, and a LIST for each page that contains a list of related users. Below is an example of how this could work.
Load Data:
> RPUSH page:2:users 12345
> HMSET user:12345 country DE key2 value2
Pull Data:
# All users for page 2
> LRANGE page:2:users 0 -1
# All users for page 2 and their countries
> SORT page:2:users By nosort GET # GET user:*->country GET user:*->key2
Remove User From Page:
> LREM page:2:users 0 12345
Repeat GETs in the SORT to retrieve additional values for the user.
I hope this helps, let me know if there's anything you'd like clarified or if you need further assistance. I also recommend reading the commands list and documentation available at the redis web site, especially concerning the SORT operation.
Since user_id is unique and each user has a single country, keep them as a simple key-value pair. Querying for a user is O(1) in that case. Then keep one Redis set per page_id, with the user_ids as its members.
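To show how that layout avoids duplicates, here is a small Python sketch that uses ordinary dictionaries and sets as stand-ins for the Redis keys. It does not talk to Redis; the function names are hypothetical, and in a real deployment the same steps would map to setting a key per user and adding the user_id to a set per page.

```python
user_country = {}  # stand-in for one key per user: user_id -> country
page_users = {}    # stand-in for one SET per page: page_id -> {user_id, ...}


def add_record(user_id, page_id, country):
    """Insert or overwrite a record; repeating the call changes nothing."""
    user_country[user_id] = country
    page_users.setdefault(page_id, set()).add(user_id)


def country_of(user_id):
    """For a user_id, find the related country."""
    return user_country.get(user_id)


def users_on_page(page_id):
    """For a page_id, find all related user_ids and their countries."""
    return {u: user_country[u] for u in sorted(page_users.get(page_id, ()))}
```

Because set membership is idempotent, inserting the same record twice leaves a single entry, which is the main difference from the LIST-based approach, where `RPUSH` would append the user_id again.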
