How to store graph data in a database? - database

I am new to graphs and find them very interesting. This may be a beginner's question, but please cite some good materials.
I am trying to make a small social network where each user is a node and has an undirected connection with each of his friends.
It's working fine, but now I want to store it in a database.
How can I store the data? How do I store all the connected nodes (pointers) of a node?
Is it better to free the memory after the user logs out and read it from the database when he logs in, or should logging in and logging out have no impact on the node?
I know it's theoretical. Any references will be really helpful.

Use an actual graph database to store your data.
http://www.neo4j.org/
You can store key/value pairs in a node and you can also store edges that connect nodes.
Then you can use something like Gremlin to query/traverse the graph: https://github.com/tinkerpop/gremlin. See their documentation to download examples and run sample queries: https://github.com/tinkerpop/gremlin/wiki/Getting-Started
An idea of the syntax:
gremlin> // lets only take 'knows' labeled edges
gremlin> v.out('knows')
==>v[2]
==>v[4]
gremlin> // lets do a traversal from the '1' marko vertex to its outgoing edges.
gremlin> // in the property graph world, edges are first class citizens that can be traversed to.
gremlin> v.outE
==>e[7][1-knows->2]
==>e[9][1-created->3]
==>e[8][1-knows->4]

I'll start at the bottom.
Is it better to delete the memory after the user logs out and read it from the database when he logs in, or should logging in and logging out not have any impact on the node?
You will need some sort of permanent storage, or you will lose all the data you acquired on your first crash or restart, which might upset your users a bit.
How can I store the data?
Without knowing more about this it is difficult to say. However, assuming that you have a list of users and each user can have 0 or more friends, I would go with 2 tables:
Users - stores all your user information, such as username and password
UsersFriends - stores all the relationships in a UserID -> UserID fashion
Example
Users Table
UserID Username
1 user2511713
2 abstracthchaos
3 anotheruser
UsersFriends
UserID FriendUserID
1 3
2 3
1 2
This means user2511713 is friends with anotheruser and abstracthchaos, and abstracthchaos is friends with anotheruser. Depending on your business logic, it may also be useful to treat each row as implying the reverse relationship, so that 3 1 means the same as 1 3.
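The two-table layout above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration, assuming an in-memory SQLite database and a hypothetical helper named friends_of; the column types and the composite primary key are assumptions, not something the answer specifies. The helper treats each row as undirected by looking at both columns.

```python
import sqlite3

# In-memory database for illustration only; a real application would use a file or a server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Users (
    UserID   INTEGER PRIMARY KEY,
    Username TEXT NOT NULL UNIQUE
);
CREATE TABLE UsersFriends (
    UserID       INTEGER NOT NULL REFERENCES Users(UserID),
    FriendUserID INTEGER NOT NULL REFERENCES Users(UserID),
    PRIMARY KEY (UserID, FriendUserID)
);
""")
conn.executemany(
    "INSERT INTO Users (UserID, Username) VALUES (?, ?)",
    [(1, "user2511713"), (2, "abstracthchaos"), (3, "anotheruser")],
)
conn.executemany(
    "INSERT INTO UsersFriends (UserID, FriendUserID) VALUES (?, ?)",
    [(1, 3), (2, 3), (1, 2)],
)


def friends_of(conn, user_id):
    """Hypothetical helper: usernames of all friends, reading each row in both directions."""
    rows = conn.execute(
        """
        SELECT u.Username FROM UsersFriends f
        JOIN Users u ON u.UserID = f.FriendUserID
        WHERE f.UserID = ?
        UNION
        SELECT u.Username FROM UsersFriends f
        JOIN Users u ON u.UserID = f.UserID
        WHERE f.FriendUserID = ?
        """,
        (user_id, user_id),
    ).fetchall()
    return sorted(row[0] for row in rows)


print(friends_of(conn, 3))
```

Reading both columns at query time means each friendship is stored once; the alternative is to insert both (1, 3) and (3, 1) and query a single column.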

Related

How to check that a cycle exists in a Neo4j database

I'm trying to learn Neo4j and graph databases, using a test setup where I'm representing users who want to trade fruits.
I'm trying to find a situation where there exists a "3 person trade", or a direct cycle between 3 or more persons in the system.
This is the scenario I'm trying to store:
userA has apples, wants cherries
userB has bananas, wants apples
userC has cherries, wants bananas
So a trade is possible in the above scenario if all 3 parties are involved in the trade. I need a query that will return the names of the traders/persons.
I need help representing this and writing the code to solve this query. For the scenario, this is the Cypher I'm using:
(userA)-[r:has]->(apples) (userA)-[r:wants]->(cherries)
(userB)-[r:has]->(bananas) (userB)-[r:wants]->(apples)
(userC)-[r:has]->(cherries) (userC)-[r:wants]->(bananas)
I also tried the query from "find the group in Neo4j graph db", but that query didn't work.
Thanks for any info that can help!
The initial approach would be something like this:
MATCH (userA:User)
WHERE (userA)-[:WANTS]->() AND (userA)-[:HAS]->()
MATCH (userA)-[:WANTS]->()<-[:HAS]-(userB)-[:WANTS]->()<-[:HAS]-(userC)-[:WANTS]->()<-[:HAS]-(userA)
RETURN DISTINCT userA, userB, userC
That said, you may need to adjust this based on how big your graph is, and how fast the query runs on your graph.
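To make the pattern in the query easier to follow, here is a plain-Python sketch of the same check outside Neo4j. It assumes a simplified model in which has and wants are dictionaries mapping each user to a set of fruits; the function name three_way_trades is hypothetical. As in the Cypher pattern, a wants something b has, b wants something c has, and c wants something a has.

```python
from itertools import permutations

# Scenario from the question, as user -> set of fruits (assumed representation).
has = {"userA": {"apples"}, "userB": {"bananas"}, "userC": {"cherries"}}
wants = {"userA": {"cherries"}, "userB": {"apples"}, "userC": {"bananas"}}


def three_way_trades(has, wants):
    """Return each 3-person trade cycle once, as (a, b, c) where a receives from b,
    b receives from c, and c receives from a."""
    cycles = set()
    for a, b, c in permutations(has, 3):
        if wants[a] & has[b] and wants[b] & has[c] and wants[c] & has[a]:
            trio = (a, b, c)
            # Rotate so the smallest name comes first; this removes the
            # duplicates that RETURN DISTINCT alone would not collapse.
            i = trio.index(min(trio))
            cycles.add(trio[i:] + trio[:i])
    return sorted(cycles)


print(three_way_trades(has, wants))
```

This brute-force version checks every ordered triple, so it is only meant to illustrate the matching logic on small data, not to replace the database query.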

Nested traversal gremlin query for Titan db

I am wondering how it is possible to write a Gremlin query that returns results in a nested format. Suppose there is a property graph as follows:
USER and PAGE vertices with some properties such as AGE for USER vertex;
FOLLOW edge between USER and PAGE;
I am looking for a single efficient query that returns all users older than 20 together with all of the pages those users follow. I can do that with a simple loop on the application side, running a simple traversal query on each iteration. Unfortunately, such a solution is not efficient for me, since it generates lots of queries and the network latency could be huge.
Not sure what your definition of "efficient" is, but keep in mind that this is a typical OLAP use-case and you shouldn't expect fast OLTP realtime responses.
That said, the query should be as simple as:
g.V().has("USER", "AGE", gt(20)).as("user").
map(out("FOLLOW").fold()).as("pages").
select("user", "pages")
A small example using the modern sample graph:
gremlin> g = TinkerFactory.createModern().traversal().withComputer()
==>graphtraversalsource[tinkergraph[vertices:6 edges:6], graphcomputer]
gremlin> g.V().has("person", "age", gt(30)).as("user").
map(out("created").fold()).as("projects").
select("user","projects")
==>[user:v[6], projects:[v[3]]]
==>[user:v[4], projects:[v[5], v[3]]]
This is very easy:
g.V().hasLabel('user').has('age',gt(20))
.match(__.as('user').out('follows').as('page'))
.select('user','page')
Just be aware that when you run this query in the Gremlin console, it may give you a null pointer exception. You can use it from application code instead and check whether 'page' exists before reading it.

Large Neo4j graph not showing up

I created a large neo4j graph connecting users to the videos they watch like user -> video in a social graph or network type of graph. There are about 9000 user nodes and 20000 video nodes.
If I try:
MATCH (u)-[:VIEW]->(v)
RETURN u,v
The graph says "Displaying 300000 nodes, 0 relationships." No graph nor relationships nor nodes are showing up.
If I try:
MATCH (u)-[:VIEW]->(v)
RETURN u,v
LIMIT 1000
The graph says "Displaying 1000 nodes, 1000 relationships (completed with 1000 additional relationships)." All graph and relationships and nodes show up.
If I try:
MATCH (u)-[:VIEW]->(v)
RETURN u,v
LIMIT 10000
No graph nor relationships nor nodes show up.
Is the first graph too large to show? How can I get it to show up?
Thank you in advance.
Are you doing this in the web console? I suspect when you do the LIMIT 10000 that the result is just too big to be handled in the web browser. I'm actually a bit surprised that 1000 showed up (again, if you're in the web console).
What are you trying to get? If you want to get a table you can do this (I'm making up properties here):
MATCH (u)-[:VIEW]->(v)
RETURN u.username,v.title
If you want something else, then I'd need more information ;)

How to query for multiple vertices and counts of their relationships in Gremlin/Tinkerpop 3?

I am using Gremlin/Tinkerpop 3 to query a graph stored in TitanDB.
The graph contains user vertices with properties, for example, "description", and edges denoting relationships between users.
I want to use Gremlin to obtain 1) users by properties and 2) the number of relationships (in this case of any kind) to some other user (e.g., with id = 123). To realize this, I make use of the match operation in Gremlin 3 like so:
g.V().match('user',__.as('user').has('description',new P(CONTAINS,'developer')),
__.as('user').out().hasId(123).values('name').groupCount('a').cap('a').as('relationships'))
.select()
This query works fine, unless there are multiple user vertices returned, for example, because multiple users have the word "developer" in their description. In this case, the count in relationships is the sum of all relationships between all returned users and the user with id 123, and not, as desired, the individual count for every returned user.
Am I doing something wrong or is this maybe an error?
PS: This question is related to one I posted some time ago about a similar query in Tinkerpop 2, where I had another issue: How to select optional graph structures with Gremlin?
Here's the sample data I used:
graph = TinkerGraph.open()
g = graph.traversal()
v123=graph.addVertex(id,123,"description","developer","name","bob")
v124=graph.addVertex(id,124,"description","developer","name","bill")
v125=graph.addVertex(id,125,"description","developer","name","brandy")
v126=graph.addVertex(id,126,"description","developer","name","beatrice")
v124.addEdge('follows',v125)
v124.addEdge('follows',v123)
v124.addEdge('likes',v126)
v125.addEdge('follows',v123)
v125.addEdge('likes',v123)
v126.addEdge('follows',v123)
v126.addEdge('follows',v124)
My first thought was: "Do we really need the match step?" Secondly, of course, I wanted to write this in TP3 fashion and not use a lambda/closure. I tried all manner of things in the first iteration, and the closest I got was stuff like this from Daniel Kuppitz:
gremlin> g.V().as('user').local(out().hasId(123).values('name')
.groupCount()).as('relationships').select()
==>[relationships:[:]]
==>[relationships:[bob:1]]
==>[relationships:[bob:2]]
==>[relationships:[bob:1]]
So here we used the local step to restrict the traversal inside local to the current element. This works, but we lost the "user" tag in the select. Why? groupCount is a ReducingBarrierStep, and paths are lost after those steps.
Well, let's go back to match. I figured I could try to make the match step traverse using local:
gremlin> g.V().match('user',__.as('user').has('description','developer'),
__.as('user').local(out().hasId(123).values('name').groupCount()).as('relationships')).select()
==>[relationships:[:], user:v[123]]
==>[relationships:[bob:1], user:v[124]]
==>[relationships:[bob:2], user:v[125]]
==>[relationships:[bob:1], user:v[126]]
Ok - success - that's what we wanted: no lambdas and local counts. But, it still left me feeling like: "Do we really need match step"? That's when Mr. Kuppitz closed in on the final answer which makes copious use of the by step:
gremlin> g.V().has('description','developer').as("user","relationships").select().by()
.by(out().hasId(123).values("name").groupCount())
==>[user:v[123], relationships:[:]]
==>[user:v[124], relationships:[bob:1]]
==>[user:v[125], relationships:[bob:2]]
==>[user:v[126], relationships:[bob:1]]
As you can see, by can be chained (on some steps). The first by groups by vertex and the second by processes the grouped elements with a "local" groupCount.
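For readers who want to check the expected per-user counts independently of Gremlin, the same sample data can be modelled in plain Python. This is only a sketch of the desired result, not of how TinkerPop evaluates the traversal; the edge-tuple representation and the function name relationships_to are assumptions. All four sample vertices have the description "developer", so no property filter is applied here.

```python
from collections import Counter

# Sample vertices and edges from the answer, as id -> name and (source, label, target).
names = {123: "bob", 124: "bill", 125: "brandy", 126: "beatrice"}
edges = [
    (124, "follows", 125),
    (124, "follows", 123),
    (124, "likes", 126),
    (125, "follows", 123),
    (125, "likes", 123),
    (126, "follows", 123),
    (126, "follows", 124),
]


def relationships_to(target_id):
    """For every user, count outgoing edges of any label that end at target_id,
    keyed by the target's name (mirrors the per-user groupCount)."""
    return {
        user: Counter(
            names[dst] for src, _label, dst in edges
            if src == user and dst == target_id
        )
        for user in names
    }


for user, counts in relationships_to(123).items():
    print(user, dict(counts))
```

The counts are computed once per user rather than once over all matched users, which is the difference between the local version and the original match query.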

Graph DB, create node for friend request?

Short and simple question:
For a social network platform, would you create a separate node for friend requests and create the edge after confirmation, or create the edge directly and set a confirmed flag?
What are the advantages / disadvantages?
I am interested in your comments.
One advantage of the flag option is that when either of the user nodes is deleted with DELETE VERTEX, OrientDB automatically deletes the friend request edge to maintain graph consistency. If you use a separate node for the request, then you need to delete that node manually.
Performance wise, I guess, the question you linked is relevant to OrientDB too.
For such decisions, I'd also consider the readability of the code. One advantage of using a graph DB is your code becomes easier to understand and reason about. So you can write the queries for different options and judge yourself about which code is more readable. Let's try it for the flag option:
# create
CREATE EDGE Friend
FROM (SELECT FROM User where name = "Alice")
TO (SELECT FROM User where name = "Bob")
SET status = "requested" # or confirmed = False
# confirmed
UPDATE Friend SET status = "confirmed" # or confirmed = True
WHERE out.name = "Alice" AND in.name = "Bob"
# query
SELECT in.name FROM Friend
WHERE out.name = "Alice" AND status = "confirmed"
# output: Bob
# another method
SELECT outE(Friend)[status = "confirmed"].in.name
FROM User WHERE name = "Alice"
# output: Bob
I'll argue that if you are familiar with graphs as mathematical objects and get used to the OrientDB syntax and terminology, this option enables you to write very understandable code.
If you don't like this option, as an alternative to keeping requests in a different node (class/table), I'll also suggest storing them inside the User nodes as a LINKSET or something similar.
I believe you should also take into consideration the memory you'll have available. If you store that info in the edge, that probably means you'll have to define an index on that property to have faster queries. And this means more memory needed.
I advise you to store friend requests in a different node.
Find friends is easier:
select expand(both('Friend')) from #12:0
Find requests is easier:
select expand(in('Request')) from #12:0
And they are very likely faster than an index on some property.
A model such as User1(V)---Friendship(E)--->User2(V) is enough to represent the friendship bond between users, and by using edge properties you can implement the whole workflow from request to completion. This design is pretty basic, so queries and traversals have a standard complexity, though they get more difficult the more constraints you add on properties. A disadvantage is that an edge is not a vertex, which affects how it can interact with other vertices; if you need such interaction, then modelling the friendship as a vertex is the way to go.
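The edge-with-status workflow discussed in these answers can be sketched as a small in-memory model in Python. This is not OrientDB code; the class FriendGraph and its method names are hypothetical, and it assumes a confirmed friendship counts in both directions. It shows the request, confirmation, and lookup steps, plus the point that deleting a user removes that user's edges along with it.

```python
class FriendGraph:
    """Hypothetical in-memory model: one edge per request, with a status property."""

    def __init__(self):
        # (requester, recipient) -> "requested" or "confirmed"
        self.edges = {}

    def request(self, src, dst):
        # Create the edge directly; an existing edge keeps its current status.
        self.edges.setdefault((src, dst), "requested")

    def confirm(self, src, dst):
        if self.edges.get((src, dst)) != "requested":
            raise KeyError(f"no pending request from {src} to {dst}")
        self.edges[(src, dst)] = "confirmed"

    def friends_of(self, user):
        # A confirmed edge is read in both directions (assumption).
        friends = set()
        for (src, dst), status in self.edges.items():
            if status == "confirmed":
                if src == user:
                    friends.add(dst)
                elif dst == user:
                    friends.add(src)
        return friends

    def pending_for(self, user):
        return {src for (src, dst), status in self.edges.items()
                if dst == user and status == "requested"}

    def delete_user(self, user):
        # Dropping a vertex drops every edge touching it, pending or confirmed.
        self.edges = {pair: s for pair, s in self.edges.items() if user not in pair}


graph = FriendGraph()
graph.request("Alice", "Bob")
graph.confirm("Alice", "Bob")
print(graph.friends_of("Alice"))
```

Every lookup here filters on the status value, which is the cost the answers above weigh against keeping requests in a separate edge or vertex class.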
