I want to keep track of who created what in my database, and I was thinking about connecting my edges, via their out field, to an OUser and perhaps to other relevant vertices. Is this possible, or will it make walking the graph impossible?
Thanks
I am interested in storing a set of users that have personality scores.
I would like them to be more connected (closer?) to each other based on formulas applied to their scores. The more similar the users are, the more connected or closer to each other they are (like in a cluster). The closer two nodes are to one another, the more similar they are.
I currently do this in multiple steps (some in SQL and others in code) from a relational database.
Most posts and documentation out there seem to focus on how to get started and on the high-level advantages over relational databases.
I am wondering whether graph databases are better suited to this and would do most of the heavy lifting out of the box, or more natively. Any details are greatly appreciated.
You could consider modeling it like this:
Here a vertex type/label named Score_range is introduced, together with the label User (with property score).
User vertices are connected to Score_range vertices; for example, a User with score 101 is connected to Score_range(vertexID=100), which stands for [100, 110).
Thus, vertices with closer scores are more connected/clustered in this graph, and your application needs to update the connections in the graph database whenever the scores are recalculated or changed.
Then, whether you run a clustering algorithm (e.g. Louvain) on the whole graph or a graph query to find paths between any two user nodes (e.g. FIND PATH in Nebula Graph, an open-source distributed graph database that speaks openCypher), the closeness will be reflected.
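The bucketing and re-linking described above can be sketched in Python as follows. This is a minimal in-memory illustration, not Nebula Graph code: it assumes a bucket width of 10 (matching the [100, 110) example), and the names `bucket_id`, `ScoreGraph`, `set_score`, and `same_range` are hypothetical. A real application would issue the equivalent edge deletions and insertions against the graph database.

```python
from collections import defaultdict

BUCKET_WIDTH = 10  # assumed width, matching the [100, 110) example


def bucket_id(score: int) -> int:
    """Return the Score_range vertex ID for a score, e.g. 101 -> 100."""
    return (score // BUCKET_WIDTH) * BUCKET_WIDTH


class ScoreGraph:
    """Toy model: User vertices linked to Score_range vertices (hypothetical)."""

    def __init__(self) -> None:
        self.user_bucket: dict[str, int] = {}  # user -> Score_range ID
        self.bucket_users: dict[int, set[str]] = defaultdict(set)

    def set_score(self, user: str, score: int) -> None:
        # When a score is recalculated, drop the old edge and add the new one.
        old = self.user_bucket.get(user)
        if old is not None:
            self.bucket_users[old].discard(user)
        new = bucket_id(score)
        self.user_bucket[user] = new
        self.bucket_users[new].add(user)

    def same_range(self, user: str) -> set[str]:
        """Users sharing a Score_range vertex with `user` (two hops away)."""
        return self.bucket_users[self.user_bucket[user]] - {user}
```

For example, after `set_score("alice", 101)` and `set_score("bob", 108)`, both users hang off Score_range 100, so `same_range("alice")` returns `{"bob"}`.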
But since this connection/closeness is actually numerical and sortable, I think handling this closeness relationship alone may not need a graph database, given the context you have provided.
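To illustrate why a sortable score may not need a graph at all, here is a sketch that finds a user's nearest neighbours by score using only a sorted list and the standard `bisect` module. The function name `nearest_by_score` and the sample data are hypothetical.

```python
import bisect


def nearest_by_score(scores: dict[str, float], user: str, k: int) -> list[str]:
    """Return up to k users whose scores are closest to `user`'s score.

    Sorts once, then expands two pointers outward from the user's
    position, always taking the side that is closer to the target.
    """
    ranked = sorted((s, u) for u, s in scores.items())
    keys = [s for s, _ in ranked]
    target = scores[user]
    i = bisect.bisect_left(keys, target)
    lo, hi = i - 1, i
    result: list[str] = []
    while len(result) < k and (lo >= 0 or hi < len(ranked)):
        # Prefer the lower neighbour on ties; fall back when one side is exhausted.
        if hi >= len(ranked) or (lo >= 0 and target - keys[lo] <= keys[hi] - target):
            candidate = ranked[lo][1]
            lo -= 1
        else:
            candidate = ranked[hi][1]
            hi += 1
        if candidate != user:
            result.append(candidate)
    return result
```

With scores `{"a": 10, "b": 12, "c": 20, "d": 11, "e": 30}`, `nearest_by_score(scores, "a", 2)` returns `["d", "b"]`.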
PS: I drew a picture of a graph using the above schema:
This is more of a logical question than a technical one. I am asking for data organisation guidance for my requirements. Please keep in mind that I am willing to use a graph database for this purpose (though I am pretty new to them), so guidance in a graph database context would be much appreciated.
Let me provide an overview of the scenario. There are two entities in the app: User and House. A user can own a house or rent a house. If a user rents a house, the time period for which the user rented it should be recorded. A user may rent the same house for different periods.
Demo Dataset:
A (User) -owns-> H1, H2, H3 (House) - one-liner for brevity
X -rents-> H2 (start=DATE1, end=DATE2)
Y -rents-> H2 (start=DATE3, end=DATE4)
X -rents-> H2 (start=DATE5, end=DATE6) - user rents same house again
I am assuming that User and House would be nodes and that owns and rents would be edges, with the rent period stored as properties of the rents edges. Please point out if there is a better way.
Questions:
Is it possible in graph databases in general to have multiple edges of the same type between two nodes? Should I keep just one edge for a specific user's rental of a specific house and add the periods to it? Or should I maintain multiple edges for multiple periods?
Is it possible to query for something like "fetch all the houses that were empty for a period of 3 months"? This should fetch the houses that have a gap of 3 months between the end date of one rental and the start date of the next. These houses may not be empty now.
I have checked Neo4j, Cayley, Dgraph, etc. Which of them may be better suited to this scenario?
Any guidance of how I should keep the data with relationships would be much appreciated. Have a nice day.
I think this may already be solved, but I would just add that TerminusDB may be worth assessing as part of your process. The reason I say this is that you are:
TerminusDB uses an expressive logical schema language that allows anything that is logically expressible, so you could have multiple edges of the same type between two nodes. However, data modeling is something of an art, as your question suggests, so it will depend on your domain. (I always think that 'deal' could be an edge or a node depending on context: you can participate in a deal, or you can strike a deal with another party.)
As TerminusDB is a native revision control database, time bound queries can be relatively straightforward. You can get a delta, or a series of deltas, between two events.
There could be a better answer than this, but I am still posting my experience with graphs for the given requirement in case it is of any help to you.
I think a graph DB is the best fit for your requirement. To answer your questions:
It is more about designing your graph model to suit the purpose, and I think you can have multiple rent edges with different periods from a user node to a house node.
That way you can maintain the history, and you can later delete the older/expired period edges if you want.
[Just to avoid duplicates] You need to make sure that an edge is created between the nodes (user & house) only if the period slot is free.
You can add this logic to the query that creates the edge between the nodes.
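A minimal sketch of that check in plain Python, assuming each RENT edge stores a start and an end date and that the end date is exclusive. The `Rent`/`RentGraph` classes and the `add_rent` method are hypothetical stand-ins for whatever conditional edge-creation query you write in your database. Two half-open periods [s1, e1) and [s2, e2) overlap exactly when s1 < e2 and s2 < e1.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Rent:
    user: str
    house: str
    start: date
    end: date  # assumed exclusive


@dataclass
class RentGraph:
    rents: list[Rent] = field(default_factory=list)

    def add_rent(self, user: str, house: str, start: date, end: date) -> bool:
        """Create a RENT edge only if the house is free for [start, end)."""
        if not start < end:
            raise ValueError("start must be before end")
        for r in self.rents:
            # Half-open intervals overlap iff each starts before the other ends.
            if r.house == house and start < r.end and r.start < end:
                return False  # slot taken: refuse to create an overlapping edge
        self.rents.append(Rent(user, house, start, end))
        return True
```

Note that the same user can still get several RENT edges to the same house, as long as the periods do not overlap, which matches the "multiple edges for multiple periods" model.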
With the given demo dataset, here is the sample graph I created based on the scenario you described.
http://console.neo4j.org/?id=bxu3sp
Click the console link above and run the Cypher query below in the query window at the bottom:
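MATCH (user:User)-[rent:RENT]->(house:House)
WITH house, rent
ORDER BY rent.startDate
// collect each house's rentals in start-date order
WITH collect(rent) AS rents, house
// pair each rental with the next one, stopping before the last rental
UNWIND range(0, size(rents)-2) AS index
WITH rents, index, house
// gap threshold in days: 30 here; use about 90 for "3 months"
WHERE duration.inDays(date(rents[index].endDate), date(rents[index+1].startDate)).days > 30
RETURN DISTINCT house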
This returns the houses that had no rental for longer than the given number of days between two consecutive rentals.
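For comparison, here is the same gap logic as the Cypher query written as plain Python: group rentals by house, sort each house's rentals by start date, and compare each end date with the next start date. The `(house, start, end)` tuple layout and the function name `houses_with_gap` are assumptions for illustration; passing `min_days=90` approximates "3 months".

```python
from collections import defaultdict
from datetime import date


def houses_with_gap(rents: list[tuple[str, date, date]], min_days: int) -> set[str]:
    """Return houses with a gap of more than `min_days` between consecutive rentals.

    Only gaps *between* rentals count, so a house that is occupied
    today can still be returned.
    """
    by_house: dict[str, list[tuple[date, date]]] = defaultdict(list)
    for house, start, end in rents:
        by_house[house].append((start, end))
    result: set[str] = set()
    for house, periods in by_house.items():
        periods.sort()  # order by start date, like ORDER BY rent.startDate
        for (_, prev_end), (next_start, _) in zip(periods, periods[1:]):
            if (next_start - prev_end).days > min_days:
                result.add(house)
                break
    return result
```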
I'm not an expert and have never used anything other than Neo4j, but in my experience so far, the Neo4j documentation is really good, and it is powerful with additions like Kafka integration, GraphQL, Halin monitoring, APOC, etc.
There is a learning curve; just explore and play around with it to get yourself into the graph DB world.
Update:
If the same user rents the same house for different periods, the graph would look something like the one below. As mentioned, you should avoid creating duplicates by not allowing edges for the same or an overlapping period between any user node and any house node. In this graph, I created edges with different, non-overlapping start/end dates, so they are valid and not duplicates.
I am very new to graph databases and am working on a survey of different graph databases. I am not able to understand what exactly global indexing in graph databases is.
Can someone please help me understand what global indexing in graph databases is?
I am not sure whether all graph databases agree on what a global index is, but generally it means an index that applies to the whole graph. Such an index lets you efficiently retrieve vertices based on some indexed property, e.g. find all person vertices with the name Manoj. Most graph queries use a global index to find one or a small number of vertices as entry points into the graph and then traverse the graph from there.
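A toy illustration of that idea, assuming a small in-memory graph: the global index maps a (label, property, value) key to vertex IDs across the whole graph and is used only to find the entry points, after which the query follows adjacency lists. `TinyGraph` and its methods are hypothetical and do not mirror any particular database's API.

```python
from collections import defaultdict


class TinyGraph:
    def __init__(self) -> None:
        self.props: dict[int, dict[str, str]] = {}
        self.adj: dict[int, list[int]] = defaultdict(list)
        # Global index: (label, property, value) -> vertex IDs, over the whole graph.
        self.index: dict[tuple[str, str, str], set[int]] = defaultdict(set)

    def add_vertex(self, vid: int, label: str, **props: str) -> None:
        self.props[vid] = {"label": label, **props}
        for key, value in props.items():
            self.index[(label, key, value)].add(vid)

    def add_edge(self, a: int, b: int) -> None:
        self.adj[a].append(b)

    def neighbours_of(self, label: str, key: str, value: str) -> set[int]:
        # 1) The global index finds the entry vertices without scanning every vertex.
        entry = self.index.get((label, key, value), set())
        # 2) From there, the query traverses edges.
        return {n for v in entry for n in self.adj[v]}
```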
In contrast to global indexes, there are vertex-centric indexes. They apply only to a specific vertex and can be used to make queries involving so-called supernodes more efficient. The idea is to index a property of the vertex's incident edges so that the neighboring vertices returned can be narrowed down to those that are actually relevant for the query. For Twitter, for example, such a vertex-centric index could be used to index the followedSince property on follower edges. This would allow you to efficiently query for all followers of Katy Perry who began following her on her birthday. Without an index, you would have to check the property for all of her (currently over 95 million) followers for this query.
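A sketch of a vertex-centric index, assuming each vertex keeps its incident follower edges sorted by a followedSince date, so a date-range query uses binary search instead of checking every follower. The `Vertex` class and its `follow`/`followers_between` methods are hypothetical names for illustration.

```python
import bisect
from datetime import date


class Vertex:
    def __init__(self, name: str) -> None:
        self.name = name
        # Vertex-centric index: (followedSince, follower) pairs kept sorted,
        # local to this vertex rather than global to the graph.
        self._followers: list[tuple[date, str]] = []

    def follow(self, follower: str, since: date) -> None:
        bisect.insort(self._followers, (since, follower))

    def followers_between(self, start: date, end: date) -> list[str]:
        """Followers whose followedSince falls in [start, end), via binary search."""
        lo = bisect.bisect_left(self._followers, (start, ""))
        hi = bisect.bisect_left(self._followers, (end, ""))
        return [f for _, f in self._followers[lo:hi]]
```

Querying a single day is then `followers_between(day, day + timedelta(days=1))`, which touches only the matching slice of the sorted edge list.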
(Your question didn't mention vertex-centric indexes, but I think knowing about them helps explain why global indexes are called that, since vertex-centric indexes are basically local indexes.)
For more information about indexing in graph databases see the respective sections in the documentation of graph databases like Titan or DSE Graph.
I am not able to find a way to create a subgraph in ArangoDB.
I couldn't find anything about it at https://docs.arangodb.com/3.0/AQL/Graphs/index.html
How do I create a subgraph in ArangoDB?
Thanks in advance
When you create a graph in the ArangoDB interface, you will be prompted to list all the node and edge collections. Simply choose the subset of node and edge collections you want. There are equivalent ways to do this via REST and arangosh, but ultimately it's the same interface.
When you create a graph, it can contain many related vertex and edge collections. I have some graphs with 9 or more collections. You don't have to worry about creating a subgraph because ArangoDB will only use the collections that relate to your query.
So if you perform a query of all inbound connected vertices to a particular vertex you specify, ArangoDB will work out which edge and vertex collections to use to answer your query.
This is really powerful, as you don't have to track how collections are bound to each other in a 'graph space'; you just query it and the database handles it for you.
It is possible to reuse vertex and edge collections across multiple graphs, which can be useful if you want to reuse data in different groupings.
For me, graphs are another type of collection: really just a collection of connected vertex and edge collections. When you query within the scope of a graph, it uses its graph definition to limit the scope of path traversals.
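To make the scoping idea concrete, here is a toy Python model, not ArangoDB's API: a named graph is just a set of edge collection names, the same edge collection can be reused by several graphs, and a traversal only follows edges from the collections in the chosen graph's definition. In AQL, the corresponding query would be a traversal such as `FOR v IN 1..2 OUTBOUND 'users/a' GRAPH 'social' RETURN v`; the collection and graph names here are hypothetical.

```python
from collections import deque

# Edge collections: name -> list of (_from, _to) pairs (hypothetical data).
EDGE_COLLECTIONS: dict[str, list[tuple[str, str]]] = {
    "knows": [("users/a", "users/b"), ("users/b", "users/c")],
    "owns": [("users/a", "items/x")],
}

# Graph definitions reuse the same edge collections in different groupings.
GRAPHS: dict[str, set[str]] = {
    "social": {"knows"},
    "everything": {"knows", "owns"},
}


def outbound(graph: str, start: str, max_depth: int) -> set[str]:
    """Vertices reachable from `start` within `max_depth` outbound hops,
    following only edge collections listed in the graph's definition."""
    edges = [e for name in GRAPHS[graph] for e in EDGE_COLLECTIONS[name]]
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        vertex, depth = frontier.popleft()
        if depth == max_depth:
            continue
        for src, dst in edges:
            if src == vertex and dst not in seen:
                seen.add(dst)
                frontier.append((dst, depth + 1))
    return seen - {start}
```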
Good day!
I need to find a database for storing and processing complex structured information.
Something like a mind map: I need to have arbitrary values in groups with connections to each other, and the connections must also have titles.
The biggest problem is that I need to get all the related values without knowing exactly what the connections are or how many there are.
For example:
VALUE 3 is connected to
VALUE 1 from group A as NAME OF COMMUNICATION 1
and VALUE 2 from group B as NAME OF COMMUNICATION 2
and ...
This should work up to any level of connections (i.e., the values of all properties connected to the associated properties, then for those properties, and so on up to a predetermined level), though this can be implemented in the application logic.
I looked at some NoSQL databases, but they do not allow such queries without knowing the exact values or links. I considered building on MySQL with a lot of logic in the application to handle all this, but perhaps there is more suitable storage for such a task?
I would be grateful for any help.
http://magika.tk/struct.png - A schematic example.
As Philipp says, mind maps are a type of graph, usually a spider diagram. A graph-based NoSQL database such as Neo4j would be suitable. Here's a longer list. Graph databases store information about both the nodes and the edges. Each node has a pointer to all its adjacent nodes, so counting connections and groups should be very fast.
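Here is a sketch of the query the question describes, over a hypothetical in-memory graph whose edges carry titles: a breadth-first traversal that returns every value reachable from a start value up to a chosen depth, along with the connection title and depth, without knowing the connections in advance. In Neo4j this roughly corresponds to a variable-length pattern such as `(v)-[*1..3]-(other)`; the Python below is only an illustration, and the edge list is made up from the question's example names.

```python
from collections import deque

# Hypothetical edges: (from_value, connection_title, to_value).
EDGES = [
    ("VALUE 3", "NAME OF COMMUNICATION 1", "VALUE 1"),
    ("VALUE 3", "NAME OF COMMUNICATION 2", "VALUE 2"),
    ("VALUE 1", "NAME OF COMMUNICATION 3", "VALUE 4"),
]


def related(start: str, max_depth: int) -> list[tuple[str, str, int]]:
    """Return (value, connection title, depth) for everything reachable
    from `start` within `max_depth` hops, following edges in both directions."""
    adjacency: dict[str, list[tuple[str, str]]] = {}
    for a, title, b in EDGES:
        adjacency.setdefault(a, []).append((title, b))
        adjacency.setdefault(b, []).append((title, a))
    seen = {start}
    queue = deque([(start, 0)])
    found: list[tuple[str, str, int]] = []
    while queue:
        value, depth = queue.popleft()
        if depth == max_depth:
            continue  # predetermined level reached; do not expand further
        for title, other in adjacency.get(value, []):
            if other not in seen:
                seen.add(other)
                found.append((other, title, depth + 1))
                queue.append((other, depth + 1))
    return found
```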