How to store history of an edge in graph database? - graph-databases

I am designing a way to store history of a graph in a graph database. I have the following in mind:
The history of a node, say Vertex_A, is maintained by creating another history node, say History_Vertex_A. Whenever Vertex_A is modified, a new version node, say Vertex_A_Ver_X, is created. This new node stores the old values of the changed data. A new edge is created between the history node and the version node. The following diagram depicts this idea. Is there a better way to store the history of a vertex/node in a graph database?
                                +------------------+
                                | Vertex_A (Ver N) |
                                +---------+--------+
                                          |
                              +-----------v-----------+
                              | Edge_Vertex_A_History |
                              +-----------+-----------+
                                          |
                                +---------v--------+
                                | History_Vertex_A |
                                +---------+--------+
                                          |
         +---------------------+----------+----------------+----------------------+
         |                     |                           |                      |
  +------v-------+      +------v-------+          +--------v-------+      +-------v--------+
  | Edge_A_Ver_0 |      | Edge_A_Ver_1 |          | Edge_A_Ver_N-2 |      | Edge_A_Ver_N-1 |
  +------+-------+      +------+-------+          +--------+-------+      +-------+--------+
         |                     |                           |                      |
+--------v---------+  +--------v---------+      +----------v---------+  +---------v----------+
| Vertex_A (Ver 0) |  | Vertex_A (Ver 1) | .... | Vertex_A (Ver N-2) |  | Vertex_A (Ver N-1) |
+------------------+  +------------------+      +--------------------+  +--------------------+
Now, say I have the following relation. Vertex_A is connected to Vertex_B via edge Edge_AB.
+----------+      +---------+       +----------+
| Vertex_A +------> Edge_AB +-------> Vertex_B |
+----------+      +---------+       +----------+
I can store the history of vertices as per the above design, but I cannot use the same idea to store the history of edges, Edge_AB in this case. This is because it is not possible to connect an edge to its corresponding history vertex: an edge can only connect two vertices, so it cannot itself be the endpoint of another edge. So what is the best way to store the history of an edge in a graph database?

Your approach works across different graph databases.
Another approach, which we use with NebulaGraph, is to leverage the rank concept in its edge definition.
In NebulaGraph, one instance of an edge is identified by the tuple [src, dst, edge_type, rank], where rank is an integer that can represent a transaction ID, a timestamp, a version, or anything else that produces multiple edges of the same type between the same two vertices.
Note that the rank field can be omitted, in which case its value is 0, so the model then behaves just like the edge model of other graph databases.
With rank, versioning edges is straightforward. But how do we version vertices? Our approach is to introduce an edge whose dst vertex is the vertex itself, and to put the properties that can differ between versions of the vertex on this edge, with the rank serving as the version.
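The rank idea can be illustrated with a small in-memory model. This is a minimal sketch in plain Python, not NebulaGraph's API: the class `VersionedEdgeStore`, its methods, and the edge type name `vertex_version` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class VersionedEdgeStore:
    """Toy model of edges identified by (src, dst, edge_type, rank)."""

    def __init__(self):
        # (src, dst, edge_type) -> {rank: properties}
        self._edges = defaultdict(dict)

    def upsert(self, src, dst, edge_type, props, rank=0):
        # Same 4-tuple overwrites; a new rank adds another version.
        self._edges[(src, dst, edge_type)][rank] = dict(props)

    def history(self, src, dst, edge_type):
        # All versions of one edge, ordered by rank.
        versions = self._edges.get((src, dst, edge_type), {})
        return [(rank, versions[rank]) for rank in sorted(versions)]

    def latest(self, src, dst, edge_type):
        hist = self.history(src, dst, edge_type)
        return hist[-1] if hist else None


store = VersionedEdgeStore()
# Edge history: each change to Edge_AB is written under a new rank.
store.upsert("Vertex_A", "Vertex_B", "Edge_AB", {"weight": 1}, rank=0)
store.upsert("Vertex_A", "Vertex_B", "Edge_AB", {"weight": 5}, rank=1)
# Vertex history: a self-loop edge whose rank is the vertex version.
store.upsert("Vertex_A", "Vertex_A", "vertex_version", {"name": "old"}, rank=0)
store.upsert("Vertex_A", "Vertex_A", "vertex_version", {"name": "new"}, rank=1)
```

Here the rank is a simple counter; a timestamp would work the same way as long as it is an integer and increases with each change.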
ref:
https://docs.nebula-graph.io/3.2.1/1.introduction/2.data-model/
https://github.com/vesoft-inc/nebula

Related

Trouble removing state with dual keys

To ask my question, I first have to show you my data and my proposed solution to the dual-key problem:
Each message has at least one of two keys, x and y. Sometimes x is present, sometimes y. One type of event has both.
Type 1: Key x and y
Type 2: Key x
Type 3: Key y
To have the full session at the end of the pipeline we need all data under one key: x+y.
To achieve this, I copy the messages with both keys and key one of them by x and the other by y. Then in the following Processor I enrich type x and y.
Each message looks like this: [Flink key, potentialX, potentialY, rest of msg...]
Pipeline
Here is my scenario: I have a close session message
which is type 2. This will be propagated to the key X processor. Here
it will be enriched and we can shut down appropriate
processors in the rest of the pipeline. However key y is
never evicted because it never gets the close session
message.
Close msg flow
Now for the question: How can I close the state in the Y processor?
Initially I thought I could duplicate the type 2 message in the enricher, emit it as a side output, and grab that side output before the keyBy so that it goes to the correct processor. This is not possible, as a side output can only be used after the processor where it was created. Then I found some Jira tickets about side inputs, but that does not seem to be an actual feature yet.
Lastly, I thought I might create a sink for the side output mentioned above, and a source at the keyBy. This seems a bit hacky, though.
I really hope someone can help!
Edit:
Adding a new diagram to try to clarify the original flow. In the original drawings I tried to make the flow of data easier to understand by drawing two boxes for the Enrichment processor. I've tried to make the flow more correct with this new drawing:
Improved drawing
That's a bit complicated to follow, but I've seen this pattern before when trying to unify logged-out sessions with logged-in sessions from web logs. If I've understood the details well enough, I think you could take a side output from the X processor, and feed it into the Y processor, like this:
                             +------------+                          +-------+
                             |            +-------------------------->       |
+--------+     +-------+  X  |   X proc   |                          |       |
|        |     |       +----->            | sideout +-----------+    | X + Y |
|        |     |       |     |            +--------->           |    |       |
| source +-----> split |     +------------+         |           +---->       |
|        |     |       |                            |  Y proc   |    +-------+
|        |     |       +---------------------------->           |
+--------+     +-------+  Y                         +-----------+
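The routing in the diagram can be simulated in a few lines. This is a plain-Python sketch of the idea only, not Flink code: `XProcessor`, `YProcessor`, the message fields, and the `close` flag are all assumptions made for illustration. The X processor remembers which y belongs to each x, and when a close message arrives it evicts its own state and emits a copy keyed by y on its side output, which is then fed to the Y processor.

```python
class XProcessor:
    def __init__(self):
        self.state = {}        # x -> y, learned from messages carrying both keys
        self.side_output = []  # close notifications re-keyed by y

    def process(self, msg):
        x = msg["x"]
        if msg.get("y") is not None:
            self.state[x] = msg["y"]
        if msg.get("close"):
            y = self.state.pop(x, None)  # evict the X-side state
            if y is not None:
                self.side_output.append({"y": y, "x": x, "close": True})


class YProcessor:
    def __init__(self):
        self.state = {}  # y -> x

    def process(self, msg):
        y = msg["y"]
        if msg.get("close"):
            self.state.pop(y, None)  # evict the Y-side state
        elif msg.get("x") is not None:
            self.state[y] = msg["x"]


x_proc, y_proc = XProcessor(), YProcessor()
both = {"x": "x1", "y": "y1"}
x_proc.process(both)  # copy keyed by x
y_proc.process(both)  # copy keyed by y
x_proc.process({"x": "x1", "close": True})  # the close message only reaches X
for side_msg in x_proc.side_output:         # the side output feeds Y
    y_proc.process(side_msg)
```

In a real job the side output would have to be keyed by y and combined with the Y stream before the Y processor, which is the extra arrow from "sideout" to "Y proc" in the diagram.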

CARTO - Performing cluster analysis on specific selection

Is there a way to perform cluster analysis on a selection of a layer in CARTO? For example, if I had data points throughout the U.S. and I wanted to find clusters of points in San Francisco, could I feasibly do (pseudo-SQL ahead):
SELECT ST_ClusterWithin(geom, 0.01)  -- second argument is the clustering distance (0.01 is a placeholder)
FROM my_table
WHERE city = 'San Francisco'
Or am I better off just splitting layers by city and then performing the analysis on each layer in CARTO? I realize this option may not be ideal for ease of updating data across the layers. Any help is appreciated, thank you.
You can use the Filter by column value analysis to extract a selection of your table and then perform the cluster analysis on that analysis node. You can even drag out the original dataset source to create another layer and perform your analysis again on a different selection.
                         +---------+
                         | dataset |
            +------------+---------+----------+
            |                                 |
+-----------v------------+      +-------------v--------+
| name = "San Francisco" |      |  name = "New York"   |
+-----------+------------+      +-------------+--------+
            |                                 |
            |                                 |
 +----------v---------+           +-----------v-------+
 |  cluster analysis  |           | cluster analysis  |
 +--------------------+           +-------------------+
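The filter-then-cluster flow can be sketched outside CARTO as well. The following is a plain-Python illustration under stated assumptions: the `rows` data is made up, and `cluster_within` is a simple distance-threshold grouping written for this example, used as a stand-in for whatever clustering analysis is applied to the filtered node.

```python
from math import hypot


def cluster_within(points, distance):
    """Group points so that each group is connected by hops of at most `distance`."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (xi, yi) in enumerate(points):
        for j in range(i + 1, len(points)):
            xj, yj = points[j]
            if hypot(xi - xj, yi - yj) <= distance:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return sorted(groups.values())


# Illustrative rows; coordinates are arbitrary planar values.
rows = [
    {"city": "San Francisco", "geom": (0.0, 0.0)},
    {"city": "San Francisco", "geom": (0.1, 0.0)},
    {"city": "San Francisco", "geom": (5.0, 5.0)},
    {"city": "New York", "geom": (0.05, 0.0)},
]

# Step 1: filter by column value. Step 2: cluster only the selection.
sf_points = [r["geom"] for r in rows if r["city"] == "San Francisco"]
sf_clusters = cluster_within(sf_points, distance=0.5)
```

The New York point is close to the first two San Francisco points, but it never enters the clustering because the filter runs first.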

uml sequence diagram: create objects in a loop

In a sequence diagram I am trying to model a loop that creates a bunch of objects. I have found little information online regarding the creation of multiple objects in a sequence diagram, so I turn to you.
The classes are Deck and Card
Cards are created by fillDeck(), which is called by the constructor of Deck (FYI the objects are stored in an arraylist in Deck).
There are many types of cards with varying properties. Suppose I want 8 cards of type A to be made, 12 of type B and 3 of type C.
How would I go about modelling such a thing? This is the idea I have in mind so far, but it is obviously incomplete.
Hope someone can help! Thanks!
+------+
| Deck |
+------+
   |
+--+-------+--------------+
| loop 8x /               |
+--+-----+   +----------+ |
|  |-------->| Card(A)  | |
|  |         +-----+----+ |
+--+----------------------+
   |               |
+--+--------+------|-----------------------+
| loop 12x /       |                       |
+--+------+        |          +---------+  |
|  |------------------------->| Card(B) |  |
|  |               |          +----+----+  |
|--+---------------------------------------+
|  |               |               |
+--+-------+----------------------------------------------+
| loop 3x /        |               |                      |
+--+-----+         |               |        +---------+   |
|  |--------------------------------------->| Card(C) |   |
|  |               |               |        +----+----+   |
|--+------------------------------------------------------+
   |               |               |             |
"A sequence diagram describes an Interaction by focusing on the sequence of Messages that are exchanged, along with their corresponding OccurrenceSpecifications on the Lifelines." (UML standard) A lifeline is defined by one object. But that doesn't mean you must give every object a lifeline. You should show only those lifelines that exchange the messages you are interested in.
And you needn't show all of the message-sequence logic on one diagram. One SD normally shows one Interaction, or maybe a few of them if they are simple.
So, if your SD shows one logical concept, it is correct. If there is another interaction between some objects, draw another SD for that interaction, containing only the objects that participate in it.
UML standard 2.5. Figure 17.25 - Overview of Metamodel elements of a Sequence Diagram
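For reference, the single interaction that the diagram in the question models can be written down in code. This is a hypothetical sketch based only on the question's description (the `Deck` constructor calls `fillDeck()`, which creates 8, 12 and 3 cards of types A, B and C); the class layout and the `COUNTS` table are assumptions, and Python naming is used in place of the original `fillDeck()`.

```python
class Card:
    def __init__(self, kind):
        self.kind = kind


class Deck:
    # Counts taken from the question: 8 x A, 12 x B, 3 x C.
    COUNTS = {"A": 8, "B": 12, "C": 3}

    def __init__(self):
        self.cards = []   # the list that stores the created cards
        self.fill_deck()  # called by the constructor, as in the question

    def fill_deck(self):
        # Each inner loop corresponds to one loop fragment in the diagram.
        for kind, count in self.COUNTS.items():
            for _ in range(count):
                self.cards.append(Card(kind))


deck = Deck()
```

Each inner loop sends one create message per iteration, so one lifeline per card type inside its loop fragment is enough to show it.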

ER diagram relation

I'm having a bit of trouble designing an ER diagram for a bike shop. The shop contains many bike parts (wheels, gears, brakes etc.) which have different attributes. I have therefore made each part an entity in order to model their different attributes. They all have quantity, name and price attributes, which I modeled using inheritance. However, now that I have all these entities, they should be mapped to the 'Bike' entity, which is a collection of all the parts, and to the 'Stock' entity, where all the parts are listed along with their preferred amount and minimum amount.
My problem is that I'm not sure how to map the parts to the 'Bike' and 'Stock' entity. In the figure below I've made two different designs. Which one of them is correct, if any at all? Can I model it in a smarter way? (I have removed the attributes for simplification)
Solution 1
Solution 2
I think you are looking at a Bill of Materials type schema, where you could have a Part super-type and as many sub-types as you wish to hold specific details for particular types of Part. The Bill of Materials contains a quantity to hold the number of child Parts required to make the parent Part, e.g. 2 wheels, 1 frame. This goes all the way up to Bike, which is just another type of Part. The Part can then link into your entities for managing Stock and Inventory.
               +-----------------+
               |       BOM       |
               +-----------------+
               | parent_part_id  |
               | child_part_no   |
               | quantity        |
               +-----------------+
                     |     |
                     |     |         +-------------+
                     |     |         |    STOCK    |
                 +-------------+     +-------------+
                 |    PART     |-----|     ...     |
                 +-------------+     +-------------+
                 | part_id     |
                 | part_type   |
       +---------| ...         |---------+
       |         +-------------+         |
       |                |                |
       |                |                |
       |                |                |
+-------------+  +-------------+  +-------------+
|    WHEEL    |  |    GEAR     |  |    BIKE     |
+-------------+  +-------------+  +-------------+
| part_id     |  | part_id     |  | part_id     |
| ...         |  | ...         |  | ...         |
+-------------+  +-------------+  +-------------+
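To make the Bill of Materials idea concrete, here is a minimal Python sketch. The rows in `BOM`, the part names, and the `explode` helper are illustrative assumptions, not part of the schema above; each row plays the role of one BOM record (parent part, child part, quantity).

```python
from collections import Counter

# BOM rows: (parent_part, child_part, quantity). Names and numbers are made up.
BOM = [
    ("bike", "wheel", 2),
    ("bike", "frame", 1),
    ("wheel", "spoke", 32),
    ("wheel", "tire", 1),
]


def explode(part, qty=1, bom=BOM):
    """Return the total quantity of each leaf part needed to build `qty` of `part`."""
    children = [(child, n) for parent, child, n in bom if parent == part]
    if not children:
        # A part with no BOM rows is a leaf: it is bought, not assembled.
        return Counter({part: qty})
    totals = Counter()
    for child, n in children:
        totals += explode(child, qty * n, bom)
    return totals


needed = explode("bike")
```

Because Bike is just another part, the same function answers "what do I need for one wheel" (`explode("wheel")`) without any special casing, and the resulting counts can be compared against Stock.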
None of them. Sorry.
First of all, you may want to consider "Wheel", "Gear" and "Brake" as a single entity "Part", instead of separate entities.
This makes the diagram simpler. And remember that there can be more parts, like "chain", "lights", and so on.
So instead of defining a separate entity for each part, just define a single one, "Part", for all of them.
Second, some parts can be parts of another part, and so on. This is called a "recursive" or "self-referencing" entity. This may look odd at first, but it also makes the diagram simpler.
............................................................
...........+-------------+..................................
...........|.............|..................................
...........|.............|..................................
.........../\............|.........../\.....................
........../  \...........|........../  \....................
........./    \.....Many.|....Many./    \...................
......../      \.1.+-----+----+.../      \...1+----------+..
.......<IsPartOf>--|   Part   +--< Stores >---+  Stock   |..
........\      /...+----------+...\      /....+----------+..
.........\    /.........|..........\    /...................
..........\  /..........|Many.......\  /....................
...........\/...........|............\/.....................
......................./  \.................................
....................../    \................................
...................../      \...............................
..................../        \..............................
...................< Composed >.............................
....................\   By   /..............................
.....................\      /...............................
......................\    /................................
.......................\../.................................
........................|...................................
........................|1..................................
...................+----------+.............................
...................| Bike |.............................
...................+----------+.............................
............................................................
Cheers.

Constructing objects from a one-to-many relationships

Context
I am designing a data model for a node based system used to perform tasks. The system includes node, plug and edge objects.
A node is an object which performs an action. You can think of nodes as being like a program or executable. The functionality of the node may be altered via data passed through from other nodes.
Data is passed from one node to another via a connection. A connection between two nodes is called an edge.
Nodes are connected using plugs. Each node has a list of plugs which determine the input and output for the node. You can think of plugs as being like the arguments to a program or executable.
The relationship between nodes and plugs is a one-to-many relationship. So a node can have many plugs but a plug can only have one node. In this case I will store a reference to the node on each plug. Edges are really just an association between two plugs. Below is an example of how I imagine the data is stored:
The node table:
|-------------|-----|-------|
| PRIMARY_KEY | ID | TYPE |
|-------------|-----|-------|
| NODE.1 | 1 | NODE |
|-------------|-----|-------|
| NODE.2 | 2 | NODE |
|-------------|-----|-------|
The plug table:
|-------------|-----|-------|---------|
| PRIMARY_KEY | ID | TYPE | NODE |
|-------------|-----|-------|---------|
| PLUG.1 | 1 | PLUG | NODE.1 |
|-------------|-----|-------|---------|
| PLUG.2 | 2 | PLUG | NODE.2 |
|-------------|-----|-------|---------|
| PLUG.3 | 3 | PLUG | NODE.2 |
|-------------|-----|-------|---------|
The edge table:
|-------------|-----|-------|----------|----------|
| PRIMARY_KEY | ID | TYPE | SRC_PLUG | DST_PLUG |
|-------------|-----|-------|----------|----------|
| EDGE.1 | 1 | EDGE | PLUG.1 | PLUG.2 |
|-------------|-----|-------|----------|----------|
| EDGE.2 | 1 | EDGE | PLUG.1 | PLUG.3 |
|-------------|-----|-------|----------|----------|
Question
Assuming this is not completely wrong, my question is about how I would construct a node object from the data. It seems to me that a node is useless without the plugs which are associated to it. This suggests we must find all the plugs associated to the node at the time we create the node. Where and how is this information usually stored? In other words, how does the process used to create the node know to do the query for associated plugs?
All suggestions are much appreciated.
It sounds like plugs are children of nodes and cannot exist until the node is created, unless the Node property of Plug can be null. In that case you could pass one or more edges to the Node creator, and the node plugs would be the distinct set of destination plugs from them.
Having said that, it seems backwards in your example to create the plugs first, then the edges, then the nodes. I would think the object which performs the action (node) would be created first and dictate the destination plugs it requires. Edges would be defined last and would be more mutable over the lifetime of the application as different connections are created. It feels more natural to define and create a node and its associated plugs together.
I'm not sure I understand the ID column of the Edge table or its relationship to PRIMARY_KEY or ID of another object.
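One way to read the suggestion of creating a node and its plugs together is a loader that always queries the plug rows when it builds a node. The sketch below is a hypothetical illustration in Python: `NODE_ROWS` and `PLUG_ROWS` mirror the example tables from the question, while `Node`, `Plug` and `load_node` are assumed names, not an existing API.

```python
from dataclasses import dataclass, field

# Rows mirroring the node and plug tables in the question.
NODE_ROWS = [{"key": "NODE.1"}, {"key": "NODE.2"}]
PLUG_ROWS = [
    {"key": "PLUG.1", "node": "NODE.1"},
    {"key": "PLUG.2", "node": "NODE.2"},
    {"key": "PLUG.3", "node": "NODE.2"},
]


@dataclass
class Plug:
    key: str


@dataclass
class Node:
    key: str
    plugs: list = field(default_factory=list)


def load_node(node_key):
    """Build a Node together with its plugs in one step."""
    if not any(row["key"] == node_key for row in NODE_ROWS):
        raise KeyError(node_key)
    # The plug lookup lives in the loader, so callers never see a node without plugs.
    plugs = [Plug(row["key"]) for row in PLUG_ROWS if row["node"] == node_key]
    return Node(node_key, plugs)


node2 = load_node("NODE.2")
```

With this layout, the knowledge that a node needs its plugs lives in exactly one place, the loader, rather than in every piece of code that creates nodes.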
