ArangoDB: Traversals where edges are connected to other edges - graph-databases

I recently read that ArangoDB is capable of connecting edges to other edges in a graph. In this situation, how would querying the path work? For example:
car <-------- part
        ^
        |
        |
installationEvidence
In this case, installationEvidence is a node connected to the edge that runs from the part to the car. Starting from the car node, what AQL returns installationEvidence but not part? Are both installationEvidence and part considered to be at the p.vertices[1] layer?

In ArangoDB, edges are a special type of document.
That is why you can store edges that point to other edges.
From a query point of view, there are two cases for these edges:
A) The traversal leads to the target edge. In this case the target is treated as a plain document, and the traversal will not follow the target edge in either direction.
This means you have to write two traversal steps in the statement:
the first ends at the edge,
the second starts at the _from or _to of that edge.
In your case the query could look like this:
FOR edge IN 1 OUTBOUND @installationEvidence @@edges1
  LET car = DOCUMENT(edge._to)
  RETURN car
B) A traversal walks through an edge which has other edges pointing to it.
This case is more complicated. In ArangoDB's architecture the "vertex" does not know anything about its attached edges; only the edges know their vertices.
What you could do in this case is to again write two traversal statements, where the second starts at the edge encountered by the first, e.g.:
FOR part, edge IN 1 INBOUND @car @@edges1
  FOR installationEvidence IN 1 INBOUND edge @@edges2
[...]
So far we have not encountered a customer use case that would call for making the above traversal more transparent. If this is critical for you, please contact us and we can increase the priority of making this kind of query easier to formulate.
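To make the two-step idea in case B concrete, here is a minimal in-memory sketch in Python. It is not ArangoDB's API: the `docs` dictionary, the collection names `edges1`/`edges2`, the document keys, and the helper names are all assumptions made up for illustration. It only mirrors the logic of the nested FOR loops: first find the edges pointing at the car, then find the edges pointing at each of those edges.

```python
# Hypothetical in-memory model (not the ArangoDB API): every document has an
# _id, and edge documents additionally carry _from and _to.
docs = {
    "cars/1": {"_id": "cars/1"},
    "parts/1": {"_id": "parts/1"},
    "evidence/1": {"_id": "evidence/1"},
    # edges1: part -> car
    "edges1/1": {"_id": "edges1/1", "_from": "parts/1", "_to": "cars/1"},
    # edges2: installationEvidence -> the edge above (an edge pointing at an edge)
    "edges2/1": {"_id": "edges2/1", "_from": "evidence/1", "_to": "edges1/1"},
}


def inbound(target_id, collection):
    """Yield (source document, edge) for each edge in `collection` whose _to is target_id."""
    for doc in docs.values():
        if doc["_id"].startswith(collection + "/") and doc.get("_to") == target_id:
            yield docs[doc["_from"]], doc


def evidence_for_car(car_id):
    """Two nested one-step traversals: car <- edge, then edge <- installationEvidence."""
    found = []
    for _part, edge in inbound(car_id, "edges1"):
        for evidence, _ in inbound(edge["_id"], "edges2"):
            found.append(evidence["_id"])
    return found


print(evidence_for_car("cars/1"))
```

The part is visited by the outer loop but never returned, which is the behaviour the question asks for.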

Related

Depth/Breadth First Search Traversal

I am using gremlin with AWS Neptune, and for certain reasons, I want to traverse the graph in either depth-first or breadth-first manner (doesn't matter). This is what I am doing currently:
g.V('0').repeat(out('connected_to').dedup().where(without('z')).aggregate('z')).until(out('connected_to').dedup().where(without('z')).count().is(0)).select('z').limit(1).unfold()
I know that a path exists from vertex '0' to every other vertex in the graph, but there may be cycles in the graph and so, I use the Collection 'z' to keep track of visited nodes, making sure I do not revisit such a node.
If this were to work, I would have all 1000 vertices of the graph in 'z' at the end. But that isn't the case. I get 600 vertices and some vertices are missing even though they have clear incoming edges from other vertices that are in 'z'. What's wrong with my logic here?
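For comparison, the visited-set bookkeeping described above can be written as a plain breadth-first search outside Gremlin. This is only a sketch over a hypothetical adjacency dictionary (`graph` and `bfs` are illustrative names, not part of Neptune or TinkerPop), and it makes no claim about why the Gremlin query stops early.

```python
from collections import deque


def bfs(adjacency, start):
    """Return the vertices reachable from `start` in breadth-first order."""
    visited = {start}          # plays the role of the collection 'z'
    order = [start]
    queue = deque([start])
    while queue:
        vertex = queue.popleft()
        for neighbour in adjacency.get(vertex, []):
            if neighbour not in visited:   # skip vertices reached earlier (handles cycles)
                visited.add(neighbour)
                order.append(neighbour)
                queue.append(neighbour)
    return order


# Hypothetical graph with cycles (0 -> 2 -> 0 and 1 -> 3 -> 1).
graph = {"0": ["1", "2"], "1": ["2", "3"], "2": ["0"], "3": ["1"]}
print(bfs(graph, "0"))
```

The key property is that every unvisited neighbour of every dequeued vertex is both recorded and queued for expansion, so nothing reachable is skipped.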

ArangoDB Creating counter edge for every directed edge for bidirectional edges

I am very new to graph databases, and I have started with ArangoDB. For this project I am not sure which queries I will encounter in the future, and I don't want to create bottlenecks. So I wanted to create undirected or bidirectional edges everywhere.
However, as only directed edges are supported, my current understanding is that if some vertex is not reachable by a directed traversal then I'll hit a bottleneck later. So whenever I create an edge a -> b, I also create b -> a in the same edge collection.
Are my assumptions correct, and is this design decision acceptable?
While edges are always directed, you can choose to ignore the edge direction in a traversal by using ANY: https://www.arangodb.com/docs/stable/aql/graphs-traversals.html
OUTBOUND to follow an edge in its defined direction (_from → _to)
INBOUND to follow in the opposite direction (_from ← _to)
ANY to follow regardless of the edge direction, inbound and outbound (_from ↔ _to)
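The three direction keywords can be illustrated with a small Python sketch over a plain edge list. This is a toy model, not ArangoDB code; the `edges` list and the `neighbours` helper are hypothetical. It shows that a single stored edge a -> b is already reachable from b, so the mirrored edge b -> a is not needed for reachability.

```python
# Each edge is stored once as (_from, _to).
edges = [("a", "b"), ("c", "a")]


def neighbours(vertex, direction):
    """Return the vertices one step away from `vertex` in the given direction."""
    result = []
    for source, target in edges:
        if direction in ("OUTBOUND", "ANY") and source == vertex:
            result.append(target)      # follow _from -> _to
        if direction in ("INBOUND", "ANY") and target == vertex:
            result.append(source)      # follow _to -> _from
    return result


print(neighbours("a", "OUTBOUND"))
print(neighbours("a", "INBOUND"))
print(neighbours("a", "ANY"))
print(neighbours("b", "ANY"))
```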

What edges are not in any MST

This is a homework question. I do not want the solution. I'm offering the solution I've been thinking of and wish to know whether it is good or why it is flawed.
My goal is to find which edges of a weighted, undirected graph are not part of any MST. This problem only makes sense when several edges have the same weight; otherwise the MST is unique.
My idea comes from Prim's algorithm with a slight change: instead of adding only the minimum edge from S to T on every step (where S and T are the two sets of vertices), look for the minimum edge and also any other edges of the same weight going from S to the vertex the minimum edge leads to. By doing that (so I suppose) we will obtain a graph containing all the edges which appear in any MST. If this is right, I can simply XOR that edge list with the original graph's edge list to find which edges are not in any MST.
Thanks in advance.
Do you add all the edges you find (=those with equal weight)? If so, you will lose some edges:
Consider a pentagon with equal edge costs. You start with 1 node and add the 2 edges to the 2 adjacent nodes. In your next step you would add the 2 edges going from those 2 adjacent nodes to the 2 disconnected nodes, and you would be done. However, all edges are of equal cost and they are all valid candidates for an MST. The edge between the last 2 nodes is not included by your algorithm but could be part of an MST.
It's even worse. Suppose that last edge is of lower cost. Your algorithm still doesn't include it, yet it's present in every MST. You're adding several edges per step to account for all the possibilities but adding those edges changes the next steps.
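The pentagon claim can be checked by brute force. The sketch below is not the asker's modified Prim; it simply enumerates every spanning tree of a small graph and reports which edges appear in at least one MST and which appear in every MST. The function names and the node numbering (start node 0, "last" edge between nodes 2 and 3) are assumptions chosen for illustration, and the approach is only practical for tiny graphs.

```python
from itertools import combinations


def is_spanning_tree(nodes, tree_edges):
    """True if the given n-1 edges contain no cycle (and therefore span all nodes)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for u, v, _weight in tree_edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False           # this edge would close a cycle
        parent[root_u] = root_v
    return True


def mst_edge_usage(nodes, edges):
    """Return (edges in at least one MST, edges in every MST)."""
    trees = [t for t in combinations(edges, len(nodes) - 1)
             if is_spanning_tree(nodes, t)]
    best = min(sum(w for _, _, w in t) for t in trees)
    msts = [set(t) for t in trees if sum(w for _, _, w in t) == best]
    return set.union(*msts), set.intersection(*msts)


nodes = [0, 1, 2, 3, 4]
pentagon = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 4, 1), (4, 0, 1)]
in_some, in_every = mst_edge_usage(nodes, pentagon)
print(len(in_some), len(in_every))

# Same pentagon, but the edge between the two last-reached nodes is cheaper.
cheaper = [(0, 1, 1), (1, 2, 1), (2, 3, 0), (3, 4, 1), (4, 0, 1)]
in_some2, in_every2 = mst_edge_usage(nodes, cheaper)
print(in_every2)
```

With equal weights every edge belongs to some MST and none belongs to all of them; with the cheaper edge, that edge belongs to every MST.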

How does the winged-edge structure for meshes work?

I'm implementing an algorithm in which I need to manipulate a mesh, adding and deleting edges quickly and iterating quickly over the edges adjacent to a vertex in CCW or CW order.
The winged-edge structure is used in the description of the algorithm I'm working from, but I can't find any concise descriptions of how to perform those operations on this data structure.
I learned about it at university, but that was a while ago.
In response to this question I also searched the web for good documentation and found none, but we can go through a quick example of CCW and CW order and of insertion/deletion here.
Have a look at this table and graphic:
from this page:
http://www.cs.mtu.edu/~shene/COURSES/cs3621/NOTES/model/winged-e.html
The table gives only the entry for one edge a, in a real table you have this row for every edge. You can see you get the:
left predecessor,
left successor,
right predecessor,
right successor
but here comes the critical point: it gives them relative to the direction of the edge which is X->Y in this case, and when it is right-traversed (e->a->c).
So for the CW order of going through the graph this is very easy to read: edge a has right-successor c, and then you look into the row for edge c.
This table is easy to read for CW-order traversal; for CCW you have to think "which edge did I come from when I walked this edge backwards?". Effectively, you get the next edge in CCW order by taking the left-traverse predecessor, in this case b, and continuing with the row entry for edge b in the same manner.
Now insertion and deletion: it is clear that you can't just remove the edge and expect the graph to still consist only of triangles; during deletion you have to join two vertices, for example X and Y in the graphic. To do this, you first have to fix every place where the edge a is referred to.
So where can a be referred-to? only in the edges b,c,d and e (all other edges are too far away to know a) plus in the vertex->edge-table if you have that (but let's only consider the edges-table in this example).
As an example of how we have to fix edges, let's take a look at c. Like a, c has a left and right predecessor and successor (so 4 edges); which one of those is a? We cannot know that without checking, because the table entry for c can have the node Y as either its start node or its end node. So we have to check which one it is. Let's assume we find that c has Y as its start node; we then have to check whether a is c's right predecessor (which it is, and which we find out by looking at c's entry and comparing it to a) OR whether it is c's right successor. "Successor??" you might ask. Yes, because remember that the two "left-traverse" columns are relative to walking the edge backwards. So, now we have found that a is c's right predecessor, and we can fix that reference by inserting a's right predecessor. Continue with the other 3 edges and you are done with the edges table. Fixing an additional vertex->edge table is trivial of course: just look into the entries for X and Y and delete a there.
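The check described for c can be sketched as a small record type in Python. The row for a follows the text (X->Y, left predecessor b, right predecessor e, right successor c, with d assumed as the remaining left successor); the row for c is partly hypothetical: only its start node Y and its right predecessor a come from the text, while the end node "W" is made up. The names `WingedEdge`, `WING_SLOTS` and `references_to` are illustrative, and faces are left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import Optional

WING_SLOTS = ("left_pred", "left_succ", "right_pred", "right_succ")


@dataclass
class WingedEdge:
    name: str
    start: str
    end: str
    left_pred: Optional[str] = None
    left_succ: Optional[str] = None
    right_pred: Optional[str] = None
    right_succ: Optional[str] = None


def references_to(edges, target):
    """List (edge name, slot) pairs whose wing slot points at `target`."""
    return [(edge.name, slot)
            for edge in edges.values()
            for slot in WING_SLOTS
            if getattr(edge, slot) == target]


edges = {
    "a": WingedEdge("a", "X", "Y", left_pred="b", left_succ="d",
                    right_pred="e", right_succ="c"),
    # Hypothetical row for c: starts at Y, and a is its right predecessor.
    "c": WingedEdge("c", "Y", "W", right_pred="a"),
}

# Step 1: find out which slot of which edge refers to a.
print(references_to(edges, "a"))

# Step 2: the fix described in the text for c: insert a's right predecessor.
edges["c"].right_pred = edges["a"].right_pred
print(edges["c"].right_pred, references_to(edges, "a"))
```

In a full table the same lookup would also report the slots of b, d and e, each of which needs its own check of which replacement edge is correct.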
Adding edges is basically the reverse of this fix-up of 4 other edges, BUT with a little twist. Let's call the node which we want to split Z (it will be split into X and Y). You have to take care that you split it in the right direction, because you can have either d and e combined in a node or e and c (as when the new edge is horizontal instead of the vertical a in the graphic)! You first have to find out between which 2 edges of the soon-to-be X and between which 2 edges of Y the new edge is added. You just choose which edges shall be on one node and which on the other node. In this example graphic: choose that you want b, c and the 2 edges to the north in between them on one node, and it follows that the other edges are on the other node, which will become X. You then find by vector subtraction that the new edge a has to be between b and c, not between, say, c and one of the 2 edges in the north. The vector subtraction is the desired position of the new X minus the desired position of Y.

R Tree 50,000 foot overview?

I'm working on a school project that involves taking a lat/long point and finding the top five closest points in a known list of places. The list is to be stored in memory, with the caveat that we must choose an "appropriate data structure" -- that is, we cannot simply store all the places in an array and compare distances one-by-one in a linear fashion. The teacher suggested grouping the place data by US State to prevent calculating the distance for places that are obviously too far away. I think I can do better.
From my research online it seems like an R-Tree or one of its variants might be a neat solution. Unfortunately, that sentence is as far as I've gotten with understanding the actual technique, as the literature is simply too dense for my non-academic head.
Can somebody give me a really high overview of what the process is for populating an R-Tree with lat/long data, and then traversing the tree to find those 5 nearest neighbors of a given point?
Additionally the project is in C, and I don't have to reinvent the wheel on this, so if you've used an existing open source C implementation of an R Tree I'd be interested in your experiences.
UPDATE: This blog post describes a straightforward search algorithm for a regionally partitioned space (like a PR quadtree). Hope that helps a future reader.
Have you considered alternative data structures?
I believe a Point Quadtree would be more effective for your needs than an R-tree. Spatial Index Demos provides demos for a list of possible data structures, including the R-tree and the Point Quadtree. Hope it gives some insight.
Quad Trees
A quad tree takes a square of space and divides it into four children with half the dimensions along the X and Y axis.
+---+---+
|   |   |    Each square is a child
|   |   |    of the parent; when you
+---+---+    get to leaves a node has
|   |   |    a single point or a list
|   |   |    of points.
+---+---+
This data structure is recursive and you search for points by checking which child holds the point until you get to the leaf. A leaf either has a single member (point with X,Y coords) or a list of members, depending on the implementation. If you fill up a node you split it into 4 and distribute the children. Essentially, the data structure is a generalisation of a binary tree, so it is not necessarily balanced.
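A minimal sketch of the structure just described, in Python for brevity (the project in the question is in C, so treat this as pseudocode-level guidance). The class name `QuadTree`, the `capacity` parameter and the `MIN_SIZE` guard are assumptions of this sketch; the guard only exists so that duplicate points cannot cause endless splitting.

```python
MIN_SIZE = 1e-9  # assumed lower bound on a square's side, stops endless splits


class QuadTree:
    def __init__(self, x, y, size, capacity=1):
        self.x, self.y, self.size = x, y, size   # lower-left corner and side length
        self.capacity = capacity                  # points a leaf holds before it splits
        self.points = []
        self.children = None                      # None while this node is a leaf

    def _child_for(self, px, py):
        """Pick the child square that contains (px, py)."""
        half = self.size / 2
        col = 1 if px >= self.x + half else 0
        row = 1 if py >= self.y + half else 0
        return self.children[row * 2 + col]

    def insert(self, px, py):
        if self.children is not None:             # inner node: descend
            self._child_for(px, py).insert(px, py)
            return
        self.points.append((px, py))
        if len(self.points) > self.capacity and self.size > MIN_SIZE:
            self._split()

    def _split(self):
        """Divide this square into four and redistribute its points."""
        half = self.size / 2
        self.children = [
            QuadTree(self.x + dx * half, self.y + dy * half, half, self.capacity)
            for dy in (0, 1) for dx in (0, 1)
        ]
        points, self.points = self.points, []
        for px, py in points:
            self._child_for(px, py).insert(px, py)

    def leaf_for(self, px, py):
        """Walk down to the leaf whose square contains (px, py)."""
        node = self
        while node.children is not None:
            node = node._child_for(px, py)
        return node


tree = QuadTree(0, 0, 100)
for point in [(10, 10), (80, 20), (60, 70)]:
    tree.insert(*point)
print(tree.leaf_for(80, 20).points)
```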
Balancing a quad tree may not be necessary for your purposes and is left as an exercise for the reader - try searching on the web for 'balanced quad tree'
Note that this data structure cannot index items that can overlap, but if you're only storing points this won't be a problem.
Finding nearest neighbours in a quad tree
Off the top of my head, here's a quick and dirty algorithm for finding the 'n' nearest neighbours to your point. It's not necessarily optimally efficient, but it will be fairly simple to implement. If someone has a link to a better one, feel free to post it in a comment or answer.
1. Locate the quad tree node containing your point, keeping a list of its parents.
2. Push all of the points in the node into a priority queue based on their distance from your base point (i.e. by the length of the hypotenuse per Pythagoras' theorem). Depending on the implementation there may be one or more per node. For a simple implementation of a priority queue data structure, look up 'binary heap'.
3. If any of the 'n' points are further away than the edges of the bounding box, add the contents of its neighbours. That is, if your base point is close to the edge of the bounding box, it is possible that neighbouring tree nodes contain points that are closer than the points found within your bounding box. You will need to back up the tree to do this, which is why you need to keep track of your parent nodes.
4. When all of the 'n' closest points are closer than the edges of your bounding box, you know that there could not possibly be neighbours that you have missed. Therefore, the 'n' closest points within this box must be your 'n' closest neighbours.
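The steps above can be sketched as follows, again in Python rather than C. Several simplifications are assumptions of this sketch: the tree is built once from a list of points instead of by incremental insertion, every node keeps the full list of points in its square (wasteful, but it makes "backing up to the parent" trivial), the parent link replaces the explicit list of parents, `heapq.nsmallest` stands in for the priority queue, and distances are plain Euclidean on the raw coordinates, which is only an approximation for lat/long data.

```python
import heapq
import math


class Node:
    def __init__(self, x, y, size, points, capacity=2, parent=None):
        self.x, self.y, self.size, self.parent = x, y, size, parent
        self.points = points            # all points inside this square
        self.children = []
        if len(points) > capacity and size > 1e-9:
            half = size / 2
            for dy in (0, 1):
                for dx in (0, 1):
                    inside = [p for p in points
                              if (p[0] >= x + half) == bool(dx)
                              and (p[1] >= y + half) == bool(dy)]
                    self.children.append(
                        Node(x + dx * half, y + dy * half, half, inside, capacity, self))

    def leaf_for(self, qx, qy):
        """Step 1: descend to the leaf square containing the query point."""
        node = self
        while node.children:
            half = node.size / 2
            index = (2 if qy >= node.y + half else 0) + (1 if qx >= node.x + half else 0)
            node = node.children[index]
        return node

    def edge_distance(self, qx, qy):
        """Distance from the query point to the nearest side of this square."""
        return min(qx - self.x, self.x + self.size - qx,
                   qy - self.y, self.y + self.size - qy)


def nearest(root, qx, qy, n):
    def dist(p):
        return math.hypot(p[0] - qx, p[1] - qy)

    node = root.leaf_for(qx, qy)
    while True:
        best = heapq.nsmallest(n, node.points, key=dist)         # step 2
        safe = len(best) == n and dist(best[-1]) <= node.edge_distance(qx, qy)
        if safe or node.parent is None:                           # step 4
            return best
        node = node.parent                                        # step 3: widen the box


points = [(1, 1), (2, 2), (9, 9), (6, 5), (5.2, 5.1), (4.9, 5), (4.8, 4.8), (0, 9)]
root = Node(0, 0, 10, points)
print(nearest(root, 5, 5, 3))
```

The query point (5, 5) sits on the corner of four squares, so the search has to back up all the way to the root before the three closest points are guaranteed to be correct.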

Resources