Create a consistent topology using pgrouting - postgis

I'm developing an application that needs routing information for certain cities. First, I downloaded a openstreetmap datafile (*.osm) and then I imported it into a postgreSQL database using osm2pgrouting tool (http://workshop.pgrouting.org/chapters/installation.html).
After this, I have the following tables:
nodes: that contains simple locations points
ways: that contains ways with some nodes involved
vertices_tmp: stores nodes that may be used for pgrouting functions like Djistra, A*...etc.
Would I use nodes that isn't in "vertices_tmp" table for calculate distances between nodes? Or I would only do it with the nodes stored in "vertices_tmp"?
Into ways table there are a field named "the_geom" that encapsulates different locations points (nodes). For example:
"MULTILINESTRING((1.5897786 42.5600441,1.5898376 42.5601455,1.589992 42.5605438,1.590095 42.5606795,1.5901782 42.5608026,1.5902238 42.561018,1.5902912 42.5616808,1.5903685 42.561899,1.5904008 42.5620563,1.5903836 42.5624117,1.5904265 42.5627151,1.5904947 42.5628368,1.5905981 42.5629553,1.5906926 42.5631007,1.590802 42.5633238,1.5908604 42.5634883,1.5909501 42.5637139,1.5910869 42.5638755,1.5913053 42.5639639,1.5914994 42.5640237,1.591648 42.5640261,1.5919232 42.5640145,1.5921124 42.5640363,1.5923292 42.5640953,1.592804 42.5643306))"
Can I route with intermediate nodes or only with source/target nodes?
My goal is to be able to routing between different nodes or POIs, depending of its amenity tags, not only driving distance, walking distance too. Furthermore I need to calculate shortest path for source/targets nodes.
Any idea for do this?

You can't use the elements of the nodes table.
If you want to plan route from one POI to another, first you have to find the nearest vertex/edge based on the selected algorithm(Shooting star requires edges, the others use vertices).
After this you can make the routing, just pick an algorithm from THIS SITE
You will find there a good tutorial about different routing solutions and some help for the detailed usage (including how to determine the closest way).

Related

How do I get all nodes that do not contain (relate) nodes with certain property in Cypher Neo4j

I have a case where I need to find nodes that do not contain (relate) all the required nodes.
My business logic is as follows:
* A Trajectory contains several Points.
* A Trajectory is complete when it has at least:
* ONE Point START
* ONE Point MIDDLE
* ONE Point FINISH
In the following example, I have 4 Trajectories
http://console.neo4j.org/?id=1fjeyl
One Trajectory is complete and the other three are incomplete.
How can I find all the trajectories that do not contain all the required points?
There's a few ways you can do this.
With this model, one way is you can collect the nodes per trajectory and use list predicates to only include trajectories where any of the required positions are missing:
MATCH (t:Trajectory)-[:CONTAINS]->(p)
WITH t, collect(DISTINCT p.pos) as pointPositions
WHERE size(pointPositions) < 3 OR any(required in ['START', 'END', 'MIDDLE'] WHERE NOT required IN pointPositions)
RETURN t
Note that if you refactor your model such that the position of a point is indicated by the relationship point instead, such as:
(:Trajectory)-[:HAS_START]->(:Point)
(:Trajectory)-[:HAS_END]->(:Point)
(:Trajectory)-[:HAS_MIDDLE]->(:Point)
Then your query will be a bit simpler and efficiency will improve (which will show the most gain when you have lots of trajectories, some with lots of connected nodes).
MATCH (t:Trajectory)
WHERE NOT (t)-[:HAS_START]->() OR NOT (t)-[:HAS_END]->() OR NOT (t)-[:HAS_MIDDLE]->()
RETURN t
With that modeling and this kind of query, we don't even have to expand out from trajectory nodes to get our answer, since a node knows what relationships are connected to it (by type and/or direction) and their count. It's very easy then to determine if certain relationship types are or aren't present.

Issue with OpenStreetMap administration area rendering

I imported latest OpenStreetMap data to local SQLServer db and generated "outer" shape for some administration area. Original area shape is available here:
http://www.openstreetmap.org/relation/2658573
What I received is below:
What I did is I took "outer" way definitions for this shape (mentioned above) in the order defined in the db and constructed POLYGON definition from points belonging to all the ways in the order defined in OpenStreetMap db.
To import data to SQLServer db I used OsmSharp.core nuget package.
The orientation of shape data is clockwise as per info on this page(see Naming and Orientation section)):
http://wiki.openstreetmap.org/wiki/Talk:Relation:multipolygon
My questions are:
Does anybody know why taking direct definitions for "ways"/lines describing some area shape doesn't work as it should?
Should I reject some points from line/way definitions?
Maybe I just missed something else...
Edit 1:
I did quick query to verify info from scai and it looks promising:
NodeNr column shows which point on the way/line definition I should take as the start/endpoint for polygon definition. WayNr shows which consecutive way shares point with which one.
Edit 2:
I tested data related to way definitions already and it looks the main issue is with order of nodes in the way definitions only. All the tested ways used all the nodes belonging to them (no any redundant points were there in case the way could cross some other way in the middle of that way).
Some sample data:
The definition of the way with nr 42 (WayNr = 42) shows that start point of this line (continuation of the way nr 41) is on position 44. It means point/node 44 in the definition of the way 42 has the same gps/coordinates as point on position 30 in the definition of way nr 41.
After taking such "reverted" lines in the correct order I received the shape shown below:
After adding an inner area (exclusion) it looks finally as the original one shape:
The other issue with OpenStreetMap data is it uses right-hand rule to define outer-ring of the shape (clockwise instead of counter-clockwise) so in case of SQLServer we have to use ReorientObject() method on Geography instance. To detect orientation of the shape you can use EnvelopeAngle() method on Geography object for instance.
Edit 3:
I received an answer on OSM forum the order of ways on relation members doesn't matter (we can't rely on it) so ways have to be reordered before any further processing yet...
There are already various similar questions here at stackoverflow and at gis.stackexchange.com.
Relations don't necessarily contain the ways in the correct order. Thus it will be necessary to order the ways yourself. Take a look at the ID of the last node of a way and search for the way who has the same ID for its start node. If you do this for all ways then the resulting geometry should be fine.

Need help solving a problem using graphs in C

i'm coding a c project for an algorithm class and i really need some help!
Here's the problem:
I have a set of names like this one N = (James,John,Robert,Mary,Patricia,Linda Barbara) wich are stored in an RB tree.
Starting from this set of names a series of couple like those ones are formed:
(James,Mary)
(James,Patricia)
(John,Linda)
(John,Barbara)
(Robert,Linda)
(Robert,Barbara)
Now i need to merge the elements in a way that i can form n subgroups with the constraint that each pairing is respected and the group has the smallest possible cardinality.
With the couples in the example they will form two groups
(James,Mary,Patricia) and (John,Robert,Barbara,Linda).
The task is to return the maximum number of groups formed and the number of males and females in the group with the maximum cardinality.
In this case it would be 2 2 2
I was thinking about building a graph where every name is represented by a vertex and two vertex are in an edge only if they are paired.
I can then use an algorithm (like Kruskal) to find the Minimum spanning tree.Is that right?
The problem is that the graph would not be completely connected.
I also need to find a way to map the names to the edges of the Graph and vice-versa.
Can the edges be indexed by a string?
Every help is really appreciated :)
Thanks in advice!
You don't need to find the minimum spanning tree. That is really for finding the "best" edges in a graph that will still keep the graph connected. In other words, you don't care how John and Robert are connected, just that they are.
You say that the problem is that the graph would not be completely connected, but I think that is actually the point. If you represent graph edges by using the couples as you suggest, then the vertices that are connected form the groups that you are looking for.
In your example, James is connected to Mary and also James is connected to Patricia. No other person connects to any of those three vertices (if they did, you would have another couple that included them), which is why they form a single group of (James, Mary, Patricia). Similarly all of John, Robert, Barbara, and Linda are connected to each other.
Your task is really to form the graph and find all of the connected subgraphs that are disjoint from each other.
While not a full algorithm, I hope that helps get you started.
I think that you can easily solve this with a dfs and connected components. Because every person(node) has a relation with an other one (edge). So you have an outer loop and run an explore function for every node which is unvisited and add the same number for every node explored by the explore function.
e.g
dfs() {
int group 0;
for(int i=0;i<num_nodes;i++) {
if(nodes[i].visited==false){
explore(nodes[i],group);
group++;
}
}
then you simple have to sort the node by the group and then you are ready. if you want to track the path you can use a pre number which indicates which node was explored first, second..etc
(sorry for my bad english)!
The sets of names and pairs of names already form a graph. A data structure with nodes and pointers to other nodes is just another representation, one that you don't necessarily need. Disjoint sets are easier to implement IMO, and their purpose in life is exactly to keep track of sameness as pairs of things are joined together.

Datamodel for Property Graph over HBase/Cassandra

I'm willing to store a Property Graph into HBase. A Property Graph is a graph nodes and edges have properties and multiple edges can link the same tuple of nodes as long as the edges belong to different types.
My query pattern will be either asking for properties and neighborhood or traversing the graph. An example is: Vertex[name=claudio]=>OutgoingEdge[knows]=>Vertex[gender=female], which will give me all the female people that claudio likes.
I know that a graph database does just this, but they usually don't scale on multiple nodes in case of a huge dataset. So I'm willing to implement this on a NoSQL ColumnStore (HBase, Cassandra...)
My datamodel follows.
Vertices Table:
key: vertexid (uuid)
Family "Properties:": <property name>=><property value>, ...
Family "OutgoingEdges:": <edge key>=><other vertexid>, ...
Family "IncomingEdges:": same as outgoing edges...
This table allows me to fetch quickly the properties of a vertex and
its adjacency list. I can't use the vertexid as the other endpoint
because multiple edges (with different types) can connect the same two
vertices.
Edges Table:
key: edge key (composite(<source vertexid>, <destination vertexid>,
<edge typename>)) (i.e. vertexid1_vertexid2_knows)
Family "Properties:": <property name>=><property value>, ...
This table allows me to fetch quickly the properties of an edge.
Edges Types:
key: composite(<source vertexid>, "out|in", <edge typename>) (i.e.
vertexid1_out_knows)
Family "Neighbor:": <destination vertexid>=>null,...
This table allows me to search/scan for edges that are either incoming
or outgoing from a vertex and belong to specific type and would be the
core of the traversing ability of the API (so i want it to be as fast as
possible both in terms of network I/O (RPCs), disk I/O (seek)). It
should also "scale" on the size of the graph, meaning that with the
growth of the graph the cost of this type of operation should depend on
the number of edges outgoing from the vertex and not on the total number
of vertices and edges.
The example above i'd be considering vertexid1 the source vertex with
property name:claudio i'd scan vertexid1_out_knows and receive the list of
vertices connected. After that i can scan on the column
"Properties:gender" on these vertices and look for those having the
"female" value.
Questions:
1) General: do you see a better data model for my operations?
2) Can i fit everything in one table where for certain keys some
families would be empty (i.e. the "OutgoingEdges:" family would not make
sense for the edges)? I'd like that because as you can see all the keys
are composed by the vertexid uuid prefix, so they would be very compact
and fit mostly on the same regionserver.
3) I guess that for the scanning I'd make extensive use of Filters. I
guess regexp Filter will be my friend. Do you have concerns about
performance of filters applied to this data model?
This type of model looks like a sensible starting point for Cassandra (don't know much about HBase) - but for any distributed store you will run up against problems when traversing, because traversals will cross multiple nodes.
This is why dedicated graph databases such as Neo4J use a single-node design, and try to keep all data in RAM.
Looking up properties of particular nodes or edges should work well and scale horizontally - Twitter's FlockDB (now apparently abandoned) was a notable example of this.
You also need to consider whether you need lookups other than by ID (i.e. do you need any indexes)?

Suggestions of the easiest algorithms for some Graph operations

The deadline for this project is closing in very quickly and I don't have much time to deal with what it's left. So, instead of looking for the best (and probably more complicated/time consuming) algorithms, I'm looking for the easiest algorithms to implement a few operations on a Graph structure.
The operations I'll need to do is as follows:
List all users in the graph network given a distance X
List all users in the graph network given a distance X and the type of relation
Calculate the shortest path between 2 users on the graph network given a type of relation
Calculate the maximum distance between 2 users on the graph network
Calculate the most distant connected users on the graph network
A few notes about my Graph implementation:
The edge node has 2 properties, one is of type char and another int. They represent the type of relation and weight, respectively.
The Graph is implemented with linked lists, for both the vertices and edges. I mean, each vertex points to the next one and each vertex also points to the head of a different linked list, the edges for that specific vertex.
What I know about what I need to do:
I don't know if this is the easiest as I said above, but for the shortest path between 2 users, I believe the Dijkstra algorithm is what people seem to recommend pretty often so I think I'm going with that.
I've been searching and searching and I'm finding it hard to implement this algorithm, does anyone know of any tutorial or something easy to understand so I can implement this algorithm myself? If possible, with C source code examples, it would help a lot. I see many examples with math notations but that just confuses me even more.
Do you think it would help if I "converted" the graph to an adjacency matrix to represent the links weight and relation type? Would it be easier to perform the algorithm on that instead of the linked lists? I could easily implement a function to do that conversion when needed. I'm saying this because I got the feeling it would be easier after reading a couple of pages about the subject, but I could be wrong.
I don't have any ideas about the other 4 operations, suggestions?
List all users in the graph network given a distance X
A distance X from what? from a starting node or a distance X between themselves? Can you give an example? This may or may not be as simple as doing a BF search or running Dijkstra.
Assuming you start at a certain node and want to list all nodes that have distances X to the starting node, just run BFS from the starting node. When you are about to insert a new node in the queue, check if the distance from the starting node to the node you want to insert the new node from + the weight of the edge from the node you want to insert the new node from to the new node is <= X. If it's strictly lower, insert the new node and if it is equal just print the new node (and only insert it if you can also have 0 as an edge weight).
List all users in the graph network given a distance X and the type of relation
See above. Just factor in the type of relation into the BFS: if the type of the parent is different than that of the node you are trying to insert into the queue, don't insert it.
Calculate the shortest path between 2 users on the graph network given a type of relation
The algorithm depends on a number of factors:
How often will you need to calculate this?
How many nodes do you have?
Since you want easy, the easiest are Roy-Floyd and Dijkstra's.
Using Roy-Floyd is cubic in the number of nodes, so inefficient. Only use this if you can afford to run it once and then answer each query in O(1). Use this if you can afford to keep an adjacency matrix in memory.
Dijkstra's is quadratic in the number of nodes if you want to keep it simple, but you'll have to run it each time you want to calculate the distance between two nodes. If you want to use Dijkstra's, use an adjacency list.
Here are C implementations: Roy-Floyd and Dijkstra_1, Dijkstra_2. You can find a lot on google with "<algorithm name> c implementation".
Edit: Roy-Floyd is out of the question for 18 000 nodes, as is an adjacency matrix. It would take way too much time to build and way too much memory. Your best bet is to either use Dijkstra's algorithm for each query, but preferably implementing Dijkstra using a heap - in the links I provided, use a heap to find the minimum. If you run the classical Dijkstra on each query, that could also take a very long time.
Another option is to use the Bellman-Ford algorithm on each query, which will give you O(Nodes*Edges) runtime per query. However, this is a big overestimate IF you don't implement it as Wikipedia tells you to. Instead, use a queue similar to the one used in BFS. Whenever a node updates its distance from the source, insert that node back into the queue. This will be very fast in practice, and will also work for negative weights. I suggest you use either this or the Dijkstra with heap, since classical Dijkstra might take a long time on 18 000 nodes.
Calculate the maximum distance between 2 users on the graph network
The simplest way is to use backtracking: try all possibilities and keep the longest path found. This is NP-complete, so polynomial solutions don't exist.
This is really bad if you have 18 000 nodes, I don't know any algorithm (simple or otherwise) that will work reasonably fast for so many nodes. Consider approximating it using greedy algorithms. Or maybe your graph has certain properties that you could take advantage of. For example, is it a DAG (Directed Acyclic Graph)?
Calculate the most distant connected users on the graph network
Meaning you want to find the diameter of the graph. The simplest way to do this is to find the distances between each two nodes (all pairs shortest paths - either run Roy-Floyd or Dijkstra between each two nodes and pick the two with the maximum distance).
Again, this is very hard to do fast with your number of nodes and edges. I'm afraid you're out of luck on these last two questions, unless your graph has special properties that can be exploited.
Do you think it would help if I "converted" the graph to an adjacency matrix to represent the links weight and relation type? Would it be easier to perform the algorithm on that instead of the linked lists? I could easily implement a function to do that conversion when needed. I'm saying this because I got the feeling it would be easier after reading a couple of pages about the subject, but I could be wrong.
No, adjacency matrix and Roy-Floyd are a very bad idea unless your application targets supercomputers.
This assumes O(E log V) is an acceptable running time, if you're doing something online, this might not be, and it would require some higher powered machinery.
List all users in the graph network given a distance X
Djikstra's algorithm is good for this, for one time use. You can save the result for future use, with a linear scan through all the vertices (or better yet, sort and binary search).
List all users in the graph network given a distance X and the type of relation
Might be nearly the same as above -- just use some function where the weight would be infinity if it is not of the correct relation.
Calculate the shortest path between 2 users on the graph network given a type of relation
Same as above, essentially, just determine early if you match the two users. (Alternatively, you can "meet in the middle", and terminate early if you find someone on both shortest path spanning tree)
Calculate the maximum distance between 2 users on the graph network
Longest path is an NP-complete problem.
Calculate the most distant connected users on the graph network
This is the diameter of the graph, which you can read about on Math World.
As for the adjacency list vs adjacency matrix question, it depends on how densely populated your graph is. Also, if you want to cache results, then the matrix might be the way to go.
The simplest algorithm to compute shortest path between two nodes is Floyd-Warshall. It's just triple-nested for loops; that's it.
It computes ALL-pairs shortest path in O(N^3), so it may do more work than necessary, and will take a while if N is huge.

Resources