Traverse two links in OrientDB

I'm really new to graph databases and trying to work with OrientDB.
I am using the document DB and I have something like:
projects <--> sources --> subsets
How do I get the subsets linked to sources of a specific project?
Is it possible to do something like this:
(select from (traverse sources.subsets from #project-rid))
Or should I use two traverses?
(select from (traverse subsets from (traverse sources from #project-rid)))
Thanks!

TRAVERSE is useful when the traversal depth is variable. If the depth is known, just cross the links by using dot (.) notation:
select expand( sources.subsets ) from #project-rid
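If you prefer to spell it out as two steps, a nested query should also work; a hedged sketch, with #project-rid standing in for your actual record id:

select expand( subsets ) from (
    select expand( sources ) from #project-rid
)

The inner query expands the project's sources, and the outer one expands their subsets, matching the nested-traverse idea from the question.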

Related

Gremlin - Update values of multiple edges

I am using AWS Neptune and I have to modify a certain property of a set of EDGEs with specific values. I also need this done in a single transaction. In AWS Neptune, manual transaction logic using tx.commit() and tx.rollback() is not supported. Which means I have to do this operation in a single traversal.
If I was to modify properties of vertices instead of edges, I could have got it done with a query similar to the following one:
g.V(<id 1>).property('name', 'Marko').V(<id 2>).property('name', 'Stephen');
This is because it is possible to select vertices by id in mid traversal, i.e. the GraphTraversal class has V(String ... vertexIds) as a member function.
But this is not the same for the case of edges. I am not able to select edges this way because E(String ... edgeIds) is not a member function of the GraphTraversal class.
Can somebody suggest the correct way I can solve this problem?
Thank you.
Amazon Neptune engine 1.0.1.0.200463.0 added support for Gremlin sessions, which enables executing multiple Gremlin traversals in a single transaction.
However, you can do it also with a single query like this:
g.E('id1', 'id2', 'id3').coalesce(
    has(id, 'id1').property('name', 'marko'),
    has(id, 'id2').property('name', 'stephen'),
    has(id, 'id3').property('name', 'vadas'))
You can also get the same result as a mid-traversal E() by using V().outE().hasId(<list of IDs>).
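For example, a hedged sketch of that pattern (the edge ids and property values are placeholders):

g.V().outE().hasId('id1').property('name', 'marko').
    V().outE().hasId('id2').property('name', 'stephen')

Each V() step restarts the traversal from the full vertex set, and hasId() on the outgoing edges narrows it back down to the single edge to update.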

Neo4J: How do I check each disjoint subgraph in a Neo4J query?

After I query through my database using Neo4J, I get a bunch of disjoint subgraphs like 'islands of nodes'.
What I want though is to get the most recent node for each 'island' (I have date values on each node).
How do I go about doing that?
First you need to compute your islands, like you said.
To do it you can check the neo4j-graph-algo library with the procedure algo.unionFind: https://neo4j-contrib.github.io/neo4j-graph-algorithms/#_community_detection_connected_components
Then, for each of your islands, order the nodes by date and take the first one.
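A hedged Cypher sketch of that pipeline, assuming the graph-algorithms plugin is installed and every node carries a date property:

CALL algo.unionFind.stream(null, null, {})
YIELD nodeId, setId
MATCH (n) WHERE id(n) = nodeId
WITH setId, n
ORDER BY n.date DESC
RETURN setId AS island, collect(n)[0] AS mostRecentNode

setId identifies the connected component, so taking the first element of each collection after the descending sort yields the newest node per island.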

Cypher Neo4j - How to identify instances of a relationship for a particular field

I'm in the process of trying to learn Cypher for use with graph databases.
Using Neo4j's test database (http://www.neo4j.org/console?id=shakespeare), how can I find all the performances of a particular play? I'm trying to establish how many times Julius Caesar was performed.
I have tried to use:
MATCH (title)<-[:PERFORMED]-(performance)
WHERE (title: "Julias Caesar")
RETURN title AS Productions;
I'm aware it's quite easy to recognise manually, but on a larger scale it wouldn't be possible.
Thank you
You would have to count the number of performance nodes. You can use count() to get the number of nodes.
MATCH (t)<-[:PERFORMED]-(performance)
WHERE t.title = "Julius Caesar"
RETURN t AS Production, count(performance) AS count_of_performance
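If you only need the number itself, a hedged variant of the same query:

MATCH (t)<-[:PERFORMED]-(performance)
WHERE t.title = "Julius Caesar"
RETURN count(performance) AS performances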

Array intersect Hive

I have two arrays of string in Hive like
{'value1','value2','value3'}
{'value1', 'value2'}
I want to merge arrays without duplicates, result:
{'value1','value2','value3'}
How I can do it in hive?
A native solution could be this:
SELECT id, collect_set(item)
FROM table
LATERAL VIEW explode(list) lTable AS item
GROUP BY id;
First explode the array with a lateral view, then group by and remove the duplicates with collect_set.
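The same idea extends to the two-array case from the question; a hedged sketch, assuming a table t with array columns arr1 and arr2:

SELECT id, collect_set(item)
FROM (
    SELECT id, item FROM t LATERAL VIEW explode(arr1) e1 AS item
    UNION ALL
    SELECT id, item FROM t LATERAL VIEW explode(arr2) e2 AS item
) merged
GROUP BY id;

collect_set gathers the exploded elements of both arrays back into a single array and drops duplicates along the way.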
You will need a UDF for this. Klout has a bunch of open-source Hive UDFs in the brickhouse package on GitHub. They have several UDFs that serve exactly this purpose.
Download, build, and add the JAR. Here is an example:
CREATE TEMPORARY FUNCTION combine AS 'brickhouse.udf.collect.CombineUDF';
CREATE TEMPORARY FUNCTION combine_unique AS 'brickhouse.udf.collect.CombineUniqueUDAF';
select combine_unique(combine(array('a','b','c'), array('b','c','d'))) from reqtable;
OK
["d","b","c","a"]

What DBMS should I use to store openstreetmap as a graph?

Background:
I need to store the following data in a database:
osm nodes with tags;
osm edges with weights (that is, an edge between two nodes, extracted from a 'way' in an .osm file).
Nodes that form edges belonging to the same 'way' should have the same tags as that way, i.e. every node in a 'way' that is a highway should get a 'highway' tag.
I need this structure to easily generate a graph based on various filters, e.g. a graph consisting only of nodes and edges which are highways, or a 'foot paths' graph, etc.
Problem:
I had not heard about spatial indexes before, so I just parsed an .osm file into a MySQL database:
all nodes to a 'nodes' table (with respective coordinate columns) - OK, about 9,000,000 rows in my case:
INSERT INTO nodes VALUES (node_id, lat, lon);
all ways to an 'edges' table (usually one way creates a few edges) - OK, about 9,000,000 rows as well:
INSERT INTO edges VALUES (edge_id, from_node_id, to_node_id);
add tags to nodes, calculate weights for edges - Problem:
Here is the problematic php script:
$query = mysql_query('SELECT * FROM edges');
$i = 0;
while ($res = mysql_fetch_object($query)) {
    $i++;
    echo "$i\n";
    // fetch the two endpoint nodes of this edge, one query each
    $node1 = mysql_query('SELECT * FROM nodes WHERE id=' . $res->from);
    $node1 = mysql_fetch_object($node1);
    $tag1 = $node1->tags;
    $node2 = mysql_query('SELECT * FROM nodes WHERE id=' . $res->to);
    $node2 = mysql_fetch_object($node2);
    $tag2 = $node2->tags;
    // append the edge's tags to each endpoint node, one UPDATE each
    mysql_query('UPDATE nodes SET tags="' . $tag1 . $res->tags . '" WHERE nodes.id=' . $res->from);
    mysql_query('UPDATE nodes SET tags="' . $tag2 . $res->tags . '" WHERE nodes.id=' . $res->to);
}
nohup shows the output of echo "$i\n" only once every 55-60 seconds; at that rate, with more than 9,000,000 rows in the 'edges' table like in my case, the script would take more than 17 years to finish.
htop shows the /usr/bin/mysqld process taking 40-60% of the CPU.
The same problem exists for the script which tries to calculate the weight (the distance) of an edge: select all edges, take an edge, select the edge's two nodes from the 'nodes' table, calculate the distance, then update the 'edges' table.
Question:
How can I make these SQL updates faster? Should I tweak any MySQL config settings? Or should I use PostgreSQL with the PostGIS extension? Should I use another structure for my data? Or should I somehow utilize a spatial index?
If I understand you right, there are two things to discuss.
First, your idea of putting the highway tag on the start and stop nodes. A node can have more than one edge connected, so where do you put the tag from the second edge? Or the third or fourth, if it is a crossing? The reason the highway tag is put in the edges table in the first place is that, from a relational point of view, that is where it belongs.
Second, fetching the whole table and processing it outside the database is not the right way; taking care of this whole process is exactly what a relational database is good at.
I have not worked with MySQL, but from what I have heard you will probably have a lot more fun if you migrate to PostGIS, since PostGIS has much better spatial capabilities (even if you don't need any spatial capabilities for this particular task).
So if we ignore the first problem and, just to show the concept, say that there are only two edges connected to each node and that each node has two tag fields, tag1 and tag2, then it could look something like this in PostGIS:
UPDATE nodes SET tag1 = edges.tags FROM edges WHERE nodes.id = edges.from_node_id;
UPDATE nodes SET tag2 = edges.tags FROM edges WHERE nodes.id = edges.to_node_id;
If you disable the indexes first, this should be very fast.
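The same set-based approach covers the edge-weight script mentioned in the question; a hedged sketch, assuming a numeric weight column on 'edges' and treating the coordinates as planar (swap in a proper geodesic distance for real use):

UPDATE edges
SET weight = sqrt(pow(n2.lat - n1.lat, 2) + pow(n2.lon - n1.lon, 2))
FROM nodes n1, nodes n2
WHERE n1.id = edges.from_node_id
  AND n2.id = edges.to_node_id;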
Again, if I have understood you right.
PostgreSQL
Openstreetmap itself uses PostgreSQL, so I guess that's recommended.
See: http://wiki.openstreetmap.org/wiki/PostgreSQL
You can see OSM's database schema at: http://wiki.openstreetmap.org/wiki/Database_Schema
So you can use the same fields, field types, and indexes that OSM uses, for maximum compatibility.
MySQL
If you want to import .osm files into a MySQL database, have a look at:
http://wiki.openstreetmap.org/wiki/OsmDB.pm
There you will find Perl code that will create the MySQL tables, parse an OSM file, and import it into your MySQL database.
Making it faster
If you are updating in bulk, you don't need to update the indexes after every single update.
You can just disable the indexes, do all your updates, and then re-enable the indexes.
I'm guessing that should be a whole lot faster.
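A hedged sketch of that pattern in MySQL (note that ALTER TABLE ... DISABLE KEYS only affects non-unique indexes on MyISAM tables):

ALTER TABLE nodes DISABLE KEYS;
-- ... run all the bulk UPDATE statements here ...
ALTER TABLE nodes ENABLE KEYS;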
Good luck
