How can I increase performance of shortest-path on AgensGraph? - agens-graph

I tried to use shortest-path on AgensGraph.
But it is quite a bit slower than on other graph databases.
How can I increase the performance of shortest-path on AgensGraph?
I would like some tips or configuration parameters.
A sample script follows.
create graph shortestpath;
create vlabel o;
create vlabel l;
create elabel e;
create property index on o ( id );
create property index on l ( id );
create property index on e ( id );
create (:o{id:1})
create (:o{id:2})
create (:o{id:3})
create (:o{id:4})
create (:o{id:5})
create (:o{id:6})
create (:o{id:7})
create (:o{id:8})
create (:o{id:9});
match (o:o) create (:l{id:o.id});
match (n:l) where n.id >= 1 and n.id <= 9
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+1)}]->(:l{id:n.id*10+1})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+2)}]->(:l{id:n.id*10+2})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+3)}]->(:l{id:n.id*10+3})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+4)}]->(:l{id:n.id*10+4})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+5)}]->(:l{id:n.id*10+5})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+6)}]->(:l{id:n.id*10+6})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+7)}]->(:l{id:n.id*10+7})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+8)}]->(:l{id:n.id*10+8})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+9)}]->(:l{id:n.id*10+9});
match (n:l) where n.id >= 11 and n.id <= 99
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+1)}]->(:l{id:n.id*10+1})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+2)}]->(:l{id:n.id*10+2})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+3)}]->(:l{id:n.id*10+3})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+4)}]->(:l{id:n.id*10+4})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+5)}]->(:l{id:n.id*10+5})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+6)}]->(:l{id:n.id*10+6})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+7)}]->(:l{id:n.id*10+7})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+8)}]->(:l{id:n.id*10+8})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+9)}]->(:l{id:n.id*10+9});
match (n:l) where n.id >= 111 and n.id <= 999
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+1)}]->(:l{id:n.id*10+1})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+2)}]->(:l{id:n.id*10+2})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+3)}]->(:l{id:n.id*10+3})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+4)}]->(:l{id:n.id*10+4})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+5)}]->(:l{id:n.id*10+5})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+6)}]->(:l{id:n.id*10+6})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+7)}]->(:l{id:n.id*10+7})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+8)}]->(:l{id:n.id*10+8})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+9)}]->(:l{id:n.id*10+9});
match (n:l) where n.id >= 1111 and n.id <= 9999
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+1)}]->(:l{id:n.id*10+1})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+2)}]->(:l{id:n.id*10+2})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+3)}]->(:l{id:n.id*10+3})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+4)}]->(:l{id:n.id*10+4})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+5)}]->(:l{id:n.id*10+5})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+6)}]->(:l{id:n.id*10+6})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+7)}]->(:l{id:n.id*10+7})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+8)}]->(:l{id:n.id*10+8})
create (n)-[:e{id:'l:'+n.id+'->l:'+(n.id*10+9)}]->(:l{id:n.id*10+9});
\timing
match p = allshortestpaths( (l1:l)-[:e*]->(l2:l) ) where l1.id = 1 and l2.id = 11111 return l1.id as l1id, l2.id as l2id, count(p) order by l1id, l2id;
l1id | l2id | count
------+-------+-------
1 | 11111 | 1
(1 row)
Time: 133.547 ms
Is it possible to improve this to under 10 ms?

Shortest-path performance was improved in AgensGraph version 2.1.
The algorithm was changed from BFS to bidirectional BFS.
agens (AgensGraph 2.1.0, based on PostgreSQL 10.4)
Type "help" for help.
match p = allshortestpaths( (l1:l)-[:e*]->(l2:l) ) where l1.id = 1 and l2.id = 11111 return l1.id as l1id, l2.id as l2id, count(p) order by l1id, l2id;
l1id | l2id | count
------+-------+-------
1 | 11111 | 1
(1 row)
Time: 1.776 ms
If you are using version 1.3 or 2.0, upgrading to version 2.1 is the better choice.
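To see why switching from BFS to bidirectional BFS helps on data shaped like the sample script, here is a minimal Python sketch. It is not AgensGraph's implementation. The function names `build_sample_graph` and `shortest_hops` are made up for this illustration, and the sketch only returns the length of a shortest path rather than counting all shortest paths as `allshortestpaths` does.

```python
def build_sample_graph(levels=4, fanout=9):
    """Rebuild the sample data: node n points to n*10+1 .. n*10+fanout."""
    forward, backward = {}, {}
    current = list(range(1, fanout + 1))  # root nodes 1..9
    for _ in range(levels):
        following = []
        for n in current:
            for k in range(1, fanout + 1):
                child = n * 10 + k
                forward.setdefault(n, []).append(child)
                backward.setdefault(child, []).append(n)
                following.append(child)
        current = following
    return forward, backward


def shortest_hops(forward, backward, source, target):
    """Length in edges of a shortest directed path, or None if unreachable."""
    if source == target:
        return 0
    dist_f, dist_b = {source: 0}, {target: 0}
    frontier_f, frontier_b = [source], [target]
    while frontier_f and frontier_b:
        # Always grow the smaller frontier; this is where the savings come from.
        grow_forward = len(frontier_f) <= len(frontier_b)
        if grow_forward:
            frontier, dist, other, adj = frontier_f, dist_f, dist_b, forward
        else:
            frontier, dist, other, adj = frontier_b, dist_b, dist_f, backward
        best, next_frontier = None, []
        for u in frontier:
            for v in adj.get(u, ()):
                if v in other:  # the two searches meet on the edge between u and v
                    total = dist[u] + 1 + other[v]
                    best = total if best is None else min(best, total)
                elif v not in dist:
                    dist[v] = dist[u] + 1
                    next_frontier.append(v)
        if best is not None:
            return best  # smallest meeting distance found in this layer
        if grow_forward:
            frontier_f = next_frontier
        else:
            frontier_b = next_frontier
    return None


forward, backward = build_sample_graph()
print(shortest_hops(forward, backward, 1, 11111))
```

In this tree every node has nine outgoing edges but only one incoming edge, so the search that walks backward from the target stays tiny, while a plain forward BFS from node 1 would have to expand thousands of nodes before reaching 11111.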

Related

Adding multiple records from a string

I have a string of email addresses. For example, "a@a.com; b@a.com; c@a.com"
My database is:
record | flag1 | flag2 | emailaddress
--------------------------------------
1      | 0     | 0     | a@a.com
2      | 0     | 0     | b@a.com
3      | 0     | 0     | c@a.com
What I need to do is parse the string and, if an address is not in the database, add it.
Then, return a string of just the record numbers that correspond to the email addresses.
So, if the call is made with "A@a.com; c@a.com; d@a.com", the routine would add "d@a.com", then return "1, 3, 4", corresponding to the records that match the email addresses.
What I am doing now is calling the database once per email address to look it up and confirm it exists (adding it if it doesn't), then looping through them again from my PowerShell app, one by one, to collect the record numbers.
There has to be a way to just pass all of the addresses to SQL at the same time, right?
I have it working in PowerShell, but slowly.
I'd love a response from SQL as shown above of just the record number for each email address in a single response. That is, "1,2,4" etc.
My powershell code is:
$EmailList2 = $EmailList.split(";")
# Let's get the ID # for each email address.
foreach($x in $EmailList2)
{
    $data = exec-query "select Record from emailaddresses where emailAddress = @email" -parameter @{email=$x.trim()} -conn $connection
    if ($($data.Tables.record) -gt 0)
    {
        $ResponseNumbers = $ResponseNumbers + "$($data.Tables.record), "
    }
}
# Strip the trailing ", " separator.
$ResponseNumbers = $($ResponseNumbers+"XX").replace(", XX","")
return $ResponseNumbers
You'd have to do this in 2 steps. Firstly INSERT the new values and then use a SELECT to get the values back. This answer uses delimitedsplit8k (not delimitedsplit8k_LEAD) as you're still using SQL Server 2008. On the note of 2008 I strongly suggest looking at upgrade paths soon as you have about 6 weeks of support left.
You can use the function to split the values and then INSERT/SELECT appropriately:
DECLARE @Emails varchar(8000) = 'a@a.com;b@a.com;c@a.com';
WITH Emails AS(
    SELECT DS.Item AS Email
    FROM dbo.DelimitedSplit8K(@Emails,';') DS)
INSERT INTO dbo.YourTable (emailaddress) --I don't know what the other columns' values should be, so I have excluded them
SELECT E.Email
FROM Emails E
     LEFT JOIN dbo.YourTable YT ON YT.emailaddress = E.Email
WHERE YT.emailaddress IS NULL; --only the addresses that are not in the table yet
SELECT YT.record
FROM dbo.YourTable YT
     JOIN dbo.DelimitedSplit8K(@Emails,';') DS ON DS.Item = YT.emailaddress;
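The same two-step idea (insert whatever is missing, then select all matching record numbers in one query) can be tried out without a SQL Server instance. The sketch below uses Python's built-in sqlite3 module purely as a stand-in, so the SQL dialect differs from the answer above. The function name `record_ids`, the UNIQUE constraint on emailaddress, and the lower-casing of addresses are assumptions made for this illustration.

```python
import sqlite3


def record_ids(conn, email_list):
    """Insert unknown addresses, then return all matching record numbers as one string."""
    emails = [e.strip().lower() for e in email_list.split(";") if e.strip()]
    if not emails:
        return ""
    # Step 1: add every address that is not stored yet.
    # Assumes a UNIQUE constraint on emailaddress so duplicates are skipped.
    conn.executemany(
        "INSERT OR IGNORE INTO emailaddresses (emailaddress) VALUES (?)",
        [(e,) for e in emails],
    )
    # Step 2: fetch the record numbers for the whole list with a single query.
    placeholders = ",".join("?" * len(emails))
    rows = conn.execute(
        "SELECT record FROM emailaddresses "
        f"WHERE emailaddress IN ({placeholders}) ORDER BY record",
        emails,
    ).fetchall()
    return ", ".join(str(row[0]) for row in rows)


# Small in-memory table shaped like the one in the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE emailaddresses ("
    "record INTEGER PRIMARY KEY, flag1 INTEGER DEFAULT 0, "
    "flag2 INTEGER DEFAULT 0, emailaddress TEXT UNIQUE)"
)
conn.executemany(
    "INSERT INTO emailaddresses (emailaddress) VALUES (?)",
    [("a@a.com",), ("b@a.com",), ("c@a.com",)],
)
print(record_ids(conn, "A@a.com; c@a.com; d@a.com"))
```

Whatever the database, the point is the same: the application makes a fixed number of round trips instead of one or two per address.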

How to find internal vertices in variable-length-edges on AgensGraph?

I tried to find the internal vertices of a variable-length edge (VLE) on AgensGraph.
But it returns an error message like the following.
Is there a problem with my Cypher query?
How can I view the internal vertices of a VLE?
A sample script follows.
create graph vle;
create vlabel o;
create vlabel l;
create elabel e;
create property index on o ( id );
create property index on l ( id );
create property index on e ( id );
create (:o{id:1})
create (:o{id:2})
create (:o{id:3})
create (:o{id:4})
create (:o{id:5})
create (:o{id:6})
create (:o{id:7})
create (:o{id:8})
create (:o{id:9});
match (o:o) create (:v{id:o.id});
match (n:v) where n.id >= 1 and n.id <= 9
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+1)}]->(:v{id:n.id*10+1})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+2)}]->(:v{id:n.id*10+2})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+3)}]->(:v{id:n.id*10+3})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+4)}]->(:v{id:n.id*10+4})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+5)}]->(:v{id:n.id*10+5})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+6)}]->(:v{id:n.id*10+6})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+7)}]->(:v{id:n.id*10+7})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+8)}]->(:v{id:n.id*10+8})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+9)}]->(:v{id:n.id*10+9});
match (n:v) where n.id >= 11 and n.id <= 99
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+1)}]->(:v{id:n.id*10+1})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+2)}]->(:v{id:n.id*10+2})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+3)}]->(:v{id:n.id*10+3})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+4)}]->(:v{id:n.id*10+4})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+5)}]->(:v{id:n.id*10+5})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+6)}]->(:v{id:n.id*10+6})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+7)}]->(:v{id:n.id*10+7})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+8)}]->(:v{id:n.id*10+8})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+9)}]->(:v{id:n.id*10+9});
match (n:v) where n.id >= 111 and n.id <= 999
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+1)}]->(:v{id:n.id*10+1})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+2)}]->(:v{id:n.id*10+2})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+3)}]->(:v{id:n.id*10+3})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+4)}]->(:v{id:n.id*10+4})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+5)}]->(:v{id:n.id*10+5})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+6)}]->(:v{id:n.id*10+6})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+7)}]->(:v{id:n.id*10+7})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+8)}]->(:v{id:n.id*10+8})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+9)}]->(:v{id:n.id*10+9});
match (n:v) where n.id >= 1111 and n.id <= 9999
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+1)}]->(:v{id:n.id*10+1})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+2)}]->(:v{id:n.id*10+2})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+3)}]->(:v{id:n.id*10+3})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+4)}]->(:v{id:n.id*10+4})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+5)}]->(:v{id:n.id*10+5})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+6)}]->(:v{id:n.id*10+6})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+7)}]->(:v{id:n.id*10+7})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+8)}]->(:v{id:n.id*10+8})
create (n)-[:e{id:'v:'+n.id+'->v:'+(n.id*10+9)}]->(:v{id:n.id*10+9});
match p = ( (v1:v{id:1})-[:e*]->(v2:v{id:11111}) ) return nodes(p);
ERROR: graph path and variable length edge cannot be used at the same time
LINE 1: match p = ( (v1:v{id:1})-[:e*]->(v2:v{id:11111}) ) return no...
^
Please, help...
Older versions of AgensGraph cannot return the nodes along a VLE path.
Consider upgrading to a newer version of AgensGraph.
agens (AgensGraph 2.1.0, based on PostgreSQL 10.4)
Type "help" for help.
agens =# match p = ( (v1:v{id:1})-[:e*]->(v2:v{id:11111}) ) return nodes(p);
nodes
----------------------------------------------------------------------------------------------------
[v[6.1]{"id": 1},v[6.10]{"id": 11},v[6.91]{"id": 111},v[6.820]{"id": 1111},v[6.7381]{"id": 11111}]
(1 row)

Modeling a non-primary Key relationship

I am trying to model the following relationship with the intent of designing classes for EF code first.
Program table:
ProgramID - PK
ProgramName
ClusterCode
Sample data
ProgramID ProgramName ClusterCode
--------------------------------------
1 Spring A
2 Fall A
3 Winter B
4 Summer B
Cluster table:
ID
ClusterCode
ClusterDetails
Sample data:
ID ClusterCode ClusterDetails
---------------------------------
1 A 10
2 A 20
3 A 30
4 B 20
5 B 40
I need to join the Program table to the Cluster table so I can get the list of cluster details for each program.
The SQL would be
Select *
from Programs P
Join Cluster C On P.ClusterCode = C.ClusterCode
Where P.ProgramID = 'xxx'
Note that in the Program table, ClusterCode is not unique.
In the Cluster table, neither ClusterCode nor ClusterDetails is unique.
How would I model this so I can take advantage of navigation properties and code-first?
Assuming you have mapped the two tables above and made an association between them, and you are using C#, you can use a simple join on ClusterCode:
List<string> clusterDets = new List<string>();
var q =
    from c in ClusterTable
    join p in Program on c.ClusterCode equals p.ClusterCode
    select new { c.ClusterDetails };
foreach (var v in q)
{
    // ToString() keeps this working whether ClusterDetails is mapped as a number or a string.
    clusterDets.Add(v.ClusterDetails.ToString());
}

Hive query, better option to self join

So I am working with a hive table that is set up as so:
id (Int), mapper (String), mapperId (Int)
Basically a single Id can have multiple mapperIds, one per mapper such as an example below:
ID (1) mapper(MAP1) mapperId(123)
ID (1) mapper(MAP2) mapperId(1234)
ID (1) mapper(MAP3) mapperId(12345)
ID (2) mapper(MAP2) mapperId(10)
ID (2) mapper(MAP3) mapperId(12)
I want to return the list of mapperIds associated to each unique ID. So for the above example I would want the below returned as a single row.
1, 123, 1234, 12345
2, null, 10, 12
The mapper Strings are known, so I was thinking of doing a self join for every mapper string I am interested in, but I was wondering if there was a more optimal solution?
If the assumption that the mapper column is distinct with respect to a given ID is correct, you could collect the mapper column and the mapperid column to a Map using brickhouse collect. You can clone the repo from that link and build the jar with Maven.
Query:
add jar /complete/path/to/jar/brickhouse-0.7.0-SNAPSHOT.jar;
create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';
select id
,id_map['MAP1'] as mapper1
,id_map['MAP2'] as mapper2
,id_map['MAP3'] as mapper3
from (
select id
,collect(mapper, mapperid) as id_map
from some_table
group by id
) x
Output:
| id | mapper1 | mapper2 | mapper3 |
------------------------------------
| 1  | 123     | 1234    | 12345   |
| 2  | NULL    | 10      | 12      |
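The query above groups the rows by id, collects each group into a map from mapper to mapperid, and then looks up the known mapper names. A minimal Python sketch of that same logic, with the hypothetical helper name `pivot_mappers`, may make the shape of the result easier to see. It is only an illustration of the idea, not Hive or brickhouse code.

```python
from collections import defaultdict


def pivot_mappers(rows, mappers):
    """Group (id, mapper, mapper_id) rows by id and emit one tuple per id."""
    by_id = defaultdict(dict)
    for id_, mapper, mapper_id in rows:
        # Like collect(mapper, mapperid): assumes one mapper_id per (id, mapper).
        by_id[id_][mapper] = mapper_id
    # A mapper that is missing for an id becomes None, like a NULL column.
    return [
        (id_, *(id_map.get(name) for name in mappers))
        for id_, id_map in sorted(by_id.items())
    ]


rows = [
    (1, "MAP1", 123),
    (1, "MAP2", 1234),
    (1, "MAP3", 12345),
    (2, "MAP2", 10),
    (2, "MAP3", 12),
]
for line in pivot_mappers(rows, ["MAP1", "MAP2", "MAP3"]):
    print(line)
```

As in the Hive query, the data is read once and no self join is needed, however many mapper names you look up.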

Apply function to every element of an array in a SELECT statement

I am listing all functions of a PostgreSQL schema and need the human-readable types for every argument of the functions. The OIDs of the types are represented as an array in proallargtypes. I can unnest the array and apply format_type() to it, but that splits the result into multiple rows for a single function. To avoid that, I have to create an outer SELECT to GROUP the argtypes again because, apparently, one cannot group an unnested array. All columns are dependent on proname, but I have to list all columns in the GROUP BY clause, which should be unnecessary; proname is not a primary key, though.
Is there a better way to achieve my goal of an output like this:
proname | ... | protypes
-------------------------------------
test | ... | {integer,integer}
I am currently using this query:
SELECT
proname,
prosrc,
pronargs,
proargmodes,
array_agg(proargtypes), -- see here
proallargtypes,
proargnames,
prodefaults,
prorettype,
lanname
FROM (
SELECT
p.proname,
p.prosrc,
p.pronargs,
p.proargmodes,
format_type(unnest(p.proallargtypes), NULL) AS proargtypes, -- and here
p.proallargtypes,
p.proargnames,
pg_get_expr(p.proargdefaults, 0) AS prodefaults,
format_type(p.prorettype, NULL) AS prorettype,
l.lanname
FROM pg_catalog.pg_proc p
JOIN pg_catalog.pg_language l
ON l.oid = p.prolang
JOIN pg_catalog.pg_namespace n
ON n.oid = p.pronamespace
WHERE n.nspname = 'public'
) x
GROUP BY proname, prosrc, pronargs, proargmodes, proallargtypes, proargnames, prodefaults, prorettype, lanname
You can use the system function pg_catalog.pg_get_function_arguments(p.oid).
postgres=# SELECT pg_catalog.pg_get_function_arguments('fufu'::regproc);
pg_get_function_arguments
---------------------------
a integer, b integer
(1 row)
There is no built-in "map" function, so combining unnest with array_agg (or an ARRAY subquery) is the only option. You can make life easier with your own custom function:
CREATE OR REPLACE FUNCTION format_types(oid[])
RETURNS text[] AS $$
SELECT ARRAY(SELECT format_type(unnest($1), null))
$$ LANGUAGE sql IMMUTABLE;
and result
postgres=# SELECT format_types('{21,22,23}');
format_types
-------------------------------
{smallint,int2vector,integer}
(1 row)
Then your query would be:
SELECT proname, format_types(proallargtypes)
FROM pg_proc
WHERE pronamespace = 2200 AND proallargtypes IS NOT NULL;
But the result will probably not be what you expect, because the proallargtypes field is filled only when OUT parameters are used; usually it is NULL. You should look at the proargtypes field instead, but it is of type oidvector, so you have to convert it to oid[] first.
postgres=# SELECT proname, format_types(string_to_array(proargtypes::text,' ')::oid[])
FROM pg_proc
WHERE pronamespace = 2200
LIMIT 10;
proname | format_types
------------------------------+----------------------------------------------------
quantile_append_double | {internal,"double precision","double precision"}
quantile_append_double_array | {internal,"double precision","double precision[]"}
quantile_double | {internal}
quantile_double_array | {internal}
quantile | {"double precision","double precision"}
quantile | {"double precision","double precision[]"}
quantile_cont_double | {internal}
quantile_cont_double_array | {internal}
quantile_cont | {"double precision","double precision"}
quantile_cont | {"double precision","double precision[]"}
(10 rows)
