Count all possible paths in a graph database for multiple vertices - graph-databases

I have a graph db and I need to get all the vertices that doesn't have any out edges and count how many vertices lead to each one
ex.
I have 8 vertices A,B1,B2,D, X,Y1,Y2,Z
D->B1->A
B2->A
Z->Y1->X
Y2->X
I would like to get a list the would have [A = 3, X = 3] plus the properties of each vertex
why 3? cause you can get to A from D, B1, and B2
what I have so far is to get the count of paths of one vertex but doing that query for each one is a bit slow, so I would like one query that will give me all that info
g.V().not(outE()).repeat(inE().outV().simplePath()).emit().dedup().count().next()

It looks like you had the right query, just need to add group to it:
g.V().not(outE()).group()
.by(label())
.by(repeat(inE().outV().simplePath()).emit().dedup().count())
I tested it here. seems to work as expected

Related

Implementing Cloud of Communities using Linked List

Recently I came across a question, I had to code it but failed to do so effectively. So i'll try to explain the question in the best way I can, and it goes like..
There are different people belonging to different communities. Say for example, 1 belongs to C1, 2 belong to C2 and 3 belongs to C3. We can perform two operations, Query and Join. Query returns the total number of people belonging to the person's community. And Join is used to combine the communities of exactly two persons into one.
We are taking the number of people and the number of operations to be performed as an input and we need to produce the result onto the standard output.
Example Case: (Q -> Query and J -> Join)
3 // No. of People
6 // No. of Operations
Q 1 // Prints 1
J 1,2 // Joins communities of 1 and 2
Q 1 // Prints 2
J 2,3 // Joins communities of 1 and 2
Q 3 // Prints 3
Q 1 // Prints 3
So essentially, its like people are belonging to individual bubbles initially and on join, we join the bubbles of two peoples to form a larger bubble containing two people.
There are different ways to solve this problem. Using ArrayList methods of Java, its pretty easy. I was trying to solve it using arrays.
My approach was to form an array for each person initially and as we join two communities, the respective arrays are added with the people as described :
Arr1 : 1 // Array for Person 1 ; Size 1
Arr2 : 2 // Array for Person 2 ; Size 1
J 1,2 results in,
Arr1 : 1,2 // Size 2
Arr2 : 2,1 // Size 2
But I was told that this is not an effective approach, an effective approach would be to make use of Linked List. I was unable to use linked list to solve it. So I would like some inputs from you guys on the approach, how exactly do I make use of Linked List to keep track of Join operations?
Sorry for the long post, and thanks in advance :)
P.S : I am not sure if the title is appropriate, kindly suggest proper title in case its not appropriate.
I do not know why somebody felt that linked lists would be better than arrays in this instance, but joining the communities is as simple as adding all of the members of one to the other, and changing all the pointers that point to the now empty community to point to the node containing all of the members. Query should be simple enough that it goes without saying.

Data search with partially match

I have a database with column A,B,C and row data, for example:
A B C
test1 2.0123 3.0123
test2 2.1234 3.1234
In my program I would like to search for the bestfit match in the databases,
for example I will key in value b=2.133, c=3.1342, then it will return me test2, how can I do that?
Please give me some idea or keyword to google as what I was thinking is searching algorithm but seem like searching algorithm are more on completely match and is not find the bestfit match. Or is this binpacking algorithm? How can I solve the problem with that.
I got about 5 column B,C,D,E,F and find the most match value.
Seems like you are looking for a k-d tree that maps 2-dimensional space (attributes B,C that are the key) to a value (attribute A).
K-D tree allows efficient look up for nearest neighbor of a given query, which seems to be exactly what you are after.
Note that the same DS will efficiently handle more attributes if needed, by increasing the dimensionality of the key.
Take a look at this(Nearest neighbor search):
http://en.wikipedia.org/wiki/Nearest_neighbor_search
In this the simplest algorithm(Linear search) would look something like this in SQL(for b=2.133, c=3.1342):
SELECT A, MIN(SQRT(POW(B-2.133,2)+POW(C-3.1342,2))) FROM tablename;
i.e. take the row with the minimum vector distance from the points (sqrt((b1-b2)^2+(c1-c2)^2))

generation of random multiset permutation with restrictions

Is there any known algorithm how to effectively generate any random multiset permutations with additional restrictions.
Example:
I have a multiset of items, for example: {1,1,1,2,2,3,3,3}, and a restricting set of sets, for example {{3},{1,2},{1,2,3},{1,2,3},{1,2,3},{1,2,3},{2,3},{2,3}}. I am looking for permutations of items, but the first element must be 3, and the second must be 1 or 2, etc.
One such permutation that fits restrictions is: {3,1,1,1,2,2,3,3}
Yes, there is. I asked in this German forum and got the following answer: The problem can be reduced to finding a maximum matching on a bipartite graph.
In order to do this, introduce vertices for all elements in the multiset. These vertices form the one side of the bipartite graph. Then, introduce vertices fo each restriction set. These vertices form the other side of the bipartite graph. Now introduce edges from each restriction set to those vertices on the first side, such that the vertex on the first side is "hit" if and only if it represents an element that is contained in the connected set.
The bipartite graph for your example would look like this:
Now the matching chooses edges in a way that no two adjacent edges are chosen. E.g. the first "1" is chosen for the second restriction "{1,2}", then it can not be used for any other restriction any more, since the use of another edge from this vertex would not result in a matching any more.
Feel free to ask in case you got another question on this.

turn a matrix into a sorted list google docs/spreadsheets

I've created a large-ish matrix by doing a =pearson( analysis on survey responses in google docs/spreadsheets and would like to convert it into a sorted list.
The matrix has labels (the survey questions) in row 2 and column b. Each intersecting cell has the value. Here's what the formula looks like.
=pearson(FILTER( Pc!$C$2:$AW$999 ; Pc!$C$2:$AW$2= C$2 ),FILTER(Pc!$C$2:$AW$999 ;Pc!$C$2:$AW$2=$B3))
This is what I'd like to get to:
a b c
Question one question 2 correlation
Then sorting by column c is easy.
How can I get all the points out of the matrix/array, along with the labels in this way?
Ideally I'd be able to do this only to points below the diagonal as there of course are dupes above..
Thanks!
I think I found a solution to placing the combination of the headers in a single column.
It involves a series of auxiliary columns, but works.
Let's say we have a single column with all unique headers on column A. I'll assume it's 6 values. So, on cell B1 we paste:
=ArrayFormula(join(";";A1&","&A2:A$6))
And then copy it down to B5. On C1 we join it all and split making a single column:
=transpose(split(join(";";B1:B5);";"))
If needed, we can split the combination in two columns again on D1
=ArrayFormula(split(C1:C15;","))
I don't know why, but the value on E1 does not work correctly, so I just pasted =A2
With these columns you can easily do your nice Pearson-Filter trick again to have it all in a single column. Hope this helps :)
Maybe something like this will help:
=ArrayFormula(transpose(split(CONCATENATE(transpose(C2:AW999)&char(9)), char(9))))
(C2:AW999 is your data range)

Big problem with Dijkstra algorithm in a linked list graph implementation

I have my graph implemented with linked lists, for both vertices and edges and that is becoming an issue for the Dijkstra algorithm. As I said on a previous question, I'm converting this code that uses an adjacency matrix to work with my graph implementation.
The problem is that when I find the minimum value I get an array index. This index would have match the vertex index if the graph vertexes were stored in an array instead. And the access to the vertex would be constant.
I don't have time to change my graph implementation, but I do have an hash table, indexed by a unique number (but one that does not start at 0, it's like 100090000) which is the problem I'm having. Whenever I need, I use the modulo operator to get a number between 0 and the total number of vertices.
This works fine for when I need an array index from the number, but when I need the number from the array index (to access the calculated minimum distance vertex in constant time), not so much.
I tried to search for how to inverse the modulo operation, like, 100090000 mod 18000 = 10000 and, 10000 invmod 18000 = 100090000 but couldn't find a way to do it.
My next alternative is to build some sort of reference array where, in the example above, arr[10000] = 100090000. That would fix the problem, but would require to loop the whole graph one more time.
Do I have any better/easier solution with my current graph implementation?
In your array, instead of just storing the count (or whatever you're storing there), store a structure which contains the count as well as the vertex index number.

Resources