similarities through whole Neo4j graph

similarities through whole Neo4j graph - database

i'm trying to get the similarities between one node say me and all the people in the graph not just "friend of friend" which uses the nth order relationship. do i have to go through the whole graph and looping through every user or there's a better way to go?
MATCH (me { name: 'Bradley' })-[:INTEREST]->(stuff)<-
[:INTEREST]-(another_user)
RETURN another_user.name, count(stuff)
ORDER BY count(stuff) DESC
in brief how to replace another_user with all users in the graph in a clean and not costing way

This query should help you:
match (me { name: 'Bradley' })-[:INTEREST]-(stuff) with stuff, collect(stuff) as stuffs match (another_user)-[:INTEREST]-(common_stuff) where common_stuff in stuffs return another_user.name,count(common_stuff) ORDER BY count(common_stuff) DESC
This query will first collect all the items that Bradley is interested in and then iterate through other users and count the items which are common interests between them and Bradley.

Related

Cypher returns exponential count result on 'join'

I am learning Cypher on Neo4j and I am having trouble in understanding how to perform an efficient 'join' equivalent in Cypher.
I am using the standard Matrix character example and I have added some nodes to the mix called 'Gun' with a relation of ':GIVEN_TO'. You can see the console with my query result here:
http://console.neo4j.org/r/rog2hv
The query I am using is:
MATCH (Neo:Crew { name: 'Neo' })-[:KNOWS*..]->(other:Crew),(other)<-[:GIVEN_TO]-(g:Gun),(Neo)<-[:GIVEN_TO]-(g2:Gun)
RETURN count(g2);
I have given Neo 4 guns, but when I perform the above I get a count of '12'. This seems to be the case because there are 3 'others' and 3*4 = 12. So I get some exponential result.
What should my query look like to get the correct count ('4') from the example?
Edit:
The reason I am not querying through Guns directly as suggested by #ceej is because in my real use case I have to do this traversal as described above. Adding DISTINCT does not do anything for my result.

The reason you get 12 guns instead of 4 is because your query produces a cartesian product. This is because you have asked for items in the same match statement without joining them. #ceej rightly pointed out if you want to find Neo's guns you would do as he suggested in his first query.
If you wanted to get a list of the crew members and their guns then you could do something like this...
MATCH (crew:Crew)<-[:GIVEN_TO]-(g:Gun)
RETURN crew.name, collect(g.name)
Which finds all of the crew members with guns and returns their name and the guns that they were given.
If you wanted to invert it and get a list of the guns and the respective crew members they were give to you could do the following...
MATCH (crew:Crew)<-[:GIVEN_TO]-(g:Gun)
RETURN g.name, collect(crew.name)
If you wanted to find all of the crew that knew Neo multiple levels deep that were given a gun you could write the query like this...
MATCH (crew:Crew)<-[:GIVEN_TO]-(g:Gun)
WITH crew, g
MATCH (neo:Crew {name: 'Neo'})-[:KNOWS*0..]->(crew)
RETURN crew.name, collect(g.name)
That finds all the crew that were given guns and then determines which of them have a :KNOWS path to Neo.

Forgive me, but I am am unclear why you have the initial MATCH in your query. From your explanation it would appear that you are trying to get the number of :Gun nodes linked to Neo by the :GIVEN_TO relationship. In which case all you need is the latter part of your query. Which would give you something like
MATCH (neo:Crew { name: 'Neo' })<-[:GIVEN_TO]-(g:Gun)
RETURN count(g)
Furthermore, to make sure that you are only counting distinct :Gun nodes you can add DISTINCT to the RETURN statement.
MATCH (neo:Crew { name: 'Neo' })<-[:GIVEN_TO]-(g:Gun)
RETURN count( DISTINCT g )
This is possibly unnecessary in your case but can be helpful when the pattern that you are matching on can arrive at the same node by different traversals.
Have I misunderstood your requirement?

Fetch field of another Doc in CouchDB?

I'm very new to CouchDB and I have a simple task that I have no been able to find a straight answer.
I have two related docs: order and invoice
invoice = {
id: "invoice_id",
order: "order_id",
values: [...]
}
order = {
id: "order_id",
**order_number: 12345**
}
I have defined a map function that select the unfulfilled invoices, now I need the order_number, which is in the order doc. They are the same transaction. How do I fetch the order_number from the order when I get my invoices?
I've looked around and I'm getting so many answers like: view collation, linked documents, include_docs=true, structure docs to have both...
I'm just looking for the simplest way with a clear explanation. I appreciate any help.
p.s.
Since I'm new I'm finding couchDB development to be very involved. I have map functions, but they need to be pushed to the couchInstance? Or I edit the map functions in Futon? Are there better ways to develop against couchDB? I see there's couchApp but the docs are sparse and the project hasn't been updated in a while.

Cypher Neo4j - How to identify instances of a relationship for a particular field

I'm in the process of trying to learn Cypher for use with graph databases.
Using Neo4j's test database (http://www.neo4j.org/console?id=shakespeare)
, how can I find all the performances of a particular play? I'm trying to establish how many times Julius Caesar was performed.
I have tried to use:
MATCH (title)<-[:PERFORMED]-(performance)
WHERE (title: "Julias Caesar")
RETURN title AS Productions;
I'm aware it's quite easy to recognise manually, but on a larger scale it wouldn't be possible.
Thank you

You would have to count the number of performance nodes . You can use count to get the number of nodes.
MATCH (t)<-[:PERFORMED]-(performance)
WHERE t.title = "Julias Caesar"
RETURN DISTINCT t AS Productions,count(performance) AS count_of_performance

Soccer and CouchDb (noob pining for sql and joins)

This has kept me awake until these wee hours.
I want a db to keep track of a soccer tournament.
Each match has two teams, home and away.
Each team can be the home or the away of many matches.
I've got one db, and two document types, "match" (contains: home: teamId and away: teamId) and team (contains: teamId, teamName, etc).
I've managed to write a working view but it would imply adding to each team the id of every match it is involved in, which doesn't make much logical sense - it's such an hack.
Any idea on how this view should be written? I am nearly tempted to just throw the sponge in and use postgres instead.
EDIT: what I want is to have the team info for both the home and away teams, given the id of a match. Pretty easy to do with two calls, but I don't want to make two calls.

Just emit two values in map for each match like this:
function (doc) {
if (!doc.home || !doc.away) return;
emit([doc._id, "home"], { _id: doc.home });
emit([doc._id, "away"], { _id: doc.away });
}
After querying the view for the match id MATCHID with:
curl 'http://localhost:5984/yourdb/_design/yourpp/_view/yourview?startkey=\["MATCHID"\]&endkey=\["MATCHID",\{\}\]&include_docs=true'
you should be able to get both teams' documents in doc fields in list of results (row), possibly like below:
{"total_rows":2,"offset":0,"rows":[
{"id":"MATCHID","key":["MATCHID","home"],"value":{"_id":"first_team_id"},"doc":{...full doc of the first team...}},
{"id":"MATCHID","key":["MATCHID","away"],"value":{"_id":"second_team_id"},"doc":{...full doc of the second team...}}
]}

Check the CouchDB Book: http://guide.couchdb.org/editions/1/en/why.html
It's free and includes answers to a beginner :)
If you like it, consider buying it. Even thought Chris and Jan are so awesome they just put their book out there for free you should still support the great work they did with their book.

Google App Engine query optimization

I have a Google App Engine datastore that could have several million records in it and I'm trying to figure out the best way to do a query where I need get records back that match several Strings.
For example, say I have the following Model:
String name
String level
Int score
I need to return all the records for a given "level" that also match a list of "names". There might be only 1 or 2 names in the name list, but there could be 100.
It's basically a list of high scores ("score") for players ("name") for a given level ("level"). I want to find all the scores for a given "level" for a list of players by "name" to build a high score list that include just your friends.
I could just loop over the list of "names" and do a query for each their high scores for that level, but I don't know if this is the best way. In SQL I could construct a single (complex) query to do this.
Given the size of the datastore, I want to make sure I'm not wasting time running python code that should be done by the query or vise-versa.
The "level" needs to be a String, not an Int since they are not numbered levesl but rather level names, but I don't know if that matters.

You can use IN filter operator to match property against a list of values (user names):
scores = Scores.all().filter('level ==', level).filter('user IN', user_list)
Note that under the hood this performs as much queries as there are users in user_list.

players = Player.all().filter('level =', level).order('score')
names = [name1, name2, name3, ...]
players = [p for p in players if p.name in names]
for player in players:
print name, print score
is this what you want?
...or am i simplifying too much?

No you can not do that in one pass.
You will have to either query the friends for the level one by one
or
make a friends score entity for each level. Each time the score changes check which friends list he belongs to and update all their lists. Then its just a matter or retrieving that list.
the first one will be slow and the second costly unless optimized.