Bipartite graph projection via a Cypher query in Neo4j

I'm new to Neo4j and I'm trying to make a projection of a bipartite graph of users and the movies they rated. Here is the information that I have:
[image: the input data]
I created the graph in Neo4j, and this is what I got:
[image: the resulting graph in Neo4j]
I'm trying to project it so that users who gave the SAME rating to the same movies are connected, but I have not been successful. This is the code that I have for the projection:
MATCH (u:User)-[r:RATED_MOVIE]->(m:Movie)
WITH m, collect(u) as users, collect(r) as ratings, count(r) as weights
UNWIND users as u1
UNWIND users as u2
UNWIND ratings as r1
UNWIND ratings as r2
WITH u1, u2, r1, r2
WHERE u1.UserId < u2.UserId and r1.rating = r2.rating
CREATE (u1)-[:CONNECTED{common_movies_rated:weights}]->(u2)
RETURN u1, u2
The expected output is a graph like this:
[image: the expected projected graph]

Good description, thanks for specifying the desired output.
What you are looking for are paths where two people gave the same rating to the same movie. Counting those paths for each pair of people gives the weight, which you can then store on the relationship you create between them.
We can use a simpler query to get the results you need.
MATCH (u1:User)-[r:RATED_MOVIE]->(m:Movie)<-[r2:RATED_MOVIE]-(u2:User)
WHERE id(u1) < id(u2) AND r.rating = r2.rating  // each pair once, identical ratings only
WITH u1, u2, count(m) as weight                  // number of movies both rated the same
CREATE (u1)-[:CONNECTED {common_movies_rated:weight}]->(u2)
RETURN u1, u2
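To make the counting logic explicit, here is a minimal Python sketch of the same projection outside Neo4j. It assumes the ratings arrive as `(user, movie, rating)` tuples, and `project_same_rating` is a hypothetical helper, not a Neo4j API. Users are grouped by `(movie, rating)`, and every pair inside a group gains one unit of weight, which is what `count(m)` does in the query above.

```python
from collections import Counter, defaultdict
from itertools import combinations


def project_same_rating(ratings):
    """Project (user, movie, rating) triples onto a user-user graph.

    Returns {(u1, u2): weight}, where weight is the number of movies that
    u1 and u2 both rated with the same value. Pairs are ordered (u1 < u2),
    mirroring the id(u1) < id(u2) filter, so each pair appears once.
    """
    # Group users by the (movie, rating) they gave.
    raters = defaultdict(set)
    for user, movie, rating in ratings:
        raters[(movie, rating)].add(user)

    # Every pair of users inside one group shares one identical rating.
    weights = Counter()
    for users in raters.values():
        for u1, u2 in combinations(sorted(users), 2):
            weights[(u1, u2)] += 1
    return dict(weights)


print(project_same_rating([
    ("u1", "m1", 5), ("u2", "m1", 5), ("u3", "m1", 3),
    ("u1", "m2", 4), ("u2", "m2", 4), ("u3", "m2", 4),
]))
# {('u1', 'u2'): 2, ('u1', 'u3'): 1, ('u2', 'u3'): 1}
```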

Related

Power BI - TopN + Others on data from two tables

I am a bit stuck with a specific case in Power BI. Let's say that we have two tables. The first one contains the product ID and the product name, and the second one contains the product ID and a specific budget.
I want to create a pie chart showing the top N products plus an "Others" group. I have written a DAX formula that works for data in a single table, but not across two.
Here is the formula:
ProductTop =
VAR rankSiteImpressions = RANKX(ALL(Piechart); [Impressions ];;DESC)
return
IF(rankSiteImpressions<=3;Piechart[Site];"Others")
How can I apply this on data from two tables to get the top products by budget?
Many thanks,
Rémi
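As an illustration of the logic being asked for (match the two tables on product ID, total the budget per product, keep the top N, and fold the rest into "Others"), here is a small Python sketch. It assumes the first table is available as a `{product_id: name}` mapping and the second as `(product_id, amount)` rows; `top_n_with_others` is a hypothetical name. In Power BI itself the usual starting point would be a relationship between the two tables on the product ID column. Note one difference: `RANKX` gives tied values the same rank, whereas this sketch breaks ties by sort order, so it always keeps exactly N products.

```python
from collections import defaultdict


def top_n_with_others(product_names, budgets, n=3):
    """product_names: {product_id: name}        (the first table)
    budgets: iterable of (product_id, amount)    (the second table)
    Returns [(label, total)] for the n largest products plus an "Others" row."""
    # "Join" on product ID by totalling the budget per product.
    totals = defaultdict(float)
    for product_id, amount in budgets:
        totals[product_id] += amount

    # Rank products by total budget, largest first.
    ranked = sorted(totals, key=totals.get, reverse=True)
    rows = [(product_names[pid], totals[pid]) for pid in ranked[:n]]

    # Fold everything outside the top n into a single "Others" slice.
    if len(ranked) > n:
        rows.append(("Others", sum(totals[pid] for pid in ranked[n:])))
    return rows
```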

Adding scores from separate queries in Neo4j

Say I have two tables as results of two separate Cypher queries:
First table:
login score
abc 10
def 20
And second table:
login score
abc 50
ghi 100
I need a table in which the scores for the logins that exist in both tables are summed, and for other logins, they are listed with the single score available for them.
login score
abc 60
def 20
ghi 100
Can you help with a Cypher query for this? What if I want to apply a custom aggregate function instead of simple summation?
This won't be very efficient on a large dataset, but assuming the two sets of scores live on :Thing and :Choice nodes, something like the following will do the trick:
MATCH (c:Choice)
WITH collect(c.login) AS cset
MATCH (t:Thing)
WHERE NOT t.login IN cset
RETURN t.login AS login, t.score AS score
UNION ALL
MATCH (t:Thing)
WITH collect(t.login) AS tset
MATCH (c:Choice)
WHERE NOT c.login IN tset
RETURN c.login AS login, c.score AS score
UNION ALL
MATCH (c:Choice), (t:Thing)
WHERE t.login = c.login
RETURN c.login AS login, c.score + t.score AS score;
There may be more efficient ways to do this, but here is how it works:
The first part returns the scores for logins that exist only in the Thing nodes.
The second part returns the scores for logins that exist only in the Choice nodes.
The third part sums the scores for logins present in both node types.
The UNION ALL clauses combine the three results.
Hope this helps,
Tom
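For the follow-up about a custom aggregate, a simpler shape than the three-way split is to stream the rows of both result sets together and aggregate once per login. A minimal Python sketch of that idea, where `merge_scores` is a hypothetical helper that accepts any function over a list of scores (`sum` by default):

```python
from collections import defaultdict


def merge_scores(*tables, aggregate=sum):
    """Combine (login, score) rows from any number of result tables.

    Scores for the same login are gathered into one list and reduced with
    `aggregate` (sum by default, but max, min, statistics.mean, or any
    custom function over a list works too).
    """
    collected = defaultdict(list)
    for table in tables:
        for login, score in table:
            collected[login].append(score)
    return {login: aggregate(scores) for login, scores in collected.items()}


first = [("abc", 10), ("def", 20)]
second = [("abc", 50), ("ghi", 100)]
print(merge_scores(first, second))                 # {'abc': 60, 'def': 20, 'ghi': 100}
print(merge_scores(first, second, aggregate=max))  # {'abc': 50, 'def': 20, 'ghi': 100}
```

In Cypher (Neo4j 4.0 and later), the same shape can be written as a `CALL` subquery containing the two `MATCH ... RETURN` parts joined by `UNION ALL`, followed by a single aggregation such as `sum(score)` grouped by `login`.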

Is this use case a candidate for Graph Database application?

Suppose I have some users U1, U2, U3, each with an 'age' property, such that:
U1.age = 10
U2.age = 30
U3.age = 70
I also have some lists, say L1, L2, L3, which are dynamic collections of users based on some criteria, such that:
L1: where age < 60
L2: where age < 30
L3: where age > 20
Since the lists are dynamic, the relationship between lists and users is established only through the user properties and list criteria. There is no hard mapping to indicate which users belong to which list. When the age of any user changes or when the criteria of any list changes, the users associated with a list may also change.
In this scenario, at any point in time it is very easy to get the users associated with a list by querying for the users that match the list's criteria.
But getting the lists associated with a user is an expensive operation: it involves first determining the users associated with each list and then picking the lists whose results contain the user in question.
Is this a candidate for a graph database, and why? (I'm considering Neo4j.) If so, how should I model the nodes and relationships so that I can easily get the lists for a given user?
Since version 2.3, Neo4j supports index range queries. Assume you have an index:
CREATE INDEX on :User(age)
Then this query returns the people younger than 60 and is executed via the index:
MATCH (u:User) WHERE u.age < 60 RETURN u
However, I would not store the age; instead I'd store the date of birth as a long property. Otherwise you have to update the age over and over again.
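With the date of birth stored as a long, the age is derived when needed instead of being kept up to date on every node. A minimal Python sketch, assuming the long holds epoch milliseconds in UTC (the answer does not specify the unit); `age_from_dob_millis` is a hypothetical helper:

```python
from datetime import date, datetime, timezone


def age_from_dob_millis(dob_millis, today):
    """Age in whole years on `today`, for a birth date stored as epoch
    milliseconds (UTC). The storage unit is an assumption."""
    dob = datetime.fromtimestamp(dob_millis / 1000, tz=timezone.utc).date()
    # Subtract one year if this year's birthday has not happened yet.
    birthday_pending = (today.month, today.day) < (dob.month, dob.day)
    return today.year - dob.year - int(birthday_pending)


dob = int(datetime(2000, 6, 15, tzinfo=timezone.utc).timestamp() * 1000)
print(age_from_dob_millis(dob, date(2024, 6, 14)))  # 23
print(age_from_dob_millis(dob, date(2024, 6, 15)))  # 24
```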
Update based on comment below
Assume you have a node for each list:
CREATE (:List{name:'l1', min:0, max:60})
CREATE (:List{name:'l2', min:0, max:30})
CREATE (:List{name:'l3', min:20, max:999})
Let's find all the lists a user U1 belongs to:
MATCH (me:User{name:'U1'})
WITH me.age as age
MATCH (l:List) WHERE age >= l.min AND age <= l.max // find lists
WITH l
MATCH (u:User) WHERE u.age >= l.min AND u.age <= l.max // all members of each matching list
RETURN l.name, collect(u)
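The list lookup above is a range check: a user belongs to every list whose `min`/`max` bounds contain the user's age. A minimal Python sketch of that check, with a hypothetical `AgeList` record standing in for the `:List` nodes' `name`, `min`, and `max` properties (both bounds inclusive, as in the query):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgeList:
    name: str
    min_age: int  # inclusive lower bound, like l.min
    max_age: int  # inclusive upper bound, like l.max


def lists_for_age(age, lists):
    """Names of the lists whose [min_age, max_age] range contains `age`."""
    return [l.name for l in lists if l.min_age <= age <= l.max_age]
```

Because membership is computed from the current property values, changing a user's age or a list's bounds requires no relationship updates.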
Update 2
A completely different idea would be to use a timetree: both the users and your list definitions are connected to the timetree.

What's the most effective way of storing this data?

I need help figuring out a good way to store this data effectively and efficiently.
I'm using Parse (JavaScript SDK). Here's an example of what I'm trying to store:
These are predictions for football (soccer) matches, so an example of one match would be:
Team A v Team B
EventID = "abc"
Categories = ["League-1","Sunday-League"]
User123 predicts the score will be Team A 2-0 Team B -> so 2-0
User456 predicts the score will be Team A 1-3 Team B -> so 1-3
Each event has information attached to it, such as an eventId, several categories, a start time, an end time, a result, and more.
I need to record a score prediction per user for each event (usually 10 events at a time, so a lot of predictions will be coming in).
I need to store these so that I can cross-reference the correct result against each user's prediction and award points based on the prediction, the teams in the match, and the categories of the event. Instead of adding to a single total, I need the awarded points stored separately per category and per user, so that I can then filter by predictions between set dates and by certain categories, e.g.:
Team A v Team B
EventID = "abc"
Categories = ["League-1","Sunday-League"]
User123 prediction = 2-0
Actual result = 2-0
So now I need to award X points to User123 for Team A, Team B, "League-1", and "Sunday-League", and record them against the event date too.
I would suggest creating a table for games, a table for users, and an associative table (one row per user prediction for a game) to handle the many-to-many relationship between them. This is a pretty standard many-to-many relationship.
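A minimal sketch of that design, using Python's built-in `sqlite3` module rather than Parse, so the Parse classes would look different. Every table and column name is hypothetical. Besides the `predictions` associative table, it adds a `points` table with one row per user, game, and team or category, which is one way to keep the awarded points separate so they can be filtered by date and category as the question requires. Exact-score matching is used as the scoring rule only for illustration.

```python
import sqlite3

# Hypothetical schema: every table and column name here is illustrative.
SCHEMA = """
CREATE TABLE users (id TEXT PRIMARY KEY);
CREATE TABLE games (
    event_id   TEXT PRIMARY KEY,
    home_team  TEXT NOT NULL,
    away_team  TEXT NOT NULL,
    game_date  TEXT NOT NULL,   -- ISO 'YYYY-MM-DD', so BETWEEN compares correctly
    home_goals INTEGER,         -- NULL until the result is known
    away_goals INTEGER
);
CREATE TABLE game_categories (
    event_id TEXT NOT NULL REFERENCES games(event_id),
    category TEXT NOT NULL
);
-- Associative table for the users <-> games many-to-many relationship.
CREATE TABLE predictions (
    user_id    TEXT NOT NULL REFERENCES users(id),
    event_id   TEXT NOT NULL REFERENCES games(event_id),
    home_goals INTEGER NOT NULL,
    away_goals INTEGER NOT NULL,
    PRIMARY KEY (user_id, event_id)
);
-- One row per user, game and team/category instead of a running total.
CREATE TABLE points (
    user_id   TEXT NOT NULL REFERENCES users(id),
    event_id  TEXT NOT NULL REFERENCES games(event_id),
    bucket    TEXT NOT NULL,    -- a team name or a category
    game_date TEXT NOT NULL,
    points    INTEGER NOT NULL
);
"""


def award_exact_scores(conn, event_id, points_per_bucket):
    """Give points_per_bucket to each user whose prediction matches the result
    exactly, once for each team and once for each category of the game."""
    home, away, game_date, hg, ag = conn.execute(
        "SELECT home_team, away_team, game_date, home_goals, away_goals "
        "FROM games WHERE event_id = ?", (event_id,)).fetchone()
    buckets = [home, away] + [c for (c,) in conn.execute(
        "SELECT category FROM game_categories WHERE event_id = ?", (event_id,))]
    winners = [u for (u,) in conn.execute(
        "SELECT user_id FROM predictions "
        "WHERE event_id = ? AND home_goals = ? AND away_goals = ?",
        (event_id, hg, ag))]
    conn.executemany(
        "INSERT INTO points VALUES (?, ?, ?, ?, ?)",
        [(u, event_id, b, game_date, points_per_bucket)
         for u in winners for b in buckets])


def points_between(conn, user_id, bucket, start, end):
    """Total points a user earned for one team/category between two dates."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(points), 0) FROM points "
        "WHERE user_id = ? AND bucket = ? AND game_date BETWEEN ? AND ?",
        (user_id, bucket, start, end)).fetchone()
    return total
```

For example, if User123 predicted 2-0 and the result was 2-0, `award_exact_scores(conn, 'abc', 3)` writes one row each for Team A, Team B, "League-1", and "Sunday-League", and `points_between(conn, 'User123', 'League-1', start, end)` returns 3 for any date range containing the game.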

GeoDjango distance query weighted by rating

I am using GeoDjango/PostGIS and have a model called Business that has a physical location and a rating. I would like to run a query to find the 'nearest' highly rated businesses. How do I do this? To make it more concrete: given a location, I want to find businesses sorted by rating/(1+distance). What is the best way to go about this?
from django.contrib.gis.db import models

class Business(models.Model):
    name = models.CharField(max_length=255)
    rating = models.IntegerField()
    address = models.PointField()
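I don't think older GeoDjango versions let you sort by distance directly (newer versions provide a Distance database function you can annotate with and order by), but you can filter by distance to keep only the businesses near your point and then order them by rating.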
from django.contrib.gis.geos import Point
pnt = Point(954158.1, 4215137.1, srid=32140)
area = pnt.buffer(23)  # polygon of radius 23 (in the SRID's units) around pnt
business = Business.objects.filter(address__intersects=area).order_by('-rating')  # highest rated first
In PostGIS you could get what you are asking for with a simple query like this:
SELECT name, rating,
       ST_Distance(the_geom, ST_GeomFromEWKT('SRID=4326;POINT(19.232 91.00)')) AS minDist
FROM business
ORDER BY rating / (1 + ST_Distance(the_geom, ST_GeomFromEWKT('SRID=4326;POINT(19.232 91.00)'))) DESC;
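The ranking the question asks for, rating/(1+distance), can be sanity-checked in plain Python before it is moved into a query. A minimal sketch, assuming planar coordinates (so the distance is Euclidean, not geodesic) and a hypothetical `Place` tuple in place of the Django model:

```python
import math
from typing import NamedTuple


class Place(NamedTuple):
    name: str
    rating: int
    x: float  # planar coordinates, e.g. in a projected SRID's units
    y: float


def rank_by_weighted_rating(places, origin):
    """Sort places by rating / (1 + distance to origin), best first."""
    ox, oy = origin

    def score(place):
        distance = math.hypot(place.x - ox, place.y - oy)  # Euclidean distance
        return place.rating / (1 + distance)

    return sorted(places, key=score, reverse=True)
```

Adding 1 to the distance keeps the score finite for a business located exactly at the query point; the distance units (metres, degrees) determine how strongly distance is penalised relative to rating.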
