Neo4j/Cypher | Not getting unique results from the database

I'm trying to get completely unique results from my Neo4j database.
I have a database with all kinds of movies, users can watch movies, and movies can be similar to each other. I'm trying to get a completely unique list of movies that are related to the movies that the user already watched, and filter out the movies that are already watched.
As far as I know, this should work with RETURN DISTINCT m, but it doesn't when several of the movies I watched are similar to the same movie.
To put it simply:
User watched movies A, B and C. All of those movies are similar to movie D.
Right now, the query returns: D, D, D.
I tried both DISTINCT movie and collect(DISTINCT m), without success.
The complete query I'm using is:
MATCH (u:user {name:'" + user + "'})-[:watched]->()-[r:is_similar_to]-(m)
WHERE NOT (u)-[:watched]-(m)
RETURN collect(DISTINCT m), r ORDER BY r.rated
I hope you guys can help me out,
Thanks!

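The movies are not unique because you also return r.
When a RETURN contains an aggregation (here collect(DISTINCT m)), every non-aggregated item in it becomes a grouping key, so the results are implicitly grouped by r. If you switch to the Rows view, you will see 3 rows, one per distinct relationship r, each holding a collection of distinct movies. Since every r leads to D, D appears in all three rows.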
Maybe this is what you want:
MATCH (u:user {name:'" + user + "'})-[:watched]->()-[r:is_similar_to]-(m)
WHERE NOT (u)-[:watched]-(m)
RETURN m, sum(r.rated) AS score
ORDER BY score DESC
You will then have unique movies, each with the sum of the ratings of all its similarity relationships.
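The grouping rule can be mimicked in plain Python to see why the key matters. This is only an illustration of how implicit grouping keys behave, not of how Neo4j executes the query; the relationship ids and the `rated` values below are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows produced by the MATCH: one row per similarity relationship r.
# The user watched A, B and C, and all three are similar to D.
rows = [
    {"r": "A-D", "m": "D", "rated": 4},
    {"r": "B-D", "m": "D", "rated": 3},
    {"r": "C-D", "m": "D", "rated": 5},
]

def group_by(rows, key, value):
    """Collect `value` per distinct `key`, like a Cypher grouping key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return dict(groups)

# RETURN collect(DISTINCT m), r  -> grouped by r: three rows, each containing D
by_r = {r: sorted(set(ms)) for r, ms in group_by(rows, "r", "m").items()}

# RETURN m, sum(r.rated) AS score  -> grouped by m: a single row for D
by_m = {m: sum(scores) for m, scores in group_by(rows, "m", "rated").items()}

print(by_r)  # {'A-D': ['D'], 'B-D': ['D'], 'C-D': ['D']}
print(by_m)  # {'D': 12}
```

Dropping r from the RETURN (and aggregating its property instead) is what collapses the three rows into one.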

Related

neo4j How to get a subset using a forloop as a subquery?

For each person, I want to get the first 5 events and the last 5 events (based on eventTime). I want to compare the win rate of the first 5 events with that of the last 5 to find the most improved person. I am struggling to express this for-loop logic in Neo4j.
GDB Schema:
(p: Person) -[:PlaysIn]-> (e:Event {eventTime:, eventOutcome:})

The apoc.coll.sortNodes function does the trick for you. See https://neo4j.com/labs/apoc/4.1/overview/apoc.coll/apoc.coll.sortNodes/
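MATCH (p:Person)
// sortNodes sorts descending by default; the '^' prefix sorts ascending by eventTime
WITH p, apoc.coll.sortNodes([(p)-[:PlaysIn]->(e:Event) | e], '^eventTime') AS events
RETURN p,
events[0..5] AS first5Events,
// a negative start index counts from the end, so this is the last 5 events
events[-5..] AS last5Events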
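The comparison the question asks for (win rate of the last 5 events minus win rate of the first 5) can be sketched in Python over in-memory data. The event tuples and the assumption that eventOutcome is either "win" or "loss" are hypothetical, since the schema leaves those values out.

```python
# Hypothetical data: person -> list of (eventTime, eventOutcome) pairs.
events_by_person = {
    "alice": [(t, "win" if t >= 5 else "loss") for t in range(10)],
    "bob": [(t, "win" if t % 2 == 0 else "loss") for t in range(10)],
}

def win_rate(events):
    """Fraction of events whose outcome is 'win' (0.0 for an empty list)."""
    return sum(1 for _, outcome in events if outcome == "win") / len(events) if events else 0.0

def improvement(events, k=5):
    """Win rate of the last k events minus win rate of the first k, by eventTime."""
    ordered = sorted(events, key=lambda e: e[0])  # ascending eventTime
    first, last = ordered[:k], ordered[-k:]       # same slices as events[0..5] / events[-5..]
    return win_rate(last) - win_rate(first)

scores = {person: improvement(evts) for person, evts in events_by_person.items()}
most_improved = max(scores, key=scores.get)
print(most_improved)  # alice
```

A person with fewer than 10 events will have overlapping first and last windows, so you may want to filter those out first.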

How Can I Find Unique Values in this Dataframe with pandas?

I have a dataframe that lists venues in the neighborhoods of Toronto.
For each venue, the neighborhood name and the venue type are listed (I got rid of everything else).
I need to find the total number of unique venue types in each neighborhood. For example, if there are 8 coffee shops and 2 restaurants, the value returned should be 2; if there's 1 coffee shop, 1 restaurant and 1 laundromat, the value should be 3, and so on.
Does anyone know how to do this?

Try using 'groupby' and 'nunique'
df.groupby('Neighbourhood')['Venue Category'].nunique()
This returns the number of distinct 'Venue Category' values in each 'Neighbourhood'.
Hope this solves your question :)
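For reference, here is what groupby(...).nunique() computes, written as a plain-Python sketch over the two examples from the question. The neighbourhood names are made up for illustration.

```python
from collections import defaultdict

# Hypothetical (neighbourhood, venue category) rows built from the question's examples.
rows = (
    [("Annex", "Coffee Shop")] * 8
    + [("Annex", "Restaurant")] * 2
    + [("Leaside", "Coffee Shop"), ("Leaside", "Restaurant"), ("Leaside", "Laundromat")]
)

def distinct_count_per_group(rows):
    """Plain-Python counterpart of df.groupby(key)[value].nunique()."""
    seen = defaultdict(set)
    for neighbourhood, category in rows:
        seen[neighbourhood].add(category)  # a set keeps each category once
    return {n: len(categories) for n, categories in seen.items()}

print(distinct_count_per_group(rows))  # {'Annex': 2, 'Leaside': 3}
```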

Try pandas' unique() function.
To list the unique values in the df['neighborhood'] column:
df.neighborhood.unique()
This returns an array of the distinct values in that column.

Combining scores from multiple Neo4j queries

We have a movie graph with actors, directors and movies. A director may also have acted in another movie, or even in the same one. We give an actor a score for each movie he plays in, and a director a score for each of his movies. We want a list of individuals and their scores, sorted from highest to lowest. These are the queries we currently have:
MATCH (p:Person)-[idr:IsDirectorOf]->(m:Movie)
RETURN p.name AS name, COUNT(idr) AS numberOfMoviesDirected
ORDER BY numberOfMoviesDirected DESC;
and
MATCH (p:Person)-[iai:IsActorIn]->(m:Movie)
RETURN p.name AS name, COUNT(iai) AS numberOfMoviesPlayed
ORDER BY numberOfMoviesPlayed DESC;
I want to combine these queries, to get the person who has the highest score from both queries together at the top. Also, please note that we may later need to add another query to the mix, so solutions that work for only two queries may not be the best.

Try this query:
MATCH (p:Person)
RETURN p, size((p)-[:IsDirectorOf]->(:Movie)) + size((p)-[:IsActorIn]->(:Movie)) AS score
ORDER BY score DESC
To add another score later, carry the running score forward with WITH and add the next term to it.
Cheers
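The same "add the counts together" idea generalizes to any number of queries. Here is a hedged Python sketch, assuming each query's result has already been turned into a hypothetical name-to-count mapping.

```python
from collections import Counter

# Hypothetical results of the two queries: person name -> number of movies.
directed = {"ann": 3, "cy": 2}
acted = {"ann": 5, "ben": 4}

def combined_scores(*per_query_counts):
    """Add up any number of name -> count mappings (a missing name counts as 0)."""
    total = Counter()
    for counts in per_query_counts:
        total.update(counts)  # Counter.update adds counts instead of replacing them
    return total.most_common()  # list of (name, score), highest score first

print(combined_scores(directed, acted))  # [('ann', 8), ('ben', 4), ('cy', 2)]
```

A third query only means passing one more mapping, which addresses the concern about solutions that work for only two queries.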
UPDATE
In response to your comment, here is a version that divides each count by its maximum across all people before adding them:
MATCH (p:Person)
WITH
max(size((p)-[:IsActorIn]->())) As maxActed,
max(size((p)-[:IsDirectorOf]->())) AS maxDirected
MATCH (p:Person)
RETURN
p,
size((p)-[:IsActorIn]->(:Movie)) / toFloat(maxActed) + size((p)-[:IsDirectorOf]->(:Movie)) / toFloat(maxDirected) AS score
ORDER BY score DESC
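The normalization in the updated query (each count divided by the maximum for that relationship type, then summed) can be sketched in Python as follows. The input mappings are hypothetical, and the guard against a zero maximum is an addition of this sketch, not part of the Cypher above.

```python
# Hypothetical results of the two queries: person name -> number of movies.
directed = {"ann": 3, "cy": 2}
acted = {"ann": 5, "ben": 4}

def normalized_scores(*per_query_counts):
    """Scale each query's counts by that query's maximum, then sum per person."""
    names = set().union(*per_query_counts)
    scores = {name: 0.0 for name in names}
    for counts in per_query_counts:
        top = max(counts.values(), default=0)
        if top == 0:
            continue  # a query with no matches contributes nothing (avoids dividing by zero)
        for name, count in counts.items():
            scores[name] += count / top
    # Highest score first; ties broken by name so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

result = normalized_scores(directed, acted)
print([name for name, _ in result])  # ['ann', 'ben', 'cy']
```

Normalizing keeps one relationship type with large counts from dominating the combined score.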

What's the most effective way of storing this data?

Need help figuring out a good way to store data effectively and efficiently
I'm using Parse (JavaScript SDK), here's an example of what I'm trying to store
I'm storing predictions for football (soccer) matches. An example of one match:
Team A v Team B
EventID = "abc"
Categories = ["League-1","Sunday-League"]
User123 predicts the score will be Team A 2-0 Team B -> so 2-0
User456 predicts the score will be Team A 1-3 Team B -> so 1-3
Each event has information attached to it, such as an eventId, several categories, a start time, an end time, a result and more.
I need to record a score prediction per user for each event (usually 10 events at a time, so a lot of predictions will be coming in).
I need to store these so I can cross-reference the actual result against each user's prediction and award points based on the prediction, the teams in the match and the event's categories. Instead of adding to a running total, I need every awarded point stored separately per category and per user, so that I can later filter predictions between set dates and by certain categories, e.g.:
Team A v Team B
EventID = "abc"
Categories = ["League-1","Sunday-League"]
User123 prediction = 2-0
Actual result = 2-0
So now I need to award X points to User123 for Team A, Team B, "League-1", and "Sunday-League" and record it to the event date too.

I would suggest creating a table for games, a table for users, and an associative (join) table to handle the many-to-many relationship between them. This is a standard many-to-many design.
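To make that layout concrete, here is a sketch in SQLite rather than Parse (Parse stores objects in classes, so this only shows the relational shape). The table and column names, the event date and the point value X are hypothetical. Awarded points are kept as one row per user, event and team/category label, so they can be filtered later instead of being summed into a total.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (id TEXT PRIMARY KEY);
CREATE TABLE events (id TEXT PRIMARY KEY, home TEXT, away TEXT, event_date TEXT,
                     home_goals INTEGER, away_goals INTEGER);
CREATE TABLE event_categories (event_id TEXT REFERENCES events(id), category TEXT);
-- Associative table: one row per (user, event) prediction.
CREATE TABLE predictions (user_id TEXT REFERENCES users(id), event_id TEXT REFERENCES events(id),
                          home_goals INTEGER, away_goals INTEGER,
                          PRIMARY KEY (user_id, event_id));
-- Points stored separately per user, event and label (a team or a category).
CREATE TABLE awarded_points (user_id TEXT, event_id TEXT, label TEXT, points INTEGER);
""")

# Data from the question; the date is a made-up placeholder.
conn.execute("INSERT INTO events VALUES ('abc', 'Team A', 'Team B', '2024-01-07', 2, 0)")
conn.executemany("INSERT INTO event_categories VALUES ('abc', ?)",
                 [("League-1",), ("Sunday-League",)])
conn.executemany("INSERT INTO users VALUES (?)", [("User123",), ("User456",)])
conn.executemany("INSERT INTO predictions VALUES (?, 'abc', ?, ?)",
                 [("User123", 2, 0), ("User456", 1, 3)])

X = 10  # hypothetical points for an exact score

# Award X points per team and per category for every exact prediction.
exact = conn.execute("""
    SELECT p.user_id, e.id, e.home, e.away
    FROM predictions p JOIN events e ON e.id = p.event_id
    WHERE p.home_goals = e.home_goals AND p.away_goals = e.away_goals
""").fetchall()
for user_id, event_id, home, away in exact:
    categories = [c for (c,) in conn.execute(
        "SELECT category FROM event_categories WHERE event_id = ?", (event_id,))]
    conn.executemany("INSERT INTO awarded_points VALUES (?, ?, ?, ?)",
                     [(user_id, event_id, label, X) for label in [home, away, *categories]])

def points(user_id, label, start, end):
    """Total points for one user and label, limited to events between start and end."""
    row = conn.execute("""
        SELECT COALESCE(SUM(a.points), 0)
        FROM awarded_points a JOIN events e ON e.id = a.event_id
        WHERE a.user_id = ? AND a.label = ? AND e.event_date BETWEEN ? AND ?
    """, (user_id, label, start, end)).fetchone()
    return row[0]

print(points("User123", "League-1", "2024-01-01", "2024-01-31"))  # 10
```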

Solr: Fetching results with a minimum from each category

I am using Solr 4.4.0. The search is performed on products, each of which has a category field. I want to retrieve the top n products, but if some category has fewer than m products among the top n, I want to retrieve more products for those categories only.
E.g., I have 4 categories a, b, c, d, with n=20 and m=5. Now let's say the top 20 (=n) have the following category distribution: a:6, b:4, c:6, d:4. Categories b and d have fewer than m (=5) products, so I would like to fetch one more product (with the next highest score) for each of these categories.
Is there a way to do this using Solr?

Did you try to solve this with field collapsing (result grouping)?
Use group.field=category; group.limit sets how many documents are returned per group. Be careful with how the groups are ordered: the sort parameter orders the groups by the top document in each group, while group.sort orders the documents within a group.
But I think you can achieve what you are looking for fairly easily.
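One way to combine the two, assuming you send both a normal query for the top n and a grouped query with group.limit=m, is to merge the responses client-side. The sketch below is an assumption rather than a Solr feature: the merge runs over an in-memory list of (score, id, category) tuples instead of a real Solr response, the data is made up to match the question's distribution, and grouped_query_params only builds a query string for the group, group.field and group.limit parameters without contacting a server.

```python
from collections import defaultdict
from urllib.parse import urlencode

def grouped_query_params(q, m):
    """Query string for a Solr result-grouping request: up to m docs per category."""
    return urlencode({
        "q": q,
        "group": "true",
        "group.field": "category",
        "group.limit": m,  # documents returned per group
    })

def top_n_with_minimum(products, n, m):
    """products: (score, id, category) tuples. Keep the top n overall, then add each
    category's next-best products until it has m (if it has that many at all)."""
    ranked = sorted(products, key=lambda p: p[0], reverse=True)
    chosen = ranked[:n]
    chosen_ids = {p[1] for p in chosen}
    by_category = defaultdict(list)
    for p in ranked:
        by_category[p[2]].append(p)  # each list stays in descending score order
    extra = [p for group in by_category.values() for p in group[:m]
             if p[1] not in chosen_ids]
    return chosen + extra

# Hypothetical data reproducing the question: the top 20 split a:6, b:4, c:6, d:4.
products, score = [], 100
for cat, k in [("a", 6), ("b", 4), ("c", 6), ("d", 4)]:
    for i in range(k):
        products.append((score, f"{cat}{i}", cat))
        score -= 1
for cat in "abcd":  # lower-scored products outside the top 20
    for i in range(6, 9):
        products.append((score, f"{cat}x{i}", cat))
        score -= 1

result = top_n_with_minimum(products, n=20, m=5)
print(len(result))  # 22
```

A category that already has m or more products in the top n has its best m among them, so only under-represented categories (b and d here) gain products.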
