Why does a Cypher query return duplicated results in Neo4j?

I am working with the example Movies database in Neo4j. I have already searched for information about duplicated rows, but I still have doubts.
I am using XOR in the following query:
MATCH (m:Movie)<-[r]-(p:Person)
WHERE m.title STARTS WITH 'The'
XOR (m.released = 1999 OR m.released = 2003)
RETURN m.title, m.released
So, my result is
As you can see, there are duplicated rows. I don't understand why this happens, or what determines the number of duplicates.
I know that DISTINCT removes duplicates, but I am interested in understanding why the query duplicates the results and what the number of duplicates depends on.

This is because you are matching
MATCH (m:Movie)<-[r]-(p:Person)
The movie title is therefore returned once for each person matched through a relationship to the movie: if four people are connected to the movie, you get that title back four times. You can remove the duplicates by matching only the movie:
MATCH (m:Movie)

As Tomaz said, it is returning a row for every :Person that has a relationship to :Movie. If you concluded your query with just RETURN m and viewed the results, you probably would only see non-duplicated nodes appear. Otherwise, you can conclude the query with RETURN DISTINCT m to ensure that non-duplicated results are returned.
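
The row-per-match behaviour described in the answers above can be imitated in plain Python. This is only a sketch: the people, titles and relationships are made-up illustration data, not the real Movies dataset, and the list comprehension merely stands in for what MATCH does.

```python
# Hypothetical miniature graph: (person, relationship type, movie title).
# All names and titles here are invented for illustration.
relationships = [
    ("Person A", "ACTED_IN", "The Example"),
    ("Person B", "ACTED_IN", "The Example"),
    ("Person C", "DIRECTED", "The Example"),
    ("Person C", "PRODUCED", "The Example"),
    ("Person D", "ACTED_IN", "The Other Film"),
]

# MATCH (m:Movie)<-[r]-(p:Person) RETURN m.title
# produces one row for every matching relationship/person pair.
rows = [movie for _person, _rel_type, movie in relationships]

# RETURN DISTINCT m.title collapses identical rows (first-seen order kept).
distinct_rows = list(dict.fromkeys(rows))

print(rows)
print(distinct_rows)
```

Note that "Person C" contributes two rows in this model, because the pattern matches once per relationship, not once per person.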

Related

COUNT, IIF usage for counting records that also have a specific field value matched

Using MS Access and I have two tables, one is categories and the other is content.
My initial SQL statement, included below, counts the content associated with each category and returns that count per category.
For each CATEGORY, I am trying to return an additional count of the CONTENT rows that have a specific user level and are not deleted.
Below is what I am struggling with, as I am not certain you can actually use COUNT like this.
COUNT(IIf([CONTENT.isDeleted]=0,1,0)) - COUNT(IIf([CONTENT.userLevel]=2)) AS userLevelCount
This is the full SELECT statement with my addition, which is not working.
SELECT
CATEGORY.categoryId,
CATEGORY.categoryTitle,
CATEGORY.categoryDate,
CATEGORY.userLevel,
Last(CONTENT.contentDate) AS contentDate,
CATEGORY.isDeleted AS categoryDeleted,
COUNT(IIf([CONTENT.isDeleted]=0,1,0)) AS countTotal,
COUNT(IIf([CONTENT.isDeleted]=1,[CONTENT.contentID],Null)) AS countDeleted,
COUNT([CONTENT.categoryId]) - COUNT(IIf([CONTENT.isDeleted]=1,[CONTENT.contentID],Null))AS countDifference,
COUNT(IIf([CONTENT.isDeleted]=0,1,0)) - COUNT(IIf([CONTENT.userLevel]=2)) AS userLevelCount
FROM CATEGORY
LEFT JOIN CONTENT ON
CATEGORY.categoryId = CONTENT.categoryId
GROUP BY
CATEGORY.categoryId,
CATEGORY.categoryTitle,
CATEGORY.categoryDate,
CATEGORY.userLevel,
CATEGORY.isDeleted
HAVING (((CATEGORY.isDeleted)=0))
ORDER BY
CATEGORY.categoryTitle
You should be able to use the following:
SUM(IIf([CONTENT.isDeleted]=0,1,0)) - COUNT(IIf([CONTENT.userLevel]=2,1,NULL)) AS userLevelCount
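COUNT does not count NULL, but it does count zero, so the false branch of IIf has to be NULL for COUNT to skip a row. SUM instead adds up all the 1s, which is a second way of achieving the same result.
IIF is also available in newer versions of SQL Server (2012 and later).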
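
The difference between counting zeros and counting NULLs can be checked with a small experiment. The sketch below uses Python's built-in sqlite3 module rather than MS Access, and CASE expressions in place of IIf, so it only illustrates the general SQL behaviour of COUNT and SUM; the table and its four rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database with a hypothetical CONTENT-like table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content (contentID INTEGER, isDeleted INTEGER, userLevel INTEGER)"
)
conn.executemany(
    "INSERT INTO content VALUES (?, ?, ?)",
    [(1, 0, 2), (2, 0, 1), (3, 1, 2), (4, 0, 2)],
)

count_with_zero, count_with_null, sum_of_ones = conn.execute(
    """
    SELECT
        COUNT(CASE WHEN isDeleted = 0 THEN 1 ELSE 0 END),    -- counts every row
        COUNT(CASE WHEN isDeleted = 0 THEN 1 ELSE NULL END), -- skips NULLs
        SUM(CASE WHEN isDeleted = 0 THEN 1 ELSE 0 END)       -- adds up the 1s
    FROM content
    """
).fetchone()

print(count_with_zero, count_with_null, sum_of_ones)  # 4 3 3
conn.close()
```

With one deleted row out of four, the COUNT over 1/0 still reports 4, while the COUNT over 1/NULL and the SUM over 1/0 both report 3.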
I believe I found the solution
Count(IIf([CONTENT.userLevel]=2,[CONTENT.contentID],Null)) AS countDifference2
This will return the count difference for CONTENT for each CATEGORY that isn't deleted and has a specific user level.

CYPHER, NEO4J Returning couples of nodes that are connected by a relationship of type A and not by a relation of type B

I am working on the Neo4j tutorial Movies database.
I would like to write a query that returns the people who directed a movie but did not produce it.
To accomplish this I have used the query:
match (m:Movie)<-[:DIRECTED]-(p:Person)-[r]->(m) where type(r) <> 'PRODUCED' return p
Nevertheless, if I make it return * I still get those pairs (person, movie) where the person not only directed and produced the film but also wrote it:
The image shows one of the inadmissible pairs returned by my query.
In contrast, the query does successfully rule out all the pairs that are connected by only the two relationships 'PRODUCED' and 'DIRECTED'.
Is there a way of writing this query so that I can rule out these pairs as well?
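Your WHERE clause tests one relationship r at a time, so a person who also wrote the movie still matches through the WROTE relationship, even though a PRODUCED relationship exists as well. Instead, you can use a path pattern in WHERE to require that no PRODUCED relationship exists: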
match (m:Movie)<-[r:DIRECTED]-(p:Person)
where not (p)-[:PRODUCED]->(m)
return m, p, r
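
The difference between "some other relationship is not PRODUCED" and "no PRODUCED relationship exists" can be sketched in Python. The people, films and relationships below are hypothetical, and the comprehensions only approximate how the two Cypher queries select rows.

```python
# Hypothetical relationships: (person, relationship type, movie).
relationships = {
    ("Ann", "DIRECTED", "Film 1"),
    ("Ann", "PRODUCED", "Film 1"),
    ("Ann", "WROTE", "Film 1"),
    ("Bob", "DIRECTED", "Film 2"),
    ("Bob", "PRODUCED", "Film 2"),
    ("Cy", "DIRECTED", "Film 3"),
    ("Cy", "WROTE", "Film 3"),
    ("Dee", "DIRECTED", "Film 4"),
}

directed = sorted((p, m) for p, t, m in relationships if t == "DIRECTED")

# Question's query: there is a second relationship r between p and m
# (not the DIRECTED one already matched) whose type is not PRODUCED.
original = [
    (p, m)
    for p, m in directed
    if any(
        (p2, m2) == (p, m) and t != "PRODUCED" and (p2, t, m2) != (p, "DIRECTED", m)
        for p2, t, m2 in relationships
    )
]

# Answer's query: WHERE NOT (p)-[:PRODUCED]->(m)
fixed = [(p, m) for p, m in directed if (p, "PRODUCED", m) not in relationships]

print(original)  # [('Ann', 'Film 1'), ('Cy', 'Film 3')]
print(fixed)     # [('Cy', 'Film 3'), ('Dee', 'Film 4')]
```

In this model the question's query wrongly keeps Ann (who directed, produced and wrote) and drops Dee (who only directed), while the path-pattern version keeps exactly the directors without a PRODUCED relationship.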

Cypher returns exponential count result on 'join'

I am learning Cypher on Neo4j and I am having trouble understanding how to perform an efficient 'join' equivalent in Cypher.
I am using the standard Matrix character example and I have added some nodes to the mix called 'Gun' with a relation of ':GIVEN_TO'. You can see the console with my query result here:
http://console.neo4j.org/r/rog2hv
The query I am using is:
MATCH (Neo:Crew { name: 'Neo' })-[:KNOWS*..]->(other:Crew),(other)<-[:GIVEN_TO]-(g:Gun),(Neo)<-[:GIVEN_TO]-(g2:Gun)
RETURN count(g2);
I have given Neo 4 guns, but when I perform the above I get a count of '12'. This seems to be the case because there are 3 'others' and 3*4 = 12. So I get some exponential result.
What should my query look like to get the correct count ('4') from the example?
Edit:
The reason I am not querying through Guns directly as suggested by @ceej is because in my real use case I have to do this traversal as described above. Adding DISTINCT does not do anything for my result.
The reason you get 12 guns instead of 4 is that your query produces a cartesian product: you have asked for several patterns in the same MATCH statement, and Neo's guns are not connected to the 'other' part of the pattern, so every gun is paired with every 'other' match. @ceej rightly pointed out that if you want to find Neo's guns you would do as he suggested in his first query.
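
The multiplication can be reproduced with itertools.product. This is a simplified model with placeholder names: it assumes, as the question reports, that the KNOWS part of the pattern yields three rows and that Neo has four guns.

```python
from itertools import product

# Placeholder results for the two unconnected parts of the MATCH.
other_matches = ["other 1", "other 2", "other 3"]  # rows from the KNOWS part
neo_guns = ["gun 1", "gun 2", "gun 3", "gun 4"]    # guns given to Neo

# Every 'other' row is combined with every one of Neo's guns.
rows = list(product(other_matches, neo_guns))

count_g2 = len(rows)                                  # like count(g2)
count_distinct_g2 = len({g2 for _other, g2 in rows})  # like count(DISTINCT g2)

print(count_g2, count_distinct_g2)  # 12 4
```

Each gun appears once per 'other' row, which is why the plain count is 3 * 4 = 12 while the number of distinct guns is still 4.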
If you wanted to get a list of the crew members and their guns then you could do something like this...
MATCH (crew:Crew)<-[:GIVEN_TO]-(g:Gun)
RETURN crew.name, collect(g.name)
Which finds all of the crew members with guns and returns their name and the guns that they were given.
If you wanted to invert it and get a list of the guns and the respective crew members they were given to, you could do the following...
MATCH (crew:Crew)<-[:GIVEN_TO]-(g:Gun)
RETURN g.name, collect(crew.name)
If you wanted to find all of the crew that knew Neo multiple levels deep that were given a gun you could write the query like this...
MATCH (crew:Crew)<-[:GIVEN_TO]-(g:Gun)
WITH crew, g
MATCH (neo:Crew {name: 'Neo'})-[:KNOWS*0..]->(crew)
RETURN crew.name, collect(g.name)
That finds all the crew that were given guns and then keeps those that can be reached from Neo along a :KNOWS path (including Neo himself, because of the *0.. lower bound).
Forgive me, but I am unclear why you have the initial MATCH in your query. From your explanation it would appear that you are trying to get the number of :Gun nodes linked to Neo by the :GIVEN_TO relationship. In that case all you need is the latter part of your query, which would give you something like
MATCH (neo:Crew { name: 'Neo' })<-[:GIVEN_TO]-(g:Gun)
RETURN count(g)
Furthermore, to make sure that you are only counting distinct :Gun nodes, you can add DISTINCT inside count() in the RETURN clause.
MATCH (neo:Crew { name: 'Neo' })<-[:GIVEN_TO]-(g:Gun)
RETURN count( DISTINCT g )
This is possibly unnecessary in your case but can be helpful when the pattern that you are matching on can arrive at the same node by different traversals.
Have I misunderstood your requirement?

Why do these queries in Neo4j return different results?

I am practicing on the Movies database that comes with Neo4j when you install it. For some reason, these two queries return different results:
match (keanu:Person {name: "Keanu Reeves"})-[:ACTED_IN]->()
<-[:ACTED_IN]-(actor), (actor)-[:ACTED_IN]->()<-[:ACTED_IN]-(other)
where NOT ((keanu)-[:ACTED_IN]->()<-[:ACTED_IN]-(other)) and
other <>keanu return other.name, count(other) as count order by count DESC;
match (keanu:Person {name: "Keanu Reeves"})-[:ACTED_IN]->(movie)
<-[:ACTED_IN]-(actor), (actor)-[:ACTED_IN]->()<-[:ACTED_IN]-(other)
where NOT ((keanu)-[:ACTED_IN]->(movie)<-[:ACTED_IN]-(other)) and
other <>keanu return other.name, count(other) as count order by count DESC;
The only difference is that I specified 'movie' variable. I just want to look up the actors that have not played with Keanu, but have played with his co-stars most frequently. The results are the same, except when I specify a 'movie' variable a new actor is added to the top of the results (having most frequently acted with Keanu's co-stars). That actor does not show up at all in the first query results, but only in the second and tops the results.
The first variant contains
where NOT ((keanu)-[:ACTED_IN]->()<-[:ACTED_IN]-(other))
The second one
where NOT ((keanu)-[:ACTED_IN]->(movie)<-[:ACTED_IN]-(other))
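So the first one filters out all paths where keanu has acted together with other in any movie. The second one only filters out paths where keanu acted together with other in this particular movie, the one bound to the movie variable on that row, so an actor who appeared with Keanu in a different movie is still returned.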
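
A small Python model makes the difference visible. The cast lists are hypothetical, and the nested loops only approximate the Cypher pattern (including the rule that a relationship is not reused within one MATCH, modelled here by skipping the same movie and the same actor).

```python
from collections import Counter

# Hypothetical cast lists: movie -> set of actors.
cast = {
    "M1": {"Keanu", "A"},
    "M2": {"Keanu", "X"},
    "M3": {"A", "X"},
    "M4": {"A", "Y"},
}
KEANU = "Keanu"


def co_stars_of_co_stars(same_movie_only):
    """Count 'other' actors reached via Keanu's co-stars.

    same_movie_only=False models NOT (keanu)-->()<--(other)
    same_movie_only=True  models NOT (keanu)-->(movie)<--(other)
    """
    counts = Counter()
    for movie, actors in cast.items():
        if KEANU not in actors:
            continue
        for actor in actors - {KEANU}:
            for movie2, actors2 in cast.items():
                if movie2 == movie or actor not in actors2:
                    continue
                for other in actors2 - {actor, KEANU}:
                    if same_movie_only:
                        acted_with_keanu = other in cast[movie]
                    else:
                        acted_with_keanu = any(
                            KEANU in c and other in c for c in cast.values()
                        )
                    if not acted_with_keanu:
                        counts[other] += 1
    return counts


first = co_stars_of_co_stars(same_movie_only=False)
second = co_stars_of_co_stars(same_movie_only=True)
print(dict(first))           # {'Y': 1}
print(sorted(second.items()))  # [('A', 1), ('X', 1), ('Y', 1)]
```

In this model the second variant also returns A and X, who did act with Keanu, just not in the movie bound on the row being checked.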

Searching for and matching elements across arrays

I have two tables.
In one table there are two columns, one has the ID and the other the abstracts of a document about 300-500 words long. There are about 500 rows.
The other table has only one column and >18000 rows. Each cell of that column contains a distinct acronym such as NGF, EPO, TPO etc.
I am interested in a script that will scan each abstract of the table 1 and identify one or more of the acronyms present in it, which are also present in table 2.
Finally the program will create a separate table where the first column contains the content of the first column of the table 1 (i.e. ID) and the acronyms found in the document associated with that ID.
Can some one with expertise in Python, Perl or any other scripting language help?
It seems to me that you are trying to join the two tables wherever the acronym appears in the abstract, i.e. (pseudo SQL):
SELECT acronym.id, document.id
FROM acronym, document
WHERE acronym.value IN explode(document.abstract)
Given the desired semantics you can use the most straightforward approach:
acronyms = ['ABC', ...]
documents = [(0, "Document zeros discusses the value of ABC in the context of..."), ...]
joins = []
for id, abstract in documents:
    for word in abstract.split():
        try:
            index = acronyms.index(word)
            joins.append((id, index))
        except ValueError:
            pass # word not an acronym
This implementation is simple, but it is slow: acronyms.index performs a linear search (of our largest array, no less) for every word of every abstract, so the running time grows with documents × words × acronyms. We can improve the algorithm by first building a hash index of the acronyms:
acronyms = ['ABC', ...]
documents = [(0, "Document zeros discusses the value of ABC in the context of..."), ...]
# map each acronym to its position for constant-time lookups
index = dict((acronym, idx) for idx, acronym in enumerate(acronyms))
joins = []
for id, abstract in documents:
    for word in abstract.split():
        try:
            joins.append((id, index[word]))
        except KeyError:
            pass # word not an acronym
Of course, you might want to consider using an actual database. That way you won't have to implement your joins by hand.
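
For completeness, here is a self-contained variant of the hash-lookup idea that produces the table described in the question (document ID plus the acronyms found in it). It is a sketch under assumptions: the sample data is invented, find_acronyms is a hypothetical helper name, and the regular-expression tokenising is my own addition so that an acronym followed by punctuation still matches.

```python
import re


def find_acronyms(documents, acronyms):
    """Map each document ID to the known acronyms found in its abstract."""
    known = set(acronyms)  # set membership tests take constant time on average
    result = {}
    for doc_id, abstract in documents:
        # Split on runs of letters/digits so that "NGF," still yields "NGF".
        words = re.findall(r"[A-Za-z0-9]+", abstract)
        # dict.fromkeys drops repeated words while keeping first-seen order.
        result[doc_id] = [w for w in dict.fromkeys(words) if w in known]
    return result


# Hypothetical sample data standing in for the two tables.
acronyms = ["NGF", "EPO", "TPO"]
documents = [
    (1, "NGF and EPO levels rose; NGF was measured twice."),
    (2, "No growth factors are mentioned here."),
]

found = find_acronyms(documents, acronyms)
print(found)  # {1: ['NGF', 'EPO'], 2: []}
```

Matching is case-sensitive here, which seems appropriate for acronyms but would need adjusting if the abstracts use mixed case.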
Thanks a lot for the quick response.
I assume the pseudo SQL solution is for MySQL etc. However, it did not work in Microsoft Access.
The second and the third are for Python, I assume. Can I feed the acronyms and documents in as input files?
babru
It didn't work in Access because tables are accessed differently (e.g. acronym.[id])
