neo4j / cypher - why is starting node excluded? - database

I have a simple graph:
When I run this simple query in neoeclipse:
START me=node:node_auto_index(name="Me")
MATCH me-[:LIVES_IN]->()<-[:LIVES_IN]-(f)
RETURN f.name;
only my Girlfriend is returned!
Why am I excluded from the result?
Results
f.name Girlfriend

Because a path (what you specify in the match) will never contain the same relationship twice.
To find all the people living in the same location including yourself, you need to split into two actions, one finding your city and the other collecting people in this city using the with statement:
start me=node:node_auto_index(name='Me')
match me-[:LIVES_IN]->homebase
with homebase
match homebase<-[:LIVES_IN]-people
return people
See http://console.neo4j.org/?id=t0wjhg

Related

How to find the pointer node of a relationship using full text search on the edge node in neo4j

This question is related to Neo4j databases. Suppose I have a relationship (employee)-[WORKS-IN]->(company).. Imagine an employee works in multiple companies. I should be able to find the companies that a specific employee is working using full text search in neo4j. I'll be searching from the users name and I should be able to return company nodes..how to do that??
Full text search must be used.
So you want to search for a Person by name with full text and then retrieve the companies he worked for.
Compare this easily with the default Movies graph in Neo4j, you want to search for a Person by name with full text and then retrieve the movies the person acted in .
CALL db.index.fulltext.queryNodes('Person', 'kea*')
YIELD node
MATCH (node)-[:ACTED_IN]->(movie)
RETURN node.name, movie.title
This is an example when I created this node:
CREATE (e:Employee {name: 'Nirmana Testing'})
Then create the full text index on Employee.name
CREATE FULLTEXT INDEX employeeNameIdx FOR (e:Employee) ON EACH [e.name]
Then run a query using this full text index. Noted that the keyword 'nirmana' can be upper case or any case.
CALL db.index.fulltext.queryNodes("employeeNameIdx", "nirmana") YIELD node as employee
MATCH (employee)-[:WORKS-IN]->(company:Company)
RETURN employee, company
reference:
https://neo4j.com/docs/cypher-manual/current/indexes-for-full-text-search/
Thank you very much. Sorted it out. And one more thing. Suppose for a particular worker there can be various relationships except [WORKS-IN] , such as [PART TIME WORKER] , [FREELANCER], [PROJECT MANAGER] and so on. So for a particular user, If we want to find the place or company that he is working, freelancing, managing projects by searching the relationship type how could it be done using full text search.

Entity Framework complex search function

I'm using Entity Framework with a SQL Express database and now I have to make a search function to find users based on a value typed in a textbox, where the end user just can type in everything he wants (like Google)
What is the best way to create a search function for this. The input should search all columns.
So for example, I have 4 columns. firstname,lastname,address,emailaddress.
When someone types in the searchbox foo, all columns need to be searched for everything that contains foo.
So I thought I just could do something like
context.Users.Where(u =>
u.Firstname.Contains('foo') ||
u.Lastname.Contains('foo') ||
u.Address.Contains('foo') ||
u.EmailAddress.Contains('foo')
);
But... The end user may also type in foo bar. And then the space in the search value becomes an and requirement. So all columns should be searched and for example firstname might be foo and lastname can be bar.
I think this is to complex for a Linq query?
Maybe I should create a search index and combine all columns into the search index like:
[userId] [indexedValue] where indexedValue is [firstname + " "+ lastname + " "+ address +" " + emailaddress].
Then first split the search value based on spaces and then search for columns that have all words in the search value. Is that a good approach?
The first step with any project is managing expectation. Find the minimum viable solution for the business' need and develop that. Expand on it as the business value is proven. Providing a really flexible and intelligent-feeling search capability would of course make the business happy, but it can often not do what they expect it to do, or perform to a standard that they need, where a simpler solution would do what they need, be simpler to develop and execute faster.
If this represents the minimum viable solution and you want to "and" conditions based on spaces:
public IQueryable<User> SearchUser(string criteria)
{
if(string.IsNullOrEmpty(criteria))
return new List<User>().AsQueryable();
var criteriaValues = criteria.Split(' ');
var query = context.Users.AsQueryable();
foreach(var value in criteriaValues)
{
query = query.Where(u =>
|| u.Firstname.Contains(value)
|| u.Lastname.Contains(value)
|| u.Address.Contains(value)
|| u.EmailAddress.Contains(value));
}
return query;
}
The trouble with trying to index the combined values is that there is no guarantee that for a value like "foo bar" that "foo" represents a first name and "bar" represents a last name, or that "foo" represents a complete vs. partial value. You'd also want to consider stripping out commas and other punctuation as someone might type "smith, john"
When it comes to searching it might pay to perform a bit more of a pattern match to detect what the user might be searching for. For instance a single word like "smith" might search an exact match for first name or last name and display results. If there were no matches then perform a Contains search. If it contains 2 words then a First & last name match search assuming "first last" vs. "last, first" If the value has an "#" symbol, default to an e-mail address search, if it starts with a number, then an address search. Each detected search option could have a first pass search (expecting more exact values) then a 2nd pass more broad search assumption if it comes back empty. There could be even 3rd and 4th pass searches available with broader checks. When results are presented there could be a "more results..." button provided to trigger a 2nd, 3rd, 4th, etc. pass search if the returned results didn't return what the user was expecting.
The idea being when it comes to searching: Try to perform the most typical, narrow expected search and allow the user to broaden the search if they so desire. The goal would be to try and "hit" the most relevant results early, helping mold how users enter their criteria, and then tuning to better perform based on user feedback rather than try and write queries to return as many possible hits as possible. The goal is to help users find what they are looking for on the first page of results. Either way, building a useful search will add complexity of leverage new 3rd party libraries. First determine if that capability is really required.

Cypher query in neo4j to find specific node with most paths matching pattern

I have a neo4j database with statistical information on water and waste. In this database are data points linked with the facts that are relevant, including mappings to internal definitions. Here in the attached screenshot is an example of a data point and the related metadata. The node in the center is the value, and the immediate nodes linked by "HAS_DIMENSION" are the dimensions that came with the data provider. These are not fixed and change depending on the provider. Each dimension of interest is mapped to an internal definition. Currently this is my query:
MATCH (o:Observation {uq_id:'e__ABS_AGR_AQ__FSW__MIO_M3__BG__1970____9f07c7a629625e5ae00e35838fcd4f824a3593dd'})-[:HAS_DIMENSION]->()
MATCH (o)-[:HAS_DIMENSION]->()-[:HAS_SYNONYM_FROM]->()-[:WITH_TARGET_DEF]->(v:Variable)<-[:HAS_UNIT]-(u:Unit)
MATCH (o)-[vl0:HAS_DIMENSION]->()-[:HAS_SYNONYM_FROM]->()-[:WITH_TARGET_DEF]->(l:Location)
MATCH (o)-[vc0:HAS_DIMENSION]->()-[:HAS_SYNONYM_FROM]->()-[:WITH_TARGET_DEF]->(c:Country)
MATCH (o)-[vy0:HAS_DIMENSION]->()-[:HAS_SYNONYM_FROM]->()-[:WITH_TARGET_DEF]->(y:Year)
MATCH (o)-[:HAS_DIMENSION]->(unk0)
MATCH (o)-[sr0:CAME_FROM_FILE]->(ds0)-[sr1:BELONGS_TO]->(s0)
OPTIONAL MATCH (o)-[dtr0:HAS_DIMENSION]->()-[:HAS_SYNONYM_FROM]->()-[:WITH_TARGET_DEF]->(d:DataType)
RETURN *
The issue I have is exemplified by the pink circles. I want only one pink circle (which is a node with label Variable) in the query, in particular I want the variable like follows
MATCH (v:Variable)<-[:MAPS_TO]-()<-[:HAS_DIMENSION]-(o:Observation)
By this I want to force it to observe a pattern where it identifies the single variable that matches the pattern above for the most number of intermediate nodes. So the "Fresh surface water abstracted" variable would match this pattern, since it has two paths that match this. But the "Fresh groundwater abstracted" would not, since it only has one. How could I accomplish this?
It sounds like you want to return the Variable node with the most number of paths leading to it. Would something like this roughly return the results you are after? You will need to adapt according to your matching statements.
MATCH p=(o:Observation {uq_id:'<your_id>'})-[:HAS_DIMENSION]->()<-[:MAPS_TO]-(v:Variable)
RETURN v.name, COUNT(p) as p ORDER BY p DESC LIMIT 1

Cypher returns exponential count result on 'join'

I am learning Cypher on Neo4j and I am having trouble in understanding how to perform an efficient 'join' equivalent in Cypher.
I am using the standard Matrix character example and I have added some nodes to the mix called 'Gun' with a relation of ':GIVEN_TO'. You can see the console with my query result here:
http://console.neo4j.org/r/rog2hv
The query I am using is:
MATCH (Neo:Crew { name: 'Neo' })-[:KNOWS*..]->(other:Crew),(other)<-[:GIVEN_TO]-(g:Gun),(Neo)<-[:GIVEN_TO]-(g2:Gun)
RETURN count(g2);
I have given Neo 4 guns, but when I perform the above I get a count of '12'. This seems to be the case because there are 3 'others' and 3*4 = 12. So I get some exponential result.
What should my query look like to get the correct count ('4') from the example?
Edit:
The reason I am not querying through Guns directly as suggested by #ceej is because in my real use case I have to do this traversal as described above. Adding DISTINCT does not do anything for my result.
The reason you get 12 guns instead of 4 is because your query produces a cartesian product. This is because you have asked for items in the same match statement without joining them. #ceej rightly pointed out if you want to find Neo's guns you would do as he suggested in his first query.
If you wanted to get a list of the crew members and their guns then you could do something like this...
MATCH (crew:Crew)<-[:GIVEN_TO]-(g:Gun)
RETURN crew.name, collect(g.name)
Which finds all of the crew members with guns and returns their name and the guns that they were given.
If you wanted to invert it and get a list of the guns and the respective crew members they were give to you could do the following...
MATCH (crew:Crew)<-[:GIVEN_TO]-(g:Gun)
RETURN g.name, collect(crew.name)
If you wanted to find all of the crew that knew Neo multiple levels deep that were given a gun you could write the query like this...
MATCH (crew:Crew)<-[:GIVEN_TO]-(g:Gun)
WITH crew, g
MATCH (neo:Crew {name: 'Neo'})-[:KNOWS*0..]->(crew)
RETURN crew.name, collect(g.name)
That finds all the crew that were given guns and then determines which of them have a :KNOWS path to Neo.
Forgive me, but I am am unclear why you have the initial MATCH in your query. From your explanation it would appear that you are trying to get the number of :Gun nodes linked to Neo by the :GIVEN_TO relationship. In which case all you need is the latter part of your query. Which would give you something like
MATCH (neo:Crew { name: 'Neo' })<-[:GIVEN_TO]-(g:Gun)
RETURN count(g)
Furthermore, to make sure that you are only counting distinct :Gun nodes you can add DISTINCT to the RETURN statement.
MATCH (neo:Crew { name: 'Neo' })<-[:GIVEN_TO]-(g:Gun)
RETURN count( DISTINCT g )
This is possibly unnecessary in your case but can be helpful when the pattern that you are matching on can arrive at the same node by different traversals.
Have I misunderstood your requirement?

How to query for multiple vertices and counts of their relationships in Gremlin/Tinkerpop 3?

I am using Gremlin/Tinkerpop 3 to query a graph stored in TitanDB.
The graph contains user vertices with properties, for example, "description", and edges denoting relationships between users.
I want to use Gremlin to obtain 1) users by properties and 2) the number of relationships (in this case of any kind) to some other user (e.g., with id = 123). To realize this, I make use of the match operation in Gremlin 3 like so:
g.V().match('user',__.as('user').has('description',new P(CONTAINS,'developer')),
__.as('user').out().hasId(123).values('name').groupCount('a').cap('a').as('relationships'))
.select()
This query works fine, unless there are multiple user vertices returned, for example, because multiple users have the word "developer" in their description. In this case, the count in relationships is the sum of all relationships between all returned users and the user with id 123, and not, as desired, the individual count for every returned user.
Am I doing something wrong or is this maybe an error?
PS: This question is related to one I posted some time ago about a similar query in Tinkerpop 2, where I had another issue: How to select optional graph structures with Gremlin?
Here's the sample data I used:
graph = TinkerGraph.open()
g = graph.traversal()
v123=graph.addVertex(id,123,"description","developer","name","bob")
v124=graph.addVertex(id,124,"description","developer","name","bill")
v125=graph.addVertex(id,125,"description","developer","name","brandy")
v126=graph.addVertex(id,126,"description","developer","name","beatrice")
v124.addEdge('follows',v125)
v124.addEdge('follows',v123)
v124.addEdge('likes',v126)
v125.addEdge('follows',v123)
v125.addEdge('likes',v123)
v126.addEdge('follows',v123)
v126.addEdge('follows',v124)
My first thought, was: "Do we really need match step"? Secondarily, of course, I wanted to write this in TP3 fashion and not use a lambda/closure. I tried all manner of things in the first iteration and the closest I got was stuff like this from Daniel Kuppitz:
gremlin> g.V().as('user').local(out().hasId(123).values('name')
.groupCount()).as('relationships').select()
==>[relationships:[:]]
==>[relationships:[bob:1]]
==>[relationships:[bob:2]]
==>[relationships:[bob:1]]
so here we used local step to restrict the traversal within local to the current element. This works, but we lost the "user" tag in the select. Why? groupCount is a ReducingBarrierStep and paths are lost after those steps.
Well, let's go back to match. I figured I could try to make the match step traverse using local:
gremlin> g.V().match('user',__.as('user').has('description','developer'),
gremlin> __.as('user').local(out().hasId(123).values('name').groupCount()).as('relationships')).select()
==>[relationships:[:], user:v[123]]
==>[relationships:[bob:1], user:v[124]]
==>[relationships:[bob:2], user:v[125]]
==>[relationships:[bob:1], user:v[126]]
Ok - success - that's what we wanted: no lambdas and local counts. But, it still left me feeling like: "Do we really need match step"? That's when Mr. Kuppitz closed in on the final answer which makes copious use of the by step:
gremlin> g.V().has('description','developer').as("user","relationships").select().by()
.by(out().hasId(123).values("name").groupCount())
==>[user:v[123], relationships:[:]]
==>[user:v[124], relationships:[bob:1]]
==>[user:v[125], relationships:[bob:2]]
==>[user:v[126], relationships:[bob:1]]
As you can see, by can be chained (on some steps). The first by groups by vertex and the second by processes the grouped elements with a "local" groupCount.

Resources