Neo4j: apply a relationship multiple times in a MATCH query

I have two nodes in the database, arrival_airport and departure_airport, and one relationship between the two airports.
So when I want to select all flights between two destinations (BOJ->SFX), I do the following:
MATCH (da:Departure_Airport {airport:'BOJ'})-[f:FlightInfo]->(aa:Arrival_Airport {airport: 'SFX'})
RETURN f, da, aa
The question is: how can I apply FlightInfo multiple times, in order to also get all flights with legs (for example: BOJ->FRA->SFX)?
Maybe the query should look similar to this one (with an asterisk):
MATCH (da:Departure_Airport {airport:'BOJ'})-[f:FlightInfo]*->(aa:Arrival_Airport {airport: 'SFX'})
RETURN f, da, aa
UPDATE - Solution
Thanks for all the answers and comments. I had to create the relationships between the airports properly. My query for importing airports and automatically creating the relationships (flights) now looks as follows:
USING PERIODIC COMMIT 1000
LOAD CSV FROM "file:///airports.csv" AS line FIELDTERMINATOR ";"
MERGE (departure_airport: Airport {name:line[0]})
MERGE (arrival_airport: Airport {name: line[1]})
MERGE (departure_airport)-[f:Flight {departure_time:line[2], arrival_time:line[3], carrier_code:line[4], service_class:line[5], overall_conti:line[6]}]->(arrival_airport)
ON CREATE SET departure_airport.name=line[0],arrival_airport.name=line[1], f.departure_time=line[2], f.arrival_time=line[3], f.carrier_code=line[4]
As a result you are able to match flights as answered below.

Of course I don't know all your requirements, but I assume a slightly adapted graph model would work better for you. It could be easier if the airport type (arrival/departure) were expressed by the incoming or outgoing relationship to another airport or flight, rather than by the node or its label. Therefore I'd like to suggest changing your graph model in the following way:
CREATE
(boj:Airport {name: 'BOJ'}),
(sfx:Airport {name: 'SFX'}),
(fra:Airport {name: 'FRA'})
CREATE
(boj)-[:FLIGHT_INFO]->(sfx),
(boj)-[:FLIGHT_INFO]->(fra),
(fra)-[:FLIGHT_INFO]->(sfx);
Your desired query would then be:
MATCH
flightPaths = (departure:Airport {name: 'BOJ'})-[:FLIGHT_INFO*]->(arrival:Airport {name: 'SFX'})
RETURN DISTINCT
flightPaths;
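If you need to run this from application code rather than the Neo4j browser, a minimal sketch along the following lines should work; it assumes the official neo4j Python driver plus a local bolt connection and credentials, none of which are given in the question.

from neo4j import GraphDatabase

# assumes a local Neo4j instance; adjust URI and credentials to your setup
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
MATCH flightPaths = (departure:Airport {name: $origin})-[:FLIGHT_INFO*]->(arrival:Airport {name: $destination})
RETURN DISTINCT flightPaths
"""

with driver.session() as session:
    # each record holds one path from BOJ to SFX, including multi-leg routes via FRA
    for record in session.run(query, origin="BOJ", destination="SFX"):
        print(record["flightPaths"])

driver.close()

Note that unbounded variable-length patterns can get expensive on dense graphs, so capping the number of legs (e.g. [:FLIGHT_INFO*..3]) is worth considering.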

Related

How to Create a Relationship between two Different Columns in Neo4j

I am trying to create a relationship between two columns in Neo4j. My dataset is a CSV file with two columns that refer to co-authorship, and I want to construct a network from it. I have already loaded the data, returned it, and matched it.
Loading
load csv from 'file:///conet1.csv' as rec
return the data
create (:Guys {source: rec[0], target: rec[1]})
Now I need to construct the collaboration network by making a relationship between the source and target columns. What do you propose for this purpose?
I was able to make a relationship between the mentioned columns with the NetworkX graph library in Python like this:
import pandas as pd
import networkx as nx

# build an undirected collaboration graph from the source/target columns
df = pd.read_csv('Colab.csv', usecols=['source', 'target'])
g = nx.from_pandas_edgelist(df, 'source', 'target')
If I understand your use case, I do not believe you should be creating Guys nodes just to store relationship info. Instead, the graph-oriented approach would be to create an Author node for each author and a relationship (say, of type COLLABORATED_WITH) between the co-authors.
This might work for you, or at least give you a clue:
LOAD CSV FROM 'file:///conet1.csv' AS rec
MERGE (source:Author {id: rec[0]})
MERGE (target:Author {id: rec[1]})
CREATE (source)-[:COLLABORATED_WITH]->(target)
If it is possible that the same relationship could be re-created, you should replace the CREATE with a more expensive MERGE. Also, a work can have any number of co-authors, so having a relationship between every pair may be sub-optimal depending on what you are trying to do; but that is a separate issue.
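If the duplicates come from the input file itself, another option (just a sketch, reusing the conet1.csv file and pandas from the question) is to drop repeated source/target rows before the import, so the cheaper CREATE stays safe:

import pandas as pd

# the LOAD CSV above reads the file without headers, so none are assumed here
df = pd.read_csv('conet1.csv', header=None, names=['source', 'target'])

# drop exact duplicate co-author pairs; reversed pairs (B, A vs. A, B)
# would still need separate handling if they should count as the same edge
df = df.drop_duplicates()
df.to_csv('conet1_dedup.csv', index=False, header=False)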

Cypher, Neo4j: returning pairs of nodes that are connected by a relationship of type A and not by a relationship of type B

I am working on the Neo4j tutorial Movies database.
I would like to write a query that returns the people who directed a movie but did not produce it.
In order to accomplish this I have used the query:
match (m:Movie)<-[:DIRECTED]-(p:Person)-[r]->(m) where type(r) <> 'PRODUCED' return p
Nevertheless, if I make it RETURN * I still get those pairs (person, movie) where the person not only directed and produced the film but also wrote it:
In the image there is one of the inadmissible pairs returned by my query.
On the other hand, the query does seem to successfully rule out all the pairs that have only the two relationships 'PRODUCED' and 'DIRECTED'.
Is there a way of writing this query so that I can rule out all these pairs as well?
You can use a path pattern in the WHERE clause:
match (m:Movie)<-[r:DIRECTED]-(p:Person)
where not (p)-[:PRODUCED]->(m)
return m, p, r

Searching for particular combinations across two columns (nvarchar)

I am trying to search a table for entries that feature a combination of specific values across two particular columns.
I'm having no problem performing the search using one condition:
SELECT *
FROM Table
WHERE [artist_id] IN ('ID1', 'ID2', etc)
But I'd like to add a second condition, something like this:
AND WHERE [track_name] IN ('NAME1', 'NAME2', etc)
A few notes:
"artist_id" and "track_name" are both formatted as nvarchar, with "track_name" taking the form of single words or phrases.
There are multiple entries for each "artist_id" and "track_name," but all combinations of the two are unique.
So, how can I combine these conditions into a single query?
Here's a snippet of the code:
SELECT *
FROM [Music].[dbo].[echonest_tracks]
WHERE [artist_id] IN ('AR03U0G1187B9B1D35', 'AR03U0G1187B9B1D35', etc)
AND [track_title] IN ('Location', 'Cape Vibes Got 'em?', 'Feeling Good (Instrumental Remix)', 'How my heart by you', etc)
I think this is what you are looking for since you are looking for combinations:
SELECT *
FROM [Music].[dbo].[echonest_tracks]
WHERE
([artist_id] = 'AR03U0G1187B9B1D35' AND [track_title] IN ('Location', 'Cape Vibes Got ''em?', 'Feeling Good (Instrumental Remix)'))
OR
([artist_id] = 'AR03U0G1187B9B1D35' AND [track_title] IN ('How my heart by you', etc))
You are almost there.
SELECT *
FROM Table
WHERE [artist_id] IN ('ID1', 'ID2', etc)
AND [track_name] IN ('NAME1', 'NAME2', etc)
should do it.
As for the single quote within a query string, see this answer:
How do I escape a single quote in SQL Server?
You need to double up a single quote to have it within a query string.
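As an aside, if you are building these queries from application code, passing the values as parameters avoids manual quote escaping altogether. A minimal sketch, assuming Python with pyodbc and the table from the question (the driver and connection details are placeholders):

import pyodbc

# placeholder connection string; adjust driver, server and authentication to your environment
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=localhost;DATABASE=Music;Trusted_Connection=yes"
)
cursor = conn.cursor()

artist_ids = ["AR03U0G1187B9B1D35"]
track_titles = ["Location", "Cape Vibes Got 'em?", "Feeling Good (Instrumental Remix)"]

# build one ? placeholder per value, so the embedded quote in "Got 'em?" needs no escaping
sql = (
    "SELECT * FROM [dbo].[echonest_tracks] "
    "WHERE [artist_id] IN ({}) AND [track_title] IN ({})"
).format(", ".join("?" * len(artist_ids)), ", ".join("?" * len(track_titles)))

cursor.execute(sql, artist_ids + track_titles)
rows = cursor.fetchall()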

Django Query Optimisation

I am currently working on a telecom analytics project and am a newbie at query optimisation. It takes a full minute to show the results in the browser, while just 45,000 records are being accessed. Could you please suggest ways to reduce the time taken to show the results?
I wrote the following query to find the call duration for people in an age group:
sigma=0
popn=len(Demo.objects.filter(age_group=age))
card_list=[Demo.objects.filter(age_group=age)[i].card_no
           for i in range(popn)]
for card in card_list:
    dic=Fact_table.objects.filter(card_no=card).aggregate(Sum('duration'))
    sigma+=dic['duration__sum']
avgDur=sigma/popn
The above code is inside a for loop that iterates over the age groups.
The models are as follows:
class Demo(models.Model):
    card_no=models.CharField(max_length=20,primary_key=True)
    gender=models.IntegerField()
    age=models.IntegerField()
    age_group=models.IntegerField()

class Fact_table(models.Model):
    pri_key=models.BigIntegerField(primary_key=True)
    card_no=models.CharField(max_length=20)
    duration=models.IntegerField()
    time_8bit=models.CharField(max_length=8)
    time_of_day=models.IntegerField()
    isBusinessHr=models.IntegerField()
    Day_of_week=models.IntegerField()
    Day=models.IntegerField()
Thanks
Try this:
from django.db.models import Sum  # needed for the aggregation below

sigma = 0
demo_by_age = Demo.objects.filter(age_group=age)
popn = demo_by_age.count()  # One
card_list = demo_by_age.values_list('card_no', flat=True)  # Two
dic = Fact_table.objects.filter(card_no__in=card_list).aggregate(Sum('duration'))  # Three
sigma = dic['duration__sum']
avgDur = sigma / popn
A statement like card_list=[Demo.objects.filter(age_group=age)[i].card_no for i in range(popn)] will generate popn separate queries and database hits. The query in the for loop will also hit the database popn times. As a general rule, you should try to minimize the number of queries you issue, and you should only select the records you need.
With a few adjustments to your code this can be done in just one query.
There's generally no need to manually specify a primary_key, and in all but some very specific cases it's even better not to define any. Django automatically adds an indexed, auto-incremental primary key field. If you need the card_no field as a unique field, and you need to find rows based on this field, use this:
class Demo(models.Model):
    card_no = models.SlugField(max_length=20, unique=True)
    ...
SlugField automatically adds a database index to the column, essentially making selections by this field as fast as when it is a primary key. This still allows other ways to access the table, e.g. foreign keys (as I'll explain in my next point), to use the (slightly) faster integer field specified by Django, and will ease the use of the model in Django.
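For example, a lookup by that field would then use the index (the card number here is just a placeholder):

# fast indexed lookup thanks to unique=True on card_no
demo = Demo.objects.get(card_no='1234567890')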
If you need to relate an object to an object in another table, use models.ForeignKey. Django gives you a whole set of new functionality that not only makes it easier to use the models, it also makes a lot of queries faster by using JOIN clauses in the SQL query. So for your example:
class Fact_table(models.Model):
    card = models.ForeignKey(Demo, related_name='facts')
    ...
The related_name argument allows you to access all Fact_table objects related to a Demo instance by using instance.facts in Django. (See https://docs.djangoproject.com/en/dev/ref/models/fields/#module-django.db.models.fields.related)
With these two changes, your query (including the loop over the different age_groups) can be changed into a blazing-fast one-hit query giving you the average duration of calls made by each age_group:
from django.db.models import Avg

age_groups = Demo.objects.values('age_group').annotate(duration_avg=Avg('facts__duration'))
for group in age_groups:
    print "Age group: %s - Average duration: %s" % (group['age_group'], group['duration_avg'])
.values('age_group') selects just the age_group field from the Demo's database table. .annotate(duration_avg=Avg('facts__duration')) takes every unique result from values (thus each unique age_group), and for each unique result will fetch all Fact_table objects related to any Demo object within that age_group, and calculate the average of all the duration fields - all in a single query.

Searching for and matching elements across arrays

I have two tables.
In one table there are two columns, one has the ID and the other the abstracts of a document about 300-500 words long. There are about 500 rows.
The other table has only one column and >18000 rows. Each cell of that column contains a distinct acronym such as NGF, EPO, TPO etc.
I am interested in a script that will scan each abstract in table 1 and identify any of the acronyms from table 2 that are present in it.
Finally, the program will create a separate table whose first column contains the content of the first column of table 1 (i.e. the ID), along with the acronyms found in the document associated with that ID.
Can someone with expertise in Python, Perl or any other scripting language help?
It seems to me that you are trying to join the two tables where the acronym appears in the abstract, i.e. (pseudo-SQL):
SELECT acronym.id, document.id
FROM acronym, document
WHERE acronym.value IN explode(documents.abstract)
Given the desired semantics, you can use the most straightforward approach:
acronyms = ['ABC', ...]
documents = [(0, "Document zeros discusses the value of ABC in the context of..."), ...]

joins = []
for id, abstract in documents:
    for word in abstract.split():
        try:
            index = acronyms.index(word)
            joins.append((id, index))
        except ValueError:
            pass  # word not an acronym
This is a straightforward implementation; however, it has n cubed running time as acronyms.index performs a linear search (of our largest array, no less). We can improve the algorithm by first building a hash index of the acronyms:
acronyms = ['ABC', ...]
documents = [(0, "Document zeros discusses the value of ABC in the context of..."), ...]

index = dict((acronym, idx) for idx, acronym in enumerate(acronyms))

joins = []
for id, abstract in documents:
    for word in abstract.split():
        try:
            joins.append((id, index[word]))
        except KeyError:
            pass  # word not an acronym
Of course, you might want to consider using an actual database. That way you won't have to implement your joins by hand.
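For illustration, here is a minimal sketch of that join done in a database, assuming SQLite via Python's standard sqlite3 module and hypothetical tables documents(id, abstract) and acronyms(value); padding with spaces approximates whole-word matching, which LIKE alone would not give you.

import sqlite3

# hypothetical database file and table/column names;
# note that SQLite's LIKE is case-insensitive for ASCII by default
conn = sqlite3.connect('abstracts.db')
rows = conn.execute(
    """
    SELECT documents.id, acronyms.value
    FROM documents
    JOIN acronyms
      ON ' ' || documents.abstract || ' ' LIKE '% ' || acronyms.value || ' %'
    """
).fetchall()

# each row pairs a document ID with an acronym found in its abstract
for doc_id, acronym in rows:
    print(doc_id, acronym)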
Thanks a lot for the quick response.
I assume the pseudo-SQL solution is for MySQL etc.; however, it did not work in Microsoft Access.
The second and third ones are for Python, I assume. Can I feed the acronyms and documents as input files?
babru
It didn't work in Access because tables are accessed differently (e.g. acronym.[id])
