Neo4j: Do queries on a specific database?

I'm new to Neo4j and want to implement a service that makes use of it.
I've read the docs and searched around, but I still haven't found an answer to this simple question:
How do I specify which database to query in a Neo4j query?
E.g. I connected to bolt://localhost:7687 and have three databases in there: system, neo4j, and mydb. The neo4j database is the default.
When I open the Neo4j Browser and run a query such as MATCH (n) RETURN n, it automatically assumes that I want to query the default DB, which is called neo4j. However, I want to query another one, mydb.
The output of the aforementioned query is:
{
  "query": {
    "text": "match (n) return n",
    "parameters": {}
  },
  "queryType": "r",
  "counters": {
    "_stats": {
      "nodesCreated": 0,
      "nodesDeleted": 0,
      "relationshipsCreated": 0,
      "relationshipsDeleted": 0,
      "propertiesSet": 0,
      "labelsAdded": 0,
      "labelsRemoved": 0,
      "indexesAdded": 0,
      "indexesRemoved": 0,
      "constraintsAdded": 0,
      "constraintsRemoved": 0
    },
    "_systemUpdates": 0
  },
  "updateStatistics": {
    "_stats": {
      "nodesCreated": 0,
      "nodesDeleted": 0,
      "relationshipsCreated": 0,
      "relationshipsDeleted": 0,
      "propertiesSet": 0,
      "labelsAdded": 0,
      "labelsRemoved": 0,
      "indexesAdded": 0,
      "indexesRemoved": 0,
      "constraintsAdded": 0,
      "constraintsRemoved": 0
    },
    "_systemUpdates": 0
  },
  "plan": false,
  "profile": false,
  "notifications": [],
  "server": {
    "address": "localhost:7687",
    "version": "Neo4j/4.4.5",
    "agent": "Neo4j/4.4.5",
    "protocolVersion": 4.4
  },
  "resultConsumedAfter": {
    "low": 2,
    "high": 0
  },
  "resultAvailableAfter": {
    "low": 8,
    "high": 0
  },
  "database": {
    "name": "neo4j"
  }
}
The "database" value at the end of the JSON is the proof that the query was executed on the neo4j database.
What do I have to add to my queries to instead query another database in the same DBMS?

You can change/specify the database using the following options.
From the Neo4j Browser, you can select the database in the sidebar.
In the Neo4j Browser or cypher-shell, the :use command switches the session to a different database:
:use mydb
If you connect to Neo4j through an Application driver, you can specify the database while creating the session object.
For example, if you are using the Python driver:
from neo4j import GraphDatabase
driver = GraphDatabase.driver(uri, auth=(user, password))
session = driver.session(database="mydb")
Specify the default database in a system-wide manner by setting dbms.default_database in the neo4j.conf file (e.g. dbms.default_database=mydb).
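Putting the driver option together, a minimal sketch (URI, credentials, and the query are placeholders for your own setup):

from neo4j import GraphDatabase

# Open a session bound to the "mydb" database and run a query against it
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
with driver.session(database="mydb") as session:
    result = session.run("MATCH (n) RETURN count(n) AS nodeCount")
    print(result.single()["nodeCount"])
driver.close()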

Related

Postgres INET to Snowflake Compatible

I need help migrating a Postgres function to a Snowflake function.
Currently, I have the following function that takes an ip_address and returns the starting range of the ip_address:
...
begin
if (p_ip_address is not null) then
return p_ip_address::inet - '0.0.0.0';
else
return null;
end if;
end
...
I know we have PARSE_IP in Snowflake, which gives a JSON object, but I need just one piece of this JSON (ipv4_range_start).
For Example:
Input = 192.168.242.188
Output = 3232297472
If you only want one piece of the information, select it.
select parse_ip('127.0.0.0/24','INET') as a
,a:ipv4_range_end;
gives:
A A:IPV4_RANGE_END
{ "family": 4, "host": "127.0.0.0", "ip_fields": [ 2130706432, 0, 0, 0 ], "ip_type": "inet", "ipv4": 2130706432, "ipv4_range_end": 2130706687, "ipv4_range_start": 2130706432, "netmask_prefix_length": 24, "snowflake$type": "ip_address" } 2130706687

How to flatten an array in a nested json in aws glue using pyspark?

I am trying to flatten a JSON file to be able to load it into PostgreSQL all in AWS Glue. I am using PySpark. Using a crawler I crawl the S3 JSON and produce a table. I then use an ETL Glue script to:
read the crawled table
use the 'Relationalize' function to flatten the file
convert the dynamic frame to a dataframe
try to 'explode' the request.data field
Script so far:
datasource0 = glueContext.create_dynamic_frame.from_catalog(database = glue_source_database, table_name = glue_source_table, transformation_ctx = "datasource0")
df0 = Relationalize.apply(frame = datasource0, staging_path = glue_temp_storage, name = dfc_root_table_name, transformation_ctx = "dfc")
df1 = df0.select(dfc_root_table_name)
df2 = df1.toDF()
df2 = df1.select(explode(col('`request.data`')).alias("request_data"))
<then I write df1 to a PostgreSQL database, which works fine>
Issues I face:
The 'Relationalize' function works well except for the request.data field, which becomes a bigint and therefore 'explode' doesn't work.
Explode cannot be done without using 'Relationalize' on the JSON first due to the structure of the data. Specifically the error is: "org.apache.spark.sql.AnalysisException: cannot resolve 'explode(request.data)' due to data type mismatch: input to function explode should be array or map type, not bigint"
If I try to make the dynamic frame a dataframe first then I get this issue: "py4j.protocol.Py4JJavaError: An error occurred while calling o72.jdbc.
: java.lang.IllegalArgumentException: Can't get JDBC type for struct..."
I tried to also upload a classifier so that the data would flatten in the crawl itself but AWS confirmed this wouldn't work.
The JSON format of the original file, which I am trying to normalise, is as follows:
- field1
- field2
- {}
- field3
- {}
- field4
- field5
- []
- {}
- field6
- {}
- field7
- field8
- {}
- field9
- {}
- field10
from pyspark.sql import functions as F

# Flatten nested df: explode array columns, then expand struct columns, recursively
def flatten_df(nested_df):
    # Explode every array column (explode_outer keeps rows whose array is null)
    array_cols = [c[0] for c in nested_df.dtypes if c[1][:5] == 'array']
    for ac in array_cols:
        nested_df = nested_df.withColumn(ac, F.explode_outer(nested_df[ac]))

    # Stop when no struct columns remain
    nested_cols = [c[0] for c in nested_df.dtypes if c[1][:6] == 'struct']
    if len(nested_cols) == 0:
        return nested_df

    # Expand each struct into top-level columns named parent_child
    flat_cols = [c[0] for c in nested_df.dtypes if c[1][:6] != 'struct']
    flat_df = nested_df.select(flat_cols +
                               [F.col(nc + '.' + c).alias(nc + '_' + c)
                                for nc in nested_cols
                                for c in nested_df.select(nc + '.*').columns])
    return flatten_df(flat_df)

df = flatten_df(df)
It will replace all dots with underscores. Note that it uses explode_outer and not explode, so that rows are kept even when the array itself is null. This function is available in Spark v2.4+ only.
Also remember that exploding an array adds more rows (duplicating the other columns), while flattening a struct adds more columns. In short, your original df will grow both vertically and horizontally, which may slow down processing the data later.
Therefore my recommendation would be to identify the feature-related data, store only that data in PostgreSQL, and keep the original JSON files in S3.
Once you have relationalized the JSON column, you don't need to explode it. Relationalize transforms the nested JSON into key-value pairs at the outermost level of the JSON document. The transformed data maintains a list of the original keys from the nested JSON, separated by periods.
Example:
Nested JSON:
{
  "player": {
    "username": "user1",
    "characteristics": {
      "race": "Human",
      "class": "Warlock",
      "subclass": "Dawnblade",
      "power": 300,
      "playercountry": "USA"
    },
    "arsenal": {
      "kinetic": {
        "name": "Sweet Business",
        "type": "Auto Rifle",
        "power": 300,
        "element": "Kinetic"
      },
      "energy": {
        "name": "MIDA Mini-Tool",
        "type": "Submachine Gun",
        "power": 300,
        "element": "Solar"
      },
      "power": {
        "name": "Play of the Game",
        "type": "Grenade Launcher",
        "power": 300,
        "element": "Arc"
      }
    },
    "armor": {
      "head": "Eye of Another World",
      "arms": "Philomath Gloves",
      "chest": "Philomath Robes",
      "leg": "Philomath Boots",
      "classitem": "Philomath Bond"
    },
    "location": {
      "map": "Titan",
      "waypoint": "The Rig"
    }
  }
}
Flattened-out JSON after Relationalize:
{
  "player.username": "user1",
  "player.characteristics.race": "Human",
  "player.characteristics.class": "Warlock",
  "player.characteristics.subclass": "Dawnblade",
  "player.characteristics.power": 300,
  "player.characteristics.playercountry": "USA",
  "player.arsenal.kinetic.name": "Sweet Business",
  "player.arsenal.kinetic.type": "Auto Rifle",
  "player.arsenal.kinetic.power": 300,
  "player.arsenal.kinetic.element": "Kinetic",
  "player.arsenal.energy.name": "MIDA Mini-Tool",
  "player.arsenal.energy.type": "Submachine Gun",
  "player.arsenal.energy.power": 300,
  "player.arsenal.energy.element": "Solar",
  "player.arsenal.power.name": "Play of the Game",
  "player.arsenal.power.type": "Grenade Launcher",
  "player.arsenal.power.power": 300,
  "player.arsenal.power.element": "Arc",
  "player.armor.head": "Eye of Another World",
  "player.armor.arms": "Philomath Gloves",
  "player.armor.chest": "Philomath Robes",
  "player.armor.leg": "Philomath Boots",
  "player.armor.classitem": "Philomath Bond",
  "player.location.map": "Titan",
  "player.location.waypoint": "The Rig"
}
Thus in your case, request.data is already a new column flattened out from the request column, and its type is interpreted as bigint by Spark.
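If you do need the nested array itself rather than the bigint reference, a rough sketch along these lines may help; note that the child frame name used here ("root_request.data") is an assumption about how Relationalize names the frames it returns, so check the collection's keys first:

# Hedged sketch: Relationalize returns a DynamicFrameCollection; nested arrays are
# split into child frames that reference the parent rows by id.
dfc = Relationalize.apply(frame=datasource0, staging_path=glue_temp_storage,
                          name="root", transformation_ctx="dfc")
print(list(dfc.keys()))                         # inspect the generated frame names
request_data = dfc.select("root_request.data")  # assumed name of the child frame
request_data_df = request_data.toDF()           # use this instead of exploding the column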
Reference: Simplify querying nested JSON with the AWS Glue Relationalize transform

Why can't I retrieve the master appointment of a series via `AppointmentCalendar.FindAppointmentsAsync`?

I'm retrieving multiple appointments via AppointmentCalendar.FindAppointmentsAsync. I'm evaluating the Recurrence.RecurrenceType and noticed an unexpected value of 1 for master appointments of a series. I expect the Recurrence.RecurrenceType to be 0 (Master) but instead it is 1 (Instance).
(Note: I added AppointmentProperties.Recurrence to FindAppointmentsOptions.FetchProperties, which is passed to FindAppointmentsAsync, so the Recurrence data should be fetched properly.)
To double check I retrieved the respective master appointment via GetAppointmentAsync (instead of FindAppointmentsAsync) using its LocalId - and here the RecurrenceType is correctly set to 0.
Here is demo output for a test appointment series:
Data gotten by FindAppointmentsAsync (Instance??):
"Recurrence": {
"Unit": 0,
"Occurrences": 16,
"Month": 1,
"Interval": 1,
"DaysOfWeek": 0,
"Day": 1,
"WeekOfMonth": 0,
"Until": "2016-09-29T02:00:00+02:00",
"TimeZone": "Europe/Budapest",
"RecurrenceType": 1,
"CalendarIdentifier": "GregorianCalendar"
},
"StartTime": "2016-09-14T19:00:00+02:00",
"OriginalStartTime": "2016-09-14T19:00:00+02:00",
Data gotten by GetAppointmentAsync for the same appointment (Master):
"Recurrence": {
"Unit": 0,
"Occurrences": 16,
"Month": 1,
"Interval": 1,
"DaysOfWeek": 0,
"Day": 1,
"WeekOfMonth": 0,
"Until": "2016-09-29T02:00:00+02:00",
"TimeZone": "Europe/Budapest",
"RecurrenceType": 0,
"CalendarIdentifier": "GregorianCalendar"
},
"StartTime": "2016-09-14T19:00:00+02:00",
"OriginalStartTime": null,
Notice the difference in RecurrenceType. Also note that OriginalStartTime is set to null for the master gotten by GetAppointmentAsync but has a value for the appointment gotten by FindAppointmentsAsync.
You can also see that the StartTime for the master appointment is the start time set for the alleged Instance (which in reality is the master).
Shouldn't FindAppointmentsAsync return a master as the first element of a series, instead of an instance?
(SDK: 10.0.14393.0, Anniversary)
Code to explicitly find such a master/instance situation for a given calendar:
var appointmentsCurrent = await calendar.FindAppointmentsAsync(DateTimeOffset.Now, TimeSpan.FromDays(365), findAppointmentOptions);
foreach (var a in appointmentsCurrent)
{
    var a2 = await calendar.GetAppointmentAsync(a.LocalId);
    if (a2.Recurrence?.RecurrenceType == RecurrenceType.Master &&
        a2.StartTime == a.StartTime &&
        a.Recurrence?.RecurrenceType == RecurrenceType.Instance &&
        a.OriginalStartTime == a2.StartTime)
    {
        Debug.WriteLine("Gotcha!");
    }
}
I tested the above code on my side. If you get the count of the appointments returned by FindAppointmentsAsync, e.g. var count = appointmentsCurrent.Count;, you will find that it returns the number of appointment instances, not the number of master appointments. So FindAppointmentsAsync returns all instances of the appointments, not the master appointments. This is why the RecurrenceType is Instance.
It seems we can get the master appointment via GetAppointmentAsync, as you mentioned above, so I suppose this does not block you.
If you think this is not a good design for this API, or you require an API for finding all the master appointments in one calendar, you can submit your ideas through the Windows 10 Feedback Hub or the UserVoice site.
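If you need the masters themselves, a workaround sketch based on the behaviour described above (purely illustrative, de-duplicating by LocalId) could look like this:

// Collect the distinct master appointments behind the instances
// returned by FindAppointmentsAsync.
var masters = new Dictionary<string, Appointment>();
foreach (var instance in appointmentsCurrent)
{
    var candidate = await calendar.GetAppointmentAsync(instance.LocalId);
    if (candidate?.Recurrence?.RecurrenceType == RecurrenceType.Master)
    {
        masters[candidate.LocalId] = candidate;
    }
}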

lucene solr - how to know numCount of each word in query

I have a query string with 5 words, for example "cat dog fish bird animals".
I need to know how many matches each word has.
At this point I create 5 queries:
/q=name:cat&rows=0&facet=true
/q=name:dog&rows=0&facet=true
/q=name:fish&rows=0&facet=true
/q=name:bird&rows=0&facet=true
/q=name:animals&rows=0&facet=true
and get the match count of each word from each query.
But this method takes too much time.
So is there a way to get the count for each word with one query?
Any help appreciated!
In this case, functionQueries are your friends. In particular:
termfreq(field,term) returns the number of times the term appears in the field for that document. Example syntax: termfreq(text,'memory')
totaltermfreq(field,term) returns the number of times the term appears in the field in the entire index. ttf is an alias of totaltermfreq. Example syntax: ttf(text,'memory')
The following query for instance:
q=*:*&fl=cntOnSummary:termfreq(summary,'hello') cntOnTitle:termfreq(title,'entry') cntOnSource:termfreq(source,'activities')&wt=json&indent=true
returns the following results:
"docs": [
{
"id": [
"id-1"
],
"source": [
"activities",
"activities"
],
"title": "Ajones3 Activity Entry 1",
"summary": "hello hello",
"cntOnSummary": 2,
"cntOnTitle": 1,
"cntOnSource": 1,
"score": 1
},
{
"id": [
"id-2"
],
"source": [
"activities",
"activities"
],
"title": "Common activity",
"cntOnSummary": 0,
"cntOnTitle": 0,
"cntOnSource": 1,
"score": 1
}
]
Please notice that while this works well on single-valued fields, it seems that for multi-valued fields the functions consider just the first entry; for instance, in the example above, termfreq(source,'activities') returns 1 instead of 2.
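Applied to the question's five words, a single request along these lines should return all the counts at once (a sketch: the field name name is taken from the question, and note that ttf counts term occurrences across the whole index rather than the number of matching documents):
q=*:*&rows=1&fl=cat:ttf(name,'cat'),dog:ttf(name,'dog'),fish:ttf(name,'fish'),bird:ttf(name,'bird'),animals:ttf(name,'animals')&wt=json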

count based on a specific field in SOLR

How do I return the count of a field with each object in Solr?
When I do fq=verify_ix:1 I get the response below; I want to get the count of documents where verify_ix = 1 in the response too. How can I do that?
"response": {
"numFound": 9484,
"start": 0,
"maxScore": 1,
"docs": [
{
"id": "10000000000965509",
"description_s": "No Description",
"recommendation_ix": 0,
"sId_lx": 30005938,
"sType_sx": "P",
"condition_ix": 1000,
"verify_ix": 1
},
.
.
.
{
"id": "10000000000965734",
"description_s": "No Description",
"recommendation_ix": 1,
"sId_lx": 30005947,
"sType_sx": "P",
"condition_ix": 2000,
"verify_ix": 1
}
]}
If you want counts of the different values for a given field, you can send a request to Solr with facet=true and facet.field=verify_ix. For counts over all records, set q=*:*. If you don't want to see any rows returned, you can set rows=0.
See here for more details on faceting:
https://cwiki.apache.org/confluence/display/solr/Faceting
(I tested this with Solr 5, but faceting should work with Solr 4 as well.)
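For example, a request along these lines (a sketch built from the field in the question) returns the per-value counts without returning any documents:
q=*:*&rows=0&facet=true&facet.field=verify_ix
The facet_counts/facet_fields section of the response then lists each verify_ix value with its document count (9484 for verify_ix:1, going by the numFound above).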
