clone some relationships according to a condition - sql-server

I exported two tables, Keys and Acc, as CSV files from SQL Server and imported them successfully into Neo4j using the commands below.
CREATE INDEX ON :Keys(IdKey)
USING PERIODIC COMMIT 500
LOAD CSV FROM 'file:///C:/Keys.txt' AS line
MERGE (k:Keys { IdKey: line[0] })
SET k.KeyNam=line[1], k.KeyLib=line[2], k.KeyTyp=line[3], k.KeySubTyp=line[4]
USING PERIODIC COMMIT 500
LOAD CSV FROM 'file:///C:/Acc.txt' AS line
MERGE (callerObject:Keys { IdKey : line[0] })
MERGE (calledObject:Keys { IdKey : line[1] })
MERGE (callerObject)-[rc:CALLS]->(calledObject)
SET rc.AccKnd=line[2], rc.Prop=line[3]
Keys holds the source code objects, and Acc holds the relations among them. I imported these two tables three times, once for each of three application projects. To keep the IdKey property unique across the three applications, I prepended a five-character prefix identifying the application to IdKey while exporting from SQL Server, because, as far as I understood from the manuals, an index cannot be created on multiple fields. Now my aim is to construct the relations among the applications. For example:
Node1 is a source code object of Application1
Node2 is another source code object of Application1
Node3 is a source code object of Application2
A CALLS relation from Node1 to Node2 already exists, created from the Acc records that were imported.
The name of Node2 is equal to the name of Node3, so Node2 and Node3 are in fact the same source code, and we should create a relation from Node1 to Node3. To do this, I wrote the command below, but I want to be sure that it is correct, because I do not know how long it will take to execute.
MATCH (caller:Keys)-[rel:CALLS]->(called:Keys),(calledNew:Keys)
WHERE calledNew.KeyNam = called.KeyNam
and calledNew.IdKey <> called.IdKey
CREATE (caller)-[:CALLS]->(calledNew)

The following query should be efficient, assuming you also create an index on :Keys(KeyNam).
MATCH (caller:Keys)-[rel:CALLS]->(called:Keys)
WITH caller, COLLECT(called.KeyNam) AS names
MATCH (calledNew:Keys)
WHERE calledNew.KeyNam IN names AND NOT (caller)-[:CALLS]->(calledNew)
CREATE (caller)-[:CALLS]->(calledNew)
Cypher will not use an index when doing comparisons directly between property values. So this query puts all the called names for each caller into a names collection, and then does a comparison between calledNew.KeyNam and the items in that collection. This causes the index to be used, and will speed up the identification of potential duplicate called nodes.
This query also does a NOT (caller)-[:CALLS]->(calledNew) check, to avoid creating duplicate relationships between the same nodes.
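To sanity-check the logic on a small sample before running it against the full graph, the same rule can be sketched in plain Python. This is only an in-memory illustration, not Neo4j code: the function name clone_calls and the sample IdKey values are made up, nodes are reduced to an IdKey -> KeyNam mapping, and relationships to (caller, called) pairs.

```python
from collections import defaultdict


def clone_calls(nodes, calls):
    """Return the (caller, calledNew) pairs the query above would create.

    nodes: dict mapping IdKey -> KeyNam (hypothetical sample structure)
    calls: set of (caller IdKey, called IdKey) pairs for existing CALLS
    """
    # Plays the role of the index on :Keys(KeyNam): name -> all node ids
    ids_by_name = defaultdict(set)
    for id_key, name in nodes.items():
        ids_by_name[name].add(id_key)

    new_calls = set()
    for caller, called in calls:
        # Every node sharing the called node's name is a candidate target
        for candidate in ids_by_name[nodes[called]]:
            # Mirrors NOT (caller)-[:CALLS]->(calledNew)
            if (caller, candidate) not in calls:
                new_calls.add((caller, candidate))
    return new_calls


# Made-up sample: Node2 and Node3 share the name "Util"
nodes = {"APP1_1": "Main", "APP1_2": "Util", "APP2_3": "Util"}
calls = {("APP1_1", "APP1_2")}
new_calls = clone_calls(nodes, calls)
print(new_calls)
```

With this sample, the only new pair is the one from the Application1 caller to the same-named node in Application2.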

Related

How to find a MoveTo destination filled by database?

I could use some help with an AnyLogic model.
Model (short): A manufacturing scenario in which orders move along individual routes. The workplaces (WP) are created dynamically at simulation start. Their names, quantity and other parameters are stored in a database (Excel import). The orders are also created from an import. The agent population "order" has a collection routing, which contains the workplaces it has to stop at, in the specific order.
Target: I want a moveTo block in main which finds the next destination of the agent order.
Problem and solution paths:
I set the destination type to Agent and, in the Agent field, entered a call to the function agent.getDestination(). This function is defined in order and returns the next entry of the collection: WP destinationName = routing.get(i). With this I get a data type error (at run time, not when compiling). I guess it's because the database does not store the entries as type WP but only as String.
Is there a possibility to create a collection with agents from an Excel?
After this I tried to have getDestination return a String, then find the WP matching the returned name via findFirst and return it as a WP: WP targetWP = findFirst(wps, w->w.name == destinationName);
Of course wps (the population of workplaces) couldn't be found.
How can I search the population?
Maybe with an agent link?
I think it is not that difficult, but I can't find an answer or a solution. As you can tell, I'm a beginner... I hope the description is good and someone can help me or give me a hint :)
Thanks
Is there a possibility to create a collection with agents from an Excel?
Not directly using the collection's properties and, as you've seen, you can't have database (DB) column types which are agent types [1].
But this is relatively simple to do directly via Java code (and you can use the Insert Database Query wizard to construct the skeleton code for you).
After this I tried to use the same getDestination as String and so find via findFirst the WP matching the returned name and return it as WP
Yes, this is one approach. If your order details are in Excel/the database, they are presumably referring to workplaces via some String ID (which will be a parameter of the workplace agents you've created from a separate Excel worksheet/database table). You need to use the Java equals method to compare strings though, not == (which is for comparing numbers or whether two objects are the same object).
I want a moveTo block in main which finds the next destination of the agent order
So the general overall solution is
Create a population of Workplace agents (let's say called workplaces in Main) from the DB, each with a String parameter id or similar mapped from a DB column.
Create a population of Order agents (let's say called orders in Main) from the DB and then, in their on-startup action, set up their collection of workplace IDs (type ArrayList, element class String; let's say called workplaceIDsList) using data from another DB table.
Order probably also needs a working variable storing the next index in the list that it needs to go to (so let's say an int variable nextWorkplaceIndex which starts at 0).
Write a function in Main called getWorkplaceByID that has a single String argument id and returns a Workplace. This gets the workplace from the population that matches the ID; a one-line way similar to yours is findFirst(workplaces, w -> w.id.equals(id)).
The MoveTo block (which I presume is in Main) needs to move the Order to an agent defined by getWorkplaceByID(agent.workplaceIDsList.get(agent.nextWorkplaceIndex++)). (The ++ bit increments the order's index after evaluating the expression, so it is ready for the next workplace to go to.)
For populating the collection, you'd have two tables, something like the below (assuming using strings as IDs for workplaces and orders):
orders table: columns for parameters of your orders (including some String id column) other than the workplace-list. (Create one Order agent per row.)
order_workplaces table: columns order_id, sequence_num and workplace_id (so with multiple rows specifying the sequence of workplace IDs for an order ID).
In the On startup action of Order, set up the skeleton query code via the Insert Database Query wizard as below (where we want to loop through all rows for this order's ID and do something --- we'll change the skeleton code to add entries to the collection instead of just printing stuff via traceln like the skeleton code does).
Then we edit the skeleton code to look like the below. (Note we add an orderBy clause to the initial query so we ensure we get the rows in ascending sequence number order.)
List<Tuple> rows = selectFrom(order_workplaces)
    .where(order_workplaces.order_id.eq(id))
    .orderBy(order_workplaces.sequence_num.asc())
    .list();
for (Tuple row : rows) {
    workplaceIDsList.add(row.get(order_workplaces.workplace_id));
}
[1] The AnyLogic database is a normal relational database (HSQLDB, in fact), and databases only understand their own specific data types like VARCHAR, with AnyLogic and the libraries it uses translating these to Java types like String. In the user interface, AnyLogic makes it look like you set the column types as int, String, etc., but these are really the Java types that the columns' contents will ultimately be translated into.
AnyLogic does support columns which have option list types (and the special Code type for columns containing executable Java code), but these are special cases that use special logic under the covers to translate the column data (which is ultimately still a string of characters) into the appropriate option list instance or (for Code columns) into Java that is compiled on the fly and then executed.
Welcome to Stack Overflow :) To create a population via Excel import, you have to create a function and call code like this. You also need an empty population.
int n = excelFile.getLastRowNum(YOUR_SHEET_NAME);
for (int i = FIRST_ROW; i <= n; i++) {
    String name = excelFile.getCellStringValue(YOUR_SHEET_NAME, i, 1);
    double SEC_PARAMETER_TO_READ = excelFile.getCellNumericValue(YOUR_SHEET_NAME, i, 2);
    WP workplace = add_wps(name, SEC_PARAMETER_TO_READ);
}
Now, if you want to get a workplace by name, you have to create a function similar to your attempt.
Function body:
WP workplaceToFind = wps.findFirst(w -> w.name.equals(destinationName));
if (workplaceToFind != null) {
    // do whatever you want
}

How to Create Relationship between two Different Column in Neo4j

I am trying to create a relationship between two columns in Neo4j. My dataset is a CSV file with two columns that represent co-authorship, and I want to construct a network from it. I have already loaded the data, returned it and matched it.
Loading
load csv from 'file:///conet1.csv' as rec
return the data
create (:Guys {source: rec[0], target: rec[1]})
Now I need to construct the collaboration network by making a relationship between the source and target columns. What do you propose for this purpose?
I was able to make a relationship between these columns with the NetworkX graph library in Python like this:
import pandas as pd
import networkx as nx
g = nx.Graph()
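# The file is a two-column CSV without a header row, so name the columns on load
df = pd.read_csv('Colab.csv', names=['source', 'target'])
# No edge attribute is passed, because the file has no weight column
g = nx.from_pandas_edgelist(df, 'source', 'target')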
If I understand your use case, I do not believe you should be creating Guys nodes just to store relationship info. Instead, the graph-oriented approach would be to create an Author node for each author and a relationship (say, of type COLLABORATED_WITH) between the co-authors.
This might work for you, or at least give you a clue:
LOAD CSV FROM 'file:///conet1.csv' AS rec
MERGE (source:Author {id: rec[0]})
MERGE (target:Author {id: rec[1]})
CREATE (source)-[:COLLABORATED_WITH]->(target)
If it is possible that the same relationship could be re-created, you should replace the CREATE with a more expensive MERGE. Also, a work can have any number of co-authors, so having a relationship between every pair may be sub-optimal depending on what you are trying to do; but that is a separate issue.
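The difference between CREATE and MERGE for repeated rows can be illustrated outside Neo4j with a small Python sketch. It is only an analogy under stated assumptions: the function name build_edges and the sample rows are made up, and a Python set stands in for MERGE, since adding the same pair twice leaves a single entry.

```python
import csv
import io


def build_edges(csv_text):
    """Collect distinct authors and distinct (source, target) pairs.

    Using sets mimics MERGE: a repeated row does not add a second
    author node or a second relationship.
    """
    authors = set()
    edges = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        source, target = row[0].strip(), row[1].strip()
        authors.update((source, target))
        edges.add((source, target))
    return authors, edges


# Made-up sample data; the first and last rows are duplicates
sample = "alice,bob\nbob,carol\nalice,bob\n"
authors, edges = build_edges(sample)
print(sorted(authors))
print(sorted(edges))
```

With CREATE semantics the duplicated row would yield two relationships between the same pair; here it yields one.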

Arangodb update properties depend on edge type

I am trying to use AQL to update a whole node collection, named Nodes, depending on the type of edges the nodes have.
Requirement:
Basically, if two entities in Nodes have a relation of type = "Same", they should be updated with the same unique groupid property (and likewise for groups of more than two)
This would only run one time in the beginning (to populate groupid)
My concept approach:
Use AQL
For each entity inside Nodes, query all nodes reachable through edges with type=SAME
Generate a groupid and update all of them
Write those ids to a lookup object
For the next entity, do a lookup, and skip the entity if its id is already there.
What I tried
FOR v,e,p
In 1..10
ANY v
EntityRelationTest
OPTIONS {uniqueVertices:"global",bfs:true}
FILTER p.edges[*].relationType[0]== "EQUALS"
UPDATE v WITH { typeName2:"test1"} IN EntityTest
return NEW
But I am quite new to arangodb AQL, is something like above possible?
In the end, I used a custom traversal object running directly inside Foxx in order to get the best of both worlds: performance and correctness. It seems that the above cannot be done with AQL alone.
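The grouping described in the question amounts to finding connected components over the edges of type "Same". As a rough illustration of that idea only (this is plain Python, not the Foxx traversal mentioned above, and not AQL), a union-find sketch is shown below. The function name assign_group_ids, the edge tuple layout and the sample data are assumptions, and the group id is simply the id of one representative node.

```python
def assign_group_ids(node_ids, edges):
    """Map each node id to a group id shared by all nodes linked via "Same".

    edges: iterable of (from_id, to_id, relation_type) tuples (assumed layout)
    """
    parent = {n: n for n in node_ids}

    def find(n):
        # Follow parent links to the representative, compressing the path
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for src, dst, relation_type in edges:
        if relation_type == "Same":
            root_a, root_b = find(src), find(dst)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two groups

    # A node without any "Same" edge ends up alone in its own group
    return {n: find(n) for n in node_ids}


# Made-up sample: a, b and c are linked by "Same"; d is linked differently
nodes = ["a", "b", "c", "d"]
edges = [("a", "b", "Same"), ("b", "c", "Same"), ("c", "d", "Other")]
groups = assign_group_ids(nodes, edges)
print(groups)
```

Each node is visited once per edge, so no separate lookup object of already-processed ids is needed.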

Merging partial duplicate cases without losing data

I have a question in regards to preparing my dataset for research.
I have a dataset in SPSS 20 in long format, as I am doing research at the individual level over multiple years. However, some individuals were added twice to my dataset because some of the variables matched to those individuals differed (5000 individuals with 25 variables per individual). I would like to merge those duplicates so that I can run my analysis over time. For the variables that differ between the duplicates, I would like SPSS to create additional variables when the duplicates are merged.
Is this at all possible and if yes HOW?
I suggest the following steps:
create auxiliary variable "PrimaryLast" with procedure Data->Identify Duplicate Cases by... , set "Define matching cases by" to your case ID
create 2 new auxiliary datasets with Data->Select Cases with condition "PrimaryLast = 0" and "PrimaryLast = 1" and selection "Copy selected cases to new dataset"
merge both auxiliary datasets with procedure Data -> Merge Files-> Add Variables, rename duplicated variable names in left box and move them in right box and select your case ID as key
don't forget to check that you got a "full outer join"; if you lost the non-duplicated cases and have only duplicated cases in your dataset, merge the datasets from step 2 in the opposite order in step 3.
Try this:
sort cases by caseID otherVar.
compute ind=1.
if $casenum>1 and caseID=lag(caseID) ind=lag(ind)+1.
casestovars /id=caseID /index=ind.
If a caseID is repeated more than once, after the restructure there will be only one line for that case, while all the variables will be repeated with indexes.
If the order of the caseID repeats matters, replace otherVar in the sort command with the corresponding variable (e.g. date). This way your new variables will also be indexed accordingly.
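For readers who want to see what the restructure does to the data, here is a rough Python sketch of the same long-to-wide idea. It is a simplified illustration, not SPSS behavior in every detail: the function name cases_to_vars and the sample rows are made up, and every non-ID variable gets an index suffix, even one that is identical across the duplicates.

```python
from collections import defaultdict


def cases_to_vars(rows, id_key, sort_key):
    """Collapse repeated ids into one row with indexed variables."""
    # Sort by id, then by the ordering variable (like SORT CASES above)
    grouped = defaultdict(list)
    for row in sorted(rows, key=lambda r: (r[id_key], r[sort_key])):
        grouped[row[id_key]].append(row)

    wide = []
    for case_id, duplicates in grouped.items():
        merged = {id_key: case_id}
        # The position within the id plays the role of the ind variable
        for index, row in enumerate(duplicates, start=1):
            for name, value in row.items():
                if name != id_key:
                    merged[f"{name}.{index}"] = value
        wide.append(merged)
    return wide


# Made-up sample: caseID 1 appears twice
rows = [
    {"caseID": 1, "year": 2011, "income": 30},
    {"caseID": 2, "year": 2010, "income": 50},
    {"caseID": 1, "year": 2010, "income": 25},
]
wide = cases_to_vars(rows, "caseID", "year")
for row in wide:
    print(row)
```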

Synchronizing data between two django servers

I have a central Django server containing all of my information in a database. I want to have a second Django server that contains a subset of that information in a second database. I need a bulletproof way to selectively sync data between the two.
The secondary Django will need to pull its subset of data from the primary at certain times. The subset will have to be filtered by certain fields.
The secondary Django will have to occasionally push its data to the primary.
Ideally, the two-way sync would keep the most recently modified objects for each model.
I was thinking of something along the lines of using TimeStampedModel (from django-extensions) or adding my own DateTimeField(auto_now=True) so that every object stores its last modified time. Then, maybe a mechanism to dump the data from one DB and load it into the other such that only the more recently modified objects are kept.
Possibilities I am considering are django's dumpdata, django-extensions dumpscript, django-test-utils makefixture or maybe django-fixture-magic. There's a lot to think about, so I'm not sure which road to proceed down.
Here is my solution, which fits all of my requirements:
Implement natural keys and unique constraints on all models
Allows for a unique way to refer to each object without using primary key IDs
Subclass each model from TimeStampedModel in django-extensions
Adds automatically updated created and modified fields
Create a Django management command for exporting, which filters a subset of data and serializes it with natural keys
import itertools
from django.core import serializers

baz = Baz.objects.filter(foo=bar)
yaz = Yaz.objects.filter(foo=bar)
objects = [baz, yaz]
# Flatten the querysets into a single list of model instances
flat_objects = list(itertools.chain.from_iterable(objects))
data = serializers.serialize("json", flat_objects, indent=3, use_natural_keys=True)
print(data)
Create a Django management command for importing, which reads in the serialized file and iterates through the objects as follows:
If the object does not exist in the database (by natural key), create it
If the object exists, check the modified timestamps
If the imported object is newer, update the fields
If the imported object is older, do not update (but print a warning)
Code sample:
from django.core import serializers
from django.core.exceptions import ObjectDoesNotExist

# Open the file
with open(args[0]) as data_file:
    json_str = data_file.read()

# Deserialize and iterate
for obj in serializers.deserialize("json", json_str, indent=3, use_natural_keys=True):
    # Get model info
    model_class = obj.object.__class__
    natural_key = obj.object.natural_key()
    manager = model_class._default_manager
    # Delete PK value
    obj.object.pk = None
    try:
        # Get the existing object
        existing_obj = model_class.objects.get_by_natural_key(*natural_key)
        # Check the timestamps
        date_existing = existing_obj.modified
        date_imported = obj.object.modified
        if date_imported > date_existing:
            # Update fields
            for field in obj.object._meta.fields:
                if field.editable and not field.primary_key:
                    imported_val = getattr(obj.object, field.name)
                    existing_val = getattr(existing_obj, field.name)
                    if existing_val != imported_val:
                        setattr(existing_obj, field.name, imported_val)
            # Persist the updated fields
            existing_obj.save()
    except ObjectDoesNotExist:
        # No object with this natural key yet, so create it
        obj.save()
The workflow for this is to first call python manage.py exportTool > data.json, then on another django instance (or the same), call python manage.py importTool data.json.
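The import rule (create if missing, update if newer, warn if older) can be tried out without a database using a small sketch like the one below. It is not Django code: records are plain dicts keyed by a natural key, and the function name merge_by_modified and the sample data are made up for illustration.

```python
from datetime import datetime


def merge_by_modified(existing, imported):
    """Apply the import rule to dict-based records, in place.

    existing, imported: dict mapping natural key -> record dict that has
    a "modified" datetime (assumed layout). Returns the keys whose
    imported copy was older and was therefore skipped.
    """
    skipped = []
    for key, record in imported.items():
        current = existing.get(key)
        if current is None:
            existing[key] = dict(record)   # does not exist yet: create
        elif record["modified"] > current["modified"]:
            existing[key] = dict(record)   # imported copy is newer: update
        elif record["modified"] < current["modified"]:
            skipped.append(key)            # imported copy is older: warn
    return skipped


# Made-up sample data
existing = {
    ("widget", "a"): {"price": 1, "modified": datetime(2020, 1, 1)},
    ("widget", "b"): {"price": 2, "modified": datetime(2020, 6, 1)},
}
imported = {
    ("widget", "a"): {"price": 5, "modified": datetime(2020, 3, 1)},
    ("widget", "b"): {"price": 9, "modified": datetime(2020, 2, 1)},
    ("widget", "c"): {"price": 7, "modified": datetime(2020, 4, 1)},
}
skipped = merge_by_modified(existing, imported)
for key in skipped:
    print("warning: imported copy is older, not updated:", key)
```

Running the same import twice changes nothing the second time, which is what makes this rule safe to repeat in both directions.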

Resources