Performance of CYPHER 2.3 in Neo4j query - database

I am having a problem in a Neo4j query. Suppose I have a Node type called App. The App nodes have the fields "m_id" and "info". I want to build a query to create a relationship between the nodes where the field "info" is equal.
This is the query:
MATCH (a:App {m_id:'SOME_VALUE' }),(b:App {info: a.info}) WHERE ID(a)<>ID(b) AND NOT (b)-[:INFO]->(a) MERGE (a)-[r:INFO]->(b) RETURN b.m_id;
I also have indexes for both fields:
CREATE CONSTRAINT ON (a:App) ASSERT a.m_id IS UNIQUE;
CREATE INDEX ON :App(info);
But the thing is I get very slow queries, with access in all the records of the App nodes.
This is the profile of the query:
+---------------+--------+---------+-----------------+--------------------------------------------------------------------------------------------------------------------------------+
| Operator | Rows | DB Hits | Identifiers | Other |
+---------------+--------+---------+-----------------+--------------------------------------------------------------------------------------------------------------------------------+
| +ColumnFilter | 0 | 0 | b.m_id | keep columns b.m_id |
| | +--------+---------+-----------------+--------------------------------------------------------------------------------------------------------------------------------+
| +Extract | 0 | 0 | a, b, b.m_id, r | b.m_id |
| | +--------+---------+-----------------+--------------------------------------------------------------------------------------------------------------------------------+
| +Merge(Into) | 0 | 1 | a, b, r | (a)-[r:INFO]->(b) |
| | +--------+---------+-----------------+--------------------------------------------------------------------------------------------------------------------------------+
| +Eager | 0 | 0 | a, b | |
| | +--------+---------+-----------------+--------------------------------------------------------------------------------------------------------------------------------+
| +Filter | 0 | 2000000 | a, b | Ands(b.info == a.info, NOT(IdFunction(a) == IdFunction(b)), NOT(nonEmpty(PathExpression((b)-[anon[104]:INFO]->(a), true)))) |
| | +--------+---------+-----------------+--------------------------------------------------------------------------------------------------------------------------------+
| +SchemaIndex | 184492 | 1000000 | a, b | { AUTOSTRING0}; :App(m_id) |
| | +--------+---------+-----------------+--------------------------------------------------------------------------------------------------------------------------------+
| +NodeByLabel | 184492 | 1000001 | b | :App |
+---------------+--------+---------+-----------------+--------------------------------------------------------------------------------------------------------------------------------+

Try finding a by itself, using a WITH clause to put a.info into a temporary variable that is used by a separate MATCH clause for b, as in:
MATCH (a:App { m_id:'SOME_VALUE' })
WITH a, a.info AS a_info
MATCH (b:App { info: a_info })
WHERE a <> b AND NOT (b)-[:INFO]->(a)
MERGE (a)-[r:INFO]->(b)
RETURN b.m_id;
It seems that indices tend not to be used when comparing the properties of 2 nodes. The use of a_info removes that impediment.
If the profile of the above shows that one or both indices are not being used, you can try adding index hints:
MATCH (a:App { m_id:'SOME_VALUE' })
USING INDEX a:App(m_id)
WITH a, a.info AS a_info
MATCH (b:App { info: a_info })
USING INDEX b:App(info)
WHERE a <> b AND NOT (b)-[:INFO]->(a)
MERGE (a)-[r:INFO]->(b)
RETURN b.m_id;

I figure out a solution using OPTIONAL MATCH:
MATCH (a:App {m_id:'SOME_VALUE' }) OPTIONAL MATCH (a),(b:App {info: a.info}) WHERE ID(a)<>ID(b) AND NOT (b)-[:INFO]->(a) MERGE (a)-[r:INFO]->(b) RETURN b.m_id;
This is the profile of the query:
+----------------+------+---------+-----------------+------------------------------------------------------------------------------------------------------------+
| Operator | Rows | DB Hits | Identifiers | Other |
+----------------+------+---------+-----------------+------------------------------------------------------------------------------------------------------------+
| +ColumnFilter | 0 | 0 | b.m_id | keep columns b.m_id |
| | +------+---------+-----------------+------------------------------------------------------------------------------------------------------------+
| +Extract | 0 | 0 | a, b, b.m_id, r | b.m_id |
| | +------+---------+-----------------+------------------------------------------------------------------------------------------------------------+
| +Merge(Into) | 0 | 1 | a, b, r | (a)-[r:INFO]->(b) |
| | +------+---------+-----------------+------------------------------------------------------------------------------------------------------------+
| +Eager | 0 | 0 | a, b | |
| | +------+---------+-----------------+------------------------------------------------------------------------------------------------------------+
| +OptionalMatch | 0 | 0 | a, b | |
| |\ +------+---------+-----------------+------------------------------------------------------------------------------------------------------------+
| | +Filter | 0 | 0 | a, b | Ands(NOT(IdFunction(a) == IdFunction(b)), NOT(nonEmpty(PathExpression((b)-[anon[109]:INFO]->(a), true)))) |
| | | +------+---------+-----------------+------------------------------------------------------------------------------------------------------------+
| | +SchemaIndex | 0 | 0 | a, b | a.info; :App(info) |
| | | +------+---------+-----------------+------------------------------------------------------------------------------------------------------------+
| | +Argument | 0 | 0 | a | |
| | +------+---------+-----------------+------------------------------------------------------------------------------------------------------------+
| +SchemaIndex | 0 | 1 | a | { AUTOSTRING0}; :App(m_id) |
+----------------+------+---------+-----------------+------------------------------------------------------------------------------------------------------------+

Related

Generate a new dataset from two existings datasets with conditions

I have two dataset with the same columns, and I would like to create a new one in another sheet with all rows from the first dataset and add to it specific rows from the second one.
My first dataset is like:
| Item Type | Item Numb | Start Date | End date |
---------------------------------------------------
| 1 | 1 | 17/02/2022 | 21/02/2022 |
| 1 | 2 | 19/02/2022 | 24/02/2022 |
| 2 | 1 | 15/02/2022 | 18/02/2022 |
| 2 | 2 | 17/02/2022 | 20/02/2022 |
| 3 | 1 | 21/02/2022 | 25/02/2022 |
And the second one is like:
| Item Type | Item Numb | Start Date | End date |
---------------------------------------------------
| 1 | 2 | 17/02/2022 | 20/02/2022 |
| 2 | 2 | 17/02/2022 | 20/02/2022 |
| 2 | 3 | 20/02/2022 | 23/02/2022 |
| 3 | 1 | 20/02/2022 | 23/02/2022 |
| 4 | 1 | 21/02/2022 | 24/02/2022 |
| 4 | 2 | 23/02/2022 | 28/02/2022 |
So now, I would like in a new sheet to retrieve the rows from the first dataset and add at the end the rows from the second one who are absent.
If a Combination of "Item Type" and "Item Numb" is already imported I don't want to get them from the second dataset, but if this specific combination isn't in the first one so I would like to add the row.
That's what I need as the result:
| Item Type | Item Numb | Start Date | End date |
---------------------------------------------------
| 1 | 1 | 17/02/2022 | 21/02/2022 |
| 1 | 2 | 19/02/2022 | 24/02/2022 |
| 2 | 1 | 15/02/2022 | 18/02/2022 |
| 2 | 2 | 17/02/2022 | 20/02/2022 |
| 3 | 1 | 21/02/2022 | 25/02/2022 |
| 2 | 3 | 20/02/2022 | 23/02/2022 |
| 4 | 1 | 21/02/2022 | 24/02/2022 |
| 4 | 2 | 23/02/2022 | 28/02/2022 |
Thanks in advance for your time folks!
try:
=INDEX(ARRAY_CONSTRAIN(QUERY(SORTN(
{Sheet1!A2:D, Sheet1!A2:A&Sheet1!B2:B;
Sheet2!A2:D, Sheet2!A2:A&Sheet2!B2:B}, 9^9, 2, 5, 1),
"where Col1 is not null", 0), 9^9, 4)

How to return first not empty cell from importrange values?

my google sheet excel document contain data like this
+---+---+---+---+---+---+
| | A | B | C | D | E |
+---+---+---+---+---+---+
| 1 | | c | | x | |
+---+---+---+---+---+---+
| 2 | | r | | 4 | |
+---+---+---+---+---+---+
| 3 | | | | m | |
+---+---+---+---+---+---+
| 4 | | | | | |
+---+---+---+---+---+---+
Column B and D contain data provided by IMPORTRANGE function, which are store in different files.
And i would like to fill column A with first not empty value in row, in other words: desired result must look like this:
+---+---+---+---+---+---+
| | A | B | C | D | E |
+---+---+---+---+---+---+
| 1 | c | c | | x | |
+---+---+---+---+---+---+
| 2 | r | r | | 4 | |
+---+---+---+---+---+---+
| 3 | m | | | m | |
+---+---+---+---+---+---+
| 4 | | | | | |
+---+---+---+---+---+---+
I tried ISBLANK function, but apperantly if column is imported then, even if the value is empty, is not blank, so this function dosn't work for my case. Then i tried QUERY function in 2 different variant:
1) =QUERY({B1;D1}; "select Col1 where Col1 is not null limit 1"; 0) but result in this case is wrong when row contain cells with numbers. Result with this query is following:
+---+---+---+---+---+---+
| | A | B | C | D | E |
+---+---+---+---+---+---+
| 1 | c | c | | x | |
+---+---+---+---+---+---+
| 2 | 4 | r | | 4 | |
+---+---+---+---+---+---+
| 3 | m | | | m | |
+---+---+---+---+---+---+
| 4 | | | | | |
+---+---+---+---+---+---+
2) =QUERY({B1;D1};"select Col1 where Col1 <> '' limit 1"; 0) / =QUERY({B1;D1};"select Col1 where Col1 != '' limit 1"; 0) and this dosn't work at all, result is always #N/A
Also i would like to avoid using nested IFs and javascript scripts, if possible, as solution with QUERY function suits for my case best due to easy expansion to another columns without any deeper knowladge about programming. Is there any way how to make it simply, just with QUERY, and i am just missing something, or i have to use IFs/javascript?
try:
=ARRAYFORMULA(SUBSTITUTE(INDEX(IFERROR(SPLIT(TRIM(TRANSPOSE(QUERY(
TRANSPOSE(SUBSTITUTE(B:G, " ", "♦")),,99^99))), " ")),,1), "♦", " "))
selective columns:

Multiple outcomes/scenarios

I got a problem that I have already created a solution for, but I'm wondering if there's a better way of solving the problem. Basically I have to create a flag for certain scenarios under a partition of ID and date. My solution involved mapping for all the possible scenarios, then creating "case when" statements for all these scenarios with the specific outcome. Basically, I was the one that created the outcomes. I am wondering if there's another way, something around letting SQL create the outcomes instead of myself.
Thanks a lot!
Background:
+----+-----------+--------+-------+------+-----------------+-----------------------------------------------------------------------------------+
| ID | Month | Status | Value | Flag | Scenario Number | Scenario Description |
+----+-----------+--------+-------+------+-----------------+-----------------------------------------------------------------------------------+
| 1 | 1/01/2016 | First | 123 | No | 1 | First, second and blank exists. Do not flag |
| 1 | 1/01/2016 | Second | 456 | No | 1 | First, second and blank exists. Do not flag |
| 1 | 1/01/2016 | | 789 | No | 1 | First, second and blank exists. Do not flag |
| 1 | 1/02/2016 | Second | 123 | Yes | 2 | First does not exist, two second but have different values. Flag these as Yes |
| 1 | 1/02/2016 | Second | 456 | Yes | 2 | First does not exist, two second but have different values. Flag these as Yes |
| 1 | 1/02/2016 | Second | 123 | No | 3 | First does not exist, two second have same values. Do not flag |
| 1 | 1/02/2016 | Second | 123 | No | 3 | First does not exist, two second have same values. Do not flag |
| 1 | 1/03/2016 | Second | 123 | No | 4 | Only one entry of Second exist. Do no flag |
| 1 | 1/04/2016 | | 123 | Yes | 5 | Two blanks for the partition. Flag these as Yes |
| 1 | 1/04/2016 | | 123 | Yes | 5 | Two blanks for the partition. Flag these as Yes |
| 1 | 1/05/2016 | | | No | 6 | Only one entry of blank exist. Do not flag these |
| 1 | 1/06/2016 | First | 123 | Yes | 7 | First exist for the partition. Do not flag |
| 1 | 1/06/2016 | | 456 | Yes | 7 | First exist for the partition. Do not flag |
| 1 | 1/07/2016 | Second | 123 | Yes | 8 | First does not exist and second and blank do not have the same value. Flag these. |
| 1 | 1/07/2016 | | 456 | Yes | 8 | First does not exist and second and blank do not have the same value. Flag these. |
| 1 | 1/07/2016 | Second | 123 | Yes | 8 | First does not exist and second and blank have the same value. Flag these. |
| 1 | 1/07/2016 | | 123 | Yes | 8 | First does not exist and second and blank have the same value. Flag these. |
+----+-----------+--------+-------+------+-----------------+-----------------------------------------------------------------------------------+
Data:
+----+-----------+-------+----------+---------------+
| ID | Month | Value | Priority | Expected_Flag |
+----+-----------+-------+----------+---------------+
| 1 | 1/01/2016 | 96.01 | | Yes |
| 1 | 1/01/2016 | 96.01 | | Yes |
| 1 | 1/02/2016 | 65.2 | First | No |
| 1 | 1/02/2016 | 3.47 | Second | No |
| 1 | 1/02/2016 | 45.99 | | No |
| 11 | 1/01/2016 | 25 | | No |
| 11 | 1/02/2016 | 74.25 | Second | No |
| 11 | 1/02/2016 | 74.25 | Second | No |
| 11 | 1/02/2016 | 23.25 | | No |
| 24 | 1/01/2016 | 1.25 | First | No |
| 24 | 1/01/2016 | 1.365 | | No |
| 24 | 1/04/2016 | 1.365 | First | No |
| 24 | 1/04/2016 | 1.365 | | No |
| 24 | 1/05/2016 | 1.365 | First | No |
| 24 | 1/05/2016 | 1.365 | First | No |
| 24 | 1/06/2016 | 1.365 | Second | No |
| 24 | 1/06/2016 | 1.365 | Second | No |
| 24 | 1/07/2016 | 1.365 | Second | Yes |
| 24 | 1/07/2016 | 1.365 | | Yes |
| 24 | 1/08/2016 | 1.365 | First | No |
| 24 | 1/08/2016 | 1.365 | | No |
| 24 | 1/09/2016 | 1.365 | Second | No |
| 24 | 1/09/2016 | 1.365 | | No |
| 27 | 1/01/2016 | 0 | Second | Yes |
| 27 | 1/01/2016 | 0 | Second | Yes |
| 27 | 1/02/2016 | 45.25 | Second | No |
| 3 | 1/01/2016 | 96.01 | First | No |
| 3 | 1/01/2016 | 96.01 | First | No |
| 3 | 1/03/2016 | 96.01 | First | No |
| 3 | 1/03/2016 | 96.01 | First | No |
| 35 | 1/01/2016 | | | Yes |
| 35 | 1/01/2016 | | | Yes |
| 35 | 1/02/2016 | | First | No |
| 35 | 1/02/2016 | | Second | No |
| 35 | 1/02/2016 | | | No |
| 35 | 1/02/2016 | | | No |
| 35 | 1/03/2016 | | Second | Yes |
| 35 | 1/03/2016 | | Second | Yes |
| 35 | 1/04/2016 | | Second | No |
| 35 | 1/04/2016 | | Second | No |
+----+-----------+-------+----------+---------------+

Generate variables that move information between rows in hierarchical data with spss syntax

I was wondering if you can help me with the following problem in spss syntax.
My dataset has nested structure.
Data are nested in companies, then each company has 1 or 2 bosses, but in this case I care only about boss 1. At a previous stage in time the boss graded the workers (not all of them). Now, the ID and the grade of the workers is on the row each worker.
I would like to move the information that was obtained during worker's assessment and create new sets of variables for each (worker ID and grade) on the line/row of the boss.
+---------+------+--------+--------------+---------+---------+--------+---------+
| company | boss |workerID|worker's grade|N:workID1|N:grade1 |N:work2 |N:grade2 |
+---------+------+--------+--------------+---------+---------+--------+---------+
| A | 1 | 1 | | 3 | A | 4 | A |
| A | 2 | 2 | | | |
| A | 0 | 3 | A | | |
| A | 0 | 4 | A | | |
| A | 0 | 5 | | | |
| B | 1 | 1 | | 3 | B | 4 | A |
| B | 0 | 2 | | | |
| B | 0 | 3 | B | | |
| B | 0 | 4 | A | | |
| C | 1 | 1 | | 2 | D | -1 | -1 |
| C | 0 | 2 | D | | |
I would like to move the worker's id and the grade that to the row of the boss in the NEW variables, without loosing the existing variables on workerID and worker's grade.
Basically, I will need to feed forward the information into the new variables and to the row of boss EQ 1 separately for each company.
I have no idea how to proceed with this. I assume that I need a loop that creates new variable for each worker ID that has a valid grade and then feeds forward the information from the worker's row to the boss' newly generated variables.
Any suggestions are very wellcome :-)
Take a look at VARSTOCASES (Data > Restructure)

How to make a SQL "IF-THEN-ELSE" statement

I've seen other questions about SQL If-then-else stuff, but I'm not seeing how to relate it to what I'm trying to do. I've been using SQL for about a year now but only basic stuff and never this.
If I have a SQL table that looks like this
| Name | Version | Category | Value | Number |
|:-----:|:-------:|:--------:|:-----:|:------:|
| File1 | 1.0 | Time | 123 | 1 |
| File1 | 1.0 | Size | 456 | 1 |
| File1 | 1.0 | Final | 789 | 1 |
| File2 | 1.0 | Time | 312 | 1 |
| File2 | 1.0 | Size | 645 | 1 |
| File2 | 1.0 | Final | 978 | 1 |
| File3 | 1.0 | Time | 741 | 1 |
| File3 | 1.0 | Size | 852 | 1 |
| File3 | 1.0 | Final | 963 | 1 |
| File1 | 1.1 | Time | 369 | 2 |
| File1 | 1.1 | Size | 258 | 2 |
| File1 | 1.1 | Final | 147 | 2 |
| File2 | 1.1 | Time | 741 | 2 |
| File2 | 1.1 | Size | 734 | 2 |
| File2 | 1.1 | Final | 942 | 2 |
| File3 | 1.1 | Time | 997 | 2 |
| File3 | 1.1 | Size | 997 | 2 |
| File3 | 1.1 | Final | 985 | 2 |
How can I write a SQL IF, ELSE statement that creates a new column called "Replication" that follows this rule:
A = B + 1 when x = 1
else
A = B
where A = the number we will use for the next Number
B = Max(Number)
x = Replication count (this is the number of times that a loop is executed. x=i)
The results table will look like this:
| Name | Version | Category | Value | Number | Replication |
|:-----:|:-------:|:--------:|:-----:|:------:|:-----------:|
| File1 | 1.0 | Time | 123 | 1 | 1 |
| File1 | 1.0 | Size | 456 | 1 | 1 |
| File1 | 1.0 | Final | 789 | 1 | 1 |
| File2 | 1.0 | Time | 312 | 1 | 1 |
| File2 | 1.0 | Size | 645 | 1 | 1 |
| File2 | 1.0 | Final | 978 | 1 | 1 |
| File1 | 1.0 | Time | 369 | 1 | 2 |
| File1 | 1.0 | Size | 258 | 1 | 2 |
| File1 | 1.0 | Final | 147 | 1 | 2 |
| File2 | 1.0 | Time | 741 | 1 | 2 |
| File2 | 1.0 | Size | 734 | 1 | 2 |
| File2 | 1.0 | Final | 942 | 1 | 2 |
| File1 | 1.1 | Time | 997 | 2 | 1 |
| File1 | 1.1 | Size | 997 | 2 | 1 |
| File1 | 1.1 | Final | 985 | 2 | 1 |
| File2 | 1.1 | Time | 438 | 2 | 1 |
| File2 | 1.1 | Size | 735 | 2 | 1 |
| File2 | 1.1 | Final | 768 | 2 | 1 |
| File1 | 1.1 | Time | 786 | 2 | 2 |
| File1 | 1.1 | Size | 486 | 2 | 2 |
| File1 | 1.1 | Final | 135 | 2 | 2 |
| File2 | 1.1 | Time | 379 | 2 | 2 |
| File2 | 1.1 | Size | 943 | 2 | 2 |
| File2 | 1.1 | Final | 735 | 2 | 2 |
EDIT: Based on the answer by Sean Lange, this is my 2nd attempt at a solution:
SELECT COALESCE(MAX)(Number) + CASE WHEN Replication = 1 then 1 else 0, 1) FROM Table
The COALESCE is in there for when there is no value yet in the Number column.
The IF/Else construct is used to control flow of statements in t-sql. You want a case expression, which is used to conditionally return values in a column.
https://msdn.microsoft.com/en-us/library/ms181765.aspx
Yours would be something like:
case when x = 1 then A else B end as A
As SeanLange pointed out in this case it would be better to use an CASE/WHEN but to illustrate how to use If\ELSE the way to do it in sql is like this:
if x = 1
BEGIN
---Do something
END
ELSE
BEGIN
--Do something else
END
I would say the best way to know the difference and when to use which is if you are writing a query and want a different field to appear based on a certain condition, use case/when. If a certain condition will cause a series of steps to happen then use if/else

Resources