Recursive CTE - Updating nodes in SQL Server

I have some main tests. Each main test consists of other tests, each of which consists of further tests, and so on. See the trees below as an example.
         Main Test 1
             ID:1
           /  |  \
          /   |   \
         +    o    +
       Test  Test  Test
       ID:2  ID:3  ID:4
       / \         / \
      /   \       /   \
     +     o     +     o
   Test  Test  Test  Test
   ID:5  ID:6  ID:7  ID:8
     |         / \
     |        /   \
     o       o     o
   Test    Test   Test
   ID:12   ID:9   ID:10

       Main Test 2
          ID:2
          /  \
         /    \
        +      +
      Test    Test
      ID:3    ID:8
     /  |  \
    /   |   \
   o    o    o
 Test  Test  Test
 ID:5  ID:10 ID:7
Symbols:
'o' are leaves
'+' are parents
Main Test 1 and Main Test 2 are main tests (root tests).
Within each main test, test ids are unique, but the same test id may appear in different main tests, as the trees above show.
I have an input table, let's say "INPUT", with the columns below:
ID_MainTest | ID_TEST | PASSED
With this input table we indicate which tests of each main test are passed.
We also have another table, let's say "TREES", that stores the trees above in table form:
ID_MainTest | ID_TEST | PARENT_ID_TEST
Finally we have another table, let's say "TESTS", which contains all the tests and their current result (PENDING, FAILED, PASSED):
ID_MainTest | ID_TEST | RESULT
So suppose the tables' contents are:
INPUT table (ID_MainTest and ID_Test are primary keys):
ID_MainTest | ID_TEST | PASSED
1 4 1
1 5 1
1 6 1
1 2 1
1 3 1
2 3 1
TREES table (ID_MainTest and ID_Test are primary keys):
ID_MainTest | ID_TEST | PARENT_ID_TEST
1 2 NULL
1 3 NULL
1 4 NULL
1 5 2
1 6 2
1 7 4
1 8 4
1 12 5
1 9 7
1 10 7
2 3 NULL
2 8 NULL
2 5 3
2 10 3
2 7 3
TESTS table (ID_MainTest and ID_Test are primary keys):
ID_MainTest | ID_TEST | RESULT
1 2 PENDING
1 3 FAILED
1 4 FAILED
1 5 PASSED
1 6 PENDING
1 7 PASSED
1 8 FAILED
1 12 PASSED
1 9 PASSED
1 10 PENDING
2 3 PENDING
2 8 FAILED
2 5 PASSED
2 10 PENDING
2 7 PENDING
The functionality is the following:
A test indicated in the input table will be switched to passed if and only if all of its children figure as passed. If any of its children (or descendants) is failed, the parent will be set to failed even though the input table indicates it as passed.
If a test is indicated as passed in the input table, all of its children (and descendants) will be switched to passed, from the parent down to the leaves, where possible: children (and descendants) may only be switched to passed if they figure as pending. If a child (or descendant) figures as failed, it cannot be switched to passed (it stays failed). If a child (or descendant) already figures as passed, there is nothing to switch; it stays passed.
A parent indicated as passed in the input table can be switched to passed if all of its descendants figure as passed (regardless of whether the parent itself figures as failed or pending in the TESTS table; this is an exception).
So, taking into account the functionality and the table contents above, I would like to obtain the result table below, containing only the tests that were touched: those we tried to switch to passed (successfully or not), those switched to passed, and those kept as failed or passed, including the tests indicated in the input table:
(ID_MainTest and ID_Test are primary keys):
ID_MainTest | ID_TEST | RESULT
1 2 PASSED
1 3 PASSED
1 4 FAILED
1 5 PASSED
1 6 PASSED
1 7 PASSED
1 8 FAILED
1 12 PASSED
1 9 PASSED
1 10 PASSED
2 3 PASSED
2 5 PASSED
2 10 PASSED
2 7 PASSED
I provide the initial tables below:
DECLARE @INPUT AS TABLE
(
ID_MainTest int,
ID_TEST int,
PASSED bit
)
INSERT INTO @INPUT VALUES
(1, 4, 1),
(1, 5, 1),
(1, 6, 1),
(1, 2, 1),
(1, 3, 1),
(2, 3, 1)
DECLARE @TREES AS TABLE
(
ID_MainTest int,
ID_TEST int,
PARENT_ID_TEST int
)
INSERT INTO @TREES VALUES
(1, 2, NULL),
(1, 3, NULL),
(1, 4, NULL),
(1, 5, 2),
(1, 6, 2),
(1, 7, 4),
(1, 8, 4),
(1, 12, 5),
(1, 9, 7),
(1, 10, 7),
(2, 3, NULL),
(2, 8, NULL),
(2, 5, 3),
(2, 10, 3),
(2, 7, 3)
DECLARE @TESTS AS TABLE
(
ID_MainTest int,
ID_TEST int,
RESULT NVARCHAR(50)
)
INSERT INTO @TESTS VALUES
(1, 2, 'PENDING'),
(1, 3, 'FAILED'),
(1, 4, 'FAILED'),
(1, 5, 'PASSED'),
(1, 6, 'PENDING'),
(1, 7, 'PASSED'),
(1, 8, 'FAILED'),
(1, 12, 'PASSED'),
(1, 9, 'PASSED'),
(1, 10, 'PENDING'),
(2, 3, 'PENDING'),
(2, 8, 'FAILED'),
(2, 5, 'PASSED'),
(2, 10, 'PENDING'),
(2, 7, 'PENDING')
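For what it's worth, here is a minimal sketch of one possible approach, reusing the table variables declared above. It is not a tested, complete solution: a recursive CTE expands every input test marked as passed into itself plus all of its descendants, a first UPDATE promotes the PENDING ones to PASSED, and a second UPDATE re-evaluates each input test against its descendants. The table variable @Affected, its column ROOT_ID, and the CTE name Walk are names introduced here for illustration.

DECLARE @Affected AS TABLE
(
    ID_MainTest int,
    ROOT_ID int,  -- the input test this row descends from
    ID_TEST int
);

-- Expand each passed input test to itself and all of its descendants
;WITH Walk AS
(
    SELECT i.ID_MainTest, i.ID_TEST AS ROOT_ID, i.ID_TEST
    FROM @INPUT i
    WHERE i.PASSED = 1
    UNION ALL
    SELECT w.ID_MainTest, w.ROOT_ID, t.ID_TEST
    FROM Walk w
    INNER JOIN @TREES t
        ON t.ID_MainTest = w.ID_MainTest
       AND t.PARENT_ID_TEST = w.ID_TEST
)
INSERT INTO @Affected
SELECT ID_MainTest, ROOT_ID, ID_TEST FROM Walk;

-- 1) Descendants that figure as PENDING are promoted to PASSED;
--    FAILED and PASSED rows are left untouched
UPDATE ts SET RESULT = 'PASSED'
FROM @TESTS ts
INNER JOIN @Affected a
    ON a.ID_MainTest = ts.ID_MainTest AND a.ID_TEST = ts.ID_TEST
WHERE ts.RESULT = 'PENDING';

-- 2) Each input test becomes FAILED if any descendant still figures as
--    FAILED, otherwise PASSED (even if it was FAILED or PENDING itself)
UPDATE ts SET RESULT =
    CASE WHEN EXISTS (SELECT 1
                      FROM @Affected d
                      INNER JOIN @TESTS dt
                          ON dt.ID_MainTest = d.ID_MainTest AND dt.ID_TEST = d.ID_TEST
                      WHERE d.ID_MainTest = ts.ID_MainTest
                        AND d.ROOT_ID = ts.ID_TEST
                        AND d.ID_TEST <> ts.ID_TEST
                        AND dt.RESULT = 'FAILED')
         THEN 'FAILED' ELSE 'PASSED' END
FROM @TESTS ts
INNER JOIN @INPUT i
    ON i.ID_MainTest = ts.ID_MainTest AND i.ID_TEST = ts.ID_TEST
WHERE i.PASSED = 1;

-- Result: only the touched rows
SELECT DISTINCT ts.ID_MainTest, ts.ID_TEST, ts.RESULT
FROM @TESTS ts
INNER JOIN @Affected a
    ON a.ID_MainTest = ts.ID_MainTest AND a.ID_TEST = ts.ID_TEST;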

Related

Snowflake: Subtracting one column from another and partitioning it by another column

Data:
Group | Date       | Current | Next
A     | 03/09/2020 | 4       | 7
A     | 04/09/2020 | 2       | 4
A     | 05/09/2020 | 4       | null
B     | 17/08/2020 | 4       | 9
B     | 19/08/2020 | 4       | null
I don't think I can use a window function, as I'm not using SUM, COUNT, MAX, MIN, etc. If I just do Next - Current, it's not going to partition by Group, so the whole set will be treated as one group.
I want to subtract Current from Next, but partitioned by Group.
Desired output:
Group | Date       | Current | Next | Diff
A     | 03/09/2020 | 4       | 7    | 3
A     | 04/09/2020 | 2       | 4    | 2
A     | 05/09/2020 | 4       | null | null
B     | 17/08/2020 | 4       | 9    | 5
B     | 19/08/2020 | 4       | null | null
This is just simple math, not sure what the real question is?
with data("Group", "Date", "Current", "Next") as (
    select * from values
        ('A', '03/09/2020', 4, 7),
        ('A', '04/09/2020', 2, 4),
        ('A', '05/09/2020', 4, null),
        ('B', '17/08/2020', 4, 9),
        ('B', '19/08/2020', 4, null)
)
select *, "Next" - "Current" as Diff
from data;
Group | Date       | Current | Next | DIFF
A     | 03/09/2020 | 4       | 7    | 3
A     | 04/09/2020 | 2       | 4    | 2
A     | 05/09/2020 | 4       | null | null
B     | 17/08/2020 | 4       | 9    | 5
B     | 19/08/2020 | 4       | null | null
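As an aside, window functions are not limited to aggregates like SUM or COUNT: value functions such as LEAD and LAG also honor PARTITION BY. A hypothetical illustration, not needed for the problem above since Next is already stored; note that next_current here is the following row's Current, which is not the same thing as the stored Next column:

with data("Group", "Date", "Current", "Next") as (
    select * from values
        ('A', '03/09/2020', 4, 7),
        ('A', '04/09/2020', 2, 4),
        ('A', '05/09/2020', 4, null),
        ('B', '17/08/2020', 4, 9),
        ('B', '19/08/2020', 4, null)
)
select "Group", "Date", "Current",
       -- the following row's Current within the same Group; the dates are strings
       -- here, so this textual ordering only works because the format is fixed
       lead("Current") over (partition by "Group" order by "Date") as next_current
from data;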

Last WHEN expression overrides result of first WHEN expression in CASE WHEN in SQL SELECT statement

This will look familiar to most of this community, as it is all about the SQL CASE expression. But I am in triage mode now rather than doing the actual implementation. I would appreciate it if there is an optimal way to work around this.
SCENARIO:
I have one SELECT statement in which I retrieve multiple columns from a table. The table's columns mostly have the numeric(10, 3) datatype. I chose this datatype because I thought that if I needed to display an int value, the conversion would be easier than the other way around. Here is the table structure.
Name: FleetRange
Columns:
CaptionID INT NOT NULL
Caption NVARCHAR(50)
FleetRange_1 numeric(10,3)
FleetRange_2_4 numeric(10,3)
.......
Total numeric(10,3)
(The criteria and the current result were shown as screenshots.)
My SQL query:
SELECT
CaptionId, Caption,
CASE
WHEN CaptionId IN (1, 2, 4, 5, 6, 7, 8, 9, 14, 15, 17, 18, 20, 21, 22, 23)
THEN CONVERT(INT, FleetRange_1)
WHEN CaptionId IN (11, 12, 13)
THEN CONVERT(NUMERIC(10, 2), FleetRange_1)
WHEN CaptionId IN (3, 10, 16, 19)
THEN CONVERT(NUMERIC(10, 3), FleetRange_1)
END AS 'FleetRange_1'
FROM
FleetRange
NOTE:
What is currently happening is that the last WHEN overrides the previous evaluations, and hence every row displays values with 3 decimal places even when the value is an integer.
I have applied the same CASE structure to the other numeric(10,3) columns, hence I have shortened the query.
Instead of the CASE written within the query above, I tried the syntax below too, but no difference.
WHEN
CaptionId = 11 OR
CaptionId = 12 OR
CaptionId = 13
THEN...
My expectation (desired actual result): my objective is that a particular row's value should be converted to int, or to numeric with the given precision, when the WHEN expression with the specific CaptionID is evaluated.
Something like below:
CaptionID | Caption     | FleetRange_1 | FleetRange_2_4 | .....
1         | SafetyFirst | 0            | 1              |
11        | DriveSafe   | 2.15         | null           |
3         | Caution     | 1.025        | 2.174          |
Every expression in a query has a single data type. The data type of a CASE expression is:
the highest precedence type from the set of types in
result_expressions and the optional else_result_expression.
CASE (TSQL)
And see Data Type Precedence
If you want different display formatting for different rows in a single result set, you'll have to convert the values to strings and format them yourself (e.g. with CONVERT or FORMAT).
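A minimal sketch of what that could look like for FleetRange_1, assuming the table from the question; converting every branch to a string keeps the data-type precedence rules from widening the result (the varchar(20) length is an arbitrary choice):

SELECT
    CaptionId, Caption,
    CASE
        WHEN CaptionId IN (1, 2, 4, 5, 6, 7, 8, 9, 14, 15, 17, 18, 20, 21, 22, 23)
            THEN CONVERT(varchar(20), CONVERT(int, FleetRange_1))
        WHEN CaptionId IN (11, 12, 13)
            THEN CONVERT(varchar(20), CONVERT(numeric(10, 2), FleetRange_1))
        WHEN CaptionId IN (3, 10, 16, 19)
            THEN CONVERT(varchar(20), CONVERT(numeric(10, 3), FleetRange_1))
    END AS FleetRange_1  -- every branch now yields varchar, so no branch is widened
FROM FleetRange;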

How to select specific row column pairs in numpy array which have specific value?

I am trying to have a numpy array with random numbers from 0 to 1:
import numpy as np
x = np.random.random((3,3))
yields
[[ 0.11874238 0.71885484 0.33656161]
[ 0.69432263 0.25234083 0.66118676]
[ 0.77542651 0.71230397 0.76212491]]
And, from this array, I need the row,column combinations which have values bigger than 0.3. So the expected output should look like:
(0,1),(0,2),(1,0),(1,2),(2,0),(2,1),(2,2)
To extract the items (the values of x[row][column]) and write the output to a file, I tried the following:
with open('newfile.txt', 'w') as fd:
    for row in x:
        for item in row:
            if item > 0.3:
                print(item)
                for row in item:
                    for col in item:
                        print(row,column,'\n')
                        fd.write(row,column,'\n')
However, it raises an error :
TypeError: 'numpy.float64' object is not iterable
Also, I searched but could not find how to start the numpy index from 1 instead of 0. For example, the expected output would look like this:
(1,2),(1,3),(2,1),(2,3),(3,1),(3,2),(3,3)
Do you know how to get these outputs?
Get the indices along the first two axes that match that criterion with np.nonzero/np.where on the mask of comparisons, and then simply index with integer array indexing -
r,c = np.nonzero(x>0.3)
out = x[r,c]
If you are looking to get those indices as a list of tuples, zip those indices -
zip(r,c)
To get those starting from 1, add 1 and then zip -
zip(r+1,c+1)
On Python 3.x, you would need to wrap it with list() : list(zip(r,c)) and list(zip(r+1,c+1)).
Sample run -
In [9]: x
Out[9]:
array([[ 0.11874238, 0.71885484, 0.33656161],
[ 0.69432263, 0.25234083, 0.66118676],
[ 0.77542651, 0.71230397, 0.76212491]])
In [10]: r,c = np.nonzero(x>0.3)
In [14]: zip(r,c)
Out[14]: [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1), (2, 2)]
In [18]: zip(r+1,c+1)
Out[18]: [(1, 2), (1, 3), (2, 1), (2, 3), (3, 1), (3, 2), (3, 3)]
In [13]: x[r,c]
Out[13]:
array([ 0.71885484, 0.33656161, 0.69432263, 0.66118676, 0.77542651,
0.71230397, 0.76212491])
Writing indices to file -
Use np.savetxt with int format, like so -
In [69]: np.savetxt("output.txt", np.argwhere(x>0.3), fmt="%d", comments='')
In [70]: !cat output.txt
0 1
0 2
1 0
1 2
2 0
2 1
2 2
With the 1 based indexing, add 1 to np.argwhere output -
In [71]: np.savetxt("output.txt", np.argwhere(x>0.3)+1, fmt="%d", comments='')
In [72]: !cat output.txt
1 2
1 3
2 1
2 3
3 1
3 2
3 3
You could use np.where, which (when applied to a 2D array) returns two arrays with the indices of the rows (and corresponding columns) that satisfy the condition you specify as an argument.
Then you can zip these two arrays to get back a list of tuples:
list(zip(*np.where(x > 0.3)))
If you want to add 1 to every element of every tuple (to use 1-based indexing), either loop over the tuples or add 1 to each array returned by where:
r, c = np.where(x > 0.3)
# np.where returns a tuple, so "res[0] += 1" would raise a TypeError;
# adding 1 to the unpacked arrays works thanks to broadcasting
list(zip(r + 1, c + 1))

Case statement not correctly matching expected values

I'm trying to generate some randomized data, and I've been using newid() to seed functions since it is called once for every row and is guaranteed to return a different result each time. However I'm frequently getting values that are somehow not equal to any integers in the expected range.
I've tried a few variations, including a highly upvoted one, but they all result in the same issue. I've put it into a script that shows the problem:
declare @test table (id uniqueidentifier)
insert into @test
select newid() from sys.objects
select
    floor(rand(checksum(id)) * 4),
    case isnull(floor(rand(checksum(id)) * 4), -1)
        when 0 then 0
        when 1 then 1
        when 2 then 2
        when 3 then 3
        when -1 then -1
        else 999
    end,
    floor(rand(checksum(newid())) * 4),
    case isnull(floor(rand(checksum(newid())) * 4), -1)
        when 0 then 0
        when 1 then 1
        when 2 then 2
        when 3 then 3
        when -1 then -1
        else 999
    end
from @test
I expect the results to always be in the range 0 to 3 for all four columns. When the unique identifiers are retrieved from a table, the results are always correct (first two columns.) Similarly, when they're output on the fly they're also correct (third column.) But when they're compared on the fly to integers in a case statement, it often returns a value outside the expected range.
Here's an example, these are the first 20 rows when I ran it just now. As you can see there are '999' instances in the last column that shouldn't be there:
0 0 3 1
3 3 3 1
0 0 3 3
3 3 2 999
1 1 2 999
3 3 2 1
2 2 0 999
0 0 0 0
3 3 2 0
1 1 3 999
3 3 0 999
2 2 2 2
1 1 3 0
2 2 3 0
3 3 1 999
0 0 1 999
3 3 1 1
0 0 0 3
3 3 0 999
0 0 1 0
At first I thought maybe the type coercion was different than I expected, and the result of rand() * int was a float not an int. So I wrapped it all in floor to force it to be an int. Then I thought perhaps there's an odd null value creeping in, but with my case statement a null would be returned as -1, and there are none.
I've run this on two different SQL Server 2012 SP1 instances; both give the same sort of results.
In the fourth column, isnull(floor(rand(checksum(newid())) * 4), -1) is being evaluated up to five times for each row, once for each branch of the CASE, and each evaluation can produce a different value. So it can evaluate to 2 and not match 0, evaluate to 3 and not match 1, evaluate to 1 and not match 2, evaluate to 0 and not match 3, evaluate to 3 and not match -1, and so fall through to the ELSE and return 999.
This can be seen if you get the execution plan and look at the XML; there is a line [whitespace added]:
<ScalarOperator ScalarString="
CASE WHEN isnull(floor(rand(checksum(newid()))*(4.000000000000000e+000)),(-1.000000000000000e+000))=(0.000000000000000e+000) THEN (0)
ELSE CASE WHEN isnull(floor(rand(checksum(newid()))*(4.000000000000000e+000)),(-1.000000000000000e+000))=(1.000000000000000e+000) THEN (1)
ELSE CASE WHEN isnull(floor(rand(checksum(newid()))*(4.000000000000000e+000)),(-1.000000000000000e+000))=(2.000000000000000e+000) THEN (2)
ELSE CASE WHEN isnull(floor(rand(checksum(newid()))*(4.000000000000000e+000)),(-1.000000000000000e+000))=(3.000000000000000e+000) THEN (3)
ELSE CASE WHEN isnull(floor(rand(checksum(newid()))*(4.000000000000000e+000)),(-1.000000000000000e+000))=(-1.000000000000000e+000) THEN (-1)
ELSE (999)
END
END
END
END
END
">
Placing the expression in a CTE seems to keep the recomputes from happening:
; WITH T AS (SELECT isnull(floor(rand(checksum(newid())) * 4), -1) AS C FROM @test)
SELECT CASE C
when 0 then 0
when 1 then 1
when 2 then 2
when 3 then 3
when -1 then -1
else 999 END
FROM T
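If you want a hard guarantee rather than relying on the optimizer, one option (my suggestion, not from the original answer) is to materialize the value first, reusing the @test table from the script above; inserting into a table variable forces exactly one evaluation per row:

declare @vals table (c int)
insert into @vals
-- the float result of floor() is implicitly converted to int on insert
select isnull(floor(rand(checksum(newid())) * 4), -1) from @test

select case c
        when 0 then 0
        when 1 then 1
        when 2 then 2
        when 3 then 3
        when -1 then -1
        else 999
    end
from @vals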

Passing an array of arrays as parameter to a function

A web application can send to a function an array of arrays like
[
    [
        [1,2],
        [3,4]
    ],
    [
        [],
        [4,5,6]
    ]
]
The outer array's length is > 0. The middle arrays have a constant length, 2 in this example. And the inner arrays' lengths are >= 0.
I could build it with string concatenation like this:
with t(a, b) as (
    values (1, 4), (2, 3), (1, 4), (7, 3), (7, 4)
)
select distinct a, b
from t
where
    (a = any(array[1,2]) or array_length(array[1,2],1) is null)
    and
    (b = any(array[3,4]) or array_length(array[3,4],1) is null)
    or
    (a = any(array[]::int[]) or array_length(array[]::int[],1) is null)
    and
    (b = any(array[4,5,6]) or array_length(array[4,5,6],1) is null)
;
a | b
---+---
7 | 4
1 | 4
2 | 3
But I think I can do better like this
with t(a, b) as (
    values (1, 4), (2, 3), (1, 4), (7, 3), (7, 4)
), u as (
    select unnest(a)::text[] as a
    from (values
        (
            array[
                '{"{1,2}", "{3,4}"}',
                '{"{}", "{4,5,6}"}'
            ]::text[]
        )
    ) s(a)
), s as (
    select a[1]::int[] as a1, a[2]::int[] as a2
    from u
)
select distinct a, b
from
    t
    inner join
    s on
        (a = any(a1) or array_length(a1, 1) is null)
        and
        (b = any(a2) or array_length(a2, 1) is null)
;
a | b
---+---
7 | 4
2 | 3
1 | 4
Notice that a text array was passed and then "cast" inside the function. That was necessary because PostgreSQL can only deal with arrays of matching dimensions, and the passed inner arrays can vary in dimension. I could "fix" them before passing by padding with some special value like zero to make them all the same length as the longest one, but I think it is cleaner to deal with that inside the function.
Am I missing something? Is it the best approach?
I like your second approach.
SELECT DISTINCT t.*
FROM (VALUES (1, 4), (5, 1), (2, 3), (1, 4), (7, 3), (7, 4)) AS t(a, b)
JOIN (
SELECT arr[1]::int[] AS a1
,arr[2]::int[] AS b1
FROM (
SELECT unnest(ARRAY['{"{1,2}", "{3,4}"}'
,'{"{}" , "{4,5,6}"}'
,'{"{5}" , "{}"}' -- added element to 1st dimension
])::text[] AS arr -- 1d text array
) sub
) s ON (a = ANY(a1) OR a1 = '{}')
AND (b = ANY(b1) OR b1 = '{}')
;
Suggesting only minor improvements:
Subqueries instead of CTEs for slightly better performance.
Simplified test for empty array: checking against literal '{}' instead of function call.
One less subquery level for unwrapping the array.
Result:
a | b
--+---
2 | 3
7 | 4
1 | 4
5 | 1
For the casual reader: wrapping the multi-dimensional array of integers is necessary, since Postgres demands that (quoting the error message):
multidimensional arrays must have array expressions with matching dimensions
An alternate route would be to use a 2-dimensional text array and unnest it with generate_subscripts():
WITH a(arr) AS (SELECT '{{"{1,2}", "{3,4}"}
,{"{}", "{4,5,6}"}
,{"{5}", "{}"}}'::text[] -- 2d text array
)
SELECT DISTINCT t.*
FROM (VALUES (1, 4), (5, 1), (2, 3), (1, 4), (7, 3), (7, 4)) AS t(a, b)
JOIN (
SELECT arr[i][1]::int[] AS a1
,arr[i][2]::int[] AS b1
FROM a, generate_subscripts(a.arr, 1) i -- using implicit LATERAL
) s ON (t.a = ANY(s.a1) OR s.a1 = '{}')
AND (t.b = ANY(s.b1) OR s.b1 = '{}');
Might be faster, can you test?
In versions before 9.3 one would use an explicit CROSS JOIN instead of lateral cross joining.
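Since the stated goal is to pass this from a web application to a function, here is a hypothetical sketch of wrapping the approach in an SQL function; the function name match_pairs and the table t(a, b) are illustrative names of mine, not part of the question:

CREATE TABLE t (a int, b int);  -- illustrative target table

CREATE OR REPLACE FUNCTION match_pairs(pairs text[])
RETURNS TABLE (a int, b int) AS
$$
    SELECT DISTINCT t.a, t.b
    FROM t
    JOIN (
        -- unwrap the 1-d text array: one element per pair of inner arrays
        SELECT arr[1]::int[] AS a1, arr[2]::int[] AS b1
        FROM (SELECT unnest(pairs)::text[] AS arr) sub
    ) s ON (t.a = ANY(s.a1) OR s.a1 = '{}')
      AND (t.b = ANY(s.b1) OR s.b1 = '{}')
$$ LANGUAGE sql STABLE;

-- the web application then sends the wrapped array as a single parameter:
SELECT * FROM match_pairs(ARRAY['{"{1,2}", "{3,4}"}', '{"{}", "{4,5,6}"}']);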
