Find increasing and decreasing values in a column (Snowflake SQL)

I have a table in Snowflake like the one below, and I want to add a new column that shows whether each row is increasing or decreasing compared with the previous row:
id value
1 70
2 70
3 70
4 70
5 70
6 71
7 72
8 73
9 73
10 73
11 74
12 74
13 74
14 74
15 73
16 72
17 73
18 72
19 72
20 72
21 72
22 71
23 71
24 72
25 72
Expected output:
id value DESIRED_OUTPUT
1 70 INCREASING_ID1
2 70 INCREASING_ID2
3 70 INCREASING_ID3
4 70 INCREASING_ID4
5 70 INCREASING_ID5
6 71 INCREASING_ID6
7 72 INCREASING_ID7
8 73 INCREASING_ID8
9 73 INCREASING_ID9
10 73 INCREASING_ID10
11 74 INCREASING_ID11
12 74 INCREASING_ID12
13 74 INCREASING_ID13
14 74 INCREASING_ID14
15 73 DECREASING_ID15
16 72 DECREASING_ID16
17 73 INCREASING_ID17
18 72 DECREASING_ID18
19 72 DECREASING_ID19
20 72 DECREASING_ID20
21 72 DECREASING_ID21
22 71 DECREASING_ID22
23 71 DECREASING_ID23
24 72 INCREASING_ID24
25 72 INCREASING_ID25

This is a two-step process. First, compare each value with the previous one (using LAG) to get the direction: INCREASING, DECREASING, or NULL if the value did not change. Second, if a row's direction is NULL, fall back to the last non-null direction (NULL here meaning "no change in direction"):
with DIRECTION as
(
    select ID
          ,VALUE
          ,case
               when ID = 1 then 'INCREASING'
               when VALUE > lag(VALUE) over (order by id) then 'INCREASING'
               when VALUE < lag(VALUE) over (order by id) then 'DECREASING'
               else NULL
           end as OUTPUT
    from T1
)
select ID
      ,VALUE
      ,case when OUTPUT is null then
           lag(OUTPUT) ignore nulls over (order by ID) || '_ID' || ID
       else OUTPUT || '_ID' || ID
       end as OUTPUT
from DIRECTION
;
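The two steps can be traced outside Snowflake with a small plain-Python sketch. It assumes the rows arrive as (id, value) pairs, treats the first row in id order as INCREASING (the query keys this on ID = 1 instead), and uses a running "last non-null direction" in place of LAG(...) IGNORE NULLS. label_directions is a hypothetical helper name, not part of any library.

```python
from typing import List, Optional, Tuple


def label_directions(rows: List[Tuple[int, int]]) -> List[str]:
    """Label each (id, value) row as INCREASING_ID<n> / DECREASING_ID<n>."""
    labels: List[str] = []
    prev_value: Optional[int] = None
    last_direction: Optional[str] = None  # last non-null direction seen so far
    for row_id, value in sorted(rows):  # like "over (order by id)"
        # Step 1: direction versus the previous row; None means "no change".
        if prev_value is None:
            direction: Optional[str] = "INCREASING"  # first row: no predecessor
        elif value > prev_value:
            direction = "INCREASING"
        elif value < prev_value:
            direction = "DECREASING"
        else:
            direction = None
        # Step 2: a "no change" row inherits the last non-null direction,
        # which is what lag(...) ignore nulls supplies in the SQL.
        if direction is None:
            direction = last_direction
        else:
            last_direction = direction
        labels.append(f"{direction}_ID{row_id}")
        prev_value = value
    return labels
```

Feeding it the 25 sample rows from the question should reproduce the DESIRED_OUTPUT column.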

This is the same answer as Greg's; I have just written it out verbosely to show the workings.
First, a CTE for the data:
with data(id, value, DESIRED_OUTPUT) as (
select * from values
(1, 70, 'INCREASING_ID1'),
(2, 70, 'INCREASING_ID2'),
(3, 70, 'INCREASING_ID3'),
(4, 70, 'INCREASING_ID4'),
(5, 70, 'INCREASING_ID5'),
(6, 71, 'INCREASING_ID6'),
(7, 72, 'INCREASING_ID7'),
(8, 73, 'INCREASING_ID8'),
(9, 73, 'INCREASING_ID9'),
(10, 73, 'INCREASING_ID10'),
(11, 74, 'INCREASING_ID11'),
(12, 74, 'INCREASING_ID12'),
(13, 74, 'INCREASING_ID13'),
(14, 74, 'INCREASING_ID14'),
(15, 73, 'DECREASING_ID15'),
(16, 72, 'DECREASING_ID16'),
(17, 73, 'INCREASING_ID17'),
(18, 72, 'DECREASING_ID18'),
(19, 72, 'DECREASING_ID19'),
(20, 72, 'DECREASING_ID20'),
(21, 72, 'DECREASING_ID21'),
(22, 71, 'DECREASING_ID22'),
(23, 71, 'DECREASING_ID23'),
(24, 72, 'INCREASING_ID24'),
(25, 72, 'INCREASING_ID25')
)
Then two SELECTs, because the two LAGs cannot coexist at the same level (the second LAG works on the output of the first):
select *
    ,lag(change) ignore nulls over (order by id) as lag_change
    ,nvl(change, lag_change) || '_ID' || id as final_answer
    ,DESIRED_OUTPUT = final_answer as is_same
from (
    select *
        ,lag(value) over (order by id) as lag_v
        ,case
            when lag_v is null then 'INCREASING'
            when value < lag_v then 'DECREASING'
            when value > lag_v then 'INCREASING'
         end as change
    from data
)
order by 1;
ID VALUE DESIRED_OUTPUT LAG_V CHANGE LAG_CHANGE FINAL_ANSWER IS_SAME
1 70 INCREASING_ID1 null INCREASING null INCREASING_ID1 TRUE
2 70 INCREASING_ID2 70 null INCREASING INCREASING_ID2 TRUE
3 70 INCREASING_ID3 70 null INCREASING INCREASING_ID3 TRUE
4 70 INCREASING_ID4 70 null INCREASING INCREASING_ID4 TRUE
5 70 INCREASING_ID5 70 null INCREASING INCREASING_ID5 TRUE
6 71 INCREASING_ID6 70 INCREASING INCREASING INCREASING_ID6 TRUE
7 72 INCREASING_ID7 71 INCREASING INCREASING INCREASING_ID7 TRUE
8 73 INCREASING_ID8 72 INCREASING INCREASING INCREASING_ID8 TRUE
9 73 INCREASING_ID9 73 null INCREASING INCREASING_ID9 TRUE
10 73 INCREASING_ID10 73 null INCREASING INCREASING_ID10 TRUE
11 74 INCREASING_ID11 73 INCREASING INCREASING INCREASING_ID11 TRUE
12 74 INCREASING_ID12 74 null INCREASING INCREASING_ID12 TRUE
13 74 INCREASING_ID13 74 null INCREASING INCREASING_ID13 TRUE
14 74 INCREASING_ID14 74 null INCREASING INCREASING_ID14 TRUE
15 73 DECREASING_ID15 74 DECREASING INCREASING DECREASING_ID15 TRUE
16 72 DECREASING_ID16 73 DECREASING DECREASING DECREASING_ID16 TRUE
17 73 INCREASING_ID17 72 INCREASING DECREASING INCREASING_ID17 TRUE
18 72 DECREASING_ID18 73 DECREASING INCREASING DECREASING_ID18 TRUE
19 72 DECREASING_ID19 72 null DECREASING DECREASING_ID19 TRUE
20 72 DECREASING_ID20 72 null DECREASING DECREASING_ID20 TRUE
21 72 DECREASING_ID21 72 null DECREASING DECREASING_ID21 TRUE
22 71 DECREASING_ID22 72 DECREASING DECREASING DECREASING_ID22 TRUE
23 71 DECREASING_ID23 71 null DECREASING DECREASING_ID23 TRUE
24 72 INCREASING_ID24 71 INCREASING DECREASING INCREASING_ID24 TRUE
25 72 INCREASING_ID25 72 null INCREASING INCREASING_ID25 TRUE
And thus the final SQL can be:
select id, value
    ,nvl(change, lag(change) ignore nulls over (order by id)) || '_ID' || id as final_answer
from (
    select id, value
        ,lag(value) over (order by id) as lag_v
        ,case
            when lag_v is null then 'INCREASING'
            when value < lag_v then 'DECREASING'
            when value > lag_v then 'INCREASING'
         end as change
    from data
)
order by 1;
This is the same as Greg's, except that I rely on the implicit NULL result of the inner CASE (there is no ELSE branch) and use NVL instead of CASE in the outer SELECT; COALESCE would also work here.
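LAG(...) IGNORE NULLS is what turns the CHANGE column into LAG_CHANGE in the intermediate result above: for each row it returns the most recent non-null value from an earlier row. A minimal plain-Python emulation makes that look-back explicit. The helper names lag_ignore_nulls and nvl are hypothetical, and the sketch models only the default offset of 1 with no partitioning.

```python
from typing import List, Optional


def lag_ignore_nulls(values: List[Optional[str]]) -> List[Optional[str]]:
    """For each position, return the most recent non-null value strictly
    before it (None if there is none), like LAG(x) IGNORE NULLS."""
    result: List[Optional[str]] = []
    last_seen: Optional[str] = None
    for v in values:
        result.append(last_seen)  # look back only; the current row is excluded
        if v is not None:
            last_seen = v
    return result


def nvl(a: Optional[str], b: Optional[str]) -> Optional[str]:
    """Two-argument NVL / COALESCE: a unless it is null, otherwise b."""
    return a if a is not None else b


# The CHANGE column from the intermediate result (ids 1..25).
INC, DEC = "INCREASING", "DECREASING"
change = [INC, None, None, None, None, INC, INC, INC, None, None, INC, None,
          None, None, DEC, DEC, INC, DEC, None, None, None, DEC, None, INC, None]
lag_change = lag_ignore_nulls(change)
final_answer = [f"{nvl(c, l)}_ID{i}"
                for i, (c, l) in enumerate(zip(change, lag_change), start=1)]
```

Applying nvl(change, lag_change) row by row and appending '_ID' plus the id mirrors the FINAL_ANSWER column.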

Related

2D matrix to 3D matrix with row to [row, col] mapping

I have a 2D matrix whose first dimension is the channel and whose second dimension is the time sample. I want to rearrange it into a 3D matrix whose first and second dimensions index the channel grid and whose third dimension is the time sample.
The channels have to be mapped according to a given mapping. Right now I am using a for-loop to do this; what would a loop-free solution be?
N_samples = 1000;
N_channels = 64;
channel_mapping = reshape(1:64, [8 8]).';
% Results in mapping: (can also be random)
% 1 2 3 4 5 6 7 8
% 9 10 11 12 13 14 15 16
% 17 18 19 20 21 22 23 24
% 25 26 27 28 29 30 31 32
% 33 34 35 36 37 38 39 40
% 41 42 43 44 45 46 47 48
% 49 50 51 52 53 54 55 56
% 57 58 59 60 61 62 63 64
data = rand(N_channels, N_samples);
data_grid = NaN(8,8, N_samples);
for k = 1:N_samples
    tmp = data(:, k);
    data_grid(:, :, k) = tmp(channel_mapping);
end
You can do it in one go as follows:
data_grid = reshape(data(channel_mapping, :), 8, 8, []);
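The one-liner relies on MATLAB's column-major order: indexing the rows of data with the 8x8 channel_mapping visits the mapping column by column, and reshape refills the 8x8xN array in the same order, so element (i,j,k) ends up as data(channel_mapping(i,j),k). The plain-Python sketch below checks that equivalence against the loop; it uses 0-based indices and nested lists instead of MATLAB arrays, and the function names are hypothetical.

```python
from typing import List

Grid = List[List[List[int]]]


def loop_version(data: List[List[int]], mapping: List[List[int]]) -> Grid:
    """Per-sample gather, like the for-loop: grid[i][j][k] = data[mapping[i][j]][k]."""
    rows, cols, n_samples = len(mapping), len(mapping[0]), len(data[0])
    grid = [[[0] * n_samples for _ in range(cols)] for _ in range(rows)]
    for k in range(n_samples):
        tmp = [channel[k] for channel in data]  # tmp = data(:, k)
        for i in range(rows):
            for j in range(cols):
                grid[i][j][k] = tmp[mapping[i][j]]
    return grid


def one_go_version(data: List[List[int]], mapping: List[List[int]]) -> Grid:
    """Mimic reshape(data(channel_mapping, :), 8, 8, []) using column-major order."""
    rows, cols = len(mapping), len(mapping[0])
    # Indexing with a matrix walks it column by column (MATLAB linear order).
    linear = [mapping[i][j] for j in range(cols) for i in range(rows)]
    gathered = [data[c] for c in linear]  # data(channel_mapping, :)
    # Column-major reshape: row i + rows*j of `gathered` becomes grid(i, j, :).
    return [[list(gathered[i + rows * j]) for j in range(cols)] for i in range(rows)]
```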

SockMerchant Challenge Ruby Array#count not counting?

I'm doing a beginner challenge on HackerRank, and a strange behavior of Ruby is boggling my mind.
The challenge is: count how many pairs (of socks) there are in the array.
Here's my code.
n = 100
ar = %w(50 49 38 49 78 36 25 96 10 67 78 58 98 8 53 1 4 7 29 6 59 93 74 3 67 47 12 85 84 40 81 85 89 70 33 66 6 9 13 67 75 42 24 73 49 28 25 5 86 53 10 44 45 35 47 11 81 10 47 16 49 79 52 89 100 36 6 57 96 18 23 71 11 99 95 12 78 19 16 64 23 77 7 19 11 5 81 43 14 27 11 63 57 62 3 56 50 9 13 45)
def sockMerchant(n, ar)
  counter = 0
  ar.each do |item|
    if ar.count(item) >= 2
      counter += ar.count(item)/2
      ar.delete(item)
    end
  end
  counter
end
print sockMerchant(n, ar)
The problem is that it doesn't count correctly: after the function has run, the array ar still contains countable pairs, which I can show by running it again.
There's more: if you sort the array, it behaves differently.
It doesn't make sense to me.
You can check the behavior at this link:
https://repl.it/repls/HuskyFrighteningNaturallanguage
You're deleting items from a collection while iterating over it; expect bad things to happen. In short, don't do that if you want to avoid such problems. See:
> arr = [1,2,1]
# => [1, 2, 1]
> arr.each {|x| puts x; arr.delete(x) }
# 1
# => [2]
We never get the 2 in our iteration.
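Python lists show the same pitfall, since a for loop over a list also advances by position while the list shrinks underneath it. This is a Python analogy rather than Ruby, and delete_while_iterating is a hypothetical name:

```python
from typing import List, Tuple


def delete_while_iterating(arr: List[int]) -> Tuple[List[int], List[int]]:
    """Python analogue of the Ruby snippet: visit x, then delete every copy of x."""
    visited: List[int] = []
    for x in arr:  # the list iterator advances by position over the live list
        visited.append(x)
        while x in arr:
            arr.remove(x)  # shrinks the list underneath the iterator
    return visited, arr
```

With [1, 2, 1], removing both 1s leaves [2] at position 0 while the iterator moves on to position 1, so 2 is never visited. Iterating over a snapshot such as list(arr), or not mutating the list at all, avoids the skipped elements.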
A simple solution, which is a small variation of your code, could look like this:
def sock_merchant(ar)
  ar.uniq.sum do |item|
    ar.count(item) / 2
  end
end
This finds all unique socks and then counts the pairs for each of them.
Note that its complexity is O(n^2): for each unique element x of the array, you have to go through the whole array to find all elements equal to x.
An alternative is to first group all socks, then check how many pairs of each type we have:
ar.group_by(&:itself).sum { |k,v| v.size / 2 }
ar.group_by(&:itself), short for ar.group_by { |x| x.itself }, loops through the array and creates a hash that looks like this:
{"50"=>["50", "50"], "49"=>["49", "49", "49", "49"], "38"=>["38"], ...}
Calling sum on that hash then iterates over its key/value pairs and adds up v.size / 2 (integer division) for each group, i.e. the number of pairs of each kind.
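The group-then-count idea maps directly onto Python's collections.Counter, which builds the value-to-count table in a single pass. The sketch below is a Python counterpart of the Ruby group_by version, not a Ruby API:

```python
from collections import Counter
from typing import Hashable, Iterable


def sock_merchant(ar: Iterable[Hashable]) -> int:
    """Group equal socks in one pass, then add up the pairs of each kind."""
    return sum(count // 2 for count in Counter(ar).values())
```

For HackerRank's sample input [10, 20, 20, 10, 10, 30, 50, 10, 20] this gives 3 (two pairs of 10 and one pair of 20). Like the group_by version, it runs in roughly linear time instead of O(n^2).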

How do I sum up my data into 4 rows?

Select
AvHours.LineNumber,
(SProd.PoundsMade / (AvHours.AvailableHRS - SUM (ProdDtime.DownTimeHRS))) AS Throughput,
SUM (ProdDtime.DownTimeHRS) AS [Lost Time],
(SUM(cast(ProdDtime.DownTimeHRS AS decimal(10,1))) * 100) / (cast(AvHours.AvailableHRS AS decimal(10,1))) AS [%DownTime],
SUM(SProd.PoundsMade) AS [Pounds Made],
(SProd.PoundsMade / (AvHours.AvailableHRS - SUM (ProdDtime.DownTimeHRS))) * SUM (ProdDtime.DownTimeHRS) AS [Pounds Lost]
FROM rpt_Line_Shift_AvailableHrs AvHours
inner join rpt_Line_Shift_Prod SProd on
AvHours.LineNumber=SProd.LineNumber AND AvHours.Shiftnumber=SProd.Shiftnumber
inner join rpt_Line_Shift_ProdDownTime ProdDtime on
(AvHours.LineNumber=ProdDtime.LineNumber AND AvHours.Shiftnumber=ProdDtime.Shiftnumber)
GROUP BY AvHours.LineNumber,SProd.PoundsMade,AvHours.AvailableHRS
ORDER BY AvHours.LineNumber
The query above gives the following result set:
Line#,Throughput,Lost Time, %downtime,Pounds Made,Pounds Lost
1 53 49 27.222222 97538 2597
1 44 39 20.312500 116229 1716
1 47 40 22.222222 92190 1880
1 55 31 16.145833 133215 1705
1 111 49 27.222222 204442 5439
1 13 31 16.145833 33540 403
1 86 49 27.222222 159432 4214
1 81 31 16.145833 197145 2511
1 74 40 22.222222 146202 2960
1 63 49 27.222222 115920 3087
1 76 39 20.312500 199172 2964
2 64 40 22.222222 126028 2560
2 149 49 27.222222 273966 7301
2 35 39 20.312500 92616 1365
3 49 39 20.312500 129591 1911
3 65 40 22.222222 129248 2600
3 84 39 20.312500 219997 3276
4 95 31 16.145833 229485 2945
4 76 40 22.222222 149996 3040
4 94 31 16.145833 228375 2914
4 99 39 20.312500 259794 3861
What I actually want is just 4 lines (Line# = 1,2,3 or 4) and all the other fields summed.
I'm not sure how to do it. Can anybody help?
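Get rid of PoundsMade and AvailableHrs in your GROUP BY; it sounds like you only want to group by LineNumber. (Any column you remove from the GROUP BY must then be wrapped in an aggregate wherever it appears in the SELECT.)
Alternatively, you can use your query as a derived table and then group by the line number, as in the SQL below: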
Select LineNumber, Sum(Throughput) AS Throughput, Sum([Lost Time]) AS [Lost Time], Sum([%DownTime]) AS [%DownTime], Sum([Pounds Made]) AS [Pounds Made], Sum([Pounds Lost]) AS [Pounds Lost]
From
(Select
AvHours.LineNumber,
(SProd.PoundsMade / (AvHours.AvailableHRS - SUM (ProdDtime.DownTimeHRS))) AS Throughput,
SUM (ProdDtime.DownTimeHRS) AS [Lost Time],
(SUM(cast(ProdDtime.DownTimeHRS AS decimal(10,1))) * 100) / (cast(AvHours.AvailableHRS AS decimal(10,1))) AS [%DownTime],
SUM(SProd.PoundsMade) AS [Pounds Made],
(SProd.PoundsMade / (AvHours.AvailableHRS - SUM (ProdDtime.DownTimeHRS))) * SUM (ProdDtime.DownTimeHRS) AS [Pounds Lost]
FROM rpt_Line_Shift_AvailableHrs AvHours
inner join rpt_Line_Shift_Prod SProd on
AvHours.LineNumber=SProd.LineNumber AND AvHours.Shiftnumber=SProd.Shiftnumber
inner join rpt_Line_Shift_ProdDownTime ProdDtime on
(AvHours.LineNumber=ProdDtime.LineNumber AND AvHours.Shiftnumber=ProdDtime.Shiftnumber)
GROUP BY AvHours.LineNumber,SProd.PoundsMade,AvHours.AvailableHRS
) A
Group BY LineNumber
ORDER BY LineNumber
I don't have SQL Server available right now to test this, but let me know if you run into any issues.
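The outer query in this answer is just a GROUP BY LineNumber with SUM over the rows of the inner result. A plain-Python sketch of that aggregation over an already-fetched result set looks like this; it assumes rows are tuples in the column order shown in the question, and sum_by_line is a hypothetical name. Note that summing ratio columns such as Throughput or %DownTime adds up per-shift ratios rather than recomputing an overall ratio, in the SQL as well as here.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def sum_by_line(rows: Sequence[Sequence[float]]) -> Dict[int, List[float]]:
    """rows: (line, throughput, lost_time, pct_downtime, pounds_made, pounds_lost).
    Returns {line: [column sums]}, like SUM(...) ... GROUP BY LineNumber."""
    totals: Dict[int, List[float]] = defaultdict(lambda: [0] * 5)
    for line, *metrics in rows:
        for idx, value in enumerate(metrics):
            totals[line][idx] += value  # one running sum per metric column
    return dict(totals)
```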

Create 3-dimensional array from 2 dimensional array in matlab

I would like to know how to generate a 3-D array from a 2-D array in MATLAB. My lack of understanding may simply come from not knowing the correct nomenclature.
I have a 2-dimensional array or matrix, A:
A = [12, 62, 93, -8, 22; 16, 2, 87, 43, 91; -4, 17, -72, 95, 6]
and I would like to add a 3rd dimension with the same values such that:
A(:,:,1) = 12 62 93 -8 22
16 2 87 43 91
-4 17 -72 95 6
and
A(:,:,2) = 12 62 93 -8 22
16 2 87 43 91
-4 17 -72 95 6
and so on, up to
A(:,:,p) = 12 62 93 -8 22
16 2 87 43 91
-4 17 -72 95 6
How would I do this most efficiently? I might have a much larger array, A(m,n,p) with m = 100, n = 50 and p = 1000.
Try
result = reshape(repmat(A,1,p),m,n,p)
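The same column-major reasoning explains this one-liner: repmat(A,1,p) places p copies of A side by side (an m x (n*p) matrix), and reshape to m x n x p cuts that wide matrix back into n-column pages. A plain-Python sketch of the index arithmetic, using 0-based nested lists indexed as result[i][j][k] (tile_3d is a hypothetical name):

```python
from typing import List


def tile_3d(A: List[List[int]], p: int) -> List[List[List[int]]]:
    """Mimic reshape(repmat(A, 1, p), m, n, p) on nested lists (0-based)."""
    m, n = len(A), len(A[0])
    wide = [row * p for row in A]  # repmat(A, 1, p): m x (n*p), copies side by side
    # Column-major reshape: column j + n*k of `wide` becomes column j of page k.
    return [[[wide[i][j + n * k] for k in range(p)] for j in range(n)]
            for i in range(m)]
```

In MATLAB itself, repmat(A,[1 1 p]) should produce the same array directly, without the intermediate reshape.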

Multiple array items appending

This is the reproducible code:
a <- rep(1, 20)
a[c(1, 12, 15)] <- 0
b <- which(a == 0)
set.seed(123)
d <- round(runif(17) * 100)
I would like to append 0s to d to get the following result:
[1] 0 29 79 41 88 94 5 53 89 55 46 0 96 45 0 68 57 10 90 25
that is, d with a 0 inserted after each element whose index is b - 1 (so the zeros end up at the positions in b).
As far as I can see, append() accepts only a single "after" value, not several.
How can I do this?
Please keep in mind that I cannot change the length of d, because it is supposed to be the output of a fairly long function, not a simple random sequence as in this example.
You can use negative subscripting to assign the non-zero elements to a new vector.
D <- numeric(length(c(b,d)))
D[-b] <- d
D
# [1] 0 29 79 41 88 94 5 53 89 55 46 0 96 45 0 68 57 10 90 25
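Negative subscripting fills every position except those in b, in order, while the positions in b keep the zeros that numeric() starts with. The same idea in plain Python, converting the 1-based positions in b to 0-based indices (insert_zeros is a hypothetical name):

```python
from typing import List


def insert_zeros(d: List[int], b: List[int]) -> List[int]:
    """Mimic R's  D <- numeric(length(c(b, d))); D[-b] <- d  (b is 1-based)."""
    D = [0] * (len(b) + len(d))                 # numeric(...) starts as all zeros
    zero_slots = {pos - 1 for pos in b}         # convert 1-based R positions
    free = [i for i in range(len(D)) if i not in zero_slots]  # D[-b]
    if len(free) != len(d):
        raise ValueError("b must hold distinct, in-range positions")
    for i, value in zip(free, d):               # fill the remaining slots in order
        D[i] = value
    return D
```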
