SQL Grouping columns by row - sql-server

I am struggling with a SQL query to convert my data into the required format. I have an event log for machines and would like to put the start time, stop time, and outcome of each event in the same row.
I cannot use LAG because of the version of SQL Server I am on. Any help appreciated.
Current dataset:
+----------+----------+------------+------------------------------+---------------------+
| MACHINE | EVENT_ID | EVENT_CODE | DATE_TIME | EVENT_DESCRIPTOR |
+----------+----------+------------+------------------------------+---------------------+
| 1 | 1 | 1 | 2020-08-06 14:59:26 | SCAN : START : z1 : |
| 1 | 2 | 6 | 2020-08-06 15:00:18 | SCAN : END : z1 : |
| 1 | 3 | 1 | 2020-08-06 15:00:45 | SCAN : START : z1 : |
| 1 | 4 | 5 | 2020-08-06 15:01:54 | SCAN : ABORT : z1 : |
| 2 | 5 | 1 | 2020-08-06 15:02:15 | SCAN : START : z1 : |
| 2 | 6 | 6 | 2020-08-06 15:05:07 | SCAN : END : z1 : |
| 1 | 7 | 1 | 2020-08-06 15:05:13 | NEST : START : z1 : |
| 1 | 8 | 6 | 2020-08-06 15:05:22 | NEST : END : z1 : |
| 1 | 9 | 1 | 2020-08-06 15:07:17 | CUT : START : z1 : |
| 1 | 10 | 6 | 2020-08-06 15:10:40 | CUT : END : z1 : |
+----------+----------+------------+------------------------------+---------------------+
The outcome I am trying to achieve:
+----------+------------------------------+------------------------------+----------+
| Machine | SCAN:START:Z1 _TIME | SCAN:STOP_OR_ABORT:Z1 _TIME | OUTCOME |
+----------+------------------------------+------------------------------+----------+
| 1 | 2020-08-06 14:59:26 | 2020-08-06 15:00:18 | END |
| 1 | 2020-08-06 15:00:45 | 2020-08-06 15:01:54 | ABORT |
| 2 | 2020-08-06 15:02:15 | 2020-08-06 15:05:07 | END |
+----------+------------------------------+------------------------------+----------+

You can select the starting events and join them to the matching ending events with a subquery, for example in the form of an outer apply. For each SCAN start, the subquery picks the first later END (6) or ABORT (5) event on the same machine.
select L1.Machine, L1.Date_Time as Start, L2.Date_Time as Stop_Or_Abort,
       case L2.Event_Code when 5 then 'ABORT' when 6 then 'END' end as Outcome
from MyLogs L1
-- first END/ABORT event logged after this start on the same machine
outer apply (select top 1 L2.Date_Time, L2.Event_Code
             from MyLogs L2
             where L2.Machine = L1.Machine and
                   L2.Event_ID > L1.Event_ID and
                   L2.Event_Code in (5, 6)
             order by L2.Event_ID) as L2
where L1.Event_Descriptor like 'SCAN%' and
      L1.Event_Code = 1

You can use an Outer Apply to get the next record after the start, filtering only the SCAN events.
Select Machine = L.MACHINE,
       [SCAN:START:Z1 _TIME] = L.DATE_TIME,
       [SCAN:STOP_OR_ABORT:Z1 _TIME] = E.DATE_TIME,
       Outcome = Case when E.EVENT_CODE = 5 then 'ABORT'
                      when E.EVENT_CODE = 6 then 'END'
                 End
From Logs L
Outer Apply
(
    Select top 1 L1.DATE_TIME,
                 L1.EVENT_CODE
    From Logs L1
    where L1.MACHINE = L.MACHINE and
          L1.EVENT_CODE in (5, 6) and
          L1.DATE_TIME > L.DATE_TIME
    order by L1.EVENT_ID
) E
where L.EVENT_CODE = 1
  and L.EVENT_DESCRIPTOR like 'SCAN%'
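
To make the pairing logic easier to follow outside the database, here is a minimal Python sketch of the same idea: for every SCAN start, take the first later END or ABORT event on the same machine. The rows are copied from the question's dataset; the names Event, OUTCOMES and pair_scans are illustrative assumptions, not part of either answer.

```python
from typing import NamedTuple


class Event(NamedTuple):
    machine: int
    event_id: int
    event_code: int
    date_time: str
    descriptor: str


# Sample rows copied from the question's dataset.
EVENTS = [
    Event(1, 1, 1, "2020-08-06 14:59:26", "SCAN : START : z1 :"),
    Event(1, 2, 6, "2020-08-06 15:00:18", "SCAN : END : z1 :"),
    Event(1, 3, 1, "2020-08-06 15:00:45", "SCAN : START : z1 :"),
    Event(1, 4, 5, "2020-08-06 15:01:54", "SCAN : ABORT : z1 :"),
    Event(2, 5, 1, "2020-08-06 15:02:15", "SCAN : START : z1 :"),
    Event(2, 6, 6, "2020-08-06 15:05:07", "SCAN : END : z1 :"),
    Event(1, 7, 1, "2020-08-06 15:05:13", "NEST : START : z1 :"),
    Event(1, 8, 6, "2020-08-06 15:05:22", "NEST : END : z1 :"),
    Event(1, 9, 1, "2020-08-06 15:07:17", "CUT : START : z1 :"),
    Event(1, 10, 6, "2020-08-06 15:10:40", "CUT : END : z1 :"),
]

# Event codes that close a started event, mapped to the outcome label.
OUTCOMES = {5: "ABORT", 6: "END"}


def pair_scans(events):
    """Pair each SCAN start with the next END/ABORT event on the same machine."""
    ordered = sorted(events, key=lambda e: e.event_id)
    rows = []
    for start in ordered:
        if start.event_code != 1 or not start.descriptor.startswith("SCAN"):
            continue
        # Counterpart of the "top 1 ... order by Event_ID" subquery.
        end = next(
            (e for e in ordered
             if e.machine == start.machine
             and e.event_id > start.event_id
             and e.event_code in OUTCOMES),
            None,
        )
        rows.append((
            start.machine,
            start.date_time,
            end.date_time if end else None,       # None mirrors the NULL from outer apply
            OUTCOMES[end.event_code] if end else None,
        ))
    return rows


for row in pair_scans(EVENTS):
    print(row)
```

A start with no later END or ABORT event keeps its row with empty stop and outcome values, which is the behaviour outer apply gives (cross apply would drop such rows).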

Related

Multi-dimensional data structure management in R

I have a question about data organisation and the best way to simplify some multi-layered data. I have 10 replicate small wood beams (BeamID, ~10), each subjected to 10 different treatments (TreatID, ~10). Each beam is load tested, which produces a series of Load values with the corresponding Displacement (10 to 50 rows per test; I have code that corrects for disparities in row count). Each wood beam is tested multiple times (Rep, ~10).
My plan was to lump all this data into a 5-D array:
Array[Load, Deflection, BeamID, TreatID, Rep]
This way, I should be able to plot the load~deflection curves for a given BeamID, TreatID, for all Reps by using Array[ , ,1,1, ], right? So the hypothetical output for Array[ , ,1,1,1], would be:
+------------+--------+-----+
| Deflection | Load | Rep |
+------------+--------+-----+
| 0 | 0 | 1 |
| 6.35 | 10.5 | 1 |
| 12.7 | 20.8 | 1 |
| 19.05 | 45.3 | 1 |
| 25.4 | 75.2 | 1 |
+------------+--------+-----+
And Array[ , ,1,1,2] would be:
+------------+--------+-----+
| Deflection | Load | Rep |
+------------+--------+-----+
| 0 | 0 | 2 |
| 7.3025 | 12.075 | 2 |
| 14.605 | 23.92 | 2 |
| 21.9075 | 52.095 | 2 |
| 29.21 | 86.48 | 2 |
+------------+--------+-----+
Or I think I could keep it as a simpler, 'melted' dataframe, which would have columns for Load and Deflection, and BeamID, TreatID, and Rep would be repeated for each row of the test output.
+------------+--------+-----+--------+---------+
| Deflection | Load | Rep | BeamID | TreatID |
+------------+--------+-----+--------+---------+
| 0 | 0 | 1 | 1 | 1 |
| 6.35 | 10.5 | 1 | 1 | 1 |
| 12.7 | 20.8 | 1 | 1 | 1 |
| 19.05 | 45.3 | 1 | 1 | 1 |
| 25.4 | 75.2 | 1 | 1 | 1 |
| 0 | 0 | 2 | 1 | 1 |
| 7.3025 | 12.075 | 2 | 1 | 1 |
| 14.605 | 23.92 | 2 | 1 | 1 |
| 21.9075 | 52.095 | 2 | 1 | 1 |
| 29.21 | 86.48 | 2 | 1 | 1 |
+------------+--------+-----+--------+---------+
However, with the latter, I'm not sure how I could easily and cleanly pull out all the Rep test values for a specific BeamID and TreatID, especially since I use a linear model to fit a 3rd-order polynomial to a specific test to extract the slope of the curve. Having it as one continuous dataframe means I'd have to specify starting and stopping rows for the linear model, correct?
Thoughts, suggestions? Am I headed in the right direction in using a 5-D array? R is a new programming language for me, so please pardon my misunderstandings.
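
As a rough illustration of the 'melted' layout, the sketch below (in Python rather than R, using the rows from the table above) groups the long table by its key columns, so each test's curve can be pulled out by (BeamID, TreatID, Rep) without tracking start or stop row positions. The names ROWS and curves_by_test are hypothetical; in R the same idea is usually expressed by subsetting or splitting on the key columns.

```python
from collections import defaultdict

# Rows copied from the melted table: (Deflection, Load, Rep, BeamID, TreatID).
ROWS = [
    (0, 0, 1, 1, 1),
    (6.35, 10.5, 1, 1, 1),
    (12.7, 20.8, 1, 1, 1),
    (19.05, 45.3, 1, 1, 1),
    (25.4, 75.2, 1, 1, 1),
    (0, 0, 2, 1, 1),
    (7.3025, 12.075, 2, 1, 1),
    (14.605, 23.92, 2, 1, 1),
    (21.9075, 52.095, 2, 1, 1),
    (29.21, 86.48, 2, 1, 1),
]


def curves_by_test(rows):
    """Collect (Deflection, Load) points per (BeamID, TreatID, Rep) key."""
    curves = defaultdict(list)
    for deflection, load, rep, beam_id, treat_id in rows:
        curves[(beam_id, treat_id, rep)].append((deflection, load))
    return dict(curves)


curves = curves_by_test(ROWS)
# All points of BeamID 1, TreatID 1, Rep 2, selected by key rather than by row range.
print(curves[(1, 1, 2)])
```

Each value in the result is one complete load-deflection curve, so a per-test model fit can simply loop over the groups.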

How do you make a table into one long row in SAS?

I have a table with a number of variables such as:
+-----------+------------+---------+-----------+--------+
| DateFrom | DateTo | Price | Discount | Cost |
+-----------+------------+---------+-----------+--------+
| 01jan17 | 01jul17 | 17 | 4 | 5 |
| 01aug17 | 01feb18 | 15 | 1 | 3 |
| 01mar18 | 01dec18 | 12 | 2 | 1 |
| ... | ... | ... | ... | ... |
+-----------+------------+---------+-----------+--------+
However I want to split this so I have:
+------------+------------+----------+-------------+---------+-------------+------------+----------+-------------+-------------+
| DateFrom1 | DateTo1 | Price1 | Discount1 | Cost1 | DateFrom2 | DateTo2 | Price2 | Discount2 | Cost2 ... |
+------------+------------+----------+-------------+---------+-------------+------------+----------+-------------+-------------+
| 01jan17 | 01jul17 | 17 | 4 | 5 | 01aug17 | 01feb18 | 15 | 1 | 3 |
+------------+------------+----------+-------------+---------+-------------+------------+----------+-------------+-------------+
There's a cool (not at all obvious) solution using proc summary and the idgroup option of the output statement that only takes a few lines of code. It runs in memory, so you are likely to run into problems if the dataset is large; otherwise it works very well.
Note that the 3 in out[3] is the number of rows in the source data. You could easily make this dynamic by adding a prior step that counts the rows and stores the count in a macro variable.
/* create initial dataset */
data have;
input (DateFrom DateTo) (:date7.) Price Discount Cost;
format DateFrom DateTo date7.;
datalines;
01jan17 01jul17 17 4 5
01aug17 01feb18 15 1 3
01mar18 01dec18 12 2 1
;
run;
/* transform data into 1 row */
proc summary data=have nway;
output out=want (drop=_:)
idgroup(out[3] (_all_)=) / autoname;
run;
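
To show what the transformation does independently of SAS, here is a small Python sketch that flattens the same three rows into one wide row, numbering each column by its source row as in the question's desired layout. The names ROWS and to_single_row are illustrative assumptions, and the numbering scheme follows the question rather than whatever names autoname generates.

```python
# Rows copied from the question's table.
ROWS = [
    {"DateFrom": "01jan17", "DateTo": "01jul17", "Price": 17, "Discount": 4, "Cost": 5},
    {"DateFrom": "01aug17", "DateTo": "01feb18", "Price": 15, "Discount": 1, "Cost": 3},
    {"DateFrom": "01mar18", "DateTo": "01dec18", "Price": 12, "Discount": 2, "Cost": 1},
]


def to_single_row(rows):
    """Flatten a list of rows into one row with suffixed column names."""
    wide = {}
    for index, row in enumerate(rows, start=1):
        for name, value in row.items():
            # DateFrom in row 2 becomes DateFrom2, and so on.
            wide[f"{name}{index}"] = value
    return wide


wide = to_single_row(ROWS)
print(wide)
```

Because the loop runs over however many rows are present, the row count does not need to be known in advance, which is the role the macro variable would play in the SAS version.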

Pivot tables with string as value for aggregate function

I have this table of stage-wise activities and their corresponding flags.
STAGE | ACTIVITY | FLAG | STATUS |
S1 | A1 |S1_A1_FLAG | ST_S1_A1 |
S1 | A2 |S1_A2_FLAG | ST_S1_A2 |
: : : :
SN | A1 |SN_A1_FLAG | ST_SN_A1 |
SN | A2 |SN_A2_FLAG | ST_SN_A2 |
: | : | : | : |
SN | AN |SN_AN_FLAG | ST_SN_AN |
I have to create a view that will have following structure...
STAGE| A1 | A2 | ... | AN |
---------------------------------------------------
S1 |S1_A1_FLAG| S1_A2_FLAG | ... | S1_AN_FLAG |
| ST_S1_A1 | ST_S1_A2 | ... | ST_S1_AN |
----------------------------------------------------
S2 |S2_A1_FLAG| S2_A2_FLAG | ... | S2_AN_FLAG |
| ST_S2_A1 | ST_S2_A2 | ... | ST_S2_AN |
----------------------------------------------------
: : : : :
----------------------------------------------------
SN |SN_A1_FLAG| SN_A2_FLAG | ... | SN_AN_FLAG |
| ST_SN_A1 | ST_SN_A2 | ... | ST_SN_AN |
Here 'flag' and 'status' are string values and have to be displayed in a single cell.
Also, "hard-coded" case-when statements are not a viable option, as there are a few hundred stages, each containing at least a dozen activities.
As I am new to pivots in SQL, any help would be welcome.
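Concatenate the flag and status into a single string and pivot on that. MIN works as the aggregate here because each stage and activity pair has only one value.
SELECT *
FROM (
    SELECT Stage, Activity, Flag + ' ' + [Status] AS AStatus
    FROM tblStage
) tb
PIVOT
(
    MIN(AStatus)
    FOR Activity IN ([A1], [A2])
) p;
The activity list in the IN clause is hard-coded; to cover every activity it would have to be built with dynamic SQL.
See also: pivoting on dynamic columns, and pivoting on varchar values.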
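
The following Python sketch illustrates the shape of the result when the activity columns are discovered from the data instead of being listed by hand. The sample rows follow the question's naming pattern but are made up for illustration, and the names ROWS and pivot_flags are hypothetical.

```python
# Illustrative rows following the question's naming pattern:
# (STAGE, ACTIVITY, FLAG, STATUS). S2 deliberately has no A2 row.
ROWS = [
    ("S1", "A1", "S1_A1_FLAG", "ST_S1_A1"),
    ("S1", "A2", "S1_A2_FLAG", "ST_S1_A2"),
    ("S2", "A1", "S2_A1_FLAG", "ST_S2_A1"),
]


def pivot_flags(rows):
    """Pivot to one row per stage with one 'FLAG STATUS' cell per activity."""
    # Discover the column list from the data (the dynamic part).
    activities = sorted({activity for _, activity, _, _ in rows})
    table = {}
    for stage, activity, flag, status in rows:
        # Start every stage with an empty cell for each activity.
        cells = table.setdefault(stage, dict.fromkeys(activities))
        cells[activity] = f"{flag} {status}"
    return activities, table


activities, table = pivot_flags(ROWS)
print(activities)
for stage, cells in table.items():
    print(stage, cells)
```

Missing stage and activity combinations stay empty (None), which corresponds to the NULL cells a pivot produces.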

Hive Query: working with String Array

I have a Hive table with the following schema:
hive>desc books;
gen_id int
author array<string>
rating double
genres array<string>
hive>select * from books;
| gen_id | rating | author |genres
+----------------+-------------+---------------+----------
| 1 | 10 | ["A","B"] | ["X","Y"]
| 2 | 20 | ["C","A"] | ["Z","X"]
| 3 | 30 | ["D"] | ["X"]
Is there a SELECT query that returns each array element as an individual row, like this:
| gen_id | rating | SplitData
+-------------+---------------+-------------
| 1 | 10 | "A"
| 1 | 10 | "B"
| 1 | 10 | "X"
| 1 | 10 | "Y"
| 2 | 20 | "C"
| 2 | 20 | "A"
| 2 | 20 | "Z"
| 2 | 20 | "X"
| 3 | 30 | "D"
| 3 | 30 | "X"
Can someone guide me on how to get this result? Thanks in advance for any help.
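You need LATERAL VIEW with explode. Chaining two lateral views in one query produces a cross product of authors and genres, which duplicates values, so explode each array separately and combine the results with UNION ALL, i.e.
SELECT
    gen_id,
    rating,
    SplitData
FROM (
    -- one row per author
    SELECT gen_id, rating, ex_author AS SplitData
    FROM books
    LATERAL VIEW explode(books.author) exploded_authors AS ex_author
    UNION ALL
    -- one row per genre
    SELECT gen_id, rating, ed_genres AS SplitData
    FROM books
    LATERAL VIEW explode(books.genres) exploded_genres AS ed_genres
) tab;
I have not had a chance to test it, but it should show you the general approach. Note that the row order is not guaranteed without an ORDER BY. Good luck!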
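
For reference, the desired output lists each author and then each genre exactly once per book. A minimal Python sketch of that row expansion, using the three rows from the question (the names BOOKS and split_rows are illustrative assumptions), looks like this:

```python
# Rows copied from the question's books table.
BOOKS = [
    {"gen_id": 1, "rating": 10, "author": ["A", "B"], "genres": ["X", "Y"]},
    {"gen_id": 2, "rating": 20, "author": ["C", "A"], "genres": ["Z", "X"]},
    {"gen_id": 3, "rating": 30, "author": ["D"], "genres": ["X"]},
]


def split_rows(books):
    """Yield one (gen_id, rating, value) row per author and per genre."""
    for book in books:
        # Concatenate the two arrays so each element appears exactly once.
        for value in book["author"] + book["genres"]:
            yield (book["gen_id"], book["rating"], value)


for row in split_rows(BOOKS):
    print(row)
```

The key point is that the two arrays are appended to each other rather than combined pairwise, so a book with two authors and two genres yields four rows, not eight.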

What's the idiomatic way to split a Smalltalk array at the spot where a series of values changes?

Given an array of domain objects (with the properties subject, trial and run) like this:
+---------+-------+-----+
| Subject | Trial | Run |
+---------+-------+-----+
| 1 | 1 | 1 |
| 1 | 2 | 1 |
| 1 | 3 | 2 |
| 1 | 4 | 2 |
| 2 | 1 | 1 |
| 2 | 2 | 1 |
| 1 | 1 | 1 |
| 1 | 2 | 1 |
+---------+-------+-----+
I want to split it into multiple arrays at every point where the value for subject changes.
The above example should result in three arrays:
+---------+-------+-----+
| Subject | Trial | Run |
+---------+-------+-----+
| 1 | 1 | 1 |
| 1 | 2 | 1 |
| 1 | 3 | 2 |
| 1 | 4 | 2 |
+---------+-------+-----+
+---------+-------+-----+
| 2 | 1 | 1 |
| 2 | 2 | 1 |
+---------+-------+-----+
+---------+-------+-----+
| 1 | 1 | 1 |
| 1 | 2 | 1 |
+---------+-------+-----+
What would be the idiomatic Smalltalk (Pharo) way to split the array like this?
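SequenceableCollection >> piecesCutWhere:, which takes a binary block, is your friend:
{ 1. 1. 2. 2. 2. 3. 1. 2. } piecesCutWhere: [:left :right | left ~= right]
=> an OrderedCollection #(1 1) #(2 2 2) #(3) #(1) #(2)
For the domain objects in the question, the block would compare subjects instead, e.g. [:left :right | left subject ~= right subject], assuming each object responds to #subject.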
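
For readers more familiar with other languages, the same "cut wherever neighbouring values differ" idea can be sketched in Python with itertools.groupby, which groups consecutive runs of equal keys. The rows are the (Subject, Trial, Run) values from the question; the names TRIALS and split_on_subject_change are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# (Subject, Trial, Run) rows copied from the question.
TRIALS = [
    (1, 1, 1),
    (1, 2, 1),
    (1, 3, 2),
    (1, 4, 2),
    (2, 1, 1),
    (2, 2, 1),
    (1, 1, 1),
    (1, 2, 1),
]


def split_on_subject_change(rows):
    """Split rows into consecutive runs that share the same subject."""
    # groupby only merges adjacent equal keys, so the later run of
    # subject 1 stays separate from the first one.
    return [list(run) for _, run in groupby(rows, key=itemgetter(0))]


for piece in split_on_subject_change(TRIALS):
    print(piece)
```

This yields three pieces for the sample data, matching the three arrays the question asks for.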
