Filling missing rows, RIGHT JOIN with each group of GROUP BY

One table holds the Id and Name of the 10 tests which should be done.
A second table holds the product SN, the TestDate, and the Id of each test that has been done on that product.
I need to find and show the tests which should have been done but were not.
A solution with CROSS JOIN and LEFT OUTER JOIN works for 1000 rows, but for 8000-15000 rows it takes a long time (1-3 minutes).
The data are prepared by a CTE query.
Example below.
I want to add the "missing" rows to each group of #Table2.
#Table1 => four tests which should be done
number - Id of test
data3 - name of test
#Table2 => tests which were done
data1 - id of tested device
GROUP => tests of one device
CREATE TABLE #table1 (data3 NVARCHAR(20), number INT)
CREATE TABLE #table2 (data1 NVARCHAR(20), data2 NVARCHAR(20), number INT)
INSERT INTO #table1
SELECT 'xx', 1
UNION ALL
SELECT 'ee', 2
UNION ALL
SELECT 'zz', 3
UNION ALL
SELECT 'gg', 4
INSERT INTO #table2
SELECT '1', 'aaaaaaaaaa', 1 --GROUP 1
UNION ALL
SELECT '1', 'aaaaaaaaaa', 2 --GROUP 1
UNION ALL
SELECT '1', 'aaaaaaaaaa', 3 --GROUP 1
UNION ALL
SELECT '2', 'bbbbbbbbbb', 1 --GROUP 2
UNION ALL
SELECT '2', 'bbbbbbbbbb', 2 --GROUP 2
UNION ALL
SELECT '3', 'cccccccccc', 1 --GROUP 3
UNION ALL
SELECT '3', 'cccccccccc', 3 --GROUP 3
With this query only one row is added (the first one), but I need to fill in each group of table2, where a group is e.g. GROUP BY data1, data2:
SELECT *
FROM #table2 t2
RIGHT JOIN #table1 t1 ON t2.number = t1.number
ORDER BY t2.data1, t1.number
Output:
data1 data2 number data3 number
-----------------------------------------
NULL NULL NULL gg 4
1 aaaaaaaaaa 1 xx 1
1 aaaaaaaaaa 2 ee 2
1 aaaaaaaaaa 3 zz 3
2 bbbbbbbbbb 1 xx 1
2 bbbbbbbbbb 2 ee 2
3 cccccccccc 3 zz 3
3 cccccccccc 1 xx 1
This is my required output (although only one 'number' column would also work)
data1 data2 number number3
-----------------------------------------
1 aaaaaaaaaa 1 1 --GROUP 1
1 aaaaaaaaaa 2 2 --GROUP 1
1 aaaaaaaaaa 3 3 --GROUP 1
NULL NULL NULL 4 --GROUP 1
2 bbbbbbbbbb 1 1 --GROUP 2
2 bbbbbbbbbb 2 2 --GROUP 2
NULL NULL NULL 3 --GROUP 2
NULL NULL NULL 4 --GROUP 2
3 cccccccccc 1 1 --GROUP 3
NULL NULL NULL 2 --GROUP 3
3 cccccccccc 3 3 --GROUP 3
NULL NULL NULL 4 --GROUP 3

Why are your data1 values NULL for the missing rows? I guess they should be filled in from table2. Try this query:
;with cte as (
select
distinct a.data1, b.number, b.data3
from
#table2 a
cross join #table1 b
)
select
c.data1, t.data2, t.number, c.data3, number3 = c.number
from
cte c
left join #table2 t on c.data1 = t.data1 and c.number = t.number
Output
data1 data2 number data3 number3
---------------------------------------------
1 aaaaaaaaaa 1 xx 1
1 aaaaaaaaaa 2 ee 2
1 aaaaaaaaaa 3 zz 3
1 NULL NULL gg 4
2 bbbbbbbbbb 1 xx 1
2 bbbbbbbbbb 2 ee 2
2 NULL NULL zz 3
2 NULL NULL gg 4
3 cccccccccc 1 xx 1
3 NULL NULL ee 2
3 cccccccccc 3 zz 3
3 NULL NULL gg 4
If you really need NULL values in the data1 column, add a CASE expression that checks the value of data2.
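You can try the CROSS JOIN + LEFT JOIN idea without a SQL Server instance. Below is a minimal sketch that uses Python's built-in sqlite3 module. It is a stand-in, not T-SQL: SQLite does not use `#` temp-table names, so the plain tables table1 and table2 take the place of #table1 and #table2. An ORDER BY is also added so that the rows come out grouped per device.

```python
import sqlite3


def fill_missing_tests():
    """Return one row per (device, test); tests not done have NULL data2/number."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE table1 (data3 TEXT, number INTEGER);              -- tests to do
        CREATE TABLE table2 (data1 TEXT, data2 TEXT, number INTEGER);  -- tests done
        INSERT INTO table1 VALUES ('xx', 1), ('ee', 2), ('zz', 3), ('gg', 4);
        INSERT INTO table2 VALUES
            ('1', 'aaaaaaaaaa', 1), ('1', 'aaaaaaaaaa', 2), ('1', 'aaaaaaaaaa', 3),
            ('2', 'bbbbbbbbbb', 1), ('2', 'bbbbbbbbbb', 2),
            ('3', 'cccccccccc', 1), ('3', 'cccccccccc', 3);
    """)
    rows = con.execute("""
        WITH cte AS (
            -- every device paired with every test it should have
            SELECT DISTINCT a.data1, b.number, b.data3
            FROM table2 a CROSS JOIN table1 b
        )
        SELECT c.data1, t.data2, t.number, c.data3, c.number AS number3
        FROM cte c
        LEFT JOIN table2 t ON c.data1 = t.data1 AND c.number = t.number
        ORDER BY c.data1, c.number
    """).fetchall()
    con.close()
    return rows


for row in fill_missing_tests():
    print(row)
```

The tests that were not done are exactly the rows where t.number is NULL. Adding WHERE t.number IS NULL to the outer query therefore lists only the missing tests.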

Related

How to convert multiple column values into rows in hive?

Input:
ID COLUMN1 COLUMN2 COLUMN3
1 M,S,E,T 1,2,3,4 5,6,7
2 A,B,C 6,5,8,7,9,1 2,4,3,0,1
Output:
ID COLUMN1 COLUMN2 COLUMN3
1 M 1 5
1 S 2 6
1 E 3 7
1 T 4 NULL
2 A 6 2
2 B 5 4
2 C 8 3
2 NULL 7 0
2 NULL 9 1
2 NULL 1 NULL
Code:
select ID,
array_index( COLUMN1_arr, n ) as COLUMN1,
array_index( COLUMN2_arr, n ) as COLUMN2
from sample
lateral view numeric_range(size(COLUMN1_arr)) n1 as n;
Error:
FAILED: Semantic Exception [Error 10011]: Invalid function array_index
Here I have multiple values in a single column, and I need to convert them to rows as shown in Output.

explode is a UDTF provided by Hive; you can use it to split data from columns into rows.
SELECT ID, col1, col2, col3
FROM tableName
lateral view explode(split(COLUMN1,',')) cols1 AS col1
lateral view explode(split(COLUMN2,',')) cols2 AS col2
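lateral view explode(split(COLUMN3,',')) cols3 AS col3
Note that each lateral view is applied to every row produced by the ones before it. This query therefore returns the cross product of the three lists (4 * 4 * 3 = 48 rows for ID 1) rather than pairing the values by position.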
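Chained lateral views multiply rows. Each LATERAL VIEW is applied to every row produced by the previous one, so the result behaves like a nested loop over the exploded lists. Below is a small illustration of that behaviour with itertools.product. It is plain Python standing in for Hive, run on the sample input from the question, and the helper name chained_explode is hypothetical.

```python
from itertools import product


def chained_explode(rows):
    """Mimic three chained LATERAL VIEW explode(split(...)) clauses."""
    out = []
    for row_id, c1, c2, c3 in rows:
        # every element of COLUMN1 meets every element of COLUMN2 and COLUMN3
        for v1, v2, v3 in product(c1.split(","), c2.split(","), c3.split(",")):
            out.append((row_id, v1, v2, v3))
    return out


sample = [
    (1, "M,S,E,T", "1,2,3,4", "5,6,7"),
    (2, "A,B,C", "6,5,8,7,9,1", "2,4,3,0,1"),
]
result = chained_explode(sample)
print(len(result))  # 4*4*3 + 3*6*5 = 48 + 90 = 138
```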

A plain vanilla Hive solution, without Brickhouse UDFs.
Demo:
with
input as ( ---------------Input dataset
select stack(2,
1, array('M','S','E','T'), array(1,2,3,4), array(5,6,7),
2, array('A','B','C'), array(6,5,8,7,9,1), array( 2,4,3,0,1)
) as (ID,COLUMN1,COLUMN2,COLUMN3)
),
--explode each array and FULL join them
c1 as (
select i.id, v.column1, p
from input i
lateral view posexplode(i.COLUMN1) v as p,column1
),
c2 as (
select i.id, v.column2, p
from input i
lateral view posexplode(i.COLUMN2) v as p,column2
),
c3 as (
select i.id, v.column3, p
from input i
lateral view posexplode(i.COLUMN3) v as p,column3
)
--FULL JOIN
select coalesce(c1.id,c2.id,c3.id) id, c1.column1, c2.column2, c3.column3
from c1
full join c2 on c1.id=c2.id and c1.p=c2.p
full join c3 on nvl(c1.id,c2.id)=c3.id and nvl(c1.p,c2.p)=c3.p --note NVL usage
;
Result:
OK
id column1 column2 column3
1 M 1 5
1 S 2 6
1 E 3 7
1 T 4 NULL
2 A 6 2
2 B 5 4
2 C 8 3
2 NULL 7 0
2 NULL 9 1
2 NULL 1 NULL
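The posexplode + FULL JOIN approach pairs elements that share the same position p and fills the gaps with NULL. The same positional alignment can be sketched in plain Python with itertools.zip_longest. This illustrates the logic only and is not a Hive query; None plays the role of NULL, and explode_by_position is a hypothetical helper name.

```python
from itertools import zip_longest


def explode_by_position(rows):
    """rows: (ID, COLUMN1, COLUMN2, COLUMN3) tuples with comma-separated strings."""
    out = []
    for row_id, c1, c2, c3 in rows:
        lists = [c.split(",") for c in (c1, c2, c3)]
        # zip_longest pairs items by position and pads the shorter lists
        # with None, like the FULL JOIN on (id, p) in the Hive answer
        for v1, v2, v3 in zip_longest(*lists):
            out.append((row_id, v1, v2, v3))
    return out


sample = [
    (1, "M,S,E,T", "1,2,3,4", "5,6,7"),
    (2, "A,B,C", "6,5,8,7,9,1", "2,4,3,0,1"),
]
for row in explode_by_position(sample):
    print(row)
```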

How to sum a column in SQL Server recursive cte for optimization?

I have the following table with hierarchical data:
FolderId ParentFolderId NumberOfAffectedItems
---------------------------------------------
1 NULL 2
2 1 3
3 2 5
4 2 3
5 1 0
I want to find the number of affected items under each folder and all of its children. I can write a recursive CTE that produces the following result, and after that I can get what I want with a GROUP BY.
Normal recursive CTE:
WITH FolderTree AS
(
SELECT
fsa.FolderId AS ParentFolderId,
fsa.FolderId AS ChildFolderId,
fsa.NumberOfReportsAffected
FROM
FoldersWithNumberOfReportsAffected fsa
UNION ALL
SELECT
ft.ParentFolderId,
fsa.FolderId AS ChildFolderId,
fsa.NumberOfReportsAffected
FROM
FoldersWithNumberOfReportsAffected fsa
INNER JOIN
FolderTree ft ON fsa.ParentFolderId = ft.ChildFolderId
)
SELECT ParentFolderId, ChildFolderId, NumberOfReportsAffected
FROM FolderTree
Result:
ParentFolderId ChildFolderId NumberOfAffectedItems
--------------------------------------------------
1 1 2
1 2 3
1 3 5
1 4 3
1 5 0
2 2 3
2 3 5
2 4 3
3 3 5
4 4 3
5 5 0
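The "recursive CTE, then GROUP BY" approach can be checked against the sample data with Python's sqlite3 module. This is a sketch under a few assumptions: SQLite spells the CTE WITH RECURSIVE (T-SQL uses plain WITH), and the table and column names are taken from the question's query.

```python
import sqlite3


def folder_totals():
    """Recursive CTE over the sample folders, then GROUP BY ParentFolderId."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE FoldersWithNumberOfReportsAffected ("
                "FolderId INTEGER, ParentFolderId INTEGER, "
                "NumberOfReportsAffected INTEGER)")
    con.executemany(
        "INSERT INTO FoldersWithNumberOfReportsAffected VALUES (?, ?, ?)",
        [(1, None, 2), (2, 1, 3), (3, 2, 5), (4, 2, 3), (5, 1, 0)],
    )
    rows = con.execute("""
        WITH RECURSIVE FolderTree AS (
            -- anchor: every folder is paired with itself
            SELECT fsa.FolderId AS ParentFolderId,
                   fsa.FolderId AS ChildFolderId,
                   fsa.NumberOfReportsAffected
            FROM FoldersWithNumberOfReportsAffected fsa
            UNION ALL
            -- step: add the children of every folder already in the tree
            SELECT ft.ParentFolderId, fsa.FolderId, fsa.NumberOfReportsAffected
            FROM FoldersWithNumberOfReportsAffected fsa
            JOIN FolderTree ft ON fsa.ParentFolderId = ft.ChildFolderId
        )
        SELECT ParentFolderId, SUM(NumberOfReportsAffected) AS Total
        FROM FolderTree
        GROUP BY ParentFolderId
        ORDER BY ParentFolderId
    """).fetchall()
    con.close()
    return rows


print(folder_totals())  # [(1, 13), (2, 11), (3, 5), (4, 3), (5, 0)]
```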
But I want to optimize it: I want to start from each leaf child and compute NumberOfAffectedItems while moving up through the CTE itself.
Expected CTE
WITH FolderTree AS
(
SELECT
fsa.FolderId AS LeafChildId,
fsa.FolderId AS ParentFolderId,
fsa.NumberOfReportsAffected
FROM
FoldersWithNumberOfReportsAffected fsa
LEFT JOIN
FoldersWithNumberOfReportsAffected f ON fsa.folderid = f.ParentfolderId
WHERE
f.ParentfolderId is null -- this is finding leaf child
UNION ALL
SELECT
ft.LeafChildId,
fsa.FolderId AS ParentFolderId,
fsa.NumberOfReportsAffected + ft.NumberOfReportsAffected AS [ComputedResult]
FROM
FoldersWithNumberOfReportsAffected fsa
INNER JOIN
FolderTree ft ON fsa.FolderId = ft.ParentFolderId
)
Result:
LeafChildId ParentFolderId ComputedNumberOfAffectedItems
---------------------------------------------------------
3 3 5
3 2 8
3 1 10
4 4 3
4 2 5
4 1 7
5 5 0
5 1 2
If I group by ParentFolderId, I get a wrong result. While the CTE computes, the same parent folder is visited from multiple children, so its items are counted more than once. Is there any way to compute the result while going through the CTE itself?
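One way to avoid the double counting is to compute each folder's total exactly once, from its own items plus its children's totals. Below is a minimal Python sketch of that idea. It uses a depth-first, post-order traversal, which is a different technique from the recursive CTE, and subtree_totals is a hypothetical helper name. For very deep trees, Python's recursion limit would call for an explicit stack instead.

```python
def subtree_totals(rows):
    """rows: (FolderId, ParentFolderId, NumberOfAffectedItems) tuples."""
    items = {}
    children = {}
    for folder, parent, n in rows:
        items[folder] = n
        children.setdefault(parent, []).append(folder)

    totals = {}

    def visit(folder):
        # own items plus the already-computed totals of the children,
        # so every folder is summed exactly once
        totals[folder] = items[folder] + sum(visit(c) for c in children.get(folder, []))
        return totals[folder]

    for root in children.get(None, []):  # folders whose ParentFolderId is NULL
        visit(root)
    return totals


data = [(1, None, 2), (2, 1, 3), (3, 2, 5), (4, 2, 3), (5, 1, 0)]
print(subtree_totals(data))
```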

Please check the following solution. I used your CTE as a basis and added the calculation to it (as column x):
CREATE TABLE #t(
FolderID INT
,ParentFolderID INT
,NumberOfAffectedItems INT
);
INSERT INTO #t VALUES (1 ,NULL ,2)
,(2 ,1 ,3)
,(3 ,2 ,5)
,(4 ,2 ,3)
,(5 ,1 ,0);
WITH FolderTree AS
(
SELECT 1 AS lvl,
fsa.FolderId AS LeafChildId,
fsa.ParentFolderId AS ParentFolderId,
fsa.NumberOfAffectedItems
FROM
#t fsa
LEFT JOIN
#t f ON fsa.folderid = f.ParentfolderId
WHERE
f.ParentfolderId is null -- this is finding leaf child
UNION ALL
SELECT lvl + 1,
ft.LeafChildId,
fsa.ParentFolderId,
fsa.NumberOfAffectedItems
FROM
FolderTree ft
INNER JOIN #t fsa
ON fsa.FolderId = ft.ParentFolderId
)
SELECT LeafChildId,
ISNULL(ParentFolderId, LeafChildId) ParentFolderId,
NumberOfAffectedItems,
SUM(NumberOfAffectedItems) OVER (PARTITION BY LeafChildId ORDER BY ISNULL(ParentFolderId, LeafChildId) DESC) AS x
FROM FolderTree
ORDER BY 1, 2 DESC
OPTION (MAXRECURSION 0)
Result:
LeafChildId ParentFolderId NumberOfAffectedItems x
3 3 2 2
3 2 5 7
3 1 3 10
4 4 2 2
4 2 3 5
4 1 3 8
5 5 2 2
5 1 0 2
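The running total in column x comes from SUM() OVER (PARTITION BY LeafChildId ORDER BY ... DESC), which adds up the items along each leaf-to-root path. The sketch below replays the answer's query in SQLite through Python's sqlite3 module, assuming SQLite 3.25 or newer (needed for window functions). Three things differ from the T-SQL: COALESCE replaces ISNULL, WITH RECURSIVE replaces WITH, and the COALESCE moves into a derived table so the window can order by it.

```python
import sqlite3


def leaf_running_sums():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (FolderID INTEGER, ParentFolderID INTEGER, "
                "NumberOfAffectedItems INTEGER)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?)",
                    [(1, None, 2), (2, 1, 3), (3, 2, 5), (4, 2, 3), (5, 1, 0)])
    rows = con.execute("""
        WITH RECURSIVE FolderTree AS (
            -- anchor: leaf folders (no row names them as its parent)
            SELECT fsa.FolderID AS LeafChildId,
                   fsa.ParentFolderID AS ParentFolderId,
                   fsa.NumberOfAffectedItems
            FROM t fsa
            LEFT JOIN t f ON fsa.FolderID = f.ParentFolderID
            WHERE f.ParentFolderID IS NULL
            UNION ALL
            -- step: move one level up towards the root
            SELECT ft.LeafChildId, fsa.ParentFolderID, fsa.NumberOfAffectedItems
            FROM FolderTree ft
            JOIN t fsa ON fsa.FolderID = ft.ParentFolderId
        )
        SELECT LeafChildId, p AS ParentFolderId, NumberOfAffectedItems,
               SUM(NumberOfAffectedItems) OVER (
                   PARTITION BY LeafChildId ORDER BY p DESC) AS x
        FROM (SELECT LeafChildId,
                     COALESCE(ParentFolderId, LeafChildId) AS p,
                     NumberOfAffectedItems
              FROM FolderTree) AS ft2
        ORDER BY LeafChildId, p DESC
    """).fetchall()
    con.close()
    return rows


for row in leaf_running_sums():
    print(row)
```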

Performance issue with CTE SQL Server query

We have a table with a parent child relationship, that represents a deep tree structure.
We are using a view with a CTE to query the data but the performance is poor (see code and execution plan below).
Is there any way we can improve the performance?
WITH cte (ParentJobTypeId, Id) AS
(
SELECT
Id, Id
FROM
dbo.JobTypes
UNION ALL
SELECT
e.Id, cte.Id
FROM
cte
INNER JOIN
dbo.JobTypes AS e ON e.ParentJobTypeId = cte.ParentJobTypeId
)
SELECT
ISNULL(Id, 0) AS ParentJobTypeId,
ISNULL(ParentJobTypeId, 0) AS Id
FROM
cte

A quick example of using range keys. As I mentioned before, my hierarchies had 127K points, and some sections were 15 levels deep.
The CTE builds the hierarchy; let's assume the hierarchy results will be stored in a table (indexed as well).
Create Table #Table (ID int,ParentID int,[Status] varchar(50))
Insert #Table values
(1,101,'Pending'),
(2,101,'Complete'),
(3,101,'Complete'),
(4,102,'Complete'),
(101,null,null),
(102,null,null)
;With cteOH (ID,ParentID,Lvl,Seq)
as (
Select ID,ParentID,Lvl=1,cast(Format(ID,'000000') + '/' as varchar(500)) from #Table where ParentID is null
Union All
Select h.ID,h.ParentID,cteOH.Lvl+1,Seq=cast(cteOH.Seq + Format(h.ID,'000000') + '/' as varchar(500)) From #Table h INNER JOIN cteOH ON h.ParentID = cteOH.ID
),
cteR1 as (Select ID,Seq,R1=Row_Number() over (Order by Seq) From cteOH),
cteR2 as (Select A.ID,R2 = max(B.R1) From cteOH A Join cteR1 B on (B.Seq Like A.Seq+'%') Group By A.ID)
Select B.R1
,C.R2
,A.Lvl
,A.ID
,A.ParentID
Into #TempHier
From cteOH A
Join cteR1 B on (A.ID=B.ID)
Join cteR2 C on (A.ID=C.ID)
Select * from #TempHier
Select H.R1
,H.R2
,H.Lvl
,H.ID
,H.ParentID
,Total = count(*)
,Complete = sum(case when D.Status = 'Complete' then 1 else 0 end)
,Pending = sum(case when D.Status = 'Pending' then 1 else 0 end)
,PctCmpl = format(sum(case when D.Status = 'Complete' then 1.0 else 0.0 end)/count(*),'##0.00%')
From #TempHier H
Join (Select _R1=B.R1,A.* From #Table A Join #TempHier B on A.ID=B.ID) D on D._R1 between H.R1 and H.R2
Group By H.R1
,H.R2
,H.Lvl
,H.ID
,H.ParentID
Order By 1
This returns the hierarchy in a #Temp table for now. Notice R1 and R2; I call these the range keys. Data can be selected and aggregated via these keys without recursion.
R1 R2 Lvl ID ParentID
1 4 1 101 NULL
2 2 2 1 101
3 3 2 2 101
4 4 2 3 101
5 6 1 102 NULL
6 6 2 4 102
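The R1/R2 values are a pre-order numbering. R1 is a node's position when the tree is walked depth-first with children sorted by ID; the zero-padded Seq string produces the same order, assuming IDs fit in six digits. R2 is the largest R1 inside the node's subtree. Below is a minimal Python sketch of that numbering over the sample rows (range_keys is a hypothetical helper name).

```python
def range_keys(rows):
    """rows: (ID, ParentID) pairs; returns {ID: (R1, R2, Lvl)}."""
    children = {}
    for node, parent in rows:
        children.setdefault(parent, []).append(node)

    keys = {}
    counter = 0

    def visit(node, lvl):
        nonlocal counter
        counter += 1                       # R1: position in the depth-first walk
        r1 = counter
        for child in sorted(children.get(node, [])):
            visit(child, lvl + 1)
        keys[node] = (r1, counter, lvl)    # R2: last R1 used inside the subtree

    for root in sorted(children.get(None, [])):  # ParentID IS NULL
        visit(root, 1)
    return keys


sample = [(1, 101), (2, 101), (3, 101), (4, 102), (101, None), (102, None)]
for node, (r1, r2, lvl) in sorted(range_keys(sample).items(), key=lambda kv: kv[1]):
    print(r1, r2, lvl, node)
```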
A very simple example that illustrates rolling the data up the hierarchy:
R1 R2 Lvl ID ParentID Total Complete Pending PctCmpl
1 4 1 101 NULL 4 2 1 50.00%
2 2 2 1 101 1 0 1 0.00%
3 3 2 2 101 1 1 0 100.00%
4 4 2 3 101 1 1 0 100.00%
5 6 1 102 NULL 2 1 0 50.00%
6 6 2 4 102 1 1 0 100.00%
The real beauty of the range keys is that if you know an ID, you know where it sits in the hierarchy (all of its descendants and ancestors).
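Concretely, a node's descendants are the rows whose R1 lies after its own R1 and no later than its R2. Its ancestors are the rows whose range contains its R1. Both are plain range filters with no recursion. Below is a small Python sketch that uses the R1/R2 values and statuses from the example above; the helper names are hypothetical. rollup counts the node itself plus its descendants, matching the BETWEEN join in the answer.

```python
# (R1, R2) per ID, copied from the #TempHier output
KEYS = {101: (1, 4), 1: (2, 2), 2: (3, 3), 3: (4, 4), 102: (5, 6), 4: (6, 6)}
STATUS = {1: "Pending", 2: "Complete", 3: "Complete", 4: "Complete"}


def descendants(node):
    """IDs whose R1 falls inside node's (R1, R2] range."""
    r1, r2 = KEYS[node]
    return sorted(n for n, (k1, _k2) in KEYS.items() if r1 < k1 <= r2)


def ancestors(node):
    """IDs whose (R1, R2] range contains node's R1."""
    r1, _r2 = KEYS[node]
    return sorted(n for n, (k1, k2) in KEYS.items() if k1 < r1 <= k2)


def rollup(node):
    """(Total, Complete, Pending) over the node and its descendants."""
    members = [node] + descendants(node)
    complete = sum(STATUS.get(n) == "Complete" for n in members)
    pending = sum(STATUS.get(n) == "Pending" for n in members)
    return len(members), complete, pending


print(descendants(101), ancestors(2), rollup(101))  # [1, 2, 3] [101] (4, 2, 1)
```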

A group by challenge

Let's say I have this table MyTbl
Record Id_try Id Type IsOk DateOk
1 1 MYDB00125 A 0 NULL
2 1 MYDB00125 B 1 2012-07-19 20:10:05.000
3 1 MYDB00125 A 0 2012-07-25 14:10:05.000
4 2 MYDB00125 A 0 2012-07-19 22:10:05.000
5 1 MYDB00254 B 0 2012-07-19 22:10:05.000
6 1 MYDB00254 A 0 NULL
7 3 MYDB00125 A 1 2012-07-19 22:15:05.000
8 3 MYDB00125 B 1 2012-07-19 22:42:53.000
9 1 MYDB00323 A 1 2012-07-22 00:15:05.000
10 1 MYDB00323 C 0 NULL
I want a GROUP BY that returns, for each Id and Type, my last "Id_try" record.
SELECT Id, MAX(Id_Try), MyTbl.Type, IsOK, MAX(DateOk) from MyTbl
GROUP BY Id, MyTbl.Type, IsOK
This won't do, because it returns the last Id_try AND the last date (the date of record 3 in the example). I don't care whether it's the last date or not; I need the date of the last Id_try.
Can this only be solved with a subselect, or could a HAVING clause do it?
This is the expected result:
Record Id_try Id Type IsOk DateOk
5 1 MYDB00254 B 0 2012-07-19 22:10:05.000
6 1 MYDB00254 A 0 NULL
7 3 MYDB00125 A 1 2012-07-19 22:15:05.000
8 3 MYDB00125 B 1 2012-07-19 22:42:53.000
9 1 MYDB00323 A 1 2012-07-22 00:15:05.000
10 1 MYDB00323 B 0 NULL

I think you will need to break this into two pieces:
with maxIDTry as
(
SELECT MAX(Id_try) as maxId, ID, Type
FROM MyTbl
GROUP BY ID, Type
)
SELECT * FROM MyTbl as mt
INNER JOIN maxIDTry as m
ON mt.id_try = m.maxId AND mt.id = m.id AND mt.Type = m.Type

I think you want this:
select * FROM
(
select *, row_number() over (partition by id,type order by Id_try desc) as position from mytbl
) foo
where position = 1
order by record
http://www.sqlfiddle.com/#!3/95742/5
Your sample result set lists
9 1 MYDB00323 A 1 2012-07-22 00:15:05.000
10 1 MYDB00323 A 0 NULL
But that doesn't make sense, since both rows have the same Id and the same Id_try. I assume you meant Id_try to be 2? Otherwise I think my results match up.
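ROW_NUMBER() restarts at 1 for every (Id, Type) partition, so position = 1 keeps exactly one row per pair: the one with the highest Id_try. The sketch below runs the same query against the question's sample rows with Python's sqlite3 module, assuming SQLite 3.25 or newer for window functions. If two rows share the top Id_try, which one wins is arbitrary unless you add a tie-breaker to the ORDER BY.

```python
import sqlite3

ROWS = [
    (1, 1, "MYDB00125", "A", 0, None),
    (2, 1, "MYDB00125", "B", 1, "2012-07-19 20:10:05.000"),
    (3, 1, "MYDB00125", "A", 0, "2012-07-25 14:10:05.000"),
    (4, 2, "MYDB00125", "A", 0, "2012-07-19 22:10:05.000"),
    (5, 1, "MYDB00254", "B", 0, "2012-07-19 22:10:05.000"),
    (6, 1, "MYDB00254", "A", 0, None),
    (7, 3, "MYDB00125", "A", 1, "2012-07-19 22:15:05.000"),
    (8, 3, "MYDB00125", "B", 1, "2012-07-19 22:42:53.000"),
    (9, 1, "MYDB00323", "A", 1, "2012-07-22 00:15:05.000"),
    (10, 1, "MYDB00323", "C", 0, None),
]


def last_try_per_id_type():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE mytbl (Record INTEGER, Id_try INTEGER, Id TEXT, "
                "Type TEXT, IsOk INTEGER, DateOk TEXT)")
    con.executemany("INSERT INTO mytbl VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    rows = con.execute("""
        SELECT Record, Id_try, Id, Type, IsOk, DateOk
        FROM (
            -- number the rows of each (Id, Type) from the highest Id_try down
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY Id, Type
                                      ORDER BY Id_try DESC) AS position
            FROM mytbl
        ) foo
        WHERE position = 1
        ORDER BY Record
    """).fetchall()
    con.close()
    return rows


for row in last_try_per_id_type():
    print(row)
```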

Hope this helps.
SELECT A.Record, A.Id_try, A.Id, A.Type, A.IsOk, A.DateOk
FROM MyTbl A INNER JOIN (
SELECT MAX(Id_Try) Id_Try, Id, B1.Type
from MyTbl B1
GROUP BY Id, B1.Type) AS B
ON A.Id_Try = B.Id_Try AND A.Id = B.Id AND A.Type = B.Type
ORDER BY A.RECORD

How can I calculate time duration between two rows of a column in SQL Server?

I have data like this in the database:
ID ServerDownTime ServerStatus
--- ----------------------- ------------
1 2012-03-30 00:00:00.000 1
2 2012-03-30 00:30:00.000 0
3 2012-03-30 01:00:00.000 0
4 2012-03-30 01:30:00.000 0
5 2012-03-30 02:00:00.000 1
6 2012-03-30 02:30:00.000 1
7 2012-03-30 03:00:00.000 0
8 2012-03-30 03:30:00.000 1
I need a query or stored procedure that gives me output like this:
StartTime EndTime TotalDownTimeInMinutes
------------ ------------ ----------------------
3/30/12 0:30 3/30/12 2:00 90
3/30/12 3:00 3/30/12 3:30 30

-- because each "back up" can relate to multiple "down" times,
-- we take the longest period using MIN
SELECT Min(ServerDownTime) StartTime,
UpTime EndTime,
DateDiff(MI, Min(ServerDownTime), UpTime) AS TotalDownTimeInMinutes
FROM
(
SELECT Down.ServerDownTime,
(-- subquery gives you the time when it came back up
SELECT Top 1 Up.ServerDownTime
FROM Tbl Up
WHERE Up.ServerDownTime > Down.ServerDownTime
AND Up.ServerStatus=1
ORDER BY Up.ServerDownTime ASC) UpTime
FROM Tbl Down
WHERE Down.ServerStatus=0 -- find all the downs
) X
GROUP BY UpTime
ORDER BY UpTime
You can test the above query using this DDL
create table Tbl
(
ID int,
ServerDownTime datetime,
ServerStatus bit
)
insert Tbl select
1 ,'2012-03-30 00:00:00.000', 1 union all select
2 ,'2012-03-30 00:30:00.000', 0 union all select
3 ,'2012-03-30 01:00:00.000', 0 union all select
4 ,'2012-03-30 01:30:00.000', 0 union all select
5 ,'2012-03-30 02:00:00.000', 1 union all select
6 ,'2012-03-30 02:30:00.000', 1 union all select
7 ,'2012-03-30 03:00:00.000', 0 union all select
8 ,'2012-03-30 03:30:00.000', 1
Or if you're on the web and nowhere near a SQL Server, here's an SQL Fiddle
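The same correlated-subquery idea can be run with Python's sqlite3 module as a quick check. This sketch makes two SQLite substitutions that are not part of the answer: LIMIT 1 replaces TOP 1, and the minute difference is computed from strftime('%s', ...) epoch seconds, because SQLite has no DATEDIFF.

```python
import sqlite3


def downtime_periods():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Tbl (ID INTEGER, ServerDownTime TEXT, ServerStatus INTEGER)")
    con.executemany("INSERT INTO Tbl VALUES (?, ?, ?)", [
        (1, "2012-03-30 00:00:00", 1), (2, "2012-03-30 00:30:00", 0),
        (3, "2012-03-30 01:00:00", 0), (4, "2012-03-30 01:30:00", 0),
        (5, "2012-03-30 02:00:00", 1), (6, "2012-03-30 02:30:00", 1),
        (7, "2012-03-30 03:00:00", 0), (8, "2012-03-30 03:30:00", 1),
    ])
    rows = con.execute("""
        SELECT MIN(ServerDownTime) AS StartTime,
               UpTime AS EndTime,
               (strftime('%s', UpTime) - strftime('%s', MIN(ServerDownTime))) / 60
                   AS TotalDownTimeInMinutes
        FROM (
            SELECT d.ServerDownTime,
                   -- first "up" row after this "down" row
                   (SELECT u.ServerDownTime
                    FROM Tbl u
                    WHERE u.ServerDownTime > d.ServerDownTime AND u.ServerStatus = 1
                    ORDER BY u.ServerDownTime
                    LIMIT 1) AS UpTime
            FROM Tbl d
            WHERE d.ServerStatus = 0
        ) X
        GROUP BY UpTime
        ORDER BY UpTime
    """).fetchall()
    con.close()
    return rows


for row in downtime_periods():
    print(row)
```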

This solution is based on recursive CTEs:
CREATE TABLE #MyTable (
ID INT PRIMARY KEY,
ServerDownTime DATETIME NOT NULL,
UNIQUE (ServerDownTime),
ServerStatus BIT NOT NULL
);
INSERT #MyTable (ID, ServerDownTime, ServerStatus)
SELECT 1,'2012-03-30T00:00:00',1 UNION ALL
SELECT 2,'2012-03-30T00:30:00',0 UNION ALL
SELECT 3,'2012-03-30T01:00:00',0 UNION ALL
SELECT 4,'2012-03-30T01:30:00',0 UNION ALL
SELECT 5,'2012-03-30T02:00:00',1 UNION ALL
SELECT 6,'2012-03-30T02:30:00',1 UNION ALL
SELECT 7,'2012-03-30T03:00:00',0 UNION ALL
SELECT 8,'2012-03-30T03:30:00',1;
WITH Base
AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY t.ServerDownTime) AS RowNum
FROM #MyTable t
), DownTimeGrouping
AS
(
SELECT crt.RowNum,
crt.ID,
crt.ServerDownTime,
crt.ServerStatus,
CASE WHEN crt.ServerStatus=0 THEN 1 END AS GroupID,
CASE WHEN crt.ServerStatus=0 THEN 1 ELSE 0 END AS LastGroupID
FROM Base crt
WHERE crt.RowNum=1
UNION ALL
SELECT crt.RowNum,
crt.ID,
crt.ServerDownTime,
crt.ServerStatus,
CASE
WHEN prev.ServerStatus=0 AND crt.ServerStatus IN(0,1) THEN prev.GroupID
WHEN prev.ServerStatus=1 AND crt.ServerStatus=0 THEN prev.LastGroupID+1
END AS GroupID,
CASE
WHEN prev.ServerStatus=0 AND crt.ServerStatus IN(0,1) THEN prev.GroupID
WHEN prev.ServerStatus=1 AND crt.ServerStatus=0 THEN prev.LastGroupID+1
WHEN prev.ServerStatus=1 AND crt.ServerStatus=1 THEN prev.LastGroupID
END AS LastGroupID
FROM Base crt
INNER JOIN DownTimeGrouping prev ON crt.RowNum=prev.RowNum+1
)
SELECT *, DATEDIFF(MINUTE,x.StartTime,x.EndTime) AS MinutesDiff
FROM (
SELECT t.GroupID, MIN(t.ServerDownTime) AS StartTime, MAX(t.ServerDownTime) AS EndTime
FROM DownTimeGrouping t
WHERE t.GroupID IS NOT NULL
GROUP BY t.GroupID
) x
OPTION (MAXRECURSION 0);
The basic idea is to group the rows, starting with a ServerStatus=0 row and ending with a ServerStatus=1 row. For example, if you run this query you will see the downtime groups (column GroupID):
WITH Base
AS
(...), DownTimeGrouping
AS
(...)
SELECT *
FROM DownTimeGrouping g
ORDER BY g.RowNum
RowNum ID ServerDownTime ServerStatus GroupID LastGroupID
-------------------- ----------- ----------------------- ------------ ----------- -----------
1 1 2012-03-30 00:00:00.000 1 NULL 0
2 2 2012-03-30 00:30:00.000 0 1 1
3 3 2012-03-30 01:00:00.000 0 1 1
4 4 2012-03-30 01:30:00.000 0 1 1
5 5 2012-03-30 02:00:00.000 1 1 1
6 6 2012-03-30 02:30:00.000 1 NULL 1
7 7 2012-03-30 03:00:00.000 0 2 2
8 8 2012-03-30 03:30:00.000 1 2 2
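The recursive CTE walks the rows one at a time and carries the current group number forward, which makes it a small state machine. Below is a minimal Python sketch of that grouping rule, assuming the rows are already sorted by ServerDownTime (downtime_groups is a hypothetical helper name). A 0 row opens a group if none is open, later 0 rows join it, and the first 1 row joins the group and closes it.

```python
from datetime import datetime


def downtime_groups(rows):
    """rows: (ServerDownTime, ServerStatus) pairs sorted by time; 0 = down, 1 = up."""
    groups = []          # each group is a list of timestamps
    open_group = None
    for ts, status in rows:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        if status == 0 and open_group is None:
            open_group = []              # a "down" row starts a new group
            groups.append(open_group)
        if open_group is not None:
            open_group.append(t)
            if status == 1:
                open_group = None        # the first "up" row ends the group
    # StartTime, EndTime and minutes per group, like MIN/MAX + DATEDIFF
    return [(g[0], g[-1], int((g[-1] - g[0]).total_seconds() // 60)) for g in groups]


data = [
    ("2012-03-30 00:00:00", 1), ("2012-03-30 00:30:00", 0),
    ("2012-03-30 01:00:00", 0), ("2012-03-30 01:30:00", 0),
    ("2012-03-30 02:00:00", 1), ("2012-03-30 02:30:00", 1),
    ("2012-03-30 03:00:00", 0), ("2012-03-30 03:30:00", 1),
]
for start, end, minutes in downtime_groups(data):
    print(start, end, minutes)
```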