In SQL is there a way to simulate SAS MERGE operation - sql-server

Let's say I have the following 2 datasets:
A B C
----------- ----------- -----------
1 100 1000
1 120 1001
2 140 1002
2 160 1003
3 180 1004
3 190 1005
3 200 1006
and
A D E
----------- ----------- -----------
1 61 2000
1 62 2001
1 63 2001
2 64 2002
3 65 2004
3 66 2005
3 67 2006
3 68 2006
Is it possible to generate the following output dataset (. represents null)?
A B C D E
----------- ----------- ----------- ----------- -----------
1 100 1000 61 2000
1 120 1001 62 2001
1 . . 63 2001
2 140 1002 64 2002
2 160 1003 . .
3 180 1004 65 2004
3 190 1005 66 2005
3 200 1006 67 2006
3 . . 68 2006
The merge takes all the records from both tables and adds each of them to the result set at most once.
Records that join are not multiplied as in a classical SQL join. Each record is aligned with one matching record, and when one side runs out of records, null is inserted.
I've been thinking that perhaps the new partitioning functions can achieve this, but I've been away from SQL too long now and I can't think of a way to do this "special join".
I've also considered making a distinct list of the keys and then left joining them to the 2 tables, but then I get stuck, because the join will still multiply the record counts.

You can do this with the row_number() windowing function. Naming the two datasets DS1 and DS2, the query will look like this:
WITH DS1Seq As (
SELECT A, B, C, row_number() OVER (partition by A order by A, B, C) As SeqNumber
FROM DS1
),
DS2Seq As (
SELECT A, D, E, row_number() OVER (partition by A order by A, D, E) As SeqNumber
FROM DS2
)
SELECT coalesce(DS1Seq.A, DS2Seq.A) As A, B, C, D, E
FROM DS1Seq
FULL JOIN DS2Seq on DS1Seq.A = DS2Seq.A AND DS1Seq.SeqNumber = DS2Seq.SeqNumber
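To make the alignment rule concrete, here is a minimal in-memory sketch of the same idea in Python, using the sample rows from the question. It is not the SQL above, only an illustration of what numbering the rows per key and then full-joining on (A, SeqNumber) amounts to. The function name `positional_merge` is hypothetical, and the padding assumes exactly two non-key columns per dataset.

```python
from collections import defaultdict
from itertools import zip_longest

ds1 = [(1, 100, 1000), (1, 120, 1001), (2, 140, 1002), (2, 160, 1003),
       (3, 180, 1004), (3, 190, 1005), (3, 200, 1006)]
ds2 = [(1, 61, 2000), (1, 62, 2001), (1, 63, 2001), (2, 64, 2002),
       (3, 65, 2004), (3, 66, 2005), (3, 67, 2006), (3, 68, 2006)]


def positional_merge(left, right):
    """Pair the n-th row of each key group on the left with the n-th on the right."""
    groups_left, groups_right = defaultdict(list), defaultdict(list)
    # Sorting the tuples mirrors ORDER BY A, B, C (or A, D, E) in the window.
    for key, *rest in sorted(left):
        groups_left[key].append(tuple(rest))
    for key, *rest in sorted(right):
        groups_right[key].append(tuple(rest))

    merged = []
    for key in sorted(groups_left.keys() | groups_right.keys()):
        # zip_longest pads the shorter group with None, which plays the role
        # of the NULLs produced by the FULL JOIN on the sequence number.
        pairs = zip_longest(groups_left[key], groups_right[key],
                            fillvalue=(None, None))
        for (b, c), (d, e) in pairs:
            merged.append((key, b, c, d, e))
    return merged


merged = positional_merge(ds1, ds2)
for row in merged:
    print(row)
```

Each input row appears once in the result, and a key with unequal group sizes gets None on the shorter side.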

Related

How to pull consecutive months data in sql server even if there is null value

I'm a newbie trying to create a SQL query to find how much each theater sold in tickets per month during the previous year (i.e. for all 12 months). If the collection amount is null or blank for any month in that year, I need to output zero for it.
I have two tables as below mentioned:
TABLE 1:
Month_Number Year
1 2016
2 2016
3 2016
4 2016
5 2016
6 2016
7 2016
8 2016
9 2016
10 2016
11 2016
12 2016
TABLE 2:
Theater month Amount_In_Thousands
ABC 1 165
ABC 3 70
ABC 4 102
GHI 1 45
GHI 2 70
GHI 3 42
GHI 4 57
ABC 6 122
ABC 7 67
ABC 8 22
ABC 9 80
ABC 11 46
ABC 12 38
You might have noticed that for the 'ABC' theater there are no values for month 2, month 5 and month 10. I am unable to produce these missing months with a zero value. I tried a simple left outer join, but the output still has no row with the month/year and a zero value.
I need to produce the output as below:
OUTPUT
Movie_Theaters Month Amount_In_Thousands
ABC 1 165
ABC 2 0 *
ABC 3 70
ABC 4 102
ABC 5 0 *
ABC 6 122
ABC 7 67
ABC 8 22
ABC 9 80
ABC 10 0 *
ABC 11 46
ABC 12 38
GHI 1 45
GHI 2 70
GHI 3 42
GHI 4 57
Can anybody please help me write a SQL script that produces the output shown above? Thank you so much in advance.
You can use a CROSS JOIN between every theater and month-year, and then perform a LEFT JOIN with Table2:
SELECT A.Theater,
       B.Month_Number,
       B.[Year],
       ISNULL(C.Amount_In_Thousands, 0) AS Amount_In_Thousands
FROM (SELECT DISTINCT Theater
      FROM dbo.Table2) A -- or use a dbo.Theater table if you have one
CROSS JOIN dbo.Table1 B
LEFT JOIN dbo.Table2 C
    ON A.Theater = C.Theater
    AND B.Month_Number = C.[month]
    AND B.[Year] = C.[Year]; -- assumes Table2 also has a [Year] column (not shown in the sample); drop this condition otherwise
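As a quick check of the CROSS JOIN plus LEFT JOIN idea, the sketch below loads the sample rows into an in-memory SQLite database through Python's sqlite3 module. This is an illustration, not SQL Server: it uses COALESCE in place of ISNULL, and it leaves out the Year join condition because the sample Table 2 shows no Year column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (Month_Number INTEGER, Year INTEGER);
    CREATE TABLE Table2 (Theater TEXT, month INTEGER, Amount_In_Thousands INTEGER);
""")
conn.executemany("INSERT INTO Table1 VALUES (?, 2016)",
                 [(m,) for m in range(1, 13)])
conn.executemany("INSERT INTO Table2 VALUES (?, ?, ?)", [
    ("ABC", 1, 165), ("ABC", 3, 70), ("ABC", 4, 102),
    ("GHI", 1, 45), ("GHI", 2, 70), ("GHI", 3, 42), ("GHI", 4, 57),
    ("ABC", 6, 122), ("ABC", 7, 67), ("ABC", 8, 22), ("ABC", 9, 80),
    ("ABC", 11, 46), ("ABC", 12, 38),
])

# Every theater is paired with every month first; the LEFT JOIN then fills in
# the amounts that exist and COALESCE turns the missing ones into 0.
rows = conn.execute("""
    SELECT A.Theater, B.Month_Number, COALESCE(C.Amount_In_Thousands, 0)
    FROM (SELECT DISTINCT Theater FROM Table2) AS A
    CROSS JOIN Table1 AS B
    LEFT JOIN Table2 AS C
           ON A.Theater = C.Theater AND B.Month_Number = C.month
    ORDER BY A.Theater, B.Month_Number
""").fetchall()

for row in rows:
    print(row)
```

Note that this produces all 12 months for every theater, so GHI also gets zero rows for months 5 to 12, which is more than the sample output in the question lists for GHI.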

Oracle based PIVOT with multiple columns group

Using the following tables,
Productivity:
PRODUCTIVITYID PDATE EMPLOYEEID ROOMID ROOMS_SOLD SCR
81 03/26/2016 7499 21 56 43
82 03/26/2016 7566 42 - -
102 03/26/2016 7499 22 434 22
101 03/26/2016 7566 21 43 53
ProductivityD:
PRODUCTIVITYID WORKHRS MEALPANELTY DESCRIPTION
2 50 4 -
21 6.4 1 -
102 6 - -
81 1.32 - -
101 3.6 - -
Rooms:
ID ROOM PROPERTCODE
22 102 6325
41 103 6325
42 104 6325
43 105 6325
EMP:
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
7566 JONES MANAGER 7839 04/02/1981 2975 - 20
7788 SCOTT ANALYST 7566 12/09/1982 3000 - 20
7902 FORD ANALYST 7566 12/03/1981 3000 - 20
7369 SMITH CLERK 7902 12/17/1980 800 - 20
7499 ALLEN SALESMAN 7698 02/20/1981 1600 300 30
The following query generates the output below, but I need to group by employee, sum workhrs, and then pivot RM_ROOM and RM_SCR:
WITH pivot_data AS (
SELECT eNAME,workhrs,room, 'RM' as RM,SCR from PRODUCTIVITY p,PRODUCTIVITYd d, emp e, ROOMS R
where p.PRODUCTIVITYID=d.PRODUCTIVITYID and e.empno=p.employeeid
AND R.ID=P.ROOMID
)
SELECT *
FROM pivot_data
PIVOT (
MIN(room) as room,min(scr) as SCR --<-- pivot_clause
FOR RM--<-- pivot_for_clause
IN ('RM') --<-- pivot_in_clause
)
Current Output:
ENAME WORKHRS 'RM'_ROOM 'RM'_SCR
JONES 3.6 101 53
ALLEN 6 102 22
ALLEN 1.32 101 43
Desired Output:
ENAME WORKHRS 'RM'_ROOM 'RM'_SCR 'RM'_ROOM 'RM'_SCR
JONES 3.6 101 53 - -
ALLEN 7.32 101 43 102 22
You are pivoting on a fixed value, the string literal 'RM', so you're really not doing anything useful in the pivot - the output is the same as you'd get from running the 'pivot_data' query on its own:
SELECT eNAME,workhrs,room, SCR from PRODUCTIVITY p,PRODUCTIVITYd d, emp e, ROOMS R
where p.PRODUCTIVITYID=d.PRODUCTIVITYID and e.empno=p.employeeid
AND R.ID=P.ROOMID;
ENAME WORKHRS ROOM SCR
----- ---------- ---------- ----------
JONES 3.6 101 53
ALLEN 1.32 101 43
ALLEN 6 102 22
You want the aggregate workhrs for each employee, and a pivot of the rooms they sold. If you change that query to get the analytic sum of workhrs and a ranking of the room/scr values (and switch to modern join syntax), you get:
select e.ename, r.room, p.scr,
sum(d.workhrs) over (partition by e.ename) as wrkhrs,
rank() over (partition by e.ename order by r.room, p.scr) as rnk
from productivity p
join productivityd d on d.productivityid = p.productivityid
join emp e on e.empno=p.employeeid
join rooms r on r.id = p.roomid;
ENAME ROOM SCR WRKHRS RNK
----- ---------- ---------- ---------- ----------
ALLEN 101 43 7.32 1
ALLEN 102 22 7.32 2
JONES 101 53 3.6 1
You can then pivot on that generated rnk number:
with pivot_data as (
select e.ename, r.room, p.scr,
sum(d.workhrs) over (partition by e.ename) as wrkhrs,
rank() over (partition by e.ename order by r.room, p.scr) as rnk
from productivity p
join productivityd d on d.productivityid = p.productivityid
join emp e on e.empno=p.employeeid
join rooms r on r.id = p.roomid
)
select *
from pivot_data
pivot (
min(room) as room, min(scr) as scr --<-- pivot_clause
for rnk --<-- pivot_for_clause
in (1, 2, 3) --<-- pivot_in_clause
);
ENAME WRKHRS 1_ROOM 1_SCR 2_ROOM 2_SCR 3_ROOM 3_SCR
----- ---------- ---------- ---------- ---------- ---------- ---------- ----------
ALLEN 7.32 101 43 102 22
JONES 3.6 101 53
You need to know the maximum number of rooms any employee may have (i.e. the highest value rnk could ever reach) and include all of those values in the in clause. That means you're likely to end up with empty columns, as in this example where there is no data for 3_room or 3_scr. You can't avoid that, though, unless you get an XML result or generate the query dynamically.
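The fixed-width nature of the pivot can be sketched outside the database as well. The Python below is only an illustration using the three joined rows shown earlier; `MAX_RANK` is a hypothetical constant standing in for the `in (1, 2, 3)` list. It numbers rows by list position, which matches rank() only when an employee has no tied room/scr pairs.

```python
from collections import defaultdict

# (ename, workhrs, room, scr) rows, as returned by the joined query
joined = [("JONES", 3.6, 101, 53), ("ALLEN", 1.32, 101, 43), ("ALLEN", 6, 102, 22)]
MAX_RANK = 3  # chosen up front, like the values listed in the IN clause

by_emp = defaultdict(list)
for ename, workhrs, room, scr in joined:
    by_emp[ename].append((room, scr, workhrs))

pivoted = {}
for ename, items in by_emp.items():
    items.sort(key=lambda item: (item[0], item[1]))  # order by room, scr
    total = round(sum(item[2] for item in items), 2)  # sum(workhrs) per employee
    slots = [(room, scr) for room, scr, _ in items[:MAX_RANK]]
    # Pad unused rank slots, which is why 3_ROOM and 3_SCR come back empty.
    slots += [(None, None)] * (MAX_RANK - len(slots))
    pivoted[ename] = (total, slots)

for ename in sorted(pivoted):
    print(ename, pivoted[ename])
```

Raising `MAX_RANK` adds more empty slots for everyone, which is the same trade-off as listing extra values in the IN clause.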
What you are saying makes no sense. What do you mean by "pivot RM_ROOM"? So I have to guess. I am guessing you want to group employees and sum workhrs, and then pivot the result. The "Output" you show seems to be the output for pivot_data, your subquery.
Your answer will only have eNAME and, for each of them, the total hours worked. So you don't need to SELECT the room numbers in the pivot_data subquery. You only need eNAME and workhrs. Then it is a simple matter of using the PIVOT syntax:
WITH pivot_data AS (
SELECT eNAME, workhrs FROM PRODUCTIVITY p,PRODUCTIVITYd d, emp e, ROOMS R
where p.PRODUCTIVITYID=d.PRODUCTIVITYID and e.empno=p.employeeid
AND R.ID=P.ROOMID
)
SELECT *
FROM pivot_data
PIVOT (
SUM(workhrs)
FOR eNAME IN ('JONES', 'ALLEN')
)
/
Output:
'JONES' 'ALLEN'
---------- ----------
3.6 7.32

Query is very slow

I have tables
table1
epid etid id EValue reqdate
----------- ----------- ----------- ------------ ----------
15 1 1 498925307069 2012-01-01
185 1 2 A5973FC43CE3 2012-04-04
186 1 2 44C6A4B776A2 2012-04-05
205 1 2 7A0ED3F1DA13 2012-09-19
206 1 2 77771D65F9C4 2012-09-19
207 1 2 AD74A4AA41BD 2012-09-19
208 1 2 9595ABE5A0C8 2012-09-19
209 1 2 7611D2FB395B 2012-09-19
210 1 2 04A510D6067A 2012-09-19
211 1 2 24D43EC268F8 2012-09-19
table2
PEId Id EPId
----------- ----------- -----------
43 9 15
44 10 15
45 11 15
46 12 15
47 13 15
48 14 15
49 15 15
50 16 15
51 17 15
52 18 15
table3
PLId PEId Id ToPayId
----------- ----------- ----------- -----------
71 43 9 1
72 43 9 2
73 44 10 1
74 44 10 2
75 45 11 1
76 45 11 2
77 46 12 1
78 46 12 2
79 47 13 1
80 47 13 2
I want to get one id whose count in table 3 is less than 8, ordered by peid in table 2.
I have written this query:
SELECT Top 1 ToPayId FROM
(
SELECT Count(pl.ToPayId) C, pl.ToPayId
FROM table3 pl
INNER JOIN table2 pe ON pl.peid = pe.peid
INNER JOIN table1 e ON pe.epid = e.epid
WHERE e.EtId=1 GROUP BY pl.ToPayId
) As T
INNER JOIN table2 p ON T.ToPayId= p.Id
WHERE C < 8 ORDER BY p.PEId ASC
This query is executed more than 1000 times in a stored procedure, inside a while loop driven by the entries in a user-defined table type.
But it is very slow, as we have millions of entries in each table.
Can anyone suggest a better query?
Maybe try the HAVING clause to get rid of the derived table (the select in the from):
select table2.id as due
from table3 inner join table2 on table2.PEId=table3.PEId...
group by ...
having count(due) <8
order by ...
Also, you have a redundant Id column in table3: it seems pretty useless, as each PEId appears to pair with a single Id. Removing it would reduce the size of table 3 by 25% and so should help the performance of the db.
Well, since you did not provide enough sample data and I am not sure exactly what your business logic is, I can only modify the code blind.
SELECT ToPayId
FROM (
SELECT TOP 1 Count(pl.ToPayId) C, pl.ToPayId, pe.PEId
FROM table3 as pl
INNER JOIN table2 as pe ON pl.peid = pe.peid AND pl.ToPayId = pe.Id
INNER JOIN table1 e ON pe.epid = e.epid
WHERE e.EtId=1
GROUP BY pl.ToPayId, pe.PEId
HAVING Count(pl.ToPayId) < 8
ORDER BY pe.PEId ASC
) AS T
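The HAVING suggestion is a rewrite of where the count filter sits, not a change in what is selected. The sketch below checks that on the sample table3 rows using Python's sqlite3 module; it only covers the grouping step (without the joins to table1 and table2) and says nothing about speed, since whether SQL Server runs one form faster is something only the actual execution plan can show.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table3 (PLId INTEGER, PEId INTEGER, Id INTEGER, ToPayId INTEGER)"
)
conn.executemany("INSERT INTO table3 VALUES (?, ?, ?, ?)", [
    (71, 43, 9, 1), (72, 43, 9, 2), (73, 44, 10, 1), (74, 44, 10, 2),
    (75, 45, 11, 1), (76, 45, 11, 2), (77, 46, 12, 1), (78, 46, 12, 2),
    (79, 47, 13, 1), (80, 47, 13, 2),
])

# Form 1: count in a derived table, filter in the outer WHERE.
derived_sql = """
    SELECT ToPayId
    FROM (SELECT ToPayId, COUNT(*) AS C FROM table3 GROUP BY ToPayId) AS T
    WHERE C < ?
    ORDER BY ToPayId
"""
# Form 2: filter the groups directly with HAVING.
having_sql = """
    SELECT ToPayId
    FROM table3
    GROUP BY ToPayId
    HAVING COUNT(*) < ?
    ORDER BY ToPayId
"""

results = {}
for limit in (8, 5):
    derived_rows = conn.execute(derived_sql, (limit,)).fetchall()
    having_rows = conn.execute(having_sql, (limit,)).fetchall()
    results[limit] = (derived_rows, having_rows)
    print(limit, derived_rows, having_rows)
```

Both ToPayId values occur five times in the sample, so a limit of 8 keeps both and a limit of 5 keeps neither, in either form.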

Fetch Only Last Entry by user daily

I am working on a small reporting application. I have two tables
Agent Table Data
AgentID AgentName
------- ---------
1001 ABC
1002 XYZ
1003 POI
1004 JKL
Report Table Data
ReportID AgentId Labor Mandays Amount SubmitDate
-------- ------- ----- ------- ------ ----------
1 1001 30 10 5000 11/12/2011
2 1001 44 18 8000 11/14/2011
3 1002 33 75 3022 11/12/2011
4 1001 10 10 1500 11/14/2011
5 1002 10 10 1800 11/14/2011
6 1001 10 10 1400 11/14/2011
7 1003 40 40 1500 11/14/2011
8 1003 40 40 1800 11/14/2011
I want to generate a report which gives us output like
ReportID AgentId Labor Mandays Amount SubmitDate
-------- ------- ----- ------- ------ ----------
1 1001 30 10 5000 11/12/2011
3 1002 33 75 3022 11/12/2011
6 1001 10 10 1400 11/14/2011
5 1002 10 10 1800 11/14/2011
8 1003 40 40 1800 11/14/2011
Thanks in Advance
You didn't mention what VERSION of SQL Server you're using - if you're on 2005 or newer, you can use a CTE (Common Table Expression) with the ROW_NUMBER function:
;WITH LastPerAgent AS
(
SELECT
AgentID, ReportID, Labor, Mandays, Amount, SubmitDate,
ROW_NUMBER() OVER(PARTITION BY AgentID,SubmitDate
ORDER BY ReportID DESC) AS 'RowNum'
FROM dbo.Report
)
SELECT
AgentID, ReportID, Labor, Mandays, Amount, SubmitDate
FROM LastPerAgent
WHERE RowNum = 1
This CTE "partitions" your data by AgentID and SubmitDate, and for each partition, the ROW_NUMBER function hands out sequential numbers, starting at 1 and ordered by ReportID DESC - so the "last" row (with the highest ReportID) for each (AgentID, SubmitDate) pair gets RowNum = 1 which is what I select from the CTE in the SELECT statement after it.
PS: this doesn't work 100% on your input data, since you've not defined how to group and how to eliminate rows.... you might need to adapt this query a bit, based on your requirements...
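For the sample rows as posted, the selection rule itself (highest ReportID per AgentId and SubmitDate) can be checked with a short Python sketch. This is an in-memory illustration of the rule, not the CTE; the final sort by date string and agent is an assumption made only to line the rows up with the requested report, and it works here because both dates share the same month and year.

```python
reports = [
    # (ReportID, AgentId, Labor, Mandays, Amount, SubmitDate)
    (1, 1001, 30, 10, 5000, "11/12/2011"),
    (2, 1001, 44, 18, 8000, "11/14/2011"),
    (3, 1002, 33, 75, 3022, "11/12/2011"),
    (4, 1001, 10, 10, 1500, "11/14/2011"),
    (5, 1002, 10, 10, 1800, "11/14/2011"),
    (6, 1001, 10, 10, 1400, "11/14/2011"),
    (7, 1003, 40, 40, 1500, "11/14/2011"),
    (8, 1003, 40, 40, 1800, "11/14/2011"),
]

last = {}
for report in reports:
    key = (report[1], report[5])  # the PARTITION BY AgentID, SubmitDate part
    # Keep the row with the highest ReportID, i.e. the one that would get RowNum = 1.
    if key not in last or report[0] > last[key][0]:
        last[key] = report

result = sorted(last.values(), key=lambda r: (r[5], r[1]))
for row in result:
    print(row)
```

On this data the rule keeps reports 1, 3, 6, 5 and 8, in that order after sorting.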

SQL Server : Update a column

I have a table, TableA:
ID MatCh01 Match02 Status
1 1001 12
2 1001 12
3 1001 12
4 1002 44
5 1002 47
6 1003 22
7 1003 22
8 1004 55
9 1004 57
I want to populate the Status column with "FAIL" when, for the same match01, there are different match02 values. Expected TableA:
ID MatCh01 Match02 Status
1 1001 12 NULL
2 1001 12 NULL
3 1001 12 NULL
4 1002 44 FAIL
5 1002 47 FAIL
6 1003 22 NULL
7 1003 22 NULL
8 1004 55 FAIL
9 1004 57 FAIL
Please NOTE: FAIL all rows for a 'match01' if its corresponding 'match02' values differ.
Thanks
Basically, this updates all rows in TableA whose Match01 group has a MAX and MIN of Match02 that are not equal (meaning that match01 has multiple rows with different values for match02).
UPDATE A
SET Status = 'FAIL'
FROM TableA A
INNER JOIN (SELECT
a2.Match01
FROM TableA A2
GROUP BY a2.Match01
HAVING MAX(Match02) <> MIN(Match02)) B ON
A.Match01 = B.Match01
When there's more than one distinct value of match02 for any match01, update those rows with the same match01.
UPDATE t1
SET Status = 'FAIL'
FROM TableA t1
WHERE t1.Match01 in
(
SELECT t2.Match01
FROM TableA t2
GROUP BY t2.Match01
HAVING COUNT(DISTINCT t2.Match02) > 1
)
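The COUNT(DISTINCT ...) variant is close to portable SQL, so it can be tried against the sample rows in an in-memory SQLite database via Python's sqlite3 module. This is a sketch rather than T-SQL: the UPDATE names the table directly instead of using the `UPDATE t1 ... FROM TableA t1` alias form.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableA (ID INTEGER PRIMARY KEY, Match01 INTEGER, "
    "Match02 INTEGER, Status TEXT)"
)
conn.executemany("INSERT INTO TableA (ID, Match01, Match02) VALUES (?, ?, ?)", [
    (1, 1001, 12), (2, 1001, 12), (3, 1001, 12),
    (4, 1002, 44), (5, 1002, 47),
    (6, 1003, 22), (7, 1003, 22),
    (8, 1004, 55), (9, 1004, 57),
])

# Flag every row whose Match01 group contains more than one distinct Match02.
conn.execute("""
    UPDATE TableA
    SET Status = 'FAIL'
    WHERE Match01 IN (
        SELECT Match01
        FROM TableA
        GROUP BY Match01
        HAVING COUNT(DISTINCT Match02) > 1
    )
""")

statuses = dict(conn.execute("SELECT ID, Status FROM TableA ORDER BY ID"))
failed_ids = [row_id for row_id, status in statuses.items() if status == "FAIL"]
print(failed_ids)
```

Rows 4, 5, 8 and 9 end up as FAIL and the rest stay NULL, matching the expected table in the question.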
