Group by on columns with repeated data - sql-server

In the SQL Server I have a table with repeated data in few columns and NULL values in few other columns some thing similar show as below.
DirName | BillingNPI | Average | TotalClaims | MaxString | MinString | CorrectString
------------------------------------------------------------------------------------
AASTA | 158967 | 10 | 20 | NULL | NULL | Value
AASTA | 158967 | 10 | 20 | NULL | ValueSTA | Value
AASTA | 158967 | 10 | 20 | STAValue | NULL | Value
Now using GROUP BY I'm trying the output of my query to be
DirName | BillingNPI | Average | TotalClaims | MaxString | MinString | CorrectString
------------------------------------------------------------------------------------
AASTA | 158967 | 10 | 20 | STAValue | ValueSTA | Value
Do I have to use inner join on the same table to achieve this ?

Does this get what you need?
select
DirName ,
BillingNPI ,
Average ,
TotalClaims ,
max(isnull(MaxString,'') maxstring,
max(isnull(MinString,'') minstring,
CorrectString
group by
DirName ,
BillingNPI ,
Average ,
TotalClaims ,
CorrectString

Related

How can I improve the response time of this query in Oracle

this query takes 24 seconds and returns 1891 results:
SELECT p.STATE, p.REFNUM, p.CODE, p.TYPE, i.STATE, pj.NAME, pj.DOCUMENT
TABLE p
inner join TABLE2 i on i.REFNUM = p.REFNUM
inner join TABLE3 pj on pj.NUMBER = i.NUMBER and p.OFIC_ID = pj.OFIC_ID and p.PUB_ID = pj.PUB_ID
inner join OFICE o on t.OFIC_ID = p.OFIC_ID and o.PUB_ID = p.PUB_ID
inner join GROUP glad on glad.GROUP_CODE=p.GROUP_CODE
WHERE glad.GROUP_TYPE ='3' AND i.STATE = '1'
AND p.PUB_ID IN ('05','11','12','09','08','13','04','02','01','06','10','03','07','14')
AND pj.NAME LIKE 'BANK%'
ORDER BY o.NAME,p.ID;
I have these indexes:
CREATE INDEX IND_TABLE1_REFNUM_ZONE ON TABLE1 (PUB_ID, OFIC_ID, REFNUM, ZONE_ID)
CREATE INDEX IND_TABLE1_REFNUMPUB ON TABLE1 (REFNUM, PUB_ID, OFIC_ID, GROUP_CODE );
CREATE INDEX IND_TABLE1_GROUP ON TABLE1 (PUB_ID, GROUP_CODE, REFNUM, OFIC_ID)
CREATE INDEX IND_TABLE2_REF ON TABLE2 (REFNUM, NUMBER, STATE);
CREATE INDEX IND_TABLE2_QUERY ON TABLE2 (NUMBER, TYPE, STATE, REFNUM, NUM, CODE);
CREATE INDEX IND_TABLE2_REFNUM ON TABLE2 (REFNUM)
CREATE INDEX IND_TABLE2_NUMBER ON TABLE2 (NUMBER)
CREATE INDEX IND_TABLE3_NUM ON TABLE3 (NUMBER, PUB_ID, OFIC_ID, NAME );
CREATE INDEX IND_TABLE3_NAME ON TABLE3 ( NAME );
CREATE INDEX IND_GROUP_COD ON GROUP (GROUP_CODE, GROUP_TYPE)
I made the following queries to see how many records are in each table:
SELECT count(*) FROM TABLE1 --> 18298458 results
SELECT count(*) FROM TABLE2 --> 60627924 results
SELECT count(*) FROM TABLE3 --> 18425913 results
SELECT count(*) FROM OFICE --> 65 results
SELECT count(*) FROM TABLE1 p INNER JOIN GROUP glad on glad.GROUP_CODE=p.GROUP_CODE where glad.GROUP_TYPE ='3' AND p.PUB_ID IN ('05','11','12','09','08','13','04','02','01','06','10','03','07','14') --> 1314077 results
SELECT count(*) FROM TABLE1 p INNER JOIN GROUP glad on glad.GROUP_CODE=p.GROUP_CODE where glad.GROUP_TYPE ='3' AND p.PUB_ID IN ('05') --> 53754 results
SELECT count(*) FROM TABLE3 WHERE NAME LIKE 'BANK%' --> 1922081 results
this is the plan generated by oracle:
-----------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop |
-----------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 291K| 38M| 384K| | |
| 1 | SORT ORDER BY | | 291K| 38M| 384K| | |
| 2 | HASH JOIN | | 291K| 38M| 375K| | |
| 3 | TABLE ACCESS FULL | OFICE | 64 | 960 | 3 | | |
| 4 | HASH JOIN | | 291K| 34M| 375K| | |
| 5 | INDEX SKIP SCAN | IND_GROUP_COD | 47 | 329 | 1 | | |
| 6 | HASH JOIN | | 452K| 50M| 375K| | |
| 7 | PART JOIN FILTER CREATE | :BF0000 | 452K| 50M| 375K| | |
| 8 | NESTED LOOPS | | 452K| 50M| 375K| | |
| 9 | NESTED LOOPS | | | | | | |
| 10 | STATISTICS COLLECTOR | | | | | | |
| 11 | HASH JOIN | | 2100K| 166M| 252K| | |
| 12 | NESTED LOOPS | | 2100K| 166M| 252K| | |
| 13 | STATISTICS COLLECTOR | | | | | | |
| 14 | PARTITION RANGE ALL | | 1681K| 89M| 82582 | 1 | 19 |
| 15 | PARTITION HASH ALL | | 1681K| 89M| 82582 | 1 | 32 |
| 16 | TABLE ACCESS FULL | TABLE3 | 1681K| 89M| 82582 | 1 | 608 |
| 17 | INDEX RANGE SCAN | IND_TABLE2_QUERY | 1 | 27 | 103K| | |
| 18 | INDEX FAST FULL SCAN | IND_TABLE2_QUERY | 32M| 845M| 103K| | |
| 19 | INDEX RANGE SCAN | IND_TABLE1_REFNUM_ZONE| | | | | |
| 20 | TABLE ACCESS BY GLOBAL INDEX ROWID| TABLE1 | 1 | 35 | 70380 | ROWID | ROWID |
| 21 | PARTITION RANGE ALL | | 19M| 650M| 70380 | 1 | 19 |
| 22 | PARTITION HASH JOIN-FILTER | | 19M| 650M| 70380 |:BF0000|:BF0000|
| 23 | TABLE ACCESS FULL | TABLE1 | 19M| 650M| 70380 | 1 | 608 |
-----------------------------------------------------------------------------------------------------------------
I think it takes time because this one is using TABLE ACCESS FULL for TABLE1 and TABLE3
if I perform the query filtering only PUB_ID='05' instead of all the numbers in the above query, the query returns 181 results and takes 8 seconds and in that case oracle generates this plan:
--------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop |
--------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 797 | 108K| 312K| | |
| 1 | SORT ORDER BY | | 797 | 108K| 312K| | |
| 2 | NESTED LOOPS | | 797 | 108K| 312K| | |
| 3 | HASH JOIN | | 1238 | 160K| 312K| | |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| OFICE | 3 | 45 | 2 | | |
| 5 | INDEX RANGE SCAN | SYS_C0034405 | 3 | | 1 | | |
| 6 | HASH JOIN | | 2091 | 240K| 312K| | |
| 7 | PART JOIN FILTER CREATE | :BF0000 | 66316 | 5375K| 241K| | |
| 8 | NESTED LOOPS | | 66316 | 5375K| 241K| | |
| 9 | PARTITION RANGE ALL | | 53085 | 2903K| 82490 | 1 | 19 |
| 10 | PARTITION HASH ALL | | 53085 | 2903K| 82490 | 1 | 32 |
| 11 | TABLE ACCESS FULL | TABLE3 | 53085 | 2903K| 82490 | 1 | 608 |
| 12 | INDEX RANGE SCAN | IND_TABLE2_QUERY | 1 | 27 | 3 | | |
| 13 | PARTITION RANGE ALL | | 762K| 25M| 68657 | 1 | 19 |
| 14 | PARTITION HASH JOIN-FILTER | | 762K| 25M| 68657 |:BF0000|:BF0000|
| 15 | TABLE ACCESS FULL | TABLE1 | 762K| 25M| 68657 | 1 | 608 |
| 16 | INDEX RANGE SCAN | IND_GROUP_COD | 1 | 7 | 0 | | |
--------------------------------------------------------------------------------------------------------------
SYS_C0034405 is the primary key of OFFICE which contains these fields: (PUB_ID, REG_ID)
if in addition to filtering only PUB_ID='05' I remove the "order by", the query takes only 3.5 seconds but I definitely have to return the ordered data and I would prefer to be able to filter several PUB_IDs
I thought the query could be improved if I removed the "inner join" from GROUP and changed the filter "glad.GROUP_TYPE ='3'" to "p.GROUP_CODE in ('01','07','10','21 ')" (these are all type 3 codes), because now it should use the IND_TABLE1_GROUP index but instead of improving, it gets worse, it takes 13 seconds even filtering only PUB_ID='05'; This is the plan that oracle generates:
------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop |
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 47 | 6251 | 172K| | |
| 1 | SORT ORDER BY | | 47 | 6251 | 172K| | |
| 2 | HASH JOIN | | 47 | 6251 | 172K| | |
| 3 | PARTITION RANGE ALL | | 47786 | 2613K| 82490 | 1 | 19 |
| 4 | PARTITION HASH ALL | | 47786 | 2613K| 82490 | 1 | 32 |
| 5 | TABLE ACCESS FULL | TABLE3 | 47786 | 2613K| 82490 | 1 | 608 |
| 6 | HASH JOIN | | 41945 | 3154K| 89633 | | |
| 7 | NESTED LOOPS | | 41945 | 3154K| 89633 | | |
| 8 | NESTED LOOPS | | 75740 | 3154K| 89633 | | |
| 9 | STATISTICS COLLECTOR | | | | | | |
| 10 | NESTED LOOPS | | 18935 | 924K| 15985 | | |
| 11 | TABLE ACCESS BY INDEX ROWID BATCHED | OFICE | 3 | 45 | 2 | | |
| 12 | INDEX RANGE SCAN | SYS_C0034405 | 3 | | 1 | | |
| 13 | INLIST ITERATOR | | | | | | |
| 14 | TABLE ACCESS BY GLOBAL INDEX ROWID BATCHED| TABLE1 | 6312 | 215K| 7039 | ROWID | ROWID |
| 15 | INDEX RANGE SCAN | IND_TABLE1_GROUP | 6828 | | 284 | | |
| 16 | INDEX RANGE SCAN | IND_TABLE2_REFNUM | 4 | | 2 | | |
| 17 | TABLE ACCESS BY INDEX ROWID | TABLE2 | 2 | 54 | 4 | | |
| 18 | INDEX FAST FULL SCAN | IND_TABLE2_QUERY | 2 | 54 | 2 | | |
------------------------------------------------------------------------------------------------------------------------------
And if I put all the PUB_IDs, Oracle generates this plan (it doesn't even use the IND_TABLE1_GROUP index anymore):
-------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost | Pstart| Pstop |
-------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 35661 | 4631K| 345K| | |
| 1 | SORT ORDER BY | | 35661 | 4631K| 345K| | |
| 2 | HASH JOIN | | 35661 | 4631K| 344K| | |
| 3 | TABLE ACCESS FULL | OFICE | 64 | 960 | 3 | | |
| 4 | HASH JOIN | | 35661 | 4109K| 344K| | |
| 5 | PARTITION RANGE ALL | | 1360K| 45M| 79580 | 1 | 19 |
| 6 | PARTITION HASH ALL | | 1360K| 45M| 79580 | 1 | 32 |
| 7 | TABLE ACCESS FULL | TABLE1 | 1360K| 45M| 79580 | 1 | 608 |
| 8 | HASH JOIN | | 2100K| 166M| 252K| | |
| 9 | NESTED LOOPS | | 2100K| 166M| 252K| | |
| 10 | STATISTICS COLLECTOR | | | | | | |
| 11 | PARTITION RANGE ALL | | 1681K| 89M| 82582 | 1 | 19 |
| 12 | PARTITION HASH ALL | | 1681K| 89M| 82582 | 1 | 32 |
| 13 | TABLE ACCESS FULL | TABLE3 | 1681K| 89M| 82582 | 1 | 608 |
| 14 | INDEX RANGE SCAN | IND_TABLE2_QUERY | 1 | 27 | 103K| | |
| 15 | INDEX FAST FULL SCAN | IND_TABLE2_QUERY | 32M| 845M| 103K| | |
-------------------------------------------------------------------------------------------------

How to place NULLs last in ascending order

I need to sort a table and I need to display the rows that include Nulls at the bottom. Whenever I run the below query
select * from t1
order by status, date;
Nulls show up in the first row which I don't want:
+--------+------------+--+
| Status | Date | |
+--------+------------+--+
| 1 | NULL | |
| 1 | 2011-12-01 | |
| 1 | 2011-12-21 | |
| 2 | NULL | |
| 2 | 2005-09-02 | |
| 3 | 2000-08-07 | |
| | | |
+--------+------------+--+
This is what I need:
+--------+------------+--+
| Status | Date | |
+--------+------------+--+
| 1 | 2011-12-01 | |
| 1 | 2011-12-21 | |
| 1 | NULL | |
| 2 | 2005-09-02 | |
| 2 | NULL | |
| 3 | 2000-08-07 | |
| | | |
+--------+------------+--+
How can I do it?
select * from t1
order by status,
date,
CASE WHEN date is NULL
THEN 1
ELSE 0
END;
Would this work: order by col asc nulls last?

Join 2 tables by matching children

I'm trying to have 2 tables (In this case it's actually 1 table in a self join) joined by their matching children.
Let me preface the purpose of this which might give a better understanding what I need:
I'm trying to look up a new order that I just got, to see if we ever had the same order, in order to find out in which box type this would be packaged.
So i'd need the matching order to contain the same item and the same qty for the item.
Look at the tables below and note that order 1300981 has the same items as order 1303097, how do I write this join?
Remember: I don't want the results to include any matches that do not match %100.
SQL Fiddle
OrderMain:
| OrderID | BoxId |
|---------|--------|
| 1300981 | 34 |
| 1303096 | (null) |
| 1303097 | (null) |
| 1303098 | (null) |
| 1303099 | (null) |
| 1303100 | (null) |
| 1303101 | (null) |
| 1303102 | (null) |
| 1303103 | (null) |
| 1303104 | B1 |
| 1303105 | (null) |
| 1303106 | (null) |
| 1303107 | 48 |
| 1303108 | (null) |
| 1303109 | (null) |
| 1303110 | (null) |
| 1303111 | (null) |
| 1303112 | (null) |
| 1303113 | (null) |
| 1303114 | (null) |
| 1303115 | (null) |
| 1303116 | (null) |
| 1303117 | (null) |
Order Detail:
| id | OrderID | Item | Qty |
|----|---------|--------|-----|
| 1 | 1300981 | 172263 | 3 |
| 2 | 1300981 | 171345 | 3 |
| 3 | 1300981 | 138757 | 3 |
| 4 | 1303117 | 231711 | 1 |
| 5 | 1303116 | 227835 | 1 |
| 6 | 1303115 | 244798 | 1 |
| 7 | 1303114 | 121755 | 1 |
| 8 | 1303113 | 145275 | 2 |
| 9 | 1303112 | 219554 | 1 |
| 10 | 1303111 | 179385 | 1 |
| 11 | 1303110 | 6229 | 1 |
| 12 | 1303109 | 217330 | 1 |
| 13 | 1303108 | 243596 | 1 |
| 14 | 1303107 | 246758 | 1 |
| 15 | 1303106 | 193931 | 1 |
| 16 | 1303105 | 244659 | 1 |
| 17 | 1303104 | 192548 | 1 |
| 18 | 1303103 | 228410 | 1 |
| 19 | 1303102 | 147474 | 1 |
| 20 | 1303101 | 239191 | 1 |
| 21 | 1303100 | 243594 | 1 |
| 22 | 1303099 | 232301 | 1 |
| 23 | 1303098 | 201212 | 1 |
| 24 | 1303097 | 172263 | 3 |
| 25 | 1303097 | 171345 | 3 |
| 26 | 1303097 | 138757 | 3 |
| 27 | 1303096 | 172263 | 3 |
| 28 | 1303096 | 171345 | 1 |
| 29 | 1303096 | 138757 | 3 |
| 30 | 1303095 | 172263 | 3 |
Expected Results
| OrderID | BoxId |
|---------|--------|
| 1303097 | 34 |
May be a weird way to do this, but if you convert the order details to xml and compare it to other orders, you can look for matches.
WITH BoxOrders AS
(
SELECT om.[OrderId],
om.[BoxId],
(SELECT Item, Qty
FROM orderDetails od
WHERE od.[OrderId] = om.[OrderId]
ORDER BY Item
FOR XML PATH('')) Details
FROM orderMain om
WHERE BoxID IS NOT NULL
)
SELECT mo.OrderId, bo.BoxId
FROM BoxOrders bo
JOIN (
SELECT om.[OrderId],
om.[BoxId],
(SELECT Item, Qty
FROM orderDetails od
WHERE od.[OrderId] = om.[OrderId]
ORDER BY Item
FOR XML PATH('')) Details
FROM orderMain om
WHERE BoxID IS NULL
) mo ON bo.Details = mo.Details
SQL Fiddle
Here's a different approach using SQL and a few analytics.
This joins order detail to itself based on item and qty and order number < other order number and ensures the count of items in each order matches. Thus if items match, count matches and qty matches then the order has the same items.
This returns both orders but easily enough to adjust. Using the CTE so the count materializes. Pretty sure you can't use a having with an analytic like this.
The one major assumption I'm making is that order numbers are sequential and when you say see if an older order exists, I should only need to look at earlier order numbers when evaluating if a prior order had the same items and quantities.
I'm also assuming a 100% match means: Exact same items. Same Quantity of items. and SAME Item Count so count of items for order 1 is 3 and order 2 is 3 and items and quantities match that is 100% but if order 2 had 4 items and order 1 only had 3, no match.
with cte as (
SELECT distinct OD1.OrderID PriorOrder, od2.orderID newOrder, OM.BoxId,
count(OD1.Item) over (partition by OD1.OrderID) OD1Cnt,
count(OD2.Item) over (partition by OD2.OrderID) OD2cnt
FROM OrderDetails OD1
INNER JOIN orderDetails OD2
on OD1.item=OD2.item
and od1.qty = od2.qty
and OD1.OrderID < OD2.OrderID
LEFT JOIN ORderMain OM
on OM.OrderID = OD1.orderID)
Select PriorOrder, NewOrder, boxID from cte where od1cnt = od2cnt

Group by Date bigger than SQL Server

I'm working with Micrososft SqlServer 2012 and I have this table:
Table
+-----------+--------+------------+
| Id_Client | Amount | Date |
+-----------+--------+------------+
| 1 | 100 | 24/08/2015 |
| 2 | 100 | 24/07/2015 |
| 3 | 100 | 24/06/2015 |
| 3 | 100 | 24/05/2015 |
+-----------+--------+------------+
And I need to make a query like this:
Query
SELECT ID_CLIENT,
CASE WHEN DATE <= '01/07/2015' THEN 'OLD' ELSE 'NEW' END,
SUM(AMOUNT) FROM TABLE
GROUP BY ID_CLIENT
How do I Group by Date with the condition, instead of each Date?
I expect something like:
Expected result
+---+-----+-----+
| 1 | NEW | 100 |
| 2 | NEW | 100 |
| 3 | OLD | 200 |
+---+-----+-----+

Finding the max and min date values in SQL Server tables

I have two tables:
A lookup table (tabOne):
KEY | Group | Name | Desc | Val_Key
----------------------------------------
1 | a | NameA | DescA | 10
2 | b | NameB | DescB | 20
3 | c | NameC | DescC | 30
4 | d | NameD | DescD | 40
5 | e | NameE | DescE | 50
6 | f | NameF | DescF | 60
A second table containing readings (tabTwo):
KEY | Date | Reading | Val_Key
----------------------------------------
1 | Date | Read | 10
2 | Date | Read | 20
3 | Date | Read | 40
4 | Date | Read | 40
5 | Date | Read | 30
6 | Date | Read | 20
7 | Date | Read | 40
8 | Date | Read | 20
9 | Date | Read | 10
10 | Date | Read | 20
11 | Date | Read | 50
12 | Date | Read | 60
What I need to do is join tabTwo with TabOne and create a column with the newest Reading and a column with the oldest reading for each item in the group column of TabOne.
At the end of the day I want a table that look as follow:
KEY | Group | Name | Desc | Val_Key | LastReading | FirstReading |
-------------------------------------------------------------------------
1 | a | NameA | DescA | 10 | | |
2 | b | NameB | DescB | 20 | | |
3 | c | NameC | DescC | 30 | | |
4 | d | NameD | DescD | 40 | | |
5 | e | NameE | DescE | 50 | | |
6 | f | NameF | DescF | 60 | | |
Thanks!
Freddie
If this is Sql Server 2005 or newer, outer apply will help:
select TabOne.*,
last.Reading LastReading,
first.Reading FirstReading
from TabOne
outer apply
(
select top 1
Reading
from TabTwo
where TabTwo.Val_Key = TabOne.val_Key
order by TabTwo.Date desc
) last
outer apply
(
select top 1
Reading
from TabTwo
where TabTwo.Val_Key = TabOne.val_Key
order by TabTwo.Date asc
) first
Live test is # Sql Fiddle.
#Nikola Markovinović's solution can be made more universally applicable if the subqueries are moved directly to the main query's SELECT clause, which is possible each of them retrieves only one value and is, therefore, valid as a scalar expression:
SELECT
t1.[KEY],
t1.[Group],
t1.Name,
t1.[Desc],
t1.Val_Key,
(
SELECT TOP 1 Reading
FROM TabTwo
WHERE Val_Key = t1.Val_Key
ORDER BY Date DESC
) AS LastReading,
(
SELECT TOP 1 Reading
FROM TabTwo
WHERE Val_Key = t1.Val_Key
ORDER BY Date ASC
) AS FirstReading
FROM TabOne t1
If you needed e.g. dates along the way, you would probably have to stick to Nikola's solution. There is an alternative to it, but it's more cumbersome (albeit more standard too): it would involve grouping TabTwo's data by Val_Key to get earliest/latest dates per Val_Key, then joining back to TabTwo to access entire rows corresponding to the found dates to finally pull the necessary columns, and ultimately joining both result sets to TabOne to get the final column set.

Resources