group by with 'pre-defined row' - sql-server

Say I have the following PaymentTransaction table:
ID Amount PayMethodID
----------------------------
10254 100 1
15789 150 1
15790 200 0
16954 300 0
17864 400 1
19364 500 1
And this PayMethod table:
PayMethodID Desc
----------------------------
0 CASH
1 VISA
2 MASTER
3 AMEX
4 ETC
I can use a simple GROUP BY to total the amounts for PayMethodID 0 and 1.
What I am trying to do is also show the PayMethodIDs that don't exist in the transaction table.
My current result with a simple GROUP BY statement is:
PayMethodID TotalAmount
-------------------------
0 500
1 1150
Expected result (showing 0 for any PayMethodID that doesn't exist in the transaction table):
PayMethodID TotalAmount
-------------------------
0 500
1 1150
2 0
3 0
4 0
This might be a simple, duplicate question, but I just can't find the right keywords to search for. I will remove this post if you can point me to a duplicate. Thanks.

You can use a LEFT JOIN, so that all rows from the leftmost table (TableA) are shown whether or not they have matching values in the other table.
SELECT a.PayMethodID,
TotalAmount = ISNULL(SUM(b.Amount), 0)
FROM TableA AS a -- <== contains list of card type
LEFT JOIN TableB AS b -- <== contains the payment list
ON a.PayMethodID = b.PayMethodID
GROUP BY a.PayMethodID

A regular OUTER (LEFT) JOIN will give you all rows from the PayMethod table whether or not they exist in the PaymentTransaction table; the sums for the unmatched rows will be NULL. You can then use COALESCE to turn those NULLs into zero:
SELECT pm.PayMethodID, COALESCE(SUM(pt.Amount), 0) TotalAmount
FROM PayMethod pm
LEFT JOIN PaymentTransaction pt
ON pm.PayMethodID = pt.PayMethodID
GROUP BY pm.PayMethodID
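To see the LEFT JOIN plus COALESCE pattern end to end, here is a minimal sketch that runs the same query against an in-memory SQLite database through Python's built-in sqlite3 module, as a stand-in for SQL Server. COALESCE behaves the same way in both; ISNULL is SQL Server-specific, so it is not used here. The table and column names follow the question, except that the Desc column is renamed Description (an assumption, to avoid quoting a keyword).

```python
import sqlite3

def totals_by_pay_method():
    """Return (PayMethodID, TotalAmount) for every pay method, including unused ones."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE PayMethod (PayMethodID INTEGER PRIMARY KEY, Description TEXT);
        CREATE TABLE PaymentTransaction (ID INTEGER PRIMARY KEY, Amount INTEGER, PayMethodID INTEGER);
        INSERT INTO PayMethod VALUES (0, 'CASH'), (1, 'VISA'), (2, 'MASTER'), (3, 'AMEX'), (4, 'ETC');
        INSERT INTO PaymentTransaction VALUES
            (10254, 100, 1), (15789, 150, 1), (15790, 200, 0),
            (16954, 300, 0), (17864, 400, 1), (19364, 500, 1);
    """)
    rows = conn.execute("""
        SELECT pm.PayMethodID, COALESCE(SUM(pt.Amount), 0) AS TotalAmount
        FROM PayMethod pm                       -- every pay method is kept
        LEFT JOIN PaymentTransaction pt         -- unmatched methods get NULL amounts
          ON pm.PayMethodID = pt.PayMethodID
        GROUP BY pm.PayMethodID
        ORDER BY pm.PayMethodID
    """).fetchall()
    conn.close()
    return rows

print(totals_by_pay_method())
# prints [(0, 500), (1, 1150), (2, 0), (3, 0), (4, 0)]
```

The key design point is that the table listing all possible values (PayMethod) sits on the left of the LEFT JOIN and supplies the GROUP BY column; grouping by pt.PayMethodID instead would collapse all unmatched methods into a single NULL group.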

How to fix Aggregation in Group By, missing aggregation values

I have a table of sales info, and I want to group by customer and return the SUM, COUNT and MAX of a few columns. Any ideas, please?
I checked that all the SELECT columns are included in the GROUP BY statement, but the query returns detail rows instead of one row per customer with the aggregate values.
I tried some explicit naming but that didn't help.
SELECT
customerID AS CUST,
COUNT([InvoiceID]) AS Count_Invoice,
SUM([Income]) AS Total_Income,
SUM([inc2015]) AS Tot_2015_Income,
SUM([inc2016]) AS Tot_2016_Income,
MAX([prodA]) AS prod_A
FROM [table_a]
GROUP BY
customerID, InvoiceID,Income,inc2015, inc2016, prodA
There are multiple rows per CUST, but there should be one row each for CUST 1, 2, etc. It should return this:
---------------------------------------------
CUST Count_Invoice Total_Income Tot_2015_Income Tot_2016_Income prod_A
1 2 600 300 300 2
But it is returning this:
======================================
CUST Count_Invoice Total_Income Tot_2015_Income Tot_2016_Income prod_A
1 1 300 300 0 1
1 1 300 0 300 1
2 1 300 0 300 1
2 1 500 0 500 0
3 2 800 0 800 0
3 1 300 0 300 1
You don't need to group by the other columns, since they are already wrapped in aggregate functions (COUNT, SUM, MAX).
So you can try this:
SELECT customerID as CUST
,count([InvoiceID]) as Count_Invoice
,sum([Income]) as Total_Income
,sum([inc2015]) as Tot_2015_Income
,sum([inc2016]) as Tot_2016_Income
,max([prodA]) as prod_A --- here you are taking Max but in output it seems like sum
FROM [table_a]
Group By customerID
Note: for column prod_A you are using MAX, which gives 1, but your expected result shows 2, which is actually a SUM or COUNT. Please check.
From the description of your expected output, you should be aggregating by customer alone:
SELECT
customerID AS CUST,
COUNT([InvoiceID]) AS Count_Invoice,
SUM([Income]) AS Total_Income,
SUM([inc2015]) AS Tot_2015_Income,
SUM([inc2016]) AS Tot_2016_Income,
MAX([prodA]) AS prod_A
FROM [table_a]
GROUP BY
customerID;
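The difference between grouping by customerID alone and grouping by every column can be checked with a small sketch, again using Python's sqlite3 module in place of SQL Server. The rows are hypothetical: they are loosely modelled on the detail rows for customers 1 and 2 shown in the question, and the InvoiceID values are made up.

```python
import sqlite3

# Shared SELECT list; only the GROUP BY clause differs between the two queries.
AGGREGATES = """
    SELECT customerID AS CUST,
           COUNT(InvoiceID) AS Count_Invoice,
           SUM(Income)      AS Total_Income,
           SUM(inc2015)     AS Tot_2015_Income,
           SUM(inc2016)     AS Tot_2016_Income,
           MAX(prodA)       AS prod_A
    FROM table_a
"""

def compare_groupings():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE table_a (InvoiceID INTEGER, customerID INTEGER, Income INTEGER,
                              inc2015 INTEGER, inc2016 INTEGER, prodA INTEGER);
        -- Hypothetical rows: two invoices for each of two customers.
        INSERT INTO table_a VALUES
            (101, 1, 300, 300,   0, 1),
            (102, 1, 300,   0, 300, 1),
            (201, 2, 300,   0, 300, 1),
            (202, 2, 500,   0, 500, 0);
    """)
    # One group per customer: only the non-aggregated column is grouped.
    per_customer = conn.execute(
        AGGREGATES + " GROUP BY customerID ORDER BY customerID").fetchall()
    # Grouping by every column leaves each detail row in its own group.
    per_detail = conn.execute(
        AGGREGATES + " GROUP BY customerID, InvoiceID, Income, inc2015, inc2016, prodA").fetchall()
    conn.close()
    return per_customer, per_detail

per_customer, per_detail = compare_groupings()
print(per_customer)
# prints [(1, 2, 600, 300, 300, 1), (2, 2, 800, 0, 800, 1)]
```

With these sample rows the per-customer query gives prod_A = 1 for customer 1, which matches the note above that MAX cannot produce the 2 shown in the expected output.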

Find nearest row that matches condition in SQL Server

I have a SQL table with unique IDs, a date of service for a health care encounter, and whether this encounter was an emergency room visit (ed = 1) or a hospital admission (hosp = 1).
For each unique ID, I want to identify ED visits that occurred <= 1 calendar day from a hospital stay.
So I think I want SQL to first identify ED visits, then search up and down to find the nearest hospital admission and calculate the difference in dates (absolute value). I'm familiar with the LAG/LEAD and ROW_NUMBER() functions, but can't quite figure this out.
Any ideas would be much appreciated! Thank you!
Table looks like this for one illustrative ID:
id date ed hosp
1 2012-01-01 0 1
1 2012-01-05 1 0
1 2012-02-01 0 1
1 2012-02-03 1 0
1 2012-05-01 0 0
And I want to create a new column (ed_hosp_diff) that is the minimum absolute date difference (days) between each ED visit and the closest hospital stay, something like this:
id date ed hosp ed_hosp_diff
1 2012-01-01 0 1 null
1 2012-01-05 1 0 4
1 2012-02-01 0 1 null
1 2012-02-03 1 0 2
1 2012-05-01 0 0 null
So this doesn't get you the output table you show, but it meets the requirement you list:
For each unique ID, I want to identify ED visits that occurred <= 1 calendar day from a hospital stay.
Your output table doesn't really give you that: it includes rows for ED visits that don't have a matching hospital admit, rows for hospital admits, etc. This SQL doesn't return those; it just gives you the ED visits that were followed by a hospital admit within one day.
It also doesn't give you matches with negative days, i.e. cases where the hospital visit is prior to the ED visit (in healthcare analytics, that's usually a different thing from looking for ED visits followed by an IP admit). If you do want those, replace the last two conditions in the WHERE clause of the main query with ABS(DATEDIFF(dd, e.date, h.date)) <= 1.
SELECT
ID = e.id,
ED_DATE = e.date,
HOSP_DATE = h.date,
ED_HOSP_DIFF = DATEDIFF(dd, e.date, h.date)
FROM
Table1 AS e
JOIN
(
SELECT
id,
date
FROM
Table1
WHERE
hosp = 1
) AS h
ON
e.id = h.id
WHERE
e.ed = 1
AND
DATEDIFF(dd, e.date, h.date) <= 1
AND
DATEDIFF(dd, e.date, h.date) >= 0
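Use OUTER APPLY to take each row with ed = 1 and find the minimum date difference to a hospital stay for the same id:
SELECT *
FROM [table] t
OUTER APPLY
(
SELECT ed_hosp_diff = MIN ( ABS ( DATEDIFF(DAY, t.date, x.date) ) )
FROM [table] x
WHERE x.id = t.id -- same patient
AND x.hosp = 1 -- compare against hospital stays only
AND t.ed = 1 -- only ED rows get a value; all other rows get NULL
) eh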
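The per-row lookup can be tried out with Python's sqlite3 module. SQLite has no OUTER APPLY, so this sketch swaps in a correlated scalar subquery that does the same work for each row, and uses julianday() differences in place of DATEDIFF. The table name encounters is hypothetical; the rows are the illustrative ones from the question.

```python
import sqlite3

def ed_hosp_diff():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE encounters (id INTEGER, date TEXT, ed INTEGER, hosp INTEGER);
        INSERT INTO encounters VALUES
            (1, '2012-01-01', 0, 1),
            (1, '2012-01-05', 1, 0),
            (1, '2012-02-01', 0, 1),
            (1, '2012-02-03', 1, 0),
            (1, '2012-05-01', 0, 0);
    """)
    rows = conn.execute("""
        SELECT t.id, t.date, t.ed, t.hosp,
               CASE WHEN t.ed = 1 THEN (
                   -- nearest hospital stay for the same id, in whole days
                   SELECT CAST(MIN(ABS(julianday(x.date) - julianday(t.date))) AS INTEGER)
                   FROM encounters x
                   WHERE x.id = t.id AND x.hosp = 1
               ) END AS ed_hosp_diff   -- no ELSE, so non-ED rows get NULL
        FROM encounters t
        ORDER BY t.id, t.date
    """).fetchall()
    conn.close()
    return rows

for row in ed_hosp_diff():
    print(row)
```

This reproduces the ed_hosp_diff column from the question (4 and 2 for the two ED visits, NULL elsewhere). To keep only ED visits within one calendar day of a hospital stay, filter the result on ed_hosp_diff <= 1.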

SQL Server query involving subqueries - performance issues

I have three tables:
Table 1: dbo.pc_a21a22
batchNbr Other columns...
-------- ----------------
12345
12346
12347
Table 2: dbo.outcome
passageId record
---------- ---------
00003 200
00003 9
00004 7
Table 3: dbo.passage
passageId passageTime batchNbr
---------- ------------- ---------
00001 2015.01.01 12345
00002 2016.01.01 12345
00003 2017.01.01 12345
00004 2018.01.01 12346
What I want to do: for each batchNbr in Table 1, first get its latest passageTime and the corresponding passageId from Table 3. With that passageId, get the relevant rows in Table 2 and establish whether any of these rows contains the record 200. Per passageId there are at most 2 records in Table 2.
What is the most efficient way to do this?
I have already created a query that works, but it's awfully slow and thus unfit for tables with millions of rows. Any suggestion on how to either change the query or do it another way? Altering the table structure is not an option, I only have read rights to the database.
My current solution (slow):
SELECT TOP 50000
a.batchNbr,
CAST ( CASE WHEN 200 in (SELECT TOP 2 record FROM dbo.outcome where passageId in (
SELECT SubqueryResults.passageId From (SELECT Top 1 passageId FROM dbo.passage pass WHERE pass.batchNbr = a.batchNbr ORDER BY passageTime Desc) SubqueryResults
)
) then 1 else 0 end as bit) as KGT_IO_END
FROM dbo.pc_a21a22 a
The desired output is:
batchNbr 200present
--------- ----------
12345 1
12346 0
I suggest you use table joins rather than subqueries.
select
a.*, b.*
from
dbo.table1 a
join
dbo.table2 b on a.id = b.id
where
/*your where clause for filtering*/
EDIT:
You could use the question "Join vs. sub-query" as a reference.
Try this:
SELECT TOP 50000
a.batchNbr,
MAX(CASE WHEN b.record = 200 THEN 1 ELSE 0 END) AS KGT_IO_END -- one row per batch, even if the latest passage has 2 records
FROM dbo.Test1 AS a
LEFT OUTER JOIN
(SELECT o.record, p.batchNbr
FROM dbo.Test2 AS o
LEFT OUTER JOIN (SELECT MAX(passageId) AS passageId, batchNbr FROM dbo.Test3 GROUP BY batchNbr) AS p
ON o.passageId = p.passageId
) AS b ON a.batchNbr = b.batchNbr
GROUP BY a.batchNbr;
The MAX subquery gets the latest passageId for each batchNbr (this assumes passageIds increase over time, so the highest one also has the latest passageTime).
However, your example won't get the record 200, since the passageId of the record with 200 is 00001, while the latest passageId of the batchNbr 12345 is 00003.
I used LEFT OUTER JOIN because the passageIds in Table2 no longer match any of the latest passageIds from Table3, so the resulting subquery would have no records to join to Table1. An INNER JOIN would therefore return no rows for your example data.
Output from your example data:
batchNbr KGT_IO_END
12345 0
12346 0
12347 0
Output if we change the passageId of record 200 to 00003 (the latest for 12345)
batchNbr KGT_IO_END
12345 1
12346 0
12347 0
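For comparison, here is a minimal sketch of a join-only variant that picks the latest passage by passageTime rather than by MAX(passageId), run against the question's sample rows with Python's sqlite3 module standing in for SQL Server. It assumes passageTime is stored as text in the yyyy.mm.dd form shown, which sorts correctly as a string, and it renames the 200present column to present200 because an unquoted identifier cannot start with a digit.

```python
import sqlite3

def batches_with_200():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE pc_a21a22 (batchNbr INTEGER);
        CREATE TABLE outcome (passageId TEXT, record INTEGER);
        CREATE TABLE passage (passageId TEXT, passageTime TEXT, batchNbr INTEGER);
        INSERT INTO pc_a21a22 VALUES (12345), (12346), (12347);
        INSERT INTO outcome VALUES ('00003', 200), ('00003', 9), ('00004', 7);
        INSERT INTO passage VALUES
            ('00001', '2015.01.01', 12345),
            ('00002', '2016.01.01', 12345),
            ('00003', '2017.01.01', 12345),
            ('00004', '2018.01.01', 12346);
    """)
    rows = conn.execute("""
        WITH latest AS (
            -- latest passage per batch, chosen by passageTime
            SELECT p.batchNbr, p.passageId
            FROM passage p
            JOIN (SELECT batchNbr, MAX(passageTime) AS maxTime
                  FROM passage GROUP BY batchNbr) m
              ON p.batchNbr = m.batchNbr AND p.passageTime = m.maxTime
        )
        SELECT a.batchNbr,
               -- 1 if any outcome row of the latest passage is 200, else 0
               MAX(CASE WHEN o.record = 200 THEN 1 ELSE 0 END) AS present200
        FROM pc_a21a22 a
        LEFT JOIN latest l ON l.batchNbr = a.batchNbr
        LEFT JOIN outcome o ON o.passageId = l.passageId
        GROUP BY a.batchNbr
        ORDER BY a.batchNbr
    """).fetchall()
    conn.close()
    return rows

print(batches_with_200())
# prints [(12345, 1), (12346, 0), (12347, 0)]
```

Batches with no passage at all (12347 here) fall through both LEFT JOINs and get 0; switch the joins to INNER JOIN if such batches should be dropped instead, as in the question's desired output.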

SQL Pivot only select rows

I am attempting to pivot a table so that only certain rows become columns. Below is what my table looks like:
ID  QType   CharV     NumV
1   AccNum            10
1   EmpNam  John Inc  0
1   UW      Josh      0
2   AccNum            11
2   EmpNam  CBS       0
2   UW      Dan       0
I would like the table to look like this:
ID AccNum EmpNam
1 10 John Inc
2 11 CBS
I have two main problems I am trying to account for.
1st: the value that I am trying to get isn't always in the same column: AccNum is always in the NumV column, while EmpNam is always in the CharV column.
2nd: I need to find a way to ignore data that I don't want. In this example, that would be the rows with UW in the QType column.
Below is the code that I have:
SELECT *
FROM testTable
Pivot(
MAX(NumV)
FOR[QType]
In ([AccNum],[TheValue])
)p
But it's giving me the below result:
ID  CharV     AccNum  TheValue
1             10      NULL
2             11      NULL
2   CBS       NULL    NULL
2   Dan       NULL    NULL
1   John Inc  NULL    NULL
1   Josh      NULL    NULL
In this case grouping with conditional aggregation should work. Try something like:
SELECT ID
, MAX(CASE WHEN QType = 'AccNum' THEN NumV END) AS AccNum
, MAX(CASE WHEN QType = 'EmpNam' THEN CharV END) AS EmpNam
FROM testTable
GROUP BY ID
Since each inner CASE only returns a value when its WHEN condition is met (and NULL otherwise), MAX ignores the NULLs and gives you the desired value. Rows with any other QType, such as UW, match neither CASE and are effectively ignored. Of course, this only works as long as each QType appears at most once per ID.
Generally, PIVOT in SQL Server doesn't handle this in one step when your conditions are complex, especially when you need values from different columns. You could pivot your table in two queries and join the results, but that would likely perform worse and be less readable than my suggestion.
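The conditional-aggregation query can be checked against the question's sample rows with a short sketch using Python's sqlite3 module (SQLite standing in for SQL Server; the query itself is unchanged). It assumes the blank CharV cells in the question are NULL.

```python
import sqlite3

def pivot_selected_rows():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE testTable (ID INTEGER, QType TEXT, CharV TEXT, NumV INTEGER);
        INSERT INTO testTable VALUES
            (1, 'AccNum', NULL, 10), (1, 'EmpNam', 'John Inc', 0), (1, 'UW', 'Josh', 0),
            (2, 'AccNum', NULL, 11), (2, 'EmpNam', 'CBS', 0),      (2, 'UW', 'Dan', 0);
    """)
    rows = conn.execute("""
        SELECT ID,
               -- numeric value taken from NumV, text value taken from CharV
               MAX(CASE WHEN QType = 'AccNum' THEN NumV END)  AS AccNum,
               MAX(CASE WHEN QType = 'EmpNam' THEN CharV END) AS EmpNam
        FROM testTable          -- 'UW' rows match no CASE and contribute only NULLs
        GROUP BY ID
        ORDER BY ID
    """).fetchall()
    conn.close()
    return rows

print(pivot_selected_rows())
# prints [(1, 10, 'John Inc'), (2, 11, 'CBS')]
```

Each output column picks its own source column inside its CASE, which is what lets one query pull AccNum from NumV and EmpNam from CharV.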

SQL Server : Join from multiple table references

Forgive me for adding yet another JOIN question, but I've been stumped all day and haven't been able to find a good answer for this.
I'm trying to join 4 tables, such that they look like below:
QuarterID ReviewID SaleID PotentialID
1 1 1 1
1 2 2 null
1 3 null 2
1 4 null null
The relevant info from the tables is below
Sale:
QuarterID
ReviewID
IsArchived
Potential:
QuarterID
ReviewID
IsArchived
Quarter:
ID
Review:
ID
We can have multiple Sales and Potentials associated with one Quarter-Review pairing, but only one Sale and one Potential will have IsArchived = 0 for the given Quarter-Review pairing.
SELECT
quarter.id AS QID,
review.id AS RID,
Sales.id AS SID,
Potentials.id AS PID
FROM
dbo.quarter
JOIN
(SELECT *
FROM dbo.sale
WHERE isarchived = 0) AS Sales ON Sales.quarterid = quarter.id
JOIN
(SELECT *
FROM dbo.potential
WHERE isarchived = 0) AS Potentials ON Potentials.quarterid = quarter.id
JOIN
dbo.review ON dbo.review.id = Sales.reviewid
AND dbo.review.id = Potentials.reviewid
ORDER BY
quarter.id, rid
Using the above (there are some unnecessary columns, I know), I've managed to get the joins to satisfy the first condition (all the Sales and Potentials that share the same Quarter and Review combination). But I also want to see Quarter/Review combos with only a Sale and no Potential, combos with only a Potential and no Sale, and in fact every Quarter and Review combo, since only a few Q/R combos have both a Sale and a Potential; almost all of them have only a Sale or only a Potential.
I guess overall the difficulty comes from needing to get the join from two intermediate tables. I can join Quarter, Sale, and Review easily, but having the Potential table joining on the same fields (ReviewID, QuarterID) as Sale is making me only get the AND, and I can't figure out an OR. I've been throwing around ORs for hours trying to get the right sequence without any luck. Help?
--Edit to include sample data--
Quarter
ID
1
2
Review
ID (Other fields, not relevant to join)
1
2
3
4
5
Sale
ID ReviewID QuarterID isArchived (Other fields, not relevant)
1 1 1 0
2 2 1 1
3 2 1 0
4 1 2 0
5 5 1 0
6 5 2 0
Potential
ID ReviewID QuarterID isArchived (Other fields, not relevant)
1 1 1 0
2 3 1 0
3 4 2 1
4 4 2 0
5 5 2 0
Joining the above sample data, I would expect the output to look like:
QuarterID ReviewID SaleID PotentialID
1 1 1 1
1 2 3 null
1 3 null 2
1 4 null null
1 5 5 null
2 1 4 null
2 2 null null
2 3 null null
2 4 null 4
2 5 6 5
But the problem I am having is that I only get rows like the first and last row, where there is both a Sale and a Potential for a given Quarter/Review combo, and not the ones where one or both are null.
Not sure if I understood your question correctly (some sample data would help), but I think you need all the combinations of Quarter and Review, plus any related Sale and Potential data for each combination. If that is what you need, try the query below:
SELECT [Quarter].ID AS QID, Review.ID AS RID, Sales.ID AS SID, Potentials.ID AS PID FROM [Quarter]
CROSS JOIN [Review]
LEFT JOIN (SELECT * FROM Sale WHERE IsArchived = 0) Sales ON [Quarter].ID = Sales.QuarterID AND [Review].ID = Sales.ReviewID
LEFT JOIN (SELECT * FROM Potential WHERE IsArchived = 0) Potentials ON [Quarter].ID = Potentials.QuarterID AND [Review].ID = Potentials.ReviewID
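The CROSS JOIN plus two LEFT JOINs can be verified against the sample data from the question's edit with a short sketch using Python's sqlite3 module (SQLite standing in for SQL Server, so the [Quarter] brackets are dropped and an ORDER BY is added for a stable result).

```python
import sqlite3

def quarter_review_grid():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Quarter (ID INTEGER);
        CREATE TABLE Review (ID INTEGER);
        CREATE TABLE Sale (ID INTEGER, ReviewID INTEGER, QuarterID INTEGER, IsArchived INTEGER);
        CREATE TABLE Potential (ID INTEGER, ReviewID INTEGER, QuarterID INTEGER, IsArchived INTEGER);
        INSERT INTO Quarter VALUES (1), (2);
        INSERT INTO Review VALUES (1), (2), (3), (4), (5);
        INSERT INTO Sale VALUES
            (1, 1, 1, 0), (2, 2, 1, 1), (3, 2, 1, 0), (4, 1, 2, 0), (5, 5, 1, 0), (6, 5, 2, 0);
        INSERT INTO Potential VALUES
            (1, 1, 1, 0), (2, 3, 1, 0), (3, 4, 2, 1), (4, 4, 2, 0), (5, 5, 2, 0);
    """)
    rows = conn.execute("""
        SELECT q.ID, r.ID, s.ID, p.ID
        FROM Quarter q
        CROSS JOIN Review r   -- every Quarter/Review pair, matched or not
        LEFT JOIN (SELECT * FROM Sale WHERE IsArchived = 0) s
               ON s.QuarterID = q.ID AND s.ReviewID = r.ID
        LEFT JOIN (SELECT * FROM Potential WHERE IsArchived = 0) p
               ON p.QuarterID = q.ID AND p.ReviewID = r.ID
        ORDER BY q.ID, r.ID
    """).fetchall()
    conn.close()
    return rows

for row in quarter_review_grid():
    print(row)
```

The ten rows match the expected output in the question. The design point is that the CROSS JOIN builds the full Quarter/Review grid first, so Sale and Potential are each joined independently to that grid instead of to each other, which is why a missing Sale no longer removes the Potential (or vice versa).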
