Finding aggregate by Joining Table - SQL SERVER - sql-server

Question: Find the percentage of people who died out of the cases reported for each country.
The data is in two tables, cases and death, and both have a country column. deaths is the column holding the number of deaths in the death table, and cases is the column holding the reported cases in the cases table.
With the following query,
SELECT (sum(isnull(d.deaths,0))/sum(isnull(c.cases,0)))*100
FROM cases as c
JOIN death as d ON c.country=d.country
I'm getting an answer of 0.54493487236178.
==
Aggregating each table separately and then dividing, I get the following (Excel gives the same result).
SELECT sum(cases) FROM cases
The value is 106036635
SELECT sum(deaths) FROM death
The value is 716111
(716111/106036635)*100= 0.675343008.
Why do the two values differ?
==
ALSO
SELECT c.country, (sum(d.deaths)/sum(c.cases))*100
FROM cases as c
JOIN death as d ON c.country1=d.country1 AND c.cases IS NOT NULL AND d.deaths IS NOT NULL
GROUP BY c.country
is giving me "Divide by zero error encountered." I understand my code is quite ugly and long since I'm a newbie. Please help me.

This is actually all correct. Your statements are all returning correct information; you just need to adjust your queries to see it.
We will start with the ratios you feel are incorrect. Remember that your initial query does an inner join, which is why your sums seem to be incorrect. If a country has cases but no matching row in death, the join drops that country and its cases are not counted (and likewise for deaths with no matching row in cases). Your check queries should look like
SELECT sum(cases) FROM cases where cases.country in (Select death.country from death)
and
SELECT sum(deaths) FROM death where death.country in (Select cases.country from cases)
Dividing these two totals should give you the same ratio as the joined query.
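To see the effect in isolation, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The three countries and their numbers are made up for illustration; country C has cases but no row in death. Which way the gap goes in your real data depends on which rows go unmatched.

```python
import sqlite3

# Hypothetical sample data: country 'C' has cases but no row in death.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cases (country TEXT, cases REAL);
    CREATE TABLE death (country TEXT, deaths REAL);
    INSERT INTO cases VALUES ('A', 100), ('B', 200), ('C', 700);
    INSERT INTO death VALUES ('A', 10), ('B', 20);
""")

# Separate totals count every row of each table.
total_cases = conn.execute("SELECT SUM(cases) FROM cases").fetchone()[0]
total_deaths = conn.execute("SELECT SUM(deaths) FROM death").fetchone()[0]
separate_pct = total_deaths / total_cases * 100

# The inner join keeps only countries present in both tables,
# so the 700 cases of country 'C' never reach the SUM.
joined_pct = conn.execute("""
    SELECT SUM(d.deaths) / SUM(c.cases) * 100
    FROM cases AS c
    JOIN death AS d ON c.country = d.country
""").fetchone()[0]

# Restricting the check query the same way reproduces the joined total.
matched_cases = conn.execute("""
    SELECT SUM(cases) FROM cases
    WHERE country IN (SELECT country FROM death)
""").fetchone()[0]

print(separate_pct, joined_pct, matched_cases)
```

The two percentages differ only because of the unmatched country; once the check query is limited to matching countries, the totals line up with the join.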
For your second problem, there is a chance that you have countries listed without any cases. In that case we want a conditional expression that checks the summed cases before dividing (the check has to be on the aggregate, because c.cases on its own is not in the GROUP BY):
Select c.country,
Case
When sum(c.cases) = 0 Then
Case
When sum(d.deaths) = 0 Then 0
Else 100
End
Else (sum(d.deaths)/sum(c.cases))*100
End As Ratio
From cases c
Inner Join death d
On c.country = d.country
Where c.cases Is Not Null
And d.deaths Is Not Null
Group By c.country
Or, if you also want to exclude rows with 0 cases, you can simplify the query to this:
Select c.country,
(sum(d.deaths)/sum(c.cases))*100 As Ratio
From cases c
Inner Join death d
On c.country = d.country
Where c.cases <> 0
And c.cases Is Not Null
And d.deaths Is Not Null
Group By c.country

Try this:
SELECT c.country, cast(sum(d.deaths)/(case when sum(c.cases) = 0 then 1 else sum(c.cases) end) as float) *100
FROM cases as c
JOIN death as d ON c.country1=d.country1 AND c.cases IS NOT NULL AND d.deaths IS NOT NULL
GROUP BY c.country
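Another common way to guard the division is NULLIF, which turns a zero denominator into NULL so the ratio comes back as NULL instead of raising an error. The sketch below runs it through Python's sqlite3 with made-up rows. Note that SQLite already returns NULL on division by zero, so this only shows the shape of the query and its result; the NULLIF is what would matter on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cases (country TEXT, cases REAL);
    CREATE TABLE death (country TEXT, deaths REAL);
    -- hypothetical data: country 'Z' reports zero cases
    INSERT INTO cases VALUES ('A', 200), ('Z', 0);
    INSERT INTO death VALUES ('A', 10), ('Z', 0);
""")

rows = conn.execute("""
    SELECT c.country,
           SUM(d.deaths) * 100.0 / NULLIF(SUM(c.cases), 0) AS pct
    FROM cases AS c
    JOIN death AS d ON c.country = d.country
    GROUP BY c.country
    ORDER BY c.country
""").fetchall()

print(rows)  # [('A', 5.0), ('Z', None)]
```

Unlike replacing the zero with 1, this leaves the ratio undefined (NULL) for countries with no cases, which you can then display however you like.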

Related

Unsupported subquery type cannot be evaluated with LIMIT 1 [closed]

(Submitting the following as both an attempt at info-share of best practices as well as a request for additional opinions...)
Q: I need to run a query that gets the exchange rate for a transaction. Based on the date of the transaction, I need the latest exchange rate from a rate history table where the starting effective date for the rate is less than the transaction's date.
Very basically:
SELECT t.Amount, t.TrxDate, t.Currency, ex.ExchangeRate
FROM Transactions t
LEFT JOIN LATERAL (
SELECT ExchangeRate
FROM RateHistory h
WHERE h.EffectiveDate <= t.TrxDate AND t.Currency = h.Currency ORDER BY EffectiveDate DESC
LIMIT 1) ex
The problem comes in with the LIMIT 1. I can't have this subquery returning multiple rows, I only need the most recent exchange rate. Using LIMIT 1 or attempting to use ROW_NUMBER() and a conditional join both result in the same "Unsupported subquery" error.
Anyone have any recommendations? The query works just fine elsewhere, but not in Snowflake.
Here is the actual query. The last subquery is the issue.
SELECT
'SG' AS Region,
CAST(gl.acct_no AS NUMERIC(12,6)) AS acct_no,
gl.trx_date,
source,
gl.reference,
doc_no,
gl.amount,
for_amt,
reference2,
gl.reg_int_id,
gl.reg_seq_no,
gl.curr_id AS gl_curr_id,
c.curr_id AS chart_curr_id,
z.reference AS reference3,
z.dist_dt,
cur.cur_desc,
ex.exchng_rate
FROM gltrx gl
JOIN chart c ON ROUND(c.acct_no,6) = ROUND(gl.acct_no,6)
LEFT OUTER JOIN currencies cur ON gl.curr_id = cur.curr_id
LEFT JOIN LATERAL (SELECT MAX(ap.reference) AS reference, MIN(aph.dist_dt) AS dist_dt
FROM apdist ap
LEFT OUTER JOIN aphist aph ON ap.voucher_no = aph.voucher_no
WHERE ap.reg_int_id = gl.reg_int_id AND ap.reg_seq_no = gl.reg_seq_no) z
LEFT JOIN LATERAL (SELECT Exchng_rate
FROM Curr_hist
WHERE Curr_Id = gl.curr_id
AND Exchng_Dt <= gl.trx_date
ORDER BY exchng_dt DESC
LIMIT 1) ex
Recommendation 1: I tried the query below, which returned one row successfully; both the emp and dept tables have two rows each.
SELECT * FROM EMP WHERE DEPTNO = (SELECT DEPTNO FROM DEPT LIMIT 1)
So LIMIT can be used in a subquery; the issue is not LIMIT itself but somewhere else.
Try breaking the query into pieces, running each piece individually, and then combining them one by one to analyze.
Response To Recommendation 1: That subquery works, however the issue is with the LEFT JOIN LATERAL. I have tried my query without the LIMIT 1 and it works fine... except it obviously returns too many results.
Recommendation 2: Is there any specific reason for using LATERAL?
Try writing the subquery using a WITH clause and joining it later.
Response to Recommendation 2: Using LATERAL because I need exactly one record from the subquery to join to each record in the transaction table. LATERAL allows for the conditional aggregation to achieve that. I'm not quite sure what you mean by using the WITH clause. Are you suggesting a CTE?
Recommendation 3: Maybe you can try removing LATERAL and using the MAX function instead of sorting the subquery and taking LIMIT 1, because essentially you need the latest rate relative to the specified date, and MAX by date will give you that.
SELECT t.Amount, t.TrxDate, t.Currency, ex.ExchangeRate
FROM Transactions t
LEFT JOIN (
SELECT MAX(h.EffectiveDate ), h.Currency
FROM RateHistory h
WHERE t.TrxDate <= h.EffectiveDate
AND t.Currency = h.Currency
GROUP BY h.Currency
) ex
Response to Recommendation 3: Since what I need is the exchange rate for the max date, I'd need to select the exchange rate in that subquery as well, which means it would need to be in the GROUP BY and I'd end up in the same place: too many joined rows.
Are there any other recommendations and/or work-arounds available out there??
Use a window function to make sure you get only one row? (ORDER is a reserved word, so the alias is rnk here.)
(SELECT Exchng_rate,
RANK() OVER (ORDER BY exchng_dt DESC) as rnk
FROM Curr_hist
WHERE Curr_Id = gl.curr_id
AND Exchng_Dt <= gl.trx_date
)
where rnk=1
So I moved the first LATERAL to a CTE, although that might not have been the problem.
The second LATERAL needs each row's gl.trx_date, so you can instead LEFT JOIN on that condition (which gives far too many rows) and then use ROW_NUMBER() to implement the filter you wanted from LIMIT 1. I suspect this will be slow, but it should work.
WITH cte_a AS (
SELECT ap.reg_int_id
,ap.reg_seq_no
,MAX(ap.reference) AS reference
,MIN(aph.dist_dt) AS dist_dt
FROM apdist ap
LEFT OUTER JOIN aphist aph
ON ap.voucher_no = aph.voucher_no
GROUP BY 1,2
)
SELECT Region,
acct_no,
trx_date,
source,
reference,
doc_no,
amount,
for_amt,
reference2,
reg_int_id,
reg_seq_no,
gl_curr_id,
chart_curr_id,
reference3,
dist_dt,
cur_desc,
exchng_rate
FROM (
SELECT
'SG' AS Region,
CAST(gl.acct_no AS NUMERIC(12,6)) AS acct_no,
gl.trx_date,
source,
gl.reference,
doc_no,
gl.amount,
for_amt,
reference2,
gl.reg_int_id,
gl.reg_seq_no,
gl.curr_id AS gl_curr_id,
c.curr_id AS chart_curr_id,
z.reference AS reference3,
z.dist_dt,
cur.cur_desc,
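ex.exchng_rate,
-- rn restarts per currency here; partition by the columns that identify one gltrx row to keep one rate per transaction
row_number() over (partition by gl.curr_id order by ex.exchng_dt DESC) as rn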
FROM gltrx gl
JOIN chart c
ON ROUND(c.acct_no,6) = ROUND(gl.acct_no,6)
LEFT OUTER JOIN currencies cur ON gl.curr_id = cur.curr_id
LEFT JOIN cte_a as z
ON z.reg_int_id = gl.reg_int_id AND z.reg_seq_no = gl.reg_seq_no
LEFT JOIN Curr_hist as ex
ON ex.Curr_Id = gl.curr_id and ex.Exchng_Dt <= gl.trx_date
)
WHERE rn = 1;
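As a small sanity check of the join-then-ROW_NUMBER() idea, here is a sketch run through Python's sqlite3 (window functions need SQLite 3.25 or newer). The trx and rate_hist tables, their columns, and the data are hypothetical stand-ins for gltrx and Curr_hist, and the partition is by the transaction's own id so that each transaction keeps exactly one rate. Whether Snowflake accepts the equivalent query on the real tables is not something this demo can confirm.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- hypothetical stand-ins for the transaction and rate history tables
    CREATE TABLE trx (id INTEGER, curr TEXT, trx_date TEXT, amount REAL);
    CREATE TABLE rate_hist (curr TEXT, eff_date TEXT, rate REAL);
    INSERT INTO trx VALUES
        (1, 'EUR', '2020-03-15', 100),
        (2, 'EUR', '2020-01-10', 50),
        (3, 'JPY', '2020-02-01', 900);
    INSERT INTO rate_hist VALUES
        ('EUR', '2020-01-01', 1.10),
        ('EUR', '2020-02-01', 1.12),
        ('EUR', '2020-04-01', 1.08);
""")

rows = conn.execute("""
    SELECT id, curr, trx_date, rate
    FROM (
        SELECT t.id, t.curr, t.trx_date, h.rate,
               -- number the candidate rates per transaction, newest first
               ROW_NUMBER() OVER (
                   PARTITION BY t.id
                   ORDER BY h.eff_date DESC
               ) AS rn
        FROM trx AS t
        LEFT JOIN rate_hist AS h
          ON h.curr = t.curr AND h.eff_date <= t.trx_date
    ) AS ranked
    WHERE rn = 1
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```

Transaction 1 picks up the 2020-02-01 rate, transaction 2 the 2020-01-01 rate, and the JPY transaction keeps its row with a NULL rate because of the LEFT JOIN.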

SUM vs EXIST in SqlServer

The intent is to return all 'Unprocessed' TransactionSets if they have NO PaymentUid and NO ProcessStatus.value('/CPI/#ProcessItem)[1]'... relations, and also pick up 'No-Matched-Payments' TransactionSets if they have ANY PaymentUid AND ANY ProcessStatus.value('/CPI/#ProcessItem)[1]'... relations.
The SUM functions in the HAVING clause seem clunky and don't allow SQL Server to stop as soon as it finds any match or none. So it seems inefficient, and at the very least quite clunky to read and deal with. Is there a way to write this with something like EXISTS?
select ts.TransactionSetUid
from TransactionSet ts
join TransactionHeader eh on ts.TransactionSet = eh.TransactionSet
join TransactionPayment tp on eh.TransactionHeaderUid = tp.TransactionHeaderUid
left join ServicePayment sp on tp.TransactionPaymentUid = sp.TransactionPaymentUid
where TransactionStatus in ('Unprocessed', 'No-Matched-Payments')
group by ts.TransactionSet
having (TransactionStatus = 'Unprocessed'
and SUM( CASE WHEN sp.TransactionItem is null THEN 0 ELSE 1 END) = 0
and SUM( CASE WHEN tp.ProcessStatus.value('(/CPI/#ProcessItem)[1]', 'varchar(50)') IS NULL THEN 0 ELSE 1 END) = 0)
or (ts.RuleStatus = 'No-Matched-Payments'
and (SUM( CASE WHEN sp.TransactionItem is null THEN 0 ELSE 1 END) <> 0
or SUM( CASE WHEN tp.ProcessStatus.value('(/CPI/#ProcessItem)[1]', 'varchar(50)') IS NULL THEN 0 ELSE 1 END) <> 0))
UPDATE to answer questions. The relationships between the TransactionSet is one to many with the other tables. There could be many TransactionPayment records but the query is only concerned with ProcessStatus.value that has an xml node at (/CPI/#processItem)[1]. But with ServicePayment, any non-null TransactionItem will do.
As I understand it, the group by is only in there because of the SUM functions. The intent is to flag any TransactionSet that meets one of two conditions.
The first condition is:
the Transaction Status is 'Unprocessed'
and
there are no Process Status values
and
there are no Transaction Items.
The second condition is:
the Transaction Status is 'No-Matched-Payments'
and
there is at least one Process Status value
or
there is at least one Transaction Item.
So the query was set up to use SUM to count the number of times the left join on ServicePayment comes up NULL or when the XML value in TransactionPayment doesn't contain a '/CPI/#processItem'.
It seems to me that instead of using a SUM, the query could use EXISTS or some other mechanism to short-circuit the test condition. The value of the SUM is not really important; it just needs to know whether there is at least one or there are none.
--
Thank you to everyone: I know I'm not a database expert by any means, and I've been programming in the seven C's (C, C++, C#, Java, etc.) for so long that I sometimes forget that SQL is not an imperative language, or more likely, I just don't think in declarative terms.
I think something like this should do the trick:
select ts.TransactionSetUid
from TransactionSet ts
where CASE WHEN EXISTS(SELECT * FROM TransactionHeader eh
join TransactionPayment tp on eh.TransactionHeaderUid = tp.TransactionHeaderUid
left join ServicePayment sp on tp.TransactionPaymentUid = sp.TransactionPaymentUid
where ts.TransactionSet = eh.TransactionSet and
(
sp.TransactionItem is not null or
tp.ProcessStatus.value('(/CPI/#ProcessItem)[1]', 'varchar(50)') IS not NULL
)
) THEN 1 ELSE 0 END =
CASE TransactionStatus
WHEN 'Unprocessed' THEN 0
WHEN 'No-Matched-Payments' THEN 1
END
That is, I've put the EXISTS check in to test for either condition and put it inside a CASE expression so that we don't have to write it out twice for which result we want (for Unprocessed and No-Matched-Payments).
I've also crafted the second CASE expression to return 0, 1 or NULL so that if the TransactionStatus is something else, it doesn't matter what result the EXISTS produces.
I hope I've followed the correct chains of 0/1, true/false, and/or, NULL/NOT NULL logic here - if it's not 100%, it's hopefully just tweaks to those options. I've also assumed I can shift all of the tables except TransactionSet into the EXISTS - it may be that TransactionHeader has to stay outside if that's where TransactionStatus is coming from.
If this isn't correct, you should probably add bare-bones tables and sample data to your question, alongside the expected results.
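For anyone who wants to convince themselves that the two shapes agree, here is a cut-down sketch using Python's sqlite3. The sets and items tables are hypothetical simplifications (one child table, no XML column), so it only checks the SUM-in-HAVING logic against the EXISTS-inside-CASE logic, not the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sets (uid INTEGER, status TEXT);
    CREATE TABLE items (set_uid INTEGER, item TEXT);
    INSERT INTO sets VALUES
        (1, 'Unprocessed'), (2, 'Unprocessed'),
        (3, 'No-Matched-Payments'), (4, 'No-Matched-Payments');
    -- sets 2 and 3 have a non-NULL item; sets 1 and 4 do not
    INSERT INTO items VALUES (1, NULL), (2, 'x'), (3, 'y'), (3, NULL);
""")

# Original style: count non-NULL items per set and test the count.
sum_version = conn.execute("""
    SELECT s.uid
    FROM sets AS s
    LEFT JOIN items AS i ON i.set_uid = s.uid
    GROUP BY s.uid, s.status
    HAVING (s.status = 'Unprocessed'
            AND SUM(CASE WHEN i.item IS NULL THEN 0 ELSE 1 END) = 0)
        OR (s.status = 'No-Matched-Payments'
            AND SUM(CASE WHEN i.item IS NULL THEN 0 ELSE 1 END) <> 0)
    ORDER BY s.uid
""").fetchall()

# EXISTS style: compare "has any item" (0/1) with what the status expects.
exists_version = conn.execute("""
    SELECT s.uid
    FROM sets AS s
    WHERE CASE WHEN EXISTS (SELECT 1 FROM items AS i
                            WHERE i.set_uid = s.uid
                              AND i.item IS NOT NULL)
               THEN 1 ELSE 0 END
        = CASE s.status WHEN 'Unprocessed' THEN 0
                        WHEN 'No-Matched-Payments' THEN 1 END
    ORDER BY s.uid
""").fetchall()

print(sum_version, exists_version)  # [(1,), (3,)] [(1,), (3,)]
```

Both queries return sets 1 and 3 on this data; the EXISTS form also needs no GROUP BY, since it never multiplies the parent rows.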
Yes, this might work. Your query did not include a select distinct, but if this produces duplicate TransactionSetUids, add the keyword distinct:
select [distinct] ts.TransactionSetUid from TransactionSet ts
join TransactionHeader th
on th.TransactionSet = ts.TransactionSet
join TransactionPayment tp
on tp.TransactionHeaderUid = th.TransactionHeaderUid
where not exists
( Select * from ServicePayment
Where TransactionPaymentUid = tp.TransactionPaymentUid
and tp.ProcessStatus.value(
'(/CPI/#ProcessItem)[1]', 'varchar(50)') IS NULL
and TransactionStatus = 'Unprocessed')
Or exists
( Select * from ServicePayment
Where TransactionPaymentUid = tp.TransactionPaymentUid
and ts.RuleStatus = 'No-Matched-Payments'
and tp.ProcessStatus.value(
'(/CPI/#ProcessItem)[1]', 'varchar(50)') IS not NULL
and ts.RuleStatus = 'No-Matched-Payments')

Complete Select Statement ELSE a defaulted value?

I'm wondering if there is a way to have a complete select statement and if no row is returned to return a row with a default value in a specific field (SQL Server)? Here's a generic version of my SQL to better explain:
SELECT COUNT(CASE WHEN CAST(c.InjuryDate as DATE)>DATEADD(dd,-60, getdate ()) THEN b.InjuryID end) InjuryCount, a.PersonID
FROM Person_Info a
JOIN Injury_Subject b on b.PersonID=a.PersonID
JOIN Injury_Info c on c.InjuryID=b.InjuryID
WHERE EXISTS (SELECT * FROM Hospital_Record d WHERE d.PersonID=b.PersonID and d.InjuryID=b.InjuryID) --There could be multiple people associated with the same InjuryID
GROUP BY a.PersonID
If NOT EXISTS (SELECT * FROM Hospital_Record d WHERE d.PersonID=a.PersonID) THEN '0' in InjuryCount
I want a row for each person who has had an injury to display. Then I'd like a count of how many injuries resulted in hospitalizations in the last 60 days. If they were not hospitalized, I'd like the row to still be generated, but display '0' in InjuryCount column. I've played with this a bunch, moving my date from the WHERE to the SELECT, trying IF ELSE combos, etc. Could someone help me figure out how to get what I want please?
It's hard to tell without an example of input and desired output, but I think this is what you are going for:
select
InjuryCount = count(case
when cast(ii.InjuryDate as date) > dateadd(day,-60,getdate())
then i.InjuryId
else null
end
)
, p.PersonId
from Person_Info p
left join Hospital_Record h on p.PersonId = h.PersonId
left join Injury_Subject i on i.PersonId = h.PersonId
and h.InjuryId = i.InjuryId
left join Injury_Info ii on ii.InjuryId = i.InjuryId
group by p.PersonId;
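The reason this returns a 0 row is that LEFT JOIN keeps every person, and COUNT over a column only counts non-NULL values. A tiny sketch with Python's sqlite3 shows the behaviour; the person and hospital tables are hypothetical and the 60-day date filter is left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (person_id INTEGER);
    CREATE TABLE hospital (person_id INTEGER, injury_id INTEGER);
    -- hypothetical data: person 2 was never hospitalized
    INSERT INTO person VALUES (1), (2);
    INSERT INTO hospital VALUES (1, 10), (1, 11);
""")

rows = conn.execute("""
    SELECT p.person_id, COUNT(h.injury_id) AS injury_count
    FROM person AS p
    LEFT JOIN hospital AS h ON h.person_id = p.person_id
    GROUP BY p.person_id
    ORDER BY p.person_id
""").fetchall()

print(rows)  # [(1, 2), (2, 0)]
```

Person 2 still gets a row because of the LEFT JOIN, and the count is 0 because the only joined value is NULL.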

How to replace an aggregate null value by 0

I need to calculate a percentage based on the amount of money in a user account, and group the data by the account ID. My calculation sums every payment that is used. The problem is that I also need to show accounts without payments. My idea was to use a CASE statement to check whether the aggregate sum gives NULL or a value. When it returns NULL, I replace it with 0.
I have following query
SELECT
DA.ACCOUNT_ID,
ROUND((1 - ((CASE
WHEN SUM(FPP.AMOUNT_IN_DEFAULT_CURRENCY) IS NOT NULL THEN SUM(FPP.AMOUNT_IN_DEFAULT_CURRENCY)
ELSE 0
END) / FAT.PERCENTAGE_INDICATOR)) * 100,0) AS "Percentage"
FROM DIM_ACCOUNT DA
JOIN TRANSACTION_TABLE FAT ON FAT.ACCOUNT_ID = DA.ID
JOIN PAYMENTS_TABLE FPP ON FPP.ACCOUNT_ID = DA.ID
GROUP BY DA.ACCOUNT_ID
But when I execute this with test data, it doesn't work. The account is not added in my list. Is something wrong with my NULL handling?
When I strip down the query and only do the sum on the payment table, I get the following output:
< null > (without spaces)
How can I make this work?
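NULL values are ignored by aggregate functions like sum/max/min.
If every value in a typed column is NULL, SUM returns NULL rather than raising an error.
An error is raised only when the operand is an untyped NULL literal, as in this query: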
SELECT sum(t.num) AS sum_val
FROM (
SELECT null AS num
) t
Operand data type NULL is invalid for sum operator
If by 'without payments' you mean with no related records in PAYMENTS_TABLE, then use a LEFT JOIN:
SELECT
DA.ACCOUNT_ID,
ROUND((1 - (ISNULL(SUM(FPP.AMOUNT_IN_DEFAULT_CURRENCY), 0) / FAT.PERCENTAGE_INDICATOR)) * 100,0) AS "Percentage"
FROM DIM_ACCOUNT DA
JOIN TRANSACTION_TABLE FAT ON FAT.ACCOUNT_ID = DA.ID
LEFT JOIN PAYMENTS_TABLE FPP ON FPP.ACCOUNT_ID = DA.ID
GROUP BY DA.ACCOUNT_ID;
An INNER JOIN excludes the accounts without payments because it requires a matching record in every table involved.
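A quick way to see the difference is to run the same aggregate with both join types. The sketch below uses Python's sqlite3 with hypothetical account and payment tables, and COALESCE, which is the standard-SQL counterpart of ISNULL and also works in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER);
    CREATE TABLE payment (account_id INTEGER, amount REAL);
    -- hypothetical data: account 2 has no payments
    INSERT INTO account VALUES (1), (2);
    INSERT INTO payment VALUES (1, 40), (1, 10);
""")

def totals(join_kind):
    # join_kind is only ever one of the two fixed strings used below
    return conn.execute(f"""
        SELECT a.id, COALESCE(SUM(p.amount), 0) AS paid
        FROM account AS a
        {join_kind} JOIN payment AS p ON p.account_id = a.id
        GROUP BY a.id
        ORDER BY a.id
    """).fetchall()

inner_rows = totals("INNER")
left_rows = totals("LEFT")

print(inner_rows)  # [(1, 50.0)]
print(left_rows)   # [(1, 50.0), (2, 0)]
```

With the inner join, account 2 never reaches the SUM at all, so no amount of NULL handling in the SELECT list can bring it back.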

query for two tables and have one column populate based on first match

This is one of those things I am sure has been discussed 1000 times, but all my searches have come just short of me understanding what I need.
Two tables: ledger and patient
I want to select several columns from ledger, and add one column from patient based on two records matching. Here is the query I wrote, but it just runs forever:
SELECT
ledger.OID
, ledger.PATID
, ledger.PROVIDERID
, ledger.LTYPE
, ledger.TRANDATE
, ledger.ESTINS
, ledger.AMOUNT
, ledger.PATPAID
, ledger.PATADJUST
, ledger.LEDGERID
, ledger.TYPE2
, ledger.rpid
, patient.oid
, ledger.DESCR
FROM
[exportb].[dbo].[LEDGER]
, exportb.dbo.patient
INNER JOIN
LEDGER AS LEDGE ON LEDGE.rpid = PATIENT.rpid
WHERE
ledger.PATID > '0'
AND ledger.LTYPE <> 'M'
AND ledger.LTYPE <> 'n'
AND (
ledger.ESTINS <> '0'
OR ledger.AMOUNT <> '0'
OR ledger.PATPAID <> '0'
OR ledger.PATADJUST <> '0'
)
When I run it without the patient.oid and join statement, I get 40037 records, which is exactly what I want.
What I want to do is add the patient.oid column to my results. I want the query to look at the LEDGER.RPID column, match it against the PATIENT.RPID column, and populate PATIENT.OID for that record.
I'm sure it's simple, but was hoping someone could shed some light!
You have no predicate joining dbo.patient so are therefore getting a cross join (Cartesian Product):
from [exportb].[dbo].[LEDGER],exportb.dbo.patient
This should be:
FROM dbo.Ledger
INNER JOIN dbo.Patient
ON Patient.rpid = Ledger.rpid
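To get a feel for why the comma version runs forever, compare row counts on a toy example. This sketch uses Python's sqlite3 with made-up ledger and patient rows; with real tables of tens of thousands of rows each, the Cartesian product grows to their row counts multiplied together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ledger (oid INTEGER, rpid INTEGER);
    CREATE TABLE patient (oid INTEGER, rpid INTEGER);
    -- hypothetical rows: 3 ledger entries, 3 patients
    INSERT INTO ledger VALUES (1, 100), (2, 100), (3, 200);
    INSERT INTO patient VALUES (900, 100), (901, 200), (902, 300);
""")

# Comma-separated tables with no joining predicate: every pairing is returned.
cross_count = conn.execute(
    "SELECT COUNT(*) FROM ledger, patient"
).fetchone()[0]

# Joining on rpid returns one row per ledger entry with its patient's oid.
joined = conn.execute("""
    SELECT l.oid, p.oid
    FROM ledger AS l
    INNER JOIN patient AS p ON p.rpid = l.rpid
    ORDER BY l.oid
""").fetchall()

print(cross_count)  # 9
print(joined)       # [(1, 900), (2, 900), (3, 901)]
```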
You also never reference the table aliased as LEDGE anywhere apart from the join, so you may as well remove this completely. So, I think what you are after is:
USE Exportb;
SELECT l.OID,
l.PATID,
l.PROVIDERID,
l.LTYPE,
l.TRANDATE,
l.ESTINS,
l.AMOUNT,
l.PATPAID,
l.PATADJUST,
l.LEDGERID,
l.TYPE2,
l.rpid,
p.oid,
l.DESCR
FROM dbo.Ledger AS l
INNER JOIN dbo.Patient AS p
ON p.rpid = l.rpid
WHERE l.PATID > 0
AND l.LType NOT IN ('M', 'n')
AND (l.Estins <> 0 OR l.Amount <> 0 OR l.PatAdjust <> 0);
Of note, I have removed the cross join, and left a single join to dbo.Patient. I have simplified the following predicates:
AND ledger.LTYPE <> 'M'
AND ledger.LTYPE <> 'n'
to
AND l.LType NOT IN ('M', 'n')
I have also taken what I assume to be numeric constants out of single quotes to avoid implicit conversions, and also removed ledger.PATPAID <>'0' from the following line:
AND (ledger.ESTINS <> '0' OR ledger.AMOUNT <> '0' or ledger.PATPAID <>'0' or ledger.PATADJUST <> '0')
Since the part can never be true because of the predicate:
WHERE l.PATID > 0
If you have a one to many relationship from ledger to patients, i.e. you have multiple rows in dbo.patient and only care about the first match, then you will need to change your join to an APPLY and use TOP 1 to get a single row. Your title seems to suggest you want the "first match", but the question does not explain how you define "first":
USE Exportb;
SELECT l.OID,
l.PATID,
l.PROVIDERID,
l.LTYPE,
l.TRANDATE,
l.ESTINS,
l.AMOUNT,
l.PATPAID,
l.PATADJUST,
l.LEDGERID,
l.TYPE2,
l.rpid,
p.oid,
l.DESCR
FROM dbo.Ledger AS l
CROSS APPLY
( SELECT TOP 1 p.oid
FROM dbo.Patient AS p
WHERE p.rpid = l.rpid
ORDER BY p.oid -- YOU MAY NEED TO CHANGE THIS ORDER DEPENDING ON YOUR CRITERIA
) AS p
WHERE l.PATID > 0
AND l.LType NOT IN ('M', 'n')
AND (l.Estins <> 0 OR l.Amount <> 0 OR l.PatAdjust <> 0);
You are looking for a LEFT JOIN, my friend. Change the join from INNER to LEFT and you will get the results.
The column Patient.OID will be NULL where there is no match.
And yes, the FROM clause and JOIN clause are incorrect, which is why it runs forever.
Change it to:
from [exportb].[dbo].[LEDGER] LEDGE
LEFT JOIN exportb.dbo.patient ON LEDGE.rpid = PATIENT.rpid
