Summation in SQL over a computed column - sql-server

I have a Transact-SQL question concerning summations over a computed column.
I am having a problem with double-counting of these computed values.
Usually I would extract all the raw data and post-process it in Perl, but I can't do that on this occasion due to the particular reporting system we need to use. I'm relatively inexperienced with the intricacies of SQL, so I thought I'd refer this to the experts.
My data is arranged in the following tables (highly simplified and reduced for the purposes of clarity):
Patient table: PatientId, PatientSer
Course table: PatientSer, CourseSer, CourseId
Diagnosis table: PatientSer, DiagnosisId
Plan table: PlanSer, CourseSer, PlanId
Field table: PlanSer, FieldId, FractionNumber, FieldDateTime
What I would like to do is find the difference between the maximum fraction number and the minimum fraction number over a range of dates in the FieldDateTime in the FieldTable. I would like to then sum these values over the possible plan ids associated with a course, but I do not want to double count over the two particular diagnosis ids (A or B or both) that I may encounter for a patient.
So, for a patient with two diagnosis codes (A and B) and two plans in the same course of treatment (Plan1 and Plan2), with a difference in fraction numbers of 24 for the first plan and 5 for the second what I would like to get out is something like this:
PatientId  CourseId  PlanId  DiagnosisId  FractionNumberDiff  Sum
AB1234     1         Plan1   A            24                  29
AB1234     1         Plan1   B            *                   *
AB1234     1         Plan2   A            5                   *
AB1234     1         Plan2   B            *                   *
I've racked my brains about how to do this, and I've tried the following:
SELECT
Patient.PatientId,
Course.CourseId,
Plan.PlanId,
MAX(fractionnumber OVER PARTITION(Plan.PlanSer)) - MIN(fractionnumber OVER PARTITION(Plan.PlanSer)) AS FractionNumberDiff,
SUM(FractionNumberDiff OVER PARTITION(Course.CourseSer)
FROM
Patient P
INNER JOIN
Course C ON (P.PatientSer = C.PatientSer)
INNER JOIN
Plan Pl ON (Pl.CourseSer = C.CourseSer)
INNER JOIN
Diagnosis D ON (D.PatientSer = P.PatientSer)
INNER JOIN
Field F ON (F.PlanSer = Pl.PlanSer)
WHERE
FieldDateTime > [Start Date]
AND FieldDateTime < [End Date]
But this just double-counts over the diagnosis codes, meaning that I end up with 58 instead of 29.
Any ideas about what I can do?

Change the FractionNumberDiff expression to
MAX(fractionnumber) OVER (PARTITION BY Plan.PlanSer) -
MIN(fractionnumber) OVER (PARTITION BY Plan.PlanSer) AS FractionNumberDiff
and remove the "SUM(FractionNumberDiff OVER PARTITION(Course.CourseSer)" line.
Then make the existing query a derived table (adding Course.CourseSer to its select list so that the outer query can see it) and calculate SUM(FractionNumberDiff) there:
SELECT *, SUM(FractionNumberDiff) OVER (PARTITION BY d.CourseSer)
FROM
(
< the modified existing query here>
) AS d
As for the double-counting issue, please post some sample data and the expected result.
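One way the double counting might be avoided is to compute the per-plan difference in a derived table that never joins Diagnosis, and to test the diagnosis codes with EXISTS instead. The sketch below is not the answerer's exact query: it runs in Python with the built-in sqlite3 module as a stand-in for SQL Server (window functions need SQLite 3.25 or later), and the serial numbers, dates and fraction values are made-up sample data chosen to reproduce the 24 + 5 = 29 example from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema follows the question; all rows are hypothetical sample data.
conn.executescript("""
CREATE TABLE Patient   (PatientSer INTEGER, PatientId TEXT);
CREATE TABLE Course    (CourseSer INTEGER, PatientSer INTEGER, CourseId TEXT);
CREATE TABLE Diagnosis (PatientSer INTEGER, DiagnosisId TEXT);
CREATE TABLE "Plan"    (PlanSer INTEGER, CourseSer INTEGER, PlanId TEXT);
CREATE TABLE Field     (PlanSer INTEGER, FieldId TEXT,
                        FractionNumber INTEGER, FieldDateTime TEXT);
INSERT INTO Patient   VALUES (1, 'AB1234');
INSERT INTO Course    VALUES (10, 1, '1');
INSERT INTO Diagnosis VALUES (1, 'A'), (1, 'B');
INSERT INTO "Plan"    VALUES (100, 10, 'Plan1'), (101, 10, 'Plan2');
INSERT INTO Field VALUES
    (100, 'F1', 1,  '2020-01-02'), (100, 'F1', 25, '2020-01-30'),
    (101, 'F2', 1,  '2020-02-03'), (101, 'F2', 6,  '2020-02-10');
""")

sql = """
SELECT p.PatientId, c.CourseId, d.PlanId, d.FractionNumberDiff,
       SUM(d.FractionNumberDiff) OVER (PARTITION BY d.CourseSer) AS CourseSum
FROM (
    -- one row per plan: Diagnosis is not joined here, so nothing is duplicated
    SELECT pl.PlanSer, pl.CourseSer, pl.PlanId,
           MAX(f.FractionNumber) - MIN(f.FractionNumber) AS FractionNumberDiff
    FROM "Plan" pl
    JOIN Field f ON f.PlanSer = pl.PlanSer
    WHERE f.FieldDateTime > ? AND f.FieldDateTime < ?
    GROUP BY pl.PlanSer, pl.CourseSer, pl.PlanId
) AS d
JOIN Course  c ON c.CourseSer = d.CourseSer
JOIN Patient p ON p.PatientSer = c.PatientSer
-- keep patients with diagnosis A or B (or both) without multiplying rows
WHERE EXISTS (SELECT 1 FROM Diagnosis dg
              WHERE dg.PatientSer = p.PatientSer
                AND dg.DiagnosisId IN ('A', 'B'))
ORDER BY d.PlanId
"""
rows = conn.execute(sql, ("2020-01-01", "2020-12-31")).fetchall()
for row in rows:
    print(row)
```

With this sample data each plan appears once and the course total is 29 rather than 58. If the diagnosis codes must appear in the output, they could be joined on afterwards, at the cost of repeating the totals on each diagnosis row.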

Related

Sum of sale of each partnumber and partnumber have its substitute partnumber in another table

I have two tables. The first has part numbers and the sales for each month.
The second has the substitute parts for each part number.
Table 1
Partnumber jun19sale jul19sale
A 1 1
B 2 1
C 3 4
E 5 3
D 1 2
Table2
Partnumber subpart
A B
A C
A D
How can I get something like this?
Partnumber jun19sale jul19sale
A 7 8
B 7 8
C 7 8
E 5 3
D 7 8
I tried subqueries with OR and IN, which give me accurate results but take too much time because the tables hold a large amount of data.
Long way
Join the sales numbers to the parts (associate each sale with a sub part) using a left join (some records in sales will not associate), group and sum on the parts partnumber if it exists, or the sales part number if it doesn't (the sales are expressed on subparts and main parts so we want to map some subparts in sales to a main part). Once we have our sales expressed as main parts only, left join (otherwise you won't get row E in the output) it to a list of parts where main part is mapped to both main part and sub part (otherwise you won't get row A in the output)
SELECT
COALESCE(parts.partnumber, sales.partnumber) partnumber,
sum(jun19sale) as jun19sum,
sum(jul19sale) as jul19sum
FROM
table1 sales
LEFT JOIN
table2 parts
ON
sales.partnumber = parts.subpart
GROUP BY COALESCE(parts.partnumber, sales.partnumber)
This will give totals like A, 7, 8 etc. Now we need to join that back to a mapping of the parts to subparts that also includes the main part mapped to the main part (as a subpart), like this:
SELECT
COALESCE(msparts.subpart, subsum.partnumber) as partnumber,
subsum.jun19sum,
subsum.jul19sum
FROM
(
SELECT DISTINCT partnumber, partnumber as subpart FROM table1
UNION ALL
SELECT partnumber, subpart FROM table2
) msparts
RIGHT JOIN
(
SELECT
COALESCE(parts.partnumber, sales.partnumber) partnumber,
sum(jun19sale) as jun19sum,
sum(jul19sale) as jul19sum
FROM
table1 sales
LEFT JOIN
table2 parts
ON
sales.partnumber = parts.subpart
GROUP BY COALESCE(parts.partnumber, sales.partnumber)
) subsum
ON
msparts.partnumber = subsum.partnumber
We need a trick, though, to prevent the A row from getting lost: the parts table maps A to B, C and D but not to A, which means that if we join the sums and show the subpart, row A will disappear from the results. If we add a bunch of fake rows that map A to A as well as to B, C and D, then the row will remain. This is what the UNION ALL bit does.
Short way
This might be simpler to achieve using analytic/window functions to do the same thing:
SELECT
sales.partnumber,
SUM(jun19sale) OVER(PARTITION BY COALESCE(parts.partnumber, sales.partnumber)) jun19sale,
SUM(jul19sale) OVER(PARTITION BY COALESCE(parts.partnumber, sales.partnumber)) jul19sale
FROM
table1 sales
LEFT JOIN
table2 parts
ON sales.partnumber = parts.subpart
Here we use the sales table as a driver so we keep rows A and E by default. We still do a left join on the parts table so some parts like B C D are mapped to A. We ask the analytic to sum on the group of main part from parts or if it is null, main part from sales (this is the PARTITION BY)
COALESCE is a cross platform compatible version of IFNULL
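To check the short way against the sample tables from the question, here is a minimal sketch in Python using the built-in sqlite3 module. SQLite is only a stand-in for whatever database the asker uses, and it needs version 3.25 or later for window functions; the added ORDER BY is just there to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Sample rows copied from Table 1 and Table2 in the question.
conn.executescript("""
CREATE TABLE table1 (partnumber TEXT, jun19sale INTEGER, jul19sale INTEGER);
CREATE TABLE table2 (partnumber TEXT, subpart TEXT);
INSERT INTO table1 VALUES ('A',1,1), ('B',2,1), ('C',3,4), ('E',5,3), ('D',1,2);
INSERT INTO table2 VALUES ('A','B'), ('A','C'), ('A','D');
""")

rows = conn.execute("""
SELECT
    sales.partnumber,
    -- sum over the main part if the row is a subpart, else over itself
    SUM(jun19sale) OVER (
        PARTITION BY COALESCE(parts.partnumber, sales.partnumber)) AS jun19sale,
    SUM(jul19sale) OVER (
        PARTITION BY COALESCE(parts.partnumber, sales.partnumber)) AS jul19sale
FROM table1 sales
LEFT JOIN table2 parts ON sales.partnumber = parts.subpart
ORDER BY sales.partnumber
""").fetchall()

for row in rows:
    print(row)
```

Parts A, B, C and D all land in the partition keyed by A and share the totals 7 and 8, while E stays on its own with 5 and 3, matching the desired output in the question.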

Calculate a Recursive Rolling Average in SQL Server

We are attempting to calculate a rolling average and have tried to convert numerous SO answers to solve the problem. To this point we are still unsuccessful.
What we've tried:
Here are some of the SO answers we have considered.
SQL Server: How to get a rolling sum over 3 days for different customers within same table
SQL Query for 7 Day Rolling Average in SQL Server
T-SQL calculate moving average
Our latest attempt has been to modify one of the solutions (#4) found here.
https://www.red-gate.com/simple-talk/sql/t-sql-programming/calculating-values-within-a-rolling-window-in-transact-sql/
Example:
Here is an example in SQL Fiddle: http://sqlfiddle.com/#!6/4570a/17
In the fiddle, we are still trying to get the SUM to work right but ultimately we are trying to get the average.
The end goal
Using the Fiddle example, we need to find the difference between Value1 and ComparisonValue1 and present it as Diff1. When a row has no Value1 available, we need to estimate it by taking the average of the last two Diff1 values and then add it to the ComparisonValue1 for that row.
With the correct query, the result would look like this:
GroupID Number ComparisonValue1 Diff1 Value1
5 10 54.78 2.41 57.19
5 11 55.91 2.62 58.53
5 12 55.93 2.78 58.71
5 13 56.54 2.7 59.24
5 14 56.14 2.74 58.88
5 15 55.57 2.72 58.29
5 16 55.26 2.73 57.99
Question: is it possible to calculate this average when it could potentially factor into the average of the following rows?
Update:
Added a VIEW to the Fiddle schema to simplify the final query.
Updated the query to include the new rolling average for Diff1 (column Diff1Last2Avg). This rolling average works great until we run into nulls in the Value1 column. This is where we need to insert the estimate.
Updated the query to include the estimate that should be used when there is no Value1 (column Value1Estimate). This is working great and would be perfect if we could use the estimate in place of NULL in the Value1 column. Since the Diff1 column reflects the difference between Value1 (or its estimate) and ComparisonValue1, including the Estimate would fill in all the NULL values in Diff1. This in turn would continue to allow the Estimates of future rows to be calculated. It gets confusing at this point, but still hacking away at it. Any ideas?
Credit for the idea goes to this answer: https://stackoverflow.com/a/35152131/6305294 from @JesúsLópez
I have included comments in the code to explain it.
UPDATE
I have corrected the query based on comments.
I have swapped numbers in minuend and subtrahend to get difference as a positive number.
Removed Diff2Ago column.
Results of the query now exactly match your sample output.
;WITH cte AS
(
-- This is similar to your ItemWithComparison view
SELECT i.Number, i.Value1, i2.Value1 AS ComparisonValue1,
-- Calculated Differences; NULL will be returned when i.Value1 is NULL
CONVERT( DECIMAL( 10, 3 ), i.Value1 - i2.Value1 ) AS Diff
FROM Item AS i
LEFT JOIN [Group] AS G ON g.ID = i.GroupID
LEFT JOIN Item AS i2 ON i2.GroupID = g.ComparisonGroupID AND i2.Number = i.Number
WHERE NOT i2.Id IS NULL
),
cte2 AS(
/*
Start with the first number
Note: if you do not have at least 2 consecutive numbers (in cte) with a non-NULL Diff value, so that Diff or Diff1Ago is NULL, then everything else will not work;
You may need to add additional logic to handle these cases */
SELECT TOP 1 -- start with the 1st number (see ORDER BY)
a.Number, a.Value1, a.ComparisonValue1, a.Diff, b.Diff AS Diff1Ago
FROM cte AS a
-- "1 number ago"
LEFT JOIN cte AS b ON a.Number - 1 = b.Number
WHERE NOT a.Value1 IS NULL
ORDER BY a.Number
UNION ALL
SELECT b.Number, b.Value1, b.ComparisonValue1,
( CASE
WHEN NOT b.Value1 IS NULL THEN b.Diff
ELSE CONVERT( DECIMAL( 10, 3 ), ( a.Diff + a.Diff1Ago ) / 2.0 )
END ) AS Diff,
a.Diff AS Diff1Ago
FROM cte2 AS a
INNER JOIN cte AS b ON a.Number + 1 = b.Number
)
SELECT *, ( CASE WHEN Value1 IS NULL THEN ComparisonValue1 + Diff ELSE Value1 END ) AS NewValue1
FROM cte2 OPTION( MAXRECURSION 0 );
Limitations:
This solution works well only when you need to consider a small number of preceding values.
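The recurrence itself is easier to see outside SQL. The sketch below is plain Python rather than the recursive CTE: it walks the rows in order, keeps the last two Diff1 values (real or estimated) and uses their average whenever Value1 is missing. The numbers come from the expected result in the question; which rows lack a Value1 is an assumption (rows 13 to 16 here), since that detail lives in the Fiddle rather than in the text.

```python
from decimal import Decimal

# (Number, ComparisonValue1, Value1); None marks a missing Value1.
# Assumption: rows 13-16 are the ones without a Value1.
rows = [
    (10, "54.78", "57.19"),
    (11, "55.91", "58.53"),
    (12, "55.93", "58.71"),
    (13, "56.54", None),
    (14, "56.14", None),
    (15, "55.57", None),
    (16, "55.26", None),
]


def fill_estimates(rows):
    """Return (Number, ComparisonValue1, Diff1, Value1) with gaps estimated."""
    filled = []
    last_two = []  # the two most recent Diff1 values, real or estimated
    for number, comparison, value in rows:
        comparison = Decimal(comparison)
        if value is not None:
            value = Decimal(value)
            diff = value - comparison
        elif len(last_two) == 2:
            # estimate: average of the previous two differences
            diff = (last_two[0] + last_two[1]) / 2
            value = comparison + diff
        else:
            diff = None  # not enough history to estimate
        if diff is not None:
            last_two = (last_two + [diff])[-2:]
        filled.append((number, comparison, diff, value))
    return filled


filled = fill_estimates(rows)
for row in filled:
    print(*row)
```

Because each estimate is pushed into `last_two`, it feeds into the estimates for the rows after it, which is the behaviour the question asks about. Decimal is used so the averages stay exact to two places.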

sql cross table calculations

Hi, I need to write a query that does several things. So far it gets the details of orders placed within a certain time frame by customers aged between 20 and 30. However, I also need to check whether the order's products cost more than a set amount,
but that data is in multiple tables.
One table has the order id, the product code and the quantity, while another has the product information such as code and price, and there is a third table as well.
So I need to look up the price of each product by its product code and multiply it by the quantity, doing a cross-table calculation to see whether the total is above 100. I am trying to do this with an AND in the WHERE clause.
So I have 3 tables:
Orderplaced table with oid odate custno paid
ordered table with oid itemid quant
items table with itemid itemname price
and I need to do a calculation across those tables in my query:
SELECT DISTINCT Orderplaced.OID, Orderplaced.odate, Orderplaced.custno, Orderplaced.paid
FROM Cust, Orderplaced, items, Ordered
WHERE Orderplaced.odate BETWEEN '01-JUL-14' AND '31-DEC-14'
AND Floor((sysdate-Cust.DOB) / 365.25) Between '20' AND '30'
AND Cust.SEX='M'
AND items.itemid=ordered.itemid
AND $sum(ordered.quan*item.PRICE) >100;
No matter how I try to get the calculation to work, it always returns the same result, even for orders under 100 dollars.
Any advice on this would be welcome, as it's for my studies and it is troubling me a lot.
I think this is what you want. (I'm not familiar with $sum, so I've replaced it with SUM().)
SELECT
Orderplaced.OID,
Orderplaced.odate,
Orderplaced.custno,
Orderplaced.paid,
sum(ordered.quant * items.PRICE)
FROM
Cust
JOIN Orderplaced ON Cust.CustNo = Orderplaced.custno
JOIN Ordered ON Ordered.Oid = Orderplaced.Oid
JOIN items ON items.itemid = ordered.itemid
WHERE
Orderplaced.odate BETWEEN DATE '2014-07-01' AND DATE '2014-12-31'
AND Floor((sysdate-Cust.DOB) / 365.25) Between 20 AND 30
AND Cust.SEX = 'M'
GROUP BY
Orderplaced.OID,
Orderplaced.odate,
Orderplaced.custno,
Orderplaced.paid
HAVING
sum(ordered.quant * items.PRICE) > 100;
I think you want to try something like this...
SELECT DISTINCT Orderplaced.OID, Orderplaced.odate, Orderplaced.custno, Orderplaced.paid
FROM Cust
JOIN Orderplaced ON
Cust.<SOMEID> = Orderplaced.<CustId>
AND Orderplaced.odate BETWEEN '01-JUL-14' AND '31-DEC-14'
WHERE Floor((sysdate-Cust.DOB) / 365.25) Between 20 AND 30
AND Cust.SEX='M'
AND (
SELECT SUM(Ordered.quant*items.PRICE)
FROM Ordered
JOIN items ON items.ItemId = Ordered.ItemId
WHERE Ordered.<SomeId> = OrderPlaced.<SomeId>) > 100
Couple of pointers:
1. Floor returns a number, but you are comparing it to a string.
2. Typically, when referencing a table in a query, the table has to be joined on its key columns; i.e., in your query you're referencing items and Ordered without joining either of those tables on any key columns.
Hope that helps
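The core of both answers is that a per-order total has to be aggregated before it can be compared with 100, which is what GROUP BY with HAVING does. Here is a minimal sketch of just that part, run in Python with the built-in sqlite3 module instead of Oracle. The Cust table and the date and age filters are left out, and the items, prices and quantities are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table and column names follow the question; the rows are hypothetical.
conn.executescript("""
CREATE TABLE Orderplaced (oid INTEGER, odate TEXT, custno INTEGER, paid TEXT);
CREATE TABLE Ordered     (oid INTEGER, itemid INTEGER, quant INTEGER);
CREATE TABLE items       (itemid INTEGER, itemname TEXT, price INTEGER);
INSERT INTO items       VALUES (1, 'widget', 60), (2, 'gadget', 30);
INSERT INTO Orderplaced VALUES (1, '2014-08-01', 7, 'Y'), (2, '2014-09-01', 8, 'Y');
INSERT INTO Ordered     VALUES (1, 1, 2), (1, 2, 1),  -- 2*60 + 1*30 = 150
                               (2, 2, 3);             -- 3*30 = 90
""")

rows = conn.execute("""
SELECT op.oid, SUM(o.quant * i.price) AS total
FROM Orderplaced op
JOIN Ordered o ON o.oid = op.oid        -- order lines for this order
JOIN items   i ON i.itemid = o.itemid   -- price of each ordered item
GROUP BY op.oid
HAVING SUM(o.quant * i.price) > 100     -- filter on the aggregated total
ORDER BY op.oid
""").fetchall()

print(rows)
```

Only order 1 survives the HAVING filter. Putting the same condition in WHERE would not work, because WHERE is evaluated row by row before any totals exist.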

Oracle: Select values in date range with days where value is missing

I want to select values from table in range.
Something like this:
SELECT
date_values.date_from,
date_values.date_to,
sum(values.value)
FROM values
inner join date_values on values.id_date = date_values.id
inner join date_units on date_values.id_unit = date_units.id
WHERE
date_values.date_from >= '14.1.2012' AND
date_values.date_to <= '30.1.2012' AND
date_units.id = 4
GROUP BY
date_values.date_from,
date_values.date_to
ORDER BY
date_values.date_from,
date_values.date_to;
But this query gives me back only the date ranges that have a value, like this:
14.01.12 15.01.12 66
15.01.12 16.01.12 4
17.01.12 18.01.12 8
...etc
(Here 16.01.12 to 17.01.12 is missing.)
But I want to select the missing ranges too, like this:
14.01.12 15.01.12 66
15.01.12 16.01.12 4
16.01.12 17.01.12 0
17.01.12 18.01.12 8
...etc
I can't use PL/SQL. If you can suggest a more general solution that I can extend to hours, months and years, that would be great.
I'm going to assume you're providing date_from and date_to. If so, you can generate your list of dates first and then join to it to get the remainder of your result. Alternatively, you can union this query to your date_values table; as union does a distinct, this will remove any duplicate rows.
If this is how the list of dates is generated:
select to_date('14.1.2012','dd.mm.yyyy') + level - 1 as date_from
, to_date('14.1.2012','dd.mm.yyyy') + level as date_to
from dual
connect by level <= to_date('30.1.2012','dd.mm.yyyy')
- to_date('14.1.2012','dd.mm.yyyy')
Your query might become
with the_dates as (
select to_date('14.1.2012','dd.mm.yyyy') + level - 1 as date_from
, to_date('14.1.2012','dd.mm.yyyy') + level as date_to
from dual
connect by level <= to_date('30.1.2012','dd.mm.yyyy')
- to_date('14.1.2012','dd.mm.yyyy')
)
SELECT
the_dates.date_from,
the_dates.date_to,
nvl(sum(values.value), 0)
FROM the_dates
left outer join date_values
on the_dates.date_from = date_values.date_from
and date_values.id_unit = 4
left outer join values
on values.id_date = date_values.id
GROUP BY
the_dates.date_from,
the_dates.date_to
ORDER BY
the_dates.date_from,
the_dates.date_to;
The with syntax is known as sub-query factoring and isn't really needed in this case but it makes the code cleaner.
I've also assumed that the date columns in date_values are, well, dates. It isn't obvious as you're doing a string comparison. You should always explicitly convert to a date where applicable and you should always store a date as a date. It saves a lot of hassle in the long run as it's impossible for things to be input incorrectly or to be incorrectly compared.
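The same generate-then-outer-join idea can be tried without an Oracle instance. The sketch below uses Python's built-in sqlite3 module, so the date list comes from a recursive CTE instead of CONNECT BY, and COALESCE stands in for NVL. The table names follow the question, but the ids and the individual values are made-up sample data chosen to give the totals 66, 4 and 8 from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical rows: no date_values entry starts on 2012-01-16.
conn.executescript("""
CREATE TABLE date_values (id INTEGER, date_from TEXT, date_to TEXT, id_unit INTEGER);
CREATE TABLE "values"    (id_date INTEGER, value INTEGER);
INSERT INTO date_values VALUES
    (1, '2012-01-14', '2012-01-15', 4),
    (2, '2012-01-15', '2012-01-16', 4),
    (3, '2012-01-17', '2012-01-18', 4);
INSERT INTO "values" VALUES (1, 60), (1, 6), (2, 4), (3, 8);
""")

rows = conn.execute("""
WITH RECURSIVE the_dates(date_from) AS (
    SELECT '2012-01-14'
    UNION ALL
    SELECT date(date_from, '+1 day') FROM the_dates
    WHERE date_from < '2012-01-17'
)
SELECT d.date_from,
       date(d.date_from, '+1 day') AS date_to,
       COALESCE(SUM(v.value), 0)   AS total   -- 0 for days with no data
FROM the_dates d
LEFT JOIN date_values dv ON dv.date_from = d.date_from AND dv.id_unit = 4
LEFT JOIN "values" v     ON v.id_date = dv.id
GROUP BY d.date_from
ORDER BY d.date_from
""").fetchall()

for row in rows:
    print(row)
```

The generated calendar drives the query, so 2012-01-16 appears with a total of 0 even though nothing was recorded for it. Changing the step in the recursive member (for example '+1 month') is one way the approach might be extended to other granularities.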

Efficient way to get max date before a given date

Suppose I have a table called Transaction and another table called Price. Price holds the prices for given funds at different dates. Each fund will have prices added at various dates, but they won't have prices at all possible dates. So for fund XYZ I may have prices for the 1 May, 7 May and 13 May and fund ABC may have prices at 3 May, 9 May and 11 May.
So now I'm looking at the price that was prevailing for a fund at the date of a transaction. The transaction was for fund XYZ on 10 May. What I want, is the latest known price on that day, which will be the price for 7 May.
Here's the code:
select d.TransactionID, d.FundCode, d.TransactionDate, v.OfferPrice
from Transaction d
inner join Price v
on v.FundCode = d.FundCode
and v.PriceDate = (
select max(PriceDate)
from Price
where FundCode = v.FundCode
/* */ and PriceDate < d.TransactionDate
)
It works, but it is very slow (several minutes in real world use). If I remove the line with the leading comment, the query is very quick (2 seconds or so) but it then uses the latest price per fund, which is wrong.
The bad part is that the price table is minuscule compared to some of the other tables we use, and it isn't clear to me why it is so slow. I suspect the offending line forces SQL Server to process a Cartesian product, but I don't know how to avoid it.
I keep hoping to find a more efficient way to do this, but it has so far escaped me. Any ideas?
You don't specify the version of SQL Server you're using, but if you are using a version with support for ranking functions and CTE queries I think you'll find this quite a bit more performant than using a correlated subquery within your join statement.
It should be very similar in performance to Andriy's queries. Depending on the exact index topography of your tables, one approach might be slightly faster than another.
I tend to like CTE-based approaches because the resulting code is quite a bit more readable (in my opinion). Hope this helps!
;WITH set_gen (TransactionID, OfferPrice, Match_val)
AS
(
SELECT d.TransactionID, v.OfferPrice, ROW_NUMBER() OVER(PARTITION BY d.TransactionID ORDER BY v.PriceDate DESC) AS Match_val
FROM Transaction d
INNER JOIN Price v
ON v.FundCode = d.FundCode
WHERE v.PriceDate <= d.TransactionDate
)
SELECT sg.TransactionID, d.FundCode, d.TransactionDate, sg.OfferPrice
FROM Transaction d
INNER JOIN set_gen sg ON d.TransactionID = sg.TransactionID
WHERE sg.Match_val = 1
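The ranking approach can be tried on the example from the question (fund XYZ priced on 1, 7 and 13 May, with a transaction on 10 May). This is a sketch in Python with the built-in sqlite3 module rather than SQL Server, needing SQLite 3.25 or later; the year, the offer prices and the second transaction are invented for illustration, and it uses the strict `<` comparison from the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Dates follow the question's example; year and prices are hypothetical.
conn.executescript("""
CREATE TABLE Price (FundCode TEXT, PriceDate TEXT, OfferPrice INTEGER);
CREATE TABLE "Transaction" (TransactionID INTEGER, FundCode TEXT,
                            TransactionDate TEXT);
INSERT INTO Price VALUES
    ('XYZ', '2014-05-01', 100), ('XYZ', '2014-05-07', 110), ('XYZ', '2014-05-13', 120),
    ('ABC', '2014-05-03', 200), ('ABC', '2014-05-09', 210), ('ABC', '2014-05-11', 220);
INSERT INTO "Transaction" VALUES (1, 'XYZ', '2014-05-10'), (2, 'ABC', '2014-05-10');
""")

rows = conn.execute("""
WITH ranked AS (
    SELECT d.TransactionID, d.FundCode, v.PriceDate, v.OfferPrice,
           -- newest qualifying price gets rank 1
           ROW_NUMBER() OVER (PARTITION BY d.TransactionID
                              ORDER BY v.PriceDate DESC) AS rn
    FROM "Transaction" d
    JOIN Price v ON v.FundCode = d.FundCode
                AND v.PriceDate < d.TransactionDate
)
SELECT TransactionID, FundCode, PriceDate, OfferPrice
FROM ranked
WHERE rn = 1
ORDER BY TransactionID
""").fetchall()

for row in rows:
    print(row)
```

The descending sort matters: with an ascending sort, rank 1 would be the oldest price before the transaction rather than the latest one.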
There's a method for finding rows with maximum or minimum values, which involves LEFT JOIN to self, rather than more intuitive, but probably more costly as well, INNER JOIN to a self-derived aggregated list.
Basically, the method uses this pattern:
SELECT t.*
FROM t
LEFT JOIN t AS t2 ON t.key = t2.key
AND t2.Value > t.Value /* ">" is when getting maximums; "<" is for minimums */
WHERE t2.key IS NULL
or its NOT EXISTS counterpart:
SELECT *
FROM t
WHERE NOT EXISTS (
SELECT *
FROM t AS t2
WHERE t.key = t2.key
AND t2.Value > t.Value /* same as above applies to ">" here as well */
)
So, the result is all the rows for which there doesn't exist a row with the same key and the value greater than the given.
When there's just one table, application of the above method is pretty straightforward. However, it may not be that obvious how to apply it when there's another table, especially when, like in your case, the other table makes the actual query more complex not merely by its being there, but also by providing us with an additional filtering for the values we are looking for, namely with the upper limits for the dates.
So, here's what the resulting query might look like when applying the LEFT JOIN version of the method:
SELECT
d.TransactionID,
d.FundCode,
d.TransactionDate,
v.OfferPrice
FROM Transaction d
INNER JOIN Price v ON v.FundCode = d.FundCode
AND v.PriceDate < d.TransactionDate /* only prices before the transaction */
LEFT JOIN Price v2 ON v2.FundCode = v.FundCode /* this and */
AND v2.PriceDate > v.PriceDate /* this are where we are applying
the above method; */
AND v2.PriceDate < d.TransactionDate /* and this is where we are limiting
the maximum value */
WHERE v2.FundCode IS NULL
And here's a similar solution with NOT EXISTS:
SELECT
d.TransactionID,
d.FundCode,
d.TransactionDate,
v.OfferPrice
FROM Transaction d
INNER JOIN Price v ON v.FundCode = d.FundCode
AND v.PriceDate < d.TransactionDate /* only prices before the transaction */
WHERE NOT EXISTS (
SELECT *
FROM Price v2
WHERE v2.FundCode = v.FundCode /* this and */
AND v2.PriceDate > v.PriceDate /* this are where we are applying
the above method; */
AND v2.PriceDate < d.TransactionDate /* and this is where we are limiting
the maximum value */
)
Are both PriceDate and TransactionDate indexed? If not, you are doing table scans, which are likely the cause of the performance bottleneck.
