I have a table that contains data for electric motors. The format is:
DATE (DateTime) | TagName (VarChar(50)) | Val (Float)
2009-11-03 17:44:13.000 | Motor_1 | 123.45
2009-11-04 17:44:13.000 | Motor_1 | 124.45
2009-11-05 17:44:13.000 | Motor_1 | 125.45
2009-11-03 17:44:13.000 | Motor_2 | 223.45
2009-11-04 17:44:13.000 | Motor_2 | 224.45
Data for each motor is inserted daily, so there would be 31 Motor_1s and 31
Motor_2s etc. We do this so we can trend it on our control system displays.
I am using views to extract last month's max Val and last month's min Val, and the same for this month's data. Then I join the two and calculate the difference to get the actual run hours for that month. The "Val" is a non-resettable accumulation from a PLC (controller). This is my query for last month's max value:
SELECT TagName, Val AS Hours
FROM dbo.All_Data_From_Last_Mon AS cur
WHERE NOT EXISTS
    (SELECT TagName, Val
     FROM dbo.All_Data_From_Last_Mon AS high
     WHERE TagName = cur.TagName AND Val > cur.Val)
This is my query for last month's min value:
SELECT TagName, Val AS Hours
FROM dbo.All_Data_From_Last_Mon AS cur
WHERE NOT EXISTS
    (SELECT TagName, Val
     FROM dbo.All_Data_From_Last_Mon AS low
     WHERE TagName = cur.TagName AND Val < cur.Val)
This is the query that calculates the difference and runs a bit slow:
SELECT dbo.Motors_Last_Mon_Max.TagName,
       STR(dbo.Motors_Last_Mon_Max.Hours - dbo.Motors_Last_Mon_Min.Hours, 12, 2) AS Hours
FROM dbo.Motors_Last_Mon_Min
RIGHT OUTER JOIN dbo.Motors_Last_Mon_Max
    ON dbo.Motors_Last_Mon_Min.TagName = dbo.Motors_Last_Mon_Max.TagName
I know there is a better way. Ultimately I just need last months total and this months total. Any help would be appreciated.
Thanks in advance
The first two queries can be handled as one. Something like:
SELECT TagName, MAX(Val) AS MaxVal, MIN(Val) AS MinVal
FROM dbo.All_Data_From_Last_Mon
GROUP BY TagName
-- ORDER BY TagName (optionally)
I now see that these queries are SQL views, used for the third query... and I can see why this would be slow ;-)
The following reproduces the logic, but without the views, and this should allow SQL to optimize quite a bit. At any rate it provides more clarity as to what is being done...
Please "give it a spin".
SELECT DISTINCT Mx.TagName, STR(Mx.Val - Mn.Val, 12, 2) AS Hours
FROM dbo.All_Data_From_Last_Mon Mx
RIGHT OUTER JOIN dbo.All_Data_From_Last_Mon Mn ON Mx.TagName = Mn.TagName
    AND Mx.Val >= Mn.Val -- cut the cross product a bit; may not be necessary
WHERE
    NOT EXISTS (SELECT * FROM dbo.All_Data_From_Last_Mon Mx1
                WHERE Mx1.TagName = Mx.TagName AND Mx1.Val > Mx.Val)
    AND NOT EXISTS (SELECT * FROM dbo.All_Data_From_Last_Mon Mn1
                    WHERE Mn1.TagName = Mn.TagName AND Mn1.Val < Mn.Val)
Notes:
- Notice the DISTINCT in the SELECT statement. It is there to avoid duplicate lines in the case where several days show the maximum (or minimum) Hours value for that month.
- The extra condition on the join is aimed at avoiding a full 31 * 31 cross product, but the conditions that truly bring it down to a single line (or several, in case of dups) are the NOT EXISTS predicates that follow.
- A TagName+Val index, if not already present, would greatly help.
==> I'd be interested in feedback on this query performance, as run with actual data.
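Since the ultimate goal is just a per-month total, the aggregate form can also compute the difference directly and skip the self-joins entirely. A minimal sketch (same table as above; STR is kept only to match the original output format):
SELECT TagName, STR(MAX(Val) - MIN(Val), 12, 2) AS Hours -- run hours = max minus min of the non-resettable accumulator
FROM dbo.All_Data_From_Last_Mon
GROUP BY TagName
ORDER BY TagName;
The same shape against the current month's view gives the other number you need. If dbo.All_Data_From_Last_Mon is itself a view, the supporting index belongs on the underlying table, e.g. on (TagName, Val).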
We are attempting to calculate a rolling average and have tried to convert numerous SO answers to solve the problem. To this point we are still unsuccessful.
What we've tried:
Here are some of the SO answers we have considered.
SQL Server: How to get a rolling sum over 3 days for different customers within same table
SQL Query for 7 Day Rolling Average in SQL Server
T-SQL calculate moving average
Our latest attempt has been to modify one of the solutions (#4) found here.
https://www.red-gate.com/simple-talk/sql/t-sql-programming/calculating-values-within-a-rolling-window-in-transact-sql/
Example:
Here is an example in SQL Fiddle: http://sqlfiddle.com/#!6/4570a/17
In the fiddle, we are still trying to get the SUM to work right but ultimately we are trying to get the average.
The end goal
Using the Fiddle example, we need to find the difference between Value1 and ComparisonValue1 and present it as Diff1. When a row has no Value1 available, we need to estimate it by taking the average of the last two Diff1 values and then adding it to the ComparisonValue1 for that row.
With the correct query, the result would look like this:
GroupID Number ComparisonValue1 Diff1 Value1
5 10 54.78 2.41 57.19
5 11 55.91 2.62 58.53
5 12 55.93 2.78 58.71
5 13 56.54 2.7 59.24
5 14 56.14 2.74 58.88
5 15 55.57 2.72 58.29
5 16 55.26 2.73 57.99
Question: is it possible to calculate this average when it could potentially factor into the average of the following rows?
Update:
Added a VIEW to the Fiddle schema to simplify the final query.
Updated the query to include the new rolling average for Diff1 (column Diff1Last2Avg). This rolling average works great until we run into nulls in the Value1 column. This is where we need to insert the estimate.
Updated the query to include the estimate that should be used when there is no Value1 (column Value1Estimate). This is working great and would be perfect if we could use the estimate in place of NULL in the Value1 column. Since the Diff1 column reflects the difference between Value1 (or its estimate) and ComparisonValue1, including the Estimate would fill in all the NULL values in Diff1. This in turn would continue to allow the Estimates of future rows to be calculated. It gets confusing at this point, but still hacking away at it. Any ideas?
Credit for the idea goes to this answer: https://stackoverflow.com/a/35152131/6305294 from @JesúsLópez
I have included comments in the code to explain it.
UPDATE
I have corrected the query based on comments.
I have swapped numbers in minuend and subtrahend to get difference as a positive number.
Removed Diff2Ago column.
Results of the query now exactly match your sample output.
;WITH cte AS
(
    -- This is similar to your ItemWithComparison view
    SELECT i.Number, i.Value1, i2.Value1 AS ComparisonValue1,
        -- Calculated difference; NULL will be returned when i.Value1 is NULL
        CONVERT( DECIMAL( 10, 3 ), i.Value1 - i2.Value1 ) AS Diff
    FROM Item AS i
    LEFT JOIN [Group] AS g ON g.ID = i.GroupID
    LEFT JOIN Item AS i2 ON i2.GroupID = g.ComparisonGroupID AND i2.Number = i.Number
    WHERE i2.Id IS NOT NULL
),
cte2 AS
(
    /* Start with the first number.
       Note: if you do not have at least 2 consecutive numbers (in cte) with a
       non-NULL Diff value, and therefore Diff1Ago is NULL, then everything else
       will not work; you may need to add additional logic to handle these cases. */
    SELECT TOP 1 -- start with the 1st number (see ORDER BY)
        a.Number, a.Value1, a.ComparisonValue1, a.Diff, b.Diff AS Diff1Ago
    FROM cte AS a
    -- "1 number ago"
    LEFT JOIN cte AS b ON a.Number - 1 = b.Number
    WHERE a.Value1 IS NOT NULL
    ORDER BY a.Number
    UNION ALL
    SELECT b.Number, b.Value1, b.ComparisonValue1,
        ( CASE
            WHEN b.Value1 IS NOT NULL THEN b.Diff
            ELSE CONVERT( DECIMAL( 10, 3 ), ( a.Diff + a.Diff1Ago ) / 2.0 )
          END ) AS Diff,
        a.Diff AS Diff1Ago
    FROM cte2 AS a
    INNER JOIN cte AS b ON a.Number + 1 = b.Number
)
SELECT *, ( CASE WHEN Value1 IS NULL THEN ComparisonValue1 + Diff ELSE Value1 END ) AS NewValue1
FROM cte2
OPTION( MAXRECURSION 0 );
Limitations:
- This solution works well only when you need to consider a small number of preceding values.
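For contrast, when the rolling average does not have to feed back into later rows (i.e. no estimated values are involved), SQL Server 2012+ window functions handle it without recursion. A minimal sketch, reusing the cte and Diff names from above:
SELECT Number, Diff,
    -- average of the two preceding Diff values only; no feedback into later rows
    AVG(Diff) OVER (ORDER BY Number ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING) AS Diff1Last2Avg
FROM cte;
The recursive CTE is needed here precisely because each estimate depends on previously estimated rows, which a plain window function cannot express.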
I have a Transact-SQL question concerning summations over a computed column. I am having a problem with double-counting of these computed values.
Usually I would extract all the raw data and post-process it in Perl, but I can't do that on this occasion due to the particular reporting system we need to use. I'm relatively inexperienced with the intricacies of SQL, so I thought I'd refer this to the experts.
My data is arranged in the following tables (highly simplified and reduced for the purposes of clarity):
Patient table:
PatientId
PatientSer
Course table
PatientSer
CourseSer
CourseId
Diagnosis table
PatientSer
DiagnosisId
Plan table
PlanSer
CourseSer
PlanId
Field table
PlanSer
FieldId
FractionNumber
FieldDateTime
What I would like to do is find the difference between the maximum fraction number and the minimum fraction number over a range of dates on FieldDateTime in the Field table. I would like to then sum these values over the possible plan IDs associated with a course, but I do not want to double-count over the two particular diagnosis IDs (A or B or both) that I may encounter for a patient.
So, for a patient with two diagnosis codes (A and B) and two plans in the same course of treatment (Plan1 and Plan2), with a difference in fraction numbers of 24 for the first plan and 5 for the second, what I would like to get out is something like this:
PatientId  CourseId  PlanId  DiagnosisId  FractionNumberDiff  Sum
AB1234     1         Plan1   A            24                  29
AB1234     1         Plan1   B            *                   *
AB1234     1         Plan2   A            5                   *
AB1234     1         Plan2   B            *                   *
I've racked my brains about how to do this, and I've tried the following:
SELECT
Patient.PatientId,
Course.CourseId,
Plan.PlanId,
MAX(fractionnumber OVER PARTITION(Plan.PlanSer)) - MIN(fractionnumber OVER PARTITION(Plan.PlanSer)) AS FractionNumberDiff,
SUM(FractionNumberDiff OVER PARTITION(Course.CourseSer)
FROM
Patient P
INNER JOIN
Course C ON (P.PatientSer = C.PatientSer)
INNER JOIN
Plan Pl ON (Pl.CourseSer = C.CourseSer)
INNER JOIN
Diagnosis D ON (D.PatientSer = P.PatientSer)
INNER JOIN
Field F ON (F.PlanSer = Pl.PlanSer)
WHERE
FieldDateTime > [Start Date]
AND FieldDateTime < [End Date]
But this just double-counts over the diagnosis codes, meaning that I end up with 58 instead of 29.
Any ideas about what I can do?
Change the FractionNumberDiff to
MAX(fractionnumber) OVER (PARTITION BY Plan.PlanSer) -
MIN(fractionnumber) OVER (PARTITION BY Plan.PlanSer) AS FractionNumberDiff
and remove the "SUM(FractionNumberDiff OVER PARTITION(Course.CourseSer)"
make the exisitng query as a derived table and calcualte the SUM(FractionNumberDiff) there
SELECT *, SUM(FractionNumberDiff) OVER ( PARTITION BYCourse.CourseSer)
FROM
(
< the modified existing query here>
) AS d
As for the double-counting issue, please post some sample data and the expected result.
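Putting those pieces together, a sketch of the overall shape might look like the following. Treat it as an assumption-laden illustration rather than a tested answer: it drops the Diagnosis join entirely (that join only multiplies rows and is one plausible source of the double counting), collapses the per-field duplicates with DISTINCT, and exposes CourseSer from the derived table for the outer window. @StartDate and @EndDate are placeholder parameters.
SELECT d.*,
       SUM(d.FractionNumberDiff) OVER (PARTITION BY d.CourseSer) AS [Sum]
FROM (
    SELECT DISTINCT
        P.PatientId,
        C.CourseId,
        C.CourseSer,
        Pl.PlanId,
        -- per-plan spread of fraction numbers within the date window
        MAX(F.FractionNumber) OVER (PARTITION BY Pl.PlanSer)
          - MIN(F.FractionNumber) OVER (PARTITION BY Pl.PlanSer) AS FractionNumberDiff
    FROM Patient P
    INNER JOIN Course C ON P.PatientSer = C.PatientSer
    INNER JOIN [Plan] Pl ON Pl.CourseSer = C.CourseSer
    INNER JOIN Field F ON F.PlanSer = Pl.PlanSer
    WHERE F.FieldDateTime > @StartDate
      AND F.FieldDateTime < @EndDate
) AS d;
Note that [Plan] is bracketed because PLAN is a reserved word in T-SQL.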
I think this question has been answered before, but I am not skilled enough (yet!) to have recognized how someone else's answer will help me fix my problem, so I apologize if this feels like a repost.
I am using MS SQL Server 2012.
I need the following results from a query:
LoanNumber | OpenDate | CreditLimit | CaptureDate | CaptureBalance | TodayDate | TodayBalance
LoanNumber is a unique identifier. OpenDate is the date the credit line was opened. CaptureDate is OpenDate + 6 days. CaptureBalance is what we consider to be the initial balance on the credit line, defined as the balance 6 days after it was opened. TodayDate is today. TodayBalance is the balance today.
I want to be able to look at a credit line and compare the initial balance (aka CaptureBalance) to the credit limit as well as compare that to the balance today.
Here's my code; see below for more definitions.
select top 100
L1.LOANNUMBER as 'LoanNumber'
,L1.OPENDATE as 'OpenDate' --this is stored as Date
,L2.OPENDATE+6 as 'CaptureDate'
,L1.CREDITLIMIT as 'CreditLimit'
,( Select L2.BALANCE
From LOAN as L2
INNER JOIN LOAN as L1 on L2.LOANNUMBER = L1.LOANNUMBER
Where CONVERT(datetime,convert(char(8),L2.RUNDATE )) = L2.OPENDATE+6
) as 'CaptureBalance'
From LOAN as L1
INNER JOIN LOAN as L2 on L1.LOANNUMBER = L2.LOANNUMBER
Where L1.RUNDATE = 20151130 -- this is stored as INT
and L1.[TYPE] = 'Line of Credit'
RUNDATE is important because every day our system logs a snapshot of that loan. Where L1.RUNDATE = 20151130 is telling the system to give me the balance on Nov 30 2015. I also need to get what the balance was 6 days after the date the loan was opened, causing me to reference 2 different run dates.
I have to compare the run date (INT) to OpenDate (Date), so I used CONVERT(datetime, CONVERT(char(8), L2.RUNDATE)) to convert the run date INT --> Date so I can effectively compare the two dates.
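For reference, that conversion on its own behaves like this (a standalone sanity check, not part of the report query):
SELECT CONVERT(datetime, CONVERT(char(8), 20151130)); -- 2015-11-30 00:00:00.000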
When I run this I get:
Subquery returned more than 1 value. This is not permitted when the subquery follows =, !=, <, <= , >, >= or when the subquery is used as an expression.
Initially I was running all of this off of the same table. Then I decided to try giving the loan table 2 different aliases and that's where I stopped.
Is the way I'm using that subquery resulting in "more than 1 value" because each result of that query is trying to get listed as a column header? If yes, I still don't know how to get what I'm looking for.
HELP!?
I am pretty sure this is what you want, or at least one approach to it:
select top 100
L1.LOANNUMBER as 'LoanNumber'
,L1.OPENDATE as 'OpenDate' --this is stored as Date
,L2.RUNDATE as 'CaptureDate'
,L1.CREDITLIMIT as 'CreditLimit'
,L2.BALANCE as 'CaptureBalance'
,L1.RUNDATE as 'TodayDate'
,L1.BALANCE as 'TodayBalance'
From LOAN as L1
INNER JOIN LOAN as L2
on L1.LOANNUMBER = L2.LOANNUMBER
AND CONVERT(datetime, CONVERT(char(8), L2.RUNDATE)) = DATEADD(dd, 6, L1.OPENDATE) -- RUNDATE is an INT (yyyymmdd), so convert it before comparing to a date
Where L1.RUNDATE = 20151130 -- this is stored as INT
and L1.[TYPE] = 'Line of Credit'
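An alternative sketch, under the same assumed LOAN layout: OUTER APPLY picks the first snapshot on or after OpenDate + 6, so the loan still appears even when the exact sixth-day snapshot is missing.
select top 100
    L1.LOANNUMBER as 'LoanNumber'
    ,L1.OPENDATE as 'OpenDate'
    ,cap.RUNDATE as 'CaptureDate'
    ,L1.CREDITLIMIT as 'CreditLimit'
    ,cap.BALANCE as 'CaptureBalance'
    ,L1.RUNDATE as 'TodayDate'
    ,L1.BALANCE as 'TodayBalance'
From LOAN as L1
OUTER APPLY (
    Select top 1 L2.RUNDATE, L2.BALANCE
    From LOAN as L2
    Where L2.LOANNUMBER = L1.LOANNUMBER
      and CONVERT(datetime, CONVERT(char(8), L2.RUNDATE)) >= DATEADD(dd, 6, L1.OPENDATE)
    Order by L2.RUNDATE -- earliest snapshot at or after the capture date
) as cap
Where L1.RUNDATE = 20151130
and L1.[TYPE] = 'Line of Credit';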
I want to select values from a table in a date range.
Something like this:
SELECT
date_values.date_from,
date_values.date_to,
sum(values.value)
FROM values
inner join date_values on values.id_date = date_values.id
inner join date_units on date_values.id_unit = date_units.id
WHERE
date_values.date_from >= '14.1.2012' AND
date_values.date_to <= '30.1.2012' AND
date_units.id = 4
GROUP BY
date_values.date_from,
date_values.date_to
ORDER BY
date_values.date_from,
date_values.date_to;
But this query gives me back only the ranges of days where there is a value, like this:
14.01.12 15.01.12 66
15.01.12 16.01.12 4
17.01.12 18.01.12 8
...etc
(The range 16.01.12 to 17.01.12 is missing here.)
But I want to select the missing ranges too, like this:
14.01.12 15.01.12 66
15.01.12 16.01.12 4
16.01.12 17.01.12 0
17.01.12 18.01.12 8
...etc
I can't use PL/SQL, and if you can advise a more general solution that I can expand for use with hours, months, and years, that would be great.
I'm going to assume you're providing date_from and date_to. If so, you can generate your list of dates first and then join to it to get the remainder of your result. Alternatively, you can union this query to your date_values table; since union does a distinct, this will remove any extra data.
If this is how the list of dates is generated:
select to_date('14.1.2012','dd.mm.yyyy') + level - 1 as date_from
, to_date('14.1.2012','dd.mm.yyyy') + level as date_to
from dual
connect by level <= to_date('30.1.2012','dd.mm.yyyy')
- to_date('14.1.2012','dd.mm.yyyy')
Your query might become:
with the_dates as (
    select to_date('14.1.2012','dd.mm.yyyy') + level - 1 as date_from
         , to_date('14.1.2012','dd.mm.yyyy') + level as date_to
      from dual
    connect by level <= to_date('30.1.2012','dd.mm.yyyy')
                      - to_date('14.1.2012','dd.mm.yyyy')
)
SELECT
    dv.date_from,
    dv.date_to,
    nvl(sum(vals.value), 0)
FROM the_dates dv
left outer join date_values
    on dv.date_from = date_values.date_from
    and date_values.id_unit = 4 -- the join to date_units reduces to this filter
left outer join values vals
    on vals.id_date = date_values.id
GROUP BY
    dv.date_from,
    dv.date_to
ORDER BY
    dv.date_from,
    dv.date_to;
The left outer joins are what keep the empty date ranges in the result, and nvl turns their missing sums into 0.
The with syntax is known as sub-query factoring and isn't really needed in this case, but it makes the code cleaner.
I've also assumed that the date columns in date_values are, well, dates. It isn't obvious, as you're doing a string comparison. You should always explicitly convert to a date where applicable, and you should always store a date as a date. It saves a lot of hassle in the long run, as it makes it impossible for things to be input or compared incorrectly.
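The same connect-by pattern generalizes to other units by swapping the day arithmetic for the matching interval function. A sketch for month ranges (the bounds are illustrative; hours would step by level/24 instead):
with month_ranges as (
    select add_months(to_date('01.01.2012','dd.mm.yyyy'), level - 1) as date_from
         , add_months(to_date('01.01.2012','dd.mm.yyyy'), level) as date_to
      from dual
    connect by level <= months_between(to_date('01.01.2013','dd.mm.yyyy'),
                                       to_date('01.01.2012','dd.mm.yyyy'))
)
select * from month_ranges;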
Suppose I have a table called Transaction and another table called Price. Price holds the prices for given funds at different dates. Each fund will have prices added at various dates, but they won't have prices at all possible dates. So for fund XYZ I may have prices for the 1 May, 7 May and 13 May and fund ABC may have prices at 3 May, 9 May and 11 May.
So now I'm looking for the price that was prevailing for a fund at the date of a transaction. The transaction was for fund XYZ on 10 May. What I want is the latest known price on that day, which will be the price for 7 May.
Here's the code:
select d.TransactionID, d.FundCode, d.TransactionDate, v.OfferPrice
from Transaction d
inner join Price v
on v.FundCode = d.FundCode
and v.PriceDate = (
select max(PriceDate)
from Price
where FundCode = v.FundCode
/* */ and PriceDate < d.TransactionDate
)
It works, but it is very slow (several minutes in real world use). If I remove the line with the leading comment, the query is very quick (2 seconds or so) but it then uses the latest price per fund, which is wrong.
The bad part is that the price table is minuscule compared to some of the other tables we use, and it isn't clear to me why it is so slow. I suspect the offending line forces SQL Server to process a Cartesian product, but I don't know how to avoid it.
I keep hoping to find a more efficient way to do this, but it has so far escaped me. Any ideas?
You don't specify the version of SQL Server you're using, but if you are using a version with support for ranking functions and CTE queries I think you'll find this quite a bit more performant than using a correlated subquery within your join statement.
It should be very similar in performance to Andriy's queries. Depending on the exact index topography of your tables, one approach might be slightly faster than another.
I tend to like CTE-based approaches because the resulting code is quite a bit more readable (in my opinion). Hope this helps!
;WITH set_gen (TransactionID, OfferPrice, Match_val)
AS
(
SELECT d.TransactionID, v.OfferPrice, ROW_NUMBER() OVER(PARTITION BY d.TransactionID ORDER BY v.PriceDate DESC) AS Match_val -- DESC so that Match_val = 1 is the latest qualifying price
FROM Transaction d
INNER JOIN Price v
ON v.FundCode = d.FundCode
WHERE v.PriceDate <= d.TransactionDate
)
SELECT sg.TransactionID, d.FundCode, d.TransactionDate, sg.OfferPrice
FROM Transaction d
INNER JOIN set_gen sg ON d.TransactionID = sg.TransactionID
WHERE sg.Match_val = 1
There's a method for finding rows with maximum or minimum values, which involves a LEFT JOIN to self rather than the more intuitive, but probably more costly, INNER JOIN to a self-derived aggregated list.
Basically, the method uses this pattern:
SELECT t.*
FROM t
LEFT JOIN t AS t2 ON t.key = t2.key
AND t2.Value > t.Value /* ">" is when getting maximums; "<" is for minimums */
WHERE t2.key IS NULL
or its NOT EXISTS counterpart:
SELECT *
FROM t
WHERE NOT EXISTS (
SELECT *
FROM t AS t2
WHERE t.key = t2.key
AND t2.Value > t.Value /* same as above applies to ">" here as well */
)
So, the result is all the rows for which there doesn't exist a row with the same key and the value greater than the given.
When there's just one table, application of the above method is pretty straightforward. However, it may not be that obvious how to apply it when there's another table, especially when, like in your case, the other table makes the actual query more complex not merely by its being there, but also by providing us with an additional filtering for the values we are looking for, namely with the upper limits for the dates.
So, here's what the resulting query might look like when applying the LEFT JOIN version of the method:
SELECT
d.TransactionID,
d.FundCode,
d.TransactionDate,
v.OfferPrice
FROM Transaction d
INNER JOIN Price v ON v.FundCode = d.FundCode
    AND v.PriceDate < d.TransactionDate /* only prices before the transaction qualify */
LEFT JOIN Price v2 ON v2.FundCode = v.FundCode /* this and */
AND v2.PriceDate > v.PriceDate /* this are where we are applying
the above method; */
AND v2.PriceDate < d.TransactionDate /* and this is where we are limiting
the maximum value */
WHERE v2.FundCode IS NULL
And here's a similar solution with NOT EXISTS:
SELECT
d.TransactionID,
d.FundCode,
d.TransactionDate,
v.OfferPrice
FROM Transaction d
INNER JOIN Price v ON v.FundCode = d.FundCode
    AND v.PriceDate < d.TransactionDate /* only prices before the transaction qualify */
WHERE NOT EXISTS (
SELECT *
FROM Price v2
WHERE v2.FundCode = v.FundCode /* this and */
AND v2.PriceDate > v.PriceDate /* this are where we are applying
the above method; */
AND v2.PriceDate < d.TransactionDate /* and this is where we are limiting
the maximum value */
)
Are both PriceDate and TransactionDate indexed? If not, you are doing table scans, which is likely the cause of the performance bottleneck.
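By way of illustration, a plausible pair of indexes for this query shape (column choices are assumptions based on the names in the question, not a prescription):
-- Lets the lookup seek to a fund's prices in date order; INCLUDE avoids touching the base table.
CREATE INDEX IX_Price_FundCode_PriceDate ON Price (FundCode, PriceDate) INCLUDE (OfferPrice);
-- Supports the driving side of the join ([Transaction] bracketed because TRANSACTION is reserved).
CREATE INDEX IX_Transaction_FundCode_TransactionDate ON [Transaction] (FundCode, TransactionDate);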