I am trying to construct SQL code that return me the largest non-overlapping interval containing both, start and end dates.
In this case, I would like to return 2 intervals such as from Jan-Jun and Jun-Aug. How would I do this in SQL? I am assuming maybe some sort of self join?
Given table: test (id int, s date, e date)
Find the rows which have no corresponding superset row.
SELECT t1.*
FROM test AS t1
LEFT JOIN test AS t2
ON t1.s >= t2.s AND t1.e <= t2.e
AND (t1.e - t1.s) < (t2.e - t2.s)
WHERE t2.id IS NULL
;
The join itself returns all rows from test (t1), along with the corresponding superset rows (t2) found. The rows which have no corresponding superset rows produce t2.id of NULL
This does not attempt to address the case where two rows have identical start and end. This is doable, but not explicitly required by the question.
Related
I need to calculate the difference of a column between two lines of a table. Is there any way I can do this directly in SQL? I'm using Microsoft SQL Server 2008.
I'm looking for something like this:
SELECT value - (previous.value) FROM table
Imagining that the "previous" variable reference the latest selected row. Of course with a select like that I will end up with n-1 rows selected in a table with n rows, that's not a probably, actually is exactly what I need.
Is that possible in some way?
Use the lag function:
SELECT value - lag(value) OVER (ORDER BY Id) FROM table
Sequences used for Ids can skip values, so Id-1 does not always work.
SQL has no built in notion of order, so you need to order by some column for this to be meaningful. Something like this:
select t1.value - t2.value from table t1, table t2
where t1.primaryKey = t2.primaryKey - 1
If you know how to order things but not how to get the previous value given the current one (EG, you want to order alphabetically) then I don't know of a way to do that in standard SQL, but most SQL implementations will have extensions to do it.
Here is a way for SQL server that works if you can order rows such that each one is distinct:
select rank() OVER (ORDER BY id) as 'Rank', value into temp1 from t
select t1.value - t2.value from temp1 t1, temp1 t2
where t1.Rank = t2.Rank - 1
drop table temp1
If you need to break ties, you can add as many columns as necessary to the ORDER BY.
WITH CTE AS (
SELECT
rownum = ROW_NUMBER() OVER (ORDER BY columns_to_order_by),
value
FROM table
)
SELECT
curr.value - prev.value
FROM CTE cur
INNER JOIN CTE prev on prev.rownum = cur.rownum - 1
Oracle, PostgreSQL, SQL Server and many more RDBMS engines have analytic functions called LAG and LEAD that do this very thing.
In SQL Server prior to 2012 you'd need to do the following:
SELECT value - (
SELECT TOP 1 value
FROM mytable m2
WHERE m2.col1 < m1.col1 OR (m2.col1 = m1.col1 AND m2.pk < m1.pk)
ORDER BY
col1, pk
)
FROM mytable m1
ORDER BY
col1, pk
, where COL1 is the column you are ordering by.
Having an index on (COL1, PK) will greatly improve this query.
LEFT JOIN the table to itself, with the join condition worked out so the row matched in the joined version of the table is one row previous, for your particular definition of "previous".
Update: At first I was thinking you would want to keep all rows, with NULLs for the condition where there was no previous row. Reading it again you just want that rows culled, so you should an inner join rather than a left join.
Update:
Newer versions of Sql Server also have the LAG and LEAD Windowing functions that can be used for this, too.
select t2.col from (
select col,MAX(ID) id from
(
select ROW_NUMBER() over(PARTITION by col order by col) id ,col from testtab t1) as t1
group by col) as t2
The selected answer will only work if there are no gaps in the sequence. However if you are using an autogenerated id, there are likely to be gaps in the sequence due to inserts that were rolled back.
This method should work if you have gaps
declare #temp (value int, primaryKey int, tempid int identity)
insert value, primarykey from mytable order by primarykey
select t1.value - t2.value from #temp t1
join #temp t2
on t1.tempid = t2.tempid - 1
Another way to refer to the previous row in an SQL query is to use a recursive common table expression (CTE):
CREATE TABLE t (counter INTEGER);
INSERT INTO t VALUES (1),(2),(3),(4),(5);
WITH cte(counter, previous, difference) AS (
-- Anchor query
SELECT MIN(counter), 0, MIN(counter)
FROM t
UNION ALL
-- Recursive query
SELECT t.counter, cte.counter, t.counter - cte.counter
FROM t JOIN cte ON cte.counter = t.counter - 1
)
SELECT counter, previous, difference
FROM cte
ORDER BY counter;
Result:
counter
previous
difference
1
0
1
2
1
1
3
2
1
4
3
1
5
4
1
The anchor query generates the first row of the common table expression cte where it sets cte.counter to column t.counter in the first row of table t, cte.previous to 0, and cte.difference to the first row of t.counter.
The recursive query joins each row of common table expression cte to the previous row of table t. In the recursive query, cte.counter refers to t.counter in each row of table t, cte.previous refers to cte.counter in the previous row of cte, and t.counter - cte.counter refers to the difference between these two columns.
Note that a recursive CTE is more flexible than the LAG and LEAD functions because a row can refer to any arbitrary result of a previous row. (A recursive function or process is one where the input of the process is the output of the previous iteration of that process, except the first input which is a constant.)
I tested this query at SQLite Online.
You can use the following funtion to get current row value and previous row value:
SELECT value,
min(value) over (order by id rows between 1 preceding and 1
preceding) as value_prev
FROM table
Then you can just select value - value_prev from that select and get your answer
I'm having trouble figuring this out.
According to Jeff Atwood A Visual Explanation of SQL Joins Left outer join produces a complete set of records from Table A, with the matching records (where available) in Table B. If there is no match, the right side will contain null.
The left table (TableA) doesn't have duplicates. The right tableB has 1 or 2 entries for each client number. The PrimaryTP designates one as primary with 1 and the other has 0.
I shouldn't have to include the line And B.PrimaryTP = 1 because TableA doesn't have duplicates. Yet if I leave it out I get duplicate client numbers. Why?
Can you help me understand how this works. It's being very confusing to me. The logic of And B.PrimaryTP = 1 escapes me. Yet it seems to work. Still, I'm scared to trust it if I don't understand it. Can you help me understand it. Or do I have a logic error hidden in the query?
SELECT A.ClientNum --returns a list with no duplicate client numbers
FROM (...<TableA>
) as A
Left Outer Join
<TableB> as B
on A.ClientNum = B.ClientNum
--eliminate mismatch of (ClientNum <> FolderNum)
Where A.ClientNum Not In
(
Select ClientNum From <TableB>
Where ClientNum Is Not Null
And ClientNum <> IsNull(FolderNum, '')
)
--eliminate case where B.PrimaryTP <> 1
And B.PrimaryTP = 1
The difference between an INNER JOIN and a LEFT JOIN is just that the LEFT JOIN still returns the rows in Table A when there are no corresponding rows in Table B.
But it's still a JOIN, which means that if there is more than one corresponding row in Table B, it will join the row from Table A to each one of them.
So if you want to make sure that you get no more than one result for each row in Table A, you have to make sure that no more than one row from Table B is found - hence the And B.PrimaryTP = 1.
If you have one client number in A and two matches in Table B, then you will get duplicates.
Suppose you have the following data,
Table-A(client Num) Table-B(client Num)
1 2
2 2
The left Join Results
Table-A(client Num) Table-B(client Num)
1 (null)
2 2
2 2
This is the cause of duplicates. So you need to take distinct values form Table B or perform Distinct on the result set.
I shouldn't have to include the line And B.PrimaryTP = 1 because TableA doesn't have duplicates. Yet if I leave it out I get duplicate client numbers. Why?
Because both rows in the right table match a row in the left table. There is no way for SQL Server to output a triangular result; it must show the columns from both tables for every joined row. And this is true for INNER JOIN as well.
DECLARE #a TABLE(a INT);
DECLARE #b TABLE(b INT);
INSERT #a VALUES(1),(2);
INSERT #b VALUES(1),(1);
SELECT a.a, b.b FROM #a AS a
LEFT OUTER JOIN #b AS b ON a.a = b.b;
SELECT a.a, b.b FROM #a AS a
INNER JOIN #b AS b ON a.a = b.b;
Results:
a b
-- ----
1 1
1 1
2 NULL
a b
-- --
1 1
1 1
On the link that you gave the joins are explained very good. So the problem is that you have several records from table A (no matter that there are no duplicates) is that to 1 record in A there are 2 records in B (in some cases). To avoid this you can use either DISTINCT clause, either GROUP BY clause.
The LEFT OUTER JOIN will give you all the records from A with all the matching records from B. The difference with an INNER JOIN is that if there are no matching records in B, an INNER join will omit the record from A entirely, while the LEFT join will then still include a row with the results from A.
In your case, however, you may also want to check out the DISTINCT keyword.
I have a postgreSQL query which should be the actual stock of samples on our lab.
The initial samples are taken from a table (tblStudies), but then there are 2 tables to look for to decrease the amount of samples.
So I made a union query for those 2 tables, and then matched the uniun query with the tblStudies to calculate the actual stock.
But the union query only gives values when there is a decrease in samples.
So when the study still has it's initial samples, the value isn't returned.
I figured out I should use a JOIN operation, but then I have NULL values for my study with initial samples.
Here is how far I got, any help please?
SELECT
"tblStudies"."Studie_ID", "SamplesWeggezet", c."Stalen_gebruikt", "SamplesWeggezet" - c."Stalen_gebruikt" as "Stock"
FROM
"Stability"."tblStudies"
LEFT JOIN
(
SELECT b."Studie_ID",sum(b."Stalen_gebruikt") as "Stalen_gebruikt"
FROM (
SELECT "tblAnalyses"."Studie_ID", sum("tblAnalyses"."Aant_stalen_gebruikt") AS "Stalen_gebruikt"
FROM "Stability"."tblAnalyses"
GROUP BY "tblAnalyses"."Studie_ID"
UNION
SELECT "tblStalenUitKamer"."Studie_ID", sum("tblStalenUitKamer".aant_stalen) AS "stalen_gebruikt"
FROM "Stability"."tblStalenUitKamer"
GROUP BY "tblStalenUitKamer"."Studie_ID"
) b
GROUP BY b."Studie_ID"
) c ON "tblStudies"."Studie_ID" = c."Studie_ID"
Because you're doing a LEFT JOIN to the inline query "C" some values of c."stalen_gebruikt" can be null. And any number - null is going to yield null. To address this we can use coalesce
So change
"samplesweggezet" - c."stalen_gebruikt" AS "Stock
to
"samplesweggezet" - COALESCE(c."stalen_gebruikt",0) AS "Stock
I have the following access query which I need to run in mssql:
SELECT
[PUB_op-mstr].[om-job],
Last([PUB_op-mstr].[om-emp]) AS [LastOfom-emp],
Max([PUB_op-mstr].[om-dt-end]) AS [MaxOfom-dt-end],
[PUB_op-mstr].[om-wkctr]
FROM
PUB_wc_mstr INNER JOIN [PUB_op-mstr]
ON
PUB_wc_mstr.wc_wkctr = [PUB_op-mstr].[om-wkctr]
GROUP BY
[PUB_op-mstr].[om-job],
[PUB_op-mstr].[om-wkctr],
PUB_wc_mstr.wc_dept
HAVING
(((Max([PUB_op-mstr].[om-dt-end]))>=Date()-7
And
(Max([PUB_op-mstr].[om-dt-end]))<Date())
AND ((PUB_wc_mstr.wc_dept)="633" Or (PUB_wc_mstr.wc_dept)="646"));
MS SQL doesn't support LAST aggregate function. So, you can just replace it with Min / Max. Or you have write your own SELECT like
[LastOfom-emp] = (SELECT ...
LAST() in Access gives the last element of the column you're looking in.
Example: T1 has one column c1, which contains:
one
two
three
The statement:
SELECT LAST(c1) FROM T1
gives: three
Porting this function to SQL Server is doable, but only if there is (at least) one sorted column in the table. To get the last element of the column, you would have to do:
SELECT TOP(1) c1 FROM T1 ORDER BY c1 DESC;
This would give you the wrong result, namely "two" (because the column isn't sorted). So, to find in this case the right answer, you would need another column, for example an incrementing ID
c1 c2
one 1
two 2
three 3
Now you can:
SELECT TOP(1) c1 FROM T1 ORDER BY c2 DESC;
Since c2 is sorted, you now get the result "three".
Assuming Last([PUB_op-mstr].[om-emp]) is the om-emp value for the maximum om-dt-end, try:
select [om-job], [LastOfom-emp], [MaxOfom-dt-end], [om-wkctr] from
(SELECT [PUB_op-mstr].[om-job],
[PUB_op-mstr].[om-emp] AS [LastOfom-emp],
[PUB_op-mstr].[om-dt-end] AS [MaxOfom-dt-end],
[PUB_op-mstr].[om-wkctr],
row_number() over (partition by [PUB_op-mstr].[om-job],
[PUB_op-mstr].[om-wkctr],
PUB_wc_mstr.wc_dept
order by [PUB_op-mstr].[om-dt-end] desc) rn
FROM PUB_wc_mstr
JOIN [PUB_op-mstr]
ON PUB_wc_mstr.wc_wkctr = [PUB_op-mstr].[om-wkctr]
WHERE PUB_wc_mstr.wc_dept IN ('633','646')
) sq
where rn=1 and
[MaxOfom-dt-end]>=Dateadd(d,-7, getdate()) And
[MaxOfom-dt-end]< getdate()
Suppose I have a table called Transaction and another table called Price. Price holds the prices for given funds at different dates. Each fund will have prices added at various dates, but they won't have prices at all possible dates. So for fund XYZ I may have prices for the 1 May, 7 May and 13 May and fund ABC may have prices at 3 May, 9 May and 11 May.
So now I'm looking at the price that was prevailing for a fund at the date of a transaction. The transaction was for fund XYZ on 10 May. What I want, is the latest known price on that day, which will be the price for 7 May.
Here's the code:
select d.TransactionID, d.FundCode, d.TransactionDate, v.OfferPrice
from Transaction d
inner join Price v
on v.FundCode = d.FundCode
and v.PriceDate = (
select max(PriceDate)
from Price
where FundCode = v.FundCode
/* */ and PriceDate < d.TransactionDate
)
It works, but it is very slow (several minutes in real world use). If I remove the line with the leading comment, the query is very quick (2 seconds or so) but it then uses the latest price per fund, which is wrong.
The bad part is that the price table is minuscule compared to some of the other tables we use, and it isn't clear to me why it is so slow. I suspect the offending line forces SQL Server to process a Cartesian product, but I don't know how to avoid it.
I keep hoping to find a more efficient way to do this, but it has so far escaped me. Any ideas?
You don't specify the version of SQL Server you're using, but if you are using a version with support for ranking functions and CTE queries I think you'll find this quite a bit more performant than using a correlated subquery within your join statement.
It should be very similar in performance to Andriy's queries. Depending on the exact index topography of your tables, one approach might be slightly faster than another.
I tend to like CTE-based approaches because the resulting code is quite a bit more readable (in my opinion). Hope this helps!
;WITH set_gen (TransactionID, OfferPrice, Match_val)
AS
(
SELECT d.TransactionID, v.OfferPrice, ROW_NUMBER() OVER(PARTITION BY d.TransactionID ORDER BY v.PriceDate ASC) AS Match_val
FROM Transaction d
INNER JOIN Price v
ON v.FundCode = d.FundCode
WHERE v.PriceDate <= d.TransactionDate
)
SELECT sg.TransactionID, d.FundCode, d.TransactionDate, sg.OfferPrice
FROM Transaction d
INNER JOIN set_gen sg ON d.TransactionID = sg.TransactionID
WHERE sg.Match_val = 1
There's a method for finding rows with maximum or minimum values, which involves LEFT JOIN to self, rather than more intuitive, but probably more costly as well, INNER JOIN to a self-derived aggregated list.
Basically, the method uses this pattern:
SELECT t.*
FROM t
LEFT JOIN t AS t2 ON t.key = t2.key
AND t2.Value > t.Value /* ">" is when getting maximums; "<" is for minimums */
WHERE t2.key IS NULL
or its NOT EXISTS counterpart:
SELECT *
FROM t
WHERE NOT EXISTS (
SELECT *
FROM t AS t2
WHERE t.key = t2.key
AND t2.Value > t.Value /* same as above applies to ">" here as well */
)
So, the result is all the rows for which there doesn't exist a row with the same key and the value greater than the given.
When there's just one table, application of the above method is pretty straightforward. However, it may not be that obvious how to apply it when there's another table, especially when, like in your case, the other table makes the actual query more complex not merely by its being there, but also by providing us with an additional filtering for the values we are looking for, namely with the upper limits for the dates.
So, here's what the resulting query might look like when applying the LEFT JOIN version of the method:
SELECT
d.TransactionID,
d.FundCode,
d.TransactionDate,
v.OfferPrice
FROM Transaction d
INNER JOIN Price v ON v.FundCode = d.FundCode
LEFT JOIN Price v2 ON v2.FundCode = v.FundCode /* this and */
AND v2.PriceDate > v.PriceDate /* this are where we are applying
the above method; */
AND v2.PriceDate < d.TransactionDate /* and this is where we are limiting
the maximum value */
WHERE v2.FundCode IS NULL
And here's a similar solution with NOT EXISTS:
SELECT
d.TransactionID,
d.FundCode,
d.TransactionDate,
v.OfferPrice
FROM Transaction d
INNER JOIN Price v ON v.FundCode = d.FundCode
WHERE NOT EXISTS (
SELECT *
FROM Price v2
WHERE v2.FundCode = v.FundCode /* this and */
AND v2.PriceDate > v.PriceDate /* this are where we are applying
the above method; */
AND v2.PriceDate < d.TransactionDate /* and this is where we are limiting
the maximum value */
)
Are both pricedate and transactiondate indexed? If not you are doing table scans which is likely the cause of the performance bottleneck.