Oracle: Select values in date range with days where value is missing - database

I want to select values from table in range.
Something like this:
SELECT
date_values.date_from,
date_values.date_to,
sum(values.value)
FROM values
inner join date_values on values.id_date = date_values.id
inner join date_units on date_values.id_unit = date_units.id
WHERE
date_values.date_from >= '14.1.2012' AND
date_values.date_to <= '30.1.2012' AND
date_units.id = 4
GROUP BY
date_values.date_from,
date_values.date_to
ORDER BY
date_values.date_from,
date_values.date_to;
But this query give me back only range of days, where is any value. Like this:
14.01.12 15.01.12 66
15.01.12 16.01.12 4
17.01.12 18.01.12 8
...etc
(Here missing 16.01.12 to 17.01.12)
But I want to select missing value too, like this:
14.01.12 15.01.12 66
15.01.12 16.01.12 4
16.01.12 17.01.12 0
17.01.12 18.01.12 8
...etc
I can't use PL/SQL and if can you advise more general solution which can I expand for use on Hours, Months, Years; will be great.

I'm going to assume you're providing date_from and date_to. If so, you can generate your list of dates first and then join to it to get the remainder of your result. Alternatively, you can union this query to your date_values table as union does a distinct this will remove any extra data.
If this is how the list of dates is generated:
select to_date('14.1.2012','dd.mm.yyyy') + level - 1 as date_from
, to_date('14.1.2012','dd.mm.yyyy') + level as date_to
from dual
connect by level <= to_date('30.1.2012','dd.mm.yyyy')
- to_date('14.1.2012','dd.mm.yyyy')
Your query might become
with the_dates as (
select to_date('14.1.2012','dd.mm.yyyy') + level - 1 as date_from
, to_date('14.1.2012','dd.mm.yyyy') + level as date_to
from dual
connect by level <= to_date('30.1.2012','dd.mm.yyyy')
- to_date('14.1.2012','dd.mm.yyyy')
)
SELECT
dv.date_from,
dv.date_to,
sum(values.value)
FROM values
inner join ( select the_dates.date_from, the_dates.date_to, date_values.id
from the_dates
left outer join date_values
on the_dates.date_from = date_values.date_from ) dv
on values.id_date = dv.id
inner join date_units on date_values.id_unit = date_units.id
WHERE
date_units.id = 4
GROUP BY
dv.date_from,
dv.date_to
ORDER BY
dv.date_from,
dv.date_to;
The with syntax is known as sub-query factoring and isn't really needed in this case but it makes the code cleaner.
I've also assumed that the date columns in date_values are, well, dates. It isn't obvious as you're doing a string comparison. You should always explicitly convert to a date where applicable and you should always store a date as a date. It saves a lot of hassle in the long run as it's impossible for things to be input incorrectly or to be incorrectly compared.

Related

How to filter IDs based on dates?

I have the following table:
ID | DATES
---+-----------
1 02-09-2010
2 03-08-2011
1 08-01-2011
3 04-03-2010
I am looking for IDs who had at least one date before 05-01-2010 AND at least one date after 05-02-2010
I tried the following:
WHERE tb1.DATES < '05-01-2010' AND tb1.DATES > '05-02-2010'
I don't think it's correct because I wasn't getting the right IDs when I did that and there's something wrong with that logic.
Can someone explain what I am doing wrong here?
The SQL command SELECT * FROM tb1 WHERE tb1.DATES < '05-01-2010' AND tb1.DATES > '05-02-2010' is asking "find all the rows where the 'dates' field is before 1 May and after 2 May" which - when put in English - is obviously none of them.
Instead, the command should be asking "find all the IDs which have a record that is before 1 May, and another record after 2 May" - creating the need to look at multiple records for each ID.
As #Martheen suggested, you could do this with two (sub)queries e.g.,
SELECT A.ID
FROM
(SELECT DISTINCT tb1.ID
FROM mytable tb1
WHERE tb1.[dates] < '20100501'
) AS A
INNER JOIN
(SELECT DISTINCT tb1.ID
FROM mytable tb1
WHERE tb1.[dates] > '20100502'
) AS B
ON A.ID = B.ID;
or using INTERSECT
SELECT DISTINCT tb1.ID
FROM mytable tb1
WHERE tb1.[dates] < '20100501'
INTERSECT
SELECT mt2.ID
FROM mytable mt2
WHERE mt2.[dates] > '20100502';
The use of DISTINCT in the above is so that you only get one row per ID, no matter how many rows they have before/after the relevant dates.
You could also do it via GROUP BY and HAVING - which in this particular case is easy as if any dates are before 1 May, then their earliest date must be before 1 May (and correspondingly for their max data and 2 May) e.g.,
SELECT mt1.ID
FROM mytable mt1
GROUP BY mt1.ID
HAVING MIN(mt1.[dates]) < '20100501' AND MAX(mt1.[dates]) > '20100502';
Here is a db<>fiddle with all 3 of these; all provide the same answer (one row, with ID = 1).
Finally, you should use an unambiguous format for your dates. My preferred one of these is 'yyyyMMdd' with no dashes/slashes/etc (as these make them ambiguous).
Different countries/servers/etc will convert the dates you have there differently e.g., SQL Server UTC string comparison not working
This is one solution to use between to specify range.
SELECT * from Table_name where
From_date BETWEEN '2013-01-03'AND '2013-01-09'
Other solution is to what you mentioned but see that the logic is correct
SELECT * from Table_name where
From_date > '2010-01-05'AND From_date <'2010-02-05'

SQL Server : Join If Between

I have 2 tables:
Query1: contains 3 columns, Due_Date, Received_Date, Diff
where Diff is the difference in the two dates in days
QueryHol with 2 columns, Date, Count
This has a list of dates and the count is set to 1 for everything. All these dates represent public holidays.
I want to be able to get the sum of QueryHol["Count"] if QueryHol["Date"] is between Query1["Due_Date"] and Query1["Received_Date"]
Result Wanted: a column joined onto Query1 to state how many public holidays fell into the date range so they can be subtracted from the Query1["Diff"] column to give a reflection of working days.
Because the 01-01-19 is a bank holiday i would want to minus that from the Diff to end up with results like below
Let me know if you require any more info.
Here's an option:
SELECT query1.due_date
, query1.received_date
, query1.diff
, queryhol.count
, COALESCE(query1.diff - queryhol.count, query1.diff) as DiffCount
FROM Query1
OUTER APPLY(
SELECT COUNT(*) AS count
FROM QueryHol
WHERE QueryHol.Date <= Query1.Received_Date
AND QueryHol.Date >= Query1.Due_Date
) AS queryhol
You may need to play around with the join condition - as it is assumes that the Received_Date is always later than the Due_Date which there is not enough data to know all of the use cases.
If I understand your problem, I think this is a possible solution:
select due_date,
receive_date,
diff,
(select sum(table2.count)
from table2
where table2.due_date between table1.due_date and table1.due_date) sum_holi,
table1.diff - (select sum(table2.count)
from table2
where table2.date between table1.due_date and table2.due_date) diff_holi
from table1
where [...] --here your conditions over table1.

Generate a row range based on "FromRange" "ToRange" field of each row

I have a table with following fields:
DailyWork(ID, WorkerID, FromHour, ToHour) assume that, all of the fields are of type INT.
This table needs to be expanded in a T_SQL statement to be part of a JOIN.
By expand a row, I mean, generate a hour for each number in range of FromHour and ToHour. and then join it with the rest of the statement.
Example:
Assume, I have another table like this: Worker(ID, Name). and a simple SELECT statement would be like this:
SELECT * FROM
Worker JOIN DailyWork ON Worker.ID = DailyWork.WorkerID
The result has columns similar to this: WorkerID, Name, DailyWorkID, WorkerID, FromHour, ToHour
But, what i need, has columns like this: WorkerID, Name, Hour.
In fact the range of FromHour and ToHour is expanded. and each individual hour placed in separate row, in Hour column.
Although i read a similar question to generate a range of number , but it didn't really help.
I you start with a list of numbers, then this is pretty easy. Often, the table master.spt_values is used for this purpose:
with nums as (
select row_number() over (order by (select null)) - 1 as n
from master.spt_values
)
select dw.*, (dw.fromhour + nums.n) as specifichour
from dailywork dw join
nums
on dw.tohour >= dw.fromhour + nums.n;
The table master.spt_values generally has a few thousand rows at least.
Another solution would be...
WITH [DayHours] AS (
SELECT 1 AS [DayHour]
UNION ALL
SELECT [DayHour] + 1 FROM [DayHours] WHERE [DayHour] + 1 <= 24
)
SELECT [Worker]
JOIN [DayHours] ON [Worker].[FromHour] <= [DayHours].[DayHour]
AND [Worker].[ToHour] >= [DayHours].[DayHour]

Substract 2 columns from postgreSQL LEFT JOIN query with NULL values

I have a postgreSQL query which should be the actual stock of samples on our lab.
The initial samples are taken from a table (tblStudies), but then there are 2 tables to look for to decrease the amount of samples.
So I made a union query for those 2 tables, and then matched the uniun query with the tblStudies to calculate the actual stock.
But the union query only gives values when there is a decrease in samples.
So when the study still has it's initial samples, the value isn't returned.
I figured out I should use a JOIN operation, but then I have NULL values for my study with initial samples.
Here is how far I got, any help please?
SELECT
"tblStudies"."Studie_ID", "SamplesWeggezet", c."Stalen_gebruikt", "SamplesWeggezet" - c."Stalen_gebruikt" as "Stock"
FROM
"Stability"."tblStudies"
LEFT JOIN
(
SELECT b."Studie_ID",sum(b."Stalen_gebruikt") as "Stalen_gebruikt"
FROM (
SELECT "tblAnalyses"."Studie_ID", sum("tblAnalyses"."Aant_stalen_gebruikt") AS "Stalen_gebruikt"
FROM "Stability"."tblAnalyses"
GROUP BY "tblAnalyses"."Studie_ID"
UNION
SELECT "tblStalenUitKamer"."Studie_ID", sum("tblStalenUitKamer".aant_stalen) AS "stalen_gebruikt"
FROM "Stability"."tblStalenUitKamer"
GROUP BY "tblStalenUitKamer"."Studie_ID"
) b
GROUP BY b."Studie_ID"
) c ON "tblStudies"."Studie_ID" = c."Studie_ID"
Because you're doing a LEFT JOIN to the inline query "C" some values of c."stalen_gebruikt" can be null. And any number - null is going to yield null. To address this we can use coalesce
So change
"samplesweggezet" - c."stalen_gebruikt" AS "Stock
to
"samplesweggezet" - COALESCE(c."stalen_gebruikt",0) AS "Stock

Efficient way to get max date before a given date

Suppose I have a table called Transaction and another table called Price. Price holds the prices for given funds at different dates. Each fund will have prices added at various dates, but they won't have prices at all possible dates. So for fund XYZ I may have prices for the 1 May, 7 May and 13 May and fund ABC may have prices at 3 May, 9 May and 11 May.
So now I'm looking at the price that was prevailing for a fund at the date of a transaction. The transaction was for fund XYZ on 10 May. What I want, is the latest known price on that day, which will be the price for 7 May.
Here's the code:
select d.TransactionID, d.FundCode, d.TransactionDate, v.OfferPrice
from Transaction d
inner join Price v
on v.FundCode = d.FundCode
and v.PriceDate = (
select max(PriceDate)
from Price
where FundCode = v.FundCode
/* */ and PriceDate < d.TransactionDate
)
It works, but it is very slow (several minutes in real world use). If I remove the line with the leading comment, the query is very quick (2 seconds or so) but it then uses the latest price per fund, which is wrong.
The bad part is that the price table is minuscule compared to some of the other tables we use, and it isn't clear to me why it is so slow. I suspect the offending line forces SQL Server to process a Cartesian product, but I don't know how to avoid it.
I keep hoping to find a more efficient way to do this, but it has so far escaped me. Any ideas?
You don't specify the version of SQL Server you're using, but if you are using a version with support for ranking functions and CTE queries I think you'll find this quite a bit more performant than using a correlated subquery within your join statement.
It should be very similar in performance to Andriy's queries. Depending on the exact index topography of your tables, one approach might be slightly faster than another.
I tend to like CTE-based approaches because the resulting code is quite a bit more readable (in my opinion). Hope this helps!
;WITH set_gen (TransactionID, OfferPrice, Match_val)
AS
(
SELECT d.TransactionID, v.OfferPrice, ROW_NUMBER() OVER(PARTITION BY d.TransactionID ORDER BY v.PriceDate ASC) AS Match_val
FROM Transaction d
INNER JOIN Price v
ON v.FundCode = d.FundCode
WHERE v.PriceDate <= d.TransactionDate
)
SELECT sg.TransactionID, d.FundCode, d.TransactionDate, sg.OfferPrice
FROM Transaction d
INNER JOIN set_gen sg ON d.TransactionID = sg.TransactionID
WHERE sg.Match_val = 1
There's a method for finding rows with maximum or minimum values, which involves LEFT JOIN to self, rather than more intuitive, but probably more costly as well, INNER JOIN to a self-derived aggregated list.
Basically, the method uses this pattern:
SELECT t.*
FROM t
LEFT JOIN t AS t2 ON t.key = t2.key
AND t2.Value > t.Value /* ">" is when getting maximums; "<" is for minimums */
WHERE t2.key IS NULL
or its NOT EXISTS counterpart:
SELECT *
FROM t
WHERE NOT EXISTS (
SELECT *
FROM t AS t2
WHERE t.key = t2.key
AND t2.Value > t.Value /* same as above applies to ">" here as well */
)
So, the result is all the rows for which there doesn't exist a row with the same key and the value greater than the given.
When there's just one table, application of the above method is pretty straightforward. However, it may not be that obvious how to apply it when there's another table, especially when, like in your case, the other table makes the actual query more complex not merely by its being there, but also by providing us with an additional filtering for the values we are looking for, namely with the upper limits for the dates.
So, here's what the resulting query might look like when applying the LEFT JOIN version of the method:
SELECT
d.TransactionID,
d.FundCode,
d.TransactionDate,
v.OfferPrice
FROM Transaction d
INNER JOIN Price v ON v.FundCode = d.FundCode
LEFT JOIN Price v2 ON v2.FundCode = v.FundCode /* this and */
AND v2.PriceDate > v.PriceDate /* this are where we are applying
the above method; */
AND v2.PriceDate < d.TransactionDate /* and this is where we are limiting
the maximum value */
WHERE v2.FundCode IS NULL
And here's a similar solution with NOT EXISTS:
SELECT
d.TransactionID,
d.FundCode,
d.TransactionDate,
v.OfferPrice
FROM Transaction d
INNER JOIN Price v ON v.FundCode = d.FundCode
WHERE NOT EXISTS (
SELECT *
FROM Price v2
WHERE v2.FundCode = v.FundCode /* this and */
AND v2.PriceDate > v.PriceDate /* this are where we are applying
the above method; */
AND v2.PriceDate < d.TransactionDate /* and this is where we are limiting
the maximum value */
)
Are both pricedate and transactiondate indexed? If not you are doing table scans which is likely the cause of the performance bottleneck.

Resources