Fill remaining dates between dates in SQL Server - sql-server
I have the following data in a table:
ItemID | Date       | Status
-------+------------+-------------
001    | 2021-01-12 | Active
001    | 2021-01-16 | Discontinued
001    | 2021-01-20 | Active
I need to fill in the remaining dates like this:
ItemID | Date       | Status
-------+------------+-------------
001    | 2021-01-12 | Active
001    | 2021-01-13 | Active
001    | 2021-01-14 | Active
001    | 2021-01-15 | Active
001    | 2021-01-16 | Discontinued
001    | 2021-01-17 | Discontinued
001    | 2021-01-18 | Discontinued
001    | 2021-01-19 | Discontinued
001    | 2021-01-20 | Active
Also, I would like suggestions on whether it is efficient to fill in the data like this, or whether it would be better to create two separate columns for valid-from and valid-to dates in the data warehouse.
I have a working solution, but I am sure there are better ways to do this. I assume you would like a working solution, and then you can investigate the performance and optimize it if need be.
As pointed out in the comments, this is easiest to solve if you have a calendar table. I assume you do not have one, so I start from scratch. I generate the digits 0 - 9 and then, through successive CROSS JOINs, use those digits to generate the numbers 0 - 9,999. This assumes that there are fewer than 10,000 days between the minimum date and the maximum date for an item; if that is not correct, you can change the code to generate more numbers.
My approach uses several common table expressions, as this is how I work to solve a problem incrementally. First generate the digits, then generate the numbers, then determine the minimum and maximum dates for each ItemID, then build a result set that includes all the dates between the minimum and maximum dates for each ItemID, and then LEFT JOIN this to the original data to copy the Status. Finally, you have the interesting problem of how to get the last non-NULL value for a column, for which there are several approaches. Here is one of many articles that compare them: https://www.mssqltips.com/sqlservertip/7379/last-non-null-value-set-of-sql-server-records/ I used the approach that uses the MAX function in a window.
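To show the last non-NULL step in isolation, here is a minimal sketch on a tiny made-up table variable. The name @Sample and its rows are purely illustrative and not part of your data; it assumes SQL Server 2012 or later, because of IIF.

```sql
-- Hypothetical sample: one row per day, Status known only on some days.
DECLARE @Sample TABLE ([Date] DATE, [Status] VARCHAR(15));
INSERT INTO @Sample ([Date], [Status])
VALUES ('2021-01-12', 'Active'), ('2021-01-13', NULL), ('2021-01-14', NULL),
       ('2021-01-15', 'Discontinued'), ('2021-01-16', NULL);

WITH marked AS
(
    -- Running MAX over the dates that carry a Status: every row gets the
    -- date of the most recent non-NULL row at or before it.
    SELECT [Date], [Status],
           MAX(IIF([Status] IS NOT NULL, [Date], NULL))
               OVER (ORDER BY [Date] ROWS UNBOUNDED PRECEDING) AS [grp]
    FROM @Sample
)
-- Each group contains exactly one non-NULL Status, so MAX returns it
-- for every row in the group.
SELECT [Date],
       MAX([Status]) OVER (PARTITION BY [grp]) AS [Status]
FROM marked
ORDER BY [Date];
```

With these five rows, the first three days come back as Active and the last two as Discontinued.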
So, putting this all together into a script and starting with your data in a table variable (as well as adding some records for another test), the whole thing looks like this:
DECLARE @Data TABLE ([ItemID] VARCHAR(3), [Date] DATE, [Status] VARCHAR(15));
INSERT INTO @Data ([ItemID], [Date], [Status])
VALUES ('001', '2021-01-12', 'Active'), ('001', '2021-01-16', 'Discontinued'), ('001', '2021-01-20', 'Active'),
       ('002', '2022-02-01', 'Active'), ('002', '2022-03-01', 'Discontinued');

;WITH digits (I) AS
(
    -- The digits 0 - 9
    SELECT I
    FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS digits (I)
), integers (I) AS
(
    -- The numbers 0 - 9,999
    SELECT D1.I + (10*D2.I) + (100*D3.I) + (1000*D4.I)
    FROM digits AS D1 CROSS JOIN digits AS D2 CROSS JOIN digits AS D3 CROSS JOIN digits AS D4
), itemMinMaxDates AS
(
    -- First and last date for each item
    SELECT [ItemID], MIN([Date]) AS [MinDate], MAX([Date]) AS [MaxDate]
    FROM @Data
    GROUP BY [ItemID]
), itemsWithAllDates AS
(
    -- One row per item per day between its first and last date
    SELECT [imm].[ItemID], DATEADD(DAY, i.I, imm.[MinDate]) AS [Date]
    FROM [itemMinMaxDates] AS imm CROSS JOIN [integers] AS i
    WHERE DATEADD(DAY, i.I, imm.[MinDate]) BETWEEN imm.[MinDate] AND imm.[MaxDate]
), itemsWithAllDatesAndStatus AS
(
    -- Copy the Status where a source row exists; generated days get NULL
    SELECT [allDates].[ItemID], [allDates].[Date], [d].[Status]
    FROM [itemsWithAllDates] AS allDates
    LEFT OUTER JOIN @Data AS d ON [allDates].[ItemID] = [d].[ItemID] AND [allDates].[Date] = d.[Date]
), grp AS
(
    -- grp = date of the most recent row that has a Status
    SELECT [itemsWithAllDatesAndStatus].[ItemID],
           [itemsWithAllDatesAndStatus].[Date],
           [itemsWithAllDatesAndStatus].[Status],
           MAX(IIF([itemsWithAllDatesAndStatus].[Status] IS NOT NULL, [itemsWithAllDatesAndStatus].[Date], NULL))
               OVER (PARTITION BY [itemsWithAllDatesAndStatus].[ItemID]
                     ORDER BY [itemsWithAllDatesAndStatus].[Date] ROWS UNBOUNDED PRECEDING) AS grp
    FROM itemsWithAllDatesAndStatus
)
-- Within each (ItemID, grp) group only one Status is non-NULL, so MAX fills it forward
SELECT [grp].[ItemID], [grp].[Date],
       MAX([grp].[Status]) OVER (PARTITION BY [grp].[ItemID], [grp].[grp]
                                 ORDER BY [grp].[Date] ROWS UNBOUNDED PRECEDING) AS [Status]
FROM [grp]
ORDER BY [grp].[ItemID], [grp].[Date];
The result is what you have shown (as well as the data I included for a test):
ItemID | Date       | Status
-------+------------+-------------
001    | 2021-01-12 | Active
001    | 2021-01-13 | Active
001    | 2021-01-14 | Active
001    | 2021-01-15 | Active
001    | 2021-01-16 | Discontinued
001    | 2021-01-17 | Discontinued
001    | 2021-01-18 | Discontinued
001    | 2021-01-19 | Discontinued
001    | 2021-01-20 | Active
002    | 2022-02-01 | Active
002    | 2022-02-02 | Active
002    | 2022-02-03 | Active
002    | 2022-02-04 | Active
002    | 2022-02-05 | Active
002    | 2022-02-06 | Active
002    | 2022-02-07 | Active
002    | 2022-02-08 | Active
002    | 2022-02-09 | Active
002    | 2022-02-10 | Active
002    | 2022-02-11 | Active
002    | 2022-02-12 | Active
002    | 2022-02-13 | Active
002    | 2022-02-14 | Active
002    | 2022-02-15 | Active
002    | 2022-02-16 | Active
002    | 2022-02-17 | Active
002    | 2022-02-18 | Active
002    | 2022-02-19 | Active
002    | 2022-02-20 | Active
002    | 2022-02-21 | Active
002    | 2022-02-22 | Active
002    | 2022-02-23 | Active
002    | 2022-02-24 | Active
002    | 2022-02-25 | Active
002    | 2022-02-26 | Active
002    | 2022-02-27 | Active
002    | 2022-02-28 | Active
002    | 2022-03-01 | Discontinued
Like I said, this is a working solution, but it is likely not the best or most efficient solution - but it gets you up and running.
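On your second question, about storing valid-from and valid-to dates instead of one row per day, here is a rough sketch of how such rows could be derived from the same source data. It is standalone (it declares its own copy of the table variable) and rests on several assumptions of mine: SQL Server 2012 or later for LEAD, a status that stays valid until the day before the next change, and the column names ValidFrom and ValidTo, which I made up.

```sql
DECLARE @Data TABLE ([ItemID] VARCHAR(3), [Date] DATE, [Status] VARCHAR(15));
INSERT INTO @Data ([ItemID], [Date], [Status])
VALUES ('001', '2021-01-12', 'Active'),
       ('001', '2021-01-16', 'Discontinued'),
       ('001', '2021-01-20', 'Active');

-- One row per status change. ValidTo is the day before the next change
-- for the same item, and NULL for the latest (still open) status.
SELECT [ItemID],
       [Status],
       [Date] AS [ValidFrom],
       DATEADD(DAY, -1, LEAD([Date]) OVER (PARTITION BY [ItemID] ORDER BY [Date])) AS [ValidTo]
FROM @Data
ORDER BY [ItemID], [Date];
```

This keeps one row per change rather than one row per day, so the table stays as small as the source data. Whether that is the better fit depends on how you query it: a per-day table is simpler to join on an exact date, while the interval form needs a range condition on ValidFrom and ValidTo.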
Related
SQL - aggregate related accounts - manually set up ID
I have two tables: one with Account and Amount columns, and one with a list of related accounts.

Data samples:

Account | Amount
--------+--------
001     | $100
002     | $150
003     | $200
004     | $300

Account | Related Account
--------+-----------------
001     | 002
002     | 003
003     | 002

My goal is to be able to aggregate all related accounts. From table two, 001, 002 and 003 are actually all related to each other. What I would like to be able to do is to get a sum of all related accounts, possibly with ID 001 to 003 as Account #1, so I can aggregate them. Result below:

ID | Account | Amount
---+---------+-------
#1 | 001     | $100
#1 | 002     | $150
#1 | 003     | $200
#2 | 004     | $300

I can then manipulate the above table as below (final result):

ID | Amount
---+-------
#1 | $450
#2 | $300

I tried doing a join, but it doesn't quite achieve what I want. I still have a problem relating account 001 with 003 (they are indirectly related because 002 is related to both 001 and 003). If anyone can point me in the right direction, it will be much appreciated.
Well, you really made this harder than it should be. If you could change the data in the second table so it does not contain reversed duplicates (in your sample data, 2,3 and 3,2), it would simplify the solution. If you could refactor both tables into a single table, where the related column is a self-referencing nullable foreign key, it would simplify the solution even more.

Let's assume for a minute you can't do either, and you have to work with the data as provided. So the first thing you want to do is to ignore the reversed duplicates in the second table. This can be done using a common table expression and a couple of CASE expressions.

First, create and populate sample tables (please save us this step in your future questions):

DECLARE @TAccount AS TABLE ( Account int, Amount int )
INSERT INTO @TAccount (Account, Amount) VALUES (1, 100), (2, 150), (3, 200), (4, 300)

DECLARE @TRelatedAccounts AS TABLE ( Account int, Related int )
INSERT INTO @TRelatedAccounts (Account, Related) VALUES (1,2), (2,3), (3,2)

You want to get only the first two records from the @TRelatedAccounts table. This is the AccountAndRelated CTE. Now, you want to left join the @TAccount table with the results of this query, so for each Account we will have the Account, the Amount, and the Related Account, or NULL if the account is not related to any other account or is the first in the relationship chain. This is the CTERecursiveBase CTE. Then, based on that, you can create a recursive CTE (called CTERecursive), and finally select the sum of Amount from the recursive CTE, grouped by the root of the recursion.
Here is the entire script:

;WITH AccountAndRelated AS
(
    SELECT DISTINCT
        CASE WHEN Account > Related THEN Account ELSE Related END AS Account,
        CASE WHEN Account > Related THEN Related ELSE Account END AS Related
    FROM @TRelatedAccounts
), CTERecursiveBase AS
(
    SELECT A.Account, Related, Amount
    FROM @TAccount AS A
    LEFT JOIN AccountAndRelated AS R ON A.Account = R.Account
), CTERecursive AS
(
    SELECT Account AS Id, Account, Related, Amount
    FROM CTERecursiveBase
    WHERE Related IS NULL
    UNION ALL
    SELECT Id, B.Account, B.Related, B.Amount
    FROM CTERecursiveBase AS B
    JOIN CTERecursive AS R ON B.Related = R.Account
)
SELECT Id, SUM(Amount) AS TotalAmount
FROM CTERecursive
GROUP BY Id

Results:

Id | TotalAmount
---+------------
1  | 450
4  | 300

You can see a live demo on rextester.

Now, let's assume you can modify the data of the second table. You can use the AccountAndRelated CTE to get only the records you need to keep in the @TRelatedAccounts table. This means you can skip the AccountAndRelated CTE and use @TRelatedAccounts directly in the CTERecursiveBase CTE. You can see a live demo of that as well.

Finally, let's assume you can refactor your database. In that case, I would recommend joining the two tables together, so your @TAccount table would look like this:

Account | Amount | Related
--------+--------+--------
1       | 100    | NULL
2       | 150    | 1
3       | 200    | 2
4       | 300    | NULL

Then you only need the recursive CTE. Here is a live demo of that option as well.
SQL delete rows based on date difference
The situation is quite complicated to express in the title. An example should be much easier to understand. My table A:

uid | id | ticket | created_date
----+----+--------+--------------------
001 | 1  | movie  | 2015-01-23 08:23:16
002 | 25 | TV     | 2012-01-13 12:02:20
003 | 1  | movie  | 2015-02-01 07:15:36
004 | 1  | movie  | 2014-02-15 15:38:40

What I need to achieve is to remove duplicate records that appear within 31 days of each other and retain the record that appears first. So the above table would be reduced to B:

uid | id | ticket | created_date
----+----+--------+--------------------
001 | 1  | movie  | 2015-01-23 08:23:16
002 | 25 | TV     | 2012-01-13 12:02:20
004 | 1  | movie  | 2014-02-15 15:38:40

because the 3rd row in A was within 31 days of row 1 and it appeared later than row 1 (2015-02-01 vs 2015-01-23), so it gets removed. Is there a clean way to do this?
I would suggest the following approach:

SELECT A.uid AS uid
INTO #tempA
FROM A
LEFT JOIN A AS B ON A.id = B.id AND A.ticket = B.ticket
WHERE DATEDIFF(SECOND, B.created_date, A.created_date) > 0
  AND DATEDIFF(SECOND, B.created_date, A.created_date) < 31*24*60*60;

DELETE FROM A WHERE uid IN (SELECT uid FROM #tempA);

This is assuming that by 'duplicate records' you mean records that have both an identical id and an identical ticket field. If that's not the case, you should adjust the ON clause accordingly.
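The same idea can also be written as a single statement without the intermediate temp table. The following is only a sketch under my own assumptions: T-SQL syntax, the sample rows from the question loaded into a temp table I called #A so the block can run on its own, and "duplicate" meaning the same id and ticket.

```sql
-- Hypothetical stand-in for table A, filled with the question's sample rows.
CREATE TABLE #A (uid CHAR(3), id INT, ticket VARCHAR(20), created_date DATETIME);
INSERT INTO #A (uid, id, ticket, created_date)
VALUES ('001', 1,  'movie', '2015-01-23 08:23:16'),
       ('002', 25, 'TV',    '2012-01-13 12:02:20'),
       ('003', 1,  'movie', '2015-02-01 07:15:36'),
       ('004', 1,  'movie', '2014-02-15 15:38:40');

-- Delete every row that has an earlier row with the same id and ticket
-- less than 31 days before it.
DELETE A
FROM #A AS A
WHERE EXISTS (
    SELECT 1
    FROM #A AS B
    WHERE B.id = A.id
      AND B.ticket = A.ticket
      AND B.created_date < A.created_date
      AND B.created_date > DATEADD(DAY, -31, A.created_date)
);

SELECT uid, id, ticket, created_date FROM #A ORDER BY uid;
```

On the sample rows this removes only uid 003, leaving 001, 002 and 004, which matches table B in the question.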
how to use join in h2 database where sum function used?
There are two tables: 1. product 2. batch

product table (row count = 700):

code
----
001
002

batch table (row count = 35000):

batchno | productcode | qty
--------+-------------+----
B0002   | 001         | 5
B0003   | 001         | 10
B0004   | 001         | 15
C0005   | 002         | 20
C0034   | 002         | 10

where batch.qty is integer, and product.code and batch.productcode are varchar(20). This code works in SQL Server 2008 but not in the H2 embedded database. Every field and its data type is the same as in SQL Server 2008. I want output like this:

productcode | qty
------------+----
001         | 30
002         | 30

Please help. Thanks in advance. I am using this query:

SELECT product.code, (SELECT sum(batch.qty) FROM batch WHERE batch.productcode = product.code) FROM product;
joining table in h2 database but not able to select data
There are two tables: 1. product 2. batch

product table:

code
----
001
002

batch table:

batchno | productcode | qty
--------+-------------+----
B0002   | 001         | 5
B0003   | 001         | 10
B0004   | 001         | 15
C0005   | 002         | 20
C0034   | 002         | 10

where batch.qty is integer, and product.code and batch.productcode are varchar(20). This code works in SQL Server 2008 but not in the H2 embedded database. Every field and its data type is the same as in SQL Server 2008. I want output like this:

productcode | qty
------------+----
001         | 30
002         | 30

Please help. Thanks in advance. I am using this query:

SELECT product.code, (SELECT sum(batch.qty) FROM batch WHERE batch.productcode = product.code) FROM product;
According to your tables and your required result, run the following SQL command:

SELECT batch.productcode, SUM(batch.qty) FROM batch GROUP BY batch.productcode;
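Note that grouping the batch table alone omits any product that has no batches, whereas the correlated subquery in the question would still list it. If that matters, a LEFT JOIN from product is one option. The sketch below uses the table and column names from the question; the setup statements, the extra product 003 and the choice to show 0 instead of NULL are my own assumptions for illustration.

```sql
-- Setup mirroring the question, plus a hypothetical product 003 with no batches.
CREATE TABLE product (code VARCHAR(20));
CREATE TABLE batch (batchno VARCHAR(20), productcode VARCHAR(20), qty INTEGER);

INSERT INTO product (code) VALUES ('001'), ('002'), ('003');
INSERT INTO batch (batchno, productcode, qty)
VALUES ('B0002', '001', 5), ('B0003', '001', 10), ('B0004', '001', 15),
       ('C0005', '002', 20), ('C0034', '002', 10);

-- Every product appears once; products without batches get a total of 0.
SELECT product.code AS productcode,
       COALESCE(SUM(batch.qty), 0) AS qty
FROM product
LEFT JOIN batch ON batch.productcode = product.code
GROUP BY product.code
ORDER BY product.code;
```

With this data the query returns 30 for 001, 30 for 002 and 0 for 003.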
How to select a data slice for a specific date from a SQL SERVER database tables which track changes on the row level
All tables in the database have a Date column named EffectiveDate. Data is imported into the database using logic that detects and inserts changed records only. Let us assume 5 imports happened between 1/1/2014 and 5/1/2014.

So Table A has:

EffectiveDate | id1 | column1 | column2
--------------+-----+---------+--------
01/01/2014    | 1   | ABC     | 123
02/01/2014    | 1   | ABC     | 999
05/01/2014    | 1   | XXX     | 999
01/01/2014    | 2   | CCCC    | 555
03/01/2014    | 2   | CCCC    | 444
04/01/2014    | 2   | DDDD    | 444
01/01/2014    | 3   | xxxxx   | 333

and Table B has:

EffectiveDate | id2 | column1 | column2
--------------+-----+---------+--------
01/01/2014    | 1   | ZZZZ    | AAAAA
03/01/2014    | 1   | ZZZZ    | AABBB
01/01/2014    | 2   | TTTT    | AAAAA
05/01/2014    | 2   | TTTT    | AABBB

Now the task is to create 3 sets of views for all tables:

The first set is to give the effective data as of the current date.
The second set is to give the latest data.
The third set is to give the data changes after today's date (just the next changes, not the latest).

Consideration: all views should return only one row for each id, with the applicable effective date. If the effective date is not available, then the maximum effective date in the table less than the requested effective date should be used.

I was able to come up with a solution for the Effective and Latest views, but not for the third set of views (next changes). Any idea how to address this?
You'll need to use the Row_Number function to get this. For each id, the first future row (whatever that means...) will have a row_number of 1.

with RowNumbers as
(
    select id1, effectivedate,
           row_number() over (partition by id1 order by effectivedate) as RowNumber
    from a
    where effectivedate > getdate()
)
select a.*
from A
inner join RowNumbers
    on a.id1 = RowNumbers.id1
   and RowNumbers.RowNumber = 1
   and a.effectivedate = RowNumbers.effectivedate

SQL Fiddle