Most efficient way to migrate from timestamp to from-to columns - sql-server

Given a table Orders with columns:
id | revision | insertedAt
1  | 0        | 2016-01-01 00:00:00.000
1  | 1        | 2016-01-01 02:00:00.000
2  | 0        | 2016-01-01 02:00:00.000
Where the id, revision combination is unique.
How can I best migrate to this:
id | revision | applyFrom               | applyTo
1  | 0        | 2016-01-01 00:00:00.000 | 2016-01-01 01:59:59.999
1  | 1        | 2016-01-01 02:00:00.000 | 9999-12-31 00:00:00.000
2  | 0        | 2016-01-01 02:00:00.000 | 9999-12-31 00:00:00.000
I've tried iterating over a CURSOR and updating as I go along.
UPDATE orders SET applyFrom = @newApplyFrom, applyTo = @newApplyTo
WHERE id = @id AND revision = @revision;
But with 226 million rows, the estimated runtime is somewhere near 60 hours, even when hitting the index.
Is there a faster way of achieving the same result? I can add indices as needed. Currently, there is a clustered index on (id, revision).

You can update it as shown below. I am using LEAD and demonstrating the result with a SELECT first:
;with cte as (
    select *, lead(insertedAt, 1, '9999-12-31 00:00:00.000') over (partition by id order by revision) as migdate
    from Orders
)
select *, case when migdate = '9999-12-31 00:00:00.000' then migdate else dateadd(second, -1, migdate) end as applyto
from cte

Here is a version using LEAD in an updatable CTE. Not sure about the performance on large data sets, but I've included batching just in case.
WITH cte AS (
    SELECT
        id,
        revision,
        insertedAt,
        applyFrom,
        applyTo,
        LEAD(insertedAt) OVER (PARTITION BY id ORDER BY revision) AS newApplyTo
    FROM orders
)
UPDATE TOP (@BatchSize) o SET
    applyFrom = o.insertedAt,
    applyTo = ISNULL(DATEADD(s, -1, o.newApplyTo), '9999-12-31')
FROM cte o
WHERE
    o.applyFrom IS NULL AND
    o.applyTo IS NULL;
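To migrate the whole table you would run that statement in a loop until it stops affecting rows. A minimal sketch of such a driver, assuming applyFrom/applyTo start out NULL (the batch size of 100000 is just a placeholder):
DECLARE @BatchSize INT = 100000;  -- placeholder value; tune to what the transaction log can handle

WHILE 1 = 1
BEGIN
    ;WITH cte AS (
        SELECT id, revision, insertedAt, applyFrom, applyTo,
               LEAD(insertedAt) OVER (PARTITION BY id ORDER BY revision) AS newApplyTo
        FROM orders
    )
    UPDATE TOP (@BatchSize) o SET
        applyFrom = o.insertedAt,
        applyTo   = ISNULL(DATEADD(s, -1, o.newApplyTo), '9999-12-31')
    FROM cte o
    WHERE o.applyFrom IS NULL AND o.applyTo IS NULL;

    IF @@ROWCOUNT = 0 BREAK;  -- nothing left to migrate
END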
The dataset I've used (with results) is:
Id revision insertedAt applyFrom applyTo
----------- ----------- --------------------------- --------------------------- ---------------------------
1 0 2016-01-01 00:00:00.0000000 2016-01-01 00:00:00.0000000 2016-01-01 01:59:59.0000000
1 1 2016-01-01 02:00:00.0000000 2016-01-01 02:00:00.0000000 9999-12-31 00:00:00.0000000
2 0 2016-01-01 02:00:00.0000000 2016-01-01 02:00:00.0000000 9999-12-31 00:00:00.0000000
3 0 2016-01-01 00:00:00.0000000 2016-01-01 00:00:00.0000000 2016-10-31 23:59:59.0000000
3 1 2016-11-01 00:00:00.0000000 2016-11-01 00:00:00.0000000 2016-11-30 23:59:59.0000000
3 2 2016-12-01 00:00:00.0000000 2016-12-01 00:00:00.0000000 9999-12-31 00:00:00.0000000

Related

Recursive first day of each month for current getdate

Using T-SQL, I want a new column that will show me the first day of each month, for the current year of getdate().
After that I need to count the rows on this specific date. Should I do it with CTE or a temp table?
If you are on SQL Server 2012+, you can use DateFromParts().
To Get a List of Dates
Select D = DateFromParts(Year(GetDate()),N,1)
From (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)) N(N)
Returns
D
2017-01-01
2017-02-01
2017-03-01
2017-04-01
2017-05-01
2017-06-01
2017-07-01
2017-08-01
2017-09-01
2017-10-01
2017-11-01
2017-12-01
Edit For Trans Count
To get transaction counts (assuming by month), it becomes a small matter of a left join to the created dates.
-- This is Just a Sample Table Variable for Demonstration.
-- Remove this and Use your actual Transaction Table
--------------------------------------------------------------
Declare @Transactions table (TransDate date,MoreFields int)
Insert Into @Transactions values
 ('2017-02-18',6)
,('2017-02-19',9)
,('2017-03-05',5)

Select TransMonth = A.MthBeg
      ,TransCount = count(B.TransDate)
 From (
        Select MthBeg = DateFromParts(Year(GetDate()),N,1)
              ,MthEnd = EOMonth(DateFromParts(Year(GetDate()),N,1))
        From (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12)) N(N)
      ) A
 Left Join @Transactions B on B.TransDate between A.MthBeg and A.MthEnd
Group By A.MthBeg
Returns
TransMonth TransCount
2017-01-01 0
2017-02-01 2
2017-03-01 1
2017-04-01 0
2017-05-01 0
2017-06-01 0
2017-07-01 0
2017-08-01 0
2017-09-01 0
2017-10-01 0
2017-11-01 0
2017-12-01 0
For an adhoc table of months for a given year:
declare @year date = dateadd(year,datediff(year,0,getdate() ),0)
;with Months as (
    select
        MonthStart=dateadd(month,n,@year)
    from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11)) t(n)
)
select MonthStart
from Months
rextester demo: http://rextester.com/POKPM51023
returns:
+------------+
| MonthStart |
+------------+
| 2017-01-01 |
| 2017-02-01 |
| 2017-03-01 |
| 2017-04-01 |
| 2017-05-01 |
| 2017-06-01 |
| 2017-07-01 |
| 2017-08-01 |
| 2017-09-01 |
| 2017-10-01 |
| 2017-11-01 |
| 2017-12-01 |
+------------+
The first part, dateadd(year,datediff(year,0,getdate() ),0), adds the number of years since 1900-01-01 back to the date 1900-01-01, so it returns the first date of the current year. You can also swap year for other levels of truncation: quarter, month, day, hour, minute, second, et cetera.
The second part uses a common table expression and the table value constructor (values (...),(...)) to source numbers 0-11, which are added as months to the start of the year.
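For example, the same truncation pattern at other granularities (a quick illustration, not part of the original answer):
select dateadd(month, datediff(month, 0, getdate()), 0) as MonthStart  -- first day of the current month
select dateadd(day,   datediff(day,   0, getdate()), 0) as Today       -- today at midnight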
Not sure why you require recursion... but for the first day of the current month you can try a query like the one below:
Select Dateadd(day,1,eomonth(Dateadd(month, -1,getdate())))
declare @year date = dateadd(year,datediff(year,0,getdate() ),0)
;WITH months(MonthNumber) AS
(
    SELECT 0
    UNION ALL
    SELECT MonthNumber+1
    FROM months
    WHERE MonthNumber < 11
)
select dateadd(month,MonthNumber,@year)
from months

MSSQL - split records per week_start and week_end

I have a table similar to the one represented below.
myID | some data | start_date | end_date
1    | Tom       | 2016-01-01 | 2016-05-09
2    | Mike      | 2015-03-01 | 2017-03-09
...
I have a function that, when provided with a start_date, an end_date, and an interval (for example, weeks),
returns data as below (it splits the start and end dates into week intervals).
select * from my_function('2016-01-01','2016-01-12', 'ww')
2015-12-28 00:00:00.000 | 2016-01-03 00:00:00.000 15W53
2016-01-04 00:00:00.000 | 2016-01-10 00:00:00.000 16W1
2016-01-11 00:00:00.000 | 2016-01-17 00:00:00.000 16W2
I would like to be able to write a query that returns all of the values from the first table, but splits the start date and end date into multiple rows using the function.
myID | some data | Week_start_date | Week_end_date | (optional)week_num
1 Tom 2015-12-28 2016-01-03 15W53
1 Tom 2016-01-04 2016-01-10 16W1
1 Tom 2016-01-11 2016-01-17 16W2
...
2 Mike etc....
Could someone please help me with creating such a query ?
select myID,some_data,b.Week_start_date,b.Week_end_date,b.week_num
from #a
cross apply (select * from my_function('2016-01-01','2016-01-12', 'ww')) b
With sample data, I tried the following:
create table #a
(
myID int, some_data varchar(50) , start_date date, end_date date)
insert into #a values
(1,'Tom','2016-01-01','2016-05-09'),
(2,'Mike','2015-03-01','2017-03-09')
Here I am keeping the function result in one temp table:
create table #b
(
a datetime,b datetime, c varchar(50)
)
insert into #b values
('2015-12-28 00:00:00.000','2016-01-03 00:00:00.000','15W53'),
('2016-01-04 00:00:00.000','2016-01-10 00:00:00.000','16W1 '),
('2016-01-11 00:00:00.000','2016-01-17 00:00:00.000','16W2 ')
select myID,some_data,b.a,b.b,b.c from #a cross apply
(select * from #b)b
The output looks like this:
myID some_data a b c
1 Tom 2015-12-28 00:00:00.000 2016-01-03 00:00:00.000 15W53
1 Tom 2016-01-04 00:00:00.000 2016-01-10 00:00:00.000 16W1
1 Tom 2016-01-11 00:00:00.000 2016-01-17 00:00:00.000 16W2
2 Mike 2015-12-28 00:00:00.000 2016-01-03 00:00:00.000 15W53
2 Mike 2016-01-04 00:00:00.000 2016-01-10 00:00:00.000 16W1
2 Mike 2016-01-11 00:00:00.000 2016-01-17 00:00:00.000 16W2
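Since the goal is to split each row by its own start_date and end_date, the real query would pass the row's columns into the function rather than fixed dates. A minimal sketch, assuming my_function is a table-valued function returning columns named Week_start_date, Week_end_date and week_num:
select a.myID, a.some_data, f.Week_start_date, f.Week_end_date, f.week_num
from #a a
cross apply my_function(a.start_date, a.end_date, 'ww') f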
Based on your current result and expected result, the only difference I see is myID,
so you will need to frame your query like this:
;with cte
as
(
select * from my_function('2016-01-01','2016-01-12', 'ww')
)
select dense_rank() over (order by somedata) as col,
* from cte
DENSE_RANK assigns the same value to rows in the same partition and the next sequential value to the next partition, unlike RANK, which leaves gaps in the sequence.
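A quick illustration of the difference over an ad-hoc VALUES list (my own example, not from the answer):
select v,
       dense_rank() over (order by v) as dr,
       rank()       over (order by v) as r
from (values (10),(10),(20)) t(v)
-- v=10 -> dr=1, r=1 ; v=10 -> dr=1, r=1 ; v=20 -> dr=2, r=3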
Look here for more info:
https://stackoverflow.com/a/7747342/2975396

Update table conditionally based on two columns and multiple rows in another table

I need to update a foreign key in table 1 with the correct entry based on table 2. The correct foreign key is the table 2 entry whose effective date is the latest one on or before the period start date; in other words, the period start date falls on or after that entry's effective date but before the next effective date in table 2. If there are multiple entries in table 2 with the same effective date, then use the modified date column as a tie breaker and pick the most recently modified one. Here is the base table structure (all dates are in Date format):
Table 1
pK1 PeriodStartDate pK2
1 2016-04-01 00:00:00.000
2 2016-07-01 00:00:00.000
Table 2
pK2 EffectiveFrom ModifiedDate
3 2016-03-01 00:00:00.000 2016-04-01 00:00:00.000
4 2016-05-01 00:00:00.000 2016-06-01 00:00:00.000
5 2016-05-01 00:00:00.000 2016-06-02 00:00:00.000
So in the above example table 1 would look like this:
pK1 PeriodStartDate pK2
1 2016-04-01 00:00:00.000 3
2 2016-07-01 00:00:00.000 5
This is because for row 1 it falls between March 1st and May 1st (from table 2). And for row 2 it is after the last date, but as there are two similar start dates we choose the last modified.
I'm not sure of the solution. I was trying something like this:
UPDATE table1
SET pK2 = table2.pK2
FROM table2
WHERE PeriodStartDate > (SELECT FIRST(table2.EffectiveFrom) FROM table2)
I'm just not sure how to find an entry that is bounded by another row (and then needs another column for the tie breaker)
First off, you need to apply a row_number() over Table2, partitioned on the PeriodStart and ordered by the ModifiedDate (desc). Call this MaxModified; and 1 is always the most recently modified record.
pK2 PeriodStart ModifiedDate MaxModified
3 2016-03-01 00:00:00.000 2016-04-01 00:00:00.000 1
5 2016-05-01 00:00:00.000 2016-06-02 00:00:00.000 1
4 2016-05-01 00:00:00.000 2016-06-01 00:00:00.000 2
Then, only for rows where MaxModified=1, you add a new "id" so we can line up a start date with the next row's start date (our end date). This is also done with the row_number() function, ordered by PeriodStart.
pK2 PeriodStart ModifiedDate MaxModified myID
3 2016-03-01 00:00:00.000 2016-04-01 00:00:00.000 1 1
5 2016-05-01 00:00:00.000 2016-06-02 00:00:00.000 1 2
Then we take that result and join it to itself offset by one row to get an end date value for each original row.
pK2 PeriodStart ModifiedDate MaxModified myID PeriodEnd
3 2016-03-01 00:00:00.000 2016-04-01 00:00:00.000 1 1 2016-05-01 00:00:00.000
5 2016-05-01 00:00:00.000 2016-06-02 00:00:00.000 1 2 NULL
Once we have that, it's a simple matter of joining on the start/end dates to get our pK2 value.
Full script...
DECLARE @Table1 TABLE (pK1 INT, PeriodStart DATETIME, pK2 INT)
DECLARE @Table2 TABLE (pK2 INT, PeriodStart DATETIME, ModifiedDate DATETIME)
INSERT INTO @Table1
VALUES (1,'2016-04-01',NULL),
       (2,'2016-07-01',NULL)
INSERT INTO @Table2
VALUES (3,'2016-03-01','2016-04-01'),
       (4,'2016-05-01','2016-06-01'),
       (5,'2016-05-01','2016-06-02')
;WITH OrderedList AS
(
SELECT *,
ROW_NUMBER() OVER(PARTITION BY PeriodStart ORDER BY ModifiedDate DESC) AS MaxModified
FROM @Table2
),X AS
(
SELECT *,
ROW_NUMBER() OVER(ORDER BY PeriodStart) AS myID
FROM OrderedList
WHERE MaxModified=1
), Y AS
(
SELECT L.*, R.PeriodStart AS PeriodEnd
FROM X L
LEFT JOIN X R ON L.myID=R.myID-1 AND R.MaxModified=1
WHERE L.MaxModified=1
)
UPDATE T SET pK2=Y.pK2
FROM @Table1 T
LEFT JOIN Y ON T.PeriodStart >= Y.PeriodStart AND T.PeriodStart < COALESCE(Y.PeriodEnd,CURRENT_TIMESTAMP)
SELECT *
FROM @Table1

DateDiff of Logtable Dates Which Have The Same Column in SQL Server

Using SQL Server 2012, I need to get the date difference between consecutive dates in a log table for rows that share the same ID, for example:
ID | Version | Status | Date
-----------------------------------------------------
12345 | 1 | new | 2014-05-01 00:00:00.000
12345 | 2 | up | 2014-05-02 00:00:00.000
12345 | 3 | appr | 2014-05-03 00:00:00.000
67890 | 1 | new | 2014-05-04 00:00:00.000
67890 | 2 | up | 2014-05-08 00:00:00.000
67890 | 3 | rej | 2014-05-13 00:00:00.000
I need to get the date diff of all sequential dates (the difference between versions 1 and 2, and between versions 2 and 3).
I have tried creating a WHILE loop but with no luck!
Your help is really appreciated!
This calculates the DATEDIFF as per your request ("date diff of all sequential dates"); if the dates are not sequential, it will just show the same date. Also, please don't use reserved keywords as column names.
SELECT ID,
       [VERSION],
       [STATUS],
       [DATE],
       CASE WHEN LEAD([DATE]) OVER (PARTITION BY ID ORDER BY [VERSION]) = DATEADD(DAY,1,[DATE])
            THEN CAST(DATEDIFF(DAY,[DATE],LEAD([DATE]) OVER (PARTITION BY ID ORDER BY [VERSION])) AS VARCHAR(5))
            ELSE CONVERT(VARCHAR(23),[DATE],121)  -- cast to varchar so both CASE branches share a type
       END AS DATEDIFFF
FROM #TEMP
Another way, with OUTER APPLY (getting the previous value):
SELECT t.*,
DATEDIFF(day,p.[Date],t.[Date]) as dd
FROM YourTable t
OUTER APPLY (
SELECT TOP 1 *
FROM YourTable
WHERE ID = t.ID AND [DATE] < t.[Date] AND [Version] < t.[Version]
ORDER BY [Date] DESC
) as p
Output:
ID Version Status Date dd
12345 1 new 2014-05-01 00:00:00.000 NULL
12345 2 up 2014-05-02 00:00:00.000 1
12345 3 appr 2014-05-03 00:00:00.000 1
67890 1 new 2014-05-04 00:00:00.000 NULL
67890 2 up 2014-05-08 00:00:00.000 4
67890 3 rej 2014-05-13 00:00:00.000 5
Note: If you are using SQL Server 2012, it is better to use the LEAD and LAG functions.
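A minimal LAG-based sketch, assuming the same columns and table name (YourTable) as above:
SELECT t.ID, t.[Version], t.[Status], t.[Date],
       DATEDIFF(day, LAG(t.[Date]) OVER (PARTITION BY t.ID ORDER BY t.[Version]), t.[Date]) AS dd
FROM YourTable t
This returns NULL for the first version of each ID and the day gap for every subsequent version, matching the OUTER APPLY output above.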

SQL Query for Date Range, multiple start/end times

A table exists in Microsoft SQL Server with record ID, Start Date, End Date and Quantity.
The idea is that for each record, the quantity/total days in range = daily quantity.
Given that a table containing all possible dates exists, how can I generate a result set in SQL Server to look like the following example?
EX:
RecordID | Start Date | End Date | Quantity
1 | 1/1/2010 | 1/5/2010 | 30000
2 | 1/3/2010 | 1/9/2010 | 20000
3 | 1/1/2010 | 1/7/2010 | 10000
Results as
1 | 1/1/2010 | QTY (I can do the math easily, I just need the dates view)
1 | 1/2/2010 |
1 | 1/3/2010 |
1 | 1/4/2010 |
1 | 1/5/2010 |
2 | 1/3/2010 |
2 | 1/4/2010 |
2 | 1/5/2010 |
2 | 1/6/2010 |
2 | 1/7/2010 |
2 | 1/8/2010 |
2 | 1/9/2010 |
3 | 1/1/2010 |
3 | 1/2/2010 |
3 | 1/3/2010 |
3 | 1/4/2010 |
3 | 1/5/2010 |
3 | 1/6/2010 |
3 | 1/7/2010 |
Grouping on dates, I could then get the sum of quantity on that day; however, the final result set can't be aggregated due to user-provided filters that may exclude some of these records down the road.
EDIT
To clarify, this is just a sample. The filters are irrelevant as I can join to the side to pull in details related to the record ID in the results.
The real data contains N records which increases weekly, the dates are never the same. There could be 2000 records with different start and end dates... That is what I want to generate a view for. I can right join onto the data to do the rest of what I need
I should also mention this is for past, present and future data. I would love to get rid of a temporary table of dates. I was using a recursive query to get all dates within a 50-year span, but this exceeds the MAXRECURSION limit, which I cannot override in a view.
Answer
select RecordId,d.[Date], Qty/ COUNT(*) OVER (PARTITION BY RecordId) AS Qty
from EX join Dates d on d.Date between [Start Date] and [End Date]
ORDER BY RecordId,[Date]
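One detail worth noting (my observation, not part of the original answer): Qty is an integer, so Qty / COUNT(*) performs integer division, which is why 20000 spread over 7 days shows as 2857 in the results further down. Multiplying by 1.0 keeps the fractional part:
select RecordId, d.[Date],
       Qty * 1.0 / COUNT(*) OVER (PARTITION BY RecordId) AS DailyQty
from EX join Dates d on d.Date between [Start Date] and [End Date]
ORDER BY RecordId,[Date]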
NB: The demo CTEs below use the date datatype, which requires SQL Server 2008; the general approach should work for SQL Server 2005 as well, though.
Test Case
/*CTEs for testing purposes only*/
WITH EX AS
(
SELECT 1 AS RecordId,
cast('1/1/2010' as date) as [Start Date],
cast('1/5/2010' as date) as [End Date],
30000 AS Qty
union all
SELECT 2 AS RecordId,
cast('1/3/2010' as date) as [Start Date],
cast('1/9/2010' as date) as [End Date],
20000 AS Qty
),Dates AS /*Dates Table now adjusted to do greater range*/
(
SELECT DATEADD(day,s1.number + 2048*s2.number,'1990-01-01') AS [Date]
FROM master.dbo.spt_values s1 CROSS JOIN master.dbo.spt_values s2
where s1.type='P' AND s2.type='P' and s2.number <= 8
)
select RecordId,d.[Date], Qty/ COUNT(*) OVER (PARTITION BY RecordId) AS Qty
from EX join Dates d on d.Date between [Start Date] and [End Date]
ORDER BY RecordId,[Date]
Results
RecordId Date Qty
----------- ---------- -----------
1 2010-01-01 6000
1 2010-01-02 6000
1 2010-01-03 6000
1 2010-01-04 6000
1 2010-01-05 6000
2 2010-01-03 2857
2 2010-01-04 2857
2 2010-01-05 2857
2 2010-01-06 2857
2 2010-01-07 2857
2 2010-01-08 2857
2 2010-01-09 2857
I think you can try this.
SELECT [Quantities].[RecordID], [Dates].[Date], SUM([Quantity])
FROM [Dates]
JOIN [Quantities] on [Dates].[Date] between [Quantities].[Start Date] and [End Date]
GROUP BY [Quantities].[RecordID], [Dates].[Date]
ORDER BY [Quantities].[RecordID], [Dates].[Date]
