Update a column with LastExclusionDate - sql-server

In SQL Server 2012, I have a table t1 where we store a list of excluded products.
I would like to add a column LastExclusionDate to store the date since which the product has been excluded.
Every day, a row is inserted into the table for each product that is excluded. If a product is not excluded on a given day, no row is inserted, so the next time it is excluded there will be a gap between that date and the previous insert.
I would like to find a T-SQL query to update the LastExclusionDate column.
I would like to use it to populate the LastExclusionDate column the first time (initialisation) and then use it every day to update the column when we insert a new row.
I've tried this query, but I don't know how to get LastExclusionDate:
;WITH Cte AS
(
SELECT
product_id,
CreationDate,
LAG(CreationDate) OVER (PARTITION BY Product_ID ORDER BY CreationDate) AS GapStart,
(DATEDIFF(DAY, LAG(CreationDate) OVER (PARTITION BY Product_id ORDER BY CreationDate), CreationDate) -1) AS GapDays
FROM
#t1
)
SELECT *
FROM cte
Here's some sample data:
+------------+--------------+--------------------------------+
| product_id | CreationDate | LastExclusionDate_(toPopulate) |
+------------+--------------+--------------------------------+
| 100 | 2018-05-01 | 2018-05-01 |
| 100 | 2018-05-02 | 2018-05-01 |
| 100 | 2018-05-03 | 2018-05-01 |
| 100 | 2018-06-01 | 2018-06-01 |
| 100 | 2018-06-02 | 2018-06-01 |
| 200 | 2018-09-01 | 2018-09-01 |
| 200 | 2018-09-02 | 2018-09-01 |
| 200 | 2018-09-17 | 2018-09-17 |
+------------+--------------+--------------------------------+
Thanks

The idea in finding gap-less sequences is to compare the series to a gap-less sequence and find groups of records where the difference of both doesn't change. For example, when the date increases one by one and a row number also does, then the difference between both stays the same and we found a group:
WITH
cte (product_id, CreationDate, grp) AS (
SELECT product_id, CreationDate
, DATEDIFF(day, '19000101', CreationDate)
- ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY CreationDate)
FROM #t1
)
SELECT product_id, CreationDate
, MIN(CreationDate) OVER (PARTITION BY product_id, grp) AS LastExclusionDate
FROM cte
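The grouping trick in the query above can be traced outside the database. The following is a minimal Python sketch of the same logic, not T-SQL: it uses the sample rows from the question, substitutes `date.toordinal()` for `DATEDIFF(day, '19000101', ...)`, and, like the query, assumes at most one row per product per day. The function name `last_exclusion_dates` is made up for illustration.

```python
from datetime import date
from itertools import groupby

# Sample rows from the question: (product_id, CreationDate)
rows = [
    (100, date(2018, 5, 1)), (100, date(2018, 5, 2)), (100, date(2018, 5, 3)),
    (100, date(2018, 6, 1)), (100, date(2018, 6, 2)),
    (200, date(2018, 9, 1)), (200, date(2018, 9, 2)), (200, date(2018, 9, 17)),
]

def last_exclusion_dates(rows):
    """Return (product_id, CreationDate, LastExclusionDate) triples."""
    result = []
    for product_id, group in groupby(sorted(rows), key=lambda r: r[0]):
        # Day number minus row number: constant within a run of consecutive days
        numbered = [(d, d.toordinal() - n)
                    for n, (_, d) in enumerate(group, start=1)]
        first_in_grp = {}
        for d, grp in numbered:
            # Dates are ascending, so the first date seen per grp is its minimum
            first_in_grp.setdefault(grp, d)
        result.extend((product_id, d, first_in_grp[grp]) for d, grp in numbered)
    return result

for row in last_exclusion_dates(rows):
    print(row)
```

Each run of consecutive days shares one `grp` value, so the minimum date per `(product_id, grp)` is the start of the current exclusion period.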

For ongoing daily insertions it can be done with something like this.
INSERT INTO <yourTable>
SELECT
newProduct.[product_id],
newProduct.[creationDate],
isnull(existingProduct.[lastExclusionDate], newProduct.[creationDate]) AS [lastExclusionDate]
FROM
(SELECT <@product_id> AS [product_id], <@creationDate> AS [creationDate]) AS newProduct
LEFT JOIN #temp existingProduct
ON existingProduct.[product_id] = newProduct.product_id
AND existingProduct.[creationDate] = DATEADD(DAY,-1,newProduct.[creationDate])
I've got a demo here http://rextester.com/BDEO23118 . It's a larger than necessary demo because it uses the code above with the data you provided to populate a table row-by-row like you might in a daily update process. It then does individual insertions using this code with some new dates so you can see the way it handles new ranges. (just an FYI, rextester displays result dates in day.month.year hh:mm:ss format, but you can dump the script into management studio and it will output in DATE format)
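The rule behind the daily insert is small enough to state as a function. This is a Python sketch of the logic only, assuming the existing rows are available as a dictionary keyed by `(product_id, CreationDate)`; the name `row_for_insert` and the dictionary layout are illustrative and not part of the answer's code.

```python
from datetime import date, timedelta

def row_for_insert(existing, product_id, creation_date):
    """existing maps (product_id, CreationDate) -> LastExclusionDate."""
    previous_day = creation_date - timedelta(days=1)
    # Counterpart of the LEFT JOIN on yesterday's row plus ISNULL:
    # carry yesterday's LastExclusionDate forward, else start a new period
    last_exclusion = existing.get((product_id, previous_day), creation_date)
    return (product_id, creation_date, last_exclusion)

existing = {
    (100, date(2018, 6, 1)): date(2018, 6, 1),
    (100, date(2018, 6, 2)): date(2018, 6, 1),
}
print(row_for_insert(existing, 100, date(2018, 6, 3)))  # continues the period
print(row_for_insert(existing, 100, date(2018, 6, 10)))  # starts a new period
```

The approach relies on the previous day's row already carrying a correct LastExclusionDate, which is why the column has to be initialised once before the daily process starts.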

Related

How to create data between two different dates with limited data

I am trying to generate daily rows between two dates.
The data in the table looks like this:
StartDate | EndDate | StudId | Active
-----------+---------------+-------------+-----------
01-01-2009 | 02-15-2009 | 12345 | Y
02-16-2009 | 03-15-2009 | 12345 | Y
03-16-2009 | 04-10-2009 | 12345 | N
04-11-2009 | 05-31-2009 | 12345 | Y
01-01-2009 | 02-15-2009 | 23642 | Y
02-16-2009 | 03-15-2009 | 23642 | Y
03-16-2009 | 04-10-2009 | 23642 | N
04-11-2009 | 05-31-2009 | 23642 | Y
The table continues with further StartDate, EndDate and StudId values.
I am trying to get the result shown below:
Startdate | StudID | Active
------------+----------+--------
01-01-2009 | 12345 | Y
01-02-2009 | 12345 | Y
01-03-2009 | 12345 | Y
01-04-2009 | 12345 | Y
. . .
. . .
02-15-2009 | 12345 | Y
02-16-2009 | 12345 | Y
As shown above, I am trying to load the Active value for each student for every date between StartDate and EndDate.
We don't have any daily data, so we need to create it from StartDate and EndDate. If there is a gap between an EndDate and the next StartDate, then the Active field should be '0' for those dates.
Can someone suggest how to do this?
This requires a calendar table and a join:
WITH calendar
AS (SELECT Min(StartDate) AS dates,
Max(EndDate) ed_date
FROM Yourtable
UNION ALL
SELECT Dateadd(dd, 1, dates),
ed_date
FROM calendar
WHERE dates < ed_date)
SELECT a.dates as Startdate,b.StudID,b.Active
FROM calendar a
JOIN Yourtable b
ON a.dates BETWEEN b.StartDate AND b.EndDate
ORDER BY dates
OPTION (maxrecursion 0)
Note: I have used a recursive CTE to generate the dates. It is better to create a physical calendar table and use it in queries like this.
This should work.
declare @tmp date;
select @tmp = max(EndDate) from tmpTable;
print @tmp
;with cte as
(
select min(StartDate) over() as dd from tmpTable
union all select dateadd(day,1,dd) from cte where dd < @tmp
)
select distinct dd as StartDate, isnull(Studid, 12345) as StudId, isnull(Active,0) as Active from tmpTable as t
right join cte as c on c.dd between t.startDate and t.enddate
where t.Studid = 12345 or t.studid is null
option (maxrecursion 0)
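To see what the calendar join produces, including the '0' rows for gap days, here is a small Python sketch of the same expansion for a single student. The two periods are hypothetical data chosen to contain a gap; `expand_daily` and its `gap_value` parameter are names invented for this illustration.

```python
from datetime import date, timedelta

def expand_daily(periods, gap_value="0"):
    """periods: list of (start, end, stud_id, active) rows for one student."""
    first_day = min(p[0] for p in periods)
    last_day = max(p[1] for p in periods)
    stud_id = periods[0][2]
    out = []
    day = first_day
    while day <= last_day:  # plays the role of the calendar table
        # Counterpart of: calendar.dates BETWEEN StartDate AND EndDate
        match = next((p for p in periods if p[0] <= day <= p[1]), None)
        out.append((day, stud_id, match[3] if match else gap_value))
        day += timedelta(days=1)
    return out

# Hypothetical periods with a two-day gap (4 and 5 January)
periods = [
    (date(2009, 1, 1), date(2009, 1, 3), 12345, "Y"),
    (date(2009, 1, 6), date(2009, 1, 7), 12345, "N"),
]
for row in expand_daily(periods):
    print(row)
```

An inner join against the calendar would simply drop the gap days; keeping them requires an outer join from the calendar side, as in the second query.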

SQL command tallies totals into 2nd table

I have a SQL command that SUMS up incidents from TableA and imports the totals into TableB. Then another command that calculates the totals from B and INSERTS INTO TableC. Is it possible to include in TableC the names of those that have the recorded incidents? (Right now it only SUMS up totals and reports as a whole with no names)
I'll give some examples:
TableB
Day 1
Name | Incidents
Tim | 1
Frank | 2
Jay | 1
Day 2
Name | incidents
Tim | 1
Frank | 1
Jay | 1
TableC
Name | Incidents
Tim | 2
Frank | 3
Jay | 2
TableC continues to record data while TableB will be dropped and re recorded daily.
Here is the SQL command to fill TableB:
SELECT [Name], SUM(TableAColumnA) AS TableBColumnB INTO TableB FROM TableA GROUP BY [Name]
Here is the SQL I've tried to populate TableC:
INSERT INTO TableC(ImportDate, DayofData, Name, ColumnBTableB)
SELECT GETDATE() AS ImportDate, DATEADD(day, -1, GETDATE()) AS DayofData,
(SELECT SUM(ColumnBTableB) FROM TableB);
What this does is give NULL value to Name and calculate all incidents recorded in TableB.ColumnB. I basically need to show the names of those that had contributed to the total of incidents into TableC. TableC looks like this:
TableC
Name | Incidents | ImportDate | DayofData
NULL | 4 | today's date/time | yesterday's date/time
Was hoping to do something like this.
TableC
Name | incidents | totalincidents | importdate | dayofdata
Tim | 1 | 4 | today's date/time | yesterday's date/time
Is this possible, or do I need to calculate it into a separate table entirely? Or is this just wishful thinking gone too far?
If you could do without TotalIncidents, you would use GROUP BY:
INSERT INTO TableC(ImportDate, DayofData, Name, Incidents)
SELECT GETDATE() AS ImportDate, DATEADD(day, -1, GETDATE()) AS DayofData, Name, Incidents
FROM (SELECT Name, SUM(ColumnBTableB) AS Incidents
FROM TableB
GROUP BY Name) AS t; -- SQL Server requires an alias on a derived table
TotalIncidents can then be obtained from the stored rows with a query:
SELECT SUM(Incidents) AS TotalIncidents
FROM TableC
WHERE DayOfData BETWEEN CONVERT(datetime, '1/24/2016', 101)
AND CONVERT(datetime, '1/25/2016', 101);
Do you really need to store TotalIncidents as a column? It just adds complexity.
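If the total is wanted next to each name anyway, it can be derived at read time from the per-name sums. The sketch below shows that shape in Python using the Day 1 figures from the question; `summarize` is an illustrative name. In T-SQL, a windowed aggregate over the grouped rows would be the analogous construct.

```python
from collections import defaultdict

# Day 1 rows of TableB from the question: (Name, Incidents)
table_b = [("Tim", 1), ("Frank", 2), ("Jay", 1)]

def summarize(rows):
    """Return (Name, Incidents, TotalIncidents) per name."""
    per_name = defaultdict(int)
    for name, incidents in rows:
        per_name[name] += incidents  # GROUP BY Name, SUM(Incidents)
    total = sum(per_name.values())   # grand total over all names
    return [(name, n, total) for name, n in per_name.items()]

for row in summarize(table_b):
    print(row)
```

Because the total is computed from the per-name rows, it never needs to be stored and cannot drift out of sync with them.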

How to remove the duplicate records in select query over clause

I have a Transactions table as follows in SQL Server.
UserID TransactionDate Amount
1 | 2015-04-01 | 0
1 | 2015-05-02 | 5000
1 | 2015-09-07 | 1000
1 | 2015-10-01 | -4000
1 | 2015-10-02 | -700
1 | 2015-10-03 | 252
1 | 2015-10-03 | 260
1 | 2015-10-04 | 1545
1 | 2015-10-05 | 1445
1 | 2015-10-06 | -2000
I want to query this table to get the available balance at any particular date, so I used a window function for that.
SELECT TransactionDate,
SUM(Amount) OVER (PARTITION BY UserId ORDER BY TransactionDate ROWS
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM Transactions
But because the Transactions table has two entries for 2015-10-03, the result repeats that date. Whenever several rows share the same date, I expect only the last record for that date, with the available balance summed up.
Current output
TransactionDate AvailableBalance
2015-04-01 | 0
2015-05-02 | 5000
2015-09-07 | 6000
2015-10-01 | 2000
2015-10-02 | 1300
2015-10-03 | 1552
2015-10-03 | 1804
2015-10-04 | 3349
2015-10-05 | 4794
2015-10-06 | 2794
Expected: I want to remove below record from the above result set.
2015-10-03 | 1552
HERE is my sql fiddle
You can SUM per date before applying the windowed function:
WITH cte AS
(
SELECT TransactionDate, UserId, SUM(Amount) AS Amount
FROM Transactions
GROUP BY TransactionDate, UserId
)
SELECT TransactionDate,
SUM(Amount) OVER (PARTITION BY UserId ORDER BY TransactionDate ROWS
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS AvailableBalance
FROM cte
Use RANGE instead of ROWS.
SELECT
TransactionDate,
SUM(Amount) OVER (
PARTITION BY UserId
ORDER BY TransactionDate
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS AvailableBalance
FROM Transactions;
This variant produces a different result set than originally requested, but it may be useful in some cases. It returns the same number of rows as the Transactions table, so it will return two rows for 2015-10-03, but AvailableBalance would be 1804 for both rows.
I just wanted to highlight that the RANGE option exists. If you really need one row per day, then grouping by day first, as in the answer by @lad2025, is the way to go.
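The difference between the two answers can be reproduced on a few rows. The Python sketch below uses small hypothetical amounts rather than the question's data; `balance_per_day` mimics grouping by date before the running sum, and `balance_range` mimics RANGE framing, where rows that tie on the date all receive the same running total. Both function names are invented for this example.

```python
from itertools import accumulate, groupby

# Hypothetical transactions: (TransactionDate, Amount), one date duplicated
transactions = [
    ("2015-10-02", 100),
    ("2015-10-03", 20),
    ("2015-10-03", 30),
    ("2015-10-04", 5),
]

def balance_per_day(rows):
    """Group by date first, then take the running sum: one row per date."""
    rows = sorted(rows, key=lambda r: r[0])
    days = [(d, sum(a for _, a in g))
            for d, g in groupby(rows, key=lambda r: r[0])]
    running = accumulate(a for _, a in days)
    return [(d, bal) for (d, _), bal in zip(days, running)]

def balance_range(rows):
    """RANGE-style framing: one output row per input row, ties share a total."""
    per_day = dict(balance_per_day(rows))
    return [(d, per_day[d]) for d, _ in sorted(rows, key=lambda r: r[0])]

print(balance_per_day(transactions))
print(balance_range(transactions))
```

The first function returns three rows and the second returns four, with the duplicated date repeated at the same balance.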

Remove Almost Duplicate Rows in SQL

I have found a lot of examples online of how to remove duplicate rows in a SQL table but I cannot figure out how to remove almost duplicate rows.
Data Example
+--------+----------+--------+
| Col1 | Col2 | NumCol |
+--------+----------+--------+
| USA | Organic | 300 |
| USA | Organic | 400 |
| Canada | Referral | 120 |
| Canada | Referral | 120 |
+--------+----------+--------+
Desired Output
+--------+----------+--------+
| Col1 | Col2 | NumCol |
+--------+----------+--------+
| USA | Organic | 400 |
| Canada | Referral | 120 |
+--------+----------+--------+
In this example, if 2 rows are identical then I would like one of them to be removed. In addition, if 2 rows match based on Col1 and Col2, then I would like the row with the lesser value in NumCol to be removed.
My SQL Server Express code is:
WITH CTE AS(
SELECT [Col1]
,[Col2]
,[NumCol]
, RN = ROW_NUMBER()OVER(PARTITION BY [Col1]
,[Col2]
,[NumCol] ORDER BY [Col1])
FROM table
)
DELETE FROM CTE WHERE RN > 1
This code does a good job of deleting duplicates but it doesn't get rid of rows where only Col1 and Col2 match but not NumCol. How should I approach something like this? I'm a newbie to SQL, so any explanation in layman's terms is appreciated!
You can let the row numbers restart per (Col1, Col2) pair by changing:
RN = ROW_NUMBER()OVER(PARTITION BY [Col1]
,[Col2]
,[NumCol] ORDER BY [Col1])
To:
RN = ROW_NUMBER() OVER(
PARTITION BY Col1, Col2
ORDER BY NumCol desc)
The order by NumCol desc makes sure that the rows with the lower NumCol are removed.
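The effect of numbering rows per (Col1, Col2) pair in descending NumCol order and deleting everything after the first can be checked with the question's sample data. This Python sketch keeps the highest NumCol per pair, which yields the same surviving rows; `keep_highest` is an illustrative name.

```python
# Sample rows from the question: (Col1, Col2, NumCol)
rows = [
    ("USA", "Organic", 300),
    ("USA", "Organic", 400),
    ("Canada", "Referral", 120),
    ("Canada", "Referral", 120),
]

def keep_highest(rows):
    """Keep one row per (Col1, Col2): the one with the largest NumCol."""
    best = {}
    for col1, col2, num in rows:
        key = (col1, col2)  # the PARTITION BY columns
        if key not in best or num > best[key]:
            best[key] = num  # the row that would get RN = 1
    return [(c1, c2, n) for (c1, c2), n in best.items()]

print(keep_highest(rows))
```

Exact duplicates collapse as well, because both copies fall into the same partition and only one of them can be numbered 1.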

Efficient Date Comparisons in SQL

I hope this question provides all of the necessary information, but please do request more if anything is unclear. This is my first question on Stack Overflow, so please bear with me.
I am running this query on SQL Server 2005.
I have a large derived dataset (I'll provide a small subset later) which has 4 fields:
ID,
Year,
StartDate,
EndDate
Within this data set the ID may (correctly) appear multiple times with different date combinations.
The question I have is what ways there are to identify whether a record is 'new', i.e. its start date does not fall between the start and end date of any other record for the same ID.
For example, take the data set below (I hope this table comes out correctly!):
+----+------+------------+------------+
| ID | Year | Start Date | End Date |
+----+------+------------+------------+
| 1 | 2007 | 01/01/2007 | 10/10/2007 |
| 1 | 2007 | 01/01/2007 | 05/04/2007 |
| 1 | 2007 | 05/04/2007 | 08/10/2007 |
| 1 | 2007 | 15/10/2007 | 20/10/2007 |
| 1 | 2007 | 25/10/2007 | 01/01/2008 |
| 2 | 2007 | 01/01/2007 | 01/01/2008 |
| 2 | 2008 | 01/01/2008 | 15/07/2008 |
| 2 | 2008 | 10/06/2008 | 01/01/2009 |
+----+------+------------+------------+
If we say nothing existed before 2007, then Row 1 and Row 6 are 'new' at that time.
Rows 2, 3, 7 and 8 are not 'new', as they either join the end of a previous record or overlap it to form a continuous date period (take rows 6 and 7: there are no 'breaks' between 01/01/2007 and 15/07/2008).
Rows 4 and 5 would be considered new records, as they do not attach directly to the end of the previous period for ID 1 or overlap any of the other periods.
Currently, to get this data set I have to put all of my data into temporary tables and then join them together on various fields to remove the records I don't want.
First, I remove rows where the start date equals the end date of another row for that ID (this would get rid of rows 3 and 7).
Then I remove rows where the start date is between the start date and end date of other records for that ID (this would remove rows 2 and 8).
That would leave me with Rows 1, 4, 5 and 6 as the 'new' records, which is correct.
Is there a more efficient way to do this, such as some sort of loop, a CTE or (cough) a cursor?
As per the above, if there is anything unclear, don't hesitate to ask and I will try to provide the information you request.
Try
;with cte as
(
Select *, row_number() over (partition by id order by startdate) rn from yourtable
)
select distinct t1.*
from cte t1
left join cte t2
on t1.ID = t2.ID
and t1.EndDate>=t2.StartDate and t1.StartDate<=t2.EndDate
and t1.rn<>t2.rn
where t2.ID is null
or t1.rn=1
This should work if you have a unique identifier for each row:
select * from
tbl t3
left outer join
(
select distinct t1.id as id_inside, t1.recno as recno_inside
from
tbl t1 inner join
tbl t2 on
t1.id = t2.id and
(t1.startdate <> t2.startdate or t1.enddate <> t2.enddate) and
(t1.startdate >= t2.startdate and t1.enddate <= t2.enddate)
) t4 on
t3.id = t4.id_inside and
t3.recno = t4.recno_inside
where
id_inside is null and
recno_inside is null
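As a cross-check on the expected result (rows 1, 4, 5 and 6), the rule from the question can also be evaluated in a single ordered pass. The Python sketch below is a different technique from the joins above: it sorts each ID's rows by start date and treats a row as 'new' when it starts after the latest end date seen so far. The function name `new_records` is invented for this example, and the dates are the question's sample rows read as day/month/year.

```python
from datetime import date

# Sample rows from the question: (ID, StartDate, EndDate)
rows = [
    (1, date(2007, 1, 1), date(2007, 10, 10)),
    (1, date(2007, 1, 1), date(2007, 4, 5)),
    (1, date(2007, 4, 5), date(2007, 10, 8)),
    (1, date(2007, 10, 15), date(2007, 10, 20)),
    (1, date(2007, 10, 25), date(2008, 1, 1)),
    (2, date(2007, 1, 1), date(2008, 1, 1)),
    (2, date(2008, 1, 1), date(2008, 7, 15)),
    (2, date(2008, 6, 10), date(2009, 1, 1)),
]

def new_records(rows):
    """Return the rows that open a new, non-touching period for their ID."""
    result = []
    latest_end = {}
    # Among equal start dates, visit the longest period first
    ordered = sorted(rows, key=lambda r: (r[0], r[1], -r[2].toordinal()))
    for rec_id, start, end in ordered:
        if rec_id not in latest_end or start > latest_end[rec_id]:
            result.append((rec_id, start, end))
        latest_end[rec_id] = max(end, latest_end.get(rec_id, end))
    return result

for row in new_records(rows):
    print(row)
```

A row that starts exactly on a previous end date is treated as a continuation, which matches how rows 3 and 7 are classified in the question.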
