I hope this question provides all of the necessary information, but please do request more if anything is unclear. This is my first question on Stack Overflow, so please bear with me.
I am running this query on SQL Server 2005.
I have a large derived dataset (I'll provide a small subset later) which has 4 fields:
ID,
Year,
StartDate,
EndDate
Within this data set the ID may (correctly) appear multiple times with different date combinations.
The question I have is: what ways are there to identify whether a record is 'new', i.e. its start date does not fall between the start and end date of any other record for the same ID?
For an example take the data set below (I hope this table comes out correctly!);
+----+------+------------+------------+
| ID | Year | Start Date | End Date |
+----+------+------------+------------+
| 1 | 2007 | 01/01/2007 | 10/10/2007 |
| 1 | 2007 | 01/01/2007 | 05/04/2007 |
| 1 | 2007 | 05/04/2007 | 08/10/2007 |
| 1 | 2007 | 15/10/2007 | 20/10/2007 |
| 1 | 2007 | 25/10/2007 | 01/01/2008 |
| 2 | 2007 | 01/01/2007 | 01/01/2008 |
| 2 | 2008 | 01/01/2008 | 15/07/2008 |
| 2 | 2008 | 10/06/2008 | 01/01/2009 |
+----+------+------------+------------+
If we say nothing existed before 2007 then Row 1 and Row 6 are 'new' at that time.
Rows 2, 3, 7 and 8 are not 'new', as they either join the end of a previous record or overlap it to form a continuous date period (take rows 6 and 7: there are no 'breaks' between 01/01/2008 and 01/01/2009).
Rows 4 and 5 would each be considered a new record, as neither attaches directly to the end of the previous period for ID 1 nor overlaps any of the other periods.
Currently to get this data set I have to put all of my data into temporary tables and then join them together on various fields to remove the records I don't want.
Firstly I remove rows where the startdate equals the enddate of another row for that ID (this would get rid of rows 3 and 7).
Then I remove rows where the start date is between the startdate and enddate of other records for that ID (this would remove rows 2 and 8).
That would leave me with rows 1, 4, 5 and 6 as the 'new' records, which is correct.
Is there a more efficient way to do this such as in some sort of loop, CTE or cough Cursor?
As per the above, if there is anything unclear don't hesitate to ask and I will try and provide you with the information you request.
Try
;with cte as
(
Select *, row_number() over (partition by id order by startdate) rn from yourtable
)
select distinct t1.*
from cte t1
left join cte t2
on t1.ID = t2.ID
and t1.EndDate>=t2.StartDate and t1.StartDate<=t2.EndDate
and t1.rn<>t2.rn
where t2.ID is null
or t1.rn=1
this should work, if you have a unique identifier for each row:
select * from
tbl t3
left outer join
(
select distinct t1.id as id_inside, t1.recno as recno_inside
from
tbl t1 inner join
tbl t2 on
t1.id = t2.id and
(t1.startdate <> t2.startdate or t1.enddate <> t2.enddate) and
(t1.startdate >= t2.startdate and t1.enddate <= t2.enddate)
) t4 on
t3.id = t4.id_inside and
t3.recno = t4.recno_inside
where
id_inside is null and
recno_inside is null
sqlfiddle
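To make the rule from the question concrete, here is a minimal Python sketch of the same idea outside SQL. It is not either answer's query: it assumes the rows can be sorted per ID by start date (longest row first when starts tie) and treats a row as 'new' only when its start date is strictly after every end date seen so far for that ID. The function name `find_new_rows` and the tuple layout are illustrative assumptions.

```python
from datetime import date

# Sample rows from the question as (row_no, id, start, end); dates are dd/mm/yyyy there.
rows = [
    (1, 1, date(2007, 1, 1), date(2007, 10, 10)),
    (2, 1, date(2007, 1, 1), date(2007, 4, 5)),
    (3, 1, date(2007, 4, 5), date(2007, 10, 8)),
    (4, 1, date(2007, 10, 15), date(2007, 10, 20)),
    (5, 1, date(2007, 10, 25), date(2008, 1, 1)),
    (6, 2, date(2007, 1, 1), date(2008, 1, 1)),
    (7, 2, date(2008, 1, 1), date(2008, 7, 15)),
    (8, 2, date(2008, 6, 10), date(2009, 1, 1)),
]


def find_new_rows(rows):
    """Return the row numbers that start a new, non-attached period for their ID."""
    new_rows = []
    last_end = {}  # id -> latest end date seen so far for that id
    # Sort by id, then start date; on equal starts the longest row comes first.
    ordered = sorted(rows, key=lambda r: (r[1], r[2], -r[3].toordinal()))
    for row_no, rid, start, end in ordered:
        # A start equal to an earlier end counts as attached, hence the strict '>'.
        if rid not in last_end or start > last_end[rid]:
            new_rows.append(row_no)
        last_end[rid] = max(end, last_end.get(rid, end))
    return sorted(new_rows)


new_row_numbers = find_new_rows(rows)
print(new_row_numbers)
```

Tracking the running maximum end date avoids comparing every row against every other row, which is the part the temporary-table approach pays for.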
In SQL Server 2012, I have a table t1 where we store a list of excluded product.
I would like to add a column LastExclusionDate to store the date since when the product has been excluded.
Every day the product is inserted into the table if it is excluded. If not, there will be no row, and the next time the product is excluded there will be a date gap since the previous insert.
I would like to find a T-SQL query to update the LastExclusionDate column.
I would like to use it to populate column LastExclusionDate the first time (=initialisation) and use it every day to update the column when we insert a new row
I've tried this query, but I don't know how to get LastExclusionDate!
;WITH Cte AS
(
SELECT
product_id,
CreationDate,
LAG(CreationDate) OVER (PARTITION BY Product_ID ORDER BY CreationDate) AS GapStart,
(DATEDIFF(DAY, LAG(CreationDate) OVER (PARTITION BY Product_id ORDER BY CreationDate), CreationDate) -1) AS GapDays
FROM
#t1
)
SELECT *
FROM cte
Here's some sample data:
+------------+--------------+--------------------------------+
| product_id | CreationDate | LastExclusionDate_(toPopulate) |
+------------+--------------+--------------------------------+
| 100 | 2018-05-01 | 2018-05-01 |
| 100 | 2018-05-02 | 2018-05-01 |
| 100 | 2018-05-03 | 2018-05-01 |
| 100 | 2018-06-01 | 2018-06-01 |
| 100 | 2018-06-02 | 2018-06-01 |
| 200 | 2018-09-01 | 2018-09-01 |
| 200 | 2018-09-02 | 2018-09-01 |
| 200 | 2018-09-17 | 2018-09-17 |
+------------+--------------+--------------------------------+
Thanks
The idea in finding gap-less sequences is to compare the series to a gap-less sequence and find groups of records where the difference of both doesn't change. For example, when the date increases one by one and a row number also does, then the difference between both stays the same and we found a group:
WITH
cte (product_id, CreationDate, grp) AS (
SELECT product_id, CreationDate
, DATEDIFF(day, '19000101', CreationDate)
- ROW_NUMBER() OVER (PARTITION BY product_id ORDER BY CreationDate)
FROM #t1
)
SELECT product_id, CreationDate
, MIN(CreationDate) OVER (PARTITION BY product_id, grp) AS LastExclusionDate
FROM cte
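The same "date minus row number" trick can be traced by hand in a few lines of Python. This is only a sketch of the idea, assuming one row per product per day (duplicate dates would break the grouping); `last_exclusion_dates` is a hypothetical helper name, and the data is the sample from the question.

```python
from datetime import date
from itertools import groupby

# CreationDate values per product_id, taken from the question's sample data.
creation_dates = {
    100: [date(2018, 5, 1), date(2018, 5, 2), date(2018, 5, 3),
          date(2018, 6, 1), date(2018, 6, 2)],
    200: [date(2018, 9, 1), date(2018, 9, 2), date(2018, 9, 17)],
}


def last_exclusion_dates(dates):
    """Map each date to the first date of its gap-less run."""
    # Day number minus row number stays constant inside a run of consecutive days.
    keyed = [(d.toordinal() - i, d) for i, d in enumerate(sorted(dates), start=1)]
    result = {}
    for _, group in groupby(keyed, key=lambda pair: pair[0]):
        run = [d for _, d in group]
        for d in run:
            result[d] = run[0]  # the minimum date of the group
    return result


by_product = {pid: last_exclusion_dates(ds) for pid, ds in creation_dates.items()}
for pid, mapping in by_product.items():
    for created, first in mapping.items():
        print(pid, created, first)
```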
For ongoing daily insertions it can be done with something like this.
INSERT INTO <yourTable>
SELECT
newProduct.[product_id],
newProduct.[creationDate],
isnull(existingProduct.[lastExclusionDate], newProduct.[creationDate]) AS [lastExclusionDate]
FROM
(SELECT <@product_id> AS [product_id], <@creationDate> AS [creationDate]) AS newProduct
LEFT JOIN #temp existingProduct
ON existingProduct.[product_id] = newProduct.product_id
AND existingProduct.[creationDate] = DATEADD(DAY,-1,newProduct.[creationDate])
I've got a demo here http://rextester.com/BDEO23118 . It's a larger than necessary demo because it uses the code above with the data you provided to populate a table row-by-row like you might in a daily update process. It then does individual insertions using this code with some new dates so you can see the way it handles new ranges. (just an FYI, rextester displays result dates in day.month.year hh:mm:ss format, but you can dump the script into management studio and it will output in DATE format)
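The daily-insert rule itself is small: reuse the previous day's lastExclusionDate if the product was excluded yesterday, otherwise start a new range at the new creation date. A minimal Python sketch of that rule follows, with a dictionary standing in for the table; the name `insert_exclusion` and the dictionary layout are assumptions made for illustration.

```python
from datetime import date, timedelta


def insert_exclusion(table, product_id, creation_date):
    """Insert one daily row and return the lastExclusionDate stored for it.

    `table` maps (product_id, creation_date) -> last_exclusion_date.
    """
    previous_day = creation_date - timedelta(days=1)
    # Mirrors the LEFT JOIN + ISNULL: fall back to the new date when there is no row for yesterday.
    last_exclusion = table.get((product_id, previous_day), creation_date)
    table[(product_id, creation_date)] = last_exclusion
    return last_exclusion


table = {}
first = insert_exclusion(table, 100, date(2018, 5, 1))
second = insert_exclusion(table, 100, date(2018, 5, 2))
after_gap = insert_exclusion(table, 100, date(2018, 6, 1))
print(first, second, after_gap)
```

This only works when rows arrive in date order, one day at a time, which is the scenario the answer describes.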
Is there a quicker way to convert my data from columns a - d being personnel information, then column e being the leave starting day and column f being the leave ending day, to the following:
Columns a - d repeating on each row, and column e being a separate row for each day/date included in the range?
At the moment I am doing this manually to prepare a large leave taken/clocked in recon.
I should also add that each row contains an interval for an employee's leave taken, and that the same employee could appear more than once in the dataset.
I am reading up on SQL scripts, although what I have found doesn't appear to cover this case with so many rows and intervals to create for each person.
If you want to solve this problem in SQL, then you can use a calendar or dates table for this sort of thing.
For only 152kb in memory, you can have 30 years of dates in a table with this:
/* dates table */
declare @fromdate date = '20000101';
declare @years int = 30;
/* 30 years, 19 used data pages ~152kb in memory, ~264kb on disk */
;with n as (select n from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t(n))
select top (datediff(day, @fromdate,dateadd(year,@years,@fromdate)))
[Date]=convert(date,dateadd(day,row_number() over(order by (select 1))-1,@fromdate))
into dbo.Dates
from n as deka cross join n as hecto cross join n as kilo
cross join n as tenK cross join n as hundredK
order by [Date];
create unique clustered index ix_dbo_Dates_date on dbo.Dates([Date]);
Without taking the actual step of creating a table, you can generate an ad hoc table of dates using a common table expression with just this:
declare @fromdate date, @thrudate date;
select @fromdate = min(fromdate), @thrudate = max(thrudate) from dbo.leave;
;with n as (select n from (values(0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) t(n))
, dates as (
select top (datediff(day, @fromdate, @thrudate)+1)
[Date]=convert(date,dateadd(day,row_number() over(order by (select 1))-1,@fromdate))
from n as deka cross join n as hecto cross join n as kilo
cross join n as tenK cross join n as hundredK
order by [Date]
)
Use either like so:
/* `distinct` if there are overlaps or duplicates to remove */
select distinct
l.personid
, d.[Date]
from dbo.leave l
inner join dates d
on d.date >= l.fromdate
and d.date <= l.thrudate;
rextester demo: http://rextester.com/AVOIN59493
from this test data:
create table leave (personid int, fromdate date, thrudate date)
insert into leave values
(1,'20170101','20170107')
,(1,'20170104','20170106') -- overlapped
,(1,'20170420','20170422')
,(2,'20170207','20170207') -- single day
,(2,'20170330','20170405')
returns:
+----------+------------+
| personid | Date |
+----------+------------+
| 1 | 2017-01-01 |
| 1 | 2017-01-02 |
| 1 | 2017-01-03 |
| 1 | 2017-01-04 |
| 1 | 2017-01-05 |
| 1 | 2017-01-06 |
| 1 | 2017-01-07 |
| 1 | 2017-04-20 |
| 1 | 2017-04-21 |
| 1 | 2017-04-22 |
| 2 | 2017-02-07 |
| 2 | 2017-03-30 |
| 2 | 2017-03-31 |
| 2 | 2017-04-01 |
| 2 | 2017-04-02 |
| 2 | 2017-04-03 |
| 2 | 2017-04-04 |
| 2 | 2017-04-05 |
+----------+------------+
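For readers who want to check the expected output without a database, the expansion can be sketched in plain Python. This is an illustration of the same idea (one row per person per day, duplicates removed), not the answer's query; `expand_leave` is a hypothetical name and the data is the test data above.

```python
from datetime import date, timedelta

# (personid, fromdate, thrudate), same values as the test data above.
leave = [
    (1, date(2017, 1, 1), date(2017, 1, 7)),
    (1, date(2017, 1, 4), date(2017, 1, 6)),   # overlapped
    (1, date(2017, 4, 20), date(2017, 4, 22)),
    (2, date(2017, 2, 7), date(2017, 2, 7)),   # single day
    (2, date(2017, 3, 30), date(2017, 4, 5)),
]


def expand_leave(intervals):
    """Return sorted, distinct (personid, day) pairs covered by the intervals."""
    days = set()  # the set plays the role of `distinct`
    for person_id, from_date, thru_date in intervals:
        for offset in range((thru_date - from_date).days + 1):  # inclusive range
            days.add((person_id, from_date + timedelta(days=offset)))
    return sorted(days)


leave_days = expand_leave(leave)
for person_id, day in leave_days:
    print(person_id, day)
```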
Number and Calendar table reference:
Generate a set or sequence without loops - 2 - Aaron Bertrand
The "Numbers" or "Tally" Table: What it is and how it replaces a loop - Jeff Moden
Creating a Date Table/Dimension in sql Server 2008 - David Stein
Calendar Tables - Why You Need One - David Stein
Creating a date dimension or calendar table in sql Server - Aaron Bertrand
I actually solved this using another formula from this forum, involving a join with BETWEEN on the date intervals.
Worked fine!
PS: I used the calendar table for another scenario, counting actual work days while taking weekends and public holidays into account.
Thanks
I have a SQL command that SUMS up incidents from TableA and imports the totals into TableB. Then another command that calculates the totals from B and INSERTS INTO TableC. Is it possible to include in TableC the names of those that have the recorded incidents? (Right now it only SUMS up totals and reports as a whole with no names)
I'll give some examples:
TableB
Day 1
Name | Incidents
Tim | 1
Frank | 2
Jay | 1
Day 2
Name | incidents
Tim | 1
Frank | 1
Jay | 1
TableC
Name | Incidents
Tim | 2
Frank | 3
Jay | 2
TableC continues to record data, while TableB will be dropped and re-recorded daily.
Here is the SQL command to fill TableB:
SELECT [Name], SUM(TableAColumnA) AS TableBColumnB INTO TableB FROM TableA GROUP BY [Name]
Here is the SQL I've tried to populate TableC:
INSERT INTO TableC(ImportDate, DayofData, Name, ColumnBTalbeB)
SELECT GETDATE() AS ImportDate, DATEADD(day, -1, GETDATE()) AS DayofData,
(SELECT SUM(ColumnBTableB) FROM TableB);
What this does is give NULL value to Name and calculate all incidents recorded in TableB.ColumnB. I basically need to show the names of those that had contributed to the total of incidents into TableC. TableC looks like this:
TableC
Name | Incidents | ImportDate | DayofData
NULL | 4 | today's date/time | yesterday's date/time
Was hoping to do something like this.
TableC
Name | incidents | totalincidents | importdate | dayofdata
Tim | 1 | 4 | today's date/time | yesterday's date/time
Is this possible or do I need to have it calculate into a whole separate table entirely? or just wishful thinking gone too far?
If you could do without TotalIncidents, you would use GROUP BY:
INSERT INTO TableC(ImportDate, DayofData, Name, Incidents)
SELECT GETDATE() AS ImportDate, DATEADD(day, -1, GETDATE()) AS DayofData, Name, Incidents
FROM (SELECT Name, SUM(ColumnBTableB) AS Incidents
FROM TableB
GROUP BY Name) AS t; -- SQL Server requires an alias on a derived table
TotalIncidents can be obtained from the other data with a query:
SELECT SUM(Incidents) AS TotalIncidents
FROM TableC
WHERE DayOfData BETWEEN CONVERT(datetime, '1/24/2016', 101)
AND CONVERT(datetime, '1/25/2016', 101);
Do you really need to store TotalIncidents as a column? It just adds complexity.
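If the total is wanted next to each name at query time rather than stored, a scalar subquery can supply it. The sketch below runs against an in-memory SQLite database from Python, so it shows the shape of the query rather than SQL Server syntax; the table layout (Name, Incidents) is a simplified assumption based on the Day 1 example in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableB (Name TEXT, Incidents INTEGER)")
conn.executemany(
    "INSERT INTO TableB VALUES (?, ?)",
    [("Tim", 1), ("Frank", 2), ("Jay", 1)],  # Day 1 figures from the question
)

# Per-name incidents, with the overall total repeated on every row.
name_totals = conn.execute(
    """
    SELECT Name,
           SUM(Incidents) AS Incidents,
           (SELECT SUM(Incidents) FROM TableB) AS TotalIncidents
    FROM TableB
    GROUP BY Name
    ORDER BY Name
    """
).fetchall()
conn.close()

for row in name_totals:
    print(row)
```

Computing the total on read like this keeps TableC free of a column that would go stale whenever a row changes.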
I need to write a statement joining two tables based on dates.
Table 1 contains time recording entries.
+----+-----------+--------+---------------+
| ID | Date | UserID | DESC |
+----+-----------+--------+---------------+
| 1 | 1.10.2010 | 5 | did some work |
| 2 | 1.10.2011 | 5 | did more work |
| 3 | 1.10.2012 | 4 | me too |
| 4 | 1.11.2012 | 4 | me too |
+----+-----------+--------+---------------+
Table 2 contains the position of each user in the company. The ValidFrom date is the date at which the user has been or will be promoted.
+----+-----------+--------+------------+
| ID | ValidFrom | UserID | Pos |
+----+-----------+--------+------------+
| 1 | 1.10.2009 | 5 | PM |
| 2 | 1.5.2010 | 5 | Senior PM |
| 3 | 1.10.2010 | 4 | Consultant |
+----+-----------+--------+------------+
I need a query which outputs table one with one added column which is the position of the user at the time the entry has been made. (the Date column)
All date fields are of type date.
I hope someone can help. I tried a lot but don't get it working.
Try this using a subselect in the where clause:
SQL Fiddle
MS SQL Server 2008 Schema Setup:
CREATE TABLE TimeRecord
(
ID INT,
[Date] Date,
UserID INT,
Description VARCHAR(50)
)
INSERT INTO TimeRecord
VALUES (1,'2010-01-10',5,'did some work'),
(2, '2011-01-10',5,'did more work'),
(3, '2012-01-10', 4, 'me too'),
(4, '2012-11-01',4,'me too')
CREATE TABLE UserPosition
(
ID Int,
ValidFrom Date,
UserId INT,
Pos VARCHAR(50)
)
INSERT INTO UserPosition
VALUES (1, '2009-01-10', 5, 'PM'),
(2, '2010-05-01', 5, 'Senior PM'),
(3, '2010-01-10', 4, 'Consultant ')
Query 1:
SELECT TR.ID,
TR.[Date],
TR.UserId,
TR.Description,
UP.Pos
FROM TimeRecord TR
INNER JOIN UserPosition UP
ON UP.UserId = TR.UserId
WHERE UP.ValidFrom = (SELECT MAX(ValidFrom)
FROM UserPosition UP2
WHERE UP2.UserId = UP.UserID AND
UP2.ValidFrom <= TR.[Date])
Results:
| ID | Date | UserId | Description | Pos |
|----|------------|--------|---------------|-------------|
| 1 | 2010-01-10 | 5 | did some work | PM |
| 2 | 2011-01-10 | 5 | did more work | Senior PM |
| 3 | 2012-01-10 | 4 | me too | Consultant |
| 4 | 2012-11-01 | 4 | me too | Consultant |
You can do it using OUTER APPLY:
SELECT ID, [Date], UserID, [DESC], x.Pos
FROM table1 AS t1
OUTER APPLY (
SELECT TOP 1 Pos
FROM table2 AS t2
WHERE t2.UserID = t1.UserID AND t2.ValidFrom <= t1.[Date]
ORDER BY t2.ValidFrom DESC) AS x(Pos)
For every row of table1, the OUTER APPLY operation fetches all table2 rows of the same user that have a ValidFrom date that is older than or the same as [Date]. These rows are sorted in descending order and the most recent of them is returned.
Note: If no match is found by the OUTER APPLY sub-query then a NULL value is returned, meaning that no valid position exists in table2 for the corresponding record in table1.
Demo here
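The lookup that OUTER APPLY performs per row ("latest ValidFrom on or before the entry date") can be sketched in Python with a binary search. This is an illustration only, using the dates from the SQL Fiddle schema above; `position_on` and the dictionary layout are assumptions.

```python
from bisect import bisect_right
from datetime import date

# UserID -> list of (ValidFrom, Pos), using the schema setup values above.
positions = {
    5: [(date(2009, 1, 10), "PM"), (date(2010, 5, 1), "Senior PM")],
    4: [(date(2010, 1, 10), "Consultant")],
}


def position_on(user_id, day):
    """Return the position valid for the user on `day`, or None if none applies yet."""
    history = sorted(positions.get(user_id, []))
    valid_from_dates = [valid_from for valid_from, _ in history]
    # Count of entries with ValidFrom <= day; the last of them is the one in effect.
    idx = bisect_right(valid_from_dates, day)
    return history[idx - 1][1] if idx else None


print(position_on(5, date(2010, 1, 10)))
print(position_on(5, date(2011, 1, 10)))
print(position_on(4, date(2009, 1, 1)))
```

Returning None for a date before the first ValidFrom corresponds to the NULL that OUTER APPLY produces when the sub-query finds no match.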
This works by using a rank function and subquery. I tested it with some sample data.
select sub.ID, sub.[Date], sub.UserID, sub.Description, sub.Position
from (
-- rank the candidate positions per time entry, newest first
select rank() over (partition by t1.ID order by t2.validfrom desc) as rnk,
t1.ID as ID, t1.Date as [Date], t1.UserID as UserID, t1.Descr as Description,
t2.pos as Position, t2.validfrom as validfrom
from temployee t1 inner join jobs t2 on -- replace join tables with your own table names
t1.UserID = t2.UserID
and t2.validfrom <= t1.Date -- only positions already in effect on the entry date
) as sub
where sub.rnk = 1
This query would work
select t1.*, t2.Pos from Table1 t1 left outer join Table2 t2 on
t1.UserID = t2.UserID and t2.ValidFrom = (select max(ValidFrom) from Table2
where UserID = t1.UserID and ValidFrom <= t1.[Date])
I have a Sales table with the following data:
| SalesId | CustomerId | Amount |
|---------|------------|--------|
| 1 | 1 | 100 |
| 2 | 2 | 75 |
| 3 | 1 | 30 |
| 4 | 3 | 49 |
| 5 | 1 | 93 |
I would like to insert a column into this table that tells us the number of times the customer has made a purchase. So it'll be like:
| SalesId | CustomerId | Amount | SalesNum |
|---------|------------|--------|----------|
| 1 | 1 | 100 | 1 |
| 2 | 2 | 75 | 1 |
| 3 | 1 | 30 | 2 |
| 4 | 3 | 49 | 1 |
| 5 | 1 | 93 | 3 |
So I can see that salesId = 5 is the 3rd transaction for customerId = 1. How can I write a query to insert / update such a column? I am on MS SQL but I am also interested in the MySQL solution should I need to do this there in the future.
Thank you.
ps. Apology for the table formatting. Couldn't figure out how to format it nicely.
You need ROW_NUMBER() to assign a sequence number. I'd strongly advise against storing this value, though, since you would need to recalculate it with every update. Instead, you may be best off creating a view if you need it regularly:
CREATE VIEW dbo.SalesWithRank
AS
SELECT SalesID,
CustomerID,
Amount,
SalesNum = ROW_NUMBER() OVER(PARTITION BY CustomerID ORDER BY SalesID)
FROM Sales;
GO
SQL Server Example on SQL Fiddle
ROW_NUMBER() will not assign duplicates in the same group. For example, if you were numbering the rows based on Amount and had two sales for the same customer that are both 100, they would not get the same SalesNum; in the absence of any other ordering criteria in your ROW_NUMBER() function, their relative order is not deterministic. If you want sales with the same amount to have the same SalesNum, then you need to use either RANK or DENSE_RANK. DENSE_RANK will have no gaps in the sequence, e.g. 1, 1, 2, 2, 3, whereas RANK will start at the corresponding position, e.g. 1, 1, 3, 3, 5.
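The difference between the three numbering schemes is easy to see on a tiny sorted list. The following Python sketch reimplements them by hand for illustration; the function names mirror the SQL functions but are otherwise hypothetical, and the input is assumed to be already sorted (a single partition).

```python
def row_number(values):
    """Always 1, 2, 3, ... regardless of ties."""
    return list(range(1, len(values) + 1))


def rank(values):
    """Ties share a number; the next distinct value jumps to its row position."""
    ranks = []
    for i, value in enumerate(values):
        if i > 0 and value == values[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(i + 1)
    return ranks


def dense_rank(values):
    """Ties share a number; the next distinct value gets the next integer."""
    ranks = []
    for i, value in enumerate(values):
        if i == 0:
            ranks.append(1)
        elif value == values[i - 1]:
            ranks.append(ranks[-1])
        else:
            ranks.append(ranks[-1] + 1)
    return ranks


amounts = [100, 100, 200, 200, 300]  # already sorted, one partition
print(row_number(amounts))
print(rank(amounts))
print(dense_rank(amounts))
```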
If you must do this as an update then you can use:
WITH CTE AS
( SELECT SalesID,
CustomerID,
Amount,
SalesNum,
NewSalesNum = ROW_NUMBER() OVER(PARTITION BY CustomerID ORDER BY SalesID)
FROM Sales
)
UPDATE CTE
SET SalesNum = NewSalesNum;
SQL Server Update Example on SQL Fiddle
MySQL does not have ranking functions before version 8.0, so there you need to use user variables to achieve a rank by keeping track of the value from the previous row. This is not allowed in views, so you would just need to repeat this logic wherever you needed the row number:
SELECT s.SalesID,
s.Amount,
@r:= CASE WHEN @c = s.CustomerID THEN @r + 1 ELSE 1 END AS SalesNum,
@c:= CustomerID AS CustomerID
FROM Sales AS s
CROSS JOIN (SELECT @c:= 0, @r:= 0) AS var
ORDER BY s.CustomerID, s.SalesID;
The order by is critical here, which means in order to order the results without affecting the ranking you need to use a subquery:
SELECT SalesID,
Amount,
CustomerID,
SalesNum
FROM ( SELECT s.SalesID,
s.Amount,
@r:= CASE WHEN @c = s.CustomerID THEN @r + 1 ELSE 1 END AS SalesNum,
@c:= CustomerID AS CustomerID
FROM Sales AS s
CROSS JOIN (SELECT @c:= 0, @r:= 0) AS var
ORDER BY s.CustomerID, s.SalesID
) AS s
ORDER BY s.SalesID;
MySQL Example on SQL Fiddle
Again, I would recommend against storing the value, but if you must in MySQL you would use:
UPDATE Sales
INNER JOIN
( SELECT s.SalesID,
@r:= CASE WHEN @c = s.CustomerID THEN @r + 1 ELSE 1 END AS NewSalesNum,
@c:= CustomerID AS CustomerID
FROM Sales AS s
CROSS JOIN (SELECT @c:= 0, @r:= 0) AS var
ORDER BY s.CustomerID, s.SalesID
) AS s2
ON Sales.SalesID = s2.SalesID
SET SalesNum = s2.NewSalesNum;
MySQL Update Example on SQL Fiddle
Using Subquery,
Select *, (Select count(customerid)
from ##tmp t
where t.salesid <= s.salesid
and t.customerid = s.customerid)
from ##tmp s
Try this -
SELECT SalesId, CustomerId, Amount,
SalesNum = ROW_NUMBER() OVER (PARTITION BY CustomerId ORDER BY SalesId)
FROM YOURTABLE
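As a cross-check on the expected SalesNum column, the per-customer running count can be reproduced in a few lines of Python. This is a sketch of the numbering logic only, using the sample rows from the question and assuming SalesId defines the purchase order; `add_sales_num` is a hypothetical name.

```python
from collections import defaultdict

# (SalesId, CustomerId, Amount) from the question's Sales table.
sales = [(1, 1, 100), (2, 2, 75), (3, 1, 30), (4, 3, 49), (5, 1, 93)]


def add_sales_num(sales):
    """Append a per-customer purchase counter, ordered by SalesId."""
    seen = defaultdict(int)  # CustomerId -> purchases counted so far
    numbered = []
    for sales_id, customer_id, amount in sorted(sales):
        seen[customer_id] += 1
        numbered.append((sales_id, customer_id, amount, seen[customer_id]))
    return numbered


numbered_sales = add_sales_num(sales)
for row in numbered_sales:
    print(row)
```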