I am trying to write a query that returns the time taken by an Order from start to completion.
My table looks like below.
Order No. Action DateTime
111 Start 3/23/2018 8:18
111 Complete 3/23/2018 9:18
112 Start 3/24/2018 6:00
112 Complete 3/24/2018 11:10
Now I am trying to calculate the date difference between start and completion of multiple orders and below is my query:
Declare @StartDate VARCHAR(100), @EndDate VARCHAR(100), @Operation VARCHAR(100)
declare @ORDERTable table
(
order varchar(1000)
)
insert into @ORDERTable values ('111')
insert into @ORDERTable values ('112')
Select @Operation='Boiling'
set @EndDate = (SELECT DATE_TIME from PROCESS WHERE ACTION='COMPLETE' AND ORDER in (select order from @ORDERTable) AND OPERATION=@Operation)
---SELECT @EndDate
set @StartDate = (SELECT DATE_TIME from PROCESS WHERE ACTION='START' AND ORDER in (select order from @ORDERTable) AND OPERATION=@Operation)
---SELECT @StartDate
SELECT DATEDIFF(minute, @StartDate, @EndDate) AS Transaction_Time
So I can pass in multiple orders, but I also want to get multiple rows of output, one per order.
My second question: once I get multiple records as output, how can I tell which DATEDIFF result belongs to which order?
Thanks in advance.
I am using MSSQL.
You can aggregate by order number and use MAX or MIN with CASE WHEN to get start or end time:
select
order_no,
max(case when action = 'Start' then date_time end) as start_time,
max(case when action = 'Complete' then date_time end) as end_time,
datediff(
minute,
max(case when action = 'Start' then date_time end),
max(case when action = 'Complete' then date_time end)
) as transaction_time
from process
group by order_no
order by order_no;
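To see the pivot-then-subtract idea run end to end, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table and column names (process, order_no, action, date_time) follow the answer above and are assumptions about the real schema. SQLite has no DATEDIFF, so the minutes are derived from strftime('%s', ...) instead.

```python
import sqlite3

# In-memory SQLite database standing in for the SQL Server table (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE process (order_no INTEGER, action TEXT, date_time TEXT)")
conn.executemany(
    "INSERT INTO process VALUES (?, ?, ?)",
    [
        (111, "Start", "2018-03-23 08:18:00"),
        (111, "Complete", "2018-03-23 09:18:00"),
        (112, "Start", "2018-03-24 06:00:00"),
        (112, "Complete", "2018-03-24 11:10:00"),
    ],
)

# Conditional aggregation: one row per order, start and end pivoted into columns.
# strftime('%s', ...) yields Unix seconds; dividing by 60 gives whole minutes.
query = """
SELECT order_no,
       (CAST(strftime('%s', MAX(CASE WHEN action = 'Complete' THEN date_time END)) AS INTEGER)
        - CAST(strftime('%s', MAX(CASE WHEN action = 'Start' THEN date_time END)) AS INTEGER)) / 60
           AS transaction_time
FROM process
GROUP BY order_no
ORDER BY order_no
"""
durations = conn.execute(query).fetchall()
print(durations)
```

Because the order number is part of every output row, each duration stays attached to its order, which is what the second question asks about.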
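You can split the table into two sets (temp tables, CTEs, or whatever you prefer), one holding the start rows and one holding the completion rows, and then join them on the order number to find the minutes each order took to complete: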
DECLARE @table1 TABLE (OrderNO INT, Action VARCHAR(100), datetime datetime)
INSERT INTO @table1 (OrderNO, Action, datetime)
VALUES
(111 ,'Start' ,'3/23/2018 8:18'),
(111 ,'Complete' ,'3/23/2018 9:18'),
(112 ,'Start' ,'3/24/2018 6:00'),
(112 ,'Complete' ,'3/24/2018 11:10')
;with cte_start AS (
SELECT orderno, Action, datetime
FROM @table1
WHERE Action = 'Start')
, cte_complete AS (
SELECT orderno, Action, datetime
FROM @table1
WHERE Action = 'Complete')
SELECT
start.OrderNO, DATEDIFF(minute, start.datetime, complete.datetime) AS duration
FROM cte_start start
INNER JOIN cte_complete complete
ON start.OrderNO = complete.OrderNO
Why don't you approach this problem with a set-based solution? After all, that's what an RDBMS is for. Assuming that the orders of interest are in a table variable like the one you described, @ORDERTable(Order), it would go something along the lines of:
SELECT DISTINCT
[Order No.]
, DATEDIFF(
minute,
FIRST_VALUE([DateTime]) OVER (PARTITION BY [Order No.] ORDER BY [DateTime] ASC),
FIRST_VALUE([DateTime]) OVER (PARTITION BY [Order No.] ORDER BY [DateTime] DESC)
) AS Transaction_Time
FROM tableName
WHERE [Order No.] IN (SELECT [Order] FROM @ORDERTable);
This query works when Start is the earliest row and Complete is the latest row for each order, whether those are the only values in the Action attribute or other values appear in between them.
To read up more on the FIRST_VALUE() window function, check out the documentation.
NOTE: works in SQL Server 2012 or newer versions.
I have a subscription table with a user ID, a subscription start date and a subscription end date. I also have a calendar table with a datestamp field, that is every single date starting from the first subscription date in my subscription table.
I am trying to write something that would give me a table with a date column and three numbers: number of total active (on that day), number of new subscribers, number of unsubscribers.
(N.B. I tried to insert sample tables using the suggested GitHub Flavoured Markdown but it just all goes into one row.)
Currently I am playing with a query that creates multiple joins between the two tables, one for each number:
select a.datestamp
,count(distinct case when b_sub.UserID is not null then b_sub.UserID end) as total_w_subscription
,count(distinct case when b_in.UserID is not null then b_in.UserID end) as total_subscribed
,count(distinct case when b_out.UserID is not null then b_out.UserID end) as total_unsubscribed
from Calendar as a
left join Subscription as b_sub -- all those with subscription on given date
on b_sub.sub_dt <= a.datestamp
and (b_sub.unsub_dt > a.datestamp or b_sub.unsub_dt is null)
left join Subscription as b_in -- all those that subscribed on given date
on b_in.sub_dt = a.datestamp
left join Subscription as b_out -- all those that unsubscribed on given date
on b_out.unsub_dt = a.datestamp
where a.datestamp > '2021-06-10'
group by a.datestamp
order by datestamp asc
;
I have indexed the date fields in both tables. If I only look at one day, it runs in 3 seconds. Two days already takes forever. The Sub table is over 2.6M records and ideally I'll need my timeline to begin sometime in 2012.
What would be the most time efficient way to do this?
You're on the right track. I created some table variables and assumed a data structure that has each subscription include a start and end date.
--Create @dates table variable for calendar
DECLARE @startDate DATETIME = '2018-01-01'
DECLARE @endDate DATETIME = '2021-06-18'
DECLARE @dates TABLE
(
reportingdate DATETIME
)
WHILE @startDate <= @endDate
BEGIN
INSERT INTO @dates SELECT @startDate
SET @startDate += 1
END
--Create @subscriptions table variable for subscriptions to join onto calendar
DECLARE @subscriptions TABLE
(
id INT
,startDate DATETIME
,endDate DATETIME
)
INSERT INTO @subscriptions
VALUES
(1,'2018-01-01 00:00:00.000','2019-10-07 00:00:00.000')
,(2,'2018-01-11 00:00:00.000','2019-12-21 00:00:00.000')
,(3,'2019-04-21 00:00:00.000','2020-03-19 00:00:00.000')
,(4,'2019-12-09 00:00:00.000','2020-05-14 00:00:00.000')
,(5,'2020-04-26 00:00:00.000','2020-07-06 00:00:00.000')
,(6,'2020-05-02 00:00:00.000',NULL)
,(7,'2020-08-31 00:00:00.000','2020-10-29 00:00:00.000')
,(8,'2020-12-13 00:00:00.000','2021-01-13 00:00:00.000')
,(9,'2021-02-12 00:00:00.000','2021-04-19 00:00:00.000')
,(10,'2021-06-10 00:00:00.000',NULL)
;
Then I join the subscription onto the calendar table.
--CTE to join subscription onto calendar and use ROW_NUMBER functions
WITH cte AS (
SELECT
s.id AS SubID
,d.ReportingDate
,ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY d.ReportingDate) AS asc_rn --used to identify 1st
,ROW_NUMBER() OVER (PARTITION BY s.id ORDER BY d.ReportingDate DESC) AS desc_rn --used to identify last
,CASE WHEN s.endDate IS NULL THEN 1 ELSE 0 END AS ActiveSub
FROM @subscriptions s
LEFT JOIN @dates d ON
d.reportingdate BETWEEN s.startDate AND ISNULL(s.endDate,'9999-12-31')
)
I used ROW_NUMBER to identify the first and last date rows of the subscription, as well as checking if the subscription endDate is NULL (still active). I then query the CTE to count subscriptions grouped by day, as well as summing new and terminated subscriptions grouped by day.
--Query CTE using asc_rn, desc_rn, and ActiveSub to identify new subscribers and unsubscribers.
SELECT
ReportingDate
,COUNT(*) AS TotalSubscribers
,SUM(CASE WHEN asc_rn = 1 THEN 1 ELSE 0 END) AS NewSubscribers
,SUM(CASE WHEN desc_rn = 1 AND ActiveSub = 0 THEN 1 ELSE 0 END) AS UnSubscribers
FROM cte
GROUP BY ReportingDate
ORDER BY ReportingDate
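The counting rules in that query can be checked by hand for a single day. The sketch below is plain Python, not T-SQL, and uses three of the sample subscriptions from above. The function name daily_counts is hypothetical. It assumes, as the query does, that a subscription still counts as active on its end date and is reported as an unsubscriber on that same date.

```python
from datetime import date

# (id, start, end) taken from the sample rows above; end is None while still active.
subscriptions = [
    (4, date(2019, 12, 9), date(2020, 5, 14)),
    (5, date(2020, 4, 26), date(2020, 7, 6)),
    (6, date(2020, 5, 2), None),
]

def daily_counts(day, subs):
    """Return (active, new, unsubscribed) for one calendar day."""
    # Active: the day falls inside [start, end], with an open end treated as ongoing.
    active = sum(1 for _, start, end in subs if start <= day and (end is None or day <= end))
    # New: the subscription's first day (asc_rn = 1 in the query).
    new = sum(1 for _, start, _end in subs if start == day)
    # Unsubscribed: the subscription's last day (desc_rn = 1 and ActiveSub = 0).
    ended = sum(1 for _, _start, end in subs if end == day)
    return active, new, ended

print(daily_counts(date(2020, 5, 2), subscriptions))
print(daily_counts(date(2020, 5, 14), subscriptions))
print(daily_counts(date(2020, 5, 15), subscriptions))
```

If unsubscribers should not count as active on their end date, the comparison day <= end would become day < end, matching the strict inequality in the original question's join.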
I want to sum the values where the date falls between the CreationDate and the EndDate, which gives ValueEnd.
For instance, in the second row the CreationDate is the same as the EndDate, so I have to add the ValuePerDay of this day to the previous value. So in the column ValueEnd it is 3.4 + 1.17 = 4.57.
I started by calculating the sum for the days where the Difference is 1, like this:
SELECT
CONVERT(CHAR(10), CreationDate,103) CreationDate
,CONVERT(CHAR(10), EndDate,103) EndDate
,SUM(Values_an) Values_an
FROM Dat1
WHERE Difference=1
GROUP BY CONVERT(CHAR(10), CreationDate,103), CONVERT(CHAR(10), EndDate,103), Difference
However, I'm having trouble summing the values where the difference is higher than 1. Can someone help me, please?
OK, judging by the information provided, and assuming I understood everything correctly, the following approach might solve your problem:
DECLARE @t TABLE(
CreationDate date,
EndDate date,
Value_An decimal(19,4)
)
INSERT INTO @t VALUES
('2019-03-01', '2019-03-01', 3.4)
,('2019-03-01', '2019-03-03', 3.5)
,('2019-05-01', '2019-05-01', 3.6)
,('2019-06-01', '2019-06-04', 3.7)
;WITH cteMultiRow AS(
SELECT CreationDate, COUNT(*) cntRows
FROM @t
GROUP BY CreationDate
HAVING COUNT(*) > 1
),
cte AS(
SELECT t.*
,ROW_NUMBER() OVER (PARTITION BY t.CreationDate ORDER BY t.EndDate) AS rn
,DATEDIFF(d, t.CreationDate, t.EndDate)+1 AS Difference
,CASE WHEN m.CreationDate IS NOT NULL THEN t.Value_An/(DATEDIFF(d, t.CreationDate, t.EndDate)+1) ELSE t.Value_An END AS ValuePerD
FROM @t t
LEFT JOIN cteMultiRow m ON t.CreationDate = m.CreationDate
),
cteSums AS(
SELECT c.CreationDate, SUM(c.ValuePerD) AS ValuePerD
FROM cte c
GROUP BY c.CreationDate
)
SELECT c.CreationDate, c.EndDate, c.Value_An, c.Difference, c.ValuePerD, ISNULL(s.ValuePerD, c.Value_An) AS ValueEnd
FROM cte c
LEFT JOIN cteSums s ON c.CreationDate = s.CreationDate AND c.rn = 1
I am fairly new to SSIS, and I have a requirement to exclude weekends for performance management purposes. I have created a calendar and marked the weekends; what I am trying to do, using SSIS, is get the start and end date of every status and count how many weekends fall in between. I am struggling to work out which component to use to achieve this task.
So I have mainly two tables:
1- Table Calendar
2- Table History-Log
Calendar has the following columns:
1- ID
2- date
3- year
4- month
5- day of week
6- isweekend
History-Log has the following:
1- ID
2- Status
3- startdate
4- enddate
Your help is really appreciated.
I'm not an SSIS user, so apologies if this answer does not help, but if I wanted to get the result you describe, based on some test data:
DECLARE @Calendar TABLE (
ID INT,
[Date] DATETIME,
[Year] INT,
[Month] INT,
[DayOfWeek] VARCHAR(10),
IsWeekend BIT
)
DECLARE @HistoryLog TABLE (
ID INT,
[Status] INT,
StartDate DATETIME,
EndDate DATETIME
)
DECLARE @StartDate DATE = '20100101', @NumberOfYears INT = 10
DECLARE @CutoffDate DATE = DATEADD(YEAR, @NumberOfYears, @StartDate);
INSERT INTO @Calendar
SELECT ROW_NUMBER() OVER (ORDER BY d) AS ID,
d AS [Date],
DATEPART(YEAR,d) AS [Year],
DATEPART(MONTH,d) AS [Month],
DATENAME(WEEKDAY,d) AS [DayOfWeek],
CASE WHEN DATENAME(WEEKDAY,d) IN ('Saturday','Sunday') THEN 1 ELSE 0 END AS IsWeekend
FROM
(
SELECT d = DATEADD(DAY, rn - 1, @StartDate)
FROM
(
SELECT TOP (DATEDIFF(DAY, @StartDate, @CutoffDate))
rn = ROW_NUMBER() OVER (ORDER BY s1.[object_id])
FROM sys.all_objects AS s1
CROSS JOIN sys.all_objects AS s2
ORDER BY s1.[object_id]
) AS x
) AS y;
INSERT INTO @HistoryLog
SELECT 1, 3, '2016-01-05', '2016-01-20'
UNION
SELECT 2, 7, '2016-01-08', '2016-01-25'
UNION
SELECT 3, 4, '2016-01-01', '2016-02-03'
UNION
SELECT 4, 3, '2016-02-09', '2016-02-10'
I would use a query like this to return all of the HistoryLog records with a count of the number of weekend days between their StartDate and EndDate:
SELECT h.ID,
h.[Status],
h.StartDate,
h.EndDate,
COUNT(c.ID) AS WeekendDays
FROM @HistoryLog h
LEFT JOIN @Calendar c ON c.[Date] >= h.StartDate AND c.[Date] <= h.EndDate AND c.IsWeekend = 1
GROUP BY h.ID, h.[Status], h.StartDate, h.EndDate
ORDER BY 1
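As a cross-check on what that join should return, the same count can be computed directly from the dates. This is a plain Python sketch, not SSIS or T-SQL, using the four test rows from above with the Status column left out. The helper name weekend_days is hypothetical, and it assumes both StartDate and EndDate are inclusive, as in the join condition.

```python
from datetime import date, timedelta

# (ID, StartDate, EndDate) from the test data above.
history_log = [
    (1, date(2016, 1, 5), date(2016, 1, 20)),
    (2, date(2016, 1, 8), date(2016, 1, 25)),
    (3, date(2016, 1, 1), date(2016, 2, 3)),
    (4, date(2016, 2, 9), date(2016, 2, 10)),
]

def weekend_days(start, end):
    """Count Saturdays and Sundays between start and end, inclusive."""
    total_days = (end - start).days + 1
    return sum(
        1
        for offset in range(total_days)
        # date.weekday(): Monday is 0, so 5 is Saturday and 6 is Sunday.
        if (start + timedelta(days=offset)).weekday() >= 5
    )

weekend_counts = {log_id: weekend_days(start, end) for log_id, start, end in history_log}
print(weekend_counts)
```

Row 4 spans a Tuesday and a Wednesday only, so it comes out as zero, which is the case the LEFT JOIN in the query is there to preserve.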
If you wanted to know the number of weekends, rather than the number of weekend days, we'd need to slightly amend this logic (and define how a range containing only one weekend day - or one starting on a Sunday and ending on a Saturday inclusive - should be handled). Assuming you just want to know how many distinct weekends are at least partially within the date range, you could do:
SELECT h.ID,
h.[Status],
h.StartDate,
h.EndDate,
COUNT(weekends.ID) AS Weekends
FROM @HistoryLog h
LEFT JOIN
(
SELECT c.ID,
c.[Date] AS SatDate,
DATEADD(DAY,1,c.[Date]) AS SunDate
FROM @Calendar c
WHERE c.[DayOfWeek] = 'Saturday'
) weekends ON h.StartDate BETWEEN weekends.SatDate AND weekends.SunDate
OR h.EndDate BETWEEN weekends.SatDate AND weekends.SunDate
OR (h.StartDate <= weekends.SatDate AND h.EndDate >= weekends.SunDate)
GROUP BY h.ID, h.[Status], h.StartDate, h.EndDate
Let's say I have the following query:
SELECT top (5) CAST(Created AS DATE) as DateField,
Count(id) as Counted
FROM Table
GROUP BY CAST(Created AS DATE)
order by DateField desc
Let's say it returns the following data set:
DateField Counted
2016-01-18 34
2016-01-17 99
2016-01-14 1
2015-12-28 1
2015-12-27 6
But when Counted = 0 for a certain date, I would still like to get that date in the result set. For example, it should look like the following:
DateField Counted
2016-01-18 34
2016-01-17 99
2016-01-16 0
2016-01-15 0
2016-01-14 1
Thank you!
Expanding upon KM's answer, you need a date table which is like a numbers table.
There are many examples on the web but here's a simple one.
CREATE TABLE DateList (
DateValue DATE,
CONSTRAINT PK_DateList PRIMARY KEY CLUSTERED (DateValue)
)
GO
-- Insert dates from 01/01/2015 through 12/31/2015
DECLARE @StartDate DATE = '01/01/2015'
DECLARE @EndDatePlus1 DATE = '01/01/2016'
DECLARE @CurrentDate DATE = @StartDate
WHILE @EndDatePlus1 > @CurrentDate
BEGIN
INSERT INTO DateList VALUES (@CurrentDate)
SET @CurrentDate = DATEADD(dd,1,@CurrentDate)
END
Now that you have a date table, you can rewrite your query as follows:
SELECT top (5) DateValue, isnull(Count(id),0) as Counted
FROM DateList
LEFT OUTER JOIN [Table]
on DateValue = CAST(Created AS DATE)
GROUP BY DateValue
order by DateValue desc
Two notes:
You'll need a where clause to specify your range.
A join on a cast isn't ideal. The type in your date table should match the type in your regular table.
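The effect of joining from the date table can be tried out with Python's built-in sqlite3 module. This is only a sketch on SQLite, not SQL Server: the events table and its rows are made up for illustration, and date(created) stands in for CAST(Created AS DATE).

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE date_list (date_value TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, created TEXT)")

# Populate the date table with five consecutive days (hypothetical range).
first_day = date(2016, 1, 14)
conn.executemany(
    "INSERT INTO date_list VALUES (?)",
    [((first_day + timedelta(days=i)).isoformat(),) for i in range(5)],
)

# Hypothetical events: none on 2016-01-15 or 2016-01-16.
conn.executemany(
    "INSERT INTO events (created) VALUES (?)",
    [
        ("2016-01-14 09:00:00",),
        ("2016-01-17 10:30:00",),
        ("2016-01-17 18:45:00",),
        ("2016-01-18 08:15:00",),
    ],
)

# The date table drives the query, so days without events still produce a row.
# COUNT(e.id) skips the NULLs from unmatched days and therefore returns 0 for them.
rows = conn.execute(
    """
    SELECT d.date_value, COUNT(e.id) AS counted
    FROM date_list AS d
    LEFT JOIN events AS e ON date(e.created) = d.date_value
    WHERE d.date_value BETWEEN '2016-01-14' AND '2016-01-18'
    GROUP BY d.date_value
    ORDER BY d.date_value DESC
    """
).fetchall()
print(rows)
```

The WHERE clause on the date table plays the role of the range filter mentioned in the first note.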
One more solution as a single query:
;WITH dates AS
(
SELECT CAST(DATEADD(DAY, ROW_NUMBER() OVER (ORDER BY [object_id]) - 1, '2016-01-14') as date) 'date'
FROM sys.all_objects
)
SELECT TOP 5
[date] AS 'DateField',
SUM(CASE WHEN Created IS NULL THEN 0 ELSE 1 END) AS 'Counted'
FROM dates
LEFT JOIN [Table] ON [date]=CAST(Created as date)
GROUP BY [date]
ORDER BY [date]
For a more edgy solution, you could use a recursive common table expression to create the date list. PLEASE NOTE: do not use recursive common table expressions in your day job! They are dangerous because it is easy to create one that never terminates.
DECLARE @StartDate date = '1/1/2016';
DECLARE @EndDate date = '1/15/2016';
WITH DateList(DateValue)
AS
(
SELECT DATEADD(DAY, 1, @StartDate)
UNION ALL
SELECT DATEADD(DAY, 1, DateValue)
FROM DateList
WHERE DateList.DateValue < @EndDate
)
SELECT DateValue, isnull(Count(id),0) as Counted
FROM DateList
LEFT OUTER JOIN [Table]
ON DateValue = CAST(Created AS DATE)
GROUP BY DateValue
ORDER BY DateValue DESC
I have a query that retrieves the latest data from a table with pagination, and it works fine.
However, rows that are older than the current time should also appear, after the latest ones.
SELECT A.UserId,A.FirstName,A.LastName,A.PostDate
FROM
(SELECT ROW_NUMBER() OVER(ORDER BY CAST(M.PostDate AS DATETIMEOFFSET) DESC) AS 'RowNumber',
M.UserId,
M.FirstName,
M.LastName,
M.PostDate
FROM Messages AS M
Where M.PostDate >= GetDate()
) A
WHERE A.RowNumber BETWEEN @RowStart AND @RowEnd
ORDER BY CAST(A.PostDate AS DATETIMEOFFSET) DESC
I'm not sure if I'm exactly getting what you're looking for, but this should get you something close. I created a CTE with two computed columns:
BeforeAfter, which determines if the date happens before or after the passed in date
AbsDiff, which gives the absolute value of the date diff.
Then, I just sort using those two fields:
DECLARE @current datetime = GETDATE()
;WITH cteMessages AS
(
SELECT UserId, FirstName, LastName, PostDate,
CASE WHEN PostDate < @current THEN 1 ELSE 0 END AS BeforeAfter,
ABS(DATEDIFF(SECOND, @current, PostDate)) AS AbsDiff
FROM Messages
)
SELECT * FROM cteMessages
ORDER BY BeforeAfter, AbsDiff
From those results, you can see how they are sorted first by messages newer than the passed in date, then in reverse order from older messages. You can substitute that order by into your Row_Number function.
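The two-column sort can be illustrated outside SQL as well. The sketch below is plain Python with made-up post times. The tuple key mirrors BeforeAfter and AbsDiff, so future posts come first and, within each group, the post closest to the reference time leads.

```python
from datetime import datetime, timedelta

# Hypothetical reference time and post dates, two in the future and two in the past.
current = datetime(2021, 1, 1, 12, 0)
posts = [
    current - timedelta(hours=3),
    current + timedelta(hours=2),
    current - timedelta(hours=1),
    current + timedelta(hours=1),
]

# Key = (BeforeAfter, AbsDiff): False sorts before True, so posts at or after
# `current` come first; ties are broken by distance from `current` in seconds.
ordered = sorted(posts, key=lambda p: (p < current, abs((p - current).total_seconds())))

# Express each post as a whole-hour offset from `current` for easy reading.
offsets = [int((p - current).total_seconds() // 3600) for p in ordered]
print(offsets)
```

With these values the future posts at +1 and +2 hours come first, followed by the past posts at -1 and -3 hours.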
You can't use ORDER BY with both ASC and DESC on the same column of a table; if you do, you will get the following error:
Msg 169, Level 15, State 1, Line 1 A column has been specified more
than once in the order by list. Columns in the order by list must be
unique.
If you need that, you can join the table to itself and order by the same column from each side of the join, like this:
SELECT A.UserId,A.FirstName,A.LastName,A.PostDate
FROM
(SELECT ROW_NUMBER() OVER(ORDER BY CAST(M.PostDate AS DATETIMEOFFSET) DESC) AS 'RowNumber',
M.UserId,
M.FirstName,
M.LastName,
M.PostDate
FROM Messages AS M
Where M.PostDate >= GetDate()
) A
Inner Join Messages AS Msg on Msg.UserId=A.UserId
WHERE A.RowNumber BETWEEN @RowStart AND @RowEnd
ORDER BY CAST(A.PostDate AS DATETIMEOFFSET) DESC,
CAST(Msg.PostDate AS DATETIMEOFFSET) ASC
I think you should remove the following line
Where M.PostDate >= GetDate()