Apply a query to each part of a table individually - sql-server

I need to run a query against each customer's group of rows in a table (Azure SQL):
ID  CustomerID  MsgTimestamp         Msg
--  ----------  -------------------  -----------------------------
1   123         2017-01-01 10:00:00  Hello
2   123         2017-01-01 10:01:00  Hello again
3   123         2017-01-01 10:02:00  Can you help me with my order
4   123         2017-01-01 11:00:00  Are you still there
5   456         2017-01-01 10:07:00  Hey I'm a new customer
What I want to do is extract "chat sessions" for every customer from the message records: if the gap between two consecutive messages from the same customer is less than 30 minutes, they belong to the same session. I need to record the start and end time of each session in a new table. In the example above, the first session for customer 123 starts at 10:00 and ends at 10:02.
I know I can always use a cursor and a temp table to achieve this, but I'd like to use a built-in mechanism to get better performance. Any input is appreciated.

You can use window functions instead of a cursor. Something like this should work:
declare @t table (ID int, CustomerID int, MsgTimestamp datetime2(0), Msg nvarchar(100))
insert @t values
(1, 123, '2017-01-01 10:00:00', 'Hello'),
(2, 123, '2017-01-01 10:01:00', 'Hello again'),
(3, 123, '2017-01-01 10:02:00', 'Can you help me with my order'),
(4, 123, '2017-01-01 11:00:00', 'Are you still there'),
(5, 456, '2017-01-01 10:07:00', 'Hey I''m a new customer')
;with x as (
-- g = 1 marks a session start: the customer's first message (LAG falls back to 1900-01-01)
-- or a message 30 or more minutes after that customer's previous message
select *, case when datediff(minute, lag(msgtimestamp, 1, '19000101') over(partition by customerid order by msgtimestamp), msgtimestamp) >= 30 then 1 else 0 end as g
from @t
),
y as (
-- running sum of g per customer numbers each customer's sessions (gg)
select *, sum(g) over(partition by customerid order by msgtimestamp rows unbounded preceding) as gg
from x
)
select customerid, min(msgtimestamp) as session_start, max(msgtimestamp) as session_end
from y
group by customerid, gg
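To make the gaps-and-islands logic easier to follow, here is a minimal Python sketch that mirrors the three steps of the query: flag session starts (g), number each customer's sessions with a running sum (gg), then group. It only illustrates the logic and is not a substitute for the set-based query; it assumes the rule from the question that a gap of 30 minutes or more starts a new session, and `ROWS`, `GAP` and `sessions` are hypothetical names.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Sample rows from the question: (ID, CustomerID, MsgTimestamp, Msg)
ROWS = [
    (1, 123, datetime(2017, 1, 1, 10, 0), "Hello"),
    (2, 123, datetime(2017, 1, 1, 10, 1), "Hello again"),
    (3, 123, datetime(2017, 1, 1, 10, 2), "Can you help me with my order"),
    (4, 123, datetime(2017, 1, 1, 11, 0), "Are you still there"),
    (5, 456, datetime(2017, 1, 1, 10, 7), "Hey I'm a new customer"),
]

GAP = timedelta(minutes=30)  # assumption: a gap of 30 minutes or more starts a new session


def sessions(rows, gap=GAP):
    """Return (CustomerID, session_start, session_end) tuples."""
    # Step 1 (CTE x): order by customer and time, flag session starts with g = 1
    ordered = sorted(rows, key=lambda r: (r[1], r[2]))
    flagged = []
    prev_customer, prev_ts = None, None
    for _, customer_id, ts, _ in ordered:
        g = 1 if customer_id != prev_customer or ts - prev_ts >= gap else 0
        flagged.append((customer_id, ts, g))
        prev_customer, prev_ts = customer_id, ts

    # Step 2 (CTE y): running sum of g per customer gives the session number gg
    numbered = []
    gg, prev_customer = 0, None
    for customer_id, ts, g in flagged:
        if customer_id != prev_customer:
            gg = 0  # restart the count for each customer (PARTITION BY)
        gg += g
        numbered.append((customer_id, gg, ts))
        prev_customer = customer_id

    # Step 3 (final SELECT): GROUP BY customer, gg with MIN/MAX timestamps
    result = []
    for (customer_id, _), group in groupby(numbered, key=lambda r: (r[0], r[1])):
        times = [ts for _, _, ts in group]
        result.append((customer_id, min(times), max(times)))
    return result
```

Numbering sessions per customer is what keeps the result correct when customers' messages interleave: a running sum taken across all customers would let another customer's session start split an ongoing session.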

Related

Find the date when a milestone was achieved

I have people who do many multi-day assignments (date X to date Y). I would like to find the date on which they reached a milestone, e.g. 50 days of work completed.
Data is stored as a single row per assignment, with the columns AssignmentId, StartDate and EndDate.
I can sum up the total days they have completed up to a given date, but I'm struggling to see how to find the date on which a milestone was hit. For example: how many people completed 50 days in October 2020, and on which date within the month did each of them reach it?
Thanks in advance
PS. Our database is SQL Server.
As mentioned in previous comments, it would be much easier to help you if you could provide example data and the table structure.
However, assuming a simple DB structure with tables for your people, your tasks and the work each person completed, you can get the running sum of days by joining to a date table (or CTE) that contains an entry for each day, and then applying the window function SUM with ROWS UNBOUNDED PRECEDING. Here is an example:
DECLARE @people TABLE(
id int
,name nvarchar(50)
)
DECLARE @tasks TABLE(
id int
,name nvarchar(50)
)
DECLARE @work TABLE(
people_id int
,task_id int
,task_StartDate date
,task_EndDate date
)
INSERT INTO @people VALUES (1, 'Peter'), (2, 'Paul'), (3, 'Mary');
INSERT INTO @tasks VALUES (1, 'Development'), (2, 'QA'), (3, 'Sales');
INSERT INTO @work VALUES
(1, 1, '2019-04-05', '2019-04-08')
,(1, 1, '2019-05-05', '2019-06-08')
,(1, 1, '2019-07-05', '2019-09-08')
,(2, 2, '2019-04-08', '2019-06-08')
,(2, 2, '2019-09-08', '2019-10-03')
,(3, 1, '2019-11-01', '2019-12-01')
-- calendar: one row per day of 2019, generated by a recursive CTE
;WITH cte AS(
SELECT CAST('2019-01-01' AS DATE) AS dateday
UNION ALL
SELECT DATEADD(d, 1, dateday)
FROM cte
WHERE DATEADD(d, 1, dateday) < '2020-01-01'
),
-- one row per person per day covered by an assignment
-- (overlapping assignments of the same person produce duplicate days)
cteWorkDays AS(
SELECT people_id, task_id, dateday, 1 AS cnt
FROM @work w
INNER JOIN cte c ON c.dateday BETWEEN w.task_StartDate AND w.task_EndDate
),
-- running count of days worked per person, in date order
ctePeopleWorkdays AS(
SELECT *, SUM(cnt) OVER (PARTITION BY people_id ORDER BY dateday ROWS UNBOUNDED PRECEDING) dayCnt
FROM cteWorkDays
)
SELECT *
FROM ctePeopleWorkdays
WHERE dayCnt = 50
OPTION (MAXRECURSION 0)
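The expand-then-count idea can be sketched in plain Python as well. This is a minimal illustration using the sample @work rows above; `WORK` and `milestone_dates` are hypothetical names, and unlike the query it counts a calendar day only once when two assignments of the same person overlap (an assumption about what "days worked" should mean).

```python
from datetime import date, timedelta

# (people_id, task_id, task_StartDate, task_EndDate), copied from the @work sample data
WORK = [
    (1, 1, date(2019, 4, 5), date(2019, 4, 8)),
    (1, 1, date(2019, 5, 5), date(2019, 6, 8)),
    (1, 1, date(2019, 7, 5), date(2019, 9, 8)),
    (2, 2, date(2019, 4, 8), date(2019, 6, 8)),
    (2, 2, date(2019, 9, 8), date(2019, 10, 3)),
    (3, 1, date(2019, 11, 1), date(2019, 12, 1)),
]


def milestone_dates(work, target):
    """Return {people_id: date on which the running day count reaches target}."""
    # Expand every assignment into calendar days (the join to the date CTE).
    # A set keeps each (person, day) once, so overlapping assignments are not double-counted.
    worked = set()
    for person, _task, start, end in work:
        day = start
        while day <= end:
            worked.add((person, day))
            day += timedelta(days=1)

    # Running count per person in date order (SUM(cnt) OVER (... ROWS UNBOUNDED PRECEDING)).
    counts = {}
    hits = {}
    for person, day in sorted(worked):
        counts[person] = counts.get(person, 0) + 1
        if counts[person] == target:
            hits[person] = day
    return hits
```

With the sample data, `milestone_dates(WORK, 50)` gives 2019-07-15 for person 1 and 2019-05-27 for person 2; person 3 has only 31 days and does not appear.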
The solution depends on how you store your data. The solution below assumes that each worked day exists as a single row in your data model.
The approach below uses a common table expression (cte) to generate a running total (Total) for each person (PersonId) and then filters on the milestone target (I set it to 5 to reduce the sample data size) and target month.
Sample data
create table WorkedDays
(
PersonId int,
TaskDate date
);
insert into WorkedDays (PersonId, TaskDate) values
(100, '2020-09-01'),
(100, '2020-09-02'),
(100, '2020-09-03'),
(100, '2020-09-04'),
(100, '2020-09-05'), -- person 100 worked 5 days by 2020-09-05 = milestone (in september)
(200, '2020-09-29'),
(200, '2020-09-30'),
(200, '2020-10-01'),
(200, '2020-10-02'),
(200, '2020-10-03'), -- person 200 worked 5 days by 2020-10-03 = milestone (in october)
(200, '2020-10-04'),
(200, '2020-10-05'),
(200, '2020-10-06'),
(300, '2020-10-10'),
(300, '2020-10-11'),
(300, '2020-10-12'),
(300, '2020-10-13'),
(300, '2020-10-14'), -- person 300 worked 5 days by 2020-10-14 = milestone (in october)
(300, '2020-10-15'),
(400, '2020-10-20'),
(400, '2020-10-21'); -- person 400 did not reach the milestone yet
Solution
with cte as
(
select wd.PersonId,
wd.TaskDate,
count(1) over(partition by wd.PersonId
order by wd.TaskDate
rows between unbounded preceding and current row) as Total
from WorkedDays wd
)
select cte.PersonId,
cte.TaskDate as MilestoneDate
from cte
where cte.Total = 5 -- milestone reached
and year(cte.TaskDate) = 2020
and month(cte.TaskDate) = 10; -- in october
Result
PersonId MilestoneDate
-------- -------------
200 2020-10-03
300 2020-10-14
Fiddle (also shows the common table expression output).
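The running-count-then-filter logic of this answer can also be checked outside the database. Below is a minimal Python sketch over the same WorkedDays sample, assuming one row per person per worked day as the answer does; `WORKED_DAYS` and `milestones_in_month` are hypothetical names.

```python
from collections import defaultdict
from datetime import date

# (PersonId, TaskDate): one row per person per worked day, as in the WorkedDays sample
WORKED_DAYS = (
    [(100, date(2020, 9, d)) for d in range(1, 6)]
    + [(200, date(2020, 9, d)) for d in (29, 30)]
    + [(200, date(2020, 10, d)) for d in range(1, 7)]
    + [(300, date(2020, 10, d)) for d in range(10, 16)]
    + [(400, date(2020, 10, d)) for d in (20, 21)]
)


def milestones_in_month(rows, target, year, month):
    """(PersonId, MilestoneDate) for people whose running count hits target in that month."""
    totals = defaultdict(int)
    hits = []
    for person, day in sorted(rows):  # PARTITION BY PersonId ORDER BY TaskDate
        totals[person] += 1  # running COUNT(...) OVER (...)
        # filter only after the running count is computed, like the outer WHERE
        if totals[person] == target and (day.year, day.month) == (year, month):
            hits.append((person, day))
    return hits
```

The order of the two steps is the point: in SQL the WHERE clause is evaluated before window functions, which is why the answer computes the running total inside the CTE and filters outside it. Restricting the rows to October first would restart every count at the beginning of the month.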

Find duplicate rows and show only the earliest

I have the following table:
respid, uploadtime
I need a query that shows all the records whose respid is duplicated, except the latest one (by uploadtime).
Example:
4 2014-01-01
4 2014-06-01
4 2015-01-01
4 2015-06-01
4 2016-01-01
In this case the query should return four records (the latest one, 4 2016-01-01, is excluded).
Thank you very much.
Use ROW_NUMBER:
WITH cte AS (
SELECT respid, uploadtime,
ROW_NUMBER() OVER (PARTITION BY respid ORDER BY uploadtime DESC) rn
FROM yourTable
)
SELECT respid, uploadtime
FROM cte
WHERE rn > 1
ORDER BY respid, uploadtime;
The logic here is to show all records except those with row number 1, which are the latest records in each respid group.
If I interpreted your question correctly, then you want to see all records where respid occurs multiple times, but exclude the last duplicate.
Translated to SQL, this could read "show all records that have a later record for the same respid". That is exactly what the solution below does: for every row in the result, a later record with the same respid must exist.
Sample data
declare @MyTable table
(
respid int,
uploadtime date
);
insert into @MyTable (respid, uploadtime) values
(4, '2014-01-01'),
(4, '2014-06-01'),
(4, '2015-01-01'),
(4, '2015-06-01'),
(4, '2016-01-01'), --> last duplicate of respid=4, not part of result
(5, '2020-01-01'); --> has no duplicate, not part of result
Solution
select mt.respid, mt.uploadtime
from @MyTable mt
where exists ( select top 1 'x'
from @MyTable mt2
where mt2.respid = mt.respid
and mt2.uploadtime > mt.uploadtime );
Result
respid uploadtime
----------- ----------
4 2014-01-01
4 2014-06-01
4 2015-01-01
4 2015-06-01
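The two answers agree on the sample data but not in every case. This minimal Python sketch (hypothetical names, using the sample rows above) mirrors both: the ROW_NUMBER version keeps rows numbered 2 and higher, and the EXISTS version keeps rows that have a strictly later upload for the same respid.

```python
from datetime import date

# (respid, uploadtime) rows from the @MyTable sample
ROWS = [
    (4, date(2014, 1, 1)),
    (4, date(2014, 6, 1)),
    (4, date(2015, 1, 1)),
    (4, date(2015, 6, 1)),
    (4, date(2016, 1, 1)),
    (5, date(2020, 1, 1)),
]


def all_but_latest_row_number(rows):
    """Like ROW_NUMBER() OVER (PARTITION BY respid ORDER BY uploadtime DESC), keeping rn > 1."""
    groups = {}
    for respid, uploaded in rows:
        groups.setdefault(respid, []).append(uploaded)
    result = []
    for respid, times in groups.items():
        for rn, uploaded in enumerate(sorted(times, reverse=True), start=1):
            if rn > 1:
                result.append((respid, uploaded))
    return sorted(result)


def all_but_latest_exists(rows):
    """Like WHERE EXISTS (a row with the same respid and a later uploadtime)."""
    return sorted(
        (respid, uploaded)
        for respid, uploaded in rows
        if any(r2 == respid and t2 > uploaded for r2, t2 in rows)
    )
```

They differ when a respid's latest uploadtime appears more than once: ROW_NUMBER gives number 1 to only one of the tied rows (in SQL, which one is not deterministic), so the other tied row is returned, while EXISTS finds no later row for either and returns neither. Adding a unique tie-breaker column to the ORDER BY makes the ROW_NUMBER result deterministic.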

In T-SQL, is there a built-in command to determine if a number is in a range from another table?

This is not a homework question.
I'm trying to take the count of t-shirts in an order and see which price range the shirts fall into, depending on how many have been ordered.
My initial thought (I am brand new at this) was to ask another table if count > 1st price range's maximum, and if so, keep looking until it's not.
printing_range_max  printing_price_by_range
------------------  -----------------------
15                  4
24                  3
33                  2
So for example here, if the order count is 30 shirts they would be $2 each.
When I'm looking into how to do that, it looks like most people are using BETWEEN or IF and hard-coding the ranges instead of looking in another table. I imagine in a business setting it's best to be able to leave the range in its own table so it can be changed more easily. Is there a good/built-in way to do this or should I just write it in with a BETWEEN command or IF statements?
EDIT:
SQL Server 2014
You can derive a table of ranges (a minimum and a maximum count for each price) from the price table. Below is how you would do this on SQL Server 2012 and later (with LAG) and on earlier versions (with ROW_NUMBER and a self-join):
DECLARE @priceRanges TABLE(printing_range_max tinyint, printing_price_by_range tinyint);
INSERT @priceRanges VALUES (15, 4), (24, 3), (33, 2);
-- 2012+ using LAG: each tier's minimum is the previous tier's maximum (0 for the first tier)
WITH pricerange AS
(
SELECT
printing_range_min = LAG(printing_range_max, 1, 0) OVER (ORDER BY printing_range_max),
printing_range_max,
printing_price_by_range
FROM @priceRanges
)
SELECT * FROM pricerange;
-- pre-2012 using ROW_NUMBER and a self-join on the previous row number
WITH prices AS
(
SELECT
rn = ROW_NUMBER() OVER (ORDER BY printing_range_max),
printing_range_max,
printing_price_by_range
FROM @priceRanges
),
pricerange AS
(
SELECT
printing_range_min = ISNULL(p2.printing_range_max, 0),
printing_range_max = p1.printing_range_max,
p1.printing_price_by_range
FROM prices p1
LEFT JOIN prices p2 ON p1.rn = p2.rn+1
)
SELECT * FROM pricerange;
Both queries return:
printing_range_min printing_range_max printing_price_by_range
------------------ ------------------ -----------------------
0 15 4
15 24 3
24 33 2
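Because each derived minimum equals the previous tier's maximum, adjacent ranges share an endpoint. The minimal Python sketch below (hypothetical names) rebuilds the range table above from the sample tiers and shows what that means for a lookup: with both ends inclusive, as BETWEEN is, a count of exactly 15 falls into two tiers, while excluding the lower bound leaves exactly one.

```python
# (printing_range_max, printing_price_by_range) from the @priceRanges sample
PRICE_RANGES = [(15, 4), (24, 3), (33, 2)]


def derive_ranges(tiers):
    """Add each tier's lower bound, like LAG(printing_range_max, 1, 0) OVER (ORDER BY printing_range_max)."""
    ordered = sorted(tiers)
    lows = [0] + [high for high, _price in ordered[:-1]]
    return [(low, high, price) for low, (high, price) in zip(lows, ordered)]


def tiers_matching(count, ranges, inclusive_low):
    """Prices of every tier whose range contains count."""
    return [
        price
        for low, high, price in ranges
        if (low <= count if inclusive_low else low < count) and count <= high
    ]
```

For example, `tiers_matching(15, derive_ranges(PRICE_RANGES), inclusive_low=True)` returns two prices, one from each tier that touches 15.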
Now that you have the ranges, you can join the orders to them. Because each tier's minimum equals the previous tier's maximum, a plain BETWEEN would match an order of exactly 15 or 24 shirts to two tiers, so the join below excludes the lower bound (ordercount > min AND ordercount <= max). Here's the full solution:
-- Sample data
DECLARE @priceRanges TABLE
(
printing_range_max tinyint,
printing_price_by_range tinyint
-- if you're on 2014+
,INDEX ix_xxx NONCLUSTERED(printing_range_max, printing_price_by_range)
-- note: second column should be an INCLUDE but not supported in table variables
);
DECLARE @orders TABLE
(
orderid int identity,
ordercount int
-- if you're on 2014+
,INDEX ix_xxy NONCLUSTERED(orderid, ordercount)
-- note: second column should be an INCLUDE but not supported in table variables
);
INSERT @priceRanges VALUES (15, 4), (24, 3), (33, 2);
INSERT @orders(ordercount) VALUES (10), (20), (25), (30);
-- Solution:
WITH pricerange AS
(
SELECT
printing_range_min = LAG(printing_range_max, 1, 0) OVER (ORDER BY printing_range_max),
printing_range_max,
printing_price_by_range
FROM @priceRanges
)
SELECT
o.orderid,
o.ordercount,
--p.printing_range_min,
--p.printing_range_max,
p.printing_price_by_range
FROM pricerange p
-- half-open range: a count equal to a tier's maximum matches that tier only
JOIN @orders o ON o.ordercount > p.printing_range_min AND o.ordercount <= p.printing_range_max
ORDER BY o.orderid;
Results:
orderid ordercount printing_price_by_range
----------- ----------- -----------------------
1 10 4
2 20 3
3 25 2
4 30 2
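When the tier table is small and read often, the same lookup can also be done in application code with a binary search over the sorted maxima, which needs only the printing_range_max column and no derived minimum. A minimal Python sketch, assuming a count equal to a tier's maximum belongs to that tier and that counts above the last maximum have no defined price (returned as None); `TIER_MAX`, `TIER_PRICE` and `price_for` are hypothetical names:

```python
from bisect import bisect_left

# Tier maxima in ascending order and the per-shirt price for each tier (sample data above)
TIER_MAX = [15, 24, 33]
TIER_PRICE = [4, 3, 2]


def price_for(count):
    """Price per shirt for an order of `count` shirts, or None above the last tier."""
    i = bisect_left(TIER_MAX, count)  # index of the first tier whose maximum is >= count
    return TIER_PRICE[i] if i < len(TIER_MAX) else None
```

For the sample orders of 10, 20, 25 and 30 shirts this returns 4, 3, 2 and 2, matching the query results above.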

Add time between 2 dates across multiple rows in SQL Server

I have a table that lists all users for my company. There are multiple entries for each staff member showing how they have been employed.
RowID  UserID  FirstName  LastName  Title     StartDate   Active  EndDate
-----  ------  ---------  --------  --------  ----------  ------  ----------
1      1       John       Smith     Manager   2017-01-01  0       2017-01-31
2      1       John       Smith     Director  2017-02-01  0       2017-02-28
3      1       John       Smith     CEO       2017-03-01  1       NULL
4      2       Sam        Davey     Manager   2017-01-01  0       2017-02-28
5      2       Sam        Davey     Manager   2017-03-01  0       NULL
6      3       Hugh       Holland   Admin     2017-02-01  1       NULL
7      4       David      Smith     Admin     2017-01-01  0       2017-02-28
I am trying to write a query that will tell me someone's length of service at any given time.
The part I am having trouble with is that a single person is represented by multiple rows as their information changes over time, so I need to combine those rows.
I have a query that reports who is employed at a point in time, which is as far as I have gotten.
DECLARE @DateCheck datetime
SET @DateCheck = '2017/05/10'
SELECT *
FROM UsersTest
WHERE @DateCheck >= StartDate AND @DateCheck <= ISNULL(EndDate, @DateCheck)
You need to use the DATEDIFF function. The key is choosing the appropriate datepart: days, months or years. The return value is an integer count of the datepart boundaries crossed, so with months or years the result is approximate (and remember, the approximation happens for each record, not for the total). I've chosen months below. A CTE has been added to get the most recent name for each user:
WITH CurrentName AS
(SELECT UserID, FirstName, LastName
from
UserStartStop
where Active = 1 -- You can replace this with a date check
)
SELECT uss.UserID,
MAX(cn.FirstName) as FirstName, -- the max is necessary because we are
-- grouping. Could include in group by
MAX(cn.LastName) as LastName,
SUM(DATEDIFF(mm,uss.StartDate,COALESCE(uss.EndDate,GETDATE()))) as MonthsOfService
from UserStartStop uss
JOIN CurrentName cn
on uss.UserID = cn.UserID
GROUP BY uss.UserID -- qualified: both uss and cn have a UserID column
order by uss.UserID
This version returns the length of service in days; for months in service, change d to mm in the DATEDIFF call:
Create table #UsersTest (
RowId int
, UserID int
, FirstName nvarchar(100)
, LastName nvarchar(100)
, Title nvarchar(100)
, StartDate date
, Active bit
, EndDate date)
Insert #UsersTest values (1, 1, 'John', 'Smith', 'Manager', '2017-01-01', 0, '2017-01-31')
Insert #UsersTest values (2, 1, 'John', 'Smith', 'Director', '2017-02-01', 0, '2017-02-28')
Insert #UsersTest values (3, 1, 'John', 'Smith', 'CEO', '2017-03-01', 1, null)
Insert #UsersTest values (4, 2, 'Sam', 'Davey', 'Manager', '2017-01-01', 0, '2017-02-28')
Insert #UsersTest values (5, 2, 'Sam', 'Davey', 'Manager', '2017-03-01', 0, null)
Insert #UsersTest values (6, 3, 'Hugh', 'Holland', 'Admin', '2017-02-01', 1, null)
Insert #UsersTest values (7, 4, 'David', 'Smith', 'Admin', '2017-01-01', 0, '2017-02-28')
Declare @DateCheck as datetime = '2017/05/10'
Select UserID, FirstName, LastName
-- measure up to @DateCheck if the person is still employed (some row has no EndDate)
-- or left after @DateCheck; otherwise measure up to their last EndDate
, Datediff(d, Min([StartDate]), iif(count(*) > count([EndDate]) or Max([EndDate]) > @DateCheck, @DateCheck, Max([EndDate]))) as [LengthOfService]
from #UsersTest
where [StartDate] <= @DateCheck -- ignore rows that start after the check date
Group by UserID, FirstName, LastName
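Measuring from MIN(StartDate) to a single end date assumes continuous employment. If someone has a gap between stints, the days can instead be counted stint by stint. A minimal Python sketch of that idea, using the sample rows from the question; `STINTS` and `days_of_service` are hypothetical names, and it counts calendar days inclusively, so for a single continuous stint it is one higher than DATEDIFF(d, start, end):

```python
from datetime import date, timedelta

# (UserID, StartDate, EndDate) from the sample table; None means no EndDate yet
STINTS = [
    (1, date(2017, 1, 1), date(2017, 1, 31)),
    (1, date(2017, 2, 1), date(2017, 2, 28)),
    (1, date(2017, 3, 1), None),
    (2, date(2017, 1, 1), date(2017, 2, 28)),
    (2, date(2017, 3, 1), None),
    (3, date(2017, 2, 1), None),
    (4, date(2017, 1, 1), date(2017, 2, 28)),
]


def days_of_service(stints, check):
    """Calendar days employed up to and including `check`, per UserID."""
    covered = {}
    for user, start, end in stints:
        if start > check:
            continue  # this stint starts after the check date
        stop = check if end is None or end > check else end
        days = covered.setdefault(user, set())  # a set so overlapping stints count once
        day = start
        while day <= stop:
            days.add(day)
            day += timedelta(days=1)
    return {user: len(days) for user, days in covered.items()}
```

For 2017-05-10 this gives 130 days for John and Sam, 99 for Hugh and 59 for David, whose last stint ended on 2017-02-28.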
Try this:
Select
UserID,
FirstName,
LastName,
Min(StartDate) as StartDate,
Max(isnull(EndDate, getdate())) as EndDate
from UsersTest
group by UserID, FirstName, LastName

Getting Running Total of Time column using T-SQL in SQL server

I have a table XYZ with employee login duration details in a TIME column.
EmployeeID | DomainID | LoginDuration
-----------+----------+-----------------
1111       | 12       | 02:32:55.0000000
1111       | 4        | 00:57:17.0000000
1111       | 12       | 01:06:25.0000000
1111       | 11       | 03:31:23.0000000
2222       | 11       | 02:42:17.0000000
2222       | 4        | 03:54:52.0000000
2222       | 10       | 04:08:29.0000000
Apart from the above columns, I also have LoginTimeStamp and LoginWeek columns, which I am using in a JOIN statement.
I am trying to obtain running totals for the LoginDuration Column as follows:
EmployeeID | DomainID | HoursBefore      | LoginDuration    | HoursAfter
-----------+----------+------------------+------------------+---------------------
1111       | 12       | 00:00:00.0000000 | 02:32:55.0000000 | **00:00:00.0000000**
1111       | 4        | 02:32:55.0000000 | 00:57:17.0000000 | 03:30:12.0000000
1111       | 12       | 03:30:12.0000000 | 01:06:25.0000000 | 04:36:37.0000000
1111       | 11       | 04:36:37.0000000 | 03:31:23.0000000 | 08:08:00.0000000
2222       | 11       | 00:00:00.0000000 | 02:42:17.0000000 | **00:00:00.0000000**
2222       | 4        | 01:32:31.0000000 | 03:54:52.0000000 | 04:14:48.0000000
2222       | 10       | 04:14:48.0000000 | 04:08:29.0000000 | 08:09:40.0000000
HoursBefore is the previous row's HoursAfter (00:00:00 for the first row of each employee).
HoursAfter = HoursBefore+LoginDuration
For this purpose I wrote the query below, but the HoursAfter column is wrong: it does not add the current value to the previous total for each employee.
SELECT
a.EmployeeID,a.LoginDuration,
COALESCE(CAST(
DATEADD(ms,
SUM(DATEDIFF(ms,0,CAST(b.LoginDuration as datetime)))
, 0)
as time)
,'00:00:00') AS HoursBefore,
a.LoginDuration as Hours,
COALESCE(CAST(
DATEADD(ms,
SUM(DATEDIFF(ms,0,CAST(b.LoginDuration as datetime)))
, a.Loginduration)
as time)
,'00:00:00') As HoursAfter
FROM XYZ AS a
LEFT OUTER JOIN XYZ AS b
ON (a.EmployeeID = b.EmployeeID)
AND (a.LoginWeek = b.LoginWeek)
AND (b.LoginTimeStamp < a.LoginTimeStamp)
GROUP BY a.EmployeeID, a.LoginTimeStamp,a.LoginDuration
ORDER BY a.LoginWeek, a.EmployeeID, a.LoginTimeStamp;
I need help with the query such that the HoursAfter column for each employee is appropriate.
Any help would be greatly appreciated.
(This is my first query, reply if you may need any further details.)
Thanks.
It's a pity SQL Server doesn't support a period data type yet; it would make the math much simpler.
However, it does have rather good support for window functions in newer versions, which we can use to solve this:
declare @t table (ID int, EmployeeID int, DomainID int, LoginDuration time)
insert @t
values
(1, 1111, 12, '02:32:55.0000000'),
(2, 1111, 4, '00:57:17.0000000'),
(3, 1111, 12, '01:06:25.0000000'),
(4, 1111, 11, '03:31:23.0000000'),
(5, 2222, 11, '02:42:17.0000000'),
(6, 2222, 4, '03:54:52.0000000'),
(7, 2222, 10, '04:08:29.0000000')
;with x as (
-- running total per employee, returned as a datetime offset from 1900-01-01 (despite the column name)
select *, dateadd(second, sum(datediff(second, 0, loginduration)) over (partition by employeeid order by id), 0) sum_duration_sec,
row_number() over (partition by employeeid order by id) rn
from @t
)
select
employeeid,
domainid,
convert(time, isnull(lag(sum_duration_sec) over (partition by employeeid order by id),0)) hoursbefore,
loginduration,
convert(time, case when rn = 1 then 0 else sum_duration_sec end) hoursafter
from x
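The running total itself is simple once the rows are in order. A minimal Python sketch with the sample data (hypothetical names), assuming rows are ordered by the ID column as in the query. It keeps the totals as timedelta values, whereas converting a datetime back to SQL Server's time type keeps only the time of day, so totals of 24 hours or more wrap around. It also reports the first row's HoursAfter as its own duration rather than 0:

```python
from datetime import timedelta
from itertools import groupby


def hms(h, m, s):
    return timedelta(hours=h, minutes=m, seconds=s)


# (ID, EmployeeID, DomainID, LoginDuration) from the sample table
ROWS = [
    (1, 1111, 12, hms(2, 32, 55)),
    (2, 1111, 4, hms(0, 57, 17)),
    (3, 1111, 12, hms(1, 6, 25)),
    (4, 1111, 11, hms(3, 31, 23)),
    (5, 2222, 11, hms(2, 42, 17)),
    (6, 2222, 4, hms(3, 54, 52)),
    (7, 2222, 10, hms(4, 8, 29)),
]


def running_hours(rows):
    """Return (EmployeeID, DomainID, HoursBefore, LoginDuration, HoursAfter) rows."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))  # PARTITION BY EmployeeID ORDER BY ID
    for employee, group in groupby(ordered, key=lambda r: r[1]):
        total = timedelta(0)  # HoursBefore of each employee's first row
        for _id, _emp, domain, duration in group:
            before = total
            total += duration  # HoursAfter = HoursBefore + LoginDuration
            out.append((employee, domain, before, duration, total))
    return out
```

For employee 1111 this reproduces the expected 03:30:12, 04:36:37 and 08:08:00 running totals from the question.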
I introduced the ID column for brevity to establish the sequence; you'd probably want to order by (LoginWeek, LoginTimestamp) instead.
Also, I'm not sure about the requirement that HoursAfter should be 0 in the 1st and 5th rows. If that isn't actually required, drop the row_number() column and the CASE expression that uses it, and return sum_duration_sec directly.
Use OUTER APPLY to calculate HoursAfter. HoursBefore is just HoursAfter minus the current duration:
SELECT a.EmployeeID, a.DomainID,
HoursBefore = CONVERT(TIME, DATEADD(SECOND, b.after_secs - DATEDIFF(SECOND, 0, a.LoginDuration), 0)),
a.LoginDuration,
HoursAfter = CONVERT(TIME, DATEADD(SECOND, b.after_secs, 0))
FROM XYZ AS a
OUTER APPLY
(
SELECT after_secs = SUM(DATEDIFF(SECOND, 0, x.LoginDuration))
FROM XYZ x
WHERE x.EmployeeID = a.EmployeeID
AND x.LoginWeek = a.LoginWeek
AND x.LoginTimeStamp <= a.LoginTimeStamp
) b
