Microsoft T-SQL Counting Consecutive Records - sql-server

Problem:
From the most current day per person, count the number of consecutive days that each person has received 0 points for being good.
Sample data to work from :
Date Name Points
2010-05-07 Jane 0
2010-05-06 Jane 1
2010-05-07 John 0
2010-05-06 John 0
2010-05-05 John 0
2010-05-04 John 0
2010-05-03 John 1
2010-05-02 John 1
2010-05-01 John 0
Expected answer:
Jane was bad on 5/7 but good the day before that. So Jane was only bad 1 day in a row most recently. John was bad on 5/7, again on 5/6, 5/5 and 5/4. He was good on 5/3. So John was bad the last 4 days in a row.
Code to create sample data:
IF OBJECT_ID('tempdb..#z') IS NOT NULL BEGIN DROP TABLE #z END
select getdate() as Date,'John' as Name,0 as Points into #z
insert into #z values(getdate()-1,'John',0)
insert into #z values(getdate()-2,'John',0)
insert into #z values(getdate()-3,'John',0)
insert into #z values(getdate()-4,'John',1)
insert into #z values(getdate(),'Jane',0)
insert into #z values(getdate()-1,'Jane',1)
select * from #z order by name,date desc
Firstly, I am sorry but new to this system and having trouble figuring out how to work the interface and post properly.
2010-05-13 ---------------------------------------------------------------------------
Joel, Thank you so much for your response below! I need it for a key production job that was running about 60 minutes.
Now the job runs in 2 minutes!!
Yes, there was 1 condition in my case that I needed to address. My source always had only record for those that had
been bad recently so that was not a problem for me. I did however have to handle records where they were never good,
and did that with a left join to add back in the records and gave them a date so the counting would work for all.
Thanks again for your help. It was opened my mind some more to SET based logic and how to approach it and was
a HUGE benefit to my production job.

The basic solution here is to first build a set that contains the name of each person and the value of the last day on which that person was good. Then join this set to the original table and group by name to find the count of days > the last good day for each person. You can build the set in either a CTE, a view, or an uncorrelated derived table (sub query) — any of those will work. My example below uses a CTE.
Note that while the concept is sound, this specific example might not return exactly what you want. Your actual needs here depend on what you want to happen for those who have not been good ever and for those who have not been bad recently (ie, you might need a left join to show users who were good yesterday). But this should get you started:
WITH LastGoodDays AS
(
SELECT MAX([date]) as [date], name
FROM [table]
WHERE Points > 0
GROUP BY name
)
SELECT t.name, count(*) As ConsecutiveBadDays
FROM [table] t
INNER JOIN LastGoodDays lgd ON lgd.name = t.name AND t.[date] > lgd.[date]
group by t.name

Related

SQL join conditional either or not both?

I have 3 tables that I'm joining and 2 variables that I'm using in one of the joins.
What I'm trying to do is figure out how to join based on either of the statements but not both.
Here's the current query:
SELECT DISTINCT
WR.Id,
CAL.Id as 'CalendarId',
T.[First Of Month],
T.[Last of Month],
WR.Supervisor,
WR.cd_Manager as [Manager], --Added to search by the Manager--
WR.[Shift] as 'ShiftId'
INTO #Workers
FROM #T T
--Calendar
RIGHT JOIN [dbo].[Calendar] CAL
ON CAL.StartDate <= T.[Last of Month]
AND CAL.EndDate >= T.[First of Month]
--Workers
--This is the problem join
RIGHT JOIN [dbo].[Worker_Filtered]WR
ON WR.Supervisor IN (SELECT Id FROM [dbo].[User] WHERE FullName IN(#Supervisors))
or (WR.Supervisor IN (SELECT Id FROM [dbo].[User] WHERE FullName IN(#Supervisors))
AND WR.cd_Manager IN(SELECT Id FROM [dbo].[User] WHERE FullNameIN(#Manager))) --Added to search by the Manager--
AND WR.[Type] = '333E7907-EB80-4021-8CDB-5380F0EC89FF' --internal
WHERE CAL.Id = WR.Calendar
AND WR.[Shift] IS NOT NULL
What I want to do is either have the result based on the Worker_Filtered table matching the #Supervisor or (but not both) have it matching both the #Supervisor and #Manager.
The way it is now if it matches either condition it will be returned. This should be limiting the returned results to Workers that have both the Supervisor and Manager which would be a smaller data set than if they only match the Supervisor.
UPDATE
The query that I have above is part of a greater whole that pulls data for a supervisor's workers.
I want to also limit it to managers that are under a particular supervisor.
For example, if #Supervisor = John Doe and #Manager = Jane Doe and John has 9 workers 8 of which are under Jane's management then I would expect the end result to show that there are only 8 workers for each month. With the current query, it is still showing all 9 for each month.
If I change part of the RIGHT JOIN to:
WR.Supervisor IN (SELECT Id FROM [dbo].[User] WHERE FullName IN (#Supervisors))
AND WR.cd_Manager IN(SELECT Id FROM [dbo].[User] WHERE FullName IN(#Manager))
Then it just returns 12 rows of NULL.
UPDATE 2
Sorry, this has taken so long to get a sample up. I could not get SQL Fiddle to work for SQL Server 2008/2014 so I am using rextester instead:
Sample
This shows the results as 108 lines. But what I want to show is just the first 96 lines.
UPDATE 3
I have made a slight update to the Sample. this does get the results that I want. I can set #Manager to NULL and it will pull all 108 records, or I can have the correct Manager name in there and it'll only pull those that match both Supervisor and Manager.
However, I'm doing this with an IF ELSE and I was hoping to avoid doing that as it duplicates code for the insert into the Worker table.
The description of expected results in update 3 makes it all clear now, thanks. Your 'problem' join needs to be:
RIGHT JOIN Worker_Filtered wr on (wr.Supervisor in(#Supervisors)
and case when #Manager is null then 1
else case when wr.Manager in(#Manager) then 1 else 0 end
end = 1)
By the way, I don't know what you are expecting the in(#Supervisors) to achieve, but if you're hoping to supply a comma separated list of supervisors as a single string and have wr.Supervisor match any one of them then you're going to be disappointed. This query works exactly the same if you have = #Supervisors instead.

Delete latest entry in SQL Server without using datetime or ID

I have a basic SQL Server delete script that goes:
Delete from tableX
where colA = ? and colB = ?;
In tableX, I do not have any columns indicating sequential IDs or timestamp; just varchar. I want to delete the latest entry that was inserted, and I do not have access to the row number from the insert script. TOP is not an option because it's random. Also, this particular table does not have a primary key, and it's not a matter of poor design. Is there any way I can do this? I recall mysql being able to call something like max(row_number) and also something along the lines of limit one.
ROW_NUMBER exists in SQL Server, too, but it must be used with an OVER (order_by_clause). So... in your case it's impossible for you unless you come up with another sorting algo.
MSDN
Edit: (Examples for George from MSDN ... I'm afraid his company has a Firewall rule that blocks MSDN)
SQL-Code
USE AdventureWorks2012;
GO
SELECT ROW_NUMBER() OVER(ORDER BY SalesYTD DESC) AS Row,
FirstName, LastName, ROUND(SalesYTD,2,1) AS "Sales YTD"
FROM Sales.vSalesPerson
WHERE TerritoryName IS NOT NULL AND SalesYTD <> 0;
Output
Row FirstName LastName SalesYTD
--- ----------- ---------------------- -----------------
1 Linda Mitchell 4251368.54
2 Jae Pak 4116871.22
3 Michael Blythe 3763178.17
4 Jillian Carson 3189418.36
5 Ranjit Varkey Chudukatil 3121616.32
6 José Saraiva 2604540.71
7 Shu Ito 2458535.61
8 Tsvi Reiter 2315185.61
9 Rachel Valdez 1827066.71
10 Tete Mensa-Annan 1576562.19
11 David Campbell 1573012.93
12 Garrett Vargas 1453719.46
13 Lynn Tsoflias 1421810.92
14 Pamela Ansman-Wolfe 1352577.13
Returning a subset of rows
USE AdventureWorks2012;
GO
WITH OrderedOrders AS
(
SELECT SalesOrderID, OrderDate,
ROW_NUMBER() OVER (ORDER BY OrderDate) AS RowNumber
FROM Sales.SalesOrderHeader
)
SELECT SalesOrderID, OrderDate, RowNumber
FROM OrderedOrders
WHERE RowNumber BETWEEN 50 AND 60;
Using ROW_NUMBER() with PARTITION
USE AdventureWorks2012;
GO
SELECT FirstName, LastName, TerritoryName, ROUND(SalesYTD,2,1),
ROW_NUMBER() OVER(PARTITION BY TerritoryName ORDER BY SalesYTD DESC) AS Row
FROM Sales.vSalesPerson
WHERE TerritoryName IS NOT NULL AND SalesYTD <> 0
ORDER BY TerritoryName;
Output
FirstName LastName TerritoryName SalesYTD Row
--------- -------------------- ------------------ ------------ ---
Lynn Tsoflias Australia 1421810.92 1
José Saraiva Canada 2604540.71 1
Garrett Vargas Canada 1453719.46 2
Jillian Carson Central 3189418.36 1
Ranjit Varkey Chudukatil France 3121616.32 1
Rachel Valdez Germany 1827066.71 1
Michael Blythe Northeast 3763178.17 1
Tete Mensa-Annan Northwest 1576562.19 1
David Campbell Northwest 1573012.93 2
Pamela Ansman-Wolfe Northwest 1352577.13 3
Tsvi Reiter Southeast 2315185.61 1
Linda Mitchell Southwest 4251368.54 1
Shu Ito Southwest 2458535.61 2
Jae Pak United Kingdom 4116871.22 1
Your current table design does not allow you to determine the latest entry. YOu have no field to sort on to indicate which record was added last.
You need to redesign or pull that information from the audit tables. If you have a database without audit tables, you might have to find a tool to read the transaction logs and it will be a very time-consuming and expensive process. Or if you know the date the records you want to remove were added, you could possibly use a backup from just before this happened to find the records that were added. Just be awwre that you might be looking at records changed after this date that you want to keep.
If you need to do this on a regular basis instead of one-time to fix some bad data, then you need to properly design your database to include an identity field and possibly a dateupdated field (maintained through a trigger) or audit tables. (In my opinion no database containing information your company is depending on should be without audit tables, one of the many reasons why you should never allow an ORM to desgn a database, but I digress.) If you need to know the order records were added to a table, it is your responsiblity as the developer to create that structure. Databases only store what is deisnged for tehm to store, if you didn't design it in, then it is not available easily or at all
If (colA +'_'+ colB) can not be dublicate try this.
declare #delColumn nvarchar(250)
set #delColumn = (select top 1 DeleteColumn from (
select (colA +'_'+ colB) as DeleteColumn ,
ROW_NUMBER() OVER(ORDER BY colA DESC) as Id from tableX
)b
order by Id desc
)
delete from tableX where (colA +'_'+ colB) =#delColumn

sql server sum column to find out who reached a value first

I asked a question similar to this here:
sql sum a column and also return the last time stamp
It turns out my use case was incorrect though so I need to make an adjustment to my question.
I've got a table in SQL Server with several columns. The relevant ones are:
name
distance
create_date
I have many people identified by name, and every few days they travel a certain distance. For example:
name distance create_date
john 15 09/12/2014
john 20 09/22/2014
alex 10 08/15/2014
alex 12 09/05/2014
alex 20 09/12/2014
john 8 09/30/2014
alex 30 09/14/2014
mike 12 09/10/2014
The query I need has 3 parameters:
#start_date
#end_date
#count
I need a query that between the two dates, returns for each person the distance traveled. The trick though is that for each person I should sum to an amount just past the amount indicated in #count and return the date this was achieved, or if the person did not pass the #count then return the sum and last date of entry. So for example, if I use the parameters:
#start_date=08/01/2014
#end_date=09/25/2014
#count=35
I would expect the following:
name distance create_date
alex 42 09/12/2014
john 35 09/22/2014
mike 12 09/10/2014
Does someone have an idea for this?
Thank you!
As per my understanding the query for the the person who crossed particular count
select A.name,sum(A.distance),A.create_date from (select name,distance,create_date
from table where create_date between #start_date and #end_date)A group by A.name,A.create_date
having sum(A.distance)>#count
Those who doesnt cross
select A.name,sum(A.distance),max(A.create_date) from (select name,distance,create_date
from table where create_date between #start_date and #end_date)A group by A.name
having sum(A.distance)<#count

Query to create records between two dates

I have a table with the following fields (among others)
TagID
TagType
EventDate
EventType
EventType can be populated with "Passed Inspection", "Failed Inspection" or "Repaired" (there are actually many others, but simplifies to this for my issue)
Tags can go many months between a failed inspection and the ultimate repair... in this state they are deemed to be "awaiting repair". Tags are still inspected each month even after they have been identified as having failed. (and just to be clear, a “failed inspection” doesn’t mean the item being inspected doesn’t work at all… it still works, just not at 100% capacity…which is why we still do inspections on it).
I need to create a query that counts, by TagType, Month and Year the number of Tags that are awaiting repair. The end result table would look like this, for example
TagType EventMonth EventYear CountofTagID
xyz 1 2011 3
abc 1 2011 2
xyz 2 2011 2>>>>>>>>>>>>indicating a repair had been made since 1/2011
abc 2 2011 2
and so on
The "awaiting repair" status should be assessed on the last day of the month
This is totally baffling me...
One thought that I had was to develop a query that returned:
TagID,
TagType,
FailedInspectionDate, and
NextRepairDate,
then try and do something that stepped thru the months in between the two dates, but that seems wildly inefficient.
Any help would be much appreciated.
Update
A little more research, and a break from the problem to think about it differently gave me the following approach. I'm sure its not efficient or elegant, but it works. Comments to improve would be appreciated.
declare #counter int
declare #FirstRepair date
declare #CountMonths as int
set #FirstRepair = (<Select statement to find first repair across all records>)
set #CountMonths = (<select statement to find the number of months between the first repair across all records and today>)
--clear out the scratch table
delete from dbo.tblMonthEndDate
set #counter=0
while #counter <=#CountMonths --fill the scratch table with the date of the last day of every month from the #FirstRepair till today
begin
insert into dbo.tblMonthEndDate(monthenddate) select dbo.lastofmonth(dateadd(m,#counter, #FirstRepair))
set #counter = #counter+1
end
--set up a CTE to get a cross join between the scratch table and the view that has the associated first Failed Inspection and Repair
;with Drepairs_CTE (FacilityID, TagNumber, CompType, EventDate)
AS
(
SELECT dbo.vwDelayedRepairWithRepair.FacilityID, dbo.vwDelayedRepairWithRepair.TagNumber, dbo.vwDelayedRepairWithRepair.CompType,
dbo.tblMonthEndDate.MonthEndDate
FROM dbo.vwDelayedRepairWithRepair INNER JOIN
dbo.tblMonthEndDate ON dbo.vwDelayedRepairWithRepair.EventDate <= dbo.tblMonthEndDate.MonthEndDate AND
dbo.vwDelayedRepairWithRepair.RepairDate >= dbo.tblMonthEndDate.MonthEndDate
)
--use the CTE to build the final table I want
Select FacilityID, CompType, Count(TagNumber), MONTH(EventDate), YEAR(EventDate), 'zzz' as EventLabel
FROM Drepairs_CTE
GROUP BY FacilityID, CompType, MONTH(EventDate), YEAR(EventDate)`
Result set ultimately looks like this:
FacilityID CompType Count Month Year Label
1 xyz 2 1 2010 zzz
1 xyz 1 2 2010 zzz
1 xyz 1 7 2009 zzz
Here is a recursive CTE which generates table of last dates of months in interval starting with minimum date in repair table and ending with maximum date.
;with tableOfDates as (
-- First generation returns last day of month of first date in repair database
-- and maximum date
select dateadd (m, datediff (m, 0, min(eventDate)) + 1, 0) - 1 startDate,
max(eventDate) endDate
from vwDelayedRepairWithRepair
union all
-- Last day of next month
select dateadd (m, datediff (m, 0, startDate) + 2, 0) - 1,
endDate
from tableOfDates
where startDate <= endDate
)
select *
from tableOfDates
-- If you change the CTE,
-- Set this to reasonable number of months
-- to prevent recursion problems. 0 means no limit.
option (maxrecursion 0)
EndDate column from tableOfDates is to be ignored, as it serves as upper bound only. If you create UDF which returns all the dates in an interval, omit endDate in select list or remove it from CTE and replace with a parameter.
Sql Fiddle playground is here.

SQL Server 2000: Ideas for performing concatenation aggregation subquery

i have a query that returns rows that i want, e.g.
QuestionID QuestionTitle UpVotes DownVotes
========== ============= ======= =========
2142075 Win32: Cre... 0 0
2232727 Win32: How... 2 0
1870139 Wondows Ae... 12 0
Now i want to have a column returned, that contains a comma separated list of "Authors" (e.g. original poster and editors). e.g.:
QuestionID QuestionTitle UpVotes DownVotes Authors
========== ============= ======= ========= ==========
2142075 Win32: Cre... 0 0 Ian Boyd
2232727 Win32: How... 2 0 Ian Boyd, roygbiv
1870139 Wondows Ae... 12 0 Ian Boyd, Aaron Klotz, Jason Diller, danbystrom
Faking It
SQL Server 2000 does not have a CONCAT(AuthorName, ', ') aggregation operation, i've been faking it - performing simple sub-selects for the TOP 1 author, and the author count.
QuestionID QuestionTitle UpVotes DownVotes FirstAuthor AuthorCount
========== ============= ======= ========= =========== ===========
2142075 Win32: Cre... 0 0 Ian Boyd 1
2232727 Win32: How... 2 0 Ian Boyd 2
1870139 Wondows Ae... 12 0 Ian Boyd 3
If there is more than one author, then i show the user an ellipses ("…"), to indicate there is more than one. e.g. the user would see:
QuestionID QuestionTitle UpVotes DownVotes Authors
========== ============= ======= ========= ==========
2142075 Win32: Cre... 0 0 Ian Boyd
2232727 Win32: How... 2 0 Ian Boyd, …
1870139 Wondows Ae... 12 0 Ian Boyd, …
And that works well enough, since normally a question isn't edited - which means i'm supporting the 99% case perfectly, and the 1% case only half-assed as well.
Threaded Re-query
As a more complicated, and bug-prone solution, i was thinking of iterating the displayed list, and spinning up a thread-pool worker thread for each "question" in the list, perform a query against the database to get the list of authors, then aggregating the list in memory. This would mean that the list fills first in the (native) application. Then i issue a few thousand individual queries afterwards.
But that would be horribly, horrendously, terribly, slow. Not to mention bug-riddled, since it will be thread work.
Yeah yeah yeah
Adam Mechanic says quite plainly:
Don't concatenate rows into delimited
strings in SQL Server. Do it client
side.
Tell me how, and i'll do it.
/cry
Can anyone think of a better solution, that is as fast (say...within an order of magnitude) than my original "TOP 1 plus ellipses" solution?
For example, is there a way to return a results set, where reach row has an associated results set? So for each "master" row, i could get at a "detail" results set that contains the list.
Code for best answer
Cade's link to Adam Machanic's solution i like the best. A user-defined function, that seems to operate via magic:
CREATE FUNCTION dbo.ConcatAuthors(#QuestionID int)
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE #Output VARCHAR(8000)
SET #Output = ''
SELECT #Output = CASE #Output
WHEN '' THEN AuthorName
ELSE #Output + ', ' + AuthorName
END
FROM (
SELECT QuestionID, AuthorName, QuestionDate AS AuthorDate FROM Questions
UNION
SELECT QuestionID, EditorName, EditDate FROM QuestionEdits
) dt
WHERE dt.QuestionID = #QuestionID
ORDER BY AuthorDate
RETURN #Output
END
With a T-SQL usage of:
SELECT QuestionID, QuestionTitle, UpVotes, DownVotes, dbo.ConcatAuthors(AuthorID)
FROM Questions
Have a look at these articles:
http://dataeducation.com/rowset-string-concatenation-which-method-is-best/
http://www.simple-talk.com/sql/t-sql-programming/concatenating-row-values-in-transact-sql/ (See Phil Factor's cross join solution in the responses - which will work in SQL Server 2000)
Obviously in SQL Server 2005, the FOR XML trick is easiest, most flexible and generally most performant.
As far as returning a rowset for each row, if you still want to do that for some reason, you can do that in a stored procedure, but the client will need to consume all the rows in the first rowset and then go to the next rowset and associate it with the first row in the first rowset, etc. Your SP would need to open a cursor on the same set it returned as the first rowset and run multiple selects in sequence to generate all the child rowsets. It's a technique I've done, but only where ALL the data actually was needed (for instance, in a fully-populated tree view).
And regardless of what people say, doing it client-side is often a very big waste of bandwidth, because returning all the rows and doing the looping and breaking in the client side means that huge number of identical columns are being transferred at the start of each row just to get the changing column at the end of the row.
Wherever you do it, it should be an informed decision based on your use case.
I tried 3 approaches to this solution, the one posted here, activex scripting and UDF functions.
The most effective script (speed-wise) for me was bizzarely Axtive-X script running multiple queries to get the additioanl data to concat.
UDF took an average of 22 minutes to transform, the Subquery method (posted here) took around 5m and the activeX script took 4m30, much to my annoyance since this was the script I was hoping to ditch. I'll have to see if I can iron out a few more efficiencies elsewhere.
I think the extra 30s is used by the tempdb being used to store the data since my script requires an order by.
It should be noted that I am concatanating huge quantities of textual data.
You can also take a look to this script. It's basically the cross join approach that Cade Roux also mentioned in his post.
The above approach looks very clean: you have to do a view first and secondly create a statement based on the values in the view. The second sql statement you can build dynamically in your code, so it should be straight forward to use.
I'm not sure if this works in SQL Server 2000, but you can try it:
--combine parent and child, children are CSV onto parent row
CREATE TABLE #TableA (RowID int, Value1 varchar(5), Value2 varchar(5))
INSERT INTO #TableA VALUES (1,'aaaaa','A')
INSERT INTO #TableA VALUES (2,'bbbbb','B')
INSERT INTO #TableA VALUES (3,'ccccc','C')
CREATE TABLE #TableB (RowID int, TypeOf varchar(10))
INSERT INTO #TableB VALUES (1,'wood')
INSERT INTO #TableB VALUES (2,'wood')
INSERT INTO #TableB VALUES (2,'steel')
INSERT INTO #TableB VALUES (2,'rock')
INSERT INTO #TableB VALUES (3,'plastic')
INSERT INTO #TableB VALUES (3,'paper')
SELECT
a.*,dt.CombinedValue
FROM #TableA a
LEFT OUTER JOIN (SELECT
c1.RowID
,STUFF(
(SELECT
', ' + TypeOf
FROM (SELECT
a.RowID,a.Value1,a.Value2,b.TypeOf
FROM #TableA a
LEFT OUTER JOIN #TableB b ON a.RowID=b.RowID
) c2
WHERE c2.rowid=c1.rowid
ORDER BY c1.RowID, TypeOf
FOR XML PATH('')
)
,1,2, ''
) AS CombinedValue
FROM (SELECT
a.RowID,a.Value1,a.Value2,b.TypeOf
FROM #TableA a
LEFT OUTER JOIN #TableB b ON a.RowID=b.RowID
) c1
GROUP BY RowID
) dt ON a.RowID=dt.RowID
OUTPUT from SQL Server 2005:
RowID Value1 Value2 CombinedValue
----------- ------ ------ ------------------
1 aaaaa A wood
2 bbbbb B rock, steel, wood
3 ccccc C paper, plastic
(3 row(s) affected)
EDIT query that replaces FOR XML PATH with FOR XML RAW, so this should work on SQL Server 2000
SELECT
a.*,dt.CombinedValue
FROM #TableA a
LEFT OUTER JOIN (SELECT
c1.RowID
,STUFF(REPLACE(REPLACE(
(SELECT
', ' + TypeOf as value
FROM (SELECT
a.RowID,a.Value1,a.Value2,b.TypeOf
FROM #TableA a
LEFT OUTER JOIN #TableB b ON a.RowID=b.RowID
) c2
WHERE c2.rowid=c1.rowid
ORDER BY c1.RowID, TypeOf
FOR XML RAW
)
,'<row value="',''),'"/>','')
, 1, 2, '') AS CombinedValue
FROM (SELECT
a.RowID,a.Value1,a.Value2,b.TypeOf
FROM #TableA a
LEFT OUTER JOIN #TableB b ON a.RowID=b.RowID
) c1
GROUP BY RowID
) dt ON a.RowID=dt.RowID
OUTPUT, same as original query

Resources