SQL Server Comparing Subsequent Rows for Duplicates

I am trying to write a SQL Server query but have had no luck, and I was wondering if anyone has ideas on how to achieve it.
What I'm trying to do:
I have a table with several columns; the ones I am dealing with are TaskID, StatusCode and Timestamp. The table holds tasks for one of our systems that run throughout the day. When a task runs, its row gets a timestamp and a StatusCode that reflects the task's status.
Sometimes the task table is updated with a new timestamp but the StatusCode has not changed since the task's last update, so two or more consecutive rows for a given task can have the same StatusCode. By consecutive rows I mean consecutive in timestamp order.
For example, task 88 could have twenty rows with StatusCode 2, after which the status code changes to something else.
What I am trying to do (with no luck so far) is retrieve a list of all the tasks with their StatusCodes and timestamps, but where a task has more than one consecutive row with the same StatusCode, I want only the first row (the one with the lowest timestamp), ignoring the rest of that run until the task's StatusCode changes.
To make it simpler, you can assume I am filtering on a single TaskID, so I am only looking at one task.
Does anyone have any ideas on how I can do this, or something I could read that would help?
Thanks
Irfan.

Here are a couple of ways of getting what you want:
SELECT
    T1.task_id,
    T1.status_code,
    T1.status_timestamp
FROM
    My_Table T1
LEFT OUTER JOIN My_Table T2 ON
    T2.task_id = T1.task_id AND
    T2.status_timestamp < T1.status_timestamp
LEFT OUTER JOIN My_Table T3 ON
    T3.task_id = T1.task_id AND
    T3.status_timestamp < T1.status_timestamp AND
    T3.status_timestamp > T2.status_timestamp
WHERE
    T3.task_id IS NULL AND
    (T2.status_code IS NULL OR T2.status_code <> T1.status_code)
ORDER BY
    T1.status_timestamp
or
SELECT
    T1.task_id,
    T1.status_code,
    T1.status_timestamp
FROM
    My_Table T1
LEFT OUTER JOIN My_Table T2 ON
    T2.task_id = T1.task_id AND
    T2.status_timestamp = (
        SELECT
            MAX(status_timestamp)
        FROM
            My_Table T3
        WHERE
            T3.task_id = T1.task_id AND
            T3.status_timestamp < T1.status_timestamp)
WHERE
    (T2.status_code IS NULL OR T2.status_code <> T1.status_code)
ORDER BY
    T1.status_timestamp
Both methods rely on there being no exact matches of the status_timestamp values (two rows can't have the same exact status_timestamp for a given task_id.)
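A minimal sketch of the second method, run against an in-memory SQLite database through Python's built-in sqlite3 module rather than SQL Server (the query itself uses only portable SQL). The rows for task 88 are invented for illustration; they show that a status that comes back after a change is reported again.

```python
import sqlite3

# In-memory SQLite stand-in for the SQL Server table (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE My_Table (task_id INTEGER, status_code INTEGER, status_timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO My_Table VALUES (?, ?, ?)",
    [
        (88, 2, "2009-12-01 08:00"),
        (88, 2, "2009-12-01 09:00"),
        (88, 3, "2009-12-01 10:00"),
        (88, 2, "2009-12-01 11:00"),  # status 2 recurs after a change
        (88, 2, "2009-12-01 12:00"),
    ],
)

# Join each row to its immediate predecessor (greatest earlier timestamp) and
# keep it only when there is no predecessor or the predecessor's status differs.
rows = conn.execute(
    """
    SELECT T1.task_id, T1.status_code, T1.status_timestamp
    FROM My_Table T1
    LEFT OUTER JOIN My_Table T2
      ON T2.task_id = T1.task_id
     AND T2.status_timestamp = (
           SELECT MAX(T3.status_timestamp)
           FROM My_Table T3
           WHERE T3.task_id = T1.task_id
             AND T3.status_timestamp < T1.status_timestamp)
    WHERE T2.status_code IS NULL OR T2.status_code <> T1.status_code
    ORDER BY T1.status_timestamp
    """
).fetchall()
print(rows)
```

The 09:00 and 12:00 rows are dropped because each repeats its predecessor's status; the 11:00 row is kept because its predecessor had status 3.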

Something like
select TaskID,StatusCode,Min(TimeStamp)
from table
group by TaskID,StatusCode
order by 1,2
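Note that if a StatusCode can occur again for the same task after changing to another value, this query merges those separate runs, so you will need an additional field to tell them apart, but hopefully this points you in the right direction...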
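One common way to compute that additional field is the "gaps and islands" trick: the difference between a row number over the whole task and a row number within (task, status) stays constant inside each consecutive run. The sketch below is an assumption, not part of the answer above; it runs on SQLite (3.25 or later, for window functions) via Python's sqlite3, with invented data and a hypothetical table name `tasks`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tasks (TaskID INTEGER, StatusCode INTEGER, Ts TEXT)")
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?)",
    [
        (88, 2, "2009-12-01 08:00"),
        (88, 2, "2009-12-01 09:00"),
        (88, 3, "2009-12-01 10:00"),
        (88, 2, "2009-12-01 11:00"),  # second run of status 2
        (88, 2, "2009-12-01 12:00"),
    ],
)

# Plain GROUP BY: both runs of status 2 collapse into a single group.
grouped = conn.execute(
    "SELECT TaskID, StatusCode, MIN(Ts) FROM tasks GROUP BY TaskID, StatusCode ORDER BY 1, 2"
).fetchall()

# Gaps and islands: grp is constant within each consecutive run of a status,
# so grouping by it as well separates the runs.
islands = conn.execute(
    """
    SELECT TaskID, StatusCode, MIN(Ts) AS FirstTs
    FROM (
        SELECT TaskID, StatusCode, Ts,
               ROW_NUMBER() OVER (PARTITION BY TaskID ORDER BY Ts)
             - ROW_NUMBER() OVER (PARTITION BY TaskID, StatusCode ORDER BY Ts) AS grp
        FROM tasks
    ) AS runs
    GROUP BY TaskID, StatusCode, grp
    ORDER BY FirstTs
    """
).fetchall()
print(grouped)
print(islands)
```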

Something like the following should get you in the right direction:
CREATE TABLE #T
(
TaskId INT
,StatusCode INT
,StatusTimeStamp DATETIME
)
INSERT INTO #T
SELECT 1, 1, '2009-12-01 14:20'
UNION SELECT 1, 2, '2009-12-01 16:20'
UNION SELECT 1, 2, '2009-12-02 09:15'
UNION SELECT 1, 2, '2009-12-02 12:15'
UNION SELECT 1, 3, '2009-12-02 18:15'
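;WITH CTE AS
(
    SELECT TaskId
        ,StatusCode
        ,StatusTimeStamp
        -- ASC keeps the earliest row per status; DESC would keep the latest.
        -- Partitioning by StatusCode also merges runs if a status recurs later.
        ,ROW_NUMBER() OVER (PARTITION BY TaskId, StatusCode ORDER BY StatusTimeStamp ASC) AS RNUM
    FROM #T
)
SELECT TaskId
    ,StatusCode
    ,StatusTimeStamp
FROM CTE
WHERE RNUM = 1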
DROP TABLE #T
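A quick check of the sort direction, sketched on SQLite (3.25 or later) through Python's sqlite3 with the sample rows from the #T table above: ordering the ROW_NUMBER ascending picks the earliest row per status, which is what the question asks for, while descending picks the latest.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (TaskId INTEGER, StatusCode INTEGER, StatusTimeStamp TEXT)")
conn.executemany(
    "INSERT INTO T VALUES (?, ?, ?)",
    [
        (1, 1, "2009-12-01 14:20"),
        (1, 2, "2009-12-01 16:20"),
        (1, 2, "2009-12-02 09:15"),
        (1, 2, "2009-12-02 12:15"),
        (1, 3, "2009-12-02 18:15"),
    ],
)


def first_per_status(direction):
    # direction is interpolated into the SQL, so only the two literals are allowed.
    assert direction in ("ASC", "DESC")
    sql = f"""
        WITH CTE AS (
            SELECT TaskId, StatusCode, StatusTimeStamp,
                   ROW_NUMBER() OVER (PARTITION BY TaskId, StatusCode
                                      ORDER BY StatusTimeStamp {direction}) AS RNUM
            FROM T
        )
        SELECT TaskId, StatusCode, StatusTimeStamp
        FROM CTE
        WHERE RNUM = 1
        ORDER BY StatusCode
    """
    return conn.execute(sql).fetchall()


earliest = first_per_status("ASC")
latest = first_per_status("DESC")
print(earliest)
print(latest)
```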

Related

SQL - Attain Previous Transaction Information [duplicate]

I need to calculate the difference of a column between two lines of a table. Is there any way I can do this directly in SQL? I'm using Microsoft SQL Server 2008.
I'm looking for something like this:
SELECT value - (previous.value) FROM table
Imagine that "previous" references the previously selected row. Of course, with a select like that I will end up with n-1 rows selected from a table with n rows; that's not a problem, in fact it's exactly what I need.
Is that possible in some way?

Use the LAG function (SQL Server 2012 and later):
SELECT value - lag(value) OVER (ORDER BY Id) FROM table
Sequences used for Ids can skip values, so Id-1 does not always work.
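A minimal sketch of the LAG approach, run on SQLite (3.25 or later) via Python's sqlite3; the table name `readings` and its values are hypothetical, with deliberate gaps in Id. Filtering out the NULL produced for the first row leaves the n-1 differences the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (Id INTEGER PRIMARY KEY, value INTEGER)")
# Gaps in Id: joining on Id - 1 would miss rows 5 and 9.
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 10), (2, 15), (5, 22), (9, 30)],
)

# LAG(value) is the previous row's value in Id order (NULL for the first row).
diffs = conn.execute(
    """
    SELECT Id, diff
    FROM (
        SELECT Id, value - LAG(value) OVER (ORDER BY Id) AS diff
        FROM readings
    ) AS d
    WHERE diff IS NOT NULL
    ORDER BY Id
    """
).fetchall()
print(diffs)  # [(2, 5), (5, 7), (9, 8)]
```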

SQL has no built-in notion of row order, so you need to order by some column for this to be meaningful. Something like this:
select t1.value - t2.value
from table t1
join table t2 on t2.primaryKey = t1.primaryKey - 1
If you know how to order the rows but cannot compute the previous row's key from the current one (e.g., you want to order alphabetically), then I don't know of a way to do that in standard SQL, but most SQL implementations have extensions for it.
Here is a way for SQL Server that works if you can order the rows so that each one is distinct:
select rank() over (order by id) as [Rank], value into temp1 from t
select t1.value - t2.value from temp1 t1
join temp1 t2 on t2.[Rank] = t1.[Rank] - 1
drop table temp1
If you need to break ties, you can add as many columns as necessary to the ORDER BY.

WITH CTE AS (
    SELECT
        rownum = ROW_NUMBER() OVER (ORDER BY columns_to_order_by),
        value
    FROM table
)
SELECT
    cur.value - prev.value
FROM CTE cur
INNER JOIN CTE prev ON prev.rownum = cur.rownum - 1
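The numbering-plus-self-join idea, sketched on SQLite (3.25 or later) via Python's sqlite3 with a hypothetical `readings` table ordered by Id. ROW_NUMBER produces gap-free numbers even though the Id values skip, so `cur.rownum - 1` always finds the previous row, and the inner join drops the first row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (Id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 10), (2, 15), (5, 22), (9, 30)],
)

rows = conn.execute(
    """
    WITH CTE AS (
        SELECT ROW_NUMBER() OVER (ORDER BY Id) AS rownum, value
        FROM readings
    )
    SELECT cur.value - prev.value AS diff
    FROM CTE AS cur
    INNER JOIN CTE AS prev ON prev.rownum = cur.rownum - 1
    ORDER BY cur.rownum
    """
).fetchall()
print(rows)  # [(5,), (7,), (8,)]
```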

Oracle, PostgreSQL, SQL Server and many more RDBMS engines have analytic functions called LAG and LEAD that do this very thing.
In SQL Server prior to 2012 you'd need to do the following:
SELECT m1.value - (
    SELECT TOP 1 m2.value
    FROM mytable m2
    WHERE m2.col1 < m1.col1 OR (m2.col1 = m1.col1 AND m2.pk < m1.pk)
    ORDER BY
        m2.col1 DESC, m2.pk DESC
)
FROM mytable m1
ORDER BY
    col1, pk
where col1 is the column you are ordering by and pk breaks ties. The subquery must sort descending so that TOP 1 returns the nearest earlier row rather than the first row of the table.
Having an index on (col1, pk) will greatly improve this query.
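A sketch of the correlated-subquery approach on SQLite via Python's sqlite3, where LIMIT 1 stands in for SQL Server's TOP 1. The data is invented; rows 2 and 3 tie on col1, so pk decides which of them comes first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (pk INTEGER PRIMARY KEY, col1 INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?)",
    [(1, 10, 100), (2, 20, 130), (3, 20, 135), (4, 30, 150)],
)
# The index suggested in the answer for the correlated lookup.
conn.execute("CREATE INDEX ix_mytable_col1_pk ON mytable (col1, pk)")

# Every earlier row in (col1, pk) order qualifies; sorting them descending
# puts the nearest one first, and LIMIT 1 takes it.
rows = conn.execute(
    """
    SELECT m1.pk,
           m1.value - (
               SELECT m2.value
               FROM mytable AS m2
               WHERE m2.col1 < m1.col1 OR (m2.col1 = m1.col1 AND m2.pk < m1.pk)
               ORDER BY m2.col1 DESC, m2.pk DESC
               LIMIT 1
           ) AS diff
    FROM mytable AS m1
    ORDER BY m1.col1, m1.pk
    """
).fetchall()
print(rows)  # [(1, None), (2, 30), (3, 5), (4, 15)]
```

With an ascending sort in the subquery, rows 3 and 4 would instead be compared with row 1 (differences 35 and 50).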

LEFT JOIN the table to itself, with the join condition worked out so the row matched in the joined version of the table is one row previous, for your particular definition of "previous".
Update: At first I thought you would want to keep all rows, with NULLs where there is no previous row. Reading the question again, you want those rows culled, so you should use an inner join rather than a left join.
Update:
Newer versions of SQL Server (2012 and later) also have the LAG and LEAD window functions, which can be used for this too.

select t2.col from (
    select col, MAX(ID) id from
    (
        select ROW_NUMBER() over (PARTITION by col order by col) id, col from testtab t1
    ) as t1
    group by col
) as t2

The selected answer will only work if there are no gaps in the sequence. However, if you are using an autogenerated id, there are likely to be gaps in the sequence, for example from inserts that were rolled back.
This method should work if you have gaps:
declare @temp table (value int, primaryKey int, tempid int identity)
insert into @temp (value, primaryKey)
select value, primaryKey from mytable order by primaryKey
select t1.value - t2.value from @temp t1
join @temp t2
on t2.tempid = t1.tempid - 1

Another way to refer to the previous row in an SQL query is to use a recursive common table expression (CTE):
CREATE TABLE t (counter INTEGER);
INSERT INTO t VALUES (1),(2),(3),(4),(5);
WITH cte(counter, previous, difference) AS (
-- Anchor query
SELECT MIN(counter), 0, MIN(counter)
FROM t
UNION ALL
-- Recursive query
SELECT t.counter, cte.counter, t.counter - cte.counter
FROM t JOIN cte ON cte.counter = t.counter - 1
)
SELECT counter, previous, difference
FROM cte
ORDER BY counter;
Result:
counter  previous  difference
1        0         1
2        1         1
3        2         1
4        3         1
5        4         1
The anchor query generates the first row of the common table expression cte: it sets counter to the smallest value of t.counter, previous to 0, and difference to that same smallest value.
The recursive query joins each row already in cte to the row of table t whose counter is exactly one greater. In each new row, t.counter is the current value, cte.counter (returned as previous) is the counter from the preceding row of cte, and t.counter - cte.counter is the difference between the two.
Note that a recursive CTE is more flexible than the LAG and LEAD functions because a row can refer to any arbitrary result of a previous row. (A recursive function or process is one where the input of the process is the output of the previous iteration of that process, except the first input which is a constant.)
I tested this query at SQLite Online.
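Because the recursive step joins on cte.counter = t.counter - 1, it assumes the counter values are consecutive integers. This sketch runs the same query through Python's sqlite3 on the invented values 1, 2, 4, 5: the recursion finds no counter 3, so it stops after 2 and rows 4 and 5 never appear. (SQLite accepts WITH RECURSIVE; SQL Server uses plain WITH.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (counter INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (4,), (5,)])  # 3 is missing

rows = conn.execute(
    """
    WITH RECURSIVE cte(counter, previous, difference) AS (
        -- Anchor query
        SELECT MIN(counter), 0, MIN(counter)
        FROM t
        UNION ALL
        -- Recursive query: only matches a counter exactly one greater
        SELECT t.counter, cte.counter, t.counter - cte.counter
        FROM t JOIN cte ON cte.counter = t.counter - 1
    )
    SELECT counter, previous, difference
    FROM cte
    ORDER BY counter
    """
).fetchall()
print(rows)  # [(1, 0, 1), (2, 1, 1)]
```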

You can use the following function (SQL Server 2012 and later) to get the current row's value and the previous row's value:
SELECT value,
    min(value) over (order by id rows between 1 preceding and 1 preceding) as value_prev
FROM table
Then you can select value - value_prev from that query to get your answer.
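A sketch of the one-row frame trick on SQLite (3.25 or later) via Python's sqlite3, with a hypothetical `readings` table. The frame covers only the row before the current one, so MIN over it is simply that row's value; for the first row the frame is empty and the result is NULL, matching LAG.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (Id INTEGER PRIMARY KEY, value INTEGER)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 10), (2, 15), (5, 22), (9, 30)],
)

rows = conn.execute(
    """
    SELECT Id, value, value_prev, value - value_prev AS diff
    FROM (
        SELECT Id, value,
               MIN(value) OVER (ORDER BY Id
                                ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS value_prev
        FROM readings
    ) AS w
    ORDER BY Id
    """
).fetchall()

# The same previous values via LAG, for comparison.
lag_values = [
    r[0] for r in conn.execute("SELECT LAG(value) OVER (ORDER BY Id) FROM readings ORDER BY Id")
]
print(rows)
```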

T-SQL Selecting TOP 1 In A Query With Aggregates/Groups

I'm still fairly new to SQL. This is a stripped-down version of the query I'm trying to run. This query is supposed to find those customers with more than 3 cases and display either the top 1 case or all cases, while still showing the overall count of cases per customer in each row, in addition to all the case numbers.
The TOP 1 subquery approach didn't work but is there another way to get the results I need? Hope that makes sense.
Here's the code:
SELECT t1.StoreID, t1.CustomerID, t2.LastName, t2.FirstName
,COUNT(t1.CaseNo) AS CasesCount
,(SELECT TOP 1 t1.CaseNo)
FROM MainDatabase t1
INNER JOIN CustomerDatabase t2
ON t1.StoreID = t2.StoreID
WHERE t1.SubmittedDate >= '01/01/2017' AND t1.SubmittedDate <= '05/31/2017'
GROUP BY t1.StoreID, t1.CustomerID, t2.LastName, t2.FirstName
HAVING COUNT (t1.CaseNo) >= 3
ORDER BY t1.StoreID, t1.PatronID
I would like it to look something like this, either one row with just the most recent case and detail or several rows showing all details of each case in addition to the store id, customer id, last name, first name, and case count.
Data Example

For these I usually like to make a temp table of aggregates:
-- DROP TABLE IF EXISTS requires SQL Server 2016 or later.
DROP TABLE IF EXISTS #tmp;
CREATE TABLE #tmp (
    CustomerID int NOT NULL DEFAULT 0,
    case_count int NOT NULL DEFAULT 0,
    case_max int NOT NULL DEFAULT 0
);
INSERT INTO #tmp
    (CustomerID, case_count, case_max)
SELECT CustomerID, COUNT(CaseNo), MAX(CaseNo)
FROM MainDatabase
GROUP BY CustomerID;
Then you can join this #tmp table back to any other table on which you want to display the number of cases or the max case number. And you can limit it to customers that have more than 3 cases with WHERE case_count > 3.
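A sketch of the aggregate-then-join-back pattern on SQLite via Python's sqlite3, where a TEMP table plays the role of #tmp. The simplified MainDatabase (only CustomerID and CaseNo) and its rows are invented, and treating the highest CaseNo as the most recent case is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MainDatabase (CustomerID INTEGER, CaseNo INTEGER)")
conn.executemany(
    "INSERT INTO MainDatabase VALUES (?, ?)",
    [(1, 101), (1, 102), (1, 103), (1, 104),
     (2, 201), (2, 202),
     (3, 301), (3, 302), (3, 303)],
)

# Per-customer aggregates, computed once.
conn.execute(
    """
    CREATE TEMP TABLE tmp AS
    SELECT CustomerID, COUNT(CaseNo) AS case_count, MAX(CaseNo) AS case_max
    FROM MainDatabase
    GROUP BY CustomerID
    """
)

# Option 1: every case, each row carrying the customer's total count.
all_cases = conn.execute(
    """
    SELECT m.CustomerID, m.CaseNo, t.case_count
    FROM MainDatabase AS m
    JOIN tmp AS t ON t.CustomerID = m.CustomerID
    WHERE t.case_count > 3
    ORDER BY m.CustomerID, m.CaseNo
    """
).fetchall()

# Option 2: one row per customer, the case with the highest number.
latest_case = conn.execute(
    """
    SELECT m.CustomerID, m.CaseNo, t.case_count
    FROM MainDatabase AS m
    JOIN tmp AS t ON t.CustomerID = m.CustomerID AND m.CaseNo = t.case_max
    WHERE t.case_count > 3
    """
).fetchall()
print(all_cases)
print(latest_case)
```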

Order By A Value In Another Field

I have a job definition table with example data, shown below, that needs to be sorted in such a way that records that have a NextJobDefinitionID > 0 are kept together. The sort order for records where the NextJobDefinitionID = 0 does not matter. In the example the record with the JobName of "M1 P1" must follow "M1 Pre-Roll" and "M1 Pre-Roll" must follow "M1 Recurring Benefits". I am using SQL Server 2014.
Data:
My desired output would be:
M1 Recurring Benefits
M1 Pre-Roll
M1 P1

I believe this constructs the required ordering:
declare @t table (ID int, NextID int)
insert into @t (ID, NextID) values
(1,0),
(2,5),
(3,6),
(4,2),
(5,0),
(6,4)
;With Parents as (
    select ID, ID as ParentID, 0 as Depth, NextID
    from @t
    where ID not in (select NextID from @t)
    union all
    select p.NextID, p.ParentID, Depth+1, t.NextID
    from Parents p
    inner join
        @t t
    on
        p.NextID = t.ID
    where p.NextID != 0
)
select * from Parents
order by ParentID, Depth
It works by building a CTE that uses the rows which may be freely ordered as the base case and then follows the NextID values along each chain, keeping the original ParentID and increasing a Depth value, so that a simple ORDER BY can be applied at the end.
(Translating back to your original column/table/sample data left as an exercise for the reader, since as I say, I don't need the typing practice to transcribe it from an image)
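The same chain-following CTE, sketched on SQLite via Python's sqlite3 with the sample rows above; the table name `jobs` replaces the table variable and is hypothetical. Rows 1 and 3 start chains because no row points to them, and row 3's chain is followed through 6, 4, 2 and 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (ID INTEGER, NextID INTEGER)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?)",
    [(1, 0), (2, 5), (3, 6), (4, 2), (5, 0), (6, 4)],
)

rows = conn.execute(
    """
    WITH RECURSIVE Parents(ID, ParentID, Depth, NextID) AS (
        -- Base case: rows no other row points to start a chain.
        SELECT ID, ID, 0, NextID
        FROM jobs
        WHERE ID NOT IN (SELECT NextID FROM jobs)
        UNION ALL
        -- Follow NextID, keeping the chain's ParentID and counting depth.
        SELECT p.NextID, p.ParentID, p.Depth + 1, j.NextID
        FROM Parents AS p
        JOIN jobs AS j ON p.NextID = j.ID
        WHERE p.NextID != 0
    )
    SELECT ID, ParentID, Depth
    FROM Parents
    ORDER BY ParentID, Depth
    """
).fetchall()
order = [r[0] for r in rows]
print(order)  # [1, 3, 6, 4, 2, 5]
```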

If I correctly understand, you need something like this:
(select JobDefinitionID, FloatingJobID, JobName, NextJobDefinitionID from JobDefinitions
where NextJobDefinitionID <> 0)
UNION ALL
(select JobDefinitionID, FloatingJobID, JobName, 9223372036854775807 AS NextJobDefinitionID from JobDefinitions WHERE JobDefinitionID = (SELECT MAX(NextJobDefinitionID) FROM JobDefinitions))
order by NextJobDefinitionID

How to use 'Merge' to combine rows into a single one

Scenario:
I have a simplified version of a result set obtained from a series of complex joins. I have placed the result set in a temporary table. The result set consists of records of activity/activities in a day.
I need to combine the 2 rows that have the same date (merging a day's activities into a single row), so that the resulting result set would be:
I am trying to make this work:
Merge #temp as target
using #temp as source
on (target.Date = source.Date) and target.Writing is NULL
when matched then
update set target.Writing = source.Writing;
but I'm running into this error:
The MERGE statement attempted to UPDATE or DELETE the same row more
than once. This happens when a target row matches more than one source
row. A MERGE statement cannot UPDATE/DELETE the same row of the target
table multiple times. Refine the ON clause to ensure a target row
matches at most one source row, or use the GROUP BY clause to group
the source rows.
What code modifications can you suggest?

This should do it:
SELECT dfl.mydate, dfl.firststart, dfl.lastend, fa.ActivityA, sa.ActivityB
FROM
    (SELECT s.mydate, firststart, lastend FROM
        (SELECT mydate, MIN(starttime) AS firststart FROM target GROUP BY mydate) s
        INNER JOIN
        (SELECT mydate, MAX(EndTime) AS lastend FROM target GROUP BY mydate) e
        ON s.mydate = e.mydate) AS dfl
INNER JOIN
    target fa ON dfl.mydate = fa.mydate AND dfl.firststart = fa.starttime
INNER JOIN
    target sa ON dfl.mydate = sa.mydate AND dfl.lastend = sa.EndTime
Please note for my test I have called my table target and the columns: mydate, starttime, endtime, activitya and activityb.
No need to merge, a (relatively) simple select yields the results you want.
HTH
PS It helps when using time data to use a 24 hour clock. I have assumed by 5:00 you really meant 17:00
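A simpler variant of the same idea, sketched on SQLite via Python's sqlite3: instead of joining back to pick up each activity column, it collapses the activity flags with MAX, which ignores NULLs. This swaps in a different technique from the answer and assumes the activity columns only ever hold 'Y' or NULL; the table name `activity` and its rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Times are zero-padded 24-hour 'HH:MM' strings, so text MIN/MAX are chronological.
conn.execute(
    "CREATE TABLE activity (mydate TEXT, starttime TEXT, endtime TEXT, Reading TEXT, Writing TEXT)"
)
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?, ?, ?)",
    [
        ("08-01", "08:00", "17:00", "Y", None),
        ("08-02", "08:00", "12:00", "Y", None),
        ("08-02", "13:00", "17:00", None, "Y"),
    ],
)

# One row per date: earliest start, latest end, and any 'Y' flag wins.
merged = conn.execute(
    """
    SELECT mydate, MIN(starttime), MAX(endtime), MAX(Reading), MAX(Writing)
    FROM activity
    GROUP BY mydate
    ORDER BY mydate
    """
).fetchall()
print(merged)
```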

You don't need MERGE statement.
DECLARE @Test TABLE ([Id] int, [Date] nvarchar(10), [TimeIn] nvarchar(10), [TimeOut] nvarchar(10), [Reading] nvarchar(10), [Writeing] nvarchar(10))
INSERT INTO @Test
VALUES
(1,'08-01','8:00','5:00','Y',NULL),
(2,'08-02','8:00','5:00',NULL,'Y'),
(3,'08-02','5:00','12:00',NULL,'Y'),
(4,'08-03','8:00','5:00',NULL,'Y'),
(5,'08-04','1:00','5:00','Y',NULL),
(6,'08-04','5:00','7:00',NULL,'Y'),
(7,'08-04','7:00','10:00',NULL,'Y'),
(8,'08-04','10:00','13:00',NULL,'Y'),
(9,'08-05','8:00','5:00','Y',NULL)
;WITH CTE AS
(
    SELECT
        t1.[Date],
        t1.TimeIn,
        ISNULL(t2.TimeOut, t1.TimeOut) AS TimeOut,
        ROW_NUMBER() OVER (PARTITION BY t1.[Date] ORDER BY t1.Id) AS RowNumber
    FROM @Test AS t1
    LEFT OUTER JOIN @Test AS t2 ON t1.TimeOut = t2.TimeIn AND t1.[Date] = t2.[Date]
)
SELECT
    c.[Date],
    (SELECT c2.TimeIn FROM CTE AS c2 WHERE c2.[Date] = c.[Date] AND c2.RowNumber = MIN(c.RowNumber)) AS TimeIn,
    (SELECT c2.TimeOut FROM CTE AS c2 WHERE c2.[Date] = c.[Date] AND c2.RowNumber = MAX(c.RowNumber)) AS TimeOut
FROM CTE AS c
GROUP BY c.[Date]

You can use MERGE on tables that share a matching key column: identify the one or more columns that uniquely identify the row to be merged, so that each target row matches at most one source row.

SQL Select set of records from one table, join each record to top 1 record of second table matching 1 column, sorted by a column in the second table

This is my first question on here, so I apologize if I break any rules.
Here's the situation. I have a table that lists all the employees and the building to which they are assigned, plus training hours, with ssn as the id column, I have another table that list all the employees in the company, also with ssn, but including name, and other personal data. The second table contains multiple records for each employee, at different points in time. What I need to do is select all the records in the first table from a certain building, then get the most recent name from the second table, plus allow the result set to be sorted by any of the columns returned.
I have this in place, and it works fine, it is just very slow.
A very simplified version of the tables are:
table1 (ssn CHAR(9), buildingNumber CHAR(7), trainingHours DEC(5,2)) (7200 rows)
table2 (ssn CHAR(9), fName VARCHAR(20), lName VARCHAR(20), sequence INT) (708,000 rows)
The sequence column in table2 is a number that corresponds to a predetermined date for entering these records; the higher the number, the more recent the entry. It is common/expected that each employee has several records, but some may not have the most recent one (i.e. '8').
My SProc is:
@BuildingNumber CHAR(7), @SortField VARCHAR(25)
BEGIN
DECLARE @returnValue TABLE(ssn CHAR(9), buildingNumber CHAR(7), fname VARCHAR(20), lName VARCHAR(20), rowNumber INT)
INSERT INTO @returnValue(...)
SELECT(ssn,buildingNum,fname,lname,rowNum)
FROM SELECT(...,CASE @SortField Row_Number() OVER (PARTITION BY buildingNumber ORDER BY {sortField column} END AS RowNumber)
FROM table1 a
OUTER APPLY(SELECT TOP 1 fName,lName FROM table2 WHERE ssn = a.ssn ORDER BY sequence DESC) AS e
where buildingNumber = @BuildingNumber
SELECT * from @returnValue ORDER BY RowNumber
END
I have indexes for the following:
table1: buildingNumber(non-unique,nonclustered)
table2: sequence_ssn(unique,nonclustered)
Like I said this gets me the correct result set, but it is rather slow. Is there a better way to go about doing this?
It's not possible to change the database structure or the way table 2 operates. Trust me if it were it would be done. Are there any indexes I could make that would help speed this up?
I've looked at the execution plans: there is a clustered index scan on table2 (18%), then a compute scalar (0%), then an eager spool (59%), then a filter (0%), then a top N sort (14%).
That's 78% of the execution, so I know the cost is in the section that gets the names; I'm just not sure of a better (faster) way to do it.
The reason I'm asking is that table 1 needs to be updated with current data. This is done through a webpage with a radgrid control. It has a range, start index, all that, and it takes forever for the users to update their data.
I can change how the update process is done, but I thought I'd ask about the query first.
Thanks in advance.

I would approach this with window functions. The idea is to assign a sequence number to the records in the table with duplicates (I think table2), such that the most recent record for each ssn gets the value 1. Then just select that row as the most recent record:
select t1.*, t2.*
from table1 t1 join
     (select t2.*,
             row_number() over (partition by ssn order by sequence desc) as seqnum
      from table2 t2
     ) t2
     on t2.ssn = t1.ssn and t2.seqnum = 1
where t1.buildingNumber = @BuildingNumber;
My second suggestion is to use a user-defined function rather than a stored procedure:
create function XXX (
    @BuildingNumber char(7)
)
returns table as
return (
    select t1.ssn, t1.buildingNumber, t2.fName, t2.lName
    from table1 t1 join
         (select t2.*,
                 row_number() over (partition by ssn order by sequence desc) as seqnum
          from table2 t2
         ) t2
         on t2.ssn = t1.ssn and t2.seqnum = 1
    where t1.buildingNumber = @BuildingNumber
);
(This doesn't have the logic for the ordering because that doesn't seem to be the central focus of the question.)
You can then call it as:
select *
from dbo.XXX(<building number>);
EDIT:
The following may speed it up further, because you are only selecting a small(ish) subset of the employees:
select *
from (select t1.*, t2.fName, t2.lName, t2.sequence,
             row_number() over (partition by t1.ssn order by t2.sequence desc) as seqnum
      from table1 t1 join
           table2 t2
           on t2.ssn = t1.ssn
      where t1.buildingNumber = @BuildingNumber
     ) t
where seqnum = 1;
And, finally, I suspect that the following might be the fastest:
select t1.*, t2.*
from table1 t1 join
     table2 t2
     on t2.ssn = t1.ssn
where t1.buildingNumber = @BuildingNumber and
      t2.sequence = (select max(sequence) from table2 t2a where t2a.ssn = t1.ssn)
In all these cases, an index on table2(ssn, sequence) should help performance.
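A small sketch comparing the ROW_NUMBER variant with the correlated MAX(sequence) variant on SQLite (3.25 or later) via Python's sqlite3. The rows are invented, the `building` variable stands in for @BuildingNumber, and the index is the one suggested above; both queries should return each employee's most recent name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ssn TEXT, buildingNumber TEXT, trainingHours REAL)")
conn.execute("CREATE TABLE table2 (ssn TEXT, fName TEXT, lName TEXT, sequence INTEGER)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [("111", "B1", 10.0), ("222", "B1", 5.0), ("333", "B2", 8.0)],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?, ?)",
    [("111", "Ann", "Old", 1), ("111", "Ann", "New", 3),
     ("222", "Bob", "Smith", 2), ("333", "Cy", "Z", 1)],
)
conn.execute("CREATE INDEX ix_table2_ssn_sequence ON table2 (ssn, sequence)")

building = "B1"  # stands in for @BuildingNumber

# Variant 1: number each employee's rows newest-first and keep row 1.
by_row_number = conn.execute(
    """
    SELECT t1.ssn, t1.buildingNumber, t2.fName, t2.lName
    FROM table1 AS t1
    JOIN (
        SELECT ssn, fName, lName,
               ROW_NUMBER() OVER (PARTITION BY ssn ORDER BY sequence DESC) AS seqnum
        FROM table2
    ) AS t2 ON t2.ssn = t1.ssn AND t2.seqnum = 1
    WHERE t1.buildingNumber = ?
    ORDER BY t1.ssn
    """,
    (building,),
).fetchall()

# Variant 2: correlated MAX(sequence) per employee.
by_max = conn.execute(
    """
    SELECT t1.ssn, t1.buildingNumber, t2.fName, t2.lName
    FROM table1 AS t1
    JOIN table2 AS t2 ON t2.ssn = t1.ssn
    WHERE t1.buildingNumber = ?
      AND t2.sequence = (SELECT MAX(sequence) FROM table2 AS t2a WHERE t2a.ssn = t1.ssn)
    ORDER BY t1.ssn
    """,
    (building,),
).fetchall()
print(by_row_number)  # [('111', 'B1', 'Ann', 'New'), ('222', 'B1', 'Bob', 'Smith')]
```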

Try using temp tables instead of the table variables. I'm not sure what kind of system you are working on, but I have had pretty good luck with this. Unlike table variables, temp tables get column statistics, which can give the optimizer better row estimates; depending on other system usage, this might do the trick.
Simply define the temp table using #Tablename instead of @Tablename. Put the name-sorting subquery in a temp table before everything else runs, and join to it.
Just make sure to drop the table at the end. A local temp table created in a stored procedure is dropped automatically when the procedure finishes, but it is a good idea to drop it explicitly to be on the safe side.
