How to compare records in the same SQL Server table - sql-server

My requirement is to compare each column of a row with the same column in the previous row:
Compare row 2 with row 1
Compare row 3 with row 2
Also, if there is no difference, I need to make that column NULL. For example, request_status_id of row 3 is the same as that of row 2, so I need to update request_status_id of row 3 to NULL.
Is there a clean way to do this?

You can use the following UPDATE statement, which employs the LAG window function available from SQL Server 2012 onwards:
UPDATE m
SET request_status_id = NULL
FROM #mytable AS m
INNER JOIN (
    SELECT payment_history_id, request_status_id,
           LAG(request_status_id) OVER (ORDER BY payment_history_id) AS prevRequest_status_id
    FROM #mytable ) t
ON m.payment_history_id = t.payment_history_id
WHERE t.request_status_id = t.prevRequest_status_id
SQL Fiddle Demo here
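If you want to try the statement locally, a minimal setup might look like this (the column types and sample values are assumptions based on the column names used in the query above):
CREATE TABLE #mytable (
    payment_history_id INT PRIMARY KEY,
    request_status_id  INT
);

INSERT INTO #mytable (payment_history_id, request_status_id)
VALUES (1, 10),
       (2, 10),  -- same status as row 1, will be set to NULL
       (3, 20);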
EDIT:
It seems the requirement of the OP is to set every column of the table to NULL when the previous value is the same as the current value. In this case the query becomes a bit more verbose. Here is an example with two columns being set; it can easily be expanded to incorporate any other column of the table:
UPDATE m
SET request_status_id = CASE WHEN t.request_status_id = t.prevRequest_status_id THEN NULL
                             ELSE t.request_status_id
                        END,
    request_entity_id = CASE WHEN t.request_entity_id = t.prevRequest_entity_id THEN NULL
                             ELSE t.request_entity_id
                        END
FROM #mytable AS m
INNER JOIN (
    SELECT payment_history_id, request_status_id, request_entity_id,
           LAG(request_status_id) OVER (ORDER BY payment_history_id) AS prevRequest_status_id,
           LAG(request_entity_id) OVER (ORDER BY payment_history_id) AS prevRequest_entity_id
    FROM #mytable ) t
ON m.payment_history_id = t.payment_history_id
SQL Fiddle Demo here

Related

SQL - Attain Previous Transaction Information [duplicate]

I need to calculate the difference of a column between two rows of a table. Is there any way I can do this directly in SQL? I'm using Microsoft SQL Server 2008.
I'm looking for something like this:
SELECT value - (previous.value) FROM table
Imagine that the "previous" variable references the last selected row. Of course, with a select like that I will end up with n-1 rows selected from a table with n rows; that's not a problem, it is actually exactly what I need.
Is that possible in some way?
Use the lag function:
SELECT value - lag(value) OVER (ORDER BY Id) FROM table
Sequences used for Ids can skip values, so Id-1 does not always work.
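A small sketch showing that LAG still pairs each row with its true predecessor even when the Id sequence has gaps (the table name and values here are made up):
CREATE TABLE #gaps (Id INT PRIMARY KEY, value INT);
INSERT INTO #gaps (Id, value) VALUES (1, 10), (2, 15), (5, 40), (9, 41);

SELECT Id,
       value - LAG(value) OVER (ORDER BY Id) AS diff  -- NULL for the first row; correct even across the 2 -> 5 gap
FROM #gaps;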
SQL has no built-in notion of order, so you need to order by some column for this to be meaningful. Something like this:
select t1.value - t2.value from table t1, table t2
where t1.primaryKey = t2.primaryKey - 1
If you know how to order things but not how to get the previous value given the current one (EG, you want to order alphabetically) then I don't know of a way to do that in standard SQL, but most SQL implementations will have extensions to do it.
Here is a way for SQL Server that works if you can order rows such that each one is distinct:
select rank() OVER (ORDER BY id) as 'Rank', value into temp1 from t
select t1.value - t2.value from temp1 t1, temp1 t2
where t1.Rank = t2.Rank - 1
drop table temp1
If you need to break ties, you can add as many columns as necessary to the ORDER BY.
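For example, a hedged variation of the statement above with extra tie-breaking columns (the added column names are only illustrative):
select rank() OVER (ORDER BY id, created_on, pk) as 'Rank', value into temp1 from t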
WITH CTE AS (
    SELECT
        rownum = ROW_NUMBER() OVER (ORDER BY columns_to_order_by),
        value
    FROM table
)
SELECT
    cur.value - prev.value
FROM CTE cur
INNER JOIN CTE prev ON prev.rownum = cur.rownum - 1
Oracle, PostgreSQL, SQL Server and many more RDBMS engines have analytic functions called LAG and LEAD that do this very thing.
In SQL Server prior to 2012 you'd need to do the following:
SELECT value - (
        SELECT TOP 1 value
        FROM mytable m2
        WHERE m2.col1 < m1.col1 OR (m2.col1 = m1.col1 AND m2.pk < m1.pk)
        ORDER BY
            col1 DESC, pk DESC
        )
FROM mytable m1
ORDER BY
    col1, pk
where COL1 is the column you are ordering by. (Note the inner ORDER BY is descending, so that TOP 1 picks the immediately preceding row rather than the first row of the table.)
Having an index on (COL1, PK) will greatly improve this query.
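For example (the index name is assumed; table and column names are taken from the query above):
CREATE INDEX IX_mytable_col1_pk ON mytable (col1, pk);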
LEFT JOIN the table to itself, with the join condition worked out so the row matched in the joined version of the table is one row previous, for your particular definition of "previous".
Update: At first I was thinking you would want to keep all rows, with NULLs for the condition where there was no previous row. Reading it again, you just want those rows culled, so you should use an inner join rather than a left join.
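A minimal sketch of that idea, assuming an integer id column defines what "previous" means (the names are illustrative):
SELECT cur.value - prev.value AS diff
FROM mytable AS cur
INNER JOIN mytable AS prev
    ON prev.id = cur.id - 1;  -- the joined row is the one immediately before the current row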
Update:
Newer versions of SQL Server also have the LAG and LEAD windowing functions that can be used for this, too.
select t2.col from (
    select col, MAX(ID) id from (
        select ROW_NUMBER() over (PARTITION by col order by col) id, col
        from testtab t1) as t1
    group by col) as t2
The selected answer will only work if there are no gaps in the sequence. However, if you are using an autogenerated id, there are likely to be gaps in the sequence due to inserts that were rolled back.
This method should work if you have gaps:
create table #temp (value int, primaryKey int, tempid int identity)
insert into #temp (value, primaryKey)
select value, primaryKey from mytable order by primaryKey
select t1.value - t2.value from #temp t1
join #temp t2
on t1.tempid = t2.tempid - 1
Another way to refer to the previous row in an SQL query is to use a recursive common table expression (CTE):
CREATE TABLE t (counter INTEGER);
INSERT INTO t VALUES (1),(2),(3),(4),(5);
WITH cte(counter, previous, difference) AS (
-- Anchor query
SELECT MIN(counter), 0, MIN(counter)
FROM t
UNION ALL
-- Recursive query
SELECT t.counter, cte.counter, t.counter - cte.counter
FROM t JOIN cte ON cte.counter = t.counter - 1
)
SELECT counter, previous, difference
FROM cte
ORDER BY counter;
Result:
counter | previous | difference
--------+----------+-----------
1       | 0        | 1
2       | 1        | 1
3       | 2        | 1
4       | 3        | 1
5       | 4        | 1
The anchor query generates the first row of the common table expression cte, where it sets cte.counter to the value of t.counter in the first row of table t, cte.previous to 0, and cte.difference to that same first value of t.counter.
The recursive query joins each row of common table expression cte to the previous row of table t. In the recursive query, cte.counter refers to t.counter in each row of table t, cte.previous refers to cte.counter in the previous row of cte, and t.counter - cte.counter refers to the difference between these two columns.
Note that a recursive CTE is more flexible than the LAG and LEAD functions because a row can refer to any arbitrary result of a previous row. (A recursive function or process is one where the input of the process is the output of the previous iteration of that process, except the first input which is a constant.)
I tested this query at SQLite Online.
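For instance, a running value carried through the recursion is something LAG alone cannot express directly; here is a sketch against the same table t, computing a running product:
WITH cte(counter, running_product) AS (
    SELECT MIN(counter), MIN(counter)
    FROM t
    UNION ALL
    SELECT t.counter, cte.running_product * t.counter
    FROM t JOIN cte ON cte.counter = t.counter - 1
)
SELECT counter, running_product
FROM cte
ORDER BY counter;  -- 1, 2, 6, 24, 120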
You can use the following window function to get the current row value and the previous row value:
SELECT value,
       MIN(value) OVER (ORDER BY id ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS value_prev
FROM table
Then you can just select value - value_prev from that select and get your answer.
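For example, wrapped in a derived table (a sketch using mytable as a stand-in for your table; the outer WHERE drops the first row, which has no predecessor):
SELECT value - value_prev AS diff
FROM (
    SELECT value,
           MIN(value) OVER (ORDER BY id ROWS BETWEEN 1 PRECEDING AND 1 PRECEDING) AS value_prev
    FROM mytable
) AS x
WHERE value_prev IS NOT NULL;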

Why is TRY_PARSE so slow?

I have this query that basically returns (right now) only 10 rows as results:
select *
FROM Table1 as o
inner join Table2 as t on t.Field1 = o.Field2
where Code = 123456 and t.FakeData is not null
Now, if I want to parse the field FakeData (which, unfortunately, can contain different types of data, from DateTime to Surname etc.; it is an nvarchar(70)) for display and/or filtering:
select *, TRY_PARSE(t.FakeData as date USING 'en-GB') as RealDate
FROM Table1 as o
inner join Table2 as t on t.Field1 = o.Field2
where Code = 123456 and t.FakeData is not null
The query then takes 10x as long to execute.
What am I doing wrong? How can I speed it up?
I can't edit the database; I'm just a customer who reads the data.
The TSQL documentation for TRY_PARSE makes the following observation:
Keep in mind that there is a certain performance overhead in parsing the string value.
NB: I am assuming your typical date format would be dd/mm/yyyy.
The following is something of a shot in the dark that might help. By progressively assessing whether the nvarchar column is a candidate for a date, it is possible to reduce the number of calls to that function. Note that a data point established in one APPLY can then be referenced in a subsequent APPLY:
CREATE TABLE mytable(
FakeData NVARCHAR(60) NOT NULL
);
INSERT INTO mytable(FakeData) VALUES (N'oiwsuhd ouhw dcouhw oduch woidhc owihdc oiwhd cowihc');
INSERT INTO mytable(FakeData) VALUES (N'9603200-0297r2-0--824');
INSERT INTO mytable(FakeData) VALUES (N'12/03/1967');
INSERT INTO mytable(FakeData) VALUES (N'12/3/2012');
INSERT INTO mytable(FakeData) VALUES (N'3/3/1812');
INSERT INTO mytable(FakeData) VALUES (N'ohsw dciuh iuh pswiuh piwsuh cpiuwhs dcpiuhws ipdcu wsiu');
select
t.FakeData, oa3.RealDate
from mytable as t
outer apply (
select len(FakeData) as fd_len
) oa1
outer apply (
select case when oa1.fd_len > 10 then 0
when len(replace(FakeData,'/','')) + 2 = oa1.fd_len then 1
else 0
end as is_candidate
) oa2
outer apply (
select case when oa2.is_candidate = 1 then TRY_PARSE(t.FakeData as date USING 'en-GB') end as RealDate
) oa3
FakeData                                                 | RealDate
---------------------------------------------------------+-----------
oiwsuhd ouhw dcouhw oduch woidhc owihdc oiwhd cowihc     | null
9603200-0297r2-0--824                                    | null
12/03/1967                                               | 1967-03-12
12/3/2012                                                | 2012-03-12
3/3/1812                                                 | 1812-03-03
ohsw dciuh iuh pswiuh piwsuh cpiuwhs dcpiuhws ipdcu wsiu | null
db<>fiddle here
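As an alternative worth testing (not part of the original answer), TRY_CONVERT with style 103 (dd/mm/yyyy) also returns NULL on failure and tends to be cheaper than TRY_PARSE, which relies on .NET CLR-based parsing; whether it is acceptable depends on how strictly the column follows that one format:
select t.FakeData,
       TRY_CONVERT(date, t.FakeData, 103) as RealDate  -- style 103 = dd/mm/yyyy
from mytable as t;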

Issue with query in DB2

I was using the query below in SQL Server to update the table "TABLE" using the same table "TABLE". In SQL Server the query works fine, but in DB2 it fails. I'm not sure what change I need to make to this query so that it works in DB2.
The error I am getting in DB2 is
ExampleExceptionFormatter: exception message was: DB2 SQL Error:
SQLCODE=-204, SQLSTATE=42704
This is my input data, and there you can see that ENO 679 repeats in both round 3 and round 4.
My expected output is given below. Here I am taking the ID and round value from round 4 and updating row number 3 with the ID value from row number 4.
My requirement is to find the ENO values that exist in both round 3 and round 4 and update the values accordingly.
UPDATE TGT
SET TGT.ROUND = SRC.ROUND,
TGT.ID = SRC.ID
FROM TABLE TGT INNER JOIN TABLE SRC
ON TGT.ROUND='3' and SRC.ROUND='4' and TGT.ENO = SRC.ENO
Could someone help here please? I tried something like this, but it's not working:
UPDATE TABLE
SET ID = (SELECT t.ID
          FROM TABLE t, TABLE t2
          WHERE t.ENO = t2.ENO AND t.ROUND='4' AND t2.ROUND='3'
         ),
    ROUND = (SELECT t.ROUND
             FROM TABLE t, TABLE t2
             WHERE t.ENO = t2.ENO AND t.ROUND='4' AND t2.ROUND='3')
WHERE ROUND='3'
You may try this. I think the issue is that you are not relating your inner subquery to the outer main table:
UPDATE TABLE TB
SET TB.ID = (SELECT t.ID
             FROM TABLE t, TABLE t2
             WHERE TB.ENO = t.ENO ---- added this
               AND t.ENO = t2.ENO AND t.ROUND='4' AND t2.ROUND='3'
            ),
    TB.ROUND = (SELECT t.ROUND
                FROM TABLE t, TABLE t2
                WHERE TB.ENO = t.ENO --- added this
                  AND t.ENO = t2.ENO AND t.ROUND='4' AND t2.ROUND='3')
WHERE TB.ROUND='3'
Try this:
UPDATE MY_SAMPLE TGT
SET (ID, ROUND) = (SELECT ID, ROUND FROM MY_SAMPLE WHERE ENO = TGT.ENO AND ROUND = 4)
WHERE ROUND = 3 AND EXISTS (SELECT 1 FROM MY_SAMPLE WHERE ENO = TGT.ENO AND ROUND = 4);
The difference from yours is that the correlated subquery has to be a row subselect: it has to guarantee zero or one row (and will assign NULLs in case it returns zero rows). The EXISTS subquery excludes rows for which the correlated subquery would not return a row.
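A minimal sketch to try this out, with assumed column types and made-up ID values for the ENO 679 example from the question:
CREATE TABLE MY_SAMPLE (ENO INTEGER, ROUND INTEGER, ID INTEGER);
INSERT INTO MY_SAMPLE (ENO, ROUND, ID) VALUES (679, 3, 11), (679, 4, 12);
-- after the UPDATE, the former round-3 row for ENO 679 carries ID 12 and ROUND 4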

SQL Server copy rows to second table

I have a table for bookings (table_b) that has around 1.3M rows. A second table (table_s) is used to note when these rows need to be accessed by a separate application.
Currently there are triggers to make a record in table_s but this doesn't help with all existing data.
I believe I need a query that selects the rows that exist in table_b but not in table_s and then inserts a row for each one.
Here is my current syntax, but I don't think it has been formed correctly:
DECLARE #b_id [INT] = 0;
WHILE(1 = 1)
BEGIN
SELECT TOP 10
#b_id = MIN([b].[b_id])
FROM
[table_b] AS [b]
LEFT JOIN
[table_s] AS [s] ON [b].[b_id] = [s].[b_id]
WHERE
[s].[b_id] IS NULL;
IF #b_id IS NULL
BREAK;
INSERT INTO [table_s] ([b_id], [processed])
VALUES (#b_id, 0);
END;
Syntactically everything is fine, but there are some misconceptions present in your query:
select top 10 #b_id = MIN(b.b_id)
A variable can hold just one value; even though you select TOP 10, only a single value will be assigned to the variable. Your current approach will loop once for each non-existing record.
I don't think we need to split the insert into batches for a 1 million record insert. Try this way:
INSERT INTO table_s
(b_id,
processed)
SELECT b_id,
0
FROM table_b AS b
WHERE NOT EXISTS (SELECT 1
FROM table_s AS s
WHERE b.b_id = s.b_id)
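Should batching ever become necessary (for example, to keep transaction log growth in check on a much larger table), a hedged sketch of a set-based batch loop could look like this; the batch size is arbitrary:
WHILE 1 = 1
BEGIN
    INSERT INTO table_s (b_id, processed)
    SELECT TOP (10000) b.b_id, 0
    FROM table_b AS b
    WHERE NOT EXISTS (SELECT 1 FROM table_s AS s WHERE s.b_id = b.b_id);

    IF @@ROWCOUNT = 0 BREAK;
END;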

JOIN ON subselect returns what I want, but surrounding select is missing records when subselect returns NULL

I have a table where I am storing records with a Created_On date and a Last_Updated_On date. Each new record will be written with a Created_On, and each subsequent update writes a new row with the same Created_On, but an updated Last_Updated_On.
I am trying to design a query to return the newest row of each. What I have looks something like this:
SELECT
t1.[id] as id,
t1.[Store_Number] as storeNumber,
t1.[Date_Of_Inventory] as dateOfInventory,
t1.[Created_On] as createdOn,
t1.[Last_Updated_On] as lastUpdatedOn
FROM [UserData].[dbo].[StoreResponses] t1
JOIN (
SELECT
[Store_Number],
[Date_Of_Inventory],
MAX([Created_On]) co,
MAX([Last_Updated_On]) luo
FROM [UserData].[dbo].[StoreResponses]
GROUP BY [Store_Number],[Date_Of_Inventory]) t2
ON
t1.[Store_Number] = t2.[Store_Number]
AND t1.[Created_On] = t2.co
AND t1.[Last_Updated_On] = t2.luo
AND t1.[Date_Of_Inventory] = t2.[Date_Of_Inventory]
WHERE t1.[Store_Number] = 123
ORDER BY t1.[Created_On] ASC
The subselect works fine... I see X number of rows, grouped by Store_Number and Date_Of_Inventory, some of which have luo (Last_Updated_On) values of NULL. However, those rows in the sub-select where luo is NULL do not appear in the overall results. In other words, where I get 6 results in the sub-select, I only get 2 in the overall results, and they are only the rows where Last_Updated_On is not NULL.
So, as a test, I wrote the following:
SELECT 1 WHERE NULL = NULL
And got no results, but, when I run:
SELECT 1 WHERE 1 = 1
I get back a result of 1. It's as if SQL Server is not relating NULL to NULL.
How can I fix this? Why wouldn't two fields compare when both values are NULL?
You could use COALESCE (example assuming Store_Number is an integer):
ON
Coalesce(t1.[Store_Number],0) = Coalesce(t2.[Store_Number],0)
With ANSI-style NULL comparison, which is the default behaviour, NULL doesn't equal NULL.
You can change this (if your business case and your database design's usage of NULL require it) with the setting:
SET ansi_nulls off
Another basic alternative is to work around it using:
ON ((t1.[Store_Number] = t2.[Store_Number]) OR
(t1.[Store_Number] IS NULL AND t2.[Store_Number] IS NULL))
Executing your POC:
SET ansi_nulls off
SELECT 1 WHERE NULL = NULL
Returns:
1
This also works:
AND EXISTS (SELECT t1.Store_Number INTERSECT SELECT t2.Store_Number)
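Applied to the join in the question, the NULL-able comparisons might look like this (a sketch using the question's aliases):
ON t1.[Store_Number] = t2.[Store_Number]
AND t1.[Date_Of_Inventory] = t2.[Date_Of_Inventory]
AND EXISTS (SELECT t1.[Created_On] INTERSECT SELECT t2.co)
AND EXISTS (SELECT t1.[Last_Updated_On] INTERSECT SELECT t2.luo)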
