Compare two rows and identify columns whose values are different - sql-server

The Situation
We have an application where we store machine settings in a SQL table. When the user changes a parameter of the machine, we create a "revision", which means we insert a row into a table. This table has about 200 columns.
In our application, the user can look at each revision.
The Problem
We want to highlight the parameters that have changed since the last revision.
The Question
Is there an SQL-only way to get the column names of the differences between two rows?
An Example
ID | p_x | p_y | p_z
--------------------
11 | xxx | yyy | zzz
12 | xxy | yyy | zzy
The query should return p_x and p_z.
EDIT
The table has 200 columns, not rows...
MY WAY OUT
My intention was to find a "one-line SQL statement" for this problem.
From the answers below, I see it's a bigger undertaking in SQL.
As there is no short, SQL-only solution for this problem, solving it in the backend of our software (C#) is of course much easier!
But as this is not a real "answer" to my question, I won't mark it as answered.
Thanks for the help.

You say:
We want to highlight the parameters that have changed since the last revision.
This implies that you want the display (or report) to make the parameters that changed stand out.
If you're going to show all the parameters anyway, it would be a lot easier to do this programmatically in the front end. It would be a much simpler problem in a programming language. Unfortunately, not knowing what your front end is, I can't give you particular recommendations.
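To illustrate how small the front-end version is, here is a minimal sketch. It is written in Python rather than the asker's C#, and the function name `changed_columns` is hypothetical. It assumes each revision has already been fetched as a dictionary that maps column names to values, with `None` for NULL. Python's `!=` treats two `None` values as equal, so no extra NULL handling is needed:

```python
def changed_columns(old_row, new_row, ignore=("ID",)):
    """Return the names of columns whose values differ between two revisions.

    Both rows are dicts mapping column name -> value (None for SQL NULL).
    Columns listed in `ignore` (e.g. the revision ID) are skipped.
    """
    return [
        name
        for name in old_row
        if name not in ignore and old_row[name] != new_row.get(name)
    ]


# The two revisions from the question's example.
rev_11 = {"ID": 11, "p_x": "xxx", "p_y": "yyy", "p_z": "zzz"}
rev_12 = {"ID": 12, "p_x": "xxy", "p_y": "yyy", "p_z": "zzy"}
print(changed_columns(rev_11, rev_12))  # ['p_x', 'p_z']
```

The display layer can then highlight exactly the fields whose names appear in the returned list.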
If you really can't do it in the front end but have to receive this information in a query from the database (you did say "SQL-only"), you need to specify the format you'd like the data in. A single-column list of the columns that changed between the two records? A list of columns with a flag indicating which columns did or didn't change?
But here's one way that would work, though in the process it converts all your fields to nvarchars before it does its comparison:
Use the technique described here (disclaimer: that's my blog) to transform your records into ID-name-value pairs.
Join the resulting data set to itself on ID, so that you can compare the values and print those that have changed:
with A as (
  -- We're going to return the record ID, plus an XML version of the
  -- entire record.
  select ID
       , (
           Select *
           from myTable
           where ID = pp.ID
           for xml auto, type) as X
  from myTable pp )
, B as (
  -- We're going to run an XML query against the XML field, and transform it
  -- into a series of name-value pairs. But X2 will still be a single XML
  -- field, associated with this ID.
  select ID
       , X.query(
           'for $f in myTable/@*
            return
            <data name="{ local-name($f) }" value="{ data($f) }" />
           ')
         as X2 from A
)
, C as (
  -- We're going to run the nodes() method against the X2 field, splitting
  -- our list of "data" elements into individual nodes. We will then use
  -- the value() method to extract the name and value.
  select B.ID as ID
       , norm.data.value('@name', 'nvarchar(max)') as Name
       , norm.data.value('@value', 'nvarchar(max)') as Value
  from B cross apply B.X2.nodes('/data') as norm(data))
-- Select our results. NULL columns are omitted from the XML, so a name that
-- exists on only one side of the full outer join also counts as a change.
select *
from ( select * from C where ID = 123) C1
full outer join ( select * from C where ID = 345) C2
  on C1.Name = C2.Name
where (C1.Value <> C2.Value or C1.Name is null or C2.Name is null)
  and coalesce(C1.Name, C2.Name) <> 'ID'
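You can try the same idea without SQL Server: turn each revision into name/value pairs, then join the two sets on the column name. The sketch below uses Python's built-in `sqlite3` module. SQLite has neither `FOR XML` nor `UNPIVOT`, so the pairs are built with `UNION ALL`, and SQLite's `IS NOT` operator stands in for a NULL-safe `<>`. The table and columns come from the question's example, and `changed_columns_sql` is a hypothetical name:

```python
import sqlite3

COLUMNS = ["p_x", "p_y", "p_z"]  # assumed column list, from the question's example


def changed_columns_sql(conn, id1, id2):
    # One SELECT per column turns every row into (ID, Name, Value) pairs.
    # CAST to TEXT mirrors the nvarchar conversion of the XML approach.
    pairs = " UNION ALL ".join(
        f"SELECT ID, '{c}' AS Name, CAST({c} AS TEXT) AS Value FROM myTable"
        for c in COLUMNS
    )
    sql = f"""
        WITH C AS ({pairs})
        SELECT C1.Name
        FROM C AS C1
        JOIN C AS C2 ON C1.Name = C2.Name
        WHERE C1.ID = ? AND C2.ID = ?
          AND C1.Value IS NOT C2.Value  -- NULL-safe inequality in SQLite
        ORDER BY C1.Name
    """
    return [name for (name,) in conn.execute(sql, (id1, id2))]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (ID INTEGER PRIMARY KEY, p_x TEXT, p_y TEXT, p_z TEXT)")
conn.executemany("INSERT INTO myTable VALUES (?, ?, ?, ?)",
                 [(11, "xxx", "yyy", "zzz"), (12, "xxy", "yyy", "zzy")])
print(changed_columns_sql(conn, 11, 12))  # ['p_x', 'p_z']
```

Unlike the XML version, `UNION ALL` keeps NULL values as rows, so an inner join on `Name` is enough here.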

You can use unpivot and pivot. The key is to transpose data so that you can use where [11] != [12].
WITH CTE AS (
SELECT *
FROM
(
SELECT ID, colName, val
FROM tblName
UNPIVOT
(
val
FOR colName IN ([p_x],[p_y],[p_z])
) unpiv
) src
PIVOT
(
MAX(val)
FOR ID IN ([11], [12])
) piv
)
SELECT colName
--SELECT *
FROM CTE WHERE [11] != [12]
If there are only a few columns in the table, it's easy to simply list [p_x],[p_y],[p_z], but it's obviously not convenient to type 50 or more columns. Even though you can use this trick to drag and drop, or copy/paste, the column names from the table, it's still bulky. For that, you can use the SELECT * EXCEPT strategy with dynamic SQL:
DECLARE @TSQL NVARCHAR(MAX), @colNames NVARCHAR(MAX)
-- Build the column list from the catalog; QUOTENAME protects unusual column names
SELECT @colNames = COALESCE(@colNames + ',' ,'') + QUOTENAME([name])
FROM sys.columns WHERE name <> 'ID' AND object_id = OBJECT_ID('tableName')
SET @TSQL = '
WITH CTE AS (
SELECT *
FROM
(
SELECT ID, colName, val
FROM tableName
UNPIVOT
(
val
FOR colName IN (' + @colNames + ')
) unpiv
) src
PIVOT
(
MAX(val)
FOR ID IN ([11], [12])
) piv
)
--SELECT colName
SELECT *
FROM CTE WHERE [11] != [12]
'
EXEC sp_executesql @TSQL

Here's one way using UNPIVOT:
;WITH
cte AS
(
SELECT CASE WHEN t1.p_x <> t2.p_x THEN 1 ELSE 0 END As p_x,
CASE WHEN t1.p_y <> t2.p_y THEN 1 ELSE 0 END As p_y,
CASE WHEN t1.p_z <> t2.p_z THEN 1 ELSE 0 END As p_z
FROM MyTable t1, MyTable t2
WHERE t1.ID = 11 AND t2.ID = 12 -- enter the two revisions to compare here
)
SELECT *
FROM cte
UNPIVOT (
Changed FOR ColumnName IN (p_x, p_y, p_z)
) upvt
WHERE upvt.Changed = 1
You have to add code to handle NULLs during the comparisons. You can also build the query dynamically if there are lots of columns in your table.
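A small sketch can illustrate both points: NULL handling, and generating the query for many columns. It uses Python's `sqlite3` module instead of SQL Server, so a few substitutions apply:

- `PRAGMA table_info` plays the role of `sys.columns` / `INFORMATION_SCHEMA.COLUMNS`.
- `IS NOT` replaces `<>`, so a change from or to NULL counts as a difference.
- The final "unpivot" is done in Python.

The function name `changed_flags` is hypothetical, and the data follows the question's example:

```python
import sqlite3


def changed_flags(conn, table, id1, id2, key="ID"):
    # Read the column names from the catalog instead of typing all 200 of them.
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")
            if row[1] != key]
    # One 0/1 flag per column; IS NOT is SQLite's NULL-safe "<>".
    flags = ", ".join(f'(t1."{c}" IS NOT t2."{c}") AS "{c}"' for c in cols)
    sql = (f'SELECT {flags} FROM "{table}" t1, "{table}" t2 '
           f'WHERE t1."{key}" = ? AND t2."{key}" = ?')
    row = conn.execute(sql, (id1, id2)).fetchone()
    # Keep only the columns whose flag is 1.
    return [c for c, changed in zip(cols, row) if changed]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER PRIMARY KEY, p_x TEXT, p_y TEXT, p_z TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?, ?)",
                 [(11, "xxx", "yyy", "zzz"), (12, "xxy", None, "zzz")])
print(changed_flags(conn, "MyTable", 11, 12))  # ['p_x', 'p_y']
```

With a plain `<>`, the `p_y` change from `'yyy'` to NULL would evaluate to NULL and be missed.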

For SQL Server 2012 and later, you can do something like this (duplicate it for each column):
SELECT iif((p_x != lead(p_x) over(ORDER BY ID)), -- compare with the next revision by ID
(SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'tbl'
AND
TABLE_SCHEMA='schema'
AND
ORDINAL_POSITION=2) -- p_x is the 2nd column (after ID) in the example table
,NULL)
FROM tbl
For SQL Server 2008, try:
DECLARE @x int = 11 -- first id
WHILE @x < (SELECT MAX(id) FROM tbl) -- assumes consecutive ids
BEGIN --comparison of two adjacent rows
if (SELECT p_x FROM tbl WHERE id=@x)!=(SELECT p_x FROM tbl WHERE id=@x+1)
BEGIN
SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'tbl' --insert your table
AND
TABLE_SCHEMA='schema' --insert your schema
AND
ORDINAL_POSITION=2 --column 'p_x' (after ID)
END
set @x=@x+1
END
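The window-function idea can be tried outside SQL Server. Each revision is compared with the one before it, ordered by `ID` rather than by the column value. Here is a rough sketch using Python's `sqlite3`, assuming SQLite 3.25 or later for `LAG`. The table `tbl` and its columns follow the question's example, and the third revision is made up. `IS NOT` makes the comparison NULL-safe, and the first revision is filtered out because it has no predecessor:

```python
import sqlite3

# Assumes SQLite 3.25+ (window functions), as bundled with current Python builds.
SQL = """
SELECT ID,
       (p_x IS NOT prev_x) AS p_x_changed,
       (p_y IS NOT prev_y) AS p_y_changed,
       (p_z IS NOT prev_z) AS p_z_changed
FROM (
    SELECT ID, p_x, p_y, p_z,
           LAG(ID)  OVER (ORDER BY ID) AS prev_id,
           LAG(p_x) OVER (ORDER BY ID) AS prev_x,
           LAG(p_y) OVER (ORDER BY ID) AS prev_y,
           LAG(p_z) OVER (ORDER BY ID) AS prev_z
    FROM tbl
)
WHERE prev_id IS NOT NULL  -- the first revision has nothing to compare with
ORDER BY ID
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (ID INTEGER PRIMARY KEY, p_x TEXT, p_y TEXT, p_z TEXT)")
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?)",
                 [(11, "xxx", "yyy", "zzz"),
                  (12, "xxy", "yyy", "zzy"),
                  (13, "xxy", None, "zzy")])
rows = conn.execute(SQL).fetchall()
print(rows)  # [(12, 1, 0, 1), (13, 0, 1, 0)]
```

A single set-based query like this replaces the row-by-row `WHILE` loop.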

Related

How to get only the columns that have at least one non-null value in a table existing in SQL server

I have a table named Product with the following data, for instance:
p_id | p_name | p_cat
---------------------
1    | shirt  | null
2    | null   | null
3    | cap    | null
Suppose I don't know the number of rows and columns in the table, nor which columns are completely null (no non-null value in any of their rows). How can I write a query that retrieves just the columns that have at least one non-null value in their rows? My approach is as follows, but it doesn't give the correct output:
select
column_name
into #TempColumns
from information_schema.columns
where
table_name = 'Product'
and table_schema = 'DDB'
declare @CurrentColumn nvarchar(max) = '', @IsNull bit, @NonNullCols nvarchar(max) = ''
declare Cur cursor for
select column_name from #TempColumns
open Cur
while 1=1
begin
fetch next from Cur into @CurrentColumn
select @IsNull = case when count(@CurrentColumn) > 0 then 0 else 1 end
from Product
if @IsNull = 1
begin
set @NonNullCols = @NonNullCols + ',' + @CurrentColumn
end
if @@fetch_status <> 0 break
end
close Cur
deallocate Cur
select @NonNullCols as NullColumns
drop table #TempColumns
Is there another approach, or a correction to my (T-SQL) query above? Thanks in advance.
First, I create a temporary table to store all the column names available in the Product table. Then I loop over this temporary table, fetch each row, and check against the Product table whether that column is completely null, using the count() function. The condition sets the bit variable to 1 if the column is completely null, and that column name is then appended to another variable, which is finally returned as the null columns.
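The cursor version most likely fails because `count(...)` is applied to the cursor variable (`CurrentColumn`). That variable holds a non-null column *name*, so the count is always the row count and never looks at the column's values. The column name has to be spliced into the query text (dynamic SQL) for `COUNT` to see the column.

The sketch below shows that idea with Python's `sqlite3` module, not SQL Server:

- `PRAGMA table_info` stands in for `information_schema.columns`.
- `COUNT(column)` counts only non-NULL values.
- All columns are counted in a single scan of the table.

The function name `non_null_columns` is hypothetical, and the data is the `Product` example above:

```python
import sqlite3


def non_null_columns(conn, table):
    cols = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    # COUNT(col) ignores NULLs, so 0 means the column is entirely NULL.
    # The column names are spliced into the SQL text (the "dynamic SQL" step).
    counts = ", ".join(f'COUNT("{c}")' for c in cols)
    row = conn.execute(f'SELECT {counts} FROM "{table}"').fetchone()
    return [c for c, n in zip(cols, row) if n > 0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (p_id INTEGER, p_name TEXT, p_cat TEXT)")
conn.executemany("INSERT INTO Product VALUES (?, ?, ?)",
                 [(1, "shirt", None), (2, None, None), (3, "cap", None)])
print(non_null_columns(conn, "Product"))  # ['p_id', 'p_name']
```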
Here is a conceptual example for you.
It uses SQL Server's XML and XQuery capabilities, without dynamic SQL or cursors/loops.
The algorithm is very simple.
When each row is converted into XML, columns that hold a NULL value are omitted from the XML.
SQL
USE tempdb;
GO
DROP TABLE IF EXISTS #tmpTable;
CREATE TABLE #tmpTable (
client_id int,
client_name varchar(500),
client_surname varchar(500),
city varchar(500),
state varchar(500));
INSERT #tmpTable VALUES
(1,'Miriam',NULL,'Las Vegas',NULL),
(2,'Astrid',NULL,'Chicago',NULL),
(3,'David',NULL,'Phoenix',NULL),
(4,'Hiroki',NULL,'Orlando',NULL);
SELECT DISTINCT x.value('local-name(.)', 'SYSNAME') AS NotNULLColumns
FROM #tmpTable AS t
CROSS APPLY (SELECT t.* FOR XML PATH(''), TYPE, ROOT('root')) AS t1(c)
CROSS APPLY c.nodes('/root/*') AS t2(x);
SQL #2
To handle edge cases.
SELECT DISTINCT x.value('local-name(.)', 'SYSNAME') AS NotNULLColumns
FROM #tmpTable AS t
CROSS APPLY (SELECT t.* FOR XML RAW, ELEMENTS, BINARY BASE64, TYPE, ROOT('root')) AS t1(c)
CROSS APPLY c.nodes('/root/row/*') AS t2(x);
Output
NotNULLColumns
city
client_id
client_name
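You can mimic the same "NULL columns simply disappear" idea outside SQL Server. Here is a minimal Python sketch, assuming rows arrive as dictionaries with `None` for NULL. The data below is the `#tmpTable` example, and `not_null_columns` is a hypothetical name. The sketch drops the `None` entries from each row, as `FOR XML` does with NULL columns, and collects the names that remain:

```python
def not_null_columns(rows):
    """Names of columns that are non-NULL in at least one row."""
    seen = set()
    for row in rows:
        # Like FOR XML, leave out every column whose value is NULL (None).
        seen.update(name for name, value in row.items() if value is not None)
    return sorted(seen)  # sorted to match the output shown above


rows = [
    {"client_id": 1, "client_name": "Miriam", "client_surname": None, "city": "Las Vegas", "state": None},
    {"client_id": 2, "client_name": "Astrid", "client_surname": None, "city": "Chicago", "state": None},
    {"client_id": 3, "client_name": "David", "client_surname": None, "city": "Phoenix", "state": None},
    {"client_id": 4, "client_name": "Hiroki", "client_surname": None, "city": "Orlando", "state": None},
]
print(not_null_columns(rows))  # ['city', 'client_id', 'client_name']
```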

Substring is slow with while loop in SQL Server

One of my table's columns stores ~650,000 characters (each value of the column contains an entire table). I know it's bad design; however, the client will not be able to change it.
I am tasked to convert the column into multiple columns.
I chose to use the dbo.DelimitedSplit8K function.
Unfortunately, it can only handle 8K characters at most.
So I decided to split the column into 81 batches of 8K using a WHILE loop and store them in a table variable (a temp or normal table made no improvement):
DECLARE @tab1 table ( serialnumber int, etext nvarchar(1000))
declare @scriptquan int = (select MAX(len (errortext)/8000) from mytable)
DECLARE @Counter INT
DECLARE @A bigint = 1
DECLARE @B bigint = 8000
SET @Counter=1
WHILE ( @Counter <= @scriptquan + 1)
BEGIN
insert into @tab1 select ItemNumber, Item from dbo.mytable cross apply dbo.DelimitedSplit8K(substring(errortext, @A, @B), CHAR(13)+CHAR(10))
SET @A = @A + 8000
SET @B = @B + 8000
SET @Counter = @Counter + 1
END
This is followed by the code below:
declare @tab2 table (Item nvarchar(max),itemnumber int, Colseq varchar(10)) -- declare table variable
;with cte as (
select [etext] ,ItemNumber, Item from @tab1 -- insert table name
cross apply dbo.DelimitedSplit8K(etext,' ')) -- insert table columns name that contains text
insert into @tab2 Select Item,itemnumber, 'a'+ cast (ItemNumber as varchar) colseq
from cte -- insert values to table variable
;WITH Tbl(item, colseq) AS(
select item, colseq from @tab2
),
CteRn AS(
SELECT item, colseq,
Rn = ROW_NUMBER() OVER(PARTITION BY colseq ORDER BY colseq)
FROM Tbl
)
SELECT
a1 Time,a2 Number,a3 Type,a4 Remarks
FROM CteRn r
PIVOT(
MAX(item)
FOR colseq IN(a1,a2,a3,a4)
)p
where a3 = 'error'
This gives the desired output. However, the loop alone takes 15 minutes to complete, and the overall query completes in 27 minutes. Is there any way I can make it faster? The table has only 2 rows in total, so I don't think an index can help.
The client uses Azure SQL Database, so I can't use PowerShell or Python to accomplish this either.
Please let me know if more information is needed. I tried my best to mention everything I could.
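Python is not available on the client's Azure SQL Database, so the following is not a replacement. It is only a small reference sketch of what the queries above compute, which can help when checking the SQL output on a sample:

1. Split the text on CR/LF into lines.
2. Split each line on spaces.
3. Map fields 1 to 4 to Time, Number, Type and Remarks.
4. Keep the lines whose Type is `error`.

The field layout is inferred from the `PIVOT` above, `parse_errors` is a hypothetical name, and the sample text is invented.

One detail worth checking in the loop: T-SQL's `SUBSTRING` takes a start position and a *length*. Growing the length argument by 8000 on every iteration therefore requests ever-longer substrings instead of sliding an 8,000-character window.

```python
def parse_errors(errortext):
    """Split the stored text into lines, each line into space-separated
    fields, and keep the lines whose third field (Type) is 'error'."""
    result = []
    for line in errortext.split("\r\n"):
        fields = line.split(" ")
        # Mirrors a1..a4 -> Time, Number, Type, Remarks in the PIVOT above.
        if len(fields) >= 4 and fields[2] == "error":
            time, number, type_, remarks = fields[:4]
            result.append((time, number, type_, remarks))
    return result


sample = "10:00 1 info started\r\n10:05 2 error disk\r\n10:07 3 error timeout"
print(parse_errors(sample))
# [('10:05', '2', 'error', 'disk'), ('10:07', '3', 'error', 'timeout')]
```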

TSQL Where clause based on temp table data

I have a straightforward SQL query that I am working with, and I'm trying to figure out the best way to approach the WHERE clause.
Essentially, there are two temp tables created and if there is data in the XML string passed to the stored procedure, those tables are populated.
My where clause needs to check these temp tables for data, and if there is no data, it ignores them like they are not there and fetches all data.
-- Create temp tables to hold our XML filter criteria
DECLARE @users AS TABLE (QID VARCHAR(10))
DECLARE @dls AS TABLE (dlName VARCHAR(50))
-- Insert our XML filters
IF @xml.exist('/root/data/users') > 0
BEGIN
INSERT INTO @users( QID )
SELECT ParamValues.x1.value('QID[1]', 'varchar(10)')
FROM @xml.nodes('/root/data/users/user') AS ParamValues(x1)
END
IF @xml.exist('/root/data/dls') > 0
BEGIN
INSERT INTO @dls( dlName )
SELECT ParamValues.x1.value('dlName[1]', 'varchar(50)')
FROM @xml.nodes('/root/data/dls/dl') AS ParamValues(x1)
END
-- Fetch our document details based on the XML provided
SELECT d.documentID ,
d.sopID ,
d.documentName ,
d.folderLocation ,
d.userGroup ,
d.notes
FROM dbo.Documents AS d
LEFT JOIN dbo.DocumentContacts AS dc
ON dc.documentID = d.documentID
LEFT JOIN dbo.DocumentContactsDLs AS dl
ON dl.documentID = d.documentID
-- How can I make these two logic checks work only if there is data, otherwise, include everything.
WHERE dc.QID IN (SELECT QID FROM @users)
AND dl.DL IN (SELECT dlName FROM @dls)
FOR XML PATH ('data'), ELEMENTS, TYPE, ROOT('root');
In the query above, I am trying to use the data in the temp tables only if there is data in them; otherwise, it needs to act as if that WHERE condition weren't there for that specific value and include records regardless.
Example: If only #users had data, it would ignore AND dl.DL IN (SELECT dlName FROM #dls) and get everything, regardless of what was in the DL column on those joined records.
Use NOT EXISTS to check whether the table variable contains any rows. Here is one way:
WHERE ( dc.QID IN (SELECT QID FROM @users)
OR NOT EXISTS (SELECT 1 FROM @users) )
AND ( dl.DL IN (SELECT dlName FROM @dls)
OR NOT EXISTS (SELECT 1 FROM @dls) )
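You can check the `OR NOT EXISTS` pattern quickly outside SQL Server. The sketch below uses Python's `sqlite3` with a trimmed-down schema and made-up rows. Ordinary tables named `users` and `dls` stand in for the table variables, and `matching_documents` is a hypothetical helper. The point is only to show that an empty filter table switches its condition off:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Documents (documentID INTEGER PRIMARY KEY, documentName TEXT);
    CREATE TABLE DocumentContacts (documentID INTEGER, QID TEXT);
    CREATE TABLE DocumentContactsDLs (documentID INTEGER, DL TEXT);
    CREATE TABLE users (QID TEXT);   -- stands in for the users table variable
    CREATE TABLE dls (dlName TEXT);  -- stands in for the dls table variable
    INSERT INTO Documents VALUES (1, 'doc A'), (2, 'doc B');
    INSERT INTO DocumentContacts VALUES (1, 'q1'), (2, 'q2');
    INSERT INTO DocumentContactsDLs VALUES (1, 'dl1'), (2, 'dl2');
""")

QUERY = """
SELECT DISTINCT d.documentID
FROM Documents AS d
LEFT JOIN DocumentContacts AS dc ON dc.documentID = d.documentID
LEFT JOIN DocumentContactsDLs AS dl ON dl.documentID = d.documentID
WHERE (dc.QID IN (SELECT QID FROM users) OR NOT EXISTS (SELECT 1 FROM users))
  AND (dl.DL IN (SELECT dlName FROM dls) OR NOT EXISTS (SELECT 1 FROM dls))
ORDER BY d.documentID
"""


def matching_documents(qids=(), dl_names=()):
    # Reset and refill the filter tables, as the procedure does from its XML.
    conn.execute("DELETE FROM users")
    conn.execute("DELETE FROM dls")
    conn.executemany("INSERT INTO users VALUES (?)", [(q,) for q in qids])
    conn.executemany("INSERT INTO dls VALUES (?)", [(n,) for n in dl_names])
    return [doc_id for (doc_id,) in conn.execute(QUERY)]


print(matching_documents())                 # [1, 2]  no filters: everything
print(matching_documents(qids=["q2"]))      # [2]     empty dls: DL filter ignored
print(matching_documents(["q1"], ["dl2"]))  # []      both filters apply
```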
Try this. But please note that I did not get a chance to test it properly and I believe that you want to check the values in #users first and if there is no record existing in that table, then you want to check with the entries in #dls. Also if there are no entries in both of these tables, then you want to skip both the tables.
DECLARE @fl tinyint = 0 -- tinyint, not bit: a bit variable cannot hold the value 2
SELECT @fl = CASE WHEN EXISTS (SELECT 1 FROM @users) THEN
1
WHEN EXISTS (SELECT 1 FROM @dls) THEN
2
ELSE
0
END
WHERE ( (dc.QID IN (SELECT QID FROM @users) AND @fl = 1)
OR
(dl.DL IN (SELECT dlName FROM @dls) AND @fl = 2)
OR (1=1 AND @fl = 0)
)

Create a comma separated string with numbers 1 to x, where x is read from the record

I have a table document with a field steps. This is an integer field and can contain a number between 1 and 1000.
Now a new field is added (followedsteps) which must contain the numbers from 1 to [the number from field steps], comma separated.
So when the field steps contains the number 5, I want this string 1,2,3,4,5 to be set in the new column followedsteps.
The field steps is not null-able, lowest value is 1.
Is there an (easy) way to do this?
It's a one time migration.
As you are going to perform this only once, it is better to generate the sequences first:
IF OBJECT_ID('tempdb..#DataSource') IS NOT NULL
BEGIN;
DROP TABLE #DataSource;
END;
CREATE TABLE #DataSource
(
[ID] INT
,[Sequence] VARCHAR(MAX)
);
DECLARE @MaximumID INT = 1000; -- in your case: SELECT MAX(steps) FROM document
WITH DataSource AS
(
SELECT 1 AS num
UNION ALL
SELECT num+1
FROM DataSource
WHERE num+1<=@MaximumID
)
INSERT INTO #DataSource
SELECT A.[num]
,DS.[Sequence]
FROM DataSource A
CROSS APPLY
(
SELECT STUFF
(
(
SELECT ',' + CAST(B.[num] AS VARCHAR(12))
FROM DataSource B
WHERE A.[num] >= B.[num]
ORDER BY B.[num]
FOR XML PATH(''), TYPE
).value('.', 'VARCHAR(MAX)')
,1
,1
,''
)
) DS ([Sequence])
option (maxrecursion 32767)
The code above creates a temporary table with the data you need to perform the update.
Then, in a transaction, perform the update by joining on [ID]:
BEGIN TRAN
UPDATE document
SET followedsteps = [Sequence]
FROM document A
INNER JOIN #DataSource B
ON A.[steps] = b.[id]
COMMIT TRAN
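Here is the same precompute-then-update idea as a Python sketch using `sqlite3`. The `document` table is a minimal stand-in with made-up rows. Each sequence is derived from the previous one by appending `,n`, so the whole lookup is built in one pass, and the update runs inside a single transaction (`with conn:` commits on success and rolls back on error):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document (id INTEGER PRIMARY KEY, steps INTEGER NOT NULL, followedsteps TEXT)")
conn.executemany("INSERT INTO document (id, steps) VALUES (?, ?)", [(1, 5), (2, 1), (3, 3)])

# Build every sequence once, each from the previous one: "1", "1,2", "1,2,3", ...
max_steps = conn.execute("SELECT MAX(steps) FROM document").fetchone()[0]
sequences, current = {}, ""
for n in range(1, max_steps + 1):
    current = f"{current},{n}" if current else str(n)
    sequences[n] = current

# One UPDATE per distinct steps value, all inside one transaction.
with conn:
    conn.executemany("UPDATE document SET followedsteps = ? WHERE steps = ?",
                     [(seq, n) for n, seq in sequences.items()])

rows = conn.execute("SELECT id, followedsteps FROM document ORDER BY id").fetchall()
print(rows)  # [(1, '1,2,3,4,5'), (2, '1'), (3, '1,2,3')]
```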

SQL Server: UPDATE a table by using ORDER BY

I would like to know if there is a way to use an order by clause when updating a table. I am updating a table and setting a consecutive number, that's why the order of the update is important. Using the following sql statement, I was able to solve it without using a cursor:
DECLARE @Number INT = 0
UPDATE Test
SET @Number = Number = @Number +1
Now what I'd like to do is add an ORDER BY clause, like so:
DECLARE @Number INT = 0
UPDATE Test
SET @Number = Number = @Number +1
ORDER BY Test.Id DESC
I've read "How to update and order by using ms sql". The solutions to that question do not solve the ordering problem; they just filter the items to which the update is applied.
Take care,
Martin
No.
Not a documented 100% supported way. There is an approach sometimes used for calculating running totals called "quirky update" that suggests that it might update in order of clustered index if certain conditions are met but as far as I know this relies completely on empirical observation rather than any guarantee.
But what version of SQL Server are you on? If SQL Server 2005+, you might be able to do something with ROW_NUMBER and a CTE (you can update the CTE):
With cte As
(
SELECT id,Number,
ROW_NUMBER() OVER (ORDER BY id DESC) AS RN
FROM Test
)
UPDATE cte SET Number=RN
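To show that the numbering comes from `ROW_NUMBER()` and not from the order in which rows happen to be updated, here is a sketch using Python's `sqlite3` (SQLite 3.25+ is needed for window functions). SQLite cannot update through a CTE the way SQL Server can, so the sketch uses a correlated subquery instead. The `Test` table mirrors the question's, with made-up ids inserted out of order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (Id INTEGER PRIMARY KEY, Number INTEGER)")
conn.executemany("INSERT INTO Test (Id) VALUES (?)", [(1,), (2,), (3,), (5,), (4,)])

# ROW_NUMBER() defines the numbering order explicitly, so the physical order
# in which the UPDATE visits rows does not matter.
conn.execute("""
    UPDATE Test
    SET Number = (
        SELECT rn
        FROM (SELECT Id, ROW_NUMBER() OVER (ORDER BY Id DESC) AS rn FROM Test) AS r
        WHERE r.Id = Test.Id
    )
""")

result = conn.execute("SELECT Id, Number FROM Test ORDER BY Id").fetchall()
print(result)  # [(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)]
```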
You cannot use ORDER BY as part of the UPDATE statement (you can use it in sub-selects that are part of the update).
UPDATE Test
SET Number = rowNumber
FROM Test
INNER JOIN
(SELECT ID, row_number() OVER (ORDER BY ID DESC) as rowNumber
FROM Test) drRowNumbers ON drRowNumbers.ID = Test.ID
Edit
The following solution could have problems when clustered indexes are involved, as mentioned here. Thanks to Martin for pointing this out.
The answer is kept to educate those (like me) who don't know all the side effects or ins and outs of SQL Server.
Expanding on the answer given by Quassnoi in your link, the following works:
DECLARE @Test TABLE (Number INTEGER, AText VARCHAR(2), ID INTEGER)
DECLARE @Number INT
INSERT INTO @Test VALUES (1, 'A', 1)
INSERT INTO @Test VALUES (2, 'B', 2)
INSERT INTO @Test VALUES (1, 'E', 5)
INSERT INTO @Test VALUES (3, 'C', 3)
INSERT INTO @Test VALUES (2, 'D', 4)
SET @Number = 0
;WITH q AS (
SELECT TOP 1000000 *
FROM @Test
ORDER BY
ID
)
UPDATE q
SET @Number = Number = @Number + 1
The row_number() function would be the best approach to this problem.
UPDATE T
SET T.Number = R.rowNum
FROM Test T
JOIN (
SELECT T2.id,row_number() over (order by T2.Id desc) rowNum from Test T2
) R on T.id=R.id
update based on Ordering by the order of values in a SQL IN() clause
Solution:
DECLARE @counter int
SET @counter = 0
;WITH q AS
(
select * from Products WHERE ID in (SELECT TOP (10) ID FROM Products WHERE ID IN( 3,2,1)
ORDER BY ID DESC)
)
update q set Display= @counter, @counter = @counter + 1
This updates based on descending 3,2,1
Hope this helps someone.
I had a similar problem and solved it using ROW_NUMBER() in combination with the OVER keyword. The task was to retrospectively populate a new TicketNo (integer) field in a simple table based on the original CreatedDate, and grouped by ModuleId - so that ticket numbers started at 1 within each Module group and incremented by date. The table already had a TicketID primary key (a GUID).
Here's the SQL:
UPDATE Tickets SET TicketNo=T2.RowNo
FROM Tickets
INNER JOIN
(select TicketID, TicketNo,
ROW_NUMBER() OVER (PARTITION BY ModuleId ORDER BY DateCreated) AS RowNo from Tickets)
AS T2 ON T2.TicketID = Tickets.TicketID
Worked a treat!
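For readers unfamiliar with `PARTITION BY`, the semantics of `ROW_NUMBER() OVER (PARTITION BY ModuleId ORDER BY DateCreated)` can be spelled out in a few lines of Python:

1. Sort by the partition key, then by date.
2. Restart the counter at 1 for each partition.

This is only an illustration, and `assign_ticket_numbers` is a hypothetical name. The dictionary keys mirror the columns named in the answer, and the ids (letters instead of GUIDs) and dates are made up. For tied dates, Python's stable sort keeps input order, whereas SQL Server's `ROW_NUMBER()` makes no such promise.

```python
from itertools import groupby
from operator import itemgetter


def assign_ticket_numbers(tickets):
    """Return {TicketID: TicketNo}, numbering from 1 within each ModuleId
    in DateCreated order."""
    ordered = sorted(tickets, key=itemgetter("ModuleId", "DateCreated"))
    numbers = {}
    for _, group in groupby(ordered, key=itemgetter("ModuleId")):
        for row_no, ticket in enumerate(group, start=1):
            numbers[ticket["TicketID"]] = row_no
    return numbers


tickets = [
    {"TicketID": "a", "ModuleId": 1, "DateCreated": "2020-01-03"},
    {"TicketID": "b", "ModuleId": 2, "DateCreated": "2020-01-01"},
    {"TicketID": "c", "ModuleId": 1, "DateCreated": "2020-01-01"},
    {"TicketID": "d", "ModuleId": 1, "DateCreated": "2020-01-02"},
]
numbers = assign_ticket_numbers(tickets)
print(numbers)  # {'c': 1, 'd': 2, 'a': 3, 'b': 1}
```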
I ran into the same problem and was able to resolve it in a very powerful way that allows unlimited sorting possibilities.
I created a View using (saving) 2 sort orders (*explanation on how to do so below).
After that I simply applied the update queries to the View created and it worked great.
Here are the 2 queries I used on the view:
1st Query:
Update MyView
Set SortID=0
2nd Query:
DECLARE @sortID int
SET @sortID = 0
UPDATE MyView
SET @sortID = sortID = @sortID + 1
*To be able to save the sort order in the View, I put TOP into the SELECT statement. This workaround allows the View's results to be returned in the order that was set when the View was created. In my case it looked like this:
(NOTE: Using this workaround will place a big load on the server if used on a large table, so it is recommended to include as few fields as possible in the view when working with large tables.)
SELECT TOP (600000)
dbo.Items.ID, dbo.Items.Code, dbo.Items.SortID, dbo.Supplier.Date,
dbo.Supplier.Code AS Expr1
FROM dbo.Items INNER JOIN
dbo.Supplier ON dbo.Items.SupplierCode = dbo.Supplier.Code
ORDER BY dbo.Supplier.Date, dbo.Items.ID DESC
Running: SQL Server 2005 on a Windows Server 2003
Additional Keywords: How to Update a SQL column with Ascending or Descending Numbers - Numeric Values / how to set order in SQL update statement / how to save order by in sql view / increment sql update / auto autoincrement sql update / create sql field with ascending numbers
SET @pos := 0;
UPDATE TABLE_NAME SET Roll_No = ( SELECT @pos := @pos + 1 ) ORDER BY First_Name ASC;
The query above simply updates the students' Roll_No column based on their First_Name, numbering from 1 to the number of records in the table. Note that this is MySQL syntax (user variables assigned with := and UPDATE ... ORDER BY); SQL Server supports neither.
IF OBJECT_ID('tempdb..#TAB') IS NOT NULL
BEGIN
DROP TABLE #TAB
END
CREATE TABLE #TAB(CH1 INT,CH2 INT,CH3 INT)
DECLARE @CH2 INT = NULL , @CH3 INT=NULL,@SPID INT=NULL,@SQL NVARCHAR(4000)='', @ParmDefinition NVARCHAR(50)= '',
@RET_MESSAGE AS VARCHAR(8000)='',@RET_ERROR INT=0
SET @ParmDefinition='@SPID INT,@CH2 INT OUTPUT,@CH3 INT OUTPUT'
SET @SQL='UPDATE T
SET CH1=@SPID,@CH2= T.CH2,@CH3= T.CH3
FROM #TAB T WITH(ROWLOCK)
INNER JOIN (
SELECT TOP(1) CH1,CH2,CH3
FROM
#TAB WITH(NOLOCK)
WHERE CH1 IS NULL
ORDER BY CH2 DESC) V ON T.CH2= V.CH2 AND T.CH3= V.CH3'
INSERT INTO #TAB
(CH2 ,CH3 )
SELECT 1,2 UNION ALL
SELECT 2,3 UNION ALL
SELECT 3,4
BEGIN TRY
WHILE EXISTS(SELECT TOP 1 1 FROM #TAB WHERE CH1 IS NULL)
BEGIN
EXECUTE @RET_ERROR = sp_executesql @SQL, @ParmDefinition,@SPID =@@SPID, @CH2=@CH2 OUTPUT,@CH3=@CH3 OUTPUT;
SELECT * FROM #TAB
SELECT @CH2,@CH3
END
END TRY
BEGIN CATCH
SET @RET_ERROR=ERROR_NUMBER()
SET @RET_MESSAGE = '@ERROR_NUMBER : ' + CAST(ERROR_NUMBER() AS VARCHAR(255)) + '@ERROR_SEVERITY :' + CAST( ERROR_SEVERITY() AS VARCHAR(255))
+ '@ERROR_STATE :' + CAST(ERROR_STATE() AS VARCHAR(255)) + '@ERROR_LINE :' + CAST( ERROR_LINE() AS VARCHAR(255))
+ '@ERROR_MESSAGE :' + ERROR_MESSAGE() ;
SELECT @RET_ERROR,@RET_MESSAGE;
END CATCH
