Consolidation of two data rows in a loop for n occurrences - sql-server

We are running a table that holds some information for order of new products.
From time to time we receive new orders from a 3rd party system and insert them into our DB.
Sometimes, however, for a specific order there is already an entry in our table.
So instead of checking if there already IS an order, the colleagues just inserts new data sets into our table.
Now that the process of inserting is streamlined, I am supposed to consolidate the existing duplicates in the table.
The table looks like this:
I have 138 of these pairs where the PreOrderNumber occurrs twice. I'd like to insert the FK_VehicleFile number and the CommissionNumber to the row where the FK_Checklist is set and delete the duplicate with the missing FK_Checklist after that.
My idea is to write a transact script that looks like this:
First I store all the PreOrderNumbers that have duplicates in its an own table:
DECLARE #ResultSet TABLE ( PK_OrderNumber int,
FK_Checklist int,
FK_VehicleFile int,
PreOrderNumbers varchar(20))
INSERT INTO #ResultSet
SELECT PK_OrderNumber, PreOrderNumber
FROM [LUX_WEB_SAM].[dbo].[OrderNumbers]
GROUP BY PreOrderNumber
HAVING (COUNT(PreOrderNumber) > 1)
And that's it so far.
I'm very new to these kind of SQL scripts.
I think I need to use some kind of loop over all entries in the #ResultSet table to grab the FK_VehicleFile and CommissionNumber from the first data set and store them in the second data set.
Or do you have and suggestions how to solve this problem in a more easy way?

This response uses a CTE:
WITH [MergedOrders] AS
(
Select
ROW_NUMBER() OVER(PARTITION BY row1.PreOrderNumber ORDER BY row1.PK_OrderNumber) AS Instance
,row1.PK_OrderNumber AS PK_OrderNumber
,ISNULL(row1.FK_Checklist,row2.FK_Checklist) AS FK_Checklist
,ISNULL(row1.FK_VehicleFile,row2.FK_VehicleFile) AS FK_VehicleFile
,ISNULL(row1.PreOrderNumber,row2.PreOrderNumber) AS PreOrderNumber
,ISNULL(row1.CommissionNumber,row2.CommissionNumber) AS CommissionNumber
FROM [LUX_WEB_SAM].[dbo].[OrderNumbers] AS row1
INNER JOIN [LUX_WEB_SAM].[dbo].[OrderNumbers] AS row2
ON row1.PreOrderNumber = row2.PreOrderNumber
AND row1.PK_OrderNumber <> row2.PK_OrderNumber
)
SELECT
[PK_OrderNumber]
,[FK_Checklist]
,[FK_VehicleFile]
,[PreOrderNumber]
,[CommissionNumber]
FROM [MergedOrders]
WHERE Instance = 1 /* If we were to maintain Order Number of second instance, use 2 */
Here's the explanation:
A Common Table Expression (CTE) acts as an in-memory table, which we use to extract all rows that are repeated (NB: The INNER JOIN statement ensures that only rows that occur twice are selected). We use ISNULL to switch out values where one or the other is NULL, then select the output for our destination table.

You can take help from following scripts to perform your UPDATE and DELETE action.
Please keep in mind that both UPDATE and DELETE are risky operation and do your test first with test data.
CREATE TABLE #T(
Col1 VARCHAR(100),
Col2 VARCHAR(100),
Col3 VARCHAR(100),
Col4 VARCHAR(100),
Col5 VARCHAR(100)
)
INSERT INTO #T(Col1,Col2,Col3,Col4,Col5)
VALUES(30,NULL,222,00000002222,096),
(25,163,NULL,00000002222,NULL),
(30,163,NULL,00000002230,NULL)
SELECT * FROM #T
UPDATE A
SET A.Col3 = B.Col3, A.Col5 = B.Col5
FROM #T A
INNER JOIN #T B ON A.Col4 = B.Col4
WHERE A.Col2 IS NOT NULL AND B.Col2 IS NULL
DELETE FROM #T
WHERE Col4 IN (
SELECT Col4 FROM #T
GROUP BY Col4
HAVING COUNT(*) = 2
)
AND Col2 IS NULL
SELECT * FROM #T

Related

T-SQL Update Record need to add condition which requires JOIN

I have an existing query which updates specific values (represented as Col1, Col2, Col3 here) of all records in a table to the current values of those Cols for a defined source record in that table. This source record is defined by an Id which is a Foreign Key in this table (represented as FKRecId here)
DECLARE #SourceRecordId VARCHAR(50)
SET #SourceRecordId= N'{the Id}'
UPDATE dbo.TargetTable
SET [Col1] = SourceData.Col1
,[Col2] = SourceData.Col2
,[Col3] = SourceData.Col3
FROM ( SELECT Col1, Col2, Col3 FROM TargetTable WHERE FKRecId = #SourceRecordId ) SourceData
WHERE ((FKRecId != #SourceRecordId) AND
(CAST(TargetTable.Col1 AS NVARCHAR(MAX)) != CAST(SourceData.Col1 AS NVARCHAR(MAX)) OR
CAST(TargetTable.Col2 AS NVARCHAR(MAX)) != CAST(SourceData.Col2 AS NVARCHAR(MAX)) OR
CAST(TargetTable.Col3 AS NVARCHAR(MAX)) != CAST(SourceData.Col3 AS NVARCHAR(MAX)) )
GO
This currently updates all other records in the table using the values defined in the source record. However, I now need to add a further condition that it only updates those records where they are the same "type" as the source record. The value of "Type" is held in another table - (the Primary Key of that table entry is the Foreign Key Id mentioned above).
I have attempted to modify the query to incorporate a JOIN but I have limited T-SQL experience and have been struggling to get the syntax correct. The portion I have tried to update is shown below
FROM ( SELECT Col1, Col2, Col3 FROM TargetTable WHERE FKRecId = #SourceRecordId ) SourceData
JOIN dbo.OtherTable t On t.PrimaryKey = TargetTable.FKRecId
WHERE ((FKRecId != #SourceRecordId) AND (t.Type = KnownType) AND
This leads to an error in the SQL Server query editor at the TargetTable element of the JOIN statement. If I just put in the column name the intellisense error shows as "Invalid column name", if i qualify it with the table name it becomes "the multi part identifier xxx could not be bound".
Any help on where I'm going wrong with my syntax would be appreciated, I'd also appreciate if anyone could point me towards an online reference resource with examples of T-SQL statements that goes beyond the basics that I may haev fiound the answer to this in.
Thanks.
You should use an UPDATE statement like this:
DECLARE #SourceRecordId VARCHAR(50)
SET #SourceRecordId= N'{the Id}'
UPDATE U
SET U.[Col1] = SourceData.Col1
,U.[Col2] = SourceData.Col2
,U.[Col3] = SourceData.Col3
FROM dbo.TargetTable U
INNER JOIN dbo.OtherTable T ON T.PrimaryKey=U.FKRecId AND T.Type=KnownType
INNER JOIN dbo.TargetTable SourceData ON SourceData.FKRecId=#SourceRecordId
WHERE U.FKRecId!=#SourceRecordId AND
(CAST(U.Col1 AS NVARCHAR(MAX)) != CAST(SourceData.Col1 AS NVARCHAR(MAX)) OR
CAST(U.Col2 AS NVARCHAR(MAX)) != CAST(SourceData.Col2 AS NVARCHAR(MAX)) OR
CAST(U.Col3 AS NVARCHAR(MAX)) != CAST(SourceData.Col3 AS NVARCHAR(MAX)) )
Basically, the FROM and WHERE clause would be identical as for a SELECT statement. In the first line of the UPDATE statement you need to mention the alias of the table you want to update and in the SET part you specify the columns of that alias on the left side of the equals sign.
If the Col1, Col2 and Col3 columns are nullable, then you have to take care that the compare's will work correctly in case of NULL's.
You are not returning FKRecId column from SourceData subquery and attempting to access it with TargetTable whilst you already assigned a SourceData alias to that subquery and must use it:
FROM ( SELECT FKRecId, Col1, Col2, Col3 FROM TargetTable WHERE FKRecId = #SourceRecordId ) SourceData
JOIN dbo.OtherTable t On t.PrimaryKey = SourceData.FKRecId
WHERE ((FKRecId != #SourceRecordId) AND (t.Type = KnownType) AND
or without subquery
SELECT
tt.FKRecId,
tt.Col1,
tt.Col2,
tt.Col3
FROM TargetTable as tt
JOIN dbo.OtherTable t On t.PrimaryKey = tt.FKRecId
WHERE t.FKRecId != #SourceRecordId AND t.Type = KnownType
AND tt.FKRecId = #SourceRecordId
not sure if combination of filters with #SourceRecordId is logically okay.

T-SQL: Two Level Aggregation in Same Query

I have a query that joins a master and a detail table. Master table records are duplicated in results as expected. I get aggregation on detail table an it works fine. But I also need another aggregation on master table at the same time. But as master table is duplicated, aggregation results are duplicated too.
I want to demonstrate this situation as below;
If Object_Id('tempdb..#data') Is Not Null Drop Table #data
Create Table #data (Id int, GroupId int, Value int)
If Object_Id('tempdb..#groups') Is Not Null Drop Table #groups
Create Table #groups (Id int, Value int)
/* insert groups */
Insert #groups (Id, Value)
Values (1,100), (2,200), (3, 200)
/* insert data */
Insert #data (Id, GroupId, Value)
Values (1,1,10),
(2,1,20),
(3,2,50),
(4,2,60),
(5,2,70),
(6,3,90)
My select query is
Select Sum(data.Value) As Data_Value,
Sum(groups.Value) As Group_Value
From #data data
Inner Join #groups groups On groups.Id = data.GroupId
The result is;
Data_Value Group_Value
300 1000
Expected result is;
Data_Value Group_Value
300 500
Please note that, derived table or sub-query is not an option. Also Sum(Distinct groups.Value) is not suitable for my case.
If I am not wrong, you just want to sum value column of both table and show it in a single row. in that case you don't need to join those just select the sum as a column like :
SELECT (SELECT SUM(VALUE) AS Data_Value FROM #DATA),
(SELECT SUM(VALUE) AS Group_Value FROM #groups)
SELECT
(
Select Sum(d.Value) From #data d
WHERE EXISTS (SELECT 1 FROM #groups WHERE Id = d.GroupId )
) AS Data_Value
,(
SELECT Sum( g.Value) FROM #groups g
WHERE EXISTS (SELECT 1 FROM #data WHERE GroupId = g.Id)
) AS Group_Value
I'm not sure what you are looking for. But it seems like you want the value from one group and the collected value that represents a group in the data table.
In that case I would suggest something like this.
select Sum(t.Data_Value) as Data_Value, Sum(t.Group_Value) as Group_Value
from
(select Sum(data.Value) As Data_Value, groups.Value As Group_Value
from data
inner join groups on groups.Id = data.GroupId
group by groups.Id, groups.Value)
as t
The edit should do the trick for you.

Use same GUID twice per row

I am building an application to transfer data from an SQL server to an offsite location via ftp and XML files.
I am building the XML data for each file via a query with FOR XML PATH('path'), TYPE.
I'm going to use a GUID to generate the filename as well as use as an identiifier within the file, currently my SQL to get the table is as follows (simplified):
SELECT LVL1.inv_account_no
, LVL1.cus_postcode
, CONVERT(varchar(255),NEWID()) + '.xml' as FileName
, (SELECT (SELECT CONVERT(varchar(255),NEWID()) FOR XML PATH('ident'), TYPE), (
SELECT.... [rest of very long nested select code for generating XML]
SQL Fiddle Example
This is giving me:
Account Postcode FileName xCol
AD0001 B30 3HX 2DF21466-2DA3-4D62-8B9B-FC3DF7BD1A00 <ident>656700EA-8FD5-4936-8172-0135DC49D200</ident>
AS0010 NN12 8TN 58339997-8271-4D8C-9C55-403DE98F06BE <ident>78F8078B-629E-4906-9C6B-2AE21782DC1D</ident>
Basically different GUID's for each row/use of NEWID().
Is there a way I can insert the same GUID into both columns without incrementing a cursor or doing two updates?
Try something like this:
SELECT GeneratedGuid, GeneratedGuid
FROM YourTable
LEFT JOIN (SELECT NEWID() AS GeneratedGuid) AS gg ON 1 = 1
"GeneratedGuid" has a different GUID for every row.
You could use a Common Table Expression to generate the NEWID for each resulting row.
Here is the SQL Fiddle : http://www.sqlfiddle.com/#!3/74c0c/1
CREATE TABLE TBL (
ID INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
GCOL VARCHAR(255),
XCOL XML)
create table tbl2 (
id int identity(1,1) not null primary key,
foo int not null )
insert into tbl2 (foo) values
(10),(20),(30)
; WITH cte_a as ( select NEWID() as ID )
INSERT INTO TBL
SELECT CONVERT(varchar(255),cte_a.ID )
, (SELECT CONVERT(varchar(255),cte_a.ID) FOR XML PATH('Output'), TYPE)
from tbl2, cte_a
Create a temporary variable and assign a NEWID() to it. Then you can use this variable as many times in your SELECT or INSERT queries. Value of temporary variable remain same till it's scope. In following example #gid is a temporary variable and assigned a GUID value as VARCHAR(36)
DECLARE #gid varchar(36)
SET #gid = CAST(NEWID() AS VARCHAR(36))
INSERT INTO [tableName] (col1, col2) VALUES(#gid, #gid)
after executing the above query col1 and col2 from [tablename] will have same guid value.

Insert from single table into multiple tables, invalid column name error

I am trying to do the following but getting an "Invalid Column Name {column}" error. Can someone please help me see the error of my ways? We recently split a transaction table into 2 tables, one containing the often updated report column names and the other containing the unchanging transactions. This leave me trying to change what was a simple insert into 1 table to a complex insert into 2 tables with unique columns. I attempted to do that like so:
INSERT INTO dbo.ReportColumns
(
FullName
,Type
,Classification
)
OUTPUT INSERTED.Date, INSERTED.Amount, INSERTED.Id INTO dbo.Transactions
SELECT
[Date]
,Amount
,FullName
,Type
,Classification
FROM {multiple tables}
The "INSERTED.Date, INSERTED.Amount" are the source of the errors, with or without the "INSERTED." in front.
-----------------UPDATE------------------
Aaron was correct and it was impossible to manage with an insert but I was able to vastly improve the functionality of the insert and add some other business rules with the Merge functionality. My final solution resembles the following:
DECLARE #TransactionsTemp TABLE
(
[Date] DATE NOT NULL,
Amount MONEY NOT NULL,
ReportColumnsId INT NOT NULL
)
MERGE INTO dbo.ReportColumns AS Trgt
USING ( SELECT
{FK}
,[Date]
,Amount
,FullName
,Type
,Classification
FROM {multiple tables}) AS Src
ON Src.{FK} = Trgt.{FK}
WHEN MATCHED THEN
UPDATE SET
Trgt.FullName = Src.FullName,
Trgt.Type= Src.Type,
Trgt.Classification = Src.Classification
WHEN NOT MATCHED BY TARGET THEN
INSERT
(
FullName,
Type,
Classification
)
VALUES
(
Src.FullName,
Src.Type,
Src.Classification
)
OUTPUT Src.[Date], Src.Amount, INSERTED.Id INTO #TransactionsTemp;
MERGE INTO dbo.FinancialReport AS Trgt
USING (SELECT
[Date] ,
Amount ,
ReportColumnsId
FROM #TransactionsTemp) AS Src
ON Src.[Date] = Trgt.[Date] AND Src.ReportColumnsId = Trgt.ReportColumnsId
WHEN NOT MATCHED BY TARGET And Src.Amount <> 0 THEN
INSERT
(
[Date],
Amount,
ReportColumnsId
)
VALUES
(
Src.[Date],
Src.Amount,
Src.ReportColumnsId
)
WHEN MATCHED And Src.Amount <> 0 THEN
UPDATE SET Trgt.Amount = Src.Amount
WHEN MATCHED And Src.Amount = 0 THEN
DELETE;
Hope that helps someone else in the future. :)
Output clause will return values you are inserting into a table, you need multiple inserts, you can try something like following
declare #staging table (datecolumn date, amount decimal(18,2),
fullname varchar(50), type varchar(10),
Classification varchar(255));
INSERT INTO #staging
SELECT
[Date]
,Amount
,FullName
,Type
,Classification
FROM {multiple tables}
Declare #temp table (id int, fullname varchar(50), type varchar(10));
INSERT INTO dbo.ReportColumns
(
FullName
,Type
,Classification
)
OUTPUT INSERTED.id, INSERTED.fullname, INSERTED.type INTO #temp
SELECT
FullName
,Type
,Classification
FROM #stage
INSERT into dbo.transacrions (id, date, amount)
select t.id, s.datecolumn, s.amount from #temp t
inner join #stage s on t.fullname = s.fullname and t.type = s.type
I am fairly certain you will need to have two inserts (or create a view and use an instead of insert trigger). You can only use the OUTPUT clause to send variables or actual inserted values ti another table. You can't use it to split up a select into two destination tables during an insert.
If you provide more information (like how the table has been split up and how the rows are related) we can probably provide a more specific answer.

Table variable in SQL Server

I am using SQL Server 2005. I have heard that we can use a table variable to use instead of LEFT OUTER JOIN.
What I understand is that, we have to put all the values from the left table to the table variable, first. Then we have to UPDATE the table variable with the right table values. Then select from the table variable.
Has anyone come across this kind of approach? Could you please suggest a real time example (with query)?
I have not written any query for this. My question is - if someone has used a similar approach, I would like to know the scenario and how it is handled. I understand that in some cases it may be slower than the LEFT OUTER JOIN.
Please assume that we are dealing with tables that have less than 5000 records.
Thanks
It can be done, but I have no idea why you would ever want to do it.
This realy does seem like it is being done backwards. But if you are trying this for your own learning only, here goes:
DECLARE #MainTable TABLE(
ID INT,
Val FLOAT
)
INSERT INTO #MainTable SELECT 1, 1
INSERT INTO #MainTable SELECT 2, 2
INSERT INTO #MainTable SELECT 3, 3
INSERT INTO #MainTable SELECT 4, 4
DECLARE #LeftTable TABLE(
ID INT,
MainID INT,
Val FLOAT
)
INSERT INTO #LeftTable SELECT 1, 1, 11
INSERT INTO #LeftTable SELECT 3, 3, 33
SELECT *,
mt.Val + ISNULL(lt.Val, 0)
FROM #MainTable mt LEFT JOIN
#LeftTable lt ON mt.ID = lt.MainID
DECLARE #Table TABLE(
ID INT,
Val FLOAT
)
INSERT INTO #Table
SELECT ID,
Val
FROM #MainTable
UPDATE #Table
SET Val = t.Val + lt.Val
FROM #Table t INNER JOIN
#LeftTable lt ON t.ID = lt.ID
SELECT *
FROM #Table
I don't think it's very clear from your question what you want to achieve? (What your tables look like, and what result you want). But you can certainly select data into a variable of a table datatype, and tamper with it. It's quite convenient:
DECLARE #tbl TABLE (id INT IDENTITY(1,1), userId int, foreignId int)
INSERT INTO #tbl (userId)
SELECT id FROM users
WHERE name LIKE 'a%'
UPDATE #tbl t
SET
foreignId = (SELECT id FROM foreignTable f WHERE f.userId = t.userId)
In that example I gave the table variable an identity column of its own, distinct from the one in the source table. I often find that useful. Adjust as you like... Again, it's not very clear what the question is, but I hope this might guide you in the right direction...?
Every scenario is different, and without full details on a specific case it's difficult to say whether it would be a good approach for you.
Having said that, I would not be looking to use the table variable approach unless I had a specific functional reason to - if the query can be fulfilled with a standard SELECT query using an OUTER JOIN, then I'd use that as I'd expect that to be most efficient.
The times where you may want to use a temp table/table variable instead, are more when you want to get an intermediary resultset and then do some processing on it before then returning it out - i.e. the kind of processing that cannot be done with a straight forward query.
Note the table variables are very handy, but take into account that they are not guaranteed to reside in-memory - they can get persisted to tempdb like standard temp tables.
Thank you, astander.
I tried with an example given below. Both of the approaches took 19 seconds. However, I guess some tuning will help the Table varaible update approach to become faster than LEFT JOIN.
AS I am not a master in tuning I request your help. Any SQL expert ready to prove it?
---- PLease replace "" with '' below. I am not familiar with how to put code in this forum... It causes some troubles....
CREATE TABLE #MainTable (
CustomerID INT PRIMARY KEY,
FirstName VARCHAR(100)
)
DECLARE #Count INT
SET #Count = 0
DECLARE #Iterator INT
SET #Iterator = 0
WHILE #Count <8000
BEGIN
INSERT INTO #MainTable SELECT #Count, "Cust"+CONVERT(VARCHAR(10),#Count)
SET #Count = #Count+1
END
CREATE TABLE #RightTable
(
OrderID INT PRIMARY KEY,
CustomerID INT,
Product VARCHAR(100)
)
CREATE INDEX [IDX_CustomerID] ON #RightTable (CustomerID)
WHILE #Iterator <400000
BEGIN
IF #Iterator % 2 = 0
BEGIN
INSERT INTO #RightTable SELECT #Iterator,2, "Prod"+CONVERT(VARCHAR(10),#Iterator)
END
ELSE
BEGIN
INSERT INTO #RightTable SELECT #Iterator,1, "Prod"+CONVERT(VARCHAR(10),#Iterator)
END
SET #Iterator = #Iterator+1
END
-- Using LEFT JOIN
SELECT mt.CustomerID,mt.FirstName,COUNT(rt.Product) [CountResult]
FROM #MainTable mt
LEFT JOIN #RightTable rt ON mt.CustomerID = rt.CustomerID
GROUP BY mt.CustomerID,mt.FirstName
---------------------------
-- Using Table variable Update
DECLARE #WorkingTableVariable TABLE
(
CustomerID INT,
FirstName VARCHAR(100),
ProductCount INT
)
INSERT
INTO #WorkingTableVariable (CustomerID,FirstName)
SELECT CustomerID, FirstName FROM #MainTable
UPDATE #WorkingTableVariable
SET ProductCount = [Count]
FROM #WorkingTableVariable wt
INNER JOIN
(SELECT CustomerID,COUNT(rt.Product) AS [Count]
FROM #RightTable rt
GROUP BY CustomerID) IV ON wt.CustomerID = IV.CustomerID
SELECT CustomerID,FirstName, ISNULL(ProductCount,0) [CountResult] FROM #WorkingTableVariable
ORDER BY CustomerID
--------
DROP TABLE #MainTable
DROP TABLE #RightTable
Thanks
Lijo
In my opinion there is one reason to do this:
If you have a complicated query with lots of inner joins and one left join you sometimes get in trouble because this query is hundreds of times less fast than using the same query without the left join.
If you query lots of records with a result of very few records to be joined to the left join you could get faster results if you materialize the intermediate result into a table variable or temp table.
But usually there is no need to really update the data in the table variable - you could query the table variable using the left join to return the result.
... just my two cents.

Resources