Wildcard character in value in MERGE ON statement - sql-server

I have a merge statement that starts like this:
MERGE INTO TEMSPASA
USING (SELECT *
       FROM OPENQUERY(orad, 'SELECT * FROM CDAS.TDWHCORG')) AS TDWHPASA
    ON TEMSPASA.pasa_cd = LTRIM(RTRIM(TDWHPASA.corg_id))
   AND TEMSPASA.pasa_active_ind = TDWHPASA.corg_active_ind
WHEN MATCHED THEN
    UPDATE SET
        TEMSPASA.pasa_desc = LTRIM(RTRIM(TDWHPASA.corg_nm)),
        TEMSPASA.pasa_active_ind = TDWHPASA.corg_active_ind
WHEN NOT MATCHED THEN
    INSERT (pasa_cd, pasa_desc, pasa_active_ind)
    VALUES (LTRIM(RTRIM(TDWHPASA.corg_id)), TDWHPASA.corg_nm, TDWHPASA.corg_active_ind);
There are pasa_cd values like 'H04' and 'H04*', where that * is NOT a wildcard. But I think the ON clause is treating it like a wildcard, because when I try to run the MERGE statement, I get the following error:
The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.
I have verified that there are no duplicates in my table. The only thing I can think of is what I mentioned above: that the ON part of the MERGE statement is seeing that * as a wildcard.
I have tried searching and saw something about an escape character, but that was in the WHERE clause. Any ideas how to deal with this?

This means you have a target row that is matched by more than one source row. You need to find out which row(s) are the issue here; the duplicates could be in either table. Something like this should help you identify where the problem is coming from:
SELECT LTRIM(RTRIM(TDWHPASA.corg_id))
     , TDWHPASA.corg_active_ind
FROM CDAS.TDWHCORG AS TDWHPASA
GROUP BY LTRIM(RTRIM(TDWHPASA.corg_id))
       , TDWHPASA.corg_active_ind
HAVING COUNT(*) > 1

SELECT t.pasa_cd
     , t.pasa_active_ind
FROM TEMSPASA t
GROUP BY t.pasa_cd
       , t.pasa_active_ind
HAVING COUNT(*) > 1
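Since the source table sits behind the orad linked server in the question, the first check may need to be wrapped in OPENQUERY on the SQL Server side, the same way the MERGE reads it. A sketch reusing the question's own OPENQUERY call:
SELECT LTRIM(RTRIM(TDWHPASA.corg_id))
     , TDWHPASA.corg_active_ind
FROM OPENQUERY(orad, 'SELECT * FROM CDAS.TDWHCORG') AS TDWHPASA
GROUP BY LTRIM(RTRIM(TDWHPASA.corg_id))
       , TDWHPASA.corg_active_ind
HAVING COUNT(*) > 1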

So my theory in my initial post was incorrect. There were duplicates in my source table, but they were caused by case sensitivity: there were values H04F and H04f. They are different rows, but because my SQL collation is case-insensitive, they were being treated as duplicates. To resolve the issue I added COLLATE Latin1_General_CS_AS to the end of the ON clause and it did the trick.
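For reference, a sketch of how that collation can be applied in the ON clause, shown here on the corg_id comparison where the H04F / H04f values differ (the rest of the MERGE stays as in the original statement):
    ON TEMSPASA.pasa_cd = LTRIM(RTRIM(TDWHPASA.corg_id)) COLLATE Latin1_General_CS_AS
   AND TEMSPASA.pasa_active_ind = TDWHPASA.corg_active_ind
-- COLLATE Latin1_General_CS_AS forces a case-sensitive comparison, so H04F and H04f no longer collapse into one match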

Related

Need to Add Values to Certain Items

I have a table to which I need to add the same values for a whole bunch of items
(in a nutshell: if an item doesn't have a UNIT of "CTN", I want to add the same values I have listed to all of them).
I thought the following would work, but it doesn't :(
Any idea what I am doing wrong?
INSERT INTO ICUNIT
(UNIT,AUDTDATE,AUDTTIME,AUDTUSER,AUDTORG,CONVERSION)
VALUES ('CTN','20220509','22513927','ADMIN','AU','1')
WHERE ITEMNO In '0','etc','etc','etc'
If I understand correctly, you might want to use INSERT INTO ... SELECT from the original table with your condition.
INSERT INTO ICUNIT (UNIT,AUDTDATE,AUDTTIME,AUDTUSER,AUDTORG,CONVERSION)
SELECT 'CTN','20220509','22513927','ADMIN','AU','1'
FROM ICUNIT
WHERE ITEMNO In ('0','etc','etc','etc')
The query you need starts by selecting the filtered items, so it seems something like the query below is your starting point:
select <?> from dbo.ICUNIT as icu where icu.UNIT <> 'CTN' order by ...;
Notice the use of schema name, statement terminators, and table aliases - all best practices. I will guess that a given "item" can have multiple rows in this table, so long as UNIT is unique within ITEMNO. Correct? If so, the above query won't work, so let's try slightly more complicated filtering.
select distinct icu.ITEMNO
from dbo.ICUNIT as icu
where not exists (select *
                  from dbo.ICUNIT as ctns
                  where ctns.ITEMNO = icu.ITEMNO -- correlating the subquery
                    and ctns.UNIT = 'CTN')
order by ...;
There are other ways to do that above but that is one common way. That query will produce a resultset of all ITEMNO values in your table that do not already have a row where UNIT is "CTN". If you need to filter that for specific ITEMNO values you simply adjust the WHERE clause. If that works correctly, you can use that with your insert statement to then insert the desired rows.
insert into dbo.ICUNIT (...)
select distinct icu.ITEMNO, 'CTN', '20220509', '22513927', 'ADMIN', 'AU', '1'
from ...
;
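Putting those two pieces together, a fuller sketch of the final statement (column names are taken from the question's own INSERT; adjust to your actual ICUNIT schema):
insert into dbo.ICUNIT (ITEMNO, UNIT, AUDTDATE, AUDTTIME, AUDTUSER, AUDTORG, CONVERSION)
select distinct icu.ITEMNO, 'CTN', '20220509', '22513927', 'ADMIN', 'AU', '1'
from dbo.ICUNIT as icu
where not exists (select *
                  from dbo.ICUNIT as ctns
                  where ctns.ITEMNO = icu.ITEMNO
                    and ctns.UNIT = 'CTN') -- only items that do not already have a CTN row
;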

MERGE: INSERT on MATCH

I am downloading data that will have duplicates with previously downloaded data.
I am successfully using a MERGE statement to throw away the duplicates based on transaction number. Supposedly that is sufficient, but I would like to monitor if the detail ever changes on a particular transaction.
To that end I added a when matched clause on the merge with an additional insert that identifies the record as a duplicate.
This logic should not trigger very often so I am not too concerned that this method (if it worked) would report the same duplicate multiple times.
When I prepare this code I get this error message:
An action of type 'INSERT' is not allowed in the 'WHEN MATCHED' clause of a MERGE statement.
Is there a way to get the duplicate record to insert into this table or another table using the MERGE statement?
I am open to other solutions, but I would really like to find a way to do this with the MERGE statement because it would impact my code the least.
MERGE INTO dbo.TransactionDetail as t
USING (SELECT #TransNr --bigint
,#Detail -- [VARCHAR](50) NOT NULL
) as s
([TranNr]
,[Detail]
)
on t.TranNr = s.TranNr and t.CHANGED_RECORD = 0
when not matched then
INSERT (CHANGED_RECORD
,[TranNr]
,[Detail]
)
VALUES(0, s.TranNr, s.Detail)
/* Adding this does not allow statement to be prepared....
when matched and s.Detail <> t.Detail
then
INSERT (CHANGED_RECORD
,[TranNr]
,[Detail]
)
VALUES(1, s.TranNr, s.Detail)
*/
;
You can use an INSERT statement, like this:
INSERT INTO dbo.TransactionDetail (CHANGED_RECORD, TranNr, Detail)
SELECT CASE WHEN EXISTS (
           SELECT * FROM dbo.TransactionDetail
           WHERE TranNr = #TransNr AND CHANGED_RECORD = 0 AND Detail <> #Detail
       ) THEN 1 ELSE 0 END AS CHANGED_RECORD,
       #TransNr AS TranNr,
       #Detail AS Detail
WHERE NOT EXISTS (
    SELECT * FROM dbo.TransactionDetail
    WHERE TranNr = #TransNr AND CHANGED_RECORD = 0 AND Detail = #Detail
)
This will skip the insert if a row which has CHANGED_RECORD=0 already has the same detail. However, if the same detail is found only in another row which has CHANGED_RECORD=1, a new duplicate would still be inserted. To avoid that, remove the AND CHANGED_RECORD=0 condition from the WHERE NOT EXISTS subquery.
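That variant would end with a NOT EXISTS subquery like this (the rest of the statement stays the same):
WHERE NOT EXISTS (
    SELECT * FROM dbo.TransactionDetail
    WHERE TranNr = #TransNr AND Detail = #Detail -- no CHANGED_RECORD filter here
)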
You may also want to create a unique filtered index, to ensure uniqueness for the rows which have CHANGED_RECORD=0:
CREATE UNIQUE INDEX IX_TransactionDetail_Filtered
ON TransactionDetail (TranNr) /*INCLUDE (Detail)*/ WHERE CHANGED_RECORD=0
The INCLUDE (Detail) clause could also marginally improve performance of queries that are looking for the Detail of the rows which have CHANGED_RECORD=0 (at the expense of some additional disk space and a small performance penalty when updating the Detail column of existing rows).

T-SQL not equal operator vs Case statement

Assume I have a T-SQL statement:
select * from MyTable
where Code != 'RandomCode'
I've been tasked with making this kind of where statement perform more quickly. Books Online says that positive queries (=) are faster than negative (!= , <>).
So, one option is make this into a CASE statement e.g.
select * from MyTable
where
case when Code = 'RandomCode' then 0
else 1 end = 1
Does anyone know if this can be expected to be faster or slower than the original T-SQL ?
Thanks in advance.
You have to be more specific about what information from the table you are interested in, and what the possible values of the Code column are. Then you can create appropriate indexes to speed up the query.
For example, if values in the Code column could only be one of 'RandomCode', 'OtherCode', 'YetAnotherCode', you can re-write the query as:
SELECT * FROM MyTable WHERE Code = 'OtherCode' OR Code = 'YetAnotherCode'
And of course you need an index on the Code column.
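A minimal sketch of that index (the index name is only illustrative):
CREATE INDEX IX_MyTable_Code ON MyTable (Code);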
If you have to do an inequality query, you can change SELECT * to a more narrow query like:
SELECT Id, Name, Whatever FROM MyTable WHERE Code != 'RandomCode'
Then create an index like:
CREATE INDEX idx_Code ON MyTable(Code) INCLUDE (Id,Name,Whatever)
This can reduce I/O by replacing a table scan with an index scan.

LIKE vs CONTAINS on SQL Server

Which one of the following queries is faster (LIKE vs CONTAINS)?
SELECT * FROM table WHERE Column LIKE '%test%';
or
SELECT * FROM table WHERE Contains(Column, "test");
The second (assuming you mean CONTAINS, and actually put it in a valid query) should be faster, because it can use some form of index (in this case, a full-text index). Of course, this form of query is only available if the column is in a full-text index. If it isn't, then only the first form is available.
The first query, using LIKE, will be unable to use an index, since it starts with a wildcard, so will always require a full table scan.
The CONTAINS query should be:
SELECT * FROM table WHERE CONTAINS(Column, 'test');
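For completeness, CONTAINS only works once a full-text index exists on the column. A minimal setup sketch, where the catalog name and the KEY INDEX name are placeholders (the latter must be the name of a unique index on the table, typically the primary key):
CREATE FULLTEXT CATALOG ftCatalog AS DEFAULT; -- placeholder catalog name
CREATE FULLTEXT INDEX ON dbo.[table] ([Column])
    KEY INDEX PK_table; -- placeholder: the table's primary key / unique index name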
Having run both queries on a SQL Server 2012 instance, I can confirm the first query was faster in my case.
The query with the LIKE keyword showed a clustered index scan.
The CONTAINS query also had a clustered index scan, with additional operators for the full-text match and a merge join.
I think that CONTAINS took longer and used a merge join because you had a dash ("-") in your query: adventure-works.com.
The dash is a word breaker, so CONTAINS searched the full-text index for adventure, then searched for works.com, and merged the results.
Also try changing from this:
SELECT * FROM table WHERE CONTAINS(Column, 'test');
To this:
SELECT * FROM table WHERE CONTAINS(Column, '"*test*"');
The former will find records with values like "this is a test" and "a test-case is the plan".
The latter will also find records with values like "i am testing this" and "this is the greatest".
I don't actually understand what is going on with the CONTAINS keyword. I set a full-text index on a column and ran some queries on the table.
LIKE returns 450.518 rows but CONTAINS does not, and LIKE's result is the correct one:
SELECT COL FROM TBL WHERE COL LIKE '%41%'          -- 450.518 rows
SELECT COL FROM TBL WHERE CONTAINS(COL, N'41')     -- 40 rows
SELECT COL FROM TBL WHERE CONTAINS(COL, N'"*41*"') -- 220.364 rows

How does sql server choose values in an update statement where there are multiple options?

I have an update statement in SQL server where there are four possible values that can be assigned based on the join. It appears that SQL has an algorithm for choosing one value over another, and I'm not sure how that algorithm works.
As an example, say there is a table called Source with two columns (Match and Data) structured as below:
(The match column contains only 1's, the Data column increments by 1 for every row)
Match   Data
-----   ----
1       1
1       2
1       3
1       4
That table will update another table called Destination with the same two columns structured as below:
Match   Data
-----   ----
1       NULL
If you want to update the Data column in Destination in the following way:
UPDATE Destination
SET Data = Source.Data
FROM Destination
INNER JOIN Source
    ON Destination.Match = Source.Match
there will be four possible values that Destination.Data could be set to after this query is run. I've found that messing with the indexes of Source has an impact on what Destination is set to, and it appears that SQL Server just updates the Destination table with the first value it finds that matches.
Is that accurate? Is it possible that SQL Server is updating the Destination with every possible value sequentially, so that I end up with the same kind of result as if it were updating with the first value it finds? It seems potentially problematic that it seemingly chooses one row at random to update, as opposed to throwing an error when presented with this situation.
Thank you.
P.S. I apologize for the poor formatting. Hopefully, the intent is clear.
It sets the Data from all of the results; which one you end up with after the query depends on the order in which the results are applied (whichever one it sets last).
Since there's no ORDER BY clause, you're left with whatever order Sql Server comes up with. That will normally follow the physical order of the records on disk, and that in turn typically follows the clustered index for a table. But this order isn't set in stone, particularly when joins are involved. If a join matches on a column with an index other than the clustered index, it may well order the results based on that index instead. In the end, unless you give it an ORDER BY clause, Sql Server will return the results in whatever order it thinks it can do fastest.
You can play with this by turning your update query into a select query, so you can see the results. Notice which record comes first and which record comes last in the source table for each record of the destination table. Compare that with the results of your update query. Then play with your indexes again and check the results once more to see what you get.
Of course, it can be tricky here because UPDATE statements are not allowed to use an ORDER BY clause, so regardless of what you find, you should really write the join so it matches the destination table 1:1. You may find the APPLY operator useful in achieving this goal, and you can use it to effectively JOIN to another table and guarantee the join only matches one record.
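A sketch of that APPLY approach using the Source/Destination tables from the question; the ORDER BY inside the APPLY is what makes the chosen row explicit rather than arbitrary:
UPDATE d
SET Data = s.Data
FROM Destination AS d
CROSS APPLY (SELECT TOP (1) Source.Data
             FROM Source
             WHERE Source.Match = d.Match
             ORDER BY Source.Data DESC) AS s; -- exactly one source row per destination row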
The choice is not deterministic and it can be any of the source rows.
You can try
DECLARE #Source TABLE(Match INT, Data INT);
INSERT INTO #Source
VALUES
(1, 1),
(1, 2),
(1, 3),
(1, 4);
DECLARE #Destination TABLE(Match INT, Data INT);
INSERT INTO #Destination
VALUES
(1, NULL);
UPDATE Destination
SET Data = Source.Data
FROM #Destination Destination
INNER JOIN #Source Source
ON Destination.Match = Source.Match;
SELECT *
FROM #Destination;
And look at the actual execution plan. I see the following.
The output columns from #Destination are Bmk1000 and Match. Bmk1000 is an internal row identifier (used here due to the lack of a clustered index in this example) and would be different for each row emitted from #Destination (if there were more than one).
The single row is then joined onto the four matching rows in #Source and the resultant four rows are passed into a stream aggregate.
The stream aggregate groups by Bmk1000 and collapses the multiple matching rows down to one. The operation performed by this aggregate is ANY(#Source.[Data]).
The ANY aggregate is an internal aggregate function not available in TSQL itself. No guarantees are made about which of the four source rows will be chosen.
Finally the single row per group feeds into the UPDATE operator to update the row with whatever value the ANY aggregate returned.
If you want deterministic results then you can use an aggregate function yourself...
WITH GroupedSource AS
(
SELECT Match,
MAX(Data) AS Data
FROM #Source
GROUP BY Match
)
UPDATE Destination
SET Data = Source.Data
FROM #Destination Destination
INNER JOIN GroupedSource Source
ON Destination.Match = Source.Match;
Or use ROW_NUMBER...
WITH RankedSource AS
(
SELECT Match,
Data,
ROW_NUMBER() OVER (PARTITION BY Match ORDER BY Data DESC) AS RN
FROM #Source
)
UPDATE Destination
SET Data = Source.Data
FROM #Destination Destination
INNER JOIN RankedSource Source
ON Destination.Match = Source.Match
WHERE RN = 1;
The latter form is generally more useful, as in the event you need to set multiple columns it ensures that all the values used come from the same source row. For the result to be deterministic, the combination of PARTITION BY and ORDER BY columns should be unique.
