TSql Output From in Update query - sql-server

I'm trying to update a table and return some values in the same query, however one of the values to return is located in a linked table
Since sub-queries (which seem perfectly unambiguous to me) aren't allowed in Output clauses I'm trying to write the query using the Output From syntax with a join but this seems to me to produce all sorts of ambiguity
Consider for example the following query:
UPDATE tbla SET datecol=CURRENT_TIMESTAMP
OUTPUT deleted.datecol AS old, inserted.datecol AS new, b.col2
FROM tbla a LEFT JOIN tblb b ON a.bkey=b.bkey
WHERE akey=6
How does Sql Server know to what that WHERE clause refers? It seems to me that both the UPDATE and FROM parts of the query both qualify for a WHERE clause, so will that WHERE clause restrict which rows update or which rows appear in the output or both?
In my testing I've also seen Sql Server ask for table identifiers to be added to the WHERE clause in situations that I've been unable to pick out a cause or pattern for, so is the tbla referred to in the UPDATE part of the statement implicitly identified as "a" because I've aliased it as that in the FROM statement? If so is "a" referencing inserted or deleted? And if I hadn't aliased it there which version of the table would tbla.akey refer to?
I've not been able to find any decipherable documentation on exactly how this works, and the messages coming back from SQL Server when I'm testing are only making me more confused
I'd also love to know why the following query isn't allowed as it seems like a faultlessly superior way of doing the same thing, certainly not at all ambiguous and a lot more self explanatory
UPDATE tbla SET datecol=CURRENT_TIMESTAMP
OUTPUT deleted.datecol AS old, inserted.datecol AS new,
(SELECT TOP(1) b.col2 FROM tblb b WHERE deleted.bkey = b.bkey) AS col2
WHERE akey=6

You could write the output to a table variable and then work with it:
Could look like that in your case:
DECLARE #output TABLE
(
bkey INT,
datecolold DATETIME,
datecolnew DATETIME
)
UPDATE tbla SET datecol=CURRENT_TIMESTAMP
OUTPUT deleted.bkey, deleted.datecol, inserted.datecol INTO #output
WHERE akey=6
SELECT b.col2, o.* FROM #output o INNER JOIN tblb b ON o.bkey = b.bkey

Related

Temp Table with Wild Card

I need to clean up some observations in a table that are inaccurate prior to joining to the after mentioned table, this will avoid duplicate observation output.
I validated that the max(date_value) removes the 9K inaccurate transactions ..... newer transaction were completed which fixed the problem.
The code below, without into #temp, fixes the issue but as soon as I add a temp table, I get a syntax error will not execute, I need like 20 variables out of the table and really don't feel like listing them all, must be a simple syntax or alternative method.
SELECT * INTO #temp FROM db.dbo.table WHERE MAX(date_value);
SELECT a.* INTO #temp
FROM table a
inner join (select id, max(created_at) as max_created
from db.table
group by id) b
on a.id = b.id

What happens in an UPDATE statement in which the updated table isn't mentioned in the FROM/JOIN clauses?

I intended to run the following UPDATE statement on a SQL Server database table:
UPDATE TABLE_A
SET COL_1=B.COL_1
FROM TABLE_A A
INNER JOIN TABLE_B B
ON A.KEY_1=B.KEY_1
WHERE B.COL_2 IS NOT NULL
AND A.COL_1=91216599
By mistake, I ran the following statement instead:
UPDATE TABLE_A
SET COL_1=B.COL_1
FROM TABLE_A_COPY A
INNER JOIN TABLE_B B
ON A.KEY_1=B.KEY_1
WHERE B.COL_2 is not NULL
AND A.COL_1=91216599
Notice that in this second statement (wrong one), the FROM clause specifies table TABLE_A_COPY instead of TABLE_A. Both tables have exactly the same schema (i.e., same columns) and the same data (before any UPDATE is executed, that is).
Both TABLE_A and TABLE_A_COPY have about 100 million records and the update affects about 500,000 records. The second statement (the wrong one) runs for several hours and fails while the 1st statement (the correct one) runs for 40 seconds and succeeds.
Clearly, both statements are syntactically correct, but I am not sure what exactly I asked SQL Server to do with the first statement.
My questions are:
What SQL Server was trying to do in the second statement? With my mistake I didn't specify the linkage between records from TABLE_A to TABLE_A_COPY, so was it trying to do a CROSS JOIN between the two, and then update each record in TABLE_A a gazillion times?
If it isn't too broad a question to ask, what would be a valid scenario for such an UPDATE statement in which the table being updated is not mentioned in the FROM/JOIN clauses. Why would anyone do that? Why would SQL Server even allow that?
I did try searching for an answer to my questions, but Google seems to think I'm asking about UPDATE FROM syntax.
1) There is no connection between TABLE_A and TABLE_A_COPY so you will get CROSS JOIN and massive update the same row. Result can be non-deterministic if parallel execution is involed:
LiveDemo
CREATE TABLE #TABLE_A(KEY_1 INT PRIMARY KEY,COL_1 INT);
CREATE TABLE #TABLE_A_COPY(KEY_1 INT PRIMARY KEY,COL_1 INT);
CREATE TABLE #TABLE_B(KEY_1 INT PRIMARY KEY, COL_1 INT, COL_2 INT);
INSERT INTO #TABLE_A VALUES (1,91216599),(2,91216599),(3,91216599),
(4,91216599),(5,91216599),(6,6);
INSERT INTO #TABLE_A_COPY VALUES (1,91216599),(2,91216599),(3,91216599),
(4,91216599),(5,91216599),(6,6);
INSERT INTO #TABLE_B VALUES (1,10,10),(2,20,20), (3,30,30);
/*
UPDATE #TABLE_A
SET COL_1=B.COL_1
--SELECT *
FROM #TABLE_A A
INNER JOIN #TABLE_B B
ON A.KEY_1=B.KEY_1
WHERE B.COL_2 IS NOT NULL
AND A.COL_1=91216599;
*/
UPDATE #TABLE_A
SET COL_1=B.COL_1
FROM #TABLE_A_COPY A
INNER JOIN #TABLE_B B
ON A.KEY_1=B.KEY_1
WHERE B.COL_2 is not NULL
AND A.COL_1=91216599
SELECT *
FROM #TABLE_A;
Check in above code how TABLE_A record with KEY_1 = 6 changed.
2)
SQL Server UPDATE FROM/DELETE FROM syntax is much more broad than ANSI standard, the problem you encounter can be reduced to multiple update the same row. With UPDATE you don't get any error or warning:
From Let's deprecate UPDATE FROM! and Deprecate UPDATE FROM and DELETE FROM :
Correctness? Bah, who cares?
Well, most do. That’s why we test.
If I mess up the join criteria in a SELECT query so that too many rows
from the second table match, I’ll see it as soon as I test, because I
get more rows back then expected. If I mess up the subquery criteria
in an ANSI standard UPDATE query in a similar way, I see it even
sooner, because SQL Server will return an error if the subquery
returns more than a single value. But with the proprietary UPDATE FROM
syntax, I can mess up the join and never notice – SQL Server will
happily update the same row over and over again if it matches more
than one row in the joined table, with only the result of the last of
those updates sticking. And there is no way of knowing which row that
will be, since that depends in the query execution plan that happens
to be chosen. A worst case scenario would be one where the execution
plan just happens to result in the expected outcome during all tests
on the single-processor development server – and then, after
deployment to the four-way dual-core production server, our precious
data suddenly hits the fan…
If you use for example MERGE you will get error indicating:
The MERGE statement attempted to UPDATE or DELETE the same row more
than once. This happens when a target row matches more than one source
row. A MERGE statement cannot UPDATE/DELETE the same row of the target
table multiple times. Refine the ON clause to ensure a target row
matches at most one source row, or use the GROUP BY clause to group
the source rows.
So you need to be more carefull and check your code. I wish also to get error but as you see in connect link this won't happen.
One way to avoid this is to use UPDATE alias so you are sure you use tables that take part in FROM JOIN and no other tables are involved.:
UPDATE A
SET COL_1=B.COL_1
FROM #TABLE_A A
INNER JOIN #TABLE_B B
ON A.KEY_1=B.KEY_1
WHERE B.COL_2 IS NOT NULL
AND A.COL_1=91216599;
SQL will allow a lot of stuff that probably does not make sense
Notice tableB is on both side of the on
select *
from tableA
join tableB
on tableB.col1 = tableB.col1
SQL just checks syntax - it is up to you so write a statement that makes sense
There might be some case you really do want to do want a cross product type update
This is how I would write that statement
I line the table names up so it is easier to see
UPDATE TABLE_A
SET A.COL_1 = B.COL_1
FROM TABLE_A A
JOIN TABLE_B B
ON A.KEY_1 = B.KEY_1
AND B.COL_2 IS NOT NULL
AND A.COL_1 = 91216599
AND A.COL_1 <> B.COL_1

How can I use SQL MERGE to upsert single row based on Id, without a source set

There are a number of questions on the proper use of the MERGE statement in SQL Server. They all use a table/set reference for the merge. Is this table reference necessary?
In my case I have a sproc with two parameters #myId and #myValue
I simply want an UPSERT into MyTable based on the column [MyId]
It seems strange that I would have to create a set with
USING (SELECT #myId AS myId) AS source
to proceed with the MERGE (upsert). Is this the only way?
EDIT: voted to close my own question as exact duplicated... but I think the other question's title made it difficult to find.
You can also use this syntax:
merge into MyTable mt
using (values (#myId, #myValue)) t(id, value) on mt.Id = t.id
when not matched then insert /* ... */
You're always going to need a set of some kind.

SQL WHERE NOT EXISTS (skip duplicates)

Hello I'm struggling to get the query below right. What I want is to return rows with unique names and surnames. What I get is all rows with duplicates
This is my sql
DECLARE #tmp AS TABLE (Name VARCHAR(100), Surname VARCHAR(100))
INSERT INTO #tmp
SELECT CustomerName,CustomerSurname FROM Customers
WHERE
NOT EXISTS
(SELECT Name,Surname
FROM #tmp
WHERE Name=CustomerName
AND ID Surname=CustomerSurname
GROUP BY Name,Surname )
Please can someone point me in the right direction here.
//Desperate (I tried without GROUP BY as well but get same result)
DISTINCT would do the trick.
SELECT DISTINCT CustomerName, CustomerSurname
FROM Customers
Demo
If you only want the records that really don't have duplicates (as opposed to getting duplicates represented as a single record) you could use GROUP BY and HAVING:
SELECT CustomerName, CustomerSurname
FROM Customers
GROUP BY CustomerName, CustomerSurname
HAVING COUNT(*) = 1
Demo
First, I thought that #David answer is what you want. But rereading your comments, perhaps you want all combinations of Names and Surnames:
SELECT n.CustomerName, s.CustomerSurname
FROM
( SELECT DISTINCT CustomerName
FROM Customers
) AS n
CROSS JOIN
( SELECT DISTINCT CustomerSurname
FROM Customers
) AS s ;
Are you doing that while your #Tmp table is still empty?
If so: your entire "select" is fully evaluated before the "insert" statement, it doesn't do "run the query and add one row, insert the row, run the query and get another row, insert the row, etc."
If you want to insert unique Customers only, use that same "Customer" table in your not exists clause
SELECT c.CustomerName,c.CustomerSurname FROM Customers c
WHERE
NOT EXISTS
(SELECT 1
FROM Customers c1
WHERE c.CustomerName = c1.CustomerName
AND c.CustomerSurname = c1.CustomerSurname
AND c.Id <> c1.Id)
If you want to insert a unique set of customers, use "distinct"
Typically, if you're doing a WHERE NOT EXISTS or WHERE EXISTS, or WHERE NOT IN subquery,
you should use what is called a "correlated subquery", as in ypercube's answer above, where table aliases are used for both inside and outside tables (where inside table is joined to outside table). ypercube gave a good example.
And often, NOT EXISTS is preferred over NOT IN (unless the WHERE NOT IN is selecting from a totally unrelated table that you can't join on.)
Sometimes if you're tempted to do a WHERE EXISTS (SELECT from a small table with no duplicate values in column), you could also do the same thing by joining the main query with that table on the column you want in the EXISTS. Not always the best or safest solution, might make query slower if there are many rows in that table and could cause many duplicate rows if there are dup values for that column in the joined table -- in which case you'd have to add DISTINCT to the main query, which causes it to SORT the data on all columns.
-- Not efficient at all.
And, similarly, the WHERE NOT IN or NOT EXISTS correlated subqueries can be accomplished (and give the exact same execution plan) if you LEFT OUTER JOIN the table you were going to subquery -- and add a WHERE . IS NULL.
You have to be careful using that, but you don't need a DISTINCT. Frankly, I prefer to use the WHERE NOT IN subqueries or NOT EXISTS correlated subqueries, because the syntax makes the intention clear and it's hard to go wrong.
And you do not need a DISTINCT in the SELECT inside such subqueries (correlated or not). It would be a waste of processing (and for WHERE EXISTS or WHERE IN subqueries, the SQL optimizer would ignore it anyway and just use the first value that matched for each row in the outer query). (Hope that makes sense.)

Output to Table Variable, Not CTE

Why is this legal:
DECLARE #Party TABLE
(
PartyID nvarchar(10)
)
INSERT INTO #Party
SELECT Name FROM
(INSERT INTO SomeOtherTable
OUTPUT inserted.Name
VALUES ('hello')) [H]
SELECT * FROM #Party
But the next block gives me an error:
WITH Hey (Name)
AS (
SELECT Name FROM
(INSERT INTO SomeOtherTable
OUTPUT inserted.Name
VALUES ('hello')) [H]
)
SELECT * FROM Hey
The second block gives me the error "A nested INSERT, UPDATE, DELETE, or MERGE statement is not allowed in a SELECT statement that is not the immediate source of rows for an INSERT statement.
It seems to be saying thst nested INSERT statements are allowed, but in my CTE case I did not nest inside another INSERT. I'm surprised at this restriction. Any work-arounds in my CTE case?
As for why this is illegal, allowing these SELECT operations with side effects would cause all sorts of problems I imagine.
CTEs do not get materialised in advance into their own temporary table so what should the following return?
;WITH Hey (Name)
AS
(
...
)
SELECT name
FROM Hey
JOIN some_other_table ON Name = name
If the Query Optimiser decided to use a nested loops plan and Hey as the driving table then presumably one insert would occur. However if it used some_other_table as the driving table then the CTE would get evaluated as many times as there were rows in that other table and so multiple inserts would occur. Except if the Query Optimiser decided to add a spool to the plan and then it would only get evaluated once.
Presumably avoiding this sort of mess is the motivation for this restriction (as with the restrictions on side effects in functions)

Resources