Output to Table Variable, Not CTE - sql-server

Why is this legal:
DECLARE #Party TABLE
(
PartyID nvarchar(10)
)
INSERT INTO #Party
SELECT Name FROM
(INSERT INTO SomeOtherTable
OUTPUT inserted.Name
VALUES ('hello')) [H]
SELECT * FROM #Party
But the next block gives me an error:
WITH Hey (Name)
AS (
SELECT Name FROM
(INSERT INTO SomeOtherTable
OUTPUT inserted.Name
VALUES ('hello')) [H]
)
SELECT * FROM Hey
The second block gives me the error "A nested INSERT, UPDATE, DELETE, or MERGE statement is not allowed in a SELECT statement that is not the immediate source of rows for an INSERT statement.
It seems to be saying thst nested INSERT statements are allowed, but in my CTE case I did not nest inside another INSERT. I'm surprised at this restriction. Any work-arounds in my CTE case?

As for why this is illegal, allowing these SELECT operations with side effects would cause all sorts of problems I imagine.
CTEs do not get materialised in advance into their own temporary table so what should the following return?
;WITH Hey (Name)
AS
(
...
)
SELECT name
FROM Hey
JOIN some_other_table ON Name = name
If the Query Optimiser decided to use a nested loops plan and Hey as the driving table then presumably one insert would occur. However if it used some_other_table as the driving table then the CTE would get evaluated as many times as there were rows in that other table and so multiple inserts would occur. Except if the Query Optimiser decided to add a spool to the plan and then it would only get evaluated once.
Presumably avoiding this sort of mess is the motivation for this restriction (as with the restrictions on side effects in functions)

Related

What is the "lifespan" of a postgres CTE expression? e.g. WITH... AS

I have a CTE I am using to pull some data from two tables then stick in an intermediate table called cte_list, something like
with cte_list as (
select pl.col_val from prune_list pl join employees.employee emp on pl.col_val::uuid = emp.id
where pl.col_nm = 'employee_ref_id' limit 100
)
Then, I am doing an insert to move records from the cte_list to another archive table (if they don't exist) called employee_arch_test
insert into employees.employee_arch_test (
select * from employees.employee where id in (select col_val::uuid from cte_list)
and not exists (select 1 from employees.employee_arch_test where employees.employee_arch_test.id=employees.employee.id)
);
This seems to work fine. The problem is when I add another statement after, to do some deletions from the main employee table using this aforementioned cte_list - the cte_list apparently no longer exists?
SQL Error [42P01]: ERROR: relation "cte_list" does not exist
the actual delete query:
delete from employees.employee where id in (select col_val::uuid from cte_list);
Can the cte_list CTE table only be used once or something? I'm running these statements in a LOOP and I need to run the exact same calls for about 2 or 3 other tables but hit a sticking point here.
A CTE only exists for the duration of the statement of which it's a part. I gather you have an INSERT statement with the CTE preceding it:
with cte_list
as (select pl.col_val
from prune_list pl
join employees.employee emp
on pl.col_val::uuid = emp.id
where pl.col_nm = 'employee_ref_id'
limit 100
)
insert into employees.employee_arch_test
(select *
from employees.employee
where id in (select col_val::uuid from cte_list)
and not exists (select 1
from employees.employee_arch_test
where employees.employee_arch_test.id = employees.employee.id)
);
The CTE is part of the INSERT statement - it is not a separate statement by itself. It only exists for the duration of the INSERT statement.
If you need something which lasts longer your options are:
Add the same CTE to each of your following statements. Note that because data may be changing in your database each invocation of the CTE may return different data.
Create a view which performs the same operations as the CTE, then use the view in place of the CTE. Note that because data may be changing in your database each invocation of the view may return different data.
Create a temporary table to hold the data from your CTE query, then use the temporary table in place of the CTE. This has the advantage of providing a consistent set of data to all operations.

erratic "delayed" CTE evaluation?

I observe a behaviour with CTEs which I did not expect (and seems inconsistent).
Not quite sure that it is correct...
Basically, through a CTE, I filter rows to avoid a particular problem, then use the result of that CTE to perform calculations that would break on the problematic rows which I thought I eliminated in my CTE...
Take a simple table with a varchar column that often has a number in it, but not always
CREATE TABLE MY_TABLE(ROW_ID INTEGER NOT NULL
, GOOD_ROW BOOLEAN NOT NULL
, SOME_VALUE VARCHAR NOT NULL);
INSERT INTO MY_TABLE(ROW_ID, GOOD_ROW, SOME_VALUE)
VALUES(1, TRUE, '1'), (2, TRUE, '2'), (3, FALSE, 'ABC');
I also create a small table with just numbers to join on
CREATE TABLE NUMBERS(NUMBER_ID INTEGER NOT NULL);
INSERT INTO NUMBERS(NUMBER_ID) VALUES(1), (2), (3);
Joining these two tables on SOME_VALUE results in an error because 'ABC' is not numeric and it appears that the JOIN is evaluated BEFORE the WHERE clause (BAD implications on performance here...)
SELECT *
FROM MY_TABLE
INNER JOIN NUMBERS ON NUMBERS.NUMBER_ID = TO_NUMBER(SOME_VALUE)
WHERE ROW_ID < 3; --> ERROR
So, I try to filter my first table through a CTE which only return rows for which SOME_VALUE is numeric
WITH ONLY_GOOD_ONES
AS (
SELECT SOME_VALUE
FROM MY_TABLE
WHERE GOOD_ROW = TRUE
)
SELECT *
FROM ONLY_GOOD_ONES;
Now, I would expect to be able to use the result of this CTE with SOME_VALUE being numeric.
WITH ONLY_GOOD_ONES
AS (
SELECT SOME_VALUE
FROM MY_TABLE
WHERE GOOD_ROW = TRUE
)
SELECT *
FROM ONLY_GOOD_ONES
INNER JOIN NUMBERS ON NUMBERS.NUMBER_ID = TO_NUMBER(SOME_VALUE);
Miracle!!!
It worked!
I get my 2 expected records.
So far so good...
However, if I had defined my CTE slightly differently (WHERE clause which filters the same records)
WITH ONLY_GOOD_ONES
AS (
SELECT SOME_VALUE
FROM MY_TABLE
WHERE ROW_ID < 3
)
SELECT *
FROM ONLY_GOOD_ONES;
This CTE returns exactly the same thing as before
But if I try to join, it Fails!
WITH ONLY_GOOD_ONES
AS (
SELECT *
FROM MY_TABLE
WHERE ROW_ID < 3
)
SELECT *
FROM ONLY_GOOD_ONES
INNER JOIN NUMBERS ON NUMBERS.NUMBER_ID = TO_NUMBER(SOME_VALUE);
I get the following error...
SQL Error [100038] [22018]: Numeric value 'ABC' is not recognized
Is there a particular explanation to this second version of the CTE behaving differently???
The actual answer is because snowflake does not follow the SQL standard, and execute SQL in the order given.
They apply transforms to data prior to filtering when there optimizer decides it wants to.
So for your table MY_TABLE when you do
SELECT some_value::NUMBER FROM my_table WHERE row_id IN (1,2);
You will under some cases have the as_number cast happen on all row, and explode on the 'ABC'. Which is violating SQL rules, that WHERE are evaluated before SELECT transforms are done, but Snowflake have known this for years, and it's intentional, as it makes things run faster.
The solution is to understand you have mixed data and therefore assume the code can and will be ran out of order, and thus use the protective versions of the functions like TRY_TO_NUMBER
The kicker is you can write a few nested SELECTs to avoid the problem and then put something like a window funcation around the code and the optimizer jump back into this behavour and you SQL explodes again. Thus the solution is to understand if you have mixed data, and handle it. Oh and complain it's a bug.
This is because you're getting a different execution plan with the different queries.
Here's how the query is executed with the working query:
... and here is how it's executed with the query generating a failure. The error comes from the fact that the join filter is applied directly on the table scan before the ROW_ID < 3 filter is applied, compared to the working query.
You can see these plans under history, clicking the query id and then the 'profile' tab.
It looks like the join filter is applied so early, maybe because of a wrong estimation. When I run the queries on my test database, they completed without any error.
To overcome the issue, you can always "Error-handling Conversion Functions":
SELECT *
FROM MY_TABLE
INNER JOIN NUMBERS ON NUMBERS.NUMBER_ID = TRY_TO_NUMBER(SOME_VALUE)
WHERE ROW_ID < 3;
More information:
https://docs.snowflake.com/en/sql-reference/functions-conversion.html#label-try-conversion-functions

TSql Output From in Update query

I'm trying to update a table and return some values in the same query, however one of the values to return is located in a linked table
Since sub-queries (which seem perfectly unambiguous to me) aren't allowed in Output clauses I'm trying to write the query using the Output From syntax with a join but this seems to me to produce all sorts of ambiguity
Consider for example the following query:
UPDATE tbla SET datecol=CURRENT_TIMESTAMP
OUTPUT deleted.datecol AS old, inserted.datecol AS new, b.col2
FROM tbla a LEFT JOIN tblb b ON a.bkey=b.bkey
WHERE akey=6
How does Sql Server know to what that WHERE clause refers? It seems to me that both the UPDATE and FROM parts of the query both qualify for a WHERE clause, so will that WHERE clause restrict which rows update or which rows appear in the output or both?
In my testing I've also seen Sql Server ask for table identifiers to be added to the WHERE clause in situations that I've been unable to pick out a cause or pattern for, so is the tbla referred to in the UPDATE part of the statement implicitly identified as "a" because I've aliased it as that in the FROM statement? If so is "a" referencing inserted or deleted? And if I hadn't aliased it there which version of the table would tbla.akey refer to?
I've not been able to find any decipherable documentation on exactly how this works, and the messages coming back from SQL Server when I'm testing are only making me more confused
I'd also love to know why the following query isn't allowed as it seems like a faultlessly superior way of doing the same thing, certainly not at all ambiguous and a lot more self explanatory
UPDATE tbla SET datecol=CURRENT_TIMESTAMP
OUTPUT deleted.datecol AS old, inserted.datecol AS new,
(SELECT TOP(1) b.col2 FROM tblb b WHERE deleted.bkey = b.bkey) AS col2
WHERE akey=6
You could write the output to a table variable and then work with it:
Could look like that in your case:
DECLARE #output TABLE
(
bkey INT,
datecolold DATETIME,
datecolnew DATETIME
)
UPDATE tbla SET datecol=CURRENT_TIMESTAMP
OUTPUT deleted.bkey, deleted.datecol, inserted.datecol INTO #output
WHERE akey=6
SELECT b.col2, o.* FROM #output o INNER JOIN tblb b ON o.bkey = b.bkey

Persistent WITH statement in SQL Server 2008 [duplicate]

I've got a question which occurs when I was using the WITH-clause in one of my script. The question is easy to pointed out I wanna use the CTE alias multiple times instead of only in outer query and there is crux.
For instance:
-- Define the CTE expression
WITH cte_test (domain1, domain2, [...])
AS
-- CTE query
(
SELECT domain1, domain2, [...]
FROM table
)
-- Outer query
SELECT * FROM cte_test
-- Now I wanna use the CTE expression another time
INSERT INTO sometable ([...]) SELECT [...] FROM cte_test
The last row will lead to the following error because it's outside the outer query:
Msg 208, Level 16, State 1, Line 12 Invalid object name 'cte_test'.
Is there a way to use the CTE multiple times resp. make it persistent? My current solution is to create a temp table where I store the result of the CTE and use this temp table for any further statements.
-- CTE
[...]
-- Create a temp table after the CTE block
DECLARE #tmp TABLE (domain1 DATATYPE, domain2 DATATYPE, [...])
INSERT INTO #tmp (domain1, domain2, [...]) SELECT domain1, domain2, [...] FROM cte_test
-- Any further DML statements
SELECT * FROM #tmp
INSERT INTO sometable ([...]) SELECT [...] FROM #tmp
[...]
Frankly, I don't like this solution. Does anyone else have a best practice for this problem?
Thanks in advance!
A CommonTableExpression doesn't persist data in any way. It's basically just a way of creating a sub-query in advance of the main query itself.
This makes it much more like an in-line view than a normal sub-query would be. Because you can reference it repeatedly in one query, rather than having to type it again and again.
But it is still just treated as a view, expanded into the queries that reference it, macro like. No persisting of data at all.
This, unfortunately for you, means that you must do the persistance yourself.
If you want the CTE's logic to be persisted, you don't want an in-line view, you just want a view.
If you want the CTE's result set to be persisted, you need a temp table type of solution, such as the one you do not like.
A CTE is only in scope for the SQL statement it belongs to. If you need to reuse its data in a subsequent statement, you need a temporary table or table variable to store the data in. In your example, unless you're implementing a recursive CTE I don't see that the CTE is needed at all - you can store its contents straight in a temporary table/table variable and reuse it as much as you want.
Also note that your DELETE statement would attempt to delete from the underlying table, unlike if you'd placed the results into a temporary table/table variable.

SQL WHERE NOT EXISTS (skip duplicates)

Hello I'm struggling to get the query below right. What I want is to return rows with unique names and surnames. What I get is all rows with duplicates
This is my sql
DECLARE #tmp AS TABLE (Name VARCHAR(100), Surname VARCHAR(100))
INSERT INTO #tmp
SELECT CustomerName,CustomerSurname FROM Customers
WHERE
NOT EXISTS
(SELECT Name,Surname
FROM #tmp
WHERE Name=CustomerName
AND ID Surname=CustomerSurname
GROUP BY Name,Surname )
Please can someone point me in the right direction here.
//Desperate (I tried without GROUP BY as well but get same result)
DISTINCT would do the trick.
SELECT DISTINCT CustomerName, CustomerSurname
FROM Customers
Demo
If you only want the records that really don't have duplicates (as opposed to getting duplicates represented as a single record) you could use GROUP BY and HAVING:
SELECT CustomerName, CustomerSurname
FROM Customers
GROUP BY CustomerName, CustomerSurname
HAVING COUNT(*) = 1
Demo
First, I thought that #David answer is what you want. But rereading your comments, perhaps you want all combinations of Names and Surnames:
SELECT n.CustomerName, s.CustomerSurname
FROM
( SELECT DISTINCT CustomerName
FROM Customers
) AS n
CROSS JOIN
( SELECT DISTINCT CustomerSurname
FROM Customers
) AS s ;
Are you doing that while your #Tmp table is still empty?
If so: your entire "select" is fully evaluated before the "insert" statement, it doesn't do "run the query and add one row, insert the row, run the query and get another row, insert the row, etc."
If you want to insert unique Customers only, use that same "Customer" table in your not exists clause
SELECT c.CustomerName,c.CustomerSurname FROM Customers c
WHERE
NOT EXISTS
(SELECT 1
FROM Customers c1
WHERE c.CustomerName = c1.CustomerName
AND c.CustomerSurname = c1.CustomerSurname
AND c.Id <> c1.Id)
If you want to insert a unique set of customers, use "distinct"
Typically, if you're doing a WHERE NOT EXISTS or WHERE EXISTS, or WHERE NOT IN subquery,
you should use what is called a "correlated subquery", as in ypercube's answer above, where table aliases are used for both inside and outside tables (where inside table is joined to outside table). ypercube gave a good example.
And often, NOT EXISTS is preferred over NOT IN (unless the WHERE NOT IN is selecting from a totally unrelated table that you can't join on.)
Sometimes if you're tempted to do a WHERE EXISTS (SELECT from a small table with no duplicate values in column), you could also do the same thing by joining the main query with that table on the column you want in the EXISTS. Not always the best or safest solution, might make query slower if there are many rows in that table and could cause many duplicate rows if there are dup values for that column in the joined table -- in which case you'd have to add DISTINCT to the main query, which causes it to SORT the data on all columns.
-- Not efficient at all.
And, similarly, the WHERE NOT IN or NOT EXISTS correlated subqueries can be accomplished (and give the exact same execution plan) if you LEFT OUTER JOIN the table you were going to subquery -- and add a WHERE . IS NULL.
You have to be careful using that, but you don't need a DISTINCT. Frankly, I prefer to use the WHERE NOT IN subqueries or NOT EXISTS correlated subqueries, because the syntax makes the intention clear and it's hard to go wrong.
And you do not need a DISTINCT in the SELECT inside such subqueries (correlated or not). It would be a waste of processing (and for WHERE EXISTS or WHERE IN subqueries, the SQL optimizer would ignore it anyway and just use the first value that matched for each row in the outer query). (Hope that makes sense.)

Resources