Strange behavior of CTE - sql-server

I just answered this: Generate scripts with new ids (also for dependencies)
My first attempt was this:
DECLARE @Form1 UNIQUEIDENTIFIER=NEWID();
DECLARE @Form2 UNIQUEIDENTIFIER=NEWID();
DECLARE @tblForms TABLE(id UNIQUEIDENTIFIER,FormName VARCHAR(100));
INSERT INTO @tblForms VALUES(@Form1,'test1'),(@Form2,'test2');
DECLARE @tblFields TABLE(id UNIQUEIDENTIFIER,FormId UNIQUEIDENTIFIER,FieldName VARCHAR(100));
INSERT INTO @tblFields VALUES(NEWID(),@Form1,'test1.1'),(NEWID(),@Form1,'test1.2'),(NEWID(),@Form1,'test1.3')
,(NEWID(),@Form2,'test2.1'),(NEWID(),@Form2,'test2.2'),(NEWID(),@Form2,'test2.3');
--These are the original IDs
SELECT frms.id,frms.FormName
,flds.id,flds.FieldName
FROM @tblForms AS frms
INNER JOIN @tblFields AS flds ON frms.id=flds.FormId ;
--The same with new ids
WITH FormsWithNewID AS
(
SELECT NEWID() AS myNewFormID
,*
FROM @tblForms
)
SELECT frms.myNewFormID, frms.id,frms.FormName
,NEWID() AS myNewFieldID,flds.FieldName
FROM FormsWithNewID AS frms
INNER JOIN @tblFields AS flds ON frms.id=flds.FormId
The second select should deliver (at least I thought so) two values in "myNewFormID", each three times. But it comes up with 6 different values. This would mean that the CTE's "NEWID()" is evaluated for each row of the final result set. What am I missing?

Your understanding of CTEs is wrong. They are not simply a table variable that's filled with the results of the query - instead, they are a query on their own. Note that CTEs can be used recursively - this would be quite a sight with table variables :)
From MSDN:
A common table expression (CTE) can be thought of as a temporary result set that is defined within the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statement. A CTE is similar to a derived table in that it is not stored as an object and lasts only for the duration of the query. Unlike a derived table, a CTE can be self-referencing and can be referenced multiple times in the same query.
The "can be thought of" is a bit deceiving: sure, it can be thought of that way, but it's not a stored result set. You don't see this manifesting when you're only using pure functions, but as you've noticed, NEWID() is not pure. In reality, a CTE is more like a named subquery. In your example, you'll get the same thing if you just move the query from the CTE into the FROM clause directly.
To illustrate this even further, you can add another join on the CTE to the query:
WITH FormsWithNewID AS
(
SELECT NEWID() AS myNewFormID
,*
FROM @tblForms
)
SELECT frms.myNewFormID, frms.id,frms.FormName
,NEWID() AS myNewFieldID,flds.FieldName,
frms2.myNewFormID
FROM FormsWithNewID AS frms
INNER JOIN @tblFields AS flds ON frms.id=flds.FormId
LEFT JOIN FormsWithNewID AS frms2 ON frms.id = frms2.id
You'll see that the frms2.myNewFormID contains different myNewFormIDs.
Keep this in mind - you can only treat the CTE as a result set when you're only using pure functions on non-changing data; in other words, if executing the same query in a serializable transaction isolation level twice will produce the same result sets.
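One way to get what the question expected is to store the generated IDs before joining. This is only a sketch, and it assumes the question's table variables (written here as @tblForms and @tblFields) are declared earlier in the same batch. Once NEWID() has been evaluated into a stored row, every later reference reads the same value. The name @FormsWithNewID is made up for this illustration.

```sql
-- Materialize one new ID per form; the stored values no longer change.
DECLARE @FormsWithNewID TABLE
(
    myNewFormID UNIQUEIDENTIFIER,
    id          UNIQUEIDENTIFIER,
    FormName    VARCHAR(100)
);

INSERT INTO @FormsWithNewID (myNewFormID, id, FormName)
SELECT NEWID(), id, FormName
FROM @tblForms;

-- Join against the stored rows instead of a CTE.
SELECT frms.myNewFormID, frms.id, frms.FormName,
       NEWID() AS myNewFieldID, flds.FieldName
FROM @FormsWithNewID AS frms
INNER JOIN @tblFields AS flds ON frms.id = flds.FormId;
```

With the sample data from the question, myNewFormID should now show two distinct values, each repeated three times, while myNewFieldID still differs on every row.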

NEWID() returns a new value every time it is executed, so each reference to it produces a different value.
For example:
select top 5 newid()
from sys.tables
order by newid()
The results will not appear sorted, because the value in the SELECT list comes from a different NEWID() call than the one used in the ORDER BY.

Related

What is the "lifespan" of a postgres CTE expression? e.g. WITH... AS

I have a CTE I am using to pull some data from two tables then stick in an intermediate table called cte_list, something like
with cte_list as (
select pl.col_val from prune_list pl join employees.employee emp on pl.col_val::uuid = emp.id
where pl.col_nm = 'employee_ref_id' limit 100
)
Then, I am doing an insert to move records from the cte_list to another archive table (if they don't exist) called employee_arch_test
insert into employees.employee_arch_test (
select * from employees.employee where id in (select col_val::uuid from cte_list)
and not exists (select 1 from employees.employee_arch_test where employees.employee_arch_test.id=employees.employee.id)
);
This seems to work fine. The problem is when I add another statement after, to do some deletions from the main employee table using this aforementioned cte_list - the cte_list apparently no longer exists?
SQL Error [42P01]: ERROR: relation "cte_list" does not exist
the actual delete query:
delete from employees.employee where id in (select col_val::uuid from cte_list);
Can the cte_list CTE table only be used once or something? I'm running these statements in a LOOP and I need to run the exact same calls for about 2 or 3 other tables but hit a sticking point here.

A CTE only exists for the duration of the statement of which it's a part. I gather you have an INSERT statement with the CTE preceding it:
with cte_list
as (select pl.col_val
from prune_list pl
join employees.employee emp
on pl.col_val::uuid = emp.id
where pl.col_nm = 'employee_ref_id'
limit 100
)
insert into employees.employee_arch_test
(select *
from employees.employee
where id in (select col_val::uuid from cte_list)
and not exists (select 1
from employees.employee_arch_test
where employees.employee_arch_test.id = employees.employee.id)
);
The CTE is part of the INSERT statement - it is not a separate statement by itself. It only exists for the duration of the INSERT statement.
If you need something which lasts longer your options are:
Add the same CTE to each of your following statements. Note that because data may be changing in your database each invocation of the CTE may return different data.
Create a view which performs the same operations as the CTE, then use the view in place of the CTE. Note that because data may be changing in your database each invocation of the view may return different data.
Create a temporary table to hold the data from your CTE query, then use the temporary table in place of the CTE. This has the advantage of providing a consistent set of data to all operations.
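The temporary-table option could be sketched roughly as follows for PostgreSQL. The table name tmp_prune_list and its single id column are assumptions made for this example; everything else is taken from the question. Because the question runs these statements in a loop, the sketch drops the table at the end so the next iteration can recreate it.

```sql
-- Capture the candidate ids once, so every later statement sees the same set.
create temporary table tmp_prune_list as
select pl.col_val::uuid as id
from prune_list pl
join employees.employee emp
  on pl.col_val::uuid = emp.id
where pl.col_nm = 'employee_ref_id'
limit 100;

-- Archive the rows that are not in the archive table yet.
insert into employees.employee_arch_test
select e.*
from employees.employee e
where e.id in (select id from tmp_prune_list)
  and not exists (select 1
                  from employees.employee_arch_test a
                  where a.id = e.id);

-- Delete the same rows from the main table.
delete from employees.employee
where id in (select id from tmp_prune_list);

-- Clean up so the name can be reused in the next iteration.
drop table tmp_prune_list;
```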

TSql Output From in Update query

I'm trying to update a table and return some values in the same query; however, one of the values to return is located in a linked table.
Since sub-queries (which seem perfectly unambiguous to me) aren't allowed in Output clauses, I'm trying to write the query using the Output From syntax with a join, but this seems to me to produce all sorts of ambiguity.
Consider for example the following query:
UPDATE tbla SET datecol=CURRENT_TIMESTAMP
OUTPUT deleted.datecol AS old, inserted.datecol AS new, b.col2
FROM tbla a LEFT JOIN tblb b ON a.bkey=b.bkey
WHERE akey=6
How does Sql Server know to what that WHERE clause refers? It seems to me that both the UPDATE and FROM parts of the query both qualify for a WHERE clause, so will that WHERE clause restrict which rows update or which rows appear in the output or both?
In my testing I've also seen Sql Server ask for table identifiers to be added to the WHERE clause in situations that I've been unable to pick out a cause or pattern for, so is the tbla referred to in the UPDATE part of the statement implicitly identified as "a" because I've aliased it as that in the FROM statement? If so is "a" referencing inserted or deleted? And if I hadn't aliased it there which version of the table would tbla.akey refer to?
I've not been able to find any decipherable documentation on exactly how this works, and the messages coming back from SQL Server when I'm testing are only making me more confused.
I'd also love to know why the following query isn't allowed, as it seems like a far superior way of doing the same thing: not at all ambiguous and a lot more self-explanatory.
UPDATE tbla SET datecol=CURRENT_TIMESTAMP
OUTPUT deleted.datecol AS old, inserted.datecol AS new,
(SELECT TOP(1) b.col2 FROM tblb b WHERE deleted.bkey = b.bkey) AS col2
WHERE akey=6

You could write the output to a table variable and then work with it. In your case it could look like this:
DECLARE @output TABLE
(
bkey INT,
datecolold DATETIME,
datecolnew DATETIME
)
UPDATE tbla SET datecol=CURRENT_TIMESTAMP
OUTPUT deleted.bkey, deleted.datecol, inserted.datecol INTO @output
WHERE akey=6
SELECT b.col2, o.* FROM @output o INNER JOIN tblb b ON o.bkey = b.bkey
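As an alternative sketch, the join-based form from the question can be written so that no reference is ambiguous. In T-SQL, the OUTPUT clause of an UPDATE may reference columns of tables named in the statement's FROM clause, and updating through the alias makes clear which table instance is the target. The WHERE clause then filters the joined rows, which decides both which rows are updated and which rows are returned. Table and column names come from the question; the sketch assumes at most one tblb row per bkey.

```sql
-- Update through the alias so every column reference is explicit.
UPDATE a
SET    a.datecol = CURRENT_TIMESTAMP
OUTPUT deleted.datecol  AS old,   -- value before the update
       inserted.datecol AS new,   -- value after the update
       b.col2                     -- column from the joined table
FROM   tbla AS a
LEFT JOIN tblb AS b ON a.bkey = b.bkey
WHERE  a.akey = 6;                -- restricts the rows that are updated and returned
```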

SQL select after where clause

Here is the setup:
Table 1: table_1
column_id
column_12
column_13
column_14
Table 2: table_2
column_id
column_21
column_22
Select statement:
DECLARE @Variable INT;
SET @Variable = 300;
SELECT b.column_id,
b.column_12,
SUM(b.column_13) OVER (PARTITION BY b.column_id ORDER BY b.column_12) AS sum_column_13,
@Variable / nullif(SUM(b.column_13) OVER (PARTITION BY b.column_id ORDER BY b.column_12),0) AS divide_var,
(b.column_13*100) / nullif(b.column_14,0) AS divide_column_3
FROM dbo.table_1 b
WHERE b.column_12 IN ('AM','AJ','A-M','A-J','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q');
This works great, all the formulas are working and the correct results are shown.
b.column_id is retrieved
b.column_12 is retrieved
sum_column_13 is equal to the sum of all the column_13 values (partitioned by column_id)
divide_var is equal to a variable divided by sum_column_13
divide_column_3 is equal to column_13 (times 100) divided by column_14
Now, however, I am trying to retrieve @Variable from table_2 instead of it being static.
Both tables have a column_id, which could link them together. However, this value is not unique.
The actual number for @Variable should come from table_2, by summing all the values of column_21 for each column_id (something similar to sum_column_13).
I can make both things work separately, but when I try to combine them (with a JOIN, or an extra SELECT clause) everything goes wild. For example, when using the JOIN statement, the WHERE clause is solely applied to the JOIN statement and not to the SELECT statement. How I imagine it should work is to take the column_id results from the current SELECT, then use them to retrieve the required data from table_2.
I understand my explanation is not very clear. So here is an SQLFiddle.
As you can see the variable right now comes from adding up the two values in table_2.
Hope this helps.
Thanks,

Here is sample code. Instead of a variable, it uses the sum of column_21 per column_id directly, computed in a CTE:
with tbl_2 (column_id, col_sum) as
( select column_id, sum(column_21) as col_sum from dbo.table_2 group by column_id )
SELECT b.column_id,
b.column_12,
SUM(b.column_13) OVER (PARTITION BY b.column_id ORDER BY b.column_12) AS sum_column_13,
tbl_2.col_sum / nullif(SUM(b.column_13) OVER (PARTITION BY b.column_id ORDER BY b.column_12),0) AS divide_var,
(b.column_13*100) / nullif(b.column_14,0) AS divide_column_3
FROM dbo.table_1 b
join tbl_2 on b.column_id = tbl_2.column_id
WHERE b.column_12 IN ('AM','AJ','A-M','A-J','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q');

SQL WHERE NOT EXISTS (skip duplicates)

Hello I'm struggling to get the query below right. What I want is to return rows with unique names and surnames. What I get is all rows with duplicates
This is my sql
DECLARE @tmp AS TABLE (Name VARCHAR(100), Surname VARCHAR(100))
INSERT INTO @tmp
SELECT CustomerName,CustomerSurname FROM Customers
WHERE
NOT EXISTS
(SELECT Name,Surname
FROM @tmp
WHERE Name=CustomerName
AND Surname=CustomerSurname
GROUP BY Name,Surname )
Please can someone point me in the right direction here.
//Desperate (I tried without GROUP BY as well but get same result)

DISTINCT would do the trick.
SELECT DISTINCT CustomerName, CustomerSurname
FROM Customers
If you only want the records that really don't have duplicates (as opposed to getting duplicates represented as a single record) you could use GROUP BY and HAVING:
SELECT CustomerName, CustomerSurname
FROM Customers
GROUP BY CustomerName, CustomerSurname
HAVING COUNT(*) = 1

At first, I thought that @David's answer was what you wanted. But after rereading your comments, perhaps you want all combinations of names and surnames:
SELECT n.CustomerName, s.CustomerSurname
FROM
( SELECT DISTINCT CustomerName
FROM Customers
) AS n
CROSS JOIN
( SELECT DISTINCT CustomerSurname
FROM Customers
) AS s ;

Are you doing that while your @tmp table is still empty?
If so: your entire "select" is fully evaluated before the "insert" happens. It doesn't "run the query and get one row, insert the row, run the query and get another row, insert the row, etc."
If you want to insert only the customers that have no duplicate, use that same "Customers" table in your NOT EXISTS clause:
SELECT c.CustomerName,c.CustomerSurname FROM Customers c
WHERE
NOT EXISTS
(SELECT 1
FROM Customers c1
WHERE c.CustomerName = c1.CustomerName
AND c.CustomerSurname = c1.CustomerSurname
AND c.Id <> c1.Id)
If you want to insert a unique set of customers, use "distinct"
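That last suggestion could look roughly like this, reusing the table variable from the question (written here as @tmp). It is a minimal sketch: because the SELECT is fully evaluated before any row is inserted, the de-duplication has to happen inside the SELECT itself.

```sql
DECLARE @tmp TABLE (Name VARCHAR(100), Surname VARCHAR(100));

-- DISTINCT collapses repeated name/surname pairs before the insert runs.
INSERT INTO @tmp (Name, Surname)
SELECT DISTINCT CustomerName, CustomerSurname
FROM Customers;
```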

Typically, if you're doing a WHERE NOT EXISTS, WHERE EXISTS, or WHERE NOT IN subquery, you should use what is called a "correlated subquery", as in ypercube's answer above, where table aliases are used for both the inside and outside tables (and the inside table is joined to the outside table). ypercube gave a good example.
And often, NOT EXISTS is preferred over NOT IN (unless the WHERE NOT IN is selecting from a totally unrelated table that you can't join on.)
Sometimes, if you're tempted to do a WHERE EXISTS (SELECT from a small table with no duplicate values in the column), you could also do the same thing by joining the main query with that table on the column you want in the EXISTS. This is not always the best or safest solution: it might make the query slower if there are many rows in that table, and it could produce many duplicate rows if there are duplicate values for that column in the joined table. In that case you'd have to add DISTINCT to the main query, which causes it to sort the data on all columns, which is not efficient at all.
And, similarly, the WHERE NOT IN or NOT EXISTS correlated subqueries can be accomplished (often with the same execution plan) if you LEFT OUTER JOIN the table you were going to subquery and add a WHERE <joined column> IS NULL.
You have to be careful using that, but you don't need a DISTINCT. Frankly, I prefer to use the WHERE NOT IN subqueries or NOT EXISTS correlated subqueries, because the syntax makes the intention clear and it's hard to go wrong.
And you do not need a DISTINCT in the SELECT inside such subqueries (correlated or not). It would be a waste of processing (and for WHERE EXISTS or WHERE IN subqueries, the SQL optimizer would ignore it anyway and just use the first value that matched for each row in the outer query). (Hope that makes sense.)
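To make the comparison concrete, here is a small sketch of the two equivalent forms. The Orders table and its CustomerId column are hypothetical and used only for illustration; Customers and its Id column follow the earlier answers. Both queries are meant to return the customers that have no matching row in Orders.

```sql
-- Correlated NOT EXISTS: the inner query refers to the outer alias c.
SELECT c.Id, c.CustomerName, c.CustomerSurname
FROM Customers AS c
WHERE NOT EXISTS (SELECT 1
                  FROM Orders AS o          -- hypothetical table
                  WHERE o.CustomerId = c.Id);

-- LEFT OUTER JOIN plus IS NULL: unmatched customers get NULLs for o's columns.
SELECT c.Id, c.CustomerName, c.CustomerSurname
FROM Customers AS c
LEFT OUTER JOIN Orders AS o ON o.CustomerId = c.Id
WHERE o.CustomerId IS NULL;                 -- test the join column itself
```

The IS NULL test is applied to the join column, so a NULL there can only come from a missing match and not from stored data.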

T-SQL filtering on dynamic name-value pairs

I'll describe what I am trying to achieve:
I am passing down to a SP an xml with name value pairs that I put into a table variable, let's say @nameValuePairs.
I need to retrieve a list of IDs for expressions (a table) with those exact match of name-value pairs (attributes, another table) associated.
This is my schema:
Expressions table --> (expressionId, attributeId)
Attributes table --> (attributeId, attributeName, attributeValue)
After trying complicated stuff with dynamic SQL and evil cursors (which works but it's painfully slow) this is what I've got now:
--do the magic plz!
-- retrieve number of name-value pairs
SET @noOfAttributes = (select count(*) from @nameValuePairs)
select distinct
e.expressionId, a.attributeName, a.attributeValue
into
#temp
from
expressions e
join
attributes a
on
e.attributeId = a.attributeId
join --> this join does the filtering
@nameValuePairs nvp
on
a.attributeName = nvp.name and a.attributeValue = nvp.value
group by
e.expressionId, a.attributeName, a.attributeValue
-- now select the IDs I need
-- since I did a select distinct above if the number of matches
-- for a given ID is the same as noOfAttributes then BINGO!
select distinct
expressionId
from
#temp
group by expressionId
having count(*) = @noOfAttributes
Can people please review and see if they can spot any problems? Is there a better way of doing this?
Any help appreciated!

I believe that this would satisfy the requirement you're trying to meet. I'm not sure how much prettier it is, but it should work and wouldn't require a temp table:
SET @noOfAttributes = (select count(*) from @nameValuePairs)
SELECT e.expressionId
FROM expressions e
LEFT JOIN (
SELECT a.attributeId
FROM attributes a
JOIN @nameValuePairs nvp ON nvp.name = a.attributeName AND nvp.value = a.attributeValue
) t ON t.attributeId = e.attributeId
GROUP BY e.expressionId
HAVING SUM(CASE WHEN t.attributeId IS NULL THEN (@noOfAttributes + 1) ELSE 1 END) = @noOfAttributes
EDIT: After doing some more evaluation, I found an issue where certain expressions would be included that shouldn't have been. I've modified my query to take that into account.

One error I see is that you have no table with an alias of b, yet you are using: a.attributeId = b.attributeId.
Try fixing that and see if it works, unless I am missing something.
EDIT: I think you just fixed this in your edit, but is it supposed to be a.attributeId = e.attributeId?

This is not a bad approach, depending on the sizes and indexes of the tables, including @nameValuePairs. If these row counts are high or it otherwise becomes slow, you may do better to put @nameValuePairs into a temp table instead, add appropriate indexes, and use a single query instead of two separate ones.
I do notice that you are putting columns into #temp that you are not using; it would be faster to exclude them (though it would mean duplicate rows in #temp). Also, your second query has both a "distinct" and a "group by" on the same columns. You don't need both, so I would drop the "distinct" (it probably won't affect performance, because the optimizer already figured this out).
Finally, #temp would probably be faster with a clustered non-unique index on expressionId (I am assuming that this is SQL 2005). You could add it after the SELECT..INTO, but it is usually as fast or faster to add it before you load. This would require you to CREATE #temp first, add the clustered index, and then use INSERT..SELECT to load it instead.
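The create-first approach might be sketched as below. The column types and the index name IX_temp_expressionId are assumptions, since the original schema does not state them, and the table variable from the question is written as @nameValuePairs with name and value columns.

```sql
-- Create the temp table up front (column types are assumed).
CREATE TABLE #temp
(
    expressionId   INT          NOT NULL,
    attributeName  VARCHAR(100) NOT NULL,
    attributeValue VARCHAR(100) NOT NULL
);

-- Non-unique clustered index, added before the load.
CREATE CLUSTERED INDEX IX_temp_expressionId ON #temp (expressionId);

-- Load with INSERT..SELECT instead of SELECT..INTO.
INSERT INTO #temp (expressionId, attributeName, attributeValue)
SELECT DISTINCT e.expressionId, a.attributeName, a.attributeValue
FROM expressions AS e
JOIN attributes AS a
  ON e.attributeId = a.attributeId
JOIN @nameValuePairs AS nvp
  ON a.attributeName = nvp.name AND a.attributeValue = nvp.value;
```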
Here's one way to merge them into a single query (this should be 2000-compatible also):
-- retrieve number of name-value pairs
SET @noOfAttributes = (select count(*) from @nameValuePairs)
-- now select the IDs I need
-- since the derived table below uses select distinct, if the number of matches
-- for a given ID is the same as noOfAttributes then BINGO!
select
expressionId
from
(
select distinct
e.expressionId, a.attributeName, a.attributeValue
from
expressions e
join
attributes a
on
e.attributeId = a.attributeId
join --> this join does the filtering
@nameValuePairs nvp
on
a.attributeName = nvp.name and a.attributeValue = nvp.value
) as Temp
group by expressionId
having count(*) = @noOfAttributes
