Does WITH statement execute once per query or once per row? - sql-server

My understanding of the WITH statement (CTE) is that it executes once per query. With a query like this:
WITH Query1 AS ( ... )
SELECT *
FROM
SomeTable t1
LEFT JOIN Query1 t2 ON ...
If this results in 100 rows, I expect that Query1 was executed only once - not 100 times. If that assumption is correct, the time taken to run the entire query is roughly equal to the time taken to: run Query1 + select from SomeTable + join SomeTable to Query1.
I am in a situation where:
Query1 when run alone takes ~5 seconds (400k rows).
The remainder of the query, after removing the WITH statement and the LEFT JOIN takes ~15 seconds (400k rows).
So, when running the entire query with the WITH statement and the LEFT JOIN in place, I would have expected it to complete in a timely manner. Instead, I let it run for over an hour, and when I stopped it, it had only got as far as 11k rows.
I am clearly wrong, but why?

Example:
SET NOCOUNT ON;
SET IMPLICIT_TRANSACTIONS ON;
CREATE TABLE MyTable (MyID INT PRIMARY KEY);
GO
INSERT MyTable (MyID)
VALUES (11), (22), (33), (44), (55);
PRINT 'Test MyCTE:';
WITH MyCTE
AS (
SELECT *, ROW_NUMBER()OVER(ORDER BY MyID) AS RowNum
FROM MyTable
)
SELECT *
FROM MyCTE crt
LEFT JOIN MyCTE prev ON crt.RowNum=prev.RowNum+1;
ROLLBACK;
If you run the previous script in SSMS (press Ctrl+M to include the actual execution plan), you will get this execution plan for the last query:
In this case, the CTE is executed one time for the crt alias and five (!) times for the prev alias, once for every row from crt.
So, the answer for this question
Does WITH statement execute once per query or once per row?
is both: once per query (crt) and once per row (prev: once for every row from crt).
To optimize this query, as a start:
1) Store the results of the CTE (MyCTE or Query1) in a table variable or a temp table.
2) Define the primary key of this table as the join column(s).
3) Rewrite the final query to use this table variable or temp table.
Of course, you can also try to rewrite the final query without the self-join on the CTE.
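As a rough sketch of the three steps above, applied to the MyTable example: the temp table name #MyCTEResult and the output column aliases are made up for illustration, and whether this actually beats the CTE depends on the real data and plan.

```sql
-- Sketch only: #MyCTEResult is a hypothetical temp table name.
-- Step 1 + 2: materialize the CTE once, with the join column as the primary key.
CREATE TABLE #MyCTEResult (
    RowNum BIGINT NOT NULL PRIMARY KEY,  -- ROW_NUMBER() returns BIGINT
    MyID   INT    NOT NULL
);

INSERT INTO #MyCTEResult (RowNum, MyID)
SELECT ROW_NUMBER() OVER (ORDER BY MyID), MyID
FROM MyTable;

-- Step 3: the self-join now reads the stored rows instead of re-running the CTE.
SELECT crt.MyID, crt.RowNum, prev.MyID AS PrevMyID, prev.RowNum AS PrevRowNum
FROM #MyCTEResult crt
LEFT JOIN #MyCTEResult prev ON crt.RowNum = prev.RowNum + 1;

DROP TABLE #MyCTEResult;
```

The ROW_NUMBER() expression is evaluated exactly once here, when the temp table is filled, and the primary key on RowNum gives the optimizer an index to seek on for the join.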

Related

What is the "lifespan" of a postgres CTE expression? e.g. WITH... AS

I have a CTE I am using to pull some data from two tables then stick in an intermediate table called cte_list, something like
with cte_list as (
select pl.col_val from prune_list pl join employees.employee emp on pl.col_val::uuid = emp.id
where pl.col_nm = 'employee_ref_id' limit 100
)
Then, I am doing an insert to move records from the cte_list to another archive table (if they don't exist) called employee_arch_test
insert into employees.employee_arch_test (
select * from employees.employee where id in (select col_val::uuid from cte_list)
and not exists (select 1 from employees.employee_arch_test where employees.employee_arch_test.id=employees.employee.id)
);
This seems to work fine. The problem is when I add another statement after, to do some deletions from the main employee table using this aforementioned cte_list - the cte_list apparently no longer exists?
SQL Error [42P01]: ERROR: relation "cte_list" does not exist
the actual delete query:
delete from employees.employee where id in (select col_val::uuid from cte_list);
Can the cte_list CTE table only be used once or something? I'm running these statements in a LOOP and I need to run the exact same calls for about 2 or 3 other tables but hit a sticking point here.
A CTE only exists for the duration of the statement of which it's a part. I gather you have an INSERT statement with the CTE preceding it:
with cte_list
as (select pl.col_val
from prune_list pl
join employees.employee emp
on pl.col_val::uuid = emp.id
where pl.col_nm = 'employee_ref_id'
limit 100
)
insert into employees.employee_arch_test
(select *
from employees.employee
where id in (select col_val::uuid from cte_list)
and not exists (select 1
from employees.employee_arch_test
where employees.employee_arch_test.id = employees.employee.id)
);
The CTE is part of the INSERT statement - it is not a separate statement by itself. It only exists for the duration of the INSERT statement.
If you need something which lasts longer your options are:
Add the same CTE to each of your following statements. Note that because data may be changing in your database each invocation of the CTE may return different data.
Create a view which performs the same operations as the CTE, then use the view in place of the CTE. Note that because data may be changing in your database each invocation of the view may return different data.
Create a temporary table to hold the data from your CTE query, then use the temporary table in place of the CTE. This has the advantage of providing a consistent set of data to all operations.
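The temporary-table option could look roughly like the following in PostgreSQL. This is a sketch under assumptions: tmp_prune_ids is a hypothetical name, and the table and column names are taken from the question as posted.

```sql
-- Sketch only: tmp_prune_ids is a hypothetical temporary table name.
-- Capture the candidate ids once so every later statement sees the same set.
CREATE TEMPORARY TABLE tmp_prune_ids AS
SELECT pl.col_val::uuid AS id
FROM prune_list pl
JOIN employees.employee emp ON pl.col_val::uuid = emp.id
WHERE pl.col_nm = 'employee_ref_id'
LIMIT 100;

-- Archive the rows that are not already in the archive table.
INSERT INTO employees.employee_arch_test
SELECT e.*
FROM employees.employee e
WHERE e.id IN (SELECT id FROM tmp_prune_ids)
  AND NOT EXISTS (SELECT 1
                  FROM employees.employee_arch_test a
                  WHERE a.id = e.id);

-- Delete exactly the same set of rows from the main table.
DELETE FROM employees.employee
WHERE id IN (SELECT id FROM tmp_prune_ids);

-- Drop the table so the next loop iteration can recreate it.
DROP TABLE tmp_prune_ids;
```

Because the LIMIT 100 query runs only once, the INSERT and the DELETE are guaranteed to work on the same 100 ids, which repeating the CTE in each statement would not guarantee.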

WITH is not working as I expected sqlServer 2012

I am getting different results from a WITH statement. Here is my first query:
with q as (select top (100000) * from table1) select * from q
Let's say that table1 has an ID field. Everything seems normal when I execute that query; it works as I expected. But if I change the statement like this:
with q as (select top (100000) * from table1) select [ID] from q
or
with q as (select top (100000) * from table1) select q.[ID] from q
it returns rows that do not exist in the result of the first query (note that I only select ID). I understand that a WITH statement is a temporary result set, and I expect both queries to return the same rows no matter how many fields I select, so why is this happening? This could be a problem if I want to perform an update, or even worse a delete: I would not be completely sure that I had affected the rows I wanted.
If you select TOP x without an ORDER BY, which rows are returned is arbitrary, meaning you can get a different result set each time you execute it. Since you are changing the query slightly (for example, the optimizer may choose to read a different index when only ID is selected), I'm not surprised the result set is different. Add an ORDER BY whenever you SELECT TOP x.
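A minimal sketch of the suggested fix, assuming [ID] is unique in table1 (if it is not, ties can still be broken arbitrarily):

```sql
-- ORDER BY inside the CTE is allowed because it is paired with TOP;
-- it pins down which 100000 rows the CTE contains.
WITH q AS (
    SELECT TOP (100000) *
    FROM table1
    ORDER BY [ID]
)
SELECT [ID]
FROM q;
```

The ORDER BY here only decides which rows make it into q. If the final output order matters as well, the outer SELECT needs its own ORDER BY.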

Performing INSERT for each row in a select RESULT

First, a general description of the problem: I'm running a periodical process which updates total figures in a table. The issue is, that multiple updates may be required in each execution of the process, and each execution depends on the previous results.
My question is, can it be done in a single SQL Server SP?
My code (I altered it a little to simplify the sample):
INSERT INTO CustomerMinuteSessions(time, customer, sessions, bytes, previousTotalSessions)
SELECT MS.time,
MS.customer,
MS.totalSessions,
MS.totalBytes,
CTS.previousTotalSessions
FROM (SELECT time, customer, SUM(sessions) as totalSessions, SUM(bytes) AS totalBytes
FROM MinuteSessions
WHERE time > @time
GROUP BY time, x) MS
CROSS APPLY TVF_GetPreviousCustomerTotalSessions(MS.customer) CTS
ORDER BY time
The previousTotalSessions column depends on other rows in UpdatedTable, and its value is retrieved by CROSS APPLYing TVF_GetPreviousCustomerTotalSessions, but if I execute the SP as-is, all the rows use the value retrieved by the function without taking the rows added during the execution of the SP.
For the sake of completeness, here's TVF_GetPreviousCustomerTotalSessions:
FUNCTION [dbo].[TVF_GetCustomerCurrentSessions]
(
@customerId int
)
RETURNS @result TABLE (PreviousNumberOfSessions int)
AS
BEGIN
INSERT INTO @result
SELECT TOP 1 (PreviousNumberOfSessions + Opened - Closed) AS PreviousNumberOfSessions
FROM CustomerMinuteSessions
WHERE CustomerId = @customerId
ORDER BY time DESC
IF @@rowcount = 0
INSERT INTO @result(PreviousNumberOfSessions) VALUES(0)
RETURN
END
What is the best way (i.e. without a for loop, I guess...) to take previous rows into account for subsequent rows within the query?
If you are using SQL Server 2005 or later, you can do it with a few CTEs in one shot. If you use SQL Server 2000, you can use an inline table-valued function.
Personally I like the CTE approach more, so I'm including a schematic translation of your code to CTE syntax. (Bear in mind that I didn't prepare a test set to check it.)
WITH LastSessionByCustomer AS
(
-- Latest session time per customer (a CTE column needs a name, hence the alias)
SELECT CustomerID, MAX(Time) AS LastTime
FROM CustomerMinuteSessions
GROUP BY CustomerID
)
, GetPreviousCustomerTotalSessions AS
(
-- Running total taken from each customer's latest row only
SELECT LastSession.CustomerID, LastSession.PreviousNumberOfSessions + LastSession.Opened - LastSession.Closed AS PreviousNumberOfSessions
FROM CustomerMinuteSessions LastSession
INNER JOIN LastSessionByCustomer ON LastSessionByCustomer.CustomerID = LastSession.CustomerID
AND LastSessionByCustomer.LastTime = LastSession.Time
)
, MS AS
(
SELECT time, customer, SUM(sessions) as totalSessions, SUM(bytes) AS totalBytes
FROM MinuteSessions
WHERE time > @time
GROUP BY time, x
)
INSERT INTO CustomerMinuteSessions(time, customer, sessions, bytes, previousTotalSessions)
SELECT MS.time,
MS.customer,
MS.totalSessions,
MS.totalBytes,
ISNULL(GetPreviousCustomerTotalSessions.PreviousNumberOfSessions, 0)
FROM MS
-- LEFT JOIN keeps customers that have no earlier row; ISNULL then supplies 0
LEFT JOIN GetPreviousCustomerTotalSessions ON MS.Customer = GetPreviousCustomerTotalSessions.CustomerID
Going a bit beyond your question, I think that your query with CROSS APPLY could seriously hurt performance once the CustomerMinuteSessions table grows.
I would add an index like this to improve your chances of getting an index seek:
CREATE INDEX IX_CustomerMinuteSessions_CustomerId
ON CustomerMinuteSessions (CustomerId, [time] DESC, PreviousNumberOfSessions, Opened, Closed );

Get next sequence from table in insert statement to another table

I have 3 tables
Table_A has a bunch of rows
Table_B where rows are going to be inserted with data from Table_A
Table_C holds a number (integer) called code_number
I have a stored procedure (sp_getNextCode) that selects the current code_number from Table_C, creates and returns a varchar code string with this number (like yyyyMMdd + cast(code_number as varchar) or something) and updates the Table_C code_number with the next value (code_number+1)
So far so good.
Now I want to insert a number of rows from Table_A to Table_B WITHOUT THE USE OF CURSOR
using a
INSERT INTO TABLE_B
SELECT .... FROM TABLE_A
Again so far so good
The problem is that one of the values in the above insert statement has to be the output of the stored procedure sp_getNextCode.
I cannot use the stored procedure in the statement
I cannot create a function with the same code as sp_getNextCode as the functions cannot have INSERT/UPDATE/DELETE
I don't have the option of SQL Server 2012 (which has a sequence) only SQL Server 2008
Is there any way to achieve this, or is the only way to use cursors? (I REALLY want to avoid the cursor because I'm talking about thousands of rows that need to be inserted and it takes too long.)
Sure, there is a way: create a stored procedure that reads from Table_A, inserts into Table_B and updates Table_C:
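DECLARE @last_value INT, @records_to_insert INT;
BEGIN TRANSACTION
-- Read the last code number that was handed out
SELECT @last_value = last_value FROM Table_C;
SELECT @records_to_insert = COUNT(*)
FROM Table_A
WHERE <Conditions>
-- Insert into Table_B, numbering the new rows last_value+1, last_value+2, ...
INSERT INTO TABLE_B
SELECT <Fields>,...., @last_value + ROW_NUMBER()
OVER(ORDER BY <some ordering criteria>)
FROM TABLE_A
WHERE <Conditions>
-- Record the highest number actually used
UPDATE Table_C
SET last_value = @last_value + @records_to_insert
COMMIT TRANSACTION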
I am omitting some details, such as the transformation of the numbers into the formatted code, exception handling and rollback, but I hope you get the idea.
The trick is to use the ROW_NUMBER() OVER (ORDER BY...) function to obtain a unique number for each row.
http://msdn.microsoft.com/en-us/library/ms186734.aspx
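To make the ROW_NUMBER() trick concrete, here is a small self-contained sketch that also builds the yyyyMMdd-style code string mentioned in the question. The starting value 41, the inline src rows and the name column are invented for illustration; in the real procedure the value would come from Table_C and the rows from Table_A.

```sql
-- Sketch only: 41 stands in for the value read from Table_C,
-- and src(name) stands in for the rows selected from Table_A.
DECLARE @last_value INT = 41;

SELECT src.name,
       -- style 112 formats the date as yyyymmdd
       CONVERT(VARCHAR(8), GETDATE(), 112)
         + CAST(@last_value + ROW_NUMBER() OVER (ORDER BY src.name) AS VARCHAR(10)) AS code
FROM (VALUES ('a'), ('b'), ('c')) AS src(name);
```

Each row gets today's date followed by a distinct number (42, 43 and 44 here), all computed in a single set-based statement rather than one procedure call per row.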

Performance Issues with Count(*) in SQL Server

I am having some performance issues with a query I am running in SQL Server 2008. I have the following query:
Query1:
SELECT GroupID, COUNT(*) AS TotalRows FROM Table1
INNER JOIN (
SELECT Column1 FROM Table2 WHERE GroupID = @GroupID
) AS Table2
ON Table2.Column1 = Table1.Column1
WHERE CONTAINS(Table1.*, @Word) GROUP BY GroupID
Table1 contains about 500,000 rows. Table2 contains about 50,000, but will eventually contain millions. Playing around with the query, I found that re-writing the query as follows will reduce the execution time of the query to under 1 second.
Query 2:
SELECT GroupID FROM Table1
INNER JOIN (
SELECT Column1 FROM Table2 WHERE GroupID = @GroupID
) AS Table2 ON Table2.Column1 = Table1.Column1
WHERE CONTAINS(Table1.*, @Word)
What I do not understand is it is a simple count query. If I execute the following query on Table 1, it returns in < 1 s:
Query 3:
SELECT Count(*) FROM Table1
This query returns around 500,000 as the result.
However, the Original query (Query 1) mentioned above only returns a count of 50,000 and takes 3s to execute even though simply removing the GROUP BY (Query 2) reduces the execution time to < 1s.
I do not believe this is an indexing issue as I already have indexes on the appropriate columns. Any help would be very appreciated.
Performing a simple COUNT(*) FROM table can do a much more efficient scan of the clustered index, since it doesn't have to care about any filtering, joining, grouping, etc. The queries that include full-text search predicates and mysterious subqueries have to do a lot more work. The count is not the most expensive part there - I bet they're still relatively slow if you leave the count out but leave the group by in, e.g.:
SELECT GroupID FROM Table1
INNER JOIN (
SELECT Column1 FROM Table2 WHERE GroupID = @GroupID
) AS Table2 ON Table2.Column1 = Table1.Column1
WHERE CONTAINS(Table1.*, @Word)
GROUP BY GroupID;
Looking at the provided actual execution plan in the free SQL Sentry Plan Explorer*, I see this:
And this:
Which leads me to believe you should:
Update the statistics on both Inventory and A001_Store_Inventory so that the optimizer can get a better rowcount estimate (which could lead to a better plan shape).
Ensure that Inventory.ItemNumber and A001_Store_Inventory.ItemNumber are the same data type to avoid an implicit conversion.
(*) disclaimer: I work for SQL Sentry.
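The two recommendations above could be acted on roughly as follows. This is only a sketch: it assumes both tables live in the default schema and uses the table and column names exactly as they appear in the answer.

```sql
-- Refresh statistics so the optimizer has current row-count estimates.
UPDATE STATISTICS Inventory;
UPDATE STATISTICS A001_Store_Inventory;

-- Compare the declared types of the two join columns; differing types
-- or lengths mean an implicit conversion at join time.
SELECT TABLE_NAME, COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME = 'ItemNumber'
  AND TABLE_NAME IN ('Inventory', 'A001_Store_Inventory');
```

If the second query shows different types, altering one column to match the other is the usual remedy, though that change should be tested against whatever else depends on the column.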
You should have a look at the query plan to see what SQL Server is doing to retrieve the data you requested. Also, I think it would be better to rewrite your original query as follows:
SELECT
Table1.GroupID -- When you use JOINs, it's always better to specify Table (or Alias) names
,COUNT(Table1.GroupID) AS TotalRows
FROM
Table1
INNER JOIN
Table2 ON
(Table2.Column1 = Table1.Column1) AND
(Table2.GroupID = @GroupID)
WHERE
CONTAINS(Table1.*, @Word)
GROUP BY
Table1.GroupID
Also, keep in mind that a simple COUNT and a COUNT with a JOIN and GROUP BY are not the same thing. In one case, it's just a matter of going through an index and making a count, in the other there are other tables and grouping involved, which can be time consuming depending on several factors.

Resources