TSQL - subquery inside Begin End - sql-server

Consider the following query:
begin
;with
t1 as (
select top(10) x from tableX
),
t2 as (
select * from t1
),
t3 as (
select * from t1
)
-- --------------------------
select *
from t2
join t3 on t3.x=t2.x
end
go
I was wondering: is t1 evaluated twice, and therefore tableX read twice (meaning t1 acts like a view over the table)?
Or is it evaluated just once, with its rows kept for the whole query (like a variable in a programming language)?
I am just trying to figure out how the T-SQL engine optimises this. It is important to know, because if t1 returns millions of rows and is referenced many times in the query, producing the same result each time, there should be a better way to do it.

Just create the table:
CREATE TABLE tableX
(
x int PRIMARY KEY
);
INSERT INTO tableX
VALUES (1)
,(2)
Turn on execution plan generation and execute the query; the resulting plan shows tableX being accessed twice.
So, yes, the table is queried two times. If you are using a complex common table expression and working with a huge amount of data, I would advise storing the result in a temporary table instead.
I sometimes get very bad execution plans for complex CTEs that used to work nicely in the past. Also, you are allowed to define indexes on temporary tables and improve performance further.
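A minimal sketch of that advice (reusing tableX and x from the question; the temp table and index names are invented):
--Materialize the intermediate result once, index it, and reuse it
SELECT TOP(10) x
INTO #t1
FROM tableX;
CREATE CLUSTERED INDEX IX_t1_x ON #t1 (x);
SELECT *
FROM #t1 AS t2
JOIN #t1 AS t3 ON t3.x = t2.x;
DROP TABLE #t1;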

To be honest, there is no answer... The only answer is Race your horses (Eric Lippert).
The way you write your query does not determine how the engine will execute it. That depends on many, many influences...
You tell the engine what you want to get, and the engine decides how to get it.
This may even differ between identical calls depending on statistics, currently running queries, existing cached results, etc.
Just as a hint, try this:
USE master;
GO
CREATE DATABASE testDB;
GO
USE testDB;
GO
--Create a physical test table with 1,000,000 rows
CREATE TABLE testTbl(ID INT IDENTITY PRIMARY KEY, SomeValue VARCHAR(100));
WITH MioRows(Nr) AS (SELECT TOP 1000000 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values v1 CROSS JOIN master..spt_values v2 CROSS JOIN master..spt_values v3)
INSERT INTO testTbl(SomeValue)
SELECT CONCAT('Test',Nr)
FROM MioRows;
--Now we can start to test this
GO
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
DECLARE @dt DATETIME2 = SYSUTCDATETIME();
--Your approach with CTEs
;with t1 as (select * from testTbl)
,t2 as (select * from t1)
,t3 as (select * from t1)
select t2.ID AS t2_ID,t2.SomeValue AS t2_SomeValue,t3.ID AS t3_ID,t3.SomeValue AS t3_SomeValue INTO target1
from t2
join t3 on t3.ID=t2.ID;
SELECT 'Final CTE',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
GO
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
DECLARE @dt DATETIME2 = SYSUTCDATETIME();
--Writing the intermediate result into a physical table
SELECT * INTO test1 FROM testTbl;
SELECT 'Write into test1',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
select t2.ID AS t2_ID,t2.SomeValue AS t2_SomeValue,t3.ID AS t3_ID,t3.SomeValue AS t3_SomeValue INTO target2
from test1 t2
join test1 t3 on t3.ID=t2.ID
SELECT 'Final physical table',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
GO
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
DECLARE @dt DATETIME2 = SYSUTCDATETIME();
--Same as before, but with a primary key on the intermediate table
SELECT * INTO test2 FROM testTbl;
SELECT 'Write into test2',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
ALTER TABLE test2 ADD PRIMARY KEY (ID);
SELECT 'Add PK',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
select t2.ID AS t2_ID,t2.SomeValue AS t2_SomeValue,t3.ID AS t3_ID,t3.SomeValue AS t3_SomeValue INTO target3
from test2 t2
join test2 t3 on t3.ID=t2.ID
SELECT 'Final physical table with PK',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
--Clean up (Careful with real data!!!)
GO
USE master;
GO
--DROP DATABASE testDB;
GO
On my system,
the first run takes 674 ms,
the second 1,205 ms (297 ms of that for writing into test1), and
the third 1,727 ms (285 ms for writing into test2 and ~650 ms for creating the index).
Although the query is performed twice, the engine can take advantage of cached results.
Conclusion
The engine is really smart... Don't try to be smarter...
If the table covered a lot of columns and much more data per row, the whole test might turn out differently...
If your CTEs (sub-queries) involve much more complex data with joins, views, functions and so on, the engine might get into trouble finding the best approach.
If performance matters, you can race your horses to test it out. One hint: I have sometimes used the query hint FORCE ORDER quite successfully. It performs the joins in the order specified in the query.
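For illustration only (using the test2 table created above), the hint goes into the OPTION clause; whether it actually helps depends entirely on your data:
--Sketch: perform the joins in the written order instead of letting the optimizer reorder them
SELECT t2.ID, t2.SomeValue, t3.SomeValue AS SomeValue2
FROM test2 AS t2
JOIN test2 AS t3 ON t3.ID = t2.ID
OPTION (FORCE ORDER);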

Here is a simple example to test the theories:
First, via a table variable, which evaluates the "matter" only once.
declare @r1 table (id int, v uniqueidentifier);
insert into @r1
SELECT * FROM
(
select id=1, NewId() as 'v' union
select id=2, NewId()
) t
-- -----------
begin
;with
t1 as (
select * from @r1
),
t2 as (
select * from t1
),
t3 as (
select * from t1
)
-- ----------------
select * from t2
union all select * from t3
end
go
On the other hand, if we put the "matter" inside t1 instead of the table variable, it gets evaluated twice (the NEWID() values returned for t2 and t3 differ).
t1 as (
select id=1, NewId() as 'v' union
select id=2, NewId()
)
Hence, my conclusion is to materialize such results in a temporary table (or table variable) rather than relying on cached results.
Also, I have applied this to a large-scale query that referenced the "matter" only twice: after moving it into a temporary table, the execution time was cut straight in half!

Related

Query tuning required for expensive query

Can someone help me optimize this code? Another way to optimize it would be a computed column, but we cannot change the schema in prod, as we are not sure how many APIs push data into this table. The table has millions of rows, and adding a non-clustered index does not help: given the query's cost, it still goes for a scan.
create table testcts(
name varchar(100)
)
go
insert into testcts(
name
)
select 'VK.cts.com'
union
select 'GK.ms.com'
go
DECLARE @list varchar(100) = 'VK,GK'
select * from testcts where replace(replace(name,'.cts.com',''),'.ms.com','') in (select value from string_split(@list,','))
drop table testcts
One possibility might be to strip off the .cts.com and .ms.com subdomain/domain endings before you insert or store the name data in your table. Then, use the following query instead:
SELECT *
FROM testcts
WHERE name IN (SELECT value FROM STRING_SPLIT(@list, ','));
Now SQL Server should be able to use an index on the name column.
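As a sketch of that idea (the index name is invented): with the suffix stripped at insert time, the lookup becomes sargable and a plain index on name can be seeked.
--Store the stripped name ('VK', 'GK') instead of the full host name ...
INSERT INTO testcts (name)
SELECT 'VK' UNION SELECT 'GK';
--... and index the column so the IN (...) predicate can seek rather than scan
CREATE NONCLUSTERED INDEX IX_testcts_name ON testcts (name);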
If your values are always suffixed by cts.com or ms.com you could add that to the search pattern:
SELECT {YourColumns} --Don't use *
FROM dbo.testcts t
JOIN (SELECT CONCAT(SS.[value], V.Suffix) AS [value]
FROM STRING_SPLIT(@list, ',') SS
CROSS APPLY (VALUES ('.cts.com'),
('.ms.com')) V (Suffix) ) L ON t.[name] = L.[value];

Can we use Parameterized Views in Snowflake?

Can we use parameterized views in Snowflake, e.g. pass the table name or database name as a parameter instead of hardcoding it?
I think your best bet is to use session variables in conjunction with a regular view.
A session variable can be referenced in the view DDL, and will need to be set in any sessions querying the view.
To do this, you can make use of the IDENTIFIER function in Snowflake, which lets you use text as an object identifier.
create table t1 (col1 number, col2 number);
create table t2 (col1 number, col2 number);
set ti = 't1';
create view v1 as select col1, col2 from identifier($ti);
Before you query the view, you will need to set the session variable (ti in this case) to the table name (fully qualified if need be).
set ti = 't1';
select * from v1; -- returns data from t1
set ti = 't2';
select * from v1; -- returns data from t2
I have not found a way to do this, so in the past, when I needed something like this, I created what I call a "wrapper view"; an example follows.
I hope this helps...Rich
--create source tables and test records
CREATE TABLE t1 (id NUMBER, str VARCHAR);
CREATE TABLE t2 (id NUMBER, str VARCHAR);
CREATE TABLE t3 (id NUMBER, str VARCHAR);
INSERT INTO t1 VALUES(1, 'record from t1');
INSERT INTO t1 VALUES(2, 'record from t1');
INSERT INTO t2 VALUES(100, 'record from t2');
INSERT INTO t2 VALUES(101, 'record from t2');
INSERT INTO t3 VALUES(998, 'record from t3');
INSERT INTO t3 VALUES(999, 'record from t3');
--create the "wrapper" view
CREATE VIEW vw_t AS (
SELECT 't1' as table_name, * FROM t1
UNION ALL
SELECT 't2' as table_name, * FROM t2
UNION ALL
SELECT 't3' as table_name, * FROM t3);
--try it out
SELECT *
FROM vw_t
WHERE table_name = 't3';
--results
TABLE_NAME ID STR
t3 998 record from t3
t3 999 record from t3
I think the best way to handle something like this would be to create a UDTF that acts like a view that has been parameterized. So, in essence, you'd reference the UDTF like a view and pass the parameters into the UDTF, which would then return the data that you wish to use. Note that Snowflake has 2 options for UDTF (SQL and Javascript):
https://docs.snowflake.net/manuals/sql-reference/udf-table-functions.html
https://docs.snowflake.net/manuals/sql-reference/udf-js-table-functions.html
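A minimal sketch of that idea, reusing the t1 table from the wrapper-view example above (the function name and the min_id filter are invented; a SQL UDTF cannot easily take the table name itself as a parameter, so this parameterizes a filter instead):
CREATE OR REPLACE FUNCTION v_t1_filtered (min_id NUMBER)
RETURNS TABLE (id NUMBER, str VARCHAR)
AS
$$
  SELECT id, str
  FROM t1
  WHERE id >= min_id
$$;
--queried like a view, with the parameter supplied at call time
SELECT * FROM TABLE(v_t1_filtered(2));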
When using the interactive SQL worksheet in Snowflake, you can do this:
SET target_table_name='myTable';
SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME=$target_table_name
However, that does not work programmatically. Instead, as described here, a parameterized query (such as one backing a view) uses this syntax:
SELECT * FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME=(?)
And yes, the name of that first (?) parameter is 1.
I'm working on my SnowflakeSQLHelper (an adaptation of the Microsoft Patterns and Practices) that will help when attaching parameters.

What is easiest and optimize way to find specific value from database tables?

As per my requirement, I have to find which tables and columns contain a given value such as xyz@test.com. The database is very large, with more than 2,500 tables.
Can anyone suggest an optimal way to find this kind of value across the database? I wrote a loop query, and it took more than 9 hours to run.
9 hours is clearly a long time, and 2,500 tables seems close to insanity to me.
Here is one approach that runs one query per table, not one per column. I have no idea how this will perform against 2,500 tables; I suspect it may be horrible. That said, I would strongly suggest a test filter first, like Table_Name like 'OD%'.
Example
Declare @Search varchar(max) = 'cappelletti' -- Exact match '"cappelletti"'
Create Table #Temp (TableName varchar(500),RecordData xml)
Declare @SQL varchar(max) = ''
Select @SQL = @SQL+ ';Insert Into #Temp Select TableName='''+concat(quotename(Table_Schema),'.',quotename(table_name))+''',RecordData = (Select A.* for XML RAW) From '+concat(quotename(Table_Schema),'.',quotename(table_name))+' A Where (Select A.* for XML RAW) like ''%'+@Search+'%'''+char(10)
From INFORMATION_SCHEMA.Tables
Where Table_Type ='BASE TABLE'
and Table_Name like 'OD%' -- **** Would REALLY Recommend a REASONABLE Filter *** --
Exec(@SQL)
Select A.TableName
,B.*
,A.RecordData
From #Temp A
Cross Apply (
Select ColumnName = a.value('local-name(.)','varchar(100)')
,Value = a.value('.','varchar(max)')
From A.RecordData.nodes('/row') as C1(n)
Cross Apply C1.n.nodes('./@*') as C2(a)
Where a.value('.','varchar(max)') Like '%'+@Search+'%'
) B
Drop Table #Temp
The result lists, for each match, the table name, the column name, the matching value, and the whole record as XML.
If it helps, the individual generated queries look like this:
Select TableName='[dbo].[OD]'
,RecordData= (Select A.* for XML RAW)
From [dbo].[OD] A
Where (Select A.* for XML RAW) like '%cappelletti%'
On a side-note, you can search numeric data and even dates.
Alternatively, create a procedure: store the table name and column name of every character-type column from the system tables into a temp table.
Then build a dynamic query that loops over each of those records, comparing the column against the input email address with an = condition.
Whenever an IF EXISTS check matches, store that table name and column name in another temp table, and return the list of matches at the end of the execution.
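A rough sketch of what that describes (all identifiers and the search value are placeholders, and it still probes one column at a time, so it will not be fast on 2,500 tables):
DECLARE @Search varchar(200) = 'xyz@test.com';   --placeholder search value
DECLARE @Schema sysname, @Table sysname, @Column sysname, @SQL nvarchar(max);
CREATE TABLE #Hits (SchemaName sysname, TableName sysname, ColumnName sysname);
DECLARE col_cur CURSOR LOCAL FAST_FORWARD FOR
    SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE DATA_TYPE IN ('varchar', 'nvarchar', 'char', 'nchar');
OPEN col_cur;
FETCH NEXT FROM col_cur INTO @Schema, @Table, @Column;
WHILE @@FETCH_STATUS = 0
BEGIN
    --one IF EXISTS probe per column; matches are collected in #Hits
    SET @SQL = N'IF EXISTS (SELECT 1 FROM ' + QUOTENAME(@Schema) + N'.' + QUOTENAME(@Table)
             + N' WHERE ' + QUOTENAME(@Column) + N' = @s) '
             + N'INSERT INTO #Hits VALUES (@sch, @tab, @col);';
    EXEC sp_executesql @SQL,
        N'@s varchar(200), @sch sysname, @tab sysname, @col sysname',
        @Search, @Schema, @Table, @Column;
    FETCH NEXT FROM col_cur INTO @Schema, @Table, @Column;
END
CLOSE col_cur;
DEALLOCATE col_cur;
SELECT * FROM #Hits;
DROP TABLE #Hits;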

CROSS JOIN with one empty table sql server

I have two tables, say MyFull and MyEmpty. I need the combination of all records from both tables. Sometimes MyEmpty has no records; in that scenario I still need to return all records from MyFull. This is what I have tried:
--Q1 >> returns no results
SELECT *
FROM MyFull CROSS JOIN MyEmpty
--Q2 >> returns no results
SELECT *
FROM MyFull, MyEmpty
I thought of using LEFT JOIN, but I have no common key to join on. This is a SQLFiddle
Try this:
SELECT *
FROM MyFull
LEFT JOIN MyEmpty ON 1=1
Be aware, though, that if you have multiple records in MyEmpty, it will repeat the records from MyFull once for every record in MyEmpty. Effectively, the number of rows in the result is (rows in MyFull) * (rows in MyEmpty); when MyEmpty has no rows, each MyFull row is returned once with NULLs in the MyEmpty columns.
Full table:
declare @t table (Spot int,name varchar(1),pct int)
insert into @t(Spot,name,pct)values(1,'A',2),(1,'B',8),(1,'C',6),(2,'A',4),(2,'B',5),(3,'A',5),(3,'D',1),(3,'E',4)
Empty table:
declare @tt table (Spot int,name varchar(1),pct int)
select * from @t OUTER APPLY @tt

Table Vs Table Variable

I have two approaches below for accomplishing one task.
The 1st selects from the table directly multiple times; the 2nd first selects the desired columns from the table into a table variable and then uses that table variable multiple times. Which one will perform better, and why?
declare
@var1 varchar(10),
@var2 varchar(10)
----------------------------------------------------------------------------
-- 1st approach
----------------------------------------------------------------------------
select *
from tab1
where tab1.col1 in (select tab2.col1 from tab2 where tab2.col2 <> @var1) or
tab1.col2 in (select tab2.col2 from tab2 where tab2.col3 <> @var2)
----------------------------------------------------------------------------
-- 2nd approach
----------------------------------------------------------------------------
declare @tab2 table (col1 varchar(10), col2 varchar(10), col3 varchar(10))
insert into @tab2
select col1,
col2,
col3
from tab2
select *
from tab1
where tab1.col1 in (select t.col1 from @tab2 as t where t.col2 <> @var1) or
tab1.col2 in (select t.col2 from @tab2 as t where t.col3 <> @var2)
In my opinion, the first approach will be faster and more efficient.
If you look at the execution plans, the extra cost of the table insert is added in the second approach.
Execution Plan for first approach:
Execution Plan for second approach:
EDIT: I misunderstood the question, so please disregard my answer.
I don't think there is any performance difference between your two methods, because the only difference is a tiny extra request, which should be negligible.
The bonus you get with the second approach is that if you change your column names in the future, you won't need to update your script.
PS:
I think your query to retrieve your columns isn't quite right. You're not retrieving column names here, but data. I don't know your DBMS, but if it's Oracle, it should be something like:
SELECT column_name
FROM USER_TAB_COLUMNS
WHERE table_name = 'MYTABLE'
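For SQL Server (which this question is about), the equivalent metadata query would be along these lines:
SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'tab2';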
Why in the world would you think two selects are faster than one?
Why would you not just select col1, col2 from tab1 where ...?
In both cases you have a select with a where clause, and a select on a table is faster than a select on a table variable.
So all you have done is add the overhead of inserting into a table variable in order to get a less efficient select.
A table variable is stored in tempdb, and Microsoft has all sorts of warnings about using table variables (see the documentation on table variables): for one, not to use them for more than 100 rows; they also carry no statistics and offer only limited indexing.
Really, what if tab1 had a million rows and the where clause limited the result to 10? Do you really think inserting a million rows into @tab2 is going to make it faster?
