I have mentioned two approaches below for accomplishing one task.
The 1st is selecting from the table directly multiple times; the 2nd is selecting the desired columns from the table into a table variable first and then using that table variable multiple times. Which one will perform better, and why?
declare
@var1 varchar(10),
@var2 varchar(10)
----------------------------------------------------------------------------
-- 1st approach
----------------------------------------------------------------------------
select *
from tab1
where tab1.col1 in (select tab2.col1 from tab2 where tab2.col2 <> @var1) or
tab1.col2 in (select tab2.col2 from tab2 where tab2.col3 <> @var2)
----------------------------------------------------------------------------
-- 2nd approach
----------------------------------------------------------------------------
declare @tab2 table (col1 varchar(10), col2 varchar(10), col3 varchar(10))
insert into @tab2
select col1,
col2,
col3
from tab2
select *
from tab1
where tab1.col1 in (select t.col1 from @tab2 as t where t.col2 <> @var1) or
tab1.col2 in (select t.col2 from @tab2 as t where t.col3 <> @var2)
According to me, the first approach will be faster and more efficient.
If you look at the execution plans, the extra cost of the table insert gets added to the second approach.
(Execution plan screenshots for the first and second approaches were attached to the original post.)
EDIT: I misunderstood the question. Please disregard my answer.
I don't think there is any performance difference between your two methods, because the only difference is a tiny extra query to retrieve your columns, which should be negligible.
The bonus you get with the second approach is that if you change your column names in the future, you won't need to update your script.
PS:
I think your query to retrieve your columns isn't quite right. You're not retrieving column names here, but data. I don't know your DBMS, but if it's Oracle, it should be something like:
SELECT column_name
FROM USER_TAB_COLUMNS
WHERE table_name = 'MYTABLE'
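Since the rest of this thread uses SQL Server syntax, the equivalent there would be along these lines (a sketch; the schema and table name dbo.MYTABLE are placeholders):
SELECT COLUMN_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'dbo'
  AND TABLE_NAME = 'MYTABLE'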
Why in the world would you think two selects are faster than one?
Why would you not just select col1, col2 from tab1 where ... ?
In both cases you have a select with a where clause.
A select on a table is faster than a select on a table variable.
So all you have done is add the overhead of inserting into a table variable in order to get a less efficient select.
A table variable is stored in tempdb.
Microsoft has all sorts of warnings about the use of table variables (see the table variables documentation): for one, not to use them for more than 100 rows.
A table variable has no indexes.
Really, what if tab1 had a million rows and the where clause limited the result to 10 rows?
Do you really think inserting a million rows into @tab2 is going to make it faster?
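If materializing tab2 really were necessary, a temp table would at least be the more defensible choice, since unlike a table variable it supports explicit indexes and gets statistics. A minimal sketch (the choice of col1 as the index key is an assumption):
CREATE TABLE #tab2 (col1 varchar(10), col2 varchar(10), col3 varchar(10));

INSERT INTO #tab2 (col1, col2, col3)
SELECT col1, col2, col3
FROM tab2;

-- unlike a table variable, a temp table supports explicit indexes
CREATE NONCLUSTERED INDEX IX_tab2_col1 ON #tab2 (col1);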
Consider the following query:
begin
;with
t1 as (
select top(10) x from tableX
),
t2 as (
select * from t1
),
t3 as (
select * from t1
)
-- --------------------------
select *
from t2
join t3 on t3.x=t2.x
end
go
I was wondering whether t1 is evaluated twice, and hence tableX is read twice (which would mean t1 acts like a view), or just once, with its rows saved in t1 for the whole query (like a variable in a programming language).
I am trying to figure out how the T-SQL engine optimises this. It is important to know, because if t1 has millions of rows and is referenced many times in the query, producing the same result each time, there should be a better way to do it.
Just create the table:
CREATE TABLE tableX
(
x int PRIMARY KEY
);
INSERT INTO tableX
VALUES (1)
,(2)
Turn on execution plan generation and execute the query. The resulting plan shows tableX being scanned twice.
So, yes, the table is queried two times. If you are using a complex common table expression and you are working with a huge amount of data, I would advise storing the result in a temporary table.
Sometimes I get very bad execution plans for complex CTEs that worked nicely in the past. Also, you are allowed to define indexes on temporary tables and improve performance further.
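A minimal sketch of that suggestion using the question's tableX: materialize the CTE result once into a temp table, then reference the temp table as often as needed.
SELECT TOP(10) x
INTO #t1            -- evaluated once; can also be indexed if needed
FROM tableX;

SELECT t2.x
FROM #t1 AS t2
JOIN #t1 AS t3 ON t3.x = t2.x;

DROP TABLE #t1;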
To be honest, there is no general answer... The only answer is: race your horses (Eric Lippert).
The way you write your query does not determine how the engine will execute it. That depends on many, many influences...
You tell the engine what you want to get, and the engine decides how to get it.
This may even differ between identical calls, depending on statistics, currently running queries, existing cached results, etc.
Just as a hint, try this:
USE master;
GO
CREATE DATABASE testDB;
GO
USE testDB;
GO
--I create a physical test table with 1,000,000 rows
CREATE TABLE testTbl(ID INT IDENTITY PRIMARY KEY, SomeValue VARCHAR(100));
WITH MioRows(Nr) AS (SELECT TOP 1000000 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) FROM master..spt_values v1 CROSS JOIN master..spt_values v2 CROSS JOIN master..spt_values v3)
INSERT INTO testTbl(SomeValue)
SELECT CONCAT('Test',Nr)
FROM MioRows;
--Now we can start to test this
GO
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
DECLARE @dt DATETIME2 = SYSUTCDATETIME();
--Your approach with CTEs
;with t1 as (select * from testTbl)
,t2 as (select * from t1)
,t3 as (select * from t1)
select t2.ID AS t2_ID,t2.SomeValue AS t2_SomeValue,t3.ID AS t3_ID,t3.SomeValue AS t3_SomeValue INTO target1
from t2
join t3 on t3.ID=t2.ID;
SELECT 'Final CTE',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
GO
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
DECLARE @dt DATETIME2 = SYSUTCDATETIME();
--Writing the intermediate result into a physical table
SELECT * INTO test1 FROM testTbl;
SELECT 'Write into test1',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
select t2.ID AS t2_ID,t2.SomeValue AS t2_SomeValue,t3.ID AS t3_ID,t3.SomeValue AS t3_SomeValue INTO target2
from test1 t2
join test1 t3 on t3.ID=t2.ID
SELECT 'Final physical table',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
GO
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
DECLARE @dt DATETIME2 = SYSUTCDATETIME();
--Same as before, but with a primary key on the intermediate table
SELECT * INTO test2 FROM testTbl;
SELECT 'Write into test2',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
ALTER TABLE test2 ADD PRIMARY KEY (ID);
SELECT 'Add PK',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
select t2.ID AS t2_ID,t2.SomeValue AS t2_SomeValue,t3.ID AS t3_ID,t3.SomeValue AS t3_SomeValue INTO target3
from test2 t2
join test2 t3 on t3.ID=t2.ID
SELECT 'Final physical table with PK',DATEDIFF(MILLISECOND,@dt,SYSUTCDATETIME());
--Clean up (Careful with real data!!!)
GO
USE master;
GO
--DROP DATABASE testDB;
GO
On my system the first takes 674 ms, the second 1,205 ms (297 ms for writing into test1), and the third 1,727 ms (285 ms for writing into test2 and ~650 ms for creating the index).
Although the query is performed twice, the engine can take advantage of cached results.
Conclusion
The engine is really smart... Don't try to be smarter...
If the table covered a lot of columns and much more data per row, the whole test might come out differently...
If your CTEs (sub-queries) involve much more complex data with joins, views, functions and so on, the engine might get into trouble finding the best approach.
If performance matters, you can race your horses to test it out. One hint: I have sometimes used the FORCE ORDER query hint quite successfully. It performs the joins in the order specified in the query.
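For reference, FORCE ORDER is attached as a query hint at the end of the statement. A minimal sketch against the test2 table created above:
SELECT t2.ID, t3.SomeValue
FROM test2 AS t2
JOIN test2 AS t3 ON t3.ID = t2.ID
OPTION (FORCE ORDER);  -- joins are performed in the order written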
Here is a simple example to test the theories:
First, via a table variable, which evaluates the expression only once.
declare @r1 table (id int, v uniqueidentifier);
insert into @r1
SELECT * FROM
(
select id=1, NewId() as 'v' union
select id=2, NewId()
) t
-- -----------
begin
;with
t1 as (
select * from @r1
),
t2 as (
select * from t1
),
t3 as (
select * from t1
)
-- ----------------
select * from t2
union all select * from t3
end
go
On the other hand, if we put the expression inside t1 instead of the table variable, it gets evaluated twice.
t1 as (
select id=1, NewId() as 'v' union
select id=2, NewId()
)
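For completeness, the full inline version reads as follows; comparing the v values between the two halves of the UNION ALL shows whether t1 was re-evaluated:
begin
;with
t1 as (
select id=1, NewId() as 'v' union
select id=2, NewId()
),
t2 as (
select * from t1
),
t3 as (
select * from t1
)
select * from t2
union all select * from t3
end
go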
Hence, my conclusion is to use a temporary table and not rely on cached results.
Also, I have applied this to a large-scale query that evaluated the "matter" only twice, and after moving it to a temporary table the execution time was cut in half!
As per my requirements, I have to find out which tables and columns contain a value such as xyz@test.com. The database is very large, with more than 2,500 tables.
Can anyone please suggest an optimal way to find this type of value in the database? I wrote a loop query, but it took more than 9 hours to run.
9 hours is clearly a long time. Furthermore, 2,500 tables seems close to insanity to me.
Here is one approach that will run one query per table, not one per column. Now, I have no idea how this will perform against 2,500 tables; I suspect it may be horrible. That said, I would strongly suggest testing with a filter first, such as Table_Name like 'OD%'.
Example
Declare @Search varchar(max) = 'cappelletti' -- Exact match '"cappelletti"'
Create Table #Temp (TableName varchar(500),RecordData xml)
Declare @SQL varchar(max) = ''
Select @SQL = @SQL+';Insert Into #Temp Select TableName='''+concat(quotename(Table_Schema),'.',quotename(Table_Name))+''',RecordData = (Select A.* for XML RAW) From '+concat(quotename(Table_Schema),'.',quotename(Table_Name))+' A Where (Select A.* for XML RAW) like ''%'+@Search+'%'''+char(10)
From INFORMATION_SCHEMA.Tables
Where Table_Type ='BASE TABLE'
and Table_Name like 'OD%' -- **** Would REALLY Recommend a REASONABLE Filter *** --
Exec(@SQL)
Select A.TableName
,B.*
,A.RecordData
From #Temp A
Cross Apply (
Select ColumnName = a.value('local-name(.)','varchar(100)')
,Value = a.value('.','varchar(max)')
From A.RecordData.nodes('/row') as C1(n)
Cross Apply C1.n.nodes('./@*') as C2(a)
Where a.value('.','varchar(max)') Like '%'+#Search+'%'
) B
Drop Table #Temp
This returns one row per match, with the table name, the matching column name and value, and the full record as XML.
If it helps, the individual generated queries look like this:
Select TableName='[dbo].[OD]'
,RecordData= (Select A.* for XML RAW)
From [dbo].[OD] A
Where (Select A.* for XML RAW) like '%cappelletti%'
On a side note, you can search numeric data and even dates.
Make a procedure that collects each table name and its VARCHAR-type column names from the system tables and stores them in a temp table.
Now build one dynamic query that loops over each record, testing an equality condition against the input email address parameter.
If the condition is matched in any statement (using IF EXISTS), store that table name and column name in another temp table, and retrieve the list of those records from the temp table at the end of execution. A sketch of this idea follows below.
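A minimal sketch of that procedure idea (all names are hypothetical; @Email stands for the input parameter):
DECLARE @Email varchar(200) = 'xyz@test.com';

-- 1. Collect candidate table/column pairs from the system tables
SELECT t.TABLE_SCHEMA, t.TABLE_NAME, c.COLUMN_NAME,
       CAST(0 AS bit) AS Checked
INTO #Candidates
FROM INFORMATION_SCHEMA.TABLES t
JOIN INFORMATION_SCHEMA.COLUMNS c
  ON c.TABLE_SCHEMA = t.TABLE_SCHEMA AND c.TABLE_NAME = t.TABLE_NAME
WHERE t.TABLE_TYPE = 'BASE TABLE'
  AND c.DATA_TYPE IN ('varchar','nvarchar','char','nchar');

CREATE TABLE #Matches (SchemaName sysname, TableName sysname, ColumnName sysname);

-- 2. Loop over each candidate with an IF EXISTS equality test
DECLARE @s sysname, @t sysname, @c sysname, @sql nvarchar(max);
WHILE EXISTS (SELECT 1 FROM #Candidates WHERE Checked = 0)
BEGIN
    SELECT TOP(1) @s = TABLE_SCHEMA, @t = TABLE_NAME, @c = COLUMN_NAME
    FROM #Candidates WHERE Checked = 0;

    SET @sql = N'IF EXISTS (SELECT 1 FROM ' + QUOTENAME(@s) + N'.' + QUOTENAME(@t)
             + N' WHERE ' + QUOTENAME(@c) + N' = @e) '
             + N'INSERT INTO #Matches VALUES (@s, @t, @c);';
    EXEC sp_executesql @sql,
         N'@e varchar(200), @s sysname, @t sysname, @c sysname',
         @e = @Email, @s = @s, @t = @t, @c = @c;

    UPDATE #Candidates SET Checked = 1
    WHERE TABLE_SCHEMA = @s AND TABLE_NAME = @t AND COLUMN_NAME = @c;
END

-- 3. Retrieve the matches at the end of the execution
SELECT * FROM #Matches;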
I haven't been able to find anything that really solves this, though I have found many things that seem to point in the right direction.
I have a table with ~4.7 million records in it. This table also has ~319 columns. Of all of these ~319 columns, there are 16 that I am interested in, and I want to put them into another table that has just 2 columns. Basically, the way this is set up is that column "A" is just an ID and columns 1-15 are codes. None of the columns are grouped either (not sure if that matters).
I have tried things like:
Insert Into NewTable(ID,Profession)
Select ID, ProCode1 From OriginalTable WHERE ProCode1 > ''
UNION
Select ID, ProCode2 From OriginalTable WHERE ProCode2 > ''
And so on. This didn't seem to do anything at all, and I let it run for ~20 minutes.
Now I can get a small result doing the same thing but dropping the union and using a TOP (1000) clause; however, even that will never work for the full table.
So the question is what can I do to take this:
ID|PID|blah|blah|blah|...|ProCode1|ProCode2|ProCode3|...|ProCode15|blah|...
into:
ID|PID|ProCode|
across all ~4.7 million rows without running:
Insert Into NewTable(PID,ProCode)
select PID, ProCode1 FROM OriginalTable WHERE ProCode1 > ''
Insert Into NewTable(PID, ProCode)
select PID, ProCode2 FROM OriginalTable WHERE ProCode2 > ''
Insert Into NewTable(PID, ProCode)
Select PID, ProCode3 FROM OriginalTable WHERE ProCode3 > ''
...
...
...
EDIT: I forgot to mention that the majority of the ProCodeX columns are blank. All ProCode1 values are populated, but occupancy drops off sharply with each increase (e.g. ProCode2 is <50% occupied, ProCode3 is <10% occupied).
Use Cross Apply with a table valued constructor to unpivot the data instead of using multiple UNION ALLs:
Insert Into NewTable(PID, ProCode)
Select PID, ProCode
From OriginalTable
Cross Apply
(
values (ProCode1),(ProCode2),(ProCode3), ... ,(ProCode15)
) cs (ProCode)
Where ProCode <> ''
This will be much faster than the UNION ALL query, since it needs only a single pass over the physical table.
Okay, so I have a query that looks like this:
Declare @Table1 Table (some columns)
Insert into @Table1 [QueryA]
Update @Table1
set Field1 = A.Value1
from ([QueryB]) A
where Field2 = A.Value2
Select * from @Table1
QueryA is a simple query that returns ~150 rows. QueryB is more complex and returns 3 rows. When run on its own, QueryB returns in less than 1 second. When run inside of the update statement, QueryB takes about 1 minute to run.
Now, if the query is reformatted like this, the whole thing takes less than a second:
Declare @Table1 Table (some columns)
Insert into @Table1 [QueryA]
Declare @Table2 Table (some columns)
Insert into @Table2 [QueryB]
Update @Table1
set Field1 = A.Value1
from (select * from @Table2) A
where Field2 = A.Value2
Select * from @Table1
Does anyone know why this is happening? My guess is that something wonky is going on with the optimizer engine, but if I'm missing something, I'd love to hear it.
SQL Server does not create statistics on table variables, so the query plans will involve scans: lots and lots of repeated scans. The other issue is that it does not save the query plan, so on every run-through it recreates the query plan.
So what you are getting is: a scan for every row * (recreate a query plan + execute a query plan).
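One commonly cited mitigation, which is not part of this answer, so treat it as an assumption to test: OPTION (RECOMPILE) lets the optimizer see the table variable's actual row count at execution time instead of guessing one row. A sketch against the question's pseudo-schema:
-- Field1/Field2 and Value1/Value2 are the placeholder names from the question
UPDATE t1
SET t1.Field1 = A.Value1
FROM @Table1 AS t1
JOIN @Table2 AS A
  ON t1.Field2 = A.Value2
OPTION (RECOMPILE); -- recompiled with the table variables' real cardinalities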
I have the following query, and I need it to fetch data from SomeTable based on the filter criteria present in SomeOtherTable. If there is nothing present in SomeOtherTable, the query should return all the data present in SomeTable.
SQL SERVER 2005
SomeOtherTable does not have any indexes or constraints; all fields are char(50).
The following query works fine for my requirements, but it causes performance problems when I have lots of parameters.
Due to a client requirement, we have to keep all the where-clause data in SomeOtherTable; depending on subid, the data will be joined against one of the columns in SomeTable.
For example, the query can be:
SELECT
*
FROM
SomeTable
WHERE
1=1
AND
(
SomeTable.ID in (SELECT DISTINCT ID FROM SomeOtherTable WHERE Name = 'ABC' and subid = 'EF')
OR
0=(SELECT Count(1) FROM SomeOtherTable WHERE spName = 'ABC' and subid = 'EF')
)
AND
(
SomeTable.date =(SELECT date FROM SomeOtherTable WHERE Name = 'ABC' and subid = 'Date')
OR
0=(SELECT Count(1) FROM SomeOtherTable WHERE spName = 'ABC' and subid = 'Date')
)
EDIT:
I think I might have to explain my problem in detail.
We have developed an ASP.NET application that is used to invoke parameterized Crystal Reports. Parameters are not passed to the reports using the default Crystal Reports method.
Instead, the ASP.NET application provides wizards that collect the parameters for the reports. These parameters are not consumed directly by the Crystal Report, but by the query embedded inside the report or by the stored procedure the report uses.
This is achieved using a table (SomeOtherTable) which holds the parameter data for as long as the report is running, after which the data is deleted. As such, we can assume that SomeOtherTable has at most 2 to 3 rows at any given point in time.
So, looking at the query above, the initial part can be taken as the report query, and the where clause is used to get the user input from the SomeOtherTable table.
So I don't think it will be useful to create indexes etc. (maybe I am wrong).
"SomeOtherTable does not have any indexes or any constraint; all fields are char(50)"
Well, there's your problem. There's nothing you can do to a query like this that will improve its performance while the table is built like this.
You need a proper primary key or other candidate key designated on all of your tables. That is to say, you need at least ONE unique index on the table. You can do this by designating one or more fields as the PK, or by adding a UNIQUE constraint or index.
You also need to define your fields properly. Does the field store integers? Well then, an INT field may be a better bet than a CHAR(50).
You can't "optimize" a query that is based on an unsound schema.
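A minimal sketch of the kind of schema fix being suggested; the choice of (Name, subid) as the key is an assumption, since the real candidate key depends on the data:
-- tighten the types first (fails if existing rows contain NULLs)
ALTER TABLE SomeOtherTable ALTER COLUMN Name  varchar(50) NOT NULL;
ALTER TABLE SomeOtherTable ALTER COLUMN subid varchar(50) NOT NULL;

-- then designate a candidate key
ALTER TABLE SomeOtherTable
    ADD CONSTRAINT UQ_SomeOtherTable UNIQUE (Name, subid);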
Try:
SELECT
*
FROM
SomeTable
LEFT JOIN SomeOtherTable ON SomeTable.ID=SomeOtherTable.ID AND Name = 'ABC'
WHERE
1=1
AND
(
SomeOtherTable.ID IS NOT NULL
OR
0=(SELECT Count(1) FROM SomeOtherTable WHERE spName = 'ABC')
)
Also, put WITH (NOLOCK) after each table name to improve performance.
The following might speed you up
SELECT *
FROM SomeTable
WHERE
SomeTable.ID in
(SELECT DISTINCT ID FROM SomeOtherTable Where Name = 'ABC')
UNION
SELECT *
FROM SomeTable
Where
NOT EXISTS (Select spName From SomeOtherTable Where spName = 'ABC')
The UNION effectively splits this into two simpler queries which can be optimised separately (whether this actually improves performance depends very much on the DBMS, table size, etc., but it's always worth a try).
The EXISTS keyword is more efficient than SELECT COUNT(1), as it returns true as soon as the first row is encountered.
Or check whether the value exists in the db first.
You can also remove the DISTINCT keyword from your query; it is useless here.
if EXISTS (Select spName From SomeOtherTable Where spName = 'ABC')
begin
SELECT *
FROM SomeTable
WHERE
SomeTable.ID in
(SELECT ID FROM SomeOtherTable Where Name = 'ABC')
end
else
begin
SELECT *
FROM SomeTable
end
Aloha
Try
select t.* from SomeTable t
left outer join SomeOtherTable o
on t.id = o.id
where (not exists (select id from SomeOtherTable where spname = 'ABC')
OR o.spname = 'ABC')
-Edoode
Change all your select statements in the where part to inner joins.
The OR conditions should be union all-ed (a sketch follows below).
Also make sure your indexing is OK.
Sometimes it pays to have an intermediate table for temp results which you can join to.
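A sketch of that rewrite for the ID branch of the original query (column names taken from the question; assuming Name and spName refer to the same column, the two branches are mutually exclusive, so UNION ALL is safe):
SELECT DISTINCT st.*
FROM SomeTable AS st
INNER JOIN SomeOtherTable AS sot
        ON sot.ID = st.ID
       AND sot.Name = 'ABC' AND sot.subid = 'EF'
UNION ALL
SELECT st.*
FROM SomeTable AS st
WHERE NOT EXISTS (SELECT 1 FROM SomeOtherTable
                  WHERE Name = 'ABC' AND subid = 'EF')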
It seems to me that there is no need for the "1=1 AND" in your query. 1=1 always evaluates to true, leaving the engine to evaluate the next part; why not just skip the 1=1 and evaluate the juicy part?
I am going to stick with my original query.