selecting keys which are not in another table takes forever - sql-server

I have a query like this:
select key, name from localtab where key not in (select key from remotetab);
The query takes forever, and I don't understand why.
localtab is local table, and remotetab is a remote table in another server. key is an int column which has a unique index in both tables. When I query the both tables separately, it takes just a few seconds.

Linked Severs have terrible performance. Get the data you need to the local server and do the majority of the hard work and processing there instead of a mix of local and remote in a single query.
select remotetab into a temp table
select [key] into #remote_made_local from remotetab
Use the #temp table when doing the where clause filtering and use exists instead of in for better performance
select a.[key], a.name from localtab a where not exists (select 1 from #remote_made_local b where b.[key] = a.[key] )
Vs doing
select [key], name from localtab where key not in (select [key] from #remote_made_local)

There is also a solution without using temporary tables.
By using a left join instead of not in (select ...), you can massively speed up the query. Like this:
select l.key, l.name
from localtab l left join remotetab r on l.key = r.key
where r.key is null ;

Related

Query tuning required for expensive query

Can someone help me to optimize the code? I have other way to optimize it by using compute column but we can not change the schema on prod as we are not sure how many API's are used to push data into this table. This table has millions of rows and adding a non-clustered index is not helping due to the query cost and it's going for a scan.
create table testcts(
name varchar(100)
)
go
insert into testcts(
name
)
select 'VK.cts.com'
union
select 'GK.ms.com'
go
DECLARE #list varchar(100) = 'VK,GK'
select * from testcts where replace(replace(name,'.cts.com',''),'.ms.com','') in (select value from string_split(#list,','))
drop table testcts
One possibility might be to strip off the .cts.com and .ms.com subdomain/domain endings before you insert or store the name data in your table. Then, use the following query instead:
SELECT *
FROM testcts
WHERE name IN (SELECT value FROM STRING_SPLIT(#list, ','));
Now SQL Server should be able to use an index on the name column.
If your values are always suffixed by cts.com or ms.com you could add that to the search pattern:
SELECT {YourColumns} --Don't use *
FROM dbo.testcts t
JOIN (SELECT CONCAT(SS.[value], V.Suffix) AS [value]
FROM STRING_SPLIT(#list, ',') SS
CROSS APPLY (VALUES ('.cts.com'),
('.ms.com')) V (Suffix) ) L ON t.[name] = L.[value];

What is the "lifespan" of a postgres CTE expression? e.g. WITH... AS

I have a CTE I am using to pull some data from two tables then stick in an intermediate table called cte_list, something like
with cte_list as (
select pl.col_val from prune_list pl join employees.employee emp on pl.col_val::uuid = emp.id
where pl.col_nm = 'employee_ref_id' limit 100
)
Then, I am doing an insert to move records from the cte_list to another archive table (if they don't exist) called employee_arch_test
insert into employees.employee_arch_test (
select * from employees.employee where id in (select col_val::uuid from cte_list)
and not exists (select 1 from employees.employee_arch_test where employees.employee_arch_test.id=employees.employee.id)
);
This seems to work fine. The problem is when I add another statement after, to do some deletions from the main employee table using this aforementioned cte_list - the cte_list apparently no longer exists?
SQL Error [42P01]: ERROR: relation "cte_list" does not exist
the actual delete query:
delete from employees.employee where id in (select col_val::uuid from cte_list);
Can the cte_list CTE table only be used once or something? I'm running these statements in a LOOP and I need to run the exact same calls for about 2 or 3 other tables but hit a sticking point here.
A CTE only exists for the duration of the statement of which it's a part. I gather you have an INSERT statement with the CTE preceding it:
with cte_list
as (select pl.col_val
from prune_list pl
join employees.employee emp
on pl.col_val::uuid = emp.id
where pl.col_nm = 'employee_ref_id'
limit 100
)
insert into employees.employee_arch_test
(select *
from employees.employee
where id in (select col_val::uuid from cte_list)
and not exists (select 1
from employees.employee_arch_test
where employees.employee_arch_test.id = employees.employee.id)
);
The CTE is part of the INSERT statement - it is not a separate statement by itself. It only exists for the duration of the INSERT statement.
If you need something which lasts longer your options are:
Add the same CTE to each of your following statements. Note that because data may be changing in your database each invocation of the CTE may return different data.
Create a view which performs the same operations as the CTE, then use the view in place of the CTE. Note that because data may be changing in your database each invocation of the view may return different data.
Create a temporary table to hold the data from your CTE query, then use the temporary table in place of the CTE. This has the advantage of providing a consistent set of data to all operations.

Too many parameter values slowing down query

I have a query that runs fairly fast under normal circumstances. But it is running very slow (at least 20 minutes in SSMS) due to how many values are in the filter.
Here's the generic version of it, and you can see that one part is filtering by over 8,000 values, making it run slow.
SELECT DISTINCT
column
FROM
table_a a
JOIN
table_b b ON (a.KEY = b.KEY)
WHERE
a.date BETWEEN #Start and #End
AND b.ID IN (... over 8,000 values)
AND b.place IN ( ... 20 values)
ORDER BY
a.column ASC
It's to the point where it's too slow to use in the production application.
Does anyone know how to fix this, or optimize the query?
To make a query fast, you need indexes.
You need a separate index for the following columns: a.KEY, b.KEY, a.date, b.ID, b.place.
As gotqn wrote before, if you put your 8000 items to a temp table, and inner join it, it will make the query even faster too, but without the index on the other part of the join it will be slow even then.
What you need is to put the filtering values in temporary table. Then use the table to apply filtering using INNER JOIN instead of WHERE IN. For example:
IF OBJECT_ID('tempdb..#FilterDataSource') IS NOT NULL
BEGIN;
DROP TABLE #FilterDataSource;
END;
CREATE TABLE #FilterDataSource
(
[ID] INT PRIMARY KEY
);
INSERT INTO #FilterDataSource ([ID])
-- you need to split values
SELECT DISTINCT column
FROM table_a a
INNER JOIN table_b b
ON (a.KEY = b.KEY)
INNER JOIN #FilterDataSource FS
ON b.id = FS.ID
WHERE a.date BETWEEN #Start and #End
AND b.place IN ( ... 20 values)
ORDER BY .column ASC;
Few important notes:
we are using temporary table in order to allow parallel execution plans to be used
if you have fast (for example CLR function) for spiting, you can join the function itself
it is not good to use IN with many values, the SQL Server is not able to build always the execution plan which may lead to time outs/internal error - you can find more information here

MS SQL - Delete query taking too much time

I have the following query script:
declare #tblSectionsList table
(
SectionID int,
SectionCode varchar(255)
)
--assume #tblSectionsList has 50 sections- rows
DELETE
td
from
[dbo].[InventoryDocumentDetails] td
inner join [dbo].InventoryDocuments th
on th.Id = td.InventoryDocumentDetail_InventoryDocument
inner join #tblSectionsList ts
on ts.SectionID = th.InventoryDocument_Section
This script contains three tables, where #tblSectionsList is a temporary table, it may contains 50 records. Then I am using this table in the join condition with the InventoryDocuments table, then further joined to the InventoryDocumentDetails table. All joins are based on INT foreign-keys.
On the week-end I put this query on server and it is still running even after 2 days,4 hours... Can any body tell me if I am doing something wrong. Or is there any idea to improve its performance? Even I don't know how much more time it will take to give me the result.
Before this I also tried to create an index on the InventoryDocumentDetails table with following script:
CREATE NONCLUSTERED INDEX IX_InventoryDocumentDetails_InventoryDocument
ON dbo.InventoryDocumentDetails (InventoryDocumentDetail_InventoryDocument);
But this script also take more than one day and did not finish so I cancelled this query.
Additional info:
I am using MS SQL 2008 R2.
InventoryDocuments table contains 2108137 rows, has primary key 'Id'.
InventoryDocumentDetails table contains 25055158 rows, has primary key 'Id'.
Both tables have primary keys defined.
CUP - Intel Xeon - with 32 GB RAM
No indexes are defined, because now when I am going to create a new index, that query also get suspended.
Query Execution Plan (1):
2nd Part:
The following query give one row for this and showing status='suspended', and wait_type='LCK_M_IX'
SELECT r.session_id as spid, r.[status], r.command, t.[text], OBJECT_NAME(t.objectid, t.[dbid]) as object, r.logical_reads, r.blocking_session_id as blocked, r.wait_type, s.host_name, s.host_process_id, s.program_name, r.start_time
FROM sys.dm_exec_requests AS r LEFT OUTER JOIN sys.dm_exec_sessions s ON s.session_id = r.session_id OUTER APPLY sys.dm_exec_sql_text(r.[sql_handle]) AS t
WHERE r.session_id <> ##SPID AND r.session_id > 50
What happens when you change the Inner Join to EXISTS
DELETE td
FROM [dbo].[InventoryDocumentDetails] td
WHERE EXISTS (SELECT 1
FROM [dbo].InventoryDocuments th
WHERE EXISTS (SELECT 1
FROM #tblSectionsList ts
WHERE ts.SectionID = th.InventoryDocument_Section)
AND th.Id = td.InventoryDocumentDetail_InventoryDocument)
It sometimes can be more efficient time-wise to truncate a table and re-import the records you want to keep. A delete operation on a large tables is incredibly slow compared to an insert. Of course this is only an option if you can take your table offline. Also, only do this if your logging is set to simple.
Drop triggers table A.
Bulk copy table A to B.
Truncate table A
Enable Identity Insert.
Insert Into A From B Where A.ID Not in ID's to delete.
Disable Identity Insert.
Rebuild indexes.
Enable triggers
Try like the below. It might give you some idea at least.
DELETE FROM [DBO].[INVENTORYDOCUMENTDETAILS] WHERE INVENTORYDOCUMENTDETAILS_PK IN (
(SELECT INVENTORYDOCUMENTDETAILS_PK FROM
[DBO].[INVENTORYDOCUMENTDETAILS] TD
INNER JOIN [DBO].INVENTORYDOCUMENTS TH ON TH.ID = TD.INVENTORYDOCUMENTDETAIL_INVENTORYDOCUMENT
INNER JOIN #TBLSECTIONSLIST TS ON TS.SECTIONID = TH.INVENTORYDOCUMENT_SECTION
)

Is recursion good in SQL Server?

I have a table in SQL server that has the normal tree structure of Item_ID, Item_ParentID.
Suppose I want to iterate and get all CHILDREN of a particular Item_ID (at any level).
Recursion seems an intuitive candidate for this problem and I can write an SQL Server function to do this.
Will this affect performance if my table has many many records?
How do I avoid recursion and simply query the table? Please any suggestions?
With the new MS SQL 2005 you could use the WITHkeyword
Check out this question and particularly this answer.
With Oracle you could use CONNECT BY keyword to generate hierarchical queries (syntax).
AFAIK with MySQL you'll have to use the recursion.
Alternatively you could always build a cache table for your records parent->child relationships
As a general answer, it is possible to do some pretty sophisticated stuff in SQL Server that normally needs recursion, simply by using an iterative algorithm. I managed to do an XHTML parser in Transact SQL that worked surprisingly well. The the code prettifier I wrote was done in a stored procedure. It aint elegant, it is rather like watching buffalo doing Ballet. but it works .
Are you using SQL 2005?
If so you can use Common Table Expressions for this. Something along these lines:
;
with CTE (Some, Columns, ItemId, ParentId) as
(
select Some, Columns, ItemId, ParentId
from myTable
where ItemId = #itemID
union all
select a.Some, a.Columns, a.ItemId, a.ParentId
from myTable as a
inner join CTE as b on a.ParentId = b.ItemId
where a.ItemId <> b.ItemId
)
select * from CTE
The problem you will face with recursion and performance is how many times it will have to recurse to return the results. Each recursive call is another separate call that will have to be joined into the total results.
In SQL 2k5 you can use a common table expression to handle this recursion:
WITH Managers AS
(
--initialization
SELECT EmployeeID, LastName, ReportsTo
FROM Employees
WHERE ReportsTo IS NULL
UNION ALL
--recursive execution
SELECT e.employeeID,e.LastName, e.ReportsTo
FROM Employees e INNER JOIN Managers m
ON e.ReportsTo = m.employeeID
)
SELECT * FROM Managers
or another solution is to flatten the hierarchy into a another table
Employee_Managers
ManagerId (PK, FK to Employee table)
EmployeeId (PK, FK to Employee table)
All the parent child relation ships would be stored in this table, so if Manager 1 manages Manager 2 manages employee 3, the table would look like:
ManagerId EmployeeId
1 2
1 3
2 1
This allows the hierarchy to be easily queried:
select * from employee_managers em
inner join employee e on e.employeeid = em.employeeid and em.managerid = 42
Which would return all employees that have manager 42. The upside will be greater performance, but downside is going to be maintaining the hierarchy
Joe Celko has a book (<- link to Amazon) specifically on tree structures in SQL databases. While you would need recursion for your model and there would definitely be a potential for performance issues there, there are alternative ways to model a tree structure depending on what your specific problem involves which could avoid recursion and give better performance.
Perhaps some more detail is in order.
If you have a master-detail relationship as you describe, then won't a simple JOIN get what you need?
As in:
SELECT
SOME_FIELDS
FROM
MASTER_TABLE MT
,CHILD_TABLE CT
WHERE CT.PARENT_ID = MT.ITEM_ID
You shouldn't need recursion for children - you're only looking at the level directly below (i.e. select * from T where ParentId = #parent) - you only need recursion for all descendants.
In SQL2005 you can get the descendants with:
with AllDescendants (ItemId, ItemText) as (
select t.ItemId, t.ItemText
from [TableName] t
where t.ItemId = #ancestorId
union
select sub.ItemId, sub.ItemText
from [TableName] sub
inner join [TableName] tree
on tree.ItemId = sub.ParentItemId
)
You don't need recursion at all....
Note, I changed columns to ItemID and ItemParentID for ease of typing...
DECLARE #intLevel INT
SET #intLevel = 1
INSERT INTO TempTable(ItemID, ItemParentID, Level)
SELECT ItemID, ItemParentID, #intLevel
WHERE ItemParentID IS NULL
WHILE #intLevel < #TargetLevel
BEGIN
SET #intLevel = #intLevel + 1
INSERT INTO TempTable(ItemID, ItemParentID, Level)
SELECt ItemID, ItemParentID, #intLevel
WHERE ItemParentID IN (SELECT ItemID FROM TempTable WHERE Level = #intLevel-1)
-- If no rows are inserted then there are no children
IF ##ROWCOUNT = 0
BREAK
END
SELECt ItemID FROM TempTable WHERE Level = #TargetLevel

Resources