SQL Server 2008 R2 query stalling

I have 2 tables, TableA and TableB
TableA contains nearly 3,000,000 records
TableB contains about 10,000 records
I want to delete the entries in TableA that match certain parameters. This query has worked OK for smaller tables, but I get timeout exceptions when running it from VB.Net:
delete FROM TableA WHERE (((TableA.ID) In (SELECT [TableB].ID FROM TableB)) AND ((TableA.EVDATE)='20170720'));
In an effort to see what's going on, I changed this to a SELECT * FROM ... in SSMS, and after 5 minutes with no result I stopped it.
Why does this stall and is there a better way of doing this?
I think this is much easier to read:
delete FROM TableA WHERE TableA.EVDATE='20170720' and TableA.ID In
(SELECT [TableB].ID FROM TableB);

You will need to split the delete into multiple smaller batches.
Example: delete 1,000 rows at a time
DELETE TOP (1000) FROM TableA
WHERE TableA.EVDATE = '20170720'
  AND TableA.ID IN (SELECT ID FROM TableB);
You can run this in a loop and check @@ROWCOUNT after each delete: if it is 0 (or less than 1000), all matching rows have been deleted and there is no need to run it again. A sketch of such a loop is shown below.
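A minimal sketch of that loop, assuming the EVDATE filter and the TableB lookup from the question:

DECLARE @BatchSize INT = 1000;

WHILE 1 = 1
BEGIN
    -- delete the next batch of matching rows
    DELETE TOP (@BatchSize) FROM TableA
    WHERE TableA.EVDATE = '20170720'
      AND TableA.ID IN (SELECT ID FROM TableB);

    -- a row count below the batch size means nothing is left to delete
    IF @@ROWCOUNT < @BatchSize BREAK;
END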
The big query needs a lot of transaction log resources (the DatabaseName_Log.ldf file, not the data file). When you split it up, each batch uses fewer resources and writes less log, which allows faster execution per smaller batch of records.

Related

How to efficiently get latest Insert Time Stamp on a table with millions of rows

I have a question regarding table design / query efficiency in SQL.
I have two tables: Table A contains a list of clients, and Table B contains client IDs and the last time a message was received from each client.
The number of clients is growing and is in the tens of thousands; each client sends a message about once a minute on average, sometimes more, sometimes less.
Table B is growing rather fast.
The question is this: I want to be able to pull a list of all clients and their last seen date and time.
The problem is that as the tables grow, the query execution time increases, because the query scans all of the rows in Table A and Table B.
I introduced a new date-type column in Table B and created a non-clustered, non-unique index on it, but it does not seem to make much difference.
The query is:
SELECT [TableA].[Client_ID], ISNULL(R.Most_Recent_TimeStamp, '2000-01-01') AS Most_Recent_Comms
FROM [TableA]
LEFT JOIN (SELECT [TableB].[Client_ID], MAX([TableB].[Time_Stamp]) AS Most_Recent_TimeStamp FROM [TableB] WITH(NOLOCK) GROUP BY [TableB].[Client_ID]) AS R ON [TableA].[Client_ID] = R.Client_ID
The execution time is in the tens of seconds. Adding the WITH(NOLOCK) hint improved things a fair amount, but as time goes on and TableB grows, the execution time will keep growing.
I do not think this is the right way to go.
I am sure there is a better way. What about creating a view, or another table plus a trigger that updates the new table every time a row is inserted into TableB? The new table would always be kept up to date, and a simple SELECT query would be enough.
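Something along these lines is what I have in mind (the ClientLastSeen table, the trigger name, and the INT type for Client_ID are just placeholders):

CREATE TABLE ClientLastSeen (
    Client_ID INT NOT NULL PRIMARY KEY,
    Last_Seen DATETIME NOT NULL
);
GO
CREATE TRIGGER trg_TableB_LastSeen ON TableB
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;
    -- keep one row per client with its latest Time_Stamp
    MERGE ClientLastSeen AS t
    USING (SELECT Client_ID, MAX(Time_Stamp) AS Time_Stamp
           FROM inserted
           GROUP BY Client_ID) AS s
        ON t.Client_ID = s.Client_ID
    WHEN MATCHED AND s.Time_Stamp > t.Last_Seen THEN
        UPDATE SET Last_Seen = s.Time_Stamp
    WHEN NOT MATCHED THEN
        INSERT (Client_ID, Last_Seen) VALUES (s.Client_ID, s.Time_Stamp);
END
GO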
I would suggest one of the following:
SELECT b.ClientId, MAX(b.TimeStamp)
FROM TableB b
GROUP BY b.ClientId;
This assumes that all clients are in TableB. If not:
SELECT a.ClientId, b.TimeStamp
FROM TableA a OUTER APPLY
     (SELECT TOP (1) b.TimeStamp
      FROM TableB b
      WHERE b.ClientId = a.ClientId
      ORDER BY b.TimeStamp DESC
     ) b;
For both queries, you want an index on TableB(ClientId, TimeStamp).
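For example (the index name is illustrative):

CREATE NONCLUSTERED INDEX IX_TableB_ClientId_TimeStamp
    ON TableB (ClientId, TimeStamp);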

SQL DELETE performance, T-SQL or ISO-compatible query

I have two very large tables (TableA and TableB), both with an Id column, and I would like to remove all rows from TableA whose Ids are present in TableB. Which of the following would be fastest, and why?
--ISO-compatible
DELETE FROM TableA
WHERE Id IN (SELECT Id FROM TableB)
or
-- T-SQL
DELETE A FROM TableA AS A
INNER JOIN TableB AS B
ON A.Id = B.Id
If there are indexes on each Id, they should perform equally well.
If there are no indexes on the Id columns, exists() or in() may perform better.
In general I prefer exists() over in() because it lets you easily add more than one comparison when needed.
delete a
from tableA as a
where exists (
    select 1
    from tableB as b
    where a.Id = b.Id
);
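For example, if rows also had to match on a second column (SomeDate here is just a hypothetical column), the extra predicate drops straight into the exists():

delete a
from tableA as a
where exists (
    select 1
    from tableB as b
    where a.Id = b.Id
      and a.SomeDate = b.SomeDate  -- hypothetical second comparison
);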
Reference:
in vs inner join - Gail Shaw
exists() vs in - Gail Shaw
As long as your Id in TableB is unique, both queries should create the same execution plan. Just include the actual execution plan with each query and verify it.
Take a look at this nice post: in-vs-join-vs-exists
There's an easy way to find out, using the execution plan (press Ctrl+L in SSMS).
Since we don't know the data model behind your tables (the eventual indexes etc), we can't know for sure which query will be the fastest.
From experience, I can tell you that for very large tables (> 1 million rows), DELETE is quite slow because of all the logging. Depending on the operation you're doing, you may want SQL Server not to log the delete.
You might want to check at this question :
How to delete large data of table in SQL without log?
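For the special case where every row has to go, TRUNCATE TABLE uses far fewer log resources because it logs page deallocations rather than individual row deletes; for partial deletes, batching the DELETE (as shown earlier on this page) keeps the log manageable. For example:

TRUNCATE TABLE TableA;  -- only when all rows must be removed; no WHERE clause allowed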

SQL query too slow on second PC

We have a huge database with over 100 tables and millions of rows.
I created a stored procedure for a job, tested it locally, and got 500'000 results in less than 10 seconds. I tested the same query on a second PC and waited about an hour for the same result.
The simple version of the query is:
select * from Table1
inner join Table2 on Table1.Table2Id = Table2.Id
where Table1.Segment = @segment
Table1 38'553'864 Rows
Table2 10'647'167 Rows
I compared the execution plans on the local PC and on the server (screenshots not included here; I could post the whole execution plans if needed).
The second PC is a virtual server (test system). It has a lot more memory and more disk space. I also stopped every job on the server and ran only the SQL query, but got the same result, so there is no other SQL query blocking the tables.
Later I created an index on the foreign key of Table1, but it did not improve the query.
Does anyone have an idea where the problem could be and how I could solve it?
It would take a while to post the execution plans for both queries, but here are a few steps that already helped a lot. Thanks guys for your help.
On the second PC the statistics on the tables dated from September last year; we don't use the query that much on that server. Updating the statistics was a good point.
https://msdn.microsoft.com/en-us/library/ms190397(v=sql.120).aspx
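For example (FULLSCAN is optional, and the table names just mirror the ones above):

UPDATE STATISTICS Table1 WITH FULLSCAN;
UPDATE STATISTICS Table2 WITH FULLSCAN;
-- or refresh statistics for every table in the database
EXEC sp_updatestats;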
Another thing is to improve my SQL query. I removed the WHERE clause and added it as a condition on the first inner join, so the rows of the first table are filtered before joining the huge number of rows from the second. (The condition filters out about 90% of the first table, and Table3 is really small.)
select * from Table1
inner join Table3 on Table1.Segment = @segment
and Table1.Table3Id = Table3.Id
inner join Table2 on Table1.Table2Id = Table2.Id
As a next step, I created a SQL Agent job that rebuilds all the indexes, so they are up to date.
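For a single table that can be as simple as (table names illustrative):

ALTER INDEX ALL ON Table1 REBUILD;
ALTER INDEX ALL ON Table2 REBUILD;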
It is already a lot better, but I'm still open to other inputs.

Netezza left outer join query performance

I have a question related to Netezza query performance. I have two tables, Table A and Table B, where Table B is a subset of Table A with altered data. I need to update Table A with the new values from Table B.
There are two approaches here:
1) Left outer join, select the relevant columns, and insert into the target table
2) Insert Table A's data into the target table, then update those values from Table B using a join
I tried both, and logically they are the same, but the explain plan gives different costs.
For the normal select:
Sub-query Scan table "TM2" (cost=0.1..1480374.0 rows=8 width=4864 conf=100)
For the update:
Hash Join (cost=356.5..424.5 rows=2158 width=27308 conf=21)
For the left outer join:
Sub-query Scan table "TM2" (cost=51.0..101474.8 rows=10000000 width=4864 conf=100)
From this I feel the left outer join is better. Can anyone put some thought into this and offer guidance?
Thanks
The reason that the cost of insert into table_c select ... from table_a; followed by update table_c set ... from table_b; is higher is that you're inserting, deleting, and then inserting again. Updates in Netezza mark the records to be updated as deleted, then insert new rows with the updated values. Once the data is written to an extent, it's never (to my knowledge) altered.
With insert into table_c select ... from table_a join table_b using (...); you're only inserting once, thereby only updating all the zone maps once. The cost will be noticeably lower.
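A sketch of that single-pass insert (table_c, the some_col column, and the COALESCE-based override are illustrative; the real column list depends on your schema):

INSERT INTO table_c
SELECT a.id,
       COALESCE(b.some_col, a.some_col) AS some_col  -- take Table B's altered value when it exists
FROM table_a a
LEFT OUTER JOIN table_b b
    ON a.id = b.id;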
Netezza does an excellent job of keeping you away from the disk on reads, but it will write to the disk as often as you tell it to. In the case of updates, seemingly more so. Try to only write as often as is necessary to gain benefits of new distributions and co-located joins. Any more than that, and you're just using excess commit actions.

Why is this CTE so much slower than using temp tables?

We have had an issue since a recent update on our database (I made this update, I am guilty here): one of the queries we use has been much slower since then. I tried to modify the query to get faster results, and managed to achieve my goal with temp tables, which is not bad, but I fail to understand why this solution performs better than a CTE-based one that runs the same queries. Maybe it has to do with some tables being in a different DB?
Here's the query that performs badly (22 minutes on our hardware):
WITH CTE_Patterns AS (
SELECT
PEL.iId_purchased_email_list,
PELE.sEmail
FROM OtherDb.dbo.Purchased_Email_List PEL WITH(NOLOCK)
INNER JOIN OtherDb.dbo.Purchased_Email_List_Email AS PELE WITH(NOLOCK) ON PELE.iId_purchased_email_list = PEL.iId_purchased_email_list
WHERE PEL.bPattern = 1
),
CTE_Emails AS (
SELECT
ILE.iId_newsletterservice_import_list,
ILE.iId_newsletterservice_import_list_email,
ILED.sEmail
FROM dbo.NewsletterService_import_list_email AS ILE WITH(NOLOCK)
INNER JOIN dbo.NewsletterService_import_list_email_distinct AS ILED WITH(NOLOCK) ON ILED.iId_newsletterservice_import_list_email_distinct = ILE.iId_newsletterservice_import_list_email_distinct
WHERE ILE.iId_newsletterservice_import_list = 1000
)
SELECT I.iId_newsletterservice_import_list,
I.iId_newsletterservice_import_list_email,
BL.iId_purchased_email_list
FROM CTE_Patterns AS BL WITH(NOLOCK)
INNER JOIN CTE_Emails AS I WITH(NOLOCK) ON I.sEmail LIKE BL.sEmail
When I run each CTE query separately, it's super fast (0 seconds in SSMS, returning 122 rows and 13k rows); when I run the full query with the INNER JOIN on sEmail, it's super slow (22 minutes).
Here's the query that performs well with temp tables (0 seconds on our hardware), which does the exact same thing and returns the same result:
SELECT
PEL.iId_purchased_email_list,
PELE.sEmail
INTO #tb1
FROM OtherDb.dbo.Purchased_Email_List PEL WITH(NOLOCK)
INNER JOIN OtherDb.dbo.Purchased_Email_List_Email PELE ON PELE.iId_purchased_email_list = PEL.iId_purchased_email_list
WHERE PEL.bPattern = 1
SELECT
ILE.iId_newsletterservice_import_list,
ILE.iId_newsletterservice_import_list_email,
ILED.sEmail
INTO #tb2
FROM dbo.NewsletterService_import_list_email AS ILE WITH(NOLOCK)
INNER JOIN dbo.NewsletterService_import_list_email_distinct AS ILED ON ILED.iId_newsletterservice_import_list_email_distinct = ILE.iId_newsletterservice_import_list_email_distinct
WHERE ILE.iId_newsletterservice_import_list = 1000
SELECT I.iId_newsletterservice_import_list,
I.iId_newsletterservice_import_list_email,
BL.iId_purchased_email_list
FROM #tb1 AS BL WITH(NOLOCK)
INNER JOIN #tb2 AS I WITH(NOLOCK) ON I.sEmail LIKE BL.sEmail
DROP TABLE #tb1
DROP TABLE #tb2
Table stats:
OtherDb.dbo.Purchased_Email_List : 13 rows, 2 rows flagged bPattern = 1
OtherDb.dbo.Purchased_Email_List_Email : 324289 rows, 122 rows with patterns (which are used in this issue)
dbo.NewsletterService_import_list_email : 15.5M rows
dbo.NewsletterService_import_list_email_distinct ~1.5M rows
WHERE ILE.iId_newsletterservice_import_list = 1000 retrieves ~ 13k rows
I can post more info about tables on request.
Can someone help me understand this ?
UPDATE
The execution plans for the CTE query and for the temp table query were attached as screenshots (not reproduced here).
As you can see in the query plan, with CTEs the engine reserves the right to treat them basically as a lookup, even when you want a join.
Rather than running the whole thing once in advance, essentially generating a temp table, it may decide to just run it once for each row.
This is perfect for recursive queries, which CTEs handle like magic.
But you're seeing - in the nested Nested Loops - where it can go terribly wrong.
You're already finding the answer on your own by trying the real temp table.
Parallelism. If you look at your TEMP TABLE example, the plan for the 3rd query indicates parallelism both in distributing and gathering the work of the 1st query, and parallelism when combining the results of the 1st and 2nd queries. The 1st query also happens to have a relative cost of 77%. So in your TEMP TABLE example the query engine was able to determine that the 1st query could benefit from parallelism, especially via the Gather Streams and Distribute Streams operators, which divvy up the work of the join and then recombine the results because of how the data is distributed. Notice that the cost of the 2nd query is 0%, so you can ignore it except for when its results need to be combined.
Looking at the CTE plan, it is processed entirely serially, not in parallel. Somehow with the CTE the engine could not figure out that the 1st query can run in parallel, nor the relationship between the 1st and 2nd queries. It's possible that with multiple CTE expressions it assumes some dependency and does not look far enough ahead.
Another test you can do is to keep CTE_Patterns but eliminate CTE_Emails by writing it as a derived-table subquery in the final query. It would be interesting to see the execution plan and whether there is parallelism when the query is expressed that way.
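A sketch of that variant, using the same tables and filters as the original query (NOLOCK hints omitted):

WITH CTE_Patterns AS (
    SELECT PEL.iId_purchased_email_list,
           PELE.sEmail
    FROM OtherDb.dbo.Purchased_Email_List PEL
    INNER JOIN OtherDb.dbo.Purchased_Email_List_Email AS PELE
        ON PELE.iId_purchased_email_list = PEL.iId_purchased_email_list
    WHERE PEL.bPattern = 1
)
SELECT I.iId_newsletterservice_import_list,
       I.iId_newsletterservice_import_list_email,
       BL.iId_purchased_email_list
FROM CTE_Patterns AS BL
INNER JOIN (SELECT ILE.iId_newsletterservice_import_list,
                   ILE.iId_newsletterservice_import_list_email,
                   ILED.sEmail
            FROM dbo.NewsletterService_import_list_email AS ILE
            INNER JOIN dbo.NewsletterService_import_list_email_distinct AS ILED
                ON ILED.iId_newsletterservice_import_list_email_distinct = ILE.iId_newsletterservice_import_list_email_distinct
            WHERE ILE.iId_newsletterservice_import_list = 1000
           ) AS I
    ON I.sEmail LIKE BL.sEmail;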
In my experience it's best to use CTEs for recursion and temp tables when you need to join back to the data. That typically makes for a much faster query.
