SQL partitioning: query performance degrades with an extra WHERE condition - sql-server

I am using a SQL partition function based on the ActionedOn column, and the scan count for the following query is 6:
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT *
FROM [EventData]
WHERE actionedon BETWEEN '2017-10-31 07:16:33.367' AND '2022-01-10 07:16:33.367'
If I add another condition to the WHERE clause, the query checks the data in all partitions and the scan count is 9:
SET STATISTICS IO ON
SET STATISTICS TIME ON
SELECT *
FROM [EventData]
WHERE actionedon BETWEEN '2017-10-31 07:16:33.367' AND '2022-01-10 07:16:33.367'
AND UserId = 8234725
Does the additional UserId condition override partition elimination here?
Please confirm.
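One way to see what is actually happening (a diagnostic sketch, not a confirmed answer) is to capture the actual execution plan and check how many partitions the seek or scan really touched:
SET STATISTICS XML ON;
SELECT *
FROM [EventData]
WHERE actionedon BETWEEN '2017-10-31 07:16:33.367' AND '2022-01-10 07:16:33.367'
AND UserId = 8234725;
SET STATISTICS XML OFF;
-- In the returned plan XML, the seek/scan operator's RunTimePartitionSummary
-- ("Actual Partition Count" in SSMS) shows how many partitions were accessed.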


Insert from select or update from select with commit every 1M records

I've already seen a dozen such questions, but most of them get answers that don't apply to my case.
First off - the database I am trying to get the data from is on a very slow network and is connected to over a VPN.
I am accessing it through a database link.
I have full read/write access on my schema tables, but I don't have DBA rights, so I can't create dumps and I don't have grants for creating new tables, etc.
I've been trying to get the database locally and all is well except for one table.
It has 6.5 million records and 16 columns.
There was no problem getting 14 of them but the remaining two are Clobs with huge XML in them.
The data transfer is so slow it is painful.
I tried:
insert based on select
insert all 14 columns, then update the other 2
create table as select
insert based on a conditional select, so I only get so many records at a time and commit manually
The issue is mainly that the connection is lost before the transaction finishes (power loss, VPN drop, random error, etc.) and all the GBs that have been downloaded are discarded.
As I said, I tried adding conditions so I only get a few records at a time, but even this is a bit hit-and-miss and requires constant attention from me.
Something like:
Insert into TableA
Select * from TableA#DB_RemoteDB1
WHERE CREATION_DATE BETWEEN to_date('01-JAN-2016', 'DD-MON-YYYY') AND to_date('31-DEC-2016', 'DD-MON-YYYY')
Sometimes it works, sometimes it doesn't. After a few GBs Toad gets stuck running, but when I look at its throughput it is 0 KB/s or a few bytes/s.
What I am looking for is a loop or cursor that can be used to fetch maybe 100,000 or 1,000,000 records at a time, commit them, then go for the rest until it is done.
This is a one-time operation, as we need the data locally for testing - so I don't care if it is inefficient, as long as the data is brought in in chunks and a commit saves me from retrieving them again.
I can already count about 15 GB of failed downloads over the last 3 days, and my local table still has 0 records as all my attempts have failed.
Server: Oracle 11g
Local: Oracle 11g
Attempted Clients: Toad/Sql Dev/dbForge Studio
Thanks.
You could do something like:
begin
  loop
    insert into tablea
    select * from tablea#DB_RemoteDB1 a_remote
    where not exists (select null from tablea where id = a_remote.id)
    and rownum <= 100000; -- or whatever number makes sense for you
    exit when sql%rowcount = 0;
    commit;
  end loop;
end;
/
This assumes that there is a primary/unique key you can use to check if a row in the remote table already exists in the local one - in this example I've used a vague ID column, but replace that with your actual key column(s).
For each iteration of the loop it will identify rows in the remote table which do not exist in the local table - which may be slow, but you've said performance isn't a priority here - and then, via rownum, limit the number of rows being inserted to a manageable subset.
The loop then terminates when no rows are inserted, which means there are no rows left in the remote table that don't exist locally.
This should be restartable, due to the commit and the where not exists check. This isn't usually a good approach - as it kind of breaks normal transaction handling - but as a one-off, and with your network issues/constraints, it may be necessary.
Toad is right: using bulk collect would generally be (probably significantly) faster, as the query isn't re-executed on each iteration of the loop:
declare
  cursor l_cur is
    select * from tablea#DB_RemoteDB1 a_remote
    where not exists (select null from tablea where id = a_remote.id);
  type t_tab is table of l_cur%rowtype;
  l_tab t_tab;
begin
  open l_cur;
  loop
    fetch l_cur bulk collect into l_tab limit 100000;
    forall i in 1..l_tab.count
      insert into tablea values l_tab(i);
    commit;
    exit when l_cur%notfound;
  end loop;
  close l_cur;
end;
/
This time you would change the limit 100000 to whatever number you think is sensible. There is a trade-off here though, as the PL/SQL table will consume memory, so you may need to experiment a bit to pick that value - you could get errors or affect other users if it's too high. Lower is less of a problem, except that the bulk inserts become slightly less efficient.
But because you have a CLOB column (holding your XML) this won't work for you, as @BobC pointed out; the insert ... select is supported over a DB link, but the collection version will get an error from the fetch:
ORA-22992: cannot use LOB locators selected from remote tables
ORA-06512: at line 10
22992. 00000 - "cannot use LOB locators selected from remote tables"
*Cause: A remote LOB column cannot be referenced.
*Action: Remove references to LOBs in remote tables.
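Given that restriction, one hedged fallback (a sketch only, reusing the question's table, link, and column names) is to keep the plain insert ... select shape - the variant that does work with the remote CLOBs - but drive it in committed slices, for example by year:
begin
  for yr in 2014..2016 loop -- adjust the year range to cover your data
    insert into tablea
    select * from tablea#DB_RemoteDB1 a_remote
    where a_remote.creation_date >= to_date('01-01-' || yr, 'DD-MM-YYYY')
    and a_remote.creation_date < to_date('01-01-' || (yr + 1), 'DD-MM-YYYY')
    and not exists (select null from tablea where id = a_remote.id);
    commit; -- persist each slice so a dropped connection doesn't discard everything
  end loop;
end;
/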

Query slows down 30x-100x when the value ends with .b

Consider testTable, a table with six fields: a UNIQUEIDENTIFIER, a TIMESTAMP, and four VARCHARs, one of which is Filename.
This first query takes 1 minute 38 seconds:
Select top 1 * from testTable WHERE Filename = 'any.string.1512.b'
Either of these queries takes 1-3 seconds
Select top 1 * from testTable WHERE Filename = 'any.string.1512'
Select top 1 * from testTable WHERE Filename like 'cusip.realloc.1412.b%'
I have looked at the execution plan for all three, and the only difference is that the last query (the LIKE statement) used a 46% index seek / 54% key lookup split vs. a 50/50 index seek / key lookup split for the first two. As far as I can tell, as soon as I drop the .b part of the search criterion, the queries go back to normal speed.
FileName has been indexed; the table has been dropped and recreated just in case. We have added indexes, removed indexes, checked the table, checked the database, restarted services, restarted the server, and recreated the table. This field used to be VARCHAR(MAX) and I changed it to VARCHAR(100) so it could be indexed, but the problem was occurring before making this change.
Something else that I believe may be happening is that there might be something wrong with the end of the table. It will never complete a full:
Select * from testTable
I suspected a corrupted table, but that wasn't the case. However, when we attempt to generate a script for it in SSMS, the generation fails (no error given). I was able to recreate the table by generating the structure from SSMS and copying the data with another SQL client.
We are pretty stumped.
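One way to dig further (a diagnostic sketch only; the index name is hypothetical) is to inspect the statistics on the Filename index and rule out a stale cached plan for the slow literal, since one literal being dramatically slower than near-identical ones often points at statistics or plan issues:
-- Hypothetical index name; list the table's indexes with sp_helpindex 'testTable'
DBCC SHOW_STATISTICS ('testTable', IX_testTable_Filename);
-- Rule out a stale cached plan for the slow literal
SELECT TOP 1 * FROM testTable
WHERE Filename = 'any.string.1512.b'
OPTION (RECOMPILE);
-- Refresh statistics if the histogram looks out of date
UPDATE STATISTICS testTable WITH FULLSCAN;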

Reset SQL Server Index usage

I use the query below to analyze index usage in SQL Server.
SELECT *
FROM sys.dm_db_index_usage_stats A
WHERE A.database_id = DB_ID()
How can I reset all the data in this system view?
What do you mean by reset - do you want to reset the index usage statistics in the view?
Taken from Here
Usage statistics: These are found in sys.dm_db_index_usage_stats. Index usage statistics keep track of things like seeks and scans from SELECT queries. They are not persisted and get reset on restart of SQL Server. These statistics also get reset if the underlying index is rebuilt ("ALTER INDEX ... REBUILD"), but not with "ALTER INDEX ... REORG".
As said, you can't reset it manually. Take a look at this post, which says the same:
http://social.msdn.microsoft.com/Forums/sqlserver/en-US/08eb7b79-64a3-4475-bfc3-69715aec8381/resetting-dmdbindexusagestats-without-restarting-or-detaching-a-database
As mentioned, you cannot truly reset it without restarting SQL Server.
BUT
Why do you want to reset it? Probably because you have made changes to your indexes and simply want to see how the usage has changed, am I right?
In this case you can hardcode the existing values into your query and subtract it to get new stats from this point.
By "hardcoding" I mean joining with a VALUES pseudo-table, something like this
--your SELECT goes here
--your FROM goes here
--add this JOIN
JOIN (VALUES ('IX_index1', 2412727),
             ('IX_index2', 1630517),
             ('IX_index3', 514129)) o(name, seeks) ON o.name = indexes.name
-- rest of your query
Now you can add this to your SELECT to get the difference:
SELECT dm_db_index_usage_stats.user_seeks - o.seeks AS newseeks
So in a nutshell:
SELECT the existing usage stats from dm_db_index_usage_stats
do some copy-pasting magic to get the existing stats and hardcode them into your query
see the changes
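Putting the pieces together, a minimal sketch of the complete query (the index names are the hypothetical ones from above; sys.indexes supplies the names to join on):
SELECT i.name,
       s.user_seeks - o.seeks AS newseeks
FROM sys.dm_db_index_usage_stats s
JOIN sys.indexes i
    ON i.object_id = s.object_id AND i.index_id = s.index_id
JOIN (VALUES ('IX_index1', 2412727),
             ('IX_index2', 1630517),
             ('IX_index3', 514129)) o(name, seeks)
    ON o.name = i.name
WHERE s.database_id = DB_ID();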

TSQL Batch insert - math doesn't work

I need to insert 1.3 million records from one table into another, and it takes a really long time (over 13 minutes). After some research I found that it is better to do this operation in batches, so I put together something like this (the actual query is more complicated; it is simplified here for brevity):
DECLARE @key INT; SET @key = 0;
CREATE TABLE #CURRENT_KEYS([KEY] INT)
WHILE 1=1
BEGIN
    -- Getting subset of keys
    INSERT INTO #CURRENT_KEYS([KEY])
    SELECT TOP 100000 [KEY] FROM #ALL_KEYS WHERE [KEY] > @key
    IF @@ROWCOUNT = 0 BREAK
    -- Main Insert
    INSERT INTO #RESULT([KEY], VALUE)
    SELECT MAIN_TABLE.[KEY], MAIN_TABLE.VALUE
    FROM MAIN_TABLE INNER JOIN #CURRENT_KEYS
        ON MAIN_TABLE.[KEY] = #CURRENT_KEYS.[KEY]
    SELECT @key = MAX([KEY]) FROM #CURRENT_KEYS
    TRUNCATE TABLE #CURRENT_KEYS
END
I already have an indexed list of 1.3 million keys in the #ALL_KEYS table, so the idea is to create a smaller subset of keys in a loop for the JOIN and INSERT. The above loop executes 13 times (1,300,000 records / 100,000 records per batch). If I put a break after just one iteration, execution time is 9 seconds. I assumed the total execution time would be about 9 * 13 = 117 seconds, but it's the same 13 minutes!
Any idea why?
NOTE: Instead of the temp table #CURRENT_KEYS, I tried using a CTE, but with the same result.
UPDATE: some wait stats.
For this process I am seeing PAGEIOLATCH_SH and sometimes PREEMPTIVE_OS_WRITEFILEGATHER in the wait stats, occasionally over 500 ms but often under 100 ms. Also, SP_WHO shows the user as suspended for the duration of the query.
I'm pretty sure you're experiencing disk pressure. PREEMPTIVE_OS_WRITEFILEGATHER is an autogrowth event (database getting larger), and PAGEIOLATCH_SH means that the process is waiting for a latch on a buffer that's an IO request (probably your file growth event).
http://blog.sqlauthority.com/2011/02/19/sql-server-preemptive-and-non-preemptive-wait-type-day-19-of-28/
http://blog.sqlauthority.com/2011/02/09/sql-server-pageiolatch_dt-pageiolatch_ex-pageiolatch_kp-pageiolatch_sh-pageiolatch_up-wait-type-day-9-of-28/
What I would recommend is pre-growing both tempdb (for your temp table) and the database that's going to hold the batch insert.
http://support.microsoft.com/kb/2091024
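For reference, pre-growing a file is one ALTER DATABASE ... MODIFY FILE per file; a minimal sketch (the database and logical file names are placeholders - look yours up in sys.master_files):
-- Placeholder names; query sys.master_files for the actual logical file names
ALTER DATABASE [YourDatabase]
MODIFY FILE (NAME = N'YourDatabase_Data', SIZE = 20GB);
ALTER DATABASE tempdb
MODIFY FILE (NAME = N'tempdev', SIZE = 10GB);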

Modify SQL result set before returning from stored procedure

I have a simple table in my SQL Server 2008 DB:
Tasks_Table
-id
-task_complete
-task_active
-column_1
-..
-column_N
The table stores instructions for uncompleted tasks that have to be executed by a service.
I want to be able to scale my system in the future. Until now only one service on one computer has read from the table. I have a stored procedure that selects all uncompleted and inactive tasks. As the service begins to process tasks, it updates the task_active flag in all the returned rows.
To enable scaling of the system I want to allow deployment of the service on more machines. Because I want to prevent a task from being returned to more than one service, I have to update the stored procedure that returns uncompleted and inactive tasks.
I figured that I have to lock the table (only one reader at a time - I know I have to use an appropriate ISOLATION LEVEL) and update the task_active flag in each row of the result set before returning it.
So my question is: how do I modify the SELECT result set in the stored procedure before returning it?
This is the typical dequeue pattern. It is implemented using the OUTPUT clause and is described on MSDN; see the Queues paragraph in OUTPUT Clause (Transact-SQL):
UPDATE TOP(1) Tasks_Table WITH (ROWLOCK, READPAST)
SET task_active = 1
OUTPUT INSERTED.id,INSERTED.column_1, ...,INSERTED.column_N
WHERE task_active = 0;
The ROWLOCK, READPAST hint combination allows for high throughput and high concurrency: multiple threads/processes can enqueue new tasks while multiple threads/processes dequeue tasks. There is no order guarantee.
Updated
If you want to order the result you can use a CTE:
WITH cte AS (
    SELECT TOP(1) id, task_active, column_1, ..., column_N
    FROM Tasks_Table WITH (ROWLOCK, READPAST)
    WHERE task_active = 0
    ORDER BY <order by criteria>)
UPDATE cte
SET task_active = 1
OUTPUT INSERTED.id, INSERTED.column_1, ..., INSERTED.column_N;
I discuss this and other enqueue/dequeue techniques in the article Using Tables as Queues.
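Tying this back to the question: the UPDATE ... OUTPUT can be wrapped directly in the stored procedure, so no separate SELECT (or explicit table lock) is needed; a minimal sketch, with a hypothetical procedure name:
CREATE PROCEDURE dbo.DequeueTask
AS
BEGIN
    SET NOCOUNT ON;
    -- Atomically claim one inactive, uncompleted task and return its columns
    UPDATE TOP(1) Tasks_Table WITH (ROWLOCK, READPAST)
    SET task_active = 1
    OUTPUT INSERTED.*
    WHERE task_active = 0
      AND task_complete = 0;
END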
