I have a table in a SQL Server database (table name: Material, 6 columns). It contains 2.6 million records. I need to update this table based on two column values, and each update takes about 2 seconds.
Please help me optimize the query below.
UPDATE Material
SET Value = @Value,
Format = @Format,
SValue = @SValue,
CGroup = @CGroup
WHERE
SM = @SM
AND Characteristic = @Characteristic
You really need to provide the query plan before we can tell you with any certainty what, if anything, might help.
Having said that, the first thing I would check is whether the plan shows a great deal of time spent on a table scan. If so, on a table this large you could improve performance substantially by adding an index on SM and Characteristic - that will allow the engine to perform an index seek instead of a table scan, and could improve performance dramatically.
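Something along these lines, as a sketch (the index name is arbitrary; verify the column order against your data's selectivity):
CREATE NONCLUSTERED INDEX IX_Material_SM_Characteristic
ON dbo.Material (SM, Characteristic);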
Since you have a large amount of data, a few tweaks can improve update performance:
(1) If the columns being updated are indexed, consider removing (or disabling) those indexes for the duration of the update
(2) Execute the update in smaller batches:
DECLARE @i INT = 1
WHILE (@i <= 10)
BEGIN
    UPDATE TOP (20000) Material
    SET Value = @Value,
        Format = @Format,
        SValue = @SValue,
        CGroup = @CGroup
    WHERE
        SM = @SM
        AND Characteristic = @Characteristic
    SET @i = @i + 1
END
(3) Disable any triggers on the table (if present) for the duration of the update
Hope this helps!
Try putting a composite index on SM and Characteristic. This lets SQL Server locate the affected records more efficiently. Operationally, an update is a combination of a delete and an insert, so if your table has many columns it may slow down your update even if you are not updating all of the columns.
Steps I would suggest:
Create a composite index on SM and Characteristic.
Recreate the table with only the required columns and use joins wherever needed.
2.6 mil rows is not that much. 2 secs for an update is probably too much.
Having said that, the update times could depend on two things.
First, how many rows are being updated by a single update command, i.e. is it just one row or some larger set? You can't really do much about that, just saying it should be taken into consideration.
The other thing is indexes - you could either have too many of them or not enough.
If the table is missing an index on (SM, Characteristic) -- or (Characteristic, SM), depending on the selectivity -- then it's probably a full table scan every time. If the update touches only a couple of rows, that's a waste of time. So, it's the first thing to check.
If there are too many indexes on the affected columns, this could slow down updates as well, because those indexes have to be maintained with every change of data. You can check the usefulness of indexes by querying the sys.dm_db_index_usage_stats DMV (plenty of explanation on the internet, so I won't get into it here) and remove the unused ones. Just be careful with this and test thoroughly.
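For reference, this is the kind of query I mean, filtered to the Material table from the question (a high user_updates count with few seeks, scans or lookups suggests an index that costs writes but is rarely read):
SELECT OBJECT_NAME(s.object_id) AS table_name,
       i.name AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
    ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
  AND OBJECT_NAME(s.object_id) = 'Material';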
One other thing to check is whether the affected columns are part of some foreign key constraint. In that case, the engine must check the validity of the constraint every time (iow, check if the new value exists in the referenced table, or check if there's data in referencing tables, depending on which side of the FK the column is). If there are no supporting indexes for this check, it would cause (again) a scan on the other tables involved.
But to really make sure, check the exec plan and IO stats (SET STATISTICS IO ON), it will tell you exactly what is going on.
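For example (run in SSMS; the Messages tab will then show logical reads per table and CPU/elapsed time for the statement):
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
-- run the UPDATE here and inspect the output
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;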
I have a unit test which enters a user into the system through the UI, but before that it removes the existing entry for that user.
I have three delete queries, and each one should delete only a single record, but the unit test fails while executing the deletes and returns a timeout error.
I don't know how to optimise these queries; I'd appreciate some help.
delete from CustomerRoles where RegisteredCustomerId = (select Id from RegisteredCustomers where Email = 'boltestsignupseller@yahoo.com')
delete from SellerInfos where RegisteredCustomerId = (select Id from RegisteredCustomers where Email = 'boltestsignupseller@yahoo.com')
DELETE FROM RegisteredCustomers where Email = 'boltestsignupseller@yahoo.com'
The second and third queries take more than 2 minutes and eventually time out.
Without knowledge of the database this is impossible to comment on definitively, but common causes would include:
a missing index on the column being used to filter (or an unusable index - perhaps due to varchar vs nvarchar, etc)
blocking due to conflicting operations
the existence of triggers performing an unbounded amount of additional hidden work
Since the queries appear to be expecting a single RegisteredCustomers record, you can possibly reduce some overhead by capturing the located Id into a local variable at the start, and using that local in all three deletes, but this isn't a magic wand:
declare @id int = (
select Id from RegisteredCustomers where Email = 'boltestsignupseller@yahoo.com');
delete from CustomerRoles where RegisteredCustomerId = @id;
delete from SellerInfos where RegisteredCustomerId = @id;
delete from RegisteredCustomers where Id = @id;
Most likely, though, you'll need to actually investigate what is happening (look at blocks, look at the query plan, look at the IO stats, look at the indexing etc).
If there are lots of foreign keys on the tables, and those foreign keys are poorly indexed, it can take non-trivial amounts of time to perform deletes simply because it has to do a lot of work to ensure that the deletes don't violate referential integrity. In some cases, it is preferable to perform a logical delete rather than a physical delete, to avoid this work - i.e. have a column that signifies deletion, and just do an update ... set DeletionDate = GETUTCDATE() ... where ... rather than a delete (but: you need to remember to filter by this column in your queries).
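A minimal sketch of that logical-delete approach, assuming a nullable DeletionDate column has been added to RegisteredCustomers:
UPDATE RegisteredCustomers
SET DeletionDate = GETUTCDATE()
WHERE Email = 'boltestsignupseller@yahoo.com';
-- queries against the table then need to filter on: WHERE DeletionDate IS NULL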
I am trying to speed up the execution time of my stored procedure. One inner join in particular is taking around 5 seconds to execute. I looked at the execution plan and it seemed the bottleneck was on an inner join.
I tried creating a few non-clustered indexes, as there was a 65% cost for an index seek (nonclustered).
Forgive me if I did not provide enough information, as I am not that accustomed to using indexes in SQL.
Here is the query that takes ~5 seconds to execute, as the tables contain a lot of data:
INSERT INTO TBL_1(TBL2.COLA, TBL4.COLA, TBL4.COLB, TBL4.COLC, TBL3.COLA)
SELECT TBL2.COLA, TBL4.COLA, TBL4.COLB, TBL4.COLC, TBL2.COLB
FROM TBL_2 TBL2 with(index(idx_tbl2IDX))
INNER JOIN TBL_3 TBL3 with(index(idx_tbl3IDX))
ON TBL2.COLB = TBL3.COLB
INNER JOIN TBL_4 TBL4 with(index(idx_tbl4IDX))
ON TBL3.COLA = TBL4.COLD
AND TBL4.COLA % 1000 = TBL3.COLC
AND TBL4.COLE = 0
WHERE TBL2.COLC = 1
And here are my indexes. (I originally just created one for TBL_4, since that is where the biggest cost in the execution plan was, but I ended up creating one for each table to see if it made any difference, which it didn't.)
CREATE NONCLUSTERED INDEX [idx_tbl4IDX]
ON [dbo].TBL_4(COLD, COLA, COLE)
INCLUDE (COLB, COLC);
CREATE NONCLUSTERED INDEX [idx_tbl3IDX]
ON [dbo].TBL_3 (COLB, COLA, COLC)
CREATE NONCLUSTERED INDEX [idx_tbl2IDX]
ON [dbo].TBL_2(COLB, COLC)
INCLUDE (COLA);
I realize this may be a bit confusing as I renamed all the columns and tables; if it makes no sense, please let me know and I will try to use better naming conventions.
Perhaps post the actual execution plan, but it's likely that this
AND TBL4.COLA % 1000 = TBL3.COLC
is causing the slowness. The order of the columns in the index also might play into this, depending on how big your dataset is. Try ordering them from Most to Least selective. For instance, if TBL4.COLE is a 1/0 value and there are very few 0's, then perhaps make that the first column in your index.
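For instance, a hypothetical reordering of the TBL_4 index along those lines (only worthwhile if COLE = 0 really does filter out most rows - check the actual plan first; COLA stays in INCLUDE because the % 1000 expression prevents seeking on it anyway):
CREATE NONCLUSTERED INDEX idx_tbl4IDX_alt
ON dbo.TBL_4 (COLE, COLD)
INCLUDE (COLA, COLB, COLC);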
Without knowing the number of rows, selectivity, etc., it is really hard to say anything. I would suggest:
Remove all those with(index(...)) hints (and never put them back)
Update statistics for all tables (e.g. UPDATE STATISTICS TBL_2 WITH FULLSCAN)
Add all possible indexes. There are 6 for tables TBL_3 and TBL_4 and two for TBL_2.
Run the query, see which indexes are used and what the time is.
If the time is OK, you can just delete the indexes you do not need. If it is not, you would probably need to do something about the % 1000. You can create a computed, persisted column and index that instead.
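A sketch of that last suggestion, using the column names from the question (the computed column and index names are made up):
ALTER TABLE dbo.TBL_4
ADD COLA_MOD AS (COLA % 1000) PERSISTED;

CREATE NONCLUSTERED INDEX idx_tbl4_colamod
ON dbo.TBL_4 (COLD, COLA_MOD, COLE)
INCLUDE (COLA, COLB, COLC);
The join predicate can then be written as AND TBL4.COLA_MOD = TBL3.COLC so that the index is usable for a seek.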
Is it more efficient and ultimately FASTER to delete rows from a DB in blocks of 1000 or 10000? I am having to remove approximately 3 million rows from many tables. I first did the deletes in blocks of 100K rows, but the performance wasn't looking good. I changed to 10000 and it seems to be removing faster. I'm wondering whether going even smaller, like 1K per DELETE statement, is better still.
Thoughts?
I am deleting like this:
DELETE TOP(10000)
FROM TABLE
WHERE Date < '1/1/2012'
Yes, it is. It all depends on your server though. I mean, the last time I did that I was using this approach to delete things in 64 million increments (on a table that at that point had around 14 billion rows, 80% of which ultimately got deleted). I got a delete through every 10 seconds or so.
It really depends on your hardware. Going more granular is more work, but it means less waiting on the transaction log for other things operating on the table. You have to try it out and find where you are comfortable - there is no ultimate answer, because it is totally dependent on the usage of the table and the hardware.
We used table partitioning to remove 5 million rows in less than a second, but this was from just one table. It took some work up-front but ultimately was the best way. This may not be the best way for you.
From our document about partitioning:
Let’s say you want to add 5 million rows to a table but don’t want to lock the table up while you do it. I ran into a case in an ordering system where I couldn’t insert the rows without stopping the system from taking orders. BAD! Partitioning is one way of doing it if you are adding rows that don’t overlap current data.
WHAT TO WATCH OUT FOR:
Data CANNOT overlap current data. You have to partition the data on a value. The new data cannot be intertwined within the currently partitioned data. If removing data, you have to remove an entire partition or partitions. You will not have a WHERE clause.
If you are doing this on a production database and want to limit the locking on the table, create your indexes with “ONLINE = ON”.
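For example (the table and index names are placeholders; ONLINE = ON is an Enterprise edition feature on older versions):
CREATE NONCLUSTERED INDEX IX_Example
ON dbo.SomeTable (SomeColumn)
WITH (ONLINE = ON);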
OVERVIEW OF STEPS:
FOR ADDING RECORDS
Partition the table you want to add records to (leave a blank partition for the new data). Do not forget to partition all of your indexes.
Create new table with the exact same structure (keys, data types, etc.).
Add a constraint to the new table to limit that data so that it would fit into the blank partition in the old table.
Insert new rows into new table.
Add indexes to match old table.
Swap the new table with the blank partition of the old table.
Un-partition the old table if you wish.
FOR DELETING RECORDS
Partition the table into sets so that the data you want to delete is all on partitions by itself (this could be many different partitions).
Create a new table with the same partitions.
Swap the partitions with the data you want to delete to the new table.
Un-partition the old table if you wish.
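A minimal sketch of the switch-out step for the delete case, assuming hypothetical table names and a staging table partitioned identically to the source (SWITCH is a metadata-only operation, which is why it is so fast):
-- move partition 2 (the rows to delete) out of the big table
ALTER TABLE dbo.BigTable
SWITCH PARTITION 2 TO dbo.BigTable_Staging PARTITION 2;

-- the "delete" is then just dropping or truncating the staging table
DROP TABLE dbo.BigTable_Staging;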
Yes and no; it depends on the usage of the table, due to locking. I would try to delete the records at a slower pace. So, the opposite of the OP's question.
set rowcount 10000
while 1 = 1
begin
    waitfor delay '0:0:1'
    delete
    from table
    where date < convert(datetime, '20120101', 112)
    if @@rowcount = 0 break
end
set rowcount 0
I'm looking for ways to reduce memory consumption by SQLite3 in my application.
At each execution it creates a table with the following schema:
(main TEXT NOT NULL PRIMARY KEY UNIQUE, count INTEGER DEFAULT 0)
After that, the database is filled with 50k operations per second. Write only.
When an item already exists, it updates "count" using an update query (I think this is called UPSERT). These are my queries:
INSERT OR IGNORE INTO table (main) VALUES (@SEQ);
UPDATE table SET count=count+1 WHERE main = @SEQ;
This way, with 5 million operations per transaction, I can write really fast to the DB.
I don't really care about disk space for this problem, but I have a very limited RAM space. Thus, I can't waste too much memory.
sqlite3_user_memory() informs that its memory consumption grows to almost 3GB during the execution. If I limit it to 2GB through sqlite3_soft_heap_limit64(), database operations' performance drops to almost zero when reaching 2GB.
I had to raise cache size to 1M (page size is default) to reach a desirable performance.
What can I do to reduce memory consumption?
It seems that the high memory consumption may be caused by the fact that too many operations are concentrated in one big transaction. Committing smaller transactions, e.g. one per 1M operations, may help. 5M operations per transaction consumes too much memory.
However, we'd need to balance operation speed against memory usage.
If smaller transactions are not an option, PRAGMA shrink_memory may be a choice.
Use sqlite3_status() with SQLITE_STATUS_MEMORY_USED to trace the dynamic memory allocation and locate the bottleneck.
I would:
prepare the statements (if you're not doing it already)
lower the amount of INSERTs per transaction (10 sec = 500,000 sounds appropriate)
use PRAGMA locking_mode = EXCLUSIVE; if you can
Also, (I'm not sure if you know) the PRAGMA cache_size is in pages, not in MBs. Make sure you define your target memory as PRAGMA cache_size * PRAGMA page_size, or in SQLite >= 3.7.10 you can also do PRAGMA cache_size = -kibibytes;. Setting it to 1 M(illion) pages would result in 1 or 2 GB.
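To make that concrete (the 64 MB target is just an example figure):
PRAGMA page_size;            -- check the page size first
PRAGMA cache_size = 65536;   -- 65536 pages * 1024 bytes per page = 64 MB with a 1 KB page size
PRAGMA cache_size = -65536;  -- 65536 KiB = 64 MB, independent of page size (SQLite >= 3.7.10)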
I'm curious how cache_size helps in INSERTs though...
You can also try and benchmark if the PRAGMA temp_store = FILE; makes a difference.
And of course, whenever your database is not being written to:
PRAGMA shrink_memory;
VACUUM;
Depending on what you're doing with the database, these might also help:
PRAGMA auto_vacuum = 1|2;
PRAGMA secure_delete = ON;
I ran some tests with the following pragmas:
busy_timeout=0;
cache_size=8192;
encoding="UTF-8";
foreign_keys=ON;
journal_mode=WAL;
legacy_file_format=OFF;
synchronous=NORMAL;
temp_store=MEMORY;
Test #1:
INSERT OR IGNORE INTO test (time) VALUES (?);
UPDATE test SET count = count + 1 WHERE time = ?;
Peaked ~109k updates per second.
Test #2:
REPLACE INTO test (time, count) VALUES
(?, coalesce((SELECT count FROM test WHERE time = ? LIMIT 1) + 1, 1));
Peaked at ~120k updates per second.
I also tried PRAGMA temp_store = FILE; and the updates dropped by ~1-2k per second.
For 7M updates in a transaction, the journal_mode=WAL is slower than all the others.
I populated a database with 35,839,987 records and now my setup is taking nearly 4 seconds per each batch of 65521 updates - however, it doesn't even reach 16 MB of memory consumption.
Ok, here's another one:
Indexes on INTEGER PRIMARY KEY columns (don't do it)
When you create a column with INTEGER PRIMARY KEY, SQLite uses this column as the key for (index to) the table structure. This is a hidden index (as it isn't displayed in the SQLite_Master table) on this column. Adding another index on the column is not needed and will never be used. In addition it will slow INSERT, DELETE and UPDATE operations down.
You seem to be defining your PK as NOT NULL + UNIQUE. PK is UNIQUE implicitly.
Assuming that all the operations in one transaction are distributed all over the table so that all pages of the table need to be accessed, the size of the working set is:
about 1 GB for the table's data, plus
about 1 GB for the index on the main column, plus
about 1 GB for the original data of all the table's pages changed in the transaction (probably all of them).
You could try to reduce the amount of data that gets changed for each operation by moving the count column into a separate table:
CREATE TABLE main_lookup(main TEXT NOT NULL UNIQUE, rowid INTEGER PRIMARY KEY);
CREATE TABLE counters(rowid INTEGER PRIMARY KEY, count INTEGER DEFAULT 0);
Then, for each operation:
SELECT rowid FROM main_lookup WHERE main = @SEQ;
if not exists:
    INSERT INTO main_lookup(main) VALUES(@SEQ);
    -- read the inserted rowid
    INSERT INTO counters VALUES(@rowid, 0);
UPDATE counters SET count=count+1 WHERE rowid = @rowid;
In C, the inserted rowid is read with sqlite3_last_insert_rowid.
Doing a separate SELECT and INSERT is not any slower than INSERT OR IGNORE; SQLite does the same work in either case.
This optimization is useful only if most operations update a counter that already exists.
In the spirit of brainstorming I will venture an answer. I have not done any testing like this fellow:
Improve INSERT-per-second performance of SQLite?
My hypothesis is that the index on the text primary key might be more RAM-intensive than a couple of indexes on two integer columns (what you'd need to simulate a hashed-table).
EDIT: Actually, you don't even need a primary key for this:
create table foo( slot integer, myval text, occurrences int);
create index ix_foo on foo(slot); -- not a unique index
An integer primary key (or a non-unique index on slot) would leave you with no quick way to determine if your text value were already on file. So to address that requirement, you might try implementing something I suggested to another poster, simulating a hashed-key:
SQLite Optimization for Millions of Entries?
A hash-key-function would allow you to determine where the text-value would be stored if it did exist.
http://www.cs.princeton.edu/courses/archive/fall08/cos521/hash.pdf
http://www.fearme.com/misc/alg/node28.html
http://cs.mwsu.edu/~griffin/courses/2133/downloads/Spring11/p677-pearson.pdf
I'm learning more about table variables. I've read that temp tables are always on disk while table variables are in memory, that is to say, a table variable performs better than a temp table because it uses fewer IO operations.
But sometimes, if there are too many records in a table variable to be held in memory, the table variable will be put on disk like a temp table.
But I don't know what "too many records" means. 100,000 records? Or 1,000,000 records? How can I know whether a table variable I'm using is in memory or on disk? Is there any function or tool in SQL Server 2005 to measure the size of a table variable, or to tell me when a table variable is moved from memory to disk?
Your question shows you have succumbed to some of the common misconceptions surrounding table variables and temporary tables.
I have written quite an extensive answer on the DBA site looking at the differences between the two object types. This also addresses your question about disk vs memory (I didn't see any significant difference in behaviour between the two).
Regarding the question in the title though as to when to use a table variable vs a local temporary table, you don't always have a choice. In functions, for example, it is only possible to use a table variable, and if you need to write to the table in a child scope then only a #temp table will do (table-valued parameters allow readonly access).
Where you do have a choice some suggestions are below (though the most reliable method is to simply test both with your specific workload).
If you need an index that cannot be created on a table variable then you will of course need a #temporary table. The details of this are version dependent however. For SQL Server 2012 and below the only indexes that could be created on table variables were those implicitly created through a UNIQUE or PRIMARY KEY constraint. SQL Server 2014 introduced inline index syntax for a subset of the options available in CREATE INDEX (there is a short sketch of this after these suggestions). This has been extended since to allow filtered index conditions. Indexes with INCLUDE-d columns or columnstore indexes are still not possible to create on table variables however.
If you will be repeatedly adding and deleting large numbers of rows from the table then use a #temporary table. That supports TRUNCATE (which is more efficient than DELETE for large tables) and additionally subsequent inserts following a TRUNCATE can have better performance than those following a DELETE as illustrated here.
If you will be deleting or updating a large number of rows then the temp table may well perform much better than a table variable - if it is able to use rowset sharing (see "Effects of rowset sharing" below for an example).
If the optimal plan using the table will vary dependent on data then use a #temporary table. That supports creation of statistics which allows the plan to be dynamically recompiled according to the data (though for cached temporary tables in stored procedures the recompilation behaviour needs to be understood separately).
If the optimal plan for the query using the table is unlikely to ever change then you may consider a table variable to skip the overhead of statistics creation and recompiles (would possibly require hints to fix the plan you want).
If the source for the data inserted to the table is from a potentially expensive SELECT statement then consider that using a table variable will block the possibility of this using a parallel plan.
If you need the data in the table to survive a rollback of an outer user transaction then use a table variable (see the second sketch after these suggestions). A possible use case for this might be logging the progress of different steps in a long SQL batch.
When using a #temp table within a user transaction locks can be held longer than for table variables (potentially until the end of transaction vs end of statement dependent on the type of lock and isolation level) and also it can prevent truncation of the tempdb transaction log until the user transaction ends. So this might favour the use of table variables.
Within stored routines, both table variables and temporary tables can be cached. The metadata maintenance for cached table variables is less than that for #temporary tables. Bob Ward points out in his tempdb presentation that this can cause additional contention on system tables under conditions of high concurrency. Additionally, when dealing with small quantities of data this can make a measurable difference to performance.
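For the inline-index point above, here is what a table variable with an inline nonclustered index looks like (SQL Server 2014+ syntax; the table and index names are made up):
DECLARE @Orders TABLE
(
    OrderId INT PRIMARY KEY,
    Status TINYINT,
    INDEX IX_Status NONCLUSTERED (Status)
);
And for the rollback point, a small sketch of the behaviour, using a hypothetical logging table (an insert into a #temp table would be undone by the rollback, but the table variable keeps its row):
DECLARE @Log TABLE (Step VARCHAR(100), LoggedAt DATETIME2 DEFAULT SYSUTCDATETIME());
BEGIN TRAN;
INSERT INTO @Log (Step) VALUES ('step 1 done');
ROLLBACK;
SELECT * FROM @Log; -- the row survives the rollback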
Effects of rowset sharing
DECLARE @T TABLE(id INT PRIMARY KEY, Flag BIT);
CREATE TABLE #T (id INT PRIMARY KEY, Flag BIT);
INSERT INTO @T
OUTPUT inserted.* INTO #T
SELECT TOP 1000000 ROW_NUMBER() OVER (ORDER BY @@SPID), 0
FROM master..spt_values v1, master..spt_values v2
SET STATISTICS TIME ON
/*Table variable @T: CPU time = 7016 ms, elapsed time = 7860 ms.*/
UPDATE @T SET Flag=1;
/*CPU time = 6234 ms, elapsed time = 7236 ms.*/
DELETE FROM @T
/*Temp table #T: CPU time = 828 ms, elapsed time = 1120 ms.*/
UPDATE #T SET Flag=1;
/*CPU time = 672 ms, elapsed time = 980 ms.*/
DELETE FROM #T
DROP TABLE #T
Use a table variable for a very small quantity of data (thousands of bytes)
Use a temporary table for a lot of data
Another way to think about it: if you think you might benefit from an index, automated statistics, or any SQL optimizer goodness, then your data set is probably too large for a table variable.
In my example, I just wanted to put about 20 rows into a format and modify them as a group, before using them to UPDATE / INSERT a permanent table. So a table variable is perfect.
But I am also running SQL to back-fill thousands of rows at a time, and I can definitely say that the temporary tables perform much better than table variables.
This is not unlike how CTE's are a concern for a similar size reason - if the data in the CTE is very small, I find a CTE performs as good as or better than what the optimizer comes up with, but if it is quite large then it hurts you bad.
My understanding is mostly based on http://www.developerfusion.com/article/84397/table-variables-v-temporary-tables-in-sql-server/, which has a lot more detail.
Microsoft says here
Table variables do not have distribution statistics, and they will not trigger recompiles. Therefore, in many cases, the optimizer will build a query plan on the assumption that the table variable has no rows. For this reason, you should be cautious about using a table variable if you expect a larger number of rows (greater than 100). Temp tables may be a better solution in this case.
I totally agree with Abacus (sorry - don't have enough points to comment).
Also, keep in mind it doesn't necessarily come down to how many records you have, but the size of your records.
For instance, have you considered the performance difference between 1,000 records with 50 columns each vs 100,000 records with only 5 columns each?
Lastly, maybe you're querying/storing more data than you need? Here's a good read on SQL optimization strategies. Limit the amount of data you're pulling, especially if you're not using it all (some SQL programmers do get lazy and just select everything even though they only use a tiny subset). Don't forget the SQL query analyzer may also become your best friend.
A table variable is available only to the current batch and scope; for example, if you need to EXEC another stored procedure from within the current one, you will have to pass the table as a table-valued parameter, and of course this will affect performance. With temporary tables you can do this by just referencing the temporary table name.
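A sketch of that table-valued-parameter approach (the type and procedure names here are hypothetical):
CREATE TYPE dbo.IdList AS TABLE (Id INT PRIMARY KEY);
GO
CREATE PROCEDURE dbo.ProcessIds @Ids dbo.IdList READONLY
AS
SELECT Id FROM @Ids;
GO
DECLARE @MyIds dbo.IdList;
INSERT INTO @MyIds VALUES (1), (2), (3);
EXEC dbo.ProcessIds @Ids = @MyIds;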
To test a Temporary table:
Open management studio query editor
Create a temporary table
Open another query editor window
Select from this table "Available"
To test a Variable table:
Open management studio query editor
Create a Variable table
Open another query editor window
Select from this table "Not Available"
Something else I have experienced: if your schema doesn't have the GRANT privilege to create tables, then use table variables.
After writing data to table variables declared with DECLARE @tb and joining them with other tables, I realized that the response time was much higher compared to temporary tables (tempdb..#tb).
When I join with @tb it takes much longer to return the result; with #tb the return is almost instantaneous.
I did tests joining 10,000 rows with 5 other tables.