I'm trying to understand how to properly use nonclustered indexes. Here's what I found with some test data.
CREATE TABLE TestTable
(
RowID int Not Null IDENTITY (1,1),
Continent nvarchar(100),
Location nvarchar(100),
CONSTRAINT PK_TestTable_RowID
PRIMARY KEY CLUSTERED (RowID)
)
ALTER TABLE TestTable
DROP CONSTRAINT PK_TestTable_RowID
GO
INSERT INTO TestTable
SELECT Continent, Location
FROM StgCovid19
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
SELECT *
FROM TestTable
WHERE Continent = 'Asia' --551ms
CREATE NONCLUSTERED INDEX NCIContinent
ON TestTable(Continent)
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
SELECT *
FROM TestTable
WHERE Continent = 'Asia' --1083ms
DROP INDEX NCIContinent
ON TestTable
CREATE NONCLUSTERED INDEX NCIContinent
ON TestTable(Continent)
INCLUDE (Location)
DBCC DROPCLEANBUFFERS
DBCC FREEPROCCACHE
SELECT *
FROM TestTable
WHERE Continent = 'Asia' ---530ms
As you can see, if I only add the nonclustered index on the Continent column, the plan performs a seek but the SELECT takes roughly double the time to execute.
When I add INCLUDE (Location), it takes less time than with no index at all.
Can you tell me what is going on?
The strategy for accessing data depends on the table structure, but also, and mainly, on the data distribution. That is why statistics about data distribution are stored for indexes and tables:
In indexes, to know the distribution (histogram) of the key values
In tables, to know the distribution (histogram) of the column values
An execution plan is computed as a tree whose branches contain chained steps, each an algorithm specialized in one action (join, sort, data access...), forming the program that will retrieve the data in response to your demand (the query).
The optimizer's role is to determine which of the many possible execution plans is the most interesting, i.e. the one using the fewest resources (memory, data volume, CPU...). The plan chosen is not systematically the fastest one, but the one estimated to have the lowest cost in terms of resource usage... This estimate is made by the optimizer on the basis of statistics...
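As a concrete illustration (using the index name and table from the question; this is just one way to look at what the optimizer sees), you can inspect the histogram stored for NCIContinent:
-- Display the histogram of Continent values that the optimizer uses for costing
DBCC SHOW_STATISTICS ('TestTable', 'NCIContinent') WITH HISTOGRAM;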
The test you made does not really prove anything, because we do not know the data distribution, and the use of DBCC DROPCLEANBUFFERS has a heavy side effect that is not representative of real-world database workloads. In the real world, about 98 % of the data used by applications is already in cache...!
Also, measuring the execution time of a query has two problems:
- the metric is not stable and depends on the PC's activity, which can be significant even when you are doing nothing. Usually we rerun the test at least 10 times, eliminate the slowest and the fastest runs, and compute the average over the remaining 8 results
- elapsed time is not the only interesting figure, and it can be dominated by the time needed to send the resulting data to the client application. To eliminate that time, SSMS has an option to execute the query without showing the resulting dataset (see the example below)
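For example, one way to get more stable, server-side measurements is to use the standard SET STATISTICS session options (reusing the table from the question), together with the SSMS "Discard results after execution" option under Query Options > Results > Grid:
-- Report server-side CPU time, elapsed time and logical reads for the query
SET STATISTICS TIME ON;
SET STATISTICS IO ON;

SELECT *
FROM TestTable
WHERE Continent = 'Asia';

SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;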
Suppose a table in SQL Server with this structure:
TABLE t (Id INT PRIMARY KEY)
Then I have a stored procedure, which is constantly being called, that inserts data into this table among other things:
BEGIN TRAN
DECLARE @Id INT = (SELECT MAX(Id) + 1 FROM t)
INSERT t VALUES (@Id)
...
-- Stuff that gets a long time to get completed
...
COMMIT
The problem with this approach is that sometimes I get a primary key violation because 2 or more procedure calls read and try to insert the same Id into the table.
I have been able to solve this problem by adding a TABLOCK hint to the SELECT statement:
DECLARE @Id INT = (SELECT MAX(Id) + 1 FROM t WITH (TABLOCK))
The problem now is that successive calls to the procedure must wait for the currently executing transaction to complete before starting their work, allowing just one procedure to run at a time.
Is there any advice or trick to hold the lock only during the execution of the SELECT and INSERT statements?
Thanks.
TABLOCK is a terrible idea, since you're serialising all the calls (no concurrency).
Note that with an SP you will retain all the locks granted over the run until the SP completes.
So you want to minimise locks except for where you really need them.
Unless you have a special case, use an internally generated id:
CREATE TABLE t (Id INT IDENTITY PRIMARY KEY)
Improved performance, concurrency etc. since you are not dependent on external tables to manage the id.
If you have existing data you can (re)set the start value using DBCC
DBCC CHECKIDENT ('t', RESEED, 100)
If you need to inject rows with a value preassigned, use:
SET IDENTITY_INSERT t ON
(and off again afterwards, resetting the seed as required).
[Consider whether you want this value to be the primary key, or simply unique.
In many cases where you need to reference a table's PK as a FK you'll want it as the PK for simplicity of joins, but having a business-readable value (e.g. Accounting Code, or OrderNo+OrderLine) is completely valid: that's just modelling.]
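A minimal sketch of the procedure rewritten against that IDENTITY table (the procedure name is hypothetical; SCOPE_IDENTITY() is one way to read back the generated value if later steps need it):
CREATE PROCEDURE dbo.usp_InsertRow   -- hypothetical name
AS
BEGIN
    BEGIN TRAN;

    -- No MAX(Id) + 1 and no TABLOCK: the IDENTITY generator is safe under concurrency
    INSERT t DEFAULT VALUES;

    -- The value just generated, in case the rest of the procedure needs it
    DECLARE @Id INT = CAST(SCOPE_IDENTITY() AS INT);

    -- ... stuff that takes a long time to complete ...

    COMMIT;
END;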
I have a big test table in my local database (16 CPUs, 124 GB RAM) which has a nonclustered columnstore index. Every day I insert 10 million rows into this table, and I found that my system becomes very slow and the load never seems to finish.
I see 2 queries which run in parallel without ending and make the system extremely slow:
Query 1:
INSERT INTO TABLE ABC
Query 2:
ALTER NONCLUSTERED COLUMNSTORE INDEX ON TABLE ABC.
My questions:
Inserting into a table with a nonclustered columnstore index is very slow because it inserts new records and updates the index at the same time. Is that correct?
Do I need to disable the index before the INSERT and enable it again after the INSERT to improve performance?
I use SQL Server 2016, and this version allows us to INSERT and UPDATE a table with a nonclustered columnstore index.
Thank you
Those can't be running in parallel since according to the documentation:
Once you create a nonclustered columnstore index on a table, you cannot directly modify the data in that table. A query with INSERT, UPDATE, DELETE, or MERGE will fail and return an error message. To add or modify the data in the table, you can do one of the following:
Disable the columnstore index. You can then update the data in the table. If you disable the columnstore index, you can rebuild the columnstore index when you finish updating the data. For example:
Drop the columnstore index, update the table, and then re-create the columnstore index with CREATE COLUMNSTORE INDEX. For example:
-- Example 1: disable, update, then rebuild
ALTER INDEX mycolumnstoreindex ON mytable DISABLE;
-- update mytable --
ALTER INDEX mycolumnstoreindex ON mytable REBUILD;
-- Example 2: drop, update, then re-create
DROP INDEX mycolumnstoreindex ON mytable;
-- update mytable --
CREATE NONCLUSTERED COLUMNSTORE INDEX mycolumnstoreindex ON mytable;
So yes, you need to either DISABLE the index before the INSERT and REBUILD it after the INSERT, or DROP the index before the INSERT and CREATE it again after the INSERT. I'm guessing the "runs slow and never finishes" part is a blocking issue separate from this index.
If the question were more generic, for a regular NONCLUSTERED INDEX it could be beneficial to drop and re-create the index when you are inserting a large number of records, like the 10 million in your case (a sketch follows below), since:
- indexes slow down inserts, updates and deletes (page splits, inserting into / updating multiple indexes, etc.)
- inserting that many records will likely cause a lot of fragmentation
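A minimal sketch of that drop / bulk insert / re-create pattern, assuming a hypothetical regular nonclustered index IX_ABC_SomeColumn on a SomeColumn column of table ABC and a staging source table (all of these names are placeholders, not from the question):
-- Drop the regular nonclustered index before the large insert
DROP INDEX IX_ABC_SomeColumn ON dbo.ABC;

-- Bulk insert the 10 million rows (source table is a placeholder)
INSERT INTO dbo.ABC (SomeColumn /* , other columns */)
SELECT SomeColumn /* , other columns */
FROM dbo.StagingABC;

-- Re-create the index once, instead of maintaining it row by row
CREATE NONCLUSTERED INDEX IX_ABC_SomeColumn ON dbo.ABC (SomeColumn);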
I have a database which is 120GB in size. One of the tables that uses up a large amount of space has thousands of records being created on a daily basis. One of those columns is an nvarchar(max). This column can commonly have 2000 characters of data that only needs to be there for a week.
If I update that column to be blank after a week for those records, it does not seem to reduce the database size. E.g.:
UPDATE tblSample
SET largefieldname = ''
WHERE DateAdded < DATEADD(D, -7, GETDATE())
So if I insert 100 MB worth of data into that column and then blank the column after a week using the above statement, the database does not get any smaller; the 100 MB remains allocated.
How can I get the database to reduce in size after such a task? I don't want to delete the entire row, just remove the unnecessary disk usage by that one specific column.
You need to consider a redesign for the temporary data.
Put the temporary data in its own table and put a key to it in the main table.
When you are done with the data, delete the key value and truncate the table. This will not release the unused space but will make maintenance of it easier. The only way to reclaim usable disk space is to shrink the database or use partitions.
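A rough sketch of that layout, with hedged, hypothetical names (tblSample, largefieldname and DateAdded come from the question; tblSampleLargeText and the key column are assumptions):
-- Side table holding only the short-lived large values
CREATE TABLE tblSampleLargeText
(
    LargeTextID int IDENTITY(1,1) PRIMARY KEY,
    largefieldname nvarchar(max) NOT NULL
);

-- The main table keeps a small nullable key instead of the nvarchar(max) column
ALTER TABLE tblSample ADD LargeTextID int NULL;

-- After a week: clear the key values, then empty the side table
-- (as the answer suggests; in practice only truncate once all remaining rows are disposable)
UPDATE tblSample
SET LargeTextID = NULL
WHERE DateAdded < DATEADD(D, -7, GETDATE());

TRUNCATE TABLE tblSampleLargeText;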
One way to release the space is to copy the table into a temp table, truncate the original table, and re-insert the rows from the temp table.
Try this code:
create table dbo.x5 (name varchar(1000), i1 int IDENTITY(1,1))
set nocount on
-- select the next two rows and run
insert into dbo.x5 values (replicate('abcd',250))
go 10000
exec sp_spaceused 'dbo.x5' --11488 KB
update dbo.x5 set name=''
select * into #a1 from dbo.x5
truncate table dbo.x5
insert into dbo.x5 (name) select name from #a1
exec sp_spaceused 'dbo.x5' -- 136k
On a large table with several indexes it may pay to drop the indexes, reinsert the rows, and recreate the indexes. Your results will vary so run the code on your install.
If you have a clustered index on the table, doing a rebuild of that index will also free up the space. I don't know if doing a rebuild on a heap (a table without a clustered index) will have the same effect.
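For reference, a minimal sketch of the rebuild approach mentioned above (tblSample is the table from the question; PK_tblSample is a placeholder for the actual clustered index name):
-- Rebuild the clustered index to compact pages and release unused space within the data file
ALTER INDEX PK_tblSample ON dbo.tblSample REBUILD;

-- Check the effect
EXEC sp_spaceused 'dbo.tblSample';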
You can apply a shrink process to the database in this way:
USE [YourDatabase]
GO
DBCC SHRINKFILE (N'YourDatabase')
GO
This is the command provided by SQL Server to return that space to the OS. Keep in mind, however, that this is not recommended, as the process creates fragmentation in your indexes and also has a CPU cost while it runs. Please take a look at the following reading: http://www.sqlskills.com/blogs/paul/why-you-should-not-shrink-your-data-files/
Is there any possibility to disable auto-creation of statistics on a specific table in a database, without disabling auto-created statistics for the entire database?
I have a procedure which is written as follows:
create proc
as
create table #someTempTable(many columns, more than 100)
insert into #someTempTable -- always one or two rows
exec proc1
exec proc2
etc.
proc1, proc2, etc. contain many selects and updates like this:
select ..
from #someTempTable t
join someOrdinaryTable t2 on ...
update #someTempTable set col1 = somevalue
Profiler shows that before each select, the server starts collecting statistics on #someTempTable, and this takes more than a quarter of the entire execution time of the proc. The proc is used in OLTP processing and should run very fast. I want to change this temporary table to a table variable (because the server doesn't collect statistics for table variables), but I can't, because that would mean rewriting all of these procedures to pass variables between them, and all of this legacy code would have to be retested. I'm searching for an alternative way to make the server treat the temporary table like a table variable with regard to collecting statistics.
P.S. I know that statistics are a useful thing, but in this case they are useless because the table always contains a small number of records.
I assume you know what you are doing; disabling statistics is generally a bad idea. Anyhow:
EXEC sp_autostats 'table_name', 'OFF'
More documentation here: https://msdn.microsoft.com/en-us/library/ms188775.aspx.
Edit: OP clarified that he wants to disable statistics for a temp table. Try this:
CREATE TABLE #someTempTable
(
ID int PRIMARY KEY WITH (STATISTICS_NORECOMPUTE = ON),
...other columns...
)
If you don't have a primary key already, use an identity column for a PK.
Complete newbie to Oracle DBA-ing, and yet trying to migrate a SQL Server DB (2008 R2) to Oracle (11g - total DB size only ~20 GB)...
I'm having a major problem with my largest single table (~30 million rows). Rough structure of the table is:
CREATE TABLE TableW (
WID NUMBER(10,0) NOT NULL,
PID NUMBER(10,0) NOT NULL,
CID NUMBER(10,0) NOT NULL,
ColUnInteresting1 NUMBER(3,0) NOT NULL,
ColUnInteresting2 NUMBER(3,0) NOT NULL,
ColUnInteresting3 FLOAT NOT NULL,
ColUnInteresting4 FLOAT NOT NULL,
ColUnInteresting5 VARCHAR2(1024 CHAR),
ColUnInteresting6 NUMBER(3,0) NOT NULL,
ColUnInteresting7 NUMBER(5,0) NOT NULL,
CreatedDate DATE NOT NULL,
ModifiedDate DATE NOT NULL,
CreatedByUser VARCHAR2(20 CHAR),
ModifiedByUser VARCHAR2(20 CHAR)
);
ALTER TABLE TableW ADD CONSTRAINT WPrimaryKey PRIMARY KEY (WID)
ENABLE;
CREATE INDEX WClusterIndex ON TableW (PID);
CREATE INDEX WCIDIndex ON TableW (CID);
ALTER TABLE TableW ADD CONSTRAINT FKTableC FOREIGN KEY (CID)
REFERENCES TableC (CID) ON DELETE CASCADE
ENABLE;
ALTER TABLE TableW ADD CONSTRAINT FKTableP FOREIGN KEY (PID)
REFERENCES TableP (PID) ON DELETE CASCADE
ENABLE;
Running through some basic tests, it seems a simple 'DELETE FROM TableW WHERE PID=13455' takes a huge amount of time (~880 s) to execute what should be a quick delete (~350 rows). [Query run via SQL Developer.]
Generally, the performance of this table is noticeably worse than its SQL equivalent. There are no issues under SQL Server, and the structure of this table and the surrounding ones look sensible for Oracle by comparison to SQL.
My problem is that I cannot find a useful set of diagnostics to start looking for where the problem lies. Any queries / links greatly appreciated.
[The above is a request for help based on the assumption it should not take anything like 10 minutes to delete 350 rows from a table with 30 million records, when it takes SQL Server <1s to do the same for an equivalent DB structure]
EDIT:
The migration is being performed thus:
1. In SQL Developer:
- Create Oracle user, tablespace, grants etc. AS Sys
- Create the tables, sequences, triggers etc. AS the new user
2. Via some Java:
- Check SQL-to-Oracle structure consistency
- Disable all foreign keys
- Move data (truncate destination table, SELECT from old, INSERT into new)
- Adjust sequences to the correct starting value
- Enable foreign keys
If you are asking how to improve the performance, there are several ways to do it:
Parallel DML
Partitioning
Parallel DML consumes all the resources you have to perform the operation: Oracle runs several threads to complete it, and other sessions have to wait for it to finish because system resources are busy.
Partitioning lets you exclude old sections right away. For example, suppose your table stores data from 2000 to 2014. Most likely you don't need the old records, so you can split the table into several partitions and drop or exclude the oldest ones.
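As an illustration only (the partition scheme and bounds are assumptions, not from the question), a range-partitioned version of the table by CreatedDate might look like:
CREATE TABLE TableW_Part (
  WID         NUMBER(10,0) NOT NULL,
  PID         NUMBER(10,0) NOT NULL,
  CID         NUMBER(10,0) NOT NULL,
  CreatedDate DATE         NOT NULL
  -- ... remaining columns as in TableW ...
)
PARTITION BY RANGE (CreatedDate) (
  PARTITION p_old    VALUES LESS THAN (DATE '2013-01-01'),
  PARTITION p_2013   VALUES LESS THAN (DATE '2014-01-01'),
  PARTITION p_recent VALUES LESS THAN (MAXVALUE)
);

-- Old data can then be removed almost instantly instead of via row-by-row deletes
ALTER TABLE TableW_Part DROP PARTITION p_old;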
Check the wait events for your session that's doing the DELETE. That will tell you what your main bottleneck is.
And echoing Marco's comment above - Make sure your table stats are up to date - that will help the optimizer build a good plan to run those queries for you.
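A couple of generic starting-point queries for that (the session id and the schema/table names are placeholders to substitute):
-- What is the DELETE session currently waiting on?
SELECT sid, event, wait_class, seconds_in_wait
FROM   v$session
WHERE  sid = :my_sid;   -- the SID of the session running the DELETE

-- Refresh optimizer statistics on the table (schema name is a placeholder)
EXEC DBMS_STATS.GATHER_TABLE_STATS('MYSCHEMA', 'TABLEW');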
To update all (and in case anyone else finds this):
The correct question to find a solution was: what tables do you have referencing this one?
The problem was another table (let's call it TableV) using WID as a foreign key, but the WID column in TableV was not indexed. This means that for every record deleted in TableW, the whole of TableV had to be searched for associated records to delete. As TableV has >3 million rows, deleting the small set of 350 rows in TableW meant the Oracle server had to read a total of >1 billion rows.
A single index added to WID in TableV, and the delete statement now takes <1s.
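For completeness, the fix amounts to something like the following (the index name is a placeholder; TableV and WID are as described above):
-- Index the foreign key column so the cascade delete no longer full-scans TableV
CREATE INDEX VWIDIndex ON TableV (WID);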
Thanks to all for the comments - a lot learnt about Oracle's inner workings!