SQL Server inserts and select taking long time - sql-server

We have a table with about 20 columns, as shown below.
We need to insert 1000 records, and a later select also returns about 1000 records.
We tried the inserts in two ways:
in parallel, via a Parallel.For C# loop
via SqlDataAdapter, inserting a whole DataSet filled with 1000 records.
Inserts in both cases take over 30 seconds. We even tried doing this on a fresh, clean table. How can this be sped up?
[Earlier, for a normal 10-column table, we did 2 million record inserts via Parallel.For in about 60 seconds.]
A select (tested from SQL Server Management Studio) returning 2000 records also takes more than 30 seconds, even on a clean table.
The time varies as follows:
Management Studio that had been running for many days: 17-30 seconds
closed and reopened - the 1st select returns in 1 sec.
- the 2nd and subsequent selects take about 7-10 seconds to retrieve all rows.
Does variable size vs. a fixed upper-limit size make much of a difference for VARCHAR(size) columns?
[The disk is a fast one (RAID? not sure) and dedicated to this database.]
Table schema: (No PK)
varchar(50)
varchar(2)
smallint
varchar(2048)
int
int
varchar(2048)
varchar(MAX)
varchar(MAX)
varchar(MAX)
smallint
varchar(500)
varchar(500)
varchar(MAX)
smallint
smallint
bigint
bigint
bigint
varchar(2048)
smallint
varchar(MAX)
varchar(MAX)
varchar(2048)
datetime
Index:
The index is on the varchar(50) column, non-unique, non-clustered.
SELECT statement:
select *
from table
where varchar(50) = 'value1'
and varchar(2) = 'value2'
and smallint = 'value3'
The data composition: each unique varchar(50) value has 5 unique varchar(2) values, and each varchar(2) in turn has 1-3 smallint values.

Have a look at the SqlBulkCopy class. I did a comparison a while back about high-performance loading of data from .NET to SQL Server, comparing SqlBulkCopy vs. SqlDataAdapter, with the bottom line being, to load 100,000 rows:
SqlDataAdapter: 25.0729s
SqlBulkCopy: 0.8229s
Blogged about it here
UPDATE:
In terms of SELECT performance, try an index on the 3 fields being queried on - that will allow an index seek to be performed. At present, with just an index on the VARCHAR(50), it will be doing a scan. Because you are doing a SELECT * to return ALL columns, it then has to go off and look up the rest of the data from the other columns, as they are not included in the index. This can be expensive, so consider NOT doing SELECT * and only returning the columns you actually need (if you don't actually need them all). Name the ones you really do need explicitly in the SELECT, and you can then INCLUDE them in the index you created on the 3 fields in the WHERE clause (see the MSDN reference on INCLUDE: http://msdn.microsoft.com/en-us/library/ms190806.aspx).
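As a rough illustration (the question only identifies columns by their types, so MyTable, Code1, Code2, Num1 and the INCLUDE columns below are placeholder names), the index and a covering query might look something like this:
CREATE NONCLUSTERED INDEX IX_MyTable_Code1_Code2_Num1
    ON dbo.MyTable (Code1, Code2, Num1)
    INCLUDE (NeededCol1, NeededCol2);   -- only the extra columns your SELECT really needs

SELECT Code1, Code2, Num1, NeededCol1, NeededCol2
FROM dbo.MyTable
WHERE Code1 = 'value1'
  AND Code2 = 'value2'
  AND Num1  = 3;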

To speed up queries:
don't make a VARCHAR(50) your primary (and thus clustering) key; use something narrower and fixed in size - an INT IDENTITY works best
why do you have VARCHAR(8000) in your table? That puts a lot of pressure on the table - why not just make those VARCHAR(MAX) as well?
analyse your queries and create the proper non-clustered indices on columns that can be indexed
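As a sketch of the first and third points (table, column and index names are placeholders, not taken from the question), the idea is a narrow INT IDENTITY clustering key plus a non-clustered index matching the WHERE clause:
CREATE TABLE dbo.MyTable (
    ID int IDENTITY(1,1) NOT NULL,        -- narrow, fixed-size clustering key
    Code1 varchar(50) NOT NULL,
    Code2 varchar(2) NOT NULL,
    Num1 smallint NOT NULL,
    Payload varchar(MAX) NULL,            -- wide, variable-length data stays out of the key
    CONSTRAINT PK_MyTable PRIMARY KEY CLUSTERED (ID)
);

CREATE NONCLUSTERED INDEX IX_MyTable_Lookup
    ON dbo.MyTable (Code1, Code2, Num1);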

Related

Slow insert performance on a table with a lot of empty nvarchar columns

I have a table with 150 columns. Most of them are of type nvarchar(100) null, and some are decimal(20,7) null
Example:
Create table myTable
(
ID bigint (PK),
Col1 Nvarchar(100) null,
Col2 Nvarchar(100) null,
Col3 Nvarchar(100) null,
....
Col150 nvarchar(100) null
)
When I do an insert, I insert into only 20 of the columns. When I try to insert 1 or 2 million records, it takes a long time (more than a minute, with 32 GB of RAM).
When I insert the same number of records into a temp table, it takes just 1-2 seconds.
I also tried removing the primary key, but the results are the same. How can I speed up inserts into a table with a lot of empty nvarchar columns?
Since you have a lot of empty columns, I'd suggest using XML as a column type if it's possible:
Create table myTable
(
ID bigint (PK),
Col1 XML
)
This will improve performance in the circumstances you describe.
Since XML is a broader topic, you can read more here.
Speaking of the design, having hundreds of nullable columns that will be NULL most of the time is obviously not best practice.
So, I would say:
If you can redesign the table, you may:
Use one XML column to store all of those nullable columns (as @seesharpguru mentioned)
Convert all of those columns into rows in a separate table
Use empty strings instead of NULL values, so you can make all the columns NOT NULL
If you don't have any choice, you may:
Use BULK INSERT
Just wait until it finishes and hope the users will be OK with the SLA ;-)
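For the BULK INSERT option, a minimal sketch might look like this (the file path, terminators and batch size are placeholder values to adjust to your environment):
BULK INSERT dbo.myTable
FROM 'C:\load\myTable_data.csv'           -- placeholder path to a pre-staged data file
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR = '\n',
    BATCHSIZE = 100000,
    TABLOCK                               -- allows minimal logging under the right recovery model
);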
I'm sure there are many options out there, but at least I've gone through all of these solutions before.
Sorry I cannot give you more than ideas.
It's likely you are experiencing massive page splits. This can be confirmed by monitoring (Profiler or the DMVs).
There are too many columns in the table.
This requires you either to:
redesign the table (normalize or split it - having 4 tables with 40 columns each is much faster than 1 table with 150 columns)
try to defragment the table
try a different fill factor
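To confirm the page splits/fragmentation and to experiment with the fill factor, something along these lines can be used (the table name is a placeholder):
-- How fragmented are the indexes on the table?
SELECT i.name, ips.avg_fragmentation_in_percent, ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.myTable'), NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.[object_id] = ips.[object_id] AND i.index_id = ips.index_id;

-- Rebuild with a lower fill factor to leave free space for future inserts:
ALTER INDEX ALL ON dbo.myTable REBUILD WITH (FILLFACTOR = 80);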

Varchar(100) takes more space than char(100) on variable length data

I am on a quest to understand the different char datatypes in SQL Server. I have a test column Address with 100,000 records of type char(100). The data distribution in the column is uniform, which means each character length between 1 and 100 is represented exactly 1000 times. I then ran the following script, expecting to see a decrease in storage space used:
exec sp_spaceused N'AddressTable' -- Data size: 59,920 KB
alter table dbo.AddressTable
alter column [Address] varchar(100)
alter table dbo.AddressTable REBUILD
exec sp_spaceused N'AddressTable' -- Data size: 61,848 KB
but as you can see, varchar(100) actually takes up more space than char(100). How is that possible, given that the data entries vary so much in size?
Thanks for your time.
What happens if you create a new table, and then insert your data into it, instead of using ALTER TABLE?
Alternatively, if your table has a clustered index (there's no DDL, so we don't know), try this:
ALTER INDEX ClusteredIndexName ON YourTable REBUILD
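For the first suggestion (copying into a freshly created table instead of using ALTER TABLE), a minimal sketch, assuming the table really only has the Address column (the new table name is just an example):
CREATE TABLE dbo.AddressTableNew (
    [Address] varchar(100) NULL
);

INSERT INTO dbo.AddressTableNew ([Address])
SELECT [Address] FROM dbo.AddressTable;

EXEC sp_spaceused N'AddressTableNew';   -- compare with the numbers above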

Can I have a primary key and a separate clustered index together?

Let's assume I already have a primary key, which ensures uniqueness. My primary key is also the ordering index for the records. However, I am curious about the primary key's role in the physical ordering of records on disk (if there is one). And the actual question is: can I have a separate clustered index for these records?
This is an attempt at testing the size and performance characteristics of a covering secondary index on a clustered table, as per discussion with @Catcall.
All tests were done on MS SQL Server 2008 R2 Express (inside a fairly underpowered VM).
Size
First, I created a clustered table with a secondary index and filled it with some test data:
CREATE TABLE THE_TABLE (
FIELD1 int,
FIELD2 int NOT NULL,
CONSTRAINT THE_TABLE_PK PRIMARY KEY (FIELD1)
);
CREATE INDEX THE_TABLE_IE1 ON THE_TABLE (FIELD2) INCLUDE (FIELD1);
DECLARE @COUNT int = 1;
WHILE @COUNT <= 1000000 BEGIN
    INSERT INTO THE_TABLE (FIELD1, FIELD2) VALUES (@COUNT, @COUNT);
    SET @COUNT = @COUNT + 1;
END;
EXEC sp_spaceused 'THE_TABLE';
The last line gave me the following result...
name rows reserved data index_size unused
THE_TABLE 1000000 27856 KB 16808 KB 11008 KB 40 KB
So, the index's B-Tree (11008 KB) is actually smaller than the table's B-Tree (16808 KB).
Speed
I generated a random number within the range of the data in the table, and then used it as criteria for selecting a whole row from the table. This was repeated 10000 times and the total time measured:
DECLARE @I int = 1;
DECLARE @F1 int;
DECLARE @F2 int;
DECLARE @END_TIME DATETIME2;
DECLARE @START_TIME DATETIME2 = SYSDATETIME();
WHILE @I <= 10000 BEGIN
    SELECT @F1 = FIELD1, @F2 = FIELD2
    FROM THE_TABLE
    WHERE FIELD1 = (SELECT CEILING(RAND() * 1000000));
    SET @I = @I + 1;
END;
SET @END_TIME = SYSDATETIME();
SELECT DATEDIFF(millisecond, @START_TIME, @END_TIME);
The last line produces an average time (of 10 measurements) of 181.3 ms.
When I change the query condition to: WHERE FIELD2 = ..., so the secondary index is used, the average time is 195.2 ms.
So the performance (of selecting on the PK versus on the covering secondary index) seems to be similar. For much larger amounts of data, I suspect the secondary index could possibly be slightly faster (since it seems more compact and therefore cache-friendly), but I didn't hit that yet in my testing.
String Measurements
Using varchar(50) as type for FIELD1 and FIELD2 and inserting strings that vary in length between 22 and 28 characters gave similar results.
The sizes were:
name rows reserved data index_size unused
THE_TABLE 1000000 208144 KB 112424 KB 95632 KB 88 KB
And the average timings were: 254.7 ms for searching on FIELD1 and 296.9 ms for FIELD2.
Conclusion
If a clustered table has a covering secondary index, that index will have space and time characteristics similar to the table itself (possibly slightly slower, but not by much). In effect, you'll have two B-Trees that sort their data differently but are otherwise very similar, achieving your goal of having a "second cluster".
It depends on your dbms. Not all of them implement clustered indexes. Those that do are liable to implement them in different ways. As far as I know, every platform that implements clustered indexes also provides ways to choose which columns are in the clustered index, although often the primary key is the default.
In SQL Server, you can create a nonclustered primary key and a separate clustered index like this.
create table test (
test_id integer primary key nonclustered,
another_column char(5) not null unique clustered
);
I think that the closest thing to this in Oracle is an index organized table. I could be wrong. It's not quite the same as creating a table with a clustered index in SQL Server.
You can't have multiple clustered indexes on a single table in SQL Server. A table's rows can only be stored in one order at a time. Actually, I suppose you could store rows in multiple, distinct orders, but you'd have to essentially duplicate all or part of the table for each order. (Although I didn't know it at the time I wrote this answer, DB2 UDB supports multiple clustered indexes, and it's quite an old feature. Its design and implementation is quite different from SQL Server.)
A primary key's job is to guarantee uniqueness. Although that job is often done by creating a unique index on the primary key column(s), strictly speaking uniqueness and indexing are two different things with two different aims. Uniqueness aims for data integrity; indexing aims for speed.
A primary key declaration isn't intended to give you any information about the order of rows on disk. In practice, it usually gives you some information about the order of index entries on disk. (Because primary keys are usually implemented using a unique index.)
If you SELECT rows from a table that has a clustered index, you still can't be assured that the rows will be returned to the user in the same order that they're stored on disk. Loosely speaking, the clustered index helps the query optimizer find rows faster, but it doesn't control the order in which those rows are returned to the user. The only way to guarantee the order in which rows are returned to the user is with an explicit ORDER BY clause. (This seems to be a fairly frequent point of confusion. A lot of people seem surprised when a bare SELECT on a clustered index doesn't return rows in the order they expect.)

Changing Datatype from int to bigint for tables containing billions of rows

I have a couple of tables with millions, and in some tables billions, of rows, with one column of type int that I am now changing to bigint. I tried changing the datatype using SSMS, and it failed after a couple of hours because the transaction log was full.
Another approach I took was to create a new column and start updating values from the old column to the new column in batches, by setting the ROWCOUNT property to 100000. It works, but it is very slow and it consumes the full server memory. With this approach it may take a couple of days to complete, which won't be acceptable in production.
What is the fastest/best way to change the datatype? The source column is not an identity column, and duplicates and NULLs are allowed. The table has indexes on other columns; will disabling the indexes speed up the process? Will adding BEGIN TRAN and COMMIT help?
I ran a test for the ALTER COLUMN that shows the actual time required to make the change. The results show that the ALTER COLUMN is not instantaneous, and the time required grows linearly.
RecordCt Elapsed Mcs
----------- -----------
10000 184019
100000 1814181
1000000 18410841
My recommendation would be to batch it as you suggested. Create a new column, and pre-populate the column over time using a combination of ROWCOUNT and WAITFOR.
Code your script so that the WAITFOR value is read from a table. That way you can modify the WAITFOR value on-the-fly as your production server starts to bog down. You can shorten the WAITFOR during off-peak hours. (You can even use DMVs to make your WAITFOR value automatic, but this is certainly more complex.)
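A rough sketch of that batching pattern, using UPDATE TOP in place of the deprecated SET ROWCOUNT (the table, column and control-table names are placeholders, not from the question):
DECLARE @Delay varchar(12);

WHILE 1 = 1
BEGIN
    -- Re-read the throttle value on every pass so it can be changed while the script runs
    SELECT @Delay = DelayValue FROM dbo.BatchControl;     -- e.g. '00:00:02'

    UPDATE TOP (100000) dbo.BigTable
    SET NewBigIntCol = OldIntCol
    WHERE NewBigIntCol IS NULL
      AND OldIntCol IS NOT NULL;        -- rows where the old value is NULL can stay NULL

    IF @@ROWCOUNT = 0 BREAK;            -- nothing left to copy

    WAITFOR DELAY @Delay;               -- back off so production stays responsive
END;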
This is a complex update that will require planning and a lot of babysitting.
Rob
Here is the ALTER COLUMN test code.
USE tempdb;
SET NOCOUNT ON;
GO
IF EXISTS (SELECT * FROM sys.tables WHERE [object_id] = OBJECT_ID('dbo.TestTable'))
DROP TABLE dbo.TestTable;
GO
CREATE TABLE dbo.TestTable (
ColID int IDENTITY,
ColTest int NULL,
ColGuid uniqueidentifier DEFAULT NEWSEQUENTIALID()
);
GO
INSERT INTO dbo.TestTable DEFAULT VALUES;
GO 10000
UPDATE dbo.TestTable SET ColTest = ColID;
GO
DECLARE @t1 time(7) = SYSDATETIME();
DECLARE @t2 time(7);
ALTER TABLE dbo.TestTable ALTER COLUMN ColTest bigint NULL;
SET @t2 = SYSDATETIME();
SELECT
    MAX(ColID) AS RecordCt,
    DATEDIFF(mcs, @t1, @t2) AS [Elapsed Mcs]
FROM dbo.TestTable;
A simple alter table <table> alter column <column> bigint null should take basically no time. There won't be any conversion issues or NULL checks - I don't see why this wouldn't be relatively instant.
If you do it through the GUI, it'll probably try to create a temp table, drop the existing table, and create a new one - definitely don't do that.
In SQL Server 2016+, this alter table <table> alter column <column> bigint null statement will be a simple metadata change (instant) if the table is fully compressed.
More info here from @Paul White:
https://sqlperformance.com/2020/04/database-design/new-metadata-column-changes-sql-server-2016
Compression must be enabled:
On all indexes and partitions, including the base heap or clustered index.
Either ROW or PAGE compression.
Indexes and partitions may use a mixture of these compression levels. The important thing is there are no uncompressed indexes or partitions.
Changing from NULL to NOT NULL is not allowed.
The following integer type changes are supported:
smallint to integer or bigint.
integer to bigint.
smallmoney to money (uses integer representation internally).
The following string and binary type changes are supported:
char(n) to char(m) or varchar(m)
nchar(n) to nchar(m) or nvarchar(m)
binary(n) to binary(m) or varbinary(m)
All of the above only for n < m and m != max
Collation changes are not allowed
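A sketch of what that looks like in practice (table and column names are placeholders):
-- Compress the base table (heap or clustered index) and every nonclustered index first:
ALTER TABLE dbo.BigTable REBUILD WITH (DATA_COMPRESSION = ROW);
ALTER INDEX ALL ON dbo.BigTable REBUILD WITH (DATA_COMPRESSION = ROW);

-- On SQL Server 2016+ this is then a metadata-only, near-instant change:
ALTER TABLE dbo.BigTable ALTER COLUMN SomeIntColumn bigint NULL;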

SQL Server 2005 XML data type

UPDATE: This issue is not related to the XML; I duplicated the table using nvarchar(MAX) instead and still have the same issue. I will post a new topic.
I have a table with about a million records, and the table has an XML field. The query is running extremely slowly, even when selecting just an ID. Is there anything I can do to increase the speed of this? I have tried setting 'text in row' on, but SQL Server will not allow me to; I receive the error "Cannot switch to in row text in table".
I would appreciate any help in a fix or knowledge that I seem to be missing.
Thanks
TABLE
/****** Object: Table [dbo].[Audit] Script Date: 08/14/2009 09:49:01 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
CREATE TABLE [dbo].[Audit](
[ID] [int] IDENTITY(1,1) NOT NULL,
[ParoleeID] [int] NOT NULL,
[Page] [int] NOT NULL,
[ObjectID] [int] NOT NULL,
[Data] [xml] NOT NULL,
[Created] [datetime] NULL,
CONSTRAINT [PK_Audit] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
QUERY
DECLARE @ID int
SET @ID = NULL
DECLARE @ParoleeID int
SET @ParoleeID = 158
DECLARE @Page int
SET @Page = 2
DECLARE @ObjectID int
SET @ObjectID = 93
DECLARE @Created datetime
SET @Created = NULL
SET NOCOUNT ON;
SELECT TOP 1 [Audit].* FROM [Audit]
WHERE
    (@ID IS NULL OR Audit.ID = @ID) AND
    (@ParoleeID IS NULL OR Audit.ParoleeID = @ParoleeID) AND
    (@Page IS NULL OR Audit.Page = @Page) AND
    (@ObjectID IS NULL OR Audit.ObjectID = @ObjectID) AND
    (@Created IS NULL OR (Audit.Created > @Created AND Audit.Created < DATEADD(d, 1, @Created)))
You need to create a primary XML index on the column. Above anything else, having this will assist all your queries.
Once you have this, you can create secondary XML indexes on the XML data.
From experience, though, if you can store some of the information in relational tables, SQL Server is much better at searching and indexing those than XML - i.e. any key columns and commonly searched data should be stored relationally where possible.
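For the Audit table in the question, that would be something along these lines (the index names are just examples):
-- Requires the existing clustered primary key (PK_Audit):
CREATE PRIMARY XML INDEX PXML_Audit_Data
    ON dbo.Audit ([Data]);

-- Optional secondary XML index, useful for path-based queries:
CREATE XML INDEX IXML_Audit_Data_Path
    ON dbo.Audit ([Data])
    USING XML INDEX PXML_Audit_Data FOR PATH;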
Sql Server 2005 – Twelve Tips For Optimizing Query Performance by Tony Wright
Turn on the execution plan, and statistics
Use Clustered Indexes
Use Indexed Views
Use Covering Indexes
Keep your clustered index small.
Avoid cursors
Archive old data
Partition your data correctly
Remove user-defined inline scalar functions
Use APPLY
Use computed columns
Use the correct transaction isolation level
http://tonesdotnetblog.wordpress.com/2008/05/26/twelve-tips-for-optimising-sql-server-2005-queries/
I had the very same scenario - and the solution in our case is computed columns.
For those bits of information that you need frequently from your XML, we created a computed column on the "hosting" table, which basically reaches into the XML and pulls out the necessary value from the XML using XPath. In most cases, we're even able to persist this computed column, so that it becomes part of the table and can be queried and even indexed and query speed is absolutely no problem anymore (on those columns).
We also tried XML indices in the beginning, but their disadvantage is the fact that they're absolutely HUGE on disk - this may or may not be a problem. Since we needed to ship back and forth the whole database frequently (as a SQL backup), we eventually gave up on them.
OK, to set up a computed column that retrieves bits of information from your XML, you first need to create a stored function which takes the XML as a parameter, extracts whatever information you need, and passes it back - something like this:
CREATE FUNCTION dbo.GetShopOrderID(@ShopOrder XML)
RETURNS VARCHAR(100)
WITH SCHEMABINDING  -- required so the function counts as deterministic and the computed column can be PERSISTED
AS BEGIN
    DECLARE @ShopOrderID VARCHAR(100)
    SELECT
        @ShopOrderID = @ShopOrder.value('(ActivateOrderRequest/ActivateOrder/OrderHead/OrderNumber)[1]', 'varchar(100)')
    RETURN @ShopOrderID
END
Then, you'll need to add a computed column to your table and connect it to this stored function:
ALTER TABLE dbo.YourTable
ADD ShopOrderID AS dbo.GetShopOrderID(ShopOrderXML) PERSISTED
Now, you can easily select data from your table using this new column, as if it were a normal column:
SELECT (fields) FROM dbo.YourTable
WHERE ShopOrderID LIKE 'OSA%'
Best of all - whenever you update your XML, all the computed columns are updated as well - they're always in sync, no triggers or other black magic needed!
Marc
Some information like the query you run, the table structure, the XML content etc. would definitely help. A lot...
Without any info, I will guess: the query is running slowly when selecting just an ID because you don't have an index on ID.
Updated
There are at least a few serious problems with your query.
Unless an ID is provided, the table can only be scanned end-to-end because there are no indexes.
Even if an ID is provided, the condition (@ID IS NULL OR ID = @ID) is not guaranteed to be SARGable, so it may still result in a table scan.
And most importantly: the query will generate a plan 'optimized' for the first set of parameters it sees. It will reuse this plan for any combination of parameters, no matter which are NULL or not. That would make a difference if there were some variation in the access paths to choose from (i.e. indexes), but as it is now the query can only choose between a scan and a seek when @ID is present. Due to the way it is constructed, it will pretty much always choose a scan because of the OR.
With this table design your query will run slow today, slower tomorrow, and impossibly slow next week as the size increases. You must look back at your requirements, decide which fields are important to query on, index them, and provide separate queries for them. OR-ing together all possible filters like this is not going to work.
The XML you're trying to retrieve has absolutely nothing to do with the performance problem. You are simply brute-forcing a table scan and expecting SQL to magically find the records you want.
So if you want to retrieve a specific ParoleeID, Page and ObjectID, you index the fields you search on and run a query for those and only those:
CREATE INDEX idx_Audit_ParoleeID ON Audit(ParoleeID);
CREATE INDEX idx_Audit_Page ON Audit(Page);
CREATE INDEX idx_Audit_ObjectID ON Audit(ObjectID);
GO
DECLARE @ParoleeID int
SET @ParoleeID = 158
DECLARE @Page int
SET @Page = 2
DECLARE @ObjectID int
SET @ObjectID = 93
SET NOCOUNT ON;
SELECT TOP 1 [Audit].* FROM [Audit]
WHERE Audit.ParoleeID = @ParoleeID
    AND Audit.Page = @Page
    AND Audit.ObjectID = @ObjectID;
