I just want to know how I should index this table for optimal performance. It will potentially hold around 20M rows.
CREATE TABLE [dbo].[Table1](
[ID] [bigint] NOT NULL,
[Col1] [varchar](100) NULL,
[Col2] [varchar](100) NULL,
[Description] [varchar](100) NULL
) ON [PRIMARY]
Basically, this table will be queried ONLY in this manner.
SELECT ID FROM Table1
WHERE Col1 = 'exactVal1' AND Col2 = 'exactVal2' AND [Description] = 'exactDesc'
This is what i did:
CREATE NONCLUSTERED INDEX IX_ID
ON Table1(ID)
GO
CREATE NONCLUSTERED INDEX IX_Col1
ON Table1(Col1)
GO
CREATE NONCLUSTERED INDEX IX_Col2
ON Table1(Col2)
GO
CREATE NONCLUSTERED INDEX IX_Description
ON Table1([Description])
GO
Am I right to index all these columns? I'm not that confident yet. I'm new to SQL, so please let me know if I'm on the right track.
Again, a lot of data will be put in this table. Unfortunately, I cannot test the performance yet since no data is available, but I will soon generate some dummy data for that. It would be great to have another option (suggestion) that I can compare the results with.
Thanks,
jack
I would combine these indexes into one index, instead of having three separate indexes. For example:
CREATE INDEX ix_cols ON dbo.Table1 (Col1, Col2, Description)
If this combination of columns is unique within the table, then you should add the UNIQUE keyword to make the index unique. This is for performance reasons, but, also, more importantly, to enforce uniqueness. It may also be created as a primary key if that is appropriate.
Placing all of the columns into one index will give better performance because it will not be necessary for SQL Server to use multiple passes to find the row you are seeking.
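Since the query in the question returns only ID, one possible refinement is to carry ID in the same index as an included column, so the query can be answered from the index alone. This is a minimal sketch, assuming the table from the question; the index name IX_Table1_Cover is made up for illustration.

```sql
-- Key columns match the three equality predicates in the WHERE clause.
-- ID is never searched on, so it is stored only at the leaf level via INCLUDE.
CREATE NONCLUSTERED INDEX IX_Table1_Cover
ON dbo.Table1 (Col1, Col2, [Description])
INCLUDE (ID);
GO

-- The query from the question, which this index is meant to cover.
SELECT ID
FROM dbo.Table1
WHERE Col1 = 'exactVal1' AND Col2 = 'exactVal2' AND [Description] = 'exactDesc';
```

The three key columns add up to at most 300 bytes, which stays under SQL Server's 900-byte index key limit. Whether the optimizer actually picks a seek on this index is best confirmed from the execution plan once test data exists.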
Try this -
CREATE TABLE dbo.Table1
(
ID BIGINT NOT NULL
, Col1 VARCHAR(100) NULL
, Col2 VARCHAR(100) NULL
, [Description] VARCHAR(100) NULL
)
GO
CREATE CLUSTERED INDEX IX_Table1 ON dbo.Table1
(
Col1
, Col2
, [Description]
)
Or this -
CREATE TABLE dbo.Table1
(
ID BIGINT PRIMARY KEY NOT NULL
, Col1 VARCHAR(100) NULL
, Col2 VARCHAR(100) NULL
, [Description] VARCHAR(100) NULL
)
GO
CREATE UNIQUE NONCLUSTERED INDEX IX_Table1 ON dbo.Table1
(
Col1
, Col2
, [Description]
)
I have a temporal table for my employees, shown below. The problem is that the [Data] column, which is of type xml, makes the history table grow so fast that users are running out of space.
To solve this I need a temporal table that excludes the Data column, meaning the history table would not contain it at all. Is that possible?
I do not care about what changes in the Data xml at this point.
CREATE TABLE dbo.Employee
(
[EmployeeID] int NOT NULL PRIMARY KEY CLUSTERED
, [Name] nvarchar(100) NOT NULL
, [Position] varchar(100) NOT NULL
, [Department] varchar(100) NOT NULL
, [Data] [xml] NOT NULL
, [ValidFrom] datetime2 GENERATED ALWAYS AS ROW START
, [ValidTo] datetime2 GENERATED ALWAYS AS ROW END
, PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.EmployeeHistory));
I have a table with a very large number of rows which I wish to execute via dynamic SQL. They are basically existence checks and insert statements and I want to migrate data from one production database to another - we are merging transactional data. I am trying to find the optimal way to execute the rows.
I have found that the COALESCE method of appending all the rows to one another is not efficient for this, particularly when more than ~100 rows are executed at a time.
Assume the structure of the source table is something arbitrary like this:
CREATE TABLE [dbo].[MyTable]
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[DataField1] [int] NOT NULL,
[FK_ID1] [int] NOT NULL,
[LotsMoreFields] [NVARCHAR] (MAX),
CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED ([ID] ASC)
)
CREATE TABLE [dbo].[FK1]
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[Name] [int] NOT NULL, -- Unique constrained value
CONSTRAINT [PK_FK1] PRIMARY KEY CLUSTERED ([ID] ASC)
)
The other requirement is I am tracking the source table PK vs the target PK and whether an insert occurred or whether I have already migrated that row to the target. To do this, I'm tracking migrated rows in another table like so:
CREATE TABLE [dbo].[ChangeTracking]
(
[ReferenceID] BIGINT IDENTITY(1,1),
[Src_ID] BIGINT,
[Dest_ID] BIGINT,
[TableName] NVARCHAR(255),
CONSTRAINT [PK_ChangeTracking] PRIMARY KEY CLUSTERED ([ReferenceID] ASC)
)
My existing method is executing some dynamic sql generated by a stored procedure. The stored proc does PK lookups as the source system has different PK values for table [dbo].[FK1].
E.g.
IF NOT EXISTS (<ignore this existence check for now>)
BEGIN
INSERT INTO [Dest].[dbo].[MyTable] ([DataField1],[FK_ID1],[LotsMoreFields]) VALUES (333,(SELECT [ID] FROM [Dest].[dbo].[FK1] WHERE [Name]=N'ValueFoundInSource'),N'LotsMoreValues');
INSERT INTO [Dest].[dbo].[ChangeTracking] ([Src_ID],[Dest_ID],[TableName]) VALUES (666,SCOPE_IDENTITY(),N'MyTable'); --666 is the PK in [Src].[dbo].[MyTable] for this inserted row
END
So when you have a million of these, it isn't quick.
Is there a recommended performant way of doing this?
As mentioned, the MERGE statement works well when you're looking at a complex JOIN condition (if any of these fields are different, update the record to match). You can also look into creating a HASHBYTES hash of the entire record to quickly find differences between source and target tables, though that can also be time-consuming on very large data sets.
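The HASHBYTES idea could look roughly like the sketch below. It assumes SQL Server 2012 or later (for CONCAT and SHA2_256) and pairs source and target rows through the ChangeTracking table from the question, since the primary keys differ between the two databases. Only DataField1 and LotsMoreFields are hashed here, because FK_ID1 also differs between the systems; that choice is an assumption, not something the question states.

```sql
-- Find already-migrated rows whose content differs between source and target.
SELECT s.[ID] AS [SrcID], d.[ID] AS [DestID]
FROM [Src].[dbo].[MyTable] AS s
JOIN [Dest].[dbo].[ChangeTracking] AS ct
    ON ct.[Src_ID] = s.[ID]
   AND ct.[TableName] = N'MyTable'
JOIN [Dest].[dbo].[MyTable] AS d
    ON d.[ID] = ct.[Dest_ID]
-- The N'|' separator keeps (1, '23') and (12, '3') from hashing to the same value.
WHERE HASHBYTES('SHA2_256', CONCAT(s.[DataField1], N'|', s.[LotsMoreFields]))
   <> HASHBYTES('SHA2_256', CONCAT(d.[DataField1], N'|', d.[LotsMoreFields]));
```

Before SQL Server 2016, HASHBYTES accepts at most 8000 bytes of input, so an NVARCHAR(MAX) column such as LotsMoreFields may need to be truncated or hashed in pieces on older versions.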
It sounds like you're making these updates like a front-end developer, by checking each row for a match and then doing the insert. It will be far more efficient to do the inserts with a single query. Below is an example that looks for names that are in the tblNewClient table, but not in the tblClient table:
INSERT INTO tblClient
( [Name] ,
TypeID ,
ParentID
)
SELECT nc.[Name] ,
nc.TypeID ,
nc.ParentID
FROM tblNewClient nc
LEFT JOIN tblClient cl
ON nc.[Name] = cl.[Name]
WHERE cl.ID IS NULL;
This will be far more efficient than doing it RBAR (row by agonizing row).
Taking the two answers from @RusselFox and putting them together, I reached this tentative solution, which looks a LOT more efficient:
MERGE INTO [Dest].[dbo].[MyTable] [MT_D]
USING (SELECT [MT_S].[ID] as [SrcID],[MT_S].[DataField1],[FK_1_D].[ID] as [FK_ID1],[MT_S].[LotsMoreFields]
FROM [Src].[dbo].[MyTable] [MT_S]
JOIN [Src].[dbo].[FK1] [FK_1] ON [MT_S].[FK_ID1] = [FK_1].[ID]
JOIN [Dest].[dbo].[FK1] [FK_1_D] ON [FK_1].[Name] = [FK_1_D].[Name]
) [SRC] ON 1 = 0 -- never matches, so every source row takes the insert branch
WHEN NOT MATCHED THEN
INSERT([DataField1],[FK_ID1],[LotsMoreFields])
VALUES ([SRC].[DataField1],[SRC].[FK_ID1],[SRC].[LotsMoreFields])
-- MERGE (unlike INSERT) lets OUTPUT reference source columns, which pairs each source ID with its new target ID.
-- [AlreadyExists] is not part of the ChangeTracking definition shown above.
OUTPUT [SRC].[SrcID],INSERTED.[ID],0,N'MyTable' INTO [Dest].[dbo].[ChangeTracking]([Src_ID],[Dest_ID],[AlreadyExists],[TableName]);
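The existence check that the question set aside could be folded into the same set-based approach by filtering the source rows against ChangeTracking before they reach the MERGE. This is only a sketch under that assumption, using the table and column names from the question; the same NOT EXISTS predicate would go into the USING subquery.

```sql
-- Source rows that have not yet been recorded as migrated.
SELECT [MT_S].[ID] AS [SrcID], [MT_S].[DataField1], [MT_S].[LotsMoreFields]
FROM [Src].[dbo].[MyTable] AS [MT_S]
WHERE NOT EXISTS (
    SELECT 1
    FROM [Dest].[dbo].[ChangeTracking] AS [CT]
    WHERE [CT].[Src_ID] = [MT_S].[ID]
      AND [CT].[TableName] = N'MyTable'
);
```

With a million rows in ChangeTracking, an index on (TableName, Src_ID) would probably be needed to keep this lookup cheap.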
I have a table named dbo.ReferenceDetails which contains several millions of records. Here is the create and select script of my table :
CREATE TABLE [dbo].[RefDetails](
[REQ_XREF_TYPE] [varchar](12) NULL
, [REQUEST_ID] [varchar](24) NULL
, [CROSS_REFERENCE] [varchar](32) NULL
, [RUN_BY] [varchar](100) NULL
, [RUN_DATE] [datetime] NULL
, [ISCURRENTRECORD] [int] NULL
, [RECORDSTARTDATE] [datetime2](7) NULL
, [RECORDENDDATE] [datetime2](7) NULL
, [UPDATE_FLAG] [varchar](50) NULL
, [SECUREFLAG] [int] NULL
, [EVENT_TIMESTAMP] [datetime2](7) NULL
) ON [PRIMARY]
SELECT TOP (10)
[REQ_XREF_TYPE]
,[REQUEST_ID]
,[CROSS_REFERENCE]
,[RUN_BY]
,[RUN_DATE]
,[ISCURRENTRECORD]
,[RECORDSTARTDATE]
,[RECORDENDDATE]
,[UPDATE_FLAG]
,[SECUREFLAG]
,[EVENT_TIMESTAMP]
FROM [dbo].[ReferenceDetails]
Can I alter the table so that REQ_XREF_TYPE, ISCURRENTRECORD and EVENT_TIMESTAMP are NOT NULL and together form the primary key, without dropping the table?
Your response will be appreciated. :)
See below. You first need to convert the columns to NOT NULL then you create the Primary Key. If you already have data in the table then the creation of the primary key may take some time.
ALTER TABLE [dbo].[RefDetails] ALTER COLUMN [REQ_XREF_TYPE] VARCHAR(12) NOT NULL
ALTER TABLE [dbo].[RefDetails] ALTER COLUMN [ISCURRENTRECORD] INT NOT NULL
ALTER TABLE [dbo].[RefDetails] ALTER COLUMN [EVENT_TIMESTAMP] DATETIME2(7) NOT NULL
ALTER TABLE [dbo].[RefDetails] ADD CONSTRAINT PK_RefDetails PRIMARY KEY ([REQ_XREF_TYPE],[ISCURRENTRECORD],[EVENT_TIMESTAMP]);
Note: Your creation script says the table name is RefDetails but your OP says ReferenceDetails. I went with the creation script name.
Update:
The Primary Key requires that any column(s) selected contain a unique combination - duplicates are not allowed. If duplicates exist, the creation of the primary key will fail. To check for duplicates before creating a primary key, run the following:
SELECT [REQ_XREF_TYPE], [ISCURRENTRECORD], [EVENT_TIMESTAMP], CountDupes = COUNT(1)
FROM [dbo].[RefDetails]
GROUP BY [REQ_XREF_TYPE], [ISCURRENTRECORD], [EVENT_TIMESTAMP]
HAVING COUNT(1) > 1
ORDER BY [REQ_XREF_TYPE], [ISCURRENTRECORD], [EVENT_TIMESTAMP]
You are expecting 0 results, which means there are no duplicates. Any result will identify the unique set of duplicate records along with the number of times they are duplicates (see the CountDupes column result).
If you get 0 results, then you are clear to create the primary key.
If you get any results, then you will need to address this (i.e., remove the duplicates or include additional columns that create a unique combination).
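The ALTER COLUMN ... NOT NULL statements fail in the same way if any existing row holds a NULL in one of the three columns, so a similar pre-check may be worth running first. A small sketch in the style of the duplicate check; the output column names are made up for illustration.

```sql
-- Count existing NULLs in each prospective key column.
SELECT NullXrefType       = SUM(CASE WHEN [REQ_XREF_TYPE]   IS NULL THEN 1 ELSE 0 END),
       NullIsCurrent      = SUM(CASE WHEN [ISCURRENTRECORD] IS NULL THEN 1 ELSE 0 END),
       NullEventTimestamp = SUM(CASE WHEN [EVENT_TIMESTAMP] IS NULL THEN 1 ELSE 0 END)
FROM [dbo].[RefDetails];
```

Any count above zero means those rows have to be updated or removed before the column can be made NOT NULL.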
I'm working on a project (Microsoft SQL Server 2012) in which I need to store quite a lot of data.
Currently my table contains 1441352 records in total.
The structure of the table is as follows:
RecordIdentifier (int, not null)
GlnCode (PK, nvarchar(100), not null)
Description (nvarchar(MAX), not null)
VendorId (nvarchar(100), not null)
VendorName (nvarchar(100), not null)
ItemNumber (PK, nvarchar(100), not null)
ItemUOM (PK, nvarchar(128), not null)
My table is indexed on the following fields:
NonClustered - GlnCode, Ascending
NonClustered - ItemNumber, Ascending
NonClustered - ItemUOM, Ascending
NonClustered - VendorID, Ascending
Clustered - Unique (The above 4 columns together).
I'm now writing an API to return the records in the table.
The API exposes methods that execute this query:
SELECT TOP (51)
[GlnCode] AS [GlnCode],
[VendorId] AS [VendorId],
[ItemNumber] AS [ItemNumber],
[ItemUOM] AS [ItemUOM],
[RecordIdentitifer] AS [RecordIdentitifer],
[Description] AS [Description],
[VendorName] AS [VendorName]
FROM [dbo].[T_GENERIC_ARTICLE]
If I look at the performance, this is good.
But, this doesn't guarantee me to return always the same set, so I need to apply an ORDER BY clause, meaning the query being executed looks like this:
SELECT TOP (51)
[GlnCode] AS [GlnCode],
[VendorId] AS [VendorId],
[ItemNumber] AS [ItemNumber],
[ItemUOM] AS [ItemUOM],
[RecordIdentitifer] AS [RecordIdentitifer],
[Description] AS [Description],
[VendorName] AS [VendorName]
FROM [dbo].[T_GENERIC_ARTICLE]
ORDER BY [GlnCode] ASC, [ItemNumber] ASC, [ItemUOM] ASC, [VendorId] ASC
Now, the query takes a few seconds to return, which I can't afford.
Does anyone have any idea how to solve this issue?
Your table index definitions are not optimal. You also don't have to create the additional single-column indexes, because those columns are already covered by the composite nonclustered index. You will get better performance by structuring your indexes as follows:
Table definition:
CREATE TABLE [dbo].[T_GENERIC_ARTICLE]
(
RecordIdentifier int IDENTITY(1,1) PRIMARY KEY NOT NULL,
GlnCode nvarchar(100) NOT NULL,
Description nvarchar(MAX) NOT NULL,
VendorId nvarchar(100) NOT NULL,
VendorName nvarchar(100) NOT NULL,
ItemNumber nvarchar(100) NOT NULL,
ItemUOM nvarchar(128) NOT NULL
)
GO
CREATE UNIQUE NONCLUSTERED INDEX [UniqueNonClusteredIndex-Composite2]
ON [dbo].[T_GENERIC_ARTICLE](GlnCode, ItemNumber,ItemUOM,VendorId ASC);
GO
Revised Query
SELECT TOP (51)
[RecordIdentifier] AS [RecordIdentitifer],
[GlnCode] AS [GlnCode],
[VendorId] AS [VendorId],
[ItemNumber] AS [ItemNumber],
[ItemUOM] AS [ItemUOM],
[Description] AS [Description],
[VendorName] AS [VendorName]
FROM [dbo].[T_GENERIC_ARTICLE]
ORDER BY [GlnCode], [ItemNumber], [ItemUOM], [VendorId]
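With this structure SQL Server can read the nonclustered index in key order, so the first 51 rows arrive already sorted, and then do a key lookup against the clustered primary key for each of those rows to fetch the columns the index does not contain (Description and VendorName). This is where you want the majority of the work to be done.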
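One way to check whether the revised indexes actually help is to compare logical reads and elapsed time before and after the change. A minimal sketch using the standard session settings and the query from the question:

```sql
-- Report I/O and timing for each statement in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT TOP (51)
    [RecordIdentifier], [GlnCode], [VendorId], [ItemNumber],
    [ItemUOM], [Description], [VendorName]
FROM [dbo].[T_GENERIC_ARTICLE]
ORDER BY [GlnCode], [ItemNumber], [ItemUOM], [VendorId];

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

The Messages tab then lists logical reads per table. A plan that sorts all 1.4 million rows should show far more reads than one that takes the first 51 rows from an index that is already in the requested order.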
Reference:
Indexes in SQL Server
Hope This helps
In a table schema like below
CREATE TABLE [dbo].[Employee](
[EmployeeId] [uniqueidentifier] NOT NULL,
[Name] [nvarchar](50) NOT NULL,
[Location] [nvarchar](50) NOT NULL,
[Skills] [xml] NOT NULL
CONSTRAINT [PK_Employee] PRIMARY KEY CLUSTERED
How would I get the employees that have C# (case insensitive) programming skills, assuming the xml saved in the Skills column is as below?
Could you also advise on other functions that would help me filter and sort when using xml data type columns?
<Skills><Skill>C#</Skill><Skill>ASP.NET</Skill><Skill>VB.NET</Skill></Skills>
The comparison is case sensitive, so you need to compare against both c# and C#. In SQL Server 2008 you can use the XQuery upper-case function instead.
declare @T table
(
ID int identity,
Skills XML
)
insert into @T values
('<Skills><Skill>C#</Skill><Skill>ASP.NET</Skill><Skill>VB.NET</Skill></Skills>')
insert into @T values
('<Skills><Skill>CB.NET</Skill><Skill>ASP.NET</Skill><Skill>c#</Skill></Skills>')
insert into @T values
('<Skills><Skill>F#</Skill><Skill>ASP.NET</Skill><Skill>VB.NET</Skill></Skills>')
select ID
from @T
where Skills.exist('/Skills/Skill[contains(., "C#") or contains(., "c#")]') = 1
Result:
ID
-----------
1
2
Update:
This will also work. LIKE follows the collation in effect, so it matches both cases only when that collation is case insensitive.
select T.ID
from @T as T
cross apply T.Skills.nodes('/Skills/Skill') as X(N)
where X.N.value('.', 'nvarchar(50)') like '%C#%'
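The upper-case route mentioned at the start of this answer might look like the sketch below, alongside a variant of the nodes() query that pins the collation so the result does not depend on the database default. Both assume SQL Server 2008 or later; wrapping the context node in string() and using the Latin1_General_CI_AS collation are choices made here for illustration, not something the answer specifies.

```sql
declare @T table (ID int identity, Skills xml);

insert into @T values
('<Skills><Skill>C#</Skill><Skill>ASP.NET</Skill></Skills>'),
('<Skills><Skill>c#</Skill><Skill>VB.NET</Skill></Skills>'),
('<Skills><Skill>F#</Skill><Skill>VB.NET</Skill></Skills>');

-- Option 1: normalize case inside the XQuery predicate.
select ID
from @T
where Skills.exist('/Skills/Skill[upper-case(string(.)) = "C#"]') = 1;

-- Option 2: shred the skills and compare under an explicit case-insensitive collation.
select distinct T.ID
from @T as T
cross apply T.Skills.nodes('/Skills/Skill') as X(N)
where X.N.value('.', 'nvarchar(50)') collate Latin1_General_CI_AS = N'C#';
```

Both queries use an exact match on the skill name rather than contains or LIKE '%C#%', which avoids picking up longer values that merely contain "C#".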