Merging partitions locks the table in SQL Server

I have a big table with 400 million rows.
I want to partition this table, but I'm having a problem when merging the two oldest partitions.
I have this table:
CREATE TABLE [dbo].[PartitionDemo](
[Id] [int] IDENTITY(1,1) NOT NULL,
[myDate] [date] NOT NULL,
[variable] [varchar](100) NULL,
CONSTRAINT [pk_PartitionDemo] PRIMARY KEY CLUSTERED
(
[myDate] ASC,
[Id] ASC
) ON [PartitionDemo_PS](mydate)
)
CREATE PARTITION SCHEME [PartitionDemo_PS] AS PARTITION [PartitionDemo_PF] TO ([PartitionDemo_FG_Prev], [PartitionDemo_FG_Historical], [PartitionDemo_FG_201609], [PartitionDemo_FG_201610], [PartitionDemo_FG_201611], [PartitionDemo_FG_201612], [PartitionDemo_FG_201701], [PartitionDemo_FG_201702], [PartitionDemo_FG_201703], [PartitionDemo_FG_201704])
GO
CREATE PARTITION FUNCTION [PartitionDemo_PF](date) AS RANGE RIGHT FOR VALUES (N'2015-03-01T00:00:00.000', N'2016-09-01T00:00:00.000', N'2016-10-01T00:00:00.000', N'2016-11-01T00:00:00.000', N'2016-12-01T00:00:00.000', N'2017-01-01T00:00:00.000', N'2017-02-01T00:00:00.000', N'2017-03-01T00:00:00.000', N'2017-04-01T00:00:00.000')
GO
This is my table with 400 million rows.
What I do to merge the partitions is:
CREATE TABLE [staging].[PartitionDemo](
[Id] [int] IDENTITY(1,1) NOT NULL,
[myDate] [date] NOT NULL,
[variable] [varchar](100) NULL,
CONSTRAINT [pk_PartitionDemo] PRIMARY KEY CLUSTERED
(
[myDate] ASC,
[Id] ASC
) ON [PartitionDemo_PS](mydate)
)
GO
ALTER TABLE PartitionDemo
SWITCH PARTITION 2 TO [staging].[PartitionDemo] PARTITION 2
ALTER TABLE PartitionDemo
SWITCH PARTITION 3 TO [staging].[PartitionDemo] PARTITION 3
ALTER PARTITION FUNCTION [PartitionDemo_PF]()
MERGE RANGE ('2016-03-01');
The problem is that it locks both tables while merging.
What is the workaround for this problem?

If you remove the first boundary with this function and scheme, all data before 2015-03-01 will be moved to the PartitionDemo_FG_Prev filegroup instead of the PartitionDemo_FG_Historical filegroup as intended. I recommend a NULL partition boundary to ensure the first partition is always empty. This will also allow you to remove files from this unused filegroup and facilitate partition maintenance going forward. See http://www.dbdelta.com/table-partitioning-best-practices/ for more information on this practice.
A brief schema modification (Sch-M) lock will be acquired during the SWITCH, MERGE, and SPLIT operations, but those should be fast metadata-only operations because no data movement is needed. The physical data movement is done by the staging table CREATE INDEX ... WITH (DROP_EXISTING = ON), which also avoids a sort to rebuild the index. This script acquires an exclusive table lock during the SWITCH, MERGE, and SPLIT operations to avoid deadlocking with other activity.
--create staging table exactly like original table
CREATE TABLE [staging].[PartitionDemo](
[Id] [int] IDENTITY(1,1) NOT NULL,
[myDate] [date] NOT NULL,
[variable] [varchar](100) NULL,
CONSTRAINT [pk_PartitionDemo] PRIMARY KEY CLUSTERED
(
[myDate] ASC,
[Id] ASC
) ON [PartitionDemo_PS](mydate)
);
--create temporary partition function and scheme with desired end state
CREATE PARTITION FUNCTION [StagingPartitionDemo_PF](date) AS RANGE RIGHT FOR VALUES (
CAST(NULL AS datetime) --NULL boundary ensures first partition is always empty
, N'2016-09-01T00:00:00.000' --upper boundary of historical data fg (less than this date)
, N'2016-10-01T00:00:00.000'
, N'2016-11-01T00:00:00.000'
, N'2016-12-01T00:00:00.000'
, N'2017-01-01T00:00:00.000'
, N'2017-02-01T00:00:00.000'
, N'2017-03-01T00:00:00.000'
, N'2017-04-01T00:00:00.000'
);
CREATE PARTITION SCHEME [StagingPartitionDemo_PS] AS PARTITION [StagingPartitionDemo_PF] TO (
[PartitionDemo_FG_Prev]
, [PartitionDemo_FG_Historical]
, [PartitionDemo_FG_201609]
, [PartitionDemo_FG_201610]
, [PartitionDemo_FG_201611]
, [PartitionDemo_FG_201612]
, [PartitionDemo_FG_201701]
, [PartitionDemo_FG_201702]
, [PartitionDemo_FG_201703]
, [PartitionDemo_FG_201704]
);
GO
SET XACT_ABORT ON;
BEGIN TRAN;
--acquire exclusive table lock to prevent deadlocking with concurrent activity
SELECT TOP(0) myDate FROM dbo.PartitionDemo WITH(TABLOCKX);
--switch first partition into staging (in case data exists before 2015-03-01)
ALTER TABLE dbo.PartitionDemo
SWITCH PARTITION $PARTITION.PartitionDemo_PF(CAST(NULL AS datetime))
TO [staging].[PartitionDemo] PARTITION $PARTITION.PartitionDemo_PF(CAST(NULL AS datetime));
--switch second partition into staging (on or after 2015-03-01 and before 2016-09-01)
ALTER TABLE dbo.PartitionDemo
SWITCH PARTITION $PARTITION.PartitionDemo_PF('2015-03-01T00:00:00.000')
TO [staging].[PartitionDemo] PARTITION $PARTITION.PartitionDemo_PF('2015-03-01T00:00:00.000');
--switch third partition into staging (on or after 2016-09-01 and before 2016-10-01)
ALTER TABLE dbo.PartitionDemo
SWITCH PARTITION $PARTITION.PartitionDemo_PF('2016-09-01T00:00:00.000')
TO [staging].[PartitionDemo] PARTITION $PARTITION.PartitionDemo_PF('2016-09-01T00:00:00.000');
COMMIT;
GO
--rebuild staging table on temporary partition scheme
CREATE UNIQUE CLUSTERED INDEX pk_PartitionDemo ON staging.PartitionDemo(
[myDate] ASC,
[Id] ASC
)
WITH(DROP_EXISTING=ON)
ON [StagingPartitionDemo_PS](mydate);
GO
SET XACT_ABORT ON;
BEGIN TRAN;
--acquire exclusive table lock to prevent deadlocking with concurrent activity
SELECT TOP(0) myDate FROM dbo.PartitionDemo WITH(TABLOCKX);
--modify original partition scheme to match temporary one
ALTER PARTITION SCHEME PartitionDemo_PS
NEXT USED PartitionDemo_FG_Historical;
ALTER PARTITION FUNCTION PartitionDemo_PF()
SPLIT RANGE(CAST(NULL AS datetime));
ALTER PARTITION FUNCTION PartitionDemo_PF()
MERGE RANGE('2015-03-01T00:00:00.000');
--switch historical data partition back to main table
ALTER TABLE staging.PartitionDemo
SWITCH PARTITION $PARTITION.PartitionDemo_PF(CAST(NULL AS datetime))
TO dbo.[PartitionDemo] PARTITION $PARTITION.PartitionDemo_PF(CAST(NULL AS datetime));
--switch 2016-09-01 partition back to main table
ALTER TABLE staging.PartitionDemo
SWITCH PARTITION $PARTITION.PartitionDemo_PF('2016-09-01T00:00:00.000')
TO dbo.[PartitionDemo] PARTITION $PARTITION.PartitionDemo_PF('2016-09-01T00:00:00.000');
COMMIT;
GO
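After the script completes, the resulting layout can be verified with a catalog query. This is a sketch using the object and function names from the script above; for a RANGE RIGHT function, each partition's lower boundary has boundary_id equal to the partition number minus one:

```sql
--Rows, lower boundary, and filegroup per partition of dbo.PartitionDemo
SELECT p.partition_number,
       p.rows,
       prv.value AS lower_boundary,
       fg.name   AS filegroup_name
FROM sys.partitions p
JOIN sys.indexes i
    ON i.object_id = p.object_id AND i.index_id = p.index_id
JOIN sys.partition_schemes ps
    ON ps.data_space_id = i.data_space_id
JOIN sys.destination_data_spaces dds
    ON dds.partition_scheme_id = ps.data_space_id
   AND dds.destination_id = p.partition_number
JOIN sys.filegroups fg
    ON fg.data_space_id = dds.data_space_id
LEFT JOIN sys.partition_range_values prv
    ON prv.function_id = ps.function_id
   AND prv.boundary_id = p.partition_number - 1 --RANGE RIGHT: lower boundary
WHERE p.object_id = OBJECT_ID('dbo.PartitionDemo')
  AND i.index_id IN (0, 1); --heap or clustered index only
ORDER BY p.partition_number;
```

The first partition (NULL lower boundary) should report zero rows if the NULL-boundary practice above is followed.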

Related

How to drop a clustered columnstore index?

How can I drop a clustered columnstore index on a table?
I am trying to alter the length of a column, but getting this error:
The statement failed because a secondary dictionary reached the maximum size limit.
Consider dropping the columnstore index, altering the column, then creating a new columnstore index.
I have a table that looks like this, roughly:
CREATE TABLE [dim].[Ticket]
(
[ID] [bigint] NULL,
[Rev] [int] NULL,
[Timestamp] [datetime2](7) NULL,
[Title] [varchar](260) NULL,
[Description] [varchar](4005) NULL
)
WITH
(
DISTRIBUTION = HASH ( [ID] ),
CLUSTERED COLUMNSTORE INDEX
)
When I try variations of this recommendation:
https://learn.microsoft.com/en-us/sql/t-sql/statements/drop-index-transact-sql?view=sql-server-ver16
I just get errors.
I checked that this works on Synapse too. Discover the CCI's name with
select * from sys.indexes where object_id = object_id('dim.Ticket')
then drop it
drop index ClusteredIndex_fdddc3c574214a2096190cbc54f58cc4 on dim.Ticket
You'll then have a heap. When you're ready re-compress the table with
create clustered columnstore index cci_dim_ticket on dim.Ticket
But it would be more efficient to create a new table with a CTAS, and then rename and drop the old one. Dropping the CCI actually requires rewriting the table as an uncompressed heap, which you can skip with CTAS.
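A sketch of the CTAS approach, assuming a Synapse dedicated SQL pool and that the goal is widening the [Description] column (the new length and the _New/_Old names are illustrative):

```sql
--Rebuild the table with the wider column via CTAS instead of dropping the CCI
CREATE TABLE [dim].[Ticket_New]
WITH
(
    DISTRIBUTION = HASH ( [ID] ),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT
    [ID],
    [Rev],
    [Timestamp],
    [Title],
    CAST([Description] AS varchar(8000)) AS [Description] --assumed new length
FROM [dim].[Ticket];

--Swap names, then drop the old table once verified
RENAME OBJECT [dim].[Ticket] TO [Ticket_Old];
RENAME OBJECT [dim].[Ticket_New] TO [Ticket];
DROP TABLE [dim].[Ticket_Old];
```

This avoids decompressing the table into a heap, since the data goes straight from one columnstore into another.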

How to enable incremental statistics with ALTER TABLE

To enable incremental update statistics I have to create a partition function, a partition scheme, and an index on my table, and create the table this way:
create table [tmp].[PartitionTest]
(
[RecordId] int not null
,[CreateDate] datetime
,[Quantity] int
) on [ups_partionByDate_scheme226] ([CreateDate])
But what if I can't create the table like this, with the line
on [ups_partionByDate_scheme226] ([CreateDate])
Can I do this with ALTER TABLE or in some other way?
Yes.
If your table has a clustered index, you need to drop it first; then you can use the following code snippet. If you have no clustered index, skip the previous sentence.
ALTER TABLE [tmp].[PartitionTest] ADD CONSTRAINT [PK_ParitionTest_CreateDate] PRIMARY KEY CLUSTERED
(
[CreateDate]
) ON [ups_partionByDate_scheme226] ([CreateDate]);
See also Create Partitioned Tables and Indexes
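Once the table is partitioned, incremental statistics still have to be enabled explicitly. A sketch using the names from the question (the statistics name and the sampled partition number are illustrative):

```sql
--Rebuild the partitioned index with incremental statistics enabled
ALTER INDEX [PK_ParitionTest_CreateDate] ON [tmp].[PartitionTest]
REBUILD WITH (STATISTICS_INCREMENTAL = ON);

--Or create a separate statistics object incrementally
CREATE STATISTICS [st_PartitionTest_Quantity]
ON [tmp].[PartitionTest] ([Quantity])
WITH INCREMENTAL = ON;

--Later, refresh only the partitions that changed
UPDATE STATISTICS [tmp].[PartitionTest] [PK_ParitionTest_CreateDate]
WITH RESAMPLE ON PARTITIONS (2);
```

The payoff is the ON PARTITIONS clause: statistics maintenance touches only the changed partitions instead of rescanning the whole table.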

Dynamic SQL to execute large number of rows from a table

I have a table with a very large number of rows, each containing a SQL statement, which I wish to execute via dynamic SQL. They are basically existence checks and insert statements; I want to migrate data from one production database to another, as we are merging transactional data. I am trying to find the optimal way to execute the rows.
I've found that the coalesce method of appending all the rows to one another is not efficient, particularly when the number of rows executed at a time is greater than ~100.
Assume the structure of the source table is something arbitrary like this:
CREATE TABLE [dbo].[MyTable]
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[DataField1] [int] NOT NULL,
[FK_ID1] [int] NOT NULL,
[LotsMoreFields] [NVARCHAR] (MAX),
CONSTRAINT [PK_MyTable] PRIMARY KEY CLUSTERED ([ID] ASC)
)
CREATE TABLE [dbo].[FK1]
(
[ID] [int] IDENTITY(1,1) NOT NULL,
[Name] [int] NOT NULL, -- Unique constrained value
CONSTRAINT [PK_FK1] PRIMARY KEY CLUSTERED ([ID] ASC)
)
The other requirement is I am tracking the source table PK vs the target PK and whether an insert occurred or whether I have already migrated that row to the target. To do this, I'm tracking migrated rows in another table like so:
CREATE TABLE [dbo].[ChangeTracking]
(
[ReferenceID] BIGINT IDENTITY(1,1),
[Src_ID] BIGINT,
[Dest_ID] BIGINT,
[TableName] NVARCHAR(255),
CONSTRAINT [PK_ChangeTracking] PRIMARY KEY CLUSTERED ([ReferenceID] ASC)
)
My existing method is executing some dynamic sql generated by a stored procedure. The stored proc does PK lookups as the source system has different PK values for table [dbo].[FK1].
E.g.
IF NOT EXISTS (<ignore this existence check for now>)
BEGIN
INSERT INTO [Dest].[dbo].[MyTable] ([DataField1],[FK_ID1],[LotsMoreFields]) VALUES (333,(SELECT [ID] FROM [Dest].[dbo].[FK1] WHERE [Name]=N'ValueFoundInSource'),N'LotsMoreValues');
INSERT INTO [Dest].[dbo].[ChangeTracking] ([Src_ID],[Dest_ID],[TableName]) VALUES (666,SCOPE_IDENTITY(),N'MyTable'); --666 is the PK in [Src].[dbo].[MyTable] for this inserted row
END
So when you have a million of these, it isn't quick.
Is there a recommended performant way of doing this?
As mentioned, the MERGE statement works well when you're looking at a complex JOIN condition (if any of these fields are different, update the record to match). You can also look into creating a HASHBYTES hash of the entire record to quickly find differences between source and target tables, though that can also be time-consuming on very large data sets.
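A sketch of the HASHBYTES idea, using the column names from the question's [dbo].[MyTable] (the join on [ID] and the CONCAT_WS separator are assumptions; CONCAT_WS needs SQL Server 2017+):

```sql
--Compare a per-row hash over the comparable columns to find differences quickly
SELECT s.[ID]
FROM [Src].[dbo].[MyTable] s
JOIN [Dest].[dbo].[MyTable] d
    ON d.[ID] = s.[ID] --assumes matching keys; use the tracking table otherwise
WHERE HASHBYTES('SHA2_256',
        CONCAT_WS('|', s.[DataField1], s.[FK_ID1], s.[LotsMoreFields]))
   <> HASHBYTES('SHA2_256',
        CONCAT_WS('|', d.[DataField1], d.[FK_ID1], d.[LotsMoreFields]));
```

Hashing reduces the comparison to one value per row, at the cost of computing the hash over every column on both sides.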
It sounds like you're making these updates like a front-end developer, by checking each row for a match and then doing the insert. It will be far more efficient to do the inserts with a single query. Below is an example that looks for names that are in the tblNewClient table, but not in the tblClient table:
INSERT INTO tblClient
( [Name] ,
TypeID ,
ParentID
)
SELECT nc.[Name] ,
nc.TypeID ,
nc.ParentID
FROM tblNewClient nc
LEFT JOIN tblClient cl
ON nc.[Name] = cl.[Name]
WHERE cl.ID IS NULL;
This will be way more efficient than doing it RBAR (row by agonizing row).
Taking the two answers from @RusselFox and putting them together, I reached this tentative solution (which looks a LOT more efficient):
MERGE INTO [Dest].[dbo].[MyTable] [MT_D]
USING (SELECT [MT_S].[ID] as [SrcID],[MT_S].[DataField1],[FK_1_D].[ID] as [FK_ID1],[MT_S].[LotsMoreFields]
FROM [Src].[dbo].[MyTable] [MT_S]
JOIN [Src].[dbo].[FK_1] ON [MT_S].[FK_ID1] = [FK_1].[ID]
JOIN [Dest].[dbo].[FK_1] [FK_1_D] ON [FK_1].[Name] = [FK_1_D].[Name]
) [SRC] ON 1 = 0
WHEN NOT MATCHED THEN
INSERT([DataField1],[FK_ID1],[LotsMoreFields])
VALUES ([DataField1],[FK_ID1],[LotsMoreFields])
OUTPUT [SRC].[SrcID],INSERTED.[ID],0,N'MyTable' INTO [Dest].[dbo].[ChangeTracking]([Src_ID],[Dest_ID],[AlreadyExists],[TableName]);

Why Clustered Index Scan in this query?

The execution plan for this query (SQL Server 2012) shows that a Clustered Index Scan is used for the internal sub-query on the PK index:
SELECT n3.id as node_id,x.id as id,
(select xv.value from xv
--with(forceseek)
where xv.id=x.id) as [value]
FROM x
INNER JOIN n3
ON x.[obj_id]=n3.id
AND n3.parent_id = '975422E0-5630-4545-8CF7-062D7DF72B6B'
The tables x and xv are master->details tables.
When I use the FORCESEEK hint, it shows a Clustered Index Seek and the query executes fast.
Why is there a Scan instead of a Seek?
How can I change the query to get an Index Seek without the FORCESEEK hint?
UPD:
The full demo script:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
/*
DROP TABLE [dbo].[xv]
DROP TABLE [dbo].[x]
DROP TABLE [dbo].[n3]
*/
CREATE TABLE [dbo].[n3](
[id] [uniqueidentifier] NOT NULL,
[parent_id] [uniqueidentifier] NOT NULL,
CONSTRAINT [PK_n3] PRIMARY KEY CLUSTERED
(
[id] ASC
)
)
GO
CREATE TABLE [dbo].[x](
[obj_id] [uniqueidentifier] NOT NULL,
[id] [int] IDENTITY(1,1) NOT NULL,
CONSTRAINT [PK_x] PRIMARY KEY CLUSTERED
(
[id] ASC
))
GO
ALTER TABLE [dbo].[x] WITH CHECK ADD CONSTRAINT [FK_x_n3] FOREIGN KEY([obj_id])
REFERENCES [dbo].[n3] ([id])
ON DELETE CASCADE
GO
ALTER TABLE [dbo].[x] CHECK CONSTRAINT [FK_x_n3]
GO
CREATE TABLE [dbo].[xv](
[id] [int] NOT NULL,
[value] [sql_variant] NOT NULL,
CONSTRAINT [PK_xv] PRIMARY KEY CLUSTERED
(
[id] ASC
))
GO
ALTER TABLE [dbo].[xv] WITH CHECK ADD CONSTRAINT [FK_xv_x] FOREIGN KEY([id])
REFERENCES [dbo].[x] ([id])
ON DELETE CASCADE
GO
ALTER TABLE [dbo].[xv] CHECK CONSTRAINT [FK_xv_x]
GO
INSERT INTO n3(id,parent_id)
select newid(), '975422E0-5630-4545-8CF7-062D7DF72B6B'
GO 10
INSERT INTO n3(id,parent_id)
select newid(), '805422E0-5630-4545-8CF7-062D7DF72B6B'
GO 5
INSERT INTO x([obj_id])
select id from n3 where parent_id='975422E0-5630-4545-8CF7-062D7DF72B6B';
insert into xv (id, value)
select id, cast(RAND(1) as sql_variant) from x
--select * from x
--select * from n3
SELECT n3.id as node_id,x.id as id,
(select xv.value from dbo.xv
--with(forceseek)
where xv.id=x.id
) as [value]
FROM dbo.x
INNER JOIN dbo.n3
ON x.[obj_id]=n3.id
AND n3.parent_id = '975422E0-5630-4545-8CF7-062D7DF72B6B'
/*
DROP TABLE [dbo].[xv]
DROP TABLE [dbo].[x]
DROP TABLE [dbo].[n3]
*/
--Update statistics xv with fullscan
I suspect the statistics of the xv table might be out of date. Update the statistics of xv and try running the query again.
Update statistics xv with fullscan
Update:
After looking at the data setup and query: for the given parent_id input, all the records in both x and xv match, so the optimizer chooses an index scan instead of a seek because it has to fetch all the records from both the x and xv tables.
Also, the number of records is small, so the optimizer will prefer a scan over a seek.
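If the goal is to avoid the hint, the correlated subquery can be rewritten as a join, which gives the optimizer a shape it often handles better. A sketch using the tables from the demo script (LEFT JOIN preserves the subquery's behavior of returning NULL when no xv row exists):

```sql
--Equivalent to the correlated subquery version, but as a single join tree
SELECT n3.id  AS node_id,
       x.id   AS id,
       xv.value AS [value]
FROM dbo.x
JOIN dbo.n3
    ON x.[obj_id] = n3.id
   AND n3.parent_id = '975422E0-5630-4545-8CF7-062D7DF72B6B'
LEFT JOIN dbo.xv
    ON xv.id = x.id;
```

Whether this produces a seek still depends on cardinality estimates; with up-to-date statistics and a selective parent_id, the optimizer has better information to choose one.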

New uniqueidentifier on the go

I want to add a column for a table which would become a PRIMARY KEY and be of type uniqueidentifier. I have this, but I wonder if there is a faster (in fewer code lines) way?
ALTER TABLE [table] ADD [id] [uniqueidentifier]
DEFAULT('00000000-0000-0000-0000-000000000000') NOT NULL
GO
UPDATE [table] SET [id] = NEWID()
GO
ALTER TABLE [table] ADD CONSTRAINT [PK_table_id] PRIMARY KEY CLUSTERED ([id])
GO
If you want to keep naming your constraints (and you should), I don't think we can reduce it below 2 statements:
create table T (
Col1 varchar(10) not null
)
go
insert into T (Col1)
values ('abc'),('def')
go
ALTER TABLE T ADD [id] [uniqueidentifier] constraint DF_T_id DEFAULT(NEWID()) NOT NULL
GO
ALTER TABLE T ADD constraint PK_T PRIMARY KEY CLUSTERED (id)
go
drop table T
Note that I've added a name for the default constraint. This also ensures that new rows get id values assigned. As I said in my comment, it's usually preferable to avoid clustering on values generated by NEWID(), since it leads to lots of fragmentation. If you want to avoid that, consider NEWSEQUENTIALID().
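A sketch of the NEWSEQUENTIALID() variant, reusing the table and constraint names from the example above (note that NEWSEQUENTIALID() is only valid in a column DEFAULT, not in ad hoc queries):

```sql
--Sequential GUIDs keep inserts at the end of the clustered index,
--avoiding the page splits that random NEWID() values cause
ALTER TABLE T ADD [id] [uniqueidentifier]
    CONSTRAINT DF_T_id DEFAULT (NEWSEQUENTIALID()) NOT NULL;
GO
ALTER TABLE T ADD CONSTRAINT PK_T PRIMARY KEY CLUSTERED (id);
GO
```

Existing rows get their values from the default when the NOT NULL column is added, so the separate UPDATE step is not needed.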
If you don't care about constraint names, you can do it as a single query:
ALTER TABLE T ADD [id] [uniqueidentifier] DEFAULT(NEWID()) NOT NULL PRIMARY KEY CLUSTERED