SQL Server Scripting Partitioning

Had a good look on the net and books online and couldn't find an answer to my question, so here goes.
Working on someone else's design, I have several tables all tied to the same partition schema and partition function. I wish to perform a split operation which would affect many hundreds of millions of rows.
To split is no problem:
ALTER PARTITION SCHEME [ps_Scheme] NEXT USED [FG1] ;
ALTER PARTITION FUNCTION [pfcn_Function]() SPLIT RANGE (20120331)
However, I'm concerned that this will affect many tables at once, which is not desirable.
Therefore, I was going to create a new copy of the table and do the split on a new function:
CREATE PARTITION FUNCTION [pfcn_Function1](INT)
AS RANGE RIGHT
FOR VALUES
(
20090101, 20090130, 20090131, 20090201...etc
)
CREATE PARTITION SCHEME [ps_Scheme1]
AS PARTITION [pfcn_Function1] TO
([FG1], [FG2], ...);
CREATE TABLE [dbo].[myTableCopy]
(
....
) ON ps_Scheme1
Then I would switch the partition I wish to split across:
-- The partition numbers did not align because they are based on 2 different functions.
ALTER TABLE [Table] SWITCH PARTITION 173 TO [TableCopy] PARTITION 172
Finally, my question: can this be automated? You can make a copy of a table easily in SQL using SELECT INTO, but I cannot see how to automate the partitioning of the table, i.e. the bit on the end of the CREATE TABLE statement that points to the partition scheme.
Thanks for any responses.

Found this on books online:
You can turn an existing nonpartitioned table into a partitioned table in one of two ways.
One way is to create a partitioned clustered index on the table by using the CREATE INDEX statement.
This action is similar to creating a clustered index on any table, because SQL Server essentially
drops the table and re-creates it in a clustered index format. If the table already has a
partitioned clustered index applied to it, you can drop the index and rebuild it on a partition
scheme by using CREATE INDEX with the DROP EXISTING = ON clause.
I think this might solve my problem.
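For example, the DROP_EXISTING route might look like this (a hedged sketch; the index, column, and scheme names below are placeholders for whatever your real schema uses):

```sql
-- Rebuild the existing clustered index onto the partition scheme in one
-- operation; SQL Server moves the rows as part of the rebuild.
CREATE CLUSTERED INDEX [CIX_myTable_DateKey]
ON [dbo].[myTable] ([DateKey])
WITH (DROP_EXISTING = ON)
ON [ps_Scheme1] ([DateKey]);
```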

It can be automated, but I'm not sure it's worth it. If it's only 'several' tables, not hundreds, then it's better to just script out each table and then build a script that does the copy out / split the copy / switch out / split the source / switch in.
Automating this would involve dynamically building the temp table definition(s), including all indexes, from sys.tables/sys.columns/sys.indexes/sys.index_columns and other similar views. Same way SMO Scripting does it.
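As a minimal sketch of that automation (names are placeholders; a real version would script out all columns, indexes, and constraints from the catalog views): since SELECT INTO cannot target a partition scheme, you can create an empty copy first and then move it onto the scheme by building its clustered index there.

```sql
-- Create an empty structural copy of the table (columns only; no
-- indexes, constraints, or partitioning are carried over).
DECLARE @sql nvarchar(max);
SET @sql = N'SELECT * INTO [dbo].[myTableCopy] FROM [dbo].[myTable] WHERE 1 = 0;';
EXEC (@sql);

-- Move the copy onto the partition scheme by creating its clustered
-- index there (partitioning column assumed to be [DateKey]).
SET @sql = N'CREATE CLUSTERED INDEX [CIX_myTableCopy]
             ON [dbo].[myTableCopy] ([DateKey])
             ON [ps_Scheme1] ([DateKey]);';
EXEC (@sql);
```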

Yes, you can switch partitions in an automated process. Here is a code sample you can customise. It is driven from a metadata table.
CREATE TABLE [dbo].[PartitionTableSetup](
[Id] [int] IDENTITY(1,1) NOT NULL,
[TableName] [varchar](256) NULL,
[SwitchTable] [varchar](256) NULL,
[Partition] [int] NULL)
DECLARE @merge nvarchar(max);
SELECT @merge = (
Select N'' + com + '' from (
Select N' ALTER TABLE '
+ TableName +
' SWITCH PARTITION 2 TO '
+ SwitchTable
+ ' PARTITION 2 Truncate table '
+ SwitchTable as com
,value
,1 as ord
From (
SELECT convert(datetime,value) as value
,pt.TableName
,pt.SwitchTable
FROM sys.partition_range_values AS RV
JOIN sys.partition_functions AS PF
ON RV.function_id = PF.function_id
Join dbo.[Partitions] pr
On name = PartitionFunction
Join dbo.PartitionTableSetup pt
On pt.[Partition] = pr.ID
WHERE datediff(d,convert(datetime,value),GETDATE()) > pr.[Range] -3
) a
Union all
Select N' ALTER PARTITION FUNCTION '
+ b.PartitionFunction
+ '() MERGE RANGE ('''
+ Convert(nvarchar,value,121)
+''')' as com
,value
,2 as ord
From (
SELECT convert(datetime,value) as value
,pr.PartitionFunction
FROM sys.partition_range_values AS RV
JOIN sys.partition_functions AS PF
ON RV.function_id = PF.function_id
Join dbo.[Partitions] pr
On name = PartitionFunction
WHERE datediff(d,convert(datetime,value),GETDATE()) > pr.[Range] -3
) b
) c Order by value
, ord
for xml path ('')
)
EXECUTE (@merge)
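For context, the metadata table above might be populated along these lines (the table names here are purely hypothetical):

```sql
INSERT INTO dbo.PartitionTableSetup (TableName, SwitchTable, [Partition])
VALUES (N'dbo.FactSales',  N'dbo.FactSales_Switch',  1),
       (N'dbo.FactOrders', N'dbo.FactOrders_Switch', 2);
```

Each SwitchTable must have the same structure as its source and live on the same filegroup for the SWITCH to succeed.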

Related

Query tuning required for expensive query

Can someone help me optimize this code? I have another way to optimize it by using a computed column, but we cannot change the schema on prod, as we are not sure how many APIs are used to push data into this table. This table has millions of rows, and adding a non-clustered index is not helping: because of the query cost, it still goes for a scan.
create table testcts(
name varchar(100)
)
go
insert into testcts(
name
)
select 'VK.cts.com'
union
select 'GK.ms.com'
go
DECLARE @list varchar(100) = 'VK,GK'
select * from testcts where replace(replace(name,'.cts.com',''),'.ms.com','') in (select value from string_split(@list,','))
drop table testcts
One possibility might be to strip off the .cts.com and .ms.com subdomain/domain endings before you insert or store the name data in your table. Then, use the following query instead:
SELECT *
FROM testcts
WHERE name IN (SELECT value FROM STRING_SPLIT(@list, ','));
Now SQL Server should be able to use an index on the name column.
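Assuming no suitable index exists yet, that could simply be:

```sql
CREATE NONCLUSTERED INDEX IX_testcts_name
ON dbo.testcts (name);
```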
If your values are always suffixed by cts.com or ms.com you could add that to the search pattern:
SELECT {YourColumns} --Don't use *
FROM dbo.testcts t
JOIN (SELECT CONCAT(SS.[value], V.Suffix) AS [value]
FROM STRING_SPLIT(@list, ',') SS
CROSS APPLY (VALUES ('.cts.com'),
('.ms.com')) V (Suffix) ) L ON t.[name] = L.[value];

How does one Remove a Partition from a Table?

I have managed to add a Partition to a Table (Logs) but needed to create a rollback script in case it needs to be removed. Unfortunately, the rollback script has now failed part-way through, Logs now has no primary key as a result, and I have no way to add it back, as I get the error...
Column 'SuperLogId' is partitioning column of the index 'PK__Logs__0E6B88F2'. Partition columns for a unique index must be a subset of the index key.
when trying to run this:
ALTER TABLE dbo.Logs
ADD PRIMARY KEY CLUSTERED (Id ASC)
So I tried following this guide (https://www.patrickkeisler.com/2013/01/how-to-remove-undo-table-partitioning.html) and ended up having to write this to generate a script to merge all my dynamically-created partitions.
DECLARE @partitionsTable dbo.NVarCharCollectionTableType --User-defined table type to hold a collection of NVarChars.
INSERT INTO @partitionsTable
SELECT CONCAT('ALTER PARTITION FUNCTION Logs_SuperLogId_PartitionFunction() MERGE RANGE (', CONVERT(NVARCHAR, [Value]), ')')
FROM SYS.PARTITION_SCHEMES
INNER JOIN SYS.PARTITION_FUNCTIONS ON PARTITION_FUNCTIONS.FUNCTION_ID = PARTITION_SCHEMES.FUNCTION_ID
INNER JOIN SYS.PARTITION_RANGE_VALUES ON PARTITION_RANGE_VALUES.FUNCTION_ID = PARTITION_FUNCTIONS.FUNCTION_ID
WHERE PARTITION_SCHEMES.Name = 'Logs_SuperLogId_PartitionScheme'
AND PARTITION_FUNCTIONS.Name = 'Logs_SuperLogId_PartitionFunction'
ORDER BY [Value] ASC
DECLARE @statement NVARCHAR(MAX)
SELECT @statement =
CASE
WHEN @statement IS NULL
THEN CAST([Text] AS NVARCHAR(MAX))
ELSE CONCAT(@statement, '; ', [Text])
END
FROM @partitionsTable
ORDER BY [Text] ASC
SELECT @statement
EXECUTE sp_executesql @statement
ALTER PARTITION SCHEME Logs_SuperLogId_PartitionScheme NEXT USED [PRIMARY]
The guide suggested this would help, but it didn't! I still get the same error when trying to re-add the Primary Key, and I still get these errors when trying to drop the Partition Function and Partition Scheme!
DROP PARTITION SCHEME Logs_SuperLogId_PartitionScheme
The partition scheme "Logs_SuperLogId_PartitionScheme" is currently being used to partition one or more tables.
DROP PARTITION FUNCTION CatLogs_CatSessionLogId_PartitionFunction
Partition function 'Logs_SuperLogId_PartitionFunction' is being used by one or more partition schemes.
How is my Partition Scheme still being used? Why can't I just get rid of it and it be not used anymore? I just want to de-partition my Logs table and re-add its original clustered primary key (which I had to previously remove and replace with a non-clustered primary key to make SuperLogId have a clustered index on it so it could be partitioned upon).
Update:
I was able to use the following hack to get the Partition removed from my table but I still can't drop the Partition Scheme or Function.
--HACK: Dummy Index to disassociate the table from the partitioning scheme.
CREATE CLUSTERED INDEX IX_Logs_Id ON dbo.Logs(Id) ON [Primary]
--Now that the table has been disassociated with the partition, this dummy index can be dropped.
DROP INDEX IX_Logs_Id ON dbo.Logs
I have since run this script to find out which tables are using any Partitions in my database, and it returns nothing, as expected.
SELECT DISTINCT TABLES.NAME
FROM SYS.PARTITIONS
INNER JOIN SYS.TABLES ON PARTITIONS.OBJECT_ID = TABLES.OBJECT_ID
WHERE PARTITIONS.PARTITION_NUMBER <> 1
This allowed me to re-add the Primary Key, but I still get the The partition scheme "Logs_SuperLogId_PartitionScheme" is currently being used... error when trying to drop the Partition Scheme.
Based on the Microsoft documentation (https://learn.microsoft.com/en-us/sql/t-sql/statements/drop-partition-scheme-transact-sql?view=sql-server-2017), the Partition Scheme should be droppable if there are no tables or indexes referencing it. Therefore I subsequently also ran this script to check for an index using it...
SELECT DISTINCT indexes.NAME
FROM SYS.PARTITIONS
INNER JOIN SYS.indexes ON indexes.index_id = partitions.index_id
WHERE PARTITIONS.PARTITION_NUMBER <> 1
...And it returned nothing! So what on earth is using my Partition Scheme?!
I was able to remove the Partition from its table with the following code.
--HACK: Dummy Index to disassociate the table from the partitioning scheme.
CREATE CLUSTERED INDEX IX_Logs_Id ON dbo.Logs(Id) ON [Primary]
--Now that the table has been disassociated with the partition, this dummy index can be dropped.
DROP INDEX IX_Logs_Id ON dbo.Logs
Then, using the following script, I found that two indexes were still holding onto the Partition Scheme.
SELECT SCHEMA_NAME(B.SCHEMA_ID) SCHEMANAME, B.NAME TABLENAME, C.INDEX_ID, C.NAME INDEXNAME, C.TYPE_DESC,
A.PARTITION_NUMBER, D.NAME DATASPACENAME, F.NAME SCHEMADATASPACENAME,
H.VALUE DATARANGEVALUE, A.ROWS,
J.IN_ROW_RESERVED_PAGE_COUNT, J.LOB_RESERVED_PAGE_COUNT,
J.IN_ROW_RESERVED_PAGE_COUNT+J.LOB_RESERVED_PAGE_COUNT TOTALPAGECOUNT,
I.LOCATION
FROM SYS.PARTITIONS A
JOIN SYS.TABLES B ON A.OBJECT_ID = B.OBJECT_ID
JOIN SYS.INDEXES C ON A.OBJECT_ID = C.OBJECT_ID AND A.INDEX_ID = C.INDEX_ID
JOIN SYS.DATA_SPACES D ON C.DATA_SPACE_ID = D.DATA_SPACE_ID
LEFT JOIN SYS.DESTINATION_DATA_SPACES E ON E.PARTITION_SCHEME_ID = D.DATA_SPACE_ID AND A.PARTITION_NUMBER = E.DESTINATION_ID
LEFT JOIN SYS.DATA_SPACES F ON E.DATA_SPACE_ID = F.DATA_SPACE_ID
LEFT JOIN SYS.PARTITION_SCHEMES G ON D.NAME = G.NAME
LEFT JOIN SYS.PARTITION_RANGE_VALUES H ON G.FUNCTION_ID = H.FUNCTION_ID AND H.BOUNDARY_ID = A.PARTITION_NUMBER
LEFT JOIN (SELECT DISTINCT DATA_SPACE_ID, LEFT(PHYSICAL_NAME, 1) LOCATION FROM SYS.DATABASE_FILES) I ON I.DATA_SPACE_ID = ISNULL(F.DATA_SPACE_ID, D.DATA_SPACE_ID)
LEFT JOIN SYS.DM_DB_PARTITION_STATS J ON J.OBJECT_ID = A.OBJECT_ID AND J.INDEX_ID = A.INDEX_ID AND J.PARTITION_NUMBER = A.PARTITION_NUMBER
ORDER BY 1, 2, 3, A.PARTITION_NUMBER
All I had to do was drop the two indexes referencing the Partition Scheme; that then allowed me to drop the Partition Scheme, and then the Partition Function.
Taking the SSMS UI route (rather than figuring out all the DDL): right-click the partitioned table in Object Explorer, choose Design, right-click the design area, choose Indexes, select each partitioned index, expand Data Space Specification, and in the Data Space Type dropdown select "Filegroup". Your index will be off the partition and back on PRIMARY.
However, you're not done. Hit F4 to bring up the table properties on the right, and follow the same process. Remember to Save when you're done. Freedom!
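If you prefer DDL over the UI, the equivalent step is a rebuild onto PRIMARY with DROP_EXISTING (the index name is a placeholder for whatever the partitioned index is actually called):

```sql
CREATE CLUSTERED INDEX [CIX_Logs]
ON dbo.Logs (Id)
WITH (DROP_EXISTING = ON)
ON [PRIMARY];
```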

How to create high performance SQL select query, if I need a condition for referrer table's records?

For example, I have 2 tables, which I need for my query, Property and Move for history of moving properties.
I must create a query that returns all properties plus one additional boolean column, IsInService, which will be true when the Move table has a record for the property with DateTo = null and MoveTypeID = 1 ("In service").
I have created this query:
SELECT
[ID], [Name],
(SELECT COUNT(*)
FROM [Move]
WHERE PropertyID = p.ID
AND DateTo IS NULL
AND MoveTypeID = 1) AS IsInService
FROM
[Property] as p
ORDER BY
[Name] ASC
OFFSET 100500 ROWS FETCH NEXT 50 ROWS ONLY;
I'm not so strong in SQL, but as far as I know, subqueries are evil :)
How to create high performance SQL query in my case, if it is expected that these tables will include millions of records?
I've updated the code based on your comment. If you need something else, please provide the input and expected output data. This is about all I can do based on inference from the existing comments. Further, this isn't intended to give you an exact working solution; my intention was to give you a prototype from which you can build your own.
That said:
The code below is the basic join that you need. However, keep in mind that indexing is probably going to play as big a part in performance as the structure of the table and the query. It doesn't matter how you query the tables if the indexes aren't there to support the queries once you reach a certain size. There are a lot of resources online for indexing but viewing querying plans should be at the top of your list.
As a note, your column [dbo].[Property] ([Name]) should probably be NVARCHAR to allow SQL to minimize data storage. Indexes on that column will then be smaller and searches/updates faster.
DECLARE @Property AS TABLE
(
[ID] INT
, [Name] NVARCHAR(100)
);
INSERT INTO @Property
([ID]
, [Name])
VALUES (1,N'A'),
(2,N'B'),
(3,N'C');
DECLARE @Move AS TABLE
(
[ID] INT
, [DateTo] DATE
, [MoveTypeID] INT
, [PropertyID] INT
);
INSERT INTO @Move
([ID]
, [DateTo]
, [MoveTypeID]
, [PropertyID])
VALUES (1,NULL,1,1),
(2,NULL,1,2),
(3,N'2017-12-07',1,2);
SELECT [Property].[ID] AS [property_id]
, [Property].[Name] AS [property_name]
, CASE
WHEN [Move].[DateTo] IS NULL
AND [Move].[MoveTypeID] = 1 THEN
N'true'
ELSE
N'false'
END AS [in_service]
FROM @Property AS [Property]
LEFT JOIN @Move AS [Move]
ON [Move].[PropertyID] = [Property].[ID]
-- Filters belong in the ON clause so the LEFT JOIN still returns
-- properties with no open move (they fall through to N'false')
AND [Move].[DateTo] IS NULL
AND [Move].[MoveTypeID] = 1;
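Another option, keeping the original query's shape, is CASE WHEN EXISTS; it cannot return duplicate rows when a property has several open moves, and the optimizer typically evaluates it as a semi join:

```sql
SELECT p.[ID], p.[Name],
       CASE WHEN EXISTS (SELECT 1
                         FROM [Move] m
                         WHERE m.PropertyID = p.ID
                           AND m.DateTo IS NULL
                           AND m.MoveTypeID = 1)
            THEN 1 ELSE 0 END AS IsInService
FROM [Property] AS p
ORDER BY p.[Name] ASC
OFFSET 100500 ROWS FETCH NEXT 50 ROWS ONLY;
```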

Adding clustered index on temp table to improve performance

I have run an execution plan and noticed that the query is taking time while inserting into temp tables. We have multiple queries that insert into temp tables; I have shared two of them below. How do I add a clustered index to a temp table via the stored procedure? It needs to create the index on the fly and destroy it afterwards.
if object_id('tempdb..#MarketTbl') is not null drop table #MarketTbl else
select
mc.companyId,
mc.pricingDate,
mc.tev,
mc.sharesOutstanding,
mc.marketCap
into #MarketTbl
from ciqMarketCap mc
where mc.pricingDate > @date
and mc.companyId in (select val from #companyId)
---- pricing table: holds pricing data for the stock pprice
if object_id('tempdb..#PricingTbl') is not null drop table #PricingTbl else
select
s.companyId,
peq.pricingDate,
ti.currencyId,
peq.priceMid
into #PricingTbl
from ciqsecurity s
join ciqtradingitem ti on s.securityid = ti.securityid
join ciqpriceequity peq on peq.tradingitemid = ti.tradingitemid
where s.primaryFlag = 1
and s.companyId in (select val from #companyId)
and peq.pricingDate > @date
and ti.primaryflag = 1
Execution plan
What you are doing is pure nonsense. You have to speed up your select, not the insert.
And to speed it up you (maybe) need indexes on the tables from which you select.
What you are doing now is trying to add a clustered index to a table that does not exist (the error tells you so), and the table does not exist because, if it exists, you drop it.
1. First, if your data is not more than 5 to 10 thousand rows, do not use a temp table; use a table-type variable.
2. You can create the index after inserting the data, using ALTER TABLE syntax.
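As a sketch of creating the index after the load, applied to the first query above (the index key is a guess; choose whatever the later queries join or filter on):

```sql
SELECT
    mc.companyId,
    mc.pricingDate,
    mc.tev,
    mc.sharesOutstanding,
    mc.marketCap
INTO #MarketTbl
FROM ciqMarketCap mc
WHERE mc.pricingDate > @date
  AND mc.companyId IN (SELECT val FROM #companyId);

-- Add the index after the load, so the insert itself is not slowed down
CREATE CLUSTERED INDEX cix_MarketTbl
ON #MarketTbl (companyId, pricingDate);
```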

Dealing with large amounts of data, and a query with 12 inner joins in SQL Server 2008

There is an old SSIS package that pulls a lot of data from oracle to our Sql Server Database everyday. The data is inserted into a non-normalized database, and I'm working on a stored procedure to select that data, and insert it into a normalized database. The Oracle databases were overly normalized, so the query I wrote ended up having 12 inner joins to get all the columns I need. Another problem is that I'm dealing with large amounts of data. One table I'm selecting from has over 12 million records. Here is my query:
DECLARE @MewLive TABLE
(
UPC_NUMBER VARCHAR(50),
ITEM_NUMBER VARCHAR(50),
STYLE_CODE VARCHAR(20),
COLOR VARCHAR(8),
SIZE VARCHAR(8),
UPC_TYPE INT,
LONG_DESC VARCHAR(120),
LOCATION_CODE VARCHAR(20),
TOTAL_ON_HAND_RETAIL NUMERIC(14,0),
VENDOR_CODE VARCHAR(20),
CURRENT_RETAIL NUMERIC(14,2)
)
INSERT INTO @MewLive(UPC_NUMBER,ITEM_NUMBER,STYLE_CODE,COLOR,[SIZE],UPC_TYPE,LONG_DESC,LOCATION_CODE,TOTAL_ON_HAND_RETAIL,VENDOR_CODE,CURRENT_RETAIL)
SELECT U.UPC_NUMBER, REPLACE(ST.STYLE_CODE, '.', '')
+ '-' + SC.SHORT_DESC + '-' + REPLACE(SM.PRIM_SIZE_LABEL, '.', '') AS ItemNumber,
REPLACE(ST.STYLE_CODE, '.', '') AS Style_Code, SC.SHORT_DESC AS Color,
REPLACE(SM.PRIM_SIZE_LABEL, '.', '') AS Size, U.UPC_TYPE, ST.LONG_DESC, L.LOCATION_CODE,
IB.TOTAL_ON_HAND_RETAIL, V.VENDOR_CODE, SD.CURRENT_RETAIL
FROM MewLive.dbo.STYLE AS ST INNER JOIN
MewLive.dbo.SKU AS SK ON ST.STYLE_ID = SK.STYLE_ID INNER JOIN
MewLive.dbo.UPC AS U ON SK.SKU_ID = U.SKU_ID INNER JOIN
MewLive.dbo.IB_INVENTORY_TOTAL AS IB ON SK.SKU_ID = IB.SKU_ID INNER JOIN
MewLive.dbo.LOCATION AS L ON IB.LOCATION_ID = L.LOCATION_ID INNER JOIN
MewLive.dbo.STYLE_COLOR AS SC ON ST.STYLE_ID = SC.STYLE_ID INNER JOIN
MewLive.dbo.COLOR AS C ON SC.COLOR_ID = C.COLOR_ID INNER JOIN
MewLive.dbo.STYLE_SIZE AS SS ON ST.STYLE_ID = SS.STYLE_ID INNER JOIN
MewLive.dbo.SIZE_MASTER AS SM ON SS.SIZE_MASTER_ID = SM.SIZE_MASTER_ID INNER JOIN
MewLive.dbo.STYLE_VENDOR AS SV ON ST.STYLE_ID = SV.STYLE_ID INNER JOIN
MewLive.dbo.VENDOR AS V ON SV.VENDOR_ID = V.VENDOR_ID INNER JOIN
MewLive.dbo.STYLE_DETAIL AS SD ON ST.STYLE_ID = SD.STYLE_ID
WHERE (U.UPC_TYPE = 1) AND (ST.ACTIVE_FLAG = 1)
That query pretty much crashes our server. I tried to fix the problem by breaking the query up into smaller queries, but the table variable I use causes the tempdb database to fill the hard drive. I figure this is because the server runs out of memory and crashes. Is there any way to solve this problem?
Have you tried using a real table instead of a temporary one? You can use SELECT INTO to create a real table to store the results instead of a temporary one.
Syntax would be:
SELECT
U.UPC_NUMBER,
REPLACE(ST.STYLE_CODE, '.', ''),
....
INTO
MEWLIVE
FROM
MewLive.dbo.STYLE AS ST INNER JOIN
...
The command will create the table, and may help with the memory issues you are seeing.
Additionally try looking at the execution plan in query analyser or try the index tuning wizard to suggest some indexes that may help speed up the query.
Try running the query from the Oracle server rather than from the SQL server. As it stands, there's most likely going to be a lot of communication over the wire as the query tries to process.
By pre-processing the joins (maybe with a view), you'll only be sending over the results.
Regarding the over-normalization: have you tested whether or not it's an issue in terms of speed? I find it hard to believe that it could be too normalized.
Proper indexing will definitely help, if the number of rows in this query is not over "zillions" of rows.
Try the following:
The join on dbo.COLOR is excessive if there is a foreign key dbo.STYLE_COLOR(COLOR_ID) => dbo.COLOR(COLOR_ID).
Proper indexes (possibly excessive; should be reviewed):
USE MewLive
CREATE INDEX ix1 ON dbo.STYLE_DETAIL (STYLE_ID)
INCLUDE (STYLE_CODE, LONG_DESC)
WHERE ACTIVE_FLAG = 1
GO
CREATE INDEX ix2 ON dbo.UPC (SKU_ID)
INCLUDE(UPC_NUMBER)
WHERE UPC_TYPE = 1
GO
CREATE INDEX ix3 ON dbo.SKU(STYLE_ID)
INCLUDE(SKU_ID)
GO
CREATE INDEX ix3_alternative ON dbo.SKU(SKU_ID)
INCLUDE(STYLE_ID)
GO
CREATE INDEX ix4 ON dbo.IB_INVENTORY_TOTAL(SKU_ID, LOCATION_ID)
INCLUDE(TOTAL_ON_HAND_RETAIL)
GO
CREATE INDEX ix5 ON dbo.LOCATION(LOCATION_ID)
INCLUDE(LOCATION_CODE)
GO
CREATE INDEX ix6 ON dbo.STYLE_COLOR(STYLE_ID)
INCLUDE(SHORT_DESC,COLOR_ID)
GO
CREATE INDEX ix7 ON dbo.COLOR(COLOR_ID)
GO
CREATE INDEX ixB ON dbo.STYLE_SIZE(STYLE_ID)
INCLUDE(SIZE_MASTER_ID)
GO
CREATE INDEX ix8 ON dbo.SIZE_MASTER(SIZE_MASTER_ID)
INCLUDE(PRIM_SIZE_LABEL)
GO
CREATE INDEX ix9 ON dbo.STYLE_VENDOR(STYLE_ID)
INCLUDE(VENDOR_ID)
GO
CREATE INDEX ixA ON dbo.VENDOR(VENDOR_ID)
INCLUDE(VENDOR_CODE)
GO
CREATE INDEX ixC ON dbo.STYLE_DETAIL(STYLE_ID)
INCLUDE(CURRENT_RETAIL)
In the SELECT list, replace U.UPC_TYPE, with 1 AS UPC_TYPE, (the WHERE clause already fixes its value to 1).
Can you segregate the imports - batch them by SKU/location/vendor/whatever and run multiple queries to get the data over? Is there a particular reason it all needs to go across in one hit (apart from the ease of writing the query)?
