How do you set up partitioning by year when most fact tables have a datetime2 data type?

We're using SQL Server 2019. Our fact tables use datetime2, but I want to partition by year.
I don't have sysadmin privileges, so I can't set up different filegroups. I can create partition functions and partition schemes, but it isn't clear to me how to set up the partition scheme so that when I partition the table on ActivityLog, for example, it will store entries in their respective year partitions.
I've searched the web and haven't found answers as to how it all works.

Partitioning by year on a datetime2 column in a fact table can be a useful technique for managing large data sets, improving query performance, and reducing maintenance costs. Here are the steps to set up partitioning by year:
Define a partition function: A partition function defines the boundary
values that divide the data into partitions. To partition by year, use one
boundary per year start. With RANGE RIGHT, each boundary value belongs to
the partition on its right, so every partition begins at midnight on
January 1:
CREATE PARTITION FUNCTION pfFactTableByYear (datetime2(0))
AS RANGE RIGHT FOR VALUES
('2010-01-01T00:00:00', '2011-01-01T00:00:00', '2012-01-01T00:00:00',
 '2013-01-01T00:00:00', '2014-01-01T00:00:00', '2015-01-01T00:00:00',
 '2016-01-01T00:00:00', '2017-01-01T00:00:00', '2018-01-01T00:00:00',
 '2019-01-01T00:00:00', '2020-01-01T00:00:00');
Define a partition scheme: A partition scheme maps each partition of the
function to a filegroup. Eleven boundary values produce twelve partitions,
so the scheme needs either twelve filegroups or the ALL TO clause, which
places every partition on a single filegroup. Since you can't create
filegroups, map everything to an existing one such as PRIMARY:
CREATE PARTITION SCHEME psFactTableByYear
AS PARTITION pfFactTableByYear
ALL TO ([PRIMARY]);
Create the fact table with partitioning: Create the fact table on the
partition scheme, naming the partitioning column in the ON clause. For
example:
CREATE TABLE FactTable
(
    Id INT IDENTITY(1,1),
    DateColumn datetime2(0) NOT NULL,
    ValueColumn decimal(18,2) NOT NULL,
    CONSTRAINT PK_FactTable PRIMARY KEY (Id, DateColumn)
)
ON psFactTableByYear(DateColumn);
This creates a fact table whose primary key includes the partitioning column (DateColumn), which a unique index must do to be partitioned on the same scheme, and stores the table's rows on the partition scheme psFactTableByYear.
Load data into the fact table: Once the fact table is created, you can load
data into it using standard INSERT statements.
Perform maintenance tasks: As time goes on, new partitions will need to be
created to accommodate new data. You can do this with a scheduled
maintenance script that runs ALTER PARTITION SCHEME ... NEXT USED followed
by ALTER PARTITION FUNCTION ... SPLIT RANGE, ideally while the partition
being split is still empty. You may also want to periodically archive or
remove old data, for example with partition switching, to keep the data set
manageable.
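As a rough illustration of the maintenance step, adding a partition for the next year might look like the sketch below. It assumes the function and scheme names used above and that PRIMARY is an acceptable target filegroup; the 2021 boundary is just the next value in the series.

```sql
-- Tell the scheme which filegroup the next new partition should use.
-- (PRIMARY is an assumption here, chosen because no extra filegroups exist.)
ALTER PARTITION SCHEME psFactTableByYear NEXT USED [PRIMARY];

-- Add a boundary so rows dated 2021-01-01 or later get their own partition.
-- Run this before any 2021 data arrives, so the split moves no rows.
ALTER PARTITION FUNCTION pfFactTableByYear()
SPLIT RANGE ('2021-01-01T00:00:00');
```

Running this once a year, ahead of the first row for the new year, keeps the split a metadata-only change.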
Note that partitioning by year is just one option for partitioning a fact table, and the partition function and scheme would need to be adjusted accordingly for other partitioning strategies, such as partitioning by month, quarter, or some other time period.
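To check that rows are landing in the expected year partitions, a query along these lines can help. It is a minimal sketch that assumes the FactTable and pfFactTableByYear names from the steps above.

```sql
-- Count rows per partition and show the date range each partition holds.
SELECT
    $PARTITION.pfFactTableByYear(DateColumn) AS partition_number,
    MIN(DateColumn) AS min_date,
    MAX(DateColumn) AS max_date,
    COUNT(*)        AS row_count
FROM FactTable
GROUP BY $PARTITION.pfFactTableByYear(DateColumn)
ORDER BY partition_number;
```

Each row of the result should cover a single calendar year, except for the first and last partitions, which are open-ended.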

Related

Altering/Editing an already partitioned table

What steps to take to add additional partitions to the end of an already partitioned table in SQL Server?
Conditions:
The partition function is RANGE RIGHT.
The table is considered a VLTB.
No DB downtime is acceptable (<10min).
Also, How to verify the partitions and rows are correctly mapped?

Addressing your questions in turn:
What steps to take to add additional partitions to the end of an already partitioned table in SQL Server?
Partitioned tables are built on partition schemes, which themselves are built on partition functions. Partition functions explicitly specify partition boundaries, which implicitly define the partitions. To add a new partition to the table, you need to alter the partition function to add a new partition boundary. The syntax for that is ALTER PARTITION FUNCTION ... SPLIT RANGE. For example, let's say that you have an existing partition function on a datetime data type that defines monthly partitions.
CREATE PARTITION FUNCTION PF_Monthly(datetime)
AS RANGE RIGHT FOR VALUES (
'2022-10-01',
'2022-11-01',
'2022-12-01',
'2023-01-01'
);
Pausing there to look at the last two partitions in the current setup: the next-to-last partition is defined as 2022-12-01 <= x < 2023-01-01, while the last partition is defined as 2023-01-01 <= x. In other words, the next-to-last partition is bounded to the month of December 2022, while the last partition is unbounded on the high side and includes data for January 2023 and anything later.
If you want to bound the last partition to just January 2023, you'll add a partition boundary to the function for the high side of that partition. There's a small catch in that you'll also need to alter the partition scheme to tell SQL Server where to put the data, but that's a small thing.
ALTER PARTITION SCHEME PS_Monthly
NEXT USED someFileGroup;
ALTER PARTITION FUNCTION PF_Monthly()
SPLIT RANGE ('2023-02-01');
At this point, what used to be your highest partition is now defined as 2023-01-01 <= x < 2023-02-01 and the highest partition is defined as 2023-02-01 <= x. I should note that adding a boundary to a partition function will affect all tables that use it. When I was using table partitioning at a previous job, I had a rule to have only one table using a given partition function (even if they were logically equivalent).
No DB downtime is acceptable (<10min)
The above exposition doesn't mention one important point - if there is data in either side of the new boundary, a new B-tree is going to be built for it (which is a size-of-data operation). There's more on that in the documentation. To keep that at a minimum, I like to keep two empty partitions at the end of the scheme. Using my above example, that would mean that I'd have added the January partition boundary in November. By doing it this way, you have some leeway in when the actual partition split happens (i.e. if it's a bit late, you're not accidentally incurring data movement). I'd also put in monitoring that's something along the lines of "if the highest partition boundary is less than 45 days away, alert". A slightly more sophisticated but more correct alert would be "if there is data in the second to last partition, send an alert".
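The first alert described above could be sketched as a catalog query. This assumes the PF_Monthly function from the example; the 45-day threshold comes from the text, and how the result gets wired into an alerting tool is left open.

```sql
-- Returns one row only when the highest boundary of PF_Monthly is less
-- than 45 days away, i.e. when it is time to add more boundaries.
SELECT MAX(CAST(prv.value AS datetime)) AS highest_boundary
FROM sys.partition_functions AS pf
JOIN sys.partition_range_values AS prv
    ON prv.function_id = pf.function_id
WHERE pf.name = N'PF_Monthly'
HAVING MAX(CAST(prv.value AS datetime)) < DATEADD(DAY, 45, GETDATE());
```

An empty result means there is still enough headroom.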
Also, How to verify the partitions and rows are correctly mapped?
You can query the DMVs for this. I like using the script in this blog post. There's also the $PARTITION() function if you want to see which partition specific rows in your table belong to.
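As one possible shape for such a catalog query (not the blog script referred to above), the sketch below lists each partition with its boundary value and row count. The table name dbo.MyPartitionedTable is a placeholder.

```sql
-- One row per partition of the heap or clustered index.
-- boundary_value is the inclusive lower bound for RANGE RIGHT functions
-- and the inclusive upper bound for RANGE LEFT functions; it is NULL for
-- the one partition that has no boundary on that side.
SELECT
    p.partition_number,
    prv.value AS boundary_value,
    p.rows    AS row_count
FROM sys.indexes AS i
JOIN sys.partitions AS p
    ON p.object_id = i.object_id AND p.index_id = i.index_id
JOIN sys.partition_schemes AS ps
    ON ps.data_space_id = i.data_space_id
JOIN sys.partition_functions AS pf
    ON pf.function_id = ps.function_id
LEFT JOIN sys.partition_range_values AS prv
    ON prv.function_id = pf.function_id
   AND prv.boundary_id = p.partition_number - pf.boundary_value_on_right
WHERE i.object_id = OBJECT_ID(N'dbo.MyPartitionedTable')  -- placeholder name
  AND i.index_id IN (0, 1)
ORDER BY p.partition_number;
```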

Moving partition to a new filegroup

I have a huge table which is partitioned by date.
We have 8 partitions all on different file groups, with one of these file groups being PRIMARY.
I would like to replace the PRIMARY file group with a new file group called 'FG_odsvr_misc', and remove PRIMARY from the partition schema.
How would I achieve this without creating a new table with a new partition function?
The boundaries look like below -
The partition function is as below -
CREATE PARTITION FUNCTION [fn_odstable1](numeric(9,0))
AS RANGE LEFT FOR VALUES (20151231, 20161231, 20171231, 20181231, 20191231, 20201231, 20211231)
The partition scheme is as below -
CREATE PARTITION SCHEME [sch_odstable1] AS PARTITION [fn_odstable1]
TO ([FG_odsvr_pre_2016], [FG_odsvr_2016], [FG_odsvr_2017], [FG_odsvr_2018], [FG_odsvr_2019], [FG_odsvr_2020], [FG_odsvr_2021], [PRIMARY])

OK. The partition you have on the PRIMARY filegroup is the so-called "permanent partition".
From Dan Guzman's Table Partitioning Best Practices:
You might not be aware that each partition scheme has a permanent
partition that can never be removed. This is the first partition of a
RANGE RIGHT function and the last partition of a RANGE LEFT one. Be
mindful of this permanent partition when creating a new partition
scheme when multiple filegroups are involved because the filegroup on
which this permanent partition is created is determined when the
partition scheme is created and cannot be removed from the scheme.
. . .
Consider mapping partitions containing data outside the expected range
to a dummy filegroup with no underlying files. This will guarantee
data integrity much like a check constraint because data outside the
allowable range cannot be inserted. If you must accommodate errant
data rather than rejecting it outright, instead map these partitions
to a generalized filegroup like DEFAULT or one designated specifically
for that purpose.
http://www.dbdelta.com/table-partitioning-best-practices/
Since this is a RANGE LEFT partition scheme you can move all the data off of PRIMARY onto a new filegroup by splitting the rightmost partition at a boundary point greater than the greatest value present in your table.
ALTER PARTITION SCHEME sch_odstable1 NEXT USED [FG_odsvr_2022];
ALTER PARTITION FUNCTION fn_odstable1() SPLIT RANGE (20221231);
The rightmost partition will still be on PRIMARY, though. You'll just need to create your future partitions before you need them to keep that partition empty. If you want to, you can create a new partition scheme:
ALTER DATABASE CURRENT ADD FILEGROUP no_files_cant_be_used;
CREATE PARTITION SCHEME [sch_odstable2] AS PARTITION [fn_odstable1]
TO ([FG_odsvr_pre_2016], [FG_odsvr_2016], [FG_odsvr_2017], [FG_odsvr_2018], [FG_odsvr_2019], [FG_odsvr_2020], [FG_odsvr_2021], [FG_odsvr_2022], no_files_cant_be_used);
And then create a matching table on the new scheme, ALTER TABLE SWITCH to move all the partitions to the new table, and then rename the tables.
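The switch-and-rename step might look roughly like the sketch below. The table names dbo.odstable1 and dbo.odstable1_new are hypothetical (the question does not name the table), and the sketch assumes the new table has an identical structure and indexes, was created on sch_odstable2, and that the split shown earlier has already been run so the function has nine partitions.

```sql
-- Partitions 1 through 8 sit on the same filegroups in both schemes, so
-- each switch is a metadata-only operation. Partition 9 is not switched:
-- it must be empty, and it maps to different filegroups in the two schemes.
DECLARE @p int = 1;
WHILE @p <= 8
BEGIN
    ALTER TABLE dbo.odstable1
        SWITCH PARTITION @p TO dbo.odstable1_new PARTITION @p;
    SET @p += 1;
END;

-- Swap the names so existing queries see the table on the new scheme.
EXEC sp_rename N'dbo.odstable1', N'odstable1_old';
EXEC sp_rename N'dbo.odstable1_new', N'odstable1';
```

It is worth confirming that partition 9 of the original table holds no rows before starting, since anything left there would stay behind in the renamed old table.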

Alter partition function and scheme when a columnstore table is in same database

I have a table that is partitioned weekly, with a partition function and scheme defined. Most importantly, this table has a clustered columnstore index on the same weekly partition scheme.
Now I have to add a few more ranges to the partition function and scheme. This fails with an error saying “cannot alter partition function which is having non empty partition ......... “ even though the data file is only 4KB with no data loaded.
From one of the 2014 SSMS posts, I learned that we need to disable the clustered index, alter the partition scheme, and enable it again.
Please help in solving this issue. I'm using SQL Server 2016 Enterprise Edition. Thanks in advance.

For a columnstore index, you need to empty the partition that is going to be split. That can be done by:
1. moving the data to another partition (by updating its partition key)
2. altering the partition scheme (with the NEXT USED clause) and the partition function (with the SPLIT RANGE clause)
3. moving the data back to the correct partition.
The above can be done in one transaction.
For the future (assuming the data is partitioned by date periods), it's recommended to keep a few empty partitions at the end, so that a maintenance task or job can automatically split them (and create a few new partitions for future periods) without any issues.
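A maintenance job that keeps empty partitions ahead of the data could be sketched as follows. The names PF_Weekly and PS_Weekly are hypothetical, as are the assumptions that the function is defined on the date type, that new partitions may go on PRIMARY, and that no rows exist beyond the current highest boundary.

```sql
-- Find the highest existing boundary and start one week after it.
DECLARE @boundary date =
    (SELECT DATEADD(WEEK, 1, MAX(CAST(prv.value AS date)))
     FROM sys.partition_functions AS pf
     JOIN sys.partition_range_values AS prv
         ON prv.function_id = pf.function_id
     WHERE pf.name = N'PF_Weekly');   -- hypothetical function name

-- Add four more weekly boundaries. Each split touches only the empty
-- last partition, which is what a columnstore index requires.
DECLARE @i int = 0;
WHILE @i < 4
BEGIN
    ALTER PARTITION SCHEME PS_Weekly NEXT USED [PRIMARY];
    ALTER PARTITION FUNCTION PF_Weekly() SPLIT RANGE (@boundary);
    SET @boundary = DATEADD(WEEK, 1, @boundary);
    SET @i += 1;
END;
```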
Alternatively you can use ALTER TABLE with SWITCH PARTITION clause, but that approach is less efficient. SWITCH PARTITION is mostly used to quickly delete the old partitions.

How to create composite partition in SQL Server?

I want to create a sample database using composite partitioning. I know about range partitioning and list partitioning, but I don't know enough about hash values or how to create a hash partition in my database. So I have decided to make a sample database using composite partitioning that combines range partitioning and hash partitioning. Can anybody describe this in simple words, so I can understand hash partitioning as well as composite partitioning?
I have also read some documents on the internet, but I could not understand how to create a hash partition or a composite partition in my database. I don't know enough about hash values and hash functions. I have read about them, but I could not understand them very well. I need a simple definition.
Definition of Horizontal Partition & Vertical Partition
Partition (database)
Hash Functions

The composite partitioning feature is not available in SQL Server 2008. Only range partitioning is available in SQL Server.

Although the partitioning column must be a single column, it does not need to be numeric and it can be calculated so that the range can include multiple columns.
For instance it is common to partition on datetime data by month. This will work well, because that data is usually in a single column, but what do you do if you have data for multiple companies and you also want to partition by company? For this you could use a computed column for the partitioning column. This will create a computed column using the ‘company id’ and ‘order month’ which is then used for the partitions. It will partition three companies for the first three months of 2007.
The computed column must be persisted to form the partitioning column. Note that the key has to be built as company id times 1,000,000 plus the six-digit year and month (yyyymm) for it to match boundary values such as 1200701.
CREATE PARTITION FUNCTION MyPartitionRange (INT)
AS RANGE LEFT FOR VALUES (1200701,1200702,1200703,2200701,2200702,2200703,3200701,3200702,3200703);
CREATE PARTITION SCHEME MyPartitionScheme AS PARTITION MyPartitionRange ALL TO ([PRIMARY]);
CREATE TABLE CompanyOrders
(
    Company_id INT,
    OrderDate datetime,
    Item_id INT,
    Quantity INT,
    OrderValue decimal(19,5),
    -- Style 112 gives yyyymmdd; VARCHAR(6) keeps only yyyymm.
    PartCol AS Company_id * 1000000 + CONVERT(INT, CONVERT(VARCHAR(6), OrderDate, 112)) PERSISTED
) ON MyPartitionScheme (PartCol);
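To see how a company and order date map to a partition, the following sketch computes the key by hand and passes it to $PARTITION. It assumes the key is built as company id times 1,000,000 plus yyyymm, which is what the boundary values in MyPartitionRange imply; the sample company and date are made up.

```sql
-- Hypothetical sample row: company 2, order placed in March 2007.
DECLARE @company_id int = 2,
        @order_date datetime = '2007-03-15';

SELECT
    -- 2 * 1000000 + 200703 = 2200703
    @company_id * 1000000
        + CONVERT(int, CONVERT(varchar(6), @order_date, 112)) AS part_col,
    -- 2200703 is the sixth boundary of the RANGE LEFT function,
    -- so the row belongs to partition 6.
    $PARTITION.MyPartitionRange(
        @company_id * 1000000
        + CONVERT(int, CONVERT(varchar(6), @order_date, 112))) AS partition_number;
```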

Where should the partitioning column go in the primary key on SQL Server?

Using SQL Server 2005 and 2008.
I've got a potentially very large table (potentially hundreds of millions of rows) consisting of the following columns:
CREATE TABLE (
date SMALLDATETIME,
id BIGINT,
value FLOAT
)
which is being partitioned on column date in daily partitions. The question then is: should the primary key be on date, id or on id, date?
I can imagine that SQL Server is smart enough to know that it's already partitioning on date and therefore, if I'm always querying for whole chunks of days, then I can have it second in the primary key. Or I can imagine that SQL Server will need that column to be first in the primary key to get the benefit of partitioning.
Can anyone lend some insight into which way the table should be keyed?

As is standard practice, the primary key should be the candidate key that uniquely identifies a given row.
What you wish to do is known as aligned partitioning, which ensures that the primary key is also split by your partitioning key and stored with the appropriate table data. This is the default behaviour in SQL Server.
For full details, consult the reference Partitioned Tables and Indexes in SQL Server 2005.

There is no specific need for the partition key to be the first field of any index on the partitioned table. As long as it appears within the index, the index can be aligned to the partition scheme.
With that in mind, you should apply the normal rules for index field order: choose the order that supports the most queries and the best selectivity of the values.
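As a minimal sketch of this point, the table below is partitioned on date while the primary key lists id first. All object names (pfDaily, psDaily, dbo.Readings) and the three boundary dates are made up for illustration.

```sql
-- Daily partitions on a smalldatetime column (three sample boundaries).
CREATE PARTITION FUNCTION pfDaily (smalldatetime)
AS RANGE RIGHT FOR VALUES ('2009-01-01', '2009-01-02', '2009-01-03');

CREATE PARTITION SCHEME psDaily
AS PARTITION pfDaily ALL TO ([PRIMARY]);

-- The partitioning column is the second key column, yet the primary key
-- is still created on psDaily and is therefore aligned with the table.
CREATE TABLE dbo.Readings
(
    [date]  smalldatetime NOT NULL,
    id      bigint        NOT NULL,
    [value] float         NOT NULL,
    CONSTRAINT PK_Readings PRIMARY KEY CLUSTERED (id, [date])
) ON psDaily ([date]);
```

Whether (id, date) or (date, id) is the better order then depends only on the queries, as described above.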
