My company just provided me with SQL Server 2005 Enterprise Edition, and I want to partition some tables that hold larger amounts of data. I have about 5 or 6 tables that would be a good fit for partitioning by datetime.
There will be some queries that need 2 of these tables in the course of the same query.
I was wondering if I should use the same partition scheme for all of these tables or if I should copy the partition scheme and put different tables on each one.
Thanks for any help in advance.
You should define your partitions by what makes sense for your domain. For example, if you deal primarily in year quarters, create 5 partitions (4 quarters + 1 overspill).
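To make the quarterly example concrete, here is a minimal Python sketch (not T-SQL) of how a range partition function assigns a date to a partition. It assumes the calendar quarters of 2005 and "range right" behaviour, where each boundary value belongs to the partition on its right. The names BOUNDARIES and partition_number are made up for illustration.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical boundaries: four boundary values produce five partitions.
BOUNDARIES = [
    date(2005, 4, 1),   # start of Q2
    date(2005, 7, 1),   # start of Q3
    date(2005, 10, 1),  # start of Q4
    date(2006, 1, 1),   # start of the overspill partition
]


def partition_number(d):
    """Return the 1-based partition for d.

    A boundary value itself falls into the partition on its right.
    """
    return bisect_right(BOUNDARIES, d) + 1


print(partition_number(date(2005, 8, 15)))
```

Under these assumptions partition 1 holds Q1 2005 and anything earlier, partitions 2 to 4 hold Q2 to Q4, and partition 5 catches everything from 2006 onwards.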
You should also take into account physical file placement. From the MSDN article:
The first step in partitioning tables and indexes is to define the data on which the partition is keyed. The partition key must exist as a single column in the table and must meet certain criteria. The partition function defines the data type on which the key (also known as the logical separation of data) is based. The function defines this key but not the physical placement of the data on disk. The placement of data is determined by the partition scheme. In other words, the scheme maps the data to one or more filegroups that map the data to specific file(s) and therefore disks. The scheme always uses a function to do this: if the function defines five partitions, then the scheme must use five filegroups. The filegroups do not need to be different; however, you will get better performance when you have multiple disks and, preferably, multiple CPUs. When the scheme is used with a table, you will define the column that is used as an argument for the partition function.
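The function/scheme split described in the quote can be sketched in a few lines of Python. This is only an illustration of the mapping, not how SQL Server implements it; build_scheme and the filegroup names are invented for the example. The sketch checks only that there are enough filegroup names and allows the same name to appear more than once.

```python
def build_scheme(boundary_count, filegroups):
    """Map each 1-based partition number to a filegroup name.

    boundary_count boundaries define boundary_count + 1 partitions, so at
    least that many filegroup names are required. Names may repeat.
    Extra names are ignored in this sketch.
    """
    partitions = boundary_count + 1
    if len(filegroups) < partitions:
        raise ValueError(f"need {partitions} filegroups, got {len(filegroups)}")
    return dict(zip(range(1, partitions + 1), filegroups))


# Four boundaries, five partitions; the overspill partition reuses FG1.
scheme = build_scheme(4, ["FG1", "FG2", "FG3", "FG4", "FG1"])
print(scheme)
```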
These two articles may be useful:
Partitioned Tables in SQL Server 2005
Partitioned Tables and Indexes in SQL Server 2005
I have a partition function and a partition scheme by date that I'm already using on a big table in my database. Several other big tables in the same database share this pattern of having a date column, so I wonder whether it's OK to reuse this partition function and scheme among them, or whether it's better to create a separate function and scheme pair for each table to be partitioned.
Like all things SQL Server, "it depends".
Sharing partition functions/schemes among different objects is ok when partition maintenance for all referencing objects is performed on the same cycle and planned accordingly. In cases where partitioning is leveraged to efficiently purge/load data by date, those processes need to be coordinated to avoid conflicts.
I usually create a separate function/scheme for each object, except in cases where maintenance will always be performed in tandem, such as related tables of a single application.
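A small Python model of why sharing couples the maintenance cycles: any boundary added to a shared function changes the partitioning of every table that references it. The classes and table names below are hypothetical, and years stand in for date boundaries.

```python
from bisect import insort


class PartitionFunction:
    """Toy stand-in for a partition function: a sorted list of boundaries."""

    def __init__(self, boundaries):
        self.boundaries = sorted(boundaries)

    def split(self, value):
        """Add a new boundary, which creates one more partition."""
        insort(self.boundaries, value)


class Table:
    """Toy table that only remembers which function it is partitioned by."""

    def __init__(self, name, function):
        self.name = name
        self.function = function

    def partition_count(self):
        return len(self.function.boundaries) + 1


shared = PartitionFunction([2005, 2006])
orders = Table("orders", shared)
audit = Table("audit", shared)
archive = Table("archive", PartitionFunction([2005, 2006]))  # its own function

shared.split(2007)  # maintenance intended for "orders"...

for table in (orders, audit, archive):
    print(table.name, table.partition_count())
```

After the split, both orders and audit have four partitions while archive still has three, which is the coordination issue described above.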
I want to combine horizontal and vertical partitioning because I have a huge table (a huge number of records and also a big row size). I need to know whether this combined partitioning has any negative effect on performance.
Is combining the two available in SQL Server 2008 R2?
Is there a live example of this combination, or a video tutorial?
Can I perform vertical partitioning on the primary key?
Are there any disadvantages to partitioning?
In theory I think they are possible to combine.
Maybe there are some columns that are redundant or rarely accessed. These columns can be moved to another table that is linked to the primary table by a primary key and foreign key relationship. At the same time, the primary table can be partitioned horizontally on a date column (or whatever your table data might be partitioned on).
Vertical partitioning can't be done on the primary key because vertical partitioning divides a table into multiple tables that contain fewer columns.
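As a rough illustration of combining the two, the Python sketch below splits one wide row into a narrow core row and an extension row (vertical), then picks a partition for the core row by year (horizontal). The column names, CORE_COLUMNS and both functions are assumptions made up for the example. Note that the key is kept in both halves, which is why the primary key itself is not something you partition away.

```python
from datetime import date

# Hypothetical choice of frequently accessed columns.
CORE_COLUMNS = {"id", "created", "status"}


def split_vertically(row):
    """Split a wide row into (core, extra); both keep the key column."""
    core = {k: v for k, v in row.items() if k in CORE_COLUMNS}
    extra = {k: v for k, v in row.items() if k not in CORE_COLUMNS}
    extra["id"] = row["id"]  # foreign key back to the core row
    return core, extra


def horizontal_partition(core_row):
    """Choose the horizontal partition of a core row by its year."""
    return core_row["created"].year


row = {
    "id": 7,
    "created": date(2013, 5, 2),
    "status": "open",
    "notes": "long free text",
    "attachment_name": "scan.pdf",
}
core, extra = split_vertically(row)
print(core, extra, horizontal_partition(core))
```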
SQL Server 2014 supports up to 15,000 partitions by default. In versions earlier than SQL Server 2012, the number of partitions was limited to 1,000 by default. Your server needs at least 16 GB of RAM if a large number of partitions is in use. More than 1,000 partitions can affect performance, and DML and DDL statements may run into memory issues.
http://technet.microsoft.com/en-us/library/ms178148(v=sql.105).aspx
Index Organized Tables (IOTs) are tables stored in an index structure. Whereas a table stored in a heap is unorganized, data in an IOT is stored and sorted by primary key (the data is the index). IOTs behave just like “regular” tables, and you use the same SQL to access them.
Every table in a proper relational database is supposed to have a primary key... If every table in my database has a primary key, should I always use an index organized table?
I'm guessing the answer is no, so when is an index organized table not the best choice?
Basically an index-organized table is an index without a table. There is a table object which we can find in USER_TABLES but it is just a reference to the underlying index. The index structure matches the table's projection. So if you have a table whose columns consist of the primary key and at most one other column then you have a possible candidate for INDEX ORGANIZED.
The main use case for index organized table is a table which is almost always accessed by its primary key and we always want to retrieve all its columns. In practice, index organized tables are most likely to be reference data, code look-up affairs. Application tables are almost always heap organized.
The syntax allows an IOT to have more than one non-key column. Sometimes this is correct. But it is also an indication that maybe we need to reconsider our design decisions. Certainly if we find ourselves contemplating the need for additional indexes on the non-primary key columns then we're probably better off with a regular heap table. So, as most tables probably need additional indexes most tables are not suitable for IOTs.
Coming back to this answer I see a couple of other responses in this thread propose intersection tables as suitable candidates for IOTs. This seems reasonable, because it is common for intersection tables to have a projection which matches the candidate key: STUDENTS_CLASSES could have a projection of just (STUDENT_ID, CLASS_ID).
I don't think this is cast-iron. Intersection tables often have a technical key (i.e. STUDENT_CLASS_ID). They may also have non-key columns (metadata columns like START_DATE, END_DATE are common). Also there is no prevailing access path - we want to find all the students who take a class as often as we want to find all the classes a student is taking - so we need an indexing strategy which supports both equally well. I'm not saying intersection tables are not a use case for IOTs, just that they are not automatically so.
I'd consider them for very narrow tables (such as the join tables used to resolve many-to-many relationships). If (virtually) all the columns in the table are going to be in an index anyway, then why shouldn't you use an IOT?
Small tables can be good candidates for IOTs as discussed by Richard Foote here
I consider the following kinds of tables excellent candidates for IOTs:
"small" "lookup" type tables (e.g. queried frequently, updated infrequently, fits in a relatively small number of blocks)
any table that you already are going to have an index that covers all the columns anyway (i.e. may as well save the space used by the table if the index duplicates 100% of the data)
From the Oracle Concepts guide:
Index-organized tables are useful when related pieces of data must be stored together or data must be physically stored in a specific order. This type of table is often used for information retrieval, spatial (see "Overview of Oracle Spatial"), and OLAP applications (see "OLAP").
This question from AskTom may also be of interest, especially where someone gives a scenario and then asks whether an IOT would perform better than a heap-organised table. Tom's response is:
we can hypothesize all day long, but until you measure it, you'll never know for sure.
An index-organized table is generally a good choice if you only access data from that table by the key, the whole key, and nothing but the key.
Further, there are many limitations about what other database features can and cannot be used with index-organized tables -- I recall that in at least one version one could not use logical standby databases with index-organized tables. An index-organized table is not a good choice if it prevents you from using other functionality.
All an IOT really saves is the logical read(s) on the table segment, and as you might have spent two or three or more on the IOT/index this is not always a great saving except for small data sets.
Another feature to consider for speeding up lookups, particularly on larger tables, is a single table hash cluster. When correctly created they are more efficient for large data sets than an IOT because they require only one logical read to find the data, whereas an IOT is still an index that needs multiple logical i/o's to locate the leaf node.
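A back-of-the-envelope Python sketch of the logical-read argument above. It is a crude model, not a measurement: the fanout of 200 entries per index block is an assumption picked for illustration, and the hash cluster figure assumes a well-sized cluster with no collisions, as described in the paragraph above.

```python
def index_height(row_count, fanout):
    """Smallest number of index levels such that fanout**height >= row_count."""
    height, capacity = 1, fanout
    while capacity < row_count:
        capacity *= fanout
        height += 1
    return height


def logical_reads(row_count, fanout=200):
    """Estimated logical reads for a single-row lookup by key."""
    height = index_height(row_count, fanout)
    return {
        "heap_plus_index": height + 1,  # walk the index, then visit the table block
        "iot": height,                  # the leaf block already holds the row
        "hash_cluster": 1,              # hash straight to the block (idealised)
    }


for rows in (1_000, 10_000_000):
    print(rows, logical_reads(rows))
```

With these assumptions the IOT saves one read out of three for 1,000 rows and one out of five for 10 million rows. As the AskTom quote says, measure on your own data before relying on numbers like these.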
I can't comment on IOTs per se; however, if I'm reading this right, they're the same as a 'clustered index' in SQL Server. Typically you should think about not using such an index if your primary key (or the value(s) you're indexing, if it's not a primary key) is likely to be distributed fairly randomly, as these inserts can result in many page splits (expensive).
Indexes such as identity columns (sequences in Oracle?) and dates 'around the current date' tend to make for good candidates for such indexes.
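The page-split point can be illustrated with a toy Python simulation. This is a simplified model and not SQL Server's actual storage engine: pages hold at most 10 keys (an arbitrary choice), a full page that receives a key in the middle is split in half, and a key that lands past the end of the last full page simply starts a new page without counting as a split.

```python
import random
from bisect import bisect_right, insort


def count_splits(keys, page_capacity=10):
    """Insert keys into ordered pages and return the number of mid-page splits."""
    pages = [[]]  # each page is a sorted list of keys
    splits = 0
    for key in keys:
        # Locate the target page by comparing against each later page's first key.
        first_keys = [page[0] for page in pages[1:]]
        i = bisect_right(first_keys, key)
        page = pages[i]
        if len(page) < page_capacity:
            insort(page, key)
        elif i == len(pages) - 1 and key > page[-1]:
            pages.append([key])  # append at the very end: new page, no split
        else:
            half = page_capacity // 2
            left, right = page[:half], page[half:]
            insort(left if key < right[0] else right, key)
            pages[i:i + 1] = [left, right]
            splits += 1
    return splits


sequential_keys = list(range(1000))
random_keys = sequential_keys[:]
random.Random(42).shuffle(random_keys)

sequential_splits = count_splits(sequential_keys)
random_splits = count_splits(random_keys)
print(sequential_splits, random_splits)
```

In this model the ever-increasing keys never split a page, while the shuffled keys do so repeatedly, which matches the advice above about identity-style keys.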
An Index-Organized Table--in contrast to an ordinary table--has its own way of structuring, storing, and indexing data.
Index organized tables (IOT) are indexes which actually hold the data which is being indexed, unlike the indexes which are stored somewhere else and have links to actual data.
In Oracle, a table cluster is a group of tables that share common columns and store related data in the same blocks. When tables are clustered, a single data block can contain rows from multiple tables. For example, a block can store rows from both the employees and departments tables rather than from only a single table:
http://download.oracle.com/docs/cd/E11882_01/server.112/e10713/tablecls.htm#i25478
Can this be done in SQLServer?
On the one hand, this sounds very much like views. Data is stored in the table, and the views provide access to only those columns within the table specified by the view's definition. (Thus, your "common columns".)
On the other hand, this sounds like how the database engine stores data on the hard drive. In SQL Server, this is done via 8 KB pages. Assuming two completely separate table definitions, there is no way to store data from two such distinct tables in the same page. (If an Oracle block is more along the lines of OS files, then that turns into SQL Server files and filegroups, at which point the answer is "yes"... but I suspect this is not what blocks are about.)
Not based on what I am reading here. In SQL Server, each table's pages are independent of other tables' pages.
On the other hand, each table can have its own choice of clustered index, which can influence performance greatly. In addition, I believe partitions will influence the execution plan, and if both tables have similar partition functions, this might boost performance, although performance is not the normal objective of partitioning.
Typically, optimization of JOINS involves index strategies (in my experience, preferably with covering non-clustered indexes)
The database I'm working with is currently over 100 GiB and promises to grow much larger over the next year or so. I'm trying to design a partitioning scheme that will work with my dataset but thus far have failed miserably. My problem is that queries against this database will typically test the values of multiple columns in this one large table, ending up in result sets that overlap in an unpredictable fashion.
Everyone (the DBAs I'm working with) warns against having tables over a certain size and I've researched and evaluated the solutions I've come across but they all seem to rely on a data characteristic that allows for logical table partitioning. Unfortunately, I do not see a way to achieve that given the structure of my tables.
Here's the structure of our two main tables to put this into perspective.
Table: Case
Columns:
Year
Type
Status
UniqueIdentifier
PrimaryKey
etc.
Table: Case_Participant
Columns:
Case.PrimaryKey
LastName
FirstName
SSN
DLN
OtherUniqueIdentifiers
Note that any of the columns above can be used as query parameters.
Rather than guess, measure. Collect usage statistics (queries run), look at the engine's own statistics such as sys.dm_db_index_usage_stats, and then make an informed decision: the partitioning that best balances data size and gives the best affinity for the most frequently run queries will be a good candidate. Of course you'll have to compromise.
Also don't forget that partitioning is per index (where 'table' = one of the indexes), not per table, so the question is not what to partition on, but which indexes to partition or not and what partitioning function to use. Your clustered indexes on the two tables are going to be the most likely candidates obviously (not much sense to partition just a non-clustered index and not partition the clustered one) so, unless you're considering redesign of your clustered keys, the question is really what partitioning function to choose for your clustered indexes.
If I'd venture a guess I'd say that for any data that accumulates over time (like 'cases' with a 'year') the most natural partition is the sliding window.
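For readers unfamiliar with the term, here is a minimal Python sketch of the sliding-window idea: each period gets its own partition, a new empty partition is added at the front of the window, and the oldest one is dropped whole instead of being deleted row by row. The function name, the keep parameter and the sample data are invented, and the sketch assumes years are always added in ascending order.

```python
from collections import OrderedDict


def slide_window(partitions, new_year, keep):
    """Add an empty partition for new_year, then drop the oldest partitions
    until only `keep` remain. Returns the years that were dropped."""
    partitions[new_year] = []
    dropped = []
    while len(partitions) > keep:
        year, _rows = partitions.popitem(last=False)  # oldest partition goes first
        dropped.append(year)
    return dropped


# One partition per year, oldest first.
cases = OrderedDict((year, [f"case-{year}-1"]) for year in (2006, 2007, 2008))
dropped = slide_window(cases, 2009, keep=3)
print(dropped, list(cases))
```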
If you have no other choice, you can partition by the key modulo the number of partition tables.
Let's say that you want to partition into 10 tables.
You will define tables:
Case00
Case01
...
Case09
Then partition your data by UniqueIdentifier or PrimaryKey modulo 10 and place each record in the corresponding table (depending on your UniqueIdentifier, you might need to start allocating ids manually).
When performing a query, you will need to run the same query on all tables and use UNION to merge the results into a single result set (UNION ALL avoids the duplicate-removal step, which is safe here because each record lives in exactly one table).
It's not as good as partitioning the tables based on some logical separation which corresponds to the expected queries, but it's better than hitting the size limit of a table.
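The modulo approach above can be tried end to end with Python's built-in sqlite3 module. This is only a sketch under stated assumptions: SQLite stands in for the real database, the three columns are a cut-down version of the Case table, and the helper names (table_for, insert_case, query_all) are invented. It uses UNION ALL because every row lives in exactly one table, so there are no duplicates to remove.

```python
import sqlite3

TABLE_COUNT = 10


def table_for(key):
    """Name of the table that holds the row with this key."""
    return f"Case{key % TABLE_COUNT:02d}"


conn = sqlite3.connect(":memory:")
for n in range(TABLE_COUNT):
    conn.execute(
        f"CREATE TABLE Case{n:02d} "
        "(PrimaryKey INTEGER PRIMARY KEY, Year INTEGER, Status TEXT)"
    )


def insert_case(key, year, status):
    """Route the row to its table by key modulo TABLE_COUNT."""
    conn.execute(f"INSERT INTO {table_for(key)} VALUES (?, ?, ?)", (key, year, status))


def query_all(where, params):
    """Run the same filter against every table and merge with UNION ALL.

    `where` must be trusted SQL text; values go through bound parameters.
    """
    parts = [
        f"SELECT PrimaryKey, Year, Status FROM Case{n:02d} WHERE {where}"
        for n in range(TABLE_COUNT)
    ]
    sql = " UNION ALL ".join(parts) + " ORDER BY PrimaryKey"
    return conn.execute(sql, tuple(params) * TABLE_COUNT).fetchall()


for key in range(1, 26):
    insert_case(key, 2000 + key % 3, "open" if key % 2 else "closed")

rows = query_all("Year = ?", [2001])
print(rows)
```

A lookup by key only needs the single table returned by table_for, but any other filter has to touch all ten tables, which is the trade-off described above.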
Another possible thing to look at (before partitioning) is your model.
Are you in a normalized database? Are there further steps which could improve performance by different choices in the normalization/de-/partial-normalization? Are there options to transform the data into a Kimball-style dimensional star model which is optimal for reporting/querying?
If you aren't going to drop partitions of the table (sliding window, as mentioned) or treat different partitions differently (you say any columns can be used in the query), I'm not sure what you are trying to get out of the partitioning that you won't already get out of your indexing strategy.
I'm not aware of any table limits on rows. AFAIK, the number of rows is limited only by available storage.