How can I combine horizontal and vertical partitioning in SQL Server? - sql-server

I want to combine horizontal and vertical partitioning because I have a huge table (a huge number of records and also a large row size). I need to know whether this combined partitioning causes any negative performance impact.
Is combining the two possible in SQL Server 2008 R2?
Is there any live example of this combination, or any video tutorial?
Can I perform the vertical partitioning on the primary key?
Are there any disadvantages to partitioning?

In theory, I think the two can be combined.
Maybe there are some columns that are redundant or rarely accessed; these columns can be moved to another table linked to the primary table by a primary key and foreign key relationship. At the same time, the primary table can be partitioned horizontally based on a date column (or whatever your table data might be partitioned on), as sketched below.
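A minimal sketch of the combination, assuming a hypothetical BigTable keyed by Id and partitioned on a CreatedDate column (all object names here are made up for illustration):

    -- Horizontal partitioning: a date-based partition function and scheme.
    CREATE PARTITION FUNCTION pfByYear (date)
        AS RANGE RIGHT FOR VALUES ('2009-01-01', '2010-01-01', '2011-01-01');
    CREATE PARTITION SCHEME psByYear
        AS PARTITION pfByYear ALL TO ([PRIMARY]);

    -- The primary table, partitioned by date; the partitioning column
    -- must be part of the clustered key.
    CREATE TABLE dbo.BigTable (
        Id          bigint        NOT NULL,
        CreatedDate date          NOT NULL,
        CustomerId  int           NOT NULL,
        Amount      decimal(18,2) NOT NULL,
        CONSTRAINT PK_BigTable PRIMARY KEY CLUSTERED (Id, CreatedDate)
    ) ON psByYear (CreatedDate);

    -- Vertical partitioning: wide, rarely accessed columns move to a
    -- companion table joined back 1:1 on the primary key.
    CREATE TABLE dbo.BigTableDetails (
        Id          bigint        NOT NULL,
        CreatedDate date          NOT NULL,
        LongComment nvarchar(max) NULL,
        CONSTRAINT PK_BigTableDetails PRIMARY KEY (Id, CreatedDate),
        CONSTRAINT FK_BigTableDetails_BigTable
            FOREIGN KEY (Id, CreatedDate)
            REFERENCES dbo.BigTable (Id, CreatedDate)
    );

Queries that only need the common columns never touch the wide companion table, while date-range queries can eliminate whole partitions of the primary table.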
Vertical partitioning can't be done on the primary key, because vertical partitioning divides a table into multiple tables that each contain fewer columns; the primary key itself has to be repeated in every resulting table so that the pieces can be joined back together.
SQL Server 2012 and later support up to 15,000 partitions by default; in earlier versions the number of partitions was limited to 1,000. Your server needs at least 16 GB of RAM if a large number of partitions is in use, and with more than 1,000 partitions, DML and DDL statements may slow down or cause memory issues.
http://technet.microsoft.com/en-us/library/ms178148(v=sql.105).aspx

Related

Partition or Index large table in SQL Server

I have a large table consisting of 4 Billion+ rows and 50 columns, most of which are either datetime or numeric except a few which are varchar.
Data will be inserted into the table on a weekly basis (about 20 million rows).
I expect queries with where clauses on some of the datetime columns, and a couple of the varchar columns. There is no primary key in the table.
There are no indexes, nor is the table partitioned. I am using SQL Server 2016.
I understand that I need to partition or index the table, but I am not sure which approach to take or both in-fact.
Since the table is large, should I create the indexes first or should I create the partitions first? If I do create the indexes and then create the partitions, what should I do to maintain these with new data coming in weekly?
EDIT: Also, minimal updates and deletes are expected on the table
I understand that I need to partition or index the table
You need to understand what you gain from partitioning. It is not at all the case that SQL Server requires partitioning on big tables to function adequately. SQL Server scales to arbitrary table sizes without any inherent issues.
Common benefits of partitioning are:
Mass deletion in constant time (see the sketch after this list)
Different storage for older partitions
Not backing up old partitions
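As a concrete illustration of the first point: on SQL Server 2016 and later, an entire partition can be emptied as a metadata-only operation (the table name here is hypothetical):

    -- Removes every row in partition 1 in constant time, instead of
    -- logging millions of individual row deletes.
    TRUNCATE TABLE dbo.LargeTable WITH (PARTITIONS (1));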
Sometimes in special situations (e.g. columnstore), partitioning can help as a strategy to speed up queries. Normally, indexing is better for that.
Essentially, partitioning splits the table physically into multiple sub-tables. Most often this has a negative effect on query plans. Indexes are perfectly capable of restricting the set of data that needs to be touched; partitions are worse at that.
Most of the queries will be filtering on the datetime columns and on some of the varchar columns, e.g. get data for a certain date range for a certain entity. With indexes, the table will become heavily fragmented because of the new inserts, and rebuilding/reorganising the indexes will also consume a lot of time. I can do it, but again I am not sure which approach to take.
It seems you can best solve this by indexing:
Index according to the queries you expect.
Maintain the indexes properly. This is not too hard; for example, rebuild them after the weekly load, as in the sketch below.
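A sketch of both points, with assumed table and column names (the question does not give a schema):

    -- An index matching the expected filter pattern: seek on the
    -- datetime column, then on an entity code, covering the SELECT list.
    CREATE NONCLUSTERED INDEX IX_LargeTable_EventDate_Entity
        ON dbo.LargeTable (EventDate, EntityCode)
        INCLUDE (Amount);

    -- After the weekly load, rebuild to remove the fragmentation the
    -- inserts caused.
    ALTER INDEX IX_LargeTable_EventDate_Entity
        ON dbo.LargeTable REBUILD;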
Since the table is large, should I create the indexes first or should I create the partitions first?
Set up the partitioning objects first. Then create or rebuild the clustered index on the new partition scheme. If possible, drop the other indexes first and recreate them afterwards (this might not work due to availability restrictions).
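One way to sequence this, continuing with assumed names:

    -- 1. The partitioning objects first.
    CREATE PARTITION FUNCTION pfWeekly (datetime2)
        AS RANGE RIGHT FOR VALUES ('2016-01-04', '2016-01-11', '2016-01-18');
    CREATE PARTITION SCHEME psWeekly
        AS PARTITION pfWeekly ALL TO ([PRIMARY]);

    -- 2. Then the clustered index, created directly on the new scheme;
    --    this physically moves the heap into the partitions.
    CREATE CLUSTERED INDEX CIX_LargeTable
        ON dbo.LargeTable (EventDate)
        ON psWeekly (EventDate);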
what should I do to maintain these with new data coming in weekly?
What concerns do you have? New data will be stored in the appropriate partitions automatically. Make sure to create new partitions before loading the data; keep partitions ready two weeks in advance. The latest partitions must always be empty to avoid costly splits.
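Continuing the sketch above, splitting a new boundary off the still-empty rightmost partition before each weekly load is a metadata-only operation:

    -- Tell the scheme which filegroup the next partition will use,
    -- then split a new, empty partition off the end.
    ALTER PARTITION SCHEME psWeekly NEXT USED [PRIMARY];
    ALTER PARTITION FUNCTION pfWeekly() SPLIT RANGE ('2016-01-25');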
There is no primary key in the table.
Most often this is not a good design. Most tables should have a primary key and a clustered index. If there is no natural key, use an artificial one such as a bigint identity.
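A minimal sketch of adding an artificial key to such a heap (names assumed; on a partitioned table the key usually includes the partitioning column so the index stays aligned with the partitions):

    ALTER TABLE dbo.LargeTable ADD Id bigint IDENTITY(1,1) NOT NULL;
    ALTER TABLE dbo.LargeTable
        ADD CONSTRAINT PK_LargeTable
        PRIMARY KEY NONCLUSTERED (Id, EventDate);  -- EventDate must be NOT NULL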
You definitely can apply partitioning, but my feeling is that it will not gain you what you may expect. It will force you to take on additional maintenance burdens, possibly reduce performance, and there is a risk of making mistakes that threaten availability. Simplicity is important.

Partitioning table that is typically only inserted to in SQL Server 2012

I have a table that receives approximately 450,000 records per month. It is an audit table of sorts that tracks changes to other tables in the database, that is, inserts, updates and deletes of records. Typically this table is not queried (perhaps only 2-3 times per month, to examine how data in other tables changed, and under very specific circumstances).
It has been put to me that we should consider partitioning this table to help improve database performance. If the table is only being inserted to 99.9% of the time and rarely queried, would there be any tangible benefit to partitioning this table?
Thanks.
If the table is only being inserted to 99.9% of the time and rarely queried, would there be any tangible benefit to partitioning this table?
Partitioning is mostly a manageability feature. I would expect no difference in insert performance with or without table partitioning. For SELECT queries, partitioning may improve performance of large scans if partitions can be eliminated (i.e. the partitioning column is specified in the WHERE clause), but indexing and query tuning are usually the key to performance.
Partitioning can improve performance of purge operations. For example, you could use a monthly sliding window to purge an entire month of data at once rather than deleting individual rows, as sketched below. I don't know if that's worth the trouble with only 450K rows/month, though.
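A sketch of such a sliding window on SQL Server 2012, with hypothetical names; the staging table must have an identical structure and sit on the same filegroup as the partition being switched out:

    -- Switch the oldest month out of the audit table; this is a
    -- metadata-only operation, not a row-by-row delete.
    ALTER TABLE dbo.AuditLog
        SWITCH PARTITION 1 TO dbo.AuditLog_Purge;
    TRUNCATE TABLE dbo.AuditLog_Purge;

    -- Then remove the now-empty boundary from the partition function.
    ALTER PARTITION FUNCTION pfAuditMonthly() MERGE RANGE ('2012-01-01');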
I think you want to get fast access to your recent data.
Add the date column as the first column in the clustered primary key instead of partitioning, as sketched below.
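A sketch of that alternative, with assumed names; leading the clustered key with the date keeps recent rows physically together, so inserts go to the end of the table and recent-data queries stay cheap:

    CREATE TABLE dbo.AuditTrail (
        AuditDate datetime2 NOT NULL,
        AuditId   bigint    IDENTITY(1,1) NOT NULL,
        TableName sysname   NOT NULL,
        Operation char(1)   NOT NULL,   -- 'I', 'U' or 'D'
        CONSTRAINT PK_AuditTrail
            PRIMARY KEY CLUSTERED (AuditDate, AuditId)
    );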

Calculating Storage Requirements for SQL-Server CE

I've got data potentially to be pushed to SQL CE in a 3rd-party Windows Phone application, but I don't have anywhere to conduct a test, so I need to figure out whether we'll exceed the 4 GB maximum database size (many millions of records).
I know the sizes of the various data types, but are there additional requirements for indexes, row IDs, etc.? Also, this data will need to be synchronized/replicated, so I assume every row needs a GUID or the like as well?
Table1 (first 2 fields are clustered primary key)
nvarchar(20)
int
int
datetime
Table2 (First field is primary key)
int
int
datetime
Table3 (First two fields are clustered primary key)
int
int
int
I have access to SQL Server (not CE), but I'm an Oracle guy and don't know my way around there very well. Any help or insight is appreciated.
This will be a starting point: http://support.microsoft.com/kb/827968
I have command-line tools to migrate from SQL Server to SQL Compact that will give you more precise results: http://exportsqlce.codeplex.com
Also, Merge replication adds columns and system tables to your database.
Luckily your tables are very narrow, so the 4 GB can be stretched to a ton of rows. Every row will need a GUID; you're correct. Look into NEWSEQUENTIALID(), which will keep your records in some sort of order, reducing some of the performance hindrances of GUIDs. Do you currently have access to the data, or do you have a rough estimate of how many records you'll be storing? If you have the data, I'd create a clean database, create your tables and insert it. Index it to your liking and check the size, as in the sketch below. Indexes can take up quite a bit of space, but you shouldn't need much in the way of indexes on these narrow tables.
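A sketch of that size test on full SQL Server, using Table1 from the question (the field names are invented; CE page layouts differ slightly, so treat the result as an estimate):

    CREATE TABLE dbo.Table1 (
        Field1  nvarchar(20)     NOT NULL,
        Field2  int              NOT NULL,
        Field3  int              NOT NULL,
        Field4  datetime         NOT NULL,
        RowGuid uniqueidentifier NOT NULL
            DEFAULT NEWSEQUENTIALID(),   -- replication-friendly GUID
        CONSTRAINT PK_Table1 PRIMARY KEY CLUSTERED (Field1, Field2)
    );

    -- ...bulk-insert a representative sample, then check the size:
    EXEC sp_spaceused 'dbo.Table1';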

Should Lookup Table Foreign Keys Always be Indexed?

If I have a lookup table with very few records in it (say, less than ten), should I bother putting an index on the Foreign Key of another table to which it is attached? For that matter, does the lookup table even need an index on the Primary Key?
Specifically, is there any performance benefit that outweighs the overhead of maintaining the indexes? If not, are there any benefits other than speed?
Note: an example of a lookup table might be Order Status, where the tuples are:
1 - Order Received
2 - In Process
3 - Shipped
4 - Paid
On a transactional system there may be no significant benefit to putting an index on such a column (i.e. a low-cardinality reference column), as the query optimiser probably won't use it. It will also generate additional disk traffic on writes to the table, as the indexes have to be updated. So for low-cardinality FKs on a transactional database it is usually better not to index the columns. This particularly applies to high-volume systems.
Note that you may still want the FK for referential integrity and that the FK lookup on a small reference table will probably generate no I/O as the lookup table will almost always be cached.
However, you may find that you want to include the column in a composite index for some reason, perhaps to create a covering index for a commonly used query.
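For example, a hypothetical covering index where the low-cardinality FK column leads a composite key:

    -- A query filtering on status and a date range can be answered
    -- entirely from this index, without touching the base table.
    CREATE NONCLUSTERED INDEX IX_Orders_Status_OrderDate
        ON dbo.Orders (OrderStatusId, OrderDate)
        INCLUDE (CustomerId, TotalAmount);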
On a table that is frequently bulk-loaded (e.g. a data warehouse) the index write traffic will be much larger than that of the table load if you have many indexed columns. You will probably need to drop or disable the FKs and indexes for a bulk load if any indexes are present.
On a Star Schema you can get some benefit from indexing low cardinality columns, even on SQL Server. If you are doing a highly selective query (i.e. one where the query optimiser decides that the row set returned will be small) then it can do a 'star query' plan where it uses a technique known as index intersection.
Generally, query plans on a star schema should be based around a table scan of the fact table or a highly selective process that bookmarks the fact table and then returns a smaller set of rows. Index intersection is efficient for the latter type of query as the selection can be resolved before doing any I/O on the fact table.
Bitmap indexes are a real win for low cardinality columns on platforms such as Oracle that support them, but SQL Server does not. Even so, low cardinality indexes can still participate in star query plans on SQL Server.
Yes, always have an index.
The query optimizer of a modern database management system (DBMS) will make the determination as to which is faster: (1) actually reading from an index on a column, or (2) performing a full table scan.
The table size (in number of rows) needs to be "large enough" for use of the index to be considered.
Yes to both. Always index as a rule of thumb.
Points:
You also can't set up an FK without a primary key or unique index on the lookup table
What if you want to delete or update in the lookup table? Especially accidentally...
However, saying that, we don't always.
We have a very busy OLTP table (5 million+ rows per day) with several parent tables. We only have indexes on the FK columns where we need them. We assume no deletes/key updates on some parent tables, so we reduce the amount of work needed and the disk space used.
We used the SQL Server 2005 DMVs to establish that the indexes weren't used (see the query below). We still have the FKs in place, though.
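A sketch of that kind of check (the table name is assumed; note that the usage stats reset on every service restart, so interpret them over a long enough window):

    -- Indexes on dbo.Orders that have never been read since the last
    -- restart: candidates for dropping.
    SELECT i.name, s.user_seeks, s.user_scans, s.user_lookups, s.user_updates
    FROM sys.indexes AS i
    LEFT JOIN sys.dm_db_index_usage_stats AS s
           ON s.object_id = i.object_id
          AND s.index_id = i.index_id
          AND s.database_id = DB_ID()
    WHERE i.object_id = OBJECT_ID('dbo.Orders')
      AND (s.index_id IS NULL
           OR s.user_seeks + s.user_scans + s.user_lookups = 0);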
My personal opinion is that you should... it may be small now but ALWAYS anticipate your tables growing in size. A good database schema will grow easily with more records. Foreign Keys are almost always a good idea.
In SQL Server, the primary key becomes the clustered index if there isn't a clustered index already.

How Many Tables Should be Placed on the Same Partition Scheme?

My company just provided me with SQL Server 2005 Enterprise Edition and I wanted to partition some tables with large(r) amounts of data. I have about 5 or 6 tables which would be a good fit to partition by datetime.
There will be some queries that need 2 of these tables in the course of the same query.
I was wondering if I should use the same partition scheme for all of these tables or if I should copy the partition scheme and put different tables on each one.
Thanks for any help in advance.
You should define your partitions by what makes sense for your domain, e.g. if you deal primarily in year quarters, create 5 partitions (4 quarters + 1 overspill).
You should also take into account physical file placement. From the MSDN article:
The first step in partitioning tables and indexes is to define the data on which the partition is keyed. The partition key must exist as a single column in the table and must meet certain criteria. The partition function defines the data type on which the key (also known as the logical separation of data) is based. The function defines this key but not the physical placement of the data on disk. The placement of data is determined by the partition scheme. In other words, the scheme maps the data to one or more filegroups that map the data to specific file(s) and therefore disks. The scheme always uses a function to do this: if the function defines five partitions, then the scheme must use five filegroups. The filegroups do not need to be different; however, you will get better performance when you have multiple disks and, preferably, multiple CPUs. When the scheme is used with a table, you will define the column that is used as an argument for the partition function.
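As a sketch with invented names: reusing one scheme for both tables keeps their partitions aligned, which helps queries that join the two tables and means a single SPLIT maintains both at once:

    CREATE PARTITION FUNCTION pfQuarters (datetime)
        AS RANGE RIGHT FOR VALUES
        ('2005-01-01', '2005-04-01', '2005-07-01', '2005-10-01');
    CREATE PARTITION SCHEME psQuarters
        AS PARTITION pfQuarters ALL TO ([PRIMARY]);

    -- Both tables live on the same scheme, each partitioned on its own
    -- datetime column.
    CREATE TABLE dbo.Sales (SaleDate datetime NOT NULL, Amount money NOT NULL)
        ON psQuarters (SaleDate);
    CREATE TABLE dbo.Returns (ReturnDate datetime NOT NULL, Amount money NOT NULL)
        ON psQuarters (ReturnDate);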
These two articles may be useful:
Partitioned Tables in SQL Server 2005
Partitioned Tables and Indexes in SQL Server 2005
