sql server clustered index on non unique keys

sql server clustered index on non unique keys - sql-server

We have a very large database and have been using shards which we want to get away from. The shards work by everytime a table gets really big, we start a new table that has the same schema as the previous table and keep a number in another table that helps us find which table the data is in. This is a cumbersome manual process and means we have data spread out over N different tables all with the same schema.
The idea we are trying for is to eliminate this need for sharding by using indexes. Our data lookup queries do not use unique keys and many records are returned that have the same values across columns.
The following illustrates many of our lookup selects for a particular table, the fields with the * indicate that field may or may not be in the select.
where clause: scheduled_test, *script, *label, *error_message
group/order: messenger_id, timeslice, script, label, error_message, step_sequence, *adapter_type
My thought is that I would not want to create an index with all of these 11 fields. I instead picked 3 of the ones that seemed to be used more commonly including the one that is always in the where clause. I had read that it is advisable not to have too wide an index with too many fields. I also had heard that the optimizer will use the indexed fields first and that it is not uncommon to have non unique indexes even though MSDN states to the effect that unique indexes is the big advantage. It's just not how our data is designed. I realize SQL will add something to the index to make it unique, but that doesn't seem to matter for our purposes.
When I look at the execution plan in sql server management studio on a query that is similar to what we might run, it says "clustered index seek cost 100%", but it is using the clustered index that I created so I am hoping this is better than the default clustered index that is just the generated primary key (previously how the table was defined). I am hoping that what I have here is as good or better than our sharding method and will eliminate the need for the shards.
We do insert alot of data into the tables all at once, but these rows all have the same data values across many columns and I think they would even tend to get inserted at the end as well. These inserts don't share values with older data and if the index is just 3 columns hopefully that would not be a very big hit on the inserts.
Does what I am saying seem reasonable or what else should I look into or consider ? Thanks alot, I am not that familiar with these types of indexing issues but have been looking on various websites and experimenting.

Generally, the narrower the clustered index the better as the clustering key of the clustered index will be added to all non-clustered indexes, making them less efficient.
SQL server will add a uniquifier to non-unique clustered indexes, making them (and all non-clustered indexes) even wider still.
If the space used by these indexes is not an issue for you, then you should consider whether the value of the clustered index key is ever increasing (or decreasing) as if it isn't, you will get page splits and fragmentation which will definitely hurt your inserts.
It's probably worth setting this up in a test system if you can to examine the impact different indexing strategies have on your normal queries.

Related

How do I know if I should create an Non clustered Index on a Clustered Index or on a heap?

I have a DB containing some tables, no table has non-clustered index defined. The big application which uses this DB is slow(because the number of rows are close to a million). I want to optimize DB fetch operations by adding indexes. When I read about indexes I came across index names like:
Clustered Index
Non clustered Index on a Clustered Index
Non Clustered Index on a heap
Also, indexes need to be created only on some columns. How will I identify that in a table which kind of index need to be created and across which column(s)?
P.S. Execution plan while running query tells to create NCI on all columns. Can I blindly go ahead and create index as suggested by SQL Server?

A clustered index is a type of index which defines how the data of your table will be stored (more precisely, how the data is sorted). This is the reason why the clustered index columns should be chosen very carefully (sequentially inserted data is primordial or you will end up with fragmentation and performance issues over time, an integer "identity" column is a good pick for example).
I found out that it is a good practice to always have a clustered index on your permanent tables.
A table without a clustered index is a heap because data is not sorted in a particular way (it'll be added at the end of the file), data is therefore harder to retrieve. The only improvement you can get from using a heap without indexes is that data insertion will be faster.
A non-clustered index is a separate file that will help speed up your queries on the columns you choose (it will store values of the indexed data and their reference to the location in the main file). As the data of your table become more and more important, having those separate files can dramatically improve the performance of your queries because the db engine won't have to scan the entire table for the data you are looking for, but just look for the position of the rows to retrieve in the index file (which contains ordered data of the columns you've chosen).
Adding indexes will speed up your select queries, but slow down writing operations as the indexes have to be updated. So, don't create too many indexes on too many columns !

There are two types of tables: heap tables (which have no clustered index) and clustered tables (which do). Each of these can have any number of non-clustered indexes built on them.
When do you use a heap table? Realistically, in only one scenario: when you're doing parallel bulk imports. This specific scenario requires that the table have no clustered index. In all other scenarios, a heap table has worse performance than a table with a clustered index -- don't take my word for it, though: Microsoft has an article on this that, while dated, is still relevant. In other words, for most practical database work, you can ignore heap tables as a curiosity.
On what do you create your clustered index? Ideally, on a column with values that are ever increasing (or decreasing) and aren't changed in updates. Why? Because this has the least overhead for updating, as no data has to be moved. Because of these two requirements, surrogate keys in the form of IDENTITY columns are popular, since they neatly meet them. This is certainly not the only possible choice, though: indexing on an ever increasing timestamp is also popular (in big data warehouses, for example).
With that (mostly) out of the way, how do you decide what other columns to index? Now that's a great question, but not one I feel qualified to answer in all its glory here. I've gotten a lot of experience myself with index design over the years, but I'm not aware of specific books or articles that I could recommend (which is not to say they don't exist, and I hope other people can chime in with suggestions). For what it's worth, Microsoft itself has written a guide here, which is quite in-depth (perhaps too much so), but I haven't thoroughly read this myself.
Can you blindly go ahead and create the indexes as suggested by the query optimizer? If by that you mean "should I", then the answer is almost certainly no. The query optimizer is very eager to suggest and and all possible indexes that could speed up a query, but that doesn't mean they should all be created -- every index increases the overhead of performing inserts and updates on the table. If you followed the optimizer's advice, it's probable that you would eventually end up with indexes covering every possible combination of columns, which would be pretty terrible for anything that's not a SELECT query. Having said that, creating too many indexes is almost always not as awful as creating no indexes at all, since that quickly kills performance for most queries that involve tables with more than about 10.000 rows.
I could write books on this topic, but I haven't the time or (I fear) the skill. I hope this at least gets you started.

Create more than one non clustered index on same column in SQL Server

What is the index creating strategy?
Is it possible to create more than one non-clustered index on the same column in SQL Server?
How about creating clustered and non-clustered on same column?
Very sorry, but indexing is very confusing to me.
Is there any way to find out the estimated query execution time in SQL Server?

The words are rather logical and you'll learn them quite quickly. :)
In layman's terms, SEEK implies seeking out precise locations for records, which is what the SQL Server does when the column you're searching in is indexed, and your filter (the WHERE condition) is accurrate enough.
SCAN means a larger range of rows where the query execution planner estimates it's faster to fetch a whole range as opposed to individually seeking each value.
And yes, you can have multiple indexes on the same field, and sometimes it can be a very good idea. Play out with the indexes and use the query execution planner to determine what happens (shortcut in SSMS: Ctrl + M). You can even run two versions of the same query and the execution planner will easily show you how much resources and time is taken by each, making optimization quite easy.
But to expand on these a bit, say you have an address table like so, and it has over 1 billion records:
CREATE TABLE ADDRESS
(ADDRESS_ID INT -- CLUSTERED primary key ADRESS_PK_IDX
, PERSON_ID INT -- FOREIGN KEY, NONCLUSTERED INDEX ADDRESS_PERSON_IDX
, CITY VARCHAR(256)
, MARKED_FOR_CHECKUP BIT
, **+n^10 different other columns...**)
Now, if you want to find all the address information for person 12345, the index on PERSON_ID is perfect. Since the table has loads of other data on the same row, it would be inefficient and space-consuming to create a nonclustered index to cover all other columns as well as PERSON_ID. In this case, SQL Server will execute an index SEEK on the index in PERSON_ID, then use that to do a Key Lookup on the clustered index in ADDRESS_ID, and from there return all the data in all other columns on that same row.
However, say you want to search for all the persons in a city, but you don't need other address information. This time, the most effective way would be to create an index on CITY and use INCLUDE option to cover PERSON_ID as well. That way, a single index seek / scan would return all the information you need without the need to resort to checking the CLUSTERED index for the PERSON_ID data on the same row.
Now, let's say both of those queries are required but still rather heavy because of the 1 billion records. But there's one special query that needs to be really really fast. That query wants all the persons on addresses that have been MARKED_FOR_CHECKUP, and who must live in New York (ignore whatever checkup means, that doesn't matter). Now you might want to create a third, filtered index on MARKED_FOR_CHECKUP and CITY, with INCLUDE covering PERSON_ID, and with a filter saying CITY = 'New York' and MARKED_FOR_CHECKUP = 1. This index would be insanely fast, as it only ever cover queries that satisfy those exact conditions, and therefore has a fraction of the data to go through compared to the other indexes.
(Disclaimer here, bear in mind that the query execution planner is not stupid, it can use multiple nonclustered indexes together to produce the correct results, so the examples above may not be the best ones available as it's very hard to imagine when you would need 3 different indexes covering the same column, but I'm sure you get the idea.)
The types of index, their columns, included columns, sorting orders, filters etc depend entirely on the situation. You will need to make covering indexes to satisfy several different types of queries, as well as customized indexes created specifically for singular, important queries. Each index takes up space on the HDD so making useless indexes is wasteful and requires extra maintenance whenever the data model changes, and wastes time in defragmentation and statistics update operations though... so you don't want to just slap an index on everything either.
Experiment, learn and work out which works best for your needs.

I'm not the expert on indexing either, but here is what I know.
You can have only ONE Clustered Index per table.
You can have up to a certain limit of non clustered indexes per table. Refer to http://social.msdn.microsoft.com/Forums/en-US/63ba3877-e0bd-4417-a04b-19c3bfb02ac9/maximum-number-of-index-per-table-max-no-of-columns-in-noncluster-index-in-sql-server?forum=transactsql
Indexes should just have different names, but its better not to use the same column(s) on a lot of different indexes as you will run into some performance problems.
A very important point to remember is that Indexes although it makes your select faster, influence your Insert/Update/Delete speed as the information needs to be added to the index, which means that the more indexes you have on a column that gets updated a lot, will drastically reduce the speed of the update.
You can include columns that is used on a CLUSTERED index in one or more NON-CLUSTERED indexes.
Here is some more reading material
http://www.sqlteam.com/article/sql-server-indexes-the-basics
http://www.programmerinterview.com/index.php/database-sql/what-is-an-index/
EDIT
Another point to remember is that an index takes up space just like the table. The more indexes you create the more space it uses, so try not to use char/varchar (or nchar/nvarchar) in an index. It uses to much space in the index, and on huge columns give basically no benefit. When your Indexes start to become bigger than your table, it also means that you have to relook your index strategy.

Best practice for storing millions of rows with TSQL (Sql Server 2008)

To start off, I'm not that great with database strategies, so I don't know really how to even approach this.
What I want to do is store some info in a database. Essentially the data is going to look like this
SensorNumber (int)
Reading (int)
Timestamp (Datetime?)(I just want to track down to the minute, nothing further is needed)
The only thing about this is that over a few months of tracking I'm going to have millions of rows (~5 million rows).
I really only care about searching by Timestamp and/or SensorNumber. The data in here is pretty much going to be never edited (insert once, read many times).
How should I go about building this? Is there anything special I should do other than create the table? and create the one index for SensorNumber and Temp?

Based on your comment, I would put a clustered index on (Sensor, Timestamp).
This will always cover when you want to search for SENSOR alone, but will also cover both fields checked in combination.
If you want to ever search for Timestamp alone, you can add a nonclustered index there as well.
One issue you will have with this design is the need to rebuild the table since you are going to be inserting rows non-sequentially - the new rows won't always belong at the end of the index.
Also, please do not name a field timestamp - this is a keyword in SQL Server and can cause you all kinds of issues if you don't delimit it everywhere.

You definitely want to use a SQL-Server "clustered index" for the most selective data you're likely to search on.
Here's more info:
http://www.sql-server-performance.com/2007/clustered-indexes/
http://odetocode.com/articles/70.aspx
http://www.sql-server-performance.com/2002/index-not-equal/
ELABORATION:
"Sensor" would be a poor choice - you're likely to have few sensors, many rows. This would not be a discriminating index.
"Time" would be discriminating... but it would also be a poor choice. Because the time itself, independent of sensor, temperature, etc, is probably meaningless to your query.
A clustered index on "sensor,time" might be ideal. Or maybe not - it depends on what you're after.
Please review the above links.
PS:
Please, too, consider using "datetime" instead of "timestamp". They're two completely different types under MSSQL ... and "datetime" is arguably the better, more flexible choice:
http://www.sqlteam.com/article/timestamps-vs-datetime-data-types

I agree with using a clustered index, you are almost certainly going to end up with one anyway - so it's better to define it.
A clustered index determines the order that the data is stored, adding to the end is cheaper than inserting into the middle.
Think of a deck of cards you are trying to keep in rank order as you add cards. If the highest rank is a 8, adding a 9 is trivial - put it at the top.
If you add a 5, it gets more complex, you have to work out where to put it and then insert it.
So adding items with a clustered index in order is optimal.
Given that I would suggest having a clustered index in (Timestamp,Sensor).
Clustering on (Sensor, Timestamp) will create a LOT of changes to the physical ordering of data which is very expensive (even using SSD).
If Timestamp,Sensor combo is unique then define it as being UNIQUE, otherwise Sql Server will add in a uniqueidentifier on the index to resolve duplicates.
Primary keys are automatically unique, almost all tables should have a primary key.
If (Timestamp,Sensor) is not unique, or you want to reference this data from another table, consider using an identity column as the clustered Primary Key.
Good Luck!

Clustered index consideration in regards to distinct valus and large result sets and a single vertical table for auditing

I've been researching best practices for creating clustered indexes and I'm just trying to totally understand these two suggestions that's listed with pretty much every BLOG or article on the matter
Columns that contain a large number of distinct values.
Queries that return large result sets.
These seem to be slightly contrary or I'm guessing maybe it just depends on how you're accessing the table.. Or my interpretation of what "large result sets" mean is wrong....
Unless you're doing range queries over the clustered column it seems like you typically won't be getting large result sets that matter. So in cases where SQL Server defaults the clustered indexes on the PK you're rarely going to fulfill the large result set suggestion but of course it does the large number of distinct values..
To give the question a little more context. This quetion stems from a vertical auditing table we have that has a column for TABLE.... Every single query that's written against this table has a
WHERE TABLE = 'TABLENAME'
But the TableName is highly non distinct... Each result set of tablenames is rather large which seems to fulfill that second conditon but it's definitely not largerly unique.... Which means all that other stuff happens with having to add the 4 byte Uniquifer (sp?) which makes the table a lot larger etc...
This situation has come up a few times for me when I've come upon DBs that have say all the contact or some accounts normalized into a single table and they are only separated by a TYPE parameter. Which is on every query....
In the case of the audit table the queries are typically not that exciting either they are just sorted by date modified, sometimes filtered by column, user that made the change etc...
My other thought with this auditing scenario was to just make the auditing table a HEAP so that inserting is fast so there's not contention between tables being audited and then to generate indexed views over the data ...

Index design is just as much art as it is science.
There are many things to consider, including:
How the table will be accessed most often: mostly inserts? any updates? more SELECTs than DML statements? Any audit table will likely have mostly inserts, no updates, rarely deletes unless there is a time-limit on the data, and some SELECTs.
For Clustered indexes, keep in mind that the data in each column of the clustered index will be copied into each non-clustered index (though not for UNIQUE indexes, I believe). This is helpful as those values are available to queries using the non-clustered index for covering, etc. But it also means that the physical space taken up by the non-clustered indexes will be that much larger.
Clustered indexes generally should either be declared with the UNIQUE keyword or be the Primary Key (though there are exceptions, of course). A non-unique clustered index will have a hidden 4-byte field called a uniqueifier that is required to make each row with a non-unique key value addressable, and is just wasted space given that the order of your rows within the non-unique groupings is not apparently obvious so trying to narrow down to a single row is still a range.
As is mentioned everywhere, the clustered index is the physical ordering of the data so you want to cater to what needs the best I/O. This relates also to the point directly above where non-unique clustered indexes have an order but if the data is truly non-unique (as opposed to unique data but missing the UNIQUE keyword when the index was created) then you miss out on a lot of the benefit of having the data physically ordered.
Regardless of any information or theory, TEST TEST TEST. There are many more factors involved that pertain to your specific situation.
So, you mentioned having a Date field as well as the TableName. If the combination of the Date and TableName is unique then those should be used as a composite key on a PK or UNIQUE CLUSTERED index. If they are not then find another field that creates the uniqueness, such as UserIDModified.
While most recommendations are to have the most unique field as the first one (due to statistics being only on the first field), this doesn't hold true for all situations. Given that all of your queries are by TableName, I would opt for putting that field first to make use of the physical ordering of the data. This way SQL Server can read more relevant data per read without having to seek to other locations on disk. You would likely also being ordering on the Date so I would put that field second. Putting TableName first will cause higher fragmentation across INSERTs than putting the Date first, but upon an index rebuild the data access will be faster as the data is already both grouped ( TableName ) and ordered ( Date ) as the queries expect. If you put Date first then the data is still ordered properly but the rows needed to satisfy the query are likely spread out across the datafile(s) which would require more I/O to get. AND, more data pages to satisfy the same query means more pages in the Buffer Pool, potentially pushing out other pages and reducing Page Life Expectancy (PLE). Also, you would then really need to inculde the Date field in all queries as any queries using only TableName (and possibly other filters but NOT using the Date field) will have to scan the clustered index or force you to create a nonclustered index with TableName being first.
I would be weary of the Heap plus Indexed View model. Yes, it might be optimized for the inserts but the system still needs to maintain the data in the indexed view across all DML statements against the heap. Again you would need to test, but I don't see that being materially better than a good choice of fields for a clustered index on the audit table.

Effects of Clustered Index on DB Performance

I recently became involved with a new software project which uses SQL Server 2000 for its data storage.
In reviewing the project, I discovered that one of the main tables uses a clustered index on its primary key which consists of four columns:
Sequence numeric(18, 0)
Date datetime
Client varchar(9)
Hash tinyint
This table experiences a lot of inserts in the course of normal operation.
Now, I'm a C++ developer, not a DB Admin, but my first impression of this table design was that that having these fields as a clustered index would be very detrimental to insert performance, since the data would have to be physically reordered on each insert.
In addition, I can't really see any benefit to this since one would have to be querying all of these fields frequently to justify the clustered index, right?
So basically I need some ammunition for when I go to the powers that be to convince them that the table design should be changed.

The clustered index should contain the column(s) most queried by to give the greatest chance of seeks or of making a nonclustered index cover all the columns in the query.
The primary key and the clustered index do not have to be the same. They are both candidate keys, and tables often have more than one such key.
You said
In addition, I can't really see any benefit to this since one would have to be querying all of these fields frequently to justify the clustered index, right?
That's not true. A seek can be had just by using the first column or two of the clustered index. It may be a range seek, but it's still a seek. You don't have to specify all the columns of it in order to get that benefit. But the order of the columns does matter a lot. If you're predominantly querying on Client, then the Sequence column is a bad choice as the first in the clustered index. The choice of the second column should be the item that is most queried in conjunction with the first (not by itself). If you find that a second column is queried by itself almost as often as the first column, then a nonclustered index will help.
As others have said, reducing the number of columns/bytes in the clustered index as much as possible is important.
It's too bad that the Sequence is a random value instead of incrementing, but that may not be able to be helped. The answer isn't to throw in an identity column unless your application can start using it as the primary query condition on this table (unlikely). Now, since you're stuck with this random Sequence column (presuming it IS the most often queried), let's look at another of your statements:
having these fields as a clustered index would be very detrimental to insert performance, since the data would have to be physically reordered on each insert.
That's not entirely true.
The physical location on the disk is not really what we're talking about here, but it does come into play in terms of fragmentation, which is a performance implication.
The rows inside each 8k page are not ordered. It's just that all the rows in each page are less than the next page and more than the previous one. The problem occurs when you insert a row and the page is full: you get a page split. The engine has to copy all the rows after the inserted row to a new page, and this can be expensive. With a random key you're going to get a lot of page splits. You can ameliorate the problem by using a lower fillfactor when rebuilding the index. You'd have to play with it to get the right number, but 70% or 60% might serve you better than 90%.
I believe that having datetime as the second CI column could be beneficial, since you'd still be dealing with pages needing to be split between two different Sequence values, but it's not nearly as bad as if the second column in the CI was also random, since you'd be guaranteed to page split on every insert, where with an ascending value you can get lucky if the row can be added to a page because the next Sequence number starts on the next page.
Shortening the data types and number of all columns in a table as well as its nonclustered indexes can boost performance too, since more rows per page = fewer page reads per request. Especially if the engine is forced to do a table scan. Moving a bunch of rarely-queried columns to a separate 1-1 table could do wonders for some of your queries.
Last, there are some design tweaks that could help as well (in my opinion):
Change the Sequence column to a bigint to save a byte for every row (8 bytes instead of 9 for the numeric).
Use a lookup table for Client with a 4-byte int identity column instead of a varchar(9). This saves 5 bytes per row. If possible, use a smallint (-32768 to 32767) which is 2 bytes, an even greater savings of 7 bytes per row.
Summary: The CI should start with the column most queried on. Remove any columns from the CI that you can. Shorten columns (bytes) as much as you can. Use a lower fillfactor to mitigate the page splits caused by the random Sequence column (if it has to stay first because of being queried the most).
Oh, and get your online defragging going. If the table can't be changed, at least it can be reorganized frequently to keep it in best possible shape. Don't neglect statistics, either, so the engine can pick appropriate execution plans.
UPDATE
Another strategy to consider is if the composite key used in the table can be converted to an int, and a lookup table of the values is created. Let's say some combination of less than all 4 columns is repeated in over 100 rows, for example, Sequence + Client + Hash but only with varying Date values. Then an insert to a separate SequenceClientHash table with an identity column could make sense, because then you could look up the artificial key once and use it over and over again. This would also get your CI to add new rows only on the last page (yay) and substantially reduce the size of the CI as repeated in all nonclustered indexes (yippee). But this would only make sense in certain narrow usage patterns.
Now, marc_s suggested just adding an additional int identity column as the clustered index. It is possible that this could help by making all the nonclustered indexes get more rows per page, but it all depends on exactly where you want the performance to be, because this would guarantee that every single query on the table would have to use a bookmark lookup and you could never get a table seek.
About "tons of page splits and bad index fragmentation": as I already said this can be ameliorated somewhat with a lower fill factor. Also, frequent online index reorganization (not the same as rebuilding) can help reduce the effect of this.
Ultimately, it all comes down to the exact system and its unique pattern of data access combined with decisions about which parts you want optimized. For some systems, having a slower insert isn't bad as long as selects are always fast. For others, having consistent but slightly slower select times is more important than having slightly faster but inconsistent select times. For others, the data isn't really read until it's pushed to a data warehouse anyway so the inserts need to be as fast as possible. And adding into the mix is the fact that performance isn't just about user wait time or even query response time but also about server resources especially in the case of massive parallelism, so that total throughput (say, in client responses per time unit) matters more than any other factor.

Clustered indexes (CI) work best over ever-increasing, narrow, rarely changing values. You'll want your CI to cover the column(s) that get hit the most often in queries with >=, <=, or BETWEEN statements.
I'm not sure how your data normally gets hit. Most often you'll see a CI on an IDENTITY column or another narrow column (because this column will also be returned "tacked on" to all non-clustered indexes, and we don't want a ton of data added on to every fetch if it isn't needed). It's possible the data might be getting queried most often on date, and that may be a good choice, but all four columns is likely not correct (I stress likely, because I don't know the set-up; this may not have anything wrong with it). There are some pointers here: http://msdn.microsoft.com/en-us/library/aa933131%28SQL.80%29.aspx

There are a few things you are misunderstanding about how SQL creates and uses indexes.
Clustered indexes aren't necessarily physically ordered on disk by the clustered index, at least not in real-time. They are just a logical ordering.
I wouldn't expect a major performance hit based on this structure and removing the clustered index before you have actually identified a performance issue related to that index is clearly premature optimization.
Also, an index can be useful (especially one with several fields in it) even for searches that don't sort or get queried on all columns included in it.
Obviously, there should be a justification for creating a multi-part clustered index, just like any index, so it makes sense to ask for that if you think it was added capriciously.
Bottom line: Don't optimize the indexes for insert performance until you have actually detected a performance problem with inserts. It usually isn't worth it.

If you have only that single clustered index on your table, that might not be too bad. However, the clustering index is also used for looking up the real data page for any hit in a non-clustered index - therefor, the clustered index (all its columns) are also part of each and every non-clustered index you might have on your table.
So if you have a few nonclustered indices on your table, then you're definitely a) wasting a lot of space (and not just on disk - also in your server's RAM!), and b) your performance will be bad.
A good clustered index ought to be:
small (best bet: a 4-byte INT) - yours is pretty bad with up to 28 bytes per entry
unique
stable (never change)
ever-increasing
I would bet your current setup violates at least two if not more of those requirements. Not following these recommendations will lead to waste of space, and as you rightfully say, lots of page and index fragmentation and page splits (having to "rearrange" the data when an insert happens somewhere in the middle of the clustered index).
Quite honestly: just add a surrogate ID INT IDENTITY(1,1) to your table and make that the primary clustered key - you should see quite a nice boost in performance, just from that, if you have lots of INSERT (and UPDATE) operations going on!
See some more background info on what makes a good clustering key, and what is important about them, here:
GUIDs as PRIMARY KEYs and/or the clustering key
The Clustered Index Debate Continues...
Ever-increasing clustering key - the Clustered Index Debate..........again!

I ultimately agree with Erik's last paragraph:
"Ultimately, it all comes down to the exact system and its unique pattern of data access combined with decisions about which parts you want optimized..."
This is the basic thing I force people to learn: there's no universal solution.
You have to know your data and the actions performed against it. You have to know how frequent different type of actions are and their impact and expected execution times (you don't have to hard tune some rarely executed query and impact everything else if the end user agrees the query execution time is not so important--let's say waiting for few minutes for some report once per week is okay). Of course, as Erik said
"performance isn't just about user wait time or even query response time but also about server resources"
If such a query affects overall server performance, it should be considered as a serious candidate for optimization, even if execution time is fine. I've seen some very fast queries that used huge amount of CPU on multiprocessor servers, while slightly slower solution were incomparable "lighter" from resource utilization point of view. In that case I almost always go for the slower one.
Once you know what is your goal you can decide how many indexes you need and which one should be clustered. Unique constraints, filtered indexes, indexes with included columns are quite powerful tools for tuning. Choosing proper columns is important, but often choosing proper order of columns is even more important. And at the end, don't kill insert/update performance with tons of indexes if the table is frequently modified.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight