SQL Server : time-series data performance - sql-server

I have a table of a little over 1 billion rows of time-series data with fantastic insert performance but (sometimes) awful select performance.
Table tblTrendDetails (PK is ordered as shown):
PK TrendTime datetime
PK CavityId int
PK TrendValueId int
TrendValue real
The table is continuously pulling in new data and purging old data, so insert and delete performance needs to remain snappy.
When executing a query such as the following, performance is poor (30 sec):
SELECT *
FROM tblTrendDetails
WHERE TrendTime BETWEEN #inMinTime AND #inMaxTime
AND CavityId = #inCavityId
AND TrendValueId = #inTrendId
If I execute the same query again (with similar times, but any #inCavityId or #inTrendId), performance is very good (1 sec). Performance counters show that disk access is the culprit the first time the query is run.
Any recommendations regarding how to improve performance without (significantly) adversely affecting the insert or delete performance? Any suggestions (including completely changing the underlying database) are welcome.

The fact that subsequent queries of the same or similar data run much faster is probably due to SQL Server caching your data. That said, is it possible to speed this initial query up?
Verify the query plan:
My guess is that your query should result in an Index Seek rather than an Index Scan (or worse, a Table Scan). Please verify this using SET SHOWPLAN_TEXT ON; or a similar feature. Using between and = as your query does should really take advantage of the clustered index, though that's debatable.
Index Fragmentation:
It is possible that your clustered index (the primary key in this case) is quite fragmented after all of those inserts and deletes. I would probably check this with DBCC SHOWCONTIG (tblTrendDetails).
You can defrag the table's indexes with DBCC INDEXDEFRAG (MyDatabase, tblTrendDetails).
This may take some time, but will allow the table to remain accessible, and you can stop the operation without any nasty side-effects.
You might have to go further and use DBCC DBREINDEX (tblTrendDetails). This is an offline operation, though, so you should only do this when the table does not need to be accessed.
There are some differences described here: Microsoft SQL Server 2000 Index Defragmentation Best Practices.
Be aware that your transaction log can grow quite a bit from defragging a large table, and it can take a long time.
Partitioned Views:
If these do not remedy the situation (or fragmentation is not a problem), you may even wish to look to partitioned views, in which you create a bunch of underlying base tables for various ranges of records, then union them all up in a view (replacing your original table).
Better Stuff:
If performance of these selects is a real business need, you may be able to make the case for better hardware: faster drives, more memory, etc. If your drives are twice as fast, then this query will run in half the time, yeah? Also, this may not be workable for you, but I've simply found newer versions of SQL Server to truly be faster with more options and better to maintain. I'm glad to have moved most of my company's data to 2008R2. But I digress...

Related

SQL Server performance optimization - Audit Table Redesign

I have an Audit table in a SQL server database that with the following columns:
Sequence --- bigint (primary key)
TableName --- varachar(50) (nonclustered index)
ColumnName --- varachar(50) (nonclustered index)
Control --- char(10) (nonclustered index)
BeforeValue --- varchar(500) (nonclustered index)
AfterValue, ---varchar(500)
DateChanged --- datetime
ChangedBy --- char(20)
CompanyCode --- char(5) (nonclustered index)
It has 5 billion+ rows of data. Around 200+ triggers are inserting data in this table and around 50+ stored procedures are inserting as well as querying data from this table. Whenever a column is updated/deleted in any of the 200+ tables in a transactional database, a row is inserted in the Audit table respectively.
I inherited this table recently. We have been experiencing performance issues lately and I am told to redesign this Audit table to address the associated performance problems.
I am looking for suggestions, next steps, performance matrix ideas, any help will be appreciated.
Thanks in advance.
I think you do not need to change much, but just need to redesign your process as follows:
Create an archive table exactly the same as your current audit table.
Transfer all your current audit able into this new archive table, meaning your current audit table is empty
Schedule a daily (or weekly) job to move data from your audit table to the archive table.
Based on your data retention policy, clean up your archive table.
But for performance issue, you have to make sure whether inserting into your audit table is really a culprit. If it is, then the above-mentioned way may relieve your pain. Otherwise, it may not help.
I had to do something similar. If you must maintain the 5 billion records, then your best solution is to partition the table. You will need to do the following:
Create partition function
Create partition schema
Create partition files
The partition function will govern how the data is partitioned: typically it is either by row count (i.e., sequence) or date (i.e., monthly, quarterly, yearly, etc.). Note: partitioning is only available on SQL Server 201X Enterprise.
https://learn.microsoft.com/en-us/sql/relational-databases/partitions/partitioned-tables-and-indexes
I highly recommend that you first read Microsoft's white paper on this before doing anything. Also, once you implement it, this should be done over the weekend to allow for processing as I imagine it will take some time to complete.
https://technet.microsoft.com/en-us/library/dd578580(v=sql.100).aspx
For comparisons sake, a query would not complete after 10 minutes of run time in a pre-partitioned state. After partitioning the table, the query completed within 10 seconds. Note: once the table is partitioned, you will need to tune your queries to include the partitioned column in the predicate. Otherwise, you probably won't notice much of a difference in response times.
You're not highly clear on where the performance issues are: are they in reading the table, or writing to it? Typically, if the issue is in reading the table, you'll notice a particular area of the system that is slow due to such reads, while if the issue is with writing to the table, you'll have a more subtle performance impact, but will be much more widespread and tends to slow everything down (in varying degrees) instead of just certain hotspot areas.
Another unclear issue is whether this is primarily write-only or heavily read data. Audit tables are typically write-optimized, to be as fast as they can be for adding new data, while slow to read from it when you do need it (typically we're writing far more often than reading: the opposite of "normal" transactional rows in a RDMS).
Thus, the first step is to determine where the performance issues are, and gather some clues from there to determine whether you need to be optimizing the read portion or the write portion of the table.
If it is indeed a write performance issue (a bit harder to profile and recognize compared to read issues), I would look at perhaps dropping some indexes that are on the table. I think for a write-optimized table, there's a lot of indexes on there that invoking a lot of overhead on each write operation (remember that each index needs to be maintained with every write to the table, so while indexes are great for reading, they take away write performance: especially nonclustered indexes).
If it is a read issue, I might still opt to write-optimize anyway, then devise some means to improve the read performance as best can be once it's write-optimized. Other answers here address that in particular, but without being more familiar with the system requirements, it's hard to say which direction is the best to be moving.

Sql Server 2008 R2 DC Inserts Performance Change

I have noticed an interesting performance change that happens around 1,5 million entered values. Can someone give me a good explanation why this is happening?
Table is very simple. It is consisted of (bigint, bigint, bigint, bool, varbinary(max))
I have a pk clusered index on first three bigints. I insert only boolean "true" as data varbinary(max).
From that point on, performance seems pretty constant.
Legend: Y (Time in ms) | X (Inserts 10K)
I am also curios about constant relatively small (sometimes very large) spikes I have on the graph.
Actual Execution Plan from before spikes.
Legend:
Table I am inserting into: TSMDataTable
1. BigInt DataNodeID - fk
2. BigInt TS - main timestapm
3. BigInt CTS - modification timestamp
4. Bit: ICT - keeps record of last inserted value (increases read performance)
5. Data: Data
Bool value Current time stampl keeps
Enviorment
It is local.
It is not sharing any resources.
It is fixed size database (enough so it does not expand).
(Computer, 4 core, 8GB, 7200rps, Win 7).
(Sql Server 2008 R2 DC, Processor Affinity (core 1,2), 3GB, )
Have you checked the execution plan once the time goes up? The plan may change depending on statistics. Since your data grow fast, stats will change and that may trigger a different execution plan.
Nested loops are good for small amounts of data, but as you can see, the time grows with volume. The SQL query optimizer then probably switches to a hash or merge plan which is consistent for large volumes of data.
To confirm this theory quickly, try to disable statistics auto update and run your test again. You should not see the "bump" then.
EDIT: Since Falcon confirmed that performance changed due to statistics we can work out the next steps.
I guess you do a one by one insert, correct? In that case (if you cannot insert bulk) you'll be much better off inserting into a heap work table, then in regular intervals, move the rows in bulk into the target table. This is because for each inserted row, SQL has to check for key duplicates, foreign keys and other checks and sort and split pages all the time. If you can afford postponing these checks for a little later, you'll get a superb insert performance I think.
I used this method for metrics logging. Logging would go into a plain heap table with no indexes, no foreign keys, no checks. Every ten minutes, I create a new table of this kind, then with two "sp_rename"s within a transaction (swift swap) I make the full table available for processing and the new table takes the logging. Then you have the comfort of doing all the checking, sorting, splitting only once, in bulk.
Apart from this, I'm not sure how to improve your situation. You certainly need to update statistics regularly as that is a key to a good performance in general.
Might try using a single column identity clustered key and an additional unique index on those three columns, but I'm doubtful it would help much.
Might try padding the indexes - if your inserted data are not sequential. This would eliminate excessive page splitting and shuffling and fragmentation. You'll need to maintain the padding regularly which may require an off-time.
Might try to give it a HW upgrade. You'll need to figure out which component is the bottleneck. It may be the CPU or the disk - my favourite in this case. Memory not likely imho if you have one by one inserts. It should be easy then, if it's not the CPU (the line hanging on top of the graph) then it's most likely your IO holding you back. Try some better controller, better cached and faster disk...

SQL Server 2008 indexes - performance gain on queries vs. loss on INSERT/UPDATE

How can you determine if the performance gained on a SELECT by indexing a column will outweigh the performance loss on an INSERT in the same table? Is there a "tipping-point" in the size of the table when the index does more harm than good?
I have table in SQL Server 2008 with 2-3 million rows at any given time. Every time an insert is done on the table, a lookup is also done on the same table using two of its columns. I'm trying to determine if it would be beneficial to add indexes to the two columns used in the lookup.
Like everything else SQL-related, it depends:
What kind of fields are they? Varchar? Int? Datetime?
Are there other indexes on the table?
Will you need to include additional fields?
What's the clustered index?
How many rows are inserted/deleted in a transaction?
The only real way to know is to benchmark it. Put the index(es) in place and do frequent monitoring, or run a trace.
This depends on your workload and your requirements. Sometimes data is loaded once and read millions of times, but sometimes not all loaded data is ever read.
Sometimes reads or writes must complete in certain time.
case 1: If table is static and is queried heavily (eg: item table in Shopping Cart application) then indexes on the appropriate fields is highly beneficial.
case 2: If table is highly dynamic and not a lot of querying is done on a daily basis (eg: log tables used for auditing purposes) then indexes will slow down the writes.
If above two cases are the boundary cases, then to build indexes or not to build indexes on a table depends on which case above does the table in contention comes closest to.
If not leave it to the judgement of Query tuning advisor. Good luck.

SQL Server 2008 Performance: No Indexes vs Bad Indexes?

i'm running into a strange problem in Microsoft SQL Server 2008.
I have a large database (20 GB) with about 10 tables and i'm attempting to make a point regarding how to correctly create indexes.
Here's my problem: on some nested queries i'm getting faster results without using indexes! It's close (one or two seconds), but in some cases using no indexes at all seems to make these queries run faster... I'm running a Checkpoiunt and a DBCC dropcleanbuffers to reset the caches before running the scripts, so I'm kinda lost.
What could be causing this?
I know for a fact that the indexes are poorly constructed (think one index per relevant field), the whole point is to prove the importance of constructing them correctly, but it should never be slower than having no indexes at all, right?
EDIT: here's one of the guilty queries:
SET STATISTICS TIME ON
SET STATISTICS IO ON
USE DBX;
GO
CHECKPOINT;
GO
DBCC DROPCLEANBUFFERS;
GO
DBCC FREEPROCCACHE;
GO
SELECT * FROM Identifier where CarId in (SELECT CarID from Car where ManufactId = 14) and DataTypeId = 1
Identifier table:
- IdentifierId int not null
- CarId int not null
- DataTypeId int not null
- Alias nvarchar(300)
Car table:
- CarId int not null
- ManufactId int not null
- (several fields followed, all nvarchar(100)
Each of these bullet points has an index, along with some indexes that simultaneously store two of them at a time (e.g. CarId and DataTypeId).
Finally, The identifier table has over million entries, while the Car table has two or three million
My guess would be that SQL Server is incorrectly deciding to use an index, which is then forcing a bookmark lookup*. Usually when this happens (the incorrect use of an index) it's because the statistics on the table are incorrect.
This can especially happen if you've just loaded large amounts of data into one or more of the tables. Or, it could be that SQL Server is just screwing up. It's pretty rare that this happens (I can count on one hand the times I've had to force index use over a 15 year career with SQL Server), but the optimizer is not perfect.
* A bookmark lookup is when SQL Server finds a row that it needs on an index, but then has to go to the actual data pages to retrieve additional columns that are not in the index. If your result set returns a lot of rows this can be costly and clustered index scans can result in better performance.
One way to get rid of bookmark lookups is to use covering indexes - an index which has the filtering columns first, but then also includes any other columns which you would need in the "covered" query. For example:
SELECT
my_string1,
my_string2
FROM
My_Table
WHERE
my_date > '2000-01-01'
covering index would be (my_date, my_string1, my_string2)
Indexes don't really have any benefit until you have many records. I say many because I don't really know what that tipping over point is...It depends on the specific application and circumstances.
It does take time for the SQL Server to work with an index. If that time exceeds the benefit...This would especially be true in subqueries, where a small difference would be multiplied.
If it works better without the index, leave out the index.
Try DBCC FREEPROCCACHE to clear the execution plan cache as well.
This is an empty guess. Maybe if you have a lot of indexes, SQL Server is spending time on analyzing and picking one, and then rejecting all of them. If you had no indexes, the engine wouldn't have to waste it's time with this vetting process.
How long this vetting process actually takes, I have no idea.
For some queries, it is faster to read directly from the table (clustered index scan), than it is to read the index and fetch records from the table (index scan + bookmark lookup).
Consider that a record lives along with other records in a datapage. Datapage is the basic unit of IO. If the table is read directly, you could get 10 records for the cost of 1 IO. If the index is read directly, and then records are fetched from the table, you must pay 1 IO per record.
Generally SQL server is very good at picking the best way to access a table (direct vs index). There may be something in your query that is blinding the optimizer. Query hints can instruct the optimizer to use an index when it is wrong to do so. Join hints can alter the order or method of access of a table. Table Variables are considered to have 0 records by the optimizer, so if you have a large Table Variable - the optimizer may choose a bad plan.
One more thing to look out for - varchar vs nvarchar. Make sure all parameters are of the same type as the target columns. There's a case where SQL Server will convert the whole index to the parameter's type in the event of a type mismatch.
Normally SQL Server does a good job at deciding what index to use if any to retrieve the data in the fastest way. Quite often it will decide not to use any indexes as it can retrieve small amounts of data from small tables quicker without going away to the index (in some situations).
It sounds like in your case SQL may not be taking the most optimum route. Having lots of badly created indexes may be causing it to pick the wrong routes to get to the data.
I would suggest viewing the query plan in management studio to check what indexes its using, and where the time is being taken. This should give you a good idea where to start.
Another note is it maybe that these indexes have gotten fragmented over time and are now not performing to their best, it maybe worth checking this and rebuilding some of them if needed.
Check the execution plan to see if it is using one of these indexes that you "know" to be bad?
Generally, indexing slows down writing data and can help to speed up reading data.
So yes, I agree with you. It should never be slower than having no indexes at all.
SQL server actually makes some indexes for you (e.g. on primary key).
Indexes can become fragmented.
Too many indexes will always reduce performance (there are FAQs on why not to index every col in the db)
also there are some situations where indexes will always be slower.
run:
SET SHOWPLAN_ALL ON
and then run your query with and without the index usage, this will let you see what index if any are being used, where the "work" is going on etc.
No Sql Server analyzes both the indexes and the statistics before deciding to use an index to speed up a query. It is entirely possible that running a non-indexed version is faster than an indexed version.
A few things to try
ensure the indexes are created and rebuilt, and re-organized (defragmented).
ensure that the auto create statistics is turned on.
Try using Sql Profiler to capture a tuning profile and then using the Database Engine Tuning Advisor to create your indexes.
Surprisingly the MS Press Examination book for Sql administration explains indexes and statistics pretty well.
See Chapter 4 table of contents in this amazon reader preview of the book
Amazon Reader of Sql 2008 MCTS Exam Book
To me it sounds like your sql is written very poorly and thus not utilizing the indexes that you are creating.
you can add indexes till you're blue in the face but if your queries aren't optimized to use those indexes then you won't get any performance gain.
give us a sample of the queries you're using.
alright...
try this and see if you get any performance gains (with the pk indexes)
SELECT i.*
FROM Identifier i
inner join Car c
on i.CarID=c.CarID
where c.ManufactId = 14 and i.DataTypeId = 1

Very large tables in SQL Server

We have a very large table (> 77M records and growing) runing on SQL Server 2005 64bit Standard edition and we are seeing some performance issues. There are up to a hundred thousand records added daily.
Does anyone know if there is a limit to the number of records SQL server Standard edition can handle? Should be be considering moving to Enterprise edition or are there some tricks we can use?
Additional info:
The table in question is pretty flat (14 columns), there is a clustered index with 6 fields, and two other indexes on single fields.
We added a fourth index using 3 fields that were in a select in one problem query and did not see any difference in the estimated performance (the query is part of a process that has to run in the off hours so we don't have metrics yet). These fields are part of the clustered index.
Agreeing with Marc and Unkown above ... 6 indexes in the clustered index is way too many, especially on a table that has only 14 columns. You shouldn't have more than 3 or 4, if that, I would say 1 or maybe 2. You may know that the clustered index is the actual table on the disk so when a record is inserted, the database engine must sort it and place it in it's sorted organized place on the disk. Non clustered indexes are not, they are supporting lookup 'tables'. My VLDBs are laid out on the disk (CLUSTERED INDEX) according to the 1st point below.
Reduce your clustered index to 1 or 2. The best field choices are the IDENTITY (INT), if you have one, or a date field in which the fields are being added to the database, or some other field that is a natural sort of how your data is being added to the database. The point is you are trying to keep that data at the bottom of the table ... or have it laid out on the disk in the best (90%+) way that you'll read the records out. This makes it so that there is no reorganzing going on or that it's taking one and only one hit to get the data in the right place for the best read. Be sure to put the removed fields into non-clustered indexes so you don't lose the lookup efficacy. I have NEVER put more than 4 fields on my VLDBs. If you have fields that are being update frequently and they are included in your clustered index, OUCH, that's going to reorganize the record on the disk and cause COSTLY fragmentation.
Check the fillfactor on your indexes. The larger the fill factor number (100) the more full the data pages and index pages will be. In relation to how many records you have and how many records your are inserting you will change the fillfactor # (+ or -) of your non-clustered indexes to allow for the fill space when a record is inserted. If you change your clustered index to a sequential data field, then this won't matter as much on a clustered index. Rule of thumb (IMO), 60-70 fillfactor for high writes, 70-90 for medium writes, and 90-100 for high reads/low writes. By dropping your fillfactor to 70, will mean that for every 100 records on a page, 70 records are written, which will leave free space of 30 records for new or reorganized records. Eats up more space, but it sure beats having to DEFRAG every night (see 4 below)
Make sure the statistics exist on the table. If you want to sweep the database to create statistics using the "sp_createstats 'indexonly'", then SQL Server will create all the statistics on all the indexes that the engine has accumulated as requiring statistics. Don't leave off the 'indexonly' attribute though or you'll add statistics for every field, that would then not be good.
Check the table/indexes using DBCC SHOWCONTIG to see which indexes are getting fragmented the most. I won't go into the details here, just know that you need to do it. Then based on that information, change the fillfactor up or down in relation to the changes the indexes are experiencing change and how fast (over time).
Setup a job schedule that will do online (DBCC INDEXDEFRAG) or offline (DBCC DBREINDEX) on individual indexes to defrag them. Warning: don't do DBCC DBREINDEX on this large of a table without it being during maintenance time cause it will bring the apps down ... especially on the CLUSTERED INDEX. You've been warned. Test and test this part.
Use the execution plans to see what SCANS, and FAT PIPES exist and adjust the indexes, then defrag and rewrite stored procs to get rid of those hot spots. If you see a RED object in your execution plan, it's because there are not statistics on that field. That's bad. This step is more of the "art than the science".
On off peak times, run the UPDATE STATISTICS WITH FULLSCAN to give the query engine as much information about the data distributions as you can. Otherwise do the standard UPDATE STATISTICS (with standard 10% scan) on tables during the weeknights or more often as you see fit with your observerations to make sure the engine has more information about the data distributions to retrieve the data for efficiently.
Sorry this is so long, but it's extremely important. I've only give you here minimal information but will help a ton. There's some gut feelings and observations that go in to strategies used by these points that will require your time and testing.
No need to go to Enterprise edition. I did though in order to get the features spoken of earlier with partitioning. But I did ESPECIALLY to have much better mult-threading capabilities with searching and online DEFRAGING and maintenance ... In Enterprise edition, it is much much better and more friendly with VLDBs. Standard edition doesn't handle doing DBCC INDEXDEFRAG with online databases as well.
The first thing I'd look at is indexing. If you use the execution plan generator in Management Studio, you want to see index seeks or clustered index seeks. If you see scans, particularly table scans, you should look at indexing the columns you generally search on to see if that improves your performance.
You should certainly not need to move to Enterprise edition for this.
[there is a clustered index with 6 fields, and two other indexes on single fields.]
Without knowing any details about the fields, I would try to find a way to make the clustered index smaller.
With SQL Server, all the clustered-key fields will also be included in all the non-clustered indices (as a way to do the final lookup from non-clustered index to actual data page).
If you have six fields at 8 bytes each = 48 bytes, multiply that by two more indices times 77 million rows - and you're looking at a lot of wasted space which translates into a lot
of I/O operations (and thus degrades performance).
For the clustered index, it's absolutely CRUCIAL for it to be unique, stable, and as small as possible (preferably a single INT or such).
Marc
Do you really need to have access to all 77 million records in a single table?
For example, if you only need access to the last X months worth of data, then you could consider creating an archiving strategy. This could be used to relocate data to an archive table in order to reduce the volume of data and subsequently, query time on your 'hot' table.
This approach could be implemented in the standard edition.
If you do upgrade to the Enterprise edition you can make use of table partitioning. Again depending on your data structure this can offer significant performance improvements. Partitioning can also be used to implement the strategy previously mentioned but with less administrative overhead.
Here is an excellent White paper on table partitioning in SQL Server 2005
http://msdn.microsoft.com/en-us/library/ms345146.aspx
I hope what I have detailed is clear and understandable. Please do feel to contact me directly if you require further assistance.
Cheers,
http://msdn.microsoft.com/en-us/library/ms143432.aspx
You've got some room to grow.
As far as performance issues, that's a whole other question. Caching, sharding, normalizing, indexing, query tuning, app code tuning, and so on.
Standard should be able to handle it. I would look at indexing and the queries you use with the table. You want to structure things in such a way that your inserts don't cause too many index recalcs, but your queries can still take advantage of the index to limit lookups to a small portion of the table.
Beyond that, you might consider partitioning the table. This will allow you to divide the table into several logical groups. You can do it "behind-the-scenes", so it still appears in sql server as one table even though it stored separately, or you can do it manually (create a new 'archive' or yearly table and manually move over rows). Either way, only do it after you looked at the other options first, because if you don't get that right you'll still end up having to check every partition. Also: partitioning does require Enterprise Edition, so that's another reason to save this for a last resort.
In and of itself, 77M records is not a lot for SQL Server. How are you loading the 100,000 records? is that a batch load each day? or thru some sort of OLTP application? and is that the performance issue you are having, i.e adding the data? or is it the querying that giving you the most problems?
If you are adding 100K records at a time, and the records being added are forcing the cluster-index to re-org your table, that will kill your performance quickly. More details on the table structure, indexes and type of data inserted will help.
Also, the amount of ram and the speed of your disks will make a big difference, what are you running on?
maybe these are minor nits, but....
(1) relational databases don't have FIELDS... they have COLUMNS.
(2) IDENTITY columns usually mean the data isn't normalized (or the designer was lazy). Some combination of columns MUST be unique (and those columns make up the primary key)
(3) indexing on datetime columns is usually a bad idea; CLUSTERING on datetime columns is also usually a bad idea, especially an ever-increasing datetime column, as all the inserts are contending for the same physical space on disk. Clustering on datetime columns in a read-only table where that column is part of range restrictions is often a good idea (see how the ideas conflict? who said db design wasn't an art?!)
What type of disks do you have?
You might monitor some disk counters to see if requests are queuing.
You might move this table to another drive by putting it in another filegroup. You can also to the same with the indexes.
Initially I wanted to agree with Marc. The width of your clustered index seems suspect, as it will essentially be used as the key to perform lookups on all your records. The wider the clustered index, the slower the access, generally. And a six field clustered index feels really, really suspect.
Uniqueness is not required for a clustered index. In fact, the best candidates for fields that should be in the clustered index are ones that are not unique and used in joins. For example, in a Persons table where each Person belongs to one Group and you frequently join Persons to Groups, while accessing batches of people by group, Person.group_id would be an ideal candidate, for this particular use case.

Resources