Related
This question was migrated from Stack Overflow because it can be answered on Database Administrators Stack Exchange.
Migrated 3 days ago.
Environment:
SQL Server 2019 on Windows Server 2019, on KVM backed by TrueNAS, 16 cores, 32 GB RAM.
Application runs 50 parallel threads all inserting into the same massive table.
This combination appears to work against the SQL Server architecture
Additional details
the problem table is both deep and wide - 20,000,000 rows with over 300 columns and 40-50 indexes
The application uses JDBC Batch API's. This particular table, due to row size, is inserting in batches of 1,000 rows.
Tables with more reasonable row sizes are inserting in batches of 10,000 rows
I can't share the actual DDL, but it's pretty mundane apart from the row simply being massive (a surrogate key BIGINT ID column, two natural key VARCHAR columns, 300 or so cargo columns, 0 BLOB/CLOB columns, then 40-50 indexes)
The primary key index DDL is "create unique index mytable_pk on dbo.mytable (keycolumn);"
The only other unique index DDL is "create unique index mytable_ndx1 on dbo.mytable (division, itemnum)";
The product that owns the database is used by hundreds of fortune 2000 customers, so changing hte data model is not an option for me or the product vendor.
Restrictions
Since the database is ultimately a third party's, any changes I make
to it must be in-place. Once the data is inserted into it, I no
longer have any access to it.
The database is owned by a third party
off-the-shelf application.
the primary key is a sequential integer
Observations and metrics
Early in the process, we were bottlenecked on CPU resources.
Once we hit about 1,000,000 rows, we were single threading on latches, sometimes spending over two seconds in a latch, and rarely spending less than 500ms in a latch. Latching and IO buffer waits were both excessive. CPU dropped to about 12% usage.
In a second test, I dropped all of the indexes and re-ran the job. The job completed 8 times as quickly, showing zero load on the SQL server and bottlenecking on CPU on the application which is very good from the SQL Server perspective.
After reading Microsoft's literature, I came to the conclusion that the data model is working against SQL Server's indexing architecture for tuning for massive inserts.
I will not always have the option of dropping and recreating the indexes. Is there a way to tune the table to distribute the I/O
** Now to the real question **
Is there a way to tune SQL Server, under the covers, to distribute the IO so sequential numbers in an index not in the same buffer when doing massive inserts of sequential data?
There are several well-known approaches to addressing last page insert contention in SQL Server.
Many of these are covered in the documentation at Resolve last-page insert PAGELATCH_EX contention in SQL Server. Summarising the options from that link:
Use OPTIMIZE_FOR_SEQUENTIAL_KEY (details)
Move primary key off identity column
Make the leading key a non-sequential column
Add a non-sequential value as a leading key
Use a GUID as a leading key
Use table partitioning and a computed column with a hash value
Switch to In-Memory OLTP
Method 7 can also be implemented as an in-memory OLTP table to handle a high rate of ingestion with regular batch moves to the final destination table. For the very highest concurrency, use natively compiled code with the in-memory table as much as possible (including for the inserts). The frequency and size of moves is dictated by your requirements.
As mentioned in another answer, delayed durability can also improve insert performance in many cases.
Related Q & A: Solving periodic high PAGELATCH_EX Waits. Last page contention?
All that said, you haven't shown evidence of a last-page contention issue at all. More likely, you're encountering problems related to updating all those secondary indexes and a lack of memory on the instance meaning index maintenance often has to wait for pages to be brought in from storage for modification. You don't mention the type of latch you see waits on, but I imagine they'd be PAGEIOLATCH_*.
The primary solution would be to dramatically increase the memory available to SQL Server for its buffer pool so fewer IOs are necessary. Failing that, a faster storage subsystem would be required.
Have you tried using Delayed Durability?
When to use delayed transaction durability
Some of the cases in which you could benefit from using delayed transaction durability are:
You can tolerate some data loss.
If you can tolerate some data loss, for example, where individual records are not critical as long as you have most of the data, then delayed durability may be worth considering. If you cannot tolerate any data loss, do not use delayed transaction durability.
You are experiencing a bottleneck on transaction log writes.
If your performance issues are due to latency in transaction log writes, your application will likely benefit from using delayed transaction durability.
Your workloads have a high contention rate.
If your system has workloads with a high contention level much time is lost waiting for locks to be released. Delayed transaction durability reduces commit time and thus releases locks faster, which results in higher throughput.
The short answer to your "real question" is no because contiguous keys of a disk-based b-tree index must be stored in the same page.
I've never used SQL server, but your problem isn't specific to one database, so maybe this can still help.
When inserting a large number of rows per second the bottlenecks are either going to be parsing overhead (which can be parallelized), index updates (which may be parallelizable or not), primary key sequence generation, or other stuff like postgres' large object support, but that depends on your column types and database quirks. Then at some point any transactional database must generate sequential transaction log entries which is also a concurrency bottleneck.
First thing you should do is check if the inserts are grouped into transactions (not one insert per transaction). Then make sure the IO is fast, look for bottlenecks there, iowait, etc.
In a second test, I dropped all of the indexes and re-ran the job. The job completed 8 times as quickly, showing zero load on the SQL server
So that eliminates some of the candidates and hints that the problem is indices.
For example if 50 threads each insert a row at the same time, and...
You have a high cardinality index with each row hitting a different page in the index, then these can be parallelized
You have a low cardinality index, most of the inserted rows have the same value in the same column, and all these threads are fighting for control of the same index page.
This can compound with index/table page splits if your fillfactor is too high, in this case all the threads will want to insert in the same index page, and it's already full, so one thread is splitting the page while all others are waiting.
Unfortunately you didn't post the table info in the question, which you should really do. But you probably know if your indices are low cardinality or high. The first thing you could do is run the same tests again, adding the indices one by one, try to see which one causes trouble.
You can also lower fillfactor so there is less chance the inserts end up in a page that is already full.
If you find a problematic low cardinality index then you should first wonder if it's actually useful for queries, maybe you can drop it. If you want to keep it, you can hack it into a high cardinality index by adding a dummy column at the end. For example if you have an index on (category) which has few different values and causes problems for inserts, you can turn it into (category,other_column) which will work just as well for selecting based on category and might provide some extra features like sorting on other_column while selecting on category. However other_column should not be the PK or date or any other column that will have have values that end up in the same page in all your concurrent inserts, because that would be back to square one.
Next, you can try single-threading, or a low number of threads. Back to this:
In a second test, I dropped all of the indexes and re-ran the job. The job completed 8 times as quickly, showing zero load on the SQL server and bottlenecking on CPU on the application which is very good from the SQL Server perspective.
This may look nice at first glance but there's a problem here. Basically your application is doing the easy things (processing rows) and delegating the hard things (ie, concurrency) to the database. That's fine until it exceeds the database's capabilities, then it breaks down. Databases are excellent at handling concurrency correctly, but doing it fast is a very hard problem: coordinating several cores on a lock has a hard performance limit, caused by latency of communication between the cores, which is the speed of information propagation, in other words the speed of light, which cannot be negotiated with.
Locks are just memory held as cache lines in CPU caches. So a side effect of the way multicore systems work is, it's much faster for the same core to reacquire a lock it just released, because the line is still in its cache, so there is no slow inter-core communication involved. Likewise, several cores attempting to modify different parts of the same index page will result in cache line exchanges between them and lots of communication to determine what core owns what byte in that page. And that is surprisingly slow, it can take microseconds instead of nanoseconds.
In addition you have 50 client threads, so 50 server threads, and only 16 cores, so on the database server the OS will multitask the 50 threads between the 16 cores. This means the OS will end up putting one thread to sleep while it's holding a lock, and when that happens, performance is destroyed.
So the next test you can do is to compare insertion time with all your indices between these two scenarii:
Your current one with 50 threads
Then stop it, copy the inserted data from your main table into a temp table, truncate the main table, and insert the exact same data again with:
INSERT INTO yourtable SELECT * FROM temptable
In the second case you're inserting the same data. For the test to be valid it should be in the same order, so you might want to add an ORDER BY primary key while copying the rows into the temp table, so they come out in the proper order. I don't know if the tables are clustered, but you'll find a way to get the order correct.
You can also try various orders, one of the indices may be faster if data is inserted in an order that it likes.
If the second insert is much faster than the mutli-threaded one, then that will give you a clue of what you need to do. In this case that's probably a funnel, ie a process that gathers rows generated by the many threads and inserts them using a low number of threads, maybe just one.
This can simply be all the threads inserting into a non-indexed table, and a separate task flushing this table into the main one every X milliseconds.
I am very new to Snowflake and while working with snowflake I had conflict between the below 2 options.
Single warehouse with size X-Large (16 credits / hour)
Multi-cluster (with max clusters=2 & min clusters=2) with size Large (8 credits / hour)
Considering the above 2 options
Is there any advantage I can get by choosing 2nd option in terms of performance?
Note: I know the advantages of multi-cluster over a single warehouse. Please share your answer for this specific scenario (when min = max).
So the things that happen in running a query are.
belong I am going to just use single to mean the single instance and 'multi` to mean the multi instance cluster, of which when we run a query it is only ever on one instance.
Reading\Writing IO from your storage layer:
Here a single has twice the IO over the multi thus if your query is IO saturated the single is the better choice.
Parallel steps:
So if you have a GROUP BY over a high cardinality columns, both the single and multi should be equally good. If you have a low cardinality but billions of rows, the smaller instance might give better results as those complex steps cannot be broken over the larger cluster size of the single instance. But this is most likely lost in the wash if you have many concurrent queries.
Many queries / Noisy neighbour:
If you have hundreds of queries hitting in waves the larger single instance is worse at starting those queries, as it just has less concurrent tasks at once, and a single very large query which can flush caches, or just dominate cluster, means you stop handling normal/small queries. Where-as having the mutli cluster allow if only one "super heavy" query comes in, you only stall half your normal queries.
Other thoughts
It also really depends on your load patterns, at my last job, we had auto-scaling cluster of SMALL instances used to used to answer our read queries of dashboards, reports, and we allowed it to run a little over provisioned, so things were snappy.
Where-as our data loading ran on second auto-scaling cluster of MEDIUM instances, and which we overloaded on purpose, as we were trying to load data the fastest/cheapest. And in non-peak times we programmatically reduced the auto-scalling MAX to almost starve the load. But would do some expensive reprocessing on a LARGE instance via those saved credits in "the middle of the night" and also our loading tasks had the ability to spin up exclusive LARGE+ size warehouses to do one off rebuilds, as this was all IO bound work, and thus the smaller the window of "outage" the better the system was, and the IO scale linear, so the total cost was the same.
Which is all to say, "what is best" really depends on what you are doing, your budget, and the trade offs you are prepared for. The golden thing about snowflake it is not like a classic DB where you have to pick the size and get it right, pick one, and watch it, if it's struggling change it on the fly. We did this a number of times when a release of our code or snowflake changed the performance of some critical SQL, we would jump in, and double or triple the instance count, or size, to get past the situation, while trying to fix or work around SF issues, or awaiting SF to roll a release back. for a couple hours generally spending more credits is not budget braking. This flexibility also means you can just experiment, "what happens if we trying 4x smaller instance.." "oh nothing... look we just saved heaps of money"..
If you have min=max=2 then you permanently have 2 warehouses running (as long as they are not suspended). If you configure your multi-cluster warehouse like this then you lose a lot of the advantages but for your specific use case it might make sense, I suppose
Based on your comment, here is my answer:
In both scenarios, you will have the same resources to process your queries. The important difference would be about running single heavy queries. As you may know, a single query can not spawn to multiple clusters (yet), so when you run a query in your multi-cluster warehouse, it will be processed on one of the Large warehouses (and use max 8 nodes).
If you run the same query on your single XL warehouse, it can be executed by (max) 16 nodes. So if you will run heavy queries which requires more memory and CPU, using a single XL warehouse would be better for you.
About concurrency, there is a parameter named "MAX_CONCURRENCY_LEVEL". Its default value is 8, and it limits maximum number of concurrent executions per warehouse. If you do not change it, your single XL warehouse will execute a maximum of 8 queries concurrently, while your multi-cluster warehouse can execute 16 queries concurrently.
https://docs.snowflake.com/en/sql-reference/parameters.html#max-concurrency-level
You may increase this parameter, and provide same concurrency on both single XL and multi-cluster L warehouse. But in this case, you should be careful when you runn heavy and light queries together. Because one query may use most of the resources of the warehouse, and your light queries may have less resources and take a longer time. So I would recommend using a multi-cluster warehouse, if you will have "relatively" light/concurrent queries.
I am designing an application, one aspect of which is that it is supposed to be able to receive massive amounts of data into SQL database. I designed the database stricture as a single table with bigint identity, something like this one:
CREATE TABLE MainTable
(
_id bigint IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
field1, field2, ...
)
I will omit how am I intending to perform queries, since it is irrelevant to the question I have.
I have written a prototype, which inserts data into this table using SqlBulkCopy. It seemed to work very well in the lab. I was able to insert tens of millions records at a rate of ~3K records/sec (full record itself is rather large, ~4K). Since the only index on this table is autoincrementing bigint, I have not seen a slowdown even after significant amount of rows was pushed.
Considering that the lab SQL server was a virtual machine with relatively weak configuration (4Gb RAM, shared with other VMs disk sybsystem), I was expecting to get significantly better throughput on the physical machine, but it didn't happen, or lets say the performance increase was negligible. I could, maybe get 25% faster inserts on physical machine. Even after I configured 3-drive RAID0, which performed 3 times faster than a single drive (measured by a benchmarking software), I got no improvement. Basically: faster drive subsystem, dedicated physical CPU and double RAM almost didn't translate into any performance gain.
I then repeated the test using biggest instance on Azure (8 cores, 16Gb), and I got the same result. So, adding more cores did not change insert speed.
At this time I have played around with following software parameters without any significant performance gain:
Modifying SqlBulkInsert.BatchSize parameter
Inserting from multiple threads simultaneously, and adjusting # of threads
Using table lock option on SqlBulkInsert
Eliminating network latency by inserting from a local process using shared memory driver
I am trying to increase performance at least 2-3 times, and my original idea was that throwing more hardware would get tings done, but so far it doesn't.
So, can someone recommend me:
What resource could be suspected a bottleneck here? How to confirm?
Is there a methodology I could try to get reliably scalable bulk insert improvement considering there is a single SQL server system?
UPDATE I am certain that load app is not a problem. It creates record in a temporary queue in a separate thread, so when there is an insert it goes like this (simplified):
===>start logging time
int batchCount = (queue.Count - 1) / targetBatchSize + 1;
Enumerable.Range(0, batchCount).AsParallel().
WithDegreeOfParallelism(MAX_DEGREE_OF_PARALLELISM).ForAll(i =>
{
var batch = queue.Skip(i * targetBatchSize).Take(targetBatchSize);
var data = MYRECORDTYPE.MakeDataTable(batch);
var bcp = GetBulkCopy();
bcp.WriteToServer(data);
});
====> end loging time
timings are logged, and the part that creates a queue never takes any significant chunk
UPDATE2 I have implemented collecting how long each operation in that cycle takes and the layout is as follows:
queue.Skip().Take() - negligible
MakeDataTable(batch) - 10%
GetBulkCopy() - negligible
WriteToServer(data) - 90%
UPDATE3 I am designing for standard version of SQL, so I cannot rely on partitioning, since it's only available in Enterprise version. But I tried a variant of partitioning scheme:
created 16 filegroups (G0 to G15),
made 16 tables for insertion only (T0 to T15) each bound to its individual group. Tables are with no indexes at all, not even clustered int identity.
threads that insert data will cycle through all 16 tables each. This makes it almost a guarantee that each bulk insert operation uses its own table
That did yield ~20% improvement in bulk insert. CPU cores, LAN interface, Drive I/O were not maximized, and used at around 25% of max capacity.
UPDATE4 I think it is now as good as it gets. I was able to push inserts to a reasonable speeds using following techniques:
Each bulk insert goes into its own table, then results are merged into main one
Tables are recreated fresh for every bulk insert, table locks are used
Used IDataReader implementation from here instead of DataTable.
Bulk inserts done from multiple clients
Each client is accessing SQL using individual gigabit VLAN
Side processes accessing the main table use NOLOCK option
I examined sys.dm_os_wait_stats, and sys.dm_os_latch_stats to eliminate contentions
I have a hard time to decide at this point who gets a credit for answered question. Those of you who don't get an "answered", I apologize, it was a really tough decision, and I thank you all.
UPDATE5: Following item could use some optimization:
Used IDataReader implementation from here instead of DataTable.
Unless you run your program on machine with massive CPU core count, it could use some re-factoring. Since it is using reflection to generate get/set methods, that becomes a major load on CPUs. If performance is a key, it adds a lot of performance when you code IDataReader manually, so that it is compiled, instead of using reflection
For recommendations on tuning SQL Server for bulk loads, see the Data Loading and Performance Guide paper from MS, and also Guidelines for Optimising Bulk Import from books online. Although they focus on bulk loading from SQL Server, most of the advice applies to bulk loading using the client API. This papers apply to SQL 2008 - you don't say which SQL Server version you're targetting
Both have quite a lot of information which it's worth going through in detail. However, some highlights:
Minimally log the bulk operation. Use bulk-logged or simple recovery.
You may need to enable traceflag 610 (but see the caveats on doing
this)
Tune the batch size
Consider partitioning the target table
Consider dropping indexes during bulk load
Nicely summarised in this flow chart from Data Loading and Performance Guide:
As others have said, you need to get some peformance counters to establish the source of the bottleneck, since your experiments suggest that IO might not be the limitation.
Data Loading and Performance Guide includes a list of SQL wait types and performance counters to monitor (there are no anchors in the document to link to but this is about 75% through the document, in the section "Optimizing Bulk Load")
UPDATE
It took me a while to find the link, but this SQLBits talk by Thomas Kejser is also well worth watching - the slides are available if you don't have time to watch the whole thing. It repeats some of the material linked here but also covers a couple of other suggestions for how to deal with high incidences of particular performance counters.
It seems you have done a lot however I am not sure if you have had chance to study Alberto Ferrari SqlBulkCopy Performance Analysis report, which describes several factors to consider the performance related with SqlBulkCopy. I would say lots of things discussed in that paper is still worth trying to that would good to try first.
I am not sure why you are not getting 100% utilization on CPU, IO or memory. But if you simply want to improve your bulk load speeds, here is something to consider:
Partition you data file into different files. Or if they are coming from different sources, then simply create different data files.
Then run multiple bulk inserts simultaneously.
Depending on your situation the above may not be feasible; but if you can then I am sure it should improve your load speeds.
One of my Clients has a reservation based system. Similar to air lines. Running on MS SQL 2005.
The way the previous company has designed it is to create an allocation as a set of rows.
Simple Example Being:
AllocationId | SeatNumber | IsSold
1234 | A01 | 0
1234 | A02 | 0
In the process of selling a seat the system will establish an update lock on the table.
We have a problem at the moment where the locking process is running slow and we are looking at ways to speed it up.
The table is already efficiently index, so we are looking at a hardware solution to speed up the process. The table is about 5 mil active rows and sits on a RAID 50 SAS array.
I am assuming hard disk seek time is going to be the limiting factor in speeding up update locks when you have 5mil rows and are updating 2-5 rows at a time (I could be wrong).
I've herd about people using index partition over several disk arrays, has anyone had similar experiences with trying to speed up locking? can anyone give me some advise onto a possible solution on what hardware might be able to be upgraded or what technology we can take advantage of in order to speed up the update locks (without moving to a cluster)?
One last try…
It is clear that there are too many locks hold for too long.
Once the system starts slowing down
due to too many locks there is no
point in starting more transactions.
Therefore you should benchmark the system to find out the optimal number of currant transaction, then use some queue system (or otherwise) to limit the number of currant transaction. Sql Server may have some setting (number of active connections etc) to help, otherwise you will have to write this in your application code.
Oracle is good at allowing reads to bypass writes, however SqlServer is not as standared...
Therefore I would split the stored proc to use two transactions, the first transaction should just:
be a SNAPSHOT (or READ UNCOMMITTED) transaction
find the “Id” of the rows for the seats you wish to sell.
You should then commit (or abort) this transaction,
and use a 2nd (hopefully very short) transaction that
Most likcly is READ COMMITTED, (or maybe SERIALIZABLE)
Selects each row for update (use a locking hint)
Check it has not been sold in the mean time (abort and start again if it has)
Set the “IsSold” flag on the row
(You may be able to the above in a single update statement using “in”, and then check that the expected number of rows were updated)
Sorry sometimes you do need to understant what each time of transaction does and how locking works in detail.
If the table is smaller, then the
update is shorter and the locks are
hold for less time.
Therefore consider splitting the table:
so you have a table that JUST contains “AllocationId” and “IsSold”.
This table could be stored as a single btree (index organized table on AllocationId)
As all the other indexes will be on the table that contrains the details of the seat, no indexes should be locked by the update.
I don't think you'd getting anything out of table partitioning -- the only improvement you'd get would be in fewer disk reads from a smaller (shorter) index trees (each read will hit each level of the index at least once, so the fewer levels the quicker the read.) However, I've got a table with a 4M+ row partition, indexed on 4 columns, net 10 byte key length. It fits in three index levels, with the topmost level 42.6% full. Assuming you had something similar, it seems reasonable that partitioning might only remove one level from the tree, and I doubt that's much of an improvement.
Some off the-cuff hardward ideas:
Raid 5 (and 50) can be slower on writes, because of the parity calculation. Not an issue (or so I'm told) if the disk I/O cache is large enough to handle the workload, but if that's flooded you might want to look at raid 10.
Partition the table across multiple drive arrays. Take two (or more) Raid arrays, distribute the table across the volumes[files/file groups, with or without table partitioning or partitioned views], and you've got twice the disk I/O speed, depending on where the data lies relative to the queries retrieving it. (If everythings on array #1 and array #2 is idle, you've gained nothing.)
Worst case, there's probably leading edge or bleeding edge technology out there that will blow your socks off. If it's critical to your business and you've got the budget, might be worth some serious research.
How long is the update lock hold for?
Why is the lock on the “table” not just the “rows” being sold?
If the lock is hold for more then a
faction of a second that is likely to
be your problem. SqlServer does not
like you holding locks while users
fill in web forms etc.
With SqlServer, you have to implement a “shopping cart” yourself, by temporary reserving the seat until the user pays for it. E.g add a “IsReserved” and “ReservedAt” colunn, then any seats that has been reserved for more then n minutes should be automatically unreserved.
This is a hard problem, as a shopper does not expect a seat that is in stock to be sold to someone else where he is checking out. However you don’t know if the shopper will ever complete the checkout. So how do you show it on a UI. Think about having a look at what other booking websites do then copy one that your users already know how to use.
(Oracle can sometimes cope with lock being kept for a long time, but even Oracle is a lot faster and happier if you keep your locking short.)
I would first try to figure out why the you are locking the table rather than just a row.
One thing to check out is the Execution plan of the Update statement to see what Indexes it causes to be updated and then make sure that row_level_lock and page_level_lock are enabled on those indexes.
You can do so with the following statement.
Select allow_row_locks, allow_page_locks from sys.indexes where name = 'IndexNameHere'
Here are a few ideas:
Make sure your data and logs are on separate spindles, to maximize write performance.
Configure your drives to only use the first 30% or so for data, and have the remainder be for backups (minimize seek / random access times).
Use RAID 10 for the log volume; add more spindles as needed for performance (write performance is driven by the speed of the log)
Make sure your server has enough RAM. Ideally, everything needed for a transaction should be in memory before the transaction starts, to minimize lock times (consider pre-caching). There are a bunch of performance counters you can check for this.
Partitioning may help, but it depends a lot on the details of your app and data...
I'm assuming that the T-SQL, indexes, transaction size, etc, have already been optimized.
In case it helps, I talk about this subject in detail in my book (including SSDs, disk array optimization, etc) -- Ultra-Fast ASP.NET.
Im trying to squeeze some extra performance from searching through a table with many rows.
My current reasoning is that if I can throw away some of the seldom used member from the searched table thereby reducing rowsize the amount of pagesplits and hence IO should drop giving a benefit when data start to spill from memory.
Any good resource detailing such effects?
Any experiences?
Thanks.
Tuning the size of a row is only a major issue if the RDBMS is performing a full table scan of the row, if your query can select the rows using only indexes then the row size is less important (unless you are returning a very large number of rows where the IO of returning the actual result is significant).
If you are doing a full table scan or partial scans of large numbers of rows because you have predicates that are not using indexes then rowsize can be a major factor. One example I remember, On a table of the order of 100,000,000 rows splitting the largish 'data' columns into a different table from the columns used for querying resulted in an order of magnitude performance improvement on some queries.
I would only expect this to be a major factor in a relatively small number of situations.
I don't now what else you tried to increase performance, this seems like grasping at straws to me. That doesn't mean that it isn't a valid approach. From my experience the benefit can be significant. It's just that it's usually dwarfed by other kinds of optimization.
However, what you are looking for are iostatistics. There are several methods to gather them. A quite good introduction can be found ->here.
The sql server query plan optimizer is a very complex algorithm and decision what index to use or what type of scan depends on many factors like query output columns, indexes available, statistics available, statistic distribution of you data values in the columns, row count, and row size.
So the only valid answer to your question is: It depends :)
Give some more information like what kind of optimization you have already done, what does the query plan looks like, etc.
Of cause, when sql server decides to do a table scna (clustered index scan if available), you can reduce io-performance by downsize row size. But in that case you would increase performance dramatically by creating a adequate index (which is a defacto a separate table with smaller row size).
If the application is transactional then look at the indexes in use on the table. Table partitioning is unlikely to be much help in this situation.
If you have something like a data warehouse and are doing aggregate queries over a lot of data then you might get some mileage from partitioning.
If you are doing a join between two large tables that are not in a 1:M relationship the query optimiser may have to resolve the predicates on each table separately and then combine relatively large intermediate result sets or run a slow operator like nested loops matching one side of the join. In this case you may get a benefit from a trigger-maintained denormalised table to do the searches. I've seen good results obtained from denormalised search tables for complex screens on a couple of large applications.
If you're interested in minimizing IO in reading data you need to check if indexes are covering the query or not. To minimize IO you should select column that are included in the index or indexes that cover all columns used in the query, this way the optimizer will read data from indexes and will never read data from actual table rows.
If you're looking into this kind of details maybe you should consider upgrading HW, changing controllers or adding more disk to have more disk spindle available for the query processor and so allowing SQL to read more data at the same time
SQL Server disk I/O is frequently the cause of bottlenecks in most systems. The I/O subsystem includes disks, disk controller cards, and the system bus. If disk I/O is consistently high, consider:
Move some database files to an additional disk or server.
Use a faster disk drive or a redundant array of inexpensive disks (RAID) device.
Add additional disks to a RAID array, if one already is being used.
Tune your application or database to reduce disk access operations.
Consider index coverage, better indexes, and/or normalization.
Microsoft SQL Server uses Microsoft Windows I/O calls to perform disk reads and writes. SQL Server manages when and how disk I/O is performed, but the Windows operating system performs the underlying I/O operations. Applications and systems that are I/O-bound may keep the disk constantly active.
Different disk controllers and drivers use different amounts of CPU time to perform disk I/O. Efficient controllers and drivers use less time, leaving more processing time available for user applications and increasing overall throughput.
First thing I would do is ensure that your indexes have been rebuilt; if you are dealing with huge amount of data and an index rebuild is not possible (if SQL server 2005 onwards you can perform online rebuilds without locking everyone out), then ensure that your statistics are up to date (more on this later).
If your database contains representative data, then you can perform a simple measurement of the number of reads (logical and physical) that your query is using by doing the following:
SET STATISTICS IO ON
GO
-- Execute your query here
SET STATISTICS IO OFF
GO
On a well setup database server, there should be little or no physical reads (high physical reads often indicates that your server needs more RAM). How many logical reads are you doing? If this number is high, then you will need to look at creating indexes. The next step is to run the query and turn on the estimated execution plan, then rerun (clearing the cache first) displaying the actual execution plan. If these differ, then your statistics are out of date.
I think you're going to be farther ahead using standard optimization techniques first -- check your execution plan, profiler trace, etc. and see whether you need to adjust your indexes, create statistics etc. -- before looking at the physical structure of your table.