Distributed Files and named filegroups in SQL Server performance - sql-server

I have two options when creating a database, and the number one priority for these databases is performance.
Option 1: Distributed files over multiple drives in one filegroup. All files are managed by SQL Server and the hard drives are used and managed from a space perspective, but we as DBAs have zero control over which drive the tables (and all associated indexes) are stored on.
Option 2: Named filegroups, with the database deliberately partitioned across specific hard drives.
A fair assumption for this question is that all our disks are identical in speed and performance, and that our SAN controller is of high enough quality not to be the bottleneck in this scenario.
Also assume that we have a "good" TempDB setup, with the files correctly laid out on an SSD local to the server.
The second option gives us control: we can put the indexes for large tables on different hard disks. This controls our read and write paths for high-intensity tasks and allows us to read from two disks and write to a third.
So my question is: how do distributed files (SQL-managed) perform against named filegroups when disk reads and writes are the limiting factor in the hardware configuration?
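To make the two options concrete, here is a minimal T-SQL sketch; the database names, file paths, and sizes are all hypothetical:

```sql
-- Option 1: several data files in the single PRIMARY filegroup.
-- SQL Server stripes new allocations across the files (proportional fill);
-- the DBA does not choose which drive a given table lands on.
CREATE DATABASE SalesDB
ON PRIMARY
    (NAME = Sales_1, FILENAME = 'D:\Data\Sales_1.mdf', SIZE = 50GB),
    (NAME = Sales_2, FILENAME = 'E:\Data\Sales_2.ndf', SIZE = 50GB),
    (NAME = Sales_3, FILENAME = 'F:\Data\Sales_3.ndf', SIZE = 50GB)
LOG ON
    (NAME = Sales_log, FILENAME = 'G:\Log\Sales_log.ldf', SIZE = 20GB);

-- Option 2: named filegroups, each backed by a file on a specific drive,
-- so a large table and its indexes can be placed deliberately.
CREATE DATABASE SalesDB2
ON PRIMARY
    (NAME = Sales2_sys,  FILENAME = 'D:\Data\Sales2_sys.mdf',  SIZE = 1GB),
FILEGROUP FG_Data
    (NAME = Sales2_data, FILENAME = 'E:\Data\Sales2_data.ndf', SIZE = 100GB),
FILEGROUP FG_Index
    (NAME = Sales2_ix,   FILENAME = 'F:\Data\Sales2_ix.ndf',   SIZE = 50GB)
LOG ON
    (NAME = Sales2_log,  FILENAME = 'G:\Log\Sales2_log.ldf',   SIZE = 20GB);

-- Read a table from one drive while its nonclustered index lives on another:
CREATE TABLE dbo.Orders (OrderID INT NOT NULL PRIMARY KEY, CustomerID INT) ON FG_Data;
CREATE NONCLUSTERED INDEX IX_Orders_Customer ON dbo.Orders (CustomerID) ON FG_Index;
```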

For Option 1:
Depending on the SAN vendor, many different techniques are used to construct a logical disk partition (LUN) for use by SQL Server, and some of them, such as concatenating disks rather than striping them, can hurt performance.
You should also consider storage availability: RAID 1 (mirroring), RAID 5, or RAID 10 provide redundancy and improve disk availability.
SSDs are managed by the SAN, and modern SAN storage automatically moves frequently accessed files to SSD without user intervention.
The SAN's read/write cache size should be considered as well.
So understand the capability of your SAN, and work with your SAN engineer when laying out the data, log, and tempdb files, especially where the SAN contains a mix of high- and low-speed disks.
For more detail, see SAN Storage Best Practices for SQL Server and High Performance Storage Systems for SQL Server.
For Option 2:
In a SAN environment it doesn't matter. For details, read:
SQL Server Database File Groups on a SAN: Relevant or Not?
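Whichever option you choose, you can verify whether the disks really are the limiting factor. A sketch using sys.dm_io_virtual_file_stats (available from SQL Server 2005 onward) to get per-file read/write stall times:

```sql
-- Average I/O stall per read/write for every database file; a file with
-- high stalls points at the drive hosting it, whoever chose the placement.
SELECT DB_NAME(vfs.database_id)                              AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)   AS avg_read_stall_ms,
       vfs.num_of_writes,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0)  AS avg_write_stall_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id
ORDER BY avg_write_stall_ms DESC;
```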

Related

File system scalability options

We have an NTFS volume used for storing a large number of files (currently at 500 GB and growing). It is currently accessed via a file share by a single application. We are looking at options to scale out the file server for access by multiple applications. The other applications would only be reading files, not performing any updates. What are the options available in designing such a file server so that it doesn't become a single point of failure and provides scalability, replication, and availability? Some people have suggested moving the files into the database to achieve all of these. Are there any better options? Thanks in advance.
Microsoft Distributed File System.
DFS Replication. New state-based, multimaster replication engine that is optimized for WAN environments. DFS Replication supports replication scheduling, bandwidth throttling, and a new byte-level compression algorithm known as remote differential compression (RDC).
Wikipedia has a good overview here
RAID (redundant array of inexpensive/independent disks) would be your best option. RAID groups multiple drives into one volume for added capacity, redundancy, or a combination of both. Many controllers let you dynamically add, change, and remove disks without losing data.
For example:
RAID 0 stripes drives into one volume for capacity and speed, but offers no redundancy
RAID 1 uses half of an array's drives as a 1:1 mirror of the other half
RAID 4 uses one of the array's disks for dedicated parity, from which a failed drive's contents can be reconstructed
RAID 5 is the same as the above, except the parity is spread across all drives
RAID 6 is the same as RAID 5 except twice as much parity is stored, so the array survives two simultaneous drive failures
Some controllers also allow migrating between RAID configurations without data loss.

MDF and LDF Files size

I was wondering if there is any recommended maximum size for the MDF and/or LDF files of a SQL Server instance.
For example, if I want to create a 400 GB database, is there a rule to help me decide how many MDF files I should create, or should I just go ahead and create a single gigantic 400 GB MDF file?
If so, is this going to affect database performance somehow?
What you do will depend on your disk system. You need to figure out what type of transactions your application will be performing and configure your disks to handle those transactions. The I/O system is the bottleneck in most systems, so this will definitely affect performance. Isolate sequential I/O and distribute random I/O.
Some guidelines from a SQL 2000 tuning book:
Isolate the transaction log on its own RAID 1 or RAID 10 drive.
Configure enough drives in your RAID array, or split the database into filegroups on separate disks, so you can keep each volume at fewer than 125 I/Os per second (that number may be outdated); see the sketch after this list.
Configure data file volumes as RAID 5 if the transactions are expected to be mostly read.
Configure data volumes as RAID 10 if more than 10% writes are expected.
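As a rough illustration of the splitting idea for the 400 GB case above (drive letters and sizes are hypothetical), one file per physical volume rather than one giant MDF:

```sql
-- SQL Server fills the four data files proportionally, spreading random
-- I/O across volumes; the log stays sequential on its own drive.
CREATE DATABASE BigDB
ON PRIMARY
    (NAME = BigDB_1, FILENAME = 'E:\Data\BigDB_1.mdf', SIZE = 100GB),
    (NAME = BigDB_2, FILENAME = 'F:\Data\BigDB_2.ndf', SIZE = 100GB),
    (NAME = BigDB_3, FILENAME = 'G:\Data\BigDB_3.ndf', SIZE = 100GB),
    (NAME = BigDB_4, FILENAME = 'H:\Data\BigDB_4.ndf', SIZE = 100GB)
LOG ON
    (NAME = BigDB_log, FILENAME = 'L:\Log\BigDB_log.ldf', SIZE = 40GB);
```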

Help with SQL Server Log, Master and TempDB locations on specific array config

I have a quick question that is specific to the server setup we have available to us.
We are getting ready to migrate our help desk to a new server, as well as convert a small ACT DB to the full version of SQL 2005 Standard (from SQL Expr.)
Right now, we only have the following resources available to us as far as array configurations go.
It is server 2008 64 standard, and we will be using SQL 2005 Standard 64.
2 drives in raid 1+0 for the OS (1)
3 drives in raid 5, (2)
and 3 additional drives to allocate out for additional resources. (3)
My initial plans were to install ACT, our Help Desk and the SQL Program files and transaction log files on (2), and use (3) in raid 0 for the tempDB.
The DB sizes are very small, and honestly we could probably run everything on the first 2 arrays with minimal performance loss (just because the DBs are so small).
However we may decide to dedicate this server to SQL somewhere down the line, moving many more DB's over to it, and remove the help desk (web front end) to another server.
How intensive are the log file write operations for 2 small (<500 MB) DBs?
How risky is putting the TempDB on a raid 0?
Would moving the log files to the system array (1) improve performance?
With 8 disks available I'd recommend the following independent RAID arrays:
OS: RAID 1 (2 disks) (you specified RAID 10 in your question - you can't do RAID 10 with only two drives).
Database data files (including TempDB data file and log file): RAID 5 (4 disks).
Database log files: RAID 1 (2 disks).
If you think the usage of your databases will increase or you're planning to consolidate further databases you may also consider adding further disks to split out your TempDB database.
If disk space capacity isn't an issue you could consider changing the 4 disk RAID 5 array to RAID 10 which would be more performant (particularly for writes).
How are you planning to do backups?
1. How intensive are the log file write operations for 2 small (<500 MB) DBs?
This depends on what your application is doing and the number of concurrent connections.
2. How risky is putting the TempDB on a raid 0?
Putting TempDB on RAID 0 is risky: if one of the disks fails and you lose the TempDB database, your SQL instance will stop.
3. Would moving the log files to the system array (1) improve performance?
Yes, but putting them on their own independent array would be more performant and resilient.
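If you later split TempDB (or the logs) onto their own disks, the files can be repointed with ALTER DATABASE; the new paths take effect at the next service restart. A sketch with hypothetical paths and the default TempDB logical file names:

```sql
-- Repoint the TempDB files at the new array; SQL Server recreates them
-- at these locations on the next service restart.
ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'T:\TempDB\tempdb.mdf');
ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'T:\TempDB\templog.ldf');
```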
You should really ask this question on serverfault.com (its content is skewed towards administration rather than programming).

Recommended placement of tempdb and log for SQL Server OLTP database(s)

Suppose the following configuration:
Drive D ... Data, Drive E ... TempDB, Drive F ... Log.
and suppose all drives are on separate spindles with respective drive controllers.
Concerning performance; is the above configuration optimal, decent, or not advisable?
With budgetary constraints in mind, can any of these DBs share the same drive without significant performance degradation?
Which of these drives needs to be the fastest?
This is difficult to answer without a full analysis of your system. For example, to do this properly, we should have an idea what kind of IOPS your system will generate, in order to plan for slightly more capacity than peak load.
I always love RAID10 across the board, separate arrays for everything, and in many instances splitting into different file groups as performance needs dictate.
However, in a budget-constrained environment, here is a decent, basic configuration, for someone who wants to approximate the ideal:
4 separate arrays:
System databases: RAID 5 (not the operating system array, either!)
Data: RAID 5
Logs: RAID 10
Tempdb: RAID 1 or 10, the latter for high IOPS scenario
(Optional) - RAID 5 to dump backups to (copy from here to tape)
This setup provides decent performance and better recoverability odds. For example, if your Data array fails, you can still run the server and use BACKUP LOG to do a point-in-time recovery of the failed databases, since you can still access the system databases and your transaction logs even after the data array failure.
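The tail-log backup mentioned above looks like this; a minimal sketch, assuming a database named MyDB whose data array has just failed:

```sql
-- NO_TRUNCATE reads only the (still intact) log file, so the log records
-- written since the last log backup can be captured even though the data
-- files are gone; restore full + log backups elsewhere, ending with this.
BACKUP LOG MyDB
TO DISK = 'K:\Backup\MyDB_taillog.trn'
WITH NO_TRUNCATE;
```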

What's the best storage for SQL Server 2000?

I want to put my SQL Server database files on an INTEL SS4000-E storage device. It's a NAS device. Is it possible to use it as storage for SQL Server 2000? If not, what is the best solution?
I strongly recommend against it.
Put your data files locally on the server itself, with RAID mirrored drives. The reasons are twofold:
SQL Server will run much faster for all but the smallest workloads
SQL Server will be much less prone to corruption in case the link to the NAS gets broken.
Use the NAS to store backups of your SQL Server, not to host your datafiles. I don't know what your database size will be, or what your usage pattern will be, so I can't tell you what you MUST have. At a minimum for a database that's going to take any significant load in a production environment, I would recommend two logical drives (one for data, one for your transaction log), each consisting of a RAID 1 array of the fastest drives you can stomach to buy. If that's overkill, put your database on just two physical drives, (one for the transaction log, and one for data). If even THAT is over budget, put your data on a single drive, back up often. But if you choose the single-drive or NAS solution, IMO you are putting your faith in the Power of Prayer (which may not be a bad thing, it just isn't that effective when designing databases).
Note that a NAS is not the same thing as a SAN (on which people typically DO put database files). A NAS typically is much slower and has much less bandwidth than a SAN connection, which is designed for very high reliability, high speed, advanced management, and low latency. A NAS is geared more toward reducing your cost of network storage.
My gut reaction - I think you're mad risking your data on a NAS. SQL Server's expectation is continuous, low-latency, uninterrupted access to its storage subsystem. A NAS is almost certainly none of those things - use local or SAN storage (in order of performance, simplicity, and therefore preference) and leave the NAS for offline file storage/backups.
The following KB lists some of the constraints and issues you'd encounter trying to use a NAS with SQL - while the KB covers SQL 7 through 2005, a lot of the information still applies to SQL 2008 too.
http://support.microsoft.com/kb/304261
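For completeness: the KB above describes trace flag 1807, which SQL Server 2000 requires before it will even create database files on a UNC path. A sketch with a hypothetical share name, shown only for testing purposes, not as a recommendation:

```sql
-- Without this trace flag, SQL Server 2000 rejects network paths outright.
DBCC TRACEON (1807);

CREATE DATABASE NasDB
ON     (NAME = NasDB_data, FILENAME = '\\nas01\sqldata\NasDB.mdf')
LOG ON (NAME = NasDB_log,  FILENAME = '\\nas01\sqldata\NasDB_log.ldf');
```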
Local is almost always faster than networked storage.
Your SQL Server performance will also depend on how your objects, files, and filegroups are defined, and on how consumers use the data.
Well "best" means different things to different people, but I think "best" performance would be a TMS RAMSAN or a RAID of SSDs... etc
Best capacity would be achieved with a RAID of large HDDs...
Best reliability/data saftey would be achieved with Mirroring across many drives, and regular backups (off site preferably)...
Best availability... I don't know... maybe a clone the system and have a hot backup ready to go at all times.
Best security would require encryption, but mainly limiting physical access to the machine (and it's backups) is enough unless it's internet connected.
As the other answers point out, there will be a performance penalty here.
It is also worth mentioning that these devices sometimes implement a RAM cache to improve I/O performance. If that is the case and you do trial this config, the NAS should be on the same power protection/UPS as the server hardware; otherwise, in a power outage, the NAS may lose the part of the file held in cache. Ouch!
It can work but a dedicated fiber attached SAN will be better.
Local will usually be faster but it has limited size and won't scale easily.
I'm not familiar with that hardware, but we initially deployed a warehouse on a shared NAS. Here's what we found.
We were regularly competing for resources on the head unit -- there was only so much bandwidth it could handle. Massive warehouse queries and data loads were severely impacted.
We needed 1.5 TB for our warehouse (data/indexes/logs), and we put each of these resources onto a separate set of LUNs (as you might with attached storage), with the data spanning just 10 disks. We ran into all sorts of I/O bottlenecks with this. The better solution was to create one big partition across lots of small disks and store data, indexes, and logs all in the same place. This sped things up considerably.
If you're dealing with a moderately used OLTP system you might be fine, but a NAS can be troublesome.
