I was wondering if there is any recommended maximum size for the MDF and/or LDF files of a SQL Server instance.
For example, if I want to create a 400 GB database, is there a rule to help me decide how many MDF files I should create, or should I just go ahead and create a single gigantic 400 GB MDF file?
If so, is this going to affect database performance?
What you do will depend on your disk system. You need to figure out what type of transactions your application will be performing and configure your disks to be able to handle those transactions. The I/O system is the bottleneck in most systems, so this will definitely affect performance. Isolate sequential I/Os and distribute random I/Os.
Some guidelines from a SQL 2000 tuning book:
Isolate the transaction log on its own RAID 1 or RAID 10 drive.
Configure enough drives in your RAID array, or split the database into filegroups on separate disks, so you can keep each volume at fewer than 125 I/Os per second (that number may be outdated).
Configure data file volumes as RAID 5 if the transactions are expected to be mostly read.
Configure data volumes as RAID 10 if more than 10% writes are expected.
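To make that concrete, here is a minimal sketch of a large database laid out as several data files on separate volumes with the log on its own drive; the names, sizes, and drive letters are assumptions, not recommendations:

    CREATE DATABASE Sales
    ON PRIMARY
        -- Data files on separate volumes; SQL Server fills them proportionally.
        (NAME = Sales_Data1, FILENAME = 'E:\SQLData\Sales_Data1.mdf', SIZE = 100GB, FILEGROWTH = 10GB),
        (NAME = Sales_Data2, FILENAME = 'F:\SQLData\Sales_Data2.ndf', SIZE = 100GB, FILEGROWTH = 10GB),
        (NAME = Sales_Data3, FILENAME = 'G:\SQLData\Sales_Data3.ndf', SIZE = 100GB, FILEGROWTH = 10GB),
        (NAME = Sales_Data4, FILENAME = 'H:\SQLData\Sales_Data4.ndf', SIZE = 100GB, FILEGROWTH = 10GB)
    LOG ON
        -- Transaction log isolated on its own RAID 1 / RAID 10 volume.
        (NAME = Sales_Log, FILENAME = 'L:\SQLLog\Sales_Log.ldf', SIZE = 20GB, FILEGROWTH = 5GB);

Whether one 400 GB file or several smaller files is faster depends mostly on whether those files actually sit on separate physical spindles; several files on the same volume mainly buy manageability, not performance.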
I have 2 options when creating a database and the number 1 priority for these databases is performance.
Option 1: Files distributed over multiple drives in one filegroup. All files are managed by SQL Server, so the hard drives are used and balanced from a space perspective, but we as DBAs have zero control over which drive the tables (and all associated indexes) are stored on.
Option 2: Named filegroups, with the database explicitly partitioned across the specified hard drives.
A reasonable assumption for this question is that all our disks are identical in speed and performance, and that our SAN controller is good enough not to be the bottleneck in this scenario.
Also assume that we have a "good" tempdb setup, with the correct file layout on an SSD local to the server.
The second option gives us control: we can put the indexes for large tables on different hard disks. This lets us steer reads and writes for high-intensity tasks, for example reading from two disks and writing to a third.
So my question is: how do the distributed files (SQL-managed) perform against named filegroups when disk reads and writes are the limiting factor in the hardware configuration?
For Option 1:
Depending on the SAN vendor, there are many different techniques used to construct a logical disk partition (LUN) for use by SQL Server, and some of them, such as concatenating disks rather than striping them, can have a significant impact on performance.
You should also consider storage availability: RAID 1 (mirroring), RAID 5, or RAID 10 provide redundancy and thus improve disk availability.
Also, SSDs are managed by the SAN, and modern SAN storage automatically moves heavily accessed data to SSD without user intervention.
The size of the SAN's read/write cache should be considered as well.
So understand the capabilities of your SAN and stay in touch with your SAN engineer when laying out the data, log, and tempdb files, especially when the SAN contains a mix of high- and low-speed disks.
For more detail, see:
SAN Storage Best Practices for SQL Server
High Performance Storage Systems for SQL Server
For Option 2:
It doesn't matter in a SAN environment. For details, read:
SQL Server Database File Groups on a SAN: Relevant or Not?
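For reference, a minimal T-SQL sketch of Option 2, with a named filegroup on a second drive and an index placed on it; the database, filegroup, table, and drive letters are hypothetical:

    -- Primary filegroup on one drive, a named filegroup on another.
    CREATE DATABASE Trading
    ON PRIMARY
        (NAME = Trading_Data, FILENAME = 'E:\SQLData\Trading_Data.mdf', SIZE = 50GB),
    FILEGROUP FG_Indexes
        (NAME = Trading_Ix, FILENAME = 'F:\SQLData\Trading_Ix.ndf', SIZE = 50GB)
    LOG ON
        (NAME = Trading_Log, FILENAME = 'L:\SQLLog\Trading_Log.ldf', SIZE = 10GB);
    GO
    USE Trading;
    GO
    -- The large table stays on PRIMARY; its nonclustered index goes to the other drive.
    CREATE TABLE dbo.Trades (TradeId INT IDENTITY PRIMARY KEY, Symbol CHAR(8), Qty INT) ON [PRIMARY];
    CREATE NONCLUSTERED INDEX IX_Trades_Symbol ON dbo.Trades (Symbol) ON FG_Indexes;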
I have 3 instances of SQL Server 2008, each on a different machine, with multiple databases on each instance. I have 2 separate LUNs on my SAN for MDF and LDF files. The NDX and TempDB files run on the local drive of each machine. Is it OK for the 3 instances to share one volume for the data files and another volume for the log files?
I don't have thin provisioning on the SAN, so I would like to avoid tying up disk space by creating multiple volumes, but I was advised that I should create a volume (drive letter) for each instance, if not for each database. I am aware that I should at least split my log and data files. No instance would share the actual database files, just the space on the drive.
Any help is appreciated.
Of course the answer is: "It depends". I can try to give you some hints on what it depends however.
A SQL Server instance "assumes" that it has exclusive access to its resources. So it will fill all available RAM by default, it will use all CPUs, and it will try to saturate the I/O channels to get maximum performance. That's the reason for the general advice to keep your instances from concurrently accessing the same disks.
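For reference, that RAM-filling default is normally bounded per instance with the max server memory setting; a minimal sketch, where the 4096 MB value is just an arbitrary example:

    -- Cap the buffer pool so this instance leaves RAM for everything else.
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max server memory (MB)', 4096;  -- example value only
    RECONFIGURE;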
Another thing is that SQL Server "knows" that sequential I/O gives you much higher throughput than random I/O, so there are a lot of mechanisms at work (like log file organization, read-ahead, the lazy writer and others) to avoid random I/O as much as possible.
Now, if three instances of SQL Server do sequential I/O requests on a single volume at the same time, then from the perspective of the volume you are getting random I/O requests again, which hurts your performance.
That being said, it is only a problem if your I/O subsystem is a significant bottleneck. If your logfile volume is fast enough that the intermingled sequential writes from the instances don't create a problem, then go ahead. If you have enough RAM on the instances that data reads can be satisfied from the buffer cache most of the time, you don't need much read performance on your I/O subsystem.
What you should avoid in any case is repeated growth steps on either log or data files. If several files on one filesystem are growing, you will get fragmentation, and fragmentation can turn a sequential read or write request, even from a single source, back into random I/O.
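A minimal sketch of pre-sizing files and setting fixed growth increments so those growth steps stay rare; the database, logical file names, and sizes are hypothetical:

    -- Grow the files once, up front, and use fixed-size increments so
    -- autogrowth events (and the fragmentation they cause) stay rare.
    ALTER DATABASE Sales MODIFY FILE (NAME = Sales_Data1, SIZE = 200GB, FILEGROWTH = 10GB);
    ALTER DATABASE Sales MODIFY FILE (NAME = Sales_Log,   SIZE = 40GB,  FILEGROWTH = 5GB);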
The whole picture changes again if you use SSDs as disks. These have totally different requirements and behaviour, but since you didn't say anything about SSD I will assume that you use a "conventional" disk-based array or RAID configuration.
Short summary: You might get away with it, if the circumstances are right, but it is hard to assess without knowing a lot more about your systems, from both the SAN and SQL perspective.
We have an NTFS volume used for storing a large number of files (currently at 500 GB and growing). Currently it is accessed via a file share by a single application. We are looking at options to scale out the file server for access by multiple applications. The other applications would only be reading files, not performing any updates. What are the options available in designing such a file server so that it doesn't become a single point of failure and provides scalability, replication and availability? Some people have suggested moving the files into the database to achieve all of this. Are there any better options? Thanks in advance.
Microsoft Distributed File System.
DFS Replication. New state-based, multimaster replication engine that is optimized for WAN environments. DFS Replication supports replication scheduling, bandwidth throttling, and a new byte-level compression algorithm known as remote differential compression (RDC).
Wikipedia has a good overview here
RAID (redundant array of inexpensive/independent disks) would be your best option. RAID is a system where multiple drives are grouped into one volume for added space, redundancy, or a combination of both. With the redundant levels, a failed disk can be replaced without losing data.
For example:
RAID 0 stripes drives into one volume for capacity and speed, but provides no redundancy
RAID 1 mirrors drives 1:1, so half of the array's capacity holds an exact copy of the other half
RAID 4 uses one of the array's disks as dedicated parity, which allows any single failed disk to be rebuilt
RAID 5 is the same as the above, except the parity is spread out among all drives
RAID 6 is the same as RAID 5 except it stores two parity blocks per stripe, so it can survive two simultaneous disk failures
Some controllers also let you migrate between RAID levels online.
I have a quick question that is specific to the server setup we have available to us.
We are getting ready to migrate our help desk to a new server, as well as convert a small ACT database to the full version of SQL Server 2005 Standard (from SQL Express).
Right now, we only have the following resources available to us as far as array configurations go.
It is Windows Server 2008 x64 Standard, and we will be using SQL Server 2005 Standard x64.
2 drives in RAID 1+0 for the OS (1)
3 drives in RAID 5 (2)
and 3 additional drives to allocate for additional resources (3)
My initial plan was to install ACT, our help desk, the SQL Server program files and the transaction log files on (2), and use (3) in RAID 0 for TempDB.
The DB sizes are very small, and honestly we could probably run everything on the first 2 arrays with minimal performance loss (just because the DBs are so small).
However, we may decide to dedicate this server to SQL Server somewhere down the line, moving many more DBs over to it and moving the help desk (web front end) to another server.
How intensive are the log file write operations for 2 small (<500MB) db's?
How risky is putting the TempDB on a raid 0?
Would moving the log files to the system array (1) improve performance?
With 8 disks available I'd recommend the following independent RAID arrays:
OS: RAID 1 (2 disks) (you specified RAID 10 in your question - you can't do RAID 10 with only two drives).
Database data files (including TempDB data file and log file): RAID 5 (4 disks).
Database log files: RAID 1 (2 disks).
If you think the usage of your databases will increase or you're planning to consolidate further databases you may also consider adding further disks to split out your TempDB database.
If disk space capacity isn't an issue you could consider changing the 4 disk RAID 5 array to RAID 10 which would be more performant (particularly for writes).
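If you do later split TempDB onto its own array, relocating its files is straightforward; a minimal sketch, where the T: drive letter and paths are assumptions:

    USE master;
    GO
    -- Point the TempDB files at the new array; the move takes effect after
    -- the next restart of the SQL Server instance.
    ALTER DATABASE tempdb MODIFY FILE (NAME = tempdev, FILENAME = 'T:\TempDB\tempdb.mdf');
    ALTER DATABASE tempdb MODIFY FILE (NAME = templog, FILENAME = 'T:\TempDB\templog.ldf');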
How are you planning to do backups?
1. How intensive are the log file write operations for 2 small (<500MB) db's?
This depends on what your application is doing and the number of concurrent connections.
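If you would rather measure it than guess, the file-stats DMV in SQL Server 2005 and later shows cumulative log writes since the last restart; a quick sketch:

    -- Cumulative write activity against each database's log file(s).
    SELECT DB_NAME(vfs.database_id) AS database_name,
           vfs.num_of_writes,
           vfs.num_of_bytes_written
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
     AND mf.file_id = vfs.file_id
    WHERE mf.type_desc = 'LOG'
    ORDER BY vfs.num_of_bytes_written DESC;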
2. How risky is putting the TempDB on a raid 0?
Putting TempDB on RAID 0 is risky: if one of the disks fails and the TempDB database is lost, your SQL Server instance will stop.
3. Would moving the log files to the system array (1) improve performance?
Yes, but putting them on their own independent array would be more performant and resilient.
You should really ask this question on serverfault.com (its content is skewed towards administration rather than programming).
Suppose the following configuration:
Drive D ... Data, Drive E ... TempDB, Drive F ... Log,
and suppose all drives are on separate spindles with their own drive controllers.
Concerning performance: is the above configuration optimal, decent, or not advisable?
With budgetary constraints in mind, can any of these DBs share the same drive without significant performance degradation?
Which of these drives needs to be the fastest?
This is difficult to answer without a full analysis of your system. For example, to do this properly, we should have an idea what kind of IOPS your system will generate, in order to plan for slightly more capacity than peak load.
I always love RAID 10 across the board, separate arrays for everything, and in many cases splitting into different filegroups as performance needs dictate.
However, in a budget-constrained environment, here is a decent, basic configuration, for someone who wants to approximate the ideal:
4 separate arrays:
System databases: RAID 5 (not the operating system array, either!)
Data: RAID 5
Logs: RAID 10
Tempdb: RAID 1 or 10, the latter for high IOPS scenario
(Optional) - RAID 5 to dump backups to (copy from here to tape)
This setup provides decent performance and better chances of recoverability. For example, in this layout, if your data array fails, you can still run the server and issue BACKUP LOG to do a point-in-time recovery of the failed databases, since you can still access the system databases and your transaction logs despite the data array failure.
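As a sketch of that recovery step: a tail-log backup can be taken even while the data files are unavailable, assuming the log array and system databases survived; the database name and backup path here are hypothetical:

    -- Back up the tail of the log even though the data files are lost.
    -- NO_TRUNCATE allows the backup when the database is damaged or offline.
    BACKUP LOG Sales
    TO DISK = N'G:\Backups\Sales_taillog.trn'
    WITH NO_TRUNCATE, INIT;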