We have an NTFS volume used for storing a large number of files (currently at 500 GB and growing). It is currently accessed via a file share by a single application. We are looking at options to scale out the file server for access by multiple applications. The other applications would only be reading files, not performing any updates. What options are available for designing such a file server so that it doesn't become a single point of failure and provides scalability, replication and availability? Some people have suggested moving the files into a database to achieve all of these. Are there any better options? Thanks in advance.
Microsoft Distributed File System.
DFS Replication is a state-based, multimaster replication engine that is optimized for WAN environments. It supports replication scheduling, bandwidth throttling, and a byte-level compression algorithm known as remote differential compression (RDC).
Wikipedia has a good overview of DFS.
RAID (redundant array of inexpensive/independent disks) would be your best option. RAID is a system in which multiple drives are grouped into one volume for added space, redundancy, or a combination of both. Depending on the controller, you can add, change and remove disks dynamically without losing data.
For example:
RAID 0 stripes drives into one volume for capacity and speed, but provides no redundancy
RAID 1 mirrors half of the array's drives 1:1 onto the other half
RAID 4 dedicates one of the array's disks to parity, which lets a single failed drive be rebuilt
RAID 5 is the same as the above, except the parity is spread across all of the drives
RAID 6 is the same as RAID 5 except twice the amount of parity is stored, so it can survive two drive failures
Many controllers also let you migrate between RAID levels dynamically.
I have two options when creating a database, and the number one priority for these databases is performance.
Option 1: Distributed files over multiple drives in one filegroup. All the files are managed by SQL Server, so the hard drives are used and managed from a space perspective, but we as DBAs have zero control over which drive the tables (and all associated indexes) are stored on.
Option 2: Named File Groups with the database actively partitioned into the specified hard drives.
A good assumption for this question is that all our disks are identical in speed and performance, and that our SAN controller is of high enough quality not to be the bottleneck in this scenario.
Also assume that we have a "good" tempdb setup, with the files correctly laid out on an SSD local to the server.
The second option gives us control: we can put indexes for large tables on different hard disks. This shapes the read and write paths for high-intensity tasks and allows us to read from two disks and write to a third.
So my question here is: how do the distributed files (SQL-managed) perform against named filegroups when disk read and write speed is the limiting factor in the hardware configuration?
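To make Option 2 concrete, the kind of layout I have in mind looks roughly like the sketch below (drive letters, filegroup and object names are purely illustrative):

    -- Hypothetical named-filegroup layout (Option 2): data and indexes on separate drives
    ALTER DATABASE SalesDB ADD FILEGROUP FG_Data;
    ALTER DATABASE SalesDB ADD FILEGROUP FG_Indexes;

    ALTER DATABASE SalesDB ADD FILE
        (NAME = N'SalesDB_Data1', FILENAME = N'G:\SQLData\SalesDB_Data1.ndf', SIZE = 50GB)
        TO FILEGROUP FG_Data;
    ALTER DATABASE SalesDB ADD FILE
        (NAME = N'SalesDB_Idx1', FILENAME = N'H:\SQLIndexes\SalesDB_Idx1.ndf', SIZE = 20GB)
        TO FILEGROUP FG_Indexes;

    -- A large table lands on one set of disks, its nonclustered index on another
    CREATE TABLE dbo.OrderDetail
    (
        OrderID   int NOT NULL,
        ProductID int NOT NULL,
        Qty       int NOT NULL
    ) ON FG_Data;

    CREATE NONCLUSTERED INDEX IX_OrderDetail_ProductID
        ON dbo.OrderDetail (ProductID) ON FG_Indexes;

The point is simply that with Option 2 the table and its indexes land on files we chose, rather than wherever SQL Server proportionally fills them.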
For Option 1:
Depending on the SAN vendor, there are many different techniques used to construct a logical disk partition (LUN) for use by SQL Server, and some of them, such as concatenating disks rather than striping them, can work against performance.
You should also consider storage availability: RAID 1 (mirroring), RAID 5 or RAID 10 provide redundancy and improve disk availability.
SSDs are also managed by the SAN, and modern SAN storage automatically moves frequently accessed data to SSD without user intervention.
The size of the SAN's read/write cache should be considered as well.
So understand the capabilities of your SAN and work with the SAN engineer when laying out the data, log and tempdb files, especially since a SAN typically contains a mix of high- and low-speed disks.
For more detail: SAN Storage Best Practices for SQL Server
High Performance Storage Systems for SQL Server
For option 2,
It doesn't matter much in a SAN environment. For details, read:
SQL Server Database File Groups on a SAN: Relevant or Not?
I have 3 instances of SQL Server 2008, each on a different machine, with multiple databases on each instance. I have 2 separate LUNs on my SAN for MDF and LDF files. The NDX and tempdb files run on the local drive of each machine. Is it O.K. for the 3 instances to share the same volume for the data files and another volume for the log files?
I don't have thin provisioning on the SAN, so I would like to avoid constraining disk space by creating multiple volumes, but I was advised that I should create a volume (drive letter) for each instance, if not for each database. I am aware that I should at least split my log and data files. No instance would share the actual database files, just the space on the drive.
Any help is appreciated.
Of course the answer is: "It depends". I can try to give you some hints on what it depends on, however.
A SQL Server instance "assumes" that it has exclusive access to its resources. So by default it will fill all available RAM, use all CPUs, and try to saturate the I/O channels to get maximum performance. That's the reason for the general advice to keep your instances from concurrently accessing the same disks.
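For the RAM part at least, you can cap each instance so the three of them don't all try to claim the whole box; a minimal sketch (the 8 GB value is only a placeholder, size it per instance and workload):

    -- Cap this instance's memory so co-located instances don't starve each other
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max server memory (MB)', 8192;  -- placeholder value
    RECONFIGURE;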
Another thing is that SQL Server "knows" that sequential I/O gives you much higher throughput than random I/O, so there are a lot of mechanisms at work (like log file organization, read-ahead, the lazy writer and others) to avoid random I/O as much as possible.
Now, if three instances of SQL Server do sequential I/O requests on a single volume at the same time, then from the perspective of the volume you are getting random I/O requests again, which hurts your performance.
That being said, it is only a problem if your I/O subsystem is a significant bottleneck. If your logfile volume is fast enough that the intermingled sequential writes from the instances don't create a problem, then go ahead. If you have enough RAM on the instances that data reads can be satisfied from the buffer cache most of the time, you don't need much read performance on your I/O subsystem.
What you should avoid in each case is repeated growth steps on the log or data files. If several files on one filesystem are growing, you will get fragmentation, and fragmentation can turn a sequential read or write request, even from a single source, into random I/O again.
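A minimal sketch of pre-sizing the files and using a fixed growth increment to avoid those repeated growth steps (database name, logical file names and sizes are only illustrative):

    -- Grow the data and log files once, up front, with a fixed (not percentage) increment
    ALTER DATABASE MyDatabase
        MODIFY FILE (NAME = N'MyDatabase_Data', SIZE = 100GB, FILEGROWTH = 4GB);
    ALTER DATABASE MyDatabase
        MODIFY FILE (NAME = N'MyDatabase_Log', SIZE = 20GB, FILEGROWTH = 2GB);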
The whole picture changes again if you use SSDs as disks. These have totally different requirements and behaviour, but since you didn't say anything about SSD I will assume that you use a "conventional" disk-based array or RAID configuration.
Short summary: You might get away with it, if the circumstances are right, but it is hard to assess without knowing a lot more about your systems, from both the SAN and SQL perspective.
Suppose the following configuration:
Drive D ... Data, Drive E ... TempDB, Drive F ... Log,
and suppose all drives are on separate spindles, each with its own drive controller.
Concerning performance: is the above configuration optimal, decent, or not advisable?
With budgetary constraints in mind, can any of these databases share the same drive without significant performance degradation?
Which of these drives needs to be the fastest?
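To be explicit about what I mean, creating a database in that layout would look roughly like this (paths and names are purely illustrative):

    -- Data on D:, log on F:
    CREATE DATABASE AppDB
        ON PRIMARY (NAME = N'AppDB_Data', FILENAME = N'D:\SQLData\AppDB.mdf')
        LOG ON     (NAME = N'AppDB_Log',  FILENAME = N'F:\SQLLogs\AppDB_log.ldf');

    -- tempdb relocated to E:; the move takes effect at the next service restart
    ALTER DATABASE tempdb MODIFY FILE (NAME = N'tempdev', FILENAME = N'E:\TempDB\tempdb.mdf');
    ALTER DATABASE tempdb MODIFY FILE (NAME = N'templog', FILENAME = N'E:\TempDB\templog.ldf');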
This is difficult to answer without a full analysis of your system. For example, to do this properly, we should have an idea what kind of IOPS your system will generate, in order to plan for slightly more capacity than peak load.
I always love RAID10 across the board, separate arrays for everything, and in many instances splitting into different file groups as performance needs dictate.
However, in a budget-constrained environment, here is a decent, basic configuration, for someone who wants to approximate the ideal:
4 separate arrays:
System databases: RAID 5 (not the operating system array, either!)
Data: RAID 5
Logs: RAID 10
Tempdb: RAID 1 or 10, the latter for high-IOPS scenarios
(Optional) - RAID 5 to dump backups to (copy from here to tape)
This setup provides decent performance and a better chance of recoverability. For example, if your data array fails, you can still run the server and use BACKUP LOG to do a point-in-time recovery of the failed databases, since you can still access the system databases and your transaction logs even with the data array down.
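For that last scenario, the tail-of-the-log backup looks roughly like this (database name and path are illustrative):

    -- Back up the tail of the log even though the data files are no longer accessible
    BACKUP LOG AppDB
        TO DISK = N'F:\Backups\AppDB_tail.trn'
        WITH NO_TRUNCATE;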
I want to host my SQL Server database files on an Intel SS4000-E storage device. It's a NAS. Would it be possible to use it as storage for SQL Server 2000? If not, what is the best solution?
I strongly recommend against it.
Put your data files locally on the server itself, with RAID mirrored drives. The reasons are twofold:
SQL Server will run much faster for all but the smallest workloads
SQL Server will be much less prone to corruption in case the link to the NAS gets broken.
Use the NAS to store backups of your SQL Server, not to host your data files. I don't know what your database size will be, or what your usage pattern will be, so I can't tell you what you MUST have. At a minimum, for a database that's going to take any significant load in a production environment, I would recommend two logical drives (one for data, one for your transaction log), each consisting of a RAID 1 array of the fastest drives you can stomach buying. If that's overkill, put your database on just two physical drives (one for the transaction log, one for data). If even THAT is over budget, put your data on a single drive and back up often. But if you choose the single-drive or NAS solution, IMO you are putting your faith in the Power of Prayer (which may not be a bad thing; it just isn't that effective when designing databases).
Note that a NAS is not the same thing as a SAN (on which people typically DO put database files). A NAS typically is much slower and has much less bandwidth than a SAN connection, which is designed for very high reliability, high speed, advanced management, and low latency. A NAS is geared more toward reducing your cost of network storage.
My gut reaction: I think you're mad risking your data on a NAS. SQL Server expects continuous, low-latency, uninterrupted access to its storage subsystem. A NAS is almost certainly none of those things - use local or SAN storage (in that order of performance, simplicity and therefore preference) and leave the NAS for offline file storage/backups.
The following KB lists some of the constraints and issues you'd encounter trying to use a NAS with SQL - while the KB covers SQL 7 through 2005, a lot of the information still applies to SQL 2008 too.
http://support.microsoft.com/kb/304261
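For reference, the KB above describes trace flag 1807, which SQL Server 2000/2005 requires before it will accept UNC paths for database files; a minimal sketch for testing only (the share path is made up, and doing this runs counter to the advice above):

    -- Not recommended for production: allow database files on a UNC path
    DBCC TRACEON (1807, -1);

    CREATE DATABASE NasTest
        ON (NAME = N'NasTest_Data', FILENAME = N'\\nas01\sqlfiles\NasTest.mdf')
        LOG ON (NAME = N'NasTest_Log', FILENAME = N'\\nas01\sqlfiles\NasTest_log.ldf');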
Local storage is almost always faster than networked storage.
Your performance for sql will depend on how your objects, files, and filegroups are defined, and how consumers use the data.
Well "best" means different things to different people, but I think "best" performance would be a TMS RAMSAN or a RAID of SSDs... etc
Best capacity would be achieved with a RAID of large HDDs...
Best reliability/data saftey would be achieved with Mirroring across many drives, and regular backups (off site preferably)...
Best availability... I don't know... maybe a clone the system and have a hot backup ready to go at all times.
Best security would require encryption, but mainly limiting physical access to the machine (and it's backups) is enough unless it's internet connected.
As the other answers point out, there will be a performance penalty here.
It is also worth mentioning that these devices sometimes implement a RAM cache to improve I/O performance. If that is the case and you do trial this configuration, the NAS should be on the same power protection/UPS as the server hardware; otherwise, in a power outage the NAS may 'lose' the part of the file still in cache. Ouch!
It can work, but a dedicated fibre-attached SAN will be better.
Local storage will usually be faster, but it has limited size and won't scale easily.
I'm not familiar with the hardware but we initially deployed a warehouse on a shared NAS. Here's what we found.
We were regularly competing for resources on the head unit -- there was only so much bandwidth that it could handle. Massive warehouse queries and data loads were severely impacted.
We needed 1.5 TB for our warehouse (data/indexes/logs) and put each of these onto a separate set of LUNs (as you might do with attached storage), with the data spanning just 10 disks. We ran into all sorts of I/O bottlenecks with this. The better solution was to create one big partition across lots of small disks and store data, indexes and logs all in the same place. This sped things up considerably.
If you're dealing with a moderately used OLTP system, you might be fine but a NAS can be troublesome.
As part of database maintenance we are thinking of taking daily backups onto external/FireWire drives. Are there any specific recommended drives for the frequent read/write operations involved in taking backups from SQL Server 2000?
Whatever you do, just don't use USB 1.1.
The simple fact is that hard drives will fail over time. Unfortunately, the two best solutions I can recommend do not involve hard drives.
Using tape backup is admittedly slower, but you get the flexibility of off-site backups: it is easy to put a tape in the boot of a car, and rotating the tapes means you have fairly recent protection against any unforeseen situations.
Another option is an online backup solution where the backups are encrypted and copied off-site. My recommendation is definitely to have at least some sort of off-site backup external to the building where you keep the SQL Servers. After all, it is "disaster" recovery.
Pretty much any external drive can be used here, provided it has the space to hold your backups and enough performance to get the backups there. The specifics depend on your exact requirements.
In my experience, FireWire tends to outperform USB for disk activity, regardless of their theoretical maximum transfer rates. And FireWire 800 will perform even better yet. I have found poor performance from FireWire and USB drives when you have multiple concurrent reads/writes going on, but with backups, it's generally more large sequential reads and writes.
Another option that is a little more complex to set up and manage, but can give you greater flexibility and performance, is external SATA (eSATA). You can even get hot-swappable external SATA enclosures for added convenience and ease of taking your backups off-site.
However, another related option that I've had excellent success with is to setup a separate server to act as your backup server. You can use whatever disk options you choose (FireWire, SATA, eSATA, SCSI, FiberChannel, iSCSI, etc), and share out that disk storage as a network share (I use NFS and Samba on a Linux box, but for a Windows oriented network, a Windows share will work fine). You can then access the shares across the network and backup multiple machines to it. Also, the separation of backup server from your production machines will give you greater flexibility if you need to take it offline for maintenance, adding/removing storage, etc.
Drobo!
A USB hard drive RAID array that uses normal, off-the-shelf hard drives. Four bays; when you need more space, buy another hard drive. Out of bays? Buy bigger hard drives and replace the smallest one in the array.
http://www.drobo.com/
Depending on the size of the databases, the speed of the drive can be a real factor. I would look into something like a Drobo, but with an eSATA or SAS interface; there is nothing more entertaining than watching a terabyte crawl through USB 2.0. Also, you might consider something like HyperBac or Red Gate SQL Backup to compress the backups and make them easier to fit on the drive.
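As an aside, if you ever move beyond SQL Server 2000, native backup compression (SQL Server 2008 Enterprise, or Standard from 2008 R2) covers the same ground as those tools; a hedged sketch with illustrative names and paths:

    -- Native backup compression, available from SQL Server 2008 onwards
    BACKUP DATABASE SalesDB
        TO DISK = N'E:\Backups\SalesDB.bak'
        WITH COMPRESSION, CHECKSUM, STATS = 10;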
For the most part, external drives aren't a good option - unless your database is really small.
Other than some of the options others have listed, you can also use UNC/Network shares as a great 'off-box' option.
Check out the following video for some other options:
SQL Server Backup Options (Free Video)
And the videos on configuring backups on the site will show you how to specify a network path for backup purposes.
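For completeness, a minimal sketch of backing up straight to a UNC share (the server, share and database names are assumptions, and the SQL Server service account needs write access to the share):

    -- Back up directly to a network share; works on SQL Server 2000 and later
    BACKUP DATABASE SalesDB
        TO DISK = N'\\backupserver\sqlbackups\SalesDB_full.bak'
        WITH INIT, STATS = 10;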