SQL Server 2008 Failover Cluster on Cluster Shared Volumes? - sql-server

Can anyone think of a reason a SQL Server 2008 failover cluster couldn't use Cluster Shared Volumes for databases and log files?
It seems that using CSVs should reduce failover time and reduce the complexity of the cluster group configurations (the physical drive resources wouldn't need to "failover" anymore).

I think, but am not 100% sure, that Microsoft restricts what can be stored on a CSV.
Of course, you can put the VHD there. That's the whole point. And the VHD could contain the database data and log files -- who knows what's "inside" a VHD.
But, not knowing your configuration, I don't know what you are trying to do. CSV is there so a single volume (e.g., a single LUN on a SAN) can be shared by multiple cluster members, with individual files therein being used by different cluster members. Specifically, the information making up a VM definition and its VHD.
Previously, one had to put the VM definition and VHD in a separate LUN so it could move about individually. There was nothing "wrong" with this other than the complexity of having so many LUNs.
Database files are different. You don't have as many of them, they are big, and you want to place them carefully and keep an eye on them.
If you just put the database files inside a VHD then, as said originally, all is easy, except you don't get the detailed treatment you probably want.
If you put the database files in a separate LUN then you have all of the detailed treatment and that LUN will failover as easily as anything...

Related

Multiple File groups on a Virtual machine for SQL Server

In many SQL Server articles it is mentioned that the best practice is to use multiple file groups on a physical disk to avoid disk contention and disk spindle issues. So my questions are:
1: Does the same theory of having multiple file groups hold true for a virtual machine?
2: Should I still create my tempdb on a different disk, and should I also create multiple tempdb files to avoid large read/write operations on the same tempdb file, in a virtual machine setup for my production environment?
Your recommendation and reasoning would be helpful to decide on the best practice.
Thanks.
Yes, it still applies to virtual servers. Part of the contention problem is accessing the Global Allocation Map (GAM) or Shared Global Allocation Map (SGAM), which exists for each database file and can only be accessed by one process or thread at a time. This is the "latch wait" problem.
If your second disk is actually on different spindles, then yes. If the database files would be on different logical disks but identical spindles, then it's not really important.
The MS recommendation is that you should create one tempdb data file for each logical processor on your server, up to 8. You should test to see if you find problems with latch contention on tempdb before adding more than 8 data files.
You do not need to (and generally should not) create multiple tempdb log files because those are used sequentially. You're always writing to the next page in the sequence, so there's no way to split up disk contention.
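If you do add tempdb data files, it is a simple ALTER DATABASE per file. A minimal sketch, assuming a dedicated T: drive (the drive letter, file names, and sizes here are made up; size them for your own workload):

    -- Hedged sketch: add extra tempdb data files (one per logical CPU, up to 8).
    -- Paths, names, and sizes below are placeholders, not recommendations.
    ALTER DATABASE tempdb
    ADD FILE (NAME = tempdev2, FILENAME = 'T:\TempDB\tempdev2.ndf',
              SIZE = 1024MB, FILEGROWTH = 256MB);
    ALTER DATABASE tempdb
    ADD FILE (NAME = tempdev3, FILENAME = 'T:\TempDB\tempdev3.ndf',
              SIZE = 1024MB, FILEGROWTH = 256MB);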
The question needs a bit more information about your environment. If the drives for the VM are hosted on a SAN somewhere and the drives presented to the VM are all spread across the same physical disks on the SAN then you're not going to avoid contention. If, however, the drives are not on the same physical drives then you may see an improvement. Your SAN team will have to advise you on that.
That being said, I still prefer having the log files split from the data files and tempDB on its own drive. The reason is that if a query doesn't go as planned, it can fill the log file drive, which may take that database offline, but other databases may still be able to keep running (assuming they have enough empty space in their log files).
Similarly with tempDB: if it does get filled, the offending transaction will error out, and everything else should keep running without intervention.
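As a rough illustration of that split (drive letters, names, and sizes are invented for the example), you simply point the data and log files of each database at different drives when you create it, and give tempdb its own drive the same way:

    -- Hedged sketch: user database with data on D: and log on L:
    -- (tempdb would sit on its own drive in the same fashion).
    CREATE DATABASE Sales
    ON PRIMARY (NAME = Sales_data, FILENAME = 'D:\SQLData\Sales.mdf',     SIZE = 10GB)
    LOG ON     (NAME = Sales_log,  FILENAME = 'L:\SQLLogs\Sales_log.ldf', SIZE = 2GB);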

Which is the best method to store files on the server (in database or storing the location alone)?

In my project (similar to mediafire and rapidshare), clients can upload files to the server. I am using DB2 database and IBM WAS web server and JSP as server side scripting. I am creating my own encryption algorithm, as it is the main aim of the project.
I need suggestion whether files themselves should be stored in the database or if only the location of the files should be stored. Which approach is best?
There are Pros and Cons for storing BLOBs in the database.
Advantages
DBMS support for BLOBs is very good nowadays
JDBC driver support for BLOBs is very good
access to the "documents" can happen inside a transaction. No need to worry about manual cleanup or "housekeeping". If the row is deleted, so is the BLOB data
Don't have to worry about filesystem limits. Filesystems are typically not very good at storing millions of files in a single directory. You would have to distribute your files across several directories.
Everything is backed up together. If you take a database backup you have everything, no need to worry about an additional filesystem backup (but see below)
Easily accessible through SQL (no FTP or other tools necessary). That access is already there and under control.
Same access controls as for the rest of the data. No need to set up OS user groups to limit access to the BLOB files.
Disadvantages
Not accessible from the OS directly (problem if you need to manipulate the files using commandline tools)
Cannot be served by e.g. a webserver directly (that could be a performance problem)
Database backup (and restore) is more complicated (because of size). Incremental backups are usually more efficient in the filesystem
DBMS cache considerations
Not suited for high-write scenarios
You need to judge for yourself which advantage and which disadvantage is more important for you.
I don't share the widespread assumption that storing BLOBs in a database is always a bad idea. It depends, as with many other decisions.
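Purely for illustration, the in-database approach boils down to a table like the following (all names invented; shown in SQL Server syntax, DB2 would use a BLOB column instead of varbinary(max)):

    -- Hedged sketch of the "BLOB in the database" approach.
    CREATE TABLE UploadedFile (
        FileId      int IDENTITY PRIMARY KEY,
        FileName    nvarchar(255) NOT NULL,
        ContentType nvarchar(100) NOT NULL,
        UploadedAt  datetime2 NOT NULL DEFAULT SYSDATETIME(),
        Content     varbinary(max) NOT NULL   -- the file bytes themselves, deleted with the row
    );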
It's general knowledge that storing files in the database, especially big ones, is generally a bad idea. There are brilliant explanations in these questions:
Storing a file in a database as opposed to the file system?
Storing Images in DB - Yea or Nay?
And I'd like to highlight some points myself:
Storing files in your DBMS will make your database very big, and big databases are a maintenance nightmare (especially backups)
Portability becomes an issue, as every DBMS vendor makes its own implementation of BLOB files
There's a performance loss in SELECT statements that touch BLOB fields, compared to disk access
Well, my opinion would be to store the relevant information (path, name, description, etc.) in the database and keep the file, possibly encrypted, on the filesystem. It would be cheaper to scale your system by adding a web server than by adding a database server, since web space is cheap compared with databases. All you will need then is to add an IP or server-name column to your database so you can address the new web server, as sketched below.
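A minimal sketch of what that metadata table could look like (all names are made up for the example); the file itself stays, possibly encrypted, on a web/file server, and the server column tells the application where to fetch it:

    -- Hedged sketch: store only metadata and location, keep the file on the filesystem.
    CREATE TABLE FileCatalog (
        FileId      int IDENTITY PRIMARY KEY,
        FileName    nvarchar(255)  NOT NULL,
        Description nvarchar(1000) NULL,
        FilePath    nvarchar(500)  NOT NULL,  -- path on the file server
        ServerName  nvarchar(100)  NOT NULL   -- which web/file server holds the file
    );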

Merge multiple Access database into one big database

I have multiple ~50 MB Access 2000-2003 databases (MDB files) that only contain tables with data. The data-databases are located on a server in my enterprise that can take ~1-2 seconds to respond (and about 10 seconds to actually open a 50 MB MDB file manually while browsing in the file explorer). I have other databases that only contain forms. Most of those forms-databases (still MDB files) are actually copied from the server to the client with a batch file before execution (after some testing, the execution looks smoother that way). Most of those forms-databases use table links to fetch the data from the data-databases.
Now, my question is: is there any advantage/disadvantage to merging all the data from my ~50 MB databases into one big database (let's say 500 MB)? Will it be slower? It would actually help to clean up my code if I wouldn't have to connect to all those different databases, and I don't think 500 MB is a lot, but I don't pretend to be really used to Access by any means, and that's why I'm asking. If Access needs to read the whole MDB file to get the data from a specific table, then it would be slower. It wouldn't really be that surprising from Microsoft, but I've been pleased with MS Access database performance so far.
There will never be more than ~50 people connected to the database at the same time (most likely, this number won't in fact be more than 10, but I prefer being a little bit conservative here just to be sure).
The db engine does not read the entire MDB file to get information from a specific table. It must read information from the system tables (hidden tables whose names start with MSys) to determine where the data you need is stored. Furthermore, if you're using a query to retrieve information from the table, and the db engine can use an index to determine which rows satisfy the query's WHERE clause, it may read only those rows from the table.
However, you have issues with your network's performance. When those lead to dropped connections, you risk corrupting the MDB. That is why Access is not well suited for use in wide area networks or with wireless connections. And even on a wired LAN, you can suffer such problems when the network is flaky.
So while reducing the amount of data you pull across the network is a good thing, it is not the best remedy for Access on a flaky network. Instead you should migrate the data to a client-server db so it can be kept safe in spite of dropped connections.
You are walking on thin ice here.
Access will handle your scenario, but is not really meant to allow so many concurrent connections.
Merging everything into one big database (500 MB) is not a wise move.
Have you tried to open it from a network location?
As far as I can suggest, I would use a SQL Server Express backend and merge all the tables into a single, real client-server database.
The changes required in the client MDB front-end should not be very pervasive.

Best strategy for storing documents in SQL Server 2008

One of our teams is going to be developing an application to store records in a SQL2008 database and each of these records will have an associated PDF file. There is currently about 340GB of files, with most (70%) being about 100K, but some are several Megabytes in size. Data is mostly inserted and read, but the files are updated on occasion. We are debating between the following options:
Store the files as BLOBs in the database.
Store the files outside the database and store the paths in the database.
Use SQL2008's Filestream feature to store the files.
We have read the Microsoft best practices regarding filestream data, but since the files vary in size, we are not sure which path to choose. We are leaning toward option 3 (filestream), but have some questions:
Which architecture would you choose given the amount of data and file sizes noted above?
Data access will be done using SQL authentication, not Windows authentication, and the web server will likely not be able to access the files using the Windows API. Would this make filestream perform worse than the other two options?
Since the SQL backups include the filestream data, this would lead to very large database backups. How do others handle backing up databases with a large amount of filestream data?
OK, here we go. Option 2 is a really bad idea: you end up with untestable integrity constraints and backups that by definition are not guaranteed to be consistent, because you cannot take point-in-time backups of the database and the filesystem together. Not a problem in MOST scenarios, but it turns into one the moment you need a more complicated (point-in-time) recovery.
Options 1 and 3 are pretty equal, albeit with some implications.
Filestream can use a lot more disk space. Basically, every version of a value gets its own file identified by a GUID; if you make updates, the old files stay around until the next backup.
On the other hand, the files do not count toward the database size (relevant for Express edition: they don't count against the 10 GB limit, should you use it), and access is furthermore possible using a file share. This is added flexibility.
In-database storage has the most limited options regarding access (there is no way for the web server to just open the file after getting a path from SQL; it has to funnel the complete file through the SQL protocol layer), but it has the advantage of fewer files to manage. Putting the BLOBs into a separate table, and that table onto a separate set of spindles, may be a strategically good idea.
Regarding your questions:
1: I would go with in-database storage. Try out both, filestream and not; as you use the same API anyway, this is a simple change in the table definition (see the sketch at the end of this answer).
2: Yes, worse than direct file access, but it would be more protected than direct file access. Otherwise I do not think filestream and blob make a significant difference.
3: Where do you have a huge backup here? Sorry to ask, but your 340 GB is not exactly a large database. And you need to back it up ANYWAY. Better to do it in one consistent state, which is what you achieve with db storage. Plus integrity (no one can accidentally delete unused documents without cleaning up the database). The DB is not significantly larger than with the split approach, and it is a simple, one-place backup.
In the end, the question is db integrity and ease of backing things up. Win for SQL Server unless you get large, and by that I mean something like 360 terabytes of data.
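To make the "simple change in the table definition" concrete, here is a hedged sketch of both variants (table and column names are invented; the FILESTREAM version also assumes the database already has a FILESTREAM filegroup, and it requires a unique ROWGUIDCOL column):

    -- Variant A: plain in-database BLOB storage.
    CREATE TABLE dbo.Document (
        DocumentId uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
        FileName   nvarchar(255) NOT NULL,
        Content    varbinary(max) NOT NULL
    );

    -- Variant B: same shape, but the BLOB lives in the FILESTREAM store.
    CREATE TABLE dbo.DocumentFS (
        DocumentId uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
        FileName   nvarchar(255) NOT NULL,
        Content    varbinary(max) FILESTREAM NOT NULL
    );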
Store the files outside the database and store the paths in the database.
because it takes too much space to store files in the database.
I would definitely recommend (3) - this is the sort of scenario that this feature is specifically built to handle, and it is handled very well in my opinion.
This white paper has lots of useful information - http://msdn.microsoft.com/en-us/library/cc949109(SQL.100).aspx - and from a security point of view mentions that...
There are two security requirements for using the FILESTREAM feature. Firstly, SQL Server must be configured for integrated security. Secondly, if remote access will be used, then the SMB port (445) must be enabled through any firewall systems.
With regard to Backups, see the accepted answer to this question - SQL Server FILESTREAM limitation
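For completeness, after enabling FILESTREAM at the Windows level in SQL Server Configuration Manager, the instance-level setting is a single documented sp_configure call:

    -- Enable FILESTREAM for Transact-SQL and Win32 streaming access (level 2).
    EXEC sp_configure 'filestream access level', 2;
    RECONFIGURE;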
I've used an Index/Content method that you haven't listed, but it might help (sketched below). You have a table of files stored as a BLOB of binary data with a unique id or row number. A second SQL table provides the index: the name of the file, the path to it, keywords, file type, file size, checksum... whatever you need. This is the best approach I have seen for working with thousands of uploaded documents. The index is required to make sense of the file, since it would just be binary data to the user if they had no idea what the file type is. We store the data in two separate databases, which allows the index to sit on one server and the file store to be spread over multiple servers for easy expansion. At that point the index table/database contains the name of, or a key to, the server the file is on. If the user has access to read that particular index table, then they have access to the file.
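A rough sketch of that index/content split (every name here is hypothetical): the content table holds only the bytes keyed by an ID, while the index table holds everything needed to find and interpret the file, including which server the content lives on:

    -- Hedged sketch: "content" table, potentially on one of several file-store servers.
    CREATE TABLE dbo.FileContent (
        FileId  int NOT NULL PRIMARY KEY,
        Content varbinary(max) NOT NULL
    );

    -- Hedged sketch: "index" table on the central server, pointing at the content's location.
    CREATE TABLE dbo.FileIndex (
        FileId        int NOT NULL PRIMARY KEY,
        FileName      nvarchar(255) NOT NULL,
        FileType      nvarchar(50)  NOT NULL,
        FileSizeBytes bigint        NOT NULL,
        Checksum      varbinary(64) NULL,
        ContentServer nvarchar(100) NOT NULL  -- which server/database holds this FileId's content
    );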
This scenario is easy: the FILESTREAM recommendation says it is best when the files are (on average) larger than 1 MB, which is not your case; for smaller objects, storing varbinary(max) BLOBs in the database often provides better streaming performance.
Since you will be accessing the files directly from SQL Server and not from the filesystem, you should store them as BLOBs.
Read When to Use FILESTREAM: http://technet.microsoft.com/en-us/library/bb933993%28v=sql.105%29.aspx
Have you looked at RBS (Remote Blob Storage) solution? If you use the Filestream RBS provider, it will internally keep your blobs as Filestream files or varbinary(max) values, depending on what gets better performances based on the blob size.
Remote BLOB Store Provider Library Implementation Specification
SQL Remote Blob Storage Team Blog

Does each instance of SQL Server on a cluster require its own LUN?

I'm setting up a Misc SQL Cluster (Windows 2008/SQL 2005 & 2008) that will be active/active and have about a dozen SQL instances on it. From the documentation I've read, I can't tell if each SQL instance will need its own LUN, or if I can have a single, really big LUN created, and then create a dozen different partitions on that LUN (one for each SQL instance).
In either case, the physical disk layout on the SAN won't change, so it really doesn't matter from a performance standpoint which one I choose (assuming I can choose either). I just want to know if the partition method works, or if each instance needs to own its own LUN to handle the failover properly.
Each instance will need separate disks/LUNs.
They will be "owned" by the active node and are a dependent resource.
If you think about it, how could 2 SQL Server instances share a drive? A clustered disk resource can only be online on one node at a time, so if the two instances failed over to different nodes, there would be a conflict.
Since Windows 2003 you can use NTFS mount points, that is, mount a LUN into an empty folder on a drive. I've not tried it myself though.
Edit: some nice pictures here "How do Cluster Shared Volumes work"
You will need separate disks for each instance. In Server 2008 you will add the shared storage for each instance in the Failover Manager.
I wonder if there is a way to use CSV (Cluster Shared Volumes) to get around this. In Windows 2008 R2 you can set things up so that more than one node in a cluster can access the same LUN; you no longer have to assign a LUN to one node only!
