Multiple filegroups on a virtual machine for SQL Server

Many SQL Server articles mention that the best practice is to use multiple filegroups to avoid disk contention and disk spindle issues. So my questions are:
1. Does the same theory of having multiple filegroups hold true for a virtual machine?
2. Should I still put tempdb on a different disk, and should I also create multiple tempdb data files to avoid large read/write operations against a single tempdb file, in a virtual machine setup for my production environment?
Your recommendation and reasoning would be helpful in deciding on a best practice.
Thanks.

Yes, it still applies to virtual servers. Part of the contention problem is accessing the Global Allocation Map (GAM) or Shared Global Allocation Map (SGAM) pages, which exist in each database file and can only be accessed by one thread at a time. This is the "latch wait" problem.
If your second disk is actually on different spindles, then yes. If the database files would be on different logical disks but the same physical spindles, then it's not really important.
The Microsoft recommendation is to create one tempdb data file for each logical processor on your server, up to eight. Test whether you actually see latch contention on tempdb before adding more than eight data files.
You do not need to (and generally should not) create multiple tempdb log files, because the log is written sequentially. You're always writing to the next page in the sequence, so there is no disk contention to split up.
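For illustration, adding tempdb data files looks roughly like this; the drive letter, file names, sizes, and growth increments below are placeholders you would tune for your own environment:

    -- Equalize the existing data file, then add files up to one per
    -- logical CPU (max 8 to start). Path and sizes are examples only.
    ALTER DATABASE tempdb
    MODIFY FILE (NAME = tempdev, SIZE = 4096MB, FILEGROWTH = 512MB);

    ALTER DATABASE tempdb
    ADD FILE (NAME = tempdev2,
              FILENAME = 'T:\TempDB\tempdev2.ndf',
              SIZE = 4096MB, FILEGROWTH = 512MB);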

The question needs a bit more information about your environment. If the drives for the VM are hosted on a SAN somewhere, and the drives presented to the VM are all spread across the same physical disks on the SAN, then you're not going to avoid contention. If, however, the drives are not on the same physical disks, then you may see an improvement. Your SAN team will have to advise you on that.
That being said, I still prefer having the log files split from the data files, and tempdb on its own drive. The reason is that if a query doesn't go as planned, it can fill the log-file drive, which may take that database offline, while other databases may still be able to keep running (assuming they have enough empty space in their log files).
Again with tempdb: if it does get filled, the offending transaction will error out, and everything else should keep running without intervention.

Related

Merge multiple Access databases into one big database

I have multiple ~50MB Access 2000-2003 databases (MDB files) that only contain tables with data. The data databases are located on a server in my enterprise that can take ~1-2 seconds to respond (and about 10 seconds to actually open a 50MB MDB file manually while browsing in the file explorer). I have other databases that only contain forms. Most of those forms databases (still MDB files) are actually copied from the server to the client with a batch file before execution (after some testing, execution looks smoother that way). Most of those forms databases use table links to fetch the data from the data databases.
Now, my question is: is there any advantage/disadvantage to merging all the data databases from my ~50MB files into one big database (let's say 500MB)? Will it be slower? It would actually help to clean up my code if I didn't have to connect to all those different databases, and I don't think 500MB is a lot, but I don't pretend to be really used to Access by any means, and that's why I'm asking. If Access needs to read the whole MDB file to get the data from a specific table, then it would be slower. It wouldn't really be that surprising from Microsoft, but I've been pleased with MS Access database performance so far.
There will never be more than ~50 people connected to the database at the same time (most likely this number won't in fact be more than 10, but I prefer to be a little bit conservative here just to be sure).
The db engine does not read the entire MDB file to get information from a specific table. It must read information from the system tables (hidden tables whose names start with MSys) to determine where the data you need is stored. Furthermore, if you're using a query to retrieve information from the table, and the db engine can use an index to determine which rows satisfy the query's WHERE clause, it may read only those rows from the table.
However, you have issues with your network's performance. When those lead to dropped connections, you risk corrupting the MDB. That is why Access is not well suited for use in wide area networks or with wireless connections. And even on a wired LAN, you can suffer such problems when the network is flaky.
So while reducing the amount of data you pull across the network is a good thing, it is not the best remedy for Access on a flaky network. Instead you should migrate the data to a client-server db so it can be kept safe in spite of dropped connections.
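As a rough sketch of that migration path, you can pull each Access table into SQL Server with a distributed query. The provider name, UNC path, and table name below are assumptions for illustration, and the 'Ad Hoc Distributed Queries' option must be enabled on the server:

    -- Copy one table out of an MDB into SQL Server (all names are examples).
    SELECT *
    INTO dbo.Customers
    FROM OPENROWSET('Microsoft.Jet.OLEDB.4.0',
                    '\\server\share\data1.mdb';
                    'admin'; '',
                    Customers);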
You are walking on thin ice here.
Access will handle your scenario, but is not really meant to allow so many concurrent connections.
Merging everything into one big database (500MB) is not a wise move.
Have you tried to open it from a network location?
My suggestion would be to use a SQL Server Express back end and merge all the tables into a single, real client-server database.
The changes required in the client MDB front ends should not be very invasive.

Shared volume for data (multiple MDF) and another shared volume for logs (multiple LDF) on SAN

I have 3 instances of SQL Server 2008, each on a different machine, with multiple databases on each instance. I have 2 separate LUNs on my SAN for MDF and LDF files. The NDF and tempdb files run on the local drive of each machine. Is it OK for the 3 instances to share one volume for the data files and another volume for the log files?
I don't have thin provisioning on the SAN, so I would rather not tie up disk space by creating multiple volumes, but I was advised that I should create a volume (drive letter) for each instance, if not for each database. I am aware that I should at least split my log and data files. No instance would share the actual database files, just the space on the drive.
Any help is appreciated.
Of course the answer is: "It depends". I can try to give you some hints on what it depends however.
A SQL Server Instance "assumes" that it has exclusive access to its resources. So it will fill all available RAM per default, it will use all CPUs and it will try to saturate the I/O channels to get maximum performance. That's the reason for the general advice to keep your instances from concurrently accessing the same disks.
Another thing is that SQL Server "knows" that sequential I/O gives you much higher throughput than random I/O, so there are a lot of mechanisms at work (such as log-file organization, read-ahead, and the lazy writer) to avoid random I/O as much as possible.
Now, if three instances of SQL Server do sequential I/O requests on a single volume at the same time, then from the perspective of the volume you are getting random I/O requests again, which hurts your performance.
That being said, it is only a problem if your I/O subsystem is a significant bottleneck. If your logfile volume is fast enough that the intermingled sequential writes from the instances don't create a problem, then go ahead. If you have enough RAM on the instances that data reads can be satisfied from the buffer cache most of the time, you don't need much read performance on your I/O subsystem.
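One common mitigation when instances share hardware is to cap each instance's memory so they don't fight over RAM. A minimal sketch, where the 8 GB figure is an arbitrary example you would size per instance:

    -- Cap this instance's memory (value in MB; pick per instance).
    EXEC sp_configure 'show advanced options', 1;
    RECONFIGURE;
    EXEC sp_configure 'max server memory (MB)', 8192;
    RECONFIGURE;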
What you should avoid in each case is multiple growth steps on either log or data files. If several files on one filesystem are growing, you will get fragmentation and fragmentation can transform a sequential read or write request even from a single source to random I/O again.
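To head off those growth steps, pre-size the files once and use a fixed growth increment rather than the default percentage growth. A sketch, with logical file names and sizes as placeholders:

    -- Pre-grow data and log files in large chunks, ideally off-hours.
    ALTER DATABASE Sales
    MODIFY FILE (NAME = Sales_data, SIZE = 50GB, FILEGROWTH = 1GB);
    ALTER DATABASE Sales
    MODIFY FILE (NAME = Sales_log, SIZE = 8GB, FILEGROWTH = 512MB);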
The whole picture changes again if you use SSDs as disks. These have totally different requirements and behaviour, but since you didn't say anything about SSD I will assume that you use a "conventional" disk-based array or RAID configuration.
Short summary: You might get away with it, if the circumstances are right, but it is hard to assess without knowing a lot more about your systems, from both the SAN and SQL perspective.

SQL Server 2008: Is there any benefit to multiple filegroups on the same physical drive?

This is most likely a ridiculous question, but I'm intrigued by the thought, so I'll ask anyway. Is there any performance gain or other benefit (outside of disaster-recovery handling) to having a database on multiple filegroups stored on the same physical drive?
More specifically, if I create a secondary filegroup ONLY for full-text indexes on the same physical drive, is it beneficial? Could it be a bottleneck?
Log files in my situation are stored on a separate physical drive from data files.
It shouldn't provide any additional benefit, except that with separate file groups you could, potentially, split out your backups. As far as the I/O on the same drive, you won't gain much if anything by doing this, so if you're considering it strictly for an I/O performance reason, I would suggest holding off until you can budget separate spindles.
Multiple files have the benefit of reducing allocation contention (PFS latch contention). Really really really fast IO subsystems (eg. SSD drives) can expose this problem and require mitigation by adding more files to the database. There are more details on this at How many files should a database have? or on Benchmarking: Multiple data files on SSDs.
Multiple filegroups imply multiple files, but at the same time a hot table will not benefit from multiple filegroups, because the hot spot will be, again, in a single filegroup (unless, of course, the hot filegroup is itself split into multiple files). So I would say that filegroups are to be used solely for administration purposes (e.g. piecemeal restore).
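To illustrate the piecemeal-restore angle: with multiple filegroups you can bring the primary filegroup online first and restore the rest later. The database, filegroup, and backup names below are invented, and under the full recovery model you would also need log restores between the stages:

    -- Stage 1: restore the primary filegroup and bring the database online.
    RESTORE DATABASE Sales FILEGROUP = 'PRIMARY'
    FROM DISK = 'D:\Backup\Sales_primary.bak'
    WITH PARTIAL, RECOVERY;

    -- Stage 2: restore the remaining filegroup(s) afterwards.
    RESTORE DATABASE Sales FILEGROUP = 'Archive'
    FROM DISK = 'D:\Backup\Sales_archive.bak'
    WITH RECOVERY;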
No, it does not matter. All it does is make it easier to move them later if you want to.

SQL Server 2008 Failover Cluster on Cluster Shared Volumes?

Can anyone think of a reason a SQL Server 2008 failover cluster couldn't use Cluster Shared Volumes for databases and log files?
It seems that using CSVs should reduce failover time and reduce the complexity of the cluster group configurations (the physical drive resources wouldn't need to "failover" anymore).
I think, but am not 100% sure, that Microsoft restricts what can be placed on a CSV.
Of course, you can put the VHD there. That's the whole point. And the VHD could contain the database data and log files -- who knows what's "inside" a VHD.
But, not knowing your configuration, I don't know what you are trying to do. CSV is there so a single volume (e.g. a single LUN on a SAN) can be shared by multiple cluster members, with individual files therein being used by different cluster members; specifically, the information making up a VM definition and its VHD.
Previously, one had to put the VM definition and VHD in a separate LUN so it could move about individually. There was nothing "wrong" with this other than the complexity of having so many LUNs.
Database files are different. You don't have as many. They are big. You want to carefully place them and watch them. Etc.
If you just put the database files inside a VHD then, as said originally, all is easy, except you don't get the detailed treatment you probably want.
If you put the database files in a separate LUN then you have all of the detailed treatment and that LUN will failover as easily as anything...

SQL Server 2005 / 2008 - multiple filegroups?

I'm a developer at heart - but every now and then, a customer doesn't have a decent DBA to deal with these issues, so I'm called in to decide....
What are your strategies / best practices when it comes to dealing with a reasonably sized SQL Server database (anything larger than Northwind or AdventureWorks) - do you use multiple filegroups? If so: how many? And why?
What are your criteria to decide when to move away from the "one filegroup for everything" approach:
database size?
database complexity?
availability / reliability requirements?
what else?
If you use multiple filegroups, how many do you use? One for data, one for indexes, one for logs? Several (how many) for data? What are your reasons for your choice, and why do you use that exact number of filegroups? :-)
The Microsoft-trained, best-practice methodology is as follows:
Log files are placed on a separate physical drive
Data files are placed on a separate physical drive
Multiple filegroups: when a particular table is extremely big, which is often the case in a transactional database (separate physical drive)
Multiple filegroups: when using partition ranges, or when you want to split lookup data into a read-only database file (separate physical drive)
Keep in mind that an MDF technically works similarly to a hard-drive partition when it comes to storing data. The MDF is read randomly, whereas the LDF is written sequentially. Splitting them onto separate physical drives therefore gives a huge performance gain on spinning disks; on solid-state drives the gain is smaller, but it is still there.
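A sketch of the "big table on its own filegroup and drive" items above; every database, file, path, and table name here is made up for illustration:

    -- New filegroup on a separate physical drive for a large table.
    ALTER DATABASE Sales ADD FILEGROUP BigTables;
    ALTER DATABASE Sales
    ADD FILE (NAME = Sales_big1,
              FILENAME = 'E:\SQLData\Sales_big1.ndf',
              SIZE = 10GB, FILEGROWTH = 1GB)
    TO FILEGROUP BigTables;

    -- Create (or rebuild) the hot table on that filegroup.
    CREATE TABLE dbo.OrderHistory
    (
        OrderID   int      NOT NULL,
        OrderDate datetime NOT NULL
    ) ON BigTables;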
There's at least ONE good reason for having multiple (at least two) file groups in SQL Server 2008 : if you want to use the FILESTREAM feature, you have to have a dedicated and custom filegroup for your FILESTREAM data :-)
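A minimal sketch of such a definition (FILESTREAM must already be enabled at the instance level, and all names and paths here are invented):

    -- FILESTREAM data requires its own CONTAINS FILESTREAM filegroup.
    CREATE DATABASE Archive
    ON PRIMARY
        (NAME = Archive_data, FILENAME = 'D:\SQLData\Archive.mdf'),
    FILEGROUP FileStreamFG CONTAINS FILESTREAM
        (NAME = Archive_fs, FILENAME = 'D:\SQLData\ArchiveFS')
    LOG ON
        (NAME = Archive_log, FILENAME = 'L:\SQLLogs\Archive.ldf');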
Marc
Maintaining multiple filegroups helps you reduce the I/O burden. It also gives you storage flexibility: you can back up a single filegroup rather than the whole database, and you can place each filegroup on its own disk drive.
Generally you should just have one Primary Filegroup and one log file against that.
Sometimes when you have very static data, you can create a SECOND filegroup that contains this static data. You can then make that filegroup READONLY, which improves performance. After all, it's pretty static data. It's not worth it if you only have a small number of read-only rows (e.g. lookup-table values), but for some things (e.g. archived content that can still be read) this can be a great option.
I got the idea from this blog post.
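For reference, the pattern looks roughly like this; database, filegroup, and path names are placeholders, and the static tables have to be moved onto the filegroup before you flip the switch:

    -- Put static data on its own filegroup, then mark it read-only.
    ALTER DATABASE Sales ADD FILEGROUP StaticData;
    ALTER DATABASE Sales
    ADD FILE (NAME = Sales_static,
              FILENAME = 'F:\SQLData\Sales_static.ndf')
    TO FILEGROUP StaticData;
    -- ...rebuild the static tables onto StaticData first, then:
    ALTER DATABASE Sales MODIFY FILEGROUP StaticData READ_ONLY;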
HTH.
I've worked on a good range of DBs, and the only time we've used filegroups was when a disk was running short on space, and we had to create a new file group on another spindle. I'm sure there are good performance reasons why that's not ideal, but that was the reality.
Among other reasons, additional filegroups make sense if you want to partition a table. And that makes sense if there are many rivaling reads with dissimilar WHERE conditions on that table: you can configure each partition to reflect one such WHERE condition and locate it on a different disk, thereby sending each read to a different disk, giving you parallel reads and less contention.
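A sketch of that idea, assuming filegroups FG2008, FG2009, and FG2010 already exist on different disks; all names and boundary values are illustrative:

    -- Map date ranges to filegroups so rivaling reads hit different disks.
    CREATE PARTITION FUNCTION pfOrderYear (datetime)
    AS RANGE RIGHT FOR VALUES ('2009-01-01', '2010-01-01');

    CREATE PARTITION SCHEME psOrderYear
    AS PARTITION pfOrderYear TO (FG2008, FG2009, FG2010);

    CREATE TABLE dbo.Orders
    (
        OrderID   int      NOT NULL,
        OrderDate datetime NOT NULL
    ) ON psOrderYear (OrderDate);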
