I'm creating a new DB and have a bunch of static data that won't change. If it does, it will be a manual process AND it will happen very rarely.
This data is a mix of varchars and Geographies.
I'm guessing it could be around 100K or so in total, over 4 or so tables.
Questions
Should I put these on a READ ONLY filegroup
Can I create the tables in the designer and define the filegroup during creation? Or is it only possible via a script?
Once the data is in the table (on a read only filegroup), can I change it later? Is it really hard to do that?
thanks.
It is worth it for VLDB (very large databases) for assorted reasons.
For 100,000 rows or 100 KB, I wouldn't bother.
This SQL Server support engineering team article discusses one of the associated "urban legends".
There is another one (can't find it) where you need 300 GB - 1B of data before you should consider multiple files/filegroups.
But, to answer specifically
Personal choice (there is no hard and fast rule)
Yes (edit:) In SSMS 2005, design mode, go to Indexes/Key, "data space specfication". The data lives where the clustered index is. WIthout a clustered index, then you can only do it via CREATE TABLE (..) ON filegroup
Yes, but You'll have to ALTER DATABASE myDB MODIFY FILEGROUP foo READ_WRITE with the database in single user exclusive mode
It is unlikely to hurt to put the data in to a read only space but I am unsure you will gain significantly. A read-only file group (or tablespace in Oracle) can give you 2 advantages; less to back-up each time a full backup is taken and a higher level of security over the data (e.g. it cannot be changed by a bug, accessing the DB via another tool, etc). The backup advantage is most true with larger DBs where backup windows are tight so putting a small amount of effort into excluding file groups is valuable. The security one depends on the nature of the site, data, etc. (if you do exclude the read-only space from regular backups make sure you get a copy on any retained backup tapes. I tend to backup up read-only spaces once a month.)
I am not familiar with designer.
Changing to and from read only is not onerous.
I think anything you read here is likely to be speculation, unless you have any evidence that it's been actually tried and recommended - to me it looks like a novel but unlikely idea. Do you have some reason to suspect that conventional practices will be unsatisfactory? It should be fairly easy to just try it and find out. Post your results if you get a chance.
Related
I need a bit of a help with the following.
Note: in the following scenario, I do not have access to the application's source code, therefore I can only make changes at the database level.
Our database uses dbo.[BLOB] to store all kinds of files and documents. The table uses an IMAGE (yeah, obsolete) data type. Since this particular table is growing quite fast, I was thinking to implement some archiving feature.
My idea is to move all files older than X months to a second database, and then somehow link from the dbo.[BLOB] table to the external/archiving database.
Is this even possible? The goal is to reduce the database size, in order to improve backup and query performance.
Any ideas and hints much appreciated.
Thanks.
Fabian
There are 2 features to help you with backup speed and database size in this case:
Filestream will allow you to store BLOBS as files on the file system instead of in database file. It complicates backup scenario, you have to backup both database and files but you get smaller database file along with faster access time to documents. It is much faster to read file from filesystem than from blob column. Additionally filestream allows for files bigger than 2GB.
Partitioning will split table into smaller chunks on physical level. This way you do not need to access application code to change where particular rows are stored physically and decide which data needs to be accessed fast and put it on SSD drive and which can land on slower archive. This way you can have more frequent backups on current partition, while less frequent on archive.
Prior to SQL Server 2016 SP1 - this feature was available in Enterprise version only. For SQL Server 2016 SP1 this is available in all editions.
In your case most likely you should go with filestream first.
W/o modifying the application you can do, basically, nothing. You may try to see if changing the column type will be tolerated by the application (very unlikely, 99.99% it will break the app) and try to use FILESTREAM, but even if you succeed it does not give much benefits (backup size will be the same, for example).
A second thing you can try is to replace the table with a view, using INSTEAD OF triggers for updates. It is still very likely to break the application (lets say 99.98%). The goal would be to have a distributed partitioned view (or cross DB partitioned view) which presents to the application an unified view of the 'cold' and 'hot' data. Is complex, error prone, but it will reduce the size of the backups (as long as data is moved from hot to cold and cold data is immutable, requiring few backups).
The goal is to reduce the database size, in order to improve backup and query performance.
To reduce the backup size, as I explained above, you can do, basically, nothing. But performance you need to investigate it and address it appropriately, based on your findings. Saying the the database is slow 'because of BLOBs' is hand-waving.
Due to a new architecture I have to split the current database in 2 databases both of them having 50% of the initial database (= 15GB).
1/ Is this a good idea to execute DBCC SHRINKDATABASE (0) for the 2 newly created databases? I'm asking this as I've read many articles stating the shrinking database leads to fragmentation.
2/ Is a good approach to set both databases as SIMPLE recovery while doing the separation and then to set it to FULL back?
What is the action you recommend to apply in this cases?
Shrinking obviously is there is no chance the space gets reiused. THat said ,the database initially is on the really tiny size - I have databases here that have multiple files and earch is larger. So, the gain may simply not worth it.
But it is a valid case for shrinking IF the database will not grow back in reasonable time and you need the space.
We are using SQL Server 2005. Recently SQL server 2005 crashed in our production environment due to large tempdb size.
1) what could be reason for large tempdb size?
2) Is there any way to look what data is there in tempdb?
2) Is there any way to look what data is there in tempdb?
No, because it is not kept there. Tempdb has very special treatment, like being dropped on every server restart.
1) what could be reason for large tempdb size?
Inefficient SQL, maintenance jobs or just the data at hand. Obviously a 800gb, 6000gb database may require more tempdb space than a 4gb online crm attempt. You dont really specify ANY size in absolute terms. What IS large? I have tempdb databases hardcoded at 64gb ony my smaller servers.
Typical SQL that goes into Tempdb are:
Sorts that are not solvable as part of the query (you need to store keys SOMEWHERE)
DISCTINCT. Needs all returned data in tempdb to find doubles.
Certain poerations psossibly during joins.
Tempdb usage (temporary tables). I just mention them becasue I often keep some hundred megabytes worth of data in them during loads and scrubbing.
In general you can find those queries by having hugh IO stats in the query log, or simply being slow.
That said, maintenance plans also go int there, but with reason. At the end, your "large" is possibly mine "not even worth mentioning tiny". It really depends what you do. Use the query trace tool to find out what takes long.
Physically Tempdb is very special in treatment - sql server does NOT write to the file if it does not have to (i.e. keeps thigns in memory). Writes to the disc are a sign of memory flowing ofer. This is different from normal db write behavior. Tempdb, IF it flows over, is best put onto a decently fast SSD... which wont normally be SO expensive because it still will be relatively small.
Use the query here to find other queries for tempdb - basicaly you are fishing in dirty water here, need to try out things until you find the culprit.
The usual way to grow a SQL Server database — any database, not just tempdb — is to have it's data and log files set to autogrow (especially the log files). SQL Server is perfectly happy grow the log and data files until the consume all the disk space available to them.
Best practice, IMHO, is to allow limited autogrowth on the data files (put an upper bound on how big it can grow) and fix the size of the log files. You might need to do some analysis to figure out how big the log files need to be. For tempdb, especially, the recovery model should be set to simple, too.
Ok tempdb is a kinda special database. Any temporary objects you use in procedures etc, is created here. So if you application uses a lot of temp tables in queries, they will all reside here, but they should clean themselves up after the connection (spid) is reset.
The other thing that can grow a tempdb is database maintenance tasks, however they will take a larger toll on the database log files.
Tempdb is also cleared every time you restart the SQL Service. It basically drops the database and re-create it. I agree with #Nic about leaving tempdb as it is, dont muck around with it, any issues with space in tempdb, usually indicates another larger problem somewhere else. More space will mask the problem, but only for so long. How much free space does your drive have that you have tempdb on?
Other thing, if not already, try and put tempdb on it's own drive, and one more if possible, have the data and the log files on their own separate drives.
So, if you dont restart your SQL Server/Service, your drive will run out of space pretty soon.,
use tempdb
select (size*8) as FileSizeKB from sys.database_files
I have a design decision to make regarding documents uploaded to my web site: I can either store them on my file server somewhere, or I can store them as a blob in my database (MSSQL 2005). If it makes any difference to the design decision, these documents are confidential and must have a certain degree of protection.
The considerations I've thought of are:
Storing on the file server makes for HUUUUUUUGE numbers of files all dumped in a single directory, and therefore slower access, unless I can work out a reasonable semantic definition for a directory tree structure
OTOH, I'm guessing that the file server can handle compression somewhat better than the DB... or am I wrong?
My instincts tell me that the DB's security is stronger than the file server's, but I'm not sure if that's necessarily true.
Don't know how having terabytes of blobs in my DB will affect performance.
I'd very much appreciate some recommendations here. Thanks!
In SQL Server 2005, you only have the choice of using VARBINARY(MAX) to store the files inside the database table, or then keep them outside.
The obvious drawback of leaving them outside the database is that the database can't really control what happens to them; they could be moved, renamed, deleted.....
SQL Server 2008 introduces the FILESTERAM attribute on VARBINARY(MAX) types, which allows you to leave the files outside the database table, but still under transactional control of the database - e.g. you cannot just delete the files from the disk, the files are integral part of the database and thus get copied and backed up with it. Great if you need it, but it could make for some huge backups! :-)
The SQL Server 2008 launch presented some "best practices" as to when to store stuff in the database directly, and when to use FILESTREAM. These are:
if the files are typically less than 256 KB in size, the database table is the best option
if the files are typically over 1 MB in size, or could be more than 2 GB in size, then FILESTREAM (or in your case: plain old filesystem) is your best choice
no recommendation for files between those two margins
Also, in order not to negatively impact performance of your queries, it's often a good idea to put the large files into a separate table alltogether - don't have the huge blobs be part of your regular tables which you query - but rather create a separate table, which you only ever query against, if you really need the megabytes of documents or images.
So that might give you an idea of where to start out from!
I strongly suggest you to consider the filesystem solution. The reasons are:
you have better access to the files (precious in case of debugging), meaning that you can use regular console-based tools
you can quickly and easily take advantage of the OS to distribute the load, for example using a distributed filesystem, add redundancy via a hardware RAID etc.
you can take advantage of the OS access control lists to enforce permissions.
you don't clog your database
If you are worried about large amounts of entries in your directories, you can always create a branching schema. for example:
filename : hello.txt
filename md5: 2e54144ba487ae25d03a3caba233da71
final filesystem position: /path/2e/54/hello.txt
There's a LOT of "it depends" behind this popular subject. Since you say the documents are sensitive and confidential, off the cuff I'd go with storing in the database. Here are a few reasons:
Potentially better security. It is often easier to hack a file system than a database.
Better volume control. Thousands of files in one folder can strain an OS, where a database can take millions of rows in one table without blinking.
Better searching and scanning. Add categorizing columns when you load the data, or try out full text indexing to scan the actual documents.
Backups may be more efficient -- just add another database to your backup plan, and you're covered (once you work out space details, of course). And those backup files are another layer of obfuscation on anyone trying to get at your sensitive documents.
SQL Server 2008 has data compression options that may help here. That, or have the application do it? (More security through obfuscation, perhaps)
SQL Server 2008 also has the filestream data type, which may help here, but I'm not familiar enough with it to give a recommendation for your situation.
I'm a developer at heart - but every now and then, a customer doesn't have a decent DBA to deal with these issues, so I'm called in to decide....
What are your strategies / best practices when it comes to dealing with a reasonably sized SQL Server database (anything larger than Northwind or AdventureWorks) - do you use multiple filegroups? If so: how many? And why?
What are your criteria to decide when to move away from the "one filegroup for everything" approach:
database size?
database complexity?
availability / reliability requirements?
what else?
If you use multiple file groups, how many do you use? One for data, one for index, one for log? Several (how many) for data? What are your reasons for your choice - why do you use that exact number of filegroups :-)
The Microsoft trained and best practice methodology is as follows:
Log files are placed on a separate physical drive
Data files are placed on a separate physical drive
Multiple file groups: When a particular table is extremely big. Often the case in transactional database (Separate Physical Drive)
Multiple file groups: When using ranges or when wanting to split lookup data into a read-only database file (Separate Physical Drive)
Keep in mind that an MDF technically works similarly to a hard drive partition when it comes to storing data. The MDF is a randomly read file, whereas the LDF is a sequentially read file. Therefore splitting them into separate drives causes a huge performance gain, unless running solid state drives, in which case the gain is still there.
There's at least ONE good reason for having multiple (at least two) file groups in SQL Server 2008 : if you want to use the FILESTREAM feature, you have to have a dedicated and custom filegroup for your FILESTREAM data :-)
Marc
Maintaining multiple filegroups helps you reduce the I/O burden. It also allows you storage flexibility where you can back up a filegroup easily rather than a single file and separate them into an individual disk drive per file group.
Generally you should just have one Primary Filegroup and one log file against that.
Sometimes when you have very static data, you can create a SECOND filegroup that contains this static data. You can then make the filegroup READONLY which improves your performance. After all, this is pretty static data. It's not worth it if you have a low number of readonly rows (eg. lookup table values). But for some stuff (eg. archived content that can still be read in) then this might be a great option.
I got the idea from this blog post.
HTH.
I've worked on a good range of DBs, and the only time we've used filegroups was when a disk was running short on space, and we had to create a new file group on another spindle. I'm sure there are good performance reasons why that's not ideal, but that was the reality.
among other reasons additional filegroups make sense if you want to partition a table. and that makes sense if there are many rivaling reads with dissimilar where-conditions of that table. you can configure each partition to reflect one such where-condition and to be located on a different disk, thereby sending each read to another disk, thus parallel reads and less conflict.