Storing files in SQL Server - sql-server

It's an old question I know, but with SQL Server 2012 is it finally ok to store files in the database, or should they really be kept in the filesystem with only references to them in the database?
If storing them in the database is considered acceptable these days, what is the most effective way to do it?
I'm planning to apply encryption so I appreciate processing will not be lightning fast.

There's a really good paper by Microsoft Research called To Blob or Not To Blob.
Their conclusion after a large number of performance tests and analysis is this:
if your pictures or document are typically below 256K in size, storing them in a database VARBINARY column is more efficient
if your pictures or document are typically over 1 MB in size, storing them in the filesystem is more efficient (and with SQL Server 2008's FILESTREAM attribute, they're still under transactional control and part of the database)
in between those two, it's a bit of a toss-up depending on your use
If you decide to put your pictures into a SQL Server table, I would strongly recommend using a separate table for storing those pictures - do not store the employee photo in the employee table - keep them in a separate table. That way, the Employee table can stay lean and mean and very efficient, assuming you don't always need to select the employee photo, too, as part of your queries.
For filegroups, check out Files and Filegroup Architecture for an intro. Basically, you would either create your database with a separate filegroup for large data structures right from the beginning, or add an additional filegroup later. Let's call it "LARGE_DATA".
Now, whenever you have a new table to create which needs to store VARCHAR(MAX) or VARBINARY(MAX) columns, you can specify this file group for the large data:
CREATE TABLE dbo.YourTable
(....... define the fields here ......)
ON Data -- the basic "Data" filegroup for the regular data
TEXTIMAGE_ON LARGE_DATA -- the filegroup for large chunks of data
Check out the MSDN intro on filegroups, and play around with it!

There's still no simple answer. It depends on your scenario. MSDN has documentation to help you decide.
There are other options covered here. Instead of storing in the file system directly or in a BLOB, you can use the FileStream or File Table in SQL Server 2012. The advantages to File Table seem like a no-brainier (but admittedly I have no personal first-hand experience with them.)
The article is definitely worth a read.

You might read up on FILESTREAM. Here is some info from the docs that should help you decide:
If the following conditions are true, you should consider using FILESTREAM:
Objects that are being stored are, on average, larger than 1 MB.
Fast read access is important.
You are developing applications that use a middle tier for application logic.
For smaller objects, storing varbinary(max) BLOBs in the database often provides better streaming performance.

Related

How to store files effectively in SQL server [duplicate]

It's an old question I know, but with SQL Server 2012 is it finally ok to store files in the database, or should they really be kept in the filesystem with only references to them in the database?
If storing them in the database is considered acceptable these days, what is the most effective way to do it?
I'm planning to apply encryption so I appreciate processing will not be lightning fast.
There's a really good paper by Microsoft Research called To Blob or Not To Blob.
Their conclusion after a large number of performance tests and analysis is this:
if your pictures or document are typically below 256K in size, storing them in a database VARBINARY column is more efficient
if your pictures or document are typically over 1 MB in size, storing them in the filesystem is more efficient (and with SQL Server 2008's FILESTREAM attribute, they're still under transactional control and part of the database)
in between those two, it's a bit of a toss-up depending on your use
If you decide to put your pictures into a SQL Server table, I would strongly recommend using a separate table for storing those pictures - do not store the employee photo in the employee table - keep them in a separate table. That way, the Employee table can stay lean and mean and very efficient, assuming you don't always need to select the employee photo, too, as part of your queries.
For filegroups, check out Files and Filegroup Architecture for an intro. Basically, you would either create your database with a separate filegroup for large data structures right from the beginning, or add an additional filegroup later. Let's call it "LARGE_DATA".
Now, whenever you have a new table to create which needs to store VARCHAR(MAX) or VARBINARY(MAX) columns, you can specify this file group for the large data:
CREATE TABLE dbo.YourTable
(....... define the fields here ......)
ON Data -- the basic "Data" filegroup for the regular data
TEXTIMAGE_ON LARGE_DATA -- the filegroup for large chunks of data
Check out the MSDN intro on filegroups, and play around with it!
There's still no simple answer. It depends on your scenario. MSDN has documentation to help you decide.
There are other options covered here. Instead of storing in the file system directly or in a BLOB, you can use the FileStream or File Table in SQL Server 2012. The advantages to File Table seem like a no-brainier (but admittedly I have no personal first-hand experience with them.)
The article is definitely worth a read.
You might read up on FILESTREAM. Here is some info from the docs that should help you decide:
If the following conditions are true, you should consider using FILESTREAM:
Objects that are being stored are, on average, larger than 1 MB.
Fast read access is important.
You are developing applications that use a middle tier for application logic.
For smaller objects, storing varbinary(max) BLOBs in the database often provides better streaming performance.

Sql Server XML columns substitute for Document DB?

Is it possible to use Sql Server XML columns as a substitute for a real Document DB (such as Couch or Mongo) ?
If I were to create a table with a guid PK Id and an XML column for the document.
What would be the main problems compared to using a document DB?
Sql Server supports indexing over XML columns so querying should not be completely horrible?
You've got several questions in here:
Is it possible to use Sql Server XML columns as a substitute for a real Document DB (such as Couch or Mongo) ? Yes, you can use it as a substitute, but no, you probably wouldn't be satisfied with performance if you're exclusively storing XML and not leveraging any of SQL Server's relational tools.
If I were to create a table with a guid PK Id and an XML column for the document. What would be the main problems compared to using a document DB? In a nutshell, scaling out. SQL Server doesn't scale this kind of thing out well. You can do it with replication, but it's painful to manage relative to a "real" Document DB.
Sql Server supports indexing over XML columns so querying should not be completely horrible? The problem is that SQL Server's XML indexes can take several times the storage space of the original data. These indexes can't be maintained online (as in defrags), so you end up with locking issues during maintenance windows.
I'm doing some experimenting with this on:
http://rogeralsing.com/2011/03/02/linq-to-sqlxml-projections/
Query speed is 'decent' , it's nothing I'd use for scaling.
But the joy of schema free storage running on standard infrastructure is quite nice.
Yes, you can. Storing a document inside a SqlServer XML column will work and if you use standard XML serialization that will leave you with a decent ACID complant key/value store. Also, it will allow you to do queries on it with relative ease and you can join the results to data that you store in a more relational way. We do so, it works. If you store content in XML fields, storage demands are a lot lower than using NTEXT and querying it will be more flexible and faster.
What SqlServer will not get you (comparing to mongo) is the seamless failover of replica-sets an the autosharding of mongo. Also, atomic operations like incrementing a specific property deep inside a document is hard (though not impossible with the XQuery update function). Updates tend to be faster on most NoSql databases, because they are more relaxed on the "data is only safe on disk" principle.
Yes, it is possible. As to whether it's a good idea, this is just my 2 cents...
Before the XML datatype came along I worked on a system storing XML in an NTEXT column - that wasn't pleasant, and to get any real use out of the data meant shredding some of that data out into relational form.
OK, the XML datatype now makes it easier to query an XML blob and to extract certain values/index them. But personally, in general, I wouldn't. I'm not saying never use XML as there are scenarios for that - rather if that's all your planning on doing then I'd be thinking "is this the right tool for the job". Using a RDBMS as a document database makes me feel a bit uneasy. Whereas something like MongoDB has been built from the ground up as a document database.
In all honesty, I haven't done any performance testing on storing data as XML so I can't give you an indication of what performance would be like. Would be interested to know how this performs at scale.

large database file (mdb) takes time to load in vb.net so need alternative

I have 4k records in access database. And one of the field value contains ~100 lines each
so and one other field has ~25 lines. So total database size reaches ~30MB and it takes lot of time 15-20 seconds to load the database in vb.net using odbc http://www.homeandlearn.co.uk/net/nets12p5.html
and updating of any other small fields also takes time due to database being large
So as an alternative I used rtf file (txt files were not preserving all the newline characters). So these file are around 5-10kb only. But for 4k records and 2 fields I have now 8k files. And copying of these 8k rtf files is taking huge time for 5MB transfer it takes an hour or so.
So is there any other alternative for storage of this data. So that it will be portable and easily loaded/accessed/updated from vb.net?
MDB Databases
MDB is the Access database filetype. Access databases were never designed to be used for backends of web systems, they are mainly for light office use.
Improving performance
For temporary improvement of performance, you can compact and repair the database. Open it up, and find the link in the tools menu. Alternatively you can do this programaticaly. This should be done reasonably frequently depending on the number of changes your databases has made to it. What does compact and repairing do?
Also, slowness is often a sign of inefficient design. Consider reading up on database normalisation if your database is not fully normalised. This should significantly improve performance and is an essential standard that should be learned.
Alternatives
For 4k+ records you should probably be using a decent database system designed specifically for larger amounts of data.
SQL-Server is an excellent database system from Microsoft. MySQL is also a great open source alternative. The Internet is full of tutorials on how to connect to these databases.
I'm using sometimes Access databases in .net too. Ok, MS-Access isn't the best database for this kind of application, I know. But the easy-doing complex queryes and the functional and well-knowed reports makes Access a good cost-benefit solution.
I saw the link that you've indicated. This way was my first technique, but then I realized there was another easier and faster. I suggest you to do the linkage for Access database in a different way.
Create a dataSet, if you already didn't it.
Create a connection to the MS-Access database using database explorer.
Drag and drop your desired tables on created DataSet (.net will create the designer code for you in backStage)
On code, create an tableAdapter object and a table object:
Supose that your dataSet name is DS1 and a table name is table01.
language: VB.NET
check intellisense autocomplete for your dataobjects
creates a tableadapter object and table object (designed when you drop the database explorer objects in dataset)
dim table01_TA as new ds1Tableadapters.table01_tableAdapter
dim table01 as new ds1.table01dataTable
loads the database data into the on-memory table table01
table01 = table01_TA.getData
do your opperations using table01 (add, update, insert, delete, queries)
for automatic generation of scripts for update, insert and delete, make sure your table has primaryKeys and correct relationships.
finally, update the table adapter. Unless you do it, the data will not be updated in the database.
table01_Ta.update(table01)
I suggest you use LINQ to query your data, and the datatable methods to adding and editing data. These methods are created automatically when you drop the databaseExplorer tables on dataSet and save it. Its worth to compact and repair Access database frequently.
Contat-me if you have troubles.
I agree with Tom's recommendation. Get yourself a decent database server. However, judging by your description of your performance issues it seems like you have other serious problems which are probably going to be difficult to resolve here.

Working with images in WCF

I have a desktop application that needs to upload/download images to/from service computer over TCP Protocol.
At first, I stored images in file system, but I need to in MS SQL DB to compare which solution is better. Number of images is over half a million. I don't know yet will there be any limitation on size of a photo.
If you have done smth like that, please, write what your opinion upon this question.
Which one is faster, more safe? Which of them works better with this number of photos? If I'll store on DB, do I need to store images apart from all other tables which I use for my application and which type works better - image or varbinary on DB?..and so on.
Thank you.
There's a really good paper by Microsoft Research called To Blob or Not To Blob.
Their conclusion after a large number of performance tests and analysis is this:
if your pictures or document are typically below 256K in size, storing them in a database VARBINARY column is more efficient
if your pictures or document are typically over 1 MB in size, storing them in the filesystem is more efficient (and with SQL Server 2008's FILESTREAM attribute, they're still under transactional control and part of the database)
in between those two, it's a bit of a toss-up depending on your use
If you decide to put your pictures into a SQL Server table, I would strongly recommend using a separate table for storing those pictures - do not store the employee foto in the employee table - keep them in a separate table. That way, the Employee table can stay lean and mean and very efficient, assuming you don't always need to select the employee foto, too, as part of your queries.
For filegroups, check out Files and Filegroup Architecture for an intro. Basically, you would either create your database with a separate filegroup for large data structures right from the beginning, or add an additional filegroup later. Let's call it "LARGE_DATA".
Now, whenever you have a new table to create which needs to store VARCHAR(MAX) or VARBINARY(MAX) columns, you can specify this file group for the large data:
CREATE TABLE dbo.YourTable
(....... define the fields here ......)
ON Data -- the basic "Data" filegroup for the regular data
TEXTIMAGE_ON LARGE_DATA -- the filegroup for large chunks of data
Check out the MSDN intro on filegroups, and play around with it!
Which version of SQL server? Version 2008 adds FILESTREAM which is specifically designed for this purpose. FILESTREAM data can be located on disk which makes it very fast to access.
If this is not an option, you could look into creating a separate filegroup for your image data (to give you the most flexibility when partitioning your data) and use the varbinary(max) or image data types.
A SQL guru will probably chime in with better info.

Separating an SQL Server database

I'm using SQL Server 2008. My database is almost 2GB in size. 90% of it is one table (as per sp_spaceused), that I need don't for most of my work.
I was wondering if it was possible to take this table, and have it backed up in a separate file, allowing me to transfer the important data on a more frequent basis than this one.
My guess is the easiest way to do this is create a new database, create the table there, copy the table contents to the new database, drop the table relationships, drop the table, create a view pointing to the other database and use that view in my applications.
However, I was wondering if you had any pointers to different strategies that I may not be aware of at this point.
Create the table in a different FileGroup.
Here's a link with some good examples.
This creates a second physical file for just that table. It can be placed on a different physical drive for performance. You can do a backup or restore of just specific filegroups, which is what it sounds like you need.
This is one example of the larger topic of "Data Partitioning", which involves various methods of dividing large tables across multiple files.
I suggest the filegroup solution. However to copy a table from a database to another you can do this trick:
SELECT * INTO MyNewDatabase..MyTable FROM MyOldDatabase..MyTable

Resources