I have a desktop application that needs to upload and download images to and from a server machine over TCP.
At first I stored the images in the file system, but now I need to store them in an MS SQL database to compare which solution is better. The number of images is over half a million. I don't know yet whether there will be any limit on the size of a photo.
If you have done something like this, please share your opinion on this question.
Which one is faster and safer? Which works better with this number of photos? If I store them in the DB, do I need to keep the images apart from all the other tables my application uses, and which data type works better in the DB, image or varbinary? And so on.
Thank you.
There's a really good paper by Microsoft Research called To Blob or Not To Blob.
Their conclusion after a large number of performance tests and analysis is this:
if your pictures or documents are typically below 256 KB in size, storing them in a database VARBINARY column is more efficient
if your pictures or documents are typically over 1 MB in size, storing them in the filesystem is more efficient (and with SQL Server 2008's FILESTREAM attribute, they're still under transactional control and part of the database)
in between those two, it's a bit of a toss-up depending on your use
If you decide to put your pictures into a SQL Server table, I would strongly recommend using a separate table for storing those pictures - do not store the employee photo in the employee table - keep them in a separate table (see the sketch below). That way, the Employee table can stay lean and mean and very efficient, assuming you don't always need to select the employee photo, too, as part of your queries.
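A minimal sketch of that separation, with illustrative table and column names (not from the original post):
CREATE TABLE dbo.EmployeePhotos
(
    EmployeeID INT NOT NULL PRIMARY KEY
        REFERENCES dbo.Employees(EmployeeID),  -- one photo row per employee
    Photo VARBINARY(MAX) NOT NULL
);
A join fetches the photo only when you actually need it, so queries against the Employee table itself never have to drag the BLOBs along.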
For filegroups, check out Files and Filegroup Architecture for an intro. Basically, you would either create your database with a separate filegroup for large data structures right from the beginning, or add an additional filegroup later. Let's call it "LARGE_DATA".
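As a rough sketch, adding such a filegroup to an existing database could look like this (database name, file path and sizes are illustrative):
ALTER DATABASE YourDatabase ADD FILEGROUP LARGE_DATA;

ALTER DATABASE YourDatabase
ADD FILE
(
    NAME = 'LargeData1',
    FILENAME = 'D:\SqlData\YourDatabase_LargeData1.ndf',
    SIZE = 512MB,
    FILEGROWTH = 256MB
)
TO FILEGROUP LARGE_DATA;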
Now, whenever you have a new table to create which needs to store VARCHAR(MAX) or VARBINARY(MAX) columns, you can specify this filegroup for the large data:
CREATE TABLE dbo.YourTable
(
    ID INT IDENTITY(1,1) PRIMARY KEY,   -- illustrative columns; define your own fields here
    Name NVARCHAR(100) NOT NULL,
    Picture VARBINARY(MAX) NULL
)
ON [Data]                -- the basic "Data" filegroup for the regular data
TEXTIMAGE_ON LARGE_DATA  -- the filegroup for large chunks of data
Check out the MSDN intro on filegroups, and play around with it!
Which version of SQL Server? Version 2008 adds FILESTREAM, which is specifically designed for this purpose. FILESTREAM data is stored in the file system, which makes it very fast to access.
If this is not an option, you could look into creating a separate filegroup for your image data (to give you the most flexibility when partitioning your data) and use the varbinary(max) or image data types (image is deprecated in favor of varbinary(max)).
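As a hedged sketch (assuming a FILESTREAM filegroup has already been added to the database), a FILESTREAM column is just a varbinary(max) column with the FILESTREAM attribute, plus the ROWGUIDCOL column the feature requires:
CREATE TABLE dbo.Photos
(
    PhotoID UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    PhotoData VARBINARY(MAX) FILESTREAM NULL
);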
A SQL guru will probably chime in with better info.
It's an old question I know, but with SQL Server 2012 is it finally ok to store files in the database, or should they really be kept in the filesystem with only references to them in the database?
If storing them in the database is considered acceptable these days, what is the most effective way to do it?
I'm planning to apply encryption so I appreciate processing will not be lightning fast.
There's still no simple answer. It depends on your scenario. MSDN has documentation to help you decide.
There are other options covered here. Instead of storing in the file system directly or in a BLOB, you can use FILESTREAM or a FileTable in SQL Server 2012. The advantages of FileTable seem like a no-brainer (but admittedly I have no personal first-hand experience with them).
The article is definitely worth a read.
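Purely for illustration (again, I have no first-hand experience), creating a FileTable is a one-liner, assuming FILESTREAM is enabled on the instance and the database already has a FILESTREAM filegroup and a directory name configured:
CREATE TABLE dbo.Documents AS FILETABLE
    WITH (FILETABLE_DIRECTORY = 'Documents');
Files dropped into the Windows share that SQL Server exposes for the database then appear as rows in dbo.Documents automatically.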
You might read up on FILESTREAM. Here is some info from the docs that should help you decide:
If the following conditions are true, you should consider using FILESTREAM:
Objects that are being stored are, on average, larger than 1 MB.
Fast read access is important.
You are developing applications that use a middle tier for application logic.
For smaller objects, storing varbinary(max) BLOBs in the database often provides better streaming performance.
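Note that FILESTREAM must be enabled before any of this works. A minimal sketch of the instance-level part (the feature must also be enabled for the service in SQL Server Configuration Manager):
EXEC sp_configure 'filestream access level', 2;  -- 2 = Transact-SQL and Win32 streaming access
RECONFIGURE;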
I'm currently putting together a basic POC architecture for an application that will store a large number of records, where each record has one field containing a few thousand characters.
e.g.
TableID int
Field1 nvarchar(50)
Field2 nvarchar(50)
Field3 nvarchar(MAX)
This is all being hosted in Azure. We have one WebJob that obtains the data and populates it into the data store, and another WebJob that comes through periodically and processes the data.
Currently the data is just stored in an Azure SQL Database. I'm worried that once the record count gets into the many millions, it's going to be incredibly inefficient to store/process/retrieve the data this way.
Advice required on the best way to store this in Azure. I wanted to try keeping the rows in Azure SQL while the large field's data is pushed into another repository (e.g. Data Lake, DocumentDB) with a reference back to the SQL record, so the SQL calls stay lean and the big data is stored somewhere else. Is this a clean manner of doing it, or am I totally missing something?
Azure Table Storage can help with this solution; it is a NoSQL key-value store, and each entity can be up to 1 MB in size. You could also use individual blobs. There is a design guide that includes a full description of how to design Table Storage solutions for scale, including patterns for using Table Storage along with other repositories; see the Table Design Guide:
https://azure.microsoft.com/en-us/documentation/articles/storage-table-design-guide/
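If you go the hybrid route you describe, a sketch of the lean SQL side could look like this (the column names are illustrative; the URI column points at the blob or Table Storage entity that holds the large field):
CREATE TABLE dbo.Records
(
    RecordID INT IDENTITY(1,1) PRIMARY KEY,
    Field1 NVARCHAR(50),
    Field2 NVARCHAR(50),
    LargeDataUri NVARCHAR(400) NOT NULL  -- reference into blob/Table Storage
);
Queries and updates against the small fields then never touch the large data at all.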
I have 4k records in an Access database. One of the fields contains ~100 lines of text per record, and another field has ~25 lines. The total database size thus reaches ~30 MB, and it takes a lot of time (15-20 seconds) to load the database in VB.NET using ODBC (http://www.homeandlearn.co.uk/net/nets12p5.html). Updating any of the other small fields also takes time because the database is so large.
As an alternative I used RTF files (TXT files were not preserving all the newline characters). These files are only around 5-10 KB each, but for 4k records and 2 fields I now have 8k files, and copying those 8k RTF files takes a huge amount of time: a transfer of about 5 MB takes an hour or so.
Is there any other alternative for storing this data, so that it stays portable and is easily loaded/accessed/updated from VB.NET?
MDB Databases
MDB is the Access database file type. Access databases were never designed to be used as backends for web systems; they are mainly for light office use.
Improving performance
For a temporary improvement in performance, you can compact and repair the database. Open it up and find the link in the Tools menu. Alternatively, you can do this programmatically. This should be done reasonably frequently, depending on the number of changes made to your database. (See: What does compacting and repairing do?)
Also, slowness is often a sign of inefficient design. Consider reading up on database normalisation if your database is not fully normalised. This should significantly improve performance and is an essential standard that should be learned.
Alternatives
For 4k+ records you should probably be using a decent database system designed specifically for larger amounts of data.
SQL Server is an excellent database system from Microsoft, and MySQL is a great open-source alternative. The Internet is full of tutorials on how to connect to these databases.
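Whichever engine you pick, splitting the big text fields out of the main table will also help: updates to the small fields then touch only small rows. A sketch in SQL Server syntax, with made-up table and column names:
CREATE TABLE Records
(
    RecordID INT PRIMARY KEY,
    SmallField1 NVARCHAR(50),
    SmallField2 NVARCHAR(50)
);

CREATE TABLE RecordNotes
(
    RecordID INT PRIMARY KEY REFERENCES Records(RecordID),
    Notes NVARCHAR(MAX)  -- the ~100-line text field lives here
);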
I sometimes use Access databases in .NET too. OK, MS Access isn't the best database for this kind of application, I know. But the ease of writing complex queries and the functional, well-known reports make Access a good cost-benefit solution.
I saw the link you indicated. That approach was my first technique too, but then I realized there was an easier and faster one. I suggest you link to the Access database in a different way:
Create a DataSet, if you haven't already.
Create a connection to the MS Access database using the Database Explorer.
Drag and drop your desired tables onto the created DataSet (.NET will generate the designer code for you behind the scenes).
In code, create a TableAdapter object and a table object. Suppose your DataSet is named DS1 and one of its tables is named table01 (language: VB.NET; check IntelliSense autocomplete for your data objects):
' create a TableAdapter object and a table object (these types were designed
' when you dropped the Database Explorer objects onto the DataSet)
Dim table01_TA As New DS1TableAdapters.table01TableAdapter
Dim table01 As New DS1.table01DataTable

' load the database data into the in-memory table table01
table01 = table01_TA.GetData()

' do your operations using table01 (add, update, insert, delete, queries);
' for automatic generation of the update, insert and delete commands, make
' sure your table has primary keys and correct relationships

' finally, update through the table adapter; unless you do, the data will
' not be written back to the database
table01_TA.Update(table01)
I suggest you use LINQ to query your data, and the DataTable methods for adding and editing data. These methods are created automatically when you drop the Database Explorer tables onto the DataSet and save it. It's also worth compacting and repairing the Access database frequently.
Contact me if you have trouble.
I agree with Tom's recommendation: get yourself a decent database server. However, judging by your description of your performance issues, it seems like you have other serious problems that are probably going to be difficult to resolve here.
I'm using SQL Server 2008. My database is almost 2 GB in size. 90% of it is one table (as per sp_spaceused) that I don't need for most of my work.
I was wondering if it is possible to take this table and have it backed up in a separate file, allowing me to back up and transfer the important data more frequently than this one table.
My guess is the easiest way to do this is to create a new database, create the table there, copy the table contents to the new database, drop the table relationships, drop the table, create a view pointing to the other database, and use that view in my applications.
However, I was wondering if you had any pointers to different strategies that I may not be aware of at this point.
Create the table in a different FileGroup.
Here's a link with some good examples.
This creates a second physical file for just that table. It can be placed on a different physical drive for performance. You can do a backup or restore of just specific filegroups, which is what it sounds like you need.
This is one example of the larger topic of "Data Partitioning", which involves various methods of dividing large tables across multiple files.
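A sketch of the backup side, assuming the big table was moved to a filegroup named LARGE_DATA and the database is called MyDatabase (both names are illustrative):
-- frequent backups of just the small, important filegroup
BACKUP DATABASE MyDatabase
    FILEGROUP = 'PRIMARY'
    TO DISK = 'D:\Backups\MyDatabase_Primary.bak';

-- the large, rarely-needed filegroup on its own slower schedule
BACKUP DATABASE MyDatabase
    FILEGROUP = 'LARGE_DATA'
    TO DISK = 'D:\Backups\MyDatabase_LargeData.bak';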
I suggest the filegroup solution. However, to copy a table from one database to another you can use this trick:
SELECT * INTO MyNewDatabase..MyTable FROM MyOldDatabase..MyTable
Keep in mind that SELECT INTO copies the data and column definitions but not indexes, constraints, or triggers; you have to recreate those on the new table yourself.