file stream vs local save in sql server? - sql-server

My application plays video files after users have registered (the files are larger than 100 MB).
Is it better to store them on the hard drive and keep only the file path in the database?
Or
should I store them in the database using the FILESTREAM type?
Is data stored in the database more secure against manipulation than data stored on the hard drive?
How can I protect the data against manipulation?
Thanks.

There's a really good paper by Microsoft Research called To Blob or Not To Blob.
Their conclusion after a large number of performance tests and analysis is this:
if your pictures or documents are typically below 256 KB in size, storing them in a database VARBINARY column is more efficient
if your pictures or documents are typically over 1 MB in size, storing them in the filesystem is more efficient (and with SQL Server 2008's FILESTREAM attribute, they're still under transactional control and part of the database)
in between those two, it's a bit of a toss-up depending on your use
If you decide to put your pictures into a SQL Server table, I would strongly recommend using a separate table for storing those pictures - do not store the employee photo in the employee table - keep them in a separate table. That way, the Employee table can stay lean, mean, and very efficient, assuming you don't always need to select the employee photo as part of your queries.
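A minimal sketch of that separation, with hypothetical table and column names (the 1:1 key relationship keeps the photo out of every scan of the main table):

```sql
-- Hypothetical schema: the photo lives in its own table, joined only when needed
CREATE TABLE dbo.Employee
(
    EmployeeID INT IDENTITY(1,1) PRIMARY KEY,
    Name       NVARCHAR(100) NOT NULL
);

CREATE TABLE dbo.EmployeePhoto
(
    EmployeeID INT NOT NULL
        REFERENCES dbo.Employee (EmployeeID),
    Photo      VARBINARY(MAX) NOT NULL,
    CONSTRAINT PK_EmployeePhoto PRIMARY KEY (EmployeeID)
);
```

Queries against dbo.Employee then never touch the BLOB pages unless they explicitly join to dbo.EmployeePhoto.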
For filegroups, check out Files and Filegroup Architecture for an intro. Basically, you would either create your database with a separate filegroup for large data structures right from the beginning, or add an additional filegroup later. Let's call it LARGE_DATA.
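Adding such a filegroup to an existing database could look roughly like this (database name and file path are illustrative):

```sql
-- Sketch: create the LARGE_DATA filegroup and back it with one data file
ALTER DATABASE YourDatabase ADD FILEGROUP LARGE_DATA;

ALTER DATABASE YourDatabase
ADD FILE
(
    NAME = 'LargeData1',
    FILENAME = 'D:\SQLData\YourDatabase_LargeData1.ndf'
)
TO FILEGROUP LARGE_DATA;
```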
Now, whenever you have a new table to create which needs to store VARCHAR(MAX) or VARBINARY(MAX) columns, you can specify this file group for the large data:
CREATE TABLE dbo.YourTable
(....... define the fields here ......)
ON Data -- the basic "Data" filegroup for the regular data
TEXTIMAGE_ON LARGE_DATA -- the filegroup for large chunks of data
Check out the MSDN intro on filegroups, and play around with it!

1 - depends on how you define "better". In general, I prefer to store binary assets in the database so they are backed up alongside the associated data, but cache them on the file system. Streaming the binary data out of SQL Server for a page request is a real performance hog, and it doesn't really scale.
2 - if an attacker can get to your hard drive, your entire system is compromised - storing things in the database offers no significant additional security.
3 - that's a whole question in its own right. Too broad for Stack Overflow...

Related

How does SQLITE DB saves data of multiple tables in a single file?

I am working on a project to create a simplified version of the SQLite database. I got stuck when trying to figure out how it manages to store data of multiple tables, with different schemas, in a single file. I suppose it must be using some indexes to map the data of the different tables. Can someone shed more light on how it's actually done? Thanks.
Edit: I suppose there is already an explanation in the docs, but I am looking for an easier way to understand it better and faster.
The schema is the list of all entities (tables, views etc.) in the database as a whole, rather than the database consisting of many schemas on a per-entity basis.
The data itself is stored in pages, each page being owned by an entity. It is these blocks that are saved.
The default page size is 4 KB, so the file size will always be a multiple of 4 KB. You could also, by experimentation, create a database with some tables, note its size, then add some data, and, if the added data does not require another page, see that the size of the file stays the same. This demonstrates that it's all about pages rather than a linear/contiguous stream of data.
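You can inspect these page-level numbers directly from the sqlite3 shell; the product of the two pragmas below is the file size on disk:

```sql
-- Page geometry of the open database
PRAGMA page_size;    -- bytes per page (4096 by default in current SQLite)
PRAGMA page_count;   -- number of pages allocated
-- file size on disk = page_size * page_count
```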
The schema itself is saved in a table called sqlite_master. This table has the columns :-
type (the type, e.g. table),
name (the name given to the entity),
tbl_name (the table to which the entity applies),
rootpage (the number of the entity's first page),
sql (the SQL used to generate the entity, if any)
Note that another schema table, sqlite_temp_master, may also exist if there are temporary tables.
For example :-
SELECT * FROM sqlite_master; will return one row per entity.
See "2.6. Storage Of The SQL Database Schema" in the SQLite file format documentation.
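A query listing the columns described above, runnable against any SQLite database:

```sql
-- One row per entity (table, index, view, trigger) in the main schema
SELECT type, name, tbl_name, rootpage, sql
FROM sqlite_master;
```

Each rootpage value is the page number where that entity's B-tree begins, which is exactly how a single file holds many tables.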

SQL Server 2014 - Insert xml data into varbinary field

In my stored procedure, I am creating an XML file which has the potential to be very large, > 1 GB in size. The data needs to be inserted into a varbinary column, and I was wondering what the most efficient method of doing this is in SQL Server 2014.
I was storing it in an xml column, but I have been asked to move it to this new column as a result of a decision outside of my control.
If you have the slightest chance to speak with these people, you should do so!
You must be aware that XML is not stored as the string representation you see, but as a hierarchically organized tree. Reading or manipulating this data is astonishingly fast! If you store the XML as a BLOB, you will keep it in its string format (hopefully this is Unicode/UCS-2!). Reading this data will need a cast to NVARCHAR(MAX) and then to XML, which means a full parse of the whole document to rebuild the hierarchy tree. Only then can you use XML data type methods like .value() or .nodes(). You will need this very expensive process over and over and over and ...
Especially in the case of huge XMLs (or - even worse - many of them) this is a really bad decision!! Why would one do this? It will take roughly the same amount of storage space either way.
The only thing you will get is bad performance! And you will be the one who has to repair this later...
VARBINARY is the appropriate type for data where you do not care what's inside (e.g. pictures). If these XMLs are just plain archive data and you never need to read or manipulate them, it can be a choice. But there is no advantage at all!
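The double cast described above might look like this - table and column names are hypothetical, and the inner cast assumes the blob holds UTF-16/UCS-2 text:

```sql
-- Reading XML archived in a VARBINARY(MAX) column: the whole document is
-- re-parsed on every query just to extract one value.
SELECT CAST(CAST(XmlBlob AS NVARCHAR(MAX)) AS XML)
           .value('(/Order/Customer/@Id)[1]', 'INT') AS CustomerId
FROM dbo.Archive;

-- With a native XML column the parsed tree is already stored:
-- SELECT XmlCol.value('(/Order/Customer/@Id)[1]', 'INT') FROM dbo.Orders;
```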
I would look into using a File Table: https://learn.microsoft.com/en-us/sql/relational-databases/blob/filetables-sql-server
And Check out this for inserting blobs: How to insert a blob into a database using sql server management studio

tables with multiple varbinary columns

If i have a table with varbinary(Max) datatype and have FILESTREAM attributes on the column. Now I need to have to store another binary data but without FILESTREAM attribute. So, if I add another column with VARBINARY(MAX) datatypes on the same table would there be any performance issue? Do I gain faster performance if I separate a table with FILESTREAM attributes and Create another separate table to store other VARBINARY(MAX) data?
Yes, you can.
FILESTREAM is a feature introduced in SQL Server 2008; SQL Server 2012 builds on it with a related feature called FileTable.
I have tested it: with this feature the database manages the files, and in my tests uploads ran at about 5 MB/s.
For your other column: if you do not enable FILESTREAM, the file is converted to binary and stored inside the SQL Server data file.
With FILESTREAM enabled, the file is stored in the server's file system and managed by SQL Server.
For your second question, I am not 100% sure, but using FILESTREAM should be more efficient; you do need to pay attention to backup and storage.
A year ago I implemented this in our system, and I have the schema; if you want, I will send it to you.
Your performance might be affected if you add another VARBINARY(MAX) column to the same table.
When the FILESTREAM attribute is set, SQL Server stores the BLOB data in the NTFS file system and keeps a pointer to the file in the table. This allows SQL Server to take advantage of the NTFS I/O streaming capabilities and reduces overhead on the SQL engine.
The MAX types (VARCHAR, NVARCHAR and VARBINARY) - and in your case the VARBINARY(MAX) datatype - cannot be stored internally as a contiguous memory area, since they can grow up to 2 GB, so they have to be represented by a streaming interface.
They will affect performance very much.
If you are sure your files are small, you can go for VARBINARY(MAX); for larger files - and anything approaching the 2 GB VARBINARY(MAX) limit - FILESTREAM is the better option for you.
And yes, I would suggest you create a separate table to store the other VARBINARY(MAX) data.
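A sketch of mixing the two storage modes in one table, assuming the database already has a FILESTREAM filegroup configured (FILESTREAM requires a ROWGUIDCOL column with a unique constraint):

```sql
-- Hypothetical table: one FILESTREAM column, one ordinary VARBINARY(MAX) column
CREATE TABLE dbo.Documents
(
    DocumentID  UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE
        DEFAULT NEWID(),
    StreamedDoc VARBINARY(MAX) FILESTREAM NULL,  -- stored as a file in NTFS
    InlineDoc   VARBINARY(MAX) NULL              -- stored inside the data file
);
```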

Should image binaries be stored as BLOBS in a SQL Server?

If an application requires images (ie. JPGs, PNGs etc) to be referenced in a database-driven application, should these images just be stored in a file system with their path referenced in a database, or should the images actually be stored in the database as BLOBS?
There's a really good paper by Microsoft Research called To Blob or Not To Blob.
Their conclusion after a large number of performance tests and analysis is this:
if your pictures or documents are typically below 256 KB in size, storing them in a database VARBINARY column is more efficient
if your pictures or documents are typically over 1 MB in size, storing them in the filesystem is more efficient (and with SQL Server 2008's FILESTREAM attribute, they're still under transactional control and part of the database)
in between those two, it's a bit of a toss-up depending on your use
If you decide to put your pictures into a SQL Server table, I would strongly recommend using a separate table for storing those pictures - do not store the employee photo in the employee table - keep them in a separate table. That way, the Employee table can stay lean, mean, and very efficient, assuming you don't always need to select the employee photo as part of your queries.
For filegroups, check out Files and Filegroup Architecture for an intro. Basically, you would either create your database with a separate filegroup for large data structures right from the beginning, or add an additional filegroup later. Let's call it "LARGE_DATA".
Now, whenever you have a new table to create which needs to store VARCHAR(MAX) or VARBINARY(MAX) columns, you can specify this file group for the large data:
CREATE TABLE dbo.YourTable
(....... define the fields here ......)
ON Data -- the basic "Data" filegroup for the regular data
TEXTIMAGE_ON LARGE_DATA -- the filegroup for large chunks of data
Check out the MSDN intro on filegroups, and play around with it!

How to insert existing documents stored on NTFS into SQL Server filestream's storage

I am doing an investigation on FILESTREAM (asking on Stack Overflow while reading whitepapers and Google searching). In my current scenario, documents are managed in this way:
1) I have a DB table where I keep the document id and the doc path (like \\fileserver\DocumentRepository\file000000001.pdf)
2) I have a document folder (\\fileserver\DocumentRepository) where I store the documents
Of course I need to change this to a varbinary(max)/filestream storage.
What is the best way to perform this task?
Is it possible to say that \\fileserver\DocumentRepository\file000000001.pdf is assigned to a varbinary(max) field, or do I have to explicitly insert it? That is, can I somehow tell the varbinary(max) field: "now you are a pointer to the existing document"?
You can't assign an existing file to a varbinary(max)/filestream value. You have to explicitly insert it.
That being said, if for some reason this is not an option for you (e.g. you can't afford copying huge amounts of data or would hit a disk space problem while copying) there are some hacks to do the migration with 0-copy. The trick would be to do the following steps:
Switch the DB to simple recovery model.
Insert placeholder filestream files for all the files you're about to migrate. When inserting, use a varbinary value of 0x. While inserting, collect the (document id/file path) => (filestream file name) pairs.
Stop Sql Server.
Overwrite the empty filestream files with the real documents (use move/hard links to avoid copying data).
Start Sql Server, perform some sanity checks (DBCC) and start a new backup chain.
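Step 2 might look roughly like this - the table and column names are illustrative, and PathName() is the method that exposes the physical location of each FILESTREAM file:

```sql
-- Insert a zero-length placeholder per document to be migrated
INSERT INTO dbo.Documents (DocumentID, Content)
VALUES (NEWID(), 0x);

-- Collect the (filestream file name) side of the mapping
SELECT DocumentID, Content.PathName() AS FilestreamPath
FROM dbo.Documents;
-- Pair each FilestreamPath with the original document path, then, with
-- SQL Server stopped, overwrite the placeholder files on disk.
```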
Obviously, this hack is not recommended and prone to database corruption. :)