I have an object I am using to store document metadata in a table. The body text of the document can be very large, sometimes > 2 GB, so I will be storing it in an nvarchar(max) field in SQL 2008. I'll use SQL 2008 later to index that field. I won't be using filestreams because they are very restrictive on the database and prevent certain types of concurrency locking schemes.
This object is exposed to the developer via LINQ to SQL. My concern is that the field will be too large; I have seen .NET bomb out with an OutOfMemoryException when the text is > 1.5 GB.
So I am wondering: can I treat this blob as a stream with LINQ? Or do I have to bypass LINQ altogether if I want to use a blob?
Given the answer to "Can a LINQ query retrieve BLOBs [...]" I suspect you're out of luck. The System.Data.Linq.Binary type doesn't have any mechanism for streaming - basically it's just an immutable byte array representation.
There may be some deep LINQ mojo you could invoke, but I suspect it would really have to be pretty deep.
It's possible that the Entity Framework would handle it - I haven't investigated that.
I ended up writing my own method around LINQ to SQL utilising the .WRITE clause available for varchar(max) columns in SQL Server. This allows developers to chunk inserts into the DB for large data types.
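For illustration, a minimal sketch of that chunked-write pattern, assuming a hypothetical Documents table with a uniqueidentifier Id and an nvarchar(max) BodyText column (the names and chunk size are mine, not the original author's code):

    Imports System.Data.SqlClient
    Imports System.IO

    Module BlobWriter
        ' Appends the body text in chunks using UPDATE ... SET col.WRITE(...),
        ' so the full document never has to sit in managed memory at once.
        Sub WriteBodyInChunks(connStr As String, id As Guid, body As TextReader)
            Const ChunkChars As Integer = 8000
            Using conn As New SqlConnection(connStr)
                conn.Open()
                ' .WRITE cannot target a NULL value, so start from an empty string.
                Using init As New SqlCommand(
                        "UPDATE Documents SET BodyText = N'' WHERE Id = @id", conn)
                    init.Parameters.AddWithValue("@id", id)
                    init.ExecuteNonQuery()
                End Using
                Dim buffer(ChunkChars - 1) As Char
                Dim read As Integer = body.Read(buffer, 0, ChunkChars)
                While read > 0
                    ' A NULL offset makes .WRITE append to the existing value.
                    Using cmd As New SqlCommand(
                            "UPDATE Documents SET BodyText.WRITE(@chunk, NULL, NULL) " &
                            "WHERE Id = @id", conn)
                        cmd.Parameters.AddWithValue("@chunk", New String(buffer, 0, read))
                        cmd.Parameters.AddWithValue("@id", id)
                        cmd.ExecuteNonQuery()
                    End Using
                    read = body.Read(buffer, 0, ChunkChars)
                End While
            End Using
        End Sub
    End Module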
I want to store images in a SQL database. The size of each image is between 50 KB and 1 MB. I was reading about FILESTREAM and FILETABLE, but I don't know which to choose. Each row will have 2 images and some other fields.
The images will never be updated or deleted, and about 3000 rows will be inserted per day.
Which is recommended in this situation?
Originally it was always a bad idea to store files (= binary data) in a database. The usual workaround is to store the file path in the database and ensure that a file actually exists at that path. It was possible to store files in the database, though, with the varbinary(MAX) data type.
FILESTREAM was introduced in SQL Server 2008 and handles the varbinary column by storing the data not in the database files (only a pointer) but in a separate file on the filesystem, dramatically improving performance.
FILETABLE was introduced with SQL Server 2012 and is an enhancement over FILESTREAM, because it provides metadata directly to SQL and it allows access to the files from outside SQL (you can browse to them).
Advice: definitely leverage FILESTREAM, and it might not be a bad idea to use FILETABLE as well.
More reading (short): http://www.databasejournal.com/features/mssql/filestream-and-filetable-in-sql-server-2012.html
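If you do go the FILESTREAM route, the usual way to stream the data from .NET without materialising the whole blob is the SqlFileStream class. A rough sketch; the table and column names here are assumptions:

    Imports System.Data.SqlClient
    Imports System.Data.SqlTypes
    Imports System.IO

    Module FilestreamReader
        Sub CopyImageToFile(connStr As String, rowId As Guid, destPath As String)
            Using conn As New SqlConnection(connStr)
                conn.Open()
                ' FILESTREAM access must happen inside a transaction.
                Using tx = conn.BeginTransaction()
                    Dim path As String
                    Dim ctx As Byte()
                    Using cmd As New SqlCommand(
                            "SELECT ImageData.PathName(), GET_FILESTREAM_TRANSACTION_CONTEXT() " &
                            "FROM dbo.Images WHERE Id = @id", conn, tx)
                        cmd.Parameters.AddWithValue("@id", rowId)
                        Using rdr = cmd.ExecuteReader()
                            rdr.Read()
                            path = rdr.GetString(0)
                            ctx = CType(rdr(1), Byte())
                        End Using
                    End Using
                    ' Stream the value in chunks; the whole blob is never in memory.
                    Using source As New SqlFileStream(path, ctx, FileAccess.Read),
                          dest As FileStream = File.Create(destPath)
                        source.CopyTo(dest)
                    End Using
                    tx.Commit()
                End Using
            End Using
        End Sub
    End Module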
In SQL Server, BLOBs can be standard varbinary(max) data that stores the data in tables, or FILESTREAM varbinary(max) objects that store the data in the file system. The size and use of the data determine whether you should use database storage or file system storage.
If the following conditions are true, you should consider using FILESTREAM:
Objects that are being stored are, on average, larger than 1 MB.
Fast read access is important.
You are developing applications that use a middle tier for application logic.
For smaller objects, storing varbinary(max) BLOBs in the database often provides better streaming performance.
Benefits of the FILETABLE:
Windows API compatibility for file data stored within a SQL Server database. Windows API compatibility includes the following:
Non-transactional streaming access and in-place updates to FILESTREAM data.
A hierarchical namespace of directories and files.
Storage of file attributes, such as created date and modified date.
Support for Windows file and directory management APIs.
Compatibility with other SQL Server features including management tools, services, and relational query capabilities over FILESTREAM and file attribute data.
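To make the Windows API compatibility point above concrete: once a FileTable exists, ordinary file APIs work against the share SQL Server exposes. A hedged sketch, assuming a FileTable named dbo.Documents (FileTableRootPath requires SQL Server 2012):

    Imports System.Data.SqlClient
    Imports System.IO

    Module FileTableBrowse
        Sub ListFiles(connStr As String)
            Using conn As New SqlConnection(connStr)
                conn.Open()
                ' FileTableRootPath returns the UNC path of the share backing the FileTable.
                Using cmd As New SqlCommand("SELECT FileTableRootPath('dbo.Documents')", conn)
                    Dim root = CStr(cmd.ExecuteScalar())
                    ' Plain Windows file APIs over SQL-managed data.
                    For Each f In Directory.GetFiles(root)
                        Console.WriteLine(f)
                    Next
                End Using
            End Using
        End Sub
    End Module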
It depends. Personally I would prefer a link to the image inside the table. It is simpler, and the files in the directory can be backed up separately.
You have to take into account several things:
How you will process the images. Having only a link allows you to easily incorporate images into web pages (with proper configuration of the web server).
How many images there are. If they are stored in the DB and there are a lot of them, this will increase the size of the DB and of the backups.
Whether the images change often. In that case it may be better to keep them inside the DB, so that a backup of the DB captures their actual state.
I was trying to understand the basic advantage of using the XML data type in SQL Server 2005. I went through the article here, which says that if you want to delete multiple records, you can serialize the keys as XML, send that to the database, and delete all the records with a single query (a sketch of the pattern follows below).
I was curious to look into any other advantages of using this data type...
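The article's exact query isn't reproduced above, but the general pattern might look something like this sketch. The table name (dbo.Orders) and the XML shape are assumptions of mine:

    Imports System.Collections.Generic
    Imports System.Data
    Imports System.Data.SqlClient
    Imports System.Linq

    Module XmlDelete
        Sub DeleteByIds(connStr As String, ids As IEnumerable(Of Integer))
            ' Serialize the keys into one XML parameter...
            Dim xml = "<ids>" &
                String.Concat(ids.Select(Function(i) "<id>" & i & "</id>")) & "</ids>"
            Using conn As New SqlConnection(connStr)
                conn.Open()
                ' ...and let nodes() shred it so all rows are deleted in one statement.
                Using cmd As New SqlCommand(
                        "DELETE FROM dbo.Orders WHERE Id IN " &
                        "(SELECT t.n.value('.', 'int') FROM @ids.nodes('/ids/id') AS t(n))", conn)
                    cmd.Parameters.Add("@ids", SqlDbType.Xml).Value = xml
                    cmd.ExecuteNonQuery()
                End Using
            End Using
        End Sub
    End Module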
EDIT
Reasons for Storing XML Data in SQL Server 2005
Here are some reasons for using native XML features in SQL Server 2005 as opposed to managing your XML data in the file system:
You want to use administrative functionality of the database server for managing your XML data (for example, backup, recovery and replication).
My Understanding - None yet; can you share some knowledge to make this clear?
You want to share, query, and modify your XML data in an efficient and transacted way. Fine-grained data access is important to your application. For example, you may want to insert a new section without replacing your whole document.
My Understanding - The XML lives in one specific row's column; in order to add a new section to that cell, an UPDATE is required, so the whole document gets rewritten. Right? (See the sketch after this list.)
You want the server to guarantee well-formed data, and optionally validate your data according to XML schemas.
My Understanding - None yet; can you share some knowledge to make this clear?
You want indexing of XML data for efficient query processing and good scalability, and the use of a first-rate query optimizer.
My Understanding - The same could be done by adding individual columns, so why an XML column?
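On the fine-grained update question above (the "whole document will be updated" worry): XML DML lets the server splice a node in place, so the client never has to round-trip the full document. A hedged sketch, with assumed table and column names:

    Imports System.Data.SqlClient

    Module XmlDmlExample
        Sub AppendSection(connStr As String, docId As Integer)
            Using conn As New SqlConnection(connStr)
                conn.Open()
                ' The insert happens server-side; only the new fragment crosses the wire.
                Using cmd As New SqlCommand(
                        "UPDATE dbo.Docs SET Body.modify(" &
                        "'insert <section>new</section> as last into (/doc)[1]') " &
                        "WHERE Id = @id", conn)
                    cmd.Parameters.AddWithValue("@id", docId)
                    cmd.ExecuteNonQuery()
                End Using
            End Using
        End Sub
    End Module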
Pros:
Allows storage of XML data that can be automatically validated against an XML schema, thereby guaranteeing a certain level of data quality.
Many web/desktop apps store data in XML form; these documents can then be easily stored and queried in the database, so it is a great place to keep XML an app needs to use (e.g. for configuration settings).
Cons:
Be careful about using XML fields: they may start off as innocent storage but can become a performance nightmare if you want to search, analyse and report on many records.
Also, if XML fields will be added to, changed or deleted, this can be slow and leads to complex T-SQL.
In replication, the whole XML value gets updated even if only one node changes, so you can end up with many more conflicts that cannot easily be resolved.
I would say of the 4 advantages you've listed, these two are critical:
You want to share, query, and modify your XML data in an efficient and transacted way
SQL Server stores the XML in an optimised form that it wouldn't use for plain strings, and lets you query the XML efficiently, rather than requiring you to bring the entire XML document back to the client. Think how inefficient it is if you want to query 10,000 XML columns, each containing 1 KB of data: for a small XPath query you would need to return 10 MB of data across the wire, for each client, each time.
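To illustrate letting the server do the work, a sketch along these lines returns only the extracted values rather than the full documents (the table, column and document shape are assumptions):

    Imports System.Data.SqlClient

    Module XmlValueQuery
        Sub PrintTitles(connStr As String)
            Using conn As New SqlConnection(connStr)
                conn.Open()
                ' value() evaluates the XPath server-side; only the title crosses the wire.
                Using cmd As New SqlCommand(
                        "SELECT Body.value('(/doc/title)[1]', 'nvarchar(200)') FROM dbo.Docs", conn)
                    Using rdr = cmd.ExecuteReader()
                        While rdr.Read()
                            Console.WriteLine(If(rdr.IsDBNull(0), "(no title)", rdr.GetString(0)))
                        End While
                    End Using
                End Using
            End Using
        End Sub
    End Module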
You want indexing of XML data for efficient query processing and good scalability, and the use of a first-rate query optimizer
This ties into what I said above: it's stored far more efficiently than a plain text column, which would also run into page-fragmentation issues.
Is it possible to use Sql Server XML columns as a substitute for a real Document DB (such as Couch or Mongo) ?
If I were to create a table with a guid PK Id and an XML column for the document.
What would be the main problems compared to using a document DB?
Sql Server supports indexing over XML columns so querying should not be completely horrible?
You've got several questions in here:
Is it possible to use Sql Server XML columns as a substitute for a real Document DB (such as Couch or Mongo) ? Yes, you can use it as a substitute, but no, you probably wouldn't be satisfied with performance if you're exclusively storing XML and not leveraging any of SQL Server's relational tools.
If I were to create a table with a guid PK Id and an XML column for the document. What would be the main problems compared to using a document DB? In a nutshell, scaling out. SQL Server doesn't scale this kind of thing out well. You can do it with replication, but it's painful to manage relative to a "real" Document DB.
Sql Server supports indexing over XML columns so querying should not be completely horrible? The problem is that SQL Server's XML indexes can take several times the storage space of the original data. These indexes can't be maintained online (as in defrags), so you end up with locking issues during maintenance windows.
I'm doing some experimenting with this on:
http://rogeralsing.com/2011/03/02/linq-to-sqlxml-projections/
Query speed is "decent", but it's nothing I'd use for scaling.
But the joy of schema-free storage running on standard infrastructure is quite nice.
Yes, you can. Storing a document inside a SQL Server XML column will work, and if you use standard XML serialization it will leave you with a decent ACID-compliant key/value store. It also allows you to run queries over it with relative ease, and you can join the results to data that you store in a more relational way. We do so; it works. If you store content in XML fields, storage demands are a lot lower than using NTEXT, and querying is more flexible and faster.
What SQL Server will not get you (compared to Mongo) is the seamless failover of replica sets and the auto-sharding of Mongo. Also, atomic operations such as incrementing a specific property deep inside a document are hard (though not impossible with the XQuery update function). Updates tend to be faster on most NoSQL databases, because they are more relaxed about the "data is only safe on disk" principle.
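For what it's worth, a hedged sketch of such an in-place increment with XML DML; the names are assumptions, and the arithmetic relies on the untyped node value casting cleanly to a number:

    Imports System.Data.SqlClient

    Module XmlIncrement
        Sub IncrementCounter(connStr As String, docId As Integer)
            Using conn As New SqlConnection(connStr)
                conn.Open()
                ' The increment runs inside the engine, so nothing is read back
                ' to the client; it is atomic per statement.
                Using cmd As New SqlCommand(
                        "UPDATE dbo.Docs SET Body.modify(" &
                        "'replace value of (/doc/counter/text())[1] " &
                        "with ((/doc/counter)[1] + 1)') WHERE Id = @id", conn)
                    cmd.Parameters.AddWithValue("@id", docId)
                    cmd.ExecuteNonQuery()
                End Using
            End Using
        End Sub
    End Module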
Yes, it is possible. As to whether it's a good idea, this is just my 2 cents...
Before the XML datatype came along I worked on a system storing XML in an NTEXT column - that wasn't pleasant, and to get any real use out of the data meant shredding some of that data out into relational form.
OK, the XML datatype now makes it easier to query an XML blob and to extract certain values or index them. But personally, in general, I wouldn't. I'm not saying never use XML, as there are scenarios for it; rather, if that's all you're planning on doing, I'd be asking "is this the right tool for the job?". Using an RDBMS as a document database makes me feel a bit uneasy, whereas something like MongoDB has been built from the ground up as a document database.
In all honesty, I haven't done any performance testing on storing data as XML so I can't give you an indication of what performance would be like. Would be interested to know how this performs at scale.
I have 4k records in an Access database, and one field's value contains ~100 lines per record,
while another field has ~25 lines. The total database size thus reaches ~30 MB, and it takes a lot of time, 15-20 seconds, to load the database in VB.NET using ODBC as shown at http://www.homeandlearn.co.uk/net/nets12p5.html
Updating any of the other small fields also takes time because the database is so large.
So as an alternative I used RTF files (txt files were not preserving all the newline characters). These files are only around 5-10 KB each, but for 4k records and 2 fields I now have 8k files, and copying those 8k RTF files takes a huge amount of time: a 5 MB transfer takes an hour or so.
Is there any other alternative for storing this data, so that it will be portable and easily loaded/accessed/updated from VB.NET?
MDB Databases
MDB is the Access database file type. Access databases were never designed to be used as backends for web systems; they are mainly for light office use.
Improving performance
For a temporary improvement in performance, you can compact and repair the database: open it up and find the link in the Tools menu. Alternatively, you can do this programmatically (see the sketch below). This should be done reasonably frequently, depending on the number of changes made to your database. What does compacting and repairing do?
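As a sketch of the programmatic route, this uses the JRO COM library via late binding (so Option Strict must be Off); the paths and the JET provider string are assumptions for a .mdb file:

    Module CompactMdb
        Sub Compact(srcPath As String, tmpPath As String)
            ' Late-bound JRO.JetEngine; add a COM reference instead if you prefer early binding.
            Dim jro As Object = CreateObject("JRO.JetEngine")
            jro.CompactDatabase(
                "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & srcPath,
                "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & tmpPath)
            ' JET writes the compacted copy to tmpPath; swap it in for the original.
            My.Computer.FileSystem.DeleteFile(srcPath)
            My.Computer.FileSystem.MoveFile(tmpPath, srcPath)
        End Sub
    End Module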
Also, slowness is often a sign of inefficient design. Consider reading up on database normalisation if your database is not fully normalised. This should significantly improve performance, and it is an essential technique to learn.
Alternatives
For 4k+ records you should probably be using a decent database system designed specifically for larger amounts of data.
SQL-Server is an excellent database system from Microsoft. MySQL is also a great open source alternative. The Internet is full of tutorials on how to connect to these databases.
I sometimes use Access databases in .NET too. OK, MS Access isn't the best database for this kind of application, I know. But the ease of building complex queries and the functional, well-known reports make Access a good cost-benefit solution.
I saw the link you indicated. That approach was my first technique too, but then I realized there was an easier and faster one. I suggest you wire up the Access database in a different way:
Create a DataSet, if you haven't already.
Create a connection to the MS Access database using the Database Explorer.
Drag and drop the desired tables onto the created DataSet (.NET will generate the designer code for you behind the scenes).
In code, create a TableAdapter object and a table object. Suppose your DataSet is named DS1 and a table is named Table01 (language: VB.NET; check IntelliSense autocomplete for your data objects):

    ' Create a TableAdapter object and a table object (these types were
    ' generated when you dropped the Database Explorer objects onto the DataSet).
    Dim table01_TA As New DS1TableAdapters.Table01TableAdapter
    Dim table01 As New DS1.Table01DataTable

    ' Load the database data into the in-memory table.
    table01 = table01_TA.GetData()

    ' Do your operations using table01 (add, update, insert, delete, queries).
    ' For automatic generation of the update, insert and delete scripts,
    ' make sure your table has primary keys and correct relationships.

    ' Finally, update through the table adapter. Unless you do this,
    ' the data will not be written back to the database.
    table01_TA.Update(table01)
I suggest you use LINQ to query your data, and the DataTable methods for adding and editing data; these methods are created automatically when you drop the Database Explorer tables onto the DataSet and save it. It's also worth compacting and repairing the Access database frequently.
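For example, a LINQ query over the typed table from the snippet above might look like this fragment (it assumes Imports System.Linq, and the column names Id and Notes are hypothetical):

    ' Rows of a typed DataTable are directly enumerable, so LINQ query syntax works.
    Dim longNotes = From row In table01
                    Where row.Notes.Length > 1000
                    Select row.Id

    For Each id In longNotes
        Console.WriteLine(id)
    Next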
Contact me if you have trouble.
I agree with Tom's recommendation. Get yourself a decent database server. However, judging by your description of your performance issues it seems like you have other serious problems which are probably going to be difficult to resolve here.
What database field type should I use to store web pages (HTML, PDF, text) in the same field? nvarchar(max)?
Use VARBINARY(MAX). NVARCHAR is only for Unicode text content and won't handle binary PDFs well at all.
You'd want to use VARBINARY(MAX) or NVARCHAR(MAX), depending on the type of data being stored and what you want to do with it. If you're going to store files (which it sounds like, especially with mixed extensions), use VARBINARY(MAX). You can full-text index that data type too, although PDFs require an additional iFilter (at least they did with our SQL Server 2005 instance; it may be there by default in 2008).
Keep in mind that you don't want to use IMAGE, as that data type (along with TEXT and NTEXT) is deprecated and is being removed in a future version of SQL Server. Here's the link about that.
Hope this helps.
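Building on the VARBINARY(MAX) advice above, a minimal sketch of storing a file, keeping the extension alongside so full-text search can choose the right iFilter (the table and column names are assumptions):

    Imports System.Data
    Imports System.Data.SqlClient
    Imports System.IO

    Module FileStore
        Sub SaveFile(connStr As String, filePath As String)
            ' Fine for web-page-sized files; for very large blobs, stream in chunks instead.
            Dim bytes = File.ReadAllBytes(filePath)
            Using conn As New SqlConnection(connStr)
                conn.Open()
                Using cmd As New SqlCommand(
                        "INSERT INTO dbo.Pages (Ext, Content) VALUES (@ext, @content)", conn)
                    cmd.Parameters.AddWithValue("@ext", Path.GetExtension(filePath))
                    ' Size -1 maps the parameter to varbinary(max).
                    cmd.Parameters.Add("@content", SqlDbType.VarBinary, -1).Value = bytes
                    cmd.ExecuteNonQuery()
                End Using
            End Using
        End Sub
    End Module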
VARBINARY(MAX). If you're using SQL Server 2008, then FILESTREAM is also an option that should be considered. According to Microsoft's guidelines, consider FILESTREAM when:
Objects that are being stored are, on average, larger than 1 MB.
Fast read access is important.
You are developing applications that use a middle tier for application logic.