In my stored procedure I am creating an XML document which has the potential to be very large, > 1 GB in size. The data needs to be inserted into a varbinary column, and I was wondering what the most efficient method of doing this is in SQL Server 2014.
I was storing it in an xml column, but have been asked to move it to this new column as a result of a decision outside of my control.
If you have the slightest chance to speak with the people behind this decision, you should do so!
You must be aware that XML is not stored as the string representation you see, but as a hierarchically organized tree. Reading or manipulating this data is astonishingly fast! If you store the XML as a BLOB, you keep it in its string format (hopefully this is Unicode/UCS-2!). Reading this data will need a cast to NVARCHAR(MAX) and then to XML, which means a full parse of the whole document to rebuild the hierarchy tree. Only then can you use XML data type methods like .value() or .nodes(). You will need this very expensive process over and over and over and ...
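To make the cost concrete, here is a minimal sketch of what every read would look like (table and column names are hypothetical; it assumes the blob holds UCS-2 text):

    -- every single read has to re-parse the entire document
    SELECT CAST(CAST(a.XmlBlob AS NVARCHAR(MAX)) AS XML).value('(/order/@id)[1]', 'INT') AS OrderId
    FROM dbo.Archive AS a;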
Especially with huge XML documents (or, even worse, many of them) this is a really bad decision! Why would one do this? It will take roughly the same amount of storage space.
The only thing you gain is bad performance! And you will be the one who has to repair this later...
VARBINARY is the appropriate type for data where you do not care what's inside (e.g. pictures). If these XMLs are just plain archive data and you never need to read or manipulate them, it can be a choice. But there is no advantage at all!
I would look into using a File Table: https://learn.microsoft.com/en-us/sql/relational-databases/blob/filetables-sql-server
And check out this for inserting blobs: How to insert a blob into a database using SQL Server Management Studio
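If you do have to go ahead with the varbinary column, here is a minimal sketch of the insert itself (table and column names are hypothetical; the cast from xml to varbinary(max) is a supported explicit conversion and yields the UTF-16 bytes of the document):

    DECLARE @doc XML;
    SET @doc = N'<order id="1"><item sku="42" qty="3"/></order>';

    INSERT INTO dbo.Archive (XmlBlob)
    VALUES (CAST(@doc AS VARBINARY(MAX)));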
I am using data in SVMlight format (you may know it as libsvm).
This is the format: each line is a label followed by sparse feature:value pairs.
I tried to create a SQL database to store it. The issue is that the format is sparse: in a regular (dense) table it would be very wasteful to store the data, and if I store it in a sparse format (one string for every row) I can't query by column-wise content (e.g. I need to query for all the rows that contain a value for feature #).
I am looking for a straightforward way to convert it to a database that will make filtering and querying faster and simpler.
Can anyone point me to a suitable database solution and, if possible, an already-made utility that does the conversion?
Thanks!
If I have a table with a VARBINARY(MAX) column that has the FILESTREAM attribute, and now I need to store other binary data but without the FILESTREAM attribute: if I add another VARBINARY(MAX) column to the same table, would there be any performance issue? Do I gain faster performance if I keep the FILESTREAM table separate and create another table to store the other VARBINARY(MAX) data?
For this question: yes, you can.
FILESTREAM is a feature introduced in SQL Server 2008; SQL Server 2012 extended it under the name FileTable.
I tested it. This feature lets the database engine manage the files, and I uploaded files at about 5 MB/s.
For your other column: if you do not enable FILESTREAM, the file is converted to binary and stored inside the SQL Server data file. With FILESTREAM enabled, the file is stored on the server's file system and managed by SQL Server.
For your second question I am not 100% sure, but using FILESTREAM should be more efficient; just pay attention to backup and storage.
A year ago I implemented this in our system, and I still have the schema; if you want it, I will send it to you.
Your performance might be affected if you add another VARBINARY(MAX) column to the same table.
When the FILESTREAM attribute is set, SQL Server stores the BLOB data in the NTFS file system and keeps a pointer to the file in the table. This allows SQL Server to take advantage of the NTFS I/O streaming capabilities and reduces overhead on the SQL engine.
The MAX types (VARCHAR(MAX), NVARCHAR(MAX) and VARBINARY(MAX)), and in your case VARBINARY(MAX), cannot be stored internally as a contiguous memory area, since they can grow up to 2 GB. So they have to be represented by a streaming interface, and they can affect performance considerably.
If you are sure your files are small, you can go for VARBINARY(MAX); otherwise, if they can be larger than 2 GB, FILESTREAM is the best option for you (VARBINARY(MAX) is capped at 2 GB).
And yes, I would suggest you create a separate table to store the other VARBINARY(MAX) data.
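As an illustration, here is a minimal sketch of a table mixing both kinds of columns (names are hypothetical; it assumes FILESTREAM is enabled on the instance and the database already has a FILESTREAM filegroup):

    -- a FILESTREAM table needs a ROWGUIDCOL column with a UNIQUE constraint
    CREATE TABLE dbo.Documents
    (
        DocId     UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
        FileData  VARBINARY(MAX) FILESTREAM NULL,  -- kept in NTFS, streamed by SQL Server
        ThumbData VARBINARY(MAX) NULL              -- kept inside the normal data file
    );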
What is the best way to store long texts (articles) in a database? It doesn't need to be searchable.
I want to allow people to read the first chapter of every book in my bookstore. Dumping it into a database field makes it difficult to style paragraphs using CSS.
EDIT: It's an Access database.
If it is SQL Server 2005, use VARCHAR(MAX).
EDIT:
It seems he said Access, so I would go with Memo:
Up to 63,999 characters. (If the Memo field is manipulated through DAO and only text and numbers [not binary data] will be stored in it, then the size of the Memo field is limited by the size of the database.)
or OLE Object (if you can):
An object (such as a Microsoft Excel spreadsheet, a Microsoft Word document, graphics, sounds, or other binary data) linked (OLE/DDE link: a connection between an OLE object and its OLE server, or between a Dynamic Data Exchange (DDE) source document and a destination document) to or embedded (embed: to insert a copy of an OLE object from another application; the source of the object, called the OLE server, can be any application that supports object linking and embedding; changes to an embedded object are not reflected in the original object) in a Microsoft Access table.
Up to 1 gigabyte (limited by available disk space).
You have several options:
store it as a long single string with no formatting, which will look bland on the screen;
store it as a long single string with embedded HTML and CSS, which will be a bad choice if you ever want to give your site a different look and feel;
normalize it, so you have tables to store books, chapters, paragraphs, etc.; you could then format and style the text as you load it into the application (a sketch follows below).
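A minimal sketch of what the normalized approach could look like (all names are hypothetical, and the exact types vary by database; in Access the Body column would be a Memo field):

    CREATE TABLE Books      (BookId INTEGER PRIMARY KEY, Title VARCHAR(255));
    CREATE TABLE Chapters   (ChapterId INTEGER PRIMARY KEY, Seq INTEGER,
                             BookId INTEGER REFERENCES Books (BookId));
    CREATE TABLE Paragraphs (ParagraphId INTEGER PRIMARY KEY, Seq INTEGER,
                             ChapterId INTEGER REFERENCES Chapters (ChapterId),
                             Body VARCHAR(8000));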
The main difference between long text (CLOB / TEXT / VARCHAR(MAX)) and long binary data (BLOB / IMAGE / VARBINARY(MAX)) is that the former is subject to character set conversions while the latter is not.
If you need character set conversion on the database side, use CLOB and similar.
If you always want to retrieve your data exactly as you stored it, byte for byte (as opposed to character for character), use BLOB and similar.
I don't know which database you're using, but if the text doesn't need to be searchable, then you can simply store the HTML-formatted text (for instance, the value coming from an FCKeditor or a similar component). If you also need searchability, then you can store both HTML and plain text in two separate fields.
Fields can be nvarchar(MAX) if you use MS SQL Server 2008, or any equivalent data type on other databases.
EDIT:
Seems you're using Access, so go for the Memo data type!
If you decide to store HTML, consider storing only generic markup (div, p) to divide your text, then later apply CSS formatting by wrapping the stored text within another div that specifies formatting classes for the child elements.
I wouldn't store any of the documents in the database, but store the data in files in the file system, and the only thing that's in the database would be a pointer to the data files.
You don't give any details in your question that would suggest any need whatsoever to store the documents in the database itself.
And there are very few circumstances where it's advantageous.
Use a CLOB.
For SQL Server
TEXT / NTEXT for SQL Server 2000
VARCHAR(MAX) / NVARCHAR(MAX) for SQL Server 2005 onwards
I would propose storing the first chapter as a PDF file. This is secure and allows for good formatting. Then use a blob, clob, varchar, or text field depending on your product (see the other answers).
Or you could use images and look into something like Amazon's "Look Inside". It would work with the same DB techniques.
Alternatively you could use a lightweight markup format.
I personally do not like to put HTML in my database, even if it is only for output. Too easy for some JavaScript to slip in. But maybe I'm just too cautious.
The following applies to Jet 4.0 only, being the version of the Access Database Engine in the era from Access 2000 to Access 2003 inclusive.
If you are using ACE, being the version of the Access Database Engine in the Access 2007 era, the Attachment data type would be an option; however, I don't really know how it works, I've never used it, so I can't recommend it nor say whether it's better or worse for this purpose. I'm also wary of new data types in the first release of a major version of the Access Database Engine. I just remember all the issues with byte and decimal fields at the introduction of Jet 4, and I don't want to commit to something that may never work properly. The Attachment type in the ACCDB format was introduced for SharePoint compatibility, and that outside dependency is something that gives me pause. Will the ACCDB data type change someday if SharePoint changes the way it works? I'm not sure I'd want to take that risk.
Put it in a TEXT field, and keep the <p> tags with it, so you'll be able to style paragraphs.
As it doesn't need to be searchable, it won't impact your SQL performance.
Has anyone used SQL Server 2008 as an XML document database? What are your thoughts on doing so? Is the indexing and querying of the XML data type sufficient to support this type of role? Is the query performance of XML acceptable?
I don't know what exactly your requirements are, and how many documents and what sizes we're talking about here.
SQL Server 2005 does allow you to specify XML schemas so you can definitely get some validation into the equation which is certainly beneficial.
As for XML indexing: you can index with three different strategies once you've created a basic primary XML index (a sketch follows this list).
The first index type is more for optimizing the XPath to a single XML node, when you do lots of XPath-based queries for nodes (CREATE XML INDEX ..... FOR PATH).
The second index type is more for optimizing access to values inside your XML nodes, when you search more based on values in the XML document (CREATE XML INDEX ..... FOR VALUE).
The third is somewhat of a hybrid of the two above (which I never quite grokked myself, to be honest; CREATE XML INDEX ..... FOR PROPERTY).
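As an illustration, a minimal sketch of creating the indexes (table and column names are hypothetical; the table is assumed to have a clustered primary key, which a primary XML index requires):

    CREATE PRIMARY XML INDEX PXML_Entry_Doc ON dbo.Entry (Doc);

    CREATE XML INDEX IXML_Entry_Doc_Path ON dbo.Entry (Doc)
        USING XML INDEX PXML_Entry_Doc FOR PATH;

    CREATE XML INDEX IXML_Entry_Doc_Value ON dbo.Entry (Doc)
        USING XML INDEX PXML_Entry_Doc FOR VALUE;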
The XML indexes worked quite well in our samples, but the main drawback in our case was the sheer size of the indexes on disk. Our 1.3 GB database grew to over 11 GB just by adding a PRIMARY XML and an XML FOR PATH index to roughly 45'000 entries with an XML field. Due to disk constraints, we ended up having to take down those indexes :-(
This is really not all that surprising considering how the XML index will be built-up with entries for each XML node, attribute and so forth - it's just lots of data.
What we've done in the end is create a number of stored functions that reach into the XML of our Entry table and extract those bits and pieces that we need most often. Those are now stored on the Entry table as computed, persisted columns (see the sketch below). This is as fast as "proper" fields on the Entry table, it's always up to date and gets set automatically when new data is inserted, and we hardly ever need to run any significant XQuery requests anymore.
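The pattern looks roughly like this (a sketch only; the function, column and element names are hypothetical). The wrapper function has to be schema-bound, because xml methods cannot be used directly in a computed column definition:

    CREATE FUNCTION dbo.fnEntryTitle (@doc XML)
    RETURNS NVARCHAR(200)
    WITH SCHEMABINDING
    AS
    BEGIN
        -- pull the single value we query most often out of the XML
        RETURN @doc.value('(/entry/title)[1]', 'NVARCHAR(200)');
    END;
    GO

    -- promote the value to a persisted computed column on the Entry table
    ALTER TABLE dbo.Entry ADD Title AS dbo.fnEntryTitle(Doc) PERSISTED;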
What I can say from personal experience is that the XML support in SQL Server 2005 is really quite profound and well thought out, in my opinion. So all in all, I would say - go give it a try! You won't really be able to tell whether it works and scales nicely enough in your specific case until you've given it a try.
Marc
I have not tried it, but XML seems a little too verbose to me. I figure I can generate XML from my data later, so why worry about storing it as XML?
Is there a direct route that is pretty straight forward? (i.e. can SQL Server read XML)
Or, is it best to parse the XML and just transfer it in the usual way via ADO.Net either as individual rows or perhaps a batch update?
I realize there may be solutions that involve large, complex stored procs; while I'm not entirely opposed to this, I tend to prefer to keep most of my business logic in the C# code. I have seen a solution using SQLXMLBulkLoad, but it seemed to require fairly complex SQL code.
For reference, I'll be working with about 100 rows at a time with about 50 small pieces of data for each (strings and ints). This will eventually become a daily batch job.
Any code snippets you can provide would be very much appreciated.
SQL Server 2005 and up have a data type called xml, in which you can store XML, either untyped or typed with an XSD schema.
You can basically fill columns of type XML from an XML literal string, so you can easily just use a normal INSERT statement and fill the XML contents into that field.
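A minimal sketch of this (table and element names are hypothetical):

    CREATE TABLE dbo.Import (Id INT IDENTITY PRIMARY KEY, Payload XML);

    INSERT INTO dbo.Import (Payload)
    VALUES (N'<rows><row id="1" name="Alice"/><row id="2" name="Bob"/></rows>');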
Marc
You can use the OPENXML function and the sp_xml_preparedocument stored procedure to easily convert your XML into rowsets.
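A minimal sketch of that approach (element and attribute names are hypothetical):

    DECLARE @h INT;
    DECLARE @xml NVARCHAR(MAX);
    SET @xml = N'<rows><row id="1" name="Alice"/><row id="2" name="Bob"/></rows>';

    EXEC sp_xml_preparedocument @h OUTPUT, @xml;

    SELECT id, name
    FROM OPENXML(@h, '/rows/row', 1)  -- flag 1 = attribute-centric mapping
         WITH (id INT '@id', name NVARCHAR(50) '@name');

    EXEC sp_xml_removedocument @h;    -- release the parsed document's memory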
If you are using SQL Server 2008 (or 2005), it has a native xml data type. You can associate an XSD schema with xml variables, and insert directly into columns of type xml.
Yes, SQL Server 2005 and above can parse XML out of the box.
You use the nodes, value and query methods to break it down however you want, whether into values or attributes.
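For example, a minimal sketch (element and attribute names are hypothetical):

    DECLARE @x XML;
    SET @x = N'<rows><row id="1" name="Alice"/><row id="2" name="Bob"/></rows>';

    SELECT r.n.value('@id',   'INT')          AS id,
           r.n.value('@name', 'NVARCHAR(50)') AS name
    FROM @x.nodes('/rows/row') AS r(n);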
Some shameless plugging:
Importing XML into SQL Server
Search XML Column in SQL
XML data and XML documents can have different requirements.
While the xml type is good for data, it does not preserve formatting (whitespace is removed), so in some cases (e.g. configuration files) the best option is nvarchar.
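A small demonstration of the formatting loss (the element names are hypothetical):

    DECLARE @s NVARCHAR(200);
    SET @s = N'<config>
        <setting name="a"/>
    </config>';

    SELECT CAST(@s AS XML) AS AsXml,  -- insignificant whitespace is discarded
           @s AS AsNvarchar;          -- the original formatting survives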