Read BLOB column and convert it to xml in a file - sql-server

I have a column that is a BLOB. Here is the screenshot:
As you can see, I have a TBDOCUMENTS table, and in this table the DOCUMENT column is a BLOB. I want to read this column. I know that for this particular DOCUMENTURL the column contains XML, so I want to convert it to XML and then write that XML to a file.
How can I do this in SQL Server? I am using SQL Server 2014.

To read the XML stored inside a BLOB column (for example varbinary), you can use CONVERT:
select CONVERT(xml,(CONVERT(varbinary(max),DOCUMENT)))
from TBDOCUMENTS
where DOCUMENTURL='...'
Now you can write the result to an XML file. SQL Server is probably not the best tool for this part of the task, but there are several ways to do it, for example with the bcp utility.
As reported in the comments, a DTD-related error may occur when using CONVERT:
Parsing XML with internal subset DTDs not allowed. Use CONVERT with style option 2 to enable limited internal subset DTD support.
In this case, passing style 2 (one of the XML styles of CONVERT) as the third argument fixes the problem:
select CONVERT(xml,(CONVERT(varbinary(max),DOCUMENT)), 2)
from TBDOCUMENTS
where DOCUMENTURL='...'
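A self-contained sketch of the style-2 fix. It assumes a temporary #TBDOCUMENTS table standing in for the real one, with DOCUMENT stored as varbinary(max); the URL and the sample document are made up. The sample declares an entity in an internal DTD subset, which is the kind of document that triggers the error above.

```sql
-- Hypothetical stand-in for TBDOCUMENTS; the real DOCUMENT column may be image or varbinary
CREATE TABLE #TBDOCUMENTS (
    DOCUMENTURL nvarchar(400)  NOT NULL,
    DOCUMENT    varbinary(max) NULL
);

-- Made-up document whose internal DTD subset declares an entity
INSERT INTO #TBDOCUMENTS (DOCUMENTURL, DOCUMENT)
VALUES (N'sample-url',
        CONVERT(varbinary(max),
                '<!DOCTYPE note [<!ENTITY who "world">]><note>hello &who;</note>'));

-- Without the third argument this raises the "internal subset DTDs not allowed" error.
-- Style 2 enables limited DTD processing, so the entity reference is expanded.
SELECT CONVERT(xml, CONVERT(varbinary(max), DOCUMENT), 2) AS DocumentXml
FROM #TBDOCUMENTS
WHERE DOCUMENTURL = N'sample-url';
```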

Related

What SQL Server datatype to use for mixed XML and HL7v2 data?

Consider a column in an MS SQL database which will house either potentially large chunks of XML or pipe-delimited HL7v2 data.
Currently (due to a lack of forward thinking) it is typed as XML, because originally we only ever accepted XML data. While this technically works, it means that all the XML special characters in the HL7v2 messages are being encoded (& --> &amp; etc.).
This is not ideal for what we are doing. If I were to convert this column to a different datatype, what would be recommended? I was thinking nvarchar(max) as it seems like it would handle it, but I'm not well-versed in SQL datatypes and the implications of using different types for such data.
There really isn't much of a choice other than nvarchar(max).
The other options are either varchar(max) or varbinary(max). You might need Unicode, so you can't use varchar. Storing it as varbinary would work, but it would just be annoying to work with.
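A minimal sketch of the nvarchar(max) approach. The #Messages table and its PayloadType discriminator column are hypothetical additions for illustration, and the HL7v2 segment is a made-up sample. Both kinds of payload are stored verbatim in the same column.

```sql
-- Hypothetical table: one nvarchar(max) column holds either payload verbatim
CREATE TABLE #Messages (
    MessageId   int IDENTITY(1, 1) PRIMARY KEY,
    PayloadType varchar(10)   NOT NULL,  -- assumed discriminator: 'HL7v2' or 'XML'
    Payload     nvarchar(max) NOT NULL
);

INSERT INTO #Messages (PayloadType, Payload)
VALUES ('HL7v2', N'MSH|^~\&|SENDAPP|SENDFAC|RECVAPP|RECVFAC|20240101120000||ADT^A01|MSG00001|P|2.5'),
       ('XML',   N'<patient><name>Smith &amp; Sons</name></patient>');

-- The HL7v2 encoding characters ^~\& come back exactly as inserted:
-- nvarchar stores the text as-is, so nothing is entity-encoded
SELECT PayloadType, Payload FROM #Messages;
```

If you later need to query the XML rows, you can still CONVERT(xml, Payload) on the fly, at the cost of parsing the text on every read.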
Use HAPI to transform the HL7 messages from ER7 (pipe delimited) to XML encoding. That way you can use a single SQL Server XML column for everything. And it will give you the added benefit of being able to query into HL7 message contents using XQuery.
As Nick says, converting the pipe-delimited messages to XML and then persisting them as XML is the best option. Trying to persist XML and pipe-delimited values in the same column makes no sense to me, since at the source they are different data types.

How to generate XML with attribute data types from SQL Server table definition?

I need to generate an XML document based on a SQL Server table. This is very simple to do with the following syntax:
SELECT *
FROM MyTableName
WHERE name = 'Bob'
FOR XML PATH
What I'm struggling with is finding a way to include data types for all elements automatically, without having to modify the XML and add extra tags manually.
I want the XML attributes to reflect the data types found in the table definition.
I found the answer through SQL Prompt's autocomplete feature. After typing FOR XML RAW, I explored the other options that can be used with it.
As a result, I found the following two options that can be used with FOR XML RAW:
FOR XML RAW, XMLDATA
FOR XML RAW, XMLSCHEMA
I ran the query with those options and looked at the generated XML document. Once I realized this was what I had been looking for, I was able to find documentation (https://msdn.microsoft.com/en-us/library/bb510461.aspx) that explains both options and how they can be used.
I then slightly modified my original query:
SELECT *
FROM MyTableName
WHERE 1 = 0
FOR XML RAW, ELEMENTS, XMLSCHEMA
Including 1 = 0 in the WHERE clause let me generate only the schema (the type definitions) without any data. Then all I had to do was copy the output into Notepad++ and save it as an .xsd file.
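If you'd rather not copy the output into an editor by hand, adding the TYPE directive makes FOR XML return an xml value that you can keep in a variable (or insert into a table). A sketch, assuming a hypothetical temporary table #MyTableName in place of the real one:

```sql
-- Hypothetical stand-in for MyTableName
CREATE TABLE #MyTableName (
    id      int          NOT NULL,
    name    nvarchar(50) NULL,
    created datetime2(0) NULL
);

-- WHERE 1 = 0 returns no rows, so the output is just the inline XSD;
-- TYPE makes FOR XML return the xml data type instead of a string
DECLARE @xsd xml =
    (SELECT * FROM #MyTableName WHERE 1 = 0
     FOR XML RAW, ELEMENTS, XMLSCHEMA, TYPE);

SELECT @xsd AS InlineSchema;
```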

SQL Server 2012 Full Text Search on RTF

My database runs on SQL Server 2012. One column of my table contains RTF text; its data type is nvarchar(max).
I want to set up a full-text search on this column that parses the RTF and searches only the actual text, so that RTF tags don't show up in the results.
As I understand it, RTF parsing should already be built into SQL Server, but I can't get it working :-(
I did the following:
Created a full-text catalog
Selected the column containing the RTF and added a full-text index
But I still get wrong results:
SELECT * FROM myTable WHERE
CONTAINS(myRtfColumn,'rtf')
--> still returns all rows, because 'rtf' is an RTF keyword
Any ideas what I am doing wrong? Do I have to enable RTF search on my SQL Server, or something similar?
On an nvarchar column, full-text search indexes the content as plain text. What you are storing is a document format -> RTF. By choosing nvarchar you told SQL Server you want to store text, but you are storing a formatted document. For documents, use varbinary(max) instead.
That alone doesn't solve the problem, because the indexing routines still don't know how to interpret rich text: which parts are control words and which are content.
So let's talk about the interpreter, i.e. the filter. The documentation says:
https://technet.microsoft.com/en-us/en-en/library/ms142531(v=SQL.105).aspx
varbinary(max) or varbinary data
A single varbinary(max) or varbinary column can store many types of documents. SQL Server 2008 supports any document type for which a filter is installed and available in the operative system. The document type of each document is identified by the file extension of the document. For example, for a .doc file extension, full-text search uses the filter that supports Microsoft Word documents. For a list of available document types, query the sys.fulltext_document_types catalog view.
Note that the Full-Text Engine can leverage existing filters that are installed in the operating system. Before you can use operating-system filters, word breakers, and stemmers, you must load them in the server instance, as follows:
Finally, what remains to do:
Check whether a filter for ".rtf" is available:
EXEC sp_help_fulltext_system_components 'filter';
Then add a computed column "Typ" to your table that always returns ".rtf":
alter table yourname add [Typ] AS (CONVERT([nvarchar](8),'.rtf',0));
This column can now be used as the type column (TYPE COLUMN) of the full-text index on the varbinary(max) column.
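Putting the answer's steps together, a sketch of the varbinary(max) approach on hypothetical demo objects (dbo.RtfDemo, PK_RtfDemo and ftDemoCatalog are made-up names; full-text indexes can't be created on temp tables). It assumes ".rtf" shows up in the filter check above (sys.fulltext_document_types lists the same information), and LANGUAGE 1033 (English) is just an example. GO separates the batches as in SSMS or sqlcmd.

```sql
-- Hypothetical demo catalog and table
CREATE FULLTEXT CATALOG ftDemoCatalog;
GO
CREATE TABLE dbo.RtfDemo (
    Id          int IDENTITY(1, 1) NOT NULL CONSTRAINT PK_RtfDemo PRIMARY KEY,
    RtfDocument varbinary(max) NULL,                  -- RTF stored as bytes, not as nvarchar text
    Typ         AS (CONVERT(nvarchar(8), '.rtf', 0))  -- computed type column from the answer
);
GO
-- Existing nvarchar RTF can be copied the same way:
-- CONVERT(varbinary(max), CONVERT(varchar(max), myRtfColumn))
INSERT INTO dbo.RtfDemo (RtfDocument)
VALUES (CONVERT(varbinary(max), '{\rtf1\ansi {\b Hello} world\par}'));
GO
-- TYPE COLUMN tells the full-text engine which filter to run on each document
CREATE FULLTEXT INDEX ON dbo.RtfDemo (RtfDocument TYPE COLUMN Typ LANGUAGE 1033)
    KEY INDEX PK_RtfDemo ON ftDemoCatalog;
GO
-- Population runs in the background; once it finishes, matches come from the
-- document text rather than from RTF control words such as \rtf1 or \par
SELECT Id FROM dbo.RtfDemo WHERE CONTAINS(RtfDocument, 'world');
```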

Mass convert all non-unicode fields to unicode in SSIS

I have quite a few tables and I'm using SSIS to bring the data from Oracle to SQL Server, in the process I'd like to convert all varchar fields to nvarchar. I know I can use the Data Conversion transformer but it seems the only way to do this is to set each field one by one, then I'll have to manually set the mapping in the destination component to map to the "Copy of" field. I've got thousands of fields and it would be tedious to set it on each one... is there a way to say "if field is DT_STR convert to DT_WSTR"?
Instead of replacing varchar with nvarchar manually, you can copy all the CREATE TABLE scripts generated by SSIS into a document and save it. Then do a global replace of varchar with nvarchar in the document (take care that columns that are already nvarchar don't end up as nnvarchar).
Finally, use the amended script as a step in your SSIS package to create the tables before populating them with the data from Oracle.
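If the destination tables already exist, one alternative to the global replace (a different technique from the one in this answer) is to have SQL Server generate the ALTER statements from INFORMATION_SCHEMA. A sketch that only produces the statements for review; #AlterStatements is a hypothetical holding table:

```sql
-- Generates, but does not execute, one ALTER statement per varchar column in the
-- current database's base tables. Review before running: columns used in indexes,
-- constraints or computed columns must be handled separately.
SELECT 'ALTER TABLE ' + QUOTENAME(c.TABLE_SCHEMA) + '.' + QUOTENAME(c.TABLE_NAME)
     + ' ALTER COLUMN ' + QUOTENAME(c.COLUMN_NAME) + ' nvarchar('
     -- nvarchar(n) tops out at 4000, so longer or (max) columns become nvarchar(max)
     + CASE WHEN c.CHARACTER_MAXIMUM_LENGTH = -1 OR c.CHARACTER_MAXIMUM_LENGTH > 4000
            THEN 'max'
            ELSE CAST(c.CHARACTER_MAXIMUM_LENGTH AS varchar(10)) END
     -- restate nullability explicitly so ALTER COLUMN keeps NOT NULL columns NOT NULL
     + CASE WHEN c.IS_NULLABLE = 'NO' THEN ') NOT NULL;' ELSE ') NULL;' END AS AlterStatement
INTO #AlterStatements
FROM INFORMATION_SCHEMA.COLUMNS AS c
JOIN INFORMATION_SCHEMA.TABLES AS t
  ON t.TABLE_SCHEMA = c.TABLE_SCHEMA AND t.TABLE_NAME = c.TABLE_NAME
WHERE c.DATA_TYPE = 'varchar'
  AND t.TABLE_TYPE = 'BASE TABLE';

SELECT AlterStatement FROM #AlterStatements;
```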
The proper way is to use the Data Conversion transformation.
That said, it appears that if you disable external metadata validation in SSIS (ValidateExternalMetadata = False on the component), you can bypass the Unicode/non-Unicode conversion error. SQL Server will then implicitly convert the data to the destination type.
See this SO post for a quick explanation.

What is the best way to save XML data to SQL Server?

Is there a direct route that is pretty straightforward (i.e. can SQL Server read XML)?
Or, is it best to parse the XML and just transfer it in the usual way via ADO.Net either as individual rows or perhaps a batch update?
I realize there may be solutions that involve large complex stored procs--while I'm not entirely opposed to this, I tend to prefer to have most of my business logic in the C# code. I have seen a solution using SQLXMLBulkLoad, but it seemed to require fairly complex SQL code.
For reference, I'll be working with about 100 rows at a time with about 50 small pieces of data for each (strings and ints). This will eventually become a daily batch job.
Any code snippets you can provide would be very much appreciated.
SQL Server 2005 and later have a data type called "xml" in which you can store XML, either untyped or typed with an XSD schema.
You can fill columns of type xml from an XML string literal, so a normal INSERT statement is all you need to put the XML contents into that column.
Marc
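A minimal sketch of that INSERT, assuming a hypothetical #Imports table with an untyped xml column; the sample rows are made up. From ADO.NET the same statement works with a parameter of type SqlDbType.Xml in place of the literal.

```sql
-- Hypothetical target table with an untyped xml column
CREATE TABLE #Imports (
    ImportId int IDENTITY(1, 1) PRIMARY KEY,
    Payload  xml NOT NULL
);

-- The string literal is implicitly converted to xml and checked for well-formedness
INSERT INTO #Imports (Payload)
VALUES (N'<rows>
            <row id="1" name="Alpha" qty="10"/>
            <row id="2" name="Beta"  qty="20"/>
          </rows>');

SELECT ImportId, Payload FROM #Imports;
```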
You can use the function OPENXML and stored procedure sp_xml_preparedocument to easily convert your XML into rowsets.
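A sketch of the OPENXML route, using made-up sample data shaped like the question's rows of strings and ints; the #OpenXmlRows target table is hypothetical.

```sql
DECLARE @doc nvarchar(max) = N'<rows>
    <row id="1" name="Alpha" qty="10"/>
    <row id="2" name="Beta"  qty="20"/>
</rows>';
DECLARE @hdoc int;

-- Parse the text once; @hdoc is a handle to the in-memory document
EXEC sp_xml_preparedocument @hdoc OUTPUT, @doc;

-- Flag 1 = attribute-centric mapping: each WITH column is read from a same-named attribute
SELECT id, name, qty
INTO #OpenXmlRows
FROM OPENXML(@hdoc, '/rows/row', 1)
     WITH (id int, name nvarchar(50), qty int);

-- Release the handle; the parsed document occupies server memory until then
EXEC sp_xml_removedocument @hdoc;

SELECT id, name, qty FROM #OpenXmlRows;
```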
If you are using SQL Server 2008 (or 2005), it has a native xml data type. You can associate an XSD schema with xml variables and insert directly into columns of type xml.
Yes, SQL Server 2005 and above can parse XML out of the box.
Use the nodes(), value() and query() methods to break it down however you want, whether into element values or attributes.
Some shameless plugging:
Importing XML into SQL Server
Search XML Column in SQL
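For comparison with OPENXML, a sketch of the nodes()/value() approach on the same kind of made-up data, this time reading both an attribute and child elements; #ShreddedRows is a hypothetical target table.

```sql
DECLARE @x xml = N'<rows>
    <row id="1"><name>Alpha</name><qty>10</qty></row>
    <row id="2"><name>Beta</name><qty>20</qty></row>
</rows>';

-- nodes() shreds the document into one row per <row> element;
-- value() pulls a typed scalar from an attribute (@id) or a child element (name, qty)
SELECT r.value('@id', 'int')                       AS id,
       r.value('(name/text())[1]', 'nvarchar(50)') AS name,
       r.value('(qty/text())[1]', 'int')           AS qty
INTO #ShreddedRows
FROM @x.nodes('/rows/row') AS t(r);

SELECT id, name, qty FROM #ShreddedRows;
```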
XML data and an XML document can mean different things.
While the xml type is good for data, it doesn't preserve formatting (insignificant whitespace is removed), so in some cases (e.g. configuration files) the best option is nvarchar.
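A small sketch of the difference, using a made-up configuration snippet. Style 1 of CONVERT keeps insignificant whitespace, but the xml type still does not keep everything (the XML declaration, for example, is not preserved), which is why nvarchar is the safer choice when the exact text matters.

```sql
DECLARE @config nvarchar(max) = N'<settings>
    <timeout>30</timeout>
</settings>';

SELECT
    @config                                          AS AsNvarchar,  -- stored exactly as written
    CONVERT(nvarchar(max), CONVERT(xml, @config))    AS XmlDefault,  -- indentation discarded
    CONVERT(nvarchar(max), CONVERT(xml, @config, 1)) AS XmlStyle1;   -- insignificant whitespace kept
```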
