Converting UTF8 to UTF16 in SQL Server - sql-server

I have an application that receives XML from a web service written in PHP and inserts it into a SQL Server database. When I try to insert received XML that contains Polish diacritical characters, I get an error like this:
XML parsing: line 2, character 703, illegal xml character
I tried to do something like this:
DECLARE @xml XML;
SET @xml = '(here I paste some sample XML that contains diacritical characters)';
SELECT @xml = CAST(@xmlstr AS XML);
INSERT INTO vos_DirectXML_ut(ValidXML,synchronization_time,synchronization_type,MethodName)
VALUES(@xml,GETDATE(),@SynchroType,@method);
ValidXML is an XML type column.
I googled to find some solution and I found Utf8String:
http://msdn.microsoft.com/en-us/library/ms160893(v=sql.90).aspx
I installed it and tried to convert the XML to Utf8String, then back to a normal varchar, then to XML, and insert it into my table. It looks like this does not change any characters inside the XML; it only changes the type of the variable, so it didn't solve my problem.
I also found someone's advice that a similar problem can be solved by writing a procedure that loops over every character in the variable (XML in my case) and manually changes its encoding, but he also said that it may be slow. Is this really the only option to solve my problem?

Try casting to Unicode:
DECLARE @xmlstr NVARCHAR(MAX) --<--
SELECT @xmlstr = N'(some sample XML that contains diacritical characters)'; --<-- N''
DECLARE @xml XML
SELECT @xml = CAST(@xmlstr AS XML)
INSERT INTO dbo.vos_DirectXML_ut
(
    ValidXML
    , synchronization_time
    , synchronization_type
    , MethodName
)
SELECT
    @xml
    , GETDATE()
    , @SynchroType
    , @method

For an XML file, UTF-16 is not supported by SQL Server 2008 R2, so an XML file that starts with a utf-16 encoding declaration
gives this error when you parse it:
Msg 6602, Level 16, State 2, Procedure sp_xml_preparedocument, Line 1
The error description is 'Switch from current encoding to specified encoding not supported.'.
To resolve the above error, the easy step is to use the SQL REPLACE function:
REPLACE(@xmldata, 'utf-16', '') or REPLACE(@xmldata, 'utf-16', 'utf-8')
I have worked on 3 procedures using an XML file; whenever I tried to use utf-16, the XML parser gave an error.
Always use utf-8 for SQL Server 2008 R2.
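A minimal sketch of the mismatch that seems to be behind this error, assuming SQL Server 2008 or later; the variable names and sample values are illustrative and not from the original post. The parser appears to require the declared encoding to agree with the string type rather than rejecting utf-16 as such: an NVARCHAR value is already UTF-16, so a utf-16 declaration is accepted there, while the same declaration inside a VARCHAR value is what triggers the "switch encoding" error.

```sql
-- Illustrative variables; not part of the original answer.
DECLARE @wide   NVARCHAR(MAX) = N'<?xml version="1.0" encoding="utf-16"?><root>zażółć</root>';
DECLARE @narrow VARCHAR(MAX)  =  '<?xml version="1.0" encoding="utf-16"?><root>abc</root>';

-- NVARCHAR is stored as UTF-16, so the declaration matches the data.
SELECT CAST(@wide AS XML) AS ParsedFromNvarchar;

-- VARCHAR is a single-byte string; rewriting the declaration with REPLACE
-- makes the declared encoding consistent with the data before parsing.
SELECT CAST(REPLACE(@narrow, 'utf-16', 'utf-8') AS XML) AS ParsedFromVarchar;
```

If the XML can arrive as NVARCHAR, keeping the utf-16 declaration avoids the REPLACE step entirely.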

Related

Select more than 65,536 characters from nvarchar(max) in SQL Server [duplicate]

Software I'm working with uses a text field to store XML. From my searches online, the text datatype is supposed to hold 2^31 - 1 characters. Currently SQL Server is truncating the XML at 65,535 characters every time. I know this is caused by SQL Server, because if I add a 65,536th character to the column directly in Management Studio, it states that it will not update because characters will be truncated.
Is the max length really 65,535 or could this be because the database was designed in an earlier version of SQL Server (2000) and it's using the legacy text datatype instead of 2005's?
If this is the case, will altering the datatype to Text in SQL Server 2005 fix this issue?
That is a limitation of SSMS, not of the text field, but you should use varchar(max) since text is deprecated.
Here is also a quick test
create table TestLen (bla text)
insert TestLen values (replicate(convert(varchar(max),'a'), 100000))
select datalength(bla)
from TestLen
Returns 100000 for me
MSSQL 2000 should allow up to 2^31 - 1 characters (non unicode) in a text field, which is over 2 billion. Don't know what's causing this limitation but you might wanna try using varchar(max) or nvarchar(max). These store as many characters but allow also the regular string T-SQL functions (like LEN, SUBSTRING, REPLACE, RTRIM,...).
If you're able to convert the column, you might as well, since the text data type will be removed in a future version of SQL Server. See here.
The recommendation is to use varchar(MAX) or nvarchar(MAX). In your case, you could also use the XML data type, but that may tie you to certain database engines (if that's a consideration).
You should have a look at:
XML Support in Microsoft SQL Server 2005
Beginning SQL Server 2005 XML Programming
So I would rather use the data type appropriate for the use, and not make a data type from a previous version fit your use.
Here's a little script I wrote for getting out all the data:
DECLARE @data NVARCHAR(MAX) = N'huge data';
DECLARE @dataLength INT = LEN(@data);
DECLARE @currIndex INT = 1; -- SUBSTRING positions are 1-based
DECLARE @temp NVARCHAR(MAX);
WHILE @currIndex <= @dataLength
BEGIN
    -- Return the next chunk of at most 65535 characters as its own result set
    SET @temp = SUBSTRING(@data, @currIndex, 65535);
    SELECT @temp;
    SET @currIndex += 65535;
END;

TSQL "Illegal XML Character" When Converting Varbinary to XML

I'm trying to create a stored procedure in SQL Server 2016 that converts XML that was previously converted into Varbinary back into XML, but getting an "Illegal XML character" error when converting. I've found a workaround that seems to work, but I can't actually figure out why it works, which makes me uncomfortable.
The stored procedure takes data that was converted to binary in SSIS and inserted into a varbinary(MAX) column in a table and performs a simple
CAST(Column AS XML)
It worked fine for a long time, and I only began seeing an issue when the initial XML started containing an ® (registered trademark) symbol.
Now, when I attempt to convert the binary to XML I get this error
Msg 9420, Level 16, State 1, Line 23
XML parsing: line 1, character 7, illegal xml character
However, if I first convert the binary to varchar(MAX), then convert that to XML, it seems to work fine. I don't understand what is happening when I perform that intermediate CAST that is different than casting directly to XML. My main concern is that I don't want to add it in to account for this scenario and end up with unintended consequences.
Test code:
DECLARE @foo VARBINARY(MAX)
DECLARE @bar VARCHAR(MAX)
DECLARE @nbar NVARCHAR(MAX)
--SELECT Varbinary
SET @foo = CAST( '<Test>®</Test>' AS VARBINARY(MAX))
SELECT @foo AsBinary
--select as binary as varchar
SET @bar = CAST(@foo AS VARCHAR(MAX))
SELECT @bar BinaryAsVarchar -- Correct string output
--select binary as nvarchar
SET @nbar = CAST(@foo AS NVARCHAR(MAX))
SELECT @nbar BinaryAsNvarchar -- Chinese characters
--select binary as XML
SELECT TRY_CAST(@foo AS XML) BinaryAsXML -- ILLEGAL XML character
-- SELECT CONVERT(xml, @obfoo) BinaryAsXML --ILLEGAL XML Character
--select BinaryAsVarcharAsXML
SELECT TRY_CAST(@bar AS XML) BinaryAsVarcharAsXML -- Correct Output
--select BinaryAsNVarcharAsXML
SELECT TRY_CAST(@nbar AS XML) BinaryAsNvarcharAsXML -- Chinese Characters
There are several things to know:
SQL Server is rather limited with character encodings. There is VARCHAR, which is 1-byte-encoded extended ASCII, and NVARCHAR, which is UCS-2 (almost the same as UTF-16).
VARCHAR uses plain Latin for the first set of characters and a code page mapping provided by the collation in use for the second set.
VARCHAR is not UTF-8. UTF-8 works with VARCHAR as long as all characters are 1-byte-encoded. But UTF-8 knows a lot of 2-byte-encoded (up to 4-byte-encoded) characters, which would break the internal storage of a VARCHAR string.
NVARCHAR will work with almost any 2-byte-encoded character natively (that means with almost any existing character). But it is not exactly UTF-16 (there are characters encoded as 4-byte surrogate pairs, which would break SQL Server's internal storage).
XML is not stored as the XML string you see, but as a hierarchically organised physical table, based on NVARCHAR values.
The natively stored XML is really fast, while any text-based storage will need a very expensive parse operation in advance (over and over...).
Storing XML as a string is bad; storing XML as a VARCHAR string is even worse.
Storing a VARCHAR-string XML as VARBINARY is an accumulation of things you should not do.
Try this:
DECLARE @text1Byte VARCHAR(100)='<test>blah</test>';
DECLARE @text2Byte NVARCHAR(100)=N'<test>blah</test>';
SELECT CAST(@text1Byte AS VARBINARY(MAX)) AS text1Byte_Binary
,CAST(@text2Byte AS VARBINARY(MAX)) AS text2Byte_Binary
,CAST(@text1Byte AS XML) AS text1Byte_XML
,CAST(@text2Byte AS XML) AS text2Byte_XML
,CAST(CAST(@text1Byte AS VARBINARY(MAX)) AS XML) AS text1Byte_XML_via_Binary
,CAST(CAST(@text2Byte AS VARBINARY(MAX)) AS XML) AS text2Byte_XML_via_Binary
The only difference you'll see are the many zeros in 0x3C0074006500730074003E0062006C00610068003C002F0074006500730074003E00. This is due to the 2-byte-encoding of nvarchar, each second byte is not needed in this sample. But if you'd need far-east-characters the picture would be completely different.
The reason why it works: SQL Server is very smart. The cast from the variable to XML is rather easy, as the engine knows whether the underlying variable is varchar or nvarchar. But the last two casts are different. The engine has to examine the binary to see whether it is a valid nvarchar, and will give it a second try with varchar if that fails.
Now try to add your registered trademark to the given example. Add it first to the second variable, DECLARE @text2Byte NVARCHAR(100)=N'<test>blah®</test>';, and try to run this. Then add it to the first variable and try it again.
What you can try:
Cast your binary to varchar(max), then to nvarchar(max) and finally to xml.
,CAST(CAST(CAST(CAST(@text1Byte AS VARBINARY(MAX)) AS VARCHAR(MAX)) AS NVARCHAR(MAX)) AS XML) AS text1Byte_XML_via_Binary
This will work, but it won't be fast...
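To see what the XML parser is actually handed in each case, it can help to look at the raw bytes of the ® character. This is a small sketch that assumes the database default collation uses code page 1252 (for example a Latin1_General collation); under a different collation the VARCHAR bytes may differ.

```sql
-- Assumption: the default collation maps VARCHAR to code page 1252.
SELECT CAST('®'  AS VARBINARY(10)) AS VarcharBytes,   -- single byte per character
       CAST(N'®' AS VARBINARY(10)) AS NvarcharBytes;  -- two bytes per character (UTF-16 LE)
```

Under that assumption the first column should show 0xAE and the second 0xAE00. A lone 0xAE byte is not a valid UTF-8 sequence, which may be why the direct cast from binary rejects it while the plain ASCII characters around it pass.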

How to test if field is XML in xpath query?

I query an image field to parse out the Lot number but I cannot be guaranteed that the image field will be XML. I am querying using embedded SQL. Not a stored proc.
How can I test if the field is XML and if not, get out gracefully?
i.e. NullIf (not XML) or equivalent.
DECLARE @x xml
SET @x = (SELECT [image]
FROM [QM].[dbo].[ticket]
where ticket_id = :ticketID)
SELECT @x.query('(/*:NewDataSet/*:tickets/*:lot/text())[1]') as LotNo
I actually run the query above. If it fails due to not being proper XML, I set the Lot No to '';
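One way to avoid relying on the failure is TRY_CONVERT, which is available from SQL Server 2012 onward. The sketch below is only an illustration under that assumption: @raw is a hypothetical stand-in for the value read from the [image] column, and the element names follow the query above.

```sql
-- Hypothetical sample value standing in for the column content.
DECLARE @raw NVARCHAR(MAX) = N'<NewDataSet><tickets><lot>A123</lot></tickets></NewDataSet>';

-- TRY_CONVERT yields NULL instead of raising an error when the text is not well-formed XML.
DECLARE @x XML = TRY_CONVERT(XML, @raw);

-- value() on a NULL xml variable returns NULL, so ISNULL supplies the empty-string fallback.
SELECT ISNULL(@x.value('(/*:NewDataSet/*:tickets/*:lot/text())[1]', 'NVARCHAR(50)'), N'') AS LotNo;
```

Setting @raw to something that is not XML, such as N'not xml <', should make LotNo come back as an empty string rather than failing the batch.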

Does the command FOR XML in MS SQL Server save the file in disk?

Does the command FOR XML in MS SQL Server save the file in disk?
I'm creating a trigger to log operations in a table and part of this trigger is create a XML with the affected row. I'm thinking of using the FOR XML to generate the XML.
SELECT *
FROM TBL_Test
WHERE ID=3040
FOR XML RAW
My worry is that I will be using it in a trigger and I don't want to save files on the server every time I call the FOR XML function.
In addition: would you guys know how to parse it to varchar?
Thanks in advance for any help!
Like any SELECT query, a query with the FOR XML clause will return the result to the client and not save the result to disk.
You can use a scalar subquery to assign the result XML to a varchar variable instead of returning to the client:
DECLARE @xml varchar(MAX) =
(
SELECT *
FROM dbo.TBL_Test
WHERE ID=3040
FOR XML RAW
);
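If the typed XML value is also needed, a possible variation is to add the TYPE directive and convert to text afterwards. This sketch uses a table variable with made-up columns in place of TBL_Test so that it runs on its own (SQL Server 2008 or later); the column names are assumptions.

```sql
-- Stand-in for TBL_Test; the columns are hypothetical.
DECLARE @TBL_Test TABLE (ID INT, Name VARCHAR(50));
INSERT INTO @TBL_Test (ID, Name) VALUES (3040, 'sample');

-- TYPE makes the subquery return an xml value instead of a string.
DECLARE @rowXml XML =
(
    SELECT *
    FROM @TBL_Test
    WHERE ID = 3040
    FOR XML RAW, TYPE
);

-- Convert to text only where a string is really required.
DECLARE @rowText NVARCHAR(MAX) = CONVERT(NVARCHAR(MAX), @rowXml);
SELECT @rowXml AS RowAsXml, @rowText AS RowAsText;
```

In both variants the result stays in a variable in memory; nothing is written to a file.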

Best way to transfer an xml to SQL Server?

I have been hearing the podcast blog for a while; I hope I don't break this.
The question is this: I have to insert XML into a database. This will be for already defined tables and fields. So what is the best way to accomplish this? So far I am leaning toward a programmatic approach. I have been looking at various options: one is Data Transfer Objects (DTO), and in SQL Server there is sp_xml_preparedocument, which is used to transfer XML into an object and work with it through code.
I am using CSharp and SQL Server 2005. The fields are not XML fields, they are the usual SQL datatypes.
In an attempt to try and help, we may need some clarification. Maybe by restating the problem you can let us know if this is what you're asking:
How can one import existing xml into a SQL 2005 database, without relying on the built-in xml type?
A fairly straightforward solution that you already mentioned is sp_xml_preparedocument, combined with openxml.
Hopefully the following example illustrates the correct usage. For a more complete example, check out the MSDN docs on Using OPENXML.
declare @XmlDocumentHandle int
declare @XmlDocument nvarchar(1000)
set @XmlDocument = N'<ROOT>
<Customer>
<FirstName>Will</FirstName>
<LastName>Smith</LastName>
</Customer>
</ROOT>'
-- Create temp table to insert data into
create table #Customer
(
    FirstName varchar(20),
    LastName varchar(20)
)
-- Create an internal representation of the XML document.
exec sp_xml_preparedocument @XmlDocumentHandle output, @XmlDocument
-- Insert using openxml allows us to read the structure
insert into #Customer
select
    FirstName = XmlFirstName,
    LastName = XmlLastName
from openxml ( @XmlDocumentHandle, '/ROOT/Customer',2 )
with
(
    XmlFirstName varchar(20) 'FirstName',
    XmlLastName varchar(20) 'LastName'
)
where ( XmlFirstName = 'Will' and XmlLastName = 'Smith' )
-- Cleanup xml document
exec sp_xml_removedocument @XmlDocumentHandle
-- Show the data
select *
from #Customer
-- Drop tmp table
drop table #Customer
If you have an XML file and are using C#, then defining a stored procedure that does something like the above, and passing the entire XML file contents to it as a string, should give you a fairly straightforward way of importing XML into your existing table(s).
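As an alternative sketch, SQL Server 2005 also lets you shred the same document with the xml type's nodes() and value() methods, which avoids the document handle. This is not the approach used in the answer above, and the aliases t and c are arbitrary names chosen for the illustration. The xml type is used here only for a local variable; the target columns stay ordinary varchar.

```sql
DECLARE @XmlDocument XML;
SET @XmlDocument = N'<ROOT>
  <Customer>
    <FirstName>Will</FirstName>
    <LastName>Smith</LastName>
  </Customer>
</ROOT>';

-- nodes() yields one row per Customer element; value() extracts typed scalars from each row.
SELECT
    FirstName = c.value('(FirstName/text())[1]', 'varchar(20)'),
    LastName  = c.value('(LastName/text())[1]',  'varchar(20)')
FROM @XmlDocument.nodes('/ROOT/Customer') AS t(c);
```

The SELECT can feed an INSERT INTO the existing table in the same way as the openxml version.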
If your XML conforms to a particular XSD schema, you can look into using the "xsd.exe" command line tool to generate C# object classes that you can bind the XML to, and then form your insert statements using the properties of those objects: MSDN XSD Doc
Peruse this document and it will give you the options:
MSDN: XML Options in Microsoft SQL Server 2005
You may want to use XSLT to transform your XML into SQL statements, e.g.
<xml type="user">
<data>1</data>
<data>2</data>
</xml>
Then the XSLT would look like
<xsl:template match="xml">
INSERT INTO <xsl:value-of select="@type" /> (data1, data2) VALUES (
'<xsl:value-of select="data[1]" />',
'<xsl:value-of select="data[2]" />');
</xsl:template>
The match statement most likely won't be the root node, but hopefully you get the idea. You may also need to wrap the non xsl:value-of parts in xsl:text to prevent extra characters from being dumped into the query, and you'd have to make sure the output of the XSLT is text. That way you get a list of SQL statements that you can run against the DB. Or you could use XSLT to output a T-SQL statement that you could load as a stored procedure.

Resources