SQL Server 2008 XML Issue With XML Escape Characters

Our current Point of Sale system executes too many queries in nested transactions that leave duplicated or partial data in place. I changed the entire thing to a single stored procedure where all sale item data is passed in as XML, iterated through in a temp table, saved to the database, then committed. However, SQL Server rejects special characters in the XML.
For example:
<?xml version="1.0" encoding="utf-16"?>
<list>
<item>
<objectid>bd99fcb6-3031-48b7-9a71-5f8cefe0a614</objectid>
<amount>50.00</amount>
<fee>1.50</fee>
<waivedfee>0.00</waivedfee>
<tax>0.00</tax>
<name>TEST &amp; TEST PERSON</name>
<payeeid>197</payeeid>
<accountnumber>5398520352</accountnumber>
<checknumber />
<comedreceiptnumber />
<isexpedited>0</isexpedited>
<echeckrefnumber />
</item>
</list>
Fails. It tells me that there is an illegal character where & is located. I don't know why. It's escaped properly with &amp;. I can't find any solutions online, anywhere. Everywhere people tell me to replace & with &amp;, which is what I am doing!

Use FOR XML PATH(''); it will encode the special characters for you.
SELECT 'TEST & TEST PERSON' FOR XML PATH('')
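When the XML is assembled on the client before it reaches the stored procedure, the same escaping can happen there. A minimal Python sketch of that client-side equivalent, assuming the sale items arrive as dicts of strings; `build_item_xml` is a hypothetical helper, not part of the original system. Letting `xml.etree.ElementTree` build the document escapes `&`, `<` and `>` automatically, and `xml.sax.saxutils.escape` covers strings that are concatenated by hand.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_item_xml(items):
    """Hypothetical helper: build <list><item>...</item></list> from dicts."""
    root = ET.Element("list")
    for fields in items:
        item = ET.SubElement(root, "item")
        for tag, value in fields.items():
            # Text assigned here is escaped on serialization (& becomes &amp;).
            ET.SubElement(item, tag).text = value
    return ET.tostring(root, encoding="unicode")


# Escaping a single value by hand, for XML built with string formatting.
print(escape("TEST & TEST PERSON"))  # TEST &amp; TEST PERSON

xml_text = build_item_xml([{"name": "TEST & TEST PERSON", "payeeid": "197"}])
print(xml_text)
```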

I figured it out. UTF-16 is correct, and that XML is fine. There was a final piece of XML, the ledgers, built from plain strings with no encoding and no escaping of special characters. Once I corrected that, it all worked.
Thanks for the help!

Related

How to insert XML into SQL Server when it contains escaped invalid characters

I'm trying to insert some XML into a SQL Server database table which uses column type XML.
This works fine most of the time, but one user submitted some XML with the character with hex value 3, and SQL Server gave the error "hexadecimal value 0x03, is an invalid character."
Now I want to check, and remove, any invalid XML characters before doing the insert, and there are various articles suggesting how invalid XML characters can be replaced using regex or something similar.
However, the problem for me is that the user submitted the XML document with the invalid character escaped, i.e. "&#x3;", and none of the methods I've found will detect this. This is also why the error was not detected earlier: the problem only occurs when inserting it into the SQL Server database.
Has anyone written a function that will check for all escaped invalid XML characters? I suppose the character above could also have been written as &#3; or &#x03;, or lots of other ways, so it's quite hard to catch them all.
Thanks in advance for any help you can offer.
You could try importing the XML into a temporary varchar(max) variable or table column, using REPLACE to strip out the offending characters, and then inserting the cleansed string into the destination, casting it to XML.
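Unlike a fixed list of REPLACE calls, one pattern over numeric character references catches &#3;, &#x3;, &#x03; and every other spelling, and each match can be checked against the XML 1.0 `Char` production (#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]). A minimal Python sketch, run on the string before the insert; `strip_invalid_char_refs` is a hypothetical helper, and it assumes references only matter in element text and attribute values (a literal `&#3;` inside a CDATA section or comment would be stripped too).

```python
import re

# Numeric character references: decimal (&#3;) or hexadecimal (&#x3;, &#x03;).
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def is_xml10_char(cp):
    """True if code point cp is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def strip_invalid_char_refs(xml_text):
    """Remove character references that point at characters XML 1.0 forbids."""

    def replace(match):
        hex_digits, dec_digits = match.groups()
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        # Keep valid references untouched; drop the forbidden ones.
        return match.group(0) if is_xml10_char(cp) else ""

    return CHAR_REF.sub(replace, xml_text)


print(strip_invalid_char_refs("<a>x&#x3;y&#3;z&#x03;&#233;</a>"))  # <a>xyz&#233;</a>
```

Text that merely looks like a reference because its ampersand is itself escaped (`&amp;#x3;`) is left alone, since the pattern requires `&#` with nothing in between.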

How can I insert from a TEXT column to an XML column in SQL Server 2014

I have XML docs stored in a TEXT column (collation_name French_CI_AS, character_set_name iso_1).
I want to move them to a new table, in an XML column with the following SQL...
INSERT INTO Signature(JustifId, SignedJustif)
SELECT JustifID, CONVERT(XML, Justif.SignedJustif,2)
FROM Justif
When I do this, I get character encoding errors that point to the high ASCII character in this fragment "presentación, OU=CERES, O=FNMT-RCM, C=ES": a Spanish accented o in an X509 certificate.
This ó started life in UTF-8, became UTF-16 as a .NET string, then became iso_1 when inserted into the TEXT column. I can copy and paste it into a web page with no problem. How, then, do I move it from a TEXT column to an XML column in the same database (and why is this so difficult)?
The CONVERT idea came from this post. This MS page covers creating XML from varchar and nvarchar.
This is tricky... A conversion at byte level might lead to unexpected results...
Try this:
INSERT INTO Signature(JustifId, SignedJustif)
SELECT JustifID, CONVERT(XML, CONVERT(VARCHAR(MAX),Justif.SignedJustif))
FROM Justif
If you still get issues, try to specify the specific collation together with the conversion and/or try to convert to NVARCHAR(MAX).
If this doesn't help, please edit your question and post a (reduced) example. Best would be a test scenario with a minimal XML document that reproduces the error.
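A minimal Python illustration of the byte-level problem. It assumes, since the question doesn't say, that the stored documents carry a UTF-8 XML declaration: in iso_1 the ó is the single byte 0xF3, which is not valid UTF-8, so a parser that trusts the declaration rejects it, while decoding with the column's real code page first succeeds. This mirrors the failure mode rather than reproducing SQL Server's exact behaviour, and both helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the document declares UTF-8, but its bytes are iso_1 (Latin-1).
doc = '<?xml version="1.0" encoding="utf-8"?><cert>presentación, OU=CERES</cert>'
latin1_bytes = doc.encode("latin-1")  # ó is the single byte 0xF3 here


def parse_with_declared_encoding(raw):
    """Let the parser trust the XML declaration (fails on mislabelled bytes)."""
    try:
        return ET.fromstring(raw).text
    except ET.ParseError as exc:
        return f"ParseError: {exc}"


def parse_with_known_code_page(raw, code_page):
    """Decode with the column's real code page first, then parse the text."""
    return ET.fromstring(raw.decode(code_page)).text


print(parse_with_declared_encoding(latin1_bytes))  # a ParseError message
print(parse_with_known_code_page(latin1_bytes, "latin-1"))  # presentación, OU=CERES
```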

Format fields during bulk insert SQL 2008

I am currently working on a project that requires data from a report generated by third party software to be inserted into a local SQL database. So far I have the data stored as a tab delimited .txt file and the following bulk insert SQL statement:
BULK INSERT ExampleTable
FROM 'c:\temp\Example.txt'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\n'
)
GO
The two problems I am encountering are quotation marks around any value that includes its own comma, and dollar signs in every field that has a dollar amount.
For instance one of the columns of the table is a description field and some of the values come out looking like:
"this is an example description, some more information, I don't know why the author would use commas in the first place here"
I don't care about the description field nearly as much as the other fields that include dollar amounts. Each of these fields is prefixed with a $ sign, so I have to store them as nvarchar instead of decimal or float, which would be A LOT more useful for reporting. Furthermore, when the dollar amount is greater than 1000, the field also contains a comma, and thus quotation marks, e.g. "$1,084.59".
I am familiar with SSMS, but I have never made a format or bcp file (the solutions I have found online).
Any help would be greatly appreciated.
You can use a format file, but only if your metadata remains constant, which does not appear to be the case here. You state that the dollar amounts are enclosed in quotes only when they exceed 999 and the comma is inserted. A format file would allow you to define per-column delimiters such as [,] or [","], but if that delimiter shifts throughout your file, you will have to pre-process the file. Text qualifiers themselves are not supported.
For reference:
CSV import in SQL Server 2008
http://jessesql.blogspot.com/2010/05/bulk-insert-csv-with-text-qualifiers.html
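Pre-processing can be a short script that runs before BULK INSERT. A minimal Python sketch, assuming the report is tab-delimited with a header row; the column names in `MONEY_COLUMNS` are hypothetical, so substitute the report's real ones. `csv.reader` removes the "..." qualifiers, and the money fields lose their `$` and thousands separators so the target column can be `decimal`.

```python
import csv
import io
from decimal import Decimal

# Hypothetical names of the dollar-amount columns; use the report's real header.
MONEY_COLUMNS = ("Amount", "Fee")


def clean_money(value):
    """'$1,084.59' -> '1084.59'; Decimal raises if anything else is left over."""
    digits = value.strip().replace("$", "").replace(",", "")
    return str(Decimal(digits)) if digits else ""


def preprocess(src, dst):
    """Copy a tab-delimited report, dropping "..." qualifiers and cleaning money."""
    reader = csv.reader(src, delimiter="\t", quotechar='"')
    header = next(reader)
    money_idx = {header.index(name) for name in MONEY_COLUMNS if name in header}
    dst.write("\t".join(header) + "\n")
    for row in reader:
        cleaned = []
        for i, field in enumerate(row):
            if i in money_idx:
                field = clean_money(field)
            # Tabs or line breaks inside a field would break the bulk load.
            cleaned.append(field.replace("\t", " ").replace("\r", " ").replace("\n", " "))
        dst.write("\t".join(cleaned) + "\n")


report = 'Description\tAmount\n"an example, with commas"\t"$1,084.59"\nplain\t$12.00\n'
out = io.StringIO()
preprocess(io.StringIO(report), out)
print(out.getvalue())
```

The cleaned file can then be loaded with the original BULK INSERT statement (FIRSTROW = 2 still skips the header). Note that this sketch flattens any tabs or line breaks embedded inside quoted fields into spaces.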
I don't see why, but ThiefMaster deleted my answer :-(
Probably a mistake, and he did not check the link, as this link is the full answer to your question. I will try again, for the last time, here...
Tip: if your CSV file doesn't have a consistent format, for example if ON THE SAME COLUMN some of the values are double-quoted and some are not, then this blog will help you do it in an easy way (using OPENROWSET in the last step makes it one simple query): http://ariely.info/Blog/tabid/83/EntryId/122/Using-Bulk-Insert-to-import-inconsistent-data-format-using-pure-T-SQL.aspx
There is a new wiki article at http://social.technet.microsoft.com/wiki based on this blog, if you prefer to read it on Microsoft's site.

illegal xml character on SQL Insert

I am trying to insert the following string into a SQL Server xml column:
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Ip>x.x.x.x</Ip>
<CountryCode>CA</CountryCode>
<CountryName>Canada</CountryName>
<RegionCode>QC</RegionCode>
<RegionName>Québec</RegionName>
<City>Dorval</City>
<ZipCode>h9p1j3</ZipCode>
<Latitude>45.45000076293945</Latitude>
<Longitude>-73.75</Longitude>
<MetroCode></MetroCode>
<AreaCode></AreaCode>
</Response>
The insert code looks like:
INSERT
INTO Traffic(... , xmlGeoLocation, ...)
VALUES (
...
<!---
<cfqueryparam CFSQLType="cf_sql_varchar" value="#xmlGeoLocation#">,
--->
'#xmlGeoLocation#',
...
)
Two bad things happen:
Québec gets turned into QuÃ©bec
I get an error saying [Macromedia][SQLServer JDBC Driver][SQLServer]XML parsing: line 8, character 16, illegal xml character
UPDATE:
The incoming text stream is mostly single-byte characters.
The é is a two-byte character, specifically C3 A9.
Also I don't have control over the incoming xml stream
I'm going to strip the header...
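For anyone puzzled by the QuÃ©bec symptom: it is what UTF-8 bytes look like when read with a single-byte code page. A minimal Python illustration (Python stands in for whatever client code handles the stream, not ColdFusion itself); Latin-1 is assumed here, and Windows-1252 gives the same result for é.

```python
# é in UTF-8 is the two bytes C3 A9.
utf8_bytes = "Québec".encode("utf-8")
print(utf8_bytes)  # b'Qu\xc3\xa9bec'

# Reading those bytes as a single-byte code page produces the mojibake.
print(utf8_bytes.decode("latin-1"))  # QuÃ©bec

# Decoding with the encoding the stream actually uses restores the text.
print(utf8_bytes.decode("utf-8"))  # Québec
```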
I'm having the same issue with a funny little apostrophe thing. I think the issue is that by the time the string is converted to XML, it's not UTF-8 anymore, but SQL Server is trying to use the header to decode it. If it's VARCHAR, it's in the client's encoding. If it's NVARCHAR, it's UTF-16. Here are some variations I tested:
SQL (varchar, UTF-8):
SELECT CONVERT(XML,'<?xml version="1.0" encoding="UTF-8"?><t>We’re sorry</t>')
Error:
XML parsing: line 1, character 44, illegal xml character
SQL (nvarchar, UTF-8):
SELECT CONVERT(XML,N'<?xml version="1.0" encoding="UTF-8"?><t>We’re sorry</t>')
Error:
XML parsing: line 1, character 38, unable to switch the encoding
SQL (varchar, UTF-16):
SELECT CONVERT(XML,'<?xml version="1.0" encoding="UTF-16"?><t>We’re sorry</t>')
Error:
XML parsing: line 1, character 39, unable to switch the encoding
SQL (nvarchar, UTF-16):
SELECT CONVERT(XML,N'<?xml version="1.0" encoding="UTF-16"?><t>We’re sorry</t>')
Worked!
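The same rule can be shown outside SQL Server: the bytes must really be in the encoding the declaration names. A minimal Python sketch, where nvarchar is modelled as UTF-16 and varchar is assumed to use Windows-1252 (the actual code page depends on the collation); `try_parse` and `doc` are illustrative helpers.

```python
import xml.etree.ElementTree as ET


def doc(declared):
    """Build the test document with the given encoding declaration."""
    return f'<?xml version="1.0" encoding="{declared}"?><t>We’re sorry</t>'


def try_parse(raw):
    """Return the <t> text, or the parser error when bytes and declaration disagree."""
    try:
        return ET.fromstring(raw).text
    except ET.ParseError as exc:
        return f"ParseError: {exc}"


# Declaration says UTF-16 and the bytes really are UTF-16 (with a BOM): parses.
print(try_parse(doc("UTF-16").encode("utf-16")))
# Declaration says UTF-8 but the bytes are Windows-1252 (’ is 0x92): fails.
print(try_parse(doc("UTF-8").encode("cp1252")))
# Declaration says UTF-8 and the bytes really are UTF-8: parses.
print(try_parse(doc("UTF-8").encode("utf-8")))
```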
Have a look at this link from w3, it tells me that:
In HTML, there is a list of some built-in character names like &eacute; for é, but XML does not have this. In XML, there are only five built-in character entities: &lt;, &gt;, &amp;, &quot; and &apos; for <, >, &, " and ' respectively. You can define your own entities in a Document Type Definition, or you can use any Unicode character (see next item).
In HTML, there are also numeric character references, such as &#38; for &. You can refer to any Unicode character, but the number is decimal, whereas in the Unicode tables the number is usually in hexadecimal. XML also allows hexadecimal references: &#x26; for example.
This leads me to believe that &#233; might work for an é character.
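A quick way to try that idea without hand-editing the document: Python's `xmlcharrefreplace` error handler rewrites every non-ASCII character as a decimal numeric character reference, and an XML parser turns both the decimal and hexadecimal forms back into é. This is a Python sketch only; that SQL Server accepts the resulting string is an assumption, not shown here.

```python
import xml.etree.ElementTree as ET

# Replace every non-ASCII character with a decimal numeric character reference.
ascii_xml = "<RegionName>Québec</RegionName>".encode("ascii", "xmlcharrefreplace")
print(ascii_xml)  # b'<RegionName>Qu&#233;bec</RegionName>'

# The parser turns &#233; (and the hex form &#xE9;) back into é.
print(ET.fromstring(ascii_xml).text)  # Québec
print(ET.fromstring(b"<r>Qu&#xE9;bec</r>").text)  # Québec
```

Because the result is pure ASCII, it reads the same under any ASCII-compatible declared encoding, which sidesteps the header mismatch.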
Also the information at this link from Microsoft states that:
SQLXML 4.0 relies upon the limited support for DTDs provided in SQL Server. SQL Server allows for an internal DTD in xml data type data, which can be used to supply default values and to replace entity references with their expanded contents. SQLXML passes the XML data "as is" (including the internal DTD) to the server. You can convert DTDs to XML Schema (XSD) documents using third-party tools, and load the data with inline XSD schemas into the database.
But all this does not help you if you don't have control over the incoming XML stream. I doubt that it is possible to save an é (or any special character, for that matter, except for the built-in character entities mentioned above) inside an XML document into a SQL Server XML field without either adding a DTD or replacing the character with its hexadecimal reference counterpart. In both cases you would need to be able to modify the XML before it goes into the database.
Just a quick example for anyone wanting to go down the "adding a DTD" route.
Here's how to add an internal DTD to an xml file which declares an entity for an é character:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [<!ENTITY eacute "&#233;">]>
<root>
<RegionName>Qu&eacute;bec</RegionName>
</root>
If you go here and search the page (Ctrl+F) for "eacute", you end up in a list of examples for other characters, which you could copy and paste into your own internal DTD.
Edit
You could of course add all entities as they are specified at the link above: <!ENTITY eacute "&#233;"><!ENTITY .. // Next entity>, or just copy them all from this file. I do understand that adding an internal DTD to every single XML file you add to the database isn't such a good idea. I would be interested to know whether adding it for one file fixes your issue, though.
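To check that such an internal DTD really expands &eacute; to é, here is a minimal Python sketch using the standard-library parser (expat), which processes internal DTD entity declarations. It illustrates the XML mechanism only, not SQL Server's own DTD handling described in the Microsoft quote above.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [<!ENTITY eacute "&#233;">]>
<root>
<RegionName>Qu&eacute;bec</RegionName>
</root>"""

# Parse bytes so the parser honours the UTF-8 declaration.
root = ET.fromstring(doc.encode("utf-8"))
print(root.find("RegionName").text)  # Québec
```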
Try to change this:
<RegionName>Québec</RegionName>
to:
<RegionName><![CDATA[Québec]]></RegionName>

SQL Server XML Value formatting newline

T-SQL XML value loses new line formats
I have an XML file loaded into SQL Server. I query it to extract the node values.
The problem is that the newline characters are lost when selecting. How can I retain the formatting so that, when I display the text on the web, it doesn't look messy without line breaks?
See the code below for details.
T-SQL code:
declare @Text xml;
set @Text = '<?xml version="1.0" encoding="utf-8"?>
<topic>
<L1>
<Subject>Subject text</Subject>
<Details>
Story Details are
This is paragraph
Text after After two line breaks
Text after After two line breaks
</Details>
</L1>
</topic>'
;with t as (select @Text [xmlcolumn])
--select * from t
SELECT x.a.value('(Subject)[1]','nvarchar(max)') as [Subject]
, x.a.value('(Details)[1]','nvarchar(max)') as [Details]
FROM t
cross apply
t.xmlcolumn.nodes('//L1') x(a)
Update: I misread your question. The problem with the newlines is purely in SQL Server Management Studio, which does not display those newlines. When you read your XML from an application in C# or VB.NET, those newlines will still be there, trust me.
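Since the newlines survive in the value the application reads, the remaining step for web display is converting them, because browsers collapse whitespace in HTML. A minimal Python sketch standing in for the application code, using a trimmed copy of the question's XML with an assumed blank line between paragraphs; `details_as_html` is a hypothetical helper. CSS `white-space: pre-line` on the container is an alternative to `<br>` tags.

```python
import html
import xml.etree.ElementTree as ET

doc = """<topic><L1><Subject>Subject text</Subject><Details>
Story Details are
This is paragraph

Text after After two line breaks
</Details></L1></topic>"""


def details_as_html(xml_text):
    """Hypothetical helper: HTML-escape the Details text and turn newlines into <br>."""
    details = ET.fromstring(xml_text).find("L1/Details").text
    lines = details.strip("\n").split("\n")
    return "<br>\n".join(html.escape(line) for line in lines)


print(details_as_html(doc))
```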
But this original answer might also be relevant in other cases - you need to be aware that SQL Server is not storing your XML "as is" - it parses and converts it. So when you ask to get it back, it might look slightly different, but it's still the same XML functionally.
Yes, this is normal, expected behavior.
SQL Server stores your XML in a tokenized format, i.e. it doesn't store the actual textual representation of your XML; it parses and tokenizes your XML into XML fragments that are then stored inside this XML datatype.
Therefore, when you query it again, you'll get back a semantically correct and identical representation - but there's a possibility that certain textual representations are different.
E.g. when you pass in an empty XML element something like this:
<MyEmptyElement></MyEmptyElement>
you'll get back the "short" form of that when you retrieve the XML from SQL Server again:
<MyEmptyElement />
This is not the exact same text - but it's 100% the same XML from a semantic perspective.
As far as I know, you cannot influence this behavior in any way - you'll just have to live with it.
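Other XML tools normalize in the same way. As an analogy (Python's ElementTree, not SQL Server), parsing and re-serializing changes the text but not the XML itself, which a canonical (C14N 2.0) comparison confirms; `ET.canonicalize` requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET

original = "<Root><MyEmptyElement></MyEmptyElement><A  b='1'/></Root>"

# Round trip: empty elements come back in short form, attributes double-quoted.
round_tripped = ET.tostring(ET.fromstring(original), encoding="unicode")
print(round_tripped)  # <Root><MyEmptyElement /><A b="1" /></Root>

# Canonical forms match, so the two texts are the same XML semantically.
print(ET.canonicalize(original) == ET.canonicalize(round_tripped))  # True
```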
