T-SQL: XML .value() loses newline formatting
I have an XML file loaded into SQL Server, and I query it to extract node values.
The problem is that the newline characters are lost in the selected values. How can I retain the formatting, so that the text does not look messy, without line breaks, when I display it on the web?
See the code below for details.
T-SQL code:
declare @Text xml;
set @Text = '<?xml version="1.0" encoding="utf-8"?>
<topic>
<L1>
<Subject>Subject text</Subject>
<Details>
Story Details are
This is paragraph
Text after two line breaks
Text after two line breaks
</Details>
</L1>
</topic>'
;with t as (select @Text [xmlcolumn])
--select * from t
SELECT x.a.value('(Subject)[1]','nvarchar(max)') as [Subject]
, x.a.value('(Details)[1]','nvarchar(max)') as [Details]
FROM t
cross apply
t.xmlcolumn.nodes('//L1') x(a)
Update: I misread your question. The problem with the newlines is purely in SQL Server Management Studio: its grid results cannot display those newlines. When you read your XML from an application in C# or VB.NET, those newlines will still be there, trust me.
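The answer mentions C# or VB.NET; as a minimal sketch in Python instead (a substitution for illustration, not the asker's stack), the snippet below parses a document shaped like the question's and shows that the newline characters inside <Details> survive parsing. Turning them into <br> tags for the web page is an assumption about how the text is rendered; CSS such as white-space: pre-line would be an alternative. The names DOC and details_as_html are hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# Sample document shaped like the question's <topic>/<L1> example.
DOC = """<topic>
<L1>
<Subject>Subject text</Subject>
<Details>
Story Details are
This is paragraph

Text after two line breaks
</Details>
</L1>
</topic>"""


def details_as_html(xml_text: str) -> str:
    """Return the first <Details> text as HTML, with each newline kept as <br>."""
    root = ET.fromstring(xml_text)
    details = root.find("./L1/Details")
    text = (details.text or "").strip()
    # Escape HTML special characters first so the text is safe to embed,
    # then mark every line break explicitly.
    return html.escape(text).replace("\n", "<br>\n")


if __name__ == "__main__":
    # The raw text node still contains its newline characters.
    print(repr(ET.fromstring(DOC).find("./L1/Details").text))
    print(details_as_html(DOC))
```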
But this original answer might also be relevant in other cases - you need to be aware that SQL Server is not storing your XML "as is" - it parses and converts it. So when you ask to get it back, it might look slightly different, but it's still the same XML functionally.
Yes, this is normal, expected behavior.
SQL Server stores your XML in a tokenized format, i.e. it doesn't store the actual textual representation of your XML; it parses and tokenizes your XML into XML fragments that are then stored inside this XML datatype.
Therefore, when you query it again, you'll get back a semantically identical representation, but certain textual details may differ.
For example, when you pass in an empty XML element like this:
<MyEmptyElement></MyEmptyElement>
you'll get back the "short" form of that when you retrieve the XML from SQL Server again:
<MyEmptyElement />
This is not the exact same text - but it's 100% the same XML from a semantic perspective.
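The same kind of normalisation happens outside SQL Server as well. As an analogy only (Python's ElementTree, not SQL Server's XML type), parsing keeps just the structure, and the serializer then chooses its own textual form:

```python
import xml.etree.ElementTree as ET

original = "<MyEmptyElement></MyEmptyElement>"

# Parsing keeps only the structure: one element, no children, no text.
element = ET.fromstring(original)

# On output, the serializer picks its own textual form for an empty element.
short_form = ET.tostring(element, encoding="unicode")
long_form = ET.tostring(element, encoding="unicode", short_empty_elements=False)

print(short_form)  # <MyEmptyElement />
print(long_form)   # <MyEmptyElement></MyEmptyElement>
```

Both strings describe the same XML; only the spelling differs, which is exactly the situation described above.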
As far as I know, you cannot influence this behavior in any way - you'll just have to live with it.
Related
I want to include a line of plain text in a FOR XML PATH query, namely
<Cat>
but I am having difficulties.
When I try, it comes out with weird characters.
Please help.
Thanks.
select
'<Cat>'
I expect this:
<Cat>
but it displays this:
&lt;Cat&gt;
I must admit that your question is not clear...
XML is not just some text with fancy extras, but a very strictly organised, text-based container for data.
A simple SELECT '<Cat>' would never return &lt;Cat&gt; without a FOR XML somewhere in your query. So please show us a (reduced!) example of your full query and the expected output, ideally as an MCVE (a stand-alone sample with DDL, sample data, your own attempt and the expected output).
Just some general remarks:
If you place a bare <Cat> within your XML, the whole output will be broken XML. This opening tag requires a closing </Cat> (or, alternatively, a self-closing <Cat />).
Presumably you are trying to add literal tags to your XML, as you would in XSLT, JS, ASP.NET or any other XML/HTML-producing approach.
Presumably your solution will be a FOR XML PATH() approach, without the need for a literal tag within your XML.
Just to give you an idea:
SELECT 'test' AS [SomeElement] FOR XML PATH('SomeRowTag'),ROOT('SomeRootTag');
produces this XML:
<SomeRootTag>
<SomeRowTag>
<SomeElement>test</SomeElement>
</SomeRowTag>
</SomeRootTag>
If you want to add a <Cat> element, you can use an XPath-like column alias, as here:
SELECT 'test' AS [Cat/SomeElement] --<-- You can add nest-levels here!
FOR XML PATH('SomeRowTag'),ROOT('SomeRootTag');
The result
<SomeRootTag>
<SomeRowTag>
<Cat>
<SomeElement>test</SomeElement>
</Cat>
</SomeRowTag>
</SomeRootTag>
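To see why the original attempt produced &lt;Cat&gt;, it helps to separate text that merely looks like a tag from an actual element. Here is a small analogy in Python's ElementTree (not FOR XML itself; the element names are borrowed from the examples above). Any XML serializer has to entitize < and > inside text, while an element gets its closing tag automatically:

```python
import xml.etree.ElementTree as ET

# '<Cat>' supplied as *text*: the serializer has to entitize the brackets.
as_text = ET.Element("SomeRowTag")
as_text.text = "<Cat>"
print(ET.tostring(as_text, encoding="unicode"))
# <SomeRowTag>&lt;Cat&gt;</SomeRowTag>

# <Cat> created as an *element*: it automatically gets its closing tag.
as_element = ET.Element("SomeRowTag")
cat = ET.SubElement(as_element, "Cat")
ET.SubElement(cat, "SomeElement").text = "test"
print(ET.tostring(as_element, encoding="unicode"))
# <SomeRowTag><Cat><SomeElement>test</SomeElement></Cat></SomeRowTag>
```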
I'm trying to insert some XML into a SQL Server database table which uses column type XML.
This works fine most of the time, but one user submitted some XML with the character with hex value 3, and SQL Server gave the error "hexadecimal value 0x03, is an invalid character."
Now I want to check, and remove, any invalid XML characters before doing the insert, and there are various articles suggesting how invalid XML characters can be replaced using regex or something similar.
However, the problem for me is that the user submitted the XML document with the invalid character escaped as a numeric character reference (such as &#x3;), and none of the methods I've found will detect this. This is also why the error was not detected earlier: the problem only occurs when inserting the XML into the SQL database.
Has anyone written a function that will check for all escaped invalid XML characters? I suppose the character above could be written in several ways (decimal or hexadecimal, with or without leading zeros), so it's quite hard to catch them all.
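One way to catch every spelling is to decode each numeric character reference and test the resulting code point, rather than matching specific strings. Below is a minimal Python sketch of that idea (the language is chosen for illustration, and the function names are hypothetical). It handles the decimal and hexadecimal forms defined by XML 1.0 and flags code points outside the XML 1.0 Char production. Because it works on raw text, it would also match reference-like text inside a CDATA section or comment, where it is literal and harmless; treat it as a pre-insert filter, not a full parser.

```python
import re
from typing import List

# XML 1.0 numeric character references: decimal "&#3;" or hexadecimal "&#x3;".
# The spec requires a lowercase "x"; "&#X3;" is not a character reference.
CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def is_xml10_char(code_point: int) -> bool:
    """True if the code point is allowed by the XML 1.0 'Char' production."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def _code_point(match) -> int:
    """Decode the number inside a matched reference (hex or decimal)."""
    hex_digits, dec_digits = match.groups()
    return int(hex_digits, 16) if hex_digits is not None else int(dec_digits)


def find_invalid_char_refs(text: str) -> List[str]:
    """Return each character reference that points at a forbidden character."""
    return [
        m.group(0)
        for m in CHAR_REF.finditer(text)
        if not is_xml10_char(_code_point(m))
    ]


def strip_invalid_char_refs(text: str) -> str:
    """Remove forbidden character references and leave valid ones untouched."""
    return CHAR_REF.sub(
        lambda m: m.group(0) if is_xml10_char(_code_point(m)) else "", text
    )


if __name__ == "__main__":
    sample = "<a>bad&#x3; also bad&#0003; fine&#233;</a>"
    print(find_invalid_char_refs(sample))
    print(strip_invalid_char_refs(sample))
```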
Thanks in advance for any help you can offer.
You could try importing the XML into a temporary varchar(max) variable or table column, using REPLACE to strip out the offending characters, and then inserting the cleansed string into the destination, casting it to XML.
A piece of T-SQL code doesn't behave the same in production as in the test environment. When the code below is executed in production, it returns data:
SELECT [col1xml]
FROM [DBName].[dbo].[Table1] (NOLOCK)
WHERE (cast([col1xml] as xml).value('(/Payment/****/trn1)[1]','nvarchar(20)') ='123456')
However, the same code returns the following error when run in Test:
Msg 9402, Level 16, State 1, Line 9
XML parsing: line 1, character 38, unable to switch the encoding
I found a fix on this site that rewrites the UTF encoding declaration, and it works in both production and test (see below). However, I need to explain to the developers why this behavior occurs, and give them a rationale for why they should change their code (if that is the case).
WHERE CAST(
REPLACE(CAST(col1xml AS VARCHAR(MAX)), 'encoding="utf-16"', 'encoding="utf-8"')
AS XML).value('(/Payment/****/trn1)[1]','NVARCHAR(max)') ='123456'
I have compared both databases and checked for anything obvious, such as ANSI_NULLS and ANSI_PADDING. Everything is the same, including the SQL Server version: SQL Server 2012, version 11.0.5388. The data differs between the environments, but the table schema is identical and the data type of col1xml is NTEXT.
In SQL Server you should store XML in a column of type XML. This native type has a lot of advantages: it is much faster and has implicit validity checks.
From your question I take it that you store your XML in NTEXT. This type has been deprecated since SQL Server 2005 and will be removed in a future version! You ought to change this soon!
SQL Server knows two kinds of strings:
1-byte strings (CHAR or VARCHAR), which use an extended-ASCII code page determined by the collation
Important: This is not UTF-8! Native UTF-8 support (collations ending in _UTF8) only arrived with SQL Server 2019.
2-byte strings (NCHAR or NVARCHAR), which are UTF-16 (UCS-2)
If the XML has a leading declaration with an encoding (in most cases utf-8 or utf-16), you can get into trouble.
If the XML is stored as a 2-byte string (which NTEXT implies), the declaration has to be utf-16. With a 1-byte string it should be utf-8.
The best (and easiest) approach is to omit the declaration completely. You do not need it. Storing the XML in the native XML type drops this declaration automatically.
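If the declaration cannot simply be dropped by storing into the native type first (for example while the data still lives in NTEXT), it could be removed on the client before the text is sent. A minimal Python sketch, assuming the declaration is the very first thing in the document; the function name strip_xml_declaration is hypothetical:

```python
import re

# An XML declaration is only allowed at the very start of a document
# (optionally after a byte-order mark), so the pattern is anchored with \A.
# "<?xml-stylesheet ...?>" is not matched because "xml" must be followed by whitespace.
XML_DECLARATION = re.compile(r"\A\ufeff?<\?xml\s[^?]*\?>\s*")


def strip_xml_declaration(xml_text: str) -> str:
    """Return the document without a leading <?xml ...?> declaration, if any."""
    return XML_DECLARATION.sub("", xml_text, count=1)


if __name__ == "__main__":
    print(strip_xml_declaration('<?xml version="1.0" encoding="utf-16"?><root>test1</root>'))
```

Documents without a declaration pass through unchanged, so the function is safe to apply to every row.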
What you should do: Create a new column of type XML and shuffle all your XMLs to this column. Get rid of any TEXT, NTEXT and IMAGE columns you might have!
The next step is: Be happy and enjoy the fast and easy going with the native XML type :-D
UPDATE: Differences between environments
You write: Data between environments is different
The error happens here:
cast([col1xml] as xml)
If your column stored the XML in the native type, you would not need a cast (which is very expensive!) at all. In your case, however, whether this cast succeeds depends on the actual XML. As it is stored in NTEXT, it is a 2-byte string. If an XML document starts with a declaration stating an incompatible encoding (in most cases utf-8), the cast fails. Since your data differs between the environments, presumably only the Test data contains such documents.
Try this:
This works
DECLARE @xml2Byte_UTF16 NVARCHAR(100)='<?xml version="1.0" encoding="utf-16"?><root>test1</root>';
SELECT CAST(@xml2Byte_UTF16 AS XML);
DECLARE @xml1Byte_UTF8 VARCHAR(100)='<?xml version="1.0" encoding="utf-8"?><root>test2</root>';
SELECT CAST(@xml1Byte_UTF8 AS XML);
This fails
DECLARE @xml2Byte_UTF8 NVARCHAR(100)='<?xml version="1.0" encoding="utf-8"?><root>test3</root>';
SELECT CAST(@xml2Byte_UTF8 AS XML);
DECLARE @xml1Byte_UTF16 VARCHAR(100)='<?xml version="1.0" encoding="utf-16"?><root>test4</root>';
SELECT CAST(@xml1Byte_UTF16 AS XML);
Play around with VARCHAR and NVARCHAR and utf-8 and utf-16...
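The underlying rule, that the declared encoding must match how the bytes are actually stored, is not specific to SQL Server. As a small analogy, here is Python's ElementTree, which uses the expat parser (not SQL Server's parser, and expat reports its own error message rather than "unable to switch the encoding"). The helper parses is hypothetical:

```python
import xml.etree.ElementTree as ET


def parses(text: str, byte_encoding: str) -> bool:
    """Encode the document with byte_encoding, then let expat parse the raw bytes."""
    try:
        ET.fromstring(text.encode(byte_encoding))
        return True
    except ET.ParseError:
        return False


utf16_decl = '<?xml version="1.0" encoding="utf-16"?><root>test</root>'
utf8_decl = '<?xml version="1.0" encoding="utf-8"?><root>test</root>'

print(parses(utf16_decl, "utf-16"))  # 2-byte data, declaration says utf-16
print(parses(utf8_decl, "utf-8"))    # 1-byte data, declaration says utf-8
print(parses(utf8_decl, "utf-16"))   # 2-byte data claiming utf-8
print(parses(utf16_decl, "utf-8"))   # 1-byte data claiming utf-16
```

The four calls mirror the four T-SQL variables above: the two matching combinations parse, the two mismatched ones are rejected.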
I have XML docs stored in a TEXT column (collation_name French_CI_AS, character_set_name iso_1).
I want to move them to a new table, into an XML column, with the following SQL:
INSERT INTO Signature(JustifId, SignedJustif)
SELECT JustifID, CONVERT(XML, Justif.SignedJustif,2)
FROM Justif
When I do this, I get character encoding errors that point to the high-ASCII character in this fragment: "presentación, OU=CERES, O=FNMT-RCM, C=ES", a Spanish accented o in an X.509 certificate.
This ó started life in UTF-8, became UTF-16 as a .NET string, then became iso_1 when inserted into the TEXT column. I can copy and paste it into a web page without any problem. How, then, do I move it from a TEXT column to an XML column in the same database (and why is this so difficult)?
The CONVERT idea came from this post. This MS page covers creating XML from varchar and nvarchar.
This is tricky... A conversion on byte-level might lead to unexpected results...
Try this
INSERT INTO Signature(JustifId, SignedJustif)
SELECT JustifID, CONVERT(XML, CONVERT(VARCHAR(MAX),Justif.SignedJustif))
FROM Justif
If you still get issues, try to specify the specific collation together with the conversion and/or try to convert to NVARCHAR(MAX).
If this doesn't help, please edit your question and post a (reduced) example. Best would be a test scenario with a minimal XML where one can reproduce the error.
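The question does not show whether the stored documents carry an XML declaration. Assuming they declare utf-8 while their bytes are really iso_1 (Latin-1), the following Python sketch (expat via ElementTree, not SQL Server) shows why a parser rejects the ó, and that the same bytes parse once the declaration tells the truth, or when the text is UTF-8 with no declaration at all. The names BODY and try_parse are hypothetical:

```python
import xml.etree.ElementTree as ET

BODY = "<CN>presentación</CN>"


def try_parse(data: bytes) -> str:
    """Return the element text, or the parser's error message."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError as err:
        return f"error: {err}"


# Assumption: a stored document that declares utf-8 but whose bytes are iso_1 (Latin-1).
mislabelled = ('<?xml version="1.0" encoding="utf-8"?>' + BODY).encode("latin-1")
# The same Latin-1 bytes with a declaration that tells the truth.
labelled = ('<?xml version="1.0" encoding="ISO-8859-1"?>' + BODY).encode("latin-1")
# No declaration at all, bytes in UTF-8 (the XML default).
undeclared_utf8 = BODY.encode("utf-8")

for name, data in [
    ("mislabelled", mislabelled),
    ("labelled", labelled),
    ("undeclared_utf8", undeclared_utf8),
]:
    print(name, "->", try_parse(data))
```

In the mislabelled case the single byte 0xF3 for ó is not valid UTF-8, which is the same kind of conflict the byte-level CONVERT runs into.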
I am trying to insert the following string into a SQL Server XML column:
<?xml version="1.0" encoding="UTF-8"?>
<Response>
<Ip>x.x.x.x</Ip>
<CountryCode>CA</CountryCode>
<CountryName>Canada</CountryName>
<RegionCode>QC</RegionCode>
<RegionName>Québec</RegionName>
<City>Dorval</City>
<ZipCode>h9p1j3</ZipCode>
<Latitude>45.45000076293945</Latitude>
<Longitude>-73.75</Longitude>
<MetroCode></MetroCode>
<AreaCode></AreaCode>
</Response>
The insert code looks like:
INSERT
INTO Traffic(... , xmlGeoLocation, ...)
VALUES (
...
<!---
<cfqueryparam CFSQLType="cf_sql_varchar" value="#xmlGeoLocation#">,
--->
'#xmlGeoLocation#',
...
)
Two bad things happen:
Québec gets turned into QuÃ©bec
I get an error saying [Macromedia][SQLServer JDBC Driver][SQLServer]XML parsing: line 8, character 16, illegal xml character
UPDATE:
The incoming text stream is mostly single-byte characters.
The é is a two-byte character, specifically C3 A9.
Also, I don't have control over the incoming XML stream.
I'm going to strip the header...
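The "QuÃ©bec" symptom is what happens when UTF-8 bytes are read back as a single-byte code page. A minimal Python sketch, assuming the driver or page treats the stream as Windows-1252 (Latin-1 gives the same result for these bytes); the question does not say which code page is involved:

```python
# UTF-8 encodes "é" as the two bytes C3 A9 (matching the update in the question).
utf8_bytes = "Québec".encode("utf-8")
print(utf8_bytes.hex(" "))  # 51 75 c3 a9 62 65 63

# Assumption: somewhere along the way those bytes are read as a single-byte
# code page such as Windows-1252, which yields one character per byte.
misread = utf8_bytes.decode("cp1252")
print(misread)  # QuÃ©bec
```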
I'm having the same issue with a funny little apostrophe thing. I think the issue is that by the time the string is converted to XML, it's not UTF-8 anymore, but SQL Server is trying to use the header to decode it. If it's VARCHAR, it's in the code page of its collation; if it's NVARCHAR, it's UTF-16. Here are some variations I tested:
SQL (varchar, UTF-8):
SELECT CONVERT(XML,'<?xml version="1.0" encoding="UTF-8"?><t>We’re sorry</t>')
Error:
XML parsing: line 1, character 44, illegal xml character
SQL (nvarchar, UTF-8):
SELECT CONVERT(XML,N'<?xml version="1.0" encoding="UTF-8"?><t>We’re sorry</t>')
Error:
XML parsing: line 1, character 38, unable to switch the encoding
SQL (varchar, UTF-16)
SELECT CONVERT(XML,'<?xml version="1.0" encoding="UTF-16"?><t>We’re sorry</t>')
Error:
XML parsing: line 1, character 39, unable to switch the encoding
SQL (nvarchar, UTF-16)
SELECT CONVERT(XML,N'<?xml version="1.0" encoding="UTF-16"?><t>We’re sorry</t>')
Worked!
Have a look at this link from the W3C; it says:
In HTML, there is a list of some built-in character names like &eacute; for é but XML does not have this. In XML, there are only five built-in character entities: &lt;, &gt;, &amp;, &quot; and &apos; for <, >, &, " and ' respectively. You can define your own entities in a Document Type Definition, or you can use any Unicode character (see next item).
In HTML, there are also numeric character references, such as &#38; for &. You can refer to any Unicode character, but the number is decimal, whereas in the Unicode tables the number is usually in hexadecimal. XML also allows hexadecimal references: &#x26; for example.
This leads me to believe that a numeric character reference such as &#xE9; might work for an é character.
Also the information at this link from Microsoft states that:
SQLXML 4.0 relies upon the limited support for DTDs provided in SQL Server. SQL Server allows for an internal DTD in xml data type data, which can be used to supply default values and to replace entity references with their expanded contents. SQLXML passes the XML data "as is" (including the internal DTD) to the server. You can convert DTDs to XML Schema (XSD) documents using third-party tools, and load the data with inline XSD schemas into the database.
But all this does not help you if you don't have control over the incoming XML stream. I doubt that it is possible to save an é (or any special character for that matter, except for the built in character entities mentioned above) inside an XML document into an SQL Server XML field, without either adding a DTD or replacing the character with its hexadecimal reference counterpart. In both cases you would need to be able to modify the XML before it goes into the database.
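Both routes mentioned above, a numeric character reference or an entity declared in an internal DTD, can be checked quickly with Python's ElementTree used as a stand-in parser (SQL Server's own entity handling may differ in detail):

```python
import xml.etree.ElementTree as ET

# Numeric character references need no DTD: hexadecimal and decimal both work.
hex_ref = ET.fromstring("<RegionName>Qu&#xE9;bec</RegionName>").text
dec_ref = ET.fromstring("<RegionName>Qu&#233;bec</RegionName>").text
print(hex_ref, dec_ref)  # Québec Québec

# A named entity such as &eacute; is undefined in plain XML ...
try:
    ET.fromstring("<RegionName>Qu&eacute;bec</RegionName>")
except ET.ParseError as err:
    print("rejected:", err)

# ... unless an internal DTD declares it.
with_dtd = (
    '<!DOCTYPE root [<!ENTITY eacute "&#233;">]>'
    "<root><RegionName>Qu&eacute;bec</RegionName></root>"
)
print(ET.fromstring(with_dtd).find("RegionName").text)  # Québec
```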
Just a quick example for anyone wanting to go down the "adding a DTD" route.
Here's how to add an internal DTD to an xml file which declares an entity for an é character:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE root [<!ENTITY eacute "&#233;">]>
<root>
<RegionName>Qu&eacute;bec</RegionName>
</root>
If you go here and search the page (Ctrl+F) for "eacute", you end up in a list of examples for other characters, which you could just copy and paste into your own internal DTD.
Edit
You could of course add all entities as they are specified at the link above: <!ENTITY eacute "&#233;"><!ENTITY .. // Next entity>, or just copy them all from this file. I do understand that adding an internal DTD to every single XML file you add to the database isn't such a good idea. I would be interested to know whether adding it for one file fixes your issue, though.
Try to change this:
<RegionName>Québec</RegionName>
to:
<RegionName><![CDATA[Québec]]></RegionName>
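A CDATA section only changes how the text is written in the document; once parsed, the value is the same as in the plain version. A quick check with Python's ElementTree (a stand-in parser, not SQL Server):

```python
import xml.etree.ElementTree as ET

plain = ET.fromstring("<RegionName>Québec</RegionName>").text
cdata = ET.fromstring("<RegionName><![CDATA[Québec]]></RegionName>").text
print(plain, cdata, plain == cdata)  # Québec Québec True
```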