SQL string to varbinary through XML different than nvarchar - sql-server

We currently have a function in SQL which I simply do not understand.
Currently we convert an nvarchar to XML, and then select the XML value, and convert that to a varbinary.
When I try to simplify this to convert the nvarchar directly to varbinary, the output is different... Why?
--- Current situation:
Declare @inputString nvarchar(max) = '4d95605d1b8f3bca5ea3e0d2af26027004d17218152e726da0622d669a71f85c'
--1: input to XML
declare @inputXML XML = convert(varchar(max), @inputString)
--2: input XML to binary
declare @inputBinary varbinary(max) = @inputXML.value('(/)[1]', 'varbinary(max)')
select @inputString -- 4d95605d1b8f3bca5ea3e0d2af26027004d17218152e726da0622d669a71f85c
select @inputXML -- 4d95605d1b8f3bca5ea3e0d2af26027004d17218152e726da0622d669a71f85c
select @inputBinary -- 0xE1DF79EB4E5DD5BF1FDDB71AE5E6B77B477669FDBAD36EF4D38775EF6D7CD79D9EEF6E9D6B4EB6D9DEBAF5AEF57FCE5C
--- New situation
--1: Input to binary
declare @inputString2 varbinary(max) = CAST(@inputString as varbinary(max));
select @inputString2 -- 0x3400640039003500360030003500640031006200380066003300620063006100350065006100330065003000640032006100660032003600300032003700300030003400640031003700320031003800310035003200650037003200360064006100300036003200320064003600360039006100370031006600380035006300

Using the value() function to read an XML value as varbinary(max) treats the text as Base64-encoded data. Casting a string to varbinary(max) does not; it simply returns the bytes of the string as stored (for NVARCHAR, two bytes per character).
If you use the input string QQA=, which is the letter A in UTF-16 LE encoded as Base64, you will see more clearly what is happening.
The XML path gives you 0x4100, the varbinary of the letter A, while the direct cast gives you 0x5100510041003D00: two 5100 = "Q", one 4100 = "A", followed by 3D00 = "=".
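Both code paths can be replayed outside SQL Server with a short Python sketch. It assumes that `value(..., 'varbinary(max)')` behaves like a Base64 decode of the node text and that `CAST(nvarchar AS varbinary(max))` returns the UTF-16 LE bytes of the string; the helper names are hypothetical and exist only for this illustration.

```python
import base64


def xml_value_as_varbinary(text: str) -> bytes:
    """Hypothetical stand-in for XML value(..., 'varbinary(max)'): text is read as Base64."""
    return base64.b64decode(text)


def cast_nvarchar_to_varbinary(text: str) -> bytes:
    """Hypothetical stand-in for CAST(nvarchar AS varbinary(max)): raw UTF-16 LE bytes."""
    return text.encode("utf-16-le")


def to_sql_hex(data: bytes) -> str:
    """Formats bytes the way SSMS displays a varbinary."""
    return "0x" + data.hex().upper()


input_string = "4d95605d1b8f3bca5ea3e0d2af26027004d17218152e726da0622d669a71f85c"

# The 64 hex characters happen to be valid Base64, so they decode to 48 unrelated bytes.
print(to_sql_hex(xml_value_as_varbinary(input_string))[:14])      # 0xE1DF79EB4E5D
# The direct cast just exposes the two-byte characters '4', 'd', '9', ...
print(to_sql_hex(cast_nvarchar_to_varbinary(input_string))[:14])  # 0x340064003900

print(to_sql_hex(xml_value_as_varbinary("QQA=")))      # 0x4100
print(to_sql_hex(cast_nvarchar_to_varbinary("QQA=")))  # 0x5100510041003D00
```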

I might be getting something wrong, but if I understand you correctly, you simply want to get a real binary from a HEX string that just looks like a binary. Correct?
Above I wrote "simply", but this was not simple at all a while ago.
I'm not sure at the moment, but I think it was SQL Server 2008 that enhanced CONVERT() (read about the binary styles and how the third parameter works). Try this:
DECLARE @hexString VARCHAR(max)='4d95605d1b8f3bca5ea3e0d2af26027004d17218152e726da0622d669a71f85c';
SELECT CONVERT(varbinary(max),@hexString,2);
The result is a real binary:
0x4D95605D1B8F3BCA5EA3E0D2AF26027004D17218152E726DA0622D669A71F85C
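In Python terms, style 2 of `CONVERT` corresponds to turning each pair of hex digits into one byte, which is what `bytes.fromhex` does. This sketch is only for checking the expected result outside SQL Server. It does not describe how SQL Server implements the conversion.

```python
hex_string = "4d95605d1b8f3bca5ea3e0d2af26027004d17218152e726da0622d669a71f85c"

# Each pair of hex digits becomes one byte, like CONVERT(varbinary(max), ..., 2).
real_binary = bytes.fromhex(hex_string)

print(len(real_binary))                  # 32 bytes, not 64 characters
print("0x" + real_binary.hex().upper())  # 0x4D95605D1B8F3BCA...
```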
What might be the reason for your issue:
Very long ago, I think until v2005, the default encoding of varbinaries in XML was a HEX string. Later this was changed to base64. Might your code have been written for a very old environment and later upgraded to a newer version?
Today we use XML in a similar way to create and read base64, which is not supported otherwise. Maybe your code did something similar with HEX strings...?
One more hint: the many 00 bytes in your New Situation example clearly show that this is a two-byte encoded NVARCHAR string. In contrast, your Current Situation shows a simple HEX string.
Your final result is just the binary pattern of your input as string:

Related

Select Hex/Char conversion

I have some data in a SQL database stored in the format below, which I would like to convert to a readable string:
540045005300540049004E00470031003200330034
I would like to run some kind of SELECT statement to return the text which should be TESTING1234
It appears to be in Hex format separated by 00 between each character, so if I run these statements:
SELECT CHAR(0x54)
SELECT CHAR(0x45)
This returns:
T
E
Is there any way I can convert the whole string in one statement?
Thanks!
The 00 bytes point to a 2-byte encoding, which SQL Server represents as NVARCHAR. Try this:
SELECT CAST(0x540045005300540049004E00470031003200330034 AS NVARCHAR(MAX))
Or directly from the HEX-string as string:
SELECT CAST(CONVERT(VARBINARY(MAX),'540045005300540049004E00470031003200330034',2) AS NVARCHAR(MAX));
The result is TESTING1234
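The 2-byte pattern can also be checked with a Python sketch that decodes the bytes as UTF-16 LE, the encoding NVARCHAR uses. Note that the sample as posted contains 21 bytes, an odd number: the final `34` has no `00` after it. The sketch assumes that byte was lost when the sample was posted and appends it before decoding. It does not show how SQL Server itself treats an odd trailing byte.

```python
hex_text = "540045005300540049004E00470031003200330034"

raw = bytes.fromhex(hex_text)
print(len(raw))  # 21: odd, so the last UTF-16 code unit is incomplete

# Assumption: the trailing 00 of the final character was lost in the posted sample.
text = (raw + b"\x00").decode("utf-16-le")
print(text)  # TESTING1234
```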
Some more background on string encoding
SQL Server knows exactly two types of strings:
1-byte-encoded VARCHAR / CHAR
2-byte-encoded nVARCHAR / nCHAR
The 1-byte string is extended ASCII; the related collation provides a code page that maps the non-plain-Latin characters (it is not UTF-8, as people sometimes claim; only the UTF-8 collations introduced in SQL Server 2019 change that).
The 2-byte string is UCS-2 (almost the same as UTF-16).
I've corrected the word unicode above, as it is not correct actually.
There are many encodings SQL Server cannot interpret natively.
The string above looks like it is good for NVARCHAR, but this is not guaranteed in any case.
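A small Python sketch makes the 1-byte versus 2-byte difference concrete. It assumes a collation that uses code page 1252 (for example `SQL_Latin1_General_CP1_CI_AS`) and uses Python's codecs as stand-ins for what VARCHAR and NVARCHAR store.

```python
sample = "é"  # any character outside plain ASCII

varchar_bytes = sample.encode("cp1252")      # 1 byte, mapped through code page 1252
nvarchar_bytes = sample.encode("utf-16-le")  # 2 bytes, UTF-16 LE as NVARCHAR stores it
utf8_bytes = sample.encode("utf-8")          # what a UTF-8 encoder would produce instead

print(varchar_bytes.hex(), nvarchar_bytes.hex(), utf8_bytes.hex())  # e9 e900 c3a9
```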
Some more background on binary encoding
SQL Server knows BINARY and VARBINARY as real BLOB types. In the result of a SELECT they are presented as a HEX string, and in a script you can use a HEX literal as native input. But it is important to know that this HEX string is not the actual value, just its human-readable representation on a computer screen.
And then there is a real string that looks like a HEX string (but isn't binary).
0x123 != '0x123'
If you have a HEX string that comes to you as a "normal" string (e.g. in a text-based container like a CSV file or an XML document), you have to convert it.
And, not really related to this question, just to mention it: there are other string-based representations of binary data, like base64.
Some examples
--start with a normal string
DECLARE @str VARCHAR(100)='This is a test to explain conversions from string to hex to binary and back';
--see the HEX string (real binary!)
SELECT CAST(@str AS VARBINARY(MAX)) ThisIsTheHexStringOfTheString;
--I copied the binary behind the "=" _without_ quotes
DECLARE @ThisIsTheBinary VARBINARY(MAX)=0x546869732069732061207465737420746F206578706C61696E20636F6E76657273696F6E732066726F6D20737472696E6720746F2068657820746F2062696E61727920616E64206261636B;
--This can be re-cast directly
SELECT CAST(@ThisIsTheBinary AS VARCHAR(MAX)) ThisIsReconvertedBinary;
--there is an undocumented function providing a HEX string from a binary
DECLARE @aHEXstring VARCHAR(MAX)=sys.fn_varbintohexstr(CAST(@str AS VARBINARY(MAX)));
--This string looks exactly the same as above, but it is a string
SELECT @aHEXstring AS ThisIsStringWhichLooksLikeHEX;
--You can use dynamic SQL
EXEC('SELECT CAST(' + @aHEXstring + ' AS VARCHAR(MAX)) AS CastedViaDynamicSQL');
--or CONVERT's abilities (read the documentation!)
SELECT CAST(CONVERT(VARBINARY(MAX),@aHEXstring,1) AS VARCHAR(MAX)) AS ConvertedViaCONVERT;
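The same round trip can be sketched in Python to show why a string that looks like hex is not the binary itself, and how the two CONVERT styles differ in handling the `0x` prefix (style 1 expects it, style 2 does not). `convert_style1` and `convert_style2` are hypothetical stand-ins. The sketch also assumes a code page 1252 collation for the VARCHAR.

```python
text = "This is a test to explain conversions from string to hex to binary and back"

# Like CAST(@str AS VARBINARY(MAX)) for a VARCHAR under a code page 1252 collation:
# one byte per character.
binary = text.encode("cp1252")

# A str that merely *looks* like the binary (compare the output of sys.fn_varbintohexstr).
hex_looking_string = "0x" + binary.hex()


def convert_style1(s: str) -> bytes:
    """Hypothetical stand-in for CONVERT(VARBINARY(MAX), s, 1): requires a 0x prefix."""
    if s[:2].lower() != "0x":
        raise ValueError("style 1 expects a 0x prefix")
    return bytes.fromhex(s[2:])


def convert_style2(s: str) -> bytes:
    """Hypothetical stand-in for CONVERT(VARBINARY(MAX), s, 2): no 0x prefix."""
    return bytes.fromhex(s)


print(isinstance(hex_looking_string, str))                              # True: text, not bytes
print(convert_style1(hex_looking_string) == binary)                     # True
print(convert_style2(hex_looking_string[2:]).decode("cp1252") == text)  # True
```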

TSQL - Base64 Encoding Issues - text v/s column

After referring to the question Base64 encoding in SQL Server 2005 T-SQL, I tried to get the base64 values for some data from a SQL table, but it's not giving the proper values when compared to direct text values.
Using Direct text:
SELECT CAST('?' as varbinary) FOR XML PATH(''), BINARY BASE64
gives Pw==, which is correct and decodes to ?
Using Database entry:
SELECT CAST([Col] as varbinary) from tblTable FOR XML PATH(''), BINARY BASE64
with [Col] value = ?, gives PwA=, which when decoded gives ? plus an extra non-printable character.
The reason for this is that I want to use an UPDATE statement to convert the data in a few columns from plain text to base64. This is just a sample value; the actual values are longer texts that have the same problem, but with every character.
Edit: When this is decoded in ASP.NET, it displays properly in a label, but a textbox shows extra junk characters.
Two things:
First, the "Direct Text" example:
SELECT CAST('?' as varbinary) FOR XML PATH(''), BINARY BASE64
----
Pw==
This encodes the single-byte (VARCHAR) character to base 64.
For an NVARCHAR, a 2-bytes-per-character type, it should be this:
SELECT CAST(N'?' as varbinary) FOR XML PATH(''), BINARY BASE64
----
PwA=
In the second part of your question, you ask why an extra character appears when decoding your previously encoded NVARCHAR column. You're actually taking the 2 bytes encoded as base 64 and converting them to 2 single-byte (VARCHAR) characters.
In order to decode to NVARCHAR you need to do this:
SELECT CAST(CAST( 'PwA=' as XML ).value('.','varbinary(max)') AS NVARCHAR(250) )
---
?
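The byte counts behind this answer can be reproduced with Python's `base64` module. This is a sketch in which `cp1252` and `utf-16-le` stand in for VARCHAR and NVARCHAR.

```python
import base64

varchar_q = "?".encode("cp1252")      # like CAST('?' AS varbinary): 1 byte, 0x3F
nvarchar_q = "?".encode("utf-16-le")  # like CAST(N'?' AS varbinary): 2 bytes, 0x3F00

print(base64.b64encode(varchar_q).decode())   # Pw==
print(base64.b64encode(nvarchar_q).decode())  # PwA=

decoded = base64.b64decode("PwA=")
print(repr(decoded.decode("cp1252")))     # '?\x00'  read as VARCHAR: extra NUL character
print(repr(decoded.decode("utf-16-le")))  # '?'      read as NVARCHAR
```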

MS SQL XQuery xs:base64Binary returns NULL

I (have to) use base64Binary to convert my base64-encoded string into bytes. In most cases it works well enough, but from time to time it returns NULL.
For example this works like a charm:
DECLARE @Base64String VARCHAR(MAX)
SET @Base64String = 'qwerqwerqwerqwer'
declare @Base64Binary VARBINARY(MAX)
set @Base64Binary = cast('' as xml).value('xs:base64Binary(sql:variable("@Base64String"))', 'VARBINARY(max)');
select @Base64Binary as 'base64'
Result is 0xAB07ABAB07ABAB07ABAB07AB and that's ok for me.
But if I use SET @Base64String = 'qwerqwerqwerqwe=' then I get NULL as the result. Why? I pass a perfectly valid base64 string and expect a non-NULL value. I've tried to find a workaround, but no luck. How can I make xs:base64Binary return a valid varbinary value for such input strings?
Having had a little look at this, I would suggest that qwerqwerqwerqwe= is not a valid base64 string.
Decoding qwerqwerqwerqwe= using a base64 conversion tool in C# renders the following:
0xAB07ABAB07ABAB07ABAB07
Encoding this in SQL server actually gives the output qwerqwerqwerqwc=:
DECLARE @Base64String VARCHAR(MAX)
DECLARE @Base64Binary VARBINARY(MAX)
SET @Base64Binary = 0xAB07ABAB07ABAB07ABAB07
PRINT @Base64Binary
SET @Base64String = CAST('' AS XML).value('xs:base64Binary(sql:variable("@Base64Binary"))', 'VARCHAR(max)');
PRINT @Base64String
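I would suggest that SQL Server returns NULL because the base64 string you are working with is not actually valid. Its final character, e, carries non-zero bits beyond the last decoded byte, so it is not the canonical encoding of those bytes (qwerqwerqwerqwc= is).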
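A Python sketch can check this. Python's default Base64 decoder is lenient and silently drops the leftover bits. Re-encoding the result and comparing it with the input is one way to detect a non-canonical string. That xs:base64Binary rejects such input is inferred from the NULL in the question and matches XML Schema's lexical rules for base64Binary, which only allow a final character whose unused bits are zero. `is_canonical_base64` is a hypothetical helper, not part of any library.

```python
import base64
import binascii


def is_canonical_base64(s: str) -> bool:
    """Hypothetical helper: True if s decodes and re-encodes to exactly the same text."""
    try:
        data = base64.b64decode(s)  # lenient: ignores non-zero leftover bits
    except binascii.Error:
        return False
    return base64.b64encode(data).decode("ascii") == s


print(is_canonical_base64("qwerqwerqwerqwer"))  # True
print(is_canonical_base64("qwerqwerqwerqwe="))  # False

# The lenient decode yields the 11 bytes from the answer; re-encoding gives the canonical form.
data = base64.b64decode("qwerqwerqwerqwe=")
print(data.hex().upper())                 # AB07ABAB07ABAB07ABAB07
print(base64.b64encode(data).decode())    # qwerqwerqwerqwc=
```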

Why does casting a UTF-8 VARCHAR column to XML require converting to NVARCHAR and encoding change?

I am trying to convert data in a varchar column to XML but I was getting errors with certain characters. Running this ...
-- This fails
DECLARE @Data VARCHAR(1000) = '<?xml version="1.0" encoding="utf-8"?><NewDataSet>Test¦</NewDataSet>';
SELECT CAST(@Data AS XML) AS DataXml
... results in the following error
Msg 9420, Level 16, State 1, Line 3
XML parsing: line 1, character 55, illegal xml character
It appears that it's the broken pipe character that is causing the error but I thought that it was a valid character for UTF-8. Looking at the XML spec it appears to be valid.
When I change it to this ...
-- This works
DECLARE @Data VARCHAR(1000) = '<?xml version="1.0" encoding="utf-8"?><NewDataSet>Test¦</NewDataSet>';
SELECT CAST(REPLACE(CAST(@Data AS NVARCHAR(MAX)), 'encoding="utf-8"', '') AS XML) AS DataXml
... it works without error (replacing the encoding string with utf-16 also works). I'm using SQL Server 2008 R2 with the SQL_Latin1_General_CP1_CI_AS collation.
Can anyone tell me why I need to convert to NVARCHAR and strip the encoding="utf-8" for this to work?
Thanks,
Edit
It appears that this also works ...
DECLARE @Data VARCHAR(1000) = '<?xml version="1.0" encoding="utf-8"?><NewDataSet>Test¦</NewDataSet>';
SELECT CAST(REPLACE(@Data, 'encoding="utf-8"', '') AS XML) AS DataXml
Removing the utf-8 encoding from the prolog is sufficient for SQL Server to do the conversion.
Remy's answer is, unfortunately, incorrect. VARCHAR absolutely does support Extended ASCII. Standard ASCII is only the first 128 values (0x00 - 0x7F). That happens to be the same for all code pages (i.e. 8-bit VARCHAR data) and UTF-16 (i.e. 16-bit NVARCHAR data) in SQL Server. Extended ASCII covers the remaining 128 of the 256 total values (0x80 - 0xFF). These 128 values / code points differ per code page, though there is a lot of overlap between some of them.
Remy states that VARCHAR does not support U+00A6 BROKEN BAR. This is easily disproven by simply adding SELECT @Data; after the first line:
DECLARE @Data VARCHAR(1000) =
'<?xml version="1.0" encoding="utf-8"?><NewDataSet>Test¦</NewDataSet>';
SELECT @Data;
That returns:
<?xml version="1.0" encoding="utf-8"?><NewDataSet>Test¦</NewDataSet>
The ¦ character is clearly supported, so the problem must be something else.
It appears that it's the broken pipe character that is causing the error but I thought that it was a valid character for UTF-8.
The broken pipe character is a valid character in UTF-8. The problem is: you aren't passing in UTF-8 data. Yes, you state that the encoding is UTF-8 in the xml declaration, but that doesn't mean that the data is UTF-8, it merely sets the expectation that it needs to be UTF-8.
You are converting a VARCHAR literal into XML. Your database's default collation is SQL_Latin1_General_CP1_CI_AS which uses the Windows-1252 code page for VARCHAR data. This means that the broken vertical bar character has a value of 166 or 0xA6. Well, 0xA6 is not a valid UTF-8 encoded anything. If you were truly passing in UTF-8 encoded data, then that broken vertical bar character would be two bytes: 0xC2 and then 0xA6. If we add that 0xC2 byte to the original input value (the 0xA6 is the same, so we can keep that where it is), we get:
DECLARE @Data VARCHAR(1000) = '<?xml version="1.0" encoding="utf-8"?><NewDataSet>Test'
+ CHAR(0xC2) + '¦</NewDataSet>';
SELECT @Data AS [@Data];
SELECT CAST(@Data AS XML) AS [DataXml];
and that returns:
<?xml version="1.0" encoding="utf-8"?><NewDataSet>Test¦</NewDataSet>
followed by:
<NewDataSet>Test¦</NewDataSet>
This is why removing the encoding="utf-8" fixed the problem:
with it there, the bytes of that string needed to actually be UTF-8 but they weren't, and ...
with it removed, the encoding is assumed to be the same as the string itself, which is VARCHAR, and that means the encoding is the code page associated with the collation of the string, and a VARCHAR literal or variable uses the database's default collation. Meaning, in this context, either without the encoding="xxxxxx", or with encoding="Windows-1252", the bytes will need to be encoded as Windows-1252, and indeed they are.
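The byte-level argument can be replayed in Python. In this sketch, Python's decoders stand in for SQL Server's XML parser and `parse_as` is a hypothetical helper. It assumes the VARCHAR is stored in code page 1252, as with `SQL_Latin1_General_CP1_CI_AS`.

```python
from typing import Optional

declared_utf8 = '<?xml version="1.0" encoding="utf-8"?><NewDataSet>Test\u00a6</NewDataSet>'

# A VARCHAR literal under a code page 1252 collation stores the broken bar as one byte, 0xA6.
stored_bytes = declared_utf8.encode("cp1252")


def parse_as(encoding: str, data: bytes) -> Optional[str]:
    """Hypothetical helper: decode as the declared encoding; None means 'rejected'."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        return None


print(parse_as("utf-8", stored_bytes))                    # None: a lone 0xA6 is not UTF-8
print(parse_as("cp1252", stored_bytes) == declared_utf8)  # True: no (or 1252) declaration

# Adding the 0xC2 lead byte turns the data into real UTF-8.
fixed = stored_bytes.replace(b"Test\xa6", b"Test\xc2\xa6")
print(parse_as("utf-8", fixed) == declared_utf8)          # True
```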
Putting this all together, we get:
If you have an actual UTF-8 encoded string, then it can be passed into the XML datatype, but you need to have:
no upper-case "N" prefixing the string literal, and no NVARCHAR variable or column being used to contain the string
the XML declaration stating that the encoding is UTF-8
If you have a string encoded in the code page that is associated with the database's default collation, then you need to have:
no upper-case "N" prefixing the string literal, and no NVARCHAR variable or column being used to contain the string
either no "encoding" as part of an <?xml ?> declaration, or have encoding set to the code page associated with the database's default collation (e.g. Windows-1252 for code page 1252)
If your string is already Unicode, then you need to:
prefix a string literal with an upper-case "N" or use an NVARCHAR variable or column for the incoming XML
have either no "encoding" as part of an <?xml ?> declaration, or have encoding set to "utf-16"
Please see my answer to "Converting accented characters in varchar() to XML causing “illegal XML character”" for more details on this.
And, just to have it stated: while SQL Server 2019 introduced native support for UTF-8 in VARCHAR literals, variables, and columns, that has no impact on what is being discussed in this answer.
For info on collations, character encoding, etc, please visit: Collations Info
Your pipe character is using Unicode codepoint U+00A6 BROKEN BAR instead of U+007C VERTICAL LINE. U+00A6 is outside of ASCII. VARCHAR does not support non-ASCII characters. That is why you have to use NVARCHAR instead, which is designed to handle Unicode data.

Cast FOR XML to Varchar(max) [duplicate]

This question already has an answer here:
For XML length limitation
(1 answer)
Closed 10 years ago.
I have a query that returns XML which I want to convert to varchar. My query returns 93,643 characters of XML. When I try to cast my xml result as varchar, I only get 43,679 characters when I copy the result set to a text editor. When I do len(xmlString), I get 93,643 characters.
I know from this post that varchar(max) can have up to 2^31 characters and 1 byte = 1 character, but it seems to be cutting off my data.
Do XML characters count as more than 1 byte? Why am I not able to select all the data from my xml result?
CAST((SELECT COLUMNS FROM TABLE FOR XML PATH('Name'), TYPE) AS VARCHAR(MAX)
This is just a limitation of SQL Server Management Studio.
With a test query on a bigger table I get the described 43,679 characters.
The same query delivers 267,089 characters in an application via ADO.
Not sure why you need to cast your XML data to varchar(max), but if you just want to copy all the data, don't cast it at all. In that case the result window shows one cell with a clickable value (just like a web link). Click it and all your data opens in a new window, where you can save it as a file or just copy it. Hope it helps.
