SQL Server 2014 - Parsing XML with Cyrillic Characters


I'm parsing an XML file but have a problem with Cyrillic characters.
Below are the input and the relevant part of the stored procedure.
SOAP input to parse:
'<?xml version="1.0"?>
<soapenv:Envelope xmlns:.......>
<soapenv:Header>
</soapenv:Header>
<soapenv:Body>
<GetResponse>
<BuyerInfo>
<Name>Polydoros Stoltidys</Name>
<Street>Луговой проезд дом 4 корпус 1 квартира 12</Street>
</BuyerInfo>
</GetResponse>
</soapenv:Body>
</soapenv:Envelope>'
Stored Procedure
CREATE PROCEDURE dbo.spXML_ParseSOAP
(
@XML XML
)
AS
SET NOCOUNT ON;
DECLARE @S nvarchar(max)='',
@C nvarchar(max)='',
@D nvarchar(max)=''
SELECT
@C= IIF (CHARINDEX('['+T.X.value('local-name(.)', 'nvarchar(100)')+']',@C)=0, CONCAT( ISNULL(@C + ',','') , QUOTENAME(T.X.value('local-name(.)', 'nvarchar(100)'))), @C),
@D= IIF (CHARINDEX('['+T.X.value('local-name(.)', 'nvarchar(100)')+']',@CP)=0, CONCAT( ISNULL(@D + ',N','') , '''', T.X.value(N'text()[1]', 'nvarchar(max)'),''''), @D)
FROM @XML.nodes('//*[count(child::*) = 0]') AS T(X)
WHERE T.X.value(N'local-name(.)', 'nvarchar(500)')
IN (select name from Customers.sys.columns where [object_id]=@O and is_identity=0)
SET @S=N'INSERT INTO Sales.dbo.ShippingAddress ('+@C+',ShippingAddressID) VALUES ('+@D+','''+@FADR+''')'
Print @S
The problem is that @S looks like this:
INSERT INTO Sales.dbo.ShippingAddress ([Name],[Street1],ShippingAddressID)
VALUES
(N'Polydoros Sample',N'??????? ?????? ??? 4 ?????? 1 ???????? 12','KkQ0LhbhwXfzi+Ko1Ai6s+SDZRT2kYhYC3vM2x2TB5Y=')
where the Cyrillic characters are transformed into ???.
I put the N prefix before all the inserted values, but the problem clearly occurs earlier.
I suppose it is in
T.X.value(N'text()[1]', 'nvarchar(max)')
but I do not know how to solve it.
Can you suggest a solution?
Thanks

Your DECLARE @XML line is wrong. The string literal needs to be prefixed with a capital N. Without it the literal is VARCHAR, and the Cyrillic characters are converted to ? when that literal is interpreted.
Also, you have not prefixed all string literals with a capital N, but at least one of them is prefixed (the first one in the SET @S = N' line), so the rest of the literals (which are VARCHAR without the N prefix) will be implicitly converted to NVARCHAR.
The following adaptation of your updated code shows this behavior, and how placing the N prefix on the input string (prior to calling the Stored Procedure) fixes the problem:
DECLARE @XML XML = N' <!-- remove the N from the left to get all ???? for "Street"-->
<BuyerInfo>
<Name>Polydoros Stoltidys</Name>
<Street>Луговой проезд дом 4 корпус 1 квартира 12</Street>
</BuyerInfo>
';
DECLARE @S nvarchar(max)='',
@C nvarchar(max)='Street',
@D nvarchar(max)=''
SELECT
@D= IIF (T.X.value('local-name(.)', 'nvarchar(100)') = N'Street',
T.X.value('./text()[1]', 'nvarchar(100)'),
@C)
FROM @XML.nodes('//*[count(child::*) = 0]') AS T(X)
SET @S=N'INSERT INTO Sales.dbo.ShippingAddress ('
+ @C+',ShippingAddressID) VALUES (N'''+@D+''',''a'') '
Print @S;
Also, SQL Server XML does not ever store the <?xml ... ?> declaration line, so you might as well remove it from the beginning of the literal value.
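A minimal sketch of that last point, using a made-up two-element document rather than the asker's SOAP envelope: cast the XML value back to NVARCHAR and the declaration is no longer part of the text.

```sql
-- Hypothetical sample document; any well-formed XML behaves the same way.
DECLARE @x XML = N'<?xml version="1.0"?><root><Street>Луговой проезд</Street></root>';

-- The XML type keeps only the parsed content, so the serialized text
-- starts at <root> and the declaration is gone.
SELECT CONVERT(NVARCHAR(MAX), @x) AS SerializedXml;
```

A related caveat, stated from experience rather than from the question: a declaration naming an 8-bit encoding such as encoding="utf-8" in front of an N-prefixed literal tends to fail with an "unable to switch the encoding" error, which is one more reason to drop the declaration.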

First of all: If this solves your problem, please accept srutzky's answer, it is the correct answer to solve your initial example with the declared variable. (but you may vote on this :-) ).
This is just an example to show the problem:
Try this
SELECT 'Луговой проезд'
SELECT N'Луговой проезд'
And now try this:
CREATE PROCEDURE dbo.TestXML(@xml XML)
AS
BEGIN
SELECT @xml;
END
GO
EXEC dbo.TestXML '<root><Street>Луговой проезд дом 4 корпус 1 квартира 12</Street></root>';
returns
<root>
<Street>??????? ?????? ??? 4 ?????? 1 ???????? 12</Street>
</root>
While this call (see the leading "N")
EXEC dbo.TestXML N'<root><Street>Луговой проезд дом 4 корпус 1 квартира 12</Street></root>';
returns
<root>
<Street>Луговой проезд дом 4 корпус 1 квартира 12</Street>
</root>
Conclusion
This does not happen within your procedure. The string you pass over to the stored procedure is wrong before you even enter the SP.

Related

Max Value in split XML using SQL Server

Can anyone advise why my max value returns a smaller value than expected?
DECLARE @SalesYear as nvarchar(max),
@SalesPeriod as nvarchar(max)
SET @SalesYear = 2020
SET @SalesPeriod = '5,6,7,8,10'
BEGIN
DECLARE @Split char(1)=',',
@X xml
SELECT @X = CONVERT(xml, ' <root> <myvalue>' +
REPLACE(@SalesPeriod,@Split,'</myvalue> <myvalue>') + '</myvalue> </root>')
IF (OBJECT_ID('tempdb..#breakdown') IS NOT NULL)
BEGIN
DROP TABLE #breakdown
END
SELECT T.c.value('.','varchar(20)') breakdown
INTO #breakdown
FROM @X.nodes('/root/myvalue') T(c)
END
SELECT MAX(breakdown)
FROM #breakdown
It returns max value as '8' instead of '10'. Anything wrong with my code?

I would change the type to INT instead of varchar, as SalesPeriod seems to hold numerical values:
SELECT T.c.value('.','INT') AS breakdown INTO #breakdown
FROM @X.nodes('/root/myvalue') T(c)
In a string comparison, '8' is higher than '10', because strings are compared character by character and '8' sorts after '1'. You can check:
select case when '8' > '10' then 1 else 0 end
If you change the type (remove the quotes), you will see the correct result. So I would recommend using the appropriate data type.
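The difference can be seen without the XML split at all. The following sketch feeds the same five period values through MAX twice, once as text and once cast to INT; the column aliases are illustrative only.

```sql
-- The same five values, compared once as text and once as integers.
SELECT MAX(v)              AS MaxAsText,   -- character-by-character comparison
       MAX(CAST(v AS INT)) AS MaxAsInt     -- numeric comparison
FROM (VALUES ('5'), ('6'), ('7'), ('8'), ('10')) AS t(v);
-- MaxAsText is '8', MaxAsInt is 10.
```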

What is the encode(<columnName>, 'escape') PostgreSQL equivalent in SQL Server?

In the same vein as this question, what is the equivalent in SQL Server to the following Postgres statement?
select encode(some_field, 'escape') from only some_table

As you were told already, SQL Server is not the best with such issues.
The most important advice to avoid such issues is: use the appropriate data type to store your values. Storing binary data as a HEX string runs against this best practice. But there are some workarounds:
I use the HEX-string taken from the linked question:
DECLARE @str VARCHAR(100)='0x61736461640061736461736400';
--here I use dynamically created SQL to get the HEX-string as a real binary:
DECLARE @convBin VARBINARY(MAX);
DECLARE @cmd NVARCHAR(MAX)=N'SELECT @bin=' + @str;
EXEC sp_executeSql @cmd
,N'@bin VARBINARY(MAX) OUTPUT'
,@bin=@convBin OUTPUT;
--This real binary can be converted to a VARCHAR(MAX).
--Be aware, that in this case the input contains 00 as this is an array.
--It is possible to split the input at the 00s, but this is going too far...
SELECT @convBin AS HexStringAsRealBinary
,CAST(@convBin AS VARCHAR(MAX)) AS CastedToString; --You will see the first "asdad" only
--If your HEX-string is not longer than 10 bytes there is an undocumented function:
--You'll see, that the final AA is cut away, while a shorter string would be filled with zeros.
SELECT sys.fn_cdc_hexstrtobin('0x00112233445566778899AA')
SELECT CAST(sys.fn_cdc_hexstrtobin(@str) AS VARCHAR(100));
UPDATE: An inlinable approach
The following recursive CTE will read the HEX-string character by character.
Furthermore it will group the result and return two rows in this case.
This solution is very specific to the given input.
DECLARE @str VARCHAR(100)='0x61736461640061736461736400';
WITH recCTE AS
(
SELECT 1 AS position
,1 AS GroupingKey
,SUBSTRING(@str,3,2) AS HEXCode
,CHAR(SUBSTRING(sys.fn_cdc_hexstrtobin('0x' + SUBSTRING(@str,3,2)),1,1)) AS TheLetter
UNION ALL
SELECT r.position+1
,r.GroupingKey + CASE WHEN SUBSTRING(@str,2+(r.position)*2+1,2)='00' THEN 1 ELSE 0 END
,SUBSTRING(@str,2+(r.position)*2+1,2)
,CHAR(SUBSTRING(sys.fn_cdc_hexstrtobin('0x' + SUBSTRING(@str,2+(r.position)*2+1,2)),1,1)) AS TheLetter
FROM recCTE r
WHERE position<LEN(@str)/2
)
SELECT r.GroupingKey
,(
SELECT x.TheLetter AS [*]
FROM recCTE x
WHERE x.GroupingKey=r.GroupingKey
AND x.HEXCode<>'00'
AND LEN(x.HEXCode)>0
ORDER BY x.position
FOR XML PATH(''),TYPE
).value('.','varchar(max)')
FROM recCTE r
GROUP BY r.GroupingKey;
The result
1 asdad
2 asdasd
Hint: Starting with SQL Server 2017 there is STRING_AGG(), which would reduce the final SELECT...

If you need this functionality, it's going to be up to you to implement it. Assuming you just need the escape variant, you can try to implement it as a T-SQL UDF. But pulling strings apart, working character by character and building up a new string just isn't a T-SQL strength. You'd be looking at a WHILE loop to count over the byte length of the input, SUBSTRING to extract the individual bytes, and CHAR to directly convert the bytes that don't need to be octal encoded. [1]
If you're going to start down this route (and especially if you want to support the other formats), I'd be looking at using the CLR support in SQL Server, to create the function in a .NET language (C# usually preferred) and use the richer string manipulation functionality there.
Both of the above assume that what you're really wanting is to replicate the escape format of encode. If you just want "take this binary data and give me a safe string to represent it", just use CONVERT to get the binary hex encoded.
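As a sketch of that last option, the binary styles of CONVERT produce the hex text directly and can also turn it back into bytes. The sample bytes are borrowed from the other answer in this thread; the variable names are illustrative.

```sql
-- Sample bytes borrowed from the other answer in this thread.
DECLARE @bin VARBINARY(MAX) = 0x61736461640061736461736400;

-- Style 1 keeps the 0x prefix, style 2 omits it.
DECLARE @hex VARCHAR(MAX) = CONVERT(VARCHAR(MAX), @bin, 1);
SELECT @hex AS HexWithPrefix,
       CONVERT(VARCHAR(MAX), @bin, 2) AS HexWithoutPrefix;

-- The same style converts the hex text back to the original bytes.
SELECT CONVERT(VARBINARY(MAX), @hex, 1) AS RoundTripped;
```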
[1] Here's my attempt at it. I'd suggest a lot of testing and tweaking before you use it in anger:
create function Postgresql_encode_escape (@input varbinary(max))
returns varchar(max)
as
begin
declare @i int
declare @len int
declare @out varchar(max)
declare @chr int
select @i = 1, @out = '',@len = DATALENGTH(@input)
while @i <= @len
begin
set @chr = SUBSTRING(@input,@i,1)
if @chr > 31 and @chr < 128
begin
set @out = @out + CHAR(@chr)
end
else
begin
set @out = @out + '\' +
RIGHT('000' + CONVERT(varchar(3),
(@chr / 64)*100 +
((@chr / 8)%8)*10 +
(@chr % 8))
,3)
end
set @i = @i + 1
end
return @out
end

How to convert TIMESTAMP values to VARCHAR in T-SQL as SSMS does?

I am trying to convert a TIMESTAMP field in a table to a string so that it can be printed or executed as part of dynamic SQL. SSMS is able to do it, so there must be a built-in method to do it. However, I can't get it to work using T-SQL.
The following correctly displays a table result:
SELECT TOP 1 RowVersion FROM MyTable
It shows 0x00000000288D17AE. However, I need the result to be part of a larger string.
DECLARE @res VARCHAR(MAX) = (SELECT TOP 1 'test' + CONVERT(BINARY(8), RowVersion) FROM MyTable)
PRINT(@res)
This yields an error: The data types varchar and binary are incompatible in the add operator
DECLARE @res VARCHAR(MAX) = (SELECT TOP 1 'test' + CONVERT(VARCHAR(MAX), RowVersion) FROM MyTable)
PRINT(@res)
This results in garbage characters: test (®
In fact, the spaces are just null characters and terminate the string for the purpose of running dynamic SQL using EXEC().
DECLARE @sql VARCHAR(MAX) = 'SELECT TOP 1 ''test'' + CONVERT(VARCHAR(MAX), RowVersion) FROM MyTable'
EXEC (@sql)
This just displays a table result with the word "test". Everything after "test" in the dynamic SQL is cut off because the CONVERT function returns terminating null characters first.
Obviously, what I want the resultant string to be is "test0x00000000288D17AE" or even the decimal equivalent, which in this case would be "test680335278".
Any ideas would be greatly appreciated.

SELECT 'test' + CONVERT(NVARCHAR(MAX), CONVERT(BINARY(8), RowVersion), 1). The trick is the 1 to the CONVERT as the style, per the documentation. (Pass 2 to omit the 0x.)
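A self-contained sketch of this, using a literal in place of the RowVersion column (the variable @rv is a stand-in, and the value is the one shown in the question). It also covers the decimal form the question asks about by converting through BIGINT.

```sql
-- Stand-in for a RowVersion column value (taken from the question's output).
DECLARE @rv BINARY(8) = 0x00000000288D17AE;

SELECT 'test' + CONVERT(VARCHAR(20), @rv, 1)               AS HexForm,      -- test0x00000000288D17AE
       'test' + CONVERT(VARCHAR(20), CONVERT(BIGINT, @rv)) AS DecimalForm;  -- test680335278
```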

As mentioned in the comments, the undocumented function master.sys.fn_varbintohexstr will convert binary to string such that you could then concatenate with some other string value:
DECLARE @binary BINARY(8)
SELECT @binary = CAST(1234567890 AS BINARY(8))
SELECT @binary AS BinaryValue,
LEFT(master.sys.fn_varbintohexstr(@binary),2) + UPPER(RIGHT(master.sys.fn_varbintohexstr(@binary),LEN(master.sys.fn_varbintohexstr(@binary))-2)) AS VarcharValue,
'test' + LEFT(master.sys.fn_varbintohexstr(@binary),2) + UPPER(RIGHT(master.sys.fn_varbintohexstr(@binary),LEN(master.sys.fn_varbintohexstr(@binary))-2)) AS ConcatenatedVarcharValue
I went ahead and split off the first two characters and did not apply the UPPER function to them, to exactly reproduce the format in which a binary value is displayed.
Results:
/--------------------------------------------------------------------\
| BinaryValue | VarcharValue | ConcatenatedVarcharValue |
|--------------------+--------------------+--------------------------|
| 0x00000000499602D2 | 0x00000000499602D2 | test0x00000000499602D2 |
\--------------------------------------------------------------------/

Have a look at this:
SELECT
substring(replace(replace(replace(replace(cast(CAST(GETDATE() AS datetime2) as
varchar(50)),'-',''),' ',''),':',''),'.',''),1,18)

parsing character error in SQL via XML

I have the code below, which raises an error when I run it because the XML contains the "&" sign, which cannot be converted.
The result should display "testing &". However, if I change the XML to "testing &amp;" it works. I need a way to replace it so that it does not raise an error.
Declare @Request XML = null
If @Request IS NULL
BEGIN
SET @Request = '
<Request>
<ProductRequest>
<ProductName>testing &</ProductName>
</ProductRequest>
</Request>'
END
select @Request.value ('(//ProductName)[1]','nvarchar(100)')

You are probably looking for this:
Declare @Request XML = null
If @Request IS NULL
BEGIN
SET @Request = (SELECT 'testing &' AS ProductName FOR XML PATH('ProductRequest'),ROOT('Request'));
END
select @Request.value ('(//ProductName)[1]','nvarchar(100)')
Some background:
XML is more than just some text with extra characters. XML should never be generated just by typing (as in your case) or by string concatenation (often seen). Use the proper method to generate your XML and all encoding issues are solved for you implicitly.
Look at the generated XML and you will find that the & is stored as &amp;. When you read it with value(), the decoding is done for you, again implicitly.
You should not start writing your own REPLACE approaches. The next day someone enters a < or > or another unsupported character and you have the same trouble again.
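To illustrate with more than one special character, here is a small sketch along the same lines. The product name is made up for the example; the point is only that FOR XML does the encoding and value() undoes it.

```sql
-- Hypothetical product name containing all three problem characters.
DECLARE @Request XML;
SET @Request = (SELECT 'fish & chips <large>' AS ProductName
                FOR XML PATH('ProductRequest'), ROOT('Request'));

SELECT @Request AS EncodedXml,   -- special characters are stored as entities
       @Request.value('(//ProductName)[1]', 'nvarchar(100)') AS DecodedValue;  -- original text comes back
```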

The & is a reserved/special character in XML. It must be written as the entity &amp; (with no space between &amp and ;),
as follows:
Declare @Request XML = null
If @Request IS NULL
BEGIN
SET @Request = '
<Request>
<ProductRequest>
<ProductName>testing &amp;</ProductName>
</ProductRequest>
</Request>'
END
select @Request.value ('(//ProductName)[1]','nvarchar(100)')

You need to specify the entity reference &amp; in the XML string for the ampersand:
DECLARE @Request XML = NULL;
IF @Request IS NULL
BEGIN
SET @Request = '
<Request>
<ProductRequest>
<ProductName>testing &amp;</ProductName>
</ProductRequest>
</Request>';
END;
SELECT @Request.value ('(//ProductName)[1]','nvarchar(100)');
See https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references

Read value in XML Node - T-SQL

This is my code.......
DECLARE @XML AS XML;
SET @XML = CAST('<Session id="ID969138672" realTimeID="4300815712">
<VarValues>
<varValue id="ID123" source="Internal" name="DisconnectedBy">VisitorClosedWindow</varValue>
<varValue id="ID1234" source="PreChat" name="email">1234@mail.ru</varValue>
</VarValues>
</Session>
' AS XML)
SELECT
xmlData.Col.value('@id','varchar(max)')
,xmlData.Col.value('@source','varchar(max)')
,xmlData.Col.value('@name','varchar(max)')
FROM @XML.nodes('//Session/VarValues/varValue') xmlData(Col);
This is the output.....
How can I include the actual values of the varValue elements?
I need to read the values VisitorClosedWindow and 1234@mail.ru as well.

You can get that by doing this:
xmlData.Col.value('.','varchar(max)')
So the select would be:
SELECT
xmlData.Col.value('@id','varchar(max)')
,xmlData.Col.value('@source','varchar(max)')
,xmlData.Col.value('@name','varchar(max)')
,xmlData.Col.value('.','varchar(max)')
FROM @XML.nodes('//Session/VarValues/varValue') xmlData(Col);

Just use the .value('.', 'varchar(50)') line for that:
SELECT
xmlData.Col.value('@id','varchar(25)'),
xmlData.Col.value('@source','varchar(50)'),
xmlData.Col.value('@name','varchar(50)'),
xmlData.Col.value('.','varchar(50)') -- <== this gets you the element's value
FROM @XML.nodes('//Session/VarValues/varValue') xmlData(Col);
