What exactly is XML PATH? - sql-server

Can someone please explain in more detail what SELECT STUFF((SELECT ... FOR XML PATH(''), TYPE).value('.','NVARCHAR(max)'), ...) actually does?
From what I know:
FOR XML PATH('') is used to concatenate column data into a single row.
STUFF is used to remove the first ',' after string concatenation.
So what is TYPE).value('.','NVARCHAR(max)') used for?

The basics of FOR XML PATH are covered in many questions and answers on SO.
I'll focus on the TYPE directive. Consider the following T-SQL statement, which concatenates the strings in the derived table dt:
SELECT
[text()]=dt.x+';'
FROM
(
VALUES('text > blabla'),
('text < blabla'),
('f&m')
) AS dt(x)
FOR XML
PATH('');
The result is:
text &gt; blabla;text &lt; blabla;f&amp;m;
You'll see that >, < and & have been replaced by &gt;, &lt; and &amp;. FOR XML entitizes these special characters (the XML predefined entities) because the result is returned as XML text.
You can avoid this by specifying the TYPE directive, which returns the result as a value of the xml data type rather than as text, and then extracting the string value from that XML with the value() method.
Using the TYPE directive in the above example:
SELECT
(
SELECT
[text()]=dt.x+';'
FROM
(
VALUES('text > blabla'),
('text < blabla'),
('f&m')
) AS dt(x)
FOR XML
PATH(''), TYPE
).value('.','NVARCHAR(MAX)');
You'd get:
text > blabla;text < blabla;f&m;
You can see that the XML special characters are not substituted.
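To tie this back to the STUFF part of the question, here is a minimal sketch that combines all three pieces on the same sample data. The column alias concatenated is only an illustrative name, and the separator is a single comma, so STUFF removes exactly one leading character.

```sql
-- FOR XML PATH('') concatenates the rows, TYPE + .value() returns the
-- text without XML entity escaping, and STUFF strips the leading comma.
SELECT STUFF(
    (
        SELECT ',' + dt.x                      -- unnamed column: emitted as plain text
        FROM (VALUES ('text > blabla'),
                     ('text < blabla'),
                     ('f&m')) AS dt(x)
        FOR XML PATH(''), TYPE
    ).value('.', 'NVARCHAR(MAX)'),
    1, 1, '')                                  -- start at position 1, delete 1 character
    AS concatenated;
-- Should return: text > blabla,text < blabla,f&m
```

Without TYPE and .value(), the same query would return the escaped form (&gt;, &lt;, &amp;) inside the concatenated string.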

Related

SQL Server geography: Reduce size (decimal precision) of WKT text

For my farming app, a stored proc retrieves paddock/field boundaries that are stored as SQL Server geography data type, for display on the user's mobile device.
SQL Server's .ToString() and .STAsText() functions render each vertex as a lat/long pair, to 15 decimal places. From this answer, 15 decimals defines a location to within the width of an atom! To the nearest metre would be good enough for me.
The resulting overly-precise payload is extremely large and too slow for use on larger farms.
From my SQL Server geography data, I would like to produce WKT that is formatted to 4 or 5 decimals. I could find no built-in methods, but my best leads are:
PostGIS and Google Cloud BigQuery have the ST_SNAPTOGRID function, which would be perfect, and
Regex might be useful, e.g. this answer, but SQL Server doesn't seem to have a regex replace.
I would think this is a common problem: is there a simple solution?
By stepping through @iamdave's excellent answer, and using that same approach, it looks like we need to split only on periods. I think we can ignore all parentheses and commas, and ignore the POLYGON prefix (which means it'll work across other GEOGRAPHY types, such as MULTIPOLYGON).
i.e. each time we find a period, keep only the next 4 characters after it, and throw away any digits after that (until we hit a non-digit).
This works for me (using @iamdave's test data):
DECLARE @wkt NVARCHAR(MAX), @wktShort NVARCHAR(MAX);
DECLARE @decimalPlaces int = 4;
SET @wkt = 'POLYGON((-121.973669 37.365336,-121.97367 37.365336,-121.973642 37.365309,-121.973415 37.365309,-121.973189 37.365309,-121.973002 37.365912,-121.972815 37.366515,-121.972796 37.366532,-121.972776 37.366549,-121.972627 37.366424,-121.972478 37.366299,-121.972422 37.366299,-121.972366 37.366299,-121.972298 37.366356,-121.97223 37.366412,-121.97215 37.366505,-121.97207 37.366598,-121.971908 37.366794,-121.971489 37.367353,-121.971396 37.367484,-121.971285 37.36769,-121.971173 37.367897,-121.971121 37.368072,-121.971068 37.368248,-121.971028 37.36847,-121.970987 37.368692,-121.970987 37.368779,-121.970987 37.368866,-121.970949 37.368923,-121.970912 37.36898,-121.970935 37.36898,-121.970958 37.36898,-121.970975 37.368933,-121.970993 37.368887,-121.971067 37.368807,-121.97114 37.368726,-121.971124 37.368705,-121.971108 37.368685,-121.971136 37.368698,-121.971163 37.368712,-121.97134 37.368531,-121.971516 37.368351,-121.971697 37.368186,-121.971878 37.368021,-121.972085 37.367846,-121.972293 37.36767,-121.972331 37.367629,-121.972369 37.367588,-121.972125 37.367763,-121.97188 37.367938,-121.971612 37.36815,-121.971345 37.368362,-121.971321 37.36835,-121.971297 37.368338,-121.971323 37.368298,-121.97135 37.368259,-121.971569 37.368062,-121.971788 37.367865,-121.971977 37.367716,-121.972166 37.367567,-121.972345 37.367442,-121.972524 37.367317,-121.972605 37.367272,-121.972687 37.367227,-121.972728 37.367227,-121.972769 37.367227,-121.972769 37.367259,-121.972769 37.367291,-121.972612 37.367416,-121.972454 37.367542,-121.972488 37.367558,-121.972521 37.367575,-121.972404 37.367674,-121.972286 37.367773,-121.972194 37.367851,-121.972101 37.367928,-121.972046 37.36799,-121.971991 37.368052,-121.972008 37.368052,-121.972025 37.368052,-121.972143 37.367959,-121.972261 37.367866,-121.972296 37.367866,-121.972276 37.36794,-121.972221 37.36798,-121.972094 37.368097,-121.971966 37.368214,-121.971956 37.368324,-121.971945 37.368433,-121.971907 37.368753,-121.971868 37.369073,-121.97184 37.369578,-121.971812 37.370083,-121.971798 37.370212,-121.971783 37.370342,-121.971542 37.370486,-121.971904 37.370324,-121.972085 37.37028,-121.972266 37.370236,-121.972559 37.370196,-121.972852 37.370155,-121.973019 37.370155,-121.973186 37.370155,-121.973232 37.370136,-121.973279 37.370116,-121.973307 37.370058,-121.973336 37.370001,-121.973363 37.369836,-121.973391 37.369671,-121.973419 37.369227,-121.973446 37.368784,-121.973429 37.368413,-121.973413 37.368041,-121.973361 37.367714,-121.973308 37.367387,-121.973285 37.367339,-121.973262 37.36729,-121.973126 37.3673,-121.972989 37.36731,-121.973066 37.36728,-121.973144 37.367251,-121.973269 37.367237,-121.973393 37.367223,-121.973443 37.367158,-121.973493 37.367093,-121.973518 37.36702,-121.973543 37.366947,-121.973582 37.366618,-121.973622 37.366288,-121.97366 37.365826,-121.973698 37.365363,-121.973669 37.365336))';
-- Split on '.', then find the index of the first non-digit in each fragment.
-- Then recombine the fragments, keeping at most @decimalPlaces leading digits
-- (fewer if the number is already shorter) and skipping the unwanted digits.
-- Note: STRING_SPLIT does not guarantee the order of its output rows.
WITH points AS (
SELECT value, PATINDEX('%[^0-9]%', value) AS indx
FROM STRING_SPLIT(@wkt, '.')
)
SELECT @wktShort = STRING_AGG(LEFT(value, IIF(indx - 1 < @decimalPlaces, indx - 1, @decimalPlaces)) + SUBSTRING(value, indx, LEN(value)), '.')
FROM points;
Comparing original vs shortened, we can see that each number is truncated to 4dp:
SELECT @wkt AS Text UNION ALL SELECT @wktShort;
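Because STRING_SPLIT does not promise to return fragments in their original order, here is a hedged sketch of an order-preserving variant. It assumes SQL Server 2022 or Azure SQL Database, where STRING_SPLIT accepts a third enable_ordinal argument. The three-point polygon is made-up sample data, and the LEFT(...) expression also copes with numbers that have fewer than @decimalPlaces decimals.

```sql
-- Assumes SQL Server 2022+ / Azure SQL (STRING_SPLIT with enable_ordinal).
DECLARE @wkt NVARCHAR(MAX) =
    N'POLYGON((-121.973669 37.365336,-121.97367 37.3673,-121.973669 37.365336))';
DECLARE @decimalPlaces int = 4;

WITH points AS (
    SELECT value,
           ordinal,                                   -- 1-based position of the fragment
           PATINDEX('%[^0-9]%', value) AS indx        -- first non-digit in the fragment
    FROM STRING_SPLIT(@wkt, '.', 1)
)
SELECT STRING_AGG(
           -- keep at most @decimalPlaces leading digits, then the rest of the fragment
           LEFT(value, IIF(indx - 1 < @decimalPlaces, indx - 1, @decimalPlaces))
           + SUBSTRING(value, indx, LEN(value)),
           '.')
       WITHIN GROUP (ORDER BY ordinal) AS wktShort    -- reassemble in original order
FROM points;
-- Should return: POLYGON((-121.9736 37.3653,-121.9736 37.3673,-121.9736 37.3653))
```

On older versions without enable_ordinal, a custom splitter that returns a row number is the usual fallback.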
Edit
I believe I may have misunderstood your question and you are looking to transmit the WKT rather than the binary representations of the polygons? If that is the case, my answer below still shows you how to chop off some decimal places (without rounding them). Just don't wrap the stuff(...) FOR XML in a STGeomFromText and you have the amended WKT.
When working with geography data types, it can be handy to maintain a very detailed 'master' version, from which you generate and persist less detailed versions based on your requirements.
An easy way to generate these reduced complexity polygons is to use the helpfully named Reduce function, which I think will actually help you in this situation.
If you would prefer to go down the route of reducing the number of decimal places you will either have to write a custom CLR function or enter the wonderful world of SQL Server string manipulation!
SQL Query
declare @DecimalPlaces int = 4; -- Specify the desired number of lat/long decimals
with g as(
select p.g -- Original polygon, for comparison purposes
,geography::STGeomFromText('POLYGON((' -- stripped apart and then recreated polygon from text, using a custom string split function. You won't be able to use the built in STRING_SPLIT here as it doesn't guarantee sort order.
+ stuff((select ', ' + left(s.item,charindex('.',s.item,0) + @DecimalPlaces) + substring(s.item,charindex(' ',s.item,0),charindex('.',s.item,charindex(' ',s.item,0)) - charindex(' ',s.item,0) + 1 + @DecimalPlaces)
from dbo.fn_StringSplitMax(replace(replace(p.g.STAsText(),'POLYGON ((',''),'))',''),', ',null) as s
for xml path(''), type).value('.', 'NVARCHAR(MAX)') -- STUFF and FOR XML mimics GROUP_CONCAT functionality seen in other SQL languages, to recombine shortened Points back into a Polygon string
,1,2,''
)
+ '))', 4326).MakeValid() as x -- Remember to make the polygon valid again, as you have been messing with the Point data
from(values(geography::STGeomFromText('POLYGON((-121.973669 37.365336,-121.97367 37.365336,-121.973642 37.365309,-121.973415 37.365309,-121.973189 37.365309,-121.973002 37.365912,-121.972815 37.366515,-121.972796 37.366532,-121.972776 37.366549,-121.972627 37.366424,-121.972478 37.366299,-121.972422 37.366299,-121.972366 37.366299,-121.972298 37.366356,-121.97223 37.366412,-121.97215 37.366505,-121.97207 37.366598,-121.971908 37.366794,-121.971489 37.367353,-121.971396 37.367484,-121.971285 37.36769,-121.971173 37.367897,-121.971121 37.368072,-121.971068 37.368248,-121.971028 37.36847,-121.970987 37.368692,-121.970987 37.368779,-121.970987 37.368866,-121.970949 37.368923,-121.970912 37.36898,-121.970935 37.36898,-121.970958 37.36898,-121.970975 37.368933,-121.970993 37.368887,-121.971067 37.368807,-121.97114 37.368726,-121.971124 37.368705,-121.971108 37.368685,-121.971136 37.368698,-121.971163 37.368712,-121.97134 37.368531,-121.971516 37.368351,-121.971697 37.368186,-121.971878 37.368021,-121.972085 37.367846,-121.972293 37.36767,-121.972331 37.367629,-121.972369 37.367588,-121.972125 37.367763,-121.97188 37.367938,-121.971612 37.36815,-121.971345 37.368362,-121.971321 37.36835,-121.971297 37.368338,-121.971323 37.368298,-121.97135 37.368259,-121.971569 37.368062,-121.971788 37.367865,-121.971977 37.367716,-121.972166 37.367567,-121.972345 37.367442,-121.972524 37.367317,-121.972605 37.367272,-121.972687 37.367227,-121.972728 37.367227,-121.972769 37.367227,-121.972769 37.367259,-121.972769 37.367291,-121.972612 37.367416,-121.972454 37.367542,-121.972488 37.367558,-121.972521 37.367575,-121.972404 37.367674,-121.972286 37.367773,-121.972194 37.367851,-121.972101 37.367928,-121.972046 37.36799,-121.971991 37.368052,-121.972008 37.368052,-121.972025 37.368052,-121.972143 37.367959,-121.972261 37.367866,-121.972296 37.367866,-121.972276 37.36794,-121.972221 37.36798,-121.972094 37.368097,-121.971966 37.368214,-121.971956 37.368324,-121.971945 37.368433,-121.971907 
37.368753,-121.971868 37.369073,-121.97184 37.369578,-121.971812 37.370083,-121.971798 37.370212,-121.971783 37.370342,-121.971542 37.370486,-121.971904 37.370324,-121.972085 37.37028,-121.972266 37.370236,-121.972559 37.370196,-121.972852 37.370155,-121.973019 37.370155,-121.973186 37.370155,-121.973232 37.370136,-121.973279 37.370116,-121.973307 37.370058,-121.973336 37.370001,-121.973363 37.369836,-121.973391 37.369671,-121.973419 37.369227,-121.973446 37.368784,-121.973429 37.368413,-121.973413 37.368041,-121.973361 37.367714,-121.973308 37.367387,-121.973285 37.367339,-121.973262 37.36729,-121.973126 37.3673,-121.972989 37.36731,-121.973066 37.36728,-121.973144 37.367251,-121.973269 37.367237,-121.973393 37.367223,-121.973443 37.367158,-121.973493 37.367093,-121.973518 37.36702,-121.973543 37.366947,-121.973582 37.366618,-121.973622 37.366288,-121.97366 37.365826,-121.973698 37.365363,-121.973669 37.365336))', 4326))) as p(g)
)
-- select various versions of the polygons into the same column for overlay comparison in SSMS
select 'Original' as l
,g
from g
union all
select 'Short' as l
,x
from g
union all
select 'Original Reduced' as l
,g.Reduce(10)
from g
union all
select 'Short Reduced' as l
,x.Reduce(10)
from g;
Output
What is interesting to note here is the difference in length of the geog binary representation (simple count of characters as displayed). As I mentioned above, simply using the Reduce function may do what you need, so you will want to test various approaches to see how best you cut down on your data transfer.
+------------------+--------------------+------+
| l | g | Len |
+------------------+--------------------+------+
| Original | 0xE6100000010484...| 4290 |
| Short | 0xE6100000010471...| 3840 |
| Original Reduced | 0xE6100000010418...| 834 |
| Short Reduced | 0xE610000001041E...| 1184 |
+------------------+--------------------+------+
String Split Function
As polygon data can be bloody huge, you will need a string splitter that can handle more than 4k or 8k characters. In my case I tend to opt for an XML-based approach:
create function [dbo].[fn_StringSplitMax]
(
@str nvarchar(max) = ' ' -- String to split.
,@delimiter as nvarchar(max) = ',' -- Delimiting value to split on.
,@num as int = null -- Which value to return.
)
returns table
as
return
with s as
( -- Convert the string to an XML value, replacing the delimiter with XML tags
select convert(xml,'<x>' + replace((select @str for xml path('')),@delimiter,'</x><x>') + '</x>').query('.') as s
)
select rn
,item -- Select the values from the generated XML value by CROSS APPLYing to the XML nodes
from(select row_number() over (order by (select null)) as rn
,n.x.value('.','nvarchar(max)') as item
from s
cross apply s.nodes('x') as n(x)
) a
where rn = @num
or @num is null;

Search for a certain name followed by any number (or no number)

I want to search for a field matching a name followed by a number of variable size (including no number), but I can't see how to use the wildcards on PATINDEX and LIKE to detect an unknown number of digits.
This is the regexp I would like to check : MYNAME[1-9]*
It has to recognize MYNAME, MYNAME5, MYNAME12, MYNAME275, ...
It shouldn't recognize ANOTHERNAME, MYNAMEXX12, MYNAME12X5, MYNAME12X
PATINDEX and LIKE don't recognize the * on the regexp to indicate a variable number of digits.
Do you know of a way to search for a pattern where one part has a variable size?
Thank you.
DECLARE @Test TABLE (
Field VARCHAR(32)
);
INSERT @Test( Field )
VALUES
('MYNAME'),
('MYNAME5'),
('MYNAME12'),
('MYNAME275'),
('MYNAME275TEXT');
-- Match the bare name, or the name followed by a digit,
-- then exclude anything that has a non-digit after the first digit.
SELECT *
FROM @Test
WHERE (
( Field = 'MYNAME'
OR Field LIKE 'MYNAME[0-9]%'
)
AND Field NOT LIKE 'MYNAME[0-9]%[^0-9]%'
);
Bit of a wild card (geddit?) guess, however, perhaps this?
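SELECT *
FROM (VALUES('MYNAME'),('MYNAME5'),('MYNAME12'),('MYNAME275'),('MYNAME654A'))V(N)
WHERE V.N = 'MYNAME'
OR (V.N LIKE 'MYNAME[0-9]%'
-- The trailing % matters: without it, a value such as MYNAME12X5 (non-digit in the middle) would slip through.
AND V.N NOT LIKE 'MYNAME[0-9]%[^0-9]%');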
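Another way to express the same rule, sketched here against the exact positive and negative cases listed in the question, is to check the prefix and then require that whatever follows it contains no non-digit. The hard-coded offset 7 assumes the prefix is the six-character literal MYNAME.

```sql
-- Prefix check plus "no non-digit characters in the remainder".
SELECT V.N
FROM (VALUES ('MYNAME'), ('MYNAME5'), ('MYNAME12'), ('MYNAME275'),      -- should match
             ('ANOTHERNAME'), ('MYNAMEXX12'), ('MYNAME12X5'), ('MYNAME12X') -- should not
     ) AS V(N)
WHERE V.N LIKE 'MYNAME%'
  AND SUBSTRING(V.N, 7, LEN(V.N)) NOT LIKE '%[^0-9]%';  -- remainder is empty or digits only
-- Should return: MYNAME, MYNAME5, MYNAME12, MYNAME275
```

An empty remainder passes the NOT LIKE test, which is what lets the bare name MYNAME through without a separate equality check.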

How to use column value in AS of SELECT FOR XML statement

I have reduced this sample to the absolute necessary minimum. So I'm aware, that it does not make sense :-)
I know, I can select XML from a table with the following query:
select id as [Direction/@Id],
direction as [Direction]
from devicepdo
for xml path('')
The result will look like this:
<Direction Id="1">I</Direction>
<Direction Id="2">O</Direction>
<Direction Id="3">I</Direction>
<Direction Id="4">O</Direction>
....
The column direction always contains I or O, and I want to change the enclosing XML Tag depending on the column value.
My result should look like this:
<In Id="1">I</In>
<Out Id="2">O</Out>
<In Id="3">I</In>
<Out Id="4">O</Out>
....
All my attempts to use a variable or column in the "AS" clause failed.
In my query, the "Direction" in "as [Direction/@Id]" should vary depending on the column value.
Is this possible? Any hints or ideas?
If all of an element's attributes are null and its content is null, no tag will be generated. You can use this fact to generate different types of tags depending on the values in each row.
select
-- If direction is I then generate an In tag
case when direction = 'I' then id else null end as [In/@Id],
case when direction = 'I' then direction else null end as [In],
-- If direction is O then generate an Out tag
case when direction = 'O' then id else null end as [Out/@Id],
case when direction = 'O' then direction else null end as [Out]
from devicepdo
for xml path('')
Effectively you have expressions for all possible tags, but for rows where a given type of tag is not wanted, a CASE expression makes the corresponding columns null.
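To try this without the real table, here is a small self-contained sketch. The table variable @devicepdo and its four rows are made up to mirror the sample output in the question, and the ORDER BY is added only to make the row order predictable.

```sql
-- Hypothetical stand-in for the devicepdo table from the question.
DECLARE @devicepdo TABLE (id int, direction char(1));
INSERT INTO @devicepdo (id, direction)
VALUES (1, 'I'), (2, 'O'), (3, 'I'), (4, 'O');

SELECT
    -- Columns for the <In> element are null unless direction = 'I'
    CASE WHEN direction = 'I' THEN id END        AS [In/@Id],
    CASE WHEN direction = 'I' THEN direction END AS [In],
    -- Columns for the <Out> element are null unless direction = 'O'
    CASE WHEN direction = 'O' THEN id END        AS [Out/@Id],
    CASE WHEN direction = 'O' THEN direction END AS [Out]
FROM @devicepdo
ORDER BY id
FOR XML PATH('');
-- Should return:
-- <In Id="1">I</In><Out Id="2">O</Out><In Id="3">I</In><Out Id="4">O</Out>
```

This approach needs one pair of CASE expressions per possible tag name, so it suits a small fixed set of values such as I and O.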

Show comma instead of point as decimal separator

I just want to get the right number format here in Germany, so I need to show commas as the decimal separator instead of points. But this...
DECLARE @euros money
SET @euros = 1025040.2365
SELECT CONVERT(varchar(30), @euros, 1)
displays 1,025,040.24 instead of 1.025.040,24 (or 1025040,24). In C# it would be simple to provide the appropriate CultureInfo, but how do I do it in T-SQL?
Do I really need to use REPLACE? And even if so, how do I replace 1,025,040.24 correctly?
To provide the appropriate culture info, SQL Server 2012 and later offer the FORMAT() function. Here's an example:
declare @f float = 123456.789;
select
[raw] = str(@f,20,3)
,[standard] = cast(format(@f, 'N', 'en-US') as varchar(20))
,[German] = cast(format(@f, 'N', 'de-DE') as varchar(20))
returns
raw |standard |German |
---------------------|-----------|-----------|
123456.789 |123,456.79 |123.456,79 |
You can also specify in the second parameter a custom format string with the same rules as for .NET.
Docs: https://msdn.microsoft.com/en-US/library/hh213505.aspx
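Applied to the money value from the question, a minimal sketch might look like this. It assumes SQL Server 2012 or later. The column aliases are illustrative, 'N2' is the standard numeric format with two decimals, and '0.00' is a custom format string that omits the group separator.

```sql
DECLARE @euros money = 1025040.2365;

SELECT
    FORMAT(@euros, 'N2',   'de-DE') AS with_grouping,     -- 1.025.040,24
    FORMAT(@euros, '0.00', 'de-DE') AS without_grouping;  -- 1025040,24
```

FORMAT returns nvarchar, so the result is a display string and should not be fed back into numeric calculations.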
Well, as far as I know, CONVERT has no culture-specific options.
So you can do it with nested REPLACE calls (yes, it looks a bit ugly...)
select
replace(replace(replace(convert(varchar(30), @euros, 1), ',', '|'), '.', ','), '|', '.')
Idea: first change the comma to a placeholder, then change the dot to a comma, and then change the placeholder back to a dot.
DECLARE @euros money
SET @euros = 1025040.2365
SELECT REPLACE(CONVERT(varchar(30), @euros, 0), '.', ',')
should do it (at least to get 1025040,24)
You could use replace something like this:
DECLARE @euros money
SET @euros = 1025040.2365
SELECT REPLACE(REPLACE(CONVERT(varchar(30), @euros, 1),',',''),'.',',');
You can first replace the thousands separator (,) with a zero-length string (''), and then replace the decimal point (.) with a comma (,) in the same SELECT statement.

Parse simple XML in SQL Server

How to parse the following xml as recordset?
<root>
<240>0</240>
<241>1</241>
<242>2</242>
<243>3</243>
<249>4</249>
</root>
<root 240="0" 241="1" 242="2" 243="3" 249="4"/>
When I try
declare @ids xml = N'<root><240>0</240><241>1</241></root>'
SELECT T.Item.value('240[1]', 'int')
from @ids.nodes('/root') AS T(Item)
I get an error
XML parsing: line 1, character 8, illegal qualified name character
But generally I need the following output:
|240|0|
|241|1|
...
When the XML elements are named as usual, everything is OK (<row key="240" value="0"/>).
XML does not allow an element or attribute name to start with a digit. Use the format from your example:
<row key="240" value="0"/>
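With that format, the requested key/value recordset can be read with nodes() and value(). This is a minimal sketch: the two-row document is sample data, and both attributes are assumed to hold integers.

```sql
DECLARE @ids xml =
    N'<root><row key="240" value="0"/><row key="241" value="1"/></root>';

SELECT T.Item.value('@key',   'int') AS [key],    -- attribute "key" of each <row>
       T.Item.value('@value', 'int') AS [value]   -- attribute "value" of each <row>
FROM @ids.nodes('/root/row') AS T(Item);          -- one result row per <row> element
```

Each row element becomes one row of the result, giving the pairs 240/0 and 241/1 for this sample.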
