SQL Server geography: Reduce size (decimal precision) of WKT text - sql-server

For my farming app, a stored proc retrieves paddock/field boundaries that are stored as SQL Server geography data type, for display on the user's mobile device.
SQL Server's .ToString() and .STAsText() functions render each vertex as a lat/long pair, to 15 decimal places. From this answer, 15 decimals defines a location to within the width of an atom! To the nearest metre would be good enough for me.
The resulting overly-precise payload is extremely large and too slow for use on larger farms.
From my SQL Server geography data, I would like to produce WKT that is formatted to 4 or 5 decimals. I could find no built-in methods, but my best leads are:
Postgis and Google Cloud "BigQuery" have the ST_SNAPTOGRID function, which would be perfect, and
Regex might be useful, e.g. this answer, but SQL Server doesn't seem to have a regex replace.
I would think this is a common problem: is there a simple solution?

By stepping through #iamdave's excellent answer, and using that same approach, it looks like we need split only on periods.... I think we can ignore all parantheses and commas, and ignore the POLYGON prefix (which means it'll work across other GEOGRAPHY types, such as MULTIPOLYGON.)
i.e. Each time we find a period, grab only the next 4 characters after it, and throw away any numbers after that (until we hit a non-number.)
This works for me (using #iamdave's test data):
DECLARE #wkt NVARCHAR(MAX), #wktShort NVARCHAR(MAX);
DECLARE #decimalPlaces int = 4;
SET #wkt = 'POLYGON((-121.973669 37.365336,-121.97367 37.365336,-121.973642 37.365309,-121.973415 37.365309,-121.973189 37.365309,-121.973002 37.365912,-121.972815 37.366515,-121.972796 37.366532,-121.972776 37.366549,-121.972627 37.366424,-121.972478 37.366299,-121.972422 37.366299,-121.972366 37.366299,-121.972298 37.366356,-121.97223 37.366412,-121.97215 37.366505,-121.97207 37.366598,-121.971908 37.366794,-121.971489 37.367353,-121.971396 37.367484,-121.971285 37.36769,-121.971173 37.367897,-121.971121 37.368072,-121.971068 37.368248,-121.971028 37.36847,-121.970987 37.368692,-121.970987 37.368779,-121.970987 37.368866,-121.970949 37.368923,-121.970912 37.36898,-121.970935 37.36898,-121.970958 37.36898,-121.970975 37.368933,-121.970993 37.368887,-121.971067 37.368807,-121.97114 37.368726,-121.971124 37.368705,-121.971108 37.368685,-121.971136 37.368698,-121.971163 37.368712,-121.97134 37.368531,-121.971516 37.368351,-121.971697 37.368186,-121.971878 37.368021,-121.972085 37.367846,-121.972293 37.36767,-121.972331 37.367629,-121.972369 37.367588,-121.972125 37.367763,-121.97188 37.367938,-121.971612 37.36815,-121.971345 37.368362,-121.971321 37.36835,-121.971297 37.368338,-121.971323 37.368298,-121.97135 37.368259,-121.971569 37.368062,-121.971788 37.367865,-121.971977 37.367716,-121.972166 37.367567,-121.972345 37.367442,-121.972524 37.367317,-121.972605 37.367272,-121.972687 37.367227,-121.972728 37.367227,-121.972769 37.367227,-121.972769 37.367259,-121.972769 37.367291,-121.972612 37.367416,-121.972454 37.367542,-121.972488 37.367558,-121.972521 37.367575,-121.972404 37.367674,-121.972286 37.367773,-121.972194 37.367851,-121.972101 37.367928,-121.972046 37.36799,-121.971991 37.368052,-121.972008 37.368052,-121.972025 37.368052,-121.972143 37.367959,-121.972261 37.367866,-121.972296 37.367866,-121.972276 37.36794,-121.972221 37.36798,-121.972094 37.368097,-121.971966 37.368214,-121.971956 37.368324,-121.971945 37.368433,-121.971907 37.368753,-121.971868 37.369073,-121.97184 37.369578,-121.971812 37.370083,-121.971798 37.370212,-121.971783 37.370342,-121.971542 37.370486,-121.971904 37.370324,-121.972085 37.37028,-121.972266 37.370236,-121.972559 37.370196,-121.972852 37.370155,-121.973019 37.370155,-121.973186 37.370155,-121.973232 37.370136,-121.973279 37.370116,-121.973307 37.370058,-121.973336 37.370001,-121.973363 37.369836,-121.973391 37.369671,-121.973419 37.369227,-121.973446 37.368784,-121.973429 37.368413,-121.973413 37.368041,-121.973361 37.367714,-121.973308 37.367387,-121.973285 37.367339,-121.973262 37.36729,-121.973126 37.3673,-121.972989 37.36731,-121.973066 37.36728,-121.973144 37.367251,-121.973269 37.367237,-121.973393 37.367223,-121.973443 37.367158,-121.973493 37.367093,-121.973518 37.36702,-121.973543 37.366947,-121.973582 37.366618,-121.973622 37.366288,-121.97366 37.365826,-121.973698 37.365363,-121.973669 37.365336))';
-- Split on '.', then get the next N decimals, and find the index of the first non-number.
-- Then recombine the fragments, skipping the unwanted numbers.
WITH points AS (
SELECT value, LEFT(value, #decimalPlaces) AS decimals, PATINDEX('%[^0-9]%', value) AS indx
FROM STRING_SPLIT(#wkt, '.')
)
SELECT #wktShort = STRING_AGG(IIF(indx < #decimalPlaces, '', decimals) + SUBSTRING(value, indx, LEN(value)), '.')
FROM points;
Comparing original vs shortened, we can see that each number is truncated to 4dp:
SELECT #wkt AS Text UNION ALL SELECT #wktShort;

Edit
I believe I may have misunderstood your question and you are looking to transmit the WKT rather than the binary representations of the polygons? If that is the case, my answer below still shows you how to chop off some decimal places (without rounding them). Just don't wrap the stuff(...) FOR XML in a STGeomFromText and you have the amended WKT.
When working with geography data types, it can be handy to maintain a very detailed 'master' version, from which you generate and persist less detailed versions based on your requirements.
An easy way to generate these reduced complexity polygons is to use the helpfully named Reduce function, which I think will actually help you in this situation.
If you would prefer to go down the route of reducing the number of decimal places you will either have to write a custom CLR function or enter the wonderful world of SQL Server string manipulation!
SQL Query
declare #DecimalPlaces int = 4; -- Specify the desired number of lat/long decimals
with g as(
select p.g -- Original polygon, for comparison purposes
,geography::STGeomFromText('POLYGON((' -- stripped apart and then recreated polygon from text, using a custom string split function. You won't be able to use the built in STRING_SPLIT here as it doesn't guarantee sort order.
+ stuff((select ', ' + left(s.item,charindex('.',s.item,0) + #DecimalPlaces) + substring(s.item,charindex(' ',s.item,0),charindex('.',s.item,charindex(' ',s.item,0)) - charindex(' ',s.item,0) + 1 + #DecimalPlaces)
from dbo.fn_StringSplitMax(replace(replace(p.g.STAsText(),'POLYGON ((',''),'))',''),', ',null) as s
for xml path(''), type).value('.', 'NVARCHAR(MAX)') -- STUFF and FOR XML mimics GROUP_CONCAT functionality seen in other SQL languages, to recombine shortened Points back into a Polygon string
,1,2,''
)
+ '))', 4326).MakeValid() as x -- Remember to make the polygon valid again, as you have been messing with the Point data
from(values(geography::STGeomFromText('POLYGON((-121.973669 37.365336,-121.97367 37.365336,-121.973642 37.365309,-121.973415 37.365309,-121.973189 37.365309,-121.973002 37.365912,-121.972815 37.366515,-121.972796 37.366532,-121.972776 37.366549,-121.972627 37.366424,-121.972478 37.366299,-121.972422 37.366299,-121.972366 37.366299,-121.972298 37.366356,-121.97223 37.366412,-121.97215 37.366505,-121.97207 37.366598,-121.971908 37.366794,-121.971489 37.367353,-121.971396 37.367484,-121.971285 37.36769,-121.971173 37.367897,-121.971121 37.368072,-121.971068 37.368248,-121.971028 37.36847,-121.970987 37.368692,-121.970987 37.368779,-121.970987 37.368866,-121.970949 37.368923,-121.970912 37.36898,-121.970935 37.36898,-121.970958 37.36898,-121.970975 37.368933,-121.970993 37.368887,-121.971067 37.368807,-121.97114 37.368726,-121.971124 37.368705,-121.971108 37.368685,-121.971136 37.368698,-121.971163 37.368712,-121.97134 37.368531,-121.971516 37.368351,-121.971697 37.368186,-121.971878 37.368021,-121.972085 37.367846,-121.972293 37.36767,-121.972331 37.367629,-121.972369 37.367588,-121.972125 37.367763,-121.97188 37.367938,-121.971612 37.36815,-121.971345 37.368362,-121.971321 37.36835,-121.971297 37.368338,-121.971323 37.368298,-121.97135 37.368259,-121.971569 37.368062,-121.971788 37.367865,-121.971977 37.367716,-121.972166 37.367567,-121.972345 37.367442,-121.972524 37.367317,-121.972605 37.367272,-121.972687 37.367227,-121.972728 37.367227,-121.972769 37.367227,-121.972769 37.367259,-121.972769 37.367291,-121.972612 37.367416,-121.972454 37.367542,-121.972488 37.367558,-121.972521 37.367575,-121.972404 37.367674,-121.972286 37.367773,-121.972194 37.367851,-121.972101 37.367928,-121.972046 37.36799,-121.971991 37.368052,-121.972008 37.368052,-121.972025 37.368052,-121.972143 37.367959,-121.972261 37.367866,-121.972296 37.367866,-121.972276 37.36794,-121.972221 37.36798,-121.972094 37.368097,-121.971966 37.368214,-121.971956 37.368324,-121.971945 37.368433,-121.971907 37.368753,-121.971868 37.369073,-121.97184 37.369578,-121.971812 37.370083,-121.971798 37.370212,-121.971783 37.370342,-121.971542 37.370486,-121.971904 37.370324,-121.972085 37.37028,-121.972266 37.370236,-121.972559 37.370196,-121.972852 37.370155,-121.973019 37.370155,-121.973186 37.370155,-121.973232 37.370136,-121.973279 37.370116,-121.973307 37.370058,-121.973336 37.370001,-121.973363 37.369836,-121.973391 37.369671,-121.973419 37.369227,-121.973446 37.368784,-121.973429 37.368413,-121.973413 37.368041,-121.973361 37.367714,-121.973308 37.367387,-121.973285 37.367339,-121.973262 37.36729,-121.973126 37.3673,-121.972989 37.36731,-121.973066 37.36728,-121.973144 37.367251,-121.973269 37.367237,-121.973393 37.367223,-121.973443 37.367158,-121.973493 37.367093,-121.973518 37.36702,-121.973543 37.366947,-121.973582 37.366618,-121.973622 37.366288,-121.97366 37.365826,-121.973698 37.365363,-121.973669 37.365336))', 4326))) as p(g)
)
-- select various versions of the polygons into the same column for overlay comparison in SSMS
select 'Original' as l
,g
from g
union all
select 'Short' as l
,x
from g
union all
select 'Original Reduced' as l
,g.Reduce(10)
from g
union all
select 'Short Reduced' as l
,x.Reduce(10)
from g;
Output
What is interesting to note here is the difference in length of the geog binary representation (simple count of characters as displayed). As I mentioned above, simply using the Reduce function may do what you need, so you will want to test various approaches to see how best you cut down on your data transfer.
+------------------+--------------------+------+
| l | g | Len |
+------------------+--------------------+------+
| Original | 0xE6100000010484...| 4290 |
| Short | 0xE6100000010471...| 3840 |
| Original Reduced | 0xE6100000010418...| 834 |
| Short Reduced | 0xE610000001041E...| 1184 |
+------------------+--------------------+------+
Visual Comparison
String Split Function
As polygon data can be bloody huge, you will need a string splitter that can handle more then 4k or 8k characters. In my case I tend to opt for an xml based approach:
create function [dbo].[fn_StringSplitMax]
(
#str nvarchar(max) = ' ' -- String to split.
,#delimiter as nvarchar(max) = ',' -- Delimiting value to split on.
,#num as int = null -- Which value to return.
)
returns table
as
return
with s as
( -- Convert the string to an XML value, replacing the delimiter with XML tags
select convert(xml,'<x>' + replace((select #str for xml path('')),#delimiter,'</x><x>') + '</x>').query('.') as s
)
select rn
,item -- Select the values from the generated XML value by CROSS APPLYing to the XML nodes
from(select row_number() over (order by (select null)) as rn
,n.x.value('.','nvarchar(max)') as item
from s
cross apply s.nodes('x') as n(x)
) a
where rn = #num
or #num is null;

Related

SQL SERVER: Selecting numerals following Some Characters

I am trying to select some numerals from a long string, which are following some characters that is :RCT. So far I have managed to write this script;
DECLARE #rct varchar(MAX)
SET #rct = 'Reallocation of Identified Receiptsv6055161LIVERPOOL SCHOOL OF TROPICAL MEDICINE (LSTM) LONDON(GROUPA8):RCT1122489'
SELECT SUBSTRING(#rct, CHARINDEX(':RCT', #rct), LEN(#rct)) as RCT
Unfortunately, it returns an empty result. The result I expect is;
RCT
--------
1122489
There might be a more efficient way to return end-index of ':RCT' from the parent string.
But the following does the job for you:
DECLARE #rct varchar(MAX)
SET #rct = 'Reallocation of Identified Receiptsv6055161LIVERPOOL SCHOOL OF TROPICAL MEDICINE (LSTM) LONDON(GROUPA8):RCT1122489'
SELECT SUBSTRING(#rct, CHARINDEX(':RCT', #rct)+LEN(':RCT'), LEN(#rct)) as RCT

How to remove space when concatenating data from different rows into one column using xml?

I am trying to combine data from different rows into one column, and this is working with just one minor problem.
declare #RitID int = 16
select ...,
( select distinct
ISNULL(LTRIM(RTRIM(r2.LotNr)), LTRIM(RTRIM(r.LotNr))) + '+' as 'data()'
from tblExtraBestemming eb2
inner join tblRit r2 on eb2.RitID = r2.RitID
where eb2.BestemmingID = eb.BestemmingID
and eb2.BestemmingTypeID = eb.BestemmingTypeID
and ( (eb.CombinedChildExtraBestemmingID is null and eb2.RitID = #RitID)
or
(eb.CombinedChildExtraBestemmingID is not null and eb2.RitID in (select r4.RitID from tblRit r4 where r4.MasterRitID = #RitID) )
)
for XML PATH('')
) as LotNr
from tblExtraBestemming eb
where ...
this returns the correct data for the column LotNr, like this
GTT18196
GTT18197
GTT18198+ GTT18199
Now my only problem is the space after the + sign in the third row from the result, how can I get rid of this ?
I expect this result
GTT18196
GTT18197
GTT18198+GTT18199
PS, actually there is also a + at the end of each row, but that is removed by the client. I thought I better mentions this already.
EDIT
I checked the data, there are no spaces at the end or the beginning of the data
EDIT
Query updated as suggested by #Larnu
EDIT
if I check the data in the table, this is the result
select '/' + r.LotNr + '/' from tblRit r where r.RitID in (50798, 50799)
COLUMN1
-------
/GTT18198/
/GTT18199/
So it appears to me there are no characters before or after the data
Just remove AS 'data()' from your query (it is not required in the above case).
And if trailing + is a problem, move it to the beginning and use STUFF function to chop off the first character from the result.

Show comma instead of point as decimal separator

I just want to get the right number format here in germany, so i need to show commas as decimal separator instead of points. But this...
DECLARE #euros money
SET #euros = 1025040.2365
SELECT CONVERT(varchar(30), #euros, 1)
Displays 1,025,040.24 instead of 1.025.040,24 (or 1025040,24). In C# it would be simple to provide the appropriate CultureInfo but how to do it in T-SQL?
Do i really need to use REPLACE? But even if, how to replace 1,025,040.24 correctly?
To provide the appropriate culture info, in SQL 2012 there is the FORMAT() function. Here's an example:
declare #f float = 123456.789;
select
[raw] = str(#f,20,3)
,[standard] = cast(format(#f, 'N', 'en-US') as varchar(20))
,[German] = cast(format(#f, 'N', 'de-DE') as varchar(20))
returns
raw |standard |German |
---------------------|-----------|-----------|
123456.789 |123,456.79 |123.456,79 |
You can also specify in the second parameter a custom format string with the same rules as for .NET.
Docs: https://msdn.microsoft.com/en-US/library/hh213505.aspx
Well, as far as I know, there are no culture-specific options for convert available.
So you can do it using replaces (yes, it looks a bit ugly...)
select
replace(replace(replace(convert(varchar(30), #euros, 1), ',', '|'), '.', ','), '|', '.')
Idea: first change comma to something, then change dot to comma, and then "something" back to dot.
DECLARE #euros money
SET #euros = 1025040.2365
SELECT REPLACE(CONVERT(varchar(30), #euros, 0), '.', ',')
should do it (at least to get 1025040,24)
You could use replace something like this:
DECLARE #euros money
SET #euros = 1025040.2365
SELECT REPLACE(REPLACE(CONVERT(varchar(30), #euros, 1),',',''),'.',',');
SQL Fiddle Demo
You can first replace thousand separator comma(,) to a Zero length string (''), and then you can replace Decimal('.') to comma(',') in the same select statement.

Need to eliminate the last 4 characters in a string(varchar)

i am using a stored procedure, where it is taking policy number as parameter which is varchar. I need to eliminate the last 4 characters of the policy number when we retrive from the tables. But the data for policy numbers is not consistent, so I am confused how to use the logic for this. The sample policy numbers are:
KSDRE0021-000
APDRE-10-21-000
KSDRE0021
APDRE-10-21
These are four formats where policies are there in our tables.For some policies there is no tailing end '-000', so that is the challenging part. Now, I need to eliminate the tailing part '-000' from the policies when I retrieve the data from tables.
This is the sample code, which is pulling the policy data from tables.
Create Proc usp.dbo.policydataSP #policy_num varchar(18)
AS
Begin
Select * from policy_table pt
where pt.policy_num = #policy_num
End
STEP 1: Create a User Defined Function to normalize a policy number.
create function dbo.normalize_policy_num
(#policy_num varchar(100))
returns varchar(100)
as
begin
-- replace trailing digits
if (#policy_num like '%-[0-9][0-9][0-9]')
set #policy_num = left(#policy_num, len(#policy_num) - 4)
-- replace remaining hyphens
set #policy_num = replace(#policy_num, '-', '')
return #policy_num
end
What this essentially doing is stripping off the trailing '-000' from policy numbers that contain the pattern, then removing remaining hyphens. This function seems to work on your supplied policy numbers:
-- returns: KSDRE0021
select dbo.normalize_policy_num('KSDRE0021-000')
-- returns: APDRE1021
select dbo.normalize_policy_num('APDRE-10-21-000')
-- returns: KSDRE0021
select dbo.normalize_policy_num('KSDRE0021')
-- returns: APDRE1021
select dbo.normalize_policy_num('APDRE-10-21')
STEP 2: Modify your SP as follows:
create proc usp.dbo.policydataSP
#policy_num varchar(18)
as
begin
select
dbo.normalize_policy_num(pt.policy_num) as normalized_policy_num,
pt.*
from policy_table pt
where dbo.normalize_policy_num(#policy_num) = dbo.normalize_policy_num(pt.policy_num)
Note: If you are able to modify the table schema, you could add a persisted computed column using the UDF specified above. If you add an index to it, queries will run much faster. However, there will be some penalty for inserts, so there is a trade-off.
this is probably your best bet. Match the policy number up to the length of the requested parameter:
Create Proc usp.dbo.policydataSP
#policy_num varchar(18)
AS
Begin
Select * from policy_table pt where LEFT(len(#policy_num),pt.policy_num) = #policy_num
End
If you only want to strip -000 when returning results:
select case right(policy_num, 4)
when '-000' then left(policy_num, len(policy_num) - 4)
else policy_num end as policy_num
from policy_table pt
where pt.policy_num = #policy_num
If you want to strip any 3-digit value following a dash:
select case when policy_num like '%-[0-9][0-9][0-9]' then left(policy_num, len(policy_num) - 4)
else policy_num end as policy_num
from policy_table pt
where pt.policy_num = #policy_num

Query to fetch data between two characters in informix

I have a value in informix which is like this :
value AMOUNT: <15000000.00> USD
I need to fetch 15000000.00 afrom the above.
I am using this query to fetch the data between <> as workaround
select substring (value[15,40]
from 1 for length (value[15,40]) -5 )
from tablename p where value like 'AMOUNT%';
But, this is not generic as the lenght may vary.
Please help me with a generic query for this, fetch the data between <>.
The database I am using is Informix version 9.4.
It's a diabolical problem, created by whoever chose to break one of the fundamental rules of database design: that the content of a column should be a single, indivisible value.
The best solution would be to modify the table to contain a value_descr = "AMOUNT", a value = 15000000.00, and a value_type = "USD", and ensure that the incoming data is stored in that fashion. Easier said than done, I know.
Failing that, you'll have to write a UDR that parses the string and returns the numeric portion of it. This would be feasible in SPL, but probably very slow. Something along the lines of:
CREATE PROCEDURE extract_value (inp VARCHAR(255)) RETURNING DECIMAL;
DEFINE s SMALLINT;
DEFINE l SMALLINT;
DEFINE i SMALLINT;
FOR i = 1 TO LENGTH(inp)
IF SUBSTR(inp, i, 1) = "<" THEN
LET s = i + 1;
ELIF SUBSTR(inp, i, 1) = ">" THEN
LET l = i - s - 1;
RETURN SUBSTR(inp, s, l)::DECIMAL;
END IF;
END FOR;
RETURN NULL::DECIMAL; -- could not parse out number
END PROCEDURE;
... which you would execute thus:
SELECT extract_value(p.value)
FROM tablename AS p
WHERE p.value LIKE 'AMOUNT%'
NB: that procedure compiles and produces output in my limited testing on version 11.5. There is no validation done to ensure the string between the <> parses as a number. I don't have an instance of 9.4 handy, but I haven't used any features not available in 9.4 TTBOMK.

Resources