Preserve start points in UnionAggregate - sql-server

Use case 1:
DECLARE @Geom TABLE
(
shape geometry,
shapeType nvarchar(50)
);
INSERT INTO @Geom(shape,shapeType)
VALUES('LINESTRING(1 2, 3 4)', 'A'),
('LINESTRING(3.2 4, 7 8)', 'B');
SELECT *
FROM @Geom
SELECT geometry::UnionAggregate(shape).ToString(), geometry::UnionAggregate(shape)
FROM @Geom;
The WKT for the output is
MULTILINESTRING ((7 8, 3.2 4), (3 4, 1 2))
when I would want
MULTILINESTRING ((1 2, 3 4), (3.2 4, 7 8))
where the beginnings of the "A" and "B" lines should be (1 2) and (3.2 4) respectively.
This behavior of UnionAggregate doesn't seem to care about "direction" of the geometry in order to maintain that A union B and B union A is the same result. However, I want to preserve start/endpoints as I am unioning street geometry and I want all the LINESTRINGs to go in their original direction.
This problem is discussed here: https://social.msdn.microsoft.com/Forums/sqlserver/en-US/89e95366-3649-4294-a0bc-f3921598157f/union-of-linestrings-and-reversing-direction?forum=sqlspatial
They seem to suggest a possible solution involving checking the end result, but it is not clear to me how to do that. It is hinted in a linked thread that
The MultiLineString always represents the graph from the point which farthest from the origin point.
It is not clear to me exactly what this means, but I don't think I can just assume the result of a UnionAggregate is always the reverse of what I want.
If it is hard to know directional intent then I can add M measures where the direction should follow increasing M values.
Assuming I have a method for reversing the points in line, how would I go about solving for this?
I found a function that mimics STUnion with added support for Z and M measures: http://www.spatialdbadvisor.com/files/SQLServer.html#robo48 however, it is noted that "their direction could change (eg Start/Start Point relationship).", which is what I want to avoid.
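For reference, the reversal method I have in mind is just a walk over STPointN from the last point back to the first, rebuilding the WKT - a minimal sketch (the function name and the WKT round-trip are illustrative only, and it assumes plain X/Y LINESTRINGs):
CREATE FUNCTION dbo.ReverseLine (@line geometry)
RETURNS geometry
AS
BEGIN
-- Walk the points backwards and rebuild the WKT.
-- NB: the default float-to-string conversion rounds to 6 significant digits; adjust if more precision is needed.
DECLARE @i int = @line.STNumPoints(), @wkt nvarchar(max) = N'';
WHILE @i >= 1
BEGIN
SELECT @wkt += CONVERT(nvarchar(50), @line.STPointN(@i).STX) + N' '
+ CONVERT(nvarchar(50), @line.STPointN(@i).STY)
+ CASE WHEN @i > 1 THEN N', ' ELSE N'' END;
SET @i -= 1;
END
RETURN geometry::STGeomFromText(N'LINESTRING(' + @wkt + N')', @line.STSrid);
END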
Edit:
The functionality I also need is that when two LINESTRINGs have a shared endpoint, the result is a connected LINESTRING.
Use case 2:
DECLARE @Geom TABLE
(
shape geometry,
shapeType nvarchar(50)
);
INSERT INTO @Geom(shape,shapeType)
VALUES('LINESTRING(1 2, 3 4)', 'A'),
('LINESTRING(3 4, 7 8)', 'B');
SELECT *
FROM @Geom
SELECT geometry::UnionAggregate(shape).ToString(), geometry::UnionAggregate(shape)
FROM @Geom;
This results in WKT LINESTRING (7 8, 3 4, 1 2)
When I would want
LINESTRING (1 2, 3 4, 7 8)
Attempt at solution
The geometry::CollectionAggregate(shape).Reduce(0) approach suggested by Clay solves use case 1. I tried just using STUnion on the result with an empty linestring, and while it works, it falls back to the incorrect ordering.
I suspect the solution will be a scalar function similar to ST_LineMerge which takes the result of the CollectionAggregate (a MULTILINESTRING), merges the points together into one LINESTRING when it can, and returns the geometry back unaltered when it can't.

The geometry types don't record/encode directionality. The lines that you give it may be considered "undirected" or "bi-directional". This returns 1:
select geometry::STGeomFromText('LINESTRING(1 2, 3 4)',0).STEquals(
geometry::STGeomFromText('LINESTRING(3 4, 1 2)',0))
So what you're looking for isn't available using these types. You consider the "start points" to be special. I suggest you separately record those as individual POINTs.
This does make all of the resulting code uglier now though - you have to keep these data pairs processed together:
DECLARE @Geom TABLE
(
start geometry,
shape geometry,
shapeType nvarchar(50)
);
INSERT INTO @Geom(start,shape,shapeType)
VALUES('POINT(1 2)','LINESTRING(1 2, 3 4)', 'A'),
('POINT(3.2 4)','LINESTRING(3.2 4, 7 8)', 'B');
SELECT *
FROM @Geom
SELECT
geometry::UnionAggregate(start).ToString(), geometry::UnionAggregate(shape).ToString(),
geometry::UnionAggregate(start), geometry::UnionAggregate(shape)
FROM @Geom;
At this point you may decide to stop using the geography type directly - you can create a CLR UDT that references SqlGeography (a CLR surfacing of the same type) and uses that internally but also tracks its "directionality" too, all wrapped up together, and start using that instead.
You're unlikely to want to surface all of the geography methods in that wrapper though - you'll have to pick and choose your battles. And, of course, since it's not really the SQL Server geography turning up in your results, you won't get the benefit of the "Spatial Results" tab in Management Studio.
The only place I can think of where some "directionality" does exist in these types is the left-hand rule for disambiguating geography shapes.
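As a quick illustration of that rule (assuming SQL Server 2012 or later, where geography instances larger than a hemisphere are allowed), the same ring listed in the two opposite orders describes two different regions, because the interior is taken to lie to the left of the direction of travel:
-- The counter-clockwise ring is the small square; the clockwise ring is everything else on the globe.
DECLARE @ccw geography = geography::STGeomFromText('POLYGON((0 0, 1 0, 1 1, 0 1, 0 0))', 4326);
DECLARE @cw geography = geography::STGeomFromText('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))', 4326);
SELECT @ccw.STArea() AS SmallSquareArea, @cw.STArea() AS RestOfGlobeArea;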

Originally, I suggested...
DECLARE @Geom TABLE
(
shape geometry,
shapeType nvarchar(50)
);
INSERT @Geom(shape,shapeType) VALUES
('LINESTRING(1 2, 3 4)', 'A'),
('LINESTRING(3.2 4, 7 8)', 'B');
SELECT * FROM @Geom
SELECT
geometry::CollectionAggregate(shape).Reduce(0).ToString(),
geometry::CollectionAggregate(shape).Reduce(0)
FROM @Geom
You get:
...however, it became clear to me that the answer I gave isn't quite good enough. For example, it's kinda hard to keep Reduce() from simplifying away part of your lines.
I still like the CollectionAggregate for getting your original array of lines into a single thing, but then I figured there just has to be a way of building the requisite geometry structure.
I played with this several times, and this iteration will eval to a LineString or a MultiLineString depending on whether there are disjoint LineString elements in the inputs:
create function dbo.SimplifyToLine( @geo geometry ) returns geometry as
begin
declare
@numSubGeos int = @geo.STNumGeometries(),
@subGeoIdx int = 1,
@sql nvarchar( max ) = N'',
@subGeo geometry,
@oldEndX float = -1.0e26,
@oldEndY float = -1.0e26,
@startX float,
@startY float,
@endX float,
@endY float,
@idx int,
@numPoints int,
@point geometry,
@segment int = 1,
@continue bit,
@result geometry,
@started bit = 0
declare
@geos table
(
Idx int primary key,
SubGeo geometry,
StartX decimal,
EndX decimal,
StartY decimal,
EndY decimal,
NumPoints int,
ContinueFromPrevious bit
)
declare
@multiLines table
(
Idx int primary key,
Segment nvarchar(max)
)
--> collect geometries and extents...
while ( @subGeoIdx <= @numSubGeos )
begin
select @subGeo = @geo.STGeometryN( @subGeoIdx )
select
@startX = @subGeo.STPointN( 1 ).STX,
@startY = @subGeo.STPointN( 1 ).STY,
@endX = @subGeo.STPointN( @subGeo.STNumPoints( ) ).STX,
@endY = @subGeo.STPointN( @subGeo.STNumPoints( ) ).STY
insert @geos values
(
@subGeoIdx,
@subGeo,
@startX,
@endX,
@startY,
@endY,
@subGeo.STNumPoints() ,
case when @subGeoIdx = 1 then 1 when @oldEndX = @startX and @oldEndY = @startY then 1 else 0 end
)
select
@oldEndX = @endX,
@oldEndY = @endY,
@subGeoIdx = @subGeoIdx + 1
end
if not exists ( select * from @geos where ContinueFromPrevious = 0 ) --> then all LineStrings are connected
begin
--> build a single LINESTRING( )...
select @sql = ''
declare c cursor for select SubGeo, StartX, EndX, StartY, EndY, NumPoints, ContinueFromPrevious from @geos order by Idx
open c
while ( 1 = 1 )
begin
fetch next from c into @subGeo, @startX, @endX, @startY, @endY, @numPoints, @continue
if @@fetch_status != 0 break;
select @idx = case when @started = 0 then 1 else 2 end, @started = 1 --> accrue all points, de-duplicating line ends...
while ( @idx <= @numPoints )
begin
select @point = @subGeo.STPointN( @idx )
select @sql += convert( nvarchar, @point.STX ) + N' ' + convert( nvarchar, @point.STY ) + N','
select @idx = @idx + 1
end
end
close c
deallocate c
select @sql = substring( @sql, 1, len( @sql ) -1 )
select @result = geometry::STGeomFromText(N'LINESTRING(' + @sql + N')', 0 )
end
else --> we have disjoint lines in the inputs...
begin
select @sql = N'', @started = 0
--> build a MULTILINESTRING((),()...) with line segments terminated at disjoint points..
declare c cursor for select SubGeo, StartX, EndX, StartY, EndY, NumPoints, ContinueFromPrevious from @geos order by Idx
open c
while ( 1=1 )
begin
fetch next from c into @subGeo, @startX, @endX, @startY, @endY, @numPoints, @continue
if @@fetch_status != 0 break;
if @continue = 1
begin
select @idx = case when @started = 0 then 1 else 2 end, @started = 1
while ( @idx <= @numPoints )
begin
select @point = @subGeo.STPointN( @idx )
select @sql += convert( nvarchar, @point.STX ) + N' ' + convert( nvarchar, @point.STY ) + N','
select @idx = @idx + 1
end
end
else
begin
insert @multiLines values ( @segment, substring( @sql, 1, len( @sql ) -1 ) ) --> collect the segment
select @idx = 1, @sql = N'', @segment = @segment + 1
while ( @idx <= @numPoints )
begin
select @point = @subGeo.STPointN( @idx )
select @sql += convert( nvarchar, @point.STX ) + N' ' + convert( nvarchar, @point.STY ) + N','
select @idx = @idx + 1
end
end
end
close c
deallocate c
insert @multiLines values ( @segment, substring( @sql, 1, len( @sql ) -1 ) )
select @sql = N''
select @sql += N'(' + Segment + N'),' from @multiLines order by Idx --> appends all segments
select @sql = substring( @sql, 1, len( @sql ) -1 )
select @result = geometry::STGeomFromText( 'MULTILINESTRING('+ @sql + N')', 1 )
end
return @result
end
...and finally, given:
DECLARE @Geom TABLE
(
shape geometry,
shapeType nvarchar(50)
);
INSERT @Geom(shape,shapeType) VALUES
('LINESTRING(1 2, 3 4)', 'A'),
('LINESTRING(3 4, 9 9)', 'B'), --> disjoint from here to the next LINESTRING
('LINESTRING(9 8, 3 4)', 'C'),
('LINESTRING(3 4, 1 2)', 'D');
select
dbo.SimplifyToLine(geometry::CollectionAggregate(shape)).ToString(),
dbo.SimplifyToLine(geometry::CollectionAggregate(shape))
from
@Geom
delete @Geom
INSERT @Geom(shape,shapeType) VALUES
('LINESTRING(1 2, 3 4)', 'A'),
('LINESTRING(3 4, 9 8)', 'B'),
('LINESTRING(9 8, 3 4)', 'C'),
('LINESTRING(3 4, 1 2)', 'D');
select
dbo.SimplifyToLine(geometry::CollectionAggregate(shape)).ToString(),
dbo.SimplifyToLine(geometry::CollectionAggregate(shape))
from
@Geom
...you get:

Going off Clay's idea of passing in a GeometryCollection, I implemented a robust version that will take any combination of POINT, MULTIPOINT, LINESTRING, and MULTILINESTRING, remove any touching endpoints within a @Tolerance, and create a POINT, LINESTRING, or MULTILINESTRING.
Here is a demonstration of it working (notice how the tolerance of 0 vs 0.1 makes a difference for the 2nd and 3rd outputs):
DECLARE @GeometryCollection GEOMETRY = GEOMETRY::STGeomFromText('GEOMETRYCOLLECTION (LINESTRING (1 2, 3 4), LINESTRING (3 4, 100 100), LINESTRING (9 8, 3 4), LINESTRING (3 4, 1 2), POINT(1 2), POINT(1 2), POINT(1 2))',0)
SELECT [dbo].[fnSimplifyToLine](@GeometryCollection, 0).ToString();
--Output: MULTILINESTRING ((1 2, 3 4, 100 100), (9 8, 3 4, 1 2))
SET @GeometryCollection = GEOMETRY::STGeomFromText('GEOMETRYCOLLECTION (LINESTRING (1 2, 3 4.1), LINESTRING (3 4, 9 9, 6 1))',0)
SELECT [dbo].[fnSimplifyToLine](@GeometryCollection, 0).ToString()
--Output: MULTILINESTRING ((1 2, 3 4.1), (3 4, 9 9, 6 1))
SET @GeometryCollection = GEOMETRY::STGeomFromText('GEOMETRYCOLLECTION (LINESTRING (1 2, 3 4.1), LINESTRING (3 4, 9 9, 6 1))',0)
SELECT [dbo].[fnSimplifyToLine](@GeometryCollection, 0.1).ToString()
--Output: LINESTRING (1 2, 3 4.1, 9 9, 6 1)
SET @GeometryCollection = GEOMETRY::STGeomFromText('GEOMETRYCOLLECTION (POINT(1 2))',0)
SELECT [dbo].[fnSimplifyToLine](@GeometryCollection, 0).ToString()
--Output: POINT (1 2)
SET @GeometryCollection = GEOMETRY::STGeomFromText('GEOMETRYCOLLECTION (MULTIPOINT((1 2), (2 3)))',0)
SELECT [dbo].[fnSimplifyToLine](@GeometryCollection, 0).ToString()
--Output: (1 2, 2 3)
First I had to create a recursive CTE function that takes a geometry and extracts all points.
CREATE FUNCTION [dbo].[fnGetPoints]
(
@Geometry GEOMETRY
)
RETURNS TABLE
AS
RETURN
(
WITH GeometryPoints(N, Point) AS (
SELECT
CAST(1 AS DECIMAL(9,2)) as N
,@Geometry.STPointN(1) as Point
UNION ALL
SELECT
CAST(N + 1.0 AS DECIMAL(9,2)) as N
,@Geometry.STPointN(N + 1) as Point
FROM GeometryPoints GP
WHERE N < @Geometry.STNumPoints()
)
SELECT *
FROM GeometryPoints
)
Then I created a function that CROSS APPLYs fnGetPoints to every geometry in the @GeometryCollection to get a point matrix, uses a window function (LAG) to find places where the endpoints are within @Tolerance, and removes those points. Then I did a data smear to combine the geometries where they shared endpoints.
CREATE FUNCTION [dbo].[fnSimplifyToLine] (@GeometryCollection GEOMETRY, @Tolerance DECIMAL(19,10))
RETURNS GEOMETRY
AS
BEGIN
DECLARE @PointMatrix TABLE (
PointId INT,
LinestringId INT,
GeometryIndex INT,
GeometryType varchar(100),
PointIndex INT,
Point GEOMETRY,
Duplicate BIT
);
DECLARE @Linestrings TABLE (
LinestringId INT,
PointArrayStr varchar(max)
);
WITH CollectionGeometries(N, Geom) AS (
SELECT
CAST(1 AS DECIMAL(9,2)) as N
,@GeometryCollection.STGeometryN(1) as Geom
UNION ALL
SELECT
CAST(N + 1.0 AS DECIMAL(9,2)) as N
, @GeometryCollection.STGeometryN(N + 1) as Geom
FROM CollectionGeometries CG
WHERE N < @GeometryCollection.STNumGeometries()
), PointMatrix AS (
SELECT
ROW_NUMBER() OVER(ORDER BY G.N, P.N) as PointId
,G.N as GeometryIndex
,G.Geom.STGeometryType() as GeometryType
,P.N as PointIndex
,P.Point
FROM CollectionGeometries G
CROSS APPLY dbo.fnGetPoints(Geom) P
)
INSERT INTO @PointMatrix
SELECT
PointId
,GeometryIndex as LinestringId
,GeometryIndex
,GeometryType
,PointIndex
,Point
,CASE
WHEN
GeometryIndex != LAG(GeometryIndex) OVER(ORDER BY PointId)
AND ABS(Point.STX - LAG(Point.STX) OVER(ORDER BY PointId)) <= @Tolerance
AND ABS(Point.STY - LAG(Point.STY) OVER(ORDER BY PointId)) <= @Tolerance
THEN 1
ELSE 0
END as Duplicate
FROM PointMatrix
OPTION (MAXRECURSION 10000)
-- POLYGON, MULTIPOLYGON, GEOMETRYCOLLECTION, CIRCULARSTRING, COMPOUNDCURVE, CURVEPOLYGON not supported
IF EXISTS ( SELECT * FROM @PointMatrix WHERE GeometryType NOT IN ('POINT', 'MULTIPOINT', 'LINESTRING', 'MULTILINESTRING'))
RETURN CAST('Geometries in @GeometryCollection must all be IN (''POINT'',''MULTIPOINT'', ''LINESTRING'', ''MULTILINESTRING'')' as GEOMETRY);
DECLARE @SRID INT = (SELECT DISTINCT Point.STSrid FROM @PointMatrix)
UPDATE @PointMatrix
SET LinestringId = NULL
WHERE GeometryIndex IN (
SELECT GeometryIndex FROM @PointMatrix WHERE Duplicate = 1
)
DELETE @PointMatrix
WHERE Duplicate = 1;
-- Data smear
WITH Cnt AS (
SELECT PointId, Point, LinestringId,c=COUNT(LinestringId) OVER (ORDER BY PointId)
FROM @PointMatrix
), SmearedLineStringId AS (
SELECT PointId, Point, LinestringId=MAX(LinestringId) OVER (PARTITION BY c)
FROM Cnt
)
INSERT @Linestrings
SELECT
LinestringId
,'(' +
STUFF((
SELECT ',' + CAST(Point.STX as varchar(100)) + ' ' + CAST(Point.STY as varchar(100))
FROM SmearedLineStringId t2
WHERE t1.LinestringId = t2.LinestringId
ORDER BY PointId
FOR XML PATH ('')
), 1, 1, '')
+ ')' as PointArray
FROM SmearedLineStringId t1
GROUP BY LinestringId
DECLARE @Type varchar(100) = CASE
WHEN 1 =(SELECT COUNT(*) FROM @PointMatrix) THEN
'POINT'
WHEN 1 =(SELECT COUNT(*) FROM @Linestrings) THEN
'LINESTRING'
ELSE
'MULTILINESTRING'
END
DECLARE @BeginParens char(1) = '(';
DECLARE @EndParens char(1) = ')'
IF @Type != 'MULTILINESTRING'
BEGIN
SET @BeginParens = '';
SET @EndParens = '';
END
DECLARE @Wkt varchar(max) = @Type + @BeginParens +
STUFF((
SELECT ',' + PointArrayStr
FROM @Linestrings t2
ORDER BY LinestringId
FOR XML PATH ('')
), 1, 1, '')
+ @EndParens
RETURN geometry::STGeomFromText(@Wkt, @SRID)
END
GO

Related

Split With Cross Apply function in SQL Server

I need to split a variable as, exp
declare @testString varchar(100)
set @testString = ' Agency=100|Org=2112|RepOrg=2112|SubOrg= |Fund=0137|Approp=6755|Object= |SubObject= |Activity= |Function= |Job= |ReportingCat= '
select
y.items
from
dbo.Split(@testString, '|') x
cross apply
dbo.Split(x.items, '=') y
Leads to error :
Msg 102, Level 15, State 1, Line 7
Incorrect syntax near '.'.
Not sure where I'm going wrong.
Maybe you need something like this:
DECLARE @testString VARCHAR(100)
SET @testString =
' Agency=100|Org=2112|RepOrg=2112|SubOrg= |Fund=0137|Approp=6755|Object= |SubObject= |Activity= |Function= |Job= |ReportingCat= '
SELECT X.VALUE AS ACTUALVALUE,
SUBSTRING(
X.VALUE,
1,
CASE
WHEN CHARINDEX('=', X.VALUE) = 0 THEN LEN(X.VALUE)
ELSE CHARINDEX('=', X.VALUE) -1
END
) AS FIELD,
SUBSTRING(X.VALUE, CHARINDEX('=', X.VALUE) + 1, 10) AS VALUE
FROM string_split(#testString, '|') x
The query above uses the built-in string_split rather than the dbo.split function you used, but the approach is the same with either. To get the output (Agency in one column and the code in another), you can make use of SUBSTRING along with CHARINDEX, which will split the value into two columns.
A few changes I made to your script:
changed the length from 100 to 250, as it was truncating the string, and
removed the second cross apply, as it was creating duplicates.
declare @testString varchar(250)
set @testString = 'Agency=100|Org=2112|RepOrg=2112|SubOrg=
|Fund=0137|Approp=6755|Object= |SubObject= |Activity= |Function= |Job= |ReportingCat='
select substring( (x.items),1,
case when CHARINDEX('=', x.items) = 0 then LEN(x.items)
else CHARINDEX('=', x.items) -1 end ) Agency ,
substring( (x.items),
case when CHARINDEX('=', x.items) = 0 then LEN(x.items)
else CHARINDEX('=', x.items) +1 end,len(x.items) -
case when CHARINDEX('=', x.items) = 0 then LEN(x.items)
else CHARINDEX('=', x.items)-1 end) as Code from dbo.split
(@testString, '|') x
It ran without error, and that function is here as Ben mentioned.
https://social.msdn.microsoft.com/Forums/en-US/bb2b2421-6587-4956-aff0-a7df9c91a84a/what-is-dbosplit?forum=transactsql
Output which I get:
Agency Code
Agency 100
Org 2112
RepOrg 2112
SubOrg
Fund 0137
Approp 6755
Object
SubObject
Activity
Function
Job
ReportingCat
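For readers who don't want to follow the link, a dbo.Split of that general shape is a simple multi-statement table-valued function along these lines (a minimal sketch, not the exact code from the linked thread; the output column is named items to match the queries above):
CREATE FUNCTION dbo.Split (@String varchar(8000), @Delimiter char(1))
RETURNS @Result TABLE (items varchar(8000))
AS
BEGIN
DECLARE @pos int = CHARINDEX(@Delimiter, @String);
WHILE @pos > 0
BEGIN
-- Take everything before the delimiter, then drop it and the delimiter from the input
INSERT INTO @Result (items) VALUES (LEFT(@String, @pos - 1));
SET @String = SUBSTRING(@String, @pos + 1, LEN(@String));
SET @pos = CHARINDEX(@Delimiter, @String);
END
INSERT INTO @Result (items) VALUES (@String); -- trailing piece
RETURN;
END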

Multiple values stored in one column

I have a table that contains a column with settings, they're formatted like:
setting_name=setting_value|setting_name=setting_value|setting_name=setting_value
The thing is that it varies a lot which settings have been filled in. I would like to split all the values and store them in a better way.
Currently it looks like this:
And I would like it to be:
To get there I used a function to split the values. Then I union them together and use a substring to get the setting_value that belongs to the setting_name. This is what I got so far:
/*
create function [dbo].[split_to_columns](@text varchar(8000)
, @column tinyint
, @separator char(1))
returns varchar(8000)
as
begin
declare @pos_start int = 1
declare @pos_end int = charindex(@separator, @text, @pos_start)
while (@column > 1 and @pos_end > 0)
begin
set @pos_start = @pos_end + 1
set @pos_end = charindex(@separator, @text, @pos_start)
set @column = @column - 1
end
if @column > 1 set @pos_start = len(@text) + 1
if @pos_end = 0 set @pos_end = len(@text) + 1
return substring(@text, @pos_start, @pos_end - @pos_start)
end
*/
create table #settings(id int, setting varchar(255))
insert into #settings(id, setting) values(1,'setting1=a|setting2=b|setting3=c')
insert into #settings(id, setting) values(2,'setting1=d|setting2=e')
insert into #settings(id, setting) values(3,'setting1=f|setting3=g')
insert into #settings(id, setting) values(4,'setting2=h')
;
with cte as (
select id, dbo.split_to_columns(setting, 1, '|') as setting from #settings
union select id, dbo.split_to_columns(setting, 2, '|') from #settings
union select id, dbo.split_to_columns(setting, 3, '|') from #settings
)
select distinct
x.id
, (select substring(setting, charindex('=', setting) + 1, 255) from cte where setting like 'setting1=%' and id = x.id) as setting1
, (select substring(setting, charindex('=', setting) + 1, 255) from cte where setting like 'setting2=%' and id = x.id) as setting2
, (select substring(setting, charindex('=', setting) + 1, 255) from cte where setting like 'setting3=%' and id = x.id) as setting3
from cte x
drop table #settings
Am I doing this the right way? I can't help thinking that I am making it too complex. Though I am not a big fan of the way my settings are formatted right now, I do see it more often, which means that more people must have to do this trick...
Edit:
I am importing picture-properties into a database. The settings mentioned above are the picture-properties and the id is the name of the picture the settings belong to.
Example of settings in one column:
FullName=D:\8.jpg|FolderName=D:\|FileName=8.jpg|Size=7284351|Extension=.jpg|datePictureTaken=10-3-2017
11:53:38|ApertureValue=2|DateTime=10-3-2017
11:53:38|DateTimeDigitized=10-3-2017
11:53:38|DateTimeOriginal=10-3-2017
11:53:38|ExposureTime=0,0025706940874036|FocalLength=3,65|GPSAltitude=43|GPSDateStamp=10-3-2017
0:00:00|Model=QCAM-AA|ShutterSpeedValue=8,604
This is the reason I would like to have it formatted in the way described above.
I would convert the text into a basic chunk of XML so that we can then take a set-based approach to transforming the data into the results you want:
declare @settings table(id int, setting varchar(255))
insert into @settings (id,setting) values
(1,'setting1=a|setting2=b|setting3=c'),
(2,'setting1=d|setting2=e'),
(3,'setting1=f|setting3=g'),
(4,'setting2=h')
;with Xmlised (id,detail) as (
select id,CONVERT(xml,'<prob><setting name="' +
REPLACE(
REPLACE(setting,'=','">'),
'|','</setting><setting name="') + '</setting></prob>')
from @settings
), shredded as (
select
x.id,
S.value('./#name','varchar(50)') as name,
S.value('./text()[1]','varchar(100)') as value
from
Xmlised x
cross apply
detail.nodes('prob/setting') as T(S)
)
select
id,setting1,setting2,setting3
from
shredded
pivot (MAX(value) for name in (setting1,setting2,setting3)) u
Hopefully I've broken it into enough steps that you can see what it's doing and how.
Results:
id setting1 setting2 setting3
----------- --------- --------- ---------
1 a b c
2 d e NULL
3 f NULL g
4 NULL h NULL
As Sean suggested in the comments though, I'd normally not consider storing the pivoted result and would generally skip that step.
A CTE (WITH) approach like this is pretty slow. I would suggest a table that stores the setting name, value, and some kind of group id. For example:
CREATE TABLE [dbo].[settings_table](
[id] [int] NULL,
[group] [int] NULL,
[name] [nchar](10) NULL,
[value] [nchar](10) NOT NULL
) ON [PRIMARY]
I don't know exactly what your program is doing with those settings, but this structure would be much more efficient in the long run.
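For instance, with that table in place the settings from the question can be stored one per row and read back without any string parsing (a rough usage sketch):
-- One row per (picture, setting) pair; [group] ties together the settings of one picture.
INSERT INTO [dbo].[settings_table] ([id], [group], [name], [value]) VALUES
(1, 1, N'setting1', N'a'),
(2, 1, N'setting2', N'b'),
(3, 1, N'setting3', N'c'),
(4, 2, N'setting1', N'd'),
(5, 2, N'setting2', N'e');
-- Fetch one picture's settings directly, no splitting required.
SELECT [name], [value]
FROM [dbo].[settings_table]
WHERE [group] = 1;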
I would do the following 3 steps:
1) Create a generic Split function. This is the one I use:
CREATE FUNCTION Split(
@StringToSplit VARCHAR(MAX)
,@Delimiter VARCHAR(10)
)
RETURNS @SplitResult TABLE (id int, item VARCHAR(MAX))
BEGIN
DECLARE @item VARCHAR(8000)
DECLARE @counter int = 1
WHILE CHARINDEX(@Delimiter, @StringToSplit,0) <> 0
BEGIN
SELECT
@item = RTRIM(LTRIM(SUBSTRING(@StringToSplit,1, CHARINDEX(@Delimiter,@StringToSplit,0)-1))),
@StringToSplit = RTRIM(LTRIM(SUBSTRING(@StringToSplit, CHARINDEX(@Delimiter,@StringToSplit,0) + LEN(@Delimiter), LEN(@StringToSplit))))
IF LEN(@item) > 0
INSERT INTO @SplitResult SELECT @counter, @item
SET @counter = @counter + 1
END
IF LEN(@StringToSplit) > 0
INSERT INTO @SplitResult SELECT @counter,@StringToSplit
SET @counter = @counter + 1
RETURN
END
GO
-- You use it like this
SELECT S.id, T.item FROM #settings AS S CROSS APPLY dbo.Split(S.setting, '|') AS T
2) Split the settings and separate the setting name from its value.
SELECT
S.id,
T.item,
SettingName = SUBSTRING(T.item, 1, CHARINDEX('=', T.item, 1) - 1), -- -1 to not include the "="
SettingValue = SUBSTRING(T.item, CHARINDEX('=', T.item, 1) + 1, 100) -- +1 to not include the "="
FROM
#settings AS S
CROSS APPLY dbo.Split(S.setting, '|') AS T
3) Pivot the known settings by name:
;WITH SplitValues AS
(
SELECT
S.id,
SettingName = SUBSTRING(T.item, 1, CHARINDEX('=', T.item, 1) - 1), -- -1 to not include the "="
SettingValue = SUBSTRING(T.item, CHARINDEX('=', T.item, 1) + 1, 100) -- +1 to not include the "="
FROM
#settings AS S
CROSS APPLY dbo.Split(S.setting, '|') AS T
)
SELECT
P.id,
P.setting1,
P.setting2,
P.setting3
FROM
SplitValues AS S
PIVOT (
MAX(S.SettingValue) FOR SettingName IN ([setting1], [setting2], [setting3])
) AS P
For the fixed set of columns (photo properties) I agree with storing them as columns in a row.
Use the proper types, e.g. DateTime, Int, Numeric, as you can then search on ranges, sort, and it is just more efficient.
I know you asked for SQL, but I would do this in .NET, as you are going to need to do some clean-up, like removing the comma from an integer. In real life, read the lines from a file so you can keep the (insert) command open.
public static void ParsePhoto(string photo)
{
if(string.IsNullOrEmpty(photo))
{
photo = #"FullName = D:\8.jpg | FolderName = D:\| FileName = 8.jpg | Size = 7284351 | Extension =.jpg | datePictureTaken = 10 - 3 - 2017 11:53:38 | ApertureValue = 2 | DateTime = 10 - 3 - 2017 11:53:38 | DateTimeDigitized = 10 - 3 - 2017 11:53:38 | DateTimeOriginal = 10 - 3 - 2017 11:53:38 | ExposureTime = 0,0025706940874036 | FocalLength = 3,65 | GPSAltitude = 43 | GPSDateStamp = 10 - 3 - 2017 0:00:00 | Model = QCAM - AA | ShutterSpeedValue = 8,604";
}
List<KeyValuePair<string, string>> kvp = new List<KeyValuePair<string, string>>();
foreach(string s in photo.Trim().Split(new char[] {'|'}, StringSplitOptions.RemoveEmptyEntries))
{
string[] sp = s.Split(new char[] { '=' }, StringSplitOptions.RemoveEmptyEntries);
if (sp.Count() == 2)
{
kvp.Add(new KeyValuePair<string, string>(sp[0].Trim(), sp[1].Trim()));
}
else
{
throw new IndexOutOfRangeException("bad photo");
}
}
foreach(KeyValuePair<string, string> pair in kvp)
{
Debug.WriteLine($"{pair.Key} = {pair.Value}");
//build up and execute insert statement here
}
Debug.WriteLine("Done");
}
FullName = D:\8.jpg
FolderName = D:\
FileName = 8.jpg
Size = 7284351
Extension = .jpg
datePictureTaken = 10 - 3 - 2017 11:53:38
ApertureValue = 2
DateTime = 10 - 3 - 2017 11:53:38
DateTimeDigitized = 10 - 3 - 2017 11:53:38
DateTimeOriginal = 10 - 3 - 2017 11:53:38
ExposureTime = 0,0025706940874036
FocalLength = 3,65
GPSAltitude = 43
GPSDateStamp = 10 - 3 - 2017 0:00:00
Model = QCAM - AA
ShutterSpeedValue = 8,604
If performance is important you can get this done easily without a splitter function, casting the data as XML or doing any pivoting. This technique is commonly referred to as the Cascading CROSS APPLY. The code is a little more verbose but the performance pay-off is amazing. First the solution:
SELECT
id,
setting1 = substring(setting, s1.p+1, x1.x),
setting2 = substring(setting, s2.p+1, x2.x),
setting3 = substring(setting, s3.p+1, x3.x)
FROM #settings t
CROSS APPLY (VALUES (nullif(charindex('setting1=', t.setting),0)+8)) s1(p)
CROSS APPLY (VALUES (nullif(charindex('setting2=', t.setting),0)+8)) s2(p)
CROSS APPLY (VALUES (nullif(charindex('setting3=', t.setting),0)+8)) s3(p)
CROSS APPLY (VALUES (isnull(nullif(charindex('|',t.setting,s1.p),0)-s1.p-1, 1))) x1(x)
CROSS APPLY (VALUES (isnull(nullif(charindex('|',t.setting,s2.p),0)-s2.p-1, 1))) x2(x)
CROSS APPLY (VALUES (isnull(nullif(charindex('|',t.setting,s3.p),0)-s3.p-1, 1))) x3(x);
Note the execution plans:
I don't have time to put together a performance test but, based on the execution plans - the cascading cross apply technique is roughly 44,000 times faster.
Try this:
declare @table table (id int, setting varchar(100))
insert into @table values
(1,'setting1=a|setting2=b|setting3=c'),
(2,'setting1=d|setting2=e'),
(3,'setting1=f|setting3=g'),
(4,'setting2=h')
select id,
case when charindex('setting1=',setting) = 0 then null else SUBSTRING(setting, charindex('setting1=',setting) + 9, 1) end [setting1],
case when charindex('setting2=',setting) = 0 then null else SUBSTRING(setting, charindex('setting2=',setting) + 9, 1) end [setting2],
case when charindex('setting3=',setting) = 0 then null else SUBSTRING(setting, charindex('setting3=',setting) + 9, 1) end [setting3]
from @table

How to find UPPER case characters in a string and replace them with a space with SQL or SSRS

I have a column which has string values with mixed upper- and lower-case characters, like (AliBabaSaidHello). I want to use this column's values for my SSRS table cell headers, like (Ali Baba Said Hello). First, I'd like to find each UPPER case letter and add a space before it.
The ASCII 65-90 tip was helpful for creating the code below for a function:
declare @Reset bit;
declare @Ret varchar(8000);
declare @i int;
declare @c char(1);
select @Reset = 1, @i=1, @Ret = '';
while (@i <= len('AliBabaSaidHello'))
select @c= substring('AliBabaSaidHello',@i,1),
@Reset = case when ascii(@c) between 65 and 90 then 1 else 0 end,
@Ret = @Ret + case when @Reset=1 then ' ' + @c else @c end,
@i = @i +1
select @Ret
Thanks all; after reading all the answers, I created this flexible and very efficient function:
CREATE FUNCTION dbo.UDF_DelimitersForCases (@string NVARCHAR(MAX), @Delimiter char(1))
RETURNS NVARCHAR(MAX)
AS
BEGIN
DECLARE @len INT = LEN(@string)
,@iterator INT = 2 --Don't put space to left of first even if it's a capital
;
WHILE @iterator <= LEN(@string)
BEGIN
IF PATINDEX('[ABCDEFGHIJKLMNOPQRSTUVWXYZ]',SUBSTRING(@string,@iterator,1) COLLATE Latin1_General_CS_AI) <> 0
BEGIN
SET @string = STUFF(@string,@iterator,0,@Delimiter);
SET @iterator += 1;
END
;
SET @iterator += 1;
END
RETURN @string;
END
;
GO
Example:
SELECT dbo.udf_DelimitersForCases('AliBabaSaidHello','_');
Returns "Ali_Baba_Said_Hello" (no quotes).
Get the characters one by one, like "A", "l", "i", and check whether the return value of ascii(@char) is between 65 and 90 - those are the capital letters.
( ascii('A')=65 (capital), ascii('l')=108 (non-capital), ascii('i')=105 (non-capital) )
Use a case-sensitive collation for your query and combine it with LIKE for each character. When you iterate over the characters you can easily replace an upper-case character with upper char + space.
WHERE SourceText COLLATE Latin1_General_CS_AI like '[A-Z]'
-- or for a variable: @char COLLATE Latin1_General_CS_AI = upper(@char)
The important part in Latin1_General_CS_AI is "CS", which means case sensitive.
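For example, a minimal sketch of that per-character test (the variable name is mine; combining the LIKE range with the UPPER() comparison avoids the quirk that, under dictionary sort order, the range [A-Z] also matches most lower-case letters):
DECLARE @char nchar(1) = N'B';
SELECT CASE WHEN @char COLLATE Latin1_General_CS_AI LIKE '[A-Z]'
AND @char COLLATE Latin1_General_CS_AI = UPPER(@char)
THEN 1 ELSE 0 END AS IsUpperCase; -- 1 for 'B', 0 for 'b'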
If you want to make this reusable for some reason, here's the code to make a user function to call.
DROP FUNCTION IF EXISTS udf_SpacesforCases;
GO
CREATE FUNCTION udf_SpacesForCases (@string NVARCHAR(MAX))
RETURNS NVARCHAR(MAX)
AS
BEGIN
DECLARE @len INT = LEN(@string)
,@iterator INT = 2 --Don't put space to left of first even if it's a capital
;
WHILE @iterator <= LEN(@string)
BEGIN
IF PATINDEX('[ABCDEFGHIJKLMNOPQRSTUVWXYZ]',SUBSTRING(@string,@iterator,1) COLLATE Latin1_General_CS_AI) <> 0
BEGIN
SET @string = STUFF(@string,@iterator,0,' ');
SET @iterator += 1;
END
;
SET @iterator += 1;
END
RETURN @string;
END
;
GO
SELECT dbo.udf_SpacesForCases('AliBabaSaidHello');
Any solution that involves a scalar user defined function and/or a loop will not perform as well as a set-based solution. This is a cake walk using NGrams8K:
DECLARE @string varchar(1000) = 'AliBabaSaidHello';
SELECT newString =
( SELECT
CASE
WHEN ASCII(token) BETWEEN 65 AND 90 AND position > 1 THEN ' '+token ELSE token
END+''
FROM dbo.NGrams8k(@string, 1)
FOR XML PATH(''));
Returns: "Ali Baba Said Hello" (no quotes).
Note that there is not a space before the first character. Alternatively, a set-based solution that doesn't use the function would look like this:
WITH
E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))t(c)),
iTally(N) AS
(
SELECT TOP (LEN(@string)) ROW_NUMBER() OVER (ORDER BY (SELECT 1))
FROM E1 a, E1 b, E1 c, E1 d
),
nGrams(NewString) AS
(
SELECT
CASE WHEN ASCII(SUBSTRING(@string, N, 1)) BETWEEN 65 AND 90 AND N > 1
THEN ' '+SUBSTRING(@string, N, 1) ELSE SUBSTRING(@string, N, 1)
END+''
FROM iTally
FOR XML PATH('')
)
SELECT NewString
FROM nGrams;
The APL approach is to split the input into characters, pad the characters as needed, then reassemble the string. In T-SQL it would look rather like this:
-- Sample data.
declare @Samples as Table ( Sample VarChar(32) );
insert into @Samples ( Sample ) values ( 'AliBabaSaidHello' ), ( 'MeshMuscleShirt' );
select * from @Samples;
-- Stuff it.
with
Ten ( Number ) as ( select Number from ( values (0), (1), (2), (3), (4), (5), (6), (7), (8), (9) ) as Digits( Number ) ),
TenUp2 ( Number ) as ( select 42 from Ten as L cross join Ten as R ),
TenUp4 ( Number ) as ( select 42 from TenUp2 as L cross join TenUp2 as R ),
Numbers ( Number ) as ( select Row_Number() over ( order by ( select NULL ) ) from TenUp4 ),
Characters ( Sample, Number, PaddedCh ) as (
select S.Sample, N.Number, PC.PaddedCh
from @Samples as S inner join
Numbers as N on N.Number <= Len( S.Sample ) cross apply
( select SubString( S.Sample, N.Number, 1 ) as Ch ) as SS cross apply
( select case when N.Number > 1 and ASCII( 'A' ) <= ASCII( SS.Ch ) and ASCII( SS.Ch ) <= ASCII( 'Z' ) then ' ' + Ch else Ch end as PaddedCh ) as PC )
select S.Sample,
( select PaddedCh from Characters where Sample = S.Sample order by Number for XML path(''), type).value('.[1]', 'VarChar(max)' ) as PaddedSample
from @Samples as S
order by Sample;
Another option (quite verbose) might be:
SELECT REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE('AliBabaSaidHello' COLLATE Latin1_General_CS_AS,'A',' A'),'B',' B'),'C',' C'),'D',' D'),'E',' E'),'F',' F'),'G',' G'),'H',' H'),'I',' I'),'J',' J'),'K',' K'),'L',' L'),'M',' M'),'N',' N'),'O',' O'),'P',' P'),'Q',' Q'),'R',' R'),'S',' S'),'T',' T'),'U',' U'),'V',' V'),'W',' W'),'X',' X'),'Y',' Y'),'Z',' Z')

T-SQL - Update first letter in each word of a string that are not 'or', 'of' or 'and' to uppercase. Lowercase 'or', 'of' or 'and' if found

Given the below table and data:
IF OBJECT_ID('tempdb..#Temp') IS NOT NULL
DROP TABLE #Temp
CREATE TABLE #Temp
(
ID INT,
Code INT,
PDescription VARCHAR(2000)
)
INSERT INTO #Temp
(ID,
Code,
PDescription)
VALUES (1,0001,'c and d, together'),
(2,0002,'equals or Exceeds $27.00'),
(3,0003,'Fruit Evaporating Or preserving'),
(4,0004,'Domestics And domestic Maintenance'),
(5,0005,'Bakeries and cracker')
SELECT *
FROM #Temp
DROP TABLE #Temp
Output:
ID Code PDescription
1 1 c and d, together
2 2 equals or Exceeds $27.00
3 3 Fruit Evaporating Or preserving
4 4 Domestics And domestic Maintenance
5 5 Bakeries and cracker
I need a way to achieve the below update to the description field:
ID Code PDescription
1 1 C and D, Together
2 2 Equals or Exceeds $27.00
3 3 Fruit Evaporating or Preserving
4 4 Domestics and Domestic Maintenance
5 5 Bakeries and Cracker
If you fancied going the SQL CLR route the function could look something like
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
public partial class UserDefinedFunctions
{
//One or more "word characters" or apostrophes
private static readonly Regex _regex = new Regex("[\\w']+");
[Microsoft.SqlServer.Server.SqlFunction]
public static SqlString ProperCase(SqlString subjectString)
{
string resultString = null;
if (!subjectString.IsNull)
{
resultString = _regex.Replace(subjectString.ToString().ToLowerInvariant(),
(Match match) =>
{
var word = match.Value;
switch (word)
{
case "or":
case "of":
case "and":
return word;
default:
return char.ToUpper(word[0]) + word.Substring(1);
}
});
}
return new SqlString(resultString);
}
}
Doubtless there may be Globalization issues in the above but it should do the job for English text.
You could also investigate TextInfo.ToTitleCase but that still leaves you needing to handle your exceptions.
The following function is not the most elegant of solutions but should do what you want.
ALTER FUNCTION [dbo].[ToProperCase](@textValue AS NVARCHAR(2000))
RETURNS NVARCHAR(2000)
AS
BEGIN
DECLARE @reset BIT;
DECLARE @properCase NVARCHAR(2000);
DECLARE @index INT;
DECLARE @character NCHAR(1);
SELECT @reset = 1, @index=1, @properCase = '';
WHILE (@index <= len(@textValue))
BEGIN
SELECT @character= substring(@textValue,@index,1),
@properCase = @properCase + CASE WHEN @reset=1 THEN UPPER(@character) ELSE LOWER(@character) END,
@reset = CASE WHEN @character LIKE N'[a-zA-Z\'']' THEN 0 ELSE 1 END,
@index = @index +1
END
SET @properCase = N' ' + @properCase + N' ';
SET @properCase = REPLACE(@properCase, N' And ', N' and ');
SET @properCase = REPLACE(@properCase, N' Or ', N' or ');
SET @properCase = REPLACE(@properCase, N' Of ', N' of ');
RETURN RTRIM(LTRIM(@properCase))
END
Example use:
IF OBJECT_ID('tempdb..#Temp') IS NOT NULL
DROP TABLE #Temp
CREATE TABLE #Temp
(
ID INT,
Code INT,
PDescription VARCHAR(2000)
)
INSERT INTO #Temp
(ID,
Code,
PDescription)
VALUES (1,0001, N'c and d, together and'),
(2,0002, N'equals or Exceeds $27.00'),
(3,0003, N'Fruit Evaporating Or preserving'),
(4,0004, N'Domestics And domestic Maintenance'),
(5,0005, N'Bakeries and cracker')
SELECT ID, Code, dbo.ToProperCase(PDescription) AS [Desc]
FROM #Temp
DROP TABLE #Temp
If you want to convert your text to proper case before inserting into the table, then simply call the function as follows:
INSERT INTO #Temp
(ID,
Code,
PDescription)
VALUES (1,0001, dbo.ToProperCase( N'c and d, together and')),
(2,0002, dbo.ToProperCase( N'equals or Exceeds $27.00')),
(3,0003, dbo.ToProperCase( N'Fruit Evaporating Or preserving')),
(4,0004, dbo.ToProperCase( N'Domestics And domestic Maintenance')),
(5,0005, dbo.ToProperCase( N'Bakeries and cracker'))
This is a dramatically modified version of my Proper UDF. The good news is you may be able to process the entire data set in ONE SHOT rather than linearly.
Take note of @OverR (override).
Declare @Table table (ID int,Code int,PDescription varchar(150))
Insert into @Table values
(1,1,'c and d, together'),
(2,2,'equals or Exceeds $27.00'),
(3,3,'Fruit Evaporating Or preserving'),
(4,4,'Domestics And domestic Maintenance'),
(5,5,'Bakeries and cracker')
-- Generate Base Mapping Table - Can be an Actual Table
Declare @Pattn table (Key_Value varchar(25));Insert into @Pattn values (' '),('-'),('_'),(','),('.'),('&'),('#'),(' Mc'),(' O''') -- ,(' Mac')
Declare @Alpha table (Key_Value varchar(25));Insert Into @Alpha values ('A'),('B'),('C'),('D'),('E'),('F'),('G'),('H'),('I'),('J'),('K'),('L'),('M'),('N'),('O'),('P'),('Q'),('R'),('S'),('T'),('U'),('V'),('W'),('X'),('Y'),('Z')
Declare @OverR table (Key_Value varchar(25));Insert Into @OverR values (' and '),(' or '),(' of ')
Declare @Map Table (MapSeq int,MapFrom varchar(25),MapTo varchar(25))
Insert Into @Map
Select MapSeq=1,MapFrom=A.Key_Value+B.Key_Value,MapTo=A.Key_Value+B.Key_Value From @Pattn A Join @Alpha B on 1=1
Union All
Select MapSeq=99,MapFrom=A.Key_Value,MapTo=A.Key_Value From @OverR A
-- Convert Base Data Into XML
Declare @XML xml
Set @XML = (Select KeyID=ID,String=' '+lower(PDescription)+' ' from @Table For XML RAW)
-- Convert XML to varchar(max) for Global Search & Replace
Declare @String varchar(max)
Select @String = cast(@XML as varchar(max))
Select @String = Replace(@String,MapFrom,MapTo) From @Map Order by MapSeq
-- Convert Back to XML
Select @XML = cast(@String as XML)
-- Generate Final Results
Select KeyID = t.col.value('@KeyID', 'int')
,NewString = ltrim(rtrim(t.col.value('@String', 'varchar(150)')))
From @XML.nodes('/row') AS t (col)
Order By 1
Returns
KeyID NewString
1 C and D, Together
2 Equals or Exceeds $27.00
3 Fruit Evaporating or Preserving
4 Domestics and Domestic Maintenance
5 Bakeries and Cracker
You don't even need functions and temporary objects. Take a look at this query:
WITH Processor AS
(
SELECT ID, Code, 1 step,
CONVERT(nvarchar(MAX),'') done,
LEFT(PDescription, CHARINDEX(' ', PDescription, 0)-1) process,
SUBSTRING(PDescription, CHARINDEX(' ', PDescription, 0)+1, LEN(PDescription)) waiting
FROM #temp
UNION ALL
SELECT ID, Code, step+1,
done+' '+CASE WHEN process IN ('and', 'or', 'of') THEN LOWER(process) ELSE UPPER(LEFT(process, 1))+LOWER(SUBSTRING(process, 2, LEN(process))) END,
CASE WHEN CHARINDEX(' ', waiting, 0)>0 THEN LEFT(waiting, CHARINDEX(' ', waiting, 0)-1) ELSE waiting END,
CASE WHEN CHARINDEX(' ', waiting, 0)>0 THEN SUBSTRING(waiting, CHARINDEX(' ', waiting, 0)+1, LEN(waiting)) ELSE NULL END FROM Processor
WHERE process IS NOT NULL
)
SELECT ID, Code, done PSDescription FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY step DESC) RowNum FROM Processor
) Ordered
WHERE RowNum=1
ORDER BY ID
It produces desired result as well. You can SELECT * FROM Processor to see all steps executed.

split alpha and numeric using sql

I have a table with 3 columns. The first column contains data with both a value (numeric) and a unit (percentage, etc.); the second is a numeric column; the third is a unit column. What I want to do is split the numeric part and the unit from the first column, then put the split data into their designated columns.
Here is my table:
I tried this function: SO link here...; it really does split alpha and numeric, but I'm new to using SQL functions. My problem is that the parameter must be a string, so what I did was change it to a subquery, but that gives me an error.
Sample code:
SQL FUNCTION:
create function [dbo].[GetNumbersFromText](@String varchar(2000))
returns table as return
(
with C as
(
select cast(substring(S.Value, S1.Pos, S2.L) as int) as Number,
stuff(s.Value, 1, S1.Pos + S2.L, '') as Value
from (select @String+' ') as S(Value)
cross apply (select patindex('%[0-9]%', S.Value)) as S1(Pos)
cross apply (select patindex('%[^0-9]%', stuff(S.Value, 1, S1.Pos, ''))) as S2(L)
union all
select cast(substring(S.Value, S1.Pos, S2.L) as int),
stuff(S.Value, 1, S1.Pos + S2.L, '')
from C as S
cross apply (select patindex('%[0-9]%', S.Value)) as S1(Pos)
cross apply (select patindex('%[^0-9]%', stuff(S.Value, 1, S1.Pos, ''))) as S2(L)
where patindex('%[0-9]%', S.Value) > 0
)
select Number
from C
)
SELECT statement with subquery:
declare @S varchar(max)
select number from GetNumbersFromText(Select SomeColm From Table_Name) option (maxrecursion 0)
BTW, I'm using SQL Server 2005.
Thanks!
If the numeric part is always at the beginning, then you can use this:
PATINDEX('%[0-9][^0-9]%', ConcUnit)
to get the index of the last digit.
Thus, this:
DECLARE @str VARCHAR(MAX) = '4000 ug/ML'
SELECT LEFT(@str, PATINDEX('%[0-9][^0-9]%', @str )) AS Number,
LTRIM(RIGHT(@str, LEN(@str) - PATINDEX('%[0-9][^0-9]%', @str ))) As Unit
gives you:
Number Unit
-------------
4000 ug/ML
EDIT:
If the numeric data includes double values as well, then you can use this:
SELECT LEN(@str) - PATINDEX ('%[^0-9][0-9]%', REVERSE(@str))
to get the index of the last digit.
Thus, this:
SELECT LEFT(@str, LEN(@str) - PATINDEX ('%[^0-9][0-9]%', REVERSE(@str)))
gives you the numeric part.
And this:
SELECT LEFT(@str, LEN(@str) - PATINDEX ('%[^0-9][0-9]%', REVERSE(@str))) AS Numeric,
CASE
WHEN CHARINDEX ('%', @str) <> 0 THEN LTRIM(RIGHT(@str, LEN(@str) - CHARINDEX ('%', @str)))
ELSE LTRIM(RIGHT(@str, PATINDEX ('%[^0-9][0-9]%', REVERSE(@str))))
END AS Unit
gives you both numberic and unit part.
Here are some tests that I made with the data you have posted:
Input:
DECLARE @str VARCHAR(MAX) = '50 000ug/ML'
Output:
Numeric Unit
------------
50 000 ug/ML
Input:
DECLARE @str VARCHAR(MAX) = '99.5%'
Output:
Numeric Unit
------------
99.5
Input:
DECLARE @str VARCHAR(MAX) = '4000 . 35 % ug/ML'
Output:
Numeric Unit
------------------
4000 . 35 ug/ML
Here is my answer. Check output in SQLFiddle for the same.
create TABLE temp
(
string NVARCHAR(50)
)
INSERT INTO temp (string)
VALUES
('4000 ug\ml'),
('2000 ug\ml'),
('%'),
('ug\ml')
SELECT subsrtunit,LEFT(subsrtnumeric, PATINDEX('%[^0-9]%', subsrtnumeric+'t') - 1)
FROM (
SELECT subsrtunit = SUBSTRING(string, posofchar, LEN(string)),
subsrtnumeric = SUBSTRING(string, posofnumber, LEN(string))
FROM (
SELECT string, posofchar = PATINDEX('%[^0-9]%', string),
posofnumber = PATINDEX('%[0-9]%', string)
FROM temp
) d
) t
Updated Version to handle 99.5 ug\ml
create TABLE temp
(
string NVARCHAR(50)
)
INSERT INTO temp (string)
VALUES
('4000 ug\ml'),
('2000 ug\ml'),
('%'),
('ug\ml'),
('99.5 ug\ml')
SELECT subsrtunit,LEFT(subsrtnumeric, PATINDEX('%[^0-9.]%', subsrtnumeric+'t') - 1)
FROM (
SELECT subsrtunit = SUBSTRING(string, posofchar, LEN(string)),
subsrtnumeric = SUBSTRING(string, posofnumber, LEN(string))
FROM (
SELECT string, posofchar = PATINDEX('%[^0-9.]%', string),
posofnumber = PATINDEX('%[0-9.]%', string)
FROM temp
) d
) t
Updated Version: To handle 1 000 ug\ml,20 000ug\ml
create TABLE temp
(
string NVARCHAR(50)
)
INSERT INTO temp (string)
VALUES
('4000 ug\ml'),
('2000 ug\ml'),
('%'),
('ug\ml'),
('99.5 ug\ml'),
('1 000 ug\ml'),
('20 000ug\ml')
SELECT substring(replace(subsrtunit,' ',''),PATINDEX('%[0-9.]%', replace(subsrtunit,' ',''))+1,len(subsrtunit)),
LEFT(replace(subsrtnumeric,' ',''), PATINDEX('%[^0-9.]%', replace(subsrtnumeric,' ','')+'t') - 1)
FROM (
SELECT subsrtunit = SUBSTRING(string, posofchar, LEN(string)),
subsrtnumeric = SUBSTRING(string, posofnumber, LEN(string))
FROM (
SELECT string, posofchar = PATINDEX('%[^0-9.]%', replace(string,' ','')),
posofnumber = PATINDEX('%[0-9.]%', replace(string,' ',''))
FROM temp
) d
) t
Check out SQLFiddle for the same.
Would something like this work? Based on the shown data it looks like it would.
Apply it to your data set as a select and if you like the results then you can make an update from it.
WITH cte as (SELECT 'ug/mL' ConcUnit, 500 as [Numeric], '' as Unit
UNION ALL SELECT '2000 ug/mL', NULL, '')
SELECT
[ConcUnit] as [ConcUnit],
[Numeric] as [Original Numeric],
[Unit] as [Original Unit],
CASE WHEN ConcUnit LIKE '% %' THEN
SUBSTRING(ConcUnit, 1, CHARINDEX(' ', ConcUnit) - 1)
ELSE [Numeric] END as [New Numeric],
CASE WHEN ConcUnit LIKE '% %'
THEN SUBSTRING(ConcUnit, CHARINDEX(' ', ConcUnit) + 1, LEN(ConcUnit))
ELSE ConcUnit END as [New Unit]
FROM cte
change #concunit & #unitx Respectively
DECLARE #concunit varchar(10)='45.5%'
DECLARE #unitx varchar(10)='%'
BEGIN
SELECT RTRIM(SUBSTRING( #concunit , 1 , CHARINDEX( #unitx , #concunit
) - 1
)) AS Number,
RTRIM(SUBSTRING( #concunit , CHARINDEX( #unitx , #concunit
) , LEN( #concunit
) - (CHARINDEX( #unitx , #concunit
) - 1)
)) AS Unit
end
I had the same dilemma, but in my case the alphas were in front of the numerics.
So, using the logic that @Giorgos Betsos added to his answer, I just reversed it.
I.e., when your input is:
abc123
You can split it like this:
declare @input varchar(30) = 'abc123'
select
replace(@input,reverse(LEFT(reverse(@input), PATINDEX('%[0-9][^0-9]%', reverse(@input) ))),'') Alpha
, reverse(LEFT(reverse(@input), PATINDEX('%[0-9][^0-9]%', reverse(@input) ))) Numeric
Results:
Alpha Numeric
abc 123
