splitting a string column correctly by spaces

splitting a string column correctly by spaces - sql-server

in my query I have several hundred records with strings from the iSeries Message Queue like this:
006 1 AccountSetBalance 0000000000 EQ 2016-03-01-18.45.42.002000 0038882665 _ 123456 12345612345678 17017362 0 0
I need to show in my results the account number part 12345678 and the balance part which is 17017362
I have tried:
SELECT MQ_Message
, SUBSTRING(MQ_Message,92,30) -- = 12345678 17017362 0
, SUBSTRING(MQ_Message,92,8) -- = 12345678 , SUBSTRING(MQ_Message,100, CHarIndex(' ', SUBSTRING('006 1 AccountSetBalance 0000000000 EQ 2016-03-01-18.45.42.002000 0038882665 _ 123456 12345612345678 17017362 0 0',92,20)) )
, CHarIndex(' ', SUBSTRING('006 1 AccountSetBalance 0000000000 EQ 2016-03-01-18.45.42.002000 0038882665 _ 123456 12345612345678 17017362 0 0',99,20))
, CHARINDEX(' ','17017362 0 0')
from outboundMessages WHERE message_Type = '006'
I can get the account easily enough, as the string is fixed length up to the balance, but then I need to split the string returned by SUBSTRING(MQ_Message,92,30) and get the balance part out of it which is 17017362 and will be different between 0 and maybe 999999 (in pence!)
I am really stuck trying to get the balance, having tried every possible combination of using CHARINDEX.
What is the best way to do this?

DECLARE #string NVARCHAR(MAX) = '006 1 AccountSetBalance 0000000000 EQ 2016-03-01-18.45.42.002000 0038882665 _ 123456 12345612345678 17017362 0 0',
#xml xml
select #xml = cast('<d><q>'+REPLACE(#string,' ','</q><q>')+'</q></d>' as xml)
SELECT n.v.value('q[9]','integer'),
n.v.value('q[11]','integer')
FROM #xml.nodes('/d') AS n(v);
Result:
----------- -----------
123456 17017362
(1 row(s) affected)

Related

PATINDEX with SOUNDEX

Want to search the string using PATINDEX and SOUNDEX.
I have the following table with some sample data to search the given string using PATINDEX and SOUNDEX.
create table tbl_pat_soundex
(
col_str varchar(max)
);
insert into tbl_pat_soundex values('Smith A Steve');
insert into tbl_pat_soundex values('Steve A Smyth');
insert into tbl_pat_soundex values('A Smeeth Stive');
insert into tbl_pat_soundex values('Steve Smith A');
insert into tbl_pat_soundex values('Smit Steve A');
String to search:- 'Smith A Steve'
SELECT col_str,PATINDEX('%Smith%',col_str) [Smith],PATINDEX('%A%',col_str) [A],PATINDEX('%Steve%',col_str) [Steve]
FROM tbl_pat_soundex
Getting Output:
col_str Smith A Steve
---------------------------------
Smith A Steve 1 7 9
Steve A Smyth 0 7 1
A Smeeth Stive 0 1 0
Steve Smith A 7 13 1
Smit Steve A 0 12 6
Expected Output:
col_str Smith A Steve
---------------------------------
Smith A Steve 1 7 9
Steve A Smyth 9 7 1
A Smeeth Stive 3 1 10
Steve Smith A 7 13 1
Smit Steve A 1 12 6
Tried:
SELECT col_str,
PATINDEX('%'+soundex('Smith')+'%',soundex(col_str)) [Smith],
PATINDEX('%'+soundex('A')+'%',soundex(col_str)) [A],
PATINDEX('%'+soundex('Steve')+'%',soundex(col_str)) [Steve]
FROM tbl_pat_soundex
But getting unexpected result:
col_str Smith A Steve
---------------------------------
Smith A Steve 1 0 0
Steve A Smyth 0 0 1
A Smeeth Stive 0 1 0
Steve Smith A 0 0 1
Smit Steve A 1 0 0
Note: I have 100 Millions of records in the table to search for.

Here's one option, not sure how it would perform with 100 million records considering all that you need to do. You'll have to test that out.
At a high level how I understand this is you basically need
Search all words in a string based on the words of another string
Returning the character starting position in the original string where that word equals or sounds like the search word.
You can use DIFFERENCE() for the comparison:
DIFFERENCE compares two different SOUNDEX values, and returns an
integer value. This value measures the degree that the SOUNDEX values
match, on a scale of 0 to 4. A value of 0 indicates weak or no
similarity between the SOUNDEX values; 4 indicates strongly similar,
or even identically matching, SOUNDEX values.
You'll need to split the string based on the space ' ' and since you're 2008 you'd have to roll your own function.
I used the XML function from here, https://sqlperformance.com/2012/07/t-sql-queries/split-strings, for my examples, you'll obviously need to adjust if you have your own or want to use something different:
CREATE FUNCTION dbo.SplitStrings_XML
(
#List NVARCHAR(MAX),
#Delimiter NVARCHAR(255)
)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
SELECT Item = y.i.value('(./text())[1]', 'nvarchar(4000)')
FROM
(
SELECT x = CONVERT(XML, '<i>'
+ REPLACE(#List, #Delimiter, '</i><i>')
+ '</i>').query('.')
) AS a CROSS APPLY x.nodes('i') AS y(i)
);
GO
I switched and use table variables to show the example, I would suggest not doing that with the amount of data you have and create and use physical tables.
Option 1 - Not dynamic:
DECLARE #tbl_pat_soundex TABLE
(
[col_str] VARCHAR(MAX)
);
INSERT INTO #tbl_pat_soundex
VALUES ( 'Smith A Steve' )
,( 'Steve A Smyth' )
,( 'A Smeeth Stive' )
,( 'Steve Smith A' )
,( 'Smit Steve A' )
SELECT DISTINCT [aa].[col_str]
, MAX([aa].[Smith]) OVER ( PARTITION BY [aa].[col_str] ) AS [Smith]
, MAX([aa].[A]) OVER ( PARTITION BY [aa].[col_str] ) AS [A]
, MAX([aa].[Steve]) OVER ( PARTITION BY [aa].[col_str] ) AS [Steve]
FROM (
SELECT [a].[col_str]
, CASE WHEN DIFFERENCE([b].[item], 'Smith') = 4 THEN
CHARINDEX([b].[item], [a].[col_str])
ELSE 0
END AS [Smith]
, CASE WHEN DIFFERENCE([b].[item], 'A') = 4 THEN
CHARINDEX([b].[item], [a].[col_str])
ELSE 0
END AS [A]
, CASE WHEN DIFFERENCE([b].[item], 'Steve') = 4 THEN
CHARINDEX([b].[item], [a].[col_str])
ELSE 0
END AS [Steve]
FROM #tbl_pat_soundex [a]
CROSS APPLY [dbo].[SplitStrings_XML]([a].[col_str], ' ') [b]
) AS [aa];
Using the function we split the string into individual words
Then we use a case statement to check the DIFFERENCE value
If that DIFFERENCE value equals 4 we then return the CHARINDEX value of the original word against string.
If doesn't equal we return 0
Then from there it's a matter of getting the max value of each based on the original string:
, MAX([aa].[Smith]) OVER ( PARTITION BY [aa].[col_str] ) AS [Smith]
, MAX([aa].[A]) OVER ( PARTITION BY [aa].[col_str] ) AS [A]
, MAX([aa].[Steve]) OVER ( PARTITION BY [aa].[col_str] ) AS [Steve]
To get you your final results:
Option 2 - Dynamic with a pivot:
We'll declare the string we want to search, split that out and search for those individuals words in the original string and then pivot the results.
--This example is using global temp tables as it's showing how
--to build a dynamic pivot
IF OBJECT_ID('tempdb..##tbl_pat_soundex') IS NOT NULL
DROP TABLE [##tbl_pat_soundex];
IF OBJECT_ID('tempdb..##tbl_col_str_SearchString') IS NOT NULL
DROP TABLE [##tbl_col_str_SearchString];
CREATE TABLE [##tbl_pat_soundex]
(
[col_str] VARCHAR(MAX)
);
INSERT INTO [##tbl_pat_soundex]
VALUES ( 'Smith A Steve' )
, ( 'Steve A Smyth' )
, ( 'A Smeeth Stive' )
, ( 'Steve Smith A' )
, ( 'Smit Steve A' );
--What are you searching for?
DECLARE #SearchString NVARCHAR(200);
SET #SearchString = N'Smith A Steve';
--We build a table we load with every combination of the words from the string and the words from the SearchString for easier comparison.
CREATE TABLE [##tbl_col_str_SearchString]
(
[col_str] NVARCHAR(MAX)
, [col_str_value] NVARCHAR(MAX)
, [SearchValue] NVARCHAR(200)
);
--Load that table for comparison
--split our original string into individual words
--also split our search string into individual words and give me all combinations.
INSERT INTO [##tbl_col_str_SearchString] (
[col_str]
, [col_str_value]
, [SearchValue]
)
SELECT DISTINCT [a].[col_str]
, [b].[item]
, [c].[item]
FROM [##tbl_pat_soundex] [a]
CROSS APPLY [dbo].[SplitStrings_XML]([a].[col_str], ' ') [b]
CROSS APPLY [dbo].[SplitStrings_XML](#SearchString, ' ') [c]
ORDER BY [a].[col_str];
--Then we can easily compare each word and search word for those that match or sound alike using DIFFERNCE()
SELECT [col_str], [col_str_value], [SearchValue], CASE WHEN DIFFERENCE([col_str_value], [SearchValue]) = 4 THEN CHARINDEX([col_str_value], [col_str]) ELSE 0 END AS [Match] FROM ##tbl_col_str_SearchString
--Then we can pivot on it
--and we will need to make it dynamic since we are not sure what what #SearchString could be.
DECLARE #PivotSQL NVARCHAR(MAX);
DECLARE #pivotColumn NVARCHAR(MAX);
SET #pivotColumn = N'[' + REPLACE(#SearchString, ' ', '],[') + N']';
SET #PivotSQL = N'SELECT * FROM (
SELECT [col_str], [SearchValue], CASE WHEN DIFFERENCE([col_str_value], [SearchValue]) = 4 THEN CHARINDEX([col_str_value], [col_str]) ELSE 0 END AS [Match] FROM ##tbl_col_str_SearchString
) aa
PIVOT (MAX([Match]) FOR [SearchValue] IN (' + #pivotColumn
+ N')) AS MaxMatch
ORDER BY [MaxMatch].[col_str]
';
--Giving us the final results.
EXEC sp_executesql #PivotSQL

Sorting varchar field with mixed alphanumeric data

I searched and read a lot of answers on here, but can't find one that will answer my problem, (or help me to find the answer on my own).
We have a table which contains a varchar display field, who's data is entered by the customer.
When we display the results, our customer wants the results to be ordered "correctly".
A sample of what the data could like is as follows:
"AAA 2 1 AAA"
"AAA 10 1 AAA"
"AAA 10 2 BAA"
"AAA 101 1 AAA"
"BAA 101 2 BBB"
"BAA 101 10 BBB"
"BAA 2 2 AAA"
Sorting by this column ASC returns:
1: "AAA 10 1 AAA"
2: "AAA 10 2 BAA"
3: "AAA 101 1 AAA"
4: "AAA 2 1 AAA"
5: "BAA 101 10 BBB"
6: "BAA 101 2 BBB"
7: "BAA 2 2 AAA"
The customer would like row 4 to actually be the first row (as 2 comes before 10), and similarly row 7 to be between rows 4 and 5, as shown below:
1: "AAA 2 1 AAA"
2: "AAA 10 1 AAA"
3: "AAA 10 2 BAA"
4: "AAA 101 1 AAA"
5: "BAA 2 2 AAA"
6: "BAA 101 10 BBB"
7: "BAA 101 2 BBB"
Now, the real TRICKY bit is, there is no hard and fast rule to what the data will look like in this column; it is entirely down to the customer as to what they put in here (the data shown above is just arbitrary to demonstrate the problem).
Any Help?
EDIT:
learning that this is referred to as "natural sorting" has improved my search results massively
I'm going to give the accepted answer to this question a bash and will update accordingly:
Natural (human alpha-numeric) sort in Microsoft SQL 2005

First create this function
Create FUNCTION dbo.SplitAndJoin
(
#delimited nvarchar(max),
#delimiter nvarchar(100)
) RETURNS Nvarchar(Max)
AS
BEGIN
declare #res nvarchar(max)
declare #t TABLE
(
-- Id column can be commented out, not required for sql splitting string
id int identity(1,1), -- I use this column for numbering splitted parts
val nvarchar(max)
)
declare #xml xml
set #xml = N'<root><r>' + replace(#delimited,#delimiter,'</r><r>') + '</r></root>'
insert into #t(val)
select
r.value('.','varchar(max)') as item
from #xml.nodes('//root/r') as records(r)
SELECT #res = STUFF((SELECT ' ' + case when isnumeric(val) = 1 then RIGHT('00000000'+CAST(val AS VARCHAR(8)),8) else val end
FROM #t
FOR XML PATH('')), 1, 1, '')
RETURN #Res
END
GO
This function gets an space delimited string and split it to words then join them together again by space but if the word is number it adds 8 leading zeros
then you use this query
Select * from Test
order by dbo.SplitAndJoin(col1,' ')
Live result on SQL Fiddle

Without consistency, you only have brute force
Without rules, your brute force is limited
I've made some assumptions with this code: if it starts with 3 alpha characters, then a space, then a number (up to 3 digits), let's treat it differently.
There's nothing special about this - it is just string manipulation being brute forced in to giving you "something". Hopefully it illustrates how painful this is without having consistency and rules!
DECLARE #t table (
a varchar(50)
);
INSERT INTO #t (a)
VALUES ('AAA 2 1 AAA')
, ('AAA 10 1 AAA')
, ('AAA 10 2 BAA')
, ('AAA 101 1 AAA')
, ('BAA 101 2 BBB')
, ('BAA 101 10 BBB')
, ('BAA 2 2 AAA')
, ('Completely different')
;
; WITH step1 AS (
SELECT a
, CASE WHEN a LIKE '[A-Z][A-Z][A-Z] [0-9]%' THEN 1 ELSE 0 END As fits_pattern
, CharIndex(' ', a) As first_space
FROM #t
)
, step2 AS (
SELECT *
, CharIndex(' ', a, first_space + 1) As second_space
, CASE WHEN fits_pattern = 1 THEN Left(a, 3) ELSE 'ZZZ' END As first_part
, CASE WHEN fits_pattern = 1 THEN SubString(a, first_space + 1, 1000) ELSE 'ZZZ' END As rest_of_it
FROM step1
)
, step3 AS (
SELECT *
, CASE WHEN fits_pattern = 1 THEN SubString(rest_of_it, 1, second_space - first_space - 1) ELSE 'ZZZ' END As second_part
FROM step2
)
SELECT *
, Right('000' + second_part, 3) As second_part_formatted
FROM step3
ORDER
BY first_part
, second_part_formatted
, a
;
Relevant, sorted results:
a
---------------------
AAA 2 1 AAA
AAA 10 1 AAA
AAA 10 2 BAA
AAA 101 1 AAA
BAA 2 2 AAA
BAA 101 10 BBB
BAA 101 2 BBB
Completely different
This code can be vastly improved/shortened. I've just left it verbose in order to give you some clarity over the steps taken.

How to split column values and add to a temp table

I am using SQL Server 2008 R2.
I have a table tblInstitution as follows
InstitutionCode InstitutionDesc
---------------------------------
ABC Abra Cada Brad
DEF Def Fede Eeee
GHJ Gee Hee
I want to split the values in InstitutionDesc and store it based on the institution code
InstitutionCode Token Score
-------------------------------
ABC Abra 0
ABC Cada 0
ABC Brad 0
DEF Def 0
DEF Fede 0
DEF Eeee 0
GHJ Gee 0
GHJ Hee 0
Is there a way I can do this in a set-based operation?
I have seen examples where a single column value can be split into multiple column values of the same row. But I am not able to find an example where the same column can be split into different rows. I am not sure what exactly should be searched for. Is it something to do with CTEs.

Here is a recursive CTE option...
If Object_ID('tempdb..#tblInstitution') Is Not Null Drop Table #tblInstitution;
Create Table #tblInstitution (InstitutionCode Varchar(10), InstitutionDesc Varchar(50));
Insert #tblInstitution (InstitutionCode, InstitutionDesc)
Values ('ABC','Abra Cada Brad'),
('DEF','Def Fede Eeee'),
('GHJ','Gee Hee'),
('KLM','Kappa');
With base As
(
Select InstitutionCode,
LTRIM(RTRIM(InstitutionDesc)) As InstitutionDesc
From #tblInstitution
), recur As
(
Select InstitutionCode,
Left(InstitutionDesc, CharIndex(' ', InstitutionDesc + ' ') - 1) As Token,
Case
When CharIndex(' ', InstitutionDesc) > 0
Then Right(InstitutionDesc, Len(InstitutionDesc) - CharIndex(' ', InstitutionDesc))
Else Null
End As Remaining
From base
Union All
Select InstitutionCode,
Left(Remaining, CharIndex(' ', Remaining + ' ') - 1) As Token,
Case
When CharIndex(' ', Remaining) > 0
Then Right(Remaining, Len(Remaining) - CharIndex(' ', Remaining))
Else Null
End As Remaining
From recur
Where Remaining Is Not Null
)
Select InstitutionCode,
Token,
0 As Score
From recur
Order By InstitutionCode

SQL query issue No data is showing

here is my result set but when i am doing query on the result set then no data is coming but i am not being able to understand what is the wrong in my query.
call Start Caller direction Is_Internal continuation call duration party1name
------------------ ------------ --------- ----------- ------------ ------------------ -----------------
1/15/2014 8:47 346241552 I 0 0 0:00:18 VM Kanaal 1
1/15/2014 9:56 252621028 I 0 0 0:00:17 Kanaal 1
1/15/2014 9:58 252621028 I 0 0 0:00:17 Kanaal 1
1/15/2014 9:01 252621028 I 0 1 0:00:08 Kanaal 1
1/15/2014 9:01 252621028 I 0 0 0:01:57 Coen
1/15/2014 9:06 302 O 0 0 0:01:53 Coen
1/15/2014 9:07 306 O 0 0 0:01:33 koos de Bruijn
1/15/2014 9:11 644686793 I 0 0 0:00:08 VM Kanaal 1
1/15/2014 9:11 644686793 I 0 0 0:01:46 Coen
1/15/2014 9:27 306 O 0 0 0:00:43 koos de Bruijn
1/15/2014 9:25 302 O 0 0 0:06:46 Coen
1/15/2014 9:46 455426194 I 0 1 0:00:07 VM Kanaal 1
1/15/2014 9:46 455426194 I 0 0 0:00:50 Coen
1/15/2014 9:57 725716251 I 0 1 0:00:10 VM Kanaal 1
1/15/2014 10:00 0 I O 1 0:00:00 Voicemail
my query is here
SELECT Convert(varchar,[call Start],101) Date,[Caller] [Phone No], Count(*) [Total Incomming calls] FROM tridip
where direction='I' and
CAST([call Start] AS datetime) >= CONVERT(datetime,'09:00:00') and
CAST([call Start] AS datetime) <= CONVERT(datetime,'17:30:00')
AND Is_Internal=0 and continuation=0 AND
CONVERT(datetime,CAST([call duration] AS DATETIME),108) <> CAST('00:00:00' AS DATETIME)
and party1name not in ('Voice Mail') and party1name not like 'VM %'
and party1name not like 'Line%'
group by [Caller],Convert(varchar,[call Start],101)
just tell me what is wrong my query. it should show date & phone number and count may be 1 or more than 1. please guide me what to alter.
at the second row there is phone no 252621028 which is repeated and it's count should be greater than 1.
thanks
UPDATE
i figured out the problem and rectified sql as follow
SELECT [Caller] [Phone No], Count(*) [Total Incomming calls] FROM tridip
where direction='I' and
CONVERT(VARCHAR,[call Start],108) >= '09:00:00' and
CONVERT(VARCHAR,[call Start],108) <= '17:30:00'
AND Is_Internal=0 and continuation=0 AND
[call duration] <> '00:00:00'
and party1name not in ('Voice Mail') and party1name not like 'VM %'
and party1name not like 'Line%' and [Caller]>0
group by [Caller]

You are not getting results most probably because of this condition -
CAST([call Start] AS datetime) <= CONVERT(datetime,'17:30:00')
Reason -
Try this -
SELECT CONVERT(datetime,'17:30:00')
The result is - 1900-01-01 17:30:00.000
I believe none of your records have [call start] time less than this and hence no results.
Are your trying to get all the records between 2 times ?

As has been previously mentioned, you are getting a DateTime with the value of 1900-01-01 09:00, which is obviously lower than any of your dates.
You can also use the DATEPART function of SQL Server to compare dates and times easier:
where direction='I'
and DATEPART(hour, CAST([call Start] AS datetime)) >= 9
and DATEPART(hour, CAST([call Start] AS datetime)) <= 17
(You will need to modify this slightly to work for being before 17:30, but it gives you the general idea).

How to Add decimal & zero's after number

I got a problem with converting values.
My Source values are:
1
2
3
4
1 - abcd
2 - xyzabcdefgh
abcdefgh
lmnop
I need the output as
1.00
2.00
3.00
4.00
The problem I have is that some fields contain letters only, and there are also those that contains contain both letters and digits.
I only need to format those numbers only:
1 as 1.00
2 as 2.00
The fields containing letters need to remain the same.
Any help please?

Try something like
select case
when MyVarcharColumn NOT LIKE '%[^0-9]%' then convert(decimal(18,2), MyVarcharColumn)
else MyVarcharColumn
end as MyVarcharColumn
from x
Might not be the most elegent/efficient solution, but it should work
I also Agree with Joe

You can do what Lee said, or enclose within a try catch
Declare #str varchar(10)
Set #str = '1'
Begin Try
Select Convert(Decimal(10,2), #str) -- scale 10, precision 2
End Try
Begin Catch
Select 'Not Numeric'
End Catch

try this
select
case when col1 NOT LIKE '%[^0-9]%'
then convert(varchar(50), convert(decimal(18,2), col1))
else col1 end as col1 from mysource

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

splitting a string column correctly by spaces - sql-server

Related

PATINDEX with SOUNDEX

Sorting varchar field with mixed alphanumeric data

How to split column values and add to a temp table

SQL query issue No data is showing

How to Add decimal & zero's after number

Categories

Resources