How to extract this specific substring in SQL Server? - sql-server

I have a string with a specific pattern:
23;chair,red [$3]
i.e., a number followed by a semicolon, then a name followed by a left square bracket.
Assuming the semicolon ; always exists and the left square bracket [ always exists in the string, how do I extract the text between (and not including) the ; and the [ in a SQL Server query? Thanks.

Combine the SUBSTRING(), LEFT(), and CHARINDEX() functions.
SELECT LEFT(SUBSTRING(YOUR_FIELD,
CHARINDEX(';', YOUR_FIELD) + 1, 100),
CHARINDEX('[', YOUR_FIELD) - 1)
FROM YOUR_TABLE;
This assumes your field length will never exceed 100, but you can make it smarter to account for that if necessary by employing the LEN() function. I didn't bother since there's enough going on in there already, and I don't have an instance to test against, so I'm just eyeballing my parentheses, etc.

Assuming they always exist and are not part of your data, this will work:
declare @string varchar(8000) = '23;chair,red [$3]'
select substring(@string, charindex(';', @string) + 1, charindex(' [', @string) - charindex(';', @string) - 1)

An alternative to the answer provided by @Marc:
SELECT SUBSTRING(LEFT(YOUR_FIELD, CHARINDEX('[', YOUR_FIELD) - 1), CHARINDEX(';', YOUR_FIELD) + 1, 100)
FROM YOUR_TABLE
WHERE CHARINDEX('[', YOUR_FIELD) > 0 AND
CHARINDEX(';', YOUR_FIELD) > 0;
This makes sure both delimiters exist. It also fixes a problem in the currently accepted answer: applying LEFT last uses the position of '[' in the original string rather than in the already-shortened substring, so too many characters are kept.

select substring(your_field, CHARINDEX(';',your_field)+1
,CHARINDEX('[',your_field)-CHARINDEX(';',your_field)-1)
from your_table
I couldn't get the others to work. I believe you just want whatever sits between ';' and '[' in all cases, regardless of how long that text is. SUBSTRING takes (field, start position, length). The start is the position of ';' plus 1 (position 4 in the example, the 'c'), because you don't want to include the ';' itself. The length is the position of '[' (14) minus the position of ';' (3), minus 1 so that the ';' is not counted, which gives 10. I've used this same approach in other cases. If some rows lack ';' or '[', filter them out in the WHERE clause, though that goes a little beyond the question. If your delimiter were ';;;' instead of ';', you would use 3 instead of 1 in both places. Hope this helps!
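For readers without a SQL Server instance at hand, the position arithmetic can be traced with a small Python sketch. The helpers `charindex`, `substring` and `between` are hypothetical stand-ins that mimic the 1-based behaviour of the T-SQL functions; they are not part of any library, and they do not model T-SQL's error on a negative length.

```python
def charindex(needle: str, haystack: str, start: int = 1) -> int:
    """Mimic T-SQL CHARINDEX: 1-based position of needle, or 0 if absent."""
    return haystack.find(needle, max(start, 1) - 1) + 1


def substring(text: str, start: int, length: int) -> str:
    """Mimic T-SQL SUBSTRING for start >= 1 (1-based start, fixed length)."""
    return text[start - 1:start - 1 + length]


def between(text: str, left: str = ";", right: str = "[") -> str:
    """Text strictly between the first `left` and the first `right`."""
    begin = charindex(left, text) + len(left)   # first character after the left delimiter
    length = charindex(right, text) - begin     # for ';' this is pos('[') - pos(';') - 1
    return substring(text, begin, length)


sample = "23;chair,red [$3]"
print(charindex(";", sample))        # 3
print(charindex("[", sample))        # 14
print(repr(between(sample)))         # 'chair,red ' (the trailing space is kept)
print(repr(between(sample, ";", " [")))  # 'chair,red'
```

As the last line suggests, searching for ' [' instead of '[' (or wrapping the result in RTRIM on the SQL side) drops the trailing space.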

If you need to split something into 3 pieces, such as an email address and you don't know the length of the middle part, try this (I just ran this on sqlserver 2012 so I know it works):
SELECT top 2000
emailaddr_ as email,
SUBSTRING(emailaddr_, 1,CHARINDEX('@',emailaddr_) -1) as username,
SUBSTRING(emailaddr_, CHARINDEX('@',emailaddr_)+1, (LEN(emailaddr_) - charindex('@',emailaddr_) - charindex('.',reverse(emailaddr_)) )) domain
FROM
emailTable
WHERE
charindex('@',emailaddr_)>0
AND
charindex('.',emailaddr_)>0;
GO
Hope this helps.
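The same split can be checked outside the database with a short Python sketch. `split_email` is a hypothetical helper; the third piece (the part after the last dot) is an addition for illustration, since the query above only returns the first two.

```python
def split_email(address: str) -> tuple[str, str, str]:
    """Split an address into: before '@', between '@' and the last '.', after the last '.'."""
    at = address.find("@") + 1                   # CHARINDEX('@', addr), 1-based
    dot_from_end = address[::-1].find(".") + 1   # CHARINDEX('.', REVERSE(addr))
    if at == 0 or dot_from_end == 0:
        # The WHERE clause in the query filters these rows out instead.
        raise ValueError("address needs both '@' and '.'")
    username = address[:at - 1]
    # Length in the query: LEN(addr) - at - dot_from_end
    domain = address[at:len(address) - dot_from_end]
    suffix = address[len(address) - dot_from_end + 1:]
    return username, domain, suffix


print(split_email("user@example.com"))   # ('user', 'example', 'com')
```

Searching the reversed string is what locates the last dot, so dots inside the username or a multi-part domain do not shift the cut.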

Related

Select substring between two words in a long string

I have a table with a large string column called "HL7_MESSAGE" that I need to pull a string from several key words, as you'll see in the code below. I'm an Oracle person so the code was written and works in Oracle SQL but I need to convert it into SQL server code. SQL server doesn't have Regexp_substr function but I haven't been able to get this to work using charindex or Patindex. Basically I select a string between two strings and in the decode statements I look for if there is data missing between the two words/sections. If it just finds '.br\' then it's missing data and I just flag missing or filled. Anyway, code is below...if someone can decode it to SQL server version 2011 I would appreciate it.
CODE:
select
primary_key,
trim(REPLACE(trim(regexp_substr(hl7_message, 'RHRN:(.*)BIRTHDATE:', 1, 1, null, 1)),'\.br\',' ')) AS PATIENT_RHRN,
trim(REPLACE(trim(regexp_substr(hl7_message, 'PATIENT NAME:(.*)RHRN:', 1, 1, null, 1)),'\.br\',' ')) AS PATIENT_NAME,
trim(REPLACE(trim(regexp_substr(hl7_message, 'ULI:(.*)GENDER:', 1, 1, null, 1)),'\.br\',' ')) AS PATIENT_ULI,
decode(replace(to_char(regexp_substr(hl7_message,'FINDINGS:(.*)ADVERSE EVENTS:',1,1,'',1)),'\.br\'),NULL, 'missing', 'filled') FINDINGS_TO_ADVS_EVENTS_FLAG,
decode(replace(to_char(regexp_substr(hl7_message,'IMPRESSIONS:(.*)RECOMMENDATIONS:',1,1,'',1)),'\.br\'),NULL, 'missing', 'filled') IMPRESSION_TO_RECOMM_FLAG,
decode(replace(to_char(regexp_substr(hl7_message,'RECOMMENDATIONS:(.*)_____________________________',1,1,'',1)),'\.br\'),NULL, 'missing', 'filled') RECOMM_TO_SIG_UNDERLINE
from TEST;
Thanks
Below is a quick example that uses PATINDEX and SUBSTRING to extract the text between two other strings. Hopefully it gives you a basis for the conversion.
DECLARE @a varchar(300) = 'this is a long string I will extract data from'
PRINT SUBSTRING(@a, PATINDEX('% long %', @a) + LEN('% long %')-2, PATINDEX('% extract %', @a) - (PATINDEX('% long %', @a) + LEN('% long %')-2))
The key is using the two PATINDEX patterns to determine the start point and the length.
First find the position just past the first pattern (PATINDEX gives its start; adding LEN minus 2 skips over it, because the two % wildcards are part of the pattern but not of the matched text):
PATINDEX(start_pattern, string) + LEN(start_pattern) - 2
Then, for the length, use PATINDEX to find the start of the second pattern and subtract the start point found above:
PATINDEX(end_pattern, string) - (PATINDEX(start_pattern, string) + LEN(start_pattern) - 2)
I hope that this helps.
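The start and length arithmetic above can be traced with a Python sketch. `patindex_literal` is a hypothetical helper that only handles patterns of the form '%literal%'; real PATINDEX supports more wildcards, and its case sensitivity depends on the collation, which this sketch does not model.

```python
def patindex_literal(pattern: str, text: str) -> int:
    """Mimic PATINDEX for '%literal%' patterns only: 1-based position, or 0 if absent."""
    literal = pattern.strip("%")
    return text.find(literal) + 1


def extract_between(text: str, start_pattern: str, end_pattern: str) -> str:
    """Text between the end of start_pattern and the start of end_pattern."""
    # "- 2" removes the two '%' wildcards, which are not part of the matched text.
    start = patindex_literal(start_pattern, text) + len(start_pattern) - 2
    length = patindex_literal(end_pattern, text) - start
    return text[start - 1:start - 1 + length]


a = "this is a long string I will extract data from"
print(patindex_literal("% long %", a))                  # 10
print(extract_between(a, "% long %", "% extract %"))    # string I will
```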

Select with Left Function Condition SQL Server [duplicate]

I've been using this for some time:
SUBSTRING(str_col, PATINDEX('%[^0]%', str_col), LEN(str_col))
However recently, I've found a problem with columns with all "0" characters like '00000000' because it never finds a non-"0" character to match.
An alternative technique I've seen is to use TRIM:
REPLACE(LTRIM(REPLACE(str_col, '0', ' ')), ' ', '0')
This has a problem if there are embedded spaces, because they will be turned into "0"s when the spaces are turned back into "0"s.
I'm trying to avoid a scalar UDF. I've found a lot of performance problems with UDFs in SQL Server 2005.
SUBSTRING(str_col, PATINDEX('%[^0]%', str_col+'.'), LEN(str_col))
Why don't you just cast the value to INTEGER and then back to VARCHAR?
SELECT CAST(CAST('000000000' AS INTEGER) AS VARCHAR)
--------
0
Other answers here do not take into consideration the case where the value is all zeros (or even a single zero).
Some always default an empty string to zero, which is wrong when it is supposed to remain blank.
Re-read the original question. This answers what the Questioner wants.
Solution #1:
--This example uses both Leading and Trailing zero's.
--Avoid losing those Trailing zero's and converting embedded spaces into more zeros.
--I added a non-whitespace character ("_") to retain trailing zero's after calling Replace().
--Simply remove the RTrim() function call if you want to preserve trailing spaces.
--If you treat zero's and empty-strings as the same thing for your application,
-- then you may skip the Case-Statement entirely and just use CN.CleanNumber .
DECLARE @WackadooNumber VarChar(50) = ' 0 0123ABC D0 '--'000'--
SELECT WN.WackadooNumber, CN.CleanNumber,
(CASE WHEN WN.WackadooNumber LIKE '%0%' AND CN.CleanNumber = '' THEN '0' ELSE CN.CleanNumber END)[AllowZero]
FROM (SELECT @WackadooNumber[WackadooNumber]) AS WN
OUTER APPLY (SELECT RTRIM(RIGHT(WN.WackadooNumber, LEN(LTRIM(REPLACE(WN.WackadooNumber + '_', '0', ' '))) - 1))[CleanNumber]) AS CN
--Result: "123ABC D0"
Solution #2 (with sample data):
SELECT O.Type, O.Value, Parsed.Value[WrongValue],
(CASE WHEN CHARINDEX('0', T.Value) > 0--If there's at least one zero.
AND LEN(Parsed.Value) = 0--And the trimmed length is zero.
THEN '0' ELSE Parsed.Value END)[FinalValue],
(CASE WHEN CHARINDEX('0', T.Value) > 0--If there's at least one zero.
AND LEN(Parsed.TrimmedValue) = 0--And the trimmed length is zero.
THEN '0' ELSE LTRIM(RTRIM(Parsed.TrimmedValue)) END)[FinalTrimmedValue]
FROM
(
VALUES ('Null', NULL), ('EmptyString', ''),
('Zero', '0'), ('Zero', '0000'), ('Zero', '000.000'),
('Spaces', ' 0 A B C '), ('Number', '000123'),
('AlphaNum', '000ABC123'), ('NoZero', 'NoZerosHere')
) AS O(Type, Value)--O is for Original.
CROSS APPLY
( --This Step is Optional. Use if you also want to remove leading spaces.
SELECT LTRIM(RTRIM(O.Value))[Value]
) AS T--T is for Trimmed.
CROSS APPLY
( --From @CadeRoux's Post.
SELECT SUBSTRING(O.Value, PATINDEX('%[^0]%', O.Value + '.'), LEN(O.Value))[Value],
SUBSTRING(T.Value, PATINDEX('%[^0]%', T.Value + '.'), LEN(T.Value))[TrimmedValue]
) AS Parsed
Summary:
You could use what I have above for a one-off removal of leading-zero's.
If you plan on reusing it a lot, then place it in an Inline-Table-Valued-Function (ITVF).
Your concerns about performance problems with UDF's are understandable.
However, this problem only applies to All-Scalar-Functions and Multi-Statement-Table-Functions.
Using ITVF's is perfectly fine.
I have the same problem with our 3rd-Party database.
With Alpha-Numeric fields many are entered in without the leading zeros, dang humans!
This makes joins impossible without cleaning up the missing leading-zeros.
Conclusion:
Instead of removing the leading-zeros, you may want to consider just padding your trimmed-values with leading-zeros when you do your joins.
Better yet, clean up your data in the table by adding leading zeros, then rebuilding your indexes.
I think this would be WAY faster and less complex.
SELECT RIGHT('0000000000' + LTRIM(RTRIM(NULLIF(' 0A10 ', ''))), 10)--0000000A10
SELECT RIGHT('0000000000' + LTRIM(RTRIM(NULLIF('', ''))), 10)--NULL --When Blank.
Instead of a space replace the 0's with a 'rare' whitespace character that shouldn't normally be in the column's text. A line feed is probably good enough for a column like this. Then you can LTrim normally and replace the special character with 0's again.
My version of this is an adaptation of Arvo's work, with a little more added on to ensure two other cases.
1) If we have all 0s, we should return the digit 0.
2) If we have a blank, we should still return a blank character.
CASE
WHEN PATINDEX('%[^0]%', str_col + '.') > LEN(str_col) THEN RIGHT(str_col, 1)
ELSE SUBSTRING(str_col, PATINDEX('%[^0]%', str_col + '.'), LEN(str_col))
END
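To see why the appended '.' matters and how the two extra cases fall out, here is a Python sketch of the same CASE expression. `strip_leading_zeros` is a hypothetical name; note that T-SQL's LEN ignores trailing spaces, which Python's `len` does not, so values ending in spaces may behave differently.

```python
def strip_leading_zeros(value: str) -> str:
    """Drop leading '0' characters; all-zero input collapses to '0', blank stays blank."""
    # PATINDEX('%[^0]%', value + '.'): 1-based position of the first non-zero
    # character. The '.' sentinel guarantees that a match always exists.
    padded = value + "."
    first_non_zero = next(i for i, ch in enumerate(padded, start=1) if ch != "0")
    if first_non_zero > len(value):
        # Only the sentinel matched, so the input was empty or all zeros.
        return value[-1:]              # RIGHT(value, 1): '0' for zeros, '' for blank
    return value[first_non_zero - 1:]  # SUBSTRING(value, first_non_zero, LEN(value))


for raw in ["000123", "0000", "", "000ABC123", "0 0"]:
    print(repr(raw), "->", repr(strip_leading_zeros(raw)))
```

Embedded spaces survive untouched, which is the weakness of the REPLACE/LTRIM/REPLACE approach that this variant avoids.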
The following will return '0' if the string consists entirely of zeros:
CASE WHEN SUBSTRING(str_col, PATINDEX('%[^0]%', str_col+'.'), LEN(str_col)) = '' THEN '0' ELSE SUBSTRING(str_col, PATINDEX('%[^0]%', str_col+'.'), LEN(str_col)) END AS str_col
This makes a nice Function....
DROP FUNCTION [dbo].[FN_StripLeading]
GO
CREATE FUNCTION [dbo].[FN_StripLeading] (@string VarChar(128), @stripChar VarChar(1))
RETURNS VarChar(128)
AS
BEGIN
-- http://stackoverflow.com/questions/662383/better-techniques-for-trimming-leading-zeros-in-sql-server
DECLARE @retVal VarChar(128),
@pattern varChar(10)
SELECT @pattern = '%[^'+@stripChar+']%'
SELECT @retVal = CASE WHEN SUBSTRING(@string, PATINDEX(@pattern, @string+'.'), LEN(@string)) = '' THEN @stripChar ELSE SUBSTRING(@string, PATINDEX(@pattern, @string+'.'), LEN(@string)) END
RETURN (@retVal)
END
GO
GRANT EXECUTE ON [dbo].[FN_StripLeading] TO PUBLIC
cast(value as int) will always work if string is a number
SELECT CAST(CAST('000000000' AS INTEGER) AS VARCHAR)
This has a limit on the length of the string that can be converted to an INT
If you are using Snowflake SQL, might use this:
ltrim(str_col,'0')
The ltrim function removes all instances of the designated set of characters from the left side.
So ltrim(str_col,'0') on '00000008A' would return '8A'
And rtrim(str_col,'0.') on '$125.00' would return '$125'
This might help
SELECT ABS(column_name) FROM [db].[schema].[table]
replace(ltrim(replace(Fieldname.TableName, '0', ' ')), ' ', '0')
The suggestion from Thomas G worked for our needs.
The field in our case was already string and only the leading zeros needed to be trimmed. Mostly it's all numeric but sometimes there are letters so the previous INT conversion would crash.
For converting number as varchar to int, you could also use simple
(column + 0)
Very easy way, when you just work with numeric values:
SELECT
TRY_CONVERT(INT, '000053830')
Try this:
replace(ltrim(replace(@str, '0', ' ')), ' ', '0')
If you do not want to convert into int, I prefer this below logic because it can handle nulls
IFNULL(field,LTRIM(field,'0'))
SUBSTRING(str_col, IIF(LEN(str_col) > 0, PATINDEX('%[^0]%', LEFT(str_col, LEN(str_col) - 1) + '.'), 0), LEN(str_col))
Works fine even with '0', '00' and so on.
Starting with SQL Server 2022 (16.x) you can do this
TRIM ( [ LEADING | TRAILING | BOTH ] [characters FROM ] string )
In MySQL you can do this...
Trim(Leading '0' from your_column)

Parsing Chars In SQL Using PATINDEX

I'm trying to validate a string using raw sql;
tried using:
DECLARE @AlphaNumeric varchar(50)
SET @AlphaNumeric = '1017a'
SELECT SUBSTRING(@AlphaNumeric, 1, (PATINDEX('%[^0-9]%', @AlphaNumeric) - 1)) AS 'Numeric',
SUBSTRING(@AlphaNumeric, PATINDEX('%[^0-9]%', @AlphaNumeric), DATALENGTH(@AlphaNumeric)) AS 'Alpha'
But if the user types 101a7a, this doesn't work properly. What I want exactly is this:
I want the variable always to be digits followed by letters; the length doesn't matter.
For example :
2303A OK
23A434A NOT OK
A344 NOT OK.
4324AAC OK
This would be dead easy if i could do it in Regex but sql gives me headaches :(
Numbers followed by letters are OK; letters followed by numbers aren't; all characters must be letters or numbers. Hence...
select * from yourtable
where yourfield like '%[0-9][a-z]%'
and not (yourfield like '%[a-z][0-9]%')
and not (yourfield like '%[^0-9a-z]%')
I think this will do what you want. At least, it works on your sample data:
with t as (
select '2303A' as col union all
select '23A434A' union all
select 'A344'
)
select *,
(case when col like '%[0-9]%' and
substring(col, patindex('%[A-Z]%', col), len(col)) not like '%[^A-Z]%'
then 'OK'
else 'NOT OK'
end)
from t;
There are two conditions. First, check that the string contains a number somewhere. Then check that only letters appear from the first letter onward. I'm assuming that all letters are uppercase.
EDIT:
There might be an easier way. You can check that a number is followed by a letter somewhere in the string, but that a letter is never followed by a number. For this, you only need like:
select (case when col not like '%[^A-Z0-9]%' and
col like '%[0-9][A-Z]%' and
col not like '%[A-Z][0-9]%'
then 'OK'
else 'NOT OK'
end)
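The three LIKE conditions translate almost one-to-one into regular expressions, which makes them easy to test against the sample values. This is a sketch assuming uppercase letters only; `is_ok` is a hypothetical name, and unlike LIKE under a case-insensitive collation, Python's `re` is case-sensitive here.

```python
import re


def is_ok(code: str) -> bool:
    """True when the value is digits followed by letters and nothing else."""
    only_alnum = re.search(r"[^A-Z0-9]", code) is None               # col NOT LIKE '%[^A-Z0-9]%'
    digit_then_letter = re.search(r"[0-9][A-Z]", code) is not None   # col LIKE '%[0-9][A-Z]%'
    letter_then_digit = re.search(r"[A-Z][0-9]", code) is not None   # col LIKE '%[A-Z][0-9]%'
    return only_alnum and digit_then_letter and not letter_then_digit


for value in ["2303A", "23A434A", "A344", "4324AAC"]:
    print(value, "OK" if is_ok(value) else "NOT OK")
```

Taken together, the three conditions accept exactly the strings matching `[0-9]+[A-Z]+`: one or more digits followed by one or more letters.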
I have an approach that should work in your situation. Basically identify the position of the last integer and compare it to the position of the first non integer. You can get the position of the last integer like this
len(@AlphaNumeric) - PATINDEX('%[0-9]%', Reverse(@AlphaNumeric))+1
and you can get the position of the first non integer like this
PATINDEX('%[^0-9]%', @AlphaNumeric)
so that would make your where clause (where all integers precede any non integers) look like this
Where (len(@AlphaNumeric) - PATINDEX('%[0-9]%', Reverse(@AlphaNumeric))+1 ) < PATINDEX('%[^0-9]%', @AlphaNumeric)

trimming a substring in sql server column

I have data in a text column that looks like xxxxx.x.xx
I need to get the string into a view with a format of xxxxx.x (removing the trailing .xx).
I am not sure how to do this. Any help would be appreciated.
I think I would need to get the length from the start to the 2nd "." and then Left that length.
I also wonder, can this be done as a column expression or would it need a function?
Here is another twist. How would I handle the same issue if the char length is variable such as xxxxxx.xx.x and xxxxx.x.x?
You can also try this:
SELECT
REVERSE(
SUBSTRING(
REVERSE(@word),
CHARINDEX('.', REVERSE(@word))+1,
LEN(REVERSE(@word))
)
)
FROM yourTable
This solution is generic, so it also works when you have xx.xxxx.xx.xxx or any other number of letters separated by dots.
This can be achieved using the expression:
left(@s, charindex('.', @s, charindex('.', @s) + 1) - 1)
(where @s is your varchar value)
Online demo: http://www.sqlfiddle.com/#!3/d41d8/23448
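The two expressions in this thread agree on the question's examples but not in general: one keeps everything up to the second dot, the other cuts at the last dot. A Python sketch with hypothetical helper names shows the difference. The guard for a missing second dot is an addition; the T-SQL expression has no such guard.

```python
def keep_first_two_parts(value: str) -> str:
    """LEFT(s, CHARINDEX('.', s, CHARINDEX('.', s) + 1) - 1), using 1-based positions."""
    first_dot = value.find(".") + 1
    second_dot = value.find(".", first_dot) + 1   # search resumes just after the first dot
    if second_dot == 0:
        return value    # no second dot: returned unchanged (an assumption of this sketch)
    return value[:second_dot - 1]


def drop_last_part(value: str) -> str:
    """REVERSE / SUBSTRING / REVERSE variant: cut at the last dot."""
    head, dot, _tail = value.rpartition(".")
    return head if dot else value


for s in ["xxxxx.x.xx", "xxxxxx.xx.x", "xx.xxxx.xx.xxx"]:
    print(s, "->", keep_first_two_parts(s), "|", drop_last_part(s))
```

For values with exactly two dots either works; with three or more, pick the one that matches the intended result.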
select 'xxxxx.x.xx' as original
,Replace( 'xxxxx.x.xx',Right('xxxxx.x.xx',3),'') as new

Right pad a string with variable number of spaces

I have a customer table that I want to use to populate a parameter box in SSRS 2008. The cust_num is the value and the concatenation of the cust_name and cust_addr will be the label. The required fields from the table are:
cust_num int PK
cust_name char(50) not null
cust_addr char(50)
The SQL is:
select cust_num, cust_name + isnull(cust_addr, '') address
from customers
Which gives me this in the parameter list:
FIRST OUTPUT - ACTUAL
1 cust1 addr1
2 customer2 addr2
Which is what I expected but I want:
SECOND OUTPUT - DESIRED
1 cust1     addr1
2 customer2 addr2
What I have tried:
select cust_num, rtrim(cust_name) + space(60 - len(cust_name)) +
rtrim(cust_addr) + space(60 - len(cust_addr)) customer
from customers
Which gives me the first output.
select cust_num, rtrim(cust_name) + replicate(char(32), 60 - len(cust_name)) +
rtrim(cust_addr) + replicate(char(32), 60 - len(cust_addr)) customer
Which also gives me the first output.
I have also tried replacing space() with char(32) and vice versa
I have tried variations of substring, left, right all to no avail.
I have also used ltrim and rtrim in various spots.
The reason for the 60 is that I have checked the max length in both fields and it is 50 and I want some whitespace between the fields even if the field is maxed. I am not really concerned about truncated data since the city, state, and zip are in different fields so if the end of the street address is chopped off it is ok, I guess.
This is not a show stopper, the SSRS report is currently deployed with the first output but I would like to make it cleaner if I can.
Whammo blammo (for leading spaces):
SELECT
RIGHT(space(60) + cust_name, 60),
RIGHT(space(60) + cust_addr, 60)
FROM customers
OR (for trailing spaces)
SELECT
LEFT(cust_name + space(60), 60),
LEFT(cust_addr + space(60), 60)
FROM customers
The easiest way to right pad a string with spaces (without them being trimmed) is to simply cast the string as CHAR(length). MSSQL will sometimes trim whitespace from VARCHAR (because it is a VARiable-length data type). Since CHAR is a fixed length datatype, SQL Server will never trim the trailing spaces, and will automatically pad strings that are shorter than its length with spaces. Try the following code snippet for example.
SELECT CAST('Test' AS CHAR(20))
This returns 'Test' followed by 16 spaces (20 characters in total).
This is based on Jim's answer,
SELECT
@field_text + SPACE(@pad_length - LEN(@field_text)) AS RightPad
,SPACE(@pad_length - LEN(@field_text)) + @field_text AS LeftPad
Advantages
More Straight Forward
Slightly Cleaner (IMO)
Faster (Maybe?)
Easily Modified to either double pad for displaying in non-fixed width fonts or split padding left and right to center
Disadvantages
Doesn't handle LEN(@field_text) > @pad_length
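For comparison, the LEFT(text + SPACE(n), n) and RIGHT(SPACE(n) + text, n) forms from the earlier answer do cope with over-long values, by truncating them. A Python sketch with hypothetical helper names makes the behaviour easy to check. It assumes the input has already been trimmed, as RTRIM would do for a char(50) column.

```python
def right_pad(text: str, width: int) -> str:
    """LEFT(text + SPACE(width), width): pad with spaces, truncate if too long."""
    return (text + " " * width)[:width]


def left_pad(text: str, width: int) -> str:
    """RIGHT(SPACE(width) + text, width): pad on the left, keep the rightmost characters."""
    return (" " * width + text)[-width:] if width > 0 else ""


for name, addr in [("cust1", "addr1"), ("customer2", "addr2")]:
    print(right_pad(name, 10) + right_pad(addr, 10) + "|")
```

Both helpers always return exactly `width` characters, so columns stay aligned in a fixed-width font even when a value is longer than the target width.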
Based on KMier's answer, addresses the comment that this method poses a problem when the field to be padded is not a field, but the outcome of a (possibly complicated) function; the entire function has to be repeated.
Also, this allows for padding a field to the maximum length of its contents.
WITH
cte AS (
SELECT 'foo' AS value_to_be_padded
UNION SELECT 'foobar'
),
cte_max AS (
SELECT MAX(LEN(value_to_be_padded)) AS max_len
FROM cte
)
SELECT
CONCAT(SPACE(max_len - LEN(value_to_be_padded)), value_to_be_padded) AS left_padded,
CONCAT(value_to_be_padded, SPACE(max_len - LEN(value_to_be_padded))) AS right_padded
FROM cte
CROSS JOIN cte_max;
declare @t table(f1 varchar(50),f2 varchar(50),f3 varchar(50))
insert into @t values
('foooo','fooooooo','foo')
,('foo','fooooooo','fooo')
,('foooooooo','fooooooo','foooooo')
select
concat(f1
,space(max(len(f1)) over () - len(f1))
,space(3)
,f2
,space(max(len(f2)) over () - len(f2))
,space(3)
,f3
)
from @t
result
foooo       fooooooo   foo
foo         fooooooo   fooo
foooooooo   fooooooo   foooooo
