T-SQL: Splitting character values between non-consistent instances of delimiters - sql-server

There is a string accompanying a value that I need to extract from a column. I can extract the value from most of the rows, but there are a few cases where the value has different properties. This is a simplified example of the problem;
IF OBJECT_ID('TEMPDB..#TABLE') IS NOT NULL
DROP TABLE #TABLE
CREATE TABLE #TABLE(
colSTRING NVARCHAR(MAX) NULL
);
INSERT INTO #TABLE (colSTRING)
VALUES (',SHOULD NOT BE STORED THIS WAY:22.67')
,(',SHOULD NOT BE STORED THIS WAY:46.32')
,(',SHOULD NOT BE STORED THIS WAY:23.45')
,(',SHOULD NOT BE STORED THIS WAY:66.67')
,(',SHOULD NOT BE STORED THIS WAY:22.35,ANOTHER BAD THING:OK')
;
SELECT * FROM #TABLE
Output:
Notice that there is a number at the end of the string to the right of the ':'. This is the number I need to extract.
The bottom row however shows that there is a second string entry in the same cell. I need to extract 22.35 from this cell while omitting the rest of the string.
This is what I have so far;
SELECT
(RIGHT(colSTRING,CHARINDEX(':',REVERSE(colSTRING))-1)) [STRING NUMBER]
FROM #TABLE
output:
This works for the other values in the table, but the bottom row does not extract the correct value. It takes the string to the right of the ':' of the second string entry.
Is there some way to use this logic on only the first occurrence of the ':'?

So this is how I solved the problem, thanks to @MartinSmith's idea. I adjusted the example a bit to show how this interacts with a number that has more than 2 digits before the decimal point (>=100.00).
IF OBJECT_ID('TEMPDB..#TABLE') IS NOT NULL
DROP TABLE #TABLE
CREATE TABLE #TABLE(
colSTRING NVARCHAR(MAX) NULL
);
INSERT INTO #TABLE (colSTRING)
VALUES (',SHOULD NOT BE STORED THIS WAY:22.67')
,(',SHOULD NOT BE STORED THIS WAY:46.32')
,(',SHOULD NOT BE STORED THIS WAY:23.45')
,(',SHOULD NOT BE STORED THIS WAY:766.67')
,(',SHOULD NOT BE STORED THIS WAY:22.35,ANOTHER BAD THING:OK')
;
SELECT * FROM #TABLE
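Solution: In this case, every string entry always starts with a comma, and I can use that in a CASE expression. I build one candidate column for numbers <100.00 and another for numbers >=100.00. When the two-digit pattern is not found, PATINDEX returns 0, so the SUBSTRING starts at position 1 and returns text beginning with the comma; the CASE turns that into NULL and ISNULL falls back to the three-digit column.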
SELECT ISNULL(CASE WHEN [2DIGITS] LIKE ',%' THEN NULL ELSE [2DIGITS] END,[3DIGITS]) [FIXED]
FROM(
SELECT
(RIGHT(colSTRING,CHARINDEX(':',REVERSE(colSTRING))-1)) [STRING NUMBER]
,SUBSTRING(colSTRING,1 + PATINDEX('%:[0-9][0-9].[0-9][0-9]%', colSTRING),5) [2DIGITS]
,SUBSTRING(colSTRING,1 + PATINDEX('%:[0-9][0-9][0-9].[0-9][0-9]%', colSTRING),6) [3DIGITS]
FROM #TABLE
)A
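As a side note, the value can also be pulled out without assuming a fixed number of digits. The sketch below is only an illustration, not the approach above: it takes whatever sits between the first ':' and the next ',' (or the end of the string). The table variable @demo and the alias colonPos are made up for the example.

```sql
-- Hypothetical demo data mirroring the rows from the question.
DECLARE @demo TABLE (colSTRING NVARCHAR(MAX) NULL);
INSERT INTO @demo (colSTRING)
VALUES (N',SHOULD NOT BE STORED THIS WAY:766.67')
      ,(N',SHOULD NOT BE STORED THIS WAY:22.35,ANOTHER BAD THING:OK');

SELECT SUBSTRING(d.colSTRING,
                 p.colonPos + 1,
                 -- Appending ',' guarantees a closing delimiter even on the last entry.
                 CHARINDEX(',', d.colSTRING + ',', p.colonPos) - p.colonPos - 1) AS [STRING NUMBER]
FROM @demo AS d
CROSS APPLY (SELECT CHARINDEX(':', d.colSTRING) AS colonPos) AS p  -- first ':' only
WHERE p.colonPos > 0;  -- skip rows with no ':' at all
```

This still relies on every entry having the form ',LABEL:VALUE', as in the sample data.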

Related

Using SELECT result as string_pattern in REPLACE function

On SQL Server 2019, I would like to write a Scalar-value Function returning a string stripped from a series of possible substrings.
Assuming all of the substrings I want to replace with '' are in table MY_TABLE, in column MY_SUBSTRINGS, what would be the most efficient way to use the results of SELECT MY_SUBSTRINGS FROM MY_TABLE as string patterns in a REPLACE function to strip those substrings from @MyString?
Should I store the result of the SELECT in a table variable and loop through each of the substrings and call REPLACE for each substring possibilities or is there a better way to do this?
If I understand your question:
Here is a little demonstration how you can replace multiple values
Let's pretend the table variable @Map is an actual table.
Example
Declare @Map Table (sFrom varchar(100))
Insert Into @Map values
('some')
,('begin')
,('curDateTime')
Declare @S varchar(max) = 'This is some string to begin testing [curDateTime]'
Select @S=replace(@S,sFrom,'')
From (Select top 1000 * From @Map Order By len(sFrom) Desc) A
Select ltrim(rtrim(replace(replace(replace(@S,' ','†‡'),'‡†',''),'†‡',' ')))
Returns
This is string to testing []
Note: The final Select ltrim(rtrim(...)) strips any number of duplicating spaces. This is optional
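If you would rather not depend on a SELECT assigning the variable once per row in a particular order (as far as I know, that ordering is not something SQL Server documents as guaranteed), an explicit loop is one possible alternative. This is only a sketch: @Ordered, rn, @i and @n are names invented for the example, and @Map stands in for the real table as before.

```sql
DECLARE @Map TABLE (sFrom varchar(100));
INSERT INTO @Map VALUES ('some'), ('begin'), ('curDateTime');

DECLARE @S varchar(max) = 'This is some string to begin testing [curDateTime]';

-- Number the substrings, longest first, so the loop order is explicit.
DECLARE @Ordered TABLE (rn int PRIMARY KEY, sFrom varchar(100));
INSERT INTO @Ordered (rn, sFrom)
SELECT ROW_NUMBER() OVER (ORDER BY LEN(sFrom) DESC, sFrom), sFrom
FROM @Map;

DECLARE @i int = 1, @n int = (SELECT COUNT(*) FROM @Ordered);
WHILE @i <= @n
BEGIN
    -- Exactly one row matches, so the assignment is unambiguous.
    SELECT @S = REPLACE(@S, sFrom, '') FROM @Ordered WHERE rn = @i;
    SET @i = @i + 1;
END;

SELECT @S AS Stripped;
```

The doubled spaces left behind by the removals are still there at this point; the optional ltrim(rtrim(replace(...))) step from the example collapses them.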

SQL Server - parameter with unknown number of multiple values

I'm creating a grid that has two columns: Name and HotelId. The problem is that data for this grid should be sent with a single parameter of VARCHAR type and should look like this:
@Parameter = 'Name1:5;Name2:10;Name3:6'
As you can see, the parameter contains Name and a number that represents ID value and you can have multiple such entries, separated by ";" symbol.
My first idea was to write a query that creates a temp table that will have two columns and populate it with data from the parameter.
How could I achieve this? It seems like I need to split the parameter two times: by the ";" symbol for each row and then by ":" symbol for each column.
How should I approach this?
Also, if there is any other more appropriate solution, I'm open to suggestions.
First, drop the #temp table if it exists:
IF OBJECT_ID('tempdb..#temp', 'U') IS NOT NULL
/*Then it exists*/
DROP TABLE #temp
Then create the #temp table:
CREATE TABLE #temp (v1 VARCHAR(100))
Declare the parameter and the delimiter:
DECLARE @Parameter VARCHAR(50)
SET @Parameter= 'Name1:5;Name2:10;Name3:6'
DECLARE @delimiter nvarchar(1)
SET @delimiter= N';';
Next, split @Parameter on ';' and insert each piece into #temp:
INSERT INTO #temp(v1)
SELECT * FROM(
SELECT v1 = LTRIM(RTRIM(vals.node.value('(./text())[1]', 'nvarchar(4000)')))
FROM (
SELECT x = CAST('<root><data>' + REPLACE(@Parameter, @delimiter, '</data><data>') + '</data></root>' AS XML).query('.')
) v
CROSS APPLY x.nodes('/root/data') vals(node)
)abc
Finally, split each value in #temp on ':' to get the two columns:
select Left(v1, CHARINDEX(':', v1)-1) as Name , STUFF(v1, 1, CHARINDEX(':', v1), '') as HotelId FROM #temp
This returns one row per entry: Name1 with HotelId 5, Name2 with 10, and Name3 with 6.
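On SQL Server 2016 or later (database compatibility level 130 or higher), the built-in STRING_SPLIT function may be a shorter way to do the first split. The following is a sketch under that assumption rather than a drop-in replacement; the int cast on HotelId is my own addition.

```sql
DECLARE @Parameter varchar(100) = 'Name1:5;Name2:10;Name3:6';

SELECT LEFT(s.value, CHARINDEX(':', s.value) - 1)                      AS Name,
       CAST(STUFF(s.value, 1, CHARINDEX(':', s.value), '') AS int)    AS HotelId
FROM STRING_SPLIT(@Parameter, ';') AS s   -- one row per 'Name:Id' pair
WHERE CHARINDEX(':', s.value) > 0;        -- ignore pieces with no ':' (e.g. a trailing ';')
```

Note that STRING_SPLIT does not promise to return the pieces in their original order, which does not matter here since each row is self-describing.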

Most effective way to check sub-string exists in comma-separated string in SQL Server

I have a comma-separated list column available which has values like
Product1, Product2, Product3
I need to search whether the given product name exists in this column.
I used this SQL and it is working fine.
Select *
from ProductsList
where productname like '%Product1%'
This query is working very slowly. Is there a more efficient way I can search for a product name in the comma-separated list to improve the performance of the query?
Please note I have to search comma separated list before performing any other select statements.
You can use a user-defined function to split the comma-separated string into rows:
Create FUNCTION [dbo].[BreakStringIntoRows] (@CommadelimitedString varchar(max))
RETURNS @Result TABLE (Column1 VARCHAR(max))
AS
BEGIN
DECLARE @IntLocation INT
WHILE (CHARINDEX(',', @CommadelimitedString, 0) > 0)
BEGIN
SET @IntLocation = CHARINDEX(',', @CommadelimitedString, 0)
INSERT INTO @Result (Column1)
--LTRIM and RTRIM to ensure blank spaces are removed
SELECT RTRIM(LTRIM(SUBSTRING(@CommadelimitedString, 0, @IntLocation)))
SET @CommadelimitedString = STUFF(@CommadelimitedString, 1, @IntLocation, '')
END
INSERT INTO @Result (Column1)
SELECT RTRIM(LTRIM(@CommadelimitedString))--LTRIM and RTRIM to ensure blank spaces are removed
RETURN
END
Declare @productname Nvarchar(max)
set @productname='Product1,Product2,Product3'
select * from product where [productname] in (select Column1 from [dbo].[BreakStringIntoRows](@productname))
Felix is right and the 'right answer' is to normalize your table. However, maybe you have 500k lines of code that expect this column to exist as it is. So your next best (non-destructive) answer is:
Create a table to hold the normalized data:
CREATE TABLE ProductsList2 (ProductId INT, ProductName VARCHAR(100)) -- a bare VARCHAR column holds only 1 character; 100 is an assumed length
Create a TRIGGER that on UPDATE/INSERT/DELETE maintains ProductsList2 by splitting the string 'Product1,Product2,Product3' into three records.
Index your new table.
Query against your new table:
SELECT *
FROM ProductsList
WHERE ProductId IN (SELECT x.ProductId
FROM ProductsList2 x
WHERE x.ProductName = 'Product1')
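One more thing worth knowing while the column is still a comma-separated list: LIKE '%Product1%' also matches Product10, Product11 and so on. Wrapping both the list and the search term in delimiters avoids that. This sketch is about correctness only; it still scans every row, so it will not fix the speed problem. The table variable @ProductsList and the variable @search are stand-ins made up for the demo.

```sql
DECLARE @ProductsList TABLE (productname varchar(200));
INSERT INTO @ProductsList (productname)
VALUES ('Product1, Product2, Product3')
      ,('Product10, Product11');

DECLARE @search varchar(50) = 'Product1';

SELECT productname
FROM @ProductsList
-- Normalize ', ' to ',' and add a delimiter at both ends so every item is ',item,'.
WHERE ',' + REPLACE(productname, ', ', ',') + ',' LIKE '%,' + @search + ',%';
```

Only the first row comes back, because ',Product1,' does not occur inside ',Product10,Product11,'.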

What is the best way to concatenate three varchar fields, each of which is either null or contains a comma separated string?

We have a result set that has three fields and each of those fields is either null or contains a comma separated list of strings.
We need to combine all three into one comma separated list and eliminate duplicates.
What is the best way to do that?
I found a nice function that can split a string and return a table:
T-SQL split string
I tried to create a UDF that would take three varchar parameters and call that split string function three times, combine them into one table, and then use a FOR XML from there and return it as one comma separated string.
But SQL is complaining about having a SELECT in a function.
Here's an example using the SplitString function you referenced.
DECLARE
@X varchar(max) = 'A, C, F'
, @Y varchar(max) = null
, @Z varchar(max) = 'A, D, E, A'
;WITH SplitResults as
(
-- Note: the function does not remove leading spaces.
SELECT LTRIM([Name]) [Name] FROM SplitString(@X)
UNION
SELECT LTRIM([Name]) [Name] FROM SplitString(@Y)
UNION
SELECT LTRIM([Name]) [Name] FROM SplitString(@Z)
)
)
SELECT STUFF((
SELECT ', ' + [Name]
FROM SplitResults
FOR XML PATH(''), TYPE
-- Note: here we're pulling the value out in case any characters were escaped, ie. &
-- and then STUFF is removing the leading ,<space>
).value('.', 'nvarchar(max)'), 1, 2, '')
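If you are on SQL Server 2017 or later, the built-in STRING_SPLIT and STRING_AGG functions can probably do the same job without a custom split function or FOR XML. A sketch under that assumption, using the same sample values (the alias d and the column name Combined are just for the example):

```sql
DECLARE
  @X varchar(max) = 'A, C, F'
, @Y varchar(max) = NULL
, @Z varchar(max) = 'A, D, E, A';

SELECT STRING_AGG(d.[Name], ', ') WITHIN GROUP (ORDER BY d.[Name]) AS Combined
FROM (
    -- UNION (not UNION ALL) removes duplicates across and within the lists.
    SELECT LTRIM(RTRIM(value)) AS [Name] FROM STRING_SPLIT(@X, ',')
    UNION
    SELECT LTRIM(RTRIM(value)) FROM STRING_SPLIT(@Y, ',')  -- a NULL input yields no rows
    UNION
    SELECT LTRIM(RTRIM(value)) FROM STRING_SPLIT(@Z, ',')
) AS d
WHERE d.[Name] <> '';  -- drop empty items such as those from 'A,,B'
```

With the sample values this should give a single list containing A, C, D, E and F once each.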
I would not store data as a comma separated string in a single field. Separate the string to a new table and combine it to a string again when you need to.
Finding duplicates and managing the data will also be much easier.
I've used this function before (I didn't write it, and unfortunately cannot remember where I found it) to split a string and add a key (in this case an int) to the data as a separate table, linking back to the original table's PK
CREATE FUNCTION SplitWithID (@id int, @sep VARCHAR(10), @s VARCHAR(MAX))
RETURNS @t TABLE
(
id int,
val VARCHAR(MAX)
)
AS
BEGIN
DECLARE @xml XML
SET @xml = N'<root><r>' + REPLACE(@s, @sep, '</r><r>') + '</r></root>'
INSERT INTO @t(id,val)
SELECT @id, r.value('.','VARCHAR(MAX)') as Item -- match the val column so long items are not truncated
FROM @xml.nodes('//root/r') AS RECORDS(r)
RETURN
END
GO
Once you have the data on separate rows you can use any duplicate removal technique to clean the data before applying a primary key to the table.

SQL Server : converting varchar to INT

I am stuck on converting a varchar column UserID to INT. I know, please don't ask why this UserID column was not created as INT initially, long story.
So I tried this, but it doesn't work and gives me an error:
select CAST(userID AS int) from audit
Error:
Conversion failed when converting the varchar value
'1581............................................................................................................................' to data type int.
I ran select len(userID) from audit and it returns 128 characters, none of which are spaces.
I checked the ASCII codes of the characters trailing the ID number, and the ASCII value is 0.
I have also tried LTRIM, RTRIM, and replacing char(0) with '', but that does not work.
The only way it works is when I specify a fixed number of characters as below, but UserID is not always 4 characters.
select CAST(LEFT(userID, 4) AS int) from audit
You could try updating the table to get rid of these characters:
UPDATE dbo.[audit]
SET UserID = REPLACE(UserID, CHAR(0), '')
WHERE CHARINDEX(CHAR(0), UserID) > 0;
But then you'll also need to fix whatever is putting this bad data into the table in the first place. In the meantime perhaps try:
SELECT CONVERT(INT, REPLACE(UserID, CHAR(0), ''))
FROM dbo.[audit];
But that is not a long term solution. Fix the data (and the data type while you're at it). If you can't fix the data type immediately, then you can quickly find the culprit by adding a check constraint:
ALTER TABLE dbo.[audit]
ADD CONSTRAINT do_not_allow_stupid_data
CHECK (CHARINDEX(CHAR(0), UserID) = 0);
EDIT
Ok, so that is definitely a 4-digit integer followed by six instances of CHAR(0). And the workaround I posted definitely works for me:
DECLARE @foo TABLE(UserID VARCHAR(32));
INSERT @foo SELECT 0x31353831000000000000;
-- this succeeds:
SELECT CONVERT(INT, REPLACE(UserID, CHAR(0), '')) FROM @foo;
-- this fails:
SELECT CONVERT(INT, UserID) FROM @foo;
Please confirm that this code on its own (well, the first SELECT, anyway) works for you. If it does then the error you are getting is from a different non-numeric character in a different row (and if it doesn't then perhaps you have a build where a particular bug hasn't been fixed). To try and narrow it down you can take random values from the following query and then loop through the characters:
SELECT UserID, CONVERT(VARBINARY(32), UserID)
FROM dbo.[audit]
WHERE UserID LIKE '%[^0-9]%';
So take a random row, and then paste the output into a query like this:
DECLARE @x VARCHAR(32), @i INT;
SET @x = CONVERT(VARCHAR(32), 0x...); -- paste the value here
SET @i = 1;
WHILE @i <= LEN(@x)
BEGIN
PRINT RTRIM(@i) + ' = ' + RTRIM(ASCII(SUBSTRING(@x, @i, 1)))
SET @i = @i + 1;
END
This may take some trial and error before you encounter a row that fails for some other reason than CHAR(0) - since you can't really filter out the rows that contain CHAR(0) because they could contain CHAR(0) and CHAR(something else). For all we know you have values in the table like:
SELECT '15' + CHAR(9) + '23' + CHAR(0);
...which also can't be converted to an integer, whether you've replaced CHAR(0) or not.
I know you don't want to hear it, but I am really glad this is painful for people, because now they have more war stories to push back when people make very poor decisions about data types.
This question has got 91,000 views so perhaps many people are looking for a more generic solution to the issue in the title "error converting varchar to INT"
If you are on SQL Server 2012+ one way of handling this invalid data is to use TRY_CAST
SELECT TRY_CAST (userID AS INT)
FROM audit
On previous versions you could use
SELECT CASE
WHEN ISNUMERIC(RTRIM(userID) + '.0e0') = 1
AND LEN(userID) <= 11
THEN CAST(userID AS INT)
END
FROM audit
Both return NULL if the value cannot be cast.
In the specific case that you have in your question with known bad values I would use the following however.
CAST(REPLACE(userID COLLATE Latin1_General_Bin, CHAR(0),'') AS INT)
Trying to replace the null character is often problematic except if using a binary collation.
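Putting the two ideas together, a sketch along these lines strips CHAR(0) under a binary collation and then lets TRY_CAST return NULL for anything that still is not an integer. It assumes SQL Server 2012+ for TRY_CAST; the table variable @audit and its three sample rows are invented for the demo.

```sql
DECLARE @audit TABLE (userID varchar(128));
INSERT INTO @audit (userID)
VALUES ('1581' + REPLICATE(CHAR(0), 6))  -- digits followed by null characters
      ,('42')                            -- already clean
      ,('abc');                          -- not numeric at all

SELECT TRY_CAST(
           REPLACE(userID COLLATE Latin1_General_Bin, CHAR(0), '')  -- binary collation so CHAR(0) is matched
           AS int) AS UserIdInt
FROM @audit;
```

The first two rows should convert, and the 'abc' row should come back as NULL instead of raising a conversion error.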
This is more for someone searching for a result than for the original poster. This worked for me...
declare @value varchar(max) = 'sad';
select sum(cast(iif(isnumeric(@value) = 1, @value, 0) as bigint));
returns 0
declare @value varchar(max) = '3';
select sum(cast(iif(isnumeric(@value) = 1, @value, 0) as bigint));
returns 3
I would try trimming the number to see what you get:
select len(rtrim(ltrim(userid))) from audit
If that returns the correct value, then just do:
select convert(int, rtrim(ltrim(userid))) from audit
If that doesn't return the correct value, then I would do a replace to remove the empty characters:
select convert(int, replace(userid, char(0), '')) from audit
This is how I solved the problem in my case:
First of all I made sure the column I need to convert to integer doesn't contain any spaces:
update data set col1 = TRIM(col1)
I also checked whether the column only contains numeric digits.
You can check it by:
select * from data where col1 like '%[^0-9]%' order by col1
If any nonnumeric values are present, you can save them to another table and remove them from the table you are working on.
select * into nonnumeric_data from data where col1 like '%[^0-9]%'
delete from data where col1 like '%[^0-9]%'
Those were the problems with my data. After fixing them, I added a bigint column and populated it from the varchar column:
alter table data add int_col1 bigint
update data set int_col1 = CAST(col1 AS BIGINT)
This worked for me, hope you find it useful as well.
