Get a number from a sql string range - sql-server

I have a column of data that contains a percentage range as a string that I'd like to convert to a number so I can do easy comparisons.
Possible values in the string:
'<5%'
'5-10%'
'10-15%'
...
'95-100%'
I'd like to convert this in my select where clause to just the first number, 5, 10, 15, etc. so that I can compare that value to a passed in "at least this" value.
I've tried a bunch of variations on substring, charindex, convert, and replace, but I still can't seem to get something that works in all combinations.
Any ideas?

Try this,
SELECT substring(replace(interest , '<',''), patindex('%[0-9]%',replace(interest , '<','')), patindex('%[^0-9]%',replace(interest, '<',''))-1) FROM table1
Tested at my end and it works, it's only my first try so you might be able to optimise it.

@Martin: Your solution works.
Here is another I came up with, inspired by @mercutio:
select cast(replace(replace(replace(interest,'<',''),'%',''),'-','.0') as numeric) test
from table1 where interest is not null
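To use that expression in the WHERE clause against an "at least this" value, as the question asks, a minimal sketch might look like the following. The table variable @table1, its sample rows, and the @atLeast parameter are assumptions made up for illustration. Note that with this expression '<5%' converts to 5, not 0.

```sql
-- Hypothetical sample data standing in for the real table
DECLARE @table1 TABLE (interest varchar(10));
INSERT INTO @table1 (interest)
VALUES ('<5%'), ('5-10%'), ('10-15%'), ('95-100%'), (NULL);

-- Hypothetical "at least this" parameter
DECLARE @atLeast int = 10;

-- '5-10%' becomes '5.010', which casts to numeric(18,0) as 5
SELECT interest
FROM @table1
WHERE interest IS NOT NULL
  AND CAST(REPLACE(REPLACE(REPLACE(interest, '<', ''), '%', ''), '-', '.0') AS numeric) >= @atLeast;
-- With the sample rows above this returns '10-15%' and '95-100%'
```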

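You can convert char data to other char types (for example, char(10) to varchar(10)), but CAST or CONVERT will only turn character data into an integer when the string is already a valid number, so values like '5-10%' have to be stripped down to their digits first.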

I don't know if this works in SQL Server, but within MySQL, you can use several tricks to convert character data into numbers. Examples from your sample data:
"<5%" => 0
"5-10%" => 5
"95-100%" => 95
Now, obviously this fails your first test, but some clever string replacements at the start of the string would be enough to get it working.
One example of converting character data into numbers:
SELECT "5-10%" + 0 AS foo ...
Might not work in SQL Server, but future searches may help the odd MySQL user :-D

You can do this in SQL Server with a cursor. If you can create a CLR function to pull out number groupings, that will help. It's possible in T-SQL, it will just be ugly.
Create the cursor to loop over the list.
Find the first number. If there is only one number group in there, return it. Otherwise find the second number group.
If only the first number group is returned and it's the first item in the list, set it as the upper bound.
If only the first number group is returned and it's the last item in the list, set it as the lower bound.
Otherwise set the first number group as the lower bound and the second number group as the upper bound.
Then just set the resulting values back to a table.

The issue you are having is a symptom of not keeping the data atomic. In this case it looks purely unintentional (Legacy) but here is a link about it.
To design yourself out of this create a range_lookup table:
Create table rangeLookup(
rangeID int -- or rangeCD or not at all
,rangeLabel varchar(50)
,LowValue int--real or whatever
,HighValue int
)
To hack your way out, here are some pseudo steps. This will be a deeply nested mess.
Normalize your input by replacing all your crazy characters:
replace(replace(rangeLabel,'%',''),'<','')
--This will entail many nested replace statements.
Add a CASE and CHARINDEX to look for a space. If there is none, you have your number;
otherwise use your substring to take everything before the first ' '.
-- These steps are wrapped around the previous step.

It's complicated, but for the test cases you provided, this works. Just replace @Test with the column you are looking in from your table.
DECLARE @Test varchar(10)
set @Test = '<5%'
--set @Test = '5-10%'
--set @Test = '10-15%'
--set @Test = '95-100%'
Select CASE WHEN
Substring(@Test,1,1) = '<'
THEN
0
ELSE
CONVERT(integer,SUBSTRING(@Test,1,CHARINDEX('-',@Test)-1))
END
AS LowerBound
,
CASE WHEN
Substring(@Test,1,1) = '<'
THEN
CONVERT(integer,Substring(@Test,2,CHARINDEX('%',@Test)-2))
ELSE
CONVERT(integer,Substring(@Test,CHARINDEX('-',@Test)+1,CHARINDEX('%',@Test)-CHARINDEX('-',@Test)-1))
END
AS UpperBound

You'd probably be much better off changing <5% and 5-10% to store 2 values in 2 fields. Instead of storing <5%, you would store 0 and 5, and instead of 5-10%, you'd end up with 5 and 10. You'd have 2 columns, one called lowerbound and one called upperbound, and then just check value >= lowerbound AND value < upperbound.
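A minimal sketch of that two-column layout, assuming a table variable @ranges and a @value to look up (both names and the sample rows are made up for illustration):

```sql
-- Hypothetical range table with explicit numeric bounds
DECLARE @ranges TABLE (
    rangeLabel varchar(50),
    lowerbound int,
    upperbound int
);
INSERT INTO @ranges (rangeLabel, lowerbound, upperbound)
VALUES ('<5%', 0, 5), ('5-10%', 5, 10), ('10-15%', 10, 15);

-- Hypothetical value to classify
DECLARE @value int = 7;

-- Plain integer comparison, no string parsing needed
SELECT rangeLabel
FROM @ranges
WHERE @value >= lowerbound AND @value < upperbound;
-- With the rows above, 7 falls in '5-10%'
```

Because the upper bound is exclusive, a value sitting exactly on a boundary (such as 10) lands in the higher range only.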

Related

In MS SQL an index on a computed column that uses RIGHT and CHARINDEX results in Invalid length parameter passed to RIGHT

I have this computed column
ALTER TABLE mytable
ADD vMessage AS (CONVERT([nvarchar] (200),RIGHT(Message,CHARINDEX('.',REVERSE(Message),1)-1),0))
And I am trying to make an index on vMessage
CREATE NONCLUSTERED INDEX [IX_vMessage]
ON mytable ([vMessageType])
I get this error
Invalid length parameter passed to the RIGHT function.
Creating the computed column and running this query works
SELECT
Message,
RIGHT(Message,charindex('.',reverse(Message),1)-1),
CONVERT([nvarchar](200),RIGHT(Message,CHARINDEX('.',REVERSE(Message),1)-1),0)
FROM mytable
The data is similar to this sample data.
DECLARE @mytable TABLE (message nvarchar(1024))
INSERT INTO @mytable (message) VALUES
('Services.Common.Contracts.InternalContract'),
('Services.Common.Contracts.ItemArchivedContract'),
('Services.Common.Contracts.ItemCreatedContract'),
('Services.Common.Contracts.ItemInformationUpdatedContract'),
('Services.Common.Contracts.EmailContract'),
('Services.Common.Contracts.Customer.SetCredentialsContract'),
('Services.Common.Contracts.InternalItemContract')
SELECT
Message,
RIGHT(Message,charindex('.',reverse(Message),1)-1),
CONVERT([nvarchar](200),RIGHT(Message,CHARINDEX('.',REVERSE(Message),1)-1),0)
FROM @mytable
No message is empty. All messages have at least one dot character in them. Below returns nothing.
SELECT * FROM mytable WHERE CHARINDEX('.', Message) = 0
The problem is that CHARINDEX can return 0 if it doesn't find the value being searched for. To avoid causing errors because of this, you should always put CHARINDEX into NULLIF to null out the 0
ALTER TABLE mytable
ADD vMessage AS (
CONVERT(nvarchar(200),
RIGHT(
Message,
NULLIF(
CHARINDEX(
'.',
REVERSE(Message)
),
0
) - 1
)
)
);
I say always, because using a WHERE doesn't always help, as SQL Server often rearranges expressions. NULLIF uses a CASE internally, which is the only guaranteed way for this not to happen.
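To see the effect of the NULLIF guard in isolation, here is a small sketch. The table variable @samples and the dot-less value 'NoDotHere' are assumptions added for illustration; they are not from the original data.

```sql
-- Hypothetical rows: one with dots, one without
DECLARE @samples TABLE (Message nvarchar(1024));
INSERT INTO @samples (Message)
VALUES (N'Services.Common.Contracts.EmailContract'),
       (N'NoDotHere');

SELECT
    Message,
    -- CHARINDEX returns 0 when no dot is found; NULLIF turns that 0 into NULL,
    -- so RIGHT receives NULL instead of -1 and yields NULL rather than an error
    RIGHT(Message, NULLIF(CHARINDEX('.', REVERSE(Message)), 0) - 1) AS LastSegment
FROM @samples;
-- First row gives 'EmailContract', second row gives NULL
```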
The code starts by reversing the string and then using CHARINDEX() to look for a . character. Since you don't want to keep the . in the final result, you then subtract 1 from the returned value. This is where the problem comes in.
CHARINDEX() returns 0 if the value isn't found (using 1-based rather than 0-based indexing for the string). When we subtract 1 from that and pass it to the RIGHT() function, you have an invalid argument and will see this error.
But I also see this:
All messages have at least one dot character in them.
I suggest checking that again. Perhaps the test is running in a different environment from production. We can see your query runs fine on the provided sample data when we load it to db fiddle:
https://dbfiddle.uk/bUMPVALz

SQL SUBSTRING & PATINDEX of varying lengths

SQL Server 2017.
Given the following 3 records with field of type nvarchar(250) called fileString:
_318_CA_DCA_2020_12_11-01_00_01_VM6.log
_319_CA_DCA_2020_12_12-01_VM17.log
_333_KF_DCA01_00_01_VM232.log
I would want to return:
VM6
VM17
VM232
Attempted thus far with:
SELECT
SUBSTRING(fileString, PATINDEX('%VM[0-9]%', fileString), 3)
FROM dbo.Table
But of course that only returns VM and 1 number.
How would I define the parameter for number of characters when it varies?
EDIT: to pre-emptively answer a question that may come up, yes, the VM pattern will always be followed immediately by .log and nothing else. But even if I took that approach and worked backwards, I still don't understand how to define the number of characters to take when the number varies.
Here is one way:
DECLARE @test TABLE( fileString varchar(500))
INSERT INTO @test VALUES
('_318_CA_DCA_2020_12_11-01_00_01_VM6.log')
,('_319_CA_DCA_2020_12_12-01_00_01_VM17.log')
,('_333_KF_DCA_2020_12_15-01_00_01_VM232.log')
-- 5 is the length of file extension + 1 which is always the same size '.log'
SELECT
REVERSE(SUBSTRING(REVERSE(fileString),5,CHARINDEX('_',REVERSE(fileString))-5))
FROM @test AS t
This will dynamically grab the length and location of the last _ and remove the .log.
It is not the most efficient; if you are able to write a CLR function using C# and import it into SQL Server, that will be much more efficient. Or you can use this as a starting point and tweak it as needed.
You can remove the variable and replace it with your table like below
DECLARE @TESTVariable as varchar(500)
Set @TESTVariable = '_318_CA_DCA_2020_12_11-01_00_01_VM6adf.log'
SELECT REPLACE(SUBSTRING(@TESTVariable, PATINDEX('%VM[0-9]%', @TESTVariable), PATINDEX('%[_]%', REVERSE(@TESTVariable))), '.log', '')
select *,
part = REPLACE(SUBSTRING(filestring, PATINDEX('%VM[0-9]%', filestring), PATINDEX('%[_]%', REVERSE(filestring))), '.log', '')
from table
Your lengths are consistent at the beginning. So get away from patindex and use substring to crop out the beginning. Then just replace the '.log' with an empty string at the end.
select *,
part = replace(substring(filestring,33,255),'.log','')
from table;
Edit:
Okay, from your edit you show differing prefix portions. Then patindex is in fact correct. Here's my solution, which is not better or worse than the other answers but differs with respect to the fact that it avoids reverse and delegates the patindex computation to a cross apply section. You may find it a bit more readable.
select filestring,
part = replace(substring(filestring, ap.vmIx, 255),'.log','')
from table
cross apply (select
-- [_] matches a literal underscore; a bare _ is a single-character wildcard in PATINDEX
vmIx = patindex('%[_]vm%', filestring) + 1
) ap

Issues using multiple parameters in SSRS report (stored procedure)

I've read countless posts on this topic but I can't seem to get any of the recommendations to apply to my particular situation (which isn't different than others...)
I have an SSRS report. Dataset 1 is using a stored procedure and in the where clause I have
and (@param is null or alias.column in
(select Item from dbo.ufnSplit(@param,',')))
I borrowed the dbo.ufnSplit function from this post here: https://stackoverflow.com/a/512300/22194
CREATE FUNCTION [dbo].[ufnSplit]
(@RepParam nvarchar(max), @Delim char(1)= ',')
RETURNS @Values TABLE (Item nvarchar(max))AS
--based on John Sansoms StackOverflow answer:
--https://stackoverflow.com/a/512300/22194
BEGIN
    DECLARE @chrind INT
    DECLARE @Piece nvarchar(100)
    SELECT @chrind = 1
    WHILE @chrind > 0
    BEGIN
        -- Position of the next delimiter, or 0 when none is left
        SELECT @chrind = CHARINDEX(@Delim,@RepParam)
        IF @chrind > 0
            SELECT @Piece = LEFT(@RepParam,@chrind - 1)
        ELSE
            SELECT @Piece = @RepParam
        INSERT @Values(Item) VALUES(@Piece)
        -- Drop the piece just consumed, including its delimiter
        SELECT @RepParam = RIGHT(@RepParam,LEN(@RepParam) - @chrind)
        IF LEN(@RepParam) = 0 BREAK
    END
    RETURN
END
In dataset 2 I am getting the values that I want to pass to dataset 1
select distinct list from table
My parameter for @param is configured to look at dataset 2 for available values.
My issue is that if I select a single value from my parameter dropdown for @param, the report works. If I select multiple values from the dropdown, I only return data for the first value selected.
My values in dataset 2 do not contain any commas.
Did I miss anything or fail to provide enough information? I'm open to criticism, feedback, do's and don'ts for this. I've struggled with this issue for some time, and I'm by no means a SQL expert :)
Cheers,
MD
Update So SQL Profiler is showing me this:
exec sp... @param=N'value1,value2 ,value3 '
Questions are:
1. Shouldn't every value be wrapped in single quotes?
2. What's with the N before the list?
3. Guessing the trailing spaces need to be trimmed out
When you select multiple values from a parameter dropdown list they are stored in an array. In order to convert that to a string that you can pass to SQL you can use the Join function. Go to your dataset properties and then to the Parameters tab. Replace the Parameter Value with this expression:
=Join(Parameters!param.Value, ",")
Now your split function will get one comma separated string like it's supposed to. I would also suggest having the split function trim off spaces from the values after it has separated them.
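As a rough illustration of trimming each piece after the split, the sketch below uses the built-in STRING_SPLIT function in place of dbo.ufnSplit. That is a swapped-in technique and assumes SQL Server 2016 or later with database compatibility level 130 or higher. The @param value is copied from the Profiler trace in the question.

```sql
-- Parameter string as it arrived from SSRS, with stray trailing spaces
DECLARE @param nvarchar(max) = N'value1,value2 ,value3 ';

-- STRING_SPLIT returns one row per piece in a column named "value";
-- LTRIM/RTRIM strip the surrounding spaces from each piece.
-- The brackets are only there to make the trimming visible.
SELECT '[' + LTRIM(RTRIM(value)) + ']' AS Item
FROM STRING_SPLIT(@param, ',');
```

The same LTRIM(RTRIM(...)) wrapper could be applied to the Item column returned by dbo.ufnSplit.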
So I figured it out and wanted to post my results here in hopes it helps someone else.
Bad data. One trailing space was blowing up my entire result set, and I didn't notice it until I ran through several scenarios (choosing many combinations of parameters)
My result set had trailing spaces - once I did an rtrim on it I didn't have to do any fancy join/split's in SSRS.

How can I make LIKE match a number or empty string inside square brackets in T-SQL?

Is it possible to have a LIKE clause with one character number or an empty string?
I have a field in which I will write a LIKE clause (as a string). I will apply it later with an expression in the WHERE clause: ... LIKE tableX.FormatField .... It must contain a number (a single character or an empty string).
Something like [0-9 ]. Where the space bar inside square brackets means an empty string.
I have a table in which I have a configuration for parameters - TblParam with field DataFormat. I have to validate a value from another table, TblValue, with field ValueToCheck. The validation is made by a query. The part for the validation looks like:
... WHERE TblValue.ValueToCheck LIKE TblParam.DataFormat ...
For the configuration value, I need an expression for one numeric character or an empty string. Something like [0-9'']. Because of the automatic nature of the check, I need a single expression (without AND OR OR operators) which can fit the query (see the example above). The same check is valid for other types of the checks, so I have to fit my check engine.
I am almost sure that I can not use [0-9''], but is there another suitable solution?
Actually, I have difficulty to validate a version string: 1.0.1.2 or 1.0.2. It can contain 2-3 dots (.) and numbers.
I am pretty sure it is not possible, as '' is not even a character.
select ascii(''); returns null.
'' = ' '; is true
'' is null; is false
If you want exactly 0-9 or '' (and not ' '), then you have to do something like this (in a more efficient way than LIKE):
where col in ('1','2','3','4','5','6','7','8','9','0') or (col = '' and DATALENGTH(col) = 0)
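The DATALENGTH check is there because SQL Server ignores trailing spaces when comparing strings with =. A quick sketch of that behaviour (the column aliases are made up for illustration):

```sql
SELECT
    -- '=' pads the shorter string, so '' and ' ' compare as equal
    CASE WHEN '' = ' ' THEN 1 ELSE 0 END AS EqualIgnoringTrailingSpace,  -- 1
    -- DATALENGTH counts stored bytes, so it can tell the two apart
    DATALENGTH('')  AS EmptyBytes,   -- 0
    DATALENGTH(' ') AS SpaceBytes;   -- 1
```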
That's a tricky one... As far as I can tell, there isn't a way to do it with only one like clause. You need to do like '[0-9]' OR like ''.
You could accomplish this by having a second column in your TableX. That indicates either a second pattern, or whether or not to include blanks.
If I correctly understand your question, you need something that catches an empty string. Try to use the nullif() function:
create table t1 (a nvarchar(1))
insert t1(a) values('')
insert t1(a) values('1')
insert t1(a) values('2')
insert t1(a) values('a')
-- must select first three
select a from t1 where a like '[0-9]' or nullif(a,'') is null
It returns exactly three records: '', '1' and '2'.
A more convenient method with only one range clause is:
select a from t1 where isnull(nullif(a,''),0) like '[0-9]'

TSQL Variable With List of Values for IN Clause

I want to use a clause along the lines of "CASE WHEN ... THEN 1 ELSE 0 END" in a select statement. The tricky part is that I need it to work with "value IN @List".
If I hard code the list it works fine - and it performs well:
SELECT
CASE WHEN t.column_a IN ( 'value a', 'value b' ) THEN 1 ELSE 0 END AS priority
, t.column_b
, t.column_c
FROM
table AS t
ORDER BY
priority DESC
What I would like to do is:
-- @AvailableValues would be a list (array) of strings.
DECLARE
@AvailableValues ???
SELECT
@AvailableValues = ???
FROM
lookup_table
SELECT
CASE WHEN t.column_a IN @AvailableValues THEN 1 ELSE 0 END AS priority
, t.column_b
, t.column_c
FROM
table AS t
ORDER BY
priority DESC
Unfortunately, it seems that SQL Server doesn't do this - you can't use a variable with an IN clause. So this leaves me with some other options:
Make '@AvailableValues' a comma-delimited string and use a LIKE statement. This does not perform well.
Use an inline SELECT statement against 'lookup_table' in place of the variable. Again, doesn't perform well (I think) because it has to lookup the table on each row.
Write a function wrapping around the SELECT statement in place of the variable. I haven't tried this yet (will try it now) but it seems that it will have the same problem as a direct SELECT statement.
???
Are there any other options? Performance is very important for the query - it has to be really fast as it feeds a real-time search result page (i.e. no caching) for a web site.
Are there any other options here? Is there a way to improve the performance of one of the above options to get good performance?
Thanks in advance for any help given!
UPDATE: I should have mentioned that the 'lookup_table' in the example above is already a table variable. I've also updated the sample queries to better demonstrate how I'm using the clause.
UPDATE II: It occurred to me that the IN clause is operating off an NVARCHAR/NCHAR field (due to historical table design reasons). If I was to make changes that dealt with integer fields (i.e through PK/FK relationship constraints) could this have much impact on performance?
You can use a variable in an IN clause, but not in the way you're trying to do. For instance, you could do this:
declare @i int
declare @j int
select @i = 10, @j = 20
select * from YourTable where SomeColumn IN (@i, @j)
The key is that the variables cannot represent more than one value.
To answer your question, use the inline select. As long as you don't reference an outer value in the query (which could change the results on a per-row basis), the engine will not repeatedly select the same data from the table.
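A minimal sketch of the inline select, assuming two table variables, @lookup and @items, that stand in for the real lookup table and data table (names and rows are made up for illustration):

```sql
-- Hypothetical lookup values
DECLARE @lookup TABLE (SomeValue nvarchar(100) NOT NULL);
INSERT INTO @lookup (SomeValue) VALUES (N'value a'), (N'value b');

-- Hypothetical rows to prioritise
DECLARE @items TABLE (column_a nvarchar(100), column_b int);
INSERT INTO @items (column_a, column_b) VALUES (N'value c', 2), (N'value a', 1);

-- The subquery does not reference t, so it is not correlated to the outer row
SELECT
    CASE WHEN t.column_a IN (SELECT l.SomeValue FROM @lookup AS l) THEN 1 ELSE 0 END AS priority,
    t.column_b
FROM @items AS t
ORDER BY priority DESC;
-- The 'value a' row gets priority 1 and sorts first
```

Checking the actual execution plan is the way to confirm how the engine evaluates the subquery on real data.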
Based on your update and assuming the lookup table is small, I suggest trying something like the following:
DECLARE @MyLookup table
(SomeValue nvarchar(100) not null)
SELECT
case when ml.SomeValue is not null then 1 else 0 end AS Priority
,t.column_b
,t.column_c
from MyTable t
left outer join @MyLookup ml
on ml.SomeValue = t.column_a
order by case when ml.SomeValue is not null then 1 else 0 end desc
(SQL Server also lets you reference the column alias directly in the ORDER BY clause, as in order by Priority desc, so repeating the CASE expression is optional. You could likewise use the ordinal position, like so:
order by 1 desc
but that's generally not recommended.)
As long as the lookup table is small, this really should run fairly quickly, but your comment implies that it's a pretty big table, and that could slow down performance.
As for n[Var]char vs. int, yes, integers would be faster, if only because the CPU has fewer bytes to juggle around... which should only be a problem when processing a lot of rows, so it might be worth trying.
I solved this problem by using the CHARINDEX function. I wanted to pass the string in as a single parameter. I created a string with a leading and trailing comma around each value I wanted to test for. Then I concatenated a leading and trailing comma to the string I wanted to check was "in" the parameter. At the end I checked for CHARINDEX > 0.
DECLARE @CTSPST_Profit_Centers VARCHAR (256)
SELECT @CTSPST_Profit_Centers = ',CS5000U37Y,CS5000U48B,CS5000V68A,CS5000V69A,CS500IV69A,CS5000V70S,CS5000V79B,CS500IV79B,'
SELECT
CASE
WHEN CHARINDEX(','+ISMAT.PROFIT_CENTER+',' ,@CTSPST_Profit_Centers) > 0 THEN 'CTSPST'
ELSE ISMAT.DESIGN_ID + ' 1 CPG'
END AS DESIGN_ID
You can also do it in the where clause
WHERE CHARINDEX(','+ISMAT.PROFIT_CENTER+',',@CTSPST_Profit_Centers) > 0
If you were trying to compare numbers you'd need to convert the number to a text string for the CHARINDEX function to work.
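The wrapping commas are what keep a partial value from matching. A short sketch, using a shortened list and a made-up probe value 'CS5000U37' that is only a prefix of a real entry:

```sql
-- Shortened, comma-wrapped list for illustration
DECLARE @list varchar(256) = ',CS5000U37Y,CS5000U48B,';

SELECT
    -- Without delimiters the prefix is found inside 'CS5000U37Y' (false positive)
    CHARINDEX('CS5000U37', @list) AS WithoutDelimiters,           -- 2
    -- With delimiters only a complete entry can match
    CHARINDEX(',' + 'CS5000U37' + ',', @list) AS WithDelimiters;  -- 0
```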
This might be along the lines of what you need.
Note that this assumes that you have permissions and the input data has been sanitized.
From Running Dynamic Stored Procedures
CREATE PROCEDURE MyProc (@WHEREClause varchar(255))
AS
-- Create a variable @SQLStatement, sized to hold the fixed text plus the 255-character clause
DECLARE @SQLStatement varchar(512)
-- Enter the dynamic SQL statement into the
-- variable @SQLStatement
SELECT @SQLStatement = 'SELECT * FROM TableName WHERE ' + @WHEREClause
-- Execute the SQL statement
EXEC(@SQLStatement)

Resources