T-SQL: LIKE operator, compare string to one value - sql-server

Is there a way to compare multiple values in one column to a single value in another column?
Example:
Column A contains: [a;b;c;d]
Column B contains: [a]
At the moment I'm using the LIKE operator to achieve this, but I get no result. I tried it with a % wildcard, but I get no match because of the ;.

As Larnu suggested, the real fix here is to fix the design. You should go back to the owners and remind them that the database is for storing relational data; if you're jamming multiple "facts" into a single column, you may as well be using a flat file. The exception is if you are storing a comma-separated list for the application and only the application is responsible for assembling and exploding that set.
Anyway, given that you are probably stuck with this (and let's say ColumnA is limited to 128 characters):
CREATE TABLE dbo.BadDesign
(
    ColumnA nvarchar(128),
    ColumnB nvarchar(max)
);

INSERT dbo.BadDesign(ColumnA, ColumnB) VALUES
    (N'[a]', N'[a;b;c;d]'), -- only match
    (N'[p]', N'[q;r;s]'),
    (N'[h]', N'[hi;j;k]');
You can see the following solutions demonstrated in turn in this db<>fiddle:
Nested Replace
In the old days (before SQL Server 2017), we would nest REPLACE() calls to get rid of the square brackets and replace each end of the string with delimiters:
-- All versions
SELECT ColumnA, ColumnB
FROM dbo.BadDesign
WHERE REPLACE(REPLACE(ColumnB, N'[', N';'), N']', N';')
LIKE N'%' + REPLACE(REPLACE(ColumnA, N'[', N';'), N']', N';') + N'%';
Gross, but results:
ColumnA   ColumnB
-------   ---------
[a]       [a;b;c;d]
We can't use TRIM() on versions prior to SQL Server 2017, but I explain below why we don't want to use that function on modern versions anyway.
OpenJson
In SQL Server 2016+ we can use OPENJSON after a little manipulation of the string. Here I use a PARSENAME() trick which is only safe if ColumnA is <= 128 characters; I show other workarounds in this db<>fiddle:
SELECT b.ColumnA, b.ColumnB
FROM dbo.BadDesign AS b
CROSS APPLY OPENJSON(REPLACE(REPLACE(REPLACE(
b.ColumnB, N'[', N'["'), N']', N'"]'), N';', N'","')) AS j
WHERE j.value = PARSENAME(b.ColumnA, 1);
Results:
ColumnA   ColumnB
-------   ---------
[a]       [a;b;c;d]
Translate
In SQL Server 2017, it can be a little less gross with TRANSLATE():
-- SQL Server 2017+
SELECT ColumnA, ColumnB
FROM dbo.BadDesign
WHERE TRANSLATE(ColumnB, N'[]',N';;')
LIKE N'%' + TRANSLATE(ColumnA, N'[]',N';;') + N'%';
ColumnA   ColumnB
-------   ---------
[a]       [a;b;c;d]
We don't want to use TRIM() here because we don't simply want to remove the enclosing square brackets; we want delimiters there so we can always compare A to B regardless of where B is in the string. Without surrounding delimiters replaced or translated, we could get inaccurate results if the match is at the beginning or end of the multi-value string.
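To see the failure mode, here's a minimal demonstration using the sample rows above: if the brackets are stripped outright instead of being converted to delimiters, the single value [h] incorrectly matches the list [hi;j;k]:

-- Anti-example: removing the brackets loses the boundary information
SELECT ColumnA, ColumnB
FROM dbo.BadDesign
WHERE REPLACE(REPLACE(ColumnB, N'[', N''), N']', N'')
LIKE N'%' + REPLACE(REPLACE(ColumnA, N'[', N''), N']', N'') + N'%';
-- N'hi;j;k' LIKE N'%h%' is true, so the [h] row comes back as a false match.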
Split Function
Alternatively, you could create this function on SQL Server 2016+ (or a messier one that doesn't use STRING_SPLIT() in earlier versions - as Smor noted, a search will turn up hundreds of those):
CREATE FUNCTION dbo.SplitAndClean(@s nvarchar(max))
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
(
    SELECT value
    FROM STRING_SPLIT
    (
        -- if 2016:
        REPLACE(REPLACE(@s, N'[', N';'), N']', N';'),
        -- if 2017+, TRANSLATE() is slightly cleaner:
        /* TRANSLATE(@s, N'[]', N';;'), */
        N';'
    )
    WHERE value > N''
);
Then you can say:
SELECT bd.ColumnA, bd.ColumnB
FROM dbo.BadDesign AS bd
CROSS APPLY dbo.SplitAndClean(bd.ColumnA) AS a
CROSS APPLY dbo.SplitAndClean(bd.ColumnB) AS b
WHERE a.value = b.value;
ColumnA   ColumnB
-------   ---------
[a]       [a;b;c;d]
But in the end...
...these are all gross "solutions" masking bad design, and you should really have them reconsider how they're using the database.
I know that many shops can't just switch to passing sets between the app and the database using TVPs, because several client providers and ORMs haven't quite had more than a decade to catch that train. If you can't use TVPs or can't change the app, you should at least consider intercepting the comma-separated list passed by the app and breaking it apart using STRING_SPLIT() or the like. Then you can store the values relationally and let the database do what the database was designed to do, without being handcuffed by app limitations.
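As a minimal sketch of that interception (STRING_SPLIT() requires 2016+; the @csv parameter and the dbo.GoodDesign target table here are hypothetical stand-ins):

DECLARE @csv nvarchar(max) = N'a,b,c,d'; -- the list as passed by the app

-- Explode the list once, at the boundary, and store the members as rows
INSERT dbo.GoodDesign(MemberValue) -- hypothetical, properly normalized table
SELECT LTRIM(RTRIM(value))
FROM STRING_SPLIT(@csv, N',');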

If there will always be only one value in col_b, as in your example, you can use nested REPLACE() calls to remove [ and ] and then use "like" for the search:
select *
from test_data
where col_a like '%' + replace(replace(col_b, '[', ''), ']', '') + '%';
But if there could be more than one value in col_b, and the values could be in any order (e.g. "[a;c]" or "[d;a]"), you'll find an answer among already-answered questions, or you can look up the STRING_SPLIT() function on MSDN; the latter has a great examples section that will definitely help you out. A rough sketch is shown below.
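Here is that sketch (SQL Server 2016+, using the table and column names from this answer):

-- Explode both bracketed lists and match on any shared member;
-- distinct avoids duplicate rows when more than one member matches
select distinct t.col_a, t.col_b
from test_data as t
cross apply string_split(replace(replace(t.col_a, '[', ''), ']', ''), ';') as a
cross apply string_split(replace(replace(t.col_b, '[', ''), ']', ''), ';') as b
where a.value = b.value;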

Related

SQL SUBSTRING & PATINDEX of varying lengths

SQL Server 2017.
Given the following 3 records with a field of type nvarchar(250) called fileString:
_318_CA_DCA_2020_12_11-01_00_01_VM6.log
_319_CA_DCA_2020_12_12-01_VM17.log
_333_KF_DCA01_00_01_VM232.log
I would want to return:
VM6
VM17
VM232
Attempted thus far with:
SELECT
SUBSTRING(fileString, PATINDEX('%VM[0-9]%', fileString), 3)
FROM dbo.Table
But of course that only returns VM and 1 number.
How would I define the parameter for number of characters when it varies?
EDIT: to pre-emptively answer a question that may come up: yes, the VM pattern will always be followed immediately by .log and nothing else. But even if I took that approach and worked backwards, I still don't understand how to define the number of characters to take when the number varies.
Here is one way:
DECLARE @test TABLE (fileString varchar(500))
INSERT INTO @test VALUES
('_318_CA_DCA_2020_12_11-01_00_01_VM6.log')
,('_319_CA_DCA_2020_12_12-01_00_01_VM17.log')
,('_333_KF_DCA_2020_12_15-01_00_01_VM232.log')

-- 5 is the length of the file extension + 1, which is always the same: '.log'
SELECT
REVERSE(SUBSTRING(REVERSE(fileString), 5, CHARINDEX('_', REVERSE(fileString)) - 5))
FROM @test AS t
This will dynamically grab the length and location of the last _ and remove the .log.
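Tracing the first sample row makes the steps clearer:

-- Walking through '_318_CA_DCA_2020_12_11-01_00_01_VM6.log':
-- REVERSE(fileString)             -> 'gol.6MV_10_00_10-...'
-- CHARINDEX('_', <reversed>)      -> 8 (the first '_' counting from the end)
-- SUBSTRING(<reversed>, 5, 8 - 5) -> '6MV' (start at 5 skips 'gol.', i.e. '.log' reversed)
-- REVERSE('6MV')                  -> 'VM6'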
It is not the most efficient; if you are able to write a CLR function using C# and import it into SQL Server, that will be much more efficient. Or you can use this as a starting point and tweak it as needed.
You can remove the variable and replace it with your table column, like below:
DECLARE @TESTVariable as varchar(500)
SET @TESTVariable = '_318_CA_DCA_2020_12_11-01_00_01_VM6adf.log'
SELECT REPLACE(SUBSTRING(@TESTVariable, PATINDEX('%VM[0-9]%', @TESTVariable), PATINDEX('%[_]%', REVERSE(@TESTVariable))), '.log', '')
select *,
part = REPLACE(SUBSTRING(filestring, PATINDEX('%VM[0-9]%', filestring), PATINDEX('%[_]%', REVERSE(filestring))), '.log', '')
from table
Your lengths are consistent at the beginning. So get away from patindex and use substring to crop out the beginning. Then just replace the '.log' with an empty string at the end.
select *,
part = replace(substring(filestring,33,255),'.log','')
from table;
Edit:
Okay, from your edit you show differing prefix portions. Then patindex is in fact correct. Here's my solution, which is not better or worse than the other answers but differs with respect to the fact that it avoids reverse and delegates the patindex computation to a cross apply section. You may find it a bit more readable.
select filestring,
    part = replace(substring(filestring, ap.vmIx, 255), '.log', '')
from table
cross apply (
    select vmIx = patindex('%_vm%', filestring) + 1
) ap

How to use an evaluated expression in SQL IN clause

I have an application that takes a comma separated string for multiple IDs to be used in the 'IN' clause of a SQL query.
SELECT * FROM TABLE WHERE [TABLENAME].[COLUMNNAME]
IN
((SELECT '''' + REPLACE('PARAM(0, Enter ID/IDS. Separate multiple ids by
comma., String)', char(44), ''',''') + ''''))
I have tested that PARAM gets the string entered, e.g. 'ID1, ID2', but the SELECT/REPLACE does not execute. The statement becomes:
SELECT * FROM TABLE WHERE [TABLENAME].[COLUMNNAME]
IN
((SELECT '''' + REPLACE('ID1,ID2', char(44), ''',''') + ''''))
I am trying to achieve:
SELECT * FROM TABLE WHERE [TABLENAME].[COLUMNNAME]
IN ('ID1', 'ID2')
The query returns no results and no errors. I am confident the corresponding records are in the database I am working with. I'm not sure how to fix this.
You can't do it like this. The IN operator expects a list of values separated by commas, but you supply it with a single value that happens to contain a comma-delimited string.
If you are working on SQL Server 2016 or higher, you can use the built-in STRING_SPLIT to convert the delimited string into a table:
SELECT *
FROM TABLE
WHERE [TABLENAME].[COLUMNNAME] IN
(SELECT value FROM STRING_SPLIT(@CommaDelimitedString, ','))
For older versions, there are multiple user defined functions you can choose from, my personal favorite is Jeff Moden's DelimitedSplit8K. For more options, read Aaron Bertrand's Split strings the right way – or the next best way.
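If you'd rather not pull in an external function, a minimal recursive-CTE splitter along these lines works on SQL Server 2005+. This is a sketch only: the default recursion limit caps it at roughly 100 elements (add OPTION (MAXRECURSION 0) to the calling query for more), and DelimitedSplit8K will outperform it:

CREATE FUNCTION dbo.SimpleSplit (@s nvarchar(4000), @delim nchar(1))
RETURNS TABLE
AS
RETURN
(
    WITH parts AS
    (
        -- start/stop bracket each element; appending @delim handles the last one
        SELECT start = 1,
               stop  = CHARINDEX(@delim, @s + @delim)
        UNION ALL
        SELECT stop + 1,
               CHARINDEX(@delim, @s + @delim, stop + 1)
        FROM parts
        WHERE stop < LEN(@s) + 1
    )
    SELECT value = LTRIM(RTRIM(SUBSTRING(@s, start, stop - start)))
    FROM parts
);

The query then becomes:

SELECT *
FROM TABLE
WHERE [TABLENAME].[COLUMNNAME] IN
(SELECT value FROM dbo.SimpleSplit(@CommaDelimitedString, ','))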

Is there a SQL Server collation option that will allow matching different apostrophes?

I'm currently using SQL Server 2016 with SQL_Latin1_General_CP1_CI_AI collation. As expected, queries with the letter e will match values with the letters e, è, é, ê, ë, etc because of the accent insensitive option of the collation. However, queries with a ' (U+0027) do not match values containing a ’ (U+2019). I would like to know if such a collation exists where this case would match, since it's easier to type ' than it is to know that ’ is keystroke Alt-0146.
I'm confident in saying no. The main thing here is that the two characters are different (although similar). With accents, e and ê are still both an e (just one has an accent). This enables you (for example) to do searches for things like SELECT * FROM Games WHERE [Name] LIKE 'Pokémon%'; and still have rows containing Pokemon returned (because people haven't used the accent :P).
The best thing I could suggest would be to use REPLACE (at least in your WHERE clause) so that both rows are returned. That is, however, likely going to get expensive.
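For instance, a quick sketch against the Sample table defined below, normalizing the curly apostrophe at query time (which defeats any index on the column):

-- Replace U+2019 with U+0027 before comparing
SELECT String
FROM Sample
WHERE REPLACE(String, N'’', N'''') LIKE N'%''%';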
If you know which columns are going to be a problem, you could add a PERSISTED computed column to that table. Then you can use that column in your WHERE clause but display the original one. Something like:
USE Sandbox;
--Create Sample table and data
CREATE TABLE Sample (String varchar(500));
INSERT INTO Sample
VALUES ('This is a string that does not contain either apostrophe'),
('Where as this string, isn''t without at least one'),
('’I have one of them as well’'),
('’Well, I''m going to use both’');
GO
--First attempt (without the column)
SELECT String
FROM Sample
WHERE String LIKE '%''%'; --Only returns 2 of the rows
GO
--Create a PERSISTED Column
ALTER TABLE Sample ADD StringRplc AS REPLACE(String,'’','''') PERSISTED;
GO
--Second attempt
SELECT String
FROM Sample
WHERE StringRplc LIKE '%''%'; --Returns 3 rows
GO
--Clean up
DROP TABLE Sample;
GO
The other answer is correct. There is no such collation. You can easily verify this with the below.
DECLARE @dynSql NVARCHAR(MAX) =
'SELECT * FROM (' +
(
    SELECT SUBSTRING(
    (
        SELECT ' UNION ALL SELECT ''' + name + ''' AS name, IIF( NCHAR(0x0027) = NCHAR(0x2019) COLLATE ' + name + ', 1, 0) AS Equal'
        FROM sys.fn_helpcollations()
        FOR XML PATH('')
    ), 12, 0 + 0x7fffffff)
)
+ ') t
ORDER BY Equal, name';
PRINT @dynSql;
EXEC (@dynSql);

TSQL Variable With List of Values for IN Clause

I want to use a clause along the lines of "CASE WHEN ... THEN 1 ELSE 0 END" in a select statement. The tricky part is that I need it to work with "value IN @List".
If I hard code the list it works fine - and it performs well:
SELECT
CASE WHEN t.column_a IN ( 'value a', 'value b' ) THEN 1 ELSE 0 END AS priority
, t.column_b
, t.column_c
FROM
table AS t
ORDER BY
priority DESC
What I would like to do is:
-- @AvailableValues would be a list (array) of strings.
DECLARE
@AvailableValues ???
SELECT
@AvailableValues = ???
FROM
lookup_table
SELECT
CASE WHEN t.column_a IN @AvailableValues THEN 1 ELSE 0 END AS priority
, t.column_b
, t.column_c
FROM
table AS t
ORDER BY
priority DESC
Unfortunately, it seems that SQL Server doesn't do this - you can't use a variable with an IN clause. So this leaves me with some other options:
Make '@AvailableValues' a comma-delimited string and use a LIKE statement. This does not perform well.
Use an inline SELECT statement against 'lookup_table' in place of the variable. Again, this doesn't perform well (I think) because it has to look up the table on each row.
Write a function wrapping around the SELECT statement in place of the variable. I haven't tried this yet (will try it now) but it seems that it will have the same problem as a direct SELECT statement.
???
Are there any other options? Performance is very important for the query - it has to be really fast as it feeds a real-time search result page (i.e. no caching) for a web site.
Are there any other options here? Is there a way to improve the performance of one of the above options to get good performance?
Thanks in advance for any help given!
UPDATE: I should have mentioned that the 'lookup_table' in the example above is already a table variable. I've also updated the sample queries to better demonstrate how I'm using the clause.
UPDATE II: It occurred to me that the IN clause is operating on an NVARCHAR/NCHAR field (due to historical table design reasons). If I were to make changes that dealt with integer fields (i.e. through PK/FK relationship constraints), could this have much impact on performance?
You can use a variable in an IN clause, but not in the way you're trying to do. For instance, you could do this:
declare @i int
declare @j int
select @i = 10, @j = 20
select * from YourTable where SomeColumn IN (@i, @j)
The key is that the variables cannot represent more than one value.
To answer your question, use the inline select. As long as you don't reference an outer value in the query (which could change the results on a per-row basis), the engine will not repeatedly select the same data from the table.
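In other words, something along these lines (a sketch; some_value stands in for whatever column your lookup_table exposes):

SELECT
-- the uncorrelated subquery can be materialized once, not per row
CASE WHEN t.column_a IN (SELECT lv.some_value FROM lookup_table AS lv)
THEN 1 ELSE 0 END AS priority
, t.column_b
, t.column_c
FROM
table AS t
ORDER BY
priority DESC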
Based on your update and assuming the lookup table is small, I suggest trying something like the following:
DECLARE @MyLookup table
(SomeValue nvarchar(100) not null)
SELECT
case when ml.SomeValue is not null then 1 else 0 end AS Priority
,t.column_b
,t.column_c
from MyTable t
left outer join @MyLookup ml
on ml.SomeValue = t.column_a
order by case when ml.SomeValue is not null then 1 else 0 end desc
(In fact, you can reference the column alias "Priority" in the ORDER BY clause, since ORDER BY is evaluated after the SELECT list; the CASE is repeated above just to be explicit. Alternatively, you could use the ordinal position like so:
order by 1 desc
but that's generally not recommended.)
As long as the lookup table is small, this really should run fairly quickly -- but your comment implies that it's a pretty big table, and that could slow down performance.
As for n[Var]char vs. int: yes, integers would be faster, if only because the CPU has fewer bytes to juggle around... which should only be a problem when processing a lot of rows, so it might be worth trying.
I solved this problem by using the CHARINDEX function. I wanted to pass the string in as a single parameter, so I created a parameter string with a leading and a trailing comma around every value I wanted to test against. Then I concatenated a leading and a trailing comma onto the string I wanted to check was "in" the parameter. At the end I checked for CHARINDEX > 0:
DECLARE @CTSPST_Profit_Centers VARCHAR(256)
SELECT @CTSPST_Profit_Centers = ',CS5000U37Y,CS5000U48B,CS5000V68A,CS5000V69A,CS500IV69A,CS5000V70S,CS5000V79B,CS500IV79B,'
SELECT
CASE
WHEN CHARINDEX(',' + ISMAT.PROFIT_CENTER + ',', @CTSPST_Profit_Centers) > 0 THEN 'CTSPST'
ELSE ISMAT.DESIGN_ID + ' 1 CPG'
END AS DESIGN_ID
You can also do it in the where clause:
WHERE CHARINDEX(',' + ISMAT.PROFIT_CENTER + ',', @CTSPST_Profit_Centers) > 0
If you were trying to compare numbers, you'd need to convert the number to a text string for the CHARINDEX function to work.
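For example (a sketch; PROFIT_CENTER_ID is a hypothetical integer column):

-- Convert the number to text so it can participate in the string search
WHERE CHARINDEX(',' + CONVERT(VARCHAR(12), ISMAT.PROFIT_CENTER_ID) + ',', @CTSPST_Profit_Centers) > 0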
This might be along the lines of what you need.
Note that this assumes that you have permissions and the input data has been sanitized.
From Running Dynamic Stored Procedures
CREATE PROCEDURE MyProc (@WHEREClause varchar(255))
AS
-- Create a variable @SQLStatement
DECLARE @SQLStatement varchar(255)
-- Enter the dynamic SQL statement into the
-- variable @SQLStatement
SELECT @SQLStatement = 'SELECT * FROM TableName WHERE ' + @WHEREClause
-- Execute the SQL statement
EXEC(@SQLStatement)
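A call would then look like this (Color and Size are hypothetical columns on TableName; the fragment is injected verbatim, which is exactly why the input must be trusted):

-- Only safe with sanitized, trusted input
EXEC MyProc @WHEREClause = 'Color = ''Red'' AND Size = ''M'''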

What is a good way to paginate out of SQL 2000 using Start and Length parameters?

I have been given the task of refactoring an existing stored procedure so that the results are paginated. The SQL Server is SQL 2000, so I can't use the ROW_NUMBER method of pagination. The stored proc is already fairly complex, building chunks of a large SQL statement together before calling sp_executesql, and it has various sorting options available.
The first result out of Google seems like a good method, but I think the example is wrong: the 2nd sort needs to be reversed, and the case where the start is less than the page length breaks down. The 2nd example on that page also seems like a good method, but the SP takes a page number rather than the start record. And the whole temp table thing seems like it would be a performance drain.
I am making progress going down this path, but it seems slow and confusing, and I am having to do quite a few REPLACE calls on the sort order to get it to come out right.
Are there any other easier techniques I am missing?
There are two SQL Server 2000 compliant answers in this StackOverflow question - skip the accepted one, which is 2005-only.
No, I'm afraid not - SQL Server 2000 doesn't have any of the 2005 niceties like Common Table Expressions (CTEs) and such... the method described in the Google link seems to be one way to go.
Marc
Also take a look here:
http://databases.aspfaq.com/database/how-do-i-page-through-a-recordset.html
Scroll down to "Stored Procedure Methods".
Depending on your application architecture (and your amount of data, its structure, DB server load, etc.) you could use the DB access layer for paging.
For example, with ADO you can define a page size on the Recordset (DataSet in ADO.NET) object and do the paging on the client. Classic ADO even lets you use a server-side cursor, though I don't know if that scales well (I think this was removed altogether in ADO.NET).
MSDN documentation: Paging Through a Query Result (ADO.NET)
After playing with this for a while there seems to be only one way of really doing this (using Start and Length parameters) and that's with the temp table.
My final solution was to not use the @start parameter and instead use a @page parameter, and then use the following:
SET @sql = @sql + N'
SELECT * FROM
(
    SELECT TOP ' + CAST(@length AS varchar) + N' * FROM
    (
        SELECT TOP ' + CAST(@page * @length AS varchar) + N'
            field1,
            field2
        FROM Table1
        ORDER BY field1 ASC
    ) AS Result
    ORDER BY field1 DESC
) AS Result
ORDER BY field1 ASC'
The original query was much more complex than what is shown here; the ORDER BY covered at least 3 fields and was determined by a long CASE clause, requiring me to use a series of REPLACE functions to get the sort directions right for the inner queries.
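The direction-flipping part looked roughly like this (a simplified sketch; a placeholder token is needed so the first swap isn't undone by the second):

DECLARE @orderBy varchar(100), @reversedOrderBy varchar(100)
SET @orderBy = 'field1 ASC, field2 DESC'

-- Swap ASC and DESC via a placeholder token
SET @reversedOrderBy =
REPLACE(REPLACE(REPLACE(@orderBy, 'ASC', '~'), 'DESC', 'ASC'), '~', 'DESC')

-- @reversedOrderBy is now 'field1 DESC, field2 ASC'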
We've been using variations on this query for a number of years. This example gives items 50,001 through 50,300.
select top 300
Items.*
from Items
where
Items.CustomerId = 1234 AND
Items.Active = 1 AND
Items.Id not in
(
select top 50000 Items.Id
from Items
where
Items.CustomerId = 1234 AND
Items.Active = 1
order by Items.id
)
order by Items.Id
