T-SQL pattern matching with exceptions - sql-server

Here's a problem I've repeatedly encountered while playing with the Stack Exchange Data Explorer, which is based on T-SQL:
How to search for a string except when it occurs as a substring of some other string?
For example, how can I select all records in a table MyTable where the column MyCol contains the string foo, but ignoring any foos that are part of the string foobar?
A quick and dirty attempt would be something like:
SELECT *
FROM MyTable
WHERE MyCol LIKE '%foo%'
AND MyCol NOT LIKE '%foobar%'
but obviously this will fail to match e.g. MyCol = 'not all foos are foobars', which I do want to match.
One solution I've come up with is to replace all occurrences of foobar with some dummy marker (that is not a substring of foo) and then check for any remaining foos, as in:
SELECT *
FROM MyTable
WHERE REPLACE(MyCol, 'foobar', 'X') LIKE '%foo%'
This works, but I suspect it's not very efficient, since it has to run the REPLACE() on every record in the table. (For SEDE, this would typically be the Posts table, which currently has about 30 million rows.) Are there any better ways to do this?
(FWIW, the real use case that prompted this question was searching for SO posts with image URLs that use the http:// scheme prefix but do not point to the host i.stack.imgur.com.)
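To make the difference between the two WHERE clauses above concrete, here is a minimal sketch that runs them through Python's built-in sqlite3 module. SQLite is only a stand-in for SQL Server so the example runs anywhere. The rows are made up. SQLite's LIKE is case-insensitive for ASCII, which roughly matches SQL Server's default collation.

```python
import sqlite3

# In-memory stand-in for MyTable; the sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (MyCol TEXT)")
rows = ["just foo", "only foobar here", "not all foos are foobars", "nothing"]
conn.executemany("INSERT INTO MyTable VALUES (?)", [(r,) for r in rows])

# Quick-and-dirty version: rejects any row that contains 'foobar' at all.
naive = [r for (r,) in conn.execute(
    "SELECT MyCol FROM MyTable "
    "WHERE MyCol LIKE '%foo%' AND MyCol NOT LIKE '%foobar%'")]

# REPLACE version: mask every 'foobar', then look for a leftover 'foo'.
replaced = [r for (r,) in conn.execute(
    "SELECT MyCol FROM MyTable "
    "WHERE REPLACE(MyCol, 'foobar', 'X') LIKE '%foo%'")]

print(naive)     # ['just foo']
print(replaced)  # ['just foo', 'not all foos are foobars']
```

Only the REPLACE version keeps 'not all foos are foobars', because 'foos' still contains a foo once 'foobar' has been masked.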

Neither of the ways given so far is guaranteed to work as advertised and perform the REPLACE only on a subset of rows.
SQL Server does not guarantee short circuiting of predicates and can move compute scalars up into the underlying query for derived tables and CTEs.
The only thing that is (mostly) guaranteed to work is a CASE expression. Below I use IIF, which is syntactic sugar that expands to CASE:
SELECT *
FROM MyTable
WHERE 1 = IIF(MyCol LIKE '%foo%',
IIF(REPLACE(MyCol, 'foobar', 'X') LIKE '%foo%', 1, 0),
0);
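Since IIF expands to CASE, the same guard can be written with nested CASE expressions directly. The sketch below runs that form against SQLite (again only a stand-in, with hypothetical rows) and checks that it returns the same rows as the plain REPLACE filter. SQLite cannot show SQL Server's evaluation order, so this checks correctness only, not the ordering guarantee discussed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (MyCol TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?)",
                 [("just foo",), ("only foobar here",),
                  ("not all foos are foobars",), ("nothing",)])

# The outer WHEN tests the cheap LIKE; REPLACE appears only in the inner CASE.
guarded_sql = """
SELECT MyCol FROM MyTable
WHERE 1 = CASE
            WHEN MyCol LIKE '%foo%' THEN
                CASE WHEN REPLACE(MyCol, 'foobar', 'X') LIKE '%foo%'
                     THEN 1 ELSE 0 END
            ELSE 0
          END
"""
plain_sql = ("SELECT MyCol FROM MyTable "
             "WHERE REPLACE(MyCol, 'foobar', 'X') LIKE '%foo%'")

guarded = [r for (r,) in conn.execute(guarded_sql)]
plain = [r for (r,) in conn.execute(plain_sql)]
print(guarded == plain)  # True
```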

A three-stage filter should work:
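collect all rows matching '%foo%';
replace all instances of 'foobar' with a string that cannot combine with the surrounding text to form 'foo' (such as 'X'; an empty string is risky, because removing 'foobar' from 'fofoobaro' leaves 'foo');
check again for rows matching '%foo%'.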
Here you only perform the REPLACE on potentially matching rows, not all rows. If you are expecting only a small percentage of matches, this should be much more efficient.
SQL would look like this:
;with data as (
select *
from MyTable
where MyCol like '%foo%'
)
select *
from data
where replace(MyCol, 'foobar', 'X') like '%foo%'
Note that a subquery is required, as there are no expression short-circuits in SQL; the engine is free to reorder Boolean terms as desired for efficient processing within a single query level.
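The replacement string in the second step needs some care: it must not be able to combine with the characters around it into a new 'foo'. A minimal Python check shows why an empty string can produce a false positive where 'X' does not. It uses plain str.replace, which, like REPLACE, substitutes non-overlapping matches from left to right. The value is made up.

```python
def still_has_foo(value: str, replacement: str) -> bool:
    """Mask every 'foobar' with `replacement`, then test for a leftover 'foo'."""
    return "foo" in value.replace("foobar", replacement)

# Hypothetical value: 'fo' + 'foobar' + 'o'. Its only 'foo' is inside 'foobar'.
tricky = "fofoobaro"

print(still_has_foo(tricky, ""))   # True  (false positive: 'fo' + 'o' -> 'foo')
print(still_has_foo(tricky, "X"))  # False ('foXo' contains no 'foo')
```

A single character that does not occur in 'foo' avoids the problem, because no occurrence of 'foo' can span it.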

This will be faster than your current query:
SELECT *
FROM MyTable
WHERE
MyCol like '%foo%' AND
REPLACE(MyCol, 'foobar', 'X') LIKE '%foo%'
The REPLACE is calculated after the MyCol LIKE '%foo%' filter has been applied, so this is faster than just:
REPLACE(MyCol, 'foobar', 'X') LIKE '%foo%'

Assuming you're only interested in finding instances of foo that are surrounded by spaces (or sit at the start or end of the value, or make up the whole value):
SELECT *
FROM MyTable
WHERE MyCol = 'foo' OR MyCol LIKE 'foo %' OR MyCol LIKE '% foo %' OR MyCol LIKE '% foo'
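The space-delimited approach is much narrower than the REPLACE one. The sketch below runs it against SQLite with hypothetical rows. It includes an extra `MyCol = 'foo'` test so that a value consisting of just foo is also caught. As the output shows, 'not all foos are foobars' is no longer matched, because 'foos' is not surrounded by spaces. Punctuation such as 'foo,' would be missed in the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (MyCol TEXT)")
conn.executemany("INSERT INTO MyTable VALUES (?)",
                 [("foo",), ("foo first",), ("a foo b",), ("ends with foo",),
                  ("foobar only",), ("not all foos are foobars",)])

# Space-delimited patterns, plus an exact match for a value that is just 'foo'.
sql = """
SELECT MyCol FROM MyTable
WHERE MyCol = 'foo'
   OR MyCol LIKE 'foo %' OR MyCol LIKE '% foo %' OR MyCol LIKE '% foo'
"""
matched = [r for (r,) in conn.execute(sql)]
print(matched)  # ['foo', 'foo first', 'a foo b', 'ends with foo']
```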

Related

SQL Server - Find out if string returned in subquery contains another string

I have two tables. One has a separate row for each ID. The other has a string with a comma-separated list of IDs. I'm trying to find out if the ID from the first table appears anywhere within the string of comma-separated IDs in the second table.
Here's a sample (non-working) query:
select * from
(select 'b' as ID) table1
where table1.ID in
(select 'a,b,c' as CSV_LIST)
This is not how IN works, of course, but I don't know how else to approach this.
I've thought about using STRING_SPLIT() but it doesn't work in this version of SQL Server. I've also thought about using CONTAINS() but I can't seem to get it to work either.
Any ideas?
You can use LIKE or a custom string splitter like Jeff Moden's if you can't fix the design.
select table1.*
from table1
inner join table2
-- wrap both sides in commas so that ID 'b' does not match a list like 'a,bb,c'
on ',' + table2.csv + ',' like '%,' + table1.ID + ',%'
Note that this isn't SARGable because of the leading %, so, as Sean pointed out, fixing the design would be best; failing that, use a split function that doesn't rely on a WHILE loop.
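A quick way to see why the delimiters matter is to compare a bare substring match with the comma-wrapped one. The sketch below uses SQLite with made-up table contents. SQLite concatenates strings with `||` where T-SQL uses `+`. IDs containing `%` or `_` would additionally need escaping, which this sketch ignores.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (ID TEXT)")
conn.execute("CREATE TABLE table2 (csv TEXT)")
conn.executemany("INSERT INTO table1 VALUES (?)", [("b",), ("c",), ("d",)])
conn.execute("INSERT INTO table2 VALUES ('a,bb,c')")

# Bare substring match: 'b' is found inside 'bb', a false positive.
naive = [r for (r,) in conn.execute(
    "SELECT table1.ID FROM table1 JOIN table2 "
    "ON table2.csv LIKE '%' || table1.ID || '%' ORDER BY table1.ID")]

# Delimited match: wrap the list and the ID in commas first.
delimited = [r for (r,) in conn.execute(
    "SELECT table1.ID FROM table1 JOIN table2 "
    "ON ',' || table2.csv || ',' LIKE '%,' || table1.ID || ',%' "
    "ORDER BY table1.ID")]

print(naive)      # ['b', 'c']
print(delimited)  # ['c']
```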

How does a view with UNION handle WHERE clauses?

Let's say I have a View like this
CREATE VIEW MyView
AS
SELECT Id, Name FROM Source1
UNION
SELECT Id, Name FROM Source2
Then I query the View
SELECT Id, Name From MyView WHERE Name = 'Sally'
Will SQL Server internally first Select from Source1 and Source2 all the Data and then apply the where or will it put the where for each Select statement?
SQL Server can move predicates around as it sees fit in order to optimize a query. Views are effectively macros that are expanded into the body of the query before optimization occurs.
What it will do in any particular case isn't 100% possible to predict - because in SQL, you tell the system what you want, not how to do it.
For a trivial example like this, I would expect it to evaluate the predicate against the base tables and then perform the union, but only an examination of the query plan on your database, with your tables and indexes could answer the question for sure.
Depends on the optimizer, cardinalities, indices available etc but yes it will apply the criteria to base tables where appropriate.
Note that your UNION, as opposed to UNION ALL, requires an extra step (typically a sort or a hash) to remove duplicates.
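The UNION versus UNION ALL point affects results as well as cost. The sketch below builds the view both ways in SQLite with hypothetical rows and applies the same WHERE. It shows only the duplicate-removal difference and says nothing about the plan SQL Server would choose.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Source1 (Id INTEGER, Name TEXT)")
conn.execute("CREATE TABLE Source2 (Id INTEGER, Name TEXT)")
conn.executemany("INSERT INTO Source1 VALUES (?, ?)", [(1, "Sally"), (2, "Bob")])
conn.executemany("INSERT INTO Source2 VALUES (?, ?)", [(1, "Sally"), (3, "Sally")])

results = {}
for op in ("UNION", "UNION ALL"):
    # Recreate MyView with the chosen set operator, then filter through it.
    conn.execute("DROP VIEW IF EXISTS MyView")
    conn.execute(f"CREATE VIEW MyView AS "
                 f"SELECT Id, Name FROM Source1 {op} SELECT Id, Name FROM Source2")
    results[op] = conn.execute(
        "SELECT Id, Name FROM MyView WHERE Name = 'Sally' ORDER BY Id").fetchall()

print(results["UNION"])      # [(1, 'Sally'), (3, 'Sally')]
print(results["UNION ALL"])  # [(1, 'Sally'), (1, 'Sally'), (3, 'Sally')]
```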

NOT IN Operation on multiple value (Sybase SQL)

I am trying to write SQL like this:
SELECT *
FROM TABLE1
WHERE (TABLE1.A, TABLE1.B) NOT IN
(SELECT TABLE2.A, TABLE2.B FROM TABLE2)
It seems this is not allowed in Sybase.
Can someone tell me how to fix this?
IN and NOT IN indeed work only on a single column.
The solution is not so difficult: concatenate the columns into a single value. For example, if both are VARCHAR columns, do something like this:
WHERE (TABLE1.A||'~~~'||TABLE1.B) NOT IN
(SELECT TABLE2.A||'~~~'||TABLE2.B FROM TABLE2)
This assumes that the string '~~~' will not occur in the data -- pick any string that works for you.
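The separator is what keeps different pairs from collapsing into the same key. The sketch below uses SQLite rather than Sybase, with made-up rows. It runs the concatenation approach with and without '~~~' and shows that ('ab', 'c') is wrongly excluded once ('a', 'bc') in TABLE2 produces the same unseparated string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE1 (A TEXT, B TEXT)")
conn.execute("CREATE TABLE TABLE2 (A TEXT, B TEXT)")
conn.executemany("INSERT INTO TABLE1 VALUES (?, ?)",
                 [("x", "1"), ("x", "2"), ("ab", "c")])
conn.executemany("INSERT INTO TABLE2 VALUES (?, ?)", [("x", "1"), ("a", "bc")])

# With a separator, ('ab', 'c') -> 'ab~~~c' stays distinct from 'a~~~bc'.
sep_sql = """
SELECT A, B FROM TABLE1
WHERE (A || '~~~' || B) NOT IN (SELECT A || '~~~' || B FROM TABLE2)
ORDER BY A, B
"""
# Without it, both pairs become 'abc' and ('ab', 'c') is wrongly excluded.
nosep_sql = sep_sql.replace(" || '~~~' || ", " || ")

with_sep = conn.execute(sep_sql).fetchall()
without_sep = conn.execute(nosep_sql).fetchall()
print(with_sep)     # [('ab', 'c'), ('x', '2')]
print(without_sep)  # [('x', '2')]
```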

Possible to test for null records in SQL only?

I am trying to help a co-worker with a peculiar problem, and she's limited to MS SQL QUERY code only. The object is to insert a dummy record (into a surrounding union) IF no records are returned from a query.
I am having a hard time going back and forth from PL/SQL to MS SQL, and I am appealing for help (I'm not particularly appealing, but I am appealing to the StackOverflow audience).
Basically, we need a single, testable value from the target Select ... statement.
In theory, it would do this:
(other records from unions)
Union
Select "These" as fld1, "are" as fld2, "Dummy" as fld3, "Fields" as fld4
where NOT (Matching Logic)
Union
Select fld1, fld2, fld3, fld4 // Regular records exist
From tested_table
Where (Matching Logic)
Forcing an individual dummy record, with no conditions, works.
IS there a way to get a single, testable result from a Select?
Can't do it in code (not allowed), but can feed SQL
Anybody? Anybody? Bueller?
You could put the unions in a WITH clause (a CTE), then add another union that returns a row of NULLs only when the big union is empty:
; with BigUnion as
(
select *
from table1
union all
select *
from table2
)
select *
from BigUnion
union all
select null, null, null, null -- one NULL per column in BigUnion (e.g. four for fld1..fld4)
where not exists (select * from BigUnion)
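A minimal sketch of the same CTE pattern, run through SQLite with two hypothetical two-column tables. It shows that the dummy row appears only while the big union is empty. The NULL count in the final SELECT has to match the column count of BigUnion, otherwise the UNION ALL fails.

```python
import sqlite3

def with_dummy(conn):
    # One NULL per column of BigUnion; that row is emitted only when BigUnion is empty.
    sql = """
    WITH BigUnion AS (
        SELECT fld1, fld2 FROM table1
        UNION ALL
        SELECT fld1, fld2 FROM table2
    )
    SELECT fld1, fld2 FROM BigUnion
    UNION ALL
    SELECT NULL, NULL
    WHERE NOT EXISTS (SELECT * FROM BigUnion)
    """
    return conn.execute(sql).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (fld1 TEXT, fld2 TEXT)")
conn.execute("CREATE TABLE table2 (fld1 TEXT, fld2 TEXT)")
empty_result = with_dummy(conn)
conn.execute("INSERT INTO table1 VALUES ('real', 'row')")
filled_result = with_dummy(conn)
print(empty_result)   # [(None, None)]
print(filled_result)  # [('real', 'row')]
```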

Using the SQL LIKE operator with %%

I had a requirement to create a query in SQL Server where the search condition would include/exclude a table based on user input.
Say I have two tables, TABLE_A and TABLE_B with columns KEYCOLUMN_A and COLUMN_A in TABLE_A and columns FKCOLUMN_B and COLUMN_B in TABLE_B.
And a query like:
SELECT TABLE_A.* FROM TABLE_A, TABLE_B WHERE TABLE_A.KEYCOLUMN_A = TABLE_B.FKCOLUMN_B
AND TABLE_A.COLUMN_A LIKE '%SEARCH%' AND TABLE_B.COLUMN_B LIKE '%SEARCH2%'
Now if the user does not input SEARCH2, I don't need to search TABLE_B. But this would mean an IF ELSE clause. And as the number of "optional" tables in the query increases, the permutations and combinations would also increase and there will be many IF and ELSE statements.
Instead I decided to keep the statement as it is. So if SEARCH2 is empty, the query will effectively become:
SELECT * FROM TABLE_A, TABLE_B WHERE TABLE_A.KEYCOLUMN_A = TABLE_B.FKCOLUMN_B
AND TABLE_A.COLUMN_A LIKE '%SEARCH%' AND TABLE_B.COLUMN_B LIKE '% %'
Can the SQL optimizer recognize that LIKE '%%' is as good as removing the condition itself?
Wrap an OR around your "B" table, such as:
AND (LEN(@searchString) = 0 OR TABLE_B.COLUMN_B LIKE '%' + @searchString + '%')
This way, if no value is supplied, its length is zero, so the first part of the OR is true and the whole parenthesized condition is satisfied regardless of the LIKE in the other half.
You could apply the same pattern to as many linked tables as you need.
First thing, you have a space in your example:
AND TABLE_B.COLUMN_B LIKE '% %'
That will never be optimized as it is indeed a significant condition.
Now, I think that whether it is optimized away depends on the database engine and how smart it is.
For example, SQL Server 2005 does offer the same execution plan for the two types of queries, while MySQL 5.0.38 does not.
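Semantically, LIKE '%%' is not quite the same as having no condition: it still filters out rows where the column is NULL, because NULL LIKE anything is unknown. A small SQLite sketch with made-up rows shows the difference, alongside the much more selective '% %' from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE_B (COLUMN_B TEXT)")
conn.executemany("INSERT INTO TABLE_B VALUES (?)",
                 [("alpha",), ("",), (None,), ("with space",)])

def count(where: str) -> int:
    # Count the rows that survive the given WHERE clause (or none).
    return conn.execute(f"SELECT COUNT(*) FROM TABLE_B {where}").fetchone()[0]

no_filter = count("")
empty_pattern = count("WHERE COLUMN_B LIKE '%%'")
space_pattern = count("WHERE COLUMN_B LIKE '% %'")
print(no_filter, empty_pattern, space_pattern)  # 4 3 1
```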
LIKE is used in a WHERE clause to search for, update, and delete records using wildcards.
Example:
To select all records whose employee name starts with the character 'a':
select * from Employee where Name like 'a%'
To set the name to 'amit' for all employees whose name starts with the character 'a':
update Employee set Name='amit' where Name like 'a%'
To delete all records whose employee name starts with the character 'a':
delete from Employee where Name like 'a%'
The LIKE operator is used in a WHERE clause to search for a specified pattern in a column:
LIKE '%[p-s]' -- matches values ending with p, q, r or s
LIKE '[0-9]' -- matches a value that is exactly one digit
LIKE '%table%' -- matches values containing 'table' anywhere
LIKE '%[^p-r]' -- matches values not ending with p, q or r
Example:
SELECT T1.BrandName,T1.BrandID,T2.CategoryName,T2.Color FROM TABLE1 T1
LEFT JOIN TABLE2 T2 on T1.ID = T2.BrandID
WHERE T1.BrandName LIKE '%Samsung%'
Example:
SELECT T1.BrandName,T1.BrandID,T2.CategoryName,T2.Color FROM TABLE1 T1
LEFT JOIN TABLE2 T2 on T1.ID = T2.BrandID
WHERE T1.BrandName LIKE '%[a-j]'
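SQLite's LIKE does not support the bracket ranges above, but its GLOB operator does, with `*` in place of `%`. GLOB is also case-sensitive, unlike SQL Server's default collation. Under that substitution, the sketch below checks what each of the four patterns matches against a few made-up values. It is an approximation of the T-SQL behaviour, not SQL Server itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (val TEXT)")
conn.executemany("INSERT INTO t VALUES (?)",
                 [("map",), ("gas",), ("cat",), ("7",), ("42",), ("the table",)])

def matches(glob_pattern: str) -> list:
    # Return the values matching a GLOB pattern, in insertion order.
    return [v for (v,) in conn.execute(
        "SELECT val FROM t WHERE val GLOB ? ORDER BY rowid", (glob_pattern,))]

ends_p_to_s = matches("*[p-s]")      # T-SQL: LIKE '%[p-s]'
one_digit = matches("[0-9]")         # T-SQL: LIKE '[0-9]'
has_table = matches("*table*")       # T-SQL: LIKE '%table%'
not_end_p_to_r = matches("*[^p-r]")  # T-SQL: LIKE '%[^p-r]'
print(ends_p_to_s, one_digit, has_table, not_end_p_to_r)
```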
In PostgreSQL you can also use ILIKE, which is case-insensitive.
You can rewrite your query like this:
SELECT TABLE_A.* FROM TABLE_A, TABLE_B WHERE TABLE_A.KEYCOLUMN_A = TABLE_B.FKCOLUMN_B
AND (@paramA = '' OR TABLE_A.COLUMN_A LIKE '%' + @paramA + '%')
AND (@paramB = '' OR TABLE_B.COLUMN_B LIKE '%' + @paramB + '%')
This way, if @paramA or @paramB is '', the condition inside the same parentheses is always true, so that column is effectively not filtered.
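A minimal sketch of the optional-filter pattern using SQLite named parameters (`:paramA`, `:paramB`) in place of T-SQL variables and `||` in place of `+`. The table contents are hypothetical. An empty parameter turns its parenthesized condition into a no-op, while a non-empty one filters as usual.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE_A (KEYCOLUMN_A INTEGER, COLUMN_A TEXT)")
conn.execute("CREATE TABLE TABLE_B (FKCOLUMN_B INTEGER, COLUMN_B TEXT)")
conn.executemany("INSERT INTO TABLE_A VALUES (?, ?)", [(1, "red apple"), (2, "red car")])
conn.executemany("INSERT INTO TABLE_B VALUES (?, ?)", [(1, "fruit"), (2, "vehicle")])

# An empty parameter makes its parenthesized condition true for every row.
SQL = """
SELECT TABLE_A.COLUMN_A
FROM TABLE_A JOIN TABLE_B ON TABLE_A.KEYCOLUMN_A = TABLE_B.FKCOLUMN_B
WHERE (:paramA = '' OR TABLE_A.COLUMN_A LIKE '%' || :paramA || '%')
  AND (:paramB = '' OR TABLE_B.COLUMN_B LIKE '%' || :paramB || '%')
ORDER BY TABLE_A.KEYCOLUMN_A
"""

def search(param_a: str, param_b: str) -> list:
    return [r for (r,) in conn.execute(SQL, {"paramA": param_a, "paramB": param_b})]

print(search("red", ""))       # ['red apple', 'red car']
print(search("red", "fruit"))  # ['red apple']
```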
Use UNION and proper JOINs.
The %foo% search term is bad enough (can't use index) without adding OR and LEN to the mix too.
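SELECT
TABLE_A.*
FROM
TABLE_A
JOIN
TABLE_B On TABLE_A.KEYCOLUMN_A = TABLE_B.FKCOLUMN_B
WHERE
TABLE_A.COLUMN_A LIKE '%' + @search + '%' AND TABLE_B.COLUMN_B LIKE '%' + @search2 + '%'
AND @search2 <> '' -- this branch returns rows only when SEARCH2 was supplied
UNION
SELECT
TABLE_A.*
FROM
TABLE_A
WHERE
TABLE_A.COLUMN_A LIKE '%' + @search + '%'
AND @search2 = '' -- this branch returns rows only when SEARCH2 is empty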
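For the UNION approach to honour SEARCH2, each branch has to be restricted to its own case. Otherwise the TABLE_A-only branch returns every row the joined branch could, and SEARCH2 has no effect. A minimal sketch, assuming hypothetical `:search`/`:search2` parameters and using SQLite as a stand-in, where each branch is guarded by whether `:search2` is empty:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TABLE_A (KEYCOLUMN_A INTEGER, COLUMN_A TEXT)")
conn.execute("CREATE TABLE TABLE_B (FKCOLUMN_B INTEGER, COLUMN_B TEXT)")
conn.executemany("INSERT INTO TABLE_A VALUES (?, ?)", [(1, "red apple"), (2, "red car")])
conn.executemany("INSERT INTO TABLE_B VALUES (?, ?)", [(1, "fruit"), (2, "vehicle")])

# First branch yields rows only when search2 is non-empty;
# second branch yields rows only when it is empty and never touches TABLE_B.
SQL = """
SELECT TABLE_A.COLUMN_A
FROM TABLE_A JOIN TABLE_B ON TABLE_A.KEYCOLUMN_A = TABLE_B.FKCOLUMN_B
WHERE :search2 <> ''
  AND TABLE_A.COLUMN_A LIKE '%' || :search || '%'
  AND TABLE_B.COLUMN_B LIKE '%' || :search2 || '%'
UNION
SELECT TABLE_A.COLUMN_A
FROM TABLE_A
WHERE :search2 = ''
  AND TABLE_A.COLUMN_A LIKE '%' || :search || '%'
ORDER BY 1
"""

def union_search(search: str, search2: str) -> list:
    return [r for (r,) in conn.execute(SQL, {"search": search, "search2": search2})]

print(union_search("red", ""))       # ['red apple', 'red car']
print(union_search("red", "fruit"))  # ['red apple']
```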
