How to match a substring exactly in a string in SQL Server? - sql-server

I have a column workId in my table which has values like:
W1/2009/12345, G2/2018/2345
Now a user wants to get this particular id, G2/2018/2345. I am using the like operator in my query as below:
select * from u_table as s where s.workId like '%2345%'
It is giving me both of the above-mentioned workIds. I tried the following query:
select * from u_table as s where s.workId like '%2345%' and s.workId not like '_2345'
This query is also giving me the same result.
Could anyone please provide me with the correct query? Thanks!

Why not use the existing delimiters to match with your criteria?
select *
from u_table
where concat('/', workId, '/') like concat('%/', '2345', '/%');
Ideally, of course, your 3 separate values would be 3 separate columns; delimiting multiple values in a single column goes against first normal form and prevents the optimizer from performing an efficient index seek, forcing a scan of all rows every time, which hurts performance and concurrency.
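For illustration, a rough sketch of that normalized alternative (the table and column names here are hypothetical):
-- one value per column instead of a delimited string
create table u_table_normalized (
    prefix   varchar(10),  -- e.g. 'G2'
    workYear int,          -- e.g. 2018
    seq      int           -- e.g. 2345
);
create index ix_seq on u_table_normalized (seq);
-- an equality predicate on a single column can use an index seek:
select * from u_table_normalized where seq = 2345;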

Related

Sybase like clause for matching a pattern between the string

I want to build a Sybase ASE query to match lastname, firstname for a person. There are a few different formats for the name: it can be "lastname, firstname" OR it can be "lastname,firstname" (no space between the comma and firstname). I have tried using name like 'lastname[,][ ]firstname' but it does not work. I cannot use lastname,%firstname as % would match any characters, not just an optional space. The valid character between them is either a space or nothing. Any suggestions?
Unfortunately SAP/Sybase ASE does not provide support for regex patterns (e.g., 'zero or more spaces'), so you're left with a few basic options ...
union (all) two queries:
select *
from names_table
where name like 'lastname, firstname'
union all
select *
from names_table
where name like 'lastname,firstname'
NOTE: Both queries should use an index on the name column assuming statistics show an index access plan is the best option.
or two where clauses:
select *
from names_table
where (name like 'lastname, firstname' or name like 'lastname,firstname')
NOTE: Whether or not this uses an index on the name column will depend on the statistics for the index and column and/or the complexity of the actual query.
Strip out spaces and match what's left:
select *
from names_table
where str_replace(name,' ',null) like 'lastname,firstname'
NOTE: In most cases this will disable the use of an index on the name column.
From an indexing perspective ...
If you need to run this type of query often, and the performance of said query is less than acceptable, you could look at a couple of additional indexing options, sketched after this list:
(materialized) computed column + index on said computed column
function-based index (ASE basically creates a 'system' computed column under the covers and then creates the index on said column)
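A rough sketch of both options; this assumes ASE 15.x syntax, so verify the details against your version's documentation:
-- option 1: materialized computed column plus an index on it
alter table names_table
    add name_nospace compute str_replace(name, ' ', null) materialized
create index ix_name_nospace on names_table (name_nospace)
-- queries can then seek on the normalized value:
select * from names_table where name_nospace = 'lastname,firstname'

-- option 2: function-based index; queries must repeat the exact expression to use it
create index ix_name_fn on names_table (str_replace(name, ' ', null))
select * from names_table where str_replace(name, ' ', null) = 'lastname,firstname'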

Query performance

I have table that has the following schema:
ID,firstName,MiddleName,LastName,FML,[some other columns]
The FML column is created by concatenating firstName, a space, MiddleName, a space, and LastName. I want to search for a person when the FML is known. Therefore my query is
SELECT * from tbl where FML LIKE #Param
But I want to optimize this query. I'm thinking of separating the input string into firstName, MiddleName, and LastName strings and making the query like this:
SELECT * FROM tbl where firstName like #FN and MiddleName like #MN and LastName like #ln
Also, will the query
SELECT smth from tbl where Val='test'
be better in terms of performance than
Select smth from tbl where Val like 'test'
Thank you.
If you mean =, then use =. If you mean like, then use like. But once you add wildcards to like, the performance will decrease.
By separating and filtering on separate fields, you lose flexibility, but increase the ability to be more specific in your search. So it's not optimising, per se, as the functionality is different.
Imagine you have two records: Jack Roberts and Robert Jack.
Your first query allows you to find them both with '%Robert%', whereas the second allows you to find them with separate, more specific queries.
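A sketch of that difference, using the two hypothetical records above:
-- the single-column search finds both rows:
SELECT * FROM tbl WHERE FML LIKE '%Robert%'
-- the per-column search can tell them apart:
SELECT * FROM tbl WHERE firstName = 'Robert' AND LastName = 'Jack'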
Yes, the = operator gives the best performance, whereas LIKE '%test%' has to search every Val for rows that contain test anywhere in the value.
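To illustrate, assuming an index exists on Val:
-- equality and prefix matches can seek that index:
SELECT smth FROM tbl WHERE Val = 'test'
SELECT smth FROM tbl WHERE Val LIKE 'test%'
-- a leading wildcard forces a scan of every row:
SELECT smth FROM tbl WHERE Val LIKE '%test%'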

Find columns that match in two tables

I need to query two tables of companies: the first table has the full names of the companies, and the second table also has the names, but incomplete. The idea is to find the fields that are similar. I have put pictures of the reference and the SQL code I'm using.
The result I want is like this
The closest I have found is this:
SELECT DISTINCT
RTRIM(a.NombreEmpresaBD_A) as NombreReal,
b.EmpresaDB_B as NombreIncompleto
FROM EmpresaDB_A a, EmpresaDB_B b
WHERE a.NombreEmpresaBD_A LIKE 'VoIP%' AND b.EmpresaDB_B LIKE 'VoIP%'
The problem with the above code is that it only returns the record specified in the WHERE, and if I use LIKE '%' it returns the Cartesian product of the two tables. The RDBMS is Microsoft SQL Server. I would greatly appreciate any help with a proposed solution.
Use the short name plus appended '%' as argument in the LIKE expression:
Edited with the info that we're dealing with SQL Server:
SELECT a.NombreEmpresaBD_A as NombreReal
,b.NombreEmpresaBD_B as NombreIncompleto
FROM EmpresaDB_A a, EmpresaDB_B b
WHERE a.NombreEmpresaBD_A LIKE (b.NombreEmpresaBD_B + '%');
According to your screenshot you had the column name wrong!
String concatenation in T-SQL uses the + operator.
The above query finds a case where
'Computex S.A' LIKE 'Computex%'
but not:
'Voip Service Mexico' LIKE 'VoipService%'
For that you would have to strip blanks first or use more powerful pattern matching functions.
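For instance, a rough sketch that strips blanks from both sides before comparing, so that 'Voip Service Mexico' can match 'VoipService':
SELECT a.NombreEmpresaBD_A AS NombreReal,
       b.NombreEmpresaBD_B AS NombreIncompleto
FROM EmpresaDB_A a, EmpresaDB_B b
WHERE REPLACE(a.NombreEmpresaBD_A, ' ', '')
      LIKE REPLACE(b.NombreEmpresaBD_B, ' ', '') + '%';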
I have created a demo for you on data.SE.
Look up pattern matching or the LIKE operator in the manual.
I would suggest adding a foreign key between the tables linking the data. Then you can just search for the one table and join the second to get the other results.
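A rough sketch of that, assuming EmpresaDB_A has no key yet (the key names are hypothetical):
-- give the full-names table a key and reference it from the other table
ALTER TABLE EmpresaDB_A ADD EmpresaID int IDENTITY(1,1) PRIMARY KEY;
ALTER TABLE EmpresaDB_B ADD EmpresaID int REFERENCES EmpresaDB_A (EmpresaID);
-- once the rows are linked, the lookup is a plain join instead of a LIKE match:
SELECT a.NombreEmpresaBD_A AS NombreReal,
       b.NombreEmpresaBD_B AS NombreIncompleto
FROM EmpresaDB_A a
JOIN EmpresaDB_B b ON b.EmpresaID = a.EmpresaID;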

SQL Full Text Indexer, exact matches and escaping

I'm trying to replace a Keyword Analyser based Lucene.NET index with an SQL Server 2008 R2 based one.
I have a table that contains custom indexed fields that I need to query upon. The value of the index column (see below) is a combination of name/ value pairs of the custom index fields from a series of .NET types - the actual values are pulled from attributes at run time, because the structure is unknown.
I need to be able to search for set name and value pairs, using ANDs and ORs and return the rows where the query matches.
Id Index
====================================================================
1 [Descriptor.Type]=[5][Descriptor.Url]=[/]
2 [Descriptor.Type]=[23][Descriptor.Url]=[/test]
3 [Descriptor.Type]=[25][Descriptor.Alternative]=[hello]
4 [Descriptor.Type]=[26][Descriptor.Alternative]=[hello][Descriptor.FriendlyName]=[this is a test]
A simple query looks like this:
select * from Indices where contains ([Index], '[Descriptor.Url]=[/]');
That query results in the following error:
Msg 7630, Level 15, State 2, Line 1
Syntax error near '[' in the full-text search condition '[Descriptor.Url]=[/]'.
So with that in mind, I altered the data in the Index column to use | instead of [ and ]:
select * from Indices where contains ([Index], '|Descriptor.Url|=|/|');
Now, while that query is valid, when I run it all rows containing Descriptor.Url and starting with / are returned, instead of just the rows (exactly one in this case) that match exactly.
My question is, how can I escape the query to account for the [ and ] and ensure that just the exact matching row is returned?
A more complex query looks a little like this:
select * from Indices where contains ([Index], '[Descriptor.Type]=[12] AND ([Descriptor.Url]=[/] OR [Descriptor.Url]=[/test])');
Thanks,
Kieron
Your main issue is with the SQL wordbreaker and the CONTAINS syntax. By default, SQL wordbreakers eliminate punctuation and normalize numbers, dates, URLs, email addresses, and the like. They also lowercase everything and stem words.
So, for your input string:
[Descriptor.Type]=[5][Descriptor.Url]=[/]
You would have the following tokens added to the index (along with their positions)
descriptor type nn5 5 descriptor url
(Note: the nn5 is a way to simplify querying numbers and dates given in different formats; the original number is also indexed at the same position.)
So, as you can see, the punctuation is not even stored in the full-text index, and thus there is no way to query it using the CONTAINS statement.
So your statement:
select * from Indices where contains ([Index], '|Descriptor.Url|=|/|');
Would actually be normalized down to "descriptor url" by the query generator before being submitted to the full-text index, hence the hits on all the entries that have "descriptor" next to "url", ignoring punctuation.
What you need is the LIKE statement.
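In T-SQL's LIKE, a literal [ is escaped as [[] (a closing ] needs no escape), so a minimal sketch of that approach against the original bracketed data would be:
SELECT * FROM Indices
WHERE [Index] LIKE '%[[]Descriptor.Url]=[[]/]%';
-- matches only the row containing the literal text [Descriptor.Url]=[/]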
Using "|" as your delimiter causes the contains query to think of OR. Which is why you are getting unexpected results. You should be able to escape the bracket like so:
SELECT * FROM Indices WHERE
contains ([Index], '[[]Descriptor.Type]=[[]12]')

What makes a SQL statement sargable?

By definition (at least from what I've seen), sargable means that a query is capable of having the query engine optimize the execution plan that the query uses. I've tried looking up the answers, but there doesn't seem to be a lot on the subject. So the question is: what does or doesn't make a SQL query sargable? Any documentation would be greatly appreciated.
For reference: Sargable
The most common thing that will make a query non-sargable is to include a field inside a function in the where clause:
SELECT ... FROM ...
WHERE Year(myDate) = 2008
The SQL optimizer can't use an index on myDate, even if one exists. It will literally have to evaluate this function for every row of the table. Much better to use:
WHERE myDate >= '20080101' AND myDate < '20090101'
Some other examples:
Bad: Select ... WHERE isNull(FullName,'Ed Jones') = 'Ed Jones'
Fixed: Select ... WHERE ((FullName = 'Ed Jones') OR (FullName IS NULL))
Bad: Select ... WHERE SUBSTRING(DealerName, 1, 4) = 'Ford'
Fixed: Select ... WHERE DealerName Like 'Ford%'
Bad: Select ... WHERE DateDiff(mm,OrderDate,GetDate()) >= 30
Fixed: Select ... WHERE OrderDate < DateAdd(mm,-30,GetDate())
Don't do this:
WHERE Field LIKE '%blah%'
That causes a table/index scan, because the LIKE value begins with a wildcard character.
Don't do this:
WHERE FUNCTION(Field) = 'BLAH'
That causes a table/index scan.
The database server will have to evaluate FUNCTION() against every row in the table and then compare it to 'BLAH'.
If possible, do it in reverse:
WHERE Field = INVERSE_FUNCTION('BLAH')
This will run INVERSE_FUNCTION() against the parameter once and will still allow use of the index.
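As a concrete instance of that inversion (the Orders table here is hypothetical):
-- non-sargable: the function wraps the column, hiding it from the index
SELECT * FROM Orders WHERE DATEADD(dd, 30, OrderDate) < GETDATE()
-- sargable: the inverse is applied to the constant side instead
SELECT * FROM Orders WHERE OrderDate < DATEADD(dd, -30, GETDATE())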
In this answer I assume the database has sufficient covering indexes; there are enough questions about that topic already.
A lot of the time, the sargability of a query is determined by the tipping point of the related indexes. The tipping point defines the difference between seeking and scanning an index while joining one table or result set onto another. A seek is of course much faster than scanning a whole table, but when you have to seek a lot of rows, a scan can make more sense.
So, among other things, a SQL statement is more sargable when the optimizer expects the number of resulting rows from one table to be below the tipping point of a possible index on the next table.
You can find a detailed post and example here.
For an operation to be considered sargable, it is not sufficient for it merely to be able to use an existing index. In the example above, adding a function call against an indexed column in the where clause would still most likely take some advantage of the defined index: it would "scan", i.e. retrieve, all values from that column (index) and then eliminate the ones that do not match the filter value provided. That is still not efficient enough for tables with a high number of rows.
What really defines sargability is the query's ability to traverse the b-tree index using the binary search method, which relies on half-set elimination over the sorted array of items. In SQL Server, this would be displayed in the execution plan as an "Index Seek".
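A quick way to see the difference for yourself in SQL Server (hypothetical table; the optimizer's choice also depends on statistics and row counts):
CREATE TABLE dbo.Sales (SaleDate date, Amount money);
CREATE INDEX ix_SaleDate ON dbo.Sales (SaleDate);
-- typically shows an Index Seek: the bare column can traverse the b-tree
SELECT Amount FROM dbo.Sales WHERE SaleDate = '20240101';
-- shows a Scan: the function hides SaleDate from the index
SELECT Amount FROM dbo.Sales WHERE YEAR(SaleDate) = 2024;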