SQL Full Text Indexer, exact matches and escaping - sql-server

I'm trying to replace a Keyword Analyser-based Lucene.NET index with one based on SQL Server 2008 R2.
I have a table that contains custom indexed fields that I need to query on. The value of the Index column (see below) is a combination of name/value pairs of the custom index fields from a series of .NET types; the actual values are pulled from attributes at run time, because the structure is not known in advance.
I need to be able to search for specific name/value pairs, combining them with ANDs and ORs, and return the rows where the query matches.
Id Index
====================================================================
1 [Descriptor.Type]=[5][Descriptor.Url]=[/]
2 [Descriptor.Type]=[23][Descriptor.Url]=[/test]
3 [Descriptor.Type]=[25][Descriptor.Alternative]=[hello]
4 [Descriptor.Type]=[26][Descriptor.Alternative]=[hello][Descriptor.FriendlyName]=[this is a test]
A simple query looks like this:
select * from Indices where contains ([Index], '[Descriptor.Url]=[/]');
That query results in the following error:
Msg 7630, Level 15, State 2, Line 1
Syntax error near '[' in the full-text search condition '[Descriptor.Url]=[/]'.
So with that in mind, I altered the data in the Index column to use | instead of [ and ]:
select * from Indices where contains ([Index], '|Descriptor.Url|=|/|');
Now, while that query is valid, when I run it, every row containing Descriptor.Url with a value starting with / is returned, instead of only the records (exactly one in this case) that match exactly.
My question is, how can I escape the query to account for the [ and ] and ensure that just the exact matching row is returned?
A more complex query looks a little like this:
select * from Indices where contains ([Index], '[Descriptor.Type]=[12] AND ([Descriptor.Url]=[/] OR [Descriptor.Url]=[/test])');
Thanks,
Kieron

Your main issue is using a SQL Server word breaker together with the CONTAINS syntax. By default, SQL Server word breakers eliminate punctuation and normalize numbers, dates, URLs, email addresses, and the like. They also lowercase everything and stem words.
So, for your input string:
[Descriptor.Type]=[5][Descriptor.Url]=[/]
You would have the following tokens added to the index (along with their positions)
descriptor type nn5 5 descriptor url
(Note: the nn5 token is a way to simplify querying numbers and dates given in different formats; the original number is also indexed at the same position.)
So, as you can see, the punctuation is not even stored in the full-text index, and thus there is no way to query it using the CONTAINS predicate.
So your statement:
select * from Indices where contains ([Index], '|Descriptor.Url|=|/|');
Would actually be normalized down to "descriptor url" by the query generator before being submitted to the full-text index. Hence the hits on all the entries that have "descriptor" next to "url" once punctuation is removed.
What you need is the LIKE operator.
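To make the LIKE suggestion concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for SQL Server.

SQLite differs from T-SQL in one important way. SQLite's LIKE treats '[' literally. T-SQL LIKE uses '[' to open a character class, so there it must be escaped as '[[]', the same escape shown in the other answer.

The helpers tsql_pair_pattern, sqlite_pair_pattern and ids_where are hypothetical names. The sketch assumes the bracketed name/value format from the question.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [
    (1, "[Descriptor.Type]=[5][Descriptor.Url]=[/]"),
    (2, "[Descriptor.Type]=[23][Descriptor.Url]=[/test]"),
    (3, "[Descriptor.Type]=[25][Descriptor.Alternative]=[hello]"),
    (4, "[Descriptor.Type]=[26][Descriptor.Alternative]=[hello]"
        "[Descriptor.FriendlyName]=[this is a test]"),
]


def tsql_pair_pattern(name: str, value: str) -> str:
    """Hypothetical helper: T-SQL LIKE pattern for the exact pair [name]=[value].

    '[', '%' and '_' are special in T-SQL LIKE, so each one is wrapped in
    brackets ('[' becomes '[[]'); ']' is literal outside a character class.
    """
    literal = f"[{name}]=[{value}]"
    return "%" + "".join(f"[{c}]" if c in "[%_" else c for c in literal) + "%"


def sqlite_pair_pattern(name: str, value: str) -> str:
    """Hypothetical helper: the same pattern for SQLite, whose LIKE treats '['
    literally; '%' and '_' are escaped with '!' (paired with ESCAPE '!')."""
    literal = f"[{name}]=[{value}]"
    for ch in "!%_":  # escape the escape character itself first
        literal = literal.replace(ch, "!" + ch)
    return "%" + literal + "%"


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE Indices (Id INTEGER PRIMARY KEY, "Index" TEXT)')
conn.executemany("INSERT INTO Indices VALUES (?, ?)", ROWS)

LIKE_PAIR = "\"Index\" LIKE ? ESCAPE '!'"


def ids_where(condition: str, *patterns: str) -> list:
    sql = f"SELECT Id FROM Indices WHERE {condition} ORDER BY Id"
    return [row[0] for row in conn.execute(sql, patterns)]


# [Descriptor.Url]=[/] matches row 1 only; row 2's [/test] is not a hit.
simple_ids = ids_where(LIKE_PAIR, sqlite_pair_pattern("Descriptor.Url", "/"))

# Type 23 AND (Url / OR Url /test), combined with ordinary AND/OR.
combined_ids = ids_where(
    f"{LIKE_PAIR} AND ({LIKE_PAIR} OR {LIKE_PAIR})",
    sqlite_pair_pattern("Descriptor.Type", "23"),
    sqlite_pair_pattern("Descriptor.Url", "/"),
    sqlite_pair_pattern("Descriptor.Url", "/test"),
)

print(simple_ids)    # [1]
print(combined_ids)  # [2]
print(tsql_pair_pattern("Descriptor.Url", "/"))  # %[[]Descriptor.Url]=[[]/]%
```

Each pair keeps its closing bracket in the pattern. That is what stops [5] from matching [25] and [/] from matching [/test].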

Using "|" as your delimiter causes the CONTAINS query to treat it as OR, which is why you are getting unexpected results. You should be able to escape the bracket like so:
SELECT * FROM Indices WHERE
contains ([Index], '[[]Descriptor.Type]=[[]12]')

Related

How to match a substring exactly in a string in SQL server?

I have a column workId in my table which has values like:
W1/2009/12345, G2/2018/2345
Now a user wants to get this particular ID: G2/2018/2345. I am using the LIKE operator in my query as below:
select * from u_table as s where s.workId like '%2345%'
It returns both of the work IDs mentioned above. I tried the following query:
select * from u_table as s where s.workId like '%2345%' and s.workId not like '_2345'
This query also gives me the same result.
Could anyone please provide the correct query? Thanks!
Why not use the existing delimiters to match with your criteria?
select *
from u_table
where concat('/', workId, '/') like concat('%/', '2345', '/%');
Ideally, of course, your three separate values would be three separate columns. Delimiting multiple values in a single column goes against first normal form. Searching inside such a column also needs a leading-wildcard LIKE, which prevents the optimizer from performing an efficient index seek. That forces a scan of all rows every time, hurting performance and concurrency.
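A quick way to check the delimiter trick is to run it in SQLite through Python's sqlite3 module. This is only a stand-in for SQL Server, so SQLite's || operator replaces CONCAT.

find_segment is a hypothetical helper. The sketch assumes the search term itself contains no % or _ wildcards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE u_table (workId TEXT)")
conn.executemany("INSERT INTO u_table VALUES (?)",
                 [("W1/2009/12345",), ("G2/2018/2345",)])


def find_segment(segment: str) -> list:
    # Wrap the stored value in '/' on both sides so that every segment,
    # including the first and last, is delimited; then require the search
    # term to appear between two '/' characters.
    sql = ("SELECT workId FROM u_table "
           "WHERE '/' || workId || '/' LIKE '%/' || ? || '/%' "
           "ORDER BY workId")
    return [row[0] for row in conn.execute(sql, (segment,))]


print(find_segment("2345"))   # ['G2/2018/2345']
print(find_segment("12345"))  # ['W1/2009/12345']
```

'/W1/2009/12345/' does not contain '/2345/', because the character before 2345 is '1', not '/'.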

Sybase like clause for matching a pattern between the string

I want to build a Sybase ASE query to match "lastname, firstname" for a person. There are a few different formats for the name: it can be "lastname, firstname" OR "lastname,firstname" (no space between the comma and firstname). I have tried using name like 'lastname[,][ ]firstname' but it does not work. I cannot use lastname,%firstname because % would match any characters before firstname. The only valid character there is a single space or nothing. Any suggestions?
Unfortunately SAP/Sybase ASE does not provide support for regex patterns (e.g., 'zero or more spaces'), so you're left with a few basic options:
union (all) two queries:
select *
from names_table
where name like 'lastname, firstname'
union all
select *
from names_table
where name like 'lastname,firstname'
NOTE: Both queries should use an index on the name column assuming statistics show an index access plan is the best option.
Or use two conditions in a single WHERE clause:
select *
from names_table
where (name like 'lastname, firstname' or name like 'lastname,firstname')
NOTE: Whether or not this uses an index on the name column will depend on the statistics for the index and column and/or the complexity of the actual query.
Strip out spaces and match what's left:
select *
from names_table
where str_replace(name,' ',null) like 'lastname,firstname'
NOTE: In most cases this will disable the use of an index on the name column.
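The second and third options can be compared in SQLite through Python's sqlite3 module, used here only as a stand-in for ASE. SQLite's replace(name, ' ', '') plays the role of str_replace(name, ' ', null). The sample names are made up.

The sketch shows a behavioral difference worth knowing:
- Stripping spaces also accepts two or more spaces, or spaces elsewhere in the name.
- The two-pattern version accepts exactly zero or one space after the comma, which is what the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE names_table (name TEXT)")
conn.executemany("INSERT INTO names_table VALUES (?)", [
    ("Smith, John",),   # comma followed by one space
    ("Smith,John",),    # comma, no space
    ("Smith,  John",),  # comma followed by two spaces
    ("Smith,XJohn",),   # some other character after the comma
])


def two_patterns(last: str, first: str) -> set:
    # Two LIKE conditions OR-ed together in one WHERE clause.
    sql = "SELECT name FROM names_table WHERE name LIKE ? OR name LIKE ?"
    params = (f"{last}, {first}", f"{last},{first}")
    return {row[0] for row in conn.execute(sql, params)}


def strip_spaces(last: str, first: str) -> set:
    # Remove every space from the column, then compare with one pattern.
    sql = "SELECT name FROM names_table WHERE replace(name, ' ', '') LIKE ?"
    return {row[0] for row in conn.execute(sql, (f"{last},{first}",))}


print(sorted(two_patterns("Smith", "John")))  # ['Smith, John', 'Smith,John']
print(sorted(strip_spaces("Smith", "John")))  # also includes 'Smith,  John'
```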
From an indexing perspective ...
If you need to run this type of query often, and its performance is unacceptable, you could look at a couple of additional indexing options:
(materialized) computed column + index on said computed column
function-based index (ASE basically creates a 'system' computed column under the covers and then creates the index on said column)

TSQL + MDX using OPENQUERY() - column names do not exist

I have a view over an OPENQUERY() call that gets data from an SSAS cube.
The MDX query looks like this:
WITH MEMBER [Measures].[Measure1] AS
(--calculation)
SELECT
{[Measures].[Measure1]}
ON 0,
NON EMPTY ([Dim1].[Dim_key].[Dim_key], [Dim2].[Dim_key].[Dim_key])
ON 1
FROM [Cube]
WHERE ([Dim3].[Hierarchy].[Level].[Member])
My problem is that when the WHERE filter results in 0 rows, the view fails with this error:
Invalid column name '[Dim1].[Dim_key].[Dim_key].[MEMBER_CAPTION]'.
This happens because the view uses that column name in a GROUP BY.
How can I force it to return at least one row? Or always return the column names?
I cannot remove the NON EMPTY, since the whole set takes about a minute to load.
So far I've tried these solutions:
MDX - Always return at least one row even if no data is available
Force mdx query to return column names
but they don't seem to work, since I have a WHERE condition on another dimension.
I managed to figure it out. I added this measure, which essentially replicates the Dim1 members returned on the rows:
WITH MEMBER [Measures].[Measure1] AS
(--calculation)
MEMBER [Measures].[Dim_Key] AS
[Dim1].[Dim_key].CurrentMember.Member_Key
SELECT
{[Measures].[Measure1]
,[Measures].[Dim_Key]}
ON 0,
NON EMPTY ([Dim1].[Dim_key].[Dim_key], [Dim2].[Dim_key].[Dim_key])
ON 1
FROM [Cube]
WHERE ([Dim3].[Hierarchy].[Level].[Member])
So even if there are no rows, I still get the new measure back as one of the columns. If there are rows, I get two additional columns (which have row tags), but I just don't use them in the view.

How to compose LIKE in T-SQL to show all rows except those containing ":","-","~"?

I have an engine built on SQL Server in which a field holds a filter clause. I need to compose that clause so that it shows all rows except those that contain :, -, or ~.
My query is:
SELECT 1
WHERE '' LIKE '%[^:-~]%'
It does not work: it returns zero rows. I also tried this:
SELECT 1
WHERE 'aa:a' LIKE '%[^:-~]%'
It returns 1, which is not the desired result.
Is there a way to manage this?
NOTE: the expression after LIKE must be a string that will be saved in the table field (for example, '%[^:-~]%' will be used as LIKE x.fldFilter).
EDIT: I need to perform validation inside my engine in SQL Server. I have a table of parameters with a Format column. For a specific parameter, I need to check whether the provided value fits the Format column.
For example:
DECLARE @value AS VARCHAR(1000) = 'aaa:aa';
SELECT 1 FROM dbo.ParameterDefinitions X WHERE @value LIKE X.[Format];
where the X.[Format] column contains '%[^:-~]%'.
When I test a value, the check must return 1 if the value fits the conditions and nothing if it does not.
So if I test the value 'aaa:aa' or even ' ', it works. But with an empty string (''), the condition does not work.
Unfortunately, I cannot change my engine, and I cannot replace '' with a space, for example. I just wonder: why does '' not fit the condition?
This is because SQL Server's LIKE is not a full regular-expression engine.
Instead of negating the character class with ^, negate the whole predicate with NOT:
SELECT 1
WHERE '' NOT LIKE '%[-:~]%'
Returns 1 row
SELECT 1
WHERE 'aa:a' NOT LIKE '%[-:~]%'
Returns 0 rows
(The hyphen goes first inside the brackets so that it is matched literally; written as [:-~], it denotes a range from ':' to '~' rather than the three individual characters.)
EDIT:
Breaking down your search cases:
'' LIKE '%[^:-~]%'
[^:-~] must match exactly one character, so an empty string can never match.
'aa:a' LIKE '%[^:-~]%'
% matches zero or more characters, so [^:-~] is free to match one of the 'a' characters while one of the % wildcards absorbs the forbidden ':'.
With a full regex engine you could repeat the negated class, as in [^:-~]*, but SQL Server's LIKE doesn't support that (see the LIKE documentation).
The only option left is to search for the forbidden characters with '%[-:~]%' and negate the LIKE.
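The hyphen placement and the empty-string behavior can be checked with SQLite's GLOB operator, run through Python's sqlite3 module. GLOB is used here only because it supports the same [...] and [^...] classes as T-SQL LIKE, with * in place of %.

One caveat: SQLite compares ranges by code point, while SQL Server evaluates a range such as [:-~] using the collation. Which characters fall inside the range can therefore differ. A leading hyphen is literal in both; SQL Server's LIKE documentation lists [-acdf] as matching -, a, c, d or f.

The regex part uses Python's re module to show the repeated negated class that LIKE cannot express. glob, allowed and allowed_regex are hypothetical helper names.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")


def glob(value: str, pattern: str) -> bool:
    return bool(conn.execute("SELECT ? GLOB ?", (value, pattern)).fetchone()[0])


# Inside brackets, ':-~' is a range from ':' to '~', not three characters,
# so the hyphen in '1-2' is not caught (SQLite ranges use code points).
print(glob("1-2", "*[:-~]*"))   # False
# A leading hyphen is literal: [-:~] is exactly '-', ':' and '~'.
print(glob("1-2", "*[-:~]*"))   # True
# A bracket class always consumes one character, so '' can never match.
print(glob("", "*[^-:~]*"))     # False


def allowed(value: str) -> bool:
    # The answer's approach: look for a forbidden character and negate.
    return not glob(value, "*[-:~]*")


# With a real regex engine the negated class can be repeated; zero
# repetitions match the empty string.
NO_FORBIDDEN = re.compile(r"[^-:~]*")


def allowed_regex(value: str) -> bool:
    return NO_FORBIDDEN.fullmatch(value) is not None


for v in ["", "abc", "aa:a", "a-b", "a~b"]:
    print(repr(v), allowed(v), allowed_regex(v))
```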

SQL Server - Percent based Full Text Search

I want to search a particular column of a table in such a way that the returned result set satisfies the following two conditions:
The result set should contain records in which 90% of the characters match the given search text.
The result set should contain records in which a consecutive run of characters covering 70% of the search text matches.
This means that when the 10-character word Sukhminder is searched:
it should return records like Sukhmindes, ukhminder, and Sukhmindzr, because they fulfil both of the conditions above.
But it should not return records like Sukhmixder, because it does not fulfil the second condition. Likewise, it should not return the record Sukhminzzz, because it does not fulfil the first condition.
I am trying to use the Full Text Search feature of SQL Server, but I have not been able to formulate the required query yet. Kindly reply ASAP.
You could try using a combination of the SOUNDEX and DIFFERENCE functions with full-text searching.
Check out this book on Google Books, which discusses it.
Do you mean 70% of the original word? I think the only way you could do this exactly as stated would be to work out all the possible substrings that could satisfy the 70% criterion and bring back records matching any of them:
Col LIKE '%min%' AND (
Col LIKE '%Sukhmin%' OR Col LIKE '%ukhmind%'
OR Col LIKE '%khminde%' OR Col LIKE '%hminder%' )
then do further processing to see whether the 90% criterion is met.
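The substring list in the answer can be generated rather than typed by hand, and the "further processing" step can be done in application code. A minimal Python sketch follows.

The question does not define how the 90% figure is measured. This sketch assumes it means characters shared with the search text (counting repeats) divided by the search text's length. It also compares case-insensitively, which is another assumption.

window_patterns, longest_common_run, shared_chars and fuzzy_match are hypothetical names.

```python
from collections import Counter


def window_patterns(term: str, pct: int = 70) -> list:
    """LIKE patterns for every consecutive run covering pct% of term."""
    k = -(-len(term) * pct // 100)  # integer ceiling of len(term) * pct / 100
    return [f"%{term[i:i + k]}%" for i in range(len(term) - k + 1)]


def longest_common_run(a: str, b: str) -> int:
    """Length of the longest common substring (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def shared_chars(a: str, b: str) -> int:
    """Characters the two strings have in common, counting repeats."""
    return sum((Counter(a) & Counter(b)).values())


def fuzzy_match(term: str, candidate: str, overall: int = 90, run: int = 70) -> bool:
    # Assumed reading of the two criteria, compared case-insensitively.
    t, c = term.lower(), candidate.lower()
    return (shared_chars(t, c) * 100 >= overall * len(t)
            and longest_common_run(t, c) * 100 >= run * len(t))


print(window_patterns("Sukhminder"))
# ['%Sukhmin%', '%ukhmind%', '%khminde%', '%hminder%']
for name in ["Sukhmindes", "ukhminder", "Sukhmindzr", "Sukhmixder", "Sukhminzzz"]:
    print(name, fuzzy_match("Sukhminder", name))
```

Under these assumptions, the first three names pass. Sukhmixder fails the consecutive-run test (longest run 6 of 10), and Sukhminzzz fails the overall test (7 of 10 characters shared). This agrees with the examples in the question.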
Edit: You might also find this article on fuzzy searching interesting: http://anastasiosyal.com/archive/2009/01/11/18.aspx

Resources