How can we write something in SQL that performs matching similar to the SSIS Fuzzy Matching component?
What options are available using SQL Server features and SQL syntax?
Thanks,
You can use the full-text indexing feature of SQL Server, together with its associated predicates and functions such as CONTAINS, FREETEXT, and CONTAINSTABLE (which returns a RANK column).
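As a rough sketch of the full-text approach, CONTAINSTABLE returns a KEY and a RANK column that can be joined back to the base table. The table dbo.Customers, its Id key column, and its Name column are hypothetical, and the sketch assumes a full-text index already exists on Name with Id as the full-text key:

```sql
-- Hypothetical table: dbo.Customers(Id int PRIMARY KEY, Name nvarchar(100)),
-- assumed to have a full-text index on Name keyed on Id.
SELECT c.Id,
       c.Name,
       ft.[RANK] AS MatchRank      -- relevance score; higher means a better match
FROM dbo.Customers AS c
INNER JOIN CONTAINSTABLE(dbo.Customers, Name, N'FORMSOF(INFLECTIONAL, "smith")') AS ft
        ON ft.[KEY] = c.Id          -- KEY holds the full-text key value of the matching row
ORDER BY ft.[RANK] DESC;
```

Full-text search matches on words and word forms rather than on character-level similarity, so treat it as a complement to fuzzy matching, not a drop-in replacement for the SSIS component.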
The easiest way to do fuzzy matching in T-SQL would be using SOUNDEX() and DIFFERENCE().
For example:
select
soundex('SQL') as 'four-character (SOUNDEX) code' -- Returns S240
, soundex('Sequel') as 'four-character (SOUNDEX) code' -- Returns S240
, difference('SQL', 'Sequel') as '0: weak or no similarity. 4: strong similarity or the same values.' -- Returns 4
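To apply the same idea to rows rather than literals, you can filter on the DIFFERENCE score. This is only a sketch: the table variable, its sample names, and the threshold of 3 are assumptions chosen for illustration.

```sql
-- Hypothetical sample data held in a table variable.
DECLARE @Customers TABLE (Name varchar(50));
INSERT INTO @Customers (Name)
VALUES ('Smith'), ('Smyth'), ('Smithe'), ('Jones');

-- Keep names whose SOUNDEX code is close to that of the search term.
-- DIFFERENCE returns 0 (no similarity) through 4 (same SOUNDEX code).
SELECT Name,
       SOUNDEX(Name)             AS SoundexCode,
       DIFFERENCE(Name, 'Smith') AS Score
FROM @Customers
WHERE DIFFERENCE(Name, 'Smith') >= 3
ORDER BY Score DESC, Name;
```

SOUNDEX is tuned for English surnames and only encodes the first few consonant sounds, so expect false positives on longer strings.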
Using the Camel SQL component seems like a good fit in a project that already uses Camel, but I don't see the point of it when dynamic SQL is needed. Use case:
On the front end, the user can:
select only a record type and submit the search, in which case the where clause is: "from table1 where col1 = valueX1"
also select a date range for the offer start date, so the where clause becomes: "from table1 where col1 = valueX1 and dateCol between (...)"
and so on for the other UI fields that have values: 10 different columns in total, in different combinations
I tried to use dynamic SQL and came up with three choices:
1. using a recipient list so the route is selected at run time, which seemed like overkill.
2. using the message body as the SQL, with useMessageBodyForSql=true
3. using a custom prepareStatementStrategy
For 2 and 3 I was not able to pass parameter names, or to specify headers or properties to be used as values in the prepared statement.
For option 2, I had to supply the SQL like this:
select c1, c2 ... from t1 where x = ? and y = ?
and then a java util list with the values in order.
So, is there any advantage to using this? Is there any feature of the SQL component that makes it better than directly using the Spring JdbcTemplate that it uses internally?
I would suggest using Camel templating to make the statements dynamic, like this:
to("freemarker://sql/template.ftl")
.log("${body}")
.to("sql:ignored?useMessageBodyForSql=true");
Note that when the statement comes from the message body, named parameters are written with a ? instead of a # symbol (:?name rather than :#name):
-- sql/template.ftl
select count(*) as count
from a_table
<#if headers.namePattern?has_content>
where name like :?namePattern
</#if>
You might also switch to the MyBatis component, which supports advanced templating via MyBatis, but this comes with much higher overhead in terms of coding and configuration.
I have this query:
select * from catnames where contains(name, 'NEAR((david, smith), MAX, TRUE)')
Which matches David Smith where the terms appear in order.
I'm wondering if it's possible to check david OR dave smith in the same query. The documentation for CONTAINS and NEAR is a little confusing. I've played around with a few attempts, mostly trying to add 'OR', but no dice.
Is it possible?
(Edit: Obviously, I mean within a single CONTAINS rather than chaining both CONTAINS)
Use an OR operator in your full text search.
select * from catnames where contains(name, 'NEAR((david, smith), MAX, TRUE) OR NEAR((dave, smith), MAX, TRUE)')
You could also use the full-text-search thesaurus to indicate that "David" and "Dave" are synonyms, and therefore would not need the OR clause. The thesaurus allows you to declare replacement terms and expansion terms. For your case, you would use an expansion term. The entry in the XML thesaurus file (there is one per language) would look like this:
<expansion>
<sub>David</sub>
<sub>Dave</sub>
</expansion>
See the SQL Server 2012 documentation on thesaurus files for more information.
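A sketch of how such an expansion set might then be used from T-SQL. It assumes the entry was added to the English thesaurus file (LCID 1033 is an assumption; use the language of your full-text index). Thesaurus expansion is requested with FORMSOF(THESAURUS, ...); as far as I know it cannot be nested inside NEAR, so this version checks that both words are present but not their order:

```sql
-- Reload the thesaurus file after editing it (1033 = English, assumed here).
EXEC sys.sp_fulltext_load_thesaurus_file 1033;

-- Matches rows containing "smith" together with "david" or any term
-- in the same expansion set (for example "dave").
SELECT *
FROM catnames
WHERE CONTAINS(name, 'FORMSOF(THESAURUS, david) AND smith');
```

If word order matters, the OR of two NEAR terms shown earlier is the safer choice.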
I'm a novice at regexes and am currently trying to come up with a simple regex that searches for a serial number in the following format: 0217103200XX, where each "X" can be any numeric digit. I'm using SQL Server Management Studio to pass the regex as a parameter to a stored procedure. I'm not sure if the syntax is any different from other programming languages. I have the following regex as a reference:
(?:2328\d\d(?:0[1-9]|[1-4]\d|5[0-3])\d{4})
Any suggestions are appreciated.
UPDATE:
I'm actually using this in a SQL Query and not in a .Net application. The format is as follows:
USE [MyDB]
EXEC MyStoredProcedure @regex = '(?:2328\d\d(?:0[1-9]|[1-4]\d|5[0-3])\d{4})'
Use LIKE: there is no native RegEx in SQL Server
LIKE '0217103200[0-9][0-9]'
As OMG Ponies stated - SQL Server does not natively support regex (need to use SQLCLR for 2005+, or xp_cre).
If I have understood your question, you could use PATINDEX to find the serial numbers:
Select *
From dbo.MyTable
Where PATINDEX('0217103200[0-9][0-9]', SerialNumberColumn) > 0
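To see how the LIKE and PATINDEX patterns above behave, here is a small self-contained sketch. The table variable and the sample values are made up for illustration.

```sql
-- Hypothetical sample data.
DECLARE @Serials TABLE (SerialNumber varchar(20));
INSERT INTO @Serials (SerialNumber)
VALUES ('021710320001'), ('021710320099'), ('0217103200AB'), ('0217103200123');

-- Exact-format match: the whole value must be the prefix followed by exactly two digits.
-- Returns 021710320001 and 021710320099 only.
SELECT SerialNumber
FROM @Serials
WHERE SerialNumber LIKE '0217103200[0-9][0-9]';

-- Serial number embedded in longer text: wrap the pattern in % wildcards.
-- PATINDEX returns the 1-based start position, or 0 when there is no match.
SELECT PATINDEX('%0217103200[0-9][0-9]%', 'Unit 021710320042 returned') AS StartPosition;
```

Without the % wildcards, both LIKE and PATINDEX require the pattern to match the entire value, which is why the 13-digit value is rejected by the first query.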
I'm writing a fairly complex stored procedure to search an image library.
I was going to use a view and write dynamic sql to query the view, but I need to use a full text index, and my view needs outer joins (MS-SQL 2005 full-text index on a view with outer joins)
So, I'm back to a stored procedure.
I need to search on (all optional):
a general search query that uses the full text index (or no search terms)
one or more categories (or none)
a single tag (or none)
Is there a way to do a conditional FREETEXT in the 'WHERE' clause? The query may be empty, in which case I want to ignore this, or just return all FTI matches.
...AND FREETEXT(dbo.MediaLibraryCultures.*, '"* "') doesn't seem to work. Not sure how a case statement would work here.
Am I better off inserting the category/tag filter results into a temp table/table variable, then joining the FTI search results? That way I can only do the join if the search term is supplied.
Thoughts?
I know it's a year later and a newer version of SQL but FYI...
I am using SQL Server 2008 and have tried to short circuit using
AND ( @searchText = '' OR freetext(Name, @searchText))
and I receive the message "Null or empty full-text predicate" when setting @searchText = ''. I guess something in 2008 has changed that keeps short circuiting from working in this case.
You could add a check for the empty search string like
where ...
AND (FREETEXT(dbo.MediaLibraryCultures.*, @FreeTextSearchFor) OR @FreeTextSearchFor = '')
(I have a feeling that freetext searches can't have null passed into them, so I'm comparing to an empty string)
If the search term is empty, the whole clause evaluates to true, so this clause places no restriction on the rows returned. And since it's a constant being compared to a variable, I would expect the optimizer to step in and not perform that comparison for each row.
Hmm, I thought there was no short-circuiting in sql server?
AND (@q = '' OR FREETEXT(dbo.MediaLibraryCultures.*, @q))
seems to work just fine!
Strangely, the full text scan is still part of the execution plan.
Doesn't work on SQL Server 2014. I tried the suggested short circuit in one of my stored procedures, but it keeps evaluating the FREETEXT expression. The only solution I have found is the following:
IF ISNULL(@Text, N'') = N'' SET @Text = N'""'
SELECT ...
WHERE ...
AND (@Text = N'""' OR FREETEXT([Data], @Text))
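Since the answers above disagree on whether the OR short-circuits, one way to avoid depending on evaluation order is to branch before the query, so the full-text predicate is never reached with an empty string. This is a sketch only: @q is a hypothetical parameter, the table is the one from the question, and a full-text index on it is assumed.

```sql
DECLARE @q nvarchar(200) = N'';   -- hypothetical search parameter (inline initializer needs SQL Server 2008+)

IF NULLIF(LTRIM(RTRIM(@q)), N'') IS NULL
BEGIN
    -- No search term (NULL, empty, or blanks only): skip the full-text predicate entirely.
    SELECT *
    FROM dbo.MediaLibraryCultures;
END
ELSE
BEGIN
    -- A search term was supplied: FREETEXT never receives an empty string.
    SELECT *
    FROM dbo.MediaLibraryCultures
    WHERE FREETEXT(*, @q);
END
```

The cost is a duplicated SELECT, but each branch gets its own plan, and the plan for the empty case contains no full-text scan.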
In Management Studio, you can right click on the tables group to create a filter for the table list. Has anyone figured out a way to include multiple tables in the filter? For example, I'd like all tables with "br_*" and "tbl_*" to show up.
Anyone know how to do this?
No, you can't do this. When we first got Management Studio I tried every possible combination of everything you could think of: _, %, *, ", ', &&, &, and, or, |, ||, etc...
You might be able to roll your own add-in for SSMS that would allow you to do what you are looking for:
The Black Art of Writing a SQL Server Management Studio 2005 Add-In
Extend Functionality in SQL Server 2005 Management Studio with Add-ins
The first one is specifically for searching and displaying all schema objects with a given name so you might be able to expand upon that for what you are looking for.
I'm using SQL Server Management Studio v17.1 and it has a SQL injection bug in its filter construction, so you can actually escape the default
tbl.name like '%xxx%'
and write your own query (with some limitations). For example, to filter tables ending with "_arch", "_hist", "_purge" I used the following filter value:
_arch') and RIGHT(tbl.name, 5) != N'purge' and RIGHT(tbl.name, 4) != N'hist' and not(tbl.name like N'bbb
You can use SQL Server Profiler to see the constructed query and adjust it as needed.
I'm not sure whether the same bug exists in previous versions of SQL Server Management Studio or when it will be fixed, but for now I'm happy with the result.
I've used Toad for SQL Server (freeware version) which has very nice filtering options.
At first it looks like it could use a CONTAINS query (e.g. "br_*" OR "tbl_*"), but it doesn't seem to. It seems to only support a value that is then passed into a LIKE clause (e.g. 'app' becomes '%app%').
The "sql injection" method still works (v17.5), but with a twist:
zzzz' or charindex('pattern1', name) > 0 or charindex('pattern2', name) > 0 or name like 'zzzz
(I used the 'zzzz' to bypass the '%')
It doesn't work if '_' or '%' is used in the patterns (or anywhere in your code), because it will automatically be replaced by '[_]' or '[%]' before evaluation.
As others have said, you cannot do this in SQL Server Management Studio (up to and including 2014).
The following query will give you a filtered list of tables, if this is all you need:
SELECT
CONCAT(TABLE_SCHEMA, '.', TABLE_NAME) AS TABLE_SCHEMA_AND_NAME,
TABLE_SCHEMA,
TABLE_NAME
FROM
INFORMATION_SCHEMA.TABLES
WHERE
TABLE_SCHEMA IN ('X', 'Y', 'Z') -- schemas go here
ORDER BY
TABLE_SCHEMA,
TABLE_NAME;
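The same view can also be filtered by name pattern, which is closer to what the question asks for ("br_*" and "tbl_*"). A minimal sketch, assuming those two prefixes; the underscore is itself a LIKE wildcard, so it is wrapped in brackets to match it literally:

```sql
-- List base tables whose names start with br_ or tbl_.
SELECT
    TABLE_SCHEMA,
    TABLE_NAME
FROM
    INFORMATION_SCHEMA.TABLES
WHERE
    TABLE_TYPE = 'BASE TABLE'              -- exclude views
    AND (TABLE_NAME LIKE 'br[_]%'          -- [_] matches a literal underscore
         OR TABLE_NAME LIKE 'tbl[_]%')
ORDER BY
    TABLE_SCHEMA,
    TABLE_NAME;
```

This runs in a query window rather than in the Object Explorer filter, so it lists the tables but does not change what the tree displays.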
The SQL injection method still works (somewhat) as of SSMS 2017 v17.8.1, although it puts brackets around the % symbol, so it will interpret those literally.
If you're using the Name->Contains filter, Profiler shows:
... AND dtb.name LIKE N'%MyDatabase1%')
So, in the Name->Contains field: MyDatabase1') OR (dtb.name LIKE 'MyDatabase2 should do it for simple cases.
This is old, I know, but it's good to know that it works if you enter just the filter text. Skip * or % or any other standard search characters; just enter br_ or tbl_ or whatever you want to filter on.
You're in luck, I just managed this, although it is only a small win: you can filter by schema, which lets you see more than one table, but you have to retype the filter text each time you want to change it.