SQL query to search for patterns against a specific string - sql-server

I am currently dealing with a table containing a list of hundreds of part number patterns for discount purposes.
For example:
1) [FR]08-[01237]0[67]4E-%
2) _10-[01]064[CD]-____
3) F12-[0123]0[67]4C-%
I have a search string, F10-1064C-02TY, and I am trying to find out which pattern(s) match that particular string. For this example my query would return the second pattern. The objective is to find the correct part discount based on the matched pattern(s).
What is the best approach in handling this type of problem? Is there a simple and elegant approach or does this involve some complex TSQL procedure?

The pattern for the right side of a LIKE clause can be any expression, which includes values from a table. This means you can use your patterns table in a query i.e.
SELECT PatternId, Pattern
FROM Patterns
WHERE 'F10-1064C-02TY' LIKE Pattern
You can build from here - if your part numbers are stored in a different table, join to that table using the LIKE clause as a join criterion, or build a procedure that takes a part number parameter, or whatever else fits your requirements.
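For example, here is a hedged sketch of that join (the Parts table and PartNumber column are assumptions, not from the question):
-- Every part number is checked against every pattern; a part can match
-- more than one pattern, so expect multiple rows per part number.
SELECT pn.PartNumber, p.PatternId, p.Pattern
FROM Parts AS pn
INNER JOIN Patterns AS p
    ON pn.PartNumber LIKE p.Pattern;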

Related

Can a Snowflake UDF be used to create MD5 on the fly?

I was wondering if anyone has an example of creating an MD5 result using an UDF in Snowflake?
Scenario: I want a UDF that can handle X columns, depending on the source, to create an MD5 result. So table A might have 5 columns, table B has 10....and it needs to account for various data types.
Thanks,
Todd
Snowflake already provides a built-in MD5 function.
https://docs.snowflake.com/en/sql-reference/functions/md5.html
select md5('Snowflake');
+----------------------------------+
| MD5('SNOWFLAKE')                 |
|----------------------------------|
| edf1439075a83a447fb8b630ddc9c8de |
+----------------------------------+
There are many ways you can do the MD5 calculation, but it would be good to understand your use case first. I am assuming that you want to use MD5 to validate data migrated to Snowflake. If that is the case, checking each row on Snowflake with MD5 may be expensive. A more optimal way to validate is to identify each column of the table and calculate the MIN, MAX, COUNT, number of NULLs and DISTINCT count for each column, then compare those values with the source. I have created a framework with this approach where I use the SHOW COLUMNS query to get the list of columns. The framework also allows skipping some columns if required, and filtering the rows retrieved based on a dynamic criterion. This way of validating the data is more optimal. It would definitely help to understand your use case better.
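As a rough illustration of that per-column profiling idea (the table and column names are made up, and the real framework would generate this dynamically from SHOW COLUMNS):
-- Profile one column; repeat (or generate) one block per column and compare
-- the numbers with the same profile taken on the source system.
SELECT 'CUSTOMER_NAME'                        AS column_name,
       MIN(customer_name)                     AS min_value,
       MAX(customer_name)                     AS max_value,
       COUNT(*)                               AS row_count,
       COUNT(*) - COUNT(customer_name)        AS null_count,
       COUNT(DISTINCT customer_name)          AS distinct_count
FROM my_db.my_schema.customer;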
Does this work for you?
create or replace function md5_calc (column_name varchar)
returns varchar
LANGUAGE SQL
AS $$
    select md5(column_name)
$$;

-- Example usage: hash individual columns through the UDF
SELECT EMPLID, md5_calc(EMPLID), EMPNAME, md5_calc(EMPNAME) FROM employee;

Splitting a VARCHAR column into multiple columns

I am struggling to split the data in the column into multiple columns.
I have customer name data that needs cleaning, as there can be duplicates, and I also need to set up new standards for future data.
I have been able to split the first two words in the string, but I am not able to split the rest of the data.
I only have read permissions. So I cannot create any functions.
For example:
Customer name: Illinois Institute of Technology
My query fetches only "Illinois" in one column and "Institute of Technology" in the other column. Using a space as the delimiter, I am looking to separate each word into its own column; I am not sure how to identify the 2nd space and the spaces after it.
I have also tried the PARSENAME function, but I feel it will create more difficulty in cleaning the data.
select name,
       left(name, CHARINDEX(' ', name)) as f,
       substring(name, CHARINDEX(' ', name) + 1, len(name)) as s
from customer
EDIT: This only works for SQL Server 2016 and above. OP has SQL Server 2014.
There isn't really a good way to do this, but here's one method that might work for you, modified from an example here:
create table #customer (id int, name nvarchar(max))

insert into #customer
values (1, 'Illinois Institute of Technology'),
       (2, 'The City University of New York'),
       (3, 'University of the District of Columbia'),
       (4, 'Santa Fe University of Art and Design');

with c as (
    select id, name,
           value,
           row_number() over (partition by id order by (select null)) as rn
    from #customer
    cross apply string_split(name, ' ') as bk
)
select id, name,
       [1], [2], [3], [4], [5], [6]
from c
pivot (
    max(value)
    for rn in ([1], [2], [3], [4], [5], [6])
) as pvt

drop table #customer
Notice a few things:
1) You have to explicitly declare columns in the output. You could create some overly-complex dynamic SQL that would generate as many column names as you need, but that makes it harder to fix issues and make modifications, and you probably won't get the same query optimisations.
2) Because of (1), you will just end up dropping words if there are too many to fit the number of columns you've defined. See the last example, id=4.
3) Beware of other methods that might not keep your words in order, or that skip duplicate words, e.g. "of" in the example id=3.
You don't mention what you plan to do with the data once you retrieve it. Since you only have read permissions you can't store it in a table. Something you may not have thought of is to create a local database where you do have write permissions and do your work there. The easiest way would be to get a copy of the database, but you could also access the readonly database using fully qualified names.
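For example (a rough sketch; the database and object names here are made up, and this assumes both databases are on the same server):
-- Work in a local scratch database where you have write permissions and pull
-- data from the read-only database using three-part names.
USE ScratchDb;

SELECT c.id, c.name
INTO dbo.customer_work            -- local working copy you are free to modify
FROM ReadOnlyDb.dbo.customer AS c;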
As for your string-splitting needs, I can point you to a great way of splitting strings created by a gentleman named Jeff Moden. You can find an article discussing it as well as a link to code here:
Tally OH! An Improved SQL 8K “CSV Splitter” Function
While it's a very informative read, much of it is about performance testing and such, so you might want to skip ahead to the code itself. Do try to pick out the parts that discuss functionality, though, because the code will strike the uninitiated as unconventional at best.
The code creates a function but because you don't have permission to do that you will have to remove the meat from the function and use it directly.
I'll try to provide a little overview of the approach to get you started.
The heart of the approach is a Tally Table. If you are unfamiliar with that term (it is also called a Numbers table), it is basically a table in which every row contains an integer and the rows are basically a set of all integers in some range, usually pretty large. So how does a Tally Table help with splitting strings? The magic happens by joining the tally table to the table containing the strings to be split and using the where clause to identify the delimiters by looking at 1-character substrings indexed by the tally table numbers. The natural set-based operations of SQL Server then search for all of your delimiters in one go and your select list then extracts the substrings bracketed by the delimiters. It's really quite clever and very fast.
When you get into the code, the first part of the function may look very strange (because it is), but it is necessary since you only have read rights. It is basically using SQL Server's Common Table Expression (CTE) functionality to build an internal tally table on the fly using some ugly logic which you don't really need to understand (but if you want to dig in, it's clever even if it is ugly). Since the table is only local to the query it won't violate your readonly permissions.
It also uses CTEs to represent the starting index and length of the delimited substrings so the final query is pretty simple, yielding rows with a row number followed by a string split from the original data.
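To make that concrete, here is a minimal sketch of the idea (this is not Jeff Moden's actual function, which is considerably more refined; it reuses the #customer table from the earlier example and assumes names are no longer than 100 characters):
with
E1(N) as (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) as t(N)),  -- 10 rows
Tally(N) as (select top (100) row_number() over (order by (select null))
             from E1 a cross join E1 b),                                            -- numbers 1 to 100
Starts as (
    -- position 1, plus every position just after a space, starts a word
    select c.id, c.name, t.N as StartPos
    from #customer as c
    cross join Tally as t
    where t.N = 1
       or (t.N <= len(c.name) + 1 and substring(c.name, t.N - 1, 1) = ' ')
)
select id, name,
       row_number() over (partition by id order by StartPos) as ItemNumber,
       substring(name, StartPos,
                 isnull(nullif(charindex(' ', name, StartPos), 0), len(name) + 1) - StartPos) as Item
from Starts
order by id, ItemNumber;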
I hope this helps you with your task -- it really is a nice tool to have in your toolkit.
Edit: I just realized that you wanted your output in separate columns rather than rows. That's quite a bit more difficult, since each value you split might produce a different number of strings, and your columns will need names. If you already know the column names and the number of output strings it will be easier, but still tricky. The row data from the splitter could be tweaked to carry an identifier for the row the data came from, and the row numbers might help with creating arbitrary column names if you need them. The big problem is that with only read privileges you will find processing things in steps rather tricky; CTEs can be employed even further for this, but your code will likely get rather messy unless the requirements are pretty simple.

ContainsTable across columns with 'and'

I need to run a full text search over a table from within a stored procedure.
There are a couple of things I need to take into account:
Unknown amount of search words, this depends on user input
The table we are searching can change (depending on what we are searching for)
Currently I am using a query that results in this:
SELECT *
FROM Table
INNER JOIN CONTAINSTABLE(Table,*,'"searchword*" AND "secondsearchword*"') s ON s.[Key] = Table.[Key]
However the 'contains_search_condition' is performed for each column individually.
What I need is to get the rows where all search words are contained within any column. How do I do this? Is the only option the use of computed columns/Views or is there another solution?
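One workaround to sketch (not tested against your schema, and with an unknown number of search words the joins would have to be assembled with dynamic SQL inside the procedure): join CONTAINSTABLE once per search word, so each word only has to appear in some column rather than all words appearing in the same column.
-- Rows where 'searchword*' matches any column AND 'secondsearchword*' matches any column
SELECT t.*
FROM [Table] AS t
INNER JOIN CONTAINSTABLE([Table], *, '"searchword*"') AS s1 ON s1.[Key] = t.[Key]
INNER JOIN CONTAINSTABLE([Table], *, '"secondsearchword*"') AS s2 ON s2.[Key] = t.[Key];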

SQL server search

I'm going to perform a search in my SQL Server DB (ASP.NET, VS2010, C#). The user types a phrase and I should search for this phrase in several fields. How can I do this? Do we have functions such as CONTAINS() in SQL Server? Can I perform my search using normal queries, or should I work on my queries using C# functions?
For instance, I have 3 fields in my table which can contain the user's search phrase. Is it OK to write the following SQL command? (Say the user's search phrase is GAME.)
select * from myTable where columnA='GAME' or columnB='GAME' or columnC='GAME'
I have used AND between different conditions, but can I use OR? How can I search inside my table fields? If one of my fields contains the phrase GAME, how can I find it? columnA='GAME' finds only those fields that are exactly 'GAME', is that right?
I'm a bit confused about my search approach, please help me. Thanks, guys!
OR works fine if you want at least one of the conditions to be true.
If you want to search inside your text strings you can use LIKE
select * from myTable where columnA like '%GAME%' or columnB like '%GAME%' or columnC like '%GAME%'
Note that % is the wildcard.
If you want to find everything that begins with 'GAME', you use LIKE 'GAME%'; if you allow 'GAME' to be in the middle, you need % at both ends.
You can use LIKE instead of equals and then it can contain wildcard characters, so your example could be:
select * from myTable where columnA LIKE '%GAME%' or columnB LIKE '%GAME%' or columnC LIKE '%GAME%'
Further information may be found in MSDN
This is going to do some pretty heavy lifting in terms of what the database has to do, though. I would suggest you consider something like full-text search, as I think it would be better suited to your scenario and provide faster results (of course, if you never have many records to search, LIKE would probably do fine). Information on this is also in MSDN.
Don't use LIKE, as suggested by other answers. It won't work with indexes, and therefore will be slow to return and expensive to run. Instead, you have two options:
Option 1: Full-Text Indexes
do we have functions such as CONTAINS() in SQL server?
Yes! You can use the CONTAINS() function in SQL Server. You just have to set up a full-text index for each of the columns you need to search on.
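For example, here is a rough sketch of the setup (the catalog, index and key names are illustrative):
-- One-time setup: create a full-text catalog and index the searchable columns.
CREATE FULLTEXT CATALOG SearchCatalog;

CREATE FULLTEXT INDEX ON dbo.myTable (columnA, columnB, columnC)
    KEY INDEX PK_myTable          -- the table's unique key index
    ON SearchCatalog;

-- The search then becomes:
SELECT *
FROM dbo.myTable
WHERE CONTAINS((columnA, columnB, columnC), 'GAME');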
Option 2: Lucene.Net
Lucene.Net is a popular full-text search library for .NET that runs on the application side and works well alongside SQL Server. You can use it to make implementing your search a little easier.

Entity Framework, full-text search and temporary tables

I have a LINQ-2-Entity query builder, nesting different kinds of Where clauses depending on a fairly complex search form. Works great so far.
Now I need to use a SQL Server fulltext search index in some of my queries. Is there any chance to add the search term directly to the LINQ query, and have the score available as a selectable property?
If not, I could write a stored procedure to load a list of all row IDs matching the full-text search criteria, and then use a LINQ-2-Entity query to load the detail data and evaluate other optional filter criteria in a loop per row. That would be of course a very bad idea performance-wise.
Another option would be to use a stored procedure to insert all row IDs matching the full-text search into a temporary table, and then let the LINQ query join the temporary table. Question is: how to join a temporary table in a LINQ query, as it cannot be part of the entity model?
I think I would probably suggest a hybrid approach.
Write a stored procedure which returns all the information you need.
Map an entity to the results. The entity can be created for this sole purpose. Alternately, use version 4 of the Entity Framework, which allows mapping complex types to stored procedure results. The point is that instead of trying to coerce the procedure results into existing entity types, we're going to handle them as their own type.
Now you can build a LINQ to Entities query.
Sample query:
var q = from r in Context.SearchFor("searchText")
        let fooInstance = (r.ResultType == "Foo")
            ? Context.Foos.Where(f => f.Id == r.Id)
            : null
        where (fooInstance == null) || (fooInstance.SpecialCriterion == r.SpecialCriterion)
        select new
        {
            // ...
        };
This is off the top of my head, so the syntax might not be right. The important point is treating search results as an entity.
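For completeness, here is a rough SQL sketch of the kind of stored procedure that Context.SearchFor could be mapped to (the object and column names are assumptions, not from the question): it returns the key, a result type and the full-text rank, which is what the query above treats as an entity.
-- Returns one row per full-text match, with enough information for the
-- LINQ query to join back to the real entities.
CREATE PROCEDURE dbo.SearchFor
    @SearchText nvarchar(200)
AS
BEGIN
    SET NOCOUNT ON;

    SELECT s.[Key]  AS Id,
           'Foo'    AS ResultType,   -- in practice, derived per source table
           s.[Rank] AS Score
    FROM CONTAINSTABLE(dbo.Foos, *, @SearchText) AS s;
END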
Alternately: Use a more flexible FTS system, which can do the "special", per-type filtering when building the index.
I've seen code like this for EF4:
var query = context.ExecuteStoreQuery<Person>(
    "SELECT * FROM People WHERE FREETEXT(*, {0})",
    searchText
).AsQueryable();
This may be simpler than creating a stored proc or UDF in some cases.
