How can I match a complex pattern in SQL Server? - sql-server

I have a table of allowed formats that I need to be able to lookup based on a string. Some example formats are:
^.^.testdomain.com
^.testdomain.com
^qwerty^.testdomain.com
I need to return one or more matching rows based on a sample input of:
device.name.testdomain.com would return the first and second format
johnqwertysmith.testdomain.com would return all the formats
Unfortunately the character '^' is fixed as the wildcard, this is a constraint I can't change.
I was hoping that perhaps regex could save me here, maybe using the format as the pattern and comparing it to the input. Regex is not something I have used much in the past so I'm not really sure of its capabilities. Failing that, is there any other way of performing this lookup?

Like this:
SELECT *
FROM yourTable
WHERE #INPUT LIKE REPLACE(yourTable.savedFormat, '^', '%')
This won't allow you to leverage any indexs on your savedFormat column, but I suspect thats hopeless anyway for these requirements.

Related

Customize Normalization in SQL Server Full Text Search by replacing characters

I want to customize SQL Server FTS to handle language specific features better.
In many language like Persian and Arabic there are similar characters that in a proper search behavior they should consider as identical char like these groups:
['آ' , 'ا' , 'ء' , 'ا']
['ي' , 'ی' , 'ئ']
Currently my best solution is to store duplicate data in new column and replace these characters with a representative member and also normalize search term and perform search in the duplicated column.
Is there any way to tell SQL Server to treat any members of these groups as an identical character?
as far as i understand ,this would be used for suggestioning purposes so the being so accurate is not important. so
in farsi actually none of the character in list above doesn't share same meaning but we can say they do have a shared short form in some writing cases ('آ' != 'اِ' but they both can write as 'ا' )
SCENARIO 1 : THE INPUT TEXT IS IN COMPLETE FORM
imagine "محمّد" is a record in a table formatted (id int,text nvarchar(12))named as 'table'.
after removing special character we can use following command :
select * from [db].[dbo].[table] where text REPLACE(text,' ّ ','') = REPLACE(N'محمد',' ّ ','');
the result would be
SCENARIO 2: THE INPUT IS IN SHORT FORMAT
imagine "محمد" is a record in a table formatted (id int,text nvarchar(12))named as 'table'.
in this scenario we need to do some logical operation on text before we query in data base
for e.g. if "محمد" is input as we know and have a list of this special character ,it should be easily searched in query as :
select * from [db].[dbo].[table] where REPLACE(text,' ّ ','') = 'محمد';
note:
this solution is not exactly a best one because the input should not be affected in client side it, would be better if the sql server configure to handle this.
for people who doesn't understand farsi simply he wanna tell sql that َA =["B","C"] and a have same value these character in the list so :
when a "dad" word searched, if any word "dbd" or "dcd" exist return them too.
add:
some set of characters can have same meaning some of some times not ( ['ي','أ'] are same but ['آ','اِ'] not) so in we got first scenario :
select * from [db].[dbo].[table] where text like N'%هی[أي]ت' and text like N'هی[أي]ت%';

How to take apart information between hyphens in SQL Server

How would I take apart a column that contains string:
92873-987dsfkj80-2002-04-11
20392-208kj48384-2008-01-04
Data would look like this:
Filename Yes/No Key
Abidabo Yes 92873-987dsfkj80-2002-04-11
Bibiboo No 20392-208kj48384-2008-01-04
Want it to look like this:
Filename Yes/No Key
Abidabo Yes 92873-987dsfkj80-20020411
Bibiboo No 20392-208kj48384-20080104
whereby I would like to concat the dates in the end as 20020411 and 20080104. From the right side, the information is the same always. From the left it is not, otherwise I could have concatenated it. It is not an import issue.
As mentioned in the comments already, storing data like this is a bad idea. However, you can obtain the dates from those strings by using a RIGHT function like so:
SELECT RIGHT('20392-208kj48384-2008-01-04', 10)
Output:
2008-01-04
Depending on the SQLSERVER version you are using, you can use STRING_SPLIT which requieres COMPATIBILITY_LEVEL 130. You can also build your own User Defined Function to split the contents of a field and manipulate it as you need, you can find some useful examples of SPLIT functions in this thread:
Split function equivalent in T-SQL?
Assuming I'm correct and the date part is always on the right side of the string, you can simply use RIGHT and CAST to get the date (assuming, again, that the date is represented as yyyy-mm-dd):
SELECT CAST(RIGHT(YourColumn, 10) As Date)
FROM YourTable
However, Panagiotis is correct in his comment - You shouldn't store data like that. Each column in the database should hold only a single point of data, be it string, number or date.
Update following your comment and the updated question:
SELECT LEFT(YourColumn, LEN(YourColumn) - 10) + REPLACE(RIGHT(YourColumn, 10), '-', '')
FROM YourTable
will return the desired results.

T-SQL Check string pattern

Just curious that is there any easy way to filter certain string out instead of using the following method:
example: for AccountNumber attribute, that should allow exactly 10 digits as the value, like, 0123456789,
So for the query I made like :
#input like '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
I am just wondering is there any alternate way to write this query? for those value which require exact 100 digits, nobody want to count while keep pasting [0-9], right? I notice there is something in C# like ^(\d{10})$, but I cannot find such matching method in TSQL, does this similar method exist?
Your logic is fine. You can also write this as:
where len(AccountNumber) = 10 and AccountNumber not like '%[^0-9]%'
That is, the length is 10 and it contains no characters that are not digits.
You could use
#input like REPLICATE('[0-9]',10) COLLATE Latin1_General_100_BIN2
The explicit collate clause is because in some collations the range will match things that aren't strictly digits.

SQL Server validating postcodes

I have a table containing postcodes but there is no validation built in to the entry form so there is no consistency in the way they are stored in the database, sample below:
ID Postcode
001742 B5
001745
001746
001748 DY3
001750
001751
001768 B276LL
001774 B339HY
001776 B339QY
001780 WR51DD
I want to use these postcode to map the distance from a central point but before I can do that I need to put them into a valid format and filter out any blanks or incomplete postcodes.
I had considered using
left(postcode,3) + ' ' + right(postcode,3)
To correct the formatting but this wouldn't work for postcodes like 'M6 8HD'
My aim is to get the list of postcodes in a valid format but I don't know how to account for different lengths of postcode. Is this there a way to do this in SQL Server?
As discussed in the comments, sometimes looking at a problem the other way around presents a far simpler solution.
You have a list of arbitrary input provided by users, which frequently doesn't contain the correct spacing. You also have a list of valid postcodes which are correctly spaced.
You're trying to solve the problem of finding the correct place to insert spaces into your arbitrary inputs to make them match the list of valid codes, and this is extremely difficult to do in practice.
However, performing the opposite task - removing the spaces from the valid postcodes - is remarkably easy to do. So that is what I'd suggest doing.
In our most recent round of data modelling, we have modelled addresses with two postcode columns - PostCode containing the postcode as provided from whatever sources, and PostCodeNoSpace, a computed column which strips whitespace characters from PostCode. We use the latter column for e.g. searches based on user input. You may want to do something similar with your list of Valid postcodes, if you're keeping it around permanently - so that you can perform easy matches/lookups and then translate those matches back into a version that has spaces - which is actually a solution to the original question posed!

SQL LIKE Operator doesn't work with Asian Languages (SQL Server 2008)

Dear Friends,
I've faced with a problem never thought of ever. My problem seems too simple but I can't find a solution to it.
I have a sql server database column that is of type NVarchar and is filled with standard persian characters. when I'm trying to run a very simple query on it which incorporates the LIKE operator, the resultset becomes empty although I know the query term is present in the table. Here is the very smiple example query which doesn't act corectly:
SELECT * FROM T_Contacts WHERE C_ContactName LIKE '%ف%'
ف is a persian character and the ContactName coulmn contains multiple entries which contain that character.
Please tell me how should I rewrite the expression or what change should I apply. Note that my database's collation is SQL_Latin1_General_CP1_CI_AS.
Thank you very much
Also, if those values are stored as NVARCHAR (which I hope they are!!), you should always use the N'..' prefix for any string literals to make sure you don't get any unwanted conversions back to non-Unicode VARCHAR.
So you should be searching:
SELECT * FROM T_Contacts
WHERE C_ContactName COLLATE Persian_100_CI_AS LIKE N'%ف%'
Shouldn't it be:
SELECT * FROM T_Contacts WHERE C_ContactName LIKE N'%ف%'
ie, with the N in front of the comparing string, so it treats it like an nvarchar?

Resources