T-SQL Check string pattern - sql-server

Just curious: is there an easier way to match a string pattern than the following method?
Example: the AccountNumber attribute should allow exactly 10 digits as its value, like 0123456789.
So I wrote the query like this:
@input like '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'
Is there an alternative way to write this query? For values that require exactly 100 digits, nobody wants to count while pasting [0-9] over and over, right? I noticed C# has something like ^(\d{10})$, but I cannot find a similar matching method in T-SQL. Does one exist?

Your logic is fine. You can also write this as:
where len(AccountNumber) = 10 and AccountNumber not like '%[^0-9]%'
That is, the length is 10 and it contains no characters that are not digits.

You could use
@input like REPLICATE('[0-9]',10) COLLATE Latin1_General_100_BIN2
The explicit COLLATE clause is there because in some collations the range [0-9] will match characters that aren't strictly digits.
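As a rough sketch of the REPLICATE approach, the batch below checks a few made-up sample values against a pattern built for @n digits. The variable @n, the alias v and the sample values are assumptions for illustration only.

```sql
-- Number of digits required; change to 100 for the 100-digit case.
DECLARE @n int = 10;

-- Hypothetical sample values, not from the original question.
SELECT v.candidate,
       CASE WHEN v.candidate LIKE REPLICATE('[0-9]', @n) COLLATE Latin1_General_100_BIN2
            THEN 1 ELSE 0 END AS is_valid
FROM (VALUES ('0123456789'),   -- exactly 10 digits
             ('012345678'),    -- too short
             ('01234X6789'),   -- contains a non-digit
             ('01234567890')   -- too long
     ) AS v(candidate);
```

Only the first row should come back with is_valid = 1, since a LIKE pattern without % has to match the whole string. Note that REPLICATE truncates its result at 8,000 bytes unless the input is varchar(max) or nvarchar(max), which only matters for very long patterns.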

Related

How to take apart information between hyphens in SQL Server

How would I take apart a column that contains string:
92873-987dsfkj80-2002-04-11
20392-208kj48384-2008-01-04
Data would look like this:
Filename Yes/No Key
Abidabo Yes 92873-987dsfkj80-2002-04-11
Bibiboo No 20392-208kj48384-2008-01-04
Want it to look like this:
Filename Yes/No Key
Abidabo Yes 92873-987dsfkj80-20020411
Bibiboo No 20392-208kj48384-20080104
I would like to concatenate the date parts at the end into 20020411 and 20080104. From the right side, the layout is always the same. From the left it is not, otherwise I could have concatenated it. It is not an import issue.
As mentioned in the comments already, storing data like this is a bad idea. However, you can obtain the dates from those strings by using a RIGHT function like so:
SELECT RIGHT('20392-208kj48384-2008-01-04', 10)
Output:
2008-01-04
Depending on the SQL Server version you are using, you can use STRING_SPLIT, which requires SQL Server 2016 or later with COMPATIBILITY_LEVEL 130 or higher. You can also build your own user-defined function to split the contents of a field and manipulate it as you need. You can find some useful examples of SPLIT functions in this thread:
Split function equivalent in T-SQL?
Assuming I'm correct and the date part is always on the right side of the string, you can simply use RIGHT and CAST to get the date (assuming, again, that the date is represented as yyyy-mm-dd):
SELECT CAST(RIGHT(YourColumn, 10) As Date)
FROM YourTable
However, Panagiotis is correct in his comment - You shouldn't store data like that. Each column in the database should hold only a single point of data, be it string, number or date.
Update following your comment and the updated question:
SELECT LEFT(YourColumn, LEN(YourColumn) - 10) + REPLACE(RIGHT(YourColumn, 10), '-', '')
FROM YourTable
will return the desired results.
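To see the LEFT/REPLACE expression at work, here is a small sketch that runs it against the two key values from the question. The inline VALUES table and the names k and KeyValue are stand-ins for the real table and column.

```sql
-- The two sample keys from the question, supplied inline for illustration.
SELECT k.KeyValue,
       -- Keep everything except the last 10 characters, then append
       -- the trailing yyyy-mm-dd date with its hyphens removed.
       LEFT(k.KeyValue, LEN(k.KeyValue) - 10)
         + REPLACE(RIGHT(k.KeyValue, 10), '-', '') AS Reformatted
FROM (VALUES ('92873-987dsfkj80-2002-04-11'),
             ('20392-208kj48384-2008-01-04')) AS k(KeyValue);
```

This relies on the assumption stated above that the last 10 characters are always a yyyy-mm-dd date. A value shorter than 10 characters would make LEFT fail with an invalid length error.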

T-SQL Regex for social security number (SQL Server 2008 R2)

I need to find invalid social security numbers in a varchar field in a SQL Server 2008 database table. (Valid SSNs are defined as being in the format ###-##-####. It doesn't matter what the numbers are, as long as they are in that "3-digit dash 2-digit dash 4-digit" pattern.)
I do have a working regex:
SELECT *
FROM mytable
WHERE ssn NOT LIKE '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]'
That does find the invalid SSNs in the column, but I know (okay - I'm pretty sure) that there is a way to shorten that to indicate that the previous pattern can have x iterations.
I thought this would work:
'[0-9]{3}-[0-9]{2}-[0-9]{4}'
But it doesn't.
Is there a shorter regex than the one above in the select, or not? Or perhaps there is, but T-SQL/SQL Server 2008 doesn't support it!?
If you plan to get a shorter variant of your LIKE expression, then the answer is no.
In T-SQL, you can only use the following wildcards in the pattern:
% - Any string of zero or more characters.
WHERE title LIKE '%computer%' finds all book titles with the word computer anywhere in the book title.
_ (underscore) - Any single character.
WHERE au_fname LIKE '_ean' finds all four-letter first names that end with ean (Dean, Sean, and so on).
[ ] - Any single character within the specified range ([a-f]) or set ([abcdef]).
WHERE au_lname LIKE '[C-P]arsen' finds author last names ending with arsen and starting with any single character between C and P, for example Carsen, Larsen, Karsen, and so on. In range searches, the characters included in the range may vary depending on the sorting rules of the collation.
[^] - Any single character not within the specified range ([^a-f]) or set ([^abcdef]).
So, your LIKE pattern is already the shortest possible expression. No limiting quantifiers (like {min,max}) can be used, nor shorthand classes like \d.
If you were using MySQL, you could use a richer set of regex utilities, but it is not the case.
I suggest another solution like this:
-- Use `REPLICATE` if you really want to use a number to repeat
Declare @rgx nvarchar(max) = REPLICATE('#', 3) + '-' +
REPLICATE('#', 2) + '-' +
REPLICATE('#', 4);
-- or use your simple format string instead (an alternative to the declaration above, not in addition to it)
-- Declare @rgx nvarchar(max) = '###-##-####';
-- then use this to get your final `LIKE` string.
Set @rgx = REPLACE(@rgx, '#', '[0-9]');
And you can also use something like '_' for characters then replace it with [A-Z] and so on.
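Put together, the format-string idea might look like the sketch below, which builds the pattern once and then lists the values that fail it. The inline sample rows and the alias s are made up for illustration. In practice the FROM clause would be the real table.

```sql
-- Expand the readable format string into a LIKE pattern:
-- '###-##-####' becomes '[0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]'.
DECLARE @rgx nvarchar(max) = REPLACE('###-##-####', '#', '[0-9]');

-- Hypothetical sample values standing in for the ssn column.
SELECT s.ssn
FROM (VALUES ('123-45-6789'),   -- valid format
             ('123456789'),     -- missing dashes
             ('12-345-6789'),   -- dashes in the wrong places
             ('123-45-678X')    -- non-digit character
     ) AS s(ssn)
WHERE s.ssn NOT LIKE @rgx;
```

The query should return the three malformed values and skip '123-45-6789'. The format string stays readable, and the long [0-9] pattern is generated rather than typed.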

How can I match a complex pattern in SQL Server?

I have a table of allowed formats that I need to be able to lookup based on a string. Some example formats are:
^.^.testdomain.com
^.testdomain.com
^qwerty^.testdomain.com
I need to return one or more matching rows based on a sample input of:
device.name.testdomain.com would return the first and second format
johnqwertysmith.testdomain.com would return all the formats
Unfortunately the character '^' is fixed as the wildcard, this is a constraint I can't change.
I was hoping that perhaps regex could save me here, maybe using the format as the pattern and comparing it to the input. Regex is not something I have used much in the past so I'm not really sure of its capabilities. Failing that, is there any other way of performing this lookup?
Like this:
SELECT *
FROM yourTable
WHERE @INPUT LIKE REPLACE(yourTable.savedFormat, '^', '%')
This won't allow you to leverage any indexes on your savedFormat column, but I suspect that's hopeless anyway for these requirements.
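A quick way to try this out is to run it against the three formats from the question, supplied inline. The alias f and the inline VALUES table are placeholders for the real formats table.

```sql
DECLARE @INPUT varchar(255) = 'device.name.testdomain.com';

-- The three example formats from the question, supplied inline.
SELECT f.savedFormat
FROM (VALUES ('^.^.testdomain.com'),
             ('^.testdomain.com'),
             ('^qwerty^.testdomain.com')) AS f(savedFormat)
-- Turn the fixed '^' wildcard into LIKE's '%' and match the input against it.
WHERE @INPUT LIKE REPLACE(f.savedFormat, '^', '%');
```

For this input the first two formats should match and the third should not. Setting @INPUT to 'johnqwertysmith.testdomain.com' should return the second and third. The first format requires two dots before testdomain.com, so this query does not return it for that input, even though the question expects all three. One caveat: any literal %, _ or [ in a saved format is treated as a LIKE wildcard. If that can occur, one option is to escape those characters before translating the caret, for example REPLACE(REPLACE(REPLACE(REPLACE(savedFormat, '[', '[[]'), '%', '[%]'), '_', '[_]'), '^', '%').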

finding exact strings with sql like statement

I have the following statement
SELECT *
FROM Delivery_Note this_
WHERE mycol like '%[126]%'
I've noticed this also returns rows that contain [123]. What is the best way to find exact matching in a string.
edit
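I appreciate this is not an efficient query. There are a few other reasons for this approach.
Brackets specify a character set or range when used with LIKE, so [126] matches any single one of the characters 1, 2 or 6. To match the brackets literally, you can use the ESCAPE keyword.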
WHERE mycol LIKE '%\[126\]%' ESCAPE '\';
Of course if you are trying to match an exact string, you don't need LIKE, or you can drop the % characters and LIKE will behave like = (this makes it flexible to pass in wildcards or exact matches to a parameter).
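The difference is easy to see with a few sample rows. The sketch below uses made-up values and the alias d. It also shows the bracket form '[[]', which matches a literal opening bracket without needing an ESCAPE clause.

```sql
-- Hypothetical sample values standing in for mycol.
SELECT d.mycol,
       -- Unescaped: [126] is a character set, so any row containing 1, 2 or 6 matches.
       CASE WHEN d.mycol LIKE '%[126]%' THEN 1 ELSE 0 END AS unescaped_match,
       -- Escaped with ESCAPE: matches only the literal text [126].
       CASE WHEN d.mycol LIKE '%\[126\]%' ESCAPE '\' THEN 1 ELSE 0 END AS escaped_match,
       -- Same result without ESCAPE: '[[]' is a set containing only '['.
       CASE WHEN d.mycol LIKE '%[[]126]%' THEN 1 ELSE 0 END AS bracket_match
FROM (VALUES ('ref [126] shipped'),
             ('ref [123] shipped'),
             ('ref 126 shipped')) AS d(mycol);
```

All three rows should show unescaped_match = 1, while only the first row should show 1 in the other two columns.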

Function to find the Exact match in Microsoft SQL Server

What is the way to find the exactly matching substring in the given string in Microsoft SQL server?
For example, in the string '0000020354', I want to find '20354'. Of course it has to be an exact match. I tried to use CHARINDEX(@providerId, external_prv_id) > -1, but the problem with CHARINDEX is that it gives me the index as soon as it finds the first match.
Basically I am looking for function like indexOf("") in Microsoft SQL SERVER.
Assuming @ProviderId is a VARCHAR
You could just use LIKE :
SELECT Id FROM TableName WHERE Column LIKE '%' + @ProviderId + '%'
Which will return rows where Column contains 2034.
And if you don't want to use LIKE, You can use PATINDEX:
SELECT Id FROM TableName WHERE PATINDEX('%' + @ProviderId + '%', Column) > 0
Which returns the starting position of any match that it finds.
What's the data you're storing? It sounds like another storage type (e.g. a separate table) might be more suitable.
Ahh, 2034 was a typo. What I don't understand from your question is that you say you need the exact match. If CHARINDEX returns non-zero for '20354' you know that it's matched '20354'. If you don't know what @providerId is, return that in your query along with the result of CHARINDEX. Similarly, if you want external_prv_id, include that, e.g.:
SELECT external_prv_id, CHARINDEX(@providerId, external_prv_id)
FROM TableName -- placeholder table name
WHERE CHARINDEX(@providerId, external_prv_id) > 0
(Note that CHARINDEX returning 0 means it was not found.)
If you actually mean that '20354' could include wildcards, you need PATINDEX.
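A tiny sketch of the return values involved, using the string from the question. The column aliases are only for illustration.

```sql
SELECT CHARINDEX('20354', '0000020354')  AS found_at,      -- 1-based start position of the match
       CHARINDEX('99999', '0000020354')  AS not_found,     -- 0 when the substring is absent
       PATINDEX('%2_354%', '0000020354') AS wildcard_hit;  -- PATINDEX accepts LIKE wildcards
```

Here found_at and wildcard_hit are both 6, because the match starts at the sixth character, and not_found is 0. This is why the test has to be > 0 rather than > -1.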
The LIKE '%VAL%' approach will be overly broad, e.g. if the database contains 00000012345 and you search for 1234, you'll pull this row, which is not what the OP intends (if I'm understanding the "EXACT" part correctly).
What you want is a regular expression that does something like: any number of zeroes followed by the match and end of line.
From this question we know how to trim leading zeroes:
Better techniques for trimming leading zeros in SQL Server?
SUBSTRING(str_col, PATINDEX('%[^0]%', str_col+'.'), LEN(str_col))
So, combine that with your query, and you can do something like the following:
WHERE SUBSTRING(external_prv_id, PATINDEX('%[^0]%', external_prv_id+'.'), LEN(external_prv_id)) = '12345'
Of course, the better (best?) solution would be to store them as INTEGERS so you get full indexability and don't have to muck with all of this crap. If you REALLY need to store the exact string then you have a couple of options:
store the normalized integer results in another column and use that for all internal queries
always store an integer but then pad with zeros upon query (my vote)
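To compare the two behaviours side by side, the sketch below runs the leading-zero trim against a few made-up values. The sample rows and the alias p are assumptions. external_prv_id and @providerId follow the question.

```sql
DECLARE @providerId varchar(20) = '20354';

-- Hypothetical sample values standing in for external_prv_id.
SELECT p.external_prv_id
FROM (VALUES ('0000020354'),   -- 20354 with leading zeros: the intended match
             ('0000120354'),   -- contains 20354 but is really 120354
             ('0002035400')    -- contains 20354 but is really 2035400
     ) AS p(external_prv_id)
-- Strip leading zeros, then compare the remainder for equality.
WHERE SUBSTRING(p.external_prv_id,
                PATINDEX('%[^0]%', p.external_prv_id + '.'),
                LEN(p.external_prv_id)) = @providerId;
```

Only '0000020354' should be returned, whereas LIKE '%' + @providerId + '%' would match all three rows. Because the column is wrapped in functions, this predicate cannot use an index seek on external_prv_id, which is part of the argument for storing a normalized value.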
