Select only alphabetic prefix from a column in sql server - sql-server

I have a regex expression I need to convert to a sql server query. The goal is to select the alphabetic prefix from a column, so AAA-1234BC would become AAA, ABC123 would become ABC and 1234A would become ''.
From my understanding, regex isn't natively supported in SQL Server, so I am wondering if there is another, purely SQL way to do this.
Thanks

We can use PATINDEX here:
SELECT
    col,
    CASE WHEN col LIKE '[A-Za-z]%'
         -- the appended '0' guarantees a non-letter exists, so all-letter values work
         THEN LEFT(col, PATINDEX('%[^A-Za-z]%', col + '0') - 1)
         ELSE '' END AS first_alpha
FROM yourTable;
For strings that begin with a letter, the above logic finds the index of the first non-letter character and takes the substring from the first character up to, but not including, that position. Otherwise, it returns an empty string.
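To check the logic against the sample values from the question, here is a minimal sketch that runs without any table. The inline VALUES list and the alias v are made up for illustration, and the extra 'ABC' row is an assumption added to cover a value with no non-letter at all. Appending a sentinel '0' means PATINDEX always finds a non-letter, which also makes the CASE unnecessary here.

```sql
-- Hypothetical sample rows; v and col are illustration names only.
SELECT
    v.col,
    -- col + '0' guarantees PATINDEX returns at least 1,
    -- so LEFT never receives a negative length.
    LEFT(v.col, PATINDEX('%[^A-Za-z]%', v.col + '0') - 1) AS first_alpha
FROM (VALUES ('AAA-1234BC'), ('ABC123'), ('1234A'), ('ABC')) AS v(col);
```

Under a typical case-insensitive Latin collation this should give AAA, ABC, an empty string, and ABC. Which characters fall inside A-Z depends on the collation in effect, so accented letters may behave differently.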

Related

Remove special character from customerId column in SQL Server

I want to remove a special character from the CustomerId column.
Currently the CustomerId column contains ¬78254782 and I want to remove the ¬ character.
Could you please help me with that?
Apply the T-SQL REPLACE function:
SELECT REPLACE(CustomerId, '¬', '') FROM Customers;
SQL Server does not really have any regex support, which is what you would probably want to be using here. That being said, you could try using the enhanced LIKE operator as follows:
UPDATE yourTable
SET CustomerId = RIGHT(CustomerId, LEN(CustomerId) - 1)
WHERE CustomerId LIKE '[^A-Za-z0-9]%';
Here we are phrasing the condition of the first character being special using [^A-Za-z0-9], followed by anything else. In that case, we substring off the first character in the update.
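Before running an UPDATE like this, it can help to preview the result with a SELECT. The sketch below uses the same LIKE condition on two hypothetical inline values and uses STUFF to drop the first character, which is a swapped-in alternative to the RIGHT/LEN expression above, not the answer's own code.

```sql
-- Hypothetical sample values; v is an illustration alias only.
SELECT
    v.CustomerId,
    CASE WHEN v.CustomerId LIKE '[^A-Za-z0-9]%'
         THEN STUFF(v.CustomerId, 1, 1, '')   -- delete 1 character at position 1
         ELSE v.CustomerId
    END AS Cleaned
FROM (VALUES (N'¬78254782'), (N'78254782')) AS v(CustomerId);
```

Both rows should come back as 78254782 in the Cleaned column. Note that this only handles a single leading special character, exactly like the UPDATE it previews.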

SQL Server: Replace Arabic Character in Select Statement

I would like to replace Arabic character in select statement with another Arabic character in SQL server, example query:
select replace(ArabicName, 'أ', 'ا') as ArabicName from dataIndividual;
But nothing is replaced, probably because SQL Server does not read the character as Unicode but as ASCII (I guess).
Is there a way to pass Unicode characters to the replace function?
Note: I already tried collate after the Arabic character.
Prefix your string literals with N so that they are treated as Unicode (nvarchar):
select replace(ArabicName, N'أ' , N'ا') as ArabicName
from dataIndividual;
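The reason this works: a literal without the N prefix is a varchar and is converted to the code page of the database's default collation, and characters that code page cannot represent are typically replaced with '?'. A small sketch with a hypothetical sample name shows the difference without needing the table:

```sql
-- N'...' literals are nvarchar, so the Arabic characters survive intact.
SELECT REPLACE(N'أحمد', N'أ', N'ا') AS Replaced;

-- The same character with and without the N prefix. What WithoutN shows
-- depends on the database collation (an Arabic collation may preserve it).
SELECT 'أ' AS WithoutN, N'أ' AS WithN;
```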

Determining index of last uppercase letter in column value (SQL)?

Short version: Is there a way to easily extract and ORDER BY a substring of values in a DB column based on the index (position) of the last upper case letter in that value, only using SQL?
Long version: I have a table with a username field, and the convention for usernames is the capitalized first initial of the first name, followed by the capitalized first initial of the last name, followed by the rest of the last name. As a result, ordering by the username field is 'wrong'. Ordering by a substring of the username value would theoretically work, e.g.
SUBSTRING(username,2, LEN(username))
...except that there are values with a capitalized middle initial between the other two initials. I am curious to know if, using only SQL (MS SQL Server), there is a fairly straightforward / simple way to:
Test the case of a character in a DB value (and return a boolean)
Determine the index of the last upper case character in a string value
Assuming this is even remotely possible, I assume one would have to loop through the individual letters of each username to accomplish it, making it terribly inefficient, but if you have a magical shortcut, feel free to share. Note: This question is purely academic as I have since decided to go a much simpler way. I am just curious if this is even possible.
Test the case of a character in a DB value (and return a boolean)
SQL Server does not have a boolean datatype. bit is often used in its place.
DECLARE @Char CHAR(1) = 'f'
SELECT CAST(CASE
            WHEN @Char LIKE '[A-Z]' COLLATE Latin1_General_Bin
            THEN 1
            ELSE 0
            END AS BIT)
/* Returns 0 */
Note it is important to use a binary collation rather than a case sensitive collate clause with the above syntax. If using a CS collate clause the pattern would need to be spelled out in full as '[ABCDEFGHIJKLMNOPQRSTUVWXYZ]' to avoid matching lower case characters.
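The difference is easy to see side by side. This is a sketch assuming the Latin1_General_CS_AS collation as the case-sensitive example; the column aliases are invented for illustration.

```sql
DECLARE @Char CHAR(1) = 'f';

SELECT
    -- Binary collation: the range is evaluated by code point, so only A..Z match.
    CASE WHEN @Char LIKE '[A-Z]' COLLATE Latin1_General_Bin
         THEN 1 ELSE 0 END AS BinRange,
    -- Case-sensitive collation: the range follows dictionary order,
    -- in which lower case letters sort between the upper case ones.
    CASE WHEN @Char LIKE '[A-Z]' COLLATE Latin1_General_CS_AS
         THEN 1 ELSE 0 END AS CsRange,
    -- Spelling out the set avoids relying on range ordering at all.
    CASE WHEN @Char LIKE '[ABCDEFGHIJKLMNOPQRSTUVWXYZ]' COLLATE Latin1_General_CS_AS
         THEN 1 ELSE 0 END AS CsSpelledOut;
```

BinRange and CsSpelledOut should reject 'f', while CsRange should accept it, which is the pitfall the note describes.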
Determine the index of the last upper case character in a string value
SELECT PATINDEX('%[A-Z]%' COLLATE Latin1_General_Bin, REVERSE('Your String'))
/* Returns the one-based index counted from the end of the string (6) */
SELECT PATINDEX('%[A-Z]%' COLLATE Latin1_General_Bin, REVERSE('no capitals'))
/* Returns 0 if the test string doesn't contain any letters in the range A-Z */
To extract the surname according to those rules you can use
SELECT RIGHT(Name,PATINDEX('%[A-Z]%' COLLATE Latin1_General_Bin ,REVERSE(Name)))
FROM YourTable
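Putting the pieces together for the original sorting problem, here is a sketch with two hypothetical usernames, one with a middle initial and one without. The inline VALUES list and the aliases are assumptions for illustration.

```sql
-- Hypothetical usernames following the convention from the question.
SELECT
    v.Name,
    -- Position of the last upper case letter, counted from the end.
    PATINDEX('%[A-Z]%' COLLATE Latin1_General_Bin, REVERSE(v.Name)) AS PosFromEnd,
    -- Everything from that letter onward is the surname.
    RIGHT(v.Name, PATINDEX('%[A-Z]%' COLLATE Latin1_General_Bin, REVERSE(v.Name))) AS Surname
FROM (VALUES ('JSmith'), ('JQSmith')) AS v(Name)
ORDER BY Surname;
```

Both rows should yield PosFromEnd = 5 and Surname = Smith. The expression is evaluated per row without an explicit loop, although it still cannot use an index on the column.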

WHERE clause on VARCHAR column seems to operate as a LIKE

I've stumbled across a situation I've never seen before. I hope that someone can explain the following.
I ran the following query, hoping to get only the rows whose column value is exactly equal to 1101:
select '--' + MyColumn + '--' SeeSpaces, Len(MyColumn) as LengthOfColumn
from MyTable
where MyColumn = '1101'
However, I also see values where 1101 is followed by (what I believe are) spaces.
So SeeSpaces returns
--1101 --
And LengthOfColumn returns 4
MyColumn is a VARCHAR(8), NOT NULL column. Its values (including the spaces) are inserted through a separate workflow.
Why does this select not return only the exact results?
Thanks in advance
The reason has to do with the way SQL Server compares strings with trailing spaces. It follows the ANSI standard, so the strings '1101' and '1101 ' are considered equivalent.
See the following for more details:
INF: How SQL Server Compares Strings with Trailing Spaces
I think you have to use the LTRIM() and RTRIM() functions when comparing, like this:
LTRIM(RTRIM(MYCOLUMN))='1101'
Also, the LEN function does not count trailing spaces; it only counts the characters before them. Please refer to: http://msdn.microsoft.com/en-us/library/ms190329%28SQL.90%29.aspx
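The behaviour described in these answers can be reproduced with literals alone. A small sketch, with invented column aliases:

```sql
SELECT
    -- The shorter string is padded with spaces before the = comparison.
    CASE WHEN '1101' = '1101   ' THEN 'equal' ELSE 'different' END AS EqualsResult,
    -- LEN ignores trailing spaces.
    LEN('1101   ')        AS LenResult,
    -- DATALENGTH counts bytes, trailing spaces included.
    DATALENGTH('1101   ') AS DataLengthResult;
```

This returns equal, 4 and 7, which matches what the question observed: the row is selected and LEN reports 4 even though the stored value is longer.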

How can I make SQL Server return FALSE for comparing varchars with and without trailing spaces?

If I deliberately store trailing spaces in a VARCHAR column, how can I force SQL Server to see the data as mismatch?
SELECT 'foo' WHERE 'bar' = 'bar '
I have tried:
SELECT 'foo' WHERE LEN('bar') = LEN('bar ')
One method I've seen floated is to append a specific character to the end of every string then strip it back out for my presentation... but this seems pretty silly.
Is there a method I've overlooked?
I've noticed that it does not apply to leading spaces so perhaps I run a function which inverts the character order before the compare.... problem is that this makes the query unSARGable....
From the docs on LEN (Transact-SQL):
Returns the number of characters of the specified string expression, excluding trailing blanks. To return the number of bytes used to represent an expression, use the DATALENGTH function
Also, from the support page on How SQL Server Compares Strings with Trailing Spaces:
SQL Server follows the ANSI/ISO SQL-92 specification on how to compare strings with spaces. The ANSI standard requires padding for the character strings used in comparisons so that their lengths match before comparing them.
Update: I deleted my code using LIKE (which does not pad spaces during comparison) and DATALENGTH() since they are not foolproof for comparing strings
This has also been asked in a lot of other places as well for other solutions:
SQL Server 2008 Empty String vs. Space
Is it good practice to trim whitespace (leading and trailing)
Why would SqlServer select statement select rows which match and rows which match and have trailing spaces
You could try something like this:
declare @a varchar(10), @b varchar(10)
set @a='foo'
set @b='foo '
select @a, @b, DATALENGTH(@a), DATALENGTH(@b)
Sometimes the dumbest solution is the best:
SELECT 'foo' WHERE 'bar' + 'x' = 'bar ' + 'x'
So basically append any character to both strings before making the comparison.
After some searching, the simplest solution I found was on Anthony Bloesch's WebLog.
Just append some text (a single character is enough) to the end of the data:
SELECT 'foo' WHERE 'bar' + 'BOGUS_TXT' = 'bar ' + 'BOGUS_TXT'
Also works for 'WHERE IN'
SELECT <columnA>
FROM <tableA>
WHERE <columnA> + 'BOGUS_TXT' in ( SELECT <columnB> + 'BOGUS_TXT' FROM <tableB> )
The approach I’m planning to use is to use a normal comparison which should be index-keyable (“sargable”) supplemented by a DATALENGTH (because LEN ignores the whitespace). It would look like this:
DECLARE @testValue VARCHAR(MAX) = 'x';
SELECT t.Id, t.Value
FROM dbo.MyTable t
WHERE t.Value = @testValue AND DATALENGTH(t.Value) = DATALENGTH(@testValue)
It is up to the query optimizer to decide the order of the filters, but it should choose to use an index for the data lookup if that makes sense for the table being tested, and then filter the remaining rows by length with the more expensive scalar operations. However, as another answer stated, it would be better to avoid these scalar operations altogether by using an indexed calculated column. The method presented here might make sense if you have no control over the schema, if you want to avoid creating calculated columns, or if creating and maintaining them is considered more costly than the worse query performance.
I've only really got two suggestions. One would be to revisit the design that requires you to store trailing spaces - they're always a pain to deal with in SQL.
The second (given your SARG-able comments) would be to add a computed column to the table that stores the length, and add this column to the appropriate indexes. That way, at least, the length comparison should be SARG-able.
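A rough sketch of that computed-column idea follows. It reuses the dbo.MyTable, Id and Value names from the earlier answer; the column name ValueLength and the index name are hypothetical, and it assumes Value is a bounded VARCHAR small enough to be an index key. GO is the batch separator used by SSMS and sqlcmd, not a T-SQL statement.

```sql
-- Hypothetical computed column holding the byte length, trailing spaces included.
ALTER TABLE dbo.MyTable
    ADD ValueLength AS DATALENGTH(Value) PERSISTED;
GO

-- Hypothetical index covering both the value and its length.
CREATE INDEX IX_MyTable_Value_ValueLength
    ON dbo.MyTable (Value, ValueLength);
GO

-- Both predicates can now be answered from the index.
DECLARE @testValue VARCHAR(50) = 'x';

SELECT t.Id, t.Value
FROM dbo.MyTable t
WHERE t.Value = @testValue
  AND t.ValueLength = DATALENGTH(@testValue);
```

Whether the optimizer actually seeks on this index depends on the data and statistics, so it is worth checking the execution plan before relying on it.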

Resources