I want to get the substring after the last dot (.) of a string in SQL Server.
I have a column which contains file names such as hello.exe, and I want to find the extension of the file exactly as Path.GetExtension("filename") does in C#.
You can use reverse along with substring and charindex to get what you're looking for:
select
reverse(substring(reverse(filename), 1,
charindex('.', reverse(filename))-1)) as FileExt
from
mytable
This holds up even if the file name contains multiple dots (e.g. hello.world.exe will return exe).
So I was playing around a bit with this, and this is another way (only one call to reverse):
select
SUBSTRING(filename,
LEN(filename)-(CHARINDEX('.', reverse(filename))-2), 8000) as FileExt
from
mytable
This calculates 10,000,000 rows in 25 seconds versus 29 seconds for the former method.
DECLARE @originalstring VARCHAR(100)
SET @originalstring = 'hello.exe'
DECLARE @extension VARCHAR(50)
SET @extension = SUBSTRING(@originalstring, CHARINDEX('.', @originalstring) + 1, 999)
SELECT @extension
That should do it, I hope! This works as long as you only have a single '.' in your file name - separating the file name from the extension.
Marc
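To illustrate the single-dot limitation, here is a small sketch that compares a first-dot lookup with a last-dot lookup on the same value. The variable name @name and the sample value are only illustrative.

```sql
DECLARE @name VARCHAR(100) = 'hello.world.exe';

SELECT
    -- everything after the FIRST dot
    SUBSTRING(@name, CHARINDEX('.', @name) + 1, 999) AS FromFirstDot,   -- world.exe
    -- everything after the LAST dot, located by searching the reversed string
    RIGHT(@name, CHARINDEX('.', REVERSE(@name)) - 1) AS FromLastDot;    -- exe
```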
Try this
SELECT RIGHT(
'C:\SomeRandomFile\Filename.dat',
CHARINDEX(
'.',
REVERSE(
'C:\SomeRandomFile\Filename.dat'
),
0)
-1)
Same as the accepted answer, but I've added a condition to avoid an error when filename is null or when filename has no extension (no dot):
select
reverse(substring(reverse(filename), 1,
charindex('.', reverse(filename))-1)) as FileExt
from
mytable
where
filename is not null
and charindex('.',filename) > 0
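If the rows without an extension should stay in the result instead of being filtered out, a CASE expression is one possible alternative. This is only a sketch against inline sample values rather than the mytable from the question; the alias t is made up for the example.

```sql
SELECT
    filename,
    CASE
        WHEN CHARINDEX('.', filename) > 0
            THEN RIGHT(filename, CHARINDEX('.', REVERSE(filename)) - 1)
        ELSE ''   -- names without a dot, and NULL names, end up here
    END AS FileExt
FROM (VALUES ('hello.exe'), ('hello.world.exe'), ('README'), (NULL)) AS t(filename);
```

Whether a missing extension should be reported as an empty string or as NULL is a design choice; replace the ELSE branch accordingly.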
The following SQL query addressed most of the edge cases in my weird database, where many files didn't have extensions.
select distinct reverse(left(reverse(fileNameWithExtension), charindex('.', reverse(fileNameWithExtension)) - 1))
from myTable
where charindex('.', reverse(fileNameWithExtension)) - 1 > 0 and charindex('.', reverse(fileNameWithExtension)) - 1 < 7 and fileNameWithExtension is not null
Related
I am looking for a function that selects English numbers and letters only:
Example:
TEKA תנור ביל דין in HLB-840 P-WH לבן
I want to run a function and get the following result:
TEKA HLB-840 P-WH
I'm using MS SQL Server 2012
What you really need here is regex replacement, which SQL Server does not support. Broadly speaking, you would want to find [^A-Za-z0-9 -]+\s* and then replace with empty string. Here is a demo showing that this works as expected:
Demo
This would output TEKA in HLB-840 P-WH for the input you provided. You might be able to do this in SQL Server using a regex package or UDF. Or, you could do this replacement outside of SQL using any number of tools which support regex (e.g. C#).
SQL-Server is not the right tool for this.
The following might work for you, but there is no guarantee:
declare @yourString NVARCHAR(MAX)=N'TEKA תנור ביל דין in HLB-840 P-WH לבן';
SELECT REPLACE(REPLACE(REPLACE(REPLACE(CAST(@yourString AS VARCHAR(MAX)),'?',''),' ','|~'),'~|',''),'|~',' ');
The idea in short:
A cast of NVARCHAR to VARCHAR will return all characters in your string, which are not known in the given collation, as question marks. The rest is replacements of question marks and multi-blanks.
If your string can include a question mark, you can first replace it with an unused character, which you replace back at the end.
If your string might include either | or ~, you should use other characters for the replacement of multi-blanks.
You can influence this approach by specifying a specific collation, if some characters slip through...
There is no built-in function for this purpose, but you can create your own function, which should be something like this:
--create function (split string, and concatenate required)
CREATE FUNCTION dbo.CleanStringZZZ ( @string VARCHAR(100))
RETURNS VARCHAR(100)
BEGIN
DECLARE @B VARCHAR(100) = '';
WITH t --recursive part to create sequence 1,2,3... but will better to use existing table with index
AS
(
SELECT n = 1
UNION ALL
SELECT n = n+1 --
FROM t
WHERE n <= LEN(@string)
)
SELECT @B = @B+SUBSTRING(@string, t.n, 1)
FROM t
WHERE SUBSTRING(@string, t.n, 1) != '?' --this is just an example...
--WHERE ASCII(SUBSTRING(@string, t.n, 1)) BETWEEN 32 AND 127 --you can use something like this
ORDER BY t.n;
RETURN @B;
END;
and then you can use this function in your select statement:
SELECT dbo.CleanStringZZZ('TEKA תנור ביל דין in HLB-840 P-WH לבן');
create function dbo.AlphaNumericOnly(@string varchar(max))
returns varchar(max)
begin
While PatIndex('%[^a-z0-9]%', @string) > 0
Set @string = Stuff(@string, PatIndex('%[^a-z0-9]%', @string), 1, '')
return @string
end
SELECT NAME
FROM SERVERS
returns:
SDACR.hello.com
SDACR
SDACR\AIR
SDACR.hello.com\WATER
I need the SELECT query for below result:
SDACR
SDACR
SDACR\AIR
SDACR\WATER
Kindly help! I tried using the LEFT and RIGHT functions as below, but was not able to get the combined output correctly:
SELECT
LEFT(Name, CHARINDEX('.', Name) - 1)
FROM
SERVERS
SELECT
RIGHT(Name, LEN(Name) - CHARINDEX('\', Name))
FROM
SERVERS
It looks like you're just trying to REPLACE a substring of characters in your column. You should try this:
SELECT REPLACE(Name,'.hello.com','') AS ReplacementName
FROM SERVERS
In T-SQL, you can concatenate values with CONCAT(), or you can simply add strings together with +.
SELECT LEFT(Name, CHARINDEX('.',Name)-1) + RIGHT(Name,LEN(Name)-CHARINDEX('\',Name)) from SERVERS
Also, be careful when doing arithmetic with CHARINDEX(). For a value without a '.' or a '\', CHARINDEX() returns 0, so LEFT(Name, CHARINDEX('.', Name) - 1) becomes LEFT(Name, -1) and you will get an error.
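One way to guard that arithmetic is to turn the 0 that CHARINDEX() returns for "not found" into NULL and then fall back to the full length. The following is a sketch against inline sample values, not the SERVERS table itself; the alias s is made up for the example.

```sql
SELECT
    Name,
    -- NULLIF maps 0 (no dot) to NULL, NULL - 1 stays NULL,
    -- and ISNULL then falls back to the whole string length
    LEFT(Name, ISNULL(NULLIF(CHARINDEX('.', Name), 0) - 1, LEN(Name))) AS BeforeDot
FROM (VALUES ('SDACR.hello.com'), ('SDACR'), ('SDACR\AIR')) AS s(Name);
-- BeforeDot: SDACR, SDACR, SDACR\AIR
```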
You can use LEFT for this to select everything up to the first period (dot) and add on everything after the last \
declare @servers table ([NAME] varchar(64))
insert into @servers
values
('SDACR.hello.com '),
('SDACR'),
('SDACR\AIR'),
('SDACR.hello.com\WATER')
select
left([NAME],case when charindex('.',[NAME]) = 0 then len([NAME]) else charindex('.',[NAME]) -1 end) +
case when charindex('\',left([NAME],case when charindex('.',[NAME]) = 0 then len([NAME]) else charindex('.',[NAME]) -1 end)) = 0 then right([NAME],charindex('\',reverse([NAME]))) else '' end
from @servers
Throwing my hat in.... Showing how to use Values and APPLY for cleaner code.
-- sample data in an easily consumable format
declare @yourdata table (txt varchar(100));
insert @yourdata values
('SDACR.hello.com'),
('SDACR'),
('SDACR\AIR'),
('SDACR.hello.com\WATER');
-- solution
select
txt,
newTxt =
case
when loc.dot = 0 then txt
when loc.dot > 0 and loc.slash = 0 then substring(txt, 1, loc.dot-1)
else substring(txt, 1, loc.dot-1) + substring(txt, loc.slash, 100)
end
from @yourdata
cross apply (values (charindex('.',txt), (charindex('\',txt)))) loc(dot,slash);
Results
txt newTxt
------------------------------ --------------------
SDACR.hello.com SDACR
SDACR SDACR
SDACR\AIR SDACR\AIR
SDACR.hello.com\WATER SDACR\WATER
In a SQL Server database I use, the database name is something like StackExchange.Audio.Meta, StackExchange.Audio or StackOverflow. By sheer luck this is also the URL for a website. I only need to split it on the dots and reverse it: meta.audio.stackexchange. Add http:// and .com and I'm done. Obviously StackOverflow doesn't need any reversing.
Using the SQL Server 2016 string_split function I can easily split and reorder its result:
select value
from string_split(db_name(),'.')
order by row_number() over( order by (select 1)) desc
This gives me
| Value |
-----------------
| Meta |
| Audio |
| StackExchange |
As I need to have the url in a variable I hoped to concatenate it using this answer so my attempt looks like this:
declare @revname nvarchar(150)
select @revname = coalesce(@revname +'.','') + value
from string_split(db_name(),'.')
order by row_number() over( order by (select 1)) desc
However this only returns me the last value, StackExchange. I already noticed the warnings on that answer that this trick only works for certain execution plans as explained here.
The problem seems to be caused by the order by clause. Without that I get all values, but then in the wrong order. I tried to add the ltrim and rtrim functions as suggested in the Microsoft article, as well as a subquery, but so far without luck.
Is there a way I can nudge the Sql Server 2016 Query Engine to concatenate the ordered result from that string_split in a variable?
I do know I can use for XML or even a plain cursor to get the result I need but I don't want to give up this elegant solution yet.
As I'm running this on the Stack Exchange Data Explorer I can't use functions, as we lack the permission to create those. I can do Stored procedures but I hoped I could evade those.
I prepared a SEDE Query to experiment with. The database names to expect are either without dots, e.g. StackOverflow, with 1 dot, e.g. StackOverflow.Meta, or with 2 dots, e.g. StackExchange.Audio.Meta. The full list of databases is here
I think you are over-complicating things. You could use PARSENAME:
SELECT 'http://' + PARSENAME(db_name(),1) +
ISNULL('.' + PARSENAME(db_name(),2),'') + ISNULL('.'+PARSENAME(db_name(),3),'')
+ '.com'
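To check the PARSENAME approach without switching databases, the same expression can be run against literal names. This is a sketch; the inline VALUES list and the alias d stand in for db_name(), and LOWER is added only to get a conventional lower-case URL.

```sql
SELECT
    name,
    -- PARSENAME numbers the parts from the right: 1 is the last part
    LOWER('http://' + PARSENAME(name, 1)
          + ISNULL('.' + PARSENAME(name, 2), '')
          + ISNULL('.' + PARSENAME(name, 3), '')
          + '.com') AS url
FROM (VALUES ('StackOverflow'),
             ('StackOverflow.Meta'),
             ('StackExchange.Audio.Meta')) AS d(name);
-- url: http://stackoverflow.com
--      http://meta.stackoverflow.com
--      http://meta.audio.stackexchange.com
```

Keep in mind that PARSENAME is meant for object names and only understands names of up to four parts, so it is not a general-purpose splitter.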
This is exactly why I have the Presentation Sequence (PS) in my split function. People often scoff at using a UDF for such items, but it is generally a one-time hit to parse something for later consumption.
Select * from [dbo].[udf-Str-Parse]('meta.audio.stackexchange','.')
Returns
Key_PS Key_Value
1 meta
2 audio
3 stackexchange
The UDF
CREATE FUNCTION [dbo].[udf-Str-Parse] (@String varchar(max),@delimeter varchar(10))
--Usage: Select * from [dbo].[udf-Str-Parse]('meta.audio.stackexchange','.')
-- Select * from [dbo].[udf-Str-Parse]('John Cappelletti was here',' ')
-- Select * from [dbo].[udf-Str-Parse]('id26,id46|id658,id967','|')
Returns @ReturnTable Table (Key_PS int IDENTITY(1,1) NOT NULL , Key_Value varchar(max))
As
Begin
Declare @intPos int,@SubStr varchar(max)
Set @IntPos = CharIndex(@delimeter, @String)
Set @String = Replace(@String,@delimeter+@delimeter,@delimeter)
While @IntPos > 0
Begin
Set @SubStr = Substring(@String, 0, @IntPos)
Insert into @ReturnTable (Key_Value) values (@SubStr)
Set @String = Replace(@String, @SubStr + @delimeter, '')
Set @IntPos = CharIndex(@delimeter, @String)
End
Insert into @ReturnTable (Key_Value) values (@String)
Return
End
Probably less elegant solution but it takes only a few lines and works with any number of dots.
;with cte as (--build xml
select 1 num, cast('<str><s>'+replace(db_name(),'.','</s><s>')+'</s></str>' as xml) str
)
,x as (--make table from xml
select row_number() over(order by num) rn, --add numbers to sort later
t.v.value('.[1]','varchar(50)') s
from cte cross apply cte.str.nodes('str/s') t(v)
)
--combine into string
select STUFF((SELECT '.' + s AS [text()]
FROM x
order by rn desc --in reverse order
FOR XML PATH('')
), 1, 1, '' ) name
Is there a way I can nudge the Sql Server 2016 Query Engine to concatenate the ordered result from that string_split in a variable?
You can just use CONCAT:
DECLARE @URL NVARCHAR(MAX)
SELECT @URL = CONCAT(value, '.', @URL) FROM STRING_SPLIT(DB_NAME(), '.')
SET @URL = CONCAT('http://', LOWER(@URL), 'com');
The reversal is accomplished by the order of parameters to CONCAT. Here's an example.
It changes StackExchange.Garage.Meta to http://meta.garage.stackexchange.com.
This can be used to split and reverse strings in general, but note that it does leave a trailing delimiter. I'm sure you could add some logic or a COALESCE in there to make that not happen.
Also note that vNext will be adding STRING_AGG.
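For later versions, the ordered concatenation can be written without relying on variable assignment at all. The sketch below assumes SQL Server 2022 or Azure SQL Database, where STRING_SPLIT accepts a third enable_ordinal argument and returns an ordinal column; STRING_AGG itself is available from SQL Server 2017. It will not run on SQL Server 2016, and @name is a stand-in for DB_NAME().

```sql
DECLARE @name NVARCHAR(128) = N'StackExchange.Audio.Meta';

SELECT 'http://'
       -- ordinal is the 1-based position of each piece in the original string,
       -- so ordering by it descending reverses the parts deterministically
       + LOWER(STRING_AGG(value, '.') WITHIN GROUP (ORDER BY ordinal DESC))
       + '.com' AS url
FROM STRING_SPLIT(@name, '.', 1);
-- url: http://meta.audio.stackexchange.com
```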
To answer the 'X' of this XY problem, and to address the HTTPS switch (especially for Meta sites) and some other site name changes, I've written the following SEDE query which outputs all site names in the format used on the network site list.
SELECT name,
LOWER('https://' +
IIF(PATINDEX('%.Mathoverflow%', name) > 0,
IIF(PATINDEX('%.Meta', name) > 0, 'meta.mathoverflow.net', 'mathoverflow.net'),
IIF(PATINDEX('%.Ubuntu%', name) > 0,
IIF(PATINDEX('%.Meta', name) > 0, 'meta.askubuntu.com', 'askubuntu.com'),
IIF(PATINDEX('StackExchange.%', name) > 0,
CASE SUBSTRING(name, 15, 200)
WHEN 'Audio' THEN 'video'
WHEN 'Audio.Meta' THEN 'video.meta'
WHEN 'Beer' THEN 'alcohol'
WHEN 'Beer.Meta' THEN 'alcohol.meta'
WHEN 'CogSci' THEN 'psychology'
WHEN 'CogSci.Meta' THEN 'psychology.meta'
WHEN 'Garage' THEN 'mechanics'
WHEN 'Garage.Meta' THEN 'mechanics.meta'
WHEN 'Health' THEN 'medicalsciences'
WHEN 'Health.Meta' THEN 'medicalsciences.meta'
WHEN 'Moderators' THEN 'communitybuilding'
WHEN 'Moderators.Meta' THEN 'communitybuilding.meta'
WHEN 'Photography' THEN 'photo'
WHEN 'Photography.Meta' THEN 'photo.meta'
WHEN 'Programmers' THEN 'softwareengineering'
WHEN 'Programmers.Meta' THEN 'softwareengineering.meta'
WHEN 'Vegetarian' THEN 'vegetarianism'
WHEN 'Vegetarian.Meta' THEN 'vegetarianism.meta'
WHEN 'Writers' THEN 'writing'
WHEN 'Writers.Meta' THEN 'writing.meta'
ELSE SUBSTRING(name, 15, 200)
END + '.stackexchange.com',
IIF(PATINDEX('StackOverflow.%', name) > 0,
CASE SUBSTRING(name, 15, 200)
WHEN 'Br' THEN 'pt'
WHEN 'Br.Meta' THEN 'pt.meta'
ELSE SUBSTRING(name, 15, 200)
END + '.stackoverflow.com',
IIF(PATINDEX('%.Meta', name) > 0,
'meta.' + SUBSTRING(name, 0, PATINDEX('%.Meta', name)) + '.com',
name + '.com'
)
)
)
)
) + '/'
)
FROM sys.databases WHERE database_id > 5
I know how to do parts of this but not all of it. Let's say my table name is REV, the column name is DESCR, and it has a value like
R&B , Semiprivate 2 Beds , Medical/Surgical/GYN
I use
SELECT DESCR, LEFT(DESCR, Charindex(',', DESCR)), SUBSTRING(DESCR, CHARINDEX(',', DESCR) + 1, LEN(DESCR)) from REV
I get 'R&B ,' in one column and 'Semiprivate 2 Beds , Medical/Surgical/GYN' in another column from the above select statement, but I don't know how to select the strings from the second comma onwards.
What I'd like to return is 'R&B' in one column without the comma, 'Semiprivate 2 Beds' in another column, 'Medical/Surgical/GYN' in a third, and so on.
Basically, select the text between commas, and when there is no comma it should be blank.
This should work:
SELECT
LEFT(DESCR, CHARINDEX(',', DESCR)-1),
SUBSTRING(DESCR, CHARINDEX(',', DESCR)+1, CHARINDEX(',', DESCR, CHARINDEX(',', DESCR)+1) - CHARINDEX(',', DESCR) -1 ),
RIGHT(DESCR, CHARINDEX(',', REVERSE(DESCR))-1)
FROM REV
This should work:
SELECT
LEFT(DESCR, CHARINDEX(',', DESCR)-1),
SUBSTRING(DESCR, CHARINDEX(',', DESCR)+1, LEN(DESCR)-CHARINDEX(',', DESCR)-CHARINDEX(',',REVERSE(DESCR ))),
RIGHT(DESCR, CHARINDEX(',', REVERSE(DESCR))-1)
FROM REV
Sample SQL Fiddle
This will split the string, but leave blanks at the beginning and end of the strings; you can use LTRIM and RTRIM to trim away the blanks.
There might be better ways to do this though; see the article Split strings the right way – or the next best way by Aaron Bertrand (which Andrew mentioned in a comment).
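As a sketch of how the trimming and the "blank when there is no comma" requirement could be combined, the query below appends three commas to each value so that the comma positions always exist, then cuts between them. It runs against inline sample values rather than the REV table, and the aliases a, p1, p2, p3 are made up for the example; it assumes at most three parts per value.

```sql
SELECT
    d.DESCR,
    LTRIM(RTRIM(LEFT(a.s, p1.c1 - 1)))                          AS Part1,
    LTRIM(RTRIM(SUBSTRING(a.s, p1.c1 + 1, p2.c2 - p1.c1 - 1)))  AS Part2,
    LTRIM(RTRIM(SUBSTRING(a.s, p2.c2 + 1, p3.c3 - p2.c2 - 1)))  AS Part3
FROM (VALUES ('R&B , Semiprivate 2 Beds , Medical/Surgical/GYN'),
             ('R&B , Semiprivate 2 Beds'),
             ('R&B')) AS d(DESCR)
-- pad with delimiters so the first three commas are always found
CROSS APPLY (VALUES (d.DESCR + ',,,'))                   AS a(s)
CROSS APPLY (VALUES (CHARINDEX(',', a.s)))               AS p1(c1)
CROSS APPLY (VALUES (CHARINDEX(',', a.s, p1.c1 + 1)))    AS p2(c2)
CROSS APPLY (VALUES (CHARINDEX(',', a.s, p2.c2 + 1)))    AS p3(c3);
-- 'R&B' alone yields Part1 = 'R&B' and empty strings for Part2 and Part3
```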
If I have the following nvarchar variable - BTA200, how can I extract just the BTA from it?
Also, if I have varying lengths such as BTA50, BTA030, how can I extract just the numeric part?
I would recommend a combination of PatIndex and Left. Carefully constructed, you can write a query that always works, no matter what your data looks like.
Ex:
Declare @Temp Table(Data VarChar(20))
Insert Into @Temp Values('BTA200')
Insert Into @Temp Values('BTA50')
Insert Into @Temp Values('BTA030')
Insert Into @Temp Values('BTA')
Insert Into @Temp Values('123')
Insert Into @Temp Values('X999')
Select Data, Left(Data, PatIndex('%[0-9]%', Data + '1') - 1)
From @Temp
PatIndex will look for the first character that falls in the range of 0-9, and return its character position, which you can use with the LEFT function to extract the correct data. Note that PatIndex is actually using Data + '1'. This protects us from data where there are no numbers found. If there are no numbers, PatIndex would return 0. In this case, the LEFT function would error because we are using Left(Data, PatIndex - 1). When PatIndex returns 0, we would end up with Left(Data, -1) which raises an error.
There are still ways this can fail. For a full explanation, I encourage you to read:
Extracting numbers with SQL Server
That article shows how to get numbers out of a string. In your case, you want to get alpha characters instead. However, the process is similar enough that you can probably learn something useful out of it.
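The same Data + '1' guard can also be used to pull out the numeric part that the question asks about. This is a sketch against inline sample values; it assumes the digits, if any, form the tail of the value, so anything after the first digit is returned as-is.

```sql
SELECT
    Data,
    -- letters before the first digit
    LEFT(Data, PATINDEX('%[0-9]%', Data + '1') - 1)             AS AlphaPart,
    -- everything from the first digit onwards
    SUBSTRING(Data, PATINDEX('%[0-9]%', Data + '1'), LEN(Data)) AS NumericPart
FROM (VALUES ('BTA200'), ('BTA50'), ('BTA030'), ('BTA'), ('123')) AS t(Data);
-- BTA200 -> 'BTA', '200'
-- BTA    -> 'BTA', ''
-- 123    -> '',    '123'
```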
substring(field, 1,3) will work on your examples.
select substring(field, 1,3) from table
Also, if the alphabetic part is of variable length, you can do this to extract the alphabetic part:
select substring(field, 1, PATINDEX('%[1234567890]%', field) -1)
from table
where PATINDEX('%[1234567890]%', field) > 0
LEFT ('BTA200', 3) will work for the examples you have given, as in :
SELECT LEFT(MyField, 3)
FROM MyTable
To extract the numeric part, you can use this code
SELECT RIGHT(MyField, LEN(MyField) - 3)
FROM MyTable
WHERE MyField LIKE 'BTA%'
--Only have this test if your data does not always start with BTA.
declare @data as varchar(50)
set @data='ciao335'
--get text
Select Left(@Data, PatIndex('%[0-9]%', @Data + '1') - 1) ---->>ciao
--get numeric
Select right(@Data, len(@data) - (PatIndex('%[0-9]%', @Data )-1) ) ---->>335