Remove portion of string TSQL - sql-server

I have a string of numbers that I need to trim a portion from using TSQL.
The string of numbers will always start with a 101 then it will have a set of 0's and a set of random numbers.
Example: 1010000123456
I need to trim the 101 and the set of zeros. This is probably simple, but I'm having all kinds of issues: there is no specific character to search for with CHARINDEX, and the random part can itself contain a sequence such as 001 that I need to keep, which trips up my attempts with PATINDEX and SUBSTRING.

Remove the 101 from the string, cast the rest to bigint (which drops the leading zeros), and then cast back to a string.
select cast(cast(right('1010000123456', len('1010000123456')-3) as bigint) as varchar(20))
select cast(cast(right('1010000103456', len('1010000103456')-3) as bigint) as varchar(20))
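If the digit string could be too long for bigint, a purely string-based variant is possible. The following is only a sketch: the derived table `v` and its column `Raw` are made up for illustration, and it assumes every value starts with a 3-character prefix followed by at least one non-zero digit.

```sql
-- Skip the 3-character prefix, then find the first character that is not '0'.
-- PATINDEX is evaluated on the part after the prefix, so 3 is added back
-- to get a position in the full string.
-- Caveat: if only zeros follow the prefix, PATINDEX returns 0 and the
-- result is wrong, so that case would need separate handling.
SELECT v.Raw,
       SUBSTRING(v.Raw,
                 PATINDEX('%[^0]%', SUBSTRING(v.Raw, 4, LEN(v.Raw))) + 3,
                 LEN(v.Raw)) AS Trimmed
FROM (VALUES ('1010000123456'), ('1010000103456')) AS v(Raw);
```

Because only the zeros directly after the prefix are skipped, zeros inside the remaining digits (as in 103456) are kept.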

Related

Converting varchar into a decimal

How do I turn the values below on the left into the values in brackets? (SQL Server 2012)
50 (000.050)
100 (000.100)
1000 (001.000)
9999 (009.999)
20000 (020.000)
This seems to be rather easy... What have you tried yourself?
First of all: If these values are (integer) numbers, you should not store them in a string column. Any code can break easily, if there are non-numeric values among them...
You can try this:
DECLARE @mockup TABLE(ID INT IDENTITY,YourValue VARCHAR(100))
INSERT INTO @mockup VALUES('50'),('100'),('1000'),('9999'),('20000');
SELECT *
,CAST(YourValue AS DECIMAL(10,3))/1000 AS NewValue
,FORMAT(CAST(YourValue AS DECIMAL(10,3))/1000,'000.000') AS Formatted
FROM @mockup
Casting a string like "1000" to DECIMAL will get a number back. This number can be divided by 1000 to get the value needed.
If you need the format as provided, you can use FORMAT() on SQL-Server 2012+, but this function is known as rather slow... If this is important for you, search for other ways to format a number. There are many examples here on SO...
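One possible alternative to FORMAT() is plain string padding. This is a sketch only, and it assumes the values are non-negative integers of at most six digits, as in the sample data:

```sql
DECLARE @mockup TABLE (YourValue VARCHAR(100));
INSERT INTO @mockup VALUES ('50'), ('100'), ('1000'), ('9999'), ('20000');

-- Left-pad to six digits, then insert a '.' before the last three digits.
-- STUFF with a delete length of 0 inserts without removing anything.
SELECT YourValue,
       STUFF(RIGHT('000000' + YourValue, 6), 4, 0, '.') AS Formatted
FROM @mockup;
```

The result stays a string, so this only helps with display; for arithmetic, the DECIMAL cast above is still the way to go.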

How to remove weird Excel character in SQL Server?

There is a weird whitespace character I can't seem to get rid of that occasionally shows up in my data when importing from Excel. Visibly, it comes across as a whitespace character BUT SQL Server sees it as a question mark (ASCII 63).
declare @temp nvarchar(255); set @temp = 'carolg@c?am.com'
select @temp
returns:
?carolg@c?am.com
How can I get rid of the whitespace without getting rid of real question marks? If I look at the ASCII code for each of those "?" characters I get 63 when in fact, only one of them is a real question mark.
Have a look at this answer for someone with a similar issue. Sorry if this is a bit long winded:
SQL Server seems to flatten Unicode to ASCII by mapping unrepresentable characters (for which there is no suitable substitution) to a question mark. To replicate this, try opening the Character Map Windows program (should be installed on most machines), select Arial as the font and find U+034f "Combining Grapheme Joiner". select this character, copy to clipboard and paste it between the single quotes below:
declare @t nvarchar(10)
set @t = '͏'
select rtrim(ltrim(@t)) -- we can try and trim it, but by this stage it's already a '?'
You'll get a question mark out, because it doesn't know how to represent this non-ASCII character when it casts it to varchar. To force it to accept it as a double-byte character (nvarchar) you need to use N'' instead, as has already been mentioned. Add an N before the quotes above and the question mark disappears (but the original invisible character is preserved in the output - and ltrim and rtrim won't remove it as demonstrated below):
declare @t nvarchar(10),
@s varchar(10) -- note: single-byte string
set @t = rtrim(ltrim(N'͏')) -- trimming doesn't work here either
set @s = @t
select @s -- still outputs a question mark
Imported data can definitely do this, I've seen it before, and characters like the one I've shown above are particularly hard to diagnose because you can't see them! You will need to create some sort of scrubbing process to remove these unprintables (and any other junk characters, for that matter), and make sure that you use nvarchar everywhere, or you'll end up with this issue. Worse, those phantom question marks will become real question marks that you won't be able to distinguish from legitimate ones.
To see what character code you're dealing with, you can cast as varbinary as follows:
declare @t nvarchar(10)
set @t = N'͏test?'
select cast(@t as varbinary) -- returns 0x4F0374006500730074003F00
-- Returns:
-- 0x4F03 7400 6500 7300 7400 3F00
-- badchar t e s t ?
Now to get rid of it:
declare @t nvarchar(10)
set @t = N'͏test?'
select cast(@t as varbinary) -- bad char
set @t = replace(@t COLLATE Latin1_General_100_BIN2, nchar(0x034f), N'');
select cast(@t as varbinary) -- gone!
Note I had to swap the byte order from 0x4f03 to 0x034f (same reason "t" appears in the output as 0x7400, not 0x0074). For some notes on why we're using binary collation, see this answer.
This is kind of messy, because you don't know what the dirty characters are, and they could be one of thousands of possibilities. One option is to iterate over strings using like or even the unicode() function and discard characters in strings that aren't in a list of acceptable characters, but this could be slow. It may be that most of your bad characters are either at the start or end of the string, which might speed this process up if that's an assumption you think you can make.
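The character-by-character approach could look roughly like the sketch below. The whitelist (printable ASCII, code points 32 to 126) is purely an assumption for illustration; a real import would need a list that matches the data, for example one that also allows accented letters.

```sql
-- Build the test string with the invisible U+034F character in front.
DECLARE @in  nvarchar(100) = NCHAR(0x034F) + N'test?';
DECLARE @out nvarchar(100) = N'';
DECLARE @i   int = 1;
DECLARE @c   nchar(1);

-- DATALENGTH / 2 is used instead of LEN so trailing spaces are not skipped.
WHILE @i <= DATALENGTH(@in) / 2
BEGIN
    SET @c = SUBSTRING(@in, @i, 1);
    -- Keep only characters inside the assumed whitelist.
    IF UNICODE(@c) BETWEEN 32 AND 126
        SET @out = @out + @c;
    SET @i = @i + 1;
END

-- The bad character is dropped; the real question mark survives.
SELECT @out AS Scrubbed, CAST(@out AS varbinary(200)) AS Bytes;
```

A loop like this runs once per character, so on large imports it is likely to be slow, which is why doing the scrubbing outside SQL Server may be the better option.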
You may need to build additional processes either external to SQL Server or as part of a SSIS import based on what I've shown you above to strip this out quickly if you have a lot of data to import. If you aren't sure the best way to do this, that's probably best answered in a new question.
I hope that helps.

Query to search an alphanumeric string in a non-alphanumeric column

Here is the issue: I have a database column that holds product serial numbers that are filled in by users, without any kind of filter. For example, the user can fill in the field as DC-538, DC 538 or DC538, depending on his own interpretation, since the serial number is usually on the metal part of the product and it can be difficult to know if there's a blank space, for example.
I can't reformat the current column values, because there are so many brands and we can't know for sure whether removing a non-alphanumeric character would lead to problems, that is, whether a brand considers these characters part of the official number. For example, "DC-538-XXX" and "DC538-XXX" could refer to 2 different products. Very unlikely, but we cannot assume it doesn't happen.
Now I need to offer a search by serial number on my website, but if the user searches for "DC538" instead of "DC 538" he won't find it. What's the best approach?
I believe the perfect solution would be a kind of select that searches for the exact string and also strips the non-alphanumeric characters from the search term and compares it to a stripped string in the database (which I don't have). But I don't know if there's a way to do that with SQL only.
Any ideas ?
Cheers
Use the function below, which was offered as an answer here and is modified to keep numeric characters as well:
CREATE FUNCTION [dbo].[RemoveNonAlphaCharacters] (@Temp VARCHAR(1000))
RETURNS VARCHAR(1000)
AS
BEGIN
    DECLARE @KeepValues AS VARCHAR(MAX)
    SET @KeepValues = '%[^a-z0-9]%'
    -- Remove one unwanted character per iteration until none are left
    WHILE PatIndex(@KeepValues, @Temp) > 0
        SET @Temp = Stuff(@Temp, PatIndex(@KeepValues, @Temp), 1, '')
    RETURN @Temp
END
You can do the following:
DECLARE @Input NVARCHAR(MAX)
SET @Input = '%' + dbo.RemoveNonAlphaCharacters('text inputted by user') + '%'
SELECT *
FROM [Table]
WHERE dbo.RemoveNonAlphaCharacters(ColumnCode) LIKE @Input
Here is a sample working SQLFiddle
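As a quick sanity check, the three spellings from the question can be run through the function. This sketch assumes dbo.RemoveNonAlphaCharacters has been created as above and that the database uses a case-insensitive collation, so that the a-z range also covers upper-case letters; the derived table `v` is made up for illustration.

```sql
-- Each variant should normalize to the same key, so a search for any
-- one of them matches rows stored as any of the others.
SELECT v.Raw,
       dbo.RemoveNonAlphaCharacters(v.Raw) AS Stripped
FROM (VALUES ('DC-538'), ('DC 538'), ('DC538')) AS v(Raw);
```

Note that wrapping the column in a function in the WHERE clause generally prevents an index seek on that column, so on a large table the search may scan every row.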

T-SQL Extract portion of xml or nvarchar(max) column matching pattern

I'd like to do something that I think is fairly trivial using T-SQL / SQL Server 2008 R2, but I can't seem to figure out a way.
If I were in Java, C#, C++, whatever, I would do:
Find position of first occurrence of '123' in string
Execute substring operation from that position getting next 50 characters
So, in SQL Server, I'd basically like:
Find all rows where column (X) contains said string (basically a LIKE clause)
Return 50 characters from that column starting at the said string's location.
Can I do this somehow? I can cast an XML column to nvarchar(max), do a LIKE operation, and do a substring operation, but I don't know how to get the position of the said string in the column in the first place.
Sample content requested in comment
CREATE TABLE SampleTable(xmlData xml);
Pretend the value in one of SampleTable's xmlData rows is as follows. I would like to, for debugging purposes, extract the string from the funny unicode Þ character forward 50 characters (or to the end of the file if that's less than 50).
<RootNode>
<Row>
<NestedNode1>
some text.
</NestedNode1>
<NestedNode2>
123456
</NestedNode2>
<NestedNode3>
Þ Some crazy name with unicode letters. Þ
</NestedNode3>
</Row>
</RootNode>
Are you looking for CHARINDEX?
;WITH CTE AS(
SELECT CAST (xmlData as nvarchar(max)) as X
FROM SampleTable
)
SELECT SUBSTRING(X,CHARINDEX(N'Þ',X),50) as [String]
FROM CTE
WHERE CHARINDEX(N'Þ',X)>0
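A self-contained way to try this is sketched below, using a table variable in place of SampleTable and a shortened version of the sample XML. The COLLATE clause is an optional extra that the answer does not use: under a linguistic collation some characters may be treated as equivalent to others, whereas a binary collation compares code points exactly.

```sql
DECLARE @SampleTable TABLE (xmlData xml);
INSERT INTO @SampleTable (xmlData)
VALUES (N'<RootNode><Row><NestedNode3>Þ Some crazy name with unicode letters. Þ</NestedNode3></Row></RootNode>');

;WITH CTE AS (
    SELECT CAST(xmlData AS nvarchar(max)) AS X
    FROM @SampleTable
)
-- SUBSTRING simply returns fewer characters if less than 50 remain.
SELECT SUBSTRING(X, CHARINDEX(N'Þ', X COLLATE Latin1_General_100_BIN2), 50) AS [String]
FROM CTE
WHERE CHARINDEX(N'Þ', X COLLATE Latin1_General_100_BIN2) > 0;
```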

Get a number from a sql string range

I have a column of data that contains a percentage range as a string that I'd like to convert to a number so I can do easy comparisons.
Possible values in the string:
'<5%'
'5-10%'
'10-15%'
...
'95-100%'
I'd like to convert this in my select where clause to just the first number, 5, 10, 15, etc. so that I can compare that value to a passed in "at least this" value.
I've tried a bunch of variations on substring, charindex, convert, and replace, but I still can't seem to get something that works in all combinations.
Any ideas?
Try this,
SELECT substring(replace(interest , '<',''), patindex('%[0-9]%',replace(interest , '<','')), patindex('%[^0-9]%',replace(interest, '<',''))-1) FROM table1
Tested at my end and it works, it's only my first try so you might be able to optimise it.
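To use the extracted number in a comparison, it still has to be cast to an integer. A possible sketch of that, with made-up sample rows in place of table1 and a hypothetical @AtLeast parameter, assuming every value contains at least one non-digit character after the first number (here the '-' or '%'):

```sql
DECLARE @AtLeast int = 10;

SELECT v.interest, n.FirstNumber
FROM (VALUES ('<5%'), ('5-10%'), ('10-15%'), ('95-100%')) AS v(interest)
-- Strip the leading '<' so every value starts with a digit.
CROSS APPLY (SELECT REPLACE(v.interest, '<', '') AS cleaned) AS c
-- Take the digits up to the first non-digit character and cast them.
CROSS APPLY (SELECT CAST(LEFT(c.cleaned, PATINDEX('%[^0-9]%', c.cleaned) - 1) AS int) AS FirstNumber) AS n
WHERE n.FirstNumber >= @AtLeast;
```

With these rows, '<5%' yields 5 rather than 0, which matches the "just the first number" requirement in the question.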
@Martin: Your solution works.
Here is another I came up with, inspired by @mercutio:
select cast(replace(replace(replace(interest,'<',''),'%',''),'-','.0') as numeric) test
from table1 where interest is not null
You can convert char data to other char types (for example, char(10) to varchar(10)), but a string such as '5-10%' that is not purely numeric cannot be converted directly to an integer in SQL Server; the conversion fails.
I don't know if this works in SQL Server, but within MySQL, you can use several tricks to convert character data into numbers. Examples from your sample data:
"<5%" => 0
"5-10%" => 5
"95-100%" => 95
Obviously this fails your first test case, but some clever string replacements at the start of the string would be enough to get it working.
One example of converting character data into numbers:
SELECT "5-10%" + 0 AS foo ...
Might not work in SQL Server, but future searches may help the odd MySQL user :-D
You can do this in SQL Server with a cursor. If you can create a CLR function to pull out number groupings, that will help. It's possible in T-SQL; it will just be ugly.
Create the cursor to loop over the list.
Find the first number. If there is only one number group in there, return it. Otherwise find the second number group.
If only the first group is returned and it's the first item in the list, set it as the upper bound.
If only the first group is returned and it's the last item in the list, set it as the lower bound.
Otherwise set the first group as the lower bound and the second group as the upper bound.
Then write the resulting values back to a table.
The issue you are having is a symptom of not keeping the data atomic. In this case it looks purely unintentional (Legacy) but here is a link about it.
To design yourself out of this create a range_lookup table:
Create table rangeLookup(
rangeID int -- or rangeCD or not at all
,rangeLabel varchar(50)
,LowValue int--real or whatever
,HighValue int
)
To hack your way out, here are some pseudo steps. This will be a deeply nested mess.
Normalize your input by replacing all your crazy characters.
replace(replace(rangeLabel,'%',''),'<','')
--This will entail many nested replace statements.
Add a CASE and CHARINDEX to look for a space; if there is none, you have your number,
else use your substring to take everything before the first ' '.
-- These steps are wrapped around the previous step.
It's complicated, but for the test cases you provided, this works. Just replace @Test with the column you are looking in from your table.
DECLARE @Test varchar(10)
set @Test = '<5%'
--set @Test = '5-10%'
--set @Test = '10-15%'
--set @Test = '95-100%'
Select CASE WHEN
Substring(@Test,1,1) = '<'
THEN
0
ELSE
CONVERT(integer,SUBSTRING(@Test,1,CHARINDEX('-',@Test)-1))
END
AS LowerBound
,
CASE WHEN
Substring(@Test,1,1) = '<'
THEN
CONVERT(integer,Substring(@Test,2,CHARINDEX('%',@Test)-2))
ELSE
CONVERT(integer,Substring(@Test,CHARINDEX('-',@Test)+1,CHARINDEX('%',@Test)-CHARINDEX('-',@Test)-1))
END
AS UpperBound
You'd probably be much better off changing <5% and 5-10% to store 2 values in 2 fields. Instead of storing <5%, you would store 0 and 5, and instead of 5-10%, you'd end up with 5 and 10. That gives you 2 columns, one called lowerbound and one called upperbound, and then you just check value >= lowerbound AND value < upperbound.
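With the bounds stored as numbers, the comparison becomes a plain predicate. The sketch below uses a table variable with made-up rows and hypothetical @Value and @AtLeast parameters; the column names follow the lowerbound/upperbound naming suggested above.

```sql
DECLARE @ranges TABLE (rangeLabel varchar(50), lowerbound int, upperbound int);
INSERT INTO @ranges VALUES
    ('<5%',      0,   5),
    ('5-10%',    5,  10),
    ('10-15%',  10,  15),
    ('95-100%', 95, 100);

-- Which range does a given value fall into?
DECLARE @Value int = 7;
SELECT rangeLabel
FROM @ranges
WHERE @Value >= lowerbound AND @Value < upperbound;

-- Which ranges start at or above an "at least this" threshold?
DECLARE @AtLeast int = 10;
SELECT rangeLabel
FROM @ranges
WHERE lowerbound >= @AtLeast;
```

No string parsing is needed at query time, and the numeric columns can be indexed.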
