Extract the highest number out of a row of strings - sql-server

I have data in a #temp_table like this:
SCORM_VARNAME
1) cmi.interactions.0.id
2) cmi.interactions.1.id
3) cmi.interactions.10.id
4) cmi.interactions.5.id
5) cmi.interactions.8.id
etc etc...
In the above example, the number I am ULTIMATELY wanting to save in a variable is 10.
I want to return the highest number that appears in this table. It could be anywhere from 0 to 100000 (though I doubt it gets that high), so it can't be hardcoded; in some cases it might only go up to 3.
I have the following code that basically takes whatever string you give it, and CORRECTLY extracts the number, which is great!
DECLARE @string varchar(100)
SET @string = 'cmi.interactions.10.id'
WHILE PATINDEX('%[^0-9]%',@string) <> 0
SET @string = STUFF(@string,PATINDEX('%[^0-9]%',@string),1,'')
SELECT @string
Except I am not sure how to run this function across all the rows in the temp_table and to either replace each row with the highest number, or to return the highest number.
So either I need to figure out a way to return the highest number found in the table, or if you guys could help me by figuring out some way to UPDATE the temp_table, and replacing each string with the number found in it... I would be able to do the rest.

set @v = (
select max(cast(replace(replace(s, 'cmi.interactions.', ''), '.id', '') as int))
from T
);

Late answer ... Just another option is parsename()
Example
Declare @YourTable Table ([SCORM_VARNAME] varchar(50))
Insert Into @YourTable Values
('cmi.interactions.0.id')
,('cmi.interactions.1.id')
,('cmi.interactions.10.id')
,('cmi.interactions.5.id')
,('cmi.interactions.8.id')
Select *
,Value = try_convert(int,parsename(Scorm_Varname,2))
From @YourTable
Returns
SCORM_VARNAME Value
cmi.interactions.0.id 0
cmi.interactions.1.id 1
cmi.interactions.10.id 10
cmi.interactions.5.id 5
cmi.interactions.8.id 8
For the MAX
Select Value = max(try_convert(int,parsename(Scorm_Varname,2)))
From @YourTable
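To tie this back to the question, here is a minimal sketch that stores the result in a variable. The temp table definition and the variable name @max_index are assumptions for illustration, and TRY_CONVERT needs SQL Server 2012 or later.

```sql
-- Hypothetical stand-in for the asker's temp table
CREATE TABLE #temp_table (SCORM_VARNAME varchar(100));
INSERT INTO #temp_table VALUES
    ('cmi.interactions.0.id'),
    ('cmi.interactions.1.id'),
    ('cmi.interactions.10.id'),
    ('cmi.interactions.5.id'),
    ('cmi.interactions.8.id');

DECLARE @max_index int;  -- illustrative name

-- PARSENAME splits on dots (at most four parts); part 2 counts from the right.
-- TRY_CONVERT yields NULL instead of an error for non-numeric parts,
-- and MAX ignores NULLs.
SELECT @max_index = MAX(TRY_CONVERT(int, PARSENAME(SCORM_VARNAME, 2)))
FROM #temp_table;

SELECT @max_index AS max_index;  -- 10 for the sample rows

DROP TABLE #temp_table;
```

PARSENAME returns NULL for strings with more than four dot-separated parts, so this only fits names shaped like the ones in the question.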

Related

Need help removing functions from CASE WHEN

I have a situation where I have created a script to select data in our company's environment. In doing so, I decided to use functions for some pattern matching and stripping of characters in a CASE WHEN.
However, one of our clients doesn't want to let us put their data in our local environment, so I now have the requirement of massaging the script to be able to run on their environment--essentially meaning I need to remove the functions, and I am having trouble thinking about how I need to move stuff around to do so.
An example of the function call would be:
SELECT ....
CASE WHEN Prp = 'Key Cabinet'
AND SerialNumber IS NOT NULL
AND dbo.fnRemoveNonNumericCharacters(SerialNumber) <> ''
THEN dbo.fnRemoveNonNumericCharacters(SerialNumber)
....
INTO #EmpProperty
FROM ....
Where Prp is a column that contains the property type and SerialNumber is a column that contains a serial number, but also some other random garbage because data entry was sloppy.
The function definition is:
WHILE PATINDEX('%[^0-9]%', @strText) > 0
BEGIN
SET @strText = STUFF(@strText, PATINDEX('%[^0-9]%', @strText), 1, '')
END
RETURN @strText
where @strText is the SerialNumber I am passing in.
I may be stuck in analysis paralysis because I just can't figure out a good way to do this. I don't need a full-on solution per se; perhaps just point me in a direction you know will work. Let me know if you would like some sample DDL/DML to mess around with.
Example 'SerialNumber' values: CA100 (Trash bins), T110, 101B.
There are also a bunch of other types of values, such as all text or all numbers, but we are filtering those out. The current pattern matching is good enough.
So I think you mean you can't use a function... so, perhaps:
declare @table table (SomeCol varchar(4000))
insert into @table values
('1 ab2cdefghijk3lmnopqr4stuvwxyz5 6 !7@#$8%^&9*()-10_=11+[]{}12\|;:13></14? 15'),
('CA100 (Trash bins), T110, 101B')
;with cte as (
select top (100)
N=row_number() over (order by @@spid) from sys.all_columns),
Final as (
select SomeCol, Col
from @table
cross apply (
select (select X + ''
from (select N, substring(SomeCol, N, 1) X
from cte
where N<=datalength(SomeCol)) [1]
where X between '0' and '9'
order by N
for xml path(''))
) Z (Col)
where Z.Col is not NULL
)
select
SomeCol
,cast(Col as varchar) CleanCol --change this to BIGINT if it isn't too large
from Final
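As a rough sketch of how the same idea could be folded into the CASE WHEN from the question without any function call: the @props table, its sample rows, and the names Digits and CleanSerial are assumptions for illustration, and the 100-row tally caps the serial number length at 100 characters.

```sql
-- Hypothetical sample data standing in for the real source table
DECLARE @props TABLE (Prp varchar(50), SerialNumber varchar(100));
INSERT INTO @props VALUES
    ('Key Cabinet', 'CA100 (Trash bins)'),
    ('Key Cabinet', 'T110'),
    ('Key Cabinet', 'no digits'),
    ('Laptop',      '101B'),
    ('Key Cabinet', NULL);

SELECT p.Prp,
       p.SerialNumber,
       -- Same shape as the original CASE, but reading the inline result
       CASE WHEN p.Prp = 'Key Cabinet' AND d.Digits <> ''
            THEN d.Digits
       END AS CleanSerial
FROM @props AS p
CROSS APPLY (
    -- Keep only the digit characters, in order, and glue them back together.
    -- ISNULL turns "no digits found" (or a NULL serial) into an empty string.
    SELECT ISNULL((
        SELECT SUBSTRING(p.SerialNumber, n.N, 1)
        FROM (SELECT TOP (100) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS N
              FROM sys.all_columns
              ORDER BY N) AS n
        WHERE n.N <= LEN(p.SerialNumber)
          AND SUBSTRING(p.SerialNumber, n.N, 1) LIKE '[0-9]'
        ORDER BY n.N
        FOR XML PATH('')
    ), '') AS Digits
) AS d;
```

Because the digit stripping is computed once per row in the CROSS APPLY, the CASE no longer has to repeat the expression in both the condition and the THEN branch.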

Find and change specific number in first part out of string and replace it

I have started working on an old project, and there is a SQL Server database column that stores article numbers, for example:
11.1006.45
11.1006.46
11.1006.47
01.10012.11
01.10012.12
2.234.1
2.234.2
2.234.3
657.104324.32
Every number contains three parts. The first part identifies the producer, and that is the part I have to change when the user chooses a different number for a specific producer. For example, producer number 2 will now be 13, so in our examples:
2.234.1
2.234.2
2.234.3
has to become:
13.234.1
13.234.2
13.234.3
I am looking for a SQL query that finds all records where the producer number is, e.g., 2.xxxxx and replaces it with 13.xxxxx. I would like this query to be safe, so that it avoids any issues with number replacements. I hope you understand what I meant.
You could use this for the update. '2.' and '13.' could be any other strings.
DECLARE @SampleTable AS TABLE
(
Version varchar(100)
)
INSERT INTO @SampleTable
VALUES
('11.1006.45'),
('11.1006.46'),
('11.1006.47'),
('01.10012.11'),
('01.10012.12'),
('2.234.1'),
('2.234.2'),
('2.234.3'),
('657.104324.32')
UPDATE @SampleTable
SET
Version = '13.' + substring(Version, charindex('.', Version) + 1, len(Version) - charindex('.', Version))
WHERE Version LIKE '2.%'
SELECT * FROM @SampleTable st
update t
set t.yourcol = replace(t.yourcol, substring(t.yourcol, 1, charindex('.', t.yourcol, 1)), '13.')
from yourtable t
This finds everything up to and including the first dot:
substring(yourcol, 1, charindex('.', yourcol, 1))
Then you use replace to swap it for whatever you need.
You can use this query for multiple updates:
DECLARE @Temp AS TABLE
(
ArtNo VARCHAR(100)
)
INSERT INTO @Temp
VALUES
('11.1006.45'),
('11.1006.46'),
('11.1006.47'),
('01.10012.11'),
('01.10012.12'),
('2.234.1'),
('2.234.2'),
('2.234.3'),
('657.104324.32')
UPDATE @Temp
SET ArtNo = CASE WHEN SUBSTRING(ArtNo,1,CHARINDEX('.',ArtNo)-1) = '2' THEN STUFF(ArtNo,1,CHARINDEX('.',ArtNo)-1,'13')
WHEN SUBSTRING(ArtNo,1,CHARINDEX('.',ArtNo)-1) = '11' THEN STUFF(ArtNo,1,CHARINDEX('.',ArtNo)-1,'15')
ELSE ArtNo
END
SELECT * FROM @Temp
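For a single producer change, a parameterised variant might look like the sketch below. The variable names @old and @new, the @articles table, and the extra row '12.2.5' (added to show that a 2 elsewhere in the string is left alone) are assumptions for illustration.

```sql
DECLARE @old varchar(10) = '2';   -- producer number to find (from the question)
DECLARE @new varchar(10) = '13';  -- producer number to write

-- Hypothetical sample table; '12.2.5' is not in the question
DECLARE @articles TABLE (ArtNo varchar(100));
INSERT INTO @articles VALUES
    ('11.1006.45'), ('01.10012.11'), ('2.234.1'),
    ('2.234.2'), ('12.2.5'), ('657.104324.32');

-- Match only when the string starts with the old number followed by a dot,
-- then overwrite exactly that leading number with STUFF.
UPDATE @articles
SET ArtNo = STUFF(ArtNo, 1, LEN(@old), @new)
WHERE LEFT(ArtNo, LEN(@old) + 1) = @old + '.';

SELECT ArtNo FROM @articles;
```

Comparing LEFT(...) against @old + '.' avoids LIKE wildcards entirely, and STUFF only touches the leading characters, so later parts of the article number cannot be changed by accident.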

SQL Server : converting varchar to INT

I am stuck on converting a varchar column UserID to INT. I know, please don't ask why this UserID column was not created as INT initially, long story.
So I tried this, but it doesn't work and gives me an error:
select CAST(userID AS int) from audit
Error:
Conversion failed when converting the varchar value
'1581............................................................................................................................' to data type int.
I did select len(userID) from audit and it returns 128 characters, which are not spaces.
I tried to detect the ASCII codes of the characters trailing the ID number, and the ASCII value is 0.
I have also tried LTRIM, RTRIM, and replacing CHAR(0) with '', but that does not work.
The only way it works is when I specify a fixed number of characters, as below, but UserID is not always 4 characters.
select CAST(LEFT(userID, 4) AS int) from audit
You could try updating the table to get rid of these characters:
UPDATE dbo.[audit]
SET UserID = REPLACE(UserID, CHAR(0), '')
WHERE CHARINDEX(CHAR(0), UserID) > 0;
But then you'll also need to fix whatever is putting this bad data into the table in the first place. In the meantime perhaps try:
SELECT CONVERT(INT, REPLACE(UserID, CHAR(0), ''))
FROM dbo.[audit];
But that is not a long term solution. Fix the data (and the data type while you're at it). If you can't fix the data type immediately, then you can quickly find the culprit by adding a check constraint:
ALTER TABLE dbo.[audit]
ADD CONSTRAINT do_not_allow_stupid_data
CHECK (CHARINDEX(CHAR(0), UserID) = 0);
EDIT
Ok, so that is definitely a 4-digit integer followed by six instances of CHAR(0). And the workaround I posted definitely works for me:
DECLARE @foo TABLE(UserID VARCHAR(32));
INSERT @foo SELECT 0x31353831000000000000;
-- this succeeds:
SELECT CONVERT(INT, REPLACE(UserID, CHAR(0), '')) FROM @foo;
-- this fails:
SELECT CONVERT(INT, UserID) FROM @foo;
Please confirm that this code on its own (well, the first SELECT, anyway) works for you. If it does then the error you are getting is from a different non-numeric character in a different row (and if it doesn't then perhaps you have a build where a particular bug hasn't been fixed). To try and narrow it down you can take random values from the following query and then loop through the characters:
SELECT UserID, CONVERT(VARBINARY(32), UserID)
FROM dbo.[audit]
WHERE UserID LIKE '%[^0-9]%';
So take a random row, and then paste the output into a query like this:
DECLARE @x VARCHAR(32), @i INT;
SET @x = CONVERT(VARCHAR(32), 0x...); -- paste the value here
SET @i = 1;
WHILE @i <= LEN(@x)
BEGIN
PRINT RTRIM(@i) + ' = ' + RTRIM(ASCII(SUBSTRING(@x, @i, 1)))
SET @i = @i + 1;
END
This may take some trial and error before you encounter a row that fails for some other reason than CHAR(0) - since you can't really filter out the rows that contain CHAR(0) because they could contain CHAR(0) and CHAR(something else). For all we know you have values in the table like:
SELECT '15' + CHAR(9) + '23' + CHAR(0);
...which also can't be converted to an integer, whether you've replaced CHAR(0) or not.
I know you don't want to hear it, but I am really glad this is painful for people, because now they have more war stories to push back when people make very poor decisions about data types.
This question has got 91,000 views so perhaps many people are looking for a more generic solution to the issue in the title "error converting varchar to INT"
If you are on SQL Server 2012+ one way of handling this invalid data is to use TRY_CAST
SELECT TRY_CAST (userID AS INT)
FROM audit
On previous versions you could use
SELECT CASE
WHEN ISNUMERIC(RTRIM(userID) + '.0e0') = 1
AND LEN(userID) <= 11
THEN CAST(userID AS INT)
END
FROM audit
Both return NULL if the value cannot be cast.
In the specific case that you have in your question with known bad values I would use the following however.
CAST(REPLACE(userID COLLATE Latin1_General_Bin, CHAR(0),'') AS INT)
Trying to replace the null character is often problematic except if using a binary collation.
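Putting the two ideas together, a diagnostic query along these lines could list only the rows that fail conversion, with their raw bytes and the result after stripping CHAR(0). This is a sketch on SQL Server 2012+; the @audit table variable and its sample values are made up for illustration.

```sql
-- Hypothetical sample data; the real table is dbo.audit in the question
DECLARE @audit TABLE (userID varchar(128));
INSERT INTO @audit VALUES
    ('1581'),
    ('1581' + CHAR(0) + CHAR(0)),
    ('15x23'),
    ('abc');

SELECT userID,
       -- Raw bytes make invisible characters such as 0x00 easy to spot
       CONVERT(varbinary(128), userID) AS raw_bytes,
       -- Binary collation so that REPLACE actually matches CHAR(0)
       TRY_CAST(REPLACE(userID COLLATE Latin1_General_BIN, CHAR(0), '') AS int)
           AS after_null_strip
FROM @audit
WHERE userID IS NOT NULL
  AND TRY_CAST(userID AS int) IS NULL;  -- only rows that fail as-is
```

Rows where after_null_strip is still NULL contain some other non-numeric character and need a closer look.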
This is more for someone Searching for a result, than the original post-er. This worked for me...
declare @value varchar(max) = 'sad';
select sum(cast(iif(isnumeric(@value) = 1, @value, 0) as bigint));
returns 0
declare @value varchar(max) = '3';
select sum(cast(iif(isnumeric(@value) = 1, @value, 0) as bigint));
returns 3
I would try trimming the number to see what you get:
select len(rtrim(ltrim(userid))) from audit
If that returns the correct value, then just do:
select convert(int, rtrim(ltrim(userid))) from audit
If that doesn't return the correct value, then I would do a replace to remove the empty space:
select convert(int, replace(userid, char(0), '')) from audit
This is how I solved the problem in my case:
First of all I made sure the column I need to convert to integer doesn't contain any spaces:
update data set col1 = TRIM(col1)
I also checked whether the column only contains numeric digits.
You can check it by:
select * from data where col1 like '%[^0-9]%' order by col1
If any nonnumeric values are present, you can save them to another table and remove them from the table you are working on.
select * into nonnumeric_data from data where col1 like '%[^0-9]%'
delete from data where col1 like '%[^0-9]%'
The problems with my data were the cases above. After fixing them, I created a bigint column and copied the values of the varchar column into it.
alter table data add int_col1 bigint
update data set int_col1 = CAST(col1 AS BIGINT)
This worked for me, hope you find it useful as well.

Detecting changes in SQL Server 2000 table data

I have a periodic check of a certain query (which by the way includes multiple tables) to add informational messages to the user if something has changed since the last check (once a day).
I tried to make it work with checksum_agg(binary_checksum(*)), but it does not help, so this question doesn't help much, because I have the following case (oversimplified):
select checksum_agg(binary_checksum(*))
from
(
select 1 as id,
1 as status
union all
select 2 as id,
0 as status
) data
and
select checksum_agg(binary_checksum(*))
from
(
select 1 as id,
0 as status
union all
select 2 as id,
1 as status
) data
Both of the above cases result in the same check-sum, 49, and it is clear that the data has been changed.
This doesn't have to be a simple function or a simple solution, but I need some way to uniquely identify the difference like these in SQL server 2000.
checksum_agg appears to simply add the results of binary_checksum together for all rows. Although each row has changed, the sum of the two checksums has not (i.e. 17+32 = 16+33). This is not really the norm for checking for updates, but the recommendations I can come up with are as follows:
1. Instead of using checksum_agg, concatenate the checksums into a delimited string, and compare strings, along the lines of SELECT CAST(binary_checksum(*) AS VARCHAR) + ',' FROM MyTable FOR XML PATH(''). This is a much longer string to check and to store, but there is much less chance of a false positive comparison.
2. Instead of using the built-in checksum routine, use HASHBYTES to calculate MD5 checksums in 8000 byte blocks, and xor the results together. This will give you a much more resilient checksum, although still not bullet-proof (i.e. it is still possible to get false matches, but very much less likely). I'll paste the HASHBYTES demo code that I wrote below.
3. The last option, and absolute last resort, is to actually store the table in XML format, and compare that. This is really the only way you can be absolutely certain of no false matches, but it is not scalable and involves storing and comparing large amounts of data.
Every approach, including the one you started with, has pros and cons, with varying degrees of data size and processing requirements against accuracy. Depending on what level of accuracy you require, use the appropriate option. The only way to get 100% accuracy is to store all of the table data.
Alternatively, you can add a date_modified field to each table, which is set to GetDate() using after insert and update triggers. You can do SELECT COUNT(*) FROM #test WHERE date_modified > @date_last_checked. This is a more common way of checking for updates. The downside of this one is that deletions cannot be tracked.
Another approach is to create a modified table, with table_name (VARCHAR) and is_modified (BIT) fields, containing one row for each table you wish to track. Using insert, update and delete triggers, the flag against the relevant table is set to True. When you run your schedule, you check and reset the is_modified flag (in the same transaction), along the lines of UPDATE tblModified SET @is_modified = is_modified, is_modified = 0
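A minimal sketch of that flag-table approach follows. The table dbo.MyTable, its columns, and the trigger name are hypothetical; only tblModified, table_name and is_modified come from the description above.

```sql
-- Hypothetical tracked table and the flag table
CREATE TABLE dbo.MyTable (id INT, status INT);
CREATE TABLE dbo.tblModified (
    table_name  VARCHAR(128) PRIMARY KEY,
    is_modified BIT NOT NULL
);
INSERT INTO dbo.tblModified (table_name, is_modified) VALUES ('MyTable', 0);
GO

-- Any insert, update or delete statement on MyTable raises the flag
CREATE TRIGGER dbo.trg_MyTable_modified ON dbo.MyTable
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE dbo.tblModified SET is_modified = 1 WHERE table_name = 'MyTable';
END
GO

-- Scheduled check: read the old flag and reset it in one atomic statement
DECLARE @is_modified BIT;
UPDATE dbo.tblModified
SET @is_modified = is_modified,  -- variable receives the pre-update value
    is_modified  = 0
WHERE table_name = 'MyTable';

SELECT @is_modified AS changed_since_last_check;
```

Doing the read and the reset in a single UPDATE means a change that lands between "check" and "reset" cannot be lost, which is the point of keeping both in the same transaction.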
The following script generates three result sets, each corresponding with the numbered list earlier in this response. I have commented which output correspond with which option, just before the SELECT statement. To see how the output was derived, you can work backwards through the code.
-- Create the test table and populate it
CREATE TABLE #Test (
f1 INT,
f2 INT
)
INSERT INTO #Test VALUES(1, 1)
INSERT INTO #Test VALUES(2, 0)
INSERT INTO #Test VALUES(2, 1)
/*******************
OPTION 1
*******************/
SELECT CAST(binary_checksum(*) AS VARCHAR) + ',' FROM #test FOR XML PATH('')
-- Declaration: Input and output MD5 checksums (@in and @out), input string (@input), and counter (@i)
DECLARE @in VARBINARY(16), @out VARBINARY(16), @input VARCHAR(MAX), @i INT
-- Initialize @input string as the XML dump of the table
-- Use this as your comparison string if you choose to not use the MD5 checksum
SET @input = (SELECT * FROM #Test FOR XML RAW)
/*******************
OPTION 3
*******************/
SELECT @input
-- Initialise counter and output MD5.
SET @i = 1
SET @out = 0x00000000000000000000000000000000
WHILE @i <= LEN(@input)
BEGIN
-- calculate MD5 for this batch (SUBSTRING stops at the end of the string, so the last block may be shorter)
SET @in = HASHBYTES('MD5', SUBSTRING(@input, @i, 8000))
-- xor the results with the output
SET @out = CAST(CAST(SUBSTRING(@in, 1, 4) AS INT) ^ CAST(SUBSTRING(@out, 1, 4) AS INT) AS VARBINARY(4)) +
CAST(CAST(SUBSTRING(@in, 5, 4) AS INT) ^ CAST(SUBSTRING(@out, 5, 4) AS INT) AS VARBINARY(4)) +
CAST(CAST(SUBSTRING(@in, 9, 4) AS INT) ^ CAST(SUBSTRING(@out, 9, 4) AS INT) AS VARBINARY(4)) +
CAST(CAST(SUBSTRING(@in, 13, 4) AS INT) ^ CAST(SUBSTRING(@out, 13, 4) AS INT) AS VARBINARY(4))
SET @i = @i + 8000
END
/*******************
OPTION 2
*******************/
SELECT @out

How to get MAX value of numeric values in varchar column

I have a table with a nvarchar column. This column has values for example:
983
294
a343
a3546f
and so on.
I would like to take the MAX of these values, not as text but as numbers. So in this example the numbers are:
983
294
343
3546
And the MAX value is the last one - 3546. How to do this in TSQL on Microsoft SQL?
First install a regular expression function. This article has code you can cut/paste.
Then with RegexReplace (from that article) you can extract digits from a string:
dbo.RegexReplace( '.*?(\d+).*', myField, '$1' )
Then convert this string to a number:
CAST( dbo.RegexReplace( '.*?(\d+).*', myField, '$1' ) AS INT )
Then use this expression inside a MAX() function in a SELECT.
You can try to keep it simple without using Regular Expression
Here is the source
create table #t ( val varchar(100) )
insert #t select 983
insert #t select 294
insert #t select 'a343'
insert #t select 'a3546f';
GO
;with ValueRange as (
select val,
[from] = patindex('%[0-9]%', val),
[to] = case patindex('%[a-z]', val)
when 0 then len(val)
else patindex('%[a-z]', val) - patindex('%[0-9]%', val)
end
from #t
)
select substring(val, [from], [to]) as val
from ValueRange VR
order by cast(substring(val, [from], [to]) as int) desc
CAST() would do the trick, probably.
SELECT MAX(CAST(yourColumn AS int)) AS maxColumns FROM yourTable
Edit.
I didn't read the whole question, as it seems...
-- Function to strip out non-numeric chars
ALTER FUNCTION dbo.UDF_ParseNumericChars
(
@string VARCHAR(8000)
)
RETURNS VARCHAR(8000)
AS
BEGIN
DECLARE @IncorrectCharLoc SMALLINT
--SET @IncorrectCharLoc = PATINDEX('%[^0-9A-Za-z]%', @string)
SET @IncorrectCharLoc = PATINDEX('%[^0-9.]%', @string)
WHILE @IncorrectCharLoc > 0
BEGIN
SET @string = STUFF(@string, @IncorrectCharLoc, 1, '')
SET @IncorrectCharLoc = PATINDEX('%[^0-9.]%', @string)
END
SET @string = @string
RETURN @string
END
GO
I picked it from here. (I voted up the reg exp answer though)
you can write a function something like
create FUNCTION [dbo].[getFirstNumeric](
@s VARCHAR(50)
)
RETURNS int AS
BEGIN
set @s = substring(@s,patindex('%[0-9]%',@s),len(@s)-patindex('%[0-9]%',@s) + 1)
if patindex('%[^0-9]%',@s) = 0
return @s
set @s = substring(@s,1,patindex('%[^0-9]%',@s)-1)
return cast(@s as int)
end
and then call
select max(dbo.getFirstNumeric(yourColumn)) from yourTable
If you are using SQL Server 2005 or newer, you can also use the solution posted by Sung Meister.
As far as I know you would need to create a process (or user defined function) to scrub the column, so that you can actually convert it to an INT or other appropriate datatype, then you can take the max of that.
Use a user-defined function to parse the value to an int, and then run the select.
SELECT MAX(dbo.parseVarcharToInt(column)) FROM table
SELECT dbo.RegexReplace('[^0-9]', '','a5453b',1, 1)
and RegexReplace installation like Jason Cohen said
This is an old question, I know - but to add to the knowledge base for others...
Assuming all your values have at least one digit in them, and nothing but digits from the first digit onward:
Select max(convert(int, SubString(VarName, PATINDEX('%[0-9]%',VarName), Len(VarName))))
from ATable
This is my simple answer; you can try it. Note that it only works when the prefix to strip has a fixed length.
select max(cast(SUBSTRING(T.column,3,len(T.column)) as int)) from tablename T
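For comparison, here is a sketch that takes the first run of digits inline, without a UDF or a regex function, using the sample values from the question. The table variable @t and the names tail and max_val are assumptions; CROSS APPLY needs SQL Server 2005 or later and the multi-row VALUES list needs 2008 or later.

```sql
DECLARE @t TABLE (val nvarchar(100));
INSERT INTO @t VALUES (N'983'), (N'294'), (N'a343'), (N'a3546f');

SELECT MAX(CAST(
           -- Cut the tail at its first non-digit; the appended 'x'
           -- guarantees PATINDEX finds one even for all-digit tails.
           LEFT(s.tail, PATINDEX('%[^0-9]%', s.tail + 'x') - 1)
       AS int)) AS max_val            -- 3546 for the sample rows
FROM @t
CROSS APPLY (
    -- Everything from the first digit to the end of the string
    SELECT SUBSTRING(val, PATINDEX('%[0-9]%', val), LEN(val)) AS tail
) AS s
WHERE val LIKE '%[0-9]%';             -- skip rows with no digits at all
```

Only the first run of digits is used, so a value like 'a12b34' would contribute 12, which matches the behaviour of the getFirstNumeric function above.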

Resources