Separating Numeric Values from string - sql-server

I'm having an issue, whereby I need to separate the following BPO007 to show BPO and 007. In some cases another example would be GFE0035, whereby I still need to split the numeric value from the characters.
When I run select isnumeric('BPO007') the result is 0, which is correct; however, I'm not sure how to split the two parts from each other.
I've had a look at this link, but it does not really answer my question.
I need the above split for a separate validation purpose in my trigger.
How would I develop something like this?
Thank you in advance.

As I said in a comment earlier, regarding your question "How would I develop something like this?":
Do not store two pieces of information within the same column (read about 1.NF). You should keep separate columns in your database for BPO and for 007 (or rather an integer 7).
Then use some string methods to compute the BPO007 when you need it in your output.
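As a rough illustration of that recomposition, here is a minimal sketch. The table variable @Codes and its columns Prefix, Num and PadWidth are hypothetical names chosen for the example, not part of the asker's schema.

```sql
-- Hypothetical layout: the prefix and the number live in separate columns
DECLARE @Codes TABLE (Prefix VARCHAR(10), Num INT, PadWidth TINYINT);
INSERT INTO @Codes VALUES ('BPO', 7, 3), ('GFE', 35, 4);

-- Rebuild the display code by left-padding the number with zeros
SELECT Prefix,
       Num,
       Prefix + RIGHT(REPLICATE('0', PadWidth) + CAST(Num AS VARCHAR(10)), PadWidth) AS Code
FROM @Codes;
-- Code: BPO007, GFE0035
```

Storing the pad width per row is only one option; if every code uses the same width it could just as well be a constant in the query.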
So that you're not left out in the rain, here is a query that splits the existing values:
DECLARE @tbl TABLE(YourColumn VARCHAR(100));
INSERT INTO @tbl VALUES('BPO007'),('GFE0035');
SELECT YourColumn,pos
,LEFT(YourColumn,pos) AS CharPart
,CAST(SUBSTRING(YourColumn,pos+1,1000) AS INT) AS NumPart
FROM @tbl
CROSS APPLY(SELECT PATINDEX('%[0-9]%',YourColumn)-1) AS A(pos);
Will return
YourColumn  pos  CharPart  NumPart
BPO007      3    BPO       7
GFE0035     3    GFE       35
Hint: I use a CROSS APPLY here to compute the position of the first numeric character and then use pos in the actual query like you'd use a variable. Otherwise the PATINDEX would have to be repeated...
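One edge case worth considering for validation in a trigger: PATINDEX returns 0 when the value contains no digit, and LEFT then receives a negative length and raises an error. The following is a sketch of one possible guard, not the only way to handle it. The extra row 'NODIGITS' is made-up test data, and pos here holds the position of the first digit itself rather than the position before it.

```sql
DECLARE @tbl TABLE (YourColumn VARCHAR(100));
INSERT INTO @tbl VALUES ('BPO007'), ('GFE0035'), ('NODIGITS');

SELECT YourColumn,
       -- no digit found: keep the whole value as the character part
       LEFT(YourColumn, ISNULL(A.pos - 1, LEN(YourColumn))) AS CharPart,
       -- no digit found: SUBSTRING with a NULL start yields NULL
       CAST(SUBSTRING(YourColumn, A.pos, 1000) AS INT)      AS NumPart
FROM @tbl
CROSS APPLY (SELECT NULLIF(PATINDEX('%[0-9]%', YourColumn), 0)) AS A(pos);
```

This still assumes that everything after the first digit is numeric; a value such as 'AB12C' would make the CAST fail.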

Since the lengths of the text and numeric parts vary, you can use the following code:
DECLARE @NUMERIC TABLE (Col VARCHAR(50))
INSERT INTO @NUMERIC VALUES('BPO007'),('GFE0035'),('GFEGVT003509'),('GFEMTS10035')
SELECT
Col,
LEFT(Col,LEN(Col)-LEN(SUBSTRING(Col,PATINDEX('%[0-9]%',Col),DATALENGTH(Col)))) AS TEXTs,
RIGHT(Col,LEN(Col)-LEN(LEFT(Col,LEN(Col)-LEN(SUBSTRING(Col,PATINDEX('%[0-9]%',Col),DATALENGTH(Col)))))) AS NUMERICs
FROM @NUMERIC

Related

Why do two strings not match although they look exactly the same?

I have a lookup table with a list of values. I need to filter on a value from the lookup table (LUT) in a simple WHERE condition. It works with all table values except one and I don't know why. I have tried using the trim and lower functions to change the string, but nothing helped. Has anyone had the same experience? Why does it work for all table values except one? My code:
SELECT * FROM "PossibleNewGMCIssues" WHERE "gmcIssue" = 'Suspended account for policy violation'
Thank you in advance.
It's too hard for anybody to say without seeing the strings themselves. They probably look similar but have different unicode values. You can convert to the hex values to see where they are different though by using hex_encode:
Below I create a table that uses two strings that look the same but aren't. One contains a plain hyphen (U+002D, hex 2D) and the other an em dash (U+2014, hex E28094 in UTF-8).
-- Create a table with two columns with strings in them that look the same but aren't
create or replace transient table test_table as (
select 'a-string'::string col1, 'a—string'::string col2
);
-- This returns 0 results
select * from test_table where col1=col2;
-- You can tell that the strings are different by checking the hex representation of them
select hex_encode(col1), hex_encode(col2)
from test_table
;
-- The above returns:
-- +----------------+--------------------+
-- |HEX_ENCODE(COL1)|HEX_ENCODE(COL2) |
-- +----------------+--------------------+
-- |612D737472696E67|61E28094737472696E67|
-- +----------------+--------------------+

not able to identify difference between same value

I have data inside a table's column. I SELECT DISTINCT on that column, and I also apply LTRIM(RTRIM(col_name)) in the SELECT, but I still get what looks like a duplicate value.
How can we identify why this is happening, and how can we avoid it?
I tried the RTRIM, LTRIM and UPPER functions. Still no help.
Query:
select distinct LTRIM(RTRIM(serverstatus))
from SQLInventory
Output:
Development
Staging
Test
Pre-Production
UNKNOWN
NULL
Need to be decommissioned
Production
Pre-Produc​tion
Decommissioned
Non-Production
Unsupported Edition
Looks like there's a unicode character in there, somewhere. I copied and pasted the values out initially as a varchar, and did the following:
SELECT DISTINCT serverstatus
FROM (VALUES('Development'),
('Staging'),
('Test'),
('Pre-Production'),
('UNKNOWN'),
('NULL'),
('Need to be decommissioned'),
('Production'),
(''),
('Pre-Produc​tion'),
('Decommissioned'),
('Non-Production'),
('Unsupported Edition'))V(serverstatus);
This, interestingly, returned the values below:
Development
Staging
Test
Pre-Production
UNKNOWN
NULL
Need to be decommissioned
Production
Pre-Produc?tion
Decommissioned
Non-Production
Unsupported Edition
Note that one of the values is Pre-Produc?tion, meaning that there is a Unicode character between the c and the t that varchar cannot represent.
So, let's find out what it is:
SELECT 'Pre-Produc​tion', N'Pre-Produc​tion',
UNICODE(SUBSTRING(N'Pre-Produc​tion',11,1));
The UNICODE function returns 8203 (U+200B), which is a zero-width space. I assume you want to remove these, so you can update your data by doing:
UPDATE SQLInventory
SET serverstatus = REPLACE(serverstatus, NCHAR(8203), N'');
Now your first query should work as you expect.
(I also suggest you might therefore want a lookup table for your statuses, with a foreign key, so that this can't happen again).
DB<>fiddle
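The lookup-table suggestion above could be sketched roughly as follows. All object names here (dbo.ServerStatusLookup, dbo.SQLInventoryDemo, the constraint name and the sample server) are hypothetical and only serve the illustration; an existing table would need its own migration steps.

```sql
-- Hypothetical lookup table: each allowed status is entered exactly once
CREATE TABLE dbo.ServerStatusLookup (
    ServerStatusID INT IDENTITY(1,1) PRIMARY KEY,
    StatusName     NVARCHAR(50) NOT NULL UNIQUE
);

-- Hypothetical inventory table: the foreign key rejects unknown statuses
CREATE TABLE dbo.SQLInventoryDemo (
    ServerName     NVARCHAR(128) NOT NULL PRIMARY KEY,
    ServerStatusID INT NOT NULL
        CONSTRAINT FK_SQLInventoryDemo_Status
        REFERENCES dbo.ServerStatusLookup (ServerStatusID)
);

INSERT INTO dbo.ServerStatusLookup (StatusName)
VALUES (N'Development'), (N'Pre-Production'), (N'Production');

-- Rows reference a status by key instead of retyping the text
INSERT INTO dbo.SQLInventoryDemo (ServerName, ServerStatusID)
SELECT N'SRV01', ServerStatusID
FROM dbo.ServerStatusLookup
WHERE StatusName = N'Pre-Production';
```

Because the status text is typed only once, a stray invisible character can no longer creep in row by row.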
I deal with this type of thing all the time. For problems like this, NGrams8K, PatReplace8K (community-written functions, not built into SQL Server) and PATINDEX are your best friends.
Putting what you posted into a table variable, we can analyze the problem:
DECLARE @table TABLE (txtID INT IDENTITY, txt NVARCHAR(100));
INSERT @table (txt)
VALUES ('Development'),('Staging'),('Test'),('Pre-Production'),('UNKNOWN'),(NULL),
('Need to be decommissioned'),('Production'),(''),('Pre-Produc​tion'),('Decommissioned'),
('Non-Production'),('Unsupported Edition');
This query identifies items containing characters other than A-Z, spaces and hyphens:
SELECT t.txtID, t.txt
FROM @table AS t
WHERE PATINDEX('%[^a-zA-Z -]%',t.txt) > 0;
This returns:
txtID       txt
----------- -------------------------------------------
10          Pre-Produc​tion
To identify the bad character we can use NGrams8K like this:
SELECT t.txtID, t.txt, ng.position, ng.token -- ,UNICODE(ng.token)
FROM @table AS t
CROSS APPLY dbo.NGrams8K(t.txt,1) AS ng
WHERE PATINDEX('%[^a-zA-Z -]%',ng.token)>0;
Which returns:
txtID  txt               position             token
------ ----------------- -------------------- ---------
10     Pre-Produc​tion    11                   ?
PatReplace8K makes cleaning up data like this quite easy and quick. First note this query:
SELECT OldString = t.txt, p.NewString
FROM @table AS t
CROSS APPLY dbo.patReplace8K(t.txt,'%[^a-zA-Z -]%','') AS p
WHERE PATINDEX('%[^a-zA-Z -]%',t.txt) > 0;
Which returns this on my system:
OldString          NewString
------------------ ----------------
Pre-Produc?tion    Pre-Production
To fix the problem you can use PatReplace8K like this:
UPDATE t
SET txt = p.newString
FROM @table AS t
CROSS APPLY dbo.patReplace8K(t.txt,'%[^a-zA-Z -]%','') AS p
WHERE PATINDEX('%[^a-zA-Z -]%',t.txt) > 0;

select max value from a string column in SQL

I have a table with a column Order of type varchar as follows
Order
-----
Ord-998,
Ord-999,
Ord-1000,
Ord-1001,
I want to get the max value as 1001
But when I run this query, I am getting 999 as max value always
select
SUBSTRING((select isnull(MAX(OrderNo), '0000000')
from OrderSummary
where OrderNo like 'Ord%'), 5, 10) as [OrderMax]
Can anybody provide a solution?
Since you are taking the MAX of a string, the values are compared alphabetically, so C is larger than AAA and 9 is larger than 10. Remove the letters, cast the remainder to an int, and then take the max. Given that the format will always be Ord-###, we can strip the Ord- prefix and cast the rest as an INT.
SELECT
MAX(CAST(SUBSTRING(OrderNo,5,LEN(OrderNo)-4) AS INT))
FROM OrderSummary
WHERE OrderNo LIKE 'Ord-%'
Another possibility would be the REPLACE-function:
SELECT
MAX(CONVERT(int,(REPLACE(OrderNo, 'Ord-' ,'')))) AS OrderMax
FROM OrderSummary;
But as already mentioned in the comments, the best solution would be to get rid of the "Ord-" prefix and use an int column instead.
If I were tackling this issue, I would store the Order Number as an integer instead of a varchar, and not store the repetitive text "Ord-" in the field. If end users require "Ord-998", I'd handle it in my CRUD layer.
This also allows you to declare the Order Number column as an identity column, which will handle the automatic incrementing for you.
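A minimal sketch of that idea, using a table variable so it can be run as-is. The names @Orders, OrderID and Customer are hypothetical, and the computed column is just one way to produce the "Ord-" display value if the database rather than the application layer should supply it.

```sql
DECLARE @Orders TABLE (
    OrderID  INT IDENTITY(998, 1) PRIMARY KEY,            -- auto-incrementing number
    Customer VARCHAR(50) NOT NULL,                         -- placeholder payload column
    OrderNo  AS ('Ord-' + CAST(OrderID AS VARCHAR(10)))    -- display value, computed on read
);

INSERT INTO @Orders (Customer) VALUES ('A'), ('B'), ('C'), ('D');

-- Numeric max, no string handling needed
SELECT MAX(OrderID) AS OrderMax FROM @Orders;               -- 1001

-- Display form of the highest order
SELECT TOP (1) OrderNo FROM @Orders ORDER BY OrderID DESC;  -- Ord-1001
```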
Please try this query in SQL Server
select MAX(convert(int, substring(orderno, 5, LEN(orderno))))
from OrderSummary
Here is another possible solution; this query is for PostgreSQL:
SELECT
CONCAT('Ord-', MAX(CAST(SUBSTRING(OrderNo,5,length(OrderNo)-4) AS INT))) AS "Order"
FROM OrderSummary
RESULT
Order
Ord-1001
Use the LEN function with SUBSTRING instead of filtering with LIKE. The number starts at position 5, right after Ord-:
select MAX(CAST(SUBSTRING(OrderNo, 5, len(OrderNo)) AS INT)) FROM OrderSummary
The problem, as stated in other answers, is that you are comparing strings. Here is an alternative solution: select the top 1 row from your table, ordered by len(OrderNo) desc, OrderNo desc. This way you do not have to cut up all the varchars and convert them to integers; you simply compare their lengths (very quick) and, where the lengths are equal, their varchar values (slower). Of course, a default for null values is a good idea, but make sure you do not put more zeros there than the number of digits of your first number.
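The ordering approach described above might look like this. It is a sketch that assumes every value shares the same Ord- prefix and has no leading zeros; the table variable stands in for OrderSummary.

```sql
DECLARE @OrderSummary TABLE (OrderNo VARCHAR(20));
INSERT INTO @OrderSummary VALUES ('Ord-998'), ('Ord-999'), ('Ord-1000'), ('Ord-1001');

-- Longer strings hold bigger numbers; ties are broken by string comparison
SELECT TOP (1) OrderNo
FROM @OrderSummary
WHERE OrderNo LIKE 'Ord-%'
ORDER BY LEN(OrderNo) DESC, OrderNo DESC;
-- Ord-1001
```

Note that LEN ignores trailing spaces, so padded values would need trimming first.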
You can cast to int; MAX then compares by numeric value rather than in lexical order, provided the column contains only digits, e.g.
select max(cast(Col as int)) from YourTable

Removing characters from a alphanumeric field SQL

I'm moving data from one table to another using INSERT INTO. In the SELECT part I need to transfer values from a column containing both characters and digits into another column that holds only the digits. The original column is varchar.
original column -
ABC100
XYZ:200
DD2000
Wanted column
100
200
2000
I can't write a function, because I can't have a function inside a SELECT statement when inserting.
Using MS SQL
I encourage you to read this:
Extracting Data
There is an example function that removes alpha characters from a string. This will be much faster than a bunch of replace statements.
You can probably do that with a regex replace. The syntax for this depends on your database software (which you haven't specified).
You should be able to do function calls in your SELECT statement, even when you're using it to INSERT INTO.
If your data is fixed-format I'd do something like
INSERT INTO SOME_TABLE(COLUMN1, COLUMN2, COLUMN3)
SELECT TO_NUMBER(SUBSTR(SOURCE_COLUMN, 4, 3)),
TO_NUMBER(SUBSTR(SOURCE_COLUMN, 12, 3)),
TO_NUMBER(SUBSTR(SOURCE_COLUMN, 18, 4))
FROM SOME_OTHER_TABLE
WHERE <conditions>;
The above code is for Oracle. Depending on the database you're using you may have to do things a bit differently.
I hope this helps.
You certainly can have a function inside a SELECT statement during an INSERT:
INSERT INTO CleanTable (CleanColumn)
SELECT dbo.udf_CleanString(DirtyColumn)
FROM DirtyTable
Your main problem is going to be getting the function right (the one G Mastros linked to is pretty good) and getting it to perform. If you're only talking about thousands of rows, this should be fine. If you are talking about millions of rows, you might need a different strategy.
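For reference, one common shape for such a function is a PATINDEX/STUFF loop. The body below is a hypothetical sketch for dbo.udf_CleanString, not the linked article's code, and it keeps digits only.

```sql
-- Hypothetical implementation: strip every character that is not a digit
CREATE FUNCTION dbo.udf_CleanString (@Input VARCHAR(8000))
RETURNS VARCHAR(8000)
AS
BEGIN
    DECLARE @Pos INT = PATINDEX('%[^0-9]%', @Input);
    WHILE @Pos > 0
    BEGIN
        -- delete the offending character, then look for the next one
        SET @Input = STUFF(@Input, @Pos, 1, '');
        SET @Pos = PATINDEX('%[^0-9]%', @Input);
    END;
    RETURN @Input;
END;
GO  -- batch separator for SSMS/sqlcmd; CREATE FUNCTION must be alone in its batch

SELECT dbo.udf_CleanString('XYZ:200') AS Cleaned;  -- 200
```

A scalar function like this runs once per row, which is why it is fine for thousands of rows but may be slow for millions.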
Writing a UDF is how I've solved this problem in the past. However, I got to thinking if there was a set-based solution. Here's what I have:
First my table which I used Red Gate's Data Generator to populate with a bunch of random alpha numeric values:
Create Table MixedValues (
Id int not null identity(1,1) Primary Key
, AlphaValue varchar(50)
)
Next I built a Tally table on the fly using a CTE but normally I have a fixed table for this. A Tally table is just a table of sequential numbers.
;With Tally As
(
Select ROW_NUMBER() OVER ( ORDER BY object_id ) As Num
From sys.columns
)
, IndividualChars As
(
Select MX.Id, Substring(MX.AlphaValue, Num, 1) As CharValue, Num
From Tally
Cross Join MixedValues As MX
Where Num Between 1 And Len(MX.AlphaValue)
)
Select MX.Id, MX.AlphaValue
, Replace(
(
Select '' + CharValue
From IndividualChars As IC
Where IC.Id = MX.Id
And PATINDEX('[ 0-9]', CharValue) > 0
Order By Num
For Xml Path('')
)
, '&#x20;', ' ') As NewValue
From MixedValues As MX
From a top level, the idea here is to split the string into one row per individual character, test the type of pattern you want and then re-constitute it.
Note that my sys.columns table only contains 500 some odd rows. If you had strings larger than 500 characters, you could simply cross join sys.columns to itself and get 500^2 rows. In addition, For Xml Path returns a string with spaces escaped (note the space in my pattern index [ 0-9] which tells the system to include spaces.) so I use the replace function to reverse the escaping.
EDIT: Btw, this will only work in SQL 2005+ because of my use of the CTE. If you wanted a SQL 2000 solution, you would need to break up the CTE into separate table creation calls (e.g. Temp tables) but it could still be done.
EDIT: I added the Num column in the IndividualChars CTE and added an Order By to the NewValue query at the end. Although it probably will reconstitute the string in order, I wanted to ensure that it would by explicitly ordering the results.

Select max int from varchar column

I am trying to retrieve the largest number from a varchar column that includes both numbers and strings. An example of the data I'm working with:
BoxNumber
123
A5
789
B1
I need to return the largest number (789 in this case) from the column while ignoring the non-numeric values of A5 and B1.
I have found solutions that use custom functions to solve the problem, but I need something that can be executed ad-hoc query without relying on custom functions or procs.
You need a combination of checks, because ISNUMERIC returns 1 for values such as these:
select isnumeric('+'),isnumeric('5d2')
Your WHERE clause would look like this:
WHERE VALUE NOT LIKE '%[a-z]%'
AND ISNUMERIC(VALUE) = 1
create table #bla (value varchar(50))
insert #bla values('123')
insert #bla values('a5')
insert #bla values('789')
insert #bla values('b1')
SELECT MAX(CAST(value AS Int)) FROM #bla
WHERE VALUE NOT LIKE '%[a-z]%'
AND ISNUMERIC(VALUE) = 1
I wrote about this here ISNUMERIC Trouble
You might try
Select MAX(BoxNumber) from {table} where IsNumeric(BoxNumber) = 1
Why not
SELECT MAX(CAST(Value AS Int)) FROM #bla
WHERE ISNUMERIC(Value)=1
AND Value LIKE '%[0-9]%'
then you're only dealing with numeric strings. In this case you may not need ISNUMERIC()
The selected answer worked for me until I added this value to the temp table along with the others in the sample:
insert #bla values('1234')
I expected my max() result to now be 1234, but it remained at 789. Perhaps this is due to some collation setting, but I was able to reproduce on multiple databases. I found this query below worked for me, but I'd certainly be interested to hear if there is a more efficient way of doing this. Also, I did not want to include any decimal values, so I excluded anything with a period as well.
SELECT MAX(CAST(Value AS Int)) FROM #bla
WHERE ISNUMERIC(Value)=1
AND Value NOT LIKE '%[a-z]%'
AND Value NOT LIKE '%.%'
You should check this solution out for values like '+' and '-' as I think the IsNumeric function may return 1 for these values
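One way to sidestep the '+' and '-' concern is to accept only strings made up entirely of digits, and to do the check inside a CASE so the CAST is never attempted on anything else. This is a sketch with made-up sample rows, not a replacement for the answers above.

```sql
DECLARE @Boxes TABLE (BoxNumber VARCHAR(50));
INSERT INTO @Boxes
VALUES ('123'), ('A5'), ('789'), ('B1'), ('1234'), ('+'), ('-'), ('12.5'), ('');

SELECT MAX(CASE
             -- non-empty and containing no character outside 0-9
             WHEN BoxNumber <> '' AND BoxNumber NOT LIKE '%[^0-9]%'
             THEN CAST(BoxNumber AS INT)
           END) AS MaxBox
FROM @Boxes;
-- 1234
```

Very long digit strings would still overflow INT; BIGINT or a length check could be used if that is a possibility.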
Well, in the intervening 11 years since this question was asked, SQL Server gained the try_cast function, which returns null rather than throwing an error if the conversion doesn't succeed. So in current versions of SQL Server, a valid answer is
select max(try_cast(BoxNumber as int)) from theTable
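A small self-contained check of that behaviour, using sample values from this thread plus two extra made-up ones ('5d2' and '12.5'). TRY_CAST is available from SQL Server 2012 onward.

```sql
DECLARE @Boxes TABLE (BoxNumber VARCHAR(50));
INSERT INTO @Boxes
VALUES ('123'), ('A5'), ('789'), ('B1'), ('1234'), ('5d2'), ('12.5');

-- Values that cannot be converted to INT become NULL and are ignored by MAX
SELECT MAX(TRY_CAST(BoxNumber AS INT)) AS MaxBox
FROM @Boxes;
-- 1234
```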
Look into casting the column to an int, then selecting the MAX(). I don't know what it will do to columns that contain letters, but it's worth exploring.
http://doc.ddart.net/mssql/sql70/ca-co_1.htm
Some of these answers are only half right. They succeed in isolating the numeric values, but when MAX is applied to the varchar column itself (without a cast) it compares the values as characters, reading left to right, so that '789' > '1000' because '7' > '1'. A way around this without casting is to left-pad the numbers with zeros to a common length, after which MAX in character mode should work.
In MySQL, use SIGNED instead of INT as the cast target. ISNUMERIC and [a-z] character classes in LIKE are SQL Server features, so a REGEXP filter is used here instead:
SELECT MAX(CAST(value AS SIGNED)) FROM yourTable
WHERE value REGEXP '^[0-9]+$'
P.S. The types available for casting differ in MySQL.
For more conversions visit this link
select max(cast(column as numeric)) from table
Here column is a varchar column that is cast to numeric in PostgreSQL.

Resources