I have data in a table column. I run SELECT DISTINCT on that column, and I also wrap it in LTRIM(RTRIM(col_name)), but I still get what looks like a duplicate value.
How can I find out why this is happening, and how can I avoid it?
I tried the RTRIM, LTRIM and UPPER functions, with no luck.
Query:
select distinct LTRIM(RTRIM(serverstatus))
from SQLInventory
Output:
Development
Staging
Test
Pre-Production
UNKNOWN
NULL
Need to be decommissioned
Production
Pre-Production
Decommissioned
Non-Production
Unsupported Edition
It looks like there's a Unicode character in there somewhere. I initially copied and pasted the values out as varchar literals and ran the following:
SELECT DISTINCT serverstatus
FROM (VALUES('Development'),
('Staging'),
('Test'),
('Pre-Production'),
('UNKNOWN'),
('NULL'),
('Need to be decommissioned'),
('Production'),
(''),
('Pre-Production'),
('Decommissioned'),
('Non-Production'),
('Unsupported Edition'))V(serverstatus);
This, interestingly, returned the values below:
Development
Staging
Test
Pre-Production
UNKNOWN
NULL
Need to be decommissioned
Production
Pre-Produc?tion
Decommissioned
Non-Production
Unsupported Edition
Note that one of the values is Pre-Produc?tion, meaning that there is a Unicode character between the c and the t that varchar cannot represent.
So, let's find out what it is:
SELECT 'Pre-Production', N'Pre-Production',
UNICODE(SUBSTRING(N'Pre-Production',11,1));
The UNICODE function returns 8203 (U+200B), which is a zero-width space. I assume you want to remove these, so you can update your data like this:
UPDATE SQLInventory
SET serverstatus = REPLACE(serverstatus, NCHAR(8203), N'');
Now your first query should work as you expect.
(I also suggest a lookup table for your statuses, referenced by a foreign key, so that this can't happen again.)
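The same check can be done outside the database on a pasted value. Below is a minimal sketch in Python rather than T-SQL; the function name find_non_ascii and the sample string are hypothetical, and the zero-width space is written as an escape so that it stays visible.

```python
import unicodedata


def find_non_ascii(text):
    """Return (1-based position, code point, Unicode name) for each non-ASCII character."""
    hits = []
    for position, ch in enumerate(text, start=1):
        if ord(ch) > 127:
            hits.append((position, ord(ch), unicodedata.name(ch, "UNKNOWN")))
    return hits


# Hypothetical sample: a zero-width space (U+200B) hidden between "c" and "t".
sample = "Pre-Produc\u200btion"
print(find_non_ascii(sample))
```

The position is 1-based so that it lines up with the SUBSTRING(..., 11, 1) call above.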
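I deal with this type of thing all the time. For stuff like this, NGrams8K, PatReplace8K and PATINDEX are your best friends. (NGrams8K and PatReplace8K are community-written functions that have to be created in your database; they are not built in.)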
Putting what you posted in a table variable we can analyze the problem:
DECLARE @table TABLE (txtID INT IDENTITY, txt NVARCHAR(100));
INSERT @table (txt)
VALUES ('Development'),('Staging'),('Test'),('Pre-Production'),('UNKNOWN'),(NULL),
('Need to be decommissioned'),('Production'),(''),('Pre-Production'),('Decommissioned'),
('Non-Production'),('Unsupported Edition');
This query will identify items with characters other than A-Z, spaces and hyphens:
SELECT t.txtID, t.txt
FROM @table AS t
WHERE PATINDEX('%[^a-zA-Z -]%',t.txt) > 0;
This returns:
txtID       txt
----------- -------------------------------------------
10          Pre-Production
To identify the bad character we can use NGrams8k like this:
SELECT t.txtID, t.txt, ng.position, ng.token -- ,UNICODE(ng.token)
FROM @table AS t
CROSS APPLY dbo.NGrams8K(t.txt,1) AS ng
WHERE PATINDEX('%[^a-zA-Z -]%',ng.token)>0;
Which returns:
txtID  txt               position             token
------ ----------------- -------------------- ---------
10     Pre-Production    11                   ?
PatReplace8K makes cleaning up stuff like this quite easy and quick. First note this query:
SELECT OldString = t.txt, p.NewString
FROM @table AS t
CROSS APPLY dbo.patReplace8K(t.txt,'%[^a-zA-Z -]%','') AS p
WHERE PATINDEX('%[^a-zA-Z -]%',t.txt) > 0;
Which returns this on my system:
OldString          NewString
------------------ ----------------
Pre-Produc?tion    Pre-Production
To fix the problem you can use patreplace8K like this:
UPDATE t
SET txt = p.newString
FROM @table AS t
CROSS APPLY dbo.patReplace8K(t.txt,'%[^a-zA-Z -]%','') AS p
WHERE PATINDEX('%[^a-zA-Z -]%',t.txt) > 0;
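For readers without those helper functions, the detect-then-strip idea can be sketched with a regular expression. This is a Python illustration, not the answer's implementation; the names has_disallowed and strip_disallowed are hypothetical. One assumption to be aware of: in Python the ranges a-z and A-Z cover ASCII letters only, whereas in SQL Server the meaning of a range depends on the collation.

```python
import re

# Same character class as the PATINDEX pattern: anything other than
# ASCII letters, spaces and hyphens counts as "bad".
DISALLOWED = re.compile(r"[^a-zA-Z -]")


def has_disallowed(text):
    """True if the text contains at least one character outside the allowed set."""
    return DISALLOWED.search(text) is not None


def strip_disallowed(text):
    """Remove every character outside the allowed set."""
    return DISALLOWED.sub("", text)


# Hypothetical sample with a zero-width space written as an escape.
dirty = "Pre-Produc\u200btion"
print(has_disallowed(dirty), strip_disallowed(dirty))
```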
I have a lookup table (LUT) with a list of values. I need to filter on a value from the lookup table in a simple WHERE condition. It works for every value in the table except one, and I don't know why. I have tried the trim and lower functions on the string, but nothing helped. Has anyone seen this before? Why does it work for all values except one? My code:
SELECT * FROM "PossibleNewGMCIssues" WHERE "gmcIssue" = 'Suspended account for policy violation'
Thank you in advance.
It's too hard for anybody to say without seeing the strings themselves. They probably look similar but contain different Unicode characters. You can convert them to hex with hex_encode to see where they differ.
Below I create a table with two strings that look almost the same but aren't. One contains a plain hyphen (hex 2D) and the other an em dash (hex E28094).
-- Create a table with two columns holding strings that look the same but aren't
create or replace transient table test_table as (
select 'a-string'::string col1, 'a—string'::string col2
);
-- This returns 0 results
select * from test_table where col1=col2;
-- You can tell that the strings are different by checking the hex representation of them
select hex_encode(col1), hex_encode(col2)
from test_table
;
-- The above returns:
-- +----------------+--------------------+
-- |HEX_ENCODE(COL1)|HEX_ENCODE(COL2) |
-- +----------------+--------------------+
-- |612D737472696E67|61E28094737472696E67|
-- +----------------+--------------------+
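The same comparison can be reproduced without a database. This is a small Python sketch, assuming the strings are stored as UTF-8; utf8_hex and first_difference are hypothetical helper names, and the em dash is written as the escape \u2014.

```python
def utf8_hex(text):
    """Upper-case hex of the UTF-8 bytes, comparable to the hex_encode output above."""
    return text.encode("utf-8").hex().upper()


def first_difference(a, b):
    """Return (0-based index, code point in a, code point in b) for the first
    position where the two strings differ, or None if no position in their
    common length differs."""
    for index, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return index, f"U+{ord(x):04X}", f"U+{ord(y):04X}"
    return None


col1 = "a-string"        # plain hyphen-minus
col2 = "a\u2014string"   # em dash
print(utf8_hex(col1), utf8_hex(col2))
print(first_difference(col1, col2))
```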
Is there a way to take a string and check it against multiple columns in another table, in order starting from 1, until I get a match?
I have a table with a few values:
Medicine
-----------
Advil
Tylenol
Midol
I need to check each medicine above against another table, checking its columns in order.
MedsToTry1 | MedsToTry2 | MedsToTry3 | MedsToTry4 | MedsToTry5 | MedsToTry6
-----------|------------|------------|------------|------------|-----------
NotAdvil   | Advil      | Null       | Null       | Null       | Null
Tylenol    | Ibuprofen  | NotTylenol | Null       | Null       | Null
NotMidol   | NotAdvil   | Ibuprofen  | Midol      | Null       | Null
So I have to go through each value in the first table and search for it in the 'MedsToTry1' column, then, if it is not there, in 'MedsToTry2', and so on until it is found.
I've tried concatenating all the strings in the MedsToTry columns and searching for the string in the result, but that doesn't guarantee the order, and I need 'MedsToTry1' to be checked first.
I tried COALESCE, but it returns the values from MedsToTry1 because none of them are null, and it never moves on to MedsToTry2 to see if the match is there.
Is there a way to do this? To take a string and check it against multiple columns in another table, in order starting from 1, until I get a match?
If I need to provide more information, please let me know. I'm pretty new to SQL, so I'll take any and all help I can get.
Thank you.
Your table is, quite probably, not the real source table. If you can query the real source instead, that would be much better. My guess is that this is the result of a pivot operation, where a row-wise table is transformed into a side-by-side (column-wise) format.
You did not state the expected output. Next time, or to improve this question, please look at my code and try to prepare a stand-alone example that reproduces your issue. This time I've done it for you.
First we declare mockup tables to simulate your issue:
DECLARE @tblA TABLE(Medicine VARCHAR(100));
INSERT INTO @tblA VALUES
('Advil')
,('Tylenol')
,('Midol');
DECLARE @tblB TABLE(RowId INT,MedsToTry1 VARCHAR(100),MedsToTry2 VARCHAR(100),MedsToTry3 VARCHAR(100),MedsToTry4 VARCHAR(100),MedsToTry5 VARCHAR(100),MedsToTry6 VARCHAR(100));
INSERT INTO @tblB VALUES
(1,'NotAdvil','Advil',Null,Null ,Null,Null)
,(2,'Tylenol','Ibuprofen','NotTylenol',Null ,Null,Null)
,(3,'NotMidol','NotAdvil','Ibuprofen','Midol',Null,Null);
--This is the query (use your own table names to test this in your environment)
SELECT b.RowId
,C.*
FROM @tblB b
CROSS APPLY (VALUES(1,b.MedsToTry1)
,(2,b.MedsToTry2)
,(3,b.MedsToTry3)
,(4,b.MedsToTry4)
,(5,b.MedsToTry5)
,(6,b.MedsToTry6)) C(MedRank,Medicin)
WHERE EXISTS(SELECT 1 FROM @tblA A WHERE A.Medicine=C.Medicin);
The idea in short:
The trick with CROSS APPLY (VALUES... will return each name-numbered column (MedsToTry1, MedsToTry2...) in one row, together with a rank. This way we do not lose the information of the sort order or the position within the table.
The WHERE will reduce the set to rows, where there exists a corresponding medicine in the other table.
The result
RowId  MedRank  Medicin
1      2        Advil
2      1        Tylenol
3      4        Midol
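The unpivot-with-rank idea can also be traced by hand. Here is a minimal Python sketch of the same logic, using the mockup data from above; the function name ranked_matches is hypothetical. Each column is paired with its position, and only pairs whose value exists in the medicine list are kept.

```python
medicines = {"Advil", "Tylenol", "Midol"}

# (RowId, [MedsToTry1 .. MedsToTry6]) with None standing in for NULL.
rows = [
    (1, ["NotAdvil", "Advil", None, None, None, None]),
    (2, ["Tylenol", "Ibuprofen", "NotTylenol", None, None, None]),
    (3, ["NotMidol", "NotAdvil", "Ibuprofen", "Midol", None, None]),
]


def ranked_matches(rows, medicines):
    """Pair every column value with its 1-based rank and keep the known medicines."""
    result = []
    for row_id, columns in rows:
        for rank, medicine in enumerate(columns, start=1):
            if medicine in medicines:
                result.append((row_id, rank, medicine))
    return result


for match in ranked_matches(rows, medicines):
    print(match)
```

If a row could contain more than one known medicine, taking the entry with the lowest rank per row would give the "first match" the question asks for.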
For some reason we found characters like this in the database: Ã
I assume this character represents é.
Now I need to go through the whole table and check for any other such characters.
Where can I find the mapping between characters such as à and é? Or is there an existing SQL function that does these replacements?
I'm using SQL Server 2014
As mentioned by Daniel E, your dirty data might have been caused by the use of incorrect code pages (UTF-8 that was interpreted as ISO 8859-1).
One way to find entries with dirty data is to use a LIKE pattern with a negated character class ("[^...]") that lists the valid characters. See the example below.
declare @t table (name varchar(20))
insert into @t values ('touché')
insert into @t values ('encore touché')
insert into @t values ('reçu')
insert into @t values ('hello world')
-- Default collation: depending on the collation, the a-z range may also cover accented letters
select * from @t where name like '%[^a-zA-Z., -]%'
-- Binary collation: ranges compare by code, so only the accented letters listed explicitly are allowed
select * from @t where name like '%[^a-zA-Z.,èêé -]%' COLLATE Latin1_General_BIN
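The code-page mix-up mentioned above can be reproduced, and sometimes reversed, outside the database. This is a hedged Python sketch, assuming the damage really is UTF-8 bytes read as ISO 8859-1 (data that passed through Windows-1252 instead would need "cp1252" as the codec); repair_mojibake is a hypothetical name.

```python
def repair_mojibake(text):
    """Undo 'UTF-8 read as ISO 8859-1' damage; return the text unchanged
    if it does not round-trip cleanly (it was probably not damaged that way)."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except UnicodeError:
        return text


# How the damage arises: "touch\u00e9" (touché) is encoded as UTF-8,
# then decoded with the wrong code page.
damaged = "touch\u00e9".encode("utf-8").decode("latin-1")
print(damaged)
print(repair_mojibake(damaged))
```

This shows why a single é turns into two characters, the first of which is Ã. Repairing a copy of the data and reviewing the result first is safer than updating in place.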
I need to split a value such as BPO007 into BPO and 007. Another example would be GFE0035, where I also need to split the numeric part from the letters.
When I run select isnumeric('BPO007') the result is 0, which is correct. However, I'm not sure how to split the two parts from each other.
I've had a look at this link, but it does not really answer my question.
I need the above split for a separate validation purpose in my trigger.
How would I develop something like this?
Thank you in advance.
As told in a comment before:
About your question How would I develop something like this?:
Do not store two pieces of information in the same column (read about first normal form, 1NF). You should keep separate columns in your database for BPO and for 007 (or rather the integer 7).
Then use string functions to compute BPO007 when you need it in your output.
Just so as not to leave you out in the rain, here is a working example:
DECLARE @tbl TABLE(YourColumn VARCHAR(100));
INSERT INTO @tbl VALUES('BPO007'),('GFE0035');
SELECT YourColumn,pos
,LEFT(YourColumn,pos) AS CharPart
,CAST(SUBSTRING(YourColumn,pos+1,1000) AS INT) AS NumPart
FROM @tbl
CROSS APPLY(SELECT PATINDEX('%[0-9]%',YourColumn)-1) AS A(pos);
Will return
YourColumn  pos  CharPart  NumPart
BPO007      3    BPO       7
GFE0035     3    GFE       35
Hint: I use a CROSS APPLY here to compute the position of the first numeric character and then use pos in the actual query like you'd use a variable. Otherwise the PATINDEX would have to be repeated...
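The same "find the first digit, then cut there" logic can be sketched in Python for quick testing of the validation rule. The function name split_code is hypothetical, and the sketch assumes, as the query above does, that everything from the first digit onwards is numeric.

```python
import re


def split_code(value):
    """Split at the first digit: ('BPO007' -> ('BPO', 7)).
    Returns (value, None) when the value contains no digit at all."""
    match = re.search(r"[0-9]", value)
    if match is None:
        return value, None
    pos = match.start()          # number of leading non-digit characters
    return value[:pos], int(value[pos:])


for code in ("BPO007", "GFE0035"):
    print(code, split_code(code))
```

As with the CAST in the T-SQL version, the conversion fails if non-digits follow the first digit, which may be what a validation step wants.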
Since the lengths of the text and the number vary, you can use the following code:
DECLARE @NUMERIC TABLE (Col VARCHAR(50))
INSERT INTO @NUMERIC VALUES('BPO007'),('GFE0035'),('GFEGVT003509'),('GFEMTS10035')
SELECT
Col,
LEFT(Col,LEN(Col)-LEN(SUBSTRING(Col,PATINDEX('%[0-9]%',Col),DATALENGTH(Col)))) AS TEXTs,
RIGHT(Col,LEN(Col)-LEN(LEFT(Col,LEN(Col)-LEN(SUBSTRING(Col,PATINDEX('%[0-9]%',Col),DATALENGTH(Col)))))) AS NUMERICs
FROM @NUMERIC
I'm moving data from one table to another using INSERT INTO. In the SELECT part I need to transfer values from a column containing both letters and digits into a column containing only the digits. The original column is a varchar.
Original column:
ABC100
XYZ:200
DD2000
Wanted column:
100
200
2000
I can't write a function because I can't call a function inside the SELECT statement when inserting.
Using MS SQL.
I encourage you to read this:
Extracting Data
There is an example function that removes alpha characters from a string. This will be much faster than a bunch of replace statements.
You can probably do that with a regex replace. The syntax for this depends on your database software (which you haven't specified).
You should be able to do function calls in your SELECT statement, even when you're using it to INSERT INTO.
If your data is fixed-format I'd do something like
INSERT INTO SOME_TABLE(COLUMN1, COLUMN2, COLUMN3)
SELECT TO_NUMBER(SUBSTR(SOURCE_COLUMN, 4, 3)),
TO_NUMBER(SUBSTR(SOURCE_COLUMN, 12, 3)),
TO_NUMBER(SUBSTR(SOURCE_COLUMN, 18, 4))
FROM SOME_OTHER_TABLE
WHERE <conditions>;
The above code is for Oracle. Depending on the database you're using you may have to do things a bit differently.
I hope this helps.
You certainly can have a function inside a SELECT statement during an INSERT:
INSERT INTO CleanTable (CleanColumn)
SELECT dbo.udf_CleanString(DirtyColumn)
FROM DirtyTable
Your main problem is going to be getting the function right (the one G Mastros linked to is pretty good) and getting it to perform. If you're only talking about thousands of rows, this should be fine. If you are talking about millions of rows, you might need a different strategy.
Writing a UDF is how I've solved this problem in the past. However, I got to thinking if there was a set-based solution. Here's what I have:
First, my table, which I populated with a bunch of random alphanumeric values using Red Gate's Data Generator:
Create Table MixedValues (
Id int not null identity(1,1) Primary Key
, AlphaValue varchar(50)
)
Next I built a Tally table on the fly using a CTE but normally I have a fixed table for this. A Tally table is just a table of sequential numbers.
;With Tally As
(
Select ROW_NUMBER() OVER ( ORDER BY object_id ) As Num
From sys.columns
)
, IndividualChars As
(
Select MX.Id, Substring(MX.AlphaValue, Num, 1) As CharValue, Num
From Tally
Cross Join MixedValues As MX
Where Num Between 1 And Len(MX.AlphaValue)
)
Select MX.Id, MX.AlphaValue
, Replace(
(
Select '' + CharValue
From IndividualChars As IC
Where IC.Id = MX.Id
And PATINDEX('[ 0-9]', CharValue) > 0
Order By Num
For Xml Path('')
)
, '&#x20;', ' ') As NewValue
From MixedValues As MX
From a top level, the idea here is to split the string into one row per individual character, test the type of pattern you want and then re-constitute it.
Note that my sys.columns table only contains 500 some odd rows. If you had strings larger than 500 characters, you could simply cross join sys.columns to itself and get 500^2 rows. In addition, For Xml Path returns a string with spaces escaped (note the space in my pattern index [ 0-9] which tells the system to include spaces.) so I use the replace function to reverse the escaping.
EDIT: Btw, this will only work in SQL 2005+ because of my use of the CTE. If you wanted a SQL 2000 solution, you would need to break up the CTE into separate table creation calls (e.g. Temp tables) but it could still be done.
EDIT: I added the Num column in the IndividualChars CTE and added an Order By to the NewValue query at the end. Although it probably will reconstitute the string in order, I wanted to ensure that it would by explicitly ordering the results.
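The split, filter and reassemble idea is easy to see in a few lines of Python. This is only an illustration of the logic, not a replacement for the set-based query; keep_digits is a hypothetical name, and the sample values are the ones from the question.

```python
# Characters to keep, mirroring the [ 0-9] pattern: digits and spaces.
KEEP = set("0123456789 ")


def keep_digits(value):
    """Split into characters, keep those matching the pattern, rejoin in order."""
    return "".join(ch for ch in value if ch in KEEP)


for original in ("ABC100", "XYZ:200", "DD2000"):
    print(original, "->", keep_digits(original))
```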