Why don't two strings match although they look exactly the same? - snowflake-cloud-data-platform

I have a lookup table (LUT) with a list of values. I need to filter on a value from the LUT in a simple WHERE condition. It works with all table values except one, and I don't know why. I have tried using the TRIM and LOWER functions to change the string, but nothing helped. Does anyone have the same experience? Why does it work for all table values except one? My code:
SELECT * FROM "PossibleNewGMCIssues" WHERE "gmcIssue" = 'Suspended account for policy violation'
Thank you in advance.

It's hard for anybody to say without seeing the strings themselves. They probably look similar but contain different Unicode characters. You can convert them to hex values with hex_encode to see where they differ:
Below I create a table with two strings that look the same but aren't: one contains an ordinary hyphen-minus (U+002D) and the other an em dash (U+2014).
-- Create a table with two columns containing strings that look the same but aren't
create or replace transient table test_table as (
select 'a-string'::string col1, 'a—string'::string col2
);
-- This returns 0 results
select * from test_table where col1=col2;
-- You can tell that the strings are different by checking the hex representation of them
select hex_encode(col1), hex_encode(col2)
from test_table
;
-- The above returns:
-- +----------------+--------------------+
-- |HEX_ENCODE(COL1)|HEX_ENCODE(COL2) |
-- +----------------+--------------------+
-- |612D737472696E67|61E28094737472696E67|
-- +----------------+--------------------+
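To reproduce the comparison outside Snowflake, here is a minimal Python sketch (an illustration, not Snowflake's implementation). It hex-encodes both strings as UTF-8, which is what the HEX_ENCODE output above reflects, and reports the first position where they differ together with the Unicode names of the two characters. The helpers `hex_utf8` and `first_difference` are hypothetical names.

```python
import unicodedata

def hex_utf8(text):
    """Uppercase hex of the UTF-8 bytes, matching the HEX_ENCODE output above."""
    return text.encode("utf-8").hex().upper()

def first_difference(a, b):
    """Return (index, char_a, char_b) for the first differing character, or None."""
    for i, (ca, cb) in enumerate(zip(a, b)):
        if ca != cb:
            return i, ca, cb
    if len(a) != len(b):
        # One string is a prefix of the other: report the first extra character.
        i = min(len(a), len(b))
        return i, a[i:i + 1], b[i:i + 1]
    return None

col1 = "a-string"       # hyphen-minus, U+002D
col2 = "a\u2014string"  # em dash, U+2014

print(hex_utf8(col1))  # 612D737472696E67
print(hex_utf8(col2))  # 61E28094737472696E67

i, c1, c2 = first_difference(col1, col2)
print(i, unicodedata.name(c1), unicodedata.name(c2))  # 1 HYPHEN-MINUS EM DASH
```

Printing the character names makes the culprit obvious even when the two glyphs render identically.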

Related

Need to Add Values to Certain Items

I have a table where I need to add the same values to a whole bunch of items.
(In a nutshell: if an item doesn't have a UNIT of "CTN", I want to add the same values I have listed to all of them.)
I thought the following would work, but it doesn't :(
Any idea what I am doing wrong?
INSERT INTO ICUNIT
(UNIT,AUDTDATE,AUDTTIME,AUDTUSER,AUDTORG,CONVERSION)
VALUES ('CTN','20220509','22513927','ADMIN','AU','1')
WHERE ITEMNO In '0','etc','etc','etc'
If I understand correctly, you might want to use INSERT INTO ... SELECT from the original table with your condition.
INSERT INTO ICUNIT (ITEMNO,UNIT,AUDTDATE,AUDTTIME,AUDTUSER,AUDTORG,CONVERSION)
SELECT DISTINCT ITEMNO,'CTN','20220509','22513927','ADMIN','AU','1'
FROM ICUNIT
WHERE ITEMNO IN ('0','etc','etc','etc');
The query you need starts by selecting the filtered items, so something like the query below is your starting point:
select <?> from dbo.ICUNIT as icu where icu.UNIT <> 'CTN' order by ...;
Notice the use of schema names, statement terminators, and table aliases: all best practices. I will guess that a given "item" can have multiple rows in this table, so long as UNIT is unique within ITEMNO. Correct? If so, the above query won't work, because an item that already has a CTN row would still be returned through its other rows. So let's try slightly more complicated filtering.
select distinct icu.ITEMNO
from dbo.ICUNIT as icu
where not exists (select * from dbo.ICUNIT as ctns
where ctns.ITEMNO = icu.ITEMNO -- correlating the subquery
and ctns.UNIT = 'CTN')
order by ...;
There are other ways to do this, but that is one common way. The query produces a result set of all ITEMNO values in your table that do not already have a row where UNIT is "CTN". If you need to restrict it to specific ITEMNO values, simply adjust the WHERE clause. Once that works correctly, you can use it with your INSERT statement to insert the desired rows.
insert into dbo.ICUNIT (...)
select distinct icu.ITEMNO, 'CTN', '20220509', '22513927', 'ADMIN', 'AU', '1'
from ...
;
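As a runnable sketch of the same INSERT ... SELECT DISTINCT ... WHERE NOT EXISTS pattern, the snippet below uses Python's standard-library sqlite3 module with an in-memory database. The table is a cut-down, hypothetical ICUNIT with only ITEMNO, UNIT and CONVERSION columns, and the item numbers are invented for illustration; SQLite syntax is close to, but not identical to, SQL Server's.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ICUNIT (ITEMNO TEXT, UNIT TEXT, CONVERSION TEXT)")
conn.executemany(
    "INSERT INTO ICUNIT (ITEMNO, UNIT, CONVERSION) VALUES (?, ?, ?)",
    [
        ("A1", "EA", "1"),   # no CTN row yet
        ("A1", "BOX", "6"),  # a second unit for the same item
        ("B2", "EA", "1"),
        ("B2", "CTN", "1"),  # already has a CTN row
    ],
)

# Insert exactly one CTN row per item that does not already have one.
conn.execute(
    """
    INSERT INTO ICUNIT (ITEMNO, UNIT, CONVERSION)
    SELECT DISTINCT icu.ITEMNO, 'CTN', '1'
    FROM ICUNIT AS icu
    WHERE NOT EXISTS (SELECT 1 FROM ICUNIT AS ctns
                      WHERE ctns.ITEMNO = icu.ITEMNO
                        AND ctns.UNIT = 'CTN')
    """
)

ctn_items = [row[0] for row in conn.execute(
    "SELECT ITEMNO FROM ICUNIT WHERE UNIT = 'CTN' ORDER BY ITEMNO")]
print(ctn_items)  # ['A1', 'B2']
```

DISTINCT matters here: item A1 has two non-CTN rows, and without it A1 would receive two CTN rows.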

Not able to identify the difference between identical-looking values

I have data in a table column. I SELECT DISTINCT on that column, and I also wrap it in LTRIM(RTRIM(col_name)) in the SELECT. But I am still getting duplicate values.
How can we identify why this is happening, and how can we avoid it?
I tried the RTRIM, LTRIM and UPPER functions. Still no help.
Query:
select distinct LTRIM(RTRIM(serverstatus))
from SQLInventory
Output:
Development
Staging
Test
Pre-Production
UNKNOWN
NULL
Need to be decommissioned
Production
Pre-Produc​tion
Decommissioned
Non-Production
Unsupported Edition
It looks like there's a Unicode character in there somewhere. I initially copied and pasted the values out as varchar literals and ran the following:
SELECT DISTINCT serverstatus
FROM (VALUES('Development'),
('Staging'),
('Test'),
('Pre-Production'),
('UNKNOWN'),
('NULL'),
('Need to be decommissioned'),
('Production'),
(''),
('Pre-Produc​tion'),
('Decommissioned'),
('Non-Production'),
('Unsupported Edition'))V(serverstatus);
This, interestingly, returned the values below:
Development
Staging
Test
Pre-Production
UNKNOWN
NULL
Need to be decommissioned
Production
Pre-Produc?tion
Decommissioned
Non-Production
Unsupported Edition
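Note that one of the values is Pre-Produc?tion. The character between the c and the t could not be represented in the (non-N) varchar literal, so it was replaced with ?, which means there is a Unicode character there.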
So, let's find out what it is:
SELECT 'Pre-Produc​tion', N'Pre-Produc​tion',
UNICODE(SUBSTRING(N'Pre-Produc​tion',11,1));
The UNICODE function returns 8203 (hex 200B), which is a zero-width space. I assume you want to remove these, so you can update your data with:
UPDATE SQLInventory
SET serverstatus = REPLACE(serverstatus, NCHAR(8203), N'');
Now your first query should work as you expect.
(I also suggest a lookup table for your statuses, with a foreign key, so that this can't happen again.)
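The same diagnosis can be sketched in Python (an illustration only, not SQL Server behaviour): list every non-ASCII character with its 1-based position, code point and Unicode name, the way UNICODE(SUBSTRING(...)) was used above, then strip U+200B as the REPLACE(..., NCHAR(8203), N'') update does. The function names are hypothetical; the status values come from the question.

```python
import unicodedata

ZERO_WIDTH_SPACE = "\u200b"  # code point 8203, i.e. NCHAR(8203) in T-SQL

def non_ascii_chars(text):
    """List (1-based position, code point, Unicode name) for every non-ASCII character."""
    return [
        (pos, ord(ch), unicodedata.name(ch, "<unnamed>"))
        for pos, ch in enumerate(text, start=1)
        if ord(ch) > 127
    ]

def strip_zero_width_spaces(text):
    """Remove U+200B, mirroring REPLACE(serverstatus, NCHAR(8203), N'')."""
    return text.replace(ZERO_WIDTH_SPACE, "")

statuses = ["Production", "Pre-Production", "Pre-Produc\u200btion"]

for status in statuses:
    print(repr(status), non_ascii_chars(status))

# After cleaning, the two "Pre-Production" spellings collapse into one distinct value.
distinct_clean = sorted({strip_zero_width_spaces(s) for s in statuses})
print(distinct_clean)  # ['Pre-Production', 'Production']
```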
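I deal with this type of thing all the time. For stuff like this, NGrams8K, PatReplace8K and PATINDEX are your best friends. (PATINDEX is built into SQL Server; NGrams8K and PatReplace8K are community-written user-defined functions that you have to create in your database first.)
Putting what you posted into a table variable, we can analyze the problem: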
DECLARE @table TABLE (txtID INT IDENTITY, txt NVARCHAR(100));
INSERT @table (txt)
VALUES ('Development'),('Staging'),('Test'),('Pre-Production'),('UNKNOWN'),(NULL),
('Need to be decommissioned'),('Production'),(''),('Pre-Produc​tion'),('Decommissioned'),
('Non-Production'),('Unsupported Edition');
This query will identify items with characters other than the letters A-Z, spaces and hyphens:
SELECT t.txtID, t.txt
FROM @table AS t
WHERE PATINDEX('%[^a-zA-Z -]%',t.txt) > 0;
This returns:
txtID txt
----------- -------------------------------------------
10 Pre-Produc​tion
To identify the bad character, we can use NGrams8K like this:
SELECT t.txtID, t.txt, ng.position, ng.token -- ,UNICODE(ng.token)
FROM @table AS t
CROSS APPLY dbo.NGrams8K(t.txt,1) AS ng
WHERE PATINDEX('%[^a-zA-Z -]%',ng.token)>0;
Which returns:
txtID txt position token
------ ----------------- -------------------- ---------
10 Pre-Produc​tion 11 ?
PatReplace8K makes cleaning up stuff like this quick and easy. First, note this query:
SELECT OldString = t.txt, p.NewString
FROM @table AS t
CROSS APPLY dbo.patReplace8K(t.txt,'%[^a-zA-Z -]%','') AS p
WHERE PATINDEX('%[^a-zA-Z -]%',t.txt) > 0;
Which returns this on my system:
OldString NewString
------------------ ----------------
Pre-Produc?tion Pre-Production
To fix the problem, you can use PatReplace8K like this:
UPDATE t
SET txt = p.newString
FROM @table AS t
CROSS APPLY dbo.patReplace8K(t.txt,'%[^a-zA-Z -]%','') AS p
WHERE PATINDEX('%[^a-zA-Z -]%',t.txt) > 0;
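For readers without NGrams8K or PatReplace8K installed, here is a Python sketch of the same two steps (an analogue, not those functions' actual implementation): split each string into 1-character tokens with their positions, flag tokens outside the whitelist [a-zA-Z -], and remove them with a regular-expression substitution. The sample rows mirror the table variable above, with None standing in for NULL; the helper names are hypothetical.

```python
import re

# Same whitelist as the T-SQL pattern '%[^a-zA-Z -]%': ASCII letters, space, hyphen.
BAD_CHAR = re.compile(r"[^a-zA-Z -]")

def unigrams(text):
    """(1-based position, token) pairs, similar in spirit to NGrams8K(text, 1)."""
    return list(enumerate(text, start=1))

def bad_tokens(text):
    """Positions and characters that fall outside the whitelist."""
    return [(pos, tok) for pos, tok in unigrams(text) if BAD_CHAR.match(tok)]

def pat_replace(text, replacement=""):
    """Replace every non-whitelisted character, like PatReplace8K(text, pattern, '')."""
    return BAD_CHAR.sub(replacement, text)

rows = {
    1: "Development",
    4: "Pre-Production",
    6: None,  # NULL
    9: "",
    10: "Pre-Produc\u200btion",
}

for txt_id, txt in rows.items():
    if txt and BAD_CHAR.search(txt):  # like WHERE PATINDEX(...) > 0
        print(txt_id, bad_tokens(txt), pat_replace(txt))
```

A whitelist is a deliberate choice: it catches zero-width spaces, non-breaking spaces and look-alike dashes in one pass, at the cost of also removing legitimate characters (digits, accents) if your real data contains them.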

Need to check columns one by one for a certain string but can't figure out a way to do so

Is there a way for me to take a string and check multiple columns in another table, in order starting from 1, until I get a match?
I have a table with a few values:
Medicine
-----------
Advil
Tylenol
Midol
I need to check each medicine above against another table, checking the columns in order.
MedsToTry1 | MedsToTry2 | MedsToTry3 | MedsToTry4 | MedsToTry5 | MedsToTry6
-----------|------------|------------|------------|------------|-----------
NotAdvil   | Advil      | Null       | Null       | Null       | Null
Tylenol    | Ibuprofen  | NotTylenol | Null       | Null       | Null
NotMidol   | NotAdvil   | Ibuprofen  | Midol      | Null       | Null
So I have to go through each value in the first table and search for it in the 'MedsToTry1' column; if it's not there, then in 'MedsToTry2', and so on until it's found.
I've tried concatenating all the strings in the MedsToTry columns and searching for the value in the result, but that doesn't guarantee the order, and I need 'MedsToTry1' to be checked first.
I tried to use COALESCE, but it returns the MedsToTry1 values because they're all non-null, so it never goes on to MedsToTry2 to see if the medicine is there.
Is there a way to do this, i.e. take a string and check multiple columns in another table, in order starting from 1, until I get a match?
If I need to provide more information, please let me know. I'm pretty new to SQL, so I'll take any and all help I can get.
Thank you.
Your table is quite probably not the real source table. If you can query against the source table instead, that would be much better. This table is (again my guess) the result of a pivot operation, where a row-wise table is transformed into a side-by-side, column-wise format.
You did not state the expected output. Next time, or to improve this question, please look at my code and try to prepare a stand-alone example that reproduces your issue. This time I've done it for you:
First we have to declare mock-up tables to simulate your issue:
DECLARE @tblA TABLE(Medicine VARCHAR(100));
INSERT INTO @tblA VALUES
('Advil')
,('Tylenol')
,('Midol');
DECLARE @tblB TABLE(RowId INT,MedsToTry1 VARCHAR(100),MedsToTry2 VARCHAR(100),MedsToTry3 VARCHAR(100),MedsToTry4 VARCHAR(100),MedsToTry5 VARCHAR(100),MedsToTry6 VARCHAR(100));
INSERT INTO @tblB VALUES
(1,'NotAdvil','Advil',Null,Null ,Null,Null)
,(2,'Tylenol','Ibuprofen','NotTylenol',Null ,Null,Null)
,(3,'NotMidol','NotAdvil','Ibuprofen','Midol',Null,Null);
--This is the query (use your own table names to test this in your environment)
SELECT B.RowId
,C.*
FROM @tblB b
CROSS APPLY (VALUES(1,b.MedsToTry1)
,(2,b.MedsToTry2)
,(3,b.MedsToTry3)
,(4,b.MedsToTry4)
,(5,b.MedsToTry5)
,(6,b.MedsToTry6)) C(MedRank,Medicin)
WHERE EXISTS(SELECT 1 FROM @tblA A WHERE A.Medicine=C.Medicin);
The idea in short:
The trick with CROSS APPLY (VALUES... returns each numbered column (MedsToTry1, MedsToTry2, ...) as a separate row, together with a rank. This way we keep the information about the sort order, i.e. the column's position within the table.
The WHERE clause reduces the set to the rows for which a corresponding medicine exists in the other table.
The result
RowId MedRank Medicin
1 2 Advil
2 1 Tylenol
3 4 Midol
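The query above returns every matching column, so a row that matches in two columns would appear twice. If you only want the first match per row (the lowest MedRank), the logic is to walk the columns in order and stop at the first hit; in T-SQL you would typically number the matches with ROW_NUMBER() OVER (PARTITION BY RowId ORDER BY MedRank) and keep number 1. Here is a plain Python sketch of that ordered search on the same sample data, with None standing in for NULL; `first_match` is a hypothetical name.

```python
medicines = {"Advil", "Tylenol", "Midol"}

# RowId -> values of MedsToTry1..MedsToTry6
meds_to_try = {
    1: ["NotAdvil", "Advil", None, None, None, None],
    2: ["Tylenol", "Ibuprofen", "NotTylenol", None, None, None],
    3: ["NotMidol", "NotAdvil", "Ibuprofen", "Midol", None, None],
}

def first_match(columns, wanted):
    """Return (MedRank, medicine) for the first column, in order, whose value is wanted."""
    for rank, value in enumerate(columns, start=1):
        if value is not None and value in wanted:
            return rank, value
    return None  # no column matched

for row_id, columns in meds_to_try.items():
    print(row_id, first_match(columns, medicines))
```

For the sample data this yields the same RowId, MedRank and medicine triples as the result table above.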

Separating Numeric Values from string

I'm having an issue: I need to split values such as BPO007 into BPO and 007. Another example would be GFE0035, where I also need to split the numeric value from the characters.
When I run select isnumeric('BPO007'), the result is 0, which is correct; however, I'm not sure how to split these from each other.
I've had a look at this link, but it does not really answer my question.
I need the above split for a separate validation purpose in my trigger.
How would I develop something like this?
Thank you in advance.
As mentioned in a comment before:
About your question "How would I develop something like this?":
Do not store two pieces of information in the same column (read about first normal form, 1NF). You should keep separate columns in your database for BPO and for 007 (or rather an integer 7).
Then use some string functions to compute BPO007 when you need it in your output.
But just so I don't leave you out in the rain:
DECLARE @tbl TABLE(YourColumn VARCHAR(100));
INSERT INTO @tbl VALUES('BPO007'),('GFE0035');
SELECT YourColumn,pos
,LEFT(YourColumn,pos) AS CharPart
,CAST(SUBSTRING(YourColumn,pos+1,1000) AS INT) AS NumPart
FROM @tbl
CROSS APPLY(SELECT PATINDEX('%[0-9]%',YourColumn)-1) AS A(pos);
Will return
YourColumn pos CharPart NumPart
BPO007 3 BPO 7
GFE0035 3 GFE 35
Hint: I use a CROSS APPLY here to compute pos, the length of the character part (the position of the first numeric character minus 1), and then use pos in the actual query as you'd use a variable. Otherwise the PATINDEX would have to be repeated.
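If the codes always consist of letters followed by digits (an assumption: the question only shows that shape), the split can also be sketched in Python with a regular expression. Keeping the digits as text preserves leading zeros such as 007, which the CAST(... AS INT) above turns into 7. The name `split_code` is hypothetical, and rejecting anything that doesn't fit the pattern is a design choice made for this sketch.

```python
import re

# Letters first, then digits, e.g. 'BPO007' -> 'BPO' + '007'.
CODE_PATTERN = re.compile(r"([A-Za-z]+)([0-9]+)")

def split_code(code):
    """Return (char_part, digits_as_text, digits_as_int); raise if the code doesn't fit."""
    m = CODE_PATTERN.fullmatch(code)
    if m is None:
        raise ValueError(f"unexpected code format: {code!r}")
    chars, digits = m.groups()
    return chars, digits, int(digits)

for code in ["BPO007", "GFE0035", "GFEGVT003509", "GFEMTS10035"]:
    print(code, split_code(code))
```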
Since the lengths of the text and numeric parts vary, you can use the following code:
DECLARE @NUMERIC TABLE (Col VARCHAR(50));
INSERT INTO @NUMERIC VALUES('BPO007'),('GFE0035'),('GFEGVT003509'),('GFEMTS10035');
SELECT
Col,
LEFT(Col,LEN(Col)-LEN(SUBSTRING(Col,PATINDEX('%[0-9]%',Col),DATALENGTH(Col)))) AS TEXTs,
RIGHT(Col,LEN(Col)-LEN(LEFT(Col,LEN(Col)-LEN(SUBSTRING(Col,PATINDEX('%[0-9]%',Col),DATALENGTH(Col)))))) AS NUMERICs
FROM @NUMERIC;

SEQUENCE in SQL Server 2008 R2

I need to know if there is any way to have a SEQUENCE, or something like it, as we have in Oracle. The idea is to get one number and then use it as a key to save some records in a table. Each time we need to save data in that table, we first get the next number from the sequence and then use that same number to save some records. It is not an IDENTITY column.
For example:
[ID] [SEQUENCE ID] [Code] [Value]
1 1 A 232
2 1 B 454
3 1 C 565
Next time someone needs to add records, the next SEQUENCE ID should be 2. Is there any way to do this? The sequence could be a GUID for me as well.
As Guillelon points out, the best way to do this in SQL Server is with an identity column.
You can simply define a column as being identity. When a new row is inserted, the identity is automatically incremented.
The difference is that the identity is incremented on every row, not just on some rows. To be honest, I think this is a much better approach. Your example suggests that you are storing both an entity and its details in the same table.
The SequenceId should be the primary identity key in another table. This value can then be used for insertion into this table.
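A minimal sketch of that design, using Python's built-in sqlite3 module so it runs anywhere: a hypothetical Batch table owns the automatically assigned SequenceId, and a hypothetical Detail table stores the rows that share it. In SQL Server, Batch.SequenceId would be an IDENTITY column and you would read the new value with SCOPE_IDENTITY(); here SQLite's INTEGER PRIMARY KEY and cursor.lastrowid play that role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Batch (SequenceId INTEGER PRIMARY KEY);  -- key assigned automatically
    CREATE TABLE Detail (
        ID INTEGER PRIMARY KEY,
        SequenceId INTEGER NOT NULL REFERENCES Batch(SequenceId),
        Code TEXT,
        Value INTEGER
    );
    """
)

def save_batch(conn, rows):
    """Create one Batch row, then attach every (Code, Value) pair to its SequenceId."""
    with conn:  # commits on success, rolls back on error
        seq_id = conn.execute("INSERT INTO Batch DEFAULT VALUES").lastrowid
        conn.executemany(
            "INSERT INTO Detail (SequenceId, Code, Value) VALUES (?, ?, ?)",
            [(seq_id, code, value) for code, value in rows],
        )
    return seq_id

first = save_batch(conn, [("A", 232), ("B", 454), ("C", 565)])
second = save_batch(conn, [("A", 100)])
print(first, second)  # 1 2
```

Because the database hands out the parent key, two concurrent callers can never receive the same SequenceId.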
This can be done in multiple ways. Here is what I can think of:
Create a trigger that computes the next value
Add a computed column along with a function that retrieves the next value of the sequence
Here is an article that presents various solutions
One possible way is to do something like this:
-- Example 1
DECLARE @Var INT;
SET @Var = (SELECT ISNULL(MAX(ID), 0) + 1 FROM tbl);
INSERT INTO tbl VALUES (@Var,'Record 1');
INSERT INTO tbl VALUES (@Var,'Record 2');
INSERT INTO tbl VALUES (@Var,'Record 3');
-- Example 2
CREATE TABLE #temp (col1 INT, col2 INT);
INSERT INTO #temp VALUES (1,2);
INSERT INTO #temp VALUES (1,2);
INSERT INTO ActualTable (col1, col2, sequence)
SELECT temp.*, (SELECT ISNULL(MAX(sequence), 0) + 1 FROM ActualTable)
FROM #temp temp;
-- Example 3
-- OUTPUT cannot assign to a scalar variable, so capture the value in a table variable
DECLARE @out TABLE (sequence INT);
INSERT INTO ActualTable (col1, col2, sequence)
OUTPUT inserted.sequence INTO @out (sequence)
SELECT 1, 2, ISNULL(MAX(sequence), 0) + 1 FROM ActualTable;
DECLARE @seq INT = (SELECT sequence FROM @out);
The first two examples rely on batch inserts. Based on your comment, I have added example 3, which initially inserts a single row. You can then use the sequence value that was inserted to insert the rest of the records. If you have never used OUTPUT, please reply in the comments and I will expand further.
I would wrap all of the above in a transaction, with appropriate locking (for example, UPDLOCK and HOLDLOCK hints on the MAX query), because MAX + 1 is otherwise not safe when two sessions insert at the same time.
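To illustrate why the transaction matters: two sessions that both read MAX(sequence) before either one inserts would get the same number. The Python sketch below uses sqlite3, whose BEGIN IMMEDIATE takes the write lock before the MAX is read, so the read and the inserts happen as one unit. The table mirrors the hypothetical ActualTable above with a single data column, `insert_group` is a hypothetical name, and SQL Server would need its own locking hints rather than BEGIN IMMEDIATE.

```python
import sqlite3

# isolation_level=None: no implicit transactions, so BEGIN/COMMIT are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE ActualTable (ID INTEGER PRIMARY KEY, col1 TEXT, sequence INTEGER)"
)

def insert_group(conn, values):
    """Compute MAX(sequence) + 1 and insert all rows under one write transaction."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading MAX
    try:
        (next_seq,) = conn.execute(
            "SELECT COALESCE(MAX(sequence), 0) + 1 FROM ActualTable"
        ).fetchone()
        conn.executemany(
            "INSERT INTO ActualTable (col1, sequence) VALUES (?, ?)",
            [(v, next_seq) for v in values],
        )
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return next_seq

first = insert_group(conn, ["Record 1", "Record 2", "Record 3"])
second = insert_group(conn, ["Record 4"])
print(first, second)  # 1 2
```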
If you were using SQL Server 2012 or later, you could use a SEQUENCE object (with NEXT VALUE FOR) as shown here.
Forgive me if there are syntax errors; I don't have SSMS installed.
