Is not empty string not working for special characters [duplicate] - sql-server

Ok so I have a table with three columns:
Id, Key, Value
I would like to delete all rows where Value is empty (''). Therefore I wrote the query to select before I delete which was:
Select * from [Imaging.ImageTag] where [Value] = ''
all pretty standard so far...
Now here's the strange part. This query returned two rows, shown below with commas separating the columns:
CE7C367C-5C4A-4531-9C8C-8F2A26B1B980, ObjectType, 🎃
F5B2F8A8-C4A8-4799-8824-E5FFEEDAB887, Caption, 🍰
Why are these two rows matching on ''?
Extra Info
I am using SQL Server, the [Value] column is of type NVARCHAR(300), and yes, the table name really is [Imaging.ImageTag]

This is collation dependent.
Matches empty string
SELECT 1 where N'' = N'🍰' COLLATE latin1_general_ci_as
Doesn't match empty string
SELECT 1 WHERE N'' = N'🍰' COLLATE latin1_general_100_ci_as
The 100-level collations are more up-to-date (though still not bleeding edge; they have been available since 2008), and you should use more modern collations unless you have some specific reason not to. The BOL entry for the 100 collations specifically calls out:
Weighting has been added to previously non-weighted characters that
would have compared equally.

It's not an answer to your "why", but in terms of your overall goal, perhaps you should alter your strategy for searching for empty values:
Select * from [Imaging.ImageTag] where LEN([Value]) = 0
As per the comments (thanks Martin Smith for providing some copy/pastable emoji):
SELECT CASE WHEN N'' = N'🍰' then 1 else 0 end --returns 1, no good for checking
SELECT LEN(N'🍰') --returns 2, can be used to check for zero length values?
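As a side note on why LEN returns 2: 🍰 is U+1F370, which sits outside the Basic Multilingual Plane, so NVARCHAR (UTF-16) stores it as a surrogate pair of two 16-bit units, and LEN counts those units. A quick illustration in Python (nothing here is SQL Server specific):

```python
# 🍰 is U+1F370, outside the Basic Multilingual Plane (> U+FFFF),
# so NVARCHAR (UTF-16) stores it as a surrogate pair: two 16-bit units.
cake = "\U0001F370"          # 🍰

code_point = ord(cake)
utf16_units = len(cake.encode("utf-16-le")) // 2   # 16-bit units used

print(hex(code_point))   # 0x1f370
print(utf16_units)       # 2 -- matches LEN(N'🍰') in SQL Server
```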

Complementing these answers: when you need to use LIKE in SQL:
WHERE
N'' + COLUMNS LIKE N'%' + @WordSearch + N'%' COLLATE latin1_general_100_ci_as

Google sent me here while looking for a way to filter all rows with an emoji in a varchar column.
In case you're looking for something similar:
SELECT mycolumn
FROM mytable
WHERE REGEXP_EXTRACT(mycolumn,'\x{1f600}') <> ''
-- SQL Server has no REGEXP_EXTRACT, and PATINDEX doesn't understand \x{...};
-- one option is to match the emoji's UTF-16 surrogate pair instead:
-- WHERE MyCol LIKE N'%' + NCHAR(0xD83D) + NCHAR(0xDE00) + N'%'
The \x{1f600} is the Unicode code point of the emoji you're searching for; you can find the emoji codes here
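If you'd rather not hunt through a lookup site, any language that exposes Unicode code points will tell you the \x{...} value. A small Python sketch (the sample rows are made up for illustration):

```python
import unicodedata

# The code point you'd plug into \x{...}: e.g. 😀 is U+1F600.
emoji = "😀"
print(hex(ord(emoji)))           # 0x1f600
print(unicodedata.name(emoji))   # GRINNING FACE

# Filtering rows that contain the emoji, as the queries above do:
rows = ["plain text", "has an emoji 😀 inside"]
matches = [r for r in rows if emoji in r]
print(matches)                   # ['has an emoji 😀 inside']
```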

Related

Why is SQL like returning both 2010 and 010 when searched with - where columnName like N'%ဲ010%' [duplicate]


not able to identify difference between same value

I have data in a table's column. I ran a SELECT DISTINCT on that column, wrapping it in LTRIM(RTRIM(col_name)) as well, but I am still getting duplicate records.
How can I identify why this is happening, and how can I avoid it?
I tried the RTRIM, LTRIM and UPPER functions. Still no help.
Query:
select distinct LTRIM(RTRIM(serverstatus))
from SQLInventory
Output:
Development
Staging
Test
Pre-Production
UNKNOWN
NULL
Need to be decommissioned
Production
Pre-Produc​tion
Decommissioned
Non-Production
Unsupported Edition
Looks like there's a Unicode character in there somewhere. I initially copied and pasted the values out as a varchar, and did the following:
SELECT DISTINCT serverstatus
FROM (VALUES('Development'),
('Staging'),
('Test'),
('Pre-Production'),
('UNKNOWN'),
('NULL'),
('Need to be decommissioned'),
('Production'),
(''),
('Pre-Produc​tion'),
('Decommissioned'),
('Non-Production'),
('Unsupported Edition'))V(serverstatus);
This, interestingly, returned the values below:
Development
Staging
Test
Pre-Production
UNKNOWN
NULL
Need to be decommissioned
Production
Pre-Produc?tion
Decommissioned
Non-Production
Unsupported Edition
Note that one of the values is Pre-Produc?tion, meaning there is a Unicode character between the c and the t.
So, let's find out what it is:
SELECT 'Pre-Produc​tion', N'Pre-Produc​tion',
UNICODE(SUBSTRING(N'Pre-Produc​tion',11,1));
The UNICODE function returns 8203, which is a zero-width space. I assume you want to remove these, so you can update your data by doing:
UPDATE SQLInventory
SET serverstatus = REPLACE(serverstatus, NCHAR(8203), N'');
Now your first query should work as you expect.
(I also suggest you might therefore want a lookup table for your statuses, with a foreign key, so that this can't happen again.)
DB<>fiddle
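To make the mechanics concrete, here is the same diagnosis and fix sketched in Python; U+200B (decimal 8203) is the zero-width space the UNICODE call uncovered:

```python
# 'Pre-Produc\u200btion' carries a zero-width space (U+200B, decimal 8203)
# between the 'c' and the 't' -- invisible, but enough to defeat DISTINCT.
dirty = "Pre-Produc\u200btion"
clean = "Pre-Production"

print(dirty == clean)   # False: the strings only *look* identical
print(ord(dirty[10]))   # 8203 -- the value UNICODE(SUBSTRING(..., 11, 1)) found

# Mirror of REPLACE(serverstatus, NCHAR(8203), N''):
fixed = dirty.replace("\u200b", "")
print(fixed == clean)   # True
```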
I deal with this type of thing all the time. For stuff like this, NGrams8K, PatReplace8K and PATINDEX are your best friends.
Putting what you posted in a table variable we can analyze the problem:
DECLARE @table TABLE (txtID INT IDENTITY, txt NVARCHAR(100));
INSERT @table (txt)
VALUES ('Development'),('Staging'),('Test'),('Pre-Production'),('UNKNOWN'),(NULL),
('Need to be decommissioned'),('Production'),(''),('Pre-Produc​tion'),('Decommissioned'),
('Non-Production'),('Unsupported Edition');
This query will identify items with characters other than A-Z, spaces and hyphens:
SELECT t.txtID, t.txt
FROM @table AS t
WHERE PATINDEX('%[^a-zA-Z -]%',t.txt) > 0;
This returns:
txtID txt
----------- -------------------------------------------
10 Pre-Produc​tion
To identify the bad character we can use NGrams8k like this:
SELECT t.txtID, t.txt, ng.position, ng.token -- ,UNICODE(ng.token)
FROM @table AS t
CROSS APPLY dbo.NGrams8K(t.txt,1) AS ng
WHERE PATINDEX('%[^a-zA-Z -]%',ng.token)>0;
Which returns:
txtID txt position token
------ ----------------- -------------------- ---------
10 Pre-Produc​tion 11 ?
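For readers without NGrams8K installed, the same "split into 1-character grams and flag anything outside [a-zA-Z -]" idea can be sketched in Python (sample values taken from the question):

```python
import re

values = ["Development", "Pre-Production", "Pre-Produc\u200btion"]

# Same idea as PATINDEX('%[^a-zA-Z -]%', ...) applied to 1-character grams:
# record each 1-based position whose character falls outside A-Z, a-z,
# space and hyphen.
bad_chars = {}
for txt in values:
    hits = [(i + 1, ord(ch))                 # 1-based position, like NGrams8K
            for i, ch in enumerate(txt)
            if re.fullmatch(r"[^a-zA-Z -]", ch)]
    if hits:
        bad_chars[txt] = hits

print(bad_chars)   # only the string with the hidden character is flagged:
                   # {'Pre-Produc\u200btion': [(11, 8203)]}
```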
PatReplace8K makes cleaning up stuff like this quick and easy. First note this query:
SELECT OldString = t.txt, p.NewString
FROM @table AS t
CROSS APPLY dbo.patReplace8K(t.txt,'%[^a-zA-Z -]%','') AS p
WHERE PATINDEX('%[^a-zA-Z -]%',t.txt) > 0;
Which returns this on my system:
OldString NewString
------------------ ----------------
Pre-Produc?tion Pre-Production
To fix the problem you can use patreplace8K like this:
UPDATE t
SET txt = p.newString
FROM @table AS t
CROSS APPLY dbo.patReplace8K(t.txt,'%[^a-zA-Z -]%','') AS p
WHERE PATINDEX('%[^a-zA-Z -]%',t.txt) > 0;
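The patReplace8K call above boils down to "delete every character outside the allowed set", which a regex replace expresses directly. A Python sketch of the same cleanup:

```python
import re

# Mirror of dbo.patReplace8K(txt, '%[^a-zA-Z -]%', ''):
# delete every character outside A-Z, a-z, space and hyphen.
def pat_replace(txt: str) -> str:
    return re.sub(r"[^a-zA-Z -]", "", txt)

print(pat_replace("Pre-Produc\u200btion"))   # Pre-Production
```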

SQL Query Where Column = '' returning Emoji characters 🎃 and 🍰


SQL Server Management Studio decimal column with blank value

I have a couple of columns defined as decimal(16,2).
I would like to leave the value '' (blank) in them.
When I have the select query as
CASE
WHEN FIELD1 IS NULL THEN ''
ELSE FIELD1
END AS FIELD_NAME
This is not allowed because of the nature of the column.
Could you please help me with how I can put a blank value in this column?
Many thanks
You will either need to leave the values as NULL, which is recommended solution, or convert everything to text. The reason for the error is that you cannot return a decimal and a text value in the same column ('' is a text value).
To convert everything to text and insert the blank value, use the following:
SELECT ISNULL(CONVERT(VARCHAR(20), Field1), '')
FROM myTable
I agree with the comments, but if you must do it in SQL you could use a cast in your CASE statement, as in the code below. It is better handled in the front end though, as the code below means your final SELECT returns a varchar and not a decimal. Handled in the front end, you can return a decimal from SQL and display a blank if it is NULL.
create table #temp (thing decimal(16,2))
insert into #temp (thing) values(NULL)
select
case when thing IS NULL then ''
else cast(thing as varchar(5)) end
from #temp
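The "handle it in the front end" advice amounts to formatting at display time: keep the column (and its NULLs) as decimal, and only render a blank when building the output. A sketch in Python, with a hypothetical fmt helper:

```python
from decimal import Decimal

# Hypothetical display-side helper: the column stays decimal(16,2) with
# NULLs in the database; the blank only exists in the rendered output.
def fmt(value):
    return "" if value is None else f"{value:.2f}"

print(repr(fmt(None)))              # ''
print(repr(fmt(Decimal("16.20"))))  # '16.20'
```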

How to convert varchar to ASCII7

I want to replace any Latin/accented characters with their basic alphabet letters and strip out everything that can't be converted.
examples:
'ë' to be replaced with 'e'
'ß' to be replaced with 's', or 'ss' if possible; if neither is possible, strip it
I am able to do this in C# code, but I'm just not experienced enough in MSSQL to solve this without it taking many days.
UPDATE: the data in the varchar column is populated by a trigger on another table, which should contain normal Unicode text. I want to convert the text to ASCII-7 in a function to use for further processing.
UPDATE: I would prefer a solution that can be done in SQL only, avoiding custom character mapping. Can this be done, or is it currently just not possible?
As Aaron said, I don't think you can dispose of mapping tables entirely in SQL, but mapping characters to ASCII-7 should only involve some fairly simple tables, used in conjunction with AI (accent-insensitive) collations. There are two tables here: one to map the characters in the column, and one for the letters of the alphabet (which could be expanded if necessary).
By using the AI collations, I get around a lot of explicit mapping definitions.
-----------------------------------------------
-- One time mapping table setup
CREATE TABLE t4000(i INT PRIMARY KEY);
GO
INSERT INTO t4000 --Just a simple list of integers from 1 to 4000
SELECT ROW_NUMBER()OVER(ORDER BY a.x)
FROM (VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) a(x)
CROSS APPLY (VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) b(x)
CROSS APPLY (VALUES(1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) c(x)
CROSS APPLY (VALUES(1),(2),(3),(4)) d(x)
GO
CREATE TABLE TargetChars(ch NVARCHAR(2) COLLATE Latin1_General_CS_AI PRIMARY KEY);
GO
INSERT TargetChars -- A-Z, a-z, ss
SELECT TOP(128) CHAR(i)
FROM t4000
WHERE i BETWEEN 65 AND 90
OR i BETWEEN 97 AND 122
UNION ALL
SELECT 'ss'
-- plus any other special targets here
GO
-----------------------------------------------
-- function
CREATE FUNCTION dbo.TrToA7(@str NVARCHAR(4000))
RETURNS NVARCHAR(4000)
AS
BEGIN
DECLARE @mapped NVARCHAR(4000) = '';
SELECT TOP(LEN(@str))
@mapped += ISNULL(tc.ch, SUBSTRING(@str, i, 1))
FROM t4000
LEFT JOIN TargetChars tc ON tc.ch = SUBSTRING(@str, i, 1)
COLLATE Latin1_General_CS_AI;
RETURN @mapped;
END
END
GO
Usage example:
SELECT dbo.TrToA7('It was not á tötal löß.');
Result:
--------------------------
It was not a total loss.
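For comparison, the same idea (decompose accented letters, keep the base letter, special-case 'ß') is available outside SQL via Unicode normalization. A rough Python analogue of dbo.TrToA7; the 'ß' → 'ss' replacement is explicit, mirroring the 'ss' row in TargetChars:

```python
import unicodedata

# Rough analogue of dbo.TrToA7: expand 'ß' explicitly (like the 'ss' row
# in TargetChars), decompose accented letters (NFKD), drop the combining
# marks, then keep only what survives as 7-bit ASCII.
def to_ascii7(text: str) -> str:
    text = text.replace("ß", "ss")
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed
                       if not unicodedata.combining(ch))
    return stripped.encode("ascii", "ignore").decode("ascii")

print(to_ascii7("It was not á tötal löß."))   # It was not a total loss.
```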
