Replacing German Characters (Characters with Umlauts) - sql-server

When I do a simple select like case 1 below, the replace function works as expected.
CASE 1
select replace ('äaü','ä','ae')
RESULT
aeaü
When I do the same on a table column, it replaces even 'a' with 'ae', which is unexpected.
CASE 2
select replace (column_1,'ä','ae') as actual_text
from table
RESULT
aeaeü
How can I achieve the expected results in case 2?

Thanks @juergen d!
My database's default collation was set to Latin1_General_CI_AI earlier.
Now I executed the following statement, which gives me the expected results:
select replace (column_1 collate Latin1_General_CI_AS,'ä','ae') as actual_text
from table
i.e., the collation has been changed from accent-insensitive to accent-sensitive.

SELECT 'Citroën' COLLATE Ukrainian_CI_AI -- returns Citroen
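Why an accent-insensitive (AI) collation treats 'ä' and plain 'a' as the same character can be sketched in Python. This is only an approximation of AI comparison semantics, not SQL Server's actual collation algorithm: decompose to NFD and drop combining marks, so accented letters fold onto their base letters.

```python
import unicodedata

def accent_fold(s):
    # NFD splits 'ä' into 'a' + a combining diaeresis; dropping the
    # combining marks is roughly what an AI collation does when comparing.
    return ''.join(ch for ch in unicodedata.normalize('NFD', s)
                   if not unicodedata.combining(ch))

print(accent_fold('äaü'))  # -> aau
```

Under these folded rules 'ä' and 'a' collide, which is why REPLACE under an AI collation also rewrote the plain 'a'; an AS (accent-sensitive) collation keeps them distinct.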

Related

Does Snowflake support case insensitive where clause filter similar to SQL Server

I am migrating SQL code to Snowflake, and during the migration I found that by default Snowflake compares varchar fields case-sensitively (e.g. select 1 where 'Hello' = 'hello' returns no rows). To solve this problem I set the collation 'en-ci' at the account level. However, now I am not able to use crucial functions like REPLACE.
Is it possible in Snowflake to do case-insensitive varchar comparison (without specifying a collation explicitly or using the UPPER function every time) and still use the REPLACE function?
I would appreciate any help.
Thanks,
You can compare the text with a case-insensitive match via ILIKE:
select 1 where 'Hello' ilike 'hello'
regexp_replace is your friend; it allows an 'i' parameter that stands for "ignore case":
https://docs.snowflake.com/en/sql-reference/functions/regexp_replace.html
For example, you can do something like this:
select regexp_replace('cats are grey, cAts are Cats','cats','dogs',1,0,'i');
I've assumed the default values for position and occurrence, but those can also be adjusted.
And you can still do comparison (also based on regexp, aka "RLIKE"):
https://docs.snowflake.com/en/sql-reference/functions/rlike.html
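For intuition, Python's re module behaves the same way as the 'i' parameter described above: the IGNORECASE flag ignores case during matching, while the replacement text is used as-is.

```python
import re

# Same idea as REGEXP_REPLACE(..., 'i'): match case-insensitively,
# replace every occurrence regardless of how it was capitalized.
text = 'cats are grey, cAts are Cats'
print(re.sub('cats', 'dogs', text, flags=re.IGNORECASE))
# -> dogs are grey, dogs are dogs
```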
Snowflake supports COLLATE:
SELECT 1 WHERE 'Hello' = 'hello' COLLATE 'en-ci';
-- 1
SELECT 'Hello' = 'hello'
,'Hello' = 'hello' COLLATE 'en-ci';
Output: FALSE, TRUE
The collation can be set up at the account/database/schema/table level with the parameter DEFAULT_DDL_COLLATION:
Sets the default collation used for the following DDL operations:
CREATE TABLE
ALTER TABLE … ADD COLUMN
Setting this parameter forces all subsequently-created columns in the affected objects (table, schema, database, or account) to have the specified collation as the default, unless the collation for the column is explicitly defined in the DDL.

Is there a SQL Server collation option that will allow matching different apostrophes?

I'm currently using SQL Server 2016 with SQL_Latin1_General_CP1_CI_AI collation. As expected, queries with the letter e will match values with the letters e, è, é, ê, ë, etc because of the accent insensitive option of the collation. However, queries with a ' (U+0027) do not match values containing a ’ (U+2019). I would like to know if such a collation exists where this case would match, since it's easier to type ' than it is to know that ’ is keystroke Alt-0146.
I'm confident in saying no. The main thing, here, is that the two characters are different (although similar). With accents, e and ê are still both an e (just one has an accent). This enables you (for example) to do searches for things like SELECT * FROM Games WHERE [Name] LIKE 'Pokémon%'; and still have rows containing Pokemon return (because people haven't used the accent :P).
The best thing I could suggest would be to use REPLACE (at least in your WHERE clause) so that both rows are returned. That is, however, likely going to get expensive.
If you know which columns are going to be a problem, you could add a PERSISTED computed column to that table. Then you could use that column in your WHERE clause, but display the original one. Something like:
USE Sandbox;
--Create Sample table and data
CREATE TABLE Sample (String varchar(500));
INSERT INTO Sample
VALUES ('This is a string that does not contain either apostrophe'),
('Where as this string, isn''t without at least one'),
('’I have one of them as well’'),
('’Well, I''m going to use both’');
GO
--First attempt (without the column)
SELECT String
FROM Sample
WHERE String LIKE '%''%'; --Only returns 2 of the rows
GO
--Create a PERSISTED Column
ALTER TABLE Sample ADD StringRplc AS REPLACE(String,'’','''') PERSISTED;
GO
--Second attempt
SELECT String
FROM Sample
WHERE StringRplc LIKE '%''%'; --Returns 3 rows
GO
--Clean up
DROP TABLE Sample;
GO
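The computed-column trick above is just apostrophe normalization. The same idea can be sketched in Python (with hypothetical sample rows mirroring the SQL demo):

```python
def normalize_apostrophes(s):
    # Fold the right single quotation mark U+2019 onto the ASCII
    # apostrophe U+0027 -- the same mapping the PERSISTED column
    # applies with REPLACE.
    return s.replace('\u2019', "'")

rows = ["This string contains neither apostrophe",
        "Where as this string, isn't without at least one",
        "\u2019I have one of them as well\u2019"]

# Searching the normalized text finds rows with either kind of apostrophe.
matches = [r for r in rows if "'" in normalize_apostrophes(r)]
print(len(matches))  # -> 2
```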
The other answer is correct: there is no such collation. You can easily verify this with the script below.
DECLARE @dynSql NVARCHAR(MAX) =
'SELECT * FROM (' +
(
SELECT SUBSTRING(
(
SELECT ' UNION ALL SELECT ''' + name + ''' AS name, IIF( NCHAR(0x0027) = NCHAR(0x2019) COLLATE ' + name + ', 1,0) AS Equal'
FROM sys.fn_helpcollations()
FOR XML PATH('')
), 12, 0 + 0x7fffffff)
)
+ ') t
ORDER BY Equal, name';
PRINT @dynSql;
EXEC (@dynSql);

Replacing a specific Unicode Character in MS SQL Server

I'm using MS SQL Server Express 2012.
I'm having trouble removing the Unicode character U+02CC (decimal 716) from the grid results. The original text is 'λeˌβár'.
I tried it like this, but it doesn't work:
SELECT ColumnTextWithUnicode, REPLACE(ColumnTextWithUnicode , 'ˌ','')
FROM TableName
The column has Latin1_General_CI_AS collation and datatype is nvarchar. I tried changing the collation to something binary, but no success as well:
SELECT ColumnTextWithUnicode, REPLACE(ColumnTextWithUnicode collate Latin1_General_BIN, 'ˌ' collate Latin1_General_BIN,'')
FROM TableName
Or even using the NChar() function like:
SELECT ColumnTextWithUnicode, REPLACE(ColumnTextWithUnicode , NCHAR(716),'')
FROM TableName
The results are 'λeˌβár' for all three.
But if I cast the column to varchar like:
SELECT ColumnTextWithUnicode, REPLACE(CAST(ColumnTextWithUnicode as varchar(100)), 'ˌ','')
FROM TableName
the result becomes 'eßár', removing both the first character and 'ˌ'.
Any ideas to remove just the 'ˌ'?
You just need to put N before the string pattern too (if you want to look for a Unicode char):
SELECT REPLACE (N'λeˌβár' COLLATE Latin1_General_BIN, N'ˌ', '')
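Why the N prefix matters can be sketched in Python. Without N, T-SQL reads the literal as varchar, i.e. through the database code page, before it ever reaches REPLACE. Approximating that round trip with code page 1252 (an assumption for a Latin1_General database; SQL Server may also best-fit map rather than substitute '?'):

```python
s = 'λe\u02CCβár'   # the original nvarchar data
pattern = '\u02CC'  # U+02CC, the mark we want to remove

# Squeeze the pattern literal through the code page, as an un-prefixed
# varchar literal would be: U+02CC does not exist in cp1252.
as_varchar = pattern.encode('cp1252', errors='replace').decode('cp1252')

print(repr(as_varchar))           # '?' -- no longer U+02CC
print(s.replace(as_varchar, ''))  # unchanged: the pattern never matches
print(s.replace(pattern, ''))     # 'λeβár' -- what N'ˌ' achieves
```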
The following query also worked for us: we get the U+FFFD � REPLACEMENT CHARACTER when bulk-inserting an address field from a txt file into SQL, and this removes it:
select Address, REPLACE(Address COLLATE Latin1_General_BIN, N'�', ' ') from #Temp

Confused about default string comparison option in SQL Server

I am completely confused about the default string comparison method used in Microsoft SQL Server. Up till now I have been using the UPPER() and LOWER() functions for any string comparison in Microsoft SQL Server.
However, I got to know that by default Microsoft SQL Server is case-insensitive, and that we need to change the collation while installing it to make it case-sensitive. If that is the case, then what is the use of the UPPER() and LOWER() functions?
If you want to compare strings case-sensitively, this might be the syntax you are looking for:
IF @STR1 COLLATE Latin1_General_CS_AS <> @STR2 COLLATE Latin1_General_CS_AS
PRINT 'NOT MATCH'
As you have discovered, UPPER and LOWER are only of use in comparisons when you have a case-sensitive collation applied, but that doesn't make them useless. For example, they can be used for formatting results:
select upper(LicencePlate) from cars
You can apply collations without reinstalling, either to a column in the table design or to specific comparisons, e.g.:
if 'a' = 'A' collate latin1_general_cs_as
select '1'
else
select '2'
if 'a' = 'A' collate latin1_general_ci_as
select '3'
else
select '4'
See http://technet.microsoft.com/en-us/library/aa258272(v=sql.80).aspx
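The UPPER()/LOWER() workaround is portable precisely because it sidesteps collation settings. In Python terms (a loose analogy, not SQL Server's comparison machinery):

```python
a, b = 'a', 'A'
print(a == b)                  # False: like the CS collation branch
print(a.upper() == b.upper())  # True: the UPPER()-both-sides workaround,
                               # equivalent to the CI collation branch
```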

SQL search for values in a field with text datatype regardless of casing

I am trying to get all values for a particular search regardless of casing. On our SQL Server database case sensitivity is turned on and I don't want to have to change this if possible.
If I do a SELECT statement that includes the LOWER() function as follows
SELECT COUNT(blogpostId) as blogpostcount
FROM blogposts
WHERE stateId = '1'
AND blogId = '20'
AND LOWER(blogpostContent) LIKE '%test%'
it throws an error saying
Argument data type text is invalid for argument 1 of lower function.
The data type for the blogpostContent column is text. If I change this to nvarchar this works however nvarchar only allows a maximum of 255 chars and I need a lot more than this.
Is there any way to check for results in the text field regardless of casing?
Thanks in advance
You could explicitly force it to use a case-insensitive collation like so:
SELECT COUNT(blogpostId) as blogpostcount
FROM blogposts
WHERE stateId='1'
AND blogId = '20'
AND blogpostContent LIKE '%test%' COLLATE SQL_Latin1_General_CP1_CI_AS
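In Python terms, what the CI collation gives LIKE here is simply a case-insensitive substring test (hypothetical sample data for illustration):

```python
posts = ['This post mentions TEST cases',
         'nothing relevant here',
         'another Test post']

# LIKE '%test%' under a CI collation boils down to lower-casing both
# sides of a substring check.
matching = [p for p in posts if 'test' in p.lower()]
print(len(matching))  # -> 2
```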
