Search for Accented Characters - sql-server

I've been looking around and I've found a lot of information regarding creating searches that are accent insensitive. This isn't what I'm after.
Some accented data is causing a problem in my UI and I'm looking to do some impact analysis.
Is there an elegant way to search a field for any accented character, other than unioning many different selects with different characters in each?

Applying COLLATE to the foo expression changes how it is compared, so you can group accent-insensitively and then compare the values within each group accent-sensitively:
DECLARE @t TABLE (foo varchar(100));
INSERT @t VALUES ('bar'), ('bár'), ('xxx'), ('yyy'), ('foo'), ('foó'), ('foö');
SELECT
foo COLLATE Latin1_General_CI_AI,
MIN(foo COLLATE Latin1_General_CI_AS), MAX(foo COLLATE Latin1_General_CI_AS)
FROM
@t
GROUP BY
foo COLLATE Latin1_General_CI_AI
HAVING
MIN(foo COLLATE Latin1_General_CI_AS) <> MAX(foo COLLATE Latin1_General_CI_AS);
Or
SELECT
foo COLLATE Latin1_General_CI_AI,
COUNT(*)
FROM
@t
GROUP BY
foo COLLATE Latin1_General_CI_AI
HAVING
COUNT(*) > 1;
The first one gives you some actual values rather than just a COUNT, but not all of them if a group contains several accented variants.
For SQL Server 2012 and later, you can use this:
SELECT
*
FROM
(
SELECT
FIRST_VALUE(foo) OVER (PARTITION BY foo COLLATE Latin1_General_CI_AI ORDER BY foo COLLATE Latin1_General_CI_AS) AS safeFoo,
foo
FROM
@t
) X
WHERE
safeFoo <> foo;
The first and third queries rely on unaccented characters sorting before accented ones.
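If the goal is simply to list every row that contains a character outside plain ASCII, a different technique from the queries above may be easier: a LIKE pattern evaluated under a binary collation. This is only a sketch. It reuses the sample values from the answer, assumes the database code page can store the accented characters, and flags every character outside printable ASCII (tabs and line breaks included), not only accented letters.

```sql
-- Sample data copied from the answer above.
DECLARE @t TABLE (foo varchar(100));
INSERT @t VALUES ('bar'), ('bár'), ('xxx'), ('yyy'), ('foo'), ('foó'), ('foö');

-- Under a binary collation the range [ -~] covers exactly the printable
-- ASCII characters (space through tilde), so the negated class [^ -~]
-- matches any other character.
SELECT foo
FROM @t
WHERE foo LIKE '%[^ -~]%' COLLATE Latin1_General_BIN;
```

Unlike the grouping queries, this does not need an unaccented counterpart of the value to exist in the table.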

Related

Merge Join Behaving differently when executed on Server and Local Machine. Outputs completely different

I have an SSIS package which has two source inputs sorted identically and with the same collation. The Merge Join is doing a full outer join.
The two Queries in the source are as follows:
SELECT [PolicyReference]
,[PolicyNarrativeReference]
,[PolicyNarrativeTypeCode]
,[PolicySystemCode]
,[NaturalKeyHash]
,[FullRowHash]
,[SystemName]
,[FunctionalEntityName]
FROM [KeyHash].[dbo].[vw_Debug_PolicyNarrative_Server_KeyHash]
ORDER BY [PolicyReference] COLLATE Latin1_General_CI_AS
,[PolicyNarrativeReference] COLLATE Latin1_General_CI_AS
,[PolicyNarrativeTypeCode] COLLATE Latin1_General_CI_AS
,[PolicySystemCode] COLLATE Latin1_General_CI_AS
SELECT [PolicyReference]
,[PolicyNarrativeDate]
,[PolicyNarrativeReference]
,[PolicyNarrativeText]
,[PolicyNarrativeTypeCode]
,[PolicyNarrativeSystemCode]
,[PolicySystemCode]
FROM [KeyHash].[dbo].[vw_Debug_PolicyNarrative_Server_Source]
ORDER BY [PolicyReference] COLLATE Latin1_General_CI_AS
,[PolicyNarrativeReference] COLLATE Latin1_General_CI_AS
,[PolicyNarrativeTypeCode] COLLATE Latin1_General_CI_AS
,[PolicySystemCode] COLLATE Latin1_General_CI_AS
Now, when this package is run on the SQL Server through SQL Agent, the sorting of the two sets becomes skewed and the join creates 2995 rows which have a blank output:
SELECT *
FROM [KeyHash].[dbo].[Debug_PolicyNarrative_Server_MJ]
where NaturalKeyHash = 0
However when we run the exact same SSIS package locally with the exact same variables, Servers, etc, the sorting works as intended and produces exactly what we want.
SELECT *
FROM [KeyHash].[dbo].[Debug_PolicyNarrative_Local_MJ]
where NaturalKeyHash = 0
The data is exactly the same, the queries are exactly the same, the collations are identical. What on earth could be causing this strange behaviour?
We have tried executing this same package with the same variables and connections but we are seeing the results as above and we are completely stumped.
Also worth mentioning, as per David Browne's comment: we do not have duplicate values within the ORDER BY columns.
SELECT COUNT(*)
,[PolicyReference]
,[PolicyNarrativeReference]
,[PolicyNarrativeTypeCode]
,[PolicySystemCode]
FROM [KeyHash].[dbo].[vw_Debug_PolicyNarrative_Server_KeyHash]
GROUP BY [PolicyReference]
,[PolicyNarrativeReference]
,[PolicyNarrativeTypeCode]
,[PolicySystemCode]
HAVING COUNT(*) > 1
SELECT COUNT(*)
,[PolicyReference]
,[PolicyNarrativeReference]
,[PolicyNarrativeTypeCode]
,[PolicySystemCode]
FROM [KeyHash].[dbo].[vw_Debug_PolicyNarrative_Server_Source]
GROUP BY [PolicyReference]
,[PolicyNarrativeReference]
,[PolicyNarrativeTypeCode]
,[PolicySystemCode]
HAVING COUNT(*) > 1

SQL SERVER LIKE statement only works when inserting one Unicode character (doesn't work with multiple chars)

I have some SQL query, like this:
SELECT ...
FROM ...
WHERE FIELD LIKE N'%ב%'
which works fine. But if I add more characters, it doesn't return anything even though it should (the field contains 'בוצע'):
SELECT ...
FROM ...
WHERE FIELD LIKE N'%בו%'
Any ideas? Thanks!
As an absolute last-resort option: if you need to insert Unicode text in a Unicode-unsafe end-to-end scenario (i.e. where something between your keyboard and the target database is mangling correct Unicode encoding, or using some other encoding), then you should always be able to fall back to using CONCAT( NCHAR(), ... ) to build a string using only 7-bit ASCII-safe T-SQL source.
You'd replace the literal Hebrew characters in the string literal with NCHAR() calls. This will work regardless of how your SQL is saved or encoded (don't forget RTL/LTR handling too):
DECLARE @hebrewLikePattern nvarchar(50) = CONCAT( N'%', NCHAR( 1489 ), NCHAR( 1493 ), NCHAR( 1510 ), NCHAR( 1506 ), N'%' );
SELECT
*
FROM
tbl
WHERE
someColumn LIKE @hebrewLikePattern;
(For some reason I can't seem to post Hebrew text to StackOverflow without RTL/LTR getting messed up, so I took the char values from Linqpad: 1489, 1493, 1510 and 1506 are the code points of the four letters of 'בוצע'.)
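To see whether the literal is being mangled before it reaches the server, one option is to inspect the code points of the pattern as SQL Server received it. The following is a minimal sketch; the variable name @pattern is only illustrative, and the expected values in the comments assume the text arrived intact.

```sql
-- Hypothetical diagnostic: look at the code points of the pattern
-- as the server actually received it.
DECLARE @pattern nvarchar(50) = N'%בו%';

SELECT
    UNICODE(SUBSTRING(@pattern, 2, 1)) AS firstLetter,   -- 1489 if the first letter arrived intact
    UNICODE(SUBSTRING(@pattern, 3, 1)) AS secondLetter,  -- 1493 if the second letter arrived intact
    LEN(@pattern) AS patternLength;                      -- 4 if nothing was added or dropped
```

A value of 63 (the code of '?') in place of a Hebrew code point would suggest the text was converted through a non-Unicode encoding somewhere along the way, and an unexpected length would point to invisible characters, such as directional marks, having been added to the pattern.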
Please check the following.
-- SQL Server 2017 and earlier
DECLARE @tbl TABLE (ID INT IDENTITY PRIMARY KEY, hebrew_col NVARCHAR(100));
INSERT INTO @tbl (hebrew_col) VALUES
(N'שָׁלוֹם');
SELECT * FROM @tbl;
SELECT * FROM @tbl
WHERE hebrew_col LIKE N'%ל%';
-- SQL Server 2019 onwards
DECLARE @tbl2 TABLE (
ID INT IDENTITY PRIMARY KEY,
hebrew_col VARCHAR(100) COLLATE Latin1_General_100_CI_AI_SC_UTF8);
INSERT INTO @tbl2 (hebrew_col) VALUES
(N'שָׁלוֹם');
SELECT * FROM @tbl2;
SELECT * FROM @tbl2
WHERE hebrew_col LIKE N'%ל%';

unexpected output sql server using count

I am using sql-server 2012
The query is :
CREATE TABLE TEST ( NAME VARCHAR(20) );
INSERT TEST
( NAME
)
SELECT NULL
UNION ALL
SELECT 'James'
UNION ALL
SELECT 'JAMES'
UNION ALL
SELECT 'Eric';
SELECT NAME
, COUNT(NAME) AS T1
, COUNT(COALESCE(NULL, '')) T2
, COUNT(ISNULL(NAME, NULL)) T3
, COUNT(DISTINCT ( Name )) T4
, COUNT(DISTINCT ( COALESCE(NULL, '') )) T5
, @@ROWCOUNT T6
FROM TEST
GROUP BY Name;
DROP TABLE TEST;
In the result set there is no 'JAMES' (caps).
Please explain how it was excluded.
I expected NULL, James, JAMES, Eric.
You need to change the collation of your Name column to Latin1_General_CS_AS, which is case-sensitive:
SELECT NAME COLLATE Latin1_General_CS_AS,
Count(NAME) AS T1,
Count(COALESCE(NULL, '')) T2,
Count(Isnull(NAME, NULL)) T3,
Count(DISTINCT ( Name )) T4,
Count(DISTINCT ( COALESCE(NULL, '') )) T5,
@@ROWCOUNT T6
FROM TEST
GROUP BY Name COLLATE Latin1_General_CS_AS;
Use a case-sensitive collation such as Latin1_General_CS_AS:
CREATE TABLE TEST ( NAME VARCHAR(20) COLLATE Latin1_General_CS_AS );
The other people who commented here are correct.
It would be easier to follow their answers if you looked up collation and case sensitivity, but in layman's terms it's like this:
Collation is a little like encoding: it determines how the characters in string columns are interpreted, ordered and compared to one another. Case-insensitive means that UPPERCASE and lowercase are considered exactly the same, so for instance 'JAMES', 'james' and 'JaMeS' are no different to SQL Server. When your database has a case-insensitive collation and you create a table with a column without defining its collation, that column inherits the default collation of the database, which is how we arrived here.
You can alter a column's collation, or specify one in a query, but bear in mind that whenever you compare two columns that have different collations, you need to make both sides use the same collation, or you will get an error. That's why it's good practice to use the same collation throughout the database, barring special query-specific circumstances.
As for what Latin1_General_CS_AS means: "Latin1_General" is the alphabet, the details of which you can check online. The "CS" part means case-sensitive; if it were case-insensitive you would see "CI" instead. The "AS" means accent-sensitive, and "AI" would mean accent-insensitive, i.e. whether 'Á' is considered equal to 'A' or not.
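As a small illustration of the CS/CI and AS/AI suffixes described above, the following sketch compares the same literals under different collations. The column aliases are made up for this example; the collation names are the ones discussed in the answers.

```sql
-- The COLLATE clause on one operand decides how each comparison is evaluated.
SELECT
    CASE WHEN N'JAMES' = N'james' COLLATE Latin1_General_CI_AS
         THEN 'equal' ELSE 'different' END AS case_insensitive,   -- equal
    CASE WHEN N'JAMES' = N'james' COLLATE Latin1_General_CS_AS
         THEN 'equal' ELSE 'different' END AS case_sensitive,     -- different
    CASE WHEN N'A' = N'Á' COLLATE Latin1_General_CI_AI
         THEN 'equal' ELSE 'different' END AS accent_insensitive, -- equal
    CASE WHEN N'A' = N'Á' COLLATE Latin1_General_CI_AS
         THEN 'equal' ELSE 'different' END AS accent_sensitive;   -- different
```

GROUP BY uses the same comparison rules, which is why 'James' and 'JAMES' collapse into one group under a case-insensitive collation.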
You can read a lot more about it in the official SQL Server documentation on collations.

ORDER BY ... COLLATE in SQL Server

In SQL Server, a table contains some text in different cases. I want to sort it case-sensitively and thought that a COLLATE in the ORDER BY would do it. It doesn't. Why?
CREATE TABLE T1 (C1 VARCHAR(20))
INSERT INTO T1 (C1) VALUES ('aaa1'), ('AAB2'), ('aba3')
SELECT * FROM T1 ORDER BY C1 COLLATE Latin1_General_CS_AS
SELECT * FROM T1 ORDER BY C1 COLLATE Latin1_General_CI_AS
Both queries return the same result, even though the first one is "CS" for case-sensitive:
aaa1
AAB2
aba3
(in the first case, I want AAB2, aaa1, aba3)
My server is a SQL Server Express 2008 (10.0.5500) and its default server collation is Latin1_General_CI_AS.
The collation of the database is Latin1_General_CI_AS too.
The result remains the same if I use SQL_Latin1_General_CP1_CS_AS in place of Latin1_General_CS_AS.
You need a binary collation to get your desired sort order, with A-Z sorted before a-z:
SELECT * FROM T1 ORDER BY C1 COLLATE Latin1_General_bin
The CS collation sorts aAbB ... zZ because that is the correct case-sensitive collation sort order. "Case Sensitive Collation Sort Order" explains why this is the case: it has to do with the Unicode specifications for sorting. aa will sort ahead of AA, but AA will sort ahead of ab.
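The difference between the two orderings can be seen with the three strings mentioned above. This is a minimal sketch using an inline VALUES list (the alias names x and v are arbitrary); the expected orders in the comments follow the answer's description for the CS collation and plain code-point order for the binary one.

```sql
-- Case-sensitive dictionary order: case only breaks ties between otherwise
-- equal strings, so the result is aa, AA, ab.
SELECT v
FROM (VALUES ('ab'), ('AA'), ('aa')) AS x(v)
ORDER BY v COLLATE Latin1_General_CS_AS;

-- Binary order: characters compare by code point, and 'A' (65) comes before
-- 'a' (97), so the result is AA, aa, ab.
SELECT v
FROM (VALUES ('ab'), ('AA'), ('aa')) AS x(v)
ORDER BY v COLLATE Latin1_General_BIN;
```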

SQL change field Collation in a select

I'm trying to run the following select:
select * from urlpath where substring(urlpathpath, 3, len(urlpathpath))
not in (select accessuserpassword from accessuser where accessuserparentid = 257)
I get the error:
Cannot resolve the collation conflict between
"SQL_Latin1_General_CP1_CI_AI" and
"SQL_Latin1_General_CP1_CI_AS" in the equal to operation.
Does anyone know how I can cast to a collation, or something else that lets me match on this condition?
Thanks
You can add COLLATE CollationName after the column or expression you want to "re-collate". (Note: the collation name is written as a bare identifier, not a quoted string.)
You can also apply COLLATE inside a query that creates a new table, for example:
SELECT
*
INTO
#TempTable
FROM
View_total
WHERE
YEAR(ValidFrom) <= 2007
AND YEAR(ValidTo)>= 2007
AND Id_Product = '001'
AND ProductLine COLLATE DATABASE_DEFAULT IN (SELECT Product FROM #TempAUX)
COLLATE DATABASE_DEFAULT makes the expression use the default collation of the current database, so the comparison is evaluated under a single collation and the conflict between the two goes away.
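Applied to the query from the question, this could look like the sketch below. The table and column names are the asker's own; putting COLLATE DATABASE_DEFAULT on both sides is one possible choice, and naming a specific collation on both sides would work the same way.

```sql
-- Both sides of NOT IN are given the same explicit collation,
-- so there is no CI_AI vs CI_AS conflict left to resolve.
SELECT *
FROM urlpath
WHERE SUBSTRING(urlpathpath, 3, LEN(urlpathpath)) COLLATE DATABASE_DEFAULT
      NOT IN (SELECT accessuserpassword COLLATE DATABASE_DEFAULT
              FROM accessuser
              WHERE accessuserparentid = 257);
```

Note that the chosen collation also decides whether the match is accent-sensitive, so pick the one whose comparison rules you actually want.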
