Unexpected output in SQL Server using COUNT

I am using SQL Server 2012.
The query is:
CREATE TABLE TEST ( NAME VARCHAR(20) );
INSERT TEST
( NAME
)
SELECT NULL
UNION ALL
SELECT 'James'
UNION ALL
SELECT 'JAMES'
UNION ALL
SELECT 'Eric';
SELECT NAME
, COUNT(NAME) AS T1
, COUNT(COALESCE(NULL, '')) T2
, COUNT(ISNULL(NAME, NULL)) T3
, COUNT(DISTINCT ( Name )) T4
, COUNT(DISTINCT ( COALESCE(NULL, '') )) T5
, @@ROWCOUNT T6
FROM TEST
GROUP BY Name;
DROP TABLE TEST;
In the result set there is no 'JAMES' (in capitals).
Please explain how it was excluded.
I expected NULL, James, JAMES, Eric.

You need to use a case-sensitive collation such as Latin1_General_CS_AS for the Name column. You can apply it in the query:
SELECT NAME COLLATE Latin1_General_CS_AS,
Count(NAME) AS T1,
Count(COALESCE(NULL, '')) T2,
Count(Isnull(NAME, NULL)) T3,
Count(DISTINCT ( Name )) T4,
Count(DISTINCT ( COALESCE(NULL, '') )) T5,
@@ROWCOUNT T6
FROM TEST
GROUP BY Name COLLATE Latin1_General_CS_AS;
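If it helps to see the mechanism outside SQL Server, here is a minimal sketch with Python's built-in sqlite3 module. SQLite is only a stand-in (an assumption, not SQL Server itself): its NOCASE collation plays the role of the case-insensitive database default, and BINARY plays the role of Latin1_General_CS_AS. Overriding the collation in GROUP BY splits 'James' and 'JAMES' into separate groups, as the T-SQL query above does.

```python
import sqlite3

# SQLite stand-in: NOCASE ~ a case-insensitive *_CI_* default,
# BINARY ~ a case-sensitive collation such as Latin1_General_CS_AS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (name TEXT COLLATE NOCASE)")
conn.executemany(
    "INSERT INTO test (name) VALUES (?)",
    [(None,), ("James",), ("JAMES",), ("Eric",)],
)

# Grouping with the column's own case-insensitive collation:
# 'James' and 'JAMES' land in the same group.
ci_groups = conn.execute(
    "SELECT COUNT(*) FROM test GROUP BY name"
).fetchall()

# Overriding the collation in the query (like GROUP BY Name COLLATE
# Latin1_General_CS_AS): every spelling gets its own group.
cs_groups = conn.execute(
    "SELECT name, COUNT(*) FROM test GROUP BY name COLLATE BINARY"
).fetchall()

print(len(ci_groups))  # 3 groups: NULL, Eric, James/JAMES
print(len(cs_groups))  # 4 groups: NULL, Eric, JAMES, James
conn.close()
```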

Use a case-sensitive collation such as Latin1_General_CS_AS on the column:
CREATE TABLE TEST ( NAME VARCHAR(20) COLLATE Latin1_General_CS_AS );
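Declaring the collation on the column means the original query works unchanged. A minimal sketch of the same idea with Python's sqlite3 module, assuming SQLite's BINARY collation as a stand-in for Latin1_General_CS_AS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column-level collation, analogous to NAME VARCHAR(20) COLLATE Latin1_General_CS_AS.
conn.execute("CREATE TABLE test (name TEXT COLLATE BINARY)")
conn.executemany(
    "INSERT INTO test (name) VALUES (?)",
    [(None,), ("James",), ("JAMES",), ("Eric",)],
)

# The GROUP BY needs no COLLATE clause: the column's collation is used.
# COUNT(name) ignores NULLs, so the NULL group counts 0, like T1 in the question.
rows = conn.execute(
    "SELECT name, COUNT(name) AS t1 FROM test GROUP BY name ORDER BY name"
).fetchall()
print(rows)  # [(None, 0), ('Eric', 1), ('JAMES', 1), ('James', 1)]
conn.close()
```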

The other people who commented here are correct.
It would be easier for you to understand their meaning if you googled for collation and case sensitivity, but in layman's terms it's like this:
Collation is a little like encoding: it determines how the characters in string columns are interpreted, ordered and compared with one another. Case-insensitive means that UPPERCASE and lowercase are considered exactly the same, so 'JAMES', 'james', 'JaMeS' etc. are no different to SQL Server. When your database has a case-insensitive collation and you create a table with a column without defining the collation, that column inherits the database's default collation, which is how you got this result.
You can alter a column's collation manually, or define it in a query, but bear in mind that whenever you compare two columns with different collations, you need to make them use the same collation (a COLLATE clause on one side of the comparison is enough), or you will get a collation conflict error. That's why it's good practice to use pretty much the same collation throughout the database, barring special query-specific circumstances.
As for your question about what Latin1_General_CS_AS means: "Latin1_General" names the alphabet or language whose sorting rules are applied, the details of which you can check online. "CS" means case-sensitive; a case-insensitive collation would show "CI" instead. "AS" means accent-sensitive, and "AI" would mean accent-insensitive, i.e. whether 'Á' is considered equal to 'A' or not.
You can read a lot more about it from the source, here.
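The CS/CI and AS/AI parts can be illustrated with custom collations. This is a simplified sketch, not how SQL Server implements them: it registers four hypothetical collations named CS_AS, CI_AS, CS_AI and CI_AI in SQLite through Python's sqlite3 `Connection.create_collation`, using case folding for "CI" and Unicode decomposition to drop accents for "AI".

```python
import sqlite3
import unicodedata


def strip_accents(s: str) -> str:
    # NFD splits 'Á' into 'A' + a combining acute accent; drop the combining marks.
    return "".join(ch for ch in unicodedata.normalize("NFD", s)
                   if not unicodedata.combining(ch))


def make_collation(case_sensitive: bool, accent_sensitive: bool):
    """Build a SQLite collation callable mimicking the CS/CI and AS/AI suffixes.

    Simplified model only: real SQL Server collations also define sort order,
    code pages, contractions and more.
    """
    def key(s: str) -> str:
        if not accent_sensitive:
            s = strip_accents(s)
        if not case_sensitive:
            s = s.casefold()
        return s

    def collate(a: str, b: str) -> int:
        ka, kb = key(a), key(b)
        return (ka > kb) - (ka < kb)  # negative, zero or positive, as sqlite3 expects

    return collate


conn = sqlite3.connect(":memory:")
SUFFIXES = {"CS_AS": (True, True), "CI_AS": (False, True),
            "CS_AI": (True, False), "CI_AI": (False, False)}
for name, (cs, accent) in SUFFIXES.items():
    conn.create_collation(name, make_collation(cs, accent))


def equal(a: str, b: str, collation: str) -> bool:
    # COLLATE is attached to the right operand, so the comparison uses it.
    row = conn.execute(f"SELECT ? = ? COLLATE {collation}", (a, b)).fetchone()
    return bool(row[0])


results = {name: (equal("JAMES", "james", name), equal("Á", "A", name))
           for name in SUFFIXES}
for name, (case_eq, accent_eq) in results.items():
    print(name, case_eq, accent_eq)
# CS_AS False False
# CI_AS True False
# CS_AI False True
# CI_AI True True
```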

Related

SQL Server : finding substring using PATINDEX function

I'm writing different queries in SQL Server.
I have 2 tables, Employees and Departments.
Table Employees consists of EMPLOYEE_ID, ENAME, ID_DEP - department id. Table Departments consists of ID_DEP, DNAME.
The task is to show Employee.ENAME and the employee's Department.DNAME where Department.DNAME contains the word Sales. I have to use the functions SUBSTRING and PATINDEX.
Here is my code, but I think that it looks quite strange and it's meaningless. Nevertheless I need to use both functions in this task.
SELECT e.ENAME, d.DNAME
FROM EMPLOYEE e
JOIN DEPARTMENTS d ON d.ID_DEP = e.ID_DEP
WHERE UPPER(SUBSTRING(d.DNAME, (PATINDEX('%SALES%', d.DNAME)), 5)) = 'SALES'
Any ideas what should I change while continuing using these two functions?
The answer is just below. By the way, the VALUES row constructor is an excellent way to build a simple demo of what you want.
The queries below give several possible answers to your somewhat ambiguous question. Why do you need to use these functions? Is this homework that requires them? If your SQL Server database was created with a case-insensitive collation, or the column 'name' uses such a collation, UPPER makes no difference to the match, however it is used. The most you can get from UPPER is to make the data appear uppercase in the result, or to convert the data to uppercase if you update the column. PATINDEX and LIKE perform a case-insensitive match. This is so useful that most people configure their server with a case-insensitive collation. To override the default comparison behavior, which follows the column/database collation, specify the COLLATE clause, as in the OUTER APPLY for Test2.
Here are the queries. Look at the results; they show what I described.
select *
From
(Values ('très sales'), ('TRES SALES'), ('PLUTOT PROPRE')) as d(name)
outer apply (Select Test1='match' Where Substring(name, patindex('%SALES%', name), 5) = 'SALES') as test1
outer apply (Select Test2='match' Where name COLLATE Latin1_General_CS_AS like '%SALES%' ) as test2 -- CS_AS means case-sensitive, accent-sensitive
outer apply (Select Test3='match' Where name like '%SALES%') as test3
select * -- really want an upper case match ?
From
(Values ('très sales'), ('TRES SALES'), ('PLUTOT PROPRE')) as d(name)
Where name COLLATE Latin1_General_CS_AS like '%SALES%'
select * -- demo of patindex
From
(Values ('très sales'), ('TRES SALES'), ('PLUTOT PROPRE')) as d(name)
outer apply (Select ReallyUpperMatch=name Where patindex('%SALES%', name COLLATE Latin1_General_CS_AS)>0 ) as ReallyUpperMatch -- CS_AS means case-sensitive
outer apply (Select ciMatch=name Where name like '%SALES%' ) as ciMatch
outer apply (Select MakeItLookUpper=UPPER(ciMatch) ) MakeItLookUpper
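The point about UPPER and case-insensitive matching can be modelled in a few lines of Python. This is a simplified sketch, not SQL Server itself: `patindex_literal` and `substring` are hypothetical helpers that mimic PATINDEX for a plain '%literal%' pattern (no `_` or `[ ]` wildcards) and T-SQL's 1-based SUBSTRING. Under a case-insensitive comparison, the SUBSTRING check selects exactly the same rows as `PATINDEX(...) > 0`.

```python
def patindex_literal(literal: str, value: str, case_insensitive: bool = True) -> int:
    """Hypothetical stand-in for PATINDEX('%literal%', value).

    Supports only a plain literal between % signs. Returns the 1-based
    position of the first match, or 0 when there is none, like PATINDEX.
    """
    if case_insensitive:
        literal, value = literal.casefold(), value.casefold()
    return value.find(literal) + 1  # find() returns -1 when absent, giving 0


def substring(value: str, start: int, length: int) -> str:
    """Hypothetical stand-in for T-SQL SUBSTRING (1-based start).

    Positions before 1 still use up the length, so a start of 0 with a
    length of 5 returns only the first 4 characters.
    """
    last = start + length - 1  # last 1-based position requested
    first = max(start, 1)
    return value[first - 1:max(last, 0)]


names = ["très sales", "TRES SALES", "PLUTOT PROPRE"]
results = []
for name in names:
    pos = patindex_literal("SALES", name)
    # Case-insensitive '=' as under a *_CI_* collation: UPPER() would change nothing.
    substring_match = substring(name, pos, 5).casefold() == "sales"
    patindex_match = pos > 0
    # Case-sensitive search, like the COLLATE Latin1_General_CS_AS variants.
    cs_match = patindex_literal("SALES", name, case_insensitive=False) > 0
    results.append((name, substring_match, patindex_match, cs_match))

for row in results:
    print(row)
```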

Case sensitivity goes crazy

I have a database and I am trying to execute the following query:
SELECT COUNT(*) FROM [Resource] WHERE Name LIKE 'ChinaApp%'
SELECT COUNT(*) FROM [Resource] WHERE Name LIKE 'Chinaapp%'
This returns two different counts.
The first thing that came to my mind is to check the case sensitivity. I checked the collation on the server level, the database level and the column level:
Server level : Latin1_General_CI_AS
SELECT SERVERPROPERTY('COLLATION')
Database level : Danish_Norwegian_CI_AS
SELECT DATABASEPROPERTYEX('Data Warehouse', 'Collation')
Column level : Danish_Norwegian_CI_AS
SELECT TABLE_NAME, COLUMN_NAME, COLLATION_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'Resource'
AND COLUMN_NAME = 'Name'
Question :
What is going wrong with the query? Case sensitivity is disabled, as shown above. Why are the counts different?
Danish_Norwegian_CI_AS is the issue! Thank you @realspirituals for the hint!
With this default collation, 'aa' is actually treated as one single character. The last line of the following chart explains it: Å, å, AA, Aa and aa are all the same.
Collation chart for Danish_Norwegian_CI_AS
The following queries now provide the correct result set (count):
SELECT COUNT(*) FROM [Resource] WHERE Name LIKE 'ChinaApp%'
and
SELECT COUNT(*) FROM [Resource] WHERE Name LIKE 'Chinaapp%'
COLLATE Latin1_General_CI_AS
Try this:
SELECT COUNT(*) FROM [RESOURCE] WHERE Name COLLATE SQL_Latin1_General_CP1_CI_AS LIKE 'Chinaapp%'
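The 'aa' contraction described in the accepted answer can be modelled in Python to show why only one of the two prefixes matches. This is a deliberately crude sketch: `danish_ci_key` is a hypothetical helper that applies only the rule quoted in that answer, and 'ChinaApp.Web' is an invented example value, since the actual data isn't shown.

```python
import re


def danish_ci_key(s: str) -> str:
    """Very rough model of the rule quoted for Danish_Norwegian_CI_AS:
    'AA', 'Aa' and 'aa' count as the single letter 'å' (but 'aA' does not),
    and case is otherwise ignored. The real collation has many more rules.
    """
    return re.sub(r"AA|Aa|aa", "å", s).casefold()


def like_prefix(value: str, prefix: str) -> bool:
    # Models  value LIKE 'prefix%'  using the simplified key above.
    return danish_ci_key(value).startswith(danish_ci_key(prefix))


def latin1_ci_like_prefix(value: str, prefix: str) -> bool:
    # Models a Latin1_General_CI_AS-style prefix match: case folding only.
    return value.casefold().startswith(prefix.casefold())


stored = "ChinaApp.Web"  # hypothetical Name value in [Resource]
print(like_prefix(stored, "ChinaApp"))            # True: 'aA' stays two letters
print(like_prefix(stored, "Chinaapp"))            # False: 'aa' became 'å'
print(latin1_ci_like_prefix(stored, "Chinaapp"))  # True once the contraction is gone
```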

Search for Accented Characters

I've been looking around and I've found a lot of information regarding creating searches that are accent insensitive. This isn't what I'm after.
Some accented data is causing a problem in my UI and I'm looking to do some impact analysis.
Is there an elegant way to search a field for any accented character, other than unioning many different selects with different characters in each?
The COLLATE clause changes the collation of the foo expression, so you can GROUP BY accent-insensitively and still tell the accented variants apart:
DECLARE @t TABLE (foo varchar(100));
INSERT @t VALUES ('bar'), ('bár'), ('xxx'), ('yyy'), ('foo'), ('foó'), ('foö');
SELECT
foo COLLATE Latin1_General_CI_AI,
MIN(foo COLLATE Latin1_General_CI_AS), MAX(foo COLLATE Latin1_General_CI_AS)
FROM
@t
GROUP BY
foo COLLATE Latin1_General_CI_AI
HAVING
MIN(foo COLLATE Latin1_General_CI_AS) <> MAX(foo COLLATE Latin1_General_CI_AS);
Or
SELECT
foo COLLATE Latin1_General_CI_AI,
COUNT(*)
FROM
@t
GROUP BY
foo COLLATE Latin1_General_CI_AI
HAVING
COUNT(*) > 1;
The first one gives you some actual values rather than just a COUNT, but not all of them if a word has several accented variants.
For SQL Server 2012, you can use this
SELECT
*
FROM
(
SELECT
FIRST_VALUE(foo) OVER (PARTITION BY foo COLLATE Latin1_General_CI_AI ORDER BY foo COLLATE Latin1_General_CI_AS) AS safeFoo,
foo
FROM
@t
) X
WHERE
safeFoo <> foo
The first and third queries rely on unaccented characters sorting before accented ones.
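The grouping trick can be tried without SQL Server using Python's sqlite3 module. Assumptions: CI_AI is a hypothetical custom collation registered with `create_collation` that ignores case and accents (a stand-in for Latin1_General_CI_AI), and SQLite's default BINARY collation plays the accent-sensitive role of Latin1_General_CI_AS for MIN and MAX. The 'foo' group has three variants but MIN and MAX show only two of them, which is the limitation mentioned for the first query.

```python
import sqlite3
import unicodedata


def ci_ai(a: str, b: str) -> int:
    """Hypothetical stand-in for Latin1_General_CI_AI: ignore accents and case."""
    def key(s: str) -> str:
        decomposed = unicodedata.normalize("NFD", s)
        return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()
    ka, kb = key(a), key(b)
    return (ka > kb) - (ka < kb)


conn = sqlite3.connect(":memory:")
conn.create_collation("CI_AI", ci_ai)
conn.execute("CREATE TABLE t (foo TEXT)")  # default BINARY collation: accent-sensitive
conn.executemany(
    "INSERT INTO t (foo) VALUES (?)",
    [(v,) for v in ["bar", "bár", "xxx", "yyy", "foo", "foó", "foö"]],
)

# Same idea as the first query: group accent-insensitively, then keep the
# groups whose accent-sensitive MIN and MAX differ.
groups = conn.execute(
    "SELECT MIN(foo), MAX(foo), COUNT(*) FROM t "
    "GROUP BY foo COLLATE CI_AI "
    "HAVING MIN(foo) <> MAX(foo) "
    "ORDER BY MIN(foo)"
).fetchall()
print(groups)  # [('bar', 'bár', 2), ('foo', 'foö', 3)]
conn.close()
```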

Sql Server - Using Collation

I am using the query below, in which I need to specify a collation hint to avoid collation conflicts, because the query uses tables from two databases.
Msg 468, Level 16, State 9, Line 12 Cannot resolve the collation conflict between "Latin1_General_CS_AI" and "Latin1_General_CS_AS" in the equal to operation.
I currently get the above error when I run some queries that use databases with different collations:
Delete from table1 where oldcolumn in
(
select newcolumn from Database2.dbo.table2
where invoiceid = @invno
and complete = 0
)
I changed the query to include a collation hint as below:
Delete from table1 where oldcolumn COLLATE SQL_Latin1_General_CP1_CS_AS in
(
select newcolumn from Database2.dbo.table2
where invoiceid = @invno
and complete = 0
)
Will the query above solve the collation problem?
Is it the same to specify the collation hint on the left or the right of an operator (e.g. the "=" operator)?
Can a condition like invoiceid = @invno ever generate a runtime collation conflict error?
Note: I am asking this question as I do not have access to any of the above 2 databases and the script will be run on actual databases.
Use the query below:
DELETE FROM Table1
WHERE Table1.ID IN (
SELECT Table1.ID
FROM Table1
INNER JOIN Database2.dbo.Table2 Table2 ON Table2.NewColumn = Table1.OldColumn COLLATE SQL_Latin1_General_CP1_CS_AS
WHERE Table2.invoiceid = @invno
AND Table2.complete = 0
)
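The left-or-right question comes down to SQL Server's collation precedence rules: an explicit COLLATE clause beats a column's implicit collation, which beats the coercible default of a variable or literal, and two different collations at the same level conflict. The Python sketch below is a simplified model of those rules for a single comparison (it ignores non-character types such as an integer invoiceid, which have no collation at all); `Operand`, `resolve` and the example operands are hypothetical, with collation names taken from the error message in the question.

```python
from dataclasses import dataclass

# Simplified model of collation precedence for a comparison such as '=' or IN.
# Labels, strongest first:
#   explicit - the expression carries a COLLATE clause
#   implicit - a column reference (uses the column's collation)
#   default  - a variable or literal (coercible-default: current database default)
RANK = {"explicit": 3, "implicit": 2, "default": 1}


@dataclass(frozen=True)
class Operand:
    collation: str
    label: str  # "explicit", "implicit" or "default"


class CollationConflict(Exception):
    pass


def resolve(left: Operand, right: Operand) -> str:
    """Return the collation used for the comparison, or raise on a conflict."""
    if RANK[left.label] != RANK[right.label]:
        # The stronger label wins, whichever side of the operator it is on.
        return max(left, right, key=lambda op: RANK[op.label]).collation
    if left.collation == right.collation:
        return left.collation
    # Same label, different collations (two columns or two COLLATE clauses).
    raise CollationConflict(
        f'Cannot resolve the collation conflict between "{left.collation}" '
        f'and "{right.collation}"'
    )


old_col = Operand("Latin1_General_CS_AI", "implicit")        # table1.oldcolumn
new_col = Operand("Latin1_General_CS_AS", "implicit")        # Database2.dbo.table2.newcolumn
hinted = Operand("SQL_Latin1_General_CP1_CS_AS", "explicit")  # oldcolumn COLLATE ...
variable = Operand("Latin1_General_CS_AI", "default")        # a character-typed @invno

try:
    resolve(old_col, new_col)
except CollationConflict as exc:
    print(exc)

print(resolve(hinted, new_col))    # SQL_Latin1_General_CP1_CS_AS
print(resolve(new_col, hinted))    # same result with the hint on the right
print(resolve(new_col, variable))  # Latin1_General_CS_AS: the column wins
```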

SQL change field Collation in a select

I'm trying to run the following SELECT:
select * from urlpath where substring(urlpathpath, 3, len(urlpathpath))
not in (select accessuserpassword from accessuser where accessuserparentid = 257)
I get the error:
Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AI" and "SQL_Latin1_General_CP1_CI_AS" in the equal to operation.
Does anyone know how I can cast to a collation, or something else that lets me match this condition?
You can add COLLATE CollationName after the name of the column you want to "re-collate". (Note: the collation name is written literally, not quoted.)
You can even apply COLLATE in a query that creates a new table, for example:
SELECT
*
INTO
#TempTable
FROM
View_total
WHERE
YEAR(ValidFrom) <= 2007
AND YEAR(ValidTo)>= 2007
AND Id_Product = '001'
AND ProductLine COLLATE DATABASE_DEFAULT IN (SELECT Product FROM #TempAUX)
COLLATE DATABASE_DEFAULT makes the expression take the collation of the current database, eliminating the difference between the two.
