SQL Server 2012- Server collation and database collation - sql-server

I have SQL Server 2012 installed that is used for a few different applications. One of our applications needs to be installed, but the company is saying that:
The SQL collation isn't correct, it needs to be: SQL_Latin1_General_CP1_CI_AS
You can just uninstall the SQL Server Database Engine & upon reinstall select the right collation.
What possible reason would this company have to want to change the collation of the database engine itself?

Yes, you are able to set the collation at the database level. To do so, here is an example:
USE master;
GO
ALTER DATABASE <DatabaseName>
COLLATE SQL_Latin1_General_CP1_CI_AS;
GO

You can alter the database Collation even after you have created the database using the following query
USE master;
GO
ALTER DATABASE Database_Name
COLLATE Your_New_Collation;
GO
For more information on database collation Read here

What possible reason would this company have to want to change the collation of the database engine itself?
The other two answers are speaking in terms of Database-level Collation, not Instance-level Collation (i.e. "database engine itself"). The most likely reason that the vendor has for wanting a highly specific Collation (not just a case-insensitive one of your choosing, for example) is that, like most folks, they don't really understand how Collations work, but what they do know is that their application works (i.e. does not get Collation conflict errors) when the Instance and Database both have a Collation of SQL_Latin1_General_CP1_CI_AS, which is the Collation of their Instance and Database (that they develop the app on), because that is the default Collation when installing on an OS having English as its language.
I'm guessing that they have probably had some customers report problems that they didn't know how to fix, but narrowed it down to those Instances not having SQL_Latin1_General_CP1_CI_AS as the Instance / Server -level Collation. The Instance-level Collation controls not just tempdb meta-data (and default column Collation when no COLLATE keyword is specified when creating local or global temporary tables), which has been mentioned by others, but also name resolution for variables / parameters, cursors, and GOTO labels. Even if unlikely that they would be using GOTO statements, they are certainly using variables / parameters, and likely enough to be using cursors.
What this means is that they likely had problems in one or more of the following areas:
Collation conflict errors related to temporary tables:
tempdb being in the Collation of the Instance does not always mean that there will be problems, even if the COLLATE keyword was never used in a CREATE TABLE #[#]... statement. Collation conflicts only occur when attempting to combine or compare two string columns. So assuming that they created a temporary table and used it in conjunction with a table in their Database, they would need to be JOINing on those string columns, or concatenating them, or combining them via UNION, or something along those lines. Under these circumstances, an error will occur if the Collations of the two columns are not identical.
Unexpected behavior:
Comparing a string column of a table to a variable or parameter will use the Collation of the column. Given their requirement for you to use SQL_Latin1_General_CP1_CI_AS, this vendor is clearly expecting case-insensitive comparisons. Since string columns of temp tables (that were not created using the COLLATE keyword) take on the Collation of the Instance, if the Instance is using a binary or case-sensitive Collation, then their application will not be returning all of the data that they were expecting it to return.
Code compilation errors:
Since the Instance-level Collation controls resolution of variable / parameter / cursor names, if they have inconsistent casing in any of their variable / parameter / cursor names, then errors will occur when attempting to execute the code. For example, doing this:
DECLARE #CustomerID INT;
SET #customerid = 5;
would get the following error:
Msg 137, Level 15, State 1, Line XXXXX
Must declare the scalar variable "#customerid".
Similarly, they would get:
Msg 16916, Level 16, State 1, Line XXXXX
A cursor with the name 'Customers' does not exist.
if they did this:
DECLARE customers CURSOR FOR SELECT 1 AS [Bob];
OPEN Customers;
These problems are easy enough to avoid, simply by doing the following:
Specify the COLLATE keyword on string columns when creating temporary tables (local or global). Using COLLATE DATABASE_DEFAULT is handy if the Database itself is not guaranteed to have a particular Collation. But if the Collation of the Database is always the same, then you can specify either DATABASE_DEFAULT or the particular Collation. Though I suppose DATABASE_DEFAULT works in both cases, so maybe it's the easier choice.
Be consistent in casing of identifiers, especially variables / parameters. And to be more complete, I should mention that Instance-level meta-data is also affected by the Instance-level Collation (e.g. names of Logins, Databases, server-Roles, SQL Agent Jobs, SQL Agent Job Steps, etc). So being consistent with casing in all areas is the safest bet.
Am I being unfair in assuming that the vendor doesn't understand how Collations work? Well, according to a comment made by the O.P. on M.Ali's answer:
I got this reply from him: "It's the other way around, you need the new SQL instance collation to match the old SQL collation when attaching databases to it. The collation is used in the functioning of the database, not just something that gets set when it's created."
the answer is "no". There are two problems here:
No, the Collations of the source and destination Instances do not need to match when attaching a Database to a new Instance. In fact, you can even attach a system DB to an Instance that has a different Collation, thereby having a mismatch between the attached system DB and the Instance and the other system DBs.
It's unclear if "database" in that last sentence means actual Database or the Instance (sometimes people use the term "database" to refer to the RDBMS as a whole). If it means actual "Database", then that is entirely irrelevant because the issue at hand is the Instance-level Collation. But, if the vendor meant the Instance, then while true that the Collation is used in normal operations (as noted above), this only shows awareness of simple cause-effect relationship and not actual understanding. Actual understanding would lead to doing those simple fixes (noted above) such that the Instance-level Collation was a non-issue.
If needing to change the Collation of the Instance, please see:
Changing the Collation of the Instance, the Databases, and All Columns in All User Databases: What Could Possibly Go Wrong?
For more info on working with Collations / encodings / Unicode / etc, please visit:
Collations.Info

Related

Case sensitive sybase query: Invalid column name

Details:
2 databases: sybase version 15 and sybase version 16
1 table each (identical): AuthRole with columns id, rolename and description
Tried both jTDS and jconn drivers
Query:
SELECT t1.roleName FROM AuthRole t1;
Results:
Sybase 15: rows returned successfully. 'roleName' could be upper, lower or a mix of case, i.e. not case sensitive
Sybase 16: Invalid column name 'roleName'. It will only work with 'rolename' which is the exact case of the column. Anyone know why this would happen and how to resolve it?
If on ASE 15 both queries work - with "rolename" and "roleName" - that means the the sort order in this database is case insensitive.
If on ASE 16 "rolename" is different than "roleName" - that means the the sort order in this database is case sensitive.
You can check this by querying:
if "a" = "A" print "Case insensitive" else print "Case sensitive"
This setting is set and static for the whole server (and for all the databases that the server contains), but can be changed. Of course changing the sort order is a time consuming process, as it requires to rebuild all indexes based on character types.
You can check the server sortorder setting:
exec sp_configure 'sortorder id'
The information about sort order should be visible in the ASE errorlog when the database server starts:
00:0002:00000:00002:2017/07/04 16:49:26.35 server ASE's default unicode sort order is 'binary'.
00:0002:00000:00002:2017/07/04 16:49:26.35 server ASE's default sort order is:
00:0002:00000:00002:2017/07/04 16:49:26.35 server 'bin_iso_1' (ID = 50)
00:0002:00000:00002:2017/07/04 16:49:26.35 server on top of default character set:
00:0002:00000:00002:2017/07/04 16:49:26.35 server 'iso_1' (ID = 1).
In my example the sort order is binary - which is case sensitive.
Information how to change the sort order for the server is in the ASE manual. Basicaly to change the sort order you need to:
add the new sort order using the charset program,
change the config parameter 'sortorder id'
reboot the ASE server (the server boots, rebuilds the disk devices and then it shuts down)
reboot the ASE server again
indexes that are build on character types are marked as invalid and need to be rebuild
Sounds like an issue with the sort order, eg:
ASE 15 is configured with a case-insensitive sort order
ASE 16 is configured with a case-sensitive sort order
You should be able to confirm the above by running sp_helpsort.
In ASE, case (in)sensitivity applies to data as well as identifiers (eg, table/column names).
To get ASE 16 to function like ASE 15, the DBA will need to change the sort order in the ASE 16 dataserver (I'd suggest they also verify the character set while they're at it).
Keep in mind that changing the sort order (and/or character set) is a dataserver-wide configuration and will require (at a minimum) a rebuild of all indexes and re-running of update index statistics. [For more info the DBA should refer to the ASE System Administration Guide, Chapter on Configuring Character Sets, Sort Orders and Languages.]
Off the top of my head:
In older versions of Sybase ASE you had to carefully set the case-sensitivity at server installation time. The installer defaults to case-sensitive. Maybe the admin who installed ASE15 noticed this (and changed the default to case-insensitive), whereas the admin who installed your ASE16 didn't.
Yes, case-Sensitivity is a property of the Server. You can change it at a later time with sp_configure or ALTER DATABASE, or both (I don't remember and I don't have the time to look it up). You can also use a graphical admin tool to change the server default sort order.
In any case only databases created after that configuration change will be affected. Confusingly, older databases will still be case-sensitive, or lots of warnings will be issued. This is because in your older tables, all primary keys (PK) are implemented as indices, in assume case-sensitivity, and PKs and the PK indices cannot be changed by an installer or a config wizard.
In fact, you have to drop and re-create the indices and run dbcc something (again I don't remember).
For small databases, this drop-and-recreate of indices can of cause be done (use a script or a database reengineering tool to do so). For larger databases this can take some time.
Maybe it's different for ASE16 -check the docuemntation

SQL Server Column names case sensitivity

The DB I use has French_CI_AS collation (CI should stand for Case-Insensitive) but is case-sensitive anyway. I'm trying to understand why.
The reason I assert this is that bulk inserts with a 'GIVEN' case setup fail, but they succeed with another 'Given' case setup.
For example:
INSERT INTO SomeTable([GIVEN],[COLNAME]) VALUES ("value1", "value2") fails, but
INSERT INTO SomeTable([Given],[ColName]) VALUES ("value1", "value2") works.
EDIT
Just saw this:
http://msdn.microsoft.com/en-us/library/ms190920.aspx
so that means it should be possible to change a column's collation without emptying all the data and recreating the related table?
Given this critical piece of information (that is in a comment on the question and not in the actual question):
In fact I use Microsoft .Net's bulk insert method, so I don't really know the exact query it sends to the DB server.
it makes sense that the column names are being treated as case-sensitive, even in a case-insensitive DB, since that is how the SqlBulkCopy Class works. Please see Column mappings in SqlBulkCopy are case sensitive.
ADDITIONAL NOTES
When asking about an error, please always include the actual, and full, error message in the question. Simply saying that there was an error leads to a lot of guessing and wild-goose chases that in turn lead to off-topic answers.
When asking a question, please do not change the circumstances that you are dealing with. For example, the question states (emphasis added):
bulk inserts with a 'GIVEN' case setup fail, but they succeed with another 'Given' case setup.
Yet the example statements are single INSERTs. Also, a comment on the question states:
In fact I use Microsoft .Net's bulk insert method, so I don't really know the exact query it sends to the DB server.
Using .NET and SqlBulkCopy is waaaay different than using BULK INSERT or INSERT, making the current question misleading, making it difficult (or even impossible) to answer correctly. This new bit of info also leads to more questions because when using SqlBulkCopy, you don't write any INSERT statements: you just write a SELECT statement and specify the name of the destination Table. If you specify column names at all for the destination Table, it is in the optional column mappings. Is that where the issue is?
Regarding the "EDIT" section of the question:
No, changing the Collation of the column won't help at all, even if you weren't using SqlBulkCopy. The Collation of a column determines how data stored in the column behaves, not how the column names (i.e. meta-data of the Table) behaves. It is the Collation of the Database itself that determines how Database-level object meta-data behaves. And in this case, you claim that the DB is using a case-insensitive Collation (correct, the _CI_ portion of the Collation name does mean "Case Insensitive").
Regarding the following statements made by Jonathan Leffler on the question:
that gets into a very delicate area of the interaction between delimited identifiers (normally case-sensitive) and collations (this one is case-insensitive).
No, delimited identifiers are not normally case-sensitive. The sensitivities (case, accent, kana type, width, and starting in SQL Server 2017 variation selector) of delimited identifiers is the same as for non-delimited identifiers at that same level. "Same level" means that Instance-level names (Databases, Logins, etc) are controlled by the Instance-level Collation, while Database-level names (Schemas, Objects--Tables, Views, Functions, Stored Procedures, etc--, Users, etc) are controlled by the Database-level Collation. And these two levels can have different Collations.
you need to research whether the SQL column names in a database are case-sensitive when delimited. It may also depend on how the CREATE TABLE statement is written (were the names delimited in that?). Normally, SQL is case-insensitive on column and table names; you could write INSERT INTO SoMeTaBlE(GiVeN, cOlNaMe) VALUES("v1", "v2") and if the names were never delimited, it'd be OK.
It does not matter if the column names were delimited or not when creating the Table, at least not in terms of how their resolution is handled. Column names are Database-level meta-data, and that is controlled by the default Collation of the Database. And it is the same for all Database-level meta-data within each Databases. You cannot have some column names being case-sensitive while others are case-insensitive.
Also, there is nothing special about Table and column names. They are Database-level meta-data just like User names, Schema names, Index names, etc. All of this meta-data is controlled by the Database's default Collation.
Meta-data (both Instance-level and Database-level) is only "normally" case-insensitive due to the default Collation suggested during installation being a case-insensitive Collation.
a 'delimited identifier' is a column name, table name, or something similar enclosed in double quotes, such as CREATE TABLE "table"(...)
It is more accurate to say that a delimited identifier is an identifier enclosed in whatever character(s) the DBMS in question has defined as its delimiters. And which particular characters are used for delimiters varies between the different DBMSs.
In SQL Server, delimited identifiers are enclosed in square brackets: [GIVEN]
While square brackets always work as delimiters for identifiers, it is possible to use double-quotes as delimiters IF you have the session-level property of QUOTED_IDENTIFIER set to ON (which is best to always do anyway).
There are arcane parts to SQL (and delimited identifier handling is one of them)
Well, delimited identifiers are actually quite simple. The whole point of delimiting an identifier is to effectively ignore the rules of regular (i.e. non-delimited) identifiers. But, in terms of regular identifiers, yes, those rules are rather arcane (mainly due to the official documentation being incomplete and incorrect). So, in order to take the mystery out of how identifiers in SQL Server actually work, I did a bunch of research and published the results here (which includes links to the research itself):
Completely Complete List of Rules for T-SQL Identifiers
For more info on Collations / Encodings / Unicode / ASCII, especially as they relate to Microsoft SQL Server, please visit:
Collations.Info
The fact the column names are case sensitive means that the MASTER database has been created using a case sensitive collation.
In the case I just had that lead me to investigate this, someone entered
Latin1_CS_AI instead of Latin1_CI_AS
When setting up SQL server.
Check the collation of the columns in your table definition, and the collation of the tempdb database (i.e. the server collation). They may differ from your database collation.

in sql server, what is: Latin1_General_CI_AI versus Latin1_General_CI_AS

i am using SQL Compare to compare two versions of a database. it keeps highlighting differences in the nvarchar fields, where it shows one db that has:
Latin1_General_CI_AS
and the other one has this:
Latin1_General_CI_AI
can someone please explain what this is and if i should be worried about this difference
Accent Sensitive and Accent Insensitive
Lòpez and Lopez are the same if Accent Insensitive.
What the comparison is showing is that the two columns have difference collations. The collation of a (text) field affects how it is both stored and compared.
The particular difference in your case is that accents on characters will be ignored when comparisons and sorting is done.
When you install SQL Server, you set a default collation for the whole server. You can also set a collation per database and per column, meaning that you can mix them within a database (whether you want to depends on your particular case). The MSDN page I linked to has more information on collations, how to choose the best one, and how to set them.

SQL Server default character encoding

By default - what is the character encoding set for a database in Microsoft SQL Server?
How can I see the current character encoding in SQL Server?
Encodings
In most cases, SQL Server stores Unicode data (i.e. that which is found in the XML and N-prefixed types) in UCS-2 / UTF-16 (storage is the same, UTF-16 merely handles Supplementary Characters correctly). This is not configurable: there is no option to use either UTF-8 or UTF-32 (see UPDATE section at the bottom re: UTF-8 starting in SQL Server 2019). Whether or not the built-in functions can properly handle Supplementary Characters, and whether or not those are sorted and compared properly, depends on the Collation being used. The older Collations — names starting with SQL_ (e.g. SQL_Latin1_General_CP1_CI_AS) xor no version number in the name (e.g. Latin1_General_CI_AS) — equate all Supplementary Characters with each other (due to having no sort weight). Starting in SQL Server 2005 they introduced the 90 series Collations (those with _90_ in the name) that could at least do a binary comparison on Supplementary Characters so that you could differentiate between them, even if they didn't sort in the desired order. That also holds true for the 100 series Collations introduced in SQL Server 2008. SQL Server 2012 introduced Collations with names ending in _SC that not only sort Supplementary Characters properly, but also allow the built-in functions to interpret them as expected (i.e. treating the surrogate pair as a single entity). Starting in SQL Server 2017, all new Collations (the 140 series) implicitly support Supplementary Characters, hence there are no new Collations with names ending in _SC.
Starting in SQL Server 2019, UTF-8 became a supported encoding for CHAR and VARCHAR data (columns, variables, and literals), but not TEXT (see UPDATE section at the bottom re: UTF-8 starting in SQL Server 2019).
Non-Unicode data (i.e. that which is found in the CHAR, VARCHAR, and TEXT types — but don't use TEXT, use VARCHAR(MAX) instead) uses an 8-bit encoding (Extended ASCII, DBCS, or EBCDIC). The specific character set / encoding is based on the Code Page, which in turn is based on the Collation of a column, or the Collation of the current database for literals and variables, or the Collation of the Instance for variable / cursor names and GOTO labels, or what is specified in a COLLATE clause if one is being used.
To see how locales match up to collations, check out:
Windows Collation Name
SQL Server Collation Name
To see the Code Page associated with a particular Collation (this is the character set and only affects CHAR / VARCHAR / TEXT data), run the following:
SELECT COLLATIONPROPERTY( 'Latin1_General_100_CI_AS' , 'CodePage' ) AS [CodePage];
To see the LCID (i.e. locale) associated with a particular Collation (this affects the sorting & comparison rules), run the following:
SELECT COLLATIONPROPERTY( 'Latin1_General_100_CI_AS' , 'LCID' ) AS [LCID];
To view the list of available Collations, along with their associated LCIDs and Code Pages, run:
SELECT [name],
COLLATIONPROPERTY( [name], 'LCID' ) AS [LCID],
COLLATIONPROPERTY( [name], 'CodePage' ) AS [CodePage]
FROM sys.fn_helpcollations()
ORDER BY [name];
Defaults
Before looking at the Server and Database default Collations, one should understand the relative importance of those defaults.
The Server (Instance, really) default Collation is used as the default for newly created Databases (including the system Databases: master, model, msdb, and tempdb). But this does not mean that any Database (other than the 4 system DBs) is using that Collation. The Database default Collation can be changed at any time (though there are dependencies that might prevent a Database from having it's Collation changed). The Server default Collation, however, is not so easy to change. For details on changing all collations, please see: Changing the Collation of the Instance, the Databases, and All Columns in All User Databases: What Could Possibly Go Wrong?
The server/Instance Collation controls:
local variable names
CURSOR names
GOTO labels
Instance-level meta-data
The Database default Collation is used in three ways:
as the default for newly created string columns. But this does not mean that any string column is using that Collation. The Collation of a column can be changed at any time. Here knowing the Database default is important as an indication of what the string columns are most likely set to.
as the Collation for operations involving string literals, variables, and built-in functions that do not take string inputs but produces a string output (i.e. IF (#InputParam = 'something') ). Here knowing the Database default is definitely important as it governs how these operations will behave.
Database-level meta-data
The column Collation is either specified in the COLLATE clause at the time of the CREATE TABLE or an ALTER TABLE {table_name} ALTER COLUMN, or if not specified, taken from the Database default.
Since there are several layers here where a Collation can be specified (Database default / columns / literals & variables), the resulting Collation is determined by Collation Precedence.
All of that being said, the following query shows the default / current settings for the OS, SQL Server Instance, and specified Database:
SELECT os_language_version,
---
SERVERPROPERTY('LCID') AS 'Instance-LCID',
SERVERPROPERTY('Collation') AS 'Instance-Collation',
SERVERPROPERTY('ComparisonStyle') AS 'Instance-ComparisonStyle',
SERVERPROPERTY('SqlSortOrder') AS 'Instance-SqlSortOrder',
SERVERPROPERTY('SqlSortOrderName') AS 'Instance-SqlSortOrderName',
SERVERPROPERTY('SqlCharSet') AS 'Instance-SqlCharSet',
SERVERPROPERTY('SqlCharSetName') AS 'Instance-SqlCharSetName',
---
DATABASEPROPERTYEX(N'{database_name}', 'LCID') AS 'Database-LCID',
DATABASEPROPERTYEX(N'{database_name}', 'Collation') AS 'Database-Collation',
DATABASEPROPERTYEX(N'{database_name}', 'ComparisonStyle') AS 'Database-ComparisonStyle',
DATABASEPROPERTYEX(N'{database_name}', 'SQLSortOrder') AS 'Database-SQLSortOrder'
FROM sys.dm_os_windows_info;
Installation Default
Another interpretation of "default" could mean what default Collation is selected for the Instance-level collation when installing. That varies based on the OS language, but the (horrible, horrible) default for systems using "US English" is SQL_Latin1_General_CP1_CI_AS. In that case, the "default" encoding is Windows Code Page 1252 for VARCHAR data, and as always, UTF-16 for NVARCHAR data. You can find the list of OS language to default SQL Server collation here: Collation and Unicode support: Server-level collations. Keep in mind that these defaults can be overridden; this list is merely what the Instance will use if not overridden during install.
UPDATE 2018-10-02
SQL Server 2019 introduces native support for UTF-8 in VARCHAR / CHAR datatypes (not TEXT!). This is accomplished via a set of new collations, the names of which all end with _UTF8. This is an interesting capability that will definitely help some folks, but there are some "quirks" with it, especially when UTF-8 isn't being used for all columns and the Database's default Collation, so don't use it just because you have heard that UTF-8 is magically better. UTF-8 was designed solely for ASCII compatibility: to enable ASCII-only systems (i.e. UNIX back in the day) to support Unicode without changing any existing code or files. That it saves space for data using mostly (or only) US English characters (and some punctuation) is a side-effect. When not using mostly (or only) US English characters, data can be the same size as UTF-16, or even larger, depending on which characters are being used. And, in cases where space is being saved, performance might improve, but it might also get worse.
For a detailed analysis of this new feature, please see my post, "Native UTF-8 Support in SQL Server 2019: Savior or False Prophet?".
If you need to know the default collation for a newly created database use:
SELECT SERVERPROPERTY('Collation')
This is the server collation for the SQL Server instance that you are running.
The default character encoding for a SQL Server database is iso_1, which is ISO 8859-1. Note that the character encoding depends on the data type of a column. You can get an idea of what character encodings are used for the columns in a database as well as the collations using this SQL:
select data_type, character_set_catalog, character_set_schema, character_set_name, collation_catalog, collation_schema, collation_name, count(*) count
from information_schema.columns
group by data_type, character_set_catalog, character_set_schema, character_set_name, collation_catalog, collation_schema, collation_name;
If it's using the default, the character_set_name should be iso_1 for the char and varchar data types. Since nchar and nvarchar store Unicode data in UCS-2 format, the character_set_name for those data types is UNICODE.
SELECT DATABASEPROPERTYEX('DBName', 'Collation') SQLCollation;
Where DBName is your database name.
I think this is worthy of a separate answer: although internally unicode data is stored as UTF-16 in Sql Server this is the Little Endian flavour, so if you're calling the database from an external system, you probably need to specify UTF-16LE.
You can see collation settings for each table like the following code:
SELECT t.name TableName, c.name ColumnName, collation_name
FROM sys.columns c
INNER JOIN sys.tables t on c.object_id = t.object_id where t.name = 'name of table';

SQL Server Collation / ADO.NET DataTable.Locale with different languages

we have WinForms app which stores data in SQL Server (2000, we are working on porting it in 2008) through ADO.NET (1.1, working on porting to 4.0). Everything works fine if I read data previsouly written in Western-European locale (E.g.: "test", "test ù"), but now we have to be able to mix Western and non-Western alphabets as well (E.g.: "test - ۓےۑ" - these are just random arabic chars).
On the SQL Server side, database has been set with the Latin1_General collation, the field is a nvarchar(80). If I run a SQL SELECT statement (E.g.: "SELECT * FROM MyTable WHERE field = 'test - ۓےۑ'", don't mind about the "*" or the actual names) from Query Analyzer, I get no results; the same happens if I pass the Sql statement to an ADO.NET DataAdapter to fill a DataTable. My guess is that it has something to do with collation, but I don't know how to correct this: do I have to change to collation (SQL Server) to a different one? Or do I have to set the locale on the DataAdaoter/DataTable (ADO.NET)?
Thanks in advance to anyone who will help
Shouldn't you use N when comparing nvarchar with extended char. set?
SELECT * From TestTable WHERE GreekColCaseInsensitive = N'test - ۓےۑ'
Yes, the problem is most likely the collation. The Latin1_General collation does not include the rules to sort and compare non latin characters.
MSDN claims:
If you must store character data that reflects multiple languages, you can minimize collation compatibility issues by always using the Unicode nchar, nvarchar, and ntext data types instead of the char, varchar, text data types. Using the Unicode data types eliminates code page conversion issues.
Since you have already complied with this, you should read further on the info about Mixed Collation Environments here.
Additionally I want to add that just changing a collation is not something done easy, check the MSDN for SQL 2000:
When you set up SQL Server 2000, it is important to use the correct collation settings. You can change collation settings after running Setup, but you must rebuild the databases and reload the data. It is recommended that you develop a standard within your organization for these options. Many server-to-server activities can fail if the collation settings are not consistent across servers.
You can specify a collation on a per column bases however:
CREATE TABLE TestTable (
id int,
GreekColCaseInsensitive nvarchar(10) collate greek_ci_as,
LatinColCaseSensitive nvarchar(10) collate latin1_general_cs_as
)
Have a look at the different binary multilingual collations here. Depending on the charset you use, you should find one that fits your purpose.
If you are not able or willing to change the collation of a column you can also just specify the collation to be used in the query like:
SELECT * From TestTable
WHERE GreekColCaseInsensitive = N'test - ۓےۑ'
COLLATE latin1_general_cs_as
As jfrobishow pointed out the use of N in front of the string you want to use to compare is essential. What does it do:
It denotes that the subsequent string is in Unicode (the N actually stands for National language character set). Which means that you are passing an NCHAR, NVARCHAR or NTEXT value, as opposed to CHAR, VARCHAR or TEXT. See Article #2354 for a comparison of these data types.
You can find a quick rundown here.

Resources