The DB I use has French_CI_AS collation (CI should stand for Case-Insensitive) but is case-sensitive anyway. I'm trying to understand why.
The reason I assert this is that bulk inserts with a 'GIVEN' case setup fail, but they succeed with another 'Given' case setup.
For example:
INSERT INTO SomeTable([GIVEN],[COLNAME]) VALUES ("value1", "value2") fails, but
INSERT INTO SomeTable([Given],[ColName]) VALUES ("value1", "value2") works.
EDIT
Just saw this:
http://msdn.microsoft.com/en-us/library/ms190920.aspx
so that means it should be possible to change a column's collation without emptying all the data and recreating the related table?
Given this critical piece of information (that is in a comment on the question and not in the actual question):
In fact I use Microsoft .Net's bulk insert method, so I don't really know the exact query it sends to the DB server.
it makes sense that the column names are being treated as case-sensitive, even in a case-insensitive DB, since that is how the SqlBulkCopy Class works. Please see Column mappings in SqlBulkCopy are case sensitive.
ADDITIONAL NOTES
When asking about an error, please always include the actual, and full, error message in the question. Simply saying that there was an error leads to a lot of guessing and wild-goose chases that in turn lead to off-topic answers.
When asking a question, please do not change the circumstances that you are dealing with. For example, the question states (emphasis added):
bulk inserts with a 'GIVEN' case setup fail, but they succeed with another 'Given' case setup.
Yet the example statements are single INSERTs. Also, a comment on the question states:
In fact I use Microsoft .Net's bulk insert method, so I don't really know the exact query it sends to the DB server.
Using .NET and SqlBulkCopy is very different from using BULK INSERT or INSERT, which makes the current question misleading and difficult (or even impossible) to answer correctly. This new bit of info also leads to more questions because when using SqlBulkCopy, you don't write any INSERT statements: you just write a SELECT statement and specify the name of the destination Table. If you specify column names at all for the destination Table, it is in the optional column mappings. Is that where the issue is?
Regarding the "EDIT" section of the question:
No, changing the Collation of the column won't help at all, even if you weren't using SqlBulkCopy. The Collation of a column determines how data stored in the column behaves, not how the column names (i.e. meta-data of the Table) behaves. It is the Collation of the Database itself that determines how Database-level object meta-data behaves. And in this case, you claim that the DB is using a case-insensitive Collation (correct, the _CI_ portion of the Collation name does mean "Case Insensitive").
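To see the three levels side by side, something like the following can be run in the database in question. This is only a sketch: it assumes the table from the question lives in the dbo schema, which the question does not actually state.

```sql
-- Instance-level Collation (controls Instance-level names, variables, tempdb defaults)
SELECT SERVERPROPERTY('Collation') AS InstanceCollation;

-- Database-level Collation (controls Table / column NAME resolution)
SELECT DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS DatabaseCollation;

-- Column-level Collations (control how the DATA in each string column behaves)
-- dbo.SomeTable is an assumed schema-qualified name for the question's table
SELECT c.name AS ColumnName, c.collation_name AS ColumnCollation
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.SomeTable');
```

Only the second value matters for how the column names themselves are resolved.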
Regarding the following statements made by Jonathan Leffler on the question:
that gets into a very delicate area of the interaction between delimited identifiers (normally case-sensitive) and collations (this one is case-insensitive).
No, delimited identifiers are not normally case-sensitive. The sensitivities (case, accent, kana type, width, and, starting in SQL Server 2017, variation selector) of delimited identifiers are the same as for non-delimited identifiers at that same level. "Same level" means that Instance-level names (Databases, Logins, etc.) are controlled by the Instance-level Collation, while Database-level names (Schemas, Objects such as Tables, Views, Functions, and Stored Procedures, Users, etc.) are controlled by the Database-level Collation. And these two levels can have different Collations.
you need to research whether the SQL column names in a database are case-sensitive when delimited. It may also depend on how the CREATE TABLE statement is written (were the names delimited in that?). Normally, SQL is case-insensitive on column and table names; you could write INSERT INTO SoMeTaBlE(GiVeN, cOlNaMe) VALUES("v1", "v2") and if the names were never delimited, it'd be OK.
It does not matter if the column names were delimited or not when creating the Table, at least not in terms of how their resolution is handled. Column names are Database-level meta-data, and that is controlled by the default Collation of the Database. And it is the same for all Database-level meta-data within each Database. You cannot have some column names being case-sensitive while others are case-insensitive.
Also, there is nothing special about Table and column names. They are Database-level meta-data just like User names, Schema names, Index names, etc. All of this meta-data is controlled by the Database's default Collation.
Meta-data (both Instance-level and Database-level) is only "normally" case-insensitive due to the default Collation suggested during installation being a case-insensitive Collation.
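For anyone who wants to see this behavior directly, here is a minimal sketch that can be run on a test Instance. The database name CollationDemoCS is made up for the demo, and it assumes you have permission to create and drop databases.

```sql
-- Hypothetical throwaway database with a case-SENSITIVE default Collation
CREATE DATABASE CollationDemoCS COLLATE French_CS_AS;
GO
USE CollationDemoCS;
GO
CREATE TABLE dbo.SomeTable ([Given] NVARCHAR(50), [ColName] NVARCHAR(50));
GO
-- Casing matches the definition: succeeds
INSERT INTO dbo.SomeTable ([Given], [ColName]) VALUES (N'value1', N'value2');
GO
-- Casing differs: fails with an "Invalid column name" error,
-- whether or not the names are delimited
INSERT INTO dbo.SomeTable ([GIVEN], [COLNAME]) VALUES (N'value1', N'value2');
GO
-- Clean up the demo database
USE master;
GO
DROP DATABASE CollationDemoCS;
GO
```

Running the same statements in a database created with French_CI_AS should let both INSERTs succeed, which is why the behavior described in the question points away from the Database's Collation.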
a 'delimited identifier' is a column name, table name, or something similar enclosed in double quotes, such as CREATE TABLE "table"(...)
It is more accurate to say that a delimited identifier is an identifier enclosed in whatever character(s) the DBMS in question has defined as its delimiters. And which particular characters are used for delimiters varies between the different DBMSs.
In SQL Server, delimited identifiers are enclosed in square brackets: [GIVEN]
While square brackets always work as delimiters for identifiers, it is possible to use double-quotes as delimiters IF you have the session-level property of QUOTED_IDENTIFIER set to ON (which is best to always do anyway).
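A small sketch of both delimiter styles, using a temporary table so that nothing permanent is created (the table name #QuotedDemo is just for illustration):

```sql
SET QUOTED_IDENTIFIER ON;
GO
-- Square brackets always work; double-quotes work because QUOTED_IDENTIFIER is ON
CREATE TABLE #QuotedDemo ([GIVEN] INT, "COLNAME" INT);

INSERT INTO #QuotedDemo ([GIVEN], "COLNAME") VALUES (1, 2);

-- Either delimiter can be used for either column
SELECT "GIVEN", [COLNAME] FROM #QuotedDemo;

DROP TABLE #QuotedDemo;
GO
```

With QUOTED_IDENTIFIER set to OFF, the double-quoted names would instead be parsed as string literals, and the CREATE TABLE would not parse.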
There are arcane parts to SQL (and delimited identifier handling is one of them)
Well, delimited identifiers are actually quite simple. The whole point of delimiting an identifier is to effectively ignore the rules of regular (i.e. non-delimited) identifiers. But, in terms of regular identifiers, yes, those rules are rather arcane (mainly due to the official documentation being incomplete and incorrect). So, in order to take the mystery out of how identifiers in SQL Server actually work, I did a bunch of research and published the results here (which includes links to the research itself):
Completely Complete List of Rules for T-SQL Identifiers
For more info on Collations / Encodings / Unicode / ASCII, especially as they relate to Microsoft SQL Server, please visit:
Collations.Info
The fact the column names are case sensitive means that the MASTER database has been created using a case sensitive collation.
In the case I just had that led me to investigate this, someone entered Latin1_CS_AI instead of Latin1_CI_AS when setting up SQL Server.
Check the collation of the columns in your table definition, and the collation of the tempdb database (i.e. the server collation). They may differ from your database collation.
Related
I am really weak as far as databases are concerned so, please bear with me.
I have a database which uses Greek_CI_AI and works without any issue with several applications. All servers that put data into this DB are on the Greek locale.
However, one application treats its information and checks integrity constraints in a case-sensitive manner. I have not run into any issues with that application so far, but I am concerned that I may have to deal with it later, when there is more data and the impact is even bigger. What is the proper way to do this? I mean, do I just change the collation, or should I drop the database and recreate it with the right collation? If I do not have to drop it, how will this affect the data?
Comparing the two I have not found differences.
http://collation-charts.org/mssql/mssql.0408.1253.Greek_CI_AI.html
http://collation-charts.org/mssql/mssql.0408.1253.Greek_CS_AI.html
Thanks for your help!
You should not change your db collation from Greek_CI_AI to Greek_CS_AI/BIN.
If your application checks integrity constraints in a case-sensitive manner it just means that your business rules require this approach and this case sensitivity is implemented directly in those constraints.
If you change the database collation to Greek_CS_AI, you may break application code. If there are tables Table1 and Table2 in your database now, all the code can reference them as table1 and table2, but once your db collation becomes case-sensitive, the objects table1 and table2 will not be found.
Also, what is the difference between Greek_CI_AI and Greek_BIN
To see this for yourself, run some SELECTs against your data, adding ORDER BY col1 COLLATE Greek_CS_AI (and then ORDER BY col1 COLLATE Greek_BIN) to your SELECT statement.
You'll find that in the first case the uppercase and lowercase forms of a letter are placed next to each other, with lowercase always preceding uppercase within the same letter, while in the second (BIN) case ALL the uppercase letters precede ALL the lowercase letters.
This is because BIN collations compare characters based on their code values.
Note that the older BIN collations have a flaw: for Unicode data they compare only the first character by code point and the remaining characters byte by byte. For this reason, if you ever need a binary collation, always use a BIN2 collation, which does not have this problem.
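A self-contained way to try the comparison without touching real data is sketched below. The table variable and its four sample values are invented for the demo.

```sql
-- Made-up sample data: two letters, each in upper and lower case
DECLARE @Letters TABLE (Letter NVARCHAR(10) NOT NULL);
INSERT INTO @Letters (Letter) VALUES (N'b'), (N'A'), (N'a'), (N'B');

-- Linguistic, case-sensitive ordering: both cases of a letter stay together
SELECT Letter FROM @Letters ORDER BY Letter COLLATE Greek_CS_AI;

-- Binary ordering by code point: A, B, a, b
SELECT Letter FROM @Letters ORDER BY Letter COLLATE Greek_BIN2;
```

The second query uses Greek_BIN2 rather than Greek_BIN, in line with the advice to prefer the BIN2 collations.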
I just came across this when looking into someone else code.
Say there is a schema called Books that has a table called Genres. Whenever this schema and table are used in a script, such as batch/perl, they were originally written as Books..Genres
The question is: should it stay like this or be changed to Books.Genres? And what is the difference?
First of all, I rarely work outside my default schema and thus rarely ever list the schema name in my SQL statements. Having said that, there are rare occasions when I do need to access more than one schema, and only a single dot is used to separate the schema name from the table name. I checked both DB2 and Oracle: neither even allows a double dot. So, unless they are manipulating the SQL in some manner (e.g. maybe the code is processed in a template), SQL statements with a double dot should not work.
MySQL doesn't allow a double dot as separator either; so unless they're preprocessing the SQL in some way as kjpires suggested, this is likely an error. Does the code work?
I have SQL Server 2012 installed that is used for a few different applications. One of our applications needs to be installed, but the company is saying that:
The SQL collation isn't correct, it needs to be: SQL_Latin1_General_CP1_CI_AS
You can just uninstall the SQL Server Database Engine & upon reinstall select the right collation.
What possible reason would this company have to want to change the collation of the database engine itself?
Yes, you are able to set the collation at the database level. To do so, here is an example:
USE master;
GO
ALTER DATABASE <DatabaseName>
COLLATE SQL_Latin1_General_CP1_CI_AS;
GO
You can alter the database Collation even after you have created the database, using the following query:
USE master;
GO
ALTER DATABASE Database_Name
COLLATE Your_New_Collation;
GO
For more information on database collation, read here.
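One thing worth checking afterwards: changing the Database's Collation does not change the Collation of string columns in existing user tables. A rough sketch for verifying the result is below; Database_Name is the same placeholder used in the statement above.

```sql
-- Confirm the Database-level Collation after the ALTER DATABASE
SELECT DATABASEPROPERTYEX(N'Database_Name', 'Collation') AS DatabaseCollation;
GO
USE Database_Name;
GO
-- List string columns in user tables whose Collation still differs from the Database default
SELECT OBJECT_SCHEMA_NAME(c.object_id) AS SchemaName,
       OBJECT_NAME(c.object_id)        AS TableName,
       c.name                          AS ColumnName,
       c.collation_name                AS ColumnCollation
FROM sys.columns AS c
INNER JOIN sys.tables AS t
        ON t.object_id = c.object_id
WHERE c.collation_name IS NOT NULL
  AND c.collation_name <> CONVERT(sysname, DATABASEPROPERTYEX(DB_NAME(), 'Collation'));
GO
```

Any rows returned are columns that would need their own ALTER TABLE ... ALTER COLUMN ... COLLATE statement if they are supposed to match.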
What possible reason would this company have to want to change the collation of the database engine itself?
The other two answers are speaking in terms of Database-level Collation, not Instance-level Collation (i.e. "database engine itself"). The most likely reason that the vendor has for wanting a highly specific Collation (not just a case-insensitive one of your choosing, for example) is that, like most folks, they don't really understand how Collations work, but what they do know is that their application works (i.e. does not get Collation conflict errors) when the Instance and Database both have a Collation of SQL_Latin1_General_CP1_CI_AS, which is the Collation of their Instance and Database (that they develop the app on), because that is the default Collation when installing on an OS having English as its language.
I'm guessing that they have probably had some customers report problems that they didn't know how to fix, but narrowed it down to those Instances not having SQL_Latin1_General_CP1_CI_AS as the Instance / Server -level Collation. The Instance-level Collation controls not just tempdb meta-data (and default column Collation when no COLLATE keyword is specified when creating local or global temporary tables), which has been mentioned by others, but also name resolution for variables / parameters, cursors, and GOTO labels. Even if unlikely that they would be using GOTO statements, they are certainly using variables / parameters, and likely enough to be using cursors.
What this means is that they likely had problems in one or more of the following areas:
Collation conflict errors related to temporary tables:
tempdb being in the Collation of the Instance does not always mean that there will be problems, even if the COLLATE keyword was never used in a CREATE TABLE #[#]... statement. Collation conflicts only occur when attempting to combine or compare two string columns. So assuming that they created a temporary table and used it in conjunction with a table in their Database, they would need to be JOINing on those string columns, or concatenating them, or combining them via UNION, or something along those lines. Under these circumstances, an error will occur if the Collations of the two columns are not identical.
Unexpected behavior:
Comparing a string column of a table to a variable or parameter will use the Collation of the column. Given their requirement for you to use SQL_Latin1_General_CP1_CI_AS, this vendor is clearly expecting case-insensitive comparisons. Since string columns of temp tables (that were not created using the COLLATE keyword) take on the Collation of the Instance, if the Instance is using a binary or case-sensitive Collation, then their application will not be returning all of the data that they were expecting it to return.
Code compilation errors:
Since the Instance-level Collation controls resolution of variable / parameter / cursor names, if they have inconsistent casing in any of their variable / parameter / cursor names, then errors will occur when attempting to execute the code. For example, doing this:
DECLARE @CustomerID INT;
SET @customerid = 5;
would get the following error:
Msg 137, Level 15, State 1, Line XXXXX
Must declare the scalar variable "@customerid".
Similarly, they would get:
Msg 16916, Level 16, State 1, Line XXXXX
A cursor with the name 'Customers' does not exist.
if they did this:
DECLARE customers CURSOR FOR SELECT 1 AS [Bob];
OPEN Customers;
These problems are easy enough to avoid, simply by doing the following:
Specify the COLLATE keyword on string columns when creating temporary tables (local or global). Using COLLATE DATABASE_DEFAULT is handy if the Database itself is not guaranteed to have a particular Collation. But if the Collation of the Database is always the same, then you can specify either DATABASE_DEFAULT or the particular Collation. Though I suppose DATABASE_DEFAULT works in both cases, so maybe it's the easier choice.
Be consistent in casing of identifiers, especially variables / parameters. And to be more complete, I should mention that Instance-level meta-data is also affected by the Instance-level Collation (e.g. names of Logins, Databases, server-Roles, SQL Agent Jobs, SQL Agent Job Steps, etc). So being consistent with casing in all areas is the safest bet.
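As a sketch of the first fix, here is a temporary table whose string column is pinned to the current Database's Collation instead of inheriting the Instance's. The table and column names are invented for the example.

```sql
-- Hypothetical staging table; run this while connected to the application Database
CREATE TABLE #Staging
(
    CustomerID   INT           NOT NULL,
    -- Without the COLLATE clause this column would take the Instance (tempdb) Collation
    CustomerName NVARCHAR(100) COLLATE DATABASE_DEFAULT NOT NULL
);

-- Check which Collation the column actually received
SELECT c.name AS ColumnName, c.collation_name AS ColumnCollation
FROM tempdb.sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'tempdb..#Staging');

DROP TABLE #Staging;
```

With the column declared this way, JOINs between #Staging and string columns that use the Database's default Collation should not raise a Collation conflict, whatever the Instance-level Collation is.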
Am I being unfair in assuming that the vendor doesn't understand how Collations work? Well, according to a comment made by the O.P. on M.Ali's answer:
I got this reply from him: "It's the other way around, you need the new SQL instance collation to match the old SQL collation when attaching databases to it. The collation is used in the functioning of the database, not just something that gets set when it's created."
the answer is "no". There are two problems here:
No, the Collations of the source and destination Instances do not need to match when attaching a Database to a new Instance. In fact, you can even attach a system DB to an Instance that has a different Collation, thereby having a mismatch between the attached system DB and the Instance and the other system DBs.
It's unclear if "database" in that last sentence means actual Database or the Instance (sometimes people use the term "database" to refer to the RDBMS as a whole). If it means actual "Database", then that is entirely irrelevant because the issue at hand is the Instance-level Collation. But, if the vendor meant the Instance, then while true that the Collation is used in normal operations (as noted above), this only shows awareness of simple cause-effect relationship and not actual understanding. Actual understanding would lead to doing those simple fixes (noted above) such that the Instance-level Collation was a non-issue.
If needing to change the Collation of the Instance, please see:
Changing the Collation of the Instance, the Databases, and All Columns in All User Databases: What Could Possibly Go Wrong?
For more info on working with Collations / encodings / Unicode / etc, please visit:
Collations.Info
Is there any option to set the words "file", "key" and "trigger" as field names for ms-sql server?
We have a pretty big application written with web2py with PostgreSQL as the db, and some of the fields have those names. One customer wishes to use ms-sql as the db server, and I'm trying not to break compatibility within the DB structure.
Using square brackets (found in google) didn't help (could not use '[file]') - the ms-sql rejected it.
The documentation is pretty clear on this:
Although it is syntactically possible to use SQL Server reserved keywords as identifiers and object names in Transact-SQL scripts, you can do this only by using delimited identifiers.
The delimiters used in SQL Server are either double quotes or []. So, you can define them as:
[file]
or
"file"
Note that you need to use the delimiters wherever they appear.
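For illustration, here is a minimal sketch using the three names from the question as column names. The temporary table name and the sample values are made up.

```sql
-- Reserved keywords work as column names as long as they are delimited
CREATE TABLE #Attachments
(
    [file]    NVARCHAR(260) NOT NULL,
    [key]     INT           NOT NULL,
    [trigger] NVARCHAR(50)  NULL
);

INSERT INTO #Attachments ([file], [key], [trigger])
VALUES (N'report.pdf', 1, N'on_upload');

-- The delimiters are needed in every statement that references the columns
SELECT [file], [key], [trigger]
FROM #Attachments
WHERE [key] = 1;

DROP TABLE #Attachments;
```

Leaving the brackets off any one of these references produces a syntax error, which is the usual symptom when a generated query delimits the names in some places but not others.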
The use of reserved words for such columns is discouraged. However, you might actually have a use case of compatibility between different databases where this capability will be useful.
I don't know why square brackets would fail. They work on SQL Fiddle.
I stumbled into this same issue when trying to use a legacy SQL Server table with fields that don't conform to the identifier naming rules, such as 'My File'.
Reading the source code I found that a rname field attribute exists for these cases:
Field('my_file', 'string', rname='[My File]'),
With this, Web2Py works with the my_file field name, while the actual SQL/DML generated to interact with the database will use [My File] instead.
I have a field [Product/Services] in my table in SQL Server 2005. Now I want to create a stored procedure for that table, but it keeps giving an error. When I name the field only Product in my table, the stored procedure works fine. I want to keep [Product/Services] in my table, so how can I do that?
It is always a bad idea to include special characters in column / variable / parameter / table / view / procedure / etc. names. All of your code will have to dance around this bad decision forever.
Without any detail on your particular code and/or error message, all I can provide are these links on the rules for naming things in SQL Server:
Identifiers
Delimited Identifiers
from the second link:
Microsoft SQL Server does not recognize variable names and stored procedure parameters that are delimited. These types of identifiers must comply with the rules for regular identifiers.
Your best bet is to just name the column something like Product_Services or ProductServices and you can have local variables and parameters named @Product_Services or @ProductServices.
Your next best bet is to leave the table alone and just name the local variables @Product_Services or @ProductServices even though the table column is named [Product/Services].
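A sketch of that second option follows. The table dbo.Offerings and the procedure dbo.GetOfferings are hypothetical names, since the question does not show the real ones.

```sql
-- Hypothetical table that keeps the awkward column name
CREATE TABLE dbo.Offerings
(
    [Product/Services] NVARCHAR(100) NOT NULL
);
GO
-- The parameter is a regular identifier; only the column reference is delimited
CREATE PROCEDURE dbo.GetOfferings
    @ProductServices NVARCHAR(100)
AS
BEGIN
    SET NOCOUNT ON;

    SELECT [Product/Services]
    FROM dbo.Offerings
    WHERE [Product/Services] = @ProductServices;
END;
GO
EXEC dbo.GetOfferings @ProductServices = N'Consulting';
GO
-- Clean up the demo objects
DROP PROCEDURE dbo.GetOfferings;
DROP TABLE dbo.Offerings;
GO
```

Trying to name the parameter @[Product/Services] or @Product/Services instead is what fails, because parameter names cannot be delimited.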
Make sure you are properly quoting the name Product/Services as [Product/Services] whenever you are referring to it, using it, or redefining it. [] are the quote characters in MS SQL Server.