SQL Server collation -- closest to utf-8?

I can't seem to find a way to set the default collation of a database to UTF-8 (or something close to it).
For example, in MySQL the default UTF-8 collation is called utf8_general_ci. Is there something similar for SQL Server? Also, why does it use Latin1 as the default?

According to https://learn.microsoft.com/en-us/sql/relational-databases/collations/collation-and-unicode-support?view=sql-server-ver15#utf8, you append "_UTF8" to the collation name to enable UTF-8. (SQL Server 2019 or later is required.) The example given is to change Latin1_General_100_CI_AS_SC to Latin1_General_100_CI_AS_SC_UTF8.
If you are migrating an existing database from an older version, I believe extra care is required to ensure the collation conversion is handled properly. The change in sorting can have side effects. Also, existing table definitions keep their original collation, which might be an issue when you create new tables that use the new collation by default.
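As a minimal sketch of the above, assuming SQL Server 2019 or later: the database name Utf8Demo and the table dbo.Greeting are made up for illustration. With a _UTF8 default collation, varchar columns store UTF-8, so each Ethiopic character in the sample row should occupy three bytes. The last query is one possible way to spot columns that kept a different collation after a migration.

```sql
-- Hypothetical database name; requires SQL Server 2019 (15.x) or later.
CREATE DATABASE Utf8Demo COLLATE Latin1_General_100_CI_AS_SC_UTF8;
GO
USE Utf8Demo;
GO

-- varchar columns inherit the database default collation, so they hold UTF-8 here.
CREATE TABLE dbo.Greeting
(
    Msg varchar(50) NOT NULL
);

INSERT INTO dbo.Greeting (Msg) VALUES (N'ሰላም');

-- DATALENGTH reports bytes, not characters.
SELECT Msg, DATALENGTH(Msg) AS Bytes
FROM dbo.Greeting;

-- List string columns whose collation differs from the current database default.
DECLARE @dbCollation sysname =
    CONVERT(sysname, DATABASEPROPERTYEX(DB_NAME(), 'Collation'));

SELECT OBJECT_SCHEMA_NAME(c.object_id) AS SchemaName,
       OBJECT_NAME(c.object_id)        AS TableName,
       c.name                          AS ColumnName,
       c.collation_name                AS ColumnCollation
FROM sys.columns AS c
JOIN sys.tables  AS t ON t.object_id = c.object_id
WHERE c.collation_name IS NOT NULL
  AND c.collation_name <> @dbCollation;
```

GO is a batch separator understood by SSMS and sqlcmd, not a T-SQL statement, so run this from one of those tools.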

Related

Why does SSDT Schema compare showing collation as a difference?

I have a Visual Studio Database project (SQL Server) with tables, stored procedures, etc. The tables have a collation defined, for example:
CREATE TABLE [dbo].[TestTable]
(
[TestColumn] [varchar] (3) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL
);
The database default collation is also SQL_Latin1_General_CP1_CI_AS.
I publish with sqlpackage, with ScriptDatabaseCollation set to True.
When I modify the table from either direction (for example, by adding a new column) and use the SSDT compare tool, it shows the collation as a difference, even though "Ignore collation" is set to True.
Another interesting thing is that when I click Generate Script, the script doesn't contain any collation modifications, just the new column.
It's even worse when I compare in the other direction (update the DB directly and compare from the DB to the local project), because it updates my file and removes the collation.
System information:
SSDT Version 17.0.62204.01010
MSSQL Server Express 15.0.4153.1
Visual Studio Professional 2022 17.2.2
Does anybody know how I can solve this problem?

I can only presume that SSDT is trying to remove "excessive" DDL that it considers unnecessary. Since your column's collation is the same as the project's, repeating it doesn't make much sense (at least from SSDT's point of view).
You will probably appreciate this behaviour if and when you have multiple instances of your database with different default collations. Speaking of which, I hope you know what you are doing in choosing a very old, problematic SQL collation as the default for your system.
Having said that, SSDT doesn't always remove COLLATE clauses from your DDL. If you specify a column collation that differs from the project's default, it won't disappear after schema comparison (assuming both source and target have the same one). In one of my recent projects, for example, I needed a couple of columns to be case-sensitive, so I set them to Latin1_General_100_CS_AS in SSDT. These clauses didn't go anywhere after several months of development work.
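To make the distinction concrete, here is a small sketch. The table dbo.Product and its columns are hypothetical. ProductCode carries an explicit collation that differs from the database default, which is the kind of clause that survived in the project described above. ProductName has no COLLATE clause and simply inherits the default.

```sql
CREATE TABLE dbo.Product
(
    -- Explicit, non-default collation: case-sensitive comparisons for this column only.
    ProductCode varchar(20) COLLATE Latin1_General_100_CS_AS NOT NULL,
    -- No COLLATE clause: the column takes the database default collation.
    ProductName varchar(100) NOT NULL
);

-- Show each column's effective collation next to the database default.
DECLARE @dbCollation sysname =
    CONVERT(sysname, DATABASEPROPERTYEX(DB_NAME(), 'Collation'));

SELECT c.name           AS ColumnName,
       c.collation_name AS ColumnCollation,
       @dbCollation     AS DatabaseDefault
FROM sys.columns AS c
WHERE c.object_id = OBJECT_ID(N'dbo.Product');
```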
If, for some unknown reason, it is absolutely paramount for you to keep these collate clauses in your code, you may set the project's default collation to something else. This will prevent SSDT from cleaning up the noise. However, you need to be careful with schema comparison and DACPAC deployment. In the former, you have the following options:
"Compare using target collation" (cleared by default),
"Ignore column collation" (cleared by default),
"Verify collation compatibility" (set by default).
(Not sure about the latter, as I never used it.)
However, going to the Schema Compare settings dialog every time you need to compare schemas will soon become too tedious to bear. I would recommend simply agreeing with SSDT and removing the stuff you don't really need.

SonarQube: Is the Collation for the Database or the Instance?

According to the SonarQube documentation "Installing the Server" (https://docs.sonarqube.org/display/sonar/installing+the+server), for a Microsoft SQL Server host, "collation MUST be case-sensitive (CS) and accent-sensitive (AS)."
The documentation is not clear on whether the collation must be set:
for the SQL Server instance, or
for the database.
If the collation for the SQL Server (and specifically for tempdb) is "accent insensitive" and the database collation is "accent sensitive", does SonarQube accommodate this configuration?

If the collation for the SQL Server (and specifically for tempdb) is "accent insensitive" and the database collation is "accent sensitive", does SonarQube accommodate this configuration?
Since the documentation is ambiguous (they might not use SQL Server enough to know the different levels where Collation can be set), the only two ways to get the answer here are:
Contact their community: https://www.sonarqube.org/community/feedback/. This is the best choice.
Install it on an Instance that has an accent insensitive default Collation and test it out. No reason not to try this.
Whether or not SonarQube handles this properly depends on how it was coded. They could be JOINing on string columns in temporary tables and any difference in Collation between the Database and Instance could potentially cause an error, but only if they are not specifically declaring the Collation when creating the temp tables.
Also, it is possible that their app needs accent sensitivity because it has some variable names and/or cursor names and/or (less likely) GOTO label names that would be treated as equal under accent insensitivity but should be seen as different. Instance-level Collation controls these areas and would hence affect the name resolution of those items. Of course, this would be easy to test, since declaring two variables whose names differ under accent sensitivity will cause a parse error if they are close enough to be considered the same under accent insensitivity. Still, contact their community.
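A rough way to run that test yourself is sketched below. The variable names are invented for illustration and say nothing about what SonarQube actually declares. The first query shows the collations at each level. The second batch should run on an accent-sensitive instance and fail with a duplicate-declaration error on an accent-insensitive one.

```sql
-- Compare the collation at instance, tempdb and current-database level.
SELECT SERVERPROPERTY('Collation')                AS InstanceCollation,
       DATABASEPROPERTYEX('tempdb', 'Collation')  AS TempdbCollation,
       DATABASEPROPERTYEX(DB_NAME(), 'Collation') AS CurrentDbCollation;
GO

-- Two variable names that differ only by accents (hypothetical names).
-- Variable names are resolved using the instance-level collation.
DECLARE @resume int = 1;
DECLARE @résumé int = 2;

SELECT @resume AS PlainName, @résumé AS AccentedName;
```

Save the script in a Unicode encoding so the accented characters survive, and run it from SSMS or sqlcmd, since GO is a client-side batch separator.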

What collation should I use for the Amharic language?

I am using SQL Server 2014. I want to use the Amharic language in my database. The default collation for the database was Latin1_General_CP1_CI_AS. I changed it to Latin1_General_CI_AS. Neither displays Amharic characters. The characters show while typing, but once committed they are changed to question marks.
What collation should I use or what am I missing?

I suspect your problem is not the collation.
Try using Unicode data types such as NCHAR and NVARCHAR, and you'll see your saved characters.
With Unicode types, the collation only affects sorting and comparing. Look through the list of collations and choose the most appropriate one.
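A minimal sketch of the difference, assuming a database with a Latin1 (code page 1252) default collation like the one in the question. The table dbo.AmharicDemo is hypothetical. Only the nvarchar column written from an N-prefixed literal should come back as Amharic. The other three values should come back as question marks.

```sql
CREATE TABLE dbo.AmharicDemo
(
    Id             int IDENTITY(1,1) PRIMARY KEY,
    NonUnicodeText varchar(50)  NULL,
    UnicodeText    nvarchar(50) NULL
);

-- N prefix: the literal is Unicode. The nvarchar column keeps it intact;
-- the varchar column converts it to the collation's code page and loses it.
INSERT INTO dbo.AmharicDemo (NonUnicodeText, UnicodeText)
VALUES (N'ሰላም', N'ሰላም');

-- No N prefix: the literal is converted to the database code page first,
-- so the characters are already lost before they reach either column.
INSERT INTO dbo.AmharicDemo (NonUnicodeText, UnicodeText)
VALUES ('ሰላም', 'ሰላም');

SELECT Id, NonUnicodeText, UnicodeText
FROM dbo.AmharicDemo
ORDER BY Id;
```

The same applies to parameters sent from an application: they need to be passed as a Unicode type, otherwise the conversion happens before the data reaches the column.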
Originally, you did not mention that you are using Full-text search. That requires you to use the keyword LANGUAGE with the name of your language. However, there are only 53 supported languages (see sys.fulltext_languages) and Amharic isn't among them.
Your only option is to recreate your Full-text catalog with the neutral word breaker and then repopulate it. Then at least it will recognize words by spaces and punctuation marks.
See more details: https://msdn.microsoft.com/en-us/library/ms142509.aspx?f=255&MSPPError=-2147217396

You can follow the steps on MSDN:
Default Collations in SQL Server Setup
In Control Panel, find the Windows system locale name on the Advanced tab in Regional and Language Options. In Windows Vista, use the "Formats" tab. The following table shows the corresponding collation designator to match collation settings with an existing Windows locale:
So you have to change the setting in Control Panel to see the results.

A simple Google search brought me to this list of collations, most of them available starting with SQL Server 2008R2.
Your collation should be Latin1_General_100_
Some more hints:
SQL Server has a default collation, which is the standard collation for all new databases and, very importantly, the standard collation for tempdb.
If it works for you, the best approach is to install SQL Server with the appropriate collation.
SQL Server also allows you to define a default collation at the database level. But this can lead to serious trouble if you work with CREATE TABLE #table and use WHERE, ORDER BY, GROUP BY, or JOIN (any comparison...) on character fields...
Last but not least, you can define a collation at the statement level too. This means a lot of typing and code that is rather difficult to read, but it offers the best control...
It is a different issue how the output of a query is displayed. This is an issue of the output window (or any other program you are using to display the values of your queries...). In this case it might be necessary to use the appropriate encoding and character set. This depends on the tools you are working with...

User Defined Data Types with COLLATION

I am trying to find a way of specifying a COLLATION for a user-defined data type, but it doesn't appear to support collations in SQL Server, and it basically takes the collation from the database.
Is there any way of applying a collation against a user defined data type that doesn't inherit from the database collation?
I am writing a database comparison tool, using the AdventureWorks2012 database for testing. After generating a SQL diff script and running it on a blank database, the tool now reports a lot of differences because the collation differs from what was expected.
I could create the blank database with the same collation as the source database, but I need to consider situations where organisations, etc. may wish to change the collation on the database and my tool will not work consistently and will keep on reporting differences.
Thanks.

SQL Server Collation Conflict

We are transferring data from one SQL Server to another, but when the schema is compared and synchronised, the following error is received. We are using Redgate SQL Compare for this.
Cannot resolve collation conflict for equal to operation
The base SQL Server uses SQL_Latin1_General_CP1_CI_AS and the destination server uses Latin1_General_CI_AS.

SQL Compare has an option to ignore collations. Look under the tab "options" in your compare project configuration.

Is your problem with the SQL Compare utility, or are you worried that different server collations will lead to problems?
You could change the collation of the destination server to match the base server.
If that is not possible, then make the collation of the databases on each server match. Your only real problem is then likely to be any temporary tables you create, which will have a default collation matching the server / tempdb. As long as you explicitly create the temporary table (i.e. don't create it using SELECT * INTO #TEMP FROM MyTable) and explicitly assign a collation to any varchar/text columns, you should be OK.
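A sketch of what an explicitly created temporary table might look like. The table dbo.Customer and its columns are hypothetical. COLLATE DATABASE_DEFAULT makes the temp table column take the collation of the database you are connected to, so that it does not silently pick up the tempdb collation.

```sql
-- Hypothetical permanent table in the user database.
CREATE TABLE dbo.Customer
(
    CustomerId   int          NOT NULL PRIMARY KEY,
    CustomerName varchar(100) NOT NULL
);

-- Explicitly created temp table; the string column is pinned to the
-- current database's collation, not to the tempdb default.
CREATE TABLE #CustomerStage
(
    CustomerId   int          NOT NULL,
    CustomerName varchar(100) COLLATE DATABASE_DEFAULT NOT NULL
);

INSERT INTO #CustomerStage (CustomerId, CustomerName)
SELECT CustomerId, CustomerName
FROM dbo.Customer;

-- Both sides of the comparison now share one collation,
-- even if the server / tempdb collation is different.
SELECT c.CustomerId
FROM dbo.Customer AS c
JOIN #CustomerStage AS s
    ON s.CustomerName = c.CustomerName;

DROP TABLE #CustomerStage;
```

You can name a specific collation in place of DATABASE_DEFAULT if the column has to match a table that uses a non-default collation.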

The way I overcome this is to generate the scripts via SQL Compare and then strip out (or replace) the collation-specific code. This is relatively fast and easy to do, and finally I manually apply the scripts to the destination server/database.

Sounds like the collation settings for the server are different.
How are you transferring the data? Do you perform a database restore on your new platform?
Either way, you need to ensure that the same collation is used on your new environment as is currently in place in your source environment.
Hope this makes sense, let me know if you need further assistance.

"Ignore collations" is definitely not going to work, for the reason stated above. The problem happens when migrating objects like views and stored procedures that use JOIN clauses on text fields that have differing collations.
If someone changes the default collation on the server and the column on the other side of the JOIN uses a specific collation, you've caused this issue. And it would happen in SQL Compare as well as if you just manually scripted the object in SSMS and moved it yourself.
There are two roads to fixing it - you could specify a COLLATE clause on the join and explicitly state the collation you want to use, or you could change the destination database default collation to match the source.
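The first road, a COLLATE clause on the join, might look like the sketch below. The temp tables and the Code column are invented for illustration and use the two collations from the question. Without the COLLATE clause, the join should fail with the collation conflict error quoted in the question.

```sql
-- Two tables whose string columns use the two collations from the question.
CREATE TABLE #Source (Code varchar(10) COLLATE SQL_Latin1_General_CP1_CI_AS NOT NULL);
CREATE TABLE #Target (Code varchar(10) COLLATE Latin1_General_CI_AS NOT NULL);

INSERT INTO #Source (Code) VALUES ('ABC');
INSERT INTO #Target (Code) VALUES ('abc');

-- The explicit COLLATE on one side tells SQL Server which collation
-- to use for the comparison, which resolves the conflict.
SELECT s.Code AS SourceCode, t.Code AS TargetCode
FROM #Source AS s
JOIN #Target AS t
    ON s.Code = t.Code COLLATE SQL_Latin1_General_CP1_CI_AS;

DROP TABLE #Source, #Target;
```

Keep in mind that applying COLLATE to a column in a join predicate can prevent an index on that column from being used for the comparison, so for large tables changing the column or database collation may be the better long-term fix.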
I'm afraid there is no SQL Compare "magic bullet" to solve this.