Change default collation for NVARCHAR columns in MariaDB 10.6

I'm stuck with a problem in MariaDB 10.6.4 that I didn't encounter in 10.5.12.
I set the default character set and default collation for my database (and in fact for my whole server) to utf8mb4 and utf8mb4_general_ci.
In 10.6.4 the server setting doesn't affect the defaults for information_schema, performance_schema and sys (they are still utf8mb3), but all other databases that I create use utf8mb4.
I use SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM information_schema.SCHEMATA to verify the default settings.
I use the following SQL to create a test table in the database ETLBox_DataFlow2:
CREATE TABLE testtable
(
col1 NVARCHAR(100) NULL,
col2 VARCHAR(100) NULL,
col3 VARCHAR(100) COLLATE 'utf8mb4_general_ci',
col4 VARCHAR(100) CHARACTER SET 'utf8mb4'
)
When I check the character set & collation that each column uses, I see that the NVARCHAR column still has utf8mb3 as default character set.
SELECT COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.Columns
WHERE TABLE_NAME = 'testtable'
This is very unfortunate, as I use C# and the latest MySQL connector 8.0 to retrieve data from this table, and the connector won't work with data collated in utf8mb3, since that character set is already deprecated by MySQL/Oracle.
I never had this issue with MariaDB 10.5.12. I know that MariaDB changed the default for utf8, but I thought that everything would now be aligned to utf8mb4 to be in line with MySQL. I tried playing around with @@OLD_MODE, but didn't see any difference when creating the table.
Is there a way to change this behaviour? I would like NVARCHAR columns to be collated to utf8mb4 by default as well (actually, any collation other than utf8mb3 would work for me).
Looking forward to your replies!
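One possible workaround, sketched here rather than confirmed as the intended fix, is to avoid relying on the NVARCHAR default and state the character set explicitly. The table name testtable_utf8mb4 is a hypothetical name used only for illustration. The sketch assumes the columns can be declared as VARCHAR, and that converting the existing table is acceptable.

```sql
-- Option 1 (assumption: VARCHAR is acceptable in place of NVARCHAR):
-- declare the column with an explicit character set and collation.
CREATE TABLE testtable_utf8mb4   -- hypothetical table name
(
    col1 VARCHAR(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL,
    col2 VARCHAR(100) NULL
);

-- Option 2: convert all character columns of the existing table.
ALTER TABLE testtable
    CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci;

-- Check which character set each column ended up with.
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_NAME IN ('testtable', 'testtable_utf8mb4');
```

Neither option changes what NVARCHAR maps to by default. They only sidestep it for the columns involved.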

Related

Emojis storing as ? in PhpMyAdmin

I'm using WordPress as my CMS, hosted on Plesk, and I have a question. I understand that for the database to store emojis, the charset should be changed to utf8mb4.
Where do I change it?
Is there any global setting so that every site uses the new utf8mb4?
And how do I change it for an existing database that already has emojis stored?
This really hurts when migrating to new servers: all the emojis appear as ? after importing the database.
Step 1, change your database's default charset:
ALTER DATABASE database_name CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
If the database has not been created yet, create it with the correct encoding:
CREATE DATABASE database_name DEFAULT CHARSET = utf8mb4 DEFAULT COLLATE = utf8mb4_unicode_ci;
Step 2, set charset when creating table:
CREATE TABLE IF NOT EXISTS table_name (
...
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE utf8mb4_unicode_ci;
Or alter an existing table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
ALTER TABLE table_name MODIFY field_name TEXT CHARSET utf8mb4;
Source
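After running the steps above, it may be worth checking which columns are still not utf8mb4. The query below is a minimal sketch, assuming the placeholder database_name from the statements above. The SET NAMES line reflects a further assumption: that the client connection used for the import also needs to be utf8mb4.

```sql
-- Assumption: the importing connection should also use utf8mb4,
-- otherwise 4-byte characters may be replaced on the way in.
SET NAMES utf8mb4;

-- List character columns in the schema that are not yet utf8mb4.
SELECT TABLE_NAME, COLUMN_NAME, CHARACTER_SET_NAME, COLLATION_NAME
FROM information_schema.COLUMNS
WHERE TABLE_SCHEMA = 'database_name'      -- placeholder schema name
  AND CHARACTER_SET_NAME IS NOT NULL      -- skip non-character columns
  AND CHARACTER_SET_NAME <> 'utf8mb4';
```

An empty result suggests every character column in that schema already uses utf8mb4.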

Can you set collation of T-SQL Variable?

I've searched high and low but can't find an answer, can you set the collation of a variable? According to the MS documentation, it seems that it's only possible on SQL Azure:
-- Syntax for Azure SQL Data Warehouse and Parallel Data Warehouse
DECLARE
{{ @local_variable [AS] data_type } [ =value [ COLLATE ] ] } [,...n]
Currently I have to do this:
DECLARE @Test nvarchar(10) = N'Crud';
IF ( @Test = N'Crud' COLLATE Latin1_General_CS_AI )
Print N'Crud';
IF ( @Test = N'cRud' COLLATE Latin1_General_CS_AI )
Print N'cRud';
IF ( @Test = N'crUd' COLLATE Latin1_General_CS_AI )
Print N'crUd';
IF ( @Test = N'cruD' COLLATE Latin1_General_CS_AI )
Print N'cruD';
When what I'd like to do is this:
DECLARE @Test nvarchar(10) = N'Crud' COLLATE Latin1_General_CS_AI;
IF ( @Test = N'Crud' )
Print N'Crud';
IF ( @Test = N'cRud' )
Print N'cRud';
IF ( @Test = N'crUd' )
Print N'crUd';
IF ( @Test = N'cruD' )
Print N'cruD';
I'm guessing the answer is no but I wanted to confirm and at the very least, someone else ever needing this info will get a definitive answer.
Much appreciated.
Well, you're guessing correctly.
In most SQL Server systems (that is, excluding Azure SQL Data Warehouse and Parallel Data Warehouse), a collation can be set at four levels:
The default collation of the SQL Server instance:
The server collation acts as the default collation for all system databases that are installed with the instance of SQL Server, and also any newly created user databases.
The default collation of a specific database:
You can use the COLLATE clause of the CREATE DATABASE or ALTER DATABASE statement to specify the default collation of the database. You can also specify a collation when you create a database using SQL Server Management Studio. If you do not specify a collation, the database is assigned the default collation of the instance of SQL Server.
You can set a collation for a table's column:
You can specify collations for each character string column using the COLLATE clause of the CREATE TABLE or ALTER TABLE statement. You can also specify a collation when you create a table using SQL Server Management Studio. If you do not specify a collation, the column is assigned the default collation of the database.
You can set a collation for a specific expression using the Collate clause:
You can use the COLLATE clause to apply a character expression to a certain collation. Character literals and variables are assigned the default collation of the current database. Column references are assigned the definition collation of the column.
So yes, with the exception of Azure SQL Data Warehouse and Parallel Data Warehouse, you can't set a collation on a local scalar variable.
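Since a column can carry its own collation, one way to avoid repeating COLLATE on every comparison is to hold the value in a single-row table variable. This is only a sketch of that idea, not something the documentation above prescribes, and the column name Value is a hypothetical choice.

```sql
-- A table variable column can be given an explicit collation,
-- which a scalar variable cannot.
DECLARE @Test TABLE (
    Value nvarchar(10) COLLATE Latin1_General_CS_AI NOT NULL  -- hypothetical column name
);

INSERT INTO @Test (Value) VALUES (N'Crud');

-- The column's collation takes precedence over the default collation
-- of the literal, so no per-comparison COLLATE clause is needed.
IF EXISTS (SELECT 1 FROM @Test WHERE Value = N'Crud')
    PRINT N'Crud';
IF EXISTS (SELECT 1 FROM @Test WHERE Value = N'cRud')
    PRINT N'cRud';
```

With the case-sensitive collation on the column, only the first PRINT should fire.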

Enforce same collation on multiple SQL Server databases

Multiple SQL Server databases with the exact same schema somehow ended up having different collations. How do I change them all to be the same with a scripted approach without any manual clicking around?
declare @rename_models table (
wrong nvarchar(256) COLLATE SQL_Latin1_General_CP1_CI_AS, -- tried overriding collation, but this conflicts with some of the databases
correct nvarchar(256) COLLATE SQL_Latin1_General_CP1_CI_AS
);
The query I run against a models table:
select code as to_be_deleted from models where code in (select wrong from @rename_models);
Throws this for some databases:
Msg 468, Level 16, State 9, Line 140
Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "Latin1_General_CI_AS" in the equal to operation.
You can use the COLLATE keyword in your Select Query.
Casting the collation of an expression.
You can use the COLLATE clause to apply a character expression to a
certain collation. Character literals and variables are assigned the
default collation of the current database. Column references are
assigned the definition collation of the column.
References :
COLLATE
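Applied to the query from the question, the COLLATE suggestion could look like the sketch below. It assumes the models table and code column from the question. The second statement is an additional, hypothetical helper that only generates ALTER DATABASE statements for review. It does not run them, and it assumes SQL_Latin1_General_CP1_CI_AS is the target collation.

```sql
DECLARE @rename_models TABLE (
    wrong   nvarchar(256) COLLATE SQL_Latin1_General_CP1_CI_AS,
    correct nvarchar(256) COLLATE SQL_Latin1_General_CP1_CI_AS
);

-- An explicit COLLATE on one side of the comparison overrides the
-- column's own collation, so the same query works in every database.
SELECT code AS to_be_deleted
FROM models
WHERE code COLLATE SQL_Latin1_General_CP1_CI_AS
      IN (SELECT wrong FROM @rename_models);

-- Generate (but do not execute) one ALTER DATABASE statement per user
-- database whose default collation differs from the assumed target.
SELECT N'ALTER DATABASE ' + QUOTENAME(name)
     + N' COLLATE SQL_Latin1_General_CP1_CI_AS;' AS alter_statement
FROM sys.databases
WHERE database_id > 4   -- skip master, tempdb, model, msdb
  AND collation_name <> N'SQL_Latin1_General_CP1_CI_AS';
```

Changing a database's default collation does not alter existing columns, so the explicit COLLATE in the query may still be needed afterwards.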

Changing the collation of a SQL Server 2012 database

Alter Collation
I need to change the collation of one of our databases on a particular server from Latin1_General_CI_AS to SQL_Latin1_General_CP1_CI_AI so that it matches the rest of our databases.
The Problem
However, when I attempt to do this, I get the following error:
ALTER DATABASE failed. The default collation of database 'XxxxxXxxxxx' cannot be set to SQL_Latin1_General_CP1_CI_AI. (Microsoft SQL Server, Error: 5075)
My Research
My googling on the topic has revealed a number of articles which indicate that I need to export all the data, drop the database, re-create it with the correct collation, then re-import the data.
For example: Problem with database collation change (SQL Server 2008)
Obviously this is a significant task, especially since primary-foreign key relationships must be preserved, and our database is quite large (over ten million data rows).
My Question
Is there a way to change the collation of an existing SQL Server 2012 database which does not require exporting and re-importing all the data?
Alternatively, is there some tool or script(s) which can automate this process in a reliable manner?
The following works for me on SQL Server 2012:
ALTER DATABASE CURRENT COLLATE SQL_Latin1_General_CP1_CI_AI;
The accepted answer in the linked question is not entirely correct, at least not for SQL Server 2012. It says:
Ahh, this is one of the worst problems in SQL Server: you cannot change the collation once an object is created (this is true both for tables and databases...).
But I was just able to change the default collation and I have tables that are populated. The MSDN page for ALTER DATABASE states in the "Remarks" section, under "Changing the Database Collation":
Before you apply a different collation to a database, make sure that the following conditions are in place:
You are the only one currently using the database.
No schema-bound object depends on the collation of the database.
If the following objects, which depend on the database collation, exist in the database, the ALTER DATABASE database_name COLLATE statement will fail. SQL Server will return an error message for each object blocking the ALTER action:
User-defined functions and views created with SCHEMABINDING.
Computed columns.
CHECK constraints.
Table-valued functions that return tables with character columns with collations inherited from the default database collation.
So, I would suggest making sure that the database is in Single-User mode, and that if you have any of those four items, that you:
drop them
change the collation
and then re-add them
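Before dropping anything, it may help to list the candidate blockers. The queries below are a rough sketch based on the catalog views. They cover schema-bound modules, computed columns and CHECK constraints, but not the table-valued functions from the fourth bullet, and they list every such object, whether or not it actually depends on the database collation.

```sql
-- Views and functions created WITH SCHEMABINDING.
SELECT OBJECT_SCHEMA_NAME(m.object_id) AS schema_name,
       OBJECT_NAME(m.object_id)        AS module_name
FROM sys.sql_modules AS m
WHERE m.is_schema_bound = 1;

-- Computed columns.
SELECT OBJECT_SCHEMA_NAME(cc.object_id) AS schema_name,
       OBJECT_NAME(cc.object_id)        AS table_name,
       cc.name                          AS column_name,
       cc.definition
FROM sys.computed_columns AS cc;

-- CHECK constraints.
SELECT OBJECT_SCHEMA_NAME(ck.parent_object_id) AS schema_name,
       OBJECT_NAME(ck.parent_object_id)        AS table_name,
       ck.name                                 AS constraint_name,
       ck.definition
FROM sys.check_constraints AS ck;
```

The definition columns make it easier to script each object back in after the collation change.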
BUT, at that point all that has been changed is the Database's default Collation. The Collation of any existing columns in user tables (i.e. non-system tables) will still have the original Collation. If you want existing string columns -- CHAR, VARCHAR, NCHAR, NVARCHAR, and the deprecated TEXT and NTEXT -- to take on the new Collation, you need to change each of those columns individually. And, if there are any indexes defined on those columns, then those indexes will need to be dropped first (disabling is not enough) and created again after the ALTER COLUMN (other dependencies that would prevent the ALTER COLUMN would have already been dropped in order to get the ALTER DATABASE to work). The example below illustrates this behavior:
Test Setup
USE [tempdb];
SET NOCOUNT ON;
CREATE TABLE dbo.ChangeCollationParent
(
[ChangeCollationParentID] INT NOT NULL IDENTITY(1, 1)
CONSTRAINT [PK_ChangeCollationParent] PRIMARY KEY,
ExtendedASCIIString VARCHAR(50) COLLATE Latin1_General_CI_AS NULL,
UnicodeString NVARCHAR(50) COLLATE Latin1_General_CI_AS NULL
);
CREATE TABLE dbo.ChangeCollationChild
(
[ChangeCollationChildID] INT NOT NULL IDENTITY(1, 1)
CONSTRAINT [PK_ChangeCollationChild] PRIMARY KEY,
[ChangeCollationParentID] INT NULL
CONSTRAINT [FK_ChangeCollationChild_ChangeCollationParent] FOREIGN KEY
REFERENCES dbo.ChangeCollationParent([ChangeCollationParentID]),
ExtendedASCIIString VARCHAR(50) COLLATE Latin1_General_CI_AS NULL,
UnicodeString NVARCHAR(50) COLLATE Latin1_General_CI_AS NULL
);
INSERT INTO dbo.ChangeCollationParent ([ExtendedASCIIString], [UnicodeString])
VALUES ('test1' + CHAR(200), N'test1' + NCHAR(200));
INSERT INTO dbo.ChangeCollationParent ([ExtendedASCIIString], [UnicodeString])
VALUES ('test2' + CHAR(170), N'test2' + NCHAR(170));
INSERT INTO dbo.ChangeCollationChild
([ChangeCollationParentID], [ExtendedASCIIString], [UnicodeString])
VALUES (1, 'testA ' + CHAR(200), N'testA ' + NCHAR(200));
INSERT INTO dbo.ChangeCollationChild
([ChangeCollationParentID], [ExtendedASCIIString], [UnicodeString])
VALUES (1, 'testB ' + CHAR(170), N'testB ' + NCHAR(170));
SELECT * FROM dbo.ChangeCollationParent;
SELECT * FROM dbo.ChangeCollationChild;
Test 1: Change column Collation with no dependencies
ALTER TABLE dbo.ChangeCollationParent
ALTER COLUMN [ExtendedASCIIString] VARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AI NULL;
ALTER TABLE dbo.ChangeCollationParent
ALTER COLUMN [UnicodeString] NVARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AI NULL;
ALTER TABLE dbo.ChangeCollationChild
ALTER COLUMN [ExtendedASCIIString] VARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AI NULL;
ALTER TABLE dbo.ChangeCollationChild
ALTER COLUMN [UnicodeString] NVARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AI NULL;
SELECT * FROM dbo.ChangeCollationParent;
SELECT * FROM dbo.ChangeCollationChild;
The ALTER COLUMN statements above complete successfully.
Test 2: Change column Collation with dependencies
-- First, create an index:
CREATE NONCLUSTERED INDEX [IX_ChangeCollationParent_ExtendedASCIIString]
ON dbo.ChangeCollationParent ([ExtendedASCIIString] ASC);
-- Next, change the Collation back to the original setting:
ALTER TABLE dbo.ChangeCollationParent
ALTER COLUMN [ExtendedASCIIString] VARCHAR(50) COLLATE Latin1_General_CI_AS NULL;
This time, the ALTER COLUMN statement received the following error:
Msg 5074, Level 16, State 1, Line 60
The index 'IX_ChangeCollationParent_ExtendedASCIIString' is dependent on column 'ExtendedASCIIString'.
Msg 4922, Level 16, State 9, Line 60
ALTER TABLE ALTER COLUMN ExtendedASCIIString failed because one or more objects access this column.
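Following the advice above, the failing statement from Test 2 should go through once the index is dropped first and recreated afterwards. A minimal sketch, assuming the test objects created earlier in tempdb still exist:

```sql
USE [tempdb];

-- 1. Drop the dependent index (disabling it is not enough).
DROP INDEX [IX_ChangeCollationParent_ExtendedASCIIString]
    ON dbo.ChangeCollationParent;

-- 2. Change the column collation.
ALTER TABLE dbo.ChangeCollationParent
    ALTER COLUMN [ExtendedASCIIString] VARCHAR(50) COLLATE Latin1_General_CI_AS NULL;

-- 3. Recreate the index with its original definition.
CREATE NONCLUSTERED INDEX [IX_ChangeCollationParent_ExtendedASCIIString]
    ON dbo.ChangeCollationParent ([ExtendedASCIIString] ASC);
```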
ALSO, please be aware that the Collation of some string columns in database-scoped system catalog views (e.g. sys.objects, sys.columns, sys.indexes, etc) will change to the new Collation. If your code has JOINs to any of these string columns (i.e. name), then you might start getting Collation mismatch errors until you change the Collation on the joining columns in your user tables.
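To find the user-table columns that still carry a collation different from the database default, and so are candidates for the per-column ALTER COLUMN or for mismatch errors in such JOINs, a query along these lines may help. It is a sketch and assumes it is run in the database whose collation was changed.

```sql
-- List string columns in user tables whose collation differs from
-- the current database's default collation.
SELECT s.name AS schema_name,
       t.name AS table_name,
       c.name AS column_name,
       c.collation_name
FROM sys.columns AS c
JOIN sys.tables  AS t ON t.object_id = c.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE c.collation_name IS NOT NULL   -- non-string columns have no collation
  AND c.collation_name <> CONVERT(sysname, DATABASEPROPERTYEX(DB_NAME(), 'Collation'))
ORDER BY s.name, t.name, c.column_id;
```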
UPDATE:
If changing the Collation for the entire Instance is the goal, or at least an option, then there is an easier method that bypasses all of these restrictions. It is undocumented and hence unsupported (so if it doesn't work, Microsoft isn't going to help). However, it changes the Collation at all levels: the Instance, all databases, and all string columns in all User Tables. It does this, and avoids all of the typical restrictions, by simply updating the metadata of the tables, etc. to have the new Collation. It then drops and recreates all indexes that have string columns. There are also a few nuances to this method that might have an impact, but they are fixable. This method is the -q command-line switch of sqlservr.exe. I have documented all of the behaviors, including a list of all the areas potentially affected by such a wide-sweeping Collation change, in the following post:
Changing the Collation of the Instance, the Databases, and All Columns in All User Databases: What Could Possibly Go Wrong?
For anyone else stumbling on this problem, the solution is to set the database to single_user mode before changing the collation, and then set it back to multi_user mode afterwards.
Make sure not to close the connection before setting multi_user mode again!
/* block all other users from connecting to the db */
ALTER DATABASE YourDbName SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
/* modify your db collation */
ALTER DATABASE CURRENT COLLATE SQL_Latin1_General_CP1_CI_AI;
/* allow all other users to connect to the db again */
ALTER DATABASE YourDbName SET MULTI_USER;

SQL Server - Convert varchar to another collation (code page) to fix character encoding

I'm querying a SQL Server database that uses the SQL_Latin1_General_CP850_BIN2 collation. One of the table rows has a varchar with a value that includes the +/- character (decimal code 177 in the Windows-1252 codepage).
When I query the table directly in SQL Server Management Studio, I get a gibberish character instead of the +/- character in this row. When I use this table as the source in an SSIS package, the destination table (which uses the typical SQL_Latin1_General_CP1_CI_AS collation), ends up with the correct +/- character.
I now have to build a mechanism that directly queries the source table without SSIS. How do I do this in a way that I get the correct character instead of gibberish? My guess would be that I would need to convert/cast the column to the SQL_Latin1_General_CP1_CI_AS collation but that isn't working as I keep getting a gibberish character.
I've tried the following with no luck:
select
columnName collate SQL_Latin1_General_CP1_CI_AS
from tableName
select
cast (columnName as varchar(100)) collate SQL_Latin1_General_CP1_CI_AS
from tableName
select
convert (varchar, columnName) collate SQL_Latin1_General_CP1_CI_AS
from tableName
What am I doing wrong?
Character set conversion is done implicitly on the database connection level. You can force automatic conversion off in the ODBC or ADODB connection string with the parameter "Auto Translate=False". This is NOT recommended.
See: https://msdn.microsoft.com/en-us/library/ms130822.aspx
There has been a codepage incompatibility in SQL Server 2005 when Database and Client codepage did not match.
https://support.microsoft.com/kb/KbView/904803
SQL-Management Console 2008 and upwards is a UNICODE application. All values entered or requested are interpreted as such on the application level. Conversion to and from the column collation is done implicitly. You can verify this with:
SELECT CAST(N'±' as varbinary(10)) AS Result
This will return 0xB100 which is the Unicode character U+00B1 (as entered in the Management Console window). You cannot turn off "Auto Translate" for Management Studio.
If you specify a different collation in the select, you eventually end up with a double conversion (with possible data loss) as long as "Auto Translate" is still active. The original character is first transformed to the new collation during the select, and the result in turn gets "Auto Translated" to the "proper" application codepage. That's why your various COLLATE tests all show the same result.
You can verify that specifying the collation DOES have an effect in the select, if you cast the result as VARBINARY instead of VARCHAR so the SQL Server transformation is not invalidated by the client before it is presented:
SELECT cast(columnName COLLATE SQL_Latin1_General_CP850_BIN2 as varbinary(10)) from tableName
SELECT cast(columnName COLLATE SQL_Latin1_General_CP1_CI_AS as varbinary(10)) from tableName
This will get you 0xF1 or 0xB1 respectively if columnName contains just the character '±'
You still might get the correct result and yet a wrong character, if the font you are using does not provide the proper glyph.
Please double check the actual internal representation of your character by casting the query to VARBINARY on a proper sample and verify whether this code indeed corresponds to the defined database collation SQL_Latin1_General_CP850_BIN2
SELECT CAST(columnName as varbinary(10)) from tableName
Differences in application collation and database collation might go unnoticed as long as the conversion is always done the same way in and out. Troubles emerge as soon as you add a client with a different collation. Then you might find that the internal conversion is unable to match the characters correctly.
All that said, you should keep in mind that Management Studio usually is not the final reference when interpreting result sets. Even if the output looks like gibberish in Management Studio, it still might be correct. The question is whether the records show up correctly in your applications.
You must use CONVERT, not CAST:
SELECT
CONVERT(varchar(50), N'æøåáäĺćçčéđńőöřůýţžš')
COLLATE Cyrillic_General_CI_AI
(http://blog.sqlpositive.com/2010/03/using-convert-with-collate-to-strip-accents-from-unicode-strings/)
We may need more information. Here is what I did to reproduce on SQL Server 2008:
CREATE DATABASE [Test] ON PRIMARY
(
NAME = N'Test'
, FILENAME = N'...Test.mdf'
, SIZE = 3072KB
, FILEGROWTH = 1024KB
)
LOG ON
(
NAME = N'Test_log'
, FILENAME = N'...Test_log.ldf'
, SIZE = 1024KB
, FILEGROWTH = 10%
)
COLLATE SQL_Latin1_General_CP850_BIN2
GO
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[MyTable]
(
[SomeCol] [varchar](50) NULL
) ON [PRIMARY]
GO
Insert MyTable( SomeCol )
Select '±' Collate SQL_Latin1_General_CP1_CI_AS
GO
Select SomeCol, SomeCol Collate SQL_Latin1_General_CP1_CI_AS
From MyTable
Results show the original character. Declaring the collation in the query should return the proper character from SQL Server's perspective; however, the presentation layer may then be converting it to something different again, such as UTF-8.
Try:
SELECT CAST( CAST([field] AS VARBINARY) AS varchar)
Or, with explicit lengths (CAST defaults to a length of 30 when none is given), I think this fits your update:
SELECT CAST( CAST([field] AS VARBINARY(120)) AS varchar(120))
