Saving Greek characters in varchar column using SQL Server 2019 _UTF8 collations - sql-server

Background: I'm doing some proofing of SQL Server 2019's _UTF8 collations (ref. https://learn.microsoft.com/en-us/sql/relational-databases/collations/collation-and-unicode-support?view=sql-server-ver15).
Use case: We sometimes need to store scientific names featuring Greek letters, such as DL-α-Tocopherol Acetate or Δ8-THC-Naphthoylester. Although this isn't common (most of our customers don't need this feature), when the feature is needed it's almost non-negotiable. Since we normally don't need it, we usually default the columns to varchar. I don't really want to get into a debate on nvarchar vs. varchar here; suffice it to say we only use nvarchar when needed, so when saving characters like α and Δ we'd change the data type and move on.
But the _UTF8 collations appear to allow these values to be saved in a varchar column without needing to change the data type to nvarchar:
UTF-8 support SQL Server 2019 (15.x) introduces full support for the
widely used UTF-8 character encoding as an import or export encoding,
and as database-level or column-level collation for string data. UTF-8
is allowed in the char and varchar data types, and it's enabled when
you create or change an object's collation to a collation that has a
UTF8 suffix. One example is changing LATIN1_GENERAL_100_CI_AS_SC to
LATIN1_GENERAL_100_CI_AS_SC_UTF8.
UTF-8 is available only to Windows collations that support
supplementary characters, as introduced in SQL Server 2012 (11.x). The
nchar and nvarchar data types allow UCS-2 or UTF-16 encoding only, and
they remain unchanged.
Azure SQL Database and Azure SQL Managed Instance also support UTF-8
on database and column level, while Managed Instance supports this on
a server level as well.
My problem:
I tried to save α and Δ with a simple example using SQL Server 2019 as follows:
CREATE TABLE dbo.SCI_DATA
(
id int NOT NULL,
name varchar(75) COLLATE Latin1_General_100_CI_AS_SC_UTF8
)
INSERT INTO dbo.SCI_DATA (id, name)
VALUES (1, 'DL-α-Tocopherol Acetate');
INSERT INTO dbo.SCI_DATA (id, name)
VALUES (2, 'Δ8-THC-Naphthoylester');
INSERT INTO dbo.SCI_DATA (id, name)
VALUES (3, 'Slainté');
SELECT *
FROM dbo.SCI_DATA;
DROP TABLE IF EXISTS dbo.SCI_DATA;
This returns:
id name
---------------------------
1 DL-a-Tocopherol Acetate
2 ?8-THC-Naphthoylester
3 Slainté
The accented é is fine, but alpha (α) and delta (Δ) are returned as ?. I'm wondering what I'm missing.

Even though the column is varchar with a _UTF8 collation, you still need to mark the literal as Unicode with the N prefix. Without it, the literal is interpreted as varchar in the database's default code page before the value ever reaches the column, and any character outside that code page is replaced with ?.
This modified example works:
CREATE TABLE dbo.SCI_DATA (
id int NOT NULL
, name varchar(75) COLLATE Latin1_General_100_CI_AS_SC_UTF8
)
INSERT INTO dbo.SCI_DATA (id, name) VALUES (1, N'DL-α-Tocopherol Acetate');
INSERT INTO dbo.SCI_DATA (id, name) VALUES (2, N'Δ8-THC-Naphthoylester');
INSERT INTO dbo.SCI_DATA (id, name) VALUES (3, 'Slainté');
SELECT * FROM dbo.SCI_DATA;
DROP TABLE IF EXISTS dbo.SCI_DATA;
Returns:
id name
1 DL-α-Tocopherol Acetate
2 Δ8-THC-Naphthoylester
3 Slainté
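The lossy round trip that mangles α and Δ can be sketched outside SQL Server: without the N prefix, the literal is squeezed through the database's default code page (assumed here to be Windows-1252, typical for Latin1_General collations) before the UTF-8 column ever sees it. A minimal Python illustration of that conversion:

```python
# Without N'...', a literal is first interpreted in the database's default
# code page (assumed Windows-1252 here); characters missing from that code
# page are replaced with '?'.
def through_code_page(s: str, codepage: str = "cp1252") -> str:
    return s.encode(codepage, errors="replace").decode(codepage)

print(through_code_page("DL-α-Tocopherol Acetate"))  # DL-?-Tocopherol Acetate
print(through_code_page("Δ8-THC-Naphthoylester"))    # ?8-THC-Naphthoylester
print(through_code_page("Slainté"))                  # Slainté (é exists in cp1252)
```

This matches the observed output: é survives because it exists in code page 1252, while α and Δ do not.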

Related

Mssql encoding problem with special characters

We are using the node-mssql package to insert into and read out of our azure mssql database.
The database is very simple, because it is just used as a key/value cache, where the value can be quite long and also contain special characters.
DB Schema:
create table cache
(
id int identity
constraint cache_pk
primary key nonclustered,
cacheKey varchar(500),
ttl bigint,
value varchar(max)
)
go
create index cache_cacheKey_index
on cache (cacheKey desc)
go
create index cache_ttl_index
on cache (ttl)
go
For some reason, when I insert values into "value", some special characters are not treated well.
Dash – example
turns into:
Dash  example
I have seen the same thing happening with the french apostrophe.
I also tried to change the collation already, but that did not help.
Also tried it by using nvarchar(max) as column type.
This is the insert code (including the sql):
const result = await conn.request()
.input('key', mssql.VarChar, key)
.input('value', mssql.Text, value)
.input('ttl', mssql.BigInt, cacheEndTime)
.query`INSERT INTO cache (cacheKey, value, ttl) VALUES (@key, @value, @ttl)`;
Can you please help with a correct table structure or sql statement to make this work?
I'm not sure if this helps, but have you checked the collation of the table, the database, and the server? Collation can be set at several levels.
The answer to your question is likely one of these items:
Server collation
Table collation
Field collation
Cast the insert text
For example, if you create an nvarchar field (which I recommend for international scenarios), mark the inserted text as Unicode, like N'text to insert'.
It will work ;)
I have found the answer.
Like @RealSeik and @Larnu already stated, it was probably not a problem with the database or the queries themselves, but rather an input problem.
I realized that node-mssql has a type for Unicode text that takes care of the conversion correctly.
So instead of mssql.Text I changed it to mssql.NText.
So now the insert command looks as follows:
const result = await conn.request()
.input('key', mssql.VarChar, key)
.input('value', mssql.NText, value)
.input('ttl', mssql.BigInt, cacheEndTime)
.query`INSERT INTO cache (cacheKey, value, ttl) VALUES (@key, @value, @ttl)`;
I have also added collations to my other scripts for good measure (that alone did not help, but it can't hurt):
ALTER DATABASE MyDbName
COLLATE Latin1_General_100_CI_AI_SC_UTF8 ;
create table cache
(
id int identity
constraint cache_pk
primary key nonclustered,
cacheKey varchar(500) COLLATE Latin1_General_100_CI_AI_SC_UTF8,
ttl bigint,
value varchar(max) COLLATE Latin1_General_100_CI_AI_SC_UTF8
)
go
create index cache_cacheKey_index
on cache (cacheKey desc)
go
create index cache_ttl_index
on cache (ttl)
go
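One way to see why the en dash needed the Unicode-aware mssql.NText path: the character sits outside 7-bit ASCII, so any non-Unicode leg of the trip can drop or mangle it. A small Python sketch of the encodings involved (not node-mssql itself, just the byte-level behaviour):

```python
s = "Dash – example"  # contains U+2013 EN DASH

# In UTF-8 the en dash is a three-byte sequence:
print("–".encode("utf-8"))  # b'\xe2\x80\x93'

# Any layer that treats the value as plain ASCII loses the character:
lossy = s.encode("ascii", errors="replace").decode("ascii")
print(lossy)  # Dash ? example
```

Sending the parameter as a Unicode type keeps the full code point intact end to end instead of forcing it through a narrower encoding.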

How to insert Gujarati in SQL Server 2012 Management Studio?

How can I insert Gujarati text in Microsoft SQL Server Management Studio?
I have tried to insert Gujarati text into an nvarchar column, but it's not working.
The most common problem I see when people aren't getting the outcome they want from inserting into an nvarchar column is that they aren't putting N in front of their string literal values (N stands for National Character Set).
If you have a table like this:
CREATE TABLE dbo.Test
(
TestID int IDENTITY(1,1) PRIMARY KEY,
TestValue nvarchar(100)
);
Here's a Chinese example that won't work (because of multi-byte string values):
INSERT dbo.Test (TestValue) VALUES ('Hello 你好');
With just single quotes, the literal is treated as a plain single-byte code-page string. What would work is:
INSERT dbo.Test (TestValue) VALUES (N'Hello 你好');
I'm guessing your issue might be similar for Gujarati.
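The N prefix matters because nchar/nvarchar store UTF-16, and Gujarati (like Chinese) has no place in a single-byte code page. A rough Python illustration of the byte widths involved:

```python
# nvarchar stores UTF-16; BMP characters such as Gujarati and most Chinese
# characters take two bytes each.
ga = "ગ"  # U+0A97 GUJARATI LETTER GA
print(len(ga.encode("utf-16-le")))      # 2
print(len("你好".encode("utf-16-le")))  # 4

# A single-byte code page simply has no slot for these characters:
print(ga.encode("cp1252", errors="replace").decode("cp1252"))  # ?
```

Without the N prefix the literal never carries those extra bytes, so the characters are already lost before they reach the nvarchar column.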
You can use the nvarchar data type for the column and then insert Gujarati text.

SQL Syntax: SQL Server vs. Teradata

I've been querying against Teradata servers with SQL Assistant for years, but now have to work with a SQL Server. I've been stumbling over my code for hours, having a hard time figuring out which pieces of syntax need to be updated.
Does anyone know of a good resource for converting logic?
Here's an example -- I was loading .txt data into a temp table:
In Teradata, the following works:
CREATE MULTISET TABLE USER_WORK.TABLE1 (
VAR1 CHAR(3)
,VAR2 CHAR(5)
,VAR3 DECIMAL(12,2) )
PRIMARY INDEX (VAR1, VAR2);
In SQL Server, I was able to get the following to work:
CREATE TABLE #TABLE1 (
VAR1 VARCHAR(20)
,VAR2 VARCHAR(20)
VAR3 VARCHAR(20) );
(Main differences: no "MULTISET"; all variables read in as VARCHAR, and I couldn't get any length shorter than 20 to work; I couldn't figure out how to define a functional index.)
Mostly wondering if there is some sort of pattern behind migrating the logic - it's painful to have to look up every single piece of failed code and sort out whether it will actually run on SQL Server.
A few points...
The # prefix in your SQL Server attempt defines a local temporary table. It's visible to your session only, and it will go away when the session ends. I think it's similar to a VOLATILE table in Teradata. Is that what you wanted?
SQL Server tables are MULTISET by default, so T-SQL has no equivalent keyword.
If you were having trouble with CHAR column sizes it was most likely a syntax error elsewhere. CHAR columns can be from 1 to 8,000 characters long, using a single-byte character set.
SQL Server doesn't have a PRIMARY INDEX. As I understand it, the equivalent in SQL Server is a CLUSTERED index.
So your exact table structure in SQL Server would be like this:
CREATE TABLE USER_WORK.TABLE1 (
VAR1 CHAR(3)
,VAR2 CHAR(5)
,VAR3 DECIMAL(12,2));
And for the index (the name can be whatever you want):
CREATE CLUSTERED INDEX TABLE1_FOO ON USER_WORK.TABLE1(VAR1, VAR2);
You can create the exact same schema in sql server as well but the syntax will be a bit different.
I would translate your teradata table as below:
CREATE TABLE TABLE1
( VAR1 CHAR(3) NOT NULL
,VAR2 CHAR(5) NOT NULL
,VAR3 DECIMAL(12,2)
,PRIMARY KEY (VAR1, VAR2)
);
GO
You can still have CHAR(3) and CHAR(5) data types for VAR1 and VAR2 columns, but you have to make them non-nullable column since they are going to be Primary key columns ( requirement in sql server).
SQL Server also has the decimal(12,2) data type, which you can use for your VAR3 column. Finally, the composite primary key can be part of the table definition, as shown above.
Some Teradata - SQL Server Management Studio (SSMS) differences:
• EXTRACT(MONTH FROM Column) = DATEPART(MONTH, Column)
• To comment a block of code, highlight it, then press Ctrl+/ (Ctrl+Alt+/ to remove)
• DATE (and TIME) = GETDATE()
• Data type BYTEINT = TINYINT
• Data type LONG VARCHAR = VARCHAR(MAX) (64,000 characters)
• ADD_MONTHS(Column, 2) = DATEADD(MONTH, 2, Column)
• String1 || ' ' || String2 = String1 + ' ' + String2
• SELECT * FROM Table SAMPLE 50 = SELECT TOP 50 * FROM Table
• GROUP BY ROLLUP(1) = GROUP BY Column WITH ROLLUP
• SUBSTRING('SQL Tutorial' FROM 4 FOR 2) = SUBSTRING('SQL Tutorial', 4, 2)
• TRIM(TRAILING FROM <expression>) = RTRIM(<expression>)
• TRIM(LEADING FROM <expression>) = LTRIM(<expression>)
• NVL(Column, '') = ISNULL(Column, '')
I'm still trying to figure out the differences in writing update statements!

SQL Server key violation due to case insensitivity

I have a table with a two-column primary key (CODE nvarchar, VALUE nvarchar). The table already contains the key value (X8900, A), but when I try to insert a new row with (X8900, a), I get a "primary key violation" error.
Why does it give this error when the case of the VALUE column differs, and is there any solution to avoid it?
You can specify whether SQL Server should be case sensitive using collation. In this instance, the column must have a case-sensitive collation in order for a unique constraint to treat the two values as distinct. For example, the second insert into test1 below will fail, whereas both inserts into test2 will work; notice the CI and CS (case insensitive and case sensitive) in the collation names.
CREATE TABLE test1 (
col1 varchar(20) COLLATE Latin1_General_CI_AS PRIMARY KEY
)
INSERT INTO test1 VALUES ('ASD')
INSERT INTO test1 VALUES ('asd')
CREATE TABLE test2 (
col1 varchar(20) COLLATE Latin1_General_CS_AS PRIMARY KEY
)
INSERT INTO test2 VALUES ('ASD')
INSERT INTO test2 VALUES ('asd')
Collation can be set at the column or database level. If set at database level then all character columns without a collation specified adopt the database collation.
You have to check the collation of your database. If you have a case insensitive collation, 'A' == 'a'. If you need to maintain difference between cases, you can either change the collation to a case sensitive collation, or you could cast the strings to varbinary. A binary representation differentiates between cases.
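The CI/CS behaviour can be mimicked outside SQL Server: a case-insensitive collation effectively folds case before comparing keys, which is why (X8900, A) and (X8900, a) collide. A rough Python analogy (not SQL Server's actual comparison rules, just the idea):

```python
# Case-insensitive comparison: fold case before comparing keys,
# the way a CI collation effectively does.
existing = {("X8900", "A".casefold())}
duplicate = ("X8900", "a".casefold()) in existing
print(duplicate)  # True -> primary key violation under a CI collation

# Case-sensitive (or binary) comparison keeps them distinct,
# like a CS collation or a varbinary cast:
print(b"A" == b"a")  # False
```

This is also why the varbinary cast mentioned above works: comparing raw bytes never folds case.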
Collations can be set at the server level (i.e what databases default to) and at the database level (overriding the server collation). At an even more granular level, you can set collation on individual columns if you want/need. Here are a few articles to look at:
https://msdn.microsoft.com/en-us/library/hh230914.aspx#TsqlProcedure
https://msdn.microsoft.com/en-us/library/ms144250%28v=sql.105%29.aspx
Here are a few SQL snippets you can run to view your current server collation, as well as the default collations on each database
SELECT CONVERT (varchar, SERVERPROPERTY('collation'));
SELECT name, collation_name FROM sys.databases;

Correct SQL to convert mySQL tables to SQL Server tables

I have a number of tables I need to convert from mySQL to SQL Server.
An Example of a mySQL Table is
CREATE TABLE `required_items` (
`id` INT( 11 ) NOT NULL AUTO_INCREMENT PRIMARY KEY COMMENT 'Unique Barcode ID',
`fk_load_id` INT( 11 ) NOT NULL COMMENT 'Load ID',
`barcode` VARCHAR( 255 ) NOT NULL COMMENT 'Barcode Value',
`description` VARCHAR( 255 ) NULL DEFAULT NULL COMMENT 'Barcode Description',
`created` TIMESTAMP NULL DEFAULT NULL COMMENT 'Creation Timestamp',
`modified` TIMESTAMP ON UPDATE CURRENT_TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Modified Timestamp',
FOREIGN KEY (`fk_load_id`) REFERENCES `loads`(`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE = InnoDB CHARACTER SET ascii COLLATE ascii_general_ci COMMENT = 'Contains Required Items for the Load';
And a trigger to update the created date
CREATE TRIGGER required_items_before_insert_created_date BEFORE INSERT ON `required_items`
FOR EACH ROW
BEGIN
SET NEW.created = CURRENT_TIMESTAMP;
END
Now I need to create tables similar to this in SQL Server. There seems to be a lot of different data types available so I am unsure which to use.
What data type should I use to the primary key column
(uniqueidentifier, bigint, int)?
What should I use for the timestamps
(timestamp, datetime, datetime2(7))?
How should I enforce the created
and modified timestamps (currently I am using triggers)?
How can I enforce foreign key constraints.
Should I be using Varchar(255) in SQL Server? (Maybe Text, Varchar(MAX) is better)
I am using Visual Studio 2010 to create the tables.
First of all, you can probably use PHPMyAdmin (or something similar) to script out the table creation process to SQL Server. You can take a look at what is automatically created for you to get an idea of what you should be using. After that, you should take a look at SSMS (SQL Server Management Studio) over Visual Studio 2010. Tweaking the tables that your script will create will be easier in SSMS - in fact, most database development tasks will be easier in SSMS.
What data type should I use to the primary key column (uniqueidentifier, bigint, int)?
Depending on how many records you plan to have in your table, use int, or bigint. There are problems with uniqueidentfiers that you will probably want to avoid. INT vs Unique-Identifier for ID field in database
What should I use for the timestamps (timestamp, datatime, datetime2(7))?
Timestamps are different in SQL Server than in MySQL. Despite the name, a SQL Server timestamp is an incrementing number used as a mechanism to version rows: http://msdn.microsoft.com/en-us/library/ms182776%28v=sql.90%29.aspx. In short, though, datetime is probably your best bet for compatibility purposes.
How should I enforce the created and modified timestamps (currently I am using triggers)?
See above. Also, the SQL Server version of a "Timestamp" is automatically updated by the DBMS. If you need a timestamp similar to your MySQL version, you can use a trigger to do that (but that is generally frowned upon...kind of dogmatic really).
How can I enforce foreign key constraints.
You should treat them as you would using innoDB. See this article for examples of creating foreign key constraints http://blog.sqlauthority.com/2008/09/08/sql-server-%E2%80%93-2008-creating-primary-key-foreign-key-and-default-constraint/
Should I be using Varchar(255) in SQL Server? (Maybe Text, Varchar(MAX) is better)
That depends on the data you plan to store in the field. varchar(max) can hold up to 2 GB, whereas varchar(n) is limited to 8,000 bytes; and if you don't need varchar(255), you can always set a lower value like varchar(50). Using a field size larger than needed has performance implications. One thing to note: if you plan to support Unicode (multilingual) data in your field, use nvarchar or nchar.
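The nvarchar recommendation carries a storage cost worth keeping in mind: it stores UTF-16, so every character takes at least two bytes. A quick Python check of the byte counts (illustrative strings only):

```python
# nvarchar stores UTF-16: 2 bytes per BMP character, 4 for supplementary ones.
print(len("barcode".encode("utf-16-le")))  # 14 bytes for 7 characters
print(len("naïve".encode("utf-16-le")))    # 10 bytes for 5 characters
print(len("😀".encode("utf-16-le")))       # 4 bytes: a surrogate pair
```

So an nvarchar(255) column can need up to twice the storage of a varchar(255) column holding the same ASCII-range text, which is the trade-off for safe multilingual support.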