SQL Server 2016 UTF-8 support of nvarchar columns? - sql-server

I'm trying to figure out if my SQL database setup is ready to store any language in the world (including Japanese). Having read a lot of documentation (Microsoft's own), I'm still unsure whether further specification of collation is needed.
Database: SQL Server Standard 2016
Collation: SQL_Latin1_General_CP1_CI_AS
Question: I have a table with a MovieTitle column, defined as nvarchar(2048). Will this be able to store movie titles from any language in the world? From the documentation it seems that any nvarchar column can store Unicode.
I'm asking because recently I searched for
WHERE MovieTitle = ''
and it returned several results with different Arabic titles.
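Whether the column can hold any language comes down to nvarchar storing UTF-16 code units (not UTF-8), with the length limit counted in code units rather than characters. A minimal Python sketch of that accounting, using made-up sample titles:

```python
# nvarchar stores UTF-16 code units, and nvarchar(2048) holds up to 2048
# of them (BMP characters take one unit, supplementary characters two).
def utf16_code_units(s: str) -> int:
    # SQL Server stores nvarchar as UTF-16-LE: 2 bytes per code unit.
    return len(s.encode("utf-16-le")) // 2

titles = {
    "Japanese": "七人の侍",   # hypothetical sample title
    "Arabic":   "المومياء",   # hypothetical sample title
    "Emoji":    "🎬",         # supplementary character: 2 code units
}

for lang, title in titles.items():
    print(lang, utf16_code_units(title))
```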

Related

Storing Emojis in SQL Tables

I am working with a SQL Server 2008 database on a Windows 2008 Server. Anytime I try to store an emoji in my table it converts it to a weird looking box. When I try to store the same emoji in SQL Server 2012 it stores the emoji fine. Is it not possible to store emojis correctly in SQL Server 2008? I really cannot update at this point so that would not be an option.
What we know based on details from the question and comments on the question:
Column is NVARCHAR
Value is inserted from VB.NET app via stored procedure
App hitting SQL Server 2008 (running on Windows 2008 Server) stores emoji character but "converts it to a weird looking box"
Same app code hitting SQL Server 2012 stores the same emoji character just fine
What we do not know:
How is the character being retrieved in order to determine whether or not it was stored correctly?
Are you viewing it in the app or in SSMS?
If in SSMS, are you connecting to SQL Server 2008 and 2012 using the same SSMS running on the same machine? Or are you using the version of SSMS that came with each version of SQL Server (hence they are not the same program, even if on the same machine)?
Based on the above:
Most likely this is a font issue. I say this due to:
If it were an issue of not supporting Unicode, then you would be seeing two question marks ?? (one for each surrogate character) instead of a single square box.
Emojis are nothing special. They are merely supplementary characters. And there are currently (as of Unicode v 12.0) 72,457 supplementary characters defined (and slots for another 976,119).
Supplementary Characters (emojis or otherwise) can be stored in NCHAR, NVARCHAR, and NTEXT columns without a problem, and without regard to the collation of the column or the current database.
To test this, I executed the following in a database having a default collation of SQL_Latin1_General_CP1_CI_AS, so there is definitely no "supplementary character support" there.
SELECT NCHAR(0xD83D) + NCHAR(0xDE31) AS [ScreamingFace],
NCHAR(0xD83D) + NCHAR(0xDDFA) AS [WorldMap],
NCHAR(0xD83D) + NCHAR(0xDF08) AS [Alchemical Symbol for Aqua Vitae];
It returns:
ScreamingFace WorldMap Alchemical Symbol for Aqua Vitae
😱 🗺 🜈
I see different things in different areas, all due to font differences. The chart below indicates what I am seeing:
LOCATION       FONT          Screaming Face   World Map   Alchemical Symbol for Aqua Vitae
------------   -----------   --------------   ---------   --------------------------------
Text Editor    Consolas      Yes              Yes         Square box w/ question mark
Grid Results   Code2003      Yes              Yes         Yes
Text Results   Courier New   Yes              Yes         Empty square box
Most likely you were using two different versions of SSMS, or at least SSMS on two different computers. In either case, you probably had different fonts mapped to the Grid Results, or were even using Grid Results on one and Text Results on the other.
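The NCHAR() values in the query above are the UTF-16 surrogate pairs for the three supplementary code points. They can be derived, and the character counts quoted earlier sanity-checked, with a short Python sketch:

```python
def surrogate_pair(cp: int) -> tuple[int, int]:
    # Split a supplementary code point (U+10000 and above) into its
    # UTF-16 high/low surrogate pair.
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)

for name, cp in [("Screaming Face", 0x1F631),
                 ("World Map", 0x1F5FA),
                 ("Aqua Vitae", 0x1F708)]:
    hi, lo = surrogate_pair(cp)
    print(f"{name}: NCHAR(0x{hi:04X}) + NCHAR(0x{lo:04X})")

# Sanity check on the counts quoted above: 17 planes minus the BMP leaves
# 1,048,576 supplementary code points; 72,457 defined + 976,119 open slots.
assert 0x110000 - 0x10000 == 72_457 + 976_119
```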
In the end, if you want to know if data was stored correctly, you need to check the bytes that were stored. To do this, simply convert the string column to VARBINARY(MAX):
SELECT CONVERT(VARBINARY(MAX), string_column)
FROM schema.table;
And compare those results between the 2008 and 2012 systems. More than likely they are (or "were" given that this was almost 2.5 years ago) the same.
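Since N'...' literals and nvarchar columns are stored as UTF-16-LE, the byte pattern a correctly stored character should produce in that VARBINARY dump can be computed ahead of time. A Python sketch for the Screaming Face emoji:

```python
# nvarchar data is UTF-16-LE on disk, so the VARBINARY dump of a correctly
# stored U+1F631 is its two surrogates in little-endian byte order:
expected = "😱".encode("utf-16-le")
print(expected.hex().upper())  # 3DD831DE -> shown as 0x3DD831DE in SSMS
```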
For more info on what characters can actually be stored in the various string datatypes in SQL Server (from SQL Server 7.0 through at least SQL Server 2019), please read the following post of mine:
How Many Bytes Per Character in SQL Server: a Completely Complete Guide

Rupee symbol '₹' and Nigeria naira '₦' are not supported by database. It's saving as '¿' in the database Oracle and SQL Server

The rupee symbol '₹' and the Nigerian naira symbol '₦' are not supported by the database; they are being saved as '¿' in both Oracle and SQL Server.
Even after setting NLS_CHARACTERSET=WE8MSWIN1252 in Oracle, it is not working.
Do any other settings have to be changed in the database?
For SQL Server, you must:
define the column that holds this information with the NVARCHAR(n) datatype (not varchar(n)!)
use the N'...' syntax when inserting values from SQL script to ensure Unicode storage
INSERT INTO dbo.YourTable(UnicodeColumn)
VALUES(N'₹'), (N'₦')
use the correct Unicode data type (e.g. for a stored procedure parameter) if you're inserting your values from frontend code (PHP, C#, Java, etc.)
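The '¿' in the question is the tell-tale sign of a lossy code-page conversion: neither symbol exists in Windows-1252, the code page behind varchar columns under a Latin1 collation, so the value is substituted on insert. A quick Python check (illustrative, outside the database) of which symbols survive that code page:

```python
# U+20B9 (₹) and U+20A6 (₦) have no Windows-1252 mapping; U+20AC (€) does,
# which is why some currency symbols round-trip through varchar and others don't.
def fits_cp1252(ch: str) -> bool:
    try:
        ch.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False

for ch in "₹₦€":
    if fits_cp1252(ch):
        print(ch, "fits in cp1252")
    else:
        print(ch, "no cp1252 mapping -> replaced on insert into varchar")
```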

nvarchar & varchar with Oracle & SQLServer

I need to upload some data from an Oracle table to a SQL Server table. The data will be uploaded to SQL Server by a Java process utilising JDBC.
Is there any benefit in creating the SQL server columns using nvarchar instead of varchar?
Google suggests that nvarchar is used when Unicode characters are involved, but I am wondering whether nvarchar provides any benefit in this situation (i.e. when the source data of the SQL Server table comes from an Oracle database running in a Unix environment)?
Thanks in advance
As you have found out, nvarchar stores Unicode characters - the same as nvarchar2 within Oracle. It comes down to whether your source data is Unicode, or whether you anticipate having to store Unicode values in the future (e.g. for internationalized software).
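One way to make that decision concrete is to test whether every source value can be represented in the single-byte code page a varchar column would use. A Python sketch, assuming a cp1252 code page (typical for a Latin1 collation) and made-up sample values:

```python
def needs_nvarchar(values, code_page="cp1252"):
    # True if any value cannot be represented in the single-byte code page
    # a varchar column would use (cp1252 is assumed for a Latin1 collation).
    for v in values:
        try:
            v.encode(code_page)
        except UnicodeEncodeError:
            return True
    return False

print(needs_nvarchar(["café", "naïve"]))   # False: Western-European text fits
print(needs_nvarchar(["café", "東京"]))     # True: Japanese does not
```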

Does Access have any issues with unicode capable data types like nvarchar in SQL Server?

I am using Access 2003 as a front-end UI for a SQL Server 2008 database. Looking at my SQL Server database design, I am wondering whether nvarchar was the right choice over varchar. I chose nvarchar because I thought it would be useful in case any characters represented by Unicode needed to be entered. However, I didn't think about possible issues with Access 2003 using the Unicode datatype. Are there any issues with Access 2003 working with Unicode datatypes within SQL Server (i.e. nvarchar)? Thank you.
You can go ahead and use nvarchar, if that's the correct datatype for the job. Access supports Unicode data, both with its own tables and with external (linked) tables and direct queries.

Character set issues with Oracle Gateways, SQL Server, and Application Express

I am migrating data from Oracle on VMS that accesses data on SQL Server using heterogeneous services (over ODBC) to Oracle on AIX accessing the SQL Server via Oracle Gateways (dg4msql). The Oracle VMS database used the WE8ISO8859P1 character set. The AIX database uses WE8MSWIN1252. The SQL Server database uses "Latin1-General, case-insensitive, accent-sensitive, kanatype-insensitive, width-insensitive for Unicode Data, SQL Server Sort Order 52 on Code Page 1252 for non-Unicode Data" according to sp_helpsort. The SQL Server database uses nchar/nvarchar for all string columns.
In Application Express, extra characters are appearing in some cases, for example 123 shows up as %001%002%003. In sqlplus, things look ok but if I use Oracle functions like initcap, I see what appear as spaces between each letter of a string when I query the sql server database (using a database link). This did not occur under the old configuration.
I'm assuming the issue is that an nchar has extra bytes in it and the character set in Oracle can't convert it. It appears that the ODBC solution didn't support nchars so must have just cast them back to char and they showed up ok. I only need to view the sql server data so I'm open to any solution such as casting, but I haven't found anything that works.
Any ideas on how to deal with this? Should I be using a different character set in Oracle, and if so, does that apply to all schemas? I only care about one of them.
Update: I think I can simplify this question. The SQL Server table uses nchar. SELECT DUMP(column) FROM table returns Typ=1 Len=6: 0,67,0,79,0,88 when the value is 'COX', whether I select over a remote link to SQL Server, cast the literal 'COX' to an nvarchar, or copy into an Oracle table as an nvarchar. But when I select the column itself, it appears with extra spaces only when selecting over the remote SQL Server link. I can't understand why DUMP would return the same thing while selecting the value directly shows different results. Any help is appreciated.
There is an incompatibility between Oracle Gateways and nchar on that particular version of SQL Server. The solution was to create views on the SQL Server side casting the nchars to varchars. Then I could select from the views via gateways and it handled the character sets correctly.
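The DUMP output in the question's update is explainable on its own: Oracle's national character set (AL16UTF16) is big-endian UTF-16, and the "extra spaces" are its zero bytes being misread as characters somewhere along the gateway path. A Python illustration:

```python
# AL16UTF16 is big-endian UTF-16, which matches the DUMP output quoted
# in the question for 'COX':
raw = "COX".encode("utf-16-be")
print(list(raw))  # [0, 67, 0, 79, 0, 88]

# Misread through a single-byte character set, every other byte is a NUL,
# which is what renders as "spaces" between the letters:
print(repr(raw.decode("latin-1")))  # '\x00C\x00O\x00X'
```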
You might be interested in the Oracle NLS Lang FAQ
