SQL Server 2019 UTF-8 collation & varchar(n) - sql-server

I use the varchar(n) data type in many tables in my database.
Is there a way to specify the length as a character count (n characters) instead of a byte count?
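For context, here is a minimal sketch (the table name and collation are just examples) of the behaviour the question describes: even with a SQL Server 2019 UTF-8 collation, the n in varchar(n) is still a byte count, so multi-byte characters consume more than one unit of it.

CREATE TABLE dbo.Utf8Demo (val varchar(10) COLLATE Latin1_General_100_CI_AS_SC_UTF8);
INSERT INTO dbo.Utf8Demo (val) VALUES (N'あいう'); -- 3 characters, 9 bytes in UTF-8: fits
SELECT val, LEN(val) AS char_count, DATALENGTH(val) AS byte_count FROM dbo.Utf8Demo; -- returns 3 and 9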

Related

SQL Server TO Oracle table creation

We are using MS-SQL and Oracle as our databases.
We have used Hibernate annotations to create tables; in the annotation class file we have declared the column definition as
@Column(name="UCAALSNO", nullable=false, columnDefinition="nvarchar(20)")
and this works fine for MS-SQL.
But when it comes to Oracle, nvarchar throws an exception because Oracle only supports NVARCHAR2.
How can the annotation be written so that the nvarchar column definition works for both databases?
You could use NCHAR:
In MSSQL:
nchar [ ( n ) ]
Fixed-length Unicode string data. n defines the string length and must
be a value from 1 through 4,000. The storage size is two times n
bytes. When the collation code page uses double-byte characters, the
storage size is still n bytes. Depending on the string, the storage
size of n bytes can be less than the value specified for n. The ISO
synonyms for nchar are national char and national character.
while in Oracle:
NCHAR
The maximum length of an NCHAR column is 2000 bytes. It can hold up to
2000 characters. The actual data is subject to the maximum byte limit
of 2000. The two size constraints must be satisfied simultaneously at
run time.
Nchar occupies a fixed amount of space, so for a very large table there will be a considerable space difference between an nchar and an nvarchar column; you should take this into consideration.
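To make the fixed-length point concrete, a quick sketch (the values are just examples):

SELECT DATALENGTH(CAST(N'abc' AS nchar(20))) AS nchar_bytes, -- 40: always 2 * 20
DATALENGTH(CAST(N'abc' AS nvarchar(20))) AS nvarchar_bytes; -- 6: 2 * the actual length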
I usually have incremental DB schema migration scripts for my production DBs and I only rely on Hibernate DDL generation for my integration testing in-memory databases (e.g. HSQLDB or H2). This way I choose the production schema types first and the "columnDefinition" only applies to the testing schema, so there is no conflict.
You might want to read this too, which sets aside the additional N(VAR)CHAR2 complexity, so you might consider settling on a default character encoding instead:
Given that, I'd much rather go with the approach that maximizes
flexibility going forward, and that's converting the entire database
to Unicode (AL32UTF8 presumably) and just using that.
Although you might be advised to use VARCHAR2, VARCHAR has been a synonym for VARCHAR2 for a long time now.
So quoting a DBA opinion:
The Oracle 9.2 and 8.1.7 documentation say essentially the same thing,
so even though Oracle continually discourages the use of VARCHAR, so
far they haven't done anything to change its parity with VARCHAR2.
I'd say give it a try for VARCHAR too, as it's supported on most DBs.
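For reference, a sketch of the suggested NCHAR column in each dialect (the table name is hypothetical; only the UCAALSNO column comes from the question):

-- SQL Server: fixed-length Unicode, stored as 2 * 20 bytes
CREATE TABLE ucaals_demo (UCAALSNO nchar(20) NOT NULL);
-- Oracle: also NCHAR(20); NVARCHAR2(20) or VARCHAR2(20 CHAR) are the variable-length alternatives
CREATE TABLE ucaals_demo (UCAALSNO NCHAR(20) NOT NULL);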

Correct mapping of big Visual FoxPro CHAR column to the appropriate SQL Server data type

I am migrating a Visual FoxPro database to SQL Server. One of the tables has a Char column with length 2147483647. I am wondering if the correct data type to use in SQL Server 2008 R2 is Char(2147483647). Is there an alternative type I can use in SQL Server which will not result in any loss of information?
The following image gives a description of the column as shown within Visual Studio 2008.
Visual FoxPro's native CHAR type only allows up to about 255 characters. What you're seeing is a FoxPro MEMO field, translated to a generic OLE equivalent.
A SQL Server VARCHAR(MAX) is the usual proper equivalent, assuming the MEMO is simply user-entered text in a western dialect and not a multi-lingual or data-blob variation.
Be aware that FoxPro does NOT speak UTF natively, so you may have code-page translation issues.
Hope this helps someone else.
MSDN varchar description states:
Use varchar(max) when the sizes of the column data entries vary considerably, and the size might exceed 8,000 bytes.
The maximum storage for Char is 8000 bytes/characters. varchar(max) on the other hand will use a storage size equal to the actual length of the characters entered plus 2 bytes.
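A sketch of the target column under that reasoning (the table and column names are made up), assuming the MEMO really is plain western-codepage text:

-- varchar(max) grows with the actual data instead of reserving a fixed width;
-- switch to nvarchar(max) if the MEMO might contain non-western text
CREATE TABLE dbo.ImportedMemo (memo_text varchar(max) NULL);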

Difference between NVARCHAR in Oracle and SQL Server?

We are migrating some data from SQL Server to Oracle. For columns defined as NVARCHAR in SQL Server we started creating NVARCHAR columns in Oracle, thinking them to be similar. But it looks like they are not.
I have read a couple of posts on Stack Overflow and want to confirm my findings.
Oracle VARCHAR2 already supports Unicode if the database character set is, say, AL32UTF8 (which is true in our case).
SQL Server VARCHAR does not support Unicode. SQL Server explicitly requires columns to be of NCHAR/NVARCHAR type to store data as Unicode (specifically in the 2-byte UCS-2 format).
Hence, would it be correct to say that SQL Server NVARCHAR columns can/should be migrated as Oracle VARCHAR2 columns?
Yes, if your Oracle database is created using a Unicode character set, an NVARCHAR in SQL Server should be migrated to a VARCHAR2 in Oracle. In Oracle, the NVARCHAR data type exists to allow applications to store data using a Unicode character set when the database character set does not support Unicode.
One thing to be aware of in migrating, however, is character length semantics. In SQL Server, an NVARCHAR(20) allocates space for 20 characters, which requires up to 40 bytes in UCS-2. In Oracle, by default, a VARCHAR2(20) allocates 20 bytes of storage. In the AL32UTF8 character set, that is potentially only enough space for 6 characters, though most likely it will handle much more (a single character in AL32UTF8 requires between 1 and 3 bytes). You probably want to declare your Oracle types as VARCHAR2(20 CHAR), which indicates that you want to allocate space for 20 characters regardless of how many bytes that requires. That tends to be much easier to communicate than trying to explain why some 20-character strings are allowed while other 10-character strings are rejected.
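To make that concrete, a sketch of the two declarations side by side (the table and column names are illustrative):

-- SQL Server source: room for 20 characters, up to 40 bytes in UCS-2
CREATE TABLE app_user (user_name NVARCHAR(20));
-- Oracle target with character semantics: room for 20 characters regardless of byte length
CREATE TABLE app_user (user_name VARCHAR2(20 CHAR));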
You can change the default length semantics at the session level so that any tables you create without specifying any length semantics will use character rather than byte semantics:
ALTER SESSION SET nls_length_semantics=CHAR;
That lets you avoid typing CHAR every time you define a new column. It is also possible to set that at a system level, but doing so is discouraged by the NLS team: apparently, not all the scripts Oracle provides have been thoroughly tested against databases where NLS_LENGTH_SEMANTICS has been changed. And probably very few third-party scripts have been.

Picking the right SQL Server collation for storage

How does the collation impact SQL Server in terms of storage and how does this affect the Unicode and non-unicode data types?
Does the collation impact Unicode storage? or just govern sort rules within the database?
When I use the non-Unicode data types, what restrictions are tied to the collation?
If restrictions apply, what happens when I try to store a character that is not in the database collation into a non-Unicode data type?
My understanding is that the Unicode data types can always store the full set of Unicode data, while the non-Unicode data types' storage capabilities depend on the code page (which is defined by the collation) and can only represent the limited set of characters in that code page.
Obviously each character in a Unicode data type occupies at least 2 bytes, while the non-Unicode data types occupy 1 byte per character (or does this vary with collation as well?).
Set me straight here, how does this work exactly?
SQL Server stores Unicode data (NTEXT, NVARCHAR) in UCS-2, always resulting in 2 bytes per character.
A collation only affects sorting (and casing).
In non-Unicode data types (TEXT, VARCHAR), only a single byte is used per character, and only characters of the collation's code page can be stored (just as you stated). See this MSDN article on collations
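A quick way to see both effects (a sketch, assuming a Latin1 code-page default collation): DATALENGTH shows the 1-byte versus 2-byte storage, and a character outside the code page silently turns into '?' when forced into a non-Unicode type.

SELECT DATALENGTH(CAST('abc' AS varchar(10))) AS varchar_bytes, -- 3: one byte per character
DATALENGTH(CAST(N'abc' AS nvarchar(10))) AS nvarchar_bytes, -- 6: two bytes per character
CAST(N'あ' AS varchar(10)) AS outside_codepage; -- '?' under a Latin1 collation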

Why is sql server storing question mark characters instead of Japanese characters in NVarchar fields?

I'm trying to store Japanese characters in nvarchar fields in my SQL Server 2000 database.
When I run an update statement like:
update blah
set address = N'スタンダードチャ'
where key_ID = 1
from SQL Server Management Studio and then run a select statement, I see only question marks returned in the results window. I'm seeing the same question marks in the web page that reads from the database.
It seems this is an issue with storing the proper data right? Can anyone tell me what I need to do differently?
This may not be the correct answer given your example, but the most common reason I've seen for this is string literals missing the Unicode N prefix.
So, instead of
set address = N'スタンダードチャ'
one tries to write to an nvarchar field without the Unicode prefix:
set address = 'スタンダードチャ'
See also:
N prefix before string in Transact-SQL query
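To see the difference directly (a sketch, assuming the database default collation uses a Latin code page rather than a Japanese one), compare the two literals side by side:

SELECT 'スタンダードチャ' AS without_prefix, -- varchar literal: characters collapse to '????????'
N'スタンダードチャ' AS with_prefix; -- nvarchar literal: round-trips correctly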
I was facing this same issue when storing Indian-language characters in nvarchar fields in the DB. Then I went through this Microsoft article:
http://support.microsoft.com/kb/239530
I followed it and my Unicode issue was resolved. In this article they say: you must precede all Unicode strings with the prefix N when you deal with Unicode string constants in SQL Server.
SQL Server Unicode Support
SQL Server Unicode data types support UCS-2 encoding. Unicode data types store character data using two bytes for each character rather than one byte. There are 65,536 different bit patterns in two bytes, so Unicode can use one standard set of bit patterns to encode each character in all languages, including languages such as Chinese that have large numbers of characters.
In SQL Server, data types that support Unicode data are:
nchar
nvarchar
nvarchar(max) – new in SQL Server 2005
ntext
Use of nchar, nvarchar, nvarchar(max), and ntext is the same as char, varchar, varchar(max), and text, respectively, except:
- Unicode supports a wider range of characters.
- More space is needed to store Unicode characters.
- The maximum size of nchar and nvarchar columns is 4,000 characters, not 8,000 characters like char and varchar.
- Unicode constants are specified with a leading N, for example, N'A Unicode string'
APPLIES TO
Microsoft SQL Server 7.0 Standard Edition
Microsoft SQL Server 2000 Standard Edition
Microsoft SQL Server 2005 Standard Edition
Microsoft SQL Server 2005 Express Edition
Microsoft SQL Server 2005 Developer Edition
Microsoft SQL Server 2005 Enterprise Edition
Microsoft SQL Server 2005 Workgroup Edition
The code is absolutely fine. You can insert a Unicode string with the prefix N into a field declared as NVARCHAR, so can you check whether Address is an NVARCHAR column? I tested the code below in SQL Server 2008 R2 and it worked.
update blah
set address = N'スタンダードチャ'
where key_ID = 1
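If you are not sure what the column's type actually is, a quick way to check (the table and column names are taken from the question; the query itself is just a sketch):

SELECT DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'blah' AND COLUMN_NAME = 'address';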
You need to write N before the string value, e.g.:
INSERT INTO LabelManagement (KeyValue) VALUES (N'変更命令');
Here I am storing a value in Japanese and I have added N before the string literal.
I am using SQL Server 2014.
Hope you find the solution.
Enjoy.
You need to check the globalisation settings of all the code that deals with this data, from your database, data access and presentation layers. This includes SSMS.
(You also need to work out which version you are using, 2003 doesn't exist...)
SSMS will not display that correctly; you might see question marks or boxes.
Paste the results into Word and they should appear in Japanese.
In the web page you need to set the Content-Type; the code below will display Chinese Big5:
<META HTTP-EQUIV="content-type" CONTENT="text/html; charset=big5">
To verify the data you can't use ASCII(), since ASCII() only covers the ASCII character set.
Run this:
select unicode(address),ascii(address)
from blah where key_ID = 1
Output should be the following (it only looks at the first character)
12473 63
I can almost guarantee that the data type is not Unicode. If you want to learn more, you can check Wikipedia for information on Unicode, ASCII, and ANSI. Unicode can store more unique characters, but takes more space to store, transfer, and process, and some programs and other tools don't support it. The Unicode data types in MS SQL are "nchar", "nvarchar", and "ntext".
We are using Microsoft SQL Server 2008 R2 (SP3). Our table collation is specified as SQL_Latin1_General_CP1_CI_AS. I have my types specified as the N variety:
nvarchar(MAX)
nchar(2)
etc.
To insert Japanese characters I prefix the string with a capital N:
N'素晴らしい一日を'
Works like a charm.
