SQL Server character set and N prefix - sql-server

[THIS IS NOT A QUESTION ABOUT NVARCHAR OR HOW TO STORE CHINESE CHARACTER]
SQL Server 2008 Express
Database collation is SQL_Latin1_General_CP1_CI_AS
create table sample1(val varchar(2))
insert into sample1 values(N'中文')
I know these Chinese characters will become junk characters.
I know I can use nvarchar to overcome the problem.
What I don't know is: why is there no "string too long" error when I run the insert statement?
The N prefix means that the client will encode the string as Unicode.
2 Chinese characters will become 4 bytes.
varchar(2) can only contain 2 bytes.

An implicit cast takes place. This would work if "val" was created as nvarchar(2).

More explanation of @marc_s's answer.
The string N'中文' will be converted to varchar using the code page of the collation SQL_Latin1_General_CP1_CI_AS. Since neither character exists in that code page, each one is replaced by the "not defined" fallback character, so the result is 0x3f3f. 0x3f is the question mark, so two question marks are stored in this case, and they do not exceed the column length.
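The conversion described above can be approximated outside SQL Server. This is only a sketch: it assumes Python's cp1252 codec is a reasonable stand-in for the code page behind SQL_Latin1_General_CP1_CI_AS, and the variable names are made up for illustration.

```python
# Rough model of the implicit nvarchar -> varchar conversion.
# Assumption: cp1252 approximates the code page of SQL_Latin1_General_CP1_CI_AS.
text = "中文"

utf16_bytes = text.encode("utf-16-le")                   # size of the N'...' literal
cp1252_bytes = text.encode("cp1252", errors="replace")   # after conversion to the code page

print(len(utf16_bytes))     # 4
print(cp1252_bytes.hex())   # 3f3f
print(len(cp1252_bytes))    # 2
```

The length check applies to the converted value, which is 2 bytes here, so it fits in varchar(2).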

Try using the NVARCHAR(...) or NCHAR(...) data types:
CREATE TABLE dbo.sample1
(
val NVARCHAR(4)
)
INSERT INTO dbo.sample1
SELECT N'中文'

Related

Why can I store a Ukrainian string in a varchar column?

I was a little surprised that I was able to store a Ukrainian string in a varchar column.
My table is:
create table delete_collation
(
text1 varchar(100) collate SQL_Ukrainian_CP1251_CI_AS
)
and using this query I am able to insert:
insert into delete_collation
values(N'використовується для вирішення квитки')
but when I remove the 'N' prefix, the select statement shows ??????.
Is this okay, or am I missing something in my understanding of Unicode and non-Unicode data with collate?
From MSDN:
Prefix Unicode character string constants with the letter N. Without
the N prefix, the string is converted to the default code page of the
database. This default code page may not recognize certain characters.
UPDATE:
Please see these similar questions:
What is the meaning of the prefix N in T-SQL statements?
Cyrillic symbols in SQL code are not correctly after insert
sql server 2012 express do not understand Russian letters
To expand on MegaTron's answer:
With collate SQL_Ukrainian_CP1251_CI_AS, SQL Server is able to store Ukrainian characters in a varchar column by using code page 1251.
However, when you specify a string literal without the N prefix, it is first converted to the default (non-Unicode) code page of the database, which cannot represent these characters, and that is why you see ??????.
So it is completely fine to use varchar and collate as you do, but you must always include the N prefix on string literals to avoid the intermediate conversion through the default (non-Ukrainian) code page.
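A small sketch of the difference between the two paths, using Python codecs as a stand-in. The assumptions here are that cp1251 models the SQL_Ukrainian_CP1251_CI_AS column and that cp1252 models a Latin default database code page; the real default depends on your database collation.

```python
text = "використовується для вирішення квитки"

# With the N prefix: Unicode goes straight to the column's code page (cp1251 assumed).
stored = text.encode("cp1251")
round_trip = stored.decode("cp1251")

# Without the N prefix: the literal first passes through the default code page
# (cp1252 assumed), where Cyrillic letters have no representation.
lossy = text.encode("cp1252", errors="replace").decode("cp1252")

print(round_trip == text)   # True
print(lossy)                # only question marks and spaces remain
```

One byte per character is enough in cp1251, which is why varchar(100) holds the text without loss.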

LEN and DATALENGTH of VARCHAR and NVARCHAR

After reading "What is the difference between char, nchar, varchar, and nvarchar in SQL Server?" I have a question.
I'm using MS SQL Server 2008 R2
DECLARE @T TABLE
(
C1 VARCHAR(20) COLLATE Chinese_Traditional_Stroke_Order_100_CS_AS_KS_WS,
C2 NVARCHAR(20) COLLATE Chinese_Traditional_Stroke_Order_100_CS_AS_KS_WS
)
INSERT INTO @T VALUES (N'中华人民共和国',N'中华人民共和国')
SELECT LEN(C1) AS [LEN(C1)],
DATALENGTH(C1) AS [DATALENGTH(C1)],
LEN(C2) AS [LEN(C2)],
DATALENGTH(C2) AS [DATALENGTH(C2)]
FROM @T
Returns
LEN(C1)     DATALENGTH(C1) LEN(C2)     DATALENGTH(C2)
----------- -------------- ----------- --------------
7           12             7           14
Why is DATALENGTH(C1) 12?
In your INSERT you are converting text from Unicode to the Chinese code page for C1. Most likely this process alters the text and something may be lost.
Here is SQL Fiddle.
You can see that the second character 华 is stored as 3F in varchar. You can also see that the last character 国 is stored as 3F in varchar. 3F is the code for ?. When Windows converts text from Unicode to a code page and a character can't be represented in that code page, the conversion function (most likely WideCharToMultiByte) substitutes ? for that character.
One more example. The last-but-one character 和 is encoded as A94D in varchar and 8C54 in nvarchar. If you look it up in Character Map, it will show these codes (Unicode and code page).
See also:
What does it mean when my text is displayed as Question Marks?
https://www.microsoft.com/middleeast/msdn/Questionmark.aspx
Any time Unicode data must be displayed, they may be internally
converted from Unicode using the WideCharToMultiByte API. Any time a
character cannot be represented on the current code page, it will be
replaced by a question mark (?).
This is exactly what happens when you store the Unicode literal N'中华人民共和国' in a varchar column. The Unicode text is converted to multi-byte, and the characters that can't be represented in that code page are replaced by question marks.
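The byte counts can be reproduced approximately with Python codecs. This sketch assumes cp950 (Big5) is the code page behind the Chinese_Traditional collation and that Python's cp950 table matches what the server uses; treat it as an illustration, not as the server's actual conversion routine.

```python
text = "中华人民共和国"

# Assumption: cp950 (Big5) models the varchar column's code page.
varchar_bytes = text.encode("cp950", errors="replace")
nvarchar_bytes = text.encode("utf-16-le")

# Per-character view: code page bytes next to UTF-16 little-endian bytes.
for ch in text:
    print(ch, ch.encode("cp950", errors="replace").hex(), ch.encode("utf-16-le").hex())

print(len(text), len(varchar_bytes), len(nvarchar_bytes))
```

Five characters take two bytes each in the code page and the two unmappable ones collapse to a single ? byte, which gives 5 * 2 + 2 = 12, while UTF-16 uses 7 * 2 = 14.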

Why can I insert non-ASCII characters into a VARCHAR column and correctly get them back?

Below is my code sample.
DECLARE @a TABLE (a VARCHAR(20));
INSERT @a
(a)
VALUES ('中');
SELECT *
FROM @a;
I'm using SQL Server Management Studio to run it. My question is: why can I insert non-ASCII characters into a VARCHAR column and get them back correctly? As I understand it, the VARCHAR type is only for ASCII characters and NVARCHAR is for Unicode characters. Can anyone explain this? I'm on Windows 7 with SQL Server 2014 Developer Edition.
The codepage used to store the varchar data varies by DB collation.
https://msdn.microsoft.com/en-us/library/ms189617.aspx
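Varchar data is stored in 8-bit units according to a code page, so what it can hold depends on the collation: your database may use a collation whose code page includes that character, or you may simply have gotten lucky with where your character falls in the code set.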
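To make the dependence on the code page concrete, here is a hedged sketch that checks whether '中' survives a round trip through a few Windows code pages. The helper name survives is hypothetical, and the Python codecs only approximate what SQL Server does for a given collation.

```python
def survives(ch, code_page):
    """Return True if ch can be encoded in code_page and decoded back unchanged."""
    try:
        return ch.encode(code_page).decode(code_page) == ch
    except UnicodeEncodeError:
        return False

# cp1252: Western European, cp936: Simplified Chinese,
# cp950: Traditional Chinese, cp932: Japanese.
for cp in ("cp1252", "cp936", "cp950", "cp932"):
    print(cp, survives("中", cp))
```

If the database collation maps to one of the East Asian code pages, a plain '中' literal fits in varchar; with a Latin code page it does not.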
You can find the ASCII and Extended ASCII characters at www.asciitable.com.
I don't believe '中' is an ASCII character.

SQL Server 2005: converting varchar to nvarchar issue

I have table with two varchar columns first_name and last_name.
Now I need to convert these columns to nvarchar in order to support UTF-8.
I chose the nvarchar datatype in SSMS for these columns, but when I try to enter some UTF-8 data, my symbols are converted to question marks. For example, if I input йцукен (Ukrainian), it is converted to ??????.
What is the problem and how to fix it?
Thanks.
When you want to insert nvarchar literals into the database table, you must use the N'..' prefix.
So use
INSERT INTO dbo.YourTable(First_Name)
VALUES(N'йцукен')
so that this string will be treated as a Unicode string.
If you're not using the N'..' notation, you're really inserting a non-Unicode string literal, and this causes the conversion to ?

How to Show Eastern Letter(Chinese Character) on SQL Server/SQL Reporting Services?

I need to insert Chinese characters into my database, but it always shows ????.
Example:
Insert this record.
微波室外单元-Apple
Then it became ???
Result:
??????-Apple
I really need help. Thanks in advance.
I am using MSSQL Server 2008
Make sure you specify a unicode string with a capital N when you insert like:
INSERT INTO Table1 (Col1) SELECT N'微波室外单元-Apple' AS [Col1]
and that Table1 (Col1) is an NVARCHAR data type.
Make sure the column you're inserting to is nchar, nvarchar, or ntext. If you insert a Unicode string into an ANSI column, you really will get question marks in the data.
Also, be careful to check that when you pull the data back out you're not just seeing a client display problem but are actually getting the question marks back:
SELECT Unicode(YourColumn), YourColumn FROM YourTable
Note that the Unicode function returns the code of only the first character in the string.
Once you've determined whether the column is really storing the data correctly, post back and we'll help you more.
Try adding the appropriate languages to your Windows locale settings. You'll have to make sure your development machine is set to display non-Unicode characters in the appropriate language.
And of course you need to use NVARCHAR for foreign-language fields.
Make sure that you have set the encoding for the database to one that supports these characters. UTF-8 is the de facto encoding, as it's ASCII compatible but supports all 1,114,112 Unicode code points.
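-- Oracle syntax (UNISTR/ASCIISTR), not T-SQL: generates an UPDATE statement with the text escaped.
SELECT 'UPDATE table SET msg=UNISTR('''||ASCIISTR(msg)||''') WHERE id='''||id||'''' FROM table WHERE id = '123344556';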
