Multi-language support - sql-server

We have developed a site that needs to display text in English, Polish, Slovak and Czech. However, when the text is entered into the database, any accented letters are changed to plain English letters.
After searching around on forums, I have found that it is possible to put an 'N' in front of a string which contains accented characters. For example:
INSERT INTO Table_Name (Col1, Col2) VALUES (N'Value1', N'Value2')
However, the site has already been fully developed so at this stage, going through all of the INSERT and UPDATE queries in the site would be a very long and tedious process.
I was wondering if there is any other, much quicker, way of doing what I am trying to do?
The database is MSSQL and the columns being inserted into are already nvarchar(n).

There isn't any quick solution.
The updates and inserts are wrong and need to be fixed.
If they were parameterized queries, you could have simply made sure they were using the NVarChar database type and you would not have a problem.
Since they are dynamic strings, you will need to ensure that you add the unicode specifier (N) in front of each text field you are inserting/updating.

Topic-starter wrote:
"text in English, Polish, Slovak and Czech. However, when the text is entered into the database, any accented letters are changed to english letters" After searching around on forums, I have found that it is possible to put an 'N' in front of a string which contains accented characters. For example:
INSERT INTO Table_Name (Col1, Col2) VALUES (N'Value1', N'Value2')
"The collation for the database as a whole is Latin1_General_CI_AS"
I do not see how SQL Server could cause this, since Latin1_General_CI_AS handles European non-English letters:
--on database with collation Latin1_General_CI_AS
declare @test_multilanguage_eu table
(
c1 char(12),
c2 nchar(12)
)
INSERT INTO @test_multilanguage_eu VALUES ('éÉâÂàÀëËçæà', 'éÉâÂàÀëËçæà')
SELECT c1, cast(c1 as binary(4)) as c1bin, c2, cast(c2 as binary(4)) as c2bin
FROM @test_multilanguage_eu
outputs:
c1 c1bin c2 c2bin
------------ ---------- ------------ ----------
éÉâÂàÀëËçæà 0xE9C9E2C2 éÉâÂàÀëËçæà 0xE900C900
(1 row(s) affected)
I believe you simply have to tick the checkboxes under Control Panel --> Regional and Language Options --> tab Advanced --> Code page conversion tables, and check that you render in the same code page as the one you store the data in.
Converting to Unicode from the encodings used by clients would, it seems to me, lead to problems when rendering back to web clients.
I believe that most Western European collation designators use code page 1252 [1], [2]. Note that the Polish, Czech and Slovak collations use code page 1250 instead.
Update:
SELECT
COLLATIONPROPERTY('Latin1_General_CI_AS' , 'CodePage')
outputs 1252
[1]
http://msdn.microsoft.com/en-us/library/ms174596.aspx
[2]
Windows 1252
http://msdn.microsoft.com/en-us/goglobal/cc305145.aspx
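A quick way to see which letters survive code page 1252 is to check them outside SQL Server. The sketch below uses Python's built-in cp1252 and cp1250 codecs as a stand-in for the server's conversion. The helper name fits and the sample strings are illustrative only. Unlike Python, SQL Server does not raise an error here; it silently substitutes a similar letter or a question mark.

```python
# Sample letters: the French-style test string from the answer above,
# plus a few Polish and Czech letters (illustrative choices).
samples = {
    "French": "éÉâÂàÀëËçæ",
    "Polish": "śćźłę",
    "Czech": "čřž",
}


def fits(text, codepage):
    """Return True if every character of text exists in the given code page."""
    try:
        text.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False


for language, text in samples.items():
    print(language, "cp1252:", fits(text, "cp1252"), "cp1250:", fits(text, "cp1250"))
```

The French sample fits code page 1252, while the Polish and Czech samples only fit code page 1250, which is why a test with é and â alone does not reproduce the problem from the question.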

Related

characters appearing incorrectly even with Unicode source and destination (SSIS)

I am having a codepage unicode/non unicode problem and need expertise to understand it.
In SSIS I am reading data in from a UTF8 encoded text file. The datatypes are all DT_WSTR (unicode string). The destination is NVARCHAR which is also unicode.
Non-standard characters such as Ú are not being encoded correctly (appearing as a black box question mark).
If the character appears correctly in the input file, the source is set to DT_WSTR and the destination is nvarchar, why is the character not rendering correctly?
I have tried setting the codepage of the source column to 65001, but in SSIS it's only possible to change the codepage on a STR (non-Unicode) type.
I'd appreciate any help in understanding why all-Unicode fields still can't store a Unicode value correctly.
Update from the OP comments
It seems my output is OK if I use Unicode types end to end (the input is DT_WSTR, the destination column is nvarchar and, when extracting again to text, the output column is DT_WSTR). The only issue is SQL Server Management Studio, which does not seem to be able to render Unicode characters correctly in the results of a query, whether the output is set to grid or text. This is a red herring, and the process overall works without issue if this is ignored.
Trying to figure out the issue
There is no problem importing Unicode characters from flat files into a SQL Server destination. The only things you have to do are set the flat file encoding to Unicode and make sure the result columns are NVARCHAR. Based on your question, it looks like you have met these requirements, so I can say that:
The Unicode characters are imported successfully into SQL Server, but for some reason SQL Server Management Studio cannot show them in the grid results. To check that the data is imported correctly, change the result view to Results To Text.
GoTo Tools >> Options >> Query Results >> Results To Text
In the second reference link I provided, they mention that:
If you use SSMS for your queries, change to output type from "Grid" to "Text", because depending on the font the grid can't show unicode.
Or you can try changing the grid results font (on my machine, I use the Tahoma font and it shows Unicode characters normally).
Experiments
You can perform the following test (taken from the links below)
SET NOCOUNT ON;
CREATE TABLE #test
( id int IDENTITY(1, 2) NOT NULL Primary KEY
,Uni nvarchar(20) NULL);
INSERT INTO #test (Uni) VALUES (N'DE: äöüßÖÜÄ');
INSERT INTO #test (Uni) VALUES (N'PL: śćźłę');
INSERT INTO #test (Uni) VALUES (N'JAP: 言も言わずに');
INSERT INTO #test (Uni) VALUES (N'CHN: 玉王瓜瓦甘生用田由疋');
SELECT * FROM #test;
GO
DROP TABLE #test;
Try the query above using both the Results to Grid and Results to Text options.
References
SQL Server 2012 not showing unicode character in results
sql server 2008 not showing and inserting unicode characters!
Import UTF-8 Unicode Special Characters with SQL Server Integration Services
Microsoft SQL Server Management Studio - query result as text

Central european characters in SQL

I have an issue. I have data stored on SQL Server with Central European characters like "č", "ř", "ž" etc. The database has the "Czech_CI_AS" collation, which should accept these characters. But when I try to select, for example, the name of a street containing these characters like this:
SELECT *
FROM Street where Name = 'Čáslavská'
It returns nothing.
When I remove the "č" it returns me what I need.
SELECT *
FROM Street where Name like '%áslavská'
The column is of nvarchar type. But I cannot use the N prefix before my string because external applications read from this table and their selects are generated automatically.
Is there any solution? Or have I got something wrong?
Thanks for any help
@YuriyTsarkov really deserves the credit here. To elaborate on his answer:
From MSDN:
Prefix Unicode character string constants with the letter N. Without the N prefix, the string is converted to the default code page of the database. This default code page may not recognize certain characters.
Example
-- Storing Čáslavská in two vars, with and without N prefix.
DECLARE @Test_001 NVARCHAR(255) = 'Čáslavská' COLLATE Czech_CI_AS;
DECLARE @Test_002 NVARCHAR(255) = N'Čáslavská' COLLATE Czech_CI_AS;
-- Test output.
SELECT
@Test_001 AS T1,
@Test_002 AS T2
;
Returns
T1 T2
Cáslavská Čáslavská
You need to update the code of all your external applications to use selects with the N prefix, or you need to change the collation of your column to the one used by the external applications. The latter may cause some data loss.
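The "Cáslavská" result above can be imitated outside SQL Server. The sketch below is only an approximation of the server's best-fit conversion, written with Python's standard library: the function name approximate_best_fit and the choice of code page 1252 as the default are assumptions for illustration, not the server's actual algorithm.

```python
import unicodedata


def approximate_best_fit(text, codepage="cp1252"):
    """Roughly imitate a lossy conversion of text to a single-byte code page."""
    result = []
    for ch in text:
        try:
            ch.encode(codepage)      # the character exists in the code page
            result.append(ch)
            continue
        except UnicodeEncodeError:
            pass
        # Fall back to the base letter without its accent, if that one fits.
        base = unicodedata.normalize("NFKD", ch)[0]
        try:
            base.encode(codepage)
            result.append(base)
        except UnicodeEncodeError:
            result.append("?")       # nothing similar is available
    return "".join(result)


print(approximate_best_fit("Čáslavská"))            # lossy: code page 1252 has no Č
print(approximate_best_fit("Čáslavská", "cp1250"))  # intact: code page 1250 has Č
```

With code page 1252 the leading Č loses its caron while á survives, which matches the T1 column in the example; with code page 1250 the string is unchanged.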

Why can I store a Ukrainian string in a varchar column?

I was a little surprised that I was able to store a Ukrainian string in a varchar column.
My table is:
create table delete_collation
(
text1 varchar(100) collate SQL_Ukrainian_CP1251_CI_AS
)
and using this query I am able to insert:
insert into delete_collation
values(N'використовується для вирішення квитки')
but when I remove the 'N', the select statement shows ??????.
Is it okay or am I missing something in understanding unicode and non-unicode with collate?
From MSDN:
Prefix Unicode character string constants with the letter N. Without
the N prefix, the string is converted to the default code page of the
database. This default code page may not recognize certain characters.
UPDATE:
Please see these similar questions:
What is the meaning of the prefix N in T-SQL statements?
Cyrillic symbols in SQL code are not correctly after insert
sql server 2012 express do not understand Russian letters
To expand on MegaTron's answer:
Using collate SQL_Ukrainian_CP1251_CI_AS, SQL Server is able to store Ukrainian characters in a varchar column by using code page 1251.
However, when you specify a string without the N prefix, that string is first converted to the default code page of the database, which here is not a Ukrainian one, and that is why you see ??????.
So it is completely fine to use varchar and collate as you do, but you must always include the N prefix when sending strings to the database, to avoid the intermediate conversion to the default (non-Ukrainian) code page.
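The two paths can be sketched outside SQL Server with Python's built-in codecs. This assumes the database default is a Western code page such as 1252, which the question does not state. The errors="replace" option stands in for the server's lossy conversion.

```python
text = "використовується для вирішення квитки"

# Path 1: the string reaches the column intact and is stored in code page 1251.
stored = text.encode("cp1251")
print(stored.decode("cp1251") == text)  # the round trip is lossless

# Path 2: the string first passes through a code page without Cyrillic letters
# (1252 is an assumption here), so every Cyrillic letter is replaced.
lossy = text.encode("cp1252", errors="replace").decode("cp1252")
print(lossy)
```

Code page 1251 holds each letter in one byte, so the varchar column itself is not the problem. The damage happens in the intermediate conversion.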

How to Show Eastern Letter(Chinese Character) on SQL Server/SQL Reporting Services?

I need to insert Chinese characters into my database, but they always show up as ????.
Example:
Insert this record.
微波室外单元-Apple
Then it became ???
Result:
??????-Apple
I really need help. Thanks in advance.
I am using MSSQL Server 2008
Make sure you specify a unicode string with a capital N when you insert like:
INSERT INTO Table1 (Col1) SELECT N'微波室外单元-Apple' AS [Col1]
and that Table1 (Col1) is an NVARCHAR data type.
Make sure the column you're inserting to is nchar, nvarchar, or ntext. If you insert a Unicode string into an ANSI column, you really will get question marks in the data.
Also, be careful to check that when you pull the data back out you're not just seeing a client display problem but are actually getting the question marks back:
SELECT Unicode(YourColumn), YourColumn FROM YourTable
Note that the Unicode function returns the code of only the first character in the string.
Once you've determined whether the column is really storing the data correctly, post back and we'll help you more.
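The idea behind that check can be illustrated in Python, where ord plays the role of the Unicode function on the first character. The lossy conversion through code page 1252 is an assumption used to simulate an insert without the N prefix.

```python
original = "微波室外单元-Apple"

# Simulate the damage: push the text through a code page without Chinese characters.
damaged = original.encode("cp1252", errors="replace").decode("cp1252")
print(damaged)

# Code of the first character, as the Unicode function would report it.
print(ord(original[0]))  # a large code point: the data is really Chinese
print(ord(damaged[0]))   # 63 is the code of '?': the data itself was lost
```

If the query returns 63 for the first character, the question marks are really stored in the column. Any other value points to a display problem on the client.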
Try adding the appropriate languages to your Windows locale settings. You'll have to make sure your development machine is set to display non-Unicode characters in the appropriate language.
And of course you need to use NVarchar for foreign-language fields.
Make sure that you have set an encoding for the database to one that supports these characters. UTF-8 is the de facto encoding as it's ASCII compatible but supports all 1,114,112 Unicode code points.
SELECT 'UPDATE table SET msg=UNISTR('''||ASCIISTR(msg)||''') WHERE id='''||id||'''' FROM table WHERE id= '123344556' ;

How to use ORDER BY, LOWER in SQL SERVER 2008 with non-unicode data

The question is about Armenian. I'm using sql server 2005, collation
SQL_Latin1_General_CP1_CI_AS, data mostly is in Armenian and we can't use unicode.
I tested on MS SQL 2008 with a Windows collation for the Armenian language ( Cyrillic_General_100_ ) that I found here ( http://msdn.microsoft.com/en-us/library/ms188046.aspx ), but it didn't help.
I have a function that orders hex values and a lower function that takes each char in each string and converts it to its lower form, but this is not an acceptable solution: it is really slow when those functions are called on every column of a huge table.
Is there any solution for this issue not using unicode and not working with hex values manually?
UPDATE:
On the left side are mixed case words, sorted in the right order and with lower case representations on the right side. Hope this will help. Thank You.
Words are written in unicode.
ԱբԳդԵզ -> աբգդեզ
ԱգԳսԴԼ -> ագգսդլ
ԲաԴֆդԴ -> բադֆդդ
ԳԳԼասա -> գգլասա
ԴմմլօՏ -> դմմլօտ
ԵլԲնՆն -> ելբննն
ԶՎլուտ -> զվլուտ
էԹփձջՐ -> էթփձջր
ԸխԾդսՂ -> ըխծդսղ
ԹԶէըԿր -> թզէըկր
One solution would be to create a computed column for each text column which converts the value into Armenian collation and sets it to lower case like so:
Alter Table TableName
Add TextValueArmenian As ( LOWER(TextColumn COLLATE Latin1_General_CI_AS) ) PERSISTED
Once you do this, you can put indexes on these columns and query for them.
If that isn't your flavor of tea, then another solution would be an indexed view where you create a view with SCHEMABINDING that casts each of the various columns to lower case and to the right collation and then put indexes on that view.
EDIT I notice in your examples that you are using a case-insensitive, accent-sensitive collation. Perhaps the simple solution to your ordering issues would be to use Latin1_General_CS_AS or Cyrillic_General_100_CS_AS if available.
EDIT
Whew. After quite a bit of research, I think I have an answer, which unfortunately may not be the one you want. First, yes, I can copy the text you provided into code or something like Notepad++ because StackOverflow is encoded using UTF-8 and Armenian fits into UTF-8. Second, this hints at what you are trying to achieve: storing UTF-8 in SQL Server. Unfortunately, SQL Server 2008 (or any prior version) does not natively support UTF-8. In order to store data in UTF-8, you have a handful of choices:
Store it in binary and convert it to UTF-8 on the client (which pretty much eliminates any sorting done on the server)
Store it in Unicode and convert it to UTF-8 on the client. It should be noted that the SQL Server driver will already convert most strings to Unicode and your example does work fine in Unicode. The obvious downside is that it eats up twice the space.
Create a CLR user defined type in SQL Server to store UTF-8 values. Microsoft provides a sample that comes with SQL Server to do just this. You can download the samples from CodePlex from here. You can also find more information on the sample in this article in the Books Online. The downside is that you have to have the CLR enabled in SQL Server and I'm not sure how well it will perform.
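To put the "twice the space" remark in numbers, the sketch below compares encoded sizes in Python. It assumes nvarchar data is held as two-byte UTF-16 code units (little-endian), and the helper name storage_bytes is illustrative.

```python
def storage_bytes(text):
    """Return the encoded size of text in UTF-16 (as in nvarchar) and in UTF-8."""
    return {
        "utf-16-le": len(text.encode("utf-16-le")),
        "utf-8": len(text.encode("utf-8")),
    }


print("Apple ", storage_bytes("Apple"))
print("ԱբԳդԵզ", storage_bytes("ԱբԳդԵզ"))
```

For plain ASCII text, UTF-16 needs twice the bytes of UTF-8. For Armenian letters both encodings use two bytes per letter, so the doubling only applies relative to a single-byte code page.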
Now, that said, I was able to get your sample working with no problem using Unicode in SQL Server.
If object_id('tempdb..#Test') Is Not Null
Drop Table #Test
GO
Create Table #Test
(
EntrySort int identity(1,1) not null
, ProperSort int
, MixedCase nvarchar(50)
, Lowercase nvarchar(50)
)
GO
Insert #Test(ProperSort, MixedCase, Lowercase)
Select 1, N'ԱբԳդԵզ',N'աբգդեզ'
Union All Select 6, N'ԵլԲնՆն',N'ելբննն'
Union All Select 2, N'ԱգԳսԴԼ',N'ագգսդլ'
Union All Select 3, N'ԲաԴֆդԴ',N'բադֆդդ'
Union All Select 4, N'ԳԳԼասա',N'գգլասա'
Union All Select 5, N'ԴմմլօՏ',N'դմմլօտ'
Union All Select 9, N'ԸխԾդսՂ',N'ըխծդսղ'
Union All Select 7, N'ԶՎլուտ',N'զվլուտ'
Union All Select 10, N'ԹԶէըԿր',N'թզէըկր'
Union All Select 8,N'էԹփձջՐ',N'էթփձջր'
Select * From #Test Order by ProperSort
Select * From #Test Order by Lowercase
Select * From #Test Order by Lower(MixedCase)
All three of these queries return the same result.
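As a cross-check outside the database, the same lowercase mapping and ordering can be reproduced on Unicode strings in Python. This is only a client-side sanity check on four of the sample pairs, not a replacement for the collation-based sort, and it relies on the Armenian letters being in alphabetical order by code point.

```python
# (mixed case, expected lowercase) pairs taken from the question, in proper order.
pairs = [
    ("ԱբԳդԵզ", "աբգդեզ"),
    ("ԱգԳսԴԼ", "ագգսդլ"),
    ("ԲաԴֆդԴ", "բադֆդդ"),
    ("էԹփձջՐ", "էթփձջր"),
]

mixed = [m for m, _ in pairs]
lowered = [m.lower() for m in mixed]
print(lowered == [low for _, low in pairs])  # Unicode lowercasing matches the question

# Sort a shuffled copy case-insensitively by comparing lowercase forms.
shuffled = [mixed[3], mixed[1], mixed[0], mixed[2]]
print(sorted(shuffled, key=str.lower) == mixed)
```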
Did you get an error like this?
Msg 448, Level 16, State 1, Line 1
Invalid collation 'Cyrillic_General_100_'.
Try:
ORDER BY Name COLLATE Cyrillic_General_100_CI_AS
Or pick one you prefer from this list:
select * from fn_helpcollations()
where name like 'Cyrillic_General_100_%'

Resources