SQL server globalization - sql-server

When I store Arabic text in SQL Server, the data is saved as ? (question marks) instead of the actual Arabic characters. Does anybody have a solution?

There are several possibilities here:
Your programming language is not encoding the text properly. Make sure you use Unicode strings.
You are transferring the data incorrectly. Check that your stored procedure parameters use the correct SQL types: NVarChar instead of VarChar, NChar instead of Char.
You are saving the data incorrectly. Again, the columns should be NVarChar instead of VarChar, NChar instead of Char, and so on.
You should be using Unicode character strings everywhere. See the list of SQL Server data types.
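The question marks come from a lossy conversion: somewhere on the way to the column, the text is converted to a single-byte code page that has no Arabic letters. A minimal Python sketch of that mechanism, assuming code page 1252 as a stand-in for the non-Unicode code page (the sample word and the function name are illustrative, not part of the original answer):

```python
def to_code_page(text: str, code_page: str = "cp1252") -> bytes:
    """Convert text to a legacy code page, replacing unmappable characters with '?'."""
    return text.encode(code_page, errors="replace")


arabic = "\u0645\u0631\u062d\u0628\u0627"  # sample Arabic word, 5 letters

lossy = to_code_page(arabic)                # every letter becomes 0x3F ('?')
unicode_bytes = arabic.encode("utf-16-le")  # a Unicode encoding keeps every letter

print(lossy)                                        # b'?????'
print(unicode_bytes.decode("utf-16-le") == arabic)  # True
```

Any single stage that performs the first kind of conversion (application string, parameter type, or column type) is enough to produce the question marks.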

Related

Microsoft SQL Server database character conversion

I imported a database that contains Arabic characters, but they are displayed as question marks. Through some searching I found that the column data types should be nvarchar to support Unicode (they were varchar). I changed one of the columns that contains those characters to nvarchar, but the data is still displayed as "??". How can I change the existing values to become Unicode and display correctly?
You cannot simply change the datatype to nvarchar. That won't bring back the data, since it has already been "destroyed" by having been converted to a non-Unicode format.
You need to use nvarchar, and then you need to insert (or update) the data in a way that doesn't convert it back to ANSI codes.
If you use T-SQL to insert Unicode data, make sure to use the N'...' prefix:
INSERT INTO dbo.YourTable(NvarcharCol)
VALUES (N'nvarchar-value')
From a front-end language like C# or PHP or Ruby, make sure to use Unicode strings - .NET (C# and VB.NET) does that automatically. When using queries with parameters, make sure to specify Unicode string types for those relevant parameters.
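Why changing the column type afterwards does not help can be sketched in Python. This is only an illustration of the data loss, assuming cp1252 as the old column's code page; it does not reproduce what SQL Server does internally:

```python
original = "\u0645\u0631\u062d\u0628\u0627"  # sample Arabic word, 5 letters

# What the non-Unicode column ends up holding after the lossy conversion
# (cp1252 is an assumed stand-in for the column's code page).
stored = original.encode("cp1252", errors="replace")

# Widening the column later only re-encodes the characters that were stored.
widened = stored.decode("cp1252").encode("utf-16-le")
recovered = widened.decode("utf-16-le")

print(set(stored) == {0x3F})  # True: only '?' bytes were ever stored
print(recovered)              # ?????
print(recovered == original)  # False
```

The original letters are not present in the stored bytes in any form, so the only fix is to load the data again from a source that still has it.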
You need a different collation.
Read more here: https://msdn.microsoft.com/en-us/library/ms143508.aspx

Unicode Data Type in SQL

I'm new to Microsoft SQL. I'm planning to store text in Microsoft SQL Server, and there will be special international characters. Is there a "Data Type" specific to Unicode, or am I better off encoding my text with a reference to the Unicode number (i.e. \u0056)?
Use Nvarchar/Nchar (MSDN link). There used to be an Ntext datatype as well, but it's deprecated now in favour of Nvarchar.
These columns take up twice as much space as their non-Unicode counterparts (char and varchar).
Then when "manually" inserting into them, use N to indicate it's unicode text:
INSERT INTO MyTable(SomeNvarcharColumn)
VALUES (N'français')
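The "twice as much space" point can be checked with a small Python sketch. It assumes cp1252 as the single-byte code page and UTF-16 as the two-byte encoding, and it counts only the character bytes, not any per-row or per-column overhead:

```python
text = "fran\u00e7ais"  # "français", the sample value from the INSERT above

single_byte = text.encode("cp1252")  # varchar-style: one byte per character here
two_byte = text.encode("utf-16-le")  # nvarchar-style: two bytes per character here

print(len(text), len(single_byte), len(two_byte))  # 8 16 would be wrong; see below
```

The three printed numbers are the character count, the single-byte size, and the two-byte size: 8, 8, and 16.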
When you say special international characters, what do you mean? If special means they aren't common and just occasional, then the overhead of nvarchar might not make sense in your situation on a table with a very large number of rows or a lot of indexing.
I'm all for using Unicode where appropriate, but understanding when it is appropriate is important.
If you are mixing data with different implied code pages (Japanese and Chinese in same database) or you just want to be forward-looking for internationalization and localization, then you want the column to be Unicode and use nvarchar data type and that's perfectly fine. Unicode is not going to magically solve all sorting problems for you.
If you know that you will always be storing mainly ASCII with some occasional foreign characters, just store your UTF-8 data or HTML-encoded data in varchar. If your data is all in Japanese and code page 932 (or any other single code page), you can still store double-byte characters in varchar, and they still take up two bytes. My point is that when you are already in a DBCS collation, international characters are no longer "special". The cost is not just the data storage, but also any indexes and the working set when dealing with such a column in queries and in other dataflows.
And do not make a blanket rule that all character data should be nvarchar - it's a waste for many columns which are codes or identifiers.
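The storage argument above can be made concrete with byte counts. A rough Python sketch, where the sample strings and the helper name are illustrative assumptions:

```python
def sizes(text: str) -> dict:
    """Byte counts for the same text in three encodings."""
    return {
        "cp932": len(text.encode("cp932")),
        "utf-16-le": len(text.encode("utf-16-le")),
        "utf-8": len(text.encode("utf-8")),
    }


japanese = "\u65e5\u672c\u8a9e"  # "日本語", 3 characters
code = "ABC-123"                 # an ASCII-only identifier, 7 characters

print(sizes(japanese))  # {'cp932': 6, 'utf-16-le': 6, 'utf-8': 9}
print(sizes(code))      # {'cp932': 7, 'utf-16-le': 14, 'utf-8': 7}
```

For the Japanese text the double-byte code page and UTF-16 cost the same, while for the ASCII identifier the two-byte encoding doubles the size, which is the case against a blanket nvarchar rule for codes and identifiers.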
Any time you have a column, go through the same questions:
What is the type of data?
What is the range?
Are NULLs allowed?
What is the limit of the size?
Are there any constraints I should apply now to stop bad data getting in from the beginning?
People have had success using the following statement to force Unicode when inserting data:
INSERT INTO <table> (text) VALUES (N'<text here>')
Collation is specified at the database level and serves as the default for the string columns in its tables. To store Unicode strings, the string columns also have to use the nvarchar or nchar data types. For more information, see "Unicode and SQL Server".

Unicode conversion, database woes (Delphi 2007 to XE2)

Currently, I am in the process of updating all of our Delphi 2007 code base to Delphi XE2. The biggest consideration is the ANSI to Unicode conversion, which we've dealt with by re-defining all base types (char/string) to ANSI types (ansichar/ansistring). This has worked in many of our programs, until I started working with the database.
The problem started when I converted a program that stores information read from a file into an SQL Server 2008 database. Suddenly simple queries that used a string to locate data would fail, such as:
SELECT id FROM table WHERE name = 'something'
The name field is a varchar. I found that I was able to complete the query successfully by prefixing the string name with an N. I was under the impression that varchar could only store ANSI characters, but it appears to be storing Unicode?
Some more information: the name field in Delphi is string[13], but I've tried dropping the [13]. The database collation is SQL_Latin1_General_CP1_CI_AS. We use ADO to interface with the database. The connection information is stored in the ODBC Administrator.
NOTE: I've solved my actual problem thanks to a bit of direction from Panagiotis. The name we read from our map file is an array[1..24] of AnsiChar. This value was being implicitly converted to string[13], which was including null characters. So a name with 5 characters was really being stored as the 5 characters + 8 null characters in the database.
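The padding problem described in the note is easy to reproduce outside Delphi. A small Python sketch, assuming a 13-byte NUL-padded ANSI buffer and a made-up 5-character name:

```python
def from_fixed_buffer(buf: bytes) -> str:
    """Decode a fixed-length, NUL-padded ANSI buffer and drop the padding."""
    return buf.rstrip(b"\x00").decode("cp1252")


# A 5-character name padded with NULs to 13 bytes, as in the string[13] case.
padded = b"ALPHA".ljust(13, b"\x00")

naive = padded.decode("cp1252")    # keeps the 8 trailing NUL characters
clean = from_fixed_buffer(padded)  # padding removed before use

print(len(naive), naive == "ALPHA")  # 13 False
print(len(clean), clean == "ALPHA")  # 5 True
```

A value stored with the trailing NULs will not compare equal to the plain 5-character literal in a WHERE clause, which matches the failing lookup in the question.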
varchar fields do NOT store Unicode characters. They store character values in the code page specified by the field's collation. SQL Server will try to convert characters to the correct code page when you try to store Unicode or data from a different code page. You can disable this feature but the best option is to avoid the whole mess by using nvarchar fields and UnicodeString in your application.
You mention that you changed all character types in your application to ANSI types, not Unicode types. If you want to use Unicode you should be using a Unicode type like UnicodeString. Otherwise your values will be converted to ANSI when they are sent to your server. This conversion is done by your code when you create the AnsiString that is sent to the server.
BTW, your select statement passes the value as a non-Unicode literal. You have to prefix the literal with N if you want it to be treated as a Unicode value, e.g.:
SELECT id FROM table WHERE name = N'something'
Even this will not guarantee that your data will reach the server in a Unicode form. If you store the statement in an AnsiString the entire statement is converted to ANSI before it is sent to the server. If your app makes a wrong conversion, you will end up with mangled data on the server.
The solution is very simple: use parameterized statements to pass Unicode values as Unicode parameters and store them in NVarchar fields. This avoids building the value into the SQL text, so it avoids the conversion errors described above and prevents SQL injection attacks, and it is generally faster as well.
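The idea of passing values as parameters rather than embedding them in the statement text can be sketched with Python's built-in sqlite3 module. This is an assumption-laden stand-in: it uses an in-memory SQLite database, not SQL Server or ADO, and the table and column names are made up. With a SQL Server driver the parameter would additionally be declared with a Unicode string type.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory SQLite, a stand-in for the real server
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

name = "\u0645\u0631\u062d\u0628\u0627"  # sample Arabic value

# The value travels as a bound parameter, not as part of the SQL text,
# so it is never re-encoded as a string literal and cannot alter the statement.
conn.execute("INSERT INTO items (name) VALUES (?)", (name,))
row = conn.execute("SELECT id FROM items WHERE name = ?", (name,)).fetchone()

# A hostile value is compared as plain data instead of being executed.
hostile = "x' OR '1'='1"
no_match = conn.execute("SELECT id FROM items WHERE name = ?", (hostile,)).fetchall()

print(row)       # (1,)
print(no_match)  # []
```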

SQL Server 2008: Collation for UTF-8 code page 65001

I need to save XML in UTF-8 encoding and then use it in T-SQL code to extract data.
Default database collation is SQL_Latin1_General_CP1_CI_AS.
I don't know if it is possible to save and work with UTF-8 data in SQL Server 2008, but I have an idea to use collation with code page of UTF-8 (65001) on the XML column in order to save the data in UTF-8.
Does anybody know if it is possible or have another idea on how to work with UTF-8 data in SQL Server?
If you're dealing with xml data, store it as the xml data type. That should take care of any concerns you have (i.e. how to store it) and you'll save yourself the work of having to convert it to xml when you do work on it (e.g. xpath expressions, xquery, etc).
You can store all Unicode characters in xml or nvarchar columns. It does not matter what collation you use. A handful of rare Chinese characters (from the supplementary plane) may be stored as pairs of nchars (surrogate pairs). But there is no loss of data.
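The surrogate pair remark can be illustrated in Python. The sketch assumes UTF-16 little-endian as the two-byte encoding and uses U+20000, a CJK character from a supplementary plane, as the sample:

```python
import struct

ch = "\U00020000"  # a CJK character outside the Basic Multilingual Plane

encoded = ch.encode("utf-16-le")
units = struct.unpack("<2H", encoded)  # two 16-bit code units

print(len(ch), len(encoded))              # 1 4
print([hex(u) for u in units])            # ['0xd840', '0xdc00']
print(encoded.decode("utf-16-le") == ch)  # True
```

One character occupies two 16-bit units, so length calculations may count it as two, but decoding returns the original character unchanged.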
NVARCHAR column should do the job just fine.

SQL Server multi language data support

How do you set up a SQL Server 2005 DBMS so that you can store data in different languages?
My exact problem is this: in SQL Server Management Studio I'm writing an insert statement which contains German Umlauts. Text is successfully saved but reading the same value results in text without Umlaut.
Consider that I have to support 4 languages: English, German, Greek & Russian (I don't want to think what I will face with the Russian text).
The DBMS is currently set up with a Greek collation (to support Greek).
Does this cause any problems?
Any hints?
You need to use nvarchar data type for strings ( http://msdn.microsoft.com/en-us/library/ms186939.aspx ) and you also need to precede all unicode strings with N ( http://support.microsoft.com/kb/239530 ).
When dealing with Unicode string constants in SQL Server you must precede all Unicode strings with a capital letter N, as documented in the SQL Server Books Online topic "Using Unicode Data". The "N" prefix stands for National Language in the SQL-92 standard, and must be uppercase. If you do not prefix a Unicode string constant with N, SQL Server will convert it to the non-Unicode code page of the current database before it uses the string.
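Which of the four languages from the question survive a conversion to the database code page can be checked with a short Python sketch. It assumes that the Greek collation corresponds to Windows code page 1253; the sample words and the function name are illustrative. Python raises an error for an unmappable character, whereas the question reports that SQL Server stored the text with the umlaut silently lost, so the sketch only shows which texts cannot be represented.

```python
def fits_code_page(text: str, code_page: str) -> bool:
    """True if every character of text exists in the given code page."""
    try:
        text.encode(code_page)
        return True
    except UnicodeEncodeError:
        return False


samples = {
    "English": "Hello",
    "German": "M\u00fcller",                                      # Müller
    "Greek": "\u039a\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1",  # Καλημέρα
    "Russian": "\u041f\u0440\u0438\u0432\u0435\u0442",            # Привет
}

results = {lang: fits_code_page(text, "cp1253") for lang, text in samples.items()}
print(results)
# {'English': True, 'German': False, 'Greek': True, 'Russian': False}
```

Under that assumption, German and Russian literals need the N prefix and an nvarchar column, while English and Greek happen to fit the code page.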
Are you using the nvarchar type (rather than varchar)? It is recommended if you have multiple languages in the same column, since it will let you store and retrieve them properly.
Of course, SQL Server can only maintain one collation on a particular column, even if the column is being used to store strings in multiple languages, so that is something to consider.

Resources