SQL Server Linked Server to PostgreSQL Turkish Character Issue

I have added a PostgreSQL linked server to my SQL Server with help from this blog post. My problem is when I use the query below, I am having problems with Turkish characters.
Query on Microsoft SQL Server 2012:
SELECT *
FROM OpenQuery(CARGO, 'SELECT taxno AS ACCOUNTNUM, title AS NAME FROM view_company');
Actual results:
MUSTAFA ÞAHÝNALP
Expected results:
MUSTAFA ŞAHİNALP

The problem is that the source encoding is 8-bit extended ASCII using Code Page 1254 -- Windows Latin 5 (Turkish). In the Latin-5 chart of characters to values, the Ş character -- "Latin Capital Letter S with Cedilla" -- has a value of 222 (decimal) / DE (hex). Your local server (i.e. SQL Server) has a default collation of SQL_Latin1_General_CP1_CI_AS, which is also 8-bit extended ASCII, but using Code Page 1252 -- Windows Latin 1 (ANSI). In the Latin-1 chart, the Þ character -- "Latin Capital Letter Thorn" -- also has a value of 222 (decimal) / DE (hex). This is why your characters are getting translated in that manner.
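You can see the collision directly by taking the single byte 0xDE and labeling it with each collation; a minimal sketch (the column aliases are mine):
-- Same byte, two code pages: 0xDE renders as Ş under a Turkish (CP1254)
-- collation but as Þ under the default CP1252 collation.
SELECT CONVERT(VARCHAR(1), 0xDE) COLLATE Turkish_100_CI_AS AS [CP1254_Char],
       CONVERT(VARCHAR(1), 0xDE) COLLATE SQL_Latin1_General_CP1_CI_AS AS [CP1252_Char];
-- Returns: Ş | Þ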
There are a few things you can try:
Use sp_serveroption to set the following two options:
EXEC sp_serveroption @server=N'linked_server_name',
@optname='use remote collation',
@optvalue=N'true';
EXEC sp_serveroption @server=N'linked_server_name',
@optname='collation name',
@optvalue=N'Turkish_100_CI_AS';
Not sure if that will work with PostgreSQL as the remote system, but it's worth trying at least. Please note that this requires that all remote column collations be set to this particular value: Turkish / Code Page 1254.
Force the Collation per each column:
SELECT [ACCOUNTNUM], [NAME] COLLATE Turkish_100_CI_AS
FROM OPENQUERY(CARGO, 'SELECT taxno AS ACCOUNTNUM, title AS NAME FROM view_company');
Convert the string values (just the ones with character mapping issues) to VARBINARY and insert into a temporary table where the column is set to the proper Collation:
CREATE TABLE #Temp ([AccountNum] INT, [Name] VARCHAR(100) COLLATE Turkish_100_CI_AS);
INSERT INTO #Temp ([AccountNum], [Name])
SELECT [ACCOUNTNUM], CONVERT(VARBINARY(100), [NAME])
FROM OPENQUERY(CARGO, 'SELECT taxno AS ACCOUNTNUM, title AS NAME FROM view_company');
SELECT * FROM #Temp;
This approach will first convert the incoming characters into their binary / hex representation (e.g. Ş --> 0xDE), and then, upon inserting 0xDE into the VARCHAR column in the temp table, it will translate 0xDE into the expected character of that value for Code Page 1254 (since that is the Collation of that column). The result will be Ş instead of Þ.
UPDATE
Option #1 worked for the O.P.

Related

Kurdish Sorani letters in SQL Server

I am trying to create a database containing Kurdish Sorani letters.
My database fields have to be varchar because the project was started that way.
First I created the database with the Arabic_CI_AS collation.
I can store all Arabic letters in the varchar fields, but when it comes to Kurdish letters, for example ڕۆ, these special letters show up as ?? in the table after entering data. I think my collation is wrong. Has anybody got an idea for a collation?
With that collation, no, you need to use nvarchar and always prefix such strings with the N prefix:
CREATE TABLE dbo.floo
(
UseNPrefix bit,
a varchar(32) collate Arabic_CI_AS,
b nvarchar(32) collate Arabic_CI_AS
);
INSERT dbo.floo(UseNPrefix,a,b) VALUES(0,'ڕۆ','ڕۆ');
INSERT dbo.floo(UseNPrefix,a,b) VALUES(1,N'ڕۆ',N'ڕۆ');
SELECT * FROM dbo.floo;
Output:
UseNPrefix  a   b
----------  --  --
False       ??  ??
True        ??  ڕۆ
In SQL Server 2019, you can use a supplementary-character (SC) UTF-8 collation with varchar, but you will still need to prefix string literals with N to prevent data from being lost:
CREATE TABLE dbo.floo
(
UseNPrefix bit,
a varchar(32) collate Arabic_100_CI_AS_KS_SC_UTF8,
b nvarchar(32) collate Arabic_100_CI_AS_KS_SC_UTF8
);
INSERT dbo.floo(UseNPrefix,a,b) VALUES(0,'ڕۆ','ڕۆ');
INSERT dbo.floo(UseNPrefix,a,b) VALUES(1,N'ڕۆ',N'ڕۆ');
SELECT * FROM dbo.floo;
Output:
UseNPrefix  a   b
----------  --  --
False       ??  ??
True        ڕۆ  ڕۆ
Basically, even if you are on SQL Server 2019, your requirements of "I need to store Sorani" and "I can't change the table" are incompatible. You will need to either change the data type of the column or at least change the collation, and you will need to adjust any code that expects to pass this data to SQL Server without an N prefix on strings.
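As a sketch of what that change could look like (the table and column names below are hypothetical, and the UTF-8 option reuses the collation shown above):
-- Option 1: widen the column to nvarchar (UTF-16).
ALTER TABLE dbo.MyTable ALTER COLUMN [Name] nvarchar(32);
-- Option 2 (SQL Server 2019+): keep varchar but move it to a UTF-8 collation.
ALTER TABLE dbo.MyTable ALTER COLUMN [Name] varchar(32) COLLATE Arabic_100_CI_AS_KS_SC_UTF8;
Either way, the application still has to send Unicode (N-prefixed literals or nvarchar parameters), or the data is lost before it ever reaches the column.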

UTF-8 characters get saved as ?? on insert, but get saved correctly on update

I have a table on MS SQL Server with an nvarchar column. I am saving a UTF-8 character using an INSERT statement, and it gets saved as ???. If I update the same column with the same value via an UPDATE statement, it gets saved correctly.
Any hint on what the issue might be here? The collation used is SQL_Latin1_General_CP1_CI_AS.
Show your insert statement. There is - quite probably - an N missing:
DECLARE @v NVARCHAR(100)='Some Hindi from Wikipedia मानक हिन्दी';
SELECT @v;
Result: Some Hindi from Wikipedia ???? ??????
SET @v=N'Some Hindi from Wikipedia मानक हिन्दी';
SELECT @v;
Result: Some Hindi from Wikipedia मानक हिन्दी
The N in front of the string literal tells SQL Server to interpret the content as Unicode (to be exact: as UCS-2). Otherwise it will be treated as 1-byte-encoded extended ASCII in the current code page, which cannot represent all characters...

'LIKE' keyword is not working in SQL Server 2008

I have bulk data in a SQL Server table. One of the fields contains the following data:
'(اے انسان!) کیا تو نہیں جانتا)'
Tried:
SELECT * from Ayyat where Data like '%انسان%' ;
but it shows no results.
Please use N before the string if the language is not English:
SELECT * from Ayyat where Data like N'%انسان%' ;
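Note that the N prefix only helps if the column itself can hold Unicode. A minimal sketch, assuming Data is an nvarchar column (the table definition here is illustrative, not the asker's actual schema):
-- If Data were varchar under a Latin collation, the Arabic text would
-- already have been stored as '?' and no LIKE pattern could recover it.
CREATE TABLE dbo.Ayyat (Id int, Data nvarchar(200));
INSERT dbo.Ayyat (Id, Data) VALUES (1, N'(اے انسان!) کیا تو نہیں جانتا');
SELECT * FROM dbo.Ayyat WHERE Data LIKE N'%انسان%'; -- matches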
If you're storing Urdu, Arabic or another language besides English in your database, the database and table need a Unicode-capable character set. (Note: the ALTER statements below are MySQL syntax, not SQL Server; a SQL Server sketch follows.)
First convert the database's character set and collation:
ALTER DATABASE database_name CHARACTER SET utf8 COLLATE utf8_general_ci;
Then convert the table:
ALTER TABLE table_name CONVERT TO CHARACTER SET utf8 COLLATE utf8_general_ci;
After this, execute your query normally:
SELECT * FROM table_name WHERE Datas LIKE '%انسان%';
Note: if you do not convert the database and table, other languages and special characters will be changed into question marks.
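For comparison, a rough SQL Server equivalent of that conversion (SQL Server 2019+ only; the table and column names are carried over from the question and may differ):
-- Keep the varchar type but switch to a UTF-8 collation so Arabic/Urdu
-- text can be stored without changing the column to nvarchar.
ALTER TABLE Ayyat ALTER COLUMN Data varchar(400) COLLATE Arabic_100_CI_AS_KS_SC_UTF8;
On older versions of SQL Server, the only reliable fix is nvarchar plus N-prefixed literals, as in the first answer.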

Text encodings in .NET and SQL Server processing

I have an application that gets terms from a DB to run as a list of string terms. The DB table was set up with nvarchar for that column to include all foreign characters. In some cases, characters like ä come through clearly when getting the terms from the DB, and even show that way in the table.
When importing Japanese or Arabic characters, though, all I see is ????????.
I have tried converting it using different methods: first converting it into UTF-8 encoding and back, and secondly using HttpUtility.HtmlEncode, which works perfectly for these characters but then also converts quotes and other things I don't need converted.
I accused the DB designer of needing to do something on his part, but am I wrong that the DB should display all these characters and make it easy to just query it and add to my search list? If not, is there a consistent way of getting all international characters to display correctly in SQL and VB.NET?
I know that when I have read from text files, I just used the Microsoft.VisualBasic.TextFieldParser reader tool with encoding set to UTF-8, and this was not an issue.
If the database field is nvarchar, then it will store data correctly. As you have seen.
Somewhere before it gets to the database, the data is being lost or changed to varchar: stored procedure, parameters, file encoding, ODBC translation etc.
DECLARE @foo nvarchar(100), @foo2 varchar(100)
--with Arabic and Japanese and a proper N literal
SELECT @foo = N'العربي 日本語', @foo2 = N'العربي 日本語'
SELECT @foo, @foo2 -- @foo gives العربي 日本語; @foo2 is varchar, so anything outside its code page is still lost
--now a varchar literal
SELECT @foo = 'العربي 日本語', @foo2 = 'العربي 日本語'
SELECT @foo, @foo2 --gives ?????? ???
--from my Swiss German keyboard. These are part of my code page.
SELECT @foo = 'öéäàüè', @foo2 = 'öéäàüè'
SELECT @foo, @foo2 --gives öéäàüè öéäàüè (they survive because they exist in the code page)
So, apologise to the nice DB monkey... :-)
Always try to use NVARCHAR or NTEXT to store foreign characters.
You cannot store Unicode in the varchar or text data types.
Also put an N before the string value, like:
UPDATE [USER]
SET Name = N'日本語'
WHERE ID = XXXX;

Multi-language support

We have developed a site that needs to display text in English, Polish, Slovak and Czech. However, when the text is entered into the database, any accented letters are changed to plain English letters.
After searching around on forums, I have found that it is possible to put an 'N' in front of a string which contains accented characters. For example:
INSERT INTO Table_Name (Col1, Col2) VALUES (N'Value1', N'Value2')
However, the site has already been fully developed so at this stage, going through all of the INSERT and UPDATE queries in the site would be a very long and tedious process.
I was wondering if there is any other, much quicker, way of doing what I am trying to do?
The database is MSSQL and the columns being inserted into are already nvarchar(n).
There isn't any quick solution.
The updates and inserts are wrong and need to be fixed.
If they were parameterized queries, you could have simply made sure they were using the NVarChar database type and you would not have a problem.
Since they are dynamic strings, you will need to ensure that you add the unicode specifier (N) in front of each text field you are inserting/updating.
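As a sketch of the parameterized route (the parameter names here are illustrative): when the statement goes through sp_executesql with nvarchar parameters, the values travel as Unicode and no N prefix is needed inside the application's string-building code:
EXEC sp_executesql
     N'INSERT INTO Table_Name (Col1, Col2) VALUES (@p1, @p2)',
     N'@p1 nvarchar(100), @p2 nvarchar(100)',
     @p1 = N'Łódź', @p2 = N'Čeština';
From client code (e.g. ADO.NET), binding the parameters as NVarChar achieves the same thing without any literal prefixes.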
Topic-starter wrote:
"text in English, Polish, Slovak and Czech. However, when the text is entered into the database, any accented letters are changed to english letters" After searching around on forums, I have found that it is possible to put an 'N' in front of a string which contains accented characters. For example:
INSERT INTO Table_Name (Col1, Col2) VALUES (N'Value1', N'Value2')
"The collation for the database as a whole is Latin1_General_CI_AS"
I do not see how this could happen due to SQL Server, since Latin1_General_CI_AS handles European "non-English" letters:
--on database with collation Latin1_General_CI_AS
declare @test_multilanguage_eu table
(
c1 char(12),
c2 nchar(12)
)
INSERT INTO @test_multilanguage_eu VALUES ('éÉâÂàÀëËçæà', 'éÉâÂàÀëËçæà')
SELECT c1, cast(c1 as binary(4)) as c1bin, c2, cast(c2 as binary(4)) as c2bin
FROM @test_multilanguage_eu
outputs:
c1 c1bin c2 c2bin
------------ ---------- ------------ ----------
éÉâÂàÀëËçæà 0xE9C9E2C2 éÉâÂàÀëËçæà 0xE900C900
(1 row(s) affected)
I believe you simply have to check the checkboxes under Control Panel --> Regional and Language Options --> Advanced tab --> Code page conversion tables, and make sure that you render in the same code page as you store it.
Converting to Unicode from the encodings used by clients would lead to problems rendering back to web clients, it seems to me.
I believe that most European collation designators use codepage 1252 [1], [2].
Update:
SELECT
COLLATIONPROPERTY('Latin1_General_CI_AS' , 'CodePage')
outputs 1252
[1] http://msdn.microsoft.com/en-us/library/ms174596.aspx
[2] Windows 1252: http://msdn.microsoft.com/en-us/goglobal/cc305145.aspx
