Character set mismatch on Linux with ODBC to SQL Server - c

I've got a funny issue trying to insert non-ASCII characters into a SQL Server database, using the Microsoft ODBC driver for Linux. The problem is it seems to be assuming different character sets when sending and receiving data. For info, the server collation is set to Latin1_General_CI_AS (I'm only trying to insert European accent characters).
Testing with tsql (which came with FreeTDS), everything is fine. On startup, it outputs the following:
locale is "en_GB.utf8"
locale charset is "UTF-8"
using default charset "UTF-8"
I can both insert a non-ASCII value into a table and select it back.
However, my own utility, which uses the ODBC API, is not working. When I run a select query, the data comes back encoded in UTF-8, as desired. However, if I insert UTF-8 characters, they get corrupted.
SQL > update test set a = 'Béthune';
Running SQL: update test set a = 'Béthune'
Query executed OK: 1 affected rows
SQL > select * from test;
Running SQL: select * from test
+------------+
| a |
+------------+
| Béthune |
+------------+
If I instead insert the data encoded in ISO-8859-1, the insert works correctly. However, the select query still returns it encoded in UTF-8!
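To make the difference between the two encodings concrete, here is a minimal sketch, assuming glibc's iconv is available. It is not part of the original utility, and utf8_to_latin1 is a hypothetical helper name. It shows that 'é' takes two bytes in UTF-8 (0xC3 0xA9) but a single byte in ISO-8859-1 (0xE9), so the same text has different lengths on the wire.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Convert a NUL-terminated UTF-8 string to ISO-8859-1.
   Returns the number of bytes written (excluding the NUL),
   or (size_t)-1 if the conversion fails or the buffer is too small. */
size_t utf8_to_latin1(const char *in, char *out, size_t out_size)
{
    assert(in != NULL && out != NULL);
    if (out_size == 0)
        return (size_t)-1;

    iconv_t cd = iconv_open("ISO-8859-1", "UTF-8");  /* (to, from) */
    if (cd == (iconv_t)-1)
        return (size_t)-1;

    char *src = (char *)in;          /* iconv takes non-const pointers */
    size_t src_left = strlen(in);
    char *dst = out;
    size_t dst_left = out_size - 1;  /* keep one byte for the NUL */

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return (size_t)-1;

    *dst = '\0';
    return (size_t)(dst - out);
}
```

Calling it on "Béthune" written as UTF-8 bytes ("B\xC3\xA9thune", 8 bytes) yields 7 bytes, with 0xE9 in the second position.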
I've already got the locale set to en_GB.utf8, and a client charset of UTF-8 in the database connection details. Aargh!
FWIW I seem to be getting the same problem whether I use the FreeTDS driver or the official Microsoft driver.
EDIT: Just realised one relevant point, which is that in this test program, it isn't using a prepared statement with bound variables. In other words, the update SQL is passed directly into the SQLPrepare call. Something in ODBC is definitely doing an iconv translation, but evidently not to the correct character set!
#0 0x0000003d4c41f850 in iconv () from /lib64/libc.so.6
#1 0x0000003d4d83fd94 in ?? () from /usr/lib64/libodbc.so.2
#2 0x0000003d4d820465 in SQLPrepare () from /usr/lib64/libodbc.so.2
I'll try compiling my own UnixODBC to see better what's going on.
EDIT 2: I've built UnixODBC from source to debug what it's doing, and the problem is that nl_langinfo(CODESET) reports ISO-8859-1. That is strange, since its man page says it returns the same string you get from locale charmap, which returns UTF-8. I'm guessing that's the problem, but I'm still not sure how to solve it.
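One way to check this from inside a program is sketched below. It relies on the fact that a C program starts in the "C" locale until it calls setlocale, whereas the locale command-line tool adopts the environment's locale itself, so the two can disagree. Whether a missing setlocale call explains the value seen here is only an assumption, and current_codeset is a hypothetical helper.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stddef.h>

/* Return the codeset of the program's current LC_CTYPE locale.
   If adopt_environment is non-zero, first adopt the locale named by
   LC_ALL, LC_CTYPE or LANG in the environment. */
const char *current_codeset(int adopt_environment)
{
    if (adopt_environment) {
        /* "" means "use the environment". If that locale is not
           installed, setlocale returns NULL and nothing changes. */
        (void)setlocale(LC_CTYPE, "");
    }
    const char *codeset = nl_langinfo(CODESET);
    assert(codeset != NULL);
    return codeset;
}
```

Printing current_codeset(0) and then current_codeset(1) in the same process shows whether the environment's locale is actually being picked up. If the second call still does not report UTF-8, the locale named in LANG may not be installed.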

A colleague at work has just figured out the solution, for FreeTDS at least.
For a direct driver connection (SQLDriverConnect()), adding ClientCharset=UTF-8;ServerCharset=CP1252; to the connection string fixed the problem.
For a connection via the driver manager (SQLConnect()), I can add these lines to the connection settings in odbc.ini:
client charset = UTF-8
server charset = CP1252
Can't yet figure out a solution using the Microsoft driver ...
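For the direct FreeTDS connection, a minimal sketch of assembling such a connection string in C might look like this. The DRIVER name (which depends on what is registered in odbcinst.ini), the SERVER and DATABASE values, and the helper name build_freetds_conn_str are all assumptions for illustration; only the two charset attributes come from the fix described above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Build a FreeTDS connection string with explicit client/server charsets.
   Returns the string length, or -1 if a value contains ';' (which would
   break the attribute list) or the buffer is too small. */
int build_freetds_conn_str(char *out, size_t out_size,
                           const char *server, const char *database)
{
    assert(out != NULL && server != NULL && database != NULL);
    if (strchr(server, ';') != NULL || strchr(database, ';') != NULL)
        return -1;

    int n = snprintf(out, out_size,
                     "DRIVER={FreeTDS};SERVER=%s;DATABASE=%s;"
                     "ClientCharset=UTF-8;ServerCharset=CP1252;",
                     server, database);

    /* snprintf reports the length it wanted; reject truncated output. */
    return (n >= 0 && (size_t)n < out_size) ? n : -1;
}
```

The resulting buffer would then be passed as the input connection string to SQLDriverConnect(), typically with SQL_NTS as its length and SQL_DRIVER_NOPROMPT as the completion mode.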

A solution for the Microsoft ODBC driver might be to set the LANG environment variable to a proper value.
Make sure you have the required locale installed and configured. Also make sure that the LANG environment variable is set correctly for the user your application runs as. This can be tricky for daemons. For example, to make it work for PHP with Apache2, I had to add export LANG=en_US.utf8 to /etc/apache2/envvars.

Related

Connecting to sybase ase with specific charset from a software

I need to connect a Sybase ASE database with a specific charset. I have data with a charset that cannot be read properly by the default charset of Sybase ASE. I know how to connect with "isqln -J" but what I need is a little more complicated.
I have software that connects itself to an ASE database but it doesn't ask for a charset while establishing the connection, so it connects with the default charset.
What I want to ask is whether either of these options is possible:
Can I change the default charset of Sybase ASE? (Not the database charset: I tried changing that and it didn't work.)
Can I intercept the connection just before login (with a login trigger, etc.) and change it to suit my needs (by adding the "-J" parameter, maybe)?
I tried changing the database's charset, the system's charset, the OEM charset, etc. None of these seems to work. The third-party software still connects to the database with the default charset, so it cannot read values containing special characters properly.
Please note that Sybase ASE only works correctly when the server and client charsets are the same or compatible. For example, if the server uses iso_1, the client must use iso_1 too; if the server uses cp936, the client must use cp936 or a subset of it, such as gb2312. Otherwise, the client only gets illegal characters such as spaces, squares, or XXX.
So here's the answer: first check your ASE server's charset:
sp_helpsort
I don't know what client software you are using, but most Sybase clients take their default charset from the OS language settings. So set the client's charset to match the server's: on Linux/Unix you can set it with LC_ALL or LANG; on Windows, you can set it in the Windows Unicode settings.
HTH

Configure charset for ODBC Driver 17 for SQL Server

I'm running a Windows application on Linux under Wine, that accesses a SQL Server using the ODBC Driver 17 for SQL Server, for Linux.
It runs fine, except that varchar values containing non-ASCII characters are displayed incorrectly. The nvarchar fields (Unicode strings) have no problem.
Example:
select rtrim('Presentación ')
Returns: Presentación
My database has the encoding for varchars defined as iso8859-1, and Wine seems to use the cp1252 code page.
My guess is that the ODBC driver for Linux retrieves the data correctly and converts it to UTF-8, which works fine (I can see the values correctly if I run my queries directly through isql). But when those strings are passed to my application under Wine, they are presumably interpreted as cp1252, and that's when I see them incorrectly.
Has anyone had the same problem? What could I try?
Thank you.

Incorrect encoding using jruby with datamapper

I'm trying to get data from a 2008 R2 MSSQL server using jruby and datamapper.
The only problem I've had so far is getting the correct character encoding in jruby.
The database uses the Polish_CI_AS collation, and the test field is populated with: "ą ę ś ć".
Fetching that field from within jruby results in: "uFFFD uFFFD uFFFD uFFFD", which is the Unicode replacement character (U+FFFD) repeated four times.
I've tried setting the -E option to windows-1250; that changes the characters displayed but, as with UTF-8, they are all displayed in the same manner. I also tried including # encoding: Windows-1250, but that doesn't help either.
I'm pretty sure it has something to do with datamapper or the db connection, but jdbc does not support (AFAIK) encoding variables.
UPDATE
My connection string: DataMapper.setup(:default, 'sqlserver://servername/database;instance=InstanceName;domain=DOMAIN')
The connection works well with MS JDBC, but datamapper uses jTDS, which uses UTF8 encoding by default.
I've checked the jTDS documentation and found that I needed to add the charset=cp1250; property to the end of my connection string. It all works well now.

SQL Server 2000 charset issues

Once again with the charset issues when talking to DBs :)
I have two environments running Zend Server. Both of them communicate with a SQL Server 2000 using the mssql extension. Neither of them has any value set for the charset in the extension's settings. For one it works, and for the other it returns data in the wrong encoding.
The problem was noticed when this data was being inserted into a MySQL database and it screamed with SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xF6m' for column 'cust_lastname' at row 1.
I tried using SET NAMES utf8 to get the SQL Server connection to return the correct data, but it complains that NAMES is not a recognized SET statement. Looking around, most people recommend using this, but it doesn't seem to be part of SQL Server 2000 :)
So, what should I do? How do I, WITHOUT fiddling with the SQL Server database/tables, tell it to send me the data in UTF-8 encoded format?
EDIT:
Some more info...
SQL Server uses the Finnish_Swedish_CI_AS collation
MySQL has every table in UTF-8 format and uses utf8_unicode_ci
I didn't find a good solution and ended up converting to and from utf8 in my application. If this is encapsulated within a class, it doesn't clutter the code. But a way to actually tell the SQL Server which encoding to use during communication would be better.
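As an illustration of that application-side conversion, here is a minimal C sketch (the asker's own code is not shown in the thread, so this is not their implementation). It assumes the incoming bytes are ISO-8859-1; the byte 0xF6 from the error message is 'ö' in both ISO-8859-1 and Windows-1252, but Windows-1252 differs in the 0x80 to 0x9F range, which this sketch does not handle. latin1_to_utf8 is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Convert in_len bytes of ISO-8859-1 to NUL-terminated UTF-8.
   Returns the number of bytes written (excluding the NUL),
   or (size_t)-1 if the output buffer is too small. */
size_t latin1_to_utf8(const unsigned char *in, size_t in_len,
                      unsigned char *out, size_t out_size)
{
    assert(in != NULL && out != NULL);
    size_t o = 0;
    for (size_t i = 0; i < in_len; i++) {
        unsigned char b = in[i];
        size_t need = (b < 0x80) ? 1 : 2;
        if (o + need >= out_size)      /* also keeps room for the NUL */
            return (size_t)-1;
        if (b < 0x80) {
            out[o++] = b;              /* ASCII is unchanged */
        } else {
            /* Latin-1 bytes equal their code points (U+0080..U+00FF),
               which UTF-8 encodes as two bytes: 110xxxxx 10xxxxxx. */
            out[o++] = (unsigned char)(0xC0 | (b >> 6));
            out[o++] = (unsigned char)(0x80 | (b & 0x3F));
        }
    }
    if (o >= out_size)
        return (size_t)-1;
    out[o] = '\0';
    return o;
}
```

With this, the two bytes 0xF6 'm' that MySQL rejected become the three bytes 0xC3 0xB6 'm', which is valid UTF-8.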

"String data, right truncation" warning on a select statement

I am upsizing an Access 2003 database to SQL Server Express 2008. The tables appear to have been created OK and the data looks OK.
I have an MFC application that connects to this database. It worked fine when connecting to Access, but when I connect to SQL Server I get the following error on a select statement.
DBMS: Microsoft SQL Server
Version: 10.50.1600
ODBC Driver Manager Version: 03.80.0000
Warning: ODBC Success With Info on field 0.
String data, right truncation
State:01004,Native:0,Origin:[Microsoft][ODBC SQL Server Driver]
The data that is returned should be 8 characters but is only 7, with the rightmost character truncated.
The Access front end can read the data from SQL Server correctly.
The field in the SQL Server table is defined as nvarchar with a length of 8.
The code to read the field looks something like
CDatabase Database;
CString sSerialNumber = "00000000";
CString SqlString;
CString sDsn = "Driver={SQL Server};Server=server\\db;Database=Boards;Uid=uid;Pwd=pwd;Trusted_Connection=False";
Database.Open(NULL,false,false,sDsn);
CRecordset recset( &Database );
SqlString.Format("Select SerialNumber from boards where MACAddress = '%s'",mac);
recset.Open(CRecordset::forwardOnly,SqlString,CRecordset::readOnly);
recset.GetFieldValue("SerialNumber",sSerialNumber);
After this, sSerialNumber should be 12345678, but it's 1234567.
Thanks for the help
I'd agree that this is driver related. The {SQL Server} driver was introduced for use with SQL 2000. {SQL Native Client} came along with 2005. Ideally, for your 2008 database, you should use the newest {SQL Server Native Client 10.0}. The newer drivers are backward compatible with older versions of SQL Server.
Changing my driver from
"Driver={SQL Server};"
to
"Driver={SQL Native Client};"
has made the problem go away, but I'm not sure what was going on. I'm going to keep looking into it.
From a bit of Googling, I've learned that at times, particularly when "Use Regional Settings" is checked in the MS SQL Server ODBC driver DSN setup dialog, ODBC will apparently treat a string made up entirely of digits as a number and return it as something like "12345678.00", which doesn't fit into the space you've given it. The solution is to turn that setting off, either in the dialog box or, more permanently, in the connection string:
CString sDsn = "Driver={SQL Server};Server=server\\db;Database=Boards;"
               "Uid=uid;Pwd=pwd;Trusted_Connection=False;Regional=No;";
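The symptom (seven of eight characters) is what a fixed-size character buffer produces when the value needs more room than was reserved, because one byte always goes to the terminating NUL. The sketch below is a plain C simulation of that behaviour; it is not ODBC code and not what MFC does internally, and copy_with_truncation_flag is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src into dst the way a fixed-size C string buffer has to:
   at most dst_size - 1 characters plus a terminating NUL.
   Returns 1 if characters were cut off, 0 otherwise. */
int copy_with_truncation_flag(const char *src, char *dst, size_t dst_size)
{
    assert(src != NULL && dst != NULL && dst_size > 0);
    size_t len = strlen(src);
    size_t n = (len < dst_size) ? len : dst_size - 1;
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n < len;   /* analogous to a "right truncation" warning */
}
```

With an 8-byte buffer, both "12345678" and "12345678.00" come out as "1234567" with the truncation flag set, which matches the value reported in the question.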
If you absolutely have to dig to the bottom of this, write a minimal stored procedure that selects a local variable defined as varchar(17); any size more than twice your original size will do. Now call the sproc instead of the dynamic SQL and see what comes back. Then you can repeat the test with exactly the same size (nvarchar(8)). Your little sproc serves as an easy data adapter and stabilizes the typing if the old driver tends to get confused, which is much easier than fiddling with the table definition.
Also, check whether there is a parameter or property on the interface/connection classes to specify the character encoding, and make sure that it's Unicode (UTF-16). I assume that your code is compiled for Unicode. If not, you need to make a decision about that first (the N in nvarchar means Unicode; otherwise it would be just varchar). You definitely need the character encoding to match on both sides, or you will get other spurious errors.

Resources