Configure charset for ODBC Driver 17 for SQL Server

I'm running a Windows application on Linux under Wine that accesses a SQL Server using the ODBC Driver 17 for SQL Server for Linux.
It runs fine, except that varchar values containing non-ASCII characters are displayed incorrectly. The nvarchar fields (Unicode strings) have no problem.
Example:
select rtrim('Presentación ')
Returns: Presentación (the two UTF-8 bytes of 'ó' displayed as separate cp1252 characters)
My database defines the encoding for varchars as ISO 8859-1, and Wine seems to use the cp1252 code page.
My guess is that the ODBC driver for Linux retrieves the data correctly and converts it to UTF-8, which works fine (I can see the values correctly if I run my queries directly through isql), but when those strings are passed to my application under Wine, they are treated as cp1252, and that's when I see them incorrectly.
Has anyone had the same problem? What could I try?
Thank you.
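To make that hypothesis concrete, here is a minimal standalone sketch (plain C++, nothing driver-specific; the string literal is hard-coded purely for illustration) of the byte-level misreading being described:

#include <cstdio>

int main() {
    // "Presentación" as UTF-8: the 'ó' is the two bytes 0xC3 0xB3.
    const unsigned char utf8[] = { 'P','r','e','s','e','n','t','a','c','i',
                                   0xC3, 0xB3, 'n', 0 };
    // A consumer that assumes cp1252 reads those two bytes as two separate
    // characters: 0xC3 = 'Ã' and 0xB3 = '³', producing "Presentación".
    for (const unsigned char* p = utf8; *p; ++p)
        std::printf("%02X ", *p);
    std::printf("\n");
    return 0;
}

If the driver really is handing back UTF-8, the fix presumably lies either in making the Wine side expect UTF-8 or in converting the strings before the application sees them.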

Related

RODBC in Ubuntu truncates text strings to 255 characters

I am using RODBC on Ubuntu 16.04, porting my Windows-based R project/package to this Linux environment. I am running into an issue where sqlQuery returns only the first 255 characters of a text string from an MS SQL Server database. I have found many references to this issue, and I have changed the column type in the database to nvarchar(3500), which presumably should solve it. This was not a problem in the Windows environment. I cannot seem to get around the 255-character limit, in spite of many folks saying that changing the column type to nvarchar(4000) or less would solve it. I've tried many things, including cast(... as nvarchar(1000)), to no avail.
Where am I going wrong?
I was using FreeTDS. I switched to the native MS SQL Server drivers, and this fixed the issue. I do not know exactly where the problem lay, but replacing FreeTDS with the Microsoft drivers for SQL Server did the trick.
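For anyone making the same switch, a sketch of what the odbc.ini entry might look like once the Microsoft driver package (msodbcsql) is installed; the DSN name, server and database below are placeholders, and the driver name must match whichever version the package registered in odbcinst.ini:

[mydsn]
Driver = ODBC Driver 17 for SQL Server
Server = tcp:myserver,1433
Database = mydb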

Sybase/HPUX to MSSQL/Linux

We have a Sybase (15.5) server running on HP-UX, and I want to migrate the data to MSSQL 2017 (CU1) on RHEL 7.3.
I'm trying data export/import via bcp using the '-c' (character mode) option.
Everything seems fine except for Hebrew characters: an 'א' is originally encoded as byte value 224 (Sybase uses iso_1), but the character arrives as byte value 133 (MSSQL uses the SQL_Latin1_General_CP1255_CS_AS collation).
Does anyone have a clue about this issue?
Well, after successfully testing the DirectConnect ODBC drivers, it appears that this is an MS driver limitation.
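Before writing the MS tooling off entirely, one thing worth trying (sketched below; server names, credentials and file names are placeholders) is pinning the character set explicitly on both ends so that neither bcp silently transcodes: Sybase bcp accepts a client charset via -J, and Microsoft bcp accepts a code page via -C, where RAW means no conversion at all. Since א appears to be byte 224 (0xE0) in both the source data and the CP1255 target, a raw copy should preserve it:

bcp mydb..mytable out data.txt -c -J iso_1 -S sybserver -U sa -P secret
bcp mydb.dbo.mytable in data.txt -c -C RAW -S mssqlserver -U sa -P secret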

Character set mismatch on Linux with ODBC to SQL Server

I've got a funny issue trying to insert non-ASCII characters into a SQL Server database, using the Microsoft ODBC driver for Linux. The problem is it seems to be assuming different character sets when sending and receiving data. For info, the server collation is set to Latin1_General_CI_AS (I'm only trying to insert European accent characters).
Testing with tsql (which came with FreeTDS), everything is fine. On startup, it outputs the following:
locale is "en_GB.utf8"
locale charset is "UTF-8"
using default charset "UTF-8"
I can both insert and select a non-ASCII value into a table.
However, using my own utility which uses the ODBC API, it's not working. When I do a select query, the data comes back in UTF-8 character set as desired. However if I insert UTF-8 characters, they get corrupted.
SQL > update test set a = 'Béthune';
Running SQL: update test set a = 'Béthune'
Query executed OK: 1 affected rows
SQL > select * from test;
Running SQL: select * from test
+------------+
| a          |
+------------+
| Béthune   |
+------------+
If I instead insert the data encoded in ISO-8859-1, then that works correctly; however, the select query will still return it encoded in UTF-8!
I've already got the locale set to en_GB.utf8, and a client charset of UTF-8 in the database connection details. Aargh!
FWIW I seem to be getting the same problem whether I use the FreeTDS driver or the official Microsoft driver.
EDIT: Just realised one relevant point, which is that in this test program, it isn't using a prepared statement with bound variables. In other words, the update SQL is passed directly into the SQLPrepare call. Something in ODBC is definitely doing an iconv translation, but evidently not to the correct character set!
#0 0x0000003d4c41f850 in iconv () from /lib64/libc.so.6
#1 0x0000003d4d83fd94 in ?? () from /usr/lib64/libodbc.so.2
#2 0x0000003d4d820465 in SQLPrepare () from /usr/lib64/libodbc.so.2
I'll try compiling my own UnixODBC to see better what's going on.
EDIT 2: I've built UnixODBC from source to debug what it's doing, and the problem is that nl_langinfo(CODESET) reports back ISO-8859-1. That is strange, since its man page says it returns the same string you get from locale charmap, which returns UTF-8. I'm guessing that's the problem, but I'm still not sure how to solve it.
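One possibility worth ruling out (a guess based on the nl_langinfo man page rather than anything UnixODBC-specific): nl_langinfo(CODESET) only reflects the environment after the process has called setlocale(LC_ALL, ""); a process that never does so stays in the default "C" locale. A minimal check:

#include <clocale>
#include <langinfo.h>
#include <cstdio>

int main() {
    // Comment this call out and glibc typically reports ANSI_X3.4-1968
    // (plain ASCII) instead of the UTF-8 codeset from the environment.
    std::setlocale(LC_ALL, "");
    std::printf("CODESET = %s\n", nl_langinfo(CODESET));
    return 0;
}

That doesn't explain ISO-8859-1 specifically, but it does show how the answer can differ from what locale charmap prints.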
A colleague at work has just figured out the solution for FreeTDS at least.
For a direct driver connection (SQLDriverConnect()), adding ClientCharset=UTF-8;ServerCharset=CP1252; to the connection string fixed the problem.
For a connection via the driver manager (SQLConnect()), I can add these lines to the connection settings in odbc.ini:
client charset = UTF-8
server charset = CP1252
Can't yet figure out a solution using the Microsoft driver ...
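For completeness, a sketch of the direct-connection variant in C; the ODBC calls are standard, but the server, database and credentials are placeholders:

#include <sql.h>
#include <sqlext.h>
#include <stdio.h>

int main() {
    SQLHENV env;
    SQLHDBC dbc;
    SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env);
    SQLSetEnvAttr(env, SQL_ATTR_ODBC_VERSION, (SQLPOINTER)SQL_OV_ODBC3, 0);
    SQLAllocHandle(SQL_HANDLE_DBC, env, &dbc);

    // ClientCharset/ServerCharset are the FreeTDS keywords from the
    // answer above, passed in a DSN-less connection string.
    SQLCHAR connStr[] = "Driver={FreeTDS};Server=myserver;Port=1433;"
                        "Database=mydb;Uid=myuser;Pwd=mypass;"
                        "ClientCharset=UTF-8;ServerCharset=CP1252;";
    SQLRETURN ret = SQLDriverConnect(dbc, NULL, connStr, SQL_NTS,
                                     NULL, 0, NULL, SQL_DRIVER_NOPROMPT);
    printf(SQL_SUCCEEDED(ret) ? "connected\n" : "connect failed\n");
    return 0;
}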
A solution for the Microsoft ODBC driver might be to set a proper value in the LANG environment variable.
Make sure you have the required locale installed and configured, and that the LANG environment variable is set correctly for the user your application runs under. This can be tricky for daemons. For example, to make it work for PHP with Apache2 I had to add export LANG=en_US.utf8 to /etc/apache2/envvars.
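To verify those prerequisites on a Debian/Ubuntu-style system (standard commands; paths differ on other distros):

locale -a                          # is en_US.utf8 in the list?
sudo locale-gen en_US.UTF-8        # generate it if it is missing
grep LANG /etc/apache2/envvars     # confirm the export line is present
sudo service apache2 restart       # restart so the daemon picks it up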

SQL Server 2000 charset issues

Once again with the charset issues when talking to DBs :)
I have two environments running Zend Server. Both of these communicate with a SQL Server 2000 using the mssql extension. Neither of them has any value set for the charset in the extension's settings. For one it works, and for the other it returns data in the wrong encoding.
The problem became apparent when this data was being inserted into a MySQL database, which failed with SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xF6m' for column 'cust_lastname' at row 1.
I tried using SET NAMES utf8 to get the SQL Server connection to return the correct data, but it complains that NAMES is not a recognized SET statement. Looking around, most people even recommend this, but it doesn't seem to be part of SQL Server 2000 :)
So, what should I do? How do I, WITHOUT fiddling with the SQL Server database/tables, tell it to send me the data in UTF-8 encoded format?
EDIT:
Some more info...
SQL Server uses the Finnish_Swedish_CI_AS collation
MySQL has every table in UTF-8 format and uses utf8_unicode_ci
I didn't find a good solution and ended up converting to and from UTF-8 in my application. If this is encapsulated within a class, it doesn't riddle the code. But a way to actually tell the SQL Server which encoding to use during communication would be better.
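The poster's PHP isn't shown, but to illustrate the encapsulation idea in C++ terms (POSIX iconv; the charset names assume glibc), keep one small converter at the DB boundary so the rest of the code only ever sees UTF-8:

#include <iconv.h>
#include <stdexcept>
#include <string>

// One converter per direction, e.g. CP1252 -> UTF-8 on reads
// and UTF-8 -> CP1252 on writes.
class CharsetConverter {
public:
    CharsetConverter(const char* from, const char* to)
        : cd_(iconv_open(to, from)) {
        if (cd_ == (iconv_t)-1) throw std::runtime_error("iconv_open failed");
    }
    ~CharsetConverter() { iconv_close(cd_); }

    std::string convert(const std::string& in) {
        std::string out(in.size() * 4 + 4, '\0');   // worst-case growth
        char* src = const_cast<char*>(in.data());
        size_t srcLeft = in.size();
        char* dst = &out[0];
        size_t dstLeft = out.size();
        if (iconv(cd_, &src, &srcLeft, &dst, &dstLeft) == (size_t)-1)
            throw std::runtime_error("iconv failed");
        out.resize(out.size() - dstLeft);
        return out;
    }
private:
    iconv_t cd_;
};

// Usage: CharsetConverter fromDb("CP1252", "UTF-8");
//        std::string clean = fromDb.convert(rawFromDb);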

"String data, right truncation" warning on a select statement

I am upsizing an Access 2003 database to SQL Server Express 2008. The tables appear to be created OK and the data looks OK.
I have an MFC application that connects to this database. It worked fine connecting to Access, but when I connect to SQL Server I get the following error on a select statement.
DBMS: Microsoft SQL Server
Version: 10.50.1600
ODBC Driver Manager Version: 03.80.0000
Warning: ODBC Success With Info on field 0.
String data, right truncation
State:01004,Native:0,Origin:[Microsoft][ODBC SQL Server Driver]
The data that is returned should be 8 characters but is only 7 with the right most character truncated.
The Access front end can read the data from SQL Server correctly.
The field in the SQL Server table is defined as nvarchar with a length of 8.
The code to read the field looks something like this:
CDatabase Database;
CString sSerialNumber = "00000000";
CString SqlString;
// DSN-less connection through the legacy {SQL Server} ODBC driver
CString sDsn = "Driver={SQL Server};Server=server\\db;Database=Boards;Uid=uid;Pwd=pwd;Trusted_Connection=False";
Database.Open(NULL, false, false, sDsn);
CRecordset recset(&Database);
// Look up the serial number for the given MAC address
SqlString.Format("Select SerialNumber from boards where MACAddress = '%s'", mac);
recset.Open(CRecordset::forwardOnly, SqlString, CRecordset::readOnly);
recset.GetFieldValue("SerialNumber", sSerialNumber);
After this, sSerialNumber should be 12345678 but it's 1234567.
Thanks for the help
I'd agree that this is driver-related. The {SQL Server} driver was introduced for use with SQL 2000. {SQL Native Client} came along with 2005. Ideally, for your 2008 database, you should use the newest {SQL Server Native Client 10.0}. The newer drivers are backward compatible with older versions of SQL Server.
Changing my driver from
Driver={SQL Server};
to
Driver={SQL Native Client};
has made the problem go away, but I'm not sure what was going on. I'm going to keep looking into it.
From a bit of Googling, I've learned that at times, particularly when "Use Regional Settings" is checked in the MS SQL Server ODBC driver DSN setup dialog, ODBC will treat a string made up entirely of digits as a number and return it like "12345678.00", which doesn't fit into the space you've given it. The solution is to turn that setting off, either in the dialog box or, more permanently, in the connection string:
// adjacent string literals concatenate at compile time; the original
// '+' between two literals would not compile
CString sDsn = "Driver={SQL Server};Server=server\\db;Database=Boards;"
               "Uid=uid;Pwd=pwd;Trusted_Connection=False;Regional=No;";
If you absolutely have to dig to the bottom of this, make a minimal stored procedure that selects a local variable defined as varchar(17) - any size more than 2x your original size will do. Now call the sproc instead of the dynamic SQL and see what comes back. Then you can repeat it with exactly the same size (nvarchar(8)). Your little sproc serves as an easy data adapter and stabilizes typing if the old driver tends to get confused - much easier than fiddling with the table definition.
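A sketch of what that probe might look like (all names made up):

CREATE PROCEDURE dbo.ProbeSerialNumber
AS
BEGIN
    -- deliberately more than 2x the real nvarchar(8) column
    DECLARE @sn varchar(17)
    SET @sn = '12345678'
    SELECT @sn AS SerialNumber
END
GO

-- then, from the app, in place of the dynamic select:
EXEC dbo.ProbeSerialNumber

If the sproc comes back intact at varchar(17) but truncated at nvarchar(8), you've isolated the driver's type handling.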
Also, check if there's any param/property on the interface/connection classes to specify character encoding, and make sure it's Unicode (UTF-16). I assume that your code gets compiled for Unicode. If not, you need to make a decision about that first (the N in nvarchar means Unicode; otherwise it would be just varchar). You definitely need the character encoding matched on both sides or you will have other spurious errors.
