Incorrect encoding using JRuby with DataMapper - SQL Server

I'm trying to get data from a SQL Server 2008 R2 database using JRuby and DataMapper.
The only problem I've hit so far is getting the character encoding right in JRuby.
The database uses the Polish_CI_AS collation, and the test field is populated with: "ą ę ś ć".
Fetching that field from within JRuby results in "\uFFFD \uFFFD \uFFFD \uFFFD", i.e. four copies of the default Unicode replacement character.
I've tried setting the -E flag to windows-1250; it changes the characters displayed, but they are mangled in the same way as with UTF-8. I also tried adding a # encoding: Windows-1250 magic comment, but that doesn't help either.
I'm pretty sure it has something to do with DataMapper or the DB connection, but JDBC (AFAIK) does not support encoding options.
UPDATE
My connection string: DataMapper.setup(:default, 'sqlserver://servername/database;instance=InstanceName;domain=DOMAIN')

The connection works fine with the Microsoft JDBC driver; DataMapper, however, uses jTDS, which defaults to UTF-8 encoding.
I checked the jTDS documentation and found that I needed to append the charset=cp1250; property to my connection string. It all works now.
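Since DataMapper's sqlserver adapter goes through jTDS under JRuby, the same property can simply be appended to the DataMapper URL above. For reference, a minimal plain-JDBC sketch of the same fix in Java; the server, instance and domain names come from the question's connection string, while the table, column and credentials are made-up placeholders:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JtdsCharsetExample {
    public static void main(String[] args) throws Exception {
        Class.forName("net.sourceforge.jtds.jdbc.Driver");
        // charset=cp1250 makes jTDS decode varchar columns as code page 1250,
        // fixing the \uFFFD replacement characters seen above
        String url = "jdbc:jtds:sqlserver://servername/database;"
                   + "instance=InstanceName;domain=DOMAIN;charset=cp1250";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT test_field FROM test_table")) {
            while (rs.next()) {
                System.out.println(rs.getString(1)); // now prints "ą ę ś ć"
            }
        }
    }
}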

Related

Configure charset for ODBC Driver 17 for SQL Server

I'm running a Windows application on Linux under Wine that accesses SQL Server using the ODBC Driver 17 for SQL Server for Linux.
It runs fine except that varchar values containing non-ASCII characters are displayed incorrectly. The nvarchar fields (Unicode strings) have no problem.
Example:
select rtrim('Presentación ')
Returns: Presentación
My database has the encoding for varchars defined as iso8859-1, and Wine seems to use the cp1252 code page.
My guess is that the ODBC driver for Linux retrieves the data correctly and converts it to UTF-8, which works fine (I can see the values correctly if I run my queries directly through isql), but when those strings are passed to my application under Wine they are interpreted as cp1252, and that's when I see them incorrectly.
Has anyone had the same problem? What could I try?
Thank you.
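For what it's worth, the mismatch the asker describes is easy to reproduce outside ODBC. A minimal Java sketch showing that UTF-8 bytes reinterpreted as cp1252 produce exactly the corruption shown above:

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        String original = "Presentación";
        // Encode as UTF-8, as the Linux ODBC driver is suspected to do...
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);
        // ...then decode those bytes as cp1252, as the Wine side presumably does
        String garbled = new String(utf8Bytes, Charset.forName("windows-1252"));
        System.out.println(garbled); // prints "Presentación"
    }
}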

JDBC getNString()

I'm working on a Java web application backed by Microsoft SQL Server. I have used nvarchar columns, and my plan is to support Unicode characters. So in my JDBC layer I used the getNString() method of the result set, and it works fine. However, just out of curiosity, I changed all the getNString() calls to plain getString() calls, and they also display Unicode characters correctly.
I found a similar observation in the question below as well:
Should I be using JDBC getNString() instead of getString()?
Do you guys have any idea about this?
The presence of getNString and setNString is - in my opinion - a design mistake in JDBC. However, database systems that distinguish between (VAR)CHAR and N(VAR)CHAR can take this setter as a type hint to send the data in their specific format for N(VAR)CHAR. For getters there will usually be no difference, as in most drivers the data will already have been fetched before this method can be called, and the driver should know the proper conversion.
Specifically for the Microsoft SQL Server JDBC driver, with the default configuration there is no difference between using setString or setNString: both will lead to the values being sent as Unicode. This changes when the connection property sendStringParametersAsUnicode is set to false.
See also NVARCHAR Support in Type 2 SQL Server 2008 JDBC Driver:
You do not need to use the JDBC 4 API to work with NVARCHAR (UCS-2) data in SQL Server. Existing JDBC 3 APIs such as getString/setString/updateString are used to get/set/update Unicode values as Java Strings (which are always Unicode). The only thing to be aware of when using the JDBC 3 methods is the setting of the sendStringParametersAsUnicode property. This property is set to 'true' by default, which means that PreparedStatement and CallableStatement parameters are sent to SQL Server in Unicode. Changing the setting to 'false' allows the driver to save space in the protocol by converting input parameter values to the character set of the database when sending them.
That said, using the JDBC 4 API to work with NVARCHAR data has some advantages. The main advantage is formal recognition of the NVARCHAR type by the JDBC API, including accessor methods such as getNString/setNString/updateNString. For example, the setNString method can be used to send input parameter values to SQL Server in Unicode, even if sendStringParametersAsUnicode is set to 'false'. Put another way, with the default setting of sendStringParametersAsUnicode=true, the JDBC 4 'N' methods behave the same way as the JDBC 3/JDBC 4 non-'N' methods with respect to NVARCHAR data.
For more information on the sendStringParametersAsUnicode connection property, see Setting the Connection Properties.
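To make the difference concrete, here is a short sketch against the Microsoft JDBC driver; the URL, credentials and the people table are illustrative assumptions, not taken from the question:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class NStringExample {
    public static void main(String[] args) throws Exception {
        // With sendStringParametersAsUnicode=false the driver saves protocol
        // space, but plain setString parameters are sent in the database's
        // character set rather than as Unicode
        String url = "jdbc:sqlserver://localhost;databaseName=TestDb;"
                   + "sendStringParametersAsUnicode=false";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO people (name) VALUES (?)")) {
            // May lose characters outside the database collation's code page
            ps.setString(1, "Łódź");
            ps.executeUpdate();
            // Still sent as Unicode (NVARCHAR), regardless of the property
            ps.setNString(1, "Łódź");
            ps.executeUpdate();
        }
    }
}

With the default sendStringParametersAsUnicode=true, the two setters behave identically, which is why the asker saw no difference between getString and getNString.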

Character set mismatch on Linux with ODBC to SQL Server

I've got a funny issue trying to insert non-ASCII characters into a SQL Server database using the Microsoft ODBC driver for Linux. The problem is that it seems to assume different character sets when sending and receiving data. For info, the server collation is set to Latin1_General_CI_AS (I'm only trying to insert European accented characters).
Testing with tsql (which came with FreeTDS), everything is fine. On startup, it outputs the following:
locale is "en_GB.utf8"
locale charset is "UTF-8"
using default charset "UTF-8"
I can both insert and select a non-ASCII value into a table.
However, with my own utility, which uses the ODBC API, it's not working. When I run a select query, the data comes back in the UTF-8 character set as desired. However, if I insert UTF-8 characters, they get corrupted.
SQL > update test set a = 'Béthune';
Running SQL: update test set a = 'Béthune'
Query executed OK: 1 affected rows
SQL > select * from test;
Running SQL: select * from test
+------------+
| a |
+------------+
| Béthune |
+------------+
If I instead insert the data encoded in ISO-8859-1, that works correctly; however, the select query will still return it encoded in UTF-8!
I've already got the locale set to en_GB.utf8, and a client charset of UTF-8 in the database connection details. Aargh!
FWIW I seem to be getting the same problem whether I use the FreeTDS driver or the official Microsoft driver.
EDIT: I just realised one relevant point: this test program isn't using a prepared statement with bound variables. In other words, the update SQL is passed directly into the SQLPrepare call. Something in ODBC is definitely doing an iconv translation, but evidently not to the correct character set!
#0 0x0000003d4c41f850 in iconv () from /lib64/libc.so.6
#1 0x0000003d4d83fd94 in ?? () from /usr/lib64/libodbc.so.2
#2 0x0000003d4d820465 in SQLPrepare () from /usr/lib64/libodbc.so.2
I'll try compiling my own UnixODBC to see better what's going on.
EDIT 2: I've built UnixODBC from source to debug what it's doing, and the problem is that nl_langinfo(CODESET) reports back ISO-8859-1. That is strange, since its man page says it returns the same string you get from locale charmap, which returns UTF-8. I'm guessing that's the problem, but I'm still not sure how to solve it.
A colleague at work has just figured out the solution, for FreeTDS at least.
For a direct driver connection (SQLDriverConnect()), adding ClientCharset=UTF-8;ServerCharset=CP1252; to the connection string fixed the problem.
For a connection via the driver manager (SQLConnect()), I can add these lines to the connection settings in odbc.ini:
client charset = UTF-8
server charset = CP1252
Can't yet figure out a solution using the Microsoft driver ...
A solution for the Microsoft ODBC driver might be to set a proper value in the LANG environment variable.
Make sure you have the required locale installed and configured, and that the LANG environment variable is set correctly for the user your application runs under. This can be tricky for daemons. For example, to make it work for PHP under Apache2 I had to add export LANG=en_US.utf8 to /etc/apache2/envvars.

Using nvarchar cfsqltype in ColdFusion with the jTDS JDBC driver

Firstly, my config is:
Language: ColdFusion 10 (with update 11 installed)
DB: MS SQL Server 2012
Driver: jTDS JDBC (tried versions 1.2.6, 1.2.8 and 1.3.0)
I'm currently having a problem running queries where I use cfqueryparam with a cfsqltype of cf_sql_nvarchar. The problem is that the page just hangs. If I look at the ColdFusion application log, I see the error:
"net.sourceforge.jtds.jdbc.JtdsPreparedStatement.setNString(ILjava/lang/String;)V The specific sequence of files included or processed is:" followed by the test filename.
I'm running a very basic select query on an nvarchar column, but the page doesn't load and that error is logged.
I know it's got to be something to do with the jTDS driver, as if I connect through the regular SQL Server driver it works perfectly.
So did anybody experience this before? If so, what was your resolution?
Thanks
I did a quick search, and the results suggest jTDS does not support setNString(). I checked the driver source for 1.3.1, and as mentioned in the comments here, the method is not implemented:
"..while getNString is implemented the code just consists of // TODO
Auto-generated method stub and throw new AbstractMethodError();.."
So it sounds like you may need to use cf_sql_varchar, combined with the "String Format" setting, like in previous versions. Obviously, the other option is to use a different driver (one that does support setNString(), such as Adobe's driver or the MS SQL Server driver).
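If switching drivers is not an option, one conceivable workaround at the JDBC level is to fall back to the JDBC 3 setter when the driver balks. A rough Java sketch, not ColdFusion-specific; it relies on jTDS throwing AbstractMethodError from its stub, as quoted above:

import java.sql.PreparedStatement;
import java.sql.SQLException;

public class NStringFallback {
    // Try the JDBC 4 setter first; jTDS 1.3.x declares setNString but the
    // body is an auto-generated stub that throws AbstractMethodError
    static void setUnicodeParam(PreparedStatement ps, int index, String value)
            throws SQLException {
        try {
            ps.setNString(index, value);
        } catch (AbstractMethodError e) {
            // The driver never implemented the JDBC 4 method; setString is
            // still sent as Unicode under jTDS's default configuration
            ps.setString(index, value);
        }
    }
}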
Try using cf_sql_varchar; cf_sql_nvarchar is not a valid option according to the documentation.

SQL Server 2000 charset issues

Once again with the charset issues when talking to DB's :)
I have two environments running Zend Server. Both of these communicate with a SQL Server 2000 using the mssql extension. Neither of them has a value set for the charset in the extension's settings. For one it works, and for the other it returns data in the wrong encoding.
The problem became apparent when this data was being inserted into a MySQL database, which failed with SQLSTATE[HY000]: General error: 1366 Incorrect string value: '\xF6m' for column 'cust_lastname' at row 1.
I tried using SET NAMES utf8 to get the SQL Server connection to return the correct data, but it complains that NAMES is not a recognized SET statement. Looking around, most people even recommend using this, but it doesn't seem to be part of SQL Server 2000 :)
So, what should I do? How do I, WITHOUT fiddling with the SQL Server database/tables, tell it to send me the data in UTF-8 encoded format?
EDIT:
Some more info...
SQL Server uses the Finnish_Swedish_CI_AS collation
MySQL has every table in UTF-8 format and uses utf8_unicode_ci
I didn't find a good solution and ended up converting to and from UTF-8 in my application. If this is encapsulated within a class, it doesn't clutter the code. But a way to actually tell SQL Server which encoding to use during communication would be better.
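The original code was PHP, but the conversion idea is language-agnostic. A rough Java sketch of the same round-trip, assuming the server side is cp1252 (which is what the Finnish_Swedish_CI_AS collation maps to):

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetBridge {
    // Bytes fetched over a non-Unicode-aware connection arrive in the
    // server's code page (assumed cp1252 here)
    static String fromServer(byte[] raw) {
        return new String(raw, Charset.forName("windows-1252"));
    }

    // Re-encode as UTF-8 before handing the value to the UTF-8 side
    static byte[] toUtf8(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] fromDb = { (byte) 0xF6, 'm' }; // the '\xF6m' from the error above
        String decoded = fromServer(fromDb);
        System.out.println(decoded); // prints "öm", properly decoded
    }
}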