RODBC - MSSQL response - character problems

I'm trying to query a table of my MS SQL Server from R and work with the data. Somewhere along the way some of my characters are apparently lost or transformed. What am I doing wrong?
R code for querying the data:
library("RODBC")
dbhandle <- odbcConnect("Local MSSQL db", DBMSencoding= "windows-1252")
response <- sqlQuery(dbhandle, "select NEM from databasename.dbo.tablename")
I tried omitting the DBMSencoding parameter, as well as setting it to utf-8, windows-1250 and windows-1251 with no success.
When I write the result to a csv and view it (without any transformation afterwards), the accented characters are missing.
(I am aware that RStudio has limited capability in displaying Unicode characters, so I'm verifying the success of the query by writing the data to a csv)

Related

Pandas read_sql changing large number IDs when reading

I transferred an Oracle database to SQL Server and all seems to have gone well. The various ID columns are large numbers, so I had to use Decimal as they were too large for BigInt.
I am now trying to read the data using pandas.read_sql with a pyodbc connection and ODBC Driver 17 for SQL Server: df = pandas.read_sql("SELECT * FROM table1", con)
The numbers come out as float64, and when I try to print them or use them in SQL statements they appear in scientific notation. When I try '{:.0f}'.format(df.loc[i,'Id']), it turns several numbers into the same number, such as 90300111000003078520832. It is like precision is lost when the values go to scientific notation.
I also tried pd.options.display.float_format = '{:.0f}'.format before the read_sql, but this did not help.
Clearly I must be doing something wrong as the Ids in the database are correct.
Any help is appreciated. Thanks.
pandas' read_sql method has an option named coerce_float which defaults to True and it …
Attempts to convert values of non-string, non-numeric objects (like decimal.Decimal) to floating point, useful for SQL result sets.
However, in your case it is not useful, so simply specify coerce_float=False.
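For example (a minimal sketch; the server, database and table names are placeholders for your own setup), passing coerce_float=False keeps the DECIMAL columns as decimal.Decimal objects, so the full IDs survive:
import pandas
import pyodbc

# placeholder connection details -- substitute your own server/database
con = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;"
)

# coerce_float=False leaves DECIMAL values as decimal.Decimal instead of float64
df = pandas.read_sql("SELECT * FROM table1", con, coerce_float=False)

print(df.loc[0, 'Id'])        # full digits, no scientific notation
print(type(df.loc[0, 'Id']))  # <class 'decimal.Decimal'>
The Id column then has dtype object rather than float64, which is fine for identifiers you only compare or format as strings.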
I've had this problem too, especially when working with long IDs: read_sql works fine for the primary key, but not for other columns (like the retweeted_status_id from Twitter API calls). Setting coerce_float to False does nothing for me, so instead I cast retweeted_status_id to a character format in my SQL query.
Using psql, I do:
df = pandas.read_sql("SELECT *, Id::text FROM table1", con)
But in SQL Server it'd be something like
df = pandas.read_sql("SELECT *, CONVERT(text, Id) FROM table1", con)
or
df = pandas.read_sql("SELECT *, CAST(Id AS varchar) FROM table1", con)
Obviously there's a cost here if you're casting many rows, and a more efficient option might be to pull from SQL Server without using pandas (as a nested list, JSON, or something else), which will also preserve your long integer values.
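If you do go the no-pandas route, a plain pyodbc cursor already hands DECIMAL columns back as decimal.Decimal, so precision is preserved without any casting. A sketch, assuming the same ODBC Driver 17 connection and the hypothetical table1/Id from above:
import pyodbc

con = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myserver;DATABASE=mydb;Trusted_Connection=yes;"
)
cursor = con.cursor()
# fetchall() gives a nested list of rows; DECIMAL values arrive as decimal.Decimal
rows = cursor.execute("SELECT Id FROM table1").fetchall()
ids = [str(row[0]) for row in rows]  # exact digits, no float conversion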

SQLAlchemy Truncating Strings On Import From MS SQL

First off this is my setup:
Windows 7
MS SQL Server 2008
Python 3.6 Anaconda Distribution
I am working in a Jupyter notebook and trying to import a column of data from a MS SQL Server database using SQLAlchemy. The column in question contains cells which store long strings of text (datatype is nvarchar(max)). This is my code:
engine = create_engine('mssql+pyodbc://user:password@server:port/db_name?driver=SQL+Server+Native+Client+11.0')
stmt = 'SELECT componenttext FROM TranscriptComponent WHERE transcriptId=1265293'
connection = engine.connect()
results = connection.execute(stmt).fetchall()
This executes fine and imports a list of strings. However, when I examine the strings, they are truncated, and in the middle of each string the following message seems to have been inserted:
... (8326 characters truncated) ...
The number of characters varies from string to string. I checked how long the imported strings are, and the truncated ones all come out at either 339 or 340 characters.
Is this a limitation in SQLAlchemy, Python or something else entirely?
Any help appreciated!
Same problem here!
Setup:
Windows Server 2012
MS SQL Server 2016/PostgreSQL 10.1
Python 3.6 Anaconda Distribution
I've tested everything I could, but can't get past this 33x-character limit on field length. Both varchar and text columns seem to be affected, and the DBMS/driver doesn't seem to have any influence.
EDIT:
Found the source of the "problem": https://bitbucket.org/zzzeek/sqlalchemy/issues/2837
Seems like fetchall() is affected by this feature.
The only workaround I found was:
empty_list = []
connection = engine.connect()
results = connection.execute(stmt)
# iterate the result and pull the column value out of each row
for row in results:
    empty_list.append(row['componenttext'])
This way I haven't found any truncation in my long string field (>3000 characters).
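For what it's worth, the linked issue appears to concern values being shortened in the printed repr of the rows rather than in the data itself, so a quick length check on the collected strings (reusing empty_list from the snippet above) can confirm whether anything was actually lost:
# if only the display was truncated, the stored strings keep their full length
print(max(len(text) for text in empty_list))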

RODBC ERROR: 'Calloc' could not allocate memory

I am setting up a SQL Azure database. I need to write data into the database on a daily basis. I am using 64-bit R version 3.3.3 on Windows 10. Some of the columns contain text (more than 4000 characters). Initially, I imported some data from a csv into the SQL Azure database using Microsoft SQL Server Management Studio. I set up the text columns as ntext, because when I tried nvarchar the maximum was 4000 and some of the values got truncated even though they were only about 1100 characters long.
In order to append to the database, I first save the records to a temp table for which I have predefined the varTypes:
varTypesNewFile <- c("Numeric", rep("NTEXT", ncol(newFileToAppend) - 1))
names(varTypesNewFile) <- names(newFileToAppend)
sqlSave(dbhandle, newFileToAppend, "newFileToAppendTmp", rownames = F, varTypes = varTypesNewFile, safer = F)
and then append them by using:
insert into mainTable select * from newFileToAppendTmp
If the text is not too long, the above does work. However, sometimes I get the following error during the sqlSave command:
Error in odbcUpdate(channel, query, mydata, coldata[m, ], test = test, :
'Calloc' could not allocate memory (1073741824 of 1 bytes)
My questions are:
How can I counter this issue?
Is this the format I should be using?
Additionally, even when the above works, it takes about an hour to upload about 5k records. Isn't that too long? Is this the normal amount of time it should take? If not, what could I do better?
RODBC is very old, and can be a bit flaky with NVARCHAR columns. Try using the RSQLServer package instead, which offers an alternative means to connect to SQL Server (and also provides a dplyr backend).

Automatic character encoding handling in Perl / DBI / DBD::ODBC

I'm using Perl with DBI / DBD::ODBC to retrieve data from an SQL Server database, and have some issues with character encoding.
The database has a default collation of SQL_Latin1_General_CP1_CI_AS, so data in varchar columns is encoded in Microsoft's version of Latin-1, AKA windows-1252.
There doesn't seem to be a way to handle this transparently in DBI/DBD::ODBC. I get data back still encoded as windows-1252; for instance, €, “ and ” are encoded as the bytes 0x80, 0x93 and 0x94. When I write those to a UTF-8 encoded XML file without decoding them first, they are written as the Unicode characters U+0080, U+0093 and U+0094 instead of U+20AC, U+201C and U+201D, which is obviously not correct.
My current workaround is to call $val = Encode::decode('windows-1252', $val) on every column after every fetch. This works, but hardly seems like the proper way to do this.
Isn't there a way to tell DBI or DBD::ODBC to do this conversion for me?
I'm using ActivePerl (5.12.2 Build 1202), with DBI (1.616) and DBD::ODBC (1.29) provided by ActivePerl and updated with ppm; running on the same server that hosts the database (SQL Server 2008 R2).
My connection string is:
dbi:ODBC:Driver={SQL Server Native Client 10.0};Server=localhost;Database=$DB_NAME;Trusted_Connection=yes;
Thanks in advance.
DBD::ODBC (and the ODBC API) does not know the character set of the underlying column, so DBD::ODBC cannot do anything with the 8-bit data returned; it can only return it as it is, and you need to know what it is and decode it. If you bind the columns as SQL_WCHAR/SQL_WVARCHAR, the driver/SQL Server should translate the characters to UCS-2 and DBD::ODBC should see the columns as SQL_WCHAR/SQL_WVARCHAR. When DBD::ODBC is built in Unicode mode, SQL_WCHAR columns are treated as UCS-2, decoded and re-encoded as UTF-8, and Perl should see them as Unicode characters.
You need to set SQL_WCHAR as the bind type after bind_columns, as bind types are not sticky like parameter types.
If you want to continue reading your varchar data, which is windows-1252, as bytes, then currently you have no choice but to decode it. I'm not in a rush to add something to DBD::ODBC to do this for you, since this is the first time anyone has mentioned it to me. You might want to look at DBI callbacks, as decoding the returned data might be more easily done there (say, in the fetch method).
You might also want to investigate the "Perform Translation for character data" setting in newer SQL Server ODBC Drivers although I have little experience with it myself.

How can I handle non-ASCII characters when retrieving data from SQL Server using Perl?

I have a Perl script running on UNIX that uses DBI to connect to and retrieve data from a SQL Server database. The script looks like the following:
$dbh = DBI->connect("dbi:Sybase:server=$connect;charset=UTF-8", $login, $password) or die("Couldn't connect to $connect as $login/$password:
$DBI::errstr");
$sql = "use mydb";
$sth = $dbh->prepare($sql);
$sth->execute or die("execute failed");
$sth->finish;
$sql = "MyProc \#DATE='1/1/2008'";
$sth = $dbh->prepare($sql);
$sth->execute or die("execute failed");
while (($body) = $sth->fetchrow()) {
    print "$body\n";
}
$sth->finish;
$dbh->disconnect if $dbh;
The body variable retrieves data from a column that is NVARCHAR and contains non-ASCII characters. The query runs fine, but the print statement spits out ????? when it encounters a non-ASCII character. In DBI->connect I even specify the character set, but no luck.
Any thoughts on how I can get this to work?
Your code looks OK.
I've no reason to believe that what you're putting into the database and subsequently retrieving isn't still in UTF-8 encoding.
Have you confirmed that the terminal on which you're printing the data is actually in UTF-8 mode?
Oh, how many hours I've wasted chasing non-existent bugs based on what I saw or didn't see when I printed data to my terminal. There are several ways to verify your data that aren't affected by non-printing characters and don't depend on your system's display being able to map the correct glyphs to non-ASCII character codes. If your data don't look right, dump them into a file and browse it with a hex editor, or run them through the od utility.
I've connected Perl to SQL Server via FreeTDS + ODBC and have had no problem with character encodings. Maybe the Sybase DBI is the culprit here...
