First off, this is my setup:
Windows 7
MS SQL Server 2008
Python 3.6 Anaconda Distribution
I am working in a Jupyter notebook and trying to import a column of data from an MS SQL Server database using SQLAlchemy. The column in question stores long strings of text (datatype nvarchar(max)). This is my code:
from sqlalchemy import create_engine

engine = create_engine('mssql+pyodbc://user:password@server:port/db_name?driver=SQL+Server+Native+Client+11.0')
stmt = 'SELECT componenttext FROM TranscriptComponent WHERE transcriptId=1265293'
connection = engine.connect()
results = connection.execute(stmt).fetchall()
This executes fine and imports a list of strings. However, when I examine the strings they appear truncated, and in the middle of each string the following message seems to have been inserted:
... (8326 characters truncated) ...
The number of characters varies from string to string. I checked the lengths of the imported strings, and the truncated ones are all capped at either 339 or 340 characters.
Is this a limitation in SQLAlchemy, Python or something else entirely?
Any help appreciated!
Same problem here!
Setup:
Windows Server 2012
MS SQL Server 2016/PostgreSQL 10.1
Python 3.6 Anaconda Distribution
I've tested everything I could, but can't get past this 33x-character limit on field length. Both varchar and text columns seem to be affected, and the DBMS/driver doesn't seem to make any difference.
EDIT:
Found the source of the "problem": https://bitbucket.org/zzzeek/sqlalchemy/issues/2837
It seems fetchall() is affected by this feature: the truncation happens in the printed representation of the result rows rather than in the data itself.
The only workaround I found was:
empty_list = []
connection = engine.connect()
results = connection.execute(stmt)
for row in results:
    empty_list.append(row['componenttext'])
This way I haven't found any truncation in my long string field (>3000 characters).
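As a quick sanity check (a minimal sketch, reusing the engine and stmt from the snippets above), you can read the column values directly and look at their lengths instead of relying on the printed representation of the rows:

# Sanity check: collect the plain strings and inspect their real lengths.
# Assumes `engine` and `stmt` are defined as in the snippets above.
with engine.connect() as connection:
    texts = [row['componenttext'] for row in connection.execute(stmt)]

for text in texts:
    # len() reports the actual length of the Python string; a genuine
    # truncation at ~340 characters would show up here.
    print(len(text))

If the printed lengths are in the thousands, the data arrived intact and only the display was shortened.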
Related
When I connect to the database (DB2) to check the values in the tables, columns containing special characters display the raw UTF-8 bytes as separate characters (mojibake) instead of the text itself. I expected to see the correct value: Tükörfúrógép.
I am still able to handle the value properly, but is there any configuration in the db that I am missing to display the value properly when checking the table?
More Info:
Connected to the DB with IntelliJ and also tried with DbVisualizer.
The following JDBC connection string was used in IntelliJ:
jdbc:db2://(...)?characterEncoding=UTF-8;
Tried both with and without the characterEncoding parameter, getting the same results.
DB Version: v11 LUW
JDBC: com.ibm.db2.jcc -- db2jcc4 -- Version 10.5
Encoding being used: UTF-8
db2 "select char(value,10), char(name,10) from sysibmadm.dbcfg where
name like 'code%'"
1 2
---------- ---------- 1208 codepage UTF-8 codeset
2 record(s) selected.
UPDATE 1:
I was able to insert values with special characters directly into the database, so I'm starting to think this is not a missing DB2 configuration but maybe a JDBC or other client-side issue.
The HEX representation of the string Tükörfúrógép in a UTF-8 database should be:
54C3BC6BC3B67266C3BA72C3B367C3A970
But you have the following instead, with repeated garbage bytes:
54C383C2BC6BC383C2B67266C383C2BA72C383C2B367C383C2A970
You may try to remove such byte sequences manually with the statements below, but it's better to understand the root cause of how this garbage got into the column.
VALUES REPLACE (x'54C383C2BC6BC383C2B67266C383C2BA72C383C2B367C383C2A970', x'83C2', '');
SELECT REPLACE (TOWN, x'83C2', '') FROM ...;
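For what it's worth, that byte pattern matches a classic double UTF-8 encoding: the text was encoded as UTF-8 once, the resulting bytes were then treated as Latin-1 characters and encoded as UTF-8 a second time. A minimal Python sketch (my own illustration, not part of the original answer) that reproduces and reverses the pattern:

original = 'Tükörfúrógép'

# Correct single UTF-8 encoding: 54C3BC6B...
print(original.encode('utf-8').hex().upper())

# Double encoding: UTF-8 bytes mis-read as Latin-1, then UTF-8 encoded again.
# This reproduces the 54C383C2BC... pattern found in the column.
bad = original.encode('utf-8').decode('latin-1').encode('utf-8')
print(bad.hex().upper())

# Repair: decode as UTF-8, re-encode as Latin-1, decode as UTF-8 again.
print(bad.decode('utf-8').encode('latin-1').decode('utf-8'))  # Tükörfúrógép

So the data was most likely already double-encoded by the client that inserted it, which fits the observation in UPDATE 1 that the database itself accepts special characters fine.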
I'm trying to query a table in my MS SQL Server database from R and work with the data. Somewhere along the way some of my characters are apparently lost or transformed. What am I doing wrong?
R code for querying the data:
library("RODBC")
dbhandle <- odbcConnect("Local MSSQL db", DBMSencoding = "windows-1252")
response <- sqlQuery(dbhandle, "select NEM from databasename.dbo.tablename")
I tried omitting the DBMSencoding parameter, as well as setting it to utf-8, windows-1250 and windows-1251 with no success.
When I write the result to a csv (without any transformation afterwards) and view it, the accented characters are gone.
(I am aware that RStudio has limited capability in displaying Unicode characters, so I'm verifying the success of the query by writing the data to a csv)
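One way to narrow down where the characters get lost is to look at the raw bytes on the server side, bypassing the client's decoding entirely. A minimal diagnostic sketch in Python/pyodbc (the DSN, table and column names below are taken from the question; everything else is an assumption):

import pyodbc

# Same DSN as used from R; adjust credentials as needed.
conn = pyodbc.connect('DSN=Local MSSQL db')
cursor = conn.cursor()

# CAST to varbinary returns the stored bytes untouched, so no client-side
# decoding can mangle them. The hex shows which encoding the column really
# holds (UTF-16LE for nvarchar, the collation's code page for varchar).
cursor.execute(
    "SELECT NEM, CAST(NEM AS varbinary(200)) FROM databasename.dbo.tablename"
)
for value, raw in cursor.fetchmany(5):
    print(repr(value), raw.hex())

If the bytes are correct on the server, the problem is purely in how the client decodes them.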
I am setting up a SQL Azure database. I need to write data into the database on a daily basis. I am using 64-bit R version 3.3.3 on Windows 10. Some of the columns contain text (more than 4000 characters). Initially, I imported some data from a csv into the SQL Azure database using Microsoft SQL Server Management Studio. I set up the text columns as ntext, because when I tried nvarchar the maximum was 4000 and some of the values got truncated even though they were about 1100 characters long.
In order to append to the database I first save the records into a temp table, where I have predefined the varTypes:
varTypesNewFile <- c("Numeric", rep("NTEXT", ncol(newFileToAppend) - 1))
names(varTypesNewFile) <- names(newFileToAppend)
sqlSave(dbhandle, newFileToAppend, "newFileToAppendTmp", rownames = F, varTypes = varTypesNewFile, safer = F)
and then append them by using:
insert into mainTable select * from newFileToAppendTmp
If the text is not too long, the above does work. However, sometimes I get the following error during the sqlSave command:
Error in odbcUpdate(channel, query, mydata, coldata[m, ], test = test, :
'Calloc' could not allocate memory (1073741824 of 1 bytes)
My questions are:
How can I counter this issue?
Is this the format I should be using?
Additionally, even when the above works, it takes about an hour to upload about 5k records. Isn't that too long? Is this the normal amount of time it should take? If not, what could I do better?
RODBC is very old, and can be a bit flaky with NVARCHAR columns. Try using the RSQLServer package instead, which offers an alternative means to connect to SQL Server (and also provides a dplyr backend).
I have a big problem when I try to save an object that's bigger than 400KB in a varbinary(max) column, calling ODBC from C++.
Here's my basic workflow of calling SQLPrepare, SQLBindParameter, SQLExecute and SQLPutData (the last one several times):
SQLPrepare:
StatementHandle 0x019141f0
StatementText "UPDATE DT460 SET DI024543 = ?, DI024541 = ?, DI024542 = ? WHERE DI006397 = ? AND DI008098 = ?"
TextLength 93
Binding of first parameter (BLOB field):
SQLBindParameter:
StatementHandle 0x019141f0
ParameterNumber 1
InputOutputType 1
ValueType -2 (SQL_C_BINARY)
ParameterType -4 (SQL_LONGVARBINARY)
ColumnSize 427078
DecimalDigits 0
ParameterValPtr 1
BufferLength 4
StrLenOrIndPtr -427178 (result of SQL_LEN_DATA_AT_EXEC(427078))
SQLExecute:
StatementHandle 0x019141f0
Attempt to save blob in chunks of 32K by calling SQLPutData a number of times:
SQLPutData:
StatementHandle 0x019141f0
DataPtr address of a std::vector with 32768 chars
StrLen_or_Ind 32768
During the very first SQLPutData-operation with the first 32KB of data, I get the following SQL Server error:
[HY000][Microsoft][ODBC SQL Server Driver]Warning: Partial insert/update. The insert/update of a text or image column(s) did not succeed.
This always happens when I try to save an object larger than 400KB. Saving something smaller than 400KB works just fine.
I found out the critical parameter is ColumnSize of SQLBindParameter. The StrLenOrIndPtr parameter passed to SQLBindParameter can have lower values (like 32K); it still results in the same error.
But according to the SQL Server ODBC documentation, I don't see why this should be a problem as long as I call SQLPutData with chunks smaller than 32KB.
Does anyone have an idea what the problem could be?
Any help would be greatly appreciated.
OK, I just found out this was actually an SQL driver problem!
After installing the newest version of Microsoft® SQL Server® 2012 Native Client (from http://www.microsoft.com/de-de/download/details.aspx?id=29065), saving bigger BLOBs works with exactly the parameters shown above.
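If you want a quick cross-check that the updated driver and the column really accept values above 400KB, here is a rough Python/pyodbc sketch (the UPDATE statement is the one from the question; the connection string and the key values 1/1 are placeholders):

import os
import pyodbc

# Placeholder connection string; adjust server, database and credentials.
conn = pyodbc.connect(
    'DRIVER={SQL Server Native Client 11.0};SERVER=server;DATABASE=db;'
    'UID=user;PWD=password'
)
cursor = conn.cursor()

blob = os.urandom(500 * 1024)  # ~500KB, above the 400KB mark

# Column and key names taken from the UPDATE statement in the question;
# the key values are placeholders.
cursor.execute(
    "UPDATE DT460 SET DI024543 = ? WHERE DI006397 = ? AND DI008098 = ?",
    blob, 1, 1,
)
conn.commit()

row = cursor.execute(
    "SELECT DI024543 FROM DT460 WHERE DI006397 = ? AND DI008098 = ?", 1, 1
).fetchone()
print(len(row[0]) == len(blob))  # True if the full blob round-tripped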
I've managed to connect from SBCL running on Debian to an SQL Server 2000 instance over the network using FreeTDS/unixODBC.
I can actually get data back from the server, so all is working.
However, many of the columns trigger what seem to be unsupported data type errors, such as:
The value 2147483647 is not of type FIXNUM.
and
-11 fell through ECASE expression.
Wanted one of (-7 -6 -2 -3 -4 93 92 91 11 10 ...).
Does anyone with experience using CLSQL against SQL Server have any ideas?
This (error with 2147483647) occurs because the FreeTDS driver doesn't handle OLEDB BLOBs so well.
You have to issue the following SQL command to make it work:
set textsize 102400
You can see the FreeTDS FAQ entry here. Excerpt:
The text data type is different from char and varchar types. The maximum data length of a text column is governed by the textsize connection option. Microsoft claims in their documentation to use a default textsize of 4000 characters, but in fact their implementation is inconsistent. Sometimes text columns are returned with a size of 4 GB!
The best solution is to make sure you set the textsize option to a reasonable value when establishing a connection.
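The question is about CLSQL, but the same fix applies to any client going through FreeTDS. For illustration, a rough Python/pyodbc sketch (the DSN, table and column names are hypothetical) that sets textsize right after connecting:

import pyodbc

# Hypothetical FreeTDS/unixODBC DSN; adjust to your setup.
conn = pyodbc.connect('DSN=sqlserver2000;UID=user;PWD=password')
cursor = conn.cursor()

# Raise the per-connection limit so text/image columns come back in full
# instead of being cut off at the driver's default.
cursor.execute("SET TEXTSIZE 102400")

cursor.execute("SELECT some_text_column FROM some_table")
print(cursor.fetchone())

With CLSQL itself, the equivalent is to issue the same set textsize statement as a plain SQL command right after opening the connection.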
As for the ECASE error, I haven't really solved it, but I have worked around it by converting the timestamp column to a binary value and the uniqueidentifier to a varchar(36).