RJDBC limiting rows from Netezza

I have an RJDBC connection to Netezza. Queries that should return more than 256 rows are getting truncated to 256 rows. I have tested the queries in SQuirrel and they work fine (return the correct number of rows - 600+).
I have also tried the following:
dbFetch(res, n=-1)
dbFetch(res, n=1000)
dbSendQuery(conn, "select ...", believeNRows=FALSE)
All of these still return only the first 256 rows. I am on a Mac, so ODBC is not an option.

Upgrading the JDBC driver from version 5 to version 7 resolved the issue.

Related

pyodbc: Memory Error using fast_executemany with TEXT / NTEXT columns

I'm having an issue with inserting rows into a database. Just wondering if anyone has any ideas why this is happening? It works when I avoid using fast_executemany but then inserts become very slow.
import pyodbc

driver = 'ODBC Driver 17 for SQL Server'
conn = pyodbc.connect('DRIVER=' + driver + ';SERVER=' + server +
                      ';UID=' + user + ';PWD=' + password)
cursor = conn.cursor()
cursor.fast_executemany = True
insert_sql = """
INSERT INTO table (a, b, c)
VALUES (?, ?, ?)
"""
cursor.executemany(insert_sql, insert_params)
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-12-e7e82e4d8c2d> in <module>
2 start_time = time.time()
3
----> 4 cursor.executemany(insert_sql, insert_params)
MemoryError:
There is a known issue with fast_executemany when working with TEXT or NTEXT columns, as described on GitHub here.
The problem is that when pyodbc queries the database metadata to determine the maximum size of the column the driver returns 2 GB (instead of 0, as would be returned for a [n]varchar(max) column).
pyodbc allocates 2 GB of memory for each [N]TEXT element in the parameter array, and the Python app quickly runs out of memory.
The workaround is to use cursor.setinputsizes([(pyodbc.SQL_WVARCHAR, 0, 0)]) (as described here) to coax pyodbc into treating [N]TEXT columns like [n]varchar(max) columns.
(Given that [N]TEXT is a deprecated column type for SQL Server it is unlikely that there will be a formal fix for this issue.)
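For reference, here is a minimal sketch of that workaround in context (the connection string, table and column names below are placeholders, not taken from the question):
import pyodbc

# Placeholder connection details, table and column names
conn = pyodbc.connect(
    'DRIVER={ODBC Driver 17 for SQL Server};'
    'SERVER=my_server;DATABASE=my_db;UID=my_user;PWD=my_password'
)
cursor = conn.cursor()
cursor.fast_executemany = True

# Treat the [N]TEXT parameter like nvarchar(max) (SQL_WVARCHAR, size 0)
# so pyodbc does not allocate 2 GB per element of the parameter array
cursor.setinputsizes([(pyodbc.SQL_WVARCHAR, 0, 0)])

insert_sql = "INSERT INTO my_table (big_text_column) VALUES (?)"
params = [("first long text value",), ("second long text value",)]
cursor.executemany(insert_sql, params)
conn.commit()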
While Gord Thompson's answer solved this issue for the OP, it's worth noting that fast_executemany can raise a MemoryError in circumstances other than the use of [N]TEXT columns.
In my case a MemoryError was thrown during an attempt to INSERT several million records at once, and as noted here, "parameter values are held in memory, so very large numbers of records (tens of millions or more) may cause memory issues". It doesn't necessarily require tens of millions to trigger, so YMMV.
An easy solution to this is to identify a sane number of records to batch per each execute. Here's an example if using a Pandas dataframe as a source (establish your insert_query as usual):
import math
import numpy as np

batch_size = 5000  # Desired number of records per executemany() call

with connection.cursor() as cursor:
    try:
        cursor.fast_executemany = True
        # np.array_split takes the number of chunks, not the chunk size,
        # so work out how many chunks of roughly batch_size rows are needed
        n_chunks = max(1, math.ceil(len(df) / batch_size))
        for chunk in np.array_split(df, n_chunks):
            cursor.executemany(insert_query, chunk.values.tolist())
        # Run a single commit at the end of the transaction
        connection.commit()
    except Exception:
        # Roll back on any exception
        connection.rollback()
        raise
Hope this helps anyone who hits this issue and doesn't have any [N]TEXT columns on their target!
In my case, the MemoryError occurred because I was using the very old 'SQL Server' driver. Switching to the newer 'ODBC Driver 17 for SQL Server' driver, as described in the link below, fixed it:
link
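For example, switching drivers is usually just a change to the DRIVER keyword in the connection string (server, database and credentials below are placeholders):
import pyodbc

conn_str = (
    'DRIVER={ODBC Driver 17 for SQL Server};'  # instead of 'DRIVER={SQL Server};'
    'SERVER=my_server;DATABASE=my_db;UID=my_user;PWD=my_password'
)
conn = pyodbc.connect(conn_str)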

SQLAlchemy Truncating Strings On Import From MS SQL

First off this is my setup:
Windows 7
MS SQL Server 2008
Python 3.6 Anaconda Distribution
I am working in a Jupyter notebook and trying to import a column of data from a MS SQL Server database using SQLAlchemy. The column in question contains cells which store long strings of text (datatype is nvarchar(max)). This is my code:
engine = create_engine('mssql+pyodbc://user:password@server:port/db_name?driver=SQL+Server+Native+Client+11.0')
stmt = 'SELECT componenttext FROM TranscriptComponent WHERE transcriptId=1265293'
connection = engine.connect()
results = connection.execute(stmt).fetchall()
This executes fine, and imports a list of strings. However when I examine the strings they are truncated, and in the middle of the strings the following message seems to have been inserted:
... (8326 characters truncated) ...
The number of characters varies from string to string. I checked the lengths of the imported strings, and the truncated ones are all capped at either 339 or 340 characters.
Is this a limitation in SQLAlchemy, Python or something else entirely?
Any help appreciated!
Same problem here!
Set up :
Windows Server 2012
MS SQL Server 2016/PostgreSQL 10.1
Python 3.6 Anaconda Distribution
I've tested everything I could, but can't get past this limit of roughly 340 characters on field length. Both varchar and text columns seem to be affected, and the DBMS/driver doesn't seem to make any difference.
EDIT:
Found the source of the "problem": https://bitbucket.org/zzzeek/sqlalchemy/issues/2837
It appears this feature shortens the printed repr of large row values, which is why the strings returned by fetchall() look truncated when inspected.
The only workaround I found was:
empty_list = []
connection = engine.connect()
results = connection.execute(stmt)
for row in results:
    empty_list.append(row['componenttext'])
This way I haven't seen any truncation in my long string field (>3000 characters).
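If the linked issue is indeed the cause, the data returned by fetchall() should also be intact; only the printed repr of each row is shortened. A quick way to check, using the column from the question above:
connection = engine.connect()
rows = connection.execute(stmt).fetchall()

# Pull the column value out of each row instead of printing the row objects;
# the repr may show "... (N characters truncated) ..." but the underlying
# string should be complete.
texts = [row['componenttext'] for row in rows]
print(len(texts[0]))  # full length, not 339/340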

RODBC ERROR: 'Calloc' could not allocate memory

I am setting up a SQL Azure database. I need to write data into the database on a daily basis. I am using 64-bit R version 3.3.3 on Windows 10. Some of the columns contain text (more than 4000 characters). Initially, I imported some data from a CSV into the SQL Azure database using Microsoft SQL Server Management Studio. I set up the text columns as ntext, because when I tried nvarchar the maximum was 4000 and some of the values got truncated even though they were about 1100 characters long.
In order to append to the database, I first save the records into a temp table, where I have predefined the varTypes:
varTypesNewFile <- c("Numeric", rep("NTEXT", ncol(newFileToAppend) - 1))
names(varTypesNewFile) <- names(newFileToAppend)
sqlSave(dbhandle, newFileToAppend, "newFileToAppendTmp", rownames = F, varTypes = varTypesNewFile, safer = F)
and then append them by using:
insert into mainTable select * from newFileToAppendTmp
If the text is not too long, the above does work. However, sometimes I get the following error during the sqlSave command:
Error in odbcUpdate(channel, query, mydata, coldata[m, ], test = test, :
'Calloc' could not allocate memory (1073741824 of 1 bytes)
My questions are:
How can I counter this issue?
Is this the format I should be using?
Additionally, even when the above works, it takes about an hour to upload about 5k records. Isn't that too long? Is this the normal amount of time it should take, and if not, what could I do better?
RODBC is very old, and can be a bit flaky with NVARCHAR columns. Try using the RSQLServer package instead, which offers an alternative means to connect to SQL Server (and also provides a dplyr backend).

SqlServer error HY000: Partial insert/update while calling SQLPutData with an object with more than 400 KB in field of varbinary(max)

I have a big problem when I try to save an object that's bigger than 400KB in a varbinary(max) column, calling ODBC from C++.
Here's my basic workflow of calling SQLPrepare, SQLBindParameter, SQLExecute, and SQLPutData (the last one called several times):
SQLPrepare:
StatementHandle 0x019141f0
StatementText "UPDATE DT460 SET DI024543 = ?, DI024541 = ?, DI024542 = ? WHERE DI006397 = ? AND DI008098 = ?"
TextLength 93
Binding of first parameter (BLOB field):
SQLBindParameter:
StatementHandle 0x019141f0
ParameterNumber 1
InputOutputType 1
ValueType -2 (SQL_C_BINARY)
ParameterType -4 (SQL_LONGVARBINARY)
ColumnSize 427078
DecimalDigits 0
ParameterValPtr 1
BufferLength 4
StrLenOrIndPtr -427178 (result of SQL_LEN_DATA_AT_EXEC(427078))
SQLExecute:
StatementHandle 0x019141f0
Attempt to save blob in chunks of 32K by calling SQLPutData a number of times:
SQLPutData:
StatementHandle 0x019141f0
DataPtr address of a std::vector with 32768 chars
StrLen_or_Ind 32768
During the very first SQLPutData-operation with the first 32KB of data, I get the following SQL Server error:
[HY000][Microsoft][ODBC SQL Server Driver]Warning: Partial insert/update. The insert/update of a text or image column(s) did not succeed.
This happens always when I try to save an object with a size of more than 400KB. Saving something that's smaller than 400KB works just fine.
I found out that the critical parameter is ColumnSize in SQLBindParameter. The StrLenOrIndPtr parameter passed to SQLBindParameter can have lower values (like 32K); it still results in the same error.
But according to the SQL Server ODBC API, I don't see why this should be a problem as long as I call SQLPutData with chunks of data smaller than 32 KB.
Does anyone have an idea what the problem could be?
Any help would be greatly appreciated.
OK, I just found out this was actually an SQL driver problem!
After installing the newest version of the Microsoft SQL Server 2012 Native Client (from http://www.microsoft.com/de-de/download/details.aspx?id=29065), saving bigger BLOBs works with exactly the parameters described above.

Connecting to SQL Server with CL-SQL via unixODBC/FreeTDS

I've managed to connect from SBCL running on Debian to an SQL Server 2000 instance over the network using FreeTDS/unixODBC.
I can actually get data back from the server, so everything is working.
However, many of the columns trigger what seem to be unsupported data type errors, such as:
The value 2147483647 is not of type FIXNUM.
and
-11 fell through ECASE expression.
Wanted one of (-7 -6 -2 -3 -4 93 92 91 11 10 ...).
Does anyone with experience using CLSQL with SQL Server have any ideas?
This (error with 2147483647) occurs because the FreeTDS driver doesn't handle OLEDB BLOBs so well.
You have to issue the following SQL command to make it work:
set textsize 102400
You can see the FreeTDS FAQ entry here. Excerpt:
The text data type is different from char and varchar types. The maximum data length of a text column is governed by the textsize connection option. Microsoft claims in their documentation to use a default textsize of 4000 characters, but in fact their implementation is inconsistent. Sometimes text columns are returned with a size of 4 GB!
The best solution is to make sure you set the textsize option to a reasonable value when establishing a connection.
As for the ECASE error, I haven't really solved it, but I have worked around it by converting timestamp values to binary and uniqueidentifier values to varchar(36).
