pyodbc: Memory Error using fast_executemany with TEXT / NTEXT columns - sql-server

I'm having an issue with inserting rows into a database. Just wondering if anyone has any ideas why this is happening? It works when I avoid using fast_executemany but then inserts become very slow.
driver = 'ODBC Driver 17 for SQL Server'
conn = pyodbc.connect('DRIVER=' + driver + ';SERVER=' + server+ \
';UID=' + user+ ';PWD=' + password)
cursor = conn.cursor()
cursor.fast_executemany = True
insert_sql = """
INSERT INTO table (a, b, c)
VALUES (?, ?, ?)
"""
cursor.executemany(insert_sql, insert_params)
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
<ipython-input-12-e7e82e4d8c2d> in <module>
2 start_time = time.time()
3
----> 4 cursor.executemany(insert_sql, insert_params)
MemoryError:

There is a known issue with fast_executemany when working with TEXT or NTEXT columns, as described on GitHub here.
The problem is that when pyodbc queries the database metadata to determine the maximum size of the column the driver returns 2 GB (instead of 0, as would be returned for a [n]varchar(max) column).
pyodbc allocates 2 GB of memory for each [N]TEXT element in the parameter array, and the Python app quickly runs out of memory.
The workaround is to use cursor.setinputsizes([(pyodbc.SQL_WVARCHAR, 0, 0)]) (as described here) to coax pyodbc into treating [N]TEXT columns like [n]varchar(max) columns.
(Given that [N]TEXT is a deprecated column type for SQL Server it is unlikely that there will be a formal fix for this issue.)

While this issue was solved for the OP by Gord Thompson's answer, I wanted to note that the question as written applies to other cases where a MemoryError may occur, and fast_executemany actually can throw that in other circumstances beyond just usage of [N]TEXT columns.
In my case a MemoryError was thrown during an attempt to INSERT several million records at once, and as noted here, "parameter values are held in memory, so very large numbers of records (tens of millions or more) may cause memory issues". It doesn't necessarily require tens of millions to trigger, so YMMV.
An easy solution to this is to identify a sane number of records to batch per each execute. Here's an example if using a Pandas dataframe as a source (establish your insert_query as usual):
batch_size = 5000 # Set to a desirable batch size
with connection.cursor() as cursor:
try:
cursor.fast_executemany = True
# Iterate each batch chunk using numpy's split
for chunk in np.array_split(df, batch_size):
cursor.executemany(insert_query,
chunk.values.tolist())
# Run a single commit at the end of the transaction
connection.commit()
except Exception as e:
# Rollback on any exception
connection.rollback()
raise e
Hope this helps anyone who hits this issue and doesn't have any [N]TEXT columns on their target!

In my case, the MemoryError was because I was using a very old driver 'SQL Server'. Switched to the newer driver ('ODBC Driver 17 for SQL Server') as described in the link below and it worked:
link

Related

SQL-Server - Bulk Insert Error 7301

Using SQL Server 2016, I am working on a legacy system that requires its nightly import to run via bulk insert. I know SSIS is a better option, but not one that's available to me.
I am uploading the file from the local machine with the following command:
BULK INSERT DataImports.staging_Companies
FROM 'D:\xxxxx\companies_20180802093057.txt'
WITH (BATCHSIZE = 1000
, DATAFILETYPE = 'char'
, FIRSTROW = 2
, FIELDTERMINATOR = ' ' -- Tab Character here
, ROWTERMINATOR = '\n'
, ERRORFILE = 'D:\xxxxx\company_errors.txt');
No format file is being used, and due to our dynamic handling, we would not be able to use one. When uploading the file I am getting an error as:
Msg 7301, Level 16, State 2, Line 4
Cannot obtain the required interface ("IID_IColumnsInfo") from OLE DB provider "BULK" for linked server "(null)".
The general opinion on this is that there is an issue with the row/line terminators. This is not the case here. The issue seems to be around file size or number of rows. The import works fine up to 2482 rows but falls over at 2483. By moving rows around I have ruled out an error on the data itself.
Once the file size/rowcount is exceeded, the command does not run at all. I have added a trigger to the destination table and a batch size, and see results for the smaller size, and none at all for the larger size.
From this, I am wondering if there is something that causes an interruption while reading the file, and cuts it off halfway through a line? I have used and maintained this system for a while and have seen far larger files processed before in terms of both rows, and data size.
Update:
Have just transferred this file onto my local machine, and ran the import on my test db (SQL2017). The bulk insert ran fine with no errors. My local version 14.0.1000.169, client server 13.0.1601.5.
Also tested on another 2016 Server (13.0.4474.0) and that ran fine. Is there anything in server setup that may be causing this issue? Or even something from the main file system? Am clutching at straws now.
Any ideas gratefully received.

RODBC ERROR: 'Calloc' could not allocate memory

I am setting up a SQL Azure database. I need to write data into the database on daily basis. I am using 64-bit R version 3.3.3 on Windows10. Some of the columns contain text (more than 4000 characters). Initially, I have imported some data from a csv into the SQL Azure database using Microsoft SQL Server Management Studios. I set up the text columns as ntext format, because when I tried using nvarchar the max was 4000 and some of the values got truncated even though they were about 1100 characters long.
In order to append to the database I am first saving the records in a temp table when I have predefined the varTypes:
varTypesNewFile <- c("Numeric", rep("NTEXT", ncol(newFileToAppend) - 1))
names(varTypesNewFile) <- names(newFileToAppend)
sqlSave(dbhandle, newFileToAppend, "newFileToAppendTmp", rownames = F, varTypes = varTypesNewFile, safer = F)
and then append them by using:
insert into mainTable select * from newFileToAppendTmp
If the text is not too long, the above does work. However, sometimes I get the following error during the sqlSave command:
Error in odbcUpdate(channel, query, mydata, coldata[m, ], test = test, :
'Calloc' could not allocate memory (1073741824 of 1 bytes)
My questions are:
How can I counter this issue?
Is this the format I should be using?
Additionally, even when the above works, it takes about an hour to upload about 5k of records. Is it not too long? Is this the normal amount of time it should take? If not, what could I do better.
RODBC is very old, and can be a bit flaky with NVARCHAR columns. Try using the RSQLServer package instead, which offers an alternative means to connect to SQL Server (and also provides a dplyr backend).

How to fix locale errors in RODBC (WAS: R CMD BATCH: "$ operator is invalid for atomic vectors" but not in Rstudio)

Maybe related: Stack overflow: Windoes does not support UTF-8
I have a script which I can source from Rstudio, but when I try to source it from Rgui.exe or try to BATCH CMD run, I get the following error in my Rout file:
Error in easy_clean$Sv_Karakter: $ operator is invalid for atomic vectors
The reason is that the database table I am quering have a latin charachter 'ø' in its name (se third line below). So the result of my query is this (as per str(easy)):
"";"x"
"1";"42000 102 [Microsoft][ODBC SQL Server Driver][SQL Server]Incorrect syntax near '¸'."
"2";"[RODBC] ERROR: Could not SQLExecDirect 'select *
from PrøveDeltager a
left outer join Aftaler b
ON b.Cpr = a.CPR
LEFT OUTER JOIN GODK c
ON c.GODK_ID = b.GODK_ID
where a.slut >= '20140808'
AND a.slut <='20140818'
AND a.Branche = 'vvs'
AND a.SaleID is not null
AND a.CPR in (select x.CPRNR from Statistik x)
order by Sv_Karakter'"
In rstudio the query works.
Sys.getlocale('LC_CTYPE')returns Danish_Denmark.1252 in both R.gui and Rstudio - so I don't know how to fix this.
I did find this link to developer.r-project which discus windows locales (quite old though).
For now I have created a database view without the 'ø' - that view I can call without problems from R.
From sessionInfo I can say that:
Rstudio R is 64 bit, and R.exe is 32 bit.
Other than that, the only difference is this, for Rstudio:
loaded via a namespace (and not attached): 1 tools_3.1.0
Since I can't write my database credentials, I can't create a reproducible example. But here is the script. http://pastebin.com/XwdZPhL7
Two ways I can imagine that would yield an error with that code: one is if this results in a single column:
easy[!is.na(easy$Sv_Karakter),]
In that case, the result would be a vector because of the default action of the "[" function to create atomic vectors from single column results. The attempts to extract that column would fail with that error. The other case of failure (but perhaps not that error) would be when there was no 'Sv_Karakter' column in the 'easy' dataframe.
Better efforts at documentation by offering str(easy) or dput(head(easy)) are needed.
Your query returned an error message and it was in the form of an R vector. So that explains the particular error. You now need to figure out how your db connection is getting messed up, as Andreas was saying.

SqlServer error HY000: Partial insert/update while calling SQLPutData with an object with more than 400 KB in field of varbinary(max)

I have a big problem when I try to save an object that's bigger than 400KB in a varbinary(max) column, calling ODBC from C++.
Here's my basic workflow of calling SqlPrepare, SQLBindParameter, SQLExecute, SQLPutData (the last one various times):
SqlPrepare:
StatementHandle 0x019141f0
StatementText "UPDATE DT460 SET DI024543 = ?, DI024541 = ?, DI024542 = ? WHERE DI006397 = ? AND DI008098 = ?"
TextLength 93
Binding of first parameter (BLOB field):
SQLBindParameter:
StatementHandle 0x019141f0
ParameterNumber 1
InputOutputType 1
ValueType -2 (SQL_C_BINARY)
ParameterType -4 (SQL_LONGVARBINARY)
ColumnSize 427078
DecimalDigits 0
ParameterValPtr 1
BufferLength 4
StrLenOrIndPtr -427178 (result of SQL_LEN_DATA_AT_EXEC(427078))
SQLExecute:
StatementHandle 0x019141f0
Attempt to save blob in chunks of 32K by calling SQLPutData a number of times:
SQLPutData:
StatementHandle 0x019141f0
DataPtr address of a std::vector with 32768 chars
StrLen_or_Ind 32768
During the very first SQLPutData-operation with the first 32KB of data, I get the following SQL Server error:
[HY000][Microsoft][ODBC SQL Server Driver]Warning: Partial insert/update. The insert/update of a text or image column(s) did not succeed.
This happens always when I try to save an object with a size of more than 400KB. Saving something that's smaller than 400KB works just fine.
I found out the critical parameter is ColumSize of SQLBindParemter. The parameter StrLenOrIndPtr during SQLBindParameter can have lower values (like 32K),
it still results in the same error.
But according to SQL Server API, I don't see why this should be problematic as long as I call SQLPutData with chunks of data that are smaller than 32KB.
Does anyone have an idea what the problem could be?
Any help would be greatly appreciated.
Ok, I just found out this was actually an sql driver problem!
After installing the newest version of Microsoft® SQL Server® 2012 Native Client (from http://www.microsoft.com/de-de/download/details.aspx?id=29065), saving bigger BLOBs works with exactly these parameters from above.

Out of memory error as inserting a 600MB files into sql server express as filestream data

(please read the update section below, I leave the original question too for clarity)
I am inserting many files into a SQL Server db configured for filestream.
I am inserting in a loop the files from a folder to a database table.
Everything goes fine until I try to insert a 600 MB file.
As it inserts it there is a +600MB memory usage in task manager and I have the error.
The DB size is < 1 GB and the total size of documents is 8 GB, I am using SQL Server Express R2, and according to the documentation I could have problems only if trying to insert a document that is greater than 10 GB (Express limitation) - Current DB Size.
Can anyone tell me why do I have this error? It is very crucial for me.
UPDATE FOR BOUNTY:
I offered 150 because it is very crucial for me!
This seems to be a limitation of Delphi memory Manager, trying to insert a document bigger than 500MB, I didn't check the exact threshold anyway it is between 500 and 600MB). I use SDAC components, in particular a TMSQuery (but I think the same can be done with and TDataset descendant), to insert the document in a table that has a PK (ID_DOC_FILE) and a varbinary(max) field (DOCUMENT) I do:
procedure UploadBigFile;
var
sFilePath: String;
begin
sFilePath := 'D:\Test\VeryBigFile.dat';
sqlInsertDoc.ParamByName('ID_DOC_FILE').AsInteger := 1;
sqlInsertDoc.ParamByName('DOCUMENT').LoadFromFile(sFilePath, ftblob);
sqlInsertDoc.Execute;
sqlInsertDoc.Close;
end;
SDAC team told me this is a limitation of Delphi memory manager. Now since SDAC doesn't support filestream I cannot do what has been suggested in c# in the first answer. Is the only solution reporting to Embarcadero and ask a bug fix?
FINAL UPDATE:
Thanks, really, to all you that answered me. For sure inserting big blobs can be a problem for the Express Edition (because the limitations of 1 GB of ram), anyway I had the error on the Enterprise edition, and it was a "delphi" error, not a sql server one. So I think that the answer that I accepted really hits the problem, even if I have no time to verify it now.
SDAC team told me this is a limitation of Delphi memory manager
To me that looked like an simplistic answer, and I investigated. I don't have the SDAC components and I also don't use SQL Server, my favorites are Firebird SQL and the IBX component set. I tried inserting an 600Mb blob into a table, using IBX, then tried the same using ADO (covering two connection technologies, both TDataSet descendants). I discovered the truth is somewhere in the middle, it's not really the memory manager, it's not SDAC's fault (well... they are in a position to do something about it, if many more people attempt inserting 600 Mb blobs into databases, but that's irrelevant to this discussion). The "problem" is with the DB code in Delphi. As it turns out Delphi insists on using an single Variant to hold whatever type of data one might load into an parameter. And it makes sense, after all we can load lots of different things into an parameter for an INSERT. The second problem is, Delphi wants to treat that Variant like an VALUE type: It copies it around at list twice and maybe three times! The first copy is made right when the parameter is loaded from the file. The second copy is made when the parameter is prepared to be sent to the database engine.
Writing this is easy:
var V1, V2:Variant;
V1 := V2;
and works just fine for Integer and Date and small Strings, but when V2 is an 600 Mb Variant array that assignment apparently makes a full copy! Now think about the memory space available for a 32 bit application that's not running in "3G" mode. Only 2 Gb of addressing space are available. Some of that space is reserved, some of that space is used for the executable itself, then there are the libraries, then there's some space reserved for the memory manager. After making the first 600 Mb allocation there just might not be enough available addressing space to allocate an other 600 Mb buffer! Because of this it's safe to blame it on the memory manager, but then again, why exactly does the DB stuff need an other copy of the 600 Mb monster?
One possible fix
Try splitting up the file into smaller, more manageable chunks. Set up the database table to have 3 fields: ID_DOCUMENT, SEQUENCE, DOCUMENT. Also make the primary key on the table to be (ID_DOCUMENT, SEQUENCE). Next try this:
procedure UploadBigFile(id_doc:Integer; sFilePath: String);
var FS:TFileStream;
MS:TMemoryStream;
AvailableSize, ReadNow:Int64;
Sequence:Integer;
const MaxPerSequence = 10 * 1024 * 1024; // 10 Mb
begin
FS := TFileStream.Create(sFilePath, fmOpenRead);
try
AvailableSize := FS.Size;
Sequence := 0;
while AvailableSize > 0 do
begin
if AvailableSize > MaxPerSequence then
begin
ReadNow := MaxPerSequence;
Dec(AvailableSize, MaxPerSequence);
end
else
begin
ReadNow := AvailableSize;
AvailableSize := 0;
end;
Inc(Sequence); // Prep sequence; First sequence into DB will be "1"
MS := TMemoryStream.Create;
try
MS.CopyFrom(FS, ReadNow);
sqlInsertDoc.ParamByName('ID_DOC_FILE').AsInteger := id_doc;
sqlInsertDoc.ParamByName('SEQUENCE').AsInteger := sequence;
sqlInsertDoc.ParamByName('DOCUMENT').LoadFromStream(MS, ftblob);
sqlInsertDoc.Execute;
finally MS.Free;
end;
end;
finally FS.Free;
end;
sqlInsertDoc.Close;
end;
You could loop through the byte stream of the object you are trying to insert and essentially buffer a piece of it at a time into your database until you have your entire object stored.
I would take a look at the Buffer.BlockCopy() method if you're using .NET
Off the top of my head, the method to parse your file could look something like this:
var file = new FileStream(#"c:\file.exe");
byte[] fileStream;
byte[] buffer = new byte[100];
file.Write(fileStream, 0, fileStream.Length);
for (int i = 0; i < fileStream.Length; i += 100)
{
Buffer.BlockCopy(fileStream, i, buffer, 0, 100);
// Do database processing
}
Here is an example that reads a disk file and saves it into a FILESTREAM column. (It assumes that you already have the transaction Context and FilePath in variables "filepath" and "txContext".
'Open the FILESTREAM data file for writing
Dim fs As New SqlFileStream(filePath, txContext, FileAccess.Write)
'Open the source file for reading
Dim localFile As New FileStream("C:\temp\microsoftmouse.jpg",
FileMode.Open,
FileAccess.Read)
'Start transferring data from the source file to FILESTREAM data file
Dim bw As New BinaryWriter(fs)
Const bufferSize As Integer = 4096
Dim buffer As Byte() = New Byte(bufferSize) {}
Dim bytes As Integer = localFile.Read(buffer, 0, bufferSize)
While bytes > 0
bw.Write(buffer, 0, bytes)
bw.Flush()
bytes = localFile.Read(buffer, 0, bufferSize)
End While
'Close the files
bw.Close()
localFile.Close()
fs.Close()
You're probably running into memory fragmentation issues somewhere. Playing around with really large blocks of memory, especially in any situation where they might need to be reallocated tends to cause out of memory errors when in theory you have enough memory to do the job. If it needs a 600mb block and it can't find a hole that's 600mb wide that's it, out of memory.
While I have never tried it my inclination for a workaround would be to create a very minimal program that does ONLY the one operation. Keep it absolutely as simple as possible to keep the memory allocation minimal. When faced with a risky operation like this call the external program to do the job. The program runs, does the one operation and exits. The point is the new program is in it's own address space.
The only true fix is 64 bit and we don't have that option yet.
I recently experienced a similar problem while running DBCC CHECKDB on a very large table. I would get this error:
There is insufficient system memory in
resource pool 'internal' to run this
query.
This was on SQL Server 2008 R2 Express. The interesting thing was that I could control the occurrence of the error by adding or deleting a certain number of rows to the table.
After extensive research and discussions with various SQL Server experts, I came to the conclusion that the problem was a combination of memory pressure and the 1 GB memory limitation of SQL Server Express.
The recommendation given to me was to either
Acquire a machine with more memory and a
licensed edition of SQL Server or...
Partition the table into sizeable
chunks that DBCC CHECKDB could
handle
Due the complicated nature of parsing these files into the FILSTREAM object, I would recommend the filesystem method and and simply use SQL Server to store the locations of the files.
"While there are no limitations on the number of databases or users supported, it is limited to using one processor, 1 GB memory and 4 GB database files (10 GB database files from SQL Server Express 2008 R2)." It is not the size of the database files that is an insue but "1 GB memory". Try spitting the 600MB+ file but putting it in the stream.

Resources