Using SQL Server 2016, I am working on a legacy system that requires its nightly import to run via bulk insert. I know SSIS is a better option, but not one that's available to me.
I am uploading the file from the local machine with the following command:
BULK INSERT DataImports.staging_Companies
FROM 'D:\xxxxx\companies_20180802093057.txt'
WITH (BATCHSIZE = 1000
, DATAFILETYPE = 'char'
, FIRSTROW = 2
, FIELDTERMINATOR = ' ' -- Tab Character here
, ROWTERMINATOR = '\n'
, ERRORFILE = 'D:\xxxxx\company_errors.txt');
No format file is being used, and due to our dynamic handling we would not be able to use one. When uploading the file, I get the following error:
Msg 7301, Level 16, State 2, Line 4
Cannot obtain the required interface ("IID_IColumnsInfo") from OLE DB provider "BULK" for linked server "(null)".
The general opinion on this is that there is an issue with the row/line terminators. This is not the case here. The issue seems to be around file size or number of rows. The import works fine up to 2482 rows but falls over at 2483. By moving rows around I have ruled out an error on the data itself.
Once the file size/row count is exceeded, the command does not run at all. I added a trigger to the destination table and a batch size, and I see results for the smaller size but none at all for the larger size.
From this, I am wondering if there is something that interrupts the reading of the file and cuts it off halfway through a line. I have used and maintained this system for a while and have seen far larger files processed before, in terms of both rows and data size.
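For completeness, one way to home in on the failing region without physically moving rows is to bound the read with the LASTROW option. This is only a sketch of that approach, using the same table and path as above with a separate, hypothetical error file name:
BULK INSERT DataImports.staging_Companies
FROM 'D:\xxxxx\companies_20180802093057.txt'
WITH (BATCHSIZE = 1000
, DATAFILETYPE = 'char'
, FIRSTROW = 2
, LASTROW = 2482 -- raise or lower this bound to bisect towards the first failing row
, FIELDTERMINATOR = '\t' -- Tab character
, ROWTERMINATOR = '\n'
, ERRORFILE = 'D:\xxxxx\company_errors_bisect.txt'); -- hypothetical new error file name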
Update:
I have just transferred this file to my local machine and run the import on my test DB (SQL 2017). The bulk insert ran fine with no errors. My local version is 14.0.1000.169; the client's server is 13.0.1601.5.
I also tested on another 2016 server (13.0.4474.0) and that ran fine. Is there anything in the server setup that may be causing this issue? Or even something in the main file system? I am clutching at straws now.
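The version numbers above can be checked on each server with standard SERVERPROPERTY calls; a minimal sketch for comparing the setups, nothing specific to this system:
SELECT SERVERPROPERTY('ProductVersion') AS ProductVersion
, SERVERPROPERTY('ProductLevel') AS ProductLevel -- RTM, SP1, ...
, SERVERPROPERTY('Edition') AS Edition;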
Any ideas gratefully received.
I am trying to use the UNLOAD statement in Informix, but it doesn't work:
UNLOAD TO 'p7024cargaP.unl' select * from p7024carga;
[Error] Script lines: 1-4 --------------------------
A syntax error has occurred.
Script line 1, statement line 1, column 1
So maybe it is because I am running this statement in Aqua Data Studio. I am on a Windows PC. Can someone help me?
UNLOAD is not a command understood by the server. Some tools, notably DB-Access, recognize the syntax and use a more or less complex sequence of operations to declare a cursor for the SELECT statement and then open the cursor, fetch each row, and format the result, writing to the named file.
Your primary option is to use DB-Access to execute the statement. That is certainly the simplest.
I have a file dump that needs to be imported into SQL Server on a daily basis, so I have created a scheduled task to do this unattended. All the CSV files are delimited by ',', use Windows CR/LF line endings, and are encoded as UTF-8.
To import data from these CSV files, I mainly use OPENROWSET. It worked well until I ran into a file containing the value "S7". If the file contains the value "S7", that column is recognized as a numeric data type during the OPENROWSET import, which causes the other alphabetic values in that column to fail to import, leaving only NULL values.
This is what I have tried so far:
Using IMEX=1: openrowset('Microsoft.ACE.OLEDB.15.0','text;IMEX=1;HDR=Yes;
Using text driver: OpenRowset('MSDASQL','Driver=Microsoft Access Text Driver (*.txt, *.csv);
Using Bulk Insert with or without a format file.
The interesting part is that if I use BULK INSERT, it gives me a warning about an unexpected end of file. To solve this I tried various row terminators such as '0x0a', '\n' and '\r\n', or left the terminator unspecified, but they all failed. I finally managed to import some of the records using a row terminator of ',\n'. However, the original file contains around 1000 records and only 100 are imported, with no errors or warnings.
Any tips or help would be much appreciated.
Edit 1:
The file ends with a newline character, as far as I can tell from Notepad++. I managed to import the files that gave me the unexpected-end-of-file error by removing the last record from them. However, even with this method I still cannot import all the records; only some of them are imported.
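For reference, this is roughly the BULK INSERT shape I would expect to match a UTF-8, CR/LF-terminated CSV with a header row. It is only a sketch: the table and file names are placeholders, and CODEPAGE = '65001' (UTF-8) is only supported from SQL Server 2016 onwards:
BULK INSERT dbo.StagingTable -- placeholder target table
FROM 'D:\import\daily_dump.csv' -- placeholder path
WITH (FIELDTERMINATOR = ','
, ROWTERMINATOR = '\r\n' -- explicit Windows CR/LF
, CODEPAGE = '65001' -- UTF-8, SQL Server 2016 and later only
, FIRSTROW = 2 -- assumes a header row
, TABLOCK);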
I am setting up a SQL Azure database. I need to write data into the database on a daily basis. I am using 64-bit R version 3.3.3 on Windows 10. Some of the columns contain text (more than 4000 characters). Initially, I imported some data from a CSV into the SQL Azure database using Microsoft SQL Server Management Studio. I set the text columns up as ntext, because when I tried using nvarchar the maximum was 4000 and some of the values got truncated, even though they were only about 1100 characters long.
In order to append to the database, I first save the records into a temp table for which I have predefined the varTypes:
varTypesNewFile <- c("Numeric", rep("NTEXT", ncol(newFileToAppend) - 1))
names(varTypesNewFile) <- names(newFileToAppend)
sqlSave(dbhandle, newFileToAppend, "newFileToAppendTmp", rownames = F, varTypes = varTypesNewFile, safer = F)
and then append them by using:
insert into mainTable select * from newFileToAppendTmp
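For context, a slightly more defensive version of that append step would list the columns explicitly so it does not depend on column order; the column names below are hypothetical stand-ins for the real ones:
insert into mainTable (Id, TextCol1, TextCol2) -- hypothetical column names
select Id, TextCol1, TextCol2
from newFileToAppendTmp;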
If the text is not too long, the above does work. However, sometimes I get the following error during the sqlSave command:
Error in odbcUpdate(channel, query, mydata, coldata[m, ], test = test, :
'Calloc' could not allocate memory (1073741824 of 1 bytes)
My questions are:
How can I counter this issue?
Is this the format I should be using?
Additionally, even when the above works, it takes about an hour to upload about 5k records. Is that not too long? Is this the normal amount of time it should take? If not, what could I do better?
RODBC is very old, and can be a bit flaky with NVARCHAR columns. Try using the RSQLServer package instead, which offers an alternative means to connect to SQL Server (and also provides a dplyr backend).
I run some R code after querying 100M records and get the following error after the process runs for over 6 hours:
Msg 39004, Level 16, State 19, Line 300
A 'R' script error occurred during execution of 'sp_execute_external_script'
with HRESULT 0x80004005.
HRESULT 0x80004005 appears to be associated in Windows with Connectivity, Permissions or an "Unspecified" error.
I know from logging in my R code that the process never reaches the R script at all. I also know that the entire procedure completes after 4 minutes on a smaller number of records, for example, 1M. This leads me to believe that this is a scaling problem or some issue with the data, rather than a bug in my R code. I have not included the R code or the full query for proprietary reasons.
However, I would expect a disk or memory error to display a 0x80004004 Out of memory error if that were the case.
One clue I noticed in the SQL ERRORLOG is the following:
SQL Server received abort message and abort execution for major error : 18
and minor error : 42
However the time of this log line does not coincide with the interruption of the process, although it does occur after it started. Unfortunately, there is precious little on the web about "major error 18".
A SQL Trace when running from SSMS shows the client logging in and logging out every 6 minutes or so, but I can only assume this is normal keepalive behaviour.
The sanitized sp_execute_external_script call:
EXEC sp_execute_external_script
@language = N'R'
, @script = N'#We never get here
#returns name of output data file'
, @input_data_1 = N'SELECT TOP 100000000 * FROM DATA'
, @input_data_1_name = N'x'
, @output_data_1_name = N'output_file_df'
WITH RESULT SETS ((output_file varchar(100) not null))
Server Specs:
8 cores
256 GB RAM
SQL Server 2016 CTP 3
Any ideas, suggestions or debugging hints would be greatly appreciated!
UPDATE:
I set TRACE_LEVEL=3 in rlauncher.config to turn on a higher level of logging and re-ran the process. The log reveals a cleanup process that ran, removing session files, at the time the entire process failed after 6.5 hours.
[2016-05-30 01:35:34.419][00002070][00001EC4][Info] SQLSatellite_LaunchSatellite(1, A187BC64-C349-410B-861E-BFDC714C8017, 1, 49232, nullptr) completed: 00000000
[2016-05-30 01:35:34.420][00002070][00001EC4][Info] < SQLSatellite_LaunchSatellite, dllmain.cpp, 223
[2016-05-30 08:04:02.443][00002070][00001EC4][Info] > SQLSatellite_LauncherCleanUp, dllmain.cpp, 309
[2016-05-30 08:04:07.443][00002070][00001EC4][Warning] Session A187BC64-C349-410B-861E-BFDC714C8017 cleanup wait failed with 258 and error 0
[2016-05-30 08:04:07.444][00002070][00001EC4][Info] Session(A187BC64-C349-410B-861E-BFDC714C8017) logged 2 output files
[2016-05-30 08:04:07.444][00002070][00001EC4][Warning] TryDeleteSingleFile(C:\PROGRA~1\MICROS~1\MSSQL1~1.MSS\MSSQL\EXTENS~1\MSSQLSERVER06\A187BC64-C349-410B-861E-BFDC714C8017\Rscript1878455a2528) failed with 32
[2016-05-30 08:04:07.445][00002070][00001EC4][Warning] TryDeleteSingleDirectory(C:\PROGRA~1\MICROS~1\MSSQL1~1.MSS\MSSQL\EXTENS~1\MSSQLSERVER06\A187BC64-C349-410B-861E-BFDC714C8017) failed with 32
[2016-05-30 08:04:08.446][00002070][00001EC4][Info] Session A187BC64-C349-410B-861E-BFDC714C8017 removed from MSSQLSERVER06 user
[2016-05-30 08:04:08.447][00002070][00001EC4][Info] SQLSatellite_LauncherCleanUp(A187BC64-C349-410B-861E-BFDC714C8017) completed: 00000000
It appears the only way to allow my long-running process to continue is to:
a) Extend the Job Cleanup wait time to allow the job to finish
b) Disable the Job Cleanup process
I have thus far been unable to find the value that sets the Job Cleanup wait time in the MSSQLLaunchpad service.
While a JOB_CLEANUP_ON_EXIT flag exists in rlauncher.config, setting it to 0 has no effect. The service seems to reset it to 1 when it is restarted.
Again, any suggestions or assistance would be much appreciated!
By default, SQL Server reads all of the input data into R memory as a data frame before starting execution of the R script. Given that the script works with 1M rows and fails to start with 100M rows, this could well be an out-of-memory error. To resolve memory issues (other than increasing memory on the machine or reducing the data size), you can try one of these solutions; a sketch of both follows the list.
Increase the memory allocation for R process execution by raising the max_memory_percent setting of the external resource pool (the current value is visible in sys.resource_governor_external_resource_pools). By default, SQL Server limits R process execution to 20% of memory.
Use streaming execution for the R script instead of loading all the data into memory. Note that streaming can only be used in cases where the output of the R script doesn't depend on reading or looking at the entire set of rows.
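Hedged sketches of both options, assuming the default external resource pool and the @r_rowsPerRead chunking parameter shipped with SQL Server 2016 R Services; adjust names and sizes to your environment:
-- Option 1: raise the memory cap on the default external resource pool
-- (20 percent out of the box) and apply the change.
ALTER EXTERNAL RESOURCE POOL [default] WITH (MAX_MEMORY_PERCENT = 50);
ALTER RESOURCE GOVERNOR RECONFIGURE;
-- Option 2: request streaming by passing @r_rowsPerRead, so the input query
-- is handed to R in chunks rather than as one large data frame. The R script
-- body here is only a placeholder.
EXEC sp_execute_external_script
@language = N'R'
, @script = N'output_file_df <- data.frame(output_file = "placeholder")'
, @input_data_1 = N'SELECT TOP 100000000 * FROM DATA'
, @input_data_1_name = N'x'
, @output_data_1_name = N'output_file_df'
, @params = N'@r_rowsPerRead int'
, @r_rowsPerRead = 500000
WITH RESULT SETS ((output_file varchar(100) not null));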
The warnings in RLauncher.log about data cleanup that happened after the R script execution can be safely ignored and are probably not the root cause of the failures you are seeing.
Unable to resolve this issue in SQL, I simply avoided the SQL Server Launchpad service, which was interrupting the processing, and pulled the data from SQL using the R RODBC library. The pull took just over 3 hours (instead of 6+ using sp_execute_external_script).
This might implicate the SQL Launchpad service, and suggests that memory was not the issue.
Please try your scenario in SQL Server 2016 RTM. There have been many functional and performance fixes made since CTP3.
For more information on how to get SQL Server 2016 RTM, check out the "SQL Server 2016 is generally available today" blog post.
I had almost the same issue with SQL Server 2016 RTM-CU1. My Query failed with error 0x80004004 instead of 0x80004005. And it failed beginning with 10,000,000 records, but that could be related to only having 16 GB memory and/or different data.
I got around it by using a field list instead of "*". Even if the field list contains all the fields from the data source (a rather complicated view in my case), a query featuring a field list is always successful, while "SELECT TOP x * FROM ..." always fails for some large x.
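An illustration of the difference, with made-up view and column names:
-- Always failed for me once x was large enough:
-- SELECT TOP 10000000 * FROM dbo.ComplicatedView;
-- Always succeeded, even with every column listed:
SELECT TOP 10000000 Col1, Col2, Col3
FROM dbo.ComplicatedView;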
I've had a similar error (0x80004004), and the problem was that one of the rows in one of the columns contained a "very" special character (I'm saying "very" because other special characters did not cause this error).
When I replaced 'Folkelånet Telefinans' with 'Folkelanet Telefinans', the problem went away.
In your case, maybe at least one of the values in the last 99M rows contains something like that character, and you just have to replace it. I hope that Microsoft will resolve this issue at some point.
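If you want to scrub such values on the SQL side before they reach the external script, a sketch along these lines may help; the table and column names are hypothetical, and the binary collation keeps the replacement accent-sensitive:
-- Hypothetical names; replaces the offending character in the source data.
UPDATE dbo.SourceData
SET CompanyName = REPLACE(CompanyName COLLATE Latin1_General_BIN2, N'å', N'a')
WHERE CompanyName COLLATE Latin1_General_BIN2 LIKE N'%å%';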
I'm working on an IronPython (v2.7.3) module that connects to a given SQL Server database on a remote machine, and uses SMO to generate scripts for all of that DB's objects. My 'real' module has the code to generate a script for every defined object type in SMO, from ApplicationRoles to XmlSchemaCollections. The DB I'm working with is on SQL Server 2000. It has a fair number of objects -- 117 tables, 257 SPs, 101 views, etc.
Every time I run my module, I get a stack trace at the point where it's scripting the SPs. I trimmed the module down to script only the tables and the SPs, and it still failed while scripting the SPs. Here's the trimmed-down version:
import sys, clr
import System.Array
serverName = r'x.x.x.x' #IP address of remote server
pathAssemblies = r'C:\Program Files\Microsoft SQL Server\100\Setup Bootstrap\SQLServer2008R2\x64'
sys.path.append(pathAssemblies)
clr.AddReferenceToFile('Microsoft.SqlServer.Smo.dll')
import Microsoft.SqlServer.Management.Smo as SMO
srv = SMO.Server(serverName)
srv.ConnectionContext.LoginSecure = False
srv.ConnectionContext.Login = 'sa'
srv.ConnectionContext.Password = 'foo' #Password of sa
db = srv.Databases['bar'] #Name of database
scrp = SMO.Scripter(srv)
sys.stdout = open('DBScriptOutput.txt', 'w')
try:
    for dbgenobj in db.Tables:
        urns = System.Array[SMO.SqlSmoObject]([dbgenobj])
        outStr = scrp.Script(urns)
        for outLine in outStr:
            print outLine
except:
    print 'Failed out while generating table scripts.'

try:
    for dbgenobj in db.StoredProcedures:
        urns = System.Array[SMO.SqlSmoObject]([dbgenobj])
        outStr = scrp.Script(urns)
        for outLine in outStr:
            print outLine
except:
    print 'Failed out while generating stored procedure scripts.'
The puzzle here that has me stumped involves two things that don't seem to make sense:
(1) The stack trace itself looks like this:
Traceback (most recent call last):
File "E:\t.py", line 33, in <module>
UnicodeEncodeError: ('unknown', '\x00', 0, 1, '')
Line 33 though is the print statement in the except block. The output file has all of the tables' scripts, complete scripts for 235 of the SPs, and part of the script for the 236th. But there's nothing unusual (that I can see anyway) about #236 that should cause the scripting to fail out. Nor can I understand why the stack trace would occur at all citing a simple print statement in the except block.
(2) As a further troubleshooting experiment, I tried running the script with the whole try-except block for the tables commented out. It still fails generating the SP scripts, and generates the same stack trace citing line 33. The difference is this time it successfully generates another 16 lines of the script for procedure #236 before terminating. The overall file size of the output file is significantly smaller though. I could understand if the file stopped at the same size, or if the scripting stopped at the same point in the SP, but neither one of these is true.
So at this point, having (apparently) ruled out a problem character in the SP or a file/memory size limit for the scripting process, I'm stumped.
I've had this problem with a procedure that had non-ASCII characters in its comments. The simplest solution is to use the codecs module and codecs.open instead of the plain open call. Add this to the import lines:
import codecs
then replace open call with:
sys.stdout = codecs.open('DBScriptOutput.txt', 'w', 'utf8')
That worked for me.