I have a server with an alias which I'm referencing in a stored procedure when trying to bulk load the contents of a text file into a temp table (as below). I get the following error message when the stored procedure is called:
Cannot bulk load because the file "\\ServerAlias\SubFolder\FileName.txt" could not be opened. Operating system error code 3 (The system cannot find the path specified.).
Yet when I swap out the alias for the real server name, the stored procedure runs fine. The alias works perfectly well in every other context in our system and I haven't found any answers online. Why won't it work here?
BULK INSERT #Temp
FROM '\\ServerAlias\SubFolder\FileName.txt'
WITH
(
FIRSTROW = 1,
ROWTERMINATOR = '\n'
)
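For comparison, the identical statement with the real server name substituted (placeholder name below, since I can't share the actual name) runs without error:

BULK INSERT #Temp
FROM '\\RealServerName\SubFolder\FileName.txt' -- 'RealServerName' stands in for the actual host name
WITH
(
FIRSTROW = 1,
ROWTERMINATOR = '\n'
)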
I'm exporting data into Parquet files and importing it into Snowflake. The export is done with Python (using to_parquet from pandas) on a Windows Server machine.
The exported file has several timestamp columns. Here's the metadata of one of these columns (ParquetViewer):
I'm having weird issues trying to import the timestamp columns into Snowflake.
Attempt 1 (using COPY INTO):
create or replace table STAGING.DIM_EMPLOYEE(
"EmployeeID" NUMBER(38,0),
"ExitDate" TIMESTAMP_NTZ(9)
);
copy into STAGING.DIM_EMPLOYEE
from @S3
pattern='dim_Employee_.*.parquet'
file_format = (type = parquet)
match_by_column_name = case_insensitive;
select * from STAGING.DIM_EMPLOYEE;
The timestamp column is not imported correctly:
It seems that Snowflake assumes that the value in the column is in seconds and not in microseconds and therefore converts incorrectly.
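To illustrate the scale issue with a made-up value (not taken from the actual file): 1625054400000000 is 2021-06-30 12:00:00 expressed as microseconds since the Unix epoch. Converting it with an explicit scale of 6 gives the expected timestamp, while reading the same integer at the wrong scale lands a factor of one million away from the intended instant:

-- Illustrative value only: 2021-06-30 12:00:00 as microseconds since the Unix epoch
SELECT TO_TIMESTAMP_NTZ(1625054400000000, 6);  -- scale 6 (microseconds) -> 2021-06-30 12:00:00.000
SELECT TO_TIMESTAMP_NTZ(1625054400, 0);        -- the same instant expressed in seconds, read with scale 0 (seconds)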
Attempt 2 (using the external tables):
Then I created an external table:
create or replace external table STAGING.EXT_DIM_EMPLOYEE(
"EmployeeID" NUMBER(38,0) AS (CAST(GET($1, 'EmployeeID') AS NUMBER(38,0))),
"ExitDate" TIMESTAMP_NTZ(9) AS (CAST(GET($1, 'ExitDate') AS TIMESTAMP_NTZ(9)))
)
location=@S3
pattern='dim_Employee_.*.parquet'
file_format='parquet'
;
SELECT * FROM STAGING.EXT_DIM_EMPLOYEE;
The data is still incorrect - still the same issue (seconds instead of microseconds):
Attempt 3 (using the external tables, with modified TO_TIMESTAMP):
I then modified the external table definition to specify explicitly that the values are in microseconds, using TO_TIMESTAMP_NTZ with scale parameter 6:
create or replace external table STAGING.EXT_DIM_EMPLOYEE_V2(
"EmployeeID" NUMBER(38,0) AS (CAST(GET($1, 'EmployeeID') AS NUMBER(38,0))),
"ExitDate" TIMESTAMP_NTZ(9) AS (TO_TIMESTAMP_NTZ(TO_NUMBER(GET($1, 'ExitDate')), 6))
)
location=@CHICOREE_D365_BI_STAGE/
pattern='dim_Employee_.*.parquet'
file_format='parquet'
;
SELECT * FROM STAGING.EXT_DIM_EMPLOYEE_V2;
Now the data is correct:
But now the "weird" issue appears:
I can load the data into a table, but the load is quite slow and I get a "Querying (repair)" message during the load. However, at the end, the query is executed, albeit slowly:
I want to load the data from a stored procedure, using a SQL script. When I execute the statement using EXECUTE IMMEDIATE, an error is returned:
DECLARE
SQL STRING;
BEGIN
SET SQL := 'INSERT INTO STAGING.DIM_EMPLOYEE ("EmployeeID", "ExitDate") SELECT "EmployeeID", "ExitDate" FROM STAGING.EXT_DIM_EMPLOYEE_V2;';
EXECUTE IMMEDIATE :SQL;
END;
I have also tried defining the timestamp column in the external table as a NUMBER, importing it and converting it into a timestamp afterwards. This produces the same issue (returning "SQL execution internal error" in the SQL script).
Has anyone experienced an issue like this - it seems to me like a bug?
Basically, my goal is to generate insert/select statements dynamically and execute them (in stored procedures). I have a lot of files (with different schemas) that need to be imported, and I want to create a "universal logic" to load these Parquet files into Snowflake.
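To make the goal concrete, here is roughly the pattern I'm aiming for (a sketch with hypothetical procedure and parameter names, not working production code):

-- Sketch only: hypothetical procedure that builds and runs the INSERT ... SELECT dynamically
CREATE OR REPLACE PROCEDURE STAGING.LOAD_FROM_EXTERNAL(TARGET_TABLE STRING, SOURCE_TABLE STRING, COLUMN_LIST STRING)
RETURNS STRING
LANGUAGE SQL
AS
$$
DECLARE
  SQL STRING;
BEGIN
  SQL := 'INSERT INTO ' || TARGET_TABLE || ' (' || COLUMN_LIST || ') '
      || 'SELECT ' || COLUMN_LIST || ' FROM ' || SOURCE_TABLE || ';';
  EXECUTE IMMEDIATE :SQL;
  RETURN 'Loaded ' || TARGET_TABLE;
END;
$$;

-- Hypothetical call:
CALL STAGING.LOAD_FROM_EXTERNAL('STAGING.DIM_EMPLOYEE', 'STAGING.EXT_DIM_EMPLOYEE_V2', '"EmployeeID", "ExitDate"');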
As confirmed in the Snowflake Support ticket you opened, this issue got resolved when the Snowflake Support team enabled an internal configuration for Parquet timestamp logical types.
If anyone encounters a similar issue please submit a Snowflake Support ticket.
I am using EF Core (3.1.15). In a previous migration (also created in 3.1.15), a column was referenced that was dropped later on. The idempotent script does check whether the migration was applied to the database (which it was, and the reference still shows in the __EFMigrationsHistory table). However, the check doesn't have the expected result and the script fails due to the non-existent column.
Q: why is the non-existent column tripping up the execution of the SQL script?
Script was created with
dotnet-ef migrations script -i -o migrations.sql
Relevant part of the automated script that fails, where ReferenceToLedgerId is the column dropped in a later migration:
IF NOT EXISTS(SELECT * FROM [__EFMigrationsHistory] WHERE [MigrationId] = N'20210612052003_CLedger')
BEGIN
UPDATE LedgerTable SET LedgerId = ReferenceToLedgerId
END;
Error:
Msg 207, Level 16, State 1, Line 3
Invalid column name 'ReferenceToLedgerId'
When running the following SQL query, the result comes back as expected:
SELECT *
FROM [__EFMigrationsHistory] WHERE [MigrationId] = N'20210612052003_CLedger'
MigrationId               ProductVersion
20210612052003_CLedger    3.1.15
The database is Azure SQL Database. Script doesn't fail on local SQL dev database. A dozen migrations have been applied since then, and only now the script fails.
Below was the call that created the specific script:
migrationBuilder.Sql("UPDATE LedgerTable set LedgerId = ReferenceToLedgerId", true);
I tried to place the table and column names in square brackets, but that made no difference (e.g. [ReferenceToLedgerId]). The script fails in an Azure DevOps release when using SQLCMD and also fails when using Azure Data Studio, both accessing the Azure SQL Database.
Additional check
I changed the script to do a quick check:
PRINT '#Before IF'
IF NOT EXISTS(SELECT * FROM [__EFMigrationsHistory] WHERE [MigrationId] = N'20210612052003_CLedger')
BEGIN
PRINT '#Within IF'
--UPDATE LedgerTable SET LedgerId = ReferenceToLedgerId
END;
PRINT '#After IF'
To which I get the following result:
Started executing query at Line 1
#Before IF
#After IF
Total execution time: 00:00:00.010
If I uncomment the UPDATE statement it fails again. So I can only conclude that the code path works as intended, but that the server still checks for the existence of the column. I am not familiar enough with SQL to understand why this would be, or why it only fails for this one line while the column itself is referenced in other lines of the SQL script without failing.
That batch will fail on every version of SQL Server. For example:
use tempdb
go
create table __EFMigrationsHistory(MigrationId nvarchar(200))
create table LedgerTable(LedgerId int)
go
insert into __EFMigrationsHistory(MigrationId) values (N'20210612052003_CLedger')
go
IF NOT EXISTS(SELECT * FROM [__EFMigrationsHistory] WHERE [MigrationId] = N'20210612052003_CLedger')
BEGIN
UPDATE LedgerTable SET LedgerId = ReferenceToLedgerId
END;
Fails with
Msg 207, Level 16, State 1, Line 8
Invalid column name 'ReferenceToLedgerId'.
Because the batch cannot be parsed and compiled. It's simply not legal to reference a non-existent column of an existing table in a T-SQL batch.
You can work around this by using dynamic SQL, so that the batch referencing the non-existent column is not parsed and compiled unless the migration is being applied.
migrationBuilder.Sql("exec('UPDATE LedgerTable set LedgerId = ReferenceToLedgerId')", true);
This is documented here:
Tip
Use the EXEC function when a statement must be the first or only one in a SQL batch. It might also be needed to work around parser errors in idempotent migration scripts that can occur when referenced columns don't currently exist on a table.
https://learn.microsoft.com/en-us/ef/core/managing-schemas/migrations/operations
I have an R script that combines years of FFIEC Bank Call Report schedules into flat files--one for each schedule--then writes each schedule to a tab-delimited, non-quoted flat file suitable for bulk inserting into SQL Server. Then I run this bulk insert command:
bulk insert CI from 'e:\CI.txt' with (firstrow = 2, rowterminator = '0x0a', fieldterminator = '\t')
The bulk insert will run for a while then quit, with this error message:
Msg 7301, Level 16, State 2, Line 4
Cannot obtain the required interface ("IID_IColumnsInfo") from OLE DB provider "BULK" for linked server "(null)".
I've searched here for answers and the most common problem seems to be the rowterminator argument. I know that the files I've created have a line feed without a carriage return, so '0x0a' is the correct argument (but I tried '\n' and it didn't work).
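For comparison, these are the two row-terminator spellings in play; per the SQL Server documentation on field and row terminators, specifying '\n' makes BULK INSERT expect a carriage return + line feed pair, while '0x0a' matches a bare line feed, which is what my files contain:

-- LF-only file (what the R script writes):
bulk insert CI from 'e:\CI.txt' with (firstrow = 2, rowterminator = '0x0a', fieldterminator = '\t')
-- '\n' is treated as CRLF by BULK INSERT, so it only matches Windows-style line endings:
bulk insert CI from 'e:\CI.txt' with (firstrow = 2, rowterminator = '\n', fieldterminator = '\t')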
Interestingly, I tried setting the fieldterminator to gibberish just to see what happened and I got the expected error message:
The bulk load failed. The column is too long in the data file for row 1, column 1.
So that tells me that SQL Server has access to the file and is indeed starting to insert it.
Also, I did a manual import (right-click on the database, Tasks -> Import Data) and SQL Server swallowed up the file without a hitch. That tells me the layout of the table is fine, and so is the file?
Is it possible there's something at the end of the file that's confusing the bulk insert? I looked in a hex editor and it ends with data followed by 0A (the hex code for a line feed).
I'm stumped and open to any possibilities!
I'm setting up a new VM on a server to offload SQL Server database loading from my laptop. In doing so, I'd like to be able to execute stored procedures (no params, just 'exec storedprocedure') in my database via Python, but it's not working.
The stored procedure call worked when using sqlcmd via a batch file and in SSMS, but I'd like to make it all Python-based.
The stored procedure, which appends fact tables, follows the general format below:
--staging tbl drop and creation
if object_id('stagingtbl') is not null drop table stagingtbl
create table stagingtbl
(fields datatypes nullable
)
--staging tbl load
bulk insert stagingtbl
from 'c:\\filepath\\filename.csv'
with (
firstrow = 2
, rowterminator = '\n'
,fieldterminator = ','
, tablock /*don't know what tablock does but it works...*/
)
--staging table transformation
; with cte as (
/*ETL process to transform csv file into my tbl structure*/
)
--final table load
insert final_tbl
select * from cte
/*
T-SQL update the final table's effect to date, based on subsequent effect from date.
eg:
id, effectfromdate, effecttodate
1,1/1/19, 1/1/3000
1,1/10/19, 1/1/3000
becomes
id, effectfromdate, effecttodate
1,1/1/19, 1/10/19
1,1/10/19, 1/1/3000
*/
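For the effect-to-date step described in that comment, here is a minimal sketch of one way to do it (hypothetical table and column names, not my actual code), using LEAD to pull in the next row's effectfromdate:

-- Sketch: set each row's effecttodate to the next effectfromdate for the same id,
-- leaving 1/1/3000 on the latest row per id (hypothetical names)
;with nxt as (
    select id, effectfromdate, effecttodate,
           lead(effectfromdate) over (partition by id order by effectfromdate) as next_from
    from final_tbl
)
update nxt
set effecttodate = next_from
where next_from is not null;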
The stored procedure works fine with sqlcmd and in SSMS, but in Python (pyodbc), executing the query 'exec storedprocedure' gives me the error message:
pyodbc.ProgrammingError: ('42000', '[42000] [Microsoft][SQL Server Native Client 11.0][SQL Server]
Cannot bulk load because the file "c:\filepath\filename.csv" could not be opened.
Operating system error code 3(The system cannot find the path specified.). (4861) (SQLExecDirectW)')
Yet the csv file is there, there are no misspellings in the path or filename, I can open the csv by double-clicking on it, and no one has it open.
With continued experimentation I've established that the problem is not with Python or pyodbc. In SSMS on my laptop (the host machine of the db) the stored procedures work just fine, but in SSMS on the VM the stored procedures cause the same error. This tells me that my question isn't the root problem and I had more digging to do. The error (in SSMS) is below.
Msg 4861, Level 16, State 1, Procedure Append_People, Line 71 [Batch Start Line 0]
Cannot bulk load because the file "N:\path\filename.csv" could not be opened. Operating system error code 3(The system cannot find the path specified.).
Once I established that the problem is in SSMS, I broadened my search and discovered that the path in the BULK INSERT command has to be relative to the machine hosting the database. So on the VM (the client machine until I migrate the db), when I use the path c:\ thinking it's the VM's c:\ drive, the stored procedure is actually looking at the c:\ of my laptop, since that's the host machine. With that I also learned that on a shared drive (N:\) access is delegated, and that is its own issue (https://dba.stackexchange.com/questions/44524/bulk-insert-through-network).
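In other words, the path in BULK INSERT is resolved by the SQL Server service on the host machine, not by the client issuing the call. As a rough illustration (the share name below is made up), keeping the laptop-hosted database would mean pointing at a location the laptop's SQL Server service can reach, subject to the delegation caveats in the linked post:

-- Resolved on the database host (my laptop), not on the VM issuing the call;
-- '\\laptop-host\shared' is a hypothetical share name
bulk insert stagingtbl
from '\\laptop-host\shared\filename.csv'
with (
firstrow = 2
, rowterminator = '\n'
, fieldterminator = ','
, tablock
)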
So I'm going to focus on migrating the database first; that'll solve my problem. Thanks to all who tried to help.
Currently I am using the SQLCMD utility to load CSV data into SQL Server. Below is the command I executed in the command prompt to load the data:
sqlcmd -Usa -Pxxx -S192.168.1.223,49546 -dlocal -i"/test.sql" -o"/test.log"
I have also copied my test.sql file contents for your reference:
SET NOCOUNT ON
BULK INSERT test FROM
"\\192.168.1.223\test.csv"
WITH
(
MAXERRORS = 1000000,
CODEPAGE = 1251,
FIELDTERMINATOR = '~%',
ROWTERMINATOR = '0x0a'
)
GO
SELECT CONVERT(varchar, @@ROWCOUNT) + ' rows affected'
GO
The insert operation is working fine with the above process. But my concern is that, in case of any errors due to data type or data length, the row is rejected and I am unable to trace that particular row.
Each time I have to look in the log file for the rejected row number and then in the data file to check the corresponding row.
Is there any option to write the error/rejected rows to another file, like the bad file we can generate with Oracle's SQL*Plus utility?
I think the option you are looking for is not in sqlcmd, but in BULK INSERT:
ERRORFILE ='file_name'
Specifies the file used to collect rows that have formatting errors and cannot be converted to an OLE DB rowset. These rows are copied into this error file from the data file "as is."
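Applied to your statement, it would look something like this (the error-file path is just an example; BULK INSERT creates the file itself and raises an error if it already exists):

SET NOCOUNT ON
BULK INSERT test FROM
"\\192.168.1.223\test.csv"
WITH
(
MAXERRORS = 1000000,
CODEPAGE = 1251,
FIELDTERMINATOR = '~%',
ROWTERMINATOR = '0x0a',
ERRORFILE = '\\192.168.1.223\test_rejects.txt' -- example path: rejected rows are copied here "as is"
)
GO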