Discovering files in the file system through SSIS - sql-server

I have a folder where files are going to be dropped for importing into my data warehouse.
\\server\share\loading_area
I have the following (inherited) code that uses xp_cmdshell (shivers) to call out to the command shell, run the DIR command, and insert the resulting file names into a table in SQL Server.
I would like to 'go native' and reproduce this functionality in SSIS.
Thanks in advance, guys and girls. Here's the code:
USE MyDatabase
GO
declare @CMD varchar(500)
declare @EXTRACT_PATH varchar(255)
set @EXTRACT_PATH = '\\server\share\folder\'
create table tmp_FILELIST([FILENUM] int identity(1,1), [FNAME] varchar(100), [FILE_STATUS] varchar(20) NULL CONSTRAINT [DF_FILELIST_FILE_STATUS] DEFAULT ('PENDING'))
set @CMD = 'dir ' + @EXTRACT_PATH + '*.* /b /on'
insert tmp_FILELIST([FNAME])
exec master..xp_cmdshell @CMD
--remove the DOS reply when the folder is empty
delete tmp_FILELIST where [FNAME] is null or [FNAME] = 'File Not Found'
--remove administrative and default/common files that are not for importing, such as readme.txt
delete tmp_FILELIST where [FNAME] is null or [FNAME] = 'readme.txt'

Use a Foreach Loop container with the Foreach File Enumerator.

Since you're only inserting file names into a table (i.e. not doing any processing on each file at the same time in SSIS), I suggest doing it all with .NET in a Script Task. This will also make it easy to add additional logic, such as filtering names. See the following items in System.Data.SqlClient (a small sketch of the insert the script would run follows the list):
SqlConnection
SqlCommand
SqlCommand.Parameters
SqlCommand.ExecuteNonQuery()
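If the script task writes into the OP's existing tmp_FILELIST table, the statement it executes for each discovered file is just a parameterized insert along these lines (a sketch; the @fname parameter name is illustrative):
INSERT INTO tmp_FILELIST (FNAME) VALUES (@fname);
The script would enumerate the directory, bind each file name to @fname via SqlCommand.Parameters, and call ExecuteNonQuery() once per file.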

Related

Azcopy move or remove specific files from SQL

I need to move files from one blob to another. Not copy. I need to move, meaning when I move from A to B the files go from A to B and nothing is left in A. Which is so basic but not possible in Azure Blob. Please let me know if it's possible. I am using it from SQL Server with azcopy version 10.3.2.
Now because of this, I need to copy files from A to B and then remove the files from A. There are 2 problems.
1) I only want certain files to go from A to B.
DECLARE @Program varchar(200) = 'C:\azcopy.exe'
DECLARE @Source varchar(max) = '"https://myblob.blob.core.windows.net/test/myfolder/*?SAS"'
DECLARE @Destination varchar(max) = '"https://myblob.blob.core.windows.net/test/archive?SAS"'
DECLARE @Cmd varchar(5000)
SELECT @Cmd = @Program +' cp '+ @Source +' '+ @Destination + ' --recursive'
PRINT @Cmd
EXECUTE master..xp_cmdshell @Cmd
So when I use myfolder/*, it takes all the files. When I try myfolder/*.pdf, it says:
failed to parse user input due to error: cannot use wildcards in the path section of the URL except in trailing "/*". If you wish to use * in your URL, manually encode it to %2A
When I try myfolder/%2A.pdf or myfolder/%2Apdf, it still gives the error.
2) I also get this message:
INFO: Failed to create one or more destination container(s). Your transfers may still succeed if the container already exists.
But the destination folder is already there, and in the log file it says:
RESPONSE Status: 403 This request is not authorized to perform this operation.
For azcopy version 10.3.2:
1. To copy specific files, e.g. only .pdf files, add --include-pattern "*.pdf" to your command. Also remember to remove the wildcard * from the @Source variable, so @Source should be '"https://myblob.blob.core.windows.net/test/myfolder?SAS"'.
The complete command looks like this (adapt it to your SQL command):
azcopy cp "https://xx.blob.core.windows.net/test1/folder1?sas" "https://xx.blob.core.windows.net/test1/archive1?sas" --include-pattern "*.pdf" --recursive=true
2. To delete specific blobs, e.g. only .pdf files, add --include-pattern "*.pdf" to your azcopy rm command as well.
Also, there is no move command in azcopy; you have to copy first and then delete, which you can do with the two commands above. A sketch of running both from SQL Server follows.
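Putting both steps together from SQL Server, the xp_cmdshell pattern from the question might look something like this (a sketch only, not retested; the paths, SAS placeholders, and azcopy location are the OP's):
DECLARE @Program varchar(200) = 'C:\azcopy.exe'
DECLARE @Source varchar(max) = '"https://myblob.blob.core.windows.net/test/myfolder?SAS"'
DECLARE @Destination varchar(max) = '"https://myblob.blob.core.windows.net/test/archive?SAS"'
DECLARE @Cmd varchar(5000)
-- Step 1: copy only the .pdf blobs into the archive container
SELECT @Cmd = @Program + ' cp ' + @Source + ' ' + @Destination + ' --include-pattern "*.pdf" --recursive=true'
EXECUTE master..xp_cmdshell @Cmd
-- Step 2: remove the same .pdf blobs from the source folder
SELECT @Cmd = @Program + ' rm ' + @Source + ' --include-pattern "*.pdf" --recursive=true'
EXECUTE master..xp_cmdshell @Cmd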

Create a single table by importing all CSV files in a folder?

I have around 30-40 CSV files in a folder. For example, suppose the folder 'Florida' has customer information from different cities of the state of Florida; each CSV file has the customer information for one city. Now I want to create a table in SQL Server by importing all the CSV files from that folder, so I end up with one table for all customers in Florida. I wanted to know if there is any way I could perform this action for all CSV files at once. I am using SQL Server Management Studio (SSMS).
All the CSV files have same column names.
I am doing the following for one CSV file:
CREATE TABLE sales.cust (
Full_name VARCHAR (100) NOT NULL,
phone VARCHAR(50),
city VARCHAR (50) NOT NULL,
state VARCHAR (50) NOT NULL
);
BULK INSERT sales.cust
FROM 'C:\Users..............\cust1.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ',', --CSV field delimiter
ROWTERMINATOR = '\n', --Use to shift the control to next row
ERRORFILE = 'C:\Users\..............\cust1ErrorRows.csv',
TABLOCK
)
This suggestion uses the command prompt only, because of the limited tools available.
I thought of another solution that could help you out and make it so you only have to import one file.
Create your table:
CREATE TABLE sales.cust (
Full_name VARCHAR (100) NOT NULL,
phone VARCHAR(50),
city VARCHAR (50) NOT NULL,
state VARCHAR (50) NOT NULL
);
Using the command prompt, do the following:
a. Navigate to your directory using cd "C:\Users..............\"
b. Copy the files into one giant file using:
copy *.csv combined.csv
Import that file using the GUI in SSMS.
Deal with the headers:
delete from sales.cust where full_name = 'Full_name' and phone = 'phone'
You can only do this because all columns are varchar.
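If you would rather stay in T-SQL than use the import GUI for the combined file, the BULK INSERT from the question should work on it as well (a sketch reusing the OP's options, minus the ERRORFILE; the elided path is kept as in the question):
BULK INSERT sales.cust
FROM 'C:\Users..............\combined.csv'
WITH
(
FIRSTROW = 2,
FIELDTERMINATOR = ',', --CSV field delimiter
ROWTERMINATOR = '\n', --Use to shift the control to next row
TABLOCK
)
Then run the header-row delete shown above, since FIRSTROW = 2 only skips the header of the first file that went into combined.csv.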
Here is one route to get all the files into a table:
-- from Rigel and froadie @ https://stackoverflow.com/questions/26096057/how-can-i-loop-through-all-the-files-in-a-folder-using-tsql
-- 1.Allow for SQL to use cmd shell
EXEC sp_configure 'show advanced options', 1 -- To allow advanced options to be changed.
RECONFIGURE -- To update the currently configured value for advanced options.
EXEC sp_configure 'xp_cmdshell', 1 -- To enable the feature.
RECONFIGURE -- To update the currently configured value for this feature.
-- 2.Get all FileNames into a temp table
--for repeatability when testing in SSMS, delete any prior table
IF OBJECT_ID('tempdb..#tmp') IS NOT NULL DROP TABLE #tmp
GO
CREATE TABLE #tmp(csvFileName VARCHAR(100));
INSERT INTO #tmp
EXEC xp_cmdshell 'dir /B "C:\Users..............\*.csv"';
-- from Chompy @ https://bytes.com/topic/sql-server/answers/777399-bulk-insert-dynamic-errorfile-filename
-- 3.Create a SQL prototype of the dynamic SQL
--   with CSV field delimiter ',' and row terminator '\n'
DECLARE @sqlPrototype nvarchar(500)
SET @sqlPrototype = N'BULK INSERT sales.cust
FROM ''C:\Users..............\xxxx''
WITH ( FIRSTROW = 2,
FIELDTERMINATOR = '','',
ROWTERMINATOR = ''\n'',
ERRORFILE = ''C:\Users..............\xxxx_ErrorRows.txt'',
TABLOCK)'
-- 4.Loop through all of the files
Declare @fileName varchar(100)
While (Select Count(*) From #tmp where csvFileName is not null) > 0
Begin
Select Top 1 @fileName = csvFileName From #tmp where csvFileName is not null
-- 5.Replace the real filename into the prototype
PRINT(@fileName)
DECLARE @sqlstmt nvarchar(500)
Set @sqlstmt = replace(@sqlPrototype, 'xxxx', @fileName)
--print(@sqlstmt)
-- 6.Execute the resulting sql
EXEC sp_executesql @sqlstmt;
-- 4A.Remove the FileName that was just processed
Delete from #tmp Where csvFileName = @fileName
End
Caution: if the ERRORFILE already exists, BULK INSERT will fail.
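One way to avoid that is to clear any leftover error files before the loop starts, reusing the same xp_cmdshell approach (a sketch; the elided path follows the prototype above, and the trailing wildcard also catches the companion .Error.Txt file that BULK INSERT writes alongside the error file):
DECLARE @cleanupCmd varchar(500)
SET @cleanupCmd = 'del "C:\Users..............\*_ErrorRows.txt*" 2>nul'
EXEC master..xp_cmdshell @cleanupCmd, no_output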

Cannot get the correct files in the folder using the SQL xp_cmdshell command

I have files in my computer's folder with following names:
XXX.IN.txt
YYY.TEST.NUM.txt
ABC.AA.Z100.X.E999567777.Y001.txt
ABC.AA.Z100.X.E999568888.Y002.txt
ABC.AA.Z100.X.E999568888.Y003.txt
I want to write a SQL statement that inserts the files with the structure described above into a table, so I can later write some logic on them.
I already use the following statement inside my stored proc to check whether a file exists:
EXEC master.dbo.xp_fileexist @fullPath, @exist OUTPUT
SET @exist = CAST(@exist AS BIT)
Now I need to find only the files containing a certain string in their names. This is the statement I have:
DECLARE @cmdLine VARCHAR(200)
DECLARE @fullPath VARCHAR(900) = '\\my_network_path\MyDir\'
DECLARE @filter VARCHAR(100) = 'ABC.AA.Z100.X.*.txt'
SET @cmdLine = 'dir "' + @fullPath + '"'
EXEC master..xp_cmdshell @cmdLine
The above command should give me only the following files:
ABC.AA.Z100.X.E999567777.Y001.txt
ABC.AA.Z100.X.E999568888.Y002.txt
ABC.AA.Z100.X.E999568888.Y003.txt
CREATE TABLE #FileDetails
(
data VARCHAR(MAX)
)
INSERT #FileDetails(data) EXEC master..xp_cmdshell @cmdLine
But it lists all the .txt files in the folder.
How would I list only the files that I need?
First of all, @cmdLine should be declared larger than @fullPath, since it ultimately has to hold all of it plus the filter.
Second of all, unless I am looking at it wrong or you didn't correct it here, the @filter variable isn't being used, so the command shows every file regardless of extension.
My Code:
DECLARE @cmdLine VARCHAR(2000)
DECLARE @fullPath VARCHAR(1000) = '\\my_network_path\MyDir\'
DECLARE @filter VARCHAR(100) = 'ABC.AA.Z100.X.*.txt'
SET @cmdLine = 'dir "' + @fullPath + @filter + '"'
EXEC master..xp_cmdshell @cmdLine
My Output (keep in mind I created a Test.txt in the same folder):
10-10-2017 12:17 0 ABC.AA.Z100.X.Y001.txt
10-10-2017 12:18 0 ABC.AA.Z100.X.Y002.txt
10-10-2017 12:18 0 ABC.AA.Z100.X.Y003.txt
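Combining that filter with dir's /b switch and the table from the question gives a listing-to-table version along these lines (a sketch, not tested; the UNC path and filter are the OP's, and the NULL / 'File Not Found' cleanup follows the pattern at the top of this page):
DECLARE @cmdLine VARCHAR(2000)
DECLARE @fullPath VARCHAR(1000) = '\\my_network_path\MyDir\'
DECLARE @filter VARCHAR(100) = 'ABC.AA.Z100.X.*.txt'
-- /b strips the dates and sizes so only the bare file names land in the table
SET @cmdLine = 'dir /b "' + @fullPath + @filter + '"'
IF OBJECT_ID('tempdb..#FileDetails') IS NOT NULL DROP TABLE #FileDetails
CREATE TABLE #FileDetails (data VARCHAR(MAX))
INSERT #FileDetails(data) EXEC master..xp_cmdshell @cmdLine
-- xp_cmdshell returns a trailing NULL row, and 'File Not Found' when nothing matches
DELETE FROM #FileDetails WHERE data IS NULL OR data = 'File Not Found'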
I would do this with either a CLR proc or an SSIS script that uses the FileSystemObject to iterate through the files, filter for the ones you want, build a SQL string, and execute it.
I don't know any way to do what you want with just straight T-SQL.

I am not able to delete a .csv file using xp_cmdshell

I am getting an "Access is denied" error when I try deleting the .csv file from a folder using xp_cmdshell. However, I can delete the .csv.gpg file successfully from the same location using xp_cmdshell.
My query is:
--delete the csv file from local folder
SELECT @Delete2 = 'del ' + 'C:\Akshay\files\testfile.csv'
EXEC master..xp_cmdshell @Delete2
I would say you need to use DECLARE and SET, assuming you know what the file name is. I just tried the statement you have and it erred because of the SELECT.
Here is my solution with the information I have:
DECLARE @delete VARCHAR(50)
SELECT @delete = 'del B:\test.txt'
EXEC xp_cmdshell @delete
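To see exactly what the shell reports for the failing delete, you can capture the xp_cmdshell output into a table, the same way the dir output is captured elsewhere on this page (a sketch; the path is the one from the question):
DECLARE @Delete2 VARCHAR(200)
SET @Delete2 = 'del "C:\Akshay\files\testfile.csv"'
IF OBJECT_ID('tempdb..#CmdOutput') IS NOT NULL DROP TABLE #CmdOutput
CREATE TABLE #CmdOutput (line VARCHAR(MAX))
INSERT #CmdOutput(line) EXEC master..xp_cmdshell @Delete2
-- An 'Access is denied' line here usually points at NTFS permissions for the account
-- xp_cmdshell runs under (or a read-only/locked file), not at the T-SQL itself
SELECT line FROM #CmdOutput WHERE line IS NOT NULL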

Import images from folder into SQL Server table

I've been searching for this on Google but I haven't found any good explanation, so here's my issue.
I need to import product images, which are in a folder, into SQL Server. I tried to use xp_cmdshell but without success.
My images are in C:\users\user.name\Images and are named after the product id, like [product_id].jpg. They are going to be inserted into a table with the product ID and the image binary as columns.
I just need to list the images in the folder, convert the images to binary, and insert them into the table with the file name (as the product_id).
My questions are:
How do I list the images in the folder?
How do I access a folder with dots in its name (like user.name)?
How do I convert the images to binary in order to store them in the database (if SQL Server doesn't do that automatically)?
Thanks in advance
I figured I'd try an xp_cmdshell-based approach just for kicks. I came up with something that does appear to work for me, so I'd be curious to know what problems you ran into when you tried using xp_cmdshell. See the comments for an explanation of what's going on here.
-- I'm going to assume you already have a destination table like this one set up.
create table Images (fname nvarchar(max), data varbinary(max));
go
-- Set the directory whose images you want to load. The rest of this code assumes that @directory
-- has a terminating backslash.
declare @directory nvarchar(max) = N'D:\Images\';
-- Query the names of all .JPG files in the given directory. The dir command's /b switch omits all
-- data from the output save for the filenames. Note that directories can contain single-quotes, so
-- we need the REPLACE to avoid terminating the string literal too early.
declare @filenames table (fname varchar(max));
declare @shellCommand nvarchar(max) = N'exec xp_cmdshell ''dir ' + replace(@directory, '''', '''''') + '*.jpg /b''';
insert @filenames exec(@shellCommand);
-- Construct and execute a batch of SQL statements to load the filenames and the contents of the
-- corresponding files into the Images table. I found when I called dir /b via xp_cmdshell above, I
-- always got a null back in the final row, which is why I check for fname IS NOT NULL here.
declare @sql nvarchar(max) = '';
with EscapedNameCTE as (select fname = replace(@directory + fname, '''', '''''') from @filenames where fname is not null)
select
@sql = @sql + N'insert Images (fname, data) values (''' + E.fname + ''', (select X.* from openrowset(bulk ''' + E.fname + N''', single_blob) X)); '
from
EscapedNameCTE E;
exec(@sql);
I started with an empty Images table; after running the above, it contained one row per .jpg file, with the file name and the binary contents.
Now I'm not claiming this is necessarily the best way to go about doing this; the link provided by @nscheaffer might be more appropriate, and I'll be reading it myself since I'm not familiar with SSIS. But perhaps this will help illustrate the kind of approach you were initially trying for.
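Since the question ultimately wants a product_id column rather than a full path, one way to derive it from the stored file name would be something like this (a sketch, assuming the id itself never contains '.jpg'):
SELECT
    product_id = REPLACE(RIGHT(fname, CHARINDEX('\', REVERSE(fname)) - 1), '.jpg', ''),
    data
FROM Images;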
