How to insert multiple PDF files into database using SSMS - sql-server

I have a folder on my PC that contains nearly 300000 pdf files (~25GB). By using a SQL script, I want to insert all those pdf files into my database, including filename and the PDF file itself.
I have followed many tutorials and articles out there, but all of them require the filename to be specified one by one for each file.
Upload files article.
I want to have a stored procedure like this one:
CREATE PROCEDURE [dbo].[Importfiles] (
    @FolderPath NVARCHAR(1000)
)
AS
BEGIN
    DECLARE @tsql NVARCHAR(2000);
    SET NOCOUNT ON
    SET @tsql = 'insert into dokumentet ( name, files) ' +
        ' SELECT ' + -- I would like to select all files here (including their file name)
        'FROM Openrowset( Bulk ' + '''' + @FolderPath + '''' + ', Single_Blob) as pdf'
    EXEC (@tsql)
    SET NOCOUNT OFF
END
My database table schema, and source folder is like this:
Name is the file name, Files is the pdf file.
They should look like this on the database table:
Once again my goal is to create a SQL script to take the file and its name automatically, and insert them in the table. But I'm having difficulties.
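One way to avoid naming each file by hand is to let a small client-side script enumerate the folder and send one parameterized INSERT per file. The following is only a minimal sketch under stated assumptions: the table is `dokumentet(name, files)` as in the question, `import_pdfs` is a hypothetical helper, and the connection is any Python DB-API connection that uses `?` placeholders (pyodbc does, which is what one would probably use against SQL Server). The demo at the bottom runs against an in-memory SQLite table as a stand-in, not against SQL Server.

```python
import sqlite3
import tempfile
from pathlib import Path


def import_pdfs(conn, folder, batch_size=500):
    """Insert every *.pdf in `folder` into dokumentet(name, files).

    `conn` is any DB-API connection using '?' placeholders.
    Returns the number of files inserted.
    """
    sql = "INSERT INTO dokumentet (name, files) VALUES (?, ?)"
    cur = conn.cursor()
    count = 0
    for path in sorted(Path(folder).glob("*.pdf")):
        # The file name and the raw bytes travel as parameters, so no quoting is needed.
        cur.execute(sql, (path.name, path.read_bytes()))
        count += 1
        if count % batch_size == 0:
            conn.commit()  # commit in batches to keep transactions small
    conn.commit()
    return count


# Stand-in demo: an in-memory SQLite table instead of SQL Server.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "a.pdf").write_bytes(b"%PDF-1.4 first")
    (Path(demo_dir) / "b.pdf").write_bytes(b"%PDF-1.4 second")
    (Path(demo_dir) / "notes.txt").write_text("ignored")
    demo_conn = sqlite3.connect(":memory:")
    demo_conn.execute("CREATE TABLE dokumentet (name TEXT, files BLOB)")
    inserted = import_pdfs(demo_conn, demo_dir)
    rows = demo_conn.execute(
        "SELECT name, files FROM dokumentet ORDER BY name"
    ).fetchall()
```

Reading one file at a time keeps memory use bounded, which matters with roughly 300000 files; the batch size of 500 is an arbitrary illustrative value.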

Related

How to migrate attachments stored on a fileshare, referenced in MS Access, to SQL Server

I have an MS Access database that we're converting to a SQL Server backend. This database has an Attachments table with a few simple columns:
PK, FK to MainTable.RecordID, Description, filename
Attachments are stored in a fileshare. VBA code uses a hardcoded filepath and ShellExecute to save attachments to a directory, under a RecordID subfolder.
We're moving to store attachments in SQL Server using filestream.
I need to move these attachments from fileshare, to SQL Server, while maintaining RecordID integrity. SQL Server tables and columns are already set up.
These attachments vary in extensions (.msg, .doc, .xlsx, .pdf)
I've been looking into "OPENROWSET" but every example I've seen uses only one file.
I've been looking into SSMA but can't find what I'm looking for.
Any references/reference articles or code resources I can use/repurpose would be greatly appreciated.
Sounds like you want to write an SQL stored procedure that will find all files in a given file path, iterate over those files, and insert the file into a table.
This article will help in general: https://www.mssqltips.com/sqlservertip/5432/stored-procedure-to-import-files-into-a-sql-server-filestream-enabled-table/
This article is about xp_dirtree: https://www.sqlservercentral.com/blogs/how-to-use-xp_dirtree-to-list-all-files-in-a-folder
Here's sample code to read the file system from SQL. THIS IS UNTESTED CODE, you'll need to modify to your needs but it gives you some idea of how to do the loops and read in files.
--You will need xp_cmdshell enabled on SQL Server if not already.
USE master
GO
EXEC sp_configure 'show advanced options', 1
RECONFIGURE WITH OVERRIDE
EXEC sp_configure 'xp_cmdshell', 1
RECONFIGURE WITH OVERRIDE
GO
--Create a variable to hold the pickup folder.
DECLARE @PickupDirectory nvarchar(512) = '\\folder_containing_files_or_folders\';
--Create a table variable to hold the entries found in the pickup directory.
PRINT 'Parsing directory to identify most recent file.';
DECLARE @DirTree TABLE (
    id int IDENTITY(1,1)
    , subdirectory nvarchar(512)
    , depth int
    , isfile bit
);
--Added: the loops below use these two tables, which were never declared.
DECLARE @filesToProcess TABLE (subdirectory nvarchar(512), depth int, isfile bit);
DECLARE @processedFolders TABLE (folder_name nvarchar(512));
--Enumerate the pickup directory.
INSERT @DirTree
EXEC master.sys.xp_dirtree @PickupDirectory, 1, 1 --Second argument is depth; third includes files.
--Create variables to loop through folders and files.
--Bounded lengths are used because extended procedures may reject MAX-typed arguments.
DECLARE @folderCount int;
DECLARE @folderName nvarchar(512);
DECLARE @folderPath nvarchar(1024);
DECLARE @i int = 0;
DECLARE @fileCount int;
DECLARE @fileName nvarchar(512);
DECLARE @filePath nvarchar(1024);
DECLARE @j int = 0;
DECLARE @RecordID nvarchar(50);
DECLARE @SQLText nvarchar(max);
SET @folderCount = (SELECT COUNT(*) FROM @DirTree WHERE isfile = 0);
WHILE (@i < @folderCount)
BEGIN
    --Get the next folder to process.
    SET @folderName = (
        SELECT TOP 1 dt.subdirectory
        FROM @DirTree AS dt
        LEFT OUTER JOIN @processedFolders AS pf
            ON pf.folder_name = dt.subdirectory
        WHERE dt.isfile = 0
            AND pf.folder_name IS NULL
    );
    --Get the RecordID from folder name.
    SET @RecordID = @folderName; --Edit this to get the RecordID from your folder structure.
    --Concat root path and new folder to get files from.
    SET @folderPath = @PickupDirectory + @folderName + '\';
    --Enumerate this subdirectory to process files from.
    DELETE FROM @filesToProcess; --Clear leftovers from the previous folder.
    INSERT @filesToProcess
    EXEC master.sys.xp_dirtree @folderPath, 1, 1
    --Get count of files to loop through.
    SET @fileCount = (SELECT COUNT(*) FROM @filesToProcess WHERE isfile = 1);
    SET @j = 0; --Reset the file counter for every folder.
    WHILE (@j < @fileCount)
    BEGIN
        --Get next filename.
        SET @fileName = (SELECT TOP 1 subdirectory FROM @filesToProcess WHERE isfile = 1);
        --Concat the whole file path.
        SET @filePath = @folderPath + @fileName;
        --Double any single quotes so names cannot break the dynamic SQL.
        SET @SQLText = '
            INSERT INTO [table_name](RecordID,[filename],[filestreamCol])
            SELECT
            ''' + REPLACE(@RecordID, '''', '''''') + '''
            , ''' + REPLACE(@fileName, '''', '''''') + '''
            , BulkColumn
            FROM OPENROWSET(Bulk ''' + REPLACE(@filePath, '''', '''''') + ''', Single_Blob) as tb'
        EXEC sp_executesql @SQLText
        DELETE FROM @filesToProcess
        WHERE subdirectory = @fileName;
        SET @j = @j + 1;
    END
    INSERT INTO @processedFolders (folder_name)
    SELECT @folderName;
    PRINT 'Folder complete: ' + @folderName;
    SET @i = @i + 1
END
I think you want to parse just a root directory with the xp_dirtree command above. That will display all the subdirectories which should contain the "RecordID". Read the RecordID into a variable, then parse each of those subdirectories to get the actual files. If you want more detailed code, you'll have to show some examples of the directory structure and the destination table.
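The two-level walk described here (root folder, then one subfolder per RecordID, then the files inside) can also be sketched outside T-SQL. This is only an illustration in Python, assuming each immediate subfolder of the root is named after its RecordID; `list_attachments` is a hypothetical helper, and the folder and file names in the demo are made up.

```python
import tempfile
from pathlib import Path


def list_attachments(root):
    """Yield (record_id, filename, full_path) for files one level below `root`.

    Assumes each immediate subfolder of `root` is named after its RecordID.
    """
    for folder in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for file in sorted(p for p in folder.iterdir() if p.is_file()):
            yield folder.name, file.name, file


# Demo with a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    for record_id, name in [("1001", "quote.pdf"), ("1001", "mail.msg"), ("1002", "sheet.xlsx")]:
        folder = Path(root) / record_id
        folder.mkdir(exist_ok=True)
        (folder / name).write_bytes(b"demo")
    found = [(rid, fname) for rid, fname, _ in list_attachments(root)]
```

Each yielded tuple carries what the INSERT in the T-SQL sample needs: the RecordID, the file name, and the path to read the bytes from.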

Import images from folder into SQL Server table

I've been searching for this on Google but I haven't found any good explanation, so here's my issue.
I need to import product images, which are in a folder, into SQL Server. I tried to use xp_cmdshell, but without success.
My images are in C:\users\user.name\Images and the images have their names as the product id, just like [product_id].jpg and they're going to be inserted in a table with the product ID and the image binary as columns.
I just need to list the images on the folder, convert the images to binary and insert them in the table with the file name (as the product_id)
My questions are:
How do I list the images on the folder?
How do I access the folder with dots in their name (like user.name)
How do I convert the images to binary in order to store them in the database (if SQL Server doesn't do that automatically)
Thanks in advance
I figured I'd try an xp_cmdshell-based approach just for kicks. I came up with something that does appear to work for me, so I'd be curious to know what problems you ran into when you tried using xp_cmdshell. See the comments for an explanation of what's going on here.
-- I'm going to assume you already have a destination table like this one set up.
create table Images (fname nvarchar(max), data varbinary(max));
go
-- Set the directory whose images you want to load. The rest of this code assumes that @directory
-- has a terminating backslash.
declare @directory nvarchar(max) = N'D:\Images\';
-- Query the names of all .JPG files in the given directory. The dir command's /b switch omits all
-- data from the output save for the filenames. Note that directories can contain single-quotes, so
-- we need the REPLACE to avoid terminating the string literal too early.
declare @filenames table (fname varchar(max));
declare @shellCommand nvarchar(max) = N'exec xp_cmdshell ''dir ' + replace(@directory, '''', '''''') + '*.jpg /b''';
insert @filenames exec(@shellCommand);
-- Construct and execute a batch of SQL statements to load the filenames and the contents of the
-- corresponding files into the Images table. I found when I called dir /b via xp_cmdshell above, I
-- always got a null back in the final row, which is why I check for fname IS NOT NULL here.
declare @sql nvarchar(max) = '';
with EscapedNameCTE as (select fname = replace(@directory + fname, '''', '''''') from @filenames where fname is not null)
select
    @sql = @sql + N'insert Images (fname, data) values (''' + E.fname + ''', (select X.* from openrowset(bulk ''' + E.fname + N''', single_blob) X)); '
from
    EscapedNameCTE E;
exec(@sql);
I started with an empty Images table. Here's what I had after running the above:
Now I'm not claiming this is necessarily the best way to go about doing this; the link provided by @nscheaffer might be more appropriate, and I'll be reading it myself since I'm not familiar with SSIS. But perhaps this will help illustrate the kind of approach you were initially trying for.
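The REPLACE calls in the code above exist because a path may contain single quotes, which would otherwise end the string literal inside the generated statement. The same quote-doubling idea, sketched in Python purely so it can be run without a SQL Server: `tsql_literal` and `build_insert` are hypothetical helpers, and the statement they produce mirrors the shape of the one the answer builds rather than reproducing it exactly.

```python
def tsql_literal(text):
    """Return `text` as a quoted T-SQL string literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def build_insert(directory, fname):
    """Build one INSERT that loads `directory + fname` through OPENROWSET(BULK ...)."""
    path = tsql_literal(directory + fname)
    return (
        "insert Images (fname, data) "
        f"select {path}, X.BulkColumn from openrowset(bulk {path}, single_blob) X;"
    )


# A file name with an apostrophe is the case the escaping protects against.
stmt = build_insert("D:\\Images\\", "it's.jpg")
```

Without the doubling, the apostrophe in the file name would close the literal early and the generated statement would fail to parse.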

How do I save base64-encoded image data to a file in a dynamically named subfolder

I have a table containing base64-encoded jpegs as well as some other data. The base64 string is stored in a VARCHAR(MAX) column.
How can I save these images out to actual files inside folders that are dynamically named using other data from the table in a stored procedure?
I found the answer by combining a lot of small tips from different places and wanted to collate them all here as no one seemed to have the full process.
I have two tables, called Photos and PhotoBinary. Photos contains at least a PhotoID BIGINT, the Base64 data VARCHAR(MAX), and something to get the FolderName from - NVARCHAR(15) - increase as needed. I also have a BIT field to mark them as isProcessed. PhotoBinary has a single VARBINARY(MAX) column and should be empty.
The second table serves two purposes: it holds the converted base64-encoded image in binary format, and it lets me work around the fact that BCP will not let you skip columns when exporting data from a table using a "format file" to specify the column formats. So in my case the data has to sit in a table all on its own. I did try taking it from a view, but had the aforementioned issue with not being allowed to skip the id column.
The main stored procedure has a bcp command that depends upon an .fmt file created using the following SQL. I'm pretty sure I had to edit the resulting file with a plain text editor to change the 8 to a 0 that indicates the prefix length after SQLBINARY. I couldn't just use the -n switch on the command in the main stored procedure as it resulted in an 8 byte prefix being put into the resulting file, which made it an invalid jpeg. So, I used the edited format file to get around that.
DECLARE @command VARCHAR(4000);
SET @command = 'bcp DB.dbo.PhotoBinary format nul -T -n -f "A:\pathto\photobinary.fmt"';
EXEC xp_cmdshell @command;
I then have the following in a stored procedure that I run to export the images into an appropriate folder:
DECLARE @command VARCHAR(4000),
    @photoId BIGINT,
    @imageFileName VARCHAR(128),
    @folderName NVARCHAR(15),
    @basePath NVARCHAR(500),
    @fullPath NVARCHAR(500),
    @dbServerName NVARCHAR(100);
DECLARE @directories TABLE (directory nvarchar(255), depth INT);
-- The location of the output folder
SET @basePath = '\\server\share';
-- The server that the photobinary db is on
SET @dbServerName = 'localhost';
-- Get the folders that already exist under @basePath
INSERT INTO @directories(directory, depth) EXEC master.sys.xp_dirtree @basePath;
-- Cursor for each image in table that hasn't already been exported
DECLARE photo_cursor CURSOR FOR
    SELECT PhotoID,
        'some_image_' + CAST(PhotoID AS NVARCHAR) + '.jpg',
        FolderName
    FROM dbo.Photos
    WHERE isProcessed = 0;
OPEN photo_cursor
FETCH NEXT FROM photo_cursor
INTO @photoId,
    @imageFileName,
    @folderName;
WHILE (@@FETCH_STATUS = 0) -- Cursor loop
BEGIN
    -- Create the folder under @basePath if it is not already there
    IF NOT EXISTS (SELECT * FROM @directories WHERE directory = @folderName)
    BEGIN
        SET @fullPath = @basePath + '\' + @folderName;
        EXEC master.dbo.xp_create_subdir @fullPath;
    END
    -- move and convert the base64 encoded image to a separate table in binary format
    -- it should be the only row in the table
    INSERT INTO DB.dbo.PhotoBinary (PhotoBinary)
    SELECT CAST(N'' AS xml).value('xs:base64Binary(sql:column("Base64"))', 'varbinary(max)')
    FROM DB.dbo.Photos
    WHERE PhotoID = @photoId;
    -- This command uses the command-line BCP tool to "bulk export" the image data in binary to an "archive" file that just happens to be a jpg
    SET @command = 'bcp "SELECT TOP 1 PhotoBinary FROM DB.dbo.PhotoBinary" queryout "' + @basePath + '\' + @folderName + '\' + @imageFileName + '" -T -S ' + @dbServerName + ' -f "A:\pathto\photobinary.fmt"';
    EXEC xp_cmdshell @command;
    -- clean up the photo data
    DELETE FROM DB.dbo.PhotoBinary;
    -- mark photo as processed
    UPDATE DB.dbo.Photos SET isProcessed = 1 WHERE PhotoID = @photoId;
    FETCH NEXT FROM photo_cursor
    INTO @photoId,
        @imageFileName,
        @folderName;
END -- cursor loop
CLOSE photo_cursor
DEALLOCATE photo_cursor
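For comparison, the core of the export (decode the base64 string, make sure the dynamically named folder exists, write the bytes) is short when done in a client-side script instead of BCP. This is only a sketch in Python under assumptions: `export_photo` is a hypothetical helper, the `some_image_<id>.jpg` naming follows the procedure above, and the demo feeds it made-up bytes rather than rows read from the Photos table.

```python
import base64
import tempfile
from pathlib import Path


def export_photo(base_path, folder_name, photo_id, b64_data):
    """Decode one base64 string and write it to <base_path>/<folder_name>/some_image_<id>.jpg."""
    target_dir = Path(base_path) / folder_name
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder only if it is missing
    target = target_dir / f"some_image_{photo_id}.jpg"
    target.write_bytes(base64.b64decode(b64_data))
    return target


# Demo: the bytes are made up and only start like a JPEG header.
with tempfile.TemporaryDirectory() as base:
    fake_jpeg = b"\xff\xd8\xff\xe0 not a real image"
    encoded = base64.b64encode(fake_jpeg).decode("ascii")
    saved = export_photo(base, "batch_A", 42, encoded)
    saved_name = saved.name
    saved_parent = saved.parent.name
    round_trip = saved.read_bytes()
```

Because the bytes are written directly, there is no length prefix to strip and no format file to maintain.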

Execute BCP command in remote server

I am facing an issue while executing the BCP command on a remote server.
I have the following command in a batch file, which executes a script file.
The script file processes a temporary table and writes the records in the temporary table to a file.
SQLCMD -SQA_Server236 -dtestdb -Usa -PPassword1 -i"D:\script\Writing to Files\Write to CSV.sql" -o"D:\script\script_logs.log"
--script files contains...
declare @table NVARCHAR(255)
declare @filename VARCHAR(100)
set @filename='C:\TextFile\abcd.csv'
set @table ='##Indexes_Add'
IF OBJECT_ID('tempdb..##Indexes_Add') IS NOT NULL
BEGIN
    DROP TABLE ##Indexes_Add
END
CREATE TABLE ##Indexes_Add
(
    id int IDENTITY(1,1) PRIMARY KEY,
    alter_command NVARCHAR(MAX),
    successfully_readded BIT NULL
)
insert into ##Indexes_Add select 'a',0
insert into ##Indexes_Add select 'a',0
SET NOCOUNT ON;
IF OBJECT_ID('tempdb..'+@table) IS NOT NULL
BEGIN
    DECLARE
        @sql NVARCHAR(MAX),
        @cols NVARCHAR(MAX) = N'';
    SELECT @cols += ',' + name
    FROM tempdb.sys.columns
    WHERE [object_id] = OBJECT_ID('tempdb..'+@table)
    ORDER BY column_id;
    SELECT @cols = STUFF(@cols, 1, 1, '');
    SET @sql = N'EXEC master..xp_cmdshell ''bcp "SELECT '''''
        + REPLACE(@cols, ',', ''''',''''') + ''''' UNION ALL SELECT '
        + 'RTRIM(' + REPLACE(@cols, ',', '),RTRIM(') + ') FROM '
        + 'tempdb.dbo.'+@table + '" queryout "' + @filename + '" -c -t, -SQA_Server236 -Usa -PPassword1''';
    EXEC sp_executesql @sql;
    print @sql
END
My problem:
When I run the above command in a batch file and give my local server instead of "QA_Server236" as the server name, the file "abcd.csv" is created on my system. But when I give the server name as "QA_Server236", the file is created on the remote machine, i.e. QA_Server236. I want the file to be created on my system even when the given server is a remote server such as "QA_Server236".
Can anyone help me with this issue? I have not found any way to do so.
If I'm right, BCP does not allow saving result sets and/or logs to remote machines. Note also that your script starts BCP through xp_cmdshell, which runs on the SQL Server machine, so "C:\TextFile\abcd.csv" refers to a drive on QA_Server236 rather than on your PC.
You could try mounting a shared folder from your local machine on the remote PC and setting the output location there, or maybe try using a network path such as "\[YOURPC]....".
As I'm not sure it works (actually, I think it won't), here's the only other solution I can think of: add a line to your batch file which moves the file(s) from the remote machine to your PC (xcopy or similar) after BCP has finished executing (in the batch file, do this as "Call bcp.exe [params]" instead of just "bcp.exe ...").
Hope this helps!
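The "copy the file back afterwards" step suggested above could look roughly like this if done from a script rather than with xcopy. This is only a sketch in Python: `fetch_export` is a hypothetical helper, and in real use the source would be a share path on the server that your account can read, which is an assumption about your environment. The demo uses two temporary folders to stand in for the server share and the local PC.

```python
import shutil
import tempfile
from pathlib import Path


def fetch_export(remote_file, local_dir):
    """Copy an exported file from a (shared) server folder into `local_dir`."""
    local_dir = Path(local_dir)
    local_dir.mkdir(parents=True, exist_ok=True)
    # copy2 preserves timestamps and returns the path of the new copy.
    return Path(shutil.copy2(remote_file, local_dir))


# Demo: two temporary folders stand in for the server share and the local PC.
with tempfile.TemporaryDirectory() as server_share, tempfile.TemporaryDirectory() as my_pc:
    exported = Path(server_share) / "abcd.csv"
    exported.write_text("id,alter_command,successfully_readded\n1,a,0\n")
    local_copy = fetch_export(exported, my_pc)
    copied_name = local_copy.name
    copied_text = local_copy.read_text()
```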

How to create text files of database rows?

I have a database table with a column named File Content, and many rows. What I need is to create a text file for each row of File Content column.
Example:
Sr.   File Name   File Content
1.    FN1         Hello
2.    FN2         Good Morning
3.    FN3         How are you?
4.    FN4         Where are you?
Suppose I have 4 rows, then 4 text files should be created (maybe with any name which we want)
File1.txt should have text "hello" in it.
File2.txt should have text "Good Morning" in it.
File3.txt should have text "How are you?" in it.
File4.txt should have text "Where are you?" in it
Although you said you need to do it in TSQL, I wouldn't do it that way if possible. Ram has shown you one solution, but it has the disadvantages that you need to use xp_cmdshell and that the SQL Server service account needs permission to access the file system in whatever location you want to have the files.
My suggestion would be to write a script or small program in your preferred language (PowerShell, Perl, Python, C#, whatever) and use that instead. TSQL as a language is simply badly suited for manipulating files or handling anything outside the database. It is obviously possible (CLR procedures are another way), but you often run into problems with permissions, encodings and other issues that are much easier to deal with in an external language.
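As an illustration of that suggestion, here is a minimal sketch in Python that writes one text file per row. It assumes a table shaped like `t1` in the T-SQL answer in this thread (columns `sno`, `filename`, `filecontent`); `export_rows` is a hypothetical helper that accepts any DB-API connection, and the demo uses an in-memory SQLite database as a stand-in for SQL Server.

```python
import sqlite3
import tempfile
from pathlib import Path


def export_rows(conn, out_dir):
    """Write one <filename>.txt per row of t1 and return the file names written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cur = conn.cursor()
    cur.execute("SELECT filename, filecontent FROM t1 ORDER BY sno")
    written = []
    for name, content in cur.fetchall():
        target = out / f"{name}.txt"
        target.write_text(content, encoding="utf-8")  # the encoding is chosen explicitly
        written.append(target.name)
    return written


# Stand-in demo with the four rows from the question.
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE t1 (sno INTEGER PRIMARY KEY, filename TEXT, filecontent TEXT)")
demo.executemany(
    "INSERT INTO t1 (filename, filecontent) VALUES (?, ?)",
    [("FN1", "Hello"), ("FN2", "Good Morning"), ("FN3", "How are you?"), ("FN4", "Where are you?")],
)
with tempfile.TemporaryDirectory() as out_dir:
    names = export_rows(demo, out_dir)
    second = (Path(out_dir) / "FN2.txt").read_text(encoding="utf-8")
```

The files are written with the permissions of whoever runs the script, so the SQL Server service account needs no access to the target folder.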
This can be done with BCP OUT syntax of SQL server.
For the setup: just make sure that you have xp_cmdshell exec permissions on the server. This can be checked from master.sys.configurations table. Also change filelocation path corresponding to your server or network share. I checked and was able to generate 4 files as there are 4 records in the table.
use master
go
declare @DSQL Nvarchar(max)
declare @counter int
declare @maxrows int
declare @filename Nvarchar(30)
select @counter=1, @maxrows = 0
create table t1 (
    sno int identity(1,1) not null,
    filename varchar(5),
    filecontent varchar(100)
)
insert into t1
select 'FN1', 'Hello'
UNION
select 'FN2', 'Good Morning'
UNION
select 'FN3', 'How are you?'
UNION
select 'FN4', 'Where are you?'
select @maxrows = count(*) from t1
--SELECT * FROM T1
while (@counter <= @maxrows)
begin
    select @filename = filename from t1
    where sno = @counter
    select @DSQL = N'exec xp_cmdshell' + ' ''bcp "select filecontent from master.dbo.T1 where sno = ' + cast(@counter as nvarchar(10)) + '" queryout "d:\temp\' + @filename + '.txt" -T -c -S home-e93994b54f'''
    print @DSQL
    exec sp_executesql @DSQL
    select @counter = @counter + 1
end
drop table t1

Resources