Merge a number of Excel files and load to SQL database - sql-server

I am trying to merge a number of files: about 40,000 Excel files, all in exactly the same format (columns etc.).
I have tried running a merge command through CMD, which merged them together up to a point, but the CSV file it merged them into is too large for me to open.
What I am trying to find out is the best process to merge such a large number of files, and then the process to load them into SQL Server.
Are there any tools for this, or is it something that would need to be customised and built?

I don't know of a tool for that, but my first idea is this, assuming you are experienced with Transact-SQL:
Open a command shell, change to the folder where your Excel files are stored and enter the following command: dir *.xlsx /b > source.txt
This will create a text file named "source.txt", which contains the names (and only the names) of all your Excel files.
Import this file into a SQL Server table, e.g. called "sourcefiles".
Create a new stored procedure which contains a cursor. The cursor should read your table "sourcefiles" in a loop, row by row, and store the name of the Excel file currently being read in a variable, e.g. called "@FileName".
In this loop, perform a SQL statement like this for every Excel file read (because OPENROWSET only accepts literal strings, in practice you build the statement as dynamic SQL and splice @FileName in; see the sketch further below):
INSERT INTO dbo.YourDatabaseTable  -- SELECT ... INTO would only work for the first file, since it creates the table
SELECT * FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
    'Excel 12.0 Xml;HDR=YES;Database=@FileName',
    'SELECT * FROM [YourWorkSheet$]')
Let the cursor read the next row.
Replace "YourDatabaseTable" and "YourWorkSheet" with your own names.
@FileName must contain the full path to the Excel file.
You may have to download the Microsoft.ACE.OLEDB.12.0 provider before executing the SQL command.
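Putting the steps together, here is a rough T-SQL sketch of the whole loop. It is only an outline, untested against your data: the folder path, table names and worksheet name are placeholders, it assumes dbo.YourDatabaseTable already exists with columns matching the spreadsheets, and the server must allow ad hoc OPENROWSET queries ('Ad Hoc Distributed Queries' via sp_configure). Because OPENROWSET only takes literal strings, the statement is built as dynamic SQL for each file:

-- Load the file list created by "dir *.xlsx /b > source.txt" (folder path is a placeholder)
CREATE TABLE dbo.sourcefiles (FileName NVARCHAR(260));
BULK INSERT dbo.sourcefiles FROM 'C:\YourExcelFolder\source.txt';

DECLARE @FileName NVARCHAR(260), @sql NVARCHAR(MAX);
DECLARE file_cursor CURSOR FAST_FORWARD FOR
    SELECT FileName FROM dbo.sourcefiles;

OPEN file_cursor;
FETCH NEXT FROM file_cursor INTO @FileName;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Build the OPENROWSET statement per file; the folder is prefixed here
    -- because source.txt (from dir /b) holds bare file names only
    SET @sql = N'INSERT INTO dbo.YourDatabaseTable
        SELECT * FROM OPENROWSET(''Microsoft.ACE.OLEDB.12.0'',
            ''Excel 12.0 Xml;HDR=YES;Database=C:\YourExcelFolder\' + @FileName + N''',
            ''SELECT * FROM [YourWorkSheet$]'');';
    EXEC sys.sp_executesql @sql;

    FETCH NEXT FROM file_cursor INTO @FileName;
END

CLOSE file_cursor;
DEALLOCATE file_cursor;

With 40,000 files this will run for quite a while, so it is worth adding error handling inside the loop (for example logging files that fail to open) rather than letting one bad workbook stop the whole import.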
Hope this helps you think about your further steps.
Michael
Edit: have a look at this website for possible errors.

Related

Import data from Excel using SSIS without knowing the file name

I'm working on an SSIS package that will be used to import data from an Excel file into SQL Server. My current struggle is figuring out how to make the SSIS package bring in exactly one Excel file without knowing the name of it beforehand. I have a directory that will contain between 0 and n Excel files at the same time. I want to pull in only the file with the oldest creation time. Is this possible?
I'm using Visual Studio 2015 to build the SSIS package. My DB is in SQL Server 2016.
To create a dynamic file connection:
Create a new Variable (Name Example: 'SourceFile') of datatype String.
In a 'For Each Loop Container', map that variable under the 'Variable Mappings' tab and set the 'Enumerator Configuration' to the correct folder and file extension.
The 'For Each Loop Container' will read the file from the location and assign the name of the file to the variable.
In the Expressions properties of your file connection, set the ConnectionString property to @[User::SourceFile]
This should make your file source dynamic. It will pick up the file no matter what it is named, but the format of the file will have to be consistent.
Using just SSIS tasks, I am not aware of a way to use the files' creation dates to pick the oldest file, but if the file name contains the creation date you could substring the date out of the @[User::SourceFile] variable and store it in another variable on each iteration of the 'For Each Loop Container' to determine which file is oldest.

Moving files based on a source path found in a table using SSIS

I've chased my tail for a full 12 hours. Haven't found the right solution.
I'm locked into using SSIS. I have a SQL Server table with full paths and filenames already concatenated. Examples:
\\MydevServer1\C$\ABC\App_Data\Sample.pdf
\\MydevServer2\E$\Garth\App_Data\Morefiles.txt
\\MydevServer3\D$\Paths\App_Data\MySS.xlsx
etc.
I need to read each row of the table, get the path and filename and move that file to a new static destination directory.
The rows in the table will remain unchanged. I only use it as a source to locate the file to be moved.
I've tried:
1) Feeding a result set from an OLE DB source to a Recordset Destination, then to an Object variable that connects via variable to a Foreach Loop Container holding a File System Task. (Very problematic.)
2) Sending the table rows to a .csv file and reading each line of the csv file using a foreach loop container holding a file system task.
3) Reading directly from the table rows using a foreach loop container holding a file system task. (preferred).
and many other scenarios.
I have viewed a hundred examples online, but most of them involve loading a table, or sending results to flat files, or moving files from one folder to another based on extension type, etc. I haven't found anything on configuring a file system task to read a table supplied path and move the file based on the table value as the source.
I'm rambling. :-)
Any insight or help will be appreciated. I'm not new to SSIS, but I sure feel like it right now.
Create two string variables to store the source and destination paths.
Use an Execute SQL Task to populate a Full Recordset (a variable with Object data type); a sketch of the query is below these steps.
Use a Foreach Loop Container (with the Foreach ADO enumerator) to go through each row of the recordset and set those two variables.
Inside the Foreach Loop Container, use a File System Task. You need to specify IsSourcePathVariable = True, IsDestinationPathVariable = True, the path variables (DestinationVariable / SourceVariable), and the operation (copy, move, etc.).
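For the Execute SQL Task, a query along these lines would return both paths per row. The table name, column name and destination folder are placeholders for whatever yours are called; the destination is simply the static target folder plus the file name pulled off the end of the stored path:

-- Source path comes straight from the table; destination = static folder + file name
SELECT
    FilePath AS SourcePath,
    '\\MyDestServer\StaticDestination\'
        + RIGHT(FilePath, CHARINDEX('\', REVERSE(FilePath)) - 1) AS DestinationPath
FROM dbo.FilePaths;

In the Foreach ADO enumerator's variable mappings, column 0 then feeds the source variable and column 1 the destination variable.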
It appears I've been tail chasing due to the "Source is empty" error.
This was caused by a blank first row in my recordset. I was searching for a fix to the "Object variable is empty" issue, when in reality the problem was that the Object variable couldn't find data right off the bat.
Insert shameful smug here.
Thanks to Anton for the help.

Use SSIS to import multiple .csv files that each have unique columns

I keep running into issues creating a SSIS project that does the following:
inspects folder for .csv files -> for each csv file -> insert into [db].[each .csv files' name]
each csv and corresponding table in the database have their own unique columns
I've tried the foreach loop found in many write-ups, but the issue comes down to the flat file connection. It seems to expect each CSV file to have the same columns as the file before it, and it errors out when not presented with those column names.
Is anyone aware of a workaround for this?
Every flat file format would have to have its own connection, because the connection is what tells SSIS how to interpret the data set contained within the file. If it didn't exist, it would be the same as telling SQL Server you want data out of a database but not specifying a table or its columns.
I guess the thing you have to consider is: how are you going to tell a Data Flow Task which column in a source component maps to which column in a destination component? Will it always be the same column name? Without a connection manager there is no way to map the columns unless you do it dynamically.
There are still a few ways you can do what you want and you just need to search around because I know there are answers on this subject.
You could create a Script Task and do the import in .NET.
You could use an Execute SQL Task with BULK INSERT or OPENROWSET into a temporary staging table, and then use dynamic SQL to map and import into the final table.
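As a rough sketch of that second option (the path and table names are placeholders, and it assumes a staging table that already matches the incoming file's layout):

-- File path and table names would normally come from the Foreach Loop / a mapping table
DECLARE @FilePath NVARCHAR(260) = N'C:\Imports\Customers.csv';  -- placeholder
DECLARE @Staging  SYSNAME = N'dbo.stg_Customers';               -- placeholder
DECLARE @Final    SYSNAME = N'dbo.Customers';                   -- placeholder
DECLARE @sql NVARCHAR(MAX);

-- BULK INSERT cannot take the file name from a variable, so build the statement dynamically
SET @sql = N'BULK INSERT ' + @Staging + N'
    FROM ''' + @FilePath + N'''
    WITH (FIELDTERMINATOR = '','', ROWTERMINATOR = ''\n'', FIRSTROW = 2);';
EXEC sys.sp_executesql @sql;

-- Then map/import into the final table, also via dynamic SQL
SET @sql = N'INSERT INTO ' + @Final + N' SELECT * FROM ' + @Staging + N';';
EXEC sys.sp_executesql @sql;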
Try to keep a mapping table with the below columns (a T-SQL sketch of it follows at the end of this answer):
FileLocation
FileName
TableName
Add all the details in the table.
Create user variables for all the column names and one for the result set.
Read the data from the table using an Execute SQL Task and keep it in the single result set variable.
In the For Each Loop Container's variable mappings, map all the columns to the user variables.
Create two connection managers, one for Excel and the other for the CSV file.
Pass the CSV file connection string as @[User::FileLocation] + @[User::FileName]
Inside the For Each Loop Container, use Bulk Insert and assign the source and destination connections, as well as the table name from the User::TableName variable.
If you need any details please post, and I will try to help.
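A sketch of such a mapping table (all names and rows are just examples):

CREATE TABLE dbo.FileImportMap
(
    FileLocation NVARCHAR(260) NOT NULL,  -- folder, e.g. '\\FileServer\Imports\'
    FileName     NVARCHAR(260) NOT NULL,  -- e.g. 'Customers.csv'
    TableName    SYSNAME       NOT NULL   -- destination table, e.g. 'dbo.Customers'
);

INSERT INTO dbo.FileImportMap (FileLocation, FileName, TableName)
VALUES (N'\\FileServer\Imports\', N'Customers.csv', N'dbo.Customers'),
       (N'\\FileServer\Imports\', N'Orders.csv',    N'dbo.Orders');

The Execute SQL Task then selects all three columns into the result set variable, and the For Each Loop Container feeds them into the connection string expression and the table name variable described in the steps above.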
You could look into BimlScript, which dynamically creates and executes a package based on available metadata.
I have two options for you here.
1) A Script Component, to dynamically create table structures in SQL Server.
2) Within a For Each Loop Container, use an Execute SQL Task with an OPENROWSET clause.

How to insert many source tables into one destination table using SSIS?

I have a connection manager (OLE DB) that points to a folder that has 20 dbf files. All the dbf files have the same schema; the data is just for specific entities. I want to take all the data from the 20 dbf files and insert it into one table (in SQL Server). What tasks enable me to do this?
You should create "For each loop" container, to write each file path to variable f.e. #fp
After that, you put inside DataFlow task and configure the connection
After, you should create another variable like #table (substring your #fp,to only file name) and put this variable in DataFlow task in source table.
ready example

Convert SSMS .rpt output file to .txt/.csv

I want to export my big SSMS (SQL Server Management Studio) query result (2.5m lines, 9 fields) as .csv or comma-delimited .txt (with headings). (MS SQL Server 2005 Management Studio.)
So that I can then either read it line-by-line into VBA program (to do certain calculations on the data) or do queries on it in Excel (e.g. with Microsoft Query). The calculations are complicated and I prefer to do it somewhere else than SSMS.
If I choose 'query result to text' in SSMS and have a small result (a few lines, e.g. up to 200k), I can of course simply copy and paste into a text editor. For my large result here I could of course copy and paste about 200k lines at a time, ten times, into a text editor like UltraEdit. (When I try all 2.5m at once, I get a memory warning inside SSMS.) But for the future I'd like a more elegant solution.
For ‘query result to file’, SSMS writes to an .rpt file always. (When you right-click in the results window and choose ‘save as’, it gives a memory error just like above.)
--> So it looks like my only option is to have SSMS output its result to a file i.e. .rpt and then afterwards, convert the .rpt to .txt.
I assume this .rpt is a Crystal Reports file? Or isn't it? I don't have Crystal Reports on my PC, so I cannot use that to convert the file.
When opening the .rpt in UltraEdit it looks fine. However, in Microsoft Query in Excel, the headings don't want to show.
When I simply read and write the .rpt using VBA, the file halves in size (330 MB to 180 MB). In Microsoft Query the headings do show now (though the first field name has a funny leading character, which has happened to me before in other totally different situations). I do seem to be able to do meaningful pivot tables on it in Excel.
However, when I open this new file in UltraEdit, it shows Chinese characters! Could there still be some funny characters in it somewhere?
--> Is there perhaps a free (and simple/safe) converter app available somewhere? Or should I just trust that this .txt is fine for reading into my VBA program?
Thanks
Simple way: In SQL Server Management Studio, go to the "Query" menu & select "Query Options…" > Results > Text > Change "Output Format" to "Comma Delimited". Now, run your query to export to a file, and once done rename the file from .rpt to .csv and it will open in Excel :).
Here is my solution.
Use Microsoft SQL Server Management Studio
Configure it to save Tab delimited .rpt files: Go to 'Query' > 'Query Options' > 'Results' > 'Text' > 'Output Format' and choose 'Tab delimited' (press OK)
Now, when you create a report, use the 'Save With Encoding...' menu, and select 'Unicode' (by default, it's 'UTF8')
You can now open the file with Excel, and everything will be in columns, with no escaping nor foreign characters issues (note the file may be bigger due to unicode encoding).
Well, with the help of a friend I found my solution: .rpt files are plain text files generated in MS SQL Server Management Studio, but with UCS-2 Little Endian encoding instead of ANSI.
--> In UltraEdit, the option 'File, Conversion Options, Unicode to ASCII' did the trick. The text file shrinks from 330 MB to 180 MB, Microsoft Query in Excel can now see the columns, and VBA can read the file and process lines.
P.s. Another alternative would have been to use MS Access (which can handle big results) and connect with ODBC to the database. However then I would have to use Jet-SQL which has fewer commands than the T-SQL of MS SQL Server Management Studio. Apparently one can create a new file as .adp in MS Access 2007 and then use T-SQL to a SQL Server back end. But in MS Access 2010 (on my PC) this option seems not to exist anymore.
You can use BCP
Open a command prompt, then type this:
SET Q="select * from user1.dbo.table1"
BCP.EXE %Q% queryout query.out -S ServerName -T -c -t
You can use -U -P (instead of -T) for SQL Authentication.
Your app have a problem with UNICODE. You can force a code page using -C {code page}. If in doubt, try 850.
-t will force tab as field delimiter, you can change it for comma -t,
The nice thing is you can call this directly from your VBA running shell command.
This is the recommended way I see you can do it.
My Source (Answer from DavidAir)
Pick "results to grid" then then right-click on the grid and select "Save Results As..." This will save a CSV.
Actually, there is a problem with that if some values contain commas - the resulting CSV is not properly escaped. The RPT file is actually quite nice as it contains fixed-width columns. If you have Excel, a relatively easy way of converting the result to CSV is to open the RPT file in Excel. This will bring up the text import wizard and Excel would do a pretty good job at guessing the columns. Go through the wizard and then save the results as CSV.
I recommend using the "SQL Server Import and Export Wizard" for several reasons:
The output file will not have a status message at the bottom like a .rpt file does (i.e. "(100 rows affected)"), which may mess up your data import
Ability to specify custom row and column delimiters of a length greater than 1 character
Ability to specify custom source-to-destination mapping (e.g. column FirstName can be mapped to first_name in the CSV)
Ability to perform a direct transfer to any other database accessible from the SSMS machine
Ability to explicitly select your file encoding and locale
It can be accessed by right-clicking on your database in the management studio (you must right-click the database and not the table) and selecting Tasks > Export Data.
When asked for data source you can select the "SQL Server Native Client" and when asked to select a destination you can select "Flat File Destination".
You are then asked to specify a table or query to use.
You can find more info about the tool here:
https://learn.microsoft.com/en-us/sql/integration-services/import-export-data/start-the-sql-server-import-and-export-wizard?view=sql-server-2017
In my case, I executed a query in SSMS (pressing CTRL+SHIFT+F beforehand); the result opened a window to save it as an .rpt file, which I couldn't read (no Crystal Reports installed on my computer). So the next time I ran the query I saved it as (all files) with the extension *.txt, and that was it: I was able to read it as a text file.
First get your data into an .rpt file using any of the above methods.
Default .rpt with fixed-width columns (262 MB).
Comma delimited with Unicode (52 MB) - I used this.
Change the file extension to .csv.
Open/import it in Excel and verify the data. The file type is 'Text Unicode'.
Save it as CSV (Comma delimited), which reduced the size to 25 MB.
