SSIS Importing Excel Data - sql-server

I've been tasked with importing data from an Excel spreadsheet into a table in SQL Server 2012. The spreadsheet will have data added to it monthly.
My plan is to use SSIS to create a workflow to do this; I will then use SQL Server Agent to execute the workflow at the beginning of every month to add in the new data.
One problem I can see with this plan is that the spreadsheet is going to become huge and will eventually exceed Excel's maximum row count. Instead of adding to the one spreadsheet, I could have a new spreadsheet for each month, though I'm not sure how the workflow could pick the newest spreadsheet to add to the table.
I'm a complete novice with SSIS, and there might even be a more practical way of doing this whole process, so please feel free to offer suggestions.

Why insert data into one Excel worksheet?
Inserting data into one Excel worksheet, or even one workbook (Excel file), is not good practice at all; you have to think about it another way. You can create a new Excel file each time new data comes in, and save the historical data in another repository or directory (if you need to). Or, as @TabAlleman suggested, use flat files if you can: that is preferable, since reading data from Excel is more difficult. But also make sure that you do not store all the data in one flat file.
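If you go with one file per month, a date-stamped file name means the job can compute which file to read rather than having to detect the newest one. Here is a minimal T-SQL sketch of that idea; the folder, file-name pattern, sheet name, and target table are all hypothetical, and it assumes the Microsoft ACE OLE DB provider is installed and 'Ad Hoc Distributed Queries' is enabled (inside the package, an Excel connection manager with an expression-based connection string would achieve the same thing):

    -- Build the path of this month's workbook, e.g. C:\Imports\Sales_2023_01.xlsx
    -- (all names are placeholders). OPENROWSET requires literal arguments,
    -- hence the dynamic SQL.
    DECLARE @path nvarchar(260) =
        N'C:\Imports\Sales_' + FORMAT(GETDATE(), 'yyyy_MM') + N'.xlsx';

    DECLARE @sql nvarchar(max) = N'
        INSERT INTO dbo.MonthlySales
        SELECT src.*
        FROM OPENROWSET(''Microsoft.ACE.OLEDB.12.0'',
                        ''Excel 12.0 Xml;HDR=YES;Database=' + @path + N''',
                        ''SELECT * FROM [Sheet1$]'') AS src;';

    EXEC sys.sp_executesql @sql;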

Related

Get data from SQL Server with Excel VBA, modify some of it, then update the table

I have a SQL Server table with some rows (max 6000). With Excel VBA I can run a query and show the results in a worksheet.
Next, I can change some records in the worksheet.
Afterwards, I make a copy of the original sheet, and once some cells have changed I compare all the cells and update only the differing cells in the SQL Server database.
My question: is there a simpler way to do this?
Maybe, after I leave a cell, VBA could compare the before/after content and make an update if required?
Thanks for your opinions!
This is the VBA event you should use to catch when a cell is changed:
https://msdn.microsoft.com/en-us/vba/excel-vba/articles/worksheet-change-event-excel
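Whichever way you detect the edits, the update side can stay selective. A minimal T-SQL sketch, assuming the edited sheet has been pushed back into a staging table (all table and column names here are hypothetical):

    -- Update only rows whose values actually differ; the EXCEPT subquery is a
    -- NULL-safe "did any column change?" test.
    UPDATE t
    SET    t.Name  = s.Name,
           t.Email = s.Email
    FROM   dbo.Customer        AS t
    JOIN   dbo.CustomerStaging AS s ON s.Id = t.Id
    WHERE  EXISTS (SELECT s.Name, s.Email
                   EXCEPT
                   SELECT t.Name, t.Email);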
If you don't want to spend much time on it, I have created an Excel Add-In which updates data from Excel to SQL Server. It is a commercial product, but if it is a one-time job there is a fully functional 14-day trial which you can download from
https://sqlspreads.com

Extract data from thousands of Excel files into database

We use SharePoint 2013 as a library to hold thousands of Excel files, with formatting that is almost never consistent, to manage projects occurring on servers. Somewhere in these files, possibly formatted as table objects, is a common set of server names.
Somehow, without being able to change this process in the short term, I need to pull data from all these files to identify how many projects are targeting a particular server.
I've got access to SQL Server 2016 Enterprise, and I'm wondering if something like PolyBase could help with this? I also wonder about SSIS, but I don't expect any two tables to look exactly alike.
Other tools may be an option, but I'm not sure what can handle this scale and variety. I think daily updates to the data would be enough, but even so it's still a mess.
How do I pull thousands of varied Excel tables into a database? Is this even possible?
Any longer-term solution that doesn't let people format and annotate like Excel is unlikely to actually be adopted.
The less you know in advance, the more difficult it will be...
Some ideas:
Technology
Read about OPENROWSET (used in a FROM clause), which allows you to read from an Excel file; there is a sketch after this list.
Read about linked servers.
Use Excel and its great abilities through VBA to iterate through all your Excel sheets: open them, analyse them, and fill proper tables. Within Excel you know the most about your messy data...
Target structure
You might create thousands of tables, each representing one single sheet across all your Excel files. You could query these tables with dynamically created SQL (using metadata from INFORMATION_SCHEMA) or think about Full-Text Search.
You might import each sheet into one single XML structure (SELECT * ... FOR XML PATH('...')). In this case you'd need a target table with columns for the path and name of your Excel file, the name of the sheet, and an XML column for your data. Another approach would be to represent each file as one XML document and include all its sheets there. Try to define common naming for all your data. Querying XML allows you to query columns without knowing their actual names (XQuery with XPath using *); see the second sketch after this list.
If your Excel files are already .xlsx, you might open them with UNZIP and take the existing XML as-is.
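Two minimal sketches of the ideas above, with hypothetical paths, sheet and table names. The first assumes the Microsoft ACE OLE DB provider is installed and 'Ad Hoc Distributed Queries' is enabled:

    -- Read one sheet of one workbook directly from T-SQL.
    SELECT *
    FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
                    'Excel 12.0 Xml;HDR=YES;Database=C:\Projects\ServerList.xlsx',
                    'SELECT * FROM [Sheet1$]');

The second assumes each sheet has already been dumped row by row into an XML column, and shows how XPath with * queries columns without knowing their names:

    -- Shred every <row> element of every dump into name/value pairs.
    -- dbo.SheetDump(FilePath, SheetName, Data xml) is a hypothetical staging table.
    SELECT d.FilePath,
           d.SheetName,
           c.value('local-name(.)', 'sysname')  AS ColumnName,
           c.value('.', 'nvarchar(4000)')       AS ColumnValue
    FROM   dbo.SheetDump AS d
    CROSS APPLY d.Data.nodes('/row/*') AS x(c);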
To be honest: I do not think that any tool can do the magic to import such a wide range of mess automatically...

How can I import multiple csv files from a folder into sql, into their own separate table

I would like some advice on the best way to go about doing this. I have multiple files, all with different layouts, and I would like to create a procedure to import them into new tables in SQL.
I have written a procedure which uses xp_cmdshell to get the list of file names in a folder, then uses a cursor to loop through those file names and a BULK INSERT to get them into SQL, but I don't know the best way to create a new table with a new layout each time.
I thought that if I could import just the header row into a temp table, then I could use that to create a new table to do my bulk insert into, but I couldn't get that to work.
So what's the best way to do this using SQL? I am not that familiar with .NET either. I have thought about doing this in SSIS; I know it's easy enough to load multiple files which have the same layout in SSIS, but can it be done with variable layouts?
Thanks
You could use BimlScript to make the whole process automated where you just point it at the path of interest and it writes all the SSIS and T-SQL DDL for you, but for the effort involved in writing the C# you'd need, you may as well just put the data dump into SQL Server in the C#, too.
You can use SSIS to solve this issue, though, and there are a few levels of effort to pick from.
The easiest is to use the SQL Server Import and Export Wizard to create SSIS packages from your Excel spreadsheets that will dump the sheet into its own table. You'd have to run this wizard every time you had a new spreadsheet you wanted to import, but you could save the package(s) so that you could re-import that spreadsheet again.
The next level would be to edit a saved SSIS package (or write one from scratch) to parameterize the file path and the destination table names, and you could then re-use that package for any spreadsheets that followed the same format.
Further along would be to write a package that determined which of the packages from the previous level to call. If you can query the header rows effectively, you could probably write an SSIS package that accepted a path as an input parameter, found all the Excel sheets in that path, queried the header rows to determine the spreadsheet format, and then passed that information to the parameterized package for that format type.
SSIS development is, of course, its own topic - Integration Services Features and Tasks on MSDN is a good place to start. SSIS has its quirks, and I highly recommend learning BimlScript if you want to do a lot of SSIS development. If you'd like to talk over what the ideas above would require in more detail, please feel free to message me.
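For the pure-SQL route the question asks about, here is a minimal sketch of the "import just the header row" idea, with hypothetical file and table names. It loads line 1 of the CSV into a one-column temp table, parses it in order, and builds a CREATE TABLE in which every column is nvarchar(4000); a second BULK INSERT with FIRSTROW = 2 would then load the data rows:

    CREATE TABLE #header (HeaderLine nvarchar(max));

    BULK INSERT #header
    FROM 'C:\Imports\file1.csv'               -- placeholder path
    WITH (FIRSTROW = 1, LASTROW = 1, ROWTERMINATOR = '\n');

    -- Strip a trailing CR (Windows line endings), then split on commas in order.
    DECLARE @line nvarchar(max) =
        (SELECT TOP (1) REPLACE(HeaderLine, CHAR(13), N'') FROM #header);
    DECLARE @cols nvarchar(max) = N'', @pos int, @name sysname;

    WHILE LEN(@line) > 0
    BEGIN
        SET @pos  = CHARINDEX(N',', @line + N',');
        SET @name = LTRIM(RTRIM(LEFT(@line, @pos - 1)));
        SET @cols += CASE WHEN @cols = N'' THEN N'' ELSE N', ' END
                   + QUOTENAME(@name) + N' nvarchar(4000)';
        SET @line = STUFF(@line, 1, @pos, N'');
    END;

    DECLARE @sql nvarchar(max) =
        N'CREATE TABLE dbo.Import_file1 (' + @cols + N');';
    EXEC sys.sp_executesql @sql;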

Best way to import large excel file into SQL Server

We are trying to devise an optimal method for importing very large Excel files into a SQL database. Using SSIS is somewhat troublesome because it scans the top X records to determine the format of the file, but rows further down may be different, so it takes a lot of trial and error, with us having to bring the unusual columns to the top so SSIS can "learn".
When we get new file formats to import, they conform to a specification in terms of row formatting etc., so we can say we know the schema in advance. The SQL destination tables have the same schema, with a couple of extra columns such as date inserted and original filename.
Is there an easier way to create format definitions for new files we are going to insert? We don't have to use SSIS; we are open to any other tool, with a view to as much automation as possible. There's also the question of testing the sanity of the data we import; we were planning on running basic queries against staging datasets, such as "less than 1% of records can be missing a postal code", etc.
Many thanks
Maybe you can import the data as text and after that convert it using a Derived Column transformation. You can read data from Excel as text by using the IMEX option in the connection string. More information about this parameter can be found here.
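For reference, this is roughly what an Excel connection string with that option looks like; the provider version and file path are hypothetical, and IMEX=1 tells the driver to treat columns with intermixed types as text:

    Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\Imports\data.xlsx;
    Extended Properties="Excel 12.0 Xml;HDR=YES;IMEX=1";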

Select data from an Excel spreadsheet that's read-only

I've got a dtsx package that selects data from an Excel spreadsheet on the network and inserts it into a SQL Server table twice a day. However, the process fails if someone is in the spreadsheet modifying data. Is there a way to select data from an Excel spreadsheet so that the load doesn't fail if someone is in the spreadsheet?
I've not used dtsx in anger in a long time, but here's an alternative solution.
Each time the job runs, would it be possible to create a temporary copy of that spreadsheet (via your dtsx package)? You can then use that copy instead to import the data into the table. When you're done with the copy, you can simply remove it.
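A minimal sketch of that copy-first idea in T-SQL, assuming xp_cmdshell is enabled; inside the package itself, a File System Task would do the same job. Both paths are placeholders:

    -- Copy the live workbook to a staging location before reading it.
    EXEC master..xp_cmdshell
         'copy /Y "\\server\share\Source.xlsx" "D:\Staging\Source_copy.xlsx"';

    -- ... run the import against D:\Staging\Source_copy.xlsx here ...

    -- Clean up the temporary copy afterwards.
    EXEC master..xp_cmdshell 'del "D:\Staging\Source_copy.xlsx"';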
