I have a group of Excel files in a folder. The file names look like:
ABC 2014-09-13.xlsx
ABC 2014-09-14.xlsx
ABC 2014-09-15.xlsx
I need to get the data from the latest Excel file and load it into a table using an SSIS package.
This may not be the shortest answer, but it will help you.
Steps:
Create a Foreach Loop to fetch all the Excel files, and insert all the file names into a table.
Create a variable. Assign it the MAX() of the Excel dates; because the names carry the date as yyyy-MM-dd, MAX(FileName) sorts correctly as plain text.
Add a second Foreach Loop. Just like the first one, pick up the Excel files one by one, compare each file name with the variable value, and load the file that matches (a Script Task sketch of the same idea follows below).
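If you would rather not stage the file names in a table at all, the same MAX-by-name idea can be done in a single Script Task. A minimal sketch, where the folder path and file mask are assumptions (the next answer walks through a full version of this approach):

// Requires using System.Linq; folder path and mask are assumptions.
string folder = @"C:\ExcelDrop";
string[] files = System.IO.Directory.GetFiles(folder, "*.xlsx");
// The yyyy-MM-dd stamp in the names means an ordinal sort by file name finds the latest file.
string latest = files.OrderBy(System.IO.Path.GetFileName).LastOrDefault();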
Although this is a duplicate question, I will post an answer anyway with some changes and additional info.
You should already have created the destination table for the Excel import and added a Connection Manager to the package.
Create two variables: MainDir, holding the folder where the Excel files exist, and ExcelFile, to hold the full name of the latest file.
Add a Script Task to the package. Open it and, in the Script tab, set ReadOnlyVariables = User::MainDir and ReadWriteVariables = User::ExcelFile.
Press the Edit Script... button and, in the new window, paste this code into Main:
string fileMask = "*.xlsx";
string mostRecentFile = string.Empty;
string rootFolder = Dts.Variables["User::MainDir"].Value.ToString();

System.IO.DirectoryInfo directoryInfo = new System.IO.DirectoryInfo(rootFolder);
System.IO.FileInfo[] files = directoryInfo.GetFiles(fileMask, System.IO.SearchOption.TopDirectoryOnly);

// The date stamp (yyyy-MM-dd) in the names sorts correctly as text,
// so an ascending sort by name puts the most recent file last.
Array.Sort(files, (f1, f2) => f1.Name.CompareTo(f2.Name));

if (files.Length > 0)
{
    mostRecentFile = files[files.Length - 1].FullName;
}

Dts.Variables["User::ExcelFile"].Value = mostRecentFile;
Dts.TaskResult = (int)ScriptResults.Success;
Create an Excel Connection Manager and, in edit mode, point the Excel file path to one of the existing files, set the Excel version, and, if needed, keep First row has column names checked.
In the properties of the Excel Connection Manager, find Expressions and add the property ExcelFilePath with the value @[User::ExcelFile].
Add a Data Flow Task and connect it to the Script Task.
Add an Excel Source to the Data Flow Task and open its editor. Select the Excel Connection Manager you created before, change Data access mode to SQL command, and add this query (make sure the sheet in the Excel file is named Sheet1): SELECT * FROM [Sheet1$]. Also check that all the necessary columns are selected in the Columns tab.
The last component is an OLE DB Destination, which you must connect to the Excel Source. Add a connection manager, then select the table you want to insert into and map the columns.
That's all you need to do to import the Excel file...
I have a table which stores all of my customers and their invoices (fewer than 5k rows total), and I want to use a Foreach Loop Container to write each customer to their own file listing their own invoices.
I have used a Foreach Loop Container to read/load/write files before, so I understand that part, but how do I apply the Foreach Loop with AccountNumber as the enumerator?
For each file, I only want that customer's info.
My table:
AccountNumber InvoiceNumber OriginalCharge
A255 2017-11 225.00
A255 2017-12 13.50
A255 2018-01 25.00
D870 2017-09 7.25
D870 2017-10 10.00
R400 2016-12 100.00
R400 2017-03 5.00
R400 2017-04 7.00
R400 2017-09 82.00
So this would produce three files, each including the invoices/original charges for the given customer.
File 1 = Customer A255
File 2 = Customer D870
File 3 = Customer R400
Or should I approach this differently?
Environment: SQL Server 2014
SSIS-2012
Thanks!
You'll need to apply a few different recipes to make this work.
Dynamic file name
Source query parameterization
Shredding record set
Assumptions
You have three SSIS Variables:
CurrentAccountNumber, String (initial value of A255)
rsAccountNumbers, Object
FileNameOutput, String, EvaluateAsExpression = True, expression "C:\\ssisdata\\output\\" + @[User::CurrentAccountNumber] + ".txt"
The package would look something like
[Execute SQL Task] -> [Foreach (Ado.net) Enumerator] -> [Data Flow Task]
Execute SQL Task
Set the ResultSet property to Full result set.
Your source query would be SELECT DISTINCT AccountNumber FROM dbo.Invoices;
In the Result Set tab (assuming an OLE DB Connection Manager), click the Add button, use a Result Name of 0, and set the variable to User::rsAccountNumbers.
Foreach (Ado.net) Enumerator
Set the enumerator type to ADO.NET with Rows in the first table. Use the variable User::rsAccountNumbers and map our variable CurrentAccountNumber at index 0.
Run the package as-is to verify the Execute SQL Task returns a result set that the Foreach can shred. Observe that each loop of the enumerator changes the value of our variable FileNameOutput (C:\ssisdata\output\A255.txt, C:\ssisdata\output\D870.txt, etc.).
Data flow task
This is a simple flow:
[OLE DB Source] -> [Flat File Destination]
Configure your OLE DB Source to use a SQL command: SELECT * FROM dbo.Invoices WHERE AccountNumber = ?;
Click the Parameters button and map parameter 0 to @[User::CurrentAccountNumber].
Flat File Destination: connect the Source to the destination, create a new Flat File Connection Manager, and connect the columns.
Dynamic file name
The final piece is to edit the Flat File Connection Manager created above to use the variable FileNameOutput instead of the hard-coded value you indicated. Right-click the Flat File Connection Manager and select Properties. In the resulting properties window, find the Expressions property and click the ellipsis (...). In the left-hand window, find ConnectionString; in the right-hand window, use @[User::FileNameOutput].
F5 and the package should fire up and generate an output file per account number.
My actual Excel file has about 1,000 rows/records and up to 5 comma-separated tags per record in a "TAGS" column. I am very open to any and all suggested solutions besides manually populating the TEXTs&TAGs tbl.
The Excel sheet and the SQL Server tables resemble this:
Excel:
TEXT TAGS
--------------------------------
1.derivatives math, calculus
2.triangles math, geometry
Database:
TEXTs tbl
1.derivatives
2.triangles
TAGs tbl
1.math
2.calculus
3.geometry
4.science
TEXTs&TAGs tbl (many to many)
1.1,1
2.1,2
3.2,1
4.2,3
You can use ADODB (tutorial, another one) to connect to SQL Server from your VBA macro in Excel. Then just insert the records one by one like this. It won't be very fast, but it should be OK for your needs.
Sub update()
    'insertAndGetIdForText, getIdForTag and insertTextTag are your own
    'helpers that run the INSERTs over the ADODB connection
    For I = 1 To 100
        txt = Sheet1.Cells(I, 1)
        txt_id = insertAndGetIdForText(txt)
        tags = Split(Sheet1.Cells(I, 2), ",")
        For Each Tag In tags
            tag_id = getIdForTag(Tag) 'insert if not in DB, otherwise just return id
            Call insertTextTag(txt_id, tag_id)
        Next Tag
    Next I
End Sub
I am using SQL Server 2016.
I have a stored procedure GET_RECORDS that takes input parameters for filtering and outputs a CURSOR parameter.
I want to get this cursor in my SSIS package.
I created a Data Flow Task, an OLE DB Source, and variables for the parameter values, then mapped the parameters:
(screenshot: parameter mapping screen)
But when I tried to save the component, I got an error:
(screenshot: error screen)
I tried to add a WITH RESULT SETS clause with some dummy columns, but my procedure doesn't return any result set.
What am I doing wrong?
Any advice will be helpful.
Thank you.
With regards, Yuriy.
The source component is trying to determine what columns and types will be returned. Because you are using dynamic SQL, the metadata can change each time you run it.
WITH RESULT SETS allows you to define the shape of the data being returned (for example, EXEC dbo.GET_RECORDS ? WITH RESULT SETS ((Id INT, Name NVARCHAR(50))), with whatever columns actually apply), but it should only be used if you are guaranteed to get those exact results every time you execute.
EDIT:
I create a connection and run the command so that it populates a DataTable, then I put the column headers into a string array. There are plenty of examples out there.
Then I use the following function to create a destination table. Finally, I create a DataReader and pass it to the .NET SqlBulkCopy (a sketch of the overall flow follows the function below). Hope this helps.
private void CreateTable(string TableName, string[] Fields)
{
    // Drop and recreate the table if it already exists and overwrite is requested.
    if (TableExists(TableName) && Overwrite)
    {
        SqlCommand = new SqlCommand($"Drop Table [{TableName}]", SqlConnection);
        SqlCommand.ExecuteNonQuery();
    }

    // Build one Varchar(8000) column per field; generate Column1, Column2, ...
    // when the source has no header row.
    string Sql = $"Create Table [{TableName}] (";
    int ColumnNumber = 1;
    foreach (string Field in Fields)
    {
        string FieldValue = Field;
        if (!HasHeaders)
        {
            FieldValue = "Column" + ColumnNumber;
            ColumnNumber++;
        }
        Sql += $"[{FieldValue}] Varchar(8000),";
    }

    // Add bookkeeping columns and a clustered primary key.
    Sql = Sql + "ImportFileID Int, ID Int Identity(1,1) Not Null, Constraint [PK_" + TableName + "] Primary Key Clustered ([ID] Asc))";
    SqlCommand = new SqlCommand(Sql, SqlConnection);
    SqlCommand.ExecuteNonQuery();
}
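The surrounding flow described in the EDIT might look something like this rough sketch; dynamicSql, the destination table name, and the SqlConnection field are assumptions carried over from the snippet above, and it requires using System.Data, System.Data.SqlClient, and System.Linq:

// Run the dynamic SQL once so the DataTable captures its columns and rows.
DataTable results = new DataTable();
using (SqlDataAdapter adapter = new SqlDataAdapter(dynamicSql, SqlConnection))
{
    adapter.Fill(results);
}

// Put the column headers into a string array and create the destination table.
string[] fields = results.Columns.Cast<DataColumn>()
                         .Select(c => c.ColumnName)
                         .ToArray();
CreateTable("MyImportTable", fields);

// Create a data reader over the results and pass it to SqlBulkCopy.
using (SqlBulkCopy bulkCopy = new SqlBulkCopy(SqlConnection))
using (DataTableReader reader = results.CreateDataReader())
{
    bulkCopy.DestinationTableName = "MyImportTable";
    foreach (string field in fields)
    {
        // Map by name so the extra ImportFileID/ID columns stay untouched
        // (assumes the source really has headers).
        bulkCopy.ColumnMappings.Add(field, field);
    }
    bulkCopy.WriteToServer(reader);
}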
Use an ADO.NET source instead of an OLE DB source, define a simple SELECT, and get the columns you wish to return. Then you can define an expression in the Data Flow properties.
Search for "ado.net source dynamic sql".
:)
Try to return the records and use a Foreach loop in the ETL instead of a cursor:
https://www.simple-talk.com/sql/ssis/implementing-foreach-looping-logic-in-ssis/
I think you can do it in a simpler way, but I don't know what you are doing, exactly...
I am using an XML Source in SSIS to import an XML file into a SQL Server database.
I am not using all the detail elements from the XML file, but I want to save the original element with all its details in case they are needed at some point in the future.
Let's say the XML is:
<root>
  <row>
    <desc>Some row</desc>
    <child>
      <hi>hi</hi>
      <ho>ho</ho>
    </child>
  </row>
  <row>
    <desc>Some row2</desc>
    <child>
      <hi>hi2</hi>
      <ho>ho2</ho>
    </child>
  </row>
</root>
Intended result structure:
Create Table ParentTable
(
    Id int primary key identity,
    [desc] nvarchar(50),
    xmlElement xml
)
How can I load the original XML element (in this case the "row" element) into the database as well using SSIS?
I am new to SSIS, but I found a solution on the internet (maybe not the best, but it worked).
So here it comes.
First I created the same ParentTable you provided, just changed desc to nvarchar(255). I also added a Connection Manager to the package.
Then I created two new variables: User::FileName = "some.xml" and User::SourceCatalog = "C:\xmlCatalog\".
Next, I added a Data Flow Task, in which I added a Script Component (selecting the Source type).
I opened the Script Transformation Editor and, on the Script tab, added the newly created variables User::FileName,User::SourceCatalog to the ReadOnlyVariables property.
On the Inputs and Outputs tab, I renamed Output 0 to XMLResultOutput and, under Output Columns, created two new columns: xmlDesc (Data Type = Unicode string [DT_WSTR] 255) and xmlData (Data Type = Unicode string [DT_WSTR] 3000). These columns are used in the C# script below.
I pressed Edit Script... on the Script tab and, in the window that opened, pasted this code into the CreateNewOutputRows method:
XmlDocument xDoc = new XmlDocument();
string xml_filepath = Variables.SourceCatalog + Variables.FileName;
xDoc.Load(xml_filepath);

foreach (XmlNode xNode in xDoc.SelectNodes("//row"))
{
    this.XMLResultOutputBuffer.AddRow();
    // The whole <row> element, preserved for the xml column
    this.XMLResultOutputBuffer.xmlData = xNode.OuterXml;
    // Just the text of the <desc> child
    this.XMLResultOutputBuffer.xmlDesc = xNode.SelectSingleNode("./desc").InnerText;
}
Don't forget to add using System.Xml;
I added an OLE DB Destination component, linked the Script Component to it, selected the table, mapped the columns, and THAT'S IT.
I tried to search but didn't find an answer to a relatively simple thing. I have a CSV that doesn't have all the columns of my database table, and the CSV is also missing the auto-increment primary key.
All I did was read the CSV into a DataSet and then run traditional SqlBulkCopy code to write the first table of the DataSet to the database table. But it gives me the following error:
The given ColumnMapping does not match up with any column in the source or destination.
My bulk copy code is:
using (SqlBulkCopy blkcopy = new SqlBulkCopy(DBUtility.ConnectionString))
{
    blkcopy.EnableStreaming = true;
    blkcopy.DestinationTableName = "Project_" + this.ProjectID.ToString() + "_Data";
    blkcopy.BatchSize = 100;

    foreach (DataColumn c in ds.Tables[0].Columns)
    {
        blkcopy.ColumnMappings.Add(c.ColumnName, c.ColumnName);
    }

    blkcopy.WriteToServer(ds.Tables[0]);
    blkcopy.Close();
}
I added the mappings to test; removing the mapping part doesn't make a difference. If the mappings are removed, it tries to match columns by ordinal, and since the column counts differ, they end up with mismatched data types, missing column values, etc. And yes, the column names from the CSV do match those in the table, and they are in the same case.
EDIT: I changed the mapping code to compare the column names against the live DB. For this I simply run a SQL SELECT query to fetch one record from the database table and then do the following:
foreach (DataColumn c in ds.Tables[0].Columns)
{
    if (LiveDT.Columns.Contains(c.ColumnName))
    {
        blkcopy.ColumnMappings.Add(c.ColumnName, c.ColumnName);
    }
    else
    {
        log.WriteLine(c.ColumnName + " doesn't exist in final table");
    }
}
I would dump the results of the CSV into a staging SQL table... and then do a simple insert from the staging table to the production table.
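In code, that staging idea could look something like this sketch; the staging and production table names and column list are made up, and the staging table is assumed to match the CSV columns exactly (requires using System.Data.SqlClient):

using (SqlConnection conn = new SqlConnection(DBUtility.ConnectionString))
{
    conn.Open();

    // Load the raw CSV rows into the staging table first; because the staging
    // table mirrors the CSV exactly, no column mappings are needed.
    using (SqlBulkCopy staging = new SqlBulkCopy(conn))
    {
        staging.DestinationTableName = "dbo.Staging_Data";
        staging.WriteToServer(ds.Tables[0]);
    }

    // Then move only the columns you need into the production table.
    string sql = @"INSERT INTO dbo.Project_Data (Col1, Col2)
                   SELECT Col1, Col2 FROM dbo.Staging_Data;";
    using (SqlCommand cmd = new SqlCommand(sql, conn))
    {
        cmd.ExecuteNonQuery();
    }
}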
Also try a simple import of the CSV into a SQL table; maybe there are some empty/invalid columns within the CSV file.
I once had this problem and the cause was a difference in the case of the column names. One of the columns was "Id", but in the DB it was "id".
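If case turns out to be the culprit here as well, one workaround is to resolve each mapping case-insensitively against the live table and register it with the DB-side spelling, since SqlBulkCopy's column-name matching is case-sensitive. A sketch reusing the ds, LiveDT, and blkcopy objects from the question (requires using System, System.Data, and System.Linq):

foreach (DataColumn c in ds.Tables[0].Columns)
{
    // Find the DB column whose name matches, ignoring case.
    DataColumn match = LiveDT.Columns.Cast<DataColumn>()
        .FirstOrDefault(dbCol => string.Equals(dbCol.ColumnName, c.ColumnName,
                                               StringComparison.OrdinalIgnoreCase));
    if (match != null)
    {
        // Source name as it appears in the CSV, destination name as in the DB.
        blkcopy.ColumnMappings.Add(c.ColumnName, match.ColumnName);
    }
}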