Having a different set of tables for every SSIS load, I want to implement a smart routine that launches only the packages whose files are present.
I have a task that uploads the file listing from a folder to a database table:
[dbo].[FileList]
Product.csv
Sales_2018.csv
Customer.csv
Delivery.csv
If, besides the Product, Sales, Customer, and Delivery packages, my SSIS project also has Shipping, Returns, and others, is it possible to disable those automatically based on a FileList match, so that only the Product, Sales, Customer, and Delivery packages would run?
Or should it be approached in a different way?
Thank you!
I've done this in the past using this simple control flow.
Note that the second foreach loops through only once. It is just a check to see whether that one file exists, without needing a script task.
A few more notes:
1. Store the Execute SQL Task results in an Object variable
2. The outer foreach enumerates the ADO object (the variable from step 1)
2a. Map the current iteration of the object to local variables
3. The inner foreach is a file enumerator based on the local variable from step 2
4. The Execute Package task's expression is based on the local variable from step 2 (a sketch of the step 1 query follows)
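For reference, a minimal sketch of the step 1 query, assuming the listing table is [dbo].[FileList] with a single FileName column; the derived PackageName is only an assumption about your naming convention:

-- Return one row per file, plus the package name implied by the file name
-- (PackageName and the .csv convention are assumptions, not your schema)
SELECT FileName,
       REPLACE(FileName, '.csv', '') AS PackageName
FROM dbo.FileList;

The outer foreach then maps FileName and PackageName from each row to local variables, and those variables drive both the inner file-exists check and the Execute Package task's expression.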
I am creating an SSIS package to import CSV file data into a SQL Server table.
Some of the rows in the CSV files will have missing values.
For example, if a row has the format value1,value2,value3 and value2 is missing,
then it will render as value1,,value3 in the CSV file.
When the above happens (value2 is missing) in my SSIS package, I want NULL to go into the receiving SQL Server column that would hold value2.
I understand that I can add a "Script" task to my SSIS package to apply this rule. However, I'm concerned that this will drastically reduce the performance of my SSIS package. I'm not an expert on the inner workings of SSIS/SQL Server, but I'm concerned that this script will cause my package to lose "BULK INSERT" capabilities (and other efficiencies), since the script will have to inspect every row and apply the changes as needed.
Can anyone confirm if adding such a script will cause major performance impacts? Or does the SSIS/SQL-Server engine run the script on every row and then bulk-insert? Is there another way I can apply this rule without taking a performance hit?
Firstly, you can use a Script Task when required. A Script Task executes only once per run of the whole package, not once per row. For per-row logic there is a different component, the Script Component, which runs inside a data flow. When the regular SSIS tasks are not enough to achieve what you want, you can certainly use a Script Component, and I don't believe it is a performance killer unless you implement it badly.
Secondly, for this particular requirement you can simply use a Flat File Source to import your CSV file. With its "Retain null values from the source as null values in the data flow" option checked, it will load NULL where a value is missing. I'm assuming this is a valid CSV file and each row has the correct number of commas for its fields (total fields - 1, actually), even when the value is empty or null for some fields.
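To illustrate the expected outcome (the table and column names here are placeholders, not anything from your package): a row that arrives as value1,,value3 should land with a NULL in the middle column.

-- Hypothetical three-column destination table
CREATE TABLE dbo.CsvTarget (
    Col1 varchar(50) NULL,
    Col2 varchar(50) NULL,
    Col3 varchar(50) NULL
);

-- After the flat file load, the row "value1,,value3" should read back as
-- Col1 = 'value1', Col2 = NULL, Col3 = 'value3'
SELECT Col1, Col2, Col3
FROM dbo.CsvTarget
WHERE Col2 IS NULL;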
I want to load files into a SQL Server database on a weekly basis. Each file name contains a date. Currently, I am using a Foreach Loop Container to get the file names and store them in a table. The table contains 3 columns: FileName, Date, and Week. After loading FileName, I use an Execute SQL Task to extract Date and Week from the FileName and populate the Date and Week columns. Then I use another Execute SQL Task to SELECT all the table data ORDER BY Date and Week and store it in an object variable. Finally, I use a Foreach Loop Container with the ADO Enumerator and the object variable to load the actual files in date order. This works fine.
However, I want to load the files week by week. For example, all the files that have week 15 in the table should be loaded first, then all the files of week 16, and so on. The reason I want to load like this is that after loading one week of files I want to process it with a stored procedure.
I think the problem can be solved by making two edits:
Loop over weeks
Add an Execute SQL Task that retrieves the distinct weeks from the table
Add a Foreach Loop container to loop over the weeks
Inside the Foreach Loop, add an Execute SQL Task that retrieves the rows for the current week (a sketch of both queries follows this list)
Use another Foreach Loop container to loop over the result
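A sketch of the two queries, assuming your staging table is named dbo.FileLoadList and has the FileName, Date, and Week columns you described:

-- Outer loop source: one row per distinct week, in order
SELECT DISTINCT [Week]
FROM dbo.FileLoadList
ORDER BY [Week];

-- Inner loop source: the files for the current week, in date order
-- (? is mapped to the Week value from the outer loop's variable)
SELECT FileName
FROM dbo.FileLoadList
WHERE [Week] = ?
ORDER BY [Date];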
Ordered results
You can simply add an ORDER BY clause to the query inside the Execute SQL Task to get an ordered result set.
This is a limitation of the ForEach Loop enumerators: there is no built-in way to load files in a sorted/ordered manner. If you want to load files that way, there are two options:
Purchase an expensive package of components from a third-party vendor that provides a ForEach Loop enumerator capable of processing files in a sorted/ordered manner.
Do it yourself manually.
For option two, you will need to perform the following steps (a sketch of the SQL involved follows the list):
Create a ForEach File Loop enumerator that scans the folder for all files and inserts each file name into a database table.
Create an Execute SQL Task that SELECTs all the file names, ordered by file name. You can add constraints in the WHERE clause to control the date range of files that you want to process.
Load the result set into a variable of type Object.
Create a ForEach ADO Loop enumerator to loop through each file name stored in the object.
Place a Data Flow Task in the loop and process the files.
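A sketch of the SQL behind steps 1 and 2, with an assumed staging table named dbo.FileQueue:

-- Step 1: inside the ForEach File loop, an Execute SQL Task inserts each
-- fully qualified file name (? is mapped to the loop's file name variable)
INSERT INTO dbo.FileQueue (FileName) VALUES (?);

-- Step 2: read the names back in sorted order
SELECT FileName
FROM dbo.FileQueue
-- WHERE FileName LIKE '%2018%'   -- optional constraint on a date pattern
ORDER BY FileName;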
I am new to SSIS and am trying to understand how to do the following:
I have a folder (TestFolder) that has multiple folders within it (SubFolder1, SubFolder2, etc). In each subfolder there are multiple Excel files that have various names but end in a date (formatted as YYYYMM). In each Excel workbook there is a tab named AccessRates, and this is the data I want to store in the table in SQL Server.
Okay, so the question: how do I set up my SSIS control flow to handle such a task? I have built a Data Flow Task that handles the data conversion, error handling, and ultimate placement in the server table, but I cannot figure out the control flow. I believe I need a Foreach Loop container, but I can't figure out how to set it up, along with the variables.
Any help or direction would be greatly appreciated!
JP
Solution guidelines
You should follow these steps:
Use a foreach loop and enumerate on files.
Set the top folder and select traverse subfolders.
Set the file mask to something like [the start of all files]*.xlsx
Retrieve the fully qualified file name and map it to a variable.
Inside the foreach, drop a Data Flow Task.
Make an Excel connection to any one of the files.
Go to the properties of the connection (F4).
Set an expression mapping the connection string to the variable from step 4.
Set Delay Validation to true.
Do your data flow.
This should be it.
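One detail worth adding: since every workbook stores the data in a tab named AccessRates, the Excel Source inside the data flow can use a SQL command (rather than the table drop-down) to read that sheet by name; the $ suffix is how the provider addresses a worksheet:

SELECT * FROM [AccessRates$]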
Step-by-step tutorials
There are many articles that describe the whole process step-by-step, you can refer to them if you need more details:
How to read data from multiple Excel files with SQL Server Integration Services
Loop Through Excel Files in SSIS
Loop through Excel Files and Tables by Using a Foreach Loop Container
Have a look at this link; it shows you how to set up an environment variable to store the sheet name you want to get the data from, and then how to use that to get the data from an Excel source.
Hope this helps!
(I am not a vested enough member to include screenshots in my post; here is a link to a shared OneDrive folder which has the article with images to explain better: https://1drv.ms/f/s!Ai9bJ2dV87SLg9h5FP84If74hyUK4Q)
I am trying to log what particular stored procs have inserted, updated, and deleted after they were executed via an Execute SQL Task within an SSIS package workflow. There was a custom logging method that a 3rd party implemented, but it worked by relating a system ParentContainer ID to a user Task ID, which served as a parameter to a stored procedure that logged such information. 1) I don't think this will work from an Execute SQL Task, and 2) I want a level of detail that extends past which DML function occurred.
I have been successful in logging a "single row" by setting up a result set, using variables, and using an adjacent Data Flow Task with a Derived Column transformation to retrieve the variables and insert them into a log table.
As an example of my current working method:
(screenshot: the Execute SQL Task setup)
(screenshot: detail of the data flow part that logs)
I am now coming across stored procedures that perform multiple inserts, so I need to log additional detail: more than one row. I created variables in the proc to capture this multiple-INSERT scenario and have a UNION SELECT in the SP that yields a multi-row result set.
I understand that I now need to use the Full result set setting, but for this application, what method is used to persist the result set to another step (for me, a destination, in order to log it)? From my research I understand how one may use a Foreach Loop container, but this appears to be configurable with only one variable, which needs to be of type Object. I currently have 4 variables here and am unable to set them up as such in the Collection section of the Foreach Loop.
I would appreciate any insight in achieving this or suggestion to another method altogether.
many thanks in advance!
Your INSERT_B_BUDGET SQL task generates more than one row, and you want to use a Foreach Loop to catch the full result set, correct?
Create a variable, LoopObject, with data type Object.
Edit the INSERT_B_BUDGET task:
in the General tab, change ResultSet to Full result set
in the Result Set tab, set Variable Name to LoopObject.
Add a Foreach Loop container after the INSERT_B_BUDGET task and move your Logging 1 task into the container:
in the Collection tab, for Enumerator select Foreach ADO Enumerator, set ADO object source variable to LoopObject, and for Enumeration mode select Rows in the first table.
in Variable Mappings, add your existing 4 variables, mapped to indexes 0 through 3 (a hypothetical sketch of the proc's final SELECT follows).
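For context, a purely hypothetical sketch of what the final SELECT inside the stored procedure might look like to return one row per INSERT, with four columns matching your four variables (all names here are assumptions, not your actual proc):

-- One row per INSERT performed by the proc; the four columns map, in
-- order, to indexes 0 through 3 in the Foreach Loop's Variable Mappings
SELECT 'B_BUDGET'  AS TargetTable,
       @BudgetRows AS RowsInserted,
       @LoadStart  AS StartTime,
       @LoadEnd    AS EndTime
UNION ALL
SELECT 'B_BUDGET_DETAIL',
       @DetailRows,
       @LoadStart,
       @LoadEnd;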
You would have to set up a Profiler trace to track which procs are being executed. Set up filters on the trace by database and user, and use the SQLProfilerTSQL_SPs template.
I'm fairly new to SSIS and don't know all its features and which tasks I can use to do the things I want. I have done many Google and stackoverflow.com searches that helped me get to know variables and parameters and how to set them, etc.
BLUF (Bottom Line Up Front)
I have a view with data which I want to export to a file through a job that runs the package.
The data will be filtered by its LastUpdatedDate field, which has a datatype of datetimeoffset(7). The package should allow a user to run it with a specified date, or else use a value from another table (SSISJobRun).
Structure
/*Employee Table*/
Id int
Name varchar(255)
LastUpdatedDate datetimeoffset(7)
/*SSISJobRun Table*/
Id int
JobTypeId int
RunDate datetimeoffset(7)
What I have Done
Currently, I'm using SSDT for VS 2015 to create my SSIS packages and deploy them to my SSIS Catalog.
I have a project with 1 package. The package contains:
a Data Flow Task named EmployeeExport; this task contains an OLE DB Source and a Flat File Destination
a package-level parameter named Filename_Export (this is so that the file path can be changed when it's run by a user; the parameter has a default value configured within the Job that runs it daily)
All this runs perfectly fine.
Problem
I have also set another package-level parameter named LastUpdatedDate. The intent is to have whoever (or whatever) runs the package define a date. However, if the date is null (if I decide to use a string), or if the date is the default value 1899-12-30 00:00:00 (if I decide to use a date), I want to determine what date to use.
Specifically, if no real date is supplied by the user, then I want the date to be the latest RunDate. For that case I use the following code:
SELECT TOP 1 RunDate
FROM SSISJobRun
WHERE JobTypeId = 1
ORDER BY RunDate DESC
I've tried many different ways, and it works when I supply a date, but I couldn't get it to work when the supplied date was blank (when I used a string) or the default value (when I used a date).
Here's a few sources I've been looking through to figure out my issue
How to pass SSIS variables in ODBC SQLCommand expression?
How do I pass system variable value to the SQL statement in Execute SQL task?
http://mindmajix.com/ssis/how-to-execute-stored-procedure-in-ssis-execute-sql-task
.. and many more.
One last note: this date will be used to run two tasks, so if there is a way to keep it global, that would be great.
Lastly, I need the package to insert a row into the SSISJobRun table specifying when the task was run.
Thank you.
Use an Execute SQL Task, paste
SELECT TOP 1 RunDate
FROM SSISJobRun
WHERE JobTypeId = 1
ORDER BY RunDate DESC
into the statement, and set the ResultSet to Single row; on the Result Set page, choose the variable you set and change the Result Name to 0.
As long as the same task does not run a second time (inside any foreach or for loop) within the same execution and the variable is not overwritten anywhere else in the package, the variable will keep the same value.
If you need to check, right-click that Execute SQL Task and choose Edit Breakpoints; set a breakpoint on post-execution, then run the package, open the Watch window from the Debug menu, and drag and drop the variable into the Watch window; you should see the value.
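If it helps, a hedged sketch of how the fallback and the logging row could both be handled in T-SQL; the placeholder-date literal and the JobTypeId value are assumptions based on the question:

-- Resolve the effective date: use the supplied parameter unless it is the
-- default placeholder, otherwise fall back to the latest RunDate.
-- Both ? markers are mapped to the same LastUpdatedDate parameter.
SELECT CASE
           WHEN ? = '1899-12-30 00:00:00'
               THEN (SELECT TOP 1 RunDate
                     FROM SSISJobRun
                     WHERE JobTypeId = 1
                     ORDER BY RunDate DESC)
           ELSE ?
       END AS EffectiveDate;

-- At the end of the run, log when the package was executed
-- (assuming Id is an identity column)
INSERT INTO SSISJobRun (JobTypeId, RunDate)
VALUES (1, SYSDATETIMEOFFSET());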