Is there a way to ignore mapping errors in SSIS? - sql-server

I have a data flow in SSIS that's using an ODBC Source to a conditional split.
The source returns a dynamic set of columns dependent on availability of data in the source - the number of columns goes from 1 to 13.
In my conditional split I have it pointing at the source and feeding the data to a destination that fits its number of columns.
Example:
Condition 1 -> Map column 1 to column 1 and ignore the other 12 columns
Condition 2 -> Map column 1 and 2 to column 1 and 2 and ignore the other 11 columns
However, if the source only contains 1 column it fails on the second condition because "there are some mapping errors on this path"
I know that the count of columns will never exceed 13 which means I can set conditions for columns 1 - 13.
Is there any way that I can ignore the mapping error or force SSIS to stop at the last executable case in my conditional split?
I don't personally want to have to dive into a script component so if this can be done with conditional split I'd be relieved!
Any thoughts?

As Larnu indicates, the number of columns in a data flow is a design time artifact and cannot be changed at run-time.
But, you should be able to handle this with 12 data flows.
Execute SQL Task -> However your current ODBC source is generating a variable set of columns, determine how many are being returned. Assign this to an SSIS Variable #[User::ColumnCount]
Attach 12 output paths from the Execute SQL Task to custom Data Flow Tasks that account for the number of source columns.
Change the precedence constraint on each of the paths to be Constraint and Expression with expressions like #[User::ColumnCount]==1 ... ==13
The SSIS designer is going to try to validate metadata as you design the package. As will the execution engine when you run the package. Therefore, you'll need to set the Delay Validation property to True on each of the Data Flow Tasks after you finish designing them.
In fact, as I think about this more, you'd like be better served by a parent/child package paradigm here. Design a package per data flow task and then have the parent/controller package invoke them much as I described above. That should simplify the metadata validation challenges you'll experience trying to get this built.

Related

Does adding simple script tasks to SSIS packages drastically reduce performance?

I am creating an SSIS package to import CSV file data into a SQL Server table.
Some of the rows in the CSV files will have missing values.
For example, if a row has the format: value1,value2,value3 and value2 is missing,
then it will render as: value1,,value3 in the csv file.
When the above happens (value2 is missing) in my SSIS package, I want NULL to go into the receiving SQL Server column that would hold value2.
I understand that I can add a "Script" task to my SSIS package to apply this rule. However, I'm concerned that this will drastically reduce the performance of my SSIS package. I'm not an expert on the inner workings of SSIS/SQL Server, but I'm concerned that this script will cause my script to lose "BULK INSERT" capabilities (and other efficiencies) since the script will have to inspect every row and apply the changes as needed.
Can anyone confirm if adding such a script will cause major performance impacts? Or does the SSIS/SQL-Server engine run the script on every row and then bulk-insert? Is there another way I can apply this rule without taking a performance hit?
Firstly, you can use script task when required. Script task will be executed only once for each execution of the whole package not for every row. For every row there is another component called script component. When the other regular SSIS tasks are not enough to achieve what you want you can surely use script component. I don't believe it is a performance killer unless you implement it badly.
Secondly, this particular requirement you can simply use Flat File Source task to import your csv file. It will put the value NULL when there is no value. I'm considering this is a valid csv value and each row has correct number of comma for every fields (total field - 1 actually) even if value is empty or null for some fields.

Not Creating the File when source has 0 rows

I have the below within the Data-flow area. The problem I'm experiencing is that even if the result is 0, it is still creating the file.
Can anyone see what I'm doing wrong here?
This is pretty much expected and known annoying behavior.
SSIS will create an empty flat file, even if unchecked: "column names in a first data row".
The workarounds are:
remove such file by a file system task if #RowCountWriteOff = 0 just after the execution of a dataflow.
as alternative, do not start a dataflow if expected number of rows in the source is 0:
Update 2019-02-11:
Issue I have is that I have 13 of these export to csv commands in the
data flow and they are costly queries
Then double querying a source to check a row-count ahead will be even more expensive and perhaps better to reuse a value of variable #RowCountWriteOff.
Initial design has 13 dataflows, adding 13 constraints and 13 filesystem tasks the main control flow will make package more complex and harder to maintain
Therefore, suggestion is to use a OnPostExecute event handler, so cleanup logic is isolated to some certain dataflow:
Update 1 - Adding more details based on OP comments
Based on your comment i will assume that you want to loop over many tables using SQL Commands, check if table contains row, if so then you should export rows to flat files, else you should ignore the tables. I will mention the steps that you need to achieve that and provide links that contains more details for each step.
First you should create a Foreach Loop container to loop over tables
You should add an Execute SQL Task with a count command SELECT COunt(*) FROM ....) and store the Resultset inside a variable
Add a Data Flow Task that import data from OLEDB Source to Flat File Destination.
After that you should add a precedence constraint with expression, to the Data Flow Task, with expression similar to #[User::RowCount] > 0
Also, it is good to check the links i provided because they contains a lot of useful informations and step by step guides.
Initial Answer
Preventing SSIS from creating empty flat files is a common issue that you can find a lot of references online, there are many workarounds suggested and many methods that may solves the issue:
Try to set the Data Flow Task Delay Validation property to True
Create another Data Flow Task within the package, which will be used only to count rows in the Source, if it is bigger than 0 then the precedence constraint should led to the other Data Flow Task
Add a File System Task after the Data Flow Task which delete the output file if RowCount is o, you should set the precedence constraint expression to ensure that.
References and helpful links
How to prevent SSIS package creating empty flat file at the destination
Prevent SSIS from creating an empty flat file
Eliminating Empty Output Files in SSIS
Prevent SSIS for creating an empty csv file at destination
Check for number of rows returned and do not create empty destination file
Set the Data Flow Task Delay Validation property to True

SSIS Conditional Split Lineage Error

I'm trying to run an Excel table through an SSIS Package and 3 nodes in, it has a Conditional Split. I'm using a previously known working spreadsheet with some data added to it.
The error I'm getting specifically is:
Conditional Split.Inputs[Split Input].Columns[ColumnName] has lineage ID 147 that was not previously used.
I've tried a couple spreadsheets with no avail. I was getting ID 105 initially.
My specific questions are: What do the IDs correspond to? Where do I look to try troubleshooting them?
Some additional logs.
Output:
Error at Data Flow Task 1 [SSIS.Pipeline]: Conditional Split.Inputs[Conditional Split Input].Columns[ColumnName] has lineage ID 147 that was not previously used in the Data Flow task.
Error at Data Flow Task 1 [SSIS.Pipeline]: "Conditional Split" failed validation and returned validation status "VS_NEEDSNEWMETADATA".
Error at Data Flow Task 1 [SSIS.Pipeline]: One or more component failed validation.
Error at Data Flow Task 1: There were errors during task validation.
"Lineage ID is a property of the component or transformation used in the data flow task. It contains an integer value that will work as buffer pointer. Each column in the data flow task will be assigned a lineage ID." Read about lineage ID in this Microsoft TechNet article
LINEAGE ID Error implies that a Source metadata was changed, just re-validate source (connection and component) by double click on the the Conditional split and close it , then check the columns metadata (using the advanced editor). (Note that when double-clicking on a component that contains errors it will prompt to fix it)
Or you can try removing Conditional Split and adding it again (if previous solution doesn't works)
Right click Conditional Split -> Advanced Editor -> Input and Output Properties -> expend those columns, you will see each column has a LineageID.
I believe SSIS assigns unique identifiers (lineage IDs) to each column in each pipe connecting your components. SSIS gets confused when a component is expecting a lineage ID of x, but can't find it in the input pipe.
Generally, you try to find the offending pipe (in BIDS/SSDT, using #Wendy's method). Double-clicking the pipe or connected components will sometimes produce a dialog box offering the option to fix the issue. If not, then removing and recreating the pipe is your best chance.
Downstream components can be adversely affected when you change things upstream of them. Often, the only recourse when doing midstream modifications is to rebuild the entire downstream. SSIS is a bit brittle in this area.

How can I minimize validation intervals when changing the SQL in ADO NET Source Tasks

Part of an SSIS package is the data import from an external database via a SQL command embedded into an ADO.NET Source Data Flow Source. Whenever I make even the slightest adjustment to the query (such as changing a column name) it takes ages (in that case 1-2 hours) until the program has finished validation. The query itself returns around 30,000 rows with 20 columns each.
Is there any way to cut these long intervals or is this something I have to live with?
I usually store the source queries in a table and the first part of my package would execute a select and store the query returned from the table in a package variable, which would then be used by the ADO.NET Source Data Flow. So In my package for the default value of the variable I usually have the query that is stored in the database along with a "where 1=2" at the end. Hence during design time it does execute the query but just returns the column metadata. Let me know if you have any questions.

Import data from .xls to table by removing unwanted columns? [duplicate]

I need to import sheets which look like the following:
March Orders
***Empty Row
Week Order # Date Cust #
3.1 271356 3/3/10 010572
3.1 280353 3/5/10 022114
3.1 290822 3/5/10 010275
3.1 291436 3/2/10 010155
3.1 291627 3/5/10 011840
The column headers are actually row 3. I can use an Excel Sourch to import them, but I don't know how to specify that the information starts at row 3.
I Googled the problem, but came up empty.
have a look:
the links have more details, but I've included some text from the pages (just in case the links go dead)
http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/97144bb2-9bb9-4cb8-b069-45c29690dfeb
Q:
While we are loading the text file to SQL Server via SSIS, we have the
provision to skip any number of leading rows from the source and load
the data to SQL server. Is there any provision to do the same for
Excel file.
The source Excel file for me has some description in the leading 5
rows, I want to skip it and start the data load from the row 6. Please
provide your thoughts on this.
A:
Easiest would be to give each row a number (a bit like an identity in
SQL Server) and then use a conditional split to filter out everything
where the number <=5
http://social.msdn.microsoft.com/Forums/en/sqlintegrationservices/thread/947fa27e-e31f-4108-a889-18acebce9217
Q:
Is it possible during import data from Excel to DB table skip first 6 rows for example?
Also Excel data divided by sections with headers. Is it possible for example to skip every 12th row?
A:
YES YOU CAN. Actually, you can do this very easily if you know the number columns that will be imported from your Excel file. In
your Data Flow task, you will need to set the "OpenRowset" Custom
Property of your Excel Connection (right-click your Excel connection >
Properties; in the Properties window, look for OpenRowset under Custom
Properties). To ignore the first 5 rows in Sheet1, and import columns
A-M, you would enter the following value for OpenRowset: Sheet1$A6:M
(notice, I did not specify a row number for column M. You can enter a
row number if you like, but in my case the number of rows can vary
from one iteration to the next)
AGAIN, YES YOU CAN. You can import the data using a conditional split. You'd configure the conditional split to look for something in
each row that uniquely identifies it as a header row; skip the rows
that match this 'header logic'. Another option would be to import all
the rows and then remove the header rows using a SQL script in the
database...like a cursor that deletes every 12th row. Or you could
add an identity field with seed/increment of 1/1 and then delete all
rows with row numbers that divide perfectly by 12. Something like
that...
http://social.msdn.microsoft.com/Forums/en-US/sqlintegrationservices/thread/847c4b9e-b2d7-4cdf-a193-e4ce14986ee2
Q:
I have an SSIS package that imports from an Excel file with data
beginning in the 7th row.
Unlike the same operation with a csv file ('Header Rows to Skip' in
Connection Manager Editor), I can't seem to find a way to ignore the
first 6 rows of an Excel file connection.
I'm guessing the answer might be in one of the Data Flow
Transformation objects, but I'm not very familiar with them.
A:
Question Sign in to vote 1 Sign in to vote rbhro, actually there were
2 fields in the upper 5 rows that had some data that I think prevented
the importer from ignoring those rows completely.
Anyway, I did find a solution to my problem.
In my Excel source object, I used 'SQL Command' as the 'Data Access
Mode' (it's drop down when you double-click the Excel Source object).
From there I was able to build a query ('Build Query' button) that
only grabbed records I needed. Something like this: SELECT F4,
F5, F6 FROM [Spreadsheet$] WHERE (F4 IS NOT NULL) AND (F4
<> 'TheHeaderFieldName')
Note: I initially tried an ISNUMERIC instead of 'IS NOT NULL', but
that wasn't supported for some reason.
In my particular case, I was only interested in rows where F4 wasn't
NULL (and fortunately F4 didn't containing any junk in the first 5
rows). I could skip the whole header row (row 6) with the 2nd WHERE
clause.
So that cleaned up my data source perfectly. All I needed to do now
was add a Data Conversion object in between the source and destination
(everything needed to be converted from unicode in the spreadsheet),
and it worked.
My first suggestion is not to accept a file in that format. Excel files to be imported should always start with column header rows. Send it back to whoever provides it to you and tell them to fix their format. This works most of the time.
We provide guidance to our customers and vendors about how files must be formatted before we can process them and it is up to them to meet the guidlines as much as possible. People often aren't aware that files like that create a problem in processing (next month it might have six lines before the data starts) and they need to be educated that Excel files must start with the column headers, have no blank lines in the middle of the data and no repeating the headers multiple times and most important of all, they must have the same columns with the same column titles in the same order every time. If they can't provide that then you probably don't have something that will work for automated import as you will get the file in a differnt format everytime depending on the mood of the person who maintains the Excel spreadsheet. Incidentally, we push really hard to never receive any data from Excel (only works some of the time, but if they have the data in a database, they can usually accomodate). They also must know that any changes they make to the spreadsheet format will result in a change to the import package and that they willl be charged for those development changes (assuming that these are outside clients and not internal ones). These changes must be communicated in advance and developer time scheduled, a file with the wrong format will fail and be returned to them to fix if not.
If that doesn't work, may I suggest that you open the file, delete the first two rows and save a text file in a data flow. Then write a data flow that will process the text file. SSIS did a lousy job of supporting Excel and anything you can do to get the file in a different format will make life easier in the long run.
My first suggestion is not to accept a file in that format. Excel files to be imported should always start with column header rows. Send it back to whoever provides it to you and tell them to fix their format. This works most of the time.
Not entirely correct.
SSIS forces you to use the format and quite often it does not work correctly with excel
If you can't change he format consider using our Advanced ETL Processor.
You can skip rows or fields and you can validate the data the way you want.
http://www.dbsoftlab.com/etl-tools/advanced-etl-processor/overview.html
Sky is the limit
You can just use the OpenRowset property you can find in the Excel Source properties.
Take a look here for details:
SSIS: Read and Export Excel data from nth Row
Regards.

Resources