How to use selected SQL statement in SSIS package as source variable?

I created an SSIS package in which I need to use a Foreach Loop Container (FELC). The first step before the loop runs an Execute SQL Task to obtain all the SQL statements, stored in a source table, that generate the different XML files. Inside the FELC I would like to process each statement to generate an XML file and send it to a folder location, with the file name and target folder also coming from the source table. There are hundreds of files that need to be refreshed on a regular basis. Instead of running a separate job for each XML file, I would like to amalgamate everything into one process.
Is it possible?

This is the basic Shred Recordset pattern.
I have 3 variables declared: QuerySource, CurrentStatement and rsQueryData. The first 2 are Strings; the last is an Object type.
SQL - Get source data
This is my query. It simulates your table and induces a failing SQL Statement if I take out the filter.
SELECT
ProcessID
, Stmt_details
FROM
(
VALUES
( 1, 'SELECT 1;', 1)
, ( 20, 'SELECT 20;', 1)
, ( 30, 'SELECT 1/0;', 0)
) Stmt_collection (ProcessID, Stmt_details, xmlFlag)
WHERE
xmlFlag = 1
The Execute SQL Task is set with ResultSet = Full result set, and I assign the result to the variable User::rsQueryData, which has a Result Name of 0 on the Result Set tab.
FELC
This is a standard Foreach ADO Recordset Loop container. I use my User::rsQueryData as the source and, since I only care about the second element (ordinal position 1), that's the only thing I map. I assign the current value to User::CurrentStatement.
SQL - Execute CurrentStatement
This is an Execute SQL Task that has as its source the Variable User::CurrentStatement. There's no scripting involved. The FELC handles the assignment of values to this Variable. This Task uses as its source that same Variable. This is very much how native SSIS developers will approach solving a problem. If you reach for a Script Task or Component as the first approach, you're likely doing it wrong.
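The control flow described above can be sketched outside SSIS to make the three steps concrete. This is only an illustration in Python with an in-memory SQLite database, not how SSIS runs it: the Stmt_collection table stands in for the source table, and the function name run_stored_statements is hypothetical.

```python
import sqlite3


def run_stored_statements(conn):
    """Fetch the flagged statements once, then execute them one at a time."""
    # Step 1 ("SQL - Get source data"): load the full result set up front.
    rows = conn.execute(
        "SELECT ProcessID, Stmt_details FROM Stmt_collection "
        "WHERE xmlFlag = 1 ORDER BY ProcessID"
    ).fetchall()

    results = []
    # Step 2 (the loop): each pass hands one statement to a local variable,
    # the way the FELC assigns ordinal position 1 to User::CurrentStatement.
    for process_id, current_statement in rows:
        # Step 3 ("SQL - Execute CurrentStatement"): run whatever was stored.
        value = conn.execute(current_statement).fetchone()[0]
        results.append((process_id, value))
    return results


# Hypothetical stand-in for the source table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Stmt_collection "
    "(ProcessID INTEGER, Stmt_details TEXT, xmlFlag INTEGER)"
)
conn.executemany(
    "INSERT INTO Stmt_collection VALUES (?, ?, ?)",
    [(1, "SELECT 1;", 1), (20, "SELECT 20;", 1), (30, "SELECT 1/0;", 0)],
)

results = run_stored_statements(conn)
print(results)
```

Only the two rows flagged with xmlFlag = 1 are executed, which mirrors the filter in the source query.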
Biml
If you're doing any level of SSIS/SSRS/SSAS development, you want BIDS Helper. It is a free add-on to Visual Studio that makes your development life so much easier. The feature I'm going to leverage here is the ability to declaratively define an SSIS package. This language is called the Business Intelligence Markup Language, Biml, and I love it for many reasons, but on StackOverflow I love it because I can give you the code to reproduce exactly my solution. Otherwise, I have to build out a few hundred screenshots showing you everywhere I have to click and set values.
Or, you can:
1. Download and install BIDS Helper
2. Open up your existing SSIS project
3. Right click on the Project and select "Add new Biml file"
4. In the resulting BimlScript.biml file, open it up and paste all of the following code into it
5. Fix the value for your database connection string. This one assumes you have an instance on your local machine called Dev2014
6. Save the biml file
7. Right click that BimlScript.biml and select "Generate SSIS Packages"
8. Marvel at the resulting so_28867703.dtsx package that was added to your solution
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<OleDbConnection Name="CM_OLE" ConnectionString="Data Source=localhost\dev2014;Initial Catalog=tempdb;Provider=SQLNCLI10.1;Integrated Security=SSPI;Auto Translate=False;" />
</Connections>
<Packages>
<Package ConstraintMode="Linear" Name="so_28867703">
<Variables>
<Variable DataType="String" Name="QuerySource">SELECT ProcessID, Stmt_details FROM (VALUES (1, 'SELECT 1;', 1), (20, 'SELECT 20;', 1), (30, 'SELECT 1/0;', 0))Stmt_collection(ProcessID, Stmt_details, xmlFlag) WHERE xmlFlag = 1 </Variable>
<Variable DataType="String" Name="CurrentStatement">This statement is invalid</Variable>
<Variable DataType="Object" Name="rsQueryData"></Variable>
</Variables>
<Tasks>
<ExecuteSQL
ConnectionName="CM_OLE"
Name="SQL - Get source data"
ResultSet="Full"
>
<VariableInput VariableName="User.QuerySource" />
<Results>
<Result VariableName="User.rsQueryData" Name="0" />
</Results>
</ExecuteSQL>
<ForEachAdoLoop
SourceVariableName="User.rsQueryData"
ConstraintMode="Linear"
Name="FELC - Shred RS"
>
<VariableMappings>
<!--
0 based system
-->
<VariableMapping VariableName="User.CurrentStatement" Name="1" />
</VariableMappings>
<Tasks>
<ExecuteSQL ConnectionName="CM_OLE" Name="SQL - Execute CurrentStatement">
<VariableInput VariableName="User.CurrentStatement" />
</ExecuteSQL>
</Tasks>
</ForEachAdoLoop>
</Tasks>
</Package>
</Packages>
</Biml>
That package will run, assuming you fixed the connection string to a valid instance. If you put breakpoints on the Execute SQL Task, it will light up two times. If you have a watch window on CurrentStatement, you can see it change from the design-time value to the values shredded from the result set.
While we await clarification on XML and files: if the goal is to take the query from the FELC and export to a file, I answered that at https://stackoverflow.com/a/9105756/181965. In this case, though, I'd restructure your package to just the Data Flow and eliminate the shredding, as there's no need to complicate matters to export a single row N times.

If I understand you correctly, you can add a Script Task from the Toolbox as the first step inside the loop container, store the statement selected from the database in a package-level variable, and pass it on for execution in the next step.

Related

SSIS - Extract from XML dtsx Package every SQLCommand / TableorViewName From Data flow task

Basically, my purpose, in an urbanization context, is to retrieve ALL input data sources (SqlCommand or TableOrViewName) and ALL output data sources.
My first attempt succeeded with one of our dtsx packages:
SELECT CONVERT(XML, BulkColumn) AS xContent INTO Packages
FROM OPENROWSET(BULK 'F:\Repos\DW_all\DW_all\anaplan_sales_mrr_v2.dtsx', SINGLE_BLOB) AS x;
SELECT
X.Exe.value('(./@DTS:Description)','nvarchar(20)') as description
,X.Exe.value('(./@DTS:ObjectName)','nvarchar(20)') as ObjectName
,X.Exe.value('(./DTS:ObjectData/pipeline/components/component/properties/property)[1]','nvarchar(25)') as TargettedTable
,X.Exe.value('(./DTS:ObjectData/pipeline/components/component/connections/connection/@connectionManagerRefId)[1]','nvarchar(100)') as TargettedConnect
,X.Exe.value('(./DTS:ObjectData/pipeline/components/component[2]/properties/property[1]/@name)[1]','nvarchar(max)') as SourceType
,X.Exe.value('(./DTS:ObjectData/pipeline/components/component[2]/properties/property)[1]','nvarchar(max)') as [Source]
,X.Exe.value('(./DTS:ObjectData/pipeline/components/component[2]/connections/connection/@connectionManagerRefId)[1]','nvarchar(100)') as SourceConnect
FROM (SELECT XContent AS pkgXML FROM [dbo].[TestPackage]) t
CROSS APPLY pkgXML.nodes('/DTS:Executable/DTS:Executables/DTS:Executable/DTS:Executables/DTS:Executable') X(Exe)
Where X.Exe.value('(./@DTS:Description)[1]','nvarchar(max)') = 'Data Flow Task'
RESULT:
description      ObjectName      TargettedTable   ...
Data Flow Task   1rst flowname   First Table
Data Flow Task   Other Flow      other Table
Data Flow Task   Another Flow    another table
Well, so far everything's fine.
Unfortunately, the position of the properties and the path are not always the same. In my example above, the target is the first tag in the XML file (hence [1]) and the source comes after it (hence [2]). But in other packages, it's the reverse.
Similarly, the property indicating the type of the data source (name=sqlcommand or name=tableorviewname) is not always in the same position, so the [1] index is not reliable.
Moreover, in my example above the path is '/DTS:Executable/DTS:Executables/DTS:Executable/DTS:Executables/DTS:Executable' (with one Sequence Container), but other packages have no container and therefore a different path (e.g. '/DTS:Executable/DTS:Executables/DTS:Executable').
I have tried some tests with wildcards such as [.] or [*], but I'm not comfortable with them and my tests still fail.
<DTS:ObjectData xmlns:DTS="DTS">
<pipeline
version="1">
<components>
<component
refId="Package\blabla my description id"
componentClassID="Microsoft.OLEDBSource"
contactInfo="OLE DB Source;Microsoft Corporation; Microsoft SQL Server; (C) Microsoft Corporation; All Rights Reserved; http://www.microsoft.com/sql/support;7"
description="OLE DB Source"
name="My DataFlow Name"
usesDispositions="true"
version="7">
<properties>
<property
dataType="System.Int32"
description="The number of seconds before a command times out. A value of 0 indicates an infinite time-out."
name="CommandTimeout">0</property>
<property
dataType="System.String"
description="Specifies the name of the database object used to open a rowset."
name="OpenRowset"></property>
<property
dataType="System.String"
description="Specifies the variable that contains the name of the database object used to open a rowset."
name="OpenRowsetVariable"></property>
<property
dataType="System.String"
description="The SQL command to be executed."
name="SqlCommand"
UITypeEditor="Microsoft.DataTransformationServices.Controls.ModalMultilineStringEditor, Microsoft.DataTransformationServices.Controls, Version=15.0.0.0, Culture=neutral, PublicKeyToken=89845dcd8080cc91">SELECT * From OneTable Inner Join AnotherTable ...
</property>
</properties>
</component>
</components>
</pipeline>
</DTS:ObjectData>
Can anyone please help me improve the initial script so that it works for any package, returning every SqlCommand or TableOrViewName used as input to the data flows in the package, and the same for the outputs?
TIA for your help and advice :-)
Fred.M.
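Both difficulties described above (varying nesting depth and varying property order) are usually handled in XPath by searching descendants at any depth and by selecting a property through its name attribute rather than its position. The sketch below shows the two ideas in Python's standard xml.etree.ElementTree on a cut-down copy of the sample XML. The second component (Microsoft.OLEDBDestination, "My Target", [dbo].[TargetTable]) and the function name list_data_sources are made up for illustration, and a real .dtsx file declares a different DTS namespace URI than the sanitized "DTS" used here. In T-SQL XQuery the corresponding building blocks would be the descendant step // and a predicate such as property[@name="SqlCommand"].

```python
import xml.etree.ElementTree as ET

# Cut-down sample; the destination component is hypothetical.
SAMPLE = """<DTS:ObjectData xmlns:DTS="DTS">
  <pipeline version="1">
    <components>
      <component componentClassID="Microsoft.OLEDBSource" name="My DataFlow Name">
        <properties>
          <property name="CommandTimeout">0</property>
          <property name="OpenRowset"></property>
          <property name="SqlCommand">SELECT * FROM OneTable</property>
        </properties>
      </component>
      <component componentClassID="Microsoft.OLEDBDestination" name="My Target">
        <properties>
          <property name="OpenRowset">[dbo].[TargetTable]</property>
          <property name="SqlCommand"></property>
        </properties>
      </component>
    </components>
  </pipeline>
</DTS:ObjectData>"""


def list_data_sources(xml_text):
    """Return (class id, component name, property name, value) tuples."""
    root = ET.fromstring(xml_text)
    found = []
    # iter() walks every descendant, so containers of any depth are covered.
    for comp in root.iter("component"):
        for prop_name in ("SqlCommand", "OpenRowset"):
            # Pick the property by its name attribute, never by position.
            prop = comp.find(f"properties/property[@name='{prop_name}']")
            text = (prop.text or "").strip() if prop is not None else ""
            if text:
                found.append(
                    (comp.get("componentClassID"), comp.get("name"), prop_name, text)
                )
    return found


sources = list_data_sources(SAMPLE)
for row in sources:
    print(row)
```

Filtering on componentClassID afterwards is one way to separate inputs from outputs, assuming the class names in your packages follow the Source/Destination naming seen in the sample.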

SSIS - merging non linked parts of xml

I have XML with a complex structure. I was able to pull the set of data I need from this sensor (the measurements' "from", "to" and "count"), but I also have to pull data about the sensor itself, such as the IP address and serial number, which lives in a different tag that doesn't share an id with the data tags. Here is the XML:
<response xmlns="http://www.test.com/sensor-api/v2">
<sensor-time timezone="America/New_York">2017-07-18T15:45:03-04:00</sensor-time>
<status>
<code>OK</code>
</status>
<sensor-info>
<serial-number>Q3:80:39:40:9Z:N2</serial-number>
<ip-address>192.163.135.10</ip-address>
<name>Test</name>
<group />
<device-type>PC2 - UL</device-type>
</sensor-info>
<content>
<elements>
<element>
<element-id>2</element-id>
<element-name>Conf_Lower_Zone</element-name>
<sensor-type>SINGLE_SENSOR</sensor-type>
<data-type>ZONE</data-type>
<from>2017-07-18T15:40:00-04:00</from>
<to>2017-07-18T15:45:00-04:00</to>
<resolution>ONE_MINUTE</resolution>
<measurements>
<measurement>
<from>2017-07-18T15:40:00-04:00</from>
<to>2017-07-18T15:41:00-04:00</to>
<values>
<value label="count">0</value>
</values>
</measurement>
<measurement>
<from>2017-07-18T15:41:00-04:00</from>
<to>2017-07-18T15:42:00-04:00</to>
<values>
<value label="count">0</value>
</values>
</measurement>
I used an SSIS package with a Merge Join and was able to push the data into a SQL table. Now I have to add the sensor info (IP, Serial) to the same table, so Serial and IP would of course repeat for every row of data.
How do I do this in an SSIS package? Which component should I use to add two additional columns whose values repeat on every row?
Here is the SSIS package so far:
OK, so I edited the SSIS package to derive two different outputs from the XML Source: one with sensor-info that feeds the Sensor_Info table in SQL Server, and another that feeds the Count_Data table in SQL Server.
Then I added an Execute SQL Task within the Foreach Loop Container, as in the image below, with this query:
USE SANDBOX
GO
INSERT INTO ALL_DATA
SELECT *
FROM [SANDBOX].[dbo].[Sensor_Info],[dbo].[Count_Data]
This is an attempt to combine these two tables after each XML load. However, I am getting trash data: the tables are combined, but the result makes no sense.
What am I doing wrong now?
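One thing worth checking in that query: listing two tables separated by a comma with no join condition produces a Cartesian product, so every Sensor_Info row is paired with every Count_Data row. The sketch below reproduces the effect in Python with an in-memory SQLite database. The column names, the sample values and in particular the LoadID key are hypothetical, since the real table definitions are not shown, and it assumes both tables hold rows from more than one load.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical shapes; LoadID is an assumed key tying both outputs to one file.
    CREATE TABLE Sensor_Info (LoadID INTEGER, SerialNumber TEXT, IPAddress TEXT);
    CREATE TABLE Count_Data  (LoadID INTEGER, FromTime TEXT, CountValue INTEGER);
    INSERT INTO Sensor_Info VALUES (1, 'SN-A', '10.0.0.1'), (2, 'SN-B', '10.0.0.2');
    INSERT INTO Count_Data  VALUES (1, '15:40', 0), (1, '15:41', 0), (2, '15:40', 5);
    """
)

# Comma join with no condition: 2 sensor rows x 3 count rows = 6 rows.
cross_rows = conn.execute("SELECT * FROM Sensor_Info, Count_Data").fetchall()

# Join on the shared key: each count row gets exactly one sensor row.
joined_rows = conn.execute(
    """
    SELECT s.SerialNumber, s.IPAddress, c.FromTime, c.CountValue
    FROM Sensor_Info AS s
    JOIN Count_Data AS c ON c.LoadID = s.LoadID
    ORDER BY s.LoadID, c.FromTime
    """
).fetchall()

print(len(cross_rows), len(joined_rows))
```

If the two tables share no key at all, the join has nothing to match on, which is why carrying the sensor values along in the data flow itself is the more direct route.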
One of your XML outputs should be sensor-info.
You should run that through a Script Component transformation:
Set up variables for @IP and @SerNum and include them in the component's ReadWriteVariables.
Check both the IP and SerialNumber columns as input columns.
Enter the script; it is simply this:
Variables.IP = Row.IP.ToString();
Variables.SerNum = Row.SerialNumber.ToString();
Now add a Derived Column to the flow where you want these values, and set the new columns to the variables you just defined.
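The goal of that approach (read the sensor values once, then repeat them on every measurement row) can be illustrated outside SSIS. The sketch below uses Python's xml.etree.ElementTree on a trimmed and properly closed copy of the XML from the question; the function name flatten and the tuple layout are arbitrary choices for the illustration, not part of any SSIS API.

```python
import xml.etree.ElementTree as ET

# The response uses a default namespace, so every path step needs a prefix.
NS = {"s": "http://www.test.com/sensor-api/v2"}

SAMPLE = """<response xmlns="http://www.test.com/sensor-api/v2">
  <sensor-info>
    <serial-number>Q3:80:39:40:9Z:N2</serial-number>
    <ip-address>192.163.135.10</ip-address>
  </sensor-info>
  <content><elements><element>
    <measurements>
      <measurement>
        <from>2017-07-18T15:40:00-04:00</from>
        <to>2017-07-18T15:41:00-04:00</to>
        <values><value label="count">0</value></values>
      </measurement>
      <measurement>
        <from>2017-07-18T15:41:00-04:00</from>
        <to>2017-07-18T15:42:00-04:00</to>
        <values><value label="count">0</value></values>
      </measurement>
    </measurements>
  </element></elements></content>
</response>"""


def flatten(xml_text):
    """Return one (serial, ip, from, to, count) tuple per measurement."""
    root = ET.fromstring(xml_text)
    # Read the sensor-level values once...
    serial = root.findtext("s:sensor-info/s:serial-number", namespaces=NS)
    ip = root.findtext("s:sensor-info/s:ip-address", namespaces=NS)
    rows = []
    # ...then repeat them on every measurement row.
    for m in root.iterfind(".//s:measurement", NS):
        count = m.findtext("s:values/s:value[@label='count']", namespaces=NS)
        rows.append(
            (
                serial,
                ip,
                m.findtext("s:from", namespaces=NS),
                m.findtext("s:to", namespaces=NS),
                int(count),
            )
        )
    return rows


rows = flatten(SAMPLE)
for row in rows:
    print(row)
```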

The execution of a SP in SSIS returns nothing

Until now I've been looking for a way to execute a stored procedure (SP) from SSIS, but nothing seems to work. I've got an SP:
CREATE PROCEDURE [DBO].[SPIDENTIFIERS] @IDENT NVARCHAR(MAX) OUTPUT
What I need is to save the result in a variable that I've created in SSIS.
This is the configuration that I used to try to do it.
In the parameter settings I have also tried Output and ReturnValue as the Direction, but I received an error message. Just to test, I added a Script Task to check the value, but as you can see it is empty.
With the Direction set to Output or ReturnValue I get this:
[Execute SQL Task] Error: Executing the query "EXECUTE spIdentifiers ? OUTPUT;" failed with the following error:
"El valor no está dentro del intervalo esperado." (in English: "Value does not fall within the expected range.").
Possible failure reasons: Problems with the query, "ResultSet" property not set correctly,
parameters not set correctly, or connection not established correctly.
What am I missing in the configuration of the task?
I looked for an answer in these posts, but nothing seems to work:
How do you call a Stored Procedure in SSIS?
SSIS Stored Procedure Call
Thanks in advance.
Your parameter should not be named, as @Gerald Davis has indicated. For a connection manager of OLE DB type, it should be ordinal based, thus 0.
Here's my sample package, and you can see that my variable @[User::MyVariables] is populated with a lot of Xs.
Here's my proc definition
IF NOT EXISTS
(
SELECT
*
FROM
sys.procedures AS P
WHERE
P.name = N'SPIDENTIFIERS'
)
BEGIN
EXECUTE sys.sp_executesql N'CREATE PROC dbo.spidentifiers AS SELECT ''stub version, to be replaced''';
END
GO
ALTER PROCEDURE [DBO].[SPIDENTIFIERS]
(
@IDENT NVARCHAR(MAX) OUTPUT
)
AS
BEGIN
SET NOCOUNT ON;
SET @IDENT = REPLICATE(CAST(N'X' AS nvarchar(MAX)), 4001);
-- Uncomment this to watch the fireworks
--SET @IDENT = REPLICATE(CAST(N'X' AS nvarchar(MAX)), 4001);
END
Biml
I'm a big fan of using Biml, the Business Intelligence Markup Language, to describe my solutions as it allows the reader to recreate exactly the solution I describe without all those pesky mouse clicks.
1. Download BIDS Helper and install or unzip
2. Add a new biml file to your SSIS project
3. Fix the third line's ConnectionString to point to a valid server and database. Mine references localhost\dev2014 and tempdb
4. Right click on the saved biml file and generate package
5. Take your well deserved Biml break
Biml code follows
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<OleDbConnection Name="tempdb" ConnectionString="Provider=SQLNCLI11.1;Server=localhost\dev2014;Initial Catalog=tempdb;Integrated Security=SSPI;" />
</Connections>
<Packages>
<Package Name="so_30460630" ConstraintMode="Linear">
<Variables>
<Variable DataType="String" Name="MyVariables">0</Variable>
</Variables>
<Tasks>
<ExecuteSQL
ConnectionName="tempdb"
Name="SQL Ensure Objects Exist">
<DirectInput>
<![CDATA[IF NOT EXISTS
(
SELECT
*
FROM
sys.procedures AS P
WHERE
P.name = N'SPIDENTIFIERS'
)
BEGIN
EXECUTE sys.sp_executesql N'CREATE PROC dbo.spidentifiers AS SELECT ''stub version, to be replaced''';
END
GO
ALTER PROCEDURE [DBO].[SPIDENTIFIERS]
(
@IDENT NVARCHAR(MAX) OUTPUT
)
AS
BEGIN
SET NOCOUNT ON;
SET @IDENT = REPLICATE(CAST(N'X' AS nvarchar(MAX)), 4001);
END
]]>
</DirectInput>
</ExecuteSQL>
<ExecuteSQL
ConnectionName="tempdb"
Name="SQL Using an OUTPUT parameter">
<DirectInput>EXECUTE dbo.SPIDENTIFIERS ? OUTPUT;</DirectInput>
<Parameters>
<Parameter DataType="String" VariableName="User.MyVariables" Name="0" Direction="Output" Length="-1" />
</Parameters>
</ExecuteSQL>
<ExecuteSQL
ConnectionName="tempdb"
Name="SQL Breakpoint">
<DirectInput>SELECT NULL AS nothing;</DirectInput>
</ExecuteSQL>
</Tasks>
</Package>
</Packages>
</Biml>
Your stored procedure parameter is OUTPUT but your SSIS package defines it as INPUT. Depending on the application, RETURNVALUE could also be used, but from the syntax of your SP it is using an Output Parameter, not a Return Value.
Verify the User:Id variable has the correct datatype. Try executing the SP in SSMS manually to verify that it runs without error and returns the expected result.
Also, I think you are mixing OLE DB and ADO.NET syntax.
If using an OLE DB connection, you use the ? parameter markers in the query and the Parameter names must be the ordinals 0, 1, 2 and so on. Note: parameter names are zero indexed. In an SP with more than 1 parameter, the correct order is required.
If using an ADO.NET connection, the query is just the name of the stored procedure, IsQueryStoredProcedure is set to True, and each Parameter name matches the name of the parameter in the SP.
From your screenshots, you are currently using named parameters together with the OLE DB ? syntax. I don't believe that is ever valid. It is one or the other, depending on the connection type.
UserID needs to be in the ReadWriteVariables section, not the ReadOnlyVariables section, so that the task is allowed to write into the variable.
The parameter direction should be Output, since you are passing the value out of your task, not into it.
You need to keep the SQL statement as "EXEC SPIDENTIFIERS ? OUTPUT".
The direction of the variable should be Output on the Parameter Mapping tab, and the Parameter Name should be exactly the same as the parameter defined in the stored procedure, or you can just use 0 instead of the actual name.

SQL Server BIT data type reports differently for View and Table query

I need to export data from SQL Server 2012 based on a view. While testing the export for a downstream system, I was manually extracting the data out of the table that the view is based on and the BIT data type columns were reporting as 1/0.
However, once I setup the view against the table, I noticed that the BIT data type columns reported as TRUE/FALSE. This happens whether I perform a select against the view or export from it.
Why does this happen and how can I maintain the same results in the view as the data table (1/0)?
The bit data type is interpreted differently by different clients. SSMS will report back a 1 or 0 for a bit, while the same 1/0 is interpreted by an SSIS Data Flow as True or False.
Whether the source is a table or a view makes no difference to SSIS unless you explicitly change the data type.
For setup, I created 2 tables and a view
CREATE TABLE dbo.BaseTable
(
SomeBit bit NOT NULL
, RowDescription varchar(50) NOT NULL
);
CREATE TABLE dbo.TargetTable
(
SomeBit bit NOT NULL
, RowDescription varchar(50) NOT NULL
, SourcePackage nvarchar(100) NOT NULL
);
GO
CREATE VIEW dbo.MyView
AS
SELECT
BT.SomeBit
, BT.RowDescription
FROM
dbo.BaseTable AS BT;
GO
INSERT INTO
dbo.BaseTable
(
SomeBit
, RowDescription
)
VALUES
(CAST(0 AS bit), 'Falsification')
, (CAST(1 AS bit), 'True dat');
GO
At this point, if I use SSMS and query either dbo.BaseTable or dbo.MyView, I will get back a 1 and 0. But again, these are just artifacts of presentation. In C, 0 is false and any numeric value that isn't 0 is true. Excel will present it as FALSE and TRUE. Every client will interpret the value into whatever the local representation of a boolean value is. SSIS chose True and False.
I built out a simple package that pulls data from BaseTable or MyView and writes it to a text file and a table.
The basic control flow looks thus
The data flow looks complex but it's not.
I select from either my table or view, add a description for my target table, use a multicast so I can send the same data to multiple destinations and then write to a file and table.
If I query SSMS for my sources and destinations, you'll see that the destination libraries handle the translation between the local and foreign representation of the data type.
There is no such translation available for a flat file because there's no "standard" for the representation of a boolean. I might like Y/N.
I tried a number of things to coerce a 1/0 to be written to the flat file. I set my data types to
Boolean DT_BOOL
Single byte signed int DT_I1
Four byte signed int DT_I4
String DT_STR
but it never mattered (which actually seems odd given how persnickety SSIS is about data types); my output was always the same
False,Falsification
True,True dat
Ultimately, if I wanted a 0 or a 1 in that output file, I needed to change my data type: either in the source query with an explicit cast, or through a Derived Column component using the ternary operator SomeBit ? (DT_I1)1 : (DT_I1)0. Use DT_I1/I2/I4/I8 as you see fit.
Fun trivia note: if you choose to use the Data Conversion component, or a lazy cast in the Derived Column such as (DT_I1)SomeBit, you're going to get 0 for False and -1 for True. It seems they follow the C interpretation of boolean values.
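The underlying point, that the text written to a file depends on how the writer renders a boolean unless you convert it yourself, is not specific to SSIS. The sketch below shows the same thing in Python; it is an analogy only, and the helper to_csv and its bit_as_int flag are invented for the illustration.

```python
import csv
import io

rows = [(False, "Falsification"), (True, "True dat")]


def to_csv(rows, bit_as_int):
    """Write rows as CSV text, optionally mapping the boolean to 1/0 first."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for some_bit, description in rows:
        # Explicit conversion, the same idea as SomeBit ? 1 : 0 in a Derived Column.
        cell = (1 if some_bit else 0) if bit_as_int else some_bit
        writer.writerow([cell, description])
    return buf.getvalue()


as_written = to_csv(rows, bit_as_int=False)   # the writer's own rendering of a boolean
as_converted = to_csv(rows, bit_as_int=True)  # value converted before writing
print(as_written)
print(as_converted)
```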
Biml it
No need to take my word for it. Using the above table definitions and population of values, if you install the free addon BIDS Helper you can generate the same code for any version of SSIS.
After installing BIDS Helper, right click on an SSIS project and in the context menu, select Add Biml file. Replace the contents of that file with the below code; save and then right-click to generate a new package.
You will need to edit the values for the Flat File Connection to point to valid locations as well as point the ole db connection string to wherever you spin up your tables.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<FlatFileConnection FilePath="C:\ssisdata\so_29244868.table.csv" FileFormat="FFF_table" Name="FF_Table" />
<FlatFileConnection FilePath="C:\ssisdata\so_29244868.view.csv" FileFormat="FFF_table" Name="FF_View" />
<OleDbConnection Name="CM_OLE" ConnectionString="Data Source=localhost\dev2014;Initial Catalog=tempdb;Provider=SQLNCLI11.0;Integrated Security=SSPI;" />
</Connections>
<FileFormats>
<FlatFileFormat
Name="FFF_table" IsUnicode="false" CodePage="1252"
FlatFileType="RaggedRight">
<Columns>
<Column Name="SomeBit" DataType="Boolean" Delimiter="," />
<Column Name="RowDescription" DataType="AnsiString" Length="50" Delimiter="CRLF"/>
</Columns>
</FlatFileFormat>
</FileFormats>
<Packages>
<Package ConstraintMode="Parallel" Name="so_29244868">
<Tasks>
<Dataflow Name="DFT Table example">
<Transformations>
<OleDbSource ConnectionName="CM_OLE" Name="OLE_SRC dbo_BaseTable">
<ExternalTableInput Table="dbo.BaseTable" />
</OleDbSource>
<DerivedColumns Name="DER Package name">
<Columns>
<Column DataType="String" Name="SourcePackage" Length="100">"DFT Table example"</Column>
</Columns>
</DerivedColumns>
<Multicast Name="MC Dupe">
<OutputPaths>
<OutputPath Name="FF" />
<OutputPath Name="Table" />
</OutputPaths>
</Multicast>
<FlatFileDestination ConnectionName="FF_Table" Name="FF_DST table">
<InputPath OutputPathName="MC Dupe.FF" />
</FlatFileDestination>
<OleDbDestination
ConnectionName="CM_OLE"
Name="OLE_DST Table"
TableLock="false">
<InputPath OutputPathName="MC Dupe.Table" />
<ExternalTableOutput Table="[dbo].[TargetTable]"></ExternalTableOutput>
</OleDbDestination>
</Transformations>
</Dataflow>
<Dataflow Name="DFT View example">
<Transformations>
<OleDbSource ConnectionName="CM_OLE" Name="OLE_SRC dbo_MyView">
<ExternalTableInput Table="dbo.MyView" />
</OleDbSource>
<DerivedColumns Name="DER Package name">
<Columns>
<Column DataType="String" Name="SourcePackage" Length="100">"DFT View example"</Column>
</Columns>
</DerivedColumns>
<Multicast Name="MC Dupe">
<OutputPaths>
<OutputPath Name="FF" />
<OutputPath Name="Table" />
</OutputPaths>
</Multicast>
<FlatFileDestination ConnectionName="FF_View" Name="FF_DST view">
<InputPath OutputPathName="MC Dupe.FF" />
</FlatFileDestination>
<OleDbDestination
ConnectionName="CM_OLE"
Name="OLE_DST view"
TableLock="false"
>
<InputPath OutputPathName="MC Dupe.Table" />
<ExternalTableOutput Table="[dbo].[TargetTable]"></ExternalTableOutput>
</OleDbDestination>
</Transformations>
</Dataflow>
</Tasks>
</Package>
</Packages>
</Biml>
I've run into the same problem using Entity Framework.
Try casting the bit field to a bit.

Lookup component fails to match empty strings when full cache is used

I have a Lookup component with a lookup table that returns a varchar(4) column with 3 possible values: "T", "R" or "" (empty string).
I'm using an OLE DB connection for the lookup table, and have tried direct access to the table as well as specifying a query with an RTRIM() on the column, to make sure that the string is empty and not a "blank string of some length".
If I set the cache mode to "Partial cache", everything works fine (either with direct reading of the table, or using the trimming query), and the empty strings of the input table are correctly matched to the corresponding lookup table row.
However, if I change the cache mode to "Full cache", none of the empty strings are matched at all.
I've checked that the data type, DT_STR, and the length, 4, are the same in the lookup table and the input table.
Is there something that explains this behaviour? Can it be modified?
NOTE: This is not the documented problem with null values. It's about empty strings.
Somewhere, you have trailing spaces, either in your source or your lookup.
Consider the following source query.
SELECT
D.SourceColumn
, D.Description
FROM
(
VALUES
(CAST('T' AS varchar(4)), 'T')
, (CAST('R' AS varchar(4)), 'R')
, (CAST('' AS varchar(4)), 'Empty string')
, (CAST(' ' AS varchar(4)), 'Blanks')
, (NULL, 'NULL')
) D (SourceColumn, Description);
For my lookup, I restricted the above query to just T, R and the Empty String rows.
You can see that for the 5 source rows, T, R and Empty String matched and went to the Match Output path. The rows where I used a NULL or explicit spaces did not match.
If I change my lookup mode from Full Cache to Partial, the NULL continues to not match while the explicit spaces does match.
Wut?
In full cache mode, the Lookup transformation executes the source query and keeps the data locally on the machine SSIS is executing on. This lookup is going to be an exact match using .NET equality rules. In that case, '' will not match ' '.
However, when we change our cache mode to None or Partial, we will no longer be relying on the .NET matching rules and instead, we'll use the source Database's matching rules. In TSQL, '' will match ' '
To make your Full Cache mode work as expected, you will need to apply an RTRIM in your Source and/or Lookup transformation. If you are convinced RTRIM isn't working in your source, add a Derived Column Transformation and apply your RTRIM there, but I find it's better to abuse the database instead of SSIS.
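The difference between the two cache modes can be pictured as an exact-key dictionary lookup versus a comparison that ignores trailing spaces. The sketch below is a Python analogy of that behaviour, not the Lookup component's actual implementation; the names lookup_cache, exact_match and trimmed_match are made up for the illustration.

```python
# Stand-in for the fully cached reference set: T, R and the empty string.
lookup_cache = {"T": "T row", "R": "R row", "": "Empty string row"}

# Same five source values as the demo query: T, R, empty, blanks, NULL.
source_values = ["T", "R", "", "  ", None]


def exact_match(value):
    """Exact comparison: '' and '  ' are different keys."""
    return value in lookup_cache


def trimmed_match(value):
    """Trim trailing spaces first, like applying RTRIM before the Lookup."""
    return value is not None and value.rstrip(" ") in lookup_cache


exact = [exact_match(v) for v in source_values]
trimmed = [trimmed_match(v) for v in source_values]
print(exact)
print(trimmed)
```

The blank-padded value only matches once it is trimmed, while NULL matches in neither case, which lines up with the results described above.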
Biml
Biml, the Business Intelligence Markup Language, describes the platform for business intelligence. BIDS Helper is a free add-on for Visual Studio/BIDS/SSDT that we're going to use to transform the Biml file below into an SSIS package.
The following Biml will generate the package described above.
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<OleDbConnection Name="CM_OLE" ConnectionString="Data Source=localhost\dev2012;Initial Catalog=tempdb;Provider=SQLNCLI11.0;Integrated Security=SSPI;" />
</Connections>
<Packages>
<Package ConstraintMode="Linear" Name="so_26719974">
<Tasks>
<Dataflow Name="DFT Demo">
<Transformations>
<OleDbSource
ConnectionName="CM_OLE"
Name="OLESRC Source">
<DirectInput>
SELECT
D.SourceColumn
, D.Description
FROM
(
VALUES
(CAST('T' AS varchar(4)), 'T')
, (CAST('R' AS varchar(4)), 'R')
, (CAST('' AS varchar(4)), 'Empty string')
, (CAST(' ' AS varchar(4)), 'Blanks')
, (NULL, 'NULL')
) D (SourceColumn, Description);
</DirectInput>
</OleDbSource>
<Lookup
Name="LKP POC"
OleDbConnectionName="CM_OLE"
NoMatchBehavior="RedirectRowsToNoMatchOutput"
>
<DirectInput>
SELECT
D.SourceColumn
FROM
(
VALUES
(CAST('T' AS varchar(4)))
, (CAST('R' AS varchar(4)))
, (CAST('' AS varchar(4)))
) D (SourceColumn);
</DirectInput>
<Inputs>
<Column SourceColumn="SourceColumn" TargetColumn="SourceColumn"></Column>
</Inputs>
</Lookup>
<DerivedColumns Name="DER Default catcher" />
<DerivedColumns Name="DER NoMatch catcher">
<InputPath OutputPathName="LKP POC.NoMatch" />
</DerivedColumns>
</Transformations>
</Dataflow>
</Tasks>
</Package>
</Packages>
</Biml>
The issue is that FULL Cache uses a .Net equality comparison and Partial and None use SQL.
I have had a similar issue where all works well with a Partial cache and when I use Full, I get Errors with Row not found, as I'm Failing on No Match.
My issue was a lower case string in the source and an UPPER version in the Lookup table, so Full/.Net sees these as different and Partial/SQL are happy to do a Case insensitive join.
Output the No Match rows to a csv file if you want to see the rows that are failing.

Resources