SSIS How to import xml elements into table - sql-server

I am using an XML source in SSIS to import an XML file into a SQL Server database.
I am not using all of the detail elements from the XML file, but I want to save the original element with all its details in case they are needed at some point in the future.
Let's say the XML is:
<root>
<row>
<desc>Some row</desc>
<child>
<hi>hi</hi>
<ho>ho</ho>
</child>
</row>
<row>
<desc>Some row2</desc>
<child>
<hi>hi2</hi>
<ho>ho2</ho>
</child>
</row>
</root>
Intended result in structure:
Create Table ParentTable
(
Id int primary key identity,
[desc] nvarchar(50),
xmlElement xml
)
How can I load original XML element (in this case element "row") into database as well by using SSIS?

I am new to SSIS, but I found a solution on the internet (maybe not the best, but it worked).
So here it comes.
First I created the same ParentTable you provided, just changed [desc] to 255 characters. I also added a Connection Manager to the package.
Created two new variables: User::FileName = "some.xml" and User::SourceCatalog = "C:\xmlCatalog\".
Then I added a Data Flow Task, in which I added a Script Component (selected the Source type).
Opened the Script Transformation Editor and, on the Script tab, added the newly created variables User::FileName, User::SourceCatalog to the ReadOnlyVariables property.
On the Inputs and Outputs tab, renamed Output 0 to XMLResultOutput and under Output Columns created two new columns: xmlDesc (Data Type = Unicode string [DT_WSTR] 255) and xmlData (Data Type = Unicode string [DT_WSTR] 3000). These columns will be used later in the C# script.
Pressed Edit Script... on the Script tab. In the window that opens, paste this code into the CreateNewOutputRows method:
XmlDocument xDoc = new XmlDocument();
string xml_filepath = Variables.SourceCatalog + Variables.FileName;
xDoc.Load(xml_filepath);
// One output row per <row> element; keep the whole element as text
foreach (XmlNode xNode in xDoc.SelectNodes("//row"))
{
    this.XMLResultOutputBuffer.AddRow();
    this.XMLResultOutputBuffer.xmlData = xNode.OuterXml;
    this.XMLResultOutputBuffer.xmlDesc = xNode.SelectSingleNode("./desc").InnerText;
}
Don't forget to add using System.Xml;
Added an OLE DB Destination component, linked the Script Component to it, selected the table, mapped the columns, and that's it.
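Once the rows are loaded, the saved XML stays queryable from T-SQL. As a quick check against the ParentTable from the question (a sketch; assumes the sample XML above was loaded):

```sql
-- Pull a detail element back out of the stored <row> fragment.
SELECT Id,
       [desc],
       xmlElement.value('(/row/child/hi)[1]', 'nvarchar(50)') AS hi
FROM ParentTable;
```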

Related

How to modify the DACPAC model at build time?

I want to alter the table model during build time in my BuildContributor. Here is some sample code:
using Microsoft.SqlServer.Dac.Deployment;
using Microsoft.SqlServer.Dac.Extensibility;
using Microsoft.SqlServer.Dac.Model;
using System.Collections.Generic;
using System.Linq;
namespace MyNamespace
{
[ExportBuildContributor("MyNamespace.MyBuildContributor", "1.0.0.0")]
public class MyBuildContributor : BuildContributor
{
protected override void OnExecute(BuildContributorContext context, IList<ExtensibilityError> messages)
{
foreach (var table in context.Model.GetObjects(DacQueryScopes.UserDefined, ModelSchema.Table))
{
var tableName = table.Name.Parts.Last();
var rowId = "alter table " + tableName + " add rowid uniqueidentifier";
context.Model.AddObjects(rowId);
}
}
}
}
The build succeeds with no errors but I don't see rowid in any of the tables when I go look in the model.xml file in bin\Debug\MyDb.dacpac.
You can't use Model.AddObjects in this context.
From the documentation for TSqlModel.AddObjects (https://msdn.microsoft.com/en-us/library/microsoft.sqlserver.dac.model.tsqlmodel.addobjects(v=sql.120).aspx):
"Adds objects to the model based on the contents of a TSql Script string. The script should consist of valid TSql DDL statements. Objects added using this method cannot be updated or deleted at a later point as update/delete requires a script name to be specified when adding the objects. If this is a requirement use the AddOrUpdateObjects method instead."
That is, it can only add whole objects like tables or stored procedures; columns by themselves aren't added to the model.
If you want to update an existing object (i.e. to add a column to an existing table) you will need to use "TSqlModel.AddOrUpdateObjects" which also takes a script name. You can get the script name from a build contributor by using:
var sourceName = table.GetSourceInformation().SourceName;
Then you can build the updated script you want (just a rough outline of rebuilding the SQL for stack overflow, I'm sure you can do better):
var sql = table.GetScript();
sql = sql.Trim().TrimEnd(')', ';') + ", rowid uniqueidentifier);";
var sourceName = table.GetSourceInformation().SourceName;
model.AddOrUpdateObjects(sql, sourceName, new TSqlObjectOptions());
There are a few ways you could create your new script, but basically you need a new script that contains the original table definition plus your extra column, which you can pass to AddOrUpdateObjects to overwrite the original CREATE TABLE statement.
If you can't get a source name to use in AddOrUpdateObjects, you could instead use a post-deploy script to add the column to any table you need, and then use a deployment contributor to remove the drop-column step.
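A post-deploy script for that fallback could be a minimal, idempotent sketch like this (dbo.MyTable is just a placeholder table name):

```sql
-- Post-deployment script: add rowid only where it is missing,
-- so repeated deployments don't fail.
IF COL_LENGTH('dbo.MyTable', 'rowid') IS NULL
    ALTER TABLE dbo.MyTable ADD rowid UNIQUEIDENTIFIER;
```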
You could also look at using a deployment contributor to add the new-column step to the deployment plan.
Hope it helps! Let me know how you get on :)

SQL Server 2016 SSIS get cursor from stored procedure

I am using SQL Server 2016.
I have a stored procedure GET_RECORDS that takes input parameters for filtering and outputs a CURSOR parameter.
I want to get this cursor in my SSIS package.
I created a Data Flow Task, an OLE DB Source, and variables for the parameter values, then mapped the parameters:
Params mapping screen
but when I tried to save the component, I got an error:
error screen
I tried to add a WITH RESULT SETS clause with some dummy columns, but my procedure doesn't return any result set.
What am I doing wrong?
Any advice would be helpful.
Thank you.
With regards, Yuriy.
The source component is trying to determine what columns and types will be returned. Because you are using dynamic SQL, the metadata can change each time you run it.
WITH RESULT SETS allows you to define the data being returned, but it should only be used if you are guaranteed to get those results every time you execute.
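For reference, a WITH RESULT SETS clause spells out the metadata explicitly. A sketch against the GET_RECORDS procedure from the question might look like this (the parameter and column names are made up; adjust them to what the procedure actually returns):

```sql
EXEC dbo.GET_RECORDS @Filter = 'some value'
WITH RESULT SETS
(
    (RecordId INT, RecordName NVARCHAR(100))
);
```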
EDIT:
I create a connection and run the command so that it populates a DataTable. Then I put the column headers into a string array. There are plenty of examples out there.
Then I use the following function to create a destination table. Finally, I create a DataReader and pass it to the .NET SqlBulkCopy. Hope this helps.
private void CreateTable(string TableName, string[] Fields)
{
    // TableExists, Overwrite, HasHeaders and SqlConnection are members of the surrounding class
    if (TableExists(TableName) && Overwrite)
    {
        using (var dropCommand = new SqlCommand($"Drop Table [{TableName}]", SqlConnection))
        {
            dropCommand.ExecuteNonQuery();
        }
    }
    string Sql = $"Create Table [{TableName}] (";
    int ColumnNumber = 1;
    foreach (string Field in Fields)
    {
        string FieldValue = Field;
        if (!HasHeaders)
        {
            // No header row: generate Column1, Column2, ...
            FieldValue = "Column" + ColumnNumber;
            ColumnNumber++;
        }
        Sql += $"[{FieldValue}] Varchar(8000),";
    }
    Sql += "ImportFileID Int, ID Int Identity(1,1) Not Null, Constraint [PK_" + TableName + "] Primary Key Clustered ([ID] Asc))";
    using (var createCommand = new SqlCommand(Sql, SqlConnection))
    {
        createCommand.ExecuteNonQuery();
    }
}
Use an ADO.NET source instead of an OLE DB source: define a simple SELECT to get the columns you wish to return. Then you can define an expression in the Data Flow properties.
Search for "ADO.NET source dynamic SQL" for examples.
:)
Try to return the records and use a Foreach loop in the ETL instead of a cursor:
https://www.simple-talk.com/sql/ssis/implementing-foreach-looping-logic-in-ssis/
I think you can do it in a simple way, but I don't know exactly what you are doing...

How to use selected SQL statement in SSIS package as source variable?

I created an SSIS package where I need to use a FELC (Foreach Loop Container). The first step before the loop is to run an SQL task to obtain all the SQL statements designed to generate different XML files, which are stored in a source table. Inside the FELC I would like to process the statements to generate the XML files and send them to various folder locations, with names and target folders coming from the source table. There are hundreds of files that need to be refreshed on a regular basis. Instead of running single jobs for each XML file generation, I would like to amalgamate it into one process.
Is it possible?
This is the basic Shred Recordset pattern.
I have 3 variables declared: QuerySource, CurrentStatement and rsQueryData. The first two are Strings; the last is an Object type.
SQL - Get source data
This is my query. It simulates your table and induces a failing SQL Statement if I take out the filter.
SELECT
ProcessID
, Stmt_details
FROM
(
VALUES
( 1, 'SELECT 1;', 1)
, ( 20, 'SELECT 20;', 1)
, ( 30, 'SELECT 1/0;', 0)
) Stmt_collection (ProcessID, Stmt_details, xmlFlag)
WHERE
xmlFlag = 1
The Execute SQL Task is set with Recordset = Full and I assign it to variable User::rsQueryData which has a name of 0 in the mapping tab.
FELC
This is a standard Foreach ADO Recordset Loop container. I use my User::rsQueryData as the source and since I only care about the second element, ordinal position 1, that's the only thing I map. I assign the current value to User::CurrentStatement
SQL - Execute CurrentStatement
This is an Execute SQL Task that has as its source the Variable User::CurrentStatement. There's no scripting involved. The FELC handles the assignment of values to this Variable. This Task uses as its source that same Variable. This is very much how native SSIS developers will approach solving a problem. If you reach for a Script Task or Component as the first approach, you're likely doing it wrong.
Biml
If you're doing any level of SSIS/SSRS/SSAS development, you want BIDS Helper. It is a free add-on to Visual Studio that makes your development life so much easier. The feature I'm going to leverage here is the ability to declaratively define an SSIS package. This language is called the Business Intelligence Markup Language, Biml, and I love it for many reasons, but on StackOverflow I love it because I can give you the code to reproduce my solution exactly. Otherwise, I'd have to build out a few hundred screenshots showing you everywhere I have to click and set values.
Or, you can:
1. Download and install BIDS Helper
2. Open up your existing SSIS project
3. Right click on the Project and select "Add new Biml file"
4. In the resulting BimlScript.biml file, open it up and paste all of the following code into it
5. Fix the value for your database connection string. This one assumes you have an instance on your local machine called Dev2014
6. Save the biml file
7. Right click that BimlScript.biml and select "Generate SSIS Packages"
8. Marvel at the resulting so_28867703.dtsx package that was added to your solution
<Biml xmlns="http://schemas.varigence.com/biml.xsd">
<Connections>
<OleDbConnection Name="CM_OLE" ConnectionString="Data Source=localhost\dev2014;Initial Catalog=tempdb;Provider=SQLNCLI10.1;Integrated Security=SSPI;Auto Translate=False;" />
</Connections>
<Packages>
<Package ConstraintMode="Linear" Name="so_28867703">
<Variables>
<Variable DataType="String" Name="QuerySource">SELECT ProcessID, Stmt_details FROM (VALUES (1, 'SELECT 1;', 1), (20, 'SELECT 20;', 1), (30, 'SELECT 1/0;', 0))Stmt_collection(ProcessID, Stmt_details, xmlFlag) WHERE xmlFlag = 1 </Variable>
<Variable DataType="String" Name="CurrentStatement">This statement is invalid</Variable>
<Variable DataType="Object" Name="rsQueryData"></Variable>
</Variables>
<Tasks>
<ExecuteSQL
ConnectionName="CM_OLE"
Name="SQL - Get source data"
ResultSet="Full"
>
<VariableInput VariableName="User.QuerySource" />
<Results>
<Result VariableName="User.rsQueryData" Name="0" />
</Results>
</ExecuteSQL>
<ForEachAdoLoop
SourceVariableName="User.rsQueryData"
ConstraintMode="Linear"
Name="FELC - Shred RS"
>
<VariableMappings>
<!--
0 based system
-->
<VariableMapping VariableName="User.CurrentStatement" Name="1" />
</VariableMappings>
<Tasks>
<ExecuteSQL ConnectionName="CM_OLE" Name="SQL - Execute CurrentStatement">
<VariableInput VariableName="User.CurrentStatement" />
</ExecuteSQL>
</Tasks>
</ForEachAdoLoop>
</Tasks>
</Package>
</Packages>
</Biml>
That package will run, assuming you fixed the connection string to a valid instance. You can see below that if you put break points on the Execute SQL Task, it will light up two times. If you have a watch window on CurrentStatement, you can see it change from the design time value to the values shredded from the result set.
While we await clarification on the XML and files: if the goal is to take the query from the FELC and export it to a file, I answered that at https://stackoverflow.com/a/9105756/181965. Although in this case, I'd restructure your package to just the Data Flow and eliminate the shredding, as there's no need to complicate matters to export a single row N times.
If I understand you correctly: you can add a Script Task from the Toolbox as the first step of the loop container, store the selected statement from the database in a global variable, and pass it for execution in the next step.

Using XPath expressions inside SQL Server stored procedure

I am trying to use an XPath expression to select nodes or node-sets in an XML document inside a SQL Server stored procedure.
I am trying to do something similar to this C# code:
XmlDocument xmlDoc = new XmlDocument();
xmlDoc.Load(response.GetResponseStream());
XmlNode contact = xmlDoc.SelectSingleNode("//users/user/contact");
string strContact = contact.InnerText.Trim();
The XML is the result of a call to a web service from within a stored procedure, similar to this example:
Calling Web Service from stored procedure
However, the XML looks like this:
<DocumentElement xmlns="">
<stats>
<delivered>5</delivered>
</stats>
</DocumentElement>
I need to retrieve the value of the node delivered using a statement similar to
XmlNode delivered = xmlDoc.SelectSingleNode("//stats/delivered");
You're obviously ignoring the XML namespace that's defined on your root element - you need to respect that XML namespace and include it in your query!
-- Sample data - using a *SAMPLE* namespace - replace with your own!
DECLARE @input XML = '<DocumentElement xmlns="urn:sample-namespace">
<stats>
<delivered>5</delivered>
</stats>
</DocumentElement>';
-- set up a query that *uses* that XML namespace instead of ignoring it...
;WITH XMLNAMESPACES( DEFAULT 'urn:sample-namespace')
SELECT
@input.value('(/DocumentElement/stats/delivered)[1]', 'int')
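If the namespace can vary, SQL Server's XQuery should also accept a namespace wildcard, although declaring the namespace explicitly as above is the safer habit. A self-contained sketch over the same sample data:

```sql
DECLARE @input XML = '<DocumentElement xmlns="urn:sample-namespace">
<stats>
<delivered>5</delivered>
</stats>
</DocumentElement>';

-- *: matches the element name regardless of its namespace
SELECT @input.value('(/*:DocumentElement/*:stats/*:delivered)[1]', 'int');
```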

finding latest excel file from folder using ssis

I have a group of Excel files in a folder. The file names are like:
ABC 2014-09-13.xlsx
ABC 2014-09-14.xlsx
ABC 2014-09-15.xlsx
I need to get the data from the latest Excel file and load it into a table using an SSIS package.
This may not be the shortest answer, but it will help you.
Steps:
1. Create a Foreach Loop to fetch all the Excel files, and insert all the file names into a table.
2. Create a variable and assign it the MAX() of the Excel dates.
3. Add a 2nd Foreach Loop. Just like the 1st loop, pick the Excel files one by one, compare each file name with the variable value, and load the Excel file that matches.
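The MAX() lookup in step 2 can be a plain query against the staging table from step 1 (the table and column names here are placeholders):

```sql
-- Assumes step 1 inserted file names like 'ABC 2014-09-15.xlsx'
-- into a staging table dbo.ExcelFiles(FileName).
-- MAX() works here because, with a constant prefix, the yyyy-MM-dd
-- date sorts the same alphabetically and chronologically.
SELECT MAX(FileName) AS LatestFile
FROM dbo.ExcelFiles;
```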
As this is a duplicate question, I will put an answer anyway, with some changes and additional info.
You should already have created the table for the Excel import and added a Connection Manager to the package.
Create 2 variables: MainDir, where the Excel files exist, and ExcelFile, to hold the last file's full name.
Add a Script Task to the package. Open it and on the Script tab set ReadOnlyVariables = User::MainDir and ReadWriteVariables = User::ExcelFile.
Press the Edit Script... button and in the new window paste this code into Main:
string fileMask = "*.xlsx";
string mostRecentFile = string.Empty;
string rootFolder = Dts.Variables["User::MainDir"].Value.ToString();
System.IO.DirectoryInfo directoryInfo = new System.IO.DirectoryInfo(rootFolder);
System.IO.FileInfo[] legacyArray = directoryInfo.GetFiles(fileMask, System.IO.SearchOption.TopDirectoryOnly);
// Sort ascending by file name; the yyyy-MM-dd suffix makes the
// alphabetical order match the chronological order.
Array.Sort(legacyArray, (f2, f1) => f2.Name.CompareTo(f1.Name));
if (legacyArray.Length > 0)
{
    mostRecentFile = legacyArray[legacyArray.Length - 1].FullName;
}
Dts.Variables["User::ExcelFile"].Value = mostRecentFile;
Dts.TaskResult = (int)ScriptResults.Success;
Create an Excel Connection Manager: in Edit mode, point the Excel file path at one of the Excel files, select the Excel version and, if needed, keep First row has column names checked.
In the properties of the Excel Connection Manager find Expressions and add the property ExcelFilePath with the value @[User::ExcelFile].
Add a Data Flow Task and connect the Script Task to it.
Add an Excel Source to the Data Flow Task and open its editor. Select the Excel Connection Manager you created before, change Data access mode to SQL command, and add this line (make sure the Excel file's sheet name is Sheet1): SELECT * FROM [Sheet1$]. Also check that all the necessary columns are selected on the Columns tab.
The last component is an OLE DB Destination, which you must connect to the Excel Source component. Add the connection manager, select the table, and map the columns to the table you want to insert into.
That's all you need to do to import the Excel data...
