ADF V2 - SQL source dataset - column structure mapping issue

In a copy activity (SQL dataset to Azure Blob), I'm using dynamic content for the source dataset, the sink dataset and the mapping between source and sink.
In the SQL source I used a stored procedure output with 3 columns named (col1, col2, col3) in that order, but in the source dataset structure I used dynamic content with the same names in a different order (col2, col1, col3). Because of that, values are swapped between col1 and col2 in the source dataset itself.
My question is: why is name-based mapping not applied in an ADF V2 dataset?
Similarly, for another source (SP output) that returns 7 columns, if I want to use only 3 columns it picks the first 3 columns only; there is no way to choose which columns to use via dynamic content.

The dynamic schema mapping is really useful and saves a ton of work, especially when you don't have a fixed schema. In your case it seems that your schemas are always the same, so why not do the mapping yourself?
Just go to your copy activity, select the Mapping tab and click the "New Mapping" button. It will show two text boxes joined by a line, indicating a column from the source being mapped to a column in the sink.
Just fill it in with the corresponding names, and you should be good to go.
Hope this helped!

Related

How to use Sybase LOAD TABLE correctly?

I have been using ISQL (SQLAnywhere 12) to import data from CSVs into existing tables using INPUT INTO and never ran into a problem. Today I needed to import data into a table containing an auto-increment column, however, and thought I just needed to leave that column blank, so I tried it with a file containing only 1 row of data (to be safe). Turns out it imported with a 0 in the auto-increment field instead of the next integer value.
Looking at the Sybase documentation, it seems like I should be using LOAD TABLE instead, but the examples look a bit complex.
My questions are the following...
The documentation says the CSV file needs to be on the database server and not the client. I do not have access to the database server itself - can I load the file from within ISQL remotely?
How do I define the columns of the table I'm loading into? What if I am only loading data into a few columns and leaving the rest as NULL?
To confirm, this will leave existing data in the table as-is and simply add to it using whatever is in the CSV?
Many thanks in advance.
Yes. Check out the online documentation for LOAD TABLE - you can use the USING CLIENT FILE clause.
You can specify the column names in parentheses after the table name, e.g. LOAD TABLE mytable (col1, col2, col3) USING CLIENT FILE 'mylocalfile.txt'. Any columns not listed will be set to NULL if the column is nullable, or the equivalent of an empty string if it's not - this is why your autoincrement column was set to 0. You can use the DEFAULTS ON clause to get what you want.
Yes, existing data in the table is not affected.
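For reference, a minimal sketch of what that could look like for the scenario in the question (the table, column and file names are placeholders, and the format options need to match your CSV):

-- Load only the non-identity columns; DEFAULTS ON lets the autoincrement column take its default value.
LOAD TABLE mytable (col2, col3)
    USING CLIENT FILE 'c:\data\mylocalfile.csv'
    DEFAULTS ON
    FORMAT TEXT
    DELIMITED BY ','
    ROW DELIMITED BY '\n';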

SSIS dynamic columns validation

I'm trying to use Dynamic Column mapping by selecting the destination table using the Variable Name option in the OLEDB destination. I'm getting the error: "OLE DB Destination" failed validation and returned validation status "VS_NEEDSNEWMETADATA".
I understand from what I've read that Dynamic column validation is not possible in SSIS. But then, why is it possible to select table destination in OLEDB using a variable name? Isn't it dynamic column mapping?
What I'm trying to do is create a foreach loop to read a list of tables and import those tables from the source DB to the staging area. Using the Variable Name destination within the OLE DB destination seems perfect to me, but it does not work, even after enabling DelayValidation on the data flow.
Thanks,
Rodrigo
Why would I use a TableName from Variable for my OLE DB Destination?
I automate the heck out of my SSIS package development. Instead of having to specify each table name, I have a variable called FullyQualifiedName that I populate once and then reuse throughout the package. Think of a truncate-and-reload pattern: an Execute SQL Task to clear out the target table, a Foreach loop to load all the files (either because the names are dynamic or because I have multiple days' worth of data to load), and then archiving the file. I'd need to reference that table at least twice in that scenario. By having the table name in a variable, I can define it once and reference it in many different locations.
I have worked in environments where we physically isolate data based on the customer, e.g. Blackstone.Sales, Yampas.Sales, Ranger.Sales, etc. When a customer logs in, their account can only access data in their schema. The tables are identical in structure but they have different names to ensure isolation. For a scenario like that, you could be matching file name to target table and therefore want to use a variable to control which table is written to.
As you've already determined, you cannot accomplish dynamic column mapping in the manner you are attempting. If it's a straight copy from source to your staging environment, I'd just use a technology like Biml to generate the packages and be done with it.
I have faced and worked on such requests. No, SSIS won't allow you dynamic column mappings, so I tried something along the lines of the approach below.
You first need to use your knowledge of the system and put together a configuration table that tells you the following (see the sketch after this list):
- Source table (SourceTable)
- Columns to be extracted from the source table (SourceQuery)
HINT: a SELECT query, e.g. SELECT ID, Name, Salary FROM dbo.tblEmployee
- Destination table (DestinationTable)
- Columns which need to be fed from the source
- A few other details like server name/connection properties, etc.
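A minimal sketch of such a configuration table (SQL Server syntax; the column names here are only illustrative, not a required layout):

CREATE TABLE dbo.ETLMappings
(
    MappingId        INT IDENTITY(1,1) PRIMARY KEY,
    SourceTable      SYSNAME       NOT NULL,
    SourceQuery      NVARCHAR(MAX) NOT NULL,   -- e.g. SELECT ID, Name, Salary FROM dbo.tblEmployee
    DestinationTable SYSNAME       NOT NULL,
    DestinationQuery NVARCHAR(MAX) NOT NULL,   -- e.g. INSERT INTO DestTable1 SELECT ... FROM StgData
    ConnectionString NVARCHAR(500) NULL        -- server name / connection properties
);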
You would later traverse the rows of this table using a ForEach Loop container.
Next, identify the maximum number of columns and the maximum data type lengths across the source columns that might be extracted; you will need this information to create a staging table in the next step.
Create a staging table, let's say StgData. Here I create it with 50 columns, all of data type NVARCHAR(MAX). The CREATE statement looks like:
CREATE TABLE StgData
(
Column1 NVARCHAR(MAX),
Column2 NVARCHAR(MAX),
Column3 NVARCHAR(MAX),
....
Column50 NVARCHAR(MAX)
)
The raw data gets loaded into StgData.
Now have a ForEach Loop container traverse the rows of the configuration table (ETLMappings).
Inside it, use an INSERT statement in an Execute SQL Task to load the data.
The script inside the task looks like:
INSERT INTO dbo.StgData
?
Here ? corresponds to the SourceQuery column (captured by the ForEach container).
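For illustration, if the current ForEach row carried the hint query from above, the statement the Execute SQL Task ends up running would resolve to something like this (the column names follow the hint and are assumptions):

-- SourceQuery = 'SELECT ID, Name, Salary FROM dbo.tblEmployee'
INSERT INTO dbo.StgData (Column1, Column2, Column3)
SELECT ID, Name, Salary
FROM dbo.tblEmployee;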
Once StgData is loaded, it is used to load the DestinationTable (also captured in the ForEach Loop container).
Again, you need a good understanding of the schema and column mapping here. The configuration table should have a column which stores a SQL query of the form:
INSERT INTO DestTable1 SELECT Col1, CAST(Col2 as float) Col2 FROM StgData
Something along those lines.
This is just the basic structure; of course a lot of formatting and customization has to be added.

How to transfer only new records between two different databases (i.e. Oracle and MSSQL) using SSIS?

Do you know how to transfer only new records between two different databases (i.e. Oracle and MSSQL) using SSIS? There is no problem transferring only new data between two tables in the same database and server, but is it possible to do such an operation between completely different servers and databases?
PS. I know about the solution using Lookup, but it is not very efficient if you need to check and add a lot of records (50k and more) several times per day. I would like to operate on new data only.
You have several options:
Timestamp based solution
If you have a column which stores the insertion time in the source system, you can select only the new records created since the last load. With the same logic you can transfer modified records too; just mark the records with a timestamp value when they change.
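A sketch of the extraction query for this approach (the table and column names are assumptions; the watermark value is something your package would store and maintain):

-- Pull only rows created since the last successful load
-- (in an OLE DB source the watermark is usually passed in as a ? parameter).
SELECT *
FROM dbo.SourceTable
WHERE CreatedAt > ?;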
Sequence based solution
If there is a sequence in the source table, you can load the new records based on that sequence. Query the last value from the destination system, then load everything which is larger than that value.
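A sketch of that pattern (table and column names are assumptions):

-- 1) On the destination, read the high-water mark:
SELECT MAX(Id) AS LastId FROM dbo.DestTable;
-- 2) On the source, pull only rows above it (LastId passed in as a parameter):
SELECT * FROM dbo.SourceTable WHERE Id > ?;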
CDC based solution
If you have CDC (Change Data Capture) in your source system, you can track the changes and you can load them based on the CDC entries.
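For the SQL Server side, a sketch of reading the CDC entries (the capture instance name dbo_SourceTable is an assumption and must match your CDC setup):

DECLARE @from_lsn BINARY(10) = sys.fn_cdc_get_min_lsn('dbo_SourceTable');
DECLARE @to_lsn   BINARY(10) = sys.fn_cdc_get_max_lsn();

-- Returns one row per change (insert, update, delete) in the tracked range.
SELECT *
FROM cdc.fn_cdc_get_all_changes_dbo_SourceTable(@from_lsn, @to_lsn, 'all');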
Full load
This is the most resource-hungry solution: you have to copy all data from the source to the destination. If you do not have any column which marks the new records, you will have to use this solution.
You have several options to achieve this:
TRUNCATE the destination table and reload it from source
Use a Lookup component to determine which records are missing
Load all data from the source to a temporary table and write a query which retrieves the new/changed records (see the sketch below).
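A sketch of that third variant, assuming all source rows have already been staged into a table called dbo.StagingPeople (all names here are placeholders):

-- Insert only the rows that do not yet exist in the destination.
INSERT INTO dbo.DestPeople
SELECT s.*
FROM dbo.StagingPeople AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.DestPeople AS d WHERE d.ID = s.ID);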
Summary
If you have at least one column which marks the new/modified records, you can use it to implement a differential/incremental load with SSIS. If you have no clue which columns/rows have changed, you have to load (or at least query) all of them.
There is no solution that enables a single-query (INSERT .. SELECT) approach across multiple servers without transferring all the data. (Please note that a multi-server query using Linked Servers still transfers the data from the source system.)
What about variables? Is it possible to use the same variable between different databases and servers in SSIS?
I would like to get the last id number from a destination table and pass it to the source query (on a different server!).
I can set a variable within a database scope like this:
DECLARE @Last int
SET @Last = (SELECT TOP 1 Id FROM dbo.Table_1 ORDER BY Id DESC)
SELECT *
FROM dbo.Table_2
WHERE ID > @Last;
However, this works only between two tables in the same database (as a SQL command). I can create a variable for the entire SSIS package under Variables --> Add variable, but I don't know if it is possible to use that variable in a similar way to the above - to hold the last id from the destination table and pass it as a limit to the source server.

SSIS - how to use lookup to add extra columns within data flow

I have a csv file containing the following columns:
Email | Date | Location
I want to pretty much throw this straight into a database table. The snag is that the Location values in the file are strings, e.g. "Boston". The table I want to insert into has an integer property LocationId.
So, midway through my data flow, I need to do a database query to get the LocationId corresponding to the Location. eg:
SELECT Id as LocationId FROM Locations WHERE Name = { location string from current csv row }
and add this to my current column set as a new value "LocationId".
I don't know how to do this - I tried a lookup, but this meant I had to put the lookup in a separate data flow - the columns from my csv file don't seem to be available.
I want to use caching, as the same Locations are repeated a lot and I don't want to run selects for each row when I don't need to.
In summary:
How can I, part-way through the data flow, stick in a lookup transformation (from a different source, sql), and merge the output with the csv-derived columns?
Is lookup the wrong transformation to use?
A Lookup transformation will work for you: it has caching, and you can pass all input columns through to the output and add columns from the query you use in the Lookup transformation.
You can also use a Merge Join here; in some cases it is a better solution, however it brings additional overhead because it requires its inputs to be sorted.
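For the scenario in the question, a minimal sketch of the query you could put inside the Lookup transformation (the table and column names follow the question but are assumptions):

-- Full cache mode loads this result set once and reuses it for every row.
SELECT Id AS LocationId, Name
FROM dbo.Locations;
-- On the Columns page, join the flat-file column Location to Name
-- and add LocationId as a new output column.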
Check this:
Right-click on the Lookup transformation -> Show Advanced Editor -> go to Input and Output Properties.
Here you can add a new column or change the data type of existing columns.
For more info on how to use the Lookup transformation, see the official documentation.
Open the flat file connection manager and go to the Advanced tab.
Click "New" to add the new column and modify its properties.
Now go back to the Flat File Destination output, right-click > Mappings > and map the lookup column to the new one.

How do you get an SSIS package to only insert new records when copying data between servers

I am copying some user data from one SQL Server to another. Call them Alpha and Beta. The SSIS package runs on Beta and it gets the rows on Alpha that meet a certain condition. The package then adds the rows to Beta's table. Pretty simple, and that works great.
The problem is that I only want to add new rows into Beta. Normally I would just do something simple like....
INSERT INTO BetaPeople
SELECT * From AlphaPeople
where ID NOT IN (SELECT ID FROM BetaPeople)
But this doesn't work in an SSIS package. At least I don't know how and that is the point of this question. How would one go about doing this across servers?
Your example seems simple; it looks like you are adding only new people, not looking for changed data in existing records. In this case, store the last transferred ID in the DB.
CREATE TABLE dbo.LAST (RW int, LastID Int)
go
INSERT INTO dbo.LAST (RW, LastID) VALUES (1,0)
Now you can use this table to store the last ID of the rows transferred.
UPDATE dbo.LAST SET LastID = @myLastID WHERE RW = 1
When configuring the OLE DB source, set the data access mode to SQL Command and use:
DECLARE @Last int
SET @Last = (SELECT LastID FROM dbo.LAST WHERE RW = 1)
SELECT * FROM AlphaPeople WHERE ID > @Last;
Note, I assume that you are using an ID int IDENTITY column for your PK.
If you have to monitor for data changes in existing records, then have a "last changed" column in every table, and store the time of the last transfer.
A different technique would involve setting up a linked server on Beta pointing to Alpha and running your example without using SSIS. I would expect this to be slower and more resource-intensive than the SSIS solution.
INSERT INTO dbo.BetaPeople
SELECT * FROM [Alpha].[myDB].[dbo].[AlphaPeople]
WHERE ID NOT IN (SELECT ID FROM dbo.BetaPeople)
Add a lookup between your source and destination.
Right-click the Lookup box to open the Lookup Transformation Editor.
Choose [Redirect rows to no match output].
Open Columns and map your key columns.
Add an entry with the table key as the lookup column, with the lookup operation set to "Add as new column".
Connect the Lookup box to the destination and choose [Lookup No Match Output].
Simplest method I have used is as follows:
Query Alpha in a Source task in a Dataflow and bring in records to the data flow.
Perform any needed Transformations.
Before writing to the destination (Beta), perform a lookup matching the ID column from Alpha to those in Beta. On the first page of the Lookup Transformation Editor, make sure you select "Redirect rows to no match output" from the dropdown list "Specify how to handle rows with no matching entries".
Link the Lookup task to the Destination. This will give you a prompt where you can specify that it is the unmatched rows that you want to insert.
This is the classic delta-detection issue. The best solution is to use Change Data Capture, with or without SSIS. If what you are looking for is a one-off activity, there is no need to go for SSIS; use other means such as a linked server and compare with the existing records.
The following should solve the issue of loading changed and new records using SSIS:
Extract data from the source using a Data Flow.
Extract data from the target.
Match on the primary key and split the records into matched and unmatched records from the source, and matched records from the target; call them Matched_Source, Unmatched_Source and Matched_Target.
Compare Matched_Source and Matched_Target and split Matched_Source into Changed and Unchanged.
Empty (null-load) the TempChanged table.
Add the Changed records to TempChanged.
Execute a SQL script/stored proc to delete records from the target for the primary keys in TempChanged, then add the records in TempChanged to the target (see the sketch after these steps).
Add the Unmatched_Source records to the target.
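A sketch of that Execute SQL step (the table names are placeholders that follow the step names above):

-- Remove the old versions of the changed rows...
DELETE t
FROM dbo.TargetTable AS t
INNER JOIN dbo.TempChanged AS c ON c.ID = t.ID;

-- ...and re-insert the changed versions.
INSERT INTO dbo.TargetTable
SELECT * FROM dbo.TempChanged;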
Another solution would be to use a temporary table.
In the properties of Beta's connection manager, change RetainSameConnection to true (by default SSIS runs each query in its own connection, which would mean the temporary table would be dropped as soon as it had been created).
Create an Execute SQL Task using Beta's connection and use the following SQL to create your temporary table:
SELECT TOP 0 *
INTO ##beta_temp
FROM dbo.BetaPeople
Next create a data flow that pulls data from Alpha and loads it into ##beta_temp (you will need to run the SQL statement above in SSMS first so that Visual Studio can see the table at design time, and you will also need to set the DelayValidation property to true on the Data Flow task).
Now you have two tables on the same server and you can just use your example SQL modified to use the temporary table.
INSERT INTO dbo.BetaPeople
SELECT * FROM ##beta_temp
WHERE ID NOT IN (SELECT ID FROM dbo.BetaPeople)
