Load all databases from a source SQL Server to ADLS Gen2 (Azure Data Lake Storage Gen2)

Is there a way to load all databases from the source SQL Server to the data lake as they are?
I have tried loading each database with its tables individually, but I am asking whether there is a way to load all databases to the data lake in one go.

One way to load all databases from SQL Server to ADLS Gen2 is with Azure Data Factory.
Follow the procedure below:
First, create a pipeline and add a Script activity. Attach a linked service that points to the master database, set the script type to Query, and use the following query to list the databases:
SELECT name, database_id, create_date
FROM sys.databases;
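Note that sys.databases also returns the system databases (master, tempdb, model, msdb). If you only want user databases, a minimal variation, assuming the default system database_ids 1-4, is:
SELECT name, database_id, create_date
FROM sys.databases
WHERE database_id > 4;  -- skip master (1), tempdb (2), model (3), msdb (4)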
Then add a ForEach activity and, in its settings, set Items to the following expression so that it iterates over the Script activity's output:
@activity('Script1').output.resultSets[0].rows
Inside the ForEach activity, add a Lookup activity, and create a dataset whose linked service uses dynamic values.
In that dataset, add a database name parameter.
Now pass this parameter value through to the linked service properties as shown below.
Lookup activity settings:
SELECT table_Schema, TABLE_NAME, TABLE_CATALOG
FROM information_Schema.tables
WHERE TABLE_TYPE = 'BASE TABLE';
Now add an Execute Pipeline activity and click New to create a child pipeline. In the child pipeline, create a lookupOP parameter of type Array, and in the Execute Pipeline activity pass it the output of the Lookup activity as @activity('Lookup1').output.value
In the child pipeline, add a ForEach activity and pass the parameter we created as its Items.
Inside that ForEach activity, add a Copy activity. For the source dataset, create a linked service on the SQL database with dynamic values, as we did previously.
In this dataset, create parameters for the database name, table name, and schema name.
Now wire these dynamic values into the linked service properties and into the table name and table schema.
Copy activity source settings:
Create parameters in the sink dataset.
Now wire these dynamic values into the folder name and file name.
Copy activity sink settings:
Output:
A folder is created for each database, and the tables of that particular database are loaded into it.

Related

Azure Data Factory: Return Identifier values from a Copy Data activity

I am updating an on-premises SQL Server database table with data from a csv file using a Copy Data activity. There is an int identity Id column on the sink table that gets generated when I do the Upsert. I would like to retrieve the Id value generated in that table to use later in the pipeline.
Is there a way to do this?
I can't use a data flow as I am using a self-hosted Integration Runtime.
Hi @Nick.McDermaid, I am loading about 7,000 rows from the file to the database. I want to store the identities in the database the file comes from.
Edit:
I have 2 databases (source/target). I want to upsert (using the MERGE SQL below, with the OUTPUT clause) into the target db from the source db and then return the Ids (via the OUTPUT result set) to the source db. The problem I have is that the upsert (MERGE) SQL gets its SELECT statement from the same target db that the target table is in (when using a Copy Data activity), but I need the SELECT to run against the source db. Is there a way to do this, maybe using the Script activity?
Edit 2: To clarify, the 2 databases are on different servers.
Edit 3 (MERGE Update):
MERGE Product AS target
USING (SELECT [epProductDescription]
             ,[epProductPrimaryReference]
       FROM [epProduct]
       WHERE [epEndpointId] = '438E5150-8B7C-493C-9E79-AF4E990DEA04') AS source
ON target.[Sku] = source.[epProductPrimaryReference]
WHEN MATCHED THEN
    UPDATE SET [Name] = source.[epProductDescription]
              ,[Sku] = source.[epProductPrimaryReference]
WHEN NOT MATCHED THEN
    INSERT ([Name]
           ,[Sku])
    VALUES (source.[epProductDescription]
           ,source.[epProductPrimaryReference])
OUTPUT $action, inserted.*, deleted.*;
Edit 3 (sample data):
(source and target sample data were provided as images)
Is there a way to do this, maybe using the Script activity?
Yes, you can execute this script using the Script activity in ADF.
Since your tables are on different SQL Servers, you first have to create a linked server to the source database on the target server.
Go to Server Objects >> Linked Servers >> New Linked Server and create a linked server to the source database on the target server, as below.
While creating the linked server, make sure the same user exists on both databases.
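If you prefer T-SQL over the SSMS dialog, a minimal sketch of the same setup, assuming the linked server is named OP3 (as in the query below); the source instance name, login, and password are placeholders:
-- Run on the target server
EXEC master.dbo.sp_addlinkedserver
     @server = N'OP3'
    ,@srvproduct = N''
    ,@provider = N'SQLNCLI'          -- or a newer OLE DB provider such as MSOLEDBSQL
    ,@datasrc = N'SourceServerName'; -- placeholder: source instance name

-- Map local logins to the same SQL login on the source server
EXEC master.dbo.sp_addlinkedsrvlogin
     @rmtsrvname = N'OP3'
    ,@useself = N'False'
    ,@locallogin = NULL
    ,@rmtuser = N'sqluser'           -- placeholder login
    ,@rmtpassword = N'password';     -- placeholder password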
Then I wrote the MERGE query using this linked server as the source.
My Sample Query:
MERGE INTO PersonsTarget AS trg
USING (SELECT [LastName],[FirstName],[State]
       FROM [OP3].[sample1].[dbo].[Personssource]) AS src
ON trg.[State] = src.[State]
WHEN MATCHED THEN
    UPDATE SET [LastName] = src.[LastName]
              ,[FirstName] = src.[FirstName]
WHEN NOT MATCHED THEN
    INSERT ([LastName],[FirstName],[State])
    VALUES (src.[LastName],src.[FirstName],src.[State])
OUTPUT $action, inserted.*;
Then, in the Script activity, I provided this script.
Note: In the linked service for the on-premises target database, use the same user that you used when creating the linked server.
It executed successfully and returned the Ids:
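If you also need to persist the generated Ids back to the source database, rather than only reading them from the Script activity's result set, one variation (a sketch, not from the answer above; the MergeResults table and its columns are hypothetical) is to capture the OUTPUT rows in a table variable and push them across the same linked server:
-- Capture the OUTPUT rows locally first
DECLARE @MergeLog TABLE (MergeAction nvarchar(10), [State] varchar(50));

MERGE INTO PersonsTarget AS trg
USING (SELECT [LastName],[FirstName],[State]
       FROM [OP3].[sample1].[dbo].[Personssource]) AS src
ON trg.[State] = src.[State]
WHEN MATCHED THEN
    UPDATE SET [LastName] = src.[LastName]
              ,[FirstName] = src.[FirstName]
WHEN NOT MATCHED THEN
    INSERT ([LastName],[FirstName],[State])
    VALUES (src.[LastName],src.[FirstName],src.[State])
OUTPUT $action, inserted.[State] INTO @MergeLog (MergeAction, [State]);

-- Write the captured rows back to the source over the linked server
INSERT INTO [OP3].[sample1].[dbo].[MergeResults] (MergeAction, [State])
SELECT MergeAction, [State] FROM @MergeLog;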

How to do an inner join rather than for each loop in SSIS?

On the ETL server I have a DW user table.
On the prod OLTP server I have the sales database. I want to pull the sales only for users that are present in the user table on the ETL server.
Presently I am using an Execute SQL task to fetch the DW users into an SSIS System.Object variable, then using a ForEach loop to loop through each item (userid) in this variable and, via a Data Flow task, fetch the OLTP sales table for each user and dump it into the DW staging table. The ForEach loop is taking a long time to run.
I want to be able to do an inner join so that the response is quicker, but I can't do this since they are on separate servers. Neither can I use a global temp table to make the inner join, for the same reason.
I tried collecting the DW users into a comma-separated string variable and then using it (via STRING_SPLIT) to query the OLTP server, but this also takes a long time in the pre-execute phase (not sure why exactly), even for a small number of users.
I am also aware of the Lookup transform, but that too would bring all OLTP rows over to the DW ETL server just to test the lookup condition.
Is there any alternate approach to be able to do an inner join by taking the list of users into the source?
Note: I do not have write permissions on the OLTP db.
Based on the comments, I think we can use a temporary table to solve this.
Can you help me understand this restriction? "Neither can I use a global temp table to make the inner join, for the same reason."
The restriction is that the OLTP server and the DW server are separate, so we can't have a global temp table common to both servers. Hope that makes sense.
The general pattern we're going to follow is:
Execute SQL Task to create a temporary table on the OLTP server
A Data Flow task to populate the new temporary table. Source = DW, Destination = OLTP. Ensure DelayValidation = True
Modify the existing Data Flow so the source is a query that uses the temporary table, i.e. SELECT S.* FROM oltp.sales AS S WHERE EXISTS (SELECT * FROM #SalesPerson AS SP WHERE SP.UserId = S.UserId); Ensure DelayValidation = True
A long form answer on using temporary tables (global to set the metadata, regular thereafter)
I don't use temp table in SSIS
Temporary tables live in tempdb. Your OLTP and DW connection managers likely do not point to tempdb. To be able to reference a temporary table, local or global, in SSIS you need to either define an additional connection manager for the same server that points explicitly at tempdb, so you can use the drop-down in the source/destination components (technically accurate but dumb). Or, you use an SSIS variable to hold the name of the table and use the "from variable" option in the source/destination component (best option, maximum flexibility).
Soup to nuts example
I will use WideWorldImporters as my OLTP system and WideWorldImportersDW as my DW system.
One-time task
Open SQL Server Management Studio, SSMS, and connect to your OLTP system. Define a global temporary table with a unique name and the expected structure. Leave your connection open so the table structure remains intact during initial development.
I used the following statement.
DROP TABLE IF EXISTS ##SO_70530036;
CREATE TABLE ##SO_70530036 (EmployeeId int NOT NULL);
Keep track of your query because we'll use it later on, but as I advocate in my SSIS answers: perform the smallest task, test that it works, and then go on to the next. It's the only way to debug.
Connection Managers
Define two OLE DB Connection Managers. WWI_DW points to the named instance DEV2019UTF8 and WWI_OLTP points to DEV2019EXPRESS. Right click on WWI_OLTP and select Properties. Find the property RetainSameConnection and flip it from the default of False to True. This ensures the same connection is used throughout the package. As temporary tables go out of scope when the connection goes away, closing and reopening a connection in a package will result in a fatal error.
These two databases are on different instances, so we can't cheat and directly commingle data.
Variables
Define 4 variables in SSIS, all of type String.
TempTableName - I used a value of ##SO_70530036 but use whatever value you specified in the One-time task section.
QuerySourceEmployees - This will be the query you run to generate the candidate set of data to go into the temporary table. I used SELECT TOP (3) E.[WWI Employee ID] AS EmployeeId FROM Dimension.Employee AS E WHERE E.[Is SalesPerson] = CAST(1 AS bit);
QueryDefineTables - Remember the drop/create statements from the one-time task? We're going to use the essence of them, but use the expression builder to let us dynamically swap the table name. I clicked the ellipses, ..., on the Expression section and used the following: "DROP TABLE IF EXISTS " + @[User::TempTableName] + "; CREATE TABLE " + @[User::TempTableName] + "( EmployeeId int NOT NULL);" You should be able to copy the Value from the row and paste it into SSMS to confirm it works (the evaluated statement is shown after this list).
QuerySales - This is the actual query you're going to use to pull your filtered set of sales data. Again, we'll use the Expression to allow us to dynamically reference the temporary table name. The prettified version of the expression would look something like
"SELECT
SI.InvoiceID
, SI.SalespersonPersonID
, SO.OrderID
, SOL.StockItemID
, SOL.Quantity
, SOL.OrderLineID
FROM
Sales.Invoices AS SI
INNER JOIN
Sales.Orders AS SO
ON SO.OrderID = SI.OrderID
INNER JOIN
Sales.OrderLines AS SOL
ON SO.OrderID = SOL.OrderID
WHERE
EXISTS (SELECT * FROM " + @[User::TempTableName] + " AS TT WHERE TT.EmployeeID = SI.SalespersonPersonID);"
Again, you should be able to pull the Value from the three queries and run them independently and verify they work.
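For reference, with the variable values above, the QueryDefineTables expression evaluates to the following statement, which you can paste into SSMS to confirm:
DROP TABLE IF EXISTS ##SO_70530036;
CREATE TABLE ##SO_70530036 (EmployeeId int NOT NULL);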
Execute SQL Task
Add an Execute SQL Task to the Control Flow. I named mine SQL Create temporary table. My Connection Manager is WWI_OLTP, I changed the SQLSourceType to Variable, and the SourceVariable is User::QueryDefineTables.
Every time your package runs, the first thing it will do is create the temporary table. Which is good, because SSIS is a metadata-driven ETL engine and the next two steps would fail if the table didn't exist.
Data Flow Task - Prime the pump
This data flow is where we'll transfer DW data back to the OLTP system so we can filter in the source system.
Drag a Data Flow Task onto the Control Flow. I named mine DFT Load Temp and, before you click into it, right click on the Task, find the DelayValidation property, and change it from the default of False to True. Normally, a package validates all metadata before actual execution begins, as the idea is you want to know everything is good before any data starts moving. Since we're using temporary tables, we need to tell the execution engine "trust us, it'll be ready".
Double click inside the Data Flow Task.
Add an OLE DB Source. I named mine OLESRC SourceEmployees and used the connection manager WWI_DW. The data access mode changes to SQL command from variable, and then I select my variable User::QuerySourceEmployees.
Add an OLE DB Destination. I named mine OLEDST TempTableName and double clicked to configure it. The Connection Manager is WWI_OLTP and again, since the table lives in tempdb, we can't select it from the drop down. Change the Data access mode to Table name or view name variable - fast load and then select your variable name User::TempTableName. Click the Mapping tab and ensure source columns map to destination columns.
Data Flow Task - Transfer data
Finally, we will pull our source data, nicely filtered against the data from our target system.
Add an OLE DB Source. I named it OLESRC QuerySales. The Connection Manager is WWI_OLTP. The data access mode again changes to SQL command from variable, and the variable name is User::QuerySales.
From here, do whatever else you need to do to make the magic happen.
Instead of having 270k rows from an unfiltered query, I have 67k, as there are only 3 employees in the temporary table.
Reference package
But wait, there's more!
Close out Visual Studio, open it back up, and try to touch something in the data flows. Suddenly, there are red Xs everywhere! Any time you close a data flow component, it fires a revalidate-metadata operation and, guess what, it can't do that because the connection to the temporary table is gone.
The package will run fine, and it will not throw VS_NEEDSNEWMETADATA, but editing/maintenance becomes a pain.
If you switched the table name variable's value from a global temporary table to a local one, switch it back to the global name and then run the define statement in SSMS. Once that's done, you can continue editing the package.
I assure you, the local temporary table does work once you have the metadata set and you use queries via variables for source/destination.
No need for the global temporary table hack, or the SET FMTONLY OFF hack (which no longer works).
Just specify the result set metadata in the SQL query with WITH RESULT SETS, e.g.
EXEC ('
create table #t
(
ID INT,
Name VARCHAR(150),
Number VARCHAR(15)
)
insert into #t (Id, Name, Number)
select object_id, name, 12
from sys.objects
select * from #t
')
WITH RESULT SETS
(
(
ID INT,
Name VARCHAR(150),
Number VARCHAR(15)
)
)
If you need to parameterize the query, there's a bit of a catch, because there are some limitations in how SSIS discovers parameters. SSIS runs sp_describe_undeclared_parameters, which doesn't really work with batches that call sp_executesql, because sp_executesql has a very unique way of handling parameters, one which you can't replicate with a user stored procedure.
So to parameterize the query, you'll either need to pass the parameter values into the query using the "query from variable" approach and SSIS expressions, or push all this T-SQL into a stored procedure.
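A minimal sketch of the stored-procedure route, with a hypothetical procedure name and parameter; the metadata contract then lives in the procedure rather than in the ad hoc batch:
CREATE PROCEDURE dbo.usp_ObjectsById   -- hypothetical name
    @MinObjectId int                   -- hypothetical parameter
AS
BEGIN
    SET NOCOUNT ON;

    CREATE TABLE #t
    (
        ID INT,
        Name VARCHAR(150),
        Number VARCHAR(15)
    );

    INSERT INTO #t (ID, Name, Number)
    SELECT object_id, name, 12
    FROM sys.objects
    WHERE object_id >= @MinObjectId;

    SELECT ID, Name, Number FROM #t;
END
The SSIS source query could then be EXEC dbo.usp_ObjectsById @MinObjectId = ? WITH RESULT SETS ((ID INT, Name VARCHAR(150), Number VARCHAR(15))), keeping the WITH RESULT SETS wrapper if the metadata still isn't discovered.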

Sink to SQL Server table using a stored procedure in an Azure Data Factory pipeline not working

I need to copy blob storage CSV file data to a SQL Server table, but I also require some additional columns in the destination while using the Copy activity in an Azure Data Factory pipeline.
I referred to this link: 'https://blog.pragmaticworks.com/using-stored-procedure-in-azure-data-factory'.
I created a table type and an insert-data stored procedure. But when I try to use that in the Copy activity sink, it gives me two text boxes in the sink dataset's Table parameter.
As you can see in the image above, we need to put the parameter name in the 'Table' text box, but now we have two text boxes rather than only one.
Has there been some update published to Azure Data Factory pipelines that shows this additional text box, where previously there was only one?
The stored procedure is below.
ALTER PROCEDURE [dbo].[Insert_SFl_Data]
(
    @Passing [dbo].[STG_SFl] READONLY,
    @SourceFileName NVARCHAR(500)
)
AS
BEGIN
    INSERT INTO [dbo].[SFl]
        ([DateExtracted]
        ,[Country]
        ,[SourceFileName])
    SELECT *, @SourceFileName FROM @Passing
END
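For completeness, the table type the procedure depends on would look something like the following; the column list is an assumption inferred from the INSERT column list above:
-- Hypothetical definition; the actual columns of [dbo].[STG_SFl] may differ
CREATE TYPE [dbo].[STG_SFl] AS TABLE
(
    [DateExtracted] DATETIME NULL,
    [Country] NVARCHAR(100) NULL
);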

Import SQL tables as data into an Access DB

I have a SQL database (let's use Northwind) that has a number of tables (an unknown number). I would like to import these tables into an MS Access database as DATA (not as tables) into an MTT_Table.
All the standard imports create the table as a physical table within MS Access, not as data.
I have a table in MS Access that needs to store the names of tables in other systems - not sure if that makes sense.
Is there any way to read an arbitrary number of tables and populate them as data, using an ODBC connection, all through VBA?
The expected output would be to see the table names as data values, and potentially to populate the MS Access row with metadata about each table.
Use INFORMATION_SCHEMA to create a view in SQL Server:
CREATE VIEW dbo.Sample_View
AS
SELECT TABLE_NAME
FROM [Your_Database].INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE'
Now import this view into Access following the steps in this link.
Your question is a bit broad (what information do you want from the tables?), but generally this can be achieved by querying the INFORMATION_SCHEMA meta-tables over ODBC.
SELECT * INTO MTT_Table
FROM [ODBC;Driver={SQL Server};Server=my\server;Database=myDb;Trusted_Connection=Yes;].INFORMATION_SCHEMA.TABLES
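If you also want metadata about each table (as the question mentions), the query can be extended on the SQL Server side; a sketch, assuming approximate row counts from sys.partitions are good enough:
SELECT
    t.TABLE_SCHEMA,
    t.TABLE_NAME,
    o.create_date,
    SUM(p.rows) AS approx_row_count
FROM INFORMATION_SCHEMA.TABLES AS t
JOIN sys.objects AS o
    ON o.object_id = OBJECT_ID(QUOTENAME(t.TABLE_SCHEMA) + '.' + QUOTENAME(t.TABLE_NAME))
JOIN sys.partitions AS p
    ON p.object_id = o.object_id
   AND p.index_id IN (0, 1)   -- heap or clustered index only, to avoid double counting
WHERE t.TABLE_TYPE = 'BASE TABLE'
GROUP BY t.TABLE_SCHEMA, t.TABLE_NAME, o.create_date;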

ETL script to dynamically map multiple Execute SQL result sets to multiple tables (table name based on the SQL file provided)

I have a source folder with SQL files (I can set them up as stored procedures as well). I know how to loop over and execute SQL tasks in a ForEach container. The part where I'm stuck is that I need to take the final result set of each SQL query and shove it into a table with the same name as the SQL file.
So, folder -> script1.sql, script2.sql, etc. -> ETL -> goes to table script1, table script2, etc.
EDIT: Based on the comment made by Joe, I just want to say that I'm aware of using an INSERT within the script, but I need to insert into a table on a different server, and linked servers are not the ideal solution.
Any pseudocode or links to tutorials would be extremely helpful. Thanks!
I would add the table creation to the script. It is probably the simplest way to do this. If your script is SELECT SomeField FROM Table1, you could change it to SELECT SomeField INTO script1 FROM Table1. Then there is no need to map anything in SSIS, which is not easy to do, in my experience.
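A minimal sketch of that pattern, using the table and column names from the example above; in practice the INTO target would be generated from the script file name:
-- script1.sql: the result set is materialized into a table named after the file
SELECT SomeField
INTO script1        -- table name matches the .sql file name
FROM Table1;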
