When developing an Azure SQL Data Warehouse with SSIS, we have a two-phase process: 1) copy the data source to a staging table, 2) copy the staging table to the report table.
My question is: will SSIS actually pull the data through its own server, even when it knows the source and target use the same OLE DB provider? Or is it smart enough to use something like "INSERT INTO ... SELECT * FROM ..." on the server side? This makes a difference to us because Azure charges for data leaving Azure, we have a lot of similar copy operations in the DW, and the SSIS machine is the only one on-premises.
We could define a series of Execute SQL Tasks with nested queries, but managing TransactionOption across that many tasks is hard.
Thanks.
I do not think it will do this. SSIS was designed with so many hooks into the pipeline that it would be counter-intuitive for it to optimize by skipping the pipeline entirely.
You could, however, use T-SQL to do the SELECT ... INTO and keep the processing on the same server as the database engine.
If you need to switch between the two methods, you can add a parameter to your package and control conditional execution via precedence constraints.
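For example, an Execute SQL Task could run a statement like the following, so the data never leaves the Azure server (the table and column names here are made up for illustration):

SELECT *
INTO dbo.ReportTable          -- creates and loads the target in one pass
FROM dbo.StagingTable;

-- or, if the target table already exists:
INSERT INTO dbo.ReportTable (Col1, Col2)
SELECT Col1, Col2
FROM dbo.StagingTable;

Either form runs entirely inside the database engine, so nothing is pulled down to the on-premises SSIS box.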
I am relatively new to SSIS and have to come up with an SSIS package for work such that certain tables are dynamically moved from one SQL Server database to another. I have the following constraints that need to be met:
Source table names and destination table names may differ, so directly copying tables with the Transfer SQL Server Objects Task does not work.
Only certain columns may be transferred from source table to destination table.
This package needs to run every 5 minutes so it has to be relatively fast.
The transfer must be dynamic, such that if there are new source tables, the package need not be reconfigured with hard-coded values.
I have the following ideas for now:
Use the Transfer SQL Server Objects Task, but I'm not sure the above requirements can be met, especially the selective transfer of tables and dynamic mapping of columns.
Use SQLBulkCopy in a script component to perform migration.
I would appreciate it if anyone could give some direction as to how I can go about meeting the requirements, and whether my existing ideas are feasible.
I have more than 500 tables in SQL Server that I want to move to Dynamics 365. I am using SSIS so far. The problem with SSIS is that the destination entity in Dynamics CRM has to be specified along with its mappings, so it would be foolish to create separate data flows for hundreds of SQL Server table sources. Is there any better way to accomplish this?
I am new to SSIS. I don't feel this is the correct approach; I am just simulating the import/export wizard of SQL Server. Please let me know if there are better ways.
It's amazing how often this gets asked!
SSIS cannot have dynamic data flows, because the buffer size (i.e. the pipeline layout) is calculated at design time (as opposed to execution time).
The only way you can re-use a data flow is if all the source-to-target mappings are the same, e.g. if you have two tables with exactly the same DDL structure.
One option (horrible, IMO) is to concatenate all columns into a massive pipe-separated VARCHAR, write this to a generic two-column staging table in the destination, e.g. (table_name, column_dump), and then "unpack" it in your target system via a post-load SQL statement.
I'd bite the bullet, put on your headphones and start churning out the SSIS data flows one by one - you'd be surprised how quickly you can bang them out!
ETL works that way: you have to map source, destination and columns. If you want that to be dynamic, it is possible with an Execute SQL Task inside a Foreach Loop container.
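As a rough sketch of that pattern, the Execute SQL Task could build and run one statement per table from a mapping table (the table and column names below are hypothetical, and names are assumed to be stored pre-validated):

DECLARE @sql NVARCHAR(MAX);

-- One row per table to copy; maintained by you, iterated by the Foreach Loop
SELECT @sql = N'INSERT INTO ' + dest_table
            + N' (' + dest_columns + N') '
            + N'SELECT ' + src_columns
            + N' FROM ' + src_table + N';'
FROM dbo.EtlTableMapping
WHERE mapping_id = ?;   -- ? is the parameter supplied by the Foreach Loop (OLE DB syntax)

EXEC sp_executesql @sql;

New source tables then only require a new row in the mapping table, not a package change.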
But when you are using the KingswaySoft CRM destination connector, this is a little tricky (it may or may not be possible), as it needs a very specific column mapping between source and destination.
Especially when the source schema comes from OLE DB, it is better to have a separate Data Flow Task for each table.
I'm currently trying to build a data flow in SSIS to select all records from a mapping table where an ID column exists in the related Item table. There are two complications:
The two tables are currently in different databases on different servers.
The databases are in Azure, for which I've read Linked Servers are not supported.
To be clearer: the job is to migrate data from a Staging environment to Production. I only want to push lookup records into prod if the associated Item IDs are there. Here's some pseudo-T-SQL to give a clear picture of what I'm trying to achieve:
SELECT *
FROM [Staging_Server].[SourceDB].[dbo].[Lookup] L
WHERE L.[ID] IN (
    SELECT P.[Item]
    FROM [Production_Server].[TargetDB].[dbo].[Item] P
)
I haven't found a good way to create this in SSIS. I've built a work-around that involves sorting both tables and performing a merge join, but sorting both sides is an unnecessary performance hit. I'm looking for a more direct and intuitive design for this seemingly simple data flow.
Doing this in a data flow, you'd have your source query, sans filter, fed into a Lookup component, which performs the role of the subquery.
The challenge with this is that SSIS is likely on-premises, so you are going to pull all of your data out of the staging Azure instance to the server running SSIS and push it back to the production Azure instance.
That's a lot of network activity. As I read the Azure pricing guide, as long as you have the appropriate DTUs you'd be fine. Back in the day you were charged for reads and not writes, so the idiom was to push all your data to the target server and then do the comparison there, much as ElendaDBA mentions. The only suggestion I'd make on the implementation is to avoid temporary tables, or the ad-hoc creation and destruction of them: implement a physical staging table instead, and truncate and reload it prior to transmission to production, as sketched below.
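A minimal sketch of that pattern on the production server (all object names here are hypothetical):

-- 1. Reset the permanent staging table (no ad-hoc temp tables)
TRUNCATE TABLE dbo.Lookup_Staging;

-- 2. SSIS (or any loader) bulk-loads the unfiltered staging rows here ...

-- 3. Then a single set-based statement applies only the rows whose Item exists
INSERT INTO dbo.Lookup (ID, Value)
SELECT s.ID, s.Value
FROM dbo.Lookup_Staging AS s
WHERE s.ID IN (SELECT p.Item FROM dbo.Item AS p);

The filter then runs entirely inside the production database, with no second round-trip through SSIS.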
You could create a temp table on the staging server to copy the production data into. Then you could create a query joining those two tables. After the SSIS package runs, you could delete the temp table on the staging server.
I have an Oracle database and a SQL Server database. There is one table, say Inventory, which contains millions of rows in both databases and keeps growing.
I want to compare the Oracle table data with the SQL Server data to find out, on a daily basis, which records are missing from the SQL Server table.
Which is the best approach for this?
Create an SSIS package.
Create a Windows service.
I want to achieve this with as few resources and as little time as possible.
E.g. 18 million records in Oracle and 16-17 million in SQL Server.
This situation of two different databases arises because there are two different applications, one online and one offline.
EDIT: How about connecting to SQL Server from Oracle through Oracle Database Gateway for SQL Server, in order to:
1) Query SQL Server directly from Oracle to fix up the missing records in SQL Server the first time.
2) Create a trigger on Oracle which fires when a record is deleted from Oracle and inserts the deleted record into a new Oracle table (see the sketch below).
3) Create an SSIS package to map the newly created Oracle table to SQL Server and update the SQL Server records. This way only a few records have to be processed daily through SSIS.
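For step 2, I imagine the trigger would look roughly like this (a sketch; the audit table and column names are placeholders):

CREATE OR REPLACE TRIGGER trg_inventory_delete
AFTER DELETE ON inventory
FOR EACH ROW
BEGIN
  -- Record each deleted row so SSIS can replay it against SQL Server later
  INSERT INTO inventory_deleted_log (inventory_id, deleted_at)
  VALUES (:OLD.id, SYSDATE);
END;
/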
What do you think of this approach ?
I would create an SSIS package and load the data from the Oracle table using a Data Flow with an OLE DB Source. If you have SQL Server Enterprise, the Attunity connectors are a bit faster.
Then I would load the keys from the SQL Server table into a Lookup transformation, match the two sources on the key, and direct unmatched rows into a separate output.
Finally I would direct the unmatched-rows output to an OLE DB Command to update the SQL Server table.
This SSIS package will require a lot of memory, but as the matching is done in memory with minimal IO, it will probably outperform other solutions for speed. It will need enough free memory to cache all the keys from the SQL Server Table.
SSIS also has the advantage that it has lots of other transformation functions available if you need them later.
What you basically want to do is replication from Oracle to SQL Server.
You could do this in SSIS, a Windows service, or indeed a multitude of platforms.
The real trick is using the correct design pattern.
There are two general design patterns:
Snapshot Replication
You take all records from both systems and compare them somewhere (so far we have suggestions to compare in SSIS or to compare on Oracle, but not yet a suggestion to compare on SQL Server, although this is also valid; see the sketch at the end of this answer).
You are comparing 18 million records here, so this is a lot of work.
Differential Replication
You record the changes in the publisher (i.e. Oracle) since the last replication, then you apply those changes to the subscriber (i.e. SQL Server).
You can do this manually by implementing triggers and log tables on the Oracle side, then use a regular ETL process (SSIS, command-line tools, text files, whatever), probably scheduled in SQL Agent, to apply the changes to SQL Server.
Or you could use the out-of-the-box replication capability to set up Oracle as a publisher and SQL Server as a subscriber: https://msdn.microsoft.com/en-us/library/ms151149(v=sql.105).aspx
You're going to have to try a few of these and see what works for you.
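For completeness, comparing on the SQL Server side (once the Oracle keys have been staged there by whatever means) could be as simple as this sketch (table names are hypothetical):

-- Keys present in the staged Oracle snapshot but missing from SQL Server
SELECT o.inventory_id
FROM dbo.Oracle_Inventory_Staging AS o
EXCEPT
SELECT i.inventory_id
FROM dbo.Inventory AS i;

EXCEPT is set-based, so the engine can pick a hash or merge strategy instead of caching all the keys in SSIS memory.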
Given this objective:
I want to achieve this with as few resources and as little time as possible.
transactional replication is far more efficient but complicated. For maintenance purposes, which platforms (.Net, SSIS, Python etc.) are you most comfortable with?
Other alternatives:
If you can use Oracle Gateway for SQL Server, then you do not need to transfer data at all and can run the query directly.
If you can't use Oracle Gateway, you can use Pentaho Data Integration or another ETL tool to compare the tables and get the results. It is easy to use.
I think the best approach is using Oracle Gateway. Just follow the steps below; I have had a similar experience.
Install and Configure Oracle Database Gateway for SQL Server.
https://docs.oracle.com/cd/B28359_01/gateways.111/b31042/installsql.htm
Now you can create a database link from Oracle to SQL Server.
Create a procedure which finds the records missing from the SQL Server database and inserts them.
For example, you can use this statement inside your procedure.
INSERT INTO "dbo"."sql_server_table"#dblink_name("column1","column2"...."column5")
VALUES
(
select column1,column2....column5 from oracle_table
minus
select "column1","column2"...."column5" from "dbo"."sql_server_table"#dblink_name
)
Create a scheduler job which executes the procedure daily (see the sketch below).
When both databases are online, the missing records will be inserted into SQL Server. Otherwise the scheduled job fails, and you can execute the procedure manually.
It takes minimal resources.
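For the scheduling step, something like this DBMS_SCHEDULER call would work on the Oracle side (the job and procedure names are placeholders):

BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    job_name        => 'SYNC_MISSING_TO_SQLSERVER',
    job_type        => 'STORED_PROCEDURE',
    job_action      => 'SYNC_MISSING_RECORDS',   -- the procedure created above
    start_date      => SYSTIMESTAMP,
    repeat_interval => 'FREQ=DAILY; BYHOUR=2',
    enabled         => TRUE);
END;
/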
I would suggest a homemade ETL solution:
Schedule an Oracle job to export the source table data (on a daily basis, according to the application logic) to a plain CSV file.
Schedule a SQL Server job (with an acceptable delay after the first Oracle job) to read the CSV file and import it into an intermediate table inside SQL Server using BULK INSERT (see the sketch after this list).
The last part of the SQL Server job reads the intermediate table data and applies the logic (insert or update the target table). I suggest having another table to store the results of this daily job.
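The import step might look like this (a sketch; the file path, table names and format options are assumptions):

-- Load the day's CSV export into the intermediate table
BULK INSERT dbo.Inventory_Staging
FROM 'D:\etl\inventory_export.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2        -- skip the header row, if the export writes one
);

-- Then apply the missing rows to the target table
INSERT INTO dbo.Inventory (inventory_id, quantity)
SELECT s.inventory_id, s.quantity
FROM dbo.Inventory_Staging AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.Inventory AS i
                  WHERE i.inventory_id = s.inventory_id);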
I am trying to come up with a way to pull the tables out of an Access database, automate the creation of those same tables in a SQL Server 2008 database, and move the data into the new tables. This process will happen on a regular basis, and there may be different tables each time.
I would like to do this totally in SSIS.
C# SQL CLR objects are an option.
The main issue I have been running into is how to get the Access tables' schema and then convert that to a SQL script that I can run via SSIS.
Any ideas?
TIA
J
SSIS cannot adapt to new tables at runtime. (You can change connections, or point a source at a table with a different name but the same schema.) So it's not really easy to do what I think you are asking: upsize an arbitrary set of tables in an Access DB to SQL Server (mirroring their structure, data, naming, etc.) so that you can then write some straight SQL to transform the data into another SQL database, or another part of the same one.
You can access the SSIS object model from C#, build a package (or modify a template package) programmatically, and then execute it. This might offer the best bang for your buck, but the SSIS object model is kind of deep. The SSIS team blog has finally started putting up examples (a year after I had to figure a lot of this out for myself).
There is always the Upsizing Wizard, and I'm sure there are some third-party tools.
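If straight T-SQL is acceptable for the load itself, one way to both create and fill a matching table per Access table is SELECT ... INTO over OPENROWSET. A sketch, assuming the Microsoft.ACE.OLEDB.12.0 provider is installed and ad hoc distributed queries are enabled; the file path and table names are placeholders:

-- Creates dbo.Customers with a schema inferred from the Access table, then loads it
SELECT *
INTO dbo.Customers
FROM OPENROWSET('Microsoft.ACE.OLEDB.12.0',
                'C:\data\source.accdb';'Admin';'',
                'SELECT * FROM Customers');

A package (or the C# approach above) could then loop over the Access table list and run this statement once per table.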