Alternate to SQL Server Linked Server

I am trying to build a program that compares two database servers that have the same tables, except that some tables have additional columns. I am using a linked server to connect the two servers.
The problem is that when I try to compare some data, the connection usually times out. When I check Activity Monitor and the execution plan, more than 90% of the cost is in the remote query, which makes comparing one record with 5 child entries take 5-7 minutes.
This is a sample query that I am trying to run:
SELECT
    pol.PO_TXN_ID, pol.Pol_Num
FROM
    ServerA.InstanceA.dbo.POLine pol
WHERE
    NOT EXISTS (SELECT 1
                FROM ServerB.InstanceA.dbo.POLine pol2
                WHERE pol.PO_TXN_ID = pol2.PO_TXN_ID
                  AND pol.Pol_Num = pol2.Pol_Num)
I tried using OPENROWSET, but our administrator does not permit enabling it on the production server.
Is there any alternative I can use to optimize my query instead of using a linked server?

Options:
OPENQUERY() / four-part naming with temp tables.
ETL (e.g. SQL Server Integration Services)
The problem with linked servers, especially with four-part naming as in your example, is that the query engine doesn't know how to optimize it: it can't access statistics on the linked server.
The result is full table scans, pulling all the data to the local SQL Server and then processing it (high network I/O, bad execution plans, long-running queries).
Option 1
Create a temp table (preferably with indexes)
Query the linked server with OPENQUERY, preferably with a filter condition, e.g.:
CREATE TABLE #MyTempTable (Id INT NOT NULL PRIMARY KEY /*, other columns*/)
INSERT INTO #MyTempTable (Id /*, other columns*/)
SELECT *
FROM OPENQUERY(ServerA, 'SELECT Id /*, other columns*/ FROM Table WHERE /*condition*/')
Use the temp table(s) to do your calculation.
This still needs at least one linked server.
OPENQUERY also performs better when the remote database is not SQL Server (e.g. Postgres, MySQL, Oracle, ...), because the inner query is executed on the linked server instead of all the data being pulled to the local server.
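Applied to the question above, the pattern might look like the following sketch. It assumes the query runs on ServerA (so that table is local), that ServerB is the linked server, and that the key columns are integers; adjust to your schema:
-- Pull only the key columns across the wire, once.
CREATE TABLE #POLineB
(
    PO_TXN_ID INT NOT NULL,
    Pol_Num   INT NOT NULL,
    PRIMARY KEY (PO_TXN_ID, Pol_Num)
)

INSERT INTO #POLineB (PO_TXN_ID, Pol_Num)
SELECT PO_TXN_ID, Pol_Num
FROM OPENQUERY(ServerB, 'SELECT PO_TXN_ID, Pol_Num FROM InstanceA.dbo.POLine')

-- The comparison now runs entirely against local data.
SELECT pol.PO_TXN_ID, pol.Pol_Num
FROM dbo.POLine pol
WHERE NOT EXISTS (SELECT 1
                  FROM #POLineB pol2
                  WHERE pol2.PO_TXN_ID = pol.PO_TXN_ID
                    AND pol2.Pol_Num = pol.Pol_Num)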
Option 2
You can use an ETL tool like SQL Server Integration Services (SSIS)
Load the data from the two servers.
Use a Slowly Changing Dimension or Lookup component to determine the differences.
Insert/update what you want/need.
No linked servers are needed; SSIS can connect to the databases directly.

Related

SSIS, query Oracle table using ID's from SQL Server?

Here's the basic idea of what I want to do in SSIS:
I have a large query against a production Oracle database, and I need a where clause that brings in a long list of IDs from SQL Server. From there, the results are sent elsewhere.
select ...
from Oracle_table(s) --multi-join
where id in ([select distinct id from SQL_SERVER_table])
Alternatively, I could write the query this way:
select ...
from Oracle_table(s) --multi-join
...
join SQL_SERVER_table sst on sst.ID = Oracle_table.ID
Here are my limitations:
The Oracle query is large and cannot be run without the where id in (...) clause.
This means I cannot run the Oracle query first and then join it against the IDs in another step. I tried this, and the DBAs killed the temp table after it grew to 3 TB.
I have 160k IDs.
This means it is not practical to iterate through the IDs one by one. In the past, I have run against ~1000 IDs using a comma-separated list; it runs relatively fast, in a few minutes.
The main query is in Oracle, but the IDs are in SQL Server.
I do not have the ability to write to Oracle
I've found many questions like this, but none of the answers I have found address my limitations.
Similar question:
Query a database based on result of query from another database
To prevent loading all rows from the Oracle table, the only way is to apply the filter in the Oracle database engine. I don't think this can be achieved using SSIS, since you have more than 160,000 IDs in the SQL Server table, which cannot be efficiently loaded and passed to the Oracle SQL command:
Using Lookups and Merge Join will require loading all data from the Oracle database.
Retrieving the IDs from SQL Server, building a comma-separated string, and passing it to the Oracle SQL command cannot be done with that many IDs (160K).
The same issue applies when using a Script Task.
Creating a linked server in SQL Server and joining both tables will load all data from the Oracle database.
To solve your problem, you should search for a way to create a link to the SQL Server database from the Oracle engine.
Oracle Heterogeneous Services
I don't have much experience with Oracle databases, but after some research I found that Oracle has an equivalent to SQL Server's "Linked Servers" called "heterogeneous connectivity".
The query syntax should look like this:
select *
from Oracle_table
where id in (select distinct id from SQL_SERVER_table@sqlserverdsn)
You can refer to the following step-by-step guides to read more on how to connect to SQL Server tables from Oracle:
What is Oracle equivalent for Linked Server and can you join with SQL Server?
Making a Connection from Oracle to SQL Server - 1
Making a Connection from Oracle to SQL Server - 2
Heterogeneous Database connections - Oracle to SQL Server
Importing Data from SQL Server to a staging table in Oracle
Another approach is to use a Data Flow Task that imports the IDs from SQL Server into a staging table in Oracle, then use the staging table in your Oracle query. It is better to create an index on the staging table. (If you do not have permission to write to the Oracle database, try to get permission to use a separate staging database.)
Example of exporting data from SQL Server to Oracle:
Export SQL Server Data to Oracle using SSIS
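Once the IDs are staged, the Oracle query can filter against the staging table directly; a minimal sketch, where the staging table name is hypothetical:
-- id_staging is loaded daily by the SSIS Data Flow Task;
-- an index on id_staging(id) helps the IN/semi-join.
SELECT t.*
FROM Oracle_table t
WHERE t.id IN (SELECT s.id FROM id_staging s)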
Minimizing the data load from the Oracle table
If none of the solutions above solves your issue, you can try to minimize the data loaded from the Oracle database as much as possible.
As an example, you can get the minimum and maximum IDs from the SQL Server table and store the two values in two variables. Then use both variables in the SQL command that loads the data from the Oracle table, like the following:
SELECT * FROM Oracle_Table WHERE ID > @MinID AND ID < @MaxID
This will remove a bunch of useless data from the operation. If your ID column is a string, you can use other measures to filter the data, such as the string length or the first character.

Loading data from oracle to SQL in multiple tables simultaneously using SSIS

I am creating a clinical data warehouse and testing different scenarios. I am loading the tables below from an Oracle DB (Attunity connector) to a SQL DB (OLE DB destination):
Table1: 1.2 GB (3 million rows)
Table2: 20 GB (200 million rows)
Table3: 100 GB (250 million rows)
Table4: 25 GB (60 million rows)
For my initial load I am planning to use SSIS and just select * from TABLE1/TABLE2/TABLE3/TABLE4
Questions:
Is it OK to have multiple data flow tasks, one per table, in one package so that they all run together? I wanted to improve the speed that way, but somehow it is slower than running each one individually.
Also, for loading complete tables, is "select * from table" a good way? It seems pretty slow!
You can have as many parallel data flow tasks executing as the number of processor cores you have minus one. That is, if you are using an octa-core processor, the ideal number of parallel tasks is 7 (8 - 1). Just put them in different sequence containers (not compulsory, but good for readability) and execute.
You can speed up the data load by adjusting several things: set DelayValidation=true, use OPTION (FAST 10000) (or any value; do some trials), and play around with DefaultBufferSize and DefaultBufferMaxRows until you find the right values. Also check that MAXDOP is not set to 1 in the settings if you intend to run parallel DFTs.
And NEVER use SELECT * FROM table_name. List out the column names; * adds additional overhead and can slow down your query considerably.
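As an illustration of the last two points, a source query might look like the sketch below (the column names are hypothetical). Note that OPTION (FAST n) is a T-SQL hint, so it only applies when the source is SQL Server; for the Oracle source, the explicit column list is the part that carries over:
-- Explicit column list instead of SELECT *; the FAST hint asks the engine
-- to return the first rows quickly, so SSIS buffers start filling sooner.
SELECT PatientID, AdmissionDate, DiagnosisCode, ProviderID
FROM dbo.Table1
OPTION (FAST 10000)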
Process 1: Using SSMA
You can use SQL Server Migration Assistant (SSMA) to migrate the data from Oracle to SQL Server databases/schemas/tables.
This is a free tool from Microsoft for database migration.
Microsoft SQL Server Migration Assistant (SSMA) is a tool designed to automate database migration to SQL Server from Microsoft Access, DB2, MySQL, Oracle, and SAP ASE.
Process 2: Using SSIS
You can also use a SQL Server Integration Services (SSIS) package for the migration.
Create an SSIS package with the Import/Export Wizard and run the package from the command line.
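For example, a saved package can be run with the dtexec utility (the package path is a placeholder):
dtexec /F "C:\Packages\OracleToSqlMigration.dtsx"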

Where does an Access query run using linked tables?

I use MS Access 2010 on my PC to link MS SQL tables from our server in the cloud.
When I run a query I wrote in Access on my PC involving the linked tables, does the query retrieve data from the SQL server over the connection, or is there cached data locally on my PC that is used instead?
Simply put: in the case of linked tables in Access, does querying these tables run locally, where the Access database is in use, or on the server?
The query will run locally with ordinary queries and linked tables. This means that Access will need to pull all of the data from the linked tables and filter the rows locally. This can be very bad for performance if your WHERE clause only returns a small percentage of rows.
An alternative would be to use pass-through queries, where the query is executed on the server. This means only the data you need passes over the network, and performance can be far better.
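The SQL of a pass-through query is written in the server's dialect (T-SQL here) and sent to the server untouched, so only the matching rows travel back to Access. A sketch with hypothetical names:
-- Evaluated entirely on SQL Server; Access only receives the result rows.
SELECT OrderID, CustomerID, OrderDate
FROM dbo.Orders
WHERE OrderDate >= '2024-01-01'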

Compare millions of records from Oracle to SQL server

I have an Oracle database and a SQL Server database. There is one table, say Inventory, which contains millions of rows in both databases, and it keeps growing.
I want to compare the Oracle table data with the SQL Server data to find out which records are missing in the SQL Server table, on a daily basis.
Which is the best approach for this?
Create SSIS package.
Create Windows service.
I want this functionality to consume as few resources and take as little time as possible.
E.g.: 18 million records in Oracle and 16-17 million in SQL Server.
This situation of two different databases arises because there are two different applications, one online and one offline.
EDIT: How about connecting to SQL Server from Oracle through the Oracle Gateway for SQL Server, and then:
1) Querying SQL Server directly from Oracle to insert the missing records into SQL Server the first time.
2) Creating a trigger on Oracle which fires when a record is deleted from Oracle and inserts the deleted record into a new Oracle table.
3) Creating an SSIS package to map the newly created Oracle table to SQL Server and update the SQL Server records. This way only a few records have to be processed daily through SSIS.
What do you think of this approach?
I would create an SSIS package and load the data from the Oracle table using a Data Flow with an OLE DB Source. If you have SQL Server Enterprise, the Attunity connectors are a bit faster.
Then I would load the keys from the SQL Server table into a Lookup transformation, match the two sources on the key, and direct unmatched rows into a separate output.
Finally I would direct the unmatched-rows output to an OLE DB Command to update the SQL Server table.
This SSIS package will require a lot of memory, but as the matching is done in memory with minimal IO, it will probably outperform other solutions for speed. It will need enough free memory to cache all the keys from the SQL Server Table.
SSIS also has the advantage that it has lots of other transformation functions available if you need them later.
What you basically want to do is replication from Oracle to SQL Server.
You could do this in SSIS, a Windows service, or indeed on a multitude of platforms.
The real trick is using the correct design pattern.
There are two general design patterns
Snapshot Replication
You take all records from both systems and compare them somewhere (so far we have suggestions to compare in SSIS or on Oracle, but not yet a suggestion to compare on SQL Server, although that is also valid).
You are comparing 18 million records here, so this is a lot of work.
Differential replication
You record the changes in the publisher (i.e. Oracle) since the last replication, then you apply those changes to the subscriber (i.e. SQL Server).
You can do this manually by implementing triggers and log tables on the Oracle side (sketched below), then using a regular ETL process (SSIS, command-line tools, text files, whatever), probably scheduled in SQL Agent, to apply the changes to SQL Server.
Or you could use the out-of-the-box replication capability to set up Oracle as a publisher and SQL Server as a subscriber: https://msdn.microsoft.com/en-us/library/ms151149(v=sql.105).aspx
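A minimal sketch of the manual trigger-and-log-table approach on the Oracle side (table and column names are assumptions):
-- Change log table populated by the trigger below.
CREATE TABLE inventory_change_log (
    id          NUMBER      NOT NULL,
    change_type VARCHAR2(1) NOT NULL,   -- 'I', 'U' or 'D'
    changed_at  DATE        DEFAULT SYSDATE
);

CREATE OR REPLACE TRIGGER trg_inventory_change
AFTER INSERT OR UPDATE OR DELETE ON inventory
FOR EACH ROW
BEGIN
    IF INSERTING THEN
        INSERT INTO inventory_change_log (id, change_type) VALUES (:NEW.id, 'I');
    ELSIF UPDATING THEN
        INSERT INTO inventory_change_log (id, change_type) VALUES (:NEW.id, 'U');
    ELSE
        INSERT INTO inventory_change_log (id, change_type) VALUES (:OLD.id, 'D');
    END IF;
END;
/
The ETL job then reads inventory_change_log since the last run and applies only those rows to SQL Server.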
You're going to have to try a few of these and see what works for you.
Given this objective:
I want to consume less resource to achieve this functionality which takes less time and less resource
transactional replication is far more efficient but more complicated. For maintenance purposes, which platforms (.Net, SSIS, Python, etc.) are you most comfortable with?
Other alternatives:
If you can use the Oracle gateway for SQL Server, then you do not need to transfer data and can run the query directly.
If you can't use the Oracle gateway, you can use Pentaho Data Integration or another ETL tool to compare the tables and get the results. It is easy to use.
I think the best approach is using the Oracle gateway. Just follow the steps below; I have had experience with a similar setup.
Install and Configure Oracle Database Gateway for SQL Server.
https://docs.oracle.com/cd/B28359_01/gateways.111/b31042/installsql.htm
Now you can create a dblink from Oracle to SQL Server.
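The link creation might look like this (the login, password, and gateway TNS alias are placeholders):
-- Oracle: create a database link that routes through the SQL Server gateway.
CREATE DATABASE LINK dblink_name
    CONNECT TO "sql_login" IDENTIFIED BY "password"
    USING 'dg4msql_tns_alias';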
Create a procedure which finds the records missing from the SQL Server database and inserts them into it.
For example, you can use a statement like this inside your procedure:
INSERT INTO "dbo"."sql_server_table"@dblink_name ("column1", "column2", ..., "column5")
SELECT column1, column2, ..., column5 FROM oracle_table
MINUS
SELECT "column1", "column2", ..., "column5" FROM "dbo"."sql_server_table"@dblink_name;
Create a scheduler which executes the procedure daily (a sketch follows below).
When both databases are online, the missing records will be inserted into SQL Server. Otherwise the scheduled job fails, or you can execute the procedure manually.
It takes minimal resources.
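The daily schedule can be created with DBMS_SCHEDULER; a sketch, where the job and procedure names are assumptions:
BEGIN
    DBMS_SCHEDULER.CREATE_JOB(
        job_name        => 'SYNC_MISSING_RECORDS',
        job_type        => 'STORED_PROCEDURE',
        job_action      => 'COMPARE_AND_INSERT_PROC',  -- the procedure above
        start_date      => SYSTIMESTAMP,
        repeat_interval => 'FREQ=DAILY; BYHOUR=2',
        enabled         => TRUE);
END;
/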
I suggest a homemade ETL solution:
Schedule an Oracle job to export the source table data (on a daily basis, according to the application logic) to plain CSV format.
Schedule a SQL Server job (with an acceptable delay after the first Oracle job) to read this CSV file and import it into a staging table inside SQL Server using BULK INSERT (sketched after this list).
The last part of the SQL Server job reads the staging table data and applies the logic (insert/update the target table). I suggest having another table to store reports of this daily job's results.
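A sketch of the BULK INSERT step; the file path, table name, and CSV layout are assumptions:
-- Load the daily CSV export into the staging table.
BULK INSERT dbo.Inventory_Staging
FROM 'D:\etl\inventory_export.csv'
WITH (
    FIELDTERMINATOR = ',',
    ROWTERMINATOR   = '\n',
    FIRSTROW        = 2,   -- skip the header row
    TABLOCK
);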

Inserting results from a very large OPENQUERY without distributed transactions

I am trying to insert rows into a Microsoft SQL Server 2014 table from a query that hits a linked Oracle 11g server. I have read-only access rights on the linked server. I have traditionally used OPENQUERY to do something like the following:
INSERT INTO <TABLE> SELECT * FROM OPENQUERY(LINKED_SERVER, <SQL>)
Over time the SQL queries I run have been getting progressively more complex and recently surpassed the OPENQUERY limit of 8000 characters. The general consensus on the web appears to be to switch to something like the following:
INSERT INTO <TABLE> EXECUTE(<SQL>) AT LINKED_SERVER
However, this seems to require that distributed transactions be enabled on the linked server, which isn't an option for this project. Are there any other possible solutions I am missing?
Can you get your second method to work if you disable the "remote proc transaction promotion" linked server option?
EXEC master.dbo.sp_serveroption
    @server   = 'YourLinkedServerName',
    @optname  = 'remote proc transaction promotion',
    @optvalue = 'false'
If SQL Server Integration Services is installed/available, you could do this with an SSIS package. The SQL Server Import/Export Wizard can automate a lot of the package configuration/setup for you.
Here's a previous question with some useful links on SSIS to Oracle:
Connecting to Oracle Database using Sql Server Integration Services
If you're interested in running it via T-SQL, here's an article on executing SSIS packages from a stored proc:
http://www.databasejournal.com/features/mssql/executing-a-ssis-package-from-stored-procedure-in-sql-server.html
I've been in a similar situation before; what worked for me was to decompose the large query string while still using the query method below. (I did not have the luxury of SSIS.)
FROM OPENQUERY(LINKED_SERVER, <SQL>)
Instead of inserting directly into your table, move your main result set into a local temporary landing table first (this could be a physical or temp table).
Decompose your <SQL> query by moving transformation and business-logic code out of the <SQL> query and into the SQL Server boundary.
If you have joins in your <SQL> query, bring those result sets across to SQL Server as well, then join locally to your main result set.
Finally, perform your insert locally.
Clear your staging area.
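A sketch of that pattern; the linked server name, remote query, and table names are assumptions:
-- 1. Land the slimmed-down remote result set locally.
SELECT *
INTO #OracleLanding
FROM OPENQUERY(LINKED_SERVER, 'SELECT id, amount FROM remote_table WHERE status = ''A''')

-- 2. Join and transform locally, then insert into the real target.
INSERT INTO dbo.TargetTable (id, amount, category)
SELECT l.id, l.amount, r.category
FROM #OracleLanding l
JOIN dbo.LocalReference r ON r.id = l.id

-- 3. Clear the staging area.
DROP TABLE #OracleLanding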
There are various approaches (like wrapping your OPENQUERY calls in views), but I like flexibility and found that reducing the size of my OPENQUERY strings to the minimum, then storing and transforming locally, yielded better results.
Hope this helps.
