Snowflake to SQL Server data load performance - sql-server

I am trying to improve the performance of a SQL query used to move data from a Snowflake view to a SQL Server table. The query uses a linked server, created with an ODBC driver, to connect to the Snowflake view from SQL Server via an OPENQUERY statement. Currently it takes around 2 hours to move 50 million rows. Please suggest anything that can be done to improve performance.
Sample query we are using:
SELECT *
INTO #temp
FROM OPENQUERY (SnowflakeServer, 'select * from "SnowflakeDB"."SnowflakeSchema"."mytable"')

This is not based on anything concrete, but here are a few ideas:
Importing into a table with a columnstore index
Doing the query using PolyBase
Exporting your query to CSV and loading using BULK INSERT
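The columnstore idea could look roughly like this on SQL Server 2016 or later (a sketch only; the target table and its columns are illustrative, and since SELECT ... INTO cannot create a columnstore table, the table is created up front):
```sql
-- Sketch: target table and column names are illustrative
CREATE TABLE dbo.SnowflakeStage
(
    Id        BIGINT        NOT NULL,
    SomeValue NVARCHAR(100) NULL
    -- remaining columns to match the Snowflake view
);
CREATE CLUSTERED COLUMNSTORE INDEX CCI_SnowflakeStage ON dbo.SnowflakeStage;

-- TABLOCK allows a parallel insert into the columnstore target
INSERT INTO dbo.SnowflakeStage WITH (TABLOCK)
SELECT *
FROM OPENQUERY(SnowflakeServer,
    'select * from "SnowflakeDB"."SnowflakeSchema"."mytable"');
```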

Related

SSIS, query Oracle table using ID's from SQL Server?

Here's the basic idea of what I want to do in SSIS:
I have a large query against a production Oracle database, and I need a where clause that brings in a long list of ids from SQL Server. From there, the results are sent elsewhere.
select ...
from Oracle_table(s) --multi-join
where id in ([select distinct id from SQL_SERVER_table])
Alternatively, I could write the query this way:
select ...
from Oracle_table(s) --multi-join
...
join SQL_SERVER_table sst on sst.ID = Oracle_table.ID
Here are my limitations:
The Oracle query is large and cannot be run without the where id in (...) clause
This means I cannot run the Oracle query and then join it against the ids in a separate step. I tried this, and the DBAs killed the temp table after it grew to 3 TB.
I have 160k ids
This means it is not practical to iterate through the ids one by one. In the past, I have run against ~1000 IDs using a comma-separated list; it runs relatively fast - a few minutes.
The main query is in Oracle, but the ids are in SQL Server
I do not have the ability to write to Oracle
I've found many questions like this.
None of the answers I have found have a solution to my limitations.
Similar question:
Query a database based on result of query from another database
To prevent loading all rows from the Oracle table, the only way is to apply the filter in the Oracle database engine. I don't think this can be achieved using SSIS, since you have more than 160,000 ids in the SQL Server table, which cannot be efficiently loaded and passed to the Oracle SQL command:
Using Lookups and Merge Join would require loading all data from the Oracle database
Retrieving the ids from SQL Server, building a comma-separated string, and passing it to the Oracle SQL command is not feasible with that many IDs (160K)
The same issue applies when using a Script Task
Creating a linked server in SQL Server and joining both tables would load all data from the Oracle database
To solve your problem, you should search for a way to create a link to the SQL Server database from the Oracle engine.
Oracle Heterogenous Services
I don't have much experience with Oracle databases. Still, after some research, I found that Oracle has an equivalent to SQL Server "Linked Servers" called "heterogeneous connectivity".
The query syntax should look like this:
select *
from Oracle_table
where id in (select distinct id from SQL_SERVER_table#sqlserverdsn)
You can refer to the following step-by-step guides to read more on how to connect to SQL Server tables from Oracle:
What is Oracle equivalent for Linked Server and can you join with SQL Server?
Making a Connection from Oracle to SQL Server - 1
Making a Connection from Oracle to SQL Server - 2
Heterogeneous Database connections - Oracle to SQL Server
Importing Data from SQL Server to a staging table in Oracle
Another approach is to use a Data Flow Task that imports IDs from SQL Server to a staging table in Oracle. Then use the staging table in your Oracle query. It would be better to create an index on the staging table. (If you do not have permission to write to the Oracle database, try to get permission to a separate staging database.)
Example of exporting data from SQL Server to Oracle:
Export SQL Server Data to Oracle using SSIS
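In outline, the Oracle side of the staging-table approach could look like this (a sketch; table and column names are illustrative and assume the ids are numeric):
```sql
-- Oracle side, sketch only: staging table for the ids exported from SQL Server
CREATE TABLE stage_ids (id NUMBER NOT NULL);
CREATE INDEX ix_stage_ids ON stage_ids (id);

-- after the SSIS Data Flow Task has loaded the ids:
select ...
from Oracle_table(s) --multi-join
where id in (select id from stage_ids)
```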
Minimizing the data load from the Oracle table
If none of the solutions above solves your issue, you can try to minimize the data loaded from the Oracle database as much as possible.
For example, you can get the minimum and maximum IDs from the SQL Server table and store both values in two variables. Then you can use both variables in the SQL command that loads the data from the Oracle table, like the following:
SELECT * FROM Oracle_Table WHERE ID >= #MinID and ID <= #MaxID
This filters out a chunk of irrelevant rows up front. If your ID column is a string, you can filter on other measures instead, such as string length or the first character.
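The two variables could be populated with a single query on the SQL Server side, for example via an Execute SQL Task with a single-row result set (table name as in the examples above):
```sql
-- Sketch: compute the bounds once, then map MinID/MaxID
-- to the SSIS variables used in the Oracle source query
SELECT MIN(id) AS MinID, MAX(id) AS MaxID
FROM SQL_SERVER_table;
```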

Alternate to SQL Server Linked Server

I am trying to build a program that compares 2 database servers that have the same tables, although some tables have additional columns. I am using a linked server to connect the 2 servers.
But I found a problem: when I try to compare some data, the connection mostly times out. When I check Activity Monitor and the execution plan, more than 90% of the time is spent in the remote query - comparing 1 record that has 5 child entries runs for 5-7 minutes.
This is a sample query that I try to run.
Select
pol.PO_TXN_ID, pol.Pol_Num
From
ServerA.InstanceA.dbo.POLine pol
Where
not exists (Select 1
From ServerB.InstanceA.dbo.POLine pol2
where pol.PO_TXN_ID = pol2.PO_TXN_ID
and pol.Pol_Num = pol2.Pol_Num)
I tried using OPENROWSET, but our administrator does not permit enabling it on the production server.
Is there any alternative that I can use to optimize my query instead using linked server?
Options:
OpenQuery() / 4 part naming with temp tables.
ETL (eg: SQL Server Integration Services)
The problem with linked servers, especially with 4-part naming as in your example:
The query engine doesn't know how to optimize it, because it can't access statistics on the linked server.
This results in full table scans, pulling all the data to the source SQL Server and then processing it there (high network IO, bad execution plans, long-running queries).
Option 1
Create a temp table (preferably with indexes)
Query the linked server with OPENQUERY and preferably a filter condition. eg:
CREATE TABLE #MyTempTable(Id INT NOT NULL PRIMARY KEY /*, other columns */)
INSERT INTO #MyTempTable(Id /*, other columns */)
SELECT *
FROM OPENQUERY(ServerA, 'SELECT Id /*, other columns */ FROM Table WHERE /*Condition*/')
Use the temp table(s) to do your calculation.
Still needs at least 1 linked server
OPENQUERY has better performance when your database is not a SQL Server (e.g. Postgres, MySql, Oracle,...) as the query is executed on the linked server instead of pulling all the data to the source server.
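Applied to the POLine comparison above, Option 1 could look like this (a sketch; names are taken from the question, the column list is abbreviated, and INT key types are assumed):
```sql
-- Sketch: pull only the key columns from ServerB once, then compare locally
CREATE TABLE #POLineB
(
    PO_TXN_ID INT NOT NULL,
    Pol_Num   INT NOT NULL,
    PRIMARY KEY (PO_TXN_ID, Pol_Num)  -- supports the NOT EXISTS probe
);

INSERT INTO #POLineB (PO_TXN_ID, Pol_Num)
SELECT *
FROM OPENQUERY(ServerB, 'SELECT PO_TXN_ID, Pol_Num FROM InstanceA.dbo.POLine');

SELECT pol.PO_TXN_ID, pol.Pol_Num
FROM ServerA.InstanceA.dbo.POLine pol
WHERE NOT EXISTS (SELECT 1
                  FROM #POLineB pol2
                  WHERE pol.PO_TXN_ID = pol2.PO_TXN_ID
                    AND pol.Pol_Num = pol2.Pol_Num);
```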
Option 2
You can use an ETL tool like SQL Server Integration Services (SSIS)
Load the data from the 2 servers
Use a Slowly changing dimension or lookup component to determine the differences.
Insert/update what you want/need
No linked servers are needed, SSIS can connect to the databases directly

SSIS update DB2 database from SQL Server

I have an SSIS package which imports full table data from DB2 database and loads in SQL Server on a daily basis. At the end of the day I need to post only updates (only some rows of table which are edited) to DB2 from SQL Server.
I have couple of options to do this task
Write a stored procedure using linked servers and update only required rows to DB2
Use a Script Component in SSIS and perform row-by-row updates.
Both of them can work, however I don't want to use either of them. Is there any other better solution apart from these which I can perform in SSIS?
Note : I cannot truncate and load all records from SQL Server to db2 as this is not a feasible solution.
Thank you

Faster way to insert records using sql server and jdbc

I am running a Java application in which thousands of records are to be inserted from one table to another, so I am doing batch inserts of 100 records each. My database is SQL Server 2014 and I am using a JDBC connection. Is there a faster way to insert these thousands of records in less time?
Instead of inserting the rows via JDBC batches, a faster alternative that keeps everything inside SQL Server is INSERT INTO ... SELECT, which selects rows from one table and inserts them into another in a single statement.
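Since both tables are in the same database, the whole move can be one set-based statement issued once over JDBC, with no per-row round-trips (table and column names are illustrative):
```sql
-- Sketch only: copies all qualifying rows in a single statement
INSERT INTO dbo.TargetTable (Id, Payload)
SELECT Id, Payload
FROM dbo.SourceTable
WHERE Payload IS NOT NULL;  -- optional filter, illustrative
```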

SQL Server Linked Server to Progress is slow with an openquery view

We have a SQL Server database with a linked server connecting to a Progress OpenEdge database. We created SQL Server views (for use with SSRS) over some of the OpenEdge tables using code similar to the following:
CREATE VIEW accounts AS SELECT * FROM OPENQUERY(myLinkedServerName,
'SELECT * FROM PUB.accounts')
CREATE VIEW clients AS SELECT * FROM OPENQUERY(myLinkedServerName,
'SELECT * FROM PUB.clients')
For some reason the queries seem to bring back the whole table and then filter on the SQL Server side instead of executing the filter on the Progress side.
Anybody know why or how to remedy the situation?
Thanks
Is it any faster when executed as a native OpenEdge SQL query? (You can use the sqlexp command line tool to run the query from a proenv prompt.)
If it is not then the issue may be that you need to run UPDATE STATISTICS on the database.
http://knowledgebase.progress.com/articles/Article/20992
You may also need to run dbtool to adjust field widths (OpenEdge fields are all variable width and can be over-stuffed -- which gives SQL clients fits.)
http://knowledgebase.progress.com/articles/Article/P24496