Speeding up SQL transfer over a network - sql-server

I have two SQL Server environments: a data warehouse which collects data and a datamart which people access for a subset of the data, each with its own SQL Server 2016 database. I run a script which pulls out data, transforms it and transfers it from the data warehouse to the datamart using linked servers. The entire process takes around 60+ hours to run. I want to avoid at all costs having the data warehouse data in the datamart.
I experimented to see why the whole process was taking so long. I took a backup of the data warehouse, restored it onto the datamart and ran the import script, and the entire process took around 3 hours to run. The script itself took 1.5 hours, telling me that of the 60+ hours it's the linked server transfer of data between the two servers that is the slowest part. I've pretty much ruled out network speed or issues between the two servers; this is all SQL. I'm trying to avoid having to write an application to do all of this in .NET if I can keep it in SQL Server.
Does anyone have any suggestions on how to improve performance time between SQL Server transfers?

The slowness could be coming from the destination database.
Try disabling triggers, indexes, locks, etc. on the destination for the duration of the load.
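For example, a rough sketch of what that could look like on the destination table before and after the load (the table, trigger and index names here are placeholders):

```sql
-- Before the load: switch off what makes each inserted row expensive
ALTER TABLE dbo.FactSales DISABLE TRIGGER ALL;           -- skip trigger logic during the bulk load
ALTER INDEX IX_FactSales_Date ON dbo.FactSales DISABLE;  -- nonclustered indexes only; leave the clustered index alone

-- ... run the linked server transfer / import script here ...

-- After the load: put everything back
ALTER INDEX IX_FactSales_Date ON dbo.FactSales REBUILD;
ALTER TABLE dbo.FactSales ENABLE TRIGGER ALL;
```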
This link may help too:
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/e04a8e21-54a9-46a4-8eb2-67da291dc7e1/slow-data-transfer-through-linked-server?forum=transactsql

Related

Alteryx - bulk copy from SQL Server to Greenplum - need tips to increase performance

Need advice here: using Alteryx Designer, I'm pulling a large dataset from SQL Server (10M rows) and need to move it into a Greenplum DB.
I tried connecting using both Input Data (SQL Server) and Output Data (GP), and also Connect In-DB (SQL Server) and Write Data In-DB (GP).
Either approach takes forever to complete, to the point that I have to cancel the process (to give an idea, over the weekend it ran for 18 hours and advanced no further than 1%).
Any good advice or tricks to speed up this sort of massive bulk data load would be very highly appreciated!
I can control or make modifications on the SQL Server and Alteryx side to increase performance, but not in Greenplum.
Thanks in advance.
Regards,
Erick
I'll break down the approaches that you're taking.
You won't be able to use the In-DB tools as the databases are different, hence you can't push the processing onto the DB...
Using the standard Alteryx tools, you are bringing the whole table onto your machine and then pushing it out again; there are multiple ways this could be done depending on where your blockage is.
Looking first at the extract from SQL, 10M rows isn't that much and so you could split the process and write it as a yxdb. If that fails or takes several hours, then you will need to look at the connection to the SQL Server or the resources available on the SQL Server.
Then for the push into Greenplum, there is no PostgreSQL bulk loader at present, so you can either just try to write the whole table, or you can write segments of the table into temp tables in Greenplum and then execute a command to combine those tables (see the sketch below).
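A rough sketch of what that combine step could look like on the Greenplum side, assuming the segments were staged as stage_orders_1 .. stage_orders_3 (all names here are placeholders):

```sql
-- Combine the staged segments into the final table in one pass
INSERT INTO orders
SELECT * FROM stage_orders_1
UNION ALL
SELECT * FROM stage_orders_2
UNION ALL
SELECT * FROM stage_orders_3;

-- Clean up the staging tables afterwards
DROP TABLE stage_orders_1, stage_orders_2, stage_orders_3;
```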
We are pulling millions of rows daily from SQL Server to Greenplum and we use an open source tool called Outsourcer. It's a great tool and takes care of cleansing and more. We have been using this tool for the past 3.5 years with no issues so far. It takes care of all the parallelism, and millions of rows are loaded within minutes.
It supports incremental and full loads. If you need support, Jon Robert, the owner of Outsourcer, will respond to your email within minutes. Here is the link for the tool:
https://www.pivotalguru.com/

SSIS Transferring Data to an Oracle DB is Extremely Slow

We are transferring data to an Oracle Database from two different sources and it's extremely slow.
Please see the notes below. Any suggestions?
Notes:
We're using the Microsoft OLE DB Provider for Oracle.
One data source is SQL Server and includes about 5M records.
The second data source is Oracle and includes about 700M records.
When trying to transfer the SQL Server data, we broke it up into five "Data Flow Tasks" in the "Control Flow". Each "Data Flow Task" in turn uses an "OLE DB Source" which internally uses a "SQL command" that effectively selects 1M of the 5M records. When we ran this package, the first data flow task ran for about 3 hours and had transferred only about 50,000 records by the time we ended the process.
We had a similar experience with the Oracle data as well.
For some reason saving to an Oracle Destination is extremely slow.
Interestingly, we once transferred the same 700M records from Oracle to SQL Server (so the opposite direction) and it worked as expected, in about 4.5 to 5 hours.
On the Oracle side you can examine v$session to see where the time is being spent (if AWR is licensed on the Oracle instance you can use DBA_HIST_ACTIVE_SESS_HISTORY or v$active_session_history).
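For example, a rough sketch of the kind of ASH query I mean (the one-hour window and the grouping are just an illustration):

```sql
-- Roughly where has database time gone in the last hour?
-- Each ASH row is a one-second sample, so the counts approximate seconds.
SELECT NVL(event, 'ON CPU') AS activity,
       COUNT(*)             AS samples
FROM   v$active_session_history
WHERE  sample_time > SYSDATE - 1/24
GROUP  BY NVL(event, 'ON CPU')
ORDER  BY samples DESC;
```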
I work on Oracle performance problems every day (over 300 production Oracle instances), so I feel qualified to say that I can't give you a specific answer to your question, but I can point you in the right direction.
Typical process mistakes that make inserts slow:
not using array inserts (see the sketch after this list)
connecting to the DB for each insert (sounds strange? believe me, I've seen DataStage and other ETL tools set up this way)
app server/client not on the same local area network as the Oracle instance
indexes on the table(s) being inserted into (especially problematic with bitmapped indexes); each statement requires an index update as well as a table update
redo log files too small on the Oracle instance (driving up redo log file switching)
log_buffer parameter on the DB side too small
not enough DB writers (see the db_writer_processes initialization parameter)
committing too often
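Here's a minimal PL/SQL sketch of an array (bulk) insert; target_table and its single id column are made up for illustration, and the same idea applies to array/batch binding in whatever ETL tool or driver is doing the inserts:

```sql
DECLARE
  TYPE t_ids IS TABLE OF NUMBER INDEX BY PLS_INTEGER;
  l_ids t_ids;
BEGIN
  -- build up a batch of values in memory
  FOR i IN 1 .. 10000 LOOP
    l_ids(i) := i;
  END LOOP;

  -- one bulk execution for 10,000 rows instead of 10,000 single-row inserts
  FORALL i IN 1 .. l_ids.COUNT
    INSERT INTO target_table (id) VALUES (l_ids(i));

  COMMIT;  -- and commit once at the end, not per row
END;
/
```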
Not an answer, just a bunch of observations and questions...
Any one of the components in the data pipeline could be the bottleneck.
You first need to observe the row counts when running interactively in SSIS and see if there is any obvious clogging going on - i.e. do you have a large rowcount right before your Data conversion transformation and a low one after? Or is it at the Oracle destination? Or is it just taking a long time to come out of SQL? A quick way to check the SQL side is to dump it to a local file instead - that mostly measures the SQL select performance without any blocking from Oracle.
When you run your source query in SQL Server, how long does it take to return all rows?
Your data conversion transformations can be performed in the source query. Every transformation requires setting up buffers, memory etc. and can slow down and block your data flow. Avoid these and do the conversions in the source query instead.
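For example, a hypothetical source query that does the conversions up front (the table and column names are invented):

```sql
-- Conversions done in the SELECT, so no Data Conversion transformation is needed in the data flow
SELECT CAST(OrderId AS varchar(20))     AS OrderId,
       CONVERT(datetime2(0), OrderDate) AS OrderDate,
       CAST(Amount AS numeric(18, 2))   AS Amount
FROM   dbo.Orders;
```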
There are various buffers and config settings in the Oracle driver, already addressed in detail by #RogerCornejo. For read performance out of Oracle, I have found that altering FetchBufferSize made a huge difference, but you are doing writes here so that's not the case.
Lastly, where are the two database servers and the SSIS client tool situated network-wise? If you are running this across three different servers then you have network throughput to consider.
If you use a linked server as suggested, note that SSIS doesn't do any processing at all, so you take that whole piece out of the equation.
And if you're just looking for the fastest way to transfer data, you might find that dumping to a file and bulk inserting is the fastest.
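A very rough sketch of that route, assuming SQL Server out via bcp and Oracle in via SQL*Loader (the server, path, table and column names are all placeholders):

```
REM Export from SQL Server to a pipe-delimited flat file with bcp
bcp "SELECT OrderId, Amount FROM SourceDb.dbo.Orders" queryout C:\dump\orders.dat -S SourceServer -T -c -t "|"

REM Then bulk load the file into Oracle with SQL*Loader and a control file along the lines of:
REM   LOAD DATA
REM   INFILE 'orders.dat'
REM   APPEND INTO TABLE orders
REM   FIELDS TERMINATED BY '|'
REM   (order_id, amount)
REM e.g.  sqlldr userid=app_user control=orders.ctl direct=true
```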
Thank you all for your suggestions. For those who may run into a similar problem in the future, I'm posting what finally worked for me. The answer was ... switching the provider. The ODBC or Attunity providers were much faster, by a factor of almost 800X.
Remember that my goal was to move data from a SQL Server Database to an Oracle database. I originally used an OLE DB provider for both the source and destination. This provider works fine if you are moving data from SQL Server to SQL Server because it allows you to use the "Fast Load" option on the destination which in turn allows you to use batch processing.
However, the OLE DB provider doesn't allow the "Fast Load" option with an Oracle DB as the destination (I couldn't get it to work and read elsewhere that it doesn't work). Because I couldn't use the "Fast Load" option I couldn't batch, and instead was inserting records row by row, which was extremely slow.
A colleague suggested trying ODBC and others suggested trying Microsoft's Attunity Connectors for Oracle. I didn't think the difference would be so great because in my experience ODBC had similar (and sometimes worse) performance than OLE DB (I hadn't tried Attunity). BUT... that was when moving data from and to a SQL Server database, or staying in the Microsoft world.
When moving data from a SQL Server database to an Oracle database, there was a huge difference! Both ODBC and Attunity outperformed OLE DB dramatically.
Here are my summarized performance test results for inserting 5.4M records from a SQL Server database to an Oracle database.
When doing all the work on one local computer.
OLE DB source and destination inserted 12 thousand records per minute which would have taken approx. 7 hours to complete.
ODBC source and destination inserted 9 Million records per minute which only took approx. 30 seconds to complete.
When moving data from one network/remote computer to another network/remote computer.
OLE DB source and destination inserted 115 records per minute which would have taken approx. 32 days to complete.
ODBC source and destination inserted 1 Million records per minute which only took approx. 5 minutes to complete.
Big difference!
Now, why it only took 30 seconds when working locally but 5 minutes remotely is another issue for another day, but for now I have something workable (it should be slower over the network, but I'm surprised it's that much slower).
Thanks again to everyone!
Extra notes:
My OLE DB results were similar with either Microsoft's or Oracle's OLE DB providers for Oracle databases.
Attunity was a little faster than ODBC. I didn't get to test on remote servers or on a larger data set, but locally it was consistently about 2 to 3 seconds faster than ODBC. Those seconds could add up on a large data set, so take note.

T-SQL: advice on copying data across to another database

I need advice on copying daily data to another server.
Just to give you a picture of the situation, I will explain a little. There are workstations posting transactions to 2 database servers (DB1 and DB2). These DB servers are hosted on 2 separate physical servers and are linked. Daily transactions are 50,000 for now but will increase soon. There might be days when some workstations are down (operational but unable to post data) and their transactions are posted a few days later.
So, what I do is run a query across those 2 linked servers. The daily query output contains ~50,000 records with a minimum 15 minutes fetching time, as linked servers have performance problems. I will create a stored procedure and schedule it to run at 2 AM.
My concern starts from here: the output will be copied across to another data warehouse (DW). This is our client's territory, which I do not know much about. This DW will be linked to these DB servers to make it possible to send the data (produced by my stored procedure) across.
Now, what would you do to copy the data across:
Create a dummy (staging) table on DB1 to hold the stored procedure output on the same server, so it is available and we do not need to rerun the stored procedure. The client then retrieves it later.
Use a "select into" statement to copy the content to the remote DW table (rough sketch of options 1 and 2 below). I do not know what happens with this one while fetching and sending the data across to the DW. Remember it takes ~15 minutes for my stored procedure to fetch the data.
Post the data (retrieved by the stored procedure) as an XML file through FTP.
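Something like this is what I have in mind for options 1 and 2 (all object names are placeholders; note that SELECT INTO cannot create a table on a linked server, so the push across would be an INSERT ... SELECT):

```sql
-- Option 1: stage the stored procedure output locally on DB1
INSERT INTO dbo.DailyExtract_Staging (TransId, PostedAt, Amount)
EXEC dbo.usp_GetDailyTransactions;

-- Option 2: push the staged rows across to the linked DW server
INSERT INTO [DW].[ClientDb].dbo.DailyExtract (TransId, PostedAt, Amount)
SELECT TransId, PostedAt, Amount
FROM   dbo.DailyExtract_Staging;
```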
Please tell me if there is a way of setting an alert or notification on jobs.
I just want to take precautions so it will be easier to track when something goes wrong.
Any advice is appreciated very much. Thank you. Oz.
When it comes to copying data in SQL Server you need to look at high availability solutions; depending on the version and edition of your SQL Server you will have different options.
http://msdn.microsoft.com/en-us/library/ms190202(v=sql.105).aspx
If you just need to move data for specific tables, you have options like an SSIS job or SQL Server replication.
If you are looking to have all tables in a given database copied to another server, you should use log shipping, which allows you to copy the entire content of the source database to another location. Because this is done at smaller intervals, your load will be distributed over a larger period of time instead of having one large transaction running at once.
Another great alternative is SQL Server replication. This option will capture transactions on the source and push them to the target. This model requires a publisher (source), a distributor (can be the source or another DB) and a subscriber (target).
Also, you can create an SSIS job that runs on a frequent basis and just moves a specified amount of data.

Offloading SQL Report Server processing to a dedicated server - is it worth it?

Not sure if this is a SO or a ServerFault question, so please feel free to move if it's not in the right place:
I have a client with a large database containing a table with around 30-35 million rows running on a SQL 2008 R2 server (the server is pretty high spec: 16 cores, 92 GB RAM, RAID etc). There are other tables this table may join on, but it is the main driver of several reports.
Their SSRS instance/database and the query source database are both running on the same box/SQL instance.
They regularly run ad-hoc reports from this database (which have undergone extensive optimisation), many of which may end up touching a lot of the data in the table. After looking at the report server stats it appears that the data fetch doesn't actually take that long, but a lot of data is returned and report processing takes a fair while: it can take up to 20-30 minutes to process some of the larger reports, which can have tens of thousands of pages (the data fetch in these cases is less than 10 seconds).
(Note: I realise that there is never really a need to run 25,000 pages off but the client insists and won't listen to reason...something about Excel spreadsheets *FACEPALM!*)
At the moment they are concerned about a couple of performance issues that crop up sporadically and the culprit may be the ad-hoc reporting.
We are looking at offloading the report processing anyway, so thought that this would be an ideal opportunity - but before doing so I'm wondering how much relief this will give the SQL server.
If I move the SSRS app and database onto another SQL host and remotely query the data (network conditions should be ideal as this is datacentre based), will I see any performance gains?
This is mainly based on guesswork at this stage but I see the following being the factors that could affect performance:
I/O for moving a shedload of rows from the query source to RS temp DB
CPU load when the report server is crunching all the data
In moving to another host I see these factors being reduced for the SQL server. The new server will be solely responsible for report processing (and should also be high spec), so hopefully there will be no contention when processing reports.
Do I sound like I am on the right track in my assumptions? Is there anything else that I may have missed which could adversely affect performance or improve performance?
Thanks in advance
You should look at transactional replication to send data from the main server to a database on the reporting server. Querying the tables directly over the network will only slow things down even more.

SQL Server Table > MS Access Local Copy?

I'm looking for a little advice.
I have some SQL Server tables I need to move to local Access databases for some local production tasks - once per "job" setup, w/400 jobs this qtr, across a dozen users...
A little background:
I am currently using a DSN-less approach to avoid distribution issues
I can create temporary LINKS to the remote tables and run "make table" queries to populate the local tables, then drop the remote tables. Works as expected.
Performance here in the US is decent - 10-15 seconds for ~40K records. Our India teams are seeing 5-10+ minutes for the same datasets. Their internet connection is decent, not great, and a variable I cannot control.
I am wondering if MS Access is adding some overhead here that can be avoided by a more direct approach: i.e., letting the server do all/most of the heavy lifting vs Access?
I've tinkered with various combinations, with no clear improvement or success:
Parameterized stored procedures from Access
SQL Passthru queries from Access
ADO vs DAO
Any suggestions, or an overall approach to suggest? How about moving data as XML?
Note: I have Access 7, 10, 13 users.
Thanks!
It's not entirely clear but if the MSAccess database performing the dump is local and the SQL Server database is remote, across the internet, you are bound to bump into the physical limitations of the connection.
ODBC drivers are not meant to be used for data access beyond a LAN, there is too much latency.
When Access queries data, it doesn't open a stream; it fetches blocks of it, waits for the data to be downloaded, then requests another batch. This is OK on a LAN but quickly degrades over long distances, especially when you consider that communication between the US and India probably has around 200 ms of latency and you can't do much about it. That adds up very quickly if the communication protocol is chatty, all on top of the connection's bandwidth, which is very likely way below what you would get on a LAN.
The better solution would be to perform the dump locally and then transmit the resulting Access file after it has been compacted and maybe zipped (using 7z for instance for better compression). This would most likely result in very small files that would be easy to move around in a few seconds.
The process could easily be automated. The easiest option is maybe to automatically perform this dump every day and make it available on an FTP server or an internal website, ready for download.
You can also make it available on demand, maybe through an app running on a server and made available through RemoteApp using RDP services on a Windows 2008 server, or simply through a website or a shell.
You could also have a simple Windows service on your SQL Server machine that listens for requests from a remote client installed on the local machines everywhere; it would process the dump and send it to the client, which would then unpack it and replace the previously downloaded database.
Plenty of solutions for this, even though they would probably require some amount of work to automate reliably.
One final note: if you automate the data dump from SQL Server to Access, avoid using Access in an automated way. It's hard to debug and quite easy to break. Use an export tool instead that doesn't rely on having Access installed.
Renaud and all, thanks for taking the time to provide your responses. As you note, performance across the internet is the bottleneck. The fetching of blocks of data (vs a contiguous download) is exactly what I was hoping to avoid via an alternate approach.
Our workflow is evolving to better leverage both sides of the clock, where User1 in the US completes their day's efforts in the local DB and then sends JUST their updates back to the server (based on timestamps). User2 in India, who also has a local copy of the same DB, grabs just the updated records off the server at the start of his day. So, pretty efficient for day-to-day stuff.
The primary issue is the initial download of the local DB tables from the server (huge multi-year DB) for the current "job" - it should happen just once at the start of the effort (~1 week long process). This is the piece that takes 5-10 minutes for India to accomplish.
We currently do move the DB back and forth via FTP - DAILY. It is used as a SINGLE shared DB and is a bit LARGE due to temp tables. I was hoping my new timestamp-based push-pull of just the daily changes would be an overall plus. It seems to be, but the initial download hurdle remains.
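For reference, the timestamp-based pull is roughly along these lines (the table, column and literal date are placeholders; @LastSync would be stored locally per user and updated after each pull):

```sql
-- Pull only the rows changed since this user's last sync
DECLARE @LastSync datetime2 = '2016-01-01T00:00:00';

SELECT j.*
FROM   dbo.JobRecords AS j
WHERE  j.LastModified > @LastSync;
```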
