T-SQL: advice on copying data across to another database - sql-server

I need advice on copying daily data to another server.
Just to give you a picture of the situation, I will explain a little. There are workstations posting transactions to 2 database servers (DB1 and DB2). These DB servers are hosted on 2 separate physical servers and are linked. Daily transactions are 50,000 for now but will increase soon. There might be days when some workstations are down (operational but unable to post data), so their transactions are posted a few days later.
So, what I do is run a query across those 2 linked servers. The daily query output contains ~50,000 records and takes a minimum of 15 minutes to fetch, as the linked servers have performance problems. I will create a stored procedure and schedule it to run at 2 AM.
My concern starts from here: the output will be copied across to another data warehouse (DW). This is our client's territory, which I do not know much about. This DW will be linked to these DB servers so that the data produced by my stored procedure can be sent across.
Now, what would you do to copy the data across?
Create a dummy (staging) table on DB1 and copy the stored procedure output into it on the same server, so the output is available and we do not need to rerun the stored procedure. The client then retrieves it later.
Use a "select into" statement to copy the content to the remote DW table. I do not know what happens with this one while fetching and sending the data across to the DW; remember it takes ~15 minutes for my stored procedure to fetch the data.
Post the data (retrieved by the stored procedure) as an XML file over FTP.
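To make the first two options concrete, they would look roughly like this (all table, procedure and server names below are made-up placeholders, and the staging table is assumed to already exist with matching columns):

    -- Option 1: keep a local copy of the stored procedure output on DB1,
    -- so the ~15-minute fetch does not have to be repeated if a later step fails.
    INSERT INTO dbo.DailyExtract_Staging (TransactionId, PostedAt, Amount)
    EXEC dbo.usp_GetDailyTransactions @RunDate = '2024-01-01';

    -- Option 2: afterwards, push the staged rows across to the linked DW server.
    INSERT INTO DWSERVER.DWDatabase.dbo.DailyExtract (TransactionId, PostedAt, Amount)
    SELECT TransactionId, PostedAt, Amount
    FROM dbo.DailyExtract_Staging;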
Please tell me if there is a way of setting an alert or notification on jobs.
I just want to take precautions so it will be easier to track when something goes wrong.
Any advice is appreciated very much. Thank you. Oz.

When it comes to copying data in SQL Server you need to look at High Availability solutions; depending on the version and edition of your SQL Server, you will have different options.
http://msdn.microsoft.com/en-us/library/ms190202(v=sql.105).aspx
If you just need to move data for specific tables, you have options like an SSIS job or SQL Server Replication.
If you are looking to have all tables in a given database copied to another server, you should use Log Shipping, which allows you to copy the entire content of the source database to another location. Because this is done at small intervals, the load is distributed over a longer period of time instead of one large transaction running all at once.
Another great alternative is SQL Server Replication. This option captures transactions on the source and pushes them to the target. This model requires a publisher (source), a distributor (can be the source or another DB) and a subscriber (target).
You can also create an SSIS job that runs frequently and moves just a specified amount of data each time.
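On the alerting part of the question: SQL Server Agent jobs can notify an operator when they fail. A minimal sketch, assuming Database Mail is already configured and using a made-up operator and job name:

    -- Create an operator to receive notifications (name and address are placeholders).
    EXEC msdb.dbo.sp_add_operator
        @name = N'DW Load Operator',
        @email_address = N'dba-team@example.com';

    -- Have the nightly job e-mail that operator on failure
    -- (@notify_level_email = 2 means "notify on failure").
    EXEC msdb.dbo.sp_update_job
        @job_name = N'Nightly DW Extract',
        @notify_level_email = 2,
        @notify_email_operator_name = N'DW Load Operator';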

Related

Speeding up SQL transfer over a network

I have two SQL Server environments: a data warehouse which collects data and a datamart which people access for a subset of the data, each with its own SQL Server 2016 databases. I run a script which pulls out data, transforms it and transfers it from the data warehouse to the datamart using linked servers. The entire process takes around 60+ hours to run. I want to avoid at all costs having the full data warehouse data in the datamart.
I experimented to see why the whole process was taking so long. I did a backup of the data warehouse, restored it onto the datamart and ran the import script, and the entire process took around 3 hours. The script itself took 1.5 hours, telling me that, of the 60+ hours, it is the linked server transfer of data between the two servers that is the slowest part. I've pretty much ruled out network speed or issues between the two servers; this is all SQL. I'm trying to avoid having to write an application to do all of this in .NET if I can keep it in SQL Server.
Does anyone have any suggestions on how to improve transfer performance between SQL Servers?
The slowness could be coming from the destination database.
Try disabling triggers, indexes, locks, etc.
This link may help more:
https://social.msdn.microsoft.com/Forums/sqlserver/en-US/e04a8e21-54a9-46a4-8eb2-67da291dc7e1/slow-data-transfer-through-linked-server?forum=transactsql
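A rough sketch of what that looks like around a big load, with placeholder table and index names (only nonclustered indexes should be disabled, and triggers re-enabled afterwards):

    -- Before the load: switch off what slows bulk inserts down on the destination.
    ALTER INDEX IX_Transactions_PostedAt ON dbo.Transactions DISABLE;  -- nonclustered index
    DISABLE TRIGGER ALL ON dbo.Transactions;

    -- ... run the linked-server transfer / bulk load here ...

    -- After the load: re-enable triggers and rebuild the disabled index.
    ENABLE TRIGGER ALL ON dbo.Transactions;
    ALTER INDEX IX_Transactions_PostedAt ON dbo.Transactions REBUILD;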

SSIS ETL - Is it a good practice to have the destination DB pull data from sources directly

I have an ETL package that moves data from a number of source SQL Server DBs to a single destination SQL Server DB. All these DBs are on the same server. The destination DB contains a large number of views that reference the source DBs. E.g. SELECT * FROM SourceDB1.dbo.Transactions.
So the majority of the data goes directly from source DB => destination DB, without passing through the SSIS server. I'm new to SSIS and am wondering whether this is a good thing to do, or whether I should look into changing the process.
Time passes, your company grows. You stand up Server2 and have SourceDBN on there. Now what? Your pattern of SELECT * FROM SourceDB.dbo.Transactions breaks.
SourceDB27's client pays us a lot of money, so they ask us to add a column FooBitsWhatsIt to their Transactions table. Now your SELECT * breaks because you have inconsistent columns across your ecosystem.
Someone writes a big query that takes a while to process - the people in the destination database are negatively impacting the ability of the Source databases to do their regular activity. Had the data been copied over to the destination and not merely referenced, there would be isolation between source and destination activities.
Generally speaking, those costs and risks of referencing the sources directly outweigh the additional development, storage and processing costs of copying the data over.
When I started learning about ETL and data migration using SSIS, I was always told that it is best practice to first move the data into a staging database where you can validate, deduplicate and clean the data, and then move it to the destination DB.
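A minimal T-SQL sketch of that staging pattern (database, table and column names are placeholders; in SSIS this would typically be a data flow plus an Execute SQL task):

    -- 1. Land the raw rows in a staging table first.
    INSERT INTO Staging.dbo.Transactions_Raw (TransactionId, CustomerId, Amount, PostedAt)
    SELECT TransactionId, CustomerId, Amount, PostedAt
    FROM SourceDB1.dbo.Transactions;

    -- 2. Validate and deduplicate in staging, then move only clean rows on.
    WITH Deduped AS
    (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY TransactionId ORDER BY PostedAt DESC) AS rn
        FROM Staging.dbo.Transactions_Raw
    )
    INSERT INTO DestinationDB.dbo.Transactions (TransactionId, CustomerId, Amount, PostedAt)
    SELECT TransactionId, CustomerId, Amount, PostedAt
    FROM Deduped
    WHERE rn = 1
      AND Amount IS NOT NULL;   -- example validation rule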

Applying Delete Operations on Mirror/Parallel DB

THE SETUP
Two databases at different locations:
Local Server (Oracle): used for in-house data entry and processing.
Live Server (Postgres): used as the DB for a public website.
THE SCENARIO
Daily data insertions/updates/deletions are performed on the Local DB throughout the day.
After the end of the day, the entire data of the current day is pushed to the Live DB server using CSV files and SQL merge.
This updates the Live DB server with the latest updates and newly inserted data.
THE PROBLEM
As the Live server is updated by a batch run at the end of the day, the delete operations do not get applied on the Live server.
Because of this, unwanted data also remains on the Live server, causing a discrepancy between the data on the two servers.
How can the delete operations on the Local DB server be applied on the Live server along with the updates and insertions?
P.S. The entire Live DB is to be restructured, so any solution that requires breaking down and restructuring the DB server can also be looked into.
Oracle GoldenGate supports replication from Oracle to PostgreSQL. It would certainly be faster and less error-prone than your manual approach since it is all handled at a much lower level by the database.
If for some reason you don't want to do that, then you are back to triggers that track deletes in a table holding the PKs of the deleted records.
Or you could just switch out the PostgreSQL with Oracle :-)
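A sketch of that trigger-based delete tracking, shown in T-SQL for consistency with the rest of this page (the local server in this question is Oracle, so the real trigger would be written in PL/SQL, but the pattern is the same; table and column names are placeholders):

    -- Table that collects the PKs of deleted rows for the nightly batch to replay.
    CREATE TABLE dbo.Orders_Deleted
    (
        OrderId   INT       NOT NULL,
        DeletedAt DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME()
    );
    GO

    CREATE TRIGGER dbo.trg_Orders_TrackDelete
    ON dbo.Orders
    AFTER DELETE
    AS
    BEGIN
        SET NOCOUNT ON;
        -- Record every deleted key so the end-of-day push can delete the same
        -- rows on the Live server and then clear this tracking table.
        INSERT INTO dbo.Orders_Deleted (OrderId)
        SELECT OrderId FROM deleted;
    END;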

Copy data between two linked servers

I have two MSSQL Server instances and one is in the DMZ, so it has no access to the inside network.
So SERVER1 (on the inside of the firewall) pushes today's data to SERVER2 (in the DMZ).
How do I get better performance when shuffling a large number of rows to tables on SERVER2? Today I do this:
INSERT INTO SERVER2.DB.DBO.TABLE SELECT something from SERVER1Table
It is very slow and time-consuming and, not least, it locks the table for outside users.
The thing is that SERVER2 is a webserver that is a portal for customers to log in and check certain information.
Or am I pretty much pushed into using a pull query instead, so that I need to open up the MSSQL port through the firewall and let the DMZ SERVER2 pull data from SERVER1?
SQL Server Integration Services (SSIS) should be the right tool for the job.
The tool's purpose is to transfer and transform data, so it is really good at this.
You can easily extend your packages and develop simple tasks like the one you mention in minutes.
SSIS will likely take a similar amount of time. You should optimise your architecture. Try adding two more steps to minimise locking:
Copy to a new local table on SERVER1 very quickly, applying any filtering there.
Copy to a similarly named new table on SERVER2.
Copy from the new table on SERVER2 to the final destination.
This way the slowest step runs between two tables that are completely disconnected from the ones users touch.
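A rough T-SQL sketch of those three steps (all object names are placeholders, and the staging tables are assumed not to exist yet or to be emptied between runs):

    -- Step 1 (run on SERVER1): fast local copy, with any filtering applied here.
    SELECT something
    INTO dbo.Portal_Staging
    FROM dbo.SERVER1Table
    WHERE PostedAt >= DATEADD(DAY, -1, GETDATE());   -- placeholder filter

    -- Step 2 (run on SERVER1): the slow cross-server push, but between two
    -- staging tables that no portal user reads or locks.
    INSERT INTO SERVER2.DB.DBO.TABLE_STAGING (something)
    SELECT something FROM dbo.Portal_Staging;

    -- Step 3 (run locally on SERVER2): quick move into the table the portal uses.
    INSERT INTO DB.dbo.[TABLE] (something)
    SELECT something FROM DB.dbo.TABLE_STAGING;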

Client-side Replication for SQL Server?

I'd like to have some degree of fault tolerance / redundancy with my SQL Server Express database. I know that if I upgrade to a pricier version of SQL Server, I can get "Replication" built in. But I'm wondering if anyone has experience in managing replication on the client side. As in, from my application:
Every time I need to create, update or delete records from the database -- issue the statement to all n servers directly from the client side
Every time I need to read, I can do so from one representative server (other schemes seem possible here, too).
It seems like this logic could potentially be added directly to my Linq-To-SQL Data Context.
Any thoughts?
Every time I need to create, update or delete records from the database -- issue the statement to all n servers directly from the client side
Recipe for disaster.
Are you going to have a distributed transaction or just let some of the servers fail? If you have a distributed transaction, what do you do when a server goes offline for a while?
This type of thing can only work if you do it at a server-side data-portal layer where application servers take in your requests and are aware of your database farm. At that point, you're better off just using a higher grade of SQL Server.
I have managed replication from an in-house client. My database model worked in insert-only mode for all transactions, and insert-update for lookup data. Deletes were not allowed.
I had a central table that everything was related to. I added a date-time stamp field to this table, which defaulted to NULL. I took data from this table and all related tables into a staging area, did a BCP out, cleaned up the staging tables on the receiver side, did a BCP in to the staging tables, performed data validation and then inserted the data.
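On the receiving side, the "BCP in to staging tables" step can also be expressed in T-SQL with BULK INSERT; a minimal sketch with a placeholder table name and file path:

    -- Load the exported native-format file into the receiver's staging table.
    BULK INSERT dbo.Staging_CentralTable
    FROM 'D:\transfer\central_table.dat'
    WITH (DATAFILETYPE = 'native', TABLOCK);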
For some basic fault tolerance, you can schedule a regular backup.
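For example (database name and backup path are placeholders), a nightly full backup can be as simple as the following; note that SQL Server Express has no SQL Server Agent, so it would be scheduled through Windows Task Scheduler and sqlcmd:

    -- Full backup to disk, overwriting the previous file and verifying page checksums.
    BACKUP DATABASE [MyAppDb]
    TO DISK = N'D:\Backups\MyAppDb_full.bak'
    WITH INIT, CHECKSUM;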
