Copy data between two linked servers - sql-server

I have two MSSQL Server instances, and one is in a DMZ, so it has no access to the inside network.
So today SERVER1 (on the inside of the firewall) pushes data to SERVER2 (in the DMZ).
How do I get better performance when shuffling a large number of rows to tables on SERVER2? Today I do this:
INSERT INTO SERVER2.DB.DBO.TABLE SELECT something from SERVER1Table
It's very slow and time consuming, and, not least, it locks the table for outside users.
The thing is that SERVER2 is a web server that acts as a portal for customers to log in and check certain information.
Or am I more or less pushed into a pull-based approach, where I open the MSSQL port through the firewall and let SERVER2 in the DMZ pull data from SERVER1?

SQL Server Integration Services (SSIS) should be the right tool for the job...
The tool's purpose is to transfer and transform data, so it is really good at this.
You can easily extend your packages and develop simple tasks like the one you mention in minutes.

SSIS will likely take a similar amount of time. You should optimise your architecture instead. Try adding two more steps to minimise locking (sketched in T-SQL below):
Copy to a new local table on SERVER1 very quickly, applying any filtering.
Copy that table to a similarly named new table on SERVER2.
Copy from the new table on SERVER2 to the final destination.
This way the slowest step runs between two tables that do not affect your users at all.
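A minimal sketch of the three steps, assuming a source table on SERVER1 and the live table SERVER2.DB.DBO.TABLE from the question; the staging table and column names are hypothetical:

-- Step 1: filter into a fresh local staging table on SERVER1 (fast, purely local).
IF OBJECT_ID('dbo.Stage_Table') IS NOT NULL DROP TABLE dbo.Stage_Table;
SELECT col1, col2
INTO dbo.Stage_Table
FROM dbo.SERVER1Table
WHERE LastChanged >= DATEADD(DAY, -1, GETDATE());

-- Step 2: push the staging rows to a (pre-emptied) staging table on SERVER2.
-- This is the slow hop, but it touches nothing the portal users read.
INSERT INTO SERVER2.DB.DBO.Stage_Table (col1, col2)
SELECT col1, col2 FROM dbo.Stage_Table;

-- Step 3: run this on SERVER2 itself (e.g. from a local job) so the final move
-- into the live table is one short, purely local transaction.
INSERT INTO DB.DBO.[TABLE] (col1, col2)
SELECT col1, col2 FROM DB.DBO.Stage_Table;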

Related

T-SQL: advise on copying data across to another database

I need advice on copying daily data to another server.
Just to give you a picture of the situation, I will explain a little. There are workstations posting transactions to 2 database servers (DB1 and DB2). These DB servers are hosted on 2 separate physical servers and are linked. Daily transactions are about 50,000 for now but will increase soon. On some days a few workstations may be down (operational but unable to post data), and their transactions are posted a few days later.
So what I do is run a query against those 2 linked servers. The daily output contains ~50,000 records and takes a minimum of 15 minutes to fetch, as the linked servers have performance problems. I will create a stored procedure and schedule it to run at 2 AM.
My concern starts here: the output will be copied across to a data warehouse (DW). This is our client's territory, which I do not know much about. This DW will be linked to these DB servers to make it possible to send the data (produced by my stored procedure) across.
Now, what would you do to copy the data across:
Create a dummy table on DB1 and copy the stored procedure output into it on the same server, so it stays available and we do not need to rerun the stored procedure. The client then retrieves it later.
Use a SELECT INTO statement to copy the content to a remote DW table. I do not know what happens with this option while fetching and sending the data across to the DW. Remember it takes ~15 minutes for my stored procedure to fetch the data.
Post the data (retrieved by the stored procedure) as an XML file over FTP.
Please tell me if there is a way of setting an alert or notification on jobs.
I just want to take precautions so it will be easier to track when something goes wrong.
Any advice is appreciated very much. Thank you. Oz.
When it comes to copying data in SQL Server you need to look at high-availability solutions; depending on the version and edition of your SQL Server you will have different options.
http://msdn.microsoft.com/en-us/library/ms190202(v=sql.105).aspx
If you just need to move data for specific tables, you have options like an SSIS job or SQL Server replication.
If you are looking to have all tables in a given database copied to another server, you should use log shipping, which allows you to copy the entire content of the source database to another location. Because this is done at small intervals, your load is distributed over a larger period of time instead of one large transaction running at once.
Another great alternative is SQL Server replication. This option captures transactions on the source and pushes them to the target. This model requires a publisher (the source), a distributor (which can be the source or another server) and a subscriber (the target).
You can also create an SSIS job that runs on a frequent basis and just moves a specified amount of data.
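On the question about alerts: SQL Server Agent can e-mail an operator when a job fails, provided Database Mail and a mail profile are already configured. A rough sketch of scheduling the extract procedure at 2 AM with a failure notification; the operator, job and procedure names here are made up:

-- Requires Database Mail to be configured for SQL Server Agent beforehand.
USE msdb;

EXEC dbo.sp_add_operator
    @name = N'DW Loader Operator',
    @email_address = N'dba-team@example.com';

EXEC dbo.sp_add_job
    @job_name = N'Daily DW extract',
    @notify_level_email = 2,                         -- 2 = notify on failure
    @notify_email_operator_name = N'DW Loader Operator';

EXEC dbo.sp_add_jobstep
    @job_name = N'Daily DW extract',
    @step_name = N'Run extract procedure',
    @subsystem = N'TSQL',
    @database_name = N'DB1',
    @command = N'EXEC dbo.usp_DailyExtract;';        -- hypothetical procedure

EXEC dbo.sp_add_jobschedule
    @job_name = N'Daily DW extract',
    @name = N'Every day at 2 AM',
    @freq_type = 4,                                  -- daily
    @freq_interval = 1,
    @active_start_time = 020000;

EXEC dbo.sp_add_jobserver @job_name = N'Daily DW extract';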

Store database in sql server client wise

I have developed an application in which I have created different logins for every client. Our application has many clients, like job portals or Facebook, and every client has a huge amount of data. If I use a single database, then one table gets a huge amount of data for all clients.
I found one solution: create a separate database for every client. But as there are so many clients, we would need to create a great many databases, so that is not the right solution either.
Can you please tell me the right way to implement this using SQL Server 2008 R2?
Thanks
You could try having one schema per client, where that client's login has that schema as its default and it is the only schema they have access to. However you'll have a lot of schemas, so it may not be much help! (Also, if you're using something like EF to access the db it won't work.)
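For example, a minimal sketch of the schema-per-client setup; the login, schema and table names (and the password) are placeholders:

-- One login and one schema per client; the schema is the login's default
-- and, because the client owns it, the only one they can work in.
CREATE LOGIN ClientA_Login WITH PASSWORD = 'UseAStrongPasswordHere!1';
CREATE USER ClientA_User FOR LOGIN ClientA_Login WITH DEFAULT_SCHEMA = ClientA;
GO
CREATE SCHEMA ClientA AUTHORIZATION ClientA_User;
GO
-- The client's tables live in their own schema; nothing is granted on
-- other clients' schemas.
CREATE TABLE ClientA.Orders
(
    OrderID   INT IDENTITY(1,1) PRIMARY KEY,
    OrderDate DATETIME NOT NULL
);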
Single database good:
Easy management
Single database bad:
Possible performance problems (although not until you get into billions of rows; one DB I designed had a table with more than 21B rows after 3 months; lucky I made the IDENTITY column a BigInt!)
Security issues/complexity: how do you stop one client accessing another's data? (see the filtered-view sketch after these lists)
Single point of failure for all clients
Multiple database good:
Security is easier
Single point of failure per client (assuming multiple DB servers to spread that load also)
More flexibility in applying updates: some clients are OK with Wednesday, some with Thursday
I'm sure that there are other issues as well. Really it's up to your requirements and how they can best be met.
Multiple db bad:
More management required
Given a DB has overhead, your overhead resource usage goes up
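On the single-database security point above, one common pattern on SQL Server 2008 R2 is a client key on every row plus a filtered view per base table; clients only ever see the view. A minimal sketch with hypothetical table, role and column names:

-- Every row carries the owning client's login name.
CREATE TABLE dbo.Orders
(
    OrderID     INT IDENTITY(1,1) PRIMARY KEY,
    ClientLogin SYSNAME NOT NULL,
    OrderDate   DATETIME NOT NULL,
    Amount      DECIMAL(18,2) NOT NULL
);
GO
-- The view returns only the rows that belong to the connected login.
CREATE VIEW dbo.MyOrders
AS
SELECT OrderID, OrderDate, Amount
FROM dbo.Orders
WHERE ClientLogin = SUSER_SNAME();
GO
-- Clients get the view, never the base table. Because the view and the table
-- share an owner, ownership chaining means reading through the view does not
-- require any rights on dbo.Orders itself.
CREATE ROLE ClientRole;
DENY SELECT ON dbo.Orders TO ClientRole;
GRANT SELECT ON dbo.MyOrders TO ClientRole;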

SQL Server Table > MS Access Local Copy?

I'm looking for a little advice.
I have some SQL Server tables I need to move to local Access databases for some local production tasks - once per "job" setup, w/400 jobs this qtr, across a dozen users...
A little background:
I am currently using a DSN-less approach to avoid distribution issues
I can create temporary LINKS to the remote tables and run "make table" queries to populate the local tables, then drop the remote tables. Works as expected.
Performance here in the US is decent: 10-15 seconds for ~40K records. Our India teams are seeing >5-10 minutes for the same datasets. Their internet connection is decent but not great, and a variable I cannot control.
I am wondering if MS Access is adding some overhead here that can be avoided by a more direct approach: i.e., letting the server do all/most of the heavy lifting vs Access?
I've tinkered with various combinations, with no clear improvement or success:
Parameterized stored procedures from Access
SQL Passthru queries from Access
ADO vs DAO
Any suggestions, or an overall approach to suggest? How about moving data as XML?
Note: I have Access 2007, 2010 and 2013 users.
Thanks!
It's not entirely clear but if the MSAccess database performing the dump is local and the SQL Server database is remote, across the internet, you are bound to bump into the physical limitations of the connection.
ODBC drivers are not meant to be used for data access beyond a LAN; there is too much latency.
When Access queries data, it doesn't open a stream; it fetches a block, waits for the data to be downloaded, then requests another batch. This is OK on a LAN but quickly degrades over long distances, especially when you consider that communication between the US and India probably has around 200ms latency and you can't do much about it. That adds up very quickly if the communication protocol is chatty, all on top of a connection bandwidth that is very likely far below what you would get on a LAN.
The better solution would be to perform the dump locally and then transmit the resulting Access file after it has been compacted and maybe zipped (using 7z for instance for better compression). This would most likely result in very small files that would be easy to move around in a few seconds.
The process could easily be automated. The easiest is maybe to perform this dump automatically every day and make it available on an FTP server or an internal website ready for download.
You can also make it available on demand, maybe through an app running on a server and made available through RemoteApp using RDP services on a Windows 2008 server, or simply through a website, or a shell.
You could also have a simple Windows service on your SQL Server that listens to requests from a remote client installed on the local machines everywhere, processes the dump and sends it to the client, which would then unpack it and replace the previously downloaded database.
Plenty of solutions for this, even though they would probably require some amount of work to automate reliably.
One final note: if you automate the data dump from SQL Server to Access, avoid using Access in an automated way. It's hard to debug and quite easy to break. Use an export tool instead that doesn't rely on having Access installed.
Renaud and all, thanks for taking time to provide your responses. As you note, performance across the internet is the bottleneck. The fetching of blocks (vs a contiguous download) of data is exactly what I was hoping to avoid via an alternate approach.
Our workflow is evolving to better leverage both sides of the clock: User1 in the US completes their day's efforts in the local DB and then sends JUST their updates back to the server (based on timestamps). User2 in India, who also has a local copy of the same DB, grabs just the updated records off the server at the start of his day. So, pretty efficient for day-to-day stuff.
The primary issue is the initial download of the local DB tables from the server (huge multi-year DB) for the current "job" - it should happen just once at the start of the effort (~1 wk long process). This is the piece that takes 5-10 minutes for India to accomplish.
We currently do move the DB back and forth via FTP - DAILY. It is used as a SINGLE shared DB and is a bit LARGE due to temp tables. I was hoping my new timestamp-based push-pull of just the daily changes would be an overall plus. It seems to be, but the initial download hurdle remains.

Linked server vs integration

We have an application which needs to interact with 3 different databases
(SQL Server) to fetch the user details and display them on a web page. Is it a good option to use a linked server or should we copy the user details (via some daily job) to the application database?
Using a linked server will give you a round trip delay every time you query the data. If you only query the data once per day or per session this might be acceptable. If however you are issuing many queries to these servers you may find that the performance is so poor that your application is unusable.
You could use SQL replication to push (or pull) the data from each of the servers into a local copy on the application server. This will give you much better query performance (no round-trip delay) while also ensuring that you have the latest data. There are lots of options with SQL replication; you should be able to find something that suits your needs.
For more information on SQL Replication see http://technet.microsoft.com/en-us/library/ms151198.aspx
A linked server is only going to allow your databases to talk to each other. If the application is interacting with three discrete databases, then you simply need discrete connections. I would not recommend heavily using the linked servers unless you are moving a lot of data (since picking it up into the application and putting it into another database may take even longer).
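If you go with the daily-copy option from the question rather than replication or live linked-server queries, the scheduled job can be as simple as one MERGE per remote server into a local copy. A rough sketch; the linked server, database and column names are hypothetical:

-- Refresh the local copy of user details from one remote server once a day.
MERGE dbo.UserDetails AS target
USING LINKEDSRV1.UsersDb.dbo.UserDetails AS source
    ON target.UserID = source.UserID
WHEN MATCHED THEN
    UPDATE SET target.UserName = source.UserName,
               target.Email    = source.Email
WHEN NOT MATCHED BY TARGET THEN
    INSERT (UserID, UserName, Email)
    VALUES (source.UserID, source.UserName, source.Email)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;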

Parallel query execution on multiple database servers (running Microsoft SQL Server)

Is it possible to configure multiple database servers (all hosting the same database) to execute a single query simultaneously?
I'm not asking about executing queries using multiple CPUs simultaneously - I know this is possible.
UPDATE
What I mean is something like this:
There are two servers: Server1 and Server2
Both servers host database Foo and both instances of Foo are identical
I connect to Server1 and submit a complicated (lots of joins, many calculations) query
Server1 decides that some calculations should be made on Server2 and some data should be read from that server, too - appropriate parts of the query are sent to Server2
Both servers read data and perform necessary calculations
Finally, results from Server1 and Server2 are merged and returned to the client
All this should happen automatically, without need to explicitly reference Server1 or Server2. I mean such parallel query execution - is it possible?
UPDATE 2
Thanks for the tips, John and wuputah.
I am researching alternatives for increasing both the availability and the capacity of a MOSS database backend. So what I'm looking for is some kind of out-of-the-box SQL Server load-balancing solution that would be transparent to the application, because I cannot modify the application in any way. I guess SQL Server has no such feature (and Oracle, as far as I understand it, does: it is the RAC mentioned by wuputah).
UPDATE 3
A quote from the Top Tips for SQL Server Clustering article:
Let's start by debunking a common misconception. You use MSCS clustering for high availability, not for load balancing. Also, SQL Server does not have any built-in, automatic load-balancing capability. You have to load balance through your application's physical design.
What you're really talking about is a clustering solution. It looks like SQL Server and Oracle have solutions to this, but I don't know anything about them. I can guess they would be very costly to buy and implement.
Possible alternate suggestions would be as follows:
Use master-slave replication, and do your complex read queries from the slave. All writes must go to the master, which are then sent to the slave, so things stay in sync. This helps things go faster because the slave only has to worry about the writes coming from the master, which are already predetermined on behalf of the slave (no deadlocks etc). If you're looking to utilize multiple servers, this is the first place I would start.
Use master-master replication. This means that all writes from both servers go to each other, so they stay in sync (at least theoretically). This has some of the benefits as master-slave but you don't have to worry about writes going to one server instead of the other. The more common use of master-master replication is for failover support; master-slave is really better suited to performance.
Use the feature John Sansom talked about. I don't know much about it, but it seems its basis is splitting your database into tables on different servers, which will have some benefits as well as drawbacks. The big issue is that since the two systems can't share memory, they will have to share a lot of data over the network to compute complex joins.
Hope this helps!
RE Update 1:
If you can't modify the application, there is hope, but it might be a bit complicated. If you were to set up master-slave replication, you can then set up a proxy to send read queries to the slave(s) and write queries to the master(s). I've seen this done with MySQL, but not SQLServer. That's a bit of a problem unless you want to write the proxy yourself.
This has been discussed on SO previously, so you can find more information there.
RE Update 2:
Microsoft's clustering might not be designed for performance, but that's Microsoft's fault. That's still the level of complexity you're talking about here. If they say it won't help, then your options are limited to those above and to what you do with your application (like sharding, splitting into multiple databases, etc).
Yes I believe it is possible, well sort of, let me explain.
You need to look into and research the use of Distributed Queries. A distributed query runs across multiple servers and is typically used to reference data that is not stored locally.
http://msdn.microsoft.com/en-us/library/ms191440.aspx
For example, Server A may hold my Customers table and Server B holds my Orders table. It is possible using distributed queries to run a query that references both Server A and Server B, with each server managing the processing of its local data (which could incorporate the use of parallelism).
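For example, a distributed query against linked servers ServerA and ServerB might look like this (the server, database and table names are illustrative):

-- Customers live on ServerA, Orders on ServerB; each server processes
-- its own part of the data before the results are joined.
SELECT c.CustomerID, c.CustomerName, SUM(o.Amount) AS TotalAmount
FROM ServerA.Sales.dbo.Customers AS c
JOIN ServerB.Sales.dbo.Orders    AS o
    ON o.CustomerID = c.CustomerID
GROUP BY c.CustomerID, c.CustomerName;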
Now in theory you could store the exact same data on each server and design your queries specifically so that only certain tables are referenced on certain servers, thereby distributing the query load. This is not true parallel processing, however, in terms of CPU.
If your intended goal is to distribute the processing load of your application then the typical approach with SQL Server is to use Replication to distribute data processing across multiple servers. This method is also not to be confused with parallel processing.
http://databases.about.com/cs/sqlserver/a/aa041303a.htm
I hope this helps but of course please feel free to pose any questions you may have.
Interesting question, but I'm struggling to get my head around this being beneficial for a multi-user system.
If I'm the only user having half my query done on Server1 and the other half on Server2 sounds cool :)
If there are two concurrent users (let's say with queries of identical difficulty) then I'm struggling to see that this helps :(
I could have identical data on both servers and load balancing - so I get Server1, my mate gets Server2 - or I could have half the data on Server1 and the other half on Server2, and each will be optimised, and cache, just their own data - spreading the load. But whenever you have to do a merge to complete a query the limiting factor becomes the pipe-size between them.
Which is basically Federated Database Servers. Instead of having all my Customers on one server and all my Orders on the other I could, say, have my USA customers and their orders on one, and my European customers/orders on the other, and only if my query spans both is there any need for a merge step.
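A sketch of that federated layout as a distributed partitioned view, assuming each server holds one region's customers and the other server is defined as a linked server; all names here are hypothetical:

-- Run on Server1, in the Sales database; Server2 is a linked server.
CREATE VIEW dbo.AllCustomers
AS
SELECT CustomerID, CustomerRegion, CustomerName
FROM dbo.Customers_USA                        -- local partition
UNION ALL
SELECT CustomerID, CustomerRegion, CustomerName
FROM Server2.Sales.dbo.Customers_Europe;      -- remote partition via linked server
GO
-- With a CHECK constraint on CustomerRegion in each member table, the optimizer
-- can skip the remote branch when a query filters on a single region, so the
-- merge step only happens for queries that genuinely span both servers.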

Resources