Hi can any body tell me what is use of replication in sqlserver2005.
backup and replicaton looks same?what is diference b/w them
Backups are exactly that: backups. They enable you to recover the data if something bad happens.
Replication is another beast entirely. It basically distributes the data across multiple nodes so that each node has a complete, (close to) up-to-date copy of the data.
There are a number of reasons why you would use replication including, but not limited to:
High availability so that, if one node goes down, other nodes can still service requests.
Geographical distribution, meaning your data can be placed close to those that need it. Clients in Belarus don't need to go all the way to Montana to get the data if you maintain a local replica in Belarus (or somewhere close) - this is for performance. You may have 10,000 clients in Belarus - it's quicker to send one copy over than have all 10,000 request data [although this depends on how often they request data].
Prioritization. If your reporting users (bank management) have a lower service level agreement than your customer-facing staff (bank tellers) [and they should], you can put all the management onto a replica so as not to slow down the primary copy.
Replication is used for a different purpose, for example to make reports without putting that load on the 'real' database.
Replication increases system availability. If one set of database is down, you can serve out of replica.
Backup saves you from catastrophic errors such as human error that dropped the production database. Note that in this case, replication won't save you as it will dutifully replicate drop command.
SQL Server replication is the process of distributing data from a source database to one or more destination databases throughout the enterprise.
Replication is a great solution for maintaining a reporting server.
Clients at the site to which the data is replicated experience improved performance because those clients can access data locally rather than connecting to a remote database server over a network.
Clients at all sites experience improved availability of replicated data. If the local copy of the replicated data is unavailable, clients can still access the remote copy of the data.
Replication: Lots of data, fast and most recent.
Backup/Restore: Some data, perhaps a bit slower, and a specific point in time.
Replication can be used to address a number of different scenarios as detailed below.
Just to be clear however, Replication is not the same as Database Backup
Scenarios:
Server to server: Replicating Data in a Server to Server Environment
Improving Scalability and Availability
Data Warehousing and Reporting
Integrating Data from Multiple Sites(Server)
Integrating Heterogeneous
Data Offloading Batch Processing
Server to client: Replicating Data Between a Server and Clients
Exchanging Data with Mobile Users
Consumer Point of Sale (POS)
Applications Integrating Data from
Multiple Sites (Client)
For a full overview of Microsoft SQL Server Replication see the following Microsoft reference.
http://msdn.microsoft.com/en-us/library/ms151198(SQL.90).aspx
Choose the track that is most appropriate to you (i.e. Developer / Architect) and all shall be revealed :-)
Related
The current single application server can handle about 5000 concurrent requests. However, the user base will be over millions and I may need to have two application servers to handle requests.
So the design is to have a load balancer to hope it will handle over 10000 concurrent requests. However, the data of each users are being stored in one single database. So the design is to have two or more servers, shall I do the followings?
Having two instances of databases
Real-time sync between two database
Is this correct?
However, if so, will the sync process lower down the performance of the servers
as Database replication seems costly.
Thank you.
You probably want to think of your service in "tiers". In this instance, you've got two tiers; the application tier and the database tier.
Typically, your application tier is going to be considerably easier to scale horizontally (i.e. by adding more application servers behind a load balancer) than your database tier.
With that in mind, the best approach is probably to overprovision your database (i.e. put it on its own, meaty server) and have your application servers all connect to that same database. Depending on the database software you're using, you could also look at using read replicas (AWS docs) to reduce the strain on your database.
You can also look at caching via Memcached / Redis to reduce the amount of load you're placing on the database.
So – tl;dr – put your DB on its own, big, server, and spread your application code across many small servers, all connecting to that same DB server.
Best option could be the synchronizing the standby node with data from active node as cost effective solution since it can be achievable using open source relational database(e.g. Maria DB).
Do not store computable results and statistics that can be easily doable at run time which may help reduce to data size.
If history data is not needed urgent for inquiries , it can be written to text file in easily importable format to database(e.g. .csv).
Data objects that are very oftenly updated can be kept in in-memory database as key value pair, use scheduled task to perform batch update/insert to relation database to achieve persistence
Implement retry logic for database batch update tasks to handle db downtimes or network errors
Consider writing data to relational database as serialized objects
Cache configuration data to memory from database either periodically or via API to refresh the changing part.
The system in question is for a company with multiple locations. Unreliable internet speeds/availability at some locations have led to the path of a local server at each location off of which a location and a central server.
The role of the local server is for each location to be able to run no matter if it is connected to the outside world or not, or to eliminate high latency if the the connection speed is less than optimal.
The role of the central server is two-fold:
Configuration, policy, user, etc, management. For example, new products, price changes, promotions, user changes, etc, are done on the central server and then distributed to the local servers so they have the most up to date info.
Centralize all data created at each location to run reports, analytics and warehouse data.
The question of how much data to keep on the local server is debatable. For example some processes are dependent upon not just that one location, like customer loyalty, so a query must be run to the central server to check user activity and determine incentives. On the other hand, active customer base should be within the scope of the local servers data.
I lack experience in these types of distributed systems. My question is what database should we use that will facilitate this type of setup, hopefully incorporating the functionality to work automatically without much coding needed to achieve the data syncs to/from central server.
Master-Slave Replication:
In this type of replication one server (the master) accepts writes and will replicate the changes to read replicas(slaves)
Characteristics
Asynchronous
Read Scalability
Master is a point of failure for all the nodes (SPOF)
Master-Master:
In this setup all the database servers accepts read and writes and synchronize together.
Characteristics
Synchronous(hopefully)
Read and Write Scalability
Performance is worse than Master-Slave
No SPOF
Master-Master is harder to setup and maintain. Possibility of id collisions.
Any Popular Database Server these days supports the features above.
I have installed SQL server database (mainserver) in one instance and SQL server database for RerportServer in others. what is the best way to replicate data from mainServer to report Server? Data in mainServer changes frequently and actual information in the ReportSever is very important.
And there is many ways to do this:
mirroring
shipping log
transactional replication
merge replication
snapshot replication
are there some best-practices about this?
Thanks
You need Transactional Replication for your case. Here is why you would not need the other 4 cases:
Mirroring
This is generally used to increase the availability of a database server and provides for automatic failover in case of a disaster.
Typically even though you have more than a single copy of the database (recommended to be on different server instances), only one of them is active at a time, called the principle server.
Every operation on this server instance is mirrored on the others continuously (as soon as possible), so this doesn't fit your use case.
Log Shipping
In this case, apart from the production database servers, you have extra failover servers such that the backup of the production server's database, differential & transactional logs are automatically shipped (copied) to the failovers, and restored.
The replication here is relatively scheduled to be at a longer interval of time than the other mechanisms, typically ranging from an hour to a couple of hours.
This also provides for having the failver servers readies manually in case of a disaster at the production sites.
This also doesn't fit your use case.
Merge Replication
The key difference between this and the others is that the replicated database instances can communicate to the different client applications independent of the changes being made to each other.
For example a database server in North America being updated by clients across Americas & Europe and another one in Australia being updated by clients across the Asia-Pacific region, and then the changes being merged to one another.
Again, it doesn't fit your use case.
Snapshot Replication
The whole snapshot of the database is published to be replicated to the secondary database (different from just the log files being shipped for replication.)
Initially however, for each type of replication a snapshot is generated to initialized the subscribing database, i.e only once.
Why you should use Transactional Replication?
You can choose the objects (Tables, Views, etc) to be replicated continuously, so if there are only a subset of the tables which are used to reporting, it would save a lot of bandwidth. This is not possible in Mirroring and Log Shipping.
You can redirect traffic from your application to the reporting server for all the reads and reports (which you can also do in others too, btw).
You can have independent batch jobs generating some of the more used reports running on the reporting server, reducing the load on the main server if it has quite frequent Inserts, Updates or Deletes.
Going through your list from top to bottom.
Mirroring: If you mirror your data from your mainServer to your reportServer you will not be able to access your reportServer. Mirroring puts the mirrored database into a continuous restoring state. Mirroring is a High Availability solution. In your case the reportServer will only be available to query if you do a fail over. The mirrored server is never operational till fail over. This is not what you want as you cannot use the reportServer till it is operational.
Log Shipping: Log shipping will allow you to apply transactional log backups on a scheduled event to the reportServer. If you backup the transaction log every 15 minutes and apply the data to the reportServer you will have a delay of 15+ minutes between your mainServer and Log server. Mirroring is actually real time log shipping. Depending on how you setup log shipping your client will have to disconnect while the database is busy restoring the log files. Thus during a long restore it might be impossible to use reporting. Log Shipping is also a High Availability feature and not really useful for reporting. See this link for a description of trying to access a database while it is trying to restore http://social.msdn.microsoft.com/forums/en-US/sqldisasterrecovery/thread/c6931747-9dcb-41f6-bdf4-ae0f4569fda7
Replication : I am lumping all the replication together here. Replication especially transactional replication can help you scale out your reporting needs. It would generally be mush easier to implement and also you would be able to report on the data all of the time where in mirroring you cant report on the data in transaction log shipping you will have gaps. So in your case replication makes much more sense. Snapshot replication would be useful if your reports could be say a day old. You can make a snapshot every morning of the data you need from mainServer and publish this to the subscribers reportServer. However if the database is extremely large then Snapshot is going to be problematic to deal with on a daily basis. Merge replication is only usefull when you want to update the replicated data. In your case you want to have a read only copy of the data to report on so Merge replication is not going to help. Transactional Replication would allow you to send replications across the wire. In your case where you need frequently updated information in your reportServer this would be extremely useful. I would probably suggest this route for you.
Just remember that by implementing the replication/mirroring/log shipping you are creating more maintenance work. Replication CAN fail. So can mirroring and so can transaction log shipping. You will need to monitor these solutions to make sure they are running smoothly. So the question is do you really need to scale out your reports to another server or maybe spend time identifying why you cant report on the production server?
Hope that helps!
We have a decent sized, write-heavy database that is about 426 GB (including indexes) and about 300 million rows . We currently collect location data from devices that report to our server every couple of minutes, and we serve about 10,000 devices - so lots of writes every second. The location table that stores the location of each device has about 223 million rows. The data is currently archived by year.
Problems occur when users run large reports on this database, the whole database grinds down almost to a stop.
I understand I need a reporting database, but my question is if anyone has experience of using SQL Server Transactional Replication on a database of equivalent size, and their experience of using this technology?
My rough plan is to point all the reports in our application to the Reporting Database, use Transactional Replication to replicate the data over from the master to the slave (Reporting Database).
Anyone have any thoughts on this strategy and the problems I may encounter?
Many thanks!
Transactional replication should work well in this scenario (the only effect the size of the database will have is the time taken to generate the initial snapshot). However, it may not solve your problem.
I think the issue you'll have if you choose transactional replication is that the slave server is going to be under the same load as the master machine as changes are applied - it will still crawl when users run large reports (assuming it's of a similar spec).
Depending on the acceptable latency of reporting data to the live data, this may or may not be OK for your users.
If some latency is acceptable you may get better performance from log shipping, since changes are applied in batches.
Before acquiring a reporting server, another approach would be to investigate the queries that your users are running and look at modifying either their code or the indexing strategy to better match what they're trying to do.
Transactional Replication could work well for you. The things to consider:
The target database tables must be read-only.
The server containing the target database should be stout enough to handle the SELECT traffic from the reporting applications.
Depending on the INSERT/UPDATE traffic, you may need to have a third server act as the Distribution server.
You also have to consider the size of the Distribution database.
Based on what I read here, I'd use a pull subscription from the Reporting server to offload traffic from the OLTP server.
You can skip the torment of a snapshot by initializing the reporting database from a backup of the OLTP database. See https://msdn.microsoft.com/en-us/library/ms151705.aspx
There will be INSERT/UPDATE/DELETE traffic from the Replication into both the Distribution and the Subscriber databases. That requires consideration, but lock/block issues should be no worse (and probably better) than running those reports off of OLTP.
I am running multiple publications on a 2.6TB database with 2.5GB/day of growth, using both pure transactional to drive reports (to two reporting servers) and Peer-to-Peer Transactional to replicate data in a scale-out for a SaaS offering (to three more servers). Because of this, we have a separate distributor.
Hope this helps.
Thanks
John.
I need to implement a SQL Server replication solution. Very simple need for now. I just need to replicate one pretty simple table from 200 remote sites or so to one central server. The data is not really transactional in nature. I just need it moved up to the central server once a day. I can't decide if I should use push or pull, and I'm not sure if the distributor should live on the server side, or on all the clients.
The server and all the remote sites all live on a fairly decent VPN. The server is 2005, and it's not being pushed very hard at the moment. Just a few jobs here and there collecting data (which I want to get away from) and pushing reports/exports to various vendors once a day. The sites are a mix of 2000/2005.
I'd recommend you do some scalability tests first. Replication is very verbose in terms of agent jobs and T-SQL connections for reading and writing data. 200 publications you're talking 200 publisher agents, 200 subscription agents, plus the distributor maintenance. Most sites complain about maintenance problems of having 1 publisher and 1 subscriber... Say you manage to pull this off and operate it successfully, what is going to be your upgrade story? And how are you going to implement a schema change?
The largest replication deployment I heard of (some years ago) had I believe 450 publishers and was implemented by an army of Microsoft field consultants sweating for months to bend the behemoth into shape. Your 200 replication sites project is way more ambitious than you realize.
I suggest you explore some alternatives too. If you need a periodic table snapshot then SSIS can be a good match. If you need a continuous stream of changes then Service Broker can scale way way easier than replication.
If there is need to adjust the replication down the road, having the central server initiate a pull will be much easier to administrate than adjusting 200 sites to accomplish the same thing. Also, that would naturally manage the load, rather than some scheme to prevent, say, 100 remote sites all connecting at once.
Push subscriptions are the way to go here if you wish to centrally manage the data distribution of your application platform.
From what you have described you will need to make a choice between Snapshot Replication and Transactional Replication for your architecture.
Dependent on how much data you are looking to push and also the schedule of your updates will determine the most appropriate Replication Method for you to use. For example, if you looking to update all Subscriptions at the same time then dependent on how much data you need to push Snapshot Replication may not be suitable and you may be better off using Transactional Replication, perhaps pushed at specific determined intervals. Your network may even be able to support near real-time replication however conducting a small test of your environment will determine this for you. For example, setup the Publisher, local Distributor and a handful of Subscribers at geographically different locations on your network in order to test network transfer times and Replication Latency.
Things to consider:
How much data is to be moved across
the network? Size in Kb and record
volume.
Consider the physical location of
your sites
What is the suitability of your
network? Seed, capacity etc.
You may wish to consider using a
dedicated Distributor.