I have a question for the DBA's out there: If I scale from a single web/DB server setup to two web/two DB server setup with a load balancer in front of the web servers to route incoming queries evenly... how do solutions like MySQL Cluster work so that a change made to one DB server is immediately known to the other (otherwise, users routed to the other DB server won't see the data or will outdated data), or at least so that the other web server is made aware of the fact that it's reading "dirty data" and it should try again in X seconds so as to get up-to-date data?
Thank you.
TWO ways of doing this.
Active/Active or Active/Passive.
Active/Passive is most prevalent
The data is kept in sync on the passive node.
The cluster is useful configuration in as much as the active node goes down the passive is immediately switched hence no downtime.
The clustering continuously synchronises the 2 nodes in the cluster.
I work with SQL server but I think the basic premise of clustering is the same for mySQL - that is no (or no noticeable) downtime on hardware failure.
EDIT: Additionally the clustering software handles the synchronisation. You don't need to worry. You view the cluster nodes as a virtual directory, which behaves like one server in windows.
here is document explaining this
http://www.sql-server-performance.com/articles/clustering/clustering_intro_p1.aspx
In Windows server clustering (to be distinguished from High Performance Clustering), there is a shared external storage array. The active node takes ownership/control of the storage, and when that node fails, the storage 'fails over' to the previously passive node (which is now the active node). There are also different schemes that allow for independent storage at each node, vs. shared storage. However, these require the application to have enough intelligence to know that it is clustered, and keep the two storage sets in sync.
Clustering is also where a number of nodes handle the workload, this is sometimes called active/active clusters i.e. all the nodes share the workload and are active. This is normally handled by specialist software like Oracle RAC (RAC#Wikipedia) for the Oracle RDBMS database. RAC allows Oracle to scale to very large workloads.
Related
What is database clustering? If you allow the same database to be on 2 different servers how do they keep the data between synchronized. And how does this differ from load balancing from a database server perspective?
Database clustering is a bit of an ambiguous term, some vendors consider a cluster having two or more servers share the same storage, some others call a cluster a set of replicated servers.
Replication defines the method by which a set of servers remain synchronized without having to share the storage being able to be geographically disperse, there are two main ways of going about it:
master-master (or multi-master) replication: Any server can update the database. It is usually taken care of by a different module within the database (or a whole different software running on top of them in some cases).
Downside is that it is very hard to do well, and some systems lose ACID properties when in this mode of replication.
Upside is that it is flexible and you can support the failure of any server while still having the database updated.
master-slave replication: There is only a single copy of authoritative data, which is the pushed to the slave servers.
Downside is that it is less fault tolerant, if the master dies, there are no further changes in the slaves.
Upside is that it is easier to do than multi-master and it usually preserve ACID properties.
Load balancing is a different concept, it consists distributing the queries sent to those servers so the load is as evenly distributed as possible. It is usually done at the application layer (or with a connection pool). The only direct relation between replication and load balancing is that you need some replication to be able to load balance, else you'd have a single server.
From SQL Server point of view:
Clustering will give you an active - passive configuration. Meaning in a 2 node cluster, one of them will be the active (serving) and the other one will be passive (waiting to take over when the active node fails). It's a high availability from hardware point of view.
You can have an active-active cluster, but it will require multiple instances of SQL Server running on each node. (i.e. Instance 1 on Node A failing over to Instance 2 on Node B, and instance 1 on Node B failing over to instance 2 on Node A).
Load balancing (at least from SQL Server point of view) does not exists (at least in the same sense of web server load balancing). You can't balance load that way. However, you can split your application to run on some database on server 1 and also run on some database on server 2, etc. This is the primary mean of "load balancing" in SQL world.
Clustering uses shared storage of some kind (a drive cage or a SAN, for example), and puts two database front-ends on it. The front end servers share an IP address and cluster network name that clients use to connect, and they decide between themselves who is currently in charge of serving client requests.
If you're asking about a particular database server, add that to your question and we can add details on their implementation, but at its core, that's what clustering is.
Database Clustering is actually a mode of synchronous replication between two or possibly more nodes with an added functionality of fault tolerance added to your system, and that too in a shared nothing architecture. By shared nothing it means that the individual nodes actually don't share any physical resources like disk or memory.
As far as keeping the data synchronized is concerned, there is a management server to which all the data nodes are connected along with the SQL node to achieve this(talking specifically about MySQL).
Now about the differences: load balancing is just one result that could be achieved through clustering, the others include high availability, scalability and fault tolerance.
I'm not a DBA so this may be a stupid question but I'll ask it anyway. We're upgrading our SQL Servers from 2000 to 2005 and we will probably use either database replication or database mirroring. Our DBA would like to "multipurpose" the standby server meaning that he'd like to increase our capabilities and capacity by running other database applications on the standby server since "it's just going to be sitting there anyway" (his words, not mine). Is this such a good idea? Right now, our main application server uses only one instance that contains 50+ databases. As I understand it, what we're doing now and what our DBA is proposing for a failover server is a bad idea because all of these databases are sharing memory, CPUs, and working areas. If one applications starts behaving badly, the other DBs could be affected.
Any thoughts?
It's really a business question that needs to be answered?? is a slow app better then no app if you can't afford the expense of extra hardware?
Standby and mirrored db's can be used for reporting. Using it as the failover db can work if you have enough headroom (i.e. both databases will comfortably run on the server)
Will you depend on these extra applications? Where do they run in the failover case?
You really need to understand your failure modes.
If you look at it as basic resource math, that doesn't often make sense unless the resources you have running in the failure scenarios can handle the entire expected load. Sometimes this is the case, but not always. In this case, to handle the actual load you may need yet another server to come in (like RAID - perhaps your load needs a minimum of 5 servers, but you have a farm of 6, then you need 1 standby server for ever server to fail above 1). Sometimes a farm can run degraded, but sometimes they just puke and die.
And in the case of out of normal operation, you often have accident cascading where a legitimate incident causes a cascade of issues - e.g. your backup tape is busy restoring a server from a backup (to a test environment, even - there are no real "failures"), now your sql server or exhcange server (or both) is not backed up and your log gets full.
Database Mirroring would not be the way to go here in my opinion as it provides redundancy at the database level only. So you would need to configure database mirroring for up to 50 databases based on the information you provided. The chances are that if one DB where to fail all, 50 would probably follow, as failures typically occur at the hardware level rather than a specific database.
It sounds to me like you should be using SQL Server Clustering technology. You could create an Active/Active cluster to support your requirements.
What is an Active/Active Cluster?
An Active/Active SQL Server cluster means that SQL Server is running on both nodes of a two-way cluster. Each copy of SQL Server acts independently, and users see two different SQL Servers. If one of the SQL Servers in the cluster should fail, then the failed instance of SQL Server will failover to the remaining server. This means that then both instances of SQL Server will be running on one physical server, instead of two.
Applying this to your scenario
You could then split the databases between two instances of SQL server, one active instance on each node. Should one node fail, the other node will pick up the slack and vice versa.
Further Reading
An introduction to SQL Server Clustering
I suspect that you will find the following MSDN thread useful reading also
"it's just going to be sitting there anyway"
It will be sitting there applying transactions...
Take note of John Sansom's recommendation. Keep in mind that a Active/Active cluster requires two sql server licenses and a failover cluster/mirror only needs one.
Setting up mirroring for a large number of db's could turn into a big pain. You need any jobs/maintenance to move over as well - which can be achieved with alerts on WMI failover events. There's probably more to think about that could complicate things.
Hi can any body tell me what is use of replication in sqlserver2005.
backup and replicaton looks same?what is diference b/w them
Backups are exactly that: backups. They enable you to recover the data if something bad happens.
Replication is another beast entirely. It basically distributes the data across multiple nodes so that each node has a complete, (close to) up-to-date copy of the data.
There are a number of reasons why you would use replication including, but not limited to:
High availability so that, if one node goes down, other nodes can still service requests.
Geographical distribution, meaning your data can be placed close to those that need it. Clients in Belarus don't need to go all the way to Montana to get the data if you maintain a local replica in Belarus (or somewhere close) - this is for performance. You may have 10,000 clients in Belarus - it's quicker to send one copy over than have all 10,000 request data [although this depends on how often they request data].
Prioritization. If your reporting users (bank management) have a lower service level agreement than your customer-facing staff (bank tellers) [and they should], you can put all the management onto a replica so as not to slow down the primary copy.
Replication is used for a different purpose, for example to make reports without putting that load on the 'real' database.
Replication increases system availability. If one set of database is down, you can serve out of replica.
Backup saves you from catastrophic errors such as human error that dropped the production database. Note that in this case, replication won't save you as it will dutifully replicate drop command.
SQL Server replication is the process of distributing data from a source database to one or more destination databases throughout the enterprise.
Replication is a great solution for maintaining a reporting server.
Clients at the site to which the data is replicated experience improved performance because those clients can access data locally rather than connecting to a remote database server over a network.
Clients at all sites experience improved availability of replicated data. If the local copy of the replicated data is unavailable, clients can still access the remote copy of the data.
Replication: Lots of data, fast and most recent.
Backup/Restore: Some data, perhaps a bit slower, and a specific point in time.
Replication can be used to address a number of different scenarios as detailed below.
Just to be clear however, Replication is not the same as Database Backup
Scenarios:
Server to server: Replicating Data in a Server to Server Environment
Improving Scalability and Availability
Data Warehousing and Reporting
Integrating Data from Multiple Sites(Server)
Integrating Heterogeneous
Data Offloading Batch Processing
Server to client: Replicating Data Between a Server and Clients
Exchanging Data with Mobile Users
Consumer Point of Sale (POS)
Applications Integrating Data from
Multiple Sites (Client)
For a full overview of Microsoft SQL Server Replication see the following Microsoft reference.
http://msdn.microsoft.com/en-us/library/ms151198(SQL.90).aspx
Choose the track that is most appropriate to you (i.e. Developer / Architect) and all shall be revealed :-)
What is database clustering? If you allow the same database to be on 2 different servers how do they keep the data between synchronized. And how does this differ from load balancing from a database server perspective?
Database clustering is a bit of an ambiguous term, some vendors consider a cluster having two or more servers share the same storage, some others call a cluster a set of replicated servers.
Replication defines the method by which a set of servers remain synchronized without having to share the storage being able to be geographically disperse, there are two main ways of going about it:
master-master (or multi-master) replication: Any server can update the database. It is usually taken care of by a different module within the database (or a whole different software running on top of them in some cases).
Downside is that it is very hard to do well, and some systems lose ACID properties when in this mode of replication.
Upside is that it is flexible and you can support the failure of any server while still having the database updated.
master-slave replication: There is only a single copy of authoritative data, which is the pushed to the slave servers.
Downside is that it is less fault tolerant, if the master dies, there are no further changes in the slaves.
Upside is that it is easier to do than multi-master and it usually preserve ACID properties.
Load balancing is a different concept, it consists distributing the queries sent to those servers so the load is as evenly distributed as possible. It is usually done at the application layer (or with a connection pool). The only direct relation between replication and load balancing is that you need some replication to be able to load balance, else you'd have a single server.
From SQL Server point of view:
Clustering will give you an active - passive configuration. Meaning in a 2 node cluster, one of them will be the active (serving) and the other one will be passive (waiting to take over when the active node fails). It's a high availability from hardware point of view.
You can have an active-active cluster, but it will require multiple instances of SQL Server running on each node. (i.e. Instance 1 on Node A failing over to Instance 2 on Node B, and instance 1 on Node B failing over to instance 2 on Node A).
Load balancing (at least from SQL Server point of view) does not exists (at least in the same sense of web server load balancing). You can't balance load that way. However, you can split your application to run on some database on server 1 and also run on some database on server 2, etc. This is the primary mean of "load balancing" in SQL world.
Clustering uses shared storage of some kind (a drive cage or a SAN, for example), and puts two database front-ends on it. The front end servers share an IP address and cluster network name that clients use to connect, and they decide between themselves who is currently in charge of serving client requests.
If you're asking about a particular database server, add that to your question and we can add details on their implementation, but at its core, that's what clustering is.
Database Clustering is actually a mode of synchronous replication between two or possibly more nodes with an added functionality of fault tolerance added to your system, and that too in a shared nothing architecture. By shared nothing it means that the individual nodes actually don't share any physical resources like disk or memory.
As far as keeping the data synchronized is concerned, there is a management server to which all the data nodes are connected along with the SQL node to achieve this(talking specifically about MySQL).
Now about the differences: load balancing is just one result that could be achieved through clustering, the others include high availability, scalability and fault tolerance.
we are planning to implement sql server 2005 cluster in next few months. i wanted to know what steps / precautions need to be taken as a database developer when trying to achieve this? do we need to change any ado.net code (in front end) / stored procs etc etc? are there any best practices to be followed?
reason i am asking this question is : for asp.net load balancing, you have to ensure your code for sessions / application / cache all comply with load balanced environment. (so incase you are using inproc sessions, you have to rewrite that code so that it works on load balanced environment). now this is at your web server level. i just wanted to the right things to do when trying to scale out at database server level
i am sorry if this question is stupid. please excuse my limited knowledge on this subject :-)
You have to make no front end changes to implement a SQL Server cluster, you simply connect to a SQL Server instance as normal.
SQL Server failover clustering is not load balancing however. It is used to add redundancy should any hardware fail on your primary node. Your other (secondary) node is doing nothing until your primary fails, in which case the failover happens automatically and your database is serving connections again after a 10-20 second delay.
Another issue is the cache on the secondary node is empty, so you may see some performance impact after failover. You can implement a "warm" cache on your mirror server using SQL Server database mirroring, but there is no way to do something similar with clustering.
Database clustering is different to load balancing. It's high availability, not "scaling out"
Basically:
2 servers (or nodes) with shared disks (that can only be owned by one node at any time)
one is "active" running a virtual windows server and SQL Server instance
one is monitoring the other ("passive")
you connect to the virtual windows server.
If node 1 goes off line, node 2 takes over. Or it can be failed over manually.
This means: services are shut down on node 1, node 2 takes control of the disks and services and starts up. Any connections will be broken, and no state or session is transferred.