SQL Server Instance Public Website Security - sql-server

Is it best practice to separate databases that are used by public website applications into their own database instance, from those databases that have PII info and company IP info? The idea being for security reasons. I have around 30 databases that I am migrating to a new environment and I am finding this to be the hardest decision to make. Does anyone have any advice?

Best practice would be to have these databases on completely separate servers, although virtual servers could be a good option if the network interface is separated by the hypervisor (i.e. it is not possible for one VM to sniff the traffic of other VMs using the same card).
The reason is that if one database is breached, the others are not breached too. Yes, you can set up different users with permissions only to their own database; however, defense in depth is recommended. If there are any misconfigurations in SQL Server, this will add additional protection.
Is it worth it?
The risk calculation you have to make is whether any breaches will cost the company more than the cost of implementing this.
Single loss expectancy (SLE) = value of asset * exposure factor
Annualised loss expectancy (ALE) = SLE * annual rate of occurrence (ARO)
Value of asset is everything involved in setting up the databases, including acquiring the data, the value of it to owners and users, and the value of the asset to competitors or attackers.
Exposure factor is the percentage of loss a realised threat would have.
ARO is the number of times a threat takes place per year (1 for once a year, 0.5 for once every two years, 2 for twice a year).
So if your ALE is less than the cost to implement and maintain a system with a separate database server for each database per year, then it isn't. However, middle ground could be found and you could separate the data onto a few servers until the numbers stack up.
Different instances are a step up in security over the same instance. However, vulnerabilities that allow an attacker to gain control of the whole server will mean that all of your databases are compromised at once. Such as this one, in an earlier version of SQL Server. There is no guarantee that vulnerabilities such as this one will not be discovered in future.

Related

Why do we need Database HA Software like SQL Always On, Oracle Dataguard etc

I am learning the very basics of High Availability, SAN etc. and hence this question may sound stupid to experts, but it would greatly help me if you answer it. Let's say I am using an Enterprise SAN setup. I understand that any database e.g. SQL stores the data in a file which is stored on the SAN. Now if, let's say, I enable array-based replication to another array, maybe in another data center, then my database file will continuously be replicated in the second data center. Whenever the first data center is lost, I can use the replicated file in the second data center to bring up the data and database. Then what exactly is the role played by various HA solutions like SQL Always On, Oracle Data Guard etc.? Thanks a ton in advance to people who reply.
Mission Critical systems are just that. Mission Critical.
We test this, and plan for it to happen, but dependency upon redundancy is no real redundancy. Are both installations yours? Are third parties involved? For how many days (or hours) can you bear the extra latency?
Andrew, Katrina, Sandy. If it is not just you, how high are you on the priority list for your third parties? They may be blue chip long on promise, but short on delivery when spread thin on the ground.
When you do recover, there will be exceptions, dead letters, and possibly a period before you can confirm eventual consistency. Functionally you may be fine, but your brand may be damaged, your stock price devalued.
Ultimately it is meeting a requirement that is in turn driven by a real need or by risk mitigation strategies. I'm sure it is sold to people who do not need it. But for the people that do, 100% uptime is an absolute, not something one rounds to for the Annual Report. These products are actively marketed to the 'C' suite for these reasons.
When you build IT solutions, some requirements to consider are how resilient your IT solution needs to be based on the business process criticality. There are two aspects of the solution that you need to consider.
High availability (HA): determines how resilient your application needs to be, and is usually expressed in 9s (nines); for example, 99.99% availability is called four nines availability, which is the equivalent of about one hour of unplanned downtime. In Oracle, HA is usually accomplished by using Oracle Real Application Clusters (RAC), which gives you availability even when a node in the cluster goes down.
In SQL Server this would be a SQL Server availability group.
Disaster recovery (DR): determines the methods and technologies to provide business continuity in case of disaster, in other words when your application's High Availability (HA) features are no longer responding to requests.
In Oracle this would be Data Guard replicating your database from one cluster in one datacenter to a second cluster at the remote location.
SQL Server also provides similar features such as log shipping, Always On, and availability groups.
Disaster recovery capabilities can be measured by two metrics: Recovery Time Objective (RTO), which determines how long it would take for the backup site to be fully functional in case of loss of the primary data center, and Recovery Point Objective (RPO), which determines business data loss tolerance.

Reliable alternative to replication for continuous data sync between two databases

I have one central database and 25 client databases, and all have the same schema.
I want that whenever some changes are done in some tables of the central database, these changes flow down to the client databases.
The database used is SQL Express, so I cannot use replication.
The solution that I have today is to keep track of the changes in the central database, and then a program makes a text file with these changes and sends them down to the client databases. Another program reads these text files and updates the client database.
There are three problems with this:
1. The files get lost or arrive in jumbled order, which messes up the client data.
2. The process is slow.
3. The programs are sometimes shut down, so the whole sync flow gets stopped.
Is there a reliable alternative that is fast and secure?
I wonder how banking software is made... they never lose transactions and they are fast.
Add an UpdateDate column to all the entities that need to be replicated. At each client add a linked server to the central repository. Now, every 5 minutes or so, poll your central repository for changes using the last UpdateDate of a client entity and grab the delta.
Then use merge or insert and update to merge data on the client. That's a very reliable way of doing homebrew replication. To keep track of deleted elements you would either want to mark them as deleted or have another table to keep track of entity kind and its reference, again combined with UpdateDate for replication.
Update
Then you mention transactions and banking software. When you do your replication via files, we ain't talkin' about no transactional replication here, not by a long shot.
If you need transactional consistency you need to subscribe to the transaction flow of the data warehouse.
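The UpdateDate-and-MERGE approach from the answer above, written out as an added T-SQL example (not code from the original answer). It assumes a linked server named CentralSrv pointing at a database CentralDb; table and object names are illustrative.

```sql
-- Added example, not code from the original answer: a pull-based delta sync
-- from the central database to one client using an UpdateDate column, a
-- linked server (assumed name CentralSrv, database CentralDb) and MERGE.
-- Table and object names are illustrative.

-- Central database: each replicated table carries UpdateDate and a soft-delete
-- flag, so deletes travel to the clients as ordinary row changes.
CREATE TABLE dbo.Product (
    ProductId  int           NOT NULL PRIMARY KEY,
    Name       nvarchar(100) NOT NULL,
    Price      decimal(10,2) NOT NULL,
    IsDeleted  bit           NOT NULL DEFAULT 0,
    UpdateDate datetime2(3)  NOT NULL DEFAULT SYSUTCDATETIME()
);
GO
-- Keep UpdateDate current on every update, including soft deletes.
CREATE TRIGGER dbo.trg_Product_UpdateDate ON dbo.Product
AFTER UPDATE AS
BEGIN
    SET NOCOUNT ON;
    UPDATE p SET UpdateDate = SYSUTCDATETIME()
    FROM dbo.Product AS p JOIN inserted AS i ON i.ProductId = p.ProductId;
END;
GO

-- Client database: the same dbo.Product table (without the trigger) plus a
-- watermark recording the newest UpdateDate successfully pulled so far.
CREATE TABLE dbo.SyncWatermark (
    EntityName sysname      NOT NULL PRIMARY KEY,
    LastSync   datetime2(3) NOT NULL
);
INSERT dbo.SyncWatermark (EntityName, LastSync) VALUES (N'Product', '1900-01-01');
GO

-- Run every few minutes from SQL Agent or a scheduled task on the client.
CREATE PROCEDURE dbo.usp_SyncProduct
AS
BEGIN
    SET NOCOUNT ON;
    SET XACT_ABORT ON;

    DECLARE @since datetime2(3) =
        (SELECT LastSync FROM dbo.SyncWatermark WHERE EntityName = N'Product');

    -- Pull the delta outside the local transaction so the linked-server read
    -- is not promoted to a distributed (MSDTC) transaction. '>=' re-reads rows
    -- sitting exactly on the watermark; the MERGE below makes that harmless.
    SELECT ProductId, Name, Price, IsDeleted, UpdateDate
    INTO #delta
    FROM CentralSrv.CentralDb.dbo.Product
    WHERE UpdateDate >= @since;

    BEGIN TRAN;

    MERGE dbo.Product AS tgt
    USING #delta AS src ON tgt.ProductId = src.ProductId
    WHEN MATCHED AND src.IsDeleted = 1 THEN
        DELETE
    WHEN MATCHED THEN
        UPDATE SET Name = src.Name, Price = src.Price,
                   IsDeleted = 0, UpdateDate = src.UpdateDate
    WHEN NOT MATCHED BY TARGET AND src.IsDeleted = 0 THEN
        INSERT (ProductId, Name, Price, IsDeleted, UpdateDate)
        VALUES (src.ProductId, src.Name, src.Price, 0, src.UpdateDate);

    -- Advance the watermark only to the newest row actually received; using
    -- the central server's timestamps keeps the client immune to clock skew.
    UPDATE dbo.SyncWatermark
    SET LastSync = ISNULL((SELECT MAX(UpdateDate) FROM #delta), LastSync)
    WHERE EntityName = N'Product';

    COMMIT;
END;
GO
```

Because the MERGE is idempotent, a run that is interrupted or repeated simply re-applies the same delta, which addresses the lost-file and jumbled-order problems from the question.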
I don't want to be unhelpful and you haven't given any background about your business needs, but you have to decide if your priority is really "fast and secure" or if it's actually "cheap". Replicating changes between multiple databases in a reliable, consistent way is not easy (as you know) and it's highly unlikely that you will be able to develop a solution yourself that has the features, stability and performance of SQL Server replication.
SQL Express can be a replication subscriber, by the way, so it's not clear why it doesn't meet your needs. But if it doesn't, you should estimate the cost to your business (or customer) of dealing with issues caused by an unreliable solution: your time, business downtime, finding and correcting incorrect data, customer complaints, lost business etc. Then compare that to the cost of 25 SQL Server licenses (you should certainly be able to get a good discount when you order that volume), additional hardware (if any) and the costs of training, consulting and/or learning how to use replication. Then extrapolate those costs over 5 years or so. You may find that it's cheaper just to buy the solution you need. And of course buying the full SQL Server edition means you get a lot of other new features that might be useful to you.
If you (or your boss) is really determined to get something for nothing, you might want to investigate PostgreSQL or MySQL. They both have free replication solutions that seem to be widely enough used to be reliable for many companies. Of course, you then need to calculate the costs of switching to a new database platform.
If you have one central database and 25 clients, you can easily do it with one (yes, only one) SQL Server licence for the main database. Subscribers to this database can run SQL Express. As long as users access the client databases, you are not even obliged to buy SQL CALs.
Back to banking software, be sure that they are paying good money for their server licenses! So don't be surprised if these are reliable and fast ...

Which is a better sharding strategy with respect to performance? Hashing a GUID or contacting a scale-out manager?

We're building a relatively high profile site that is expected to have 100 million hits on the first days of launch. My predecessor had argued for a scale-up SQL strategy using a single server with 1 TB RAM and 32 cores. We have been advised that this is not a feasible solution.
In response, I have shifted to a scale-out strategy with multiple SQL servers and horizontal partitioning. My question revolves around how I will direct the DAL to the appropriate database server. There will be many reads and writes for each user of the application. My first thought was to use a single scale out server that stored the profile id (GUID) of each user and would return the connection string to that user's shard. This seemed like a lot of overhead and created a single point of failure.
My second strategy was to route to the database by GUID so I could directly code this into the DAL. GUIDs aren't random though, so I'm thinking I'd need to hash it in order to get a relatively even distribution between my database shards. Every user including anonymous users has a GUID, so this is really the only property I have available to me that I can use for sharding.
So the question is whether I'm going to kill performance with the hashing that will have to occur. I'm pretty confident that the hash will be less of a bottleneck than a database read, but I'd really like some feedback on this or any other thoughts the community would like to share about my strategy.
Some specifics:
We're using SQL 2008 R2 Enterprise on the db servers. Each db will have 64GB RAM and 8 cores. The databases will be on shared storage. vMotion will be used if a server goes down. There will be a slew of web servers at launch (30-40?) but the exact number will be dictated by performance testing. The application is built on .NET 4.0 with the Enterprise Library v5. Web server load balancing will be handled by a Cisco ACE. We have requested that each of the database servers be on a separate vSphere instance.
Thanks!
Is any user profile needing to interact with another user profile? Typical example would be a Wall of some sort, eg. where you see the status updates from all your friends. See Scale out SQL Server by using Reliable Messaging to understand why I'm asking this.
As for hashing, a good hashing scheme can accommodate a change in the distribution (eg. a segment owned by machine A is split into two new segments now owned by A and B) w/o changes to the application. Hashing in the app layer does not solve this problem; having a dedicated 'scale-out manager' is better, as long as you design the scale-out manager to be highly available and control the number of requests it has to respond to (eg. less than 1 per HTTP request on the front end layer).
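A hash-range shard map of the kind a scale-out manager would serve, as an added T-SQL example (not code from the answer or the question). It assumes a map table in a small coordinator database that the DAL caches, so the manager is not hit once per HTTP request; server names and the sample GUID are constructed.

```sql
-- Added example, not from the original thread: a range-based shard map keyed
-- on a hash of the profile GUID. Server names and the sample GUID are made up.
CREATE TABLE dbo.ShardMap (
    RangeLow   int           NOT NULL,  -- inclusive lower bound of the hash range
    RangeHigh  int           NOT NULL,  -- inclusive upper bound
    ServerName nvarchar(128) NOT NULL,
    CONSTRAINT PK_ShardMap PRIMARY KEY (RangeLow),
    CONSTRAINT CK_ShardMap_Order CHECK (RangeLow <= RangeHigh)
);
-- Initial layout: the signed 32-bit hash space split between two servers.
-- (Non-overlap of ranges must be enforced by the procedure that edits the map;
-- a CHECK constraint cannot compare rows.)
INSERT dbo.ShardMap (RangeLow, RangeHigh, ServerName) VALUES
    (-2147483648, -1,         N'SQLSHARD-A'),
    (0,           2147483647, N'SQLSHARD-B');
GO

-- GUID -> hash key. Sequential GUIDs (NEWSEQUENTIALID, or v1 GUIDs sharing a
-- MAC address) cluster badly, so the GUID is hashed first. SHA1 is the
-- strongest algorithm HASHBYTES offers on SQL 2008 R2; it is used here only
-- for an even spread, no security property is relied upon. The first 4 bytes
-- of the digest are read as a signed int to match the map's ranges.
CREATE FUNCTION dbo.fn_ShardKey (@ProfileId uniqueidentifier)
RETURNS int
WITH SCHEMABINDING
AS
BEGIN
    RETURN CAST(SUBSTRING(HASHBYTES('SHA1', CAST(@ProfileId AS binary(16))), 1, 4) AS int);
END;
GO

-- GUID -> server. The DAL calls this (or caches the whole map) to pick the
-- connection string for a user's shard.
CREATE FUNCTION dbo.fn_ShardServer (@ProfileId uniqueidentifier)
RETURNS nvarchar(128)
AS
BEGIN
    DECLARE @key int = dbo.fn_ShardKey(@ProfileId);
    RETURN (SELECT ServerName FROM dbo.ShardMap
            WHERE @key BETWEEN RangeLow AND RangeHigh);
END;
GO

-- Splitting server B's range to bring SQLSHARD-C online is a metadata change;
-- callers keep calling fn_ShardServer unchanged. The rows in the moved range
-- must be copied to the new server before the map is switched.
BEGIN TRAN;
UPDATE dbo.ShardMap SET RangeHigh = 1073741823 WHERE RangeLow = 0;
INSERT dbo.ShardMap (RangeLow, RangeHigh, ServerName)
VALUES (1073741824, 2147483647, N'SQLSHARD-C');
COMMIT;
GO

-- Usage with a constructed profile id.
SELECT dbo.fn_ShardServer('3F2504E0-4F89-11D3-9A0C-0305E82C3301') AS ServerName;
```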

Would it ever be wise to have a SQL server per web server?

I'm wondering if, under the circumstances that
You get lots more reads than writes
Your SQL server of choice is cheap/free and offers a fast mirroring/replication service
Your database isn't insanely large
rather than having separate SQL servers it would be better to have an instance of SQL on each machine getting instant updates from the master. This way there would be no network latency when doing all the read queries, but there would be a per box performance hit as the SQL instance has to execute. Would this be better overall for performance? Are there any other pros/cons that might come up?
Your SQL Server should always be on a different box to the webserver, of that there is no question.
How many DB servers and webservers you have, and how they mirror (or otherwise) is up to how you scale your application.
You have SQL Server on a different machine because it needs (and deserves) a lot of RAM.
It's quite a common architectural pattern to have read-only replicas of a database. We accept some degree of staleness in them; perhaps they are even only updated once a day.
The general rule will be that multiple copies will introduce complexity in terms of operations and management and tend to introduce the possibility of inconsistency of data - almost inevitably the copies will not be perfectly in step (or the costs of making them so will be too high).
An example: what happens if your replication processing breaks a bit, so that some, but not all, copies become stale? Now your users start to see radically different views of the world. How much might that matter to you? If it's a site with low value data (eg. celebrity sightings in London suburbs) then perhaps that's fine. If it's on-hand inventory, and being out of date means that your customers can't place orders, then maybe you care rather more.
My advice: things that sound simple at a boxes-on-paper sort of level don't always work out that way when you're sitting in an operations room at 3AM. Be very sure that you can easily operate your solution.
How would your SQL Server be cheap/free? I should have said the licensing costs for this setup would be crippling. At retail prices you're looking at $6000 per server. See also Jeff's comments about costs. Scale out the web servers by all means, but not your SQL Server until it's pretty much on its knees.
You might instead want to think about a distributed cache like Velocity or NCache.
Either way, run your site first with one SQL server and see how it copes with the load, then think about mirroring/replication across servers, otherwise you're just optimising prematurely. Measure first!
An immediate con is that there is no distributed lock co-ordinator in SQL Server so you can get merge conflicts as updates can change the same row on two different servers at the same time.
Depending on the size of the database and the disks in the web servers, you will find your network latency is smaller than the disk latency you will start suffering as the web server disks will not usually be as performant as the disk array you give to the database. If you wanted that kind of performance, you would be buying it per web server.
Replication performance is not without latency either; the distribution of the transactions isn't 'free', and careful maintenance of the transaction log would have to be planned to ensure you did not get log fragmentation (too many VLFs within the transaction log), which kills replication performance.

Why is it not advisable to have the database and web server on the same machine?

Listening to Scott Hanselman's interview with the Stack Overflow team (part 1 and 2), he was adamant that the SQL server and application server should be on separate machines. Is this just to make sure that if one server is compromised, both systems aren't accessible? Do the security concerns outweigh the complexity of two servers (extra cost, dedicated network connection between the two, more maintenance, etc.), especially for a small application, where neither piece is using too much CPU or memory? Even with two servers, with one server compromised, an attacker could still do serious damage, either by deleting the database, or messing with the application code.
Why would this be such a big deal if performance isn't an issue?
Security. Your web server lives in a DMZ, accessible to the public internet and taking untrusted input from anonymous users. If your web server gets compromised, and you've followed least privilege rules in connecting to your DB, the maximum exposure is what your app can do through the database API. If you have a business tier in between, you have one more step between your attacker and your data. If, on the other hand, your database is on the same server, the attacker now has root access to your data and server.
Scalability. Keeping your web server stateless allows you to scale your web servers horizontally pretty much effortlessly. It is very difficult to horizontally scale a database server.
Performance. 2 boxes = 2 times the CPU, 2 times the RAM, and 2 times the spindles for disk access.
All that being said, I can certainly see reasonable cases that none of those points really matter.
It doesn't really matter (you can quite happily run your site with web/database on the same machine); it's just the easiest step in scaling.
It's exactly what StackOverflow did - starting with single machine running IIS/SQL Server, then when it started getting heavily loaded, a second server was bought and the SQL server was moved onto that.
If performance is not an issue, do not waste money buying/maintaining two servers.
On the other hand, referring to a different blogging Scott (Watermasyck, of Telligent) - they found that most users could speed up the websites (using Telligent's Community Server) by putting the database on the same machine as the web site. However, in their customers' case, usually the db & web server are the only applications on that machine, and the website isn't straining the machine that much. Then, the efficiency of not having to send data across the network more than made up for the increased strain.
Tom is correct on this. Some other reasons are that it isn't cost effective and that there are additional security risks.
Webservers have different hardware requirements than database servers. Database servers fare better with a lot of memory and a really fast disk array, while web servers only require enough memory to cache files and frequent DB requests (depending on your setup). Regarding cost effectiveness, the two servers won't necessarily be less expensive; however, the performance/cost ratio should be higher since you don't have two different applications competing for resources. For this reason, you're probably going to have to spend a lot more for one server which caters to both and offers equivalent performance to 2 specialized ones.
The security concern is that if the single machine is compromised, both webserver and database are vulnerable. With two servers, you have some breathing room as the 2nd server will still be secure (for a while at least).
Also, there are some scalability benefits since you may only have to maintain a few database servers that are used by a bunch of different web applications. This way you have less work to do applying upgrades or patches and doing performance tuning. I believe that there are server management tools for making these tasks easier though (in the single machine case).
I would think the big factor would be performance. Both the web server/app code and SQL Server would cache commonly requested data in memory and you're killing your cache performance by running them in the same memory space.
Security is a major concern. Ideally your database server should be sitting behind a firewall with only the ports required to perform data access opened. Your web application should be connecting to the database server with a SQL account that has just enough rights for the application to function and no more. For example you should remove rights that permit dropping of objects and most certainly you shouldn't be connecting using accounts such as 'sa'.
In the event that you lose the web server to a hijack (i.e. a full blown privilege escalation to administrator rights), the worst case scenario is that your application's database may be compromised but not the whole database server (as would be the case if the database server and web server were the same machine). If you've encrypted your database connection strings and the hacker isn't savvy enough to decrypt them then all you've lost is the web server.
One factor that hasn't been mentioned yet is load balancing. If you start off thinking of the web server and the database as separate machines, you optimize for fewer network round trips and also it gets easier to add a second web server or a second database engine as needs increase.
I agree with Daniel Earwicker - the security question is pretty much flawed.
If you have a single box setup with a webserver and only the database for that webserver on it, if that webserver is compromised you lose both the webserver and only the database for that specific application.
This is exactly the same as what happens if you lose the webserver on a 2-server setup. You lose the web server, and just the database for that specific application.
The argument that 'the rest of the DB server's integrity is maintained' where you have a 2-server setup is irrelevant, because in the first scenario, every other database server relating to every other application (if there are any) remain unaffected as well - being, as they are, hosted elsewhere.
Similarly, to the question posed by Kev 'what about all the other databases residing on the DB server? All you've lost is one database.'
if you were hosting an application and database on one server, you would only host databases on that server which related to that application. Therefore, you would not lose any additional databases in a single server setup when compared to a multiple server setup.
By contrast, in a 2 server setup, where the attacker had access to the Web Server, and by proxy, limited rights (in the best case scenario) to the database server, they could put the databases of every other application at risk by carrying out slow, memory intensive queries or maximising the available storage space on the database server. By separating the applications out into their own concerns, very much like virtualisation, you also isolate them for security purposes in a positive way.
I can speak from first hand experience that it is often a good idea to place the web server and database on different machines. If you have an application that is resource intensive, it can easily cause the CPU cycles on the machine to peak, essentially bringing the machine to a halt. However, if your application has limited use of the database, it would probably be no big deal to have them share a server.
Wow, no one brings up the fact that if you actually buy SQL Server at 5k bucks, you might want to use it for more than your web application. If you're using Express, maybe you don't care. I see SQL servers run databases for 20 to 30 applications, so putting it on the webserver would not be smart.
Secondly, it depends on whom the server is for. I do work for financial companies and the govt. So we use a crazy pain in the arse approach of using only sprocs and limiting ports from webserver to SQL. So if the web app gets hacked, the only thing the hacker can do is call sprocs, as the user account on the webserver is locked down to only see/call sprocs on the DB. So now the hacker has to figure out how to get into the DB. If it's on the web server, well, it's kind of easy to get to.
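The least-privilege setup several answers above describe (an account with just enough rights, no 'sa', no ability to drop objects, sprocs only), as an added T-SQL example that is not code from any of the posters. Database, login, schema and table names are illustrative.

```sql
-- Added example, not code from any of the answers above: a dedicated login
-- that can only execute stored procedures in one schema of one database.
-- Names (ShopDb, WebAppLogin, WebAppUser, app schema, dbo.Orders) are made up.
USE master;
GO
-- A dedicated login for the web tier; never 'sa', never a sysadmin member.
-- The password comes from the deployment's secret store, not from source.
-- The web application's connection string uses this login and nothing else.
CREATE LOGIN WebAppLogin
    WITH PASSWORD = N'<password-from-secret-store>', CHECK_POLICY = ON;
GO
USE ShopDb;
GO
-- The application's callable surface lives in its own schema, owned by dbo so
-- that ownership chaining lets procedures read dbo tables the caller cannot.
CREATE SCHEMA app AUTHORIZATION dbo;
GO
CREATE TABLE dbo.Orders (
    OrderId    int           NOT NULL PRIMARY KEY,
    CustomerId int           NOT NULL,
    Total      decimal(10,2) NOT NULL
);
GO
CREATE PROCEDURE app.GetOrder @OrderId int
AS
BEGIN
    SET NOCOUNT ON;
    SELECT OrderId, CustomerId, Total FROM dbo.Orders WHERE OrderId = @OrderId;
END;
GO
CREATE USER WebAppUser FOR LOGIN WebAppLogin WITH DEFAULT_SCHEMA = app;
GO
-- EXECUTE on the app schema and nothing else: no direct table access, no DDL,
-- so DROP/ALTER of objects is impossible from a compromised web tier.
GRANT EXECUTE ON SCHEMA::app TO WebAppUser;
-- Explicit DENY so a later, broader GRANT (for example a db_datareader role
-- membership handed out in a hurry) cannot quietly widen the surface.
DENY SELECT, INSERT, UPDATE, DELETE, ALTER ON SCHEMA::dbo TO WebAppUser;
GO

-- Verify before deploying: impersonate the web account and check what it can do.
EXECUTE AS USER = 'WebAppUser';
EXEC app.GetOrder @OrderId = 1;          -- succeeds through the ownership chain
-- SELECT * FROM dbo.Orders;             -- would fail: SELECT permission denied
SELECT permission_name
FROM fn_my_permissions('dbo.Orders', 'OBJECT');   -- returns no rows
SELECT permission_name
FROM fn_my_permissions('app.GetOrder', 'OBJECT'); -- returns EXECUTE only
REVERT;
GO
```

With this in place, the worst case described above holds: an attacker on the web server can call exactly the procedures the application calls, and the rest of the instance is out of reach of that account.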
It depends on the application and the purpose. When high availability and performance are not critical, it's not bad not to separate the DB and web server. Especially considering the performance gains - if the application makes a large number of database queries, a considerable amount of network load can be removed by keeping it all on the same system, keeping the response times low.
I listened to that podcast, and it was amusing, but the security argument made no sense to me. If you've compromised server A, and that server can access data on server B, then you instantly have access to the data on server B.
I think it's because the two machines usually would need to be optimized in different ways. Other than that I have no idea; we run all our applications with the server-database on the same machine - granted we're not public facing - but we've had no problems.
I can't imagine that too many people care about one machine being compromised over both since the web application will usually have nearly unrestricted access to at the very least the data if not the schema inside the database.
Interested in what others might say.
Database licences are not cheap and are often charged per CPU; therefore, by separating out your web servers you can reduce the cost of your database licences.
E.g. if you have 1 server doing both web and database that contains 8 CPUs, you will have to pay for an 8 CPU licence. However, if you have two servers each with 4 CPUs and run the database on one server, you will only have to pay for a 4 CPU licence.
An additional concern is that databases like to take up all the available memory and hold it in reserve for when it wants to use it. You can force it to limit the memory but this can considerably slow data access.
Something not mentioned here, and the reason I am facing, is 0 downtime deployments. Currently I have DB/webserver on the same machine and that makes updates a pain. If they are on a separate machine, you can perform A/B releases.
I.e.:
The DNS currently points to WebServerA
Apply software updates to WebServerB
Change DNS to point to WebServerB
Work on WebServerA at leisure for the next round of updates.
This works because the state is stored in the DB, on a separate server.
Arguing that there is a real performance gain to be had by running a database server on a web server is a flawed argument.
Since Database servers take query strings and return result sets, the data actually flowing from data server to web server is relatively small, but the horsepower required to process the query and generate the result set is relatively large. Optimizing performance around the data transfer time therefore is optimizing around the wrong thing.
Regarding security, there are advantages to having the data server on a different box than the web server. Having such a setup is not the be all and end all of security, but it is a step in the right direction.
Regarding scalability, it is easy and relatively cheap to add web servers and put them into cluster to handle increased traffic. It is not so easy and cheap to add data servers and cluster them. Also, web servers and data servers have different hardware needs, so multiple boxes help out with scalability.
If you are starting small and have only one box, then a good way would go would be to use virtual machines. Running the web server and data server in different VMs on one host gives you all the gains of separate boxes at the cost of one large box price.
Operating system is another consideration. While your database may require larger memory spaces and therefore UNIX, your web server - or more specifically your app server, since you mention only two tiers - may be .NET-based, and therefore require Windows.
Ok! Here is the thing: it is more secure to have your DB server installed on another machine and your application on the web server. You then connect your application to the DB with a web link. That's it.