Load balancer and multiple instance of database design - database

The current single application server can handle about 5000 concurrent requests. However, the user base will be over millions and I may need to have two application servers to handle requests.
So the design is to have a load balancer to hope it will handle over 10000 concurrent requests. However, the data of each users are being stored in one single database. So the design is to have two or more servers, shall I do the followings?
Having two instances of databases
Real-time sync between two database
Is this correct?
However, if so, will the sync process lower down the performance of the servers
as Database replication seems costly.
Thank you.

You probably want to think of your service in "tiers". In this instance, you've got two tiers; the application tier and the database tier.
Typically, your application tier is going to be considerably easier to scale horizontally (i.e. by adding more application servers behind a load balancer) than your database tier.
With that in mind, the best approach is probably to overprovision your database (i.e. put it on its own, meaty server) and have your application servers all connect to that same database. Depending on the database software you're using, you could also look at using read replicas (AWS docs) to reduce the strain on your database.
You can also look at caching via Memcached / Redis to reduce the amount of load you're placing on the database.
So – tl;dr – put your DB on its own, big, server, and spread your application code across many small servers, all connecting to that same DB server.

Best option could be the synchronizing the standby node with data from active node as cost effective solution since it can be achievable using open source relational database(e.g. Maria DB).
Do not store computable results and statistics that can be easily doable at run time which may help reduce to data size.
If history data is not needed urgent for inquiries , it can be written to text file in easily importable format to database(e.g. .csv).
Data objects that are very oftenly updated can be kept in in-memory database as key value pair, use scheduled task to perform batch update/insert to relation database to achieve persistence
Implement retry logic for database batch update tasks to handle db downtimes or network errors
Consider writing data to relational database as serialized objects
Cache configuration data to memory from database either periodically or via API to refresh the changing part.

Related

Mirroring or caching database views in intermediary layer for web application?

I have an Oracle 11g database with an extremely complex and badly designed schema. This is a legacy system with many dependencies that supports several critical applications, modifying the schema of this database is unfortunately out of the question.
I am developing a web application (ASP.NET MVC 5) that acts as a read-only status dashboard for the information in this database. Currently, I rely on purpose built database views to get only the information the web application needs. Given the complexity of the schema many of these views perform very poorly. When the web application is busy with many users, the database struggles to keep up, usually resulting in time out errors. Also, when the database does fail for whatever reason, the web application cannot show any data. Users would still like the web application to show a snapshot of the data before the database failed.
The nature of the data is very dynamic, rows are being added/updated/deleted by several external systems and processes constantly, and I have no way of knowing when there is a change to the underlying data, so I have to re-query the view to get fresh data.
Because of this situation, we are considering removed the direct link between this database and the web application and instead create some sort of intermediary cache/database/magic layer between them. This way, the web application would get its data from this intermediary later without placing heavy load on the complex database. When the complex database fails, the web application can still query the last snapshot of data from this intermediary layer.
The question is, what should this intermediary layer be? Because I don't know when and how the underlying data changes I can't maintain a live cache of the data. Instead I would need to rely on snapshots of the views in this database.
This is our current idea:
We create a new, intermediary SQL Server/Oracle database. A job runs every 2 minutes for each database view we are currently using, queries it, then dumps the results into a table in our intermediary SQL Server/Oracle database. This would require truncating the intermediary table, then refilling it with the fresh view result data. In the meantime, the web application would be querying these intermediary tables for data. The obvious concern is what happens when the web application is trying to query the intermediary table while it is being truncated and repopulated with fresh data? Another concern is dealing with possible concurrency issues when grabbing data from views that share foreign keys or related data.
Normally, a web application would simply maintain a cache, but this would require being hooked into the add/update/delete events in the database to maintain the state of the cache. On top of that, the cache wouldn't be a full snapshot of the database. If the database were to fail, the cache would be unable to provide a snapshot of data from before the failure.
Any other suggestions on what this magical intermediary layer should be? We are looking at solutions available on both the database end (either SQL Server or Oracle) as well as solutions on the web application side (ASP.NET MVC 5, IIS).

Cloud Architecture

I'm researching cloud services to host an e-commerce site. And I'm trying to understand some basics on how they are able to scale things.
From what I can gather from AWS, Rackspace, etc documentation:
Setup 1:
You can get an instance of a webserver (AWS - EC2, Rackspace - Cloud Server) up. Then you can grow that instance to have more resources or make replicas of that instance to handle more traffic. And it seems like you can install a database local to these instances.
Setup 2:
You can have instance(s) of a webserver (AWS - EC2, Rackspace - Cloud Server) up. You can also have instance(s) of a database (AWS - RDS, Rackspace - Cloud Database) up. So the webserver instances can communicate with the database instances through a single access point.
When I use the term instances, I'm just thinking of replicas that can be access through a single access point and data is synchronized across each replica in the background. This could be the wrong mental image, but it's the best I got right now.
I can understand how setup 2 can be scalable. Webserver instances don't change at all since it's just the source code. So all the http requests are distributed to the different webserver instances and is load balanced. And the data queries have a single access point and are then distributed to the different database instances and is load balanced and all the data writes are sync'd between all database instances that is transparent to the application/webserver instance(s).
But for setup 1, where there is a database setup locally within each webserver instance, how is the data able to be synchronized across the other databases local to the other web server instances? Since the instances of each webserver can't talk to each other, how can you spin up multiple instances to scale the app? Is this setup mainly for sites with static content where the data inside the database is not getting changed? So with an e-commerce site where orders are written to the database, this architecture will just not be feasible? Or is there some way to get each webserver instance to update their local database to some master copy?
Sorry for such a simple question. I'm guessing the documentation doesn't say it plainly because it's so simple or I just wasn't able to find the correct document/page.
Thank you for your time!
Update:
Moved question to here:
https://webmasters.stackexchange.com/questions/32273/cloud-architecture
We have one server setup to be the application server, and our database installed across a cluster of separate machines on AWS in the same availability zone (initially three but scalable). The way we set it up is with a "k-safe" replication. This is scalable as the data is distributed across the machines, and duplicated such that one machine could disappear entirely and the site continues to function. THis also allows queries to be distributed.
(Another configuration option was to duplicate all the data on each of the database machines)
Relating to setup #1, you're right, if you duplicate the entire database on each machine with load balancing, you need to worry about replicating the data between the nodes, this will be complex and will take a toll on performance, or you'll need to sacrifice consistency, or synchronize everything to a single big database and then you lose the effect of clustering. Also keep in mind that when throughput increases, adding an additional server is a manual operation that can take hours, so you can't respond to throughput on-demand.
Relating to setup #2, here scaling the application is easy and the cloud providers do that for you automatically, but the database will become the bottleneck, as you are aware. If the cloud provider scales up your application and all those application instances talk to the same database, you'll get more throughput for the application, but the database will quickly run out of capacity. It has been suggested to solve this by setting up a MySQL cluster on the cloud, which is a valid option but keep in mind that if throughput suddenly increases you will need to reconfigure the MySQL cluster which is complex, you won't have auto scaling for your data.
Another way to do this is a cloud database as a service, there are several options on both the Amazon and RackSpace clouds. You mentioned RDS but it has the same issue because in the end it's limited to one database instance with no auto-scaling. Another MySQL database service is Xeround, which spreads the load over several database nodes, and there is a load balancer that manages the connection between those nodes and synchronizes the data between the partitions automatically. There is a single access point and a round-robin DNS that sends the requests to up to thousands of database nodes. So this might answer your need for a single access point and scalability of the database, without needing to setup a cluster or change it every time there is a scale operation.

Caching to a local SQL instance on a web server

I run a very high traffic(10m impressions a day)/high revenue generating web site built with .net. The core meta data is stored on a SQL server. My team and I have a unique caching strategy that involves querying the database for new meta data at regular intervals from a middle tier server, serializing the data to files and sending those to the web nodes. The web application uses the data in these files (some are actually serialized objects) to instantiate objects and caches those in memory to use for real time requests.
The advantage of this model is that it:
Allows the web nodes to cache all data in memory and not incur any IO overhead querying a database.
If the database ever goes down either unexpectedly or for maintenance windows, the web servers will continue to run and generate revenue. You can even fire up a web server without having to retrieve its initial data from the DB because all the data it needs are in files on its own disks.
Allows us to be completely horizontally scalable. If throughput suffers, we can just add a web server.
The disadvantages are that this caching and persistense layers adds complexity in the code that queries the database, packages the data and unpackages it on the web server. Any time our domain model requires us to add entities, more of this "plumbing" has to be coded. This architecture has been in place for four years and there are probably better ways to tackle this.
One strategy I have been considering is using replication to replicate our master sql server database to local database instances installed on each web server. The web server application would use normal sql/ORM techniques to instantiate objects. Here, we can still sustain a master database outage and we would not have to code up specialized caching code and could instead use nHibernate to handle the persistence.
This seems like a more elegant solution and would like to see what others think or if anyone else has any alternatives to suggest.
I think you're overthinking this. SQL Server already has mechanisms available to you to handle these kinds of things.
First, implement a SQL Server cluster to protect your main database. You can fail over from node to node in the cluster without losing data, and downtime is a matter of seconds, max.
Second, implement database mirroring to protect from a cluster failure. Depending on whether you use synchronous or asynchronous mirroring, your mirrored server will either be updated in realtime or a few minutes behind. If you do it in realtime, you can fail over to the mirror automatically inside your app - SQL Server 2005 & above support embedding the mirror server's name in the connection string, so you don't even have to lift a finger. The app just connects to whatever server's live.
Between these two things, you're protected from just about any main database failure short of a datacenter-wide power outage or network outage, and there's none of the complexity of the replication stuff. That covers your high availability issue, and lets you answer the scaling question separately.
My favorite starting point for scaling is using three separate connection strings in your application, and choose the right one based on the needs of your query:
Realtime - Points directly at the one master server. All writes go to this connection string, and only the most mission-critical reads go here.
Near-Realtime - Points at a load balanced pool of read-only SQL Servers that are getting updated by replication or log shipping. In your original design, these lived on the web servers, but that's dangerous practice and a maintenance nightmare. SQL Server needs a lot of memory (not to mention money for licensing) and you don't want to be tied into adding a database server for every single web server.
Delayed Reporting - In your environment right now, it's going to point to the same load-balanced pool of subscribers, but down the road you can use a technology like log shipping to have a pool of servers 8-24 hours behind. These scale out really well, but the data's far behind. It's great for reporting, search, long-term history, and other non-realtime needs.
If you design your app to use those 3 connection strings from the start, scaling is a lot easier, and doesn't involve any coding complexity - just pick the right connection string.
Have you considered memcached? Since it is:
in memory
can run locally
fully scalable horizontally
prevents the need to re-cache on each web server
It may fit the bill. Check out Google for lots of details and usage stories.
Just some addition to what RickNZ proposed above..
Since your master data which you are caching currently won't change so frequently and probably over some maintenance window, here is what should you do first on database side:
Create a SNAPSHOT replication for the master tables which you want to cache. Adding new entities will be equally easy.
On all the webservers, install SQL Express and subscribe to this Publication.
Since, this is not a frequently changing data, you can rest assure, no much server resource usage issue minus network trips for master data.
All your caching which was available via previous mechanism is still availbale minus all headache which comes when you add new entities.
Next, you can leverage .NET mechanisms as suggested above. You won't face memcached cluster failure unless your webserver itself goes down. There is a lot availble in .NET which a .NET pro can point out after this stage.
It seems to me that Windows Server AppFabric is exactly what you are looking for. (AKA "Velocity"). From the introductory documentation:
Windows Server AppFabric provides a
distributed in-memory application
cache platform for developing
scalable, available, and
high-performance applications.
AppFabric fuses memory across multiple
computers to give a single unified
cache view to applications.
Applications can store any
serializable CLR object without
worrying about where the object gets
stored. Scalability can be achieved by
simply adding more computers on
demand. The cache also allows for
copies of data to be stored across the
cluster, thus protecting data against
failures. It runs as a service
accessed over the network. In
addition, Windows Server AppFabric
provides seamless integration with
ASP.NET that enables ASP.NET session
objects to be stored in the
distributed cache without having to
write to databases. This increases
both the performance and scalability
of ASP.NET applications.
Have you considered using SqlDependency caching?
You could also write the data to the local disk at the web tier, if you're concerned about initial start-up time or DB outages. But at least with a SqlDependency, you shouldn't have to poll the DB to look for changes. It can also be made relatively transparent.
In my experience, adding a DB instance on web servers generally doesn't work out too well from a scalability or performance perspective.
If you're concerned about performance and scalability, you might consider partitioning your data tier. The specifics depend on your app, but as an example, you could move read-only data onto a couple of SQL Express servers that are populated with replication.
In case it helps, I talk about this subject at length in my book (Ultra-Fast ASP.NET).

What is the use of replication in SQLSERVER2005

Hi can any body tell me what is use of replication in sqlserver2005.
backup and replicaton looks same?what is diference b/w them
Backups are exactly that: backups. They enable you to recover the data if something bad happens.
Replication is another beast entirely. It basically distributes the data across multiple nodes so that each node has a complete, (close to) up-to-date copy of the data.
There are a number of reasons why you would use replication including, but not limited to:
High availability so that, if one node goes down, other nodes can still service requests.
Geographical distribution, meaning your data can be placed close to those that need it. Clients in Belarus don't need to go all the way to Montana to get the data if you maintain a local replica in Belarus (or somewhere close) - this is for performance. You may have 10,000 clients in Belarus - it's quicker to send one copy over than have all 10,000 request data [although this depends on how often they request data].
Prioritization. If your reporting users (bank management) have a lower service level agreement than your customer-facing staff (bank tellers) [and they should], you can put all the management onto a replica so as not to slow down the primary copy.
Replication is used for a different purpose, for example to make reports without putting that load on the 'real' database.
Replication increases system availability. If one set of database is down, you can serve out of replica.
Backup saves you from catastrophic errors such as human error that dropped the production database. Note that in this case, replication won't save you as it will dutifully replicate drop command.
SQL Server replication is the process of distributing data from a source database to one or more destination databases throughout the enterprise.
Replication is a great solution for maintaining a reporting server.
Clients at the site to which the data is replicated experience improved performance because those clients can access data locally rather than connecting to a remote database server over a network.
Clients at all sites experience improved availability of replicated data. If the local copy of the replicated data is unavailable, clients can still access the remote copy of the data.
Replication: Lots of data, fast and most recent.
Backup/Restore: Some data, perhaps a bit slower, and a specific point in time.
Replication can be used to address a number of different scenarios as detailed below.
Just to be clear however, Replication is not the same as Database Backup
Scenarios:
Server to server: Replicating Data in a Server to Server Environment
Improving Scalability and Availability
Data Warehousing and Reporting
Integrating Data from Multiple Sites(Server)
Integrating Heterogeneous
Data Offloading Batch Processing
Server to client: Replicating Data Between a Server and Clients
Exchanging Data with Mobile Users
Consumer Point of Sale (POS)
Applications Integrating Data from
Multiple Sites (Client)
For a full overview of Microsoft SQL Server Replication see the following Microsoft reference.
http://msdn.microsoft.com/en-us/library/ms151198(SQL.90).aspx
Choose the track that is most appropriate to you (i.e. Developer / Architect) and all shall be revealed :-)

Pattern for very slow DB Server

I am building an Asp.net MVC site where I have a fast dedicated server for the web app but the database is stored in a very busy Ms Sql Server used by many other applications.
Also if the web server is very fast, the application response time is slow mainly for the slow response from the db server.
I cannot change the db server as all data entered in the web application needs to arrive there at the end (for backup reasons).
The database is used only from the webapp and I would like to find a cache mechanism where all the data is cached on the web server and the updates are sent to the db asynchronously.
It is not important for me to have an immediate correspondence between read db data and inserted data: think like reading questions on StackOverflow and new inserted questions that are not necessary to show up immediately after insertion).
I thought to build an in between WCF service that would exchange and sync the data between the slow db server and a local one (may be an Sqllite or an SqlExpress one).
What would be the best pattern for this problem?
What is your bottleneck? Reading data or Writing data?
If you are concerning about reading data, using a memory based data caching machanism like memcached would be a performance booster, As of most of the mainstream and biggest web sites doing so. Scaling facebook hi5 with memcached is a good read. Also implementing application side page caches would drop queries made by the application triggering lower db load and better response time. But this will not have much effect on database servers load as your database have some other heavy users.
If writing data is the bottleneck, implementing some kind of asyncronyous middleware storage service seems like a necessity. If you have fast and slow response timed data storage on the frontend server, going with a lightweight database storage like mysql or postgresql (Maybe not that lightweight ;) ) and using your real database as an slave replication server for your site is a good choise for you.
I would do what you are already considering. Use another database for the application and only use the current one for backup-purposes.
I had this problem once, and we decided to go for a combination of data warehousing (i.e. pulling data from the database every once in a while and storing this in a separate read-only database) and message queuing via a Windows service (for the updates.)
This worked surprisingly well, because MSMQ ensured reliable message delivery (updates weren't lost) and the data warehousing made sure that data was available in a local database.
It still will depend on a few factors though. If you have tons of data to transfer to your web application it might take some time to rebuild the warehouse and you might need to consider data replication or transaction log shipping. Also, changes are not visible until the warehouse is rebuilt and the messages are processed.
On the other hand, this solution is scalable and can be relatively easy to implement. (You can use integration services to pull the data to the warehouse for example and use a BL layer for processing changes.)
There are many replication techniques that should give you proper results. By installing a SQL Server instance on the 'web' side of your configuration, you'll have the choice between:
Making snapshot replications from the web side (publisher) to the database-server side (suscriber). You'll need a paid version of SQLServer on the web server. I have never worked on this kind of configuration but it might use a lot of the web server ressources at scheduled synchronization times
Making merge (or transactional if requested) replication between the database-server side (publisher) and web side(suscriber). You can then use the free version of MS-SQL Server and schedule the synchronization process to run according to your tolerance for potential loss of data if the web server goes down.
I wonder if you could improve it adding a MDF file in your Web side instead dealing with the Sever in other IP...
Just add an SQL 2008 Server Express Edition file and try, as long as you don't pass 4Gb of data you will be ok, of course there are more restrictions but, just for the speed of it, why not trying?
You should also consider the network switches involved. If the DB server is talking to a number of web servers then it may be being constrained by the network connection speed. If they are only connected via a 100mb network switch then you may want to look at upgrading that too.
the WCF service would be a very poor engineering solution to this problem - why make your own when you can use the standard SQLServer connectivity mechanisms to ensure data is transferred correctly. Log shipping will send the data across at selected intervals.
This way, you get the fast local sql server, and the data is preserved correctly in the slow backup server.
You should investigate the slow sql server though, the performance problem could be nothing to do with its load, and more to do with the queries and indexes you're asking it to work with.

Resources