I am working on a large scale web app where the users can be from anywhere in the world.
Considerations:
1. Web servers need to be distributed across possibly 3 data centers, possibly on 3 continents.
2. Each data center might start with 2 web servers (ASP.NET) and scale out from there.
3. The database needs to be partitioned (SQL Server sharding). We are not thinking of separate database instances and mirroring.
4. The application will have media content, so a CDN is probably the right fit for that.
Hosting Options:
1. Azure/AWS IaaS: in this case the web and application servers need to be configured and managed by us.
2. Azure/AWS PaaS: here we get tied to vendor-specific tools, code blocks and "ways of doing things", and then one fine morning they announce they are retiring a service we depend on (e.g. SQL Azure Federation). We also have to consider the limits, such as the 150 GB max database size for Azure SQL and throttling within shared services.
So, hosting option 1 looks like the safer bet.
Now my questions:
I need a load balancing server in each data center that routes traffic to the 2 or more web servers in that data center. But what about routing traffic from anywhere in the world to the appropriate data center? In the IaaS model, where do I put the load balancer that distributes web traffic across data centers?
I came across Azure Traffic Manager, which seems to take care of problem 1 above, but does it work with their IaaS offering? What is the equivalent in AWS? The desired behaviour is that when a user connects from APAC, they get directed to the DC in Asia.
In the sharding model, we want to partition specific database tables, not all of them. I am not very familiar with this, but how does failover work in a sharded database? Can I have active/passive SQL Servers for each member database server in the federation? (BTW, is SQL Federation the same as sharding? A rough sketch of the kind of routing I mean is below.)
The application itself is ASP.NET and SQL Server based.
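To make the sharding question concrete, here is a minimal sketch of one way app-side routing over mirrored member databases could look. The server names, the modulo-based shard map, and the choice of which tables are sharded are illustrative assumptions, not a vendor feature or a recommendation:

    // Rough sketch only: a shard map that routes a sharding key to a member database,
    // where each member is a mirrored (active/passive) pair reached via the
    // "Failover Partner" connection-string keyword. All names here are made up.
    using System.Collections.Generic;
    using System.Data.SqlClient;

    public static class ShardMap
    {
        private static readonly List<string> Members = new List<string>
        {
            "Server=sql-na-1;Failover Partner=sql-na-2;Database=AppShard0;Integrated Security=True",
            "Server=sql-eu-1;Failover Partner=sql-eu-2;Database=AppShard1;Integrated Security=True",
            "Server=sql-ap-1;Failover Partner=sql-ap-2;Database=AppShard2;Integrated Security=True",
        };

        // Only the sharded tables (e.g. Orders) are routed this way;
        // reference tables can stay in one central database.
        public static SqlConnection OpenForCustomer(int customerId)
        {
            var conn = new SqlConnection(Members[customerId % Members.Count]);
            conn.Open();   // if the principal is unreachable, the client tries the failover partner
            return conn;
        }
    }

In these terms, SQL Azure Federation was Microsoft's built-in implementation of the pattern, while "sharding" is the general technique sketched above.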
Max size is now 1 terabyte for SQL Azure - see https://azure.microsoft.com/en-us/pricing/details/sql-database/ and look at the premium tier...
Apologies for the noob question, I've never dealt with failover before.
Currently we have a single hardware server running Windows Server, SQL Server, ASP.NET and a single (very large) web application. We are considering migrating this to an Azure VM.
I see in the SLA that Microsoft will only guarantee 99.95% availability if I am running more than one instance of an Azure VM, to allow for failure and reboots etc.
Does this mean I therefore would have two servers to manage and maintain? For example, two versions of SQL with a database on each, and two sets of ASP.NET application files? If correct, this puts the price up dramatically.
I assume there is no way to 'mirror' one server across to the other to reduce this workload?
Also, our hardware server has 25,000 uploaded files on it. Would we need to put these on a VHD then 'link' them to whichever live server was running, or does Azure do this automatically? Or do they have to be mirrored from the live server to the failover server?
Any pointers would be appreciated. I've already read all the Azure documentation but it hasn't really made things much clearer...
It seems like you have several topics to look after.
Let's start with the database. The easiest thing would be to migrate your SQL Server database into SQL Azure. Then you would not need to maintain it, or the machines it runs on.
This gives you the advantage that this central component can be used by one or many applications.
Second are your uploaded files. I assume your application lets users upload files for sharing or something similar. The best thing would be to write these files into Windows Azure blob storage. Often this means rewriting a connector, but it centralizes another component.
As a first step you could make the blobs publicly available so clients can download them via a link. If not, your application can read the files from blob storage and deliver them to the customer itself.
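As a rough illustration of that connector idea, here is a minimal sketch using the classic Windows Azure storage client library; the container name and connection string are placeholders:

    // Minimal sketch: store an uploaded file in blob storage and return a link to it.
    // Assumes the Microsoft.WindowsAzure.Storage client library; names are placeholders.
    using System.IO;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    public class UploadStore
    {
        private readonly CloudBlobContainer _container;

        public UploadStore(string connectionString)
        {
            CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
            CloudBlobClient client = account.CreateCloudBlobClient();
            _container = client.GetContainerReference("uploads");
            _container.CreateIfNotExists();
        }

        // One blob per uploaded file; the returned URI can be handed to the client
        // directly if the container is public, or used by the app to stream the file.
        public string Save(string fileName, Stream content)
        {
            CloudBlockBlob blob = _container.GetBlockBlobReference(fileName);
            blob.UploadFromStream(content);
            return blob.Uri.ToString();
        }
    }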
If you don't want to rewrite that component, you will have to use a VHD. One VHD can only hold one lease, so only one instance can use it at a time. A common pattern I have seen is that, on startup, the application tries to "recover" the lease and keeps retrying until it gets it (trial-and-error style).
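Something along these lines, again only a sketch and assuming the same classic storage library; the 30-second retry interval and the infinite lease duration are arbitrary choices:

    // Minimal sketch: retry until this instance acquires the lease on the shared VHD's page blob.
    using System;
    using System.Threading;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Blob;

    public static class VhdLease
    {
        public static string AcquireWithRetry(CloudPageBlob vhdBlob)
        {
            while (true)
            {
                try
                {
                    // null duration = infinite lease; the returned lease id proves ownership
                    return vhdBlob.AcquireLease(null, null);
                }
                catch (StorageException)
                {
                    // Another instance still holds the lease; wait and try again.
                    Thread.Sleep(TimeSpan.FromSeconds(30));
                }
            }
        }
    }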
Last but not least, your ASP.NET application. Here I would look into cloud service instances (web roles). Try not to use VMs, because then you have to do all the management yourself; VMs are the IaaS offering. A .NET application should be fairly easy to convert and deploy as instances.
Then you don't have to think about failover and so on. Just deploy 2 instances and the load balancer will do the rest.
If you are able to "outsource" the SQL Server, you can use smaller machines for the ASP.NET application. Try to scale out rather than scale up; that means more, smaller nodes instead of one big one (if possible).
If you really go the VM route, you have to manage everything yourself, and yes, then you need 2 VMs. You may even need 3, because there is no automatic load balancer, and with only 2 VMs just one machine can expose port 80.
HTH
Our company has used Access for its database needs. It wants to stick with the current database frontend, but migrate the tables to some cloud based solution. We do not want to host SharePoint or pay a monthly fee for Office 365. I have used MySql as a backend, but we had to install drivers. We will also be gathering data from Google Forms on a regular basis. Can anyone suggest possible options for this combination? Data from Google Forms, Access frontend, and cloud based backend.
Thanks!
When you say you don't want any monthly fees, are you expecting to find a database server system, free and without cost, that allows external connections? I simply don't think you are going to find such a service for free.
Since Office 365 starts at $6 per month, I am not sure why you think this is too high a cost. You could certainly consider up-sizing your Access back-end tables to Office 365 and continue to use your Access front end. Even better, with Access 2010 this gives you an "offline", disconnected mode, which means your application will continue to run EVEN WITH NO internet connection. The instant you find wifi, the data sync process starts again. This sync is not file based but record based; it is really replication built into the product, and the setup requires ZERO extra code on your part.
And since your back end is no longer an Access file, you can now scale out to millions of users – the only real limit is the size of the 365 server farm (a super huge computer farm).
Keep in mind that in addition to simply linking your Access application to these Office 365 tables, you can also publish Access web forms to Office 365. In the following video, at the halfway point, I switch to running the Access application 100% in a browser:
http://www.youtube.com/watch?v=AU4mH0jPntI
Note that the resulting browser application does not require any ActiveX or Silverlight. And as noted, this again runs on that massive server farm.
Another cloud approach is to consider SQL Azure. Access 2010 also has baked into the product the ability to use the cloud-based edition of SQL Server running on the Azure OS.
So you could consider SQL Azure, but that is going to be about $10 per month.
I think the Office 365 deal at $6 per month is the best bet (and you get Lync, which gives you remote desktop support for your customers, or perhaps for supporting this application!). I actually think Lync alone makes the $6 worth it. Toss in most SharePoint features and document sharing (including free web-based Word and Excel) and this is hard to beat.
So it is not clear why you are avoiding Office 365, but you will have to adopt some kind of server setup here, and I am not aware of ANY system that will allow a free external connection from desktop client software such as Access.
I think the best solution is 365 for use with Access.
Another low-cost solution I used in the past is one of the VERY cheap web hosting packages that also allow external connections to their database. In fact I did this for a good number of years (I did not even use the web site hosting!). I simply purchased the monthly web site and used the ability to connect externally to the database server that was part of the hosting package. I was deploying Access front ends to multiple places against that cheap-o web hosting account, at VERY low cost.
However, I am now dropping that low-cost web setup, since Office 365 costs even less than the cheap-o $9 web package I was using.
So, at the end of the day, I don't think there is any free hosting that allows external database connections, but the lowest-cost approach at this point in time is Office 365.
Cloud based does not mean you have to think about your database backend any differently. You could, if you wished, stick with MS Access; however, as Access does not natively support remote connections, you would need to set up a VPN to your cloud server in order to connect to the .mdb/.accdb file.
Dedicated database servers are always a nice option (MySQL, SQL Server Express and PostgreSQL, to mention the free ones), but you will always need to ensure you have the necessary drivers installed (which shouldn't be the end of the world).
As for Google forms, I don't really have much experience with those but I imagine Google would have made every effort to ensure they can be implemented relatively easily with existing database products.
I read some things about hosted (aka cloud) databases. For example, Cloudant offers a hosted CouchDB database and Cassandra.io offers hosted Cassandra. I understand why these services solve some problems.
My question: why do these services work? Suppose I host my own application on my own servers (or somewhere on a cloud hosting platform) and use one of these services to store my data. For every database request (either read or write), I pay a full round trip over the internet (assuming my application is not hosted in the same place my database cloud provider uses). Why aren't these round trips killing me? Thinking in SQL terms, every query would cost another x*10 ms just for the network, before any time is spent actually executing it.
How is this problem solved? Or are these services not suitable for applications which need fast responses and can only be used for data processing where latency is not an issue?
Generally, the physical hosts of hosted database services reside in major data centers (e.g. AWS). To reduce network latency, customers can choose to host their application on servers that reside in the same physical data center as their hosted database.
The majority of high-performance applications and/or websites that do not use hosted database services keep their application servers and database servers on separate hosts anyway, for performance reasons. So, in short, switching to a hosted database service would not necessarily increase network latency.
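If you want to put a number on it before committing, one rough approach is to time a trivial query against the candidate host from wherever your application would run. A minimal sketch, using SQL Server as the example since that is what the rest of this thread uses (the connection string is a placeholder):

    // Minimal sketch: measure the average per-query round trip to a remote database.
    using System;
    using System.Data.SqlClient;
    using System.Diagnostics;

    class LatencyProbe
    {
        static void Main()
        {
            using (var conn = new SqlConnection(
                "Server=your-hosted-db.example.com;Database=app;User Id=app;Password=..."))
            {
                conn.Open();
                const int runs = 20;
                var sw = Stopwatch.StartNew();
                for (int i = 0; i < runs; i++)
                {
                    using (var cmd = new SqlCommand("SELECT 1", conn))
                    {
                        cmd.ExecuteScalar();   // each call pays one network round trip
                    }
                }
                sw.Stop();
                Console.WriteLine("Average round trip: {0} ms", sw.ElapsedMilliseconds / runs);
            }
        }
    }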
I'm researching cloud services to host an e-commerce site. And I'm trying to understand some basics on how they are able to scale things.
From what I can gather from AWS, Rackspace, etc documentation:
Setup 1:
You can get an instance of a webserver (AWS - EC2, Rackspace - Cloud Server) up. Then you can grow that instance to have more resources or make replicas of that instance to handle more traffic. And it seems like you can install a database local to these instances.
Setup 2:
You can have instance(s) of a webserver (AWS - EC2, Rackspace - Cloud Server) up. You can also have instance(s) of a database (AWS - RDS, Rackspace - Cloud Database) up. So the webserver instances can communicate with the database instances through a single access point.
When I use the term instances, I'm just thinking of replicas that can be accessed through a single access point, with data synchronized across each replica in the background. This could be the wrong mental image, but it's the best I have right now.
I can understand how setup 2 can be scalable. The web server instances don't change at all, since they only hold source code, so all the HTTP requests are distributed across the different web server instances and load balanced. The data queries have a single access point and are then distributed across the different database instances and load balanced, and all data writes are synced between the database instances in a way that is transparent to the application/web server instance(s).
But for setup 1, where there is a database set up locally within each web server instance, how is the data synchronized across the databases local to the other web server instances? Since the web server instances can't talk to each other, how can you spin up multiple instances to scale the app? Is this setup mainly for sites with static content where the data inside the database doesn't change? So with an e-commerce site where orders are written to the database, would this architecture just not be feasible? Or is there some way to get each web server instance to sync its local database to some master copy?
Sorry for such a simple question. I'm guessing the documentation doesn't say it plainly because it's so simple or I just wasn't able to find the correct document/page.
Thank you for your time!
Update:
Moved question to here:
https://webmasters.stackexchange.com/questions/32273/cloud-architecture
We have one server set up as the application server, and our database installed across a cluster of separate machines on AWS in the same availability zone (initially three, but scalable). We set it up with "k-safe" replication. This is scalable because the data is distributed across the machines, and duplicated such that one machine could disappear entirely and the site would continue to function. This also allows queries to be distributed.
(Another configuration option was to duplicate all the data on each of the database machines)
Relating to setup #1, you're right: if you duplicate the entire database on each machine behind a load balancer, you need to worry about replicating the data between the nodes. That is complex and takes a toll on performance, or you have to sacrifice consistency, or synchronize everything to a single big database, at which point you lose the benefit of clustering. Also keep in mind that when throughput increases, adding an additional server is a manual operation that can take hours, so you can't respond to throughput changes on demand.
Relating to setup #2, here scaling the application is easy and the cloud providers do that for you automatically, but the database will become the bottleneck, as you are aware. If the cloud provider scales up your application and all those application instances talk to the same database, you'll get more throughput for the application, but the database will quickly run out of capacity. It has been suggested to solve this by setting up a MySQL cluster on the cloud, which is a valid option but keep in mind that if throughput suddenly increases you will need to reconfigure the MySQL cluster which is complex, you won't have auto scaling for your data.
Another way to do this is a cloud database as a service; there are several options on both the Amazon and Rackspace clouds. You mentioned RDS, but it has the same issue because in the end it is limited to one database instance with no auto-scaling. Another MySQL database service is Xeround, which spreads the load over several database nodes, with a load balancer that manages the connections between those nodes and synchronizes the data between the partitions automatically. There is a single access point and a round-robin DNS that sends the requests to up to thousands of database nodes. So this might answer your need for a single access point and database scalability, without needing to set up a cluster or change it every time there is a scale operation.
We're getting ready to build a new platform for our current system. Currently we install SQL Server Express locally for all our clients and all their data is stored there. While the process works pretty well, it's still a pain to add columns/tables etc. We also want our data to be available outside of the local install. So we're moving to a central web-based SQL database and creating a web-based application. Our new application will be a Silverlight 5 / WCF RIA Services / MVVM / Entity Framework application.
We've decided that either a web-hosted SQL Server database or a SQL Azure database is the way to go. However, I have no idea why I would choose one over the other. The limitations of Azure don't seem to apply to us, but our application will run on our current shared web host. Is it better to host the application on the same server as the database? Do we even know, with shared web hosting, that the server is in the same location as the app? There's also the marketing advantage of being 'in the cloud', which our clients love when we drop that word (they have no idea about anything technical; it's just a buzzword for them). I'm not too worried about the cost, as I think both will ultimately be about equivalent.
I feel like I may be completely overthinking this and either will work, however I'd like to try and get the best solution for us and don't want to choose without getting some feedback.
In case it helps, our application is mostly dashboard/informational data. Mostly financial and trending data. It's almost entirely read only. Sometimes the data can get fairly large and we would be sending upwards of 50,000 rows of data to the application.
Thanks for any help/insight you can provide for me!
The main concerns I would have with using a SQL Azure DB from an application on your current shared web host would be:
The effect of network latency: depending on location, every time you do a DB round trip from your application to the SQL Azure DB you will incur a 50-100 ms delay. If your application does lots of round trips, this mounts up. Often, if an application has been designed to work with a DB on the LAN (your use of local client DBs suggests this), it tends to be "chatty", since network delays are very small on the LAN. You may find your application slows down significantly.
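Reducing that chattiness usually means fetching everything a page needs in as few queries as possible. A minimal sketch of the idea (table and column names are made up):

    // Minimal sketch: one round trip for all the rows a dashboard needs,
    // instead of one query per account. Names here are illustrative only.
    using System.Collections.Generic;
    using System.Data.SqlClient;

    public static class DashboardData
    {
        public static List<decimal> LoadBalances(SqlConnection conn, int customerId)
        {
            var balances = new List<decimal>();
            using (var cmd = new SqlCommand(
                "SELECT Balance FROM Accounts WHERE CustomerId = @id", conn))
            {
                cmd.Parameters.AddWithValue("@id", customerId);
                using (var reader = cmd.ExecuteReader())
                {
                    while (reader.Read())
                        balances.Add(reader.GetDecimal(0));   // all rows arrive in one round trip
                }
            }
            return balances;
        }
    }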
Security: You will have to open up the SQL Azure firewall to the IP address(es) that your application presents when querying. Depending on your host, it may be that this IP address is shared between several tenants. This would be a vulnerability.
If neither of these is a problem, then SQL Azure will provide a much lower management overhead (e.g. no need to patch etc.) and will give you very high reliability, especially in terms of the risk of data loss.