Tools to demonstrate load balancing and scheduling in the cloud - google-app-engine

I have been asked by my professor at university to demonstrate how load balancing happens in the cloud, for example which tools are used to do it.
I have looked at some, such as nginx and Pen, but running them on my laptop to demonstrate load balancing seems a little far-fetched. I have also looked at Google App Engine, which seems feasible, but it load balances automatically.
Is there another way I can demonstrate this?
EDIT: I'm looking at Open-Source software only, I can't pay a fee to use it.

Have you considered Amazon?
Also:
http://www.rackspace.com/cloud/cloud_hosting_products/loadbalancers/
Does this help?
http://askville.amazon.com/open-source-load-balancer-software-run-Linux/AnswerViewer.do?requestId=7650176

Some pointers to think about:
TCP mode vs HTTP mode (some tools support both) - you will want to know what the difference is.
Load balancing versus load distribution.
Load balancing strategies (e.g. hashing/consistent hashing, cookies, round robin and more).
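Two of the strategies named above, round robin and consistent hashing, are small enough to demonstrate on a laptop without any cloud account. The following is only a sketch: the backend names are made up, and the virtual-node count of 100 is an arbitrary assumption, not a recommendation from any particular tool.

```python
import bisect
import hashlib
import itertools


def round_robin(backends):
    """Return an endless iterator that hands out backends in a fixed rotation."""
    return itertools.cycle(backends)


class ConsistentHashRing:
    """Map keys to backends so that removing a backend only moves its own keys."""

    def __init__(self, backends, replicas=100):
        # Each backend gets `replicas` virtual points on the ring to even out load.
        self._ring = sorted(
            (self._hash(f"{backend}#{i}"), backend)
            for backend in backends
            for i in range(replicas)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def backend_for(self, key):
        # Walk clockwise to the first virtual point at or after the key's hash,
        # wrapping around to the start of the ring if necessary.
        index = bisect.bisect_left(self._points, self._hash(key)) % len(self._points)
        return self._ring[index][1]


if __name__ == "__main__":
    servers = ["app-1", "app-2", "app-3"]  # hypothetical backend names
    rotation = round_robin(servers)
    print([next(rotation) for _ in range(5)])
    ring = ConsistentHashRing(servers)
    print(ring.backend_for("client-42"))
```

Round robin ignores who the client is, while the hash ring always sends the same key (for example a client IP or session ID) to the same backend, which is what makes it useful for sticky sessions and caches.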
Good luck!

Related

How can I improve Google App Engine performance?

My app currently serves users in Korea.
However, it is deployed in us-central because GAE does not support deploying to an Asian region.
So I suspect it is very slow because the users are far away from the GAE servers.
If that is the problem, how can I solve it?
Any suggestions would be appreciated. Thank you.
I have been using Google Cloud Platform for 4 years now, including Google App Engine. In my experience, a slow application backend usually means the program has not been optimised well. I would suggest trying some of the following to solve your problem:
Use Memcache as much as possible for requests that are common to many users and do not require instant real-time updates.
Look at the algorithms you are putting in place. This is very important for your execution throughput. For example, if you want to search through a billion records, use efficient algorithms; sorting the data first (e.g. with 3-way quicksort) lets you search it quickly afterwards.
Lastly, look at the choice of database you are using. You could mix NoSQL with SQL if you were only using SQL. If you are into big data, then use BigQuery. This way your application's performance can increase drastically and scale up enormously.
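To illustrate the caching suggestion, here is a minimal sketch of a cache with a time-to-live, so that data which does not need real-time updates is only recomputed occasionally. It is an in-process stand-in, not the App Engine Memcache API; the class name, the `get_or_compute` method and the `load_top_posts` query are all hypothetical.

```python
import time


class TTLCache:
    """In-process stand-in for a Memcache-style cache with expiry (illustrative only)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        entry = self._store.get(key)
        now = self._clock()
        if entry is not None and entry[0] > now:
            return entry[1]  # fresh hit: skip the expensive call
        value = compute()  # miss or expired: recompute and remember the result
        self._store[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    calls = []

    def load_top_posts():
        # Hypothetical slow query that every user would otherwise trigger.
        calls.append(1)
        return ["post-1", "post-2"]

    cache = TTLCache(ttl_seconds=60)
    cache.get_or_compute("top_posts", load_top_posts)
    cache.get_or_compute("top_posts", load_top_posts)
    print(len(calls))  # the slow query ran only once
```

The trade-off is staleness: users may see data up to `ttl_seconds` old, which is why this only suits requests that do not need instant updates.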

MEAN.JS, high latency / ways to find bottlenecks in web development

:)
I recently came across MEAN.JS. I'm still a beginner in web development, but everything has worked really well so far, except for one thing.
Unfortunately, all requests seem to take a huge amount of time: 300 - 4000(!) ms for a single call (have a look at the screenshot). I'm developing locally on a state-of-the-art computer and wonder where the bottleneck might be. Does anyone have the same issue? Could you give me a hint on how to attack this problem?
I've had a look at this and similar posts, but couldn't find a way to tackle it.
What are the ways to find bottlenecks in a web application?
The framework uses MongoDB, ExpressJS, AngularJS and Node.js. Could you give me a hint on how to track down the source of those latencies in a JavaScript-based application? (Maybe a tool, plugin or best-practice approach in development?) Have you experienced similar issues?
Greetings,
Tea
It's hard to guess what's wrong, as that latency can originate from many sources. However, if we put aside computer and network problems/configurations, and assume that you don't have any other processes running that can affect your app's performance, the first thing I would check is the Express configuration, i.e. the order in which the middleware is loaded. A misplaced middleware can indeed affect the app's performance.

Resolving Overloaded Webserver Issues

I am new to web development and am currently interviewing with companies. The questions people most like to ask are:
How do you scale your webserver if it starts hitting a million queries?
What would you do if you have just one database instance running at that time? How do you manage that?
These questions are really interesting and I would like to learn about them.
Please pour in your suggestions / practices (that you follow) for such scenarios
Thank you
How to scale:
Identify your bottlenecks.
Identify the correct solution for the problem.
Check to see if you can implement the correct solution.
Identify an alternate solution and check it.
Typical Scaling Options:
Vertical Scaling (bigger, faster server hardware)
Load balancing
Split tiers/components out onto more/other hardware
Offload work through caching/CDN
Database Scaling Options:
Vertical Scaling (bigger, faster server hardware)
Replication (active or passive)
Clustering (if DBMS supports it)
Sharding
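Two of the database options above, replication and sharding, can be sketched as routing decisions made by the application. This is only an illustration under assumptions: the server names are invented, the router does no health checking, and the modulo-based shard choice would reshuffle most keys if the shard count changed.

```python
import itertools
import zlib


class ReplicatedRouter:
    """Send writes to the primary and spread reads across the replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._replicas = itertools.cycle(replicas or [primary])

    def route(self, is_write):
        return self.primary if is_write else next(self._replicas)


def shard_for(key, shard_count):
    """Pick a shard by hashing the key; crc32 gives the same result in every process."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


if __name__ == "__main__":
    router = ReplicatedRouter("db-primary", ["db-replica-1", "db-replica-2"])
    print(router.route(is_write=True))
    print(router.route(is_write=False))
    print(shard_for("user:1001", shard_count=4))
```

Replication helps when reads dominate, since every write still lands on the single primary; sharding spreads writes too, at the cost of queries that span shards.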
At the most basic level, scaling web servers consists of writing your app in such a way that it can run on > 1 machine, and throwing more machines at the problem. No matter how much you tune them, the eventual scaling will involve a farm of web servers.
The database issue is way more sticky to deal with. What is your read / write percentage? What kind of application is this? OLTP? OLAP? Social Media? What is the database? How do we add more servers to handle the load? Do we partition our data across multiple dbs? Or replicate all changes to loads of slaves?
Your questions call for more questions, i.e. in an interview, if someone just "has the answer" to a generic question like the ones you've posted, then they only know one way of doing things, and that way may or may not be the best one.
There are a few approaches I'd take to the first question:
Are there hardware upgrades that may get things up enough to handle the million queries in a short time? If so, this is likely an initial point to investigate.
Are there software changes that could be made to optimize the performance of the server? I know IIS has a ton of different settings that could be used to improve performance to some extent.
Consider moving to a web farm rather than using a single server. I once worked somewhere where we had millions of hits a minute, which was thrashing our web servers rather badly and taking down a number of sites. Our solution was to change the load balancer so that a few of the servers served the site that was thrashing them, letting the other servers keep the other sites up; this was in the fall, and in retail that is your big quarter. While some would start here, I'd likely come here last, as this can open a big can of worms compared to the other two options.
As for the database instance, it would be a similar set of options to my mind, though I might try the multi-server option first, as redundancy may be an important side benefit here that I'm not sure is as easy to get with a web server. I may be way off, but that is how I'd initially tackle this.
Use a caching proxy
If you serve identical pages to all visitors (say, a news site) you can reduce load by an order of magnitude by caching generated content with a caching proxy such as Varnish or Apache Traffic Server.
The proxy will sit between your server and your visitors. If you get 10,000 hits to your front page, it will only have to be generated once; the proxy will send the same response to the other 9,999 visitors without asking your app server again.
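The arithmetic above can be checked with a toy model. This is not how Varnish or Apache Traffic Server are implemented (real proxies honour cache headers and expire entries); it is just a sketch with a hypothetical `render_page` function standing in for the app server.

```python
class CachingProxy:
    """Toy stand-in for a caching proxy; caches responses by URL path."""

    def __init__(self, origin):
        self._origin = origin  # callable: path -> response body
        self._cache = {}
        self.origin_hits = 0  # how many times the app server was asked

    def get(self, path):
        if path not in self._cache:
            self.origin_hits += 1
            self._cache[path] = self._origin(path)
        return self._cache[path]


def render_page(path):
    # Hypothetical app server: imagine templates and database queries here.
    return f"<html><body>Content for {path}</body></html>"


if __name__ == "__main__":
    proxy = CachingProxy(render_page)
    for _ in range(10_000):
        proxy.get("/")
    print(proxy.origin_hits)  # prints 1
```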
Before starting to develop the system, developers will probably consider the specification of the server.
Maybe you can reduce your use of SEO and block search engines from crawling the site (crawling is a task that takes a lot of resources).
Try to index everything well and avoid making searches too easy to run.
Deploy it on the cloud, and make sure your web server and web app are cloud-ready and can scale across different nodes. I recommend the Cherokee web server (it is very easy to load balance across different servers, and benchmarks show it to be faster than Apache). For example, Google's cloud (appspot) needs your web app to be written in Python or Java.
Use a caching proxy, e.g. Nginx.
For the database, use memcache for queries that are expected to be repeated.
If the company wants its data to be private, build a private cloud. Here Ubuntu is doing a very good job, and it is fully free and open source: http://www.ubuntu.com/cloud/private

What happens when a live site has too many users?

I'm new to production level web development, so sorry if this is obvious. My site has a potential to have a sudden surge of (permanent) users and I'm wondering what happens if too many users sign up in a short period of time, causing the site to run slowly. Since development takes time, would it just be a case of adding more boxes to the server, or does the site have to be taken down for code improvement?
Thanks
Don't worry, even very popular sites go through this. Writing good code is always a plus, but sometimes even that is not enough. Twitter is a good example: they started their messaging on Ruby but had to move to Scala as they became more and more popular.
Since you say you are new, can I suggest getting yourself familiar with caching queries and caching static content? Learning about good indexing practices on SQL server should also be helpful in dealing with a large influx of users.
Both, but code improvement would be the first thing to target. Writing code that will scale will help you out the most. You can throw more servers at it behind the scenes, but you will have to do this less with well-architected code that was designed for scalability.
It depends on the technologies you're using and how your code is written.
Since you tagged sql-server, when it comes to databases in general, you are limited by your locking strategies and your replication architecture a lot of the time. How you design your database and put it into production has big impact. Things that have to happen in any type of serial manner are bottlenecks. Check your execution plans, watch and manage your indexes, and replicate and distribute your systems if you can.
The best way to understand your scalability limitations is through load testing and proper QA.
If you don't do it right, your users are sure to be unhappy when you start 503ing or timing out. :-)
If the site is developed in such a fashion that you can have multiple servers/data access layers, then scalability should not be an issue.
Create the app so that you can shed load as required, and keep the code as flexible as possible.
But from past experience: performance tune once it is required. Write easily understandable and maintainable code, and fix performance issues as they occur.
The best advice I can give is to test your app and server before you go live, then you can see when you are likely to get problems and how bad they could be.
It is one thing to say 'it will go slow' but once you get past a certain point your app may crash or randomly give users error 500 pages.
Test with automated scripts or tools to stress the site and simulate sign-ups and random users visiting random pages.
If you have SSL, make sure your tools simulate lots of different SSL connections rather than just different HTTP requests (SSL handshakes take extra resources).
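A stress script of the kind described does not need to be elaborate. The sketch below fires many concurrent calls and reports errors and latencies; `fake_signup` is a placeholder I have made up, and in a real test it would be replaced by an actual request against a test server (opening a fresh connection each time if you want to exercise SSL handshakes).

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(action):
    """Run one simulated user action; return (succeeded, seconds_taken)."""
    start = time.perf_counter()
    try:
        action()
        ok = True
    except Exception:
        ok = False
    return ok, time.perf_counter() - start


def run_load_test(action, total_requests, concurrency):
    """Fire `total_requests` calls using `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: timed_call(action), range(total_requests)))
    latencies = sorted(seconds for _, seconds in results)
    return {
        "requests": total_requests,
        "errors": sum(1 for ok, _ in results if not ok),
        "median_s": statistics.median(latencies),
        "p95_s": latencies[int(0.95 * (len(latencies) - 1))],
    }


def fake_signup():
    # Placeholder for a real HTTP request to the sign-up page of a test server.
    time.sleep(0.01)


if __name__ == "__main__":
    print(run_load_test(fake_signup, total_requests=200, concurrency=20))
```

Raising `concurrency` step by step and watching when the error count or the 95th-percentile latency jumps gives a rough idea of where the site starts to struggle.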

Web Scraping with Google App Engine

I am trying to scrape a website and republish the data as an RSS feed. How hard is this to set up with Google App Engine? What are the advantages and disadvantages of using GAE? Any recommendations and guidelines are greatly appreciated!
Google App Engine offers much more functionality (and complexity) than you will need if all you truly want to do is republish some structured data as RSS.
Personally, I would use something like Yahoo Pipes for a task like this.
That being said... if you want/need to get your feet wet with GAE, go for it!
Working with Google App Engine is pretty straightforward. I would recommend going through the Getting Started guide. It's short and simple and touches on essential GAE topics. There are more pros and cons than I will list here.
Pros:
In general, App Engine is designed for high-traffic web applications that need to scale. Furthermore, it is designed from a programmer's perspective. Many of the scalability issues (database optimization, server administration, etc.) are dealt with by Google. Having said that, I find it to be a nice platform. It is still being actively developed by Google engineers, and scheduling of tasks (a feature that has long been requested) is on the current road map.
Cons:
Perhaps the biggest downsides right now are, again, the lack of official scheduling support and the quota limits currently set for free accounts. However, you can't complain much if it's free. Currently it only supports Python as a programming interface (although a new language [Java, I predict] is coming soon). Furthermore, Python 2.6 (and 3.0 for that matter) is not yet supported. In addition, Django 1.0 is not officially supported in App Engine (although you can package Django 1.0 with your application).
Harder than it would be in most other technologies.
GAE can sort of do scheduled batch stuff like this now, but it's really not intended for that type of thing. Pick pretty much any other language and platform for this particular task, and you'll make your life a lot easier.
I think BeautifulSoup could run on GAE, so all your scraping needs are handled :D
Also, GAE has a URL Fetch service (urlfetch) for retrieving pages. The only problem I think you might have is not having enough time to get the data (there is a 30-second limit).
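A rough sketch of the scrape-and-republish step, using only the Python standard library rather than BeautifulSoup so that it runs anywhere. The sample HTML, feed title and URLs are invented for illustration, and the page is hard-coded here; on GAE the HTML would come from a URL fetch instead.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def build_rss(title, site_url, links):
    """Build a minimal RSS 2.0 document from the scraped links."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = f"Scraped from {site_url}"
    for href, text in links:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = text
        ET.SubElement(item, "link").text = href
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    page = '<ul><li><a href="http://example.com/a">First story</a></li></ul>'
    collector = LinkCollector()
    collector.feed(page)
    print(build_rss("Example feed", "http://example.com/", collector.links))
```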
I am working on a similar project and I've decided that it's easier to prepare the data on another server and push it to GAE.
You might also want to look into Yahoo! Query Language (YQL).
