Which Amazon EC2 instance would be suggested for an e-commerce website with Windows and SQL Server?

We are running an eCommerce venture that gets around 2000 unique visitors per day. The total data is around 6 GB as of now.
We are using SQL Server as our database, and in the coming months the website may scale up to 10000 users per day.
From this link I gathered that an M1 instance would be best, but I am really clueless as to which of these options to purchase. Could anyone help?
Note: Our budget is around 170 dollars per month.
EDIT: The number of concurrent users we have had is around 150.

I'd try to fit everything in memory. If you can't due to budget, you need to make sure the disk response times are up to par for your expected load. Your application's load can vary widely. One visit to a homepage could generate many queries, or maybe you have application caching set up, so it's hard for anyone to just tell you. You should also get solid numbers on your peak number of concurrent users so you can plan for that. You don't mention your current environment, but you can collect numbers on CPU, disk MB read/written per second, and memory used to help you pick the right size.
I'd look at the xlarge m1. That gives you 15GB of memory to play with. You'll be able to cache all the data you need and have some left over for the OS and also have some room to grow. CPU probably won't be your issue, but be sure to check out your current use.
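As a rough check of the "fit everything in memory" idea, here is a minimal sketch of the arithmetic. The 6 GB and 15 GB figures come from this thread. The OS reserve and the growth factor are assumptions picked only for illustration, and `fits_in_memory` is a hypothetical helper, not part of any AWS or SQL Server tooling.

```python
def fits_in_memory(data_gb, ram_gb, os_reserve_gb=4.0, growth_factor=1.5):
    """Return (fits, headroom_gb) for caching the whole database in RAM.

    os_reserve_gb and growth_factor are illustrative assumptions:
    memory kept back for the OS and other processes, and expected data growth.
    """
    usable_gb = ram_gb - os_reserve_gb
    projected_gb = data_gb * growth_factor
    return projected_gb <= usable_gb, usable_gb - projected_gb


# 6 GB of data today on an instance with 15 GB of memory.
fits, headroom = fits_in_memory(data_gb=6, ram_gb=15)
print(fits, headroom)
```

Replace the defaults with numbers measured on your current server. A negative headroom means the disk response times become the thing to watch.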
If you have some time to spend on it, I'd try setting up JMeter to do some load testing and see how many concurrent users you can max out with one of the cheaper options.
This topic may be better suited to ServerFault.

I'd suggest you look at reserved instances in the heavy utilization category, where you can get interesting discounts if you plan to run the instance for a year or so.
But with that budget you should be thinking about an m1.medium instance, which might be a little tight for your requirements.


Central data management for custom desktop applications

I have a background in web programming, where both the data and the code live on the server. Web hosts with MySQL or the like are plentiful and cheap, so using the application from multiple PCs was never a problem.
However, I'm considering switching to building desktop applications, and the only factor that annoys me is syncing data across the many PCs I use. I was thinking of perhaps setting up a light Amazon EC2 instance with PostgreSQL on it and having my desktop applications use that.
I have a few questions:
1) I'm curious as to what latency I might expect by running the database on EC2 instead of the local network. Any experience or insight is appreciated.
2) Are there better, more obvious, or cheaper solutions?
3) I've looked at the pricing and it seems to come down to $24.48 per month for a yearly contract. Whilst not really expensive, it is not exactly cheap either. At what point does it become more interesting to run a local server?
4) I'm obviously not using my applications for large parts of the day (sleep, work, ...). I was wondering if I can have the Amazon server go into a sort of "sleep" mode and wake up when poked. An initial delay for the first desktop application is acceptable. The reason behind this behavior would be to save money on the instance if it is only actually needed for 10% of the day.
I welcome any feedback at all on how this problem is best tackled.
1) This could get ugly. Every single query you do will have latency associated with it. If you have a lot of queries, this can add up very fast. So keep your query count low, and try to pre-fetch and cache data when possible.
2) Not enough information to answer that question.
3) Depends on the cost of your local server. Keep in mind that you will need to pay for electricity to keep it on.
4) You can stop your instance when you don't need it. With the exception of high utilization reservations, you won't be billed while it is in the stopped state. With high utilization reservations you still pay the full cost.
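To see how per-query latency adds up, here is a minimal sketch. It assumes queries run one after another, and the round-trip times are hypothetical placeholders, not measurements of EC2 or of any particular network.

```python
def added_latency_ms(queries_per_action, round_trip_ms):
    """Network time added to one user action when queries run sequentially."""
    return queries_per_action * round_trip_ms


# Hypothetical round-trip times; measure your own before deciding.
for label, rtt_ms in (("local network", 1), ("remote server", 50)):
    print(label, added_latency_ms(20, rtt_ms), "ms")
```

With these assumed numbers, an action that issues 20 queries goes from 20 ms to a full second of waiting, which is why batching and caching matter more once the database is remote.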

What is the fastest way to improve database performance: aim straight for my target (5000 users), or first use artificial milestones?

When improving database performance, what is the fastest way?
Should I go straight to my target (5000 users)?
Or should I go through artificial milestones (for example, first 1000 users, only then 2500, ...)?
We have a performance problem. For the purpose of the question, suppose the only bottleneck is the database and all other resources consume zero time.
Facts:
Database performance for 50 concurrent users is good.
Database performance for more than 50 users is not acceptable, and by that I mean database requests take too long. Instead of ~0.5 sec they usually take between ~2 and ~7 sec, and sometimes even ~30 sec.
At about 70 users, the system stops working altogether: most database requests take more than ~30 sec.
The database is implemented in SQL Server 2005/2008.
Requirement:
The application needs to support 5000 concurrent users.
From my understanding, and please correct me if I'm wrong or something is missing, the best practice for database performance improvement is:
1. Define performance objective targets.
2. Establish a testing environment identical (as much as possible) to the production environment.
3. Check whether the performance targets are met. If not, repeat the following until they are:
3.1 Improve performance (one change at a time) through server reconfiguration, index improvements, code improvements, database structure improvements and more, using tracing and monitoring tools such as performance counters, SQL Server Profiler, logs and more.
3.2 Test and analyse the effects of the change, and accept or reject it.
My question is:
Assuming that getting the system to work for anything less than 5000 users (say 1000, or even 2500) doesn't provide any business value, what is the fastest way to get to 5000 users?
Getting there directly?
Or first getting to smaller milestones?
For example: first target 500 users, then 1000, then 2500, ...
Something else?
Do I get any technical value from using milestones that will eventually take me to 5000 users faster?
Tuning up to 500 users, for example, can lead you to solutions that won't be effective when you increase to 1000 users. Example: imagine that you can tune the cache for 500 users, but you may not have enough RAM to get a good result for 5000.
Go for your goal.
I don't see that artificial milestones really add any value. If the requirement is 5000 concurrent users, go straight for that; otherwise you run the risk of wasting time on optimisations which don't scale.
I would say that small milestones would create a problem of their own: you would have to pick milestones that are neither too big nor too small, and that can be tricky.

How to calculate hours/month usage on Amazon RDS and Pricing?

I never used Amazon EC2 or RDS Service. I am trying to calculate my cost using http://calculator.s3.amazonaws.com/calc5.html
I searched a little but could not locate answers to some basic things. Can you help me out with this:
What does DB Instance mean? 1 Database = 1 Instance or 1 Connection = 1 Instance?
How do I calculate hours/month usage? It should depend on the transfer rates or processing time. Is there a way I can get a rough idea about it?
What if I already have my DB ready and want to upload it directly (it would be a few GBs)? How will that be calculated?
I am new to Amazon EC2 and searched Stack Overflow and Server Fault before posting this question. I got some idea, but not specifically what I am looking for. Can someone help me out here?
In general, one database = one instance. You spin up instances, and do what you like with them. Definitely possible to have more connections to it.
Hours per month is just that. How many hours per month you have the instance active. If you plan to have the instance active 24/7, you may find more cost effective alternatives with other cloud providers. If you run it less often than that, you save money when it's not active. It's billed hourly to your account at the rate specified.
Upload data is counted at the standard transfer rates. A few GBs doesn't cost much, but you will be paying for the service starting the moment you spin up the instance.
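The hours-per-month point can be made concrete with a small sketch. The hourly rate below is a made-up placeholder, not an actual Amazon price, and `monthly_cost` is a hypothetical helper. Use the rate shown in the calculator for your instance class.

```python
def monthly_cost(hourly_rate, hours_per_day=24, days=30):
    """Return (hours, cost) for an instance billed by the hour it is active."""
    hours = hours_per_day * days
    return hours, round(hours * hourly_rate, 2)


HYPOTHETICAL_RATE = 0.10  # dollars per hour, assumed for illustration only

print(monthly_cost(HYPOTHETICAL_RATE))                   # running 24/7
print(monthly_cost(HYPOTHETICAL_RATE, hours_per_day=8))  # running 8 hours a day
```

An instance that is up 24/7 is simply 720 hours in a 30-day month. Data transfer and storage are charged separately and are not included in this sketch.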

How to gear towards scalability for a start up e-commerce portal?

I want to scale an e-commerce portal based on LAMP. Recently we've seen a huge traffic surge.
What would be the steps (please mention them in order) in scaling it:
Should I consider moving onto Amazon EC2 or similar? What could be the potential problems in switching servers?
Do we need to redesign the database? I read that Facebook switched from MySQL to Cassandra. What kind of code changes are required if we switch to Cassandra? Would Cassandra be a better option than MySQL?
Is Hadoop a possibility? I'm not even sure.
Are there any other things that need to be thought of?
Found this post helpful. This blog has nice articles as well. What I want to know is the list of steps I should consider in scaling this app.
First, I would suggest making sure every resource served by your server sets appropriate cache control headers. The goal is to make sure truly dynamic content gets served fresh every time and any stable or static content gets served from somebody else's cache as much as possible. Why deliver a product image to every AOL customer when you can deliver it to the first and let AOL deliver it to all the others?
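The cache-header policy described above can be sketched as a small function that chooses a `Cache-Control` value per resource. The extension list and the one-day lifetime are assumptions for illustration, and `cache_control_for` is a hypothetical helper. The sketch is in Python for brevity; the same decision can be made in PHP or in the web server configuration.

```python
import os

# Assumed set of file types that rarely change once published.
STATIC_EXTENSIONS = {".png", ".jpg", ".gif", ".css", ".js"}


def cache_control_for(path, max_age_seconds=86400):
    """Pick a Cache-Control header value for a requested path."""
    extension = os.path.splitext(path)[1].lower()
    if extension in STATIC_EXTENSIONS:
        # Shared caches and browsers may reuse the response for max_age_seconds.
        return f"public, max-age={max_age_seconds}"
    # Dynamic pages: caches must revalidate with the server before reuse.
    return "no-cache"


print(cache_control_for("/img/product.JPG"))
print(cache_control_for("/cart.php"))
```

Whether a given page is truly dynamic is an application decision. A catalogue page that changes once a day may well deserve a short `max-age` rather than `no-cache`.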
If you currently run your webserver and dbms on the same box, you can look into moving the dbms onto a dedicated database server.
Once you have done the above, you need to start measuring the specifics. What resource will hit its capacity first?
For example, if the webserver is running at or near capacity while the database server sits mostly idle, it makes no sense to switch databases or to implement replication etc.
If the webserver sits mostly idle while the dbms chugs away constantly, it makes no sense to look into switching to a cluster of load-balanced webservers.
Take care of the simple things first.
If the dbms is the likely bottle-neck, make sure your database has the right indexes so that it gets fast access times during lookup and doesn't waste unnecessary time during updates. Make sure the dbms logs to a different physical medium from the tables themselves. Make sure the application isn't issuing any wasteful queries etc. Make sure you do not run any expensive analytical queries against your transactional database.
If the webserver is the likely bottle-neck, profile it to see where it spends most of its time and reduce the work by changing your application or implementing new caching strategies etc. Make sure you are not doing anything that will prevent you from moving from a single server to multiple servers with a load balancer.
If you have taken care of the above, you will be much better prepared for making the move to multiple webservers or database servers. You will be much better informed for deciding whether to scale your database with replication or to switch to a completely different data model etc.
1) First thing: measure how many requests per second your most-visited pages can serve. For well-written PHP sites on average hardware it should be in the 200-400 requests per second range. If you are not there, you have to optimize the code by reducing the number of database requests, caching rarely changed data in memcached or shared memory, and using a PHP accelerator. If you are at some 10-20 requests per second, you need to get rid of your bulky framework.
2) Second: if you are still on Apache2, you have to switch to lighttpd or nginx+apache2. Personally, I like the second option.
3) Then move all your static data to a separate server or CDN. Make sure it is served with "expires" headers of at least 24 hours.
4) Only after all these things might you start thinking about going to EC2/Hadoop, building multiple servers and balancing the load (nginx would also help you there).
After steps 1-3 you should be able to serve some 10,000,000 hits per day easily.
If you need just 1.5-3 times more, I would go for a single more powerful server (8-16 cores, lots of RAM for caching and the database).
With step 4 and multiple servers you are on your way to 0.1-1 billion hits per day (but with significantly larger hardware and support expenses).
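To relate the hits-per-day figure to the requests-per-second range from step 1, a quick conversion helps. The peak factor of 3 is an assumption for illustration, since real traffic is not spread evenly over the day.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rps(hits_per_day):
    """Average requests per second if traffic were spread evenly over a day."""
    return hits_per_day / SECONDS_PER_DAY


def peak_rps(hits_per_day, peak_factor=3.0):
    """Estimated peak rate; peak_factor is an assumed peak-to-average ratio."""
    return average_rps(hits_per_day) * peak_factor


print(round(average_rps(10_000_000), 1))
print(round(peak_rps(10_000_000), 1))
```

Ten million hits per day is roughly 116 requests per second on average, so a server that sustains a few hundred requests per second has room for peaks of about three times the average.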
Find out where issues are happening (or are likely to happen if you don't have them now). Knowing what your biggest resource usage is matters when evaluating any solution. Stick to solutions that will give you the biggest improvement.
Consider:
- Higher than needed bandwidth use per user is something you want to address regardless of moving to EC2. It will cost you money either way, so it's worth looking at things like this: http://developer.yahoo.com/yslow/
- Don't invest in changing databases if that's a non-issue. Find out first whether that's really the problem. Even if you are having issues with the database, it might be a code issue, e.g. hitting the database many times per request.
- Unless we are talking about very big numbers, you shouldn't have high CPU usage issues. If you do, find out where they are happening. Optimization is worth it where specific code has a high impact on your overall resource usage.
- After making sure the above is reasonable, you might get big improvements with caching, both in bandwidth (making sure browsers and proxies can play their part in caching) and in local resource usage (avoiding re-processing or re-retrieving the same info all the time).
I'm not saying you should go all out with the above, just far enough to make sure you won't get the same issues elsewhere in a very few months, and far enough to find out where your biggest gains are and whether you will get enough value from any scaling options. This will also allow you to come back and ask questions about specific problems, and how these scaling options relate to those.
You should prepare by choosing a flexible framework and be sure things are going to change along the way. In some situations it's difficult to predict your user's behavior.
If you have seen an explosion of traffic recently, find out which pages are the slowest.
You can move to the cloud, but EC2 is not the best performing one. Again, be sure there's no other optimization you can do.
Database might be redesigned, but I doubt all of it. Again, see the problem points.
Both Hadoop and Cassandra are pretty nifty, but they might be overkill.

Calculate server requirements based on programming specs

Have you ever encountered something that is easy to develop, but then stopped for a while to think about the server requirements for your project? That is my case.
I want to compete with a gaming site. They have multiplayer Flash games like poker, rummy, backgammon, and other card games, 8 games in total. For each game they have rooms and tables.
I'll use Silverlight with sockets. I have already developed the policy server, the socket server app using WinForms, and the client socket app in Silverlight. I own a VPS for tests, so developing what I want is not a problem. The problem is how to calculate server requirements (RAM, bandwidth, internet speed) based on the following requirements:
Server should support 24,000 users / day or 1000 users / hour
Each game room should have its own tables where users can play
Users should not lose scores and game speed should be fast in general
I just wonder how to handle the following situation: if 1000 users are connected through a socket connection to a room full of tables and one user leaves a table, all 1000 users must be updated and the UI should reflect the changes. Let's say that I update the clients by sending a small message of 100 bytes to each user. This will consume 100 bytes * 1000 users = 100 KB, and that is for just 1 UI change, for 1 game and 1 room, not counting all my other games and rooms. Also, 1000 iterations that send bytes to clients should be very time consuming. I am a developer, but not experienced in these situations. Please advise. Numbers would be great.
Until you've built -- and optimized -- your actual applications, you cannot predict much about the hardware required for some level of performance.
You have to finish the apps first. Then you can measure their performance under load. Then you can decide how much to spend on what levels of performance.
The best answer I can offer you is to run stress tests and see how much load a single server can support. While running those tests, monitor memory, IO, CPU and disk activity (if relevant) to understand which resource is running out first.
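A stress test along these lines can start very small. The sketch below is a minimal harness using only the Python standard library. `fake_request` is a stand-in that just sleeps, so you would replace it with a real call to your socket server. It measures client-side throughput only, so memory, CPU and IO still need to be watched on the server with separate monitoring tools.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_request():
    """Hypothetical stand-in for one client request; replace with a real call."""
    time.sleep(0.01)
    return True


def run_load(request_fn, total_requests, concurrency):
    """Run total_requests calls with a fixed number of concurrent workers.

    Returns (successful_requests, requests_per_second).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(lambda _: request_fn(), range(total_requests)))
    elapsed = time.perf_counter() - start
    return sum(results), total_requests / elapsed


succeeded, rps = run_load(fake_request, total_requests=100, concurrency=10)
print(succeeded, round(rps))
```

Raising `concurrency` step by step until throughput stops growing gives a first idea of where a single server saturates.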
We deploy our applications on Amazon's EC2 cloud infrastructure. That lets us easily (within minutes) add or remove capacity as needed. Perhaps it's worth considering for your situation.
Always follow these two rules
“The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet.” - Michael A. Jackson
First of all you should think more about how and when to send what information to which clients. Not every client needs to be informed about every table change.
There is only so much information that a client needs, and you need to decide when and how it will be transmitted. You should also pack the information into meaningful packets. What is happening at a table is only interesting for that table.
You also need to profile your application to make sure you know what resources it consumes. Card games should not eat up that many resources. But the important point is to FIRST build it, and when you HAVE a bottleneck, then try to fix it.
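The difference between notifying the whole room and notifying only the affected table can be sketched as follows. The 1000 users and 100-byte message come from the question. The 4 seats per table and the `Room` class are hypothetical, chosen only to make the comparison concrete.

```python
from collections import defaultdict


class Room:
    """Tracks which users sit at which table (illustrative model only)."""

    def __init__(self):
        self.tables = defaultdict(set)  # table id -> ids of users seated there

    def seat(self, table_id, user_id):
        self.tables[table_id].add(user_id)

    def recipients(self, table_id, scope="table"):
        """Users to notify: everyone in the room, or only that table."""
        if scope == "room":
            return set().union(*self.tables.values())
        return set(self.tables[table_id])


def bytes_sent(recipients, message_bytes=100):
    """Payload bytes for one update sent to each recipient."""
    return len(recipients) * message_bytes


room = Room()
for user_id in range(1000):
    room.seat(user_id // 4, user_id)  # assumed: 250 tables with 4 seats each

print(bytes_sent(room.recipients(0, scope="room")))   # 100000
print(bytes_sent(room.recipients(0, scope="table")))  # 400
```

Under these assumptions a table-scoped update costs 400 bytes of payload instead of 100,000. A lobby view that lists tables could then be refreshed in less frequent, batched messages.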
It's very difficult to guess at these things at this point.
From a pragmatic standpoint, you may eventually want to look into a) a cloud-hosting type service for better bandwidth price-scaling for you, or b) a very experienced full-service hosting company that can help you calculate your needs based on prior experience.
Disclaimer: I work for Rackspace Hosting which provides both of the above.
