Have you ever encountered something so easy to develop but stopped a while to think of server requirements for your project ? It is my case.
I want to compete with a gaming site, they have multiplayer Flash games like poker, rummy, backgammon, and other card games, 8 games in total. For each game they have rooms and tables.
I'll use Silverlight with Sockets. I already managed to develop the policy server, the Socket Server app using WinForms, the Client Socket app in Silverlight. I own a VPS for tests, so there is no problem in developing what I want, the problem is How to calculate server requirements, RAM, bandwidth, internet speed based on the following requirements:
Server should support 24.000 users / day or 1000 users / hour
Each game room should have it's own tables where users can play
Users should not lose scores and game speed should be fast in general
I just wonder how to handle the following situation: if 1000 users are connected through Socket connection to a room full of tables and one user leave a table, all 1000 users must be updated and UI should reflect the changes. Let's say that I'll update the clients by sending a small Message of 100 bytes to each user, this will eat 100 bytes * 1000 users = 100 kb, and this just for 1 UI change, for 1 Game and for 1 Room, not counting all my other games and rooms. Also 1000 iterations that sends bytes to clients should be very time consuming.I am a developer, but not experienced in those situations. Please advice. Numbers will be great.
Until you've built -- and optimized -- your actual applications, you cannot predict much about the hardware required for some level of performance.
You have to finish the apps first. Then you can measure their performance under load. Then you can decide how much to spend on what levels of performance.
The best answer I can offer you is to run stress tests and see how much load a single server can support. While running those tests, monitor memory, IO, CPU and disk activity (if relevant) to understand which resource is running out first.
We deploy our applications on Amazon's EC2 cloud infrastructure. That lets us easily (within minutes) add or remove capacity as needed. Perhaps it's worth considering for your situation.
Always follow these two rules
“The First Rule of Program Optimization: Don't do it. The Second Rule of Program Optimization (for experts only!): Don't do it yet.” - Michael A. Jackson
First of all you should think more about how and when to send what information to which clients. Not every client needs to be informed about every table change.
That there are only so much informations that a client needs, and you need to decide when/how it will be transmitted. Also you should pack the informations into meaningfull packets. Whats happening at a table is only interesting for that table.
Also you need to profile your application to make sure you know what ressources it consumes. Cardgames should not eat up so much ressources. But the important point is to FIRST build it, and when you HAVE a bottleneck, then try to fix it.
It's very difficult to guess at these things at this point.
From a pragmatic standpoint, you may eventually want to look into a) a cloud-hosting type service for better bandwidth price-scaling for you, or b) a very experienced full-service hosting company that can help you calculate your needs based on prior experience.
Disclaimer: I work for Rackspace Hosting which provides both of the above.
Related
We are running an eCommerce venture that has around 2000 unique visitors in a day. The total data is around 6 GB as of now.
We are using SQL Server as our database and in the coming months the website may scale up to 10000 users per day.
From this link deciphered that it would be best to use M1 instance but could anyone help really clueless as to what to purchase from these options.
Note: Our budget is around 170 Dollars PM.
EDIT: The number of concurrent users we have had is around 150
I'd try to fit everything in memory. If you can't due to budget, you need to make sure the disk response times are up to par for your expected load. You application can vary widely. One visit to a homepage could generate many queries, or maybe you have application caching set up - so it's hard for anyone to just tell you. You should also get solid numbers on your peak number of concurrent users so you can plan for that. You don't mention your current environment, but you can get some numbers about CPU, Disk MB Read/Writes/s and memory used to help you get the right size.
I'd look at the xlarge m1. That gives you 15GB of memory to play with. You'll be able to cache all the data you need and have some left over for the OS and also have some room to grow. CPU probably won't be your issue, but be sure to check out your current use.
If you have some time to spend on it, I'd try setting up JMeter to do some load testing and see how many concurrent users you can max out with one of the cheaper options.
This topic may be better suited to ServerFault.
I'd suggest you to check at the reserved instances in heavy category where you can get interesting discounts if you plan to run it for a year or similar.
But with that budget you should be thinking about an m1.medium instance, which might be a little tight for your requirements.
I have a background in web programming where both the data and the code live on the server. Web hosts with mysql or the like are plentiful and cheap so using the application from multiple pcs was never a problem.
However I'm considering switching to building desktop applications but the only factor that annoys me is the syncing of data across the many pcs I use. I was thinking of perhaps setting up a light amazon ec2 instance with a postgresql on it and having my desktop applications use that.
I have a few questions:
I'm curious as to what latency I might expect by running the database on ec2 instead of the local network, any experience or insight is appreciated.
Are there better/more obvious/cheaper solutions?
I've looked at the pricing and it seems to come down to 24.48$ per month for a yearly contract. Whilst not really expensive, it is not exactly cheap either. At what point does it become more interesting to run a local server?
I'm obviously not using my applications for large parts of the day (sleep, work,...). I was wondering if I can have the amazon server go into a sort of "sleep" mode and wake up when poked. An initial delay for the first desktop application is acceptable. The reason behind this behavior would be to save money on the instance if it is only actually needed for 10% of the day.
I welcome any feedback at all on how this problem is best tackled.
This could get ugly. Every single query you do will have latency associated with it. If you have a lot of queries, this can add up very fast. So keep your query count low, and try to pre-fetch and cache data when possible.
Not enough information to answer that question.
Depends on the cost of your local server. Keep in mind that you will need to pay for electricity to keep it on.
You can stop your instance when you are not needing it, with the exception of high utilization reservations, you wont get billed when its in stopped state. With high utilization reservations you will still pay the full cost.
I want to scale an e-commerce portal based on LAMP. Recently we've seen huge traffic surge.
What would be steps (please mention in order) in scaling it:
Should I consider moving onto Amazon EC2 or similar? what could be potential problems in switching servers?
Do we need to redesign database? I read, Facebook switched to Cassandra from MySql. What kind of code changes are required if switched to Cassandra? Would Cassandra be better option than MySql?
Possibility of Hadoop, not even sure?
Any other things, which need to be thought of?
Found this post helpful. This blog has nice articles as well. What I want to know is list of steps I should consider in scaling this app.
First, I would suggest making sure every resource served by your server sets appropriate cache control headers. The goal is to make sure truly dynamic content gets served fresh every time and any stable or static content gets served from somebody else's cache as much as possible. Why deliver a product image to every AOL customer when you can deliver it to the first and let AOL deliver it to all the others?
If you currently run your webserver and dbms on the same box, you can look into moving the dbms onto a dedicated database server.
Once you have done the above, you need to start measuring the specifics. What resource will hit its capacity first?
For example, if the webserver is running at or near capacity while the database server sits mostly idle, it makes no sense to switch databases or to implement replication etc.
If the webserver sits mostly idle while the dbms chugs away constantly, it makes no sense to look into switching to a cluster of load-balanced webservers.
Take care of the simple things first.
If the dbms is the likely bottle-neck, make sure your database has the right indexes so that it gets fast access times during lookup and doesn't waste unnecessary time during updates. Make sure the dbms logs to a different physical medium from the tables themselves. Make sure the application isn't issuing any wasteful queries etc. Make sure you do not run any expensive analytical queries against your transactional database.
If the webserver is the likely bottle-neck, profile it to see where it spends most of its time and reduce the work by changing your application or implementing new caching strategies etc. Make sure you are not doing anything that will prevent you from moving from a single server to multiple servers with a load balancer.
If you have taken care of the above, you will be much better prepared for making the move to multiple webservers or database servers. You will be much better informed for deciding whether to scale your database with replication or to switch to a completely different data model etc.
1) First thing - measure how many requests per second can serve you most-visited pages. For well-written PHP sites on average hardware it must be in 200-400 requests per second range. If you are not there - you have to optimize the code by reducing number of database requests, caching rarely changed data in memcached/shared memory, using PHP accelerator. If you are at some 10-20 requests per second, you need to get rid of your bulky framework.
2) Second - if you are still on Apache2, you have to switch to lighthttpd or nginx+apache2. Personally, I like the second option.
3) Then you move all your static data to separate server or CDN. Make sure it is served with "expires" headers, at least 24 hours.
4) Only after all these things you might start thinking about going to EC2/Hadoop, build multiple servers and balancing the load (nginx would also help you there)
After steps 1-3 you should be able to serve some 10'000'000 hits per day easily.
If you need just 1.5-3 times more, I would go for single more powerfull server (8-16 cores, lots of RAM for caching & database).
With step 4 and multiple servers you are on your way to 0.1-1billion hits per day (but for significantly larger hardware & support expenses).
Find out where issues are happening (or are likely to happen if you don't have them now). Knowing what is your biggest resource usage is important when evaluating any solution. Stick to solutions that will give you the biggest improvement.
Consider:
- higher than needed bandwidth use x user is something you want to address regardless of moving to ec2. It will cost you money either way, so its worth a shot at looking at things like this: http://developer.yahoo.com/yslow/
- don't invest into changing databases if that's a non issue. Find out first if that's really the problem, and even if you are having issues with the database it might be a code issue i.e. hitting the database lots of times per request.
- unless we are talking about v. big numbers, you shouldn't have high cpu usage issues, if you do find out where they are happening / optimization is worth it where specific code has a high impact in your overall resource usage.
- after making sure the above is reasonable, you might get big improvements with caching. In bandwith (making sure browsers/proxy can play their part on caching), local resources usage (avoiding re-processing/re-retrieving the same info all the time).
I'm not saying you should go all out with the above, just enough to make sure you won't get the same issues elsewhere in v. few months. Also enough to find out where are your biggest gains, and if you will get enough value from any scaling options. This will also allow you to come back and ask questions about specific problems, and how these scaling options relate to those.
You should prepare by choosing a flexible framework and be sure things are going to change along the way. In some situations it's difficult to predict your user's behavior.
If you have seen an explosion of traffic recently, analyze what are the slowest pages.
You can move to cloud, but EC2 is not the best performing one. Again, be sure there's no other optimization you can do.
Database might be redesigned, but I doubt all of it. Again, see the problem points.
Both Hadoop and Cassandra are pretty nifty, but they might be overkill.
I have a very limited experience of database programming and my applications that access databases are simple ones :). Until now :(. I need to create a medium-size desktop application (it's called rich client?) that will use a database on the network to share data between multiple users. Most probably i will use C# and MSSQL/MySQL/SQLite.
I have performed a few drive tests and discovered that on low quality networks database access is not so smooth. In one company's LAN it's a lot of data transferred over network and servers are at constant load, so it's a common situation that a simple INSERT or SELECT SQL query will take 1-2 minutes or even fail with timeout / network error.
Is it any best practices to handle such situations? Of course i can split my app into GUI thread and DB thread so network problems will not lead to frozen GUI. But what to do with lots of network errors? Displaying them to user too often will be not very good :(. I'm thinking about automatic creating local copy of a database on each computer my app is running: first updating local database and synchronize it in background, simple retrying on network errors. This will allow an app to function event if network has great lags / problems.
Any hints and buzzwords what can i look into? Maybe it's some best practices already available that i don't know :)
Sorry this is prob not the answer you are looking for but you mention that a simple insert / update could take 1-2 minutes or even fail with timeout / network error.
This to me sounds like there may be another problem rather than the network itself. If your working on a corporate network there would have to be insane levels of traffic for this sort of behavior. I would do everything in your power to look at improving the network before proceeding. Can you post the result of a ping to the db box?
If your going to architect your application around this type of network it will significantly alter the end product and even possibly result in a poor quality product for other clients.
Depending upon the nature of the application maybe look at implementing an async persistence queue and caching data on startup or even embedding a copy of the db into your application.
Even though async behaviour/queues/caching/copying the database to each local instance etc will help solve the symptoms, the problem will still remain. If the network really is that bad then I'd address it with their I.T. department, or the project manager and build some performance requirement from their side of things into the contract.
With a distributed application, where you have lots of clients and one main server, should you:
Make the clients dumb and the server smart: clients are fast and non-invasive. Business rules are needed in only 1 place
Make the clients smart and the server dumb: take as much load as possible off of the server
Additional info:
Clients collect tons of data about the computer they are on. The server must analyze all of this info to determine the health of these computers
The owners of the client computers are temperamental and will shut down the clients if the client starts to consume too many resources (thus negating the purpose of the distributed app in helping diagnose problems)
You should do as much client-side processing as possible. This will enable your application to scale better than doing processing server-side. To solve your temperamental user problem, you could look into making your client processes run at a very low priority so there's no noticeable decrease in performance on the part of the user.
In a client-server setting, if you care about security, you should always program on the assumption that the client may have been compromised. Even if it hasn't, there is always the risk of somebody using an old version of the client, using a competing or modified version of the client, or just of the net connection being a bit screwy.
So while you do as much work on the client as possible, processing and marshalling information into the right form, the server then needs to do a thorough sanity check on anything the client gives it.
So the answer I guess is "both".
The server must analyze all of this
info to determine the health of these
computers
That is probably the biggest clue so far explaning what your application is kinda about. Are you able to provide a more elaborate briefing on what this application is seeking to achieve in this distributed environment? We do not even know if the client-side processing is disk I/O or processor intensive. How you design the solution is dependent on the nature of what needs to be done to help the users/business accomplish their jobs and objectives.