Lighttpd, nginx and others use a range of techniques to maximize application performance, such as AIO, sendfile, MMIO, caching, epoll and lock-free data structures.
My colleague and I have written a little application server which uses many of these techniques and can also serve static files. We tested it with ApacheBench against lighttpd and nginx, and have at least matched their performance for static files from 100 bytes to 1K.
However, when we compare the transaction rate over the same static files to that of G-WAN, G-WAN is miles ahead.
I know this question may be a little subjective, but what techniques, apart from the obvious ones I've mentioned, might Pierre Gauthier be using in G-WAN that would enable him to achieve such astounding performance?
Having followed G-WAN for years, I have read the (many) talks covering this question on the old G-WAN forum.
From what I can remember, the points repeatedly addressed were the program's:
architecture (specific comparisons were made with nginx, lighty and cherokee)
implementation (how overall branching, request parsing and response building were made)
lean common path (the path followed by all types of requests: dynamic, static, handlers)
Pierre often mentioned other servers to explain what in their specific architecture and implementation was slowing them down.
As time goes on, since G-WAN seems to stack more and more features (C# script support, a reverse proxy and a load balancer are expected with the next version), those 3 points above seem to matter more and more.
This is probably why each new release of G-WAN aims to be faster than the previous one: the more work you do, the more extra fat must be eliminated because its cost gets higher. And as with a race car or a plane, this is an incremental process, one calling for more of the other.
If you are looking for the 'secret' of G-WAN's speed then I guess that here is the key point. But if you want more details then you should rather talk directly to the G-WAN author.
Check out G-WAN's timeline. An update on August 8, 2011 might give you an idea of what he is using.
G-WAN Timeline
Pierre mentioned that G-WAN uses its wait-free key-value store a lot in its core functions, which gives it more speed since no locks are used.
He also uses a Lorenz Waterwheel inspired technique to handle threads. I am not sure how it works but he said that it allows G-WAN to run faster in every possible case.
Related
I'm considering using Apache Flink to process some stream data in my project.
However, a friend told me that Flink may need a lot of RAM. I've also found something that says the same thing: https://www.quora.com/What-is-the-difference-between-Apache-Flink-and-Apache-Spark
So far I haven't learnt a lot about Flink; I have just succeeded in installing it and running the Word Count example.
So I'm wondering why Flink needs so much RAM. What is the main reason? Some disadvantage of Flink itself? Saving historical data? Or something else?
Can I use something like Redis to avoid this issue?
That answer on Quora is rather old, and lacks specifics.
It all depends on what you mean by "a lot of memory". I've seen Flink running on a cluster of Raspberry Pis -- see https://hal.inria.fr/hal-02463206/document. For another take on this, see also Extend Flink to edge computing with much lower footprint.
The out-of-the-box configuration is designed to work pretty well across a wide set of use cases. So there is some room for optimization if you need to squeeze Flink down into a more resource-constrained environment.
I deployed an instance of Solr onto an Ubuntu machine with Tomcat. I have a single-threaded client program that reads data and injects it into Solr. Watching memory and CPU usage, I realized that I still have a lot of resources (both memory and CPUs) to spare. Should I change my indexing code to be multi-threaded when injecting into Solr? Indexing 20 million records with the current single-threaded program takes about 14 hours, which is why I wonder whether I should switch to multi-threading. Thanks in advance for your suggestions and help! :)
Multi-threading while indexing in Solr is widely used.
It is not clear from your question whether you can also multi-thread the reading from your source, but I think that is the way to go.
I suggest you try it, but first analyze your code to see which part is the slowest, and include that part in the multi-threading.
Also keep an eye on your commit strategy.
From the Solr documentation: (http://wiki.apache.org/solr/SolrPerformanceFactors)
"In general, adding many documents per update request is faster than one per update request. ...
Reducing the frequency of automatic commits or disabling them entirely may speed indexing. Beware that this can lead to increased memory usage, which can cause performance issues of its own, such as excessive swapping or garbage collection."
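To make the two suggestions concrete (many documents per update request, several sender threads), here is a rough Python sketch. It does not talk to Solr itself: `send_batch` is a hypothetical callable standing in for whatever client call posts one list of documents in a single update request, and the batch size and worker count are assumed starting points to measure against, not recommendations.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def index_all(docs, send_batch, batch_size=1000, workers=4):
    """Send documents in batches from several threads; return how many were sent.

    `send_batch` is hypothetical: it should post one batch in a single
    update request and should NOT commit after every batch.
    """
    def send(batch):
        send_batch(batch)
        return len(batch)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Note: pool.map submits every batch up front, so for very large
        # inputs a bounded queue of pending batches would use less memory.
        return sum(pool.map(send, batches(docs, batch_size)))
```

A single commit would then be issued once at the end (or left to a relaxed auto-commit policy), in line with the quoted advice.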
CouchDB is great and I like its p2p replication functionality, but it's a bit large (because we have to install Erlang) and slow when used in a desktop application.
As I tested on an Intel duo core CPU:
12 seconds to load 10000 docs
10 seconds to insert 10000 docs, plus 20 seconds to update the view, so 30 seconds in total
Is there any NoSQL implementation which has the same p2p replication functionality, but is very small like SQLite and reasonably fast (1 second to load 10000 docs)?
Have you tried using Hovercraft and/or the Erlang view server? I had a similar problem and found that staying within the Erlang VM (thereby avoiding excursions to SpiderMonkey) gave me the boost I needed. I did 3 things...
Boosting Queries: Porting your mapreduce functions from JS to "native" Erlang usually gives a tremendous performance boost when querying Couch (http://wiki.apache.org/couchdb/EnableErlangViews). Also, managing views is easier because you can call external libs or your own compiled modules (just add them to your ebin dir), reducing the number of uploads you need to do during development.
Boosting Inserts: Using Hovercraft for inserts gives up to a 100x increase in performance (https://github.com/jchris/hovercraft). This was mentioned in the CouchDB book (http://guide.couchdb.org/draft/performance.html).
Pre-Run Views: The last thing you can do for desktop apps is run your views during application startup (say, when the splash-screen is showing.) The first time views are run is always the slowest, subsequent runs are faster.
These helped me a lot.
Edmond -
Unfortunately the question doesn't offer enough details about your app requirements, so it's difficult to offer advice. Anyway, I'm not aware of any other storage solution offering similar or more advanced P2P replication.
A couple of questions/comments about your requirements:
what kind of desktop app requires 10000 inserts/second?
when you say size what exactly are you referring to?
You might want to take a look at:
Redis
RavenDB
Also check some of the other NoSQL-solutions listed on http://nosql.mypopescu.com against your app requirements.
I want to scale an e-commerce portal based on LAMP. Recently we've seen a huge traffic surge.
What would be the steps (please mention them in order) in scaling it:
Should I consider moving to Amazon EC2 or similar? What could be the potential problems in switching servers?
Do we need to redesign the database? I read that Facebook switched from MySQL to Cassandra. What kind of code changes are required when switching to Cassandra? Would Cassandra be a better option than MySQL?
Possibility of Hadoop, not even sure?
Any other things, which need to be thought of?
Found this post helpful. This blog has nice articles as well. What I want to know is list of steps I should consider in scaling this app.
First, I would suggest making sure every resource served by your server sets appropriate cache control headers. The goal is to make sure truly dynamic content gets served fresh every time and any stable or static content gets served from somebody else's cache as much as possible. Why deliver a product image to every AOL customer when you can deliver it to the first and let AOL deliver it to all the others?
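As a rough illustration of the split between static and dynamic content, the sketch below picks a Cache-Control header per resource. It is written in Python only for brevity (the same decision can be made in PHP or in the web server configuration); the extension list and the one-day lifetime are assumptions to adapt, not values taken from the question.

```python
def cache_headers(path, static_max_age=86400):
    """Return caching headers for a resource path.

    Static assets may be stored by shared caches (proxies, ISPs, CDNs);
    everything else must be revalidated with the origin on every use.
    The extension list and lifetime are illustrative assumptions.
    """
    static_ext = (".png", ".jpg", ".gif", ".css", ".js")
    if path.lower().endswith(static_ext):
        return {"Cache-Control": f"public, max-age={static_max_age}"}
    return {"Cache-Control": "no-cache"}
```

For example, `cache_headers("/img/product.jpg")` allows intermediaries to serve the image for a day, while `cache_headers("/cart.php")` forces revalidation on each request.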
If you currently run your webserver and dbms on the same box, you can look into moving the dbms onto a dedicated database server.
Once you have done the above, you need to start measuring the specifics. What resource will hit its capacity first?
For example, if the webserver is running at or near capacity while the database server sits mostly idle, it makes no sense to switch databases or to implement replication etc.
If the webserver sits mostly idle while the dbms chugs away constantly, it makes no sense to look into switching to a cluster of load-balanced webservers.
Take care of the simple things first.
If the dbms is the likely bottle-neck, make sure your database has the right indexes so that it gets fast access times during lookup and doesn't waste unnecessary time during updates. Make sure the dbms logs to a different physical medium from the tables themselves. Make sure the application isn't issuing any wasteful queries etc. Make sure you do not run any expensive analytical queries against your transactional database.
If the webserver is the likely bottle-neck, profile it to see where it spends most of its time and reduce the work by changing your application or implementing new caching strategies etc. Make sure you are not doing anything that will prevent you from moving from a single server to multiple servers with a load balancer.
If you have taken care of the above, you will be much better prepared for making the move to multiple webservers or database servers. You will be much better informed for deciding whether to scale your database with replication or to switch to a completely different data model etc.
1) First thing - measure how many requests per second your most-visited pages can serve. For well-written PHP sites on average hardware it should be in the 200-400 requests per second range. If you are not there, you have to optimize the code by reducing the number of database requests, caching rarely changed data in memcached/shared memory, and using a PHP accelerator. If you are at some 10-20 requests per second, you need to get rid of your bulky framework.
2) Second - if you are still on Apache2, you have to switch to lighttpd or nginx+apache2. Personally, I like the second option.
3) Then you move all your static data to a separate server or CDN. Make sure it is served with "expires" headers of at least 24 hours.
4) Only after all these things might you start thinking about going to EC2/Hadoop, building multiple servers and balancing the load (nginx would also help you there).
After steps 1-3 you should be able to serve some 10'000'000 hits per day easily.
If you need just 1.5-3 times more, I would go for a single more powerful server (8-16 cores, lots of RAM for caching & database).
With step 4 and multiple servers you are on your way to 0.1-1 billion hits per day (but with significantly larger hardware & support expenses).
Find out where issues are happening (or are likely to happen if you don't have them now). Knowing what is your biggest resource usage is important when evaluating any solution. Stick to solutions that will give you the biggest improvement.
Consider:
- higher than needed bandwidth use per user is something you want to address regardless of moving to EC2. It will cost you money either way, so it's worth a shot at looking at things like this: http://developer.yahoo.com/yslow/
- don't invest in changing databases if that's a non-issue. Find out first if that's really the problem; even if you are having issues with the database, it might be a code issue, i.e. hitting the database lots of times per request.
- unless we are talking about very big numbers, you shouldn't have high CPU usage issues. If you do, find out where they are happening; optimization is worth it where specific code has a high impact on your overall resource usage.
- after making sure the above is reasonable, you might get big improvements with caching: in bandwidth (making sure browsers/proxies can play their part in caching) and in local resource usage (avoiding re-processing/re-retrieving the same info all the time).
I'm not saying you should go all out with the above, just enough to make sure you won't get the same issues elsewhere in a very few months. Also enough to find out where your biggest gains are, and whether you will get enough value from any scaling options. This will also allow you to come back and ask questions about specific problems, and how these scaling options relate to those.
You should prepare by choosing a flexible framework and be sure things are going to change along the way. In some situations it's difficult to predict your user's behavior.
If you have seen an explosion of traffic recently, analyze which pages are the slowest.
You can move to the cloud, but EC2 is not the best performing one. Again, be sure there's no other optimization you can do.
Database might be redesigned, but I doubt all of it. Again, see the problem points.
Both Hadoop and Cassandra are pretty nifty, but they might be overkill.
Here's the deal. We would have taken the complete static html road to solve performance issues, but since the site will be partially dynamic, this won't work out for us.
What we have thought of instead is using memcache + eAccelerator to speed up PHP and take care of caching for the most used data.
Here are the two approaches that we have thought of so far:
Using memcache on >>all<< major queries and leaving it alone to do what it does best.
Using memcache for the most commonly retrieved data, combined with a standard hard-drive-stored cache for the rest.
The major advantage of using only memcache is of course the performance, but as the number of users increases, the memory usage gets heavy. Combining the two sounds like a more natural approach to us, despite the theoretical compromise in performance.
Memcached appears to have some replication features available as well, which may come handy when it's time to increase the nodes.
What approach should we use?
- Is it stupid to compromise and combine the two methods? Should we instead focus on utilizing memcache and upgrading the memory as the load increases with the number of users?
Thanks a lot!
Compromising and combining these two methods is a very clever approach, I think.
The most obvious cache management rule is the latency vs. size rule, which is also used in CPU caches. In multi-level caches, each successive level should be larger to compensate for its higher latency: we accept higher latency in exchange for a higher cache hit ratio. So I wouldn't recommend placing a disk-based cache in front of memcache. Conversely, it should be placed behind memcache. The only exception is if your cache directory is mounted in memory (tmpfs). In this case a file-based cache could absorb high load on memcache, and could also have latency benefits (because of data locality).
These two stores (file-based, memcache) are not the only ones convenient for caching. You could also use almost any KV database, as they are very good at concurrency control.
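A minimal Python sketch of that ordering, with the small fast level in front and the larger slow level behind it. Plain dicts stand in for memcache and for the disk cache here, and the `load` callable (the database query) is hypothetical; a real version would also need size limits and expiry.

```python
class TwoLevelCache:
    """Look in the fast level first, then the slow level, then the backend."""

    def __init__(self, fast, slow):
        self.fast = fast  # stand-in for memcache: small, low latency
        self.slow = slow  # stand-in for a disk cache: large, higher latency

    def get(self, key, load):
        if key in self.fast:
            return self.fast[key]
        if key in self.slow:
            value = self.slow[key]
        else:
            value = load(key)        # hypothetical backend call (database)
            self.slow[key] = value   # fill the larger level
        self.fast[key] = value       # promote to the fast level
        return value
```

With this layout, an entry that has dropped out of the fast level is still served from the slow level without touching the database.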
Cache invalidation is a separate question that deserves your attention. There are several tricks you could use to handle cache updates on cache misses more gracefully. One of them is dog-pile effect prevention. If several concurrent threads get a cache miss simultaneously, all of them go to the backend (database). The application should allow only one of them to proceed, and the rest should wait on the cache. The second is background cache update. It's nice to update the cache not in the web request thread but in the background, where you can control the concurrency level and update timeouts more gracefully.
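The first trick can be sketched in Python as follows, under the assumption that all requests run as threads of one process. The class name and the `load` callable are hypothetical. Across several web servers the lock would have to live in shared storage; memcached's atomic `add` command is commonly used for that, but it is not shown here.

```python
import threading


class SingleFlightCache:
    """On a miss, let one thread load the value while the others wait."""

    def __init__(self):
        self._data = {}
        self._locks = {}
        self._guard = threading.Lock()  # protects the per-key lock table

    def get(self, key, load):
        if key in self._data:
            return self._data[key]
        with self._guard:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:
            # Re-check: another thread may have filled the entry
            # while this one was waiting for the per-key lock.
            if key not in self._data:
                self._data[key] = load(key)  # only one thread gets here
            return self._data[key]
```

Threads that miss at the same moment queue on the per-key lock, and all but the first find the value already present when they acquire it.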
Actually there is one cool method which allows you to do tag-based cache tracking (memcached-tag, for example). It's very simple under the hood. With every cache entry you save a vector of the versions of the tags it belongs to (for example: {directory#5: 1, user#8: 2}). When you read a cache entry you also read the current version of each of those tags from memcached (this can be done efficiently with multiget). If at least one current tag version is greater than the version saved in the cache entry, the entry is invalid. And when you change an object (for example a directory), the corresponding tag version should be incremented. It's a very simple and powerful method, but it has its own disadvantages. With this scheme you cannot perform efficient cache invalidation: memcached can easily evict live entries and keep stale ones.
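Here is a small Python sketch of the tag-version idea. It is not the memcached-tag implementation: a dict stands in for memcached, and the class, method names and key prefixes are made up for illustration. A real version would fetch the tag versions with one multi-get.

```python
class TaggedCache:
    """Cache entries remember the tag versions they were stored under."""

    def __init__(self):
        self.store = {}  # stand-in for memcached

    def _versions(self, tags):
        # One multi-get in a real deployment; missing tags count as version 0.
        return {tag: self.store.get("tag:" + tag, 0) for tag in tags}

    def set(self, key, value, tags):
        self.store["val:" + key] = (value, self._versions(tags))

    def get(self, key):
        entry = self.store.get("val:" + key)
        if entry is None:
            return None
        value, saved = entry
        current = self._versions(saved)
        if any(current[tag] > version for tag, version in saved.items()):
            return None  # some tag was bumped after the entry was stored
        return value

    def touch(self, tag):
        """Call when an object changes: bump its tag version."""
        self.store["tag:" + tag] = self.store.get("tag:" + tag, 0) + 1
```

Calling `touch("directory#5")` makes every entry stored under that tag read as a miss, without the application having to know which keys those are.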
And of course you should remember: "There are only two hard things in Computer Science: cache invalidation and naming things" - Phil Karlton.
Memcached is quite a scalable system. For instance, you can replicate the cache to decrease access time for certain key buckets, or implement the Ketama algorithm, which lets you add or remove Memcached instances from the pool without remapping all keys. In this way, you can easily add new machines dedicated to Memcached when you happen to have extra memory. Furthermore, as instances can be run with different sizes, you can scale up one instance by adding more RAM to an old machine. Generally, this approach is more economical and to some extent is not inferior to the first one, especially for multiget() requests. Regarding a performance drop with data growth, the runtime of the algorithms used in Memcached does not vary with the size of the data, so the access time depends only on the number of simultaneous requests. Finally, if you want to tune your memory/performance priorities, you can set the expiry time and available memory configuration values, which will restrict RAM usage or increase cache hits.
At the same time, when you use a hard disk the file system can become a bottleneck of your application. Besides general I/O latency, such things as fragmentation and huge directories can noticeably affect your overall request speed. Also, beware that default Linux hard disk settings are tuned more for compatibility than for speed, so it is advisable to configure the disk properly before use (for instance, you can try the hdparm utility).
Thus, before adding one more integration point, I think you should tune the existing system. Usually, a properly designed database, a well-configured PHP, Memcached and proper handling of static data should be enough even for a high-load web site.
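To show why consistent hashing avoids remapping all keys, here is a simplified hash ring in Python. It follows the general idea behind Ketama (many points per server on a ring) but does not reproduce Ketama's exact point layout; the class name, the server names and the replica count of 100 are assumptions.

```python
import bisect
import hashlib


class HashRing:
    """Simplified consistent-hash ring with several points per node."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> node name
        for node in nodes:
            self.add(node)

    @staticmethod
    def _hash(text):
        return int(hashlib.md5(text.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node):
        for i in range(self.replicas):
            point = self._hash(f"{node}#{i}")
            del self._owner[point]
            self._points.remove(point)

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        i = bisect.bisect(self._points, self._hash(key)) % len(self._points)
        return self._owner[self._points[i]]
```

When a node is added, the only keys that change owner are those that now fall on the new node's points; every other key stays on the server it was already on.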
I would suggest that you first use memcache for all major queries. Then, test to find queries that are least used or data that is rarely changed and then provide a cache for this.
If you can isolate common data from rarely used data, then you can focus on improving performance on the more commonly used data.
Memcached is something that you use when you're sure you need to. You don't worry about it being heavy on memory, because when you evaluate it, you include the cost of the dedicated boxes that you're going to deploy it on.
In most cases putting memcached on a shared machine is a waste of time, as its memory would be better used caching whatever else it does instead.
The benefit of memcached is that you can use it as a shared cache between many machines, which increases the hit rate. Moreover, you can have the cache size and performance higher than a single box can give, as you can (and normally would) deploy several boxes (per geographical location).
Also the way memcached is normally used is dependent on a low latency link from your app servers; so you wouldn't normally use the same memcached cluster in different geographical locations within your infrastructure (each DC would have its own cluster)
The process is:
Identify performance problems
Decide how much performance improvement is enough
Reproduce problems in your test lab, on production-grade hardware with necessary driver machines - this is nontrivial and you may need a lot of dedicated (even specialised) hardware to drive your app hard enough.
Test a proposed solution
If it works, release it to production; if not, try more options and start again.
You should not
Cache "everything"
Do things without measuring their actual impact.
As your performance test environment will never be perfect, you should have sufficient instrumentation / monitoring that you can measure performance and profile your app IN PRODUCTION.
This also means that every single thing that you cache should have a cache hit/miss counter on it. You can use this to determine when the cache is being wasted. If a cache has a low hit rate (< 90%, say), then it is probably not worthwhile.
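A minimal Python sketch of such a counter is shown below. The class name and the `load` callable are hypothetical, and in production the counts would be exported to whatever monitoring system is in use rather than read from the object.

```python
class CountingCache:
    """A dict-backed cache that records hits and misses."""

    def __init__(self, name):
        self.name = name
        self._data = {}
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        if key in self._data:
            self.hits += 1
        else:
            self.misses += 1
            self._data[key] = load(key)  # hypothetical backend call
        return self._data[key]

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Comparing `hit_rate()` against a threshold such as the 90% mentioned above gives a simple signal for which caches are earning their keep.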
It may also be worth having the individual caches switchable in production.
Remember: OPTIMISATIONS INTRODUCE FUNCTIONAL BUGS. Do as few optimisations as possible, and be sure that they are necessary AND effective.
You can delegate the combination of disk/memory cache to the OS (if your OS is smart enough).
For Solaris, you can actually even add SSD layer in the middle; this technology is called L2ARC.
I'd recommend you to read this for a start: http://blogs.oracle.com/brendan/entry/test.