What could be the reasons for Redis slow work/response?
For example, I found on Stack Overflow that storing large files or data in Redis makes it slow. What else could cause it?
There is no simple answer to this question. With all NoSQL or SQL based storage solutions, there are plenty of conditions that could result in high latency or slowness of the storage engine. Redis is no exception.
I would suggest starting by reading:
How fast is Redis?
Redis latency problems troubleshooting
Here is a non-exhaustive list of potential reasons:
Inadequate hardware (network, memory, CPU)
Software based virtualization (Xen on low-end hardware for instance)
Not enough memory, generating swapping at the OS level
Too many O(n) operations (like KEYS) executed in the single-threaded engine
Large objects stored in Redis, leading to uncontrolled expansion of the communication buffers
Huge number of simultaneous sessions (>30000)
Too many connection operations per second (Redis is not a web server; connections are supposed to be permanent, not transient)
Too many round trips generated by the client application (no pipelining or aggregated command usage; see the pipelining sketch after this list)
Large fork operations generated by bgsave or AOF rewrite (especially on VMs)
I/O related latencies when AOF is used
Accumulation of many expire operations triggered at the same time
Accumulation of memory in client and master/slave communication buffers, or slow log data
TCP incast conditions when network bandwidth consumption is significant
Using distributed storage (especially cloud-based volumes such as EC2 EBS) to store dump or AOF files
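For the round-trip item above, here is a minimal sketch of client-side pipelining with redis-py (the key names are illustrative only):

    import redis

    r = redis.Redis(host="localhost", port=6379)

    # Without pipelining, 1000 SET commands cost 1000 network round trips.
    # With a pipeline they are buffered and sent as one batch.
    pipe = r.pipeline(transaction=False)
    for i in range(1000):
        pipe.set(f"key:{i}", i)
    results = pipe.execute()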
There are probably many other reasons, related to the workload generated by your own application.
If anyone thinks of other general reasons, we can add them to this list.
As was mentioned, too many new connections (> 200 per minute) can cause slowness. A possible solution is to add a proxy that keeps a constant number of connections, for example (see the sketch after this list):
twemproxy
envoy
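Twemproxy and Envoy are configured outside the application. As a complementary client-side measure (my own suggestion, not from the answers above), a shared redis-py connection pool also avoids opening a new TCP connection per request; a minimal sketch:

    import redis

    # One pool per process: connections are created lazily, reused across
    # requests, and capped at max_connections.
    pool = redis.ConnectionPool(host="localhost", port=6379, max_connections=50)

    def get_client():
        # Clients built from the pool share its persistent connections
        # instead of opening a new one per request.
        return redis.Redis(connection_pool=pool)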
Related
Why would there be any latency in App Engine in the middle of processing a request? This only happens at times and randomly occurs at different places in the request handling with a latency of around 3 or more seconds after starting to process a request.
The usual suspect is your handler reaching out for some resources, either from GAE APIs (datastore, memcache, etc), other GCP API/infra (cloud storage, machine learning, big query, etc) or an external/3rd party service/URL.
Most, if not all such interactions can occasionally encounter peak response times way longer than average for various possible reasons (or combinations of reasons), for example:
temporary outages of the service being accessed or in the networking layer ensuring connectivity to it
retries at networking or application layers due to communication errors/packet loss
service VMs/instances needing to be launched from scratch during (re)starts or even during scaling up
normal operation conditions which require more time, like datastore transaction retries due to collisions
If the occurrence rate becomes unacceptable, an investigation would be needed to identify which of these external accesses is responsible and under what conditions, and maybe to find some way to prevent or reduce the impact of the occurrences.
Of course, there may be other reasons as well.
I can use only one server to run my application and my Solr server. I was wondering whether, performance- and availability-wise, it makes sense to deploy several SolrCloud and ZooKeeper nodes on this machine (e.g. using VMs or Docker). Since I will be vulnerable to hardware failure either way, my main concerns are protection against software failure, and performance.
Thus, will adding a few nodes (3, maybe?) help me get a Solr server with higher availability or better performance? Or will it have the opposite effect?
Using multiple JVMs on one piece of hardware isn't generally going to help much.
As you've mentioned, using many JVMs on one machine doesn't reduce your vulnerability to hardware failure, and it adds a bunch of cognitive complexity because now you have to remember that just because you have three replicas, it doesn't mean two can fail unless you're extra careful where you put each of the three.
For most purposes, just using additional shards in a single JVM/Solr instance is simpler, and accomplishes the same performance goal of keeping your index size per core down to manageable levels. This is a central feature of SolrCloud.
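As a hedged illustration of that single-instance sharding approach (the collection name and shard count below are placeholders), a multi-shard collection can be created through the SolrCloud Collections API:

    import requests

    # Create a collection with several shards on one Solr instance.
    resp = requests.get(
        "http://localhost:8983/solr/admin/collections",
        params={
            "action": "CREATE",
            "name": "mycollection",      # placeholder name
            "numShards": 3,              # several shards, one JVM/instance
            "replicationFactor": 1,
        },
    )
    resp.raise_for_status()
    print(resp.json())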
The only exception to this I'm aware of is if you're dealing with an index or usage pattern that requires a very large JVM heap. A very large JVM heap can lead to high max GC pause times, and GC tuning can only help so much. In this case, using multiple JVMs, with a single replica/shard per JVM, can constrain the worst-case GC pause to that required for a single replica.
You also mention Zookeeper, so it's worth noting that ZK is a somewhat different beast. You should probably host ZK separately, you should always use an odd number of ZK nodes, and never more than one per physical host.
I'm developing a web application that requires a lot of users to be in the same "universe", where a lot of frequent queries will happen:
frequent lookups of clients that are in a certain box area (between X1, X2, Y1 and Y2)
frequent position updates by clients
frequent chat messages by clients
frequent status updates by clients
frequent connections and disconnections of new and old clients
I believe my nodes can have enough memory for all currently online users to be in RAM. This is why I originally considered Redis. However, I decided Redis is not applicable here because:
it has a single point of failure (one master server)
only the master server can accept writes; with 40 nodes, the 39 slaves would have to send each and every write through the single master
Cassandra seems to solve these issues.
However, is Cassandra also suitable for my frequent queries?
Cassandra optimises writes over reads (reads are expensive compared to writes), but it can still sustain high read and write throughput simultaneously.
With the right column family structures you should be able to do what you want at high frequencies, depending on how big your cluster is.
Personally I'd use Redis for caching most of the information, and only read from Cassandra on cache miss.
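For illustration, a rough sketch of that cache-aside pattern with redis-py and the DataStax cassandra-driver; the keyspace, table, and column names are assumptions, not something from the question:

    import json
    import redis
    from cassandra.cluster import Cluster

    r = redis.Redis(host="localhost", port=6379)
    session = Cluster(["127.0.0.1"]).connect("universe")  # assumed keyspace

    def get_client_state(client_id, ttl=5):
        # Try the Redis cache first; fall back to Cassandra on a miss.
        cached = r.get(f"client:{client_id}")
        if cached is not None:
            return json.loads(cached)
        row = session.execute(
            "SELECT x, y, status FROM clients WHERE client_id = %s",
            (client_id,),
        ).one()
        if row is None:
            return None
        state = {"x": row.x, "y": row.y, "status": row.status}
        r.setex(f"client:{client_id}", ttl, json.dumps(state))
        return state

A short TTL keeps positions reasonably fresh while absorbing most of the read load in Redis.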
Cassandra is definitely a superb solution for handling writes. If you can describe your read load, you can expect a more precise answer, but generally reads are also fine as long as you have enough RAM.
The use case you described seems to include many joins.
Do you have enough reasons to adopt a NoSQL solution right from the development stage? Cassandra is basically a solution for setups that require high scalability, but at the expense of denormalization and largely sacrificing joins. In other words, you trade higher disk usage for lower CPU load.
Or have you finalized your database design and schema (though Cassandra is not strictly schema-bound) so that it fulfills all of your query requirements, especially the read queries? That is very important.
The prevailing wisdom for web services/web requests in general is to design your API so that you use as few requests as possible, with each request therefore returning as much data as is needed.
In database design, the accepted wisdom is to design your queries to minimise size over the network, as opposed to minimizing the number of queries.
They are both remote calls, so what gives?
Probably because the fixed overhead for a web service call (made over the internet) is much higher than the fixed cost of a call to the database (typically over gigabit ethernet or even to the local machine)
Still, I would argue that you always want to reduce trips to the database to as few as necessary. The overhead is lower, but relative to most other operations your program does, it is still quite high.
Web service. You miss one thing - SQL wisdom always said get as little data AS NEEDED and make as many requests as you have to - not "break your requests down into tiny bits".
Also, remote means additional latency. SQL / WS, all the same. Latency is EVIL. Cut down on round trips as much as you can, especially if those cost you 20-30 times as much as in a LAN (<1ms vs. what - 30ms to 150ms in a remote scenario).
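To make that concrete, a rough back-of-the-envelope example using the numbers quoted above:

    50 queries x 1 ms   (LAN round trip)    ~  50 ms of latency
    50 queries x 100 ms (remote/WAN)        ~   5 s  of latency
    1 batched query x 100 ms (remote/WAN)   ~ 100 ms plus transfer time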
Here's the deal. We would have taken the complete static html road to solve performance issues, but since the site will be partially dynamic, this won't work out for us.
What we have thought of instead is using memcache + eAccelerator to speed up PHP and take care of caching for the most used data.
Here's our two approaches that we have thought of right now:
Using memcache on >>all<< major queries and leaving it alone to do what it does best.
Using memcache for the most commonly retrieved data, and combining it with a standard hard-drive-stored cache for further usage.
The major advantage of only using memcache is of course the performance, but as the number of users increases, the memory usage gets heavy. Combining the two sounds like a more natural approach to us, even though it means a theoretical compromise in performance.
Memcached appears to have some replication features available as well, which may come in handy when it's time to add more nodes.
What approach should we use?
Is it unwise to compromise and combine the two methods? Or should we instead focus on utilizing memcache alone and simply upgrade the memory as the load increases with the number of users?
Thanks a lot!
Compromising and combining these two methods is a very sensible approach, I think.
The most obvious cache-management rule is the latency vs. size trade-off, which is also used in CPU caches. In multi-level caches, each next level should have more capacity to compensate for its higher latency: you get higher latency but a higher cache hit ratio. So I would not recommend placing a disk-based cache in front of memcache; conversely, it should be placed behind memcache. The only exception is if your cache directory is mounted in memory (tmpfs). In that case a file-based cache could absorb high load on memcache, and could also have latency benefits (because of data locality).
These two storages (file-based and memcache) are not the only ones convenient for caching. You could also use almost any key-value database, as they are very good at concurrency control.
Cache invalidation is a separate question which deserves your attention. There are several tricks you can use to provide a more graceful cache update on cache misses. One of them is dog-pile effect prevention: if several concurrent threads get a cache miss simultaneously, all of them go to the backend (database). The application should allow only one of them to proceed while the rest wait on the cache. The second is background cache updates: it is better to update the cache not in the web-request thread but in the background, where you can control the concurrency level and update timeouts more gracefully.
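A minimal sketch of the dog-pile prevention idea, assuming a pymemcache client and a hypothetical load_from_database() helper that returns bytes or str:

    import time
    from pymemcache.client.base import Client

    cache = Client(("127.0.0.1", 11211))

    def get_with_dogpile_lock(key, ttl=300, lock_ttl=30):
        value = cache.get(key)
        if value is not None:
            return value
        # add() succeeds for exactly one caller; that caller rebuilds the entry.
        if cache.add("lock:" + key, b"1", expire=lock_ttl, noreply=False):
            try:
                value = load_from_database(key)  # hypothetical backend call
                cache.set(key, value, expire=ttl)
            finally:
                cache.delete("lock:" + key)
            return value
        # Everyone else waits briefly for the winner to repopulate the cache.
        for _ in range(50):
            time.sleep(0.1)
            value = cache.get(key)
            if value is not None:
                return value
        return load_from_database(key)  # fall back if the winner was too slow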
Actually, there is one neat method which allows you to do tag-based cache tracking (memcached-tag, for example). It is very simple under the hood: with every cache entry you save a vector of the versions of the tags it belongs to (for example: {directory#5: 1, user#8: 2}). When you read a cache line, you also read the current version numbers of all its tags from memcached (this can be done efficiently with a multi-get). If at least one current tag version is greater than the version saved in the cache line, the entry is considered invalid. And when you change an object (for example a directory), the appropriate tag version should be incremented. It is a very simple and powerful method, but it has its own disadvantages: in this scheme you cannot evict invalidated entries eagerly, so memcached may drop live entries while keeping stale ones around.
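A rough sketch of that tag-version idea, again with pymemcache; the key layout and helpers are illustrative, not taken from memcached-tag:

    import json
    from pymemcache.client.base import Client

    cache = Client(("127.0.0.1", 11211))

    def tag_key(tag):
        return "tagver:" + tag

    def set_with_tags(key, value, tags, ttl=300):
        # Save the current version of every tag alongside the value.
        versions = {t: int(cache.get(tag_key(t)) or 0) for t in tags}
        cache.set(key, json.dumps({"versions": versions, "value": value}), expire=ttl)

    def get_with_tags(key):
        raw = cache.get(key)
        if raw is None:
            return None
        entry = json.loads(raw)
        # A production version would batch these reads with get_many().
        for tag, saved in entry["versions"].items():
            if int(cache.get(tag_key(tag)) or 0) > saved:
                return None  # the tag moved on: treat the entry as stale
        return entry["value"]

    def invalidate_tag(tag):
        # incr() needs an existing counter, so create it on first use.
        if cache.incr(tag_key(tag), 1, noreply=False) is None:
            cache.set(tag_key(tag), 1)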
And of course you should remember: "There are only two hard things in Computer Science: cache invalidation and naming things" - Phil Karlton.
Memcached is quite a scalable system. For instance, you can replicate the cache to decrease access time for certain key buckets, or implement the Ketama algorithm, which lets you add or remove Memcached instances from the pool without remapping all keys. In this way, you can easily add new machines dedicated to Memcached when you happen to have extra memory. Furthermore, since instances can run with different sizes, you can bring up one more instance by adding more RAM to an old machine. Generally, this approach is more economical and to some extent not inferior to the first one, especially for multiget() requests. Regarding a performance drop as the data grows: the runtime of the algorithms used in Memcached does not vary with the size of the data, so access time depends only on the number of simultaneous requests. Finally, if you want to tune your memory/performance priorities, you can set the expiration time and available memory configuration values, which will limit RAM usage or increase cache hits.
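For example (a hedged sketch, not from the answer above), pymemcache's HashClient spreads keys across several memcached instances; it uses rendezvous hashing rather than Ketama, but the effect is similar in that adding or removing a server only remaps a fraction of the keys:

    from pymemcache.client.hash import HashClient

    # Two memcached instances; the addresses are placeholders.
    client = HashClient([("10.0.0.1", 11211), ("10.0.0.2", 11211)])
    client.set("user:42:profile", b"...", expire=300)
    profile = client.get("user:42:profile")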
At the same time, when you use a hard disk, the file system can become a bottleneck of your application. Besides general I/O latency, such things as fragmentation and huge directories can noticeably affect your overall request speed. Also, beware that default Linux hard disk settings are tuned more for compatibility than for speed, so it is advisable to configure them properly before use (for instance, you can try the hdparm utility).
Thus, before adding one more integration point, I think you should tune the existing system. Usually, a properly designed database, well-configured PHP, Memcached, and sensible handling of static data are enough even for a high-load web site.
I would suggest that you first use memcache for all major queries. Then, test to find queries that are least used or data that is rarely changed and then provide a cache for this.
If you can isolate common data from rarely used data, then you can focus on improving performance on the more commonly used data.
Memcached is something that you use when you're sure you need to. You don't worry about it being heavy on memory, because when you evaluate it, you include the cost of the dedicated boxes that you're going to deploy it on.
In most cases putting memcached on a shared machine is a waste of time, as its memory would be better used caching whatever else it does instead.
The benefit of memcached is that you can use it as a shared cache between many machines, which increases the hit rate. Moreover, you can have the cache size and performance higher than a single box can give, as you can (and normally would) deploy several boxes (per geographical location).
Also the way memcached is normally used is dependent on a low latency link from your app servers; so you wouldn't normally use the same memcached cluster in different geographical locations within your infrastructure (each DC would have its own cluster)
The process is:
Identify performance problems
Decide how much performance improvement is enough
Reproduce problems in your test lab, on production-grade hardware with necessary driver machines - this is nontrivial and you may need a lot of dedicated (even specialised) hardware to drive your app hard enough.
Test a proposed solution
If it works, release it to production, if not, try more options and start again.
You should not
Cache "everything"
Do things without measuring their actual impact.
As your performance test environment will never be perfect, you should have sufficient instrumentation / monitoring that you can measure performance and profile your app IN PRODUCTION.
This also means that every single thing that you cache should have a cache hit/miss counter on it. You can use this to determine when the cache is being wasted. If a cache has a low hit rate (< 90%, say), then it is probably not worthwhile.
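A minimal sketch of such per-cache instrumentation, assuming a pymemcache client and an in-process counter as the stats sink (a real setup would export these to your monitoring system):

    from collections import Counter
    from pymemcache.client.base import Client

    cache = Client(("127.0.0.1", 11211))
    stats = Counter()

    def instrumented_get(cache_name, key):
        value = cache.get(key)
        stats[cache_name + (".hit" if value is not None else ".miss")] += 1
        return value

    def hit_rate(cache_name):
        hits, misses = stats[cache_name + ".hit"], stats[cache_name + ".miss"]
        return hits / (hits + misses) if hits + misses else 0.0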
It may also be worth having the individual caches switchable in production.
Remember: OPTIMISATIONS INTRODUCE FUNCTIONAL BUGS. Do as few optimisations as possible, and be sure that they are necessary AND effective.
You can delegate the combination of disk/memory cache to the OS (if your OS is smart enough).
For Solaris, you can actually even add an SSD layer in the middle; this technology is called L2ARC.
I'd recommend reading this for a start: http://blogs.oracle.com/brendan/entry/test.