Solr always uses more than 90% of physical memory

I have 300,000 documents stored in a Solr index on a server with 4 GB of RAM, but Solr consumes more than 90% of the physical memory. So I moved my data to a new server which has 16 GB of RAM, and again Solr consumes more than 90% of memory. I don't know how to resolve this issue. I am using the default MMapDirectory and Solr version 4.2.0. Please explain the reason for this, or any solution you may have.

MMapDirectory tries to use the OS memory (OS cache) as much as possible; this is normal behaviour. It will try to map the entire index into memory if memory is available, and that is in fact a good thing: since the memory is free, Lucene will use it, and if another application on the same machine demands more, the OS will release it. This is one of the reasons why Solr/Lucene queries are an order of magnitude faster, as most calls to the server are served from memory (depending on how much of the index fits) rather than from disk.
JVM memory is a different thing and can be controlled; only in-flight query response objects and certain cache entries live on the JVM heap, so the heap size can be configured based on the number of requests and cache entries.
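To illustrate (a sketch, not part of the original answer): on Linux you can confirm that most of that "90% used" memory is reclaimable page cache rather than memory the Solr process has permanently claimed, for example with free (the numbers below are hypothetical):

    # How much of the "used" memory is actually OS page cache that the kernel
    # hands back the moment another process needs it
    free -m
    #             total   used   free  shared  buff/cache  available
    # Mem:        16040  15400    250     120       12800       13900   <- hypothetical

The buff/cache column (buffers/cached on older procps versions) is where the memory-mapped index files live; the available column shows what could still be handed to other applications.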

What -Xmx value are you using when invoking the JVM? If you are not setting one explicitly, the JVM will pick a default based on the machine's characteristics.
Once you give Solr a maximum amount of heap, it will potentially use all of it if it needs to; that is how it works. If you want to limit it to, say, 2 GB, use -Xmx2000m when you invoke the JVM. Not sure how large your docs are, but 300k docs would be considered a smallish index.
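For example (a sketch; Solr 4.2 ships with a Jetty-based example, and the exact start command depends on how you deploy it):

    # Start the bundled Jetty example with an explicit 2 GB heap cap
    cd solr-4.2.0/example
    java -Xms512m -Xmx2048m -jar start.jar

    # If you deploy solr.war in Tomcat instead, pass the same flags through JAVA_OPTS,
    # e.g. in a setenv.sh picked up by catalina.sh:
    export JAVA_OPTS="$JAVA_OPTS -Xms512m -Xmx2048m"

Either way, cap the heap well below physical RAM so the OS keeps enough room to cache the index files.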

Related

SolrCloud replicas down after modifying JVM heap size

I had to increase the JVM memory from Solr's default value of 512m to 10g. I changed the values directly in the files 'solr/bin/solr.cmd' and 'solr/bin/solr.in.cmd' and restarted the SolrCloud cluster.
All the replicas now show status Down, and I get a 404 error when I execute a query against the collection.
Nothing shows up in the logs about the replicas being down.
What steps do I need to perform to get all the replicas back to Active?
I don't really understand why you had to increase the JVM memory from 512 MB directly to 10 GB.
You should know that Solr and Lucene use MMapDirectory by default. This means the indexes are not loaded into the JVM heap; they are memory-mapped into a dedicated region of virtual memory and served from the OS cache. This blog post can help you.
Considering you have 16GB RAM available, as a first iteration, I'd allocate 4 GB to the JVM so that 12GB remains available for the operating system (and 6GB of index files). Then, by monitoring the system memory and the JVM memory, you can do better tuning.
That being said, I don't think a high JVM heap allocation by itself is enough to break all the Solr instances. Can you please verify that you updated only the JVM heap memory value? Can you also check whether the logs show any initialization failures?
There is still some missing information:
How many nodes is your SolrCloud composed of?
How many replicas, and of what type?
PS: Considering you are working with solr.cmd and solr.in.cmd, I assume your server runs Windows; the Linux version reads the solr.in.sh script instead.
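For example (a sketch; the heap is normally set through the SOLR_JAVA_MEM variable rather than by editing solr.cmd itself, and the 4g value follows the sizing suggested above):

    # solr.in.sh (Linux)
    SOLR_JAVA_MEM="-Xms4g -Xmx4g"

    # solr.in.cmd (Windows) -- same setting in cmd syntax; the shipped default is
    # commented out as: REM set SOLR_JAVA_MEM=-Xms512m -Xmx512m
    set SOLR_JAVA_MEM=-Xms4g -Xmx4g

After changing it, restart each node and watch the JVM memory graph in the Admin UI before increasing it further.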

Increased use of swap on AWS Aurora

I have an AWS RDS Aurora (PostgreSQL-compatible) instance which recently triggered an alert because of increased swap usage, caused by running some unoptimized queries (big temporary tables and sequential scans). The basic AWS metrics look like this:
(Chart legend: blue line = freeable memory, purple line = swap usage, yellow line = freeable minus swap.)
I have a few questions I could not find an answer to, neither in the AWS docs, nor in forums, nor on SO:
Why did the DB start allocating swap while it still had a lot of freeable memory?
Why isn't it releasing the swap if it's no longer used? How can I reduce the amount of used swap?
Why does it also add to freeable memory?
You can find more details about RDS swap memory in the AWS Knowledge Center: https://aws.amazon.com/premiumsupport/knowledge-center/troubleshoot-rds-swap-memory/
Swap memory is an essential part of the OS; it extends the available memory by storing additional data on disk. When more memory is needed, old contents of RAM are written to the swap area on disk and the new contents are placed in RAM. In this case it indicates that a query (or set of queries) was executed that fetched or scanned more records and therefore needed more RAM, so the OS made room by moving some old data to swap.
As per the KB article, this is the reason the swap usage does not go back down:
Linux swap usage isn't cleared frequently, because clearing the swap usage requires extra overhead to reallocate swap when it's needed and when reloading pages. As a result, if swap space is used on your RDS DB instance, even if swap space was used only one time, the SwapUsage metrics don't return to zero.
Postgres caches data from previous executions in RAM so that it can reduce disk seeks the next time. You can improve database performance by allocating a sufficient buffer cache; this is expected behaviour, and the size of this cache is configurable. Please refer to: https://redfin.engineering/how-to-boost-postgresql-cache-performance-8db383dc2d8f
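As a quick check (a sketch; on RDS/Aurora these values come from the DB parameter group rather than a postgresql.conf you edit yourself, and the endpoint/user/database names are placeholders):

    # Inspect the current buffer cache and per-query memory settings
    psql -h <aurora-endpoint> -U <user> -d <db> -c "SHOW shared_buffers;"
    psql -h <aurora-endpoint> -U <user> -d <db> -c "SHOW work_mem;"

On Aurora you change these values through the cluster or instance parameter group in the RDS console or CLI.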
Also, as mentioned in the KB, this could be due to queries that return a huge number of records, or to load on the database. You can enable Performance Insights to get more details about the queries that were running at that time.
BTW, Performance Insights may not be available on smaller RDS instances. In that case, you can look into the database logs to see which queries were executed; enabling slow query logging will also help.
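If you want to correlate the graphs yourself (a sketch using the AWS CLI; the instance identifier and time window are placeholders):

    # Pull the SwapUsage metric (bytes) for one day at hourly resolution
    aws cloudwatch get-metric-statistics \
        --namespace AWS/RDS \
        --metric-name SwapUsage \
        --dimensions Name=DBInstanceIdentifier,Value=<your-db-instance-id> \
        --statistics Average \
        --period 3600 \
        --start-time 2019-01-01T00:00:00Z \
        --end-time 2019-01-02T00:00:00Z

The same call with --metric-name FreeableMemory gives you the blue line for comparison.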

How to uninitialize a Solr core when it is not used

We are using a Solr multicore environment in production, and we have hundreds of users, each with one core.
The problem: when most of the users access Solr concurrently, the related cores get initialized, and Solr does not automatically free the RAM used by cores that are no longer in use. Consequently, server RAM usage becomes critical and we have to restart the server every time. It is not a dedicated server, so if RAM usage grows, other shared resources can be affected.
So how do we clear unused cores from RAM? Is there any API or workaround to achieve this in Solr?
There are a few settings that allow you to handle unloading of cores more dynamically. First, you'll need to mark each core as transient="true", meaning that the core can be unloaded if the number of loaded cores exceeds transientCacheSize. The latter is an option you can add within your <solr> element in solr.xml, while the former goes in each core definition (in core.properties). You can also provide transient=true when creating a core through the API. The value of transientCacheSize will have to be tuned to the load you're seeing, the size of the cores, and the amount of memory available (and used by each core).
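A minimal sketch of those two settings (assuming core-discovery-style configuration; the cache size and core name are illustrative only):

    <!-- solr.xml: keep at most 20 transient cores loaded at any time -->
    <solr>
      <int name="transientCacheSize">20</int>
    </solr>

    # core.properties for each per-user core
    name=user_core_0001
    transient=true
    loadOnStartup=false

With this in place, the least recently used transient core is unloaded automatically once the limit is exceeded, instead of every core piling up in RAM.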
It sounds like you've already discovered loadOnStartup which tells Solr if it should attempt to bring the core into memory when the application container starts, or wait until it's actually going to be used.
The Lots of Cores Wiki Page also has a long list of Solr issue tickets that you can dig into to learn more about the features.

Optimizing Solr 4 on EC2 Debian instance(s)

My Solr 4 instance is slow and I don't know why.
I am attempting to modify the configurations of JVM, Tomcat6 and Solr 4 in order
to optimize performance, with queries per second as the key metric.
Currently I am running on an EC2 small instance with Debian Squeeze, but I am ready to switch to Ubuntu if needed.
There is nothing special about my use case. The index is small. Queries do include a moderate number of unions (e.g. 10), plus faceting, but I don't think that's unusual.
My understanding is that these areas could need tweaking:
Configuring the JVM Garbage collection schedule and memory allocation ("GC tuning is a precise art form", ref)
Other JVM settings
Solr's Query Result cache, Filter cache, Document cache settings
Solr's Auto-warming settings
There are a number of ways to monitor the performance of Solr:
SolrMeter
Sematext SPM
New Relic
But none of these methods indicate which settings need to be adjusted, and there's no guide that I know of that steps through an exhaustive list of settings that could possibly improve performance. I've reviewed the following pages (one, two, three, four), and gone through some rounds of trial and error so far without improvement.
Questions:
How do I tell the JVM to use all of the 2 GB of memory on the small EC2 instance?
How do I debug and optimize JVM garbage collection?
How do I know when I/O throttling, such as the new EBS IOPS pricing, is the issue?
Using figures like the New Relic examples below, how do I detect problematic behavior, and how should I approach solutions?
Answers:
I'm looking for a link to good documentation for setting up and optimizing Solr 4, from a DevOps or server-admin perspective (not index or application design).
I'm looking for the top trouble spots in catalina.sh, solrconfig.xml, solr.xml (others?) that are the most likely causes of problems.
Or any tips you think address the questions.
First, you should not focus on switching your Linux distribution. A different distribution might bring some changes, but considering the information you gave, nothing proves that those changes would be significant.
You are mentioning lots of possible optimisations, which can be overwhelming. You should consider tweaking an area only once you have proven that the problem lies in that particular part of the stack.
JVM Heap Sizing
You can use the parameter -Xmx1700m to give a maximum of 1.7 GB of RAM to the JVM. HotSpot might not need it all, so don't be surprised if your heap capacity does not reach that number.
You should set the minimum heap size to a low value, so that HotSpot can optimise its memory usage. For instance, to set a minimum heap size of 128 MB, use -Xms128m.
Garbage Collector
From what you say, you have limited hardware (one core at 1.2 GHz max; see this page):
M1 Small Instance
1.7 GiB memory
1 EC2 Compute Unit (1 virtual core with 1 EC2 Compute Unit)
...
One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor
Therefore, using that low-latency GC (CMS) won't do any good. It won't be able to run concurrently with your application since you have only one core. You should switch to the Throughput GC using -XX:+UseParallelGC -XX:+UseParallelOldGC.
Is the GC really a problem?
To answer that question, you need to turn on GC logging. It is the only way to see whether GC pauses are responsible for your application response time. You should turn these on with -Xloggc:gc.log -XX:+PrintGCDetails.
But I don't think the problem lies here.
Is it a hardware problem?
To answer this question, you need to monitor resource utilization (disk I/O, network I/O, memory usage, CPU usage). You have a lot of tools to do that, including top, free, vmstat, iostat, mpstat, ifstat, ...
If you find that some of these resources are saturated, then you need a bigger EC2 instance.
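For example (a sketch; any of the tools listed above will do, these are just common starting points):

    vmstat 5        # run queue, swap in/out, CPU user/idle/wait, sampled every 5 s
    iostat -x 5     # per-device utilisation and await times (disk saturation)
    free -m         # RAM left to the OS cache after the JVM heap is taken
    top             # which process is consuming CPU or memory right now

If %iowait stays high or a device sits near 100% utilisation, the bottleneck is I/O rather than the JVM.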
Is it a software problem?
In your stats, the document cache hit rate and the filter cache hit rate are healthy. However, I think the query result cache hit rate is pretty low. This implies a lot of query operations.
You should monitor the query execution time. Depending on that value you may want to increase the cache size or tune the queries so that they take less time.
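For reference, this cache is sized in solrconfig.xml (a sketch with illustrative values, not a recommendation for your index):

    <!-- solrconfig.xml: a larger queryResultCache with some autowarming -->
    <queryResultCache class="solr.LRUCache"
                      size="1024"
                      initialSize="512"
                      autowarmCount="128"/>

    <!-- cache a window of results per query so paging does not re-execute it -->
    <queryResultWindowSize>40</queryResultWindowSize>

Increase the size only if the hit rate actually improves; every warmed entry also costs heap.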
More links
JVM options reference : http://jvm-options.tech.xebia.fr/
Feedback I wrote on an application performance audit: http://www.pingtimeout.fr/2013/03/petclinic-performance-tuning-about.html
Hope that helps!

SOLR4/lucene and JVM memory management

Does anyone know how Solr 4/Lucene and the JVM manage memory?
We have the following case.
We have a 15 GB server running only Solr 4/Lucene on the JVM (no custom code).
We had allocated 2 GB of memory and the JVM was using 1.9 GB. At some point something happened and we ran out of memory.
Then we increased the JVM memory to 4 GB, and we see that the JVM gradually uses as much as it can; it is now using 3 GB out of the 4 GB allocated.
Is that normal JVM memory usage? I.e., does the JVM always use as much as it can of the allocated space?
Thanks for your help
Faceting uses the field cache. If you have a lot of fields (even small ones) that are used for faceting, it may very well be that you eventually hit your memory limit.
If you have a lot of facet queries or define a lot of fq filters, then in your case about 1 MB is allocated for each filter cache entry. You have defined 16384 as the cache's upper bound, so you may also hit large numbers there eventually.
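For reference (a sketch of where that 16384 upper bound lives in solrconfig.xml; each cached filter is roughly one bit per document in the index, which is where a figure like 1 MB per entry comes from on an index of several million documents):

    <!-- solrconfig.xml: filter cache used by fq parameters and facet queries -->
    <filterCache class="solr.FastLRUCache"
                 size="16384"
                 initialSize="512"
                 autowarmCount="0"/>

If memory is tight, lowering size here is usually the first lever to pull.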
Hope this helps.
