Solr 4/Lucene and JVM memory management

Does anyone know how Solr 4/Lucene and the JVM manage memory?
We have the following case.
We have a 15 GB server running only Solr 4/Lucene on the JVM (no custom code).
We had allocated 2 GB of memory and the JVM was using 1.9 GB. At some point something happened and we ran out of memory.
We then increased the JVM memory to 4 GB, and we see that the JVM gradually uses as much as it can. It is now using 3 GB of the 4 GB allocated.
Is that normal JVM memory usage? That is, does the JVM always use as much of the allocated space as it can?
Thanks for your help.

Faceting uses the field cache. If you have a lot of fields (even small ones) used for faceting, it may well be that you eventually hit your memory limit.
If you have a lot of query facets or define a lot of fq filters, roughly 1 MB is allocated per filter cache entry in your case. You have configured 16384 as the upper bound, so the filter cache can also reach large numbers eventually.
Hope this helps.
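As a rough back-of-the-envelope illustration of the numbers above (a sketch with assumed values, not measurements from your instance), each non-sparse filterCache entry is essentially a bitset of one bit per document, so a ~1 MB entry corresponds to roughly 8 million documents, and a 16384-entry bound can in theory outgrow a 2-4 GB heap:

```java
public class FilterCacheEstimate {
    public static void main(String[] args) {
        // Assumed values: a hypothetical index of ~8 million documents,
        // where each filterCache entry is a bitset of one bit per document.
        long docsInIndex = 8_000_000L;
        long bytesPerEntry = docsInIndex / 8;   // ~1 MB per cached filter
        int filterCacheBound = 16_384;          // the upper bound from the question

        long worstCaseBytes = bytesPerEntry * (long) filterCacheBound;
        System.out.printf("Per-entry cost:    %.1f MB%n", bytesPerEntry / (1024.0 * 1024.0));
        System.out.printf("Worst-case cache:  %.1f GB%n", worstCaseBytes / (1024.0 * 1024.0 * 1024.0));
        // With these assumptions, a full filterCache alone would dwarf a 2-4 GB heap,
        // which is consistent with the heap filling up gradually as filters accumulate.
    }
}
```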

Related

SolrCloud replicas down after modifying JVM heap size

I had to increase the JVM memory to 10g from Solr's default value of 512m. I changed the values directly in the files 'solr/bin/solr.cmd' and 'solr/bin/solr.in.cmd' and restarted the SolrCloud cluster.
All the replicas are showing status Down, and I am getting a 404 error when I execute a query on the collection.
Nothing shows up in the logs about the replicas being down.
What steps do I need to perform to get all the replicas back to Active?
I don't really understand why you had to increase the JVM memory from 512 MB directly to 10 GB.
You should know that Solr and Lucene use MMapDirectory by default. This means the index is not loaded into the JVM heap; instead, the index files are memory-mapped by the operating system in a dedicated region of virtual memory. This blog post can help you.
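As a small illustration of this (a sketch assuming a recent Lucene version on the classpath and a hypothetical index path), opening an index with MMapDirectory maps the index files into the process's virtual address space instead of copying them onto the heap:

```java
import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.store.MMapDirectory;

public class MmapIllustration {
    public static void main(String[] args) throws Exception {
        // Hypothetical index location -- Solr does the equivalent of this internally by default.
        try (MMapDirectory dir = new MMapDirectory(Paths.get("/var/solr/data/collection1/data/index"));
             DirectoryReader reader = DirectoryReader.open(dir)) {
            // The index files are memory-mapped by the OS page cache; only small per-segment
            // structures and whatever the application allocates end up on the JVM heap.
            System.out.println("Documents in index: " + reader.numDocs());
        }
    }
}
```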
Considering you have 16 GB of RAM available, as a first iteration I'd allocate 4 GB to the JVM, so that 12 GB remains available for the operating system (and your 6 GB of index files). Then, by monitoring both the system memory and the JVM memory, you can tune further.
That being said, I don't think a high JVM heap allocation alone is enough to break all Solr instances. Can you please verify that you updated only the JVM heap memory value? Can you also check whether the logs show any initialization failures?
There is still some missing information:
How many nodes is your SolrCloud composed of?
How many replicas, and of what type?
PS: Since you are working with solr.cmd and solr.in.cmd, I assume your server runs Windows; the Linux version uses the solr.in.sh script.

Increased use of swap on AWS Aurora

I have an AWS RDS Aurora (PostgreSQL-compatible) instance, which recently triggered an alert because of increased swap usage caused by running some unoptimized queries (big temporary tables and sequential scans). Some basic AWS metrics look like this:
Blue line: freeable memory
Purple line: swap usage
Yellow line: freeable - swap
I have a few questions I could not find an answer to, neither in the AWS docs, in forums, nor on SO:
Why did the DB start allocating swap while it still had a lot of freeable memory?
Why isn't it releasing the swap if it's no longer used? How can I reduce the amount of used swap?
Why does it also add to freeable memory?
You can find more details about RDS swap memory in the AWS Knowledge Center: https://aws.amazon.com/premiumsupport/knowledge-center/troubleshoot-rds-swap-memory/.
Swap is an essential part of the OS; it extends the available memory by storing additional data on disk. When more memory is requested, old contents of RAM are written to the swap area on disk and the new contents are placed in RAM. In this case, it indicates that a query (or set of queries) was executed that fetched/scanned more records and therefore required more RAM, so the OS made room by moving some old data to swap.
As per the KB article, this is why swap usage does not go back down:
Linux swap usage isn't cleared frequently, because clearing the swap usage requires extra overhead to reallocate swap when it's needed and when reloading pages. As a result, if swap space is used on your RDS DB instance, even if swap space was used only one time, the SwapUsage metrics don't return to zero.
Postgres caches data from previous executions in RAM so that it can reduce disk seeks next time; you can improve database performance by allocating a sufficient buffer cache. This is expected behavior, and the size of this cache is configurable. Please refer to: https://redfin.engineering/how-to-boost-postgresql-cache-performance-8db383dc2d8f
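If you want to inspect the memory-related settings mentioned above, a minimal sketch like the following (assuming the PostgreSQL JDBC driver and a placeholder Aurora endpoint and credentials) reads shared_buffers and work_mem via current_setting():

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class CheckPgMemorySettings {
    public static void main(String[] args) throws Exception {
        // Placeholder JDBC URL and credentials -- replace with your Aurora endpoint.
        String url = "jdbc:postgresql://your-aurora-endpoint:5432/yourdb";
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             Statement st = conn.createStatement();
             // current_setting() returns the active value of a server parameter.
             ResultSet rs = st.executeQuery(
                     "SELECT current_setting('shared_buffers') AS shared_buffers, "
                     + "current_setting('work_mem') AS work_mem")) {
            if (rs.next()) {
                // shared_buffers is the page cache; work_mem bounds per-operation memory
                // (sorts, hashes) before PostgreSQL spills to temporary files on disk.
                System.out.println("shared_buffers = " + rs.getString("shared_buffers"));
                System.out.println("work_mem       = " + rs.getString("work_mem"));
            }
        }
    }
}
```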
Also, as mentioned in the KB, this could be due to queries that return a huge number of records, or to load on the database. You can enable Performance Insights to get more details about the queries that were running at that time.
BTW, Performance Insights may not be available on smaller RDS instances. In that case, you can look into the database logs to see which queries were executed; enabling slow query logging will also help.

Flink taskmanager out of memory and memory configuration

We are using Flink streaming to run a few jobs on a single cluster. Our jobs use RocksDB to hold their state.
The cluster is configured to run with a single JobManager and 3 TaskManagers on 3 separate VMs.
Each TM is configured to run with 14 GB of RAM.
The JM is configured to run with 1 GB.
We are experiencing two memory-related issues:
- When running the TaskManager with an 8 GB heap allocation, the TM ran out of heap memory and we got a heap out-of-memory exception. Our solution was to increase the heap size to 14 GB. This configuration seems to have solved the issue, as we no longer crash due to running out of heap memory.
- Still, after increasing the heap size to 14 GB per TM process, the OS runs out of memory and kills the TM process. RES memory rises over time, reaching ~20 GB per TM process.
1. How can we predict the maximum total amount of physical memory needed, and how should we configure the heap size?
2. Given our memory issues, is it reasonable to use non-default values for Flink managed memory? What would the guideline be in that case?
Further details:
Each VM is configured with 4 CPUs and 24 GB of RAM.
Using Flink version: 1.3.2
The total amount of required physical and heap memory is quite difficult to compute since it strongly depends on your user code, your job's topology and which state backend you use.
As a rule of thumb, if you experience OOM and are still using the FileSystemStateBackend or the MemoryStateBackend, then you should switch to RocksDBStateBackend, because it can gracefully spill to disk if the state grows too big.
If you are still experiencing OOM exceptions as you have described, then you should check your user code whether it keeps references to state objects or generates in some other way large objects which cannot be garbage collected. If this is the case, then you should try to refactor your code to rely on Flink's state abstraction, because with RocksDB it can go out of core.
RocksDB itself needs native memory which adds to Flink's memory footprint. This depends on the block cache size, indexes, bloom filters and memtables. You can find out more about these things and how to configure them here.
Last but not least, you should not activate taskmanager.memory.preallocate when running streaming jobs, because streaming jobs currently don't use managed memory. By activating preallocation, you would allocate memory for Flink's managed memory, which reduces the available heap space.
Using RocksDBStateBackend can lead to significant off-heap/direct memory consumption, up to the available memory on the host. Normally that doesn't cause a problem, when the task manager process is the only big memory consumer. However, if there are other processes with dynamically changing memory allocations, it can lead to out of memory. I came across this post since I'm looking for a way to cap the RocksDBStateBackend memory usage. As of Flink 1.5, there are alternative option sets available here. It appears though that these can only be activated programmatically, not via flink-conf.yaml.
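As a rough illustration of the programmatic configuration mentioned above, here is a minimal sketch against the Flink 1.5-era RocksDB state backend API (class and method names such as OptionsFactory should be checked against your Flink version; the checkpoint path is a placeholder):

```java
import org.apache.flink.contrib.streaming.state.OptionsFactory;
import org.apache.flink.contrib.streaming.state.RocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.rocksdb.ColumnFamilyOptions;
import org.rocksdb.DBOptions;

public class RocksDbMemoryTuning {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Placeholder checkpoint location -- replace with your checkpoint storage.
        RocksDBStateBackend backend = new RocksDBStateBackend("hdfs:///flink/checkpoints");

        // Cap RocksDB's per-column-family write buffers to curb native (off-heap) memory growth.
        backend.setOptions(new OptionsFactory() {
            @Override
            public DBOptions createDBOptions(DBOptions currentOptions) {
                return currentOptions;
            }

            @Override
            public ColumnFamilyOptions createColumnOptions(ColumnFamilyOptions currentOptions) {
                return currentOptions
                        .setWriteBufferSize(32 * 1024 * 1024) // 32 MB memtable per column family
                        .setMaxWriteBufferNumber(2);          // keep at most 2 memtables in memory
            }
        });

        env.setStateBackend(backend);
        // ... define and execute the streaming job here ...
    }
}
```

Note that this only bounds the memtables; the block cache, indexes, and bloom filters mentioned above contribute native memory as well.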

Whole Oracle database in memory

Suppose I have an Oracle database whose data files are 256 GB in size. Is it a good idea to use a server with, say, 384 GB RAM in order to host the entire database in RAM?
Is there any difference if you only have, say, 128 GB RAM?
I'm talking about caching and Oracle's inner workings, not a memory-based filesystem. Assume OLTP and a 100 GB working set.
Assuming you are talking about Oracle using the memory for caching and other processes, and not a memory-based filesystem (which is an awful idea)... more memory is almost always better than less memory.
The real-world answer is: it depends. If your working set of data is only a few GB, the extra memory won't help as much.
How much memory you need, and when extra memory stops helping, depends on your application and the style of database (OLTP, DSS); there is no simple yes/no answer.
Use the views V$SGA_TARGET_ADVICE and V$PGA_TARGET_ADVICE to predict the performance improvement of additional memory.
Oracle records many statistics about physical (disk) and logical (total) I/O requests. People used to obsess over the buffer cache hit ratio. It can be helpful but that number doesn't tell the whole story. If the ratio is 99% then your cache is probably sufficient and adding more memory won't help. If it's low then you might benefit from more memory, or perhaps the processes that use disk aren't time critical.
Be careful before you request more memory. I've seen a lot of memory wasted because some people assume more memory will solve everything. Oracle has many I/O features to help reduce memory requirements. The "in-memory database" fad is mostly hype.
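To act on the advice views mentioned above, here is a minimal JDBC sketch (connection string and credentials are placeholders; column names follow the documented V$SGA_TARGET_ADVICE layout, and V$PGA_TARGET_ADVICE can be queried the same way):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SgaAdviceReport {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details -- replace with your host, service name, and credentials.
        String url = "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1";
        try (Connection conn = DriverManager.getConnection(url, "system", "password");
             Statement st = conn.createStatement();
             // Each row estimates DB time and physical reads for a candidate SGA size (in MB).
             ResultSet rs = st.executeQuery(
                     "SELECT sga_size, sga_size_factor, estd_db_time, estd_physical_reads "
                     + "FROM v$sga_target_advice ORDER BY sga_size")) {
            while (rs.next()) {
                System.out.printf("SGA %6d MB (x%.2f): estd DB time=%d, estd physical reads=%d%n",
                        rs.getLong("sga_size"),
                        rs.getDouble("sga_size_factor"),
                        rs.getLong("estd_db_time"),
                        rs.getLong("estd_physical_reads"));
            }
        }
    }
}
```

If the estimated DB time and physical reads barely improve at larger SGA sizes, extra memory is unlikely to help, which matches the hit-ratio caveat above.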

Solr always uses more than 90% of physical memory

I have 300,000 documents stored in a Solr index and allocated 4 GB of RAM for the Solr server, but it consumes more than 90% of physical memory. So I moved my data to a new server with 16 GB of RAM; again, Solr consumes more than 90% of memory. I don't know how to resolve this issue. I am using the default MMapDirectory and Solr version 4.2.0. Please explain if you have a solution or know the reason for this.
MMapDirectory tries to use OS memory (the OS page cache) as fully as possible; this is normal behaviour. It will try to keep the entire index in memory if memory is available, and in fact this is a good thing: since the memory is free, it might as well be used. If another application on the same machine demands more, the OS will release it. This is one of the reasons Solr/Lucene queries are an order of magnitude faster, as most calls to the server are served from memory (depending on how much is available) rather than from disk.
JVM memory is a different thing; it can be controlled. Only in-flight query/response objects and certain cache entries use JVM memory, so the JVM heap size can be configured based on the number of requests and cache entries.
What -Xmx value are you using when invoking the JVM? If you are not setting an explicit value, the JVM will choose one based on the machine's characteristics.
Once you give Solr a maximum heap size, it will potentially use all of it if it needs to; that is how it works. If you want to limit it to, say, 2 GB, use -Xmx2000m when you invoke the JVM. I'm not sure how large your docs are, but 300k docs would be considered a smallish index.
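As a quick way to see what heap limit the JVM actually picked (whether from an explicit -Xmx or from the machine defaults) and how much of it is currently in use, a small sketch like this can be run inside the same JVM (class name is just an example):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapReport {
    public static void main(String[] args) {
        // Heap usage as tracked by the JVM: used vs. committed vs. max (-Xmx or the default).
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long mb = 1024L * 1024L;
        System.out.println("Heap used:      " + heap.getUsed() / mb + " MB");
        System.out.println("Heap committed: " + heap.getCommitted() / mb + " MB");
        System.out.println("Heap max:       " + heap.getMax() / mb + " MB");
    }
}
```

Seeing "used" climb toward "max" over time is normal; the JVM only collects aggressively once it approaches the configured ceiling.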
