solr - Size of Heap Space - solr

I have an Alfresco 6.2.0 instance on an Ubuntu system using Solr specification 6.6.7 and Search Services 1.4.0. I have two cores, currently with 155364 documents in the alfresco core and 126054 documents in the archive core. Until today Solr had 1 GB of heap space, and over the last few weeks it has been exiting more and more often with heap space out-of-memory errors. Today I raised the heap to 2 GB, hoping that this is enough.
Is it normal for Solr to need this much memory? Are 100,000 documents (no big files except the images) really so many that Solr needs more than 1 GB? I am just wondering, because the instance is used by a small company.
Thanks,
Florian

1-2 GB of heap is not much for Solr; if anything, it is on the small side. In fact, Solr is usually the component in the Alfresco architecture that is allocated the most memory. The index with the metadata for the full text should fit into memory if possible to ensure fast searching. You can have a look at "Sizing my Alfresco Search Services".

Related

Solrcloud replica down after modify heap size jvm memory

I had to increase the JVM memory of Solr from the default value of 512m to 10g. I changed the values directly in the files 'solr/bin/solr.cmd' and 'solr/bin/solr.in.cmd' and restarted the SolrCloud.
All the replicas now show their status as Down, and I am getting an error message with status 404 when I execute a query on the collection.
Nothing shows up in the log about the replicas being down.
What steps do I need to perform to get all the replicas back to Active?
I don't really understand why you had to increase the JVM memory from 512 MB directly to 10 GB.
You should know that Solr and Lucene use MMapDirectory by default. This means that the index files are not loaded into the JVM heap; they are memory-mapped and held in the operating system's page cache, outside the heap. This blog post can help you.
Considering you have 16GB RAM available, as a first iteration, I'd allocate 4 GB to the JVM so that 12GB remains available for the operating system (and 6GB of index files). Then, by monitoring the system memory and the JVM memory, you can do better tuning.
That being said, I don't think the high JVM allocated memory is enough to break all Solr instances. Can you please verify that you updated only the JVM heap memory value? Can you also verify if the logs show some initialization failures?
There is still some missing information:
How many nodes is your SolrCloud composed of?
How many replicas are there, and of what type?
PS: Since you are working on solr.cmd and solr.in.cmd, I assume your server runs Windows; the Linux version invokes the solr.in.sh script.
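The memory split suggested in this answer can be checked with a few lines of arithmetic. The following is only a minimal sketch: the function name `split_ram` and the `os_reserve_gb` parameter (RAM set aside for the OS itself and other processes, 1 GB here) are assumptions for illustration, not anything Solr provides. The 16 GB of RAM, the 4 GB and 10 GB heap sizes and the 6 GB of index files are the figures mentioned in this thread.

```python
def split_ram(total_ram_gb, heap_gb, index_size_gb, os_reserve_gb=1):
    """Return (page_cache_gb, index_fits) for a given JVM heap size.

    os_reserve_gb is a hypothetical allowance for the OS and other
    processes; adjust it to what monitoring shows on the real machine.
    """
    page_cache_gb = total_ram_gb - heap_gb - os_reserve_gb
    if page_cache_gb < 0:
        raise ValueError("heap plus OS reserve exceeds total RAM")
    # The memory-mapped index is served from the page cache, not the heap.
    return page_cache_gb, index_size_gb <= page_cache_gb


# 16 GB RAM, 4 GB heap, 6 GB of index files (first iteration suggested above).
print(split_ram(16, 4, 6))

# 16 GB RAM with the 10 GB heap from the question.
print(split_ram(16, 10, 6))
```

Under these assumptions the 4 GB heap leaves room for the whole index in the page cache, while the 10 GB heap does not, which is why a smaller heap can give better search performance on the same machine.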

SolrCloud - Out of Memory

We're using SolrCloud 4.10.3 on the Cloudera Platform with a 3-node Solr cluster with 2 collections of 3 shards each.
Collection 1: approx size: 15.3 GB Collection 2: size: 1.2GB
Our heap size is 8GB and off heap is 15GB. We have a realtime feed into solr for one of our collections (the other is pretty static). We are constantly getting an out of memory error.
Can anyone help us find the reason? Should we have additional shards to spread the load? Or do we need to keep giving more off-heap memory? All the Cloudera heap graphs show that we are fine for heap space (we rarely go above 6.5 GB) and GC pauses are not an issue.
Thanks
The best approach would be to upgrade the SolrCloud to version 6.2.1.
It also depends on the node architecture: if the node is 32-bit, a heap size of more than 2 GB won't work; if the node is 64-bit, you can allocate a larger heap, but that can generate a GC overhead error.
So it is better to update Solr and add more shards and replicas to avoid the error.

Running Search workload and Cassandra workload on the same physical node

Can't seem to find the answer to this obvious question.
We have 6 servers currently configured as "Search" workload running DSE.
My question is:
Is it possible to run Search (Solr) and Cassandra on the same physical box? (Not) Possible / (Not) Recommended?
I'm very confused with the fact that we currently are running all nodes as Solr nodes and I'm still able to use them as Cassandra (real time queries) - so it's technically both?
The "Services /Best Practice" tells me that:
"Please replace the current search nodes that have vnodes enabled with nodes without vnodes."
Our ideal situation would be:
a. Use all 6 servers as cassandra storage (+ real time queries)
b. Use 1 or 2 of the SAME servers as Solr Search.
The only documentation I've found that somewhat resembles what we want is:
http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/deploy/deployWkLdSep.html
but as far as I understand it still says that I need to physically split the load, meaning dedicate 4 servers for cassandra and 2 nodes for solr/search ?
Can anyone explain/suggest anything?
Thank you!
DSE Search - C* and Solr on the Same node:
As Rock Brain mentioned, DSE Search will run Solr and Cassandra on the same node. More specifically, it will run them in the same JVM. This has heap implications. The recommendation is to bump your heap up to 14 GB rather than the 8 GB of a C*-only node.
As RB also mentioned, CPU consumption will be greater with Solr. However, I often see Search DCs with fewer, beefier nodes than C* nodes. Again, this depends on your workload and how much data you're indexing.
Note: DSE Search Performance Tip
The main rule of thumb for performance is to try to fit all your DSE Indexes in the OS page cache so you may need more RAM than for a Cassandra only node to get optimal performance.
DSE Search and Workload Isolation:
You will find in the DataStax docs, that we recommend for you to run separate data centers for your cassandra workloads and for your search or analytics workloads. This basically prevents Search driven contention from affecting your cassandra ingestions.
The reason behind this recommendation is that many DSE customers have super-tight microsecond SLAs and very large workloads. You can get away with running search and C* on the same nodes (same DC) if you have looser SLAs and smaller workloads. Your best bet is to POC it with your workload on your hardware and see how it performs.
Can I activate DSE Search on just 2 of my 6 DSE nodes?
Not really. You most likely want to turn on search on your whole DC or not at all, for the following reasons:
The DSESimpleSnitch will automatically split them up into separate DCs, so you'd have to use another snitch.
You will get "cannot find endpoints" errors on your Solr DCs if there aren't enough nodes with the right copies of your data. Remember, Cassandra is still responsible for replication, and the Solr core on each node will only index the corresponding data that is on that node.
Turn on search in all 6, but feel free to direct c* queries at all of them and search queries only at 2 if you want. Not sure why you would want to though, you'll clearly see those 2 nodes will be under higher load in OpsCenter.
Remember that you can leverage Search queries right from CQL now as of DSE 4.6.
Vnodes vs. Non Vnodes for DSE Search
Regarding your question in the comment above: vnodes are not recommended for DSE Search, as you will incur a performance hit. Specifically, pre 4.6 it was a large hit, ~300%, but as of 4.6 it's only a 30% performance hit for Search queries. The bigger the num_vnodes, the larger the hit.
You can run vnodes on one DC and single tokens on the other DC. DSE will, by default, run single tokens.
Is it possible to run Search (Solr) and Cassandra on the same physical box? (Not) Possible / (Not) Recommended?
Yes, this is how DSE Search works, Cassandra and Solr run in the same process with the full functionality of both available.
Solr uses more CPU than Cassandra, so you will want more Solr nodes than dedicated Cassandra nodes. You will set up separate Cassandra and Solr data centers to divide the workload types.

Why is solr QPS so low?

We are running a vanilla solr installation. [ in non-cloud mode ].
Each document has about 100 fields, and the document size is ~5k bytes.
There are multiple cores, ~20 in a single solr instance. The total number of documents combined is ~2 million.
During testing, this node gives a peak QPS of ~100. For a modern 8-core, 60 GB machine, this seems to be really low.
Does anyone with experience of Solr internals know why it is so slow?
Will using lucene library directly with a thin server wrapper give a higher QPS?
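One way to put the ~100 QPS figure in perspective is to turn it into a per-query time budget. This is only a back-of-the-envelope sketch: it assumes the queries are CPU-bound and that all 8 cores are saturated at peak, neither of which the question states, and the function name is made up for illustration.

```python
def core_ms_per_query(cores, qps):
    """Core-milliseconds spent per query if every core is fully busy."""
    if qps <= 0:
        raise ValueError("qps must be positive")
    return cores * 1000.0 / qps


# Figures from the question: 8 cores, peak of ~100 queries per second.
print(core_ms_per_query(8, 100))
```

If the cores are not actually saturated at peak, the limit lies elsewhere (disk, network, or the load generator), and the CPU time per query is lower than this estimate.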

Solr always use more than 90% of physical memory

I have 300000 documents stored in a Solr index, and the Solr server had 4 GB of RAM. But it consumes more than 90% of the physical memory. So I moved my data to a new server which has 16 GB of RAM. Again Solr consumes more than 90% of the memory. I don't know how to resolve this issue. I use the default MMapDirectory and Solr version 4.2.0. Please explain if you have a solution or know the reason for this.
MMapDirectory tries to use the OS memory (the OS cache) as fully as possible. This is normal behaviour: it will try to load the entire index into memory if enough is available. In fact, it is a good thing: since this memory is available, it will try to use it. If another application on the same machine demands more, the OS will release it. This is one of the reasons why Solr/Lucene queries are an order of magnitude faster, as most calls to the server end up in memory (depending on the size of the memory) rather than on disk.
JVM memory is a different thing and can be controlled. Only working query response objects and certain cache entries use JVM memory, so the JVM size can be configured based on the number of requests and cache entries.
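To see how much of the "used" memory is really OS cache that can be reclaimed, the kernel's own accounting can be inspected. The sketch below assumes a Linux host (it reads /proc/meminfo, which does not exist on Windows); the function names and the sample numbers are illustrative only and were not measured on a real Solr server.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value}."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[name.strip()] = int(parts[0])  # values are in kB
    return info


def memory_breakdown(info):
    """Return the percentage of RAM held as page cache and still available."""
    total = info["MemTotal"]
    # MemAvailable is missing on old kernels; fall back to a rough estimate.
    available = info.get("MemAvailable", info["MemFree"] + info["Cached"])
    return {
        "cached_pct": round(100.0 * info["Cached"] / total, 1),
        "available_pct": round(100.0 * available / total, 1),
    }


# Illustrative sample for a 16 GB machine, not real measurements.
SAMPLE = """\
MemTotal:       16384000 kB
MemFree:          819200 kB
MemAvailable:   12288000 kB
Buffers:          204800 kB
Cached:         11468800 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as fh:
            text = fh.read()
    except OSError:
        text = SAMPLE  # not on Linux: use the sample instead
    print(memory_breakdown(parse_meminfo(text)))
```

A high cached percentage together with a high available percentage is the pattern described above: the memory looks consumed, but the OS can hand it back as soon as another process asks for it.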
What -Xmx value are you using when invoking the JVM? If you are not using an explicit value, the JVM will set one based on the machine's features.
Once you give Solr a maximum amount of heap, it will potentially use all of it if it needs to; that is how it works. If you want to limit it to, say, 2 GB, use -Xmx2000m when you invoke the JVM. I'm not sure how large your docs are, but 300k docs would be considered a smallish index.
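For reference, the JVM's -Xmx option takes the size directly after the flag name, with no equals sign, followed by a unit suffix such as k, m or g. A small helper like the one below (a hypothetical function, not part of Solr or the JDK) can guard against malformed values when start-up options are generated by a script.

```python
import re


def xmx_flag(size):
    """Build a -Xmx option from a size such as '2000m' or '2g'."""
    # Digits followed by a single unit suffix; no '=' and no spaces.
    if not re.fullmatch(r"[1-9][0-9]*[kKmMgG]", size):
        raise ValueError(f"unsupported heap size: {size!r}")
    return f"-Xmx{size}"


print(xmx_flag("2000m"))
```

The resulting string is what would be passed on the java command line that starts Solr.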

Resources