While upgrading Solr from version 6.5 to 8.7, we observed that query time increased by 40%.
On Solr 8.7 the difference between the optimized and unoptimized index is also very large: 350 ms on the optimized index versus 650 ms on the unoptimized one. The size difference between the optimized and unoptimized cores is only 5 GB. The segment count is 1 in the optimized index and 20 in the unoptimized index.
I wanted to ask: is this normal behavior on Solr 8.7, or is there some setting that we forgot to add? Please also tell us how we can reduce the response time on the unoptimized core.
Specifications
We are using a master-slave architecture; the polling interval is 3 hours
RAM- 96 GB
CPU-14
Heap-30 GB
Index Size-95 GB
Segment count-20
Merge Policy :
mergePolicyFactory : org.apache.solr.index.TieredMergePolicyFactory
maxMergeAtOnce : 5
segmentsPerTier : 3
In Solr 8 the maxMergedSegmentMB limit is honored. If your index is much larger than 5 GB (the default cap for a merged segment), this means the number of segments was small in Solr 6 but is higher in Solr 8 because of the per-segment size limit.
More open segments at runtime mean that a request (the search terms) must be looked up in more index segments. Memory allocation will also be higher, which makes GC more frequent.
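If the higher segment count itself is what hurts, one knob worth testing (a sketch only; the value is an example, not a recommendation) is the per-segment size cap of the TieredMergePolicyFactory shown above, which defaults to roughly 5 GB:
maxMergedSegmentMB : 20000
Fewer, larger segments mean fewer per-request lookups, at the price of heavier merges, so verify the effect on a copy of the index before changing production.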
Related
I need to refresh an index governed by Solr 7.4. I use SolrJ to access it on a 64-bit Linux machine with 8 CPUs and 32 GB of RAM (8 GB of heap for the indexing part and 24 GB for the Solr server). The index to be refreshed is around 800 MB in size and contains around 36k documents (according to Luke).
Before starting the indexing process itself, I need to "clean" the index and remove the documents that do not match an actual file on disk (e.g. a document was indexed previously and the file has moved since then, so the user won't be able to open it if it appears on the results page).
To do so, I first need to get the list of documents in the index:
final SolrQuery query = new SolrQuery("*:*"); // Content fields are not loaded to reduce memory footprint
query.addField(PATH_DESCENDANT_FIELDNAME);
query.addField(PATH_SPLIT_FIELDNAME);
query.addField(MODIFIED_DATE_FIELDNAME);
query.addField(TYPE_OF_SCANNED_DOCUMENT_FIELDNAME);
query.addField("id");
query.setRows(Integer.MAX_VALUE); // we want ALL documents in the index not only the first ones
SolrDocumentList results = this.getSolrClient()
        .query(query)
        .getResults(); // This line sometimes gives OOM
When the OOM appears on the production machine, it appears during that "index cleaning" part and the stack trace reads:
Exception in thread "Timer-0" java.lang.OutOfMemoryError: Java heap space
at org.noggit.CharArr.resize(CharArr.java:110)
at org.noggit.CharArr.reserve(CharArr.java:116)
at org.apache.solr.common.util.ByteUtils.UTF8toUTF16(ByteUtils.java:68)
at org.apache.solr.common.util.JavaBinCodec.readStr(JavaBinCodec.java:868)
at org.apache.solr.common.util.JavaBinCodec.readStr(JavaBinCodec.java:857)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:266)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.readSolrDocument(JavaBinCodec.java:541)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:305)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:747)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:272)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.readSolrDocumentList(JavaBinCodec.java:555)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:307)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.readOrderedMap(JavaBinCodec.java:200)
at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:274)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:178)
at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:50)
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:614)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:942)
at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:957)
I've already removed the content fields from the query because there were already OOMs, so I thought that storing only "small" data would avoid OOMs, but they still occur. Moreover, when I started the project for the customer we had only 8 GB of RAM (so a heap of 2 GB), then we increased it to 20 GB (heap of 5 GB), and now to 32 GB (heap of 8 GB), and the OOM still appears, although the index is not that large compared to what is described in other SO questions (featuring millions of documents).
Please note that I cannot reproduce it on my less powerful dev machine (16 GB RAM, so 4 GB of heap) after copying the 800 MB index from the production machine to my dev machine.
So to me there could be a memory leak. That's why I followed the NetBeans post on memory leaks on my dev machine with the 800 MB index. From what I see, I guess there is a memory leak, since indexing after indexing the number of surviving generations keeps increasing during the "index cleaning" (steep lines below):
What should I do? 8 GB of heap is already a huge amount compared to the index characteristics, so increasing the heap does not seem to make sense, because the OOM only appears during the "index cleaning", not while actually indexing large documents, and it seems to be caused by the surviving generations, doesn't it? Would creating a query object and then applying getResults on it help the garbage collector?
Is there another method to get all document paths? Or would retrieving them chunk by chunk (pagination) help, even for that small number of documents?
Any help appreciated
After a while I finally came across this post. It describes my issue exactly:
An out of memory (OOM) error typically occurs after a query comes in with a large rows parameter. Solr will typically work just fine up until that query comes in.
So they advise (emphasis is mine):
The rows parameter for Solr can be used to return more than the default of 10 rows. I have seen users successfully set the rows parameter to 100-200 and not see any issues. However, setting the rows parameter higher has a big memory consequence and should be avoided at all costs.
And this is what I see while retrieving 100 results per page:
The number of surviving generations has decreased dramatically, although the garbage collector's activity is much more intensive and computation time is much greater. But if this is the cost of avoiding OOM, that's OK (the program only loses a few seconds per index update, and an update can last several hours)!
Increasing the number of rows to 500 already makes the memory leak happen again (the number of surviving generations increases):
Please note that setting the row number to 200 did not cause the number of surviving generations to increase much (I did not measure it), but it did not perform much better in my test case (less than 2%) than the "100" setting:
So here is the code I used to retrieve all documents from an index (from Solr's wiki):
SolrQuery q = (new SolrQuery(some_query)).setRows(r).setSort(SortClause.asc("id"));
String cursorMark = CursorMarkParams.CURSOR_MARK_START;
boolean done = false;
while (!done) {
    q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
    QueryResponse rsp = solrServer.query(q);
    String nextCursorMark = rsp.getNextCursorMark();
    doCustomProcessingOfResults(rsp);
    if (cursorMark.equals(nextCursorMark)) {
        done = true;
    }
    cursorMark = nextCursorMark;
}
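For reference, here is a minimal sketch of how the "index cleaning" pass above could combine the question's field list with cursorMark paging and a modest rows value (the field-name constants come from the question's code, and processDocument is a hypothetical helper that checks the file on disk):
import java.io.IOException;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrQuery.SortClause;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

void cleanIndex(SolrClient client) throws SolrServerException, IOException {
    // Fetch only the small metadata fields, 100 documents at a time.
    SolrQuery query = new SolrQuery("*:*");
    query.addField(PATH_DESCENDANT_FIELDNAME); // constants from the question's code
    query.addField(MODIFIED_DATE_FIELDNAME);
    query.addField("id");
    query.setRows(100);
    query.setSort(SortClause.asc("id")); // cursorMark requires a sort that includes the uniqueKey field

    String cursorMark = CursorMarkParams.CURSOR_MARK_START;
    boolean done = false;
    while (!done) {
        query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
        QueryResponse rsp = client.query(query);
        for (SolrDocument doc : rsp.getResults()) {
            processDocument(doc); // hypothetical helper: verify the file exists, otherwise collect the id for deletion
        }
        String nextCursorMark = rsp.getNextCursorMark();
        done = cursorMark.equals(nextCursorMark); // an unchanged cursor mark means the last page was reached
        cursorMark = nextCursorMark;
    }
}
Each page can be garbage-collected after processing, instead of materializing the whole index in one SolrDocumentList, which is what made setRows(Integer.MAX_VALUE) blow up the heap.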
TL;DR: Don't use too large a number for query.setRows, i.e. not greater than 100-200, as a higher number is very likely to cause an OOM.
I am trying to profile Cassandra on a single-node cluster to see how many inserts one node can handle, and then add more nodes based on this result.
I have changed certain parameters in cassandra.yaml. They are as follows:
memtable_offheap_space_in_mb: 4096
memtable_allocation_type: offheap_buffers
concurrent_compactors: 8
compaction_throughput_mb_per_sec: 32768
concurrent_reads: 64
concurrent_writes: 128
concurrent_counter_writes: 128
write_request_timeout_in_ms: 200000
Cassandra node: JVM heap size 12GB
I have set these parameters through the Cassandra C++ driver API:
cass_cluster_set_num_threads_io(cluster, 8);                          // number of driver I/O threads
cass_cluster_set_core_connections_per_host(cluster, 8);               // connections opened to each host per I/O thread
cass_cluster_set_write_bytes_high_water_mark(cluster, 32768000);      // bytes queued on a connection before writes are paused
cass_cluster_set_pending_requests_high_water_mark(cluster, 16384000); // requests queued waiting for a connection before writes are paused
With these parameters I get a write speed of 13k inserts/sec with a data size of 1250 bytes.
I wanted to know whether I am missing anything in terms of parameter tuning to achieve better performance.
Cassandra DB node details:
VM
CentOS 6
16GB RAM
8 cores, running on a separate box from the machine I am pumping data from.
Any insight will be highly appreciated.
Q - I am forced to set the Java Xmx as high as 3.5g for my Solr app. If I keep this lower, my CPU hits 100% and the response time for indexing increases a lot. I have also hit an OOM error when this value is low.
Is this too high? If so, can I reduce this?
Machine Details
4 G RAM, SSD
Solr App Details (Standalone solr app, no shards)
num. of Solr Cores = 5
Index Size - 2 g
num. of Search Hits per sec - 10 [IMP - All search queries have faceting..]
num. of times Re-Indexing per hour per core - 10 (it may happen at the same moment for all 5 cores)
Query Result Cache, Document cache and Filter Cache are all default size - 4 kb.
top stats -
VIRT RES SHR S %CPU %MEM
6446600 3.478g 18308 S 11.3 94.6
iotop stats
DISK READ DISK WRITE SWAPIN IO>
0-1200 K/s 0-100 K/s 0 0-5%
Try either increasing the RAM size or reducing the frequency of index rebuilds. If you are rebuilding the index 10 times an hour, then Solr may not be the right choice. Solr tries to give faster results by keeping the index files in OS memory.
I know Solr search is I/O bound. If I have a 4-node cluster and an index separated into 4 blocks, which architecture below will give better search performance?
1) Have 4 Solr instances running on ONE single node and put one block of the index on each of these 4 Solr instances
2) Have one Solr instance running on each node, hence a 4-node cluster in total, and put one block of the index on each Solr instance.
Thanks!
The 2nd option will probably be better, and I'll explain why.
A Solr core is served by a Java process that holds several cache objects. When you put 4 Solr cores on the same node, they will use the same JVM RAM and the same CPU.
In the 1st option, the same JVM will need to run 4 Solr cores and to collect the garbage of 4 cores instead of 1.
When you use 4 different nodes (4 JVMs) you will probably get better performance even if you host the 4 nodes on the same physical machine.
I am evaluating the Tokyo Cabinet table engine. The insert rate slows down considerably after hitting 1 million records. The batch size is 100,000 and each batch is done within a transaction. I tried setting xmsiz but it did not help. Has anyone faced this problem with Tokyo Cabinet?
Details
Tokyo Cabinet - 1.4.3
Perl bindings - 1.23
OS : Ubuntu 7.10 (VMWare Player on top of Windows XP)
I hit a brick wall around 1 million records per shard as well (sharding on the client side, nothing fancy). I tried various ttserver options and they seemed to make no difference, so I looked at the kernel side and found that
echo 80 > /proc/sys/vm/dirty_ratio
(previous value was 10) gave a big improvement - the following is the total size of the data (on 8 shards, each on its own node) printed every minute:
total: 14238792 records, 27.5881 GB size
total: 14263546 records, 27.6415 GB size
total: 14288997 records, 27.6824 GB size
total: 14309739 records, 27.7144 GB size
total: 14323563 records, 27.7438 GB size
(here I changed the dirty_ratio setting for all shards)
total: 14394007 records, 27.8996 GB size
total: 14486489 records, 28.0758 GB size
total: 14571409 records, 28.2898 GB size
total: 14663636 records, 28.4929 GB size
total: 14802109 records, 28.7366 GB size
So you can see that the improvement was in the order of 7-8 times. Database size was around 4.5GB per node at that point (including indexes) and the nodes have 8GB RAM (so dirty_ratio of 10 meant that the kernel tried to keep less than ca. 800MB dirty).
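If that helps on your setup too, the setting can be made persistent across reboots (assuming a standard sysctl configuration) instead of echoing into /proc after each boot:
# /etc/sysctl.conf
vm.dirty_ratio = 80
and loaded with sysctl -p. Keep in mind that a high dirty_ratio trades durability for speed: much more unflushed data can be lost on a crash.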
Next thing I'll try is ext2 (currently: ext3) and noatime and also keeping everything on a ramdisk (that would probably waste twice the amount of memory, but might be worth it).
I just set the cache option and it is now significantly faster.
I think modifying the bnum parameter in the dbtune function will also give a significant speed improvement.