I know Solr search is I/O bound. If I have a 4-node cluster and an index separated into 4 blocks, which architecture below will give better search performance?
1) Have 4 Solr instances running on ONE single node and put each block of the index on one of these 4 Solr instances.
2) Have one Solr instance running on each node, for a total of 4 nodes, and put each block of the index on a separate Solr instance.
Thanks!
The 2nd option will probably be better, and here is why.
A Solr core is served by a Java program that holds several cache objects. When you put 4 Solr cores on the same node, they use the same JVM RAM and the same CPU.
In the 1st option, the same JVM has to run 4 Solr cores and collect the garbage of 4 cores instead of 1.
When you use 4 different nodes (4 JVMs), you will probably get better performance, even if you host the 4 nodes on the same physical machine.
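To make the node-per-JVM layout concrete, here is a minimal SolrJ sketch (the node URLs and core names are hypothetical) that sends the same query to each of the four single-core nodes; every request is served by a separate JVM with its own heap, caches, and garbage collector:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PerNodeQuery {
    public static void main(String[] args) throws Exception {
        // Hypothetical layout: one Solr instance (one JVM) per node, one index block each.
        String[] nodes = {
            "http://node1:8983/solr/block1",
            "http://node2:8983/solr/block2",
            "http://node3:8983/solr/block3",
            "http://node4:8983/solr/block4"
        };
        SolrQuery query = new SolrQuery("*:*");
        for (String url : nodes) {
            try (HttpSolrClient client = new HttpSolrClient.Builder(url).build()) {
                QueryResponse rsp = client.query(query);
                // Each node answers from its own JVM heap and Solr caches.
                System.out.println(url + " -> " + rsp.getResults().getNumFound()
                        + " docs, QTime=" + rsp.getQTime() + " ms");
            }
        }
    }
}

In option 1, all four cores would sit behind the same JVM, so a long GC pause or a full heap affects every index block at once.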
Related
After upgrading Solr from version 6.5 to 8.7, we observe that query time has increased by 40%.
On Solr 8.7 the difference between the optimized and unoptimized index is also very large: 350 ms on the optimized index versus 650 ms on the unoptimized one. The size difference between the optimized and unoptimized cores is only 5 GB. The segment count is 1 in the optimized index and 20 in the unoptimized index.
I wanted to ask: is this normal behavior on Solr 8.7, or is there some setting that we forgot to add? Please also tell us how we can reduce the response time on the unoptimized core.
Specifications
We are using a master-slave architecture; the polling interval is 3 hours.
RAM: 96 GB
CPU: 14
Heap: 30 GB
Index size: 95 GB
Segment count: 20
Merge Policy :
mergePolicyFactory : org.apache.solr.index.TieredMergePolicyFactory
maxMergeAtOnce : 5
segmentsPerTier : 3
In Solr 8 the maxMergedSegmentMB limit is honored. If your index is much larger than 5 GB, this means that in Solr 6 the number of segments was small, but it is higher in Solr 8 because of the size limit per merged segment.
The more segments are open at runtime, the more index segments every request (set of search terms) has to be looked up in. Furthermore, memory allocation will be higher, which makes GC run more frequently.
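If you want fewer, larger segments without running a full optimize, the merge policy limits can be raised. The sketch below is illustrative only: it uses Lucene's TieredMergePolicy directly (in Solr the equivalent values would go into the TieredMergePolicyFactory settings shown in the question), and the 20 GB ceiling is an assumed example value, not a recommendation.

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.TieredMergePolicy;

public class MergePolicyExample {
    public static void main(String[] args) {
        TieredMergePolicy tmp = new TieredMergePolicy();
        tmp.setMaxMergeAtOnce(5);          // same value as in the question
        tmp.setSegmentsPerTier(3);         // same value as in the question
        tmp.setMaxMergedSegmentMB(20480);  // assumed example: allow ~20 GB merged segments instead of the 5 GB default

        IndexWriterConfig iwc = new IndexWriterConfig(new StandardAnalyzer());
        iwc.setMergePolicy(tmp);
        // pass iwc to the IndexWriter that opens the index
    }
}

Larger merged segments reduce the number of segments a query has to visit, at the cost of heavier merges and more I/O during indexing.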
We have two OSB nodes in a cluster. One node, osb1, has a low overall response time (~1 s) when measured in AppDynamics; the other node, osb2, has a high response time (~20 s). We brought each node down and tested them individually, and we see the same behavior. Any suggestions on what to look into to identify the issue? The OSB configuration on both nodes is identical, the JVM configuration is identical, and heap usage is the same; CPU differs a bit.
I am trying to profile Cassandra on a single-node cluster to see how many inserts one node can handle, and then add more nodes based on that result.
I have changed certain parameters in cassandra.yaml. They are as follows:
memtable_offheap_space_in_mb: 4096
memtable_allocation_type: offheap_buffers
concurrent_compactors: 8
compaction_throughput_mb_per_sec: 32768
concurrent_reads: 64
concurrent_writes: 128
concurrent_counter_writes: 128
write_request_timeout_in_ms: 200000
Cassandra node: JVM heap size 12 GB
I have set these parameters through the Cassandra C++ driver API:
// Number of I/O (event-loop) threads used by the driver
cass_cluster_set_num_threads_io(cluster, 8);
// Connections opened to each host per I/O thread
cass_cluster_set_core_connections_per_host(cluster, 8);
// High-water mark for bytes queued on a connection before the driver pauses writes
cass_cluster_set_write_bytes_high_water_mark(cluster, 32768000);
// High-water mark for requests queued waiting for a connection before writes to that host are paused
cass_cluster_set_pending_requests_high_water_mark(cluster, 16384000);
With these parameters I get a write rate of 13k inserts/sec with a payload size of 1250 bytes (roughly 16 MB/s).
I wanted to know whether I am missing anything in terms of parameter tuning that would give better performance.
Cassandra DB node details:
VM
CentOS 6
16 GB RAM
8 cores. It is running on a separate box from the machine I am pumping data from.
Any insight will be highly appreciated.
Q - I am forced to set Java Xmx as high as 3.5g for my Solr app. If I keep it lower, my CPU hits 100% and indexing response time increases a lot, and I have hit OOM errors as well when this value is low.
Is this too high? If so, can I reduce this?
Machine Details
4 GB RAM, SSD
Solr App Details (Standalone solr app, no shards)
num. of Solr Cores = 5
Index Size - 2 GB
num. of Search Hits per sec - 10 [IMP - all search queries have faceting]
num. of times Re-Indexing runs per hour per core - 10 (it may happen at the same moment for all 5 cores)
Query Result Cache, Document Cache and Filter Cache are all at the default size - 4 KB.
top stats -
VIRT RES SHR S %CPU %MEM
6446600 3.478g 18308 S 11.3 94.6
iotop stats
DISK READ DISK WRITE SWAPIN IO>
0-1200 K/s 0-100 K/s 0 0-5%
Try either increasing the RAM size or reducing the frequency of index rebuilds. If you are rebuilding the index 10 times an hour, Solr may not be the right choice. Solr returns results faster by keeping the index files in OS memory (the page cache).
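As a rough back-of-the-envelope check (a sketch using the numbers from the question plus an assumed OS overhead, not a sizing rule), almost nothing is left for the OS page cache that has to hold the 2 GB index once a 3.5 GB heap is taken out of 4 GB of RAM:

public class MemoryBudget {
    public static void main(String[] args) {
        // Figures from the question; the OS overhead is an assumption.
        double totalRamGb   = 4.0;  // machine RAM
        double solrHeapGb   = 3.5;  // -Xmx
        double osOverheadGb = 0.3;  // assumed: kernel + other processes
        double indexSizeGb  = 2.0;  // total index size across the 5 cores

        double pageCacheGb = totalRamGb - solrHeapGb - osOverheadGb;
        System.out.printf("~%.1f GB left for the page cache vs. a %.1f GB index%n",
                pageCacheGb, indexSizeGb);
        // With only ~0.2 GB of page cache, most index reads fall through to disk,
        // which is especially costly with faceting and frequent re-indexing.
    }
}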
Solr always uses more than 90% of physical memory
I'm experimenting with different infrastructure approaches and I'm surprised to notice the following.
I've indexed 1.3M documents (all fields indexed and stored, some shingle-analyzed) using DataImportHandler via a SQL query in Solr 4.4.
Approach 1: Single Solr instance
Indexing time: ~10 minutes
Size of "index" folder: 1.6 GB
Approach 2: SolrCloud with two index slices
Indexing time: ~11 minutes
Size of "index" folders: 1.6 GB + 1.5 GB = 3.1 GB
Each index slice has around 0.65M documents, adding up to the original total count, which is expected.
Approach 3: SolrCloud with two shards (1 leader + 1 replica)
Indexing time: ~30 minutes
Size of "index" folders: leader (4.6 GB) + replica (3.8 GB) = 8.4 GB (I expected this to be 1.6 GB * 2, but it is ~1.6 GB * 5.25)
I've followed the SolrCloud tutorial.
I realize that there is some metadata (please correct me if I'm wrong), like the term dictionary, that has to exist in all instances irrespective of slicing (partitioning) or sharding (replication).
However, approaches 2 and 3 show drastic growth (400%) in the final index size.
Could you please provide some insight?
From the overall index size I suppose your documents are quite small. That is why the relative size of the term dictionary is big: for that number of documents it is pretty similar in each slice, so you effectively have it twice. That is how 1.6 GB turns into 3.1 GB.
As for approach 3: are you sure it is a clean test? Is there any chance you have included the transaction log in the size? What happens if you optimize?
You can check what exactly adds to the size by looking at the index file extensions, as in the sketch after the link below.
See here:
https://lucene.apache.org/core/4_2_0/core/org/apache/lucene/codecs/lucene42/package-summary.html#file-names
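As a concrete way to run that check, here is a small self-contained Java sketch (the index directory path is a placeholder) that sums the on-disk size of a core's index per file extension, so you can see which extensions, e.g. stored fields (.fdt) or the term dictionary (.tim), dominate:

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

public class IndexSizeByExtension {
    public static void main(String[] args) throws IOException {
        // Placeholder path: point this at a core's data/index directory.
        Path indexDir = Paths.get("/path/to/solr/core/data/index");

        Map<String, Long> sizeByExt = new TreeMap<>();
        try (Stream<Path> files = Files.list(indexDir)) {
            files.filter(Files::isRegularFile).forEach(f -> {
                String name = f.getFileName().toString();
                int dot = name.lastIndexOf('.');
                String ext = (dot >= 0) ? name.substring(dot) : "(none)";
                try {
                    sizeByExt.merge(ext, Files.size(f), Long::sum);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        }
        sizeByExt.forEach((ext, bytes) ->
                System.out.printf("%-8s %10.1f MB%n", ext, bytes / (1024.0 * 1024.0)));
    }
}

Comparing the per-extension totals of the single instance against the two slices (and the leader against its replica) should show whether the growth comes from the index files themselves or from something else counted in the folder size.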