Why is solr QPS so low? - solr

We are running a vanilla Solr installation (in non-cloud mode).
Each document has about 100 fields, and the document size is ~5k bytes.
There are multiple cores (~20) in a single Solr instance. The total number of documents across all cores is ~2 million.
During testing, this node gives a peak QPS of ~100. For a modern 8-core, 60G machine, this seems really low.
Can anyone with experience of Solr internals explain why it is so slow?
Would using the Lucene library directly with a thin server wrapper give a higher QPS?
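As a rough sanity check on the figures in the question, the per-query CPU cost and the raw data volume can be worked out directly. This is only a back-of-the-envelope sketch: it assumes all 8 cores are fully busy at peak, which the question does not state, and the raw document size says nothing certain about the on-disk index size.

```python
# Figures quoted in the question. The "all cores saturated at peak"
# premise is an assumption, not something the question reports.
cores = 8
peak_qps = 100
num_docs = 2_000_000
doc_size_bytes = 5_000  # "~5k bytes" per document

# If every core is busy at peak, each query costs cores / qps seconds of CPU.
cpu_ms_per_query = cores / peak_qps * 1000

# Raw (pre-index) document volume across all cores, in decimal gigabytes.
raw_data_gb = num_docs * doc_size_bytes / 1e9

print(f"CPU time per query if saturated: {cpu_ms_per_query:.0f} ms")
print(f"Raw document data: {raw_data_gb:.0f} GB")
```

If the CPUs are not saturated at ~100 QPS, this estimate does not apply and the limit lies somewhere else, which is worth checking before rewriting anything on top of Lucene.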

Related

How many transactions per second can happen in MongoDB?

I am writing a feature that might lead to us executing a few hundred or even 1000 MongoDB transactions for a particular endpoint. I want to know if there is a maximum limit to the number of transactions that can occur in MongoDB.
I read this old answer about SQL Server, "Can SQL server 2008 handle 300 transactions a second?", but couldn't find anything on MongoDB.
It's really hard to find a non-biased benchmark, let alone a benchmark that objectively reflects your projected workload.
Here is one, by the makers of Cassandra (obviously, Cassandra wins here): Cassandra vs. MongoDB vs. Couchbase vs. HBase
It shows a few thousand operations/second as a starting point, and that only goes up as the cluster size grows.
Once again: the numbers here are just a baseline and cannot be used to correctly estimate the performance of your application on your data. Not all transactions are created equal.
Well, this isn't a direct answer to your question, but since you have quoted a comparison, I would like to share an experience with Couchbase. With Couchbase, a cluster's performance is usually limited by the network bandwidth (assuming you have given it SSD/NVMe storage, which improves storage latency). I have achieved in excess of 400k TPS on a 4-node cluster running Couchbase 4.x and 5.x in a K/V use case.
Node specs below:
2 x 12-core Xeon on HP BL460c blades
SAS SSDs (NVMe would generally be a lot better)
10 Gbps network within the blade chassis
Before we arrived here, we moved on from MongoDB, which was limiting the system throughput to a few tens of thousands of TPS at most.

SolrCloud deployment for high frequency traffic

We have been looking for a big data store that can hold a huge pool of user information.
I should also mention that we are working on an RTB platform (namely the DSP side). As a result, our platform handles around 100 million requests per day, which is about 1-2 thousand requests per second (depending on the time of day). Here is a simple overview of what we are going to implement:
My questions are:
Is Solr (SolrCloud) a good solution for a Data Management Platform?
Do you think SolrCloud can handle high-frequency traffic?
I have checked "Solr performance problems", and one issue listed there is extremely frequent commits (under the "Slow commits" item). What is the limit?
What SolrCloud configuration would be suitable for us? I mean the number of shards and cores and the server configuration (CPU, RAM) needed to handle 1-2K QPS and store about 500M docs. How can we calculate this?
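The traffic figure in the question can be turned into an average request rate, and a first capacity estimate follows from a measured per-node throughput. The sketch below is only a starting point: the function name `copies_needed`, the 2000 QPS peak target, and the 300 QPS per-node figure are hypothetical, and it assumes query throughput scales roughly linearly with the number of full copies of the index, which has to be confirmed with a load test on real data.

```python
import math

requests_per_day = 100_000_000
seconds_per_day = 24 * 60 * 60

# Average rate implied by the daily volume; peaks will be higher.
avg_rps = requests_per_day / seconds_per_day


def copies_needed(target_qps: float, measured_qps_per_copy: float) -> int:
    """Full index copies needed so combined capacity covers the target.

    Assumes throughput scales linearly with the number of copies.
    """
    return math.ceil(target_qps / measured_qps_per_copy)


# Hypothetical inputs: a 2000 QPS peak target and 300 QPS measured
# against one full copy of the index in a load test.
print(round(avg_rps))            # average requests per second
print(copies_needed(2000, 300))  # copies required under these assumptions
```

The per-copy figure is the part that cannot be calculated in advance; it depends on document size, query shape, and hardware, so it has to come from a test with representative queries.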

I changed the solr.in.sh file and restarted Solr, but the Solr core went down

I am running Solr in cloud mode. I have 3 shards and 6 cores; each shard has 2 nodes. I needed to change the JMX configuration, so I changed the solr.in.sh file and restarted Solr on one of the Solr machines. It looks like the core associated with that machine went down. Can anyone please help?
It looks like we were calling optimize on Solr quite frequently.
SolrCloud does not require index optimization, as deleted documents are removed from index segment files when the segments are merged.
SolrCloud's replication performance and its overall performance suffer dramatically when indices are optimized. Replication performance suffers because optimized indices have to be copied in their entirety, while unoptimized indices can be copied segment by segment. Overall performance suffers because optimization requires lots of CPU time and IO time.
There should be no need to explicitly optimize indices.
If you are calling optimize quite frequently, that might have slowed down your replication.
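The replication cost described above can be illustrated with a toy model. This is not Solr's replication code: the segment names, sizes, and the `mb_to_copy` helper are all made up for illustration, and the model only captures the idea that a replica fetches the segment files it does not already hold.

```python
def mb_to_copy(leader: dict[str, int], replica: dict[str, int]) -> int:
    """Sum the sizes (MB) of leader segment files the replica lacks."""
    return sum(size for name, size in leader.items() if name not in replica)


# Hypothetical segment files and sizes in MB already on the replica.
replica = {"_a": 400, "_b": 300, "_c": 200}

# Normal indexing: the leader gains one small new segment.
leader_incremental = {**replica, "_d": 50}

# After an optimize: every segment is rewritten into one new file,
# so nothing the replica holds can be reused.
leader_optimized = {"_e": 950}

print(mb_to_copy(leader_incremental, replica))  # only the new segment
print(mb_to_copy(leader_optimized, replica))    # the whole index
```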

Running Search workload and Cassandra workload on the same physical node

Can't seem to find the answer to this obvious question.
We have 6 servers currently configured as "Search" workload running DSE.
My question is:
Is it possible to run Search (Solr) and Cassandra on the same physical box? (Not) Possible / (Not) Recommended?
I'm very confused by the fact that we are currently running all nodes as Solr nodes and I'm still able to use them as Cassandra (real-time queries), so is it technically both?
The "Services /Best Practice" tells me that:
"Please replace the current search nodes that have vnodes enabled with nodes without vnodes."
Our ideal situation would be:
a. Use all 6 servers as Cassandra storage (+ real-time queries)
b. Use 1 or 2 of the SAME servers as Solr Search.
The only documentation I've found that somewhat resembles what we want is:
http://www.datastax.com/documentation/datastax_enterprise/4.6/datastax_enterprise/deploy/deployWkLdSep.html
but as far as I understand, it still says that I need to physically split the load, meaning dedicate 4 servers to Cassandra and 2 nodes to Solr/search?
Can anyone explain/suggest anything?
Thank you!
DSE Search - C* and Solr on the Same node:
As Rock Brain mentioned, DSE Search will run Solr and Cassandra on the same node. More specifically, it will run them in the same JVM. This has heap implications. The recommendation is to bump your heap up to 14 GB rather than the C*-only 8 GB.
As RB also mentioned, CPU consumption will be greater with Solr. However, I often see Search DCs with fewer, beefier nodes than C* DCs. Again, this depends on your workload and how much data you're indexing.
Note: DSE Search Performance Tip
The main rule of thumb for performance is to try to fit all your DSE indexes in the OS page cache, so you may need more RAM than for a Cassandra-only node to get optimal performance.
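A quick way to apply this rule of thumb is to compare the index size with the RAM left over after the JVM heap. The sketch below is only a rough estimate: the helper name `fits_in_page_cache`, the 2 GB allowance for the OS and other processes, and the example node sizes are assumptions; only the 14 GB heap comes from the recommendation above.

```python
def fits_in_page_cache(index_gb: float, ram_gb: float,
                       heap_gb: float = 14.0, other_gb: float = 2.0) -> bool:
    """Rough check: does the index fit in RAM not used by heap or OS?

    other_gb is an assumed allowance for the OS and other processes.
    """
    return index_gb <= ram_gb - heap_gb - other_gb


# Hypothetical nodes holding a 40 GB search index.
print(fits_in_page_cache(index_gb=40, ram_gb=64))  # 48 GB left for cache
print(fits_in_page_cache(index_gb=40, ram_gb=32))  # 16 GB left for cache
```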
DSE Search and Workload Isolation:
You will find in the DataStax docs that we recommend running separate data centers for your Cassandra workloads and for your search or analytics workloads. This basically prevents Search-driven contention from affecting your Cassandra ingestion.
The reason behind this recommendation is that many DSE customers have super-tight microsecond SLAs and very large workloads. You can get away with running search and C* on the same nodes (same DC) if you have looser SLAs and smaller workloads. Your best bet is to POC it with your workload on your hardware and see how it performs.
Can I activate DSE Search on just 2 of my 6 DSE nodes?
Not really. You most likely want to turn on search on your whole DC or not at all, for the following reasons:
The DSESimpleSnitch will automatically split them up into separate DCs, so you'd have to use another snitch.
You will get "cannot find endpoints" errors on your Solr DCs if there aren't enough nodes with the right copies of your data. Remember, Cassandra is still responsible for replication, and the Solr core on each node will only index the data that is on that node.
Turn on search on all 6, but feel free to direct C* queries at all of them and search queries at only 2 if you want. I'm not sure why you would want to, though; you'll clearly see in OpsCenter that those 2 nodes are under higher load.
Remember that, as of DSE 4.6, you can run Search queries right from CQL.
Vnodes vs. Non Vnodes for DSE Search
Regarding your question in the comment above: vnodes are not recommended for DSE Search, as you will incur a performance hit. Specifically, before 4.6 it was a large hit, ~300%, but as of 4.6 it's only a 30% performance hit for Search queries. The bigger the num_vnodes, the larger the hit.
You can run vnodes in one DC and single tokens in the other DC. DSE will, by default, run single tokens.
Is it possible to run Search (Solr) and Cassandra on the same physical box? (Not) Possible / (Not) Recommended?
Yes, this is how DSE Search works, Cassandra and Solr run in the same process with the full functionality of both available.
Solr uses more CPU than Cassandra, so you will want more Solr nodes than dedicated Cassandra nodes. You will set up separate Cassandra and Solr data centers to divide the workload types.

Solr appears to block update requests while committing

We're running a master-slave setup with Solr 3.6 using the following auto-commit options:
maxDocs: 500000
maxTime: 600000
We have approx 5 million documents in our index which takes up approx 550GB. We're running both master and slave on Amazon EC2 XLarge instances (4 virtual cores and 15GB). We don't have a particularly high write throughput - about 100 new documents per minute.
We're using Jetty as a container which has 6GB allocated to it.
The problem is that once a commit has started, all our update requests start timing out (we're not performing queries against this box). The commit itself appears to take approx 20-25mins during which time we're unable to add any new documents to Solr.
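Putting the numbers from the question side by side shows how the auto-commit settings interact with the write rate and the commit duration. This is a back-of-the-envelope sketch that assumes a steady 100 documents per minute and that the maxTime window starts as soon as the first uncommitted document arrives; it does not explain why updates block, only how often commits would be triggered.

```python
# Values quoted in the question.
max_docs = 500_000
max_time_ms = 600_000
docs_per_minute = 100           # assumed to be a steady rate
commit_minutes = (20, 25)       # observed commit duration range

# Minutes until each auto-commit condition would fire.
minutes_until_max_time = max_time_ms / 60_000
minutes_until_max_docs = max_docs / docs_per_minute

first_trigger = ("maxTime" if minutes_until_max_time < minutes_until_max_docs
                 else "maxDocs")

# Does even the shortest observed commit outlast the trigger interval?
commit_outlasts_interval = commit_minutes[0] > minutes_until_max_time

print(first_trigger, minutes_until_max_time, minutes_until_max_docs)
print(commit_outlasts_interval)
```

Under these assumptions maxDocs never comes into play, and each commit takes longer than the interval that triggers the next one, so the box would be committing almost continuously.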
One of the answers to the following question suggests using 2 cores and swapping them once one is fully updated. However, this seems a little over the top.
Solr requests time out during index update. Perhaps replication a possible solution?
Is there anything else I should be looking at regarding why Solr seems to be blocking requests? I'm optimistically hoping there's a "dontBlockUpdateRequestsWhenCommitting" flag in the config that I've overlooked...
Many thanks,
Given the bounty reason and the problem described in the question, here is a solution from Solr:
Beginning with version 4.x, Solr has a capability called SolrCloud. Instead of the previous master/slave architecture, there are leaders and replicas. Leaders are responsible for indexing documents, and replicas answer queries. The system is managed by ZooKeeper. If a leader goes down, one of its replicas is selected as the new leader.
All in all, SolrCloud divides the indexing work automatically: each shard has one leader, and that leader is responsible for indexing its shard's documents. When you send a query into the system, there will be some Solr nodes (provided there are more Solr nodes than shards) that are not responsible for indexing but are ready to answer the query. Adding more replicas gives faster query results, but it also causes more inbound network traffic when indexing.
For those who are facing a similar problem: the cause of mine was that I had too many fields in the document. I used automatic (dynamic) fields, *_t, and the number of fields grew pretty fast; once it reached a certain number, it hogged Solr and commits took forever.
Secondly, I put some effort into profiling, and it turned out that most of the time was consumed by string.intern() calls. The number of fields in the document seems to matter: as that number goes up, string.intern() seems to get slower.
The Solr 4 source no longer appears to use string.intern(), but a large number of fields still kills performance quite easily.
