Is there any way to remove dead replicas in solrcloud? - solr

I am using solr 4.5. After several tests I have noticed a lot of dead (non existing) replicas are shown in my SolrCloud graph as gone (black). Is there any way to force my solr to forget about this gone replicas?
I think that manually modifying /clusterstate.json node in zookeeper might help but did not try it yet.

The simplest way I found is in fact editing /clusterstate.json in zookeeper, and removing dead replicas info from it.

I don't know if there is any way to do some sort of global cleanup... but:
There is an API to remove some specific replica:
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-DeleteaReplica
As well as for removing the (INACTIVE) shard with all it's replica's (4.4+):
https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-DeleteaShard
And, if this is something related to production and not only for testing purpose - you may also want to look at this upcoming change from 4.6 related to registering the replica that was previously removed - https://issues.apache.org/jira/browse/SOLR-5311

Related

Solr AutoScaling - Add replicas on new nodes

Using Solr version 7.3.1
Starting with 3 nodes:
I have created a collection like this:
wget "localhost:8983/solr/admin/collections?action=CREATE&autoAddReplicas=true&collection.configName=my_col_config&maxShardsPerNode=1&name=my_col&numShards=1&replicationFactor=3&router.name=compositeId&wt=json" -O /dev/null
In this way I have a replica on each node.
GOAL:
Each shard should add a replica to new nodes joining the cluster.
When a node are shoot down. It should just go away.
Only one replica for each shard on each node.
I know that it should be possible with the new AutoScalling API but I am having a hard time finding the right syntax. The API is very new and all I can find is the documentation. Its not bad but I am missing some more examples.
This is how its looks today. There are many small shard each with a replication factor that match the numbers of nodes. Right now there are 3 nodes.
This video was uploaded yesterday (2018-06-13) and around 30 min. into the video there is an example of the Solr.HttpTriggerListener that can be used to call any kind of service, for example an AWS Lamda to add new nodes.
The short answer is that your goals are not not achievable today (till Solr 7.4).
The NodeAddedTrigger only moves replicas from other nodes to the new node in an attempt to balance the cluster. It does not support adding new replicas. I have opened SOLR-12715 to add this feature.
Similarly, the NodeLostTrigger adds new replicas on other nodes to replace the ones on the lost node. It, too, has no support for merely deleting replicas from cluster state. I have opened SOLR-12716 to address that issue. I hope to release both the enhancements in Solr 7.5.
As for the third goal:
Only one replica for each shard on each node.
To achieve this, a policy rule given in the "Limit Replica Placement" example should suffice. However, looking at the screenshot you've posted, you actually mean a (collection,shard) pair which is unsupported today. You'd need a policy rule like the following (following does not work because collection:#EACH is not supported):
{"replica": "<2", "collection": "#EACH", "shard": "#EACH", "node": "#ANY"}
I have opened SOLR-12717 to add this feature.
Thank you for these excellent use-cases. I'll recommend asking questions such as these on the solr-user mailing list because not a lot of Solr developers frequent Stackoverflow. I could only find this question because it was posted on the docker-solr project.

Solr nodes' replication is getting stuck

We have standalone solr servers which are master and slave. Also have a full indexer job nightly. Generally, when job executed successful everything is alright. But last days, we noticed that indexer node has different document number with searching node. So, expected productions are not available in our production system. That's why we had to restart nodes and start replication manually, then problem went away. We need to prevent to occur this problem again. What do you suggest us to check or where should i look at? Indeed i think that essential error about the issue is: "SEVERE: No files to download for index generation"
Regards

upgrade solr from 4.2.1 to 5.3.1

I've been tasked with migrating from our solr 4.2.1 server to a new solr server, 5.3.1. I was hoping I could just pick up the cores, and move them over with a little but of editing files. But atlas, I can't quite figure it out.
I have tried moving a single core, and creating a core.properties files with the name of the core and I get:
testcore: org.apache.solr.common.SolrException:org.apache.solr.common.SolrException: Error loading class 'solr.JsonUpdateRequestHandler'
Any thoughts as to what the problem might be? Any thoughts would be appreciated, thank you!
I am in the final stages of the similar upgrade; here is how I suggest you proceed.
Install both versions side by side and create the collection in new solr
Take your default schema/solrconfig from the new solr and move stuff into it from your old schema/solrconfig. The formatting changed, so you will need to manually move all of your config.
Make sure that works
Move the indexes - once your solrconfig and schema match up you should be able to use your old indexes (data directory).
To complete the upgrade you will need to re-index into a new but similar collection. This will upgrade the underlying lucene indexes. Your new version of solr has cursor mark support so it becomes much simplier; especially if you are using collection aliases.
JSON does not have its own request handler any longer (changed in 4.x, removed in 5.x). It has now been merged into the standard solr.UpdateRequestHandler, and the request handler is selected internally based on the Content-Type header of the request.

Disappearing cores in Solr

I am new to Solr.
I have created two cores from the admin page, let's call them "books" and "libraries", and imported some data there. Everything works without a hitch until I restart the server. When I do so, one of these cores disappears, and the logging screen in the admin page contains:
SEVERE CoreContainer null:java.lang.NoClassDefFoundError: net/arnx/jsonic/JSONException
SEVERE SolrCore REFCOUNT ERROR: unreferenced org.apache.solr.core.SolrCore#454055ac (papers) has a reference count of 1
I was testing my query in the admin interface; when I refreshed it, the "libraries" core was gone, even though I could normally query it just a minute earlier. The contents of solr.xml are intact. Even if I restart Tomcat, it remains gone.
Additionally, I was trying to build a query similar to this: "Find books matching 'war peace' in libraries in Atlanta or New York". So given cores "books" and "libraries", I would issue "books" the following query (which might be wrong, if it is please correct me):
(title:(war peace) blurb:(war peace))
AND _query_:"{!join
fromIndex=libraries from=libraryid to=libraryid
v='city:(new york) city:(atlanta)'}"
When I do so, the query fails with "libraries" core disappears, with the above symptoms. If I re-add it, I can continue working (as long as I don't restart the server or issue another join query).
I am using Solr 4.0; if anyone has a clue what is happening, I would be very grateful. I could not find out anything about the meaning of the error message, so if anyone could suggest where to look for that, or how go about debugging this, it would be really great. I can't even find where the log file itself is located...
I would avoid the Debian package which may be misconfigured and quirky. And it contains (a very early build of?) solr 4.0, which itself may have lingering issues; being the first release in a new major version. The package maintainer may not have incorporated the latest and safest Solr release into his package.
A better way is to download Solr 4.1 yourself and set it up yourself with Tomcat or another servlet container.
In case you are looking to install SOLR 4.0 and configure, you can following the installation procedure from here
Update the solr config for the cores to be persistent.
In your solr.xml, update <solr> or <solr persistent="false"> to <solr persistent="true">

Running Solr in read-only mode

I think I'm missing something obvious here. I have to imagine a lot of people open up their Solr servers to other developers and don't want them to be able to modify the index.
Is there something in solrconfig.xml that can be set to effectively make the index read-only?
Update for clarification:
My goal is to use Solr with an existing Lucene index managed by another application. This works just fine, but I want to be sure Solr never tries to write to this index.
Exposing a Solr instance to the public internet is a bad idea. Even though you can strip some components to make it read-only, it just wasn't designed with security in mind, it's meant to be used as an internal service, just like you wouldn't expose a RDBMS.
From the Solr Security wiki page:
First and foremost, Solr does not
concern itself with security either at
the document level or the
communication level. It is strongly
recommended that the application
server containing Solr be firewalled
such the only clients with access to
Solr are your own. A default/example
installation of Solr allows any client
with access to it to add, update, and
delete documents (and of course
search/read too), including access to
the Solr configuration and schema
files and the administrative user
interface.
Even ajax-solr, a Solr client for javascript meant to run in a browser, recommends talking to Solr through a proxy.
Take for example guardian.co.uk: it's well-known that they use Solr for searching, but they built an API to let others access their content. This way they can define and control exactly what and how they want people to search for things.
Otherwise, any script kiddie can write a trivial loop to DoS your Solr instance and therefore bring down your site.
You can probably just remove the line that defines your solr.XmlUpdateRequestHandler in solrconfig.xml.
Replication is a nice way to setup read-only while being able to do indexation. Just setup a master with restricted access and a slave that is read-only (by removing your XmlUpdateRequestHandler from the config). The slave will be replicated from the master but won't accept any indexation directly.
UPDATE
I just read that in Solr 1.4, you can disable component. I just tried it on the /update requestHandler and I was not able to index anymore.

Resources