How to replicate an ExternalFileField file on SolrCloud - solr

I have the External File Field (EFF) configured and working on a non-cloud Solr setup. Now I need to apply the same to a SolrCloud setup. I have 3 shards and a replication factor of 3.
The EFF file needs to go into the data directory of the Solr index. How do I upload/update the EFF file, given that I have 3 shards, each on 3 Solr servers?
Can ZooKeeper be used to maintain these files too?
The issue is that updating these files by hand means going to each shard/replica and updating them one by one.
Any guidance on EFF and SolrCloud would be appreciated.
Thanks,
Brijesh
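One way to avoid the per-replica manual step is to script the copy. The following is only a minimal sketch in Python, not an established Solr mechanism. It assumes the file follows the external_<fieldname> naming convention, that every core data directory is reachable as a local path (for example by running the script on each node or over a shared mount), and the function name distribute_eff and its parameters are hypothetical. The temporary file is deliberately given a name that does not start with external_<fieldname>, so that Solr cannot mistake it for a newer version of the file.

```python
import os
import shutil
import tempfile


def distribute_eff(source_path, data_dirs, field_name):
    """Copy an external field file into every core data directory.

    The target name follows the external_<fieldname> convention.
    Returns the list of paths that were written.
    """
    written = []
    target_name = "external_" + field_name
    for data_dir in data_dirs:
        target = os.path.join(data_dir, target_name)
        # Write to a temporary file in the same directory, then rename it,
        # so a reader never observes a partially written file.
        fd, tmp_path = tempfile.mkstemp(dir=data_dir, prefix=".eff_tmp_")
        os.close(fd)
        shutil.copyfile(source_path, tmp_path)
        os.replace(tmp_path, target)
        written.append(target)
    return written
```

Solr typically re-reads the external file only when a new searcher is opened, so a commit or core reload would probably still be needed after the copy.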

Related

Data doesn't get replicated in Solr 8.4 CDCR

I have configured Solr 8.4 on 6 systems (3+3) across 2 data centers, using an external ZooKeeper ensemble in each region.
I configured solrconfig.xml for a test collection as described in the Solr manual, and followed the instructions on the Solr website to start SolrCloud in sequence.
When I insert a single record (document) in the primary, it doesn't get replicated. Replication happens only after I restart SolrCloud on all servers. This is a new collection that I defined. I added the records manually through the UI; I have not indexed the collection.
Do I have to restart Solr every time I update? Why do records get updated only on restart?
Please let me know if you have come across this.
Note: I didn't run an index. It was an empty collection using the default configset. I added the record from the Documents section of the UI.

HBase indexer for Solr

I'm trying to index data from an HBase table using the Lucidworks HBase indexer. I would like to know whether Solr, the HBase indexer and HBase have to use the same ZooKeeper.
Can my Solr instance be independent, with HBase and the HBase indexer reporting together to zookeeper1 while Solr reports to its own ZooKeeper?
I'm following the URL below:
https://community.hortonworks.com/articles/1181/hbase-indexing-to-solr-with-hdp-search-in-hdp-23.html
It is up to you whether to go with the same ZooKeeper or a separate, independent one.
For a production HBase setup, a 3-node ZooKeeper ensemble is recommended, which means 3 ZooKeeper servers are required. You can use the same servers for Solr as well.
That helps to reduce the number of servers.
ZooKeeper is a lightweight server that is used to monitor the Solr servers, so for production it is better to run ZooKeeper outside the Solr servers.
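If the same ensemble is shared, one common way to keep Solr's data apart from HBase's is a ZooKeeper chroot appended to Solr's zkHost string. The sketch below only builds such a connection string; the host names and the /solr chroot are hypothetical placeholders, not values from the setup described above.

```python
def build_zk_host(hosts, chroot=""):
    """Join ZooKeeper host:port pairs into a single connection string.

    An optional chroot (for example "/solr") is appended once, after the
    last host, which is the form ZooKeeper connection strings use.
    """
    if chroot and not chroot.startswith("/"):
        raise ValueError("chroot must start with '/'")
    return ",".join(hosts) + chroot.rstrip("/")


# Hypothetical shared ensemble used by both HBase and Solr.
ensemble = ["zk1:2181", "zk2:2181", "zk3:2181"]
solr_zk_host = build_zk_host(ensemble, "/solr")
print(solr_zk_host)
```

With a chroot, Solr keeps its znodes under /solr while HBase continues to use its own parent znode on the same servers.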

SolrCloud DIH implementation with zookeeper

I am going to move my old DataImportHandler configuration from Solr 4.3 to SolrCloud 5.0.
I have already deployed ZooKeeper on 3 virtual machines and all of them are communicating well with each other. I have read about nodes, collections, shards and replicas, but I cannot work out how to put my old DIH configurations into ZooKeeper. Currently I have 5 different DIH configurations which I need to put into SolrCloud. Does that mean I have to create 5 nodes or collections? I am confused here.
Thanks in Advance!
There is no need for an extra node for the configuration. SolrCloud is built around collections: a collection is sharded across the nodes, and you can create replicas of its shards.
These are the steps you need to follow for SolrCloud:
1) Run ZooKeeper
2) Run the Solr nodes with ZooKeeper
3) Upload the configuration to ZooKeeper
4) Create the collection, referring to that configuration
To upload the configuration to ZooKeeper and create the collection:
1) Create a solrlibs directory
2) Copy /opt/solr/server/solr-webapp/webapp/WEB-INF/lib/* to it
3) Copy /opt/solr/server/lib/ext/* to it
4) Run the command java -classpath .:/opt/solrlibs/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost 192.168.1.1:2181,192.168.1.2:2181,192.168.1.3:2181 -confdir /opt/solrconfigs/test/conf -confname testconf
5) Create the collection using the following command http://192.168.1.4:8080/solr/admin/collections?action=CREATE&name=test&collection.configName=testconf&numShards=2&replicationFactor=2
The number of shards and the replication factor depend on the number of nodes you have.
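With 5 DIH configurations, the upload and create steps would be repeated once per configuration, so it can help to generate them. Here is a minimal Python sketch that assembles the same ZkCLI command and Collections API URL as above; the helper names upconfig_command and create_collection_url are made up for illustration, and the addresses and paths are simply the ones from this answer.

```python
from urllib.parse import urlencode

# ZooKeeper ensemble taken from the command above.
ZK_HOSTS = ["192.168.1.1:2181", "192.168.1.2:2181", "192.168.1.3:2181"]


def upconfig_command(conf_dir, conf_name, lib_dir="/opt/solrlibs"):
    """Return the ZkCLI upconfig invocation as an argument list."""
    return [
        "java", "-classpath", ".:" + lib_dir + "/*",
        "org.apache.solr.cloud.ZkCLI",
        "-cmd", "upconfig",
        "-zkhost", ",".join(ZK_HOSTS),
        "-confdir", conf_dir,
        "-confname", conf_name,
    ]


def create_collection_url(base_url, name, conf_name, num_shards, replication_factor):
    """Return the Collections API CREATE URL for the given settings."""
    params = {
        "action": "CREATE",
        "name": name,
        "collection.configName": conf_name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
    }
    return base_url + "/admin/collections?" + urlencode(params)


cmd = upconfig_command("/opt/solrconfigs/test/conf", "testconf")
url = create_collection_url("http://192.168.1.4:8080/solr", "test", "testconf", 2, 2)
print(" ".join(cmd))
print(url)
```

The argument list can be handed to subprocess.run as is; because no shell is involved, the classpath wildcard reaches java unexpanded, which is what java expects.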

Solr cloud sharding

Currently I have a ZooKeeper instance controlling replication on 3 servers. It is the ZooKeeper integrated with Solr. It works well in my web-based application.
I have a new requirement which will require sharding in the cloud, and I am not sure how to implement it. Basically I want to separate the data which can only be updated by me (shard 1) from the data that users can update (shard 2). From time to time I will be completely replacing the data directory in shard 1, but I don't want to disturb the user-created data in shard 2.
Shard 1 does not need replication, since I can copy the new data to each server when I choose to update it; shard 2, however, does need replication.
Currently I run the following command on the server running zookeeper -
java -Dbootstrap_confdir=solr/myApp/conf -Dcollection.configName=myConfig -DzkRun -DnumShards=1 -jar start.jar
And the following command on the other 2 non zookeeper servers
java -Djetty.port=8983 -DzkHost=129.**.30.11:9983 -jar start.jar&
This creates a single-shard Solr instance on each of the 3 servers.
I think I just need to add 1 static shard to this configuration; however, I am not sure of the sequence of commands to accomplish it.
Many thanks
First, you are using ZooKeeper to maintain your shards and leaders/replicas. So if you want one shard with two instances and another shard with only a leader, you will have to modify your command as follows:
1) Provide -DnumShards=2 so that ZooKeeper knows that you need two shards.
2) Specify the -DzkHost parameter for this first Solr instance as well.
java -Dbootstrap_confdir=solr/myApp/conf -Dcollection.configName=myConfig -DzkRun -DnumShards=2 -DzkHost=** -jar start.jar
When you do this you will see some errors on the console, since shard2 has not been created yet.
Now start your other two servers and you should see shard1 with two servers (leader and replica), while shard2 has only one instance, i.e. the leader.
If you want separation of the indexes and control over them, you will have to create two collections instead of two shards.
Explanation
You have 3 servers, so when you start SolrCloud using ZooKeeper the following happens:
1) Start the first Solr server along with ZooKeeper and you get 1 shard for SolrCloud, shard1.
2) Start the second Solr server and point it to ZooKeeper. Since you declared -DnumShards=2, ZooKeeper sees that 1 more shard is needed, so shard2 is created for your collection. By now the admin console shows 2 shards for 1 collection.
3) Now start your third server and point it to ZooKeeper. ZooKeeper sees that both shards already exist, so the server becomes a replica of shard1 instead of a new shard.
So it will look like this:
collection--->shard1--->server1,server3
          --->shard2--->server2
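The start-up order described above can be modelled in a few lines of Python. This is only a toy simulation of the explanation, not Solr's actual assignment code; the function name assign_nodes is hypothetical, and the round-robin behaviour for a fourth or later server is an assumption that goes beyond the three servers discussed here.

```python
def assign_nodes(nodes, num_shards):
    """Simulate the start-up order described above.

    The first num_shards nodes each become the leader of a new shard;
    later nodes are added as replicas, cycling through the shards in order.
    """
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1")
    layout = {"shard%d" % (i + 1): [] for i in range(num_shards)}
    for index, node in enumerate(nodes):
        shard = "shard%d" % (index % num_shards + 1)
        layout[shard].append(node)
    return layout


layout = assign_nodes(["server1", "server2", "server3"], 2)
for shard, members in layout.items():
    print(shard, "->", ",".join(members))
```

For three servers and two shards this reproduces the layout shown above: server1 and server3 in shard1, server2 alone in shard2.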

Can Solr replication be done with an automatic dictionary update?

I am wondering if Solr replication can be combined with updating some key dictionary files. I am building an index on a build machine, and it will then be replicated to a few real production Solr machines. One issue is that I have dictionary files (synonym and stemming related) which are used for index building on the build machine, and those files need to be synchronized along with replication. Does Solr have a built-in mechanism for this, or do I have to program/script something on top of replication (does it have some kind of hook which can be called at the end of replication)?
Solr does support replication of configuration files, but only those within the conf folder.
Check How_are_configuration_files_replicated
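For illustration, the files to replicate are listed in the confFiles setting of the replication handler in solrconfig.xml. The Python sketch below just generates such a fragment; the file names are hypothetical examples, the helper replication_handler is made up, and the "master" section name is the older naming, so it should be checked against the Solr version in use.

```python
import xml.etree.ElementTree as ET


def replication_handler(conf_files, replicate_after="commit"):
    """Build a replication handler fragment that lists conf files to replicate."""
    handler = ET.Element(
        "requestHandler",
        {"name": "/replication", "class": "solr.ReplicationHandler"},
    )
    master = ET.SubElement(handler, "lst", {"name": "master"})
    ET.SubElement(master, "str", {"name": "replicateAfter"}).text = replicate_after
    # confFiles is a comma-separated list of files inside the conf folder.
    ET.SubElement(master, "str", {"name": "confFiles"}).text = ",".join(conf_files)
    return ET.tostring(handler, encoding="unicode")


# Hypothetical dictionary files kept in the conf folder.
fragment = replication_handler(["synonyms.txt", "protwords.txt"])
print(fragment)
```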
