SolrNet support for failover scenario with SolrCloud cluster

Does SolrNet have built-in support for fail-over scenarios with SolrCloud?
I have 3 nodes in a SolrCloud cluster, with an external ZooKeeper ensemble. I use the SolrNet client to communicate with Solr, but it obviously connects to just one Solr node; when this Solr node dies, I need to use another node.
I am currently using the ZooKeeperNetEx library to get the list of alive nodes from /live_nodes - but I am wondering whether this is overkill: maybe SolrNet is already SolrCloud-aware and will automatically switch to another Solr node if the current one dies?

According to "Basic usage of cloud mode" in the SolrNet documentation, you use a SolrCloudStateProvider with the ZooKeeper URLs when creating your instance:
var zookeeperConnectionString = "127.0.0.1:2181";
var collectionName = "collection_name";
Startup.Init<Product>(new SolrCloudStateProvider(zookeeperConnectionString), collectionName);
I'm guessing the connection string follows the regular ZooKeeper format, which means that you can give it a list of ZooKeeper instances to use by separating the hosts/IPs with commas (192.168.0.1:2181,192.168.0.2:2181,...).
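For example, a sketch with a three-host ensemble (placeholder IPs), mirroring the snippet above:
// The ensemble string is passed straight through, so any comma-separated
// host:port list should work here.
var zookeeperConnectionString = "192.168.0.1:2181,192.168.0.2:2181,192.168.0.3:2181";
var collectionName = "collection_name";
Startup.Init<Product>(new SolrCloudStateProvider(zookeeperConnectionString), collectionName);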

Related

Change "Solr Cluster" in Lucidworks Fusion 4

I am running Fusion 4.2.4 with external Zookeeper (3.5.6) and Solr (7.7.2). I have been running a local set of servers and am trying to move to AWS instances. All of the configuration from my local Zookeepers has been duplicated to the AWS instances so they should be functionally equivalent.
I am to the point where I want to shut down the old (local) Zookeeper instances and just use the ones running in AWS. I have changed the configuration for Solr and Fusion (fusion.properties) so that they only use the AWS instances.
The problem I have is that Fusion's Solr cluster (System->Solr Clusters) associated with all of my collections is still set to the old Zookeepers :9983,:9983,:9983, so if I turn off all of the old instances of Zookeeper, my queries through Fusion's Query API no longer work.
When I try to change the "Connect String" for that cluster, it fails because the cluster is currently in use by collections. I am able to create a new cluster, but there is no way that I can see to associate the new cluster with any of my collections.
In a test environment set up similar to production, I have changed the searchClusterId for a specific collection using Fusion's Collections API; however, after doing so, the queries still fail when I turn off all of the "old" Zookeeper instances. It seems like this is the way to go, so I'm surprised that it doesn't seem to work.
So far, Lucidworks's support has not been able to provide a solution - I am open to suggestions.
This is what I came up with to solve this problem.
I created a test environment with an AWS Fusion UI/API/etc., local Solr, AWS Solr, local ZK, and AWS ZK.
1. Configure Fusion and Solr to only have the AWS ZK configured
2. Configure the two ZKs to be an ensemble
3. Create a new Solr Cluster in Fusion containing only the AWS ZK
4. For each collection in Solr (see the sketch after this list)
a. GET the json from <fusion_url>:8764/api/collections/<collection>
b. Edit the json to change “searchClusterId” to the new cluster defined in Fusion
c. PUT the new json to <fusion_url>:8764/api/collections/<collection>
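A rough sketch of steps 4a-4c in C# (fusion-host and my_collection are placeholders for the elided <fusion_url> and collection name; a real deployment would also need Fusion authentication):
using System;
using System.Net.Http;
using System.Text;
using System.Text.Json.Nodes;
using System.Threading.Tasks;

class MoveCollectionCluster
{
    static async Task Main()
    {
        var baseUrl = "http://fusion-host:8764";  // placeholder <fusion_url>
        var collection = "my_collection";         // placeholder collection name
        using var http = new HttpClient();

        // 4a. GET the collection's JSON definition
        var json = await http.GetStringAsync($"{baseUrl}/api/collections/{collection}");

        // 4b. Point searchClusterId at the new cluster defined in Fusion
        var doc = JsonNode.Parse(json)!;
        doc["searchClusterId"] = "aws-zk-cluster"; // hypothetical new cluster id

        // 4c. PUT the modified JSON back
        var response = await http.PutAsync(
            $"{baseUrl}/api/collections/{collection}",
            new StringContent(doc.ToJsonString(), Encoding.UTF8, "application/json"));
        Console.WriteLine(response.StatusCode);
    }
}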
After doing all of this, I was able to change the “default” Solr cluster in the Fusion Admin UI to confirm that no collections were using it (I wasn’t sure if anything would use the ‘default’ cluster so I thought it would be wise to not take the chance).
I was able to then stop the local ZK, put the AWS ZK in standalone mode, and restart Fusion. Everything seems to have started without issues.
I am not sure that this is the best way to do it, but it solved the problem as far as I could determine.

How to scale and distribute the SOLR CLOUD nodes

I have initially set up SolrCloud with two Solr nodes.
I have to add a new Solr node, i.e. an additional shard with the same number of replicas as the existing SOLR CLUSTER nodes.
I have already gone through the SOLR scaling and distributing https://cwiki.apache.org/confluence/display/solr/Introduction+to+Scaling+and+Distribution
But the above link only contains information about scaling Solr in standalone mode. That's the sad part.
I have started the SOLR CLUSTER nodes using the following command
./bin/solr start -c -s server/solr -p 8983 -z [zkip's] -noprompt
Kindly share the command for creating a new shard when adding a new node.
Thanks in advance.
From my knowledge, I am sharing this answer.
Adding a new SOLR CLOUD / SOLR CLUSTER node means getting a copy of all the SHARDs onto the new box (through replication of all SHARDs).
SHARD : The actual data is split equally across the number of SHARDs we create (while creating the collection).
So while adding the new SOLR CLOUD node, make sure that all the SHARDs are available on the new node (RECOMMENDED) or as required.
Naming Standards of SOLR CORE in SOLR CLOUD MODE/ CLUSTER MODE
Syntax:
<COLLECTION_NAME>_shard<SHARD_NUMBER>_replica<REPLICA_NUMBER>
Example
CORE NAME : enter_2_shard1_replica1
COLLECTION_NAME : enter_2
SHARD_NUMBER : 1
REPLICA_NUMBER : 1
STEPS FOR ADDING THE NEW SOLR CLOUD/CLUSTER NODE
Create a core with the same collection name as used in the existing SOLR CLOUD nodes.
Notes while creating a new core in the new node
Example :
enter_2_shard1_replica1
enter_2_shard1_replica2
From the above example, the maximum replica number of the corresponding shard is 2 (enter_2_shard1_replica2).
So in the new node, while creating a core, give the replica number as 3 ("enter_2_shard1_replica3") so that SOLR will take this as the third replica of the corresponding SHARD.
Note : the replica numbering should increment by 1.
Give time to replicate the data from the existing node to the new node.
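For example, a sketch of the CoreAdmin CREATE call for this step (new-node is a placeholder host; the names follow the example above):
http://new-node:8983/solr/admin/cores?action=CREATE&name=enter_2_shard1_replica3&collection=enter_2&shard=shard1
On newer Solr versions, the Collections API ADDREPLICA action (/solr/admin/collections?action=ADDREPLICA&collection=enter_2&shard=shard1) achieves the same result without manual core naming.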

Solr cloud distributed search on collections

Currently I have a ZooKeeper instance controlling replication on 3 physical servers. It is the Solr-integrated ZooKeeper. 1 shard, 1 collection.
I have a new requirement in which I will need a new static solr instance (1 new collection, no replication). Same schema as previous collection. A copy of this instance will also be placed on the 3 physical servers mentioned above. A caveat is that I need to perform distributed searches across the 2 collections and have the results blended.
Thanks to javacreed I now know that sharding is not part of my solution. Previous questions are answered here and here.
In my current setup I run the following command on the server running zookeeper -
java -Dbootstrap_confdir=solr/myApp/conf -Dcollection.configName=myConfig -DzkRun -DnumShards=1 -jar start.jar
Am I correct in saying that this will not change, and that I will now also manually start the non-replicated collection? I really only need to change my search queries to include the 'collection' parameter? Something like -
http://localhost:8983/solr/collection1/select?collection=collection1,collection2
This example is from the Solr documentation. I am slightly confused as to whether it should be ...solr/collection1/select?... or ...solr/collection2/select?..., or if it even matters?
Thanks
Thanks for your kind words, stewart. You can search it directly on Solr as
http://localhost:8983/solr/select?collection=collection1,collection2
There is no need to mention any collection path, since you are defining the collections in the collection parameter.
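If you happen to be querying from SolrNet (the client discussed at the top of this page), the same parameter can be passed through QueryOptions.ExtraParams; a minimal sketch, assuming the Product mapping from the earlier example has already been initialized:
using System.Collections.Generic;
using CommonServiceLocator;            // Microsoft.Practices.ServiceLocation on older SolrNet versions
using SolrNet;
using SolrNet.Commands.Parameters;

var solr = ServiceLocator.Current.GetInstance<ISolrOperations<Product>>();
var results = solr.Query(SolrQuery.All, new QueryOptions
{
    // Passed through verbatim as &collection=collection1,collection2
    ExtraParams = new Dictionary<string, string>
    {
        { "collection", "collection1,collection2" }
    }
});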

Alfresco 4.1 Search Integration?

I am using Alfresco 4.1 and want to integrate it with Solr search so that Solr can access all Alfresco content (keeping in mind authentication etc.).
I know Solr is built into Alfresco, but I have a Solr instance running separately which also integrates searches from a number of other sources, such as DBs etc.
Which would be the best way forward?
Regards.
I found one way to do it. I can install Solr on a separate server and use SolrJ APIs within Alfresco for integration.
Programmatically, I can dump Alfresco content for indexing into Solr using SolrJ, with custom-built XML for the Alfresco content. Once the XML is available to Solr, it can be consumed and indexed by the Solr server. These indexes would be separate from Alfresco's OOTB indexes.
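For reference, such custom-built XML can follow Solr's standard update message format, posted to the core's /update handler (the field names here are hypothetical):
<add>
  <doc>
    <field name="id">alfresco-node-1234</field>
    <field name="title">Quarterly report</field>
    <field name="content">extracted text of the document...</field>
  </doc>
</add>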
Once indexed, it is good to go. Using Solr APIs, I can search the same content on the Solr server instead of Alfresco, with the added benefit of integrating multiple content sources so that Solr can be used as a universal search.
However, when content grows to large volumes, I could see a performance hit, as Solr needs to index every content item in Alfresco. Is any workaround possible?
Regards.

How to index data from a database using Apache Solr with GlassFish server on Linux?

I want to create a search box in my web app using Apache Lucene and Apache Solr. I am using a Postgres database and have to do it with Java.
As I am new to these concepts (Solr, Lucene), I am struggling with this. I have already installed and configured Apache Solr with GlassFish. Now I don't know how to start: do I have to create a Java project in Eclipse, or do I use the Solr admin GUI?
Can anyone help me with this?
Thanks in Advance.....
In order to make data searchable, you have to first index your data. You can use one of the following ways to index data.
By using Solr clients such as Solrj
If you store your data in a relational DB, then you can use the DataImportHandler
By posting XML or JSON messages. Check here for documentation.
When new data is added, you can index it using Solr clients (Solrj). You can also search your data using Solrj or any other client library.
You can find other client libraries here.
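As a sketch of the posting route (mycore and the field names are placeholders, and this assumes a Solr version whose /update handler routes requests by Content-Type; any HTTP client works, C# is used here only for consistency with the SolrNet examples above):
using System;
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

class PostJsonDoc
{
    static async Task Main()
    {
        // One document, wrapped in a JSON array as Solr's update handler expects
        var body = "[{ \"id\": \"1\", \"title\": \"hello world\" }]";
        using var http = new HttpClient();
        var response = await http.PostAsync(
            "http://localhost:8983/solr/mycore/update?commit=true",
            new StringContent(body, Encoding.UTF8, "application/json"));
        Console.WriteLine(response.StatusCode);
    }
}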
You can start with Solr DIH (DataImportHandler) to index the data from Postgres into Solr.
For a more detailed understanding, you can refer to:
how-to-import-data-from-sql-databases-part-1
how-to-import-data-from-sql-databases-part-2
how-to-import-data-from-sql-databases-part-3
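For orientation, a minimal sketch of a DIH data-config.xml for Postgres (the JDBC URL, credentials, table, and field names are all placeholders, and the file must also be wired to a DataImportHandler request handler in solrconfig.xml):
<dataConfig>
  <dataSource driver="org.postgresql.Driver"
              url="jdbc:postgresql://localhost:5432/mydb"
              user="solr" password="secret"/>
  <document>
    <entity name="item" query="SELECT id, title FROM items">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>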
