Change "Solr Cluster" in Lucidworks Fusion 4 - solr

I am running Fusion 4.2.4 with external Zookeeper (3.5.6) and Solr (7.7.2). I have been running a local set of servers and am trying to move to AWS instances. All of the configuration from my local Zookeepers has been duplicated to the AWS instances so they should be functionally equivalent.
I am to the point where I want to shut down the old (local) Zookeeper instances and just use the ones running in AWS. I have changed the configuration for Solr and Fusion (fusion.properties) so that they only use the AWS instances.
The problem I have is that Fusion's solr cluster (System->Solr Clusters) associated with all of my collections is still set to the old Zookeepers :9983,:9983,:9983 so if I turn off all of the old instances of Zookeeper my queries through Fusion's Query API no longer work. When I try to change the "Connect String" for that cluster it fails because the cluster is currently in use by collections. I am able to create a new cluster but there is no way that I can see to associate the new cluster with any of my collections. In a test environment set up similar to production, I have changed the searchClusterId for a specific collection using Fusion's Collections API however after doing so the queries still fail when I turn off all of the "old" Zookeeper instances. It seems like this is the way to go so I'm surprised that it doesn't seem to work.
So far, Lucidworks's support has not been able to provide a solution - I am open to suggestions.

This is what I came up with to solve this problem.
I created a test environment with an AWS Fusion UI/API/etc., local Solr, AWS Solr, local ZK, and AWS ZK.
1. Configure Fusion and Solr to only have the AWS ZK configured
2. Configure the two ZKs to be an ensemble
3. Create a new Solr Cluster in Fusion containing only the AWS ZK
4. For each collection in Solr
a. GET the json from <fusion_url>:8764/api/collections/<collection>
b. Edit the json to change “searchClusterId” to the new cluster defined in Fusion
c. PUT the new json to <fusion_url>:8764/api/collections/<collection>
After doing all of this, I was able to change the “default” Solr cluster in the Fusion Admin UI to confirm that no collections were using it (I wasn’t sure if anything would use the ‘default’ cluster so I thought it would be wise to not take the chance).
I was able to then stop the local ZK, put the AWS ZK in standalone mode, and restart Fusion. Everything seems to have started without issues.
I am not sure that this is the best way to do it, but it solved the problem as far as I could determine.

Related

How to integrate apache solr in apache tomcat 7?

Is there any way to integrate apache solr in tomcat 7 or do I have to run two servers in my application? Can in run apache solr within tomcat and not as a separate server?
The official answer is: Run it as a stand alone application.
No Longer Supported
Beginning with Solr 5.0, Support for deploying Solr as a WAR in servlet containers like Tomcat is no longer supported.
For information on how to install Solr as a standalone server, please see Installing Solr.
Background about the the decision for this move can be found on the Solr Wiki.
Solr is intended to be a server not a Java web application, similar to mysql or the Apache web server. When Solr was first created, designing it as a web application was a convenient choice, to avoid writing a lot of tricky code to build a network layer. These days, this design decision has become a limiting factor.
When you download Solr and install it onto your machine, it should be Solr that gets started. It should not be necessary to install Solr into a third-party application (servlet container) before it will work.
At this time, Solr is still a webapp, but this is an internal implementation detail, not an immutable property. The intention is to make Solr into a completely standalone application. Startup scripts that start the included container are the first step towards that goal. Jetty might still be the technology used once Solr is a standalone application, but if that happens, it will be internally embedded.
Solr is still a web app. They are suggesting that not to use it in another servlet container as its not been recommended and tested by them.
They are giving a fully tested system and don't want others(developers) to invest time in testing it with other containers.
I have used solr 3.4 version and deployed it in tomcat server on port 8080.
My application is deployed in another tomcat at 8080 port and is on another machine.
I have created the solr war and deployed the same in tomcat. and only 8080 port of that server is made open and been access by our application only.
I am not sure about the solr version 5 ...i.e. whether that can be deployed in other container other than jetty...but I think it can be deployed in tomcat...you need to try the same...
Other thing is that dont deply application war and solr war in the same container or in a single container. The reason is if one of the application goes down everything will be down.
Like if solr goes down then the application may go down, which is not a good for the application.

SolrCloud: Unable to Create Collection, Locking Issues

I have been trying to implement a SolrCloud, and everything works fine until I try to create a collection with 6 shards. My setup is as follows:
5 virtual servers, all running Ubuntu 14.04, hosted by a single company across different data centers
3 servers running ZooKeeper 3.4.6 for the ensemble
2 servers, each running Solr 5.1.0 server (Jetty)
The Solr instances each have a 2TB, ext4 secondary disk for the indexes, mounted at /solrData/Indexes. I set this value in solrconfig.xml via <dataDir>/solrData/Indexes</dataDir>, and uploaded it to the ZooKeeper ensemble. Note that these secondary disks are neither NAS nor NFS, which I know can cause problems. The solr user owns /solrData.
All the intra-server communication is via private IP, since all are hosted by the same company. I'm using iptables for firewall, and the ports are open and all the servers are communicating successfully. Config upload to ZooKeeper is successful, and I can see via the Solr admin interface that both nodes are available.
The trouble starts when I try to create a collection using the following command:
http://xxx.xxx.xxx.xxx:8983/solr/admin/collections?action=CREATE&name=coll1&maxShardsPerNode=6&router.name=implicit&shards=shard1,shard2,shard3,shard4,shard5,shard6&router.field=shard&async=4444
Via the Solr UI logging, I see that multiple index creation commands are issued simultaneously, like so:
6/25/2015, 7:55:45 AM WARN SolrCore [coll1_shard2_replica1] Solr index directory '/solrData/Indexes/index' doesn't exist. Creating new index...
6/25/2015, 7:55:45 AM WARN SolrCore [coll1_shard1_replica2] Solr index directory '/solrData/Indexes/index' doesn't exist. Creating new index...
Ultimately the task gets reported as complete, but in the log, I have locking errors:
Error creating core [coll1_shard2_replica1]: Lock obtain timed out: SimpleFSLock#/solrData/Indexes/index/write.lock
SolrIndexWriter was not closed prior to finalize(),​ indicates a bug -- POSSIBLE RESOURCE LEAK!!!
Error closing IndexWriter
If I look at the cloud graph, maybe a couple of the shards will have been created, others are closed or recovering, and if I restart Solr, none of the cores can fire up.
Now, I know what you're going to say: follow this SO post and change solrconfig.xml locking settings to this:
<unlockOnStartup>true</unlockOnStartup>
<lockType>simple</lockType>
I did that, and it had no impact whatsoever. Hence the question. I'm about to have to release a single Solr instance into production, which I hate to do. Does anybody know how to fix this?
Based on the log entry you supplied, it looks like Solr may be creating the data (index) directory for EACH shard in the same folder.
Solr index directory '/solrData/Indexes/index' doesn't exist. Creating new index...
This message was shown for two different collections and it references the same location. What I usually do, is change my Solr Home to a different directory, under which all collection "instance" stuff will be created. Then I manually edit the core.properties for each shard to specify the location of the index data.

Solr cloud distributed search on collections

Currently I have a zookeeper instance controlling replication on 3 physical servers. It is the solr integrated zookeeper. 1 shard, 1 collection.
I have a new requirement in which I will need a new static solr instance (1 new collection, no replication). Same schema as previous collection. A copy of this instance will also be placed on the 3 physical servers mentioned above. A caveat is that I need to perform distributed searches across the 2 collections and have the results blended.
Thanks to javacreed I now know that sharding is not in my solution. Previous questions answers here and here.
In my current setup I run the following command on the server running zookeeper -
java -Dbootstrap_confdir=solr/myApp/conf -Dcollection.configName=myConfig -DzkRun -DnumShards=1 -jar start.jar
Am I correct in saying that this will not change and I will now also manually start the non replicated collection. I really only need to change my search queries to include the 'collection' parameter? Something like -
http://localhost:8983/solr/collection1/select?collection=collection1,collection2
This example is from Solr documentation. I am slightly confused as to whether it should be ...solr/collection1/select?... or ...solr/collection2/select?... or if it even matters?
Thanks
Thanks for your kind word stewart.You can search it directly on solr as
http://localhost:8983/solr/select?collection=collection1,collection2
There is no need to mention any collection path since you are defining them in the collection parameters.

Embedded Solr on Amazon AWS

Currently, I have developed a web application. In my web application, I used embedded solr server to make indexing. After that I deployed onto the Tomcat 6 on window xp. Everything is ok. Next, I have tried my web application to deploy on Amazon AWS. My platform is linux + mysql. When I deployed, I got the exception related with embedded solr.
[ WARN] 19:50:55 SolrCore - [] Solr index directory 'solrhome/./data/index' doesn't exist. Creating new index...
[ERROR] 19:50:55 CoreContainer - java.lang.RuntimeException: java.io.IOException: Cannot create directory: /usr/share/tomcat6/solrhome/./data/index
at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:403)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:552)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:480)
So how to fix my problem. I am novie to linux.
My guess is that the user you are running Solr under does not have permission to access that directory.
Also, which version of Solr are you using? Looks like 3+. The latest version is 4, so it may make sense to try using that from the start. Probably a bit more troubleshooting to start, but a much better pay off that starting with legacy configuration.
I got solution. That is because of permission affair on Amazon Linux with ec2-user. So , I changed permission by following.
sudo chmod -R ugo+rw /usr/share/tomcat6
http://wiki.apache.org/solr/SolrOnAmazonEC2strong text
t should allow access to ports 22 and 8983 for the IP you're working from, with routing prefix /32 (e.g., 4.2.2.1/32). This will limit access to your current machine. If you want wider access to the instance available to collaborate with others, you can specify that, but make sure you only allow as much access as needed. A Solr instance should not be exposed to general Internet traffic. If you need help figuring out what your IP is, you can always use whatismyip.com. Please note that production security on AWS is a wide ranging topic and is beyond the scope of this tutorial.

Install Jetty or run embedded for Solr install

I am about to install Solr on a production box. It will be the only Java applet running and be on the same box as the web server (nginx).
It seems there are two options.
Install Jetty separately and configure to use with Solr
Set Solr's embedded Jetty server to start as a service and just use that
Is there any performance benefit in having them separate?
I am a big fan of KISS, the less setup the better.
Thanks
If you want KISS there is no question: 2. stick to vanilla Solr distrib with included jetty.
Doing the work of installing an external servlet engine would make sense if you needed Tomcat for example, but just to use the same thing (Jetty) Solr already includes...no way.
Solr is still using jetty 6. So there would be some benefits if you can get the solr application to run in a recent jetty distribution. For example you could use jetty 9 and use features like SPDY to enhance the response times of your application.
However I have no idea or experience if it's possible to run the solr application standalone in a servlet engine.
Another option for running Solr and keeping it simple is to use Solr-Undertow which is a high performance with small footprint server for Solr. It is easy to use on local machines for development and also production. It supports simple config files for running instances with different data directories, ports and more. It also can run just by pointing it at a distribution .zip file without needing to unpack it.
(note, I am the author of Solr-Undertow)
Link here: https://github.com/bremeld/solr-undertow with releases under the "Releases" tab.

Resources