SolrCloud with ZooKeeper on a local machine - solr

I am trying to set up 3 Solr instances, 3 ZooKeeper instances and 1 load balancer on my local laptop.
I have followed the posts https://www.codehousegroup.com/insight-and-inspiration/tech-stream/how-to-configure-sitecore-with-solr-cloud and https://medium.com/#sarkaramrit2/setting-up-solr-cloud-6-3-0-with-zookeeper-3-4-6-867b96ec4272 and a few others.
I have successfully set up 3 Solr instances on my laptop; the URLs are as follows:
https://solrcloud1:6161/solr/#/
https://solrcloud2:6162/solr/#/
https://solrcloud3:6163/solr/#/
I have installed the local load balancer "GoBetween" and mapped my 3 Solr endpoints in it. When I hit the URL https://solrcloud:3010/solr/#/, I receive responses from the different Solr instances, so that part appears to be working fine as well.
ZooKeeper
I have downloaded ZooKeeper and placed it in all 3 Solr locations as well.
E.g. SolrCloud1 = \LocalSolrCloud\SolrCloud1\ contains "solr-8.8.2" and "zookeeper-3.5.6"; the same structure applies to SolrCloud2 and SolrCloud3.
In each ZooKeeper location, I have created a data folder with a "myid" file containing the value "1", "2" or "3" respectively.
Each ZooKeeper's "zoo.cfg" file contains the following:
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/SolrCloud1/zookeeper-3.5.6/data
clientPort=2181
autopurge.snapRetainCount=4
autopurge.purgeInterval=24
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890
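For reference, when all three ZooKeeper instances run on the same laptop, each zoo.cfg normally needs its own dataDir and its own clientPort, while the server.N lines stay identical across the three files. A sketch for the second instance (paths and ports are assumed, following the pattern above):
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/SolrCloud2/zookeeper-3.5.6/data
clientPort=2182
autopurge.snapRetainCount=4
autopurge.purgeInterval=24
server.1=localhost:2888:3888
server.2=localhost:2889:3889
server.3=localhost:2890:3890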
When I run ZooKeeper from the command prompt, I get the error below.
2022-12-15 11:47:20,051 [myid:1] - WARN [WorkerSender[myid=1]:QuorumCnxManager@679] -
Cannot open channel to 2 at election address localhost/127.0.0.1:3889
java.net.ConnectException: Connection refused: connect
at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
at java.net.DualStackPlainSocketImpl.socketConnect(DualStackPlainSocketImpl.java:85)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:172)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:650)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:707)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:620)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:477)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:456)
at java.lang.Thread.run(Thread.java:748)

Related

Replace a ZooKeeper server in a ZooKeeper ensemble (with SolrCloud)

I have a SolrCloud cluster (6.6) set up with an external ZooKeeper ensemble (3.4.8) of 5 nodes. Recently, one machine (ip1:port1) that ran one ZooKeeper node with id=1 went down. This is what I've done to replace it:
Start zookeeper in another machine with the same id (=1).
Change zoo.cfg on the 4 live ZooKeepers to point at the new ZooKeeper server and restart (sketched below).
Update the ZK_HOST variable in solr.in.sh to include the new ZooKeeper server.
Restart Solr.
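For reference, steps 2 and 3 amount to edits roughly like the following; the hostnames and ports here are placeholders:
# zoo.cfg on each of the 4 surviving ZooKeepers (replace the failed host's entry)
server.1=new-zk-host:2888:3888
server.2=ip2:2888:3888
server.3=ip3:2888:3888
server.4=ip4:2888:3888
server.5=ip5:2888:3888
# solr.in.sh
ZK_HOST="new-zk-host:2181,ip2:2181,ip3:2181,ip4:2181,ip5:2181"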
After that, my Solr cluster seemed to be functioning well, but solr.log suggested that the Solr client and ZooKeeper servers still try to connect to the old ZooKeeper:
Solr log
2017-12-01 15:04:38.782 WARN (Timer-0-SendThread(ip1:port1)) [ ] o.a.z.ClientCnxn Client session timed out, have not heard from server in 30029ms for sessionid 0x0
2017-12-01 15:04:40.807 WARN (Timer-0-SendThread(ip1:port1)) [ ] o.a.z.ClientCnxn Client session timed out, have not heard from server in 31030ms for sessionid 0x0
Zookeeper log:
2017-12-01 13:53:57,972 [myid:] - INFO [main-SendThread(ip1:port1):ClientCnxn$SendThread@1032] - Opening socket connection to server ip1:port1. Will not attempt to authenticate using SASL (unknown error)
2017-12-01 13:54:03,972 [myid:] - WARN [main-SendThread(ip1:port1):ClientCnxn$SendThread@1162] - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.NoRouteToHostException: No route to host
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1141)
2017-12-01 13:54:05,074 [myid:] - INFO [main-SendThread(ip1:port1):ClientCnxn$SendThread#1032] - Opening socket connection to server ip1:port1. Will not attempt to authenticate using SASL (unknown error)
2017-12-01 13:54:06,974 [myid:] - WARN [main-SendThread(ip1:port1):ClientCnxn$SendThread#1162] - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
I've searched for how to add/remove ZooKeeper nodes but didn't find documentation for it. My ZooKeeper version (3.4.7) does not support dynamic reconfiguration (which arrived in ZooKeeper 3.5).
Is there a way I can manually remove/add a ZooKeeper server to/from the ensemble?
Thanks for your attention!

Solr MapReduceIndexerTool not able to fetch aliases through ZooKeeper

While working with MapReduceIndexerTool against Solr 4.10 cloud, the code successfully connects to ZooKeeper, but it fails when fetching aliases.json. Below are the command and stack trace:
command:
hadoop --config /etc/hadoop/conf jar target/search-mr-*-job.jar org.apache.solr.hadoop.MapReduceIndexerTool -D 'mapred.child.java.opts=-Xmx500m' --log4j src/test/resources/log4j.properties --morphline-file /home/impadmin/app_quotes_morphline.conf --output-dir hdfs://impetus-i0056.impetus.co.in:8020/user/impadmin/MapReduceIndexerTool/output2 --zk-host 172.26.45.69:9983/solr --collection app.quotes hdfs://impetus-i0056.impetus.co.in:8020/apps/hive/warehouse/kst
stack trace:
WARNING: Use "yarn jar" to launch YARN applications.
1 [main] INFO org.apache.solr.common.cloud.SolrZkClient - Using default ZkCredentialsProvider
87 [main] INFO org.apache.solr.common.cloud.ConnectionManager - Waiting for client to connect to ZooKeeper
114 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager - Watcher org.apache.solr.common.cloud.ConnectionManager@1568159 name:ZooKeeperConnection Watcher:172.26.45.69:9983/solr got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
115 [main] INFO org.apache.solr.common.cloud.ConnectionManager - Client is connected to ZooKeeper
115 [main] INFO org.apache.solr.common.cloud.SolrZkClient - Using default ZkACLProvider
Exception in thread "main" net.sourceforge.argparse4j.inf.ArgumentParserException: java.lang.IllegalArgumentException: Cannot find expected information for SolrCloud in ZooKeeper: 172.26.45.69:9983/solr
at org.apache.solr.hadoop.MapReduceIndexerTool.verifyZKStructure(MapReduceIndexerTool.java:1418)
at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:716)
at org.apache.solr.hadoop.MapReduceIndexerTool.run(MapReduceIndexerTool.java:681)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.solr.hadoop.MapReduceIndexerTool.main(MapReduceIndexerTool.java:668)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.IllegalArgumentException: Cannot find expected information for SolrCloud in ZooKeeper: 172.26.45.69:9983/solr
at org.apache.solr.hadoop.ZooKeeperInspector.extractDocCollection(ZooKeeperInspector.java:88)
at org.apache.solr.hadoop.ZooKeeperInspector.extractShardUrls(ZooKeeperInspector.java:56)
at org.apache.solr.hadoop.MapReduceIndexerTool.verifyZKStructure(MapReduceIndexerTool.java:1415)
... 10 more
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /aliases.json
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155)
at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:351)
at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:348)
at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:61)
at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:348)
at org.apache.solr.hadoop.ZooKeeperInspector.checkForAlias(ZooKeeperInspector.java:164)
at org.apache.solr.hadoop.ZooKeeperInspector.extractDocCollection(ZooKeeperInspector.java:85)
... 12 more
Please help me to identify the root cause.
The issue was with the URL that was being used to access the Solr configs in ZooKeeper; correcting the URL solved the issue. With an embedded Solr instance there is no /solr application chroot in ZooKeeper; the SolrCloud data sits directly under the ZooKeeper root.
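As an illustration of that fix, with the embedded ZooKeeper the --zk-host argument would be passed without the /solr chroot, roughly like this (host and port taken from the command above):
--zk-host 172.26.45.69:9983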

Which ports should I open in the firewall on nodes running Apache Flink?

When I try to run my flow on an Apache Flink standalone cluster, I see the following exception:
java.lang.IllegalStateException: Update task on instance aaa0859f6af25decf1f5fc1821ffa55d @ app-2 - 4 slots - URL: akka.tcp://flink@192.168.38.98:46369/user/taskmanager failed due to:
at org.apache.flink.runtime.executiongraph.Execution$6.onFailure(Execution.java:954)
at akka.dispatch.OnFailure.internal(Future.scala:228)
at akka.dispatch.OnFailure.internal(Future.scala:227)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:174)
at akka.dispatch.japi$CallbackBridge.apply(Future.scala:171)
at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
at scala.runtime.AbstractPartialFunction.applyOrElse(AbstractPartialFunction.scala:28)
at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:136)
at scala.concurrent.Future$$anonfun$onFailure$1.apply(Future.scala:134)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at scala.concurrent.impl.ExecutionContextImpl$AdaptedForkJoinTask.exec(ExecutionContextImpl.scala:121)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink@192.168.38.98:46369/user/taskmanager#1804590378]] after [10000 ms]
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:599)
at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:597)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
at java.lang.Thread.run(Thread.java:745)
It seems that port 46369 is blocked by the firewall. That is likely, because following the configuration section I opened only these ports:
6121:
comment: Apache Flink TaskManager (Data Exchange)
6122:
comment: Apache Flink TaskManager (IPC)
6123:
comment: Apache Flink JobManager
6130:
comment: Apache Flink JobManager (BLOB Server)
8081:
comment: Apache Flink JobManager (Web UI)
The same ports are configured in flink-conf.yaml:
jobmanager.rpc.address: app-1.stag.local
jobmanager.rpc.port: 6123
jobmanager.heap.mb: 1024
taskmanager.heap.mb: 2048
taskmanager.numberOfTaskSlots: 4
taskmanager.memory.preallocate: false
blob.server.port: 6130
parallelism.default: 4
jobmanager.web.port: 8081
state.backend: jobmanager
restart-strategy: none
restart-strategy.fixed-delay.attempts: 2
restart-strategy.fixed-delay.delay: 60s
So, I have two questions:
Is this exception related to blocked ports?
Which ports should I open in the firewall for a standalone Apache Flink cluster?
UPDATE 1
I found a configuration problem in the masters and slaves files (I had skipped the newline separators between the hosts listed in those files). I fixed it and now I see other exceptions:
flink--taskmanager-0-app-1.stag.local.log
flink--taskmanager-0-app-2.stag.local.log
I have 2 nodes:
app-1.stag.local (with running job and task managers)
app-2.stag.local (with running task manager)
As you can see from these logs, the app-1.stag.local task manager can't connect to the other task manager:
java.io.IOException: Connecting the channel failed: Connecting to remote task manager + 'app-2.stag.local/192.168.38.98:35806' has failed. This might indicate that the remote task manager has been lost.
but app-2.stag.local has the port open:
2016-03-18 16:24:14,347 INFO org.apache.flink.runtime.io.network.netty.NettyServer - Successful initialization (took 39 ms). Listening on SocketAddress /192.168.38.98:35806
So I think the problem is firewall-related, but I don't understand where I can configure this port (or port range) in Apache Flink.
I have found the problem: the taskmanager.data.port parameter was set to 0 by default (but the documentation says it should be set to 6121).
So I set this port in flink-conf.yaml and now everything works fine.
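For reference, a minimal sketch of that fix: add the data port to flink-conf.yaml so it matches one of the firewall rules listed earlier (the value 6121 is the one named in the question):
taskmanager.data.port: 6121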

Using external ZooKeeper with SolrCloud

I am trying to implement SolrCloud. I followed the doc from the official resource https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud . It works fine with the embedded ZooKeeper, but it is recommended to use an external ZooKeeper. I installed ZooKeeper on my system and created a data directory named zookeeper in my home folder. I created sub-folders named 1 and 2 and created a myid file containing 1 and 2 respectively in each folder, as mentioned in the doc. I created config files for ZooKeeper, zoo.cfg:
clientPort=2181
initLimit=5
syncLimit=2
server.1=localhost:2879:3879
server.2=localhost:2888:3888
and zoo2.cfg:
initLimit=5
syncLimit=2
clientPort=2182
server.1=localhost:2878:3878
server.2=localhost:2888:3888
Next, from the ZooKeeper directory I run:
bin/zkServer.sh start zoo.cfg
bin/zkServer.sh start zoo2.cfg
And it started successfully. Next I run:
bin/solr start -e cloud -z localhost:2181,localhost:2182
The system asks me for the number of shards etc., as in the getting-started example; I select port 8990 for node 1 and 8991 for node 2. It gives an error:
Waiting to see Solr listening on port 8991 [/] Still not seeing Solr listening on 8991 after 30 seconds!
WARN - 2015-10-30 09:47:04.827; [ ] org.apache.zookeeper.ClientCnxn$SendThread; Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
WARN - 2015-10-30 09:47:05.929; [ ] org.apache.zookeeper.ClientCnxn$SendThread; Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
WARN - 2015-10-30 09:47:06.030; [ ] org.apache.zookeeper.ClientCnxn$SendThread; Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
WARN - 2015-10-30 09:47:07.131; [ ] org.apache.zookeeper.ClientCnxn$SendThread; Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
WARN - 2015-10-30 09:47:07.232; [ ] org.apache.zookeeper.ClientCnxn$SendThread; Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
Where am I going wrong? I have gone through many docs, but the Apache doc is not clear about the external ZooKeeper setup.
Your ZooKeeper ensemble must have an odd number of nodes: 1, 3, 5, etc.
If you want to test the ZK clustering feature, you have to set up at least 3 ZK instances. In that case, don't forget:
To set the ZK server id correctly in the myid file, which must be created in the dataDir directory referenced by your zoo.cfg.
To keep the dataDir and dataLogDir separate for each ZK instance (a sketch follows below).
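A minimal per-instance sketch of those two points, with directory paths assumed for a local test box (each instance gets its own clientPort, dataDir, dataLogDir and myid, while the server.N list is the same in every instance's config):
# zoo.cfg for instance 1 (instance 2 analogous, with its own port and directories)
clientPort=2181
dataDir=/home/user/zookeeper/data/1
dataLogDir=/home/user/zookeeper/datalog/1
server.1=localhost:2879:3879
server.2=localhost:2888:3888
# /home/user/zookeeper/data/1/myid contains just the instance id:
1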

Solr 5.3 Zookeeper Ensemble create_collection timeout 180s

I have 3 servers, each running Solr 5.3 and ZooKeeper (solr-cloud-01/zookeeper-01, solr-cloud-02/zookeeper-02 & solr-cloud-03/zookeeper-03).
ZooKeeper is up and running; one of the servers is the leader, the others are followers:
# zkServer.sh status
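(For reference, the status output on each node looks roughly like the following; the config path is assumed, and only one of the three nodes reports Mode: leader.)
JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Mode: follower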
If I try to create a Solr collection, the config is uploaded correctly to ZooKeeper, but the core itself is not created and times out after 180s:
# solr create_collection -c [collection_name] -d [config_name]
Connecting to ZooKeeper at zookeeper-01:2181,zookeeper-02:2181,zookeeper-03:2181 ...
Uploading /opt/solr/server/solr/configsets/[config_name]/conf for config
[collection_name] to ZooKeeper at zookeeper-01:2181,zookeeper-02:2181,zookeeper-03:2181
(or)
Re-using existing configuration directory [collection_name]
next:
Creating new collection '[collection_name]' using command:
http://localhost:8983/solr/admin/collections?action=CREATE&name=
[collection_name]&numShards=1&replicationFactor=1&maxShardsPerNode=1&
collection.configName=[collection_name]
ERROR: Failed to create collection '[collection_name]' due to:
create the collection time out:180s
The Solr admin console log shows 2 identical error messages, one from SolrCore and the other from SolrDispatchFilter:
null:org.apache.solr.common.SolrException: create the collection time out:180s
at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:239)
at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:170)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:675)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:443)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
at org.eclipse.jetty.server.Server.handle(Server.java:499)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)
at org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)
at java.lang.Thread.run(Thread.java:745)
If I then edit /opt/zookeeper/conf/zoo.cfg and comment out the other ZooKeepers (reducing the ensemble to 1 server):
server.1=zookeeper-01:2888:3888
#server.2=zookeeper-02:2888:3888
#server.3=zookeeper-03:2888:3888
And change the ZK_HOST option in /var/solr/solr.in.sh:
#ZK_HOST="zookeeper-01:2181,zookeeper-02:2181,zookeeper-03:2181"
ZK_HOST="zookeeper-01:2181"
And restart both ZooKeeper and Solr => the core is created (it was queued somehow?), but it is offline because the quorum was down (1 of 3 ZooKeeper nodes).
So then I experimented with a standalone Solr/ZooKeeper setup (solr-cloud-01 / zookeeper-01):
# zkServer.sh status
JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Mode: standalone
I executed the same command:
# solr create_collection -c [collection_name] -d [config_name]
Connecting to ZooKeeper at zookeeper-01:2181 ...
Uploading /opt/solr/server/solr/configsets/[config_name]/conf for config [collection_name]
to ZooKeeper at zookeeper-01:2181
Creating new collection '[collection_name]' using command:
http://localhost:8983/solr/admin/collections?action=CREATE
&name=[collection_name]&numShards=1&replicationFactor=1&
maxShardsPerNode=1&collection.configName=[collection_name]
{
"responseHeader":{
"status":0,
"QTime":9417},
"success":{"":{
"responseHeader":{
"status":0,
"QTime":8869},
"core":"[collection_name]_shard1_replica1"}}}
So that works!
In conclusion, I have the feeling that some routes are not configured correctly, but I can't seem to find out which, because ZooKeeper seems to work and so do all the individual Solr instances.
Here is my hosts file:
127.0.0.1 localhost
10.0.0.1 solr-cloud-01
10.0.0.2 solr-cloud-02
10.0.0.3 solr-cloud-03
10.0.0.1 zookeeper-01
10.0.0.2 zookeeper-02
10.0.0.3 zookeeper-03
So, I finally found the answer!
After inspecting /clusterstate.json via zkCli.sh I saw that, while disconnected, 3 'rogue' replicas had been added to the standalone cluster, all pointing to 127.0.1.1 (which is a Debian-specific loopback to localhost, see https://www.debian.org/doc/manuals/debian-reference/ch05.en.html#_the_hostname_resolution).
The clue was in my hosts file.
So when I changed all hostname references from 127.0.1.1 to the outside IP (in my case 10.0.0.x), it started working!
My new hosts file:
127.0.0.1 localhost
10.0.0.1 solr-cloud-01
10.0.0.2 solr-cloud-02
10.0.0.3 solr-cloud-03
10.0.0.1 zookeeper-01
10.0.0.2 zookeeper-02
10.0.0.3 zookeeper-03
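(As an aside, the /clusterstate.json inspection mentioned above can be done roughly like this with the ZooKeeper CLI, pointing it at one of the nodes from the hosts file:)
bin/zkCli.sh -server zookeeper-01:2181
get /clusterstate.json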
