I searched a lot for this problem but I can't find many resources concerning this issue.
We are running a set-up of SolrCloud on 2 FreeBsd servers. Every night at exactly 12:00 our Solr servers are somehow being shutdown gracefully internally by solr. Strange is that both solr-servers are being restarted.
I don't have any clu what the cause is, it seems that Jetty is the possible cause with the setting "stopAtShutdown" and "gracefulShutdown" but I am not sure what the reason is for restarting solr.
Some log lines:
INFO - 2015-11-24 00:00:01.056; org.eclipse.jetty.server.Server; Graceful shutdown SocketConnector#0.0.0.0:8983
INFO - 2015-11-24 00:00:01.057; org.eclipse.jetty.server.Server; Graceful shutdown o.e.j.w.WebAppContext{/solr,file:/usr/local/share/examples/apache-solr/solr-webapp/webapp/},/usr/local/share/examples/apa
che-solr/webapps/solr.war
INFO - 2015-11-24 00:00:02.064; org.apache.solr.core.CoreContainer; Shutting down CoreContainer instance=568196821
WARN - 2015-11-24 00:00:02.065; org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for core=X_shard1_replica2 coreNodeName=core_node1
WARN - 2015-11-24 00:00:02.065; org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for core=Y_shard1_replica1 coreNodeName=core_node1
INFO - 2015-11-24 00:00:02.065; org.apache.solr.cloud.ZkController; publishing core=Z_shard1_replica1 state=down collection=Z
INFO - 2015-11-24 00:00:02.073; org.apache.solr.cloud.ZkController; publishing core=Zoetermeer_Openbaar_shard1_replica2 state=down collection=Zoetermeer_Openbaar
INFO - 2015-11-24 00:00:02.102; org.apache.solr.cloud.ZkController; publishing core=X_shard1_replica2 state=down collection=X
INFO - 2015-11-24 00:00:02.110; org.apache.solr.cloud.ZkController; publishing core=Y_shard1_replica1 state=down collection=Y
INFO - 2015-11-24 00:00:02.125; org.apache.solr.core.SolrCore; [Z_shard1_replica1] CLOSING SolrCore org.apache.solr.core.SolrCore#3e6f43bd
INFO - 2015-11-24 00:00:02.125; org.apache.solr.update.DirectUpdateHandler2; closing DirectUpdateHandler2{commits=0,autocommits=0,soft autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,
adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0,transaction_logs_total_size=0,transaction_logs_total_number=0}
INFO - 2015-11-24 00:00:02.126; org.apache.solr.update.SolrCoreState; Closing SolrCoreState
INFO - 2015-11-24 00:00:02.126; org.apache.solr.update.DefaultSolrCoreState; SolrCoreState ref count has reached 0 - closing IndexWriter
INFO - 2015-11-24 00:00:02.126; org.apache.solr.update.DefaultSolrCoreState; closing IndexWriter with IndexWriterCloser
INFO - 2015-11-24 00:00:02.129; org.apache.solr.core.SolrCore; [Z_shard1_replica1] Closing main searcher on request.
INFO - 2015-11-24 00:00:02.130; org.apache.solr.core.CachingDirectoryFactory; Closing StandardDirectoryFactory - 2 directories currently being tracked
INFO - 2015-11-24 00:00:02.132; org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating
... (live nodes size: 2)
INFO - 2015-11-24 00:00:02.133; org.apache.solr.core.CachingDirectoryFactory; looking to close /data/solr/Z_shard1_replica1/data/index [CachedDir<<refCount=0;path=/data/solr/Z_shard1_replica1/data/index;done=false>>]
INFO - 2015-11-24 00:00:02.133; org.apache.solr.core.CachingDirectoryFactory; Closing directory: /data/solr/Z_shard1_replica1/data/index
INFO - 2015-11-24 00:00:02.134; org.apache.solr.core.CachingDirectoryFactory; looking to close /data/solr/Z_shard1_replica1/data [CachedDir<<refCount=0;path=/data/solr/Z_shard1_replica1/data;done=false>>]
INFO - 2015-11-24 00:00:02.134; org.apache.solr.core.CachingDirectoryFactory; Closing directory: /data/solr/Z_shard1_replica1/data
INFO - 2015-11-24 00:00:02.134; org.apache.solr.core.SolrCore; [X_Openbaar_shard1_replica2] CLOSING SolrCore org.apache.solr.core.SolrCore#78
Does somebody have an idea what is going on?
Related
The problem:
The flink job-manager couldn't recovery from a checkpoint.
Caused by: java.lang.IllegalStateException: There is no operator for the state
Background:
I'm running a flink 1.6.3 over k8s. and I'm using incremental checkpoint on rocksdb.
I tryied to pass the parameter --allowNonRestoredState in order to skip savepoint state that cannot be restored
From my log:
2019-02-06 08:51:08.068 [main] INFO
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -
--allowNonRestoredState
2019-02-06 08:51:22.827 [flink-akka.actor.default-dispatcher-14] INFO
o.a.f.runtime.checkpoint.ZooKeeperCompletedCheckpointStore -
Recovering checkpoints from ZooKeeper. 2019-02-06 08:51:22.883
[flink-akka.actor.default-dispatcher-14] INFO
o.a.f.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Found 1
checkpoints in ZooKeeper. 2019-02-06 08:51:22.883
[flink-akka.actor.default-dispatcher-14] INFO
o.a.f.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Trying
to fetch 1 checkpoints from storage. 2019-02-06 08:51:22.884
[flink-akka.actor.default-dispatcher-14] INFO
o.a.f.runtime.checkpoint.ZooKeeperCompletedCheckpointStore - Trying
to retrieve checkpoint 1612. 2019-02-06 08:51:22.977
[flink-akka.actor.default-dispatcher-14] INFO
org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Restoring
job 00000000000000000000000000000000 from latest valid checkpoint:
Checkpoint 1612 # 1549376250641 for 00000000000000000000000000000000.
2019-02-06 08:51:22.982 [flink-akka.actor.default-dispatcher-14] ERROR
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Fatal error
occurred in the cluster entrypoint. java.lang.RuntimeException:
org.apache.flink.runtime.client.JobExecutionException: Could not set
up JobManager
at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:36)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:39)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:415)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.client.JobExecutionException:
Could not set up JobManager
at org.apache.flink.runtime.jobmaster.JobManagerRunner.(JobManagerRunner.java:176)
at org.apache.flink.runtime.dispatcher.Dispatcher$DefaultJobManagerRunnerFactory.createJobManagerRunner(Dispatcher.java:1058)
at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:308)
at org.apache.flink.util.function.CheckedSupplier.lambda$unchecked$0(CheckedSupplier.java:34)
... 7 common frames omitted Caused by: java.lang.IllegalStateException: There is no operator for the state
b22e6e8baea7d7e562d5a233f3301ce1
at org.apache.flink.runtime.checkpoint.StateAssignmentOperation.checkStateMappingCompleteness(StateAssignmentOperation.java:569)
at org.apache.flink.runtime.checkpoint.StateAssignmentOperation.assignStates(StateAssignmentOperation.java:77)
at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.restoreLatestCheckpointedState(CheckpointCoordinator.java:1049)
at org.apache.flink.runtime.jobmaster.JobMaster.createAndRestoreExecutionGraph(JobMaster.java:1138)
at org.apache.flink.runtime.jobmaster.JobMaster.(JobMaster.java:294)
at org.apache.flink.runtime.jobmaster.JobManagerRunner.(JobManagerRunner.java:157)
... 10 common frames omitted 2019-02-06 08:51:23.013 [TransientBlobCache shutdown hook] INFO
org.apache.flink.runtime.blob.TransientBlobCache - Shutting down BLOB
cache 2019-02-06 08:51:23.033 [BlobServer shutdown hook] INFO
org.apache.flink.runtime.blob.BlobServer - Stopped BLOB server at
0.0.0.0:6124
Expected result:
The job will start running from the latest checkpoint and will skip the state that cannot be restored
While I was debugging my Camel application I realized, that the graceful shutdown of a route ignores the outstanding tasks that have been triggered by wireTap().
If I have a route definition like this:
from("direct:start")
.wireTap("bean:myWireTapBean")
.to("mock:result");
and I set a debugging breakpoint in myWireTapBean (i.e. suspend the asynchronous processing of wireTap) then a CamelContext.stopRoute(routeId) call produces the following log messages:
11:36:12.352 [Thread-1] INFO o.a.camel.spring.SpringCamelContext - Apache Camel 2.19.1 (CamelContext: camel-1) is shutting down
11:36:12.352 [Thread-1] INFO o.a.c.impl.DefaultShutdownStrategy - Starting to graceful shutdown 1 routes (timeout 300 seconds)
11:36:12.352 [Camel (camel-1) thread #6 - ShutdownTask] INFO o.a.c.impl.DefaultShutdownStrategy - Route: route1 shutdown complete, was consuming from: direct://myRoute
11:36:12.352 [Thread-1] INFO o.a.c.impl.DefaultShutdownStrategy - Graceful shutdown of 1 routes completed in 0 seconds
11:36:12.478 [Thread-1] WARN o.a.c.impl.DefaultInflightRepository - Shutting down while there are still 1 inflight exchanges.
11:36:12.478 [Thread-1] INFO o.a.camel.spring.SpringCamelContext - Apache Camel 2.19.1 (CamelContext: camel-1) uptime 1.411 seconds
11:36:12.478 [Thread-1] INFO o.a.camel.spring.SpringCamelContext - Apache Camel 2.19.1 (CamelContext: camel-1) is shutdown in 0.126 seconds
Is there any way to prevent Camel from shutting down while there are still inflight exchanges in the DefaultInflightRepository that have been created by wireTap?
I've already read the FAQ: How can I stop a route from a route but this doesn't seem to be an answer to this question.
Yeah the current implementation of WireTap does not take in account of active tasks if they are not being routed.
I logged a ticket to add support for deferring shutdown of the WireTap EIP if it has inflight tasks: https://issues.apache.org/jira/browse/CAMEL-11539
The workaround is to create a route such as direct and then call that route where you can then call the bean, then when you shutdown Camel the direct route will have inflight exchanges and therefore wait for it to complete.
The DefaultShutdownStrategy's default behavior is to complete route shutdowns, so it is one of those things where the default works for many scenarios, but not all and you are in the latter. If the exchange is run within a transaction, it would rollback, you might look at sourcing from a JMS queue or other so you can rollback and resume processing.
The Javadocs have good information on the behavior and various options you can use to configure the DefaultShutdownStrategy before considering writing a custom one:
DefaultShutdownStrategy
I have 3 zookeeper nodes. Those node was working fine but when I restart those nodes using ./zkServer.sh restart, the zookeeper did not got up again.
When I checked on the zookeeper status, it return:
./zkServer.sh status
JMX enabled by default
Using config: /opt/zookeeper/bin/../conf/zoo.cfg
Error contacting service. It is probably not running.
my zoo.cnf is:
dataDir=/var/lib/zookeeperdata/3
clientPort=2181
initLimit=50
tickTime=2000
syncLimit=10
maxClientCnxns=100000
server.1=IP1 value:2888:3888
server.2=IP2 value:2889:3889
server.3=127.0.0.1:2890:3890
This is unstable behavior because may be after two hours or tomorrow if I made restart for the 3 zookeeper nodes, they will see each others and working fine because this happened before with me.
zookeeper log:
2014-05-14 15:22:34,236 [myid:3] - INFO [main:NIOServerCnxnFactory#94] - binding to port 0.0.0.0/0.0.0.0:2181
2014-05-14 15:22:34,282 [myid:3] - INFO [main:QuorumPeer#913] - tickTime set to 2000
2014-05-14 15:22:34,283 [myid:3] - INFO [main:QuorumPeer#933] - minSessionTimeout set to -1
2014-05-14 15:22:34,283 [myid:3] - INFO [main:QuorumPeer#944] - maxSessionTimeout set to -1
2014-05-14 15:22:34,283 [myid:3] - INFO [main:QuorumPeer#959] - initLimit set to 50
2014-05-14 15:22:34,356 [myid:3] - INFO [main:FileSnap#83] - Reading snapshot /var/lib/zookeeperdata/3/version-2/snapshot.f100000001
2014-05-14 15:22:43,387 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory#197] - Accepted socket connection from /127.0.0.1:50923
2014-05-14 15:22:43,396 [myid:3] - INFO [Thread-1:QuorumCnxManager$Listener#486] - My election bind port: 0.0.0.0/0.0.0.0:3890
2014-05-14 15:22:43,404 [myid:3] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#354] - Exception causing close of session 0x0 due to java.io.IOExce
ption: ZooKeeperServer not running
2014-05-14 15:22:43,404 [myid:3] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn#1001] - Closed socket connection for client /127.0.0.1:50923 (no se
ssion established for client)
2014-05-14 15:22:43,427 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumPeer#670] - LOOKING
2014-05-14 15:22:43,429 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection#740] - New election. My id = 3, proposed zxid=0xf100000001
2014-05-14 15:22:48,438 [myid:3] - WARN [WorkerSender[myid=3]:QuorumCnxManager#368] - Cannot open channel to 1 at election address /54.76.10.81:3888
java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)
at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:365)
at java.lang.Thread.run(Thread.java:662)
2014-05-14 15:22:53,440 [myid:3] - WARN [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumCnxManager#368] - Cannot open channel to 1 at election address /54.76.10.81:3
888
java.net.SocketTimeoutException: connect timed out
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:366)
at java.net.Socket.connect(Socket.java:529)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:388)
I searched a lot on this but I did not found anything useful for me so I hope someone can help me.
Thanks
I've seen behavior like this as well. A ZK configuration that's been running fine will sometimes simply fail to restart. When this happens I've tried the following:
1) look at the logs for all of the servers...often one will list an error
2) stop all servers and restart
3) stop all servers and restart the servers one at a time
4) verify that each server's myid file exists, has correct permissions and has the right value.
I've used clusterssh to open windows to each of the servers so that the restarts can be at the very same time...and then I've tailed all of the server logs. Keep in mind that during restart the ZK cluster is doing a lot: both starting each server and electing a leader. I've had times when the cluster seemed to fail and then after a few more minutes it seems to figure it out.
There is a great tool called zktop that I've used for monitoring ZK.
I fixed it by changing the IP 127.0.0.1 to the internal IP for amazon node, after making this change for the three nodes and restart, this problem did not happened again. I hope this answer can help someone asking about the same problem.
make sure you have put correct data Dir in each of your node configuration.
and also put a myid file in data Dir and put a number between 1-255 for each of you node in the myid file.
I think it resole the issue.
We try to import our data into SolrCloud using MapReduce batch indexing. We face a problem at the reduce phase, that solr.xml cannot be found. We create a 'twitter' collection but looking at the logs, after it failed to load in solr.xml, it uses the default one and tries to create 'collection1' (failed) and 'core1' (success) SolrCore. I'm not sure if we need to create our own solr.xml and where to put it (we try to put it at several places but it seems not to load in). Below is the log:
2022 [main] INFO org.apache.solr.hadoop.HeartBeater - Heart beat reporting class is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
2025 [main] INFO org.apache.solr.hadoop.SolrRecordWriter - Using this unpacked directory as solr home: /data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip
2025 [main] INFO org.apache.solr.hadoop.SolrRecordWriter - Creating embedded Solr server with solrHomeDir: /data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip, fs: DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-1828461666_1, ugi=nguyen (auth:SIMPLE)]], outputShardDir: hdfs://master.hadoop:8020/user/nguyen/twitter/outdir/reducers/_temporary/_attempt_201311191613_0320_r_000014_0/part-r-00014
2029 [Thread-64] INFO org.apache.solr.hadoop.HeartBeater - HeartBeat thread running
2030 [Thread-64] INFO org.apache.solr.hadoop.HeartBeater - Issuing heart beat for 1 threads
2083 [main] INFO org.apache.solr.core.SolrResourceLoader - new SolrResourceLoader for directory: '/data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip/'
2259 [main] INFO org.apache.solr.hadoop.SolrRecordWriter - Constructed instance information solr.home /data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip (/data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip), instance dir /data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip/, conf dir /data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip/conf/, writing index to solr.data.dir hdfs://master.hadoop:8020/user/nguyen/twitter/outdir/reducers/_temporary/_attempt_201311191613_0320_r_000014_0/part-r-00014/data, with permdir hdfs://master.hadoop:8020/user/nguyen/twitter/outdir/reducers/_temporary/_attempt_201311191613_0320_r_000014_0/part-r-00014
2266 [main] INFO org.apache.solr.core.ConfigSolr - Loading container configuration from /data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip/solr.xml
2267 [main] INFO org.apache.solr.core.ConfigSolr - /data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip/solr.xml does not exist, using default configuration
2505 [main] INFO org.apache.solr.core.CoreContainer - New CoreContainer 696103669
2505 [main] INFO org.apache.solr.core.CoreContainer - Loading cores into CoreContainer [instanceDir=/data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip/]
2515 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting socketTimeout to: 0
2515 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting urlScheme to: http://
2515 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting connTimeout to: 0
2515 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting maxConnectionsPerHost to: 20
2516 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting corePoolSize to: 0
2516 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting maximumPoolSize to: 2147483647
2516 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting maxThreadIdleTime to: 5
2516 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting sizeOfQueue to: -1
2516 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory - Setting fairnessPolicy to: false
2527 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil - Creating new http client, config:maxConnectionsPerHost=20&maxConnections=10000&socketTimeout=0&connTimeout=0&retry=false
2648 [main] INFO org.apache.solr.logging.LogWatcher - Registering Log Listener [Log4j (org.slf4j.impl.Log4jLoggerFactory)]
2676 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.CoreContainer - Creating SolrCore 'collection1' using instanceDir: /data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip/collection1
2677 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrResourceLoader - new SolrResourceLoader for directory: '/data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip/collection1/'
2691 [coreLoadExecutor-3-thread-1] ERROR org.apache.solr.core.CoreContainer - Failed to load file /data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip/collection1/solrconfig.xml
2693 [coreLoadExecutor-3-thread-1] ERROR org.apache.solr.core.CoreContainer - Unable to create core: collection1
org.apache.solr.common.SolrException: Could not load config for solrconfig.xml
at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:596)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:661)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:368)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:360)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or '/data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip/collection1/conf/', cwd=/data/05/mapred/local/taskTracker/nguyen/jobcache/job_201311191613_0320/attempt_201311191613_0320_r_000014_0/work
at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:322)
at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:287)
at org.apache.solr.core.Config.<init>(Config.java:116)
at org.apache.solr.core.Config.<init>(Config.java:86)
at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:120)
at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:593)
... 11 more
2695 [coreLoadExecutor-3-thread-1] ERROR org.apache.solr.core.CoreContainer - null:org.apache.solr.common.SolrException: Unable to create core: collection1
at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1158)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:670)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:368)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:360)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:724)
Caused by: org.apache.solr.common.SolrException: Could not load config for solrconfig.xml
at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:596)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:661)
... 10 more
Caused by: java.io.IOException: Can't find resource 'solrconfig.xml' in classpath or '/data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip/collection1/conf/', cwd=/data/05/mapred/local/taskTracker/nguyen/jobcache/job_201311191613_0320/attempt_201311191613_0320_r_000014_0/work
at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:322)
at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:287)
at org.apache.solr.core.Config.<init>(Config.java:116)
at org.apache.solr.core.Config.<init>(Config.java:86)
at org.apache.solr.core.SolrConfig.<init>(SolrConfig.java:120)
at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:593)
... 11 more
2697 [main] INFO org.apache.solr.core.CoreContainer - Creating SolrCore 'core1' using instanceDir: /data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip
2697 [main] INFO org.apache.solr.core.SolrResourceLoader - new SolrResourceLoader for directory: '/data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip/'
2751 [main] INFO org.apache.solr.core.SolrConfig - Adding specified lib dirs to ClassLoader
2752 [main] WARN org.apache.solr.core.SolrResourceLoader - Can't find (or read) directory to add to classloader: ../../../contrib/extraction/lib (resolved as: /data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip/../../../contrib/extraction/lib).
2752 [main] WARN org.apache.solr.core.SolrResourceLoader - Can't find (or read) directory to add to classloader: ../../../dist/ (resolved as: /data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip/../../../dist).
2752 [main] WARN org.apache.solr.core.SolrResourceLoader - Can't find (or read) directory to add to classloader: ../../../contrib/clustering/lib/ (resolved as: /data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip/../../../contrib/clustering/lib).
2753 [main] WARN org.apache.solr.core.SolrResourceLoader - Can't find (or read) directory to add to classloader: ../../../dist/ (resolved as: /data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip/../../../dist).
2753 [main] WARN org.apache.solr.core.SolrResourceLoader - Can't find (or read) directory to add to classloader: ../../../contrib/langid/lib/ (resolved as: /data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip/../../../contrib/langid/lib).
2753 [main] WARN org.apache.solr.core.SolrResourceLoader - Can't find (or read) directory to add to classloader: ../../../dist/ (resolved as: /data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip/../../../dist).
2753 [main] WARN org.apache.solr.core.SolrResourceLoader - Can't find (or read) directory to add to classloader: ../../../contrib/velocity/lib (resolved as: /data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip/../../../contrib/velocity/lib).
2753 [main] WARN org.apache.solr.core.SolrResourceLoader - Can't find (or read) directory to add to classloader: ../../../dist/ (resolved as: /data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip/../../../dist).
2785 [main] INFO org.apache.solr.update.SolrIndexConfig - IndexWriter infoStream solr logging is enabled
2790 [main] INFO org.apache.solr.core.SolrConfig - Using Lucene MatchVersion: LUCENE_44
2869 [main] INFO org.apache.solr.core.Config - Loaded SolrConfig: solrconfig.xml
2879 [main] INFO org.apache.solr.schema.IndexSchema - Reading Solr Schema from schema.xml
2937 [main] INFO org.apache.solr.schema.IndexSchema - [core1] Schema name=twitter
3352 [main] INFO org.apache.solr.schema.IndexSchema - unique key field: id
3471 [main] INFO org.apache.solr.schema.FileExchangeRateProvider - Reloading exchange rates from file currency.xml
3478 [main] INFO org.apache.solr.schema.FileExchangeRateProvider - Reloading exchange rates from file currency.xml
3635 [main] INFO org.apache.solr.core.HdfsDirectoryFactory - Solr Kerberos Authentication disabled
3636 [main] INFO org.apache.solr.core.JmxMonitoredMap - No JMX servers found, not exposing Solr information with JMX.
3652 [main] INFO org.apache.solr.core.HdfsDirectoryFactory - creating directory factory for path hdfs://master.hadoop:8020/user/nguyen/twitter/outdir/reducers/_temporary/_attempt_201311191613_0320_r_000014_0/part-r-00014/data
3686 [main] INFO org.apache.solr.core.CachingDirectoryFactory - return new directory for hdfs://master.hadoop:8020/user/nguyen/twitter/outdir/reducers/_temporary/_attempt_201311191613_0320_r_000014_0/part-r-00014/data
3711 [main] WARN org.apache.solr.core.SolrCore - [core1] Solr index directory 'hdfs:/master.hadoop:8020/user/nguyen/twitter/outdir/reducers/_temporary/_attempt_201311191613_0320_r_000014_0/part-r-00014/data/index' doesn't exist. Creating new index...
3719 [main] INFO org.apache.solr.core.HdfsDirectoryFactory - creating directory factory for path hdfs://master.hadoop:8020/user/nguyen/twitter/outdir/reducers/_temporary/_attempt_201311191613_0320_r_000014_0/part-r-00014/data/index
3719 [main] INFO org.apache.solr.core.HdfsDirectoryFactory - Number of slabs of block cache [1] with direct memory allocation set to [true]
3720 [main] INFO org.apache.solr.core.HdfsDirectoryFactory - Block cache target memory usage, slab size of [134217728] will allocate [1] slabs and use ~[134217728] bytes
3721 [main] INFO org.apache.solr.store.blockcache.BufferStore - Initializing the 1024 buffers with [8192] buffers.
3740 [main] INFO org.apache.solr.store.blockcache.BufferStore - Initializing the 8192 buffers with [8192] buffers.
3891 [main] INFO org.apache.solr.core.CachingDirectoryFactory - return new directory for hdfs://master.hadoop:8020/user/nguyen/twitter/outdir/reducers/_temporary/_attempt_201311191613_0320_r_000014_0/part-r-00014/data/index
3988 [main] INFO org.apache.solr.update.LoggingInfoStream - [IFD][main]: init: current segments file is "null"; deletionPolicy=org.apache.solr.core.IndexDeletionPolicyWrapper#65b01d5d
3992 [main] INFO org.apache.solr.update.LoggingInfoStream - [IFD][main]: now checkpoint "" [0 segments ; isCommit = false]
3992 [main] INFO org.apache.solr.update.LoggingInfoStream - [IFD][main]: 0 msec to checkpoint
3992 [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]: init: create=true
3992 [main] INFO org.apache.solr.update.LoggingInfoStream - [IW][main]:
dir=NRTCachingDirectory(org.apache.solr.store.hdfs.HdfsDirectory#17e5a6d8 lockFactory=org.apache.solr.store.hdfs.HdfsLockFactory#7f117668; maxCacheMB=192.0 maxMergeSizeMB=16.0)
solr looks for solr.home parameter and searchs solrConfig.xml file there. if there is none it tries to load default configuration.
it looks like your solr home is
/data/06/mapred/local/taskTracker/distcache/3866561797898787678_-1754062477_512745567/master.hadoop/tmp/9501daf9-5011-4665-bae3-d5af1c8bcd62.solr.zip/collection1/
check that folder for solrconfig.xml file
if there is none, copy one from example directory of solr
if there is one, match the file/folder permissions with the server instance
I am using Solr-4.3.1 on Ubuntu machine and it is running on default jetty server. Recently, I have observed that it gets shut down abruptly. I cant find any exception or error or warning message in the logs. Following are the logs:
org.eclipse.jetty.server.Server; Graceful shutdown
SocketConnector#0.0.0.0:8983
INFO - 2013-10-16 15:05:15.091; org.eclipse.jetty.server.Server;
Graceful shutdown
o.e.j.w.WebAppContext{/solr,file:/home/abhijeet/project/demo/Demo/solr-4.3.1/example/solr-webapp/webapp/},/home/abhijeet/project/demo/Demo/solr-4.3.1/example/webapps/solr.war
INFO - 2013-10-16 15:05:16.093; org.apache.solr.core.CoreContainer;
Shutting down CoreContainer instance=2137222734
INFO - 2013-10-16 15:05:16.236; org.apache.solr.core.SolrCore;
[collection1] CLOSING SolrCore org.apache.solr.core.SolrCore#9f07597
INFO - 2013-10-16 15:05:16.244;
org.apache.solr.update.DirectUpdateHandler2; closing
DirectUpdateHandler2{commits=9,autocommit
maxTime=15000ms,autocommits=0,soft
autocommits=0,optimizes=1,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0}
INFO - 2013-10-16 15:05:16.245; org.apache.solr.update.SolrCoreState;
Closing SolrCoreState
INFO - 2013-10-16 15:05:16.246;
org.apache.solr.update.DefaultSolrCoreState; SolrCoreState ref count
has reached 0 - closing IndexWriter
INFO - 2013-10-16 15:05:16.246;
org.apache.solr.update.DefaultSolrCoreState; closing IndexWriter with
IndexWriterCloser
INFO - 2013-10-16 15:05:16.330; org.apache.solr.core.SolrCore;
[collection1] Closing main searcher on request.
INFO - 2013-10-16 15:05:16.332;
org.apache.solr.core.CachingDirectoryFactory; Closing
NRTCachingDirectoryFactory - 2 directories currently being tracked
INFO - 2013-10-16 15:05:16.352;
org.apache.solr.core.CachingDirectoryFactory; looking to close
/home/abhijeet/project/demo/Demo/solr-4.3.1/example/solr/collection1/data
[CachedDir<>]
INFO - 2013-10-16 15:05:16.352;
org.apache.solr.core.CachingDirectoryFactory; Closing directory:
/home/abhijeet/project/demo/Demo/solr-4.3.1/example/solr/collection1/data
INFO - 2013-10-16 15:05:16.352;
org.apache.solr.core.CachingDirectoryFactory; looking to close
/home/abhijeet/project/demo/Demo/solr-4.3.1/example/solr/collection1/data/index
[CachedDir<>]
INFO - 2013-10-16 15:05:16.353;
org.apache.solr.core.CachingDirectoryFactory; Closing directory:
/home/abhijeet/project/demo/Demo/solr-4.3.1/example/solr/collection1/data/index
INFO - 2013-10-16 15:05:16.354;
org.eclipse.jetty.server.handler.ContextHandler; stopped
o.e.j.w.WebAppContext{/solr,file:/home/abhijeet/project/demo/Demo/solr-4.3.1/example/solr-webapp/webapp/},/home/abhijeet/project/demo/Demo/solr-4.3.1/example/webapps/solr.war
INFO - 2013-10-16 15:05:16.588;
org.eclipse.jetty.util.thread.ShutdownThread; shutdown already
commenced