Diagnosing error in deploying GAE flex app

I've been using GAE flex for a while now, and all of a sudden my deploy process ends on the command line with:
ERROR: (gcloud.app.deploy) Error Response: [4] Flex operation
projects/MY-PROJECT/regions/us-central1/operations/xxx
error [DEADLINE_EXCEEDED]: An internal error occurred while processing
task
/appengine-flex-v1/insert_flex_deployment/flex_create_resources>2019-09-04T21:29:03.412Z8424.ow.0:
Gave up polling Deployment Manager operation
MY-PROJECT/operation-xxx.
My logs don't have any helpful info. These are the relevant logs from the deployment:
2019-09-04T14:07:07Z [2019-09-04 14:07:07 +0000] [1] [INFO] Shutting down: Master
2019-09-04T14:07:06Z [2019-09-04 14:07:06 +0000] [16] [INFO] Worker exiting (pid: 16)
2019-09-04T14:07:06Z [2019-09-04 14:07:06 +0000] [14] [INFO] Worker exiting (pid: 14)
2019-09-04T14:07:05Z [2019-09-04 14:07:05 +0000] [13] [INFO] Worker exiting (pid: 13)
2019-09-04T14:07:05Z [2019-09-04 14:07:05 +0000] [11] [INFO] Worker exiting (pid: 11)
2019-09-04T14:07:05Z [2019-09-04 14:07:05 +0000] [10] [INFO] Worker exiting (pid: 10)
2019-09-04T14:07:05Z [2019-09-04 14:07:05 +0000] [9] [INFO] Worker exiting (pid: 9)
2019-09-04T14:07:05Z [2019-09-04 14:07:05 +0000] [8] [INFO] Worker exiting (pid: 8)
2019-09-04T14:07:05Z [2019-09-04 14:07:05 +0000] [1] [INFO] Handling signal: term
2019-09-04T14:03:04Z [2019-09-04 14:03:04 +0000] [16] [INFO] Booting worker with pid: 16
2019-09-04T14:03:03Z [2019-09-04 14:03:03 +0000] [14] [INFO] Booting worker with pid: 14
2019-09-04T14:03:03Z [2019-09-04 14:03:03 +0000] [13] [INFO] Booting worker with pid: 13
2019-09-04T14:03:03Z [2019-09-04 14:03:03 +0000] [11] [INFO] Booting worker with pid: 11
2019-09-04T14:03:03Z [2019-09-04 14:03:03 +0000] [10] [INFO] Booting worker with pid: 10
2019-09-04T14:03:03Z [2019-09-04 14:03:03 +0000] [9] [INFO] Booting worker with pid: 9
2019-09-04T14:03:03Z [2019-09-04 14:03:03 +0000] [8] [INFO] Booting worker with pid: 8
2019-09-04T14:03:03Z [2019-09-04 14:03:03 +0000] [1] [INFO] Using worker: sync
2019-09-04T14:03:03Z [2019-09-04 14:03:03 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
2019-09-04T14:03:03Z [2019-09-04 14:03:03 +0000] [1] [INFO] Starting gunicorn 19.9.0
The instance exists in the console and appears to be running, but it just returns a 404. The code runs fine locally.
Any ideas for how to diagnose what is going on?
I wonder if Google reduced a default deadline, since the current deadline appears to be 4 minutes and my build has always taken longer than that.

I figured this out and it is kind of a crazy Google Cloud bug. TL;DR -- Don't use Google Cloud Organization Policy Constraints.
Here is what happened according to my best understanding:
For my Google Cloud project, I picked the us-central region.
About 6 months ago I set a Google Cloud policy constraint for my organization so that I would use only US-based resources. This set a policy that allowed only the US resource locations that existed at that time.
My recent deploys of my flex app were being deployed to the us-central1-f zone. I believe Google picked the zone and I don't have control over that.
The us-central1-f zone was not allowed by my location policy because it did not exist at the time I set the policy.
This caused my deploy to crash with the unhelpful error message in my question.
The way I figured this out was by deploying Google's hello world Flask app; when deploying that app, I received a more helpful error message that let me understand the problem.
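For anyone hitting the same thing, this is roughly how the constraint can be inspected from the command line (a sketch, assuming a reasonably recent gcloud; ORGANIZATION_ID is a placeholder):
# List the locations currently allowed by the org policy
gcloud resource-manager org-policies describe constraints/gcp.resourceLocations --organization=ORGANIZATION_ID
# Or view the policy as effectively applied to the project being deployed
gcloud resource-manager org-policies describe constraints/gcp.resourceLocations --project=MY-PROJECT --effective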

Related

How to debug Google App Engine Server Error 500?

I deployed a Django web app to GAE with no errors during deployment.
But when I try to open the website, it shows Server Error (500).
I tried to see some logs using gcloud app logs read, but it only shows:
2020-05-28 16:07:48 default[20200528t144758] [2020-05-28 16:07:48 +0000] [1] [INFO] Handling signal: term
2020-05-28 16:07:48 default[20200528t144758] [2020-05-28 16:07:48 +0000] [8] [INFO] Worker exiting (pid: 8)
2020-05-28 16:07:49 default[20200528t144758] [2020-05-28 16:07:49 +0000] [1] [INFO] Shutting down: Master
2020-05-28 16:07:49 default[20200528t144758] [2020-05-28 16:07:49 +0000] [1] [INFO] Handling signal: term
2020-05-28 16:07:49 default[20200528t144758] [2020-05-28 16:07:49 +0000] [8] [INFO] Worker exiting (pid: 8)
2020-05-28 16:07:50 default[20200528t144758] [2020-05-28 16:07:50 +0000] [1] [INFO] Shutting down: Master
2020-05-28 16:08:06 default[20200528t165550] "GET /" 500
The logs are not informative, so I wonder:
1) Could I log on to the App Engine machine and run my web application manually to see what the error is?
2) If not, what are the suggested ways to debug App Engine errors?
In the App Engine flex environment, you can debug your instance by enabling debug mode and SSHing into the instance.
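A rough sketch of the commands involved (SERVICE, VERSION and INSTANCE_ID are placeholders taken from the output of the list command):
# Find the running flex instance
gcloud app instances list
# Enable debug mode on the instance, then connect over SSH
gcloud app instances enable-debug INSTANCE_ID --service=SERVICE --version=VERSION
gcloud app instances ssh INSTANCE_ID --service=SERVICE --version=VERSION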
You can also write app logs and structured logs to stdout and stderr so that you can look into your application logs and request logs via the Logs Viewer or the command line. You may also consider using Cloud Profiler, which is currently a free service, to capture profiling data of your application so that you get a better understanding of how it behaves as it runs.
Cloud Debugger also lets you inspect the state of your application while it is running, without adding logging statements. Note that Cloud Debugger is currently a free service as well.
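As an illustration of the stdout/stderr approach, a minimal Python sketch (risky_operation and handle_request are hypothetical stand-ins for application code):
import logging
import sys

# Send application logs to stdout so they appear alongside request logs
# in the Logs Viewer and in `gcloud app logs read`.
logging.basicConfig(stream=sys.stdout, level=logging.INFO,
                    format="%(asctime)s %(levelname)s %(name)s: %(message)s")
logger = logging.getLogger(__name__)

def risky_operation():
    # Hypothetical stand-in for real application code.
    raise RuntimeError("simulated failure")

def handle_request():
    logger.info("handling request")
    try:
        risky_operation()
    except Exception:
        # logger.exception logs at ERROR level and includes the traceback,
        # which ends up in the application logs.
        logger.exception("unhandled error while handling request")

handle_request()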
By setting DEBUG=1 in the Django project's settings.py, I'm now able to see error details on GAE.
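For reference, a hedged settings.py sketch of that toggle; the DJANGO_DEBUG environment variable is a hypothetical convention, not something App Engine sets for you, and DEBUG should not stay enabled in production:
# settings.py (sketch)
import os

# Drive DEBUG from an environment variable so a deployed version can be
# switched back without editing code.
DEBUG = os.environ.get("DJANGO_DEBUG", "0") == "1"

# '*' is permissive; tighten this list for a real deployment.
ALLOWED_HOSTS = ["*"]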

Handling signal: term (gunicorn, eventlet, Flask, Google App Engine)

I am running a Flask app on Google App Engine using Gunicorn's async workers.
Every time requests come in, after the last request has finished responding, I get the following message and my gunicorn workers exit. Then there's a slight delay when the next batch of requests comes in.
2020-05-17 16:57:14 default[20200517t125405] [2020-05-17 16:57:14 +0000] [7] [INFO] Handling signal: term
2020-05-17 16:57:14 default[20200517t125405] [2020-05-17 16:57:14 +0000] [7] [INFO] Handling signal: term
2020-05-17 16:57:14 default[20200517t125405] [2020-05-17 16:57:14 +0000] [21] [INFO] Worker exiting (pid: 21)
2020-05-17 16:57:14 default[20200517t125405] [2020-05-17 16:57:14 +0000] [20] [INFO] Worker exiting (pid: 20)
2020-05-17 16:57:14 default[20200517t125405] [2020-05-17 16:57:14 +0000] [18] [INFO] Worker exiting (pid: 18)
2020-05-17 16:57:14 default[20200517t125405] [2020-05-17 16:57:14 +0000] [14] [INFO] Worker exiting (pid: 14)
2020-05-17 16:57:14 default[20200517t125405] [2020-05-17 16:57:14 +0000] [19] [INFO] Worker exiting (pid: 19)
Here is my app.yaml
runtime: python37
entrypoint: gunicorn --worker-class eventlet -c gunicorn.conf.py -b :$PORT main:app preload_app=True
instance_class: F2
Here is my gunicorn.conf.py file
import multiprocessing
workers = (multiprocessing.cpu_count()) * 2 + 1
threads = workers # originally didn't have this, just had the workers var defined, but tried this and it also didn't solve the problem
I tried searching SO and some other sources but can't find a workaround for this.
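One detail worth double-checking, as an assumption rather than a confirmed cause of the term signals: gunicorn does not take preload_app=True as a positional argument on the command line; preloading is normally enabled with the --preload flag or with preload_app = True in gunicorn.conf.py. A minimal sketch of the config-file form:
# gunicorn.conf.py (sketch; values mirror the question's setup)
import multiprocessing

workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "eventlet"  # matches --worker-class eventlet in the entrypoint
preload_app = True         # config-file equivalent of the --preload flag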

Akka Clustering is not working

I am trying to learn Akka Clustering following the tutorial provided here
I have created the app and the repo is here.
As mentioned in the tutorial, I have started the FrontEndApp:
> runMain TransformationFrontendApp
[info] Running TransformationFrontendApp
[INFO] [10/31/2017 17:28:05.293] [run-main-0] [akka.remote.Remoting] Starting remoting
[INFO] [10/31/2017 17:28:05.543] [run-main-0] [akka.remote.Remoting] Remoting started; listening on addresses :[akka.tcp://ClusterSystem#localhost:54746]
[INFO] [10/31/2017 17:28:05.556] [run-main-0]
[akka.cluster.Cluster(akka://ClusterSystem)] Cluster Node
[akka.tcp://ClusterSystem#localhost:54746] - Starting up...
[INFO] [10/31/2017 17:28:05.648] [run-main-0]
[akka.cluster.Cluster(akka://ClusterSystem)] Cluster Node
[akka.tcp://ClusterSystem#localhost:54746] - Registered cluster JMX MBean
[akka:type=Cluster]
[INFO] [10/31/2017 17:28:05.648] [run-main-0]
[akka.cluster.Cluster(akka://ClusterSystem)] Cluster Node
[akka.tcp://ClusterSystem#localhost:54746] - Started up successfully
[WARN] [10/31/2017 17:28:05.683] [ClusterSystem-akka.actor.default-dispatcher-2]
[WARN] [10/31/2017 17:28:05.748] [New I/O boss #3]
[NettyTransport(akka://ClusterSystem)] Remote connection to [null] failed with java.net.ConnectException: Connection refused: /127.0.0.1:2551
[WARN] [10/31/2017 17:28:05.750] [New I/O boss #3]
[NettyTransport(akka://ClusterSystem)] Remote connection to [null] failed with java.net.ConnectException: Connection refused: /127.0.0.1:2552
[WARN] [10/31/2017 17:28:05.751] [ClusterSystem-akka.remote.default-remote-dispatcher-12] [akka.tcp://ClusterSystem#localhost:54746/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FClusterSystem%40127.0.0.1%3A2551-0] Association with remote system [akka.tcp://ClusterSystem#127.0.0.1:2551] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://ClusterSystem#127.0.0.1:2551]] Caused by: [Connection refused: /127.0.0.1:2551]
The above warn message repeats continuously even after I start the Backend App on 2551 and 2552.
The terminal log from starting the backend actor on 2551:
> runMain TransformationBackendApp 2551
[info] Running TransformationBackendApp 2551
[INFO] [10/31/2017 17:28:50.867] [run-main-0] [akka.remote.Remoting] Starting remoting
[INFO] [10/31/2017 17:28:51.122] [run-main-0] [akka.remote.Remoting] Remoting started; listening on addresses :[akka.tcp://ClusterSystem#localhost:2551]
[INFO] [10/31/2017 17:28:51.134] [run-main-0] [akka.cluster.Cluster(akka://ClusterSystem)] Cluster Node [akka.tcp://ClusterSystem#localhost:2551] - Starting up...
[INFO] [10/31/2017 17:28:51.228] [run-main-0] [akka.cluster.Cluster(akka://ClusterSystem)] Cluster Node [akka.tcp://ClusterSystem#localhost:2551] - Registered cluster JMX MBean [akka:type=Cluster]
[INFO] [10/31/2017 17:28:51.228] [run-main-0] [akka.cluster.Cluster(akka://ClusterSystem)] Cluster Node [akka.tcp://ClusterSystem#localhost:2551] - Started up successfully
[WARN] [10/31/2017 17:28:51.259] [ClusterSystem-akka.actor.default-dispatcher-3] [akka.tcp://ClusterSystem#localhost:2551/system/cluster/core/daemon/downingProvider] Don't use auto-down feature of Akka Cluster in production. See 'Auto-downing (DO NOT USE)' section of Akka Cluster documentation.
[ ERROR] [10/31/2017 17:28:51.382] [ClusterSystem-akka.remote.default-remote-dispatcher-5] [akka://ClusterSystem/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FClusterSystem%40localhost%3A2551-2/endpointWriter] dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://ClusterSystem#127.0.0.1:2551/]] arriving at [akka.tcp://ClusterSystem#127.0.0.1:2551] inbound addresses are [akka.tcp://ClusterSystem#localhost:2551]
The last [Error] log repeats continuously.
The terminal log from starting the backend actor on 2552:
> runMain TransformationBackendApp 2552
[info] Running TransformationBackendApp 2552
[INFO] [10/31/2017 17:28:25.451] [run-main-0] [akka.remote.Remoting] Starting remoting
[INFO] [10/31/2017 17:28:25.689] [run-main-0] [akka.remote.Remoting] Remoting started; listening on addresses :[akka.tcp://ClusterSystem#localhost:2552]
[INFO] [10/31/2017 17:28:25.706] [run-main-0] [akka.cluster.Cluster(akka://ClusterSystem)] Cluster Node [akka.tcp://ClusterSystem#localhost:2552] - Starting up...
[INFO] [10/31/2017 17:28:25.803] [run-main-0] [akka.cluster.Cluster(akka://ClusterSystem)] Cluster Node [akka.tcp://ClusterSystem#localhost:2552] - Registered cluster JMX MBean [akka:type=Cluster]
[INFO] [10/31/2017 17:28:25.803] [run-main-0] [akka.cluster.Cluster(akka://ClusterSystem)] Cluster Node [akka.tcp://ClusterSystem#localhost:2552] - Started up successfully
[WARN] [10/31/2017 17:28:25.836] [ClusterSystem-akka.actor.default-dispatcher-2] [akka.tcp://ClusterSystem#localhost:2552/system/cluster/core/daemon/downingProvider] Don't use auto-down feature of Akka Cluster in production. See 'Auto-downing (DO NOT USE)' section of Akka Cluster documentation.
[WARN] [10/31/2017 17:28:25.909] [New I/O boss #3] [NettyTransport(akka://ClusterSystem)] Remote connection to [null] failed with java.net.ConnectException: Connection refused: /127.0.0.1:2551
[WARN] [10/31/2017 17:28:25.910] [ClusterSystem-akka.remote.default-remote-dispatcher-13] [akka.tcp://ClusterSystem#localhost:2552/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FClusterSystem%40127.0.0.1%3A2551-0] Association with remote system [akka.tcp://ClusterSystem#127.0.0.1:2551] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://ClusterSystem#127.0.0.1:2551]] Caused by: [Connection refused: /127.0.0.1:2551]
[INFO] [10/31/2017 17:28:25.914] [ClusterSystem-akka.actor.default-dispatcher-4] [akka://ClusterSystem/deadLetters] Message [akka.cluster.InternalClusterAction$InitJoin$] from Actor[akka://ClusterSystem/system/cluster/core/daemon/joinSeedNodeProcess-1#-937368711] to Actor[akka://ClusterSystem/deadLetters] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
[ERROR] [10/31/2017 17:28:25.958] [ClusterSystem-akka.remote.default-remote-dispatcher-17] [akka://ClusterSystem/system/endpointManager/reliableEndpointWriter-akka.tcp%3A%2F%2FClusterSystem%40localhost%3A2552-2/endpointWriter] dropping message [class akka.actor.ActorSelectionMessage] for non-local recipient [Actor[akka.tcp://ClusterSystem#127.0.0.1:2552/]] arriving at [akka.tcp://ClusterSystem#127.0.0.1:2552] inbound addresses are [akka.tcp://ClusterSystem#localhost:2552]
I'm not sure why the backend cluster nodes are not able to detect each other, or why the frontend node cannot reach the backend nodes.
Am I missing any settings?
The problem is in your application.conf. You have akka.remote.netty.tcp.hostname = "localhost" and akka.cluster.seed-nodes = ["akka.tcp://ClusterSystem@127.0.0.1:2551", "akka.tcp://ClusterSystem@127.0.0.1:2552"]. You have to use either localhost or 127.0.0.1, not both:
akka {
  actor {
    provider = "akka.cluster.ClusterActorRefProvider"
  }
  remote {
    log-remote-lifecycle-events = off
    netty.tcp {
      hostname = "localhost"
      port = 0
    }
  }
  cluster {
    seed-nodes = ["akka.tcp://ClusterSystem@localhost:2551", "akka.tcp://ClusterSystem@localhost:2552"]
    auto-down-unreachable-after = 10s
  }
}

(gcloud.app.deploy) Error Response: [13] Unexpected Error

I'm getting the error below when I try to deploy a Spring Boot app to Google Cloud.
(gcloud.app.deploy) Error Response: [13] Unexpected Error.
I'm using com.google.cloud.tools:appengine-maven-plugin version 1.3.1, goal deploy. This error message is not useful at all! I appreciate any help with this, as I am not very familiar with Google Cloud. Where should I start looking?
[INFO] GCLOUD: d4498962e4fc: Pushed
[INFO] GCLOUD: latest: digest: sha256:1c2516746601c4fe68dac3507fe684380b122ebc1801e8dc234599825d3cfb89 size: 2416
[INFO] GCLOUD: DONE
[INFO] GCLOUD: ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
[INFO] GCLOUD:
[INFO] GCLOUD: Updating service [default]...
[INFO] GCLOUD: .....................failed.
[INFO] GCLOUD: ERROR: (gcloud.app.deploy) Error Response: [13] Unexpected Error. ()
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO]------------------------------------------------------------------------
It turns out others have encountered the same problem:
(gcloud.app.deploy) Error Response: [13] Unexpected Error. ()
https://github.com/GoogleCloudPlatform/getting-started-java/issues/281#issuecomment-327572081
The resolution of that issue should address this.
Simply rename your main Python file (perhaps it is app.py) to main.py.
In the resources section of your app.yaml, increase the CPU count and the allocated memory if you are on the flex plan.
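For the flexible environment, a minimal sketch of what that section looks like in app.yaml (the values are illustrative, not a recommendation):
# app.yaml (flex) - resources section only
resources:
  cpu: 2
  memory_gb: 4
  disk_size_gb: 10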

Yokozuna shutting down and taking Riak with it - Can't seem to find why

We are currently experiencing an issue on a 10-node cluster whereby, after approximately a day of running, 3 nodes will drop out (always a random 3).
Riak Version : 2.1.4
10 VMs with 10 GB RAM each, running Oracle Linux 7.3
Java Version :
[riak#pp2xria01trd001 riak$] java -version
openjdk version "1.8.0_121"
OpenJDK Runtime Environment (build 1.8.0_121-b13)
OpenJDK 64-Bit Server VM (build 25.121-b13, mixed mode)
Our usual Riak guy is on holiday at the moment, so we don't have much resource to look into this. Any help or guidance on where to start looking would be greatly appreciated.
Crash dump details :
Slogan: Kernel pid terminated (application_controller) ({application_terminated,yokozuna,shutdown})
System version: Erlang R16B02_basho10 (erts-5.10.3) [source] [64-bit] [smp:2:2] [async-threads:64] [hipe] [kernel-poll:true] [frame-pointer]
Not much in the solr.log to detail why :
2017-04-06 21:04:13,958 [INFO] <qtp1924582348-828>#LogUpdateProcessorFactory.java:198 [marketblueprints_index] webapp=/internal_solr path=/update params={} {} 0 0
2017-04-06 21:04:18,567 [INFO] <qtp1924582348-855>#SolrDispatchFilter.java:732 [admin] webapp=null path=/admin/cores params={action=STATUS&wt=json} status=0 QTime=2
2017-04-06 21:04:23,573 [INFO] <qtp1924582348-1161>#SolrDispatchFilter.java:732 [admin] webapp=null path=/admin/cores params={action=STATUS&wt=json} status=0 QTime=2
2017-04-06 21:04:28,578 [INFO] <qtp1924582348-865>#SolrDispatchFilter.java:732 [admin] webapp=null path=/admin/cores params={action=STATUS&wt=json} status=0 QTime=2
2017-04-06 21:04:33,584 [INFO] <qtp1924582348-848>#SolrDispatchFilter.java:732 [admin] webapp=null path=/admin/cores params={action=STATUS&wt=json} status=0 QTime=2
2017-04-06 21:04:38,589 [INFO] <qtp1924582348-641>#SolrDispatchFilter.java:732 [admin] webapp=null path=/admin/cores params={action=STATUS&wt=json} status=0 QTime=2
2017-04-06 21:04:54,242 [INFO] <Thread-1>#Monitor.java:41 Yokozuna has exited - shutting down Solr
2017-04-06 21:04:55,219 [INFO] <Thread-2>#Server.java:320 Graceful shutdown SocketConnector#0.0.0.0:8093
2017-04-06 21:04:56,027 [INFO] <Thread-2>#Server.java:329 Graceful shutdown o.e.j.w.WebAppContext{/internal_solr,file:/var/lib/riak/yz_temp/solr-webapp/webapp/},/usr/lib64/
riak/lib/yokozuna-2.1.7-0-g6cf80ad/priv/solr/webapps/solr.war
2017-04-06 21:04:59,288 [INFO] <Thread-2>#CoreContainer.java:314 Shutting down CoreContainer instance=1916575798
2017-04-06 21:04:59,710 [INFO] <Thread-2>#SolrCore.java:1040 [feed_mapping_index] CLOSING SolrCore org.apache.solr.core.SolrCore#78acc5b
However, after some of the merge processes in solr.log, we are getting the following (which I suspect is preventing the supervisor from restarting it a second time, and hence stopping Riak):
2017-04-06 21:05:13,546 [INFO] <Thread-2>#CachingDirectoryFactory.java:305 Closing directory: /var/lib/riak/yz/endpoint_mappings_index/data
2017-04-06 21:05:13,547 [INFO] <Thread-2>#CachingDirectoryFactory.java:236 looking to close /var/lib/riak/yz/endpoint_mappings_index/data/index [CachedDir<<refCount=0;path=
/var/lib/riak/yz/endpoint_mappings_index/data/index;done=false>>]
2017-04-06 21:05:13,547 [INFO] <Thread-2>#CachingDirectoryFactory.java:305 Closing directory: /var/lib/riak/yz/endpoint_mappings_index/data/index
2017-04-06 21:05:14,657 [INFO] <Thread-2>#ContextHandler.java:832 stopped o.e.j.w.WebAppContext{/internal_solr,file:/var/lib/riak/yz_temp/solr-webapp/webapp/},/usr/lib64/ri
ak/lib/yokozuna-2.1.7-0-g6cf80ad/priv/solr/webapps/solr.war
2017-04-06 21:05:15,298 [WARN] <Thread-2>#QueuedThreadPool.java:145 79 threads could not be stopped
Erlang.log contains :
2017-04-06 21:04:54.193 [error] <0.5934.108> gen_server yz_solr_proc terminated with reason: {timeout,{gen_server,call,[<0.1306.0>,{spawn_connection,{url,"http://localhost:
8093/internal_solr/admin/cores?action=STATUS&wt=json","localhost",8093,undefined,undefined,"/internal_solr/admin/cores?action=STATUS&wt=json",http,hostname},100,1,{[],false
},[]}]}}
2017-04-06 21:04:54.198 [error] <0.5934.108> CRASH REPORT Process yz_solr_proc with 0 neighbours exited with reason: {timeout,{gen_server,call,[<0.1306.0>,{spawn_connection
,{url,"http://localhost:8093/internal_solr/admin/cores?action=STATUS&wt=json","localhost",8093,undefined,undefined,"/internal_solr/admin/cores?action=STATUS&wt=json",http,h
ostname},100,1,{[],false},[]}]}} in gen_server:terminate/6 line 744
2017-04-06 21:04:54.201 [error] <0.1150.0> Supervisor yz_solr_sup had child yz_solr_proc started with yz_solr_proc:start_link("/var/lib/riak/yz", "/var/lib/riak/yz_temp", 8
093, 8985) at <0.5934.108> exit with reason {timeout,{gen_server,call,[<0.1306.0>,{spawn_connection,{url,"http://localhost:8093/internal_solr/admin/cores?action=STATUS&wt=j
son","localhost",8093,undefined,undefined,"/internal_solr/admin/cores?action=STATUS&wt=json",http,hostname},100,1,{[],false},[]}]}} in context child_terminated
2017-04-06 21:04:57.422 [info] <0.1102.0>#riak_ensemble_peer:leading:631 {{kv,1141798154164767904846628775559596109106197299200,3,114179815416476790484662877555959610910619
7299200},'riak#pp2xria01trd001.pp2.williamhill.plc'}: Leading
2017-04-06 21:04:57.422 [info] <0.1090.0>#riak_ensemble_peer:leading:631 {{kv,685078892498860742907977265335757665463718379520,3,6850788924988607429079772653357576654637183
79520},'riak#pp2xria01trd001.pp2.williamhill.plc'}: Leading
2017-04-06 21:04:57.780 [info] <0.1072.0>#riak_ensemble_peer:leading:631 {{kv,0,3,0},'riak#pp2xria01trd001.pp2.williamhill.plc'}: Leading
2017-04-06 21:05:01.432 [info] <0.8030.232>#yz_solr_proc:init:119 Starting solr: "/usr/bin/riak/java" ["-Djava.awt.headless=true","-Djetty.home=/usr/lib64/riak/lib/yokozuna
-2.1.7-0-g6cf80ad/priv/solr","-Djetty.temp=/var/lib/riak/yz_temp","-Djetty.port=8093","-Dsolr.solr.home=/var/lib/riak/yz","-DhostContext=/internal_solr","-cp","/usr/lib64/r
iak/lib/yokozuna-2.1.7-0-g6cf80ad/priv/solr/start.jar","-Dlog4j.configuration=file:///etc/riak/solr-log4j.properties","-Dyz.lib.dir=/usr/lib64/riak/lib/yokozuna-2.1.7-0-g6c
f80ad/priv/java_lib","-d64","-Xms4g","-Xmx4g","-XX:+UseStringCache","-XX:+UseCompressedOops","-Dcom.sun.management.jmxremote.port=8985","-Dcom.sun.management.jmxremote.auth
enticate=false","-Dcom.sun.management.jmxremote.ssl=false","org.eclipse.jetty.start.Main"]
2017-04-06 21:05:01.483 [info] <0.1108.0>#riak_ensemble_peer:leading:631 {{kv,1370157784997721485815954530671515330927436759040,3,137015778499772148581595453067151533092743
6759040},'riak#pp2xria01trd001.pp2.williamhill.plc'}: Leading
2017-04-06 21:05:02.032 [info] <0.8030.232>#yz_solr_proc:handle_info:184 solr stdout/err: OpenJDK 64-Bit Server VM warning: ignoring option UseSplitVerifier; support was re
moved in 8.0
OpenJDK 64-Bit Server VM warning: ignoring option UseStringCache; support was removed in 8.0
2017-04-06 21:05:04.212 [info] <0.1110.0>#riak_ensemble_peer:leading:631 {{kv,1415829711164312202009819681693899175291684651008,3,0},'riak#pp2xria01trd001.pp2.williamhill.p
lc'}: Leading
2017-04-06 21:05:10.798 [info] <0.1096.0>#riak_ensemble_peer:leading:631 {{kv,913438523331814323877303020447676887284957839360,3,9134385233318143238773030204476768872849578
39360},'riak#pp2xria01trd001.pp2.williamhill.plc'}: Leading
2017-04-06 21:05:17.001 [info] <0.8030.232>#yz_solr_proc:handle_info:184 solr stdout/err: Error: Exception thrown by the agent : java.rmi.server.ExportException: Port alrea
dy in use: 8985; nested exception is:
java.net.BindException: Address already in use (Bind failed)
2017-04-06 21:05:17.964 [error] <0.8030.232> gen_server yz_solr_proc terminated with reason: {"solr OS process exited",1}
2017-04-06 21:05:17.964 [error] <0.8030.232> CRASH REPORT Process yz_solr_proc with 0 neighbours exited with reason: {"solr OS process exited",1} in gen_server:terminate/6
line 744
2017-04-06 21:05:17.964 [error] <0.1150.0> Supervisor yz_solr_sup had child yz_solr_proc started with yz_solr_proc:start_link("/var/lib/riak/yz", "/var/lib/riak/yz_temp", 8
093, 8985) at <0.8030.232> exit with reason {"solr OS process exited",1} in context child_terminated
2017-04-06 21:05:17.964 [error] <0.1150.0> Supervisor yz_solr_sup had child yz_solr_proc started with yz_solr_proc:start_link("/var/lib/riak/yz", "/var/lib/riak/yz_temp", 8
093, 8985) at <0.8030.232> exit with reason reached_max_restart_intensity in context shutdown
2017-04-06 21:05:17.964 [error] <0.1119.0> Supervisor yz_sup had child yz_solr_sup started with yz_solr_sup:start_link() at <0.1150.0> exit with reason shutdown in context
child_terminated
2017-04-06 21:05:17.964 [error] <0.1119.0> Supervisor yz_sup had child yz_solr_sup started with yz_solr_sup:start_link() at <0.1150.0> exit with reason reached_max_restart_
intensity in context shutdown
2017-04-06 21:05:23.072 [error] <0.1551.0> Supervisor yz_index_hashtree_sup had child ignored started with yz_index_hashtree:start_link() at undefined exit with reason kill
ed in context shutdown_error
2017-04-06 21:05:24.353 [info] <0.745.0>#yz_app:prep_stop:74 Stopping application yokozuna.
2017-04-06 21:05:27.582 [error] <0.745.0>#yz_app:prep_stop:82 Stopping application yokozuna - exit:{noproc,{gen_server,call,[yz_solrq_drain_mgr,{drain,[]},infinity]}}.
2017-04-06 21:05:27.582 [info] <0.745.0>#yz_app:stop:88 Stopped application yokozuna.
2017-04-06 21:05:27.940 [info] <0.7.0> Application yokozuna exited with reason: shutdown
2017-04-06 21:05:28.165 [info] <0.431.0>#riak_kv_app:prep_stop:228 Stopping application riak_kv - marked service down.
2017-04-06 21:05:28.252 [info] <0.431.0>#riak_kv_app:prep_stop:232 Unregistered pb services
2017-04-06 21:05:28.408 [info] <0.431.0>#riak_kv_app:prep_stop:237 unregistered webmachine routes
2017-04-06 21:05:28.459 [info] <0.431.0>#riak_kv_app:prep_stop:239 all active put FSMs completed
2017-04-06 21:05:29.665 [info] <0.540.0>#riak_kv_js_vm:terminate:237 Spidermonkey VM (pool: riak_kv_js_hook) host stopping (<0.540.0>)
2017-04-06 21:05:29.665 [info] <0.539.0>#riak_kv_js_vm:terminate:237 Spidermonkey VM (pool: riak_kv_js_hook) host stopping (<0.539.0>)
2017-04-06 21:05:30.379 [info] <0.532.0>#riak_kv_js_vm:terminate:237 Spidermonkey VM (pool: riak_kv_js_reduce) host stopping (<0.532.0>)
2017-04-06 21:05:31.116 [info] <0.534.0>#riak_kv_js_vm:terminate:237 Spidermonkey VM (pool: riak_kv_js_reduce) host stopping (<0.534.0>)
2017-04-06 21:05:31.362 [info] <0.533.0>#riak_kv_js_vm:terminate:237 Spidermonkey VM (pool: riak_kv_js_reduce) host stopping (<0.533.0>)
2017-04-06 21:05:32.153 [info] <0.536.0>#riak_kv_js_vm:terminate:237 Spidermonkey VM (pool: riak_kv_js_reduce) host stopping (<0.536.0>)
2017-04-06 21:05:32.245 [info] <0.537.0>#riak_kv_js_vm:terminate:237 Spidermonkey VM (pool: riak_kv_js_reduce) host stopping (<0.537.0>)
2017-04-06 21:05:32.676 [info] <0.535.0>#riak_kv_js_vm:terminate:237 Spidermonkey VM (pool: riak_kv_js_reduce) host stopping (<0.535.0>)
2017-04-06 21:05:33.450 [info] <0.431.0>#riak_kv_app:stop:250 Stopped application riak_kv.
2017-04-06 21:05:41.701 [info] <0.195.0>#riak_core_app:stop:116 Stopped application riak_core.
2017-04-06 21:05:43.061 [info] <0.93.0> alarm_handler: {clear,system_memory_high_watermark}
We have these extra options added to riak.conf:
search = on
search.solr.jmx_port = 8985
search.solr.jvm_options = -d64 -Xms4g -Xmx4g -XX:+UseStringCache -XX:+UseCompressedOops
search.solr.port = 8093
search.solr.start_timeout = 180s
No sign of any OOM errors, or of processes being killed by the oom_killer.
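Two hedged observations from the logs above, offered as pointers rather than a diagnosis: erlang.log shows the restarted Solr JVM failing because JMX port 8985 is still in use ("Port already in use: 8985"), after which the supervisor gives up with reached_max_restart_intensity and yokozuna shuts down, taking riak_kv with it; and the solr stdout shows OpenJDK 8 ignoring UseStringCache (removed in Java 8), so that flag could be dropped from the JVM options, e.g.:
search.solr.jvm_options = -d64 -Xms4g -Xmx4g -XX:+UseCompressedOops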
