What is CrashLoopBackOff status for openshift pods? - apache-camel

There is more than one example where I have seen this status from a pod running in openshift origin. In this case it was the quickstart for the cdi camel example. I was able to successfully build and run it locally (non - openshift) but when I try to deploy on my local openshift (using mvn -Pf8-local-deploy), I get this output for that particular example (snipped for relevance) :-
[vagrant#vagrant camel]$ oc get pods
NAME READY STATUS RESTARTS AGE
cdi-camel-z4czs 0/1 CrashLoopBackOff 4 2m
Tail of the logs are as under:-
Error occurred during initialization of VM
Error opening zip file or JAR manifest missing : agents/jolokia.jar
agent library failed to init: instrument
Can someone help me solve this ?

If the state of the pod goes in to CrashLoopBackOff it usually indicates that the application within the container is failing to start up properly and the container is exiting straight away as a result.
If you use oc logs on the pod name, you may not see anything useful though as it would capture what the latest attempt to start it up is doing and may miss messages.
What you should do is instead provide the --previous or -p option to oc logs along with the pod name. That will show you the complete logs from the previous attempt to start up the container.
If this is an arbitrary Docker image you are using, a common problem that can occur and which would cause the container not to start, is an application image which requires to be run as the root user. Because running an application inside of a container as root still has risks, OpenShift doesn't allow you to do that by default and will instead run as an arbitrary assigned user ID. The application image may not be designed with this possibility in mind and so is failing.
So try and get those logs messages and see what the problem is.

Temporary workaround -> https://github.com/fabric8io/ipaas-quickstarts/issues/1157
Basically, the src/main/hawt-app directory needs to be deleted.

Related

Deploying Haskell yesod docker container on google app engine

I am trying to upload a yesod Docker container on Google App Engine. The source code is here and the Docker image is here.
I followed the documentation in the Custom runtime quickstart, and when invoking gcloud app deploy the app builds fine after increasing the build timeout, but the container either the readiness check when trying to start or shows the following timeout message:
ERROR: (gcloud.app.deploy) Operation [apps/meeshkan-github-webhook-router/operations/xxxx-xxxx-xxxx] timed out. This operation may still be underway.
I have tried experimenting with several things, including a manual readiness check, creating an /_ah/health endpoint, and increasing the timeout of the readiness check all the way to 1799 seconds, but none of these actions seem to work.
One issue may be the size of the container (it is 3.2gb), and I could try to prune it down, but I'd only do that if someone could confirm that container size is a contributing factor to deployment problems. Other than that, I'm not sure what could be causing this failure. The docker image starts fine on our local machines.
Thanks in advance for your help and suggestions!
The issue turned out to be that, because I was building on Windows, images built using Docker Desktop on Windows gave all shell scripts executable permission automatically, whereas Docker on Linux needs shell scripts to be given the executable permission. By adding this line to my Dockerfile:
RUN chmod +x /usr/src/app/run.sh
Everything worked fine!

Appengine Flexible Logging with Pyramid

Migrating from Standard to Flexible appengine and trying to get logs to show up the same way they did under Standard. Example:
That is one single log entry, expanded. Shows what the URL is at the top, the http method, status and latency of the request. When expanded like it is, it shows all logs that were created during the request, their level, and where in the code it is. This makes it very easy to see what all happened during a request.
Under Flexible, none of this seems to happen. Calling logging.info() creates its own distinct log entry in Logging, with no information about what request/route it was triggered under:
As you can see, each log entry (and in the case of a fatal error, tracebacks) get their own individual log entry per line. Some digging in their api and documentation I was able to get it to a point where I can at least group them together somewhat, but it's still not where it used to be.
I don't get severity level at the "group" level of the log, only when expanded (which means filtering by severity isn't possible) nor do I get what line the logging entry was called at. This also means a lot more individual log entries, and I don't even know how this will affect log exports.
To group the logs, I'm passing Pyramid a custom logging handler which is just google's AppEngineHandler but overriding get_gae_labels to provide it with the correct trace ID header (out of the box, it only supports django, flask, and webapp2)
def get_gae_labels(self):
"""Return the labels for GAE app.
If the trace ID can be detected, it will be included as a label.
Currently, no other labels are included.
:rtype: dict
:returns: Labels for GAE app.
"""
gae_labels = {}
request = pyramid.threadlocal.get_current_request()
header = request.headers.get('X-Cloud-Trace-Context')
if header:
gae_labels[_TRACE_ID_LABEL] = header.split("/", 1)[0]
return gae_labels
From what I can gather, appengine Flexible runs nginx in front of my application, and that passes stderr logs to Logging, and its own nginx_request logs. Then, when my application calls logging.info(), it matches up a trace ID to group them together. Because of this, a few things seem to be happening.
A. It doesn't show the highest severity level of related log entries
B. When you expand the log entry, the related log entries don't appear instantly like they do under appengine Standard, they take a second to load in as presumably Logging is looking for related logs via trace ID. Under Standard, appengine provides Logging with a line entry which has some meta data like the log message, line number, source code location etc, so it doesn't need to go look for related log entries, it's all there from the beginning. See below
I'm not sure of a solution here (hence the post) and I wonder if this ultimately would be solved by expanding google's Logging api. It seems to me that the solution really is to stop nginx from logging anything and let Pyramid handle logging exclusively, as well as allowing me to send up data within line so Logging doesn't have to try and group requests by trace ID.
Custom runtime under Flexible, adding this in the yaml file:
runtime_config:
python_version: 3.7
And the dockerfile:
FROM gcr.io/google-appengine/python
# Create a virtualenv for dependencies. This isolates these packages from
# system-level packages.
# Use -p python3 or -p python3.7 to select python version. Default is version 2.
RUN virtualenv /env -p python3
# Setting these environment variables are the same as running
# source /env/bin/activate.
ENV VIRTUAL_ENV /env
ENV PATH /env/bin:$PATH
# Copy the application's requirements.txt and run pip to install all
# dependencies into the virtualenv.
ADD requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt
# Add the application source code.
ADD . /app
# Run a WSGI server to serve the application. gunicorn must be declared as
# a dependency in requirements.txt.
RUN pip install -e .
CMD gunicorn -b :$PORT main:app
and requiresments.txt
pyramid
gunicorn
redis
google-cloud-tasks
googleapis-common-protos
google-cloud-ndb
google-cloud-logging

dev_appserver.py slow start

After some time I needed to do changes on GAE Python (2.7) First Generation app, but when I develop and run locally, it's basically nonusable due to a very slow start.
These are printed logs:
INFO 2019-10-18 07:56:35,533 devappserver2.py:278] Skipping SDK update check.
INFO 2019-10-18 07:56:35,595 api_server.py:275] Starting API server at: http://localhost:36159
INFO 2019-10-18 07:56:35,599 dispatcher.py:270] Starting module "default" running at: http://127.0.0.1:8080
INFO 2019-10-18 07:56:35,600 admin_server.py:152] Starting admin server at: http://localhost:8000
INFO 2019-10-18 08:01:01,644 instance.py:294] Instance PID: 28496
What I notice that the last line (instance.py) is printed after ~5 minutes and only after that app responds to requests, not before. Interesting that the admin server (localhost:8000) is available right away. Of course, when I do some code change it automatically reloads and it repeats again.
Things I tried/found out:
it behaves like that on my all GAE projects Python First Gen.
tried to create a bare minimal version (webapp2 with one URL), clean virtualenv, still the same behavior
tried to reinstall Google Cloud SDK. delete the whole google-cloud-sdk folder and install again, no changes
tried to install older version of Cloud SDK
used clean VM and it works ok!!!, so it looks like there could be something wrong with my system (outside of SDK), but I'm not sure what.
It's interesting that the pause between the last two log lines is always about 5 minutes, not sure why exactly that time.
Python 2.7.14
OS: OpenSuse Leap 15.0
I'm running out of ideas so any advice would be appreciated.
I solved this accidentally.
I wanted to run Jupyter notebook, but I got the error:
error: [Errno 99] Cannot assign requested address
after debugging in /tornado/netutil.py, I noticed that it tries to work with IP 192.168.1.50 which I wasn't sure where did that come from, (probably I set it since I was playing with my home network some time ago), but when I deleted it from /etc/hosts, Jupyter, as well as GAE, works ok.
What a coincidence :)

How to run managed-vm-gae example code locally

I followed this tutorial
to get a Bigtable client up and running in Google Managed VMs. But is there a way to run this locally? Reason is that deploying the code remotely in development is a pain.
Normally I can use dev_appserver.sh to run GAE app locally. But when I run it, I'm getting this error:
Caused by: java.lang.IllegalStateException: Jetty ALPN has not been
properly configured.
Which means we need to include ALPN library? Since our codebase is in Java 7, I used this ALPN version: 7.1.3.v20150130.
I then tried again with this:
dev_appserver.sh --jvm_flag=-Xbootclasspath/p:/Users/shouguoli/tmp/alpn-boot-7.1.3.v20150130.jar
still getting this error:
Caused by: com.google.apphosting.api.ApiProxy$CallNotFoundException:
The API package 'urlfetch' or call 'Fetch()' was not found.
How do you get it to work locally?
The sample was updated last week. It's based on the java 8 compat runtime, which means that you have access to most of the App Engine API's including Users, Task Queues, and Datastore.
There is a new Netty TCNative module that uses Boring SSL.
To use it with the pom.xml in the sample, do:
mvn clean -Pmac jetty:run -Dbigtable.projectID=<your-project> -Dbigtable.clusterID=<your-cluster> -Dbigtable.zone=<your-zone>
To use on Windows, use -Pwindows instead of -Pmac. For linux, omit the Profile -P as it's the default.
To deploy:
mvn clean gcloud:deploy -Dbigtable.projectID=<your-project> -Dbigtable.clusterID=<your-cluster> -Dbigtable.zone=<your-zone>
NOTE - it is advisable to do the clean between running locally and running remotely as the TCNative module is currently specific to the platform the code runs on.
We are in the process of updating all of our samples to use TCNative, we hope to have this by 3/10/16.

Google app engine error, don't know where to start looking

I'm receiving this error when I attempt to start an app using google app engine :
Loading modules
com.bookmark.Mobile_bookmark
[ERROR] Unable to find 'com/bookmark/Mobile_bookmark.gwt.xml' on your classpath; could be a typo, or maybe you forgot to include a classpath entry for source?
[ERROR] shell failed in doStartup method
I don't have a file called Mobile_bookmark.gwt.xml. My project name is mobile-bookmark
Why is this error occuring ?
If it's looking for Mobile_bookmark.gwt.xml (the wrong file), I'd check how you are starting the server.
For example, under Eclipse, I'd check run configurations --> arguments to see if "Mobile_bookmark" got into there. (you can get to run configurations by selecting the little triangle next to either the green debug or run button you use to start it --- or RUN (menu in eclipse) -- run configurations -- select your project)
Change the file name to whatever *.gwt.xml file you do have. I suspect it's mobile-bookmark.gwt.xml.
Look under /src/yourpath/yourproject/
Your layout is probably like:
/src/yourpath/yourproject/*.gwt.xml
/src/yourpath/yourproject/client
/src/yourpath/yourproject/server

Resources