Adjust priority of alerts coming from Nagios

We're using Opsgenie and Nagios for monitoring and incident notifications. I'd like to set the priority of incidents based on the environment they come from (P3 or lower for staging/dev, P2 or higher for production).
We're using the Nagios Opsgenie plugin, but I don't see anywhere to configure a default priority for alerts. Is this an option with the Nagios-to-Opsgenie integration?
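If the plugin itself doesn't expose a priority setting, one workaround is to create the alert through Opsgenie's Alert API, which does accept a priority field (P1-P5). Below is a minimal sketch of that idea; the environment-to-priority mapping, the message text, and the send_alert helper are hypothetical, while the endpoint and the GenieKey authorization header follow the public Alert API.

```python
# Sketch: choosing an Opsgenie alert priority from the source environment
# before creating the alert via the Alert API. The mapping below is the
# illustrative policy from the question (staging/dev -> P3, production -> P2).
import json
import urllib.request

API_URL = "https://api.opsgenie.com/v2/alerts"  # Opsgenie Alert API endpoint

def priority_for(environment: str) -> str:
    """Map an environment name to an alert priority."""
    if environment == "production":
        return "P2"
    return "P3"

def build_alert(message: str, environment: str) -> dict:
    """Build the JSON payload for the Alert API's create-alert call."""
    return {
        "message": message,
        "priority": priority_for(environment),
        "tags": ["nagios", environment],
    }

def send_alert(payload: dict, api_key: str) -> None:
    # Hypothetical helper: POSTs the alert with the GenieKey auth header.
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"GenieKey {api_key}",
        },
    )
    urllib.request.urlopen(req)

alert = build_alert("HOST DOWN: web-01", "staging")
print(alert["priority"])  # -> P3
```

A Nagios notification command could call a script like this instead of the stock plugin, passing the environment as an argument derived from hostgroup membership.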

Related

After the Flink version is upgraded, the taskmanager log information cannot be seen in the web UI

After the Flink version is upgraded, the taskmanager log information cannot be seen in the web UI. In stdout you can see the logs from the application code itself, but not the logs from Spring or Flink.
What version have you upgraded to, and how is Flink running (i.e., Yarn, Kubernetes, standalone, etc)?
With some versions of Flink in certain environments, the logs aren't available in the web UI because they are aggregated elsewhere. For example, if you are running on Kubernetes with certain versions of Flink, you will need something like kubectl logs to access them.
UPDATE
Flink 1.11 switched from Log4j 1 to Log4j 2; see the release notes for details. The logging properties file log4j-yarn-session.properties was also renamed to log4j-session.properties, and yarn-session.sh was updated to use the new file. Again, see the release notes for more info.
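The switch matters for this question because a Log4j 1-style properties file is silently ignored by Log4j 2, so Flink's own logging disappears. Since 1.11, conf/log4j.properties uses Log4j 2's properties syntax, roughly along these lines (written from memory, so check the file shipped with your distribution; the appender names are illustrative):

```properties
# Log4j 2 properties syntax used by Flink >= 1.11 (conf/log4j.properties)
rootLogger.level = INFO
rootLogger.appenderRef.file.ref = MainAppender

appender.main.name = MainAppender
appender.main.type = File
appender.main.append = false
appender.main.fileName = ${sys:log.file}
appender.main.layout.type = PatternLayout
appender.main.layout.pattern = %d{yyyy-MM-dd HH:mm:ss,SSS} %-5p %-60c %x - %m%n
```

If you carried an old log4j.properties forward through the upgrade, replacing it with the new-format file from the Flink distribution is usually enough to get the taskmanager logs back.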

Flink : Unable to collect Task Metrics via JMX

I have been able to run JMX with Flink with the following configuration applied to the flink-conf.yaml file of all nodes in the cluster:
metrics.reporters: jmx
metrics.reporter.jmx.class: org.apache.flink.metrics.jmx.JMXReporter
metrics.reporter.jmx.port: 9020-9022
env.java.opts: -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9999 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
When I run JConsole and connect to master-IP:9999 or slave-IP:9020, I can see system metrics like CPU, memory etc.
How can I access the task metrics and their respective graphs, like bytesRead, latency etc., which are collected for each subtask and shown in the GUI?
You can go to the MBeans tab in JConsole; there you will see various dropdowns on the right-hand side, named after the job and its tasks. Let me know if you have any issues.
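The MBean names you will be browsing follow Flink's metric scope formats, which can be configured in flink-conf.yaml. The defaults are roughly as below (quoted from memory of the metrics documentation, so verify against your Flink version); this is why each task metric appears under a path containing the host, taskmanager ID, job name, task name, and subtask index:

```yaml
# Default metric scope formats (flink-conf.yaml); the placeholders are
# expanded per metric, so e.g. bytesRead for subtask 0 of a task shows up
# under host.taskmanager.<tm_id>.<job_name>.<task_name>.0 in the MBean tree.
metrics.scope.tm: <host>.taskmanager.<tm_id>
metrics.scope.task: <host>.taskmanager.<tm_id>.<job_name>.<task_name>.<subtask_index>
metrics.scope.operator: <host>.taskmanager.<tm_id>.<job_name>.<operator_name>.<subtask_index>
```

Note that task metrics live on the TaskManager JVMs, so you need to connect JConsole to the TaskManager ports (9020-9022 in the configuration above), not only to the JobManager.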

Why "Configuration" section of running job is empty?

Can anybody explain why the "Configuration" section of a running job in the Apache Flink Dashboard is empty?
How can I use this job configuration in my flow? This doesn't seem to be described in the documentation.
The configuration tab of a running job shows the values of the ExecutionConfig. Depending on your version of Flink, you may experience different behaviour.
Flink <= 1.0
The ExecutionConfig is only accessible for finished jobs. For running jobs, it is not possible to access it. Once the job has finished or has been stopped/cancelled, you should be able to see the ExecutionConfig.
Flink > 1.0
The ExecutionConfig can also be accessed for running jobs.
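Besides the dashboard, the same data is exposed by the JobManager's monitoring REST API under /jobs/&lt;job-id&gt;/config (port 8081 by default). The sketch below parses a response of roughly that shape; the field names are from memory of the REST API and the sample values are hypothetical, so treat it as an illustration rather than captured output.

```python
# Sketch: reading the ExecutionConfig of a running job from Flink's
# monitoring REST API instead of the dashboard's Configuration tab.
import json

# Illustrative shape of GET http://<jobmanager>:8081/jobs/<job-id>/config;
# the job id and values below are made up for the example.
sample_response = json.dumps({
    "jid": "example-job-id",
    "name": "my-flink-job",
    "execution-config": {
        "execution-mode": "PIPELINED",
        "restart-strategy": "default",
        "job-parallelism": 4,
        "object-reuse-mode": False,
        "user-config": {},
    },
})

def execution_config(response_body: str) -> dict:
    """Pull the ExecutionConfig section out of a /jobs/<id>/config reply."""
    return json.loads(response_body)["execution-config"]

cfg = execution_config(sample_response)
print(cfg["job-parallelism"])  # -> 4
```

In a real script you would fetch the body with an HTTP GET against the JobManager and feed it to the same parsing function.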

How to make SOLR "boot" into Cloud Mode when server reboots

NOTE: I've tried everything in the comments below and everything else I can think of. At this point I have to assume there's a bug of some kind and that a restart will NOT bring SOLR up in cloud mode unless you roll your own init.d stuff.
==================================================
I have 3 SOLR nodes and 3 Zookeeper nodes.
The SOLR Nodes are SOLR 5.4 on Ubuntu 14 and were installed based on the instructions here:
https://cwiki.apache.org/confluence/display/solr/Taking+Solr+to+Production
If I issue this command to start or restart SOLR on the command line, everything looks fine in the SOLR Admin UI and all my nodes are green in the UI
sudo /opt/solr/bin/solr restart -c -z 192.168.56.5,192.168.56.6,192.168.56.7/solr5_4
However, even though I have a ZK_HOST entry in my solr.in.sh I cannot get the nodes to show up in the SOLR Admin console correctly if I try:
service solr restart
Or if I reboot the VM.
My ZK_HOST entry in solr.in.sh looks like this:
ZK_HOST="192.168.56.5,192.168.56.6,192.168.56.7/solr5_4"
I also tried it this way (no quotes, just in case) because that's how it looks on the Apache wiki page I was reading:
ZK_HOST=192.168.56.5,192.168.56.6,192.168.56.7/solr5_4
I always have to run the command line to get the SOLR instances to show up correctly in the Admin UI. It would be preferable to have this "just happen" when rebooting the VM.
If I run service solr restart on any of them, they show as "down" in the Admin UI and the core I am using disappears from the Admin UI for the one IP address I'm looking at.
Why is this and what settings are required to get SOLR to start on boot into "Cloud Mode" with the correct Zookeeper settings?
Until a recent change, the docs for setting SOLR up for production contained a slight misdirection. The bottom line is that /etc/default/solr.in.sh is what controls the SOLR configuration on startup, NOT the file mentioned in the docs (which pointed somewhere else anyway, under /opt/solr/bin).
Once I added the ZK_HOST setting to /etc/default/solr.in.sh and restarted the service (or rebooted the server), SOLR came up in "Cloud" mode every time.
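For clarity, the fix is the same setting already shown in the question, just placed in the include file that the init script actually reads (assuming the file layout produced by the install script on Ubuntu):

```shell
# /etc/default/solr.in.sh -- the include file sourced by the init script.
# IPs and the /solr5_4 chroot are the ones from the question above.
ZK_HOST="192.168.56.5,192.168.56.6,192.168.56.7/solr5_4"
```

With ZK_HOST set here, `service solr restart` starts Solr with the -z option automatically, so it joins the ZooKeeper ensemble in cloud mode without any manual command line.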

Nagios / Check_MK web interface host visibility

I am using Nagios with Check_MK addon at a small ISP company I work for. I am a sole Nagios admin, but we have a few users who use the Nagios / Check_MK system (with Check_MK as the web frontend).
As most devices we use are MikroTik routers with a proprietary OS on which I cannot install the check_mk agent (I have to use SNMP), I am using Check_MK with generate_hostconf = False - I have to manually define WiFi interface checks (like signal strength) anyway, so all host configuration is done in Nagios files.
All users who use the system are listed in cgi.cfg with authorized_for_all_services=user1,user2 and authorized_for_all_hosts=user1,user2 etc.
As I was not satisfied with the current configuration (there is not enough severity-based differentiation among hosts and service types - i.e. we want not just a backbone / not-monitored distinction, but something more fine-grained like backbone / distribution layer 1 / distribution layer 2 / not-monitored client-side), I started changing the configuration to a somewhat hackish setup with multiple contacts per real user and different timeperiods assigned, so that e.g. 'distribution layer 2' hosts don't wake people at 3 a.m. Perhaps this is not the proper way to do it.
Anyway, here is the problem: I created new contacts, contact groups and some inventory rules. For services it seems to work fine, but apparently the hosts are not visible in the Check_MK web interface (they are visible on our Nagios website). Most likely this is because I am logged in as the 'old' user, who is not part of the new contact group but is still supposed to see all hosts (as defined in cgi.cfg). Can I do something to make the hosts visible in the Check_MK GUI with this setup, and not only in the Nagios web interface?
I had to use check_mk --flush hostname and re-inventory with check_mk -II hostname even after changing the settings back to the previous state to make the hosts appear again.
I haven't tried adding the new contacts to .htaccess, as I don't really want to create multiple contacts with login permissions. Does Check_MK simply ignore the authorized_for_all_hosts / authorized_for_all_services directives defined in cgi.cfg in this case?
I can see that Check_MK itself is able to communicate with the hosts not shown in the GUI - I can run check_mk -II hostname or check_mk -N hostname. Appropriate entries are present in etc/check_mk.d/check_mk_objects.cfg and nagios/var/retention.dat, and the hostnames are listed by check_mk --list-tag TAG etc., so most likely it is a problem with GUI user permissions only.
I know I could use notification_period directives for hosts and custom SNMP services within Nagios configuration files and extra_service_conf['notification_period'] in main.mk, but I am actually using that for some exceptional cases and wasn't sure about the precedence rules.
Anyway, this is Ubuntu Server 12.04 LTS x86_64, Nagios Core 3.4.1, Check_MK 1.2.0p3.
Apparently it is enough to set default_user_role = "admin" in multisite.mk. Perhaps not the safest thing, but it gets the job done in this setup.
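Since multisite.mk is evaluated as Python by Check_MK's Multisite GUI, the change is a one-line assignment (path assumed to be the usual etc/check_mk/multisite.mk of this install):

```python
# etc/check_mk/multisite.mk -- evaluated as Python by Check_MK Multisite.
# Granting every GUI user the admin role sidesteps the per-contact host
# visibility rules; fine for a single-admin setup, too broad for most others.
default_user_role = "admin"
```

A less sweeping alternative in newer Check_MK versions is to grant only specific users elevated roles, but for this single-admin setup the global default is the simpler fix.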
