Ingest metrics data from Nagios to Prometheus - nagios

We are trying to expose the metrics Nagios collects so they can be ingested into another system. That system would consist primarily of Prometheus, acting as a historical repository for all the data Nagios has collected so far.
We have tried the NRPE exporter, but it seems to fetch only status values such as OK and CRITICAL, whereas what we want is the actual output of the CPU and other check scripts.
The Nagios version is 3.5.1

Related

How to load redis rdb data into running redis server without losing redis data?

I have two Redis servers. If I have a backup of one of them (example.rdb), how can I load that data into the other, already running Redis server without losing the data it currently holds in memory?
You can use the rdb command from redis-rdb-tools with the -c protocol option to output the Redis protocol representation of the data in the RDB file, and pipe it into the running Redis instance using netcat, socat, or similar.
Unfortunately, this Python package was built for the now-unsupported Python 2.7 and 3.5 and has not been updated since 2020 (see its FAQ).
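For illustration only, here is the same pipe driven from a short Java program instead of netcat. It is a sketch that assumes the rdb CLI is on the PATH, the dump is example.rdb, and the target server listens on localhost:6379 (those connection details are placeholders).

```java
import java.io.InputStream;
import java.net.Socket;

public class RdbRestore {
    public static void main(String[] args) throws Exception {
        // `rdb -c protocol example.rdb` emits the Redis wire protocol for the dump's contents.
        Process rdb = new ProcessBuilder("rdb", "-c", "protocol", "example.rdb")
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();

        // Stream that output straight into the running Redis server's socket,
        // which is exactly what piping into netcat or socat does.
        try (Socket redis = new Socket("localhost", 6379);
             InputStream protocol = rdb.getInputStream()) {
            protocol.transferTo(redis.getOutputStream());
            // For large dumps you would also read and discard Redis's replies
            // (netcat does this for you) so the server's output buffer cannot fill up.
        }
        rdb.waitFor();
    }
}
```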

How to "submit" an ad-hoc SQL to Beam on Flink

I'm using Apache Beam with the Flink runner and the Java SDK. It seems that deploying a job to Flink means building an 80-megabyte fat jar that gets uploaded to the Flink job manager.
Is there a way to easily deploy a lightweight SQL query to run on Beam SQL? Maybe have a job deployed that can somehow receive and run ad-hoc queries?
I don't think it's possible at the moment, if I understand your question correctly. Right now the Beam SDK will always build a fat jar that implements the pipeline and includes all of its dependencies, and it cannot accept lightweight ad-hoc queries.
If you're interested in a more interactive experience in general, you can look at the ongoing efforts to make Beam more interactive, for example:
SQL shell: https://s.apache.org/beam-sql-packaging . This describes a work-in-progress Beam SQL shell that should let you quickly execute small SQL queries locally in a REPL environment, so you can interactively explore your data and design the pipeline before submitting a long-running job. It does not change how the job gets submitted to Flink (or any other runner), though, so once the long-running job has been submitted you will still have to manage it with your usual job-management tools.
Python: https://s.apache.org/interactive-beam . This describes an approach for wrapping an existing runner in an interactive wrapper.

Near real-time data ingestion from SQL Server to HDFS in Cloudera

We have PLC data in SQL Server that gets updated every 5 minutes.
We have to push the data to HDFS in the Cloudera distribution on the same time interval.
Which tools are available for this?
I would suggest using the Confluent Kafka connectors for this task (https://www.confluent.io/product/connectors/).
The idea is as follows:
SQLServer --> [JDBC-Connector] --> Kafka --> [HDFS-Connector] --> HDFS
Both of these connectors are already available from the Confluent web site.
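As a rough sketch of how those two connectors might be registered with a Kafka Connect worker's REST API (using Java 11's HTTP client and Java 15 text blocks): the host names, database, table, topic, and most option values below are invented placeholders, and the exact connector options should be checked against the Confluent documentation.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterConnectors {

    // POST one connector definition (JSON) to the Connect worker's REST API.
    static void register(HttpClient client, String json) throws Exception {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://connect-worker:8083/connectors")) // placeholder host
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // JDBC source: polls the SQL Server table every 5 minutes and writes new rows to a Kafka topic.
        String jdbcSource = """
            { "name": "plc-sqlserver-source",
              "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                "connection.url": "jdbc:sqlserver://sqlserver-host:1433;databaseName=plc",
                "mode": "timestamp",
                "timestamp.column.name": "updated_at",
                "table.whitelist": "plc_readings",
                "poll.interval.ms": "300000",
                "topic.prefix": "plc-" } }
            """;

        // HDFS sink: flushes the topic's records into files on the Cloudera cluster.
        String hdfsSink = """
            { "name": "plc-hdfs-sink",
              "config": {
                "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
                "topics": "plc-plc_readings",
                "hdfs.url": "hdfs://namenode:8020",
                "flush.size": "1000" } }
            """;

        register(client, jdbcSource);
        register(client, hdfsSink);
    }
}
```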
I'm assuming your data is being written to some directory on the local file system. You can use a streaming engine for this task; since you've tagged this with apache-spark, I'll give you a Spark Streaming solution.
With Structured Streaming, your streaming consumer watches the data directory. Spark reads and processes the data in configurable micro-batches (the trigger interval), which in your case would be 5 minutes. You can save each micro-batch as text files, using your Cloudera Hadoop cluster for storage.
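A minimal Java sketch of that approach, assuming the PLC rows arrive as CSV files in a local landing directory; the paths, column names, and schema below are placeholders to adapt to your data.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.streaming.StreamingQuery;
import org.apache.spark.sql.streaming.Trigger;
import org.apache.spark.sql.types.StructType;

public class PlcFileStream {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("plc-to-hdfs")
                .getOrCreate();

        // File sources need an explicit schema; adjust it to your PLC columns.
        StructType schema = new StructType()
                .add("tag", "string")
                .add("value", "double")
                .add("ts", "timestamp");

        // Watch the landing directory for newly arriving CSV files.
        Dataset<Row> readings = spark.readStream()
                .schema(schema)
                .csv("file:///data/plc/landing");

        // Process a micro-batch every 5 minutes and append it to HDFS as CSV/text files.
        StreamingQuery query = readings.writeStream()
                .format("csv")
                .option("path", "hdfs:///warehouse/plc/readings")
                .option("checkpointLocation", "hdfs:///checkpoints/plc-readings")
                .trigger(Trigger.ProcessingTime("5 minutes"))
                .outputMode("append")
                .start();

        query.awaitTermination();
    }
}
```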
Let me know if this helped. Cheers.
You can also look at the tool named Sqoop; it is open-source software for bulk data transfer between relational databases and Hadoop.

What is the simplest way to stream Yammer Metrics data into a relational database?

We are starting to integrate Yammer Metrics into our applications. We want to collect the generated metrics data in a relational database table.
How can this metrics data be streamed to the database continuously?
I have searched the internet and found that Yammer provides built-in reporter APIs (CsvReporter, GraphiteReporter, etc.) that can stream data to CSV files, Graphite, and so on.
We cannot keep appending to CSV or text files, because they have to be archived off the server after some time because of memory issues.
Once the Yammer Metrics API streams data out to some other place, does it keep a copy of the same data in server memory?
We want server memory to be freed once the data has been streamed out to the database.
The metrics stay in memory for a while in every case, but you need a product like Ganglia or Graphite to store the data long term. These are normally better suited to operations metrics than a relational database, and they provide reporting add-ons. To log directly to a database, you would need some extra code or an extension of the metrics library.
Once the data has been streamed out there is no point in holding onto it, so it isn't going to affect your servers if you have things set up correctly.
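A rough sketch of what that extra code could look like against the Codahale/Dropwizard metrics API (the library Yammer Metrics evolved into): a custom ScheduledReporter that flushes values to a relational table over JDBC. The JDBC URL and the metrics table are invented placeholders, and only gauges and counters are handled to keep the sketch short.

```java
import com.codahale.metrics.*;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.SortedMap;
import java.util.concurrent.TimeUnit;

/** Periodically writes gauge and counter values to a relational table. */
public class JdbcReporter extends ScheduledReporter {

    private final String jdbcUrl;

    public JdbcReporter(MetricRegistry registry, String jdbcUrl) {
        super(registry, "jdbc-reporter", MetricFilter.ALL,
              TimeUnit.SECONDS, TimeUnit.MILLISECONDS);
        this.jdbcUrl = jdbcUrl;
    }

    @Override
    public void report(SortedMap<String, Gauge> gauges,
                       SortedMap<String, Counter> counters,
                       SortedMap<String, Histogram> histograms,
                       SortedMap<String, Meter> meters,
                       SortedMap<String, Timer> timers) {
        // One row per metric per reporting interval; histograms, meters and
        // timers are skipped here but would follow the same pattern.
        String sql = "INSERT INTO metrics (name, value, reported_at) VALUES (?, ?, CURRENT_TIMESTAMP)";
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            for (var e : gauges.entrySet()) {
                Object value = e.getValue().getValue();
                if (value instanceof Number n) {
                    stmt.setString(1, e.getKey());
                    stmt.setDouble(2, n.doubleValue());
                    stmt.addBatch();
                }
            }
            for (var e : counters.entrySet()) {
                stmt.setString(1, e.getKey());
                stmt.setDouble(2, e.getValue().getCount());
                stmt.addBatch();
            }
            stmt.executeBatch();
        } catch (Exception ex) {
            ex.printStackTrace(); // a real reporter would log and carry on
        }
    }

    public static void main(String[] args) {
        MetricRegistry registry = new MetricRegistry();
        registry.counter("requests").inc();

        JdbcReporter reporter =
                new JdbcReporter(registry, "jdbc:postgresql://db-host/metrics"); // placeholder URL
        reporter.start(1, TimeUnit.MINUTES); // flush to the database once a minute
    }
}
```

The reporter itself buffers nothing: it only reads the registry's current values on each run, which matches the point above about data not lingering once it has been streamed out.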

Import data on HDFS to SQL Server or export data on HDFS to SQL Server

I have been trying to figure out the best approach for porting data from HDFS to SQL Server.
Do I import data from Cloudera Hadoop using the Sqoop Hadoop Connector for SQL Server 2008 R2, or
do I export data from Cloudera Hadoop into SQL Server using Sqoop?
I am sure both are possible, based on the links I have read through:
http://www.cloudera.com/blog/2011/10/apache-sqoop-overview/
http://www.microsoft.com/en-in/download/details.aspx?id=27584
But when I look into the possible issues that could arise at the configuration and maintenance level, I can't find proper answers.
I strongly feel that I should go with the import, but I am not comfortable troubleshooting and maintaining the issues that could come up every now and then.
Can someone share their thoughts on which would be best?
Both of your options use the same mechanism: Apache Sqoop's export utility. Using the licensed Microsoft connector/driver jar can be expected to yield better performance for the task than the generic connector offered by Apache Sqoop.
In terms of maintenance, there should be none once you have it working. As long as the version of SQL Server in use is supported by the driver jar, it should continue to work as expected.
In terms of configuration, you may initially have to tune things manually to find the best -m value for the parallelism of the export MapReduce job launched by the export tool. Too high a value will cause problems on the DB side, while too low a value will not give you ideal performance. Some trial and error is required to arrive at the right -m value, along with knowledge of your DB's load periods, in order to set the parallelism correctly.
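As a hedged illustration of where -m fits, here is a sketch of an export invoked through Sqoop 1.x's programmatic entry point (Sqoop.runTool); the connection string, credentials file, table, and directory are all placeholders, and the same flags apply if you run sqoop export from the command line instead.

```java
import org.apache.sqoop.Sqoop;

public class SqoopExportJob {
    public static void main(String[] args) {
        // Equivalent to running `sqoop export ...` on the command line;
        // -m controls how many parallel map tasks write into SQL Server.
        String[] exportArgs = {
            "export",
            "--connect", "jdbc:sqlserver://sqlserver-host:1433;databaseName=analytics", // placeholder
            "--username", "etl_user",
            "--password-file", "/user/etl/.sqlserver.password",
            "--table", "daily_aggregates",                 // placeholder target table
            "--export-dir", "/user/hive/warehouse/daily_aggregates",
            "--input-fields-terminated-by", "\t",
            "-m", "4"                                      // start low, then tune up or down
        };
        int exitCode = Sqoop.runTool(exportArgs);
        System.exit(exitCode);
    }
}
```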
The Apache Sqoop (v1) documentation for the export tool also lists a set of common reasons for export-job failures, which are worth reviewing.
On the MapReduce side, you may also want to dedicate a scheduler pool or queue to such externally writing jobs, since they may be business critical; schedulers such as the FairScheduler and CapacityScheduler let you define SLA guarantees per pool or queue so that these jobs get adequate resources when they are launched.
