Aggregate Jenkins build logs stored in ElasticSearch

I'm storing my Jenkins build logs in ElasticSearch with the Jenkins Logstash plugin.
My configuration looks sort of like this:
That part works great, but I'd like to view the full log in Kibana.
The plugin incrementally sends the results to ES and breaks on each newline. That means a long log can look something like this in Kibana:
Where each line is a massive JSON output containing tons of fields I do not care about. I really only care about the message field.
I'm reading about aggregations right now, which appear to be what I need, but my results are not coming out the way I'd like.
curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
  "aggs": {
    "buildLog": {
      "terms": {
        "field": "data.url"
      }
    }
  }
}'
This prints out a large blob of JSON that does not have what I need.
In a perfect world, I'd like to concatenate every message field from each data.url and fetch that.
In SQL, an individual query for this might look something like (assuming some timestamp column to order by):
SELECT message FROM jenkins-logstash WHERE data.url='job/playground/36' ORDER BY timestamp ASC
Where 'job/playground/36' is one example of every data.url.
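What I'm picturing is a plain filtered query rather than an aggregation, something like this sketch (untested; it assumes my mapping has a data.url.keyword sub-field and a sortable @timestamp field, which may not match reality):
curl -X GET "localhost:9200/_search" -H 'Content-Type: application/json' -d'
{
  "_source": ["message"],
  "size": 1000,
  "query": {
    "term": { "data.url.keyword": "job/playground/36" }
  },
  "sort": [
    { "@timestamp": { "order": "asc" } }
  ]
}'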
How can I go about doing this?

Update: Better answer than before.
I still ended up using FileBeat, but as of ELK 6.5+ Kibana has a Logs UI! https://www.elastic.co/guide/en/kibana/current/logs-ui.html
The default config from FileBeat works fine with it.
__
Old answer:
I ended up solving this by using FileBeat to harvest all the logs and then using the Kibana Log Viewer to watch each one. I filtered on the source field, using the path where each log was written.
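A minimal Filebeat input for this looks roughly like the following sketch (the Jenkins log path and Elasticsearch output are examples, not the exact config I used, so adjust them to your install):
# filebeat.yml (sketch)
filebeat.inputs:
  - type: log
    paths:
      - /var/lib/jenkins/jobs/*/builds/*/log   # example location of Jenkins build logs
output.elasticsearch:
  hosts: ["localhost:9200"]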

Related

Spark Streaming display (streaming) not working

I followed this example to simulate streaming in Spark from a source file. At the end of the example, a function named display is used, which is supported only in Databricks. I run my code in a Jupyter notebook. What is the alternative in Jupyter to get the same output as the display function?
[screenshot of the example]
Update 1:
The code:
# Source
sourceStream = spark.readStream.format("csv") \
    .option("header", True) \
    .schema(schema) \
    .option("ignoreLeadingWhiteSpace", True) \
    .option("mode", "dropMalformed") \
    .option("maxFilesPerTrigger", 1) \
    .load("D:/PHD Project/Paper_3/Tutorials/HeartTest_1/") \
    .withColumnRenamed("output", "label")

# Stream test data to the ML model
streamingHeart = pModel.transform(sourceStream).select('label')
I do the following:
streamingHeart.writeStream.outputMode("append") \
    .format("csv") \
    .option("path", "D:/PHD Project/Paper_3/Tutorials/sa1/") \
    .option("checkpointLocation", "checkpoint/filesink_checkpoint") \
    .start()
The problem is that the generated files (output files) are empty. What might be the reason behind that?
I solved the problem by changing the checkpoint location, as follows:
streamingHeart.writeStream.outputMode("append") \
    .format("csv") \
    .option("path", "D:/PHD Project/Paper_3/Tutorials/sa1/") \
    .option("checkpointLocation", "checkpoint/filesink_checkpoint_1") \
    .start()

Saving Documents to CouchDB Urls with Multiple Slashes

My first exposure to NoSQL DBs was through Firebase, where I'd typically store JSON data at a path like category, then later store something else at a path like category/subcategory.
Trying to do the same in CouchDB I ran into a problem.
For example, I saved a simple object like:
{"_id":"one"}
to
database/category
which works as expected. Then I try saving the following
{"_id":"two"}
to
database/category/subcategory
I get this error message:
{"error":"not_found","reason":"Document is missing attachment"}
Apparently, when you use multiple slashes in a url, Couch understands the resource as an attachment. If this is so, how does one make databases where data will have multiple levels, like Geography/Continents/Africa/Egypt, for example?
CouchDB is not suitable for the usage you described. CouchDB is a flat document store.
You should flatten your structure in order to store it in CouchDB.
{"_id":"country-es",
"type":"geography",
"country":"Spain",
"continent":"Europe"
}
{"_id":"country-fr",
"type":"geography",
"country":"France",
"continent":"Europe"
}
Then use a view in order to have a mechanism to query it hierarchically.
function (doc) {
  if (doc.type == "geography") {
    emit([doc.continent, doc.country], doc._id);
  }
}
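Assuming that map function is saved in a design document, say _design/geo with a view named by_location (the names here are just for illustration), you can then query the hierarchy with key ranges, for example:
# All countries in Europe
curl -G 'http://localhost:5984/database/_design/geo/_view/by_location' \
  --data-urlencode 'startkey=["Europe"]' \
  --data-urlencode 'endkey=["Europe",{}]'

# A single country
curl -G 'http://localhost:5984/database/_design/geo/_view/by_location' \
  --data-urlencode 'key=["Europe","Spain"]'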

How do we get the document file url using the Watson Discovery Service?

I don't see a solution to this using the available api documentation.
It is also not available on the web console.
Is it possible to get the file url using the Watson Discovery Service?
If you need to store the original source/file URL, you can include it as a field within your documents in the Discovery service, then you will be able to query that field back out when needed.
I also struggled with this request but ultimately got it working using the Python bindings for Watson Discovery. The online documentation and API reference are very poor; here's what I used to get it working:
(This assumes you have a Watson Discovery service and have created a collection.)
# Programmatic upload and retrieval of documents and metadata with Watson Discovery
from watson_developer_cloud import DiscoveryV1
import os
import json
discovery = DiscoveryV1(
    version='2017-11-07',
    iam_apikey='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
    url='https://gateway-syd.watsonplatform.net/discovery/api'
)
environments = discovery.list_environments().get_result()
print(json.dumps(environments, indent=2))
This gives you your environment ID. Now append to your code:
collections = discovery.list_collections('{environment-id}').get_result()
print(json.dumps(collections, indent=2))
This will show you the collection ID you need in order to upload documents programmatically. You should have a document to upload (in my case, an MS Word document) and its accompanying URL from your own source document system. I'll use a trivial fictitious example.
NOTE: the documentation DOES NOT tell you to pass 'rb' (binary mode) to the open call, but it is required when uploading a Word document, as in my example below. Raw text / HTML documents can be uploaded without the 'rb' argument.
url = {"source_url":"http://mysite/dis030.docx"}
with open(os.path.join(os.getcwd(), '{path to your document folder with trailing / }', 'dis030.docx'), 'rb') as fileinfo:
add_doc = discovery.add_document('{environment-id}', '{collections-id}', metadata=json.dumps(url), file=fileinfo).get_result()
print(json.dumps(add_doc, indent=2))
print(add_doc["document_id"])
Note the setting up of the metadata as a JSON dictionary, and then encoding it using json.dumps within the parameters. So far I've only wanted to store the original source URL but you could extend this with other parameters as your own use case requires.
This call to Discovery gives you the document ID.
You can now query the collection and extract the metadata using something like a Discovery query:
my_query = discovery.query('{environment-id}', '{collection-id}', natural_language_query="chlorine safety")
print(json.dumps(my_query.result["results"][0]["metadata"], indent=2))
Note: I'm extracting just the stored metadata here from within the overall returned results. If you instead just do print(my_query), you'll get the full response from Discovery, but there's a lot to go through to identify your own custom metadata.
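For completeness, pulling the stored URL itself back out of that first result looks something like this (a sketch, reusing the source_url metadata field set above):
first_hit = my_query.result["results"][0]
print(first_hit["metadata"]["source_url"])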

cURL post json data, json array and image files REST api testing

I'm facing a problem with complex http request via cURL.
I'm building a REST API with Node.js, using Express routers and the Multer middleware to handle mixed body data and files.
My endpoint route 127.0.0.1/api/postData
expects:
JSON data with several fields, one of which is an array of JSON objects (I have a nested Mongoose schema), plus 2 named images (PNG/JPG).
I need to send a POST request via cURL with the following 5-field data structure:
name        String
description String
usersArray  Array of JSON objects like: [{"id": "123"}, {"id": "456"}]
imgIcon     PNG/JPG image, providing /path/to/imageIcon.png
imgHeader   PNG/JPG image, providing /path/to/imageHeader.png
I've read a lot of threads on Stack Overflow, but each of them answers only one particular piece of the problem: one thread covers posting images with cURL, another covers posting arrays, but none put it all together.
I've tried REST API test tools like Postman and DHC (Google Chrome), and there everything is fine except for the usersArray field.
I used the fields like:
usersArray[0] {"id": "123"}
usersArray[1] {"id": "456"}
But validation didn't pass, because the JSON object values were parsed incorrectly somewhere.
So I decided to put everything in cURL script.
I tried to write my cURL request in following way:
#!/bin/bash
curl -H 'Content-Type: application/json' \
  -H 'Accept: application/json' -X POST \
  -F "name=Foo Name Test" \
  --data '[{"id": "a667cc8f-42cf-438a-b9d8-7515438a9ac1"}, {"id": "7c7960fb-eeb9-4cbf-9838-bcb6bc9a3878"}]' \
  -F "description=Super Bar" \
  -F "imgIcon=@/home/username/Pictures/imgIcon.png" \
  -F "imgHeader=@/home/username/Pictures/imgHeader.png" \
  http://127.0.0.1:7777/api/postData
When I run my cURL script in bash:
./postData
I get this:
$ Warning: You can only select one HTTP request!
You can help with:
1) Any idea how to write such a complex HTTP REST request in cURL,
2) Or a suggestion of tools (like DHC or Postman) that can handle this complex HTTP request,
3) Or any idea how to write this request with the help of the request.js Node HTTP request library.
Thank you all in advance for all answers, thoughts and ideas!!
Regards, JJ
You can try Postman or DHC, a Google Chrome extension app for REST API testing.
But you should use a multidimensional array instead of JSON objects as values; those can cause problems during validation.
Try this in DHC:
FIELD              VALUE
name               'Some name'
description        'Some description'
usersArray[0][id]  89a7df9
usersArray[1][id]  dskf28f
imgIcon            (select field type "file" and upload an image)
imgHeader          (select field type "file" and upload an image)
The way it works above is:
usersArray[0][id] specifies a multidimensional array and places at position 0 an object {} with key "id" and the value you specify in the value part.
So usersArray[0][id] "123" creates [{"id": "123"}]
and usersArray[1][id] "456" adds another element, so the array becomes: [{"id": "123"},{"id": "456"}]
Sometimes you may want to use shell + cURL to make REST API calls and pass complex JSON as data. If you also want to use variables and build that data at execution time, you probably end up using a lot of escape characters and the code looks ugly. I was doing the same thing. Windows PowerShell has a method to convert a dictionary/associative array to JSON, which really helps in that realm; I looked for something similar in Bash but couldn't find anything. So here is a sample that should help:
#!/bin/Bash
# Author Jayan#localhost.com
## Variable Declaration
email="jayan#localhost.com"
password="VerySecurePassword"
department="Department X"
## Create JSON using variables in Bash
creds="{^email^:^${email}^, ^password^:^${password}^}"
department="{^department^:^${department}^}"
authdata_crippled="{^Authenticate^:[${creds},${department}]}"
# Replace ^ with "
authdata_json_for_RestAPI_Call=$(echo ${authdata_crippled}|tr '^' '"')
# Testing syntax
# Get "jq": yum install jq OR https://stedolan.github.io/jq/
echo ${authdata_json_for_RestAPI_Call} | jq .
# Then you make API call using curl
# Eg:
curl http://hostname:port/right/endpoint/for/api/Authenticate -X POST --header "Content-Type:application/json" -d "${authdata_json_for_RestAPI_Call}" --cookie-jar cookie.data
Hope this helps someone.
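As an aside, since jq is already being pulled in above for syntax checking, it can also build the JSON directly and avoid the placeholder-character trick entirely. A sketch with the same fields:
email="jayan#localhost.com"
password="VerySecurePassword"
department="Department X"

# jq -n builds the JSON from scratch; --arg safely quotes and injects the shell variables
authdata_json=$(jq -n --arg email "$email" --arg password "$password" --arg department "$department" \
  '{Authenticate: [{email: $email, password: $password}, {department: $department}]}')

curl http://hostname:port/right/endpoint/for/api/Authenticate -X POST \
  --header "Content-Type:application/json" -d "$authdata_json" --cookie-jar cookie.data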

How to view JSON logs of a managed VM in the Log Viewer?

I'm trying to get JSON formatted logs on a Compute Engine VM instance to appear in the Log Viewer of the Google Developer Console. According to this documentation it should be possible to do so:
Applications using App Engine Managed VMs should write custom log files to the VM's log directory at /var/log/app_engine/custom_logs. These files are automatically collected and made available in the Logs Viewer.
Custom log files must have the suffix .log or .log.json. If the suffix is .log.json, the logs must be in JSON format with one JSON object per line. If the suffix is .log, log entries are treated as plain text.
This doesn't seem to be working for me: logs ending with .log are visible in the Log Viewer, but displayed as plain text. Logs ending with .log.json aren't visible at all.
It also contradicts another recent article that states that file names must end in .log and that their contents are treated as plain text.
As far as I can tell Google uses fluentd to index the log files into the Log Viewer. In the GitHub repository I cannot find any evidence that .log.json files are being indexed.
Does anyone know how to get this working? Or is the documentation out-of-date and has this feature been removed for some reason?
Here is one way to generate JSON logs for the Managed VMs logviewer:
The desired JSON format
The goal is to create a single line JSON object for each log line containing:
{
  "message": "Error occurred!.",
  "severity": "ERROR",
  "timestamp": {
    "seconds": 1437712034000,
    "nanos": 905
  }
}
(information sourced from Google: https://code.google.com/p/googleappengine/issues/detail?id=11678#c5)
Using python-json-logger
See: https://github.com/madzak/python-json-logger
import datetime
import logging
import time


def get_timestamp_dict(when=None):
    """Converts a datetime.datetime to integer milliseconds since the epoch.

    Requires special handling to preserve microseconds.

    Args:
        when:
            A datetime.datetime instance. If None, the timestamp for 'now'
            will be used.

    Returns:
        Integer time since the epoch in milliseconds. If the supplied 'when' is
        None, the return value will be None.
    """
    if when is None:
        when = datetime.datetime.utcnow()
    ms_since_epoch = float(time.mktime(when.utctimetuple()) * 1000.0)
    return {
        'seconds': int(ms_since_epoch),
        'nanos': int(when.microsecond / 1000.0),
    }
def setup_json_logger(suffix=''):
    try:
        from pythonjsonlogger import jsonlogger

        class GoogleJsonFormatter(jsonlogger.JsonFormatter):
            FORMAT_STRING = "{message}"

            def add_fields(self, log_record, record, message_dict):
                super(GoogleJsonFormatter, self).add_fields(log_record,
                                                            record,
                                                            message_dict)
                log_record['severity'] = record.levelname
                log_record['timestamp'] = get_timestamp_dict()
                log_record['message'] = self.FORMAT_STRING.format(
                    message=record.message,
                    filename=record.filename,
                )

        formatter = GoogleJsonFormatter()
        log_path = '/var/log/app_engine/custom_logs/worker' + suffix + '.log.json'
        make_sure_path_exists(log_path)
        file_handler = logging.FileHandler(log_path)
        file_handler.setFormatter(formatter)
        logging.getLogger().addHandler(file_handler)
    except OSError:
        logging.warn("Custom log path not found for production logging")
    except ImportError:
        logging.warn("JSON Formatting not available")
To use, simply call setup_json_logger - you may also want to change the name of worker for your log.
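A minimal usage sketch (the suffix and the log messages are just examples):
import logging

# Route the root logger to /var/log/app_engine/custom_logs/worker-1.log.json
setup_json_logger(suffix='-1')
logging.getLogger().setLevel(logging.INFO)

logging.info("Job started")
logging.error("Error occurred!")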
I am currently working on a NodeJS app running on a managed VM and I am also trying to get my logs to be printed on the Google Developer Console. I created my log files in the ‘/var/log/app_engine’ directory as described in the documentation. Unfortunately this doesn’t seem to be working for me, even for the ‘.log’ files.
Could you describe where your logs are created? Also, is your managed VM configured as "Managed by Google" or "Managed by User"? Thanks!
