I have logs in the following type of format:
2021-10-12 14:41:23,903716 [{"Name":"A","Dimen":[{"Name":"in","Value":"348"},{"Name":"ses","Value":"asfju"}]},{"Name":"read","A":[{"Name":"ins","Value":"348"},{"Name":"ses","Value":"asf5u"}]}]
2021-10-12 14:41:23,903716 [{"Name":"B","Dimen":[{"Name":"in","Value":"348"},{"Name":"ses","Value":"a7hju"}]},{"Name":"read","B":[{"Name":"ins","Value":"348"},{"Name":"ses","Value":"ashju"}]}]
Each log is on a new line. The problem is that I want each object in the top-level array of a single line to become a separate document, parsed accordingly.
I need to parse this and send it to Elasticsearch. I have tried a number of filters (grok, JSON, split, etc.) and I cannot get it to work the way I need; I have little experience with these filters, so any help would be much appreciated.
The json codec is what I would need if I could remove the leading text/timestamp from each line:
"If the data being sent is a JSON array at its root multiple events will be created (one per element)"
If there is a way to do that, it would also be helpful.
Here is a config example for your use case:
input { stdin {} }

filter {
  grok {
    match => { "message" => "%{DATA:date},%{DATA:some_field} %{GREEDYDATA:json_message}" }
  }
  # Use the json filter to parse the raw string into a JSON field
  json { source => "json_message" target => "json" }
  # and split the result into dedicated events
  split { field => "json" }
}

output {
  stdout {
    codec => rubydebug
  }
}
If you need to parse the start of the log as a date, you can use grok with a date pattern, or concatenate the two fields and use the result as the source for the date plugin.
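A minimal sketch of that second approach, assuming the field names captured by the grok above (the combined_ts field name is illustrative, and the six-digit fractional pattern may need to be trimmed to three digits depending on your Logstash version):

filter {
  # Join the two captured parts back into a single timestamp string
  mutate {
    add_field => { "combined_ts" => "%{date},%{some_field}" }
  }
  # Parse it into @timestamp
  date {
    match => [ "combined_ts", "yyyy-MM-dd HH:mm:ss,SSSSSS" ]
  }
}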
In ADF I am calling an API source that returns nested/complex JSON that I need to flatten out into CSV files.
The Copy activity won't work for me; it only reads the first record from the nested data.
I need to be able to call the API and then flatten the aliases array.
Here is an example of the response from the API:
{
  "items": [
    {
      "title_no": 12345,
      "booking_xref_title_no": 45305,
      "edi_no": "2495",
      "title_global_id": "TTL-11",
      "aliases": [
        {
          "source_name": "123A",
          "title_alias_global_id": "ABC1234"
        },
        {
          "source_name": "123B",
          "title_alias_global_id": "ABC5678"
I need to get the following into my output csv:
source_name, title_global_id
123A, ABC1234
123B, ABC5678
The JSON given in the question is incomplete, so I added a few braces to it. The JSON response should be as given below:
{
  "items": [
    {
      "title_no": 12345,
      "booking_xref_title_no": 45305,
      "edi_no": "2495",
      "title_global_id": "TTL-11",
      "aliases": [
        {
          "source_name": "123A",
          "title_alias_global_id": "ABC1234"
        },
        {
          "source_name": "123B",
          "title_alias_global_id": "ABC5678"
        }
      ]
    }
  ]
}
With the above JSON response, you can flatten it and save the output to a CSV file as shown in the steps below.
Use the JSON response as the source of a Data Flow activity in Azure Data Factory.
Set Document form to Single document.
Now add a Flatten transformation. Set Unroll by to items.aliases and do the mapping as shown in the screenshot below (a data flow script sketch follows these steps).
Data preview after the Flatten transformation.
Now add a CSV file as the sink.
Output:
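For reference, the data flow script behind such a Flatten transformation would look roughly like this; treat it as a sketch, where source1 stands for your JSON source stream, FlattenAliases is an illustrative name, and the column mapping follows the headers requested in the question:

source1 foldDown(unroll: items.aliases,
    mapColumn(
        source_name = items.aliases.source_name,
        title_global_id = items.title_global_id
    ),
    skipDuplicateMapInputs: false,
    skipDuplicateMapOutputs: false) ~> FlattenAliases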
I'm trying to get Azure Data Factory to read my REST API and put it in SQL Server. The source is a REST API and the sink is a SQL Server table.
I tried to do something like:
"translator": {
"type": "TabularTranslator",
"schemaMapping": {
"$": "json"
},
"collectionReference": "$.tickets"
}
The source looks like:
{ "tickets": [ {... }, {...} ] }
Because of the poor mapping capabilities I'm choosing this path; I'll then split the data with a query. Preferably, I'd like to store each object inside tickets as a row containing the JSON of that object.
In short: how can I get the JSON output from the RestSource into a single text/nvarchar(max) column in the SqlSink?
I managed to solve the same issue by modifying the mapping manually.
ADF tries to parse the JSON anyway, but in Advanced mode you can edit the JSON paths. For example, this is the original schema parsed automatically by ADF:
https://imgur.com/Y7QhcDI
Once opened in Advanced mode, it shows the full paths with element indexes added, something like $tickets[0][] etc.
Try deleting all other columns and keeping only the highest-level one, $tickets (in my case it was $value): https://i.stack.imgur.com/WnAzC.jpg. As a result, the entire JSON will be written into the destination column.
If there are pagination rules in place, each page will be written as a single row.
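The resulting translator might end up looking roughly like this; this is a sketch only, where the json column name and the $['tickets'] path are illustrative and the exact paths depend on what the Advanced editor generates:

"translator": {
    "type": "TabularTranslator",
    "mappings": [
        {
            "source": { "path": "$['tickets']" },
            "sink": { "name": "json", "type": "String" }
        }
    ]
}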
I use JSON for sending data through an API to clients.
My data is a JSON array of objects; each object in the array has the same type, and the keys are the same for all of them.
About 70% of a request is consumed by repeating the same key names.
Is there a way to send the data without this overhead?
"I know some formats exist, like CSV, but I want to choose a general solution for this problem."
For example, my array is 5 MB as JSON and only 500 KB as CSV.
A simple JSON array:
var people = [
{ firstname:"Micro", hasSocialNetworkSite: false, lastname:"Soft", site:"http://microsoft.com" },
{ firstname:"Face", hasSocialNetworkSite: true, lastname:"Book", site:"http://facebook.com" },
{ firstname:"Go", hasSocialNetworkSite: true, lastname:"ogle", site:"http://google.com" },
{ firstname:"Twit", hasSocialNetworkSite: true, lastname:"Ter", site:"http://twitter.com" },
{ firstname:"App", hasSocialNetworkSite: false, lastname:"Le", site:"http://apple.com" },
];
and the above array in CSV format:
"firstname","hasSocialNetworkSite","lastname","site"
"Micro","False","Soft","http://microsoft.com"
"Face","True","Book","http://facebook.com"
"Go","True","ogle","http://google.com"
"Twit","True","Ter","http://twitter.com"
"App","False","Le","http://apple.com"
You can see from this example how much of the payload the JSON array of objects spends on repeated key names.
Why would using a CSV file not be a 'general solution'?
If your data is tabular you don't really need a hierarchical format like JSON or XML.
You can even shrink your CSV file further by removing the double quotes (they are only needed when a field contains the separator):
firstname,hasSocialNetworkSite,lastname,site
Micro,False,Soft,http://microsoft.com
Face,True,Book,http://facebook.com
Go,True,ogle,http://google.com
Twit,True,Ter,http://twitter.com
App,False,Le,http://apple.com
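A minimal sketch of producing that CSV from the array above in plain JavaScript (toCsv is an illustrative helper name; it assumes no field contains a comma, quote, or newline):

function toCsv(rows) {
  // Take the column names from the first object; all objects share the same keys.
  var headers = Object.keys(rows[0]);
  var lines = [headers.join(",")];
  for (var i = 0; i < rows.length; i++) {
    var row = rows[i];
    lines.push(headers.map(function (h) { return String(row[h]); }).join(","));
  }
  return lines.join("\n");
}

console.log(toCsv(people));
// firstname,hasSocialNetworkSite,lastname,site
// Micro,false,Soft,http://microsoft.com
// ...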
I want to split custom logs like
"2016-05-11 02:38:00.617,userTestId,Key-string-test113321,UID-123,10079,0,30096,128,3"
That log means:
Timestamp, String userId, String setlkey, String uniqueId, long providerId, String itemCode1, String itemCode2, String itemCode3, String serviceType
I tried to make a filter using Ruby:
filter {
  ruby {
    code => "
      fieldArray = event['message'].split(',')
      for field in fieldArray
        result = field
        event[field[0]] = result
      end
    "
  }
}
but I have no idea how to split the log and add a field name to each value, as below:
Timestamp : 2016-05-11 02:38:00.617
userId : userTestId
setlkey : Key-string-test113321
uniqueId : UID-123
providerId : 10079
itemCode1 : 0
itemCode2 : 30096
itemCode3 : 128
serviceType : 3
How can I do this?
Thanks and regards.
You can use the grok filter instead. The grok filter parses the line with a regex, and you can associate each group with a field.
It is possible to parse your log with this pattern:
grok {
  match => {
    "message" => [
      "%{TIMESTAMP_ISO8601:timestamp},%{USERNAME:userId},%{USERNAME:setlkey},%{USERNAME:uniqueId},%{NUMBER:providerId},%{NUMBER:itemCode1},%{NUMBER:itemCode2},%{NUMBER:itemCode3},%{NUMBER:serviceType}"
    ]
  }
}
This will create the fields you wish to have.
Reference: grok patterns on github
To test: Grok constructor
Another solution:
You can use the csv filter, which is even closer to your needs (but I went with the grok filter first since I have more experience with it): Csv filter documentation
The CSV filter takes an event field containing CSV data, parses it, and stores it as individual fields (can optionally specify the names). This filter can also parse data with any separator, not just commas.
I have never used it, but it should look like this:
csv {
  columns => [ "Timestamp", "userId", "setlkey", "uniqueId", "providerId", "itemCode1", "itemCode2", "itemCode3", "serviceType" ]
}
By default, the filter works on the message field with "," as the separator, so there is no need to configure those.
I think that the csv filter solution is better.
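Untested, but a fuller filter block along those lines might look like this, converting providerId to an integer (via the csv filter's convert option) and parsing the Timestamp column into @timestamp with the date filter:

filter {
  csv {
    columns => [ "Timestamp", "userId", "setlkey", "uniqueId", "providerId", "itemCode1", "itemCode2", "itemCode3", "serviceType" ]
    convert => { "providerId" => "integer" }
  }
  date {
    match => [ "Timestamp", "yyyy-MM-dd HH:mm:ss.SSS" ]
  }
}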
I am storing C structures in Couchbase so that I can read these structures back later and process them directly. I am avoiding the steps of
1) C structure -> JSON while storing
and
2) JSON -> C structure while retrieving.
This works well when I use lcb_get() and lcb_set().
But I also have a requirement to hit views using the REST model and the lcb_make_http_request() call.
So I was wondering how lcb_make_http_request() will handle my non-JSON C structure, which is raw (hex) data and may have nulls in between.
Will I still be able to extract and populate my C structure from the data that I get as the HTTP response after calling lcb_make_http_request()?
As WiredPrairie said in his comment, you aren't forced to use JSON and can store C structs, but keep byte order and field alignment in mind when you do so.
When the server detects that your data isn't in JSON format, it will encode it using base64 and set meta.type to "base64" when the document reaches the map function.
You will be able to emit your complete document as the value if you'd like to get it in the HTTP stream, as in this simple map function:
function (doc, meta) {
  if (meta.type == "base64") {
    emit(meta.id, doc);
  }
}
You will get a response like this one (I've formatted it for clarity):
{
  "total_rows": 1,
  "rows": [
    {
      "id": "foo",
      "key": "foo",
      "value": "4KwuAgAAAAA="
    }
  ]
}
This does mean that you must use some JSON parser to extract the "value" attribute from the result and decode it; you will then get exactly the same byte stream that you sent with the SET command.
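A minimal sketch of that extraction step, assuming the cJSON library for parsing and OpenSSL's EVP_DecodeBlock for base64 decoding (both are illustrative choices, not part of libcouchbase; my_struct stands in for whatever structure you stored):

#include <stdlib.h>
#include <string.h>
#include <openssl/evp.h>
#include <cjson/cJSON.h>

struct my_struct {   /* placeholder for the structure stored with lcb_set() */
    int a;
    int b;
};

/* Pull rows[0].value out of the view response body and base64-decode it. */
static int decode_first_row(const char *body, struct my_struct *out)
{
    cJSON *root = cJSON_Parse(body);
    if (root == NULL)
        return -1;

    cJSON *rows = cJSON_GetObjectItem(root, "rows");
    cJSON *row0 = rows ? cJSON_GetArrayItem(rows, 0) : NULL;
    cJSON *val  = row0 ? cJSON_GetObjectItem(row0, "value") : NULL;
    if (val == NULL || !cJSON_IsString(val)) {
        cJSON_Delete(root);
        return -1;
    }

    const char *b64 = val->valuestring;
    size_t b64len = strlen(b64);

    /* EVP_DecodeBlock turns every 4 input chars into 3 output bytes and may
       append up to two zero padding bytes at the end. */
    unsigned char *buf = malloc(3 * (b64len / 4) + 3);
    int n = buf ? EVP_DecodeBlock(buf, (const unsigned char *)b64, (int)b64len) : -1;

    int rc = -1;
    if (n >= (int)sizeof(struct my_struct)) {
        memcpy(out, buf, sizeof(*out));  /* same byte order/alignment as stored */
        rc = 0;
    }
    free(buf);
    cJSON_Delete(root);
    return rc;
}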