Looking to ingest this REST API data into Splunk, but I'm having issues with LINE_BREAKER; I can't seem to find the correct combination for props.conf.
Also, since the data is returned in array format without keys, do I need a script to add the keys to the returned arrays, or can this be achieved within Splunk?
N.B.
The keys are returned in the tail of the response.
REST API call:
{{base_url}}accounts/{{account}}/{{siteid}}/report?dimensions=queryName,queryType,responseCode,responseCached,coloName,origin,dayOfWeek,tcp,ipVersion,querySizeBucket,responseSizeBucket&metrics=queryCount,uncachedCount,staleCount,responseTimeAvg&limit=2
Any help appreciated.
{
"result": {
"rows": 100,
"data": [
{
"dimensions": [
"college.edu",
"A",
"REFUSED",
"uncached",
"EWR",
"192.0.0.0",
"1",
"0",
"4",
"48-63",
"48-63"
],
"metrics": [
1,
1,
0,
16
]
},
{
"dimensions": [
"school.edu",
"A",
"REFUSED",
"uncached",
"EWR",
"192.0.0.0",
"1",
"0",
"4",
"32-47",
"32-47"
],
"metrics": [
1,
1,
0,
10
]
}
],
"data_lag": 0,
"min": {},
"max": {},
"totals": {
"queryCount": 12,
"responseTimeAvg": 37.28936572607269,
"staleCount": 0,
"uncachedCount": 2147541
},
"query": {
"dimensions": [
"queryName",
"queryType",
"responseCode",
"responseCached",
"coloName",
"origin",
"dayOfWeek",
"tcp",
"ipVersion",
"querySizeBucket",
"responseSizeBucket"
],
"metrics": [
"queryCount",
"uncachedCount",
"staleCount",
"responseTimeAvg"
],
"since": "2022-10-17T04:37:00Z",
"until": "2022-10-17T10:37:00Z",
"limit": 100
}
},
"success": true,
"errors": [],
"messages": []
}
Assuming you want the JSON object to be a single event, the LINE_BREAKER setting should be }([\r\n]+){.
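A minimal props.conf sketch under that assumption (the sourcetype name myapi:json is made up):

[myapi:json]
LINE_BREAKER = }([\r\n]+){
SHOULD_LINEMERGE = false
KV_MODE = json
TRUNCATE = 0

SHOULD_LINEMERGE = false makes Splunk rely on LINE_BREAKER alone, and TRUNCATE = 0 keeps long JSON events from being cut off.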
Splunk should have no problems parsing the JSON, but I think there will be problems relating metrics to dimensions because there are multiple sets of data and only one set of keys. Creating a script to combine them seems to be the best option.
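For example, a minimal Python sketch of such a script, zipping the key names from the query block in the tail onto each data row (field paths follow the sample response above):

import json
import sys

resp = json.load(sys.stdin)                       # the raw API response
dim_keys = resp["result"]["query"]["dimensions"]  # key names live in the tail
met_keys = resp["result"]["query"]["metrics"]

for row in resp["result"]["data"]:
    event = dict(zip(dim_keys, row["dimensions"]))
    event.update(zip(met_keys, row["metrics"]))
    print(json.dumps(event))                      # one keyed JSON event per line

Each printed line is then a flat, self-describing event that Splunk can parse with KV_MODE = json and no LINE_BREAKER gymnastics.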
Hopefully I can articulate this question clearly without too much code as it's difficult to extract the pieces from my codebase.
I was observing odd behavior yesterday with useQuery that I can't seem to understand. I think I understand Apollo's cache pretty well but this particular behavior doesn't make sense to me. I have a query that looks something like this:
query {
reservations {
priceBreakdown {
sections {
id
name
total
}
}
}
}
The schema is something like:
type Query {
reservations: [Reservation]
}
type Reservation {
priceBreakdown: PriceBreakdown
}
type PriceBreakdown {
sections: [Section]
}
type Section {
id: String
name: String
total: Float
}
That id on Section is not a proper ID and, in fact, is not unique. It's just a string, and all PriceBreakdowns have a list of Sections that contain the same IDs. I've pointed this out to the backend folks and it's being fixed, but I realize this causes incorrect caching with Apollo, since there will be collisions w.r.t. __typename and id. My confusion comes from how onCompleted is called. I noticed when doing
const { data } = useQuery(myQuery, {
onCompleted: console.log
})
that when the network call returns, all PriceBreakdowns are unique and correct, as they should be. But when onCompleted is called with what I thought would be that same API data, it's different and seems to reflect the cached values. In case that's confusing, here are the two results. The first is straight from the API and the second is the log from onCompleted:
// api results
"data": [
{
"id": "92267",
"price_breakdown": {
"sections": [
{
"name": "Reservation",
"total": "$60.00",
"id": "RESERVATION"
},
{
"name": "Promotions and Fees",
"total": null,
"id": "PROMOTIONS_AND_FEES"
},
{
"name": "Total",
"total": "$51.00",
"id": "HOST_TOTAL"
}
]
}
},
{
"id": "92266",
"price_breakdown": {
"sections": [
{
"name": "Reservation",
"total": "$30.00",
"id": "RESERVATION"
},
{
"name": "Promotions and Fees",
"total": null,
"id": "PROMOTIONS_AND_FEES"
},
{
"name": "Total",
"total": "$25.50",
"id": "HOST_TOTAL"
}
]
}
}
]
// onCompleted log
"data": [
{
"id": "92267",
"price_breakdown": {
"sections": [
{
"name": "Reservation",
"total": "$60.00",
"id": "RESERVATION"
},
{
"name": "Promotions and Fees",
"total": null,
"id": "PROMOTIONS_AND_FEES"
},
{
"name": "Total",
"total": "$51.00",
"id": "HOST_TOTAL"
}
]
}
},
{
"id": "92266",
"price_breakdown": {
"sections": [
{
"name": "Reservation",
"total": "$60.00",
"id": "RESERVATION"
},
{
"name": "Promotions and Fees",
"total": null,
"id": "PROMOTIONS_AND_FEES"
},
{
"name": "Total",
"total": "$51.00",
"id": "HOST_TOTAL"
}
]
}
}
]
As you can see, in the onCompleted log, the Sections that had the same IDs as Sections from the previous record are duplicated, suggesting Apollo is rebuilding the payload from the cache and calling onCompleted with that. Is that what's happening? If I set the fetchPolicy to no-cache, the results are correct, but of course that's just a patch for the problem. I want to better understand Apollo, because I thought I understood it and now I see something unintuitive. I wouldn't have expected onCompleted to be called with something built from the cache. Thanks in advance.
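For what it's worth, until the backend fix lands, the colliding Section objects can be kept from overwriting each other by disabling normalization for that type. A sketch, assuming Apollo Client 3's typePolicies API:

import { InMemoryCache } from "@apollo/client";

const cache = new InMemoryCache({
  typePolicies: {
    // keyFields: false stops Apollo from normalizing Section by __typename + id,
    // so sections with colliding ids are stored inline on their parent instead
    Section: {
      keyFields: false,
    },
  },
});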
We're trying to index a list of items in Solr (8.9.0, schemaless mode), where each item contains one or two arrays of objects, with one or more records per array. The sample below is the JSON we feed the index:
[
{
"id": 8270861,
"type": "Product",
"title": "Stripped T-shirt"
"tags": [{
"tagId": 218,
"tagIcon": "smile,happy",
"tagHelpText": "",
"tagValue": "grand"
},
{
"tagId": 219,
"tagIcon": "frown,sad",
"tagHelpText": "",
"tagValue": "grand"
}],
"keywords": [
{
"keywordId": 742,
"type": "color"
},
{
"keywordId": 743,
"type": "size"
}]
}
]
We run into two problems:
PROBLEM 1:
The output of the Solr query changes the format of the arrays to this (effectively removing the quotes):
...
"tags": [
"{tagIcon=smile,happy, tagHelpText=, tagId=218, tagValue=grand}",
"{tagIcon=frown,sad, tagHelpText=, tagId=219, tagValue=grand}"
],
"keywords": [
"{type=color, keywordId=742}",
"{type=size, keywordId=743}"
],
...
Is there a way to get the format of the arrays to come back the same way they were fed into the index:
"tags": [
{ "tagId": 218, "tagIcon": "smile,happy", "tagHelpText": "", "tagValue": "grand" },
{ "tagId": 219, "tagIcon": "frown,sad", "tagHelpText": "", "tagValue": "grand"}
]
to avoid any conflicts when the value is a comma-separated list. Are we missing some definition adjustments in the schema file? If so, do we need to define the children of those parent keys (e.g. "tags.tagIcon")?
PROBLEM 2:
The index seems to reject an array with a single element. If we feed it the same JSON as above, but with only one entry in the keywords array (or the tags array):
...
"keywords": [
{
"keywordId": 742,
"type": "color"
}]
...
it throws a 400 error: "Unknown operation for the an atomic update: type".
Any suggestions on this would be welcome.
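One hedged workaround for PROBLEM 1 (a sketch under the assumption that you control the feed, not a schema fix): serialize each nested array to a single JSON string before indexing, so Solr stores it verbatim and the client decodes it on the way out. The core name and the use of Python's requests library here are assumptions:

import json
import requests  # any HTTP client works

doc = {
    "id": 8270861,
    "type": "Product",
    "title": "Stripped T-shirt",
    "tags": [
        {"tagId": 218, "tagIcon": "smile,happy", "tagHelpText": "", "tagValue": "grand"},
    ],
}

# Store the nested array as one JSON string so quoting and commas survive,
# then json.loads() it again after querying.
doc["tags"] = json.dumps(doc["tags"])

requests.post(
    "http://localhost:8983/solr/mycore/update?commit=true",  # core name assumed
    json=[doc],
)

This may also sidestep PROBLEM 2, since Solr no longer sees the inner maps, which it appears to be misreading as atomic-update operations (hence the 400).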
How can I get the data out of this array stored in a variant column in Snowflake? I don't care if it's a new table, a view, or a query. There is a second column of type varchar(256) that contains a unique ID.
If you can just help me read the "confirmed" data and the "editorIds" data I can probably take it from there. Many thanks!
An example of the output would be:
UniqueID ConfirmationID EditorID
u3kd9 xxxx-436a-a2d7 nupd
u3kd9 xxxx-436a-a2d7 9l34c
R3nDo xxxx-436a-a3e4 5rnj
yP48a xxxx-436a-a477 jTpz8
yP48a xxxx-436a-a477 nupd
[
{
"confirmed": {
"Confirmation": "Entry ID=xxxx-436a-a2d7-3525158332f0: Confirmed order submitted.",
"ConfirmationID": "xxxx-436a-a2d7-3525158332f0",
"ConfirmedOrders": 1,
"Received": "8/29/2019 4:31:11 PM Central Time"
},
"editorIds": [
"xxsJYgWDENLoX",
"JR9bWcGwbaymm3a8v",
"JxncJrdpeFJeWsTbT"
] ,
"id": "xxxxx5AvGgeSHy8Ms6Ytyc-1",
"messages": [],
"orderJson": {
"EntryID": "xxxxx5AvGgeSHy8Ms6Ytyc-1",
"Orders": [
{
"DropShipFlag": 1,
"FromAddressValue": 1,
"OrderAttributes": [
{
"AttributeUID": 548
},
{
"AttributeUID": 553
},
{
"AttributeUID": 2418
}
],
"OrderItems": [
{
"EditorId": "aC3f5HsJYgWDENLoX",
"ItemAssets": [
{
"AssetPath": "https://xxxx573043eac521.png",
"DP2NodeID": "10000",
"ImageHash": "000000000000000FFFFFFFFFFFFFFFFF",
"ImageRotation": 0,
"OffsetX": 50,
"OffsetY": 50,
"PrintedFileName": "aC3f5HsJYgWDENLoX-10000",
"X": 50,
"Y": 52.03909266409266,
"ZoomX": 100,
"ZoomY": 93.75
}
],
"ItemAttributes": [
{
"AttributeUID": 2105
},
{
"AttributeUID": 125
}
],
"ItemBookAttribute": null,
"ProductUID": 52,
"Quantity": 1
}
],
"SendNotificationEmailToAccount": true,
"SequenceNumber": 1,
"ShipToAddress": {
"Addr1": "Addr1",
"Addr2": "0",
"City": "City",
"Country": "US",
"Name": "Name",
"State": "ST",
"Zip": "00000"
}
}
]
},
"orderNumber": null,
"status": "order_placed",
"submitted": {
"Account": "350000",
"ConfirmationID": "xxxxx-436a-a2d7-3525158332f0",
"EntryID": "xxxxx-5AvGgeSHy8Ms6Ytyc-1",
"Key": "D83590AFF0CC0000B54B",
"NumberOfOrders": 1,
"Orders": [
{
"LineItems": [],
"Note": "",
"Products": [
{
"Price": "00.30",
"ProductDescription": "xxxxxint 8x10",
"Quantity": 1
},
{
"Price": "00.40",
"ProductDescription": "xxxxxut Black 8x10",
"Quantity": 1
},
{
"Price": "00.50",
"ProductDescription": "xxxxx"
},
{
"Price": "00.50",
"ProductDescription": "xxxscount",
"Quantity": 1
}
],
"SequenceNumber": "1",
"SubTotal": "00.70",
"Tax": "1.01",
"Total": "00.71"
}
],
"Received": "8/29/2019 4:31:10 PM Central Time"
},
"tracking": null,
"updatedOn": 1.598736670503000e+12
}
]
So, this is how I'd query that exact JSON assuming the data is in column var in table x:
SELECT x.var[0]:confirmed:ConfirmationID::varchar as ConfirmationID,
f.value::varchar as EditorID
FROM x,
LATERAL FLATTEN(input => var[0]:editorIds) f
;
Since your sample output doesn't match the JSON that you provided, I will assume that this is what you need.
Also, as a note, your JSON includes outer [ ], which indicates that the entire JSON string is inside an array. This is the reason for var[0] in my query. If you have multiple records inside that array, then this won't work as written. In general, you should exclude the outer array and instead load each record into the table separately. I wasn't sure whether you could make that change, so I just wanted to make a note of it.
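As an illustration of loading each record separately (a sketch; the stage and file names are hypothetical), Snowflake's JSON file format can strip the outer array at COPY time, after which var[0] in the query above becomes simply var:

-- STRIP_OUTER_ARRAY makes each element of the outer [ ] its own row
COPY INTO x
FROM @my_stage/orders.json
FILE_FORMAT = (TYPE = 'JSON' STRIP_OUTER_ARRAY = TRUE);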
I am getting a JSON response from the server as below.
[
[{
"ID": 1,
"Date": "11-09-2015",
"Balance": 1496693.00
}, {
"ID": 2,
"Date": "01-10-2015",
"Balance": 1496693.00
}],
[{
"ID": 1,
"Date": "03-09-2000",
"IntAmount": "003.00"
}],
[{
"EmployeeId": "000",
"DesignationName": "deg"
}],
[{
"LoanAmount": "00000.00",
"IntRate": "3.00",
"LoanNo": "56656"
}]
]
I can parse a JSON array that has a name, but in the JSON above the nested arrays have no names.
How do I parse the above JSON into separate arrays?
If you are positive that the data will always come in the stated format, then you can iterate through the result. See the example below:
void main(List<String> args) {
// Define the array of data "object" like this
List<List<Map<String, dynamic>>> arrayOfData = [
[
{"ID": 1, "Date": "11-09-2015", "Balance": 1496693.00},
{"ID": 2, "Date": "01-10-2015", "Balance": 1496693.00}
],
[
{"ID": 1, "Date": "03-09-2000", "IntAmount": "003.00"}
],
[
{"EmployeeId": "000", "DesignationName": "deg"}
],
[
{"LoanAmount": "00000.00", "IntRate": "3.00", "LoanNo": "56656"}
]
];
/*
Iterate through the array of "objects" using forEach,
then, iterate through each resulting array using forEach
*/
arrayOfData.forEach((datasetArray) => datasetArray.forEach((dataset) => print(dataset)));
/*
============== RESULT ========
{ID: 1, Date: 11-09-2015, Balance: 1496693.0}
{ID: 2, Date: 01-10-2015, Balance: 1496693.0}
{ID: 1, Date: 03-09-2000, IntAmount: 003.00}
{EmployeeId: 000, DesignationName: deg}
{LoanAmount: 00000.00, IntRate: 3.00, LoanNo: 56656}
*/
}
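Since the payload actually arrives from the server as a string, you would first decode it with dart:convert. A minimal sketch (the rawResponse literal abbreviates the payload from the question):

import 'dart:convert';

void main() {
  // Abbreviated stand-in for the server response shown in the question
  const rawResponse = '''
  [
    [{"ID": 1, "Date": "11-09-2015", "Balance": 1496693.00}],
    [{"EmployeeId": "000", "DesignationName": "deg"}]
  ]
  ''';

  // jsonDecode returns dynamic; cast each level as you walk it
  final arrayOfData = (jsonDecode(rawResponse) as List)
      .map((inner) => (inner as List).cast<Map<String, dynamic>>())
      .toList();

  for (final datasetArray in arrayOfData) {
    datasetArray.forEach(print);
  }
}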
I need FEMMES.COM to get tokenized as singular + plural forms of the base word FEMME.
Custom Analyzer Config
"analyzers": [ { "#odata.type": "#Microsoft.Azure.Search.CustomAnalyzer", "name": "text_language_search_custom_analyzer", "tokenizer": "text_language_search_custom_analyzer_ms_tokenizer", "tokenFilters": [ "lowercase", "asciifolding" ], "charFilters": [ "html_strip" ] } ], "tokenizers": [ { "#odata.type": "#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer", "name": "text_language_search_custom_analyzer_ms_tokenizer", "maxTokenLength": 300, "isSearchTokenizer": false, "language": "english" } ], "tokenFilters": [], "charFilters": []}
Analyze API call for FEMMES
{ "analyzer": "text_language_search_custom_analyzer", "text": "FEMMES" }
Analyze API response for FEMMES
{ "#odata.context": "https://one-adscope-search-eu-stage.search.windows.net/$metadata#Microsoft.Azure.Search.V2016_09_01.AnalyzeResult", "tokens": [ { "token": "femme", "startOffset": 0, "endOffset": 6, "position": 0 }, { "token": "femmes", "startOffset": 0, "endOffset": 6, "position": 0 } ] }
Analyze API response for FEMMES.COM
{ "#odata.context": "https://one-adscope-search-eu-stage.search.windows.net/$metadata#Microsoft.Azure.Search.V2016_09_01.AnalyzeResult", "tokens": [ { "token": "femmes", "startOffset": 0, "endOffset": 6, "position": 0 }, { "token": "com", "startOffset": 7, "endOffset": 10, "position": 1 } ] }
Analyze API response for FEMMES COM
{ "#odata.context": "https://one-adscope-search-eu-stage.search.windows.net/$metadata#Microsoft.Azure.Search.V2016_09_01.AnalyzeResult", "tokens": [ { "token": "femme", "startOffset": 0, "endOffset": 6, "position": 0 }, { "token": "femmes", "startOffset": 0, "endOffset": 6, "position": 0 }, { "token": "com", "startOffset": 7, "endOffset": 10, "position": 1 } ]}
I think I figured this one out myself after some experimentation. I found the MappingCharFilter could be used to replace . with , before the indexer did the tokenization. This allowed the lemmatization/stemming to work as expected on the terms in question. I need to do more thorough integration tests with our other use cases, but I think this would solve the problem for anybody facing the same type of issue.
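For reference, a sketch of what that could look like in the index definition (the filter name is made up; the mapping follows the "." to "," substitution described above):

"charFilters": [
  {
    "@odata.type": "#Microsoft.Azure.Search.MappingCharFilter",
    "name": "dot_to_comma_char_filter",
    "mappings": [ ".=>," ]
  }
]

and the analyzer would then list it alongside html_strip:

"charFilters": [ "html_strip", "dot_to_comma_char_filter" ]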
My previous answer was not correct. The Azure Search implementation actually applies the language tokenizer BEFORE token filters. This essentially made the WordDelimiter token filter useless in my use case.
What I ended up having to do was to pre-process data BEFORE I uploaded to Azure for indexing. In my C# code, I added some regex logic that would break apart text like FEMMES2017 into FEMMES 2017, before I sent it to Azure. This way, when the text got to Azure, the indexer would see FEMMES by itself and properly tokenize as FEMME and FEMMES using the language tokenizer.
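The post doesn't include that C# snippet, but the idea is roughly the following (a Python sketch of the same regex approach; the function name is made up):

import re

def split_letters_from_digits(text: str) -> str:
    # Insert a space at every letter-to-digit boundary: FEMMES2017 -> FEMMES 2017
    return re.sub(r"(?<=[A-Za-z])(?=[0-9])", " ", text)

print(split_letters_from_digits("FEMMES2017"))  # FEMMES 2017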