Solr indexing nested objects array

We're trying to index a list of items in Solr (8.9.0, schemaless mode); each item contains one or two arrays of objects, with one or more records per array. The sample below is the JSON we feed to the index:
[
{
"id": 8270861,
"type": "Product",
"title": "Stripped T-shirt"
"tags": [{
"tagId": 218,
"tagIcon": "smile,happy",
"tagHelpText": "",
"tagValue": "grand"
},
{
"tagId": 219,
"tagIcon": "frown,sad",
"tagHelpText": "",
"tagValue": "grand"
}],
"keywords": [
{
"keywordId": 742,
"type": "color"
},
{
"keywordId": 743,
"type": "size"
}]
}
]
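For reference, a document array like this is typically posted to a schemaless core with a plain JSON update call. A minimal sketch of such a call (the core name "products", the /update handler and commit=true are assumptions for illustration, not details taken from the question):
# Minimal sketch of posting the sample JSON array to a schemaless Solr core.
# The core name ("products"), the /update handler and commit=true are assumptions.
import json
import requests

with open("items.json") as f:      # the JSON array shown above
    docs = json.load(f)

resp = requests.post(
    "http://localhost:8983/solr/products/update?commit=true",
    headers={"Content-Type": "application/json"},
    data=json.dumps(docs),
)
print(resp.status_code)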
2 problems we run into:
PROBLEM 1:
The output of the Solr query changes the format of the arrays to this (each object is collapsed into a single string, effectively removing the quotes):
...
"tags": [
"{tagIcon=smile,happy, tagHelpText=, tagId=218, tagValue=grand}",
"{tagIcon=frown,sad, tagHelpText=, tagId=219, tagValue=grand}"
],
"keywords": [
"{type=color, keywordId=742}",
"{type=size, keywordId=743}"
],
...
Is there a way to get the arrays to come back in the same format as they were fed into the index:
"tags": [
{ "tagId": 218, "tagIcon": "smile,happy", "tagHelpText": "", "tagValue": "grand" },
{ "tagId": 219, "tagIcon": "frown,sad", "tagHelpText": "", "tagValue": "grand"}
]
to avoid any conflicts when the value is a comma-separated list? Are we missing some definition adjustments in the schema file? If so, do we need to define the children of those parent keys (e.g. "tags.tagIcon")?
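One direction worth noting, purely as an assumption about a possible approach rather than something verified against this setup: if the tags and keywords are indexed as true nested child documents instead of map-valued fields, Solr 8 can reassemble them on the way out with the [child] document transformer. A rough sketch of such a query (core name assumed):
# Rough sketch: ask Solr to return nested child documents re-assembled under
# their parents via the [child] doc transformer. Assumes the objects were
# indexed as nested documents (not flattened into map strings) and that the
# core is named "products"; both are assumptions for illustration.
import requests

params = {
    "q": "type:Product",
    "fl": "*,[child]",   # include child documents in each parent result
}
resp = requests.get("http://localhost:8983/solr/products/select", params=params)
print(resp.json()["response"]["docs"])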
PROBLEM 2:
The index seems to reject an array with a single element. If we feed it the same JSON as above, but with only one entry in the keywords array (or the tags array):
...
"keywords": [
{
"keywordId": 742,
"type": "color"
}]
...
it throws an error: code 400, "Unknown operation for the an atomic update: type".
Any suggestions on this would be welcome.
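For reference, the quoted message appears to be the error Solr raises when it interprets a map-valued field as atomic-update syntax, where the inner keys are expected to be operations such as set, add or remove; under that reading the key "type" looks like an unknown operation. A minimal illustration of the atomic-update shape Solr seems to be expecting (core name and field values are made up):
# Solr's JSON atomic-update syntax: the inner map's keys are operations
# ("set", "add", "remove", "inc"), not field names. This is only to
# illustrate why the key "type" gets reported as an unknown operation.
# Core name and values are made up for the example.
import json
import requests

atomic_update = [{
    "id": 8270861,
    "title": {"set": "Stripped T-shirt (renamed)"},   # "set" replaces the value
    "keywords": {"add": ["color", "size"]},           # "add" appends to a multi-valued field
}]

requests.post(
    "http://localhost:8983/solr/products/update?commit=true",
    headers={"Content-Type": "application/json"},
    data=json.dumps(atomic_update),
)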

Related

Reading data from MongoDB that contains array using Talend

I have a collection in MongoDB that contains one field that is an array.
Referring to the data below, the field 'Courses' is the array in question.
The JSON format of the data is like this:
{
"_id": {
"$oid": "60eb59b98a970a20865142e8"
},
"Name": "Sadia",
"Age": 24,
"Institute": "IBA",
"Courses": [{
"Name": "ITP",
"Grade": "A-"
}, {
"Name": "OOP",
"Grade": "A-"
}]
}
I am aware that there is a way to do this when the field is an object, but I could not find a way to read this data using Talend since it contains an array.

Ruby - parse JSON file with nested arrays to ruby hash without data loss

I have a file1.json with a structure like this:
[
{
"uri": "features/hdp.feature",
"id": "as-a-user-i-want-to-use-house-detailed-page",
"keyword": "Feature",
"name": "As a user I want to use house detailed page",
"description": "",
"line": 2,
"tags": [
{
"name": "#hdp",
"line": 1
}
],
"elements": [
{
As you can see, it is an array with nested key:value pairs and other arrays. I need to convert it to a Ruby hash, but when I perform JSON.parse(file1) it creates an array (http://prntscr.com/lqio6r) of Ruby hashes, arrays and so on. If I perform JSON.parse(file1).reduce Hash.new, :merge or JSON.parse(file1).reduce Hash.new, :update - as one of the answers on StackOverflow suggested - the resulting hash loses about 60% of the .json content (presumably because the array elements share the same keys, so merging lets later hashes overwrite earlier ones). Can you please advise how I can convert the JSON file to a Ruby hash without any data loss?
UPD - not truncated array - https://gist.githubusercontent.com/M1khah/3337507e3ca1544e6098bc726bca90cb/raw/c8262ad753bd0eebf1180e111acd016ffc07d1a5/gistfile1.txt
A hash with hashes - something like this instead of an array with nested hashes:
{
{
"uri": "features/hdp.feature",
"id": "as-a-user-i-want-to-use-house-detailed-page",
"keyword": "Feature",
"name": "As a user I want to use house detailed page",
"description": "",
"line": 2,
"tags": [
{
"name": "#hdp",
"line": 1
}
],
"elements": [
{
}

Apache Nifi: Parse data with UpdateRecord Processor

I'm trying to parse some data in NiFi (1.7.1) using the UpdateRecord processor.
The original data are JSON files that I would like to convert to Avro, based on a schema.
The Avro conversion is OK, but in that conversion I also need to map one array element from the JSON data to a different structure in Avro.
This is a sample data of the input json:
{ "geometry" : {
"coordinates" : [ [ 4.963087975800593, 45.76365595859971 ], [ 4.962874487781098, 45.76320922779652 ], [ 4.962815443439148, 45.763116079159374 ], [ 4.962744732112515, 45.763010484202866 ], [ 4.962096825239138, 45.762112721939246 ] ]} ...}
This is its schema (specified in the RecordReader):
{ "type": "record",
"name": "features",
"fields": [
{
"name": "geometry",
"type": {
"type": "record",
"name": "geometry",
"fields": [
{
"name": "coordinatesJson",
"type": {
"type": "array",
"items": {
"type": "array",
"items": "double"
}
}
},
]
}
},
....
]
}
As you can see, coordinates is an array of arrays.
And I need to parse those data to Avro, based on this schema (specified in RecordWriter):
{
"name": "outputdata",
"type": "record",
"fields": [
{"name": "coordinatesAvro",
"type": {
"type": "array",
"items" : {
"type" : "record",
"name" : "coordinatesAvro",
"fields" : [ {
"name" : "X",
"type" : "double"
}, {
"name" : "Y",
"type" : "double"
} ]
}
}
},
.....
]
}
The problem here is that I'm not able to map from coordinatesJson to coordinatesAvro using RecordPath functions.
I tried several mappings, like:
Property                      Value
/coordinatesJson[0..-1]/X     /geometry/coordinatesAvro[*][0]
/coordinatesJson[0..-1]/Y     /geometry/coordinatesAvro[*][1]
It should be a pretty straightforward parsing step, but as I said, I've been going in circles trying to achieve this for a while.
Any help would be really appreciated.
When I run into something like that, I do the following:
1) Transform the JSON into JSON with the structure I need (in your case: coordinatesAvro) using an ExecuteScript processor. I have used ECMAScript because you can simply parse the JSON and work with the objects (transform them).
2) ConvertJsonToAvro with one common schema (coordinatesAvro in your case) for both the Reader and the Writer.
This works very well and I have used it on big data cases. It is one possible resolution for your problem.
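To make the first step concrete, the reshaping the script has to perform is just turning each two-element inner array into an {X, Y} record. A sketch of that logic in Python, purely to illustrate the transformation (the answer above used ECMAScript; ExecuteScript also supports Jython, and the field names are taken from the schemas in the question):
# Sketch of the reshaping step only: turn geometry.coordinates (an array of
# [x, y] pairs) into a list of {"X": ..., "Y": ...} records matching the
# coordinatesAvro schema. This mirrors what the ExecuteScript body would do;
# it is not a ready-made NiFi script.
import json

def to_coordinates_avro(flowfile_text):
    data = json.loads(flowfile_text)
    pairs = data["geometry"]["coordinates"]
    data["coordinatesAvro"] = [{"X": x, "Y": y} for x, y in pairs]
    del data["geometry"]          # drop the original structure if no longer needed
    return json.dumps(data)

sample = '{"geometry": {"coordinates": [[4.9630, 45.7636], [4.9628, 45.7632]]}}'
print(to_coordinates_avro(sample))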

Swagger array of strings without name

Currently I am trying to create a Swagger file for my software.
Now I would like to create a definition for a timeRange.
My problem is that this array looks like this:
timeRange: {
"2016-01-15T09:00:00.000Z", // this is the start date
"2017-01-15T09:00:00.000Z" // this is the end date
}
How can I create an example value that works out of the box?
It is an "array of strings" with a minimum of two.
"timeRange": {
"type": "array",
"items": {
"type": "string",
"example": "2017-01-15T09:00:00.000Z,2017-01-15T09:00:00.000Z"
}
}
This generates an example like this:
"timeRange": [
"2017-01-15T09:00:00.000Z,2017-01-15T09:00:00.000Z"
]
This example does not work, because it is an array and not an object.
Altogether:
How can I provide an example value that consists of two different strings (without names)?
Hope you can help me!
Cheers!
timeRange: {
"2016-01-15T09:00:00.000Z", // this is the start date
"2017-01-15T09:00:00.000Z" // this is the end date
}
is not valid JSON – "timeRange" needs to be enclosed in quotes, and the object/array syntax should be different.
If using the object syntax {}, the values need to be named properties:
"timeRange": {
"start_date": "2016-01-15T09:00:00.000Z",
"end_date": "2017-01-15T09:00:00.000Z"
}
Otherwise timeRange needs to be an [] array:
"timeRange": [
"2016-01-15T09:00:00.000Z",
"2017-01-15T09:00:00.000Z"
]
In the first example ({} object), your Swagger would look as follows, with a separate example for each named property:
"timeRange": {
"type": "object",
"properties": {
"start_date": {
"type": "string",
"format": "date-time",
"example": "2016-01-15T09:00:00.000Z"
},
"end_date": {
"type": "string",
"format": "date-time",
"example": "2017-01-15T09:00:00.000Z"
}
},
"required": ["start_date", "end_date"]
}
In case of an [] array, you can specify an array-level example that is a multi-item array:
"timeRange": {
"type": "array",
"items": {
"type": "string",
"format": "date-time"
},
"example": [
"2016-01-15T09:00:00.000Z",
"2017-01-15T09:00:00.000Z"
]
}

Issue when deleting the items from the array in document of MongoDB

I am inserting log items into a document in the form of an array. I have restricted the document size to 5 MB to make sure it does not grow beyond that.
Here one document contains one array, and all the log items are stored in that array. Let's say I have 500 log items totalling 5 MB stored in one document in the form of an array.
When I delete 497 log items, the document shows the remaining 3 log items, but when I try to delete one of those 3 items, the entire document is deleted. I don't know what is happening.
Does the array in the document need to contain some minimum amount of data?
Note: I am restricting the document size at the application level.
Here is the sample data:
activityLogDetails:
[{
"activityLog": {
"acctId": 1,
"info1": {
"itemName": "-",
"value": "-"
},
"info2": {
"itemName": "-",
"value": "-"
},
"errorCode": "",
"internalInformation": "",
"kind": "Infomation",
"loginId": "0",
"opeLogId": "G1_1",
"operation": "startDiscovery",
"result": "normal",
"targetId": "1",
"timestamp": "1470980265729",
"undoFlag": "false"
}
},{
"activityLog": {
"acctId": 2,
"info1": {
"itemName": "-",
"value": "-"
},
"info2": {
"itemName": "-",
"value": "-"
},
"errorCode": "",
"internalInformation": "",
"kind": "Infomation",
"loginId": "0",
"opeLogId": "G1_1",
"operation": "startDiscovery",
"result": "normal",
"targetId": "1",
"timestamp": "1470980265729",
"undoFlag": "false"
}
},
etc....]
Delete Query:
db.test.remove({'activityLogDetails.activityLog.acctId': {$gt: 2}})
Could any body tell me what could be the issue?
What you are doing in your query will remove the whole document, because remove() deletes entire matching documents rather than individual array elements.
Try the following query using $pull:
db.test.updateMany(
{'activityLogDetails.activityLog.acctId':{$gt:2}},
{$pull:{activityLogDetails:{'activityLog.acctId':{$gt:2}}}})
Refer to $pull for more info on how to use it.
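If the deletion is driven from application code rather than the mongo shell, the same $pull update can be issued through a driver. A rough PyMongo sketch (the connection string, database name and collection name are assumptions):
# Rough PyMongo equivalent of the shell update above; connection string,
# database name and collection name are assumptions for illustration.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
coll = client["testdb"]["test"]

result = coll.update_many(
    {"activityLogDetails.activityLog.acctId": {"$gt": 2}},
    {"$pull": {"activityLogDetails": {"activityLog.acctId": {"$gt": 2}}}},
)
print(result.modified_count, "documents updated")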
