Combine JSON with same value into JSON array - Scala

I have converted a dataframe with columns email, account, and id into JSON using toJSON. Each row is a JSON string that looks like: {"email": "xyz", "account": "pqr", "id": "1"}.
The id field is not unique, and I want to group these rows so that each row becomes a JSON array of all the JSON objects that share the same id value.
For example, one row would look like: [{"email": "xyz", "account": "pqr", "id": "1"},{"email": "abc", "account": "lmn", "id": "1"}]
After this, I want to load the result into another dataframe, user, which has columns id and user.
The JSON array for each id should end up in the user dataframe next to the matching id.
Output rows would look like: | 1 | [{"email": "xyz", "account": "pqr", "id": "1"},{"email": "abc", "account": "lmn", "id": "1"}] |
Can someone suggest how I can do this efficiently without exploding all the arrays multiple times?

I'm unsure which JSON library you are using, so I'd recommend converting to a case class that has an id field. You could then group by the id field and insert into your user dataframe, converting each group of rows to JSON.
Something along the lines of...
case class Row(email: String, account: String, id: String)

val rows: List[Row] = ??? // converted from your dataframe

rows.groupBy(_.id)              // Map[String, List[Row]]
  .foreach { case (id, group) =>
    // insert into the user dataframe, converting `group` to a JSON array
  }
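If the data lives in a Spark DataFrame, you could also do the grouping in Spark itself and avoid collecting rows to the driver. A minimal sketch, assuming Spark SQL and an input DataFrame named df with columns email, account, and id (df is an assumption matching the question; untested):

import org.apache.spark.sql.functions.{collect_list, struct, to_json}

// Serialize each row to a JSON string, then collect the strings per id.
// `df` is assumed to be the original DataFrame from the question.
val user = df
  .withColumn("user", to_json(struct(df("email"), df("account"), df("id"))))
  .groupBy("id")
  .agg(collect_list("user").as("user"))
// One row per id: | 1 | [{"email": "xyz", ...}, {"email": "abc", ...}] |

Because collect_list aggregates directly, no explode is needed at any point.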

Related

JSON Response - Accessing objects in an array

I'm attempting to access an object in a JSON response, but I'm not sure how. How can I access ID 11 using rest-assured, where ObjID1 and ObjID2 are unique UUIDs?
"ObjID1": [
{
"ID": "11",
"NAME": "XYZ",
"GENDER": "M"
}
]
"ObjID2": [
{
"ID": "12",
"NAME": "Z",
"GENDER": "F"
}
]
To assert the element's value you can use:
then().body("ObjID1.ID[0]", equalTo("11"))
Indexing the ID field with [0] gets the ID of the first JSON object in the array.
If you want to get this value for further processing, you can extract it like this:
JsonPath path = JsonPath.from(jsonString); // from() also accepts a File
List<HashMap<String, Object>> listOfJsonObjects = path.get("ObjID1");
This parses the JSON and, via the path.get method, stores the array of JSON objects in a list of HashMaps; each element in the list is one JSON object.
To access the first JSON object you can use:
HashMap<String, Object> jsonObject = listOfJsonObjects.get(0);
and then, using the usual HashMap methods, you can get a specific element of the JSON object like this:
jsonObject.get("ID");
The above will return "11".
Note that you will have to cast to String to get the value: values in the HashMap are typed as Object, because JSON objects in the array may contain nested arrays or objects.
String firstId = (String) jsonObject.get("ID");
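For completeness, the nested value can also be read with a single GPath expression instead of walking the list of HashMaps. A small sketch, written in Scala to match the rest of this page (the calls are identical from Java; the io.restassured package assumes rest-assured 3.x or later, and the inline JSON is just the example above):

import io.restassured.path.json.JsonPath

// Address the nested field directly with a GPath expression.
val json = """{"ObjID1": [{"ID": "11", "NAME": "XYZ", "GENDER": "M"}]}"""
val firstId: String = JsonPath.from(json).getString("ObjID1[0].ID")  // "11"

getString returns the value as a String, so no manual cast is needed.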

How to transform a JSON array nested inside an object inside another array in Postgres?

I'm using Postgres 9.6 and have a JSON field called credits with the following structure: a list of credits, each with a position and multiple people who can be in that position.
[
  {
    "position": "Set Designers",
    "people": [
      "Joe Blow",
      "Tom Thumb"
    ]
  }
]
I need to transform the nested people array, which currently holds just strings representing names, into objects that have a name and an image_url field, like this:
[
  {
    "position": "Set Designers",
    "people": [
      { "name": "Joe Blow", "image_url": "" },
      { "name": "Tom Thumb", "image_url": "" }
    ]
  }
]
So far I've only been able to find decent examples of doing this on either the parent JSON array or on an array field nested inside a single JSON object.
This is all I've managed, and even it mangles the result:
UPDATE campaigns
SET credits = (
  SELECT jsonb_build_array(el)
  FROM jsonb_array_elements(credits::jsonb) AS el
)::jsonb;
Create an auxiliary function to simplify the rather complex operation:
create or replace function transform_my_array(arr jsonb)
returns jsonb language sql as $$
select case when coalesce(arr, '[]') = '[]' then '[]'
else jsonb_agg(jsonb_build_object('name', value, 'image_url', '')) end
from jsonb_array_elements(arr)
$$;
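For intuition, the function maps each bare name to an object with a name and an empty image_url, and turns a missing or empty array into an empty array. The same transform in plain Scala — a sketch with a hypothetical Person case class, just to illustrate the shape of the operation:

// Mirrors transform_my_array: null/empty input -> empty list,
// otherwise each name string becomes {name, image_url: ""}.
case class Person(name: String, image_url: String = "")

def transformMyArray(names: Option[List[String]]): List[Person] =
  names.getOrElse(Nil).map(Person(_))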
With the function the update is not so horrible:
update campaigns
set credits = (
select jsonb_agg(jsonb_set(el, '{people}', transform_my_array(el->'people')))
from jsonb_array_elements(credits::jsonb) as el
)::jsonb
;

Querying an array within an array with Postgres JSONB query

I have some JSON in a field named model in my Postgres 9.4 database, and I want to find rows where the given name is a certain value. The JSON structure is as follows:
{
  "resourceType": "Person",
  "id": "8a7b72b1-49ec-43e5-bd21-bc62674d9875",
  "name": [
    {
      "family": [
        "NEWMAN"
      ],
      "given": [
        "JOHN"
      ]
    }
  ]
}
So I tried this: SELECT * FROM current WHERE model->'name' #> '{"given":["JOHN"]}'; (as well as various other guesses) but that does not match the above data. How should I do this?
Use the function jsonb_array_elements():
select t.*
from current t,
     jsonb_array_elements(model->'name') names
where names->'given' ? 'JOHN'
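Here jsonb_array_elements() unnests the name array into one row per object, and the ? operator checks whether the given array contains the string 'JOHN'. The predicate itself is easy to see in plain Scala — a sketch with a hypothetical Name case class, only to illustrate the logic:

case class Name(family: List[String], given: List[String])

// A row matches if any name object's given array contains "JOHN".
def matches(names: List[Name]): Boolean =
  names.exists(_.given.contains("JOHN"))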

jq: select only an array which contains element A but not element B

My data is a series of JSON arrays. Each array has one or more elements with name and id keys:
[
  {
    "name": "first_source",
    "id": "abcdef"
  },
  {
    "name": "second_source",
    "id": "ghijkl"
  },
  {
    "name": "third_source",
    "id": "opqrst"
  }
]
How, using jq, do I select only the arrays which contain an element with "first_source" as the name value, but which don't contain "second_source" as the name value of any element?
This only returns the matching element, not the whole array, for further processing:
jq '.[] | select(.name == "first_source")'
But I clearly need to return the entire array for my scenario to work.
You can use this filter:
select(
  (map(.name == "first_source") | any) and
  (map(.name != "second_source") | all)
)
You need to test all the elements of an array for the existence of the names. You can do that by mapping each object to your condition and using the any or all filter as appropriate.
Here, you want to see if any item is named "first_source" and all items are not named "second_source".
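The same any/all combination reads naturally in plain Scala — a sketch with a hypothetical Source case class, assuming each array has been parsed:

case class Source(name: String, id: String)

// Keep an array only if some element is named "first_source"
// and no element is named "second_source".
def keep(arr: List[Source]): Boolean =
  arr.exists(_.name == "first_source") && arr.forall(_.name != "second_source")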

What is the sequence of fields in a result document

Does Solr maintain the sequence of fields (dynamic fields) in a result document, i.e. the same sequence used to index the document?
For example:
Consider the following record being indexed
School_txt, Class_txt, Section_txt
So when I get this document back as a result, will the sequence of fields be maintained, or can it be random, like Class_txt, School_txt, Section_txt?
If it can be random, how can I preserve the sequence of fields?
Yes, the sequence of the fields is maintained (at least with 4.9.0) for each document. This is also true for multiValued fields, where the values are returned in the same sequence as they were added (which is useful if you want to merge two fields into a separate value later). Here's an example where I rotated the field sequence while indexing:
{
  "id": "1",
  "School_txt": "School",
  "Class_txt": "Class",
  "Section_txt": "Section1",
  "_version_": 1473987528354693000
},
{
  "id": "2",
  "Class_txt": "School2",
  "Section_txt": "Class2",
  "School_txt": "Section2",
  "_version_": 1473987528356790300
},
{
  "id": "3",
  "Section_txt": "School3",
  "School_txt": "Class3",
  "Class_txt": "Section3",
  "_version_": 1473987528356790300
}
