Querying an array within an array with a Postgres JSONB query

I have some JSON in a field named model in my Postgres 9.4 database, and I want to find rows where the given name is a certain value. The JSON structure is as follows:
{
"resourceType": "Person",
"id": "8a7b72b1-49ec-43e5-bd21-bc62674d9875",
"name": [
{
"family": [
"NEWMAN"
],
"given": [
"JOHN"
]
}
]
}
So I tried this: SELECT * FROM current WHERE model->'name' #> '{"given":["JOHN"]}'; (as well as various other guesses) but that does not match the above data. How should I do this?

Use the function jsonb_array_elements():
select t.*
from current t,
jsonb_array_elements(model->'name') names
where names->'given' ? 'JOHN'
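To make the logic of that query concrete, here is a minimal Python sketch (a model of the behaviour, not Postgres itself). It assumes the rows are already loaded as dictionaries; the function name and the sample rows are made up for illustration. Each element of model->'name' is expanded in turn, and a row is kept when the wanted string is a top-level element of that entry's given array.

```python
def rows_with_given_name(rows, wanted):
    """Mimic: FROM current t, jsonb_array_elements(model->'name') names
    WHERE names->'given' ? wanted."""
    matches = []
    for row in rows:
        # jsonb_array_elements yields one output row per element of the array
        for name in row["model"].get("name", []):
            # The ? operator tests for a top-level string inside a JSON array
            if wanted in name.get("given", []):
                matches.append(row)
    return matches


# Hypothetical sample data shaped like the document in the question
sample_rows = [
    {"model": {"resourceType": "Person",
               "name": [{"family": ["NEWMAN"], "given": ["JOHN"]}]}},
    {"model": {"resourceType": "Person",
               "name": [{"family": ["SMITH"], "given": ["JANE"]}]}},
]
print(len(rows_with_given_name(sample_rows, "JOHN")))  # prints 1
```

As in the SQL, a row whose name array contains two matching entries is returned twice, so deduplicate (for example with DISTINCT in the query) if that matters.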

Related

Full Text Search in OrientDB JSON Data

I have the following data in OrientDB 3.0.27, where some of the values are JSON arrays and some are strings:
{
"@type": "d",
"@rid": "#57:0",
"@version": 2,
"@class": "abc_class",
"user_name": [
"7/1 LIBOR Product"
],
"user_Accountability": [],
"user_Rollout_32_date": [],
"user_Brands": [
"AppNet"
],
"user_lastModificationTime": [
"2019-11-27 06:40:35"
],
"user_columnPercentage": [
"0.00"
],
"user_systemId": [
"06114a87-a099-0c30c60b49c4"
],
"user_lastModificationUser": [
"system"
],
"system_type": "Product",
"user_createDate": [
"2017-10-27 09:58:42"
],
"system_modelId": "bian_model",
"user_parent": [
"a12a41bd-af6f-0ca028af480d"
],
"user_Strategic_32_value": [],
"system_oeId": "06114a87-a099-0c30c60b49c4",
"user_description": [],
"@fieldTypes": "user_name=e,user_Accountability=e,user_Rollout_32_date=e,user_Brands=e,user_lastModificationTime=e,user_columnPercentage=e,user_systemId=e,user_lastModificationUser=e,user_createDate=e,user_parent=e,user_Strategic_32_value=e,user_description=e"
}
I have tried the following queries:
select * from `abc_class` where any() = ["AppNet"] limit 2;
select * from `abc_class` where any() like '%a099%' limit 2;
Both of the above queries work because they respect the data type of the field.
I want to run a contains query that searches ANY field of ANY data type (string, number, JSON array, etc.), more like a full-text search.
select * from `abc_class` where any() like '%AppNet%' limit 2;
The above query doesn't work because the real value is inside a JSON array. I have tried almost everything from the filtering section of the documentation.
How can I achieve full-text-search-like functionality with the existing data?
EDIT #1
After more research, I'm now able to at least convert the array value into a string and then run the like operator on it, as below:
select * from `abc_class` where user_name.asString() like '%LIBOR%'
However, using any().asString() doesn't return any results:
select * from `abc_class` where any().asString() like '%LIBOR%'
If the above query can be enhanced somehow to query any column as a string, then the problem can be resolved.
If all the column values need to be searched, we can create a JSON object of the full row data and convert it into a string.
Then query the string with the like keyword, as follows:
select * from `abc_class` where @this.toJSON().asString() like '%LIBOR%'
If we convert with @this.asString() directly, we get the count of array elements instead of the real data inside the arrays, as below:
abc_class#57:4{system_modelId:model,system_oeId:14f4b593-a57d-4d37ad070a10,system_type:Product,user_lastModificationUser:[1],user_name:[1],user_description:[0],user_Accountability:[0],user_lastModificationTime:[1],user_Rollout_32_date:[0],user_Strategic_32_value:[0],user_createDate:[1],user_Brands:[0],user_parent:[1],user_systemId:[1],user_columnCompletenessPercentage:[1]} v2
Therefore, we need to convert first into JSON and then into a string to query the full record, using @this.toJSON().asString().
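The idea of serializing the whole record before searching can be sketched in Python (a model of the approach, not OrientDB code). The trimmed-down record and the helper name are assumptions for illustration.

```python
import json

# Hypothetical record, reduced to a few of the fields from the question
record = {
    "user_name": ["7/1 LIBOR Product"],
    "user_Brands": ["AppNet"],
    "system_type": "Product",
    "user_description": [],
}


def matches_anywhere(rec, needle):
    """Rough equivalent of: toJSON().asString() like '%needle%'."""
    # Serializing the whole record puts array contents into the searchable text
    return needle in json.dumps(rec)


print(matches_anywhere(record, "LIBOR"))   # True
print(matches_anywhere(record, "AppNet"))  # True
print(matches_anywhere(record, "Oracle"))  # False
```

One consequence of this design is that field names are part of the serialized text, so a search term such as user_Brands matches every record that has that field.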
References:
https://orientdb.com/docs/last/sql/SQL-Methods.html
https://orientdb.com/docs/last/sql/SQL-Where.html
https://orientdb.com/docs/last/sql/SQL-Syntax.html

Combine JSON with same value into JSON array - Scala

I have converted a dataframe with columns email, account, and id into JSON using toJSON. Each row is a JSON object that looks like: {"email": "xyz", "account": "pqr", "id": "1"}.
The id field is not unique, and I want to combine these JSON objects into arrays such that each row is an array of the JSON objects with the same id value.
For example, one row would look like: [{"email": "xyz", "account": "pqr", "id": "1"},{"email": "abc", "account": "lmn", "id": "1"}]
After this, I want to populate this JSON array into another dataframe, user, which has columns id and user.
The JSON array for each matching id should end up in the user dataframe.
The output would have rows like: | 1 | [{"email": "xyz", "account": "pqr", "id": "1"},{"email": "abc", "account": "lmn", "id": "1"}] |
Can someone suggest how I can do this efficiently without exploding all the arrays multiple times?
I'm unsure which JSON library you are using, so I'd recommend converting to a case class that has an id field. You can then group by the id field and insert into your user dataframe, converting the grouped rows to JSON.
Something along the lines of...
case class Row(email: String, account: String, id: String)
val rows: List[Row] = ??? // converted from your dataframe
rows.groupBy(_.id)
  .map { case (id, group) =>
    // insert into user dataframe. Convert group to JSON
  }
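The grouping step above can be sketched in plain Python (not Scala or Spark), assuming the rows are already available as dictionaries. The sample rows and the function name are made up for illustration.

```python
import json
from collections import defaultdict

# Hypothetical rows, as they might look after converting the dataframe
rows = [
    {"email": "xyz", "account": "pqr", "id": "1"},
    {"email": "abc", "account": "lmn", "id": "1"},
    {"email": "def", "account": "stu", "id": "2"},
]


def group_as_json_arrays(records):
    """Return {id: JSON array string of all records sharing that id}."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec["id"]].append(rec)
    return {key: json.dumps(group) for key, group in grouped.items()}


user_column = group_as_json_arrays(rows)
print(user_column["1"])
```

Each value in user_column is the JSON array that would go into the user column for the matching id. The records are visited once, so nothing has to be exploded and regrouped.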

How to transform a JSON array nested inside an object inside another array in Postgres?

I'm using Postgres 9.6 and have a JSON field called credits with the following structure: a list of credits, each with a position and multiple people who can be in that position.
[
{
"position": "Set Designers",
"people": [
"Joe Blow",
"Tom Thumb"
]
}
]
I need to transform the nested people array, whose elements are currently just strings representing names, into objects that have a name and an image_url field, like this:
[
{
"position": "Set Designers",
"people": [
{ "name": "Joe Blow", "image_url": "" },
{ "name": "Tom Thumb", "image_url": "" }
]
}
]
I've only been able to find decent examples of doing this either on the parent JSON array or on an array field nested inside a single JSON object.
This is all I've been able to manage so far, and even it mangles the result:
UPDATE campaigns
SET credits = (
SELECT jsonb_build_array(el)
FROM jsonb_array_elements(credits::jsonb) AS el
)::jsonb
;
Create an auxiliary function to simplify the rather complex operation:
create or replace function transform_my_array(arr jsonb)
returns jsonb language sql as $$
    select case when coalesce(arr, '[]') = '[]' then '[]'
        else jsonb_agg(jsonb_build_object('name', value, 'image_url', '')) end
    from jsonb_array_elements(arr)
$$;
With the function the update is not so horrible:
update campaigns
set credits = (
    select jsonb_agg(jsonb_set(el, '{people}', transform_my_array(el->'people')))
    from jsonb_array_elements(credits::jsonb) as el
)::jsonb
;
Working example in rextester.
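To show what the function and the update compute, here is a minimal Python sketch of the same transformation on an in-memory value (a model of the logic, not a replacement for the SQL). The Python function names are assumptions chosen to mirror the SQL.

```python
def transform_people(people):
    """Counterpart of transform_my_array: each name becomes an object."""
    # A missing or empty array becomes an empty list, as in the SQL coalesce
    return [{"name": name, "image_url": ""} for name in (people or [])]


def transform_credits(credits):
    """Counterpart of the UPDATE: rewrite 'people' inside every credit."""
    return [{**credit, "people": transform_people(credit.get("people"))}
            for credit in credits]


credits = [{"position": "Set Designers", "people": ["Joe Blow", "Tom Thumb"]}]
print(transform_credits(credits))
```

The outer list comprehension plays the role of jsonb_agg over jsonb_array_elements, and the dictionary merge plays the role of jsonb_set on the people key.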

Construct unique arrays from nested array values with common parents

This is likely close to the question "JQ: Nested JSON transformation", but I wasn't able to get my head around it.
Sample JSON:
"value": [
{
"FeatureStatus": [
{
"FeatureName": "Sway1",
"FeatureServiceStatus": "ServiceOperational"
},
{
"FeatureName": "Sway2",
"FeatureServiceStatus": "ServiceDegraded"
}
],
"Id": "SwayEnterprise"
},
{
"FeatureStatus": [
{
"FeatureName": "yammerfeatures",
"FeatureServiceStatus": "ServiceOperational"
}
],
"Id": "yammer"
}
]
What I want to do is create an output with jq that results in the following:
{"Sway":"Sway1":"ServiceOperational"},
{"Sway":"Sway2":"ServiceDegraded"},
{"Yammer":"yammerfeatures":"ServiceOperational"}
My various attempts end up either with thousands of non-unique combinations (i.e., Yammer with Sway status) or with only one Id with x number of FeatureServiceStatus values.
Any pointers would be greatly appreciated. I've gone through the tutorial and the cookbook. I am perhaps 2.5 days into using jq.
Assuming that the enclosing braces have been added to make the input valid JSON, the filter:
.value[]
| [.Id] + (.FeatureStatus[] | [ .FeatureName, .FeatureServiceStatus ])
produces:
["SwayEnterprise","Sway1","ServiceOperational"]
["SwayEnterprise","Sway2","ServiceDegraded"]
["yammer","yammerfeatures","ServiceOperational"]
You can then easily reformat this as desired.
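For readers less familiar with jq, the filter corresponds to two nested loops. The sketch below is in Python rather than jq, with the sample input wrapped in braces as the answer assumes; the function name is made up for illustration.

```python
doc = {"value": [
    {"Id": "SwayEnterprise", "FeatureStatus": [
        {"FeatureName": "Sway1", "FeatureServiceStatus": "ServiceOperational"},
        {"FeatureName": "Sway2", "FeatureServiceStatus": "ServiceDegraded"}]},
    {"Id": "yammer", "FeatureStatus": [
        {"FeatureName": "yammerfeatures",
         "FeatureServiceStatus": "ServiceOperational"}]},
]}


def feature_rows(data):
    """Yield [Id, FeatureName, FeatureServiceStatus] for every feature."""
    for service in data["value"]:              # .value[]
        for feature in service["FeatureStatus"]:  # .FeatureStatus[]
            # Each feature is paired only with its own parent Id
            yield [service["Id"], feature["FeatureName"],
                   feature["FeatureServiceStatus"]]


for row in feature_rows(doc):
    print(row)
```

Because the inner loop runs inside the outer one, a feature can never be combined with the Id of a different service, which is what avoids the non-unique combinations described in the question.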

Query nested arrays in ArangoDB

I'm looking for a way to query nested arrays in ArangoDB.
The JSON structure I have is:
{
"uid": "bykwwla4prqi",
"category": "party",
"notBefore": "2016-04-19T08:43:35.388+01:00",
"notAfter": "9999-12-31T23:59:59.999+01:00",
"version": 1.0,
"aspects": [
"participant"
],
"description": [
{ "value": "User Homer Simpson, main actor in 'The Simpsons'", "lang": "en"}
],
"properties": [
{
"property": [
"urn:project:domain:attribute:surname"
],
"values": [
"Simpson"
]
},
{
"property": [
"urn:project:domain:attribute:givennames"
],
"values": [
"Homer",
"Jay"
]
}
]
}
I tried to use a query like the following to find all parties having a given name 'Jay':
FOR r IN resource
FILTER "urn:project:domain:attribute:givennames" IN r.properties[*].property[*]
AND "Jay" IN r.properties[*].values[*]
RETURN r
but unfortunately it does not work - it returns an empty array. If I use a '1' instead of '*' for the properties array it works. But the array of the properties has no fixed structure.
Does anybody have an idea how to solve this?
Thanks a lot!
You can inspect what the filter does using a simple trick: you RETURN the actual filter condition:
db._query(`FOR r IN resource RETURN r.properties[*].property[*]`).toArray()
[
[
[
"urn:project:domain:attribute:surname"
],
[
"urn:project:domain:attribute:givennames"
]
]
]
which makes it pretty clear what's going on. The IN operator can only work on one-dimensional arrays. You could work around this by using FLATTEN() to remove the sublayers:
db._query(`FOR r IN resource RETURN FLATTEN(r.properties[*].property[*])`).toArray()
[
[
"urn:project:domain:attribute:surname",
"urn:project:domain:attribute:givennames"
]
]
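The same effect can be reproduced in a few lines of Python (used here only as an analogy for the AQL behaviour): a membership test against a list of lists never matches a plain string until one level of nesting is removed. The variable names are assumptions for illustration.

```python
properties = [
    {"property": ["urn:project:domain:attribute:surname"],
     "values": ["Simpson"]},
    {"property": ["urn:project:domain:attribute:givennames"],
     "values": ["Homer", "Jay"]},
]

# Counterpart of r.properties[*].property[*]: a list of lists
nested = [p["property"] for p in properties]
# Counterpart of FLATTEN(...): one level of nesting removed
flat = [item for sub in nested for item in sub]

target = "urn:project:domain:attribute:givennames"
print(target in nested)  # False: the elements are lists, not strings
print(target in flat)    # True
```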
However, while your documents are valid JSON (I guess they were converted from XML?), you should alter the structure to what one would typically use in JSON:
"properties" : {
"urn:project:domain:attribute:surname":[
"Simpson"
],
"urn:project:domain:attribute:givennames": [
"Homer",
"Jay"
]
}
This is preferable because the FILTER combination you specify would also find any other Jay (not only those found in givennames), and because using FLATTEN() prevents the use of indexes in your filter statement. For performance reasons, you don't want queries that can't use indexes on reasonably sized collections.
In contrast, you can use an array index on givennames with the above document layout:
db.resource.ensureIndex({type: "hash",
fields:
["properties.urn:project:domain:attribute:givennames[*]"]
})
Now double-check the explain output for the query:
db._explain("FOR r IN resource FILTER 'Jay' IN " +
"r.properties.`urn:project:domain:attribute:givennames` RETURN r")
...
6 IndexNode 1 - FOR r IN resource /* hash index scan */
...
Indexes used:
By Type Collection Unique Sparse Selectivity Fields Ranges
6 hash resource false false 100.00 % \
[ `properties.urn:project:domain:attribute:givennames[*]` ] \
("Jay" in r.`properties`.`urn:project:domain:attribute:givennames`)
This confirms that the query is using the index.
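The earlier point, that the combined FILTER would also find any other Jay, can be sketched in Python (modelling the logic only, not AQL). The sample person, whose surname rather than given name is Jay, and both helper names are assumptions for illustration.

```python
def loose_match(props, key, wanted):
    """Two independent membership tests over the flattened arrays."""
    keys = [k for p in props for k in p["property"]]
    values = [v for p in props for v in p["values"]]
    return key in keys and wanted in values


def restructure(props):
    """Turn the list of property/values pairs into one object keyed by property."""
    return {k: p["values"] for p in props for k in p["property"]}


GIVEN = "urn:project:domain:attribute:givennames"
SURNAME = "urn:project:domain:attribute:surname"

# Hypothetical person whose surname, not given name, is Jay
props = [
    {"property": [SURNAME], "values": ["Jay"]},
    {"property": [GIVEN], "values": ["Homer"]},
]

print(loose_match(props, GIVEN, "Jay"))            # True: a false positive
print("Jay" in restructure(props).get(GIVEN, []))  # False
```

With the restructured layout, the lookup is tied to one specific attribute, so a value stored under another property cannot satisfy the condition.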
