How do I do an aggregate query against a Couchbase array index?

I have documents in my database that contain a "flags" array. Each of those has a "flag" value that contains a string. I'm trying to get the count of how many of each flag string there are across all documents. So for example, if I had two documents:
{
"flags": [
{
"flag": "flag1",
...
},
{
"flag": "flag2",
...
}
],
...
},
{
"flags": [
{
"flag": "flag1",
...
},
{
"flag": "flag3",
...
}
],
...
}
I would expect a result back like:
[
{
"flag": "flag1",
"flag_count": 2
},
{
"flag": "flag2",
"flag_count": 1
},
{
"flag": "flag3",
"flag_count": 1
}
]
I've created an index that looks like this:
CREATE INDEX `indexname` ON `dbname`((all (array (`f`.`flag`) for `f` in `flags` end)),`flags`) WHERE (`type` in ["type1", "type2"])
So far, the only way I've been able to get this to work is with a query like this:
SELECT f1.flag, count(*) as flag_count from dbname s unnest flags as f1 where (s.type in ["type1", "type2"]) AND any f in s.flags satisfies f.flag like '%' end group by f1.flag
This all makes sense to me except that it requires something along the lines of that AND any f in s.flags satisfies f.flag like '%' part to run at all - if I leave that out, it tells me it can't find an index that works. Is there a way to structure this such that I could leave that out? It seems unnecessary to me, but I guess I'm missing something.

CREATE INDEX ix1 ON dbname( ALL ARRAY f.flag FOR f IN flags END)
WHERE type IN ["type1", "type2"];
SELECT f.flag, COUNT(1) AS flag_count
FROM dbname AS d
UNNEST d.flags AS f
WHERE d.type IN ["type1", "type2"] AND f.flag LIKE "%"
GROUP BY f.flag;
If an array can contain duplicate flag values and each flag should be counted only once per document:
SELECT f.flag, COUNT( DISTINCT META(d).id) AS flag_count
FROM dbname AS d
UNNEST d.flags AS f
WHERE d.type IN ["type1", "type2"] AND f.flag LIKE "%"
GROUP BY f.flag;
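The array index is only considered when the WHERE clause has a predicate on the indexed array key, so the f.flag LIKE "%" predicate on the UNNEST alias is still needed; it simply matches every string value. For details, see the UNNEST section of https://docs.couchbase.com/server/current/n1ql/n1ql-language-reference/indexing-arrays.html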
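To make the semantics of the two queries concrete, here is a minimal Python sketch of what UNNEST followed by GROUP BY computes. It is only a model of the result, not of how Couchbase executes the query; the sample documents, the document IDs, and the function name count_flags are made up for illustration.

```python
from collections import Counter

# Hypothetical sample documents mirroring the two documents in the question.
docs = {
    "doc1": {"type": "type1", "flags": [{"flag": "flag1"}, {"flag": "flag2"}]},
    "doc2": {"type": "type2", "flags": [{"flag": "flag1"}, {"flag": "flag3"}]},
}


def count_flags(docs, types=("type1", "type2"), once_per_doc=False):
    """Model UNNEST + GROUP BY: one row per (document, array element)."""
    counts = Counter()
    for doc_id, doc in docs.items():
        # WHERE d.type IN [...]
        if doc.get("type") not in types:
            continue
        # UNNEST d.flags AS f, keeping only string flags (what LIKE "%" matches)
        flags = [f["flag"] for f in doc.get("flags", [])
                 if isinstance(f.get("flag"), str)]
        # COUNT(DISTINCT META(d).id): each flag counts once per document
        if once_per_doc:
            flags = set(flags)
        counts.update(flags)
    return dict(counts)


flag_counts = count_flags(docs)
print(flag_counts)
```

Calling count_flags(docs, once_per_doc=True) corresponds to the second query, which only differs from the first when a document repeats a flag.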

Related

SQL Server table data to JSON Path result

I am looking for a solution to convert the table results to a JSON path.
I have a table with two columns as below. Column1 always holds a single value, but Column2 can hold up to 15 values separated by ';' (semicolon).
ID Column1 Column2
--------------------------------------
1 T1 Re;BoRe;Va
I want to convert the above column data into the JSON format below:
{
"services":
[
{ "service": "T1"}
],
"additional_services":
[
{ "service": "Re" },
{ "service": "BoRe" },
{ "service": "Va" }
]
}
I have tried creating something like the below, but cannot get to the exact format that I am looking for
SELECT
REPLACE((SELECT d.Column1 AS services, d.column2 AS additional_services
FROM Table1 w (nolock)
INNER JOIN Table2 d (nolock) ON w.Id = d.Id
WHERE ID = 1
FOR JSON PATH), '\/', '/')
Please let me know if this is something we can achieve using T-SQL.
As I mentioned in the comments, I strongly recommend you fix and normalise your design. Don't store delimited data in your database; Re;BoRe;Va should be 3 rows, not 1 delimited one. That doesn't mean you can't achieve what you want with your denormalised data, just that the design is flawed, which is worth pointing out.
One way to achieve what you're after is with some nested FOR JSON calls:
SELECT (SELECT V.Column1 AS service
FOR JSON PATH) AS services,
(SELECT SS.[value] AS service
FROM STRING_SPLIT(V.Column2,';') SS
FOR JSON PATH) AS additional_services
FROM (VALUES(1,'T1','Re;BoRe;Va'))V(ID,Column1,Column2)
FOR JSON PATH, WITHOUT_ARRAY_WRAPPER;
This results in the following JSON:
{
"services": [
{
"service": "T1"
}
],
"additional_services": [
{
"service": "Re"
},
{
"service": "BoRe"
},
{
"service": "Va"
}
]
}
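For comparison, here is a small Python sketch of the same reshaping, in case it helps to see the target structure built step by step. The function name row_to_services is hypothetical, and the sketch assumes the delimiter never appears inside a value.

```python
import json


def row_to_services(column1, column2, sep=";"):
    """Build the target document from one row of the denormalised table."""
    return {
        # Column1 becomes a one-element array, like the first nested FOR JSON
        "services": [{"service": column1}],
        # Column2 is split on the delimiter, like STRING_SPLIT(V.Column2, ';')
        "additional_services": [
            {"service": part} for part in column2.split(sep) if part
        ],
    }


doc = row_to_services("T1", "Re;BoRe;Va")
print(json.dumps(doc, indent=2))
```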

Parsing string with multiple delimiters into columns

I want to split strings into columns.
My columns should be:
account_id, resource_type, resource_name
I have a JSON file source that I have been trying to parse via ADF data flow. That hasn't worked for me, hence I flattened the data and brought it into SQL Server (I am open to parsing values via ADF or SQL if anyone can show me how). Please check the JSON file at the bottom.
Use this code to recreate the data I am working with.
CREATE TABLE test.test2
(
resource_type nvarchar(max) NULL
)
INSERT INTO test.test2 ([resource_type])
VALUES
('account_id:224526257458,resource_type:buckets,resource_name:camp-stage-artifactory'),
('account_id:535533456241,resource_type:buckets,resource_name:tni-prod-diva-backups'),
('account_id:369798452057,resource_type:buckets,resource_name:369798452057-s3-manifests'),
('account_id:460085747812,resource_type:buckets,resource_name:vessel-incident-report-nonprod-accesslogs')
The output that I should be able to query in SQL Server should look like this:
account_id resource_type resource_name
224526257458 buckets camp-stage-artifactory
535533456241 buckets tni-prod-diva-backups
and so forth.
Please help me out and ask for clarification if needed. Thanks in advance.
EDIT:
Source JSON Format:
{
"start_date": "2021-12-01 00:00:00+00:00",
"end_date": "2021-12-31 23:59:59+00:00",
"resource_type": "all",
"records": [
{
"directconnect_connections": [
"account_id:227148359287,resource_type:directconnect_connections,resource_name:'dxcon-fh40evn5'",
"account_id:401311080156,resource_type:directconnect_connections,resource_name:'dxcon-ffxgf6kh'",
"account_id:401311080156,resource_type:directconnect_connections,resource_name:'dxcon-fg5j5v6o'",
"account_id:227148359287,resource_type:directconnect_connections,resource_name:'dxcon-fgvfo1ej'"
]
},
{
"virtual_interfaces": [
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-fgvj25vt'",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-fgbw5gs0'",
"account_id:401311080156,resource_type:virtual_interfaces,resource_name:'dxvif-ffnosohr'",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-fg18bdhl'",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-ffmf6h64'",
"account_id:390251991779,resource_type:virtual_interfaces,resource_name:'dxvif-fgkxjhcj'",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:'dxvif-ffp6kl3f'"
]
}
]
}
Since the strings are not valid JSON, and to avoid getting into heavy string manipulation, perhaps this will help.
Select B.*
From test.test2 A
Cross Apply ( Select account_id = max(case when value like 'account_id:%' then stuff(value,1,11,'') end )
,resource_type = max(case when value like 'resource_type:%' then stuff(value,1,14,'') end )
,resource_name = max(case when value like 'resource_name:%' then stuff(value,1,14,'') end )
from string_split(resource_type,',')
)B
Results
account_id resource_type resource_name
224526257458 buckets camp-stage-artifactory
535533456241 buckets tni-prod-diva-backups
369798452057 buckets 369798452057-s3-manifests
460085747812 buckets vessel-incident-report-nonprod-accesslogs
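The same idea (split on the comma, then pick each piece by its prefix) can be sketched in Python as below. The function name parse_resource is made up for this example, and like the query above it assumes values never contain a comma.

```python
def parse_resource(line):
    """Split 'key:value,key:value,...' into a dict of column -> value."""
    fields = {}
    for part in line.split(","):            # like string_split(resource_type, ',')
        key, _, value = part.partition(":")  # split on the first ':' only
        fields[key] = value
    return fields


row = parse_resource(
    "account_id:224526257458,resource_type:buckets,"
    "resource_name:camp-stage-artifactory"
)
print(row["account_id"], row["resource_type"], row["resource_name"])
```

Splitting on the first colon only keeps any later colons inside the value, which is also what stripping a fixed-length prefix with STUFF does.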
Unfortunately, the values inside the arrays are not valid JSON. You can patch them up by wrapping each string in {" and "}, and by adding " on either side of every : and ,.
DECLARE @json nvarchar(max) = N'{
"start_date": "2021-12-01 00:00:00+00:00",
"end_date": "2021-12-31 23:59:59+00:00",
"resource_type": "all",
"records": [
{
"directconnect_connections": [
"account_id:227148359287,resource_type:directconnect_connections,resource_name:''dxcon-fh40evn5''",
"account_id:401311080156,resource_type:directconnect_connections,resource_name:''dxcon-ffxgf6kh''",
"account_id:401311080156,resource_type:directconnect_connections,resource_name:''dxcon-fg5j5v6o''",
"account_id:227148359287,resource_type:directconnect_connections,resource_name:''dxcon-fgvfo1ej''"
]
},
{
"virtual_interfaces": [
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-fgvj25vt''",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-fgbw5gs0''",
"account_id:401311080156,resource_type:virtual_interfaces,resource_name:''dxvif-ffnosohr''",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-fg18bdhl''",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-ffmf6h64''",
"account_id:390251991779,resource_type:virtual_interfaces,resource_name:''dxvif-fgkxjhcj''",
"account_id:227148359287,resource_type:virtual_interfaces,resource_name:''dxvif-ffp6kl3f''"
]
}
]
}';
SELECT
j4.account_id,
j4.resource_type,
TRIM('''' FROM j4.resource_name) resource_name
FROM OPENJSON(@json, '$.records') j1
CROSS APPLY OPENJSON(j1.value) j2
CROSS APPLY OPENJSON(j2.value) j3
CROSS APPLY OPENJSON('{"' + REPLACE(REPLACE(j3.value, ':', '":"'), ',', '","') + '"}')
WITH (
account_id bigint,
resource_type varchar(50), -- wide enough for 'directconnect_connections' (25 characters)
resource_name varchar(100)
) j4;
The first three calls to OPENJSON have no schema, so each result set has three columns: key, value, and type. In the case of arrays (j1 and j3), key is the index into the array. In the case of single objects (j2), key is each property name.
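As a rough illustration of the same walk in Python, the three nested loops below play the roles of j1, j2 and j3, and the patched string is parsed the way j4 does it. The helper name patch_to_json and the shortened sample document are assumptions for this sketch; the quoting trick only works while no value contains a colon or comma.

```python
import json

# Shortened, hypothetical sample in the same shape as the source document.
source = {
    "records": [
        {"directconnect_connections": [
            "account_id:227148359287,resource_type:directconnect_connections,"
            "resource_name:'dxcon-fh40evn5'",
        ]},
        {"virtual_interfaces": [
            "account_id:227148359287,resource_type:virtual_interfaces,"
            "resource_name:'dxvif-fgvj25vt'",
        ]},
    ]
}


def patch_to_json(raw):
    """Same trick as the nested REPLACE calls: quote around every ':' and ','."""
    return '{"' + raw.replace(":", '":"').replace(",", '","') + '"}'


rows = []
for record in source["records"]:             # j1: elements of the records array
    for _name, items in record.items():      # j2: properties of each object
        for raw in items:                    # j3: elements of the inner array
            fields = json.loads(patch_to_json(raw))                    # j4
            fields["account_id"] = int(fields["account_id"])          # bigint
            fields["resource_name"] = fields["resource_name"].strip("'")  # TRIM
            rows.append(fields)

for row in rows:
    print(row)
```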

How Can I Calculate the Average of Floats in a Nested Array in a Variant Column

I have a VARIANT column that contains a JSON response from a web service. It contains a nested array with a float value that I would like to aggregate and return as an average. Here is an example SnowSQL command that I am using:
select
value:disambiguated.id,
value:mentions
from TABLE(
FLATTEN(input =>
PARSE_JSON('{ "entities": [{"count": 2,"disambiguated": {"id": 123},"label": "Coronavirus Disease 2019","mentions": [{"confidence": 0.5928,}, {"confidence": 0.5445,}],"type": "MEDICAL"}]}'):entities
)
)
Which returns:
VALUE:DISAMBIGUATED.ID VALUE:MENTIONS
123 [ { "confidence": 0.5928 }, { "confidence": 0.5445 } ]
What I would like to return is the two "confidence" values averaged, which would be 0.56865. I was able to add a second FLATTEN that isolates the "mentions" array and lets me extract each "confidence" value, but I cannot figure out how to group the records to calculate the average. I would love to use the built-in AVG() function if possible. Thank you in advance for any help you can provide.
Using your example, you can use LATERAL FLATTEN to create your required flattened fields, and then aggregate as you normally would. In this example, I'm grouping on the ID that is in the data, but you could also use y.index or z.index depending on which of those you wanted to group on for your AVG().
WITH x AS (
SELECT PARSE_JSON('{ "entities": [{"count": 2,"disambiguated": {"id": 123},"label": "Coronavirus Disease 2019","mentions": [{"confidence": 0.5928,}, {"confidence": 0.5445,}],"type": "MEDICAL"}]}') as json_str
)
SELECT
y.value:disambiguated.id as id,
avg(z.value:confidence)
from x,
LATERAL FLATTEN(input => json_str:entities) y,
LATERAL FLATTEN(input => y.value:mentions) z
GROUP BY id
;
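If it helps to see what the two LATERAL FLATTEN calls plus GROUP BY amount to, here is a minimal Python model of the same computation. The function name average_confidence is made up, and the trailing commas from the sample are dropped because Python's json module is strict about them.

```python
import json
from collections import defaultdict
from statistics import mean

payload = json.loads(
    '{"entities": [{"count": 2, "disambiguated": {"id": 123}, '
    '"label": "Coronavirus Disease 2019", '
    '"mentions": [{"confidence": 0.5928}, {"confidence": 0.5445}], '
    '"type": "MEDICAL"}]}'
)


def average_confidence(payload):
    """Flatten entities, then mentions, and average confidence per entity id."""
    by_id = defaultdict(list)
    for entity in payload["entities"]:       # first FLATTEN (y)
        for mention in entity["mentions"]:   # second FLATTEN (z)
            by_id[entity["disambiguated"]["id"]].append(mention["confidence"])
    # GROUP BY id with AVG(confidence)
    return {entity_id: mean(values) for entity_id, values in by_id.items()}


averages = average_confidence(payload)
print(averages)
```

For the sample this gives roughly 0.56865 for id 123, that is (0.5928 + 0.5445) / 2.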

How can I count the number of top level json keys in a Snowflake database variant field?

I'm looking for the number 2 here... array_size appears to work on a variant list but is not doing so well on this json. Is there a clever way to do this? I don't know/probably can't trust the structure will only go this deep, and am hoping to use this as just another field on a query where I'm pulling out a bunch of other values out of the json; so ideally a solution allows this as well.
select dict, array_size(dict)
from (select parse_json('{
"partition": [
"partition_col"
],
"sample_weight": [
"sample_weight"
]
}') as dict)
In that case you can create a small JavaScript UDF:
create or replace function count_keys(MYVAR variant)
returns float
language javascript
as '
return (Object.entries(MYVAR)).length
'
;
To call it:
select count_keys(parse_json(
'{
"partition": [
"partition_col"
],
"sample_weight": [
"sample_weight"
]
}')
)
;
Use flatten:
with dict as (
select parse_json('{
"partition": [
"partition_col"
],
"sample_weight": [
"sample_weight"
]
}') val
)
select val, count(*)
from dict,
lateral flatten(input => val)
group by val
;
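Both answers count only the top-level entries and never descend into nested values. A tiny Python sketch of that behaviour, with a hypothetical function name, assuming the variant holds a JSON object:

```python
import json


def count_top_level_keys(text):
    """Return the number of top-level keys, or None if the value is not an object."""
    value = json.loads(text)
    return len(value) if isinstance(value, dict) else None


sample = '{"partition": ["partition_col"], "sample_weight": ["sample_weight"]}'
key_count = count_top_level_keys(sample)
print(key_count)
```

Nested keys do not change the result: an object such as {"a": {"b": 1, "c": 2}} still counts as one key.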

Postgres/JSON - update all array elements

Given the following json:
{
"foo": [
{
"bar": true
},
{
"bar": true
}
]
}
How can I select the following:
{
"foo": [
{
"bar": false
},
{
"bar": false
}
]
}
So far I've figured out how to manipulate a single array value:
SELECT
jsonb_set(
'{
"foo": [
{
"bar": true
},
{
"bar": true
}
]
}'::jsonb, '{foo,0,bar}', to_jsonb(false)
)
But how do I set all elements within an array?
You might want to kill two birds with one stone - update existing key in every object in the array or insert the key with a given value. jsonb_set is a perfect match here, but it requires us to pass the index of each object, so we have to iterate over the array first.
The implementation is heavily inspired by klin's answer, which didn't solve my problem (updating and inserting) and didn't work when an object had multiple keys.
So, the implementation is as follows:
-- the params are the same as in aforementioned `jsonb_set`
CREATE OR REPLACE FUNCTION update_array_elements(target jsonb, path text[], new_value jsonb)
RETURNS jsonb language sql AS $$
-- aggregate the jsonb from parts created in LATERAL
SELECT jsonb_agg(updated_jsonb)
-- split the target array to individual objects...
FROM jsonb_array_elements(target) individual_object,
-- operate on each object and apply jsonb_set to it. The results are aggregated in SELECT
LATERAL jsonb_set(individual_object, path, new_value) updated_jsonb
$$;
And that's it... :)
I hope it'll help someone with the same problem I had.
There is no standard function to update json array elements by key.
A custom function is probably the simplest way to solve the problem:
create or replace function update_array_elements(arr jsonb, key text, value jsonb)
returns jsonb language sql as $$
select jsonb_agg(jsonb_build_object(k, case when k <> key then v else value end))
from jsonb_array_elements(arr) e(e),
lateral jsonb_each(e) p(k, v)
$$;
select update_array_elements('[{"bar":true},{"bar":true}]'::jsonb, 'bar', 'false');
update_array_elements
----------------------------------
[{"bar": false}, {"bar": false}]
(1 row)
Your query may look like this:
with a_data(js) as (
values(
'{
"foo": [
{
"bar": true
},
{
"bar": true
}
]
}'::jsonb)
)
select
jsonb_set(js, '{foo}', update_array_elements(js->'foo', 'bar', 'false'))
from a_data;
jsonb_set
-------------------------------------------
{"foo": [{"bar": false}, {"bar": false}]}
(1 row)
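For readers who want to check the expected result outside the database, here is a minimal Python sketch of "set a key on every element of an array". It borrows the function name from the answers but is not a translation of either SQL function; like the jsonb_set variant, it keeps any other keys in each object and inserts the key if it is missing.

```python
import copy


def update_array_elements(doc, array_key, key, value):
    """Return a copy of doc with key set to value in every element of doc[array_key]."""
    updated = copy.deepcopy(doc)  # leave the input untouched, as a SELECT would
    updated[array_key] = [
        {**element, key: value} for element in updated[array_key]
    ]
    return updated


original = {"foo": [{"bar": True}, {"bar": True}]}
result = update_array_elements(original, "foo", "bar", False)
print(result)
```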
