Snowflake merge object / json

Is there any way to merge two objects in Snowflake? I found https://docs.snowflake.net/manuals/sql-reference/functions/object_insert.html, but that only sets/updates one key at a time. I want to merge two objects (something like Object.assign() in JS).
I also tried to find a workaround by converting to arrays, concatenating them, and constructing an object from the result, but did not manage to make it work.
Thanks!

Snowflake does not have a built-in function like that, but it's trivial to do using, well, Object.assign() inside Snowflake's JavaScript UDFs :)
-- Note: inside the JavaScript body, the SQL argument names are referenced in uppercase (O1, O2).
create or replace function my_object_assign(o1 VARIANT, o2 VARIANT)
returns VARIANT
language javascript
as 'return Object.assign(O1, O2);';
select my_object_assign(parse_json('{"a":1,"b":2,"c":3}'), parse_json('{"c":4, "d":5}')) as res;
-----------+
 RES       |
-----------+
 {         |
   "a": 1, |
   "b": 2, |
   "c": 4, |
   "d": 5  |
 }         |
-----------+
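One caveat worth noting: Object.assign mutates its first argument in JavaScript. That is unlikely to matter inside a UDF, but if you prefer a variant that leaves both inputs untouched, a minimal tweak merges into a fresh object instead:
create or replace function my_object_assign(o1 VARIANT, o2 VARIANT)
returns VARIANT
language javascript
-- Merge into a new empty object so neither O1 nor O2 is modified.
as 'return Object.assign({}, O1, O2);';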

Related

Indexing only an attribute on a json array - Postgres

I have a table with a jsonb field named "data" with the following content:
{
  "customerId": 1,
  "something": "...",
  "list": [{ "nestedId": 1, "attribute": "a" }, { "nestedId": 2, "attribute": "b" }]
}
I need to retrieve the whole row based on a 'nestedId' value; note that the field is inside an array.
After checking the query plans I found out I could benefit from an index, so I added:
CREATE INDEX i1 ON mytable USING gin ((data->'list') jsonb_path_ops);
From what I understood from the docs, this creates index entries for the values in "list", and it solves my problem.
For the sake of completeness, here is the query I can use to retrieve my data:
SELECT data FROM mytable WHERE data->'list' @> '[{"nestedId": 1}]'
Though I wonder if there is a more optimal index I could create. Is it possible to create an index only for the "nestedId" field, for example?
You can index only the numeric values, and not the keys as well, by using functional indexes. You will probably need to create a helper function to do so.
create function jsonb_objarray_to_intarray(jsonb,text) returns int[] immutable language sql as
$$ select array_agg((x->>$2)::int) from jsonb_array_elements($1) f(x) $$;
create index on mytable using gin (jsonb_objarray_to_intarray(data->'list','nestedId'));
SELECT data FROM mytable where jsonb_objarray_to_intarray(data->'list','nestedId') @> ARRAY[3];
I wrote it this way so the function could be reused in other similar situations. If you don't care about it being re-used, you can make the code that uses it look prettier by hard coding the dereference and the key value into the function:
create function mytable_to_intarray(jsonb) returns int[] immutable language sql as
$$ select array_agg((x->>'nestedId')::int) from jsonb_array_elements($1->'list') f(x) $$;
create index on mytable using gin (mytable_to_intarray(data));
SELECT data FROM mytable where mytable_to_intarray(data) @> ARRAY[3];
Now those indexes do take longer to make than your original, but they are about half the size and are at least as fast to query. More importantly, the planner has better statistics about the selectivity, and so in more complicated queries is likely to come up with better query plans.
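As a quick sanity check (a hedged sketch reusing mytable and the helper defined above), you can confirm the planner actually picks the functional index:
-- The plan should show a Bitmap Index Scan on the GIN index rather than a
-- sequential scan, assuming the table is large enough to warrant it.
EXPLAIN ANALYZE
SELECT data FROM mytable WHERE mytable_to_intarray(data) @> ARRAY[3];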

PostgreSQL JSON path capabilities

In the documentation, some of the PostgreSQL JSON functions take a JSON path argument.
For example, the jsonb_set function:
jsonb_set(target jsonb, path text[], new_value jsonb[, create_missing boolean])
I can't find a specification for this type of argument anywhere.
Can it be used, for example, to retrieve an array element based on one of its attributes' values?
The path is akin to a path on a filesystem: each value drills further down the leaves of the tree, in the order you specified. Once you have extracted a particular JSONB value via a path, you can chain other JSONB operations onto it if needed. Functions and operators that take JSONB paths are mostly useful when there are nested JSONB objects, but they can handle simple JSONB arrays too.
For example:
SELECT '{"a": 42, "b": {"c": [1, 2, 3]}}'::JSONB #> '{b, c}' -> 1;
...should return 2.
The path {b, c} first gets b's value, which is {"c": [1, 2, 3]}.
Next, it drills down to get c's value, which is [1, 2, 3].
Then the -> operation is chained onto that, which gets the value at the specified index of the array (using zero-based indexing, so 0 is the first element, 1 is the second, etc.). If you use -> it will return a value of JSONB type, whereas ->> will return a value of TEXT type.
But you could have also written it like this:
SELECT '{"a": 42, "b": {"c": [1, 2, 3]}}'::JSONB #> '{b, c, 1}';
...and simply included both keys and array indexes in the same path.
For arrays, the following two should be equivalent, except the first uses a path, and the second expects an array and gets the value at the specified index:
SELECT '[1, 2, 3]'::JSONB #> '{1}';
SELECT '[1, 2, 3]'::JSONB -> 1;
Notice a path must always be in JSON array syntax, where each successive value is the next leaf in the tree you want to drill down to. You supply keys if it is a JSONB object, and indexes if it is a JSONB array. If these were file paths, the JSONB keys are like folders, and array indexes are like files.
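To tie this back to jsonb_set from the question: the write functions use exactly the same path syntax, so you can target an array element by index too. A small sketch using the example value above:
-- Replace the second element of the "c" array (index 1) with 99.
SELECT jsonb_set('{"a": 42, "b": {"c": [1, 2, 3]}}'::jsonb, '{b, c, 1}', '99'::jsonb);
-- => {"a": 42, "b": {"c": [1, 99, 3]}}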

Join in Laravel's Eloquent

I have a Visit model, and I'm getting the data I want like this:
$app_visits = Visit::select([
    'start',
    'end',
    'machine_name'
])->where('user_id', $chosen_id)->get();
But I want to add points for every visit. Every visit has an interaction, but there is no visit_id (because of another system, I cannot add one).
Last developer left it like that:
$interactions = Interaction::where([
    'machine_name' => $app_visit->machine_name,
])->whereBetween('date', [$app_visit->start, $app_visit->end])->get();

$points = 0;
foreach ($interactions as $interaction) {
    $points += (int) $interaction->app_stage;
}
$app_visits[$key]['points'] = $points;
But I really don't like it, as it's slow and messy. I wanted to just add the 'points' sum to the first query, to touch the database only once.
Edit: as someone asked for the database structure:
visit:
|id | start | end | machine_name | user_id
interaction:
|id | time | machine_name | points
You can use a few things in Eloquent. Probably the most useful for this case is select(DB::raw(...)), as you will have to add a bit of raw SQL to retrieve a count.
For example:
return $query
    ->join(...)
    ->where(...)
    ->select(DB::raw(
        'COUNT(DISTINCT res.id) AS count'
    ))
    ->groupBy(...);
Failing that, I'd just replace the Eloquent with raw SQL. We've had to do that a fair bit, as our data sets are massive and Eloquent model building has proven a little slow.
Update: as you've added the structure, why not just add a relation to Interaction based upon machine_name (or even a custom method using raw SQL that calculates the points), and use: Visits::with('interaction.visitPoints')->...blah ?
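For completeness, here is a hedged sketch of the single-query approach (table and column names are assumed from the structure above plus Laravel's plural table naming; untested):
use Illuminate\Support\Facades\DB;

// Join interactions on machine_name and the visit's time window, then
// aggregate points per visit so the database is hit only once.
$app_visits = Visit::query()
    ->leftJoin('interactions', function ($join) {
        $join->on('interactions.machine_name', '=', 'visits.machine_name')
             ->on('interactions.time', '>=', 'visits.start')
             ->on('interactions.time', '<=', 'visits.end');
    })
    ->where('visits.user_id', $chosen_id)
    ->groupBy('visits.id', 'visits.start', 'visits.end', 'visits.machine_name')
    ->select(
        'visits.start',
        'visits.end',
        'visits.machine_name',
        DB::raw('COALESCE(SUM(interactions.points), 0) AS points')
    )
    ->get();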
Take a look at the DB facade instead of Eloquent for more complex and efficient queries:
https://laravel.com/docs/5.6/queries
There is also the possibility of using raw SQL with this facade.
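For example, a minimal raw-SQL sketch through the DB facade (same assumed table and column names as above; the ? binding keeps it safe from injection):
$rows = DB::select(
    'SELECT v.start, v.end, v.machine_name,
            COALESCE(SUM(i.points), 0) AS points
       FROM visits v
       LEFT JOIN interactions i
         ON i.machine_name = v.machine_name
        AND i.time BETWEEN v.start AND v.end
      WHERE v.user_id = ?
      GROUP BY v.id, v.start, v.end, v.machine_name',
    [$chosen_id]
);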

'Malformed class name' error when trying to concatenate arrays of zeros to an existing row that's an array[array[int]] [duplicate]

When working with Spark's DataFrames, User Defined Functions (UDFs) are required for mapping data in columns. UDFs require that argument types are explicitly specified. In my case, I need to manipulate a column that is made up of arrays of objects, and I do not know what type to use. Here's an example:
import sqlContext.implicits._

// Start with some data. Each row (here, there's only one row)
// is a topic and a bunch of subjects
val data = sqlContext.read.json(sc.parallelize(Seq(
  """
    |{
    |  "topic" : "pets",
    |  "subjects" : [
    |    {"type" : "cat", "score" : 10},
    |    {"type" : "dog", "score" : 1}
    |  ]
    |}
  """.stripMargin)))
It's relatively straightforward to use the built-in org.apache.spark.sql.functions to perform basic operations on the data in the columns
import org.apache.spark.sql.functions.size
data.select($"topic", size($"subjects")).show
+-----+--------------+
|topic|size(subjects)|
+-----+--------------+
| pets| 2|
+-----+--------------+
and it's generally easy to write custom UDFs to perform arbitrary operations
import org.apache.spark.sql.functions.udf
val enhance = udf { topic : String => topic.toUpperCase() }
data.select(enhance($"topic"), size($"subjects")).show
+----------+--------------+
|UDF(topic)|size(subjects)|
+----------+--------------+
| PETS| 2|
+----------+--------------+
But what if I want to use a UDF to manipulate the array of objects in the "subjects" column? What type do I use for the argument in the UDF? For example, if I want to reimplement the size function, instead of using the one provided by Spark:
val my_size = udf { subjects: Array[Something] => subjects.size }
data.select($"topic", my_size($"subjects")).show
Clearly Array[Something] does not work... what type should I use!? Should I ditch Array[] altogether? Poking around tells me scala.collection.mutable.WrappedArray may have something to do with it, but still there's another type I need to provide.
What you're looking for is Seq[o.a.s.sql.Row]:
import org.apache.spark.sql.Row
val my_size = udf { subjects: Seq[Row] => subjects.size }
Explanation:
The current representation of ArrayType is, as you already know, WrappedArray, so Array won't work and it is better to stay on the safe side.
According to the official specification, the local (external) type for StructType is Row. Unfortunately it means that access to the individual fields is not type safe.
Notes:
To create a struct in Spark < 2.3, the function passed to udf has to return a Product type (a Tuple* or a case class), not Row. That's because the corresponding udf variants depend on Scala reflection:
Defines a Scala closure of n arguments as user-defined function (UDF). The data types are automatically inferred based on the Scala closure's signature.
In Spark >= 2.3 it is possible to return Row directly, as long as the schema is provided.
def udf(f: AnyRef, dataType: DataType): UserDefinedFunction
Defines a deterministic user-defined function (UDF) using a Scala closure. For this variant, the caller must specify the output data type, and there is no automatic input type coercion.
See for example How to create a Spark UDF in Java / Kotlin which returns a complex type?.
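As a concrete illustration, here is a hedged sketch of a UDF that consumes the struct fields (field names are taken from the example JSON above; Row.getAs is not type safe, so a wrong name or type fails at runtime):
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

// Sum the "score" field across the subjects array. read.json infers JSON
// integers as LongType, hence getAs[Long].
val sum_scores = udf { subjects: Seq[Row] =>
  subjects.map(_.getAs[Long]("score")).sum
}

data.select($"topic", sum_scores($"subjects")).show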

Getting a list of keys based on another property in json with jq

I'm trying to create a jq filter for JSON, similar to How to filter an array of objects based on values in an inner array with jq? - but even using that as a basis doesn't seem to be giving me the results I want.
Here's my example json
[{"id":"0001","tags":["one","two"]},{"id":"0002", "tags":["two"]}]
I want to return a list of IDs where tags contains "one" (not partial string match, full element match).
I have tried some variations, but can't get the filter right.
. - map(select(.resources[] | contains("one"))) | .[] .id
Returns "0001","0002"
I have also tried ... .resources[].one)) | ... but I always get the full list when trying to filter by "one", expecting to only get 0001.
Where am I filtering wrong? (have about 30 minutes experience with jq, so please excuse my ignorance if it's something obvious :)
map(select(.tags | index("one")) | .id)
Since your problem description indicates you want to check if the array contains "one", it's simplest to use index.
UPDATE
On Jan 30, 2017, a builtin named IN was added for efficiently testing whether a JSON entity is contained in a stream. It can also be used for efficiently testing membership in an array. In the present case, the relevant usage would be:
map(select(.tags as $tags | "one" | IN($tags[])) | .id)
If your jq does not have IN/1, then so long as your jq has first/1, you can use this equivalent definition:
def IN(s): . as $in | first(if (s == $in) then true else empty end) // false;
(In practice, index/1 is usually fast enough, but its implementation currently (jq 1.5 and versions through at least July 2017) is suboptimal.)
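For reference, running the accepted filter against the sample input from the question (assumes jq 1.5 or later):
echo '[{"id":"0001","tags":["one","two"]},{"id":"0002","tags":["two"]}]' \
  | jq 'map(select(.tags | index("one")) | .id)'
# => ["0001"]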
