OBJECT_CONSTRUCT function is not working properly - snowflake-cloud-data-platform

I have written a query in Snowflake to generate a JSON file, and I want to remove from the query output any field that has a NULL value. OBJECT_CONSTRUCT is not behaving consistently: for some columns it drops the NULL, while for others a null value still shows up in the result.
Input (I want to remove any field whose value is NULL or blank):
{"DIFID":122,"DIF_FLAG":"NULL","DIF_TYPE":"asian/white","FOCAL_COUNT":2370,"REFERENCE_COUNT":17304},
Required output:
{"DIFID":122,"DIF_TYPE":"asian/white","FOCAL_COUNT":2370,"REFERENCE_COUNT":17304},
Query:
select distinct ITEMSTATID,object_construct(
'DIFID',DIFID,
'DIF_TYPE',DIF_TYPE,
'DIF_FLAG',DIF_FLAG,
'FOCAL_COUNT',FOCAL_COUNT::integer,
'REFERENCE_COUNT',REFERENCE_COUNT::integer,
'DIF_METHOD',DIF_METHOD,
'DIF_VALUE',DIF_VALUE)
DIF
from DEV_IPM.STAGEVAULT.DIF_STATISTICS;

For a string column holding 'NULL' as a literal string, the column's value is not skipped:
CREATE OR REPLACE TABLE DIF_STATISTICS
AS
SELECT 1 AS ITEMSTATID,
122 AS DIFID,
'NULL' AS DIF_FLAG, -- here
'asian/white' AS DIF_TYPE,
2370 AS FOCAL_COUNT,
17304 AS REFERENCE_COUNT;
Output:
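Presumably the 'NULL' string survives as a value, so the constructed object looks roughly like:
{"DIFID":122,"DIF_FLAG":"NULL","DIF_TYPE":"asian/white","FOCAL_COUNT":2370,"REFERENCE_COUNT":17304}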
The value is definitely stored as TEXT:
SELECT null AS DIF_FLAG, 'NULL' AS DIF_FLAG;
On the left: a true NULL; on the right: the string 'NULL'.
If that is the case, the value should be nullified with NULLIF(DIF_FLAG, 'NULL') before being passed to the OBJECT_CONSTRUCT function:
SELECT ITEMSTATID,
object_construct(
'DIFID',DIFID,
'DIF_TYPE',DIF_TYPE,
'DIF_FLAG',NULLIF(DIF_FLAG, 'NULL'),
'FOCAL_COUNT',FOCAL_COUNT::integer,
'REFERENCE_COUNT',REFERENCE_COUNT::integer) AS DIF
FROM DIF_STATISTICS;
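With NULLIF in place, the DIF_FLAG pair should be dropped, giving roughly:
{"DIFID":122,"DIF_TYPE":"asian/white","FOCAL_COUNT":2370,"REFERENCE_COUNT":17304}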
Previous answer before column details were provided (also plausible):
It is working as intended:
NULL Values
Snowflake supports two types of NULL values in semi-structured data:
SQL NULL: SQL NULL means the same thing for semi-structured data types as it means for structured data types: the value is missing or unknown.
JSON null (sometimes called “VARIANT NULL”): In a VARIANT column, JSON null values are stored as a string containing the word “null” to distinguish them from SQL NULL values.
OBJECT_CONSTRUCT
If the key or value is NULL (i.e. SQL NULL), the key-value pair is omitted from the resulting object. A key-value pair consisting of a not-null string as key and a JSON NULL as value (i.e. PARSE_JSON(‘NULL’)) is not omitted.
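A quick way to see the difference (a minimal sketch; the SQL NULL pair disappears, the JSON null pair stays):
SELECT OBJECT_CONSTRUCT('a', NULL, 'b', PARSE_JSON('null'));
-- expected result: {"b": null}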
For true SQL NULL values, the column is omitted:
CREATE OR REPLACE TABLE DIF_STATISTICS
AS
SELECT 1 AS ITEMSTATID,
122 AS DIFID,
NULL AS DIF_FLAG,
'asian/white' AS DIF_TYPE,
2370 AS FOCAL_COUNT,
17304 AS REFERENCE_COUNT;
SELECT ITEMSTATID,
object_construct(
'DIFID',DIFID,
'DIF_TYPE',DIF_TYPE,
'DIF_FLAG',DIF_FLAG,
'FOCAL_COUNT',FOCAL_COUNT::integer,
'REFERENCE_COUNT',REFERENCE_COUNT::integer) AS DIF
FROM DIF_STATISTICS;
Output:
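The DIF_FLAG pair is omitted, so the result should look roughly like:
{"DIFID":122,"DIF_TYPE":"asian/white","FOCAL_COUNT":2370,"REFERENCE_COUNT":17304}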
Probably the data type of the column DIF_FLAG is VARIANT/OBJECT and it holds a JSON null:
CREATE OR REPLACE TABLE DIF_STATISTICS
AS
SELECT 1 AS ITEMSTATID,
122 AS DIFID,
PARSE_JSON('NULL') AS DIF_FLAG, -- here
'asian/white' AS DIF_TYPE,
2370 AS FOCAL_COUNT,
17304 AS REFERENCE_COUNT;
Output:
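Here the JSON null is kept, so the result should look roughly like:
{"DIFID":122,"DIF_FLAG":null,"DIF_TYPE":"asian/white","FOCAL_COUNT":2370,"REFERENCE_COUNT":17304}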

Related

Snowflake: Object_construct leaving null values when I used copy command to frame json file as output

I use the Snowflake COPY command below, which returns a file with JSON content:
copy into @elasticsearch/product/sf_index
from (select object_construct('id', id, 'alpha', alpha) from table limit 1)
file_format = (type = json, COMPRESSION=NONE), overwrite=TRUE, single = TRUE, max_file_size=5368709120;
The data is:
id  alpha
1   null
the output file is
{
"id" :1
}
but I need the null values to be kept:
{
"id" : 1,
"alpha" : null
}
You can use the function OBJECT_CONSTRUCT_KEEP_NULL.
Documentation: https://docs.snowflake.com/en/sql-reference/functions/object_construct_keep_null.html
Example:
select OBJECT_CONSTRUCT_KEEP_NULL('id', id, 'alpha', alpha)
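In the COPY command from the question this would look roughly like (a sketch reusing the asker's stage path and options):
copy into @elasticsearch/product/sf_index
from (select object_construct_keep_null('id', id, 'alpha', alpha) from table limit 1)
file_format = (type = json, COMPRESSION=NONE), overwrite=TRUE, single = TRUE, max_file_size=5368709120;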
Would it be possible for you to check programmatically whether the value is null, and if it is null, use the below?
select object_construct('id',1,'alpha',parse_json('null'));
Per SnowFlake documentation
If the key or value is NULL (i.e. SQL NULL), the key-value pair will be omitted from the resulting object. A key-value pair consisting of a not-null string as key and a JSON NULL as value (i.e. PARSE_JSON(‘NULL’)) will not be omitted.
The other option is to just send it without the null attribute and take care of it on retrieval from Elastic.
How about this
select object_construct('id', id, 'alpha', case when alpha is not null then alpha else 'null' end) from table limit 1;
CASE should be supported by the COPY command.
"null" is a valid in json document as per this SO
Is null valid JSON (4 bytes, nothing else)
OK, another possible way is this, using UNION:
select object_construct('id', id, 'alpha', parse_json('NULL')) from table where alpha is null
union
select object_construct('id', id, 'alpha', alpha) from table where alpha is not null;
select object_construct('id', id,'alpha', IFNULL(alpha, PARSE_JSON('null'))) from table limit 1
Use IFNULL to check whether the value is null and replace it with a JSON null.

SQL Server: How to remove a key from a Json object

I have a query like (simplified):
SELECT
JSON_QUERY(r.SerializedData, '$.Values') AS [Values]
FROM
<TABLE> r
WHERE ...
The result is like this:
{ "2019":120, "20191":120, "201902":121, "201903":134, "201904":513 }
How can I remove the entries with a key length less than 6?
Result:
{ "201902":121, "201903":134, "201904":513 }
One possible solution is to parse the JSON and generate it again using string manipulation, keeping only the keys of the desired length:
Table:
CREATE TABLE Data (SerializedData nvarchar(max))
INSERT INTO Data (SerializedData)
VALUES (N'{"Values": { "2019":120, "20191":120, "201902":121, "201903":134, "201904":513 }}')
Statement (for SQL Server 2017+):
UPDATE Data
SET SerializedData = JSON_MODIFY(
SerializedData,
'$.Values',
JSON_QUERY(
(
SELECT CONCAT('{', STRING_AGG(CONCAT('"', [key] ,'":', [value]), ','), '}')
FROM OPENJSON(SerializedData, '$.Values') j
WHERE LEN([key]) >= 6
)
)
)
SELECT JSON_QUERY(d.SerializedData, '$.Values') AS [Values]
FROM Data d
Result:
Values
{"201902":121,"201903":134,"201904":513}
Notes:
It's important to note that JSON_MODIFY() in lax mode deletes the specified key if the new value is NULL and the path points to a JSON object. But in this specific case (a JSON object with variable key names), I prefer the solution above.
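For a key that is known in advance, that lax-mode behaviour can be used directly to delete it (a sketch against the sample table above; keys that are not valid identifiers must be double-quoted in the path):
UPDATE Data
SET SerializedData = JSON_MODIFY(SerializedData, '$.Values."2019"', NULL)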

Select on jsonb array on specific key/value

I have a jsonb field containing this data :
[{"FieldName":"wire1","Metadata":[{"Date":"2018-02-06T11:32:57.4022774+01:00","Source":"exampleSource"}]},
{"FieldName":"wire2","Metadata":[{"Date":"2018-02-06T11:32:57.4022774+01:00","Source":"exampleSource"}]},
{"FieldName":"wire3","Metadata":[{"Date":"2018-02-06T11:32:57.4022774+01:00","Source":"exampleSource"}]}]
What is the correct way to access a FieldName = FieldValue pair inside this array as part of a select? We tried SELECT meta::json->0 FROM myTable, and that returned null (meta is the name of the column containing the metadata).
What I hope to get is a select that returns all rows where FieldName = wire1, or where Source = exampleSource, or both.
What you need is the jsonb_array_elements function.
-> returns jsonb:
SELECT jsonb_array_elements(columnName)->'FieldName' FROM ...
returns the FieldName value as jsonb ("wire1").
->> returns text:
SELECT jsonb_array_elements(columnName)->>'FieldName' FROM ...
returns the same value as text (wire1).

Simplifying a SQL Server query with a shortcut

I have a query where many columns could be blank or null. The columns actually have longer names than in the example below:
select *
from table1
where field1 is not null and field1 != '' and
field2 is not null and field2 != ''
...etc
It gets tiresome having to type
x is not null and x != ''.
Is there some shorthand for "x is not null and x != ''"? Something like Java's
StringUtils.isNotEmpty(x)
I use
where isnull(x, '') <> ''
a lot. I find it a bit easier to "understand" than nullif.
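Applied to the example, each pair of checks collapses to one (a sketch using the asker's names):
select *
from table1
where isnull(field1, '') <> ''
  and isnull(field2, '') <> ''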
-- EDIT ---------------------------------------
I missed that they were all ANDed together. So, if all N fields must be non-null and not empty, assuming that all fields are strings (varchars), this should do it:
where isnull(field1 + field2 + field3 + ... + fieldN, '') <> ''
First, the strings are concatenated together:
If any are null, the result will be null
If none are null and all are empty, the result will be an empty string
Else, the result will be a non-empty string
Next, the results are isnulled:
If the concatenated value is null, it is set to an empty string
Else, you get the concatenated contents (empty or not-empty string)
Last, compare that with the empty string:
If True, then either all are empty or one or more is null
If False, none are null and at least one is not empty
Try
WHERE NULLIF(field1, '') IS NOT NULL
For SQL Server, I would use COALESCE for this:
WHERE COALESCE(field1, '') > ''
ISNULL also works
If you want to exclude rows where every field is null or blank, you can do it like this:
WHERE COALESCE(Field1, Field2, Field3, Field4, Field5, '') <> ''

SQL Server 2016 FOR JSON PATH returns string instead of array when using case statement

I'm trying to build a JSON object that contains an array, using SQL Server 2016.
The source data for the array is itself JSON, so I'm using the JSON_QUERY inside a select statement, with the FOR JSON clause applied to the select statement.
Everything works beautifully until I wrap the JSON_QUERY clause in a CASE statement (in certain cases the array must not be included, i.e. must be null).
The following code illustrates the problem:
declare @projects nvarchar(max) = '{"projects": [23439658267415,166584258534050]}'
declare @id bigint = 123
SELECT
[data.array1] = JSON_QUERY(@projects, '$.projects') -- returns an array - perfect.
, [data.array2] = CASE WHEN 1 is NOT NULL
THEN JSON_QUERY(@projects, '$.projects')
ELSE NULL END -- returns an array - still good!
, [data.array3] = CASE WHEN @id is NOT NULL
THEN JSON_QUERY(@projects, '$.projects')
ELSE NULL END -- why do I end up with a string in the JSON when I do this?
FOR JSON PATH, without_array_wrapper
This code returns the following JSON:
{
"data":{
"array1": [23439658267415,166584258534050],
"array2": [23439658267415,166584258534050],
"array3":"[23439658267415,166584258534050]"
}
}
The problem is that the third 'array' is returned as a string object.
I would expect it to return the following JSON:
{
"data":{
"array1": [23439658267415,166584258534050],
"array2": [23439658267415,166584258534050],
"array3": [23439658267415,166584258534050]
}
}
If I remove the FOR JSON PATH... clause, all columns returned by the query are identical (i.e. all three nvarchar values returned by the JSON_QUERY function are identical).
Why is this happening, how do I make it output an array in the final JSON?
Wrap the result from the case statement in a call to JSON_QUERY.
, [data.array3] = JSON_QUERY(
CASE WHEN @id is NOT NULL
THEN JSON_QUERY(@projects, '$.projects')
ELSE NULL END
)
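Applied to the full example, the query would look roughly like this (a sketch using the asker's variables):
declare @projects nvarchar(max) = '{"projects": [23439658267415,166584258534050]}'
declare @id bigint = 123
SELECT
[data.array1] = JSON_QUERY(@projects, '$.projects')
, [data.array3] = JSON_QUERY(
    CASE WHEN @id IS NOT NULL
    THEN JSON_QUERY(@projects, '$.projects')
    ELSE NULL END)
FOR JSON PATH, without_array_wrapper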
According to the documentation, JSON_QUERY "Extracts an object or an array from a JSON string". Further down it says "Returns a JSON fragment of type nvarchar(max)". A bit confusing.
Doing a FOR JSON on a string value will give you a string value in the returned JSON, and doing it on a JSON object gets the JSON object inlined into the result.
You can look at CASE as a function call whose return type is determined automatically from the values you return from the CASE. And since JSON_QUERY returns a string (nvarchar), the CASE returns a string, and the returned value becomes a string value in the JSON.
The case statement in the query plan looks like this.
<ScalarOperator ScalarString="CASE WHEN [#id] IS NOT NULL THEN json_query([#projects],N'$.projects') ELSE NULL END">
When you wrap the case in a call to JSON_QUERY it looks like this instead.
<ScalarOperator ScalarString="json_query(CASE WHEN [#id] IS NOT NULL THEN json_query([#projects],N'$.projects') ELSE NULL END)">
<Intrinsic FunctionName="json_query">
By some kind of internal magic, SQL Server recognizes this as a JSON object instead of a string and inserts it into the resulting JSON as a JSON value instead.
CASE WHEN 1 IS NOT NULL works because SQL Server is smart enough to see that the CASE will always be true, so it is optimized away.
