For "Querying Data in Staged Files" in Snowflake what does the syntax "=>" mean? - snowflake-cloud-data-platform

In the Snowflake documentation for "Querying Data in Staged Files", why is the syntax for the "Pattern" and "Format" parameters "=>", whereas in the COPY INTO syntax the "Pattern" and "Format" parameters use "="?
The documentation doesn't mention anything about this difference, so I'm confused.
">=" means Greater than or Equal to
"<=" means Less than or Equal to
So, what the hell does "=>" mean?
Link to the documentation for "Querying Data in Staged Files": https://docs.snowflake.com/en/user-guide/querying-stage.html
Link to the documentation for "COPY INTO ": https://docs.snowflake.com/en/sql-reference/sql/copy-into-table.html
Link to the documentation for Snowflake Query Operators: https://docs.snowflake.com/en/sql-reference/operators.html

In general, when you define a function or stored procedure it has a specific signature, and this signature has to be matched when the routine is called.
Example:
CREATE OR REPLACE FUNCTION test(a INT, b TEXT)
RETURNS TEXT
AS
$$
CONCAT(a, ' ', b)
$$;
SHOW FUNCTIONS LIKE 'TEST';
-- TEST(NUMBER, VARCHAR) RETURN VARCHAR
When calling the test function, the argument order has to match its signature ("positional notation"):
SELECT test(1, 'b');
-- 1 b
Unfortunately, it is not possible to use named parameters for user-defined objects and explicitly state the parameters ("named notation"); the following calls fail:
SELECT test(a => 1, b => 'b');
SELECT test(b => 'b', a => 1);
SELECT test(b => 'b');
Some built-in constructs, however, do allow named parameters with => (for instance FLATTEN or the staged file clause).
Using FLATTEN, as it is easier to produce a self-contained example:
FLATTEN( INPUT => <expr> [ , PATH => <constant_expr> ]
                         [ , OUTER => TRUE | FALSE ]
                         [ , RECURSIVE => TRUE | FALSE ]
                         [ , MODE => 'OBJECT' | 'ARRAY' | 'BOTH' ] )
All 3 invocations are correct:
-- no explicit parameters names
SELECT * FROM TABLE(FLATTEN(parse_json('{"a":1, "b":[77,88]}'), 'b')) f;
-- parameters names order: input, path
SELECT * FROM TABLE(FLATTEN(input => parse_json('{"a":1, "b":[77,88]}'), path => 'b')) f;
-- parameters names order: path, input
SELECT * FROM TABLE(FLATTEN(path => 'b', input => parse_json('{"a":1, "b":[77,88]}'))) f;
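Bringing this back to the question: the staged file clause is one of those built-in constructs, which is why PATTERN and FILE_FORMAT take => there, while COPY INTO treats them as plain statement options with =. A minimal sketch, assuming a stage named my_stage and a named file format my_csv_format already exist (both names are placeholders):
-- => passes PATTERN and FILE_FORMAT as named arguments to the staged file clause
SELECT t.$1, t.$2
FROM @my_stage (
       FILE_FORMAT => 'my_csv_format',
       PATTERN => '.*sales.*[.]csv'
     ) t;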

Related

PostgreSQL: how to select a value from multiple JSONs inside an array in a JSONB column

I have this table
create table <table_name>(attr jsonb)
And this is the data inside
{
"rules": [
{
"id": "foo",
"name": "test_01",
...
},
{
"id": "bar",
"name": "test_02",
...
}
]
}
What I want is to select both names; what I have accomplished so far is this:
select attr -> 'rules' -> 0 -> 'name' from <table_name>;
which returns test_01
select attr -> 'rules' -> 1 -> 'name' from <table_name>;
which returns test_02
I want to return something like this:
test_01,test_02
or, if it's possible to return them on multiple lines, that would be even better.
This is sample data to show my problem; for reasons beyond my control, it's not possible to store each rule in a separate row.
You can use jsonb_array_length together with generate_series to get each name, then use string_agg to aggregate the list of names. No plpgsql needed, and it is a single statement (see demo).
with jl(counter) as (
  select jsonb_array_length(attr->'rules') from table_name
)
select string_agg(name, ' ') "Rule Names"
from (
  select attr->'rules'-> n ->> 'name' name
  from table_name
  cross join (select generate_series(0, counter-1) from jl) gs(n)
) rn;
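If you would rather get each name on its own row (the "multiple lines" option from the question), a minimal sketch using jsonb_array_elements, assuming the same table:
-- one output row per element of the rules array
select r ->> 'name' as rule_name
from table_name
cross join lateral jsonb_array_elements(attr -> 'rules') as r;
Wrapping that in string_agg(r ->> 'name', ',') gives the single comma-separated line instead.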
If anyone else gets stuck in a situation like this, this is the solution that I found:
create or replace function func_get_name() returns text
language 'plpgsql'
as $$
declare
  len   integer;                       -- number of elements in the rules array
  names character varying(255);
  res   character varying(255);
begin
  select jsonb_array_length(attr->'rules') into len from <table_name>;
  res := '';
  -- 0..len deliberately overshoots by one; the null check below skips the extra iteration
  for counter in 0..len loop
    select attr->'rules'-> counter ->> 'name'
      into names
      from <table_name>;
    if names is not null then
      res := res || ' ' || names;
    end if;
  end loop;
  return res;
end;
$$;
select func_get_name();
Is it a solution? Yes. Is it a good solution? I have no idea.

BigQuery: extract keys from json object, convert json from object to key-value array

I have a table with a column that contains a JSON object; the value type is always a string.
I need two kinds of information:
a list of the JSON keys
the JSON converted into an array of key-value pairs
This is what I got so far, which is working:
CREATE TEMP FUNCTION jsonObjectKeys(input STRING)
RETURNS Array<String>
LANGUAGE js AS """
  return Object.keys(JSON.parse(input));
""";
CREATE TEMP FUNCTION jsonToKeyValueArray(input STRING)
RETURNS Array<Struct<key String, value String>>
LANGUAGE js AS """
  let json = JSON.parse(input);
  return Object.keys(json).map(e => {
    return { "key" : e, "value" : json[e] }
  });
""";
WITH input AS (
  SELECT "{\"key1\": \"value1\", \"key2\": \"value2\"}" AS json_column
  UNION ALL
  SELECT "{\"key1\": \"value1\", \"key3\": \"value3\"}" AS json_column
  UNION ALL
  SELECT "{\"key5\": \"value5\"}" AS json_column
)
SELECT
  json_column,
  jsonObjectKeys(json_column) AS keys,
  jsonToKeyValueArray(json_column) AS key_value
FROM input
The problem is that FUNCTION is not the best in terms of compute optimization, so I'm trying to understand if there is a way to achieve these two needs (or at least one of them) using plain SQL without functions.
Below is for BigQuery Standard SQL
#standardSQL
select
  json_column,
  array(select trim(split(kv, ':')[offset(0)]) from t.kv kv) as keys,
  array(
    select as struct
      trim(split(kv, ':')[offset(0)]) as key,
      trim(split(kv, ':')[offset(1)]) as value
    from t.kv kv
  ) as key_value
from input,
unnest([struct(split(translate(json_column, '{}"', '')) as kv)]) t
Applied to the sample data from your question, this produces the same keys and key_value columns as the JS functions above.
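To see why this works, trace one sample row through the expressions by hand (a worked example, assuming, as in the sample data, that no value contains ':' or ','):
translate('{"key1": "value1", "key2": "value2"}', '{}"', '')   -- removes braces and quotes -> 'key1: value1, key2: value2'
split(...)                                                     -- splits on ',' -> ['key1: value1', ' key2: value2']
trim(split(kv, ':')[offset(0)])                                -- -> 'key1', 'key2' (the keys)
trim(split(kv, ':')[offset(1)])                                -- -> 'value1', 'value2' (the values)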

How do I unload a CSV file where only non-null values are wrapped in quotes, quotes are optionally enclosed, and null values are not quoted?

(Submitting on behalf of a Snowflake User)
For example - ""NiceOne"" LLC","Robert","GoodRX",,"Maxift","Brian","P,N and B","Jane"
I have been able to create a file format that satisfies each of these conditions individually, but not one that satisfies all three.
I've used the following recommendation:
Your first column is malformed, missing the initial ", it should be:
"""NiceOne"" LLC"
After fixing that, you should be able to load your data with almost default settings:
COPY INTO my_table FROM @my_stage/my_file.csv FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '"');
...but the above format returns:
"""NiceOne"" LLC","Robert","GoodRX","","Maxift","Brian","P,N and B","Jane"
I don't want quotes around empty fields. I'm looking for
"""NiceOne"" LLC","Robert","GoodRX",,"Maxift","Brian","P,N and B","Jane"
Any recommendations?
If you use the following, you will not get quotes around NULL fields, but you will get quotes around '' (empty text). You can always concatenate the fields and format the resulting line manually if this doesn't suit you.
COPY INTO @my_stage/my_file.CSV
FROM (
  SELECT
    '"NiceOne" LLC' A, 'Robert' B, 'GoodRX' C, NULL D,
    'Maxift' E, 'Brian' F, 'P,N and B' G, 'Jane' H
)
FILE_FORMAT = (
  TYPE = CSV
  FIELD_OPTIONALLY_ENCLOSED_BY = '"'
  NULL_IF = ()
  COMPRESSION = NONE
)
OVERWRITE = TRUE
SINGLE = TRUE
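If the quoting of '' (empty text) is a deal-breaker, the "concatenate the fields and format the resulting line manually" route gives full control. A rough sketch, not a tested solution, assuming a source table my_table with columns A through D (placeholder names) and using FIELD_DELIMITER = NONE because each line is already built as a single string:
COPY INTO @my_stage/my_file.csv
FROM (
  SELECT
    -- wrap only non-NULL values in quotes, doubling any embedded quotes;
    -- NULLs become truly empty fields with no surrounding quotes
    IFF(A IS NULL, '', '"' || REPLACE(A, '"', '""') || '"') || ',' ||
    IFF(B IS NULL, '', '"' || REPLACE(B, '"', '""') || '"') || ',' ||
    IFF(C IS NULL, '', '"' || REPLACE(C, '"', '""') || '"') || ',' ||
    IFF(D IS NULL, '', '"' || REPLACE(D, '"', '""') || '"')
  FROM my_table
)
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = NONE COMPRESSION = NONE)
OVERWRITE = TRUE
SINGLE = TRUE;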

Create Postgres JSONB Index on Array Sub-Object

I have a table myTable with a JSONB column myJsonb whose data structure I want to index, like:
{
  "myArray": [
    {
      "subItem": {
        "email": "bar@bar.com"
      }
    },
    {
      "subItem": {
        "email": "foo@foo.com"
      }
    }
  ]
}
I want to run indexed queries on email like:
SELECT *
FROM mytable
WHERE 'foo@foo.com' IN (
  SELECT lower(
           jsonb_array_elements(myjsonb -> 'myArray')
             -> 'subItem'
             ->> 'email'
         )
);
How do I create a Postgres JSONB index for that?
If you don't need the lower() in there, the query can be simple and efficient:
SELECT *
FROM mytable
WHERE myjsonb -> 'myArray' @> '[{"subItem": {"email": "foo@foo.com"}}]';
Supported by a jsonb_path_ops index:
CREATE INDEX mytable_myjsonb_gin_idx ON mytable
USING gin ((myjsonb -> 'myArray') jsonb_path_ops);
But the match is case-sensitive.
Case-insensitive!
If you need the search to match disregarding case, things get more complex.
You could use this query, similar to your original:
SELECT *
FROM t
WHERE EXISTS (
  SELECT 1
  FROM jsonb_array_elements(myjsonb -> 'myArray') arr
  WHERE lower(arr #>> '{subItem, email}') = 'foo@foo.com'
);
But I can't think of a good way to use an index for this.
Instead, I would use an expression index based on a function extracting an array of lower-case emails:
Function:
CREATE OR REPLACE FUNCTION f_jsonb_arr_lower(_j jsonb, VARIADIC _path text[])
RETURNS jsonb LANGUAGE sql IMMUTABLE AS
'SELECT jsonb_agg(lower(elem #>> _path)) FROM jsonb_array_elements(_j) elem';
Index:
CREATE INDEX mytable_email_arr_idx ON mytable
USING gin (f_jsonb_arr_lower(myjsonb -> 'myArray', 'subItem', 'email') jsonb_path_ops);
Query:
SELECT *
FROM mytable
WHERE f_jsonb_arr_lower(myjsonb -> 'myArray', 'subItem', 'email') @> '"foo@foo.com"';
While this works with an untyped string literal or with actual jsonb values, it stops working if you pass text or varchar (like in a prepared statement). Postgres does not know how to cast because the input is ambiguous. You need an explicit cast in this case:
... #> '"foo#foo.com"'::text::jsonb;
Or pass a simple string without enclosing double quotes and do the conversion to jsonb in Postgres:
... @> to_jsonb('foo@foo.com'::text);
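For example, with a prepared statement whose parameter comes in as plain text (purely illustrative, reusing the function and index above):
-- the text parameter is lower-cased and converted to jsonb explicitly,
-- so the @> containment check keeps the same form as the indexed query above
PREPARE find_email(text) AS
SELECT *
FROM mytable
WHERE f_jsonb_arr_lower(myjsonb -> 'myArray', 'subItem', 'email') @> to_jsonb(lower($1));

EXECUTE find_email('foo@foo.com');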
Related, with more explanation:
Query for array elements inside JSON type
Index for finding an element in a JSON array

Concatenating INT and VARCHAR inside EXEC not producing conversion error

Given the following table:
USE tempdb;
CREATE TABLE #T(Val INT);
INSERT INTO #T VALUES (1), (2), (3), (4), (5);
I wanted to execute a dynamic sql query using EXEC given a Val value:
DECLARE @sql NVARCHAR(MAX);
DECLARE @Val INT = 3;
EXEC ('SELECT * FROM #T WHERE Val = ' + @Val);
This executes without error and gives the correct result.
My assumption was that this would produce an error:
Conversion failed when converting the varchar value 'SELECT * FROM #T WHERE Val = ' to data type int.
Since @Val is of the INT data type, by the rules of data type precedence the query string inside the EXEC must be converted to INT.
My question is why didn't the call to EXEC produce a conversion error?
Notes:
- I know about sp_executesql. I'm not asking for an alternative. I'm just asking for an explanation of why no error was produced.
- The answer to this question does not seem to explain my situation as the question refers to VARCHAR to VARCHAR concatenation.
According to MSDN/BOL, the simplified syntax for the EXEC[UTE] statement is:
Execute a character string
{ EXEC | EXECUTE }
( { @string_variable | [ N ]'tsql_string' } [ + ...n ] )
[ AS { USER } = ' name ' ]
[;]
@string_variable
Is the name of a local variable. @string_variable can be any char, varchar, nchar, or nvarchar data type. These include the (max) data types.
A few notes:
1) According to the line ( { @string_variable | [ N ] 'command_string [ ? ]' } [ + ...n ] ), we can write something like EXEC (@var1 + @var2 + @var3), but according to the last paragraph SQL Server expects these variables to have one of the following string data types: char, varchar, nchar, or nvarchar.
2) Also, this syntax references only string variables ( { @string_variable | [ N ] 'command_string [ ? ]' } [ + ...n ). I believe this is the reason why EXEC ('SELECT * FROM #T WHERE Val = ' + 3); fails: 3 isn't a variable.
3) I assume that when one of these variables doesn't have one of the above string types, SQL Server does an implicit conversion. I assume it converts the source variable from INT (for example) to NVARCHAR, because NVARCHAR has the highest data type precedence among these string types.
4) This is not the only place where data type precedence doesn't apply. ISNULL(param1, param2) is just another example: there, param2 is converted to the data type of param1.
An implicit conversion from int to string types is allowed, at least as far back as SQL Server 2008.
Ref: Data Type Conversion (Database Engine)
You cannot disable the implicit conversion: Is there a way to turn off implicit type conversion in SQL Server?
Edit: I originally wrote
'SELECT * FROM #T WHERE Val = ' + @Val is created before the call to EXEC.
I am not so sure about that. I now suspect that the argument to EXEC is passed to part of the DB engine that parses it in a different way to what we are used to seeing.
All that MSDN tells us about concatenation in EXEC[UTE] is:
the concatenation is performed logically in the SQL Server parser and never materializes in memory.
So we cannot know much about what SQL Server does deep inside. All we know is that EXEC does not accept an expression as an argument; instead, it accepts a list of strings delimited by '+'.
If you look at the syntax:
Execute a character string
{ EXEC | EXECUTE }
( { @string_variable | [ N ]'tsql_string' } [ + ...n ] )
[ AS { LOGIN | USER } = ' name ' ]
[;]
It directly supports a number of strings or variables separated by the '+' character. So you are not actually passing an expression to EXEC, and you are bypassing the SQL Server expression parser.
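A short demo of the distinction, reusing the #T temp table from the question (the failing statements are left commented out):
DECLARE @Val INT = 3;

-- works: variable + string literal is a valid EXEC string list,
-- and the INT variable is converted to a string by EXEC's own parsing
EXEC ('SELECT * FROM #T WHERE Val = ' + @Val);

-- fails as described above: 3 is a literal, not a variable,
-- so it is not allowed in the EXEC string list
-- EXEC ('SELECT * FROM #T WHERE Val = ' + 3);

-- outside EXEC, normal data type precedence applies and this raises
-- "Conversion failed when converting the varchar value ... to data type int."
-- SELECT 'SELECT * FROM #T WHERE Val = ' + @Val;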
