Snowflake behaves differently for literal values and table values

I have a function, AVGF, which is supposed to output the average of a comma-separated sequence:
CREATE OR REPLACE FUNCTION AVGF (STR VARCHAR)
RETURNS VARCHAR
LANGUAGE JAVASCRIPT
AS $$
var str_array = STR.split(",");
var total = 0.0
for (i = 0; i < str_array.length; i += 1) {
total += parseFloat(str_array[i]);
}
return total/str_array.length;
$$
;
It works fine for the example below:
SELECT AVGF('1,2,3')
Output: 2
However, when I use it on a table, it doesn't work:
SELECT AVGF(concat(TO_CHAR(ic.col1), ',', TO_CHAR(ic.col2))::STRING)
from tab ic
Output: JavaScript execution error: Uncaught TypeError: Cannot read property 'split' of undefined in AVGF at ' var str_array = STR.split(",");' position 21
where tab is a table with col1 (NUMBER) and col2 (FLOAT).
What is the reason for this, and is there a solution?

This was a casting issue; I modified the function as shown below to make it work:
CREATE OR REPLACE FUNCTION AVGF (STR VARCHAR)
RETURNS VARCHAR
LANGUAGE JAVASCRIPT
AS $$
var str_array = String(STR).split(","); // String() coerces the argument to a string before splitting
var total = 0.0;
for (var i = 0; i < str_array.length; i += 1) {
total += parseFloat(str_array[i]);
}
return total/str_array.length;
$$
;
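A likely underlying cause, offered as an assumption rather than a confirmed diagnosis: CONCAT returns NULL when any of its arguments is NULL, and Snowflake passes a SQL NULL argument to a JavaScript UDF as undefined. That would make STR.split fail only on rows where col1 or col2 is NULL. With String(STR), those rows no longer raise an error, but String(undefined) is the text "undefined", so the average silently becomes NaN. The sketch below reproduces the UDF body as a plain JavaScript function (avgf is a hypothetical stand-alone name) and returns null for a missing input instead; declaring the UDF STRICT, as js_factorial does further down this page, is another way to get NULL out for NULL in.

```javascript
// Hypothetical stand-alone version of the AVGF body, so it can run outside
// Snowflake. Inside the real UDF the argument is the upper-case STR.
function avgf(str) {
  // Assumption: a SQL NULL argument reaches a JavaScript UDF as undefined.
  if (str === undefined || str === null) {
    return null; // a UDF returning null yields SQL NULL instead of "NaN"
  }
  const parts = String(str).split(",");
  let total = 0.0;
  for (const part of parts) {
    total += parseFloat(part);
  }
  return total / parts.length;
}

console.log(avgf("1,2,3"));   // 2
console.log(avgf(undefined)); // null
// What String(STR) does with a missing value: parsing it gives NaN.
console.log(parseFloat(String(undefined))); // NaN
```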

Related

How to generate Stackoverflow table markdown from Snowflake

Stack Overflow supports Markdown tables. For example, to display a table like this:
N_NATIONKEY  N_NAME     N_REGIONKEY
0            ALGERIA    0
1            ARGENTINA  1
2            BRAZIL     1
3            CANADA     1
4            EGYPT      4
You can write code like this:
|N_NATIONKEY|N_NAME|N_REGIONKEY|
|---:|:---|---:|
|0|ALGERIA|0|
|1|ARGENTINA|1|
|2|BRAZIL|1|
|3|CANADA|1|
|4|EGYPT|4|
It would save a lot of time to generate Stack Overflow table Markdown automatically when running Snowflake queries.

The following stored procedure accepts either a query string or a query ID (it auto-detects which one it received) and returns the results as Stack Overflow table Markdown. It automatically aligns numbers and dates to the right; strings, arrays, and objects to the left; and centers all other types. It supports any query you can pass to it. It's a good idea to delimit the string passed into the procedure with $$ in case the SQL contains single quotes. You can create the procedure and test it using this script:
create or replace procedure MARKDOWN("queryOrQueryId" string)
returns string
language javascript
execute as caller
as
$$
const MAX_ROWS = 50; // Set the maximum row count to fetch. Tables in markdown larger than this become hard to read.
var [rs, i, c, row, props] = [null, 0, 0, 0, {}];
if (!queryOrQueryId || queryOrQueryId == 0){
queryOrQueryId = `select * from table(result_scan(last_query_id())) limit ${MAX_ROWS}`;
}
queryOrQueryId = queryOrQueryId.trim();
if (isUUID(queryOrQueryId)){
rs = snowflake.execute({sqlText:`select * from table(result_scan('${queryOrQueryId}')) limit ${MAX_ROWS}`});
} else {
rs = snowflake.execute({sqlText:`${queryOrQueryId}`});
}
props.columnCount = rs.getColumnCount();
for(i = 1; i <= props.columnCount; i++){
props["col" + i + "Name"] = rs.getColumnName(i);
props["col" + i + "Type"] = rs.getColumnType(i);
}
var table = getHeader(props);
while(rs.next()){
row = "|";
for(c = 1; c <= props.columnCount; c++){
row += escapeMarkup(rs.getColumnValueAsString(c)) + "|";
}
table += "\n" + row;
}
return table;
//------ End main function. Start of helper functions.
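function escapeMarkup(s){
s = s.replace(/\\/g, "\\\\"); // escape backslashes first
s = s.replace(/\|/g, "\\|"); // escape pipes, which would otherwise start a new cell
s = s.replace(/\s+/g, " "); // collapse newlines/tabs, which would otherwise break the row
return s;
}
function getHeader(props){
var s = "|";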
for (var i = 1; i <= props.columnCount; i++){
s += props["col" + i + "Name"] + "|";
}
s += "\n";
for (var i = 1; i <= props.columnCount; i++){
switch(props["col" + i + "Type"]) {
case 'number':
s += '|---:';
break;
case 'string':
s += '|:---';
break;
case 'date':
s += '|---:';
break;
case 'json':
s += '|:---';
break;
default:
s += '|:---:';
}
}
return s + "|";
}
function isUUID(str){
const regexExp = /^[0-9a-fA-F]{8}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{12}$/gi;
return regexExp.test(str);
}
$$;
-- Usage type 1, a simple query:
call markdown($$ select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5 $$);
-- Usage type 2, a query ID:
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5;
set quid = (select last_query_id());
call markdown($quid);
Edit: Based on Fieldy's helpful feedback, I modified the procedure to allow passing null, 0, or a blank string '' as the parameter. This uses the last query ID and is a helpful shortcut. I also added a constant that limits the output to a set number of rows. The limit is applied when using query IDs (or when sending null, '', or 0, which use the last query ID). It is not applied when the input parameter is the text of a query to run, to avoid syntax errors if the query already has a limit, etc.
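The formatting half of the procedure (alignment markers chosen by column type, plus cell escaping) does not depend on Snowflake, so it can be pulled out and tested locally. The following is a minimal sketch, assuming the same type names the procedure switches on ("number", "string", "date", "json"); toMarkdownTable, alignmentFor and escapeCell are hypothetical names, and the columns/rows shape is an assumption standing in for what the result set provides.

```javascript
// Pick a Markdown alignment marker for a column type, as the procedure does.
function alignmentFor(type) {
  switch (type) {
    case "number":
    case "date":
      return "---:"; // right-align
    case "string":
    case "json":
      return ":---"; // left-align
    default:
      return ":---:"; // center everything else
  }
}

// Escape characters that would break a Markdown table cell.
function escapeCell(value) {
  return String(value)
    .replace(/\\/g, "\\\\") // backslashes first, so later escapes are not doubled
    .replace(/\|/g, "\\|")  // a bare pipe would start a new cell
    .replace(/\s+/g, " ");  // newlines/tabs would end the row early
}

// columns: [{ name, type }]; rows: arrays of cell values (hypothetical shape).
function toMarkdownTable(columns, rows) {
  const header = "|" + columns.map(c => c.name).join("|") + "|";
  const divider = "|" + columns.map(c => alignmentFor(c.type)).join("|") + "|";
  const body = rows.map(r => "|" + r.map(escapeCell).join("|") + "|");
  return [header, divider, ...body].join("\n");
}

console.log(toMarkdownTable(
  [{ name: "N_NATIONKEY", type: "number" }, { name: "N_NAME", type: "string" }],
  [[0, "ALGERIA"], [1, "ARGENTINA"]]
));
```

Escaping backslashes before pipes matters: doing it the other way round would turn the inserted "\|" into "\\|", which Markdown reads as an escaped backslash followed by a real cell separator.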

Greg Pavlik's JavaScript stored procedure solution made me wonder whether this would be any easier with the new Python language support in stored procedures. This is currently a public-preview feature.
The Python Snowpark API supports returning a result as a pandas DataFrame, and pandas supports rendering a DataFrame as Markdown via the tabulate package. Here's the stored procedure:
CREATE OR REPLACE PROCEDURE markdown_table(query_id VARCHAR)
RETURNS VARCHAR
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python','pandas','tabulate')
HANDLER = 'markdown_table'
EXECUTE AS CALLER
AS $$
import re
import pandas as pd

def markdown_table(session, queryOrQueryId = None):
    # NULL argument: format the result of the previous query
    if queryOrQueryId is None:
        pandas_result = session.sql("select * from table(result_scan(last_query_id()))").to_pandas()
    # Argument looks like a query ID (a lowercase UUID)
    elif re.match(r"^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", queryOrQueryId):
        pandas_result = session.sql(f"select * from table(result_scan('{queryOrQueryId}'))").to_pandas()
    # Otherwise, treat the argument as query text
    else:
        pandas_result = session.sql(queryOrQueryId).to_pandas()
    # DataFrame.to_markdown() needs the tabulate package listed in PACKAGES
    return pandas_result.to_markdown()
$$;
Which you can use as follows:
-- Usage type 1, use the result of the query run immediately preceding the stored procedure call
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5;
call markdown_table(NULL);
-- Usage type 2, pass in a query_id
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5;
set quid = (select last_query_id());
select $quid;
call markdown_table($quid);
-- Usage type 3, pass a query string to the stored procedure call
call markdown_table('select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5');
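Both procedures first decide what SQL to run from their single argument. The JavaScript version's rules (falsy or 0-like input means "last query" with a row limit, a UUID means "that query ID" with a row limit, anything else is run as-is) can be isolated as a pure function and checked without a Snowflake connection. This is a sketch mirroring that logic; resolveSql is a hypothetical name, and maxRows stands in for MAX_ROWS. The Python version differs slightly: only NULL selects the last query, and it appends no limit.

```javascript
// Same UUID shape the procedures use to recognize a query ID.
const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

function resolveSql(queryOrQueryId, maxRows = 50) {
  // Mirrors the procedure's check: null/undefined/'' are falsy, and the loose
  // == 0 also matches '0' and whitespace-only strings.
  if (!queryOrQueryId || queryOrQueryId == 0) {
    return `select * from table(result_scan(last_query_id())) limit ${maxRows}`;
  }
  const text = String(queryOrQueryId).trim();
  if (UUID_RE.test(text)) {
    return `select * from table(result_scan('${text}')) limit ${maxRows}`;
  }
  return text; // query text is run unchanged, with no limit appended
}

console.log(resolveSql(null));
console.log(resolveSql("01a2b3c4-0000-1111-2222-333344445555", 5));
console.log(resolveSql("  select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5  "));
```

Keeping the regex without the g flag avoids the lastIndex state that a global RegExp carries between test() calls.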

The table can also be written without the leading and trailing pipes:
N_NATIONKEY|N_NAME|N_REGIONKEY
--|--|--
0|ALGERIA|0
1|ARGENTINA|1
2|BRAZIL|1
3|CANADA|1
4|EGYPT|4
which gives the following, so it can be a simpler solution:
N_NATIONKEY  N_NAME     N_REGIONKEY
0            ALGERIA    0
1            ARGENTINA  1
2            BRAZIL     1
3            CANADA     1
4            EGYPT      4
I copy the result table into Notepad++, replace each tab (\t) with a pipe (|), and then insert the header separator line by hand. I sometimes replace empty NULL results with the text null to make the results make more sense. The form you use, with leading and trailing pipes, gets around the need for that.
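The manual Notepad++ steps above (tabs to pipes, a hand-written separator line, empty cells replaced with null) can be scripted. This is a minimal sketch, assuming the copied result is tab-separated with column names on the first line; tsvToMarkdown is a hypothetical name. It emits the leading and trailing pipes so an empty first or last cell stays visible even without the null substitution.

```javascript
// Convert tab-separated text (header on the first line) into a Markdown table.
function tsvToMarkdown(tsv, nullText = "null") {
  // Split into lines and drop blank ones; do not trim, so trailing tabs
  // (empty last cells) survive.
  const lines = tsv.split(/\r?\n/).filter(line => line.length > 0);
  const rows = lines.map(line =>
    line.split("\t").map(cell => (cell === "" ? nullText : cell))
  );
  const toLine = cells => "|" + cells.join("|") + "|";
  const divider = rows[0].map(() => "---");
  return [toLine(rows[0]), toLine(divider), ...rows.slice(1).map(toLine)].join("\n");
}

console.log(tsvToMarkdown("N_NATIONKEY\tN_NAME\n0\tALGERIA\n1\t\n"));
```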

The DBeaver IDE supports "data export as Markdown" and "advanced copy as Markdown" out of the box:
Output:
|R_REGIONKEY|R_NAME|R_COMMENT|
|-----------|------|---------|
|0|AFRICA|lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to |
|1|AMERICA|hs use ironic, even requests. s|
|2|ASIA|ges. thinly even pinto beans ca|
|3|EUROPE|ly final courts cajole furiously final excuse|
|4|MIDDLE EAST|uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl|
It is rendered as:
R_REGIONKEY  R_NAME       R_COMMENT
0            AFRICA       lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to
1            AMERICA      hs use ironic, even requests. s
2            ASIA         ges. thinly even pinto beans ca
3            EUROPE       ly final courts cajole furiously final excuse
4            MIDDLE EAST  uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl

Snowflake regexp_replace not working as expected

I am trying to produce paths with each element enclosed in double quotes (e.g. "path"."to"."element"), while also stripping any bracket-enclosed array element references (like "[0]"):
var path_name = "regexp_replace(regexp_replace("customers[0].name",'\\[(.+)\\]'),'(\\w+)','"\\1"')" ;
I tried this method, but it displays an error.

So this is a really poorly written question, but let's play the guessing game anyway.
So you have a JavaScript stored procedure with that line inside it, and it doesn't work as you expect. Let's guess it looks like this:
create or replace procedure sp()
returns VARCHAR
language javascript
as
$$
var txt = '"customers[0].name"';
var sql_regexp1 = '\\\\[(.+)\\\\]';
var sql_regexp2 = '(\\\\w+)';
var sql_rep_2 = '\"\\\\1\"';
var full_rep1 = "regexp_replace('" + txt + "','"+ sql_regexp1 +"')";
var full_rep2 = "select regexp_replace(" + full_rep1 + ",'"+ sql_regexp2 +"','"+ sql_rep_2 + "');";
//return full_rep2;
var statement = snowflake.createStatement( {sqlText: full_rep2} );
var result_set1 = statement.execute();
result_set1.next();
return result_set1.getColumnValue(1);
$$;
If you uncomment the early return, you can see full_rep2, and thus test the inner SQL:
select regexp_replace('"customers[0].name"','\\[(.+)\\]');
gives:
REGEXP_REPLACE('"CUSTOMERS[0].NAME"','\[(.+)\]')
"customers.name"
Let's assume that's correct!
Then you can check the outer replace:
select regexp_replace(regexp_replace('"customers[0].name"','\\[(.+)\\]'),'(\\w+)','"\\1"');
which gives:
REGEXP_REPLACE(REGEXP_REPLACE('"CUSTOMERS[0].NAME"','\[(.+)\]'),'(\W+)','"\1"')
""customers"."name""
And if we call the stored procedure:
call sp();
we get:
SP
""customers"."name""
So this is how I debugged the SQL/JavaScript to get valid, working SQL. The question then becomes: what output did you want, and can you get there from here?
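Most of the confusion here comes from the two layers of backslash escaping (the JavaScript string literal, then the Snowflake string literal) before the regex engine ever sees the pattern. The sketch below peels those layers and replays both REGEXP_REPLACE calls with JavaScript's String.replace. It assumes JavaScript RegExp and Snowflake's POSIX-style regex agree on these simple patterns, and it relies on Snowflake's REGEXP_REPLACE replacing all occurrences by default (hence the g flag). The quotedPath step is a guess at the intended "path"."to"."element" output, not something the question confirms.

```javascript
// Layer 1: inside the procedure, the JS literal '\\\\[(.+)\\\\]' evaluates to \\[(.+)\\]
const sqlLiteralBody = '\\\\[(.+)\\\\]';
// Layer 2: Snowflake's string-literal parsing turns each pair of backslashes into one
const regexSource = sqlLiteralBody.replace(/\\\\/g, "\\");
console.log(regexSource); // \[(.+)\]

const txt = '"customers[0].name"';
// Inner REGEXP_REPLACE with no replacement argument: delete every match
const step1 = txt.replace(new RegExp(regexSource, "g"), "");
console.log(step1); // "customers.name"
// Outer REGEXP_REPLACE: wrap every word in double quotes
const step2 = step1.replace(/(\w+)/g, '"$1"');
console.log(step2); // ""customers"."name""

// Assuming the goal is "customers"."name": drop the existing quotes first
const quotedPath = step1.replace(/"/g, "").replace(/(\w+)/g, '"$1"');
console.log(quotedPath); // "customers"."name"

// The greedy (.+) spans from the first [ to the last ] when a path has several indexes
console.log("a[0].b[1].c".replace(/\[(.+)\]/g, ""));   // a.c
console.log("a[0].b[1].c".replace(/\[[^\]]*\]/g, "")); // a.b.c
```

The last two lines show why a pattern such as \[[^\]]*\] may be safer than \[(.+)\] for paths like customers[0].orders[1].id.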

What are the options available to get Primary Key Column Names in Snowflake?

I need to fetch all primary keys together with their table name, column name, and schema name.
I am using INFORMATION_SCHEMA for all metadata fetching. SHOW PRIMARY KEYS / DESCRIBE TABLE does the job, but it's not an option here.
I need something similar to SELECT * FROM DB.INFORMATION_SCHEMA.XXX.
What are the options we have here?
*I am using JDBC.

You may consider using the JDBC DatabaseMetaData method getPrimaryKeys(String, String, String).
Details: https://docs.snowflake.com/en/user-guide/jdbc-api.html#object-databasemetadata

A while back, I wrote a user-defined table function (UDTF) to get the PK column(s) for a single table, returning each column in the PK as a separate row. I extended it to return all the PK columns in an entire database.
Once you create the UDTF, you can get all the PKs for a database like this:
select * from table(get_pk_columns(get_ddl('database', 'MY_DB_NAME')));
It will return a table with columns for the schema name, table name, and column name(s). Note that if there's a composite PK, it shows in the table as one row per column. You can of course use an aggregate function such as listagg() to change that into a single row with the columns names of the composite PK separated by commas.
If your database has a very large number of tables/columns, the output of GET_DDL() may be too large to fit within the 16 MB limit. If it does fit, this should return the results quickly.
/********************************************************************************************************
 * User-defined table function (UDTF) to get all primary keys for a database.
 *
 * @param {string} DATABASE_DDL  The DDL for the database to get the PKs from. Usually use GET_DDL().
 * @return {table}  A table with the columns comprising each table's primary key.
 ********************************************************************************************************/
create or replace function GET_PK_COLUMNS(DATABASE_DDL string)
returns table ("SCHEMA_NAME" string, "TABLE_NAME" string, PK_COLUMN string)
language javascript
as
$$
{
processRow: function get_params(row, rowWriter, context){
var startTableLine = -1;
var endTableLine = -1;
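var dbDDL = row.DATABASE_DDL.replace(/'[^']*'/g, ''); // strip each quoted string (e.g. comments) separately, not everything between the first and last quote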
var lines = dbDDL.split("\n");
var currentSchema = "";
var currentTable = "";
var ln = 0;
var tableDDL = "";
var pkCols = null;
var c = 0;
for (var i=0; i < lines.length; i++) {
if (lines[i].match(/^create .* schema /)) {
currentSchema = lines[i].split("schema")[1].replace(/;/, '').trim();
//rowWriter.writeRow({PK_COLUMN: "currentSchema = " + currentSchema});
}
if (lines[i].match(/^create or replace TABLE /)) {
startTableLine = i;
}
if (startTableLine != -1 && lines[i] == ");") {
endTableLine = i;
}
if (startTableLine != -1 && endTableLine != -1) {
// We found a table. Now, join it and send it for parsing
tableDDL = "";
for (ln = startTableLine; ln <= endTableLine; ln++) {
if (ln > 0) tableDDL += "\n";
tableDDL += lines[ln];
}
startTableLine = -1;
endTableLine = -1;
currentTable = getTableName(tableDDL);
pkCols = getPKs(tableDDL);
for (c = 0; c < pkCols.length; c++) {
rowWriter.writeRow({PK_COLUMN: pkCols[c], SCHEMA_NAME: currentSchema, TABLE_NAME: currentTable});
}
}
}
function getTableName(tableDDL) {
var lines = tableDDL.split("\n");
var s = lines[1];
s = s.substring(s.indexOf(" TABLE ") + " TABLE ".length);
s = s.split(" (")[0];
return s;
}
function getPKs(tableDDL) {
var c;
var keyword = "primary key";
var ins = -1;
var s = tableDDL.split("\n");
for (var i = 0; i < s.length; i++) {
ins = s[i].indexOf(keyword);
if (ins != -1) {
var colList = s[i].substring(ins + keyword.length);
colList = colList.replace("(", "");
colList = colList.replace(")", "");
var colArray = colList.split(",");
for (var pkc = 0; pkc < colArray.length; pkc++) {
colArray[pkc] = colArray[pkc].trim();
}
return colArray;
}
}
return []; // No PK
}
}
}
$$;
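Both the JavaScript UDTF above and the SQL UDTF below boil down to the same string step: find "primary key (" in a table's DDL and split the column list on commas. This sketch isolates that step so it can be tried locally; pkColumnsFromDdl is a hypothetical name, and the sample DDL is a made-up string modeled on the out-of-line constraint shape both answers assume GET_DDL produces (inline or quoted-identifier forms are not handled).

```javascript
// Extract primary-key column names from a single table's DDL text.
function pkColumnsFromDdl(tableDdl) {
  const match = /primary key \(([^)]*)\)/i.exec(tableDdl);
  if (!match) return []; // no primary key constraint found
  return match[1].split(",").map(col => col.trim());
}

// Hypothetical DDL, shaped like the GET_DDL output the answers expect.
const ddl = `create or replace TABLE ORDERS (
\tID NUMBER(38,0) NOT NULL,
\tLINE NUMBER(38,0) NOT NULL,
\tprimary key (ID, LINE)
);`;
console.log(pkColumnsFromDdl(ddl)); // [ 'ID', 'LINE' ]
```

Trimming each name is what the original loop in getPKs intended; without it, composite keys come back with leading spaces.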

I did this using a very simple SQL-based UDTF:
CREATE OR REPLACE FUNCTION admin.get_primary_key(p_table_nm VARCHAR)
RETURNS TABLE(column_name VARCHAR, ordinal_position int)
AS
WITH t AS (select get_ddl('TABLE', p_table_nm) tbl_ddl)
, t1 AS (
SELECT POSITION('primary key (', tbl_ddl) + 13 pos
, SUBSTR(tbl_ddl, pos, POSITION(')', tbl_ddl, pos) - pos ) str
FROM t
)
SELECT TRIM(x.value) column_name -- trim the spaces left after splitting on ','
, x.index ordinal_position
FROM t1
, LATERAL SPLIT_TO_TABLE(t1.str, ',') x
;
You can then query this in a SQL statement:
select *
FROM TABLE(admin.get_primary_key('<your table name>'));
Unfortunately, due to the odd implementation of GET_DDL(), it only accepts a string literal, so you can't use this function with a lateral join to information_schema.tables. You get the following error:
SQL compilation error: Invalid value [CORRELATION(T.TABLE_SCHEMA) || '.' || CORRELATION(T.TABLE_NAME)] for function '2', parameter EXPORT_DDL: constant arguments expected

Question regarding nesting multiple functions inside of a function in Snowflake

(Submitting on behalf of a Snowflake User...)
QUESTION:
Is it possible to nest multiple functions inside of a function and pass all the parameters required?
For example:
CREATE OR REPLACE FUNCTION "udf_InteractionIndicator"("ROH_RENEWAL_SYSTEM_STATUS1" VARCHAR(100), "GOLS_OPPORTUNITY_LINE_STATUS" VARCHAR(100)
, "ROH_CLIENT_CURRENT_TEMPERATURE1" VARCHAR(100)
, "ROH_PO_ATTACHED" VARCHAR(100)
, "ROH_PO_NUMBER" VARCHAR(100)
, "RT_PAID_OVERRIDE" VARCHAR(100), "ROH_RENEWAL_OPPORTUNITY_STATUS1" VARCHAR(100)
, "ROH_RENEWAL_CONVERSATION_DATE" DATE, "ROH_APPROVAL_RECEIVED_DATE" DATETIME)
RETURNS NUMBER(1,0)
AS
$$
CASE WHEN ("udf_RenewalNoticeSentIndicator"("ROH_RENEWAL_SYSTEM_STATUS1", "ROH_CLIENT_CURRENT_TEMPERATURE1"
, "GOLS_OPPORTUNITY_LINE_STATUS"
, "ROH_PO_ATTACHED", "RT_PAID_OVERRIDE"
, "ROH_RENEWAL_OPPORTUNITY_STATUS1")) = 1
AND (ROH_RENEWAL_CONVERSATION_DATE IS NOT NULL
OR ("udf_AuthorizedIndicator"(ROH_APPROVAL_RECEIVED_DATE, "ROH_PO_ATTACHED", "ROH_PO_NUMBER")) = 1
OR ("udf_PaidIndicator"("GOLS_OPPORTUNITY_LINE_STATUS")) = 1
OR ("udf_ChurnIndicator"("GOLS_OPPORTUNITY_LINE_STATUS")) = 1
)
THEN 1 ELSE 0 END
$$
;
I've received the recommendation to:
...create a SQL UDF or JavaScript UDF. A JavaScript UDF can only contain JavaScript code, and an SQL UDF can contain only one SQL statement (no DML and DDL). In case of nesting, SQL UDF can call another SQL UDF or a JavaScript UDF but the same is not true with the JavaScript UDF (it only contains JavaScript code).
CREATE OR REPLACE FUNCTION udf_InteractionIndicator_nested(ID DOUBLE)
RETURNS DOUBLE
AS
$$
SELECT ID
$$;
create or replace function js_factorial(d double)
returns double
language javascript
strict
as '
if (D <= 0) {
return 1;
} else {
var result = 1;
for (var i = 2; i <= D; i++) {
result = result * i;
}
return result;
}
';
CREATE OR REPLACE FUNCTION udf_InteractionIndicator(ID DOUBLE)
RETURNS double
AS
$$
select udf_InteractionIndicator_nested(ID) + js_factorial(ID)
$$;
select udf_InteractionIndicator(4);
+-----------------------------+
| UDF_INTERACTIONINDICATOR(4) |
|-----------------------------|
| 28 |
+-----------------------------+
HOWEVER, I'm trying to accomplish this with a SQL UDF. It makes sense that nested functions can be created as long as they use the same parameter. I'd like to create a function that accepts, say, 8 parameters, where the underlying functions may reference all, some, or none of the parent function's parameters. That is where I run into an issue... THOUGHTS?

(A consultant in our community offered the following answer...)
If your use case is a "main" function that breaks work down into subfunctions that are only invoked from main, a JavaScript UDF makes the design much more compact and maintainable.
You simply define all the underlying functions within the main function, which is possible with JavaScript but not with a SQL UDF, and then you are free to use the main function's parameters everywhere.
CREATE OR REPLACE FUNCTION MAIN_JS(P1 STRING, P2 FLOAT)
RETURNS FLOAT
LANGUAGE JAVASCRIPT
AS '
function helper_1(p) { return p * 2; }
function helper_2(p) { return p == "triple" ? P2 * 3 : P2; }
return helper_1(P2) + helper_2(P1);
';
SELECT MAIN_JS('triple', 4); -- => 20
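The same pattern, scaled toward the many-parameter case in the question, can be sketched as plain JavaScript that would form the body of such a UDF. The function name, parameter names and helper rules below are hypothetical placeholders, not the user's real business logic; the point is only that nested helpers can read any subset of the outer parameters through closure (like helper_2 reading P2), or take a value explicitly. Upper-case parameter names mirror how a Snowflake JavaScript UDF exposes its arguments.

```javascript
// Hypothetical main function; each nested helper uses a different subset of
// the outer parameters.
function interactionIndicator(STATUS, LINE_STATUS, PO_NUMBER, CONVERSATION_DATE) {
  // Uses two outer parameters via closure.
  function renewalNoticeSent() {
    return STATUS === "SENT" && LINE_STATUS !== "CLOSED";
  }
  // Uses one outer parameter via closure.
  function authorized() {
    return PO_NUMBER !== undefined && PO_NUMBER !== null;
  }
  // Takes its input explicitly instead of via closure.
  function isPaid(status) {
    return status === "PAID";
  }
  // != null is true for any value other than null/undefined.
  return renewalNoticeSent() &&
    (CONVERSATION_DATE != null || authorized() || isPaid(LINE_STATUS)) ? 1 : 0;
}

console.log(interactionIndicator("SENT", "OPEN", "PO-1", undefined)); // 1
console.log(interactionIndicator("SENT", "OPEN", undefined, null));   // 0
```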

Pass array from node-postgres to plpgsql function

The plpgsql function:
CREATE OR REPLACE FUNCTION testarray (int[]) returns int as $$
DECLARE
len int;
BEGIN
len := array_upper($1);
return len;
END
$$ language plpgsql;
The node-postgres query + test array:
var ta = [1,2,3,4,5];
client.query('SELECT testarray($1)', [ta], function(err, result) {
console.log('err: ' + err);
console.log('result: ' + result);
});
Output from node server:
err: error: array value must start with "{" or dimension information
result: undefined
I also tried casting the parameter in the client query, like testarray($1::int[]), which returned the same error.
I changed the function argument to (anyarray int) and the output error changed:
err: error: invalid input syntax for integer: "1,2,3,4,5"
result: undefined
I also tried a couple of other variations.
I'm looking for the variation that produces no error and returns 5.
I read about the Postgres parse-array issue and this stackoverflow question on parameterised arrays in node-postgres:
node-postgres: how to execute "WHERE col IN (<dynamic value list>)" query?
But the answer didn't seem to be there.

The parameter has to be in one of these forms:
'{1,2,3,4,5}' -- array literal
'{1,2,3,4,5}'::int[] -- array literal with explicit cast
ARRAY[1,2,3,4,5] -- array constructor
Also, you can simplify your function:
CREATE OR REPLACE FUNCTION testarray (int[])
RETURNS int AS
$func$
BEGIN
RETURN array_length($1, 1);
END
$func$ LANGUAGE plpgsql IMMUTABLE;
Or a simple SQL function:
CREATE OR REPLACE FUNCTION testarray2 (int[])
RETURNS int AS 'SELECT array_length($1, 1)' LANGUAGE sql IMMUTABLE;
Or just use array_length($1, 1) directly.
array_upper() is the wrong function. Arrays can have arbitrary subscripts. array_length() does what you are looking for. Related question:
Normalize array subscripts for 1-dimensional array so they start with 1
Both array_length() and array_upper() require two parameters. The second is the array dimension - 1 in your case.

Thanks to the responses from PinnyM and Erwin, I reviewed the options and reread the related answers.
The array formats described by Erwin work in node-postgres literally as follows:
'select testarray(' + "'{1,2,3,4,5}'" + ')'
'select testarray(' + "'{1,2,3,4,5}'" + '::INT[])'
'select testarray(ARRAY[1,2,3,4,5])'
That's the tl;dr of JavaScript quoting.
To parameterize them in node-postgres (based on this answer):
var ta = [1,2,3,4,5];
var tas = '{' + ta.join() + '}';
// ... pg connect code skipped
client.query("select testarray($1)", [tas] ...
client.query("select testarray($1::int[])", [tas] ...
I'm not sure about the ARRAY constructor.
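The '{' + ta.join() + '}' trick works because join() with no argument separates elements with commas, which is exactly PostgreSQL's array-literal syntax for plain integers. A small guarded helper makes that assumption explicit; toPgIntArrayLiteral is a hypothetical name, and it deliberately rejects non-integers because strings containing commas, quotes, braces or backslashes would need extra quoting that this sketch does not attempt.

```javascript
// Build a PostgreSQL int[] literal such as {1,2,3,4,5} from a JS array.
function toPgIntArrayLiteral(values) {
  if (!values.every(Number.isInteger)) {
    throw new TypeError("expected an array of integers");
  }
  return "{" + values.join(",") + "}";
}

console.log(toPgIntArrayLiteral([1, 2, 3, 4, 5])); // {1,2,3,4,5}
// With node-postgres, the result is sent as one parameter, e.g.:
// client.query("select testarray($1::int[])", [toPgIntArrayLiteral(ta)], callback);
```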

Based on the answer you posted, this might work for you. It uses the ARRAY constructor with one placeholder per element, because placeholders inside a quoted string literal such as '{$1, $2}' are not substituted:
var ta = [1,2,3,4,5];
var params = [];
for (var i = 1; i <= ta.length; i++) {
params.push('$' + i + '::int');
}
client.query('SELECT testarray(ARRAY[' + params.join(', ') + '])', ta, function(err, result) {
console.log('err: ' + err);
console.log('result: ' + result);
});
