Generating an "ORDER BY" statement, in a more convenient way - sql-server

In SQL we can use the "WHERE 1=1 hack" to easily generate a WHERE statement in a loop, so we don't need to check if the current iteration is the first one.
without WHERE 1 :
//I haven't tried the C++ code below, it's just an example to briefly explain the "hack"
string statement = "WHERE";
for (int i = 0 ; i < list.size() ; i++)
{
if (i != 0)
{
statement += " AND "; //we don't want to generate "WHERE AND"
}
statement += list[i];
}
the generated statement :
WHERE <something_1>
AND <something_2>
AND <something_3>
with WHERE 1 :
string statement = "WHERE 1 = 1"; // Added "1 = 1"
for (int i = 0 ; i < list.size() ; i++)
{
statement += "AND" + list[i];
}
the generated statement :
WHERE 1 = 1
AND <something_1>
AND <something_2>
AND <something_3>
My issue : I need to generate an "ORDER BY" statement, and I was wondering if such a hack also exists for the ORDER BY statement.
I could check if the current iteration in the loop is the last one, but there's maybe a better solution.
ORDER BY a DESC,
b DESC,
c DESC,
d DESC,
<dummy statement added at the end, so I don't need to remove the last comma>
From what I've read I cannot use "ORDER BY 1", so does a similar hack actually exists?

You could legally start every ORDER BY with something like:
order by ##spid
But it's likely to raise questions about query execution efficiency.

What you said about not using "1" is only partially true. If you did a similar pattern, adding "1" at the end would work as long as whatever column is in position 1 isn't also referenced in the order by clause.
I generally do something like this:
string whereClause = (list.Count > 0)
? "WHERE (" + list.StringJoin(") AND (") + ")"
: "";
where StringJoin is a very simple extension method I wrote. Note the added parentheses so that if any of the list elements have an "OR" you don't get into trouble.
Something identical can be done for ORDER BY, just replacing the word WHERE above.
string orderClause = (list.Count > 0)
? "ORDER BY " + list.StringJoin(", ")
: "";
This is C# instead of c++, but a simple version of the extension method is this:
public static string StringJoin(this IEnumerable<string> list, string separator)
{
if (list == null)
return null;
else
return string.Join(separator, list.ToArray());
}

Related

How to generate Stackoverflow table markdown from Snowflake

Stackoverflow supports table markdown. For example, to display a table like this:
N_NATIONKEY
N_NAME
N_REGIONKEY
0
ALGERIA
0
1
ARGENTINA
1
2
BRAZIL
1
3
CANADA
1
4
EGYPT
4
You can write code like this:
|N_NATIONKEY|N_NAME|N_REGIONKEY|
|---:|:---|---:|
|0|ALGERIA|0|
|1|ARGENTINA|1|
|2|BRAZIL|1|
|3|CANADA|1|
|4|EGYPT|4|
It would save a lot of time to generate the Stackoverflow table markdown automatically when running Snowflake queries.
The following stored procedure accepts either a query string or a query ID (it will auto-detect which it is) and returns the table results as Stackoverflow table markdown. It will automatically align numbers and dates to the right, strings, arrays, and objects to the left, and other types default to centered. It supports any query you can pass to it. It may be a good idea to use $$ to terminate the string passed into the procedure in case the SQL contains single quotes. You can create the procedure and test it using this script:
create or replace procedure MARKDOWN("queryOrQueryId" string)
returns string
language javascript
execute as caller
as
$$
const MAX_ROWS = 50; // Set the maximum row count to fetch. Tables in markdown larger than this become hard to read.
var [rs, i, c, row, props] = [null, 0, 0, 0, {}];
if (!queryOrQueryId || queryOrQueryId == 0){
queryOrQueryId = `select * from table(result_scan(last_query_id())) limit ${MAX_ROWS}`;
}
queryOrQueryId = queryOrQueryId.trim();
if (isUUID(queryOrQueryId)){
rs = snowflake.execute({sqlText:`select * from table(result_scan('${queryOrQueryId}')) limit ${MAX_ROWS}`});
} else {
rs = snowflake.execute({sqlText:`${queryOrQueryId}`});
}
props.columnCount = rs.getColumnCount();
for(i = 1; i <= props.columnCount; i++){
props["col" + i + "Name"] = rs.getColumnName(i);
props["col" + i + "Type"] = rs.getColumnType(i);
}
var table = getHeader(props);
while(rs.next()){
row = "|";
for(c = 1; c <= props.columnCount; c++){
row += escapeMarkup(rs.getColumnValueAsString(c)) + "|";
}
table += "\n" + row;
}
return table;
//------ End main function. Start of helper functions.
function escapeMarkup(s){
s = s.replace(/\\/g, "\\\\");
s = s.replaceAll('|', '\\|');
s = s.replace(/\s+/g, " ");
return s;
}
function getHeader(props){
s = "|";
for (var i = 1; i <= props.columnCount; i++){
s += props["col" + i + "Name"] + "|";
}
s += "\n";
for (var i = 1; i <= props.columnCount; i++){
switch(props["col" + i + "Type"]) {
case 'number':
s += '|---:';
break;
case 'string':
s += '|:---';
break;
case 'date':
s += '|---:';
break;
case 'json':
s += '|:---';
break;
default:
s += '|:---:';
}
}
return s + "|";
}
function isUUID(str){
const regexExp = /^[0-9a-fA-F]{8}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{12}$/gi;
return regexExp.test(str);
}
$$;
-- Usage type 1, a simple query:
call stackoverflow_table($$ select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5 $$);
-- Usage type 2, a query ID:
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5;
set quid = (select last_query_id());
call stackoverflow_table($quid);
Edit: Based on Fieldy's helpful feedback, I modified the procedure code to allow passing null or 0 or a blank string '' as the parameter. This will use the last query ID and is a helpful shortcut. It also adds a constant to the code that will limit the returns to a set number of rows. This limit will be applied when using query IDs (or sending null, '', or 0, which uses the last query ID). The limit is not applied when the input parameter is the text of a query to run to avoid syntax errors if there's already a limit applied, etc.
Greg Pavlik's Javascript Stored Procedure solution made me wonder if this would be any easier with the new Python language support in Stored Procedures. This is currently a public-preview feature.
The Python Snowpark API supports returning a result as a Pandas dataframe, and Pandas supports returning a dataframe in Markdown format, via the tabulate package. Here's the stored procedure.
CREATE OR REPLACE PROCEDURE markdown_table(query_id VARCHAR)
RETURNS VARCHAR
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python','pandas','tabulate', 'regex')
HANDLER = 'markdown_table'
EXECUTE AS CALLER
AS $$
import pandas as pd
import tabulate
import regex
def markdown_table(session, queryOrQueryId = None):
# Validate UUID
if(queryOrQueryId is None):
pandas_result = session.sql("""Select * from table(result_scan(last_query_id()))""").to_pandas()
elif(bool(regex.match("^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", queryOrQueryId))):
pandas_result = session.sql(f"""select * from table(result_scan('{queryOrQueryId}'))""").to_pandas()
else:
pandas_result = session.sql(queryOrQueryId).to_pandas()
return pandas_result.to_markdown()
$$;
Which you can use as follows:
-- Usage type 1, use the result from the query ran immediately proceeding the Store-Procedure Call
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5;
call markdown_table(NULL);
-- Usage type 2, pass in a query_id
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5;
set quid = (select last_query_id());
select $quid;
call markdown_table($quid);
-- Usage type 3, provide a Query string to the Store-Procedure Call
call markdown_table('select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5');
The table can also be
N_NATIONKEY|N_NAME|N_REGIONKEY
--|--|--
0|ALGERIA|0
1|ARGENTINA|1
2|BRAZIL|1
3|CANADA|1
4|EGYPT|4
giving, so it can be a simpler solution
N_NATIONKEY
N_NAME
N_REGIONKEY
0
ALGERIA
0
1
ARGENTINA
1
2
BRAZIL
1
3
CANADA
1
4
EGYPT
4
I grab the result table and use notepad++ and replace tab \t with pipe space | and then insert by hand the header marker line. I sometime replace the empty null results with the text null to make the results make more sense. the form you use with the start/end pipes gets around the need for that.
DBeaver IDE supports "data export as markdown" and "advanced copy as markdown" out-of-the-box:
Output:
|R_REGIONKEY|R_NAME|R_COMMENT|
|-----------|------|---------|
|0|AFRICA|lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to |
|1|AMERICA|hs use ironic, even requests. s|
|2|ASIA|ges. thinly even pinto beans ca|
|3|EUROPE|ly final courts cajole furiously final excuse|
|4|MIDDLE EAST|uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl|
It is rendered as:
R_REGIONKEY
R_NAME
R_COMMENT
0
AFRICA
lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to
1
AMERICA
hs use ironic, even requests. s
2
ASIA
ges. thinly even pinto beans ca
3
EUROPE
ly final courts cajole furiously final excuse
4
MIDDLE EAST
uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl

Stored procedure - get anticipated columns before fully executing statement?

I'm working through a stored procedure and wondering if there's a way to retrieve the anticipated result column list from a sql statement before fully executing.
Scenarios:
dynamic SQL
a UDF that might vary the columns outside of our control
EX:
//inbound parameter
SET QUERY_DEFINITION_ID = 12345;
//Initial statement pulls query text from bank of queries
var sqlText = getQueryFromQueryBank(QUERY_DEFINITION_ID);
//now we run our query
var cmd = {sqlText: sqlText };
stmt = snowflake.createStatement(cmd);
What I'd like to be able to do is say "right - before you run this, give me the anticipated column list" so I can compare it to what's expected.
EX:
Expected: [col1, col2, col3, col4]
Got: [col1]
Result: Oops. Don't run.
Rationale here is that I want to short-circuit the execution if something is missing - before it potentially runs for a while. I can validate all of this after the fact, but it would be really helpful to stop early.
Any ideas very much appreciated!
This sample SP code shows how to get a list of columns that a query will project into the result before you run the query. It should only be used for large, long running queries because it will take a few seconds to get the column list.
There are a couple of caveats. 1) It will only return the names of the columns. It won't tell you how they were built, that is, whether they're aliased, direct from a table, calculated, etc. 2) The example query I used is straight from the Snowflake documentation here https://docs.snowflake.com/en/user-guide/sample-data-tpcds.html#functional-query-definition. For convenience, I minimized the query to a single line. The output of the columns includes object qualifiers in addition to the column names, so V1.I_CATEGORY, V1.D_YEAR, V1.D_MOY, etc. If you don't want them to make it easier to compare names, you can strip off the qualifiers using the JavaScript split function on the dot and take index 1 of the resulting array.
create or replace procedure EXPLAIN_BEFORE_RUNNING()
returns string
language javascript
execute as caller
as
$$
// Set the context for the session to the TPC-H sample data:
executeNonQuery("use schema snowflake_sample_data.tpcds_sf10tcl;");
// Here's a complex query from the Snowflake docs (minimized to one line for convienience):
var sql = `with v1 as( select i_category, i_brand, cc_name, d_year, d_moy, sum(cs_sales_price) sum_sales, avg(sum(cs_sales_price)) over(partition by i_category, i_brand, cc_name, d_year) avg_monthly_sales, rank() over (partition by i_category, i_brand, cc_name order by d_year, d_moy) rn from item, catalog_sales, date_dim, call_center where cs_item_sk = i_item_sk and cs_sold_date_sk = d_date_sk and cc_call_center_sk= cs_call_center_sk and ( d_year = 1999 or ( d_year = 1999-1 and d_moy =12) or ( d_year = 1999+1 and d_moy =1)) group by i_category, i_brand, cc_name , d_year, d_moy), v2 as( select v1.i_category ,v1.d_year, v1.d_moy ,v1.avg_monthly_sales ,v1.sum_sales, v1_lag.sum_sales psum, v1_lead.sum_sales nsum from v1, v1 v1_lag, v1 v1_lead where v1.i_category = v1_lag.i_category and v1.i_category = v1_lead.i_category and v1.i_brand = v1_lag.i_brand and v1.i_brand = v1_lead.i_brand and v1.cc_name = v1_lag.cc_name and v1.cc_name = v1_lead.cc_name and v1.rn = v1_lag.rn + 1 and v1.rn = v1_lead.rn - 1) select * from v2 where d_year = 1999 and avg_monthly_sales > 0 and case when avg_monthly_sales > 0 then abs(sum_sales - avg_monthly_sales) / avg_monthly_sales else null end > 0.1 order by sum_sales - avg_monthly_sales, 3 limit 100;`;
// Before actually running the query, generate an explain plan.
executeNonQuery("explain " + sql);
// Now read the column list from the explain plan from the result set.
var columnList = executeSingleValueQuery("COLUMN_LIST", `select "expressions" as COLUMN_LIST from table(result_scan(last_query_id())) where "operation" = 'Result';`);
// For now, just exit with the column list as the output...
return columnList;
// Your code here...
// Helper functions:
function executeNonQuery(queryString) {
var out = '';
cmd = {sqlText: queryString};
stmt = snowflake.createStatement(cmd);
var rs;
rs = stmt.execute();
}
function executeSingleValueQuery(columnName, queryString) {
var out;
cmd1 = {sqlText: queryString};
stmt = snowflake.createStatement(cmd1);
var rs;
try{
rs = stmt.execute();
rs.next();
return rs.getColumnValue(columnName);
}
catch(err) {
if (err.message.substring(0, 18) == "ResultSet is empty"){
throw "ERROR: No rows returned in query.";
} else {
throw "ERROR: " + err.message.replace(/\n/g, " ");
}
}
return out;
}
$$;
call Explain_Before_Running();

snowflake identify ARRAY_CONTAINS partial match

I am looking for a partial match / using some kind of a wild card to do string matching in SNOWFLAKE arrays.
SELECT ARRAY_CONTAINS('HELLO'::VARIANT, ARRAY_CONSTRUCT('HELLO', 'HI'))
<TRUE>
Tried like this , but no luck
SELECT ARRAY_CONTAINS('HELL%'::VARIANT, ARRAY_CONSTRUCT('HELLO', 'HI'))
<FALSE>
Is there any other way to do this ?
Maybe you can convert the array (temporarily) to string and use CONTAINS operator?
SELECT CONTAINS( ARRAY_TO_STRING( ARRAY_CONSTRUCT('WHY','HELLO', 'HI'),',') , 'HELL' );
SELECT ARRAY_CONSTRUCT('HELLO', 'HI') AS aa
,':' || ARRAY_TO_STRING(aa,';') || ':' AS bb
,bb like '%:HELL%'
so this is a dirty solution, but it would "work"
You can do this with a JavaScript UDF. This example uses RegExp instead of the like function wildcards. It's overloaded so you can use standard RegExp or one with parameters (i for case insensitivity in particular). This returns a boolean to approximate the behavior of ARRAY_CONTAINS. If you want, you can do a simple modification to return the value of "i", which will be the ordinal position of the first match in the array, or -1 if it leaves the loop (not found).
create or replace function ARRAY_CONTAINS_REGEXP(REGEXP_EXPRESSION string, A array)
returns boolean
language javascript
as
$$
var rx = new RegExp(REGEXP_EXPRESSION);
for (var i = 0; i < A.length; i++){
if (A[i].search(rx) != -1) return true;
}
return false;
$$;
create or replace function ARRAY_CONTAINS_REGEXP(REGEXP_EXPRESSION string, A array, REGEX_PARAMS string)
returns boolean
language javascript
as
$$
var rx = new RegExp(REGEXP_EXPRESSION, REGEX_PARAMS);
for (var i = 0; i < A.length; i++){
if (A[i].search(rx) != -1) return true;
}
return false;
$$;
SELECT ARRAY_CONTAINS_REGEXP('HELL.'::VARIANT, ARRAY_CONSTRUCT('HELLO', 'HI')); -- True. Wildcard match.
SELECT ARRAY_CONTAINS_REGEXP('HELL'::VARIANT, ARRAY_CONSTRUCT('HELLO', 'HI')); -- True. Partial match.
SELECT ARRAY_CONTAINS_REGEXP('\\bHEL\\b'::VARIANT, ARRAY_CONSTRUCT('HELLO', 'HI')); -- False. Full word matching.
SELECT ARRAY_CONTAINS_REGEXP('hel'::VARIANT, ARRAY_CONSTRUCT('HELLO', 'HI')); -- False. Case sensitive.
SELECT ARRAY_CONTAINS_REGEXP('HeL'::VARIANT, ARRAY_CONSTRUCT('HELLO', 'HI'), 'i'); -- True. Case insensitive partial match.
SELECT ARRAY_CONTAINS_REGEXP('\\bHeL\\b'::VARIANT, ARRAY_CONSTRUCT('HELLO', 'HI'), 'i'); -- False. Full word insensitive match.

Case test if substring exist and replace with blank

I need to cleanup a set of companies name by replacing : INC, LTD, LTD. , INC. , others, with a empty space when they are individual words ( with one blank space before the word i.e. Incoming INC) and not letters part of company name i.e. INComing Money.
The logic I tried :
case
when FINDSTRING([Trade Name]," INC",1) > 0 then REPLACE([Trade Name]," INC","")
when FINDSTRING([Trade Name]," LTD",1) > 0 then REPLACE([Trade Name]," LTD","")
ELSE [Trade Name]
I tried SSIS expresion in a derived column :
FINDSTRING( [Trade Name] ," INC",1) ? REPLACE([Trade Name]," INC","") :
FINDSTRING([Trade Name]," LTD",1) ? REPLACE([Trade Name]," LTD",""):
The error received:
Error at Data Flow Task [Derived Column [1]]: Attempt to find the
input column named "A" failed with error code 0xC0010009. The input
column specified was not found in the input column collection.
In a similar case it is easier to use a Script Component to clean this column, you can simply split the column based on spaces then re concatenate the parts that are not equal to INC, you can use the following method to do that, or you can simple use RegEx.Replace() method to replace values based on regular expressions:
string value = "";
string[] parts = Row.TradeName.Split(' ');
foreach(string str in parts){
if(str != "INC"){
value += " " + str;
}
}
Row.outTradeName = value.TrimStart();

Problems with string parameter insertion into prepared statement

I have a database running on an MS SQL Server. My application communicates via JDBC and ODBC with it. Now I try to use prepared statements.
When I insert a numeric (Long) parameter everything works fine. When I insert a string
parameter it does not work. There is no error message, but an empty result set.
WHERE column LIKE ('%' + ? + '%') --inserted "test" -> empty result set
WHERE column LIKE ? --inserted "%test%" -> empty result set
WHERE column = ? --inserted "test" -> works
But I need the LIKE functionality. When I insert the same string directly into the query string (not as a prepared statement parameter) it runs fine.
WHERE column LIKE '%test%'
It looks a little bit like double quoting for me, but I never used quotes inside a string. I use preparedStatement.setString(int index, String x) for insertion.
What is causing this problem?
How can I fix it?
Thanks in advance.
What are you inserting at '?'
If you are inserting
test
Then this will result in
WHERE column LIKE ('%' + test + '%')
which will fail. If you are inserting
"test"
Then this will result in
WHERE column LIKE ('%' + "test" + '%')
Which will fail.
You need to insert
'test'
Then this will result in
WHERE column LIKE ('%' + 'test' + '%')
And this should work.
I don't know why = "test" works, it should not unless you have a column called test.
I am using SUN's JdbcOdbcBridge. As far as I read yet, you should avoid to use it. Maybe there is a better implementation out there.
For now, I wrote the folling method. It inserts string-type parameters into the statement with string operations before the statement is compiled.
You should build a map of the parameters with the parameter index as the key and the value as the parameter itself.
private static String insertStringParameters(String statement, Map<Integer, Object> parameters) {
for (Integer parameterIndex : parameters.keySet()) {
Object parameter = parameters.get(parameterIndex);
if (parameter instanceof String) {
String parameterString = "'" + (String) parameter + "'";
int occurence = 0;
int stringIndex = 0;
while(occurence < parameterIndex){
stringIndex = statement.indexOf("?", stringIndex) + 1;
occurence++;
}
statement = statement.substring(0, stringIndex - 1) + parameterString + statement.substring(stringIndex);
}
}
return statement;
}

Resources