SQL: MAX like implementation of OR on Grouped object [duplicate] - sql-server

I have a field in a table which contains bitwise flags. Let's say for the sake of example there are three flags: 4 => read, 2 => write, 1 => execute and the table looks like this*:
user_id | file | permissions
-----------+--------+---------------
1 | a.txt | 6 ( <-- 6 = 4 + 2 = read + write)
1 | b.txt | 4 ( <-- 4 = 4 = read)
2 | a.txt | 4
2 | c.exe | 1 ( <-- 1 = execute)
I'm interested to find all users who have a particular flag set (eg: write) on ANY record. To do this in one query, I figured that if you OR'd all the user's permissions together you'd get a single value which is the "sum total" of their permissions:
user_id | all_perms
-----------+-------------
1 | 6 (<-- 6 | 4 = 6)
2 | 5 (<-- 4 | 1 = 5)
*My actual table isn't to do with files or file permissions, 'tis but an example
Is there a way I could perform this in one statement? The way I see it, it's very similar to a normal aggregate function with GROUP BY:
SELECT user_id, SUM(permissions) as all_perms
FROM permissions
GROUP BY user_id
...but obviously, some magical "bitwise-or" function instead of SUM. Anyone know of anything like that?
(And for bonus points, does it work in oracle?)

MySQL:
SELECT user_id, BIT_OR(permissions) as all_perms
FROM permissions
GROUP BY user_id

Ah, another one of those questions where I find the answer 5 minutes after asking... Accepted answer will go to the MySQL implementation though...
Here's how to do it with Oracle, as I discovered on Radino's blog
You create an object...
CREATE OR REPLACE TYPE bitor_impl AS OBJECT
(
bitor NUMBER,
STATIC FUNCTION ODCIAggregateInitialize(ctx IN OUT bitor_impl) RETURN NUMBER,
MEMBER FUNCTION ODCIAggregateIterate(SELF IN OUT bitor_impl,
VALUE IN NUMBER) RETURN NUMBER,
MEMBER FUNCTION ODCIAggregateMerge(SELF IN OUT bitor_impl,
ctx2 IN bitor_impl) RETURN NUMBER,
MEMBER FUNCTION ODCIAggregateTerminate(SELF IN OUT bitor_impl,
returnvalue OUT NUMBER,
flags IN NUMBER) RETURN NUMBER
)
/
CREATE OR REPLACE TYPE BODY bitor_impl IS
STATIC FUNCTION ODCIAggregateInitialize(ctx IN OUT bitor_impl) RETURN NUMBER IS
BEGIN
ctx := bitor_impl(0);
RETURN ODCIConst.Success;
END ODCIAggregateInitialize;
MEMBER FUNCTION ODCIAggregateIterate(SELF IN OUT bitor_impl,
VALUE IN NUMBER) RETURN NUMBER IS
BEGIN
SELF.bitor := SELF.bitor + VALUE - bitand(SELF.bitor, VALUE);
RETURN ODCIConst.Success;
END ODCIAggregateIterate;
MEMBER FUNCTION ODCIAggregateMerge(SELF IN OUT bitor_impl,
ctx2 IN bitor_impl) RETURN NUMBER IS
BEGIN
SELF.bitor := SELF.bitor + ctx2.bitor - bitand(SELF.bitor, ctx2.bitor);
RETURN ODCIConst.Success;
END ODCIAggregateMerge;
MEMBER FUNCTION ODCIAggregateTerminate(SELF IN OUT bitor_impl,
returnvalue OUT NUMBER,
flags IN NUMBER) RETURN NUMBER IS
BEGIN
returnvalue := SELF.bitor;
RETURN ODCIConst.Success;
END ODCIAggregateTerminate;
END;
/
...and then define your own aggregate function
CREATE OR REPLACE FUNCTION bitoragg(x IN NUMBER) RETURN NUMBER
PARALLEL_ENABLE
AGGREGATE USING bitor_impl;
/
Usage:
SELECT user_id, bitoragg(permissions) FROM perms GROUP BY user_id

And you can do a bitwise or with...
FUNCTION BITOR(x IN NUMBER, y IN NUMBER)
RETURN NUMBER
AS
BEGIN
RETURN x + y - BITAND(x,y);
END;

You would need to know the possible permission components (1, 2 and 4) apriori (thus harder to maintain), but this is pretty simple and would work:
SELECT user_id,
MAX(BITAND(permissions, 1)) +
MAX(BITAND(permissions, 2)) +
MAX(BITAND(permissions, 4)) all_perms
FROM permissions
GROUP BY user_id

I'm interested to find all users who
have a particular flag set (eg: write)
on ANY record
What's wrong with simply
SELECT DISTINCT User_ID
FROM Permissions
WHERE permissions & 2 = 2

Related

How to generate Stackoverflow table markdown from Snowflake

Stackoverflow supports table markdown. For example, to display a table like this:
N_NATIONKEY
N_NAME
N_REGIONKEY
0
ALGERIA
0
1
ARGENTINA
1
2
BRAZIL
1
3
CANADA
1
4
EGYPT
4
You can write code like this:
|N_NATIONKEY|N_NAME|N_REGIONKEY|
|---:|:---|---:|
|0|ALGERIA|0|
|1|ARGENTINA|1|
|2|BRAZIL|1|
|3|CANADA|1|
|4|EGYPT|4|
It would save a lot of time to generate the Stackoverflow table markdown automatically when running Snowflake queries.
The following stored procedure accepts either a query string or a query ID (it will auto-detect which it is) and returns the table results as Stackoverflow table markdown. It will automatically align numbers and dates to the right, strings, arrays, and objects to the left, and other types default to centered. It supports any query you can pass to it. It may be a good idea to use $$ to terminate the string passed into the procedure in case the SQL contains single quotes. You can create the procedure and test it using this script:
create or replace procedure MARKDOWN("queryOrQueryId" string)
returns string
language javascript
execute as caller
as
$$
const MAX_ROWS = 50; // Set the maximum row count to fetch. Tables in markdown larger than this become hard to read.
var [rs, i, c, row, props] = [null, 0, 0, 0, {}];
if (!queryOrQueryId || queryOrQueryId == 0){
queryOrQueryId = `select * from table(result_scan(last_query_id())) limit ${MAX_ROWS}`;
}
queryOrQueryId = queryOrQueryId.trim();
if (isUUID(queryOrQueryId)){
rs = snowflake.execute({sqlText:`select * from table(result_scan('${queryOrQueryId}')) limit ${MAX_ROWS}`});
} else {
rs = snowflake.execute({sqlText:`${queryOrQueryId}`});
}
props.columnCount = rs.getColumnCount();
for(i = 1; i <= props.columnCount; i++){
props["col" + i + "Name"] = rs.getColumnName(i);
props["col" + i + "Type"] = rs.getColumnType(i);
}
var table = getHeader(props);
while(rs.next()){
row = "|";
for(c = 1; c <= props.columnCount; c++){
row += escapeMarkup(rs.getColumnValueAsString(c)) + "|";
}
table += "\n" + row;
}
return table;
//------ End main function. Start of helper functions.
function escapeMarkup(s){
s = s.replace(/\\/g, "\\\\");
s = s.replaceAll('|', '\\|');
s = s.replace(/\s+/g, " ");
return s;
}
function getHeader(props){
s = "|";
for (var i = 1; i <= props.columnCount; i++){
s += props["col" + i + "Name"] + "|";
}
s += "\n";
for (var i = 1; i <= props.columnCount; i++){
switch(props["col" + i + "Type"]) {
case 'number':
s += '|---:';
break;
case 'string':
s += '|:---';
break;
case 'date':
s += '|---:';
break;
case 'json':
s += '|:---';
break;
default:
s += '|:---:';
}
}
return s + "|";
}
function isUUID(str){
const regexExp = /^[0-9a-fA-F]{8}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{4}\b-[0-9a-fA-F]{12}$/gi;
return regexExp.test(str);
}
$$;
-- Usage type 1, a simple query:
call stackoverflow_table($$ select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5 $$);
-- Usage type 2, a query ID:
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5;
set quid = (select last_query_id());
call stackoverflow_table($quid);
Edit: Based on Fieldy's helpful feedback, I modified the procedure code to allow passing null or 0 or a blank string '' as the parameter. This will use the last query ID and is a helpful shortcut. It also adds a constant to the code that will limit the returns to a set number of rows. This limit will be applied when using query IDs (or sending null, '', or 0, which uses the last query ID). The limit is not applied when the input parameter is the text of a query to run to avoid syntax errors if there's already a limit applied, etc.
Greg Pavlik's Javascript Stored Procedure solution made me wonder if this would be any easier with the new Python language support in Stored Procedures. This is currently a public-preview feature.
The Python Snowpark API supports returning a result as a Pandas dataframe, and Pandas supports returning a dataframe in Markdown format, via the tabulate package. Here's the stored procedure.
CREATE OR REPLACE PROCEDURE markdown_table(query_id VARCHAR)
RETURNS VARCHAR
LANGUAGE PYTHON
RUNTIME_VERSION = '3.8'
PACKAGES = ('snowflake-snowpark-python','pandas','tabulate', 'regex')
HANDLER = 'markdown_table'
EXECUTE AS CALLER
AS $$
import pandas as pd
import tabulate
import regex
def markdown_table(session, queryOrQueryId = None):
# Validate UUID
if(queryOrQueryId is None):
pandas_result = session.sql("""Select * from table(result_scan(last_query_id()))""").to_pandas()
elif(bool(regex.match("^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$", queryOrQueryId))):
pandas_result = session.sql(f"""select * from table(result_scan('{queryOrQueryId}'))""").to_pandas()
else:
pandas_result = session.sql(queryOrQueryId).to_pandas()
return pandas_result.to_markdown()
$$;
Which you can use as follows:
-- Usage type 1, use the result from the query ran immediately proceeding the Store-Procedure Call
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5;
call markdown_table(NULL);
-- Usage type 2, pass in a query_id
select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5;
set quid = (select last_query_id());
select $quid;
call markdown_table($quid);
-- Usage type 3, provide a Query string to the Store-Procedure Call
call markdown_table('select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.NATION limit 5');
The table can also be
N_NATIONKEY|N_NAME|N_REGIONKEY
--|--|--
0|ALGERIA|0
1|ARGENTINA|1
2|BRAZIL|1
3|CANADA|1
4|EGYPT|4
giving, so it can be a simpler solution
N_NATIONKEY
N_NAME
N_REGIONKEY
0
ALGERIA
0
1
ARGENTINA
1
2
BRAZIL
1
3
CANADA
1
4
EGYPT
4
I grab the result table and use notepad++ and replace tab \t with pipe space | and then insert by hand the header marker line. I sometime replace the empty null results with the text null to make the results make more sense. the form you use with the start/end pipes gets around the need for that.
DBeaver IDE supports "data export as markdown" and "advanced copy as markdown" out-of-the-box:
Output:
|R_REGIONKEY|R_NAME|R_COMMENT|
|-----------|------|---------|
|0|AFRICA|lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to |
|1|AMERICA|hs use ironic, even requests. s|
|2|ASIA|ges. thinly even pinto beans ca|
|3|EUROPE|ly final courts cajole furiously final excuse|
|4|MIDDLE EAST|uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl|
It is rendered as:
R_REGIONKEY
R_NAME
R_COMMENT
0
AFRICA
lar deposits. blithely final packages cajole. regular waters are final requests. regular accounts are according to
1
AMERICA
hs use ironic, even requests. s
2
ASIA
ges. thinly even pinto beans ca
3
EUROPE
ly final courts cajole furiously final excuse
4
MIDDLE EAST
uickly special accounts cajole carefully blithely close requests. carefully final asymptotes haggle furiousl

Adding multiple records from a string

I have a string of email addresses. For example, "a#a.com; b#a.com; c#a.com"
My database is:
record | flag1 | flag2 | emailaddresss
--------------------------------------------------------
1 | 0 | 0 | a#a.com
2 | 0 | 0 | b#a.com
3 | 0 | 0 | c#a.com
What I need to do is parse the string, and if the address is not in the database, add it.
Then, return a string of just the record numbers that correspond to the email addresses.
So, if the call is made with "A#a.com; c#a.com; d#a.com", the rountine would add "d#a.com", then return "1, 3,4" corresponding to the records that match the email addresses.
What I am doing now is calling the database once per email address to look it up and confirm it exists (adding if it doesn't exist), then looping thru them again to get the addresses 1 by 1 from my powershell app to collect the record numbers.
There has to be a way to just pass all of the addresses to SQL at the same time, right?
I have it working in powershell.. but slowly..
I'd love a response from SQL as shown above of just the record number for each email address in a single response. That is, "1,2,4" etc.
My powershell code is:
$EmailList2 = $EmailList.split(";")
# lets get the ID # for each eamil address.
foreach($x in $EmailList2)
{
$data = exec-query "select Record from emailaddresses where emailAddress = #email" -parameter #{email=$x.trim()} -conn $connection
if ($($data.Tables.record) -gt 0)
{
$ResponseNumbers = $ResponseNumbers + "$($data.Tables.record), "
}
}
$ResponseNumbers = $($ResponseNumbers+"XX").replace(", XX","")
return $ResponseNumbers
You'd have to do this in 2 steps. Firstly INSERT the new values and then use a SELECT to get the values back. This answer uses delimitedsplit8k (not delimitedsplit8k_LEAD) as you're still using SQL Server 2008. On the note of 2008 I strongly suggest looking at upgrade paths soon as you have about 6 weeks of support left.
You can use the function to split the values and then INSERT/SELECT appropriately:
DECLARE #Emails varchar(8000) = 'a#a.com;b#a.com;c#a.com';
WITH Emails AS(
SELECT DS.Item AS Email
FROM dbo.DelimitedSplit8K(#Emails,';') DS)
INSERT INTO YT (emailaddress) --I don't know what the other columns value should be, so have excluded
SELECT E.Email
FROM dbo.YourTable YT
LEFT JOIN Emails E ON YT.emailaddress = E.Email
WHERE E.Email IS NULL;
SELECT YT.record
FROM dbo.YourTable YT
JOIN dbo.DelimitedSplit8K(#Emails,';') DS ON DS.Item = YT.emailaddress;

Add incremental number in duplicate records

I have SSIS package, which retrieves all records including duplicates. My question is how to add an incremental value for the duplicate records (only the ID and PropertyID).
Eg
Records from a Merge Join
ID Name PropertyID Value
1 A 1 123
1 A 1 223
2 B 2 334
3 C 1 22
3 C 1 45
Now I need to append an incremental value at the end of the each record as
ID Name PropertyID Value RID
1 A 1 123 1
1 A 1 223 2
2 B 2 334 1
3 C 1 22 1
3 C 1 45 2
Since ID 1 & 3 are returned twice, the first record has RID as 1 and the second record as 2.
ID and PropertyID need to be considered to generate the Repeating ID i.e RID.
How can I do it in SSIS or using SQL command?
Update #1:
Please correct me if I'm wrong, since the data is not stored in any table yet, I'm unable to use the select query using rownumber(). Any way I can do it from the Merge Join?
You could use ROW_NUMBER:
SELECT ID,
Name,
PropertyID,
Value,
ROW_NUMBER() OVER(PARTITION BY ID, PropertyID ORDER BY Value) As RID
FROM TableName
This will do the job for you: https://paultebraak.wordpress.com/2013/02/25/rank-partitioning-in-etl-using-ssis/
You will need to write a custom script, something like this:
public
class
ScriptMain : UserComponent
{
string _sub_category = “”;
int _row_rank = 1;
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
if (Row.subcategory != _sub_category)
{
_row_rank = 1;
Row.rowrank = _row_rank;
_sub_category = Row.subcategory;
}
else
{
_row_rank++;
Row.rowrank = _row_rank;
}
}
}

Removing the repeating elements from a row in a squlite table

Please let me know if there is any query where in I remove the repeating entries in a row.
For eg: I have a table which has name with 9 telephone numbers:
Name Tel0 Tel1 Tel2 Tel3 Tel4 Tel5 Tel6 Tel7 Tel8
John 1 2 2 2 3 3 4 5 1
The final result should be as shown below:
Name Tel0 Tel1 Tel2 Tel3 Tel4 Tel5 Tel6 Tel7 Tel8
John 1 2 3 4 5
regards
Maddy
I fear that it will be more complicated to keep this format than to split the table in two as I suggested. If you insist on keeping the current schema then I would suggest that you query the row, organise the fields in application code and then perform an update on the database.
You could also try to use SQL UNION operator to give you a list of the numbers, a UNION by default will remove all duplicate rows:
SELECT Name, Tel FROM
(SELECT Name, Tel0 AS Tel FROM Person UNION
SELECT Name, Tel1 FROM Person UNION
SELECT Name, Tel2 FROM Person) ORDER BY Name ;
Which should give you a result set like this:
John|1
John|2
You will then have to step through the result set and saving each number into a separate variable (skipping those variables that do not exist) until the "Name" field changes.
Tel1 := Null; Tel2 := Null;
Name := ResultSet['Name'];
Tel0 := ResultSet['Tel'];
ResultSet.Next();
if (Name == ResultSet['Name']) {
Tel1 := ResultSet['Tel'];
} else {
UPDATE here.
StartAgain;
}
ResultSet.Next();
if (Name == ResultSet['Name']) {
Tel2 := ResultSet['Tel'];
} else {
UPDATE here.
StartAgain;
}
I am not recommending you do this, it is very bad use of a relational database but once implemented in a real language and debugged that should work.

Apply function to every element of an array in a SELECT statement

I am listing all functions of a PostgreSQL schema and need the human readable types for every argument of the functions. OIDs of the types a represented as an array in proallargtypes. I can unnest the array and apply format_type() to it, which causes the query to split into multiple rows for a single function. To avoid that I have to create an outer SELECT to GROUP the argtypes again because, apperently, one cannot group an unnested array. All columns are dependent on proname but I have to list all columns in GROUP BY clause, which is unnecessary but proname is not a primary key.
Is there a better way to achieve my goal of an output like this:
proname | ... | protypes
-------------------------------------
test | ... | {integer,integer}
I am currently using this query:
SELECT
proname,
prosrc,
pronargs,
proargmodes,
array_agg(proargtypes), -- see here
proallargtypes,
proargnames,
prodefaults,
prorettype,
lanname
FROM (
SELECT
p.proname,
p.prosrc,
p.pronargs,
p.proargmodes,
format_type(unnest(p.proallargtypes), NULL) AS proargtypes, -- and here
p.proallargtypes,
p.proargnames,
pg_get_expr(p.proargdefaults, 0) AS prodefaults,
format_type(p.prorettype, NULL) AS prorettype,
l.lanname
FROM pg_catalog.pg_proc p
JOIN pg_catalog.pg_language l
ON l.oid = p.prolang
JOIN pg_catalog.pg_namespace n
ON n.oid = p.pronamespace
WHERE n.nspname = 'public'
) x
GROUP BY proname, prosrc, pronargs, proargmodes, proallargtypes, proargnames, prodefaults, prorettype, lanname
you can use internal "undocumented" function pg_catalog.pg_get_function_arguments(p.oid).
postgres=# SELECT pg_catalog.pg_get_function_arguments('fufu'::regproc);
pg_get_function_arguments
---------------------------
a integer, b integer
(1 row)
Now, there are no build "map" function. So unnest, array_agg is only one possible. You can simplify life with own custom function:
CREATE OR REPLACE FUNCTION format_types(oid[])
RETURNS text[] AS $$
SELECT ARRAY(SELECT format_type(unnest($1), null))
$$ LANGUAGE sql IMMUTABLE;
and result
postgres=# SELECT format_types('{21,22,23}');
format_types
-------------------------------
{smallint,int2vector,integer}
(1 row)
Then your query should to be:
SELECT proname, format_types(proallargtypes)
FROM pg_proc
WHERE pronamespace = 2200 AND proallargtypes;
But result will not be expected probably, because proallargtypes field is not empty only when OUT parameters are used. It is empty usually. You should to look to proargtypes field, but it is a oidvector type - so you should to transform to oid[] first.
postgres=# SELECT proname, format_types(string_to_array(proargtypes::text,' ')::oid[])
FROM pg_proc
WHERE pronamespace = 2200
LIMIT 10;
proname | format_types
------------------------------+----------------------------------------------------
quantile_append_double | {internal,"double precision","double precision"}
quantile_append_double_array | {internal,"double precision","double precision[]"}
quantile_double | {internal}
quantile_double_array | {internal}
quantile | {"double precision","double precision"}
quantile | {"double precision","double precision[]"}
quantile_cont_double | {internal}
quantile_cont_double_array | {internal}
quantile_cont | {"double precision","double precision"}
quantile_cont | {"double precision","double precision[]"}
(10 rows)

Resources