Extract words between separators - snowflake-cloud-data-platform

I have input like below:
Sales External?HR?Purchase Department
I did LISTAGG to build this string because I ultimately want the values in separate columns. The query output should look like below:
Sales External | HR | Purchase Department
That is, it should search for the first occurrence of the separator (here "?"; it can be anything, but not a common character like "-" or "/", since the separator must not occur inside the string values), extract the phrase before it into one column, then look for the second occurrence of the separator, extract the next word into the next column, and keep creating columns; there can be multiple separators.
I tried SPLIT_PART, but with real data it does not maintain the sequence, so the values do not come out in the correct order.
I also tried REGEXP_INSTR, but I was unable to use special characters as separators.
Any thoughts?

Regex Extract should work for you:
SELECT
REGEXP_SUBSTR_ALL("Sales External?HR?Purchase Department", "(.*)\?")
You can use LATERAL FLATTEN to convert your array into rows:
WITH MY_CTE AS (
SELECT
REGEXP_SUBSTR_ALL('Sales External?HR?Purchase Department', '[^?]+') AS PARTS
)
SELECT
f.VALUE::string AS PART
FROM
MY_CTE,
LATERAL FLATTEN(INPUT => MY_CTE.PARTS, MODE => 'ARRAY') f
Deeper dive into some more cases: https://dwgeek.com/snowflake-convert-array-to-rows-methods-and-examples.html/
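If you'd rather avoid regular expressions entirely, Snowflake's built-in SPLIT function should return the same array directly:
SELECT SPLIT('Sales External?HR?Purchase Department', '?');
-- expected: [ "Sales External", "HR", "Purchase Department" ]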

Here's a simplified version of the data. It uses a CTE with array_agg to group the rows. It then changes from arrays to columns. To add more columns, you can use max(), min(), or any_value() functions to get them through the aggregation. (Note that use of any_value() will not allow use of cached results from the result set cache since it's flagged as nondeterministic.)
create or replace table T1 (EMPID int, ROLE string, ACCESS string, ACCESS_LVL string, ITERATION string);
insert into T1(EMPID, ROLE, ACCESS, ACCESS_LVL, ITERATION) values
(1234, 'Sales Rep', 'Specific', 'REGION', 'DEV'),
(1234, 'Purchase Rep', 'Specific', 'EVERY', 'PROD'),
(1234, 'HR', NULL, 'Dept', 'PROD'),
(4321, 'HR', 'Foo', 'Foo', 'Foo')
;
with X as
(
select EMPID
,array_agg(nvl(ROLE,'')) within group (order by ROLE) ARR_ROLE
,array_agg(nvl(ACCESS,'')) within group (order by ROLE) ARR_ACCESS
,array_agg(nvl(ACCESS_LVL,'')) within group (order by ROLE) ARR_ACCESS_LVL
,array_agg(nvl(ITERATION,'')) within group (order by ROLE) ARR_ITERATION
from T1
group by EMPID
)
select EMPID
,ARR_ROLE[0]::string as ROLE1
,ARR_ROLE[1]::string as ROLE2
,ARR_ROLE[2]::string as ROLE3
,ARR_ACCESS[0]::string as ACCESS1
,ARR_ACCESS[1]::string as ACCESS2
,ARR_ACCESS[2]::string as ACCESS3
,ARR_ACCESS_LVL[0]::string as ACCESS_LVL1
,ARR_ACCESS_LVL[1]::string as ACCESS_LVL2
,ARR_ACCESS_LVL[2]::string as ACCESS_LVL3
,ARR_ITERATION[0]::string as ITERATION1
,ARR_ITERATION[1]::string as ITERATION2
,ARR_ITERATION[2]::string as ITERATION3
from X
;
There's nothing in particular in the data that determines how to sort the rows into the arrays so that ROLE1, ROLE2, ROLE3, etc. are deterministic. I simply sorted on the name of the role, but it could be any ORDER BY within that group.

Here's a stored proc that will produce a table result with a dynamic set of columns based on the input string and specified delimiter.
If you are looking for a way to generate dynamic column names based on values, I recommend visiting Felipe Hoffa's blog entry here:
https://medium.com/snowflake/dynamic-pivots-in-sql-with-snowflake-c763933987c
create or replace procedure pivot_dyn_results(input string, delimiter string)
returns table ()
language SQL
AS
declare
max_count integer default 0;
lcount integer default 0;
rs resultset;
stmt1 string;
stmt2 string;
begin
-- Get number of delimiter separated values (assumes no leading or trailing delimiter)
select regexp_count(:input, '\\'||:delimiter, 1) into :max_count from dual;
-- Generate the initial row-based result set of parsed values
stmt1 := 'SELECT * from lateral split_to_table(?,?)';
-- Build dynamic query to produce the pivoted column based results
stmt2 := 'select * from (select * from table(result_scan(last_query_id(-1)))) pivot(max(value) for index in (';
-- initialize loop counter for resulting columns
lcount := 1;
stmt2 := stmt2 || '\'' || lcount || '\'';
-- append pivot statement for each column to be represented
FOR l in 1 to max_count do
lcount := lcount + 1;
stmt2 := stmt2 || ',\'' || lcount || '\'';
END FOR;
-- close out the pivot statement
stmt2 := stmt2 || '))';
-- execute the statements: first parse the values, then pivot them into columns
EXECUTE IMMEDIATE :stmt1 using (input, delimiter);
rs := (EXECUTE IMMEDIATE :stmt2);
return table(rs);
end;
Invocation:
call pivot_dyn_results([string],[delimiter]);
call pivot_dyn_results('Sales External?HR?Billing?Purchase Department','?');
Results: one row with pivoted columns '1' through '4' containing Sales External, HR, Billing, and Purchase Department.

Related

SQL Server to Oracle - using Cross Apply with Oracle

I have a function that takes a string of primary keys separated by commas and splits it into a table.
Oracle function:
create or replace function split(
  list in CHAR,
  delimiter in CHAR default ','
)
return split_tbl as
  splitted split_tbl := split_tbl();
  i pls_integer := 0;
  list_ varchar2(32767) := list;
begin
  loop
    i := instr(list_, delimiter);
    if i > 0 then
      splitted.extend(1);
      splitted(splitted.last) := substr(list_, 1, i - 1);
      list_ := substr(list_, i + length(delimiter));
    else
      splitted.extend(1);
      splitted(splitted.last) := list_;
      return splitted;
    end if;
  end loop;
end;
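This function depends on a schema-level collection type split_tbl that isn't shown above; a definition like the following has to exist first (the element size here is an assumption):
create or replace type split_tbl as table of varchar2(32767); -- assumed definition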
and I have this query in SQL Server that uses CROSS APPLY to get, for each id returned by the function, its most recent salary row:
select maxUserSalary.id as 'UserSalary'
into #usersalary
from dbo.Split(@usersalary, ';') as userid
cross apply (
  select top 1 * from User_Salary as usersalary
  where usersalary.User_Id = userid.item
  order by usersalary.Date desc
) as maxUserSalary
The problem is, I'm not able to use cross apply in Oracle to throw this data into this function that is returning a table.
How can I use cross apply with Oracle to return this data in function?
You're using Oracle 18c, so you can use the CROSS APPLY syntax. Oracle added it (along with LATERAL and OUTER APPLY) in 12c.
Here is a simplified version of your logic:
select us.name
     , us.salary
from table(split('FOX IN SOCKS,THING ONE,THING TWO')) t
cross apply (select us.name, max(us.salary) as salary
             from user_salaries us
             where us.name = t.column_value
             group by us.name) us
There is a working demo on db<>fiddle.
If this doesn't completely solve your problem please post a complete question with table structures, sample data and expected output derived from that sample.
I think APC answered your direct question well. As a side note, I wanted to suggest NOT writing your own function to do this at all. There are several existing solutions to split delimited string values into virtual tables that don't require you to create your own custom types, and don't have the performance overhead of context switching between the SQL and PL/SQL engines.
-- example data - remove this to test with your User_Salary table
with User_Salary as (select 1 as id, 'A' as user_id, sysdate as "Date" from dual
union select 2, 'B', sysdate from dual)
-- your query:
select maxUserSalary.id as "UserSalary"
from (select trim(COLUMN_VALUE) as item
from xmltable(('"'||replace(:usersalary, ';', '","')||'"'))) userid -- note ';' delimiter
cross apply (
select * from User_Salary usersalary
where usersalary.User_Id = userid.item
order by usersalary."Date" desc
fetch first 1 row only
) maxUserSalary;
If you run this and pass in 'A;B;C' for :usersalary, you'll get 1 and 2 back.
A few notes:
In this example, I'm using ; as the delimiter, since that's what your query used.
I tried to match your table/column names, but your column name Date is invalid - it's an Oracle reserved keyword, so it has to be put in quotes to be a valid column name.
As a column identifier, "UserSalary" should also have double quotes, not single.
You can't use as in table aliases.
I removed into #usersalary, since in Oracle select ... into is only used with queries that return a single row, and your query can return multiple rows.

bulk update postgresql sequences

I have existing data that I want to import into a new system.
I want to set the sequences according to the contents of the existing tables. I tried this, but I get number == 1.
DO
$do$
DECLARE
_tbl text;
number int;
BEGIN
FOR _tbl IN
SELECT c.relname FROM pg_class c WHERE c.relkind = 'S' and c.relname ilike '%y_id_seq'
LOOP
-- EXECUTE
SELECT count(*) FROM regexp_replace(_tbl, '(\w)y_.*', '\1ies') INTO number;
RAISE NOTICE '%', number;
EXECUTE format('SELECT setval(''"%s"'', ''%s'' )', _tbl, number);
END LOOP;
END
$do$;
What should I do to get the right count?
COUNT(*) is not the best choice for a new sequence value. Just imagine that you have holes in your numbering, for example 1, 2, 15. Count is 3 but next value should be 16 to avoid duplicates in the future.
Assuming you use sequence for id column I would suggest:
SELECT max(id) FROM _table_name_ INTO number;
Or even simpler:
SELECT setval(_sequence_name_, max(id)) FROM _table_name_;
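Folding that back into the original loop, a corrected DO block might look like this sketch. It assumes the question's naming convention (a sequence like category_id_seq feeding the id column of a table named categories); adjust the name derivation for your schema:
DO
$do$
DECLARE
    _seq text;
    _tbl text;
    _max bigint;
BEGIN
    FOR _seq IN
        SELECT c.relname FROM pg_class c WHERE c.relkind = 'S' AND c.relname ILIKE '%y_id_seq'
    LOOP
        -- derive the table name from the sequence name, e.g. category_id_seq -> categories
        _tbl := regexp_replace(_seq, 'y_id_seq$', 'ies');
        -- use max(id), not count(*), so holes in the numbering are handled
        EXECUTE format('SELECT max(id) FROM %I', _tbl) INTO _max;
        RAISE NOTICE '% -> %', _seq, _max;
        IF _max IS NOT NULL THEN  -- setval(NULL) would raise an error on empty tables
            PERFORM setval(quote_ident(_seq)::regclass, _max);
        END IF;
    END LOOP;
END
$do$;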

Passing an array of string as a parameter in Delphi to SQL Server [duplicate]

I have a list of integers or strings and need to pass it as a parameter for a Delphi DataSet. How can I do that?
Here is an example. MyQuery is something like:
select * from myTable where intKey in :listParam
I'd set a parameter as a list or array or something else:
MyQuery.ParamByName('listParam').AsSomething := [1,2,3];
and it would result in this query sent to the sql server:
select * from myTable where intKey in (1, 2, 3)
It would be even better if the solution would also work with strings, making this query:
select * from myTable where stringKey in :listParam
become:
select * from myTable where stringKey in ('a', 'b', 'c')
I believe this is a simple question, but "IN" isn't a good keyword for searching the web.
Please answer how I should configure the parameter in the IDE, the query and how to pass the parameters.
I'm using Delphi 7.
Edited: I'm considering the answer to be "it isn't possible to do directly". If someone gives me a non-hackish answer, the accepted answer will be changed.
AFAIK, it is not possible directly.
You'll have to convert the list into a SQL list in plain text.
For instance:
function ListToText(const Args: array of string): string; overload;
var
  i: integer;
begin
  result := '(';
  for i := 0 to high(Args) do
    result := result + QuotedStr(Args[i]) + ',';
  result[length(result)] := ')';
end;

function ListToText(const Args: array of integer): string; overload;
var
  i: integer;
begin
  result := '(';
  for i := 0 to high(Args) do
    result := result + IntToStr(Args[i]) + ',';
  result[length(result)] := ')';
end;
To be used as such:
SQL.Text := 'select * from myTable where intKey in '+ListToText([1,2,3]);
SQL.Text := 'select * from myTable where stringKey in '+ListToText(['a','b','c']);
SQL accepts only single values as parameters so you cannot create a statement with one parameter that can map to a variable number of values, such as the example you gave.
However, you can still use parameterized SQL in this situation. The solution is to iterate over the list of values you have, adding a parameter marker to the SQL and a parameter to the parameter list for each value.
This is easiest to do with positional rather than named parameters but can be adapted for named parameters as well (you may need to adjust this code since I don't have Delphi available and don't remember the Parameter creation syntax):
//AValues is an array of variant values
//SQLCommand is some TDataSet component with Parameters
for I := Low(AValues) to High(AValues) do
begin
  if ParamString = '' then
    ParamString := '?'
  else
    ParamString := ParamString + ', ?';
  SQLCommand.Parameters.Add(AValues[I]);
end;
SQLCommand.CommandText :=
  'SELECT * FROM MyTable WHERE KeyValue IN (' + ParamString + ')';
This will produce an injection-safe parameterized query.
There are several options for you but basically you need to get your values into a table. I would suggest a table variable for that.
Here is a version that unpacks an int list.
declare @IDs varchar(max)
set @IDs = :listParam
set @IDs = @IDs + ','
declare @T table(ID int primary key)
while len(@IDs) > 1
begin
  insert into @T(ID) values (left(@IDs, charindex(',', @IDs) - 1))
  set @IDs = stuff(@IDs, 1, charindex(',', @IDs), '')
end
select *
from myTable
where intKey in (select ID from @T)
It is possible to have multi-statement queries. The parameter :listParam should be a string:
MyQuery.ParamByName('listParam').AsString := '1,2,3';
You can use the same technique for strings. You just need to change the data type of ID to for instance varchar(10).
Instead of unpacking with a while loop you could make use of a split function
declare @T table(ID varchar(10))
insert into @T
select s
from dbo.Split(',', :listParam)
select *
from myTable
where charKey in (select ID from @T)
A string param could look like this:
MyQuery.ParamByName('listParam').AsString := 'Adam,Bertil,Caesar';
Create a temporary table and insert your values in it. Then use that table as part of a subquery.
For example, create MyListTable in your database. Insert your values into MyListTable. Then do
select * from myTable where keyvalue in (select keyvalue from MyListTable)
This avoids SQL injection attacks. But it's not elegant, is not performance friendly because you have to insert records before running your query, and can lead to concurrency issues.
Not my first choice to deal with your situation but it addresses your concern about sql injection.
If someone is still having the same problem: if you are using FireDAC, you can use macros like this:
Query -> "select * from myTable where intKey in (&listParam)"
Setting the macro -> MyQuery.MacroByName('listParam').AsRaw := '1, 2, 3';
I use some "IN" replacement. Here is the query I use:
SELECT * FROM MyTable WHERE CHARINDEX(','+cast(intKey as varchar(10))+',', :listParam) > 0
the code to send parameter:
MyQuery.ParamByName('listParam').AsString := ',1,2,3,';
The array item value could otherwise partially match other values; for instance, "1" is part of "100". To protect against that, I wrap every value in commas and use the comma as a delimiter.
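A quick illustration of the partial-match problem and why the wrapping commas prevent it:
-- without delimiters, '1' incorrectly matches inside '100'
SELECT CHARINDEX('1', '100');        -- 1 (false positive)
-- with a comma on each side, only a whole element can match
SELECT CHARINDEX(',1,', ',100,');    -- 0 (no match)
SELECT CHARINDEX(',1,', ',1,2,3,');  -- 1 (match)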
Why not build the SQL dynamically? Quick and dirty, but still using parameters. This checks 10 elements; I don't know how well it scales.
MyQuery.SQL.Text := 'SELECT * FROM myTable WHERE intKey in (:listParam0';
for i := 1 to 9 do begin
  MyQuery.SQL.Text := MyQuery.SQL.Text + ',:listParam' + IntToStr(i);
end;
MyQuery.SQL.Text := MyQuery.SQL.Text + ')';
for i := 0 to 9 do begin
  MyQuery.ParamByName('listParam' + IntToStr(i)).AsInteger := ArrayofInt[i];
end;

PostgreSQL - join statement duplicate row data combine to single row [duplicate]

I am looking for a way to concatenate the strings of a field within a group by query. So for example, I have a table:
ID COMPANY_ID EMPLOYEE
1 1 Anna
2 1 Bill
3 2 Carol
4 2 Dave
and I wanted to group by company_id to get something like:
COMPANY_ID EMPLOYEE
1 Anna, Bill
2 Carol, Dave
There is a built-in function in MySQL to do this: group_concat.
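For reference, the MySQL version mentioned in the question looks like this:
SELECT company_id, GROUP_CONCAT(employee SEPARATOR ', ') AS employees
FROM mytable
GROUP BY company_id;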
PostgreSQL 9.0 or later:
Modern Postgres (since 2010) has the string_agg(expression, delimiter) function which will do exactly what the asker was looking for:
SELECT company_id, string_agg(employee, ', ')
FROM mytable
GROUP BY company_id;
Postgres 9 also added the ability to specify an ORDER BY clause in any aggregate expression; otherwise you have to order all your results or deal with an undefined order. So you can now write:
SELECT company_id, string_agg(employee, ', ' ORDER BY employee)
FROM mytable
GROUP BY company_id;
PostgreSQL 8.4.x:
PostgreSQL 8.4 (in 2009) introduced the aggregate function array_agg(expression) which collects the values in an array. Then array_to_string() can be used to give the desired result:
SELECT company_id, array_to_string(array_agg(employee), ', ')
FROM mytable
GROUP BY company_id;
PostgreSQL 8.3.x and older:
When this question was originally posed, there was no built-in aggregate function to concatenate strings. The simplest custom implementation (suggested by Vajda Gabo in this mailing list post, among many others) is to use the built-in textcat function (which lies behind the || operator):
CREATE AGGREGATE textcat_all(
basetype = text,
sfunc = textcat,
stype = text,
initcond = ''
);
Here is the CREATE AGGREGATE documentation.
This simply glues all the strings together, with no separator. In order to get a ", " inserted in between them without having it at the end, you might want to make your own concatenation function and substitute it for the "textcat" above. Here is one I put together and tested on 8.3.12:
CREATE FUNCTION commacat(acc text, instr text) RETURNS text AS $$
BEGIN
  IF acc IS NULL OR acc = '' THEN
    RETURN instr;
  ELSE
    RETURN acc || ', ' || instr;
  END IF;
END;
$$ LANGUAGE plpgsql;
This version will output a comma even if the value in the row is null or empty, so you get output like this:
a, b, c, , e, , g
If you would prefer to remove extra commas to output this:
a, b, c, e, g
Then add an ELSIF check to the function like this:
CREATE FUNCTION commacat_ignore_nulls(acc text, instr text) RETURNS text AS $$
BEGIN
  IF acc IS NULL OR acc = '' THEN
    RETURN instr;
  ELSIF instr IS NULL OR instr = '' THEN
    RETURN acc;
  ELSE
    RETURN acc || ', ' || instr;
  END IF;
END;
$$ LANGUAGE plpgsql;
How about using Postgres built-in array functions? At least on 8.4 this works out of the box:
SELECT company_id, array_to_string(array_agg(employee), ',')
FROM mytable
GROUP BY company_id;
As of PostgreSQL 9.0 you can use the aggregate function called string_agg. Your new SQL should look something like this:
SELECT company_id, string_agg(employee, ', ')
FROM mytable
GROUP BY company_id;
I claim no credit for the answer because I found it after some searching:
What I didn't know is that PostgreSQL allows you to define your own aggregate functions with CREATE AGGREGATE
This post on the PostgreSQL list shows how trivial it is to create a function to do what's required:
CREATE AGGREGATE textcat_all(
basetype = text,
sfunc = textcat,
stype = text,
initcond = ''
);
SELECT company_id, textcat_all(employee || ', ')
FROM mytable
GROUP BY company_id;
As already mentioned, creating your own aggregate function is the right thing to do. Here is my concatenation aggregate function (you can find details in French):
CREATE OR REPLACE FUNCTION concat2(text, text) RETURNS text AS '
SELECT CASE WHEN $1 IS NULL OR $1 = \'\' THEN $2
WHEN $2 IS NULL OR $2 = \'\' THEN $1
ELSE $1 || \' / \' || $2
END;
'
LANGUAGE SQL;
CREATE AGGREGATE concatenate (
sfunc = concat2,
basetype = text,
stype = text,
initcond = ''
);
And then use it as:
SELECT company_id, concatenate(employee) AS employees FROM ...
This latest announcement list snippet might be of interest if you'll be upgrading to 8.4:
Until 8.4 comes out with a super-efficient native one, you can add the array_accum() function from the PostgreSQL documentation for rolling up any column into an array, which can then be used by application code, or combined with array_to_string() to format it as a list:
http://www.postgresql.org/docs/current/static/xaggr.html
I'd link to the 8.4 development docs but they don't seem to list this feature yet.
Following up on Kev's answer, using the Postgres docs:
First, create an array of the elements, then use the built-in array_to_string function.
CREATE AGGREGATE array_accum (anyelement)
(
sfunc = array_append,
stype = anyarray,
initcond = '{}'
);
select array_to_string(array_accum(name),'|') from table group by id;
Following up yet again on the use of a custom aggregate function for string concatenation: you need to remember that the SELECT statement will return rows in any order, so you need a subselect in the FROM clause with an ORDER BY, and then an outer SELECT with a GROUP BY to aggregate the strings, like this:
SELECT custom_aggregate(MY.special_strings)
FROM (SELECT special_strings, grouping_column
FROM a_table
ORDER BY ordering_column) MY
GROUP BY MY.grouping_column
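With the sample table from this question and the textcat_all aggregate defined earlier, that pattern would look something like this:
SELECT MY.company_id, textcat_all(MY.employee || ', ') AS employees
FROM (SELECT company_id, employee
      FROM mytable
      ORDER BY employee) MY
GROUP BY MY.company_id;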
Use STRING_AGG function for PostgreSQL and Google BigQuery SQL:
SELECT company_id, STRING_AGG(employee, ', ')
FROM employees
GROUP BY company_id;
I found this PostgreSQL documentation helpful: http://www.postgresql.org/docs/8.0/interactive/functions-conditional.html.
In my case, I sought plain SQL to concatenate a field with brackets around it, if the field is not empty.
select itemid,
  CASE itemdescription
    WHEN '' THEN itemname
    ELSE itemname || ' (' || itemdescription || ')'
  END
from items;
If you are on Amazon Redshift, where string_agg is not supported, try using listagg.
SELECT company_id, listagg(EMPLOYEE, ', ') as employees
FROM EMPLOYEE_table
GROUP BY company_id;
As of PostgreSQL 9.0 and above you can use the aggregate function called string_agg. Your new SQL should look something like this:
SELECT company_id, string_agg(employee, ', ')
FROM mytable GROUP BY company_id;
You can also use the format function, which can implicitly take care of type conversion for text, int, etc. by itself:
create or replace function concat_return_row_count(tbl_name text, column_name text, value int)
returns integer as $row_count$
declare
  total integer;
begin
  -- %I safely quotes identifiers; %s is fine for the integer value
  EXECUTE format('select count(*) from %I where %I = %s', tbl_name, column_name, value) INTO total;
  return total;
end;
$row_count$ language plpgsql;
postgres=# select concat_return_row_count('tbl_name','column_name',2); --2 is the value
I'm using JetBrains Rider, and it was a hassle copying the results from the above examples to re-execute them because it seemed to wrap everything in JSON. This joins them into a single statement that is easier to run:
select string_agg('drop table if exists "' || tablename || '" cascade', ';')
from pg_tables where schemaname != $$pg_catalog$$ and tableName like $$rm_%$$

comparing a column to a list of values in t-sql

I am displaying records on a page, and I need a way for the user to select a subset of those records to be displayed on another page. These records aren't stored anywhere; they are generated dynamically.
What is the best way in SQL to express "where a uniqueid is in this list of ids" when the list is not in a table? I know I could dynamically construct the SQL with a bunch of ORs, but that seems like a hack. Does anyone else have any suggestions?
this is the best source:
http://www.sommarskog.se/arrays-in-sql.html
create a split function, and use it like:
SELECT
*
FROM YourTable y
INNER JOIN dbo.splitFunction(@Parameter) s ON y.ID = s.Value
I prefer the number table approach. For this method to work, you need to do this one-time table setup:
SELECT TOP 10000 IDENTITY(int,1,1) AS Number
INTO Numbers
FROM sys.objects s1
CROSS JOIN sys.objects s2
ALTER TABLE Numbers ADD CONSTRAINT PK_Numbers PRIMARY KEY CLUSTERED (Number)
Once the Numbers table is set up, create this function:
CREATE FUNCTION [dbo].[FN_ListToTable]
(
    @SplitOn char(1)      --REQUIRED, the character to split the @List string on
   ,@List varchar(8000)   --REQUIRED, the list to split apart
)
RETURNS TABLE
AS
RETURN
(
    ----------------
    --SINGLE QUERY-- --this will not return empty rows
    ----------------
    SELECT
        ListValue
    FROM (SELECT
              LTRIM(RTRIM(SUBSTRING(List2, number + 1, CHARINDEX(@SplitOn, List2, number + 1) - number - 1))) AS ListValue
          FROM (
                SELECT @SplitOn + @List + @SplitOn AS List2
               ) AS dt
               INNER JOIN Numbers n ON n.Number < LEN(dt.List2)
          WHERE SUBSTRING(List2, number, 1) = @SplitOn
         ) dt2
    WHERE ListValue IS NOT NULL AND ListValue != ''
);
GO
You can now easily split a CSV string into a table and join on it:
select * from dbo.FN_ListToTable(',','1,2,3,,,4,5,6777,,,')
OUTPUT:
ListValue
-----------------------
1
2
3
4
5
6777
(6 row(s) affected)
You can pass a CSV string into a procedure and process only rows for the given IDs:
SELECT
    y.*
FROM YourTable y
INNER JOIN dbo.FN_ListToTable(',', @GivenCSV) s ON y.ID = s.ListValue
You can use the solution Joel Spolsky recently gave for this problem.
SELECT * FROM MyTable
WHERE ',' + 'comma,separated,list,of,words' + ','
LIKE '%,' + MyTable.word + ',%';
That solution is clever but slow. The better solution is to split the comma-separated string, and construct a dynamic SQL query with the IN() predicate, adding a query parameter placeholder for each element in your list of values:
SELECT * FROM MyTable
WHERE word IN ( ?, ?, ?, ?, ?, ?, ?, ? );
The number of placeholders is what you have to determine when you split your comma-separated string. Then pass one value from that list per parameter.
If you have too many values in the list and making a long IN() predicate is unwieldy, then insert the values to a temporary table, and JOIN against your main table:
CREATE TEMPORARY TABLE TempTableForSplitValues (word VARCHAR(20));
...split your comma-separated list and INSERT each value to a separate row...
SELECT * FROM MyTable JOIN TempTableForSplitValues USING (word);
Also see many other similar questions on SO, including:
Dynamic SQL Comma Delimited Value Query
Passing a varchar full of comma delimited values to a SQL Server IN function
Parameterized Queries with Like and In
Parameterizing a SQL IN clause?
