PostgreSQL - join statement duplicate row data combine to single row [duplicate] - database

I am looking for a way to concatenate the strings of a field within a group by query. So for example, I have a table:
ID COMPANY_ID EMPLOYEE
1 1 Anna
2 1 Bill
3 2 Carol
4 2 Dave
and I wanted to group by company_id to get something like:
COMPANY_ID EMPLOYEE
1 Anna, Bill
2 Carol, Dave
There is a built-in function in mySQL to do this group_concat

PostgreSQL 9.0 or later:
Modern Postgres (since 2010) has the string_agg(expression, delimiter) function which will do exactly what the asker was looking for:
SELECT company_id, string_agg(employee, ', ')
FROM mytable
GROUP BY company_id;
Postgres 9 also added the ability to specify an ORDER BY clause in any aggregate expression; otherwise you have to order all your results or deal with an undefined order. So you can now write:
SELECT company_id, string_agg(employee, ', ' ORDER BY employee)
FROM mytable
GROUP BY company_id;
PostgreSQL 8.4.x:
PostgreSQL 8.4 (in 2009) introduced the aggregate function array_agg(expression) which collects the values in an array. Then array_to_string() can be used to give the desired result:
SELECT company_id, array_to_string(array_agg(employee), ', ')
FROM mytable
GROUP BY company_id;
PostgreSQL 8.3.x and older:
When this question was originally posed, there was no built-in aggregate function to concatenate strings. The simplest custom implementation (suggested by Vajda Gabo in this mailing list post, among many others) is to use the built-in textcat function (which lies behind the || operator):
CREATE AGGREGATE textcat_all(
basetype = text,
sfunc = textcat,
stype = text,
initcond = ''
);
Here is the CREATE AGGREGATE documentation.
This simply glues all the strings together, with no separator. In order to get a ", " inserted in between them without having it at the end, you might want to make your own concatenation function and substitute it for the "textcat" above. Here is one I put together and tested on 8.3.12:
CREATE FUNCTION commacat(acc text, instr text) RETURNS text AS $$
BEGIN
IF acc IS NULL OR acc = '' THEN
RETURN instr;
ELSE
RETURN acc || ', ' || instr;
END IF;
END;
$$ LANGUAGE plpgsql;
This version will output a comma even if the value in the row is null or empty, so you get output like this:
a, b, c, , e, , g
If you would prefer to remove extra commas to output this:
a, b, c, e, g
Then add an ELSIF check to the function like this:
CREATE FUNCTION commacat_ignore_nulls(acc text, instr text) RETURNS text AS $$
BEGIN
IF acc IS NULL OR acc = '' THEN
RETURN instr;
ELSIF instr IS NULL OR instr = '' THEN
RETURN acc;
ELSE
RETURN acc || ', ' || instr;
END IF;
END;
$$ LANGUAGE plpgsql;

How about using Postgres built-in array functions? At least on 8.4 this works out of the box:
SELECT company_id, array_to_string(array_agg(employee), ',')
FROM mytable
GROUP BY company_id;

As from PostgreSQL 9.0 you can use the aggregate function called string_agg. Your new SQL should look something like this: SELECT company_id, string_agg(employee, ', ')
FROM mytable
GROUP BY company_id;

I claim no credit for the answer because I found it after some searching:
What I didn't know is that PostgreSQL allows you to define your own aggregate functions with CREATE AGGREGATE
This post on the PostgreSQL list shows how trivial it is to create a function to do what's required:
CREATE AGGREGATE textcat_all(
basetype = text,
sfunc = textcat,
stype = text,
initcond = ''
);
SELECT company_id, textcat_all(employee || ', ')
FROM mytable
GROUP BY company_id;

As already mentioned, creating your own aggregate function is the right thing to do. Here is my concatenation aggregate function (you can find details in French):
CREATE OR REPLACE FUNCTION concat2(text, text) RETURNS text AS '
SELECT CASE WHEN $1 IS NULL OR $1 = \'\' THEN $2
WHEN $2 IS NULL OR $2 = \'\' THEN $1
ELSE $1 || \' / \' || $2
END;
'
LANGUAGE SQL;
CREATE AGGREGATE concatenate (
sfunc = concat2,
basetype = text,
stype = text,
initcond = ''
);
And then use it as:
SELECT company_id, concatenate(employee) AS employees FROM ...

This latest announcement list snippet might be of interest if you'll be upgrading to 8.4:
Until 8.4 comes out with a
super-effient native one, you can add
the array_accum() function in the
PostgreSQL documentation for rolling
up any column into an array, which can
then be used by application code, or
combined with array_to_string() to
format it as a list:
http://www.postgresql.org/docs/current/static/xaggr.html
I'd link to the 8.4 development docs but they don't seem to list this feature yet.

Following up on Kev's answer, using the Postgres docs:
First, create an array of the elements, then use the built-in array_to_string function.
CREATE AGGREGATE array_accum (anyelement)
(
sfunc = array_append,
stype = anyarray,
initcond = '{}'
);
select array_to_string(array_accum(name),'|') from table group by id;

Following yet again on the use of a custom aggregate function of string concatenation: you need to remember that the select statement will place rows in any order, so you will need to do a sub select in the from statement with an order by clause, and then an outer select with a group by clause to aggregate the strings, thus:
SELECT custom_aggregate(MY.special_strings)
FROM (SELECT special_strings, grouping_column
FROM a_table
ORDER BY ordering_column) MY
GROUP BY MY.grouping_column

Use STRING_AGG function for PostgreSQL and Google BigQuery SQL:
SELECT company_id, STRING_AGG(employee, ', ')
FROM employees
GROUP BY company_id;

I found this PostgreSQL documentation helpful: http://www.postgresql.org/docs/8.0/interactive/functions-conditional.html.
In my case, I sought plain SQL to concatenate a field with brackets around it, if the field is not empty.
select itemid,
CASE
itemdescription WHEN '' THEN itemname
ELSE itemname || ' (' || itemdescription || ')'
END
from items;

If you are on Amazon Redshift, where string_agg is not supported, try using listagg.
SELECT company_id, listagg(EMPLOYEE, ', ') as employees
FROM EMPLOYEE_table
GROUP BY company_id;

According to version PostgreSQL 9.0 and above you can use the aggregate function called string_agg. Your new SQL should look something like this:
SELECT company_id, string_agg(employee, ', ')
FROM mytable GROUP BY company_id;

You can also use format function. Which can also implicitly take care of type conversion of text, int, etc by itself.
create or replace function concat_return_row_count(tbl_name text, column_name text, value int)
returns integer as $row_count$
declare
total integer;
begin
EXECUTE format('select count(*) from %s WHERE %s = %s', tbl_name, column_name, value) INTO total;
return total;
end;
$row_count$ language plpgsql;
postgres=# select concat_return_row_count('tbl_name','column_name',2); --2 is the value

I'm using Jetbrains Rider and it was a hassle copying the results from above examples to re-execute because it seemed to wrap it all in JSON. This joins them into a single statement that was easier to run
select string_agg('drop table if exists "' || tablename || '" cascade', ';')
from pg_tables where schemaname != $$pg_catalog$$ and tableName like $$rm_%$$

Related

Extract words in between the separator

I have input like below
I want like below
I was trying with
Sales External?HR?Purchase Department
I did LISTAGG because finally i want in separate columns
Query Output would be like below,
meaning it should search for first occurrence of the separator (in this case "?", it can be anything but not common ones like "-" or "/" as the separator needs to be separate than sting value) and then extract the phrase before the first separator and create one column and put the value. Then it should look for second occurrence of the separator and then extract the word and keep creating columns, there can be multiple separators.
I tried with SPLIT_PART but it does not maintain the sequence in real data scenario and data does not come correct as per sequence.
I also tried with REGEXP_INSTR, but unable to use special characters as separators.
Any thought?
Regex Extract should work for you:
SELECT
REGEXP_SUBSTR_ALL("Sales External?HR?Purchase Department", "(.*)\?")
You can use LATERAL FLATTEN to convert your array into rows:
WITH MY_CTE AS (
SELECT
REGEXP_SUBSTR_ALL("Sales External?HR?Purchase Department", "(.*)\?")
)
SELECT
*
FROM
LATERAL FLATTEN(INPUT => MY_CTE, MODE=> 'ARRAY')
Deeper dive into some more cases: https://dwgeek.com/snowflake-convert-array-to-rows-methods-and-examples.html/
Here's a simplified version of the data. It uses a CTE with array_agg to group the rows. It then changes from arrays to columns. To add more columns, you can use max(), min(), or any_value() functions to get them through the aggregation. (Note that use of any_value() will not allow use of cached results from the result set cache since it's flagged as nondeterministic.)
create or replace table T1 (EMPID int, ROLE string, ACCESS string, ACCESS_LVL string, ITERATION string);
insert into T1(EMPID, ROLE, ACCESS, ACCESS_LVL, ITERATION) values
(1234, 'Sales Rep', 'Specific', 'REGION', 'DEV'),
(1234, 'Purchase Rep', 'Specific', 'EVERY', 'PROD'),
(1234, 'HR', NULL, 'Dept', 'PROD'),
(4321, 'HR', 'Foo', 'Foo', 'Foo')
;
with X as
(
select EMPID
,array_agg(nvl(ROLE,'')) within group (order by ROLE) ARR_ROLE
,array_agg(nvl(ACCESS,'')) within group (order by ROLE) ARR_ACCESS
,array_agg(nvl(ACCESS_LVL,'')) within group (order by ROLE) ARR_ACCESS_LVL
,array_agg(nvl(ITERATION,'')) within group (order by ROLE) ARR_ITERATION
from T1
group by EMPID
)
select EMPID
,ARR_ROLE[0]::string as ROLE1
,ARR_ROLE[1]::string as ROLE2
,ARR_ROLE[2]::string as ROLE3
,ARR_ACCESS[0]::string as ACCESS1
,ARR_ACCESS[1]::string as ACCESS2
,ARR_ACCESS[2]::string as ACCESS3
,ARR_ACCESS_LVL[0]::string as ACCESS_LVL1
,ARR_ACCESS_LVL[1]::string as ACCESS_LVL2
,ARR_ACCESS_LVL[2]::string as ACCESS_LVL3
,ARR_ITERATION[0]::string as ITERATION1
,ARR_ITERATION[1]::string as ITERATION2
,ARR_ITERATION[2]::string as ITERATION3
from X
;
There's nothing particular that seems interesting to sort the rows into the array so that ROLE1, ROLE2, ROLE3, etc. are deterministic. I showed simply sorting on the name of the role, but it could be any order by within that group.
Here's a stored proc that will produce a table result with a dynamic set of columns based on the input string and specified delimiter.
If you are looking for a way to generate dynamic column names based on values, I recommend visiting Felipe Hoffa's blog entry here:
https://medium.com/snowflake/dynamic-pivots-in-sql-with-snowflake-c763933987c
create or replace procedure pivot_dyn_results(input string, delimiter string)
returns table ()
language SQL
AS
declare
max_count integer default 0;
lcount integer default 0;
rs resultset;
stmt1 string;
stmt2 string;
begin
-- Get number of delimiter separated values (assumes no leading or trailing delimiter)
select regexp_count(:input, '\\'||:delimiter, 1) into :max_count from dual;
-- Generate the initial row-based result set of parsed values
stmt1 := 'SELECT * from lateral split_to_table(?,?)';
-- Build dynamic query to produce the pivoted column based results
stmt2 := 'select * from (select * from table(result_scan(last_query_id(-1)))) pivot(max(value) for index in (';
-- initialize look counter for resulting columns
lcount := 1;
stmt2 := stmt2 || '\'' || lcount || '\'';
-- append pivot statement for each column to be represented
FOR l in 1 to max_count do
lcount := lcount + 1;
stmt2 := stmt2 || ',\'' || lcount || '\'';
END FOR;
-- close out the pivot statement
stmt2 := stmt2 || '))';
-- execute the
EXECUTE IMMEDIATE :stmt1 using (input, delimiter);
rs := (EXECUTE IMMEDIATE :stmt2);
return table(rs);
end;
Invocation:
call pivot_dyn_results([string],[delimiter]);
call pivot_dyn_results('Sales External?HR?Billing?Purchase Department','?');
Results:

SQL Server to Oracle - using Cross Apply with Oracle

I have a function that takes primary keys and separates them with commas.
Oracle function:
create or replace function split(
list in CHAR,
delimiter in CHAR default ','
)
return split_tbl as
splitted split_tbl := split_tbl();
i pls_integer := 0;
list_ varchar2(32767) := list;
begin
loop
i := instr(list_, delimiter);
if i > 0 then
splitted.extend(1);
splitted(splitted.last) := substr(list_, 1, i - 1);
list_ := substr(list_, i + length(delimiter));
else
splitted.extend(1);
splitted(splitted.last) := list_;
return splitted;
end if;
end loop;
end;
and I have this query in SQL Server that returns the data of this query in the function table
select maxUserSalary.id as 'UserSalary'
into #usersalary
from dbo.Split(#usersalary,';') as userid
cross apply (
select top 1 * from User_Salaryas usersalary
where usersalary.User_Id= userid.item
order by usersalary.Date desc
) as maxUserSalary
The problem is, I'm not able to use cross apply in Oracle to throw this data into this function that is returning a table.
How can I use cross apply with Oracle to return this data in function?
You're using Oracle 18c so you can use the CROSS APPLY syntax. Oracle added it (as well as LATERAL and OUTER APPLY ) in 12c.
Here is a simplified version of your logic:
select us.name
, us.salary
from table(split('FOX IN SOCKS,THING ONE,THING TWO')) t
cross apply (select us.name, max(us.salary) as salary
from user_salaries us
where us.name = t.column_value ) us
There is a working demo on db<>fiddle .
If this doesn't completely solve your problem please post a complete question with table structures, sample data and expected output derived from that sample.
I think APC answered your direct question well. As a side note, I wanted to suggest NOT writing your own function to do this at all. There are several existing solutions to split delimited string values into virtual tables that don't require you to create your own custom types, and don't have the performance overhead of context switching between the SQL and PL/SQL engines.
-- example data - remove this to test with your User_Salary table
with User_Salary as (select 1 as id, 'A' as user_id, sysdate as "Date" from dual
union select 2, 'B', sysdate from dual)
-- your query:
select maxUserSalary.id as "UserSalary"
from (select trim(COLUMN_VALUE) as item
from xmltable(('"'||replace(:usersalary, ';', '","')||'"'))) userid -- note ';' delimiter
cross apply (
select * from User_Salary usersalary
where usersalary.User_Id = userid.item
order by usersalary."Date" desc
fetch first 1 row only
) maxUserSalary;
If you run this and pass in 'A;B;C' for :usersalary, you'll get 1 and 2 back.
A few notes:
In this example, I'm using ; as the delimiter, since that's what your query used.
I tried to match your table/column names, but your column name Date is invalid - it's an Oracle reserved keyword, so it has to be put in quotes to be a valid column name.
As a column identifier, "UserSalary" should also have double quotes, not single.
You can't use as in table aliases.
I removed into usersalary, since into is only used with queries which return a single row, and your query can return multiple rows.

Postgresql put strings inside quotes with array_to_string

In select I have used array_to_string like this (example)
array_to_string(array_agg(tag_name),';') tag_names
I got resulting string "tag1;tag2;tag3;..." but I would like to get resulting string as "'tag1';'tag2';'tag3';...".
How can I do this in Postgres?
Use the functions string_agg() and format() with the %L placeholder, which quotes the argument value as an SQL literal.
with my_table(tag_name) as (
values
('tag1'),
('tag2'),
('tag3')
)
select string_agg(format('%L', tag_name), ';' order by tag_name) tag_names
from my_table;
tag_names
----------------------
'tag1';'tag2';'tag3'
(1 row)
Alternatively, format('%L', tag_name) may be replaced with quote_nullable(tag_name).
Or your can use unnest, format, array_agg and array_to_string in one request like this :
select array_to_string(t.tag, ',')
from (
select array_agg(format('%L', t.tag)) as tag
from (
select unnest(tag_name) as tag
) t
) t;
Or use
array_to_string(array_agg(''''||tag_name||''''),';') tag_names
or even simpler (thanks for the commenting :) )
string_agg(''''||tag_name||''''),';') tag_names
Note:
When dealing with multiple-argument aggregate functions, note that the
ORDER BY clause goes after all the aggregate arguments. For example,
write this:
SELECT string_agg(a, ',' ORDER BY a) FROM table;
not this:
SELECT string_agg(a ORDER BY a, ',') FROM table; -- incorrect
See https://www.postgresql.org/docs/current/static/sql-expressions.html#SYNTAX-AGGREGATES
You can use string_agg() function with '''; ''' so it will be like
SELECT string_agg(tag_name, '''; ''') from my_table

Copy records with dynamic column names

I have two tables with different columns in PostgreSQL 9.3:
CREATE TABLE person1(
NAME TEXT NOT NULL,
AGE INT NOT NULL
);
CREATE TABLE person2(
NAME TEXT NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR(50),
SALARY REAL
);
INSERT INTO person2 (Name, Age, ADDRESS, SALARY)
VALUES ('Piotr', 20, 'London', 80);
I would like to copy records from person2 to person1, but column names can change in program, so I would like to select joint column names in program. So I create an array containing the intersection of column names. Next I use a function: insert into .... select, but I get an error, when I pass the array variable to the function by name. Like this:
select column_name into name1 from information_schema.columns where table_name = 'person1';
select column_name into name2 from information_schema.columns where table_name = 'person2';
select * into cols from ( select * from name1 intersect select * from name2) as tmp;
-- Create array with name of columns
select array (select column_name::text from cols) into cols2;
CREATE OR REPLACE FUNCTION f_insert_these_columns(VARIADIC _cols text[])
RETURNS void AS
$func$
BEGIN
EXECUTE (
SELECT 'INSERT INTO person1 SELECT '
|| string_agg(quote_ident(col), ', ')
|| ' FROM person2'
FROM unnest(_cols) col
);
END
$func$ LANGUAGE plpgsql;
select * from cols2;
array
------------
{name,age}
(1 row)
SELECT f_insert_these_columns(VARIADIC cols2);
ERROR: column "cols2" does not exist
What's wrong here?
You seem to assume that SELECT INTO in SQL would assign a variable. But that is not so.
It creates a new table and its use is discouraged in Postgres. Use the superior CREATE TABLE AS instead. Not least, because the meaning of SELECT INTO inside plpgsql is different:
Combine two tables into a new one so that select rows from the other one are ignored
Concerning SQL variables:
User defined variables in PostgreSQL
Hence you cannot call the function like this:
SELECT f_insert_these_columns(VARIADIC cols2);
This would work:
SELECT f_insert_these_columns(VARIADIC (TABLE cols2 LIMIT 1));
Or cleaner:
SELECT f_insert_these_columns(VARIADIC array) -- "array" being the unfortunate column name
FROM cols2
LIMIT 1;
About the short TABLE syntax:
Is there a shortcut for SELECT * FROM?
Better solution
To copy all rows with columns sharing the same name between two tables:
CREATE OR REPLACE FUNCTION f_copy_rows_with_shared_cols(
IN _tbl1 regclass
, IN _tbl2 regclass
, OUT rows int
, OUT columns text)
LANGUAGE plpgsql AS
$func$
BEGIN
SELECT INTO columns -- proper use of SELECT INTO!
string_agg(quote_ident(attname), ', ')
FROM (
SELECT attname
FROM pg_attribute
WHERE attrelid IN (_tbl1, _tbl2)
AND NOT attisdropped -- no dropped (dead) columns
AND attnum > 0 -- no system columns
GROUP BY 1
HAVING count(*) = 2
) sub;
EXECUTE format('INSERT INTO %1$s(%2$s) SELECT %2$s FROM %3$s'
, _tbl1, columns, _tbl2);
GET DIAGNOSTICS rows = ROW_COUNT; -- return number of rows copied
END
$func$;
Call:
SELECT * FROM f_copy_rows_with_shared_cols('public.person2', 'public.person1');
Result:
rows | columns
-----+---------
3 | name, age
Major points
Note the proper use of SELECT INTO for assignment inside plpgsql.
Note the use of the data type regclass. This allows to use schema-qualified table names (optionally) and defends against SQL injection attempts:
Table name as a PostgreSQL function parameter
About GET DIAGNOSTICS:
Count rows affected by DELETE
About OUT parameters:
Returning from a function with OUT parameter
The manual about format().
Information schema vs. system catalogs.

SQL Server view based on JOIN two tables - how to replace NULL values with spaces?

I'm creating a view for some external system. This external system does not work with null values so I want to change them to some more user-friendly, by example "".
I use this query to create my view:
CREATE VIEW SomeView
AS
SELECT
r.CountryRegionCode, r.Name, e.ExampleData
FROM
AdventureWorks2008.Person.CountryRegion r
JOIN
AdventureWorks2008.dbo.SomeTable e ON r.CountryRegionCode = e.CountryRegionCode
The result of this query is:
As I understand I can use ISNULL operator to replace NULL with space, but how to use it in my statement? How to do it right:
SELECT
ISNULL (r.CountryRegionCode, 0) AS r.CountryRegionCode
-- ..
?
Update: It does not understand what the r is:
Update #2: thank you guys very much! The final result:
SELECT
CountryRegionCode = ISNULL(r.CountryRegionCode, ''),
Name = ISNULL(r.Name,''),
ExampleData = ISNULL(e.ExampleData,'')
FROM
AdventureWorks2008.Person.CountryRegion r
JOIN
AdventureWorks2008.dbo.SomeTable e ON r.CountryRegionCode = e.CountryRegionCode
r is a table alias. If you want get column name r.CountryRegionCode need to quote them: [r.CountryRegionCode]
CREATE VIEW SomeView
AS
SELECT CountryRegionCode = ISNULL(r.CountryRegionCode, ' '), -- COALESCE(r.CountryRegionCode, ' ')
r.name,
e.ExampleData
FROM AdventureWorks2008.Person.CountryRegion r
JOIN AdventureWorks2008.dbo.SomeTable e ON r.CountryRegionCode = e.CountryRegionCode
Also please note about difference between COALESCE and ISNULL:
DECLARE #a CHAR(1)
SELECT ISNULL(#a, 'NULL') /*N*/, COALESCE(#a, 'NULL') /*NULL*/
SELECT ISNULL(r.CountryRegionCode,' ') AS CountryRegionCode

Resources