I have a parcels table in PostgreSQL in which the zoning and zoning_description columns are arrays built with array_agg. The new.universities table has 9 rows, and I need to return 9 rows in the output.
The purpose of this query is to find all the properties these universities are located on, collapse their zoning types into one column of unique values, and union/dissolve their geometries into multipolygons.
select array_agg(distinct dp.zoning) zoning,array_agg(distinct dp.zoning_description) zoning_description,
uni.school name_,uni.address,'University' type_,1000 buff,st_union(dp.geom)
from new.universities uni join new.detroit_parcels_update dp
on st_intersects(st_buffer(uni.geom,-10),dp.geom)
group by name_,uni.address,type_,buff
I get this error:
ERROR: cannot accumulate arrays of different dimensionality
SQL state: 2202E
I can do array_agg(distinct dp.zoning::text) zoning etc., but this returns a completely messed up column with arrays nested inside arrays...
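A minimal sketch reproducing the error (hypothetical zoning codes, not from the actual data): array_agg over arrays of different lengths fails because PostgreSQL would have to build a rectangular 2-D array.

```sql
-- Minimal reproduction with made-up values: the two input arrays
-- have different lengths, so they cannot stack into a 2-D array.
select array_agg(z)
from (values (array['R1']), (array['R1','B2'])) t(z);
-- ERROR: cannot accumulate arrays of different dimensionality
```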
Based on the answer, here is my updated query, which still does not work:
select array_agg(distinct zoning_u) zoning,array_agg(distinct zoning_description_u) zoning_description,
uni.school name_,uni.address,'University' type_,1000::int buff,st_union(dp.geom) geom
from new.detroit_parcels_update dp,unnest(zoning) zoning_u,
unnest(zoning_description) zoning_description_u
join new.universities uni
on st_intersects(st_buffer(uni.geom,-10),dp.geom)
group by name_,uni.address,type_,buff order by name_
I get this error:
ERROR: invalid reference to FROM-clause entry for table "dp"
LINE 6: on st_intersects(st_buffer(uni.geom,-10),dp.geom)
^
HINT: There is an entry for table "dp", but it cannot be referenced from this part of the query.
SQL state: 42P01
Character: 373
My final query, which worked, was:
with t as(select dp.zoning,dp.zoning_description,uni.school name_,uni.address,'University' type_,1000::int buff,st_union(dp.geom) geom
from new.detroit_parcels_update dp
join new.universities uni
on st_intersects(st_buffer(uni.geom,-10),dp.geom)
group by name_,uni.address,type_,buff,dp.zoning,zoning_description order by name_
)
select name_,address,type_,buff,st_union(geom) geom,array_agg(distinct z) zoning, array_agg(distinct zd) zoning_description
from t,unnest(zoning) z,unnest(zoning_description) zd
group by name_,address,type_,buff
Example data:
create table my_table(name text, numbers text[], letters text[]);
insert into my_table values
('first', '{1, 2}', '{a}' ),
('first', '{2, 3}', '{a, b}'),
('second', '{4}', '{c, d}'),
('second', '{5, 6}', '{c}' );
You should aggregate array elements, not arrays. Use unnest():
select
name,
array_agg(distinct number) as numbers,
array_agg(distinct letter) as letters
from
my_table,
unnest(numbers) as number,
unnest(letters) as letter
group by name;
name | numbers | letters
--------+---------+---------
first | {1,2,3} | {a,b}
second | {4,5,6} | {c,d}
(2 rows)
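One caveat worth noting (my addition, not part of the original answer): the implicit cross join with unnest() drops rows where either array is NULL or empty. If those rows must be kept, a LEFT JOIN LATERAL variant should work:

```sql
-- Sketch: keep rows whose arrays are NULL or empty by switching the
-- implicit cross join to LEFT JOIN LATERAL; FILTER drops the NULLs
-- that the left joins introduce.
select
  name,
  array_agg(distinct number) filter (where number is not null) as numbers,
  array_agg(distinct letter) filter (where letter is not null) as letters
from my_table
left join lateral unnest(numbers) as number on true
left join lateral unnest(letters) as letter on true
group by name;
```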
Alternatively, you can create a custom aggregate. You need a function that merges two arrays (concatenation with duplicate removal):
create or replace function public.array_merge(arr1 anyarray, arr2 anyarray)
returns anyarray language sql immutable
as $$
select array_agg(distinct elem order by elem)
from (
select unnest(arr1) elem
union
select unnest(arr2)
) s
$$;
create aggregate array_merge_agg(anyarray) (
sfunc = array_merge,
stype = anyarray
);
select
name,
array_merge_agg(numbers) as numbers,
array_merge_agg(letters) as letters
from my_table
group by name;
A much simpler alternative is to create a custom aggregate function (you only need to do this once)
CREATE AGGREGATE array_concat_agg(anyarray) (
SFUNC = array_cat,
STYPE = anyarray
);
Then replace array_agg with array_concat_agg:
SELECT
array_concat_agg(DISTINCT dp.zoning) zoning,
array_concat_agg(DISTINCT dp.zoning_description) zoning_description,
uni.school name_,
uni.address,
'University' type_,
1000 buff,
st_union(dp.geom)
FROM
new.universities uni
JOIN new.detroit_parcels_update dp ON st_intersects(st_buffer(uni.geom, -10), dp.geom)
GROUP BY
name_,
uni.address,
type_,
buff
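A caveat (my note, not part of the original answer): here DISTINCT deduplicates whole arrays before concatenating, not individual elements, so an element shared by two different arrays appears twice in the result. With the my_table sample from the earlier answer:

```sql
-- DISTINCT applies to entire arrays: {1,2} and {2,3} are distinct,
-- so the shared element 2 survives in the concatenation.
select name, array_concat_agg(distinct numbers) as numbers
from my_table
group by name;
-- first  → {1,2,2,3}
-- second → {4,5,6}
```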
Related
I am relatively new to Snowflake and struggle a bit with setting up a transformation for a semi-structured dataset. I have several log data batches, where each batch (table row in Snowflake) has the following columns: LOG_ID, COLUMN_NAMES, and LOG_ENTRIES.
COLUMN_NAMES contains a semicolon-separated list of columns names, e.g.:
“TIMESTAMP;Sensor A;Sensor B”, “TIMESTAMP;Sensor B;Sensor C”
LOG_ENTRIES contains a semicolon-separated list of values, e.g.:
“2020-02-11 09:08:19; 99.24;12.25”
The COLUMN_NAMES string can be different between log batches (Snowflake rows), but the names in the order they appear describe the content of the LOG_ENTRIES column values of the same row. My goal is to transform the data into a table that has column names for all unique values present in the COLUMN_NAMES column, e.g.:
LOG_ID | TIMESTAMP           | Sensor A | Sensor B | Sensor C
-------+---------------------+----------+----------+---------
1      | 2020-02-11 09:08:19 | 99.24    | 12.25    | NaN
2      | 2020-02-11 09:10:44 | NaN      | 13.32    | 0.947
Can this be achieved with a Snowflake script, and if so, how? :)
Best regards,
Johan
You should use the SPLIT_TO_TABLE function to split the two values and join them by index.
After that, all you have to do is use PIVOT to turn the rows into columns.
Sample data:
create or replace table splittable (LOG_ID int, COLUMN_NAMES varchar, LOG_ENTRIES varchar);
insert into splittable (LOG_ID, COLUMN_NAMES, LOG_ENTRIES)
values (1, 'TIMESTAMP;Sensor A;Sensor B', '2020-02-11 09:08:19;99.24;12.25'),
(2, 'TIMESTAMP;Sensor B;Sensor C', '2020-02-11 09:10:44;13.32;0.947');
Solution proposal:
WITH src AS (
select LOG_ID, cn.VALUE as COLUMN_NAMES, le.VALUE as LOG_ENTRIES
from splittable as st,
lateral split_to_table(st.COLUMN_NAMES, ';') as cn,
lateral split_to_table(st.LOG_ENTRIES, ';') as le
where cn.INDEX = le.INDEX
)
select * from src
pivot (min(LOG_ENTRIES) for COLUMN_NAMES in ('TIMESTAMP','Sensor A','Sensor B','Sensor C'))
order by LOG_ID;
Reference: SPLIT_TO_TABLE, PIVOT
If the column list is variable and you can't hard-code it, then you have to generate the statement dynamically; maybe this will help: CREATE A DYNAMIC PIVOT IN SNOWFLAKE
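For the dynamic case, a rough sketch with Snowflake Scripting might look like the following. This is my own untested outline, not from the original answer: it assumes the splittable sample table above, builds the PIVOT IN-list with LISTAGG, and runs the statement via EXECUTE IMMEDIATE.

```sql
-- Sketch (untested): derive the column list from the data, then
-- build and execute the pivot statement dynamically.
declare
  cols string;
begin
  select listagg(distinct '''' || cn.value || '''', ',')
    into :cols
    from splittable st,
         lateral split_to_table(st.column_names, ';') cn;

  execute immediate '
    with src as (
      select log_id, cn.value as column_names, le.value as log_entries
      from splittable st,
           lateral split_to_table(st.column_names, '';'') cn,
           lateral split_to_table(st.log_entries, '';'') le
      where cn.index = le.index
    )
    select * from src
    pivot (min(log_entries) for column_names in (' || cols || '))
    order by log_id';
end;
```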
You could transform the data into an ACTUAL semi-structured data type that you can then natively query using Snowflake SQL.
WITH x AS (
SELECT column_names, log_entries
FROM (VALUES ('TIMESTAMP_;SENSOR1','2021-02-01'||';1.2')) x (column_names, log_entries)
),
y AS (
SELECT *
FROM x,
LATERAL FLATTEN(input => split(column_names,';')) f
),
z AS (
SELECT *
FROM x,
LATERAL FLATTEN(input => split(log_entries,';')) f
)
SELECT listagg(('"'||y.value||'":"'||z.value||'"'),',') as cnt
, parse_json('{'||cnt||'}') as var
FROM y
JOIN z
ON y.seq = z.seq
AND y.index = z.index
GROUP BY y.seq;
I'm trying to migrate an Oracle query to Postgres:
SELECT /*+ materialize */ distinct r.empid, r.mgr_id, CONNECT_BY_ISLEAF leafs
FROM (Select * from empid_reports_to_mgrid WHERE sysdate BETWEEN eff_date AND eff_date_end) r
CONNECT BY PRIOR r.mgr_id = r.empid
START WITH r.empid IN (SELECT distinct empid
FROM employee
WHERE event_oid ='345345' AND F_HISTORICAL=0 and F_ELIGIBLE=1);
I arrived at this solution:
( with recursive cte ( empid, mgr_id,level, visited, root_id) AS
(
select empid::varchar ,
mgr_id::varchar,
1 as level,
array[empid]::varchar[] as visited,
empid::varchar as root_id
from (Select * from empid_reports_to_mgrid WHERE now() BETWEEN eff_date AND eff_date_end
AND empid IN (SELECT distinct empid
FROM employee
WHERE event_oid ='345345' AND F_HISTORICAL=0 and F_ELIGIBLE=1) ) e
union all
select c.empid::varchar,
c.mgr_id::varchar,
p.level + 1,
(p.visited::varchar[] ||c.empid::varchar[]),
p.root_id::varchar
from (Select * from empid_reports_to_mgrid WHERE now() BETWEEN eff_date AND eff_date_end) c
join cte p on p.mgr_id= c.empid
where c.empid <> all(p.visited)
)
SELECT e.*,
not exists (select * from cte p where p.mgr_id = e.empid) as leafs
FROM cte e);
The columns empid and mgr_id are of data type varchar(32).
When I run this query, I'm getting the below error:
SQL Error [42804]: ERROR: recursive query "cte" column 4 has type character varying(32)[] in non-recursive term but type character varying[] overall
Hint: Cast the output of the non-recursive term to the correct type.
The type casts that are present were added after looking at the post below, which suggests casting the recursive columns to get rid of the error, but it didn't work:
Postgres CTE : type character varying(255)[] in non-recursive term but type character varying[] overall
How do we migrate CONNECT_BY_ISLEAF to Postgres? Please help!
Also, what are the recursive columns in this case?
If I typecast to text and text[] instead of varchar and varchar[], I get the below error:
malformed array literal: "21466694N" Detail: Array value must start with "{" or dimension information.
I am using Postgres 9.6, and I have the following two tables:
create table employees (
id text,
setting_id text -- references settings.id
);
create table settings (
id text,
setting_str text -- contains json string
);
insert into employees (id, setting_id) values ('e1', 's1');
insert into employees (id, setting_id) values ('e2', 's2');
insert into settings (id, setting_str)
values ('s1', '{"vehicles" : null}');
insert into settings (id, setting_str)
values ('s2', '{"vehicles" : ["Car", "Bike"]}');
Now I want to get output like:
employee_id, name, vehicles
e1, one, null
e2, two, {"Car", "Bike"}
I tried with the following query:
select e.id,
jsonb_array_elements_text(s.setting_str::jsonb #> '{vehicles}')
from employees e
join settings s on s.id = e.setting_id;
But it gives me an error:
ERROR: cannot extract elements from a scalar
Any idea how I can extract the JSON array from the text field and display it as a Postgres array of strings (not a json array, not text) in a select statement?
The data model is not good, so the query will be complicated.
You should use a relational model for data like this.
Here is the query with inline comments:
SELECT e.id AS employee_id,
s.id AS setting_id,
/* construct an array of all vehicles that are not NULL */
array_agg(v.p) FILTER (WHERE v.p IS NOT NULL) AS vehicles
FROM employees AS e
JOIN settings AS s ON e.setting_id = s.id
/* join with the "exploded" JSON array from s.setting_str */
LEFT JOIN LATERAL json_array_elements_text(
/* replace "null" with an empty array */
COALESCE(
(s.setting_str::json) ->> 'vehicles',
'[]'
)::json
) AS v(p) ON TRUE
GROUP BY e.id, s.id;
employee_id | setting_id | vehicles
-------------+------------+------------
e1 | s1 |
e2 | s2 | {Car,Bike}
(2 rows)
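To see why the COALESCE is needed (a quick check of my own, using the data model above): ->> 'vehicles' returns SQL NULL when the key holds JSON null, and it was the attempt to expand that scalar null that raised the original "cannot extract elements from a scalar" error.

```sql
-- JSON null under the key yields SQL NULL, which COALESCE replaces
-- with an empty JSON array before json_array_elements_text runs:
select coalesce('{"vehicles": null}'::json ->> 'vehicles', '[]');    -- '[]'
select coalesce('{"vehicles": ["Car"]}'::json ->> 'vehicles', '[]'); -- '["Car"]'
```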
I have a table like this:
CREATE TABLE preferences (name varchar, preferences varchar[]);
INSERT INTO preferences (name, preferences)
VALUES
('John','{pizza, spaghetti}'),
('Charlie','{spaghetti, rice}'),
('Lucy','{rice, potatoes}'),
('Beth','{bread, cheese}'),
('Trudy','{rice, milk}');
So from the table
John {pizza, spaghetti}
Charlie {spaghetti, rice}
Lucy {rice, potatoes}
Beth {bread, cheese}
Trudy {rice, milk}
I would like to group all rows that have elements in common (even if it is through other people).
So in this case I would like to end up with:
{John,Charlie,Lucy,Trudy} {pizza,spaghetti,rice,potatoes,milk}
{Beth} {bread, cheese}
because John's preferences intersect with Charlie's, and Charlie's intersect with Lucy's and with Trudy's.
I already have an array_intersection function like this:
CREATE OR REPLACE FUNCTION array_intersection(anyarray, anyarray)
RETURNS anyarray
language sql
as $FUNCTION$
SELECT ARRAY(
SELECT UNNEST($1)
INTERSECT
SELECT UNNEST($2)
);
$FUNCTION$;
and I know the array_agg function for aggregating arrays, but how to turn those into the grouping I want is the step I am missing.
This is a typical task for recursion. You need an auxiliary function to merge and sort two arrays:
create or replace function public.array_merge(arr1 anyarray, arr2 anyarray)
returns anyarray
language sql immutable
as $function$
select array_agg(distinct elem order by elem)
from (
select unnest(arr1) elem
union
select unnest(arr2)
) s
$function$;
Use the function in the recursive query:
with recursive cte(name, preferences) as (
select *
from preferences
union
select p.name, array_merge(c.preferences, p.preferences)
from cte c
join preferences p
on c.preferences && p.preferences
and c.name <> p.name
)
select array_agg(name) as names, preferences
from (
select distinct on(name) *
from cte
order by name, cardinality(preferences) desc
) s
group by preferences;
names | preferences
---------------------------+--------------------------------------
{Charlie,John,Lucy,Trudy} | {milk,pizza,potatoes,rice,spaghetti}
{Beth} | {bread,cheese}
(2 rows)
I pass a 2D array to a procedure. This array contains multiple arrays of ids. I want to:
group a table by group_id;
for each group, iterate over the arrays in the 2D array;
if the group has all the ids in the current array, return it.
I read here about issues with 2d arrays:
postgres, contains-operator for multidimensional arrays performs flatten before comparing?
I think I'm nearly there, but I am unsure how to get around the problem. I understand why the following code produces the error "Subquery can only return one column", but I can't work out how to fix it:
DEALLOCATE my_proc;
PREPARE my_proc (bigint[][]) AS
WITH cte_arr AS (select $1 AS arr),
cte_s AS (select generate_subscripts(arr,1) AS subscript,
arr from cte_arr),
grouped AS (SELECT ufs.user_id, array_agg(entity_id)
FROM table_A AS ufs
GROUP BY ufs.user_id)
SELECT *
FROM grouped
WHERE (select arr[subscript:subscript] #> array_agg AS sub,
arr[subscript:subscript]
from cte_s);
EXECUTE my_proc(array[array[1, 2], array[1,3]]);
You can create a row for each group and each array in the parameter with a cross join:
PREPARE stmt (bigint[][]) AS
with grouped as
(
select user_id
, array_agg(entity_id) as user_groups
from table_A
group by
user_id
)
select user_id
, user_groups
, $1[subscript:subscript] as matches
from grouped
cross join
generate_subscripts($1, 1) as gen(subscript)
where user_groups @> $1[subscript:subscript]
;
Example at SQL Fiddle
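In case the fiddle is unavailable, here is hypothetical sample data of my own to try the statement locally (table and column names follow the question):

```sql
-- Hypothetical sample data: user 1 owns entities {1,2,3},
-- user 2 owns {1,3}.
create table table_A (user_id bigint, entity_id bigint);
insert into table_A values (1, 1), (1, 2), (1, 3), (2, 1), (2, 3);

-- user 1 should match both array[1,2] and array[1,3];
-- user 2 should match only array[1,3], since it lacks entity 2.
EXECUTE stmt (array[array[1, 2], array[1, 3]]);
```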