Postgres SQL function string_to_array - arrays

I have a table:
c1|c2|c3|c4
-----+--+--+----
a b c 10
a a b 20
c a c 10
b b c 10
c b c 30
I want to write a function where the inputs are 3 strings / text eg ('a b c, b d, c'), compare every element to each other, find if a row exist with this combination, an sum the number of the 4th (c4) column up. But if there is a constellation of b a c or c a b it would match a b c 10. If there is a row like b c c then it wont be a row like c b b. Every matchup is unique.
I think the best would be to use string_to_array(text, text).
I put together some pseudo code, but no idea how to write it in SQL. Maybe the logic is wrong too.
function (x,y,z)
res = 0
x_array = string_to_array(x, ' ')
y_array = string_to_array(y, ' ')
z_array = string_to_array(z, ' ')
foreach(x_item in x_array)
foreach(y_item in y_array)
foreach(z_item in z_array)
if (c1 = (x_item || y_item || z_item ) && c2 = (x_item || y_item || z_item ) && c3 = (x_item || y_item || z_item ))
res++
EDIT
First off all there was a mistake in the example table. There was a row a b c and c b a. It cant be. a b c = c b a ! and each row must be unique.
example: three text inputs a b c | b c | c
each element vs each element: a b c , a c c, b b c, b c c, c b c, c c c
a b c = 10;
a c c (is the same as c a c) = 10;
b b c = 10;
b c c (is the same as c b c) = 30;
c b c = 30;
c c c (no match) = 0; result = 90

I think this might be what you want:
Return the sum of column c4 from all rows where a given set of three tokens matches the columns (c1, c2, c3).
Simple version
Much simpler with contains #> and is contained <# by operators:
SELECT sum(c4) AS sum_of_matching_c4
FROM tbl
WHERE ARRAY[c1,c2,c3] <# ARRAY['b', 'a', 'c'] -- strings in arbitrary order
AND ARRAY[c1,c2,c3] #> ARRAY['b', 'a', 'c'];
Sorry, that would fail for ('b', 'c', 'c') vs. ('c', 'b', 'b').
Slow and sure
WITH i(arr) AS (
SELECT ARRAY(VALUES ('b'), ('c'), ('c') ORDER BY 1) -- input once
) -- in arbitrary order
SELECT sum(c4) AS sum_of_matching_c4
FROM (
SELECT c4, array_agg(x ORDER BY x) AS arr
FROM (
SELECT ctid, c4, unnest(ARRAY[c1,c2,c3]) AS x
FROM tbl t, i
WHERE ARRAY[c1,c2,c3] <# arr -- optional pre-selection
AND ARRAY[c1,c2,c3] #> arr -- for better performance?
) a
GROUP BY ctid, c4
) b
JOIN i USING (arr)
-> sqlfiddle demo.
The major difficulty is to order the values of the columns within the row.
For your input (3 strings) I achieve this in the WHERE clause with a VALUE expression in the CTE which I order right away and collect it in an array. I use a CTE for convenience, so we have to enter values in one place only.
It's more complicated for the row values. I put the three columns in an array and break that up to rows with unnest(). As you did not provide a primary key, I use the ctid as ad-hoc surrogate primary key instead - which I need for the GROUP BY to stuff the now sorted (c1, c2, c3) into an array.
Finally I sum up all c4 of rows where the now sorted arrays match exactly.
Note: I expressly do not use string_agg() because that does not produce distinct results. Consider:
'abc' 'cde' 'fgh'
'ab' 'ccdef' 'gh'
.. resulting int the same string if concatenated.
Index / Performance
You might consider to save pre-ordered data to speed up queries. Doing it on the fly is expensive. I.e. you could pre-generate the sorted array and save it as redundant column which you can then support with an index. Should be faster by several orders of magnitude for the cost of redundant data storage.
If you are dealing with long strings, a solution similar to what I outlined in this related answer on dba.SE might be the best course of action.
Alternatively (preferred!) guarantee that (c1, c2, c3) are always stored in ascending order. You could use a trigger BEFORE INSERT OR UPDATE to keep values within the row ordered. No redundant storage and you can simply create a multi-column index on the three columns and compare to them one by one (instead of comparing the array like in my example).

You don't need to write a function for that.
First, there's no "strings" with postgresql ( sql ) , it's "text" or "varchar".
Second, what you need is an SQL query like this:
SELECT ( DISTINCT ( c1 || c2 || c3 )) AS txtcol, SUM (c4) AS rowsum;
or
SELECT ( DISTINCT ( c1 || c2 || c3 )) AS txtcol, SUM(c4) AS numsum GROUP BY txtcol;
Can't recall the exact syntax at the moment, you need to work it out,
anyway the point is you need to concatenate varchar columns with some built-in
function like CONCAT or "||" operator, and then sum/group by numeric column. All you need
is to concatenate columns, and give resulting all-together column a name.
To be exact, you don't even need to show concatenated column on resulting table,
you could output just sums, and number of rows sumarized for example.
Theoretically you could write SQL function or PL/SQL function for that, but I'm sure it's just not necessary, your case seems to me simple enough to be able to achieve result you want without a function. Built-in sumarizing function SUM() is called "aggregate" function, other examples of aggregating functions are e.g. MIN() or MAX().
Note what you're actually trying to do, is grouping rows by some resulting VARCHAR column by the effect of concatenation per-row.
EDIT: "Arrays" in SQL or procedural SQL is some internally-handled arrays, do not confuse them with relations ( tables in database, nor with tables as SELECT results ). I think you also don't need SQL arrays for that, the task really isn't so hard as it looks like.

Related

convert letters to numbers or vice versa

TRADE_SIDE values are stored in DB with values 1 or 2.
On the other hand, SPOT_SIDE values are stored with equivalent A and B values in DB.
I need to find a way to compare these values in the where clause when querying the DB. 1 for A and 2 for B.
Do you have an idea?
Simple CASE EXPRESSION will do the trick :
SELECT * FROM trade_side t
INNER JOIN spot_side s
ON(CASE WHEN t.<YourColumn> = 1 THEN 'A' ELSE 'B' END = s.<YourColumn>)
This query will join both tables together on(1 = a,2 = b) . If you have more then 2 values, you should add another WHEN .

Union of two arrays in PostgreSQL without unnesting

I have two arrays in PostgreSQL that I need to union. For example:
{1,2,3} union {1,4,5} would return {1,2,3,4,5}
Using the concatenate (||) operator would not remove duplicate entries, i.e. it returns {1,2,3,1,4,5}
I found one solution on the web, however I do not like how it needs to unnest both arrays:
select ARRAY(select unnest(ARRAY[1,2,3]) as a UNION select unnest(ARRAY[2,3,4,5]) as a)
Is there an operator or built-in function that will cleanly union two arrays?
If your problem is to unnest twice this will unnest only once
select array_agg(a order by a)
from (
select distinct unnest(array[1,2,3] || array[2,3,4,5]) as a
) s;
There is a extension intarray (in contrib package) that contains some useful functions and operators:
postgres=# create extension intarray ;
CREATE EXTENSION
with single pipe operator:
postgres=# select array[1,2,3] | array[3,4,5];
?column?
─────────────
{1,2,3,4,5}
(1 row)
or with uniq function:
postgres=# select uniq(ARRAY[1,2,3] || ARRAY[3,4,5]);
uniq
─────────────
{1,2,3,4,5}
(1 row)
ANSI/SQL knows a multiset, but it is not supported by PostgreSQL yet.
Can be done like so...
select uniq(sort(array_remove(array_cat(ARRAY[1,2,3], ARRAY[1,4,5]), NULL)))
gives:
{1,2,3,4,5}
array_remove is needed because your can't sort arrays with NULLS.
Sort is needed because uniq de-duplicates only if adjacent elements are found.
A benefit of this approach over #Clodoaldo Neto's is that works entire within the select, and doesn't the unnest in the FROM clause. This makes it straightforward to operate on multiple arrays columns at the same time, and in a single table-scan. (Although see Ryan Guill version as a function in the comment).
Also, this pattern works for all array types (who's elements are sortable).
A downside is that, feasibly, its a little slower for longer arrays (due to the sort and the 3 intermediate array allocations).
I think both this and the accept answer fail if you want to keep NULL in the result.
The intarray-based answers don't work when you're trying to take the set union of an array-valued column from a group of rows. The accepted array_agg-based answer can be modified to work, e.g.
SELECT selector_column, array_agg(a ORDER BY a) AS array_valued_column
FROM (
SELECT DISTINCT selector_column, UNNEST(array_valued_column) AS a FROM table
) _ GROUP BY selector_column;
but, if this is buried deep in a complex query, the planner won't be able to push outer WHERE expressions past it, even when they would substantially reduce the number of rows that have to be processed. The right solution in that case is to define a custom aggregate:
CREATE FUNCTION array_union_step (s ANYARRAY, n ANYARRAY) RETURNS ANYARRAY
AS $$ SELECT s || n; $$
LANGUAGE SQL IMMUTABLE LEAKPROOF PARALLEL SAFE;
CREATE FUNCTION array_union_final (s ANYARRAY) RETURNS ANYARRAY
AS $$
SELECT array_agg(i ORDER BY i) FROM (
SELECT DISTINCT UNNEST(x) AS i FROM (VALUES(s)) AS v(x)
) AS w WHERE i IS NOT NULL;
$$
LANGUAGE SQL IMMUTABLE LEAKPROOF PARALLEL SAFE;
CREATE AGGREGATE array_union (ANYARRAY) (
SFUNC = array_union_step,
STYPE = ANYARRAY,
FINALFUNC = array_union_final,
INITCOND = '{}',
PARALLEL = SAFE
);
Usage is
SELECT selector_column, array_union(array_valued_column) AS array_valued_column
FROM table
GROUP BY selector_column;
It's doing the same thing "under the hood", but because it's packaged into an aggregate function, the planner can see through it.
It's possible that this could be made more efficient by having the step function do the UNNEST and append the rows to a temporary table, rather than a scratch array, but I don't know how to do that and this is good enough for my use case.

Compare arrays for equality, ignoring order of elements

I have a table with 4 array columns.. the results are like:
ids signed_ids new_ids new_ids_signed
{1,2,3} | {2,1,3} | {4,5,6} | {6,5,4}
Anyway to compare ids and signed_ids so that they come out equal, by ignoring the order of the elements?
You can use contained by operator:
(array1 <# array2 and array1 #> array2)
The additional module intarray provides operators for arrays of integer, which are typically (much) faster. Install once per database with (in Postgres 9.1 or later):
CREATE EXTENSION intarray;
Then you can:
SELECT uniq(sort(ids)) = uniq(sort(signed_ids));
Or:
SELECT ids #> signed_ids AND ids <# signed_ids;
Bold emphasis on functions and operators from intarray.
In the second example, operator resolution arrives at the specialized intarray operators if left and right argument are type integer[].
Both expressions will ignore order and duplicity of elements. Further reading in the helpful manual here.
intarray operators only work for arrays of integer (int4), not bigint (int8) or smallint (int2) or any other data type.
Unlike the default generic operators, intarray operators do not accept NULL values in arrays. NULL in any involved array raises an exception. If you need to work with NULL values, you can default to the standard, generic operators by schema-qualifying the operator with the OPERATOR construct:
SELECT ARRAY[1,4,null,3]::int[] OPERATOR(pg_catalog.#>) ARRAY[3,1]::int[]
The generic operators can't use indexes with an intarray operator class and vice versa.
Related:
GIN index on smallint[] column not used or error "operator is not unique"
The simplest thing to do is sort them and compare them sorted. See sorting arrays in PostgreSQL.
Given sample data:
CREATE TABLE aa(ids integer[], signed_ids integer[]);
INSERT INTO aa(ids, signed_ids) VALUES (ARRAY[1,2,3], ARRAY[2,1,3]);
the best thing to do is to if the array entries are always integers is to use the intarray extension, as Erwin explains in his answer. It's a lot faster than any pure-SQL formulation.
Otherwise, for a general version that works for any data type, define an array_sort(anyarray):
CREATE OR REPLACE FUNCTION array_sort(anyarray) RETURNS anyarray AS $$
SELECT array_agg(x order by x) FROM unnest($1) x;
$$ LANGUAGE 'SQL';
and use it sort and compare the sorted arrays:
SELECT array_sort(ids) = array_sort(signed_ids) FROM aa;
There's an important caveat:
SELECT array_sort( ARRAY[1,2,2,4,4] ) = array_sort( ARRAY[1,2,4] );
will be false. This may or may not be what you want, depending on your intentions.
Alternately, define a function array_compare_as_set:
CREATE OR REPLACE FUNCTION array_compare_as_set(anyarray,anyarray) RETURNS boolean AS $$
SELECT CASE
WHEN array_dims($1) <> array_dims($2) THEN
'f'
WHEN array_length($1,1) <> array_length($2,1) THEN
'f'
ELSE
NOT EXISTS (
SELECT 1
FROM unnest($1) a
FULL JOIN unnest($2) b ON (a=b)
WHERE a IS NULL or b IS NULL
)
END
$$ LANGUAGE 'SQL' IMMUTABLE;
and then:
SELECT array_compare_as_set(ids, signed_ids) FROM aa;
This is subtly different from comparing two array_sorted values. array_compare_as_set will eliminate duplicates, making array_compare_as_set(ARRAY[1,2,3,3],ARRAY[1,2,3]) true, whereas array_sort(ARRAY[1,2,3,3]) = array_sort(ARRAY[1,2,3]) will be false.
Both of these approaches will have pretty bad performance. Consider ensuring that you always store your arrays sorted in the first place.
If your arrays have no duplicates and are of the same dimension:
use array contains #>
AND array_length where the length must match the size you want on both sides
select (string_agg(a,',' order by a) = string_agg(b,',' order by b)) from
(select unnest(array[1,2,3,2])::text as a,unnest(array[2,2,3,1])::text as b) A

SQL case in select query

I have following query
Select
a,b,c,
case totalcount
when 0 then 0
else abcd/totalcount
end AS 'd',
case totalcount
when 0 then 0
else defg/totalcount
end AS 'e'
From
Table1
In this query, I have same case statement in select query...Can i make it into single select statement.
"totalcount" is some value... abcd and defg are two column values from table1. If totalcount is zero, i dont want to divide, else I wish to divide to get an average value of abcd and defg.
No, since you seem to be requiring two output columns in your desired resultset... Each Case statement can generate at most one output value.
It doesn't matter how many columns the case statement needs to access to do it's job, or how many different values it has to select from (how many When... Then... expressions), it just matters how many columns you are trying to generate in the final resultset.
At first view, I would say no, since you seem to use the exact same condition leading to different results.
Besides, this looks odd to me, why would you need two SELECT CASE for only one condition? This makes no sense.
Could you be more specific or give a real-world example of what you're trying to ask, with "real" data so that we might better answer your question?
EDIT #1
Given that:
Yes...it's two different fields....and the valud i need is calculated one...which is Value = a/b. But i need to check if b is zero or not
I would still answer no, as if they are two different fields, and you want both results in your result set, then you will need to write two CASE statements, one for each of the fields you need to verify whether it is zero-valued or not. Keep in mind that one CASE statement is equivalent to one single column only. So, if you need to check for a zero value, you are required to check for both independently.
By the way, sorry for my misunderstanding, that must be a language barrier, unfortunately. =) But my requirement for "real" data example holds, since this might light us all up with some other related solutions, we never know! =)
EDIT #2
Considering you have to check for a value of 0, I would perhaps rewrite my code as follows, for the sake of readability:
select a, b, c
, case when condition1 <> 0 then abcd / condition1 else 0 end as Result1
, case when condition2 <> 0 then defg / condition2 else 0 end as Result2
from Table1
In my opinion, it is leaner and swifter in the code, and it is easier to understand what is the intention. But after all, the result would be the same! No offense made here! =)
EDIT #3
After having read your question edit:
"totalcount" is some value... abcd and defg are two column values from table1. If totalcount is zero, i dont want to divide, else I wish to divide to get an average value of abcd and defg.
I say, if it is the average that you're after, why not just use the AVG function which will deal with it internaly, without having to care about zero values?
select a, b, c
, AVG(abcd) as Result1
, AVG(defg) as Result2
from Table1
Plus, considering having no record, the average can only be 0!
I don't know about your database engine, but if it is Oracle or SQL Server, and I think DB2 as well, they support this function.
You need to elaborate more - what is the condition? With what you have entered so far, it looks like you are just using the result of the expression, making the CASE statement redundant.
For example, see my inline comments:
SELECT a,b,c,
case condition
when 0 then 0 --this is unnecessary
else abcd/condition, --this is just using the result of the condition?
case condition
when 0 then 0 --this is unnecessary
else defg/condition --this is just using the result of the condition?
from Table1
so you could just refactor this as:
SELECT a
,b
,c
,expression
,expression
from Table1
if you actually meant the else abcd/condition as evaluate another condition or use a default value then you need to expand on that so we can answer accurately.
EDIT: if you are looking to avoid a divide by zero and you only want to evaluate condition once, then you need to do it outside of the SELECT and use the variable inside the SELECT. I wouldn't be concerned about the performance impact of evaluating it multiple times, if the condition doesn't change then its value should be cached by the query execution engine. Although it could look ugly and unmaintainable, in which case you could possibly factor condition into a function.
If all you're trying to do is avoid dividing by zero, and if it's not critical that an attempt to divide by zero actually produces zero, then you can do the following:
Select
a,b,c,
abcd / NULLIF(totalcount,0) AS 'd',
defg / NULLIF(totalcount,0) AS 'e'
From
Table1
This transforms an attempt to divide by zero into an attempt to divide by NULL, which always produces NULL.
If you need zero and you know that abcd and defg will never be NULL, then you can wrap the whole thing in an ISNULL(...,0) to force it to be zero. If abcd or defg might legitimately be null and you need the zero output if totalcount is zero, then I don't know any other option than using a CASE statement, sorry.
Moreover, since it appears totalcount is an expression, for maintainability you could compute it in a subquery, such as:
Select
a,b,c,
abcd / NULLIF(totalcount,0) AS 'd',
defg / NULLIF(totalcount,0) AS 'e'
From
(
Select
a,b,c,
abcd,
defg,
(... expression goes here ...) AS totalcount
From
Table1
) Data
But if it's a simple expression this may be overkill.
Technically this is possible. I'm not suggesting it's a good idea though!
;with Table1 as
(
select 1 as a, 2 as b, 3 as c, 0 as condition, 1234 as abcd, 4567 AS defg UNION ALL
select 1 as a, 2 as b, 3 as c, 1 as condition, 1234.0 as abcd, 4567 AS defg
)
,
cte AS
(
Select
a,b,c,
case condition
when 0 then CAST(CAST(0 as decimal(18, 6)) as binary(9)) + CAST(CAST(0 as decimal(18, 6)) as binary(9))
else CAST(CAST(abcd/condition as decimal(18, 6)) as binary(9)) + CAST(CAST(defg/condition as decimal(18, 6)) as binary(9))
end AS d
From
Table1
)
Select
a,
b,
c,
cast(substring(d,1,9) as decimal(18, 6)) as d,
cast(substring(d,10,9) as decimal(18, 6)) as e
From cte

"if, then, else" in SQLite

Without using custom functions, is it possible in SQLite to do the following. I have two tables, which are linked via common id numbers. In the second table, there are two variables. What I would like to do is be able to return a list of results, consisting of: the row id, and NULL if all instances of those two variables (and there may be more than two) are NULL, 1 if they are all 0 and 2 if one or more is 1.
What I have right now is as follows:
SELECT
a.aid,
(SELECT count(*) from W3S19 b WHERE a.aid=b.aid) as num,
(SELECT count(*) FROM W3S19 c WHERE a.aid=c.aid AND H110 IS NULL AND H112 IS NULL) as num_null,
(SELECT count(*) FROM W3S19 d WHERE a.aid=d.aid AND (H110=1 or H112=1)) AS num_yes
FROM W3 a
So what this requires is to step through each result as follows (rough Python pseudocode):
if row['num_yes'] > 0:
out[aid] = 2
elif row['num_null'] == row['num']:
out[aid] = 'NULL'
else:
out[aid] = 1
Is there an easier way? Thanks!
Use CASE...WHEN, e.g.
CASE x WHEN w1 THEN r1 WHEN w2 THEN r2 ELSE r3 END
Read more from SQLite syntax manual (go to section "The CASE expression").
There's another way, for numeric values, which might be easier for certain specific cases.
It's based on the fact that boolean values is 1 or 0, "if condition" gives a boolean result:
(this will work only for "or" condition, depends on the usage)
SELECT (w1=TRUE)*r1 + (w2=TRUE)*r2 + ...
of course #evan's answer is the general-purpose, correct answer

Resources