I am trying to sum the contents of a postgresql array and also determine its length. In all cases, I am working with columns that are of integer array type.
ERROR: function sum(integer[]) does not exist
LINE 1: select sum(interested) from deals;
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
db5=> select SUM(interested) from deals;
ERROR: function sum(integer[]) does not exist
LINE 1: select SUM(interested) from deals;
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
db5=> select array_length(interested) from deals;
ERROR: function array_length(integer[]) does not exist
LINE 1: select array_length(interested) from deals;
^
HINT: No function matches the given name and argument types. You might need to add explicit type casts.
db5=>
Contrary to what I have read (Select sum of an array column in PostgreSQL, http://www.postgresql.org/docs/8.4/static/functions-array.html) sum(column) and array_length(column, 1) do not seem to perform as expected for me. Am I looking at the wrong docs? How can I get the sum and length of a postgre integer array?
Also, these are the php queries I am using along with pg_query to run these calls:
$query = "UPDATE deals set remaining = max - array_length(interested, 1), until = min -array_length(interested, 1";
$query = "UPDATE deals set remaining = max - (SELECT SUM FROM UNNEST(hours)), until = min - (SELECT SUM FROM UNNEST(hours))";
I modified them to take Patrick's suggestions from the comments into account, but I still get a syntax error, e.g.
Warning: pg_query(): Query failed: ERROR: syntax error at end of input LINE 1: ...ngth(interested, 1), until = min -array_length(interested, 1 ^ in /var/www/html/join.php on line 162
Thank you.
In the update statement you just forgot a bracket at end of the statement (see comment).
In the second update statement, you cannot use sum in UPDATE, because aggregate functions are not allowed in update.
Use a custom sql function which sums all elements in a array and use it in the update:
CREATE FUNCTION array_sum(NUMERIC[]) returns numeric AS
$$
SELECT sum(unnest) FROM (select unnest($1)) as foo;
$$
LANGUAGE sql;
The update statement:
UPDATE deals SET remaining = max - array_sum(hours), until = min - array_sum(hours);
Related
Goal: write a function in PostgreSQL SQL that takes as input an integer array whose each element is either 0, 1, or -1 and returns an array of the same length, where each element of the output array is the sum of all adjacent nonzero values in the input array having the same or lower index.
Example, this input:
{0,1,1,1,1,0,-1,-1,0}
should produce this result:
{0,1,2,3,4,0,-1,-2,0}
Here is my attempt at such a function:
CREATE FUNCTION runs(input int[], output int[] DEFAULT '{}')
RETURNS int[] AS $$
SELECT
CASE WHEN cardinality(input) = 0 THEN output
ELSE runs(input[2:],
array_append(output, CASE
WHEN input[1] = 0 THEN 0
ELSE output[cardinality(output)] + input[1]
END)
)
END
$$ LANGUAGE SQL;
Which gives unexpected (to me) output:
# select runs('{0,1,1,1,1,0,-1,-1,-1,0}');
runs
----------------------------------------
{0,1,2,3,4,5,6,0,0,0,-1,-2,-3,-4,-5,0}
(1 row)
I'm using PostgreSQL 14.4. While I am ignorant of why there are more elements in the output array than the input, the cardinality() in the recursive call seems to be causing it, as also does using array_length() or array_upper() in the same place.
Question: how can I write a function that gives me the output I want (and why is the function I wrote failing to do that)?
Bonus extra: For context, this input array is coming from array_agg() invoked on a table column and the output will go back into a table using unnest(). I'm converting to/from an array since I see no way to do it directly on the table, in particular because WITH RECURSIVE forbids references to the recursive table in either an outer join or subquery. But if there's a way around using arrays (especially with a lack of tail-recursion optimization) that will answer the general question (But I am still very very curious why I'm seeing the extra elements in the output array).
Everything indicates that you have found a reportable Postgres bug. The function should work properly, and a slight modification unexpectedly changes its behavior. Add SELECT; right after $$ to get the function to run as expected, see Db<>fiddle.
A good alternative to a recursive solution is a simple iterative function. Handling arrays in PL/pgSQL is typically simpler and faster than recursion.
create or replace function loop_function(input int[])
returns int[] language plpgsql as $$
declare
val int;
tot int = 0;
res int[];
begin
foreach val in array input loop
if val = 0 then tot = 0;
else tot := tot + val;
end if;
res := res || tot;
end loop;
return res;
end $$;
Test it in Db<>fiddle.
The OP wrote:
this input array is coming from array_agg() invoked on a table column and the output will go back into a table using unnest().
You can calculate these cumulative sums directly in the table with the help of window functions.
select id, val, sum(val) over w
from (
select
id,
val,
case val
when 0 then 0
else sum((val = 0)::int) over w
end as series
from my_table
window w as (order by id)
) t
window w as (partition by series order by id)
order by id
Test it in Db<>fiddle.
I have a Postgres function that accepts a text[] as input. For example
create function temp1(player_ids text[])
returns void
language plpgsql
as
$$
begin
update players set player_xp = 0
where id in (player_ids);
-- the body is actually 20 lines long, updating a lot of tables
end;
$$;
and I'm trying to call it, but I keep getting
[42883] ERROR: operator does not exist: text = text[] Hint: No operator matches the given name and argument types. You might need to add explicit type casts. Where: PL/pgSQL function temp1(text[]) line 3 at SQL statement
I have tried these so far
select temp1('{F7AWLJWYQ5BMPKGXLMDNQKQ4NY,AQPBAFKQONGLBKIMCSOD747GY4}');
select temp1('{F7AWLJWYQ5BMPKGXLMDNQKQ4NY,AQPBAFKQONGLBKIMCSOD747GY4}'::text[]);
select temp1(array['F7AWLJWYQ5BMPKGXLMDNQKQ4NY,AQPBAFKQONGLBKIMCSOD747GY4']);
select temp1(array['F7AWLJWYQ5BMPKGXLMDNQKQ4NY,AQPBAFKQONGLBKIMCSOD747GY4']::text[]);
I have to be missing something obvious...how do I call this function with an array literal?
Use = any instead of in:
...
update players set player_xp = 0
where id = any(player_ids);
...
The IN operator acts on an explicit list of values.
expression IN (value [, ...])
When you want to compare a value to each element of an array, use ANY instead.
expression operator ANY (array expression)
Note that there are variants of both constructs for subqueries expression IN (subquery) and expression operator ANY (subquery). The first one was properly used in the other answer though a subquery seems excessive in this case.
You can use unnest function, this function is very easy and same time best performanced. Unnest using for converting array elements to rows. Example:
create function temp1(player_ids text[])
returns void
language plpgsql
as
$$
begin
update players set player_xp = 0
where id in (select pl.id from unnest(player_ids) as pl(id));
-- the body is actually 20 lines long, updating a lot of tables
end;
$$;
And you can easily cast array elements to another type for using unnest.
Example:
update players set player_xp = 0
where id in (select pl.id::integer from unnest(player_ids) as pl(id));
I've encountered an interesting SQL server behaviour while trying to generate random values in T-sql using RAND and CHOOSE functions.
My goal was to try to return one of two given values using RAND() as rng. Pretty easy right?
For those of you who don't know it, CHOOSE function accepts in an index number(int) along with a collection of values and returns a value at specified index. Pretty straightforward.
At first attempt my SQL looked like this:
select choose(ceiling((rand()*2)) ,'a','b')
To my surprise, this expression returned one of three values: null, 'a' or 'b'. Since I didn't expect the null value i started digging. RAND() function returns a float in range from 0(included) to 1 (excluded). Since I'm multiplying it by 2, it should return values anywhere in range from 0(included) to 2 (excluded). Therefore after use of CEILING function final value should be one of: 0,1,2. After realising that i extended the value list by 'c' to check whether that'd be perhaps returned. I also checked the docs page of CEILING and learnt that:
Return values have the same type as numeric_expression.
I assumed the CEILINGfunction returned int, but in this case would mean that the value is implicitly cast to int before being used in CHOOSE, which sure enough is stated on the docs page:
If the provided index value has a numeric data type other than int,
then the value is implicitly converted to an integer.
Just in case I added an explicit cast. My SQL query looks like this now:
select choose(cast(ceiling((rand()*2)) as int) ,'a','b','c')
However, the result set didn't change. To check which values cause the problem I tried generating the value beforehand and selecting it alongside the CHOOSE result. It looked like this:
declare #int int = cast(ceiling((rand()*2)) as int)
select #int,choose( #int,'a','b','c')
Interestingly enough, now the result set changed to (1,a), (2,b) which was my original goal. After delving deeper in the CHOOSE docs page and some testing i learned that 'null' is returned in one of two cases:
Given index is a null
Given index is out of range
In this case that would mean that index value when generated inside the SELECT statement is either 0 or above 2/3 (I'm assuming that negative numbers are not possible here and CHOOSE function indexes from 1). As I've stated before 0 should be one of possibilities of:
ceiling((rand()*2))
,but for some reason it's never 0 (at least when i tried it 1 million+ times like this)
set nocount on
declare #test table(ceiling_rand int)
declare #counter int = 0
while #counter<1000000
begin
insert into #test
select ceiling((rand()*2))
set #counter=#counter+1
end
select distinct ceiling_rand from #test
Therefore I assume that the value generated in SELECT is greater than 2/3 or NULL. Why would it be like this only when generated in SELECT statement? Perhaps order of resolving CAST, CELING or RAND inside SELECT is different than it would seem? It's true I've only tried it a limited number of times, but at this point the chances of it being a statistical fluctuation are extremely small. Is it somehow a floating-point error? I truly am stumbled and looking forward to any explanation.
TL;DR: When generating a random number inside a SELECT statement result set of possible values is different then when it's generated before the SELECT statement.
Cheers,
NFSU
EDIT: Formatting
You can see what's going on if you look at the execution plan.
SET SHOWPLAN_TEXT ON
GO
SELECT (select choose(ceiling((rand()*2)) ,'a','b'))
Returns
|--Constant Scan(VALUES:((CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(2.0000000000000000e+000)),0)=(1) THEN 'a' ELSE CASE WHEN CONVERT_IMPLICIT(int,ceiling(rand()*(2.0000000000000000e+000)),0)=(2) THEN 'b' ELSE NULL END END)))
The CHOOSE is expanded out to
SELECT CASE
WHEN ceiling(( rand() * 2 )) = 1 THEN 'a'
ELSE
CASE
WHEN ceiling(( rand() * 2 )) = 2 THEN 'b'
ELSE NULL
END
END
and rand() is referenced twice. Each evaluation can return a different result.
You will get the same problem with the below rewrite being expanded out too
SELECT CASE ceiling(( rand() * 2 ))
WHEN 1 THEN 'a'
WHEN 2 THEN 'b'
END
Avoid CASE for this and any of its variants.
One method would be
SELECT JSON_VALUE ( '["a", "b"]' , CONCAT('$[', FLOOR(rand()*2) ,']') )
I am looking to retrieve a value of the profit (FilmBoxOfficeDollar - FilmBudgetDollars) based on the Studio given as a parameter to the function.
USE Movies;
GO
CREATE FUNCTION fnmovieProfits(#StudioName nvarchar(255))
RETURNS int
AS
BEGIN
RETURN (SELECT SUM(FilmBoxOfficeDollars - FilmBudgetDollars)
FROM Film JOIN Studio
ON Film.FilmStudioID = Studio.StudioID
WHERE StudioName = #StudioName);
END;
GO
SELECT [dbo].[fnmovieProfits]('Dreamworks');
Whenever I run this through to pull the piece of data I get the following error:
Msg 8115, Level 16, State 2, Line 13
Arithmetic overflow error converting expression to data type int.
Any help would be much appreciated!
The problem you are experiencing is that you are overflowing the allowed value of a 32 bit number (INT); if you cast/convert to a 64 bit number (BIGINT) and return that datatype, the issue will be corrected. Proof of concept showing the issue:
DECLARE #BigNumber INT=2000000000
select CONVERT(BIGINT,#BigNumber) + CONVERT(BIGINT,#BigNumber) --returns 4,000,000,000
select (#BigNumber + #BigNumber) --errors with "Arithmetic overflow error converting expression to data type int."
BUT, do yourself a favor and use a view instead. Scalars like that are terrible for performance in reports. Scalar functions should never be used unless they are simply doing calculations based on input values (i.e. not hitting underlying, persisted data).
CREATE VIEW dbo.v_StudioProfits
AS
SELECT
StudioName,
SUM(CONVERT(BIGINT,FilmBoxOfficeDollars) - CONVERT(BIGINT,FilmBudgetDollars)) AS [Profit]
FROM Film
INNER JOIN Studio ON Film.FilmStudioID = Studio.StudioID
GROUP BY StudioName
GO
SELECT * FROM dbo.v_StudioProfits WHERE StudioName='Dreamworks'
Relevant reading on SQL Server datatypes. Specifically, integer datatypes.
Your sum is exceeding int range. You should define your return type as bigint:
CREATE FUNCTION fnmovieProfits(#StudioName nvarchar(255))
RETURNS bigint
AS.......
The maximum value you can return with int as a return type is 2147483647. Your sum is probably bigger than that.
One example of function that exceeds its return type:
CREATE FUNCTION testFunction()
RETURNS int
AS
BEGIN
RETURN (SELECT 2147483647 + 1);
END;
GO
SELECT [dbo].[testFunction]();
If you execute it you will get the following error:
Msg 8115, Level 16, State 2, Line 8
Arithmetic overflow error converting expression to data type int.
So the solution is just to increase your return type range by replacing int with bigint.
Is the only way to pass an extra parameter to the final function of a PostgreSQL aggregate to create a special TYPE for the state value?
e.g.:
CREATE TYPE geomvaltext AS (
geom public.geometry,
val double precision,
txt text
);
And then to use this type as the state variable so that the third parameter (text) finally reaches the final function?
Why aggregates can't pass extra parameters to the final function themselves? Any implementation reason?
So we could easily construct, for example, aggregates taking a method:
SELECT ST_MyAgg(accum_number, 'COMPUTE_METHOD') FROM blablabla
Thanks
You can define an aggregate with more than one parameter.
I don't know if that solves your problem, but you could use it like this:
CREATE OR REPLACE FUNCTION myaggsfunc(integer, integer, text) RETURNS integer
IMMUTABLE STRICT LANGUAGE sql AS
$f$
SELECT CASE $3
WHEN '+' THEN $1 + $2
WHEN '*' THEN $1 * $2
ELSE NULL
END
$f$;
CREATE AGGREGATE myagg(integer, text) (
SFUNC = myaggsfunc(integer, integer, text),
STYPE = integer
);
It could be used like this:
CREATE TABLE mytab
AS SELECT * FROM generate_series(1, 10) i;
SELECT myagg(i, '+') FROM mytab;
myagg
-------
55
(1 row)
SELECT myagg(i, '*') FROM mytab;
myagg
---------
3628800
(1 row)
I solved a similar issue by making a custom aggregate function that did all the operations at once and stored their states in an array.
CREATE AGGREGATE myagg(integer)
(
INITCOND = '{ 0, 1 }',
STYPE = integer[],
SFUNC = myaggsfunc
);
and:
CREATE OR REPLACE FUNCTION myaggsfunc(agg_state integer[], agg_next integer)
RETURNS integer[] IMMUTABLE STRICT LANGUAGE 'plpgsql' AS $$
BEGIN
agg_state[1] := agg_state[1] + agg_next;
agg_state[2] := agg_state[2] * agg_next;
RETURN agg_state;
END;
$$;
Then made another function that selected one of the results based on the second argument:
CREATE OR REPLACE FUNCTION myagg_pick(agg_state integer[], agg_fn character varying)
RETURNS integer IMMUTABLE STRICT LANGUAGE 'plpgsql' AS $$
BEGIN
CASE agg_fn
WHEN '+' THEN RETURN agg_state[1];
WHEN '*' THEN RETURN agg_state[2];
ELSE RETURN 0;
END CASE;
END;
$$;
Usage:
SELECT myagg_pick(myagg("accum_number"), 'COMPUTE_METHOD') FROM "mytable" GROUP BY ...
Obvious downside of this is the overhead of performing all the functions instead of just one. However when dealing with simple operations such as adding, multiplying etc. it should be acceptable in most cases.
You would have to rewrite the final function itself, and in that case you might as well write a set of new aggregate functions, one for each possible COMPUTE_METHOD. If the COMPUTE_METHOD is a data value or implied by a data value, then a CASE statement can be used to select the appropriate aggregate method. Alternatively, you may want to create a custom composite type with fields for accum_number and COMPUTE_METHOD, and write a single new aggregate function that uses this new data type.