Database Design for 2D Matrix Algebra

Can anyone advise on a database design/DBMS for storing 2D time-series matrix data, to allow for quick back-end algebraic calculations? E.g.:
Tables A, B, C...
Col 1: Date (timestamp)
Col 2: Data (array? matrix data)
SQL pseudo-code:
INSERT INTO table_c
SELECT multiply(a.data, b.data)
FROM table_a a, table_b b
WHERE a.start_date = b.start_date
AND a.end_date = b.end_date
Essentially, the dates set the coordinates for the calculation.

The difficulty with matrix algebra is determining what constitutes a domain for data modelling purposes. Is it a single value? Is it the matrix as a whole? There is no pre-defined answer, so I will give you two solutions and their tradeoffs.
Solution 1: Value in a matrix cell is a domain:
CREATE TABLE matrix_info (
    x_size int,
    y_size int,
    id serial NOT NULL UNIQUE,
    ts timestamp NOT NULL
);
CREATE TABLE matrix_cell (
    matrix_id int REFERENCES matrix_info(id),
    x int,
    y int,
    value numeric NOT NULL,
    PRIMARY KEY (matrix_id, x, y)
);
The big concern is that this does not enforce matrix dimensions very well. Additionally, a missing cell could be used to represent 0, or might not be allowed at all. The idea of using the matrix as a whole as the domain has some appeal. In that case:
CREATE TABLE matrix (
    id serial NOT NULL UNIQUE,
    ts timestamp NOT NULL,
    matrix_data numeric[]
);
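For example, a 2x2 matrix for a given timestamp would be stored as a two-dimensional array literal (illustrative values):
INSERT INTO matrix (ts, matrix_data)
VALUES (now(), '{{1,2},{3,4}}');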
Note that PostgreSQL (like some other databases) enforces that a multi-dimensional array is rectangular, i.e. an actual matrix. You would then need to write your own functions for multiplication etc. I would recommend doing this in an object-relational way, on PostgreSQL, since it is quite programmable for this sort of thing. Something like:
CREATE FUNCTION matrix(int) RETURNS matrix LANGUAGE sql AS
$$ SELECT * FROM matrix WHERE id = $1 $$;
CREATE FUNCTION multiply(matrix, matrix) RETURNS matrix LANGUAGE plpgsql AS
$$
DECLARE
    matrix1 numeric[] := $1.matrix_data;
    matrix2 numeric[] := $2.matrix_data;
BEGIN
    ...
END;
$$;
Then you can call the matrix multiplication as:
SELECT * FROM multiply(matrix(1), matrix(2));
You could even insert into the table the product of two other matrices:
INSERT INTO matrix (matrix_data)
SELECT matrix_data FROM multiply(matrix(1), matrix(2));
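The elided body is where the actual work happens. Below is a minimal sketch of a naive multiplication written as a standalone helper over the numeric[] payload; the function name and error handling are illustrative, and it assumes 1-based, two-dimensional arrays:
CREATE FUNCTION matrix_multiply(a numeric[], b numeric[])
RETURNS numeric[] LANGUAGE plpgsql IMMUTABLE AS
$$
DECLARE
    nrows  int := array_length(a, 1);
    ninner int := array_length(a, 2);
    ncols  int := array_length(b, 2);
    -- pre-size the result so we can assign by subscript
    result numeric[] := array_fill(0::numeric, ARRAY[nrows, ncols]);
    s numeric;
BEGIN
    IF array_length(b, 1) <> ninner THEN
        RAISE EXCEPTION 'dimension mismatch';
    END IF;
    FOR i IN 1..nrows LOOP
        FOR j IN 1..ncols LOOP
            s := 0;
            FOR k IN 1..ninner LOOP
                s := s + a[i][k] * b[k][j];
            END LOOP;
            result[i][j] := s;
        END LOOP;
    END LOOP;
    RETURN result;
END;
$$;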

Related

Call set-returning plpgsql function for each row returned from a query

In my Postgres 9.6 database I have the following custom domain and table definition:
create domain lowResData as
float[21];
create table myRawValues (
id text,
myData lowResData,
xAxis lowResData,
primary key(id)
);
The following functions are able to produce the result I want for a single item.
create function getData(_id text) returns float[] as $$
select myData
from myRawValues
where id = _id
$$ language sql;
create function getAxis(_id text) returns float[] as $$
select xAxis
from myRawValues
where id = _id
$$ language sql;
create function myPlotter(myarray float[], myData float[])
returns table (frequency float, amplitude float) as
$$
select *
from unnest(myarray, myData) as u;
$$ language sql;
select * from myPlotter(getAxis('123'), getData('123'));
I want to do the same for all ids produced from executing a particular query, ending up with something like this:
create or replace function allIdLowResData() returns setof float[] as
$body$
declare r text;
begin
for r in (select id from myRawValues where /*SOME CONDITION*/)
loop
return next myPlotter(getAxis(r), getData(r));
end loop;
return;
end
$body$
language plpgsql;
Use a LATERAL join to combine your set-returning function with the rest of the query. Like:
CREATE OR REPLACE FUNCTION allIdLowResData()
RETURNS TABLE (frequency float, amplitude float, id text) AS
$func$
SELECT p.*, r.id
FROM myRawValues r
LEFT JOIN LATERAL myPlotter(r.xAxis, r.myData) p ON true
WHERE /*SOME CONDITION*/
$func$ LANGUAGE sql;
See:
What is the difference between LATERAL and a subquery in PostgreSQL?
Plus, the declared return type of the function (RETURNS) must match what's actually returned.
I'm using a simpler SQL function here. You can do the same with PL/pgSQL; use RETURN QUERY in that case, as sketched below.
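A hedged PL/pgSQL equivalent (the WHERE clause remains a placeholder):
CREATE OR REPLACE FUNCTION allIdLowResData()
RETURNS TABLE (frequency float, amplitude float, id text) AS
$func$
BEGIN
   RETURN QUERY
   SELECT p.*, r.id
   FROM   myRawValues r
   LEFT   JOIN LATERAL myPlotter(r.xAxis, r.myData) p ON true
   WHERE  true;  -- SOME CONDITION
END
$func$ LANGUAGE plpgsql;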
You might be interested in these details about Postgres array definitions, quoted from the manual:
However, the current implementation ignores any supplied array size
limits, i.e., the behavior is the same as for arrays of unspecified
length.
The current implementation does not enforce the declared number of
dimensions either. Arrays of a particular element type are all
considered to be of the same type, regardless of size or number of
dimensions. So, declaring the array size or number of dimensions in
CREATE TABLE is simply documentation; it does not affect run-time behavior.
Meaning, your domain is currently noise without any effect (aside from complications). To actually enforce 1-dimensional arrays with exactly 21 elements in your table, use a CHECK constraint. Like:
CREATE DOMAIN lowResData AS float[21] -- "[21]" is just for documentation
CONSTRAINT dim1_elem21 CHECK (array_ndims(VALUE) = 1 AND array_length(VALUE, 1) = 21);
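With that in place, a value of the wrong shape is rejected (hypothetical data):
-- fails the dim1_elem21 CHECK: 3 elements instead of 21
INSERT INTO myRawValues VALUES ('456', '{1,2,3}', '{1,2,3}');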
I would also ditch the functions getData() and getAxis() unless there is more to them.

PostgreSQL PL/pgSQL random value from array of values

How can I declare an array-like variable with two or three values and pick one of them at random during execution?
a := [1, 2, 5] -- sample sake
select random(a) -- returns random value
Any suggestion where to start?
Try this one:
select (array['Yes', 'No', 'Maybe'])[floor(random() * 3 + 1)];
Updated 2023-01-10 to fix the broken array literal. Made it several times faster while at it:
CREATE OR REPLACE FUNCTION random_pick()
RETURNS int
LANGUAGE sql VOLATILE PARALLEL SAFE AS
$func$
SELECT ('[0:2]={1,2,5}'::int[])[trunc(random() * 3)::int];
$func$;
random() returns a value x where 0.0 <= x < 1.0. Multiply by 3 and truncate with trunc() (slightly faster than floor()) to get 0, 1, or 2 with exactly equal probability.
Postgres array subscripts are 1-based by default (as per the SQL standard), so this would be off by one. We could add 1 every time, but for efficiency I declare the array subscript to start at 0 instead. Slightly faster, still. See:
Normalize array subscripts so they start with 1
The manual on mathematical functions.
PARALLEL SAFE for Postgres 9.6 or later. See:
PARALLEL label for a function with SELECT and INSERT
When to mark functions as PARALLEL RESTRICTED vs PARALLEL SAFE?
You can use the plain SELECT statement if you don't want to create a function:
SELECT ('[0:2]={1,2,5}'::int[])[trunc(random() * 3)::int];
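A throwaway query to sanity-check the roughly even distribution:
SELECT random_pick() AS val, count(*)
FROM generate_series(1, 3000)
GROUP BY 1
ORDER BY 1;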
Erwin Brandstetter answered the OP's question well enough. However, for others looking to understand how to randomly pick elements from more complex arrays (like me some two months ago), I expanded his function:
CREATE OR REPLACE FUNCTION random_pick(a anyarray, OUT x anyelement)
RETURNS anyelement AS
$func$
BEGIN
    IF a = '{}' THEN
        x := NULL::TEXT;
    ELSE
        WHILE x IS NULL LOOP
            x := a[floor(array_lower(a, 1) + random() * (array_upper(a, 1) - array_lower(a, 1) + 1))::int];
        END LOOP;
    END IF;
END
$func$ LANGUAGE plpgsql VOLATILE RETURNS NULL ON NULL INPUT;
A few assumptions:
this is not only for integer arrays, but for arrays of any type
we ignore NULL data; NULL is returned only if the array is empty or if NULL is passed in (values of other non-array types produce an error)
the array doesn't need to be formatted as usual: the array index may start and end anywhere, may have gaps, etc.
this is for one-dimensional arrays
Other notes:
without the first IF statement, an empty array would lead to an endless loop
without the loop, gaps and NULLs would make the function return NULL
omit both array_lower calls if you know that your arrays start at zero
with gaps in the index, you will need array_upper instead of array_length; without gaps they are the same (not sure which is faster, but they shouldn't differ much)
the +1 after the second array_lower ensures the last value in the array is picked with the same probability as any other; otherwise it would require random() to output exactly 1, which never happens
this is considerably slower than Erwin's solution, and likely overkill for your needs; in practice, most people would mix an ideal cocktail from the two
Here is another way to do the same thing:
WITH arr AS (
SELECT '{1, 2, 5}'::INT[] a
)
SELECT a[1 + floor((random() * array_length(a, 1)))::int] FROM arr;
You can change the array to any type you would like.
CREATE OR REPLACE FUNCTION pick_random( members anyarray )
RETURNS anyelement AS
$$
BEGIN
RETURN members[trunc(random() * array_length(members, 1) + 1)];
END
$$ LANGUAGE plpgsql VOLATILE;
or
CREATE OR REPLACE FUNCTION pick_random( members anyarray )
RETURNS anyelement AS
$$
SELECT (array_agg(m1 order by random()))[1]
FROM unnest(members) m1;
$$ LANGUAGE SQL VOLATILE;
For bigger datasets, see:
http://blog.rhodiumtoad.org.uk/2009/03/08/selecting-random-rows-from-a-table/
http://www.depesz.com/2007/09/16/my-thoughts-on-getting-random-row/
https://blog.2ndquadrant.com/tablesample-and-other-methods-for-getting-random-tuples/
https://www.postgresql.org/docs/current/static/functions-math.html
CREATE FUNCTION random_pick(p_items anyarray)
RETURNS anyelement AS
$$
SELECT unnest(p_items) ORDER BY RANDOM() LIMIT 1;
$$ LANGUAGE SQL;
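Usage is the same for all of these polymorphic variants; the element type is inferred from the argument. E.g. with pick_random():
SELECT pick_random(ARRAY['Yes', 'No', 'Maybe']);  -- random text value
SELECT pick_random('{1,2,5}'::int[]);             -- random int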

Postgres SQL function string_to_array

I have a table:
c1 | c2 | c3 | c4
---+----+----+----
a  | b  | c  | 10
a  | a  | b  | 20
c  | a  | c  | 10
b  | b  | c  | 10
c  | b  | c  | 30
I want to write a function that takes 3 strings / text inputs, e.g. ('a b c, b d, c'), compares every element to each other, finds whether a row exists with that combination, and sums up the numbers in the fourth column (c4). The order doesn't matter: a constellation of b a c or c a b would match a b c 10. And if there is a row like b c c, there won't be a row like c b b; every matchup is unique.
I think the best approach would be to use string_to_array(text, text).
I put together some pseudo-code, but I have no idea how to write it in SQL. Maybe the logic is wrong, too.
function (x,y,z)
res = 0
x_array = string_to_array(x, ' ')
y_array = string_to_array(y, ' ')
z_array = string_to_array(z, ' ')
foreach(x_item in x_array)
foreach(y_item in y_array)
foreach(z_item in z_array)
if (c1 = (x_item || y_item || z_item ) && c2 = (x_item || y_item || z_item ) && c3 = (x_item || y_item || z_item ))
res++
EDIT
First of all, there was a mistake in the example table: it had both a row a b c and a row c b a. That can't be; a b c = c b a, and each row must be unique.
example: three text inputs a b c | b c | c
each element vs. each element: a b c, a c c, b b c, b c c, c b c, c c c
a b c = 10;
a c c (is the same as c a c) = 10;
b b c = 10;
b c c (is the same as c b c) = 30;
c b c = 30;
c c c (no match) = 0;
result = 90
I think this might be what you want:
Return the sum of column c4 from all rows where a given set of three tokens matches the columns (c1, c2, c3).
Simple version
Much simpler with the contains @> and is contained by <@ operators:
SELECT sum(c4) AS sum_of_matching_c4
FROM tbl
WHERE ARRAY[c1,c2,c3] <@ ARRAY['b', 'a', 'c'] -- strings in arbitrary order
AND ARRAY[c1,c2,c3] @> ARRAY['b', 'a', 'c'];
Sorry, that would fail for ('b', 'c', 'c') vs. ('c', 'b', 'b').
Slow and sure
WITH i(arr) AS (
SELECT ARRAY(VALUES ('b'), ('c'), ('c') ORDER BY 1) -- input once
) -- in arbitrary order
SELECT sum(c4) AS sum_of_matching_c4
FROM (
SELECT c4, array_agg(x ORDER BY x) AS arr
FROM (
SELECT ctid, c4, unnest(ARRAY[c1,c2,c3]) AS x
FROM tbl t, i
WHERE ARRAY[c1,c2,c3] <@ arr -- optional pre-selection
AND ARRAY[c1,c2,c3] @> arr -- for better performance?
) a
GROUP BY ctid, c4
) b
JOIN i USING (arr);
-> sqlfiddle demo.
The major difficulty is to order the values of the columns within the row.
For your input (3 strings) I achieve this with a VALUES expression in the CTE, which I order right away and collect into an array. I use a CTE for convenience, so we have to enter the values in one place only.
It's more complicated for the row values. I put the three columns in an array and break that up into rows with unnest(). As you did not provide a primary key, I use the ctid as an ad-hoc surrogate primary key instead, which I need for the GROUP BY that stuffs the now-sorted (c1, c2, c3) back into an array.
Finally, I sum up c4 for all rows where the sorted arrays match exactly.
Note: I expressly do not use string_agg(), because concatenation does not produce distinct results. Consider:
'abc' 'cde' 'fgh'
'ab' 'ccdef' 'gh'
... resulting in the same string when concatenated.
Index / Performance
You might consider saving pre-ordered data to speed up queries; doing it on the fly is expensive. I.e., you could pre-generate the sorted array and save it as a redundant column, which you can then support with an index. That should be faster by several orders of magnitude, at the cost of redundant data storage.
If you are dealing with long strings, a solution similar to what I outlined in this related answer on dba.SE might be the best course of action.
Alternatively (preferred!), guarantee that (c1, c2, c3) are always stored in ascending order. You could use a trigger BEFORE INSERT OR UPDATE to keep values within the row ordered, as sketched below. No redundant storage, and you can simply create a multi-column index on the three columns and compare to them one by one (instead of comparing arrays like in my example).
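A minimal sketch of such a trigger, assuming the table is named tbl and c1, c2, c3 are text columns (all names illustrative):
CREATE OR REPLACE FUNCTION tbl_sort_c1_c2_c3()
RETURNS trigger LANGUAGE plpgsql AS
$$
DECLARE
    sorted text[];
BEGIN
    -- sort the three values ascending, then write them back
    SELECT array_agg(x ORDER BY x) INTO sorted
    FROM unnest(ARRAY[NEW.c1, NEW.c2, NEW.c3]) x;
    NEW.c1 := sorted[1];
    NEW.c2 := sorted[2];
    NEW.c3 := sorted[3];
    RETURN NEW;
END
$$;
CREATE TRIGGER tbl_sort_values
BEFORE INSERT OR UPDATE ON tbl
FOR EACH ROW EXECUTE PROCEDURE tbl_sort_c1_c2_c3();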
You don't need to write a function for that.
First, there are no "strings" in PostgreSQL (SQL); the types are "text" or "varchar".
Second, what you need is an SQL query along these lines:
SELECT c1 || c2 || c3 AS txtcol, SUM(c4) AS numsum
FROM mytable
GROUP BY c1 || c2 || c3;
I can't recall the exact syntax at the moment, so you may need to work it out; anyway, the point is that you concatenate the varchar columns with some built-in function like CONCAT or the || operator, and then sum/group by the numeric column. All you need is to concatenate the columns and give the resulting combined column a name.
Strictly speaking, you don't even need to show the concatenated column in the resulting table; you could output just the sums and, for example, the number of rows summarized.
Theoretically you could write an SQL or PL/pgSQL function for that, but I'm sure it's just not necessary; your case seems simple enough to achieve the result you want without a function. The summarizing function SUM() is called an "aggregate" function; other examples of aggregate functions are MIN() and MAX().
Note that what you're actually trying to do is group rows by a resulting VARCHAR column produced by per-row concatenation.
EDIT: "Arrays" in SQL or procedural SQL are internally-handled arrays; do not confuse them with relations (tables in the database, or tables as SELECT results). I think you don't need SQL arrays for this either; the task really isn't as hard as it looks.

PostgreSQL - CREATE INDEX

I'm working with PostgreSQL to create some data types written in C.
For example, I have:
typedef struct Point3D
{
    char id[50];
    double x;
    double y;
    double z;
} Point3D;
The input and output functions are working properly.
But the problem is the following:
Every id of a Point3D must be unique (though it can be NULL), so I have decided to create a unique index on this field id, but is that possible?
I'm thinking of something like this:
CREATE UNIQUE INDEX test_point3d_idx ON test_point3d (( getID(columname) ));
where getID returns the field ID of columname.
But I need to implement getID and I am really blocked.
Any advice?
The Postgres manual section "Interfacing Extensions to Indexes" explains indexes on user-defined types like your Point3D. That requires a fair amount of work. I don't know any shortcuts.
Unrelated to your question: are you sure you need this C-language Point3D datatype? Mistakes in such a datatype definition can "confuse or even crash the server", and I presume the same applies to the C-language operator functions supporting it.
Could you create tables with four columns, one for each Point3D field? Otherwise, could you forgo C in favor of a simple CREATE TYPE point3d AS (id char(50), x float8, y float8, z float8)? Perhaps not, but it's worth a shot... If it works, the index is straightforward, as sketched below.
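If the composite type is viable, the unique index from the question becomes a plain expression index. A sketch under that assumption (table name is illustrative):
CREATE TYPE point3d AS (id char(50), x float8, y float8, z float8);
CREATE TABLE test_point3d (
    p point3d
);
-- unique index on the embedded id; note the extra parentheses
-- required around the field-selection expression
CREATE UNIQUE INDEX test_point3d_idx ON test_point3d (((p).id));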
A unique column will allow multiple NULL values, because NULL is an unknown value: one NULL compared to another can never really be considered equal. Logically you might consider NULL = NULL to be true, but a unique constraint doesn't work that way.
Simple example to prove it.
CREATE TABLE test2
(
    unq_id integer NULL,
    CONSTRAINT uq_test2 UNIQUE (unq_id)
);
INSERT INTO test2 SELECT NULL;
INSERT INTO test2 SELECT NULL;
INSERT INTO test2 SELECT NULL;
SELECT *
FROM test2;
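All three inserts succeed. Note that since PostgreSQL 15 you can opt into the other behavior, treating NULLs as not distinct for uniqueness:
CREATE TABLE test3
(
    unq_id integer NULL,
    CONSTRAINT uq_test3 UNIQUE NULLS NOT DISTINCT (unq_id)
);
-- here, a second INSERT ... NULL fails with a unique violation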

How would you query an array of 1's and 0's chars from a database?

Say you had a long array of chars that are each either 1 or 0, kind of like a bit vector, but stored in a database column. How would you query to find out which values are set/not set? Say you need to know whether char 500 and char 1500 are "true" or not.
SELECT Id
FROM BitVectorTable
WHERE SUBSTRING(BitVector, 500, 1) = '1'
  AND SUBSTRING(BitVector, 1500, 1) = '1'
No index can be used for this kind of query, though. When you have many rows, this will get slow very quickly.
Edit: on SQL Server at least, all built-in string functions are deterministic. That means you could look into the possibility of making computed columns based on the SUBSTRING() results of the whole combined value, putting an index on each of them. Inserts will be slower and table size will increase, but searches will be really fast.
SELECT Id
FROM BitVectorTable
WHERE BitVector_0500 = '1'
  AND BitVector_1500 = '1'
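A hedged sketch of how one such computed column might be declared and indexed on SQL Server (column and index names are illustrative):
ALTER TABLE BitVectorTable
    ADD BitVector_0500 AS SUBSTRING(BitVector, 500, 1) PERSISTED;
CREATE INDEX IX_BitVectorTable_0500
    ON BitVectorTable (BitVector_0500);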
Edit #2: the limits for SQL Server are:
1,024 columns per normal table
30,000 columns per "wide" table
In MySQL, something using substring like:
select foo from bar
where substring(col, 500, 1) = '1' and substring(col, 1500, 1) = '1';
This will be pretty inefficient though; you might want to rethink your schema. For example, you could store each bit separately, trading space for speed...
create table foo
(
    id int not null,
    bar varchar(128),
    primary key(id)
);
create table foobit
(
    foo_id int not null,
    idx int not null,
    value tinyint not null,
    primary key(foo_id, idx),
    index(idx, value)
);
Which would be queried as:
select foo.bar
from foo
inner join foobit as bit500
    on (foo.id = bit500.foo_id and bit500.idx = 500)
inner join foobit as bit1500
    on (foo.id = bit1500.foo_id and bit1500.idx = 1500)
where bit500.value = 1 and bit1500.value = 1;
Obviously consumes more storage, but should be faster for those query operations as an index will be used.
I would convert the column to multiple bit columns and rewrite the relevant code; bit masks are much faster than string comparisons. But if you can't do that, you must use db-specific functions. Regular expressions could be an option:
-- Flavor: MySql
SELECT * FROM table WHERE column REGEXP "^.{499}1.{999}1"
select substring(your_col, 500,1) as char500,
substring(your_col, 1500,1) as char1500 from your_table;
