How to get the dimensionality of an ARRAY column?

I'm working on a project that collects information about your schema from the database directly. I can get the data_type of the column using information_schema.columns, which will tell me if it's an ARRAY or not. I can also get the underlying type (integer, bytea etc) of the ARRAY by querying information_schema.element_types as described here:
https://www.postgresql.org/docs/9.1/static/infoschema-element-types.html
My problem is that I also need to know how many dimensions the array has, i.e. whether it is integer[] or integer[][], for example. Does anyone know of a way to do this? Google isn't being very helpful here; hopefully someone more familiar with the Postgres spec can lead me in the right direction.

For starters, the dimensionality of an array is not reflected in the data type in Postgres. The syntax integer[][] is tolerated, but it's really just integer[] internally.
Read the chapter on arrays in the manual.
This means that dimensions can vary within the same array type (the same table column).
To get actual dimensions of a particular array value:
SELECT array_dims(my_arr); -- [1:2][1:3]
Or to just get the number of dimensions:
SELECT array_ndims(my_arr); -- 2
There are more array functions for similar needs. See table of array functions in the manual.
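For instance, a quick self-contained check with a literal 2×3 array (rather than a real column) would be:

-- array_dims shows the bounds per dimension, array_ndims the number of dimensions
SELECT array_dims(ARRAY[[1,2,3],[4,5,6]])  AS dims,   -- [1:2][1:3]
       array_ndims(ARRAY[[1,2,3],[4,5,6]]) AS ndims;  -- 2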
Related:
Use string[][] with Npgsql
If you need to enforce particular dimensions in a column, add a CHECK constraint. To enforce 2-dimensional arrays:
ALTER TABLE tbl ADD CONSTRAINT tbl_arr_col_must_have_2_dims
CHECK (array_ndims(arr_col) = 2);
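A quick sketch of how that constraint behaves, assuming a hypothetical table tbl with a column arr_col integer[]:

INSERT INTO tbl (arr_col) VALUES ('{{1,2},{3,4}}');  -- accepted: array_ndims() = 2
INSERT INTO tbl (arr_col) VALUES ('{1,2,3}');        -- rejected: violates tbl_arr_col_must_have_2_dims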

Support for multidimensional arrays in Postgres is quite specific. Multidimensional array types do not exist. If you declare an array as multidimensional, Postgres automatically converts the declaration to a plain array type:
create table test(a integer[][]);
\d test
Table "public.test"
Column | Type | Modifiers
--------+-----------+-----------
a | integer[] |
You can store arrays of different dimensions in a column of an array type:
insert into test values
(array[1,2]),
(array[array[1,2], array[3,4]]);
select a, a[1] a1, a[2] a2, a[1][1] a11, a[2][2] a22
from test;
       a       | a1 | a2 | a11 | a22
---------------+----+----+-----+-----
 {1,2}         |  1 |  2 |     |
 {{1,2},{3,4}} |    |    |   1 |   4
(2 rows)
This is a key difference between Postgres and programming languages like C, Python, etc. The feature has its advantages and disadvantages, but it usually causes problems for novices.
You can find the number of dimensions in the system catalog pg_attribute:
select attname, typname, attndims
from pg_class c
join pg_attribute a on c.oid = attrelid
join pg_type t on t.oid = atttypid
where relname = 'test'
and attnum > 0;
 attname | typname | attndims
---------+---------+----------
 a       | _int4   |        2
(1 row)
It is not clear whether you can rely on this number, as per the documentation:
attndims - Number of dimensions, if the column is an array type; otherwise 0. (Presently, the number of dimensions of an array is not enforced, so any nonzero value effectively means "it's an array".)
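A small sketch to illustrate the caveat: attndims only records how many bracket pairs were written in the declaration, and values of any dimensionality are still accepted (the throwaway table test1d is just for demonstration).

create table test1d (a integer[]);                          -- declared with a single bracket pair
insert into test1d values (array[array[1,2], array[3,4]]);  -- a 2-dimensional value is accepted anyway

select attndims
from pg_attribute
where attrelid = 'test1d'::regclass
and attname = 'a';                                          -- still reports 1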


GIN index on array column not used, even after setting `enable_seqscan` to off?

As recommended by this comment, I built an intarray GIN index. I even set local enable_seqscan = 'off';. Still, when I EXPLAIN my query, it is doing the sequential scan. For demonstration, I created a dummy table called deletethis.
devapp=>
devapp=> \d deletethis;
Table "public.deletethis"
Column | Type | Collation | Nullable | Default
--------+-----------+-----------+----------+---------
col1 | integer[] | | |
Indexes:
"deletethis_idx" gin (col1 gin__int_ops)
devapp=>
devapp=>
devapp=> BEGIN; set local enable_seqscan = False; SHOW enable_seqscan; EXPLAIN SELECT * from deletethis where 1 = ANY(col1); COMMIT;
BEGIN
SET
enable_seqscan
----------------
off
(1 row)
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Seq Scan on deletethis  (cost=10000000000.00..10000000040.60 rows=7 width=32)
   Filter: (1 = ANY (col1))
(2 rows)
COMMIT
devapp=>
Why is it still doing Seq Scan and not using the index despite the enable_seqscan having been set to off?
As I mentioned in another answer to the same question you referenced, the ANY construct cannot tap into a GIN index:
Can PostgreSQL index array columns?
This can use your index:
SELECT * FROM deletethis WHERE col1 @> '{1}';
'{1}' being an array literal. An array constructor (ARRAY[1]) or any variable or column of type integer[] would work, too.
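For example, either form below should be able to use the gin__int_ops index from the question (a sketch; the expected plan is a bitmap index scan on deletethis_idx rather than the sequential scan seen above):

SELECT * FROM deletethis WHERE col1 @> '{1}';     -- array literal
SELECT * FROM deletethis WHERE col1 @> ARRAY[1];  -- array constructor, equivalent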
See:
Check if value exists in Postgres array
Cannot INSERT: ERROR: array value must start with "{" or dimension information
An explicit type cast for the input is optional because the assignment cast for the literal as well as the default type for the constructor happen to work for us.
But when using operator classes from the additional module intarray, be careful to operate with integer (int4) and nothing else. Related:
Compare arrays for equality, ignoring order of elements
GIN index on smallint[] column not used or error "operator is not unique"

Alternative solutions to an array search in PostgreSQL

I am not sure whether my database design is good for this tricky case, and I am also asking for help with how the query for this could look.
I plan a query against the following table:
     search_array      | value | id
-----------------------+-------+----
 {XYa,YZb,WQb}         | b     |  1
 {XYa,YZb,WQb,RSc,QZa} | a     |  2
 {XYc,YZa}             | c     |  3
 {XYb}                 | a     |  4
 {RSa}                 | c     |  5
There are 5 main elements in the search_array: XY, YZ, WQ, RS, QZ, and 3 values: a, b, c, which are concatenated onto each element.
Each row has also one value: a, b or c.
My aim is to find all rows that match a specific row in this sense: first, it should be checked whether they share any main elements in their search_arrays (marked yellow in the example).
As an example:
Rows with id 4 and id 5 wouldn't match because XY != RS.
Rows with id 1, 2 and 3 would match two times because they all have XY and YZ.
Rows with id 1 and 2 would even match three times because they also have WQ in common.
And second: if there is a main element match, it should be 'cross-checked' whether the lowercase letters after the main elements fit the value of the other row.
As an example: the only match for row id 1 in the table would be row id 4, because they both search for XY and the lowercase letters after the elements match the values of the two rows.
Another match would be rows id 2 and 5 with RS, where search c matches value c and search a matches value a (marked green and orange).
My idea was to cut the search_array elements into two parts in the query with the RIGHT and LEFT string functions, but I don't know how to combine the subqueries for this search.
Or would a completely different solution be faster? Like splitting the search array into another table with the columns 'foreign key' to the main table, 'main element' and 'searched_value'. I am not sure whether this is the best solution, because the program would constantly have to switch back to the main table to find two rows out of 3 million and compare their searched_values to the values.
Thank you very much for your answers and your time!
You'll have to represent the data in a normalized fashion. I'll do it in a WITH clause, but it would be better to store the data in this fashion to begin with.
WITH unravel AS (
SELECT t.id, t.value,
substr(u.val, 1, 2) AS arr_main,
substr(u.val, 3, 1) AS arr_val
FROM mytable AS t
CROSS JOIN LATERAL unnest(t.search_array) AS u(val)
)
SELECT a.id AS first_id,
a.value AS first_value,
b.id AS second_id,
b.value AS second_value,
a.arr_main AS main_element
FROM unravel AS a
JOIN unravel AS b
ON a.arr_main = b.arr_main
AND a.arr_val = b.value
AND b.arr_val = a.value;
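If restructuring the schema is an option, storing the split elements in a normalized child table from the start avoids re-parsing the strings on every query. A sketch (table and column names are illustrative, assuming mytable(id) is the primary key):

create table search_element (
    id       bigint not null references mytable (id),  -- FK back to the main table
    arr_main text   not null,                          -- e.g. 'XY', 'YZ', 'WQ', 'RS', 'QZ'
    arr_val  text   not null                           -- e.g. 'a', 'b', 'c'
);

-- one-time backfill from the existing array column
insert into search_element (id, arr_main, arr_val)
select t.id, substr(u.val, 1, 2), substr(u.val, 3, 1)
from mytable as t
cross join lateral unnest(t.search_array) as u(val);

The self-join above can then run directly against search_element, ideally with an index on (arr_main, arr_val).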

Insert array of JSONB objects from one table as multiple rows in second table

We are trying to migrate data from an array column containing JSONB to a proper Postgres table. For example, an array value like
{{"a":1,"b": 2, "c":"bar"},{"a": 2, "b": 3, "c":"baz"}}
should end up as rows of a table like
 a | b |   c
---+---+-------
 1 | 2 | "bar"
 2 | 3 | "baz"
As part of the process, we have made several attempts using functions like unnest and array_to_json. In the unnest case, we get several JSONB rows, but cannot figure out how to insert them into the second table. In the array_to_json case, we are able to cast the array to a JSON string, but the json_to_recordset does not seem to accept the JSON string from a common table expression.
What would be a good strategy to 'mirror' the array of JSONB items as a proper table, so that we can run the query inside of a stored procedure, triggered on insert?
Use unnest() in a lateral join:
with my_data(json_column) as (
values (
array['{"a":1,"b":2,"c":"bar"}','{"a":2,"b":3,"c":"baz"}']::jsonb[])
)
select
value->>'a' as a,
value->>'b' as b,
value->>'c' as c
from my_data
cross join unnest(json_column) as value
a | b | c
---+---+-----
1 | 2 | bar
2 | 3 | baz
(2 rows)
You may need some casts or converts, e.g.:
select
(value->>'a')::int as a,
(value->>'b')::int as b,
(value->>'c')::text as c
from my_data
cross join unnest(json_column) as value
Lateral joining means that the function unnest() will be executed for each row from the main table. The function returns elements of the array as value.
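To actually insert the rows into the target table, as the question asks, the same SELECT can feed an INSERT. A sketch, assuming a target table named target_table(a int, b int, c text); replace my_data and json_column with the real source table and array column:

insert into target_table (a, b, c)
select (value->>'a')::int,
       (value->>'b')::int,
       value->>'c'
from my_data
cross join unnest(json_column) as value;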

Use TQuery.Locate() function to find other than first matching

Locate moves the cursor to the first row matching a specified set of search criteria.
Let's say that q is a TQuery component connected to a database table with two columns, TAG and TAGTEXT. With the following code I am getting letter a, and I would like to use the Locate() function to get letter d.
If q.Locate('TAG', '1', [loPartialKey]) Then
begin
  tag60 := q.FieldByName('TAGTEXT');
end;
For example, if I have a table like this:
TAG | TAGTEXT
+---+---------+
| 1 | a       |
+---+---------+
| 2 | b       |
+---+---------+
| 3 | c       |
+---+---------+
| 1 | d       |
+---+---------+
| 4 | e       |
+---+---------+
| 1 | f       |
+---+---------+
Is it possible to locate the second time the number 1 occurs in the table?
EDIT
My job is to find a particular occurrence of TAG with value 1 (which occurrence I need depends on a parameter I receive). I need to iterate through the table and collect the values from all the TAGTEXT fields until the value in the TAG field is 1 again. The number 1 in this case marks the start of a new segment, and everything between two 1s belongs to one segment. There doesn't have to be the same number of rows in each segment. Also, I am not allowed to make any changes to the table.
What I thought I could do is create a counter variable that is incremented every time a TAG with value 1 is reached. When the counter equals the parameter that represents the occurrence, I know that I am in the right segment, and I then iterate through that segment and get the values I need.
But this might be a slow solution, and I wanted to know if there is a faster one.
You need to be a bit wary of using Locate for a purpose like this, because some TDataSet descendants' implementations of Locate (or the underlying db-access layer) construct a temporary index on the dataset, which may be discarded immediately afterwards. So repeatedly calling Locate to iterate the rows of a given segment may be a lot more inefficient than one might expect.
Also, TClientDataSet constructs, uses and then discards an expression parser for each invocation of Locate (in its internal call to LocateRecord), which is a lot of overhead for repeated calls, especially when they are entirely avoidable.
In any case, the best way to do this is to ensure that your table records which segment a given row belongs to, adding a column like the SegmentID below if your table does not already have one:
TAG | TAGTEXT | SegmentID
+---+---------+-----------+
| 1 | a       | 1         |
| 2 | b       | 1         |
| 3 | c       | 1         |
| 1 | d       | 2         |
+---+---------+-----------+ // btw, what happened to the 2 missing rows after this one?
| 4 | e       | 2         |
| 1 | f       | 3         |
+---+---------+-----------+
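If the backend database supports window functions, the new SegmentID column can also be backfilled in SQL rather than row by row, assuming the table has some column that preserves the original row order (the rec_id below is hypothetical; Postgres-style UPDATE ... FROM shown, adapt to your backend):

update mytable m
set SegmentID = s.seg
from (
    select rec_id,
           -- each TAG = 1 starts a new segment, so a running count of 1s is the segment number
           sum(case when TAG = 1 then 1 else 0 end) over (order by rec_id) as seg
    from mytable
) s
where s.rec_id = m.rec_id;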
Then, you could use code like this to iterate the rows of a segment:
procedure IterateSegment(Query : TSomeTypeOfQueryComponent; SegmentID : Integer);
var
  Sql : String;
begin
  Sql := Format('select * from mytable where SegmentID = %d order by Tag', [SegmentID]);
  if Query.Active then
    Query.Close;
  Query.Sql.Text := Sql;
  Query.Open;
  Query.DisableControls;
  try
    while not Query.Eof do begin
      // process row here
      Query.Next;
    end;
  finally
    Query.EnableControls;
  end;
end;
Once you have the SegmentID column in the table, if you don't want to open a new query to iterate a block, you can set up a local index (by SegmentID then Tag), assuming your dataset type supports it, set a filter on the dataset to restrict it to a given SegmentID, and then iterate over it.
You have several options to do this.
If your component doesn't provide a LocateNext, you can write your own LocateNext function that compares the value and calls Next until it finds a match.
You can also order the SQL with ORDER BY, then use Locate for the first value and test whether the next values match the comparison.
If you use a TClientDataSet you can filter via the component's Filter property, or set IndexFieldNames to order the values instead of the ORDER BY in the SQL of the prior suggestion.
You can also filter in the SQL WHERE clause.

Mask some fields when loading hdb in kdb

There are two instances, a1 and a2, and both connect to the same HDB. I would like a2 to connect to the HDB but with some filter added. For example, there is a table called elec.
I would like a2 to start up with some of the values filtered out. If I write code and let a2 load it when starting, doesn't that load the information into memory? Is there any way I can load it like a normal HDB when starting the a2 instance?
Basically, the question is: how do I mask some fields in one table when loading an HDB?
It is possible to prevent columns from being returned by select statements by manipulating the table definition in your HDB instance. The example below has a single date-partitioned table. We update the definition to a flipped dictionary with only a subset of the columns defined. This, however, is reversible and will not update the meta of the table in your instance, which will still show all columns.
q)meta trade
c | t f a
----| -----
date| d
sym | s p
size| j
px | f
side| s
q)flip trade
`sym`size`px`side!`trade
q)`trade set flip `sym`size`px!`trade
q)select from trade where date=2017.05.27
date       sym  size px
------------------------------
2017.05.27 APPl 9968 92.79204
2017.05.27 APPl 9788 94.97189
2017.05.27 APPl 9660 27.62907
q)meta trade
c | t f a
----| -----
date| d
sym | s p
size| j
px | f
side| s
