I have a series of rows and I need to aggregate values from these rows into groups of N elements, accumulating values from the current row and the N-1 succeeding rows.
With N=3 and data being:
VALUES (1),(2),(3),(4),(5);
I want to receive the following set of rows (arrays):
{1,2,3}
{2,3,4}
{3,4,5}
{4,5}
{5}
It is important that N is a variable, so I cannot use joins.
This can be solved using window functions together with frame clauses.
The question as asked can be solved like this:
WITH v(v) AS (VALUES (1),(2),(3),(4),(5))
SELECT v,
       -- ORDER BY makes the order of rows inside the frame deterministic
       array_agg(v) OVER (ORDER BY v
                          ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) AS arr
FROM v;
And the following example illustrates how to get a list of complete arrays, i.e. eliminate those that don't contain all N entries:
WITH cnt(c) AS (SELECT 3),
     val(v) AS (VALUES (1),(2),(3),(4),(5)),
     arr AS
       (SELECT v,
               array_agg(v) OVER (ROWS BETWEEN CURRENT ROW
                                  AND (SELECT c-1 FROM cnt) FOLLOWING) AS arr
        FROM val)
SELECT v, arr
FROM arr
WHERE array_upper(arr,1) = (SELECT c FROM cnt);
I really love window functions!
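For comparison outside the database, the same "current row plus N-1 following rows" frame can be sketched in plain Python. This is only an illustration of the windowing logic, not part of the SQL solution; the function name forward_windows and its complete_only flag are made up for this example.

```python
def forward_windows(values, n, complete_only=False):
    """For each position, collect the current value and up to n-1 following values."""
    # Slicing past the end of a list is safe in Python, so trailing windows are shorter.
    windows = [values[i:i + n] for i in range(len(values))]
    if complete_only:
        # Mirrors the WHERE array_upper(arr, 1) = N filter: keep only full windows.
        windows = [w for w in windows if len(w) == n]
    return windows


print(forward_windows([1, 2, 3, 4, 5], 3))
# [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5], [5]]
print(forward_windows([1, 2, 3, 4, 5], 3, complete_only=True))
# [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```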
I'm working on a variation of the Quick Sort Algorithm designed to handle arrays where there may be many duplicate elements. The basic idea is to divide the array into 3 partitions first: All Elements Below the Pivot Value (with the initial pivot value being chosen at random); All Elements Equal to the Pivot Value; and All Elements Greater than the Pivot Value.
I need some advice regarding the best way to arrange the Partition.
What is the best way to arrange the Partition in a Three Way Quick Sort?
The first way I might go about it is to just keep the Pivot Partition on the left, which would make it easy to define the boundaries when I return them to the larger Quick Sort function I plan to nest the Partition function within. But that makes subsequent recursive calls to sort the Above and Below Partitions a little tricky, since they would all be lumped together in one large partition above the Pivot Partition to start with (instead of being more neatly organized into an Above and a Below Partition). I could use a For Loop to insert each of these elements above and below the Pivot Partition, but I suspect that would reduce the efficiency of the algorithm. After doing this, I could make two recursive calls to Quick Sort: once on the Below Partition, and again on the Above Partition.
OR I could modify Partition to insert "Below" elements to the left of the Pivot Partition, and insert "Above" elements to the right. This reduces the need for linear scans over the array, but it means I would have to update the left and right bounds of the partition as the Partition function operates over the array.
I believe the second choice is the better one, but I want to see if anyone has any other ideas.
For reference, the initial array might look something like this:
array = [2, 2, 1, 9, 2]
Assuming the Pivot is randomly chosen as value of "2", then after Partition, it could look either like this:
array = [2, 2, 2, 9, 1]
Or like this if I insert above and below the partition during the Partition Function:
array = [1, 2, 2, 2, 9]
And the "shell code" I'm supposed to build this function around looks like this:
import random

def randomized_quick_sort(a, l, r):
    if l >= r:
        return
    k = random.randint(l, r)
    a[l], a[k] = a[k], a[l]
    left_part_bound, right_part_bound = partition3(a, l, r)
    randomized_quick_sort(a, l, left_part_bound - 1)
    randomized_quick_sort(a, right_part_bound + 1, r)
*The end result doesn't need to look like this (I just need to be able to output the right result and be able to resolve within a time limit to demonstrate minimal efficiency), but it shows why I think I may need to create Above and Below partitions as I'm creating the Pivot Partition.
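For what it's worth, the second option (placing "Below" elements to the left and "Above" elements to the right while the pivot block grows in the middle) can be sketched as a single pass with three moving bounds, often called a Dutch national flag partition. This is one possible partition3 under the assumption that the pivot has already been swapped into a[l], as the shell code does; the variable names lt, i and gt are my own.

```python
import random


def partition3(a, l, r):
    """Three-way partition of a[l..r] around the pivot a[l].

    Returns (lt, gt) with a[l..lt-1] < pivot, a[lt..gt] == pivot, a[gt+1..r] > pivot.
    """
    pivot = a[l]
    lt, i, gt = l, l, r
    while i <= gt:
        if a[i] < pivot:
            # Move the smaller element into the Below partition.
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            # Move the larger element into the Above partition; the element
            # swapped in from the right is unexamined, so i stays put.
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    return lt, gt


def randomized_quick_sort(a, l, r):
    if l >= r:
        return
    k = random.randint(l, r)
    a[l], a[k] = a[k], a[l]
    left_part_bound, right_part_bound = partition3(a, l, r)
    randomized_quick_sort(a, l, left_part_bound - 1)
    randomized_quick_sort(a, right_part_bound + 1, r)


array = [2, 2, 1, 9, 2]
print(partition3(array, 0, len(array) - 1), array)
# (1, 3) [1, 2, 2, 2, 9]
```

Each element is examined once, so the partition stays linear, and the two returned bounds are exactly what the recursive calls in the shell code expect.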
I want to select all rows of a data set except those at a given list of indices.
For example, in a given data_set I want to select all rows except for idx = [2,3,6,11,15].
How can we do this in MATLAB? Is there any command or logical indexing method?
There are many ways to do this by comparing your exclusion list with a full list from 1:n, where n is the number of rows. Below I've listed 2 which use this logic. I've also shown the simplest way (removing rows) but that requires an intermediate step.
I'm not sure which of these is the most performant:
% Setup
idx = [2,3,6,11,15]; % exclusion list
M = rand( 25, 10 ); % test matrix to index (25 rows)
Using ismember and logical indexing
K = M( ~ismember( 1:size(M,1), idx ), : );
Using setdiff to get the row numbers not listed
K = M( setdiff( 1:size(M,1), idx ), : );
Creating a temp matrix, then removing the excluded rows
K = M;
K( idx, : ) = [];
Error handling
Note that the final row-removing method will produce an error if any of the rows in your exclusion list are out of bounds (e.g. if 0 was in idx).
The setdiff and ismember methods won't give you any errors; out-of-bounds values in idx are simply ignored.
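The difference in error behaviour between the two styles can be illustrated outside MATLAB too. Below is a rough Python sketch of the same logic, assuming 1-based row positions to mirror the MATLAB examples; the function names select_rows_except and delete_rows are invented for this illustration and are not MATLAB or library functions.

```python
def select_rows_except(rows, excluded):
    """Keep rows whose 1-based position is not listed (like ~ismember / setdiff)."""
    excluded = set(excluded)
    # Out-of-range positions never match any row, so they are silently ignored.
    return [row for pos, row in enumerate(rows, start=1) if pos not in excluded]


def delete_rows(rows, excluded):
    """Copy the rows, then delete the listed 1-based positions (like K(idx,:) = [])."""
    for pos in excluded:
        if not 1 <= pos <= len(rows):
            # Deleting a row that does not exist is treated as an error.
            raise IndexError(f"row position {pos} is out of bounds")
    kept = list(rows)
    # Delete from the end so earlier positions are not shifted.
    for pos in sorted(set(excluded), reverse=True):
        del kept[pos - 1]
    return kept


rows = list(range(1, 26))               # stand-in for a 25-row data set
idx = [2, 3, 6, 11, 15]
print(len(select_rows_except(rows, idx)))        # 20
print(len(select_rows_except(rows, idx + [0])))  # 20, the 0 is ignored
```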
I am given a vector X of discrete positive integers of size 160*1 and a table Tb1 of size 40*200 that contains lists of indices to be deleted from X. Each of the 200 columns in Tb1 points to 40 elements to be deleted from the original X.
I create a new matrix of the remaining 120*200 elements by using a for loop with 200 iterations that, at iteration i, deletes 40 elements from a copy of the original X according to the indices listed in Tb1(:,i), but it takes too much time and memory.
How can I get the result without using loops and with a minimum number of operations?
Here are different methods:
Method1:
idx = ~hist(tbl, 1:160);
[f,~]=find(idx);
result1 = reshape(M(f),120,200);
Method2:
idx = ~hist(tbl, 1:160);
M2=repmat(M,200,1);
result2 = reshape(M2(idx),120,200);
Method 3 & 4:
% idx can be generated using accumarray
idx = ~accumarray([tbl(:) reshape(repmat(1:200,40,1),[],1)],true,[160,200],@any);
%... use method 1 and 2
Method5:
M5=repmat(M,200,1);
M5(bsxfun(@plus,tbl,0:160:160*199))=[];
result5 = reshape(M5,120,200);
This assumes that M is the array of integers (X in the question) and tbl is the table of indices (Tb1 in the question).
It can be tested with the following data:
M = rand(160,1);
[~,tbl] = sort(rand(160,200));
tbl = tbl(1:40,:);
However, it is more efficient to generate the indices of the elements to be kept instead of the indices of the elements to be removed.
I'm trying to pass data around as a multidimensional array, and I'm getting behavior that seems odd to me. Specifically I'm trying to get a single element out of a 2 dimensional array (so a 1 dimensional array out of my 2 dimension array), and it doesn't work the way I'd expect.
In the following examples #2, 4, & 5 work the way I'd expect, but 1 & 3 do not.
db=> select s.col[2] from (select array[[1,2,3],[4,5,6]] as col) s;
col
-----
(1 row)
db=> select s.col[2:2] from (select array[[1,2,3],[4,5,6]] as col) s;
col
-----
{{4,5,6}}
(1 row)
db=> select array[s.col[2]] from (select array[[1,2,3],[4,5,6]] as col) s;
array
--------
{NULL}
(1 row)
db=> select array[s.col[2:2]] from (select array[[1,2,3],[4,5,6]] as col) s;
array
-------------
{{{4,5,6}}}
(1 row)
db=> select s.col[2][1] from (select array[[1,2,3],[4,5,6]] as col) s;
col
-----
4
(1 row)
Is there doc on this? I have something that's working well enough for me right now, but it's ugly and I worry it won't do the things I want to do next. Technically I'm getting a 2 dimensional array, where 1 dimension only has 1 element. I'd rather just get an array.
I've read (among others):
http://www.postgresql.org/docs/9.1/static/arrays.html
http://www.postgresql.org/docs/9.1/static/functions-array.html
http://www.postgresql.org/docs/9.1/static/sql-expressions.html#SQL-SYNTAX-ARRAY-CONSTRUCTORS
And I'm just not seeing what I'm looking for.
Postgres array elements are always base elements, i.e. non-array types. Sub-arrays are not "elements" in Postgres. Array slices retain original dimensions.
You can either extract a base element, which has the element data type, or extract an array slice, which retains the original array data type and also the original array dimensions.
Your idea to retrieve a sub-array as "element" would conflict with that and is just not implemented.
The manual might be made clearer in its explanation. But at least we can find:
If any dimension is written as a slice, i.e., contains a colon, then
all dimensions are treated as slices. Any dimension that has only a
single number (no colon) is treated as being from 1 to the number
specified. For example, [2] is treated as [1:2] ...
Your 1st example tries to reference a base element, which is not found (you'd need a subscript with two array indexes in a 2D array). So Postgres returns NULL.
Your 3rd example just wraps the resulting NULL in a new array.
To flatten an array slice (make it a 1D array) you can unnest() and feed the resulting set to a new ARRAY constructor, either in a correlated subquery or in a LATERAL join (requires Postgres 9.3+). Demonstrating both:
SELECT s.col[2:2][2:3] AS slice_arr
     , x.lateral_arr
     , ARRAY(SELECT unnest(s.col[2:2][2:3])) AS corr_arr
FROM  (SELECT ARRAY[[1,2,3],[4,5,6]] AS col) s
     , LATERAL (
   SELECT ARRAY(SELECT * FROM unnest(s.col[2:2][2:3])) AS lateral_arr
   ) x;
Be sure to read the current version of the manual. Your references point to Postgres 9.1, but chances are you are actually using Postgres 9.4.
Related:
How to select 1d array from 2d array?
Unnest array by one level
I have a matrix with six columns. I found the max value of a certain column but how would I go about extracting the entire row pertaining to that value?
To extract row 1 of matrix A, use A(1,:). To extract rows 1 and 2, use A([1,2],:).
Use the max() function as explained here. For example
if A is your matrix:
[M, I] = max(A)
Row = A(I(1),:)
Here max(A) works column by column, so I(1) is the index of the row containing the maximum element of the first column. For another column k, use I(k) instead.
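The same idea (find where the column maximum sits, then take that whole row) looks like this in plain Python, if that helps to see the logic. It is only a sketch: the matrix is a list of row lists, the column index is 0-based unlike MATLAB, and row_of_column_max is a name made up for the example.

```python
def row_of_column_max(matrix, col):
    """Return the entire row whose entry in column `col` (0-based) is largest."""
    # max() compares rows by the chosen column and returns the first row on ties.
    return max(matrix, key=lambda row: row[col])


A = [
    [1, 9, 3],
    [7, 2, 8],
    [4, 6, 5],
]
print(row_of_column_max(A, 0))  # [7, 2, 8]
print(row_of_column_max(A, 1))  # [1, 9, 3]
```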