Function to count the number of elements in an OBJECT (Snowflake)

Is there a function for counting the number of elements in an OBJECT data type? ARRAY has ARRAY_SIZE(). VARCHAR has LEN() or LENGTH().
I am used to other query languages where I can use a function like SIZE() or CARDINALITY().

The best answer I can think of in Snowflake SQL is ARRAY_SIZE(OBJECT_KEYS(x)). However, this seems more complicated than it needs to be.
For the special case of checking for an empty OBJECT (cardinality 0), I could compare x = OBJECT_CONSTRUCT().
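As a rough analogy only (this is Python, not Snowflake SQL), treating an OBJECT as a dict parsed from JSON, the two workarounds above amount to counting the keys and comparing against an empty object. The function names are hypothetical.

```python
import json


def object_size(obj: dict) -> int:
    # Analogue of ARRAY_SIZE(OBJECT_KEYS(x)): build the key list, then count it
    return len(list(obj.keys()))


def is_empty_object(obj: dict) -> bool:
    # Analogue of x = OBJECT_CONSTRUCT(): compare with an empty object
    return obj == {}


x = json.loads('{"a": 1, "b": [2, 3], "c": {"d": null}}')
print(object_size(x))        # 3: only the top-level keys are counted
print(is_empty_object(x))    # False
print(is_empty_object({}))   # True
```

In Python, len(obj) alone would do; the intermediate key list is there only to mirror the SQL.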

Related

How to rank an array directly, or a group of arrays, without creating more cell references?

How can I RANK an array directly? I would like to avoid creating intermediate data in cells just so I can reference it.
The documentation for Excel's RANK.AVG function states that it accepts both an array and a reference:
Syntax
RANK.AVG(number,ref,[order])
The RANK.AVG function syntax has the following arguments:
Number Required. The number whose rank you want to find.
Ref Required. **An array of, or a reference to**, a list of numbers. Nonnumeric values in Ref are ignored.
Order Optional. A number specifying how to rank number.
But Excel keeps rejecting the formula below:
=RANK.AVG(5, {3,1,7,10,5})
If the numbers are put in cells, say B1:B5, Excel accepts
=RANK.AVG(5, B1:B5)
Ultimately, I would like to rank against a dynamic array:
=RANK.AVG(value, TOCOL(VSTACK(array1, array2)))
e.g. =RANK.AVG(5, TOCOL(VSTACK(B1:B5,C1:C10)))
It seems that the official documentation for the various RANK functions is simply wrong in stating that they accept arrays for the ref argument (see here, for example).
You will have to come up with creative alternatives which mimic the RANK.AVG function, for example:
=LET(ζ,SORT(MyArray,,-1),AVERAGE(FILTER(SEQUENCE(COUNT(ζ)),ζ=MyValue)))
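To make the semantics that this LET formula reproduces concrete, here is a minimal Python sketch of RANK.AVG as documented: nonnumeric values in ref are ignored, order 0 (the default) ranks in descending order, and tied values share the average of the positions they occupy. The name rank_avg and raising ValueError in place of #N/A are assumptions of this sketch.

```python
from numbers import Real


def rank_avg(number, ref, order=0):
    """Mimic Excel RANK.AVG: rank `number` within `ref`, averaging ties."""
    # Nonnumeric values are ignored; bools are excluded because Python treats them as ints
    nums = [v for v in ref if isinstance(v, Real) and not isinstance(v, bool)]
    # order 0 means descending (like SORT(..., , -1)); any nonzero order means ascending
    ordered = sorted(nums, reverse=(order == 0))
    # 1-based positions of every element equal to `number`, like FILTER(SEQUENCE(...), ...)
    positions = [i for i, v in enumerate(ordered, start=1) if v == number]
    if not positions:
        raise ValueError("number not found in ref")  # Excel returns #N/A here
    return sum(positions) / len(positions)  # AVERAGE of the tied positions


print(rank_avg(5, [3, 1, 7, 10, 5]))           # 3.0
# Ranking against TOCOL(VSTACK(...)) is just ranking against the concatenation
print(rank_avg(5, [3, 1, 7, 10, 5] + [5, 2]))  # 3.5
```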

Position of the lowest value greater than x in an ordered PostgreSQL array (optimization)

I'm looking at the Postgres function array_position(anyarray, anyelement [, int]).
My problem is similar, but I'm looking for the position of the first value in an array that is greater than a given value. I'm running this on small arrays, but on really large tables.
This works:
CREATE OR REPLACE FUNCTION arr_pos_min(anyarray,real)
RETURNS int LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'select array_position($1,(SELECT min(i) FROM unnest($1) i where i>$2))';
The array_position call takes advantage of the fact that my array is ordered, but the second part doesn't. I feel like the second part could just return the position without having to re-query.
My arrays are only 100 elements long, but I have to run this millions of times, so I'm looking for a performance gain.
Suggestions appreciated.
This seems to be a bit faster:
CREATE OR REPLACE FUNCTION arr_pos_min(p_input anyarray, p_to_check real)
RETURNS int
AS
$$
select t.idx
from unnest(p_input) with ordinality as t(i, idx)
where t.i > p_to_check
order by t.idx
limit 1
$$
LANGUAGE sql
IMMUTABLE
PARALLEL SAFE
;
The above uses the fact that the values in the array are already sorted: the first index whose value exceeds the threshold is also the position of the lowest such value, and sorting by the array index is therefore quite fast. I am not sure whether unnest() is guaranteed in this context to return the elements in the order they are stored in the array. If it is, you could remove the ORDER BY and make it even faster.
I don't think that there is a more efficient solution than yours, except if you write a dedicated C function for that.
Storing large arrays is often a good recipe for bad performance.
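Both SQL versions find the answer by looking at elements one by one. Because the array is sorted, a binary search could find it in about log2(100) ≈ 7 comparisons, which is the kind of thing a dedicated C function might do. The Python sketch below (not PostgreSQL code) contrasts the linear scan that the WITH ORDINALITY query expresses with a binary search using the standard bisect module; the function names are hypothetical, positions are 1-based like Postgres arrays, and None stands in for SQL NULL.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def first_greater_linear(values: Sequence[float], x: float) -> Optional[int]:
    # Same idea as the WITH ORDINALITY query: walk the elements in array order
    for idx, v in enumerate(values, start=1):
        if v > x:
            return idx
    return None  # no element is greater than x (the SQL function returns NULL)


def first_greater_bisect(values: Sequence[float], x: float) -> Optional[int]:
    # values must be sorted ascending; bisect_right returns the 0-based index
    # just past the last element <= x, i.e. the first element > x
    i = bisect_right(values, x)
    return i + 1 if i < len(values) else None


arr = [1.0, 3.0, 3.0, 5.0, 8.0]
print(first_greater_linear(arr, 3.0), first_greater_bisect(arr, 3.0))  # 4 4
print(first_greater_bisect(arr, 8.0))  # None
```

Like the original array_position version, duplicates resolve to the first occurrence of the lowest value greater than x.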

How to use computed values inside a function that demands an array

There's a handy Excel function called SMALL that lets you find the n-th smallest value from an array. For example: SMALL({35;10;5000;6},2) = 10, the second smallest number in the set.
You could use this function by referencing an array of cells (SMALL(A1:A10,2)) or you can write an array of constant values in the formula directly (SMALL({1;2;3},2)).
Is there a way to write an array of computed values directly in the formula? It should look something like this, if using RAND to generate the values:
SMALL({RAND();RAND();RAND()},2)
but Excel doesn't allow that.
How can you use a function (like RAND) inside another function that demands an array (like SMALL)?
Yes, I'm aware that the usual solution would be to put the computed values in their own individual cells, then just use that array as the input of SMALL. It would be great if I could do this all inside a single cell.
The general answer, as you may know, is to use an array formula.
This is a bit difficult to illustrate with RAND(), but if you take a more typical function like SQRT, the usage would be:
=SMALL(SQRT({9,4,1}),1)
So instead of providing a single argument to SQRT, you are providing a list of arguments, which it works through one at a time, returning a 3-element array {3,2,1} that is passed to SMALL to evaluate.
If the same list of numbers were in (say) A1:A3, you would need to enter this as an array formula using Ctrl+Shift+Enter:
=SMALL(SQRT(A1:A3),1)
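But, as Jeeped pointed out, it's often more convenient to use AGGREGATE, where 15 selects the SMALL function and option 6 ignores error values; it also handles the array argument without Ctrl+Shift+Enter: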
=AGGREGATE(15,6,SQRT(A1:A3),1)
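A Python sketch of what these formulas compute: the inner function is applied to every element first, then the k-th smallest value is taken from the resulting array. The names small, safe_sqrt and aggregate_small are hypothetical, and None stands in for an Excel error value such as #NUM!, which option 6 of AGGREGATE skips.

```python
import math


def small(values, k):
    # Excel SMALL: the k-th smallest value (1-based); Excel returns #NUM! if k is out of range
    ordered = sorted(values)
    if not 1 <= k <= len(ordered):
        raise ValueError("k out of range")
    return ordered[k - 1]


def safe_sqrt(x):
    # SQRT of a negative number is #NUM! in Excel; model that error as None
    return math.sqrt(x) if x >= 0 else None


def aggregate_small(values, k):
    # Like AGGREGATE(15, 6, ...): SMALL while ignoring error values
    return small([v for v in values if v is not None], k)


# =SMALL(SQRT({9,4,1}),1): SQRT maps {9,4,1} to {3,2,1}, then SMALL picks the smallest
print(small([math.sqrt(x) for x in (9, 4, 1)], 1))             # 1.0
# In Excel, plain SMALL would return #NUM! here; AGGREGATE skips the error
print(aggregate_small([safe_sqrt(x) for x in (9, -4, 1)], 1))  # 1.0
```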
As far as I know, you can't reproduce this behaviour with RAND because it is one of the few Excel functions that takes no arguments.
If you really did want to generate an array of random numbers r where 0 <= r < 1, you would have to do something like this:
=SMALL((RANDBETWEEN({0,0,0},10^20-1)/10^20),1)
i.e. use RANDBETWEEN with an arbitrarily large upper limit.
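The RANDBETWEEN trick works because each element of the constant array {0,0,0} gets its own independent draw. The Python sketch below mirrors that: one integer draw per element, scaled into [0, 1), then the smallest taken. The formula's 10^20 is more than a double can resolve, so the sketch assumes a scale of 2**53 instead, for which every k / 2**53 with k < 2**53 is exact and strictly below 1. The function name random_array is hypothetical.

```python
import random

SCALE = 2**53  # assumption: stands in for the formula's 10^20 (see above)


def random_array(n):
    # One independent draw per element, like RANDBETWEEN({0,0,0}, SCALE - 1),
    # then scaled into [0, 1); k / 2**53 is exact for any k < 2**53
    return [random.randint(0, SCALE - 1) / SCALE for _ in range(n)]


values = random_array(3)
print(sorted(values)[0])  # like SMALL(..., 1): the smallest of the three draws
```

In Python itself, random.random() already returns a float in [0, 1); the integer draw is only there to mirror the formula.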

Access array elements from string argument in Modelica

I have a task in Modelica where, within a function, I want to read values from a record (parameters) according to a given string argument, similar to the dictionary type in Python.
For example, I have a record containing coefficients for different media, and I want to read out the coefficients for methane, so my argument is the string "Methane".
Until now I have solved this by adding a second array to my coefficients record that stores the media names as strings. I loop over this array to find the requested medium name, then use the index I found to access the coefficients array.
This is obviously very complicated and leads to a lot of confusing code and nested for loops. Isn't there a more convenient way like the one Python presents with its dictionary type, where a string is directly linked to a value?
Thanks for the help!
There are several alternatives you can use. Here is the pattern I like most:
model M
  function index
    input String[:] keys;
    input String key;
    output Integer i;
  algorithm
    i := Modelica.Math.BooleanVectors.firstTrueIndex({k == key for k in keys});
  end index;
  constant String[3] keys = {"A","B","C"};
  Real[size(keys,1)] values = {1,2*time,3};
  Real c = values[index(keys,"B")] "Coefficient";
  annotation(uses(Modelica(version="3.2.1")));
end M;
I like this code because a Modelica compiler can make it efficient. You create a keys vector and a corresponding data vector. The reason it is not a record is that you want the keys vector to be constant, while the values may vary over time (a more generic dictionary than you asked for).
The compiler can then create a constant index for any constant names you want to look up. This makes sorting and matching better in the compiler (since there are no unknown indexes). If there is a key you want to look up at run time, the code works for that as well.
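For comparison, the same pattern is sketched in Python below (an analogy, not Modelica code): a constant keys vector, a parallel values vector, and a firstTrueIndex-style search returning a 1-based position. The sketch assumes, as I understand Modelica.Math.BooleanVectors.firstTrueIndex to behave, that 0 means "no match"; all names here are hypothetical.

```python
from typing import Sequence


def first_true_index(flags: Sequence[bool]) -> int:
    # 1-based index of the first True, or 0 if there is none
    for i, flag in enumerate(flags, start=1):
        if flag:
            return i
    return 0


def index(keys: Sequence[str], key: str) -> int:
    # Same shape as the Modelica helper: build {k == key for k in keys}, then search it
    return first_true_index([k == key for k in keys])


KEYS = ("A", "B", "C")  # constant keys, like `constant String[3] keys`


def coefficient(t: float, key: str = "B") -> float:
    values = [1.0, 2.0 * t, 3.0]  # may depend on time, like `values = {1,2*time,3}`
    i = index(KEYS, key)
    if i == 0:
        raise KeyError(key)  # indexing with 0 would be an error in Modelica too
    return values[i - 1]  # convert the 1-based index to Python's 0-based indexing


print(coefficient(0.5))  # 1.0, i.e. 2*time at time = 0.5
```

When the keys need not be constant, a plain Python dict such as {"Methane": ...} is the direct equivalent the question describes; the parallel-vector form is what lets a compiler resolve constant lookups ahead of time.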

Skip column in an array

I have a VBA function that returns an array to be displayed in Excel. The array's first two columns contain IDs that don't need to be displayed.
Is there any way to modify the Excel formula to skip the first two columns, without going back to create a VBA helper to strip off the columns?
The formula looks like this, where the brackets let the array be displayed across a span of cells:
{=GetCustomers($a$1)}
The closest thing Excel has to built-in array manipulation is the 'INDEX' function. In your case, if the array returned by your 'GetCustomers' routine is a single row, and if you know how long it is (which I guess you do since you're putting it into the sheet), you can get what you want by doing something like this:
=INDEX(GetCustomers($A$1),{3,4,5})
For example, if GetCustomers() returned the array {1,2,"a","b","c"}, the formula above would give back just {"a","b","c"}.
There are various ways to save yourself having to type out your array of indices, though. For example,
=COLUMN(C1:E1)
will return {3,4,5}, and you can use that instead:
=INDEX(GetCustomers($A$1),COLUMN(C1:E1))
This trick doesn't work with a true 2-D array, though. 'INDEX' will return a whole row or column if you pass in a zero in the right place, but I don't know how to make it return a 2-D subset. EDIT: You can do it, but it's cumbersome. Say your array is 2x5, and you want the last three columns. You could use:
=INDEX(GetCustomers($A$1), {1,1,1;2,2,2}, {3,4,5;3,4,5})
(FURTHER EDIT: chris neilsen provides a nice way to compute those arrays in his answer.)
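How INDEX behaves with array arguments can be sketched in Python: each position in the index arrays picks one element, so {3,4,5} picks three items from a single row, and the pair {1,1,1;2,2,2} / {3,4,5;3,4,5} builds a 2×3 result. The helper names and the customer data are hypothetical, indices are 1-based like Excel's, and INDEX's special treatment of 0 is not modelled.

```python
def index_1d(arr, positions):
    # INDEX(single_row_array, {3,4,5}): pick each 1-based position
    return [arr[p - 1] for p in positions]


def index_2d(arr, rows, cols):
    # INDEX(array, row_nums, col_nums) with same-shaped index arrays:
    # result[i][j] = arr[rows[i][j]][cols[i][j]], all 1-based
    return [[arr[r - 1][c - 1] for r, c in zip(row, col)]
            for row, col in zip(rows, cols)]


cols = list(range(3, 6))  # like COLUMN(C1:E1), which returns {3,4,5}
print(index_1d([1, 2, "a", "b", "c"], cols))  # ['a', 'b', 'c']

customers = [[101, 7, "Ann", "NY", 30],
             [102, 8, "Bob", "LA", 41]]
print(index_2d(customers, [[1, 1, 1], [2, 2, 2]], [[3, 4, 5], [3, 4, 5]]))
# [['Ann', 'NY', 30], ['Bob', 'LA', 41]]
```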
Charles Williams has a page on his website that explains more about using arrays like this:
http://www.decisionmodels.com/optspeedj.htm
He posted that in response to this question I asked:
Is there any documentation of the behavior of built-in Excel functions called with array arguments?
Personally, I use VBA helper functions for things like this. With the right library routines, you can do something like:
=subseq(GetCustomers($A$1),3,3)
in the 1-D case, or:
=hstack(subseq(asCols(GetCustomers($A$1)),3,3))
in the 2-D case, and it's much clearer.
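The simplest solution is to just hide the first two columns.
Another option may be to use OFFSET to resize the returned array, although OFFSET's first argument must be a reference to a cell or range, so this works on a range on the sheet rather than directly on the function's return value. The syntax is:
OFFSET(reference,rows,cols,height,width)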
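Applied to a range on the sheet, OFFSET amounts to slicing: shift the top-left corner by rows and cols, then keep a block of height × width. A Python sketch over a list of rows, where the offset helper and the sample data are hypothetical:

```python
def offset(grid, rows, cols, height, width):
    # Shift the top-left corner by (rows, cols), then keep height x width cells.
    # Unlike Excel, which returns #REF! if the block falls off the sheet,
    # Python slicing silently truncates.
    return [row[cols:cols + width] for row in grid[rows:rows + height]]


sheet = [[101, 7, "Ann", "NY", 30],
         [102, 8, "Bob", "LA", 41]]
# Like OFFSET(A1:E2, 0, 2, 2, 3): skip the two ID columns, keep 2 rows x 3 columns
print(offset(sheet, 0, 2, 2, 3))  # [['Ann', 'NY', 30], ['Bob', 'LA', 41]]
```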
I suggest modifying the 'GetCustomers' function to take an optional Boolean parameter that tells it whether to return all the columns or only the ones you want to display. That would be the cleanest solution, rather than trying to handle it on the formula side.
Public Function GetCustomers(rng As Range, Optional stripColumns As Boolean = False) As Variant()
    If stripColumns Then
        'Resize array to meet your needs
    Else
        'Return full sized array
    End If
End Function
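The "resize" branch left open above could be as simple as dropping the first two columns of each row. A Python sketch of the same optional-flag design, where get_customers, its hard-coded rows and strip_columns are hypothetical stand-ins for the real VBA function:

```python
def get_customers(key, strip_columns=False):
    # Hypothetical data; the real function would look customers up using `key`
    rows = [[101, 7, "Ann", "NY", 30],
            [102, 8, "Bob", "LA", 41]]
    if strip_columns:
        # Drop the two leading ID columns from every row
        return [row[2:] for row in rows]
    return rows  # full-sized array


print(get_customers("A1", strip_columns=True))  # [['Ann', 'NY', 30], ['Bob', 'LA', 41]]
```

Defaulting the flag to False means existing callers keep getting the full array unchanged.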
You can use the INDEX function to extract items from the returned array (the formula below is entered in a range starting at cell B2):
{=INDEX(getcustomers($A$1),ROW()-ROW($B$2)+1,COLUMN()-COLUMN($B$2)+3)}
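In that formula each cell computes its own position relative to B2: ROW()-ROW($B$2)+1 is the 1-based row number, and COLUMN()-COLUMN($B$2)+3 is the column number shifted past the two ID columns. A Python sketch of the same mapping, with sheet cells given as (row, column) numbers and made-up data; cell_value is a hypothetical name:

```python
ANCHOR_ROW, ANCHOR_COL = 2, 2  # cell B2 is row 2, column 2


def cell_value(arr, row, col):
    # What the formula in the sheet cell at (row, col) returns
    r = row - ANCHOR_ROW + 1  # ROW()-ROW($B$2)+1
    c = col - ANCHOR_COL + 3  # COLUMN()-COLUMN($B$2)+3, skipping two ID columns
    return arr[r - 1][c - 1]  # INDEX is 1-based


data = [[101, 7, "Ann", "NY", 30],
        [102, 8, "Bob", "LA", 41]]
# Fill a 2 x 3 block starting at B2 (rows 2-3, columns B-D)
block = [[cell_value(data, row, col) for col in range(2, 5)] for row in range(2, 4)]
print(block)  # [['Ann', 'NY', 30], ['Bob', 'LA', 41]]
```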
