Read Array of Hashes content

I've pushed the results of my DBI query into an Array of Hashes (AoH) and called a subroutine with these AoHs as input (the same subroutine with different AoHs). As I don't know the exact size of each AoH, I'd like to determine it dynamically. Is there any way to get the number of fields/columns in the AoH?
Something like scalar @inContent, but I need only the horizontal size.
Based on the actual size of the AoH, I'd like to iterate over it and read its content. At the moment I address fields by name, but the names vary from AoH to AoH, so it's not a very effective solution:
foreach my $row (@inContent) {
    print $row->{ID};
}
but I would like something like this:
print $row->[0]->value;
Thank you for your help in advance.

The number of keys in the first hash in the array is
scalar keys %{$inContent[0]}
and the rest of them should have the same set of keys since it's a DBI query, so it's a good measure of "horizontal size".
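Python has no AoH as such, but a list of dicts is a close analogue, so here is a language-neutral sketch of the idea (Python, not Perl; the helper names column_count and rows_by_position and the sample data are made up for illustration). The column count is the key count of the first row, as in the answer above, and positional access works by fixing one column order up front, taken from the first row, instead of relying on each row's own key order. Perl hashes in particular do not keep their keys in any fixed order, so this step matters there.

```python
def column_count(rows):
    """Number of columns ("horizontal size"): keys in the first row, 0 if empty."""
    return len(rows[0]) if rows else 0


def rows_by_position(rows):
    """Yield each row as a list of values, all rows using one fixed column order."""
    if not rows:
        return
    columns = list(rows[0])  # column names, taken once from the first row
    for row in rows:
        yield [row[name] for name in columns]


# Hypothetical query result; the column names differ from table to table
in_content = [
    {"ID": 1, "NAME": "alpha"},
    {"ID": 2, "NAME": "beta"},
]

print(column_count(in_content))  # 2
for values in rows_by_position(in_content):
    print(values[0])  # first column, whatever it is called: prints 1, then 2
```

In Perl itself the equivalent would be to take the key list once and index every row through it, or to ask DBI for array rows instead of hash rows in the first place.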


Checking for duplicates per column in a dynamic Excel array

I'm trying to generate a formula in Excel that checks each column of a dynamic array for duplicates and then returns a new 1-dimensional array containing True where a column has duplicates and False where it does not. See the simplified example below.
Example array:
{a,b,c,d;
d,e,f,g;
a,h,i,j}
The output of the formula should result in {True,False,False,False}. What can I try next?
Seemingly simple, agreed, though until Microsoft releases functions such as BYCOL it's anything but, assuming you want to obtain your array output with a single formula.
One option would be:
=LET(ρ,A1:D3,κ,ROWS(ρ),ε,COLUMNS(ρ),η,SEQUENCE(κ*ε,,0),γ,MATCH(ρ,INDEX(ρ,1+MOD(η,κ),1+QUOTIENT(η,κ)),0)+SEQUENCE(,ε)/10^6,MMULT(SEQUENCE(,κ),N(INDEX(FREQUENCY(γ,γ)=0,SEQUENCE(κ,ε))))>0)
Replace A1:D3 as required.
Hopefully someone will come along with an improvement.
If you have BYCOL(), you can use something like:
=BYCOL({a,b,c,d;d,e,f,g;a,h,i,j},LAMBDA(x,COUNTA(UNIQUE(x))<>ROWS(x)))
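To make the per-column test itself concrete, here is the BYCOL logic (the distinct count differs from the row count) as a small Python sketch; column_has_duplicates is a hypothetical helper, and the grid is the example array from the question. One difference to keep in mind: Python's set compares text case-sensitively, while Excel's UNIQUE treats "A" and "a" as the same value.

```python
def column_has_duplicates(grid):
    """Return one bool per column: True if that column contains a repeated value."""
    if not grid:
        return []
    result = []
    for col in zip(*grid):  # zip(*grid) transposes rows into columns
        # Same test as COUNTA(UNIQUE(x))<>ROWS(x): distinct count vs. row count
        result.append(len(set(col)) != len(col))
    return result


example = [
    ["a", "b", "c", "d"],
    ["d", "e", "f", "g"],
    ["a", "h", "i", "j"],
]
print(column_has_duplicates(example))  # [True, False, False, False]
```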
Not particularly dynamic, as CHOOSE can't be set up that way, but if you have a small number of columns you can use the following:
=LET(data,E4:H6,CHOOSE(SEQUENCE(1,4),
IF(COUNTA(UNIQUE(INDEX(data,,1)))=ROWS(data),FALSE,TRUE),
IF(COUNTA(UNIQUE(INDEX(data,,2)))=ROWS(data),FALSE,TRUE),
IF(COUNTA(UNIQUE(INDEX(data,,3)))=ROWS(data),FALSE,TRUE),
IF(COUNTA(UNIQUE(INDEX(data,,4)))=ROWS(data),FALSE,TRUE))
)

How to parse an array to a hash of arrays?

I'm a beginner (a wet-lab biologist who has to fiddle a bit with bioinformatics for the first time in my life), and today I got stuck on one problem: how do I parse an array into a hash of arrays in Perl?
This doesn't work:
@myhash{$key} = @mytable;
I've finally circumvented my problem with a for loop:
for(my $i=0;$i<=$#mytable;$i++){$myhash{$key}[$i]=$mytable[$i]};
Of course it works and does what I need, but it seems to me less a solution to my problem than a way to circumvent it. When something doesn't work, I like to understand why.
Thank you very much for your advice!
If you are asking how to put an array as one value of a hash, you do this by taking a reference to the array, since references are scalars and the values of hashes must be scalars. This is done with the backslash operator.
$myhash{$key} = \@mytable;
The for loop you describe creates such a reference through autovivification, as $myhash{$key}[0] creates an array reference at $myhash{$key} in order to assign to its index. Also note that the difference between taking a reference and copying each value is that in the former case, changes to the array after the fact will also affect the values referenced via the hash value, and vice versa.
$mytable[5] = 42; # $myhash{$key}[5] is also changed
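The aliasing described above is not specific to Perl. As a rough Python analogue (illustration only; the names mirror the question), storing a list in a dict stores a reference to that one list, so changes made through either name are visible through the other:

```python
mytable = [10, 20, 30]
myhash = {}

# Analogue of $myhash{$key} = \@mytable: the dict value IS the same list object
myhash["key"] = mytable

mytable[1] = 42           # change through the original name...
print(myhash["key"])      # [10, 42, 30]

myhash["key"].append(99)  # ...and "vice versa": change through the dict
print(mytable)            # [10, 42, 30, 99]
```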
As Grinnz mentioned, you can save a reference to an array, but any later change to the array will be reflected in the hash (it is the same data).
For example, if you reuse the same array in a loop, every hash entry will point to it and show only the data from the last iteration of the loop.
In such a case you want a copy of the array stored in the hash:
@{$hash{$key}} = @array;
Programming Perl: Data Structures
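The loop pitfall can be reproduced the same way. In this Python sketch (hypothetical data and names), one list is reused across iterations: storing the list itself leaves every key pointing at the final contents, while storing a copy, the analogue of @{$hash{$key}} = @array, keeps each iteration's values.

```python
# Hypothetical input: each record is (key, values)
records = [("geneA", [1, 2]), ("geneB", [3, 4])]

shared = []
by_reference = {}
by_copy = {}
for key, values in records:
    shared.clear()             # reuse the same list object on every iteration
    shared.extend(values)
    by_reference[key] = shared     # every key points at the same list
    by_copy[key] = list(shared)    # snapshot of this iteration's contents

print(by_reference)  # {'geneA': [3, 4], 'geneB': [3, 4]}
print(by_copy)       # {'geneA': [1, 2], 'geneB': [3, 4]}
```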

How to concatenate multiple ranges within a Match function

I have a list of values that I would like to match against the combination of multiple ranges.
So, for example, my ranges are A1:A100 and B1:B100.
Instead of concatenating A with B in a new column C, i.e.
CONCAT(A1,B1)...CONCAT(A100,B100)
and then matching my value against that new column - I would like to do something like this:
MATCH(value,CONCATENATE(A1:B100),0)
And copy this down a column near my list of values.
I have a feeling this can be done with some sort of array formula...
Yes as an array formula:
=MATCH(value,$A$1:$A$100 & $B$1:$B$100,0)
Being an array formula, it must be confirmed with Ctrl-Shift-Enter instead of Enter when exiting edit mode.
Though the two may seem similar in approach, they are not. CONCATENATE returns a string, not an array, to MATCH, with all 200 values in one long string, whereas the formula above returns 100 values, one concatenated value per row, as an array that MATCH can search.
One further note: if performance becomes an issue, keep in mind that array formulas are inherently slower; adding the helper column and using a regular MATCH will improve responsiveness.
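The row-by-row concatenation that MATCH searches can be modelled in a few lines of Python. This is only an illustration of the semantics, not how Excel evaluates the formula; match_concat and its sep parameter are hypothetical, and returning None stands in for Excel's #N/A.

```python
def match_concat(value, col_a, col_b, sep=""):
    """1-based position of value among the row-wise concatenations, or None.

    Mirrors MATCH(value, A1:A100 & B1:B100, 0): build one string per row,
    then search that list for an exact match.
    """
    # The f-string turns numbers into text, roughly as & does in Excel
    keys = [f"{a}{sep}{b}" for a, b in zip(col_a, col_b)]
    try:
        return keys.index(value) + 1  # MATCH positions are 1-based
    except ValueError:
        return None  # Excel would return #N/A here


col_a = ["x", "y", "z"]
col_b = ["1", "2", "3"]
print(match_concat("y2", col_a, col_b))  # 2
```

The sep parameter defaults to an empty string to mirror the & formula. A separator can be worth considering in either version, because plain concatenation can make different pairs collide: "ab" & "c" and "a" & "bc" both give "abc".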
This should work; basically, you just need to concatenate it yourself using &:
=MATCH(D1,A1:A10&B1:B10,0)
D1 is the value you're trying to look for.
This is an array formula, so remember to press Ctrl+Shift+Enter when you enter it.

How to sort an array and find the two highest peaks after using find_peaks from Scipy

I am struggling with find_peaks a little...
I'm applying a cubic spline to some data, from which I want to extract some peaks. However, the data may have several peaks, while I only want the two largest. I can find the peaks with
peak_data = signal.find_peaks(spline, height=0.3, distance=50)
I can use this to get the x and y values at the index points within peak_data
peak_vals = spline[peak_data[0]]
time_vals = xnew[peak_data[0]]  # xnew being the splined x-axis
I thought I could sort peak_vals, keep the first two values (i.e. the highest and second-highest peaks) and then use them to get the times from xnew that coincide with those values. However, I am unable to use .sort, which returns
AttributeError: 'tuple' object has no attribute 'sort'
or sorted() which returns
TypeError: '>' not supported between instances of 'int' and 'dict'
I think this is because it is indexing from a NumPy array (the spline data) and therefore creates another NumPy array, which then does not work with either of the sort commands.
The best I can manage is to iterate through to a new list and then grab the first two values from that:
peak_val1 = []
peak_vals = spline[peak_data[0]]
for i in peak_vals:
    peak_val1.append(i)
peak_val1.sort(reverse=True)
peak_val2 = peak_val1[0:2]
This works, but it seems a tremendously long-winded way to do this, given that I still need to index the time values. Surely there must be a faster (simpler) way?
Added note: I realise that find_peaks returns an index list, but it actually seems to contain both indices and max values in an array-dictionary?? (Sorry, I'm very new to Python; curly braces mean dictionary, but it doesn't look like a simple dict.) Anyway, print(peak_data) returns both the index positions and their values:
(array([ 40, 145, 240, 446]), {'peak_heights': array([0.34588031, 0.43761898, 0.45778744, 0.74167977])})
Is there a way to directly access these data perhaps?
You can do this:
peak_indices, peak_dict = signal.find_peaks(spline, height=0.3, distance=50)
This returns the indices of all the peaks in the array, as well as a dictionary of information about the peaks, such as their heights, prominences, etc. To get the heights of the peaks you can access the dictionary like this:
peak_heights = peak_dict['peak_heights']
Then to find the indices of the highest and second-highest peak you can do:
highest_peak_index = peak_indices[np.argmax(peak_heights)]
second_highest_peak_index = peak_indices[np.argpartition(peak_heights,-2)[-2]]
Hope this helps someone somewhere:))
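A compact sketch that ranks all peaks at once with np.argsort, so the two tallest peaks and their x positions come out together, tallest first. The input arrays are copied from the print(peak_data) output in the question; xnew here is a hypothetical stand-in because the real spline axis isn't shown, and the helper name two_highest_peaks is made up for illustration.

```python
import numpy as np


def two_highest_peaks(peak_indices, peak_heights, x):
    """Return (indices, heights, x-values) of the two tallest peaks, tallest first."""
    peak_indices = np.asarray(peak_indices)
    peak_heights = np.asarray(peak_heights)
    # argsort ranks heights in ascending order; reverse it and keep two positions
    order = np.argsort(peak_heights)[::-1][:2]
    top_idx = peak_indices[order]
    return top_idx, peak_heights[order], np.asarray(x)[top_idx]


# Values copied from the print(peak_data) output in the question
peak_indices = np.array([40, 145, 240, 446])
peak_dict = {"peak_heights": np.array([0.34588031, 0.43761898, 0.45778744, 0.74167977])}

# Hypothetical x-axis standing in for xnew (the real one is not shown)
xnew = np.linspace(0.0, 10.0, 500)

idx, heights, times = two_highest_peaks(peak_indices, peak_dict["peak_heights"], xnew)
print(idx)  # [446 240]
```

With real data, the first two arguments would come straight from peak_indices, peak_dict = signal.find_peaks(spline, height=0.3, distance=50). If fewer than two peaks pass the height threshold, the sketch simply returns fewer entries.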
Just posting this in case anyone else has similar trouble, and to remind myself to read the docs carefully in future!
find_peaks() returns a tuple containing an array of the indices of the peaks and a dictionary of peak properties; because height was passed, that dictionary also holds the actual peak values under 'peak_heights'. Once I realised this, it was pretty simple to use sequence unpacking to get a separate array and dictionary. So if I began with
peak_data = signal.find_peaks(spline, height=0.3, distance=50)
all I needed to do was to unpack to two variables
peak_index, dict_vals = peak_data
Now I have the indices and the peak values (in dict_vals['peak_heights']), in the order they were identified.

Skip column in an array

I have a VBA function that returns an array to be displayed in Excel. The array's first two columns contain IDs that don't need to be displayed.
Is there any way to modify the Excel formula to skip the first two columns, without going back to create a VBA helper to strip off the columns?
The formula looks like this, where the brackets let the array be displayed across a span of cells:
{=GetCustomers($a$1)}
The closest thing Excel has to built-in array manipulation is the 'INDEX' function. In your case, if the array returned by your 'GetCustomers' routine is a single row, and if you know how long it is (which I guess you do since you're putting it into the sheet), you can get what you want by doing something like this:
=INDEX(GetCustomers($A$1),{3,4,5})
So say GetCustomers() returned the array {1,2,"a","b","c"}, the above would just give back {"a","b","c"}.
There are various ways to save yourself having to type out your array of indices, though. For example,
=COLUMN(C1:E1)
will return {3,4,5}, and you can use that instead:
=INDEX(GetCustomers($A$1),COLUMN(C1:E1))
This trick doesn't work with a true 2-D array, though. 'INDEX' will return a whole row or column if you pass in a zero in the right place, but I don't know how to make it return a 2-D subset. EDIT: You can do it, but it's cumbersome. Say your array is 2x5, and you want the last three columns. You could use:
=INDEX(GetCustomers($A$1), {1,1,1;2,2,2}, {3,4,5;3,4,5})
(FURTHER EDIT: chris neilsen provides a nice way to compute those arrays in his answer.)
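To see why the paired index arrays pick out a 2-D block, here is a small Python model of INDEX called with two same-shaped arrays of 1-based row and column numbers. It is a sketch of the behaviour described above, not Excel's implementation; index_2d and the 2x5 customers data are hypothetical.

```python
def index_2d(array, row_nums, col_nums):
    """Pick array[r-1][c-1] for each paired (r, c), like INDEX(array, rows, cols)
    when rows and cols are same-shaped arrays of 1-based positions."""
    return [
        [array[r - 1][c - 1] for r, c in zip(r_row, c_row)]
        for r_row, c_row in zip(row_nums, col_nums)
    ]


# Hypothetical 2x5 result: two ID columns followed by three display columns
customers = [
    [101, 7, "Ann", "Leeds", "UK"],
    [102, 8, "Bob", "Lyon", "FR"],
]
rows = [[1, 1, 1], [2, 2, 2]]  # {1,1,1;2,2,2}
cols = [[3, 4, 5], [3, 4, 5]]  # {3,4,5;3,4,5}
print(index_2d(customers, rows, cols))
# [['Ann', 'Leeds', 'UK'], ['Bob', 'Lyon', 'FR']]
```

For a plain Python list of rows the same result is simply [row[2:] for row in customers]; the model is only there to show how each (row, column) pair maps to one output cell.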
Charles Williams has a page on his website that explains more about using arrays like this:
http://www.decisionmodels.com/optspeedj.htm
He posted that in response to this question I asked:
Is there any documentation of the behavior of built-in Excel functions called with array arguments?
Personally, I use VBA helper functions for things like this. With the right library routines, you can do something like:
=subseq(GetCustomers($A$1),3,3)
in the 1-D case, or:
=hstack(subseq(asCols(GetCustomers($A$1)),3,3))
in the 2-D case, and it's much more clear.
The simplest solution is to just hide the first two columns.
Another option may be to use OFFSET to resize the returned array. The syntax is:
OFFSET(reference,rows,cols,height,width)
I suggest modifying the GetCustomers function to take an optional Boolean parameter that tells it whether to return all the columns or strip the ID columns. That would be the cleanest solution, instead of trying to handle it on the formula side.
Public Function GetCustomers(rng As Range, Optional stripColumns As Boolean = False) As Variant()
    Dim result() As Variant
    ' ... build the full customer array in result, as the existing code does ...
    If stripColumns Then
        ' Resize the array to meet your needs (e.g. drop the two ID columns)
    Else
        ' Return the full-sized array unchanged
    End If
    GetCustomers = result
End Function
You can use the INDEX function to extract items from the returned array. With the formula entered in a range starting at cell B2:
{=INDEX(getcustomers($A$1),ROW()-ROW($B$2)+1,COLUMN()-COLUMN($B$2)+3)}
