Finding key for minimum value and conditions in Excel - arrays

This is my table (copied from the similar question Finding minimum value in index(match) array [EXCEL])
A                    B    C    D
tasmania             10   3    10
queensland           22   8    10
new south wales      10   12   12
northern territory   8    4    15
south australia      12   2    8
western australia    32   4    15
tasmania             72   6    16
I have criteria for B and C, and I want to retrieve the value in A with the lowest corresponding value in D. Values in B, C and D can be duplicates; values in A cannot.
Example:
B >= 8
C >= 4
This should return "queensland" (the lowest matching D is 10), not "tasmania" (whose first row has the same D of 10 but fails C >= 4).
I am currently trying this array formula:
{ =MIN(IF(B:B>=8;IF(C:C>=4;D:D;""))) }
This returns the correct lowest D, but because the information about A is lost, I cannot retrieve the corresponding value from A.

Entered as an array formula (Ctrl+Shift+Enter), this should work for you:
=INDEX($A$1:$A$7,MATCH(MIN(IF($B$1:$B$7>=8,IF($C$1:$C$7>=4,$D$1:$D$7))),IF($B$1:$B$7>=8,IF($C$1:$C$7>=4,$D$1:$D$7)),0))
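Note that if you have Excel 2019 or Microsoft 365, you also have the MINIFS function, which is probably better suited for this task. It only returns the minimum value itself, though, so you still need a lookup that applies the same conditions to get the name from A (a plain MATCH on column D would find the first "tasmania" row, which also has D = 10). I don't have the newest version, so I am unable to test this.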
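To make the logic of the INDEX/MATCH formula explicit, here is a minimal Python sketch of the same conditional arg-min: keep only rows that satisfy both criteria, take the smallest D among them, and return A from the first such row (MATCH with match type 0 also stops at the first hit). The tuple layout and the name `key_of_conditional_min` are illustrative assumptions, not part of Excel.

```python
from typing import Optional, Sequence, Tuple

Row = Tuple[str, float, float, float]  # (A, B, C, D), as in the table above


def key_of_conditional_min(rows: Sequence[Row], min_b: float, min_c: float) -> Optional[str]:
    """Return A from the first row with the smallest D among rows where B >= min_b and C >= min_c."""
    best_key: Optional[str] = None
    best_d: Optional[float] = None
    for a, b, c, d in rows:
        if b >= min_b and c >= min_c:          # same filter as IF(B>=8, IF(C>=4, D))
            if best_d is None or d < best_d:   # strict '<' keeps the first row on ties, like MATCH(..., 0)
                best_key, best_d = a, d
    return best_key  # None when no row qualifies (the formula would end in #N/A)


table = [
    ("tasmania", 10, 3, 10),
    ("queensland", 22, 8, 10),
    ("new south wales", 10, 12, 12),
    ("northern territory", 8, 4, 15),
    ("south australia", 12, 2, 8),
    ("western australia", 32, 4, 15),
    ("tasmania", 72, 6, 16),
]

print(key_of_conditional_min(table, 8, 4))  # queensland
```

The first "tasmania" row is skipped by the C >= 4 test before its D of 10 is ever compared, which is exactly why the conditions must be applied inside the lookup and not only inside MIN.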

Related

Query min column header while excluding blanks and handling duplicates

I have the following table (Bob and Joe each have one blank score cell):
Name Score A Score B Score C
Bob 8 6
Sue 9 12 9
Joe 11 2
Susan 7 9 10
Tim 10 12 4
Ellie 9 8 7
In my actual table there are about 2k rows.
I am trying to get, for each person, the header of the column holding their minimum score (excluding blanks and handling duplicate scores) into another column, using QUERY or ARRAYFORMULA, mainly to avoid entering a formula on every row.
Currently I have this:
=INDEX($B$1:$D$1,MATCH(MIN(B2:D2),B2:D2,0))
But that involves dragging the formula down through every cell, and since I do this on a few sheets with circa 2k rows each, it is very slow when entering new data.
This should be the end result:
Name Score A Score B Score C Min Score
Bob 8 6 Score C
Sue 9 12 9 Score A
Joe 11 2 Score B
Susan 7 9 10 Score A
Tim 10 12 4 Score C
Ellie 9 8 7 Score C
Use:
=INDEX(SORTN(SORT(SPLIT(QUERY(FLATTEN(
IF(B2:D="",,B1:D1&"×"&B2:D&"×"&ROW(B2:D))),
"where Col1 is not null", ),
"×"), 3, 1, 2, 1), 9^9, 2, 3, 1),, 1)
The following answer uses three newer functions (BYROW, LAMBDA and XLOOKUP) that Google was still rolling out when this was written, so they may not yet be available in every account (this worked for me in the Android version of Sheets):
=arrayformula(if(len(A2:A),byrow(B2:D,lambda(row,xlookup(min(row),row,B1:D1))),))
Assuming the names are in column A, this should give a result for every row that has a name in it. I'm sure there are other ways of doing this, but these row-wise and column-wise problems are ideal use cases for LAMBDA and its helper functions such as BYROW.
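The BYROW/XLOOKUP version does the work row by row instead: take the minimum of the non-blank cells, then return the header of the first column holding that value, and leave rows without a name empty. A minimal Python sketch of that per-row logic (function names are hypothetical; None stands in for blank cells):

```python
from typing import List, Optional, Sequence


def min_header(headers: Sequence[str], row: Sequence[Optional[float]]) -> Optional[str]:
    values = [v for v in row if v is not None]   # MIN skips blank cells
    if not values:
        return None                               # all blank: the Sheets formula would most likely error here
    lowest = min(values)
    return headers[list(row).index(lowest)]      # first matching column, as XLOOKUP returns the first match


def min_headers_by_row(names: Sequence[str], headers: Sequence[str],
                       rows: Sequence[Sequence[Optional[float]]]) -> List[Optional[str]]:
    # Mirrors if(len(A2:A), byrow(...), ): rows without a name get an empty result
    return [min_header(headers, row) if name else None for name, row in zip(names, rows)]


score_headers = ["Score A", "Score B", "Score C"]
names = ["Sue", "Joe", "Tim", ""]
rows = [[9, 12, 9], [11, 2, None], [10, 12, 4], [None, None, None]]
print(min_headers_by_row(names, score_headers, rows))
# ['Score A', 'Score B', 'Score C', None]
```

Because the result is computed per row, it stays aligned with the names even when some rows are empty, unlike the flatten-and-deduplicate approach.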

How to select the values greater than the mean in an array?

I want to apply feature selection to a dataset (lung.mat).
After loading the data, I computed, for each feature, the mean Jaccard distance between it and the other features. Then I sorted these distances in descending order into B1, selected a number of the features (for example 25), and saved the resulting matrix in databs1.
I want to select the features that have distance values greater than the mean of the array (B1).
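close all;
clc;
load lung.mat                 % provides the variable lung
data = lung;
[n, m] = size(data);
t1 = zeros(m);                % preallocate the pairwise distance matrix
for i = 1:m-1
    for j = i+1:m
        % fjaccard (not a built-in) is assumed to return the Jaccard distance between two columns
        t1(i,j) = fjaccard(data(:,i), data(:,j));
    end
end
t1 = t1 + t1';                % mirror the upper triangle so every pair counts for both columns
b1 = sum(t1) / (m-1);         % mean distance of each column to the others, computed once after the loops
[B1, indB1] = sort(b1, 'descend');
databs1 = data(:, indB1(1:25));   % the 25 columns with the largest mean distance
databs1 = [databs1, data(:,m)];   % append the last column of data
save('databs1.mat', 'databs1');   % save only databs1, not the whole workspace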
I would be grateful for your advice on how to do this in B1: keep only the values of B1 that are greater than the mean of B1 and cut off the smaller ones.
I used this line:
B1(B1>mean(B1(:)))
After running it, B1 still has as many columns as the full dataset has features; for example, lung.mat has 57 features and B1 still has 57 columns after this line.
I expected this line to cut B1 down to the features whose values are greater than the mean of B1.
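Judging by your code, the selection itself already seems clear to you. Note, though, that an expression such as B1(B1>mean(B1(:))) only returns the selected elements; B1 itself keeps all of its columns unless you assign the result back, e.g. keep = B1>mean(B1); B1 = B1(keep); indB1 = indB1(keep); (filtering indB1 with the same mask keeps the feature indices aligned). The general pattern: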
a=randi(10,1,10) %example data
a>mean(a) %get binary matrix of which elements are larger than mean
a(a>mean(a)) %select elements from a that are larger than mean
a =
1 9 10 7 8 8 4 7 2 8
ans =
1×10 logical array
0 1 1 1 1 1 0 1 0 1
ans =
9 10 7 8 8 7 8
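The same mask-then-index idea, applied to keeping sorted scores and their feature indices together, as a minimal Python sketch: the two parallel lists stand in for B1 and indB1 (an assumption for illustration), and one boolean mask filters both so they stay aligned. The sample values are the `a` vector from the output above.

```python
from typing import List, Sequence, Tuple


def above_mean(values: Sequence[float], indices: Sequence[int]) -> Tuple[List[float], List[int]]:
    mean = sum(values) / len(values)
    keep = [v > mean for v in values]                      # like B1 > mean(B1)
    kept_values = [v for v, k in zip(values, keep) if k]    # like B1 = B1(keep)
    kept_indices = [i for i, k in zip(indices, keep) if k]  # like indB1 = indB1(keep)
    return kept_values, kept_indices


a = [1, 9, 10, 7, 8, 8, 4, 7, 2, 8]
vals, idx = above_mean(a, list(range(1, 11)))  # 1-based positions, as in MATLAB
print(vals)  # [9, 10, 7, 8, 8, 7, 8]
print(idx)   # [2, 3, 4, 5, 6, 8, 10]
```

The key point is that the filtered lists are new objects that must be kept (assigned); the inputs are left unchanged.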

Saving a hash table in C so that random access is faster

I am writing a C program (call it database generation) that processes an input file and generates a number in the range [1, 10^8] along with a sequence of float values whose length is fixed but unknown, followed by 3 integers. All values are separated by spaces.
Example:
19432 23.45 32.12 45.76 ...(156 such float values) 4 6 106
This will be one line of the database, where the first number is the hash index (1 to 10^8) and the last 3 integers denote the x and y coordinates and the document ID, respectively.
Our database is saved in the file xyz, which has the following content:
2341 34.67 43.13 ... (234 such float values) 5 8 123
2352 46.92 41.89 ... (51 such float values) 1 9 145
2352 46.92 41.89 ... (98 such float values) 2 7 12
2359 12.71 72.90 ... (141 such float values) 8 12 13
The starting number (the hash index value) is always in non-decreasing order as we proceed from one line of the database to the next.
I have another C program (call it retrieval) that takes a hash index value as input and should output all lines starting with that value.
I have 2 questions:
How can I make sure that retrieval jumps directly to the lines containing the requested hash index value, skipping the preceding lines of the database, so that it responds quickly?
When I get another input file for the database and its hash index value is 2352, how do I add another line starting with 2352 at its proper position in the database?
I am considering the following approach, which is not ideal because the database will no longer be in the required non-decreasing order of hash index values. Also, the database is split into 2 components: one contains the byte offset entries for each hash index, and the other is the database file presented above.
It involves:
(1) byte-offset.txt, of the form:
2341 byte-pos-1
2352 byte-pos-2
2359 byte-pos-3
2352 byte-pos-4
(2) database.txt, of the form:
2341 34.67 43.13 ... (234 such float values) 5 8 123
2352 46.92 41.89 ... (51 such float values) 1 9 145
2359 12.71 72.90 ... (141 such float values) 8 12 13
2352 46.92 41.89 ... (98 such float values) 2 7 12
The only good thing about it is that new entries can be appended to the end of each file as the database grows with more data.
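One common way to get both fast lookups and cheap appends is to keep the data file append-only and keep only the index sorted: new lines go at the end of the data file, while the (hash, byte offset) pairs are kept in hash order, so a lookup can binary-search for the first matching hash and then seek straight to each offset. A minimal Python sketch of that idea, using an in-memory index and an `io.BytesIO` stand-in for database.txt; the class and method names are hypothetical. In C the same lookup would be a lower-bound binary search over an array of (hash, offset) pairs followed by `fseek` to each offset.

```python
import bisect
import io
from typing import BinaryIO, List, Tuple


class OffsetIndex:
    """Hypothetical sorted (hash, byte offset) index over an append-only, newline-delimited data file."""

    def __init__(self, data: BinaryIO) -> None:
        self.data = data
        self.entries: List[Tuple[int, int]] = []  # kept sorted by (hash, offset)

    def append(self, line: str) -> None:
        self.data.seek(0, io.SEEK_END)
        offset = self.data.tell()                   # byte position where the new line starts
        self.data.write(line.rstrip("\n").encode() + b"\n")
        key = int(line.split(maxsplit=1)[0])        # first field is the hash index
        bisect.insort(self.entries, (key, offset))  # the index stays sorted even though the file is not

    def lookup(self, key: int) -> List[str]:
        lines: List[str] = []
        pos = bisect.bisect_left(self.entries, (key, -1))  # first entry with this hash (offsets are >= 0)
        while pos < len(self.entries) and self.entries[pos][0] == key:
            self.data.seek(self.entries[pos][1])           # jump straight to the line, no scanning
            lines.append(self.data.readline().decode().rstrip("\n"))
            pos += 1
        return lines


db = OffsetIndex(io.BytesIO())  # stand-in for the real data file
for sample in ["2341 34.67 43.13 5 8 123",  # float runs shortened for the example
               "2352 46.92 41.89 1 9 145",
               "2359 12.71 72.90 8 12 13",
               "2352 46.92 41.89 2 7 12"]:
    db.append(sample)
print(db.lookup(2352))
# ['2352 46.92 41.89 1 9 145', '2352 46.92 41.89 2 7 12']
```

With this split, the data file never needs to be in hash order; only the index does, and an appended byte-offset.txt could simply be sorted once when retrieval loads it.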

Exclude blank/FALSE cells in Excel array IF formula output

I am having difficulties with making an array formula work the way I want it to work.
I want to extract values from an unsorted column of dates into a new column. The formula below identifies the cells for a given month and year, but the results appear in their original rows rather than at the top of the output range. I also want all ""/FALSE cells to be excluded from the output array.
=IF((MONTH($I$15:$I$1346)=1)*(YEAR($I$15:$I$1346)=2008),$I$15:$I$1346,"")
Ideally, the range $I$15:$I$1346 should be dynamic and extend to the last filled cell (I could define a named range for that).
Part two is to extend that formula so that it calculates on the data two columns to the right of the data described above.
Is it possible to build the above into a single cell, probably with a combination of IF, INDEX, SMALL and maybe other functions?
I'm not looking for a filter solution. Hope the above is clear enough and that you can help!
Here's a shortened sample layout:
A B C
1 Date Series_A Series_B
2 03/01/2011 45 20
3 04/01/2011 73 30
4 06/01/2011 95 40
5 08/01/2011 72 50
6 06/02/2011 5 13
7 09/02/2011 12 #N/A
8 05/02/2011 23 65
9 07/03/2011 12 65
Then I want three input cells for the year, the month and the series name (using INDEX/MATCH, as there are many more columns of data). If the inputs are 2011, Feb and Series_A, I want it to calculate the average for that month, which here would be (5+12+23)/3. If the inputs are Feb-2011 and Series_B instead, which contains an error, it should show (13+65)/2 rather than an error.
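The averaging rule described above (filter by year and month, pick the series column by name, skip error cells) can be modelled in a few lines. This Python sketch represents #N/A as None, uses a dictionary key in place of INDEX/MATCH for the series name, and reads the sample dates as dd/mm/yyyy, which is an assumption based on the expected Feb-2011 results:

```python
from datetime import date
from typing import Dict, List, Optional, Sequence, Tuple

# One entry per sheet row: (date, {series name: value}); None stands in for #N/A.
Row = Tuple[date, Dict[str, Optional[float]]]

SAMPLE: List[Row] = [
    (date(2011, 1, 3), {"Series_A": 45, "Series_B": 20}),
    (date(2011, 1, 4), {"Series_A": 73, "Series_B": 30}),
    (date(2011, 1, 6), {"Series_A": 95, "Series_B": 40}),
    (date(2011, 1, 8), {"Series_A": 72, "Series_B": 50}),
    (date(2011, 2, 6), {"Series_A": 5, "Series_B": 13}),
    (date(2011, 2, 9), {"Series_A": 12, "Series_B": None}),
    (date(2011, 2, 5), {"Series_A": 23, "Series_B": 65}),
    (date(2011, 3, 7), {"Series_A": 12, "Series_B": 65}),
]


def monthly_average(rows: Sequence[Row], series: str, year: int, month: int) -> Optional[float]:
    """Average of `series` over rows in the given year/month, skipping error (None) cells."""
    values = [
        cells[series]
        for day, cells in rows
        if day.year == year and day.month == month and cells[series] is not None
    ]
    return sum(values) / len(values) if values else None  # None when the month has no usable values


print(monthly_average(SAMPLE, "Series_A", 2011, 2))  # (5 + 12 + 23) / 3
print(monthly_average(SAMPLE, "Series_B", 2011, 2))  # (13 + 65) / 2 = 39.0
```

Skipping the error cell before dividing is what turns the Series_B case into (13+65)/2 instead of an error.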
Aside from that, I want a separate formula that outputs an array of the matching data without 'holes' in between and with the right 'length'. Example for Feb-2011 in column C:
  A           B         C               D
1 Date        Series_A  Desired output  Output of the formula above
2 03/01/2011  45        5
3 04/01/2011  73        12
4 06/01/2011  95        23
5 08/01/2011  72
6 06/02/2011  5                         5
7 09/02/2011  12                        12
8 05/02/2011  23                        23
9 07/03/2011  12
If I then evaluate =ISBLANK(C5), it should return TRUE, rather than only =C5="" being TRUE.
Hope the edit clarifies
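The usual INDEX/SMALL pattern works by listing the row numbers that match the month and year, then letting output row k fetch the k-th smallest of them, so the results are packed at the top with no holes. A Python sketch of that k-th-match logic (the function name is hypothetical; dates read as dd/mm/yyyy). One caveat on the ISBLANK requirement: in Excel, ISBLANK returns FALSE for any cell that contains a formula, even one that returns "", so formula-filled cells past the last match cannot satisfy it; the sketch uses None for "nothing here".

```python
from datetime import date
from typing import List, Optional, Sequence


def kth_match(dates: Sequence[date], values: Sequence[float],
              year: int, month: int, k: int) -> Optional[float]:
    """Value of the k-th (1-based) row whose date is in year/month, or None past the last match."""
    # Like IF((MONTH(...)=m)*(YEAR(...)=y), ROW(...)): positions of the matching rows, in sheet order
    matching_rows: List[int] = [i for i, d in enumerate(dates) if d.year == year and d.month == month]
    if k > len(matching_rows):
        return None                      # a worksheet version would usually show "" here via IFERROR
    return values[matching_rows[k - 1]]  # like INDEX(values, SMALL(matching rows, k))


dates = [date(2011, 1, 3), date(2011, 1, 4), date(2011, 1, 6), date(2011, 1, 8),
         date(2011, 2, 6), date(2011, 2, 9), date(2011, 2, 5), date(2011, 3, 7)]
series_a = [45, 73, 95, 72, 5, 12, 23, 12]
print([kth_match(dates, series_a, 2011, 2, k) for k in range(1, 9)])
# [5, 12, 23, None, None, None, None, None]
```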
I reached out to various platforms to get an answer, and here is one that is OK. It still doesn't fully answer part 1, but it works nonetheless.
http://www.excelforum.com/excel-formulas-and-functions/905356-exclude-blank-false-cells-in-in-excel-array-if-formula-output.html

Need a formula for cumulative moving averages in Open Office

I have data in column A, and would like to put the averages in column B like this:
   A   B
1  10  10
2  7   8.5
3  8   8.333
4  19  11
5  13  11.4
where B1 = AVERAGE(A1), B2 = AVERAGE(A1:A2), B3 = AVERAGE(A1:A3), and so on.
Using average() is alright for small amounts of data, but I have over 1500 data entries. I would like to find a more efficient way of doing this.
Make the start of the range reference absolute and the end relative, e.g. in B4:
=AVERAGE($A$1:A4)
You can fill that formula down 1500 rows: the end of the range increments on each row, while the start stays pinned to A1 because of the dollar signs in that reference.
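Each AVERAGE($A$1:An) covers the whole prefix, so filling n rows reads roughly n²/2 cells in total (about 1.1 million for 1500 rows, which is usually still fast). A running total avoids that: the n-th cumulative average is the sum of the first n values divided by n, and each sum is the previous sum plus the new value. In the sheet this could presumably be written as =(B1*(ROW()-1)+A2)/ROW() in B2, filled down, with =A1 in B1 (untested, and it assumes the data starts in row 1). The Python sketch below shows the same recurrence:

```python
from itertools import accumulate
from typing import List, Sequence


def cumulative_averages(values: Sequence[float]) -> List[float]:
    # accumulate yields running totals (10, 17, 25, 44, 57 for the sample column),
    # so each average costs one addition and one division instead of re-reading the prefix
    return [total / n for n, total in enumerate(accumulate(values), start=1)]


print(cumulative_averages([10, 7, 8, 19, 13]))
# [10.0, 8.5, 8.333333333333334, 11.0, 11.4]
```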
