How do you find a value between multiple named number ranges in Google Sheets - arrays

I have values in a certain column as follows.
Rank Score
A 10
B 24
C 35
D 88
E 192
.
.
.
And so on. There are far too many entries to use an IFS statement, and the numbers have an arbitrary difference between levels (A to Z). If I have a number, say 85, then as per the info above it should be rank C (between 35 and 88).
I want to check which rank it falls under. I need a single formula so I can apply it across another sheet with multiple scores that need to be ranked.

use a "floating" (approximate-match) VLOOKUP:
=VLOOKUP(D2, {B:B, A:A}, 2, 1)
for arrayformula do:
=ARRAYFORMULA(IFNA(VLOOKUP(D2:D, {B:B, A:A}, 2, 1)))
also see alternatives: https://webapps.stackexchange.com/q/123729/186471
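For what it's worth, the underlying idea of an approximate-match lookup is just "find the largest threshold that is <= the score". A minimal Python sketch of that logic (the table mirrors the example data; rank_of is an illustrative name, not anything from the answer above):

import bisect

# Sorted score thresholds and their ranks, mirroring the example data.
thresholds = [10, 24, 35, 88, 192]
ranks = ["A", "B", "C", "D", "E"]

def rank_of(score):
    """Return the rank whose threshold is the largest value <= score."""
    i = bisect.bisect_right(thresholds, score) - 1
    return ranks[i] if i >= 0 else None

print(rank_of(85))  # C  (85 falls between 35 and 88)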

Aggregating in arrays

The netCDF4 file I want to work with in R is too large. I want to write a loop that will read in a chunk of the data and summarize it.
The variable I wish to read in has 4 dimensions; 'lat', 'lon', 'member' and 'time'. The time has a monthly resolution and the member contains 60 ensemble runs from a climate model.
Using the ncvar_get command I have extracted 12 time slices from the netCDF, leaving me with a 4 dimension array.
num [1:144, 1:69, 1:60, 1:12]
How would I aggregate this so that I have annual data?
I am assuming that your fourth dimension, which has a length of 12, is the number of months, and that you would like to aggregate over this dimension and return an array of dimension c(144, 69, 60).
Reproducible data (an array of the same dimensions, with all 1s):
myArray <- array(1, dim = c(144, 69, 60, 12))
Here is a method using apply:
mySumArray <- apply(myArray, c(1,2,3), sum)
This returns an array with the following dimensions:
dim(mySumArray)
[1] 144 69 60
and where the first three elements are:
mySumArray[1:3]
[1] 12 12 12
If you want the mean or some other function, just replace sum with your desired function.
Optimized versions of summing and averaging are rowSums and rowMeans:
mySumArray <- rowSums(myArray, dims=3)
This returns the same result as above, MUCH faster.
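Not part of the original answer, but for readers coming from Python, the same reduction is a one-liner in numpy (a sketch under the same assumptions about the dimensions):

import numpy as np

# Same setup as the R example: lat x lon x member x month, all 1s.
my_array = np.ones((144, 69, 60, 12))

# Sum over the last (monthly) axis, keeping the first three dimensions.
annual = my_array.sum(axis=-1)

print(annual.shape)     # (144, 69, 60)
print(annual.flat[:3])  # [12. 12. 12.]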

Stata: Observation-pairwise calculation

input X group
21 1
62 1
98 1
12 2
87 2
end
Now I try to calculate a measure as follows:
$$ \sum_{g} \sum_{i \neq j} \left| X_{ig} - X_{jg} \right| $$
where $i$ and $j$ ($i \neq j$) index observations and $g$ corresponds to the group variable (here, 1 and 2).
How to calculate this number using loops?
Looks like a Gini mean difference, apart from a scaling factor. There are numerous user-written commands already in this territory. There is (unusually) a summary within the Stata manual at [R] inequality.
In addition, this is related to the second L-moment. See the lmoments command from SSC.
You need not calculate this through a double loop over indexes. It collapses to a linear combination of the order statistics.
LATER: See David's 1998 paper which is open-access at
https://doi.org/10.1214/ss/1028905831
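Not Stata, but to make the "linear combination of the order statistics" point concrete: for sorted values x(1) <= ... <= x(n), the sum of |xi - xj| over unordered pairs equals sum_k (2k - n - 1) * x(k), and the ordered-pair version (i != j) is simply twice that. A quick Python check of this identity on the example data (function names are illustrative, my own addition):

from itertools import combinations

def pairwise_abs_sum(xs):
    # Naive double loop over unordered pairs: O(n^2) comparisons.
    return sum(abs(a - b) for a, b in combinations(xs, 2))

def via_order_statistics(xs):
    # The same quantity as a linear combination of order statistics.
    s = sorted(xs)
    n = len(s)
    return sum((2 * (k + 1) - n - 1) * x for k, x in enumerate(s))

group1 = [21, 62, 98]
group2 = [12, 87]
assert pairwise_abs_sum(group1) == via_order_statistics(group1)  # 154
assert pairwise_abs_sum(group2) == via_order_statistics(group2)  # 75
print(pairwise_abs_sum(group1) + pairwise_abs_sum(group2))       # 229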

How can I lookup a column value based on a percentage that must match a range specified in the same row as the return value?

I am a lecturer and am trying to build a mini grading spreadsheet. Based on the points given to a student, it calculates a percentage value describing their grade. I now want to assign the appropriate grade in the student's column on Sheet1, so that it reads:
50%
C-
I have created a matrix of the range identifiers and the text value of the grade on Sheet2 which looks like this:
A B C
1 A++ 95 100
2 A+ 90 94.5
3 A 85 89.5
4 A- 80 84.5
5 B+ 75 79.5
6 B 70 74.5
7 B- 65 69.5
8 C+ 60 64.5
9 C 55 59.5
10 C- 50 54.5
11 D 0 49.5
Column A holds the string describing the grade,
Column B is the start of the range,
Column C is the end of the range for a given grade.
On Sheet1 I run through all the basic calculations, which leaves me with a single percentage value to match against this matrix; let's call this value's cell Sheet1!A1 for now.
What I in essence want to do is loop through the rows 1-11 of Sheet2 and do this:
IF ( Sheet1!A1 >= Sheet2!B1 AND Sheet1!A1 <= Sheet2!C1 ) THEN RETURN Sheet2!A1
Can this be done? I've read through all the supporting documentation and have not been able to find a way to do this yet.
Any help would be much appreciated.
Many thanks for reading,
Jannis
Anand Varma just replied to the same question posted on the Google Forums with the solution, so for completeness I will post his answer here as well.
The original answer on the Google Forums is here.
The solution was using the INDEX function with the MATCH function inside of it:
=INDEX( Sheet2!$A$1:$A$11, MATCH( C41, Sheet2!$C$1:$C$11, -1 ) )
C41 refers to the cell that holds the numerical source value to match against. The -1 search type tells MATCH to scan a descending-sorted range (which column C is here) for the smallest value that is greater than or equal to the search key, so the match lands on the grade whose upper bound covers the percentage.
Thanks Anand.

How to get an evenly distributed sample from Perl array values?

I have an array containing many values between 0 and 360 (like degrees in a circle), but unevenly distributed:
1, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 100, 120, 140, 188, 210, 280, 355
Now I need to reduce those values to, e.g., only 4, distributed as evenly as possible.
How to do that?
Thanks,
Jan
Put the numbers on a circle, like a clock. Now construct a logical cross, say at 12, 3, 6, and 9 o’clock. Put the 12 at the first number. Now find what numbers would be nearest to 3, 6, and 9 o’clock, and record the sum of those three numbers’ distances next to the first number.
Iterate by rotating the top of your cross — the 12 o’clock point — clockwise until it exactly lines up with the next number. Again measure how far the nearest numbers are to each of your three other crosspoints, and record that score next to this current 12 o’clock number.
Repeat until your 12 o'clock has rotated all the way to the original 3 o'clock position, at which point you're done. Whichever number has the lowest sum assigned to it determines the winning configuration.
This solution generalizes to any range of values R and any number N of final points you wish to reduce the set to. Each point on the “cross” is R/N away from each other, and you need only rotate until the top of your cross reaches where the next arm was in the original position. So if you wanted 6 points, you would have a 6-pointed cross, each 60 degrees apart instead of a 4-pointed cross each 90 degrees apart. If your range is different, you still do the same sort of operation. That way you don’t need a physical clock and cross to implement this algorithm: it works for any R and N.
I feel bad about this answer from a Perl perspective, as I’ve not managed to include any dollar signs in the solution. :)
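For concreteness, here is a minimal Python sketch of this rotating-cross search (my own illustration; spread_sample, the brute-force scan over anchors, and the tie-breaking are all assumptions, not part of the description above):

def spread_sample(values, n, r=360):
    """Pick n of the given circular values, as evenly spaced as possible.

    Anchor one arm of an n-armed cross at each value in turn, score how
    far the arms are from their nearest values, and keep the best case.
    """
    def circ_dist(a, b):
        d = abs(a - b) % r
        return min(d, r - d)

    best_score, best_picks = None, None
    for anchor in values:  # rotate the cross so one arm sits on each value
        arms = [(anchor + k * r / n) % r for k in range(n)]
        picks = [min(values, key=lambda v: circ_dist(v, a)) for a in arms]
        score = sum(circ_dist(a, v) for a, v in zip(arms, picks))
        if best_score is None or score < best_score:
            best_score, best_picks = score, picks
    return best_picks

data = [1, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55,
        100, 120, 140, 188, 210, 280, 355]
print(spread_sample(data, 4))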
Use a clustering algorithm to divide your data into evenly distributed partitions. Then grab a random value from each cluster. The input $datafile looks like this:
1 1
45 45
46 46
...
210 210
280 280
355 355
First column is a tag, second column is data. Running the following with $K = 4:
use strict; use warnings;
use Algorithm::KMeans;

my $datafile = $ARGV[0] or die "usage: $0 datafile [K]\n";
my $K        = $ARGV[1] || 4;   # number of clusters (the example run uses 4)
my $mask     = 'N1';            # first column is a tag, second is data

my $clusterer = Algorithm::KMeans->new(
    datafile        => $datafile,
    mask            => $mask,
    K               => $K,
    terminal_output => 0,
);
$clusterer->read_data_from_file();
my ($clusters, $cluster_centers) = $clusterer->kmeans();

# Keep one uniformly random member of each cluster, keyed by its center.
my %clusters;
while (@$clusters) {
    my $cluster = shift @$clusters;
    my $center  = shift @$cluster_centers;
    $clusters{"@$center"} = $cluster->[ int rand @$cluster ];
}

use YAML; print Dump \%clusters;
returns this:
120: 120
199: 188
317.5: 355
45.9166666666667: 46
First column is the center of the cluster, second is the selected value from that cluster. K-means chooses the centers to minimize the within-cluster variance, which for one-dimensional data like this spreads the centers well apart.

Find all possible row-wise sums in a 2D array

Ideally I'm looking for a c# solution, but any help on the algorithm will do.
I have a 2-dimension array (x,y). The max columns (max x) varies between 2 and 10 but can be determined before the array is actually populated. Max rows (y) is fixed at 5, but each column can have a varying number of values, something like:
1 2 3 4 5 6 7...10
A 1 1 7 9 1 1
B 2 2 5 2 2
C 3 3
D 4
E 5
I need to come up with all possible row-wise sums, for the purpose of looking for a specific total. That is, a row-wise total could be the cells A1 + B2 + A3 + B5 + D6 + A7 (any combination of one value from each column).
This process will be repeated several hundred times with different cell values each time, so I'm looking for a somewhat elegant solution (better than what I've been able to come with). Thanks for your help.
The Problem Size
Let's first consider the worst case:
You have 10 columns and 5 (full) rows per column. It should be clear that, with appropriate values in each cell, you can get up to 5^10 ≈ 9.8·10^6 different results (the solution space).
For example, the following matrix will give you the worst case for 3 columns:
| 1 10 100 |
| 2 20 200 |
| 3 30 300 |
| 4 40 400 |
| 5 50 500 |
resulting in 5^3 = 125 different results. Each result corresponds to a choice {a1, a2, a3} with row indices ai ∈ {1, …, 5}.
It's quite easy to show that such a matrix will always exist for any number n of columns.
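As a quick illustrative check (Python, not part of the original answer), the three-column example above indeed produces 125 distinct totals:

from itertools import product

# One value per column; every combination yields a distinct total.
cols = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [100, 200, 300, 400, 500]]
print(len({sum(t) for t in product(*cols)}))  # 125 = 5^3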
Now, to get each numerical result you need to do n-1 sums, adding up to a problem size of O(n·5^n). So that's the worst case, and I think nothing can be done about it, because to know the possible results you NEED to actually perform the sums.
More benign incarnations:
The problem complexity may be cut off in two ways:
Less numbers (i.e. not all columns are full)
Repeated results (i.e. several partial sums give the same result, and you can join them into one thread). Much more on this later.
Let's see a simplified example of the latter, with a small 3x3 matrix:
| 7 6 100 |
| 3 4 200 |
| 1 2 200 |
At first sight you will need to do 2·3^3 = 54 sums. But that's not the real case. As you add up the first two columns you don't get the expected 9 different results, but only 6 ({13, 11, 9, 7, 5, 3}).
So you don't have to carry nine results forward to the third column, but only 6.
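A quick check (Python, my own addition) makes the collapse concrete:

# Sums of the first two columns: 9 combinations, only 6 distinct values.
print(sorted({a + b for a in (7, 3, 1) for b in (6, 4, 2)}, reverse=True))
# [13, 11, 9, 7, 5, 3]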
Of course, that comes at the expense of deleting the repeated numbers from the list. The "Removal of Repeated Integer Elements" has been discussed on SO before, and I'll not repeat that discussion here, but just note that a mergesort, O(m log m) in the list size m, will remove the duplicates. If you want something easier, a double loop, O(m^2), will do.
Anyway, I'll not try to calculate the size of the (mean) problem this way, for several reasons. One of them is that the m in the merge sort is not the size of the problem, but the size of the vector of results after adding up any two columns, and that operation is repeated (n-1) times ... and I really don't want to do the math :(.
The other reason is that, as I implemented the algorithm, we will be able to use some experimental results and save ourselves from my surely leaky theoretical considerations.
The Algorithm
With what we said before, it is clear that we should optimize for the benign cases, as the worst case is a lost one.
For doing so, we need to use lists (or variable-size vectors, or whatever can emulate those) for the columns, and do a merge after every column is added.
The merge may be replaced by several other algorithms (such as insertion into a B-tree) without modifying the results.
So the algorithm (procedural pseudocode) is something like:
Set result_vector to column 1
For column i in (2 to n)
    Remove repeated integers in the result_vector
    Add every element of result_vector to every element of column i,
      giving a new result_vector
Next column
Remove repeated integers in the result_vector
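Not part of the original answer, but here is the same procedure as a short runnable Python sketch (using a set makes the "remove repeated integers" step implicit; row_wise_sums is just an illustrative name):

from functools import reduce

def row_wise_sums(columns):
    # Fold the columns together, keeping only distinct partial sums.
    # The set comprehension performs the duplicate removal implicitly.
    return reduce(lambda acc, col: {a + c for a in acc for c in col},
                  columns, {0})

# The example matrix from the question, stored column by column:
cols = [[1, 2, 3, 4, 5], [1, 2, 3], [7, 5], [9, 2], [1, 2], [1]]
print(sorted(row_wise_sums(cols)))
# [11, 12, 13, ..., 27], matching the Mathematica test further down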
Or as you asked for it, a recursive version may work as follows:
function genResVector(a: list, b: list): returns list
local c: list
{
    Set c = CartesianProduct(a x b)
    Set c = the sum of each pair {a[i], b[j]} of c
    Drop repeated elements of c
    Return(c)
}

function RecursiveAdd(a: matrix, i: integer): returns list
{
    Return genResVector(Column i of a, RecursiveAdd(a, i - 1))
}

function RecursiveAdd(a: matrix, i == 0): returns list = {0}
Algorithm Implementation (Recursive)
I chose a functional language (Mathematica); I guess it's no big deal to translate it to any procedural one.
Our program has two functions:
genResVector, which sums two lists giving all possible results with repeated elements removed, and
recursiveAdd, which recurses on the matrix columns adding up all of them.
The code is:
genResVector[x__, y__] :=       (* a function that takes two lists as input *)
  Union[                        (* remove duplicates from the resulting list *)
    Apply[Plus,                 (* Plus is the function to be distributed *)
      Tuples[{x, y}], 2]];      (* over all combinations of the two lists *)

recursiveAdd[t_, i_] := genResVector[t[[i]], recursiveAdd[t, i - 1]];
                                (* the recursive add function *)
recursiveAdd[t_, 0] := {0};     (* with its stop condition *)
Test
If we take your example matrix
| 1 1 7 9 1 1 |
| 2 2 5 2 2 |
| 3 3 |
| 4 |
| 5 |
And run the program the result is:
{11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27}
The maximum and minimum are very easy to verify since they correspond to taking the Min or Max from each column.
Some interesting results
Let's consider what happens when the numbers in each position of the matrix are bounded. For that we will take a full (10 x 5) matrix and populate it with random integers.
In the extreme case where the integers are only zeros or ones, we may expect two things:
A very small result set
Fast execution, since there will be a lot of duplicate intermediate results
If we increase the Range of our Random Integers we may expect increasing result sets and execution times.
Experiment 1: 5x10 matrix populated with varying range random integers
It's clear enough that as the result set approaches its maximum possible size (5^10 ≈ 9.8·10^6), the calculation time and the number of distinct results both approach an asymptote. The fact that we see increasing curves just denotes that we are still far from that point.
Moral: the smaller your elements are, the better chances you have to get it fast. This is because you are likely to have a lot of repetitions!
Note that our MAX calculation time is near 20 secs for the worst case tested.
Experiment 2: Optimizations that aren't
Having a lot of memory available, we can calculate by brute force, not removing the repeated results.
The result is interesting ... 10.6 secs! ... Wait! What happened? Our little "remove repeated integers" trick is eating up a lot of time: when there are not many repeats to remove there is no gain, only losses from trying to get rid of the repetitions.
But we may get a lot of benefit from the optimization when the max numbers in the matrix are well under 5·10^5. Remember that I'm doing these tests with the 5x10 matrix fully loaded.
The moral of this experiment is: the repeated-integer removal algorithm is critical.
HTH!
PS: I have a few more experiments to post, if I get the time to edit them.
