Extracting data with Vlookup and checking values with Arrayformula - arrays

I need to extract data from one tab (extracted data) to another tab and validate the data in the following way:
if 0% assign 3
if from 0 till -10% assign 2
if from -10% and more assign 1
if from 0% till 10% assign 4
if from 10% and more assign 5
here is the link to the file https://docs.google.com/spreadsheets/d/1f8SFi2hNP6Anav7G7BYWyK-fasPk1pT1A2HFJblT-FI/edit?usp=sharing

I suggest you use two vlookups.
If you have a tab called 'Ranges' with the following two columns:
Percentage Result
-1000% 1
-10% 2
0% 3
10% 4
11% 5
Then the formula in cell B1 on the 'calculations' tab would be something like:
=arrayformula({"Con Potential";iferror(vlookup(vlookup(A2:A,'Extracted data'!A:D,4,0),Ranges!A:B,2,1),)})
Delete all data below cell B1 for the arrayformula to work correctly.
The second vlookup references col D on the 'Extracted data' tab because that is the percentage I think you are comparing? If not, alter 4 in the vlookup to another column.
If it helps, please see:
https://stackoverflow.com/help/someone-answers
NB: In place of Ranges!A:B you could use a fixed array:
=arrayformula({"Con Potential";iferror(vlookup(vlookup(A2:A,'Extracted data'!A:D,4,0),{-10,1;-0.1,2;0,3;0.1,4;0.11,5},2,1),)})
If you want to temporarily see the fixed array in case you want to edit any values, place this in a cell somewhere out of the way:
={-10,1;-0.1,2;0,3;0.1,4;0.11,5}
, is used to bump to a new column, ; is used as a return.
Relevance
Looking at 'Relevance' lookup from 'Position Delta' and this table in your sheet:
Since a 'position delta' value of 10 cannot both have a relevance of 5 and 4, I've made the assumption that 10 gets 5. If that is incorrect, then I'll adjust the boundaries.
Add this to cell C1 on the 'calculations' tab (clearing all cells below):
=arrayformula({"Relevance";iferror(vlookup(vlookup(calculations!A2:A,'Extracted data'!A:D,3,0),{0,5;11,4;21,3;31,2;41,1;51,0},2,1),)})
The fixed array {0,5;11,4;21,3;31,2;41,1;51,0} has these values:
0 5
11 4
21 3
31 2
41 1
51 0
If you need to change the boundaries so 10 is a 4, not 5, then change the vlookup to use this fixed range {0,5;10,4;20,3;30,2;40,1;50,0}:
0 5
10 4
20 3
30 2
40 1
50 0
vlookup is incremental and anything up to 11 will get 5, then 11 to 20 will get 4, 21 to 30 will get 3 and so on.
,1) in the vlookup at the far right gets the nearest value match until 'position delta' has reached the next boundary.

Related

Google Sheets - Increment cell value with ARRAYFORMULA and based on value from another field

I have a spreadsheet I'm creating and I have an ARRAYFORMULA for incrementing the number of a field based on another field. My formula looks like this (NOTE: my rows start on row 4 that is why there is a ROW(A4:A)-3):
=ARRAYFORMULA(IF(ROW(A4:A) = 4, 1, IF(B4:B = 1, ROW(A4:A)-3, (ROW(A4:A)-3)-(B4:B - 1))))
What I'm doing is creating groups (A) and then have a sequence counter (B) which is the number of rows within the group. I want the result to look like this where the A just picks up from where it left off (Note: B is manually entered):
A
B
1
1
2
1
3
1
3
2
3
3
4
1
5
1
However, my result is looking this this:
A
B
1
1
2
1
3
1
3
2
3
3
6
1
7
1
I know ROW gets me the row number but when I try and use INDEX formula:
=ARRAYFORMULA(IF(ROW(G4:G) = 4, 1, IF(H4:H = 1, INDEX(G4:G, ROW(G4:G)-1, 1) + 1, INDEX(G4:G, ROW(G4:G)-1, 1)))
to get the actual value of the prior cell I get a constant flashing like it's stuck in an infinite loop of some sort. I know I can probably just accomplish this without ARRAYFORMULA however, this spreadsheet will be shared and contains many other formulas that I just don't want people to have to cut and copy from the row above and get formulas all messed up. I'm dealing with non-technical people that need something to just work very simple.
Sample:
https://docs.google.com/spreadsheets/d/1FqndR4oTm_uaO7aUxYSgb3p0bZ6yBz-KnjUAD8kctd4/edit?usp=drivesdk
use:
=ARRAYFORMULA(COUNTIFS(A4:A, A4:A, ROW(A4:A), "<="&ROW(A4:A)))
reverse:
=ARRAYFORMULA(MMULT(TRANSPOSE((SEQUENCE(COUNTA(B4:B))<=
SEQUENCE(1, COUNTA(B4:B)))*IF(INDIRECT("B4:B"&COUNTA(B4:B)+ROW(B4)-1)=1, 1)),
SEQUENCE(COUNTA(B4:B))^0))

How to select the values greater than the mean in an array?

I want to apply feature selection on a dataset (lung.mat)
After loading the data, I computed the mean of distances between each feature with others by Jaccard measure. Then I sorted the distances descendingly in B1. And then I selected for example 25 number of all the features and saved the matrix in databs1.
I want to select the features that have distance values greater than the mean of the array (B1).
close all;
clc
load lung.mat
data=lung;
[n,m]=size(data);
for i=1:m-1
for j=i+1:m
t1(i,j)=fjaccard(data(:,i),data(:,j));
b1=sum(t1)/(m-1);
end
end
[B1,indB1]=sort(b1,'descend');
databs1=data(:,indB1(1:25));
databs1=[databs1,data(:,m)]; %jaccard
save('databs1.mat');
I’ll be grateful to have your opinions about how to define this in B1, selecting values of B1 which are greater than the mean of the array B1, It means cutting the rest of smaller values than the mean of B1.
I used this line,
B1(B1>mean(B1(:)))
after running, B1 still has the full number of features(column) equal to the full dataset, for example, lung.mat has 57 features and B1 by this line still has 57 columns,
I considered that by this line B1 will be cut to the number of features that are greater than the mean of B1.
the general answer to your question is here (this seems clear to you based on your code):
a=randi(10,1,10) %example data
a>mean(a) %get binary matrix of which elements are larger than mean
a(a>mean(a)) %select elements from a that are larger than mean
a =
1 9 10 7 8 8 4 7 2 8
ans =
1×10 logical array
0 1 1 1 1 1 0 1 0 1
ans =
9 10 7 8 8 7 8

Group two non-adjacent columns into 2d array for Excel VBA Script

I think this question might be related to Ms Excel -> 2 columns into a 2 dimensional array but I can't quite make the connection.
I have a VBA script for filling missing missing data. I select two adjacent columns, and it finds any gaps in the second column and linearly interpolates based on (possibly irregular) spacing in the first column. For instance, I could use it on this data:
1 7
2 14
3 21
5 35
5.1
6 42
7
8
9 45
to get this output
1 7
2 14
3 21
5 35
5.1 35.7 <---1/10th the way between 35&42
6 42
7 43 <-- 1/3 the way between 42 & 45
8 44 <-- 2/3 the way between 42 & 45
9 45
This is very useful for me.
My trouble is that it only works on contiguous columns. I would like to be able to select two columns that are not adjacent to each other and have it work the same way. My code starts out like this:
Dim addr As String
addr = Selection.Address
Dim nR As Long
Dim nC As Long
'Reads Selected Cells' Row and Column Information
nR = Range(addr).Rows.Count
nC = Range(addr).Columns.Count
When I run this with contiguous columns selected, addr shows up in the Locals window with a value like "$A$2:$B$8" and nC = 2
When I run this with non-contiguous columns selected, addr shows up in the Locals window with a value like "$A$2:$A$8,$C$2:$C$8" and nC = 1.
Later on in the script, I collect the values in each column into an array. Here's how I deal with the second column, for example:
'Creates a Column 2 (col1) array, determines cells needed to interpolate for, changes font to bold and red, and reads its values
Dim col2() As Double
ReDim col2(0 To nR + 1)
i = 1
Do Until i > nR
If IsEmpty(Selection(i, 2)) Or Selection(i, 2) = 0 Or Selection(i, 2) = -901 Then
Selection(i, 2).Font.Bold = True
Selection(i, 2).Font.Color = RGB(255, 69, 0)
col2(i) = 9999999
Else
col2(i) = Selection(i, 2)
End If
i = i + 1
Loop
This is also busted, because even if my selection is "$A$2:$A$8,$C$2:$C$8" VBA will treat Selection(1,2) as a reference to $B$2, not the desired $C$2.
Anyone have a suggestion for how I can get VBA to treat non-contiguous selection the way it treats contiguous?
You're dealing with "disjoint ranges." Use the Areas collection, e.g., as described here. The first column should be in Selection.Areas(1) and the second column should be in Selection.Areas(2).

Exclude blank/FALSE cells in in Excel array IF formula output

I am having difficulties with making an array formula work the way I want it to work.
Out of a column of dates which is not sorted, I want it to extract values into a new column. The formula below identifies the required cells of a given month and year, but they appear in their original row rather than on top of the output range. Moreover, I want all ""/FALSE cells to be excluded from the output array.
=IF((MONTH($I$15:$I$1346)=1)*(YEAR($I$15:$I$1346)=2008),$I$15:$I$1346,"")
In fact, the $I$15:$I$1346 should be dynamic and go to the last filled range (I could make a named range for that)
Part two is to expand on that formula so that it calculates the data that is an two column offset of the data described above.
Is the above possible to build into one cell probably with a combination of IF, INDEX, SMALL and maybe others?
I'm not looking for a filter solution. Hope the above is clear enough and that you can help!
Here's a shortened sample layout:
A B C
1 Date Series_A Series_B
2 03/01/2011 45 20
3 04/01/2011 73 30
4 06/01/2011 95 40
5 08/01/2011 72 50
6 06/02/2011 5 13
7 09/02/2011 12 #N/A
8 05/02/2011 23 65
9 07/03/2011 12 65
Then I want three input cells for the year and and the month and series name (index/match, as there are many more columns with data). If it would be 2011, Feb and Series_A, I want it to calculate the average for that month. In this case it would be (5+12+23)/3. If it would be Feb-2011 and Series_B instead, which has an error, it should show (13+65)/2 rather than an error.
Aside from that I want a separate which will output an array with the data instead without 'holes' in between and with the right 'length'. Example for Feb-2011 in Column C:
A B C D
1 Date Series_A Desired Output Output based on f above
2 03/01/2011 45 5
3 04/01/2011 73 12
4 06/01/2011 95 23
5 08/01/2011 72
6 06/02/2011 5 5
7 09/02/2011 12 12
8 05/02/2011 23 23
9 07/03/2011 12
If I then run a =ISBLANK(C5) it should be true, rather than =""=C5
Hope the edit clarifies
I reached out to various platsforms to get an answer, and here you have one which is ok. Still doesn't fully answer part 1, but works nonetheless.
http://www.excelforum.com/excel-formulas-and-functions/905356-exclude-blank-false-cells-in-in-excel-array-if-formula-output.html

Need a formula for cumulative moving averages in Open Office

I have data in column A, and would like to put the averages in column B like this:
a b
1 10 10
2 7 8.5
3 8 8.333
4 19 11
5 13 11.5
where b1 =average(a1), b2 =average(a1:a2), b3 =average(a1:a3)....
Using average() is alright for small amounts of data, but I have over 1500 data entries. I would like to find a more efficient way of doing this.
Make your initial range reference absolute, while the other is relative, i.e.:
b4 = average($a$1:a4)
You can paste that 1500 times an it will always increment the end of the range while keeping the beginning pinned to A1 due to the dollar signs in that reference.

Resources