Excel Array: Reducing the Number of Calculation Steps - arrays

As per my on-going journey through the world of Excel arrays, I was wondering if someone might be able to give me a pointer or two.
On the excel sheet attached, I currently have a four-step process to get from a segregated lookup to a gapless list:
Step 1 (yellow): For the 50-word long list in sheet 'Data', a 50-cell lookup is performed to see whether the input in row 1 (red) appears somewhere in the corresponding cell. In this case, the lookup is performed three times for three different inputs, i.e. in columns C-E.
Step 2 (orange): An array then relists the contents of the 50-cell lookup above it but removes all empty cells (i.e. where there is no match to the input in row 1)
Step 3 (green): The results from step 2 are listed out in a single column.
Step 4 (blue): The results from step 3 are listed out using the same technique as in step 2 in order to remove the blank cells.
Collectively, this enables a gapless listing of all data objects which contain the given inputs somewhere in their string.
However, my real list of data objects is 5000 entries long and I would like to look up the results for 100 or more inputs. As step 1 requires each combination to be looked up separately, this requires 500,000 calculations for step 1 alone, which causes a heavy toll on the processors.
Therefore, I was wondering if anyone had an idea as to how I could shortcut this process to reduce the number of cells / calculations involved. I assume that step 1 and 2 could somehow be merged, but my knowledge of arrays is not sufficient to think of how this could be done.
It would be brilliant to hear from somebody who may have some advice on the matter!
Kind regards,
Rob
File Link: https://drive.google.com/open?id=10O91QDD78RkbWtQx2iWfax17Dt5TPw1G

Since you're not removing duplicated entries from the final list, this is quite straightforward.
Based on the workbook you provided, to be entered within the Lookup sheet:
In cell A1:
=SUMPRODUCT(0+ISNUMBER(FIND(C1:E1,Data!A1:A50)))
In any cell of your choice, to begin the list of returns, array formula**:
=IF(ROWS($1:1)>A$1,"",INDIRECT("'Data'!"&TEXT(SMALL(IF(ISNUMBER(FIND(C$1:E$1,Data!A$1:A$50)),10^5*ROW(Data!A$1:A$50)+COLUMN(Data!A$1:A$50)),ROWS($1:1)),"R0C00000"),0))
and copied down until you start to get blanks for the results.
Notes:
Instructions for entering an array formula are at the foot of this post.
The sheet name (emboldened within the second formula) should be amended as required.
It is important that the range containing the values being searched for (A1:C1 here) and that containing the entries to be searched within (A1:A50) be orthogonal, i.e. one is a horizontal range, the other a vertical range.
If you are not using an English-language version of Excel then the part "R0C00000" within the second formula may need amending.
Regards
**Array formulas are not entered in the same way as 'standard' formulas. Instead of pressing just ENTER, you first hold down CTRL and SHIFT, and only then press ENTER. If you've done it correctly, you'll notice Excel puts curly brackets {} around the formula (though do not attempt to manually insert these yourself).

Related

Unavoidable merged Cells - Fill blank cells with previous non-blank cell in same column

I have a roster table for a sports facility that has been formatted and has a column of merged cells (for human readability). Unfortunately I cannot change the formatting to eliminate the merged cells - too many people use it and in any case I'd need to overhaul all the formulas everywhere.
The cells contain names and merge 4 rows of a single column.
Formatted roster table w/ sample data
In a separate range I am trying to take this formatted info and put it into 1st normal form for analysis & graphing purposes. Since merged cells only contain the top-leftmost value, when trying to copy the column contents by formula (e.g. "=B14") it only shows the name in the top cell followed by 3 empty ones below.
I need to fill in the blank rows by copying the athlete names down. The other column formulas are working just fine.
For the life of me I can't figure it out. It has to be a formula and not apps script due to mobile use, and I've always been really bad with certain formulas and good with others. Usually I can make a guess at it, but this time I'm just lost.
Can someone point me in the right direction?
use:
=ARRAYFORMULA(IF(B2:B="",, VLOOKUP(ROW(A2:A), IF(A2:A<>"", {ROW(A2:A), A2:A}), 2, 1)))

Excel sum across multiple sheets with criteria array

I have a workbook with several sheets, each containing a large amount of data formatted identically. What I'd like to do is enter a formula on a summary sheet that sums data from across the data sheets, selecting the data to sum based on an array of criteria.
The list of sheets is named 'AdHoc_Sheets' and the list of criteria is named 'Uncontrollable_Compensation'.
First attempt:
=SUMPRODUCT(SUMIF(INDIRECT("'"&AdHoc_Sheets&"'!"&"C:C"),A40,INDIRECT("'"&AdHoc_Sheets&"'!"&"E:E")))
This works well when only a single criteria (in this case 'A40') is needed. The challenge I'm finding is changing that to be an array of criteria.
Second attempt:
={SUMPRODUCT(SUM(IF(ISERROR(MATCH(INDIRECT("'"&AdHoc_Sheets&"'!"&"C:C"),TRANSPOSE(Uncontrollable_Compensation),0)),0,INDIRECT("'"&AdHoc_Sheets&"'!"&"E:E"))))}
Which returns a zero when it's not CSE'd and an #N/A error when it is CSE'd. Something about the dynamics of juggling the arrays is messing me up, and I can't quite tell if I need to turn to MMULT or some other method. Thanks in advance.
Assuming that the entries in column C are text, not numeric, array formula**:
=SUM(IF(ISNUMBER(MATCH(T(OFFSET(INDIRECT("'"&AdHoc_Sheets&"'!"&"C1"),TRANSPOSE(ROW(C1:C100)-MIN(ROW(C1:C100))),0)),Uncontrollable_Compensation,0)),N(OFFSET(INDIRECT("'"&AdHoc_Sheets&"'!"&"E1"),TRANSPOSE(ROW(C1:C100)-MIN(ROW(C1:C100))),0))))`
With such a construction you cannot 'get away' with arbitrarily referencing entire columns without detriment to performance. Hence my choice of range from row 1 to row 100, which obviously you can change, though be sure to keep it as small as possible.
Regards
**Array formulas are not entered in the same way as 'standard' formulas. Instead of pressing just ENTER, you first hold down CTRL and SHIFT, and only then press ENTER. If you've done it correctly, you'll notice Excel puts curly brackets {} around the formula (though do not attempt to manually insert these yourself).

How to count instances of a string occurring in an array of cells

I have a spreadsheet that tracks where consumers are referred to. To record individual referrals, users select places from a drop down list. This list then populates a cell, with values separated by commas.
On a separate sheet, I need to count the number of referrals for each referral type. So I need to count the # of times DHHS shows up in the array, for example. I've attempted to do this using the following formula:
=SUM(LEN(range))-LEN(SUBSTITUTE(range,"string",""))/LEN("string")
This is working fine for single word strings, but is not working for multiple word strings like "CHIP Water Inquiry". Any ideas why, and what I can do about it?
You need to make two minor corrections, as you are breathtakingly close in your formula. Add a second SUM formula, tally the subtracted length and then house the two subtracted sections within parentheses as you see in the posted formula. As Boris said, Ctrl + Shift + Enter it, as you probably already know. F1 is assumed to hold your "string" that you wish to count.
=(SUM(LEN(range))-SUM(LEN(SUBSTITUTE(range,F1,""))))/LEN(F1)

Excel array Formula that copies only cells containing a string

I would prefer to use an excel array formula (but if it can only be done in VBA, so be it) that copies ALL cells from a column array that contains specific text. The picture below shows what I am after and what I have tried. I'm getting close (thanks to similar, but different questions) but can't quite get to where I want. At the moment, I am getting only the first cell instead of all the cells. In my actual application, I am searching through about 20,000 cells and will have a few hundred search terms. I expect most search terms to give me about 8 - 12 cells with that value.
formula I am using:
=INDEX($A$4:$A$10,MATCH(FALSE,ISERROR(SEARCH($C$1,$A$4:$A$10)),0))
Spredsheet Image
To make this work efficiently, I recommend having a separate cell holding the results count (I used cell C2) which has this formula:
=COUNTIF(A:A,"*"&C1&"*")
Then in cell C4 and copied down use this array formula (The -3 is just because the header row is row 3. If the header row was row 1, it would be -1):
=IF(ROW(A1)>$C$2,"",INDEX($A$4:$A$21000,SMALL(IF(ISNUMBER(SEARCH($C$1,$A$4:$A$21000)),ROW($A$4:$A$21000)-3),ROW(C1))))
I tested this with 21000 rows of data in column A with an average of 30 results per search string and the formula is copied down for 60 cells in column C. With that much data, this takes about 1-2 seconds to finish recalculating. Recalculation time can vary widely depending on other factors in your workbook (additional formulas, nested dependencies, use of volatile functions, etc) as well as your hardware.
Alternately, you could just use the built-in Filter functionality, but I hope this helps.
You need to get the ROWS. Put this in C4 and copy down.
=IFERROR(AGGREGATE(15,6, IF(SEARCH($C$1, $A$4:$A$10)>0, ROW($A$4:$A$10)), ROW($C4)-ROW($A$4)+1), "")
Array formula so use ctrl-shift-Enter

Returning multiple adjacent cell results from an min array which may include multiple duplicate values

I'm trying to setup a formula that will return the contents of an related cell (my related cell is on another sheet) from the smallest 2 results in an array. This is what I'm using right now.
=INDEX('Sheet1'!$A$40:'Sheet1'!$A$167,MATCH(SMALL(F1:F128,1),F1:F128,0),1)
And
=INDEX('Sheet1'!$A$40:'Sheet1:!$A$167,MATCH(SMALL(F1:F128,2),F1:F128,0),1)
The problem I've run into is twofold.
First, if there are multiple lowest results I get whichever one appears first in the array for both entries.
Second, if the second lowest result is duplicated but the first is not I get whichever one shows up on the list first, but any subsequent duplicates are ignored. I would like to be able to display the names associated with the duplicated scores.
You will have to adjust the k parameter of the SMALL function to raise the k according to duplicates. The COUNTIF function should be sufficient for this. Once all occurrences of the top two scores are retrieved, standard 'lookup multiple values' formulas can be applied. Retrieving successive row positions with the AGGREGATE¹ function and passing those into an INDEX of the names works well.
    
The formulas in H2:I2 are,
=IF(SMALL(F$40:F$167, ROW(1:1))<=SMALL(F$40:F$167, 1+COUNTIF(F$40:F$167, MIN(F$40:F$167))), SMALL(F$40:F$167, ROW(1:1)), "") '◄ H2
=IF(LEN(H40), INDEX(A$40:A$167, AGGREGATE(15, 6, ROW($1:$128)/(F$40:F$167=H40), COUNTIF(H$40:H40, H40))), "") '◄ I2
Fill down as necessary. The scores are designed to terminate after the last second place so it would be a good idea to fill down several rows more than is immediately necessary for future duplicates.
¹ The AGGREGATE function was introduced with Excel 2010². It is not available in earlier versions.
² Related article for pre-xl2010 functions - see Multiple Ranked Returns from INDEX().
The following formula will do what I think you want:
=IF(OR(ROW(1:1)=1,COUNTIF($E$1:$E1,INDEX(Sheet1!$A$40:$A$167,MATCH(SMALL($F$1:$F$128,ROW(1:1)),$F$1:$F$128,0)))>0,ROW(1:1)=2),INDEX(Sheet1!$A$40:$A$167,MATCH(1,INDEX(($F$1:$F$128=SMALL($F$1:$F$128,ROW(1:1)))*(COUNTIF($E$1:$E1,Sheet1!$A$40:$A$167)=0),),0)),"")
NOTE:
This is an array formula and must be confirmed with Ctrl-Shift-Enter.
There are two references $E$1:$E1. This formula assumes that it will be entered in E2 and copied down. If it is going in a different column Change these two references. It must go in the second row or it will through a circular reference.
What it will do
If there is a tie for first place it will only list those teams that are tied for first.
If there is only one first place but multiple tied for second places it will list all those in second.
So make sure you copy the formula down far enough to cover all possible ties. It will put "" in any that do not fill, so err on the high side.
To get the Scores use this simple formula, I put mine in Column F:
=IF(E2<>"",SMALL($F$1:$F$128,ROW(1:1)),"")
Again change the E reference to the column you use for the output.
I did a small test:

Resources