Excel array formula that copies only cells containing a string

I would prefer to use an Excel array formula (but if it can only be done in VBA, so be it) that copies ALL cells from a column array that contain specific text. The picture below shows what I am after and what I have tried. I'm getting close (thanks to similar, but different questions) but can't quite get to where I want. At the moment, I am getting only the first matching cell instead of all of them. In my actual application, I am searching through about 20,000 cells and will have a few hundred search terms. I expect most search terms to give me about 8 - 12 cells with that value.
Formula I am using:
=INDEX($A$4:$A$10,MATCH(FALSE,ISERROR(SEARCH($C$1,$A$4:$A$10)),0))
Spreadsheet image

To make this work efficiently, I recommend having a separate cell holding the results count (I used cell C2) which has this formula:
=COUNTIF(A:A,"*"&C1&"*")
Then in cell C4 and copied down use this array formula (The -3 is just because the header row is row 3. If the header row was row 1, it would be -1):
=IF(ROW(A1)>$C$2,"",INDEX($A$4:$A$21000,SMALL(IF(ISNUMBER(SEARCH($C$1,$A$4:$A$21000)),ROW($A$4:$A$21000)-3),ROW(C1))))
I tested this with 21000 rows of data in column A with an average of 30 results per search string and the formula is copied down for 60 cells in column C. With that much data, this takes about 1-2 seconds to finish recalculating. Recalculation time can vary widely depending on other factors in your workbook (additional formulas, nested dependencies, use of volatile functions, etc) as well as your hardware.
Alternately, you could just use the built-in Filter functionality, but I hope this helps.
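For anyone who finds the SMALL/IF/INDEX combination hard to read, here is a rough Python sketch of what one copied-down cell computes. It is only an illustration under assumptions: the function names and sample data are made up, and it uses plain substring matching, so it ignores the wildcard characters that SEARCH also accepts.

```python
def matching_cells(cells, term):
    """Return all cells containing term, in order (case-insensitive, like SEARCH)."""
    needle = term.lower()
    return [cell for cell in cells if needle in str(cell).lower()]


def kth_match(cells, term, k):
    """Mimic the k-th copied-down formula cell: the k-th match, or "" past the count."""
    hits = matching_cells(cells, term)  # plays the role of SMALL(IF(ISNUMBER(SEARCH(...))))
    return hits[k - 1] if 1 <= k <= len(hits) else ""


# Hypothetical sample data standing in for column A and the search term in C1.
column_a = ["red apple", "banana", "Apple pie", "cherry", "pineapple"]
results = [kth_match(column_a, "apple", k) for k in range(1, 5)]
print(results)  # ['red apple', 'Apple pie', 'pineapple', '']
```

The blank at the end corresponds to the IF(ROW(A1)>$C$2,"",...) guard: once k exceeds the match count, the cell shows an empty string instead of an error.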

You need to get the ROWS. Put this in C4 and copy down.
=IFERROR(AGGREGATE(15,6, IF(SEARCH($C$1, $A$4:$A$10)>0, ROW($A$4:$A$10)), ROW($C4)-ROW($A$4)+1), "")
This is an array formula, so confirm it with Ctrl+Shift+Enter.

Related

How would I reference cells in order while moving down several cells in a formula on a separate sheet?

I have been at this for hours and it's kicking me. I'm trying to build a log for someone, and I have a sheet with standard data in table format. I need the next sheet to look a certain way so that it can be exported to PDF and continue looking like the log always has - which means that it will not be a standard table.
In the Log sheet, the data is all on one row; in the PrintSheet, the cell references will be placed in three rows, followed by an empty fourth row. Obviously, when you paste formulas in Excel, it picks the row you're in, rather than the next row down in the referenced sheet. I've included the formulas that "work" in blue in the image for reference, but that would involve manually subtracting 3 (or 4, depending on which one I'm doing) in each formula (formula for reference: =Log!$A$1&": "&INDIRECT("'log'!A"&ROW()-3)).
Is there a way to dynamically write this formula so it can just be copy/pasted every 4th row when they need more in the PrintSheet? Is it possible I need to be using an array formula (that is an area of Excel that I am deeply lacking in)?
Input Sheet (log) vs Output Sheet (PrintSheet) with formulas in blue
Use a bit of maths on the row number and pull the formula down as required:
=CHOOSE(MOD(ROW()-2,4)+1,Log!$A$1&":"&INDEX(Log!A:A,QUOTIENT(ROW()-2,4)+2),Log!$D$1&":"&INDEX(Log!D:D,QUOTIENT(ROW()-2,4)+2),"","")
[Screenshots: Log sheet and PrintSheet]
If you have Excel 365, you can do it using the same method but as a spill formula:
=LET(logRows,COUNTA(Log!A:A)-1,
seq,SEQUENCE(logRows*4,1,0),
CHOOSE(MOD(seq,4)+1,Log!$A$1&":"&INDEX(Log!A:A,QUOTIENT(seq,4)+2),
Log!$D$1&":"&INDEX(Log!D:D,QUOTIENT(seq,4)+2),"",""))
In this case it counts the number of rows in the log so will expand as you add more rows.
The date and time can be done in a similar way.
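The row arithmetic behind the CHOOSE/MOD/QUOTIENT formula can be sketched in Python as follows. This is a minimal sketch, assuming the layout starts at PrintSheet row 2 and that log data starts in row 2 under a header row; the function name, the headers and the records are hypothetical.

```python
def print_sheet_cell(row, headers, records):
    """Text for PrintSheet row `row` (1-based; the first block starts at row 2)."""
    if row < 2:
        raise ValueError("the layout starts at row 2")
    slot = (row - 2) % 4     # MOD(ROW()-2,4): position inside the 4-row block
    index = (row - 2) // 4   # QUOTIENT(ROW()-2,4): 0-based log record
    if slot >= 2:
        return ""            # third and fourth CHOOSE options are empty strings
    column = "A" if slot == 0 else "D"
    # Past the end of the log the referenced cell is empty, so only the label remains.
    value = records[index][column] if index < len(records) else ""
    return f"{headers[column]}:{value}"


# Hypothetical log: row 1 holds headers, data starts in row 2.
headers = {"A": "Date", "D": "Notes"}
records = [{"A": "Mon", "D": "ok"}, {"A": "Tue", "D": "late"}]
for row in range(2, 10):
    print(row, repr(print_sheet_cell(row, headers, records)))
```

Rows 2 and 3 show the first record, rows 4 and 5 are blank, rows 6 and 7 show the second record, and so on, which is why the formula can be pulled down without editing any offsets.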
Instead of ROW()-3 etc., have a look at whether this offset calculation helps:
Column A is the row in the output list and column B is the target row from the input list.

Can I make an array out of a range of countif functions?

A truncated version of my data is in the form shown in the screenshot below: three columns of 5 unique names. The names appear in any order and in any position but never repeat in a single row.
My goal is to create an array that contains the number of times Adam appears in each row. I can fill down the formula =COUNTIF(A2:C2,$I$2) in a new column, or if I write the array manually for each row, it looks like:
={countif(A2:C2,$I$2);countif(A3:C3,$I$2);countif(A4:C4,$I$2);countif(A5:C5,$I$2);countif(A6:C6,$I$2)}
Where cell I2 contains "Adam". Of course, this is not feasible for large data sets.
I know that arrays are effectively cells turned into ranges, but my main issue is that the cell I'm trying to transform already references a range, and I don't know how to tell the software to apply the countif down each row (i.e. I intuitively would like to do something like countif((A2:C2):(A99:C99),"Adam") but understand that's not how spreadsheets work).
My goal is ultimately to perform some operations on the corresponding array but I think I'm comfortable enough with that once I can get the array formula I'm looking for.
Try:
=ARRAYFORMULA(IF(A2:A="",,MMULT(IF(A2:C="Adam", 1, 0), {1;1;1})))
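If the MMULT trick looks opaque: the inner IF builds a 0/1 indicator matrix, and multiplying it by a column of ones sums each row. A small Python sketch of the same idea, with a hypothetical function name and made-up sample rows:

```python
def count_per_row(rows, name):
    """Count how often `name` appears in each row, MMULT-style."""
    target = name.lower()  # spreadsheet "=" comparisons on text ignore case
    # IF(A2:C="Adam", 1, 0): one 0/1 flag per cell
    indicator = [[1 if str(cell).lower() == target else 0 for cell in row] for row in rows]
    ones = [1] * (len(rows[0]) if rows else 0)  # the {1;1;1} column vector
    # Each output entry is the dot product of one indicator row with the ones vector.
    return [sum(a * b for a, b in zip(row, ones)) for row in indicator]


# Hypothetical data: three columns of names, no repeats within a row.
rows = [
    ["Adam", "Beth", "Carl"],
    ["Dina", "Evan", "Beth"],
    ["Carl", "Adam", "Dina"],
]
print(count_per_row(rows, "Adam"))  # [1, 0, 1]
```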

Excel Array: Reducing the Number of Calculation Steps

As per my on-going journey through the world of Excel arrays, I was wondering if someone might be able to give me a pointer or two.
On the excel sheet attached, I currently have a four-step process to get from a segregated lookup to a gapless list:
Step 1 (yellow): For the 50-word long list in sheet 'Data', a 50-cell lookup is performed to see whether the input in row 1 (red) appears somewhere in the corresponding cell. In this case, the lookup is performed three times for three different inputs, i.e. in columns C-E.
Step 2 (orange): An array then relists the contents of the 50-cell lookup above it but removes all empty cells (i.e. where there is no match to the input in row 1)
Step 3 (green): The results from step 2 are listed out in a single column.
Step 4 (blue): The results from step 3 are listed out using the same technique as in step 2 in order to remove the blank cells.
Collectively, this enables a gapless listing of all data objects which contain the given inputs somewhere in their string.
However, my real list of data objects is 5000 entries long and I would like to look up the results for 100 or more inputs. As step 1 requires each combination to be looked up separately, this requires 500,000 calculations for step 1 alone, which causes a heavy toll on the processors.
Therefore, I was wondering if anyone had an idea as to how I could shortcut this process to reduce the number of cells / calculations involved. I assume that step 1 and 2 could somehow be merged, but my knowledge of arrays is not sufficient to think of how this could be done.
It would be brilliant to hear from somebody who may have some advice on the matter!
Kind regards,
Rob
File Link: https://drive.google.com/open?id=10O91QDD78RkbWtQx2iWfax17Dt5TPw1G
Since you're not removing duplicated entries from the final list, this is quite straightforward.
Based on the workbook you provided, to be entered within the Lookup sheet:
In cell A1:
=SUMPRODUCT(0+ISNUMBER(FIND(C1:E1,Data!A1:A50)))
In any cell of your choice, to begin the list of returns, array formula**:
=IF(ROWS($1:1)>A$1,"",INDIRECT("'Data'!"&TEXT(SMALL(IF(ISNUMBER(FIND(C$1:E$1,Data!A$1:A$50)),10^5*ROW(Data!A$1:A$50)+COLUMN(Data!A$1:A$50)),ROWS($1:1)),"R0C00000"),0))
and copied down until you start to get blanks for the results.
Notes:
Instructions for entering an array formula are at the foot of this post.
The sheet name ('Data', within the second formula) should be amended as required.
It is important that the range containing the values being searched for (C1:E1 here) and that containing the entries to be searched within (A1:A50) be orthogonal, i.e. one is a horizontal range, the other a vertical range.
If you are not using an English-language version of Excel then the part "R0C00000" within the second formula may need amending.
Regards
**Array formulas are not entered in the same way as 'standard' formulas. Instead of pressing just ENTER, you first hold down CTRL and SHIFT, and only then press ENTER. If you've done it correctly, you'll notice Excel puts curly brackets {} around the formula (though do not attempt to manually insert these yourself).
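For readers trying to follow the second formula, here is a rough Python sketch of how each hit is encoded as 10^5*row+column, sorted with SMALL, and turned back into an R1C1 address for INDIRECT. The function name and sample data are hypothetical, and the sketch assumes the data sits in column A as in the workbook.

```python
def gapless_matches(data, terms):
    """Return (R1C1 address, cell text) for every data/term hit, in row order."""
    keys = []
    for row_number, text in enumerate(data, start=1):
        for term in terms:
            if term in text:  # FIND is case-sensitive, like Python's `in`
                keys.append(10**5 * row_number + 1)  # COLUMN(Data!A:A) is always 1
    keys.sort()  # SMALL(..., k) walks the keys from smallest to largest
    results = []
    for key in keys:
        row, col = divmod(key, 10**5)
        address = f"R{row}C{col:05d}"  # TEXT(key, "R0C00000")
        results.append((address, data[row - 1]))
    return results


# Hypothetical stand-ins for Data!A1:A3 and the inputs in Lookup!C1:E1.
data = ["alpha beta", "gamma", "beta gamma"]
terms = ["beta", "gamma", "zzz"]
for address, text in gapless_matches(data, terms):
    print(address, text)
```

A row that matches two inputs is listed twice, which is consistent with the remark that duplicates are not removed from the final list.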

Excel sum across multiple sheets with criteria array

I have a workbook with several sheets, each containing a large amount of data formatted identically. What I'd like to do is enter a formula on a summary sheet that sums data from across the data sheets, selecting the data to sum based on an array of criteria.
The list of sheets is named 'AdHoc_Sheets' and the list of criteria is named 'Uncontrollable_Compensation'.
First attempt:
=SUMPRODUCT(SUMIF(INDIRECT("'"&AdHoc_Sheets&"'!"&"C:C"),A40,INDIRECT("'"&AdHoc_Sheets&"'!"&"E:E")))
This works well when only a single criterion (in this case 'A40') is needed. The challenge I'm finding is changing that to be an array of criteria.
Second attempt:
={SUMPRODUCT(SUM(IF(ISERROR(MATCH(INDIRECT("'"&AdHoc_Sheets&"'!"&"C:C"),TRANSPOSE(Uncontrollable_Compensation),0)),0,INDIRECT("'"&AdHoc_Sheets&"'!"&"E:E"))))}
Which returns a zero when it's not CSE'd and an #N/A error when it is CSE'd. Something about the dynamics of juggling the arrays is messing me up, and I can't quite tell if I need to turn to MMULT or some other method. Thanks in advance.
Assuming that the entries in column C are text, not numeric, array formula**:
=SUM(IF(ISNUMBER(MATCH(T(OFFSET(INDIRECT("'"&AdHoc_Sheets&"'!"&"C1"),TRANSPOSE(ROW(C1:C100)-MIN(ROW(C1:C100))),0)),Uncontrollable_Compensation,0)),N(OFFSET(INDIRECT("'"&AdHoc_Sheets&"'!"&"E1"),TRANSPOSE(ROW(C1:C100)-MIN(ROW(C1:C100))),0))))
With such a construction you cannot 'get away' with arbitrarily referencing entire columns without detriment to performance. Hence my choice of range from row 1 to row 100, which obviously you can change, though be sure to keep it as small as possible.
Regards
**Array formulas are not entered in the same way as 'standard' formulas. Instead of pressing just ENTER, you first hold down CTRL and SHIFT, and only then press ENTER. If you've done it correctly, you'll notice Excel puts curly brackets {} around the formula (though do not attempt to manually insert these yourself).
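Conceptually, the formula walks every listed sheet cell by cell, keeps the rows whose column C entry is one of the criteria, and adds up the matching column E values. A minimal Python sketch of that logic, assuming each sheet is represented as a list of (column C, column E) pairs; the function name and data are hypothetical:

```python
def sum_across_sheets(sheets, sheet_names, criteria):
    """Sum column E over all named sheets where column C matches any criterion."""
    wanted = {c.lower() for c in criteria}  # MATCH ignores case on text
    total = 0
    for name in sheet_names:
        for key, amount in sheets[name]:
            # T(...) only passes text through, so non-text keys never match.
            is_hit = isinstance(key, str) and key.lower() in wanted
            # Roughly like N(...): text amounts contribute nothing to the sum.
            if is_hit and isinstance(amount, (int, float)):
                total += amount
    return total


# Hypothetical workbook with two data sheets.
sheets = {
    "Jan": [("Bonus", 100), ("Salary", 500), ("Overtime", 50)],
    "Feb": [("Bonus", 120), ("Travel", 30), ("overtime", "n/a")],
}
print(sum_across_sheets(sheets, ["Jan", "Feb"], ["Bonus", "Overtime"]))  # 270
```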

speed up array formula

I have the following formula which I will explain below:
{=SUM(IF(($G$1:$L$1=$O$1)*($G$2:$L$2=$O$2)*($G$3:$L$3=$O$3)*($G$4:$L$4=$O$4)*($G$5:$L$5=$O$5)*($G$6:$L$6=$O$6)*($G$7:$L$7=$O$7);G21:L21))}
Here is what the worksheet looks like:
Under columns G - L we have a 'database' of all data. These columns will be added cumulatively each quarter (approx 30 columns a quarter). So after a few years we have ended up with a bunch of database columns (1000 + columns of raw data). For the sake of this demo, I have only included those 6 columns.
As you can see, each column contains specific parameters in rows 1 - 7, which allow us to identify a specific CountryCode + Project Code + Category + Fiscal Year + ... (etc.). This allows us to track down a unique specific project and retrieve its data.
Then, in column O, we have a specific project we are trying to retrieve values for (you can see that rows 1 - 7 are the same as in column G, since we are trying to retrieve values for this particular project).
Here comes our formula, which I have included above. Here is what it looks like when I press F2. As you can see, the IF statement first checks whether each column matches the pre-defined criteria in column O, and the formula then sums all the columns that match all the criteria in rows 1 - 7.
Now here is the problem. We have a worksheet which contains 20 projects (such as column O), and we are using this array formula there to retrieve values. The problem is that retrieving data this way takes A LOT OF TIME. We have also adopted an approach via VBA in which we iterate through all the cells, insert the formula, calculate the array cell, and then copy and paste the resulting value in place (so that we won't end up with a full sheet of array formulas). However, it still takes a LONG time to calculate (1 minute or so).
I was wondering whether there is a better way to retrieve the data in the format described above (that is, we have specific criteria we are trying to find)? Maybe SUMIFS would be better? Or SUMPRODUCT? Or even a completely different solution?
I am open to any proposal which would speed up the process.
I met a similar problem about 2 weeks ago. At first I used a helper row. The helper row concatenates the 7 strings in each column, so that you only need to check whether the joined text matches. For example, assuming the helper row is row 8 in your sample, the formula in cell G8 would be
=CONCATENATE(G1,"|",G2,"|",G3,"|",G4,"|",G5,"|",G6,"|",G7)
and do the same for the rest, including column O:
=CONCATENATE(O1,"|",O2,"|",O3,"|",O4,"|",O5,"|",O6,"|",O7)
Then do an HLOOKUP:
=HLOOKUP(O8,G8:L21,14,0)
In my case, the calculation time dropped from 10 min to a few seconds!
Alternatively, I also found a way to do it without a helper row, using an array formula again, but the idea is pretty much the same.
The formula in O21 in your sample would be
=SUM(IF(CONCATENATE(G1:L1,G2:L2,G3:L3,G4:L4,G5:L5,G6:L6,G7:L7)=CONCATENATE(O1,O2,O3,O4,O5,O6,O7),G21:L21))
(I didn't add the "|" delimiter in this formula, but it is better to do so.)
But in the end I prefer the helper row method.
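To show why the delimiter matters, here is a small Python sketch of the composite-key lookup. It is only an illustration: the function names and the sample criteria are hypothetical, and three criteria are used instead of seven to keep it short.

```python
def make_key(values, sep="|"):
    """Join the criteria of one column into a single lookup key."""
    return sep.join(str(v) for v in values)


def lookup_value(columns, target_criteria, sep="|"):
    """Return the value of the first column whose key matches, else None."""
    first_by_key = {}
    for criteria, value in columns:
        # setdefault keeps the first column per key, like an exact-match HLOOKUP.
        first_by_key.setdefault(make_key(criteria, sep), value)
    return first_by_key.get(make_key(target_criteria, sep))


# Hypothetical database columns: (criteria, value to retrieve).
columns = [
    (("AB", "C", "2020"), 10),
    (("A", "BC", "2020"), 20),
]
print(lookup_value(columns, ("A", "BC", "2020")))          # 20
print(lookup_value(columns, ("A", "BC", "2020"), sep=""))  # 10: keys collide without a delimiter
```

Without a separator both columns produce the key "ABC2020", so the lookup silently returns the wrong column.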
For your reference
HTH
To improve performance, avoid repeating the same calculations multiple times.
You wrote: "This allows us to track down a unique specific project and retrieve its data."
If a combination of 7 values is unique, calculate the position of the chosen project only once in a helper cell (for example O15) with an array formula (confirmed with Ctrl+Shift+Enter):
=MATCH(1;(G1:L1=O1)*(G2:L2=O2)*(G3:L3=O3)*(G4:L4=O4)*(G5:L5=O5)*(G6:L6=O6)*(G7:L7=O7);0)
Use the following formula in O21 and drag down:
=INDEX(G21:L21;1;$O$15)
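The same "match once, index many times" idea in a short Python sketch. The names and data are hypothetical, and three criteria stand in for the seven in the worksheet.

```python
def match_column(criteria_by_column, target):
    """1-based position of the first column whose criteria all match, else None."""
    for position, criteria in enumerate(criteria_by_column, start=1):
        if tuple(criteria) == tuple(target):
            return position
    return None


def values_for_project(data_rows, position):
    """Pick the matched column out of every data row (the INDEX step)."""
    return [row[position - 1] for row in data_rows]


# Hypothetical stand-ins for the criteria rows of G:L and the data rows below them.
criteria_by_column = [("US", "P1", "2020"), ("US", "P2", "2020"), ("DE", "P1", "2021")]
data_rows = [[1, 2, 3], [10, 20, 30]]

position = match_column(criteria_by_column, ("US", "P2", "2020"))  # computed once
print(position, values_for_project(data_rows, position))  # 2 [2, 20]
```

The expensive comparison over all criteria runs a single time per project; every data row after that is a cheap positional lookup.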
