speed up array formula - arrays

I have the following formula which I will explain below:
{=SUM(IF(($G$1:$L$1=$O$1)*($G$2:$L$2=$O$2)*($G$3:$L$3=$O$3)*($G$4:$L$4=$O$4)*($G$5:$L$5=$O$5)*($G$6:$L$6=$O$6)*($G$7:$L$7=$O$7);G21:L21))}
Here is what the worksheet looks like:
Under columns G - L we have a 'database' of all data. These columns will be added cumulatively each quarter (approx 30 columns a quarter). So after a few years we have ended up with a bunch of database columns (1000 + columns of raw data). For the sake of this demo, I have only included those 6 columns.
As you can see, each column contains specific parameters in rows 1 - 7, which allow us to identify a specific CountryCode + Project Code + Category + Fiscal Year + ... (etc.). This allows us to track down a unique specific project and retrieve its data.
In column O we then have a specific project we are trying to retrieve values for (you can see that rows 1 - 7 are the same as in column G, because we are trying to retrieve values for this particular project).
Here comes our formula, which I have given above. Here is what it looks like when I press F2. As you can see, the IF statement first checks whether each column matches the pre-defined criteria in column O, and the formula then sums all the columns that match all the criteria in rows 1 - 7.
Now here is the problem. We have a worksheet which contains 20 projects (such as column O), and we are using this array formula there to retrieve values. The problem is that retrieving data this way takes A LOT OF TIME. We have also adopted an approach via VBA where we iterate through all the cells, insert the formula, calculate the array cell, and then copy and paste the resulting value in its place (so that we don't end up with a sheet full of array formulas). However, it still takes a LONG time to calculate (1 minute or so).
I was wondering whether there is a better way to retrieve the data in the format already mentioned (that is, we have specific criteria we are trying to find). Maybe SUMIFS would be better? Or SUMPRODUCT? Or even a completely different solution?
I am open to any proposal which would speed up the process.

I ran into a similar problem about 2 weeks ago. My first approach was a helper column/row. The helper row concatenates the 7 strings in each column, so that a single comparison is enough to check whether the joined text matches. For example, assuming the helper row is row 8 in your sample, the formula in cell G8 would be
=CONCATENATE(G1,"|",G2,"|",G3,"|",G4,"|",G5,"|",G6,"|",G7)
and do the same for the rest including column O
=CONCATENATE(O1,"|",O2,"|",O3,"|",O4,"|",O5,"|",O6,"|",O7)
Then do a HLOOKUP
=HLOOKUP(O8,G8:L21,14,0)
In my case, the calculation time dropped from 10 minutes to a few seconds!
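The helper-row idea can be illustrated outside Excel as a keyed lookup. The following is only a rough Python sketch of the same logic, not something that runs in the workbook: the sample data, the `make_key` helper and the variable names are all made up for illustration, and it assumes each combination of criteria is unique.

```python
# Hypothetical sample data: each database column is a pair of
# (the seven criteria from rows 1-7, the value from row 21).
columns = [
    (("US", "P001", "Capex", "FY20", "Q1", "Actual", "USD"), 100.0),
    (("US", "P002", "Capex", "FY20", "Q1", "Actual", "USD"), 250.0),
    (("DE", "P001", "Opex", "FY21", "Q2", "Budget", "EUR"), 75.0),
]


def make_key(criteria):
    """Join the criteria with "|", like the CONCATENATE helper row."""
    return "|".join(criteria)


# Build the helper "row" once: joined key -> value.
lookup = {make_key(criteria): value for criteria, value in columns}

# The project in column O is looked up with one comparison per key
# instead of seven comparisons per database column.
project = ("US", "P002", "Capex", "FY20", "Q1", "Actual", "USD")
result = lookup[make_key(project)]
print(result)
```

The point of the sketch is that the seven comparisons are collapsed into one key, which is computed once per column rather than once per lookup.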
Alternatively I also found a way to do without helper column, using array again, but the idea is pretty much the same,
the formula in O21 as per your sample would be
=SUM(IF(CONCATENATE(G1:L1,G2:L2,G3:L3,G4:L4,G5:L5,G6:L6,G7:L7)=CONCATENATE(O1,O2,O3,O4,O5,O6,O7),G21:L21))
(I didn't add the "|" delimiter to this formula, but it is better to do so.)
But in the end I prefer the helper column method.
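Why the delimiter matters can be shown with a tiny example. This is a Python sketch with made-up values, assuming the delimiter character never occurs inside the values themselves:

```python
def join_plain(parts):
    """Concatenate without a delimiter."""
    return "".join(parts)


def join_delimited(parts):
    """Concatenate with "|" between the parts."""
    return "|".join(parts)


# Two different (hypothetical) criteria sets.
a = ("AB", "C")
b = ("A", "BC")

collides_plain = join_plain(a) == join_plain(b)          # both give "ABC"
collides_delim = join_delimited(a) == join_delimited(b)  # "AB|C" vs "A|BC"
print(collides_plain, collides_delim)
```

Without the delimiter, two different combinations can produce the same joined text and would be treated as a match.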
For your reference
HTH

To improve performance, avoid repeating the same calculations multiple times.
You wrote: "This allows us to track down a unique specific project and retrieve its data."
If the combination of the 7 values is unique, calculate the position of the chosen project only once in a helper cell (for example O15) with an array formula (confirmed with Ctrl+Shift+Enter):
=MATCH(1;(G1:L1=O1)*(G2:L2=O2)*(G3:L3=O3)*(G4:L4=O4)*(G5:L5=O5)*(G6:L6=O6)*(G7:L7=O7);0)
Use the following formula in O21 and drag down:
=INDEX(G21:L21;1;$O$15)
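The "match once, index many times" idea can be sketched in Python as follows. This only illustrates the logic; the data, the three-criteria columns (instead of seven, for brevity) and the `find_column` helper are assumptions made for the example.

```python
# Hypothetical database: one criteria tuple per database column.
criteria_columns = [
    ("US", "P001", "FY20"),
    ("US", "P002", "FY20"),
    ("DE", "P001", "FY21"),
]

# Hypothetical data rows (row 21, 22, ...), one value per database column.
data_rows = [
    [100.0, 250.0, 75.0],
    [110.0, 260.0, 80.0],
    [120.0, 270.0, 85.0],
]


def find_column(criteria_columns, project):
    """Return the 0-based index of the first column matching every criterion."""
    for index, criteria in enumerate(criteria_columns):
        if criteria == project:
            return index
    raise LookupError("no column matches the project criteria")


project = ("US", "P002", "FY20")
position = find_column(criteria_columns, project)  # the MATCH step, done once
values = [row[position] for row in data_rows]      # the INDEX step, per row
print(values)
```

The expensive search over all database columns happens once per project, and every data row then costs a single positional lookup.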

Related

How can I drag down to multiple rows a formula which has more than 1 row in its result?

I am getting data from an extension, and I refer to that extension in a formula whose result spans a predetermined number of rows. The problem appears when the result has more than 1 row: I want to drag the formula down to multiple rows, but the results overlap each other. For example, if the result of the formula has 3 rows starting in row 1 and I drag the formula down from row 1 to row 3, the formulas in rows 1 and 2 show an error because their results overlap. I will put a picture of how it looks...
Is there a way to specify a number of rows as a space between each formula, so that when I drag the formula down to more rows it adjusts to the "space" I specified?
This is the formula I am using. As you can see, I am also referring to another sheet, so it would be great if you could use this formula in your answer, including the case where the "space" has to be specified with another formula (it is probably obvious, but the predetermined number of rows in the formula is the "2d").
=CRYPTOFINANCE("KRAKEN:"&'crypto-track'!C4&"/USD", "price_history", "2d")
this is usually solved by constructing an array of formulae where you stack them up in the line like:
={CRYPTOFINANCE("KRAKEN:"&'crypto-track'!C4&"/USD", "price_history", "2d");
CRYPTOFINANCE("KRAKEN:"&'crypto-track'!C5&"/USD", "price_history", "2d");
CRYPTOFINANCE("KRAKEN:"&'crypto-track'!C6&"/USD", "price_history", "2d")}
this way the 2nd formula will pick up right after the 1st formula ends
you can spare yourself the manual work of constructing such an array - especially if that array needs to span a larger range - by building a formula that generates the formula. for example: https://stackoverflow.com/a/68278101/5632629
also, make sure you follow the rules of array constructs so that you avoid array errors - https://stackoverflow.com/a/58042211/5632629
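As an illustration of "a formula that generates the formula", here is a small Python sketch that builds the stacked array text for a range of rows. It is not the method from the linked answer (which stays inside the spreadsheet); the function name `stacked_formula` is made up, and the output is meant to be pasted into a cell by hand.

```python
# One CRYPTOFINANCE call per source row, taken from the formula in the question.
TEMPLATE = (
    'CRYPTOFINANCE("KRAKEN:"&\'crypto-track\'!C{row}&"/USD", '
    '"price_history", "2d")'
)


def stacked_formula(first_row, last_row):
    """Build a vertical array literal with one call per row of 'crypto-track'!C."""
    calls = [TEMPLATE.format(row=row) for row in range(first_row, last_row + 1)]
    # ";" stacks the results vertically inside {...}.
    return "={" + ";\n".join(calls) + "}"


formula = stacked_formula(4, 6)
print(formula)
```

Calling `stacked_formula(4, 6)` reproduces the three-line array shown above; a wider range only changes the two arguments.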

How would I reference cells in order while moving down several cells in a formula on a separate sheet?

I have been at this for hours and it's kicking me. I'm trying to build a log for someone, and I have a sheet with standard data in table format. I need the next sheet to look a certain way so that it can be exported to PDF and continue looking like the log always has - which means that it will not be a standard table.
In the Log sheet, the data for each entry is all on one row; in the PrintSheet, the cell references will be placed in three rows, with a blank fourth row as a gap. Obviously, when you paste formulas in Excel, the reference follows the row you are in rather than moving to the next row down in the referenced sheet. I've included the formulas that "work" in blue in the image for reference, but that would involve manually subtracting 3 (or 4, depending on which one I'm doing) in each formula. Formula for reference: =Log!$A$1&": "&INDIRECT("'log'!A"&ROW()-3)
Is there a way to dynamically write this formula so it can just be copy/pasted every 4th row when they need more in the PrintSheet? Is it possible I need to be using an array formula (that is an area of Excel that I am deeply lacking in)?
Input Sheet (log) vs Output Sheet (PrintSheet) with formulas in blue
Use a bit of maths on the row number and pull the formula down as required:
=CHOOSE(MOD(ROW()-2,4)+1,Log!$A$1&":"&INDEX(Log!A:A,QUOTIENT(ROW()-2,4)+2),Log!$D$1&":"&INDEX(Log!D:D,QUOTIENT(ROW()-2,4)+2),"","")
(Screenshots of the Log sheet and the resulting PrintSheet.)
If you have Excel 365, you can do it using the same method but as a spill formula:
=LET(logRows,COUNTA(Log!A:A)-1,
seq,SEQUENCE(logRows*4,1,0),
CHOOSE(MOD(seq,4)+1,Log!$A$1&":"&INDEX(Log!A:A,QUOTIENT(seq,4)+2),
Log!$D$1&":"&INDEX(Log!D:D,QUOTIENT(seq,4)+2),"",""))
In this case it counts the number of rows in the log so will expand as you add more rows.
The date and time can be done in a similar way.
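The row arithmetic behind both formulas can be checked with a short Python sketch. The headers and records below are invented sample data, and `print_sheet_cell` is a hypothetical helper that mirrors the CHOOSE/MOD/QUOTIENT logic for a PrintSheet that starts in row 2.

```python
# Hypothetical Log sheet: headers of columns A and D, then one record per row.
log = {
    "headers": ("Name", "Status"),
    "records": [("Alice", "Open"), ("Bob", "Closed")],
}


def print_sheet_cell(row, log):
    """Return the PrintSheet text for a 1-based sheet row (output starts at row 2)."""
    offset = row - 2
    slot = offset % 4      # MOD(ROW()-2,4): position inside the block of 4 rows
    record = offset // 4   # QUOTIENT(ROW()-2,4): which Log record to read
    header_a, header_d = log["headers"]
    value_a, value_d = log["records"][record]
    if slot == 0:
        return f"{header_a}:{value_a}"
    if slot == 1:
        return f"{header_d}:{value_d}"
    return ""              # slots 2 and 3 are left blank


output = [print_sheet_cell(row, log) for row in range(2, 10)]
print(output)
```

Each Log record occupies a block of four output rows, so the remainder picks the field and the quotient picks the record.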
Instead of row()-3 etc. have a look here whether this offset calculation helps:
Column A is the row in the output list and column B is the target row from the input list

Can I make an array out of a range of countif functions?

A truncated version of my data is in the form shown in the screenshot below: three columns of 5 unique names. The names appear in any order and in any position but never repeat in a single row.
My goal is to create an array that contains the number of times Adam appears in each row. I can fill down the formula =countif(A2:C2,$I$2) in a new column, or if I write the array manually for each row, it looks like:
={countif(A2:C2,$I$2);countif(A3:C3,$I$2);countif(A4:C4,$I$2);countif(A5:C5,$I$2);countif(A6:C6,$I$2)}
Where cell I2 contains "Adam". Of course, this is not feasible for large data sets.
I know that arrays are effectively cells turned into ranges, but my main issue is that the cell I'm trying to transform already references a range, and I don't know how to tell the software to apply the countif down each row (i.e. I intuitively would like to do something like countif((A2:C2):(A99:C99),"Adam") but understand that's not how spreadsheets work).
My goal is ultimately to perform some operations on the corresponding array but I think I'm comfortable enough with that once I can get the array formula I'm looking for.
try:
=ARRAYFORMULA(IF(A2:A="",,MMULT(IF(A2:C="Adam", 1, 0), {1;1;1})))
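What the MMULT does here is multiply a 0/1 indicator matrix by a column of ones, which sums each row. A minimal Python sketch of that idea, with made-up names in the grid and a hypothetical `count_per_row` helper:

```python
# Hypothetical sample of the three name columns.
rows = [
    ["Adam", "Beth", "Carl"],
    ["Dana", "Evan", "Beth"],
    ["Carl", "Adam", "Dana"],
]


def count_per_row(rows, name):
    """Count occurrences of name in each row via an indicator matrix."""
    # IF(A2:C="Adam", 1, 0): 1 where the cell matches, otherwise 0.
    indicator = [[1 if cell == name else 0 for cell in row] for row in rows]
    # {1;1;1}: a column of ones, one entry per data column.
    ones = [1] * len(rows[0])
    # MMULT: the dot product of each indicator row with the ones column.
    return [sum(x * w for x, w in zip(row, ones)) for row in indicator]


counts = count_per_row(rows, "Adam")
print(counts)
```

The result is one count per row, which is the array the question asks for.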

Excel array Formula that copies only cells containing a string

I would prefer to use an excel array formula (but if it can only be done in VBA, so be it) that copies ALL cells from a column array that contains specific text. The picture below shows what I am after and what I have tried. I'm getting close (thanks to similar, but different questions) but can't quite get to where I want. At the moment, I am getting only the first cell instead of all the cells. In my actual application, I am searching through about 20,000 cells and will have a few hundred search terms. I expect most search terms to give me about 8 - 12 cells with that value.
formula I am using:
=INDEX($A$4:$A$10,MATCH(FALSE,ISERROR(SEARCH($C$1,$A$4:$A$10)),0))
Spreadsheet Image
To make this work efficiently, I recommend having a separate cell holding the results count (I used cell C2) which has this formula:
=COUNTIF(A:A,"*"&C1&"*")
Then in cell C4 and copied down use this array formula (The -3 is just because the header row is row 3. If the header row was row 1, it would be -1):
=IF(ROW(A1)>$C$2,"",INDEX($A$4:$A$21000,SMALL(IF(ISNUMBER(SEARCH($C$1,$A$4:$A$21000)),ROW($A$4:$A$21000)-3),ROW(C1))))
I tested this with 21000 rows of data in column A with an average of 30 results per search string and the formula is copied down for 60 cells in column C. With that much data, this takes about 1-2 seconds to finish recalculating. Recalculation time can vary widely depending on other factors in your workbook (additional formulas, nested dependencies, use of volatile functions, etc) as well as your hardware.
Alternately, you could just use the built-in Filter functionality, but I hope this helps.
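The logic of the SMALL/INDEX formula (collect the positions of all matching cells, then return the k-th one) can be sketched in Python. The cell contents and function names are made up for the example; the sketch lowercases both sides because SEARCH is case-insensitive, and it does not handle SEARCH wildcards.

```python
def matching_cells(cells, term):
    """Return every cell that contains term, ignoring case."""
    term = term.lower()
    return [cell for cell in cells if term in cell.lower()]


def nth_match(cells, term, n):
    """Return the n-th (1-based) matching cell, or "" when there are fewer."""
    matches = matching_cells(cells, term)
    return matches[n - 1] if n <= len(matches) else ""


# Hypothetical contents of the searched column.
cells = ["red apple", "banana", "Apple pie", "cherry"]

all_matches = matching_cells(cells, "apple")
second = nth_match(cells, "apple", 2)
beyond = nth_match(cells, "apple", 3)
print(all_matches, second, repr(beyond))
```

The empty string for `n` beyond the number of matches corresponds to the `IF(ROW(A1)>$C$2,"",...)` guard in the formula.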
You need to get the ROWS. Put this in C4 and copy down.
=IFERROR(AGGREGATE(15,6, IF(SEARCH($C$1, $A$4:$A$10)>0, ROW($A$4:$A$10)), ROW($C4)-ROW($A$4)+1), "")
This is an array formula, so confirm it with Ctrl+Shift+Enter.

Code equivalent to Array Formula

I am currently using an array formula in my data to find a row where columns O, Y, and AA match the current row, and where column A value does not match, and return column C for the matching row.
Here is my formula:
=INDEX(C:C,MATCH(1,(O:O=O2)*(Y:Y=Y2)*(AA:AA=AA2)*(A:A<>A2),0))
Using named ranges I have been able to input this formula using VBA, but what I really want to do is use VBA to perform a similar function and write the resulting value to column D.
I am thinking that possibly a loop, for each i from 2 to last row, find the other row within the range that matches and write cell(row that was found, 3).value to cell(i, 4), but I don't know the syntax for a VBA array to find that matching row.
While not explicitly stated, it could easily be inferred that you are seeking to use VBA to increase the efficiency of the calculation/recalculation of your array formula. You haven't provided the scope (i.e. number of rows) of your data but it is unlikely that you require the full column references you are using. The following calculation cycle times were based on ~1000 rows of static data.
Your array formula:
=INDEX(C:C, MATCH(1, (O:O=O2)*(Y:Y=Y2)*(AA:AA=AA2)*(A:A<>A2), 0))
Elapsed time to fill down and calculate: 24.828 seconds
Your array formula with column references truncated to actual extents of the data:
=INDEX($C$2:$C$999, MATCH(1, ($O$2:$O$999=O2)*($Y$2:$Y$999=Y2)*($AA$2:$AA$999=AA2)*($A$2:$A$999<>A2), 0))
Elapsed time to fill down and calculate: 0.203 seconds
Comparable standard formula with column references truncated to actual extents of the data:
=INDEX($C$2:$C$999, MIN(INDEX(ROW($1:$998)+(($A$2:$A$999=A2)+($O$2:$O$999<>O2)+($Y$2:$Y$999<>Y2)+($AA$2:$AA$999<>AA2))*1E+99, , )))
Elapsed time to fill down and calculate: 0.257 seconds
As you can see, cutting the column references down to the cells that are actually being used increases efficiency immensely. Array formulas process by calculating everything against everything else, so when the formula is filled down every row, the calculation load grows roughly with the square of the number of rows referenced rather than in proportion to it.
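Since the question asked for a code equivalent, the matching logic of these formulas can be written as a loop. The following is a Python sketch rather than VBA, and the sample rows and the `find_partner_values` name are invented; it groups rows by (O, Y, AA) first, so each row only has to scan rows that already agree on those three columns.

```python
def find_partner_values(rows):
    """For each row, return column C of the first row with equal O, Y, AA and different A."""
    # Group row indexes by the three columns that must match.
    groups = {}
    for index, row in enumerate(rows):
        groups.setdefault((row["O"], row["Y"], row["AA"]), []).append(index)

    result = []
    for row in rows:
        candidates = groups[(row["O"], row["Y"], row["AA"])]
        # Indexes are in ascending order, so this is the first match, like MATCH(1, ..., 0).
        match = next((i for i in candidates if rows[i]["A"] != row["A"]), None)
        # None stands in for the #N/A the formula returns when nothing matches.
        result.append(rows[match]["C"] if match is not None else None)
    return result


# Hypothetical sample data.
rows = [
    {"A": "x1", "C": "c1", "O": 1, "Y": 2, "AA": 3},
    {"A": "x2", "C": "c2", "O": 1, "Y": 2, "AA": 3},
    {"A": "x3", "C": "c3", "O": 9, "Y": 9, "AA": 9},
    {"A": "x1", "C": "c4", "O": 1, "Y": 2, "AA": 3},
]
column_d = find_partner_values(rows)
print(column_d)
```

The same structure (a dictionary keyed on the three matching columns, followed by one pass over the rows) could be carried over to VBA with a Scripting.Dictionary, although that translation is not shown here.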
If your data is constantly changing and you do not know how many rows you will be dealing with, use named ranges with a Refers to: that is defined dynamically. Example:
Pick a column that usually defines the extents of the data and note the nature of the data. This method differs slightly depending upon whether you are dealing with true numbers or text. For demonstration purposes, column C has numbers.
Choose Formulas ► Defined Names► Name Manager. When the Name Manager dialog opens, click New.
Type a friendly name for the range in the Name: text box; e.g. myColAA. Leave the Scope: as Workbook.
Use the following for the Refers to::     =$AA$2:INDEX($AA:$AA, MATCH(1e99, $C:$C)) 
This will define myColAA as AA2 to the row matching the last number in column C. If column C:C was full of text values you would use the following,     =$AA$2:INDEX($AA:$AA, MATCH("zzz", $C:$C)) 
Repeat for columns A:A, C:C, O:O and Y:Y. Keep the last used reference as column C:C so they will always have the same number of rows but change the other column references and give each a new name.
When you have created all of the named ranges and are back at the worksheet, test one by tapping F5, typing myColAA into the Reference: text box and clicking OK.
Your array formula will now look similar to the following.
=INDEX(myColC, MATCH(1, (myColO=O2)*(myColY=Y2)*(myColAA=AA2)*(myColA<>A2), 0))
The named ranges will grow and shrink with the amount of data available.
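What the `MATCH(1e99, $C:$C)` part of the named range does is locate the last number in column C, and the range for AA is then cut off at that row. A rough Python sketch of this idea, using made-up column contents and a hypothetical `last_numeric_row` helper:

```python
def last_numeric_row(column):
    """Return the 1-based row of the last numeric cell, or None if there is none."""
    for row in range(len(column), 0, -1):
        value = column[row - 1]
        # bool is excluded explicitly because it is a subclass of int in Python.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return row
    return None


# Hypothetical sheet columns; None stands for an empty cell.
column_c = ["Header", 10, 20, 30, None, None]
column_aa = ["Header", "a", "b", "c", None, None]

last = last_numeric_row(column_c)  # plays the role of MATCH(1e99, $C:$C)
my_col_aa = column_aa[1:last]      # rows 2..last, like $AA$2:INDEX($AA:$AA, last)
print(last, my_col_aa)
```

Because every named range uses column C to find the last row, all of them end on the same row and stay the same length.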

Resources