Optimization of array function that calculates products - arrays

I have the following array formula that calculates the returns on a particular stock in a particular year:
=IF(AND(NOT(E2=E3),H2=H3),PRODUCT(IF($E$2:E2=E1,$O$2:O2,""))-1,"")
But since I have 500,000 rows, as soon as I reach row 50,000 Excel reports that my machine does not have enough resources to compute the values.
How can I optimize the formula so that it actually works?
Column E is built from a counter that checks the year and ticker of each stock. If the year differs from the previous row, the counter outputs 1; it also outputs 1 when the stock name changes. For example, two consecutive rows may both be for 1993 but belong to different stocks, so the return must be calculated anew, and I use 1 to mark that.
Then another column (column E) keeps a running sum of those 1s: each time a new 1 appears, the total increases by 1, and the same number repeats until the next 1. This is what makes the array formula possible. When the next value in column E differs from the current one, I use my twist on SUMIF, PRODUCT(IF(...)), which returns the product of the column O values that share the same running total in column E.
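The change flag and its running sum described above can be modelled outside Excel. This Python sketch is only an illustration: it assumes each row is a hypothetical `(year, ticker, gross_return)` tuple, and the function name `group_ids` is made up. It produces the column E equivalent, a group id that increases whenever the year or the ticker changes.

```python
from itertools import accumulate

def group_ids(rows):
    """Return the running-sum group id (the column E equivalent) for each row.

    Each row is (year, ticker, gross_return). A row gets flag 1 when its
    year or ticker differs from the previous row (the first row always
    starts a group); the running sum of the flags is the group id.
    """
    flags = []
    prev_key = None
    for year, ticker, _gross in rows:
        key = (year, ticker)
        flags.append(1 if key != prev_key else 0)
        prev_key = key
    return list(accumulate(flags))

rows = [
    (1993, "AAA", 1.10),
    (1993, "AAA", 0.95),
    (1993, "BBB", 1.20),  # same year, different ticker: new group
    (1994, "BBB", 1.05),  # same ticker, different year: new group
]
print(group_ids(rows))  # [1, 1, 2, 3]
```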

The source of the inefficiency, I believe, is that the number of cells each successive array formula must examine grows steadily with the row number. In row 50,000, for example, your formula must examine the cells in all the rows above it.
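To put rough numbers on that: if the formula in the k-th data row scans about k cells of column E (plus as many in column O), the total work for n rows is 1 + 2 + ... + n = n(n+1)/2. The back-of-the-envelope Python calculation below is an estimate, not a model of Excel's calculation engine; it shows that ten times as many rows means roughly a hundred times as much scanning.

```python
def cells_scanned(n_rows: int) -> int:
    """Cells of column E examined in total if the k-th row's formula scans k cells."""
    # 1 + 2 + ... + n = n * (n + 1) / 2, i.e. quadratic growth
    return n_rows * (n_rows + 1) // 2

print(cells_scanned(50_000))   # 1250025000
print(cells_scanned(500_000))  # 125000250000
```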
I'm a big fan of array formulas, so it pains me to say this, but I wouldn't do it this way. Instead, use additional columns to compute, in each row, the pieces of your formula that are needed to return the desired result. By taking that approach, you're exploiting Excel's very efficient recalculation engine to compute only what's needed.
As for the final product, compute it from a cumulative running product in an auxiliary column that resets to the current row's value in column O whenever column P in the row above contains a number. This approach is much more "local" and avoids formulas that depend on large numbers of cells.
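The running-product column can be sketched in Python as follows. This is an illustration under assumptions, not the answerer's worksheet: `gross` stands for column O (assumed to be 1 + the period return, since the original formula subtracts 1 from the product), `group_end` marks the rows where column P shows a result, and `aux` plays the role of the auxiliary column; all of these names are hypothetical. Each row only looks at the row above, so the work grows linearly with the number of rows.

```python
def running_products(gross, group_end):
    """Return (auxiliary column, column P) computed one row at a time.

    gross     -- column O values (1 + return) in sheet order
    group_end -- True on rows where column P shows a result (last row of a group)
    """
    aux = []
    col_p = []
    for i, g in enumerate(gross):
        if i == 0 or group_end[i - 1]:
            aux.append(g)              # previous row closed a group: reset
        else:
            aux.append(aux[-1] * g)    # otherwise extend the running product
        # Column P: compounded return on the last row of a group, blank elsewhere.
        col_p.append(aux[-1] - 1 if group_end[i] else None)
    return aux, col_p

aux, col_p = running_products([1.10, 0.95, 1.20, 1.05], [False, True, True, True])
print([None if v is None else round(v, 4) for v in col_p])  # [None, 0.045, 0.2, 0.05]
```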
I realize that text is not the best language for describing this, and my poor writing skills might be adding to the challenge, so please let me know if more detail is needed.
Interesting problem, thanks.

Could I suggest a really quick and [very] dirty VBA macro, something like the one below? Obviously, back up your file before running it. This assumes you want to start calculating from row 13.
Sub calculateP()
    'start on row 13, column P:
    Cells(13, 16).Select
    'loop through every row as long as column A is populated:
    Do
        If ActiveCell(1, -14).Value = "" Then Exit Do 'column A not populated, so exit the loop
        'enter the formula as an array formula (R2C5 and R2C15 are the absolute $E$2 and $O$2):
        ActiveCell.FormulaArray = _
            "=IF(AND(NOT(RC[-11]=R[1]C[-11]),RC[-8]=R[1]C[-8]),PRODUCT(IF(R2C5:RC[-11]=R[-1]C[-11],R2C15:RC[-1],""""))-1,"""")"
        'convert the cell to its value only (remove the formula):
        ActiveCell.Value = ActiveCell.Value
        'select the next row:
        ActiveCell(2, 1).Select
    Loop
End Sub
Sorry, this is definitely not a great answer for you; in fact, even this method could be written more elegantly by working with a Range instead of selecting cells. But the quick and dirty approach may help you in the interim.

Related

Google Sheets: Average of every other column

I’ve looked at similar questions and I think I’m close to a working solution, but it’s giving me the wrong answer. I have a spreadsheet in Google Sheets with data in all columns, but only every other cell contains a dollar value, and I need the average of just those cells. They start (in this version) at cell G3 and continue through most of row 3. I then intend to copy the formula to other rows, where the same cells need to be averaged, so it would be best if it adjusts as I copy it. Here’s what I’ve worked up so far:
=AVERAGEIF(ArrayFormula(mod(column(G3:3),2)),">0")
It’s returning 1 as the result, when it should be about 1500. If I change the 2 to another number, the result increases with it, so I think I’m using MOD or COLUMN wrongly, but I don’t have enough practice to know where I messed up.
An average of every second column can be done like this:
=AVERAGE(FILTER(G3:3, MOD(COLUMN(G3:3)-1, 2)=0))
TIL about the FILTER function. Thanks guys.
There is a way with ArrayFormula; I think you almost got it. I would add one more argument to AVERAGEIF to specify the range to average. When that argument is omitted, AVERAGEIF averages the criteria range itself, which is why your formula returns the average of the MOD results (1) instead of the dollar values.
I would also apply the modulo to the difference between each column and the first column. For your question it probably isn't needed, since column numbers simply alternate between odd and even, but using the difference is a general-purpose way to apply the concept to, say, every nth column.
MOD 2 of any column number will be 0 or 1, so instead of using an inequality, just test for 0 or 1. From your formula it looks like your dollar values are in odd columns (G is column 7), so the result of the modulo would be 1. If you start at G3 and take the column difference before applying MOD 2, the desired result becomes 0 instead. To switch to the other set of columns, change the 0 to a 1.
=AVERAGEIF(ArrayFormula(mod((COLUMN(G3:3)-COLUMN(G3)),2)),0,G3:3)
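The column-difference idea can be checked outside Sheets with a small Python model. It assumes a row is a list of cell values starting at column G and that `None` marks an empty cell; the function name `average_every_nth` and the sample numbers are made up. Subtracting the first column before the modulo is what lets the same test pick every nth column by changing `n`, and `offset` plays the role of the 0-or-1 switch.

```python
def average_every_nth(row, first_col, n=2, offset=0):
    """Average the cells whose (column - first_col) mod n equals offset.

    row       -- cell values; row[0] is the cell in column first_col
    first_col -- column number of the first cell (column G is 7)
    Empty cells (None) are skipped, as AVERAGE skips blanks. Raises
    ZeroDivisionError if nothing is picked, where AVERAGE gives #DIV/0!.
    """
    picked = []
    for i, value in enumerate(row):
        column = first_col + i  # COLUMN() of this cell
        if value is not None and (column - first_col) % n == offset:
            picked.append(value)
    return sum(picked) / len(picked)

# G3:J3 -> dollar values in G and I, other data in H and J.
print(average_every_nth([1400, 3, 1600, 5], first_col=7))  # 1500.0
```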

Sum across rows and count rows less than 0

I am trying to identify inventory shortages for each store and the type of bread I am supplying.
Example table showing the demand for each type of bread by store.
Row 8 is what I am trying to accomplish with a formula.
Based on the quantity on hand, I can conditionally format to highlight cells red or green based on shortages.
I can't get a formula to count the number of red cells.
I am thinking I need to use SUMPRODUCT:
=SUMPRODUCT(--($B9:$B11<C9:C11))
The above works for identifying shortages for Store A (column C). However, when dragged to columns D and E, it doesn't take into account what the other stores needed.
I can't get the rows to sum without adding ALL rows in the range. I need each row to be individually compared to the qty on hand. I assume an array needs to be used.
This is the formula I mentioned. It uses a standard MMULT method to get the row totals of the matrix, then compares them with the amounts available:
=SUM(--($B9:$B11<MMULT($C9:C11,TRANSPOSE(COLUMN($C9:C11)^0))))
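Enter it in C8 and pull it across; it must be entered as an array formula using Ctrl+Shift+Enter. Because the range is written $C9:C11, with only the first column fixed, it becomes $C9:D11 and $C9:E11 as it is pulled across, so each store's row totals include the demand of every store to its left.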
EDIT
OP has commented that it shouldn't be listed as a shortage if a store doesn't need a particular item, even if the stock of that item has been exhausted.
So there should be an extra condition for it to be registered as a shortage only if the current column has a number >0 in a particular row as well as the row sum being greater than the amount available:
=SUM((C9:C11>0)*($B9:$B11<MMULT($C9:C11,TRANSPOSE(COLUMN($C9:C11)^0))))
If you want to select only some of the rows as well, it would look like this:
=SUM(ISNUMBER(MATCH($A9:$A11,{"Wheat","Rye"},0))*(C9:C11>0)*($B9:$B11<MMULT($C9:C11,TRANSPOSE(COLUMN($C9:C11)^0))))
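The logic of these MMULT formulas can be followed step by step in Python. This sketch assumes `on_hand` is column B, `demand` is the C9:E11 block (one row per bread type, one column per store) and the optional `only` set mirrors the {"Wheat","Rye"} filter; the function name and all numbers are illustrative. For store column j, each row total covers the stores from C up to j, which is what the growing $C9:C11 range does when the formula is pulled across.

```python
def shortages(on_hand, demand, names=None, only=None):
    """Count shortages per store column, like the C8 formula pulled across.

    on_hand -- quantity on hand per bread type (column B)
    demand  -- one row per bread type, one column per store (C9:E11)
    only    -- optional set of bread names to include, like {"Wheat", "Rye"}
    A row counts for store j when that store needs the item (demand > 0)
    and the cumulative demand of stores 0..j exceeds the quantity on hand.
    """
    counts = []
    for j in range(len(demand[0])):
        count = 0
        for r, row in enumerate(demand):
            if only is not None and names[r] not in only:
                continue  # row filter, like ISNUMBER(MATCH(...))
            cumulative = sum(row[: j + 1])  # MMULT row total over $C..current column
            if row[j] > 0 and on_hand[r] < cumulative:
                count += 1
        counts.append(count)
    return counts

names = ["Wheat", "Rye", "White"]
on_hand = [10, 5, 8]
demand = [
    [4, 4, 4],  # Wheat: cumulative 4, 8, 12 -> short only in column E
    [6, 0, 2],  # Rye: cumulative 6, 6, 8 -> short in C and E (D needs none)
    [2, 2, 2],  # White: cumulative 2, 4, 6 -> never short
]
print(shortages(on_hand, demand))                         # [1, 0, 2]
print(shortages(on_hand, demand, names, only={"Wheat"}))  # [0, 0, 1]
```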
EDIT: FIXED TO WORK FOR CONDITIONAL FORMATTING:
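Paste this into a standard module:
Public Function CellColour(addr)
    ' Returns the fill colour actually displayed, including conditional formatting.
    ' DisplayFormat gives #VALUE! in a UDF called directly from a cell,
    ' so FINDRED calls this function through Worksheet.Evaluate instead.
    CellColour = Range(addr).DisplayFormat.Interior.Color
End Function
Function FINDRED(rng As Range)
    ' Counts the cells in rng whose displayed fill is pure red, RGB(255, 0, 0).
    FINDRED = 0
    Dim cell As Range
    For Each cell In rng
        If (cell.Parent.Evaluate("CellColour(""" & cell.Address & """)") = RGB(255, 0, 0)) Then
            FINDRED = FINDRED + 1
        End If
    Next cell
End Function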
You can then use this like a normal Excel formula:

Index/Match with IF Statement

As you can see, I have a database table on the left. I want to add an IF statement that lets me look up the [Code], [Name] and [Amount] of the top 5 for Company A ONLY, then do a top 5 for Company B, and so on. I have managed to look up the top 5 out of ALL companies but cannot seem to add a criterion to target a specific company.
Here are my formulas so far:
Formula in Column K [Company]: = INDEX(Database,MATCH(N3,sales,0),1)
Formula in Column L [Code]: = INDEX(Database,MATCH(N3,sales,0),2)
Formula in Column M [Name]: = INDEX(Database,MATCH(N3,sales,0),3)
Formula in Column N [Amount]: = LARGE(sales,ROW(1:20))
The intended result is to show the top 5 salespeople in each company along with their [Code], [Name] and [Amount]. Feel free to suggest any edits to the worksheet.
Here's an alternative if you know the code is unique. After putting A into K3:K7, first get the highest amounts for Company A, starting in N3:
=AGGREGATE(14,6,Database[Amount]/(Database[Company]=K3),ROWS(N$1:N1))
Then find the code that matches the amount, but only if it hasn't been used before (this assumes that the code is unique), starting in L3:
=INDEX(Database[Code],MATCH(1,INDEX((Database[Company]="A")*(Database[Amount]=N3)*ISNA(MATCH(Database[Code],L$2:L2,0)),0),0))
Then find the matching name with a normal INDEX/MATCH, starting in M3:
=INDEX(Database[Name],MATCH(L3,Database[Code],0))
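Outside Excel, the same three steps (take this company's k-th largest amount, pick a not-yet-used code with that amount, then look up the name) might look like the Python sketch below. It assumes each record is a `(company, code, name, amount)` tuple and, as the answer does, that codes are unique; the sample records and the function name `top_n` are made up.

```python
def top_n(records, company, n=5):
    """Return up to n (code, name, amount) tuples for one company, largest first.

    records -- (company, code, name, amount) tuples; codes assumed unique
    """
    # Step 1: this company's amounts, largest first (the AGGREGATE(14, 6, ...) column).
    amounts = sorted(
        (amount for comp, _code, _name, amount in records if comp == company),
        reverse=True,
    )[:n]
    used_codes = []
    result = []
    for amount in amounts:
        # Step 2: first code with this company and amount that is not listed yet,
        # so tied amounts go to different people.
        code = next(
            c for comp, c, _name, amt in records
            if comp == company and amt == amount and c not in used_codes
        )
        used_codes.append(code)
        # Step 3: ordinary lookup of the name by code (INDEX/MATCH).
        name = next(nm for _comp, c, nm, _amt in records if c == code)
        result.append((code, name, amount))
    return result

records = [
    ("A", "A01", "Ann", 2000),
    ("A", "A02", "Bob", 3000),
    ("B", "B01", "Cat", 5000),
    ("A", "A03", "Dan", 2000),
    ("A", "A04", "Eve", 1500),
]
print(top_n(records, "A", 3))
# [('A02', 'Bob', 3000), ('A01', 'Ann', 2000), ('A03', 'Dan', 2000)]
```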
Okay, I have achieved this with helper columns, which you can hide. Please note, though, that this will only work as long as there are no more than 10 identical totals for any one company. I don't think you should have that issue, but if it does occur, the digits added by the helper column would need to be tweaked.
First Helper Column:
Adds a digit to the end of the total, representing the number of times that amount already appears above for that company. The formula is =CONCATENATE([@Amount],COUNTIFS($A$1:A1,A2,$D$1:D1,D2))*1
The result is multiplied by 1 to turn the concatenated text back into a number for LARGE to work with.
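The helper column's trick, appending the count of earlier identical amounts for the same company as an extra digit and converting the text back to a number, can be reproduced in Python to see both why it breaks ties and where it stops working. The records and the function name `tie_break_keys` are illustrative assumptions; each record is `(company, amount)` in sheet order, and amounts are assumed to be whole numbers.

```python
def tie_break_keys(records):
    """Mimic =CONCATENATE([@Amount],COUNTIFS(rows above: same company, same amount))*1."""
    keys = []
    for i, (company, amount) in enumerate(records):
        # COUNTIFS over the rows above: same company and same amount.
        seen = sum(1 for c, a in records[:i] if c == company and a == amount)
        # Text concatenation, then *1 to turn the text back into a number.
        keys.append(int(f"{amount}{seen}"))
    return keys

print(tie_break_keys([("A", 2000), ("A", 2000), ("A", 2000), ("A", 1500)]))
# [20000, 20001, 20002, 15000]
```

With up to ten identical amounts for a company, the keys stay in amount order; an eleventh copy gets "10" appended (200010 for 2000) and jumps above every ordinary key, which is why the approach caps the number of ties.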
Second Helper Column:
This is an array formula and will need to be input by using Ctrl+Shift+Enter while still in the formula bar.
The formula for this one is:
=LARGE(IF(Company="A",Helper),ROW(1:1))
As an array formula, this produces a list of results from the IF statement that LARGE can use. Rather than ranking the entire column from largest to smallest, we can now single out the rows that have company "A", like so:
=LARGE({20000;20001;20002;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;15000;14000;30000;FALSE;FALSE;FALSE;FALSE},ROW(1:1))
LARGE only works with numeric values, so the FALSEs produced where column A does not match "A" are ignored. This is also why I used the helper column: it makes duplicate amounts unique without affecting the top 5.
ROW(1:1) has been used as this will automatically update when the formula is dragged down to produce the next highest result in this array.
The main formula for the top 5 array:
Again, this is an array formula, so it will need to be entered using Ctrl+Shift+Enter while still in the formula bar.
=INDEX(Database,SMALL(IF(Company="A",IF(Helper=$O3,ROW(Company))),1)-1,COLUMN(A:A))
IF(AND()) does not work in array formulas like this one, because AND collapses the whole array into a single TRUE or FALSE instead of testing each row, so I have nested two IFs instead.
Notice how I am again checking whether the first column matches "A" and then whether the last column matches the result of the second formula. Where both conditions are TRUE for the same row, the row number is returned.
IF({TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;TRUE;TRUE;FALSE;FALSE;FALSE;FALSE},IF({FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;TRUE;FALSE;FALSE},{2;3;4;5;6;7;8;9;10;11;12;13;14;15;16;17;18;19;20}))
It looks like a mess, I know, but the position where both TRUEs align gives row 16 as the result:
{FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;16;FALSE;FALSE;FALSE;FALSE}
As I know that only one match is possible, I use SMALL to take the smallest (and only) row number for the INDEX row argument, and subtract 1 because Database does not include the header row, so we actually want the 15th result.
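The condition arrays quoted above can be replayed in Python to confirm the arithmetic. This sketch copies the two TRUE/FALSE arrays and the row numbers 2 to 20 from the example; `zip` pairs them element by element, which is what the nested IFs do, whereas a single AND() reduces each whole array to one value. The variable names are hypothetical.

```python
# Condition arrays copied from the example (19 data rows, sheet rows 2 to 20).
is_company_a = [True] * 3 + [False] * 9 + [True] * 3 + [False] * 4
matches_helper = [False] * 14 + [True, False, True, False, False]
sheet_rows = list(range(2, 21))  # ROW(Company)

# Nested IF: keep the row number only where both conditions are TRUE
# for the same row; None stands in for Excel's FALSE.
candidates = [
    row if a and h else None
    for a, h, row in zip(is_company_a, matches_helper, sheet_rows)
]

# A single AND() would instead collapse the arrays to one value:
collapsed = all(is_company_a) and all(matches_helper)  # False

row_number = min(r for r in candidates if r is not None)  # SMALL(..., 1)
index_row = row_number - 1  # Database has no header row, so row 16 is item 15
print(row_number, index_row)  # 16 15
```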
Again, COLUMN(A:A) has been used for the column number to return as this will automatically update when the formula is dragged across.
If you are struggling with my explanation and want more clarity, feel free to reach out and I will try my best to explain the logic in more detail.

How to refer to a variable range

I want to refer to a range of data in my Excel sheet whose size varies: this month the data has 80 rows, but next month it could have 100. So I need a way to refer to a variable-size range that I can use in the following formula:
=SUMPRODUCT(Allocation_Updt!$J$2:$J$83*((RIGHT(Allocation_Updt!$F$2:$F$83,6)+0)=$E62))/100
Here 83 is the last row of the data sheet, but it can change next time. Setting it to 10000 (almost the maximum size of my data) gives me an error.
Try converting the range of data to a table. It will automatically apply a name to each column. These column names can be used to refer to the data in the column, and that range of data will be dynamic going forward.
Use
match(1e99, Allocation_Updt!$J:$J)
... to find the row number of the last number or date in a column. With the last value in J83, all three of the following range references are equivalent:
Allocation_Updt!$J$2:$J$83
Allocation_Updt!$J$2:index(Allocation_Updt!J:J, match(1e99, Allocation_Updt!$J:$J))
index(Allocation_Updt!J:J, 2):index(Allocation_Updt!J:J, match(1e99, Allocation_Updt!$J:$J))
So your SUMPRODUCT function can be dynamically limited to exactly what is needed with:
=SUMPRODUCT(Allocation_Updt!$J$2:index(Allocation_Updt!$J:$J, match(1e99, Allocation_Updt!$J:$J))*((RIGHT(Allocation_Updt!$F$2:index(Allocation_Updt!$F:$F, match(1e99, Allocation_Updt!$J:$J)),6)+0)=$E62))/100
Note that the last row number in column J is used to get the last valid entry in both column F and column J.
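To see what MATCH(1e99, column) is doing, here is a Python model under stated assumptions: the column is a list where index 0 is row 1, numbers (which include Excel dates) are ints or floats, and anything else, text or `None` for a blank, is skipped. In practice an approximate MATCH for a number larger than anything in the column lands on the last numeric cell, so the model simply returns that cell's 1-based row; the function names are hypothetical.

```python
def last_numeric_row(column):
    """1-based row of the last numeric cell, mirroring MATCH(1e99, J:J) in practice.

    column -- cell values top to bottom (index 0 is row 1); None marks a blank.
    """
    last = None
    for row, value in enumerate(column, start=1):
        # bool is excluded because Python treats it as an int; Excel does not.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            last = row
    if last is None:
        raise ValueError("no numbers in column")  # MATCH would return #N/A
    return last

def dynamic_range(column, first_row=2):
    """Cells from first_row to the last numeric row, like $J$2:INDEX(J:J, MATCH(1e99, J:J))."""
    return column[first_row - 1 : last_numeric_row(column)]

col_j = ["Amount", 10, 20, None, 30, "note"]  # header in row 1, last number in row 5
print(last_numeric_row(col_j))  # 5
print(dynamic_range(col_j))     # [10, 20, None, 30]
```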
Given the persnickety nature of SUMPRODUCT, I might perform some tests with:
=sumifs(Allocation_Updt!$J:$J, Allocation_Updt!$F:$F, "*"&$E62)/100
That is not specifically a 'right-most 6 character match'; it is an 'ends-with-E62' match. Some testing on your own data will quickly prove whether this is a viable alternative. It is more efficient, more forgiving and you can use full column references without penalty.
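The difference between the two matching rules can be made concrete with a small Python comparison. It assumes the column F entries are text (SUMIFS wildcard criteria only match text cells) and that E62 holds a whole number; `right6_match`, `ends_with_match` and the sample codes are made up for illustration.

```python
def right6_match(text, target):
    """(RIGHT(text, 6) + 0) = target; float() raises ValueError where Excel shows #VALUE!."""
    return float(text[-6:]) == target

def ends_with_match(text, target):
    """SUMIFS criterion "*" & target: does the text end with target's digits?"""
    return text.endswith(str(target))

samples = [
    ("INV-123456", 123456),  # both rules match
    ("INV-123456", 23456),   # only the ends-with rule matches
    ("AB012345", 12345),     # both match; +0 turns "012345" into 12345
]
for code, target in samples:
    print(code, target, right6_match(code, target), ends_with_match(code, target))
```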

Sumproduct formula returns a #VALUE! error when the last array refers to a column with formulas in every row. MS Excel 2010

I am trying to find an easy way to calculate commissions off of sales on multiple sheets within a workbook. Each month, I need to find the total net profit for only items sold within the specified month.
The formula I am currently using is:
=SUMPRODUCT((TEXT('Sheet Name'!$P$3:P24,"MY")=TEXT($G$4,"MY"))*'Sheet Name'!$M$3:M24)
Column P shows the Sold Date,
Column M includes a formula in each row to calculate the net profit, and
cell G4 is where I would enter the month & year I am currently working with.
I have come to the conclusion that it only gives me the #VALUE! error because of the formula in each row of Column M (example: =IF(OR(F15=0,G15=0)," ",(F15-L15)) ).
When I reference a different column (in place of Column M) that does not contain formulas, it works perfectly (example: =SUMPRODUCT((TEXT('Sheet Name'!$P$3:P24,"MY")=TEXT($G$4,"MY"))*'Sheet Name'!$G$3:G24) ). Also, changing the asterisk to a comma causes the formula to calculate incorrectly, and adding a double negative before the TEXT comparison (--(TEXT...) does not fix the problem.
How do I get this array to calculate without removing the formulas from Column M?
Thanks for your attention.
I presume it is giving you a #VALUE! error because your formula results in text (a space), and Excel errors when trying to multiply a space by the TRUE/FALSE result of the date comparison. I think you would be better served changing your column M formula to =IF(OR(F15=0,G15=0),0,(F15-L15)). Do you have a specific reason not to make it evaluate to 0? Also, is there a reason you are converting to text to do your month/year check?
Try something like this: =SUMPRODUCT(--(MONTH('Sheet Name'!$P$3:P24)=MONTH($G$4)),--(YEAR('Sheet Name'!$P$3:P24)=YEAR($G$4)),'Sheet Name'!$M$3:M24). Of course, this depends on the dates being entered as actual dates. The -- converts a logical (TRUE/FALSE) value into 1 or 0; it won't do anything useful to text. In this comma-separated form, SUMPRODUCT also treats non-numeric entries, such as the " " from column M, as zeros. The multiplied form, =SUMPRODUCT((MONTH('Sheet Name'!$P$3:P24)=MONTH($G$4))*(YEAR('Sheet Name'!$P$3:P24)=YEAR($G$4))*'Sheet Name'!$M$3:M24), should also work once column M returns 0 instead of a space, since the multiplication converts the TRUE/FALSE values to numbers. The trick is to make sure that, once everything else evaluates, you have =SUMPRODUCT(numbers,numbers,numbers). Your version is one array of =SUMPRODUCT(numbers/text).
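The failure and the two fixes can be mimicked in Python. This sketch assumes each row is a hypothetical `(sold_date, net_profit)` pair, where `net_profit` is either a number or the single space " " that the column M formula returns. Coercing every entry to a number first, as the multiplied SUMPRODUCT form does, fails on the space; treating non-numbers as 0, which is what SUMPRODUCT does with comma-separated arrays (or what returning 0 from column M achieves), gives the expected total. The function names and sample data are illustrative.

```python
from datetime import date

rows = [
    (date(2015, 3, 2), 120.0),
    (date(2015, 3, 15), " "),   # column M returned a space
    (date(2015, 4, 1), 80.0),
    (date(2015, 3, 28), 45.5),
]

def same_month(d, target):
    """MONTH(d)=MONTH(target) and YEAR(d)=YEAR(target)."""
    return d.month == target.month and d.year == target.year

def total_multiplied(rows, target):
    """Like SUMPRODUCT((cond)*M): each entry is coerced to a number, so " " fails."""
    return sum(same_month(d, target) * float(m) for d, m in rows)

def total_text_as_zero(rows, target):
    """Like SUMPRODUCT(--(cond), M): non-numeric entries count as 0."""
    return sum(
        same_month(d, target) * (m if isinstance(m, (int, float)) else 0)
        for d, m in rows
    )

target = date(2015, 3, 1)
print(total_text_as_zero(rows, target))  # 165.5
```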
