I have a macro that populates start and complete dates based upon the accounting period. For example: Product X has sales in Jan12 - Dec12. The macro would use a vlookup/min array to find the start and a vlookup/max array to find the finish. The dates are in a YYYYMM format, so the vlookup is looking at a calendar tab that maps the corresponding YYYYMM to a start or finish date. The coding works fine, but in some instances there are 30,000 - 100,000 lines and it can take upward of 15-20 minutes to run. Does anyone have any ideas on how to make this run faster?
Here is the code:
Cells(Rcnt, 5).FormulaArray = "=VLOOKUP(MIN(IF('Normalized'!A:A= " & Cells(Rcnt, 1).Address(False, False) & ", 'Normalized'!J:J)),'Calendar'!A:C,2,FALSE)"
Any help would be greatly appreciated.
Don't use a formula in your macro. You can probably make the code much faster by writing the code in VBA yourself. You could loop through all values in a line and store the smallest and largest values (you can keep max and min so you only need one loop). Then write these values to wherever is appropriate. This should already be a speed up over the excel formula.
To get started with writing macros in VBA look here. You will specifically want to look at how to loop through the cells. You will want to set max = Jan 1 and min = Dec 31 and then if the value you are currently inspecting is greater than max you set max equal to the value and likewise for min. After looping through that row you will have the max and min values in that row. You can then write these values into some other column, reset max and min and move to the next row.
My understanding of your issue is that your data lists each product's sales in each month, so for each product, you are searching for the first and last months. Can you sort the data by product and by date? That would be the easiest way to implement a loop as Emschorsch described, because you'd only be dealing with one product at a time.
A couple other things to try to speed this up:
set the first line of your code to:
Application.Calculation = xlCalculationManual
and your last line to:
Application.Calculation = xlCalculationAutomatic
(you still have tons of lookups, but they will calculate all at once, not one at a time as you create each row's formula)
If sorting the data won't fly, you can Autofilter on each product in succession and use the SUBTOTAL() function, referencing the date column, to get the min and max. (You can place the SUBTOTAL() formula in an unused cell and reference that cell in your code, or use it directly in the code using Application.WorksheetFunction)
The other part that seems unnecessary (without seeing your data we can't be sure) is using a lookup to turn YYYYMM into a date, if that is truly the format all of your dates are in. That can easily be done without VBA using the DATE() formula, or if you need it inside a macro:
cells(Rcnt, 5).Value = DateSerial(Left(x, 4), Right(x, 2), 1) 'x is your date in YYYYMM format
Related
I have a LAMBDA function nested in a lambda helper function (MAKEARRAY) to create a column with a series of dates. The series starts with the last day of the month defined in cell start_date followed by the last day of the following month. This one month interval goes on a number of times defined by the value in cell number_months.
The formula is the following:
=MAKEARRAY(number_months,1,LAMBDA(r,c,EOMONTH(start_date,r-1)))
I would like:
This sequence to repeat just below.
Repetition needs to take place a certain number of times, as defined by value in number_repeats.
Since I have the series as the row heading of another Sheet, I have tried using TRANSPOSE(ARRAYFORMULA(INDIRECT to select the variable range, rather than generating again the repeated series of dates. However, in such case I have to figure out how to repeat that array a certain number of times without using REPT and SPLIT because it exceeds the character capacity by far.
That being said, if possible my preference is for a solution based on the transposed LAMBDA function that created the row heading in the other Sheet, rather than referring to the heading using ARRAYFORMULA.
I feel I could use SEQUENCE for that, but I am not sure how to combine it with the LAMBDA function to repeat the series a certain number of times.
What do you think of this?? You can change that B1:1 of the last pair of brackets to your own lambda function. For visual purposes (and probably of calculations if they're too long) I used a range of headers in B1:1 and cell A1 to state the "n" number of times.
So, "qt" is the amount of items in the "headers" (done with Counta) and "times" is that number set by you. I used Counta so you could be able to add more items in your headers (or lambda function) and it would be flexible enough to work too.
BYROW displays the range, and SEQUENCE calculates the amount of rows needed (qt*times) and MOD and INDEX iterates through the headers the amount of times needed.
=lambda(qt,headers,lambda(times,byrow(sequence(qt*times),lambda(seq,INDICE(headers,,RESIDUO(seq-1,times)+1))))(CONTARA(headers)))(A1,B1:1)
Here you have a sample sheet: https://docs.google.com/spreadsheets/d/18qaf86Q5eY-F1ffo15ecaS6haKqj55heW8hT4VBLFHc/edit?usp=drivesdk
You can use reduce to repeat an array a number of times:
=reduce("Header",sequence(num_repeats),lambda(total,value,{total;MAKEARRAY(number_months,1,LAMBDA(r,c,EOMONTH(start_date,r-1)))}))
I have formatted the dates manually though.
Thank you very much Tom and Martin,
I also got the following solution from someone who assisted me too:
=MAKEARRAY(1, B1 * C1, lambda(r, c, eomonth(A1,{mod(c-1, B1)})))
I am currently using this formula to calculate a rolling average score across 12 columns for either the last 3 or 6 months.
=SUM(SUMIFS($E$54:P54,$E$54:P54,LARGE(IF($E$54:P54>0,$E$54:P54),{1,2,3})))
This is an array formula and is entered via CTRL + SHIFT + ENTER.
The problem now is that I need to deploy my work book on older machines and those being ancient office computers (we are talking windows XP and Office 2003...), I find that the array is killing the entire workbook. Now, I have already taken steps to speed up the workbook via VBA (disabling events, manual formula calc, etc.), but I need a way to convert the above array formula into a non-array formula which is NOT counting zeros or empty cells as part of the average.
I tried this below but couldnt get it to work with the zeros / empty cells.
=SUM(OFFSET($E68,0,COUNT($E68:$P68)-IF(COUNT($E68:$P68)>3,3,COUNT($E68:$P68)),1,IF(COUNT($E68:$P68)>3,3,COUNT($E68:$P68))))
Picture of the sample data attached below.
Since an Average is a "Sum" divided by a "Count", this can be accomplished using the simplicity of non-Array formulas.This formula avoids the inclusion of both zeros and blank cells:
=IF(SUM(A2:C2)>0,SUM(A2:C2)/(COUNT(A2:C2)-COUNTIF(A2:C2,0)),0).
If the row is filled with only zero values, then it shows the actual numerical average as zero, which is correct.If you want to avoid the display of zero averages in cells, then use this slightly different formula:
=IF(SUM(A3:C3)>0,SUM(A3:C3)/(COUNT(A3:C3)-COUNTIF(A3:C3,0)),"").
Of course, you will need to adjust the cell ranges for the 6 month and YTD averages; these formulas deal with 3- month ranges.
For last 3-Month range average: =SUM(OFFSET($A3,0,COUNT($A3:$L3)-IF(COUNT($A3:$L3)>3,3,COUNT($A3:$L3)),1,IF(COUNT($A3:$L3)>3,3,COUNT($A3:$L3))))/(COUNT(OFFSET($A3,0,COUNT($A3:$L3)-IF(COUNT($A3:$L3)>3,3,COUNT($A3:$L3)),1,IF(COUNT($A3:$L3)>3,3,COUNT($A3:$L3))))-COUNTIF(OFFSET($A3,0,COUNT($A3:$L3)-IF(COUNT($A3:$L3)>3,3,COUNT($A3:$L3)),1,IF(COUNT($A3:$L3)>3,3,COUNT($A3:$L3))),0))
For last 6-month average: SUM(OFFSET($A2,0,COUNT($A2:$L2)-IF(COUNT($A2:$L2)>6,6,COUNT($A2:$L2)),1,IF(COUNT($A2:$L2)>6,6,COUNT($A2:$L2))))/(COUNT(OFFSET($A2,0,COUNT($A2:$L2)-IF(COUNT($A2:$L2)>6,6,COUNT($A2:$L2)),1,IF(COUNT($A2:$L2)>6,6,COUNT($A2:$L2))))-COUNTIF(OFFSET($A2,0,COUNT($A2:$L2)-IF(COUNT($A2:$L2)>6,6,COUNT($A2:$L2)),1,IF(COUNT($A2:$L2)>6,6,COUNT($A2:$L2))),0))
These are quite lengthy, but they exclude the zero value cells from your original non-Array formula.The data was assumed to begin in row A2:L2.
I have the following formula which I will explain below:
{=SUM(IF(($G$1:$L$1=$O$1)*($G$2:$L$2=$O$2)*($G$3:$L$3=$O$3)*($G$4:$L$4=$O$4)*($G$5:$L$5=$O$5)*($G$6:$L$6=$O$6)*($G$7:$L$7=$O$7);G21:L21))}
Here is what the worksheet looks like:
Under columns G - L we have a 'database' of all data. These columns will be added cumulatively each quarter (approx 30 columns a quarter). So after a few years we have ended up with a bunch of database columns (1000 + columns of raw data). For the sake of this demo, I have only included those 6 columns.
As you can see, each column contains specific parameters, between rows 1 - 7, which allows to identify specific CountryCode + Project Code + Category + Fiscal Year, + ... (etc.). This allows us to track down a unique specific project and retrieve its data.
What we have afterwards on the column O is a specific project we are trying to retrieve values for (you can see that the rows 1 - 7 are the same as under column G (we are trying to retrieve values for this particular project).
Here comes our formula. I have attached above. Here is what it looks like when I press F2. As you can see the IF statement is first simply checking whether the particular columns match the pre-defined criteria under column O and it sums all the columns that match all the criteria between rows 1-7.
Now here is the problem. We have a worksheet, which contains 20 projects (such as column O) and we are using this array formula there to retrieve values. The problem is that retrieving data using this way takes A LOT OF TIME. We have also adopted a principle via VBA that we iterate through all the cells, then we insert a formula, calculate array cell, and then we copy & Paste resulting value inside (so that we won't end up with full sheet of array formulas). However it still takes LONG to calculate (1 minute or so).
I was wondering, if there is a better solution how to retrieve the data in the already mentioned format (that means we have a specific criteria we are trying to find)? Maybe SUMIFS could be better? Or sumproduct? Or even compeltely different solution?
I am open to any proposal which would fasten the process.
i met similar problem about 2 weeks ago. At first i use a helper column/row. The helper column is to concatenate the 7 string in each column. then only use the IF function to check if the joined text match. Such as, assuming the helper row is row 8 per your sample, cell G8 formula would be
=CONCATENATE(G1,"|",G2,"|",G3,"|",G4,"|",G5,"|",G6,"|",G7)
and do the same for the rest including column O
=CONCATENATE(O1,"|",O2,"|",O3,"|",O4,"|",O5,"|",O6,"|",O7)
Then do a HLOOKUP
=HLOOKUP(O8,G8:L21,14,0)
In my case, the calculation time reduce from 10 min to a few seconds!
Alternatively I also found a way to do without helper column, using array again, but the idea is pretty much the same,
the formula in O21 as per your sample would be
=SUM(IF(CONCATENATE(G1:L1,G2:L2,G3:L3,G4:L4,G5:L5,G6:L6,G7:L7)=CONCATENATE(O1,O2,O3,O4,O5,O6,O7),G21:L21))
(i didn't add in the "|" delimiter for this formula, but it is better to do so)
But in the end I prefer the helper column method.
For your reference
HTH
To improve performance avoid reapeating same calculations multiple times.
This allows us to track down a unique specific project and retrieve its data.
If a combination of 7 values is unique, calculate the position of chosen project only once in helper cell (for example O15) with array formula (confirmed with Ctrl+Shift+Enter:
=MATCH(1;(G1:L1=O1)*(G2:L2=O2)*(G3:L3=O3)*(G4:L4=O4)*(G5:L5=O5)*(G6:L6=O6)*(G7:L7=O7);0)
Use the following formula in O21 and drag down:
=INDEX(G21:L21;1;$O$15)
I have the following array formula that calculates the returns on a particular stock in a particular year:
=IF(AND(NOT(E2=E3),H2=H3),PRODUCT(IF($E$2:E2=E1,$O$2:O2,""))-1,"")
But since I have 500,000 row entries as soon as I hit row 50,000 I get an error from Excel stating that my machine does not have enough resources to compute the values.
How shall I optimize the function so that it actually works?
E column refers to a counter to check the years and ticker values of stocks. If year is different from the previous value the function will output 1. It will also output 1 when the name of stock has changed. So for example you may have values for year 1993 and the next value is 1993 too but the name of stock is different, so clearly the return should be calculated anew, and I use 1 as an indication for that.
Then I have another column that runs a cumulative sum of those 1s. When a new 1 in that previous column is encountered I add 1 to the running total and keep printing same number until I observe a new one. This makes possible use of the array function, if the column that contains running total values (E column) has a next value that is different from previous I use my twist on SUMIF but with PRODUCT IF. This will return the product of all the corresponding running total E column values.
The source of the inefficiency, I believe, is in the steady increase with row number of the number of cells that must be examined in order to evaluate each successive array formula. In row 50,000, for example, your formula must examine cells in all the rows above it.
I'm a big fan of array formulas, so it pains me to say this, but I wouldn't do it this way. Instead, use additional columns to compute, in each row, the pieces of your formula that are needed to return the desired result. By taking that approach, you're exploiting Excel's very efficient recalculation engine to compute only what's needed.
As for the final product, compute that from a cumulative running product in an auxiliary column, and that resets to the value now in column O when column P in the row above contains a number. This approach is much more "local" and avoids formulas that depend on large numbers of cells.
I realize that text is not the best language for describing this, and my poor writing skills might be adding to the challenge, so please let me know if more detail is needed.
Interesting problem, thanks.
Could I suggest a really quick and [very] dirty vba? Something like the below. Obviously, have a backup of your file before running this. This assumes you want to start calculating from row 13.
Sub calculateP()
'start on row 13, column P:
Cells(13, 16).Select
'loop through every row as long as column A is populated:
Do
If ActiveCell(1, -14).Value = "" Then Exit Do 'column A not populated so exit loop
'enter formula:
Selection.FormulaR1C1 = _
"=IF(AND(NOT(RC[-11]=R[1]C[-11]),RC[-8]=R[1]C[-8]),PRODUCT(IF(R[-11]C5:RC[-11]=R[-1]C[-11],R2C15:RC[-1],""""))-1,"""")"
'convert cell value to value only (remove formula):
ActiveCell.Value = ActiveCell.Value
'select next row:
ActiveCell(2, 1).Select
Loop
End Sub
Sorry, this is definitely not a great answer for you... in fact, even this method could be achieved more elegantly using range... but, the quick and dirty approach may help you in the interim ??
I am trying to find an easy way to calculate commissions off of sales on multiple sheets within a workbook. Each month, I need to find the total net profit for only items sold within the specified month.
The formula I am currently using is:
=SUMPRODUCT((TEXT('Sheet Name'!$P$3:P24,"MY")=TEXT($G$4,"MY"))*'Sheet Name'!$M$3:M24)
Column P shows the Sold Date,
Column M includes a formula in each row to calculate the net profit, and
cell G4 is where I would enter the month & year I am currently working with.
I have come to the conclusion that it only gives me the #VALUE! error because of the formula in each row of Column M (example: =IF(OR(F15=0,G15=0)," ",(F15-L15)) ).
When I reference a different column (in place of Column M) that does not contain formulas it works perfectly (example: =SUMPRODUCT((TEXT('Sheet Name'!$P$3:P24,"MY")=TEXT($G$4,"MY"))*'Sheet Name'!$G$3:G24) ). Also, changing the astrisk to a comma causes the formula to calculate incorrectly and add the (--(TEXT double negative does not fix the problem.
How to I get this array to calculate without removing the formulas from Column M?
Thanks for your attention.
I presume it is giving you a #VALUE error because your formula results in text (a space) and it errors when trying to multiply a space by a number (aka True or False). I think you would be better served changing your M column formula to =IF(OR(F15=0,G15=0),0,(F15-L15)). Do you have a specific reason to not make it evaluate to 0? Also is there a reason you are converting to text to do your month/year check?
Try something like this: =SUMPRODUCT(--(MONTH('Sheet Name'!$P$3:P24)=MONTH($G$4)),--(YEAR('Sheet Name'!$P$3:P24)=YEAR($G$4)),'Sheet Name'!$M$3:M24). Of course this is dependent on entering the dates as actual dates. The -- is used to change a logical/boolean (true/false) into a 1 or 0. It won't do anything useful to text. For example, it should also work as =SUMPRODUCT((MONTH('Sheet Name'!$P$3:P24)=MONTH($G$4))*(YEAR('Sheet Name'!$P$3:P24)=YEAR($G$4))*'Sheet Name'!$M$3:M24) since the multiplication converts the truthy statements to numbers. The trick is to make sure when everything else evaluates, you have =sumproduct(numbers,numbers,numbers). Your instance is one array of =sumproduct(numbers/text).