How to refer to a variable range - arrays

I want to refer to a range of data in my Excel sheet whose size varies: this month the data has 80 rows, but next month it could have 100. I need a way to refer to such a variable range so that I can use it in the following formula:
=SUMPRODUCT(Allocation_Updt!$J$2:$J$83*((RIGHT(Allocation_Updt!$F$2:$F$83,6)+0)=$E62))/100
Here 83 is the last row of the data, but it can change next time. Setting it to 10000 (almost the maximum size of my data) gives me an error.

Try converting the range of data to a table. It will automatically apply a name to each column. These column names can be used to refer to the data in the column, and that range of data will be dynamic going forward.

Use,
match(1e99, Allocation_Updt!$J:$J)
... to find the row number of the last number or date in a column. With the last value in J83, all three of the following range references refer to the same range:
Allocation_Updt!$J$2:$J$83
Allocation_Updt!$J$2:index(Allocation_Updt!J:J, match(1e99, Allocation_Updt!$J:$J))
index(Allocation_Updt!J:J, 2):index(Allocation_Updt!J:J, match(1e99, Allocation_Updt!$J:$J))
So your SUMPRODUCT function can be dynamically limited to exactly what is needed with,
=SUMPRODUCT(Allocation_Updt!$J$2:index(Allocation_Updt!$J:$J, match(1e99, Allocation_Updt!$J:$J))*((RIGHT(Allocation_Updt!$F$2:index(Allocation_Updt!$F:$F, match(1e99, Allocation_Updt!$J:$J)),6)+0)=$E62))/100
Note that the last row number in column J is used to get the last valid entry in both column F and column J.
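The behavior of MATCH(1e99, ...) is easiest to see on a toy column. The Python sketch below is a simplified model, not Excel's actual search: it assumes that an approximate MATCH with a lookup value larger than any number in the column lands on the last numeric cell, and it treats only int/float values as numbers (in Excel, dates are numbers too). The column contents and the function name last_number_row are hypothetical.

```python
from typing import Optional, Sequence, Union

Cell = Union[int, float, str, None]


def last_number_row(column: Sequence[Cell]) -> Optional[int]:
    """Simplified model of MATCH(1e99, column): the 1-based position of the
    last numeric cell, or None where Excel would return #N/A (no numbers)."""
    last = None
    for position, value in enumerate(column, start=1):
        # bool is a subclass of int in Python, so exclude it explicitly
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            last = position
    return last


# Hypothetical column J: header in row 1, numbers in rows 2-5, then text and blanks
col_j = ["Amount", 10, 20.5, 30, 40, "note", None, None]
end_row = last_number_row(col_j)
print(f"Allocation_Updt!$J$2:$J${end_row}")  # Allocation_Updt!$J$2:$J$5
```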
Given the persnickety nature of SUMPRODUCT, I might perform some tests with,
=sumifs(Allocation_Updt!$J:$J, Allocation_Updt!$F:$F, "*"&$E62)/100
That is not specifically a 'right-most 6 character match'; it is an 'ends-with-E62' match. Some testing on your own data will quickly prove whether this is a viable alternative. It is more efficient, more forgiving and you can use full column references without penalty.
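To see where the two formulas can disagree, here is a rough Python model of both, run on hypothetical labels. It assumes RIGHT(...)+0 fails (modelled as None, standing in for #VALUE!) whenever the last six characters are not numeric. That also hints at why extending the range to row 10000 may error: a blank F cell gives ""+0, which is #VALUE!. The SUMIFS side is modelled as a case-insensitive ends-with test and ignores Excel's ? and ~ wildcard characters.

```python
from typing import Optional, Sequence


def right6_value(label: str) -> Optional[float]:
    """Rough model of RIGHT(label, 6)+0; None stands in for #VALUE!."""
    try:
        return float(label[-6:])
    except ValueError:
        return None


def sumproduct_style(amounts: Sequence[float], labels: Sequence[str], key: int) -> Optional[float]:
    """Model of SUMPRODUCT(J*((RIGHT(F,6)+0)=key))/100."""
    total = 0.0
    for amount, label in zip(amounts, labels):
        value = right6_value(label)
        if value is None:
            return None  # a single #VALUE! makes the whole SUMPRODUCT fail
        if value == key:
            total += amount
    return total / 100


def sumifs_style(amounts: Sequence[float], labels: Sequence[str], key: int) -> float:
    """Model of SUMIFS(J, F, "*"&key)/100: a case-insensitive ends-with test."""
    suffix = str(key).lower()
    total = sum(a for a, label in zip(amounts, labels) if label.lower().endswith(suffix))
    return total / 100


amounts = [100, 200, 300]
labels = ["PRJ-123456", "PRJ-023456", "OLD-723456"]
print(sumproduct_style(amounts, labels, 123456), sumifs_style(amounts, labels, 123456))  # 1.0 1.0
print(sumproduct_style(amounts, labels, 23456), sumifs_style(amounts, labels, 23456))    # 2.0 6.0
```

With a six-digit key both agree; with a shorter key the ends-with test also picks up labels whose last six characters are a different number.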

Related

Speed up array formula

I have the following formula which I will explain below:
{=SUM(IF(($G$1:$L$1=$O$1)*($G$2:$L$2=$O$2)*($G$3:$L$3=$O$3)*($G$4:$L$4=$O$4)*($G$5:$L$5=$O$5)*($G$6:$L$6=$O$6)*($G$7:$L$7=$O$7);G21:L21))}
Here is what the worksheet looks like:
Columns G - L hold a 'database' of all the data. New columns are added cumulatively each quarter (approx. 30 columns a quarter), so after a few years we have ended up with a large number of database columns (1000+ columns of raw data). For the sake of this demo, I have only included those 6 columns.
As you can see, rows 1 - 7 of each column contain specific parameters that identify a particular CountryCode + Project Code + Category + Fiscal Year + ... (etc.). This allows us to track down a unique specific project and retrieve its data.
Column O then holds the specific project we are trying to retrieve values for (rows 1 - 7 are the same as in column G, so we are trying to retrieve values for that particular project).
Then comes our formula, shown above as it appears in edit mode (F2). The IF statement checks which columns match all of the predefined criteria in column O across rows 1-7, and SUM adds up the row-21 values of every matching column.
Now here is the problem. We have a worksheet containing 20 projects (like column O), and we use this array formula there to retrieve values. Retrieving data this way takes A LOT OF TIME. We have also adopted a VBA approach: iterate through all the cells, insert the formula, calculate the array cell, and then copy and paste the resulting value in place (so that we don't end up with a full sheet of array formulas). However, it still takes LONG to calculate (1 minute or so).
I was wondering if there is a better way to retrieve the data in the format described above (that is, we have specific criteria we are trying to match)? Maybe SUMIFS would be better? Or SUMPRODUCT? Or even a completely different solution?
I am open to any proposal that would speed up the process.
I ran into a similar problem about 2 weeks ago. At first I used a helper column/row that concatenates the 7 strings in each column, so that only the joined text has to be checked for a match. For example, assuming the helper row is row 8 in your sample, the formula in cell G8 would be
=CONCATENATE(G1,"|",G2,"|",G3,"|",G4,"|",G5,"|",G6,"|",G7)
and do the same for the rest of the columns, including column O:
=CONCATENATE(O1,"|",O2,"|",O3,"|",O4,"|",O5,"|",O6,"|",O7)
Then do an HLOOKUP:
=HLOOKUP(O8,G8:L21,14,0)
In my case, the calculation time dropped from 10 min to a few seconds!
Alternatively, I also found a way to do it without a helper row, again using an array formula, but the idea is pretty much the same.
The formula in O21, per your sample, would be
=SUM(IF(CONCATENATE(G1:L1,G2:L2,G3:L3,G4:L4,G5:L5,G6:L6,G7:L7)=CONCATENATE(O1,O2,O3,O4,O5,O6,O7),G21:L21))
(I didn't add the "|" delimiter in this formula, but it is better to do so.)
But in the end I prefer the helper column method.
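A small Python sketch of the helper-row idea, with hypothetical data, shows why the delimiter matters: without it, two different columns can produce the same joined key. The dictionary here only stands in for the exact-match HLOOKUP; it is not how Excel implements HLOOKUP, and for simplicity hlookup_value rebuilds it on every call.

```python
from typing import Dict, List, Optional, Sequence


def make_key(parts: Sequence[str], sep: str = "|") -> str:
    """Model of the helper row: =CONCATENATE(G1,"|",G2,...,"|",G7)."""
    return sep.join(str(p) for p in parts)


def build_lookup(header_columns: List[Sequence[str]], sep: str = "|") -> Dict[str, int]:
    """Map each column's joined key to its 0-based position, keeping the first
    occurrence, as an exact-match HLOOKUP returns the first match."""
    lookup: Dict[str, int] = {}
    for position, column in enumerate(header_columns):
        lookup.setdefault(make_key(column, sep), position)
    return lookup


def hlookup_value(target: Sequence[str], header_columns: List[Sequence[str]],
                  row_values: Sequence[float], sep: str = "|") -> Optional[float]:
    position = build_lookup(header_columns, sep).get(make_key(target, sep))
    return None if position is None else row_values[position]  # None ~ #N/A


# Hypothetical two-criterion headers whose plain concatenations collide
headers = [["AB", "C"], ["A", "BC"]]
row_21 = [10, 20]
target = ["A", "BC"]
print(hlookup_value(target, headers, row_21, sep=""))   # 10 (wrong column: both join to "ABC")
print(hlookup_value(target, headers, row_21, sep="|"))  # 20
```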
HTH
To improve performance, avoid repeating the same calculations multiple times.
"This allows us to track down a unique specific project and retrieve its data."
If that combination of 7 values is unique, calculate the position of the chosen project only once, in a helper cell (for example O15), with an array formula (confirmed with Ctrl+Shift+Enter):
=MATCH(1;(G1:L1=O1)*(G2:L2=O2)*(G3:L3=O3)*(G4:L4=O4)*(G5:L5=O5)*(G6:L6=O6)*(G7:L7=O7);0)
Use the following formula in O21 and drag down:
=INDEX(G21:L21;1;$O$15)
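The point of this answer is that the seven comparisons run once per project instead of once per result row. The Python sketch below models that split with hypothetical data (three criteria rows instead of seven); find_project_column plays the role of the MATCH in O15 and index_row the role of the INDEX in O21 and below. Note that Excel's = comparison is case-insensitive for text, while this sketch's == is not.

```python
from typing import List, Optional, Sequence


def find_project_column(criteria_rows: List[Sequence[str]], project: Sequence[str]) -> Optional[int]:
    """Model of =MATCH(1;(G1:L1=O1)*...*(G7:L7=O7);0): the 1-based position of
    the first column whose header cells all equal the project's values."""
    for col in range(len(criteria_rows[0])):
        if all(row[col] == wanted for row, wanted in zip(criteria_rows, project)):
            return col + 1
    return None  # Excel would show #N/A


def index_row(values: Sequence[float], position: int) -> float:
    """Model of =INDEX(G21:L21;1;$O$15): pick one value by its 1-based position."""
    return values[position - 1]


# Hypothetical headers (3 criteria rows) for 4 data columns
criteria = [["CZ", "CZ", "SK", "CZ"],
            ["P1", "P2", "P1", "P1"],
            ["2015", "2015", "2015", "2016"]]
position = find_project_column(criteria, ["CZ", "P1", "2016"])  # computed once
data_rows = [[1, 2, 3, 4], [10, 20, 30, 40]]
print([index_row(row, position) for row in data_rows])  # [4, 40]
```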

How can I find and pass a cell reference into the StDev.P command in Excel 2010?

I would like to pass a cell reference into the STDEV.P function in Excel, but when I do this I keep getting a #DIV/0! error.
I have two columns in Excel. Column A contains a list of dates starting Jan-1-2012 and going to the current date. Column B contains a list of integers. I have over 800 rows of data and it's possible that integers in column B are repeated somewhere in the 800 rows of data.
I want to find the STDEV of an array of values in Column B. The array is determined by a begin date and an end date. The end user can decide which begin and end dates are to be used. For example, if the begin date is 1/1/2015, I want to find the corresponding integer in column B for this date and pass the CELL REFERENCE into the STDEV formula. I want to do the same for the end date. The end result is a STDEV calculation that uses the array of integers determined by user supplied begin & end dates.
I've been able to find the cell location (e.g. value .332 is in cell D45) using the MATCH, INDEX and ADDRESS functions, but when I try to pass D45 into the STDEV function, I get the error. Help!
Many users believe that an INDEX(MATCH(...)) pair only returns a cell value in a lookup, but in fact it can be used to return a cell reference without the INDIRECT function's overhead. Two of them can even be joined with a colon to form a valid cell range to be used in any number of formulas.
The formulas in F2:H2 are,
=STDEV(INDEX(B:B, MATCH(D2,A:A, 0)):INDEX(B:B, MATCH(E2,A:A, 0))) ◄F2
=STDEV.P(INDEX(B:B, MATCH(D2,A:A, 0)):INDEX(B:B, MATCH(E2,A:A, 0))) ◄G2
=SUM(INDEX(B:B, MATCH(D2,A:A, 0)):INDEX(B:B, MATCH(E2,A:A, 0))) ◄H2
I've included a simple SUM function so that you can quickly verify that the method used is returning the correct cell range without doing the math on a StDev.
Among the lookup functions, this is a trait of INDEX. VLOOKUP and HLOOKUP cannot be used this way because they return only values.
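Here is a Python model of the same idea, assuming unique dates and hypothetical data: window slices the values between the matched start and end dates (the INDEX:INDEX range), and the standard statistics module supplies both deviations. statistics.pstdev is the population version (like STDEV.P) and statistics.stdev the sample version (like STDEV), which is why F2 and G2 differ.

```python
import statistics
from datetime import date, timedelta
from typing import List, Sequence


def window(dates: Sequence[date], values: Sequence[float], start: date, end: date) -> List[float]:
    """Model of INDEX(B:B, MATCH(start, A:A, 0)):INDEX(B:B, MATCH(end, A:A, 0)).
    list.index raises ValueError where MATCH(..., 0) would return #N/A."""
    first = dates.index(start)
    last = dates.index(end)
    return list(values[first:last + 1])


# Hypothetical data: one value per day starting 2015-01-01
dates = [date(2015, 1, 1) + timedelta(days=n) for n in range(10)]
values = [1, 2, 4, 4, 4, 5, 5, 7, 9, 100]
picked = window(dates, values, date(2015, 1, 2), date(2015, 1, 9))
print(sum(picked))                # 40, the same kind of check as the SUM in H2
print(statistics.pstdev(picked))  # 2.0, population deviation (STDEV.P)
print(statistics.stdev(picked))   # sample deviation (STDEV / STDEV.S)
```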
Assuming your start date is in D1 and your end date is in D2, please try:
=STDEV.P(INDIRECT("B"&MATCH(D1,A:A,0)&":B"&MATCH(D2,A:A,0)))
As long as the dates are unique it should not matter that the other values are not.

Code equivalent to Array Formula

I am currently using an array formula in my data to find a row where columns O, Y, and AA match the current row, and where column A value does not match, and return column C for the matching row.
Here is my formula:
=INDEX(C:C,MATCH(1,(O:O=O2)*(Y:Y=Y2)*(AA:AA=AA2)*(A:A<>A2),0))
Using named ranges I have been able to input this formula using VBA, but what I really want to do is use VBA to perform a similar function and write the resulting value to column D.
I am thinking of a loop: for each i from 2 to the last row, find the other row within the range that matches and write Cells(foundRow, 3).Value to Cells(i, 4). But I don't know the syntax for using a VBA array to find that matching row.
While not explicitly stated, it could easily be inferred that you are seeking to use VBA to increase the efficiency of the calculation/recalculation of your array formula. You haven't provided the scope (i.e. number of rows) of your data but it is unlikely that you require the full column references you are using. The following calculation cycle times were based on ~1000 rows of static data.
Your array formula:
=INDEX(C:C, MATCH(1, (O:O=O2)*(Y:Y=Y2)*(AA:AA=AA2)*(A:A<>A2), 0))
Elapsed time to fill down and calculate: 24.828 seconds
Your array formula with column references truncated to actual extents of the data:
=INDEX($C$2:$C$999, MATCH(1, ($O$2:$O$999=O2)*($Y$2:$Y$999=Y2)*($AA$2:$AA$999=AA2)*($A$2:$A$999<>A2), 0))
Elapsed time to fill down and calculate: 0.203 seconds
Comparable standard formula with column references truncated to actual extents of the data:
=INDEX($C$2:$C$999, MIN(INDEX(ROW($1:$998)+(($A$2:$A$999=A2)+($O$2:$O$999<>O2)+($Y$2:$Y$999<>Y2)+($AA$2:$AA$999<>AA2))*1E+99, , )))
Elapsed time to fill down and calculate: 0.257 seconds
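The standard formula above avoids Ctrl+Shift+Enter by adding a 1E+99 penalty for every failed test and taking the MIN. A Python model of that arithmetic, on hypothetical data with 1-based positions relative to the range, shows how the first row that passes all four tests wins. first_match_position is a hypothetical name, and Python's comparisons are case-sensitive where Excel's are not.

```python
from typing import Optional, Sequence


def first_match_position(a: Sequence, o: Sequence, y: Sequence, aa: Sequence, i: int) -> Optional[int]:
    """Model of MIN(INDEX(ROW($1:$998)+((A=A_i)+(O<>O_i)+(Y<>Y_i)+(AA<>AA_i))*1E+99,,)):
    every failed test adds 1E+99, so the smallest total is the first row
    (1-based, relative to the range) that passes all four tests."""
    totals = [
        (k + 1) + ((a[k] == a[i]) + (o[k] != o[i]) + (y[k] != y[i]) + (aa[k] != aa[i])) * 1e99
        for k in range(len(a))
    ]
    best = min(totals)
    # A penalised total is at least 1E+99; Excel's INDEX would return #REF! here
    return int(best) if best < 1e99 else None


a = ["x1", "x2", "x3", "x4"]
o = ["k", "k", "m", "k"]
y = [1, 1, 1, 2]
aa = ["z", "z", "z", "z"]
print([first_match_position(a, o, y, aa, i) for i in range(4)])  # [2, 1, None, None]
```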
As you can see, cutting the column references down to what is actually being used increases efficiency immensely. An array formula evaluates every cell in every range it references, so its cost grows with the number of rows referenced, and every filled-down copy repeats that work. With full-column references, each copy processes over a million rows per column.
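If you still want the VBA route from the question, a single pass that groups rows by their (O, Y, AA) values avoids comparing every row against every other row. The Python sketch below illustrates that grouping technique; it is a different approach from the one in this answer, partner_values and the sample data are hypothetical, and in VBA the same idea would typically use a Scripting.Dictionary. Like the formula, it returns the first qualifying row from the top, but its == comparison is case-sensitive where Excel's = is not.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple


def partner_values(a: Sequence, c: Sequence, o: Sequence, y: Sequence,
                   aa: Sequence) -> List[Optional[object]]:
    """For each row i, return C from the first row (top to bottom) whose O, Y
    and AA equal row i's and whose A differs; None where the formula gives #N/A."""
    groups: Dict[Tuple, List[int]] = defaultdict(list)
    for row, key in enumerate(zip(o, y, aa)):
        groups[key].append(row)  # rows are appended in ascending order
    result: List[Optional[object]] = []
    for row, key in enumerate(zip(o, y, aa)):
        # only rows sharing the same (O, Y, AA) key are scanned
        match = next((other for other in groups[key] if a[other] != a[row]), None)
        result.append(None if match is None else c[match])
    return result


a = ["x1", "x2", "x3", "x4"]
c = ["c1", "c2", "c3", "c4"]
o = ["k", "k", "m", "k"]
y = [1, 1, 1, 2]
aa = ["z", "z", "z", "z"]
print(partner_values(a, c, o, y, aa))  # ['c2', 'c1', None, None]
```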
If your data is constantly changing and you do not know how many rows you will be dealing with, use named ranges with a Refers to: that is defined dynamically. Example:
Pick a column that usually defines the extents of the data and note the nature of the data. This method differs slightly depending upon whether you are dealing with true numbers or text. For demonstration purposes, column C has numbers.
Choose Formulas ► Defined Names ► Name Manager. When the Name Manager dialog opens, click New.
Type a friendly name for the range in the Name: text box, e.g. myColAA. Leave the Scope: as Workbook.
Use the following for Refers to: =$AA$2:INDEX($AA:$AA, MATCH(1e99, $C:$C))
This defines myColAA as AA2 down to the row of the last number in column C. If column C were full of text values, you would use the following instead: =$AA$2:INDEX($AA:$AA, MATCH("zzz", $C:$C))
Repeat for columns A, C, O and Y, giving each range a new name. Change the AA column references each time, but keep the MATCH against column C so that all of the named ranges always have the same number of rows.
When you have created all of the named ranges and are back at the worksheet, test one by tapping F5, typing myColAA into the Reference: text box and clicking OK.
Your array formula will now look similar to the following.
=INDEX(myColC, MATCH(1, (myColO=O2)*(myColY=Y2)*(myColAA=AA2)*(myColA<>A2), 0))
The named ranges will grow and shrink with the amount of data available.

Optimization of array function that calculates products

I have the following array formula that calculates the returns on a particular stock in a particular year:
=IF(AND(NOT(E2=E3),H2=H3),PRODUCT(IF($E$2:E2=E1,$O$2:O2,""))-1,"")
But since I have 500,000 rows, as soon as I reach row 50,000 Excel gives an error stating that my machine does not have enough resources to compute the values.
How shall I optimize the function so that it actually works?
Column E is a counter that tracks the year and ticker of each stock. It is built from a helper column that outputs 1 when the year differs from the previous row's year, and also outputs 1 when the stock name has changed. For example, you may have a value for 1993 followed by another 1993 value for a different stock; clearly the return should be calculated anew, and I use 1 to indicate that.
Another column then keeps a cumulative sum of those 1s: each time a new 1 appears, the running total increases by 1, and the same number repeats until the next 1. This running total (column E) is what makes the array formula possible: when the next value in column E differs from the current one, I use my twist on SUMIF, PRODUCT(IF(...)), which returns the product of the column O values for the corresponding running-total value in column E.
The source of the inefficiency, I believe, is in the steady increase with row number of the number of cells that must be examined in order to evaluate each successive array formula. In row 50,000, for example, your formula must examine cells in all the rows above it.
I'm a big fan of array formulas, so it pains me to say this, but I wouldn't do it this way. Instead, use additional columns to compute, in each row, the pieces of your formula that are needed to return the desired result. By taking that approach, you're exploiting Excel's very efficient recalculation engine to compute only what's needed.
As for the final product, compute it from a cumulative running product in an auxiliary column that resets to the value now in column O when column P in the row above contains a number. This approach is much more "local" and avoids formulas that depend on large numbers of cells.
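A Python sketch of that running-product idea follows. It assumes column O holds gross returns (e.g. 1.05 for +5%, which the PRODUCT(...)-1 in the question suggests) and that a change in the column E counter marks a new stock-year; the extra H-column condition from the original formula is left out. group_returns and the sample data are hypothetical. Each row does a constant amount of work instead of rescanning every row above it.

```python
from typing import List, Optional, Sequence


def group_returns(group_ids: Sequence[int], growth: Sequence[float]) -> List[Optional[float]]:
    """Keep a running product of growth factors that restarts whenever the
    group id changes, and report product - 1 on the last row of each group
    (None elsewhere, like the formula's "")."""
    out: List[Optional[float]] = []
    running = 1.0
    for i, (group, factor) in enumerate(zip(group_ids, growth)):
        if i > 0 and group != group_ids[i - 1]:
            running = 1.0  # new stock or new year: restart the product
        running *= factor
        last_in_group = i == len(group_ids) - 1 or group_ids[i + 1] != group
        out.append(running - 1 if last_in_group else None)
    return out


print(group_returns([1, 1, 1, 2, 2], [1.5, 2.0, 0.5, 1.25, 2.0]))
# [None, None, 0.5, None, 1.5]
```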
I realize that text is not the best language for describing this, and my poor writing skills might be adding to the challenge, so please let me know if more detail is needed.
Interesting problem, thanks.
Could I suggest a really quick and [very] dirty VBA macro? Something like the one below. Obviously, back up your file before running it. This assumes you want to start calculating from row 13.
Sub calculateP()
    'start on row 13, column P:
    Cells(13, 16).Select
    'loop through every row as long as column A is populated:
    Do
        If ActiveCell(1, -14).Value = "" Then Exit Do 'column A not populated, so exit the loop
        'enter the formula as an array formula; a plain FormulaR1C1 assignment would not
        'evaluate the IF over the whole range. R2C5 is absolute, like $E$2 in the original:
        ActiveCell.FormulaArray = _
            "=IF(AND(NOT(RC[-11]=R[1]C[-11]),RC[-8]=R[1]C[-8]),PRODUCT(IF(R2C5:RC[-11]=R[-1]C[-11],R2C15:RC[-1],""""))-1,"""")"
        'convert cell value to value only (remove formula):
        ActiveCell.Value = ActiveCell.Value
        'select next row:
        ActiveCell(2, 1).Select
    Loop
End Sub
Sorry, this is definitely not a great answer for you... in fact, even this method could be written more elegantly using Range objects... but the quick and dirty approach may help you in the interim.

Excel arrays: count totals using criteria from multiple ranges (or sheets)

What I would like to do is count the number of rows that match criteria that have to be checked in two ranges.
I can't use VBA or add new columns (for instance, a new column with a VLOOKUP formula), and I would prefer to use array formulas.
I have two separate ranges, each with an ID column for the identifier and other fields with data.
For instance, range 1:
Range 2:
If I had only to check the first range I would do:
{=SUM((D4:D7="Red")*(E4:E7="Big"))}
But I don't know how to check also using data from the other range.
How, for example, can I count the number of items that are Red, Big and Round using both ranges?
Put this in the cell F4:
=IF((VLOOKUP(C4,$C$11:$D$12,2)="Round")*(D4="Red")*(E4="Big"),1,"")
Note that without a fourth argument, VLOOKUP does an approximate match: it finds the largest ID that is less than or equal to the first argument. Since there's no 1 in your second dataset, this first cell will show "#N/A", which I don't know how to solve, but when you extend this formula down to compare the other sample data in the first set, ID numbers 2 and 4 will show a 1.
Edit: You wanted a count of this list. After this, it should be easy to count the cells in this column with the COUNT function.
Try this array formula
=SUM((D4:D7="Red")*(E4:E7="Big")*ISNUMBER(MATCH(C4:C7,IF(D12:D13="Round",C12:C13),0)))
The last part is the added criterion you want: the IF function returns {2,4} (the IDs where Data 3 is "Round"), and MATCH then compares C4:C7 against that. Where there is a match you get a number (instead of #N/A), so ISNUMBER turns it into TRUE/FALSE, which feeds into your original formula. The result should be 2.
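The same logic in Python, on hypothetical data since the original tables are not shown: the set of Round IDs plays the part of the IF(...) array, and the membership test plays the part of ISNUMBER(MATCH(...)). count_red_big_round is a hypothetical name. Excel's text comparisons are case-insensitive, whereas these are not.

```python
from typing import Sequence


def count_red_big_round(ids: Sequence[int], colors: Sequence[str], sizes: Sequence[str],
                        ids2: Sequence[int], shapes: Sequence[str]) -> int:
    # IF(D12:D13="Round", C12:C13): the IDs in range 2 whose shape is Round
    round_ids = {i for i, shape in zip(ids2, shapes) if shape == "Round"}
    # SUM((D4:D7="Red")*(E4:E7="Big")*ISNUMBER(MATCH(C4:C7, ..., 0)))
    return sum(1 for i, color, size in zip(ids, colors, sizes)
               if color == "Red" and size == "Big" and i in round_ids)


# Hypothetical data (the original tables are not shown)
ids, colors, sizes = [1, 2, 3, 4], ["Red", "Red", "Blue", "Red"], ["Big", "Big", "Big", "Big"]
ids2, shapes = [2, 4], ["Round", "Round"]
print(count_red_big_round(ids, colors, sizes, ids2, shapes))  # 2
```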
