I am currently using an array formula to find a row where columns O, Y, and AA match the current row but the value in column A does not, and to return column C from that matching row.
Here is my formula:
=INDEX(C:C,MATCH(1,(O:O=O2)*(Y:Y=Y2)*(AA:AA=AA2)*(A:A<>A2),0))
Using named ranges, I have been able to enter this formula with VBA, but what I really want is to use VBA to perform the same lookup and write the resulting value to column D.
I am thinking of a loop: for each i from 2 to the last row, find the other row within the range that matches, and write Cells(foundRow, 3).Value to Cells(i, 4). However, I don't know the VBA syntax for finding that matching row.
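As a rough sketch of the loop described above, the following Python models the same lookup outside Excel: for each row it scans top-down for the first row whose O, Y and AA values match and whose A value differs, and stores that row's column C value as column D. The row-as-dictionary layout and the function names are hypothetical stand-ins for worksheet cells, not a VBA API.

```python
from typing import Optional, Sequence


def find_other_match(rows: Sequence[dict], i: int) -> Optional[object]:
    """Column C of the first row (top-down) whose O, Y and AA equal row i's
    values and whose A differs; None plays the role of the formula's #N/A."""
    current = rows[i]
    for row in rows:
        if (row["O"] == current["O"]
                and row["Y"] == current["Y"]
                and row["AA"] == current["AA"]
                and row["A"] != current["A"]):  # excludes row i itself
            return row["C"]
    return None


def fill_column_d(rows: Sequence[dict]) -> None:
    """Write the lookup result into each row's "D" entry."""
    for i, row in enumerate(rows):
        row["D"] = find_other_match(rows, i)
```

Like the array formula, this compares every row against every other row; grouping rows by their (O, Y, AA) values in a dictionary first would be one way to reduce that work, though it is not shown here.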
While not explicitly stated, it could easily be inferred that you are seeking to use VBA to make the calculation/recalculation of your array formula more efficient. You haven't provided the scope (i.e. number of rows) of your data, but it is unlikely that you require the full column references you are using. The following calculation cycle times are based on ~1000 rows of static data.
Your array formula:
=INDEX(C:C, MATCH(1, (O:O=O2)*(Y:Y=Y2)*(AA:AA=AA2)*(A:A<>A2), 0))
Elapsed time to fill down and calculate: 24.828 seconds
Your array formula with column references truncated to actual extents of the data:
=INDEX($C$2:$C$999, MATCH(1, ($O$2:$O$999=O2)*($Y$2:$Y$999=Y2)*($AA$2:$AA$999=AA2)*($A$2:$A$999<>A2), 0))
Elapsed time to fill down and calculate: 0.203 seconds
Comparable standard formula with column references truncated to actual extents of the data:
=INDEX($C$2:$C$999, MIN(INDEX(ROW($1:$998)+(($A$2:$A$999=A2)+($O$2:$O$999<>O2)+($Y$2:$Y$999<>Y2)+($AA$2:$AA$999<>AA2))*1E+99, , )))
Elapsed time to fill down and calculate: 0.257 seconds
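The standard (non-CSE) formula above relies on a trick: every failed test adds 1E+99 to a row's position, so MIN can only land on a row that passes all four tests. A minimal Python model of that arithmetic, with hypothetical function and argument names (plain lists stand in for the column ranges):

```python
BIG = 1e99  # plays the role of 1E+99 in the formula


def first_match_position(A, O, Y, AA, i):
    """1-based position of the first row with the same O, Y, AA as row i
    but a different A, computed the way MIN(INDEX(ROW(...)+(...)*1E+99,,)) does."""
    scores = []
    for k in range(len(A)):
        # Booleans add up like coerced TRUE/FALSE: one point per failed test
        failures = (A[k] == A[i]) + (O[k] != O[i]) + (Y[k] != Y[i]) + (AA[k] != AA[i])
        scores.append((k + 1) + failures * BIG)  # k + 1 mirrors ROW($1:$998)
    best = min(scores)
    # Any failed test pushes the score to at least 1E+99; Excel's INDEX would then error
    return int(best) if best < BIG else None
```

Rows that fail even one test end up around 1E+99, far above any real row position, so they can never win the MIN.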
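As you can see, cutting the column references down to what is actually being used increases efficiency immensely. An array formula evaluates its tests against every referenced cell, so each filled-down copy does work proportional to the number of rows it references. With one copy per data row, the total calculation load climbs steeply (roughly with the square of the row count) as the number of referenced rows increases.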
If your data is constantly changing and you do not know how many rows you will be dealing with, use named ranges whose Refers to: formula is defined dynamically. Example:
Pick a column that usually defines the extents of the data and note the nature of the data. This method differs slightly depending upon whether you are dealing with true numbers or text. For demonstration purposes, column C has numbers.
Choose Formulas ► Defined Names ► Name Manager. When the Name Manager dialog opens, click New.
Type a friendly name for the range in the Name: text box, e.g. myColAA. Leave the Scope: as Workbook.
Use the following for Refers to: =$AA$2:INDEX($AA:$AA, MATCH(1e99, $C:$C))
This will define myColAA as AA2 down to the row of the last number in column C. If column C were full of text values, you would use =$AA$2:INDEX($AA:$AA, MATCH("zzz", $C:$C)) instead.
Repeat for columns A:A, C:C, O:O and Y:Y, giving each a new name. Change only the $AA$2 and $AA:$AA references in each definition and keep MATCH(1e99, $C:$C) as is, so that all the named ranges always have the same number of rows.
When you have created all of the named ranges and are back at the worksheet, test one by tapping F5, typing myColAA into the Reference: text box and clicking OK.
Your array formula will now look similar to the following.
=INDEX(myColC, MATCH(1, (myColO=O2)*(myColY=Y2)*(myColAA=AA2)*(myColA<>A2), 0))
The named ranges will grow and shrink with the amount of data available.
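The dynamic definitions above hinge on MATCH(1e99, $C:$C) returning the position of the last number in column C. The sketch below models that behaviour with a plain scan (Excel actually performs an approximate-match search, so this is a simplification) and then slices another column to the same extent, as =$AA$2:INDEX($AA:$AA, MATCH(1e99, $C:$C)) does. The function names and the list-based layout are hypothetical.

```python
from numbers import Number


def last_number_position(column):
    """1-based position of the last numeric entry in `column`, or None."""
    last = None
    for position, value in enumerate(column, start=1):
        # bool is a subclass of int in Python; treat TRUE/FALSE as non-numeric
        if isinstance(value, Number) and not isinstance(value, bool):
            last = position
    return last


def dynamic_range(values, key_column):
    """Entries of `values` from row 2 down to the last numeric row of key_column."""
    last = last_number_position(key_column)
    if last is None or last < 2:
        return []
    return values[1:last]  # list index 1 corresponds to worksheet row 2
```

Because every named range is sized from the same key column, they always stay the same length, which the array formula needs.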
Related
A truncated version of my data is in the form shown in the screenshot below: three columns of 5 unique names. The names appear in any order and in any position but never repeat in a single row.
My goal is to create an array that contains the number of times Adam appears in each row. I can fill down the formula =countif(A2:C2,$I$2) in a new column, or, if I write the array manually for each row, it looks like:
={countif(A2:C2,$I$2);countif(A3:C3,$I$2);countif(A4:C4,$I$2);countif(A5:C5,$I$2);countif(A6:C6,$I$2)}
Where cell I2 contains "Adam". Of course, this is not feasible for large data sets.
I know that arrays are effectively cells turned into ranges, but my main issue is that the cell I'm trying to transform already references a range, and I don't know how to tell the software to apply the countif down each row (i.e. I intuitively would like to do something like countif((A2:C2):(A99:C99),"Adam") but understand that's not how spreadsheets work).
My goal is ultimately to perform some operations on the corresponding array but I think I'm comfortable enough with that once I can get the array formula I'm looking for.
Try:
=ARRAYFORMULA(IF(A2:A="",,MMULT(IF(A2:C="Adam", 1, 0), {1;1;1})))
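MMULT works here because multiplying the n×3 grid of 1/0 flags by the 3×1 column {1;1;1} adds up each row's flags. A small Python model of the same steps follows; the function name is hypothetical, and unlike the spreadsheet's case-insensitive = comparison, this sketch compares names case-sensitively.

```python
def row_counts(rows, name):
    """Per-row count of `name`, modeled on
    IF(A2:A="",,MMULT(IF(A2:C=name,1,0),{1;1;1}))."""
    width = len(rows[0]) if rows else 0
    ones = [1] * width  # the {1;1;1} column vector
    result = []
    for row in rows:
        if row[0] == "":  # IF(A2:A="",,...) leaves blank rows empty
            result.append(None)
            continue
        flags = [1 if cell == name else 0 for cell in row]  # IF(A2:C=name,1,0)
        # One row of the matrix product: dot product of flags with the ones vector
        result.append(sum(f * o for f, o in zip(flags, ones)))
    return result
```

The resulting list is the per-row array the question asks for, ready for further operations.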
I have a workbook with several sheets, each containing a large amount of data formatted identically. What I'd like to do is enter a formula on a summary sheet that sums data from across the data sheets, selecting the data to sum based on an array of criteria.
The list of sheets is named 'AdHoc_Sheets' and the list of criteria is named 'Uncontrollable_Compensation'.
First attempt:
=SUMPRODUCT(SUMIF(INDIRECT("'"&AdHoc_Sheets&"'!"&"C:C"),A40,INDIRECT("'"&AdHoc_Sheets&"'!"&"E:E")))
This works well when only a single criterion (in this case 'A40') is needed. The challenge I'm finding is changing that to an array of criteria.
Second attempt:
={SUMPRODUCT(SUM(IF(ISERROR(MATCH(INDIRECT("'"&AdHoc_Sheets&"'!"&"C:C"),TRANSPOSE(Uncontrollable_Compensation),0)),0,INDIRECT("'"&AdHoc_Sheets&"'!"&"E:E"))))}
Which returns a zero when it's not CSE'd and an #N/A error when it is CSE'd. Something about the dynamics of juggling the arrays is messing me up, and I can't quite tell if I need to turn to MMULT or some other method. Thanks in advance.
Assuming that the entries in column C are text, not numeric, array formula**:
=SUM(IF(ISNUMBER(MATCH(T(OFFSET(INDIRECT("'"&AdHoc_Sheets&"'!"&"C1"),TRANSPOSE(ROW(C1:C100)-MIN(ROW(C1:C100))),0)),Uncontrollable_Compensation,0)),N(OFFSET(INDIRECT("'"&AdHoc_Sheets&"'!"&"E1"),TRANSPOSE(ROW(C1:C100)-MIN(ROW(C1:C100))),0))))
With such a construction you cannot 'get away' with arbitrarily referencing entire columns without detriment to performance. Hence my choice of range from row 1 to row 100, which obviously you can change, though be sure to keep it as small as possible.
Regards
**Array formulas are not entered in the same way as 'standard' formulas. Instead of pressing just ENTER, you first hold down CTRL and SHIFT, and only then press ENTER. If you've done it correctly, you'll notice Excel puts curly brackets {} around the formula (though do not attempt to manually insert these yourself).
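Stripped of the OFFSET/INDIRECT plumbing, the formula sums column E on every sheet listed in AdHoc_Sheets wherever column C appears in Uncontrollable_Compensation. A rough Python model of that logic, assuming each sheet is represented as a list of (column C, column E) pairs (a hypothetical layout):

```python
def sum_across_sheets(sheets, sheet_names, criteria):
    """Sum column E on each listed sheet where column C is one of `criteria`,
    mirroring SUM(IF(ISNUMBER(MATCH(C, criteria, 0)), E))."""
    wanted = {c.lower() for c in criteria}  # MATCH(..., 0) ignores case
    total = 0
    for name in sheet_names:  # the AdHoc_Sheets list
        for c_value, e_value in sheets[name]:
            # T() keeps only text in column C; N() turns non-numbers in E into 0
            if isinstance(c_value, str) and c_value.lower() in wanted:
                if isinstance(e_value, (int, float)) and not isinstance(e_value, bool):
                    total += e_value
    return total
```

The OFFSET/TRANSPOSE construction in the formula exists only to build the sheets-by-rows grid that the nested loops walk through here.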
I would prefer to use an Excel array formula (but if it can only be done in VBA, so be it) that returns ALL cells in a column that contain specific text. The picture below shows what I am after and what I have tried. I'm getting close (thanks to similar but different questions) but can't quite get there: at the moment, I get only the first matching cell instead of all of them. In my actual application, I am searching through about 20,000 cells and will have a few hundred search terms. I expect most search terms to return about 8 - 12 cells.
Formula I am using:
=INDEX($A$4:$A$10,MATCH(FALSE,ISERROR(SEARCH($C$1,$A$4:$A$10)),0))
[Spreadsheet image]
To make this work efficiently, I recommend having a separate cell holding the results count (I used cell C2) which has this formula:
=COUNTIF(A:A,"*"&C1&"*")
Then, in cell C4, enter this array formula and copy it down (the -3 is there only because the header row is row 3; if the header row were row 1, it would be -1):
=IF(ROW(A1)>$C$2,"",INDEX($A$4:$A$21000,SMALL(IF(ISNUMBER(SEARCH($C$1,$A$4:$A$21000)),ROW($A$4:$A$21000)-3),ROW(C1))))
I tested this with 21,000 rows of data in column A, an average of 30 results per search string, and the formula copied down for 60 cells in column C. With that much data, it takes about 1-2 seconds to finish recalculating. Recalculation time can vary widely depending on other factors in your workbook (additional formulas, nested dependencies, use of volatile functions, etc.) as well as your hardware.
Alternatively, you could just use the built-in Filter functionality, but I hope this helps.
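The SMALL(IF(ISNUMBER(SEARCH(...)))) pattern returns the k-th matching cell for k = 1, 2, 3 and so on as the formula is copied down, with the count in C2 telling it when to stop. A minimal Python model of those two pieces; the names are hypothetical, and it treats the search text literally, whereas SEARCH and COUNTIF also honour wildcards.

```python
def contains(value, search):
    """Case-insensitive 'contains', like ISNUMBER(SEARCH(search, value))."""
    return search.lower() in str(value).lower()


def nth_match(values, search, k):
    """k-th (1-based) matching cell, or "" once k exceeds the match count,
    like IF(ROW(A1)>$C$2,"",INDEX(...,SMALL(IF(...),ROW(C1))))."""
    positions = [i for i, v in enumerate(values) if contains(v, search)]
    return values[positions[k - 1]] if k <= len(positions) else ""


def all_matches(values, search):
    """Every matching cell, sized by a count like COUNTIF(A:A,"*"&C1&"*")."""
    count = sum(1 for v in values if contains(v, search))
    return [nth_match(values, search, k) for k in range(1, count + 1)]
```

Each copied-down cell rescans the whole column, which is why the work grows with rows times results; the sketch keeps that per-k structure to mirror the formula rather than filtering once.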
You need to get the ROWS. Put this in C4 and copy down.
=IFERROR(AGGREGATE(15,6, IF(SEARCH($C$1, $A$4:$A$10)>0, ROW($A$4:$A$10)), ROW($C4)-ROW($A$4)+1), "")
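This is an array formula, so confirm it with Ctrl+Shift+Enter. Note that it returns the row numbers of the matches; to return the cell contents instead, wrap the AGGREGATE part in INDEX: =IFERROR(INDEX($A:$A, AGGREGATE(15,6, IF(SEARCH($C$1, $A$4:$A$10)>0, ROW($A$4:$A$10)), ROW($C4)-ROW($A$4)+1)), "")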
I want to refer to a range of data in my Excel sheet whose size varies: this month the data has 80 rows, but next month it could have 100. So I need a way to refer to a variable-size range that I can use in the following formula:
=SUMPRODUCT(Allocation_Updt!$J$2:$J$83*((RIGHT(Allocation_Updt!$F$2:$F$83,6)+0)=$E62))/100
Here 83 is the last row of the data sheet, but it can change next time. Setting it to 10000 (almost the maximum size of my data) gives me an error.
Try converting the range of data to a table. It will automatically apply a name to each column. These column names can be used to refer to the data in the column, and that range of data will be dynamic going forward.
Use,
match(1e99, Allocation_Updt!$J:$J)
... to find the row number of the last number or date in a column. With the last value in J83, all three of the following range references are the same thing:
Allocation_Updt!$J$2:$J$83
Allocation_Updt!$J$2:index(Allocation_Updt!J:J, match(1e99, Allocation_Updt!$J:$J))
index(Allocation_Updt!J:J, 2):index(Allocation_Updt!J:J, match(1e99, Allocation_Updt!$J:$J))
So your SUMPRODUCT function can be dynamically limited to exactly what is needed with,
=SUMPRODUCT(Allocation_Updt!$J$2:index(Allocation_Updt!$J:$J, match(1e99, Allocation_Updt!$J:$J))*((RIGHT(Allocation_Updt!$F$2:index(Allocation_Updt!$F:$F, match(1e99, Allocation_Updt!$J:$J)),6)+0)=$E62))/100
Note that the last row number in column J is used to get the last valid entry in both column F and column J.
Given the persnickety nature of SUMPRODUCT, I might perform some tests with,
=sumifs(Allocation_Updt!$J:$J, Allocation_Updt!$F:$F, "*"&$E62)/100
That is not specifically a 'right-most 6 character match'; it is an 'ends-with-E62' match. Some testing on your own data will quickly prove whether this is a viable alternative. It is more efficient, more forgiving and you can use full column references without penalty.
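The two criteria differ in a way that is easy to show: RIGHT(F,6)+0 compares the last six characters as a number, while "*"&$E62 only checks how the text ends. The Python sketch below models both; the function names are hypothetical, None stands in for Excel's #VALUE! error, and the wildcard test is limited to text entries because SUMIFS wildcards do not match numeric cells.

```python
def right6_value(f_value):
    """RIGHT(F,6)+0. Returns None where Excel would give #VALUE!,
    e.g. for a blank cell or a non-numeric tail."""
    try:
        return float(str(f_value)[-6:])
    except ValueError:
        return None


def ends_with_match(f_value, e62):
    """SUMIFS criterion "*"&$E62: text that ends with E62, ignoring case."""
    if not isinstance(f_value, str):
        return False  # wildcard criteria only match text entries
    return f_value.lower().endswith(str(e62).lower())
```

With E62 = 23456, "AB123456" is excluded by the RIGHT test (its last six characters are 123456) but included by the ends-with test, which is why testing against your own data is worthwhile. A blank cell, on the other hand, breaks the RIGHT version but is simply skipped by SUMIFS.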
I have the following formula which I will explain below:
{=SUM(IF(($G$1:$L$1=$O$1)*($G$2:$L$2=$O$2)*($G$3:$L$3=$O$3)*($G$4:$L$4=$O$4)*($G$5:$L$5=$O$5)*($G$6:$L$6=$O$6)*($G$7:$L$7=$O$7);G21:L21))}
Here is what the worksheet looks like:
Under columns G - L we have a 'database' of all data. New columns are added cumulatively each quarter (approx. 30 columns a quarter), so after a few years we have ended up with a lot of database columns (1000+ columns of raw data). For the sake of this demo, I have only included those 6 columns.
As you can see, each column contains specific parameters in rows 1 - 7 that identify a specific CountryCode + Project Code + Category + Fiscal Year + ... (etc.). This allows us to track down a unique specific project and retrieve its data.
Column O holds the specific project we are trying to retrieve values for; rows 1 - 7 there are the same as in column G, because that is the project we are looking up.
Then comes our formula, attached above; here is what it looks like when I press F2. The IF statement simply checks whether each column matches the predefined criteria in column O, and SUM adds up all the columns that match all the criteria in rows 1 - 7.
Now here is the problem. We have a worksheet containing 20 projects (each laid out like column O), and we use this array formula there to retrieve values. Retrieving data this way takes A LOT OF TIME. We have also used VBA to iterate through all the cells: insert the formula, calculate the array cell, and then copy and paste the resulting value in place (so that we don't end up with a sheet full of array formulas). However, it still takes a LONG time to calculate (1 minute or so).
Is there a better way to retrieve the data in the format described above (i.e. with specific criteria we are trying to find)? Maybe SUMIFS would be better? Or SUMPRODUCT? Or a completely different solution?
I am open to any proposal that would speed up the process.
I ran into a similar problem about 2 weeks ago. My first approach was a helper column/row that concatenates the 7 strings in each column, so that the IF function only has to check whether the joined text matches. Assuming the helper row is row 8 in your sample, the formula in cell G8 would be
=CONCATENATE(G1,"|",G2,"|",G3,"|",G4,"|",G5,"|",G6,"|",G7)
and do the same for the remaining columns, including column O:
=CONCATENATE(O1,"|",O2,"|",O3,"|",O4,"|",O5,"|",O6,"|",O7)
Then do a HLOOKUP
=HLOOKUP(O8,G8:L21,14,0)
In my case, the calculation time dropped from 10 minutes to a few seconds!
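The speed-up comes from building each column's key once and then doing a single exact lookup instead of seven array comparisons per column. A Python sketch of the same idea, assuming each column is a dict with a 7-item "header" list and a "data" list (a hypothetical layout); the dictionary keeps the first column for each key, as HLOOKUP returns the first match.

```python
def column_key(header_cells):
    """Helper-row value, like =CONCATENATE(G1,"|",G2,"|",...,"|",G7)."""
    return "|".join(str(cell) for cell in header_cells)


def build_index(columns):
    """Map each helper-row key to its column position, keeping the first one."""
    index = {}
    for position, column in enumerate(columns):
        index.setdefault(column_key(column["header"]), position)
    return index


def lookup_value(columns, index, target_header, data_row):
    """Like =HLOOKUP(O8, G8:L21, 14, 0); None plays the role of #N/A."""
    position = index.get(column_key(target_header))
    if position is None:
        return None
    return columns[position]["data"][data_row]
```

Building the index once and reusing it for all 20 projects is the Python analogue of calculating the helper row once and letting every lookup read it.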
Alternatively, I also found a way to do it without a helper row, using an array formula again; the idea is pretty much the same.
The formula in O21, per your sample, would be
=SUM(IF(CONCATENATE(G1:L1,G2:L2,G3:L3,G4:L4,G5:L5,G6:L6,G7:L7)=CONCATENATE(O1,O2,O3,O4,O5,O6,O7),G21:L21))
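(I didn't add the "|" delimiter in this formula, but it is better to do so: without it, different sets of values can join into the same text, e.g. "AB" & "C" and "A" & "BC" both give "ABC".)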
But in the end I prefer the helper column method.
HTH
To improve performance, avoid repeating the same calculations multiple times.
You wrote: "This allows us to track down a unique specific project and retrieve its data."
If the combination of 7 values is unique, calculate the position of the chosen project only once, in a helper cell (for example O15), with this array formula (confirmed with Ctrl+Shift+Enter):
=MATCH(1;(G1:L1=O1)*(G2:L2=O2)*(G3:L3=O3)*(G4:L4=O4)*(G5:L5=O5)*(G6:L6=O6)*(G7:L7=O7);0)
Use the following formula in O21 and drag down:
=INDEX(G21:L21;1;$O$15)
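The gain comes from doing the seven-way comparison once per project rather than once per data row. A Python model of the two steps, with hypothetical names and the header rows represented as lists of lists:

```python
def find_project_column(header_rows, criteria):
    """Helper-cell step: 1-based position of the first column whose header
    values all equal `criteria`, like MATCH(1;(G1:L1=O1)*...*(G7:L7=O7);0)."""
    n_cols = len(header_rows[0])
    for col in range(n_cols):
        if all(header_rows[r][col] == criteria[r] for r in range(len(criteria))):
            return col + 1
    return None  # Excel's MATCH would return #N/A


def values_for_project(data_rows, position):
    """Per-row step: INDEX(G21:L21;1;$O$15) reuses the stored position."""
    return [row[position - 1] for row in data_rows]
```

Each data row then costs a single index operation instead of another pass over all seven criteria rows.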