I have a case in an Excel macro (VBA) where I'd like to dimension an array where the number of dimensions and the bounds of each dimension are determined at runtime. I'm letting the user specify a series of combinatorial options by creating a column for each option type and filling in the possibilities below. The number of columns and the number of options is determined at run time by inspecting the sheet.
Some code needs to run through each combination (one selection from each column) and I'd like to store the results in a multidimensional array.
The number of dimensions will probably be between 2 and 6, so I can always fall back to a bunch of If...Else blocks if I have to, but it feels like there should be a better way.
I was thinking it would be possible to do if I could construct the Redim statement at runtime as a string and execute the string, but this doesn't seem possible.
Is there any way to dynamically Redim with a varying number of dimensions?
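One workaround that sidesteps the variable-rank ReDim altogether (it is not one proposed in the answers here) is to keep the results in a flat one-dimensional array whose length is the product of the column sizes, and to step through every combination with an "odometer" of indices. The sketch below is in Python only to keep it runnable; the function names and the row-major layout are illustrative assumptions. In VBA the flat array would be a single ReDim results(0 To total - 1), and the offset arithmetic is identical.

```python
from math import prod


def combinations_odometer(sizes):
    """Yield every index tuple (i0, ..., ik) with 0 <= ij < sizes[j],
    advancing the last position fastest, like an odometer."""
    if any(s <= 0 for s in sizes):
        return  # an empty option column means there are no combinations
    idx = [0] * len(sizes)
    while True:
        yield tuple(idx)
        pos = len(sizes) - 1
        # Roll over every position that has reached its last value.
        while pos >= 0 and idx[pos] == sizes[pos] - 1:
            idx[pos] = 0
            pos -= 1
        if pos < 0:
            return  # all positions rolled over: every combination was visited
        idx[pos] += 1


def flat_index(idx, sizes):
    """Row-major offset of an index tuple inside a flat array."""
    offset = 0
    for i, s in zip(idx, sizes):
        offset = offset * s + i
    return offset


# Hypothetical example: three option columns with 3, 2 and 4 entries.
sizes = [3, 2, 4]
results = [None] * prod(sizes)  # one slot per combination
for combo in combinations_odometer(sizes):
    # Placeholder "result"; real code would evaluate the combination here.
    results[flat_index(combo, sizes)] = sum(combo)
```

The number of dimensions is now just the length of sizes, so the same code handles 2 or 6 columns without any Select Case.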
I'm pretty sure there is no way of doing this in a single ReDim statement. Select Case may be marginally neater than "a bunch of If...Else blocks", but you're still writing out a lot of separate ReDims.
Working with arrays in VBA where you don't know in advance how many dimensions they will have is a bit of a PITA - as well as ReDim not being very flexible, there is also no neat way of testing an array to see how many dimensions it has (you have to loop through attempts to access higher dimensions and trap errors, or hack around in the underlying memory structure - see this question). So you will need to keep track of the number of dimensions, and write long Case statements every time you need to access the array as well, since the syntax will be different.
I would suggest creating the array with the largest number of dimensions you think you'll need, then setting the number of elements in any unused dimensions to 1 - that way you always have the same syntax every time you access the array, and if you need to you can check for this using UBound(). This is the approach taken by the Excel developers themselves for the Range.Value property - which always returns a 2-dimensional array even for a 1-dimensional Range.
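A minimal sketch of the "always use the maximum number of dimensions" idea, written in Python for runnability. MAX_DIMS = 6 is an assumption taken from the question's rough upper bound, and the helper names are hypothetical. Python indices are 0-based, so the padded size-1 dimensions are always indexed with 0 (in VBA, with bounds 1 To 1, they would always be indexed with 1).

```python
MAX_DIMS = 6  # assumption: the question expects at most about 6 option columns


def padded_shape(sizes, max_dims=MAX_DIMS):
    """Pad the real dimension sizes with 1s up to max_dims."""
    if len(sizes) > max_dims:
        raise ValueError("more dimensions than the fixed maximum")
    return list(sizes) + [1] * (max_dims - len(sizes))


def make_array(shape, fill=0):
    """Build a nested list with the given shape, every element set to fill."""
    if not shape:
        return fill
    return [make_array(shape[1:], fill) for _ in range(shape[0])]


def get_item(arr, *idx):
    """Read one element; always takes exactly MAX_DIMS subscripts."""
    if len(idx) != MAX_DIMS:
        raise IndexError("expected %d subscripts" % MAX_DIMS)
    for i in idx:
        arr = arr[i]
    return arr


def set_item(arr, value, *idx):
    """Write one element; always takes exactly MAX_DIMS subscripts."""
    if len(idx) != MAX_DIMS:
        raise IndexError("expected %d subscripts" % MAX_DIMS)
    for i in idx[:-1]:
        arr = arr[i]
    arr[idx[-1]] = value


# Two real option columns (3 and 2 entries); the other four dimensions are unused.
shape = padded_shape([3, 2])  # [3, 2, 1, 1, 1, 1]
grid = make_array(shape)
set_item(grid, 42, 2, 1, 0, 0, 0, 0)  # unused dimensions are always index 0
```

Every access has the same six-subscript syntax, which is the point of the suggestion; the price is that unused dimensions still have to be spelled out.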
As I understand it, your users specify the dimensions and their sizes by filling in the Excel sheet. This means you have to find the last row and the last column containing a value.
Therefore, have a look at: Excel VBA- Finding the last column with data
Use ReDim to change the array's size. If you want to keep the existing entries, use ReDim Preserve.
"Some code needs to run through each combination (one selection from
each column) and I'd like to store the results in a multidimensional
array."
To begin with, I would read the desired Range object into a Variant.
Dim vArray as Variant
'--as per your defined Sheet, range
'this creates a two dimensional array
vArray = ActiveWorkbook.Sheets("Sheet1").Range("A1:Z300").Value2
Then you could iterate through this array to find the size and data you need, and save them to an array with the dimensions you need.
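A sketch of that iteration, in Python for runnability: find the used extent of the 2-D block and count the options below each header. None stands in for the Empty values Value2 returns for blank cells, and the header-row layout follows the question's description; both are assumptions of this sketch.

```python
def is_blank(value):
    """Treat None (standing in for VBA's Empty) and "" as blank cells."""
    return value is None or value == ""


def used_extent(grid):
    """1-based (last_row, last_col) of the non-blank cells, or (0, 0)."""
    last_row = last_col = 0
    for r, row in enumerate(grid, start=1):
        for c, value in enumerate(row, start=1):
            if not is_blank(value):
                last_row = max(last_row, r)
                last_col = max(last_col, c)
    return last_row, last_col


def options_per_column(grid, header_rows=1):
    """Number of non-blank entries below the header row(s) in each used column."""
    _, last_col = used_extent(grid)
    return [sum(1 for row in grid[header_rows:] if not is_blank(row[c]))
            for c in range(last_col)]


# Hypothetical sheet contents: a header row, then the options below each header.
sheet = [
    ["Size", "Colour", None],
    ["S", "Red", None],
    ["M", "Blue", None],
    ["L", None, None],
    [None, None, None],
]
```

The list returned by options_per_column is exactly the set of bounds the question needs to dimension its results array.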
Little Background:
Redim: Reallocates storage space for an array variable.
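You cannot ReDim an array that was declared with fixed bounds in a Dim statement (e.g. Dim vArray(1 To 6) As Variant); only a dynamic array, declared with empty parentheses (Dim vArray() As Variant) or as a plain Variant, can be resized.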
UPDATE: to show explicitly what's allowed and what's not under Redim.
Each time you use ReDim (without Preserve), it reallocates the array with the dimensions you specify and discards its previous contents.
You can keep your data with ReDim Preserve, but it only lets you resize the last dimension of a multidimensional array; the number of dimensions and the bounds of every other dimension must stay as they were.
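When a dimension other than the last has to change, the usual workaround is to allocate a second array with the new bounds and copy the overlapping block by hand. The Python sketch below shows that copy for a 2-D grid; the function name and the fill value are illustrative assumptions, and in VBA the same loops would run over two ReDim'd arrays.

```python
def resize_preserving(grid, new_rows, new_cols, fill=None):
    """Return a new new_rows x new_cols grid that keeps the overlapping
    block of grid and fills every other cell with fill."""
    old_rows = len(grid)
    old_cols = len(grid[0]) if grid else 0
    out = [[fill] * new_cols for _ in range(new_rows)]
    # Copy only the region that exists in both the old and new shapes.
    for r in range(min(old_rows, new_rows)):
        for c in range(min(old_cols, new_cols)):
            out[r][c] = grid[r][c]
    return out
```

For example, resize_preserving([[1, 2], [3, 4]], 3, 1) grows the first dimension and shrinks the second, which ReDim Preserve alone cannot do.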
Related
A truncated version of my data is in the form shown in the screenshot below: three columns of 5 unique names. The names appear in any order and in any position but never repeat in a single row.
My goal is to create an array that contains the number of times Adam appears in each row. I can fill down the formula =countif(A2:C2,$I$2) in a new column, or, if I write the array out manually for each row, it looks like:
={countif(A2:C2,$I$2);countif(A3:C3,$I$2);countif(A4:C4,$I$2);countif(A5:C5,$I$2);countif(A6:C6,$I$2)}
Where cell I2 contains "Adam". Of course, this is not feasible for large data sets.
I know that arrays are effectively cells turned into ranges, but my main issue is that the cell I'm trying to transform already references a range, and I don't know how to tell the software to apply the countif down each row (i.e. I intuitively would like to do something like countif((A2:C2):(A99:C99),"Adam") but understand that's not how spreadsheets work).
My goal is ultimately to perform some operations on the corresponding array but I think I'm comfortable enough with that once I can get the array formula I'm looking for.
Try this (Google Sheets syntax):
=ARRAYFORMULA(IF(A2:A="",,MMULT(IF(A2:C="Adam", 1, 0), {1;1;1})))
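How the formula works: IF(A2:C="Adam",1,0) turns the names into a grid of 0s and 1s, and MMULT with the column {1;1;1} computes each row's dot product with a vector of ones, i.e. the row sum. A Python sketch of the same computation, assuming the sheet is given as a list of rows with None for blank cells:

```python
def row_counts(grid, target):
    """Per-row count of cells equal to target, done the way the formula does it."""
    width = len(grid[0]) if grid else 0
    ones = [1] * width  # plays the role of the {1;1;1} column vector
    result = []
    for row in grid:
        if row[0] is None or row[0] == "":
            result.append(None)  # IF(A2:A="",,...) leaves blank rows empty
            continue
        indicator = [1 if cell == target else 0 for cell in row]  # IF(A2:C="Adam",1,0)
        result.append(sum(a * b for a, b in zip(indicator, ones)))  # one MMULT row
    return result
```

The ones vector must have as many entries as there are columns, which is why the formula hard-codes three 1s for A:C.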
I have a large range of data in excel that I would like to parse into an array for a user defined function. The range is 2250 x 2250. It takes far too long to parse each cell in via a for loop, and it is too large to be assigned to an array via this method:
dim myArr as Variant
myArr = range("myrange")
Just brainstorming here, would it be more efficient to parse in each column and join the arrays? Any ideas?
Thanks
You're nearly there.
The code you need is:
Dim myArr as Variant
myArr = range("myrange").Value2
Note that I'm using the .Value2 property of the range, not just .Value, which takes cell formatting into account (date- and currency-formatted cells come back as Date and Currency values) and will probably mangle any dates.
Note also that I haven't bothered to ReDim and specify the dimensions of the array: the Value and Value2 properties return a 2-dimensional array, (1 To RowCount, 1 To ColCount)... unless the range is a single cell, in which case you get a scalar Variant, which breaks any downstream code that expected an array. But that's not your problem with a known 2250 x 2250 range.
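When the range size is not known in advance, downstream code can guard against the single-cell case. In VBA that usually means testing the result with IsArray and building a (1 To 1, 1 To 1) array otherwise; the Python sketch below shows the same normalization, assuming a grid is represented as a list of rows.

```python
def as_2d(value):
    """Always hand back a grid (list of rows), wrapping a lone scalar."""
    if isinstance(value, list):
        return value      # already a 2-D block of values
    return [[value]]      # a single cell came back as a scalar
```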
If you reverse the operation, and write an array back to a range, you will need to set the size of the receiving range exactly to the dimensions of the array. Again, not your problem with the question you asked: but the two operations generally go together.
The general principle is that each 'hit' to the worksheet takes about a twentieth of a second (some machines are much faster, but they all have bad days), and the 'hit' of reading a single cell into a variable costs almost exactly the same as reading a seven-million-cell range into a Variant array. Reading the range in one hit is several million times faster than reading it one cell at a time.
Either way, you may as well count any operation in VBA as happening in zero time once you've done the 'read-in' and stopped interacting with the worksheet.
The numbers are all very rough-and-ready, but the general principles will hold, right up until the moment you start allocating arrays that won't fit in the working memory and, again, that's not your problem today.
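To put the answer's cost model into numbers for the 2250 x 2250 range in the question, here is a back-of-envelope sketch. SECONDS_PER_HIT is the answer's rough 1/20 s figure, not a measurement, so only the ratio is meaningful: about 253,000 s (roughly 70 hours) cell by cell versus a twentieth of a second in one read.

```python
SECONDS_PER_HIT = 1 / 20  # the answer's rough figure; real machines vary widely


def read_time_seconds(rows, cols, cell_by_cell):
    """Estimated read time when every worksheet 'hit' costs the same."""
    hits = rows * cols if cell_by_cell else 1  # one hit per cell vs one hit total
    return hits * SECONDS_PER_HIT


slow = read_time_seconds(2250, 2250, cell_by_cell=True)
fast = read_time_seconds(2250, 2250, cell_by_cell=False)
```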
Remember to Erase the array variant when you've finished, rather than relying on it going out of scope: that'll make a difference, with a range this size.
This works fine.
Sub T()
Dim A() As Variant
A = Range("A2").Resize(2250, 2250).Value2
Dim i As Long, j As Long
For i = 1 To 2250
For j = 1 To 2250
If i = j Then A(i, j) = 1
Next j
Next i
Range("A2").Resize(2250, 2250).Value2 = A
End Sub
I think the best options are:
Try to limit the data to a reasonable number, say 1,000,000 values at a time.
Add some error handling to catch the Out of Memory error and then try again, cutting the size to a half, then a third, a quarter, etc., until it works.
Either way, if we're using data sets in the order of 5,000,000 values and you want to make sure that the program will run, you will need to adjust the code to chop up the data.
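A sketch of the retry strategy described above, in Python: try the whole block, then halves, thirds, quarters and so on until a chunk size stops raising an out-of-memory error. read_chunk is a hypothetical callback standing in for "read this block of the sheet into a Variant array and process it"; the sketch assumes it leaves no partial results behind when it fails.

```python
def process_in_chunks(n_values, read_chunk, max_divisor=64):
    """Process n_values items in chunks of n, n/2, n/3, n/4, ... until a
    chunk size works. read_chunk(start, stop) handles items[start:stop] and
    raises MemoryError when the block is too large. Returns the chunk size
    that succeeded."""
    if n_values == 0:
        return 0
    for divisor in range(1, max_divisor + 1):
        chunk = -(-n_values // divisor)  # ceiling division
        try:
            for start in range(0, n_values, chunk):
                read_chunk(start, min(start + chunk, n_values))
            return chunk
        except MemoryError:
            # Assumption: nothing from the failed attempt needs undoing.
            continue
    raise MemoryError("no workable chunk size found")
```

With a 5,000,000-value data set and a reader that fails above 1,000,000 values, this settles on five chunks of 1,000,000.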
I'd like to know if it's possible to use the SUMIF function with implicit or "nested" arrays. By an "implicit" array I mean a matrix whose data isn't in its final form in any range of the spreadsheet, but is a function of some other array. For example, let's say that we have data for an independent variable (whose values, all integers, range from 0 to 5) in the range A1:A100, and data for a dependent variable in B1:B100. With the SUMIF function we can easily calculate, for example, the sum of the dependent variable when the independent one is 4. But if we want the sum of the SQUARES of the dependent variable it's not that easy: SUMIF gives an error if we write SUMIF(A1:A100;4;B1:B100^2), no matter how we enter it (as an array formula or as a simple formula).
Is there any way to do this without having to waste an entire column for the squares of the values of column B?
I know that for this very example the function SUMPRODUCT((A1:A100=4)*B1:B100^2) would do the job, what I don't know is how to "nest" arrays (which is very useful) in general.
The answer is no, I'm afraid. The ranges used in COUNTIF(S)/SUMIF(S)/AVERAGEIF(S) must be either:
1) References to worksheet ranges
2) Constructions which resolve to references to worksheet ranges
One example of the former:
=SUMIF(A1:A10,"A",B1:B10)
And two of the latter (which just happen to be identical to the above):
=SUMIF(A1:INDEX(A:A,10),"A",B1:INDEX(B:B,10))
=SUM(SUMIF(OFFSET(A1,{0,1,2,3,4,5,6,7,8,9},),"A",OFFSET(B1,{0,1,2,3,4,5,6,7,8,9},)))
Here SUMPRODUCT has the advantage over this group of functions, in that constructions may be passed which do not necessarily resolve to worksheet ranges.
However, it might well be the case that a more efficient set-up is achieved by, as you suggested, first using an additional column within the worksheet to compute the squares and then referencing that column within a SUMIF, not least since one of the major advantages that COUNTIF(S), SUMIF(S), etc. can claim over SUMPRODUCT is that arbitrarily large references can be passed with no detriment to calculation performance. For example, the difference in performance between:
=SUMIF(A:A,"A",B:B)
and:
=SUMPRODUCT(0+(A:A="A"),B:B)
is enormous, the latter, having to process all 1,048,576 cells within that range (whether they are technically beyond the last-used cells or not), being not at all recommendable.
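The two set-ups compute the same number; they differ only in where the squares live. A Python sketch (the function names are hypothetical and the criterion is exact-match only): the first function squares the values inside the expression, as SUMPRODUCT((A1:A100=4)*B1:B100^2) does, while the helper-column route materializes the squares first and then runs a plain SUMIF over them.

```python
def sumproduct_squares_if(xs, ys, criterion):
    """In-memory equivalent of SUMPRODUCT((xs=criterion)*ys^2): the squares
    exist only inside the expression, never as a stored column."""
    return sum((x == criterion) * y ** 2 for x, y in zip(xs, ys))


def sumif(xs, criterion, ys):
    """Plain SUMIF over two equally sized lists (exact-match criterion only)."""
    return sum(y for x, y in zip(xs, ys) if x == criterion)


xs = [4, 1, 4, 0]
ys = [2, 5, 3, 7]
helper_column = [y ** 2 for y in ys]  # the "wasted" column of squares
```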
Regards
I am interested in spreadsheet functions, not VBA solutions, to be included in a single cell formula.
[A1:A15 contain numeric values from 1 to 127, B1:B15 contain integers from 1 to 7 that set a divisor.]
Given the function:
=SUMPRODUCT(MOD(FREQUENCY(A1:A15;A1:A15);B1:B15))
FREQUENCY(A1:A15;A1:A15) gives a 1-column array of 15+1 rows, whereas the second part (B1:B15) is a 1-column array of 15 rows.
I would like to change the array returned by FREQUENCY (only in memory, not written to the sheet) from a 1-column, 16-row array to a 1-column, 15-row array containing the first 15 values of that array.
[FREQUENCY documentation: https://support.office.com/en-us/article/FREQUENCY-function-44e3be2b-eca0-42cd-a3f7-fd9ea898fdb9 NB: for Excel, the second remark states that the number of elements in the returned array depends on bins_array (it is one more than the number of elements in bins_array).]
I would appreciate suggestions.
Thus, both arrays within MOD will have the same dimensions and SUMPRODUCT will not find cells with error values. I can disregard error values using IF and ISERROR within SUMPRODUCT, but I'd rather disregard the non-relevant part of the FREQUENCY resulting array if it is possible.
The question has been heavily reduced and simplified, since it was thought that making it more specific would be more helpful.
With external help, I have been able to fine-tune a way to solve my problem using INDEX in array formula mode. I am posting the answer in case it helps others.
One way: put FREQUENCY(A1:A15;A1:A15), or any formula that produces a multi-cell array, inside INDEX, and give its 2nd and/or 3rd arguments arrays of consecutive values representing the rows/columns to keep.
INDEX(FREQUENCY(A1:A15;A1:A15);ROW(INDIRECT("1:" & ROWS(FREQUENCY(A1:A15;A1:A15))-1));1)
The first argument of INDEX is the array returned by the formula to shrink (from 16x1 to 15x1), which would be a multi-cell array formula if entered explicitly. The second argument is the array 1..15, given by the row numbers from 1 to the total number of rows of that array MINUS 1, i.e. the first 15 (out of 16) values. The third argument is the column of the shrunk array (if need be, more than one column could be selected using an analogue of the second argument).
In the particular case of FREQUENCY, because we only need the values that correspond to the "bins", the formula can be simplified by using the number of rows of the "bins"/"intervals" array (FREQUENCY's second argument) directly. We get
INDEX(FREQUENCY(A1:A15;A1:A15);ROW(INDIRECT("1:" & ROWS(A1:A15)));1)
and the complete formula would become
SUMPRODUCT(MOD(INDEX(FREQUENCY(A1:A15;A1:A15);ROW(INDIRECT("1:" & ROWS(A1:A15)));1);B1:B15))
Now, both dividend and divisor of MOD have exactly the same dimensions (15x1) and because B1:B15 includes integers greater than 0 there are no errors.
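A Python sketch of what the complete formula computes, useful for checking results outside a spreadsheet. The frequency function follows the documented shape of FREQUENCY (one count per bin plus one extra count for values above the largest bin); giving the count to the first of several equal bins is this sketch's assumption about how duplicate bins behave.

```python
def frequency(data, bins):
    """Counts per bin, plus one overflow count: len(bins) + 1 elements.
    Each value goes to the smallest bin that is >= the value; among equal
    bins the first one wins (an assumption of this sketch)."""
    counts = [0] * (len(bins) + 1)
    for v in data:
        best = None
        for k, b in enumerate(bins):
            if v <= b and (best is None or b < bins[best]):
                best = k
        counts[len(bins) if best is None else best] += 1
    return counts


def sum_mod_first_n(data, divisors):
    """SUMPRODUCT(MOD(first n rows of FREQUENCY(data, data), divisors))."""
    freq = frequency(data, data)[:len(divisors)]  # drop the extra element
    return sum(f % d for f, d in zip(freq, divisors))
```

Slicing to len(divisors) is the in-memory equivalent of the INDEX(...;ROW(INDIRECT("1:" & ROWS(A1:A15)));1) wrapper: both arrays handed to MOD end up the same length.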
Thanks all for helping me in making question more concise and better formatted.
ADDITIONAL INFORMATION: As XOR LX correctly pointed out in the comments, this does not seem to work in the widely popular spreadsheet software Excel. It was developed for an INDEX function inside SUMPRODUCT as used in Open Office Calc, which I had mistakenly thought was 100% equivalent to Excel's version. A more complete answer, perhaps using other functions, would be appreciated.
In the previous answer, XOR LX points out, very correctly, that this formula cannot work in Excel because of how the row_num/column_num arguments behave. XOR LX has very kindly shown me how the approach can be made to work, and deserves thanks and credit for the key point: "INDEX can be used to redimension array (even dynamically created ones) if the row_num/column_num array is coerced to take an arbitrary array with the right dimensions, as shown on this blog entry". The following formula has been checked in Excel 2010 and gives the expected results:
SUMPRODUCT(MOD(INDEX(FREQUENCY(A1:A15,A1:A15),N(INDEX(ROW(INDIRECT("1:" & ROWS(A1:A15))),,)),1),B1:B15))
NB: the row_num argument of the first INDEX, an auxiliary array generated with ROW, has been nested inside N(INDEX([...],,)); at least one comma is needed because the nested INDEX requires a minimum of two arguments. The general discussion of INDEX's arguments (and those of other functions) that need to be coerced into accepting arrays is interesting in itself (see here and here at XOR LX's blog). For Open Office users it might be worth stressing the point made on the blog:
"Unlike OFFSET, (...) for which the first parameter must be a reference (...) in the worksheet, INDEX can also accept – and manipulate – for its reference arrays which consist of values generated e.g. via other subfunctions within the formula." (XOR LX's blog)
This applies to changing the dimensions of an array, as in this question, but it is also useful for reversing or shifting the values in an array, for example. Open Office accepts arrays as row_num/column_num directly, so the coercion is not needed there and some formulas rely on this; without the coercion, however, these formulas are unlikely to work when the files are opened in Excel.
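Once row_num accepts an array, shortening, reversing and shifting are all the same operation: pick rows by a list of 1-based positions. A Python sketch of that behaviour for a single-column array (index_rows is a hypothetical name, not a spreadsheet function):

```python
def index_rows(column, row_nums):
    """Pick 1-based rows from a single-column array, like
    INDEX(array, {r1; r2; ...}, 1) once row_num accepts an array."""
    return [column[r - 1] for r in row_nums]


values = [10, 20, 30, 40]
n = len(values)
shortened = index_rows(values, range(1, n))      # rows 1..n-1
reversed_rows = index_rows(values, range(n, 0, -1))  # rows n..1
shifted = index_rows(values, [2, 3, 4, 1])       # rotate up by one row
```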
Regrettably, this type of coercion is not handled correctly by Open Office, and the formulas need to be "decoerced" to work there, at least in my casual tests.
In order to use a formula that would work in both spreadsheet programs regarding shortening arrays, the only thing I have managed is the following (required: arrays must be single-column)
SUMPRODUCT(
(COLUMN(INDIRECT("R1C1:R"& ROWS(vals_to_mod) &"C"& ROWS(FREQUENCY(vals_for_freq,vals_for_freq)),FALSE))
-ROW(INDIRECT("R1C1:R"& ROWS(vals_to_mod) &"C"& ROWS(FREQUENCY(vals_for_freq,vals_for_freq)),FALSE))
=0)
*MOD(TRANSPOSE(FREQUENCY(vals_for_freq,vals_for_freq)),vals_to_mod)
)
(This "shortens" one array to the length of the shorter of the pair by creating an auxiliary array with TRUE/1s on the diagonal starting at the top left and FALSE/0s elsewhere, so all values outside the square section of the array are disregarded. SUMPRODUCT therefore adds up the values on the diagonal of the square section, which are the MOD results for corresponding positions up to the last value of the shorter array.)
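A Python sketch of the diagonal-mask trick: broadcast MOD(TRANSPOSE(longer), shorter) into a grid with one row per element of the shorter array and one column per element of the longer one, keep only the cells where COLUMN - ROW = 0, and add them up. The result equals simply pairing the arrays up to the length of the shorter one; the function name is hypothetical.

```python
def diagonal_mask_sum(longer, shorter):
    """Sum of mask * MOD(longer[c], shorter[r]) over the broadcast grid,
    where mask is 1 only on the diagonal (column - row = 0)."""
    total = 0
    for r, d in enumerate(shorter, start=1):      # ROW(...) values
        for c, f in enumerate(longer, start=1):   # COLUMN(...) values
            mask = 1 if c - r == 0 else 0
            total += mask * (f % d)
    return total


freq = [1, 2, 0, 1, 0]      # e.g. a FREQUENCY result with its extra element
divisors = [2, 3, 2, 2]
```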
I'm trying to write a simulator in VB (an Excel macro) where the input to a simulation is taken from cells in one sheet. The input will be placed in a number of arrays, for example timePerUser(10) and bytesPerUser(10). Then there will be some simple If/For/While logic to make calculations based on the arrays, and finally I will write the results back to Excel. So Excel will only be used to provide input data and to display the results; everything else happens inside the macro, including changing values in the arrays.
I am used to working with Matlab but can't use it for this simulator, so here are my questions:
Are there any existing matrix/array operations I could use within an Excel macro? For example, is there some command to check the smallest or the next smallest value in an array? The Excel function "SMALL" would be perfect, but it doesn't seem to work in macros. Or do I simply need to solve this with for-loops?
Are there any other suggestions on how to create the arrays? Is it better to have one big matrix where each row corresponds to time, data, user etc (an NxM matrix) or is it better to have separate arrays for each parameter?
How to speed up matrix/array operations? Any general suggestions?
Thanks!
Oscar
Excel does have some simple array operations - SUMPRODUCT (Matlab equivalent: A*B'), MIN, MAX, ... Most of these, including SMALL, need to be called using Application.WorksheetFunction.xxx ; for example, the function SMALL can be called in VBA with the following:
nthSmallest = Application.WorksheetFunction.Small(r, n)
where r is an array or range, and n is an integer between 1 and the number of values in r (r.Cells.Count when r is a range).
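For reference, what SMALL returns: the n-th value in ascending order, with duplicates counted separately. A Python sketch of the same semantics (nth_smallest is a hypothetical name), handy for checking a hand-written VBA loop against WorksheetFunction.Small:

```python
def nth_smallest(values, n):
    """n-th smallest value, counting duplicates separately, like SMALL."""
    if not 1 <= n <= len(values):
        raise ValueError("n must be between 1 and the number of values")
    return sorted(values)[n - 1]
```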
I would strongly urge you to use one variable per conceptual variable, rather than trying to be clever with multiple entities in one 2D array. Speedwise I suspect the difference is small; but the opportunity to write unreadable code is significant.
As for speeding things up: it really helps to define your variables (and especially your arrays) as some fixed type (rather than Variant), e.g.:
Dim a(1 To 10) as Double
Dim ii as Integer
and it may help a little bit to have all of the arrays use the same (default) base. Since I'm a Matlab junkie myself, I often work with array indices starting at 1 - you enforce this in VBA by adding
Option Base 1
at the start of your module. At this point, if you declare
Dim a(10) as Double
it is equivalent to
Dim a(1 To 10) as Double
It is also good practice, especially for someone who is not proficient in VBA, to add
Option Explicit
at the start of the module, as it ensures that any variable you did not declare (or, more often, that you misspelled) generates a compile error.
One other thing about arrays: as a Matlab person you are used to Matlab increasing array sizes as needed. In VBA it is possible to change array size dynamically at runtime, using, for example
ReDim Preserve a(1 To 20)
which will make the array a hold 20 elements. If it was shorter, it will be padded with the element type's default value (zeros for a Double array); if it was longer, it will be truncated. This CAN be useful, but it is QUITE expensive, since often the entire array needs to be copied to a new location. When you don't know in advance how large your array will be, it's better to overestimate (and check bounds) than to ReDim every time it needs to get bigger, and then do a final ReDim at the end to get it to the right size.
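To see why growing one element at a time is expensive, here is a rough copy count under the assumption that every resize copies the existing contents. Doubling the capacity when it runs out is one common way to "overestimate" and is swapped in here as an illustration, not something the answer prescribes; the final trim matches the answer's last ReDim. For 1,000 items this gives 499,500 copies versus 2,023.

```python
def copies_grow_by_one(n):
    """Elements copied if every new item triggers a ReDim Preserve by one,
    assuming each resize copies all existing items."""
    return sum(range(n))  # 0 + 1 + ... + (n - 1)


def copies_doubling(n):
    """Elements copied if capacity doubles whenever it is full, plus one
    final resize that trims the array to exactly n items."""
    copies, size, capacity = 0, 0, 1
    for _ in range(n):
        if size == capacity:
            copies += size      # move the existing items to the larger array
            capacity *= 2
        size += 1
    return copies + size        # the final trim copies the n items once more
```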
Hope this is enough to get you started!