A question on variants. I'm aware that Variants in Excel VBA are both the default data type and also inefficient (at least when overused in large applications). However, I regularly use them for storing data in arrays that hold multiple data types. A current project I am working on is essentially a task that requires massive optimisation of very poor code (c. 7000 lines), and it got me thinking: is there a way around this?
To explain: the code frequently stores data in array variables. So consider a dataset of 10 columns by 10,000 rows. The columns are of several different data types (string, double, integer, date, etc.). Assuming I want to store these in an array, I would usually write:
Dim myDataSet(10, 10000) As Variant
But my understanding is that this will be really inefficient, with the code evaluating each item to determine its data type (when in practice I know what I'm expecting). Plus, I lose the control that dimensioning individual data types gives me. So (assuming, for ease of explanation, that the first 6 columns are strings and the next 4 are doubles), I could write:
Dim myDSstrings(6, 10000) As String
Dim myDSdoubles(4, 10000) As Double
This gives me back the control and efficiency, but it is also a bit clunky (in practice the types are mixed and varied, I end up with an odd number of elements in each array, and I have to assign them individually in the code rather than en masse). So it becomes a case of:
myDSstrings(1, r) = Cells(r, 1)
myDSdoubles(2, r) = Cells(r, 2)
myDSstrings(2, r) = Cells(r, 3)
myDSstrings(3, r) = Cells(r, 4)
myDSdoubles(3, r) = Cells(r, 5)
..etc...
Which is a lot uglier than:
myDataSet(c, r) = Cells(r, c)
So it got me thinking: I must be missing something here. What is the optimal way to store an array of different data types? Or, assuming there is no way of doing it, what would be best coding practice for storing an array of mixed data types?
Never optimize your code without measuring first. You might be surprised where the code is slowest. I use the PerfMon utility from Professional Excel Development, but you can also roll your own.
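If you do roll your own, a bare-bones timer is usually enough to get rough numbers. A minimal sketch (the procedure name and the loop body are placeholders, not part of the question's code):

Sub RoughBenchmark()
    ' Minimal timing harness: Timer returns seconds since midnight.
    Dim t As Double
    Dim i As Long

    t = Timer
    For i = 1 To 100000
        ' ... put the code you want to measure here ...
    Next i
    Debug.Print "Elapsed: " & Format(Timer - t, "0.000") & " seconds"
End Sub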
Reading from and writing to Excel ranges is a big time sink. Even though Variants can waste a lot of memory, this
Dim vaRange As Variant
vaRange = Sheet1.Range("A1:E10000").Value
'do something to the array
Sheet1.Range("A1:E10000").Value = vaRange
is generally faster than looping through rows and cells.
My preferred method for using arrays with multiple data types is to not use arrays at all. Rather, I'll use a custom class module and create properties for the elements. That's not necessarily a performance boost, but it makes the code much easier to write and read.
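As a rough sketch of that idea (the class name CDataRow and its members are invented for illustration, not taken from the question):

'--- Class module "CDataRow" (hypothetical) ---
' Public fields behave like simple properties; swap in Property Get/Let
' procedures if you need validation.
Public Name As String
Public Amount As Double
Public RecordDate As Date

'--- Standard module: load the sheet into typed objects ---
Sub LoadRows()
    Dim data As Variant
    Dim records As Collection
    Dim rec As CDataRow
    Dim r As Long

    data = Sheet1.Range("A1:C10000").Value   ' one bulk read from the sheet
    Set records = New Collection

    For r = 1 To UBound(data, 1)
        Set rec = New CDataRow
        rec.Name = CStr(data(r, 1))
        rec.Amount = CDbl(data(r, 2))
        rec.RecordDate = CDate(data(r, 3))
        records.Add rec
    Next r

    Debug.Print records.Count & " rows loaded"
End Sub

Each element then carries its own declared type, and the calling code reads like rec.Amount rather than myDSdoubles(2, r).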
I'm not sure your bottleneck comes from the Variant typing of your array.
By the way, to set values from an array to an Excel range, you should use (in Excel 8, i.e. Excel 97, or higher):
Range("A1:B2") = myArray
On previous versions, you should use the following code:
Sub SuperBlastArrayToSheet(TheArray As Variant, TheRange As Range)
    With TheRange.Parent.Parent 'the workbook the range is in
        .Names.Add Name:="wstempdata", RefersToR1C1:=TheArray
        With TheRange
            .FormulaArray = "=wstempdata"
            .Copy
            .PasteSpecial Paste:=xlValues
        End With
        .Names("wstempdata").Delete
    End With
End Sub
This comes from this source, which you should read for VBA optimization.
Still, you should profile your app to see where your bottlenecks are. See this question from Issun for help benchmarking your code.
I have a function that calculates an average payout. The loop calculates down from x days to y days. As part of this function I have to interpolate a number using a specific range. However, this is very slow.
I read that one way to speed up the code would be to read the range into values rather than work on a Range, since VBA is slowed down by going back to Excel each time the code runs.
Is this true?
My current code:
Function AveragePayout(Time As Double, period)
    Dim i As Integer
    Dim sum As Double
    Dim interpolate_surface As Range
    Set interpolate_surface = Range("A1", "D4")
    If Time < period Then
        AveragePayout = 0
    Else
        For i = 1 To period
            interpolated_val = Interpolation(interpolate_surface, 5, Time)
            sum = sum + CustomPricer(interpolated_val)
            Time = Time - 1
        Next i
        AveragePayout = sum / period
    End If
End Function
I was thinking of changing line 5 to the below, so the Interpolation runs on a VBA matrix/array rather than going back to the Excel document each loop (which apparently slows the function tremendously):
Set interpolate_surface = Range("A1", "D4").Value2
Alternatively, are there any other methods to speed up the running of this loop?
Many thanks.
While R.Leruth is very close, there are a few things that need to be elaborated on.
First, the reason a Range object is slower is that you are working on the Object representation of that value, and there are events bound to that Range. As a result, calculations will run, the sheet will need to be evaluated, and accessing the value has to go through the Object rather than through an in-memory representation of it.
This performance decrease generally stands true for any Range operations, and the performance decrease is directly tied to the size of the range. Thus, operating on 100 cells is much quicker than operating on 1,000,000 cells.
While the performance of an array is also tied to its size, accessing each value is much quicker. This is because the values are in memory and easy to access; there are no Objects to depend on with an array. This doesn't mean arrays will always be fast. I have encountered array operations taking many minutes or hours because I took their initial speed for granted. You will still notice a performance decrease with arrays, but the rate of that decrease is much, much lower.
To create such an array, we use the Variant type. Keep in mind a Variant can be anything, so you have to be somewhat careful. The general convention is to use Dim Foo As Variant(), but then any argument that accepts or returns a Variant() must be given a Variant() and not a Variant (a minor difference with a huge impact on the code). Because of this, I tend to use Dim Foo As Variant.
We can then assign the values from a range to the array. While Foo = Range("A1:B2") is functionally equivalent to Foo = Range("A1:B2").Value, I strongly recommend full qualification; I avoid relying on implicit properties wherever I can (.Value is the default property of Range).
So our code should be:
Dim Foo As Variant
Foo = SomeRange.Value
Where Foo is your variable, and SomeRange is replaced with your range.
As long as your Interpolate function accepts an array, this should cause no issues whatsoever. If the Interpolate function doesn't accept an array, you may need to find another workaround (or write your own).
To output the array, we just need to create a range of the same size as our array. There are different ways of doing this. I tend to prefer this method:
SomeRange.Resize(UBound(SomeArray, 1) - LBound(SomeArray, 1) + 1, UBound(SomeArray, 2) - LBound(SomeArray, 2) + 1).Value = SomeArray
All this does is take some range (which should be a single cell) and resize it to the number of rows in the array (the first dimension) and the number of columns (the second dimension), then assign the array to it. I use (UBound - LBound) + 1 since, for a 0-based array, this returns UBound + 1, and for a 1-based array it returns UBound. It makes things much simpler than writing If blocks for the same purpose.
The last thing to make sure of in all of this is that your Range variable is fully qualified. Notice that Range("A1:B2").Value is functionally equivalent to ActiveSheet.Range("A1:B2").Value, but again, relying on implicit calls quickly introduces bugs. Squash those out as much as possible. If you need the ActiveSheet, then use it. Otherwise, create a Worksheet variable and point that variable to the correct sheet.
And if you must use the ActiveSheet, then Dim Foo As Worksheet : Set Foo = ActiveSheet is much better than using ActiveSheet directly throughout (since the active sheet will generally change exactly when you need it not to, and then everything will break).
Best of luck in using arrays. They are performance-changing, but they are never an excuse for bad coding practices. Make sure you use them properly, and that you aren't introducing new inefficiencies just because you now can.
What we usually do in VBA to speed up macros is to decrease the amount of interaction between the code and the sheet.
For example:
Get all necessary values in an array
Dim arr() as Variant
arr = Range("A1:D4")
Process the values
...
Put them back
Range("A1:D4") = arr
In your case, just try to change interpolate_surface from a Range to an array.
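Applied to the function in the question, that change might look roughly like this; it assumes your Interpolation function is happy to receive a Variant array instead of a Range (if it isn't, it will need a matching adjustment):

Function AveragePayout(Time As Double, period)
    Dim i As Integer
    Dim sum As Double
    Dim interpolate_surface As Variant
    Dim interpolated_val As Double

    ' One read from the sheet; the loop then works entirely in memory.
    ' Note: no Set when assigning values to a Variant array.
    interpolate_surface = Range("A1", "D4").Value2

    If Time < period Then
        AveragePayout = 0
    Else
        For i = 1 To period
            interpolated_val = Interpolation(interpolate_surface, 5, Time)
            sum = sum + CustomPricer(interpolated_val)
            Time = Time - 1
        Next i
        AveragePayout = sum / period
    End If
End Function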
I have this code, which works fine, but is slow on large datasets.
I'd like to hear from the experts if this code could benefit from using Linq, or another method, and if so, how?
Dim array_of_strings As String()
' now I add strings to my array, these come from external file(s).
' This does not take long
' Throughout the execution of my program, I need to validate millions
' of other strings.
Dim search_string As String
Dim indx As Integer
' So we get million of situation like this, where I need to find out
' where in the array I can find a duplicate of this exact string
search_string = "the_string_search_for"
indx = array_of_strings.ToList().IndexOf(search_string)
Each of the strings in my array are unique, no duplicates.
This works pretty well but, as I said, it is too slow for larger datasets. I am running this query millions of times. Currently it takes about 1 minute for a million queries, which is too slow for my liking.
There's no need to use Linq.
If you used an indexed data structure such as a Dictionary, each lookup would be close to O(1) (it is hash-based) instead of the O(n) scan that IndexOf performs, at the cost of a slightly longer process of filling the structure. But you do that once, then do a million searches, so you come out well ahead.
See the description of Dictionary at this site:
https://msdn.microsoft.com/en-us/library/7y3x785f(v=vs.110).aspx
Since (I think) you're talking about a collection that is its own key, you could save some memory by using SortedSet&lt;T&gt; (where lookups are O(log n))
https://msdn.microsoft.com/en-us/library/dd412070(v=vs.110).aspx
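For illustration, a rough sketch of the dictionary idea in VB.NET, mapping each string to its position in the original array (variable names other than those in the question are made up):

' At the top of the file:
' Imports System.Collections.Generic

' Build the lookup once: string -> its index in the original array.
Dim index_lookup As New Dictionary(Of String, Integer)()
For i As Integer = 0 To array_of_strings.Length - 1
    index_lookup.Add(array_of_strings(i), i)   ' strings are unique, so Add won't throw
Next

' Each of the millions of lookups is then a near-constant-time hash lookup.
Dim indx As Integer
If Not index_lookup.TryGetValue(search_string, indx) Then
    indx = -1   ' mimic IndexOf's "not found" result
End If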
No, I don't think it can benefit from LINQ.
LINQ queries are slow, relatively speaking.
You might try to multithread it, however.
So I am working on a problem where I am dealing with very large amounts of data, and I have come across a limitation I do not fully understand. I need to store sets of 6 integer values and associate each with an index. The approach I chose was to initially create my own type and then create a List(Of Type). That failed with an "Array dimensions exceeded supported range" error. Fine; I presumed that this was due to the Type I defined and perhaps the way the List/Collection stores the data. I was expecting to be able to use the full Integer.MaxValue number of indices in an array, as given in http://msdn.microsoft.com/en-us/library/wak0wfyt.aspx#BKMK_ArraySize, but that seems not to apply (why?). I then proceeded to rewrite the functions and ended up with an array of type Tuple(Of Integer, Integer, Integer, Integer, Integer, Integer). But again, I ran into the same situation. The same goes for arrays of a type that has an array as its member. I tried several ways to see what the maximum size of the array could be and ended up with a maximum of around 48E6 indices. The problem is that I need more than 10x that to store the data I have...
The only way I found to make this (sort of) work is to use a List(Of List(Of Integer())) and then add a new item to the top-level list after every 40M indices or so. A nasty and inefficient solution, but it showed that it could be made to work...
Background: VS2010, .NET 4.0, Win7 x64, 32GB Ram.
Any ideas of how I would best store 6 integer values in either a collection or array (I need to be able to access them by index) for more than about 500 million combinations (ideally up to the 2.1B combinations)?
Thanks
The solution is actually quite simple (thanks, coffee). Reading through the documentation in the link above, this should not be a problem, but... the maximum size of an array is no longer Int.MaxValue elements once the element type isn't an integer (or so it would seem, though none of the documentation states this directly). The way around this is simply to go from something like this:
Dim _Array(Array_Size) As Tuple(Of Integer, Integer, Integer, Integer, Integer, Integer)
to
Dim _Array1(Array_Size) As Integer
Dim _Array2(Array_Size) As Integer
Dim _Array3(Array_Size) As Integer
Dim _Array4(Array_Size) As Integer
Dim _Array5(Array_Size) As Integer
Dim _Array6(Array_Size) As Integer
This allows each array the maximum size (or at least the size I need, which is close enough). The only downside is that I then need to expand the rest of the code accordingly (a sketch of one way to keep that manageable follows below).
I am a bit surprised about this, considering that MSDN states that 'The length of every dimension of an array is limited to the maximum value of the Integer data type', when it looks like it should actually read that the total length is limited. Most likely the real cap is the 2 GB maximum size of any single object in .NET 4.0, which limits the element count to roughly 2 GB divided by the element size; that would explain why I receive the error (with the original declaration) at a size that accounts for the six integer values plus some overhead.
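If the parallel arrays make the rest of the code awkward to follow, one option (just a sketch; the class and member names are invented, and it is untested against the real data) is to hide them behind a small wrapper so call sites still look like they work with a single indexed collection:

' Wraps six parallel Integer arrays behind index-based access.
Public Class SixIntStore
    Private a1() As Integer, a2() As Integer, a3() As Integer
    Private a4() As Integer, a5() As Integer, a6() As Integer

    Public Sub New(ByVal size As Integer)
        ' Each array stays under the per-object size limit on its own.
        a1 = New Integer(size) {} : a2 = New Integer(size) {} : a3 = New Integer(size) {}
        a4 = New Integer(size) {} : a5 = New Integer(size) {} : a6 = New Integer(size) {}
    End Sub

    Public Sub SetValues(ByVal index As Integer, ByVal v1 As Integer, ByVal v2 As Integer, _
                         ByVal v3 As Integer, ByVal v4 As Integer, ByVal v5 As Integer, _
                         ByVal v6 As Integer)
        a1(index) = v1 : a2(index) = v2 : a3(index) = v3
        a4(index) = v4 : a5(index) = v5 : a6(index) = v6
    End Sub

    Public Function GetValues(ByVal index As Integer) As Integer()
        Return New Integer() {a1(index), a2(index), a3(index), a4(index), a5(index), a6(index)}
    End Function
End Class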
I've been using VBA for about a month now, and this forum has been a great resource for my first "programming" language. As I've started to get more comfortable with VBA arrays, I've begun to wonder what the best way to store variables is, and I'm sure someone here knows the answer to what's probably a programming newb question:
Is there any difference, say, between having 10 String variables used independently of each other and an array of String variables used independently of each other (by independent I mean their position in the array doesn't matter for their use in the program)? There are bits of code where I might have around 9 public variables. Is there any advantage to declaring them as an array, despite the fact that I don't need to preserve their order vis-à-vis one another? E.g. I could have
Public x As String
Public y As String
Public v As String
Public w As String
Or
Public arr(1 to 4) As String
arr(1) = x
arr(2) = y
arr(3) = v
arr(4) = w
In terms of what I need to do with the code, these two versions are functionally equivalent. But is there a reason to use one rather than the other?
Connected to this, I can transpose an array into an Excel range, and use xlUp and xlDown to move around the various values in the array. But I can also move through arrays in similar ways by looking for elements with a particular value or position in an array held "abstractly".* Sometimes I find it easier to manipulate array values once they have been transposed into a worksheet, using xlUp and xlDown. Apart from having to dedicate worksheet space to this, is it worse (time, processing power, reliability, etc.) than looping through an "abstract"* array (with Application.ScreenUpdating = False)?
*This may mean something technical to mathematicians/ serious programmers - I'm trying to say an array that doesn't use the visual display of the worksheet grid.
EDIT:
Thank you for your interesting answers. I'm not sure if the second part of my question counts as a second question entirely (and I'm therefore breaking a rule of the forum) or if it is connected, but I would be very happy to tick the answer that also considers it.
Unless you need to refer to them sequentially or by index number dynamically, do not use an array as a grouping of scratch variables. It is harder to read.
Memory-wise they should be nearly identical, with slightly more overhead for the array.
As others have noted, there's no need to use arrays for variables which are not related or part of a "set" of values. If however you find yourself doing this:
Dim email1 as String, email2 as String, email3 as String, _
email4 as String, email5 as String
then you should consider whether an array would be a better approach.
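For instance, a quick sketch of that refactor (the sheet name and layout here are invented):

Dim emails(1 To 5) As String
Dim i As Long

' Related values read in a loop instead of five separate variables
For i = 1 To 5
    emails(i) = Worksheets("Contacts").Cells(i, 1).Value
Next i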
To the second part of your question: if you're using arrays in your VBA, it is preferable to work with them directly in memory rather than dumping them to a worksheet and navigating them from there.
Keeping everything in-memory is going to be faster, and removes dependencies such as having to ensure there's a "scratch" worksheet around: such dependencies make your code less re-usable and more brittle.
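As a small illustration of the in-memory style (the sheet, range, and target value are placeholders), finding a value becomes a plain loop over the array rather than xlUp/xlDown navigation on a scratch sheet:

Dim values As Variant
Dim r As Long, foundRow As Long

values = Worksheets("Data").Range("A1:A1000").Value   ' one bulk read
foundRow = 0
For r = 1 To UBound(values, 1)
    If values(r, 1) = "target" Then
        foundRow = r
        Exit For
    End If
Next r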
I'm trying to write a simulator in VB (an Excel macro) where the input to a simulation is taken from cells in one sheet. The input will be placed in a number of arrays, for example timePerUser(10) and bytesPerUser(10). Then there will be some simple if/for/while logic to make calculations based on the arrays, and finally I will write the results back to Excel. So, Excel will only be used to provide input data and to display the results; everything else happens inside the macro, including changing values in the arrays.
I am used to working with Matlab but can't use it for this simulator, so here are my questions:
Are there any existing matrix/array operations I could use within an Excel macro? For example, is there some command to check the smallest or the next smallest value in an array? The Excel function "SMALL" would be perfect, but it doesn't seem to work in macros. Or do I simply need to solve this with for-loops?
Are there any other suggestions on how to create the arrays? Is it better to have one big matrix where each row corresponds to time, data, user etc (an NxM matrix) or is it better to have separate arrays for each parameter?
How to speed up matrix/array operations? Any general suggestions?
Thanks!
Oscar
Excel does have some simple array operations - SUMPRODUCT (Matlab equivalent: A*B'), MIN, MAX, ... Most of these, including SMALL, need to be called via Application.WorksheetFunction.xxx; for example, SMALL can be called in VBA as follows:
nthSmallest = Application.WorksheetFunction.Small(r, n)
where r is an array or range, and n is an integer between 1 and r.Cells.Count
I would strongly urge you to use one variable per conceptual variable, rather than trying to be clever with multiple entities in one 2D array. Speedwise I suspect the difference is small; but the opportunity to write unreadable code is significant.
As for speeding things up: it really helps to define your variables (and especially your arrays) as some fixed type (rather than Variant), e.g.:
Dim a(1 To 10) As Double
Dim ii As Integer
and it may help a little bit to have all of the arrays use the same (default) base. Since I'm a Matlab junkie myself, I often work with array indices starting at 1 - you enforce this in VBA by adding
Option Base 1
at the start of your module. At this point, if you declare
Dim a(10) As Double
it is equivalent to
Dim a(1 To 10) As Double
It is also good practice for someone who is not proficient in VBA to add
Option Explicit
at the start of the module, as it ensures that any variables you did not declare (or, more often, that you misspelled) will generate a compile-time error.
One other thing about arrays: as a Matlab person you are used to Matlab increasing array sizes as needed. In VBA it is possible to change array size dynamically at runtime, using, for example
ReDim Preserve a(1 To 20)
which makes the array 20 elements in size. If it was shorter, the new elements are initialized to zero; if it was longer, it is trimmed. This CAN be useful, but it is QUITE expensive, since often the entire array needs to be copied to a new location. When you don't know in advance how large your array will be, it's better to overestimate (and check bounds) than to ReDim every time it needs to get bigger, and then do a final ReDim at the end to trim it to the right size (see the sketch below).
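A sketch of that pattern (the starting chunk size is arbitrary, and MoreDataAvailable / NextValue are placeholders for however you produce your data):

Dim a() As Double
Dim count As Long, capacity As Long

capacity = 1000
ReDim a(1 To capacity)

Do While MoreDataAvailable()              ' placeholder loop condition
    count = count + 1
    If count > capacity Then
        capacity = capacity * 2           ' grow in big steps, not one element at a time
        ReDim Preserve a(1 To capacity)
    End If
    a(count) = NextValue()                ' placeholder for your data source
Loop

If count > 0 Then ReDim Preserve a(1 To count)   ' final trim to the exact size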
Hope this is enough to get you started!