maximum size of array in vb.net - arrays

So I am working on a problem where I am dealing with very large amounts of data, and I have come across a limitation I do not fully understand. I need to store sets of 6 integer values and associate each with an index. The approach I chose was to initially create my own type and then create a List(Of Type). That failed with an 'Array dimensions exceeded supported range' error. Fine, I presumed that this was due to the Type I defined and perhaps the way the List/Collection was storing the data. I was expecting to make use of the full Integer.MaxValue number of indices in an array, as given in http://msdn.microsoft.com/en-us/library/wak0wfyt.aspx#BKMK_ArraySize but that seems to not apply (why?). I then proceeded to rewrite the functions and ended up with an array of type Tuple(int,int,int,int,int,int). But again, I ran into the same situation. The same goes for arrays of a type that has an array as its member variable. I tried out several ways to see what the maximum size of the array could be and ended up with a maximum size of around 48E6 indices. The problem is that I need more than 10x that to store the data I have...
The only way I found to make this (sort of) work is to use a List(of List(of Integer())) and then add a new item to the top level list after every 40M indices or so. Nasty solution and not efficient, but it showed that it could be made to work...
Background: VS2010, .NET 4.0, Win7 x64, 32GB Ram.
Any ideas of how I would best store 6 integer values in either a collection or array (I need to be able to access them by index) for more than about 500 million combinations (ideally up to the 2.1B combinations)?
Thanks

The solution is actually quite simple (thanks coffee). Reading through the documentation in the link above, this should not be the problem, but... the maximum size of the array is no longer Integer.MaxValue once the element type isn't an integer (or so it would seem, though none of the documentation indicates this). The way around this is simply to go from something like this:
Dim _Array(Array_Size) as Tuple(of Integer,Integer,Integer,Integer,Integer,Integer)
to
Dim _Array1(Array_Size) as Integer
Dim _Array2(Array_Size) as Integer
Dim _Array3(Array_Size) as Integer
Dim _Array4(Array_Size) as Integer
Dim _Array5(Array_Size) as Integer
This allows each array the maximum size (or at least the size I need which is close enough to the max size). The only thing is that I then need to expand the rest of the code accordingly.
I am a bit surprised about this, considering that the MSDN states that 'The length of every dimension of an array is limited to the maximum value of the Integer data type' when it looks like it should actually read that 'The Total length [...] is limited to the maximum value'. That would explain why the original declaration fails at a size that accounts for the 6 integer values per element plus some bookkeeping overhead.
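As a rough check of that reasoning, here is a small Python sketch that estimates how many elements fit in one array if a single object is capped at about 2 GB, which is the default per-object limit in .NET 4.0. The cap constant and the helper name are assumptions made for illustration, and per-object header overhead is ignored, so the real limits are somewhat lower.

```python
# Assumed cap on the size of a single array object, in bytes (about 2 GB).
ASSUMED_OBJECT_LIMIT = 2 ** 31

def max_elements(element_size_bytes):
    """Upper bound on the element count for a given element size (hypothetical helper)."""
    return ASSUMED_OBJECT_LIMIT // element_size_bytes

# One 4-byte Integer per element: roughly 537 million elements.
ints_only = max_elements(4)

# Six 4-byte Integers stored inline per element (e.g. a Structure): roughly 89 million.
six_ints = max_elements(6 * 4)

print(ints_only, six_ints)
```

Under these assumptions an Integer array can hold the 500 million entries needed, while an array whose elements carry six integers each cannot, which matches the behaviour described above.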

Related

CGAL::Surface_mesh - Accessing face/vertex using an integer index?

From the user manual of the CGAL Surface_mesh class:
the data structure uses integer indices as descriptors for vertices,
halfedges, edges and faces
I am interested in accessing a certain face/edge/vertex based on its integer index, but cannot find how this is done.
Iterators obviously work, but I don't want to iterate a known number of times just to get to the relevant face_index/vertex_index if I can access it based on an integer face/vertex index known a priori.
Can someone please explain how (if actually possible) to use the integer indexing if I want to access the i-th face "directly" (without iterating)?
The following should work:
unsigned int i = 33;
Surface_mesh::Face_index fi(i);

Redim variable number of dimensions in VBA

I have a case in an Excel macro (VBA) where I'd like to dimension an array where the number of dimensions and the bounds of each dimension are determined at runtime. I'm letting the user specify a series of combinatorial options by creating a column for each option type and filling in the possibilities below. The number of columns and the number of options is determined at run time by inspecting the sheet.
Some code needs to run through each combination (one selection from each column) and I'd like to store the results in a multidimensional array.
The number of dimensions will probably be between 2 and 6, so I can always fall back to a bunch of if else blocks if I have to, but it feels like there should be a better way.
I was thinking it would be possible to do if I could construct the Redim statement at runtime as a string and execute the string, but this doesn't seem possible.
Is there any way to dynamically Redim with a varying number of dimensions?
I'm pretty sure there is no way of doing this in a single ReDim statement. Select Case may be marginally neater than "a bunch of If...Else blocks", but you're still writing out a lot of separate ReDims.
Working with arrays in VBA where you don't know in advance how many dimensions they will have is a bit of a PITA - as well as ReDim not being very flexible, there is also no neat way of testing an array to see how many dimensions it has (you have to loop through attempts to access higher dimensions and trap errors, or hack around in the underlying memory structure - see this question). So you will need to keep track of the number of dimensions, and write long Case statements every time you need to access the array as well, since the syntax will be different.
I would suggest creating the array with the largest number of dimensions you think you'll need, then setting the number of elements in any unused dimensions to 1 - that way you always have the same syntax every time you access the array, and if you need to you can check for this using UBound(). This is the approach taken by the Excel developers themselves for the Range.Value property - which always returns a 2-dimensional array even for a 1-dimensional Range.
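The padding idea can be sketched in a language-neutral way. The Python below is only an illustration of the approach, not VBA: it pads the user's dimension sizes with 1s up to an assumed maximum rank, so every access uses the same full-length index. MAX_RANK, pad_shape and flat_index are hypothetical names.

```python
from itertools import product
from math import prod

MAX_RANK = 6  # assumed upper bound on the number of dimensions

def pad_shape(sizes, max_rank=MAX_RANK):
    """Pad the dimension sizes with 1s so the shape always has max_rank entries."""
    if len(sizes) > max_rank:
        raise ValueError("too many dimensions")
    return tuple(sizes) + (1,) * (max_rank - len(sizes))

def flat_index(shape, index):
    """Row-major offset of a full-rank index into a flat results list."""
    offset = 0
    for size, i in zip(shape, index):
        if not 0 <= i < size:
            raise IndexError(index)
        offset = offset * size + i
    return offset

# Two option columns with 3 and 2 choices; the unused dimensions get size 1.
shape = pad_shape([3, 2])
results = [None] * prod(shape)

# Visit every combination with the same six-component index, whatever the real rank is.
for combo in product(*(range(n) for n in shape)):
    results[flat_index(shape, combo)] = combo
```

The unused trailing indices are always 0 here, which plays the same role as the size-1 dimensions in the ReDim approach.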
As I understand it, your users can specify the dimensions and their sizes by filling in the Excel sheet. This means you have to find the last row and the last column that contain a value.
Therefore, have a look at: Excel VBA- Finding the last column with data
Use ReDim to change the array's size. If you want to keep the existing entries, use ReDim Preserve.
"Some code needs to run through each combination (one selection from
each column) and I'd like to store the results in a multidimensional
array."
To begin with, I would read the desired Range object into a Variant.
Dim vArray as Variant
'--as per your defined Sheet, range
'this creates a two dimensional array
vArray = ActiveWorkbook.Sheets("Sheet1").Range("A1:Z300").Value2
Then you could iterate through this array to find the sizes and data you need, and save the results to an array with the dimensions you need.
Little Background:
Redim: Reallocates storage space for an array variable.
You are not allowed to ReDim an array that was declared with fixed bounds in a Dim statement (e.g. Dim vArray(1 To 6) As Variant). Declare it without bounds (Dim vArray() As Variant) and size it with ReDim instead.
UPDATE: to show explicitly what's allowed and what's not under Redim.
Each time you use ReDim, it resets the array to the dimensions you specify and discards its contents.
ReDim Preserve keeps your data, but it only allows you to change the size of the last dimension of a multidimensional array; the other dimensions must remain as they were.

In which format variant array store values internally

I have an amount column which is formatted as Number.
I declare a 2-dimensional array of type Variant. In the first column I store the currency (e.g. GBP, USD) and in the other I store the amount (e.g. 1234.22 or -1567.69):
myArray(1,0) = "GBP"
myArray(1,1) = -1234.12
myArray(2,0) = "GBP"
myArray(2,1) = 1234.12
I am summing myArray(1,1) and myArray(2,1). While summing, the values seem to be treated as General/Text instead of Number (which is my column format), and the sum is non-zero, whereas ideally the sum should be 0.
Please suggest, how do I handle this scenario?
To understand that, you will need to understand exactly what a VARIANT is in VBA and exactly what an ARRAY is.
Arrays:
Starting with arrays, a VBA array is actually not really an array of memory locations but a data structure called SAFEARRAY which includes details as shown in the listing below (source):
typedef struct tagSAFEARRAY {
    USHORT         cDims;
    USHORT         fFeatures;
    ULONG          cbElements;
    ULONG          cLocks;
    PVOID          pvData;
    SAFEARRAYBOUND rgsabound[1];
} SAFEARRAY, *LPSAFEARRAY;
So, you can see that this structure has a pointer to where the data actually is, along with the number of dimensions, the size of each element, the bounds of each dimension, and so on. Because of that, VBA is able to ensure that by using its arrays you will not accidentally mess up some not-to-be-disturbed memory location.
Variants:
With that out of the way, you need to understand what exactly a VARIANT is. VARIANT is also not really a primitive data type but a data structure which makes it able to handle multiple data types easily.
Details of the structure can be found by a simple search but the details are simple:
Total data structure size: 16 bytes
2 bytes: Information about the data type
6 bytes: Reserved bytes (set to 0)
8 bytes: Contain the actual data
Hence when you do a VarType the first two bytes are obtained and that is how the interpreter knows what data type is being used. See here for more details.
So you can understand now what a SAFEARRAY of VARIANT data is.
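To make the 16-byte layout above concrete, here is a minimal Python sketch that packs and unpacks a record with the same shape: a 2-byte type tag, 6 reserved bytes, and 8 data bytes. It only mimics the layout described here and does not call any COM API. The value 5 is the VT_R8 (Double) type code; the variable names are illustrative.

```python
import struct

VT_R8 = 5  # type code for an 8-byte floating point value (Double)

# '<' = little-endian without padding, 'H' = 2-byte tag, '6x' = 6 zero bytes, 'd' = 8-byte double.
VARIANT_LAYOUT = "<H6xd"

raw = struct.pack(VARIANT_LAYOUT, VT_R8, 1234.12)

# Reading the first two bytes is enough to learn the stored type,
# which is conceptually what VarType does.
type_tag = struct.unpack_from("<H", raw, 0)[0]
tag, value = struct.unpack(VARIANT_LAYOUT, raw)

print(len(raw), type_tag, value)
```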
Finally, the problem in the question:
That has nothing to do with the Variant and everything to do with floating point math. Floating point numbers are not stored exactly as you think they are.
E.g. 2.323 will not be stored as 2.323 but rather as something like 2.322999999999999999999
This rounding error will eventually cause trouble (leading to the entire study of stable and unstable methods, etc.) unless you are very careful about the way you handle this quantization of sorts.
Some algorithms will be such that the errors cancel out and in some they add-up.
So, if you are looking for exact calculations, you need to use a fixed-point data type that is better suited to your problem domain (e.g. Currency might help in some precise financial calculations).
The Solution:
The Currency data type is a 64-bit data type and internally it's like a very long integer scaled by 10,000. So up to 4 decimal places and 15 digits before the decimal point can be accurately represented.
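The difference between binary floating point and a scaled integer can be shown with a short Python sketch. It is an illustration of the idea behind Currency, not of VBA itself: amounts are converted to whole counts of 1/10,000 and summed as integers. SCALE and to_scaled are names chosen for this example.

```python
from decimal import Decimal

SCALE = 10_000  # four decimal places, the same scaling Currency uses

def to_scaled(text):
    """Convert a decimal string such as '-1234.12' to an integer count of 1/10000 units."""
    return int(Decimal(text) * SCALE)

amounts = ["0.1", "0.2", "-0.3"]

# Binary floating point: the rounding errors do not cancel, so the total is not zero.
float_total = sum(float(a) for a in amounts)

# Scaled integers: every amount is exact, so the total is exactly zero.
scaled_total = sum(to_scaled(a) for a in amounts)

print(float_total, scaled_total)
```

The float total is a tiny non-zero number, while the scaled total is exactly 0, which is the behaviour wanted for amounts that should net out.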

Handling of arrays in VBA macros for Excel

I'm trying to write a simulator in VB (Excel macro) where the input to a simulation is taken from cells in one sheet. The input will be placed in a number of arrays, for example timePerUser(10) and bytesPerUser(10). Then there will be some simple if/for/while stuff to make calculations based on the arrays and finally I will write the results back to Excel. SO, Excel will only be used to provide input data and to display the results, everything else is happening inside of the macro, including changing values in the arrays.
I am used to working with Matlab but can't use it for this simulator, so here are my questions:
Are there any existing matrix/array operations I could use within an Excel macro? For example, is there some command to check the smallest or the next smallest value in an array? The Excel function "SMALL" would be perfect, but it doesn't seem to work in macros. Or do I simply need to solve this with for-loops?
Are there any other suggestions on how to create the arrays? Is it better to have one big matrix where each row corresponds to time, data, user etc (an NxM matrix) or is it better to have separate arrays for each parameter?
How to speed up matrix/array operations? Any general suggestions?
Thanks!
Oscar
Excel does have some simple array operations - SUMPRODUCT (Matlab equivalent: A*B'), MIN, MAX, ... Most of these, including SMALL, need to be called using Application.WorksheetFunction.xxx ; for example, the function SMALL can be called in VBA with the following:
nthSmallest = Application.WorksheetFunction.Small(r, n)
where r is an array or range, and n is an integer between 1 and r.Cells.Count
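For comparison, the same "n-th smallest" lookup can be written without Excel at all. The Python sketch below is only an illustration of what SMALL computes, with n counted from 1 and duplicates counted separately; nth_smallest is a hypothetical helper name.

```python
import heapq

def nth_smallest(values, n):
    """Return the n-th smallest value (n starts at 1), counting duplicates separately."""
    if not 1 <= n <= len(values):
        raise ValueError("n must be between 1 and the number of values")
    # nsmallest returns the n smallest items in ascending order; the last one is the answer.
    return heapq.nsmallest(n, values)[-1]

time_per_user = [5.0, 1.5, 4.0, 1.5, 2.0]
smallest = nth_smallest(time_per_user, 1)
third = nth_smallest(time_per_user, 3)
```

A plain loop in VBA would do the same job; the point is only that the operation is a partial sort followed by picking one element.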
I would strongly urge you to use one variable per conceptual variable, rather than trying to be clever with multiple entities in one 2D array. Speedwise I suspect the difference is small; but the opportunity to write unreadable code is significant.
As for speeding things up: it really helps to define your variables (and especially your arrays) as some fixed type (rather than Variant), e.g.:
Dim a(1 To 10) as Double
Dim ii as Integer
and it may help a little bit to have all of the arrays use the same (default) base. Since I'm a Matlab junkie myself, I often work with array indices starting at 1 - you enforce this in VBA by adding
Option Base 1
at the start of your module. At this point, if you declare
Dim a(10) as Double
it is equivalent to
Dim a(1 To 10) as Double
It is also good practice, especially for someone who is not yet proficient in VBA, to add
Option Explicit
at the start of the module, as it will ensure that any variables you did not declare (or more often, that you misspelled) will generate a compile error.
One other thing about arrays: as a Matlab person you are used to Matlab increasing array sizes as needed. In VBA it is possible to change array size dynamically at runtime, using, for example
ReDim Preserve a(1 To 20)
which will resize the array a to 20 elements. If it was shorter, it will be padded with zeros; if it was longer, it will be trimmed. This CAN be useful, but is QUITE expensive, since often the entire array needs to be copied to a new location. When you don't know in advance how large your array will be, it's better to overestimate (and check bounds) than to ReDim every time it needs to get bigger, and then do a final ReDim Preserve at the end to get it to the right size.
Hope this is enough to get you started!
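The "overestimate, then trim once" pattern looks roughly like this in Python. This is a sketch of the idea only: Python lists already grow efficiently on their own, so the explicit buffer here merely mirrors what you would do with ReDim Preserve in VBA. The function name and the initial capacity are assumptions.

```python
def collect(values, initial_capacity=16):
    """Fill a preallocated buffer, doubling it when full, and trim it once at the end."""
    buffer = [0.0] * initial_capacity
    count = 0
    for value in values:
        if count == len(buffer):
            # Grow rarely (doubling) instead of resizing on every new element.
            buffer.extend([0.0] * len(buffer))
        buffer[count] = value
        count += 1
    # Single final resize down to the number of elements actually used.
    del buffer[count:]
    return buffer

samples = collect(x * 0.5 for x in range(40))
```

With doubling, 40 elements need only two resizes (16 to 32 to 64) plus the final trim, instead of one resize per element.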

Excel VBA: Variants in Array Variables

A question on variants. I'm aware that variants in Excel VBA are both the default data type and also inefficient (from the viewpoint of overuse in large apps). However, I regularly use them for storing data in arrays that have multiple data types. A current project I am working on is essentially a task that requires massive optimisation of very poor code (c. 7000 lines), and it got me thinking: is there a way around this?
To explain; the code frequently stores data in array variables. So consider a dataset of 10 columns by 10000. The columns are multiple different data types (string, double, integers, dates,etc). Assuming I want to store these in an array, I would usually;
dim myDataSet(10,10000) as variant
But, as far as I know, this will be really inefficient, with the code evaluating each item to determine what data type it is (when in practice I know what I'm expecting). Plus, I lose the control that dimensioning individual data types gives me. So, (assuming the first 6 are strings, the next 4 doubles for ease of explaining the point), I could;
dim myDSstrings(6,10000) as string
dim myDSdoubles(4,10000) as double
This gives me back the control and efficiency, but is also a bit clunky (in practice the types are mixed and different, and I end up having an odd number of elements in each one, and end up having to assign them individually in the code rather than en masse). So, it's a case of;
myDSstrings(1,r) = cells(r,1)
myDSdoubles(2,r) = cells(r,2)
myDSstrings(2,r) = cells(r,3)
myDSstrings(3,r) = cells(r,4)
myDSdoubles(3,r) = cells(r,5)
..etc...
Which is a lot more ugly than;
myDataSet(c,r) = cells(r,c)
So- it got me thinking- I must be missing something here. What is the optimal way for storing an array of different data types? Or, assuming there is no way of doing it- what would be best coding-practise for storing an array of mixed data-types?
Never optimize your code without measuring first. You might be surprised where the code is the slowest. I use the PerfMon utility from Professional Excel Development, but you can roll your own also.
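Rolling your own measurement can be very simple. As an illustration of the idea (in Python rather than VBA, with a hypothetical helper name), wrap the call you suspect in a timer and compare the numbers before changing anything:

```python
import time

def timed(func, *args):
    """Run func(*args) and return its result together with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

total, seconds = timed(sum, range(1000))
print(f"sum took {seconds:.6f} s")
```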
Reading and writing to and from Excel Ranges is a big time sink. Even though Variants can waste a lot of memory, this
Dim vaRange as Variant
vaRange = Sheet1.Range("A1:E10000").Value
'do something to the array
Sheet1.Range("A1:E10000").Value = vaRange
is generally faster than looping through rows and cells.
My preferred method for using arrays with multiple data types is to not use arrays at all. Rather, I'll use a custom class module and create properties for the elements. That's not necessarily a performance boost, but it makes the code much easier to write and read.
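The class-with-properties idea translates to most languages. Here is a minimal Python sketch of it, assuming three made-up columns (name, amount, booking date); in VBA the equivalent would be a class module with one typed property per column. Each row becomes one typed record instead of a slice of a Variant array.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Record:
    """One row of the dataset, with a fixed type per field (field names are hypothetical)."""
    name: str
    amount: float
    booked: date

# Raw rows as they might come from a sheet, all as text.
raw_rows = [
    ("Widget", "12.5", "2020-01-31"),
    ("Gadget", "7.25", "2020-02-29"),
]

# Convert each row once, at the boundary, so the rest of the code works with known types.
records = [Record(n, float(a), date.fromisoformat(d)) for n, a, d in raw_rows]

total_amount = sum(r.amount for r in records)
```

Code that uses the records then reads as r.amount or r.booked rather than myDataSet(2, r), which is where the readability gain comes from.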
I'm not sure your bottleneck comes from the Variant typing of your array.
By the way, to set values from an array to an Excel range, you should use (in Excel 8 or higher):
Range("A1:B2") = myArray
On previous versions, you should use the following code:
Sub SuperBlastArrayToSheet(TheArray As Variant, TheRange As Range)
    With TheRange.Parent.Parent 'the workbook the range is in
        .Names.Add Name:="wstempdata", RefersToR1C1:=TheArray
        With TheRange
            .FormulaArray = "=wstempdata"
            .Copy
            .PasteSpecial Paste:=xlValues
        End With
        .Names("wstempdata").Delete
    End With
End Sub
from this source that you should read for VBA optimization.
Yet, you should profile your app to see where your bottlenecks are. See this question from Issun to help you benchmark your code.
