Filtering 2D Arrays in Excel VBA - arrays

Using Excel and VBA, I wanted some advice on how to best filter data in an array (in the same sort of way one might use a pivot table) strictly using VBA. I am creating a UserForm that is going to make some data decisions based on currently existing data. I can visualize how to do it well enough but am not that versed in VBA programming.
Here is an example
A B C
bob 12 Small
sam 16 Large
sally 1346 Large
sam 13 Small
sally 65 Medium
bob 1 Medium
To grab the data in an Array, I could use
Dim my_array As Variant
my_array = Range("A1").CurrentRegion
Now, I am familiar with looping through 2D arrays, but I wondered: what the most effective way to filter 2D array data (without looping through the array time and again)?
For example, how do I get would be to say get this kind of data:
data_for_sally As Variant 'rows with sally as name in ColA
data_for_sally_less_than_ten As Variant ' all rows with sally's name in ColA and colB < 10
data_for_all_mediums as Variant ' all rows where ColC is Medium
Suggestions? I could work this out with a bunch of custom functions and loops but I thought there must be a better way. Thanks.

I assume you want to use VBA only.
I think it depends on several parameters, mainly on:
how often you run the same condition => do you store the result of a filter or do you recalculate every time?
how often you need to filter stuff => if often, it is worth having a proper code structure in place, if not then a one off loop is clearly the way to go.
From an OO perspective, assuming performance (speed & memory) is not an issue, I would go for the following design (I won't go into the details of the implementation, only give the general idea). Create a class (let's call it imaginatively ArrayFilter) that you could use like this.
Setup the filter
Dim filter As New ArrayFilter
With filter
.name = "sam"
.category = "Medium"
.maxValue = 10
End With
Or
filter.add(1, "sam") 'column 1
filter.add(3, "Medium") 'column 3
filter.addMax(2, 10) 'column 2
Create the filtered data set
filteredArray = getFilteredArray(originalArray, filter)
The getFilteredArray is fairly straightforward to write: you loop over the array checking if the values match the filter and put the valid lines in a new array:
If filter.isValidLine(originalArray, lineNumber) Then 'append to new array
Pros
Clean design
Reusable, especially with the second version where you use the column number. This can be used to filter any arrays really.
Filtering code is in one function that you can test
Corollary: avoid duplication of code
Cons
Filtering is recalculated every time, even if you use the same filter twice. You can store the results in a Dictionary for example - see below.
Memory: every call to the getFilteredArray creates a new array, but not sure how this can be avoided anyway
This adds quite a few lines of code, so I would do it only if it helps make the code easier to read / maintain.
ps: If you need to cache the results to improve performance, one way would be to store the results in a dictionary and add some logic to the getFilteredArray function. Note that unless your arrays are really big and/or you run the same filter a lot, this is probably not worth it.
filters.add filter, filteredArray 'filters is a dictionary
That way, when you call getFilteredArray next time, you can do something like this:
For each f in filters
'Check if all conditions in f and newFilter are the same
'If they are:
getFilteredArray = filters(f)
Exit Function
Next
'Not found in cache: compute the result

Try this
' credited to ndu
Function Filter2DArray(ByVal sArray, ByVal ColIndex As Long, ByVal FindStr As String, ByVal HasTitle As Boolean)
Dim tmpArr, i As Long, j As Long, Arr, Dic, TmpStr, Tmp, Chk As Boolean, TmpVal As Double
On Error Resume Next
Set Dic = CreateObject("Scripting.Dictionary")
tmpArr = sArray
ColIndex = ColIndex + LBound(tmpArr, 2) - 1
Chk = (InStr("><=", Left(FindStr, 1)) > 0)
For i = LBound(tmpArr, 1) - HasTitle To UBound(tmpArr, 1)
If Chk Then
TmpVal = CDbl(tmpArr(i, ColIndex))
If Evaluate(TmpVal & FindStr) Then Dic.Add i, ""
Else
If UCase(tmpArr(i, ColIndex)) Like UCase(FindStr) Then Dic.Add i, "" 'This finds only exact matches, if you need *FindStr* use: If UCase(tmpArr(i, ColIndex)) Like UCase("*" & FindStr & "*") Then Dic.Add i, ""
End If
Next
If Dic.Count > 0 Then
Tmp = Dic.Keys
ReDim Arr(LBound(tmpArr, 1) To UBound(Tmp) + LBound(tmpArr, 1) - HasTitle, LBound(tmpArr, 2) To UBound(tmpArr, 2))
For i = LBound(tmpArr, 1) - HasTitle To UBound(Tmp) + LBound(tmpArr, 1) - HasTitle
For j = LBound(tmpArr, 2) To UBound(tmpArr, 2)
Arr(i, j) = tmpArr(Tmp(i - LBound(tmpArr, 1) + HasTitle), j)
Next
Next
If HasTitle Then
For j = LBound(tmpArr, 2) To UBound(tmpArr, 2)
Arr(LBound(tmpArr, 1), j) = tmpArr(LBound(tmpArr, 1), j)
Next
End If
End If
Filter2DArray = Arr
End Function

Related

Arrays in Excel VBA. At some point it puts NA instead of the value

I am trying to make a simple simulation in Excel VBA in which we roll two dices. What is the probability of getting "1"+"2" or "1"+"3"?
This is my code:
Sub ProbabilityMeyerArray()
Dim i As Long
Dim ArrayDices(1 To 100000, 1 To 2) As Variant
Dim ArrayResult(1 To 100000) As Variant
'Simulation
For i = 1 To 100000
ArrayDices(i, 1) = WorksheetFunction.RandBetween(1, 6)
ArrayDices(i, 2) = WorksheetFunction.RandBetween(1, 6)
If (ArrayDices(i, 1) = 1 And ArrayDices(i, 2) = 3) _
Or (ArrayDices(i, 1) = 1 And ArrayDices(i, 2) = 2) _
Or (ArrayDices(i, 1) = 3 And ArrayDices(i, 2) = 1) _
Or (ArrayDices(i, 1) = 2 And ArrayDices(i, 2) = 1) Then
ArrayResult(i) = 1
Else
ArrayResult(i) = 0
End If
Next i
'print the values to cells
Range("A1:B100000").Value = ArrayDices
Range("C1:C100000").Value = WorksheetFunction.Transpose(ArrayResult)
'Calculate the probability
Probability = Application.WorksheetFunction.Sum(ArrayResult) / 100000
MsgBox "The Probability is " & Probability
End Sub
The problem is that when I print the values from arrays to the cells, then in column C I have 0 and 1 (as it should be), but then from row 34465 I get NA. Here is a screenshot:
https://ibb.co/7jsjjJC
So, for some reason it starts putting NA instead of 0 and 1. The calculation does not work properly either, because the probability is too low, and I guess this is because it only counts the first 34464 zeros and ones, while dividing with 100 000. Can you help me understand what is wrong here? It seems to be a problem with (my understanding of) arrays, since I can run a similar simulation without arrays (by simply using cells), and it works.
Thanks in advance!
As #RaymondWu said in the comments, the problem is that the Transpose function has a limit to the length of the array it can manipulate. This limit is somewhere between 65k and 66k columns.
Indeed, your code will run as expected for 65k iterations.
You can easily avoid using transpose and to be honest I don't see the reason to use it in the first place.
Instead of declaring your array as Dim ArrayResult(1 To 100000) As Variant which by default makes it 1 row x 100000 columns, you can declare it as so:
Dim ArrayResult(1 To 100000, 1 To 1) As Variant
This will make it 100000 rows x 1 columns, which can now be written in a column in excel easily like so:
Range("C1:C100000").Value = ArrayResult
Of course you will also need to change the code accordingly where needed:
ArrayResult(i,1) = 1
Else
ArrayResult(i,1) = 0
A few other tips:
Always use Option Explicit at the very top of the code. It makes the declaration of variables mandatory and it helps to avoid mistakes
Always use explicit references. When referencing Ranges the best practice is to define the worksheet to which they belong e.g. ThisWorkbook.Worksheets("Sheet1").Range("C1:C100000").Value = ArrayResult

How to Pass an Array to and from a Function?

(Fair Warning, I am self taught on VBA so I apologize in advance for any cringe-worthy coding or notations.)
I have an estimating worksheet in excel. The worksheet will have a section for the user to input variables (which will be an array). The first input variable will "reset" the remaining input variables to a standard value when the first variable is changed. The standard values for the input variables are stored in a function in a module. I am attempting to fill the input variable array with the standard values from the function and then display those values on the sheet. I was easily able to do this without arrays but have had no luck in moving everything into arrays.
This is for excel 2010. I previously did not use arrays and created a new variable when needed, however the estimating sheet has grown much larger and it would be better to use arrays at this point. I have googled this question quite a bit, played around with removing and adding parenthesis, changing the type to Variant, trying to set the input variable array to be a variable that is an array (if that makes sense?), and briefly looked into ParamArray but that does not seem applicable here.
Dim BearingDim(1 To 9, 1 To 4, 1 To 8) As Range
Dim arrBearingGeneral(1 To 5, 1 To 8) As Range
Dim Test As Variant
Private Sub Worksheet_Change(ByVal Target As Range)
'Set General Variable array to cells on the worksheet
For i = 1 To 5
For j = 1 To 8
Set arrBearingGeneral(i, j) = Cells(9 + i, 3 + j)
Next j
Next i
'Set Bearing Input Variables to Cells on the Worksheet
For p = 1 To 4
For i = 1 To 9
Select Case p
Case Is = 1
Set BearingDim(i, p, 1) = Cells(16 + i, 4)
Case Is = 2
Set BearingDim(i, p, 1) = Cells(27 + i, 4)
Case Is = 3
Set BearingDim(i, p, 1) = Cells(37 + i, 4)
Case Is = 4
Set BearingDim(i, p, 1) = Cells(49 + i, 4)
End Select
Next i
Next p
'Autopopulate standard input variables based on Bearing Type
inputMD_StdRocker BearingType:=arrBearingGeneral(1, 1), _
arrBearingDim:=BearingDim
End Sub
Sub inputMD_StdRocker(ByVal BearingType As String, ByRef _
arrBearingDim() As Variant)
Dim arrBearingDim(1 To 9, 1 To 4)
Select Case BearingType
Case Is = "MF50-I"
For j = 1 To 2
arrBearingDim(2, j) = 20
arrBearingDim(3, j) = 9
arrBearingDim(4, j) = 1.75
Next j
arrBearingDim(5, 1) = 15
'There are numerous more select case, but those were removed to keep it
'short
End Select
End Sub
The expected output is my "BearingDim" Array will have certain array index values set to a standard value from the "inputMD_StdRocker" function. Then those values will be displayed in the cell that corresponds to the array index.
Currently, I get a compile error "Type Mismatch, Array or User-Defined Type Expected". I have been able to get around the type mismatch by removing the () from "arrBearingDim()" in the function title for "inputMD_StdRocker" however, it will not pass the values back to my "BearingDim" array.
Any help would be greatly appreciated.
This is a partial answer to what (I think) is a misunderstanding you have of how to use arrays. There are a few problems in your code.
First, you're defining a two-dimensional and a three-dimensional array of Ranges when I believe you really only want to store the values captured from the worksheet. (If I'm wrong, then you are never initializing the array of Ranges, so none of the ranges in the array actually point to anything.)
Secondly, it looks as if your initial array arrBearingGeneral is always filled from the same (static) area of the worksheet. If this is so (and you really do want the values from the cells, not an array of Range objects), then you can create a memory-based array (read this website, especially section 19). So the first part of your code can be reduced to
'--- create and populate a memory-based array
Dim bearingDataArea As Range
Dim arrBearingGeneral(1 To 5, 1 To 8) As Variant
Set bearingDataArea = ThisWorkbook.Sheets("Sheet1").Range("D10:K14")
arrBearingGeneral = bearingDataArea.Value
Optionally of course you can calculate the range of your data instead of hard-coding it ("D10:K14"), but this example follows your own example.
While this isn't a complete answer, hopefully it clears up an issue to get you farther down the road.

Excel transpose 2D array - Out of memory issue

I have a VBA macro, where I want to write out an array to an Excel sheet.
I'm getting an "Out of memory" runtime error on some machines. I can run it easily on my development PC, but my client has issues with it.
Here I define my Values array:
Dim Values()
Dim idx As Long
idx = 0
Then I have a for cycle where I dynamically redim the array, and add my values to it:
for cycle...
ReDim Preserve Values(16, 0 To idx)
Values(0, idx) = "some text"
Values(1, idx) = "some other text"
....
Values(15, idx) = "last values for this row"
idx = idx + 1
next
Then here is where my code fails:
With ws
.Range(.Cells(1, 1), .Cells(1+ idx - 1, 16)).value = TransP(Values)
End With
Here's the TransP transposing function:
Public Function TransP(var As Variant) As Variant
Dim outP() As Variant, i As Long, j As Long
ReDim outP(LBound(var, 2) To UBound(var, 2), LBound(var, 1) To UBound(var, 1))
For i = LBound(outP) To UBound(outP)
For j = LBound(var) To UBound(var)
outP(i, j) = var(j, i)
Next
Next
TransP = outP
End Function
As I said, I can run the macro, and get something like 108770 rows. The same 108770 rows don't work on my clients PC.
I expect that the TransP function gives up on his PC, so should I split up the array into multiple smaller chunks, and write them 1 by 1?
Or my data model is not good?
You could also create a loop to write your output array row by row, it will take more time but you will most likely don't get out of memory error.
In the past when I've got out of memory issues with an arrays I just tried to perform actions using regular excel commands, in this case you could just copy range and than paste transposed values:
.PasteSpecial Paste:=xlPasteValues, Transpose:=True
Your method of appending elements to Values is inefficient because every time an element is added a new array is created and the values of the existing one copied to it. During this time twice the memory is in use, and if a large array is copied in this way multiple times in quick succession only heaven can know what demands are put on the RAM management.
The better way is to dimension the array larger than will be required (once), count the number of elements written to it and use Redim Preserve to reduce its size (once) when you are done.
I suspect this might be due to you not specifying the dimension in your TransP function. You are also defining your loops with both arrays. You've sized them identically (although with the dimensions switched) - Just use the same one to define your For loop
Public Function TransP(var As Variant) As Variant
Dim outP() As Variant, i As Long, j As Long
ReDim outP(LBound(var, 2) To UBound(var, 2), LBound(var, 1) To UBound(var, 1))
For i = LBound(outP, 1) To UBound(outP, 1)
For j = LBound(outP, 2) To UBound(outP, 2)
outP(i, j) = var(j, i)
Next
Next
TransP = outP
End Function
I don't think it's the TransP function, since that is already handling everything in a loop. I experienced the same sort of issue and there this error occured when I tried to transfer a large multidimensional array to a range.
My solution was to create a loop and do about a 1000 rows each time, but that depends on the clients pc I guess with how many rows you can do.
To take into account that var will not end on a certain step size, you could do a while loop:
i = 0
with ws
Do while i < Ubound(var)
max = application.worksheetfunction.max(1000,Ubound(var) - i) 'At some point, you could have less than 1000 left
.range(.cells(i+1,1), .cells(i+max)) = TransP(var,i,max)
i = i + 1000
Loop
In TransP you now use the lower- and upperbound of var, but if you add two variables from the loop, you can use them to only take a piece of the array.

Array slicing based on filter

I am trying to filter an array of 20000 rows and 70 columns by a particular column. I want to copy the whole row to another array if the value in column 14 is "Hard". I came up with a very simple implementation and tried using it but Excel went Not Responding everytime crashed eventually. Then I added DoEvents in the loop so that excel does not crash. My code looks as follows for now. It would help a lot if the community could help me optimize it. It is taking way too much time right now.
Is there some other function I can use to slice the array, instead of Index, that would be working faster?
Dim arr_all() As Variant
Dim arr_Hard(1 To 20000) As Variant
Dim arr_Soft(1 To 20000) As Variant
Dim arr_Travel() As Variant
arr_all = wsCopyTo.Range("A2:BR20000").Value
m = 1
n = 1
For i = LBound(arr_all) To UBound(arr_all)
DoEvents
Select Case arr_all(i, 14)
Case "Hard"
arr_Hard(m) = Application.Index(arr_all, i)
m = m + 1
Case "Soft"
arr_Soft(n) = Application.Index(arr_all, i)
n = n + 1
End Select
Next i
The Solution
Sort the items in the original Worksheet by the Hard/Soft column (desc or asc no matter). If you can't sort for some reason then copy the original worksheet to a new one.
Find the row where Hard/Soft items end - simply count the number of Hard or Soft with COUNTIF, whichever comes first, and get the required ranges of cells FROM and TO for both Hard and Soft items
Use the wsCopyTo.Range("A[FROM]:BR[TO]").Value to simply copy the part of Hard and Soft to array arr_Hard and array arr_Hard

Vb.Net Sort 3x One Dimensional Arrays

I have 3 one dimensional arrays.
Each contains information that corresponds to the other 2 arrays.
e.g Array 1 contains a customer first name
Array 2 contains a customer last name
Array 3 contains the customer phone number.
This is not my actual example but is easiest to explain.
How do I sort all three arrays so that they are sorted in order by say customer last name.
If Mr Smith is sorted and has moved from position 10 to position 5 in the lastname array, I would expect his phone number and first name to also be in position 5 in the respective arrays.
I am dealing with arrays with 10,000's of items so I would like to avoid looping (my current method) as this is incredibly slow.
Hoping to use the array.sort methods.
Can someone help me?
Ok - So I have tried to use a new data Type but am still at a loss how I can instantly filter using this. Below is my sample code which has a couple of issues. If someone can resolve - it would love to learn how you did it.
The purpose of the code is to return an array containing grouped issues.
For simplicity I have assumed in the example that each constant found is an issue.
If an issue is found, combine it with other issues found on that same worksheet.
e.g The number 2 is found in both cells A1 and A2 on sheet 1. The array should return A1:A2.
If the issues are found in A1 on sheet 1 and A2 in sheet 2, two seperate array entries would be returned.
Test File and Code Here
Public Type Issues
ws_Sheet As Integer
rng_Range As String
s_Formula As String
s_Combined As String
d_ItemCount As Double
End Type
Sub IssuesFound()
Dim MyIssues() As Issues
Dim i_SheetCount As Integer
Dim s_Formula As String
Dim rng_Range As Range
Dim d_IssueCounter As Double
Dim s_SearchFor As String
Dim a_TempArray() As Issues
Dim d_InsertCounter As Double
d_IssueCounter = -1
' Loop All Sheets Using A Counter Rather Than For Each
For i_SheetCount = 1 To ActiveWorkbook.Sheets.Count
' Loop all Constants On Worksheet
For Each rng_Range In Sheets(i_SheetCount).Cells.SpecialCells(xlCellTypeConstants, 23)
If d_IssueCounter = -1 Then
' First Time and Issue Is Found, Start Recording In An Array
d_IssueCounter = d_IssueCounter + 1
ReDim MyIssues(0)
MyIssues(0).ws_Sheet = i_SheetCount
MyIssues(0).rng_Range = rng_Range.AddressLocal
MyIssues(0).s_Formula = rng_Range.Value
MyIssues(0).s_Combined = i_SheetCount & "#" & rng_Range.Value
MyIssues(0).d_ItemCount = 0
Else
' Going To Look For Issues Found On The Same Sheet with The Same Constant Value
s_SearchFor = i_SheetCount & "#" & rng_Range.Value
' HELP HERE: Need To Ideally Return Whether The Above Search Term Exists In The Array
' Without looping, and I want to return the position in the array if the item is found
a_TempArray = MyIssues 'Filter(MyIssues.s_Combined, s_SearchFor, True, vbTextCompare)
If IsVarArrayEmpty(a_TempArray) = True Then
' New Issue Found - Increase Counter By + 1
d_IssueCounter = d_IssueCounter + 1
' Increase The Array By 1
ReDim Preserve MyIssues(d_IssueCounter)
' Record The Information About The Constant Found. Sheet Number, Constant, Range, and also a combined string for searching and the array position
MyIssues(0).ws_Sheet = i_SheetCount
MyIssues(0).rng_Range = rng_Range.AddressLocal
MyIssues(0).s_Formula = rng_Range.Value
MyIssues(0).s_Combined = i_SheetCount & "#" & rng_Range.Value
MyIssues(0).d_ItemCount = 0
Else
' Get The Array Position Where Other Issues With The Same Worksheet and Constant are Stored
d_InsertCounter = a_TempArray.d_ItemCount
' Add The New Found Constant To The Range Already Containing The Same Constants on This Worksheet
MyIssues(d_InsertCounter).rng_Range = Union(rng_Range, Range(MyIssues(d_InsertCounter).rng_Range)).AddressLocal
End If
End If
Next
Next
End Sub
Function IsVarArrayEmpty(ByRef anArray As Issues)
Dim i As Integer
On Error Resume Next
i = UBound(anArray, 1)
If Err.Number = 0 Then
IsVarArrayEmpty = False
Else
IsVarArrayEmpty = True
End If
End Function
Sample Test File and Code Here
As suggested, you should not be using concurrent arrays at all. You should be defining a type with three properties and then creating a single array or collection of that type.
To answer your question though, there is no way to sort three arrays in concert but there is a way to sort two. What that means is that you can create a copy of the array that you want to use as keys and then use the copy to sort one of the other arrays and the original to sort the other. Check out the documentation for the Array.Copy overload that takes two arrays as arguments.
That said, copying the array and then sorting twice is a big overhead so you may not gain much, if anything, from this method. Better to just do it the right way in the first place, i.e. use a single array of a complex type rather than concurrent arrays of simple types. It's not 1960 any more, after all.

Resources