vba fastest way compare two lists of numbers - arrays

I am working in an MS-Access 2010 environment. I have two strings of numbers. I need to compare them and find the missing numbers in list 2. These lists can be very large (more than 100.000 numbers), so I need a fast method to compare them.
Let us say the lists are as follows:
L1: "1,2,4,5,6,8,9"
L2: "1,2,6,9"
So I need to find the numbers 4, 5 and 8. How can I do this most efficiently? I can put them in an array and them loop through both arrays, but I am afraid it will be very slow if both lists contain over 100.000 values.
Would a dictionary approach be more efficient? If so, how?

Please, try the next code. It uses a dictionary and the two strings split in arrays. It needs only two iterations per each array, which should not take too much. The arrays content not necessary to be sorted. It returns in a third array, its content being Joined and returned in Immediate Window. Of course, it can be used as an array, depending on your need:
Sub testCompare1DArrays()
Dim arr1, arr2, noMatch, i As Long, k As Long, dict2 As Object
arr1 = Split("1,2,4,5,6,8,9", ",")
arr2 = Split("1,2,6,9", ",")
Set dict2 = CreateObject("scripting.dictionary")
'place the array to be checked in the dictionary (as keys)
For i = 0 To UBound(arr2)
dict2(arr2(i)) = vbNullString
Next i
ReDim noMatch(UBound(arr1)) 'set the array to keep the processed result to surelly have space for all occurrences
For i = 1 To UBound(arr1)
If Not dict2.exists(arr1(i)) Then noMatch(k) = arr1(i): k = k + 1 'buld the returned arry for elements not being found as keys
Next i
If k > 0 Then
ReDim Preserve noMatch(k - 1) 'keep only loaded elements
Debug.Print Join(noMatch, "|") 'return in Immediate Window (Ctrl + G)
Else
Debug.Print "All elements of arr2 exists in arr1..."
End If
End Sub

An alternative method to achieve this desired result is to utilize the collections object.
First, loop through the second list adding each number to a collection and using each number as its own item key.
Second, loop through the first list testing each number against the second list's keys.
If the key does not exist, add the number to a collection of missing keys.
Private Sub compareLists()
Dim list1 As Variant, list2 As Variant
Dim testNumbers As New Collection, _
missingNumbers As New Collection
Dim i As Long
Dim number As String
list1 = Split("1,2,4,5,6,8,9", ",")
list2 = Split("1,2,6,9", ",")
'Prepare numbers from list2
For i = 0 to UBound(list2)
number = list2(i)
testNumbers.Add number, number
Next i
'Identify numbers in list1 missing from list2
For i = 0 to UBound(list1)
number = list1(i)
On Error Resume Next
testNumbers.Item number
If Err.Number = 5 And Err.Number <> 457 Then _
missingNumbers.Add number, number
Next i
End Sub
Perhaps a faster solution exists, but I tend to prefer utilizing collections.

Related

Intersect two arrays in VBA (no excel)

I have two string arrays with quite a few items. The items are unique in each array: I sort the array alphabetically and remove duplicates as I do so.
I want to know how many of the items in array a are also present in array b. As soon as there is even one of such items that occurs in both arrays, I "know enough".
And even though both arrays are sorted alphabetically first, doing a double loop as below is very heavy on the processor and terribly slow.
So I'm looking for an alternative, or even an other approach from the start...
Dim aryA(1 To 10000) As String
Dim aryB(1 To 10000) As String
Dim x As Long
Dim y As Long
Dim Counter as long
Counter = 0
'fill arrays with many values
For x = 1 to 10000
For y = 1 to 10000
If aryA(x) = aryB(y) Then
counter = counter + 1
Exit For
End If
Next y
Next x
If Flag = True Then
'run amazing code here
End If
End Sub```
Thanks!
You should try using Dictionaries instead of arrays.
I did a test with 1'000, 10'000 and 100'000 words and compared all elements (so not breaking when a duplicate was found) on a very regular computer. Using 1'000 elements was ready instantly, 10'000 took not much more that the blink of the eye, 100'000 2-3 seconds.
I don't know how your original data is from or how you will your arrays. The following code just copies the arrays into dictionaries and then loop over the keys of the first dictionary and looks if it is present in the second. No need to have the data sorted, and duplicates are eliminated automatically when the dictionary is build.
If your not familiar with Dictionaries: You need to add a reference to the "Microsoft Scripting Runtime" (or change code to late binding).
Sub arrayTest()
' (omitting my test code to fill up the arrays)
' Create Dictionaries
Dim d1 As New Dictionary
Dim d2 As New Dictionary
For x = 1 To UBound(aryA)
If Not d1.Exists(aryA(x)) Then d1.Add aryA(x), vbNullString
Next x
For y = 1 To UBound(aryB)
If Not d2.Exists(aryB(y)) Then d2.Add aryB(y), vbNullString
Next
' Compare the dictionaries
Counter = 0
Dim k
For Each k In d1.Keys
If d2.Exists(k) Then Counter = Counter + 1
Next
Debug.Print "Done: " & Counter
End Sub
If you just need this check once and don't do anything with the data afterwards, of course it is not necessary to create the second dictionary:
For x = 1 To UBound(aryA)
If Not d1.Exists(aryA(x)) Then d1.Add aryA(x), vbNullString
Next x
For y = 1 To UBound(aryB)
If d1.Exists(aryB(y)) Then Counter = Counter + 1
Next
Debug.Print "Done: " & Counter
If both array are sorted you could just go through them considering their ordering in linear time:
Pseudo code:
i=0;
j=0;
while i < len(aryA) and j < len(aryB) do
if aryA[i] == aryB[j] then
counter++;
i++;
j++;
elseif aryA[i] > aryB[j] then
j++
else
i++;
fi
done

how to populate and array with a loop

I have a strings in column "C", starting at C2 (for example: Cat, Dog, Bird, etc...) and I don't know how many. So I am using a LRow function to find the last row with data. Currently, the last row is C63 but this is expected to be different if I run the subroutine next week or next month (Hence why I said "I don't know how many"). I want to create an array for example RTArr = Array("Cat", "Dog", "Bird", etc...) So... I was thinking something like:
Dim RTArr As Variant
LRow = r.End(xlDown).Offset(x, y).Row
With ActiveSheet
For i = 2 To LRow
str = .Range("C" & i).Value
Next i
End With
Can I populate the array with something like:
Dim RTArr As Variant
LRow = r.End(xlDown).Offset(x, y).Row
With ActiveSheet
For i = 2 To LRow
ArrNum = (i - 1)
str = .Range("C" & i).Value
RTArr(ArrNum) = str
Next i
End With
Or does this not work because of the unknown size of the array? Or do I have to use "amend" in the loop? Would I be better off using a "collection" in this case? Or going about it some other way? Can I simply set a range of cells as an array without needing to loop?
If you declare a dynamic array at first (without the size), you need to ReDim it to the needed size before populating it, which in your case will be the number of rows e.g. ReDim RTArr(numberofitems). Or use a two dimensional array ReDim RTArr(numbercolumns, numberrows).
Remember that standard arrays begin at element 0, but you can define it however you like.
Remember that when inputting ranges into array Excel creates by default a two-dimensional array
More advanced techniques are possible of course, you can do some more research about VBA arrays regarding those:
1) you could ReDim the array after each element added inside of the loop, but this is mostly useful for one dimensional arrays.
2) you could define a much bigger size of array than needed before populating it, populate it, and then shrink the array to the actual size needed.
3) note that when using two (or more) dimensions ReDim Preserve works only on the last dimension.
Pseudo code for the basic populating:
Dim arr() as Variant
'we know we want to populate array with 10 elements
ReDim arr(1 to 10)
For i = 1 to 10
'This part will insert the count from the loop into the count position in array
' eg. first element of array will be a 1, second a 2 etc. until 10
arr(i) = i
Next i
If your version of Excel supports the TEXTJOIN function:
Sub Kolumn2Array()
Dim r As Range
Dim N As Long
Dim RTArray
Dim comma As String
comma = ","
N = Cells(Rows.Count, "C").End(xlUp).Row
Set r = Range("C2:C" & N)
With Application.WorksheetFunction
RTArray = Split(.TextJoin(comma, True, r), comma)
End With
End Sub

Error when Populating values for ComboBox.List

I'm trying to populate a combo box with UNIQUE values only, no duplicates; which I believe is working fine, but something is wrong with my logic in the second For loop
The below logic goings as follows...
Private Function PopulateComboBoxWeeks()
Dim i As Long
Dim x As Long
Dim LR As Long
Dim ws As Worksheet
Dim SearchNextWeek As String
LR = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
Set ws = ActiveSheet
With UserForm1.ComboBox1
''' Fill first slot in ComboBox1 with the value of last row in Column "A"
.AddItem ws.Range("A" & LR).Value
''' Loop to search Column "A" for items to fill with, start on the second last row, since the above line fills the first line
For i = LR - 1 To 2 Step -1
''' Loop to search the ComboBox.List() array
For x = 0 To .ListCount
''' Array list starts at 0
If Not (.List(x) = ws.Range("A" & i).Value) Then
.AddItem ws.Range("A" & i).Value
End If
Next x
Next i
End With
End Function
It's checking the Array list properly, but I'm stuck on the second For loop, if I start at index 0 of my array and it's taking into account the total items in the array with .ListCount. Thus it's giving me the below error...
Run-Time error '381':
Could not get the List property. Invalid property array index
Which could only mean I'm referencing an array item outside of the array size. I've tried doing .ListCount - 1 but this gives me an infinite loop. I think all my logic is sound here except this one item and I'm not sure how to get passed this point.
Iterating any collection as you're changing it is always a bad idea.
Don't loop on anything. Just tell it what range you want to use.
If you can't do that, then you need to first get the unique values into an array (single-dimensional), then assign ComboBox1.List = theArray. Done.
There are two things you want to do:
Figure out what the unique values are
Assign the List property
Don't do these two things in one single nested spaghetti loop. Separate them.
Dim allValues As Variant
'get a single-dimensional array with all the values in the column:
allValues = Application.WorksheetFunction.Transpose(ws.Range("A2:A" & LR).Value)
'let's use built-in collection keys to ensure uniqueness:
Dim uniqueValuesColl As Collection
Set uniqueValuesColl = New Collection
Dim currentIndex As Long
For currentIndex = LBound(allValues) To UBound(allValues)
If Not IsError(allValues(currentIndex)) Then
On Error Resume Next
uniqueValuesColl.Add allValues(currentIndex), Key:=CStr(allValues(currentIndex))
If Err.Number <> 0 Then
' we already have that value
Err.Clear
End If
On Error GoTo 0
End If
Next
'now we know what the unique values are - get them into an array:
ReDim uniqueValues(0 To uniqueValuesColl.Count - 1)
Dim currentItem As Variant
currentIndex = 0
For Each currentItem In uniqueValuesColl
uniqueValues(currentIndex) = currentItem
currentIndex = currentIndex + 1
Next
'just assign the list of unique values
ComboBox1.List = uniqueValues
So I'm iterating all values once, and then the unique values once. But you're currently iterating them once for every single item in the non-unique list. So this solution is O(n+m) where n is the number of non-unique items and m is the number of unique items, whereas your nested loop is O(n2) (the big-O notation of your solution is actually more complicated than that, but I'm no big-O expert).

VBA Insert value to array replacing value instead of inserting

I have a column of data with unique strings where the first 4 characters in the string may be a repeat of the first 4 characters in another string, in a format similar to:
ABCDEF
ABCDXY
ABCDKL
DTYTZF
DTYTSD
I am attempting to loop through this data to identify which 4 starting characters appear more then three times. If the first 4 digits of the string occur 3 times or more, I would like to remove these from the array entirely, and end up with an array that excludes these values. For example, in my column above, as 3 strings or more begin with 'ABCD', I would like to remove all strings that begin with this code, and have only every other value remain, such that my result would be:
DTYTZF
DTYTSD
I am currently looping through the array, pushing any value that occurs three times or more into a NEW array, and plan to then use that list to do a second pass on the original array, and remove any matches. This may not be the most efficient way, but I've not been able to determine a better way that is guaranteed not to mess my data up.
I have worked through looping through the strings to identify which strings occur more then once, but when I try to push them to an array, the string successfully is pushed to the array, but is then replaced with the next value as soon as it is pushed to the array. I know the value is pushed correctly, because if I view the array immediately afterwards, I see the value in the array. When the next value is pushed and you view the array again, only the new value is displayed (The older ones are not).
I believe this is due to my limited understanding of ReDim-ing arrays, and me not fully understanding a code snippet for pushing this value into an array. My (condensed) code is as follows:
Sub pickupValues()
Dim valuesArray()
Dim i As Long
Dim y As Long
Dim sizeCheck As Long
Dim tempArray() As String
valuesArray() = Worksheets("Sheet1").Range("A1:A10").Value
For i = LBound(valuesArray) To UBound(valuesArray)
sizeCheck = 0
For y = LBound(valuesArray) To UBound(valuesArray)
If Left(valuesArray(i, 1), 4) = Left(valuesArray(y, 1), 4) Then
sizeCheck = sizeCheck + 1
i = y
If sizeCheck >= 3 Then
ReDim tempArray(1 To 1) As String 'I'm not sure why I need to do this.
tempArray(UBound(tempArray)) = Left(valuesArray(i, 1), 4) 'I believe this is what pushes the value into the array.
ReDim Preserve tempArray(1 To UBound(tempArray) + 1) As String 'Again unsure on what the purpose of this is.
viewArray (tempArray)
End If
End If
Next y
Next i
End Sub
Function viewArray(myArray)
Dim txt As String
Dim i As Long
For i = LBound(myArray) To UBound(myArray)
txt = txt + myArray(i) + vbCrLf
Next i
MsgBox txt
End Function
What am I doing wrong?
I would like to re-use the same basic code later in the function to push other values OUT of an array based on if they match the string or not, but it seems VBA does not like to move values out of arrays either. Is there an easy solution that would match both scenarios?
I've rewritten what you are trying to do. I'm using the filter function to quickly get your results in the array
Option Explicit
Public Sub pickupValues()
Dim tmp As Variant
Dim results As Variant
Dim i As Long
Dim v
' Make sure this matches your range
With ThisWorkbook.Sheets("Sheet1")
' Important to transpose the input here as Filter will only take a 1D array. Even though it's only 1 column, setting an array this way will generate a 2D array
tmp = Application.Transpose(.Range(.Cells(1, 1), .Cells(.Cells(.Rows.Count, 1).End(xlUp).Row, 1)).Value2)
End With
' ReDiming to the maximum value and slimming down afterwards is much quicker then increasing your array each time you've found a new value
ReDim results(1 To UBound(tmp))
For Each v In tmp
' Less then 2 as first result is '0'. Will return '-1' if can't be found but as test criteria is in the array it will always be at least 0
If UBound(Filter(tmp, Left(v, 4))) < 2 Then
i = i + 1
results(i) = v
End If
Next v
' Redim Preserve down to actual array size
If i > 0 Then
ReDim Preserve results(1 To i)
viewArray (results)
Else
MsgBox "Nothing Found"
End If
End Sub
' Should really be a sub as doesn't return anything back to caller
Public Sub viewArray(myArray)
MsgBox Join(myArray, vbCrLf)
End Sub
Your algorithm is not helping you.
Option 1:
Sort your array. Then you can make a single pass to find sequential values with the same first four characters and count them.
Option 2:
Use a Dictionary object: first four characters as key, number of occurrences as value.

Store range of cells to an Array

I'm working with arrays and I'm sorry but I'm a bit new to it and still confused. I have already this code to store the values in a range in an array and if I run it, it is empty.
attPresent = ws.Range("H4:H" & lastrow)
ReDim attPresent(1 To UBound(attPresent))
For k = LBound(attPresent) To UBound(attPresent)
MsgBox attPresent(k)
Next
Can someone please tell me where I'm wrong? I've read any other posts and gather some ideas, but still not working.
You can go like this
Dim attPresent as Variant
attPresent = ws.Range("H4:H" & lastrow).Value '<-- this will automatically size the array to a 2dimensional array of as many rows as the range ones and one column
For k = LBound(attPresent,1) To UBound(attPresent,1)
MsgBox attPresent(k,1)
Next
Or
Dim attPresent as Variant
attPresent=Application.Transpose(Range("H4:H" & lastrow).Value) '<-- this will automatically size the array to a 1dimensional array of as many elements as the range
For k = LBound(attPresent) To UBound(attPresent)
MsgBox attPresent(k)
Next
Reference MSDN: ReDim Statement (Visual Basic)
When you Redim the array you are erasing all the values. Use ReDim Preserve to resize the last dimension of the Array and still preserve the values of the array.
After the dynamic array has been allocated(the first time Redim is used) you can only resize the last dimension of a array and you cannot change the number of dimensions. In your code you are trying to convert a 2 Dimensional array to a 1 Dimensional array, you can not do that.
You should watch this complete series: Excel VBA Introduction. This is the relevant video: Excel VBA Introduction Part 25 - Arrays
'Try this code example to suit your case
Sub StoreRangeofCellsToAnArray()
'presumes optionbase=0
'presume Range("a1:c3") contain number 1 to 9
Dim MyRange, MyArray(9), n
n = 0
For Each c In Range("a1:c3")
MyArray(n) = c
n = n + 1
Next
'testing: reprint array
For n = 0 To 8
Debug.Print MyArray(n)
Next
End Sub

Resources