VBA array trouble error 9 script out of range - arrays

Thanks for reading my question,
I was given a list of about 250k entries along with names and sign in dates to accompany each entry to show when they logged. My task is to find out which users signed in on consecutive days, how often and how many times.
i.e. Bob smith had 3 consecutive days one time, 5 consecutive days 3 times.
joe smith had 8 consecutive days once, 5 consecutive days 8 times
etc
I am brand new to VBA and have been struggling to write a program to do this.
code:
Option Explicit
Option Base 1
Sub CountUUIDLoop()
Dim UUID As String
Dim Day As Date
Dim Instance() As Variant
ReDim Instance(50, 50)
Dim CountUUID As Variant
Dim q As Integer
Dim i As Long
Dim j As Long
Dim f As Integer
Dim g As Integer
Dim LastRow As String
f = 1
q = 1
g = 2
LastRow = Cells.Find("*", [A1], , , xlByRows, xlPrevious).Row
For i = q To LastRow
UUID = Cells(i, "A")
Instance(f, 1) = UUID
g = 2
For j = 1 To LastRow
If UUID = Cells(j, "A") Then
Instance(f, g) = Cells(j, "B")
g = g + 1
End If
Next j
f = f + 1
q = g - 1
Next i
End Sub
The goal of this code is to go through the entries and store them in the array 'Instance' such that the 2D array would look like [UUID1, B1, B2, B3]
[UUID2, B1, B2, B3, B4]
[UUID3, B1, B2]
Where the UUID is the user, the B1 represents the date that user signed in, b2 would be the next date they signed in etc. Some users have more or less dates than others.
My main issue has come with setting up the array as I keep getting different errors around it. I'm not sure how to define this 2D array partly because there will be over 30 000 rows, each with 1->85 columns.
Any help is appreciated, let me know if anything needs further clarification. Once again this is my first time using VBA so im sorry ahead of time if everything i've been doing is wrong.
P.S. I used ReDim Instance (50,50) as a test to see if i could make it work by predefining but same errors occurred. Thanks again!

As far as I understand from your question and code, you have a table with following structure:
..............A.................B
1........LOGIN1.......DATE1
2........LOGIN1.......DATE2
3........LOGIN1.......DATE3
4........LOGIN2.......DATE4
5........LOGIN2.......DATE5
6........LOGIN3.......DATE6
And your task in this code was to fetch data in a 2D structure like this:
RESULT_ARRAY-
............................|-LOGIN1-
............................................|-DATE1
............................................|-DATE2
............................................|-DATE3
............................|-LOGIN2-
............................................|-DATE4
............................................|-DATE5
............................|-LOGIN3-
............................................|-DATE6
First of all, you need to know what goes wrong in your code. Please see comments in code below to find out the reason of error:
Option Explicit
Option Base 1
Sub CountUUIDLoop()
Dim UUID As String
Dim Day As Date
Dim Instance() As Variant ' If you are using variant data type, it is not necesary to point it: default data type in VBA is Variant. Just write like this: "Dim Instance()"
ReDim Instance(50, 50) ' Limitation in 50 may be the reason, why your script is going into "out of range" error.
' Remember, that this operation means, that your array now will have following dimentions: [1..50,1..50]
Dim CountUUID As Variant 'Just write like this: "Dim CountUUID"
Dim q As Integer ' you can describe all your variables in one line, like this: "Dim q as Integer,f as Integer,g as Integer"
Dim i As Long
Dim j As Long
Dim f As Integer
Dim g As Integer
Dim LastRow As String ' first mistake: you are using String data type to perform numeric operations below in your FOR-cycle
f = 1 ' Your Instance array index starts from {0} and you are not using this index by starting from {1}.
q = 1 ' The reason to use this variable is not obvious. You could just use constant in FOR cycle below and avoid unnecessary variables.
g = 2 ' You could remove this line, because this var is set every time in cycle below (before second FOR)
LastRow = Cells.Find("*", [A1], , , xlByRows, xlPrevious).Row ' The alternative here is to use predefined Excel constants, like this:
' "Cells.SpecialCells(xlLastCell).Row".
'If LastRow is bigger, than {50} - this could be a reason of your Error.
For i = q To LastRow ' Here goes comparison between String and Integer data type, not good thing, but type conversion should work fine here.
UUID = Cells(i, "A") ' No need to perform re-set here, just move forward and assign value from this cell to the Instanse directly:
' Like this: Instance(f, 1) = Cells(i, "A")
Instance(f, 1) = UUID
g = 2
For j = 1 To LastRow ' It is another point, why "q" variable is not necessary. :)
If UUID = Cells(j, "A") Then ' You could use your Instansce value instead of UUID there, like this: "Instance(f, 1)"
Instance(f, g) = Cells(j, "B") 'If "g" variable will somehow become bigger, than {49}, this could become a reason of your Error.
g = g + 1
End If
Next j
f = f + 1
q = g - 1 ' "q" variable is not used after this row, so it is a strange unnecessary action
Next i
End Sub
Now, when we have some information about error, let me do some improvements on your code. I am certain, that to make most simply code, you can use your Excel worksheets to store and count data with VBA as background automations. But if you need the code with arrays, let's do this! :)
Option Explicit ' It is an option that turns on check for every used variable to be defined before execution. If this option is not defined, your code below will find undefined variables and define them when they are used. Good practice is to use this option, because it helps you, for example to prevent missprinting errors in variable names.
Option Base 1 ' This option sets the default index value for arrays in your code. If this option is not set, the default index value will be {0}.
Const HEADER_ROW = 1 ' It is a number to identify your header row, next row after this one will be counted as a row with data
Const UUID = 1 ' ID of element in our "Instance" array to store UUID
Const DATES_ID = 2 ' ID of element in our "Instance" array to store dates
Function CountUUIDLoop()
ActiveSheet.Copy After:=ActiveSheet 'Copy your worksheet to new one to ensure that source data will not be affected.
Dim Instance(), dates() ' "Instance" will be used to store all the data, "dates" will be used to store and operate with dates
ReDim Instance(2, 1) ' Set first limitation to the "Instance" array in style [[uuid, dates],id]
ReDim dates(1) ' Set first limitation to the "dates" array
Instance(DATES_ID, 1) = dates
Dim CountUUID
Dim i as Long, j as Long, f as Long, active_element_id As Long 'Integer is quite enough to perform our array manipulations, but Long datatype is recomended (please refer to UPDATE2 below)
i = HEADER_ROW + 1 ' Set first row to fetch data from the table
active_element_id = 1 ' Set first active element number
With ActiveSheet ' Ensure that we are working on active worksheet.
While .Cells(i, 1) <> "" 'If operated cell is not empty - continue search for data
If i > HEADER_ROW + 1 Then
active_element_id = active_element_id + 1 ' increment active element number
ReDim Preserve Instance(2, active_element_id) ' Assign new limitation (+ 1) for our Instances, don't forget to preserve our results.
ReDim dates(1) ' Set first limitation to the "dates" array
Instance(DATES_ID, active_element_id) = dates
End If
Instance(UUID, active_element_id) = .Cells(i, 1) ' save UUID
dates(1) = .Cells(i, 2) ' save first date
j = i + 1 ' Set row to search next date from as next row from current one.
While .Cells(j, 1) <> "" 'If operated cell is not empty - continue search for data
If .Cells(j, 1) = .Cells(i, 1) Then
ReDim Preserve dates(UBound(dates) + 1) ' Expand "dates" array, if new date is found.
dates(UBound(dates)) = .Cells(j, 2) ' Save new date value.
.Cells(j, 1).EntireRow.Delete 'Remove row with found date to exclude double checking in future
Else
j = j + 1 ' If uuid is not found, try next row
End If
Wend
Instance(DATES_ID, active_element_id) = dates
i = i + 1 'After all the dates are found, go to the next uuid
Wend
.Cells(i, 1) = "UUID COUNT" ' This will write you a "UUID COUNT" text in A column below all the rest of UUIDs on active worksheet
.Cells(i, 2) = i - HEADER_ROW - 1 ' This will write you a count of UUIDs in B column below all the rest of UUIDs on active worksheet
End With
CountUUIDLoop = Instance ' This ensures that your function (!) returns an array with all UUIDs and dates inside.
End Function
This function will print you count of your UUIDs at the bottom of active sheet and return you an array like this:
[[LOGIN1][1], [[DATE1][DATE2][DATE3]][1]]
I have used this order of storing data to avoid error with expanding of multidimentional arrays. This error is similar to yours, so you could read more about this there: How can I "ReDim Preserve" a 2D Array in Excel 2007 VBA so that I can add rows, not columns, to the array? Excel VBA - How to Redim a 2D array? ReDim Preserve to a Multi-Dimensional Array in Visual Basic 6
Anyway, you could use my function output ("Instance" array) to perform your further actions to find what you need or even display your uuid-dates values. :)
Good luck in your further VBA actions!
UPDATE
Here is the test procedure showing how to work with the above function's results:
Sub test()
Dim UUIDs ' The result of the "CountUUIDLoop" function will be stored there
Dim i as Long, j As Long ' Simple numeric variables used as indexies to run through our resulting array
UUIDs = CountUUIDLoop ' assign function result to a new variable
Application.DisplayAlerts = False ' Disable alerts from Excel
ActiveSheet.Delete ' Delete TMP worksheet
Application.DisplayAlerts = True ' Enable alerts from Excel
If UUIDs(UUID, 1) <> Empty Then ' This ensures that UUIDs array is not empty
Sheets.Add After:=ActiveSheet ' Add new worksheet after active one to put data into it
With ActiveSheet 'Ensure that we are working with active worksheet
.Cells(HEADER_ROW, 1) = "UUIDs/dates" ' Put the header into the "HEADER_ROW" row
For i = 1 To UBound(UUIDs, 2) ' run through all the UUIDs
.Cells(1 + HEADER_ROW, i) = UUIDs(UUID, i) ' Put UUID under the header
For j = 1 To UBound(UUIDs(DATES_ID, i)) ' run through all the dates per UUID
.Cells(j + 1 + HEADER_ROW, i) = UUIDs(DATES_ID, i)(j) ' put date into column below the UUID
Next j ' Go to next date
Next i ' Go to next UUID
.Cells.EntireColumn.AutoFit ' This will make all columns' width to fit its contents
End With
Else
MsgBox "No UUIDs are found!", vbCritical, "No UUIDs on worksheet" ' Show message box if there are no UUIDs in function result
End If
End Sub
So, if you'll have following data on the active worksheet:
..............A.................B
1........LOGIN1.......DATE1
2........LOGIN1.......DATE2
3........LOGIN1.......DATE3
4........LOGIN2.......DATE4
5........LOGIN2.......DATE5
6........LOGIN3.......DATE6
...this sub will put UUIDs on the new sheet like this:
..............A.................B.................C
1........UUIDs/dates
2........LOGIN1........LOGIN2........LOGIN3
3........DATE1.........DATE4.........DATE6
4........DATE2.........DATE5
5........DATE3
UPDATE2
It is recomended to use Long data type instead of Integer each type when integer (or whole number) variable is needed. Long is slightly faster, it has much wider limitations and costs no additional memory. Here is proof link:
MSDN:The Integer, Long, and Byte Data Types

I would recommend using collections and a dictionary instead of arrays. The below code will structure the data in a way that is very similar to the way you wanted it.
Sub collect_logins_by_user_()
'you need to enable the microsoft scripting runtime
'in tools - references
'assuming unique ids are in col A and there are no gaps
'and assuming dates in col B and there are no gaps
'
'The expected runtime for this is O(n) and I have used similar code on more than 250.000 record.
'It still takes a while obviously, but should run just fine.
'
'The the data will bestructed in the following format:
'{id_1: [d_1, d_2,...], id_2: [d_3, d_4,...], ...}
Dim current_id As Range: Set current_id = ActiveSheet.Range("A2") 'modify range as required
Dim logins_by_users As New Dictionary
While Not IsEmpty(current_id)
If Not logins_by_users.Exists(current_id.Value) Then
Set logins_by_users(current_id.Value) = New Collection
End If
logins_by_users(current_id.Value).Add current_id.Offset(ColumnOffset:=1).Value
Set current_id = current_id.Offset(RowOffset:=1)
Wend
'Once you have the data structured, you can do whatever you want with it.
'like printing it to the immediate window.
Dim id_ As Variant
For Each id_ In logins_by_users
Debug.Print "======================================================="
Debug.Print id_
Dim d As Variant
For Each d In logins_by_users(id_)
Debug.Print d
Next d
Next id_
Debug.Print "======================================================="
End Sub

I have written a bit of code that does something along the lines of what you are trying to do - it prints to the debug window the different numbers of consecutive logs per user, separeted by commas.
This code makes use of the dictionary object - which essentially is an associative array where the indexes are not restrained to numbers like they are in arrays, and offers a couple of convenient features to manipulate data that arrays don't.
I have tested this on a sheet including user ids in colomn A and log dates in column B - including headers - and this looks to work fine. Fell free to give it a try
Sub mysub()
Dim dic As Object
Dim logs As Variant
Dim myval As Long
Dim mykey As Variant
Dim nb As Long
Dim i As Long
Set dic = CreateObject("Scripting.dictionary")
'CHANGE TO YOUR SHEET REFERENCE HERE
For Each cell In Range(Cells(2, 1), Cells(Worksheets("Sheet8").Rows.count, 1).End(xlUp))
mykey = cell.Value
myval = cell.Offset(0, 1)
If myval <> 0 Then
On Error GoTo ERREUR
dic.Add mykey, myval
On Error GoTo 0
End If
Next cell
For Each Key In dic
logs = Split(dic(Key), ",")
logs = sortArray(logs)
i = LBound(logs) + 1
nb = 1
Do While i <= UBound(logs)
Do While CLng(logs(i)) = CLng(logs(i - 1)) + 1
nb = nb + 1
i = i + 1
Loop
If nb > 1 Then
tot = tot & "," & CStr(nb)
nb = 1
End If
i = i + 1
Loop
If tot <> "" Then dic(Key) = Right(tot, Len(tot) - 1)
Debug.Print "User: " & Key & " - Consecutive logs: " & dic(Key)
tot = ""
mys = ""
Next Key
Exit Sub
ERREUR:
If myval <> 0 Then dic(mykey) = dic(mykey) & "," & CStr(myval)
Resume Next
End Sub
Function sortArray(a As Variant) As Variant
For i = LBound(a) + 1 To UBound(a)
j = i
Do While a(j) < a(j - 1)
temp = a(j - 1)
a(j - 1) = a(j)
a(j) = temp
j = j - 1
If j = 0 Then Exit Do
Loop
Next i
sortArray = a
End Function

Related

VBA stop using temporary ranges

I'm new to vba so I need some help making my macro more efficient. It does return the desired outcome however I know there must be a much quicker way to do so I just do not have the vba experience to know how.
I have a column which contains names of people assigned to a project. Some are only one name, and others may be multiple, for example:
At the moment, my code goes through this column, separates the names by comma, and enters them individually into a new range like so:
I then use a collection for the unique names and enter them in the final desired list. The names must show up three times, blank row, next three rows are the next name, so on.It should look like this in the end:
Currently my code is the following
Sub FindUniques()
Dim Ws As Worksheet, Ns As Worksheet
Dim SubString() As String, m As Integer, k As Long, NameCount As Integer
Dim allNames As New Collection, tempRng As Range
Set Ns = Worksheets("Sheet2")
Set Ws = Worksheets("Sheet1")
'Loops through the Assigned To column, separates and finds unique names
On Error Resume Next
For i = 1 To Ws.Range("A:A").End(xlDown).Row - Range("Assigned_to").Row
SubString = Split(Range("Assigned_to").Offset(i), ", ")
For j = 0 To UBound(SubString)
allNames.Add (allNames.count), SubString(j)
Next j
Next i
On Error GoTo 0
NameCount = allNames.count
For k = 1 To NameCount
For m = 1 To 4
Ns.Cells((k - 1) * 4 + m + 7, 2) = allNames.Key(k)
Next
Range("Names").Offset((k - 1) * 4).ClearContents
Next
End Sub
It works, however there must be some way that is more efficient than entering the names into a new range and then deleting the range. How can I use a collection or an array or something of the sort to make it quicker? Any ideas would be really appreciated
edit: I have now updated the code and it is using an collection, taking values from the substring. This enters the item (0, 1, 2, ...) in the cells instead of the keys (keys here are the names). How do I get it to return the key instead of the item number?
The slowest part of VBA are worksheet interactions so we should attempt to minimize that as much as possible.
Sub FindUniques()
Dim ws As Worksheet, ns As Worksheet
Dim splitStr() As String, nameStr As Variant
Dim dict As New Dictionary
Dim lastRow As Long, i As Long
Set ns = Worksheets("Sheet2")
Set ws = Worksheets("Sheet1")
lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
'Loops through the Assigned To column, separates and finds unique names
For i = 2 To lastRow
splitStr = Split(CStr(ws.Cells(i, 1).Value), ", ")
For Each nameStr In splitStr
If Not dict.Exists(nameStr) Then dict.Add nameStr , 0
Next
Next i
i = 2
For Each nameStr In dict.Keys
ns.Cells(i, 1).Resize(3).Value = nameStr
i = i + 4
Next
End Sub
Edited With #Toddleson & #BigBen 's suggestions
Good Luck!

Store data by using FILTER Function within VBA [duplicate]

I'm trying to make a function MonstersInLevel() that filters the second column of my "LevelMonsters" named range based on the value of the first column. The range's first column represents a game level ID and the second column represents a monster ID that appears in that level. Here's what my range looks like.
If I call MonstersInLevel(2) I expect the function to return a range consisting of "2", "3" and "4".
Function MonstersInLevel(level As Integer) As Range
MonstersInLevel = Application.WorksheetFunction.Filter(Range("LevelMonsters").Columns(2), Range("LevelMonsters").Columns(1) = level)
End Function
I get:
A value used in the formula is of the wrong data type
I'm using the FILTER function as I would as an Excel formula. I assume there's some difference in the Excel and VBA syntax for FILTER's criteria.
Just encountered this problem myself and wanted to post my workaround.
We need to return an array of True/False to the worksheet function. To do this I created a Function that takes a 2D array, the column wanted and the value to compare. It then returns a 2d single column array of the necessary True/False.
Function myeval(arr() As Variant, clm As Long, vl As Variant) As Variant()
Dim temp() As Variant
ReDim temp(1 To UBound(arr, 1), 1 To 1)
Dim i As Long
For i = 1 To UBound(arr, 1)
temp(i, 1) = arr(i, clm) = vl
Next i
myeval = temp
End Function
So in this particular case it would be called:
Function MonstersInLevel(level As Integer) As Variant
MonstersInLevel = Application.WorksheetFunction.Filter(Range("LevelMonsters").Columns(2), myeval(Range("LevelMonsters").Value, 1, level),"""")
End Function
Avoid type mismatch in Worksheetfunction via VBA
Keeping in mind that the 2nd argument reflects a dynamic matrix condition
based entirely on ►worksheet logic (returning an array of 0 or 1 cell values /False or True])
it seems that you have
to execute an evaluation at least within this argument and
declare the function type (explicitly or implicitly) as Variant
Function MonstersInLevel(level As Integer) As Variant
'' Failing assignment:
' MonstersInLevel = Application.WorksheetFunction.Filter(Range("LevelMonsters").Columns(2), _
' Range("LevelMonsters").Columns(1) = level _
' )
MonstersInLevel = Application.WorksheetFunction.Filter( _
Range("LevelMonsters").Columns(2), _
Evaluate(Range("LevelMonsters").Columns(1).Address & "=" & level) _
)
End Function
...or to evaluate the complete function
Function MonstersInLevel(level As Integer) As Variant
Dim expr As String
expr = "=Filter(" & _
Range("LevelMonsters").Columns(2).Address & "," & _
Range("LevelMonsters").Columns(1).Address & "=" & level & _
")"
'Debug.Print expr
MonstersInLevel = Evaluate(expr)
End Function
Example call writing results to any target
Dim v
v = MonstersInLevel(2)
Sheet1.Range("D2").Resize(UBound(v), UBound(v, 2)) = v
Of course it would be possible as well to write .Formula2 expressions programmatically, even splitting into spill range references.
Addendum ........... //as of Jan 10th
Backwards compatible workaround via VBA.Filter()
"If you guys know any other VBA function that would be more appropriate
than Application.WorksheetFunction.Filter I'd be ok."
In order to provide also a backwards compatible alternative,
I demonstrate the following approach using the classic (VBA.)Filter() function (see section [3]) based upon
prior matching results (see [1]).
Note that Application.Match() comparing two (!) array inputs
delivers a whole array of possible findings (instead of a single result as most frequently executed).
Non findings are identified by IsError() values of -1; adding +1 results in a set
of zeros and ones. Section [2] enters corresponding data for positive findings.
Eventually non-findings (i.e. 0or zero) are removed by a tricky negative filtering.
Function getLevels()
Function getLevels(rng As Range, ByVal level As Long)
'Site: https://stackoverflow.com/questions/65630126/how-to-remove-only-the-duplicate-row-instead-of-removing-all-the-rows-that-follo
'[0] get datafield array
Dim v, v2
v = Application.Index(rng.Value2, 0, 1) ' 1st column
v2 = Application.Index(rng.Value2, 0, 2) ' 2nd column
'[1] check data (with Match comparing 2 arrays :-)
Dim results
results = Application.Transpose(Application.Match(v, Array(level), 0))
'[2] rebuild with False/True entries
Dim i As Long
For i = 1 To UBound(results)
results(i) = IsError(results(i)) + 1 ' 0 or 1-values
If results(i) Then results(i) = v2(i, 1) ' get current value if true
Next i
'[3] remove zeros (negative filtering)
results = Filter(results, "0", False)
'[4] return results as vertical 1-based array
getLevels = Application.Transpose(results)
End Function
Example call
Const LVL = 2 ' define level
With Sheet1 ' change to project's sheet Code(Name)
'define data range (assuming columns A:B)
Dim rng As Range
Set rng = .UsedRange.Resize(, 2)
'function call getLevels()
Dim levels
levels = getLevels(rng, level:=LVL)
'write to target
.Columns("I:I").Clear
.Range("I2").Resize(UBound(levels), 1) = levels
End With
Solution without any supporting VBA function:
Function MonstersInLevel(level As Integer) As Variant
With Application.WorksheetFunction
MonstersInLevel = .Filter(Range("LevelMonsters").Columns(2), _
.IfError(.XLookup(Range("LevelMonsters").Columns(1), level, True), False))
End With
End Function
XLookup returns an array of #N/A or True. IfError replaces errors with False. Finally, the Filter function receives an array of booleans as the second parameter.
EDIT
Removed the IfError function thanks to #ScottCraner:
Function MonstersInLevel(level As Integer) As Variant
With Application.WorksheetFunction
MonstersInLevel = .Filter(Range("LevelMonsters").Columns(2), _
.XLookup(Range("LevelMonsters").Columns(1), level, True, False))
End With
End Function
I couldn't resolve your question but as I did some testing on the subject trying to do so, I thought I'd share my findings:
Based on this Microsoft community post, or at least the answers there, it seems you will need to loop through the output in one way or another...
That question seems to want to achieve the same as what you are wanting to do (I think?).
On the other hand, I have never used the WorksheetFunction.Filter method, and the closest I could get it to working was like so:
Here is my sample data - RangeOne is Column A and RangeTwo is Column B. I have used the =FILTER() function in cell C1 evaluating the input in D1 for reference of expected results. Naturally this function is working as expected! The VBA routine is outputting to Columns E, F and G.
Sub TestFilterFunction()
Dim TestArray As Variant
Range("E1:E3") = Application.Filter(Range("RangeTwo"), Range("RangeOne"), Range("D1"))
Range("F1:F3") = Application.Filter(Range("RangeTwo"), Range("RangeOne") = Range("D1")) 'Runtime Error 13
Range("G1:G3") = Application.Filter(Range("RangeTwo"), Range("RangeOne"))
TestArray = Application.Filter(Range("RangeTwo"), Range("RangeOne"), Range("D1"))
TestArray = Application.Filter(Range("RangeTwo"), Range("RangeOne") = Range("D1")) 'Runtime Error 13
TestArray = Application.Filter(Range("RangeTwo"), Range("RangeOne"))
Range("H1:H3") = Application.Filter(Range("RangeTwo", "RangeOne"), Range("RangeOne"), Range("D1"))
TestArray = Application.Filter(Range("A1:B9"), Range("RangeOne"), "2")
End Sub
Column E returned the first 3 values from RangeTwo.
Column F has not been populated - This is because that line threw the Runtime error 13 - Type Mismatch
Column G returned the first 3 values from RangeTwo.
Column H returned the first 3 values from "A1:B9" (both ranges together) - specifically the first 3 values of column A.
I thought this was odd so I threw in an array to assign the values to rather than directly to the worksheet;
The first TestArray line and the third TestArray line both populated the array with the entire RangeTwo values;
I realised with the syntax of the first and third attempt at the WorksheetFunction.Filter, the entire range is returned (that being the first argument - Arg1 - range), but when trying to include the = Range("D1") , it returns the Type Mismatch error.
The final TestArray attempt being the same syntax as the Column H test, returned both columns in a 2D array (now TestArray(1 To 9, 1 To 2)).
I should note I couldn't find any documentation at all on WorksheetFunction.Filter so I'm assuming it does follow the same syntax as the Excel Sheet Function has.
If I find anything more on this topic I'll come back and edit it in, but for now it's looking like perhaps a solution using either loops or Index/Match functions also will need to happen to have the data returned in VBA.
I thought about perhaps writing the sheet formula to a cell and then grab that into an array or something but Excel inserts # into it now which only returns a single cell result, i.e.
Range("J1").Formula = "=FILTER(B1:B9, A1:A9 = D1)"
Would return in J1:
=#FILTER(B1:B9, A1:A9 = D1)
Which with our sample data, would only return 2 in J1 as opposed to the expected/desired 2, 3 and 4 in J1:J3.
I can't work out a way to remove the # as it is applied when the function is written to the cell unfortunately, but hopefully any of the above helps someone find a solution.
Just some comments to help you out.
If you are using the new FILTER() function from either a worksheet cell or within some VBA code, the first argument should be a range and the second argument should a a Boolean array. (if you don't enter something that can evaluate to a Boolean array, VBA may complain the the data type is wrong)
You would be best served (in VBA) if you:
explicitly declared a 2 dimensional, column-compatible, Boolean array
filled the array
used the array in the function call
Here is a super simple example. Say we want to filter the data from A1 to A6 to remove blanks. We could pick a cell and enter:
=FILTER(A1:A6,A1:A6<>"")
Looks like:
Now we want to perform the same activities with a VBA sub and put the result in a block starting with B9. The code:
Sub SingleColumn()
Dim r As Range, wf As WorksheetFunction, i As Long
Dim arr, s As String, dq As String, boo, rc As Long
Set wf = Application.WorksheetFunction
Set r = Range("A1:A6")
rc = r.Rows.Count
ReDim boo(1 To r.Rows.Count, 1 To 1) As Boolean
i = 1
For Each rr In r
If rr.Value = "" Then
boo(i, 1) = False
Else
boo(i, 1) = True
End If
i = i + 1
Next rr
arr = wf.Filter(r, boo)
MsgBox LBound(arr, 1) & "-" & UBound(arr, 1) & vbCrLf & LBound(arr, 2) & "-" & UBound(arr, 2)
Range("B9").Resize(UBound(arr, 1), UBound(arr, 2)) = arr
End Sub
Result:
On Excel version 15.0 (2013), I don't see Application.WorksheetFunction.Filter (tried with Show Hidden Members):
So maybe this is a newer function in later versions ?
My top Google search directs me to this question ;)
So, my answer is to avoid the function primarily from the point of view of backwards compatibility.
Alternate code options presented below returning e.g. a Range and a Variant.
Input:
Code:
Option Explicit
Sub Test()
Dim rngInput As Range
Dim rngFiltered As Range
Dim varFiltered As Variant
Dim varItem As Variant
Set rngInput = ThisWorkbook.Worksheets("Sheet1").Range("A2:B10")
' as range
Debug.Print "' Output as Range"
Set rngFiltered = MonstersInLevel_AsRange(rngInput, 2, 1, 2)
Debug.Print "' " & rngFiltered.Address ' expect B5, B6, B8
Debug.Print "' ---------------"
' as variant
Debug.Print "' Output as Variant"
varFiltered = MonstersInLevel_AsVariant(rngInput, 2, 1, 2)
For Each varItem In varFiltered
Debug.Print "' " & varItem ' expect 3, 4, 5
Next varItem
Debug.Print "' ---------------"
End Sub
Function MonstersInLevel_AsRange(rngToFilter As Range, _
ByVal lngLevel As Long, _
ByVal lngColIxToFilter As Long, _
ByVal lngColIxForValue As Long) As Range
Dim rngResult As Range
Dim lngRowIndex As Long
Dim lngResultIndex As Long
Set rngResult = Nothing
For lngRowIndex = 1 To rngToFilter.Rows.Count
If rngToFilter.Cells(lngRowIndex, lngColIxToFilter) = lngLevel Then
If rngResult Is Nothing Then
Set rngResult = rngToFilter.Cells(lngRowIndex, lngColIxForValue)
Else
Set rngResult = Union(rngResult, rngToFilter.Cells(lngRowIndex, lngColIxForValue))
End If
End If
Next lngRowIndex
Set MonstersInLevel_AsRange = rngResult
End Function
Function MonstersInLevel_AsVariant(rngToFilter As Range, _
ByVal lngLevel As Long, _
ByVal lngColIxToFilter As Long, _
ByVal lngColIxForValue As Long) As Variant
Dim varResult As Variant
Dim lngRowIndex As Long
Dim lngResultIndex As Long
lngResultIndex = 0
ReDim varResult(0)
For lngRowIndex = 1 To rngToFilter.Rows.Count
If rngToFilter.Cells(lngRowIndex, lngColIxToFilter) = lngLevel Then
lngResultIndex = lngResultIndex + 1
ReDim Preserve varResult(1 To lngResultIndex)
varResult(lngResultIndex) = rngToFilter.Cells(lngRowIndex, lngColIxForValue)
End If
Next lngRowIndex
MonstersInLevel_AsVariant = varResult
End Function
Test output:
' Output as Range
' $B$5:$B$6,$B$8
' ---------------
' Output as Variant
' 3
' 5
' 4
' ---------------
Based on Christian Buses answer (https://stackoverflow.com/a/65671334/16578424) I wrote a generic function to use the FILTER-function.
It returns a one-dimensional array with the filtered values.
Public Function getFILTERValuesFromRange(rgResult As Range, rgFilter As Range, varValue As Variant) As Variant
If rgResult.Columns.count > 1 Or rgFilter.Columns.count > 1 Then
Err.Raise vbObjectError + 512, , "Only ranges with one column are allowed."
ElseIf rgResult.Rows.count <> rgFilter.Rows.count Then
Err.Raise vbObjectError + 512, , "Both ranges have to be of the same size."
End If
Dim arr1 As Variant
With Application.WorksheetFunction
arr1 = .filter(rgResult, .XLookup(rgFilter, varValue, True, False))
End With
getFILTERValuesFromRange = getOneDimensionalArrayFromRangeArray(arr1)
End Function
Private Function getOneDimensionalArrayFromRangeArray(arr1 As Variant) As Variant
Dim arr2 As Variant
ReDim arr2(LBound(arr1, 1) To UBound(arr1, 1))
Dim i As Long
For i = 1 To UBound(arr1, 1)
arr2(i) = arr1(i, 1)
Next
getOneDimensionalArrayFromRangeArray = arr2
End Function

How do I build an array based on data using two criteria (Array Formula [pref] or VBA)

I have the following example data:
The first column is a list of names, the second column is the year those names belong to.
What I want to do is build a list of all unique (distinct) names from one year.
So for example in the year 2016 I want it to build a list like in the end result column, while in the year 2017 I want it to build a list with the unique names of that year.
Preferably I want it to be a (dynamic) named range so that the calculation only has to be performed once and so that I can use the =INDEX(examplenamedrange, 1) formula to call the names that I want to use.
If this is not possible in a dynamic named range then storing the array in VBA is also a possibility.
I’ve seen a few Excel formulas around the net that look at unique values in a list, but none that I could find with an extra criteria (in this case: Year).
Can anyone set me on the right path?
Here is a short VBA sub to achieve what you ask.
To set up a sub, press Alt+F11 to open VBA editor, then Insert>Module and paste the following code. I have commented it to show what each section is doing. You could also set this up to run when the Year cell is changed, but I'll leave that as an exercise for you! To run it, press F5 in the VBA editor or click the run button.
Sub uniqueInYear()
Dim sh As Worksheet
Set sh = ActiveSheet
Dim vcell As Range
Dim namesString As String
namesString = ""
Dim namesList() As String
' Compile string with all names comma separated for given year
For Each vcell In Range("A2:A" & sh.UsedRange.Rows.Count)
' check if name already captured for given year
If InStr(namesString, vcell.Value) = 0 And vcell.Offset(0, 1).Value = sh.Range("E1").Value Then
namesString = namesString & "," & vcell.Value
End If
Next vcell
' If empty then quit
If namesString = "" Then
Exit Sub
End If
' Remove leading comma
namesString = Right(namesString, Len(namesString) - 1)
' Put names into array
namesList = Split(namesString, ",")
' Write names to result column after clearing it
sh.Range("E2:E" & sh.UsedRange.Rows.Count + 1).Value = ""
Dim nameVar As Variant
For Each nameVar In namesList
sh.Range("E" & sh.UsedRange.Rows.Count + 1).End(xlUp).Offset(1, 0).Value = nameVar
Next nameVar
' Named range - delete if it exists then create a-fresh
On Error Resume Next
sh.Parent.Names("UniqueNames").Delete
On Error GoTo 0
sh.Parent.Names.Add name:="UniqueNames", _
RefersTo:=sh.Range("E2", sh.Range("E" & sh.UsedRange.Rows.Count + 1).End(xlUp))
End Sub
Outcome:
Just to add another similar, but different method in there. You can use a UDF which returns an array. So paste the code into a code module and then use the following formula on the worksheet
=GetNamesInYear(names,dates,2016)
where names is your range of names, dates is your range of dates and 2016 is the year you are searching, either a number written in the formula or reference to a cell with the value 2016 is fine.
To return the full array you need to enter the formula with Ctrl+Shift+Enter. To view all results, rather than just the first result, highlight that cell and the 5 below it (say), press F2 to edit and then Ctrl+Shift+Enter again.
Alternatively you can access the array with any worksheet function that can deal with string arrays. For example:
=INDEX(GetNamesInYear(names,dates,2016),2)
returns the second item in the array
Here is the code
Function GetNamesInYear(names As Range, years As Range, year As Integer) As Variant
Dim namesArr As Variant
namesArr = names.Value2
Dim yearsArr As Variant
yearsArr = years.Value2
Dim results As Long
results = 0
Dim resultArr As Variant
Dim i As Long
ReDim resultArr(0 To 0)
For i = 1 To UBound(namesArr, 1)
If Not InArray(resultArr, namesArr(i, 1)) And (yearsArr(i, 1) = year) Then
ReDim Preserve resultArr(0 To results)
resultArr(results) = namesArr(i, 1)
results = results + 1
End If
Next i
GetNamesInYear = Application.WorksheetFunction.Transpose(resultArr)
End Function
Private Function InArray(arr As Variant, value As Variant) As Boolean
Dim i As Integer
For i = 0 To UBound(arr)
If arr(i) = value Then
InArray = True
Exit Function
End If
Next i
InArray = False
End Function
The result looks like this:
Update
Names and dates inputs are now split out (separate ranges) as per comments from OP
You could try this:
Sub Names()
Dim x, Years, Counted, ColumnCount, j, lColumn
Dim Names(), FoundNames()
Years = Range("B1").Value
Counted = 0
ColumnCount = 2
ReDim Names(Range("A" & Rows.count).End(xlUp).row)
ReDim FoundNames(LBound(Names) To UBound(Names))
lColumn = Cells(1, Cells(1, Columns.count).End(xlToLeft).Column).Column
For Each c In Range(Range("A1"), Range("A" & Rows.count).End(xlUp))
If Years <> Range("B" & c.row).Value Then
For i = LBound(Names) To UBound(Names)
If Names(i) <> "" Then
j = j + 1
FoundNames(j - 1) = Names(i)
End If
Next i
ReDim Preserve FoundNames(LBound(Names) To j - 1)
Cells(1, lColumn + ColumnCount).Value = Years
For i = LBound(FoundNames) To UBound(FoundNames)
Cells(i + 2, lColumn + ColumnCount).Value = FoundNames(i)
Next
ColumnCount = ColumnCount + 1
Years = Range("B" & c.row).Value
Counted = 0
ReDim Names(Range("A" & Rows.count).End(xlUp).row)
ReDim FoundNames(LBound(Names) To UBound(Names))
End If
If InStr(Join(Names, ","), c.Value) < 1 Then
Names(Counted) = c.Value
Counted = Counted + 1
End If
Next c
j = 0
For i = LBound(Names) To UBound(Names)
If Names(i) <> "" Then
j = j + 1
FoundNames(j - 1) = Names(i)
End If
Next i
ReDim Preserve FoundNames(LBound(Names) To j - 1)
Cells(1, lColumn + ColumnCount).Value = Years
For i = LBound(FoundNames) To UBound(FoundNames)
Cells(i + 2, lColumn + ColumnCount).Value = FoundNames(i)
Next
End Sub
The results look like this:
Array formula can work here:
=INDEX($A$1:$A$15, N(IF({1}, MODE.MULT(IF(($B$1:$B$15=2016)*(ROW($A$1:$A$15)=MATCH($A$1:$A$15, $A$1:$A$15, 0)), (ROW($A$1:$A$15)) * {1,1})))))
Define your named range as dynaRange_2016, and see it's use in the two images
:
You could name a range for each year instead, and then define yet another name for the uniques range. This is more versatile:
define Named range_2017 as =INDEX(Sheet5!$A:$A, MATCH(2017,Sheet5!$B:$B, 0)):INDEX(Sheet5!$A:$A, MATCH(2017,Sheet5!$B:$B, 1))
Then define another named range uniques_2017 as=INDEX(Sheet5!range_2017, N(IF({1}, MODE.MULT(IF(ROW(Sheet5!range_2017)-MATCH(2017, Sheet5!$B:$B, 0)+1=MATCH(Sheet5!range_2017, Sheet5!range_2017, 0), (ROW(Sheet5!range_2017)-MATCH(2017, Sheet5!$B:$B, 0)+1) * {1,1})))))
In your sheet, you can call as INDEX(uniques_2017, 3) for example. Do the same for all years that you expect to occur.

Redimming a 2d array throws type mismatch

I was working on a solution to another question of mine when I stumble across this helpful question and answer. However implementing the answer given by Control Freak over there throws me a Type Mismatch error as soon as I exit the function and return to my code on the line: Years = ReDimPreserve(Years, i, 3). I'm not that skilled of a programmer to figure out what is going wrong here, so can anybody shed some light on this.
Here is my code:
Sub DevideData()
Dim i As Integer
Dim Years() As String
ReDim Years(1, 3)
Years(1, 1) = Cells(2, 1).Value
Years(1, 2) = 2
i = 2
ThisWorkbook.Worksheets("Simple Boundary").Activate
TotalRows = ThisWorkbook.Worksheets("Simple Boundary").Range("A100000").End(xlUp).row
For row = 3 To TotalRows
Years = ReDimPreserve(Years, i, 3)
If Not Cells(row, 1).Value = Cells(row - 1, 1).Value Then
Years(i - 1, 3) = row - 1
Years(i, 1) = Cells(row, 1).Value
Years(i, 2) = row
i = i + 1
End If
Next row
End Sub
And here is the function as written by Control Freak:
Public Function ReDimPreserve(aArrayToPreserve, nNewFirstUBound, nNewLastUBound)
ReDimPreserve = False
'check if its in array first
If IsArray(aArrayToPreserve) Then
'create new array
ReDim aPreservedArray(nNewFirstUBound, nNewLastUBound)
'get old lBound/uBound
nOldFirstUBound = UBound(aArrayToPreserve, 1)
nOldLastUBound = UBound(aArrayToPreserve, 2)
'loop through first
For nFirst = LBound(aArrayToPreserve, 1) To nNewFirstUBound
For nLast = LBound(aArrayToPreserve, 2) To nNewLastUBound
'if its in range, then append to new array the same way
If nOldFirstUBound >= nFirst And nOldLastUBound >= nLast Then
aPreservedArray(nFirst, nLast) = aArrayToPreserve(nFirst, nLast)
End If
Next
Next
'return the array redimmed
If IsArray(aPreservedArray) Then ReDimPreserve = aPreservedArray
End If
End Function
I promised a fuller answer. Sorry it is later than I expected:
I got tied up with another problem,
Technique 1, which I was expecting to recommend, did not work as I expected so I added some other techniques which are much more satisfactory.
As I said in my first comment:
Public Function ReDimPreserve(aArrayToPreserve, nNewFirstUBound, nNewLastUBound)
causes aArrayToPreserve to have the default type of Variant. This does not match:
Dim Years() As String
As you discovered, redefining Years as a Variant, fixes the problems. An alternative approach would be to amend the declaration of ReDimPreserve so aArrayToPreserve is an array of type String. I would not recommend that approach since you are storing both strings and numbers in the array. A Variant array will handle either strings or numbers while a String array can only handle numbers by converting them to strings for storage and back to numbers for processing.
I tried your macro with different quantities of data and different amendments and timed the runs:
Rows of data Amendment Duration of run
3,500 Years() changed to Variant 4.99 seconds
35,000 Years() changed to Variant 502 seconds
35,000 aArrayToPreserve changed to String 656 seconds
As I said in my second comment, ReDim Preserve is slow for both the inbuilt method and the VBA routine you found. For every call it must:
find space for the new larger array
copy the data from the old array to the new
release the old array for garbage collection.
ReDim Preserve is a very useful method but it must be used with extreme care. Sometimes I find that sizing an array to the maximum at the beginning and using ReDim Preserve to cut the array down to the used size at the end is a better technique. The best techniques shown below determine the number of entries required before sizing the array.
At the bottom of your routine, I added:
For i = LBound(Years, 1) To LBound(Years, 1) + 9
Debug.Print Years(i, 0) & "|" & Years(i, 1) & "|" & Years(i, 2) & "|" & Years(i, 3)
Next
For i = UBound(Years, 1) - 9 To UBound(Years, 1)
Debug.Print Years(i, 0) & "|" & Years(i, 1) & "|" & Years(i, 2) & "|" & Years(i, 3)
Next
This resulted in the following being output to the Immediate Window:
|||
|AAAA|2|2
|AAAB|3|4
|AAAC|5|7
|AAAD|8|11
|AAAE|12|16
|AAAF|17|22
|AAAG|23|23
|AAAH|24|25
|AAAI|26|28
|AOUJ|34973|34976
|AOUK|34977|34981
|AOUL|34982|34987
|AOUM|34988|34988
|AOUN|34989|34990
|AOUO|34991|34993
|AOUP|34994|34997
|AOUQ|34998|35002
|AOUR|35003|
|||
Since you have called the array Years, I doubt my string values are anything like yours. This does not matter. What matters, is that I doubt this output was exactly what you wanted.
If you write:
ReDim Years(1, 3)
The lower bounds are set to the value specified by the Option Base statement or zero if there is no Option Base statement. You have lower bounds for both dimensions of zero which you do not use. This is the reason for the “|||” at the top. There is another “|||” at the end which means you are creating a final row which you are not using. The final used row does not have an end row which I assume in a mistake.
When I can divide a routine into steps, I always validate the result of one step before advancing to the next. That way, I know any problems are within the current step and not the result of an error in an earlier step. I use Debug.Print to output to the Immediate Window most of the time. Only if I want to output a lot of diagnostic information will I write to a text file. Either way, blocks of code like mine are a significant aid to rapid debugging of a macro.
I would never write ReDim Years(1, 3). I always specify the lower bound so as to be absolutely clear. VBA is the only language I know where you can specify any value for the lower bound (providing it is less than the upper bound) so I will specify non-standard values if is helpful for a particular problem. In this case, I see not advantage to a lower bound other than one so that is what I have used.
With two dimensions arrays it is conventional to have columns as the first dimension and rows as the second. One exception is for arrays read from or to be written to a worksheet for which the dimensions are the other way round. You have rows as the first dimension. If you have used the conventional sequence you could have used the ReDim Preserve method, thereby avoiding the RedimPreserve function and the problem of non-matching types.
Technique 1
I expected this to be the fastest technique. Experts advise us to avoid “re-inventing the wheel”. That is, if Excel has a routine that will do what you want, don’t code an alternative in VBA. However, I have found a number of examples where this is not true and I discovered this technique was one of them.
The obvious technique here is to use Filter, then create a range of the visible rows using SpecialCells and finally process each row in this range. I have used this technique very successfully to meet other requirements but not here.
I did not know the VBA to select unique rows so started the macro recorder and filtered my test data from the keyboard to get:
Range("A1:A35000").AdvancedFilter Action:=xlFilterInPlace, Unique:=True
My past uses of Filter have all converted to AutoFilter which I have found to give acceptable performance. This converted to AdvancedFilter which took 20 seconds both from the keyboard and from VBA. I do not know why it is so slow.
The second problem was that:
Set RngUnique = .Range(.Cells(1, 1), .Cells(RowLast, 1)) _
.SpecialCells(xlCellTypeVisible)
was rejected as “too complicated”.
Not being able to get the visible rows as a range means the benefits of Filter are not really available. I have counted the visible rows to simulate having RngUnique.Rows.Count. This shows the technique which has always worked with AutoFilter. If AdvancedFilter had reported the unique rows in an accepted time I might have investigated this problem but under the circumstances it does not seem worth the effort.
The macro demonstrating this technique is:
Option Explicit
Sub Technique1()
' * Avoid using meaningless names like i. Giving every variable a meaningful
' name is helpful during development and even more helpful when you return
' to the macro in six months for maintenence.
' * My naming convention is use a sequence of keywords. The first keyword
' identifies what type of data the variable holds. So "Row" means it holds
' a row number. Each subsequent keyword narrows the scope. "RowSb" is a
' row of the worksheet "Simple Boundary" and "RowYears" is a row of the Years
' array. "RowSbCrnt"is the current row of the worksheet "Simple Boundary".
' * I can look at macros I wrote years ago and know what all the variables are.
' You may not like my convention. Fine, development your own but do not
' try programming with random names.
' * Avoid data type Integer which specifies a 16-bit whole number and requires
' special processing on 32 and 64-bit computers. Long is now the recommended
' data type for whole numbers.
Dim NumRowsVisible As Long
Dim RowSbCrnt As Long
Dim RowSbLast As Long
Dim RowYearsCrnt As Long
Dim TimeStart As Double
Dim Years() As Variant
TimeStart = Timer ' Get the time as seconds since midnight to nearest .001
' of a second
' This can save significant amounts of time if the macro amends the
' screen or switches between workbooks.
Application.ScreenUpdating = False
With Worksheets("Simple Boundary")
' Rows.Count avoiding having to guess how many rows will be used
RowSbLast = .Cells(Rows.Count, "A").End(xlUp).Row
' Hide non-unique rows
With .Range(.Cells(1, 1), .Cells(RowSbLast, 1))
.AdvancedFilter Action:=xlFilterInPlace, Unique:=True
End With
' Count number of unique rows.
' It is difficult to time small pieces of code because OS routines
' can execute at any time. However, this count takes less than .5
' of a second with 35,000 rows.
NumRowsVisible = 0
For RowSbCrnt = 2 To RowSbLast
If Not .Rows(RowSbCrnt).Hidden Then
NumRowsVisible = NumRowsVisible + 1
End If
Next
' Use count to ReDim array to final size.
ReDim Years(1 To 3, 1 To NumRowsVisible)
RowYearsCrnt = 1
Years(1, RowYearsCrnt) = .Cells(2, 1).Value
Years(2, RowYearsCrnt) = 2
For RowSbCrnt = 3 To RowSbLast
If Not .Rows(RowSbCrnt).Hidden Then
Years(3, RowYearsCrnt) = RowSbCrnt - 1
RowYearsCrnt = RowYearsCrnt + 1
Years(1, RowYearsCrnt) = .Cells(RowSbCrnt, 1).Value
Years(2, RowYearsCrnt) = RowSbCrnt
End If
Next
' Record final row for final string
Years(3, RowYearsCrnt) = RowSbLast
.ShowAllData ' Clear AdvancedFilter
End With
Application.ScreenUpdating = True
Debug.Print "Duration: " & Format(Timer - TimeStart, "#,##0.000")
' Output diagnostics
For RowYearsCrnt = 1 To 9
Debug.Print Years(1, RowYearsCrnt) & "|" & _
Years(2, RowYearsCrnt) & "|" & _
Years(3, RowYearsCrnt) & "|"
Next
' Note that rows are now in the second dimension hence the 2 in UBound(Years, 2)
For RowYearsCrnt = UBound(Years, 2) - 9 To UBound(Years, 2)
Debug.Print Years(1, RowYearsCrnt) & "|" & _
Years(2, RowYearsCrnt) & "|" & _
Years(3, RowYearsCrnt) & "|"
Next
End Sub
The output to the Immediate Window is:
Duration: 20.570
AAAA|2|2|
AAAB|3|4|
AAAC|5|7|
AAAD|8|11|
AAAE|12|16|
AAAF|17|22|
AAAG|23|23|
AAAH|24|25|
AAAI|26|28|
AOUI|34970|34972|
AOUJ|34973|34976|
AOUK|34977|34981|
AOUL|34982|34987|
AOUM|34988|34988|
AOUN|34989|34990|
AOUO|34991|34993|
AOUP|34994|34997|
AOUQ|34998|35002|
AOUR|35003|35008|
As you can see the last row is correct. A duration of 20 seconds is better than the 8 minutes of your technique but I am sure we can do better.
Technique 2
The next macro is similar to the last one but it counts the unique rows rather than use AdvancedFilter to hide the non-unique rows. This macro has a duration of 1.5 seconds with 35,000 rows. This demonstrates that counting how many rows are required for an array in a first pass of the data is a viable approach. The diagnostic output from this macro is the same as above.
Sub Technique2()
Dim NumRowsUnique As Long
Dim RowSbCrnt As Long
Dim RowSbLast As Long
Dim RowYearsCrnt As Long
Dim TimeStart As Double
Dim Years() As Variant
TimeStart = Timer ' Get the time as seconds since midnight to nearest .001
' of a second
With Worksheets("Simple Boundary")
RowSbLast = .Cells(Rows.Count, "A").End(xlUp).Row
' Count number of unique rows.
' Assume all data rows are unique until find otherwise
NumRowsUnique = RowSbLast - 1
For RowSbCrnt = 3 To RowSbLast
If .Cells(RowSbCrnt, 1).Value = .Cells(RowSbCrnt - 1, 1).Value Then
NumRowsUnique = NumRowsUnique - 1
End If
Next
' * Use count to ReDim array to final size.
' * Note that I have defined the columns as the first dimension and rows
' as the second dimension to match convention. Had I wished, this would
' have allowed me to use the standard ReDim Preserve which can only
' adjust the last dimension. However, this does not match the
' syntax of Cells which has the row first. It may have been better to
' maintain your sequence so the two sequences were the same.
ReDim Years(1 To 3, 1 To NumRowsUnique)
RowYearsCrnt = 1
Years(1, RowYearsCrnt) = .Cells(2, 1).Value
Years(2, RowYearsCrnt) = 2
For RowSbCrnt = 3 To RowSbLast
If .Cells(RowSbCrnt, 1).Value <> .Cells(RowSbCrnt - 1, 1).Value Then
Years(3, RowYearsCrnt) = RowSbCrnt - 1
RowYearsCrnt = RowYearsCrnt + 1
Years(1, RowYearsCrnt) = .Cells(RowSbCrnt, 1).Value
Years(2, RowYearsCrnt) = RowSbCrnt
End If
Next
' Record final row for final string
Years(3, RowYearsCrnt) = RowSbLast
End With
Debug.Print "Duration: " & Format(Timer - TimeStart, "#,##0.000")
' Output diagnostics
For RowYearsCrnt = 1 To 9
Debug.Print Years(1, RowYearsCrnt) & "|" & _
Years(2, RowYearsCrnt) & "|" & _
Years(3, RowYearsCrnt) & "|"
Next
' Note that rows are now in the second dimension hence the 2 in UBound(Years, 2)
For RowYearsCrnt = UBound(Years, 2) - 9 To UBound(Years, 2)
Debug.Print Years(1, RowYearsCrnt) & "|" & _
Years(2, RowYearsCrnt) & "|" & _
Years(3, RowYearsCrnt) & "|"
Next
End Sub
Technique 3
The next macro is only slightly changed from the last.
Firstly, I have replaced the literals used to identify the column numbers in worksheets and arrays with constants such as:
Const ColYrEnd As Long = 3
Under my naming convention ColYrEnd = Column of Year array holding range End hence:
Years(ColYrEnd, RowYearsCrnt) = RowCvCrnt - 1
instead of Years(3, RowYearsCrnt) = RowCvCrnt - 1
This makes no difference to the compiled code but makes the source code easier to understand because you do not have to remember what columns 1, 2 and 3 hold. More importantly, if you ever have to rearrange the columns, updating the constants is the only change required. If you ever have to search through a long macro replacing every use of 2 as a column number (while ignoring any other use of 2) by 5, you will know why this is important.
Secondly, I have used:
ColValues = .Range(.Cells(1, ColSbYear), _
.Cells(RowSbLast, ColSbYear)).Value
to import column 1 to an array. The code that read the values from the worksheet now reads them from this array. Array access is much faster than worksheet access so this reduces the runtime from 1.5 seconds to .07 seconds.
The revised code is:
Sub Technique3()
Const ColCvYear As Long = 1
Const ColSbYear As Long = 1
Const ColYrYear As Long = 1
Const ColYrStart As Long = 2
Const ColYrEnd As Long = 3
Const RowSbDataFirst As Long = 2
Const RowCvDataFirst As Long = 2
Dim ColValues As Variant
Dim NumRowsUnique As Long
Dim RowCvCrnt As Long
Dim RowSbCrnt As Long
Dim RowSbLast As Long
Dim RowYearsCrnt As Long
Dim TimeStart As Double
Dim Years() As Variant
TimeStart = Timer ' Get the time as seconds since midnight to nearest .001
' of a second
With Worksheets("Simple Boundary")
RowSbLast = .Cells(Rows.Count, ColSbYear).End(xlUp).Row
ColValues = .Range(.Cells(1, ColSbYear), _
.Cells(RowSbLast, ColSbYear)).Value
' * The above statement imports all the data from column 1 as a two dimensional
' array into a Variant. The Variant is then accessed as though it is an array.
' * The first dimension has one entry per row, the second dimension has on entry
' per column which is one in this case. Both dimensions will have a lower bound
' of one even if the first row or column loaded is not one.
End With
' Count number of unique rows.
' Assume all data rows are unique until find otherwise
NumRowsUnique = UBound(ColValues, 1) - 1
For RowCvCrnt = RowCvDataFirst + 1 To UBound(ColValues, 1)
If ColValues(RowCvCrnt, ColCvYear) = ColValues(RowCvCrnt - 1, ColCvYear) Then
NumRowsUnique = NumRowsUnique - 1
End If
Next
' I mentioned earlier that I was unsure if having rows and columns in the
' convention sequence was correct. I am even less sure here where array
' ColValues has been loaded from a worksheet and the rows and columns are
' not in the conventional sequence. ReDim Years(1 To 3, 1 To NumRowsUnique)
RowYearsCrnt = 1
Years(ColYrYear, RowYearsCrnt) = ColValues(RowCvDataFirst, ColCvYear)
Years(ColYrStart, RowYearsCrnt) = RowCvDataFirst
For RowCvCrnt = RowCvDataFirst + 1 To UBound(ColValues, 1)
If ColValues(RowCvCrnt, ColCvYear) <> ColValues(RowCvCrnt - 1, ColCvYear) Then
Years(ColYrEnd, RowYearsCrnt) = RowCvCrnt - 1
RowYearsCrnt = RowYearsCrnt + 1
Years(ColYrYear, RowYearsCrnt) = ColValues(RowCvCrnt, ColCvYear)
Years(ColYrStart, RowYearsCrnt) = RowCvCrnt
End If
Next
' Record final row for final string
Years(ColYrEnd, RowYearsCrnt) = UBound(ColValues, 1)
Debug.Print "Duration: " & Format(Timer - TimeStart, "#,##0.000")
' Output diagnostics
For RowYearsCrnt = 1 To 9
Debug.Print Years(ColYrYear, RowYearsCrnt) & "|" & _
Years(ColYrStart, RowYearsCrnt) & "|" & _
Years(ColYrEnd, RowYearsCrnt) & "|"
Next
' Note that rows are now in the second dimension hence the 2 in UBound(Years, 2)
For RowYearsCrnt = UBound(Years, 2) - 9 To UBound(Years, 2)
Debug.Print Years(ColYrYear, RowYearsCrnt) & "|" & _
Years(ColYrStart, RowYearsCrnt) & "|" & _
Years(ColYrEnd, RowYearsCrnt) & "|"
Next
End Sub
Other techniques
I considered introducing other techniques but I decided they were not useful for this requirement. Also, this answer is already long enough. I have provided much for you to think about and more would just be overload. As stated above I have reduced the run time for 35,000 rows from 8 minutes to 20 seconds to 1.5 seconds to .07 seconds.
Work slowly through my macros. I have hope I have provided adequate explanation of what each is doing. Once you know a statement exists, it is generally easy to look it up so there is not too much explanation of the statements. Come back with questions as necessary.
As stated earlier in comments, ReDim Preserve is an expensive call when working with large datasets and is generally avoided. Here is some commented code that should perform as desired. Tested on a dataset with 200,000 rows, it took less than 5 seconds to complete. Tested on a dataset with 1000 rows, it took less that 0.1 seconds to complete.
The code uses a Collection to get the unique values out of column A, and then builds the array based on those unique values and outputs the results to another sheet. In your original code, there was nowhere that the resulting array was output, so I just made something up and you'll need to adjust the output section as needed.
Sub tgr()
Dim ws As Worksheet
Dim rngYears As Range
Dim collUnqYears As Collection
Dim varYear As Variant
Dim arrAllYears() As Variant
Dim arrYearsData() As Variant
Dim YearsDataIndex As Long
Set ws = ActiveWorkbook.Sheets("Simple Boundary")
Set rngYears = ws.Range("A1", ws.Cells(Rows.Count, "A").End(xlUp))
If rngYears.Cells.Count < 2 Then Exit Sub 'No data
Set collUnqYears = New Collection
With rngYears
.CurrentRegion.Sort rngYears, xlAscending, Header:=xlYes 'Sort data by year in column A
arrAllYears = .Offset(1).Resize(.Rows.Count - 1).Value 'Put list of years in array for faster calculation
'Get count of unique years by entering them into a collection (forces uniqueness)
For Each varYear In arrAllYears
On Error Resume Next
collUnqYears.Add CStr(varYear), CStr(varYear)
On Error GoTo 0
Next varYear
'Ssize the arrYearsData array appropriately
ReDim arrYearsData(1 To collUnqYears.Count, 1 To 3)
'arrYearsData column 1 = Unique Year value
'arrYearsData column 2 = Start row for the year
'arrYearsData column 3 = End row for the year
'Loop through unique values and populate the arrYearsData array with desired information
For Each varYear In collUnqYears
YearsDataIndex = YearsDataIndex + 1
arrYearsData(YearsDataIndex, 1) = varYear 'Unique year
arrYearsData(YearsDataIndex, 2) = .Find(varYear, .Cells(1), , , , xlNext).Row 'Start Row
arrYearsData(YearsDataIndex, 3) = .Find(varYear, .Cells(1), , , , xlPrevious).Row 'End Row
Next varYear
End With
'Here is where you would output your results
'Your original code did not output results anywhere, so adjust sheet and start cell as necessary
With Sheets("Sheet2")
.UsedRange.Offset(1).ClearContents 'Clear previous result data
.Range("A2").Resize(UBound(arrYearsData, 1), UBound(arrYearsData, 2)).Value = arrYearsData
.Select 'This will show the output sheet so you can see the results
End With
End Sub
As you mentioned in the comments, if you are going to continue this way you definitely need to move that redim inside the if statement:
If Not Cells(row, 1).Value = Cells(row - 1, 1).Value Then
Years = ReDimPreserve(Years, i, 3)
Years(i - 1, 3) = row - 1
Years(i, 1) = Cells(row, 1).Value
Years(i, 2) = row
i = i + 1
End If
I think this redimming multi-dimensional arrays is overkill for you. I have a few recommendations:
Ranges
I notice that you are using 2 values to represent the start of a range and end of a range (years(i,2) is the start and years(i,3) is the end). Instead why not just use an actual range?
Create a range variable called startNode and when you find the end of the range create a Range object like with Range(startNode,endNode).
Your code will look something like this:
Sub DevideData()
Dim firstCell As Range
Dim nextRange As Range
Set firstCell = Cells(2,1)
ThisWorkbook.Worksheets("Simple Boundary").Activate
TotalRows = ThisWorkbook.Worksheets("Simple Boundary").Range("A100000").End(xlUp).row
For row = 3 To TotalRows
If Not Cells(row, 1).Value = Cells(row - 1, 1).Value Then
Set nextRange = Range(firstCell, Cells(row-1,1))
Set firstCell = Cells(row,1)
End If
Next row
End Sub
1D Array
Now you do not need to store 3 values! Just an array of ranges Which you can redim like this:
Dim years() As Range
'Do Stuff'
ReDim Preserve years(1 to i)
set years(i) = nextRange
i = i + 1
Note that the only reason that ReDimPreserve was created was so that you can redim both dimensions of a 2D array (normally you can only change the second dimension). With a 1D array you can freely redim without any troubles! :)
For Each Loop
Lastly I recommend that you use a for each loop instead of a regular for loop. It makes your intentions for the loop more explicit which makes your code more readable.
Dim firstCell as Range
Dim lastUniqueValue as Variant
Dim lastCell as Range
Dim iCell as Range
Set firstCell = Cells(3,1)
lastUniqueValue = firstCell.Value
Set lastCell = ThisWorkbook.Worksheets("Simple Boundary").Range("A100000").End(xlUp)
For Each iCell in Range(firstCell, lastCell)
If iCell.Value <> lastUniqueValue Then
lastUniqueValue = iCell.Value
'Do Stuff
End If
Next
Hope this helps! :)

Comparing two large lists with multiple columns (same number in each list) in excel VBA and do...more stuff

I've searched far and wide and I can't quite find anything to fit my needs.
The situation:
I have two lists of data with the same type data in each column (10 columns but the last 2 are useless), but the lists are of varying length (currently 55k in one, 18k in the other). The longer list is going to be a running list of items with the most up to date data in each column for the unique ID # in column A. The other list is linked to a SharePoint list that I update a couple times each day.
The need:
I need the list that updates from SharePoint to be compared to the running list. If there are matching Unique ID #'s in the lists, then the running list needs to be updated to the pulled data. If the running list doesn't contain a Unique ID that is in the pulled list, the new line needs to be added to the running list (which will be sorted later).
I first tried doing this with cell references in two for loops and for only 10 rows this worked fine. When I tried running it for every line, I had problems. So I tried using arrays instead, but this is new territory for me. The code seems to be working, but it's taking a really long time to run (I've let it go for 10 minutes before force stopping). I've tried adding some efficiency increases like turning off screen updating and calculations, but they shouldn't have any effect since I'm using arrays and not actually updating the cells until the array comparison is finished. If arrays are more efficient, great, but I don't know how to combine the data from the pulled list's array to the running list's array.
Here is the code that I have so far:
Sub Data_Compile_Cells()
Dim sdata As Worksheet, spull As Worksheet
Dim p As Long, d As Long, c As Long
Dim lrdata As Long, lrpull As Long
Dim rdata As Range, rpull As Range
Dim Newvalue As Boolean
Dim apull As Variant, adata As Variant
Dim nrows As Long, ncols As Integer
Set sdata = Sheets("Data")
Set spull = Sheets("Data Pull")
Newvalue = "FALSE"
i = 1
apull = spull.Range("A1").CurrentRegion
adata = sdata.Range("A1").CurrentRegion
'lrdata = sdata.Range("A" & Rows.Count).End(xlUp).Row
'lrpull = spull.Range("A" & Rows.Count).End(xlUp).Row
Application.Calculation = xlCalculationManual
Application.ScreenUpdating = False
sdata.Activate
'*****UniqueID Check******
'Run through list of Unique ID's pulled from SharePoint
For p = 2 To UBound(apull, 1)
'I tried to add a status bar to see if the code was actually running
'Application.StatusBar = "Progress: " & p & " of " & UBound(apull, 1) & " : " & Format(p / UBound(apull, 1), "0%")
'Compare each one to the Unique ID's already listed
For d = 2 To UBound(adata, 1)
'Check for matching Unique ID's
If adata(d, 1) = apull(p, 1) Then
'Check each cell in the row with the matching Unique ID
For c = 2 To 10
'If a cell does not have the same data, replace the Data array value with the value from the Pull array
If adata(p, c) <> apull(d, c) Then
adata(d, c) = apull(p, c)
End If
Next c
'If a match is found, skip to the next p value
Exit For
Else
Newvalue = "TRUE"
'Need code to append new line to Data array
End If
Next d
Next p
'Sort the data
'Range("A2").CurrentRegion.Sort key1:=Range("A2"), order1:=xlAscending, Header:=xlYes
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
End Sub
Any direction would be much appreciated.
This ran in <1 sec for me, using 20k rows "data", ~3k rows "pull" (mix of updates and new).
EDIT: tidied up and added some comments...
Sub tester()
Const NUM_NEW As Long = 20000 'large enough ?
Dim arrPull, arrData, arrDataId, arrNew()
Dim ubP As Long, ubD As Long
Dim numNew As Long, r As Long
Dim v, c As Long
Dim t, tmp, coll As Collection
t = Timer
'grab the current and new data
arrPull = Sheets("Pull").Range("A1").CurrentRegion.Value
arrData = Sheets("Data").Range("A1").CurrentRegion.Value
ubP = UBound(arrPull, 1)
ubD = UBound(arrData, 1)
numNew = 0
ReDim arrNew(1 To NUM_NEW, 1 To 10) 'array for new data
'create a collection to map ID to "row number"
Set coll = New Collection
For r = 1 To ubD
coll.Add Item:=r, Key:=arrData(r, 1)
Next r
For r = 1 To ubP
tmp = arrPull(r, 1)
v = 0
'collection has no "exists" function, so trap any error
On Error Resume Next
v = coll.Item(tmp)
On Error GoTo 0
If v > 0 Then
'Id already exists: update data
For c = 2 To 10
arrData(v, c) = arrPull(r, c)
Next c
Else
'new Id: add to the "new" array
numNew = numNew + 1
If numNew > NUM_NEW Then
MsgBox "Need larger `new` array!"
'a more sophisticated approach would be to dump the full
' array to the sheet and then redimension it for more
' data...
Exit Sub
End If
For c = 1 To 10
arrNew(numNew, c) = arrPull(r, c)
Next c
End If
Next r
'drop updated and new (if any) to the worksheet
With Sheets("Data")
.Range("A1").CurrentRegion.Value = arrData
If numNew > 0 Then
.Cells(ubD + 1, 1).Resize(numNew, 10).Value = arrNew
End If
End With
Debug.Print "Done in " & Timer - t & " sec"
End Sub
You would be better off using MSAccess to do this. Link to both tables and then do an inner join on the id field or which ever field links the items in the two lists.

Resources