Turn Excel range into VBA string - arrays

I would like to turn values in given range into VBA string where original cell values are separated by any chosen column delimiter and row delimiter. Delimiters could be one character or longer strings. The row delimiter is the string at the end of the line. The string should be done just as we read text from left top corner, from left to right, to bottom right corner.
Here is an example of the VALUES in range A1:C5:
+----+----+----+
| A1 | B1 | C1 |
+----+----+----+
| A2 | B2 | C2 |
+----+----+----+
| A3 | B3 | C3 |
+----+----+----+
| A4 | B4 | C4 |
+----+----+----+
| A5 | B5 | C5 |
+----+----+----+
Desired results is a VBA string:
A1,B1,C1#$A$2,$B$2,$C$2#A3,B3,C3#A4,B4,C4#A5,B5,C5#
For the sake of readability I will show it like this:
A1,B1,C1#
A2,B2,C2#
A3,B3,C3#
A4,B4,C4#
A5,B5,C5#
As a column delimiter I have chosen , (comma), and as a row delimiter # sign. Of course these could be any characters like \r\n.
The reason why I want fast cooking of the string from range is because I want to to send it to SQL Server through ADO connection. As I have tested so far it is the fastest way to transfer lots of data on the fly. The twin question how to split this string on SQL Server is here: Split string into table given row delimiter and column delimiter in SQL server
Solution 1. Loop through all rows and columns. Question is if there be any more elegant way then just looping through all rows and columns? I would prefer VBA solution, not formula one.
Solution 2. Suggested by Mat's Mug in comment. CSV file is desired results. I would like to do it on the fly without saving. But good point - imitate CSV is what I want but I want it without saving.
Edit after bounty
Answer of Thomas Inzina works crazy fast and his solution is portable. Ordinary VBA loop turned out to be way faster then worksheet functions like JOIN on large data sets. I do not recommend using worksheet functions in VBA for that purpose. I have voted up everybody. Thank you all.

To optimize performance my function emulates a String Builder.
Variables
Text: A very large string to hold the data
CELLLENGTH: A contant that determines the size of the BufferSize
BufferSize: The initial size of Text string
Data(): An Array derived from the source range
As the rows and columns of the Data() array are iterated over the current element (Data(x, y)) value replaces a portion of the Text string. The text string is resized as needed. This reduces the number of concatenations immensely. The initial BufferSize is set pretty high. I got my best results, 0.8632813 Second(s), by reducing CELLLENGTH to 25.
Download Sample Data from Sample-Videos.com
Results
Code
Function getRangeText(Source As Range, Optional rowDelimiter As String = "#", Optional ColumnDelimiter As String = ",")
Const CELLLENGTH = 255
Dim Data()
Dim text As String
Dim BufferSize As Double, length As Double, x As Long, y As Long
BufferSize = CELLLENGTH * Source.Cells.Count
text = Space(BufferSize)
Data = Source.Value
For x = 1 To UBound(Data, 1)
If x > 1 Then
Mid(text, length + 1, Len(rowDelimiter)) = rowDelimiter
length = length + Len(rowDelimiter)
End If
For y = 1 To UBound(Data, 2)
If length + Len(Data(x, y)) + 2 > Len(text) Then text = text & Space(CDbl(BufferSize / 4))
If y > 1 Then
Mid(text, length + 1, Len(ColumnDelimiter)) = ColumnDelimiter
length = length + Len(ColumnDelimiter))
End If
Mid(text, length + 1, Len(Data(x, y))) = Data(x, y)
length = length + Len(Data(x, y))
Next
Next
getRangeText = Left(text, length) & rowDelimiter
End Function
Test
Sub TestGetRangeText()
Dim s As String
Dim Start: Start = Timer
s = getRangeText(ActiveSheet.UsedRange)
Debug.Print "Execution Time: "; Timer - Start; "Second(s)"
Debug.Print "Rows: "; ActiveSheet.UsedRange.Rows.Count; "Columns: "; ActiveSheet.UsedRange.Columns.Count
Debug.Print "Result Length: "; Format(Len(s), "#,###")
End Sub

Here's a quick way to test (Note: this will only work with Excel 2016 (or if you have the TextJoin() function).
First, in the empty column D, do =C1&"#", so you get your last column filled with the cell+#
Then, say in cell E1, =TEXTJOIN(",",TRUE,A1:C5)
(Note: TRUE there means to skip blanks. If you have blanks, and want to keep them, change that to FALSE).
THen, on that cell, run
=Substitute(E1,"#,","#")
Or combine the formulas into one: =SUBSTITUTE(TEXTJOIN(",",TRUE,A1:C4),"#,","#").
If you need vba, just throw the formula into a VBA macro and run like that.

Here is a UDF that returns the desired output:
EDIT Changed to add EOL at the end.
Option Explicit
Function MultiJoin(Rng As Range, Delimiter As String, EOL As String) As String
Dim V As Variant, W As Variant
Dim COL As Collection
Dim I As Long, J As Long
V = Rng
Set COL = New Collection
ReDim W(1 To UBound(V, 2))
For I = 1 To UBound(V, 1)
For J = 1 To UBound(V, 2)
W(J) = V(I, J)
Next J
COL.Add W
Next I
ReDim V(1 To COL.Count)
For I = 1 To COL.Count
V(I) = Join(COL(I), Delimiter)
Next I
W = Join(V, EOL)
MultiJoin = W & EOL
End Function
One could shorten the code by using WorksheetFunctions, but I would guess execution time would be slower.
Shortened Code
Option Explicit
Function MultiJoin(Rng As Range, Delimiter As String, EOL As String) As String
Dim V As Variant, W As Variant
Dim I As Long, J As Long
V = Rng
With WorksheetFunction
For I = 1 To UBound(V, 1)
V(I, 1) = Join(.Index(V, I, 0), Delimiter)
Next I
MultiJoin = Join(.Transpose(.Index(V, 0, 1)), EOL) & EOL
End With
End Function

This solution will require either a reference to the Microsoft Forms 2.0 Object Library in your project or some other way of fetching the contents of the clipboard (like through an API call).
Function TurnExcelRangeIntoVBAString(Optional cellDelimiter As String = ",", _
Optional rowDelimiter As String = "#") _
As String
Dim rng As Range
Set rng = ActiveSheet.UsedRange
rng.Copy
Dim clip As New MSForms.DataObject
Dim txt As String
clip.GetFromClipboard
txt = clip.GetText()
txt = Replace(Replace(txt, vbTab, cellDelimiter), vbCrLf, rowDelimiter)
TurnExcelRangeIntoVBAString = txt
End Function

you could try this
Option Explicit
Sub main()
Dim strng As String
Dim cell As Range
With Worksheets("TurnRangeIntoString") '<--| change "TurnRangeIntoString" to your actual worksheet name
For Each cell In Intersect(.UsedRange, .Columns(1)) '<--| loop through its column 1 cells
strng = strng & Join(Application.Transpose(Application.Transpose(.Range(cell, cell.End(xlToRight)).value)), ",") & "#" '<--| build string
Next cell
End With
MsgBox strng
End Sub

Sub aquatique()
dim a(),s$,i&,j&:a=selection.value
for i=1 to ubound(a)
for j=1 to ubound(a,2)
if j=1 then
if i=1 then
s= a(i,j)
else
s=s &"#" & vbnewline & a(i,j)
end if
else
s=s &";" & a(i,j)
end if
next
next
end sub
simple but does the job. Slow on huge ranges, you'd need to use "join"

How about this?:
Sub Concatenate()
Dim Cel As Range, Rng As Range
Dim sString As String, r As Long, c As Long, r2 As Long
Set Rng = Selection
r = Selection.Row
c = Selection.Column
r2 = Selection.Row
For Each Cel In Rng
r = Cel.Row
If sString = "" Then
sString = Cel.Value
Else
If r <> r2 Then sString = sString & "#" & Cel.Value
If r = r2 Then sString = sString & "," & Cel.Value
End If
r2 = Cel.Row
Next
sString = sString & "#"
Debug.Print sString
End Sub

Related

How to join returned values from named range separated by comma

I've spent hours trying to find out how to join returned values from a named range, but the result is a
run-time error 32 - Type mismatch.
As a newbie I'm still struggling with arrays, so maybe I've overlooked some detail. Thank you for helping me out.
Example: (B1)Benzine, (B2)Diesel, (B3)Hybride -> (E1)Gasoline, (E2)Diesel, (E3)Hybrid
This is the named range:
Another example (to be more clear):
Example 2: (B1)Benzine, (B3)Hybride -> (E1)Gasoline, (E3)Hybrid
Option Explicit
Sub splitter()
Dim i As Long
Dim w As Long
'Dim oWB As Workbook
Dim oWS As Worksheet
Dim oWS9 As Worksheet
Dim rngMOTOR As Range
Dim rngMOTOR2 As Range
Dim arrMOTOR() As Variant
Dim LastRow As Long
'Set oWB = Workbooks("BRONBESTAND.xlsm")
Set oWS = Sheets("ONDERDELEN")
Set oWS9 = Sheets("MOTOR") '5 columns: 1 Short & LONG + 1 NL + 3 Languages !!!!! WARNING
LastRow = oWS.Range("A" & Rows.Count).End(xlUp).Row
For i = 2 To LastRow 'Starting below headers
Set rngMOTOR = oWS.Cells(i, "M") 'MOTOR ...
Set rngMOTOR2 = oWS9.Range("MOTOR") 'MOTOR2: MOTOR - Bronbestand arrPOS = rngPOS2.Value
arrMOTOR = rngMOTOR2.Value
'*********
Dim txt As String
Dim j As Integer
Dim Splitted As Variant
Dim arrMOTORall As Variant
Dim arrMOTORsplit As Variant
Dim Motor3 As String
txt = oWS.Cells(i, "M") 'MOTOR ...
Debug.Print ("txt : ") & i & ": "; txt
If Not IsEmpty(txt) Then
Splitted = Split(txt, ", ")
For j = 0 To UBound(Splitted)
Cells(1, j + 1).Value = Splitted(j)
Debug.Print (" ---> Splitted: ") & Splitted(j)
'**** INSERT *****
For w = LBound(arrMOTOR) To UBound(arrMOTOR)
If arrMOTOR(w, 1) = Splitted(j) Then 'EX: B - Benzine
arrMOTORsplit = (arrMOTOR(w, 4)) '(arrMOTOR(y, 2)) -> 1=SHORT+LONG , 2=NL, 3=FR, 4=EN
Debug.Print (" ---> arrMOTORsplit: ") & i & ": " & arrMOTORsplit
'**** JOIN ****
arrMOTORall = Join(arrMOTORsplit, ", ")
Debug.Print ("arrMOTORall: ") & arrMOTORall
End If
Next w
Next j
End If
Next i
End Sub
Get comma separated strings for each column in named range
I didn't analyze your code, but this should work to receive the first three values joined
"Benzine, Diesel, Hybride" ' e.g. from first column
or
"Gasoline, Diesel, Hybrid" ' e.g. from the fourth column
from a named range "Motor" via the Application.Index function.
Notes
The parameter 0 in this Index function indicates to not choose a specific row, the Parameter ColNo chooses each of your columns in a Loop. A subsequent transposition allows to change the 2 dimensioned array values to a 1-dim array. The Join function needs a 1-dim array and concatenates the chosen column items therein.
Hint: The following sample code uses a fully qualified range reference assuming that you don't call the TestMe procedure from your Personal Macro Library. In the latter case you'd have to change references and workbook identification (not using ThisWorkbook!).
Example code
Option Explicit ' declaration head of your code module
Sub TestMe()
Dim v As Variant, ColNo As Long
' assign first three rows to variant 1-based 2-dim datafield array
v = ThisWorkbook.Worksheets("Motor").[Motor].Resize(3, 4) ' Named range value
' write comma separated list for each column
For ColNo = 1 To 4
Debug.Print Join(Application.Transpose(Application.Index(v, 0, ColNo)), ", ")
Next ColNo
End Sub
EDIT - Flexible Search in ANY ORDER to translate joined lists
This solution allows to return joined search words in any combination using the Application.Index function in an advanced way using row and column arrays as parameters. The main function getSplitters() creates a variant 2-dim array in only three steps without loops and redims and uses two language constants (Const DUTCH and Const ENGLISH).:
assigns data to variant 1-based 2-dim datafield array
gets only the selected rows based on comma separated string values
reduces the same array to Dutch and English columns
Calling Code
Due to your OP the calling code anylyzes all comma separated strings in Column M in your sheet "ONDERDELEN" as far as there are values in column A. This is made by passing these found string values to the main function getSplitters with an innovative approach to get results in only three steps without Loops (see function code below).
Translation is based on values in the named range Motor "B1:E4" in sheet "Motor" where rows comprise different sort of fuel with neighbouring columns for different languages (starting with Dutch in the first column and English in the fourth col).
Note that using VBA it is faster to loop through an array to get values than through a range.
Option Explicit ' declaration head of your code module
Const DUTCH As Integer = 1
Const ENGLISH As Integer = 4
Sub TranslateAnyFuelCombination()
' Purpose: returns comma separated lists in column "M" and translates from Dutch to English
' Example: "Benzine, Hybride, Diesel" (Dutch) gets to "Gasoline, Hybrid, Diesel" in English
Dim s As String
Dim oWS As Worksheet, i&, LastRow&, vMOTOR As Variant
Set oWS = Thisworkbook.Worksheets("ONDERDELEN") ' fully qualified reference
' Get last row of wanted data
LastRow = oWS.Range("A" & Rows.Count).End(xlUp).Row
vMOTOR = oWS.Range("M1:M" & LastRow)
For i = 2 To LastRow 'Starting below headers
Debug.Print getSplitters(vMOTOR(i, 1))
Next i
End Sub
Main function
Function getSplitters(ByVal sRows As String) As String
Dim i As Long, j As Long
Dim v As Variant, a As Variant
' [0] analyze selected rows string, e.g. "Benzine, Hybride, Diesel"
a = getRowAr(sRows) ' -> assign 1-dim Rows Array(1, 3, 2)
' [1] assign data to variant 1-based 2-dim datafield array
v = Application.Transpose(ThisWorkbook.Worksheets("Motor").[Motor]) ' Named range value
' [2] get only selected rows, e.g. 1st, 3rd and 2nd -> in free order (!) Benzine, Hybride, Diesel
v = Application.Transpose(Application.Index(v, _
Application.Evaluate("row(1:" & UBound(v, 2) & ")"), _
a)) ' transposed columns array = selected rows
' [3] reduce to Dutch and English columns
v = Application.Transpose(Application.Index(v, _
Application.Evaluate("row(1:" & (UBound(a) + 1) & ")"), _
Array(DUTCH, ENGLISH))) ' selected columns array (above array retransposed)
' [4] return concatenated strings
getSplitters = Join(Application.Transpose(Application.Transpose(Application.Index(v, 1, 0))), ", ") & " -> " & _
Join(Application.Transpose(Application.Transpose(Application.Index(v, 2, 0))), ", ")
End Function
Two helper functions
Function getRowAr(ByVal sList As String) As Variant
' Purpose: split comma separated list into 1-dim number array in FREE ORDER
' Example: "Benzine, Hybride, Diesel" -> Array(1, 3, 2)
Dim ar, i&
' change words in comma separated list to numbers
ar = Split(Replace(sList, " ", ""), ",")
For i = LBound(ar) To UBound(ar)
ar(i) = val(getNumber(ar(i))) ' change to numbers
Next i
getRowAr = ar ' return
End Function
Function getNumber(ByVal s As String) As Long
' Purpose: replaces dutch search words with corresponding row number
Dim arFuel
' get search words to 1-dim array
arFuel = Application.Index(ThisWorkbook.Worksheets("Motor").[Motor], 0, DUTCH)
' return corresponding number
getNumber = Application.Match(s, arFuel)
End Function
Addendum (Edit due to comment)
The above code works as intended if you are sure that the concatenated search words (or starting parts) actually match else an Error 13 is raised. You can solve this issue in two steps:
Insert an empty first row into your named range Motor (or fill it e.g. with ?, #N/A etc.)
Change the 2nd helper function as follows:
Edited function getNumber()
Function getNumber(ByVal s As String) As Long
' Purpose: replaces dutch search words with corresponding row number
Dim arFuel
' get search words to 1-dim array
arFuel = Application.Index(ThisWorkbook.Worksheets("Motor").[Motor], 0, DUTCH)
' return corresponding number
On Error Resume Next ' provide for not found case
getNumber = Application.Match(s, arFuel, 0) ' find only exact matches
If Err.Number <> 0 Then getNumber = 0 ' could be omitted in case of a zero return
End Function
With 2 arrays this is a possible solution:
Sub TestMe()
Dim inputString As String
Dim arrString As Variant
Dim arrResult As Variant
inputString = "Benzine, Diesel, Hybride"
arrString = Split(inputString, ",")
Dim total As Long: total = UBound(arrString)
ReDim arrResult(total)
Dim i As Long
For i = LBound(arrString) To UBound(arrString)
arrResult(total - i) = Trim(arrString(i))
Next i
Debug.Print Join(arrResult, " ,")
End Sub
However, there is a classic solution of this problem, reversing everything twice:
Sub TestMe()
Dim inputString As String
inputString = "Benzine, Diesel, Hybride"
inputString = StrReverse(inputString)
Dim arr As Variant: arr = Split(inputString, ",")
Dim i As Long
For i = LBound(arr) To UBound(arr)
arr(i) = Trim(StrReverse(arr(i)))
Next i
Debug.Print Join(arr, ", ")
End Sub

VBA Compare 2 arrays, write unique values to cell with comma delimiter

I have a series of 2 cells in which values are separated by a comma delimiter.
Example
Cell D1 = 1,2,3,4,5,6,7,8,9,10
Cell O1 = 1,2,3,4,5,6
I want to first use the split function to pass the values to an Array and subsequently compare those 2 Arrays to find out the unique/not double values.
These values then i want to write to another cell as values with a comma delimiter.
Based on this answer
Comparing two Dimension array
and something I found about adding values to an Array i tried my luck with this code
Sub compare()
Dim cont As Long
Dim x As Long
Dim y As Long
Dim Source As Variant
Dim Comparison As Variant
Dim Target As Variant
With ThisWorkbook.Worksheets("Open items")
For cont = 1 To .Cells(Rows.Count, 4).End(xlUp).Row
Source = Split(.Range("D" & cont).Value, ",")
Comparison = Split(.Range("O" & cont).Value, ",")
For x = LBound(Source) To UBound(Source)
For y = LBound(Comparison) To UBound(Comparison)
If Source(x, y) = !Comparison(x, y) Then
Target(UBound(Target)) = Source(x, y).Value
Next
Next
Next cont
End Sub
But seem to be stuck.
Is this the correct way to add a value to the Array Target?
How do I get the Array into the cell?
The result in my example should be for Target to contain "7", "8", "9" , and "10" and should be shown in a cell in the way
7,8,9,10
Thank you for your help!
Some issues:
Rows.Count will look in the active sheet, not necessarily in the "Open items" sheet. So you need to add the dot: .Rows.Count
Source(x, y) will not work, since Source only has one dimension. In fact y has nothing to do with Source. A similar remark holds for Comparison.
= ! is not a valid comparison operator. You maybe intended <>.
Target is not defined, and Target(UBound(Target)) will always refer to the same location. Instead, you could append the result to a string variable immediately.
Furthermore, I would use a Collection object for fast look up, so that the algorithm is not O(n²), but O(n):
Sub Compare()
Dim cont As Long
Dim source As Variant
Dim comparison As Variant
Dim part As Variant
Dim parts As Collection
Dim result As String
With ThisWorkbook.Worksheets("Open items")
For cont = 1 To .Cells(.Rows.Count, 4).End(xlUp).Row
source = Split(.Range("D" & cont).Value, ",")
comparison = Split(.Range("O" & cont).Value, ",")
' Add the source items in a collection for faster look-up
Set parts = New Collection
For Each part In source
parts.Add Trim(part), Trim(part)
Next
' Remove the comparison items from the collection
For Each part In comparison
On Error Resume Next ' Ignore error when part is not in parts
parts.Remove Trim(part)
If Err Then parts.Add Trim(part), Trim(part) ' Add part if not yet in parts
On Error GoTo 0 ' Stop ignoring errors
Next
' Turn the remaining collection to comma-separated string
result = ""
For Each part In parts
result = result & ", " & part
Next
result = Mid(result, 3) ' Remove first comma and space
' Store the result somewhere, for example in the E column
.Range("E" & cont).Value = result
Next cont
End With
End Sub
Alternative for Sorted Lists
When your source and comparison lists are sorted in numerical order, and you need the target to maintain that sort order, you could use a tandem-kind of iteration, like this:
Sub Compare()
Dim cont As Long
Dim source As Variant
Dim comparison As Variant
Dim x As Long
Dim y As Long
Dim result As String
With ThisWorkbook.Worksheets("Open items")
For cont = 1 To .Cells(.Rows.Count, 4).End(xlUp).Row
source = Split(.Range("D" & cont).Value, ",")
comparison = Split(.Range("O" & cont).Value, ",")
x = LBound(source)
y = LBound(comparison)
result = ""
Do While x <= UBound(source) And y <= UBound(comparison)
If Val(source(x)) < Val(comparison(y)) Then
result = result & ", " & Trim(source(x))
x = x + 1
ElseIf Val(source(x)) > Val(comparison(y)) Then
result = result & ", " & Trim(comparison(y))
y = y + 1
Else
x = x + 1
y = y + 1
End If
Loop
' Flush the remainder of either source or comparison
Do While x <= UBound(source)
result = result & ", " & Trim(source(x))
x = x + 1
Loop
Do While y <= UBound(comparison)
result = result & ", " & Trim(comparison(y))
y = y + 1
Loop
result = Mid(result, 3) ' Remove first comma and space
' Store the result somewhere, for example in the E column
.Range("E" & cont).Value = result
Next cont
End With
End Sub
Try this small UDF():
Public Function unikue(BigString As String, LittleString As String) As String
Dim B As Variant, L As Variant, Barr, Larr
Dim Good As Boolean
Barr = Split(BigString, ",")
Larr = Split(LittleString, ",")
For Each B In Barr
Good = True
For Each L In Larr
If L = B Then Good = False
Next
If Good Then unikue = unikue & "," & B
Next B
If unikue <> "" Then unikue = Mid(unikue, 2)
End Function
Couple of things with this code
the variable Target() - You never tell code how big this array is or if you want to make it bigger - my full code below will grow for each match that is found
Source(x, y).Value - You dont need to use Value for arrays. you also do not need x and y as you are only reading in one column you only need source(x)
Where I have wrote MISSING in the full code - these lines where missing and would have caused you issues.
The purpose of Found is that for every time source(x) is found in Comparison(y) then Found is incremented. If it has never been incremented then we can assume that it is to be captured in target.
One other note is that you do not specify where you want to output Target to. so currently the target array does not go anywhere
Sub compare()
Dim cont As Long
Dim x As Long
Dim y As Long
Dim Source As Variant
Dim Comparison As Variant
Dim Target() As Variant
ReDim Target(1)
With ThisWorkbook.Worksheets("Open items")
For cont = 1 To .Cells(.Rows.Count, 4).End(xlUp).Row
Source = Split(.Range("D" & cont).Value, ",")
Comparison = Split(.Range("O" & cont).Value, ",")
For x = LBound(Source) To UBound(Source)
Found = 0
For y = LBound(Comparison) To UBound(Comparison)
If Source(x) = Comparison(y) Then
Found = Found + 1
'count if found
End If 'MISSING
Next
'if values are found dont add to target
If Found = 0 Then
Target(UBound(Target)) = Source(x)
ReDim Preserve Target(UBound(Target) + 1)
End If
Next
Next cont
End With 'MISSING
End Sub

Calculatin within an Array

I'm a little bit familiar with VBA Excel 2013. My problem now is, that I need to sum the values where specific Strings are available.
I need to calculate within a column only the values where a lookup is true.
My table looks like:
9 | AD,DCO,PD
5 | AD
5 | PD
15 | PD
So the sum for AD would be 14 or DCO would be 9 or PD would be 29 an therefore the result table have to look like this:
AD | DCO | PD
—–—–—–—–—–––—–
14 | 9 | 29
Can anybody help me?
Forget VBA. Sumproduct solves this perfectly.
I'm assuming that your commas separate cells, so Column A has the numbers, then columns B-D have the letters. If they do not, you can always use the Text to Columns function in excel (look under the Data tab) to make this so.
Put the following function where you wnat the answer (say, B10):
=SUMPRODUCT(($A$1:$A$4) * ($B$1:$D$4 = $A$10))
You can then type in cell A10 the code letter and it will sum up column A, where columns B-D contain that text.
If you want to display all text options, you can make a list of them (say A10 = AD, A11 = DCO, A12 = PD), then in B10 put the same formula as above but remove the $ from in front of the number (so it reads $A10). You can then drag it down and it will tally up for each of the values in A10-A13 and display the results in B10 - B13.
If I have correctly interpreted your sample data into a pseudo-table then the native SUMIF function will conditionally sum the numbers in column A using a wildcard criteria against column B. Example in D2 (as the image below) would be =SUMIF($B$1:$B$4,"*"&D$1&"*",$A$1:$A$4).
             
Fill or copy right as necessary. The caution is that wildcards may make false positives when the criteria closely resembles other values. This is not the case with your sample data.
Problem Solved, but do you have any suggestion to speed up the code?
Sub Calculate()
Dim Dict As Object
Dim i As Long
Dim e As Long
Set Dict = CreateObject("Scripting.Dictionary")
Dim b, m As Long
Dim a As Long
Dim c As Variant
Dim j As Long
Dim sum_Dict As Variant
Dim column_Dict As Variant
i = 1
e = 16
a = i
a = a - 2
For i = i To e
For Each sum_Dict In Range("A" & i & ":" & "A" & e) 'Spalte V = Price
If sum_Dict = "" Then
i = i + 1
Else
For Each c In Range("B" & i & ":F" & i) 'All Columns from B til F
' iterating over the array
For j = LBound(column_Dict) To UBound(column_Dict)
' assigning the separated values to columns
b = CDbl(sum_Dict) 'convert to double
c = CStr(c) 'convert to string
If Dict.Exists(c) Then 'check if key exists
Dict.Item(c) = Dict.Item(c) + b 'if key exists sum the items
Else 'if the key does not exist create it
Dict.Add Key:=c, Item:=b
End If
Next j
Next c
i = i + 1
'extract keys into variant array
'Debug.Print "Array of Keys and items"
For m = 0 To Dict.Count - 1
'Debug.Print Dict.Keys()(m), Dict.Items()(m)
Next m
'Debug.Print Dict.Count & " Items in Dictionary"
End If
Next sum_Dict
Next
'-- output to sheet using first 1D Array
Range("H" & a + 2).Resize(1, _
UBound(Application.Transpose(Dict.Keys()))) = Dict.Keys()
'-- output to sheet using dictionary
Range("H" & a + 3).Resize(1, _
UBound(Application.Transpose(Dict.Items()))) = Dict.Items()
Dict.RemoveAll ' empty dictionary
End Sub

Efficient way to create an inverted index in VBA

I am creating an inverted index to get a dictionary of words with an associated list of the line numbers that the word appears on (starting the line numbers and a list of words that appear in a given cell within that line).
I have managed to get some code working for this, but I found dealing with adding to the arrays (the values in the dictionary) to be a little cumbersome and I wonder is there is a more efficient or more elegant way to handle this.
I am open to using arrays, collections or any other data type that can be easily searched to store the list of line numbers in the values of the dictionary. I have pasted a cut down version of my code to demonstrate the core problem below, the question is really just about the BuildInvertedIndex procedure, but the rest is included to try to make it easier to recreate the scenario:
Sub Test()
' minimum included here to demonstrate use of buildInvertedIndex procedure
Dim vRange As Range
Dim vDict As Dictionary
Set vRange = ActiveSheet.Range("F2:F20585")
Set vDict = New Dictionary
BuildInvertedIndex vDict, vRange
' test values returned in dictionary (word: [line 1, ..., line n])
Dim k As Variant, vCounter As Long
vCounter = 0
For Each k In vDict.Keys
Debug.Print k & ": " & ArrayToString(vDict.Item(k))
vCounter = vCounter + 1
If vCounter >= 10 Then
Exit For
End If
Next
End Sub
Sub BuildInvertedIndex(pDict As Dictionary, pRange As Range)
Dim cell As Range
Dim words As Variant, word As Variant, val As Variant
Dim tmpArr() As Long
Dim newLen As Long, i As Long
' loop through cells (one col wide so same as looping through lines)
For Each cell In pRange.Cells
' loop through words in line
words = Split(cell.Value)
For Each word In words
If Not pDict.exists(word) Then
' start line array with first row number
pDict.Add word, Array(cell.Row())
Else
i = 0
If Not InArray(cell.Row(), pDict.Item(word)) Then
newLen = UBound(pDict.Item(word)) + 1
ReDim tmpArr(newLen)
For Each val In tmpArr
If i < newLen Then
tmpArr(i) = pDict.Item(word)(i)
Else
tmpArr(i) = cell.Row()
End If
i = i + 1
Next val
pDict.Item(word) = tmpArr
End If
End If
Next word
Next cell
End Sub
Function ArrayToString(vArray As Variant, _
Optional vDelim As String = ",") As String
' only included to support test (be able to see what is in the arrays)
Dim vDelimString As String
Dim i As Long
For i = LBound(vArray) To UBound(vArray)
vDelimString = vDelimString & CStr(vArray(i)) & _
IIf(vCounter < UBound(vArray), vDelim, "")
Next
ArrayToString = vDelimString
End Function
To run this you will need values in column F of the active sheet (sentences), if you do not already have it you will also need to add a reference to the Microsoft Scripting Runtime in your VBA environment for the dictionary data type to be available (tools -> references -> Microsoft Scripting Runtime).
As you will see from the code this gets a bit messy where I have to insert a new line number into an existing array (that is stored as a value within the dictionary). As I do not know of a way to just extend this array (without clearing the existing values), I have used the variable tmpArr to create an array of the appropriate size and then copy the values one by one from the existing array in the dictionary and then add the current row number to the end. The temporary array is then used to replace the existing value for that key (the current word).
Any advice on this would be greatly appreciated.
I am open to using arrays, collections or any other data type
As I see, using collection instead array would be much simplier:
Sub BuildInvertedIndex(pDict As Dictionary, pRange As Range)
Dim cell As Range
Dim words, word
Dim i As Long
' loop through cells (one col wide so same as looping through lines)
For Each cell In pRange.Cells
' loop through words in line
words = Split(cell.Value)
For Each word In words
If Not pDict.Exists(word) Then
' initialize collection
pDict.Add word, New Collection
End If
'try to add to collection. If row is already in collecton, nothing happend. Storing key makes you sure there're only unique rows
On Error Resume Next
pDict.Item(word).Add Item:=cell.Row, Key:=CStr(cell.Row)
On Error GoTo 0
Next word
Next cell
End Sub
Next step, is to slightly modify ArrayToString to ColToString:
Function ColToString(vCol As Collection, _
Optional vDelim As String = ",") As String
' only included to support test (be able to see what is in the arrays)
Dim vDelimString As String
Dim i As Long
For i = 1 To vCol.Count
vDelimString = vDelimString & CStr(vCol.Item(i)) & _
IIf(i < vCol.Count, vDelim, "")
Next
ColToString = vDelimString
End Function
and the test subroutine (changed only one row - Debug.Print k & ": " & ColToString(vDict.Item(k)) and target range to "F2:F5"):
Sub Test()
' minimum included here to demonstrate use of buildInvertedIndex procedure
Dim vRange As Range
Dim vDict As Dictionary
Set vRange = ActiveSheet.Range("F2:F5")
Set vDict = New Dictionary
BuildInvertedIndex vDict, vRange
' test values returned in dictionary (word: [line 1, ..., line n])
Dim k As Variant, vCounter As Long
vCounter = 0
For Each k In vDict.Keys
Debug.Print k & ": " & ColToString(vDict.Item(k))
vCounter = vCounter + 1
If vCounter >= 10 Then
Exit For
End If
Next
'clean up memory
Set vDict = Nothing
End Sub
RESULT:
UPDATE:
to improve speed of your code you could store range in array (next approach work only with single-column range, but you could easily modify it):
Test sub:
Sub TestWirhArray()
' minimum included here to demonstrate use of buildInvertedIndex procedure
Dim vRange As Range
Dim vDict As Dictionary
Dim myArr As Variant
Set vDict = New Dictionary
Set vRange = ActiveSheet.Range("F2:F20585")
myArr = vRange.Value
BuildInvertedIndexWithArr vDict, myArr, vRange.Row
' test values returned in dictionary (word: [line 1, ..., line n])
Dim k As Variant, vCounter As Long
vCounter = 0
For Each k In vDict.Keys
Debug.Print k & ": " & ColToString(vDict.Item(k))
vCounter = vCounter + 1
If vCounter >= 10 Then
Exit For
End If
Next
'clean up memory
Set vDict = Nothing
End Sub
new version of BuildInvertedIndexWithArr:
Sub BuildInvertedIndexWithArr(pDict As Dictionary, pArr, firstRow As Long)
Dim cell, words, word
Dim i As Long, j As Long
j = firstRow
' loop through cells (one col wide so same as looping through lines)
For Each cell In pArr
' loop through words in line
words = Split(cell)
For Each word In words
If Not pDict.exists(word) Then
' initialize collection
pDict.Add word, New Collection
End If
On Error Resume Next
pDict.Item(word).Add Item:=j, Key:=CStr(j)
On Error GoTo 0
Next word
j = j + 1
Next cell
End Sub

Is it possible to fill an array with row numbers which match a certain criteria without looping?

I would like to fill an array in VBA with the row numbers of only rows which meet a certain criteria. I would like the fastest method possible (for example, something like RowArray = index(valRange=valMatch).row)
Below is the code for the (slow) range loop.
Current Code
Sub get_row_numbers()
Dim RowArray() As Long
Dim valRange As Range
Dim valMatch As String
Set valRange = ActiveSheet.Range("A1:A11")
valMatch = "aa"
ReDim RowArray(WorksheetFunction.CountIf(valRange, valMatch) - 1)
For Each c In valRange
If c.Value = valMatch Then RowArray(x) = c.Row: x = x + 1
Next c
End Sub
Still around 2-3 times the time of the efficient variant array from Chris, but the technique is powerful and has application beyond this question
One point to note is that Application.Transpose is limited to 65536 cells, so a longer range needs to be "chunked" into pieces.
Sub GetEm()
Dim x
x = Filter(Application.Transpose(Application.Evaluate("=IF(A1:A50000=""aa"",ROW(A1:a50000),""x"")")), "x", False)
End Sub
First copy the range to a variant array , then loop over the array
Arr = rngval
For I = 1 to ubound(arr)
If arr(I,1) = valMatch Then RowArray(x) = I: x = x + 1
Next
There is an assumption in the question title: that a looping solution is slow and a non-looping solution is faster. So, I conducted some comparisons to check that.
Test Case
I created some sample data consisting of 50,000 samples, and 50% matching values. For the fastest methods I created two more sample sets, again with 50,000 rows and one with 10% matching rows, another with 90% matching row.
I ran each of the posted methods over this data in a loop, repeating the logic 10 times (so times are for processing a total of 500,000 rows).
50% 10% 90%
ExactaBox 1300 1240 1350 ms
Scott Holtzman 415000
John Bustos 12500
Chris neilsen 310 310 310
Brettdj 970 970 970
OP 1530 1320 1700
So the moral is clear: just because it includes a loop, doesn't make it slow. What is slow is access the worksheet, so you should make every effort to minimise that.
Update
Added test of Brettdj's comment: single line of code
For completeness sake, here's my solution
Sub GetRows()
Dim valMatch As String
Dim rData As Range
Dim a() As Long, z As Variant
Dim x As Long, i As Long
Dim sCompare As String
Set rData = Range("A1:A50000")
z = rData
ReDim a(1 To UBound(z, 1))
x = 1
sCompare = "aa"
For i = 1 To UBound(z)
If z(i, 1) = sCompare Then a(x) = i: x = x + 1
Next
ReDim Preserve a(1 To x - 1)
End Sub
Building off what others have offered here, I've combined both methods along with some string manipulation to get the exact row numbers of any given range containing the desired match without looping.
The only note that differs from your code is that RowArray() is a String type. However, you could convert it to Long using CLng as you strip numbers out as needed, if you need to do that.
Sub get_row_numbers()
Dim rowArray() As String, valRange As Range, valMatch As String
Dim wks As Worksheet, I As Long, strAddress As String
Set wks = Sheets(1)
valMatch = "aa"
With wks
Set valRange = .Range("A1:A11")
Dim strCol As String
strCol = Split(valRange.Address, "$")(1)
'-> capture the column name of the evaluated range
'-> NB -> the method below will fail if a multi column range is selected
With valRange
If Not .Find(valMatch) Is Nothing Then
'-> make sure valMatch exists, otherwise SpecialCells method will fail
.AutoFilter 1, valMatch
Set valRange = .SpecialCells(xlCellTypeVisible)
'-> choose only cells where ValMatch is found
strAddress = valRange.Address '-> capture address of found cells
strAddress = Replace(Replace(strAddress, ":", ""), ",", "") '-> remove any commas and colons
strAddress = Replace(strAddress, "$" & strCol & "$", ",") '-> replace $column$ with comma
strAddress = Right(strAddress, Len(strAddress) - 1) '-> remove leading comma
rowArray() = Split(strAddress, ",")
'-> test print
For I = 0 To UBound(rowArray())
Debug.Print rowArray(I)
Next
End If 'If Not .Find(valMatch) Is Nothing Then
End With ' With valRange
End With 'With wks
End Sub
You may want to look at Find vs Match vs Variant Array which concludes that the variant array approach is fastest unless the hit density is very low.
But the fastest method of all is only for sorted data and exact match: use binary search to find the fisrt and last ocurrences and then get that subset of data into a variant array.
I still have a loop, but only through the necessary rows to populate the array:
Sub get_row_numbers()
Dim RowArray() As Long
Dim valRange As Range
Dim valMatch As String
Set valRange = ActiveSheet.Range("A1:A11")
valMatch = "aa"
ReDim RowArray(WorksheetFunction.CountIf(valRange, valMatch) - 1)
Dim c As Range
Dim x As Integer
Set c = valRange.Find(What:=valMatch, LookIn:=xlFormulas, SearchOrder:=xlByRows, SearchDirection:=xlNext)
Do
RowArray(x) = c.Row
Set c = valRange.FindNext(after:=c)
x = x + 1
Loop Until x = UBound(RowArray) + 1
End Sub
You have your range hard-coded in the example. Do you have a spare column to the right? If so, you could fill the cells to the right with 0 if it's not a match, or the row number if it is. Then pull that into an array and filter it. No loops:
Sub NoLoop()
Dim valMatch As String
Dim rData As Excel.Range, rFormula As Excel.Range
Dim a As Variant, z As Variant
Set rData = ThisWorkbook.Worksheets(1).Range("A1:A11") 'hard-coded in original example
Set rFormula = ThisWorkbook.Worksheets(1).Range("B1:B11") ' I'm assuming this range is currently empty
valMatch = "aa" 'hard-coded in original example
'if it's a valid match, the cell will state its row number, otherwise 0
rFormula.FormulaR1C1 = "=IF(RC[-1]=""" & valMatch & """,ROW(RC),0)"
a = Application.Transpose(rFormula.Value)
z = Filter(a, 0, False) 'filters out the zeroes, you're left with an array of valid row numbers
End Sub
I have to credit Jon49 at One-dimensional array from Excel Range for the Application.Transpose trick to get a 1-d array.
Everyone, thanks for your individual inputs.
ExactaBox, your solution has been much helpful to me. However, there is a catch in returning 0 value through formula
rFormula.FormulaR1C1= "=IF(RC[-1]=""" & valMatch & """,ROW(RC),0)".
Since VBA Filter function filters out values by making string comparisons, it would also filter out row numbers having zeroes in them. For example valid row numbers, 20, 30, 40 etc. shall also be filtered out because they contain zeroes, so it would be better to write a string in place of 0 in the formula, which could therefore be:
rFormula.FormulaR1C1= "=IF(RC[-1]=""" & valMatch & """,ROW(RC),""Valid"")"
as was also suggested by brettdj above, who used "x" string as the last argument.

Resources