Array search/compare is slow compared to Excel VBA - arrays

I just switched from VBA (Excel) to VB (Visual Studio Express 2013) and copied parts of my code over from VBA.
Now I'm wondering why VB is so slow.
I'm creating an array (IFS_BV_Assy) with 4 columns and about 4000 rows.
It contains some identical entries, so I compare every entry with every other one and overwrite the duplicate with an empty string.
The code looks like this:
For i = 1 To counter
For y = 1 To counter
If IFS_BV_Assy(1, y) = IFS_BV_Assy(1, i) And i <> y Then
If IFS_BV_Assy(2, i) < IFS_BV_Assy(2, y) Then
IFS_BV_Assy(1, i) = ""
Else
IFS_BV_Assy(1, y) = ""
End If
Exit For
End If
Next
Next
counter is the length of the array.
In VBA the loop takes about 1 second; in VB it takes about 30 seconds. Does anybody know why? (I record a timestamp between every step to see which one is slow, and this loop is the culprit.)
The Array looks like this:
(1,1) = 12.3.015 / (2,1) = 02
(1,2) = 12.3.016 / (2,2) = 01 <-- delete
(1,3) = 12.3.016 / (2,3) = 02 <-- keep, because 02 is newer than 01
(1,4) = 12.3.017 / (2,4) = 01
(1,5) = 12.3.018 / (2,5) = 01
Thanks in advance
Andy
Edit: I create the array like this:
strStartPath_BV_Assy = "\\xxx\xx\xx\"
myFile = Dir(strStartPath_BV_Assy & "*.*")
counter = 1
ReDim IFS_BV_Assy(0 To 2, 0 To 0)
IFS_BV_Assy(0, 0) = "Pfad"
IFS_BV_Assy(1, 0) = "Zg."
IFS_BV_Assy(2, 0) = "Rev"
Do While myFile <> ""
If UCase(Right(myFile, 3)) = "DWG" Or UCase(Right(myFile, 3)) = "PDF" Then
ReDim Preserve IFS_BV_Assy(0 To 2, 0 To counter)
IFS_BV_Assy(0, counter) = strStartPath_BV_Assy + myFile
IFS_BV_Assy(1, counter) = Left(Mid(myFile, 12), InStr(1, Mid(myFile, 12), "-") - 1)
IFS_BV_Assy(2, counter) = Mid(myFile, Len(myFile) - 8, 2)
counter = counter + 1
End If
myFile = Dir()
Loop

Maybe the data was a best case (around 4000 iterations) when it ran in VBA.
30 seconds seems a reasonable time for 4000 x 4000 = 16,000,000 iterations. 1 second is too low for that number of iterations.

Stokke suggested declaring the array as String instead of Object:
Dim IFS_BV_Assy(,) as String
I had created the module with Option Explicit Off, because I never saw any difference in VBA on that point. Now I declare every variable with Dim .. As ....
And now it's as fast as VBA =)
Learning = making mistakes.. =)
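As a side note on the algorithm itself, the pairwise comparison can be avoided altogether. The following is a minimal Python sketch, not the original poster's code: it keeps the highest revision per drawing number in a single pass using a dictionary. The function name keep_latest, the tuple layout and the file names are assumptions made for illustration.

```python
def keep_latest(entries):
    """Return one (path, drawing, revision) tuple per drawing number.

    `entries` is a list of (path, drawing, revision) tuples; this layout
    is an assumption made for the sketch.
    """
    latest = {}
    for path, drawing, revision in entries:
        current = latest.get(drawing)
        # Revisions are zero-padded strings such as "01", so plain string
        # comparison orders them correctly.
        if current is None or revision > current[2]:
            latest[drawing] = (path, drawing, revision)
    return list(latest.values())


# Hypothetical file names; drawing numbers and revisions follow the question.
sample = [
    ("a.dwg", "12.3.015", "02"),
    ("b.dwg", "12.3.016", "01"),
    ("c.dwg", "12.3.016", "02"),
    ("d.dwg", "12.3.017", "01"),
    ("e.dwg", "12.3.018", "01"),
]
result = keep_latest(sample)
```

With about 4000 rows this visits each row once instead of running up to 16,000,000 comparisons. It assumes that only one entry per drawing number should survive, which may not hold if a DWG and a PDF of the same revision both need to be kept.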

Related

VBA: Return the closest value using a 1-Dimensional Array

I am using the WorksheetFunction.Large and WorksheetFunction.CountIf commands to determine the closest "jaw size" using a 1-Dimensional array as the source data, shown below.
wsSheet.Range("H2").Value = WorksheetFunction.Large(myArray, WorksheetFunction.CountIf(myArray, ">" & SizePush) + 1)
The problem is that when I use whole numbers (1, 2, 3, 4), the resulting jaw size is not the closest value from the array but the second closest. The array I am using is shown in image 1 (myArray), and 'SizePush' refers to the following equation: (Start Diameter - (Start Diameter - End Diameter)) - 0.05.
[Image 1: a snippet of the jaw size array]
I have attached the code that I am using. If anyone can help, that would be greatly appreciated, because I cannot figure out why only whole numbers cause an issue.
Dim StartDiam, EndDiam, PReduction, Push1, Push2, Push3, Push4, SizePush
StartDiam = 0.5
EndDiam = 4.75
PReduction = Worksheets("Sheet1").Range("D2").Value
Push1 = Worksheets("Sheet1").Range("I2").Value
Push2 = Worksheets("Sheet1").Range("I3").Value
Push3 = Worksheets("Sheet1").Range("I4").Value
Push4 = Worksheets("Sheet1").Range("I5").Value
SizePush = Worksheets("Sheet1").Range("I6").Value
Dim myArray
Set myArray = Range("T2:T51")
Dim wsSheet As Worksheet
Set wsSheet = Worksheets("Sheet1")
If StartDiam < wsSheet.Range("B2").Value Then
If EndDiam > wsSheet.Range("C2").Value Then
'size of jaw if the push is one
If wsSheet.Range("I2").Value = Push1 Then
wsSheet.Range("H2").Value = WorksheetFunction.Large(myArray, WorksheetFunction.CountIf(myArray, ">" & SizePush) + 1)
Exit Sub
End If
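To make it easier to see what the Large/CountIf combination returns, here is a small Python sketch of the same calculation. The function name large_countif and the jaw sizes are hypothetical. It only mirrors the worksheet formula: the result is the largest array value that is less than or equal to the threshold.

```python
def large_countif(values, threshold):
    """Mimic Large(values, CountIf(values, ">" & threshold) + 1)."""
    # CountIf(values, ">threshold"): how many values are strictly greater.
    greater = sum(1 for v in values if v > threshold)
    # Large ranks from the biggest value downwards.
    ranked = sorted(values, reverse=True)
    # k-th largest with k = greater + 1 (0-based index `greater`).
    # Raises IndexError if every value exceeds the threshold.
    return ranked[greater]


jaws = [1.0, 1.5, 2.0, 2.5, 3.0]  # hypothetical jaw sizes

below_whole = large_countif(jaws, 1.95)  # threshold just under a whole number
at_whole = large_countif(jaws, 2.0)
```

In this sketch a threshold of 1.95 yields 1.5 while a threshold of 2.0 yields 2.0. One possible explanation for the behaviour described, not confirmed in the thread, is that subtracting 0.05 in SizePush pushes a whole-number target just below the matching array entry.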

Finding all possible combos for n * m array, excluding certain values

I have an array that can vary in size, with n columns and m rows, and I need to find all the combinations of one element for each row/column combination, but exclude any combinations where the element is zero. So, in practice, if I have:
Row  Item1  Item2  Item3
1    A      B      C
2    D      E      F
I will have 2^3 = 8 possible combinations: ABC, ABF, AEC, AEF, DBC, DBF, DEC, DEF.
But if instead of B I have a zero in row 1, Item2, I want to exclude every combination that uses that cell (ABC, ABF, DBC and DBF above), so I would end up with: AEC, AEF, DEC and DEF.
I found some code that give me all the possible combinations on a fixed number of columns (Macro to make all possible combinations of data in various columns in excel sheet), but it doesn't account for an array that can change dimensions, or for the exclusion rule above.
I'm just going to post the code for the simple (no zeroes) case so you can see where I'm going with this (of course I have realised that Base switches over to letters for radix 11 onwards so this might not be the smartest approach :) )
Function ListCombos(r As Range)
Dim s As String, result As String
Dim arr()
Dim j As Integer, offset As Integer
Dim rows As Integer, cols As Integer
Dim nComb As Long, i As Long
rows = r.rows.Count
cols = r.Columns.Count
nComb = rows ^ cols
ReDim arr(1 To nComb)
For i = 1 To nComb
s = Application.Base(i - 1, rows, cols)
result = ""
For j = 1 To cols
offset = CInt(Mid(s, j, 1))
result = result & r.Cells(1, 1).offset(offset, j - 1)
Next j
arr(i) = result
Next i
ListCombos = arr
End Function
This is the version skipping combinations which contain zeroes. The method is to move non-zero values to the first rows of a holding array so effectively if you start with something like this
You make it look like this
So you don't have to generate or check all the combinations that contain zeroes.
Then use mixed radix to cycle through the combinations:
Option Explicit
Option Base 1
Function ListCombosWithZeroes(r As Range)
Dim s As String, result As String
Dim arr()
Dim i As Integer, j As Integer, offset As Integer, count As Integer, carry As Integer, temp As Integer
Dim rows As Integer, cols As Integer
Dim nComb As Long, iComb As Long
Dim holdingArr(20, 20) As String
Dim countArr(20) As Integer
Dim countUpArr(20) As Integer
rows = r.rows.count
cols = r.Columns.count
' Move non-zero cells to first rows of holding array and establish counts per column
For j = 1 To cols
count = 0
For i = 1 To rows
If r.Cells(i, j) <> 0 Then
count = count + 1
holdingArr(count, j) = r.Cells(i, j)
End If
Next i
countArr(j) = count
Next j
' Calculate number of combos
nComb = 1
For j = 1 To cols
nComb = nComb * countArr(j)
Next j
ReDim arr(1 To nComb)
'Loop through combos
For iComb = 1 To nComb
result = ""
For j = 1 To cols
offset = countUpArr(j)
result = result & holdingArr(offset + 1, j)
Next j
arr(iComb) = result
'Increment countup Array - this is the hard part.
j = cols
'Set carry=1 to force increment on right-hand column
carry = 1
Do
temp = countUpArr(j) + carry
countUpArr(j) = temp Mod countArr(j)
carry = temp \ countArr(j)
j = j - 1
Loop While carry > 0 And j > 0
Next iComb
ListCombosWithZeroes = arr
End Function
You don't have to have equal numbers of letters per column.
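The same mixed-radix idea can be sketched in Python. This is an illustrative translation, not the answer's code: the function name and the list-of-rows input are assumptions. Each column keeps only its non-zero cells, and a counter with one digit per column (each digit having its own base) steps through the combinations.

```python
def list_combos_skipping_zeroes(table):
    """table is a list of rows; returns strings with one non-zero cell per column."""
    # Keep only the non-zero cells of each column (the "holding array").
    columns = [[row[j] for row in table if row[j] != 0]
               for j in range(len(table[0]))]
    counts = [len(col) for col in columns]
    if 0 in counts:
        return []  # a column with no usable cell allows no combination
    total = 1
    for n in counts:
        total *= n
    digits = [0] * len(columns)  # one mixed-radix digit per column
    combos = []
    for _ in range(total):
        combos.append("".join(str(col[d]) for col, d in zip(columns, digits)))
        # Increment the counter, starting at the right-hand column.
        j = len(digits) - 1
        carry = 1
        while carry and j >= 0:
            carry, digits[j] = divmod(digits[j] + carry, counts[j])
            j -= 1
    return combos


# The example from the question, with B replaced by zero.
table = [["A", 0, "C"], ["D", "E", "F"]]
combos = list_combos_skipping_zeroes(table)
```

For the example table this produces AEC, AEF, DEC and DEF, with the right-hand column varying fastest.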
Here's a solution. Probably not the most efficient, since it is O(n^2), but it works.
Caveats
I put a '.' instead of zero to avoid dealing with numeric vs alphanumeric values, but you can easily change this
Since I build the strings incrementally I need indices to be predictable. Hence I fill all the possible combinations and then remove the ones containing a '.' in a second pass
Global aws As Worksheet
Global ur As Range
Global ccount, rcount, size, rptline, rptblock, iblk, iln, idx As Integer
Global tempcombos(), combos() As String
Public Sub Calc_combos()
Set aws = Application.ActiveSheet
Set ur = aws.UsedRange
ccount = ur.Columns.Count
rcount = ur.Rows.Count
size = (rcount - 1) ^ (ccount - 1)
ReDim tempcombos(size - 1)
ReDim combos(size - 1)
rptline = size / (rcount - 1)
rptblock = 1
For c = 2 To ccount
idx = 0
For iblk = 1 To rptblock
For r = 2 To rcount
For iln = 1 To rptline
tempcombos(idx) = tempcombos(idx) & Cells(r, c)
idx = idx + 1
Next iln
Next r
Next iblk
rptline = rptline / (rcount - 1)
rptblock = rptblock * (rcount - 1)
Next c
idx = 0
For iln = 0 To size - 1
If InStr(tempcombos(iln), ".") = 0 Then
combos(idx) = tempcombos(iln)
idx = idx + 1
End If
Next iln
End Sub
The Python way:
from dataclasses import dataclass, field
from itertools import product
from random import randint
from typing import Dict, List
@dataclass
class PriceComparison():
    rows : int
    cols : int
    maxprice : int = 50
    threshold : int = 0
    itemcodes : List[List[str]] = field(init=False)
    pricelist : Dict[str, int] = field(init=False)
    def __post_init__(self):
        ##create sample data
        self.itemcodes = [[f'A{r+self.cols*c:03d}' for c in range(self.rows)] for r in range(self.cols)]
        print(self.itemcodes)
        self.pricelist = {self.itemcodes[c][r]:randint(0,self.maxprice) for r in range(self.rows) for c in range(self.cols)}
        ##remove items with price = 0
        for col in self.itemcodes:
            for item in col[:]:
                if self.pricelist[item] == 0:
                    print(f'removing {item} from {col}')
                    col.remove(item)
                    del self.pricelist[item]
    def find_cheapest(self):
        iterations = 1
        for col in self.itemcodes:
            iterations *= len(col)
        print(f'this may require {iterations} iterations!')
        cheapest = self.maxprice * self.cols + 1
        for i, combo in enumerate(product(*self.itemcodes)):
            ##dummy price calculation
            price = sum([self.pricelist[item] for item in combo]) * randint(1,10) // 10
            if price < cheapest:
                print(f'current cheapest is {price} at iteration {i}')
                cheapest = price
                if price < self.threshold:
                    print('under threshold: returning')
                    break
        return cheapest
Some notes:
I assume the cheapest combo is not simply given by selecting the cheapest item in each column, otherwise we would not need all this complicated machinery; so I inserted a random coefficient while calculating the total price of a combo - this should be replaced with the actual formula
I also assume we have item codes in our input table, with prices for each item stored elsewhere. As sample data I create codes from 'A000' to 'Axxx', and assign a random price between 0 and a maxprice to each one
Items with price = 0 are removed immediately, before the search for the cheapest combo
For large input tables the search will take a very long time. So although it wasn't requested I also added an optional threshold parameter: if we find a total price under that value we consider it is cheap enough and stop the search
EDIT
The following is a Python 3.5 compatible version.
However it must be noted that with a 10x15 input table the number of required iterations will be somewhere near 1E+15 (something less actually, depending on how many cells we are able to ignore as "obvious outliers"). Even if we check 1 million combos per second it will still run for (something less than) 1E+09 seconds, or about 32 years.
So we need a way to improve our strategy. I integrated two options:
Setting a threshold, so that we don't search for the actual best price but stop as soon as we find an "acceptable" one
Splitting the tables in "zones" (subsets of columns), looking for the best partial solution for each zone and then combining them.
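The runtime estimate and the effect of splitting into zones can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the rate of 1 million combos per second is the figure assumed in the text above, and the variable names are made up for illustration.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

rows, cols = 10, 15
full_search = rows ** cols           # 10^15 combinations without zones
rate = 10 ** 6                       # assumed: 1 million combos per second
years = full_search / rate / SECONDS_PER_YEAR

zone_width = 3
zones = cols // zone_width           # 5 zones of 3 columns each
zoned_search = zones * rows ** zone_width  # each zone is searched on its own

print('full search: %e combos, about %.1f years' % (full_search, years))
print('zoned search: %d combos' % zoned_search)
```

The zoned search examines at most 5 x 10^3 combinations instead of 10^15, at the cost of only finding the best combination within each zone rather than a guaranteed global optimum.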
Sample runs:
##10 x 15, 5 zones, each 3 columns wide
this may require up to 1.000000e+03 iterations!
...
current best price is 1 at iteration 71 in 0.06 secs
this may require up to 1.000000e+03 iterations!
...
current best price is 2 at iteration 291 in 0.11 secs
this may require up to 1.000000e+03 iterations!
...
current best price is 1 at iteration 330 in 0.07 secs
this may require up to 8.100000e+02 iterations!
...
current best price is 4 at iteration 34 in 0.09 secs
this may require up to 1.000000e+03 iterations!
...
current best price is 1 at iteration 82 in 0.07 secs
['A000', 'A106', 'A017', 'A033', 'A139', 'A020', 'A051', 'A052', 'A008', 'A009', 'A055', 'A131', 'A147', 'A133', 'A044']
##10 x 15, no zones, threshold = 25
this may require up to 8.100000e+14 iterations!
...
current best price is 24 at iteration 267493282 in 1033.24 secs
under threshold: returning
['A000', 'A001', 'A002', 'A003', 'A004', 'A005', 'A051', 'A052', 'A008', 'A039', 'A055', 'A071', 'A042', 'A133', 'A044']
Code follows:
from itertools import product
from random import randint
from time import time
class PriceComparison():
    def __init__(self, rows, cols, zones = [], maxprice = 50, threshold = 0):
        self.rows = rows
        self.cols = cols
        if zones == []:
            self.zones = [cols]
        else:
            self.zones = zones
        self.maxprice = maxprice
        self.threshold = threshold
        self.__post_init__()
    def __post_init__(self):
        ##create sample data
        self.itemcodes = [['A%03d' % (r+self.cols*c) for c in range(self.rows)] for r in range(self.cols)]
        print(self.itemcodes)
        self.pricelist = {self.itemcodes[c][r]:randint(0,self.maxprice) for r in range(self.rows) for c in range(self.cols)}
        ##remove items with price = 0
        for col in self.itemcodes:
            for item in col[:]:
                if self.pricelist[item] == 0:
                    print('removing %s from %s' % (item, col))
                    col.remove(item)
                    del self.pricelist[item]
    def find_cheapest(self, lo, hi):
        iterations = 1
        for col in self.itemcodes[lo:hi]:
            iterations *= len(col)
        start = time()
        print('\nthis may require up to %e iterations!' % (iterations))
        bestprice = self.maxprice * self.cols + 1
        for i, combo in enumerate(product(*self.itemcodes[lo:hi])):
            ##dummy price calculation
            price = sum([self.pricelist[item] for item in combo]) * randint(1,10) // 10
            if price < bestprice:
                elapsed = time() - start
                print('current best price is %d at iteration %d in %.2f secs' % (price, i, elapsed))
                cheapest = combo
                bestprice = price
                if price < self.threshold:
                    print('under threshold: returning')
                    break
        return cheapest
    def find_by_zones(self):
        print(self.zones)
        fullcombo = []
        lo = 0
        for zone in self.zones:
            hi = lo + zone
            fullcombo += self.find_cheapest(lo, hi)
            lo = hi
        return fullcombo

VBA Runtime Error 9 (Excel 2007)

I have the following problem. I am creating an Excel worksheet with ActiveX elements to calculate several values (for a university class). In the following code I sometimes (not every time) get runtime error 9, saying that the index is out of range (hopefully I translated that correctly into English). I am new to VBA. I know that several similar problems have already been asked about, but I struggle to adapt the solutions to my code because I understand neither the problem in my code nor the solutions to those problems.
I marked the line for which the error occurs with stars.
I would be really thankful if anybody could explain, why this problem occurs in my code sometimes and how to solve it properly.
Thank you in advance.
Here's the code:
Sub calcinull()
Dim ione(4), itwo(4), ii, ints(4), cs(4), io, it As Double
Dim a, b, c As Double
ione(0) = 0
ione(1) = 10
ione(2) = 20
ione(3) = 30
ione(4) = 40
itwo(0) = 100
itwo(1) = 90
itwo(2) = 80
itwo(3) = 70
itwo(4) = 60
For b = 0 To 4
ii = ione(b) + (((itwo(b) - ione(b)) * (NPV(ione(b))) / (NPV(ione(b)) - NPV(itwo(b)))))
ints(b) = ii
cs(b) = NPV(ii)
Next b
Dim AbsInt(4), AbsCs(4) As Double
For a = 0 To 4
AbsInt(a) = VBA.Abs(ints(a))
AbsCs(a) = VBA.Abs(cs(a))
Next a
Dim pos As Integer
pos = Application.Match(Application.Min(AbsCs), AbsCs, 0)
*ii = ints(pos)*
If NPV(ii) > 0 Then
io = ii
If pos > 0 Then
it = itwo(pos - 1)
Else
it = itwo(0)
End If
ElseIf NPV(ii) < 0 Then
it = ii
If pos > 0 Then
io = ione(pos - 1)
Else
io = ione(0)
End If
ElseIf NPV(ii) = 0 Then
inull = ii
End If
For c = 1 To 30
Do Until (NPV(io) - NPV(it)) <> 0
io = io - 0.1
it = it + 0.1
Loop
ii = io + (((it - io) * (NPV(io)) / (NPV(io) - NPV(it))))
If NPV(ii) > 0 Then
io = ii
If it > (io + 0.5) Then
it = it - 0.5
End If
ElseIf NPV(ii) < 0 Then
it = ii
If io < (it - 0.5) Then
io = io + 0.5
End If
ElseIf NPV(ii) = 0 Then
inull = ii
Exit For
End If
Next c
inull = ii
End Sub
As ints is an array with 5 elements (0..4), probably pos is > 4 when this error occurs.
If you can't tell why, maybe put something like this after the Match statement and set a breakpoint on the print while testing.
if pos < 0 or pos > 4 then
debug.print pos & " is off"
end if
Alright guys, I solved it. The problem was that the arrays use indices from 0 to x, whereas Match returns the nth position in the array, which means that my "pos" variable is always one above the array index.
Thank you all for your help!
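The off-by-one described above can be illustrated with a short Python sketch. The helper match_position is a hypothetical stand-in for a 1-based exact-match lookup like Application.Match with match type 0, and the sample values are invented.

```python
def match_position(values, target):
    """1-based position of target, like an exact-match MATCH."""
    return values.index(target) + 1


abs_cs = [5.0, 0.2, 3.1, 0.9, 7.4]          # hypothetical values, indices 0..4
pos = match_position(abs_cs, min(abs_cs))   # 1-based position of the minimum
index = pos - 1                             # 0-based index into the array

# When the minimum sits in the last slot, the 1-based position equals the
# array length, which is one past the highest valid 0-based index.
last_pos = match_position([5.0, 4.0, 3.0, 2.0, 1.0], 1.0)
```

This also suggests why the error only appears sometimes: using pos directly as an index silently reads the wrong element in most cases and only runs past the end when the minimum is the last element. In the VBA code the corresponding adjustment would presumably be to read ints(pos - 1).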

Parallelize data processing

I have a large matrix data that I want to "organize" in a certain way. The matrix has 5 columns and about 2 million rows. The first 4 columns are characteristics of each observation (these are integers) and the last column is the outcome variable I'm interested in (this contains real numbers). I want to organize this matrix in an Array of Arrays. Since data is very large, I'm trying to parallelize this operation:
addprocs(3)
@everywhere data = readcsv("datalocation", Int)
@everywhere const Z = 65
@everywhere const z = 16
@everywhere const Y = 16
@everywhere const y = 10
@everywhere const arr = Array{Vector}(Z-z+1,Y-y+1,Z-z+1,Y-y+1)
@parallel (vcat) for a1 in z:Z, e1 in y:Y, a2 in z:Z, e2 in y:Y
arr[a1-z+1,e1-y+1,a2-z+1,e2-y+1] = data[(data[:,1].==a1) & (data[:,2].==e1) & (data[:,3].==a2) & (data[:,4].==e2), end]
end
However I get an error when I try to run the for loop:
Error: syntax: invalid assignment location
After the loop is finished, I would like to have arr available to all processors. What am I doing wrong?
EDIT:
The input matrix data looks like this (rows in no particular order):
16 10 16 10 100
16 10 16 11 200
20 12 21 13 500
16 10 16 10 300
20 12 21 13 500
Notice that some rows can be repeated, and some others will have the same "key" but a different fifth column.
The output I want looks like this (notice how I'm using the dimensions of arr as "keys" for a "dictionary"):
arr[16-z+1, 10-y+1, 16-z+1, 10-y+1] = [100, 300]
arr[16-z+1, 10-y+1, 16-z+1, 11-y+1] = [200]
arr[20-z+1, 12-y+1, 21-z+1, 13-y+1] = [500, 500]
That is, the element of arr at index (16-z+1, 10-y+1, 16-z+1, 10-y+1) is the vector [100, 300]. I don't care about the ordering of the rows or the ordering of the last column of vectors.
Does this work for you? I tried to simulate your data by repeating the snippet that you gave of it 1000 times. It's not as elegant as I would have wanted and in particular, I couldn't quite get the remotecall_fetch() working like I wanted (even when wrapping it with @async) so I had to split the calling and the fetching into two steps. Let me know though how this seems.
addprocs(n)
@everywhere begin
if myid() != 1
multiplier = 10^3;
Data = readdlm("/path/to/Input.txt")
global data = kron(Data,ones(multiplier));
println(size(data))
end
end
@everywhere begin
function Select_Data(a1, e1, a2, e2, data=data)
return data[(data[:,1].==a1) & (data[:,2].==e1) & (data[:,3].==a2) & (data[:,4].==e2), end]
end
end
n_workers = nworkers()
function next_pid(pid, n_workers)
if pid <= n_workers
return pid + 1
else
return 2
end
end
const arr = Array{Any}(Z-z+1,Y-y+1,Z-z+1,Y-y+1);
println("Beginning Processing Work")
@sync begin
pid = 2
for a1 in z:Z, e1 in y:Y, a2 in z:Z, e2 in y:Y
pid = next_pid(pid, n_workers)
arr[a1-z+1,e1-y+1,a2-z+1,e2-y+1] = remotecall(pid, Select_Data, a1, e1, a2, e2)
end
end
println("Retrieving Completed Jobs")
@sync begin
pid = 2
for a1 in z:Z, e1 in y:Y, a2 in z:Z, e2 in y:Y
arr[a1-z+1,e1-y+1,a2-z+1,e2-y+1] = fetch(arr[a1-z+1,e1-y+1,a2-z+1,e2-y+1])
end
end
Note: I initially misinterpreted your question. I had thought that you were trying to split the data amongst your workers, but I now see that isn't quite what you were after. I wrote up some simplified examples of ways that can be accomplished. I'll leave them up as a response in case anyone in the future finds them useful.
Get started:
writedlm("path/to/data.csv", rand(100,10), ',')
addprocs(4)
Option 1:
function sendto(p::Int; args...)
for (nm, val) in args
@spawnat(p, eval(Main, Expr(:(=), nm, val)))
end
end
Data = readcsv("/path/to/data.csv")
for (idx, pid) in enumerate(workers())
Start = (idx-1)*25 + 1
End = Start + 24
sendto(pid, Data = Data[Start:End,])
end
Option 2:
@everywhere begin
if myid() != 1
Start = (myid()-2)*25 + 1
End = Start + 24
println(Start)
println(End)
Data = readcsv("path/to/data.csv")[Start:End,:]
end
end
# verify everything looks right for what got sent
@everywhere if myid()!= 1 println(typeof(Data)) end
@everywhere if myid()!= 1 println(size(Data)) end
Option 3:
for (idx, pid) in enumerate(workers())
Start = (idx-1)*25 + 1
End = Start + 24
sendto(pid, Start = Start, End = End)
end
@everywhere if myid()!= 1 Data = readcsv("path/to/data.csv")[Start:End,:] end
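Separately from the parallel approaches above, the grouping the question asks for (rows keyed by the first four columns, with the fifth column collected into vectors) can be done in one pass over the data, which may make parallelism unnecessary. The following Python sketch is an illustration only, not a translation of any answer; the function name group_outcomes is an assumption, and it stores the groups in a dictionary rather than a 4-dimensional array.

```python
from collections import defaultdict


def group_outcomes(rows):
    """Collect the fifth column of each row under its (a1, e1, a2, e2) key."""
    groups = defaultdict(list)
    for a1, e1, a2, e2, outcome in rows:
        groups[(a1, e1, a2, e2)].append(outcome)
    return dict(groups)


# The sample rows from the question.
rows = [
    (16, 10, 16, 10, 100),
    (16, 10, 16, 11, 200),
    (20, 12, 21, 13, 500),
    (16, 10, 16, 10, 300),
    (20, 12, 21, 13, 500),
]
grouped = group_outcomes(rows)
```

The original loop scans all 2 million rows once per key combination, whereas this visits each row exactly once. Keys that never occur in the data are simply absent from the dictionary.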

Lookup inside arrays or quick Lookup in Excel

I have two ranges.
The first range is column D ("D:D") on sheet 1, from the second row to the last row. These are the lookup values, about 140,000 rows.
x1 = Worksheets("1").range(Worksheets("1").Cells(2, "D").Address, Worksheets("1").Cells(Rows.Count, "D").End(xlUp)).value
The second range is "A:D" on sheet 2, from the second row to the last row. This is the table array, about 500,000 rows.
x2 = Worksheets("2").range(Worksheets("2").Cells(2, 4).Address, Worksheets("2").Cells(Rows.Count, 1).End(xlUp)).value
Here I am trying to look up values inside the arrays:
ReDim ListBoxArrSplitToRows(1 To 4, 1 To UBound(x2, 1))
CX = UBound(x2, 2)
For ii = 2 To UBound(x1, 1)
For i = 1 To UBound(x2, 1)
SearchInst = x2(i, 1)
txt = x1(ii, 1)
If InStr(SearchInst, txt) Then
zz = zz + 1
For counter = 1 To 4
ListBoxArrSplitToRows(counter, zz) = x2(i, counter)
Next counter
Else
End If
Next i
Next ii
If zz <> 0 Then ReDim Preserve ListBoxArrSplitToRows(1 To 4, 1 To zz) Else ReDim ListBoxArrSplitToRows(0, 0): MsgBox "No matches"
Worksheets(1).Cells(2, "E").Resize(UBound(ListBoxArrSplitToRows, 2), 3) = ListBoxArrSplitToRows
ii=3
Ubound(x1,1) = 136586
Ubound(x2,1) = 496369
zz=1
How can I quickly look up values across two large ranges? This code takes 30 minutes to look up the values, which is too long.
As far as I can tell, the cause of your "Subscript out of range" error is that zz is greater than UBound(x2, 1) which makes it out of ListBoxArrSplitToRows bounds.
The quick fix is to move this line
If zz <> 0 Then ReDim Preserve ListBoxArrSplitToRows(1 To 4, 1 To zz)
one line higher up, i.e. before Next i. Though get rid of Else ReDim ListBoxArrSplitToRows(0, 0); I don't know what this achieves.
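On the speed question, the nested loop compares every lookup value against every table row, which is roughly 140,000 x 500,000 comparisons. If, and this is an assumption that differs from the InStr substring test in the question, the lookup values are expected to equal the first column of the table exactly, the table can be indexed once and each lookup becomes a dictionary access. A minimal Python sketch with hypothetical names and data:

```python
from collections import defaultdict


def lookup_rows(lookup_values, table_rows):
    """Return all table rows whose first column equals a lookup value.

    Assumes exact matches; the question's code uses a substring test instead.
    """
    # Build the index once: first-column value -> list of matching rows.
    index = defaultdict(list)
    for row in table_rows:
        index[row[0]].append(row)
    # Collect matches in a growable list instead of a pre-sized array.
    matches = []
    for value in lookup_values:
        matches.extend(index.get(value, []))
    return matches


# Hypothetical data for illustration.
table_rows = [("K1", 1, 2, 3), ("K2", 4, 5, 6), ("K1", 7, 8, 9)]
lookup_values = ["K1", "K3"]
matches = lookup_rows(lookup_values, table_rows)
```

The same idea carries over to VBA with a Scripting.Dictionary keyed on the first column. If substring matching is genuinely required, this shortcut does not apply and the scan over the table remains necessary.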

Resources