I have defined 5 arrays.
One with undefined dimensions to store the other 4:
Dim outputArr() As Variant
and the rest as follows:
Dim Arr1(5, 0), Arr2(12, 0), Arr3(5, 0), Arr4(12, 0) As Variant
I assign the elements of the latter as follows:
Arr1(0, 0) = [{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}]
Arr1(1, 0) = [{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}]
Arr1(2, 0) = [{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}]
Arr1(3, 0) = [{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}]
Arr1(4, 0) = [{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}]
Arr1(5, 0) = [{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}]
The above is applied to each array.
When I use
ReDim outputArray(3, 0)
outputArr = [{Arr1, Arr2, Arr3, Arr4}]
I get a 'Type Mismatch' error.
When I do not use Evaluate and assign without ReDim
outputArr = Array(Arr1, Arr2, Arr3, Arr4)
I can see the elements and their values in the Watch window, but when I try to populate Defined Named Ranges with the elements of outputArr I get an empty output
Range("nRange1name").Value = outputArr(0)
Range("nRange2name").Value = outputArr(1)
Range("nRange3name").Value = outputArr(2)
Range("nRange4name").Value = outputArr(3)
How can I work around this?
The use of variants in the OP code introduces unecessary dimensions. I don't understand why two transpose functions are needed but the following code pastes 2d arrays satisfactorily.
Option Explicit
Sub TestArrays()
Dim outputArr As Variant
Dim Arr1 As Variant
Dim Arr2 As Variant
Dim Arr3 As Variant
Dim Arr4 As Variant
Arr1 = Array(Array(1, 2, 3, 4, 5, 6, 7, 8, 9), Array(1, 2, 3, 4, 5, 6, 7, 8, 9))
Arr2 = Array(Array(1, 2, 3, 4, 5, 6, 7, 8, 9), Array(1, 2, 3, 4, 5, 6, 7, 8, 9))
Arr3 = Array(Array(1, 2, 3, 4, 5, 6, 7, 8, 9), Array(1, 2, 3, 4, 5, 6, 7, 8, 9))
Arr4 = Array(Array(1, 2, 3, 4, 5, 6, 7, 8, 9), Array(1, 2, 3, 4, 5, 6, 7, 8, 9))
outputArr = Array(Arr1, Arr2, Arr3, Arr4)
' For Horizontal ranges
Range("A1:H2") = Application.WorksheetFunction.Transpose(Application.WorksheetFunction.Transpose(outputArr(2)))
'For Vertical ranges
Range("A4:B11") = Application.WorksheetFunction.Transpose(outputArr(3))
End Sub
You need to construct an actual 2D array to do something like that.
Dim arr(1 to 6, 1 to 12)
dim r as long, c as long
for r = lbound(arr, 1) to ubound(arr, 1)
for c = lbound(arr, 2) to ubound(arr, 2)
arr(r, c) = 0
next c
next r
Range("A1").Resize(ubound(arr, 1), ubound(arr, 2)).value = arr
Is there a function in Numpy that allows you to take 4 records at a time and see where they match with a second dataset? Once there is a match move to the next 4 records of the first data set. It wont always be every 4 records, but i am using this as an example.
So if dataset one had - 1,5,7,8,10,12,6,1,3,6,8,9
And the second dataset had - 1,5,7,8,11,15,6,1,3,6,10,6
My result will be: 1,5,7,8, 6,1,3,6
POST EDIT:
My second example datasets:
import numpy as np
a =np.array([15,15,0,0,10,10,0,0,2,1,8,8,42,2,4,4,3,1,1,3,5,6,0,9,47,1,1,7,7,0,0,45,12,17,45])
b = np.array ([6,0,0,15,15,0,0,10,10,0,0,2,1,8,8,42,2,4,4,3,3,4,6,0,9,47,1,1,7,7,0,0,45,12,16,1,9,3,30])
Here's another snapshot of an example:
Thank you in advance for looking at my question!!
Update: for the more difficult and more interesting alignment problem it is probably best not to reinvent the wheel but to rely on python's difflib:
from difflib import SequenceMatcher
import numpy as np
k=4
a = np.array([15,15,0,0,10,10,0,0,2,1,8,8,42,2,4,4,3,1,1,3,5,6,0,9,47,1,1,7,7,0,0,45,12,17,45])
b = np.array ([6,0,0,15,15,0,0,10,10,0,0,2,1,8,8,42,2,4,4,3,3,4,6,0,9,47,1,1,7,7,0,0,45,12,16,1,9,3,30])
sm = SequenceMatcher(a=a, b=b)
matches = sm.get_matching_blocks()
matches = [m for m in matches if m.size >= k]
# [Match(a=0, b=3, size=17), Match(a=21, b=22, size=12)]
consensus = [a[m.a:m.a+m.size] for m in matches]
# [array([15, 15, 0, 0, 10, 10, 0, 0, 2, 1, 8, 8, 42, 2, 4, 4, 3]), array([ 6, 0, 9, 47, 1, 1, 7, 7, 0, 0, 45, 12])]
consfour = [a[m.a:m.a + m.size // k * k] for m in matches]
# [array([15, 15, 0, 0, 10, 10, 0, 0, 2, 1, 8, 8, 42, 2, 4, 4]), array([ 6, 0, 9, 47, 1, 1, 7, 7, 0, 0, 45, 12])]
summary = [np.c_[np.add.outer(np.arange(m.size // k * k), (m.a, m.b)), c]
for m, c in zip(matches, consfour)]
merge = np.concatenate(summary, axis=0)
Below is my original solution assuming already aligned and same-length arrays:
Here is a hybrid solution using numpy to find consecutive matches and cutting them out and then list comp to apply length constraints:
import numpy as np
d1 = np.array([7,1,5,7,8,0,6,9,0,10,12,6,1,3,6,8,9])
d2 = np.array([8,1,5,7,8,0,6,9,0,11,15,6,1,3,6,10,6])
k = 4
# find matches
m = d1 == d2
# find switches between match, no match
sw = np.where(m[:-1] != m[1:])[0] + 1
# split
mnm = np.split(d1, sw)
# select matches
ones_ = mnm[1-m[0]::2]
# apply length constraint
res = [blck[i:i+k] for blck in ones_ for i in range(len(blck)-k+1)]
# [array([1, 5, 7, 8]), array([5, 7, 8, 0]), array([7, 8, 0, 6]), array([8, 0, 6, 9]), array([0, 6, 9, 0]), array([6, 1, 3, 6])]
res_no_ovlp = [blck[k*i:k*i+k] for blck in ones_ for i in range(len(blck)//k)]
# [array([1, 5, 7, 8]), array([0, 6, 9, 0]), array([6, 1, 3, 6])]
You can use matrix masking like,
import numpy as np
from scipy.sparse import dia_matrix
a = np.array([1,5,7,8,10,12,6,1,3,6,8,9])
b = np.array([1,5,7,8,11,15,6,1,3,6,10,6])
mask = dia_matrix((np.ones((1, a.size)).repeat(4, axis=0), np.arange(4)),
shape=(a.size, b.size), dtype=np.int)
print(mask.toarray())
matches = a[mask.T.dot(mask.dot(a == b) == 4).astype(np.bool)]
print(matches)
This will output,
array([[1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
[0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
[0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]])
[1 5 7 8 6 1 3 6]
You can think about how the matrix multiplication works to get this result.
Scaling
For scaling, I tested with 1e3, 1e5, and 1e7 elements and got,
1e3 - 0.019184964010491967
1e5 - 0.4330314120161347
1e7 - 144.54082221200224
See the gist. Not sure why such a hard jump at 1e7 elements.
This is an exercise is list comprehension. We have the data
data = [1,5,7,8,10,12,6,1,3,6,8,9]
search_data = [1,5,7,8,11,15,6,1,3,6,10,6]
First we can chunk the original data into blocks of length n
n = 4
chunks = [data[i:i + n] for i in range(len(data) - n + 1)]
search_chunks = [search_data[i:i + n] for i in range(len(search_data) - n + 1)]
Now we must select chunks from the first list that appear in the second list
hits = [c for c in chunks if c in search_chunks]
print hits
# [[1, 5, 7, 8], [6, 1, 3, 6]]
This may not be the optimal solution for long lists. It may improve performance to consider sets, if there are likely to repeated chunks
chunks = set(tuple(data[i:i + n]) for i in range(len(data) - n + 1))
search_chunks = set(tuple(search_data[i:i + n]) for i in range(len(search_data) - n + 1))
This can be quite competitive with above numpy solution, e.g.
import numpy as np
import time
# Generate data
len_ = 10000
max_ = 10
data = map(int, np.random.rand(len_) * max_)
search_data = map(int, np.random.rand(len_) * max_)
# Time list comprehension
start = time.time()
n = 4
chunks = set(tuple(data[i:i + n]) for i in range(len(data) - n + 1))
search_chunks = set(tuple(search_data[i:i + n]) for i in range(len(search_data) - n + 1))
hits = [c for c in chunks if c in search_chunks]
print time.time() - start
# Time numpy
a = np.array(data)
b = np.array(search_data)
mask = 1 * (np.abs(np.arange(a.size).reshape((-1, 1)) - np.arange(a.size) - 0.5) < 2)
start = time.time()
matches = a[mask.T.dot(mask.dot(a == b) == 4).astype(np.bool)]
print time.time() - start
It's typically faster here, but it depends on number of repeated chunks etc.