Working with arrays in VBA memory and avoiding loops using vectorization

I am well versed in MATLAB, but these days I find myself working in VBA, as MATLAB is less accessible to me, and I struggle to do things in VBA (like vectorization) that I could easily handle in MATLAB.
Let's say I have a data table in Excel of the following form:
record startDate endDate count
1 100 103 10
2 98 102 5
3 101 104 4
I would like to do all my processing in memory (avoiding loops) and then output a results table that looks like this:
1 2 3 Sum
98 0 5 0 5
99 0 5 0 5
100 10 5 0 15
101 10 5 4 19
102 10 5 4 19
103 10 0 4 14
104 0 0 4 4
Basically, I start with the earliest date and step through to the latest date; for each date I check whether it falls within each record's date window, and if it does I apply that record's count to that day, then sum across records.
I created the output above using a simple worksheet function, but I would like to replicate the process in VBA, specifically avoiding loops, or at least reducing to a single loop instead of nested loops.
If I were in MATLAB I would find the logical array that meets a condition, for example:
numDays = 7;
numRecords = 3;
startDate = [100; 98; 101];
endDate = [103; 102; 104];
dateVector = [98; 99; 100; 101; 102; 103; 104];
count = [10; 5; 4];
dateLogic = false(numDays, numRecords);
for d = 1:numDays
    dateLogic(d,:) = dateVector(d) >= startDate(:,1) & dateVector(d) <= endDate(:,1);
end
countMatrix = bsxfun(@times, dateLogic, count.');
Sum = sum(countMatrix, 2);
This gives me a matrix of zeros and ones that I can multiply element-wise with the count vector to get my per-record counts and ultimately my Sum vector. I believe I could even use bsxfun to remove the loop over days.
Please excuse any potential syntax errors as I do not have access to MATLAB right now.
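For reference, the fully loop-free version I have in mind would be something like this (again, untested here, so treat it as a sketch):
% Compare every date against every record's window at once
dateLogic = bsxfun(@ge, dateVector, startDate.') & bsxfun(@le, dateVector, endDate.');
countMatrix = bsxfun(@times, dateLogic, count.');  % scale each record's column by its count
Sum = sum(countMatrix, 2);                         % per-day totals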
Anyway, how can I do something similar in VBA? Is there an equivalent of colon notation to reference an entire row or column of an array? I will be applying this to a large data set, so efficiency is of the essence; the more I can do in memory before pasting, the better.
Thanks in advance.

Here's one possibility; try it with the sample data in A1:D4 of a new workbook (headers in row 1, data in A2:D4).
Sub NewTable()
    Set Table = Sheet1.[a2:d4]
    With Application
        'Pull each column of the table into a row array
        Record = .Transpose(.Index(Table, , 1))
        FirstDate = .Transpose(.Index(Table, , 2))
        LastDate = .Transpose(.Index(Table, , 3))
        Count = .Transpose(.Index(Table, , 4))
        'Column array of every date from the earliest start to the latest end
        Dates = .Evaluate("row(" & .Min(FirstDate) & ":" & .Max(LastDate) & ")")
        'GESTEP(x, step) returns 1 if x >= step, else 0, so the two calls build the in-window mask;
        'PV with zero rate returns -(nper*pmt), so the nested PV calls multiply the mask by Count element-wise
        Values = .PV(, Count, .PV(, .GeStep(Dates, FirstDate), .GeStep(LastDate, Dates)))
        'Multiply by a column of ones to sum each row
        Sum = .MMult(Values, .Power(.Transpose(Record), 0))
    End With
    Sheet1.[F1].Offset(, 1).Resize(, UBound(Values, 2)) = Record
    Sheet1.[F2].Resize(UBound(Dates)) = Dates
    Sheet1.[G2].Resize(UBound(Values), UBound(Values, 2)) = Values
    Sheet1.[G2].Offset(, UBound(Values, 2)).Resize(UBound(Dates)) = Sum
End Sub

Related

Updating a matrix from list of redundant indices

I have a matrix A, a list of indices is and js, and a list of values ws to add to A. Originally I was simply iterating through A with a for loop:
for idx = 1:N
    i = is(idx);
    j = js(idx);
    w = ws(idx);
    A(i,j) = A(i,j) + w;
end
However, I would like to vectorize this to increase efficiency. I thought something simple like
A(is,js) = A(is,js) + ws
would work, and it does as long as the is and js don't repeat. Said differently, if I generate idx = sub2ind(size(A),is,js);, then so long as idx has no repeated values, all is well. If it does, only the last value is added and all previous values are lost. A concrete example:
A = zeros(3,3);
indices = [1,2,3,1];
additions = [5,5,5,5];
A(indices) = A(indices) + additions;
This results in every element of the first column being 5, rather than the intended 10, 5, 5 (the repeated index 1 should accumulate to 10).
This is a small example, but in my actual application the lists of indices are really long and filled with redundant values. I'm hoping to vectorize this to save time, so going through and eliminating redundancies isn't really an option. So my main question is, how do I add to a matrix from a given set of redundant indices? Alternatively, is there another way of working through this without any sort of iteration?
To emphasize a nice property of accumarray: it works directly with a pair of subscript columns (row and column indices), so no conversion to linear indices is needed.
With the example from Luis Mendo:
is = [2 3 3 1 1 2].';
js = [1 3 3 2 2 4].';
ws = [10 20 30 40 50 60].';
A3 = accumarray([is js],ws);
%% A3 =
%% 0 90 0 0
%% 10 0 0 60
%% 0 0 50 0
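As a side note, if the result needs a fixed size rather than the default max(is)-by-max(js), accumarray also accepts the size as a third argument (the 5-by-6 size here is just an example):
A3 = accumarray([is js], ws, [5 6]);   % force a 5-by-6 result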
If I understand correctly, you only need full(sparse(is, js, ws)). This works because sparse accumulates the values supplied for repeated (i,j) pairs.
% Example data
is = [2 3 3 1 1 2];
js = [1 3 3 2 2 4];
ws = [10 20 30 40 50 60];
% With loop
N = numel(is);
A = zeros(max(is), max(js));
for idx = 1:N
    i = is(idx);
    j = js(idx);
    w = ws(idx);
    A(i,j) = A(i,j) + w;
end
% With `sparse`
A2 = full(sparse(is, js, ws));
% Check
isequal(A, A2)
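sparse takes explicit dimensions too, which helps if A has to be larger than the largest indices that actually occur (again, the 5-by-6 size is just an example):
A2 = full(sparse(is, js, ws, 5, 6));   % 5-by-6 result regardless of max(is) and max(js)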

Group two non-adjacent columns into 2d array for Excel VBA Script

I think this question might be related to Ms Excel -> 2 columns into a 2 dimensional array but I can't quite make the connection.
I have a VBA script for filling in missing data. I select two adjacent columns, and it finds any gaps in the second column and linearly interpolates based on the (possibly irregular) spacing in the first column. For instance, I could use it on this data:
1 7
2 14
3 21
5 35
5.1
6 42
7
8
9 45
to get this output
1 7
2 14
3 21
5 35
5.1 35.7 <---1/10th the way between 35&42
6 42
7 43 <-- 1/3 the way between 42 & 45
8 44 <-- 2/3 the way between 42 & 45
9 45
This is very useful for me.
My trouble is that it only works on contiguous columns. I would like to be able to select two columns that are not adjacent to each other and have it work the same way. My code starts out like this:
Dim addr As String
addr = Selection.Address
Dim nR As Long
Dim nC As Long
'Reads Selected Cells' Row and Column Information
nR = Range(addr).Rows.Count
nC = Range(addr).Columns.Count
When I run this with contiguous columns selected, addr shows up in the Locals window with a value like "$A$2:$B$8" and nC = 2
When I run this with non-contiguous columns selected, addr shows up in the Locals window with a value like "$A$2:$A$8,$C$2:$C$8" and nC = 1.
Later on in the script, I collect the values in each column into an array. Here's how I deal with the second column, for example:
'Creates a Column 2 (col2) array, determines which cells need interpolation, changes their font to bold red, and reads their values
Dim col2() As Double
ReDim col2(0 To nR + 1)
i = 1
Do Until i > nR
    If IsEmpty(Selection(i, 2)) Or Selection(i, 2) = 0 Or Selection(i, 2) = -901 Then
        Selection(i, 2).Font.Bold = True
        Selection(i, 2).Font.Color = RGB(255, 69, 0)
        col2(i) = 9999999
    Else
        col2(i) = Selection(i, 2)
    End If
    i = i + 1
Loop
This is also busted, because even if my selection is "$A$2:$A$8,$C$2:$C$8" VBA will treat Selection(1,2) as a reference to $B$2, not the desired $C$2.
Anyone have a suggestion for how I can get VBA to treat a non-contiguous selection the way it treats a contiguous one?
You're dealing with "disjoint ranges." Use the Areas collection, e.g., as described here. The first column should be in Selection.Areas(1) and the second column should be in Selection.Areas(2).

Intertwining 3 arrays in matlab / octave to get correct pattern

I know I can intertwine 2 arrays by
C = [A(:),B(:)].';
D = C(:)
But how can I intertwine 3 arrays with a pendulum-type pattern that goes back and forth between them (each column being one array)? The full number pattern I'm trying to get is listed further down as one large column. Please note the numerical values are just examples to make the pattern easier to read; they could also be decimals.
I tried the code below but the pattern is incorrect.
A=[1,2,3,4,5]
B=[10,20,30,40,50,60,70,80,90]
C=[100,200,300,400,500]
D = [A(:),B(:),C(:)].';
E = D(:)
I get an error building D because the B array is larger than A and C, and even aside from that, the interleaving shown below does not follow the pattern I'm trying to get:
1
10
100
2
20
200
3
30
300
4
40
400
5
50
500
error: horizontal dimensions mismatch (5x1 vs 9x1)
The pattern I'm trying to get from the 3 arrays is below:
1
10
100
20
2
30
200
40
3
50
300
60
4
70
400
80
5
90
500
PS: I'm using Octave 3.8.1, which is like MATLAB.
Have you tried the following?
D = zeros(4 * size(A, 2) - 1, 1); % initialization
D(1 : 4 : end) = A;
D(2 : 2 : end) = B;
D(3 : 4 : end) = C;
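For reference, running this with the sample arrays from the question (row vectors, as posted) produces exactly the requested column:
A = [1,2,3,4,5];
B = [10,20,30,40,50,60,70,80,90];
C = [100,200,300,400,500];
D = zeros(4 * size(A, 2) - 1, 1);
D(1 : 4 : end) = A;
D(2 : 2 : end) = B;
D(3 : 4 : end) = C;
% D.' = 1 10 100 20 2 30 200 40 3 50 300 60 4 70 400 80 5 90 500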

Split vector in MATLAB

I'm trying to elegantly split a vector. For example,
vec = [1 2 3 4 5 6 7 8 9 10]
According to another vector of 0's and 1's of the same length where the 1's indicate where the vector should be split - or rather cut:
cut = [0 0 0 1 0 0 0 0 1 0]
Giving us a cell output similar to the following:
[1 2 3] [5 6 7 8] [10]
Solution code
You can use cumsum & accumarray for an efficient solution -
%// Create ID/labels for use with accumarray later on
id = cumsum(cut)+1
%// Mask to get valid values from cut and vec corresponding to ones in cut
mask = cut==0
%// Finally get the output with accumarray using masked IDs and vec values
out = accumarray(id(mask).',vec(mask).',[],@(x) {x})
Benchmarking
Here are some performance numbers when using a large input on the three most popular approaches listed to solve this problem -
N = 100000; %// Input Datasize
vec = randi(100,1,N); %// Random inputs
cut = randi(2,1,N)-1;
disp('-------------------- With CUMSUM + ACCUMARRAY')
tic
id = cumsum(cut)+1;
mask = cut==0;
out = accumarray(id(mask).',vec(mask).',[],@(x) {x});
toc
disp('-------------------- With FIND + ARRAYFUN')
tic
N = numel(vec);
ind = find(cut);
ind_before = [ind-1 N]; ind_before(ind_before < 1) = 1;
ind_after = [1 ind+1]; ind_after(ind_after > N) = N;
out = arrayfun(@(x,y) vec(x:y), ind_after, ind_before, 'uni', 0);
toc
disp('-------------------- With CUMSUM + ARRAYFUN')
tic
cutsum = cumsum(cut);
cutsum(cut == 1) = NaN; %Don't include the cut indices themselves
sumvals = unique(cutsum); % Find the values to use in indexing vec for the output
sumvals(isnan(sumvals)) = []; %Remove NaN values from sumvals
output = arrayfun(@(val) vec(cutsum == val), sumvals, 'UniformOutput', 0);
toc
Runtimes
-------------------- With CUMSUM + ACCUMARRAY
Elapsed time is 0.068102 seconds.
-------------------- With FIND + ARRAYFUN
Elapsed time is 0.117953 seconds.
-------------------- With CUMSUM + ARRAYFUN
Elapsed time is 12.560973 seconds.
Special case scenario: in cases where you might have runs of 1's, you need to modify a few things, as listed next -
%// Mask to get valid values from cut and vec corresponding to ones in cut
mask = cut==0
%// Setup IDs differently this time. The idea is to have successive IDs.
id = cumsum(cut)+1
[~,~,id] = unique(id(mask))
%// Finally get the output with accumarray using masked IDs and vec values
out = accumarray(id(:),vec(mask).',[],@(x) {x})
Sample run with such a case -
>> vec
vec =
1 2 3 4 5 6 7 8 9 10
>> cut
cut =
1 0 0 1 1 0 0 0 1 0
>> celldisp(out)
out{1} =
2
3
out{2} =
6
7
8
out{3} =
10
For this problem, a handy function is cumsum, which can create a cumulative sum of the cut array. The code that produces an output cell array is as follows:
vec = [1 2 3 4 5 6 7 8 9 10];
cut = [0 0 0 1 0 0 0 0 1 0];
cutsum = cumsum(cut);
cutsum(cut == 1) = NaN; %Don't include the cut indices themselves
sumvals = unique(cutsum); % Find the values to use in indexing vec for the output
sumvals(isnan(sumvals)) = []; %Remove NaN values from sumvals
output = {};
for i=1:numel(sumvals)
    output{i} = vec(cutsum == sumvals(i)); %#ok<SAGROW>
end
As another answer shows, you can use arrayfun to create a cell array with the results. To apply that here, you'd replace the for loop (and the initialization of output) with the following line:
output = arrayfun(@(val) vec(cutsum == val), sumvals, 'UniformOutput', 0);
That's nice because it doesn't end up growing the output cell array.
The key feature of this routine is the variable cutsum, which ends up looking like this:
cutsum =
0 0 0 NaN 1 1 1 1 NaN 2
Then all we need to do is use it to create indices to pull the data out of the original vec array. We loop from zero to max and pull matching values. Notice that this routine handles some situations that may arise. For instance, it handles 1 values at the very beginning and very end of the cut array, and it gracefully handles repeated ones in the cut array without creating empty arrays in the output. This is because of the use of unique to create the set of values to search for in cutsum, and the fact that we throw out the NaN values in the sumvals array.
You could use -1 instead of NaN as the signal flag for the cut locations to not use, but I like NaN for readability. The -1 value would probably be more efficient, as all you'd have to do is truncate the first element from the sumvals array. It's just my preference to use NaN as a signal flag.
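For comparison, a sketch of that -1 variant (it produces the same output as the NaN version):
cutsum = cumsum(cut);
cutsum(cut == 1) = -1;        % flag the cut positions with -1 instead of NaN
sumvals = unique(cutsum);     % -1, if present, sorts to the front
sumvals(sumvals == -1) = [];  % drop the flag value
output = arrayfun(@(val) vec(cutsum == val), sumvals, 'UniformOutput', 0);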
The output of this is a cell array with the results:
output{1} =
1 2 3
output{2} =
5 6 7 8
output{3} =
10
There are some odd conditions we need to handle. Consider the situation:
vec = [1 2 3 4 5 6 7 8 9 10 11 12 13 14];
cut = [1 0 0 1 1 0 0 0 0 1 0 0 0 1];
There are repeated 1's in there, as well as a 1 at the beginning and end. This routine properly handles all this without any empty sets:
output{1} =
2 3
output{2} =
6 7 8 9
output{3} =
11 12 13
You can do this with a combination of find and arrayfun:
vec = [1 2 3 4 5 6 7 8 9 10];
N = numel(vec);
cut = [0 0 0 1 0 0 0 0 1 0];
ind = find(cut);
ind_before = [ind-1 N]; ind_before(ind_before < 1) = 1;
ind_after = [1 ind+1]; ind_after(ind_after > N) = N;
out = arrayfun(@(x,y) vec(x:y), ind_after, ind_before, 'uni', 0);
We thus get:
>> celldisp(out)
out{1} =
1 2 3
out{2} =
5 6 7 8
out{3} =
10
So how does this work? Well, the first line defines your input vector, the second line finds how many elements are in this vector, and the third line defines your cut vector, which marks where we need to cut. Next, we use find to determine the locations that are non-zero in cut, which correspond to the split points in the vector. These split points tell us where we need to stop collecting elements and where to begin collecting again.
However, we need to account for the beginning of the vector as well as the end. ind_after tells us the locations of where we need to start collecting values and ind_before tells us the locations of where we need to stop collecting values. To calculate these starting and ending positions, you simply take the result of find and add and subtract 1 respectively.
Each corresponding position in ind_after and ind_before tell us where we need to start and stop collecting values together. In order to accommodate for the beginning of the vector, ind_after needs to have the index of 1 inserted at the beginning because index 1 is where we should start collecting values at the beginning. Similarly, N needs to be inserted at the end of ind_before because this is where we need to stop collecting values at the end of the array.
Now for ind_after and ind_before, there is a degenerate case where the cut point may be at the end or beginning of the vector. If this is the case, then subtracting or adding by 1 will generate a start and stopping position that's out of bounds. We check for this in the 4th and 5th line of code and simply set these to 1 or N depending on whether we're at the beginning or end of the array.
The last line of code uses arrayfun and iterates through each pair of ind_after and ind_before to slice into our vector. Each result is placed into a cell array, and our output follows.
We can check for the degenerate case by placing a 1 at the beginning and end of cut and some values in between:
vec = [1 2 3 4 5 6 7 8 9 10];
cut = [1 0 0 1 0 0 0 1 0 1];
Using this example and the above code, we get:
>> celldisp(out)
out{1} =
1
out{2} =
2 3
out{3} =
5 6 7
out{4} =
9
out{5} =
10
Yet another way, but this time without any loops or accumulating at all...
lengths = diff(find([1 cut 1])) - 1; % assuming a row vector
lengths = lengths(lengths > 0);
data = vec(~cut);
result = mat2cell(data, 1, lengths); % also assuming a row vector
The diff(find(...)) construct gives us the distance from each marker to the next - we append boundary markers with [1 cut 1] to catch any runs of zeros which touch the ends. Each length is inclusive of its marker, though, so we subtract 1 to account for that, and remove any which just cover consecutive markers, so that we won't get any undesired empty cells in the output.
For the data, we mask out any elements corresponding to markers, so we just have the valid parts we want to partition up. Finally, with the data ready to split and the lengths into which to split it, that's precisely what mat2cell is for.
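For reference, running it on the original example (row vectors, as the comments assume):
vec = [1 2 3 4 5 6 7 8 9 10];
cut = [0 0 0 1 0 0 0 0 1 0];
lengths = diff(find([1 cut 1])) - 1;   % [3 4 1]
lengths = lengths(lengths > 0);
data = vec(~cut);                      % [1 2 3 5 6 7 8 10]
result = mat2cell(data, 1, lengths);   % {[1 2 3], [5 6 7 8], [10]}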
Also, using @Divakar's benchmark code:
-------------------- With CUMSUM + ACCUMARRAY
Elapsed time is 0.272810 seconds.
-------------------- With FIND + ARRAYFUN
Elapsed time is 0.436276 seconds.
-------------------- With CUMSUM + ARRAYFUN
Elapsed time is 17.112259 seconds.
-------------------- With mat2cell
Elapsed time is 0.084207 seconds.
...just sayin' ;)
Here's what you need:
function spl = Splitting(vec,cut)
n=1;
j=1;
for i=1:1:length(cut)
    if cut(i)==0
        spl{n}(j)=vec(i);
        j=j+1;
    else
        n=n+1;
        j=1;
    end
end
end
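For reference, calling it on the original example returns the same cells as the other answers:
vec = [1 2 3 4 5 6 7 8 9 10];
cut = [0 0 0 1 0 0 0 0 1 0];
spl = Splitting(vec, cut);   % {[1 2 3], [5 6 7 8], [10]}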
Despite how simple my method is, it's in 2nd place for performance:
-------------------- With CUMSUM + ACCUMARRAY
Elapsed time is 0.264428 seconds.
-------------------- With FIND + ARRAYFUN
Elapsed time is 0.407963 seconds.
-------------------- With CUMSUM + ARRAYFUN
Elapsed time is 18.337940 seconds.
-------------------- SIMPLE
Elapsed time is 0.271942 seconds.
Unfortunately there is no 'inverse concatenate' in MATLAB. If you wish to solve a question like this, you can try the code below. It gives you what you are looking for in the case of two split points producing three vectors; if you want more splits, you will need to modify the code after the loop.
The results are three separate vectors; to collect them into a single cell array like the other answers produce, wrap them as {F, G, H}.
pos_of_one = 0;
% The loop finds the split points and puts their positions into a vector.
for kk = 1 : length(cut)
    if cut(1,kk) == 1
        pos_of_one = pos_of_one + 1;
        A(1,pos_of_one) = kk;
    end
end
F = vec(1 : A(1,1) - 1);
G = vec(A(1,1) + 1 : A(1,2) - 1);
H = vec(A(1,2) + 1 : end);

count elements falling within certain thresholds in array in matlab?

I have a huge vector and I have to count the values falling within certain ranges, e.g. 0-10, 10-20, and so on.
I did something like this:
for i=1:numel(m1)
    if (0<m1(i)<=10)==1
        k=k+1;
    end
end
Also:
if not(isnan(m1))==1
x=(0<m1<=10);
end
But both times it gives an array containing all 1s. What am I doing wrong?
You can do something like this (it also works for non-integers):
k = sum(m1>0 & m1<=10)
You can use logical indexing. Observe:
>> x = randi(40, 1, 10) - 20
x =
-2 17 -12 -9 -14 -14 15 4 2 -14
>> x2 = x(0 < x & x < 10)
x2 =
4 2
>> length(x2)
ans =
2
and the same done in one step:
>> length(x(0 < x & x < 10))
ans =
2
To count the values in a specific range you can use ismember (note that ismember(m1,0:10) matches only the exact integer values 0, 1, ..., 10, so this suits integer data).
If m1 is a vector, use
k = sum(ismember(m1,0:10));
If m1 is a matrix, use k = sum(sum(ismember(m1,0:10)));
for example,
m1=randi(20,[5 5])
9 10 6 10 16
8 9 14 20 6
16 13 14 7 11
16 15 4 12 14
4 16 3 5 18
sum(sum(ismember(m1,1:10)))
12
Why not simply do something like this?
% Random data
m1 = 100*rand(1000,1);
%Count elements between 10 and 20
m2 = m1(m1>10 & m1<=20);
length(m2) %number of elements of m1 between 10 and 20
You can then put things in a loop
% Random data
m1 = 100*rand(1000,1);
nb_elements = zeros(10,1);
for k=1:length(nb_elements)
    temp = m1(m1>(10*k-10) & m1<=(10*k));
    nb_elements(k) = length(temp);
end
Then nb_elements contains your data with nb_elements(1) for the 0-10 range, nb_elements(2) for the 10-20 range, etc...
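If you want to drop the loop as well, the same per-bin counts can be computed in one shot with bsxfun; a sketch, assuming the same ten 10-wide bins and that m1 is a column vector as above:
edges = 0:10:100;
lo = edges(1:end-1);   % lower edges, exclusive
hi = edges(2:end);     % upper edges, inclusive
nb_elements = sum(bsxfun(@gt, m1, lo) & bsxfun(@le, m1, hi), 1).';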
MATLAB does not interpret the chained comparison
(0<m1(i)<=10)
the way you intend: it evaluates (0<m1(i)) first, which gives 0 or 1, and either of those is always <= 10, so the test is always true. Instead you should use:
for i=1:numel(m1)
    if (0<m1(i)) && (m1(i)<=10)
        k=k+1;
    end
end
And to speed it up, probably something like this:
sum((0<m1) .* (m1<=10))
Or you can create logical arrays and then use element-wise multiplication. Don't know how fast this is though and it might use a lot of memory for large arrays.
Something like this
A(find(((A>0.2) .* (A<0.8)) == 1))
Generate values
A= rand(5)
A =
0.414906 0.350930 0.057642 0.650775 0.525488
0.573207 0.763477 0.120935 0.041357 0.900946
0.333857 0.241653 0.421551 0.737704 0.162307
0.517501 0.491623 0.016663 0.016396 0.254099
0.158867 0.098630 0.198298 0.223716 0.136054
Find the intersection of the values > 0.2 and < 0.8. This gives you two logical arrays, and after element-wise multiplication the result is 1 wherever both A>0.2 and A<0.8 hold.
find(((A>0.2) .* (A<0.8)) == 1)
Then apply those indices to A
A(find(((A>0.2) .* (A<0.8)) == 1))
ans =
0.41491
0.57321
0.33386
0.51750
0.35093
0.76348
0.24165
0.49162
0.42155
0.65077
0.73770
0.22372
0.52549
0.25410
