Fastest way to parse this string to a numpy array

We have to perform the following operation around 400,000 times so I'm searching for the most efficient solution. I have tried several things but I'm curious whether there are even better approaches :)
Data example
We can use the following code to generate an example test set
import random
import numpy as np

random.seed(10)
np.random.seed(10)

def test_str():
    n = 10000000
    arr = np.random.randint(10000, size=n)
    sign = np.random.choice(['+', '-'], size=n)
    return 'ID1' + '\t' + ' '.join(["{}{}".format(a, b) for a, b in zip(arr, sign)])
Which looks like ID1\t7688+ 737+ 677+ 1508- 9251-......
The code where it is all about :)
Copy the code from Google Colab (P.S. running it there gave me a TypingError, whereas it ran fine on my machine), or just see the functions below.
General function
From this Numba issue, but based on @armamut's answer this may introduce a lot of overhead with Numba, making plain NumPy apparently faster.
@nb.jit(nopython=True)
def str_to_int(s):
    final_index, result = len(s) - 1, 0
    for i, v in enumerate(s):
        result += (ord(v) - 48) * (10 ** (final_index - i))
    return result
Approach 1
@nb.jit(nopython=True)
def process_number(numb, identifier, i):
    sign = 1 if numb[-1] == '+' else -1
    return str_to_int(numb[:-1]), sign, i, identifier
@nb.jit(nopython=True)
def expand1(data):
    identifier, l = data.split('\t')
    identifier = str_to_int(identifier[-1])
    numbers = l.split()
    # init empty numpy array
    arr = np.empty(shape=(len(numbers), 4), dtype=np.int64)
    # Fill array
    for i, numb in enumerate(numbers):
        arr[i, :] = process_number(numb, identifier, i)
    return arr
Approach 2
@nb.jit(nopython=True)
def expand2(data):
    identifier, l = data.split('\t')
    identifier = str_to_int(identifier[-1])
    numbers = l.split()
    size = len(numbers)
    numbs = [str_to_int(numb[:-1]) for numb in numbers]
    signs = [1 if numb[-1] == '+' else -1 for numb in numbers]
    arr = np.empty(shape=(size, 4), dtype=np.int64)
    arr[:, 0] = numbs
    arr[:, 1] = signs
    arr[:, 2] = np.arange(0, size)
    arr[:, 3] = np.repeat(identifier, size)
    return arr
Approach 3
@nb.jit(nopython=True)
def expand3(data):
    identifier, l = data.split('\t')
    identifier = str_to_int(identifier[-1])
    numbers = l.split()
    arr = np.empty(shape=(len(numbers), 4), dtype=np.int64)
    for i, numb in enumerate(numbers):
        arr[i, :] = str_to_int(numb[:-1]), 1 if numb[-1] == '+' else -1, i, identifier
    return arr
Answer approach
def expand4(t):
    identifier, l = t.split('\t')
    identifier = int(identifier[-1])
    numbers = np.array([int(k[:-1]) for k in l.split(' ')])
    signs = np.array([(k[-1] == '+') for k in l.split(' ')]) * 2 - 1
    N = len(numbers)
    arr = np.empty(shape=(N, 4), dtype=np.int64)
    arr[:, 0] = numbers
    arr[:, 1] = signs
    arr[:, 2] = identifier
    arr[:, 3] = np.arange(N)
    return arr
Test results:
Expand 1
72.7 ms ± 177 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)
Expand 2
27.9 ms ± 67.1 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)
Expand 3
8.81 ms ± 20.3 ms per loop (mean ± std. dev. of 7 runs, 5 loops each)
Expand 4 ANSWER 1
429 µs ± 63.4 µs per loop (mean ± std. dev. of 7 runs, 5 loops each)

I cannot replicate your code, as I also got an "'ord' is not implemented" error from Numba.
But why are you using Numba? Your str_to_int operation seems to be very expensive and not suited to vector operations. Why not (without Numba):
def expand(t):
    identifier, l = t.split('\t')
    identifier = int(identifier[-1])
    numbers = np.array([int(k[:-1]) for k in l.split(' ')])
    signs = np.array([(k[-1] == '+') for k in l.split(' ')]) * 2 - 1
    N = len(numbers)
    arr = np.empty(shape=(N, 4), dtype=np.int64)
    arr[:, 0] = numbers
    arr[:, 1] = signs
    arr[:, 2] = identifier
    arr[:, 3] = np.arange(N)
    return arr
t = test_str()
%timeit expand(t)
>>>
1.01 ms ± 121 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Efficient way to apply a function to elements of a numpy array?

I have an enormous 1D numpy array of booleans w and an increasing list of indices i, which splits w into len(i)+1 subarrays. A toy example is:
w=numpy.array([True,False,False,False,True,True,True,True,False,False])
i=numpy.array([0,0,2,5,5,8,8])
I wish to compute a numpy array wi, whose i-th entry is 1 if the i-th subarray contains a True and 0 otherwise. In other words, the i-th entry of wi is the logical 'or' of the elements of the i-th subarray of w. In our example, the output is:
[0 0 1 1 0 1 0 0]
This is achieved with the code:
wi=numpy.fromiter(map(numpy.any,numpy.split(w,i)),int)
Is there a more efficient way of doing this or is this optimal as far as memory is concerned?
P.S. related post
For efficiency (memory and performance), use np.bitwise_or.reduceat as it keeps the output in boolean -
In [10]: np.bitwise_or.reduceat(w,np.r_[0,i])
Out[10]: array([ True, True, False, True, False, False])
To have as int output, view as int -
In [11]: np.bitwise_or.reduceat(w,np.r_[0,i]).view('i1')
Out[11]: array([1, 1, 0, 1, 0, 0], dtype=int8)
Here's an all-weather solution -
def slice_reduce_or(w, i):
    valid = i < len(w)
    invalidc = (~valid).sum()
    i = i[valid]
    mi = np.r_[i[:-1] != i[1:], True]
    pp = i[mi]
    p1 = np.bitwise_or.reduceat(w, pp)
    N = len(i) + 1
    out = np.zeros(N + invalidc, dtype=bool)
    out[1:N][mi] = p1
    out[0] = w[:i[0]].any()
    return out.view('i1')
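For reference, calling this on the toy w and i from the question (a quick check of my own, not part of the original answer) reproduces the expected output:
In [12]: slice_reduce_or(w, i)
Out[12]: array([0, 0, 1, 1, 0, 1, 0, 0], dtype=int8)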
Let's try np.add.reduceat:
wi = np.add.reduceat(w,np.r_[0,i]).astype(bool)
output:
array([1, 1, 0, 1, 0, 0])
And performance:
%timeit -n 100 wi = np.add.reduceat(w,np.r_[0,i]).astype(bool).astype(int)
21.7 µs ± 7.86 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit -n 100 wi=np.fromiter(map(np.any,np.split(w,i)),int)
44.5 µs ± 7.79 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
So we're looking at about 2x speed here.

Element wise comparison in R

I'm attempting to write a for loop that will compare values between two individuals, but not the same individual. The following data frame contains values for five subjects:
Value1
Subject1 0
Subject2 1
Subject3 5
Subject4 6
Subject5 8
I've written a double loop that creates a 'Value2' variable based on the following criteria:
If the subject has a larger Value1, then the result is +1.
If the subject has an equal Value1, then the result is 0.
If the subject has a smaller Value1, then the result is -1.
For example, Subject 1's Value1 is smaller than the other four subjects' values, so the result should be -4. So far, the loop I've written works for the first subject but fails to iterate to the second subject.
Value2 <- 0
i = 0
w = 0
for(i in 1:length(Value1)){
  for(j in 1:length(Value1)){
    if(i != j){
      Value1[i] = w
      if(w > Value1[j]){
        Value2[i] = Value2[i] + 1
      }
      if(w < Value1[j]){
        Value2[i] = Value2[i] - 1
      }
      if(w == Value1[j]){
        Value2[i] = Value2[i] + 0
      }
    }
  }
}
If I'm understanding the problem correctly, this should give you what you want
x <- c(0, 1, 5, 6, 8)
colSums(outer(x, x, '<')) - colSums(outer(x, x, '>'))
# [1] -4 -2 0 2 4
Or
-colSums(sign(outer(x, x, '-')))
# [1] -4 -2 0 2 4
Edit: If your vector is large (or even if it isn't, really), use d.b.'s rank method instead. The outer function will create an NxN matrix, where N is the length of x. For example, when x is sample(1e5), outer will attempt to create a matrix more than 30 GB in size! This means most people's laptops in 2019 don't even have enough memory for this method to work on large vectors. With the same x, the rank method provided by d.b. returns the result almost instantly.
Benchmark for vector of size 1000
x <- sample(1000)
microbenchmark(
  outer_diff = colSums(-sign(outer(x, x, '-'))),
  outer_gtlt = colSums(outer(x, x, '<')) - colSums(outer(x, x, '>')),
  rank = {r <- rank(x); 2*(r - mean(r))}
)
# Unit: microseconds
#        expr      min         lq       mean    median        uq        max neval cld
#  outer_diff 15930.26 16872.4175 20946.2980 18030.776 25346.677  38668.324   100   b
#  outer_gtlt 14168.21 15120.4165 28970.7731 16698.264 23857.651 352390.298   100   b
#        rank   111.18   141.5385   170.8885   177.026   188.513    282.257   100  a
x = c(0, 1, 5, 6, 8)
r = rank(x)
ans = 2 * (r - mean(r))
ans
#[1] -4 -2 0 2 4
@IceCreamToucan's benchmark considers cases with distinct values (sampling without replacement), but if we extend to repeated values (covered by criterion 2 in the OP), I figured tabulating first saves time.
library(data.table)

# from @d.b's answer and comments from d.b, ICT
fdb = function(x) {
  r = frank(x)
  2 * (r - mean(r))
}

# from @chinsoon's comment and some algebra
fdb2 = function(x) {
  r = frank(x)
  2 * r - length(x) - 1
}

# tabulation with data.table
ff = function(x){
  nx = length(x)
  xDT = setDT(list(x=x))
  resDT = xDT[, .N, keyby=x][, res := 2L*cumsum(N) - N - nx]
  resDT[xDT, x.res]
}
Sample data and results:
nv = 1e4 # number of values
n = 1e7 # length of vector
x = sample(nv, n, replace=TRUE)
system.time(res_fdb <- fdb(x))
# user system elapsed
# 0.32 0.09 0.24
system.time(res_fdb2 <- fdb2(x))
# user system elapsed
# 0.25 0.13 0.27
system.time(res_ff <- ff(x))
# user system elapsed
# 0.58 0.24 0.50
identical(res_ff, as.integer(res_fdb)) # TRUE
identical(res_ff, as.integer(res_fdb2)) # TRUE
It turns out ff() is not as fast as direct use of data.table::frank, taking roughly twice as long because grouping by distinct values is done twice: once to count, and again in a lookup.
I guess the tabulation can also be done with base R's table.
ft = function(x){
  nx = length(x)
  N = table(x)
  cN = cumsum(N)
  res = 2L*cN - N - nx
  as.vector(res[as.character(x)])
}
system.time(res_ft <- ft(x))
# user system elapsed
# 7.58 0.34 7.93
identical(res_ff, res_ft)
# [1] TRUE

Total numbers having frequency k in a given range

How do I find the count of numbers having frequency exactly k in a particular range (l, r) of a given array? There are 10^5 queries of the form l, r, and each query is built from the previous query's answer: both l and r are incremented by the previous answer and taken modulo n, and l and r are swapped if l > r. Note that 0 <= a[i] <= 10^9. The total number of elements in the array is n = 10^5.
My Attempt:
n, k, q = map(int, input().split())
a = list(map(int, input().split()))
ans = 0
for _ in range(q):
    l, r = map(int, input().split())
    l += ans
    l %= n
    r += ans
    r %= n
    if l > r:
        l, r = r, l
    d = {}
    for i in a[l:r+1]:
        try:
            d[i] += 1
        except:
            d[i] = 1
    curr_ans = 0
    for i in d.keys():
        if d[i] == k:
            curr_ans += 1
    ans = curr_ans
    print(ans)
Sample Input:
5 2 3
7 6 6 5 5
0 4
3 0
4 1
Sample Output:
2
1
1
If the number of distinct values in the array is not too large, you may consider storing, for each unique value, an array as long as the input that counts the number of appearances of that value up to each position. Then you just need to subtract the counts at the beginning of the range from those at the end to find how many frequency matches there are:
def range_freq_queries(seq, k, queries):
    n = len(seq)
    c = freq_counts(seq)
    result = [0] * len(queries)
    offset = 0
    for i, (l, r) in enumerate(queries):
        result[i] = range_freq_matches(c, offset, l, r, k, n)
        offset = result[i]
    return result

def freq_counts(seq):
    # counts[i][j] = number of occurrences of the j-th unique value in seq[:i]
    s = {v: i for i, v in enumerate(set(seq))}
    counts = [None] * (len(seq) + 1)
    counts[0] = [0] * len(s)
    for i, v in enumerate(seq, 1):
        counts[i] = list(counts[i - 1])
        j = s[v]
        counts[i][j] += 1
    return counts

def range_freq_matches(counts, offset, start, end, k, n):
    start, end = sorted(((start + offset) % n, (end + offset) % n))
    return sum(1 for cs, ce in zip(counts[start], counts[end + 1]) if ce - cs == k)

seq = [7, 6, 6, 5, 5]
k = 2
queries = [(0, 4), (3, 0), (4, 1)]
print(range_freq_queries(seq, k, queries))
# [2, 1, 1]
You can do it faster with NumPy, too. Since each result depends on the previous one, you will have to loop in any case, but you can use Numba to really speed things up:
import numpy as np
import numba as nb

def range_freq_queries_np(seq, k, queries):
    seq = np.asarray(seq)
    c = freq_counts_np(seq)
    return _range_freq_queries_np_nb(seq, k, queries, c)

@nb.njit  # This is not necessary but will make things faster
def _range_freq_queries_np_nb(seq, k, queries, c):
    n = len(seq)
    offset = np.int32(0)
    out = np.empty(len(queries), dtype=np.int32)
    for i, (l, r) in enumerate(queries):
        l = (l + offset) % n
        r = (r + offset) % n
        l, r = min(l, r), max(l, r)
        out[i] = np.sum(c[r + 1] - c[l] == k)
        offset = out[i]
    return out

def freq_counts_np(seq):
    uniq = np.unique(seq)
    seq_pad = np.concatenate([[uniq.max() + 1], seq])
    comp = seq_pad[:, np.newaxis] == uniq
    return np.cumsum(comp, axis=0)

seq = np.array([7, 6, 6, 5, 5])
k = 2
queries = [(0, 4), (3, 0), (4, 1)]
print(range_freq_queries_np(seq, k, queries))
# [2 1 1]
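To see what freq_counts_np builds, here is a worked check (my illustration, not part of the original answer) of the cumulative counts table for the sample seq; a query over [l, r] then tests how many columns satisfy c[r + 1] - c[l] == k:
import numpy as np

seq = np.array([7, 6, 6, 5, 5])
uniq = np.unique(seq)                              # [5, 6, 7]
seq_pad = np.concatenate([[uniq.max() + 1], seq])  # pad so row 0 holds all-zero counts
c = np.cumsum(seq_pad[:, np.newaxis] == uniq, axis=0)
print(c)
# [[0 0 0]
#  [0 0 1]
#  [0 1 1]
#  [0 2 1]
#  [1 2 1]
#  [2 2 1]]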
Let's compare it with the original algorithm:
from collections import Counter

def range_freq_queries_orig(seq, k, queries):
    n = len(seq)
    ans = 0
    counter = Counter()
    out = [0] * len(queries)
    for i, (l, r) in enumerate(queries):
        l += ans
        l %= n
        r += ans
        r %= n
        if l > r:
            l, r = r, l
        counter.clear()
        counter.update(seq[l:r+1])
        ans = sum(1 for v in counter.values() if v == k)
        out[i] = ans
    return out
Here is a quick test and timing:
import random
import numpy as np

# Make random input
random.seed(0)
seq = random.choices(range(1000), k=5000)
queries = [(random.choice(range(len(seq))), random.choice(range(len(seq))))
           for _ in range(20000)]
k = 20

# Input as array for NumPy version
seq_arr = np.asarray(seq)

# Check all functions return the same result
res1 = range_freq_queries_orig(seq, k, queries)
res2 = range_freq_queries(seq, k, queries)
print(all(r1 == r2 for r1, r2 in zip(res1, res2)))
# True
res3 = range_freq_queries_np(seq_arr, k, queries)
print(all(r1 == r3 for r1, r3 in zip(res1, res3)))
# True
# Timings
%timeit range_freq_queries_orig(seq, k, queries)
# 3.07 s ± 1.11 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit range_freq_queries(seq, k, queries)
# 1.1 s ± 307 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%timeit range_freq_queries_np(seq_arr, k, queries)
# 265 ms ± 726 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
Obviously the effectiveness of this depends on the characteristics of the data. In particular, if there are fewer repeated values, the time and memory cost to construct the counts table will approach O(n²).
Let's say the input array is A, |A|=n. I'm going to assume that the number of distinct elements in A is much smaller than n.
We can divide A into sqrt(n) segments each of size sqrt(n). For each of these segments, we can calculate a map from element to count. Building these maps takes O(n) time.
With that preprocessing done, we can answer each query by adding together all the maps wholly contained in (l,r), of which there are at most sqrt(n), then adding any extra elements (or going one segment over and subtracting), also sqrt(n).
If there are k distinct elements, this takes O(sqrt(n) * k) so in the worst case O(n) if in fact every element of A is distinct.
You can keep track of the elements that have the desired count while combining the hashes and extra elements.
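A minimal Python sketch of this idea (my illustration, not the answer author's code; for simplicity it merges the block counters and does a final pass over the merged counts rather than tracking matching elements on the fly):
from collections import Counter
from math import isqrt

def build_blocks(a, block_size):
    # one value -> count map per block of the array
    return [Counter(a[i:i + block_size]) for i in range(0, len(a), block_size)]

def count_freq_k(a, blocks, block_size, l, r, k):
    # count distinct values appearing exactly k times in a[l..r]
    merged = Counter()
    i = l
    while i <= r:
        b = i // block_size
        block_end = (b + 1) * block_size - 1
        if i % block_size == 0 and block_end <= r:
            merged.update(blocks[b])   # block lies fully inside the range
            i = block_end + 1
        else:
            merged[a[i]] += 1          # stray element in a partially covered block
            i += 1
    return sum(1 for v in merged.values() if v == k)

a = [7, 6, 6, 5, 5]
block_size = max(1, isqrt(len(a)))
blocks = build_blocks(a, block_size)
print(count_freq_k(a, blocks, block_size, 0, 4, 2))  # 2, matching the first sample query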

Optimized method to partition numpy 2D array

I am trying to partition a 2D numpy array into 2 separate numpy arrays based on the contents
of a particular column. This is my code:
import numpy as np
import pandas as pd

@profile
def partition_data(arr, target_colm):
    total_colms = arr.shape[1]
    target_data = arr[:, target_colm]
    type1_data = []
    type2_data = []
    for i in range(arr.shape[0]):
        if target_data[i] == 0:  # if value==0, put in another array
            type1_data = np.append(type1_data, arr[i])
        else:
            type2_data = np.append(type2_data, arr[i])
    type1_data = np.array(type1_data).reshape(int(len(type1_data)/total_colms), total_colms)
    type2_data = np.array(type2_data).reshape(int(len(type2_data)/total_colms), total_colms)
    return type1_data, type2_data

d = pd.read_csv('data.csv').values
x, y = partition_data(d, 7)  # check values of 7th column
Note: For my experiment, I have used an array of shape (14359, 42).
Now, when I profile this function using kernprof line profiler, I get the following results.
Wrote profile results to code.py.lprof
Timer unit: 1e-06 s
Total time: 7.3484 s
File: code2.py
Function: part_data at line 8
Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     8                                           @profile
     9                                           def part_data(arr, target_col):
    10         1          7.0      7.0      0.0      total_colms = arr.shape[1]
    11         1         14.0     14.0      0.0      target_data = arr[:,target_col]
    12         1          2.0      2.0      0.0      type1_data = []
    13         1          1.0      1.0      0.0      type2_data = []
    14      5161      40173.0      7.8      0.5      for i in range(arr.shape[0]):
    15      5160      39225.0      7.6      0.5          if target_data[i]==6:
    16      4882    7231260.0   1481.2     98.4              type1_data = np.append(type1_data,arr[i])
    17                                                   else:
    18       278      33915.0    122.0      0.5              type2_data = np.append(type2_data,arr[i])
    19         1       3610.0   3610.0      0.0      type1_data = np.array(type1_data).reshape(int(len(type1_data)/total_colms),total_colms)
    20         1        187.0    187.0      0.0      type2_data = np.array(type2_data).reshape(int(len(type2_data)/total_colms),total_colms)
    21         1          3.0      3.0      0.0      return type1_data, type2_data
Here, line 16 alone takes up almost all of the time. In the future, the real data I will work with will be much bigger.
Can anyone please suggest a faster method of partitioning a numpy array?
This should make it a lot faster:
def partition_data_vectorized(arr, target_colm):
    total_colms = arr.shape[1]
    target_data = arr[:, target_colm]
    mask = target_data == 0
    type1_data = arr[mask, :]
    type2_data = arr[~mask, :]
    return (
        type1_data.reshape(int(type1_data.size / total_colms), total_colms),
        type2_data.reshape(int(type2_data.size / total_colms), total_colms))
Some timings:
# Generate some sample inputs:
arr = np.random.rand(10000, 42)
arr[:, 7] = np.random.randint(0, 10, 10000)
%timeit c, d = partition_data_vectorized(arr, 7)
# 2.09 ms ± 200 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%timeit a, b = partition_data(arr, 7)
# 4.07 s ± 102 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
This is 2000 times faster than the non-vectorized calculation!
Comparing the results:
np.all(b == d)
# Out: True
np.all(a == c)
# Out: True
So the results are correct, and it is about 2000 times faster, just by replacing the for loop and the repeated np.append array creation with vectorized operations.
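As a side note (my observation, not part of the original answer): boolean row indexing already preserves the number of columns, so the reshape calls are not strictly necessary; a minimal equivalent sketch:
import numpy as np

def partition_data_simple(arr, target_colm):
    # rows whose target column equals 0 go into the first array, the rest into the second
    mask = arr[:, target_colm] == 0
    return arr[mask], arr[~mask]

# e.g. with the sample input generated above:
# a2, b2 = partition_data_simple(arr, 7)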

mean of parts of an array in octave

I have two arrays. One is a list of lengths within the other. For example
zarray = [1 2 3 4 5 6 7 8 9 10]
and
lengths = [1 3 2 1 3]
I want to average (mean) over parts the first array with lengths given by the second. For this example, resulting in:
[mean([1]),mean([2,3,4]),mean([5,6]),mean([7]),mean([8,9,10])]
I am trying to avoid looping, for the sake of speed. I tried using mat2cell and cellfun as follows
zcell = mat2cell(zarray,[1],lengths);
zcellsum = cellfun('mean',zcell);
But the cellfun part is very slow. Is there a way to do this without looping or cellfun?
Here is a fully vectorized solution (no explicit for-loops, or hidden loops with ARRAYFUN, CELLFUN, ..). The idea is to use the extremely fast ACCUMARRAY function:
%# data
zarray = [1 2 3 4 5 6 7 8 9 10];
lengths = [1 3 2 1 3];
%# generate subscripts: 1 2 2 2 3 3 4 5 5 5
endLocs = cumsum(lengths(:));
subs = zeros(endLocs(end),1);
subs([1;endLocs(1:end-1)+1]) = 1;
subs = cumsum(subs);
%# mean of each part
means = accumarray(subs, zarray) ./ lengths(:)
The result in this case:
means =
1
3
5.5
7
9
Speed test:
Consider the following comparison of the different methods. I am using the TIMEIT function by Steve Eddins:
function [t,v] = testMeans()
    %# generate test data
    [arr,len] = genData();
    %# define functions
    f1 = @() func1(arr,len);
    f2 = @() func2(arr,len);
    f3 = @() func3(arr,len);
    f4 = @() func4(arr,len);
    %# timeit
    t(1) = timeit( f1 );
    t(2) = timeit( f2 );
    t(3) = timeit( f3 );
    t(4) = timeit( f4 );
    %# return results to check their validity
    v{1} = f1();
    v{2} = f2();
    v{3} = f3();
    v{4} = f4();
end

function [arr,len] = genData()
    %#arr = [1 2 3 4 5 6 7 8 9 10];
    %#len = [1 3 2 1 3];
    numArr = 10000;      %# number of elements in array
    numParts = 500;      %# number of parts/regions
    arr = rand(1,numArr);
    len = zeros(1,numParts);
    len(1:end-1) = diff(sort( randperm(numArr,numParts) ));
    len(end) = numArr - sum(len);
end

function m = func1(arr, len)
    %# @Drodbar: for-loop
    idx = 1;
    N = length(len);
    m = zeros(1,N);
    for i=1:N
        m(i) = mean( arr(idx+(0:len(i)-1)) );
        idx = idx + len(i);
    end
end

function m = func2(arr, len)
    %# @user1073959: MAT2CELL+CELLFUN
    m = cellfun(@mean, mat2cell(arr, 1, len));
end

function m = func3(arr, len)
    %# @Drodbar: ARRAYFUN+CELLFUN
    idx = arrayfun(@(a,b) a-(0:b-1), cumsum(len), len, 'UniformOutput',false);
    m = cellfun(@(a) mean(arr(a)), idx);
end

function m = func4(arr, len)
    %# @Amro: ACCUMARRAY
    endLocs = cumsum(len(:));
    subs = zeros(endLocs(end),1);
    subs([1;endLocs(1:end-1)+1]) = 1;
    subs = cumsum(subs);
    m = accumarray(subs, arr) ./ len(:);
    if isrow(len)
        m = m';
    end
end
Below are the timings. Tests were performed on a WinXP 32-bit machine with MATLAB R2012a. My method is an order of magnitude faster than all other methods. For-loop is second best.
>> [t,v] = testMeans();
>> t
t =
      0.013098     0.013074     0.022407   0.00031807
          |            |            |          \_________ @Amro: ACCUMARRAY (!)
          |            |            \____________________ @Drodbar: ARRAYFUN+CELLFUN
          |            \_________________________________ @user1073959: MAT2CELL+CELLFUN
          \______________________________________________ @Drodbar: FOR-loop
Furthermore, all results are correct and equal -- differences are on the order of eps, the machine precision (caused by different ways of accumulating round-off errors), and are therefore considered negligible and simply ignored:
%#assert( isequal(v{:}) )
>> maxErr = max(max( diff(vertcat(v{:})) ))
maxErr =
3.3307e-16
Here is a solution using arrayfun and cellfun
zarray = [1 2 3 4 5 6 7 8 9 10];
lengths = [1 3 2 1 3];
% Generate the indexes for the elements contained within each length specified
% subset. idx would be {[1], [4, 3, 2], [6, 5], [7], [10, 9, 8]} in this case
idx = arrayfun(@(a,b) a-(0:b-1), cumsum(lengths), lengths,'UniformOutput',false);
means = cellfun( @(a) mean(zarray(a)), idx);
Your desired output result:
means =
1.0000 3.0000 5.5000 7.0000 9.0000
Following @tmpearce's comment, I did a quick performance comparison between the above solution, from which I created a function called subsetMeans1,
function means = subsetMeans1( zarray, lengths)
    % Generate the indexes for the elements contained within each length specified
    % subset. idx would be {[1], [4, 3, 2], [6, 5], [7], [10, 9, 8]} in this case
    idx = arrayfun(@(a,b) a-(0:b-1), cumsum(lengths), lengths,'UniformOutput',false);
    means = cellfun( @(a) mean(zarray(a)), idx);
and a simple for loop alternative, function subsetMeans2.
function means = subsetMeans2( zarray, lengths)
    % Method based on single loop
    idx = 1;
    N = length(lengths);
    means = zeros( 1, N);
    for i = 1:N
        means(i) = mean( zarray(idx+(0:lengths(i)-1)) );
        idx = idx+lengths(i);
    end
Using the following test script, based on TIMEIT, which allows checking performance while varying the number of elements in the input vector and the sizes of the subsets:
% Generate some data for the performance test
% Total of elements on the vector to test
nVec = 100000;
% Max of elements per subset
nSubset = 5;
% Data generation aux variables
lenghtsGen = randi( nSubset, 1, nVec);
accumLen = cumsum(lenghtsGen);
maxIdx = find( accumLen < nVec, 1, 'last' );
% % Original test data
% zarray = [1 2 3 4 5 6 7 8 9 10];
% lengths = [1 3 2 1 3];
% Vector to test
zarray = 1:nVec;
lengths = [ lenghtsGen(1:maxIdx) nVec-accumLen(maxIdx)] ;
% Double check that nVec will be the max index
assert ( sum(lengths) == nVec)
t1(1) = timeit(@() subsetMeans1( zarray, lengths));
t1(2) = timeit(@() subsetMeans2( zarray, lengths));
fprintf('Time spent subsetMeans1: %f\n',t1(1));
fprintf('Time spent subsetMeans2: %f\n',t1(2));
It turns out that the non-vectorised version, without arrayfun and cellfun, is faster, presumably due to the extra overhead of those functions:
Time spent subsetMeans1: 2.082457
Time spent subsetMeans2: 1.278473
