Say I have a 500000x1 array called A. I want to divide this array into 1000 equal sections and then calculate the mean of each section. So I will end up with a 1000x1 array called B, in which B[1] is the mean of A[1:500], B[2] is the mean of A[501:1000], and so on. Since I will be doing this many, many times, I want to do it efficiently. What's the most efficient way of doing this in MATLAB/Python?
NumPy/Python
We could reshape to have 500 columns and then compute the average along the second axis -
A.reshape(-1,500).mean(axis=1)
Sample run -
In [89]: A = np.arange(50)+1;
In [90]: A.reshape(-1,5).mean(1)
Out[90]: array([ 3., 8., 13., 18., 23., 28., 33., 38., 43., 48.])
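Wrapped up as a reusable helper, the idea might look like the sketch below. The name block_means is hypothetical, and the sketch assumes the array length is an exact multiple of the section size (the reshape would fail otherwise, so it checks up front).

```python
import numpy as np

def block_means(a, block_size):
    """Hypothetical helper: mean of each consecutive block of `block_size` elements."""
    a = np.asarray(a)
    if a.size % block_size:
        raise ValueError("array length must be a multiple of block_size")
    # Each row of the reshaped view is one section; average across the row.
    return a.reshape(-1, block_size).mean(axis=1)

A = np.arange(500000) + 1
B = block_means(A, 500)
print(B.shape)  # (1000,)
```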
Runtime test :
An alternative method to get those average values would be with the old-fashioned way of computing the sum and then dividing by the number of elements involved in the summation. Let's time these two methods -
In [107]: A = np.arange(500000)+1;
In [108]: %timeit A.reshape(-1,500).mean(1)
1000 loops, best of 3: 1.19 ms per loop
In [109]: %timeit A.reshape(-1,500).sum(1)/500.0
1000 loops, best of 3: 583 µs per loop
Seems like quite an improvement with the alternative method! But wait: that's because with the mean method NumPy converts the integer input to float by default, and that conversion overhead shows up here.
So, if we use float input arrays, we get a different and fairer comparison -
In [144]: A = np.arange(500000).astype(float)+1;
In [145]: %timeit A.reshape(-1,500).mean(1)
1000 loops, best of 3: 534 µs per loop
In [146]: %timeit A.reshape(-1,500).sum(1)/500.0
1000 loops, best of 3: 516 µs per loop
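The dtype difference behind those timings can be checked directly. A quick sketch: on integer input, sum keeps an integer dtype while mean returns float64, and on float input the two approaches produce the same section means.

```python
import numpy as np

A_int = np.arange(500000) + 1
A_flt = A_int.astype(float)

# mean() on integer input returns float64
print(A_int.reshape(-1, 500).mean(1).dtype)      # float64
# sum() on integer input stays integer; the division converts afterwards
print(A_int.reshape(-1, 500).sum(1).dtype.kind)  # i

# On float input both approaches give the same section means
m1 = A_flt.reshape(-1, 500).mean(1)
m2 = A_flt.reshape(-1, 500).sum(1) / 500.0
print(np.allclose(m1, m2))  # True
```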
MATLAB
MATLAB uses column-major ordering, so we would reshape to have 500 rows and then average along the first dimension -
mean(reshape(A,500,[]),1)
Sample run -
>> A = 1:50;
>> mean(reshape(A,5,[]),1)
ans =
3 8 13 18 23 28 33 38 43 48
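For comparison, NumPy can mimic MATLAB's column-major reshape with order='F'. This sketch shows that reshaping to 5 rows in Fortran order and averaging down each column gives the same means as the row-major NumPy version above.

```python
import numpy as np

A = np.arange(50) + 1
# order='F' fills column by column, like MATLAB's reshape(A,5,[])
col_major = A.reshape(5, -1, order='F').mean(axis=0)
row_major = A.reshape(-1, 5).mean(axis=1)
print(np.array_equal(col_major, row_major))  # True
```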
Runtime test :
Let's try out the old-fashioned way here too -
>> A = 1:500000;
>> func1 = @() mean(reshape(A,500,[]),1);
>> timeit(func1)
ans =
0.0013021
>> func2 = @() sum(reshape(A,500,[]),1)/500.0;
>> timeit(func2)
ans =
0.0012291
Related
Basically I have two 1d numpy arrays, let's call them x and y, both of the same length. I want to essentially get the result x1*y1 + x2*y2 + ... + xn*yn. Obviously I could do this with a for loop, but is there a built-in method or something where I can do this in one line?
What you are trying to compute is known as an 'inner product' and, in the case of two vectors, is called a 'dot product'. Numpy has built-in functions for computing both which are optimized for speed over the simple (x*y).sum() solution.
import numpy as np
a = np.array([1, 2, 3])
b = np.array([3, 2, 1])
print(np.inner(a, b))
# 10
print(np.dot(a, b))
# 10
Some timing results, with vectors a and b each holding 1000 random elements generated with np.random.randn:
np.dot(a, b) # 920 ns ± 9.9 ns
np.inner(a, b) # 1.1 µs ± 83.5 ns
(a*b).sum() # 4.2 µs ± 62.9 ns
np.sum(a*b) # 5.7 µs ± 170 ns
You can use sum(x*y) or (x*y).sum(); they're equivalent.
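A minimal sketch for checking these variants on your own data with the standard timeit module. The vector length of 1000 mirrors the table above; the absolute timings will depend on your machine. Note that Python's built-in sum iterates over the array element by element, so it gives the same value (up to rounding) but is not a vectorized reduction.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(1000)
b = rng.standard_normal(1000)

results = {
    "np.dot": np.dot(a, b),
    "np.inner": np.inner(a, b),
    "(a*b).sum()": (a * b).sum(),
    "builtin sum": sum(a * b),  # Python-level loop over the elements
}
# All variants agree up to floating-point rounding
for name, value in results.items():
    assert np.isclose(value, results["np.dot"]), name

for stmt in ["np.dot(a, b)", "np.inner(a, b)", "(a*b).sum()", "sum(a*b)"]:
    t = timeit.timeit(stmt, globals={"np": np, "a": a, "b": b}, number=1000)
    print(f"{stmt:15s} {t / 1000 * 1e6:.2f} µs per call")
```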
I have code where I do a lot of basic arithmetic calculations on a bunch of numerical data stored in multiple arrays. I have realized that in most conceivable operations, NumPy classes are always slower than the default Python ones. Why is this?
For example, I have a simple snippet where all I do is update one NumPy array element with another one retrieved from another NumPy array, or update it with the product of two other NumPy array elements. It should be a basic operation, yet it is always at least 2-3x slower than if I do it with a list.
First I thought it was because I hadn't harmonized the data structures and the compiler had to do a lot of unnecessary conversions. So I recoded the whole thing and replaced every float with numpy.float64 and every list with numpy.ndarray, so that all the data is numpy.float64 throughout the code and no unnecessary conversions are needed.
The code is still 2-3 times slower than if I just use lists and floats.
For example:
import random
ALPHA = [[random.uniform(*a_param) for k in range(l2)] for l in range(l1)]
COEFF = [[random.uniform(*c_param) for k in range(l2)] for l in range(l1)]
summa = 0.0
for l in range(l1):
    for k in range(l2):
        summa += COEFF[l][k] * ALPHA[l][k]
will always be 2-3x faster than:
import numpy
ALPHA = numpy.random.uniform(*a_param, (l1, l2))
COEFF = numpy.random.uniform(*c_param, (l1, l2))
summa = 0.0
for l in range(l1):
    for k in range(l2):
        summa += COEFF[l][k] * ALPHA[l][k]
How is this possible? Am I doing something wrong? NumPy is supposed to speed things up.
For the record, I am using Python 3.5.3 and NumPy 1.12.1. Should I update?
Modifying a single element of a NumPy array is not expected to be faster than modifying a single element of a Python list. The speedup from using NumPy comes when you perform "vectorized" operations on entire arrays (or subsets of arrays). Try assigning the first 10000 elements of a NumPy array to be equal to the first 10000 elements of another, and compare that with using lists.
If your data and/or operations are very small (one or just a few elements), you are probably better off not using NumPy.
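Here is a sketch of the suggested experiment: copy the first 10000 elements in a single vectorized slice assignment, versus copying them one element at a time in a Python loop over lists. The array sizes and repetition counts are arbitrary choices for illustration.

```python
import timeit
import numpy as np

N = 10000
src_arr = np.random.rand(2 * N)
dst_arr = np.zeros(2 * N)
src_list = src_arr.tolist()
dst_list = dst_arr.tolist()

def copy_vectorized():
    # One call into NumPy's compiled code copies the whole block
    dst_arr[:N] = src_arr[:N]

def copy_loop_list():
    # One Python-level operation per element
    for i in range(N):
        dst_list[i] = src_list[i]

copy_vectorized()
copy_loop_list()
assert np.array_equal(dst_arr[:N], src_arr[:N])
assert dst_list[:N] == src_list[:N]

print("vectorized:", timeit.timeit(copy_vectorized, number=100))
print("list loop: ", timeit.timeit(copy_loop_list, number=100))
```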
I tried two things:
Running your two blocks of code. For me, they were about the same speed.
Writing a new function that exploits numpy's vectorized math. This is several times faster than the other methods.
Here are my functions:
import random
import numpy as np
def with_lists(l1, l2):
    ALPHA = [[random.uniform(0, 1) for k in range(l2)] for l in range(l1)]
    COEFF = [[random.uniform(0, 1) for k in range(l2)] for l in range(l1)]
    summa = 0.0
    for l in range(l1):
        for k in range(l2):
            summa += COEFF[l][k] * ALPHA[l][k]
    return summa
def with_arrays(l1, l2):
    ALPHA = np.random.uniform(size=(l1, l2))
    COEFF = np.random.uniform(size=(l1, l2))
    summa = 0.0
    for l in range(l1):
        for k in range(l2):
            summa += COEFF[l][k] * ALPHA[l][k]
    return summa
def with_ufunc(l1, l2):
    """Avoid the loop completely by exploiting NumPy's
    elementwise math."""
    ALPHA = np.random.uniform(size=(l1, l2))
    COEFF = np.random.uniform(size=(l1, l2))
    return np.sum(COEFF * ALPHA)
When I compare the speed (I'm using the %timeit magic in IPython), I get the following:
>>> %timeit with_lists(10, 10)
107 µs ± 4.7 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit with_arrays(10, 10)
91.9 µs ± 10.5 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit with_ufunc(10, 10)
12.6 µs ± 589 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
The third function, which avoids the loops entirely, is about 10 to 30 times faster on my machine, depending on the values of l1 and l2.
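Because each function above draws its own random numbers, their return values can't be compared directly. With fixed inputs, a sketch like the one below confirms that the loop and several single-call vectorized forms (np.sum of the product, np.vdot, np.einsum) compute the same sum. The seed and the 10x10 size are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)
ALPHA = rng.uniform(size=(10, 10))
COEFF = rng.uniform(size=(10, 10))

# Explicit double loop over scalar elements (the slow path).
# COEFF[l, k] avoids building the intermediate row view that COEFF[l][k] creates.
loop_sum = 0.0
for l in range(ALPHA.shape[0]):
    for k in range(ALPHA.shape[1]):
        loop_sum += COEFF[l, k] * ALPHA[l, k]

vec_sum = np.sum(COEFF * ALPHA)               # elementwise product, then one reduction
vdot_sum = np.vdot(COEFF, ALPHA)              # flattens both arrays, then dot product
ein_sum = np.einsum('ij,ij->', COEFF, ALPHA)  # multiply and sum in one call

print(np.allclose([vec_sum, vdot_sum, ein_sum], loop_sum))  # True
```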
Say I have the following numpy array
n = 50
a = np.array(range(1, 1000)) / 1000.
I would like to execute this line of code
%timeit v = [a ** k for k in range(0, n)]
1000 loops, best of 3: 2.01 ms per loop
However, this line of code will ultimately be executed inside a loop, so I run into performance issues.
Is there a way to optimize it? For example, each element of the list comprehension is simply the previous element multiplied by a again.
I don't mind storing the results in a 2d-array instead of arrays in a list. That would probably be cleaner. By the way, I also tried the following, but it yields similar performance results:
k = np.array(range(0, n))
ones = np.ones(n)
temp = np.outer(a, ones)
And then performed the following calculation
%timeit temp ** k
1000 loops, best of 3: 1.96 ms per loop
or
%timeit np.power(temp, k)
1000 loops, best of 3: 1.92 ms per loop
Both give timings similar to the list comprehension above. By the way, n will always be an integer in my case.
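The observation that each power is the previous one multiplied by a can be turned into a loop over the n powers, where each step is a single vectorized multiply into a preallocated 2-D array. A sketch, with row k holding a**k; whether it beats the alternatives on a given machine is not established here.

```python
import numpy as np

n = 50
a = np.array(range(1, 1000)) / 1000.

def powers_by_recurrence(a, n):
    out = np.empty((n, a.size))
    out[0] = 1.0  # a**0
    for k in range(1, n):
        # reuse the previous row instead of recomputing a**k from scratch
        np.multiply(out[k - 1], a, out=out[k])
    return out

v = powers_by_recurrence(a, n)
print(np.allclose(v, [a ** k for k in range(n)]))  # True
```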
In quick tests cumprod seems to be faster.
In [225]: timeit v = np.array([a ** k for k in range(0, n)])
2.76 ms ± 1.62 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [228]: %%timeit
...: A=np.broadcast_to(a[:,None],(len(a),50))
...: v1=np.cumprod(A,axis=1)
...:
208 µs ± 42.3 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
To compare the values I have to shift the ranges, since v includes the 0th power while v1 starts with the 1st power:
In [224]: np.allclose(np.array(v)[1:], v1.T[:-1])
Out[224]: True
But the timings suggest that cumprod is worth refining.
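One possible refinement, sketched below: put a column of ones in front of the copies of a, so the cumulative product includes the 0th power and lines up with v directly (rows are elements of a, columns are powers 0 to n-1).

```python
import numpy as np

n = 50
a = np.array(range(1, 1000)) / 1000.

def powers_cumprod(a, n):
    # column 0 holds a**0 == 1, columns 1..n-1 hold copies of a
    A = np.empty((a.size, n))
    A[:, 0] = 1.0
    A[:, 1:] = a[:, None]
    # running product along each row gives 1, a, a**2, ..., a**(n-1)
    return np.cumprod(A, axis=1)

v1 = powers_cumprod(a, n)
v = np.array([a ** k for k in range(n)])
print(np.allclose(v, v1.T))  # True
```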
The proposed duplicate was Efficient way to compute the Vandermonde matrix. That still has good ideas.
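NumPy also has a built-in Vandermonde helper, np.vander. With increasing=True, column k holds a**k, so its transpose matches v. A sketch; its speed relative to cumprod is not measured here.

```python
import numpy as np

n = 50
a = np.array(range(1, 1000)) / 1000.

V = np.vander(a, n, increasing=True)  # V[:, k] == a**k
print(V.shape)                        # (999, 50)
print(np.allclose(V.T, [a ** k for k in range(n)]))  # True
```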
I have a numpy array with a shape of:
(11L, 5L, 5L)
I want to calculate the mean over the 25 elements of each 'slice' of the array [0, :, :], [1, :, :] etc, returning 11 values.
It seems silly, but I can't work out how to do this. I thought the mean(axis=x) function would do it, but I've tried all possible combinations of axis and none of them gives the result I want.
I can obviously do this using a for loop and slicing, but surely there is a better way?
Use a tuple for axis:
>>> a = np.arange(11*5*5).reshape(11,5,5)
>>> a.mean(axis=(1,2))
array([ 12., 37., 62., 87., 112., 137., 162., 187., 212.,
237., 262.])
Edit: This works only with numpy version 1.7+.
You can reshape(11, 25) and then call mean only once (faster):
a.reshape(11, 25).mean(axis=1)
Alternatively, you can call np.mean twice (about 2X slower on my computer):
a.mean(axis=2).mean(axis=1)
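A quick sketch confirming that these approaches give the same 11 means on the example array. Taking the mean twice is exact here only because every row within a slice has the same number of elements.

```python
import numpy as np

a = np.arange(11 * 5 * 5).reshape(11, 5, 5)

m_tuple = a.mean(axis=(1, 2))               # NumPy >= 1.7: reduce over several axes
m_reshape = a.reshape(11, 25).mean(axis=1)  # flatten each slice, then one reduction
m_twice = a.mean(axis=2).mean(axis=1)       # mean of row means (equal-sized rows)
print(np.allclose(m_tuple, m_reshape), np.allclose(m_tuple, m_twice))  # True True
```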
You can always use np.einsum:
>>> a = np.arange(11*5*5).reshape(11,5,5)
>>> np.einsum('...ijk->...i',a)/(a.shape[-1]*a.shape[-2])
array([ 12, 37, 62, 87, 112, 137, 162, 187, 212, 237, 262])
This also works on higher-dimensional arrays (all of these methods would, if the axis arguments are adjusted):
>>> a = np.arange(10*11*5*5).reshape(10,11,5,5)
>>> (np.einsum('...ijk->...i',a)/(a.shape[-1]*a.shape[-2])).shape
(10, 11)
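The reshape approach generalizes in the same way. Below is a sketch of a hypothetical helper, mean_last_two, that averages over the last two axes of an array of any dimensionality by merging those two axes first.

```python
import numpy as np

def mean_last_two(a):
    """Hypothetical helper: mean over the last two axes of an ndarray."""
    a = np.asarray(a)
    # merge the last two axes into one, keeping all leading axes intact
    return a.reshape(a.shape[:-2] + (-1,)).mean(axis=-1)

a3 = np.arange(11 * 5 * 5).reshape(11, 5, 5)
a4 = np.arange(10 * 11 * 5 * 5).reshape(10, 11, 5, 5)
print(mean_last_two(a3).shape, mean_last_two(a4).shape)         # (11,) (10, 11)
print(np.allclose(mean_last_two(a4), a4.mean(axis=(-2, -1))))  # True
```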
Faster to boot:
a = np.arange(11*5*5).reshape(11,5,5)
%timeit a.reshape(11, 25).mean(axis=1)
10000 loops, best of 3: 21.4 us per loop
%timeit a.mean(axis=(1,2))
10000 loops, best of 3: 19.4 us per loop
%timeit np.einsum('...ijk->...i',a)/(a.shape[-1]*a.shape[-2])
100000 loops, best of 3: 8.26 us per loop
It scales slightly better than the other methods as the array size increases.
Using dtype=np.float64 does not change the above timings appreciably, so just to double check:
a = np.arange(110*50*50,dtype=np.float64).reshape(110,50,50)
%timeit a.reshape(110,2500).mean(axis=1)
1000 loops, best of 3: 307 us per loop
%timeit a.mean(axis=(1,2))
1000 loops, best of 3: 308 us per loop
%timeit np.einsum('...ijk->...i',a)/(a.shape[-1]*a.shape[-2])
10000 loops, best of 3: 145 us per loop
Also something that is interesting:
%timeit np.sum(a) #37812362500.0
100000 loops, best of 3: 293 us per loop
%timeit np.einsum('ijk->',a) #37812362500.0
100000 loops, best of 3: 144 us per loop
I have an array containing many values between 0 and 360 (like degrees in a circle), but unevenly distributed:
1,45,46,47,48,49,50,51,52,53,54,55,100,120,140,188, 210, 280, 355
Now I need to reduce those values to, e.g., only 4 values that are distributed as evenly as possible.
How to do that?
Thanks,
Jan
Put the numbers on a circle, like a clock. Now construct a logical cross, say at 12, 3, 6, and 9 o’clock. Put the 12 at the first number. Now find what numbers would be nearest to 3, 6, and 9 o’clock, and record the sum of those three numbers’ distances next to the first number.
Iterate by rotating the top of your cross — the 12 o’clock point — clockwise until it exactly lines up with the next number. Again measure how far the nearest numbers are to each of your three other crosspoints, and record that score next to this current 12 o’clock number.
Repeat until your 12 o’clock has rotated all the way to the original 3 o’clock, at which point you’re done. Whichever number has the lowest sum assigned to it determines the winning configuration.
This solution generalizes to any range of values R and any number N of final points you wish to reduce the set to. The points on the “cross” are R/N apart, and you only need to rotate until the top of your cross reaches where the next arm was in the original position. So if you wanted 6 points, you would use a 6-pointed cross with arms 60 degrees apart instead of a 4-pointed cross with arms 90 degrees apart. If your range is different, you still do the same sort of operation. That way you don’t need a physical clock and cross to implement this algorithm: it works for any R and N.
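The rotating-cross procedure can be sketched in Python as follows. Assumptions of this sketch: the values are numbers in [0, R), the candidate 12 o'clock positions are the values less than R/N past the smallest one, distances are measured around the circle, and (since the description doesn't say otherwise) two arms may snap to the same value. The name reduce_evenly is illustrative.

```python
def reduce_evenly(values, n_points, period=360.0):
    """Rotating-cross selection: returns (chosen values, total arm distance)."""
    vals = sorted(values)
    step = period / n_points  # spacing between the arms of the "cross"

    def circ_dist(x, y):
        # shortest distance between two positions on a circle of length `period`
        d = abs(x - y) % period
        return min(d, period - d)

    best_pick, best_score = None, None
    for top in vals:
        # only rotate the top arm until it reaches where the next arm started
        if (top - vals[0]) % period >= step:
            continue
        pick, score = [top], 0.0
        for j in range(1, n_points):
            arm = (top + j * step) % period
            nearest = min(vals, key=lambda v: circ_dist(v, arm))
            pick.append(nearest)
            score += circ_dist(nearest, arm)
        if best_score is None or score < best_score:
            best_pick, best_score = pick, score
    return best_pick, best_score

values = [1, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55,
          100, 120, 140, 188, 210, 280, 355]
print(reduce_evenly(values, 4))  # ([1, 100, 188, 280], 25.0)
```

With the question's data, starting the cross at 1 puts the other arms at 91, 181 and 271, whose nearest values are 100, 188 and 280.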
I feel bad about this answer from a Perl perspective, as I’ve not managed to include any dollar signs in the solution. :)
Use a clustering algorithm to divide your data into evenly distributed partitions, then pick a random value from each cluster. The $datafile used below looks like this:
1 1
45 45
46 46
...
210 210
280 280
355 355
The first column is a tag, the second column is the data. Running the following with $K = 4:
use strict; use warnings;
use Algorithm::KMeans;
use YAML;
my $datafile = $ARGV[0] or die "usage: $0 datafile K\n";
my $K = $ARGV[1] || 0;    # '||' binds tighter than '=', unlike 'or'
my $mask = 'N1';          # column 1 is the tag (N), column 2 is used as data (1)
my $clusterer = Algorithm::KMeans->new(
    datafile        => $datafile,
    mask            => $mask,
    K               => $K,
    terminal_output => 0,
);
$clusterer->read_data_from_file();
my ($clusters, $cluster_centers) = $clusterer->kmeans();
my %clusters;
while (@$clusters) {
    my $cluster = shift @$clusters;
    my $center  = shift @$cluster_centers;
    # pick a random member of this cluster (any index 0..$#$cluster), keyed by its center
    $clusters{"@$center"} = $cluster->[int rand @$cluster];
}
print Dump \%clusters;
returns this:
120: 120
199: 188
317.5: 355
45.9166666666667: 46
The first column is the center of the cluster, the second is the value selected from that cluster. The centers' distance to one another should be maximized according to the Expectation Maximization algorithm.
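For readers working in Python rather than Perl, here is a rough NumPy sketch of the same cluster-then-pick idea. It is not a port of Algorithm::KMeans. It runs a plain 1-D k-means (Lloyd's algorithm) with centers seeded at evenly spaced quantiles, which is an assumption of this sketch. Like the Perl version, it treats the values as points on a line, so 355 and 1 are not considered neighbours.

```python
import numpy as np

def kmeans_1d(data, k, n_iter=100):
    """Plain 1-D k-means (Lloyd's algorithm); returns (centers, labels)."""
    x = np.asarray(data, dtype=float)
    # Seed centers at evenly spaced quantiles of the data (a choice made for this sketch)
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(n_iter):
        # Assign each value to its nearest center
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Move each center to the mean of its members (keep it if the cluster is empty)
        new_centers = np.array([x[labels == j].mean() if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    return centers, labels

def pick_one_per_cluster(data, k, seed=0):
    """Cluster the data, then pick one random member of each non-empty cluster."""
    rng = np.random.default_rng(seed)
    x = np.asarray(data, dtype=float)
    centers, labels = kmeans_1d(x, k)
    picks = {}
    for j, center in enumerate(centers):
        members = x[labels == j]
        if members.size:
            picks[float(center)] = float(rng.choice(members))
    return picks

data = [1, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55,
        100, 120, 140, 188, 210, 280, 355]
print(pick_one_per_cluster(data, 4))
```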