What is the algorithm behind the axis option in numpy? - arrays

I cannot understand how axis in numpy works for any general n-D array.
For example, np.mean(array, axis=1) will give an expected array regardless of the shape of the array.
If I want to make a code without using numpy, I can make averaging function for k-D array, e.g., mean_1d, mean_2d, ... with axis option. However, I cannot imagine how to make a single function which works for any input array with any k value. So far, I couldn't find an explanation how numpy axis works. Can anybody help me understanding what's going on behind the scene?
Added: My original goal was to make a code to do a fast array combination with outlier rejection (i.e., sigma-clipped median combine of a list of n-D arrays and obtain an (n-1)-D array) using numba.

If the size of your arrays allow you can see it recursively. I have just tested the code below and it seems to work fine for sum function along axis 3. It is simple but it should convey the idea :
import numpy as np
a = np.ones((3,4,5,6,7))
# first get the shape of the result
def shape_res(sh1, axis) :
sh2 = [sh1[0]]
if len(sh1)>0 :
for i in range(len(sh1)) :
if i > 0 and i != axis :
sh2.append(sh1[i])
elif i == axis :
# shrink the summing axis
sh2.append(1)
return sh2
def sum_axis(a, axis=0) :
cur_sum = 0
sh1 = np.shape(a)
sh2 = shape_res(sh1, axis)
res = np.zeros(sh2)
print(sh1, sh2)
def rec_fun (a, res, cur_axis) :
for i in range(len(a)) :
if axis > cur_axis :
# dig in the array to reach the right axis
next_axis = cur_axis + 1
rec_fun(a[i], res[i], next_axis)
elif axis == cur_axis :
# sum along the given axis
res[0] += a[i]
rec_fun(a, res, cur_axis)
return res
result = sum_axis(a, axis=3)
print(result)

Related

pyQtGraph detect widget bounds and start scrolling

I'm trying to plot a live graph (a graph about the evolution of a stock trading) with pyQtGraph and have some questions I haven't been able to solve checking the examples:
I want the graph to start painting from left to right (this is what happens by default) and when it reaches right side of bounding box instead of resize it to make all new data fit I would like it to scroll making new data enter from the right and old data dissapearing to the left.
I know that appending data to a numpy array creates a new instance of the array. I don't want this. Is there any way to tell pyQtGraph plot to just get data in a range of the numpy array? For exmaple could I instantiate initially an array of 10000 floats and tell pyQtGraph to just plot the first 100 floats?
On the other hand I have come across that I could just modify the array in-place and shift the numbers to simulate the scrolling. Is there any way to make pyQtGraph use a numpy array as a ring? This way I would only need to tell that the graph starts at an index and everything would work without allocations, etc...
Here the code I have so far, pretty simple:
class Grapher(QtWidgets.QMainWindow, Ui_MainWindow):
def __init__(self):
QtWidgets.QMainWindow.__init__(self)
Ui_MainWindow.__init__(self)
self.setupUi(self)
self.graphicsView.setTitle('My Graph')
self.graphicsView.setLabel('bottom', 'X axis')
self.graphicsView.setLabel('left', 'Y axis')
self.graphicsView.setLimits(xMin=0, yMin=0)
self.graphicsView.showGrid(x=True, y=True)
self.x=np.array();
self.y=np.array();
self._timer=QtCore.QTimer()
self._timer.timeout.connect(self.update)
self._timer.start(1000)
self._timerCounter=0;
def update(self):
self.x=np.append(self.x, [self._timerCounter]);
self.y=np.append(self.y, [math.sin(self._timerCounter)])
self.graphicsView.plot(self.x, self.y)
self._timerCounter+=1;
Thanks in advance.
This is how I would do it. If, say, you have an array of 1000 points but you only want to plot 200 points:
class Grapher(QtWidgets.QMainWindow, Ui_MainWindow):
def __init__(self):
QtWidgets.QMainWindow.__init__(self)
Ui_MainWindow.__init__(self)
self.setupUi(self)
self.graphicsView.setTitle('My Graph')
self.graphicsView.setLabel('bottom', 'X axis')
self.graphicsView.setLabel('left', 'Y axis')
self.graphicsView.setLimits(xMin=0, yMin=0)
self.graphicsView.showGrid(x=True, y=True)
self.plot = self.graphicsView.plot()
self.ptr = 0
self.x = np.zeros(1000)
self.y = np.zeros(1000)
self.xPlot = np.zeros(200)
self.yPlot = np.zeros(200)
self._timer = QtCore.QTimer()
self._timer.timeout.connect(self.update)
self._timer.start(1000)
self._timerCounter = 0
def update(self):
xNew = self._timerCounter
yNew = math.sin(xNew)
self.y[self.ptr] = math.sin(self._timerCounter)
self.x[self.ptr] = self._timerCounter
if self.ptr < 200:
# Normal assignment
self.yPlot[self.ptr] = self.y[self.ptr]
self.xPlot[self.ptr] = self.x[self.ptr]
self.plot.setData(self.xPlot[1:self.ptr + 1],
self.yPlot[1:self.ptr + 1])
else:
# Shift arrays and then assign
self.yPlot[:-1] = self.yPlot[1:]
self.yPlot[-1] = self.y[self.ptr]
self.xPlot[:-1] = self.xPlot[1:]
self.xPlot[-1] = self.x[self.ptr]
self.plot.setData(self.xPlot, self.yPlot)
self.ptr += 1

Is there a way to reshape an array that does not maintain the original size (or a convenient work-around)?

As a simplified example, suppose I have a dataset composed of 40 sorted values. The values of this example are all integers, though this is not necessarily the case for the actual dataset.
import numpy as np
data = np.linspace(1,40,40)
I am trying to find the maximum value inside the dataset for certain window sizes. The formula to compute the window sizes yields a pattern that is best executed with arrays (in my opinion). For simplicity sake, let's say the indices denoting the window sizes are a list [1,2,3,4,5]; this corresponds to window sizes of [2,4,8,16,32] (the pattern is 2**index).
## this code looks long because I've provided docstrings
## just in case the explanation was unclear
def shapeshifter(num_col, my_array=data):
"""
This function reshapes an array to have 'num_col' columns, where
'num_col' corresponds to index.
"""
return my_array.reshape(-1, num_col)
def looper(num_col, my_array=data):
"""
This function calls 'shapeshifter' and returns a list of the
MAXimum values of each row in 'my_array' for 'num_col' columns.
The length of each row (or the number of columns per row if you
prefer) denotes the size of each window.
EX:
num_col = 2
==> window_size = 2
==> check max( data[1], data[2] ),
max( data[3], data[4] ),
max( data[5], data[6] ),
.
.
.
max( data[39], data[40] )
for k rows, where k = len(my_array)//num_col
"""
my_array = shapeshifter(num_col=num_col, my_array=data)
rows = [my_array[index] for index in range(len(my_array))]
res = []
for index in range(len(rows)):
res.append( max(rows[index]) )
return res
So far, the code is fine. I checked it with the following:
check1 = looper(2)
check2 = looper(4)
print(check1)
>> [2.0, 4.0, ..., 38.0, 40.0]
print(len(check1))
>> 20
print(check2)
>> [4.0, 8.0, ..., 36.0, 40.0]
print(len(check2))
>> 10
So far so good. Now here is my problem.
def metalooper(col_ls, my_array=data):
"""
This function calls 'looper' - which calls
'shapeshifter' - for every 'col' in 'col_ls'.
EX:
j_list = [1,2,3,4,5]
==> col_ls = [2,4,8,16,32]
==> looper(2), looper(4),
looper(8), ..., looper(32)
==> shapeshifter(2), shapeshifter(4),
shapeshifter(8), ..., shapeshifter(32)
such that looper(2^j) ==> shapeshifter(2^j)
for j in j_list
"""
res = []
for col in col_ls:
res.append(looper(num_col=col))
return res
j_list = [2,4,8,16,32]
check3 = metalooper(j_list)
Running the code above provides this error:
ValueError: total size of new array must be unchanged
With 40 data points, the array can be reshaped into 2 columns of 20 rows, or 4 columns of 10 rows, or 8 columns of 5 rows, BUT at 16 columns, the array cannot be reshaped without clipping data since 40/16 ≠ integer. I believe this is the problem with my code, but I do not know how to fix it.
I am hoping there is a way to cutoff the last values in each row that do not fit in each window. If this is not possible, I am hoping I can append zeroes to fill the entries that maintain the size of the original array, so that I can remove the zeroes after. Or maybe even some complicated if - try - break block. What are some ways around this problem?
I think this will give you what you want in one step:
def windowFunc(a, window, f = np.max):
return np.array([f(i) for i in np.split(a, range(window, a.size, window))])
with default f, that will give you a array of maximums for your windows.
Generally, using np.split and range, this will let you split into a (possibly ragged) list of arrays:
def shapeshifter(num_col, my_array=data):
return np.split(my_array, range(num_col, my_array.size, num_col))
You need a list of arrays because a 2D array can't be ragged (every row needs the same number of columns)
If you really want to pad with zeros, you can use np.lib.pad:
def shapeshifter(num_col, my_array=data):
return np.lib.pad(my_array, (0, num_col - my.array.size % num_col), 'constant', constant_values = 0).reshape(-1, num_col)
Warning:
It is also technically possible to use, for example, a.resize(32,2) which will create an ndArray padded with zeros (as you requested). But there are some big caveats:
You would need to calculate the second axis because -1 tricks don't work with resize.
If the original array a is referenced by anything else, a.resize will fail with the following error:
ValueError: cannot resize an array that references or is referenced
by another array in this way. Use the resize function
The resize function (i.e. np.resize(a)) is not equivalent to a.resize, as instead of padding with zeros it will loop back to the beginning.
Since you seem to want to reference a by a number of windows, a.resize isn't very useful. But it's a rabbit hole that's easy to fall into.
EDIT:
Looping through a list is slow. If your input is long and windows are small, the windowFunc above will bog down in the for loops. This should be more efficient:
def windowFunc2(a, window, f = np.max):
tail = - (a.size % window)
if tail == 0:
return f(a.reshape(-1, window), axis = -1)
else:
body = a[:tail].reshape(-1, window)
return np.r_[f(body, axis = -1), f(a[tail:])]
Here's a generalized way to reshape with truncation:
def reshape_and_truncate(arr, shape):
desired_size_factor = np.prod([n for n in shape if n != -1])
if -1 in shape: # implicit array size
desired_size = arr.size // desired_size_factor * desired_size_factor
else:
desired_size = desired_size_factor
return arr.flat[:desired_size].reshape(shape)
Which your shapeshifter could use in place of reshape

Nested array slicing

Let's say I have an array of vectors:
""" simple line equation """
function getline(a::Array{Float64,1},b::Array{Float64,1})
line = Vector[]
for i=0:0.1:1
vector = (1-i)a+(i*b)
push!(line, vector)
end
return line
end
This function returns an array of vectors containing x-y positions
Vector[11]
> Float64[2]
> Float64[2]
> Float64[2]
> Float64[2]
.
.
.
Now I want to seprate all x and y coordinates of these vectors to plot them with plotyjs.
I have already tested some approaches with no success!
What is a correct way in Julia to achive this?
You can broadcast getindex:
xs = getindex.(vv, 1)
ys = getindex.(vv, 2)
Edit 3:
Alternatively, use list comprehensions:
xs = [v[1] for v in vv]
ys = [v[2] for v in vv]
Edit:
For performance reasons, you should use StaticArrays to represent 2D points. E.g.:
getline(a,b) = [(1-i)a+(i*b) for i=0:0.1:1]
p1 = SVector(1.,2.)
p2 = SVector(3.,4.)
vv = getline(p1,p2)
Broadcasting getindex and list comprehensions will still work, but you can also reinterpret the vector as a 2×11 matrix:
to_matrix{T<:SVector}(a::Vector{T}) = reinterpret(eltype(T), a, (size(T,1), length(a)))
m = to_matrix(vv)
Note that this does not copy the data. You can simply use m directly or define, e.g.,
xs = #view m[1,:]
ys = #view m[2,:]
Edit 2:
Btw., not restricting the type of the arguments of the getline function has many advantages and is preferred in general. The version above will work for any type that implements multiplication with a scalar and addition, e.g., a possible implementation of immutable Point ... end (making it fully generic will require a bit more work, though).

Implementing Permutation of Complex Numbers In TensorFlow

In this associative lstm paper, http://arxiv.org/abs/1602.03032, they ask to permute a complex tensor.
They have provided their code here: https://github.com/mohammadpz/Associative_LSTM/blob/master/bricks.py#L79
I'm trying to replicate this in tensorflow. Here is what I have done:
# shape: C x F/2
# output = self.permutations: [num_copies x cell_size]
permutations = []
indices = numpy.arange(self._dim / 2) #[1 ,2 ,3 ...64]
for i in range(self._num_copies):
numpy.random.shuffle(indices) #[4, 48, 32, ...64]
permutations.append(numpy.concatenate(
[indices,
[ind + self._dim / 2 for ind in indices]]))
#you're appending a row with two columns -- a permutation in the first column, and the same permutation + dim/2 for imaginary
# C x F (numpy)
self.permutations = tf.constant(numpy.vstack(permutations), dtype = tf.int32) #This is a permutation tensor that has the stored permutations
# output = self.permutations: [num_copies x cell_size]
def permute(complex_tensor): #complex tensor is [batch_size x cell_size]
gather_tensor = tf.gather_nd(complex_tensor, self.permutations)
return gather_tensor
Basically, my question is: How efficiently can this be done in TensorFlow? Is there anyway to keep the batch size dimension fixed of complex tensor?
Also, is gather_nd the best way to go about this? Or is it better to do a for loop and iterate over each row in self.permutations using tf.gather?
def permute(self, complex_tensor):
inputs_permuted = []
for i in range(self.permutations.get_shape()[0].value):
inputs_permuted.append(
tf.gather(complex_tensor, self.permutations[i]))
return tf.concat(0, inputs_permuted)
I thought that gather_nd would be far more efficient.
Nevermind, I figured it out, the trick is to just use permute the original input tensor using tf transpose. This will allow you then to do a tf.gather on the entire matrix. Then you can tf concat the matrices together. Sorry if this wasted anyone's time.

Despite many examples online, I cannot get my MATLAB repmat equivalent working in python

I am trying to do some numpy matrix math because I need to replicate the repmat function from MATLAB. I know there are a thousand examples online, but I cannot seem to get any of them working.
The following is the code I am trying to run:
def getDMap(image, mapSize):
newSize = (float(mapSize[0]) / float(image.shape[1]), float(mapSize[1]) / float(image.shape[0]))
sm = cv.resize(image, (0,0), fx=newSize[0], fy=newSize[1])
for j in range(0, sm.shape[1]):
for i in range(0, sm.shape[0]):
dmap = sm[:,:,:]-np.array([np.tile(sm[j,i,:], (len(sm[0]), len(sm[1]))) for k in xrange(len(sm[2]))])
return dmap
The function getDMap(image, mapSize) expects an OpenCV2 HSV image as its image argument, which is a numpy array with 3 dimensions: [:,:,:]. It also expects a tuple with 2 elements as its imSize argument, of course making sure the function passing the arguments takes into account that in numpy arrays the rows and colums are swapped (not: x, y, but: y, x).
newSize then contains a tuple containing fracions that are used to resize the input image to a specific scale, and sm becomes a resized version of the input image. This all works fine.
This is my goal:
The following line:
np.array([np.tile(sm[i,j,:], (len(sm[0]), len(sm[1]))) for k in xrange(len(sm[2]))]),
should function equivalent to the MATLAB expression:
repmat(sm(j,i,:),[size(sm,1) size(sm,2)]),
This is my problem:
Testing this, an OpenCV2 image with dimensions 800x479x3 is passed as the image argument, and (64, 48) (a tuple) is passed as the imSize argument.
However when testing this, I get the following ValueError:
dmap = sm[:,:,:]-np.array([np.tile(sm[i,j,:], (len(sm[0]),
len(sm[1]))) for k in xrange(len(sm[2]))])
ValueError: operands could not be broadcast together with
shapes (48,64,3) (64,64,192)
So it seems that the array dimensions do not match and numpy has a problem with that. But my question is what? And how do I get this working?
These 2 calculations match:
octave:26> sm=reshape(1:12,2,2,3)
octave:27> x=repmat(sm(1,2,:),[size(sm,1) size(sm,2)])
octave:28> x(:,:,2)
7 7
7 7
In [45]: sm=np.arange(1,13).reshape(2,2,3,order='F')
In [46]: x=np.tile(sm[0,1,:],[sm.shape[0],sm.shape[1],1])
In [47]: x[:,:,1]
Out[47]:
array([[7, 7],
[7, 7]])
This runs:
sm[:,:,:]-np.array([np.tile(sm[0,1,:], (2,2,1)) for k in xrange(3)])
But it produces a (3,2,2,3) array, with replication on the 1st dimension. I don't think you want that k loop.
What's the intent with?
for i in ...:
for j in ...:
data = ...
You'll only get results from the last iteration. Did you want data += ...? If so, this might work (for a (N,M,K) shaped sm)
np.sum(np.array([sm-np.tile(sm[i,j,:], (N,M,1)) for i in xrange(N) for j in xrange(M)]),axis=0)
z = np.array([np.tile(sm[i,j,:], (N,M,1)) for i in xrange(N) for j in xrange(M)]),axis=0)
np.sum(sm - z, axis=0) # let numpy broadcast sm
Actually I don't even need the tile. Let broadcasting do the work:
np.sum(np.array([sm-sm[i,j,:] for i in xrange(N) for j in xrange(M)]),axis=0)
I can get rid of the loops with repeat.
sm1 = sm.reshape(N*M,L) # combine 1st 2 dim to simplify repeat
z1 = np.repeat(sm1, N*M, axis=0).reshape(N*M,N*M,L)
x1 = np.sum(sm1 - z1, axis=0).reshape(N,M,L)
I can also apply broadcasting to the last case
x4 = np.sum(sm1-sm1[:,None,:], 0).reshape(N,M,L)
# = np.sum(sm1[None,:,:]-sm1[:,None,:], 0).reshape(N,M,L)
With sm I have to expand (and sum) 2 dimensions:
x5 = np.sum(np.sum(sm[None,:,None,:,:]-sm[:,None,:,None,:],0),1)
len(sm[0]) and len(sm[1]) are not the sizes of the first and second dimensions of sm. They are the lengths of the first and second row of sm, and should both return the same value. You probably want to replace them with sm.shape[0] and sm.shape[1], which are equivalent to your Matlab code, although I am not sure that it will work as you expect it to.

Resources