How to collapse 2D scatter plot into a dot plot? - arrays

I have a very large 2D array of shape (186295, 2), where the first element of each 2-element row is x and the second is y. Here is how I produce the scatter plot in matplotlib by separating the x and y components (with a little horizontal jitter added to x):
ax.scatter(A[:, 0]+np.random.uniform(-.02, .02, A.shape[0]), A[:, 1], s=2, color='b', alpha=0.5, zorder=3)
However, I would like:
all points with x in the range [8, 9.2] to be shown as a dot plot at the midpoint x=8.6,
all points with x in the range [9.2, 10.4] to be shown as a dot plot at the midpoint x=9.8,
all points with x in the range [10.4, 12.2] to be shown as a dot plot at the midpoint x=11.3.
Your help is greatly appreciated.

You can use np.select:
Example:
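import numpy as np
from matplotlib import pyplot as plt
n = 100
x = np.random.uniform(8, 12, n)
y = np.random.uniform(.01, 1, n)
a = np.column_stack((x, y))  # shape (n, 2): column 0 is x, column 1 is y
fig, ax = plt.subplots(2, sharex=True)
ax[0].scatter(a[:, 0], a[:, 1])
ax[0].title.set_text('Scatter Plot')
# np.select takes, element by element, the choice of the FIRST true condition:
# x <= 8 stays as is, (8, 9.2] -> 8.6, (9.2, 10.4] -> 9.8, (10.4, 12.2] -> 11.3,
# and x > 12.2 stays as is.
conditions = [a[:, 0] <= 8, a[:, 0] <= 9.2, a[:, 0] <= 10.4, a[:, 0] <= 12.2, a[:, 0] > 12.2]
choices = [a[:, 0], 8.6, 9.8, 11.3, a[:, 0]]
a[:, 0] = np.select(conditions, choices)
ax[1].scatter(a[:, 0], a[:, 1])
ax[1].title.set_text('Dot Plot')
plt.show()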
Another possibility is np.digitize, which saves some typing because it takes a list of bin edges instead of a list of conditions.
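A minimal sketch of the np.digitize route, assuming the same edges and midpoints as in the np.select example. Passing `right=True` makes every bin right-closed, i.e. (8, 9.2], (9.2, 10.4], (10.4, 12.2], which matches the `<=` conditions used with np.select; values at or below 8 or above 12.2 are left unchanged. The helper name `collapse_to_midpoints` is hypothetical, chosen only for illustration.

```python
import numpy as np

def collapse_to_midpoints(x, edges, mids):
    """Replace each x in (edges[k], edges[k+1]] by mids[k]; leave other x as-is."""
    x = np.asarray(x, dtype=float)
    mids = np.asarray(mids, dtype=float)
    # With right=True, digitize returns i such that edges[i-1] < x <= edges[i]:
    # 0 for x <= edges[0] and len(edges) for x > edges[-1].
    idx = np.digitize(x, edges, right=True)
    inside = (idx >= 1) & (idx <= len(mids))
    out = x.copy()
    out[inside] = mids[idx[inside] - 1]
    return out

edges = np.array([8.0, 9.2, 10.4, 12.2])
mids = np.array([8.6, 9.8, 11.3])

x = np.array([7.5, 8.0, 9.0, 9.2, 10.0, 11.0, 12.2, 12.5])
print(collapse_to_midpoints(x, edges, mids))
```

For the array in the question this would be `A[:, 0] = collapse_to_midpoints(A[:, 0], edges, mids)` before calling `ax.scatter`. Use `right=False` instead if you prefer left-closed bins such as [8, 9.2).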

Related

Indexing 3D arrays with Numpy

I have a three-dimensional array (x, y, z) and an index vector whose length equals the x dimension of the array. For each x, the vector selects one position along y, keeping the corresponding z values, so the expected result has shape (x, z).
I wrote code that works as expected, but is there a NumPy function that can replace the for loop and solve the problem more efficiently?
import numpy as np
arr = np.random.rand(100, 5, 2)
result = np.random.rand(100, 2)
ids = [np.random.randint(0, 5) for _ in range(100)]  # one y-index per x
for i in range(100):
    result[i] = arr[i, ids[i]]
You can do this without a loop by using integer-array indexing:
import numpy as np
arr = np.random.randn(100, 5, 2)
ids = np.random.randint(0, 5, size=100)
res = arr[range(100), ids]
res.shape # (100, 2)
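A short check of that answer against the loop, plus np.take_along_axis as a second option (a standard NumPy function, not part of the original answer). The index arrays np.arange(n) and ids are paired element by element, so row i of arr[np.arange(n), ids] is arr[i, ids[i]], which is exactly what the loop computes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, ny, nz = 100, 5, 2
arr = rng.random((n, ny, nz))
ids = rng.integers(0, ny, size=n)        # one y-index per x

# Reference: the loop from the question
loop_result = np.empty((n, nz))
for i in range(n):
    loop_result[i] = arr[i, ids[i]]

# Integer-array indexing: pairs (i, ids[i]) for every i
fancy = arr[np.arange(n), ids]

# take_along_axis: the index array needs the same ndim as arr (its other axes
# broadcast), then the singleton y axis is dropped afterwards
along = np.take_along_axis(arr, ids[:, None, None], axis=1)[:, 0, :]

print(fancy.shape, along.shape)
print(np.array_equal(fancy, loop_result), np.array_equal(along, loop_result))
```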

Python 3.7: Modelling a 2D Gaussian equation using a Numpy meshgrid and arrays without iterating through each point

I am currently trying to write my own 2D Gaussian function as a coding exercise, and have been able to create the following script:
import numpy as np
import matplotlib.pyplot as plt
def Gaussian2D_v1(coords=None,  # x and y coordinates for each image.
                  amplitude=1,  # Highest intensity in image.
                  xo=0,  # x-coordinate of peak centre.
                  yo=0,  # y-coordinate of peak centre.
                  sigma_x=1,  # Standard deviation in x.
                  sigma_y=1,  # Standard deviation in y.
                  rho=0,  # Correlation coefficient.
                  offset=0):  # Offset from zero (background radiation).
    x, y = coords
    xo = float(xo)
    yo = float(yo)
    # Create covariance matrix
    mat_cov = [[sigma_x**2, rho * sigma_x * sigma_y],
               [rho * sigma_x * sigma_y, sigma_y**2]]
    mat_cov = np.asarray(mat_cov)
    # Find its inverse
    mat_cov_inv = np.linalg.inv(mat_cov)
    G_array = []
    # Calculate pixel by pixel
    # Iterate through row last
    for i in range(0, np.shape(y)[0]):
        # Iterate through column first
        for j in range(0, np.shape(x)[1]):
            mat_coords = np.asarray([[x[i, j]-xo],
                                     [y[i, j]-yo]])  # y is offset by yo, not xo
            G = (amplitude * np.exp(-0.5*np.matmul(np.matmul(mat_coords.T,
                                                             mat_cov_inv),
                                                   mat_coords)) + offset)
            G_array.append(G)
    G_array = np.asarray(G_array)
    G_array = G_array.reshape(np.shape(x))  # back to the grid shape, e.g. (64, 64)
    return G_array.ravel()
coords = np.meshgrid(np.arange(0, 64), np.arange(0, 64))
model_1 = Gaussian2D_v1(coords,
                        amplitude=20,
                        xo=32,
                        yo=32,
                        sigma_x=6,
                        sigma_y=3,
                        rho=0.8,
                        offset=20).reshape(64, 64)
plt.figure(figsize=(5, 5)).add_axes([0, 0, 1, 1])
plt.contourf(model_1)
The code as it is works, but as you can see, I am currently iterating through the mesh grid one point at a time, and appending each point to a list, which is then converted to an array and re-shaped to give the 2D Gaussian distribution.
How can I modify the script to forgo using a nested "for" loop and have the program consider the whole meshgrid for matrix calculations? Is such a method possible?
Thanks!
Of course there is a solution: NumPy is all about array operations and vectorization! np.matmul accepts arguments with more than two dimensions; it performs the matrix multiplication on the last two axes only and broadcasts that calculation over the other axes. However, getting the axes in the right order can be tricky.
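A small illustration of that batching behaviour on arbitrary random data (the stack of five 2×3 matrices is just an example): np.matmul multiplies the last two axes as matrices and broadcasts over the leading axis, so it matches multiplying each matrix of the stack separately.

```python
import numpy as np

rng = np.random.default_rng(0)
stack = rng.random((5, 2, 3))   # five 2x3 matrices
m = rng.random((3, 4))          # one 3x4 matrix, broadcast over the stack

out = np.matmul(stack, m)       # shape (5, 2, 4)
# Same as multiplying each matrix of the stack on its own
expected = np.array([s @ m for s in stack])
print(out.shape, np.allclose(out, expected))
```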
Here is your edited code:
import numpy as np
import matplotlib.pyplot as plt
def Gaussian2D_v1(coords,  # x and y coordinates for each image.
                  amplitude=1,  # Highest intensity in image.
                  xo=0,  # x-coordinate of peak centre.
                  yo=0,  # y-coordinate of peak centre.
                  sigma_x=1,  # Standard deviation in x.
                  sigma_y=1,  # Standard deviation in y.
                  rho=0,  # Correlation coefficient.
                  offset=0):  # Offset from zero (background radiation).
    x, y = coords
    xo = float(xo)
    yo = float(yo)
    # Create covariance matrix
    mat_cov = [[sigma_x**2, rho * sigma_x * sigma_y],
               [rho * sigma_x * sigma_y, sigma_y**2]]
    mat_cov = np.asarray(mat_cov)
    # Find its inverse
    mat_cov_inv = np.linalg.inv(mat_cov)
    # PB We stack the coordinates along the last axis
    mat_coords = np.stack((x - xo, y - yo), axis=-1)
    G = amplitude * np.exp(-0.5*np.matmul(np.matmul(mat_coords[:, :, np.newaxis, :],
                                                    mat_cov_inv),
                                          mat_coords[..., np.newaxis])) + offset
    return G.squeeze()
coords = np.meshgrid(np.arange(0, 64), np.arange(0, 64))
model_1 = Gaussian2D_v1(coords,
amplitude=20,
xo=32,
yo=32,
sigma_x=6,
sigma_y=3,
rho=0.8,
offset=20)
plt.figure(figsize=(5, 5)).add_axes([0, 0, 1, 1])
plt.contourf(model_1)
So, the equation is exp(-0.5 * (X - µ)' Cinv (X - µ)), where X is the coordinate vector (x, y) at each grid point, µ the mean (xo, yo), Cinv the inverse covariance matrix, and ' denotes the transpose. In the code, I stack both meshgrids into a new array so that mat_coords has shape (Ny, Nx, 2). In the first np.matmul call, I add a new axis so that the shapes go like: (Ny, Nx, 1, 2) * (2, 2) = (Ny, Nx, 1, 2). As you can see, the matrix multiplication is done on the last two axes, in parallel over the others. Then I add a new axis at the end of mat_coords so that: (Ny, Nx, 1, 2) * (Ny, Nx, 2, 1) = (Ny, Nx, 1, 1).
The np.squeeze() call then removes the two trailing singleton axes, leaving an (Ny, Nx) array.
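The same quadratic form can also be written with np.einsum, which names the summed axes explicitly and avoids the newaxis bookkeeping. This is a sketch of an alternative, not the answer's code; the function name `gaussian2d_einsum` and the small 12×16 test grid are illustrative choices. It is checked against a direct per-pixel evaluation that follows the question's loop.

```python
import numpy as np

def gaussian2d_einsum(coords, amplitude=1, xo=0, yo=0,
                      sigma_x=1, sigma_y=1, rho=0, offset=0):
    x, y = coords
    cov = np.array([[sigma_x**2, rho * sigma_x * sigma_y],
                    [rho * sigma_x * sigma_y, sigma_y**2]])
    cinv = np.linalg.inv(cov)
    d = np.stack((x - xo, y - yo), axis=-1)          # (Ny, Nx, 2)
    # q[a, b] = sum_ij d[a, b, i] * cinv[i, j] * d[a, b, j]
    q = np.einsum('...i,ij,...j->...', d, cinv, d)   # (Ny, Nx)
    return amplitude * np.exp(-0.5 * q) + offset

coords = np.meshgrid(np.arange(0, 16), np.arange(0, 12))   # arrays of shape (12, 16)
g = gaussian2d_einsum(coords, amplitude=20, xo=8, yo=5,
                      sigma_x=4, sigma_y=2, rho=0.5, offset=20)

# Per-pixel reference with the same parameters (cov = [[4**2, 0.5*4*2], [0.5*4*2, 2**2]])
x, y = coords
cinv = np.linalg.inv(np.array([[16.0, 4.0], [4.0, 4.0]]))
ref = np.empty(x.shape)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        v = np.array([x[i, j] - 8, y[i, j] - 5])
        ref[i, j] = 20 * np.exp(-0.5 * (v @ cinv @ v)) + 20

print(g.shape, np.allclose(g, ref))
```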

How to convert a tuple into a 2D matrix

I have a tuple a with shape (3,1), and I would like to construct a 2D matrix X with dimensions (3,2). Once X is constructed, I need to compute X'*X, which should have shape (2,2).
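import numpy as np
thistuple = (1, 2, 3)
arr = np.ones(shape=(len(thistuple), 2))  # start from an (n, 2) array of ones
tuple_index = 0
# Fill arr row by row with the tuple values; once the tuple is used up,
# the remaining entries keep their initial value of 1.
for i in range(0, len(arr)):
    for j in range(0, len(arr[0])):
        if tuple_index >= len(thistuple):
            break  # only leaves the inner loop
        arr[i, j] = thistuple[tuple_index]
        tuple_index += 1
rez = arr.T                  # X' with shape (2, n)
result = np.dot(rez, arr)    # X'X with shape (2, 2)
print(result)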
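The above code works for a tuple of any length n (shape n×1): it fills the (n, 2) array row by row with the tuple values, leaves the remaining entries as 1, and X'X always has shape (2, 2).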
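The image in the question is missing, so the intended layout of X is not known. If X is meant to be an ordinary least-squares design matrix, with a column of ones next to the tuple values (an assumption, not something the question states), np.column_stack builds it without loops, and X.T @ X gives the (2, 2) product directly.

```python
import numpy as np

thistuple = (1, 2, 3)
t = np.asarray(thistuple, dtype=float)

# Assumed layout: first column all ones, second column the tuple values
X = np.column_stack((np.ones_like(t), t))   # shape (3, 2)
XtX = X.T @ X                                # shape (2, 2)
print(X)
print(XtX)
```

Here XtX holds the count, the sum and the sum of squares of the tuple values, which is the usual normal-equations matrix for a straight-line fit.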

Despite many examples online, I cannot get my MATLAB repmat equivalent working in python

I am trying to do some numpy matrix math because I need to replicate the repmat function from MATLAB. I know there are a thousand examples online, but I cannot seem to get any of them working.
The following is the code I am trying to run:
def getDMap(image, mapSize):
    newSize = (float(mapSize[0]) / float(image.shape[1]), float(mapSize[1]) / float(image.shape[0]))
    sm = cv.resize(image, (0,0), fx=newSize[0], fy=newSize[1])
    for j in range(0, sm.shape[1]):
        for i in range(0, sm.shape[0]):
            dmap = sm[:,:,:]-np.array([np.tile(sm[j,i,:], (len(sm[0]), len(sm[1]))) for k in xrange(len(sm[2]))])
    return dmap
The function getDMap(image, mapSize) expects an OpenCV2 HSV image as its image argument, which is a NumPy array with 3 dimensions ([:,:,:]). It also expects a 2-element tuple as its mapSize argument; the calling code takes into account that in NumPy arrays the rows and columns are swapped (not x, y, but y, x).
newSize then holds a tuple of fractions used to resize the input image to a specific scale, and sm becomes the resized version of the input image. This all works fine.
This is my goal:
The following line
np.array([np.tile(sm[i,j,:], (len(sm[0]), len(sm[1]))) for k in xrange(len(sm[2]))])
should be equivalent to the MATLAB expression
repmat(sm(j,i,:),[size(sm,1) size(sm,2)])
This is my problem:
For the test, an OpenCV2 image with dimensions 800x479x3 is passed as the image argument, and the tuple (64, 48) is passed as the mapSize argument.
When I run it, I get the following ValueError:
dmap = sm[:,:,:]-np.array([np.tile(sm[i,j,:], (len(sm[0]), len(sm[1]))) for k in xrange(len(sm[2]))])
ValueError: operands could not be broadcast together with shapes (48,64,3) (64,64,192)
So it seems that the array dimensions do not match and NumPy has a problem with that. But what exactly is wrong, and how do I get this working?
These two calculations (Octave, then NumPy) match:
octave:26> sm=reshape(1:12,2,2,3)
octave:27> x=repmat(sm(1,2,:),[size(sm,1) size(sm,2)])
octave:28> x(:,:,2)
7 7
7 7
In [45]: sm=np.arange(1,13).reshape(2,2,3,order='F')
In [46]: x=np.tile(sm[0,1,:],[sm.shape[0],sm.shape[1],1])
In [47]: x[:,:,1]
Out[47]:
array([[7, 7],
[7, 7]])
This runs:
sm[:,:,:]-np.array([np.tile(sm[0,1,:], (2,2,1)) for k in range(3)])
But it produces a (3,2,2,3) array, with replication on the 1st dimension. I don't think you want that k loop.
What's the intent of this loop?
for i in ...:
    for j in ...:
        dmap = ...
You'll only get results from the last iteration. Did you want dmap += ...? If so, this might work (for an sm of shape (N,M,L)):
np.sum(np.array([sm-np.tile(sm[i,j,:], (N,M,1)) for i in range(N) for j in range(M)]),axis=0)
or equivalently:
z = np.array([np.tile(sm[i,j,:], (N,M,1)) for i in range(N) for j in range(M)])
np.sum(sm - z, axis=0) # let numpy broadcast sm
Actually I don't even need the tile. Let broadcasting do the work:
np.sum(np.array([sm-sm[i,j,:] for i in range(N) for j in range(M)]),axis=0)
I can get rid of the loops with repeat.
sm1 = sm.reshape(N*M,L) # combine 1st 2 dim to simplify repeat
z1 = np.repeat(sm1, N*M, axis=0).reshape(N*M,N*M,L)
x1 = np.sum(sm1 - z1, axis=0).reshape(N,M,L)
I can also apply broadcasting to the last case:
x4 = np.sum(sm1-sm1[:,None,:], 0).reshape(N,M,L)
# = np.sum(sm1[None,:,:]-sm1[:,None,:], 0).reshape(N,M,L)
With sm I have to expand (and sum) 2 dimensions:
x5 = np.sum(np.sum(sm[None,:,None,:,:]-sm[:,None,:,None,:],0),1)
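Since every variant sums sm[p, q, :] - sm[i, j, :] over all (i, j), the result can also be written as N*M*sm minus the per-channel total of sm. A small sketch checking that closed form (derived here, not part of the original answer) against the loop and broadcasting versions above, using Python 3 range and an arbitrary small test array:

```python
import numpy as np

N, M, L = 3, 4, 2                       # arbitrary small shape for the check
sm = np.arange(N * M * L, dtype=float).reshape(N, M, L) ** 1.5

# Loop version from the answer (range instead of Python 2 xrange)
x_loop = np.sum(np.array([sm - sm[i, j, :] for i in range(N) for j in range(M)]), axis=0)

sm1 = sm.reshape(N * M, L)
x4 = np.sum(sm1 - sm1[:, None, :], 0).reshape(N, M, L)
x5 = np.sum(np.sum(sm[None, :, None, :, :] - sm[:, None, :, None, :], 0), 1)

# Closed form: N*M copies of each pixel minus the sum over all pixels, per channel
x_closed = N * M * sm - sm.sum(axis=(0, 1))

print(np.allclose(x_loop, x4), np.allclose(x_loop, x5), np.allclose(x_loop, x_closed))
```

The closed form also avoids the (N*M, N*M, L) intermediate that x4 and x5 allocate, which for a 48×64×3 image is about 28 million elements.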
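len(sm[0]) and len(sm[1]) are not the sizes of the first and second dimensions of sm. sm[0] and sm[1] are the first and second 2D slices (rows) of sm, so both calls return the same value, sm.shape[1]. With sm of shape (48, 64, 3), every len(sm[k]) is 64, so each np.tile call produces a (64, 192) array and the list comprehension stacks 64 of them into (64, 64, 192), which is exactly the shape in the error. You probably want to replace them with sm.shape[0] and sm.shape[1], which correspond to size(sm,1) and size(sm,2) in your MATLAB code, although I am not sure it will work as you expect.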
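A quick illustration of that point with a zero array of the resized shape from the question, (48, 64, 3); only the shapes matter, and the pixel indices 5 and 7 are arbitrary. The last call follows the Octave/NumPy comparison earlier in the thread to reproduce the shape of repmat(sm(j,i,:), [size(sm,1) size(sm,2)]).

```python
import numpy as np

sm = np.zeros((48, 64, 3))     # same shape as the resized image in the question

print(len(sm[0]), len(sm[1]), len(sm[2]))   # all equal sm.shape[1]
print(sm.shape[0], sm.shape[1])             # the sizes the MATLAB code uses

# The question's expression: tile a length-3 vector into (64, 192), 64 times
bad = np.array([np.tile(sm[5, 7, :], (len(sm[0]), len(sm[1]))) for k in range(len(sm[2]))])
print(bad.shape)                             # cannot broadcast against sm

# repmat-like replication: copy the pixel over rows and columns, keep channels
good = np.tile(sm[5, 7, :], (sm.shape[0], sm.shape[1], 1))
print(good.shape, (sm - good).shape)
```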

MATLAB: creation of 3D array, vectorizing vs. looping

I have searched for an answer for my question on here but cannot find one, so I apologize in advance if it already exists!
What I am trying to do is create a 3D array of 3D points in space (x,y,z). I know that for a 1D vector you can specify the interval, like 1:5:20, to get a vector that starts at 1 and is spaced by 5 up to 20 (1, 6, 11, 16). What I would like is to create a 3D array, most likely row by row since that would be most efficient, where the spacing is given by a unit vector (ix, iy, iz). So, for example,
a(1,1,:) = [1, 1, 1]
uv = [0.5 0.5 0.5]
a(2,2,:) = [1.5, 1.5, 1.5]
etc. I know these numbers are not 'unit vectors', but the idea is there. Is there something along the lines of a = [1, 1, 1] : uv : [end, end, end]?
You might be interested in a mesh grid.
An example:
[X,Y,Z] = meshgrid(1:0.1:2, 1:0.1:2, 1:0.1:2); %# they can be different
points = [X(:) Y(:) Z(:)];
plot3(points(:,1),points(:,2),points(:,3),'.')
box on, axis equal
xlabel x, ylabel y, zlabel z
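For comparison with the Python threads above, a rough NumPy counterpart of that MATLAB snippet (a sketch covering only the grid and the N×3 points matrix, no plotting). np.linspace is used instead of np.arange(1, 2.05, 0.1) to avoid floating-point end-point surprises, and ravel(order='F') flattens column-major like MATLAB's X(:), so the rows come out in the same order as in the MATLAB points matrix.

```python
import numpy as np

v = np.linspace(1, 2, 11)                  # 1:0.1:2 in MATLAB
X, Y, Z = np.meshgrid(v, v, v)             # default 'xy' indexing, like MATLAB meshgrid
points = np.column_stack((X.ravel(order='F'),
                          Y.ravel(order='F'),
                          Z.ravel(order='F')))   # one (x, y, z) point per row
print(points.shape)
print(points[:3])
```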
