I'm new to python and I am trying to work through some exercises introducing numpy. I have got stuck on this question:
Create a function that takes m, n ∈ ℕ as input and generates an m×n matrix (numpy.array) A with entries a[i,j] = j*m + i, where 0 ≤ i ≤ m-1 and 0 ≤ j ≤ n-1.
I have found a way of doing this more or less without numpy but any help on this would be appreciated.
I think the best way would be to create the entries with a list comprehension first and then convert it into a numpy array. Note that a[i,j] = j*m + i fills the matrix down the columns, so the reshape needs column-major (Fortran) order:
# m, n = rows, cols
np.array([j*m + i for j in range(n) for i in range(m)]).reshape((m, n), order='F')
However, the flattened values are simply the sequence 0, 1, ..., m*n - 1, so it can be done more easily as
np.array(range(m*n)).reshape((m, n), order='F')
However, the numpy library also has a built-in function for exactly this, arange:
np.arange(m*n).reshape((m, n), order='F')
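Putting it together as the requested function (a minimal sketch; the name make_matrix is just my choice):
import numpy as np

def make_matrix(m, n):
    # column-major fill, so A[i, j] == j*m + i for 0 <= i <= m-1 and 0 <= j <= n-1
    return np.arange(m * n).reshape((m, n), order='F')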
Hope it helps.
New to julia, so this is probably very easy.
I have an n-by-m array and a vector of length n and want to repeat each row of the array the number of times in the corresponding element of the vector. For example:
mat = rand(3,6)
v = vec([2 3 1])
The result should be a 6-by-6 array. I tried the repeat function but
repeat(mat, inner = v)
yields a 6×18×1 Array{Float64,3} instead, because repeat takes v to be the per-dimension repetition counts. In MATLAB I would use repelem(mat, v, 1) and I hope Julia offers something similar. My actual matrix is a lot bigger and I will have to call the function many times, so this operation needs to be as fast as possible.
Adding something like this to Julia Base has been discussed, but as far as I know it is not implemented yet. You can achieve what you want using the inverse_rle function from StatsBase.jl:
julia> using StatsBase

julia> row_idx = inverse_rle(axes(v, 1), v)
6-element Array{Int64,1}:
1
1
2
2
2
3
and now you can write:
mat[row_idx, :]
or
@view mat[row_idx, :]
(the second option creates a view, which might matter since you say your mat is large and you need to do such indexing many times; which option is faster will depend on your exact use case).
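If you would rather avoid the extra dependency, the same row-index vector can be built with Base alone (a small sketch of the same idea, not from the original answer):
row_idx = reduce(vcat, [fill(i, v[i]) for i in axes(v, 1)])
expanded = mat[row_idx, :]    # 6×6 for the example above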
To use vcat(a,b) one must match the number of columns, and to use hcat(a,b) the number of rows, of the matrices a and b.
When constructing a matrix using vcat(a, b) or hcat(a, b) in a loop, one needs an initial matrix a (like a starting statement). Although all the sub-matrices are created in the same manner, I might need to construct this initial matrix a outside of the loop.
For example, if the loop condition is for i in 1:w, then I would need to pre-create a using i = 1, then start the loop with for i in 2:w.
If there is a nested loop, then my method is very awkward. I have thought of the following methods, but it seems they don't really work:
Use a dummy a, and delete a after the loop. From this question, we cannot delete a row of a matrix. If we use another variable to refer to the useful rows and columns, we might waste some memory allocation.
Use reshape() to make an empty dummy a. It works for 1 dimension, but not multiple dimensions.
julia> a = reshape([], 2, 0)
2×0 Array{Any,2}
julia> b = hcat(a, [3, 3])
2×1 Array{Any,2}:
3
3
julia> a = reshape([], 2, 2)
ERROR: DimensionMismatch("new dimensions (2,2) must be consistent with array size 0")
in reshape(::Array{Any,1}, ::Tuple{Int64,Int64}) at ./array.jl:113
in reshape(::Array{Any,1}, ::Int64, ::Int64, ::Vararg{Int64,N}) at ./reshapedarray.jl:39
So my question is: how do I work with vcat() and hcat() in a loop?
Edit:
Here is the problem I got stuck in:
There are many gray pixel images. Each one is represented as a 20 by 20 Float64 array. One function foo(n) randomly picks n of those matrices and combines them into one big square.
If n has an integer square root, then foo(n) returns a sqrt(n) * 20 by sqrt(n) * 20 matrix.
If n does not have an integer square root, then foo(n) returns a ceil(sqrt(n)) * 20 by ceil(sqrt(n)) * 20 matrix. On the last row of the big square image (a row of 20 by 20 matrices), foo(n) fills in ceil(sqrt(n)) ^ 2 - n extra black images (each one represented as zeros(20,20)).
My current algorithm for foo(n) uses a nested loop. In the inner loop, hcat() builds a layer (consisting of ceil(sqrt(n)) images). In the outer loop, vcat() combines those layers.
Then dealing with hcat() and vcat() in a loop becomes complicated.
So would:
pickimage() = randn(20,20)
n = 16
m = ceil(Int, sqrt(n))
out = Matrix{Float64}(undef, 20m, 20m)
k = 0
for i in 0:m-1
for j in 0:m-1
out[20i .+ (1:20), 20j .+ (1:20)] .= ((k += 1) <= n) ? pickimage() : zeros(20, 20)
end
end
be a relevant solution?
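For completeness, another common pattern (a sketch I am adding, not part of the original answer) is to collect the 20×20 blocks first and concatenate once at the end, which removes the need for an initial matrix inside the loop:
pickimage() = randn(20, 20)
n = 16
m = ceil(Int, sqrt(n))
blocks = [(k <= n ? pickimage() : zeros(20, 20)) for k in 1:m^2]   # pad with black tiles
rows   = [reduce(hcat, blocks[(r-1)*m+1 : r*m]) for r in 1:m]      # each tile-row is 20 × 20m
out    = reduce(vcat, rows)                                        # 20m × 20m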
Suppose that f(x,y) is a bivariate function as follows:
function [ f ] = f(x,y)
UN=@(g)1.6*(1-acos(g)/pi)-0.8;
f= 1+UN(cos(0.5*pi*x+y));
end
How can I improve the execution time of the function F(N) in the following code?
function [VAL] = F(N)
x=0:4/N:4;
y=0:2*pi/1000:2*pi;
VAL=zeros(N+1,3);
for i = 1:N+1
val = zeros(1,N+1);
for j = 1:N+1
val(j) = trapz(y,f(0,y).*f(x(i),y).*f(x(j),y))/2/pi;
end
val = fftshift(fft(val))/N;
l = (length(val)+1)/2;
VAL(i,:)= val(l-1:l+1);
end
VAL = fftshift(fft(VAL,[],1),1)/N;
L = (size(VAL,1)+1)/2;
VAL = VAL(L-1:L+1,:);
end
Note that N=2^p where p>10, so please consider the memory limitations while optimizing the code using ndgrid, arrayfun, etc.
FYI: The code intends to find the central 3-by-3 submatrix of the fftn of
fun=@(a,b) trapz(y,f(0,y).*f(a,y).*f(b,y))/2/pi;
where a,b are in [0,4]. The key idea is that we can save memory using the code above, especially when N is very large. But the execution time is still an issue because of the nested loops. See the figure below for N=2^2:
This is not a full answer, but some possibly helpful hints:
0) The trivial: Are you sure you need numerics? Can't you do the computation analytically?
1) Do not use function handles:
function [ f ] = f(x,y)
f = 1+1.6*(1-acos(cos(0.5*pi*x+y))/pi)-0.8;
end
2) Simplify analytically: acos(cos(x)) is the same as abs(mod(x + pi, 2 * pi) - pi), which should compute slightly faster. Or, instead of sampling and then numerically integrating, first integrate analytically and sample the result.
3) The FFT is a very efficient algorithm to compute the full DFT, but you don't need the full DFT. Since you only want the central 3 x 3 coefficients, it might be more efficient to directly apply the DFT definition and evaluate the formula only for those coefficients that you want. That should be both fast and memory-efficient; a rough sketch follows after this list.
4) If you repeatedly do this computation, it might be helpful to precompute DFT coefficients. Here, dftmtx from the Signal Processing toolbox can assist.
5) To get rid of the loops, think about the problem not in the form of computation instructions, but as a single matrix operation. If you consider your input N x N matrix as a vector with N² elements, and your output 3 x 3 matrix as a 9-element vector, then the whole operation you apply (numerical integration via trapz and DFT via fft) appears to be a simple linear transform, which it should be possible to express as an N² x 9 matrix.
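To illustrate hint 3 with a hedged sketch (my own addition, using a generic length-M row vector v rather than the variables in the original code): the three centred DFT coefficients are just three dot products, so the full fft and fftshift are not needed:
M   = numel(v);              % v: length-M row vector, M odd
idx = 0:M-1;
ks  = -1:1;                  % the three central frequencies
coeffs = zeros(1, 3);
for t = 1:3
    coeffs(t) = sum(v .* exp(-2i*pi*ks(t)*idx/M));
end
% For odd M, coeffs matches the middle three entries of fftshift(fft(v)).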
This question is related to matlab: find the index of common values at the same entry from two arrays.
Suppose that I have a 1000 by 10000 matrix that contains the values 0, 1, and 2. Each row is treated as a sample. I want to calculate the pairwise distance between those samples according to the formula d = 1 - (1/(2p)) * sum(a/c + b/d), where a, b, c, d can be treated as row vectors of length 10000 according to some definition and p = 10000. c and d are probabilities such that c + d = 1.
An example of how to find the values of a, b, c, d: suppose we want to find d between samples i and j, then I look at rows i and j.
If the kth entries of rows i and j have values 2 and 2, then a=2, b=0, c=1, d=0 (I guess I will assign 0/0=0 in this case).
If the kth entries of rows i and j have values 2 and 1 or vice versa, then a=1, b=0, c=3/4, d=1/4.
Similar assignments apply to the cases 2,0 (a=0, b=0, c=1/2, d=1/2); 1,1 (a=1, b=1, c=1/2, d=1/2); 1,0 (a=0, b=1, c=1/4, d=3/4); and 0,0 (a=0, b=2, c=0, d=1).
The MATLAB code I have so far uses for loops over i and j, finds the cases above using find, and then creates two arrays for a/c and b/d. This is extremely slow; is there a way I can improve the efficiency?
Edit: the distance d is the formula given in this paper on page 13.
Provided those coefficients are fixed, I think I've successfully vectorised the distance function. Figuring out the formulae was fun. I flipped things around a bit to minimise division, and since I wasn't aware of pdist until @horchler's comment, you get it wrapped in loops with the constants factored out:
% m is the data
[n, p] = size(m);
distance = zeros(n);
for ii=1:n
for jj=ii+1:n
a = min(m(ii,:), m(jj,:));
b = 2 - max(m(ii,:), m(jj,:));
c = 4 ./ (m(ii,:) + m(jj,:));
c(c == Inf) = 0;
d = 1 - c;
distance(ii,jj) = sum(a.*c + b.*d);
% distance(jj,ii) = distance(ii,jj); % optional for the full matrix
end
end
distance = 1 - (1 / (2 * p)) * distance;
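If you do have the Statistics Toolbox, the same per-pair computation can be handed to pdist as a custom distance function. This is only a sketch under the assumption that the loop body above is exactly the per-pair computation you want; rowdist is a name I made up, and the implicit expansion of XI against the rows of XJ needs R2016b or later (use bsxfun on older releases):
function d2 = rowdist(XI, XJ)
% XI is a 1-by-p row, XJ is m2-by-p (the signature pdist expects for custom distances)
a = min(XI, XJ);
b = 2 - max(XI, XJ);
c = 4 ./ (XI + XJ);
c(isinf(c)) = 0;
d = 1 - c;
p = size(XI, 2);
d2 = 1 - (1/(2*p)) * sum(a.*c + b.*d, 2);   % m2-by-1 vector of distances
end
Then the full symmetric matrix is squareform(pdist(m, @rowdist)).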
I have a 1974x1 vector, Upper, and I am trying to break the information up into individual arrays of 36 items each. So, I used length to find that there are 1974 items and then divided by 36 and used the floor function. I cannot figure out how to do it all with n.
Here is my logic: I am defining n in an attempt to find the number of subsets that need to be defined. Then, I am trying to have subsetn become subset1, subset2, ..., subset36. However, MATLAB only defines the matrix subsetn as a 1x36 matrix. However, this matrix contains what subset1 is supposed to contain (1...36). Do you guys have any advice for a newbie? What am I doing wrong?
binSize = 36;
nData = length(Upper);
nBins = floor(nData/36);
nDiscarded = nData - binSize*nBins;
n=1:binSize;
subsetn= [(n-1)*binSize+1:n*binSize];
You can create a 36x54 array (binSize-by-nBins) where the nth column is your nth subset.
subsetArray = reshape(Upper(1:binSize*nBins), [], nBins);
You can access the nth subset as subsetArray(:,n)
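If you really want individually addressable subsets, the usual MATLAB idiom is a cell array rather than dynamically named variables like subset1, subset2, ... (a small sketch I'm adding):
subsets = num2cell(subsetArray, 1);   % 1-by-nBins cell array; subsets{k} is the kth subset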
Sorry in advance if I misunderstood what you want to do.
I think the following little trick might do what you want (it's hacky, but I'm no Matlab expert):
[a, b] = meshgrid(0:nBins-1, 0:binSize-1)
inds = a*binSize + b + 1
Now inds is a binSize-by-nBins matrix of indices. You can index Upper with it like
Upper(inds)
which should give you the subsets as the columns in the resulting matrix.
Edit: on seeing Yoda's answer, his is better ;)