Symmetrical matrix with numpy - loops

from random import *
N = 100
gamma = 0.7
connect = zeros((N,N))
for i in range(N):
for j in range(i+1):
if random() < gamma:
connect[i,j] = 1
connect[j,i] = 1
else:
connect[i,j] = 0
connect[j,i] = 0
What I try to do is to create a symmetrical matrix, filled with zeros and ones (ones with a probability of 0.7).
Here is the double for loop, very inefficient...I shall make something with numpy, which I believe could speed up thing a great deal?
Does anyone know how to proceed?
Thank you very much!

You could use the numpy random module to generate random vectors, and use those vectors to seed the matrix. For example:
import numpy as np
N = 100
gamma = 0.7
connect = np.zeros((N,N),dtype=np.int32)
for i in range(0,N):
dval = np.diag((np.random.random_sample(size=(N-i))<gamma).astype(np.int32),i)
connect += dval
if (i>0):
connect += dval.T
does this diagonally using numpy.diag, but you could do it row-wise to assemble the upper or lower triangular portion, then use addition to form the symmetrical matrix. I don't have a feeling for which might be faster.
EDIT:
In fact this row wise version is about 5 times faster than the diagonal version, which I guess shouldn't be all that surprising given the memory access patterns it uses compared to diagonal assembly.
N = 100
gamma = 0.7
connect = np.zeros((N,N),dtype=np.int32)
for i in range(0,N):
rval = (np.random.random_sample(size=(N-i))<gamma).astype(np.int32)
connect[i,i:] = rval
connect += np.triu(connect,1).T
EDIT 2
This is even simpler and about 4 times faster than the row-wise version above. Here a triangular matrix is formed directly from a full matrix of weights, then added to its transpose to produce the symmetric matrix:
N = 100
gamma = 0.7
a=np.triu((np.random.random_sample(size=(N,N))<gamma).astype(np.int32))
connect = a + np.triu(a,1).T
On the Linux system I tested it on, version 1 takes about 6.5 milliseconds, version 2 takes about 1.5 milliseconds, version 3 takes about 450 microseconds.

Related

Take numbers form two intervals in concentric spheres in Julia

I am trying to take numbers from two intervals in Julia. The problem is the following,
I am trying to create concentric spheres and I need to generate vectors of dimension equal to 15 filled with numbers taken from each circle. The code is:
rmax = 5
ra = fill(0.0,1,rmax)
for i=1:rmax-1
ra[:,i].=rad/i
ra[:,rmax].= 0
end
for i=1:3
ptset = Any[]
for j=1:200
yt= 0
yt= rand(Truncated(Normal(0, 1), -ra[i], ra[i] ))
if -ra[(i+1)] < yt <= -ra[i] || ra[(i+1)] <= yt < ra[i]
push!(ptset,yt)
if length(ptset) == 15
break
end
end
end
end
Here, I am trying to generate spheres with uniform random numbers inside of each one; In this case, yt is only part of the construction of the numbers inside the sphere.
I would like to generate points in a sphere with radius r0 (ra[:,4] for this case), then points distributed from the edge of the first sphere to the second one wit radius r1 (here ra[:,3]) and so on.
In order to do that, I try to take elements that fulfill one of the two conditions -ra[(i+1)] < yt <= -ra[i]
or ra[(i+1)] <= yt < ra[i], i.e. I would like to generate a vector with positive and negative numbers. I used the operator || but it seems to take only the positive part. I am new in Julia and I am not sure how to take the elements from both parts of the interval. Does anyone has a hit on how to do it?. Thanks in advance
I hope I understood you correctly. First, we need to be able to sample uniformly from an N-dimensional shell with radii r0 and r1:
using Random
using LinearAlgebra: normalize
struct Shell{N}
r0::Float64
r1::Float64
end
Base.eltype(::Type{<:Shell}) = Vector{Float64}
function Random.rand(rng::Random.AbstractRNG, d::Random.SamplerTrivial{Shell{N}}) where {N}
shell = d[]
Δ = shell.r1 - shell.r0
θ = normalize(randn(N)) # uniformly distributed N-dimensional direction of length 1
r = shell.r0 .* θ # scale to a point on the interior of the shell
return r .+ Δ .* θ .* .√rand(N) # add a uniformly random segment between r0 and r1
end
(See here for more info about hooking into Random. You could equally implement a new Distribution, but that's not really necessary.)
Most importantly, a truncated normal will not result in a uniform distribution, but neither will adding a uniform scaling into the right direction: see here for why the square root is necessary (and I hope I got it right; you should check the math once more).
Then we can just create a sequence of shell samples with nested radii:
rmax = 5
rad = 10.0
ra = range(0, rad, length=rmax)
ptset = [rand(Shell{2}(ra[i], ra[i+1]), 15) for i = 1:(rmax - 1)]
(This part I wasn't really sure about, but the point should be clear.)

Compute the product of the next n elements in array

I would like to compute the product of the next n adjacent elements of a matrix. The number n of elements to be multiplied should be given in function's input.
For example for this input I should compute the product of every 3 consecutive elements, starting from the first.
[p, ind] = max_product([1 2 2 1 3 1],3);
This gives [1*2*2, 2*2*1, 2*1*3, 1*3*1] = [4,4,6,3].
Is there any practical way to do it? Now I do this using:
for ii = 1:(length(v)-2)
p = prod(v(ii:ii+n-1));
end
where v is the input vector and n is the number of elements to be multiplied.
in this example n=3 but can take any positive integer value.
Depending whether n is odd or even or length(v) is odd or even, I get sometimes right answers but sometimes an error.
For example for arguments:
v = [1.35912281237829 -0.958120385352704 -0.553335935098461 1.44601450110386 1.43760259196739 0.0266423803393867 0.417039432979809 1.14033971399183 -0.418125096873537 -1.99362640306847 -0.589833539347417 -0.218969651537063 1.49863539349242 0.338844452879616 1.34169199365703 0.181185490389383 0.102817336496793 0.104835620599133 -2.70026800170358 1.46129128974515 0.64413523430416 0.921962619821458 0.568712984110933]
n = 7
I get the error:
Index exceeds matrix dimensions.
Error in max_product (line 6)
p = prod(v(ii:ii+n-1));
Is there any correct general way to do it?
Based on the solution in Fast numpy rolling_product, I'd like to suggest a MATLAB version of it, which leverages the movsum function introduced in R2016a.
The mathematical reasoning is that a product of numbers is equal to the exponent of the sum of their logarithms:
A possible MATLAB implementation of the above may look like this:
function P = movprod(vec,window_sz)
P = exp(movsum(log(vec),[0 window_sz-1],'Endpoints','discard'));
if isreal(vec) % Ensures correct outputs when the input contains negative and/or
P = real(P); % complex entries.
end
end
Several notes:
I haven't benchmarked this solution, and do not know how it compares in terms of performance to the other suggestions.
It should work correctly with vectors containing zero and/or negative and/or complex elements.
It can be easily expanded to accept a dimension to operate along (for array inputs), and any other customization afforded by movsum.
The 1st input is assumed to be either a double or a complex double row vector.
Outputs may require rounding.
Update
Inspired by the nicely thought answer of Dev-iL comes this handy solution, which does not require Matlab R2016a or above:
out = real( exp(conv(log(a),ones(1,n),'valid')) )
The basic idea is to transform the multiplication to a sum and a moving average can be used, which in turn can be realised by convolution.
Old answers
This is one way using gallery to get a circulant matrix and indexing the relevant part of the resulting matrix before multiplying the elements:
a = [1 2 2 1 3 1]
n = 3
%// circulant matrix
tmp = gallery('circul', a(:))
%// product of relevant parts of matrix
out = prod(tmp(end-n+1:-1:1, end-n+1:end), 2)
out =
4
4
6
3
More memory efficient alternative in case there are no zeros in the input:
a = [10 9 8 7 6 5 4 3 2 1]
n = 2
%// cumulative product
x = [1 cumprod(a)]
%// shifted by n and divided by itself
y = circshift( x,[0 -n] )./x
%// remove last elements
out = y(1:end-n)
out =
90 72 56 42 30 20 12 6 2
Your approach is correct. You should just change the for loop to for ii = 1:(length(v)-n+1) and then it will work fine.
If you are not going to deal with large inputs, another approach is using gallery as explained in #thewaywewalk's answer.
I think the problem may be based on your indexing. The line that states for ii = 1:(length(v)-2) does not provide the correct range of ii.
Try this:
function out = max_product(in,size)
size = size-1; % this is because we add size to i later
out = zeros(length(in),1) % assuming that this is a column vector
for i = 1:length(in)-size
out(i) = prod(in(i:i+size));
end
Your code works when restated like so:
for ii = 1:(length(v)-(n-1))
p = prod(v(ii:ii+(n-1)));
end
That should take care of the indexing problem.
using bsxfun you create a matrix each row of it contains consecutive 3 elements then take prod of 2nd dimension of the matrix. I think this is most efficient way:
max_product = #(v, n) prod(v(bsxfun(#plus, (1 : n), (0 : numel(v)-n)')), 2);
p = max_product([1 2 2 1 3 1],3)
Update:
some other solutions updated, and some such as #Dev-iL 's answer outperform others, I can suggest fftconv that in Octave outperforms conv
If you can upgrade to R2017a, you can use the new movprod function to compute a windowed product.

How to improve the execution time of this function?

Suppose that f(x,y) is a bivariate function as follows:
function [ f ] = f(x,y)
UN=(g)1.6*(1-acos(g)/pi)-0.8;
f= 1+UN(cos(0.5*pi*x+y));
end
How to improve execution time for function F(N) with the following code:
function [VAL] = F(N)
x=0:4/N:4;
y=0:2*pi/1000:2*pi;
VAL=zeros(N+1,3);
for i = 1:N+1
val = zeros(1,N+1);
for j = 1:N+1
val(j) = trapz(y,f(0,y).*f(x(i),y).*f(x(j),y))/2/pi;
end
val = fftshift(fft(val))/N;
l = (length(val)+1)/2;
VAL(i,:)= val(l-1:l+1);
end
VAL = fftshift(fft(VAL,[],1),1)/N;
L = (size(VAL,1)+1)/2;
VAL = VAL(L-1:L+1,:);
end
Note that N=2^p where p>10, so please consider the memory limitations while optimizing the code using ndgrid, arrayfun, etc.
FYI: The code intends to find the central 3-by-3 submatrix of the fftn of
fun=#(a,b) trapz(y,f(0,y).*f(a,y).*f(b,y))/2/pi;
where a,b are in [0,4]. The key idea is that we can save memory using the code above specially when N is very large. But the execution time is still an issue because of nested loops. See the figure below for N=2^2:
This is not a full answer, but some possibly helpful hints:
0) The trivial: Are you sure you need numerics? Can't you do the computation analytically?
1) Do not use function handles:
function [ f ] = f(x,y)
f= 1+1.6*(1-acos(cos(0.5*pi*x+y))/pi)-0.8
end
2) Simplify analytically: acos(cos(x)) is the same as abs(mod(x + pi, 2 * pi) - pi), which should compute slightly faster. Or, instead of sampling and then numerically integrating, first integrate analytically and sample the result.
3) The FFT is a very efficient algorithm to compute the full DFT, but you don't need the full DFT. Since you only want the central 3 x 3 coefficients, it might be more efficient to directly apply the DFT definition and evaluate the formula only for those coefficients that you want. That should be both fast and memory-efficient.
4) If you repeatedly do this computation, it might be helpful to precompute DFT coefficients. Here, dftmtx from the Signal Processing toolbox can assist.
5) To get rid of the loops, think about the problem not in the form of computation instructions, but a single matrix operation. If you consider your input N x N matrix as a vector with N² elements, and your output 3 x 3 matrix as a 9-element vector, then the whole operation you apply (numerical integration via trapz and DFT via fft) appears to be a simple linear transform, which it should be possible to express as an N² x 9 matrix.

Matlab random sample of a dataset

I have a dataset (Data) which is a vector of, let's say, 1000 real numbers. I would like to extract at random from Data 100 times 10 contiguous numbers. I don't know how to use Datasample for that purpose.
Thanks in advance for you help.
You can just pick 100 random numbers between 1 and 991:
I = randi(991, 100, 1)
Then use them as the starting points to index 10 contiguous elements:
cell2mat(arrayfun(#(x)(Data(x:x+9)), I, 'uni', false))
Here you have a snipet, but instead of using Datasample, I used randi to generate random indexes.
n_times = 100;
l_data = length(Data);
index_random = randi(l_data-9,n_times,1); % '- 9' to not to surpass the vector limit when you read the 10 items
for ind1 = 1:n_times
random_number(ind1,:) = Data(index_random(ind1):index_random(ind1)+9)
end
This is similar to Dan's answer, but avoids using cells and arrayfun, so it may be faster.
Let Ns denote the number of contiguous numbers you want (10 in your example), and Nt the number of times (100 in your example). Then:
result = Data(bsxfun(#plus, randi(numel(Data)-Ns+1, Nt, 1), 0:Ns-1)); %// Nt x Ns
Here is another solution, close to #Luis, but with cumsum instead of bsxfun:
A = rand(1,1000); % The vector to sample
sz = size(A,2);
N = 100; % no. of samples
B = 10; % size of one sample
first = randi(sz-B+1,N,1); % the starting point for all blocks
rand_blocks = A(cumsum([first ones(N,B-1)],2)); % the result
This results in an N-by-B matrix (rand_blocks), each row of it is one sample. Of course, this could be one-lined, but it won't make it faster, and I want to keep it clear. For small N or B this method is slightly faster. If N or B becomes very large then the bsxfun method is slightly faster. This ranking is not affected by the size of A.

Optimize parameters of a pairwise distance function in Matlab

This question is related to matlab: find the index of common values at the same entry from two arrays.
Suppose that I have an 1000 by 10000 matrix that contains value 0,1,and 2. Each row are treated as a sample. I want to calculate the pairwise distance between those samples according to the formula d = 1-1/(2p)sum(a/c+b/d) where a,b,c,d can treated as as the row vector of length 10000 according to some definition and p=10000. c and d are probabilities such that c+d=1.
An example of how to find the values of a,b,c,d: suppose we want to find d between sample i and bj, then I look at row i and j.
If kth entry of row i and j has value 2 and 2, then a=2,b=0,c=1,d=0 (I guess I will assign 0/0=0 in this case).
If kth entry of row i and j has value 2 and 1 or vice versa, then a=1,b=0,c=3/4,d=1/4.
The similar assignment will give to the case for 2,0(a=0,b=0,c=1/2,d=1/2),1,1(a=1,b=1,c=1/2,d=1/2),1,0(a=0,b=1,c=1/4,d=3/4),0,0(a=0,b=2,c=0,d=1).
The matlab code I have so far is using for loops for i and j, then find the cases above by using find, then create two arrays for a/c and b/d. This is extremely slow, is there a way that I can improve the efficiency?
Edit: the distance d is the formula given in this paper on page 13.
Provided those coefficients are fixed, then I think I've successfully vectorised the distance function. Figuring out the formulae was fun. I flipped things around a bit to minimise division, and since I wasn't aware of pdist until #horchler's comment, you get it wrapped in loops with the constants factored out:
% m is the data
[n p] = size(m, 1);
distance = zeros(n);
for ii=1:n
for jj=ii+1:n
a = min(m(ii,:), m(jj,:));
b = 2 - max(m(ii,:), m(jj,:));
c = 4 ./ (m(ii,:) + m(jj,:));
c(c == Inf) = 0;
d = 1 - c;
distance(ii,jj) = sum(a.*c + b.*d);
% distance(jj,ii) = distance(ii,jj); % optional for the full matrix
end
end
distance = 1 - (1 / (2 * p)) * distance;

Resources