Matlab random sample of a dataset

Matlab random sample of a dataset - arrays

I have a dataset (Data) which is a vector of, let's say, 1000 real numbers. I would like to extract at random from Data 100 times 10 contiguous numbers. I don't know how to use Datasample for that purpose.
Thanks in advance for you help.

You can just pick 100 random numbers between 1 and 991:
I = randi(991, 100, 1)
Then use them as the starting points to index 10 contiguous elements:
cell2mat(arrayfun(#(x)(Data(x:x+9)), I, 'uni', false))

Here you have a snipet, but instead of using Datasample, I used randi to generate random indexes.
n_times = 100;
l_data = length(Data);
index_random = randi(l_data-9,n_times,1); % '- 9' to not to surpass the vector limit when you read the 10 items
for ind1 = 1:n_times
random_number(ind1,:) = Data(index_random(ind1):index_random(ind1)+9)
end

This is similar to Dan's answer, but avoids using cells and arrayfun, so it may be faster.
Let Ns denote the number of contiguous numbers you want (10 in your example), and Nt the number of times (100 in your example). Then:
result = Data(bsxfun(#plus, randi(numel(Data)-Ns+1, Nt, 1), 0:Ns-1)); %// Nt x Ns

Here is another solution, close to #Luis, but with cumsum instead of bsxfun:
A = rand(1,1000); % The vector to sample
sz = size(A,2);
N = 100; % no. of samples
B = 10; % size of one sample
first = randi(sz-B+1,N,1); % the starting point for all blocks
rand_blocks = A(cumsum([first ones(N,B-1)],2)); % the result
This results in an N-by-B matrix (rand_blocks), each row of it is one sample. Of course, this could be one-lined, but it won't make it faster, and I want to keep it clear. For small N or B this method is slightly faster. If N or B becomes very large then the bsxfun method is slightly faster. This ranking is not affected by the size of A.

Related

Define a vector with random steps

I want to create an array that has incremental random steps, I've used this simple code.
t_inici=(0:10*rand:100);
The problem is that the random number keeps unchangable between steps. Is there any simple way to change the seed of the random number within each step?

If you have a set number of points, say nPts, then you could do the following
nPts = 10; % Could use 'randi' here for random number of points
lims = [0, 10] % Start and end points
x = rand(1, nPts); % Create random numbers
% Sort and scale x to fit your limits and be ordered
x = diff(lims) * ( sort(x) - min(x) ) / diff(minmax(x)) + lims(1)
This approach always includes your end point, which a 0:dx:10 approach would not necessarily.
If you had some maximum number of points, say nPtsMax, then you could do the following
nPtsMax = 1000; % Max number of points
lims = [0,10]; % Start and end points
% Could do 10* or any other multiplier as in your example in front of 'rand'
x = lims(1) + [0 cumsum(rand(1, nPtsMax))];
x(x > lims(2)) = []; % remove values above maximum limit
This approach may be slower, but is still fairly quick and better represents the behaviour in your question.

My first approach to this would be to generate N-2 samples, where N is the desired amount of samples randomly, sort them, and add the extrema:
N=50;
endpoint=100;
initpoint=0;
randsamples=sort(rand(1, N-2)*(endpoint-initpoint)+initpoint);
t_inici=[initpoint randsamples endpoint];
However not sure how "uniformly random" this is, as you are "faking" the last 2 data, to have the extrema included. This will somehow distort pure randomness (I think). If you are not necessarily interested on including the extrema, then just remove the last line and generate N points. That will make sure that they are indeed random (or as random as MATLAB can create them).

Here is an alternative solution with "uniformly random"
[initpoint,endpoint,coef]=deal(0,100,10);
t_inici(1)=initpoint;
while(t_inici(end)<endpoint)
t_inici(end+1)=t_inici(end)+rand()*coef;
end
t_inici(end)=[];
In my point of view, it fits your attempts well with unknown steps, start from 0, but not necessarily end at 100.

From your code it seems you want a uniformly random step that varies between each two entries. This implies that the number of entries that the vector will have is unknown in advance.
A way to do that is as follows. This is similar to Hunter Jiang's answer but adds entries in batches instead of one by one, in order to reduce the number of loop iterations.
Guess a number of required entries, n. Any value will do, but a large value will result in fewer iterations and will probably be more efficient.
Initiallize result to the first value.
Generate n entries and concatenate them to the (temporary) result.
See if the current entries are already too many.
If they are, cut as needed and output (final) result. Else go back to step 3.
Code:
lower_value = 0;
upper_value = 100;
step_scale = 10;
n = 5*(upper_value-lower_value)/step_scale*2; % STEP 1. The number 5 here is arbitrary.
% It's probably more efficient to err with too many than with too few
result = lower_value; % STEP 2
done = false;
while ~done
result = [result result(end)+cumsum(step_scale*rand(1,n))]; % STEP 3. Include
% n new entries
ind_final = find(result>upper_value,1)-1; % STEP 4. Index of first entry exceeding
% upper_value, if any
if ind_final % STEP 5. If non-empty, we're done
result = result(1:ind_final-1);
done = true;
end
end

Compute the product of the next n elements in array

I would like to compute the product of the next n adjacent elements of a matrix. The number n of elements to be multiplied should be given in function's input.
For example for this input I should compute the product of every 3 consecutive elements, starting from the first.
[p, ind] = max_product([1 2 2 1 3 1],3);
This gives [1*2*2, 2*2*1, 2*1*3, 1*3*1] = [4,4,6,3].
Is there any practical way to do it? Now I do this using:
for ii = 1:(length(v)-2)
p = prod(v(ii:ii+n-1));
end
where v is the input vector and n is the number of elements to be multiplied.
in this example n=3 but can take any positive integer value.
Depending whether n is odd or even or length(v) is odd or even, I get sometimes right answers but sometimes an error.
For example for arguments:
v = [1.35912281237829 -0.958120385352704 -0.553335935098461 1.44601450110386 1.43760259196739 0.0266423803393867 0.417039432979809 1.14033971399183 -0.418125096873537 -1.99362640306847 -0.589833539347417 -0.218969651537063 1.49863539349242 0.338844452879616 1.34169199365703 0.181185490389383 0.102817336496793 0.104835620599133 -2.70026800170358 1.46129128974515 0.64413523430416 0.921962619821458 0.568712984110933]
n = 7
I get the error:
Index exceeds matrix dimensions.
Error in max_product (line 6)
p = prod(v(ii:ii+n-1));
Is there any correct general way to do it?

Based on the solution in Fast numpy rolling_product, I'd like to suggest a MATLAB version of it, which leverages the movsum function introduced in R2016a.
The mathematical reasoning is that a product of numbers is equal to the exponent of the sum of their logarithms:
A possible MATLAB implementation of the above may look like this:
function P = movprod(vec,window_sz)
P = exp(movsum(log(vec),[0 window_sz-1],'Endpoints','discard'));
if isreal(vec) % Ensures correct outputs when the input contains negative and/or
P = real(P); % complex entries.
end
end
Several notes:
I haven't benchmarked this solution, and do not know how it compares in terms of performance to the other suggestions.
It should work correctly with vectors containing zero and/or negative and/or complex elements.
It can be easily expanded to accept a dimension to operate along (for array inputs), and any other customization afforded by movsum.
The 1st input is assumed to be either a double or a complex double row vector.
Outputs may require rounding.

Update
Inspired by the nicely thought answer of Dev-iL comes this handy solution, which does not require Matlab R2016a or above:
out = real( exp(conv(log(a),ones(1,n),'valid')) )
The basic idea is to transform the multiplication to a sum and a moving average can be used, which in turn can be realised by convolution.
Old answers
This is one way using gallery to get a circulant matrix and indexing the relevant part of the resulting matrix before multiplying the elements:
a = [1 2 2 1 3 1]
n = 3
%// circulant matrix
tmp = gallery('circul', a(:))
%// product of relevant parts of matrix
out = prod(tmp(end-n+1:-1:1, end-n+1:end), 2)
out =
4
4
6
3
More memory efficient alternative in case there are no zeros in the input:
a = [10 9 8 7 6 5 4 3 2 1]
n = 2
%// cumulative product
x = [1 cumprod(a)]
%// shifted by n and divided by itself
y = circshift( x,[0 -n] )./x
%// remove last elements
out = y(1:end-n)
out =
90 72 56 42 30 20 12 6 2

Your approach is correct. You should just change the for loop to for ii = 1:(length(v)-n+1) and then it will work fine.
If you are not going to deal with large inputs, another approach is using gallery as explained in #thewaywewalk's answer.

I think the problem may be based on your indexing. The line that states for ii = 1:(length(v)-2) does not provide the correct range of ii.
Try this:
function out = max_product(in,size)
size = size-1; % this is because we add size to i later
out = zeros(length(in),1) % assuming that this is a column vector
for i = 1:length(in)-size
out(i) = prod(in(i:i+size));
end
Your code works when restated like so:
for ii = 1:(length(v)-(n-1))
p = prod(v(ii:ii+(n-1)));
end
That should take care of the indexing problem.

using bsxfun you create a matrix each row of it contains consecutive 3 elements then take prod of 2nd dimension of the matrix. I think this is most efficient way:
max_product = #(v, n) prod(v(bsxfun(#plus, (1 : n), (0 : numel(v)-n)')), 2);
p = max_product([1 2 2 1 3 1],3)
Update:
some other solutions updated, and some such as #Dev-iL 's answer outperform others, I can suggest fftconv that in Octave outperforms conv

If you can upgrade to R2017a, you can use the new movprod function to compute a windowed product.

Pivot to binary matrix from categorial array

I have an array with some values that belongs to a set. I would like to transform this array in a binary matrix, each column of this matrix will represent each possible value of the set, the row value is 1 for the column that matches the input array or 0 for all the others. I think a name for that is something like a binary pivot.
The input array is a column of a table type
Example of input array (The previous example were only capital letters, which led to misinterpretation):
'Apple'
'Banana'
'Cherry'
'Dragonfruit'
'Apple'
'Cherry'
So, in this example input could assume 4 different values: 'Apple', 'Banana', 'Cherry' or 'Dragonfruit', in my real scenario it can be more than 4.
Example Output matrix:
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
1 0 0 0
0 0 1 0
I have achieved this desired behavior, but I would like to know if there is a better way to perform this operation. In a vectorized way (without the for-loop for each category) or using a built-in function.
function [ binMatrix, categs ] = pivotToBinaryMatrix( input )
categorizedInput = categorical(input);
categs = categories(categorizedInput);
binMatrix = zeros(size(atributo, 1), size(categorias, 1));
for i = 1: size(caters,1)
binMatrix(:,i) = ismember(categorizedInput, categs(i));
end
end
For about 50.000 entries with 9 categories it performed in 0.075137 seconds.
EDIT: I've improved the examples, because the previous examples led to misinterpretation.

Here's my take on the problem:
input = ['ABCDAB']';
binMatrix = bsxfun(#eq,input,unique(input)');
For the benchmarking, I ran it on a Windows 7 machine, 4Gb RAM, Intel i7-2600 CPU 3.4 GHz, borrowing #rayryeng initialization code:
% Generate dictionary from A up to I
ch = char(65 + (0:8));
rng(123);
% Generate 50000 random characters
v = randi(9, 50000, 1);
inputArray = ch(v);
time=0;
for ii=1:100
tic;
binMatrix = bsxfun(#eq,inputArray,unique(inputArray)');
t = toc;
time=time+t;
end
disp(time/100);
Which gave me 0.001203 seconds. For an extensive comparison of methods, please refer to #ryaryeng's answer.

I'm going to assume that your input array is a cell array of characters like so:
inputArray = {'Apple', 'Banana', 'Cherry', 'Dragonfruit', 'Apple', 'Cherry'};
You can convert the above into a numeric array by using the unique function's third output. What's great about this is that unique assigns a unique ID in sorted order, and so if you have a cell array of characters, it respects a lexicographical ordering of the characters.
Next, declare a matrix of zeros (like you did above) then use sub2ind to index into the matrix and set the values to 1.
Something like this. Bear in mind that I initialized the output slightly differently. It's a trick I learned to allocate a matrix of zeroes that is quite fast. See here: Faster way to initialize arrays via empty matrix multiplication? (Matlab)
inputArray = {'Apple', 'Banana', 'Cherry', 'Dragonfruit', 'Apple', 'Cherry'};
[~,~,inputNum] = unique(inputArray);
inputNum = inputNum.'; %// To make compatible in dimensions
binMatrix(numel(inputArray), max(inputNum)) = 0;
binMatrix(sub2ind(size(binMatrix), 1:numel(inputArray), inputNum)) = 1;
Another method would be to create a sparse logical array where we set the right row and column positions to be 1, then use this to index into our zeroes array and set the values accordingly.
Something like:
inputArray = {'Apple', 'Banana', 'Cherry', 'Dragonfruit', 'Apple', 'Cherry'};
[~,~,inputNum] = unique(inputArray);
inputNum = inputNum.'; %// To make compatible in dimensions
binMatrix = sparse(1:numel(inputArray), inputNum, 1, numel(inputArray), max(inputNum));
binMatrix = full(binMatrix);
Let's put this all together in a timing script. I've incorporated the two methods above, plus your old method, plus Divakar's (only the first method) and brodroll's (very ingenious btw) method. For Divakar's and brodroll's method, I have also used unique with the third output as your original inquiry had capital letters which confused as all. Using the third output can easily convert their previous methods to your new specifications.
BTW, your example and your code are mismatched. Your example has it set so the each column is an index but it's each row. For the timing tests, I'm going to transpose your result.I'm running MATLAB R2013a on Mac OS X 10.10.3 with 16 GB of RAM and an Intel i7 2.3 GHz processor. So:
clear all;
close all;
%// Generate dictionary
chars = {'Apple', 'Banana', 'Cherry', 'Dragonfruit'};
rng(123);
%// Generate 50000 random words
v = randi(numel(chars), 50000, 1);
inputArray = chars(v);
[~,~,inputNum] = unique(inputArray);
inputNum = inputNum.'; %// To make compatible in dimensions
%// Timing #1 - sub2ind
tic;
binMatrix(numel(inputArray), max(inputNum)) = 0;
binMatrix(sub2ind(size(binMatrix), 1:numel(inputArray), inputNum)) = 1;
t = toc;
clear binMatrix;
%// Timing #2 - sparse
tic;
binMatrix = sparse(1:numel(inputArray), inputNum, 1, numel(inputArray), max(inputNum));
binMatrix = full(binMatrix);
t2 = toc;
clear binMatrix;
%// Timing #3 - ismember and for
tic;
binMatrix = zeros(numel(inputArray), numel(chars));
for i = 1: size(binMatrix,1)
binMatrix(i,:) = ismember(chars, inputArray(i));
end
t3 = toc;
%// Timing #4 - bsxfun
clear binMatrix;
tic;
binMatrix = bsxfun(#eq,inputNum',unique(inputNum)); %// Changed to make dimensions match
t4 = toc;
clear binMatrix;
%// Timing #5 - raw sub2ind
tic;
binMatrix(numel(inputArray), max(inputNum)) = 0;
binMatrix( (inputNum-1)*size(binMatrix,1) + [1:numel(inputArray)] ) = 1;
t5 = toc;
fprintf('Timing using sub2ind: %f seconds\n', t);
fprintf('Timing using sparse: %f seconds\n', t2);
fprintf('Timing using ismember and loop: %f seconds\n', t3);
fprintf('Timing using bsxfun: %f seconds\n', t4);
fprintf('Timing using raw sub2ind: %f seconds\n', t5);
We get:
Timing using sub2ind: 0.004223 seconds
Timing using sparse: 0.004252 seconds
Timing using ismember and loop: 2.771389 seconds
Timing using bsxfun: 0.020739 seconds
Timing using raw sub2ind: 0.000773 seconds
In terms of rank:
Raw sub2ind
sub2ind
sparse
bsxfun
OP's method

If you don't mind all zeros columns in cases where you have non-successive characters in the input array, something like 'ABEACF', where 'D' is missing, you can use this -
col_idx = inputArray - 'A' + 1;
binMatrix(numel(inputArray), max(col_idx) ) = 0;
binMatrix( (col_idx-1)*size(binMatrix,1) + [1:numel(inputArray)] ) = 1;
If you do care about that issue and would like no all-zeros columns, you can use a modified version of it -
[~,unq_pos,col_idx] = unique(inputArray,'stable');
binMatrix(numel(inputArray), numel(unq_pos)) = 0;
binMatrix( (col_idx-1)*size(binMatrix,1) + [1:numel(inputArray)].' ) = 1;
Basically both these approaches use the same hacky technique to pre-allocate as listed in Undocumented MATLAB and also listed in the other answer by #rayryeng. On top of it, it uses a raw version of sub2ind.

Matlab: Help in implementing quantized time series

I am having trouble implementing this code due to the variable s_k being logical 0/1. In what way can I implement this statement?
s_k is a random sequence of 0/1 generated using a rand() and quantizing the output of rand() by its mean given below. After this, I don't know how to implement. Please help.
N =1000;
input = randn(N);
s = (input>=0.5); %converting into logical 0/1;
UPDATE
N = 3;
tmax = 5;
y(1) = 0.1;
for i =1 : tmax+N-1 %// Change here
y(i+1) = 4*y(i)*(1-y(i)); %nonlinear model for generating the input to Autoregressive model
end
s = (y>=0.5);
ind = bsxfun(#plus, (0:tmax), (0:N-1).');
x = sum(s(ind+1).*(2.^(-ind+N+1))); % The output of this conversion should be real numbers
% Autoregressive model of order 1
z(1) =0;
for j =2 : N
z(j) = 0.195 *z(j-1) + x(j);
end

You've generated the random logical sequence, which is great. You also need to know N, which is the total number of points to collect at one time, as well as a list of time values t. Because this is a discrete summation, I'm going to assume the values of t are discrete. What you need to do first is generate a sliding window matrix. Each column of this matrix represents a set of time values for each value of t for the output. This can easily be achieved with bsxfun. Assuming a maximum time of tmax, a starting time of 0 and a neighbourhood size N (like in your equation), we can do:
ind = bsxfun(#plus, (0:tmax), (0:N-1).');
For example, assuming tmax = 5 and N = 3, we get:
ind =
0 1 2 3 4 5
1 2 3 4 5 6
2 3 4 5 6 7
Each column represents a time that we want to calculate the output at and every row in a column shows a list of time values we want to calculate for the desired output.
Finally, to calculate the output x, you simply take your s_k vector, make it a column vector, use ind to access into it, do a point-by-point multiplication with 2^(-k+N+1) by substituting k with what we got from ind, and sum along the rows. So:
s = rand(max(ind(:))+1, 1) >= 0.5;
x = sum(s(ind+1).*(2.^(-ind+N+1)));
The first statement generates a random vector that is as long as the maximum time value that we have. Once we have this, we use ind to index into this random vector so that we can generate a sliding window of logical values. We need to offset this by 1 as MATLAB starts indexing at 1.

Is there a better/faster way of randomly shuffling a matrix in MATLAB?

In MATLAB, I am using the shake.m function (http://www.mathworks.com/matlabcentral/fileexchange/10067-shake) to randomly shuffle each column. For example:
a = [1 2 3; 4 5 6; 7 8 9]
a =
1 2 3
4 5 6
7 8 9
b = shake(a)
b =
7 8 6
1 5 9
4 2 3
This function does exactly what I want, however my columns are very long (>10,000,000) and so this takes a long time to run. Does anyone know of a faster way of achieving this? I have tried shaking each column vector separately but this isn't faster. Thanks!

You can use randperm like this, but I don't know if it will be any faster than shake:
[m,n]=size(a)
for c = 1:n
a(randperm(m),c) = a(:,c);
end
Or you can try switch the randperm around to see which is faster (should produce the same result):
[m,n]=size(a)
for c = 1:n
a(:,c) = a(randperm(m),c);
end
Otherwise how many rows do you have? If you have far fewer rows than columns, it's possible that we can assume each permutation will be repeated, so what about something like this:
[m,n]=size(a)
cols = randperm(n);
k = 5; %//This is a parameter you'll need to tweak...
set_size = floor(n/k);
for set = 1:set_size:n
set_cols = cols(set:(set+set_size-1))
a(:,set_cols) = a(randperm(m), set_cols);
end
which would massively reduce the number of calls to randperm. Breaking it up into k equal sized sets might not be optimal though, you might want to add some randomness to that as well. The basic idea here though is that there will only be factorial(m) different orderings, and if m is much smaller than n (e.g. m=5, n=100000 like your data), then these orderings will be repeated naturally. So instead of letting that occur by itself, rather manage the process and reduce the calls to randperm which would be producing the same result anyway.

Here's a simple vectorized approach. Note that it creates an auxiliary matrix (ind) the same size as a, so depending on your memory it may be usable or not.
[~, ind] = sort(rand(size(a))); %// create a random sorting for each column
b = a(bsxfun(#plus, ind, 0:size(a,1):numel(a)-1)); %// convert to linear index

Obtain shuffled indices using randperm
idx = randperm(size(a,1));
Use the indices to shuffle the vector:
m = size(a,1);
for i=1:m
b(:,i) = a(randperm(m,:);
end
Look at this answer: Matlab: How to random shuffle columns of matrix

Here's a no-loop approach as it processes all indices at once and I believe this is as random as one could get given the requirements of shuffling among each column only.
Code
%// Get sizes
[m,n] = size(a);
%// Create an array of randomly placed sequential indices from 1 to numel(a)
rand_idx = randperm(m*n);
%// segregate those indices into rows and cols for the size of input data, a
col = ceil(rand_idx/m);
row = rem(rand_idx,m);
row(row==0)=m;
%// Sort both these row and col indices based on col, such that we have col
%// as 1,1,1,1 ...2,2,2,....3,3,3,3 and so on, which would represent per col
%// indices for the input data. Use these indices to linearly index into a
[scol,ind1] = sort(col);
a(1:m*n) = a((scol-1)*m + row(ind1))
Final output is obtained in a itself.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight

Matlab random sample of a dataset - arrays

I have a dataset (Data) which is a vector of, let's say, 1000 real numbers. I would like to extract at random from Data 100 times 10 contiguous numbers. I don't know how to use Datasample for that purpose. Thanks in advance for you help.

You can just pick 100 random numbers between 1 and 991: I = randi(991, 100, 1) Then use them as the starting points to index 10 contiguous elements: cell2mat(arrayfun(#(x)(Data(x:x+9)), I, 'uni', false))

Related

Define a vector with random steps

Compute the product of the next n elements in array

Pivot to binary matrix from categorial array

Matlab: Help in implementing quantized time series

Is there a better/faster way of randomly shuffling a matrix in MATLAB?

Categories

Resources