How to determine the most repeated values into a interval of a vector array matlab - arrays

this is my question:
I want to know which and how many times is a value repeated in a interval of a vector array, I know that many people will tell me that use "hist", but I did it and the results isn't exact enough, let me show you in a picture my problem:
In the past picture, you can see in blue the "Data"; and I have used 3 kinds of values: 1st "Mode", 2nd "Mean" and finally "Most repeated value in Histogram" which means that I used something like [a,b]=hist(Data), then Mayor Value = b(a==max(a)) and is very important to do NOT use a predefined range; but this picture doesn't represent the most repeted values, so let me show you another pic, which is a closer view of the data:
That blue "Data", which vary between (0-0.5)E-5 approximately is the interval that I need to obtain, but as you can see, the others three values are not close enough. And "mode" value is just "0". I hope that you can help me to solve this problem, thanks by the way!.
Ok to be more clear, I add this new pic:
What exactly I'm looking for is to get an interval, like in this example I wrote manually 0.1 - 0.4 E-4 (in purple), so the function will say:
[A,B]=magicfunction(Data);
A=[0.1E-4 0.4E-4]; B=[123];
Where B=123 means the amount of data contained in that interval, as you can see I just ingress vector "Data", nothing else.
In the next link you can get the "Data":
https://drive.google.com/file/d/0B4WGV21GqSL5Vk0tRUdLNk5XVnc/edit?usp=sharing

isn't taking the max of a hist in a range what you want? you almost got it, you just didn't define the bins well. For example:
range=4750:5050;
[counts val]=hist(data(range),unique(data(range)));
most_repeated _value_in_range=val(counts==max(counts));
Edit:
Following the clarification, what you want is a statistical bound regarding the histogram width around it's maximum (most frequent value) , here's a solution:
[c, v]=hist(data,linspace(min(data),max(data),num_of_bins));
range=find(c>1/exp(1)*max(c)); % can be also c>0.5*max(c) etc...
A=[v(range(1)) v(range(end))];
B=sum(c(range));
Let's test with some fake data:
t=linspace(-50,50,1e3);
data=0.3*exp(-(t-30).^2)+0.2*exp(-(t-10).^2)+0.3*exp(-(t+10).^2)+0.01*randn(1,numel(t));
[c, v]=hist(data,linspace(min(data),max(data),numel(t)));
range=find(c>1/exp(1)*max(c));
A=[v(range(1)) v(range(end))];
B=sum(c(range));
plot(t,data,'b'); hold on
plot([min(t) max(t)],[A(1) A(1)] ,'--r');
plot([min(t) max(t)],[A(2) A(2)] ,'--r');
B
B =
518
Of course you can change the definition of "width" of the histogram, I took 1/e to 1/e you can take full width at half max (c>0.5*max(c)), or narrower according to the type of data used, etc...

The function below is designed based on several assumptions:
The "interval" of interest is close to 0.
The majority of the samples are small.
The basic idea is to first filter out the samples that are too big, and then define the interval based on the sorted array of the remaining samples.
function [A, B] = magicfunction(data)
% Assuming the outlier samples only exist in the positive side, some
% samples of big, positive values can be excluded in order to obtain a
% better estimation of "the interval". Here we exclude the
% samples that are greater than mean(A)+K1*std(A), where K1 is empirically
% selected as 1.0
K1 = 1.0;
filtered_data = data( data < mean(data)+K1*std(data));
sorted_data = sort(filtered_data);
% Define the interval in terms of the percentile in the
% sorted_data. Here the interval is empirically selected as [0, 0.75]
interval = [0 0.75];
% Map the percentile interval to the actual index in sorted_data.
% Note that interval_index(1) cannot be smaller than 1, and
% interval_index(2) cannot be greater than length(sorted_data)
interval_index = round( length(sorted_data)*interval );
interval_index(1) = max(1, interval_index(1));
interval_index(2) = min(length(sorted_data), interval_index(2));
% Assign output A in terms of the value in the sorted_data
A = sorted_data(interval_index)
% Assign output B
B = sum( data>A(1) & data<A(2) )
% Visualization
x = [1:length(data)];
figure;
subplot(211);
plot(x, data, ...
x, repmat(A(:)', length(data),1) ); grid on;
legend('data', 'lower bound', 'upper bound');
xlim([1 20000]);
subplot(212);
plot(x, data, ...
x, repmat(A(:)', length(data),1) ); grid on;
legend('data', 'lower bound', 'upper bound');
ylim([0, 3*10^-5]);
xlim([1 20000]);
Feeding the data provided in your question into the function yields the following plot:
You may want to empirically tune the two variables in the function to obtain the desired result.
K1
interval

Related

Define a vector with random steps

I want to create an array that has incremental random steps, I've used this simple code.
t_inici=(0:10*rand:100);
The problem is that the random number keeps unchangable between steps. Is there any simple way to change the seed of the random number within each step?
If you have a set number of points, say nPts, then you could do the following
nPts = 10; % Could use 'randi' here for random number of points
lims = [0, 10] % Start and end points
x = rand(1, nPts); % Create random numbers
% Sort and scale x to fit your limits and be ordered
x = diff(lims) * ( sort(x) - min(x) ) / diff(minmax(x)) + lims(1)
This approach always includes your end point, which a 0:dx:10 approach would not necessarily.
If you had some maximum number of points, say nPtsMax, then you could do the following
nPtsMax = 1000; % Max number of points
lims = [0,10]; % Start and end points
% Could do 10* or any other multiplier as in your example in front of 'rand'
x = lims(1) + [0 cumsum(rand(1, nPtsMax))];
x(x > lims(2)) = []; % remove values above maximum limit
This approach may be slower, but is still fairly quick and better represents the behaviour in your question.
My first approach to this would be to generate N-2 samples, where N is the desired amount of samples randomly, sort them, and add the extrema:
N=50;
endpoint=100;
initpoint=0;
randsamples=sort(rand(1, N-2)*(endpoint-initpoint)+initpoint);
t_inici=[initpoint randsamples endpoint];
However not sure how "uniformly random" this is, as you are "faking" the last 2 data, to have the extrema included. This will somehow distort pure randomness (I think). If you are not necessarily interested on including the extrema, then just remove the last line and generate N points. That will make sure that they are indeed random (or as random as MATLAB can create them).
Here is an alternative solution with "uniformly random"
[initpoint,endpoint,coef]=deal(0,100,10);
t_inici(1)=initpoint;
while(t_inici(end)<endpoint)
t_inici(end+1)=t_inici(end)+rand()*coef;
end
t_inici(end)=[];
In my point of view, it fits your attempts well with unknown steps, start from 0, but not necessarily end at 100.
From your code it seems you want a uniformly random step that varies between each two entries. This implies that the number of entries that the vector will have is unknown in advance.
A way to do that is as follows. This is similar to Hunter Jiang's answer but adds entries in batches instead of one by one, in order to reduce the number of loop iterations.
Guess a number of required entries, n. Any value will do, but a large value will result in fewer iterations and will probably be more efficient.
Initiallize result to the first value.
Generate n entries and concatenate them to the (temporary) result.
See if the current entries are already too many.
If they are, cut as needed and output (final) result. Else go back to step 3.
Code:
lower_value = 0;
upper_value = 100;
step_scale = 10;
n = 5*(upper_value-lower_value)/step_scale*2; % STEP 1. The number 5 here is arbitrary.
% It's probably more efficient to err with too many than with too few
result = lower_value; % STEP 2
done = false;
while ~done
result = [result result(end)+cumsum(step_scale*rand(1,n))]; % STEP 3. Include
% n new entries
ind_final = find(result>upper_value,1)-1; % STEP 4. Index of first entry exceeding
% upper_value, if any
if ind_final % STEP 5. If non-empty, we're done
result = result(1:ind_final-1);
done = true;
end
end

Interpolate 2D Array to single point in MATLAB

I have 3 graphs of an IV curve (monotonic increasing function. consider a positive quadratic function in the 1st quadrant. Photo attached.) at 3 different temperatures that are not obtained linearly. That is, one is obtained at 25C, one at 125C and one at 150C.
What I want to make is an interpolated 2D array to fill in the other temperatures. My current method to build a meshgrid-type array is as follows:
H = 5;
W = 6;
[Wmat,Hmat] = meshgrid(1:W,1:H);
X = [1:W; 1:W];
Y = [ones(1,W); H*ones(1,W)];
Z = [vecsatIE25; vecsatIE125];
img = griddata(X,Y,Z,Wmat,Hmat,'linear')
This works to build a 6x6 array, which I can then index one row from, then interpolate from that 1D array.
This is really not what I want to do.
For example, the rows are # temps = 25C, 50C, 75C, 100C, 125C and 150C. So I must select a temperature of, say, 50C when my temperature is actually 57.5C. Then I can interpolate my I to get my V output. So again for example, my I is 113.2A, and I can actually interpolate a value and get a V for 113.2A.
When I take the attached photo and digitize the plot information, I get an array of points. So my goal is to input any Temperature and any current to get a voltage by interpolation. The type of interpolation is not as important, so long as it produces reasonable values - I do not want nearest neighbor interpolation, linear or something similar is preferred. If it is an option, I will try different kinds of interpolation later (cubic, linear).
I am not sure how I can accomplish this, ideally. The meshgrid array does not need to exist. I simply need the 1 value.
Thank you.
If I understand the question properly, I think what you're looking for is interp2:
Vq = interp2(X,Y,V,Xq,Yq) where Vq is the V you want, Xq and Yq are the temperature and current, and X, Y, and V are the input arrays for temperature, current, and voltage.
As an option, you can change method between 'linear', 'nearest', 'cubic', 'makima', and 'spline'

Ignore NaN when detrending 3-d array

I'm using Matlab 2016a; I'm attempting to detrend a 3-dimensional array along the third dimension, but where there are missing values. It is critical that the values stay in the same positions in the array since the position relates to a geographic location.
In this image, imagine that Page 2 has NaN at random locations but that Page 1 and Page 3 have complete data. Detrending along the 3rd dimension, some vectors will have three data points and some will have two. I need to be able to detrend along the third dimension using all available values. If I were to look at the values for the detrended Page 1 or Page 3, there should be no missing values (since there are always either 2 or 3 data points to use), but Page 2 would have NaN placeholders in the location where the NaN was located.
My question is: how can I detrend along the third dimension while ignoring NaN?
I've attempted using detrend3 (found on the Matlab file exchange: https://www.mathworks.com/matlabcentral/fileexchange/61328-detrend3?focused=7203929&tab=function), which works perfectly when detrending 3-d arrays with no missing values.
Detrending with NaN present produces an error. I've tried ignoring NaN and also setting NaN to -9999 and then ignoring that number, but have been unable to get these efforts to work.
Any guidance about what direction to go would be greatly appreciated.
function detrended = detrendNaN3(A,t)
%DETRENDNAN3 Detrends a matrix with NaNs into the third dimension
% Input Arguments:
% - A: NxMxK matrix
% - t: 1xK time vector
% time to same format as A
t = bsxfun(#times,permute(t,[3 1 2]),ones(size(A)));
% where A == Nan, -> t = NaN
t(isnan(A)) = NaN;
%mean of time each pixel
xm = nanmean(t,3);
% mean of every pixel in A
ym = nanmean(A,3);
% calculate slope using least squares for every pixel
a = nansum(bsxfun(#times,bsxfun(#minus,t,xm),bsxfun(#minus,A,ym)),3)./nansum(bsxfun(#minus,t,xm).^2,3);
% calculate intercept for every pixel
b = ym - a.*xm;
% calculate trend for every pixel
trend = bsxfun(#plus,b,bsxfun(#times,a,t));
% remove trend
detrended = A-trend;
end
Even tough the function is fully vectorised it could be written a bit faster - but it's currently very readable and with a 2500x1700x100 matrix it takes about 8 seconds which I deem acceptable.
An updated version is maintained at the file exchange.

Randomize matrix elements between two values while keeping row and column sums fixed (MATLAB)

I have a bit of a technical issue, but I feel like it should be possible with MATLAB's powerful toolset.
What I have is a random n by n matrix of 0's and w's, say generated with
A=w*(rand(n,n)<p);
A typical value of w would be 3000, but that should not matter too much.
Now, this matrix has two important quantities, the vectors
c = sum(A,1);
r = sum(A,2)';
These are two row vectors, the first denotes the sum of each column and the second the sum of each row.
What I want to do next is randomize each value of w, for example between 0.5 and 2. This I would do as
rand_M = (0.5-2).*rand(n,n) + 0.5
A_rand = rand_M.*A;
However, I don't want to just pick these random numbers: I want them to be such that for every column and row, the sums are still equal to the elements of c and r. So to clean up the notation a bit, say we define
A_rand_c = sum(A_rand,1);
A_rand_r = sum(A_rand,2)';
I want that for all j = 1:n, A_rand_c(j) = c(j) and A_rand_r(j) = r(j).
What I'm looking for is a way to redraw the elements of rand_M in a sort of algorithmic fashion I suppose, so that these demands are finally satisfied.
Now of course, unless I have infinite amounts of time this might not really happen. I therefore accept these quantities to fall into a specific range: A_rand_c(j) has to be an element of [(1-e)*c(j),(1+e)*c(j)] and A_rand_r(j) of [(1-e)*r(j),(1+e)*r(j)]. This e I define beforehand, say like 0.001 or something.
Would anyone be able to help me in the process of finding a way to do this? I've tried an approach where I just randomly repick the numbers, but this really isn't getting me anywhere. It does not have to be crazy efficient either, I just need it to work in finite time for networks of size, say, n = 50.
To be clear, the final output is the matrix A_rand that satisfies these constraints.
Edit:
Alright, so after thinking a bit I suppose it might be doable with some while statement, that goes through every element of the matrix. The difficult part is that there are four possibilities: if you are in a specific element A_rand(i,j), it could be that A_rand_c(j) and A_rand_r(i) are both too small, both too large, or opposite. The first two cases are good, because then you can just redraw the random number until it is smaller than the current value and improve the situation. But the other two cases are problematic, as you will improve one situation but not the other. I guess it would have to look at which criteria is less satisfied, so that it tries to fix the one that is worse. But this is not trivial I would say..
You can take advantage of the fact that rows/columns with a single non-zero entry in A automatically give you results for that same entry in A_rand. If A(2,5) = w and it is the only non-zero entry in its column, then A_rand(2,5) = w as well. What else could it be?
You can alternate between finding these single-entry rows/cols, and assigning random numbers to entries where the value doesn't matter.
Here's a skeleton for the process:
A_rand=zeros(size(A)) is the matrix you are going to fill
entries_left = A>0 is a binary matrix showing which entries in A_rand you still need to fill
col_totals=sum(A,1) is the amount you still need to add in every column of A_rand
row_totals=sum(A,2) is the amount you still need to add in every row of A_rand
while sum( entries_left(:) ) > 0
% STEP 1:
% function to fill entries in A_rand if entries_left has rows/cols with one nonzero entry
% you will need to keep looping over this function until nothing changes
% update() A_rand, entries_left, row_totals, col_totals every time you loop
% STEP 2:
% let (i,j) be the indeces of the next non-zero entry in entries_left
% assign a random number to A_rand(i,j) <= col_totals(j) and <= row_totals(i)
% update() A_rand, entries_left, row_totals, col_totals
end
update()
A_rand(i,j) = random_value;
entries_left(i,j) = 0;
col_totals(j) = col_totals(j) - random_value;
row_totals(i) = row_totals(i) - random_value;
end
Picking the range for random_value might be a little tricky. The best I can think of is to draw it from a relatively narrow distribution centered around N*w*p where p is the probability of an entry in A being nonzero (this would be the average value of row/column totals).
This doesn't scale well to large matrices as it will grow with n^2 complexity. I tested it for a 200 by 200 matrix and it worked in about 20 seconds.

Plot Representative sample of large data set - Matlab

I have a large data set with two arrays, say x and y. The arrays have over 1 million data points in size. Is there a simple way to do a scatter plot of only 2000 of these points but have it be representative of the entire set?
I'm thinking along the lines of creating another array r ; r = max(x)*rand(2000,1) to get a random sample of the x array. Is there a way to then find where a value in r is equal to, or close to a value in x ? They wouldn't have to be in the same indexed location but just throughout the whole matrix. We could then plot the y values associated with those found x values against r
I'm just not sure how to code this. Is there a better way than doing this?
I'm not sure how representative this procedure will be of your data, because it depends on what your data looks like, but you can certainly code up something like that. The easiest way to find the closest value is to take the min of the abs of the difference between your test vector and your desired value.
r = max(x)*rand(2000,1);
for i = 1:length(r)
[~,z(i)] = min(abs(x-r(i)));
end
plot(x(z),y(z),'.')
Note that the [~,z(i)] in the min line means we want to store the index of the minimum value in vector z.
You might also try something like a moving average, see this video: http://blogs.mathworks.com/videos/2012/04/17/using-convolution-to-smooth-data-with-a-moving-average-in-matlab/
Or you can plot every n points, something like (I haven't tested this, so no guarantees):
n = 1000;
plot(x(1:n:end),y(1:n:end))
Or, if you know the number of points you want (again, untested):
npoints = 2000;
interval = round(length(x)/npoints);
plot(x(1:interval:end),y(1:interval:end))
Perhaps the easiest way is to use round function and convert things to integers, then they can be compared. For example, if you want to find points that are within 0.1 of the values of r, multiply the values by 10 first, then round:
r = max(x) * round(2000,1);
rr = round(r / 0.1);
xx = round(x / 0.1);
inRR = ismember(xx, rr)
plot(x(inRR), y(inRR));
By dividing by 0.1, any values that have the same integer value are within 0.1 of each other.
ismember returns a 1 for each value of xx if that value is in rr, otherwise a 0. These can be used to select entries to plot.

Resources