Plot a representative sample of a large data set in MATLAB

I have a large data set with two arrays, say x and y, each with over 1 million data points. Is there a simple way to make a scatter plot of only 2000 of these points that is still representative of the entire set?
I'm thinking along the lines of creating another array r = max(x)*rand(2000,1) to get random values across the range of x. Is there a way to then find where a value in r is equal to, or close to, a value in x? The matching values wouldn't have to be at the same index, just anywhere in the array. We could then plot the y values associated with those x values against r.
I'm just not sure how to code this. Is there a better way to do it?

I'm not sure how representative this procedure will be of your data, because it depends on what your data looks like, but you can certainly code up something like that. The easiest way to find the closest value is to take the min of the abs of the difference between your test vector and your desired value.
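r = min(x) + (max(x)-min(x))*rand(2000,1); % random targets spanning the range of x (also works if x has negative values)
z = zeros(size(r));                        % preallocate the index vector
for i = 1:length(r)
[~,z(i)] = min(abs(x-r(i)));               % index of the x value closest to r(i)
end
plot(x(z),y(z),'.')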
Note that the [~,z(i)] in the min line means we want to store the index of the minimum value in vector z.
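The loop scans all of x once for each of the 2000 targets, which is slow for a million-element array. Below is a vectorized sketch of the same nearest-value idea. It assumes x and y are numeric column vectors of equal length; the synthetic data is only a placeholder. It sorts the distinct values of x with unique, then lets interp1 with the 'nearest' method find the closest value for every target at once:

```matlab
% Synthetic stand-in data (assumption): replace with your own x and y
x = 10*randn(1e6,1);
y = x.^2 + randn(size(x));

% Random target values spanning the full range of x
r = min(x) + (max(x) - min(x))*rand(2000,1);

% xs: the distinct values of x in ascending order, with xs = x(ia)
[xs, ia] = unique(x);

% Nearest-neighbor interpolation of the positions 1..numel(xs) returns,
% for every r(i), the position in xs of the value closest to r(i)
k = interp1(xs, (1:numel(xs))', r, 'nearest');

% Map those positions back to indices into the original x and y
z = ia(k);
plot(x(z), y(z), '.')
```

unique sorts the data once, after which each lookup is cheap. If x contains repeated values, only one of the duplicates can be picked, just as min returns only the first match.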
You might also try something like a moving average, see this video: http://blogs.mathworks.com/videos/2012/04/17/using-convolution-to-smooth-data-with-a-moving-average-in-matlab/
Or you can plot every nth point, something like this (I haven't tested it, so no guarantees):
n = 1000;
plot(x(1:n:end),y(1:n:end))
Or, if you know the number of points you want (again, untested):
npoints = 2000;
interval = max(1, floor(length(x)/npoints)); % floor keeps at least npoints points; max avoids a zero step
plot(x(1:interval:end),y(1:interval:end))
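A simpler alternative, not taken from the answers above, is a uniform random sample of indices. This minimal sketch assumes x and y are vectors of equal length with at least 2000 elements. randperm(n,k) returns k distinct integers from 1:n, so no point is drawn twice:

```matlab
% Synthetic stand-in data (assumption): replace with your own x and y
x = linspace(0, 100, 1e6)';
y = sin(x) + 0.1*randn(size(x));

npoints = 2000;
% npoints distinct random indices between 1 and numel(x)
idx = randperm(numel(x), npoints);
% Sorting keeps the subsample in the original order (useful for line plots)
idx = sort(idx);
plot(x(idx), y(idx), '.')
```

Random indices avoid the aliasing a fixed stride can produce when the data has periodic structure. Dense regions of x also stay dense in the sample, which is usually what "representative" means for a scatter plot.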

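Perhaps the easiest way is to use the round function to convert values to integers, which can then be compared exactly. For example, to find points within 0.1 of the values in r, divide the values by 0.1 (equivalently, multiply by 10) and then round:
r = max(x) * rand(2000,1);   % random target values in [0, max(x)]
rr = round(r / 0.1);         % bin number of each target
xx = round(x / 0.1);         % bin number of each data point
inRR = ismember(xx, rr);     % true where a data point shares a bin with some target
plot(x(inRR), y(inRR), '.');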
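Dividing by 0.1 and rounding puts each value into a bin of width 0.1, so values that end up with the same integer are at most 0.1 apart. Two values closer than 0.1 can still land in neighboring bins, so this is an approximate match.
ismember returns logical 1 (true) for each element of xx whose value appears in rr, and 0 otherwise; that logical array selects the entries to plot. Every data point in a matching bin is selected, so the plot can contain many more than 2000 points.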
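To see the binning at work, here is a small worked example with made-up values. You can check the bin numbers in the comments by hand: 0.51/0.1 = 5.1 rounds to 5, and 0.88/0.1 = 8.8 rounds to 9.

```matlab
% Small illustrative values (assumption), chosen so the bins are easy to check
x = [0.12 0.34 0.51 0.88];
y = [10 20 30 40];
r = [0.5 0.9];

rr = round(r / 0.1);      % bin numbers of the targets: [5 9]
xx = round(x / 0.1);      % bin numbers of the data:    [1 3 5 9]
inRR = ismember(xx, rr);  % logical mask:               [0 0 1 1]
plot(x(inRR), y(inRR), '.')
```

Only 0.51 and 0.88 are kept, because they fall in the same bins as the targets 0.5 and 0.9.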

Related

How to select part of complex vector in Matlab

This is probably a trivial question, but I want to select a portion of a complex array in order to plot it in Matlab. My MWE is
n = 100;
t = linspace(-1,1,n);
x = rand(n,1)+1j*rand(n,1);
plot(t(45):t(55),real(x(45):x(55)),'.--')
plot(t(45):t(55),imag(x(45):x(55)),'.--')
I get an error
Error using plot
Vectors must be the same length.
because the real(x(45):x(55)) bit returns an empty matrix: Empty matrix: 1-by-0. What is the easiest way to fix this problem without creating new vectors for the real and imaginary x?
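It was just a simple mistake. You were doing t(45):t(55), which applies the colon operator to the values stored in t rather than to indices. t comes from linspace, so t(45) is about -0.111 and t(55) about 0.091, and with the default step of 1, -0.111:0.091 contains only the single value -0.111. See the problem?
For x, the colon operator uses only the real parts of x(45) and x(55), which are random, so that range had a different length (here empty), and thus the error.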
What you want is t(45:55), to specify the vector positions from 45 to 55.
This is what you want:
n = 100;
t = linspace(-1,1,n);
x = rand(n,1)+1j*rand(n,1);
plot(t(45:55),real(x(45:55)),'.--')
hold on   % keep the first curve when the second one is added
plot(t(45:55),imag(x(45:55)),'.--')
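A quick check of the difference, using the same t as in the question. With a default step of 1, t(45):t(55) starts at about -0.111 and stops after its first element, whereas t(45:55) indexes eleven elements:

```matlab
n = 100;
t = linspace(-1,1,n);

a = t(45):t(55);   % colon on the VALUES: -0.1111 toward 0.0909 in steps of 1
b = t(45:55);      % colon on the INDICES: elements 45 through 55

numel(a)   % 1
numel(b)   % 11
```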

Interpolate 2D Array to single point in MATLAB

I have 3 graphs of an IV curve (a monotonically increasing function; think of a positive quadratic function in the first quadrant. Photo attached.) at 3 unevenly spaced temperatures: one at 25C, one at 125C and one at 150C.
What I want is an interpolated 2D array that fills in the other temperatures. My current method builds a meshgrid-type array as follows:
H = 5;
W = 6;
[Wmat,Hmat] = meshgrid(1:W,1:H);
X = [1:W; 1:W];
Y = [ones(1,W); H*ones(1,W)];
Z = [vecsatIE25; vecsatIE125];
img = griddata(X,Y,Z,Wmat,Hmat,'linear')
This works to build a 6x6 array, which I can then index one row from, then interpolate from that 1D array.
This is really not what I want to do.
For example, the rows correspond to temperatures of 25C, 50C, 75C, 100C, 125C and 150C, so I must select a temperature of, say, 50C when my temperature is actually 57.5C. Then I can interpolate over I to get my V output. Again for example, if my I is 113.2A, I can interpolate a V for 113.2A.
When I digitize the plot in the attached photo, I get an array of points. My goal is to input any temperature and any current and get a voltage by interpolation. The type of interpolation is not critical as long as it produces reasonable values: I do not want nearest-neighbor interpolation, and linear or something similar is preferred. If possible, I will try other kinds of interpolation (cubic, linear) later.
I am not sure how to accomplish this. The meshgrid array does not need to exist; I simply need the one value.
Thank you.
If I understand the question properly, I think what you're looking for is interp2:
Vq = interp2(X,Y,V,Xq,Yq), where Vq is the voltage you want, Xq and Yq are the query temperature and current, and X, Y and V are the gridded input arrays of temperature, current and voltage.
Optionally, add a method argument: 'linear' (the default), 'nearest', 'cubic', 'makima' or 'spline'.
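Here is a minimal sketch. It assumes the digitized curves have already been resampled onto one common current vector, so the voltages form a grid with one column per measured temperature. The voltage values are made-up placeholders, not data from the question. Because they vary linearly, bilinear interpolation reproduces them exactly, which makes the result easy to check:

```matlab
T = [25 125 150];          % measured temperatures in degC (from the question)
I = (0:10:200)';           % hypothetical common current grid in A
% Grid of (temperature, current) pairs: rows follow I, columns follow T
[Tg, Ig] = meshgrid(T, I);
% Placeholder voltages (assumption): replace with your digitized IV data
Vg = Ig/100 + Tg/1000;

% Query any temperature and current inside the measured ranges
Tq = 57.5;
Iq = 113.2;
Vq = interp2(Tg, Ig, Vg, Tq, Iq, 'linear')   % 1.1895 for this placeholder data
```

By default, queries outside the measured temperature or current ranges return NaN. If the digitized curves were sampled at different currents, each column of Vg could first be filled with interp1(I_curve, V_curve, I). Here I_curve and V_curve are hypothetical names for one digitized curve.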

Draw imaginary numbers in MATLAB

I am trying to learn MATLAB.
I am trying to make a program that draws the complex numbers (1 + 0,1i)^n for n = 1, ..., 500 ("," is the decimal separator) and determines which of these 500 numbers is closest to the real axis.
I need a little guidance.
What do I have to do to solve this task?
I was thinking about making a loop where all the values get stored in an array:
n = 1;
while n < 500
value = 1+0.1^n;
disp(value)
n = n+1;
end
(value seems to print the wrong values. And how do I store them in an array?)
Then I need to determine which number is nearest the real axis and display it.
I would be really grateful if someone could help me.
Thanks in advance.
MATLAB creates complex numbers by appending i or j to a number. For example, to create a complex number whose real and imaginary components are both 1, you would simply do:
>> A = 1 + i
A =
1.0000 + 1.0000i
You can see that A stores a distinct real component as well as an imaginary component. Similarly, to give the imaginary component a value other than 1, put a constant in front of the i (or j), for example:
>> A = 3 + 6i
A =
3.0000 + 6.0000i
Therefore, for your task, you simply need to create a vector n from 1 to 500, substitute it into the equation, then plot the resulting complex numbers, with the real component on the x axis and the imaginary component on the y axis. Something like:
>> n = 1 : 500;
>> A = (1 + 0.1i).^n;
>> plot(real(A), imag(A));
real and imag are MATLAB functions that extract the real and imaginary components of complex numbers stored in arrays, matrices or single values. As noted by knedlsepp, you can also pass the array itself to plot, since plot draws a complex-valued array as its real part against its imaginary part:
>> plot(A);
Nice picture btw! Be mindful of the . in front of the ^ operator. The . means an element-wise operation: the power is applied separately for each value of n from 1 to 500, with 1 + 0.1i as the base, giving a 500-element array of results. Plain ^ would instead request a matrix power, which is not what we want here (with a scalar base and a vector exponent, it raises an error).
The values for each n in the equation from your post are stored in A, and we plot their real and imaginary components. To find which numbers are closest to the real axis, find the smallest absolute imaginary component among the numbers in A, then select every number in A that shares this value:
>> min_dist = min(abs(imag(A)));
>> vals = A(abs(imag(A)) == min_dist)
vals =
1.3681 - 0.0056i
This means that the value of 1.3681 - 0.0056i is the closest to the real axis.
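If you also want to know which n produces that number, the second output of min gives its position directly (the first one, if several elements tie):

```matlab
n = 1:500;
A = (1 + 0.1i).^n;

% Position of the element with the smallest absolute imaginary part
[~, k] = min(abs(imag(A)));
closest_n = n(k)          % 63
closest_value = A(k)      % approximately 1.3681 - 0.0056i
```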

How to determine the most repeated values in an interval of a vector array in MATLAB

This is my question:
I want to know which value is repeated, and how many times, within an interval of a vector. I know many people will tell me to use "hist", but I did, and the results aren't exact enough. Let me show you my problem in a picture:
In the picture above, the blue line is the "Data", and I have computed 3 kinds of values: 1st the "Mode", 2nd the "Mean" and finally the "Most repeated value in Histogram". For the last one I used something like [a,b]=hist(Data) and then took the value b(a==max(a)), and it is very important NOT to use a predefined range. However, this picture doesn't show the most repeated values, so let me show you another one, a closer view of the data:
The blue "Data", which varies between approximately (0-0.5)E-5, is the interval I need to obtain, but as you can see, the other three values are not close enough, and the "mode" value is just 0. I hope you can help me solve this problem. Thanks!
Ok to be more clear, I add this new pic:
What I'm looking for is an interval, like the one I marked manually in this example, 0.1 - 0.4 E-4 (in purple), so that the function returns:
[A,B]=magicfunction(Data);
A=[0.1E-4 0.4E-4]; B=[123];
where B=123 is the number of data points in that interval. As you can see, the only input is the vector "Data".
You can get the "Data" from this link:
https://drive.google.com/file/d/0B4WGV21GqSL5Vk0tRUdLNk5XVnc/edit?usp=sharing
Isn't taking the max of a hist in a range what you want? You almost got it; you just didn't define the bins well. For example:
range=4750:5050;
[counts val]=hist(data(range),unique(data(range)));
most_repeated_value_in_range=val(counts==max(counts));
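For exact repetitions (identical values rather than values that merely share a bin), the sketch below uses unique and accumarray to count how often each distinct value occurs. The small vector d is made up for illustration:

```matlab
d = [3 1 3 2 3 1];                 % illustrative data (assumption)

[vals, ~, ic] = unique(d);         % vals: distinct values, with d == vals(ic)
counts = accumarray(ic(:), 1);     % counts(k): number of occurrences of vals(k)

[maxcount, k] = max(counts);
most_repeated_value = vals(k)      % 3
maxcount                           % 3
```

With continuous measurement data, exact repeats are rare, which is why the binned histogram approach is usually needed.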
Edit:
Following the clarification, what you want is a statistical bound based on the histogram width around its maximum (the most frequent value). Here's a solution:
[c, v]=hist(data,linspace(min(data),max(data),num_of_bins)); % num_of_bins: number of histogram bins, your choice
range=find(c>1/exp(1)*max(c)); % can be also c>0.5*max(c) etc...
A=[v(range(1)) v(range(end))];
B=sum(c(range));
Let's test with some fake data:
t=linspace(-50,50,1e3);
data=0.3*exp(-(t-30).^2)+0.2*exp(-(t-10).^2)+0.3*exp(-(t+10).^2)+0.01*randn(1,numel(t));
[c, v]=hist(data,linspace(min(data),max(data),numel(t)));
range=find(c>1/exp(1)*max(c));
A=[v(range(1)) v(range(end))];
B=sum(c(range));
plot(t,data,'b'); hold on
plot([min(t) max(t)],[A(1) A(1)] ,'--r');
plot([min(t) max(t)],[A(2) A(2)] ,'--r');
B
B =
518
Of course, you can change the definition of the histogram "width". I used the width between the 1/e points, but you can take the full width at half maximum (c>0.5*max(c)), or something narrower, depending on the type of data.
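In recent MATLAB releases, hist is no longer recommended. The same 1/e-width idea can be sketched with histcounts, which returns bin edges instead of centers. The toy data and bin count are assumptions for illustration. The index variable is named in_peak so that it does not shadow the range function:

```matlab
% Toy data (assumption): 100 values at 0.1 and 10 outliers at 0.9
data = [0.1*ones(1,100), 0.9*ones(1,10)];
num_of_bins = 10;

edges = linspace(min(data), max(data), num_of_bins + 1);
c = histcounts(data, edges);              % counts per bin
v = edges(1:end-1) + diff(edges)/2;       % bin centers, like the second output of hist

in_peak = find(c > max(c)/exp(1));        % bins above 1/e of the peak count
A = [v(in_peak(1)) v(in_peak(end))]       % interval of the main peak (bin centers)
B = sum(c(in_peak))                       % samples in that interval: 100
```

To report the full extent of the selected bins instead of their centers, use edges(in_peak(1)) and edges(in_peak(end)+1).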
The function below is designed based on several assumptions:
The "interval" of interest is close to 0.
The majority of the samples are small.
The basic idea is to first filter out the samples that are too big, and then define the interval based on the sorted array of the remaining samples.
function [A, B] = magicfunction(data)
% Assuming the outlier samples only exist in the positive side, some
% samples of big, positive values can be excluded in order to obtain a
% better estimation of "the interval". Here we exclude the
% samples that are greater than mean(data)+K1*std(data), where K1 is empirically
% selected as 1.0
K1 = 1.0;
filtered_data = data( data < mean(data)+K1*std(data));
sorted_data = sort(filtered_data);
% Define the interval in terms of the percentile in the
% sorted_data. Here the interval is empirically selected as [0, 0.75]
interval = [0 0.75];
% Map the percentile interval to the actual index in sorted_data.
% Note that interval_index(1) cannot be smaller than 1, and
% interval_index(2) cannot be greater than length(sorted_data)
interval_index = round( length(sorted_data)*interval );
interval_index(1) = max(1, interval_index(1));
interval_index(2) = min(length(sorted_data), interval_index(2));
% Assign output A in terms of the value in the sorted_data
A = sorted_data(interval_index)
% Assign output B
B = sum( data>A(1) & data<A(2) )
% Visualization
x = [1:length(data)];
figure;
subplot(211);
plot(x, data, ...
x, repmat(A(:)', length(data),1) ); grid on;
legend('data', 'lower bound', 'upper bound');
xlim([1 20000]);
subplot(212);
plot(x, data, ...
x, repmat(A(:)', length(data),1) ); grid on;
legend('data', 'lower bound', 'upper bound');
ylim([0, 3*10^-5]);
xlim([1 20000]);
Feeding the data provided in your question into the function yields the following plot:
You may want to tune the two empirically chosen variables in the function, K1 and interval, to obtain the desired result.

Interpolate (upsample) an array of data

I have an array of 32766 values, which I would like to upsample to match other arrays of 65534 values.
I could also loop and repeat each value, but I have to do this several times.
Is there a way to increase the number of samples? I've seen the resample function, but it seems to be meant for a specific type of data object...
Edit
I was searching for the wrong term: I've found the function interp, which upsamples by an integer factor. I've used it and padded the result by replicating the last two values to match the other arrays. Is there a way to get the same size automatically?
You can use interp1:
x = 1:10;
y = x.*x;
%The x values that you want to be interpolated;
xi = 1:0.25:10;
yi = interp1(x,y,xi);
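To hit an exact target length automatically, rather than upsampling by an integer factor and padding, one option is to interpolate onto linspace positions spanning the same first and last samples. The new spacing is then 32765/65533 of the old one, just under one half. The stand-in signal is an assumption:

```matlab
y = sin(linspace(0, 4*pi, 32766));   % stand-in for the 32766-sample array (assumption)
target_len = 65534;                  % length of the other arrays

x  = 1:numel(y);                           % original sample positions
xi = linspace(1, numel(y), target_len);    % target_len evenly spaced positions, same endpoints
yi = interp1(x, y, xi);                    % linear interpolation onto the new positions

size(yi)   % 1 65534
```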
