I am a beginner in Stata and I am trying to create a loop regression, store the coefficients of the DVs and then plot these. Does this code make sense?
forvalues i = 1/100 {
regress y x1 x2 x3 if ID==`i'
matrix b1 = e(x1)
matrix b2 = e(x2)
matrix b3 = e(x3)
}
I am using coefplot right after and it just does not work. Any help will be very highly appreciated.
Does this code make sense? As you say, it does not work.
I see three bugs.
After each regression, the coefficients are not stored in e(x1) and so on. Such references are not illegal, but they just return missing values.
Similarly a command like
matrix b1 = e(x1)
just creates a 1 x 1 matrix with a single missing value.
Each time around the loop you just overwrite the previous matrices. Nothing in the code accumulates results, even if #1 and #2 were what you want.
Hence, a natural question is: where did this code come from?
There are several ways to get coefficients stored from say 100 regressions. See for example statsby and the community-contributed rangestat (SSC).
Related
Situation:
I was trying to compare two signal vectors (y1 & y2 with time vectors x1 & x2) with different lengths (len(y1)=1000>len(y2)=800). For this, I followed the main piece of advice given hardly everywhere: to use interp1 or spline. In order to 'expand' y2 towards y1 in number of samples through an interpolation.
So I want:
length(y1)=length(y2_interp)
However, in these functions you have to give the points 'x' where to interpolate (xq), so I generate a vector with the resampled points I want to compute:
xq = x2(1):(length(x2))/length(x1):x2(length(x2));
y2_interp = interp1(x2,y2,xq,'spline'); % or spline method directly
RMS = rms(y1-y2_interp)
The problem:
When I resample the x vector in 'xq' variable, as the faction of lengths is not an integer it gives me not the same length for 'y2_interp' as 'y1'. I cannot round it for the same problem.
I tried interpolate using the 'resample' function:
y2_interp=resample(y2,length(y1),length(y2),n);
But I get an aliasing problem and I want to avoid filters if possible. And if n=0 (no filters) I get some sampling problems and more RMS.
The two vectors are quite long, so my misalignment is just of 2 or 3 points.
What I'm looking for:
I would like to find a way of interpolating one vector but having as a reference the length of another one, and not the points where I want to interpolate.
I hope I have explained it well... Maybe I have some misconception. It's more than i'm curious about any possible idea.
Thanks!!
The function you are looking for here is linspace
To get an evenly spaced vector xq with the same endpoints as x2 but the same length as x1:
xq = (x2(1),x2(end),length(x1));
It is not sufficient to interpolate y2 to get the right number of samples, the samples should be at locations corresponding to samples of y1.
Thus, you want to interpolate y2 at the x-coordinates where you have samples for y1, which is given by x1:
y2_interp = interp1(x2,y2,x1,'spline');
RMS = rms(y1-y2_interp)
This is probably a trivial question, but I want to select a portion of a complex array in order to plot it in Matlab. My MWE is
n = 100;
t = linspace(-1,1,n);
x = rand(n,1)+1j*rand(n,1);
plot(t(45):t(55),real(x(45):x(55)),'.--')
plot(t(45):t(55),imag(x(45):x(55)),'.--')
I get an error
Error using plot
Vectors must be the same length.
because the real(x(45):x(55)) bit returns an empty matrix: Empty matrix: 1-by-0. What is the easiest way to fix this problem without creating new vectors for the real and imaginary x?
It was just a simple mistake. You were doing t(45):t(55), but t is generated by rand, so t(45) would be, say, 0.1, and t(55), 0.2, so 0.1:0.2 is only 0.1. See the problem?
Then when you did it for x, the range was different and thus the error.
What you want is t(45:55), to specify the vector positions from 45 to 55.
This is what you want:
n = 100;
t = linspace(-1,1,n);
x = rand(n,1)+1j*rand(n,1);
plot(t(45:55),real(x(45:55)),'.--')
plot(t(45:55),imag(x(45:55)),'.--')
I fail to understand why, in the below example, only x1 turns into a 1000 column array while y is a single number.
x = [0:1:999];
y = (7.5*(x))/(18000+(x));
x1 = exp(-((x)*8)/333);
Any clarification would be highly appreciated!
Why is x1 1x1000?
As given in the documentation,
exp(X) returns the exponential eˣ for each element in array X.
Since x is 1x1000, so -(x*8)/333 is 1x1000 and when exp() is applied on it, exponentials of all 1000 elements are computed and hence x1 is also 1x1000. As an example, exp([1 2 3]) is same as [exp(1) exp(2) exp(3)].
Why is y a single number?
As given in the documentation,
If A is a rectangular m-by-n matrix with m~= n, and B is a matrix
with n columns, then x = B/A returns a least-squares solution of the
system of equations x*A = B.
In your case,
A is 18000+x and size(18000+x) is 1x1000 i.e. m=1 and n=1000, and m~=n
and B is 7.5*x which has n=1000 columns.
⇒(7.5*x)/(18000+x) is returning you least-squares solution of equations x*(18000+x) = 7.5*x.
Final Remarks:
x = [0:1:999];
Brackets are unnecessary here and it should better be use like this: x=0:1:999 ;
It seems that you want to do element-wise division for computing x1 for which you should use ./ operator like this:
y=(7.5*x)./(18000+x); %Also removed unnecessary brackets
Also note that addition is always element-wise. .+ is not a valid MATLAB syntax (It works in Octave though). See valid arithmetic array and matrix operators in MATLAB here.
3. x1 also has some unnecessary brackets.
The question has already been answered by other people. I just want to point out a small thing. You do not need to write x = 0:1:999. It is better written as x = 0:999 as the default increment value used by MATLAB or Octave is 1.
Try explicitly specifying that you want to do element-wise operations rather than matrix operations:
y = (7.5.*(x))./(18000+(x));
In general, .* does elementwise multiplication, ./ does element-wise division, etc. So [1 2] .* [3 4] yields [3 8]. Omitting the dots will cause Matlab to use matrix operations whenever it can find a reasonable interpretation of your inputs as matrices.
I have a large data set with two arrays, say x and y. The arrays have over 1 million data points in size. Is there a simple way to do a scatter plot of only 2000 of these points but have it be representative of the entire set?
I'm thinking along the lines of creating another array r ; r = max(x)*rand(2000,1) to get a random sample of the x array. Is there a way to then find where a value in r is equal to, or close to a value in x ? They wouldn't have to be in the same indexed location but just throughout the whole matrix. We could then plot the y values associated with those found x values against r
I'm just not sure how to code this. Is there a better way than doing this?
I'm not sure how representative this procedure will be of your data, because it depends on what your data looks like, but you can certainly code up something like that. The easiest way to find the closest value is to take the min of the abs of the difference between your test vector and your desired value.
r = max(x)*rand(2000,1);
for i = 1:length(r)
[~,z(i)] = min(abs(x-r(i)));
end
plot(x(z),y(z),'.')
Note that the [~,z(i)] in the min line means we want to store the index of the minimum value in vector z.
You might also try something like a moving average, see this video: http://blogs.mathworks.com/videos/2012/04/17/using-convolution-to-smooth-data-with-a-moving-average-in-matlab/
Or you can plot every n points, something like (I haven't tested this, so no guarantees):
n = 1000;
plot(x(1:n:end),y(1:n:end))
Or, if you know the number of points you want (again, untested):
npoints = 2000;
interval = round(length(x)/npoints);
plot(x(1:interval:end),y(1:interval:end))
Perhaps the easiest way is to use round function and convert things to integers, then they can be compared. For example, if you want to find points that are within 0.1 of the values of r, multiply the values by 10 first, then round:
r = max(x) * round(2000,1);
rr = round(r / 0.1);
xx = round(x / 0.1);
inRR = ismember(xx, rr)
plot(x(inRR), y(inRR));
By dividing by 0.1, any values that have the same integer value are within 0.1 of each other.
ismember returns a 1 for each value of xx if that value is in rr, otherwise a 0. These can be used to select entries to plot.
I have two arrays of 2-by-2 complex matrices, and I was wondering what would be the fastest method of multiplying them. (I want to do matrix multiplication on the elements of the matrix arrays.) At present, I have
numpy.array(map(lambda i: numpy.dot(m1[i], m2[i]), range(l)))
But can one do better than this?
Thanks,
v923z
numpy.einsum is the optimal solution for this problem, and it is mentioned way down toward the bottom of DaveP's reference. The code is clean, very easy to understand, and an order of magnitude faster than looping through the array and doing the multiplication one by one. Here is some sample code:
import numpy
l = 100
m1 = rand(l,2,2)
m2 = rand(l,2,2)
m3 = numpy.array(map(lambda i: numpy.dot(m1[i], m2[i]), range(l)))
m3e = numpy.einsum('lij,ljk->lik', m1, m2)
%timeit numpy.array(map(lambda i: numpy.dot(m1[i], m2[i]), range(l)))
%timeit numpy.einsum('lij,ljk->lik', m1, m2)
print np.all(m3==m3e)
Here are the return values when run in an ipython notebook:
1000 loops, best of 3: 479 µs per loop
10000 loops, best of 3: 48.9 µs per loop
True
I think the answer you are looking for is here. Unfortunately it is a rather messy solution involving reshaping.
If m1 and m2 are 1-dimensional arrays of 2x2 complex matrices, then they essentially have shape (l,2,2). So matrix multiplication on the last two axes is equivalent to summing the product of the last axis of m1 with the second-to-last axis of m2. That's exactly what np.dot does:
np.dot(m1,m2)
Or, since you have complex matrices, perhaps you want to take the complex conjugate of m1 first. In that case, use np.vdot.
PS. If m1 is a list of 2x2 complex matrices, then perhaps see if you can rearrange your code to make m1 an array of shape (l,2,2) from the outset.
If that is not possible, a list comprehension
[np.dot(m1[i],m2[i]) for i in range(l)]
will be faster than using map with lambda, but performing l np.dots is going to be slower than doing one np.dot on two arrays of shape (l,2,2) as suggested above.
If m1 and m2 are 1-dimensional arrays of 2x2 complex matrices, then they essentially have shape (l,2,2). So matrix multiplication on the last two axes is equivalent to summing the product of the last axis of m1 with the second-to-last axis of m2. That's exactly what np.dot does:
But that is not what np.dot does.
a = numpy.array([numpy.diag([1, 2]), numpy.diag([2, 3]), numpy.diag([3, 4])])
produces a (3,2,2) array of 2-by-2 matrices. However, numpy.dot(a,a) creates 6 matrices, and the result's shape is (3, 2, 3, 2). That is not what I need. What I need is an array holding numpy.dot(a[0],a[0]), numpy.dot(a[1],a[1]), numpy.dot(a[2],a[2]) ...
[np.dot(m1[i],m2[i]) for i in range(l)]
should work, but I haven't yet checked, whether it is faster that the mapping of the lambda expression.
Cheers,
v923z
EDIT: the for loop and the map runs at about the same speed. It is the casting to numpy.array that consumes a lot of time, but that would have to be done for both methods, so there is no gain here.
May be it is too old question but i was still searching for an answer.
I tried this code
a=np.asarray(range(1048576),dtype='complex');b=np.reshape(a//1024,(1024,1024));b=b+1J*b
%timeit c=np.dot(b,b)
%timeit d=np.einsum('ij, ki -> jk', b,b).T
The results are : for 'dot'
10 loops, best of 3: 174 ms per loop
for 'einsum'
1 loops, best of 3: 4.51 s per loop
I have checked that c and d are same
(c==d).all()
True
still 'dot' is the winner, I am still searching for a better method but no success