Plotting logistic regression line - arrays

This is my first post ever here, so I'm not quite sure what the proper way to ask a question is. I tried to include pictures of the results, but since it's my first post, the website tells me I need 10 reputation points before I can post images, so I think my charts don't appear. Also, I'm French and not perfectly bilingual. Please be indulgent; I'm open to all comments and suggestions. I really need this for my master's project. Thank you very much!
I have two arrays, each containing thousands of values. One (x_1_3) holds all the temperature values, and the other (y_0_100) contains only 0s and 100s, one for each temperature in x_1_3 (sorted so that the 0s come first).
x_1_3 = array([[ 2.02],
[ 2.01],
[ 3.08],
...,
[ 0.16],
[ 0.17],
[-2.12]])
y_0_100 = array([ 0., 0., 0., ..., 100., 100., 100.])
A 0 in y_0_100 represents solid precipitation and a 100 represents liquid precipitation. I just want to plot a logistic regression line across my values.
(I also tried putting the values in a DataFrame, but it didn't work.)
dfsnow_rain
AirTemp liquid%
0 2.02 0.0
1 2.01 0.0
2 3.08 0.0
3 3.05 0.0
4 4.89 0.0
... ... ...
7526 0.78 100.0
7527 0.40 100.0
7528 0.16 100.0
7529 0.17 100.0
7530 -2.12 100.0
7531 rows × 2 columns
import matplotlib.pyplot as plt
import numpy as np
from scipy.special import expit
from sklearn import linear_model

X = x_1_3      # temperatures, shape (n_samples, 1)
y = y_0_100    # 0 = solid, 100 = liquid, shape (n_samples,)
# Fit the classifier
clf = linear_model.LogisticRegression(C=1e5)
clf.fit(X, y)
# and plot the result
plt.figure(1, figsize=(10, 5))
plt.clf()
plt.scatter(X.ravel(), y, color='black', zorder=20)
X_test = np.linspace(-15, 15, 300)
# expit(...) is the predicted probability of the class 100, so it lies in [0, 1]
loss = expit(X_test * clf.coef_ + clf.intercept_).ravel()
plt.plot(X_test, loss, color='red', linewidth=3)
# Ordinary least-squares line for comparison
ols = linear_model.LinearRegression()
ols.fit(X, y)
plt.plot(X_test, ols.coef_ * X_test + ols.intercept_, linewidth=1)
#plt.axhline(1, color='.5')
plt.ylabel('y')
plt.xlabel('X')
plt.xticks(range(-10, 10))
plt.yticks(range(0, 101, 10))  # ticks every 10 from 0 to 100
plt.ylim(0, 100)
plt.xlim(-10, 10)
plt.legend(('Logistic Regression Model', 'Linear Regression Model'),
           loc="lower right", fontsize='small')
plt.tight_layout()
plt.show()
Chart results
When I zoom in, I realize that my logistic regression line is not flat: it does curve, but only within a very small range (see picture below).
Chart when it's zoomed
I would like something more like this:
Logistic regression chart I would like
What am I doing wrong here? I just want to plot a regression line across my values, from y = 0 to y = 100.
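One likely explanation, offered as a hedged sketch rather than a confirmed diagnosis: expit(X_test * clf.coef_ + clf.intercept_) is the model's predicted probability of the class 100, so it only ranges from 0 to 1, while the y-axis runs from 0 to 100. The curve therefore hugs the bottom of the plot and only shows its S-shape when zoomed in. Rescaling the probability by 100 puts it on the same axis as y_0_100. The data below are synthetic stand-ins made up for illustration (the real x_1_3 and y_0_100 are not reproduced here); the threshold and noise level are arbitrary assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for x_1_3 / y_0_100 (made-up data for illustration only):
# warmer air is assumed to make liquid precipitation (100) more likely than solid (0).
rng = np.random.default_rng(0)
temps = rng.normal(1.0, 3.0, size=2000)
labels = np.where(temps + rng.normal(0.0, 1.0, size=temps.size) > 1.0, 100.0, 0.0)

X = temps.reshape(-1, 1)  # scikit-learn expects a 2-D (n_samples, n_features) array
clf = LogisticRegression(C=1e5, max_iter=1000)
clf.fit(X, labels)

X_test = np.linspace(-10, 10, 300)
# predict_proba returns probabilities in [0, 1], one column per class in clf.classes_
# (sorted), so column 1 is the probability of the label 100 (liquid).
prob_liquid = clf.predict_proba(X_test.reshape(-1, 1))[:, 1]

# Rescale to the 0-100 axis used by y_0_100 before plotting.
curve = 100.0 * prob_liquid
```

To draw it on the question's figure, the loss line could be replaced with something like plt.plot(X_test, curve, color='red', linewidth=3). Using predict_proba instead of calling expit by hand also avoids depending on the shapes of coef_ and intercept_.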

Related

Bounds Error in Julia When Working with Arrays

I'm trying to simulate a 3D random walk in Julia as a way to learn the ropes of Julia programming. I define all my variables and then initialize an (n_steps X 3) array of zeros that I want to use to store my coordinates when I do the walk. Here, "n_steps" is the number of steps in the walk, and the three columns correspond to the x, y, and z coordinates. When I try to update the array with my new coordinates, I get an error:
ERROR: LoadError: BoundsError: attempt to access 100×3 Array{Float64,2} at index [0, 1]
I don't understand why I'm getting this error. As far as I know, I'm looping through all the rows of the array and updating the x, y, and z coordinates. I never referenced index 0, since I specified that the loop starts at row 1. What is going on? Here is my code so far (I haven't plotted anything yet, since I can't progress without resolving this problem):
using Plots
using Random
len_step = 1
θ_min, θ_max = 0, pi
ϕ_min, ϕ_max = 0, 2 * pi
n_steps = 100
init = zeros(Float64, n_steps, 3)
for jj = 1:1:length(init)
θ_rand = rand(Float64)* (θ_max - θ_min)
ϕ_rand = rand(Float64)* (ϕ_max - ϕ_min)
x_rand = len_step * sin(θ_rand) * cos(ϕ_rand)
y_rand = len_step * sin(θ_rand) * sin(ϕ_rand)
z_rand = len_step * cos(θ_rand)
init[jj, 1] += init[jj-1, 1] + x_rand
init[jj, 2] += init[jj-1, 2] + y_rand
init[jj, 3] += init[jj-1, 3] + z_rand
end
print(init)
If it's relevant, I'm running Julia Version 1.4.2 on 64-Bit on Windows 10. I'd greatly appreciate any help. Thanks.
The function length returns the total number of elements in an array, as if it were one-dimensional. What you want is size:
julia> init = zeros(3,5)
3×5 Array{Float64,2}:
0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0
julia> length(init)
15
julia> size(init)
(3, 5)
julia> size(init, 2)
5
julia> size(init, 1)
3
Note also that in Julia, array indices start at 1, and since you access index jj-1, you cannot start the loop at 1; start it at 2 instead (for example, for jj in 2:size(init, 1)).
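For comparison, here is the corrected loop translated to Python with NumPy (a sketch, not the asker's Julia code). NumPy indices start at 0 rather than 1, so the loop starts at index 1, the second row, and it iterates over the number of rows (walk.shape[0]) rather than the total element count. The angle sampling mirrors the question's; the function name random_walk_3d is hypothetical.

```python
import numpy as np

def random_walk_3d(n_steps=100, len_step=1.0, seed=None):
    """Return an (n_steps, 3) array of positions; row 0 is the origin."""
    rng = np.random.default_rng(seed)
    walk = np.zeros((n_steps, 3))
    # Loop over rows only (walk.shape[0]), not walk.size (= n_steps * 3),
    # and start at the second row because each step reads the previous row.
    for jj in range(1, walk.shape[0]):
        theta = rng.uniform(0.0, np.pi)
        phi = rng.uniform(0.0, 2 * np.pi)
        step = len_step * np.array([
            np.sin(theta) * np.cos(phi),
            np.sin(theta) * np.sin(phi),
            np.cos(theta),
        ])
        walk[jj] = walk[jj - 1] + step
    return walk
```

Each step has length len_step, and the first row stays at the origin.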

Julia way to write k-step look ahead function?

Suppose I have two arrays representing a probabilistic graph:
  2
 / \
1   4 -> 5 -> 6 -> 7
 \ /
  3
Here the probability of going to state 2 is 0.81 and the probability of going to state 3 is (1 - 0.81) = 0.19. My arrays represent the estimated values of the states as well as the rewards. (Note: each index of an array represents its respective state.)
V = [0, 3, 8, 2, 1, 2, 0]
R = [0, 0, 0, 4, 1, 1, 1]
The context doesn't matter so much; it's just to give an idea of where I'm coming from. I need to write a k-step look-ahead function where I sum the discounted rewards and add the result to the estimated value of the k-th state.
I have been able to do this so far by creating separate functions for each step look ahead. My goal of asking this question is to figure out how to refactor this code so that I don't repeat myself and use idiomatic Julia.
Here is an example of what I am talking about:
function E₁(R::Array{Float64,1}, V::Array{Float64, 1}, P::Float64)
V[1] + 0.81*(R[1] + V[2]) + 0.19*(R[2] + V[3])
end
function E₂(R::Array{Float64,1}, V::Array{Float64, 1}, P::Float64)
V[1] + 0.81*(R[1] + R[3]) + 0.19*(R[2] + R[4]) + V[4]
end
function E₃(R::Array{Float64,1}, V::Array{Float64, 1}, P::Float64)
V[1] + 0.81*(R[1] + R[3]) + 0.19*(R[2] + R[4]) + R[5] + V[5]
end
...
And so on. It seems that if I were to ignore E₁(), this would be exceptionally easy to refactor. But because I have to discount the value estimate at two different states, I'm having trouble generalizing this to k steps.
I could obviously write a single function that takes an integer argument and then uses a bunch of if-statements, but that doesn't seem in the spirit of Julia. Any ideas on how I could refactor this? A closure of some sort? A different data type to store R and V?
It seems like you essentially have a discrete Markov chain. So the standard way would be to store the graph as its transition matrix:
T = zeros(7,7)
T[1,2] = 0.81
T[1,3] = 0.19
T[2,4] = 1
T[3,4] = 1
T[4,5] = 1
T[5,6] = 1
T[6,7] = 1
Then you can calculate the probabilities of ending up in each state, given an initial distribution, by multiplying by T' from the left (the transpose is needed because T is defined with rows as source states and columns as target states):
julia> T' * [1,0,0,0,0,0,0] # starting from (1)
7-element Array{Float64,1}:
0.0
0.81
0.19
0.0
0.0
0.0
0.0
Likewise, the probability of ending up at each state after k steps can be calculated by using powers of T':
julia> T' * T' * [1,0,0,0,0,0,0]
7-element Array{Float64,1}:
0.0
0.0
0.0
1.0
0.0
0.0
0.0
Now that you have all the probabilities after k steps, you can easily calculate expectations as well. It may also pay off to define T as a sparse matrix.
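A sketch of how the transition matrix turns the k-step look-ahead into a single loop, written in Python with NumPy for illustration. It assumes one common definition, which may differ in detail from the question's E₁, E₂, …: the reward R[s] is collected on entering state s, the reward at step t is discounted by gamma^(t-1), and the expected value of the state reached after k steps is discounted by gamma^k. States 1 to 7 are stored at indices 0 to 6, and the names k_step_lookahead and gamma are hypothetical.

```python
import numpy as np

# States 1..7 from the question, stored at indices 0..6.
V = np.array([0, 3, 8, 2, 1, 2, 0], dtype=float)
R = np.array([0, 0, 0, 4, 1, 1, 1], dtype=float)

# Transition matrix: T[i, j] = probability of moving from state i to state j.
T = np.zeros((7, 7))
T[0, 1] = 0.81
T[0, 2] = 0.19
T[1, 3] = 1.0
T[2, 3] = 1.0
T[3, 4] = 1.0
T[4, 5] = 1.0
T[5, 6] = 1.0

def k_step_lookahead(T, R, V, start, k, gamma=1.0):
    """Expected discounted reward over k steps plus the discounted value of the k-th state.

    Assumes the reward R[s] is collected on entering state s.
    """
    p = np.zeros(len(V))
    p[start] = 1.0                              # start with certainty in `start`
    total = 0.0
    for t in range(1, k + 1):
        p = T.T @ p                             # state distribution after t steps
        total += gamma ** (t - 1) * (p @ R)     # expected reward at step t
    return total + gamma ** k * (p @ V)         # plus expected value at step k
```

For example, k_step_lookahead(T, R, V, start=0, k=1) weights V[2] and V[3] (in the question's 1-based numbering) by 0.81 and 0.19, giving about 3.95 with gamma = 1. The same function covers every k, so no per-k function or if-chain is needed.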

How can I perform a matrix interpolation from a linearly spaced axis to a logarithmically spaced axis?

Does anyone know how I can interpolate an energy spectrum matrix with linearly spaced axes onto a matrix where one of the axes is logarithmically spaced instead?
The size of my energy spectrum matrix is 64x165. The original x axis represents the energy variation across directions and the original y axis represents the energy variation across frequencies. Both vectors are linearly spaced (the same interval between consecutive positions). I want to interpolate this matrix to a 24x25 format where the x axis (directions) remains linearly spaced (now a vector with 24 positions instead of 64), but the y axis (frequency) is no longer linearly spaced: it is a vector with different intervals between positions (the interval between positions 1 and 2 is smaller than the interval between positions 2 and 3, and so on up to position 25).
It is important to point out that all vectors (including the new logarithmically spaced frequency vector) are known; I don't want to generate them.
I tried the functions interp2 and griddata. Both gave the same result, but it is completely different from the original spectrum, which I would not expect since I only did an interpolation. Could anyone help? I'm using MATLAB 2011 on Windows.
Small example:
freq_input=[0.038592 0.042451 0.046311 0.05017 0.054029 0.057888 0.061747 0.065607 0.069466 0.073325]; %Linearly spaced
dir_input=[0 45 90 135 180 225 270 315]; %Linearly spaced
matrix_input=[0.004 0.006 1.31E-06 0.011 0.032 0.0007 0.010 0.013 0.001 0.008
0.007 0.0147 3.95E-05 0.023 0.142 0.003 0.022 0.022 0.003 0.017
0.0122 0.0312 0.0012 0.0351 0.285 0.024 0.048 0.036 0.015 0.036
0.0154 0.0530 0.0185 0.0381 0.242 0.102 0.089 0.058 0.060 0.075
0.0148 0.0661 0.1209 0.0345 0.095 0.219 0.132 0.087 0.188 0.140
0.0111 0.0618 0.2232 0.0382 0.027 0.233 0.156 0.119 0.370 0.187
0.0069 0.0470 0.1547 0.0534 0.010 0.157 0.154 0.147 0.436 0.168
0.0041 0.0334 0.0627 0.0646 0.009 0.096 0.136 0.163 0.313 0.112]; %8 lines (directions) and 10 columns (frequencies)
freq_output=[0.412E-01 0.453E-01 0.498E-01 0.548E-01 0.603E-01]; %Logarithmically spaced
dir_output=[0 45 90 135 180 225 270 315]; %The same as dir_input
After creating a meshgrid from the freq_input and dir_input vectors, and another from freq_output and dir_output, I tried interp2(freq_input,dir_input,matrix_input,freq_output,dir_output) and griddata(freq_input,dir_input,matrix_input,freq_output,dir_output), and the results seem wrong.
The course of action you described should work fine, so it's possible that you misinterpreted your results after interpolation when you said "the result seems wrong".
Here's what I mean, assuming your dummy data from the question:
% interpolate using griddata
matrix_output = griddata(freq_input,dir_input,matrix_input,freq_output.',dir_output);
% need 2d arrays later for scatter plotting the result
[freq_2d,dir_2d] = meshgrid(freq_output,dir_output);
figure;
% plot the original data
surf(freq_input,dir_input,matrix_input);
hold on;
scatter3(freq_2d(:),dir_2d(:),matrix_output(:),'rs');
The result shows the surface plot (based on the original input data) with the interpolated values superimposed on it as red squares.
You can see that the linearly interpolated data values follow the bilinear surface drawn by surf perfectly (rotating the figure around in 3d makes this even more obvious). In other words, the interpolation and subsequent plotting is fine.
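The same kind of interpolation can be sketched in Python with SciPy's RegularGridInterpolator, shown here as a swapped-in tool rather than the MATLAB calls from the question, using the small example data above. Because dir_output equals dir_input, every row of the output should equal a one-dimensional linear interpolation of the matching input row along frequency, which gives a quick sanity check when results seem wrong.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Small example data from the question (rows = directions, columns = frequencies).
freq_input = np.array([0.038592, 0.042451, 0.046311, 0.05017, 0.054029,
                       0.057888, 0.061747, 0.065607, 0.069466, 0.073325])
dir_input = np.array([0, 45, 90, 135, 180, 225, 270, 315], dtype=float)
matrix_input = np.array([
    [0.004, 0.006, 1.31e-06, 0.011, 0.032, 0.0007, 0.010, 0.013, 0.001, 0.008],
    [0.007, 0.0147, 3.95e-05, 0.023, 0.142, 0.003, 0.022, 0.022, 0.003, 0.017],
    [0.0122, 0.0312, 0.0012, 0.0351, 0.285, 0.024, 0.048, 0.036, 0.015, 0.036],
    [0.0154, 0.0530, 0.0185, 0.0381, 0.242, 0.102, 0.089, 0.058, 0.060, 0.075],
    [0.0148, 0.0661, 0.1209, 0.0345, 0.095, 0.219, 0.132, 0.087, 0.188, 0.140],
    [0.0111, 0.0618, 0.2232, 0.0382, 0.027, 0.233, 0.156, 0.119, 0.370, 0.187],
    [0.0069, 0.0470, 0.1547, 0.0534, 0.010, 0.157, 0.154, 0.147, 0.436, 0.168],
    [0.0041, 0.0334, 0.0627, 0.0646, 0.009, 0.096, 0.136, 0.163, 0.313, 0.112],
])
freq_output = np.array([0.412e-01, 0.453e-01, 0.498e-01, 0.548e-01, 0.603e-01])
dir_output = dir_input.copy()  # the same as dir_input, as in the question

# Linear interpolation on the regular (direction, frequency) grid.
interp = RegularGridInterpolator((dir_input, freq_input), matrix_input, method="linear")

# Every (direction, frequency) pair of the output grid; indexing="ij" keeps
# rows = directions and columns = frequencies, like matrix_input.
D, F = np.meshgrid(dir_output, freq_output, indexing="ij")
matrix_output = interp(np.stack([D, F], axis=-1))  # shape (8, 5)
```

Since the target frequency vector is simply passed in, nothing about the interpolation depends on it being logarithmically spaced; only its values need to lie inside the range of freq_input.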

Self organizing maps

I have a question regarding the self-organizing map (SOM) algorithm.
I know that we have an input vector and weight vectors. The weight vector with the minimum distance to the input is the best matching unit (BMU); that weight vector is updated, and then its neighbors are updated. After that, the learning rate is updated (assuming you have experience with SOM).
example
input
i1: (1, 1, 0, 0)
weight =
[.8 .4 .7 .3
.2 .6 .5 .9]
learning rate .6
Steps (simplified, dropping the Gaussian neighborhood function):
First iteration.
1- Find the minimum distance:
$d^2 = (0.2-1)^2 + (0.6-1)^2 + (0.5-0)^2 + (0.9-0)^2 = 1.86$
$d^2 = (0.8-1)^2 + (0.4-1)^2 + (0.7-0)^2 + (0.3-0)^2 = 0.98$ (this is the BMU)
2- Update the weight vector:
$w_{\text{unit 2}}^{\text{new}} = [0.8\ 0.4\ 0.7\ 0.3] + 0.6\,([1\ 1\ 0\ 0] - [0.8\ 0.4\ 0.7\ 0.3]) = [0.92\ 0.76\ 0.28\ 0.12]$
The resulting weights are:
.8 .4 .7 .3
.92 .76 .28 .12
My questions:
1- At the end, I'll have new weight vector values and the same input vectors. What should be plotted: the weights, the inputs, or something else?
2- If I am using MATLAB, do you have any idea which function to use to get a good illustration?
Following your very simple example, the initial weights are:
Initial weight =
[.8 .4 .7 .3
.2 .6 .5 .9]
and final weights should be (assuming all your calculations are correct):
Final weights =
[.92 .76 .28 .12
.2 .6 .5 .9]
Note that the winning unit (called the best matching unit) is the only one that should be updated here, since you have disregarded the neighborhood learning aspect of SOM.
These final weights are your result, and they are what should be plotted.
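The single update step can be checked numerically. This Python/NumPy sketch uses the same simplification (no neighborhood function, only the best matching unit moves) and treats each row of the weight matrix as one unit's weight vector, as in the worked example; the helper name som_step is hypothetical.

```python
import numpy as np

def som_step(weights, x, lr):
    """One simplified SOM step: update only the best matching unit (no neighborhood)."""
    d2 = ((weights - x) ** 2).sum(axis=1)   # squared Euclidean distance to each unit
    bmu = int(np.argmin(d2))                # index of the best matching unit
    new_weights = weights.copy()
    new_weights[bmu] += lr * (x - new_weights[bmu])
    return d2, bmu, new_weights

weights = np.array([[0.8, 0.4, 0.7, 0.3],
                    [0.2, 0.6, 0.5, 0.9]])
x = np.array([1.0, 1.0, 0.0, 0.0])
d2, bmu, new_weights = som_step(weights, x, lr=0.6)
```

Here d2 is approximately [0.98, 1.86], so the first row is the BMU, and only that row changes, to [0.92, 0.76, 0.28, 0.12]; the second row keeps its original values.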
I am learning the SOM algorithm these days and implementing it in Python. If you are familiar with Python, you can look at this link: som_test.
Your weight is
weight =
[.8 .4 .7 .3
.2 .6 .5 .9]
and your input value is
vector = [1, 1, 0, 0]
I think the output layer has 2 units, because the initial weight is a 2 by 4 matrix. You can plot both the input data and the weights.
The input value is
[[0.1961, 0.9806],
[-0.1961, 0.9806],
[0.9806, 0.1961],
[0.9806, -0.1961],
[-0.5812, -0.8137],
[-0.8137, -0.5812],]
Here the weight is a 3 by 2 matrix, so, as you can see in the plot, there are 3 Xs, which are the weights.
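To follow the suggestion of plotting both the inputs and the weights, here is a Python/matplotlib sketch that trains three units on the six 2-D inputs above and draws the weights as Xs on top of the inputs. The training uses the same simplified winner-take-all rule (no neighborhood); the random initialization, the decaying learning rate, the epoch count, and the function name train_winner_take_all are assumptions, not taken from som_test.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np

# The six 2-D input vectors listed above.
inputs = np.array([
    [0.1961, 0.9806],
    [-0.1961, 0.9806],
    [0.9806, 0.1961],
    [0.9806, -0.1961],
    [-0.5812, -0.8137],
    [-0.8137, -0.5812],
])

def train_winner_take_all(data, n_units=3, lr=0.6, epochs=20, seed=0):
    """Simplified SOM: only the best matching unit is updated (no neighborhood)."""
    rng = np.random.default_rng(seed)
    weights = rng.uniform(-0.5, 0.5, size=(n_units, data.shape[1]))
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate (assumed)
        for x in data:
            bmu = np.argmin(((weights - x) ** 2).sum(axis=1))
            weights[bmu] += rate * (x - weights[bmu])
    return weights

weights = train_winner_take_all(inputs)  # 3 by 2 matrix, one row per unit

fig, ax = plt.subplots()
ax.scatter(inputs[:, 0], inputs[:, 1], label="inputs")
ax.scatter(weights[:, 0], weights[:, 1], marker="x", s=100, label="weights")
ax.set_aspect("equal")
ax.legend()
```

Call plt.show() with an interactive backend, or fig.savefig(...), to see the figure. Units that never win keep their random initial position, which is one reason the neighborhood update matters in a full SOM.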

Simultaneous rotation in Matrix

Can anyone help me (again), please? I have a matrix like this:
1.0 0.0 0.0 2.5
0.0 1.0 0.0 0.0
0.0 0.0 1.0 0.0
0.0 0.0 0.0 1.0
How can I rotate it 20° in X axis, -128° in Y axis and 72.1° in Z axis simultaneously?
Thank you very much!
I want rotate … in X axis, … in Y axis and … in Z axis simultaneously
You can't. What you ask for is mathematically undefined. There are 6 permutations of the order in which the elementary rotations could be combined…
X Y Z
X Z Y
Y X Z
Y Z X
Z X Y
Z Y X
and each of them has a different result. Rotations don't work the way you think. Mathematically, rotations in three-dimensional space form the special orthogonal group SO(3) (the special unitary group SU(2) is its double cover), and this group is not commutative: applying the same elementary rotations in a different order generally gives a different rotation. Each rotation is unique, but it can be written as a composition of other rotations in infinitely many ways.
In your case there is no unique solution to the problem. The best thing you can do is choose a particular execution order and apply the rotations one after another to your existing coordinate system, by forming the corresponding rotation matrix and multiplying it onto the matrix representing the previous coordinate system/transformation step.
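To make the order dependence concrete, here is a NumPy sketch (Python chosen for illustration, since the question does not name a language). It assumes column vectors, right-handed rotations about fixed world axes, and that the rotation should be applied in the matrix's local frame by post-multiplying; those are choices, not the only valid ones. The helper names rot_x, rot_y, rot_z and homogeneous are hypothetical.

```python
import numpy as np

def rot_x(deg):
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

def rot_y(deg):
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def rot_z(deg):
    a = np.radians(deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def homogeneous(r3):
    """Embed a 3x3 rotation in a 4x4 transform with zero translation."""
    h = np.eye(4)
    h[:3, :3] = r3
    return h

# The matrix from the question: identity rotation, translation (2.5, 0, 0).
M = np.array([[1.0, 0.0, 0.0, 2.5],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])

# Two of the six possible orders, with the angles from the question.
R_xyz = rot_z(72.1) @ rot_y(-128) @ rot_x(20)   # X applied first, then Y, then Z
R_zyx = rot_x(20) @ rot_y(-128) @ rot_z(72.1)   # Z applied first, then Y, then X

# Apply the chosen order in the matrix's local frame (post-multiplication).
M_rotated = M @ homogeneous(R_xyz)
```

R_xyz and R_zyx are both valid rotations but differ, which is exactly why an order has to be chosen. With post-multiplication the translation column (2.5, 0, 0) is left unchanged; pre-multiplying, homogeneous(R_xyz) @ M, would instead rotate about the world axes and move the translation as well.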
