Self organizing maps - maps

I have a question regarding the self-organizing map (SOM) algorithm.
I know that we have an input vector and weight vectors. The weight vector with the minimum distance to the input is the best matching unit (BMU); that weight vector is updated, and then its neighbors are updated. After that, we update the learning rate (assuming you have experience with SOM).
example
input
i1: (1, 1, 0, 0)
weight =
[.8 .4 .7 .3
.2 .6 .5 .9]
learning rate .6
steps (simplified, dropping the Gaussian neighborhood function)
first iteration.
1- find the min distance
d^2 = (.2-1)^2 + (.6-1)^2 + (.5-0)^2 + (.9-0)^2 = 1.86
d^2 = (.8-1)^2 + (.4-1)^2 + (.7-0)^2 + (.3-0)^2 = .98 (this is the BMU)
2- update weight vector
new−unit2−weights = [.8 .4 .7 .3] + 0.6([1 1 0 0]-[.8 .4 .7 .3])
= [.92 .76 .28 .12]
the result of the weight is
.8 .4 .7 .3
.92 .76 .28 .12
my questions
1- At the end, I'll have new weight vector values and the same input vectors.
What should be plotted? The weights, the inputs, or something else?
If I am using MATLAB, do you have any idea what function to use to get a good illustration?

Following your very simple example, the initial weights are:
Initial weight =
[.8 .4 .7 .3
.2 .6 .5 .9]
and final weights should be (assuming all your calculations are correct):
Final weights =
[.92 .76 .28 .12
.2 .6 .5 .9]
Note that the winning unit, called the best matching unit, is the only one that should be updated here, since you have disregarded the neighborhood learning aspect of SOM.
These final weights are your result, and they are what should be plotted.
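The single update step above can be sketched in Python to check the arithmetic. This is only an illustration under the question's simplification (winner-only update, no neighborhood function); the function names `squared_distance` and `som_step` are made up for this sketch.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def som_step(weights, x, learning_rate):
    """One winner-only SOM update (neighborhood function dropped, as in the question)."""
    distances = [squared_distance(w, x) for w in weights]
    bmu = distances.index(min(distances))  # index of the best matching unit
    updated = [list(w) for w in weights]   # copy so the input is not modified
    updated[bmu] = [w + learning_rate * (xi - w)
                    for w, xi in zip(weights[bmu], x)]
    return bmu, distances, updated


weights = [[0.8, 0.4, 0.7, 0.3],
           [0.2, 0.6, 0.5, 0.9]]
x = [1, 1, 0, 0]

bmu, distances, new_weights = som_step(weights, x, 0.6)
print(bmu)                                             # 0-based index of the winner
print([round(d, 2) for d in distances])                # squared distances per unit
print([[round(w, 2) for w in row] for row in new_weights])
```

Only the row of the winning unit changes; the other row is returned untouched, which matches the final weights listed above.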

I am learning the SOM algorithm these days, and I am going to implement it in Python. If you are familiar with Python, you can have a look at this link: som_test.
Your weight is
weight =
[.8 .4 .7 .3
.2 .6 .5 .9]
and your input value is
vector = [1, 1, 0, 0]
I think the output layer has 2 units, because the initial weight is a 2 by 4 matrix. You can plot both the input data and the weights.
The input value is
[[0.1961, 0.9806],
[-0.1961, 0.9806],
[0.9806, 0.1961],
[0.9806, -0.1961],
[-0.5812, -0.8137],
[-0.8137, -0.5812],]
In the plot, the weight is a 3 by 2 matrix; as you can see in the image, the 3 Xs are the weights.

Related

Plotting logistic regression line

This is my first post ever here, so I'm not quite sure what the proper form is for asking a question. I'm trying to attach pictures of the results, but since it's my first post, the website tells me that I need 10 positive posts for some credibility, so I think my charts don't appear. Also, I'm French and not perfectly bilingual. Please be indulgent; I'm open to all comments and suggestions. I really need this for my master's project. Thank you very much!
I have two arrays which contain thousands of values. One (x_1_3) holds all the temperature values, and y_0_100 contains only 0's and 100's, one for each temperature in x_1_3, sorted.
x_1_3 = array([[ 2.02],
[ 2.01],
[ 3.08],
...,
[ 0.16],
[ 0.17],
[-2.12]])
y_0_100 = array([ 0., 0., 0., ..., 100., 100., 100.])
The 0 in y_0_100 represents solid precipitation and 100 represents liquid precipitation. I just want to plot a logistic regression line across my values.
(I also tried to put the values in a dataframe, but it didn't work.)
dfsnow_rain
      AirTemp  liquid%
0        2.02      0.0
1        2.01      0.0
2        3.08      0.0
3        3.05      0.0
4        4.89      0.0
...       ...      ...
7526     0.78    100.0
7527     0.40    100.0
7528     0.16    100.0
7529     0.17    100.0
7530    -2.12    100.0
7531 rows × 2 columns
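import numpy as np
import matplotlib.pyplot as plt
from scipy.special import expit
from sklearn import linear_model

X = x_1_3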
y = y_0_100
# Fit the classifier
clf = linear_model.LogisticRegression(C=1e5)
clf.fit(X, y)
# and plot the result
plt.figure(1, figsize=(10, 5))
plt.clf()
plt.scatter(X.ravel(), y, color='black', zorder=20)
X_test = np.linspace(-15, 15, 300)
loss = expit(X_test * clf.coef_ + clf.intercept_).ravel()
plt.plot(X_test, loss, color='red', linewidth=3)
ols = linear_model.LinearRegression()
ols.fit(X, y)
plt.plot(X_test, ols.coef_ * X_test + ols.intercept_, linewidth=1)
#plt.axhline(1, color='.5')
plt.ylabel('y')
plt.xlabel('X')
plt.xticks(range(-10, 10))
plt.yticks([0, 100, 10])
plt.ylim(0, 100)
plt.xlim(-10, 10)
plt.legend(('Logistic Regression Model', 'Linear Regression Model'),
loc="lower right", fontsize='small')
plt.tight_layout()
plt.show()
Chart results
When I zoom in, I realise that my logistic regression line is not flat; it is a curve that varies over a very small range (see picture below).
Chart when it's zoomed
I would like something more like this :
Logistic regression chart i would like
What am I doing wrong here? I just want to plot a regression line across my values from y = 0 to y = 100.
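One possible cause, offered as a guess rather than a confirmed diagnosis: the logistic function (expit) returns values between 0 and 1, while the labels and the y-axis run from 0 to 100, so the red curve would be squeezed into the bottom 1% of the plot. A minimal sketch of rescaling the curve to the 0-100 range follows. The helper name `percent_liquid_curve` and the coefficient values are hypothetical; with a fitted binary scikit-learn classifier you would pass `clf.coef_[0, 0]` and `clf.intercept_[0]` instead.

```python
import math


def percent_liquid_curve(coef, intercept, temperatures):
    """Logistic probability for each temperature, scaled to the 0-100 label range."""
    return [100.0 / (1.0 + math.exp(-(coef * t + intercept)))
            for t in temperatures]


# Hypothetical coefficients, for illustration only (not fitted to the real data).
temps = [-10 + 0.5 * k for k in range(41)]   # -10.0, -9.5, ..., 10.0
curve = percent_liquid_curve(2.0, -1.0, temps)

print(round(curve[0], 3), round(curve[-1], 3))
```

Plotting `temps` against `curve` with `plt.plot` would then give an S-shaped line spanning the same 0 to 100 range as the scattered labels.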

for loop condition ignored

I've just started learning Java a few days ago.
I was trying to create a method with two parameters that prints the products of the two numbers, until the product is less than or equal to 200.
For example: the inputs are 5.0 and 0.5. I want this operation: 5.0 * 0.5, print the result (2.5), then I want the update to be increased by 0.1, so from 0.5 to 0.6. Make the operation again 5.0 * 0.6, print the result, increase the update to 0.7 and so on, until the result of the operation n*n is <=200.
Just like this:
inputs: 5.0, 0.5
print:
2,5
3
3,5
...
200
I don't understand why the loop is infinite, I guess the condition is ignored. What is the problem?
Here are some variations on the code I wrote
void ration(double lbs, double increment) {
    for (double product = (lbs * increment); product <= 200.0; increment += 0.1) {
        System.out.println(lbs * increment);
    }
}
This prints an infinite sequence of the n*n results (let's say 5 10 15 20 25..)
void ration(double lbs, double increment) {
    for (double product = (lbs * increment); product <= 200.0; increment += 0.1) {
        System.out.println(product);
    }
}
This prints the first result of n * n forever... (let's say 5 5 5 5 5 5..)
What am I doing wrong? Any advice would be greatly appreciated.
Thank you
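In both versions, product is computed once in the loop initializer and never reassigned, so the condition product <= 200.0 always tests the very first value. The intended logic is sketched below in Python purely for illustration; the same idea in Java is to recompute the product on every iteration. The `step` and `limit` parameters and the small tolerance are assumptions of this sketch: the increment is derived from an integer counter so that floating-point drift from repeatedly adding 0.1 does not drop the final value.

```python
def ration(lbs, increment, step=0.1, limit=200.0):
    """Collect lbs * increment for growing increments while the product is <= limit."""
    products = []
    k = 0
    while True:
        current = increment + k * step   # derive from k instead of accumulating 0.1
        product = lbs * current          # recomputed on every iteration
        if product > limit + 1e-9:       # small tolerance for rounding error
            break
        products.append(round(product, 10))
        k += 1
    return products


values = ration(5.0, 0.5)
print(values[:3], values[-1], len(values))
```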

How can I perform a matrix interpolation from a linearly spaced axis to a logarithmically spaced axis?

Does anyone know how I can interpolate a linearly spaced energy spectrum matrix onto a matrix where one of the axes is logarithmically spaced instead of linearly spaced?
The size of my energy spectrum matrix is 64x165. The original x axis represents the energy variation in terms of directions and the original y axis represents the energy variation in terms of frequencies. Both vectors are linearly spaced (the same interval between consecutive positions). I want to interpolate this matrix to a 24x25 format where the x axis (directions) remains linearly spaced (now a vector with 24 positions instead of 64) but the y axis (frequency) is no longer linearly spaced; it is a vector with different intervals between positions (the interval between positions 1 and 2 is smaller than the interval between positions 2 and 3 of this vector, and so on up to position 25).
It is important to point out that all vectors (including the new logarithmically spaced frequency vector) are known (I don't want to generate them).
I tried the functions interp2 and griddata. Both gave the same result, but this result is completely different from the original spectrum (which I would not expect, since I only did an interpolation). Could anyone help? I'm using Matlab 2011 for Windows.
Small example:
freq_input=[0.038592 0.042451 0.046311 0.05017 0.054029 0.057888 0.061747 0.065607 0.069466 0.073325]; %Linearly spaced
dir_input=[0 45 90 135 180 225 270 315]; %Linearly spaced
matrix_input=[0.004 0.006 1.31E-06 0.011 0.032 0.0007 0.010 0.013 0.001 0.008
0.007 0.0147 3.95E-05 0.023 0.142 0.003 0.022 0.022 0.003 0.017
0.0122 0.0312 0.0012 0.0351 0.285 0.024 0.048 0.036 0.015 0.036
0.0154 0.0530 0.0185 0.0381 0.242 0.102 0.089 0.058 0.060 0.075
0.0148 0.0661 0.1209 0.0345 0.095 0.219 0.132 0.087 0.188 0.140
0.0111 0.0618 0.2232 0.0382 0.027 0.233 0.156 0.119 0.370 0.187
0.0069 0.0470 0.1547 0.0534 0.010 0.157 0.154 0.147 0.436 0.168
0.0041 0.0334 0.0627 0.0646 0.009 0.096 0.136 0.163 0.313 0.112]; %8 lines (directions) and 10 columns (frequencies)
freq_output=[0.412E-01 0.453E-01 0.498E-01 0.548E-01 0.603E-01]; %Logarithmically spaced
dir_output=[0 45 90 135 180 225 270 315]; %The same as dir_input
After building a meshgrid from the freq_input and dir_input vectors, and another from freq_output and dir_output, I tried interp2(freq_input,dir_input,matrix,freq_output,dir_output) and griddata(freq_input,dir_input,matrix,freq_output,dir_output), and the results seem wrong.
The course of action you described should work fine, so it's possible that you misinterpreted your results after interpolation when you said "the result seems wrong".
Here's what I mean, assuming your dummy data from the question:
% interpolate using griddata
matrix_output = griddata(freq_input,dir_input,matrix_input,freq_output.',dir_output);
% need 2d arrays later for scatter plotting the result
[freq_2d,dir_2d] = meshgrid(freq_output,dir_output);
figure;
% plot the original data
surf(freq_input,dir_input,matrix_input);
hold on;
scatter3(freq_2d(:),dir_2d(:),matrix_output(:),'rs');
The result shows the surface plot (based on the original input data) with red squares superimposed on it: the interpolated values
You can see that the linearly interpolated data values follow the bilinear surface drawn by surf perfectly (rotating the figure around in 3d makes this even more obvious). In other words, the interpolation and subsequent plotting is fine.
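As a cross-check outside MATLAB: in the small example the direction axis is unchanged, so linear interpolation onto the new grid reduces to 1-D linear interpolation along frequency, one direction row at a time. The Python sketch below illustrates that idea using the first two rows of the dummy data; `interp_linear` and `regrid_rows` are names invented for this sketch, not library functions.

```python
from bisect import bisect_right


def interp_linear(xs, ys, xq):
    """Linear interpolation of the samples (xs, ys) at xq; xs must be increasing."""
    if not xs[0] <= xq <= xs[-1]:
        raise ValueError("query point outside the input axis")
    i = min(bisect_right(xs, xq), len(xs) - 1)   # right end of the bracketing interval
    x0, x1 = xs[i - 1], xs[i]
    t = (xq - x0) / (x1 - x0)
    return ys[i - 1] + t * (ys[i] - ys[i - 1])


def regrid_rows(freq_in, matrix, freq_out):
    """Interpolate each direction row from freq_in onto freq_out."""
    return [[interp_linear(freq_in, row, f) for f in freq_out] for row in matrix]


freq_input = [0.038592, 0.042451, 0.046311, 0.05017, 0.054029,
              0.057888, 0.061747, 0.065607, 0.069466, 0.073325]
freq_output = [0.0412, 0.0453, 0.0498, 0.0548, 0.0603]
# First two direction rows of matrix_input from the question.
rows = [[0.004, 0.006, 1.31e-06, 0.011, 0.032, 0.0007, 0.010, 0.013, 0.001, 0.008],
        [0.007, 0.0147, 3.95e-05, 0.023, 0.142, 0.003, 0.022, 0.022, 0.003, 0.017]]

matrix_output = regrid_rows(freq_input, rows, freq_output)
for row in matrix_output:
    print([round(v, 5) for v in row])
```

Each interpolated value lies between its two neighbouring input samples, which is the same behaviour the red squares show on the surface plot.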

Correct way to get weighted average of concrete array-values along continuous interval

I've been searching the web for a while; however, I am probably missing the right terminology.
I have arbitrary sized arrays of scalars ...
array = [n_0, n_1, n_2, ..., n_m]
I also have a function f->x->y, with 0<=x<=1, and y an interpolated value from array. Examples:
array = [1,2,9]
f(0) = 1
f(0.5) = 2
f(1) = 9
f(0.75) = 5.5
My problem is that I want to compute the average value for some interval r = [a..b], where a ∈ [0..1] and b ∈ [0..1], i.e. I want to generalize my interpolation function f->x->y to compute the average along r.
I'm slightly confused about how to find the right weighting. Imagine I want to compute f([0.2,0.8]):
array     -->   1   |      2      |   9
[0..1]    --> 0.00  0.25  0.50  0.75  1.00
[0.2,0.8] -->     ^_________________^
The latter being the range of values I want to compute the average of.
Would it be mathematically correct to compute the average like this?
        1 * (1-0.8)   <- 0.2 'translated' to [0..0.25]
      + 2 * 1
avg = + 9 * 0.2       <- 0.8 'translated' to [0.75..1]
      ----------
        1.4           <-- the sum of weights
This looks correct.
In your example, your interval's length is 0.6. In that interval, your number 2 takes up (0.75-0.25)/0.6 = 0.5/0.6 = 10/12 of the space. Your number 1 takes up (0.25-0.2)/0.6 = 0.05/0.6 = 1/12 of the space, and likewise your number 9.
This sums up to 10/12 + 1/12 + 1/12 = 1.
For better intuition, think about it like this: The problem is to determine how much space each array-element covers along an interval. The rest is just filling the machinery described in http://en.wikipedia.org/wiki/Weighted_average#Mathematical_definition .
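The overlap-based weighting described in this answer can be sketched as follows. This is only an illustration: the cell edges [0, 0.25, 0.75, 1] are read off the question's diagram, and the function name `interval_average` is made up here.

```python
def interval_average(values, edges, a, b):
    """Average of piecewise-constant values over [a, b], weighted by overlap length.

    values[i] is assumed to cover the cell [edges[i], edges[i + 1]].
    """
    weighted_sum = 0.0
    total_overlap = 0.0
    for value, left, right in zip(values, edges, edges[1:]):
        overlap = max(0.0, min(right, b) - max(left, a))  # length of the cell inside [a, b]
        weighted_sum += value * overlap
        total_overlap += overlap
    return weighted_sum / total_overlap


values = [1, 2, 9]
edges = [0.0, 0.25, 0.75, 1.0]   # cell boundaries taken from the question's diagram

average = interval_average(values, edges, 0.2, 0.8)
print(round(average, 6))
```

With the weights 1/12, 10/12 and 1/12 from the answer, this evaluates to (1 + 20 + 9)/12 = 2.5.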

Finding the row with max separation between elements of an array in matlab

I have an array of size m x n. Each row has n elements, each of which is a probability (between 0 and 1). I want to find the row with the maximum difference between its elements; it would also be better if its nonzero elements are larger.
For example in array Arr:
Arr = [0.1 0 0.33 0 0.55 0;
0.01 0 0.10 0 0.2 0;
1 0.1 0 0 0 0;
0.55 0 0.33 0 0.15 0;
0.17 0.17 0.17 0.17 0.17 0.17]
the best row would be the 3rd row, because its values are more distinct and larger. How can I compute this using Matlab?
It seems that you're looking for the row with the greatest standard deviation, which is basically a measure of how much the values vary from the average.
If you want to ignore zero elements, use Shai's useful suggestion to replace zero elements with NaN. Indeed, some of MATLAB's built-in functions can ignore them:
Arr2 = Arr;
Arr2(~Arr) = NaN;
To find the standard deviation we'll employ nanstd (not std, because it doesn't ignore NaN values) along the rows, i.e. the 2nd dimension:
nanstd(Arr2, 0, 2)
To find the greatest standard deviation and its corresponding row index, we'll apply nanmax and obtain both output variables:
[stdmax, idx] = nanmax(nanstd(Arr2, 0, 2));
Now idx holds the index of the desired row.
Example
Let's run this code on the input that you provided in your question:
Arr = [0.1 0 0.33 0 0.55 0;
0.01 0 0.10 0 0.2 0;
1 0.1 0 0 0 0;
0.55 0 0.33 0 0.15 0;
0.17 0.17 0.17 0.17 0.17 0.17];
Arr2 = Arr;
Arr2(~Arr) = NaN;
[maxstd, idx] = nanmax(nanstd(Arr2, 0, 2))
idx =
3
Note that the values in row #3 differ one from another much more than those in row #1, and therefore the standard deviation of row #3 is greater. This also corresponds to your comment:
... ergo a row with 3 zero and 3 non-zero but close values is worse than a row with 4 zeros and 2 very different values.
For this reason I believe that in this case 3 is indeed the correct answer.
It seems like you wish to ignore 0s in your matrix. You may achieve this by setting them to NaN and then using special built-in functions that ignore NaNs (e.g., nanmin, nanmax, etc.).
Here is sample code for finding the row (ri) with the largest difference between the minimal (nonzero) response and the maximal response:
nArr = Arr;
nArr( Arr == 0 ) = NaN; % replace zeros with NaNs
mn = nanmin(nArr, [], 2); % find minimal, non zero response at each row
mx = nanmax(nArr, [], 2); % maximal response
[~, ri] = nanmax( mx - mn ); % find the row with maximal difference
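For comparison, both criteria from the answers above (standard deviation of the nonzero entries, and max minus min of the nonzero entries) can be sketched in plain Python. The helper names are invented for this sketch, and treating a row with fewer than two nonzero values as having zero spread is an assumption made here.

```python
from statistics import stdev


def nonzero(row):
    """Nonzero entries of a row (the analogue of replacing zeros with NaN)."""
    return [v for v in row if v != 0]


def row_spread(row):
    """Sample standard deviation of the nonzero entries (0.0 if fewer than two)."""
    vals = nonzero(row)
    return stdev(vals) if len(vals) > 1 else 0.0


def row_range(row):
    """Max minus min of the nonzero entries (0.0 if there are none)."""
    vals = nonzero(row)
    return max(vals) - min(vals) if vals else 0.0


arr = [[0.1, 0, 0.33, 0, 0.55, 0],
       [0.01, 0, 0.10, 0, 0.2, 0],
       [1, 0.1, 0, 0, 0, 0],
       [0.55, 0, 0.33, 0, 0.15, 0],
       [0.17, 0.17, 0.17, 0.17, 0.17, 0.17]]

spreads = [row_spread(r) for r in arr]
ranges = [row_range(r) for r in arr]
best_by_std = spreads.index(max(spreads)) + 1    # 1-based, like MATLAB
best_by_range = ranges.index(max(ranges)) + 1    # 1-based, like MATLAB
print(best_by_std, best_by_range)
```

On this input both criteria select the same row, so the choice between them only matters for data where spread and range disagree.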
