I am trying to build a bar chart with the bars shown in a descending order.
In my code, the numpy array is a result of using SelectKmeans() to select the best features in a machine learning problem depending on their variance.
import numpy as np
import matplotlib.pyplot as plt
flist = ['int_rate', 'installment', 'log_annual_inc','dti', 'fico', 'days_with_cr_line', 'revol_bal', 'revol_util', 'inq_last_6mths','pub_rec']
fimportance = np.array([250.14120228,23.95686725,10.71979245,13.38566487,219.41737141,
8.19261323,27.69341779,64.96469182,218.77495366,22.7037686 ]) # this is the numpy.ndarray after running SelectKBest()
print(fimportance) # this gives me 'int_rate', 'fico', 'revol_util', 'inq_last_6mths' as 4 most #important features as their variance values are mapped to flist, e.g. 250 relates to'int_rate' and 218 relates to 'inq_last_6mths'.
[250.14120228 23.95686725 10.71979245 13.38566487 219.41737141
8.19261323 27.69341779 64.96469182 218.77495366 22.7037686 ]
So I want to show these values on my bar chart in descending order, with int_rate on top.
fimportance_sorted = np.sort(fimportance)
fimportance_sorted
array([250.14120228, 219.41737141, 218.77495366, 64.96469182,
27.69341779, 23.95686725, 22.7037686 , 13.38566487,
10.71979245, 8.19261323])
# this bar chart is not right because here the values and indices are messed up.
plt.barh(flist, fimportance_sorted)
plt.show()
Next I have tried this.
plt.barh([x for x in range(len(fimportance))], fimportance)
I understand I need to map these indices to the flist values somehow and then sort them. Maybe by creating an array and then mapping my list labels instead of its index. here I am stuck.
for i,v in enumerate(fimportance):
arr = np.array([i,v])
.....
Thank you for your help with this problem.
the values and indices are messed up
That's because you sorted fimportance (fimportance_sorted = np.sort(fimportance)), but the order of labels in flist remained unchanged, so now labels don't correspond to the values in fimportance_sorted.
You can use numpy.argsort to get the indices that would put fimportance into sorted order and then index both flist and fimportance with these indices:
>>> import numpy as np
>>> flist = ['int_rate', 'installment', 'log_annual_inc','dti', 'fico', 'days_with_cr_line', 'revol_bal', 'revol_util', 'inq_last_6mths','pub_rec']
>>> fimportance = np.array([250.14120228,23.95686725,10.71979245,13.38566487,219.41737141,
... 8.19261323,27.69341779,64.96469182,218.77495366,22.7037686 ])
>>> idx = np.argsort(fimportance)
>>> idx
array([5, 2, 3, 9, 1, 6, 7, 8, 4, 0])
>>> flist[idx]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: only integer scalar arrays can be converted to a scalar index
>>> np.array(flist)[idx]
array(['days_with_cr_line', 'log_annual_inc', 'dti', 'pub_rec',
'installment', 'revol_bal', 'revol_util', 'inq_last_6mths', 'fico',
'int_rate'], dtype='<U17')
>>> fimportance[idx]
array([ 8.19261323, 10.71979245, 13.38566487, 22.7037686 ,
23.95686725, 27.69341779, 64.96469182, 218.77495366,
219.41737141, 250.14120228])
idx is the order in which you need to put elements of fimportance to sort it. The order of flist must match the order of fimportance, so index both with idx.
As a result, elements of np.array(flist)[idx] correspond to elements of fimportance[idx].
The Goal
(Forgive me for length of this, it's mostly background and detail.)
I'm contributing to a TOML encoder/decoder for MATLAB and I'm working with numerical arrays right now. I want to input (and then be able to write out) the numerical array in the same format. This format is the nested square-bracket format that is used by numpy.array. For example, to make multi-dimensional arrays in numpy:
The following is in python, just to be clear. It is a useful example though my work is in MATLAB.
2D arrays
>> x = np.array([1,2])
>> x
array([1, 2])
>> x = np.array([[1],[2]])
>> x
array([[1],
[2]])
3D array
>> x = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])
>> x
array([[[1, 2],
[3, 4]],
[[5, 6],
[7, 8]]])
4D array
>> x = np.array([[[[1,2],[3,4]],[[5,6],[7,8]]],[[[9,10],[11,12]],[[13,14],[15,16]]]])
>> x
array([[[[ 1, 2],
[ 3, 4]],
[[ 5, 6],
[ 7, 8]]],
[[[ 9, 10],
[11, 12]],
[[13, 14],
[15, 16]]]])
The input is a logical construction of the dimensions by nested brackets. Turns out this works pretty well with the TOML array structure. I can already successfully parse and decode any size/any dimension numeric array with this format from TOML to MATLAB numerical array data type.
Now, I want to encode that MATLAB numerical array back into this char/string structure to write back out to TOML (or whatever string).
So I have the following 4D array in MATLAB (same 4D array as with numpy):
>> x = permute(reshape([1:16],2,2,2,2),[2,1,3,4])
x(:,:,1,1) =
1 2
3 4
x(:,:,2,1) =
5 6
7 8
x(:,:,1,2) =
9 10
11 12
x(:,:,2,2) =
13 14
15 16
And I want to turn that into a string that has the same format as the 4D numpy input (with some function named bracketarray or something):
>> str = bracketarray(x)
str =
'[[[[1,2],[3,4]],[[5,6],[7,8]]],[[[9,10],[11,12]],[[13,14],[15,16]]]]'
I can then write out the string to a file.
EDIT: I should add, that the function numpy.array2string() basically does exactly what I want, though it adds some other whitespace characters. But I can't use that as part of the solution, though it is basically the functionality I'm looking for.
The Problem
Here's my problem. I have successfully solved this problem for up to 3 dimensions using the following function, but I cannot for the life of me figure out how to extend it to N-dimensions. I feel like it's an issue of the right kind of counting for each dimension, making sure to not skip any and to nest the brackets correctly.
Current bracketarray.m that works up to 3D
function out = bracketarray(in, internal)
in_size = size(in);
in_dims = ndims(in);
% if array has only 2 dimensions, create the string
if in_dims == 2
storage = cell(in_size(1), 1);
for jj = 1:in_size(1)
storage{jj} = strcat('[', strjoin(split(num2str(in(jj, :)))', ','), ']');
end
if exist('internal', 'var') || in_size(1) > 1 || (in_size(1) == 1 && in_dims >= 3)
out = {strcat('[', strjoin(storage, ','), ']')};
else
out = storage;
end
return
% if array has more than 2 dimensions, recursively send planes of 2 dimensions for encoding
else
out = cell(in_size(end), 1);
for ii = 1:in_size(end) %<--- this doesn't track dimensions or counts of them
out(ii) = bracketarray(in(:,:,ii), 'internal'); %<--- this is limited to 3 dimensions atm. and out(indexing) need help
end
end
% bracket the final bit together
if in_size(1) > 1 || (in_size(1) == 1 && in_dims >= 3)
out = {strcat('[', strjoin(out, ','), ']')};
end
end
Help me Obi-wan Kenobis, y'all are my only hope!
EDIT 2: Added test suite below and modified current code a bit.
Test Suite
Here is a test suite to use to see if the output is what it should be. Basically just copy and paste it into the MATLAB command window. For my current posted code, they all return true except the ones more than 3D. My current code outputs as a cell. If your solution output differently (like a string), then you'll have to remove the curly brackets from the test suite.
isequal(bracketarray(ones(1,1)), {'[1]'})
isequal(bracketarray(ones(2,1)), {'[[1],[1]]'})
isequal(bracketarray(ones(1,2)), {'[1,1]'})
isequal(bracketarray(ones(2,2)), {'[[1,1],[1,1]]'})
isequal(bracketarray(ones(3,2)), {'[[1,1],[1,1],[1,1]]'})
isequal(bracketarray(ones(2,3)), {'[[1,1,1],[1,1,1]]'})
isequal(bracketarray(ones(1,1,2)), {'[[[1]],[[1]]]'})
isequal(bracketarray(ones(2,1,2)), {'[[[1],[1]],[[1],[1]]]'})
isequal(bracketarray(ones(1,2,2)), {'[[[1,1]],[[1,1]]]'})
isequal(bracketarray(ones(2,2,2)), {'[[[1,1],[1,1]],[[1,1],[1,1]]]'})
isequal(bracketarray(ones(1,1,1,2)), {'[[[[1]]],[[[1]]]]'})
isequal(bracketarray(ones(2,1,1,2)), {'[[[[1],[1]]],[[[1],[1]]]]'})
isequal(bracketarray(ones(1,2,1,2)), {'[[[[1,1]]],[[[1,1]]]]'})
isequal(bracketarray(ones(1,1,2,2)), {'[[[[1]],[[1]]],[[[1]],[[1]]]]'})
isequal(bracketarray(ones(2,1,2,2)), {'[[[[1],[1]],[[1],[1]]],[[[1],[1]],[[1],[1]]]]'})
isequal(bracketarray(ones(1,2,2,2)), {'[[[[1,1]],[[1,1]]],[[[1,1]],[[1,1]]]]'})
isequal(bracketarray(ones(2,2,2,2)), {'[[[[1,1],[1,1]],[[1,1],[1,1]]],[[[1,1],[1,1]],[[1,1],[1,1]]]]'})
isequal(bracketarray(permute(reshape([1:16],2,2,2,2),[2,1,3,4])), {'[[[[1,2],[3,4]],[[5,6],[7,8]]],[[[9,10],[11,12]],[[13,14],[15,16]]]]'})
isequal(bracketarray(ones(1,1,1,1,2)), {'[[[[[1]]]],[[[[1]]]]]'})
I think it would be easier to just loop and use join. Your test cases pass.
function out = bracketarray_matlabbit(in)
out = permute(in, [2 1 3:ndims(in)]);
out = string(out);
dimsToCat = ndims(out);
if iscolumn(out)
dimsToCat = dimsToCat-1;
end
for i = 1:dimsToCat
out = "[" + join(out, ",", i) + "]";
end
end
This also seems to be faster than the route you were pursing:
>> x = permute(reshape([1:16],2,2,2,2),[2,1,3,4]);
>> tic; for i = 1:1e4; bracketarray_matlabbit(x); end; toc
Elapsed time is 0.187955 seconds.
>> tic; for i = 1:1e4; bracketarray_cris_luengo(x); end; toc
Elapsed time is 5.859952 seconds.
The recursive function is almost complete. What is missing is a way to index the last dimension. There are several ways to do this, the neatest, I find, is as follows:
n = ndims(x);
index = cell(n-1, 1);
index(:) = {':'};
y = x(index{:}, ii);
It's a little tricky at first, but this is what happens: index is a set of n-1 strings ':'. index{:} is a comma-separated list of these strings. When we index x(index{:},ii) we actually do x(:,:,:,ii) (if n is 4).
The completed recursive function is:
function out = bracketarray(in)
n = ndims(in);
if n == 2
% Fill in your n==2 code here
else
% if array has more than 2 dimensions, recursively send planes of 2 dimensions for encoding
index = cell(n-1, 1);
index(:) = {':'};
storage = cell(size(in, n), 1);
for ii = 1:size(in, n)
storage(ii) = bracketarray(in(index{:}, ii)); % last dimension automatically removed
end
end
out = { strcat('[', strjoin(storage, ','), ']') };
Note that I have preallocated the storage cell array, to prevent it from being resized in every loop iteration. You should do the same in your 2D case code. Preallocating is important in MATLAB for performance reasons, and the MATLAB Editor should warm you about this too.