Converting pandas data frame into numpy ndarray [duplicate] - arrays

I am using a pandas data frame to clean and process data. However, I then need to convert it into a numpy ndarray in order to exploit matrix multiplication. I turn the data frame into a list of lists with the following:
x = df.tolist()
This returns the following structure:
[[1, 2], [3, 4], [5, 6], [7, 8] ...]
I then convert it into a numpy array like this:
x = np.array(x)
However, the following print:
print(type(x))
print(type(x[0]))
gives this result:
'numpy.ndarray'
'numpy.float64'
However, I need them both to be numpy arrays. If it's not from a pandas data frame and I just convert a hard-coded list of lists then they are both ndarrays. How do I get the list, and the lists in that list to be ndarrays when that list has been made from a data frame? Many thanks for reading, this has had me stumped for hours.

I think you need values:
df = pd.DataFrame({'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0]})
print (df)
C D
0 7 1
1 8 3
2 9 5
3 4 7
4 2 1
5 3 0
x = df.values
print (x)
[[7 1]
[8 3]
[9 5]
[4 7]
[2 1]
[3 0]]
And then select by indexing:
print (x[:,0])
[7 8 9 4 2 3]
print (x[:,1])
[1 3 5 7 1 0]
print (type(x[:,0]))
<class 'numpy.ndarray'>
It is also possible to transpose the array:
x = df.values.T
print (x)
[[7 8 9 4 2 3]
[1 3 5 7 1 0]]
print (x[0])
[7 8 9 4 2 3]
print (x[1])
[1 3 5 7 1 0]

How about as_matrix:
x = df.as_matrix()

You may want to try df.get_values() and, if necessary, np.reshape the result.
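Note that as_matrix() and get_values() were deprecated and, as far as I know, removed in pandas 1.0, so on a recent install they raise AttributeError. A minimal sketch of the same conversion with df.to_numpy() (available since pandas 0.24) follows; the weights matrix is a made-up example, only there to show that matrix multiplication works on the result:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0]})

x = df.to_numpy()                    # 2-D ndarray of shape (6, 2)
first_row = x[0]                     # indexing a 2-D array returns an ndarray, not a scalar

# Hypothetical (2, 1) matrix, used only to demonstrate matrix multiplication
weights = np.array([[1.0], [2.0]])
product = x @ weights                # shape (6, 1)

print(type(x), type(first_row), product.shape)
```

If x[0] comes back as numpy.float64, the array is one-dimensional, which usually means a single column (a Series) was converted rather than the whole data frame.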

Related

How can I add 1 in the beginning of a numpy array?

I have a numpy array X = [[3 4 5 6] [6 5 3 3] [9 8 5 2]]
I would like to add 1 in each array like so:
X = [[1 3 4 5 6] [1 6 5 3 3] [1 9 8 5 2]]
I wanted to do it using np.ones() and np.hstack()
This is what I tried to do
X = np.array([[3, 4, 5, 6], [6, 5, 3, 3], [9, 8, 5, 2]])
ones = np.ones(len(X))
X = np.hstack((ones, X))
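The attempt above fails because np.ones(len(X)) is one-dimensional with shape (3,), while X is two-dimensional, and np.hstack needs all inputs to have the same number of dimensions. A minimal sketch of one way around this is to build the ones as a column of shape (3, 1); the name X_with_ones is just for illustration:

```python
import numpy as np

X = np.array([[3, 4, 5, 6],
              [6, 5, 3, 3],
              [9, 8, 5, 2]])

# One column of ones, with as many rows as X and the same dtype
ones = np.ones((X.shape[0], 1), dtype=X.dtype)

# Stack horizontally: the ones column goes in front of the existing columns
X_with_ones = np.hstack((ones, X))
print(X_with_ones)
```

Matching the dtype keeps the result as integers; with the default np.ones dtype the whole array would be upcast to float.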

NumPy Slicing HackerRank

I have written a function named array_slice which gets four numbers n, n_dim, n_row, n_col from the user and performs the array operations given below.
Instructions:
Create an array x of shape (n_dim, n_row, n_col), having first n natural numbers.
Create a Boolean array b of shape (2,).
Print the values for following expressions: x[b] and x[b,:,1:3]
For example, given the input 30, 2, 3, 5 for the parameters n, n_dim, n_row, n_col respectively, the output will be:
[[[ 0 1 2 3 4] [ 5 6 7 8 9] [10 11 12 13 14]]]
[[[ 1 2] [ 6 7] [11 12]]]
The written code is:
import numpy as np
# Enter your code here. Read input from STDIN. Print output to STDOUT
def array_slice(n,n_dim,n_row,n_col):
    x=np.array(n, dtype=int, ndmin=n_dim).reshape(n_row,n_col)
    b=np.array([True,False],dtype="bool",ndmin=n_dim).reshape(2,)
    print(x[b])
    print(x[b,:,1:3])
if __name__ == '__main__':
    n = int(input())
    n_dim = int(input())
    n_row = int(input())
    n_col = int(input())
    array_slice(n,n_dim,n_row,n_col)
I went through the official NumPy documentation, but I still couldn't understand the error. I tried every approach I could think of with arange and array, but I'm unable to find a solution. Please help me out.
This passed all test cases:
x = np.arange(n, dtype=int).reshape(n_dim, n_row, n_col)
b = np.array([True, False], dtype="bool", ndmin=n_dim).reshape(2,)
print(x[b])
print(x[b, :, 1:3])
I tried the following code for the x array using np.arange:
x = np.arange(n, dtype=int).reshape(n_dim, n_row, n_col)
It works and prints:
[[[ 0 1 2 3 4]
[ 5 6 7 8 9]
[10 11 12 13 14]]]
[[[ 1 2]
[ 6 7]
[11 12]]]
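For what it's worth, the original attempt fails because np.array(n, ndmin=n_dim) builds an array holding the single value n (shape (1, 1) when n_dim is 2), and that cannot be reshaped to (n_row, n_col). np.arange(n) is what produces the first n numbers. Below is a minimal sketch of the same function that returns the two results instead of printing them, so they can be inspected. It assumes n == n_dim * n_row * n_col and n_dim == 2, as in the example input:

```python
import numpy as np

def array_slice(n, n_dim, n_row, n_col):
    # The numbers 0 .. n-1 arranged as n_dim blocks of n_row x n_col
    x = np.arange(n, dtype=int).reshape(n_dim, n_row, n_col)
    # Boolean mask over the first axis: keep block 0, drop block 1
    b = np.array([True, False], dtype=bool)
    return x[b], x[b, :, 1:3]

first, second = array_slice(30, 2, 3, 5)
print(first)   # shape (1, 3, 5): the first block only
print(second)  # shape (1, 3, 2): columns 1 and 2 of that block
```

The mask must have the same length as the axis it indexes, which is why this only works as written when n_dim is 2.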

How to remove duplicates from a numpy array with multiple dimensions

Let's say I have the following array:
board = np.random.randint(1, 9, size=(2, 5))
How do I remove duplicates from each row of the array? For example:
[[6 1 2 8 4]
[8 3 2 3 6]]
Here there are two 3s in the second row, and I want one of them to be deleted. How can I do that?
Given your example, it seems that you don't want any repetition within a row. You may be interested in numpy.random.choice; try something like this:
import numpy as np
nb_lines = 2
nb_columns = 5
min_value = 1
max_value = 9
range_value = max_value - min_value
# The number of columns must not exceed the number of distinct values,
# otherwise no duplicate-free row exists
assert range_value + 1 >= nb_columns
board = min_value + np.array([
    # Draw nb_columns distinct offsets for each row
    np.random.choice(np.arange(range_value + 1), nb_columns, replace=False)
    for _ in range(nb_lines)
])
print(board)
Output:
% python3 script.py
[[7 4 6 3 1]
[2 8 6 4 3]]
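The same idea can be sketched without a Python-level loop: sorting a row of random keys yields a random permutation of the candidate values, and keeping the first nb_columns entries gives distinct values in each row. This is a different technique from the numpy.random.choice answer above, and the bounds here are an assumption chosen to match np.random.randint(1, 9), whose upper bound is exclusive:

```python
import numpy as np

rng = np.random.default_rng(0)

nb_lines, nb_columns = 2, 5
min_value, max_value = 1, 8            # inclusive bounds (assumed, to match randint(1, 9))
n_values = max_value - min_value + 1

# A duplicate-free row needs at least as many candidate values as columns
assert n_values >= nb_columns

# Each row of argsort is a random permutation of 0 .. n_values-1
keys = rng.random((nb_lines, n_values))
board = min_value + np.argsort(keys, axis=1)[:, :nb_columns]
print(board)
```

Duplicates can still appear across different rows; only repetition within a row is ruled out.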

How to plot each column of a cell array?

I have a cell array that looks like this:
Column1 Column2
[1 2 3 4] [2 5 6 9]
[1 3 4] [3 4 7 8]
[2 3 4] [1 3 7 9]
[1 2 4] [1 4 6 8]
There are a few more columns that have similar styles of data. I need to create a way to make a graph of each column (separate graphs for each column of the array), that plots each point as a number from each double as the x-coordinate, and the row as the y-coordinate. It should look something like this:
(Row)
1 x x x x
2 x x x
3 x x x
4 x x x
1 2 3 4
X is just a point on the graph.
Does this make enough sense? I feel like I'm making 0 progress in explaining what I want. If anyone doesn't understand this, feel free to ask questions and I'll answer them as best I can.
Something like this?
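cin = { {[1 2 3 4], [1 3 4], [2 3 4], [1 2 4]}, {[1 2 3 8], [1 3 4], [2 3 4], [1 2 4]} };
for k = 1:numel(cin)
    col_k = cin{k};
    figure(); % one figure per column
    hold on;
    for y = 1:numel(col_k)
        % repeat the row index so x and y have equal length; 'x' draws a marker at each point
        plot(col_k{y}, y * ones(size(col_k{y})), 'x');
    end
end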

Place equal elements in cell array

I have an array. I sorted it, so I have the sorted array and the indices of the sorted elements in the initial array.
For example, from [4 5 4 4 4 4 5 4] I got [4 4 4 4 4 4 5 5] and [1 3 4 5 6 8 2 7].
How can I place the resulting indices in a cell array, so that each cell holds the indices of equal elements? For my example, it would be: {1 3 4 5 6 8}, {2 7}.
I'm looking for a way to do this without a loop.
Use accumarray:
x = [4 5 4 4 4 4 5 4]; %// data
[~, ~, jj] = unique(x);
result = accumarray(jj(:), 1:numel(x), [], @(v) {v(:).'});
Or, if you need each set of indices sorted:
result = accumarray(jj(:), 1:numel(x), [], @(v) {sort(v(:)).'});
