Pandas: how to align two columns in a DataFrame and fill empty cells with NaN - arrays

I'm using Python 3.8.8
I have a DataFrame structured like this:
A B
0 1
1 2
2 1
3 7
4 7
5 8
and an array:
C = [3, 4, 7]
I would like to add the array "C" as a new column of the DataFrame. The problem is that the array is shorter than the df. I would like to make up for the difference in length by filling the empty cells with NaNs. My desired result would look something like:
A B C
0 1 NaN
1 2 NaN
2 1 3
3 7 4
4 7 7
5 8 NaN
What I am looking for specifically is a way to add C starting at a specific index of the df, but I don't know how to work around the discrepancy between the length of the df and array.
Thank you for your time

To get around the 'different length' problem when putting your list into the DataFrame, convert it to a pandas Series first. A Series can be assigned to a DataFrame column even when it is shorter; the rows it does not cover are filled with np.nan.
In your case, you can also set the index when you convert your list C to a Series. Because pandas aligns data on index labels during assignment, the values land on the rows with matching labels.
Consider the code below:
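import pandas as pd
# Index labels 2-4 decide which rows of df receive the values
c = pd.Series([3, 4, 7], index=[2, 3, 4])
df['C'] = c  # rows without a matching label are filled with NaN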
Printing df afterwards gives:
A B C
0 0 1 NaN
1 1 2 NaN
2 2 1 3.0
3 3 7 4.0
4 4 7 7.0
5 5 8 NaN
Because the Series is assigned with df['C'], the new column is already named C.
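If the starting row is only known at run time, the index for the Series can be taken from the DataFrame itself. The following is a minimal sketch, assuming the df and C from the question; the names `start` and `target_labels` are hypothetical, with `start` being the position of the first row that should receive a value.

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 1, 2, 3, 4, 5], "B": [1, 2, 1, 7, 7, 8]})
C = [3, 4, 7]
start = 2  # hypothetical: position of the first row that should get a value

# Borrow the labels of the target rows from df so the Series aligns with them,
# whatever kind of index df happens to use.
target_labels = df.index[start:start + len(C)]
df["C"] = pd.Series(C, index=target_labels)

print(df)
```

If `start + len(C)` runs past the end of df, the slice of labels is shorter than C and the Series constructor raises a ValueError, so an overflow is reported rather than silently truncated.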

Related

Read an Array of Arrays separated by blank lines in Julia

I have an array of arrays stored as blocks of tabular data in a text file. The blocks have different numbers of rows but the same number of columns. Like this:
7 9
9 9

7 1
1 1
3 3

4 1
And so on.
I would like to read them in Julia and end up with an array of arrays, or an array of 2-dimensional arrays, like this:
a = [[7 9; 9 9], [7 1; 1 1; 3 3]] ...
I have been trying different ideas with the do syntax, but I have not been very successful yet.
aux=[]
open("cell.dat") do f
    aux=[]
    aux2=[]
    for line in eachline(f)
        if line != ""
            aux2=vcat(aux2,line)
        else
            print("tuabuela")
            aux=vcat(aux,aux2)
            print(aux, aux2)
            aux2=[]
        end
    end
end
I end with an empty array!
There are many ways to do it. Suppose the file is not huge and you have read it into a String called dat:
dat="""7 9
9 9

7 1
1 1
3 3

4 1"""
In that case you can do:
julia> using DelimitedFiles  # provides readdlm
julia> readdlm.(IOBuffer.(split(dat,"\n\n")))
3-element Vector{Matrix{Float64}}:
[7.0 9.0; 9.0 9.0]
[7.0 1.0; 1.0 1.0; 3.0 3.0]
[4.0 1.0]

reshaping and re-arranging array using octave / matlab

I'm trying to reshape an array, perform an operation on it, and then reshape it back to its original form. See the example below of the output I'm trying to get. I can get a and b, but I'm having trouble getting c to look like a again.
Step 1) (the original array)
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
Step 2) (reshape and perform some operation)
1,1,1,2,2,2,3,3,3,4,4,4,5,5,5
Step 3) (array is reshaped back to the original size to look like step 1) this is what I want
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
I can get the variables a and b, but I'm not sure how to reshape b into c so that c looks like a again. See the example code and output below:
a=[repmat(1,[1,3]);repmat(2,[1,3]);repmat(3,[1,3]);repmat(4,[1,3]);repmat(5,[1,3])]
[rw,col]=size(a)
b=reshape(a',1,rw*col)
c=reshape(b,rw,col)
a=
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
b=1,1,1,2,2,2,3,3,3,4,4,4,5,5,5
c =
1 2 4
1 3 4
1 3 5
2 3 5
2 4 5
Ps: I'm using Octave 4.0 which is like Matlab.
MATLAB and Octave use column-major ordering, so you need to reshape the result with that in mind. reshape fills the output down the columns first, but you want it filled along the rows first. To achieve this, swap the dimensions you pass to reshape (use the number of columns of a as the number of rows) and then transpose the result:
c = reshape(b, 3, []).'
Or, more flexibly, without hard-coding the size:
c = reshape(b, flip(size(a))).'
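For readers who want to check the fill order outside Octave, the same behaviour can be reproduced in NumPy. This is only an illustrative sketch, assuming that `order="F"` is used to imitate the column-major fill of MATLAB/Octave; the names `c_wrong` and `c_fixed` are hypothetical.

```python
import numpy as np

# 5x3 matrix whose rows are 1 1 1, 2 2 2, ..., 5 5 5 (the a from the question)
a = np.repeat(np.arange(1, 6).reshape(-1, 1), 3, axis=1)
rw, col = a.shape

# Equivalent of reshape(a', 1, rw*col): read the transpose column by column
b = a.T.reshape(rw * col, order="F")

# Equivalent of reshape(b, rw, col): values run down the columns, so rows get mixed
c_wrong = b.reshape((rw, col), order="F")

# Equivalent of reshape(b, col, rw).': fill a col-by-rw matrix, then transpose
c_fixed = b.reshape((col, rw), order="F").T

print(c_wrong)
print(c_fixed)
```

Here `c_wrong` reproduces the scrambled matrix shown in the question, while `c_fixed` equals `a`.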

How to convert data frame columns values into an array without loop

I have a data frame like this:
df = pd.DataFrame({'A': [10,10,11,14], 'B':[2,3,3,5]})
It looks like this:
A B
0 10 2
1 10 3
2 11 3
3 14 5
I want to convert to this, with A as the row index, and store B's values inside the array or matrix:
10 2 3
11 3
14 5
Is there a Pythonic way of doing this without looping over each row of the data frame df?
Many thanks.
Use groupby:
df.groupby('A')
Then you can (for instance) get the mean of the grouped version by:
df.groupby('A').mean()
which results in:
B
A
10 2.5
11 3.0
14 5.0
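If the goal is to keep every value of B for each A rather than aggregate them, the grouped column can be collected into lists. This is a minimal sketch of one possible approach, not necessarily what the answer above intended; the name `b_by_a` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 10, 11, 14], "B": [2, 3, 3, 5]})

# Group on A and gather the B values of each group into a Python list.
# The result is a Series indexed by A: 10 -> [2, 3], 11 -> [3], 14 -> [5]
b_by_a = df.groupby("A")["B"].apply(list)

print(b_by_a)
```

Because the groups have different sizes, the result is a Series of lists rather than a rectangular matrix; `b_by_a.loc[10]` returns the list for A equal to 10.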

How do you fill-in missing values despite differences in index values?

Here's my situation. I have predicted values in the form of an array (i.e. ([1,3,1,2,3,...3]) ) and a data frame column of missing values (NAs). The array and the data frame column have the same length, but their indices don't match.
For instance, the indices of the predicted array are 0:100.
On the other hand, the indices of the column of NAs don't begin with 0; they begin at the first index where an NA is observed in the data frame.
Which pandas function will fill in the first missing value with the first element of the predicted array, the second missing value with the second element, and so forth?
Assuming your missing data is represented in the DF as NaN/None values:
import pandas as pd

df = pd.DataFrame({'col1': [2,3,4,5,7,6,5], 'col2': [2,3,None,5,None,None,5]})  # Column 2 has missing values
pred_vals = [11, 22, 33]  # Predicted values to be inserted in place of the missing values
print('Original:')
print(df)
missing = df[pd.isnull(df['col2'])].index  # Index labels of the rows where col2 is missing
df.loc[missing, 'col2'] = pred_vals  # Assigned in order: first prediction to the first missing row, and so on
print('\nFilled:')
print(df)
Result:
Original:
col1 col2
0 2 2
1 3 3
2 4 NaN
3 5 5
4 7 NaN
5 6 NaN
6 5 5
Filled:
col1 col2
0 2 2
1 3 3
2 4 11
3 5 5
4 7 22
5 6 33
6 5 5
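When the predictions arrive as a pandas Series with their own 0-based index, assigning them directly would align on labels rather than on order. The following is a minimal sketch of one way around that, assuming the same df as above; the names `pred` and `mask` are hypothetical, and converting with `to_numpy()` is used here to drop the Series index so the values are placed purely by position.

```python
import pandas as pd

df = pd.DataFrame({"col1": [2, 3, 4, 5, 7, 6, 5],
                   "col2": [2, 3, None, 5, None, None, 5]})
pred = pd.Series([11, 22, 33])  # hypothetical predictions, indexed 0..2

mask = df["col2"].isna()  # True on the rows that need a value

# Guard against a length mismatch before writing anything.
if mask.sum() != len(pred):
    raise ValueError("number of predictions does not match number of missing values")

# to_numpy() discards the 0..2 index so pandas fills the masked rows in order.
df.loc[mask, "col2"] = pred.to_numpy()

print(df)
```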

sorting matrix in matlab based on another vector

I have a 2D matrix in MATLAB and want to reorder its rows and columns based on two other vectors, i.e. one giving the order of the rows and another giving the order of the columns.
Example: A (Matrix to order)
0 1 2 3 4
1 1 8 9 7
2 3 4 6 2
3 1 2 0 8
Row Vector (Order for sorting rows of matrix A)
1
4
2
3
And column vector
1 5 4 2 3
Modified A
0 4 3 1 2
3 8 0 1 2
1 7 9 1 8
2 2 6 3 4
How about:
ModifiedA=A(RowVector,ColumnVector);
Note: MATLAB's indexing starts at 1, not at 0, so adapt your indexing vectors accordingly.
In MATLAB, you can use the second output of sort to get the 1-based indexes that MATLAB is looking for (in this case you could have just added 1, but using sort works even if the row and column vectors are not consecutive).
[~,rowIdx] = sort(rowVector);
[~,colIdx] = sort(colVector);
And then you can apply the indexing operation to the matrix:
modifiedA = A(rowIdx, colIdx);
