How do you fill in missing values despite differences in index values?

Here's my situation. I have predicted values in the form of an array (e.g. [1, 3, 1, 2, 3, ..., 3]) and a DataFrame column of missing values (NAs). The array and the column of NAs have the same length, but their indices don't match.
For instance, the indices of the predicted array are 0:100.
The indices of the NAs, on the other hand, don't begin at 0; they begin at the first row of the DataFrame where an NA is observed.
Which pandas function will fill the first missing value with the first element of the predicted array, the second missing value with the second element, and so forth?

Assuming your missing data is represented in the DF as NaN/None values:
import pandas as pd

df = pd.DataFrame({'col1': [2, 3, 4, 5, 7, 6, 5],
                   'col2': [2, 3, None, 5, None, None, 5]})  # col2 has missing values
pred_vals = [11, 22, 33]  # predicted values to insert in place of the missing values
print('Original:')
print(df)
missing = df[pd.isnull(df['col2'])].index  # index labels of the rows with missing values
df.loc[missing, 'col2'] = pred_vals  # a plain list is assigned by position, in row order
print('\nFilled:')
print(df)
Result:
Original:
   col1  col2
0     2   2.0
1     3   3.0
2     4   NaN
3     5   5.0
4     7   NaN
5     6   NaN
6     5   5.0

Filled:
   col1  col2
0     2   2.0
1     3   3.0
2     4  11.0
3     5   5.0
4     7  22.0
5     6  33.0
6     5   5.0
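The same idea can be sketched with a boolean mask instead of an explicit index lookup. The helper name fill_missing_in_order is made up for illustration, and the length check is an added assumption about how a mismatch should be handled. Converting the predictions to a NumPy array discards any index they carry, so the assignment goes by position; a pandas Series would instead be aligned on its index labels, which is exactly the mismatch described in the question.

```python
import numpy as np
import pandas as pd

def fill_missing_in_order(df, column, values):
    """Return a copy of df with the NaNs in `column` replaced, in row order, by `values`.

    Hypothetical helper, not part of pandas.
    """
    mask = df[column].isna()          # True where a value is missing
    values = np.asarray(values)       # drop any index so assignment is positional
    if mask.sum() != len(values):
        raise ValueError(f"{mask.sum()} missing values but {len(values)} predictions")
    out = df.copy()
    out.loc[mask, column] = values    # first NaN gets values[0], second gets values[1], ...
    return out

df = pd.DataFrame({'col1': [2, 3, 4, 5, 7, 6, 5],
                   'col2': [2, 3, None, 5, None, None, 5]})
filled = fill_missing_in_order(df, 'col2', [11, 22, 33])
print(filled)
```

Working on a copy leaves the original frame untouched, which makes it easy to compare before and after.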

Related

Pandas How to Align Two Columns in a DataFrame and NaN empty cells

I'm using Python 3.8.8
I have a DataFrame structured like this:
A  B
0  1
1  2
2  1
3  7
4  7
5  8
and an array:
C = [3, 4, 7]
I would like to add the array C as a new column to the DataFrame. The problem is that the array is shorter than the df's index. I would like to make up for the difference in length by filling the empty cells with NaNs. My desired result would look something like:
A  B  C
0  1  NaN
1  2  NaN
2  1  3
3  7  4
4  7  7
5  8  NaN
What I am looking for specifically is a way to add C starting at a specific index of the df, but I don't know how to work around the discrepancy between the length of the df and array.
Thank you for your time
To get around the problem of 'different length' when putting your list into the dataframe, you can convert it to a pandas Series. Once you do that, you can assign it to your dataframe, and the rows it does not cover are filled with NaN.
In your case, you can also set the index when you convert your C list to a Series. Because pandas aligns data on index labels during assignment, the values land on the matching rows.
Consider using the code below:
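import pandas as pd
df = pd.DataFrame({'A': [0, 1, 2, 3, 4, 5], 'B': [1, 2, 1, 7, 7, 8]})  # the frame from the question
c = pd.Series([3, 4, 7], index=[2, 3, 4])  # index labels say where the values go
df['C'] = c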
Printing df then gives:
   A  B    C
0  0  1  NaN
1  1  2  NaN
2  2  1  3.0
3  3  7  4.0
4  4  7  7.0
5  5  8  NaN
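Since the question asks for a way to start at a specific index, here is a minimal sketch that builds the Series index from a start position instead of typing the labels by hand. The helper add_from_index and its start parameter are hypothetical names, and the sketch assumes the default integer index (0, 1, 2, ...) shown in the question.

```python
import pandas as pd

def add_from_index(df, name, values, start):
    """Return a copy of df with `values` as column `name`, beginning at index label `start`.

    Hypothetical helper; assumes df has a default integer index.
    """
    s = pd.Series(values, index=range(start, start + len(values)))
    out = df.copy()
    out[name] = s  # aligned on index labels; rows without a match become NaN
    return out

df = pd.DataFrame({'A': [0, 1, 2, 3, 4, 5], 'B': [1, 2, 1, 7, 7, 8]})
result = add_from_index(df, 'C', [3, 4, 7], start=2)
print(result)
```

Values whose labels fall outside the DataFrame's index are dropped by the alignment, so a start that is too large loses data silently rather than raising an error.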

How to find minimum value of a column imported from Excel using MATLAB

I have a set of values in the following pattern.
A B C D
1 5 6 11
2 6 5 21
3 7 3 42
4 3 7 22
1 2 3 54
2 3 2 43
3 4 3 27
4 3 2 14
I imported each column into the MATLAB workspace as follows.
A = xlsread('F:\R.xlsx','Complete Data','A2:A43');
B = xlsread('F:\R.xlsx','Complete Data','B2:B43');
C = xlsread('F:\R.xlsx','Complete Data','C2:C43');
D = xlsread('F:\R.xlsx','Complete Data','D2:D43');
I need help with code that, for each value in column A, finds the lowest D value and outputs the corresponding B and C values. I need the output to look like this:
1 5 6 11
2 6 5 21
3 4 3 27
4 3 2 14
I read through related questions and understand that I need to make it a matrix, sort it based on the elements of the 4th column using sortrows, and get the indices of the sorted elements. But I am stuck here. Please guide me.
You can export those columns in one go as:
ABCD = xlsread('F:\R.xlsx','Complete Data','A2:D43');
Now use sortrows to sort the rows according to the first and the fourth column.
req = sortrows(ABCD, [1 4]);
☆ If every value in the first column occurs exactly twice, then keep every other row:
req = req(1:2:end,:);
☆ If the values in the first column do not necessarily occur twice, then keep the first row of each group:
[~, ind] = unique(req(:,1));
req = req(ind,:);
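For comparison, the same selection can be sketched in Python with pandas. This is not part of the MATLAB answer; the data is typed in from the sample table in the question rather than read from the Excel file, and the variable names are chosen for illustration.

```python
import pandas as pd

# Sample values copied from the table in the question.
data = pd.DataFrame({
    'A': [1, 2, 3, 4, 1, 2, 3, 4],
    'B': [5, 6, 7, 3, 2, 3, 4, 3],
    'C': [6, 5, 3, 7, 3, 2, 3, 2],
    'D': [11, 21, 42, 22, 54, 43, 27, 14],
})

# For each value of A, find the row label where D is smallest ...
rows = data.groupby('A')['D'].idxmin()
# ... and pull out those complete rows.
req = data.loc[rows].reset_index(drop=True)
print(req)
```

Grouping avoids the need to sort first, and it works whether a value of A occurs once, twice, or more often.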

Extracting array values based on values in different dimension

I've got a problem with subsetting values of an array.
raw.table <- array(data = c(1:12,13:24,rep(1:6, each=2)),
dim=c(3,4,3),
dimnames=list(LETTERS[1:3],1:4,c("target","ctrl","samples")))
The first two dimensions of my array represent some values that I want to do statistics on, and the higher dimensions contain different attributes I want to use to access specific subsets. In this case I have only sample numbers, where two values are always assigned to the same sample number (measurement replicates).
, , target
  1 2 3  4
A 1 4 7 10
B 2 5 8 11
C 3 6 9 12
, , ctrl
   1  2  3  4
A 13 16 19 22
B 14 17 20 23
C 15 18 21 24
, , samples
  1 2 3 4
A 1 2 4 5
B 1 3 4 6
C 2 3 5 6
How do I access the values in the first layer (= target) that have the same sample number in the third layer (= samples)? I tried different approaches using unique(), duplicated() and match(), but without coming to a result. I just cannot wrap my head around the indexing of arrays -.-
Cheers,
zuup
Form a logical index with a logical test (across dimensions):
> raw.table[,,1] == raw.table[,,3]
      1     2     3     4
A  TRUE FALSE FALSE FALSE
B FALSE FALSE FALSE FALSE
C FALSE FALSE FALSE FALSE
And use it to select items from the first dimension (and since they will be equal length there is no recycling):
> raw.table[, , 1 ][ raw.table[,,1] == raw.table[,,3] ]
[1] 1
Chaining calls to the Extract-operator is perfectly acceptable in R
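If the goal is instead to collect the target values that share a sample number (the measurement replicates), one possible reading can be sketched in Python with NumPy. This is an assumption about what the question is after, not the method from the answer above; the arrays are rebuilt in column-major order to mirror R's array().

```python
import numpy as np

# Rebuild the "target" and "samples" layers in column-major (Fortran) order, as R does.
target = np.arange(1, 13).reshape(3, 4, order='F')
samples = np.repeat(np.arange(1, 7), 2).reshape(3, 4, order='F')

# For each sample number, pick the target values at the positions where it occurs.
by_sample = {
    int(s): np.sort(target[samples == s]).tolist()
    for s in np.unique(samples)
}
print(by_sample)
```

The mask samples == s plays the same role as the logical index in the R answer, except that it tests one layer against a sample number instead of comparing two layers with each other.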

How to logically index entire columns in MATLAB

Given a logical column vector v (size n x 1) and an array a (size m x n), how do I generate a new array consisting of all the columns of a whose index (1...n) corresponds to a 1 in v?
So for example if v was
1
0
0
1
and a was
1 4 7 10
2 5 8 11
3 6 9 12
the new array would be
1 10
2 11
3 12
because the first and fourth elements of v are 1 (true), so the new array should contain the first and fourth columns of a.
I have tried a bunch of things involving normal logical indexing and transpose but I can't get it to work. All help is appreciated
You want to use the logical indexing to select the columns and select all rows. In the example below, I have explicitly cast v as a logical just in case it's not a logical matrix already.
new = a(:, logical(v))
1 10
2 11
3 12
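For readers working in Python, a roughly equivalent sketch with NumPy follows. The cast matters here too: NumPy treats an integer array of 0s and 1s as column positions, so without astype(bool) the result would be columns 1, 0, 0, 1 rather than a mask selection.

```python
import numpy as np

a = np.array([[1, 4, 7, 10],
              [2, 5, 8, 11],
              [3, 6, 9, 12]])
v = np.array([1, 0, 0, 1])

# Select all rows, and only the columns where v is true.
new = a[:, v.astype(bool)]
print(new)
```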

Associating / linking an array column with another column in the array

I have an array that has some calculations done on the second column. I would like the values from the third column to follow/be linked to the second column.
Test Code:
a1= [1,10,-11;
2,70,232;
3,33.2,-33;
4,40,44;]
a2calc=abs(a1(:,2)-max(a1(:,2))) %calculation
a2=[a1(:,1),a2calc,a1(:,3)] %new array
Example:
original a1 Array
1 10 -11
2 70 232
3 33.2 -33
4 40 44
a2 Array after column 2 calculations looks like this
1 60 -11
2 0 232
3 36.8 -33
4 30 44
I'm trying to get the final array to look like this (column 3 values follow / are linked to the second column)
1 60 232
2 0 -11
3 36.8 44
4 30 -33
What I'm having problems with is I'm not sure if I should use the index values of column 2 and if so how I can get it to look like the final output array I included in the question.
I might be wrong here, but it looks to me like the logic is:
After calculating the second column, change the order of the third column so that the third column is sorted the same way as the second. To see what I mean:
This represents the two columns, numbered from highest to lowest:
A = 1 1
    4 3
    2 2
    3 4
If I understand it right, you want the resulting matrix to be
A = 1 1
    4 4
    2 2
    3 3
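If this is the right logic, then you should check out sort with two outputs. The second output is the sort order, which you can use to index the third column. Here A is the matrix after the column 2 calculation (a2 in the question).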
[~, idx] = sort(A(:, 2));
sorted_3 = sort(A(:, 3));
A(idx, 3) = sorted_3;
The output from this is:
A =
1.00000 60.00000 232.00000
2.00000 0.00000 -33.00000
3.00000 36.80000 44.00000
4.00000 30.00000 -11.00000
Good luck!
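The output above differs from the target in the question in rows 2 and 4. One alternative reading, offered only as a guess at the intended logic, is that each column 3 value should stay attached to the rank of its original column 2 partner: the value that sat next to the largest column 2 entry moves next to the new largest entry, and so on. A minimal NumPy sketch under that assumption:

```python
import numpy as np

a1 = np.array([[1, 10.0, -11],
               [2, 70.0, 232],
               [3, 33.2, -33],
               [4, 40.0, 44]])

a2 = a1.copy()
a2[:, 1] = np.abs(a1[:, 1] - a1[:, 1].max())  # the column 2 calculation from the question

old_order = np.argsort(a1[:, 1])  # rows of a1 from smallest to largest column 2 value
new_order = np.argsort(a2[:, 1])  # rows of a2 from smallest to largest column 2 value

# The k-th smallest new value receives the column 3 entry that belonged
# to the k-th smallest original value.
a2[new_order, 2] = a1[old_order, 2]
print(a2)
```

With the sample data this gives 232, -11, 44, -33 in column 3, which matches the final array shown in the question. Whether that rule generalizes to the real data is something only the asker can confirm.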
