Pandas: filling missing values of a dataframe column from a numpy array

I have a numpy array of size k, and a pandas dataframe with a column of size n>k that contains k missing values.
Is there an easy way to fill the k missing values from the numpy array correspondingly (that is, first occurred missing value in the column of the dataframe corresponds to the next value in the array)?

Something like this might work. You may also want to consider what order (i.e. sorting) you want to fill these values in.
fill_values = list(range(k))  # or whatever your array is
indices_of_missing = df[df['myColumn'].isnull()].index  # index labels of the rows with missing values
for fill_index, dataframe_index in enumerate(indices_of_missing):
    df.loc[dataframe_index, 'myColumn'] = fill_values[fill_index]
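If the array has exactly as many elements as there are missing values, the loop can probably be replaced by a single boolean-mask assignment. The following is only a sketch: the sample data is made up, and the column name myColumn is taken from the snippet above.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: a column with k = 3 missing values.
df = pd.DataFrame({"myColumn": [1.0, np.nan, 3.0, np.nan, np.nan]})
fill_values = np.array([10.0, 20.0, 30.0])  # stand-in for the real array of size k

# Boolean mask marking the rows where the column is missing.
missing = df["myColumn"].isna()

# The assignment below needs one fill value per missing entry.
assert missing.sum() == len(fill_values)

# Values are assigned in row order: first NaN gets fill_values[0], and so on.
df.loc[missing, "myColumn"] = fill_values
```

The missing entries are filled in the order in which they appear in the dataframe, so sort the dataframe first if a different order is wanted.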

Related

Creating/appending Numpy Array

I have two numpy arrays that are shaped (8760,1) that I want to combine into a single array that is (8760,2) and then from that, filter out any values of zero that might be in first index column, or gauge in the "data" so that I can do statistical manipulation with the temp array. I have tried np.stack and then attempted to filter out any zero values that way, but ended up with my temp array being 3D rather than still 2D.
data=np.stack((mb, gauge), axis=-1)
dta = data[:,data!=0]
idx = np.where(data[:,1]>0)
temp = data[idx,:]
I know I could filter out the zeros from gauge first, but I want to preserve the index values that go along with the mb array
np.stack joins along a newly created axis. Thus your arrays become 3D.
To join along an existing axis, you could use np.concatenate:
a1 = np.empty((100, 1))
a2 = np.empty((100, 1))
a3 = np.concatenate((a1, a2), axis=1) # will give a (100, 2) array
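As a sketch of the full task in the question (join the two columns, then drop rows where gauge is zero while keeping the matching mb values), something like the following should work. The small arrays are invented stand-ins for the real (8760, 1) data, and kept_rows is a name introduced here for illustration.

```python
import numpy as np

# Hypothetical stand-ins for the (8760, 1) arrays in the question.
mb = np.array([[1.5], [2.5], [3.5], [4.5]])
gauge = np.array([[0.0], [2.0], [0.0], [4.0]])

# Join along the existing second axis: the result is 2D with shape (4, 2).
data = np.concatenate((mb, gauge), axis=1)

# A 1D boolean mask over rows keeps the result 2D.
mask = data[:, 1] != 0
temp = data[mask]

# Original row positions of the rows that were kept.
kept_rows = np.flatnonzero(mask)
```

Indexing with a 1D boolean mask avoids the extra dimension that comes from indexing with the tuple returned by np.where.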

R: Perform a T-Test across all matching rows of two different dataframes

So, in excel, there is a formula ttest which can perform a ttest across a given row (using TWO DIFFERENT ARRAYS, e.g. =TTEST(A1:L1, M1:Z1,2,2) - in this case, two-tailed, type 2). It returns a pvalue for that row.
I would like to know if there is an analogous way to do this in R, where I have two dataframe (df1 and df2) of equal length which represent the two arrays, and the new vector returns the result of the t test for each row.
So, the T-test would take all the row1 values in df1 as the first array, and all the row1 values in df2 as the second array, return a pvalue for that row in the new ttest vector, and continue down to the last row.
Thank you for your kind help.
Note: I do not necessarily know the number of columns in each dataframe, and they will vary every time I run it. That's why I want to automate it.
The easiest way is with a for loop, as in:
pvalues <- numeric(nrow(df1))
for (i in seq_len(nrow(df1)))
  pvalues[i] <- t.test(unlist(df1[i, ]), unlist(df2[i, ]))$p.value
You should be warned that taking rows out of a data.frame often causes some confusing type coercion, so I would convert each data.frame to a matrix first and check that the matrix is numeric, as in
mx1 <- as.matrix(df1)
stopifnot(is.numeric(mx1))
and use mx1 in place of df1 in the for loop.

length in 2 dimension array

var example is a 2-dimension array. example.length will give values like 14.3
But how can I get an integer for the length of example in second dimension, like 3 in this case?
Thank you!
If the array is homogeneous (which is always the case when such an array is the result of a getValues() call in a spreadsheet range for example) you can simply write :
example[0].length
EDIT : a few comments to be more clear ...
The 2D array you get from example = range.getValues() is always an array of rows data.
The number of rows is represented by example.length and the inner array length (representing rows content) is always example[0].length, which is actually the number of columns

Matlab cell array to string vector - unique

Going nuts with cell array, because I just can't get rid of it... However, it will be an easy one for you guys out here.
So here is why:
I have a dataset (data) which contains two variables: A (Numbers) and B (cell array).
Unfortunately I can't even reconstruct the problem nevertheless my imported table looks like this:
data=dataset;
data.A = [1;1;3;3;3];
data.B = ['A';'A';'BUU';'BUU';'A'];
where data.B is of the type 5x1 cell which I can't reconstruct
all I want now is the unique rows like
ans= [1 A;3 BUU;3 A]
the result should be in a dataset or just two vectors where the rows are equivalent.
but unique([data.A data.B],'rows') can't handle cell arrays, and I can't find anywhere on the web how to simply convert the cell array B to a vector of strings (does it exist?).
cell2mat() didn't work for me, because of the different word length ('A' vs 'BUU').
So there are two things I would love to learn: how to turn a 5x1 cell into a string vector,
and how to find unique rows out of numbers and strings (or cells).
Thank you very much!
Cheers Dominik
The problem is that the A and B fields are of a different type. Although they could be concatenated into a cell array, unique can't handle that. A general trick for cases like this is to "translate" elements of each field (column) to unique identifiers, i.e. numbers. This translation can be done applying unique to each field separately and getting its third output. The obtained identifiers can now be concatenated into a matrix, so that each row of this matrix is a "composite identifier". Finally, unique with 'rows' option can be applied to this matrix.
So, in your case:
[~, ~, kA] = unique(data.A);
[~, ~, kB] = unique(data.B);
[~, jR] = unique([kA kB], 'rows');
Now build the result as (same format as data)
result.A = data.A(jR);
result.B = data.B(jR);
or as (2D cell array)
result = cat(2, mat2cell(data.A(jR), ones(1,numel(jR))), data.B(jR));
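For comparison, the same "translate each column to integer identifiers, then take unique rows" trick can be sketched in NumPy. This is an illustration of the idea only, not a translation of the MATLAB dataset type; the arrays A and B mirror the data in the question.

```python
import numpy as np

# Same data as in the question, as plain NumPy arrays.
A = np.array([1, 1, 3, 3, 3])
B = np.array(["A", "A", "BUU", "BUU", "A"])

# Translate each column to integer identifiers (the inverse indices of unique).
_, kA = np.unique(A, return_inverse=True)
_, kB = np.unique(B, return_inverse=True)

# Each row of keys is a composite identifier for one (A, B) pair.
keys = np.column_stack((kA, kB))

# Unique rows; first_idx holds the first occurrence of each distinct row,
# ordered by the sorted composite identifiers.
_, first_idx = np.unique(keys, axis=0, return_index=True)

result_A = A[first_idx]
result_B = B[first_idx]
# Distinct pairs: (1, 'A'), (3, 'A'), (3, 'BUU')
```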
Here is my clumsy solution
tt.A = [1;1;3;3;3];
tt.B = {'A';'A';'BUU';'BUU';'A'};
Convert integers to characters, then merge and find unique strings
tt.C = cellstr(num2str(tt.A));
tt.D = cellfun(@(x,y) [x y],tt.C,tt.B,'UniformOutput',0);
[tt.F,tt.E] = unique(tt.D);
Display results
tt.F

Insert new values into an array

I currently have column vectors of different lengths, and I want to insert another column vector at various points of the original array, i.e. I want to add my new array to the start of the old array, skip 10 places, add my new array again, skip another 10 places, add my new array again, and so on until the end of the array. I can do this by using:
OffsetSign = [1:30]';
Extra = [0;0;0;0;0];
OffsetSign =[Extra;OffsetSign(1:10);Extra;OffsetSign(11:20);Extra;OffsetSign(21:30)];
However this is not suitable for longer arrays. Any tips on an easy way to do this for longer arrays?
Here's one way to do it:
a = [1:30]';
b = [0;0;0;0;0];
a=reshape(a,10,[]);
b=repmat(b,[1 size(a,2)])
r=[b ; a]
r=r(:);
The trick is to reshape a into a matrix with columns of the right size (10 elements each), replicate b to the same number of columns, concatenate both, and flatten the matrix to a vector.
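The same reshape, replicate, concatenate, and flatten idea can be sketched in NumPy. Because NumPy is row-major by default, the chunks are laid out as rows here rather than columns. The names chunk, a_rows, and b_rows are introduced for this illustration, and the sketch assumes the length of a is a multiple of the chunk size.

```python
import numpy as np

a = np.arange(1, 31)               # original vector, 1..30
b = np.zeros(5, dtype=a.dtype)     # vector to insert before every chunk
chunk = 10                         # number of elements to skip between inserts

# One chunk per row: shape (3, 10). Requires len(a) % chunk == 0.
a_rows = a.reshape(-1, chunk)

# Repeat b once per chunk: shape (3, 5).
b_rows = np.tile(b, (a_rows.shape[0], 1))

# Put b in front of each chunk, then flatten row by row.
r = np.hstack((b_rows, a_rows)).ravel()
```

If the vector length is not a multiple of the chunk size, the tail would have to be handled separately, for example by appending it after the flattening step.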
