Dividing an array by an array column in R - arrays

My data is the following:
> print(xr)
[1] 1.1235685 1.0715964 0.2043725 4.0639341
> class(xr)
[1] "array"
I'm trying to divide the values of all the columns in my array by the value in the 1st column (i.e., 1.1235685). The resulting array would be:
1.000 0.953 0.181 3.616
How would I do this in R, given my R data object type? The columns do not have names, because of the datatype. (If there's a way I can assign column names before dividing them, then that's even better.)
I'm new to R, so apologies for the simple question.
Thank you.

Some people already answered this in the comments, but I'll try to provide a more comprehensive answer. The code to do what you want is pretty simple.
xr <- array(data = c(1.1235685, 1.0715964, 0.2043725, 4.0639341))
xr/xr[1]
However, if you created that array with only one dimension, I would recommend you use a numeric vector instead, which has no "dim" attribute. You'd create it as follows:
xr <- c(1.1235685, 1.0715964, 0.2043725, 4.0639341)
xr/xr[1]
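For readers who work in Python as well, the same element-wise division can be sketched with NumPy. This is only an illustrative analogue of the R code above, not part of the original answer; the `labels` list is a hypothetical set of names, added because the question asks about naming the entries.

```python
import numpy as np

# Values copied from the question; a 1-D array plays the role of the R vector.
xr = np.array([1.1235685, 1.0715964, 0.2043725, 4.0639341])

# Python indexes from 0, so xr[0] corresponds to xr[1] in R.
ratio = xr / xr[0]

# Hypothetical labels (not in the original question), paired with the results.
labels = ["a", "b", "c", "d"]
named = dict(zip(labels, ratio.tolist()))

print(np.round(ratio, 3))
print(named)
```

As in R, the scalar `xr[0]` is broadcast across every element, so no loop is needed.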

Related

Unique values in particular column only 2-d array (using numpy)

I have a 2-d array in numpy. I wish to obtain unique values only in a particular column.
import numpy as np
data = np.genfromtxt('somecsvfile',dtype='str',delimiter=',')
#data looks like
[a,b,c,d,e,f,g],
[e,f,z,u,e,n,c],
...
[g,f,z,u,a,v,b]
Using numpy/scipy only, how do I obtain an array or list of the unique values in the 5th column? (I know it can easily be done with pandas.)
The expected output would be 2 values: [e,a]
Correct answer posted. A simple referencing question in essence.
np.unique(data[:, 4])
With thanks.
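A small self-contained sketch of that call, using a hand-written array in place of the CSV file (the three rows are the ones shown in the question; the rest of the file is unknown). Note that `np.unique` returns sorted values, so the second half shows one possible way to keep first-appearance order instead.

```python
import numpy as np

# Stand-in for the data read by np.genfromtxt in the question.
data = np.array([
    ["a", "b", "c", "d", "e", "f", "g"],
    ["e", "f", "z", "u", "e", "n", "c"],
    ["g", "f", "z", "u", "a", "v", "b"],
])

# Column index 4 is the 5th column; the result is sorted.
uniq = np.unique(data[:, 4])

# Optional: recover the order in which each value first appears.
uniq_sorted, first_idx = np.unique(data[:, 4], return_index=True)
in_order = uniq_sorted[np.argsort(first_idx)]

print(uniq)
print(in_order)
```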

Excel: creating an array with n times a constant

I have been looking around for a while but unable to find an answer to my question.
In Excel, what compact formula can I use to create an array made up of a single element repeated n times, where n is an input (potentially hard-coded)?
For example, something that would look like this (the formula below does not work but gives an idea of what I am looking for):
{={"Constant"}*3}
Note: I am not looking for a VBA-based solution.
EDIT: Reading @AxelRichter's answer, I see I should also indicate that the formulas below assume Constant is a number. If Constant is text, then this solution will not work.
Volatile:
=ROW(INDIRECT("1:" & Repts))/ROW(INDIRECT("1" & ":" & Repts)) * Constant
non-Volatile:
=ROW(INDEX($1:$65535,1,1):INDEX($1:$65535,Repts,1))/ROW(INDEX($1:$65535,1,1):INDEX($1:$65535,Repts,1))*Constant
If
Constant = 14
Repts = 3
then
Result = {14;14;14}
The first part of each formula creates an array of 1's repeated Repts times. Then we multiply that array by Constant to get the desired result.
And after reading @MacroMarc's comment, the following non-volatile formula should also work for numbers:
=(ROW($A$1:INDEX($A:$A,Repts))>0)*Constant
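Outside Excel, the same "array of ones times a constant" idea can be sketched in Python with NumPy. This is only an analogue for illustration, not an Excel solution; `repts` and `constant` mirror the Repts and Constant names used in the formulas.

```python
import numpy as np

repts = 3       # number of repetitions (Repts in the formulas)
constant = 14   # numeric constant (Constant in the formulas)

# Build an array of ones, then scale it, as the formulas above do.
ones = np.ones(repts, dtype=int)
numeric = ones * constant

# Multiplying by ones only makes sense for numbers; for text, fill directly.
text = np.full(repts, "Constant")

print(numeric)
print(text)
```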
One could concatenate 1:n empty cells to the "Constant" to create a string array having n items "Constant":
"Constant"&INDEX(XFD:XFD,1):INDEX(XFD:XFD,3)
Here, 3 is n.
Used in a formula:
=INDEX("Constant"&INDEX(XFD:XFD,1):INDEX(XFD:XFD,3),0)
Stepping through it with Evaluate Formula confirms that it works.
Here column XFD is used because in most cases this column will be empty and a column which is guaranteed to be empty is needed for this solution.
If you instead use
"Constant"&T(ROW($A$1:INDEX($A:$A,3)))
=INDEX("Constant"&T(ROW($A$1:INDEX($A:$A,3))),0)
the need for an empty column disappears. ROW returns numbers, but T returns an empty string when its argument is not text, so an empty string is concatenated for each of the rows 1:3 (n).
Thanks to @MacroMarc for the hint.
Try:
REPT("Constant", SEQUENCE(3,1,1,0))
Or, if the reference is to a dynamic array:
REPT("Constant", SEQUENCE(A1#,1,1,0))
The result is a dynamic array that spills, with your constant repeated once in each cell.
Using SEQUENCE with a step of 0 is a much cleaner way to make an array of constants. You can choose whether you want rows or columns (or both!) as well.
=SEQUENCE(Repts,1,Constant,0)
I will generally use SEQUENCE, as Claire said above. But if you want the output to be text, I would do it this way:
=IF(SEQUENCE(A1,A2,1,0),A3)
Where:
A1 has the number of rows
A2 has the number of columns
A3 has the thing you want repeated into an array
SEQUENCE creates a matrix of 1s. IF treats each nonzero value as TRUE, so every cell returns the value-if-true argument (the contents of A3).
So, if you wanted a vertical list of 3 items that says "Constant", this would do it:
=IF(SEQUENCE(3,,1,0),"Constant")
If you would prefer it be arranged horizontally instead of vertically, just amend the SEQUENCE function:
=IF(SEQUENCE(,3,1,0),"Constant")

Aggregate values of one column by classes in second column using numpy

I have a numpy array with shape (N, 2) and N > 10000. In the first column I have e.g. 6 class values (e.g. 0.0, 0.2, 0.4, 0.6, 0.8, 1.0); in the second column I have float values. Now I want to calculate the average of the second column for each distinct class in the first column, resulting in 6 averages, one for each class.
Is there a numpy way to do this, to avoid manual loops especially if N is very large?
In pure numpy you would do something like:
import numpy as np
# idx maps each row to its class; cnt holds the number of rows per class
unq, idx, cnt = np.unique(arr[:, 0], return_inverse=True,
                          return_counts=True)
avg = np.bincount(idx, weights=arr[:, 1]) / cnt
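To make the steps concrete, here is the same approach run on a tiny made-up array (the values are invented for illustration, with only three classes instead of six). `np.bincount` with `weights` sums the second column per class, and dividing by the counts gives the per-class mean.

```python
import numpy as np

# Made-up data: column 0 = class value, column 1 = float value.
arr = np.array([
    [0.0, 1.0],
    [0.2, 2.0],
    [0.0, 3.0],
    [0.2, 6.0],
    [0.4, 5.0],
])

# unq: sorted class values; idx: class index of each row; cnt: rows per class.
unq, idx, cnt = np.unique(arr[:, 0], return_inverse=True, return_counts=True)

# Sum of column 1 within each class, then divide by the class sizes.
sums = np.bincount(idx, weights=arr[:, 1])
avg = sums / cnt

for label, mean in zip(unq, avg):
    print(label, mean)
```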
I copied the answer from Warren to here, since it solves my problem best and I want to check it as solved:
This is a "groupby/aggregation" operation. The question is very close to being a duplicate of "getting median of particular rows of array based on index". ... You could also use scipy.ndimage.labeled_comprehension as suggested there, but you would have to convert the first column to integers (e.g. idx = (5*data[:, 0]).astype(int)).
I did exactly this.

Searching for identical values in two arrays

I know this may seem trivial to most/all of you, but I've been scratching my head over this for some time now.
I want to find values in the first column (values in the first column of each array are 'years') of 2 separate arrays that are identical in size (2400 x 2). Once such values have been found, I am trying to store the corresponding values in column 2 of both arrays in a new array called z. My code is as follows:
n=1;
a_years=austrianAtmosphericTemperaturesTest(n,1);
r_years=frenchRainDataSimplified(n,1);
while n<=2400
    for a_years = r_years
        z=[frenchRainDataSimplified(n,1) austrianAtmosphericTemperaturesTest(n,2) frenchRainDataSimplified(n,2)];
        n=n+1;
    end
end
I have tried other methods like find & ismember but I'm having no luck!
Thanks,
Chris
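No answer is recorded here, so the following is only a hedged sketch in Python/NumPy of one possible reading of the question. It assumes the intent is the row-by-row comparison that the loop above attempts, i.e. keep row n when both arrays have the same year in row n. The arrays `austrian` and `french` are small hypothetical stand-ins for the two 2400 x 2 arrays.

```python
import numpy as np

# Hypothetical stand-ins: column 0 = year, column 1 = measured value.
austrian = np.array([[1900, 5.1], [1901, 5.4], [1903, 4.9]])
french = np.array([[1900, 61.0], [1902, 58.0], [1903, 70.0]])

# Boolean mask: True where both arrays hold the same year in the same row.
same_year = austrian[:, 0] == french[:, 0]

# Collect year, Austrian value and French value for the matching rows.
z = np.column_stack((
    french[same_year, 0],
    austrian[same_year, 1],
    french[same_year, 1],
))

print(z)
```

If matching years can sit in different rows of the two arrays, a row-wise mask is not enough and a different matching strategy would be needed.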

How to get mean, median, and other statistics over entire matrix, array or dataframe?

I know this is a basic question but for some strange reason I am unable to find an answer.
How should I apply basic statistical functions like mean, median, etc. over an entire array, matrix or dataframe to get a single value, and not a vector over rows or columns?
Since this comes up a fair bit, I'm going to treat this a little more comprehensively, to include the 'etc.' piece in addition to mean and median.
For a matrix, or array, as the others have stated, mean and median will return a single value. However, var will compute the covariances between the columns of a two dimensional matrix. Interestingly, for a multi-dimensional array, var goes back to returning a single value. sd on a 2-d matrix will work, but is deprecated, returning the standard deviation of the columns. Even better, mad returns a single value on a 2-d matrix and a multi-dimensional array. If you want a single value returned, the safest route is to coerce using as.vector() first. Having fun yet?
For a data.frame, mean is deprecated, but will again act on the columns separately. median requires that you coerce to a vector first, or unlist. As before, var will return the covariances, and sd is again deprecated but will return the standard deviation of the columns. mad requires that you coerce to a vector or unlist. In general for a data.frame if you want something to act on all values, you generally will just unlist it first.
Edit: Late breaking news(): In R 3.0.0 mean.data.frame is defunctified:
o mean() for data frames and sd() for data frames and matrices are
defunct.
By default, mean and median etc work over an entire array or matrix.
E.g.:
# array:
m <- array(runif(100),dim=c(10,10))
mean(m) # returns *one* value.
# matrix:
mean(as.matrix(m)) # same as before
For data frames, you can coerce them to a matrix first (the reason this is by default over columns is because a dataframe can have columns with strings in it, which you can't take the mean of):
# data frame
mdf <- as.data.frame(m)
# mean(mdf) returns column means
mean( as.matrix(mdf) ) # one value.
Just be careful that your dataframe has all numeric columns before coercing to matrix. Or exclude the non-numeric ones.
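For comparison only (this is not part of the R answers), NumPy draws the same distinction through the `axis` argument: with no axis, a reduction runs over every element and returns one value. A minimal sketch with a small made-up matrix:

```python
import numpy as np

# Made-up 3 x 4 matrix holding the values 1..12.
m = np.arange(1.0, 13.0).reshape(3, 4)

overall_mean = m.mean()            # single value over all elements
column_means = m.mean(axis=0)      # one value per column
overall_median = np.median(m)      # single value over all elements

# Flattening first makes the "treat it as one vector" intent explicit;
# ddof=1 gives the sample variance.
overall_var = np.var(m.ravel(), ddof=1)

print(overall_mean, overall_median, overall_var)
print(column_means)
```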
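You can also use the dplyr package (install it with install.packages('dplyr')). Note that this returns one mean per column rather than a single value:
library(dplyr)
dataframe.mean <- dataframe %>%
  summarise_all(mean) # use median instead of mean for the median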

Resources