Loop over all the variables automatically in R


I have a dataset with 3 dependent variables (height, color and habit) and 3 independent variables (rep, block and flowername). There are 3 replications and 20 blocks, and each block is repeated 6 times.
data_com<-data[1:18,]
flower<-structure(list(rep = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1), block = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3), flowername = c("yellow", "orange", "black",
"black1", "orange1", "violet", "violet1", "violet3", "purple",
"purple1", "purple3", "red", "red1", "lila", "sky", "pink", "purple_pink",
"purple_pink1"), height = c(5, 4, 6, 5, 6, 4, 7, 5, 4, 6, 5,
6, 7, 5, 6, 5, 4, 5), color = c(5, 5, 7, 6, 6, 4, 7, 4, 5, 6,
5, 6, 7, 5, 6, 6, 7, 7), habit = c(4, 6, 3, 3, 6, 2, 4, 2, 4,
6, 2, 6, 7, 7, 7, 6, 6, 6)), row.names = c(NA, -18L), class = c("tbl_df",
"tbl", "data.frame"))
My model looks like this:
data <- readxl::read_excel("flower.xlsx",sheet = 1)
str(data)
names(data)
data$flowername <- as.factor(data$"flowername")
data$rep <- as.factor(data$"rep")
data$block <- as.factor(data$"block")
data$height <- as.numeric(data$"height")
library(lme4)
model <- lmer(height ~ 1 + (1|flowername) + (1|rep), data = data)
summary(model)
------------------------------------------------------------------------
I would like a loop that fits this model once for each dependent variable. Afterwards I would like to save the random effects for all variables as a list and as an xlsx file, so that I can use them for further analysis. I would also like to save the anova output for all dependent variables as xlsx.
I am new to R and find looping really difficult to understand. Any help would be appreciated.
I am also new to Stack Overflow, so please correct me if the post is not properly formatted. Thank you.

First off, your example data does not work just like that. You have too many (flowername) or too few (rep) levels for the low number of rows. This results in errors when fitting the models. The following example data works, however:
data_com<-data[1:18,]
flower<-structure(list(rep = rep(1:2, 9),
block = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3),
flowername = rep(c("yellow", "orange", "black"), 6),
height = c(5, 4, 6, 5, 6, 4, 7, 5, 4, 6, 5,
6, 7, 5, 6, 5, 4, 5),
color = c(5, 5, 7, 6, 6, 4, 7, 4, 5, 6,
5, 6, 7, 5, 6, 6, 7, 7),
habit = c(4, 6, 3, 3, 6, 2, 4, 2, 4,
6, 2, 6, 7, 7, 7, 6, 6, 6)),
row.names = c(NA, -18L), class = c("tbl_df",
"tbl", "data.frame"))
flower$flowername <- as.factor(flower$"flowername")
flower$rep <- as.factor(flower$"rep")
flower$block <- as.factor(flower$"block")
flower$height <- as.numeric(flower$"height")
So, to "automatically" run through your dependent variables, you need to make a function that fits the model and extracts the results that you are interested in:
get.re <- function(dependent, dat) {
  require(lme4)
  dat$dependent <- dat[[dependent]] # specifies the dependent variable
  model <- lmer(dependent ~ 1 + (1|flowername) + (1|rep),
                data = dat) # fits the model
  cat("\n\n============ Model for dependent:", dependent, "============\n")
  print(summary(model)) # shows you the summary
  ranef(model) # returns the random effects of the model
}
# make a vector of the dependent variable names
dependents <- c("height", "color", "habit")
# apply the function to each dependent variable
fits.ls <- lapply(dependents,
get.re,
dat = flower)
names(fits.ls) <- dependents # name the list elements
The random effects of each model are returned as a list of data frames, one per grouping factor, where the row names are the levels of that random factor. The following code collapses each of these nested lists into one data frame per model. Then we save these data frames to an xlsx file, using one sheet per model/data frame.
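fits.dfs <- lapply(fits.ls, function(x) {
  # one data frame per grouping factor, stacked; the .id column
  # (named "factor" here) records which grouping factor a row came from
  dplyr::bind_rows(
    lapply(x, function(y) data.frame(level = rownames(y),
                                     value = y[, 1])),
    .id = "factor")
})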
library(openxlsx)
wb <- buildWorkbook(fits.dfs)
saveWorkbook(wb, "RandomEffects.xlsx")
Edit:
To keep only the random effects of flowername and put all of them (from all fitted models) into one Excel sheet, transform your output list (fits.ls) as follows. This replaces the last code block:
fits.df <- lapply(fits.ls,
function(x) {dplyr::bind_rows(x$flowername)})
fits.df <- dplyr::bind_cols(fits.df)
colnames(fits.df) <- names(fits.ls)
fits.df <- cbind(rownames(fits.df), fits.df) # flowernames as a column so it is visible in the xlsx
openxlsx::write.xlsx(fits.df, "RandomEffects.xlsx")

Related

counting DISTINCT copies of row elements

Consider the array sample A.
import numpy as np
A = np.array([[2, 3, 6, 7, 3, 6, 7, 2],
[2, 3, 6, 7, 3, 6, 7, 7],
[2, 4, 3, 4, 6, 4, 9, 4],
[4, 9, 0, 1, 2, 5, 3, 0],
[5, 5, 2, 5, 4, 3, 7, 5],
[7, 5, 4, 8, 0, 1, 2, 6],
[7, 5, 4, 7, 3, 8, 0, 7]])
PROBLEM: I want to identify rows that have a specified number of DISTINCT element copies. The code needs to be able to answer questions like "which rows of A have exactly 4 elements that appear twice?" or "which rows of A have exactly 1 element that appears three times?" The following code comes close:
r,c = A.shape
nCopies = 4
s = np.sort(A,axis=1)
out = A[((s[:,1:] != s[:,:-1]).sum(axis=1)+1 == c - nCopies)]
This produces 2 output rows, both having 4 copied elements.
The 1st row has copies of 2,3,6,7. The 2nd row has copies of 3,6,7,7:
array([[2, 3, 6, 7, 3, 6, 7, 2],
[2, 3, 6, 7, 3, 6, 7, 7]])
My problem is that I don't want the 2nd output row because it only has 3 DISTINCT copies (ie: 3,6,7)
How can the code be modified to identify only distinct copies?

If I understand correctly, you want the rows of A that have 4 distinct values and every value must have at least one copy. You can leverage np.unique(return_counts=True) which returns 2 values, the distinct values and the count of each value.
counts = [np.unique(row,return_counts=True) for row in A ]
valid_indices = [ np.all(row[1] > 1) and row[0].shape[0] == 4 for row in counts ]
valid_rows = A[valid_indices]

Match lengths of multiple Numpy arrays of unequal length

I'm doing some data processing on data that comes in sets of 3 arrays, each thousands of values long. Sometimes the arrays are of slightly different lengths, and I am trying to find the shortest array and trim the other two to match its length.
# Some randomly generated sequences
a = array([7, 1, 7, 8, 0, 0, 1, 2, 8, 7, 2, 3])
b = array([0, 1, 1, 8, 3, 4, 1, 5])
c = array([8, 3, 3, 1, 4, 6, 6, 7, 3, 8, 8])
# What I'd like accomplished
a = array([7, 1, 7, 8, 0, 0, 1, 2])
b = array([0, 1, 1, 8, 3, 4, 1, 5])
c = array([8, 3, 3, 1, 4, 6, 6, 7])
This problem seems well covered for 2 arrays of different lengths but my searches didn't bring up anything for matching the lengths of multiple arrays. Looking at some of the Numpy methods like resize and array_split didn't seem to have the functionality I was looking for. Before I dive into writing some type of ugly recursive function using the directions I found matching 2 arrays, does anyone have any suggestions about how this can be accomplished conveniently?

First, get the minimum length:
mlen = min(map(len, [a, b, c]))
# mlen is 8 for the arrays above
Then slice every array to that length:
newl = [x[:mlen] for x in [a, b, c]]
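Put together as a runnable sketch, using the sample arrays from the question. The function name trim_to_shortest is invented for this example; it accepts any number of arrays, so no recursion is needed.

```python
import numpy as np

a = np.array([7, 1, 7, 8, 0, 0, 1, 2, 8, 7, 2, 3])
b = np.array([0, 1, 1, 8, 3, 4, 1, 5])
c = np.array([8, 3, 3, 1, 4, 6, 6, 7, 3, 8, 8])


def trim_to_shortest(*arrays):
    """Hypothetical helper: cut every array down to the shortest length."""
    n = min(len(x) for x in arrays)
    # slicing returns views, so no data is copied here
    return [x[:n] for x in arrays]


a_trim, b_trim, c_trim = trim_to_shortest(a, b, c)

# once the lengths agree, the arrays can be stacked into one 2D array
stacked = np.vstack([a_trim, b_trim, c_trim])
print(stacked.shape)
```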

Rearrange array [1, 2, 3, 4, 5, 6] to [1, 3, 5, 2, 4, 6]

I'm looking for the royal road to rearrange an array of variable length like
[1, 2, 3, 4, 5, 6]
into something like this:
[1, 3, 5, 2, 4, 6]
The length of the array is always divisible by 3. So I could also have an array like this:
[1, 2, 3, 4, 5, 6, 7, 8, 9]
which should be turned into:
[1, 4, 7, 2, 5, 8, 3, 6, 9]
A real example would look like this:
['X.1', 'X.2', 'Y.1', 'Y.2', 'Z.1', 'Z.2']
which I would like to turn into:
['X.1', 'Y.1', 'Z.1', 'X.2', 'Y.2', 'Z.2']
An array of size 3 or an empty array should remain unmodified.
How would I do that?

If a is the name of your NumPy array, you can write:
a.reshape(3, -1).ravel('f')
(This assumes that your array is divisible by 3, as you have stated.)
This method works by first viewing each chunk of len(a) / 3 elements as rows of a 2D array and then unravels that 2D array column-wise.
For example:
>>> a = np.array([1, 2, 3, 4, 5, 6])
>>> a.reshape(3, -1).ravel('f')
array([1, 3, 5, 2, 4, 6])
>>> b = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> b.reshape(3, -1).ravel('f')
array([1, 4, 7, 2, 5, 8, 3, 6, 9])

A solution without NumPy (Python 3, so integer division uses //):
>>> r = list(range(1, 10))
>>> sum([r[i::len(r)//3] for i in range(len(r)//3)], [])
[1, 4, 7, 2, 5, 8, 3, 6, 9]
I used sum to concatenate the lists because it makes for the most self-contained and readable example. But as mentioned in the comments, it is certainly not the most efficient one. For efficiency, you can use a list comprehension:
>>> r = list(range(1, 10))
>>> [x for y in [r[i::len(r)//3] for i in range(len(r)//3)] for x in y]
[1, 4, 7, 2, 5, 8, 3, 6, 9]
or any of the other methods mentioned here.

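Using reshape and transpose (or T) will do:
import numpy as np
t = np.arange(1, 10)
# 3 rows of len(t) // 3 elements each, so that a length-6 input gives [1, 3, 5, 2, 4, 6];
# // keeps the shape an integer in Python 3
t2 = np.reshape(t, [3, t.shape[0] // 3]).T.reshape(len(t))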

Indexing highest value of numpy matrix

I have a numpy array of shape (4, 7) like this:
array([[ 1, 4, 5, 7, 8, 6, 7],
[ 2, 23, 2, 4, 8, 94, 2],
[ 1, 5, 6, 7, 10, 15, 20],
[ 3, 9, 2, 7, 6, 5, 4]])
I would like to get the index of the highest element, i.e. 94, in a form like: first row fifth column. Thus the output should be a numpy array ([1,5]) (matlab-style).

You get the index of the maximum element using arr.argmax(), but to get the actual row and column you must use np.unravel_index as below:
import numpy as np
arr = np.array([[ 1, 4, 5, 7, 8, 6, 7],
[ 2, 23, 2, 4, 8, 94, 2],
[ 1, 5, 6, 7, 10, 15, 20],
[ 3, 9, 2, 7, 6, 5, 4]])
maximum = np.unravel_index(arr.argmax(), arr.shape)
print(maximum)
# (1, 5)
You have to use np.unravel_index as by default np.argmax will return the index from a flattened array (which in your case would be index 12).
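A short sketch that makes the two steps visible, using the array from the question. The use of np.argwhere at the end is an addition not in the answer above; it is one way to list every position when the maximum occurs more than once.

```python
import numpy as np

arr = np.array([[1, 4, 5, 7, 8, 6, 7],
                [2, 23, 2, 4, 8, 94, 2],
                [1, 5, 6, 7, 10, 15, 20],
                [3, 9, 2, 7, 6, 5, 4]])

# argmax without an axis works on the flattened array
flat_index = arr.argmax()

# convert the flat index back into (row, column) for this shape;
# with 7 columns, flat index 12 is row 12 // 7 and column 12 % 7
row_col = np.unravel_index(flat_index, arr.shape)

# all positions that hold the maximum, one [row, column] pair per row
all_positions = np.argwhere(arr == arr.max())

print(flat_index, row_col, all_positions)
```

Note that these indices are zero-based, so (1, 5) is the second row and sixth column in MATLAB-style counting.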

Python 3.x IndexError while using nested For loops

So I've been trying to code a tabletop game that I made a long time ago - I'm working on the graphic section now, and I'm trying to draw the 9x7 tile map using nested For loops:
I'm using the numpy library for my 2d array
gameboard = array( [[8, 8, 8, 7, 7, 7, 8, 8, 8],
[8, 3, 6, 7, 7, 7, 6, 3, 8],
[0, 1, 1, 6, 6, 6, 1, 1, 0],
[0, 5, 4, 0, 0, 0, 4, 5, 0],
[0, 3, 2, 0, 0, 0, 2, 3, 0],
[8, 8, 1, 0, 0, 0, 1, 8, 8],
[8, 8, 8, 6, 6, 6, 8, 8, 8]] )
def mapdraw():
    for x in [0, 1, 2, 3, 4, 5, 6, 7, 8]:
        for y in [0, 1, 2, 3, 4, 5, 6]:
            if gameboard[(x, y)] == 1:
                #insert tile 1 at location
            elif gameboard[(x, y)] == 2:
                #insert tile 2 at location
            elif gameboard[(x, y)] == 3:
                #insert tile 3 at location
            #this continues for all 8 tiles
    #graphics update
When I run this program, I get an error on the line "if gameboard[(x,y)] == 1:"
"IndexError: index (7) out of range (0<=index<7) in dimension 0"
I've looked for hours to find what this error even means, and have tried many different ways to fix it: any help would be appreciated.

You have to index the array using [y,x] because the first coordinate is the row index (which, for you, is the y index).
As an aside, please iterate over a range instead of an explicit list!
for x in range(9):
    for y in range(7):
        if gameboard[y, x] == 1:
            #insert tile 1 at location
        ...
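A self-contained sketch of why the error occurs, using the board from the question. The function name tile_placements and the idea of collecting (x, y, tile) tuples are placeholders invented here, since the actual drawing code is not shown.

```python
import numpy as np

gameboard = np.array([[8, 8, 8, 7, 7, 7, 8, 8, 8],
                      [8, 3, 6, 7, 7, 7, 6, 3, 8],
                      [0, 1, 1, 6, 6, 6, 1, 1, 0],
                      [0, 5, 4, 0, 0, 0, 4, 5, 0],
                      [0, 3, 2, 0, 0, 0, 2, 3, 0],
                      [8, 8, 1, 0, 0, 0, 1, 8, 8],
                      [8, 8, 8, 6, 6, 6, 8, 8, 8]])

# shape is (rows, columns) = (7, 9): the first index may only run from 0 to 6
n_rows, n_cols = gameboard.shape


def tile_placements(board):
    """Hypothetical stand-in for the drawing loop: list (x, y, tile)."""
    rows, cols = board.shape
    placements = []
    for y in range(rows):        # y selects the row
        for x in range(cols):    # x selects the column
            placements.append((x, y, int(board[y, x])))
    return placements


placements = tile_placements(gameboard)
print(n_rows, n_cols, len(placements))
```

Taking the loop bounds from gameboard.shape rather than hard-coding 9 and 7 keeps the loops valid if the map size changes.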
