I'm doing some data processing on data that comes in sets of 3 arrays, each thousands of values long. Sometimes the arrays are of slightly different lengths, and I am trying to find a way to find the shortest array and truncate the other two to its length.
from numpy import array
# Some randomly generated sequences
a = array([7, 1, 7, 8, 0, 0, 1, 2, 8, 7, 2, 3])
b = array([0, 1, 1, 8, 3, 4, 1, 5])
c = array([8, 3, 3, 1, 4, 6, 6, 7, 3, 8, 8])
# What I'd like accomplished
a = array([7, 1, 7, 8, 0, 0, 1, 2])
b = array([0, 1, 1, 8, 3, 4, 1, 5])
c = array([8, 3, 3, 1, 4, 6, 6, 7])
This problem seems well covered for 2 arrays of different lengths, but my searches didn't bring up anything for matching the lengths of more than two arrays. NumPy methods like resize and array_split didn't seem to have the functionality I was looking for. Before I dive into writing some kind of ugly recursive function based on the approaches I found for matching 2 arrays, does anyone have suggestions for doing this conveniently?
First, find the minimum length:
mlen = min(map(len, [a, b, c]))  # 8 for the example arrays
Then slice every array to that length:
newl = [x[:mlen] for x in [a, b, c]]
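The same idea can be wrapped in a small helper that accepts any number of arrays. This is a minimal sketch; `truncate_to_shortest` is a hypothetical name. The slices it returns are views of the originals, so writing into them also changes `a`, `b` and `c`. Call `.copy()` on each slice if you need independent arrays. Because all results share one length, they can also be stacked into a single 2-D array.

```python
import numpy as np

def truncate_to_shortest(*arrays):
    """Cut every array to the length of the shortest one (returns views)."""
    mlen = min(len(x) for x in arrays)
    return [x[:mlen] for x in arrays]

a = np.array([7, 1, 7, 8, 0, 0, 1, 2, 8, 7, 2, 3])
b = np.array([0, 1, 1, 8, 3, 4, 1, 5])
c = np.array([8, 3, 3, 1, 4, 6, 6, 7, 3, 8, 8])

a, b, c = truncate_to_shortest(a, b, c)
print(a)  # [7 1 7 8 0 0 1 2]

# equal lengths now allow stacking into one (3, 8) array
stacked = np.vstack([a, b, c])
```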
I have a dataset of 3 dependent variables (height, color and habit) and 3 independent variables (rep, block and flowername). I have 3 replications and 20 blocks, each block repeated 6 times.
data_com<-data[1:18,]
flower<-structure(list(rep = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1), block = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3), flowername = c("yellow", "orange", "black",
"black1", "orange1", "violet", "violet1", "violet3", "purple",
"purple1", "purple3", "red", "red1", "lila", "sky", "pink", "purple_pink",
"purple_pink1"), height = c(5, 4, 6, 5, 6, 4, 7, 5, 4, 6, 5,
6, 7, 5, 6, 5, 4, 5), color = c(5, 5, 7, 6, 6, 4, 7, 4, 5, 6,
5, 6, 7, 5, 6, 6, 7, 7), habit = c(4, 6, 3, 3, 6, 2, 4, 2, 4,
6, 2, 6, 7, 7, 7, 6, 6, 6)), row.names = c(NA, -18L), class = c("tbl_df",
"tbl", "data.frame"))
My model looks like this:
library(lme4)
data <- readxl::read_excel("flower.xlsx",sheet = 1)
str(data)
names(data)
data$flowername <- as.factor(data$"flowername")
data$rep <- as.factor(data$"rep")
data$block <- as.factor(data$"block")
data$height <- as.numeric(data$"height")
model <- lmer(height ~ 1 + (1|flowername) + (1|rep), data = data)
summary(model)
I would like a loop that fits the model once for each dependent variable. Later I would like to save the random effects for all variables as a list and as an xlsx file, so that I can use them for further analysis. I would also like to save the ANOVA output for all dependent variables as xlsx.
I am new to R and looping seems really difficult for me to understand. Any help would be appreciated.
I am also new to Stack Overflow, so please correct me if the post is not properly formatted. Thank you.
First off, your example data does not work as posted. For so few rows, flowername has too many levels (18, one per row) and rep has too few (only one level). This results in errors when fitting the models. The following example data works, however:
# data_com <- data[1:18, ]  # only relevant for your own `data`; not needed below
flower<-structure(list(rep = rep(1:2, 9),
block = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2,
3, 3, 3, 3, 3, 3),
flowername = rep(c("yellow", "orange", "black"), 6),
height = c(5, 4, 6, 5, 6, 4, 7, 5, 4, 6, 5,
6, 7, 5, 6, 5, 4, 5),
color = c(5, 5, 7, 6, 6, 4, 7, 4, 5, 6,
5, 6, 7, 5, 6, 6, 7, 7),
habit = c(4, 6, 3, 3, 6, 2, 4, 2, 4,
6, 2, 6, 7, 7, 7, 6, 6, 6)),
row.names = c(NA, -18L), class = c("tbl_df",
"tbl", "data.frame"))
flower$flowername <- as.factor(flower$"flowername")
flower$rep <- as.factor(flower$"rep")
flower$block <- as.factor(flower$"block")
flower$height <- as.numeric(flower$"height")
So, to "automatically" run through your dependent variables, you need to make a function that fits the model and extracts the results that you are interested in:
get.re <- function(dependent, dat) {
require(lme4)
dat$dependent <- dat[[dependent]] # specifies the dependent variable
model <- lmer(dependent ~ 1 + (1|flowername) + (1|rep),
data = dat) # fits the model
cat("\n\n============ Model for dependent:", dependent, "============\n")
print(summary(model)) # shows you the summary
ranef(model) # returns the random effects of the model
}
# make a vector of the dependent variable names
dependents <- c("height", "color", "habit")
# apply the function to each dependent variable
fits.ls <- lapply(dependents,
get.re,
dat = flower)
names(fits.ls) <- dependents # name the list elements
ranef() returns the random effects of each model as a list of data frames, one per grouping factor. The row names of each data frame are the levels of that random factor. The following code collapses these nested lists into one data frame per model. Then we save these data frames to an xlsx file, using one sheet per model/data frame.
fits.dfs <- lapply(fits.ls, function(x) {
  # one row per level of each random factor; the "group" column holds the factor name
  dplyr::bind_rows(lapply(x, function(y) data.frame(level = rownames(y),
                                                    value = y[, 1])),
                   .id = "group")
})
library(openxlsx)
wb <- buildWorkbook(fits.dfs)
saveWorkbook(wb, "RandomEffects.xlsx")
Edit:
To keep only the random effects of flowername and put all of them (from all fitted models) into one Excel sheet, transform your output list (fits.ls) as follows. This replaces the last code block:
fits.df <- lapply(fits.ls,
function(x) {dplyr::bind_rows(x$flowername)})
fits.df <- dplyr::bind_cols(fits.df)
colnames(fits.df) <- names(fits.ls)
fits.df <- cbind(rownames(fits.df), fits.df) # flowernames as a column so it is visible in the xlsx
openxlsx::write.xlsx(fits.df, "RandomEffects.xlsx")
Consider the array sample A.
import numpy as np
A = np.array([[2, 3, 6, 7, 3, 6, 7, 2],
[2, 3, 6, 7, 3, 6, 7, 7],
[2, 4, 3, 4, 6, 4, 9, 4],
[4, 9, 0, 1, 2, 5, 3, 0],
[5, 5, 2, 5, 4, 3, 7, 5],
[7, 5, 4, 8, 0, 1, 2, 6],
[7, 5, 4, 7, 3, 8, 0, 7]])
PROBLEM: I want to identify rows that have a specified number of DISTINCT repeated elements. The code needs to be able to answer questions like "which rows of A have exactly 4 elements that appear twice?" or "which rows of A have exactly 1 element that appears three times?" The following code comes close:
r,c = A.shape
nCopies = 4
s = np.sort(A,axis=1)
out = A[((s[:,1:] != s[:,:-1]).sum(axis=1)+1 == c - nCopies)]
This produces 2 output rows, both having 4 copied elements.
The 1st row has copies of 2,3,6,7. The 2nd row has copies of 3,6,7,7:
array([[2, 3, 6, 7, 3, 6, 7, 2],
[2, 3, 6, 7, 3, 6, 7, 7]])
My problem is that I don't want the 2nd output row, because it only has 3 DISTINCT repeated values (i.e. 3, 6, 7).
How can the code be modified to count only distinct repeated values?
If I understand correctly, you want the rows of A that have 4 distinct values and every value must have at least one copy. You can leverage np.unique(return_counts=True) which returns 2 values, the distinct values and the count of each value.
counts = [np.unique(row,return_counts=True) for row in A ]
valid_indices = [ np.all(row[1] > 1) and row[0].shape[0] == 4 for row in counts ]
valid_rows = A[valid_indices]
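The sort-based test in the question only counts how many distinct values each row has. Rows 0 and 1 both contain exactly 4 (2, 3, 6 and 7), which is why both matched. To answer the general form of the question ("exactly k values that each appear n times"), one option is to count, per row, how many distinct values have a count equal to n. This is a sketch that assumes "appear twice" means *exactly* twice. `rows_with_repeats` is a hypothetical helper name.

```python
import numpy as np

A = np.array([[2, 3, 6, 7, 3, 6, 7, 2],
              [2, 3, 6, 7, 3, 6, 7, 7],
              [2, 4, 3, 4, 6, 4, 9, 4],
              [4, 9, 0, 1, 2, 5, 3, 0],
              [5, 5, 2, 5, 4, 3, 7, 5],
              [7, 5, 4, 8, 0, 1, 2, 6],
              [7, 5, 4, 7, 3, 8, 0, 7]])

def rows_with_repeats(A, n_times, k_values):
    """Boolean mask of rows in which exactly k_values distinct values
    each occur exactly n_times."""
    mask = np.zeros(len(A), dtype=bool)
    for i, row in enumerate(A):
        _, counts = np.unique(row, return_counts=True)  # occurrences per distinct value
        mask[i] = np.count_nonzero(counts == n_times) == k_values
    return mask

print(A[rows_with_repeats(A, 2, 4)])               # only row 0
print(np.flatnonzero(rows_with_repeats(A, 3, 1)))  # [1 6]
```

The loop runs once per row, which is fine for a few thousand rows. A fully vectorized version is possible but harder to read.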
I need to write a Java method int[] easyAs123(int[] nums) that takes an array of integers and returns an array containing exactly the same numbers, rearranged so that every 1, 2 pair is immediately followed by a 3, keeping the order of appearance, as in the test cases below.
Specifically,
In the input array, every 1 is always immediately followed by a 2.
However, even though every 2 is immediately followed by a number, not every 2 is immediately followed by a 3.
Do not move the 1s and 2s; any other number may swap places with a 3.
The input array contains the same number of 1s, 2s and 3s.
For example:
easyAs123([5, 1, 2, 4, 3, 5]) → [5, 1, 2, 3, 4, 5]
easyAs123([1, 2, 9, 8, 3, 5, 3, 7, 1, 2, 6, 4]) → [1, 2, 3, 8, 9, 5, 6, 7, 1, 2, 3, 4]
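One possible approach is a single left-to-right pass. Whenever a 2 is not followed by a 3, swap its successor with the first "free" 3, meaning a 3 that is not already sitting right after a 2. It is sketched here in Python rather than the requested Java; the index logic translates directly into a `for` loop over an `int[]`. The sketch assumes, as the rules imply, that the element after a 2 is never itself a 1 or a 2. Because the counts of 2s and 3s are equal, a free 3 always exists when a 2 still needs one. `easy_as_123` is a hypothetical name.

```python
def easy_as_123(nums):
    """Return a copy of nums in which every 1, 2 pair is followed by a 3."""
    out = list(nums)
    for i in range(len(out) - 1):
        if out[i] == 2 and out[i + 1] != 3:
            # first 3 that is not already placed directly after a 2
            j = next(k for k in range(len(out))
                     if out[k] == 3 and (k == 0 or out[k - 1] != 2))
            out[i + 1], out[j] = out[j], out[i + 1]  # 1s and 2s never move
    return out

print(easy_as_123([5, 1, 2, 4, 3, 5]))  # [5, 1, 2, 3, 4, 5]
```

Choosing the first free 3 reproduces both expected outputs above. Other choices would still satisfy the 1, 2, 3 rule but could order the remaining numbers differently.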
Suppose I have an array with 10 elements, e.g. a=np.arange(10). If I want to create another array with the 1st, 3rd, 5th, 7th, 9th, 10th elements of the original array, i.e. b=np.array([0,2,4,6,8,9]), how can I do it efficiently?
thanks
a[[0, 2, 4, 6, 8, 9]]
Index a with a list or array of the desired indices. (Not 1, 3, 5, 7, 9, 10, because indexing starts from 0.) It's a bit confusing that the indices and the values are the same here, so here is a different example:
>>> a = np.array([5, 4, 6, 3, 7, 2, 8, 1, 9, 0])
>>> a[[0, 2, 4, 6, 8, 9]]
array([5, 6, 7, 8, 9, 0])
Note that this creates a copy, not a view. Also, note that this might not generalize to multiple axes the way you expect.
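The two caveats can be seen directly in the short sketch below. `np.r_` is one convenient way to build an index list that mixes a stepped range with extra positions. Fancy indexing with a list copies the data, while basic slicing such as `a[::2]` returns a view. With two index lists on a 2-D array, NumPy pairs the indices element-wise instead of selecting a sub-block; `np.ix_` gives the sub-block.

```python
import numpy as np

a = np.array([5, 4, 6, 3, 7, 2, 8, 1, 9, 0])

idx = np.r_[0:9:2, 9]   # array([0, 2, 4, 6, 8, 9])
b = a[idx]              # fancy indexing -> independent copy
b[0] = -1
print(a[0])             # 5: the original is unchanged

v = a[::2]              # basic slicing -> view into a
v[0] = -1
print(a[0])             # -1: writing through the view changes a

M = np.arange(12).reshape(3, 4)
pairs = M[[0, 2], [1, 3]]          # elements (0, 1) and (2, 3): [1, 11]
block = M[np.ix_([0, 2], [1, 3])]  # rows 0, 2 x columns 1, 3: [[1, 3], [9, 11]]
```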
So I've been trying to code a tabletop game that I made a long time ago - I'm working on the graphic section now, and I'm trying to draw the 9x7 tile map using nested For loops:
I'm using the numpy library for my 2d array
from numpy import array

gameboard = array( [[8, 8, 8, 7, 7, 7, 8, 8, 8],
[8, 3, 6, 7, 7, 7, 6, 3, 8],
[0, 1, 1, 6, 6, 6, 1, 1, 0],
[0, 5, 4, 0, 0, 0, 4, 5, 0],
[0, 3, 2, 0, 0, 0, 2, 3, 0],
[8, 8, 1, 0, 0, 0, 1, 8, 8],
[8, 8, 8, 6, 6, 6, 8, 8, 8]] )
def mapdraw():
    for x in [0, 1, 2, 3, 4, 5, 6, 7, 8]:
        for y in [0, 1, 2, 3, 4, 5, 6]:
            if gameboard[(x, y)] == 1:
                pass  # insert tile 1 at location
            elif gameboard[(x, y)] == 2:
                pass  # insert tile 2 at location
            elif gameboard[(x, y)] == 3:
                pass  # insert tile 3 at location
            # this continues for all 8 tiles
    # graphics update
When I run this program, I get an error on the line "if gameboard[(x,y)] == 1:"
"IndexError: index (7) out of range (0<=index<7) in dimension 0"
I've looked for hours to find what this error even means, and have tried many different ways to fix it: any help would be appreciated.
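You have to index the array using [y, x], because the first coordinate is the row index (which, for you, is the y index). gameboard has shape (7, 9), i.e. 7 rows and 9 columns. As soon as x reaches 7, gameboard[(x, y)] runs past the first axis, which is exactly what the IndexError reports.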
As an aside, please iterate over a range instead of an explicit list!
for x in range(9):
    for y in range(7):
        if gameboard[y, x] == 1:
            pass  # insert tile 1 at location
        # ... and so on for the other tiles
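Another option that avoids hard-coding 9 and 7 is to let NumPy yield the coordinates. `np.ndenumerate` produces `((row, col), value)` pairs in row-major order, so the row index is y and the column index is x. In this sketch the "drawing" step is a hypothetical stand-in that just records `(x, y, tile)` tuples. In a real game that line would be replaced by the graphics call for the tile.

```python
import numpy as np

gameboard = np.array([[8, 8, 8, 7, 7, 7, 8, 8, 8],
                      [8, 3, 6, 7, 7, 7, 6, 3, 8],
                      [0, 1, 1, 6, 6, 6, 1, 1, 0],
                      [0, 5, 4, 0, 0, 0, 4, 5, 0],
                      [0, 3, 2, 0, 0, 0, 2, 3, 0],
                      [8, 8, 1, 0, 0, 0, 1, 8, 8],
                      [8, 8, 8, 6, 6, 6, 8, 8, 8]])

print(gameboard.shape)  # (7, 9): 7 rows (y), 9 columns (x)

def mapdraw(board):
    placements = []
    for (y, x), tile in np.ndenumerate(board):  # row index first, then column
        # hypothetical stand-in for "draw tile `tile` at screen position (x, y)"
        placements.append((x, y, int(tile)))
    return placements

tiles = mapdraw(gameboard)
```

The loop adapts automatically if the board size ever changes, because the coordinates come from the array itself.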