Error in seq_len(nrow(i)) - permutation

I am getting the following error message when using adonis in the vegan package:
'nperm' >= set of all permutations: complete enumeration.
Set of permutations < 'minperm'. Generating entire set.
Error in seq_len(nrow(i)) :
argument must be coercible to non-negative integer
In addition: Warning message:
In seq_len(nrow(i)) : first element used of 'length.out' argument
I have the following dist object:
structure(c(0.0363993060204278, 0.153646867325815, 0.240343525408732,
0.252037785179205, 0.285288085910786, 0.130845513407629, 0.24076787797198,
0.253242846053041, 0.286498377946087, 0.301920318321807, 0.280128731379395,
0.311665395565766, 0.0607868389909555, 0.104081030624619, 0.0776041876382889
), Size = 6L, Labels = c("Day10F1", "Day10F2", "Day10F3", "Day10F4",
"Day10F5", "Day10F6"), Diag = FALSE, Upper = FALSE, method = "bray", call =
vegdist(x = deco_UntMet10,
method = "bray", binary = FALSE), class = "dist")
And a second data.frame containing my groupings
structure(list(Treatment = structure(c(11L, 11L, 11L, 4L, 4L,
4L), .Label = c("Chlora", "Gen", "Lin", "Metro", "Metro+Pen",
"Metro+Rif", "Metro+Rif+Pen", "Pen", "Pen+Rif", "Rif", "Untreated"
), class = "factor"), CDW = c(2.7, 3.3, 3.133, 1.333, 1.333,
1.367), Chlorophyll = c(34.714, 37.773, 40.54, 4.67, 4.67, 4.934
), EPS = c(0.571, 0.591, 0.597, 0.166, 0.171, 0.179), Day = c(10L,
10L, 10L, 10L, 10L, 10L)), .Names = c("Treatment", "CDW", "Chlorophyll",
"EPS", "Day"), row.names = c("Day10F1", "Day10F2", "Day10F3",
"Day10F4", "Day10F5", "Day10F6"), class = "data.frame")
I want to test whether the two treatment groups are significantly different (Untreated versus Chlora). I used the following code:
mod5 <- betadisper(vegUntMet10, mtdt_UntMet10$Treatment)
mod5
permutest(mod5, permutations = 1000)
plot(mod5)
boxplot(mod5)
TukeyHSD(mod5)
### so my data is homogenous and so I proceed to test for significance between samples
perm5 <- how(nperm = 1000)
setBlocks(perm) <- with(mtdt_UntMet10, Treatment)
adonis(vegUntMet10 ~ mtdt_UntMet10$Treatment,
strata=mtdt_UntMet10$Treatment)
So now I get the error message. It seems that I am doing something wrong, as I have used this same code before on the full data set.
Can anyone help? Thanks.
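As a side calculation on the two informational messages (not a diagnosis of the error itself), the sketch below counts how many distinct permutations exist for six samples in two groups of three. It assumes, as the vegan documentation describes, that strata restricts shuffling to within each level; the group sizes are read off the Treatment column shown above.

```python
from math import factorial

# Group sizes taken from the Treatment column above: three samples per level.
group_sizes = [3, 3]
n_samples = sum(group_sizes)

# Free shuffling of all samples: n! possible orderings.
free_permutations = factorial(n_samples)

# Shuffling only within each stratum: product of the per-group factorials.
within_strata_permutations = 1
for size in group_sizes:
    within_strata_permutations *= factorial(size)

print(free_permutations)           # 720
print(within_strata_permutations)  # 36
```

Both counts are below the 1000 permutations requested, which is consistent with the "complete enumeration" message. Under the same assumption, every within-stratum shuffle leaves the Treatment labels unchanged when the strata are the Treatment levels themselves.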

Python : Replace a column in a dataframe by datetime values

I'm trying to replace one column of a 4-column array with datetime values that I have processed. The problem is that it is difficult to keep the same layout across the different formats (DataFrame, array, ...).
dataw = ds.variables["pr"][:]
dataw = np.array(dataw[:,0,0])
lat = ds.variables["lat"][:]
long = ds.variables["lon"][:]
time = ds.variables["time"][:]
time = pd.to_datetime(ds.variables["time"][:],origin=pd.Timestamp('1850-01-01'),unit='D')
#np.datetime64(ds.variables["time"][:],'D')
x2 = pd.DataFrame(np.zeros((len(dataw),4), float))
x = np.zeros((len(dataw),4), float)
x[:,0] = time
x[:,1] = long
x[:,2] = lat[:]
x[:,3] = dataw[:]*86400
x=pd.DataFrame(x)
x[:,0] = pd.to_datetime(time,origin=pd.Timestamp('1850-01-01'),unit='D')
If I put the transformed dates directly into the array, the result looks like: 1.32542e+18
I tried
time = ds.variables["time"][:]
and include it in the array, and then use
x[:,0]=pd.to_datetime(x[:,0],origin=pd.Timestamp('1850-01-01'),unit='D')
I get the error:
TypeError: unhashable type: 'slice'
I also tried assigning directly:
time=pd.to_datetime(time,origin=pd.Timestamp('1850-01-01'),unit='D')
x[:,0] = time[:]
TypeError: unhashable type: 'slice'
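Try this instead. A NumPy array has a single dtype, so keep the numeric columns in the array and add the datetime column to the DataFrame afterwards:
import numpy as np
import pandas as pd
dataw = ds.variables["pr"][:]
dataw = np.array(dataw[:, 0, 0])
lat = ds.variables["lat"][:]
long = ds.variables["lon"][:]
# convert the numeric time axis (days since 1850-01-01) to datetimes
time = pd.to_datetime(np.asarray(ds.variables["time"][:]), origin=pd.Timestamp('1850-01-01'), unit='D')
# the array holds the numeric columns only
x = np.zeros((len(dataw), 3), dtype=float)
x[:, 0] = long
x[:, 1] = lat
x[:, 2] = dataw * 86400
df = pd.DataFrame(x, columns=["Longitude", "Latitude", "Data"])
# add the datetime column in front; it keeps its own dtype
df.insert(0, "Time", time)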
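A minimal sketch with synthetic data, in case it helps to see the idea in isolation: each DataFrame column carries its own dtype, so the dates survive as datetimes, and position-based access goes through .iloc rather than x[:, 0] (which is NumPy syntax and is what triggers the error in the question). The sample values are assumptions, not taken from the netCDF file.

```python
import numpy as np
import pandas as pd

# Assumed stand-ins for ds.variables["time"][:] and the precipitation values.
days = np.array([0.0, 1.0, 2.0])        # days since 1850-01-01
values = np.array([1e-5, 2e-5, 3e-5])   # per-second rates

# Convert the numeric time axis to datetimes.
time = pd.to_datetime(days, origin=pd.Timestamp("1850-01-01"), unit="D")

# Build the frame column by column so the datetime column is never cast to float.
df = pd.DataFrame({"Time": time, "Data": values * 86400})

# Position-based access on a DataFrame uses .iloc.
first_time = df.iloc[0, 0]
time_dtype = str(df["Time"].dtype)

print(df)
```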
Xarray makes the netcdf->pandas workflow quite straightforward:
import xarray as xr
ds = xr.open_dataset('file.nc', engine='netcdf4')
df = ds.to_pandas()
Presuming your time variable follows the CF conventions, xarray will automatically decode it into datetime objects.

Data arrays must have the same length, and match time discretization in dynamic problems error in GEKKO

I want to find the value of the parameter m that minimizes my variable x subject to a system of differential equations. I have the following code
from gekko import GEKKO
import numpy as np
def run_model_m(days, population, case, k_val, b_val, u0_val, sigma_val, Kmax0, a_val, c_val):
    list_x =[]
    list_u =[]
    list_Kmax =[]
    for i in range(len(days)):
        list_xi=[]
        list_ui=[]
        list_Ki=[]
        for j in range(len(days[i])):
            #try:
            m = GEKKO(remote=False)
            #m.time= days[i][j]
            eval = np.linspace(days[i][j][0], days[i][j][-1], 100, endpoint=True)
            m.time = eval
            x_data= population[i][j]
            variable= np.linspace(population[i][j][0], population[i][j][-1], 100, endpoint=True)
            x = m.Var(value=population[i][j][0], lb=0)
            sigma= m.Param(sigma_val)
            d = m.Param(c_val)
            k = m.Param(k_val)
            b = m.Param(b_val)
            r = m.Param(a_val)
            step = np.ones(len(eval))
            step= 0.2*step
            step[0]=1
            m_param = m.CV(value=1, lb=0, ub=1, integer=True); m_param.STATUS=1
            u = m.Var(value=u0_val, lb=0, ub=1)
            #m.free(u)
            a = m.Param(a_val)
            c= m.Param(c_val)
            Kmax= m.Param(Kmax0)
            if case == 'case0':
                m.Equations([x.dt()== x*(r*(1-x/(Kmax))-m_param/(k+b*u)-d), u.dt()== sigma*(m_param*b/((k+b*u)**2))])
            elif case == 'case4':
                m.Equations([x.dt()== x*(r*(1-u**2)*(1-x/(Kmax))-m_param/(k+b*u)-d), u.dt() == sigma*(-2*u*r*(1-x/(Kmax))+(b*m_param)/(b*u+k)**2)])
            p = np.zeros(len(eval))
            p[-1] = 1.0
            final = m.Param(value=p)
            m.Obj(x)
            m.options.IMODE = 6
            m.options.MAX_ITER=15000
            m.options.SOLVER=1
            # optimize
            m.solve(disp=False, GUI=False)
            #m.open_folder(dataset_path+'inf')
            list_xi.append(x.value)
            list_ui.append(u.value)
            list_Ki.append(m_param.value)
        list_x.append(list_xi)
        list_Kmax.append(list_Ki)
        list_u.append(list_ui)
    return list_x, list_u, list_Kmax, m.options.OBJFCNVAL
scaled_days[i][j] =[-7.0, 42.0, 83.0, 125.0, 167.0, 217.0, 258.0, 300.0, 342.0]
scaled_pop[i][j] = [0.01762491277346285, 0.020592540360308997, 0.017870838266697213, 0.01690069378982034,0.015512320147187675,0.01506701796298272,0.014096420738841563,0.013991224004743027,0.010543380664478205]
k0,b0,group, case0, u0, sigma0, K0, a0, c0 = (100, 20, 'Size3, Inc', 'case0', 0.1, 0.05, 2, 0, 0.01)
list_x2, list_u2, list_Kmax2,final =run_model_m(days=[[scaled_days[i][j]]], population=
[[scaled_pop[i][j]]],case=case1, k_val=list_b1[i0][0], b_val=b1, u0_val=list_u1[i0][j0],
sigma_val=sigma1, Kmax0=K1, a_val=list_Kmax1[0][0], c_val=c1)
I get the error "Data arrays must have the same length, and match time discretization in dynamic problems", but I don't understand why. I have tried making x and m_param arrays (with x = m.Var, m_param = m.MV, ...), but I still get the same error, even when they are all arrays of the same length. Is this the right way to solve the minimization problem?
I think the error was simply that I was passing a list as u0_val to run_model_m, and it did not have the same dimensions as m.time. So it should be u0_val=list_u1[0][0][0].
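A small sketch of that kind of shape check, independent of GEKKO: an initial value is accepted if it is a scalar or a flat sequence with one entry per time point, and rejected otherwise. The helper name check_initial_value and the sample list are hypothetical, introduced only for illustration; the 100 time points match the np.linspace call in the question.

```python
from numbers import Real

def check_initial_value(value, n_time):
    """Hypothetical helper: accept a scalar or a flat sequence with n_time numbers."""
    if isinstance(value, Real):
        return value
    if len(value) == n_time and all(isinstance(v, Real) for v in value):
        return list(value)
    raise ValueError(
        f"expected a scalar or {n_time} numbers, got a sequence of length {len(value)}"
    )

n_time = 100                     # number of points in m.time in the question
list_u1 = [[[0.1, 0.12, 0.15]]]  # assumed shape of results from an earlier run

# Indexing down to a single number gives a valid initial value.
ok = check_initial_value(list_u1[0][0][0], n_time)

# Passing the whole inner list (length 3, not 100) is rejected.
try:
    check_initial_value(list_u1[0][0], n_time)
    rejected = False
except ValueError:
    rejected = True

print(ok, rejected)
```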

Map Layer Issues in ggplot2

I'm having a few issues with finalizing my map for a report. I think I'm warm on the solutions, but haven't quite figured them out. I would really appreciate any help on solutions so that I can finally move on!
1) The scale bar will NOT populate in the MainMap code and the subsequent Figure1 plot. It does appear if I comment out the "BCWA_land" map layer in the MainMap code. However, when I retain the "BCWA_land" layer, the scale bar disappears and I get this warning:
Warning message: Removed 3 rows containing missing values (geom_text).
And this is the code:
MainMap <- ggplot(QOI) +
geom_sf(aes(fill = quadID)) +
scale_fill_manual(values = c("#6b8c42",
"#70b2ae",
"#d65a31")) +
labs(fill = "Quadrants of Interest",
caption = "Figure 1: Map depicting the quadrants in the WSDOT project area as well as other quadrants of interest in the Puget Sound area.")+
ggtitle("WSDOT Project Area and Quadrants of Interest") +
scalebar(x.min = -123, x.max = -122.8, y.min = 47, y.max = 47.1, location = "bottomleft",
transform = TRUE, dist = 10, dist_unit = "km", st.size = 3, st.bottom = TRUE, st.dist = 0.1) +
north(data = QOI, location = "topleft", scale = 0.1, symbol = 12, x.min = -123, y.min = 48.3, x.max = -122.7, y.max = 48.4) +
theme_bw()+
theme(panel.grid= element_line(color = "gray50"),
panel.background = element_blank(),
panel.ontop = TRUE,
legend.text = element_text(size = 11, margin = margin(l = 3), hjust = 0),
legend.position = c(0.95, 0.1),
legend.justification = c(0.85, 0.1),
legend.background = element_rect(colour = "#3c4245", fill = "#f4f4f3"),
axis.title = element_blank(),
plot.title = element_text(face = "bold", colour = "#3c4245", hjust = 0.5, margin = margin(b=10, unit = "pt")),
plot.caption = element_text(face = "italic", colour = "#3c4245", margin = margin(t = 7), hjust = 0, vjust = 0.5)) +
geom_sf(data = BCWA_land) + #this is what I've tried to comment out to determine the scale bar problem
xlim (-123.1, -121.4) +
ylim (47.0, 48.45)
MainMap
InsetRect <- data.frame(xmin=-123.2, xmax=-122.1, ymin=47.02, ymax=48.45)
InsetMap <- ggplotGrob( ggplot( quads) +
geom_sf(aes(fill = "")) +
scale_fill_manual(values = c("#eefbfb"))+
geom_sf(data = BCWA_land) +
scale_x_continuous(expand = c(0,0), limits = c(-124.5, -122.0)) +
scale_y_continuous(expand = c(0,0), limits = c(47.0, 49.5)) +
geom_rect(data = InsetRect,aes(xmin=xmin, xmax=xmax, ymin=ymin, ymax=ymax),
color="#3c4245",
size=1.25,
fill=NA,
inherit.aes = FALSE) +
theme_bw()+
theme(legend.position = "none",
panel.grid = element_blank(),
axis.title = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank(),
plot.margin = margin(0,0,0,0)))
InsetMap
Figure1 <- MainMap +
annotation_custom(grob = InsetMap, xmin = -122.2, xmax = -121.3,
ymin = 47.75, ymax = 48.5)
Figure1
As you can see I'm not getting this issue or error for my north arrow so I'm not really sure what is happening with the scale bar!
2) This problem is probably a little too OCD; however, I REALLY don't want the gridlines to show up on the InsetMap. I was hoping that the InsetMap would overlay on top of the MainMap without gridlines, as I had those parameters set to element_blank() in the InsetMap code.
Here is an image of my plot. If you would like the data for this, please let me know. Because these are shapefiles, the data is unwieldy and not super conducive to SO's character limit for a post...
If anyone has any insight into a solution(s) I would so so appreciate that!! Thanks for your time!
The issue was the
xlim (-123.1, -121.4) +
ylim (47.0, 48.45)
call that I made. Instead, I used coord_sf(xlim = c(min, max), ylim = c(min, max)). I thought this would be helpful to someone who might be in my position later on!
Essentially, setting the limits with the xlim/ylim calls truncates the data that is available to the plot, whereas the coord_sf call simply "focuses" the plot on that extent without altering the data in your dataset.

R: how to properly create rx_forest_model object?

I'm trying to do a churn analysis with R and SQL Server 2016.
I have uploaded my dataset on my database in a local SQL Server and I did all the preliminary work on this dataset.
Well, now I have this function trainModel(), which I would use to estimate my random forest model:
trainModel = function(sqlSettings, trainTable) {
sqlConnString = sqlSettings$connString
trainDataSQL <- RxSqlServerData(connectionString = sqlConnString,
table = trainTable,
colInfo = cdrColInfo)
## Create training formula
labelVar = "churn"
trainVars <- rxGetVarNames(trainDataSQL)
trainVars <- trainVars[!trainVars %in% c(labelVar)]
temp <- paste(c(labelVar, paste(trainVars, collapse = "+")), collapse = "~")
formula <- as.formula(temp)
## Train a random forest with rxDForest on SQL data source
library(RevoScaleR)
rx_forest_model <- rxDForest(formula = formula,
data = trainDataSQL,
nTree = 8,
maxDepth = 16,
mTry = 2,
minBucket = 1,
replace = TRUE,
importance = TRUE,
seed = 8,
parms = list(loss = c(0, 4, 1, 0)))
return(rx_forest_model)
}
But when I run the function I get this unexpected output:
> system.time({
+ trainModel(sqlSettings, trainTable)
+ })
user system elapsed
0.29 0.07 58.18
Warning message:
In tempGetNumObs(numObs) :
Number of observations not available for this data source. 'numObs' set to 1e6.
With this warning message, the function trainModel() does not create the object rx_forest_model.
Does anyone have any suggestions on how to solve this problem?
After several attempts, I found the reason why the function trainModel() did not work properly.
It is not a connection string problem, and it is not a data source type issue either.
The problem is in the syntax of the function trainModel().
It is enough to remove this statement from the body of the function:
return(rx_forest_model)
This way, the function still emits the same warning message, but it creates the object rx_forest_model correctly.
So, the correct function is:
trainModel = function(sqlSettings, trainTable) {
sqlConnString = sqlSettings$connString
trainDataSQL <- RxSqlServerData(connectionString = sqlConnString,
table = trainTable,
colInfo = cdrColInfo)
## Create training formula
labelVar = "churn"
trainVars <- rxGetVarNames(trainDataSQL)
trainVars <- trainVars[!trainVars %in% c(labelVar)]
temp <- paste(c(labelVar, paste(trainVars, collapse = "+")), collapse = "~")
formula <- as.formula(temp)
## Train a random forest with rxDForest on SQL data source
library(RevoScaleR)
rx_forest_model <- rxDForest(formula = formula,
data = trainDataSQL,
nTree = 8,
maxDepth = 16,
mTry = 2,
minBucket = 1,
replace = TRUE,
importance = TRUE,
seed = 8,
parms = list(loss = c(0, 4, 1, 0)))
}

Biopython for Loop IndexError

I get "IndexError: list index out of range" when I run this code. Also, retmax is set to 614 because that's the total number of results when I make the request. Is there a way to make retmax equal to the number of results, using a variable that changes depending on the search results?
#!/usr/bin/env python
from Bio import Entrez
Entrez.email = "something#gmail.com"
handle1 = Entrez.esearch(db = "nucleotide", term = "dengue full genome", retmax = 614)
record = Entrez.read(handle1)
IdNums = [int(i) for i in record['IdList']]
while i >= 0 and i <= len(IdNums):
    handle2 = Entrez.esearch(db = "nucleotide", id = IdNums[i], type = "gb", retmode = "text")
    record = Entrez.read(handle2)
    print(record)
    i += 1
Rather than using a while loop, you can use a for loop:
from Bio import Entrez
Entrez.email = 'youremailaddress'
handle1 = Entrez.esearch(db = 'nucleotide', term = 'dengue full genome', retmax = 614)
record = Entrez.read(handle1)
IdNums = [int(i) for i in record['IdList']]
for i in IdNums:
    print(i)
    handle2 = Entrez.esearch(db = 'nucleotide', term = 'dengue full genome', id = i, rettype = 'gb', retmode = 'text')
    record = Entrez.read(handle2)
    print(record)
I ran it on my computer and it seems to work. The for loop solved the out-of-bounds error, and adding the term to handle2 solved the calling error.
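For reference, a minimal sketch of why the original while loop overran the list, using a stand-in list instead of a live Entrez query (the ID values are made up). Valid indices run from 0 to len(id_nums) - 1, so a condition of i <= len(id_nums) allows one step too many.

```python
# Stand-in for the IDs parsed from record['IdList']; assumed values.
id_nums = [101, 102, 103]

# Index-based loop with the corrected bound (< instead of <=).
visited_by_index = []
i = 0
while i < len(id_nums):
    visited_by_index.append(id_nums[i])
    i += 1

# Iterating directly avoids index bookkeeping altogether.
visited_directly = [id_num for id_num in id_nums]

# The index len(id_nums) itself is already out of range.
try:
    id_nums[len(id_nums)]
    overran = False
except IndexError:
    overran = True

print(visited_by_index, visited_directly, overran)
```

On the retmax part of the question: the parsed esearch result also includes a "Count" entry with the total number of hits, which may be a way to avoid hard-coding 614; I have not checked that against this particular search.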
