Turning an Array into a Data Frame

I would like to put my summary statistics into a table using the kable function, but I cannot because it comes up as an array.
```{r setup options, include = FALSE}
knitr::opts_chunk$set(fig.width = 8, fig.height = 5, echo = TRUE)
library(mosaic)
library(knitr)
```
```{r}
sum = summary(SwimRecords$time) # generic data set included with mosaic package
kable(sum) # I want this to be printed into a table
```
Any suggestions?

You can do so easily with the broom package, which is built to "tidy" these stats-related objects:
# install.packages("broom")
broom::tidy(sum)
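From there, the tidied result is an ordinary data frame, so it can go straight into kable (a small sketch continuing the code above):

```r
library(broom)
library(knitr)
kable(tidy(sum))  # tidy() turns the summary into a data frame kable can render
```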

Related

Generate all combinations of the SUM in Ruby but only using a specific amount of numbers

I am currently pulling in F1 prices from an API, placing them into an array, and determining which combinations are less than or equal to 20. I am using the below successfully:
require 'net/http'
require 'json'

@url = 'HIDDEN URL AS HAS NO RELEVANCE'
@uri = URI(@url)
@response = Net::HTTP.get(@uri)
@fantasy = JSON.parse(@response)

arr = [[@fantasy.first["Mercedes"].to_f, @fantasy.first["Ferrari"].to_f], [@fantasy.first["Hamilton"].to_f, @fantasy.first["Verstappen"].to_f]]
target = 20
@array = arr[0].product(*arr[1..-1]).select { |a| a.reduce(:+) <= target }
Where:
@fantasy = [{"Mercedes" => "4", "Ferrari" => "6.2", "Hamilton" => "7.1", "Verstappen" => "3"}]
This is successfully outputting:
[[4.0, 7.1], [4.0, 3.0], [6.2, 7.1], [6.2, 3.0]]
Eventually this will contain all F1 teams on the left side and all F1 drivers on the right (making an F1 fantasy team builder). But the idea is that only 1 constructor and 5 drivers are needed for the combination, which should be equal to or less than 20.
Is there a way to define this? To only use 1 team (Mercedes, Ferrari, etc.) and 5 drivers (Hamilton, Verstappen, etc.) in the calculation? Obviously I do not have 5 drivers included yet, as I am just testing. So my output would be:
[[4.0, 7.1, 3.0], [6.2, 7.1, 3.0]]
Where the constructor forms the 'base' for the calculation, and it can then have any 5 of the driver values?
My final question is: considering what I am trying to do, is this the best way to put my API data into an array? That is, manually placing @fantasy.first["Mercedes"].to_f inside my array brackets?
Thanks!
Not sure if I understand the question, but does this help?
arr = @fantasy.first.values.map(&:to_f)
target = 20
p result = arr.combination(2).select{|combi| combi.sum <= target}
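To enforce "exactly 1 team plus N drivers" (and avoid hand-placing each @fantasy.first[...] call), one sketch is to split the parsed hash into a teams hash and a drivers hash and combine those; the split itself is an assumption here, and the names and prices are just the question's example values:

```ruby
teams   = { "Mercedes" => 4.0, "Ferrari" => 6.2 }
drivers = { "Hamilton" => 7.1, "Verstappen" => 3.0 }
target  = 20
drivers_needed = 2 # would be 5 once all drivers are pulled from the API

# for each team, pair its price with every combination of driver prices
# and keep only the combinations under the target
result = teams.flat_map do |_team, team_price|
  drivers.values.combination(drivers_needed)
         .select { |combo| team_price + combo.sum <= target }
         .map { |combo| [team_price, *combo] }
end
p result # => [[4.0, 7.1, 3.0], [6.2, 7.1, 3.0]]
```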

Lapply function to anova and post hoc test cld

I am new to R and I am trying to get my mind around the apply family of functions. So far I have managed to run my ANOVAs for all the variables in my data, and I got the pairwise comparisons.
varlist <- names(dt)[5:length(dt)]
# loop: fit one model per response variable
models <- lapply(X = varlist,
                 FUN = function(t) lm(formula = paste0("`", t, "` ~ block + irrigation * genotype"), data = dt))
# name the list of models after the column names
names(models) = varlist
## apply anova to each model stored in the list, models
lapply(models, anova)
## marginal means for all variables (emmeans() comes from the emmeans package)
res.model1 <- lapply(models, function(x) pairs(emmeans(x, ~ genotype:irrigation)))
res.model1
So far so good. Now I want to create a compact letter display that I can use for plotting. Previously I used the following, but I can't work out how to wrap it in lapply:
CLD = cld(res.model1,
          alpha = 0.05,
          Letters = letters,
          adjust = "tukey")
I use the CLD data to create graphs.
I managed to get the letters with the following code, but then I am not getting the full ANOVA table.
tx <- with(dt, interaction(irrigation, genotype))  # combining the factors
model2 <- lapply(varlist, function(x) {
  lm(substitute(i ~ block + tx, list(i = as.name(x))), data = dt)  # using the combined factor "tx"
})
lapply(model2, anova)
letters = lapply(model2, function(m) HSD.test(m, "tx", alpha = 0.05, group = TRUE, console = TRUE))  # HSD.test() is from agricolae
Any suggestions on how to achieve what I need?
Thank you
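One way to wrap the cld() call in lapply (a sketch, assuming the emmeans and multcomp packages are attached; note that cld() expects the emmeans grid itself, not the output of pairs()):

```r
# keep the emmeans grids rather than their pairwise contrasts
emm.list <- lapply(models, function(m) emmeans(m, ~ genotype:irrigation))
# map cld() over the list, passing the extra arguments through lapply
cld.list <- lapply(emm.list, cld,
                   alpha = 0.05, Letters = letters, adjust = "tukey")
cld.list[[1]]  # compact letter display for the first response variable
```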

Using lookup tables to plot a ggplot and table

I'm creating a Shiny app and I'm letting the user choose what data should be displayed in a plot and a table. This choice is made through 3 different input variables that contain 14, 4 and 2 choices respectively.
ui <- dashboardPage(
  dashboardHeader(),
  dashboardSidebar(
    selectInput(inputId = "DataSource", label = "Data source",
                choices = c("Restoration plots", "all semi natural grasslands")),
    # choicesVariables definition is omitted here because it's very long,
    # but it contains 14 string values
    selectInput(inputId = "Variabel", label = "Variable",
                choices = choicesVariables),
    selectInput(inputId = "Factor", label = "Factor",
                choices = c("Company type", "Region and type of application",
                            "Approved or not approved applications", "Age group"))
  ),
  dashboardBody(
    plotOutput("thePlot"),
    tableOutput("theTable")
  )
)
This adds up to 73 choices (yes, I know the math doesn't add up there, but some choices are invalid). I would like to do this using a lookup table, so I created one with every valid combination of choices like this:
rad1 <- c(rep("Company type", 20), rep("Region and type of application", 20),
          rep("Approved or not approved applications", 13), rep("Age group", 20))
rad2 <- choicesVariables[c(1:14, 1, 4, 5, 9, 10, 11, 1:14, 1, 4, 5, 9, 10, 11,
                           1:7, 9:14, 1:14, 1, 4, 5, 9, 10, 11)]
rad3 <- c(rep("Restoration plots", 14), rep("all semi natural grasslands", 6),
          rep("Restoration plots", 14), rep("all semi natural grasslands", 6),
          rep("Restoration plots", 27), rep("all semi natural grasslands", 6))
rad4 <- 1:73
letaLista <- data.frame(rad1, rad2, rad3, rad4)
colnames(letaLista) <- c("Factor", "Variabel", "rest_alla", "id")
Now it's easy to use subset to get only the choice that the user made. But how do I use this information to create the plot and table without a 73-line-long ifelse statement?
I tried to create some sort of multidimensional array that could hold all the tables (and one for the plots), but I couldn't make it work. My experience with this kind of array is limited and this might be a simple issue, but any hints would be helpful!
The dataset that is the foundation for the plots and tables is a data frame with 23 variables, both factors and numerical. The plots and tables are then created using the following code for all 73 combinations:
s_A1 <- summarySE(Samlad_info, measurevar = "Dist_brukcentrum",
                  groupvars = "Companytype")
s_A1 <- s_A1[2:6, ]
p_A1 <- ggplot(s_A1, aes(x = Companytype, y = Dist_brukcentrum)) +
  geom_bar(position = position_dodge(), stat = "identity") +
  geom_errorbar(aes(ymin = Dist_brukcentrum - se, ymax = Dist_brukcentrum + se),
                width = .2, position = position_dodge(.9)) +
  scale_y_continuous(name = "") +
  scale_x_discrete(name = "")
where summarySE is the following function, borrowed from the Cookbook for R:
# requires the plyr package (for ddply and rename)
summarySE <- function(data = NULL, measurevar, groupvars = NULL, na.rm = TRUE,
                      conf.interval = .95, .drop = TRUE) {
  # New version of length which can handle NAs: if na.rm == TRUE, don't count them
  length2 <- function(x, na.rm = FALSE) {
    if (na.rm) sum(!is.na(x))
    else length(x)
  }
  # This does the summary. For each group's data frame, return a vector with
  # N, mean, and sd
  datac <- ddply(data, groupvars, .drop = .drop,
                 .fun = function(xx, col) {
                   c(N    = length2(xx[[col]], na.rm = na.rm),
                     mean = mean(xx[[col]], na.rm = na.rm),
                     sd   = sd(xx[[col]], na.rm = na.rm))
                 },
                 measurevar
  )
  # Rename the "mean" column
  datac <- rename(datac, c("mean" = measurevar))
  datac$se <- datac$sd / sqrt(datac$N)  # calculate standard error of the mean
  # Confidence interval multiplier for standard error
  # Calculate t-statistic for confidence interval:
  # e.g., if conf.interval is .95, use .975 (above/below), and use df = N - 1
  ciMult <- qt(conf.interval / 2 + .5, datac$N - 1)
  datac$ci <- datac$se * ciMult
  return(datac)
}
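For reference, a quick self-contained check of what summarySE returns (using the built-in ToothGrowth data purely as an illustration; the function needs plyr attached):

```r
library(plyr)
summarySE(ToothGrowth, measurevar = "len", groupvars = "dose")
# one row per dose, with columns: dose, N, len, sd, se, ci
```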
The code in its entirety is a bit too large to post, but I hope this clarifies what I'm trying to do.
Well, thanks to Florian's comment, I think I might have found a solution myself. I'll present it here but leave the question open, as there are probably far neater ways of doing it.
I gathered the plots (which ggplot creates as list objects) and the summary tables into lists:
plotList <- list(p_A1, p_A2, p_A3...)
tableList <- list(s_A1, s_A2, s_A3...)
I then used subset on my lookup table to get the matching id, which selects the right plot and table from the lists.
output$thePlot <- renderPlot({
  plotValue <- subset(letaLista, letaLista$Factor == input$Factor &
                        letaLista$Variabel == input$Variabel &
                        letaLista$rest_alla == input$DataSource)
  plotList[[as.integer(plotValue[1, 4])]]  # [[ ]] extracts the plot object itself
})
output$theTable <- renderTable({
  plotValue <- subset(letaLista, letaLista$Factor == input$Factor &
                        letaLista$Variabel == input$Variabel &
                        letaLista$rest_alla == input$DataSource)
  tableList[[as.integer(plotValue[1, 4])]]  # return the data frame directly
})
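A small refinement of this (a sketch under the same lookup-table setup) is to move the shared subset into a reactive, so the row lookup is written once and both outputs reuse it:

```r
selectedId <- reactive({
  row <- subset(letaLista,
                Factor == input$Factor &
                Variabel == input$Variabel &
                rest_alla == input$DataSource)
  as.integer(row$id[1])
})
output$thePlot  <- renderPlot({ plotList[[selectedId()]] })
output$theTable <- renderTable({ tableList[[selectedId()]] })
```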

Shiny: selecting specific columns from an uploaded data frame

I have merged different sources of code to make an app that allows one to upload a file (a data frame).
Beyond this, however, I would also like to make it possible to select specific columns from the data frame and analyse them. This is difficult because one must predefine the given data frame in order to be able to refer to it in the ui.R script.
So when a previously undefined data frame is uploaded to the site, one cannot refer to it in ui.R, as it is only defined in the server.
Predefined variables:
vchoices <- 1:ncol(mtcars)
names(vchoices) <- names(mtcars)
ui.R
runApp(
  ui = basicPage(
    h2('The uploaded file data'),
    dataTableOutput('mytable'),
    fileInput('file', 'Choose info-file to upload',
              accept = c(
                'text/csv',
                'text/comma-separated-values',
                'text/tab-separated-values',
                'text/plain',
                '.csv',
                '.tsv'
              )
    ),
    actionButton("choice", "incorporate external information"),
    selectInput("columns", "Select Columns", choices = vchoices),
    tableOutput("table_display")
  ))
Notice that the choices in selectInput are set to the predefined variables above, whereas I would like to set them to the columns of the not-yet-defined file that is uploaded in server.R below.
server.R
server = function(input, output) {
  info <- eventReactive(input$choice, {
    inFile <- input$file
    if (is.null(inFile))
      return(NULL)
    isolate(f <- read.table(inFile$datapath, header = T,
                            sep = "\t"))
    f
  })
  output$table_display <- renderTable({
    f <- info()
    f <- subset(f, select = input$columns)  # subsetting takes place here
    head(f)
  })
}
Does anyone know of a way to refer, in the ui, to a variable that's defined in the server, and thus allow for interactive manipulation?
You can use the update* family of functions, in this case updateSelectInput. Its first argument has to be session, and you also have to add session to server <- function(input, output, session) to be able to update your widget.
You can make the widget update immediately after clicking on the actionButton by calling updateSelectInput within eventReactive.
Let's take a look at how we can do that:
First, you can save the column names of the newly uploaded dataset in a variable, say vars, and then pass it to updateSelectInput.
(The choices of the selectInput are initially set to NULL; we don't need to specify them beforehand because they are going to be updated anyway.)
info <- eventReactive(input$choice, {
  inFile <- input$file
  # Instead of "if (is.null(inFile)) ..." use req()
  req(inFile)
  # Changes in read.table
  f <- read.table(inFile$datapath, header = input$header, sep = input$sep, quote = input$quote)
  vars <- names(f)
  # Update the select input immediately after clicking on the action button.
  updateSelectInput(session, "columns", "Select Columns", choices = vars)
  f
})
I've added a small upload interface to your code.
The other way would be to define the widget on the server side and then pass it to the client side via the renderUI function; an example of that approach can be found here.
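A minimal sketch of that renderUI variant (assuming the same info() reactive as in the full example below) could look like this:

```r
# ui side: an empty placeholder the server fills in
uiOutput("columnSelector")

# server side: build the widget from the uploaded data's column names
output$columnSelector <- renderUI({
  selectInput("columns", "Select Columns", choices = names(info()))
})
```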
Full example:
library(shiny)
ui <- fluidPage(
  h2('The uploaded file data'),
  dataTableOutput('mytable'),
  fileInput('file', 'Choose info-file to upload',
            accept = c(
              'text/csv',
              'text/comma-separated-values',
              'text/tab-separated-values',
              'text/plain',
              '.csv',
              '.tsv'
            )
  ),
  # Taken from: http://shiny.rstudio.com/gallery/file-upload.html
  tags$hr(),
  checkboxInput('header', 'Header', TRUE),
  radioButtons('sep', 'Separator',
               c(Comma = ',',
                 Semicolon = ';',
                 Tab = '\t'),
               ','),
  radioButtons('quote', 'Quote',
               c(None = '',
                 'Double Quote' = '"',
                 'Single Quote' = "'"),
               '"'),
  ################################################################
  actionButton("choice", "incorporate external information"),
  selectInput("columns", "Select Columns", choices = NULL),  # no choices before uploading
  tableOutput("table_display")
)
server <- function(input, output, session) {  # session added for updateSelectInput
  info <- eventReactive(input$choice, {
    inFile <- input$file
    # Instead of "if (is.null(inFile)) ..." use req()
    req(inFile)
    # Changes in read.table
    f <- read.table(inFile$datapath, header = input$header, sep = input$sep, quote = input$quote)
    vars <- names(f)
    # Update the select input immediately after clicking on the action button.
    updateSelectInput(session, "columns", "Select Columns", choices = vars)
    f
  })
  output$table_display <- renderTable({
    f <- info()
    f <- subset(f, select = input$columns)  # subsetting takes place here
    head(f)
  })
}
shinyApp(ui, server)

In Django how can I run a custom clean function on fixture data during import and validation?

In a ModelForm I can write a clean_<field_name> member function to automatically validate and clean up data entered by a user, but what can I do about dirty JSON or CSV files (fixtures) during a manage.py loaddata?
Fixtures loaded with loaddata are assumed to contain clean data that doesn't need validation (usually as the inverse of a prior dumpdata), so the short answer is that loaddata isn't the approach you want if you need to clean your inputs.
However, you can probably reuse some of the underpinnings of loaddata while implementing your custom data-cleaning code: I'm sure you could easily script something with the Django serialization libs to read your existing data files in and then save the resulting objects normally once the data has been cleaned up.
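For instance, a rough sketch using Django's serialization framework (the fixture path and the clean_zip_code() method are illustrative assumptions, not part of loaddata itself):

```python
from django.core import serializers

# deserialize the fixture, clean each object, then save it normally so
# that model-level logic runs (loaddata would do a raw save instead)
with open("fixtures/Entity.json") as f:
    for deserialized in serializers.deserialize("json", f):
        obj = deserialized.object
        obj.zip_code = obj.clean_zip_code(obj.zip_code)
        obj.save()
```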
In case others want to do something similar, I defined a model method to do the cleaning (so it can be called from ModelForms):
import re

MAX_ZIPCODE_DIGITS = 9
MIN_ZIPCODE_DIGITS = 5

def clean_zip_code(self, s=None):
    s = str(s or self.zip_code)
    if not s:
        return None
    s = re.sub(r"\D", "", s)  # strip non-digit characters
    if len(s) > self.MAX_ZIPCODE_DIGITS:
        s = s[:self.MAX_ZIPCODE_DIGITS]
    if len(s) in (self.MIN_ZIPCODE_DIGITS - 1, self.MAX_ZIPCODE_DIGITS - 1):
        s = '0' + s  # FIXME: deal with other intermediate lengths
    if len(s) >= self.MAX_ZIPCODE_DIGITS:
        s = s[:self.MIN_ZIPCODE_DIGITS] + '-' + s[self.MIN_ZIPCODE_DIGITS:]
    return s
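With these rules, a nine-digit string such as 123456789 comes back hyphenated as 12345-6789, while a four-digit entry such as 1234 has its dropped leading zero restored as 01234.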
Then I wrote a standalone Python script to clean up my legacy JSON files using any clean_* methods found on the models:
import os, json

def clean_json(app='XYZapp', model='Entity', fields='zip_code', cleaner_prefix='clean_'):
    # Set the DJANGO_SETTINGS_MODULE environment variable.
    os.environ['DJANGO_SETTINGS_MODULE'] = app + ".settings"
    settings = __import__(app + '.settings').settings
    models = __import__(app + '.models').models
    fpath = os.path.join(settings.SITE_PROJECT_PATH, 'fixtures', model + '.json')
    if isinstance(fields, (str, unicode)):
        fields = [fields]
    Ns = []
    for field in fields:
        try:
            instance = getattr(models, model)()
        except AttributeError:
            print 'No model named %s could be found' % (model,)
            continue
        try:
            cleaner = getattr(instance, cleaner_prefix + field)
        except AttributeError:
            print 'No cleaner method named %s.%s could be found' % (model, cleaner_prefix + field)
            continue
        print 'Cleaning %s using %s.%s...' % (fpath, model, cleaner.__name__)
        fin = open(fpath, 'r')
        if fin:
            l = json.load(fin)
            before = len(l)
            cleans = 0
            for i in range(len(l)):
                if 'fields' in l[i] and field in l[i]['fields']:
                    l[i]['fields'][field] = cleaner(l[i]['fields'][field])  # cleaner returns None to delete records
                    cleans += 1
            fin.close()
            after = len(l)
            assert after > .5 * before
            Ns += [(before, after, cleans)]
            print 'Writing %d/%d (new/old) records after %d cleanups...' % Ns[-1]
            with open(fpath, 'w') as fout:
                fout.write(json.dumps(l, indent=2, sort_keys=True))
    return Ns

if __name__ == '__main__':
    clean_json()
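Once the fixture has been rewritten, it can be imported as usual, e.g. manage.py loaddata Entity (using the script's illustrative model name).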
