I am new to R and I am trying to get my mind around the apply family of functions. So far I have managed to run my ANOVAs for all the variables in my data, and I got the pairwise comparisons.
varlist <- names(dt)[5:length(dt)]
# loop over the response variables, fitting one model per variable
models <- lapply(X = varlist,
                 FUN = function(t) lm(formula = paste0("`", t, "` ~ block + irrigation * genotype"),
                                      data = dt))
# Name each model in the list after its response variable
names(models) <- varlist
## apply anova to each model stored in the list, models
lapply(models, anova)
# marginal means for all variables
library(emmeans)
res.model1 <- lapply(models, function(x) pairs(emmeans(x, ~ genotype:irrigation)))
res.model1
So far so good. Now I want to create a compact letter display that I can use for plotting. Previously I used the following, but I can't work out how to apply lapply to this code:
CLD = cld(res.model1,
          alpha = 0.05,
          Letters = letters,
          adjust = "tukey")
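For reference, one way to run this over the whole list of models (a sketch, assuming the multcomp package for the cld() generic; note that cld() wants the emmGrid object itself rather than the pairs() contrasts, so it is easiest to start from the models again):
library(multcomp)  # provides the cld() generic with an emmeans method
# apply cld() to the estimated marginal means of each model
CLD_list <- lapply(models, function(m)
  cld(emmeans(m, ~ genotype:irrigation),
      alpha = 0.05, Letters = letters, adjust = "tukey"))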
I use the CLD data to create graphs.
I managed to get the letters with the following code, but then I don't get the full ANOVA table:
tx <- with(dt, interaction(irrigation, genotype))  # combine the factors
model2 <- lapply(varlist, function(x) {
  lm(substitute(i ~ block + tx, list(i = as.name(x))), data = dt)  # use the combined factor tx
})
lapply(model2, anova)
library(agricolae)  # provides HSD.test()
# avoid calling this "letters": that would mask the base constant used for Letters above
letter_list <- lapply(model2, function(m) HSD.test(m, "tx", alpha = 0.05, group = TRUE, console = TRUE))
Any suggestions on how to achieve what I need? Thank you.
I have multiple CSV files that contain the name and the price of products. A product may or may not appear in more than one file. I have to find the highest and the lowest price for each product across these files.
I joined the products from all the files into one array:
Dir["./*.csv"].each do |file|
  CSV.foreach(file, headers: true) do |row|
    tmpRow = row.to_s.chomp + "," + file # saving the name of the input file
    list.push(tmpRow.chomp.split(","))
  end
end
The array list looks like this:
[["5893105","2.38", "weightOrSomethingIrrelevant", "./FIAT_2.csv"]]
This is the main algorithm:
while list[0] do
  if list[0] != nil
    tmpPart = list[0][0]
    tmpParts = list.select { |part, price| part == tmpPart }
    tmpParts.each do |tp|
      tmpPrices.push(tp[1])
    end
    list[0][2].to_f != 0.0 ? tmpWeight = list[0][2].to_s : tmpWeight = "Undefined"
    tmpMaxPrice = tmpParts.select { |part, price| part == tmpPart && price == tmpPrices.max }
    tmpMinPrice = tmpParts.select { |part, price| part == tmpPart && price == tmpPrices.min }
    result.push([tmpPart, tmpWeight, tmpPrices.max, tmpMaxPrice[0].last, tmpPrices.min, tmpMinPrice[0].last])
    tmpPart = ""
    list = list - tmpParts
    tmpParts = []
    tmpPrices = []
    tmpMaxPrice = []
    tmpMinPrice = []
    tmpWeight = ""
  end
end
The input files are huge (over 200,000 rows each), so I am having problems with the efficiency of my algorithm (it processes roughly one row per half second). I am wondering if there is a better way to write this app.
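For what it's worth, a single group_by pass over the array shown above would avoid the repeated select-and-subtract that makes the loop quadratic (a sketch, keeping the same result layout as the original push):
# one pass: group rows by part id, then pick min/max per group by numeric price
grouped = list.group_by { |row| row[0] }
result = grouped.map do |part, rows|
  sorted = rows.sort_by { |r| r[1].to_f }
  weight = rows[0][2].to_f != 0.0 ? rows[0][2] : "Undefined"
  min_row, max_row = sorted.first, sorted.last
  [part, weight, max_row[1], max_row.last, min_row[1], min_row.last]
end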
I would split this into several parts:
1) a table which represents the files (file name, location, line number, etc.) and, connected to that, a product table (the row data from that file);
2) a script / function to ingest the files and store rows as DB records;
3) a script / function to analyse the rows and find products by name, using the DB and pulling the price info out with MIN / MAX; a sketch of steps 2 and 3 follows the list.
This could later be improved to deal with naming inconsistencies, products vs. product occurrences, etc.
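A rough sketch of those ingest and analyse steps, assuming the sqlite3 gem (the table and column names here are illustrative, not from the original post):
require 'sqlite3'
require 'csv'

db = SQLite3::Database.new('products.db')
db.execute('CREATE TABLE IF NOT EXISTS products (part_id TEXT, price REAL, source_file TEXT)')

# ingest every CSV row as a record
Dir['./*.csv'].each do |file|
  CSV.foreach(file, headers: true) do |row|
    db.execute('INSERT INTO products VALUES (?, ?, ?)', [row[0], row[1].to_f, file])
  end
end

# an indexed GROUP BY returns min/max per product in one query
db.execute('CREATE INDEX IF NOT EXISTS idx_part ON products(part_id)')
summary = db.execute('SELECT part_id, MIN(price), MAX(price) FROM products GROUP BY part_id')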
I'm creating a Shiny app and I'm letting the user choose what data should be displayed in a plot and a table. This choice is made through three different input variables that contain 14, 4, and 2 choices respectively.
ui <- dashboardPage(
  dashboardHeader(),
  dashboardSidebar(
    selectInput(inputId = "DataSource", label = "Data source",
                choices = c("Restoration plots", "all semi natural grasslands")),
    selectInput(inputId = "Variabel", label = "Variable",
                choices = choicesVariables),
    # the definition of choicesVariables is omitted here because it's very
    # long, but it contains 14 string values
    selectInput(inputId = "Factor", label = "Factor",
                choices = c("Company type", "Region and type of application",
                            "Approved or not approved applications", "Age group"))
  ),
  dashboardBody(
    plotOutput("thePlot"),
    tableOutput("theTable")
  )
)
This adds up to 73 choices (yes, I know the math doesn't add up there, but some combinations are invalid). I would like to handle this with a lookup table, so I created one with every valid combination of choices like this:
rad1 <- c(rep("Company type", 20), rep("Region and type of application", 20),
          rep("Approved or not approved applications", 13), rep("Age group", 20))
rad2 <- choicesVariables[c(1:14, 1, 4, 5, 9, 10, 11, 1:14, 1, 4, 5, 9, 10, 11,
                           1:7, 9:14, 1:14, 1, 4, 5, 9, 10, 11)]
rad3 <- c(rep("Restoration plots", 14), rep("all semi natural grasslands", 6),
          rep("Restoration plots", 14), rep("all semi natural grasslands", 6),
          rep("Restoration plots", 27), rep("all semi natural grasslands", 6))
rad4 <- 1:73
letaLista <- data.frame(rad1, rad2, rad3, rad4)
colnames(letaLista) <- c("Factor", "Variabel", "rest_alla", "id")
Now it's easy to use subset to get only the choice that the user made. But how do I use this information to render the plot and the table without a 73-branch if-else statement?
I tried to create some sort of multidimensional array that could hold all the tables (and one for the plots), but I couldn't make it work. My experience with these kinds of arrays is limited and this might be a simple issue, but any hints would be helpful!
The dataset that is the foundation for the plots and the table consists of a data frame with 23 variables, both factors and numerical. The plots and tables are then created using the following code for all 73 combinations:
s_A1 <- summarySE(Samlad_info, measurevar = "Dist_brukcentrum",
                  groupvars = "Companytype")
s_A1 <- s_A1[2:6, ]
p_A1 <- ggplot(s_A1, aes(x = Companytype, y = Dist_brukcentrum)) +
  geom_bar(position = position_dodge(), stat = "identity") +
  geom_errorbar(aes(ymin = Dist_brukcentrum - se, ymax = Dist_brukcentrum + se),
                width = .2, position = position_dodge(.9)) +
  scale_y_continuous(name = "") +
  scale_x_discrete(name = "")
where summarySE is the following function, borrowed from Cookbook for R:
library(plyr)  # summarySE uses ddply() and rename() from plyr

summarySE <- function(data = NULL, measurevar, groupvars = NULL, na.rm = TRUE,
                      conf.interval = .95, .drop = TRUE) {
  # New version of length which can handle NAs: if na.rm == TRUE, don't count them
  length2 <- function(x, na.rm = FALSE) {
    if (na.rm) sum(!is.na(x))
    else       length(x)
  }
  # This does the summary. For each group's data frame, return a vector with
  # N, mean, and sd
  datac <- ddply(data, groupvars, .drop = .drop,
                 .fun = function(xx, col) {
                   c(N    = length2(xx[[col]], na.rm = na.rm),
                     mean = mean(xx[[col]], na.rm = na.rm),
                     sd   = sd(xx[[col]], na.rm = na.rm))
                 },
                 measurevar
  )
  # Rename the "mean" column after the measured variable
  datac <- rename(datac, c("mean" = measurevar))
  datac$se <- datac$sd / sqrt(datac$N)  # calculate standard error of the mean
  # Confidence interval multiplier for the standard error
  # Calculate t-statistic for the confidence interval:
  # e.g., if conf.interval is .95, use .975 (above/below), and use df = N-1
  ciMult <- qt(conf.interval / 2 + .5, datac$N - 1)
  datac$ci <- datac$se * ciMult
  return(datac)
}
The code in its entirety is a bit too large to post, but I hope this clarifies what I'm trying to do.
Well, thanks to Florian's comment, I think I might have found a solution myself. I'll present it here but leave the question open, as there are probably far neater ways of doing it.
I rigged up the plots (which ggplot had created as objects) and the summary tables into lists:
plotList <- list(p_A1, p_A2, p_A3, ...)
tableList <- list(s_A1, s_A2, s_A3, ...)
I then used subset on my lookup table to get the matching id, which selects the right plot and table from the lists:
output$thePlot <- renderPlot({
  plotValue <- subset(letaLista, letaLista$Factor == input$Factor &
                        letaLista$Variabel == input$Variabel &
                        letaLista$rest_alla == input$DataSource)
  plotList[[as.integer(plotValue[1, 4])]]  # [[ ]] returns the plot itself, not a sub-list
})
output$theTable <- renderTable({
  plotValue <- subset(letaLista, letaLista$Factor == input$Factor &
                        letaLista$Variabel == input$Variabel &
                        letaLista$rest_alla == input$DataSource)
  skriva <- tableList[[as.integer(plotValue[1, 4])]]
  skriva
})
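A slightly tidier variant of the same idea (a sketch; naming the list elements after the lookup id is my own addition and assumes plotList is ordered the same way as letaLista$id):
names(plotList) <- as.character(letaLista$id)

output$thePlot <- renderPlot({
  hit <- subset(letaLista, Factor == input$Factor &
                  Variabel == input$Variabel &
                  rest_alla == input$DataSource)
  plotList[[as.character(hit$id[1])]]  # look the plot up by its id directly
})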
I have written functions to access a database.
The table called Books has:
- a `book_id` TEXT column,
- a `title` TEXT column, and
- an `author` TEXT column.
For the first function, run_query is a function that connects to the database and runs a query. get_book_cnt_per_author is a function that returns a list of tuples of the form ('author', number of books).
I don't know how to use the run_query function inside the loop; I always got None from what I wrote. For the second function, I don't know where my problem is: I only get one book for each author. Please tell me what the problem is.
def get_books(db, book_cnt_list, book_cnt):
""" (str, list of tuple, int) -> list of str
Precondition: the elements in book_cnt_list are sorted
in ascending order by author name.
Return a list of all the book titles whose authors
each have book_cnt books in the database with name db
according to the book_cnt_list. The book titles should be in
ascending order for each author, but not for the entire list.
Follow ascending order across authors, that is, the order
authors appear in book_cnt_list that is already sorted by
author name.
>>> author_cnt_list = get_book_cnt_per_author("e7_database.db")
>>> books_list = get_books("e7_database.db", author_cnt_list, 10)
>>> books_list[0]
'A Christmas Carol'
>>> books_list[9]
'The Life and Adventures of Nicholas Nickleby'
>>> books_list[10]
'Disgrace'
>>> books_list[-1]
'Youth'
"""
    # HINT: First figure out which authors have book_cnt books
    # using the book_cnt_list. Then, access the database db
    # to retrieve the required information for those authors.
    # Do not call any other of your E7 functions other than run_query.
    list1 = []
    for i in book_cnt_list:
        if i[1] == book_cnt:  # compare with the parameter, not the string "book_cnt"
            list1.append(i[0])
    books = []
    for j in list1:
        # accumulate the titles; returning inside the loop stopped after the first author
        # (run_query is assumed to return a list of tuples)
        rows = run_query(db, '''SELECT title FROM Books
                                WHERE Books.author = ?
                                ORDER BY Books.title ASC''', (j,))
        books.extend(row[0] for row in rows)
    return books
import sqlite3

def create_author_dict(db):
""" (str) -> dict of {str: list of str}
Return a dictionary that maps each author to the books they have written
according to the information in the Books table of the database
with name db.
>>> author_dict = create_author_dict('e7_database.db')
>>> author_dict['Isaac Asimov'].sort()
>>> author_dict['Isaac Asimov']
['Foundation', 'I Robot']
>>> author_dict['Maya Angelou']
['I Know Why the Caged Bird Sings']
"""
    con = sqlite3.connect(db)
    cur = con.cursor()
    # select every row; no WHERE parameter is needed here
    cur.execute('''SELECT author, title FROM Books''')
    new_list = cur.fetchall()
    new_dict = {}
    for i in new_list:
        key = i[0]
        value = i[1:]
        new_dict.update({key: list(value)})
    cur.close()
    con.close()
    return new_dict
Try changing:
new_dict.update({key: list(value)})
to a statement like:
new_dict[key] = new_dict.get(key, []) + list(value)
update() replaces the value already stored for the key, so each author ends up with only their last book; get() with an empty-list default accumulates the titles instead.
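A minimal sketch of the difference (plain Python, no database; the sample rows are made up):
rows = [('Isaac Asimov', 'Foundation'), ('Isaac Asimov', 'I Robot')]

d = {}
for author, title in rows:
    d.update({author: [title]})  # overwrites: {'Isaac Asimov': ['I Robot']}

d = {}
for author, title in rows:
    d[author] = d.get(author, []) + [title]  # accumulates: {'Isaac Asimov': ['Foundation', 'I Robot']}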
I've created a program for a project that tests images against one another to see whether or not they are the same image. I've decided to use correlation, since the images I am using are styled in the same way, and with this I've been able to get everything working up to this point.
I now wish to create an array of images again, but this time in order of their correlation. For example, if I'm testing a 50 pence coin against 50 images, I want the 5 highest correlations to be stored in an array for later use. But I'm unsure how to do this, as each item in the array will need more than one variable: the image location/name and its correlation percentage.
%Program Created By Ben Parry, 2016.
clc(); %Simply clears the console window

%Targets the image the user picks
inputImage = imgetfile();

%Targets all the images inside this directory
referenceFolder = 'M:\Project\MatLab\Coin Image Processing\Saved_Images';
if ~isdir(referenceFolder)
    errorMessage = sprintf('Error: Folder does not exist!'); %sprintf, not print
    uiwait(warndlg(errorMessage)); %Displays an error if the folder doesn't exist
    return;
end

filePattern = fullfile(referenceFolder, '*.jpg');
jpgFiles = dir(filePattern);
firstImage = imread(inputImage); %Read the chosen image once, outside the loop
firstImageBW = im2bw(firstImage); %Convert it to black & white

for i = 1:length(jpgFiles)
    baseFileName = jpgFiles(i).name;
    fullFileName = fullfile(referenceFolder, baseFileName);
    fprintf(1, 'Reading %s\n', fullFileName);
    imageArray = imread(fullFileName);
    imshow(imageArray);

    %Converting the reference image to black & white
    secondImageBW = im2bw(imageArray);

    %Finding the correlation, then converting it into a percentage
    c = corr2(firstImageBW, secondImageBW);
    corrValue = sprintf('%.0f%%', 100*c);

    %Custom messaging for the possible outcomes
    corrMatch = sprintf('The images are the same (%s)', corrValue);
    corrUnMatch = sprintf('The images are not the same (%s)', corrValue);

    %Branching for the two possible outcomes
    if c >= 0.99 %Define a percentage for the correlation to reach
        disp(' ');
        disp('Images Tested:');
        disp(inputImage);
        disp(fullFileName);
        disp(corrMatch);
        disp(' ');
    else
        disp(' ');
        disp('Images Tested:');
        disp(inputImage);
        disp(fullFileName);
        disp(corrUnMatch);
        disp(' ');
    end
end
You can use the struct() function to create structures.
Initializing an array of structs:
imStruct = struct('fileName', '', 'image', [], 'correlation', 0);
imData = repmat(imStruct, length(jpgFiles), 1);
Setting the field values inside your loop:
for i = 1:length(jpgFiles)
    % ...
    imData(i).fileName = fullFileName;
    imData(i).image = imageArray;
    imData(i).correlation = c; % store the numeric value, not the formatted string corrValue
end
Extract the values of the correlation field and select the 5 highest correlations:
corrList = [imData.correlation];
[~, sortedInd] = sort(corrList, 'descend');
selectedData = imData(sortedInd(1:5));
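To check the result, something like this (not from the original answer) prints the selected matches:
% show the five best-matching files and their correlations
for k = 1:numel(selectedData)
    fprintf('%s: %.0f%%\n', selectedData(k).fileName, 100 * selectedData(k).correlation);
end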