I have a data frame similar to 'df1'. After converting the value column to a daily time series, I fit a Holt-Winters model and predict 120 days into the future. I want to visualise the actual and predicted values using dygraphs.
library(dygraphs)
> head(df1)
timestamp value
1 2017-03-29 534.4571
2 2017-03-30 536.4350
3 2017-03-31 534.6661
4 2017-04-01 535.9185
5 2017-04-02 532.6998
6 2017-04-03 534.8282
convert_to_daily_ts <- function(x){
x <- x[order(x$timestamp),]
x$value_ts <- ts(x$value, frequency = 7)
return(x)
}
df1 <- convert_to_daily_ts(df1)
hw <- tryCatch(HoltWinters(df1$value_ts), error = function(e) NA)  # the error handler must be a function
p <- predict(hw, n.ahead = 120, prediction.interval = TRUE, level=0.95)
act <- df1$value_ts
all <- cbind(act, p)
> class(all)
[1] "mts" "ts" "matrix"
> head(all)
Time Series:
Start = c(1, 1)
End = c(1, 6)
Frequency = 7
actual p.fit p.upr p.lwr
1.000000 534.4571 NA NA NA
1.142857 536.4350 NA NA NA
1.285714 534.6661 NA NA NA
1.428571 535.9185 NA NA NA
1.571429 532.6998 NA NA NA
1.714286 534.8282 NA NA NA
> tail(all)
Time Series:
Start = c(115, 2)
End = c(115, 7)
Frequency = 7
actual p.fit p.upr p.lwr
115.1429 NA 386.2924 581.7568 190.8279
115.2857 NA 384.4614 580.0625 188.8603
115.4286 NA 383.4728 579.2104 187.7352
115.5714 NA 381.3159 577.1900 185.4418
115.7143 NA 383.3130 579.3234 187.3025
115.8571 NA 384.2098 580.3565 188.0631
> str(all)
mts [1:805, 1:4] 534 536 535 536 533 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:4] "actual" "p.fit" "p.upr" "p.lwr"
- attr(*, "tsp")= num [1:3] 1 116 7
- attr(*, "class")= chr [1:3] "mts" "ts" "matrix"
dygraph(all, main = "Daily Predictions") %>%
dySeries("act", label = "Actual") %>%
dySeries(c("p.lwr", "p.fit", "p.upr"), label = "Predicted") %>%
dyOptions(drawGrid = F) %>%
dyRangeSelector()
I get the error: Unsupported type passed to argument 'data'. But the class of 'all' is what dygraph expects. Any help visualising the data above (actual and predicted) would be appreciated. Also, I need the x-axis to show month and year (e.g. Jun 2017, Jul 2017) instead of 1, 2, 3 and so on. Is that possible?
It looks like the ts object needs start and end dates for dygraph to figure things out. Could you add the appropriate start and end dates when you create the ts object? The values below are placeholders that you'll need to adjust to your data. There's a post about that here.
convert_to_daily_ts <- function(x){
x <- x[order(x$timestamp),]
x$value_ts <- ts(x$value, start = c(2017,3), end = c(2017,7), frequency = 7)
return(x)
}
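As a hedged alternative, dygraph() also accepts xts objects, which carry real dates, so the x-axis is labelled with calendar dates rather than cycle numbers. The following is a minimal sketch, not the asker's exact pipeline: the simulated df1, the names future_dates, actual_x, pred_x and all_x, and the use of the xts package are all assumptions made for illustration.

```r
library(xts)       # assumption: the xts package is installed
library(dygraphs)

# Simulated stand-in for df1 (hypothetical data with a weekly pattern)
set.seed(1)
n <- 140
df1 <- data.frame(
  timestamp = seq(as.Date("2017-03-29"), by = "day", length.out = n),
  value = 535 + 2 * sin(2 * pi * seq_len(n) / 7) + rnorm(n, sd = 0.5)
)

hw <- HoltWinters(ts(df1$value, frequency = 7))
p <- predict(hw, n.ahead = 120, prediction.interval = TRUE, level = 0.95)

# Calendar dates for the 120-day forecast horizon
future_dates <- seq(max(df1$timestamp) + 1, by = "day", length.out = 120)

# Index both the observed values and the forecast by real dates
actual_x <- xts(df1$value, order.by = df1$timestamp)
pred_x <- xts(
  cbind(fit = as.numeric(p[, "fit"]),
        upr = as.numeric(p[, "upr"]),
        lwr = as.numeric(p[, "lwr"])),
  order.by = future_dates
)

# Outer join on the date index: NA where a series has no value
all_x <- merge(actual_x, pred_x)
colnames(all_x) <- c("actual", "fit", "upr", "lwr")

dygraph(all_x, main = "Daily Predictions") %>%
  dySeries("actual", label = "Actual") %>%
  dySeries(c("lwr", "fit", "upr"), label = "Predicted") %>%
  dyOptions(drawGrid = FALSE) %>%
  dyRangeSelector()
```

The time series passed to HoltWinters() still uses frequency = 7 for the weekly seasonality; only the object handed to dygraph() is indexed by date.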
I have the following dataframe:
x <- data.frame("A"=c(rep(4,3),rep(7,4),rep(2,2)),
"B"=c("Q","Y"," ","F","Q"," ","Z","Q","C"),
"C"=seq(1:9))
A being my grouping variable / factor
B is the status flag; I want to start subsetting at its first appearance.
I've managed to subset a fixed number of rows after "Q" by adding an offset to the indices of B where "Q" appears.
Something like this: x[c(which(x$B=="Q"),which(x$B=="Q")+1),]
"+1" because I was only interested in the first row after "Q" occurs.
Now I need the rest of the rows within each group after "Q" occurs (including Q's row), and I've been banging my head against my desk trying to figure out how to do this with the dplyr package and grouped tibbles... hence I'm here.
Please help?
------ EDIT -----
This seems to have worked:
x %>% group_by(A) %>% filter(row_number()>=which.max(B=="Q"))
Using dplyr and tidyr you may run the following code:
mydf <- data.frame("A"=c(rep(4,3),rep(7,4),rep(2,2)),
"B"=c("Q","Y"," ","F","Q"," ","Z","Q","C"),
"C"=seq(1:9))
library(tidyverse)
mydf %>%
group_by(A) %>%
mutate(selector=case_when(
B=="Q" ~ 1
)) %>%
fill(selector, .direction="down") %>%
filter(selector==1) %>%
select(-selector) %>%
arrange(C,A)
and this is the result:
A B C
<dbl> <fct> <int>
1 4.00 Q 1
2 4.00 Y 2
3 4.00 " " 3
4 7.00 Q 5
5 7.00 " " 6
6 7.00 Z 7
7 2.00 Q 8
8 2.00 C 9
Is this what you're looking for?
q_rows <- row.names(subset(x, B == 'Q')) # rows where Q occurs
list_of_frames <- list()
for(i in 1:length(q_rows)) {
q_start <- as.numeric(q_rows[i]);
q_group <- as.numeric(x[q_start,c('A')])
group_frame <- subset(x[q_start:nrow(x),], A == q_group)
list_of_frames[i] <- list(group_frame)
}
list_of_frames
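The same subset can also be sketched in base R without an explicit loop. This is only an illustration under the assumption that "Q" marks the start of the rows to keep within each group of A; the helper name seen_q is hypothetical.

```r
x <- data.frame(A = c(rep(4, 3), rep(7, 4), rep(2, 2)),
                B = c("Q", "Y", " ", "F", "Q", " ", "Z", "Q", "C"),
                C = 1:9)

# Running count of "Q" within each group of A;
# rows before the first "Q" of a group stay at 0
seen_q <- ave(as.integer(x$B == "Q"), x$A, FUN = cumsum)

# Keep every row from the first "Q" onwards, per group
res <- x[seen_q > 0, ]
res
```

Unlike the loop, this returns one data frame rather than a list with one frame per "Q".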
I'm trying to write a function in R which is given a 3-dimensional array and a target value and returns a matrix of the indices with the value closest to the target in the z direction for every x,y point. If there is no value within a given margin of the target, the matrix should be NA at that point.
I wrote a function which works but is too slow for the hundreds of data grids with dim(x) = c(586,538,100) that I want to process. I don't know how to avoid the two for loops going over the array's x,y indices.
x <- seq(6.5,13,len=90)
dim(x) <- c(3,3,10)
get.zvals <- function(dens_grid,layer,margin=0.2){
out <- dens_grid[,,1]
out[] <- NA
for(i in 1:dim(out)[1]){
for(j in 1:dim(out)[2]){
x <- dens_grid[i,j,]
if( sum(!is.na(x)) >2
& sum(x[x<(layer+margin) & x>(layer-margin)],na.rm=TRUE) >=1 ){
out[i,j] <- which.min(abs(x-layer))
}
}
}
return(out)
}
y <- get.zvals(x,12.06)
Using apply:
get.zvals <- function(dens_grid, layer, margin=0.2) {
apply(dens_grid, c(1,2), function(x) ifelse(any(abs(x-layer) < margin),
which.min(abs(x-layer)), NA))
}
> get.zvals(x,12.06)
[,1] [,2] [,3]
[1,] NA 9 9
[2,] NA 9 NA
[3,] 9 9 NA
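Since apply still calls an R function once per x,y position, a fully vectorised variant may be worth trying on large grids. The sketch below is an assumption-laden alternative rather than part of the answer above: the name get_zvals_vec is hypothetical, and it relies on max.col() with ties.method = "first" to mimic which.min().

```r
get_zvals_vec <- function(dens_grid, layer, margin = 0.2) {
  dims <- dim(dens_grid)
  # One row per (x, y) position, one column per z level
  d <- abs(matrix(dens_grid, nrow = dims[1] * dims[2], ncol = dims[3]) - layer)
  d[is.na(d)] <- Inf                           # missing values can never be closest
  idx <- max.col(-d, ties.method = "first")    # column of the smallest distance
  best <- d[cbind(seq_len(nrow(d)), idx)]      # that smallest distance itself
  idx[best >= margin] <- NA                    # nothing within the margin
  matrix(idx, nrow = dims[1], ncol = dims[2])
}

x <- seq(6.5, 13, len = 90)
dim(x) <- c(3, 3, 10)
get_zvals_vec(x, 12.06)
```

The design choice here is to trade memory for speed: the whole array of distances is materialised once, so for many large grids it may be necessary to process them one at a time.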
I'm working with 3-dimensional arrays and want to have slices along the
third dimension for each position in the first two dimensions as columns in a data frame.
I also want my code to be readable for people who don't use R regularly.
Looping over the first two dimensions is very readable but slow (30 secs for the example below), while the permute-flatten-shape-to-matrix approach
is faster (14 secs) but not so readable.
Any suggestions for a nice solution?
Reproducible example here:
# Create data
d1 <- 200
d2 <- 100
d3 <- 50
data <- array(rnorm(n=d1*d2*d3), dim=c(d1, d2, d3))
# Idea 1: Loop
df <- data.frame(var1 = rep(0, d3))
i <- 1
system.time(
for (c in 1:d2) {
for(r in 1:d1){
i <- i + 1
df[[i]] <- data[r, c, ]
}
})
# Idea 2: Permute dimension of array first
df2 <- data.frame(var1 = rep(0, d3))
system.time({
data.perm <- aperm(data, c(3, 1, 2))
df2[, 2:(d1*d2 + 1)] <- matrix(c(data.perm), nrow = d3, ncol = d1*d2)}
)
identical(df, df2)
I would suggest a much simpler approach:
t(apply(data, 3, c))
I hope it meets your expectations of being fast and readable.
It is fast, as demonstrated in the timings below.
It is readable because it's a basic apply statement. All it does is use c to flatten the matrix at each index of the third dimension into a single vector; apply then simplifies these vectors to a two-dimensional array, which just needs to be transposed.
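To see how the columns of the result map back to positions in the array, here is a small sketch on a hypothetical 2 x 3 x 4 array (the names small, flat, r and c_idx are made up for illustration):

```r
# Small hypothetical array so the layout is easy to inspect
d1 <- 2; d2 <- 3; d3 <- 4
small <- array(seq_len(d1 * d2 * d3), dim = c(d1, d2, d3))

# apply() returns one column per slice of the third dimension,
# so the transpose has one row per slice and one column per (r, c) position
flat <- t(apply(small, 3, c))
dim(flat)   # d3 rows, d1 * d2 columns

# Column r + (c_idx - 1) * d1 holds the slice small[r, c_idx, ]
r <- 2; c_idx <- 3
identical(flat[, r + (c_idx - 1) * d1], small[r, c_idx, ])
```

The columns follow R's column-major order: all rows of the first column of the array come first, then all rows of the second column, and so on.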
Here's your sample data:
set.seed(1)
d1 <- 200
d2 <- 100
d3 <- 50
data <- array(rnorm(n=d1*d2*d3), dim=c(d1, d2, d3))
Here are a few functions to compare:
funam <- function() t(apply(data, 3, c))
funrl <- function() {
myl <- vector("list", d3)
i <- 1
for (c in 1:d2) {
for(r in 1:d1){
i <- i + 1
myl[[i]] <- data[r, c, ]
}
}
do.call(cbind, myl)
}
funop <- function() {
df <- data.frame(var1 = rep(0, d3))
i <- 1
for (c in 1:d2) {
for(r in 1:d1){
i <- i + 1
df[[i]] <- data[r, c, ]
}
}
df[-1]
}
Here are the results of the timing:
system.time(am <- funam())
# user system elapsed
# 0.000 0.000 0.062
system.time(rl <- funrl())
# user system elapsed
# 3.980 0.000 1.375
system.time(op <- funop())
# user system elapsed
# 21.496 0.000 21.355
... and a comparison for equality:
all.equal(am, as.matrix(unname(op)), check.attributes = FALSE)
# [1] TRUE
all.equal(am, rl, check.attributes = FALSE)
# [1] TRUE
Here's an idea. Recommended read would be The R Inferno by Patrick Burns (pun intended?).
myl <- vector("list", d3) # create an empty list
i <- 1
system.time(
for (c in 1:d2) {
for(r in 1:d1){
i <- i + 1
myl[[i]] <- data[r, c, ]
}
})
user system elapsed
1.8 0.0 1.8
# bind each list element into a matrix, column-wise
do.call("cbind", myl)[1:5, 1:5]
[,1] [,2] [,3] [,4] [,5]
[1,] -0.3394909 0.1266012 -0.4240452 0.2277654 -2.04943585
[2,] 1.6788653 -2.9381127 0.5781967 -0.7248759 -0.19482647
[3,] -0.6002371 -0.3132874 1.0895175 -0.2766891 -0.02109013
[4,] 0.5215603 -0.2805730 -1.0325867 -1.5373842 -0.14034565
[5,] 0.6063638 1.6027835 0.5711185 0.5410889 -1.77109124
I already asked a similar question, but here the input data has different dimensions and I can't get the bigger array filled with the smaller matrices or arrays. Here is some basic example data showing my structure:
dfList <- list(data.frame(CNTRY = c("B", "C", "D"), Value=c(3,1,4)),
data.frame(CNTRY = c("A", "B", "E"),Value=c(3,5,15)))
names(dfList) <- c("111.2000", "112.2000")
The input data is a list of more than 1000 data frames, which I turned into a list of matrices with the first column as row names:
dfMATRIX <- lapply(dfList, function(x) {
m <- as.matrix(x[,-1])
rownames(m) <- x[,1]
colnames(m) <- "Value"
m
})
I then tried to fill this list of matrices into an array, as shown in my former question:
library(abind)  # provides afill()
CNTRY <- c("A", "B", "C", "D", "E")
full_dflist <- array(dim=c(length(CNTRY),1,length(dfMATRIX)))
dimnames(full_dflist) <- list(CNTRY, "Value", names(dfMATRIX))
for(i in seq_along(dfMATRIX)){
afill(full_dflist[, , i], local= TRUE ) <- dfMATRIX[[i]]
}
which gives the error message:
Error in `afill<-.default`(`*tmp*`, local = TRUE, value = c(3, 1, 4)) :
does not make sense to have more dims in value than x
Any ideas?
I also tried, as in my former question, to use acast, and also array() instead of the dfMATRIX <- lapply... command. I assume that the second dimension of my full_dflist array (sorry for the naming :)) is wrong, but I don't know how to specify the input. I appreciate your ideas very much.
Edit2: Sorry, I posted the wrong output :) Here is my new expected output:
$`111.2000`
Value
A NA
B 3
C 1
D 4
E NA
$`112.2000`
Value
A 3
B 5
C NA
D NA
E 15
This could be one solution using data.table:
library(data.table)
#create a big data.table with all the elements
biglist <- rbindlist(dfList)
#use lapply to operate on individual dfs
lapply(dfList, function(x) {
#use the big data table to merge to each one of the element dfs
temp <- merge(biglist[, list(CNTRY)], x, by='CNTRY', all.x=TRUE)
#remove the duplicate values
temp <- temp[!duplicated(temp), ]
#convert CNTRY to character and set the order on it
temp[, CNTRY := as.character(CNTRY)]
setorder(temp, 'CNTRY')
temp
})
Output:
$`111.2000`
CNTRY Value
1: A NA
2: B 3
3: C 1
4: D 4
5: E NA
$`112.2000`
CNTRY Value
1: A 3
2: B 5
3: C NA
4: D NA
5: E 15
EDIT
For your updated output you could do:
lapply(dfList, function(x) {
temp <- merge(biglist[, list(CNTRY)], x, by='CNTRY', all.x=TRUE)
temp <- temp[!duplicated(temp), ]
temp[, CNTRY := as.character(CNTRY)]
setorder(temp, 'CNTRY')
data.frame(Value=temp$Value, row.names=temp$CNTRY)
})
$`111.2000`
Value
A NA
B 3
C 1
D 4
E NA
$`112.2000`
Value
A 3
B 5
C NA
D NA
E 15
But I would really suggest keeping the list elements as data.tables rather than converting them to data.frames just so that you can have row.names.
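If the goal is still the 3-dimensional array from the question, a base R sketch using match() may be enough. This is an illustrative alternative, not the data.table approach above; the names full and full_arr are hypothetical, and it assumes each country appears at most once per data frame.

```r
dfList <- list(data.frame(CNTRY = c("B", "C", "D"), Value = c(3, 1, 4)),
               data.frame(CNTRY = c("A", "B", "E"), Value = c(3, 5, 15)))
names(dfList) <- c("111.2000", "112.2000")
CNTRY <- c("A", "B", "C", "D", "E")

# One column per data frame; countries missing from a data frame become NA
full <- sapply(dfList, function(x) x$Value[match(CNTRY, x$CNTRY)])
rownames(full) <- CNTRY

# Reshape into the countries x "Value" x list-element array from the question
full_arr <- array(full,
                  dim = c(length(CNTRY), 1, length(dfList)),
                  dimnames = list(CNTRY, "Value", names(dfList)))
full_arr[, , "111.2000"]
```

Because match() looks up each country in the full CNTRY vector, no merge or de-duplication step is needed.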
The R script is as follows.
for (i in 1:(ncol(K)-1)) #ncol=7,K is dataset
for (j in i:ncol(K)){
print(mi.empirical(rbind(K[, i],K[,j])))
}
The output is:
[1] 0
[1] 0.1412579
[1] 0.4597332
[1] 0.382798
[1] 0.1162086
[1] 0.3379114
[1] 0.4848073
[1] 0
[1] 0.3103481
[1] 0.1556235
[1] 0.02243661
[1] 0.123117
[1] 0.2290079
[1] 0
[1] 0.1460452
[1] 0.3638203
[1] 0.2569532
[1] 0.1117262
[1] 0
[1] 0.1981758
[1] 0.05325737
[1] 0.03590259
[1] 0
[1] 0.1380921
[1] 0.3007399
[1] 0
[1] 0.1032283
Can someone tell me how to print these results as an upper triangular matrix (7 x 7)?
All zeros (0) are on the diagonal. Please help me.
Thank you in advance.
If you just want to print the results as a matrix, then, using this dummy data
n <- 7
vec <- seq_len((0.5 * (n * (n-1)) + n)) ## dummy data, *inc* diagonal
which will play the role of the cumulated output from mi.empirical(), create an empty matrix
mat <- matrix(ncol = 7, nrow = 7)
Then index the upper triangle of mat using upper.tri(mat, diag = TRUE) and assign the cumulated results to it
mat[upper.tri(mat, diag = TRUE)] <- vec
> mat
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 1 2 4 7 11 16 22
[2,] NA 3 5 8 12 17 23
[3,] NA NA 6 9 13 18 24
[4,] NA NA NA 10 14 19 25
[5,] NA NA NA NA 15 20 26
[6,] NA NA NA NA NA 21 27
[7,] NA NA NA NA NA NA 28
You are going to need to put all the outputs from mi.empirical() into a single vector though, rather than printing each intermediary result.
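One caveat: assigning through upper.tri() fills the triangle column by column, whereas the nested loop in the question produces its values row by row (i is the outer loop). A hedged sketch of a row-wise fill, using the same dummy data, is to fill the lower triangle and transpose. Note also that the loop in the question stops at ncol(K) - 1 and so yields 27 values; the final diagonal entry would have to be appended (for example as 0) before using this with real output.

```r
n <- 7
vec <- seq_len(n * (n + 1) / 2)   # dummy data in row-wise order, including the diagonal

mat <- matrix(NA_real_, nrow = n, ncol = n)
# The lower triangle is filled column by column ...
mat[lower.tri(mat, diag = TRUE)] <- vec
# ... so after transposing, the upper triangle is filled row by row
mat <- t(mat)
mat
```

With this layout the first row holds the first seven values, the second row the next six starting on the diagonal, and so on.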
upper.tri is probably what you're looking for:
#Generating a reproducible 7x7 matrix:
set.seed(1)
m <- matrix(rexp(49, rate=.1), ncol=7)
diag(m) <- rep(0, ncol(m))
#The calculation you're looking for:
m[upper.tri(m, diag = FALSE)]
You are looping over the indices anyway, so you may as well fill a matrix while you are doing it.
For example:
answers <- matrix(NA_real_, nrow=ncol(K) - 1, ncol=ncol(K))  # numeric, since mi.empirical() returns numbers
for (i in 1:(ncol(K)-1))
for (j in i:ncol(K)){
answers[i, j] <- mi.empirical(rbind(K[, i],K[,j]))
}
print(answers)