I have an array that I want to melt based on its dimnames. The problem is that the dimension names are large numeric IDs, and converting them to character afterwards produces the wrong ID. See the example:
test <- array(1:18, dim = c(3,3,2), dimnames = list(c(00901291282245454545454,329293929929292,2929992929922929),
c("a", "b", "c"),
c("d", "e")))
library(reshape2)
library(data.table)
test2 <- data.table(melt(test))
test2[, Var1 := as.character(Var1)]
> test2
Var1 Var2 Var3 value
1: 9.01291282245455e+20 a d 1
2: 329293929929292 a d 2
3: 2929992929922929 a d 3
4: 9.01291282245455e+20 b d 4
5: 329293929929292 b d 5
6: 2929992929922929 b d 6
7: 9.01291282245455e+20 c d 7
8: 329293929929292 c d 8
9: 2929992929922929 c d 9
10: 9.01291282245455e+20 a e 10
11: 329293929929292 a e 11
12: 2929992929922929 a e 12
13: 9.01291282245455e+20 b e 13
14: 329293929929292 b e 14
15: 2929992929922929 b e 15
16: 9.01291282245455e+20 c e 16
17: 329293929929292 c e 17
18: 2929992929922929 c e 18
How can I make the first column, the one with the large IDs, a character column? My current workaround is to paste a letter onto the dimnames, melt, convert to character and then take a substring, which is really inefficient. Efficiency matters because the dataset has millions of rows. There are two problems: first, leading zeros in the ID are dropped; second, the ID is converted to scientific notation (e+20).
You need to define your dimnames as character and then slightly modify melt.array, which is called when you call melt on your array:
test <- array(1:18, dim = c(3,3,2), dimnames = list(c("00901291282245454545454", "329293929929292", "2929992929922929"),
c("a", "b", "c"),
c("d", "e")))
Customise melt.array by adding a parameter that controls whether the conversion happens:
melt.array2 <- function (data, varnames = names(dimnames(data)), conv=TRUE, ...)
{
values <- as.vector(data)
dn <- dimnames(data)
if (is.null(dn))
dn <- vector("list", length(dim(data)))
dn_missing <- sapply(dn, is.null)
dn[dn_missing] <- lapply(dim(data), function(x) 1:x)[dn_missing]
if(conv){ # conv is the new parameter to know if conversion needs to be done
char <- sapply(dn, is.character)
dn[char] <- lapply(dn[char], type.convert)
}
indices <- do.call(expand.grid, dn)
names(indices) <- varnames
data.frame(indices, value = values)
}
Try the new function on your example (with conv=FALSE):
head(melt.array2(test, conv=FALSE))
# X1 X2 X3 value
# 1 00901291282245454545454 a d 1
# 2 329293929929292 a d 2
# 3 2929992929922929 a d 3
# 4 00901291282245454545454 b d 4
# 5 329293929929292 b d 5
# 6 2929992929922929 b d 6
EDIT
In the development version of reshape2 (devtools::install_github("hadley/reshape")), melt.array is defined differently and you can use the as.is parameter to avoid the conversion:
melt(test, as.is=TRUE)
will give you the same result as above (with Var1 etc instead of X1 etc).
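The reason the dimnames have to be character from the start is not specific to R: once an ID like this has been parsed as a number, the leading zeros and the low-order digits are gone and no later conversion brings them back. A minimal Python sketch of the same effect (the variable names are made up for this illustration):

```python
# The first ID from the question, kept as text.
id_text = "00901291282245454545454"

# Parsing it as a floating-point number discards the leading zeros and
# rounds to the nearest representable double.
as_number = float(id_text)
round_trip = str(as_number)

print(round_trip)                      # scientific notation, not the original ID
print(round_trip == id_text)           # False: the text form is not recovered
print(int(as_number) == int(id_text))  # False: precision was lost as well
```

So the fix has to happen where the IDs are first written down (quoted dimnames), not after melting.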
Following up on the answers to the post "Permutations of 3 elements within 6 positions", I think it is worth opening a new discussion about how to order the elements.
The first condition was to have always sequences with alternate elements:
# Var1 Var2 Var3 Var4 Var5 Var6 V7
# 1 b c a b c a bcabca
# 2 c a b c a b cabcab
# 3 a b c a b c abcabc
# 4 b a b c a b babcab
# 5 c b c a b c cbcabc
# 6 a c a b c a acabca
However, the remaining permutations could still be useful even if they break the no-equal-neighbours restriction exactly once. For instance:
# Var1 Var2 Var3 Var4 Var5 Var6 Coincidence
# 1 b b a b c a -->[bb]
# 2 c c b c a b -->[cc]
# 3 a b c a a c -->[aa]
# 4 b a c c a b -->[cc]
Is it possible to use expand.grid for that too?
If it's "only one more", then I suggest the simplest way to allow it is to force it.
Using the start from the previous question:
abc <- letters[1:3] # assumed to be defined as in the previous question
r <- replicate(6, seq_len(length(abc)-1), simplify=FALSE)
r[[1]] <- c(r[[1]], length(abc))
We now copy this single list (that is passed to expand.grid) and replace each of the 2nd through last elements with 0. Recall that we are using these numbers with cumsum to change from the previous value, so replacing 1:2 with 0 means that we are forcing the next element to be the same.
rs <- lapply(seq_len(length(r)-1) + 1, function(i) { r[[i]] <- 0; r; })
# ^^^^^^^^^^^^^^^^^^^^^^^^ or: seq_len(length(r))[-1]
str(rs[1:2])
# List of 2
# $ :List of 6
# ..$ : int [1:3] 1 2 3
# ..$ : num 0 <--- the second letter will repeat
# ..$ : int [1:2] 1 2
# ..$ : int [1:2] 1 2
# ..$ : int [1:2] 1 2
# ..$ : int [1:2] 1 2
# $ :List of 6
# ..$ : int [1:3] 1 2 3
# ..$ : int [1:2] 1 2
# ..$ : num 0 <--- the third letter will repeat
# ..$ : int [1:2] 1 2
# ..$ : int [1:2] 1 2
# ..$ : int [1:2] 1 2
### other rs's are similar
We can verify that this works as we think it should:
# rs[[1]] repeats the first 2
m <- t(apply(do.call(expand.grid, rs[[1]]), 1, cumsum) %% length(abc) + 1)
m[] <- abc[m]
head(as.data.frame(cbind(m, apply(m, 1, paste, collapse = ""))), n=3)
# Var1 Var2 Var3 Var4 Var5 Var6 V7
# 1 b b c a b c bbcabc
# 2 c c a b c a ccabca
# 3 a a b c a b aabcab
# rs[[3]] repeats the 3rd-4th
m <- t(apply(do.call(expand.grid, rs[[3]]), 1, cumsum) %% length(abc) + 1)
m[] <- abc[m]
head(as.data.frame(cbind(m, apply(m, 1, paste, collapse = ""))), n=3)
# Var1 Var2 Var3 Var4 Var5 Var6 V7
# 1 b c a a b c bcaabc
# 2 c a b b c a cabbca
# 3 a b c c a b abccab
From here, let's automate it by putting all of these into one list and lapplying them.
rs <- c(list(r), rs)
rets <- do.call(rbind.data.frame, c(stringsAsFactors=FALSE, lapply(rs, function(r) {
m <- t(apply(do.call(expand.grid, r), 1, cumsum) %% length(abc) + 1)
m[] <- abc[m]
as.data.frame(cbind(m, apply(m, 1, paste, collapse = "")), stringsAsFactors=FALSE)
})))
head(rets)
# Var1 Var2 Var3 Var4 Var5 Var6 V7
# 1 b c a b c a bcabca
# 2 c a b c a b cabcab
# 3 a b c a b c abcabc
# 4 b a b c a b babcab
# 5 c b c a b c cbcabc
# 6 a c a b c a acabca
tail(rets)
# Var1 Var2 Var3 Var4 Var5 Var6 V7
# 331 b c b a c c bcbacc
# 332 c a c b a a cacbaa
# 333 a b a c b b abacbb
# 334 b a c b a a bacbaa
# 335 c b a c b b cbacbb
# 336 a c b a c c acbacc
Walkthrough of additional steps:
rs <- c(list(r), rs) makes the first (non-repeating r) an enclosed list, then prepends it to the rs list.
lapply(rs, function(r) ...) does the ... from the previous question once for each element in the rs list. I named it r inside the anon-function to make it perfectly clear (inside the function) that each time it gets a new r, it does exactly the same steps as the last question.
do.call(rbind.data.frame, c(stringsAsFactors=FALSE, ... because each return from the lapply will be a data.frame, and we want to combine them into a single frame. I prefer no factors, but you can choose otherwise if you need. (Instead of rbind.data.frame, you could use data.table::rbindlist or dplyr::bind_rows, both without stringsAsFactors.)
Now the first 96 rows have no repeats, then the remaining five batches of 48 rows each (total 336 rows) have one repeat each. We "know" that 48 is the right number for each of the repeat-once lists, since by changing one of the positions from "1 2" possible to "0" (from 2 to 1 possible value) we halve the total number of possible combinations (96 / 2 == 48).
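The row counts above can be cross-checked by brute force. This is only a sanity check in Python, not part of the R solution: it enumerates all 729 six-letter sequences and counts adjacent repeats. The helper name adjacent_repeats is made up for this sketch.

```python
from itertools import product

letters = "abc"

def adjacent_repeats(seq):
    """Count positions where a letter equals the one just before it."""
    return sum(a == b for a, b in zip(seq, seq[1:]))

sequences = list(product(letters, repeat=6))  # all 3**6 = 729 candidates

no_repeat = [s for s in sequences if adjacent_repeats(s) == 0]
one_repeat = [s for s in sequences if adjacent_repeats(s) == 1]

# For each of the five adjacent pairs, count the one-repeat sequences
# whose single repeat sits at that pair.
per_position = [
    sum(1 for s in one_repeat if s[i] == s[i + 1])
    for i in range(5)
]

print(len(no_repeat))                    # 96
print(per_position)                      # [48, 48, 48, 48, 48]
print(len(no_repeat) + len(one_repeat))  # 336
```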
If for some reason your next question asks how to expand this to allow two repeats ... then I wouldn't necessarily recommend brute-forcing this aspect of it: there are 6 or 10 possible combinations (depending on if "aaa" is allowed) of repeats, and I would much prefer to go to a more programmatic handling than this brute-force appending of the one-constraint.
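The "6 or 10" figure can be checked the same way. A small Python sketch, assuming a repeat is identified by which of the five adjacent pairs it occupies: choosing two pairs gives 10 placements, and dropping the placements where the two pairs touch (which would produce a triple such as "aaa") leaves 6.

```python
from itertools import combinations

# The five adjacent pairs in a six-letter sequence, identified by the
# index of the pair's first letter: (0,1), (1,2), ..., (4,5).
pair_positions = range(5)

all_choices = list(combinations(pair_positions, 2))
# Two neighbouring pairs, e.g. (0,1) and (1,2), form a triple such as "aaa".
without_triples = [(i, j) for i, j in all_choices if j - i > 1]

print(len(all_choices))      # 10
print(len(without_triples))  # 6
```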
I have an empty pandas DataFrame:
aqi_df = pd.DataFrame(columns = ["IMEI","Date","pm10conc_24hrs","pm25conc_24hrs","sdPm10","sdPm25","aqi","windspeed","winddirection","severity","health_impact"] )
I want to add elements one by one to each column -
for i in range(1,10):
    aqi_df.IMEI.append("a")
    aqi_df.Date.append("b")
    aqi_df.pm10conc_24hrs.append("c")
.
.
.
But append throws an error:
TypeError: cannot concatenate a non-NDFrame object
How can I append elements to pandas dataframe one by one?
IIUC you can use:
aqi_df = pd.DataFrame(columns = ["IMEI","Date","pm10conc_24hrs"] )
print (aqi_df)
for i in range(1,10):
    aqi_df.loc[i] = ['a','b','c']
print (aqi_df)
IMEI Date pm10conc_24hrs
1 a b c
2 a b c
3 a b c
4 a b c
5 a b c
6 a b c
7 a b c
8 a b c
9 a b c
A better approach is to create the DataFrame from Series or from a dict:
IMEI = pd.Series(['aa','bb','cc'])
Date = pd.Series(['2016-01-03','2016-01-06','2016-01-08'])
pm10conc_24hrs = pd.Series(['w','e','h'])
aqi_df = pd.DataFrame({'a':IMEI,'Date':Date,'pm10conc_24hrs':pm10conc_24hrs})
print (aqi_df)
Date a pm10conc_24hrs
0 2016-01-03 aa w
1 2016-01-06 bb e
2 2016-01-08 cc h
aqi_df = pd.DataFrame({'a':['aa','bb','cc'],
'Date':['2016-01-03','2016-01-06','2016-01-08'],
'pm10conc_24hrs':['w','e','h']})
print (aqi_df)
Date a pm10conc_24hrs
0 2016-01-03 aa w
1 2016-01-06 bb e
2 2016-01-08 cc h
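If the values really do arrive one at a time, one common pattern is to collect each row in a plain Python list and build the DataFrame once at the end, instead of growing the DataFrame inside the loop. A minimal sketch, using the placeholder values from the question and only three of its columns:

```python
import pandas as pd

columns = ["IMEI", "Date", "pm10conc_24hrs"]

# Accumulate rows as plain dicts; appending to a list is cheap.
rows = []
for i in range(1, 10):
    rows.append({"IMEI": "a", "Date": "b", "pm10conc_24hrs": "c"})

# Build the DataFrame once, fixing the column order explicitly.
aqi_df = pd.DataFrame(rows, columns=columns)
print(aqi_df)
```

Passing columns explicitly keeps the column order independent of the pandas version.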
I have an array / named vector that looks like this:
d f g
1 2 3
I want to fill up the empty slots, meaning I want this:
a b c d e f g
0 0 0 1 0 2 3
Is there an elegant way of doing this, without having to write loops and conditionals? In my actual problem, instead of abcd as my array names, it's numbers. Not sure if that makes a difference. Figured alphabet is easier to understand for a reproducible example.
Create a vector of the final names, nms and then create a named vector of zeros from it using sapply and replace the elements corresponding to input names with the input values.
v <- c(d = 1, f = 2, g = 3) # input
nms <- letters[letters <= max(names(v))] # names on output vector, i.e. letters[1:7]
replace(sapply(nms, function(x) 0), names(v), v) ##
giving:
a b c d e f g
0 0 0 1 0 2 3
If in your actual vector the names are not letters then just set nms yourself. For example, nms <- c("dogs", "cats", "d", "elephants", "f", "g") would work with the same line marked ## above.
2) An alternative is to replace the line marked ## above with:
unlist(modifyList(as.list(setNames(numeric(length(nms)), nms)), as.list(v)))
Data
x <- c(d=1L,f=2L,g=3L);
x;
## d f g
## 1 2 3
Solution 1: First match new names into x and extract values, then replace NAs with zero.
x <- setNames(x[match(letters[1:7],names(x))],letters[1:7]);
x[is.na(x)] <- 0L;
x;
## a b c d e f g
## 0 0 0 1 0 2 3
Solution 2: One-liner, using nomatch argument of match().
setNames(c(x,0L)[match(letters[1:7],names(x),nomatch=length(x)+1L)],letters[1:7]);
## a b c d e f g
## 0 0 0 1 0 2 3
I already asked a similar question, but this time the input data has different dimensions, and I cannot fill the bigger array with the smaller matrices or arrays. Here is some basic example data showing my structure:
dfList <- list(data.frame(CNTRY = c("B", "C", "D"), Value=c(3,1,4)),
data.frame(CNTRY = c("A", "B", "E"),Value=c(3,5,15)))
names(dfList) <- c("111.2000", "112.2000")
The input data is a list of more than 1000 data frames, which I turned into a list of matrices with the first column as row names:
dfMATRIX <- lapply(dfList, function(x) {
m <- as.matrix(x[,-1])
rownames(m) <- x[,1]
colnames(m) <- "Value"
m
})
I then tried to fill an array with this list of matrices, as shown in my former question:
library(abind) # provides afill(); the original used a custom loadandinstall() helper
CNTRY <- c("A", "B", "C", "D", "E")
full_dflist <- array(dim=c(length(CNTRY),1,length(dfMATRIX)))
dimnames(full_dflist) <- list(CNTRY, "Value", names(dfMATRIX))
for(i in seq_along(dfMATRIX)){
afill(full_dflist[, , i], local= TRUE ) <- dfMATRIX[[i]]
}
which gives the error message:
Error in `afill<-.default`(`*tmp*`, local = TRUE, value = c(3, 1, 4)) :
does not make sense to have more dims in value than x
Any ideas?
As in my former question, I also tried acast and array() instead of the dfMATRIX <- lapply... command. I assume the second dimension of my full_dflist array (sorry for the naming :)) is wrong, but I don't know how to write the input. I appreciate your ideas very much.
Edit2: Sorry, I posted the wrong output :) Here is my new expected output:
$`111.2000`
Value
A NA
B 3
C 1
D 4
E NA
$`112.2000`
Value
A 3
B 5
C NA
D NA
E 15
This could be one solution using data.table:
library(data.table)
#create a big data.table with all the elements
biglist <- rbindlist(dfList)
#use lapply to operate on individual dfs
lapply(dfList, function(x) {
#use the big data table to merge to each one of the element dfs
temp <- merge(biglist[, list(CNTRY)], x, by='CNTRY', all.x=TRUE)
#remove the duplicate values
temp <- temp[!duplicated(temp), ]
#convert CNTRY to character and set the order on it
temp[, CNTRY := as.character(CNTRY)]
setorder(temp, 'CNTRY')
temp
})
Output:
$`111.2000`
CNTRY Value
1: A NA
2: B 3
3: C 1
4: D 4
5: E NA
$`112.2000`
CNTRY Value
1: A 3
2: B 5
3: C NA
4: D NA
5: E 15
EDIT
For your updated output you could do:
lapply(dfList, function(x) {
temp <- merge(biglist[, list(CNTRY)], x, by='CNTRY', all.x=TRUE)
temp <- temp[!duplicated(temp), ]
temp[, CNTRY := as.character(CNTRY)]
setorder(temp, 'CNTRY')
data.frame(Value=temp$Value, row.names=temp$CNTRY)
})
$`111.2000`
Value
A NA
B 3
C 1
D 4
E NA
$`112.2000`
Value
A 3
B 5
C NA
D NA
E 15
But I would really suggest keeping the list of data.table elements rather than converting to data.frames just to get row.names.
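The core of the approach is to build the full set of countries once and then align every table to it, filling the gaps with a missing value. A language-neutral sketch of that step in Python, with plain dicts standing in for the data frames and None standing in for NA (both are assumptions of this illustration):

```python
# Each table maps country -> value, mirroring dfList from the question.
tables = {
    "111.2000": {"B": 3, "C": 1, "D": 4},
    "112.2000": {"A": 3, "B": 5, "E": 15},
}

# Union of all countries across the tables, in sorted order.
all_countries = sorted({c for t in tables.values() for c in t})

# Align every table to the full country list; missing entries become None.
aligned = {
    name: {c: t.get(c) for c in all_countries}
    for name, t in tables.items()
}

print(aligned["111.2000"])
# {'A': None, 'B': 3, 'C': 1, 'D': 4, 'E': None}
```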
I have an array of names and a function that returns a data frame. I want to combine the array with the data frames. For example:
mynames <- c("a", "b", "c")
df1 <- data.frame(val0=c("d", "e"),val1=4:5)
df2 <- data.frame(val1=c("e", "f"),val2=5:6)
df3 <- data.frame(val2=c("f", "g"),val3=6:7)
What I want is a data frame that joins the array with the data frames: df1 corresponds to "a", df2 to "b", and so on. The final data frame should look like this:
Names Var Val
a d 4
a e 5
b e 5
b f 6
c f 6
c g 7
Can someone help me on this?
Thanks.
This answers this particular question, but I'm not sure how much help it will be for your actual problem:
myList <- list(df1, df2, df3)
do.call(rbind,
lapply(seq_along(mynames), function(x)
cbind(Names = mynames[x], setNames(myList[[x]],
c("Var", "Val")))))
# Names Var Val
# 1 a d 4
# 2 a e 5
# 3 b e 5
# 4 b f 6
# 5 c f 6
# 6 c g 7
Here, we create a list of your data.frames, and in our lapply call, we add in the new "Names" column and rename the existing columns so that we can use rbind to put them all together.
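The pattern is: pair each name with its table, tag every row of that table with the name, and stack the results under common column names. The same steps as a Python sketch, with lists of tuples standing in for the data frames (an assumption of this illustration):

```python
names = ["a", "b", "c"]
tables = [
    [("d", 4), ("e", 5)],  # df1
    [("e", 5), ("f", 6)],  # df2
    [("f", 6), ("g", 7)],  # df3
]

# Pair each name with its table, then emit one tagged record per row.
combined = [
    {"Names": name, "Var": var, "Val": val}
    for name, table in zip(names, tables)
    for var, val in table
]

for row in combined:
    print(row["Names"], row["Var"], row["Val"])
```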