R: Combine a list to a data frame - arrays

I have an array of names and a function that returns a data frame. I want to combine this array with the data frames. For example:
>mynames<-c("a", "b", "c")
>df1 <- data.frame(val0=c("d", "e"),val1=4:5)
>df2 <- data.frame(val1=c("e", "f"),val2=5:6)
>df3 <- data.frame(val2=c("f", "g"),val3=6:7)
What I want is a single data frame that joins this array with the data frames: df1 corresponds to "a", df2 corresponds to "b", and so on. The final data frame should look like this:
Names Var Val
a d 4
a e 5
b e 5
b f 6
c f 6
c g 7
Can someone help me on this?
Thanks.

This answers this particular question, but I'm not sure how much help it will be for your actual problem:
myList <- list(df1, df2, df3)
do.call(rbind,
        lapply(seq_along(mynames), function(x)
          cbind(Names = mynames[x],
                setNames(myList[[x]], c("Var", "Val")))))
# Names Var Val
# 1 a d 4
# 2 a e 5
# 3 b e 5
# 4 b f 6
# 5 c f 6
# 6 c g 7
Here, we create a list of your data.frames, and in our lapply call, we add in the new "Names" column and rename the existing columns so that we can use rbind to put them all together.
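The same idea can also be sketched with Map(), which walks over the names and the data frames in parallel instead of indexing both by position. This is only an illustrative variant of the answer above; tag_frame is a hypothetical helper name, and the three data frames are repeated from the question so that the snippet runs on its own.

```r
mynames <- c("a", "b", "c")
myList <- list(
  data.frame(val0 = c("d", "e"), val1 = 4:5),
  data.frame(val1 = c("e", "f"), val2 = 5:6),
  data.frame(val2 = c("f", "g"), val3 = 6:7)
)

# tag_frame() is a hypothetical helper: it renames the two columns to a
# common scheme and prepends the name that belongs to this data frame.
tag_frame <- function(name, df) {
  cbind(Names = name, setNames(df, c("Var", "Val")))
}

# Map() pairs mynames[i] with myList[[i]]; unname() keeps the list names
# from leaking into the row names when the pieces are stacked.
result <- do.call(rbind, unname(Map(tag_frame, mynames, myList)))
rownames(result) <- NULL
print(result)
```

Renaming the columns before stacking is the step that matters: rbind() refuses to combine data frames whose column names differ.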

Related

I want to keep the index when using "pd.Series(a,index=).unique"

I have a problem with pd.Series(a).unique().
I made a Series and called .unique() on it.
However, this drops the pd.Series index.
How can I get the unique values while keeping the original index?
Instead of using .unique() you can use .drop_duplicates():
x = pd.Series([1,2,3,1,1,2,4,5,6], index=list("abcdefghi"))
print(x)
a 1
b 2
c 3
d 1
e 1
f 2
g 4
h 5
i 6
dtype: int64
.drop_duplicates() will remove all duplicates from the Series while maintaining reference to the index. You can choose whether you want to keep the index location of the "first" or the "last" duplicated item via the keep argument:
# Keep the first entry of each duplicated value
x.drop_duplicates(keep="first")
a 1
b 2
c 3
g 4
h 5
i 6
dtype: int64
# Keep the last entry of each duplicated item
x.drop_duplicates(keep="last")
c 3
e 1
f 2
g 4
h 5
i 6
dtype: int64

Split an array into smaller arrays using R

I have a 1D array with an odd number of rows, 2435 rows. I want to split the array into smaller arrays and each time perform a small test.
Firstly, I want to split the big array into two smaller arrays.
Then I would like to split my array into 4 smaller arrays, then into 8 small arrays and so on.
Can anyone help with that?
An example is the following:
A<-1:2435
A1
1,2,3,4,...,1237
A2
1238, 1239,...,2435
Thanks in advance
Why not simply use the split() function? For example (using an odd length will return warnings but that's fine):
split(x = 1:11, f = 1:2) # to split into 2 distinct list elements
#$`1`
#[1] 1 3 5 7 9 11
#
#$`2`
#[1] 2 4 6 8 10
split(x = 1:11, f = 1:4)
#$`1`
#[1] 1 5 9
#
#$`2`
#[1] 2 6 10
#
#$`3`
#[1] 3 7 11
#
#$`4`
#[1] 4 8
And if you are really keen on splitting to 2, and then again by 2, you can always use the lapply() function which works on each element of a list:
lapply(split(x = 1:11, f = 1:2), split, f = 1:2)
#$`1`
#$`1`$`1`
#[1] 1 5 9
#
#$`1`$`2`
#[1] 3 7 11
#
#
#$`2`
#$`2`$`1`
#[1] 2 6 10
#
#$`2`$`2`
#[1] 4 8
The nested structure is a little bit of a pain but there are other methods for dealing with that, for example:
L <- split(x = 1:11, f = 1:2) # the main (first) split
names(L) <- letters[1:length(L)] # names the main split a and b
LL <- lapply(L, split, f = 1:2) # split the main split
unlist(LL, recursive = FALSE)
#$a.1
#[1] 1 5 9
#
#$a.2
#[1] 3 7 11
#
#$b.1
#[1] 2 6 10
#
#$b.2
#[1] 4 8
If you want to split the data through the middle of the array, you can also use the split function:
a <- 1:2435
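divide <- function(x, n = 2)
{
  # chunk size, rounded up so that at most n chunks are produced
  i <- ceiling(length(x) / n)
  # group by position rather than by value, so this works for any vector
  split(x, (seq_along(x) - 1) %/% i + 1)
}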
divide(a)
and with more parts you can use
divide(a, n = 4)
Or, in two iterations, use
lapply(divide(a,2),function(x) divide(x,2))
With a higher value of n the chunk sizes will no longer be equal because of rounding, which is an argument for the nested approach.
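If contiguous chunks of nearly equal size are wanted for any n, one possible sketch is to compute the group of each position directly instead of fixing a chunk size first. The helper name split_contiguous is hypothetical and not part of base R; it assumes length(x) is at least n.

```r
# split_contiguous() is a hypothetical helper, not part of base R.
# Position k is assigned to group ceiling(k * n / length(x)), so the
# groups are contiguous and their sizes differ by at most one.
split_contiguous <- function(x, n) {
  grp <- ceiling(seq_along(x) * n / length(x))
  split(x, grp)
}

a <- 1:2435

# Repeat the split with 2, 4 and 8 chunks and report the chunk sizes.
for (n in c(2, 4, 8)) {
  parts <- split_contiguous(a, n)
  cat(n, "chunks, sizes:", lengths(parts), "\n")
}

# Keep the 8-chunk split for further tests on each chunk.
parts <- split_contiguous(a, 8)
```

Each element of parts can then be passed to lapply() to run the small test on every chunk.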

Replace corresponding parts of one array with another array in R

I have an array / named vector that looks like this:
d f g
1 2 3
I want to fill up the empty slots, meaning I want this:
a b c d e f g
0 0 0 1 0 2 3
Is there an elegant way of doing this, without having to write loops and conditionals? In my actual problem, instead of abcd as my array names, it's numbers. Not sure if that makes a difference. Figured alphabet is easier to understand for a reproducible example.
1) Create a vector of the final names, nms, then create a named vector of zeros from it using sapply, and replace the elements corresponding to the input names with the input values.
v <- c(d = 1, f = 2, g = 3) # input
nms <- letters[letters <= max(names(v))] # names on output vector, i.e. letters[1:7]
replace(sapply(nms, function(x) 0), names(v), v) ##
giving:
a b c d e f g
0 0 0 1 0 2 3
If in your actual vector the names are not letters then just set nms yourself. For example, nms <- c("dogs", "cats", "d", "elephants", "f", "g") would work with the same line marked ## above.
2) An alternative is to replace the line marked ## above with:
unlist(modifyList(as.list(setNames(numeric(length(nms)), nms)), as.list(v)))
Data
x <- c(d=1L,f=2L,g=3L);
x;
## d f g
## 1 2 3
Solution 1: First match new names into x and extract values, then replace NAs with zero.
x <- setNames(x[match(letters[1:7],names(x))],letters[1:7]);
x[is.na(x)] <- 0L;
x;
## a b c d e f g
## 0 0 0 1 0 2 3
Solution 2: One-liner, using nomatch argument of match().
setNames(c(x,0L)[match(letters[1:7],names(x),nomatch=length(x)+1L)],letters[1:7]);
## a b c d e f g
## 0 0 0 1 0 2 3

R: How to fill one column matrices of different dimensions in a LOOP?

I already asked a similar question, but this time the input data has different dimensions and I can't get the bigger array filled with the smaller matrices or arrays. Here is some basic example data showing my structure:
dfList <- list(data.frame(CNTRY = c("B", "C", "D"), Value=c(3,1,4)),
data.frame(CNTRY = c("A", "B", "E"),Value=c(3,5,15)))
names(dfList) <- c("111.2000", "112.2000")
The input data is a list of more than 1000 data frames, which I turned into a list of matrices with the first column as row names:
dfMATRIX <- lapply(dfList, function(x) {
m <- as.matrix(x[,-1])
rownames(m) <- x[,1]
colnames(m) <- "Value"
m
})
I then tried to fill this list of matrices into an array, as shown in my former question:
loadandinstall("abind")
CNTRY <- c("A", "B", "C", "D", "E")
full_dflist <- array(dim=c(length(CNTRY),1,length(dfMATRIX)))
dimnames(full_dflist) <- list(CNTRY, "Value", names(dfMATRIX))
for(i in seq_along(dfMATRIX)){
afill(full_dflist[, , i], local= TRUE ) <- dfMATRIX[[i]]
}
which gives the error message:
Error in `afill<-.default`(`*tmp*`, local = TRUE, value = c(3, 1, 4)) :
does not make sense to have more dims in value than x
Any ideas?
I also tried as in my former question to use acast and also array() instead of the dfMATRIX <- lapply... command. I would assume that the 2nd dimension of my full_dflist-array (sorry for the naming:)) is wrong, but I don't know how to write the input. I appreciate your ideas very much.
Edit2: Sorry, I posted the wrong output :) Here is my new expected output:
$`111.2000`
Value
A NA
B 3
C 1
D 4
E NA
$`112.2000`
Value
A 3
B 5
C NA
D NA
E 15
This could be one solution using data.table:
library(data.table)
#create a big data.table with all the elements
biglist <- rbindlist(dfList)
#use lapply to operate on individual dfs
lapply(dfList, function(x) {
  #use the big data table to merge to each one of the element dfs
  temp <- merge(biglist[, list(CNTRY)], x, by='CNTRY', all.x=TRUE)
  #remove the duplicate values
  temp <- temp[!duplicated(temp), ]
  #convert CNTRY to character and set the order on it
  temp[, CNTRY := as.character(CNTRY)]
  setorder(temp, 'CNTRY')
  temp
})
Output:
$`111.2000`
CNTRY Value
1: A NA
2: B 3
3: C 1
4: D 4
5: E NA
$`112.2000`
CNTRY Value
1: A 3
2: B 5
3: C NA
4: D NA
5: E 15
EDIT
For your updated output you could do:
lapply(dfList, function(x) {
  temp <- merge(biglist[, list(CNTRY)], x, by='CNTRY', all.x=TRUE)
  temp <- temp[!duplicated(temp), ]
  temp[, CNTRY := as.character(CNTRY)]
  setorder(temp, 'CNTRY')
  data.frame(Value=temp$Value, row.names=temp$CNTRY)
})
$`111.2000`
Value
A NA
B 3
C 1
D 4
E NA
$`112.2000`
Value
A 3
B 5
C NA
D NA
E 15
But I would really suggest keeping the list with data.table elements rather than converting to data.frames so that you can have row.names.
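For comparison, the same alignment can be sketched in base R with match(), which looks up each country of the full list in one data frame and yields NA where it is missing. This is an alternative to the data.table answer above, not a restatement of it; the object names full and arr are hypothetical, and the example data is repeated from the question.

```r
dfList <- list(
  "111.2000" = data.frame(CNTRY = c("B", "C", "D"), Value = c(3, 1, 4)),
  "112.2000" = data.frame(CNTRY = c("A", "B", "E"), Value = c(3, 5, 15))
)
CNTRY <- c("A", "B", "C", "D", "E")

# For each data frame, pick the Value that belongs to every country in
# CNTRY; countries absent from that data frame come back as NA.
full <- sapply(dfList, function(x) x$Value[match(CNTRY, x$CNTRY)])
rownames(full) <- CNTRY
print(full)

# If the 3-D layout from the question is needed (country x "Value" x id),
# the matrix can be reshaped without changing its contents.
arr <- array(full,
             dim = c(length(CNTRY), 1, length(dfList)),
             dimnames = list(CNTRY, "Value", names(dfList)))
```

Because every column is built against the same CNTRY vector, all slices share the same row order and no loop with afill is needed.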

R: JSON Package - importing data & missing values / null

I am reading in data with the JSON package.
Basically, the data has the following format:
{"a":1,"b":2,"c":3}
{"a": null,"b":2,"c":3}
I am storing the data as follows in R:
DAT<-data.table(read.csv("D:/file.csv"))
i<-1
#create unified variable names
while (i<=nrow(DAT)) {
OUT[[i]]<-fromJSON(as.character(DAT[i]$results))
vnames<-c(vnames,names(OUT[[i]]))
i<-i+1
}
#create the corresponding content
content <- NULL
Applicant <- NULL
i<-1
while (i<=nrow(DAT)) {
temp<-fromJSON(as.character(DAT[i]$results))
laenge <- length(fromJSON(as.character(DAT[i]$results)))
for(j in 1:laenge)
{
content_new <- as.character(temp[[j]])
content <- c(content, content_new)
}
i <- i+1
}
Then I want to join the lists via (in order to have the data in the typical format):
assets_mren = data.frame(asset_class=vnames, value=content)
Yet I receive an error message stating that vnames and content have different number of rows. I believe that the problem is "null" in the data to be read in. Do you have an idea how to read in "null" above or how to better read in the data?
Yes, the problem is null: you get a different structure for each row.
ll <- '{"a":1,"b":2,"c":3}
{"a": null,"b":2,"c":3}'
## split the text into one JSON record per line before parsing
res <- lapply(readLines(textConnection(ll)), function(x) str(fromJSON(x)))
Named num [1:3] 1 2 3 ## named vector for the first line
- attr(*, "names")= chr [1:3] "a" "b" "c"
List of 3
$ a: NULL ## list for the second line
$ b: num 2
$ c: num 3
So you have to homogenise the output of each line. Here are two options:
1- Replace null with a dummy value (0 or -1, for example):
ll <- readLines(textConnection(gsub("null",-1,ll)))
do.call(rbind,lapply(ll,function(x)
fromJSON(x)))
a b c
[1,] 1 2 3
[2,] -1 2 3 ## res[res==-1] <- NA to replace dummy value
2- Keep the null, but then use rbind.fill (from plyr) to get a data.frame:
ll <- '{"a":1,"b":2,"c":3}
{"a": null,"b":2,"c":3}'
ll <- readLines(textConnection(ll))
res <- lapply(ll,function(x)
as.data.frame(t(as.matrix(unlist(fromJSON(x))))))
library(plyr)
rbind.fill(res)
a b c
1 1 2 3
2 NA 2 3

Resources