Two-dimensional data structure of objects - arrays

I would like to store an object in a two-dimensional data structure in R. I have searched and tried several solutions, but none of them do what I want. This is what I had in mind:
S = SomeTwoDimensionalStructure(dim=c(2,4))
S[1,1] = LoadDataObject("File1")
S[1,2] = LoadDataObject("File2")
# etc
FunctionWantingObject(S[1,1])
This solution is quite close, but requires accessing S[[1,1]] instead of S[1,1].
Adding the objects to a list and then using dim resulted in the later functions not being happy with the argument passed.

If you're willing to give your two dimensional structure a new class, you can then define a special [ method for it that does what you want.
## Make sample data, a matrix of lists, of class "listmatrix"
set.seed(44)
m <- matrix(lapply(sample(9), function(X) sample(letters, size=X)), ncol=3)
class(m) <- "listmatrix"
## Define a new `[` method for "listmatrix" objects
`[.listmatrix` <- function(x,i,j,...) `[[`(x,i,j,...)
## Check that it works
m[1,2]
# [1] "m" "f" "h" "y" "r" "x" "q" "k" "n"

Related

Vectorize an S4 class in R

I am having some trouble defining array-like classes in a way that they are fully typed (as far as that is possible in R).
My example: I want to define a class Avector, which should contain an arbitrary number of elements of the class A.
# Define the base class
setClass("A", representation(x = "numeric"))
# Some magic needed ????
setClass("Avector", ???)
# In the end one should be able to use it as follows:
a1 <- new("A", x = 1)
a2 <- new("A", x = 2)
X <- new("Avector", c(a1, a2))
I am aware that having a vector of objects is not possible in R. So I guess it will be stored in a kind of "typed" list.
I have found some solution, but I am not happy with it:
# Define the vectorized class
setClass(
"Avector",
representation(X = "list"),
validity = function(object) {
if (all(sapply(object@X, function(x) is(x, "A"))))
TRUE
else
"'Avector' must be a list of elements in the class 'A'"
}
)
# Define a method to subscript the elements inside of the vector
setMethod(
"[", signature(x = "Avector", i = "ANY", j = "ANY"),
function(x, i, j, ...) x@X[[i]]
)
# Test the class
a1 <- new("A", x = 1)
a2 <- new("A", x = 2)
avec <- new("Avector", X = list(a1, a2))
# Retrieve the element in index i
avec[i]
This method appears more like a hack to me. Is there a way to do this in a canonical way in R without doing this type checking and indexing method by hand?
Edit:
This should also hold if class A does not consist only of atomic slots, for example in the case that:
setClass("A", representation(x = "data.frame"))
I would be glad for help :)
Cheers,
Adrian
The answer depends somewhat on what you are trying to accomplish, and may or may not be possible in your use case. The way S4 is intended to work is that objects are supposed to be high-level to avoid excessive overheads.
Generally, it is necessary to have the slots be vectors. You can't define new atomic types from within R. So in your toy example instead of calling
avec <- new("Avector", X = list(a1, a2))
you call
avec <- new("A", x = c(1, 2))
This may necessitate other slots (which were previously vectors) becoming arrays, for example.
If you're desperate to have an atomic type, then you might be able to over-ride one of the existing types. I think the bit64 package does this, for example. Essentially what you do is make a new class that inherits from, say, numeric and then write lots of methods that supersede all the default ones for your new class.

How to collapse a multi-dimensional array of hashes in Ruby?

Background:
Hey all, I am experimenting with external APIs and am trying to pull in all of the followers of a User from a site and apply some sorting.
I have refactored a lot of the code; however, one part is giving me a really tough time. I am convinced there is an easier way to implement this than what I have included, and I would be really grateful for any tips on doing it in a much more elegant way.
My goal is simple. I want to collapse an array of arrays of hashes (I hope that is the correct way to explain it) into one array of hashes.
Problem Description:
I have an array named f_collections which has 5 elements. Each element is an array of size 200. Each sub-element of these arrays is a hash of about 10 key-value pairs. My best representation of this is as follows:
f_collections = [ collection1, collection2, ..., collection5 ]
collection1 = [ hash1, hash2, ..., hash200]
hash1 = { user_id: 1, user_name: "bob", ...}
I am trying to collapse this multi-dimensional array into one array of hashes. Since there are five collection arrays, this means the results array would have 1000 elements - all of which would be hashes.
followers = [hash1, hash2, ..., hash1000]
Code (i.e. my attempt which I do not want to keep):
I have gotten this to work with a very ugly piece of code (see below), with nested if statements, blocks, for loops, etc... This thing is a nightmare to read and I have tried my hardest to research ways to do this in a simpler way, I just cannot figure out how. I have tried flatten but it doesn't seem to work.
I am mostly just including this code to show I have tried very hard to solve this problem, and while yes I solved it, there must be a better way!
Note: I have simplified some variables to integers in the code below to make it more readable.
for n in 1..5 do
if n < 5
(0..199).each do |j|
if n == 1
nj = j
else
nj = (n - 1) * 200 + j
end
@followers[nj] = @f_collections[n-1].collection[j]
end
else
(0..199).each do |jj|
njj = (4) * 200 + jj
@followers[njj] = @f_collections[n-1].collection[jj]
end
end
end
Oh... so it is not an array of arrays of hashes, but an array of objects that hold collections of hashes. Kind of. Let's give it another try:
flat = f_collections.map do |col|
col.collection
end.flatten
which can be shortened (and is more performant) to:
flat = f_collections.flat_map do |col|
col.collection
end
This works because the items in the f_collections array are objects that have a collection attribute, which in turn is an array.
So it is "array of things that have an array that contains hashes"
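To make that shape concrete, here is a minimal sketch. The `page` struct and its sample data are hypothetical stand-ins for whatever objects the API client returns; the only assumption is that each one responds to `collection` with an array of hashes.

```ruby
# Hypothetical stand-in for the API response objects; only the
# `collection` reader matters for this example.
page = Struct.new(:collection)

f_collections = [
  page.new([{ user_id: 1, user_name: "bob" }, { user_id: 2, user_name: "amy" }]),
  page.new([{ user_id: 3, user_name: "cat" }])
]

# flat_map calls `collection` on each object and concatenates the results.
followers = f_collections.flat_map(&:collection)

# A plain flatten leaves the outer objects untouched, because they are
# not arrays themselves. This is likely why flatten "did not work".
untouched = f_collections.flatten
```

Here `followers` ends up as one array of three hashes, while `untouched` still holds the two wrapper objects.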
Old Answer follows below. I leave it here for documentation purpose. It was based on the assumption that the data structure is an array of array of hashes.
Just use #flatten (or #flatten! if you want this to be "inline")
flat = f_collections.flatten
Example
sub1 = [{a: 1}, {a: 2}]
sub2 = [{a: 3}, {a: 4}]
collection = [sub1, sub2]
flat = collection.flatten # returns a new collection
p flat #> [{:a=>1}, {:a=>2}, {:a=>3}, {:a=>4}]
# or use the "in-place"/"destructive" version
collection.flatten! # modifies existing collection
p collection #> [{:a=>1}, {:a=>2}, {:a=>3}, {:a=>4}]
Some recommendations for your existing code:
Do not use for n in 1..5, use Ruby-Style enumeration:
["some", "values"].each do |value|
puts value
end
This way you do not need to hardcode the length (5) of the array (I did not realize you had removed the variables that specify these magic numbers). If you want to detect the last iteration, you can use each_with_index:
a = ["some", "home", "rome"]
a.each_with_index do |value, index|
if index == a.length - 1
puts "Last value is #{value}"
else
puts "Values before last: #{value}"
end
end
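Applied to the loop in the question, the same idea can be sketched as follows. The sample `pages` array and `page_size` are made up for illustration (the question uses five pages of 200 hashes). The point is only that `n * page_size + j` already covers the first and last page, so neither needs a special case.

```ruby
# Illustrative data: three "pages" of two items each (assumed values).
page_size = 2
pages = [[:a, :b], [:c, :d], [:e, :f]]

followers = []
pages.each_with_index do |page, n|
  page.each_with_index do |item, j|
    # With a zero-based n, the first page gets offset 0 automatically.
    followers[n * page_size + j] = item
  end
end
```

The index arithmetic is kept only to mirror the original loop; appending with `<<` or calling `flatten` gives the same array without it.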
While #flatten will solve your problem, you might want to see what a DIY solution could look like:
def flatten_recursive(collection, target = [])
collection.each do |item|
if item.is_a?(Array)
flatten_recursive(item, target)
else
target << item
end
end
target
end
Or an iterative solution (that is limited to two levels):
def flatten_iterative(collection)
target = []
collection.each do |sub|
sub.each do |item|
target << item
end
end
target
end
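The two DIY versions only differ once the nesting goes deeper than two levels. As a rough comparison using built-ins (the `nested` sample data is made up for illustration), `flatten` without an argument behaves like the recursive version, and `flatten(1)` behaves like the two-level iterative one:

```ruby
# Illustrative input with one element nested three levels deep.
nested = [[{ a: 1 }, [{ a: 2 }]], [{ a: 3 }]]

# Recursive: removes every level of array nesting.
fully_flat = nested.flatten

# One level only: the inner [{ a: 2 }] array survives.
one_level = nested.flatten(1)

# flat_map with an identity block also removes exactly one level.
via_flat_map = nested.flat_map { |sub| sub }
```

For the array-of-arrays-of-hashes case described in the question, all three give the same result.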

Haskell Array Pattern in a function

Hi, total Haskell beginner here: what does the pattern in a function for an array look like? For example, I simply want to add 1 to the first element in my array:
> a = array (1,10) ((1,1) : [(i,( i * 2)) | i <- [2..10]])
My first thought was:
> arraytest :: Array (Int,Int) Int -> Array (Int,Int) Int
> arraytest (array (mn,mx) (a,b):xs) = (array (mn,mx) (a,b+1):xs)
I hope you understand my problem :)
You can't pattern match on arrays because the data declaration in the Data.Array.IArray module for the Array type doesn't have any of its data constructors exposed. This is a common practice in Haskell because it allows the author to update the internal representation of their data type without making a breaking change for users of their module.
The only way to use an Array, therefore, is to use the functions provided by the module. To access the first value in an array, you can use a combination of bounds and (!), or take the first key/value pair from assocs. Then you can use (//) to make an update to the array.
arraytest arr = arr // [(index, value + 1)]
where
index = fst (bounds arr)
value = arr ! index
If you choose to use assocs, you can pattern match on its result:
arraytest arr = arr // [(index, value + 1)]
where
(index, value) = head (assocs arr) -- `head` will crash if the array is empty
Or you can make use of the Functor instances for lists and tuples:
arraytest arr = arr // take 1 (fmap (fmap (+1)) (assocs arr))
You will probably quickly notice, though, that the array package is lacking a lot of convenience functions. All of the solutions above are fairly verbose compared to how the operation would be implemented in other languages.
To fix this, we have the lens package (and its cousins), which add a ton of convenience functions to Haskell and make packages like array much more bearable. This package has a fairly steep learning curve, but it's used very commonly and is definitely worth learning.
import Control.Lens
arraytest arr = arr & ix (fst (bounds arr)) +~ 1
If you squint your eyes, you can almost see how it says arr[0] += 1, but we still haven't sacrificed any of the benefits of immutability.
This is more like an extended comment to @4castle's answer. You cannot pattern match on an Array because its implementation is hidden; you must use its public API to work with it. However, you can use the public API to define such a pattern (with the appropriate language extensions):
{-# LANGUAGE PatternSynonyms, ViewPatterns #-}
-- PatternSynonyms: Define patterns without actually defining types
-- ViewPatterns: Construct patterns that apply functions as well as match subpatterns
import Control.Arrow((&&&)) -- solely to dodge an ugly lambda; inline if you wish
pattern Array :: Ix i => (i, i) -> [(i, e)] -> Array i e
-- the type signature hints that this is the array function but bidirectional
pattern Array bounds' assocs' <- ((bounds &&& assocs) -> (bounds', assocs'))
-- When matching against Array bounds' assocs', apply bounds &&& assocs to the
-- incoming array, and match the resulting tuple to (bounds', assocs')
where Array = array
-- Using Array in an expression is the same as just using array
arraytest (Array bs ((i,x):xs)) = Array bs ((i,x+1):xs)
I'm fairly sure that the conversions to and from [] make this absolutely abysmal for performance.

Optimizing function speed on 3D array

I am applying a user-defined function to individual cells of a 3D array. The contents of each cell are one of the following possibilities, all of which are character vectors because of prior formatting:
"N"
"A"
""
"1"
"0"
I want to create a new 3D array of the same dimensions, where cells contain either NA or a numeric vector containing 1 or 0. Thus, I wrote a function named Numericize and used aaply to apply it to the entire array. However, it takes forever to apply it.
Numericize <- function(x){
if(!is.na(x)){
x[x=="N"] <- NA; x
x[x=="A"] <- NA; x
x[x==""] <- NA; x
x <- as.integer(x)
}
return(x)
}
The dimensions of the original array are 480x866x366. The function takes forever to apply using the following code:
Final.Daily.Array <- aaply(.data = Complete.Daily.Array,
.margins = c(1,2,3),
.fun = Numericize,
.progress = "text")
I am unsure if the speed issue comes from an inefficient Numericize, an inefficient aaply, or something else entirely. I considered trying to set up parallel computing using the plyr package but I wouldn't think that such a simple command would require parallel processing.
On one hand I am concerned that I created a stack overflow for myself (see this for more), but I have applied other functions to similar arrays without problems.
ex.array <- array(dim = c(3,3,3))
ex.array[,,1] <- c("N","A","","1","0","N","A","","1")
ex.array[,,2] <- c("0","N","A","","1","0","N","A","")
ex.array[,,3] <- c("1","0","N","A","","1","0","N","A")
desired.array <- array(dim = c(3,3,3))
desired.array[,,1] <- c(NA,NA,NA,1,0,NA,NA,NA,1)
desired.array[,,2] <- c(0,NA,NA,NA,1,0,NA,NA,NA)
desired.array[,,3] <- c(1,0,NA,NA,NA,1,0,NA,NA)
ex.array
desired.array
Any suggestions?
You can just use a vectorized approach:
ex.array[ex.array %in% c("", "N", "A")] <- NA
storage.mode(ex.array) <- "integer"
Alternatively, you can use only the second line; every string that is not a number then becomes NA by coercion (R will emit a warning).

multi-dimensional list? List of lists? array of lists?

(I am definitely using the wrong terminology in this question, sorry for that - I just don't know the correct way to describe this in R terms...)
I want to create a structure of heterogeneous objects. The dimensions are not necessarily rectangular. What I need would probably be called just an "array of objects" in other languages like C. By 'object' I mean a structure consisting of different members, i.e. just a list in R - for example:
myObject <- list(title="Uninitialized title", xValues=rep(NA,50), yValues=rep(NA,50))
and now I would like to make 100 such objects, and to be able to address their members by something like
for (i in 1:100) {myObject[i]["xValues"]<-rnorm(50)}
or
for (i in 1:100) {myObject[i]$xValues<-rnorm(50)}
I would be grateful for any hint about where this thing is described.
Thanks in advance!
are you looking for the name of this mythical beast or just how to do it? :) i could be wrong, but i think you'd just call it a list of lists.. for example:
# create one list object
x <- list( a = 1:3 , b = c( T , F ) , d = mtcars )
# create a second list object
y <- list( a = c( 'hi', 'hello' ) , b = c( T , F ) , d = matrix( 1:4 , 2 , 2 ) )
# store both in a third object
z <- list( x , y )
# access x
z[[ 1 ]]
# access y
z[[ 2 ]]
# access x's 2nd object
z[[ 1 ]][[ 2 ]]
I did not realize that you wanted to create several objects with the same structure. You are looking for replicate.
my_fun <- function() {
list(x=rnorm(1), y=rnorm(1), z="bla")
}
replicate(2, my_fun(), simplify=FALSE)
# [[1]]
# [[1]]$x
# [1] 0.3561663
#
# [[1]]$y
# [1] 0.4795171
#
# [[1]]$z
# [1] "bla"
#
#
# [[2]]
# [[2]]$x
# [1] 0.3385942
#
# [[2]]$y
# [1] -2.465932
#
# [[2]]$z
# [1] "bla"
Here is an example of the solution I have for the moment; maybe it will be useful for somebody:
NUM <- 1000 # NUM is how many objects I want to have
xVal <- vector(NUM, mode="list")
yVal <- vector(NUM, mode="list")
title <- vector(NUM, mode="list")
for (i in 1:NUM) {
xVal[i]<-list(rnorm(50))
yVal[i]<-list(rnorm(50))
title[i]<-list(paste0("This is title for instance #", i))
}
myObject <- list(xValues=xVal, yValues=yVal, titles=title)
# now I can address any member, as needed:
print(myObject$titles[[3]])
print(myObject$xValues[[4]])
If the dimensions are always going to be rectangular (in your case, 100x50), and the contents are always going to be homogeneous (in your case, numeric) then create a 2D array/matrix.
If you want the ability to add/delete/insert on individual lists (or change the data type), then use a list-of-lists.
