How to add a continuous sequence (unique identifier) to an array?

I have an array, similar to this:
ar1<- array(rep(1, 91*5*4), dim=c(91, 5, 4))
I want to add an extra column at the end of each component (n = 4) that is sequential across all components (I'm not sure if component is the right word).
In this case it would be a sequence from 1 to 364.
The idea behind this is that if the rows are scrambled when I'm messing around with joining data or anything else I would be able to see it and rectify it.
How do I achieve this please?

Maybe the following is what you want.
It uses sapply to add an extra column to each slice taken along the 2nd dimension of the array, and then sets the final dimensions explicitly.
ar2 <- sapply(1:5, function(i){
  new <- seq_len(NROW(ar1[, i, ])) + (i - 1)*NROW(ar1[, i, ])
  cbind(ar1[, i, ], new)
})
dim(ar2) <- c(91, 5, 5)
The code above creates a new array; if you prefer, you can overwrite the original one instead.
To get the original array back, do the following.
n <- dim(ar2)[2]
ar1_back <- sapply(1:5, function(i){
  ar2[, -n, i]
})
dim(ar1_back) <- c(91, 5, 4)
identical(ar1, ar1_back)
#[1] TRUE

Related

Finding max sum with operation limit

As input I'm given an array of integers (all positive).
I'm also given a number of "actions". The goal is to find the maximum possible sum of array elements that can be reached with the given number of actions.
As an "action" I can either:
Add current element to sum
Move to the next element
We start at position 0 in the array. Each element can be added only once.
The limitations are:
2 < array.Length < 20
0 < number of "actions" < 20
It seems to me that these limitations are essentially unimportant. It's possible to enumerate every combination of "actions", but then the complexity would be about 2^"actions", which is bad.
Examples:
array = [1, 4, 2], 3 actions. Output should be 5. In this case we added the element at index 0, moved to index 1, and added the element at index 1.
array = [7, 8, 9], 2 actions. Output should be 8. In this case we moved to index 1, then added the element at index 1.
Could anyone please explain an algorithm to solve this problem, or at least the direction in which I should try to solve it?
Thanks in advance
Here is another DP solution using memoization. The idea is to represent the state by a pair of integers (current_index, actions_left) and map it to the maximum sum when starting from the current_index, assuming actions_left is the upper bound on actions we are allowed to take:
from functools import lru_cache
def best_sum(arr, num_actions):
    'get best sum from arr given a budget of actions limited to num_actions'
    @lru_cache(None)
    def dp(idx, num_actions_):
        '''return best sum starting at idx (inclusive)
        with number of actions = num_actions_ available'''
        # return zero if out of list elements or actions
        if idx >= len(arr) or num_actions_ <= 0:
            return 0
        # otherwise, decide if we should include current element or not
        return max(
            # if we include element at idx
            # we spend two actions: one to include the element and one to move
            # to the next element
            dp(idx + 1, num_actions_ - 2) + arr[idx],
            # if we do not include element at idx
            # we spend one action to move to the next element
            dp(idx + 1, num_actions_ - 1)
        )
    return dp(0, num_actions)
I am using Python 3.7.12.
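The same recurrence can also be filled in bottom-up instead of through recursion. The following is only a sketch under that assumption; the name best_sum_table and the table layout are illustrative and not part of the answer above.

```python
def best_sum_table(arr, num_actions):
    """Bottom-up version of the (index, actions_left) recurrence."""
    n = len(arr)
    # best[i][a] = best sum reachable from index i onward with a actions left;
    # row n (past the end) and column 0 (no actions) stay at 0
    best = [[0] * (num_actions + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for a in range(1, num_actions + 1):
            # add arr[i] (one action), then move on if an action remains
            take = arr[i] + best[i + 1][max(a - 2, 0)]
            # skip arr[i]: one action to move to the next element
            skip = best[i + 1][a - 1]
            best[i][a] = max(take, skip)
    return best[0][num_actions]


print(best_sum_table([1, 4, 2], 3))  # 5, as in the question
print(best_sum_table([7, 8, 9], 2))  # 8, as in the question
```

The table has (len(arr) + 1) * (num_actions + 1) cells, so for the stated limits (fewer than 20 elements and 20 actions) the work is a few hundred steps rather than 2^actions.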
array = [1, 1, 1, 1, 100]
actions = 5
In an example like the one above, you just have to keep moving right and finally pick up the 100. At the beginning of the array we never know which values we are going to see later, so a greedy approach cannot work.
You have two actions and you have to try both, because you don't know in advance which one to apply when.
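To make the point about greediness concrete, here is a small sketch that compares one possible greedy rule with an exhaustive search on the array above. Both function names and the particular greedy rule (always add the current element, then move) are assumptions made for illustration only.

```python
from itertools import product


def greedy_sum(arr, actions):
    """Hypothetical greedy rule: add the current element, then move on."""
    total, idx = 0, 0
    while actions > 0 and idx < len(arr):
        total += arr[idx]      # spend one action to add
        actions -= 1
        if actions > 0:
            idx += 1           # spend one action to move
            actions -= 1
    return total


def exhaustive_sum(arr, actions):
    """Try every add/move sequence; only feasible for tiny inputs."""
    best = 0
    for seq in product(("add", "move"), repeat=actions):
        total, idx, added = 0, 0, set()
        for step in seq:
            if idx >= len(arr):
                break          # walked off the end of the array
            if step == "move":
                idx += 1
            elif idx not in added:
                added.add(idx)  # each element may be added only once
                total += arr[idx]
        best = max(best, total)
    return best


array = [1, 1, 1, 1, 100]
print(greedy_sum(array, 5))      # 3
print(exhaustive_sum(array, 5))  # 100
```

The greedy rule spends its budget on the small leading values and never reaches the 100, while trying every sequence finds it.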
Below is some Python code. If you are not familiar with Python, treat it as pseudocode or convert it to the language of your choice. We recursively try both actions until we run out of actions or reach the end of the input array.
def getMaxSum(current_index, actions_left, current_sum):
    # max_sum is a module-level variable, so it must be declared global (not nonlocal)
    global max_sum
    if actions_left == 0 or current_index == len(array):
        max_sum = max(max_sum, current_sum)
        return
    if actions_left == 1:
        #Add current element to sum
        getMaxSum(current_index, actions_left - 1, current_sum + array[current_index])
    else:
        #Add current element to sum and Move to the next element
        getMaxSum(current_index + 1, actions_left - 2, current_sum + array[current_index])
        #Move to the next element
        getMaxSum(current_index + 1, actions_left - 1, current_sum)
array = [7, 8, 9]
actions = 2
max_sum = 0
getMaxSum(0, actions, 0)
print(max_sum)
You will notice that there are overlapping sub-problems here, and we can avoid the repeated computation by memoizing (caching) the results of those sub-problems. I leave that task to you as an exercise. In short, this is a dynamic programming problem.
Hope this helps. Post in the comments if anything is unclear.

How to collapse a multi-dimensional array of hashes in Ruby?

Background:
Hey all, I am experimenting with external APIs and am trying to pull in all of the followers of a User from a site and apply some sorting.
I have refactored a lot of the code. However, there is one part that is giving me a really tough time. I am convinced there is an easier way to implement this than what I have included, and I would be really grateful for any tips on doing this in a more elegant way.
My goal is simple. I want to collapse an array of arrays of hashes (I hope that is the correct way to explain it) into one array of hashes.
Problem Description:
I have an array named f_collections which has 5 elements. Each element is an array of size 200. Each sub-element of these arrays is a hash of about 10 key-value pairs. My best representation of this is as follows:
f_collections = [ collection1, collection2, ..., collection5 ]
collection1 = [ hash1, hash2, ..., hash200]
hash1 = { user_id: 1, user_name: "bob", ...}
I am trying to collapse this multi-dimensional array into one array of hashes. Since there are five collection arrays, this means the results array would have 1000 elements - all of which would be hashes.
followers = [hash1, hash2, ..., hash1000]
Code (i.e. my attempt which I do not want to keep):
I have gotten this to work with a very ugly piece of code (see below), with nested if statements, blocks, for loops, etc. It is a nightmare to read. I have tried my hardest to research a simpler way to do this, but I cannot figure out how. I have tried flatten, but it doesn't seem to work.
I am mostly just including this code to show I have tried very hard to solve this problem, and while yes I solved it, there must be a better way!
Note: I have simplified some variables to integers in the code below to make it more readable.
for n in 1..5 do
  if n < 5
    (0..199).each do |j|
      if n == 1
        nj = j
      else
        nj = (n - 1) * 200 + j
      end
      @followers[nj] = @f_collections[n-1].collection[j]
    end
  else
    (0..199).each do |jj|
      njj = (4) * 200 + jj
      @followers[njj] = @f_collections[n-1].collection[jj]
    end
  end
end
Oh... so it is an array of objects that each hold a collection of hashes. Kind of. Let's give it another try:
flat = f_collections.map do |col|
  col.collection
end.flatten
which can be shortened (and is more performant) to:
flat = f_collections.flat_map do |col|
  col.collection
end
This works because the items in the f_collections array are objects that have a collection attribute, which in turn is an array.
So it is an "array of things that have an array that contains hashes".
Old Answer follows below. I leave it here for documentation purpose. It was based on the assumption that the data structure is an array of array of hashes.
Just use #flatten (or #flatten! if you want this to be "inline")
flat = f_collections.flatten
Example
sub1 = [{a: 1}, {a: 2}]
sub2 = [{a: 3}, {a: 4}]
collection = [sub1, sub2]
flat = collection.flatten # returns a new collection
p flat #> [{:a=>1}, {:a=>2}, {:a=>3}, {:a=>4}]
# or use the "inplace"/"destructive" version
collection.flatten! # modifies existing collection
p collection #> [{:a=>1}, {:a=>2}, {:a=>3}, {:a=>4}]
Some recommendations for your existing code:
Do not use for n in 1..5; use Ruby-style enumeration:
["some", "values"].each do |value|
  puts value
end
This way you do not need to hardcode the length (5) of the array (I did not realize you had removed the variables that specify these magic numbers). If you want to detect the last iteration, you can use each_with_index:
a = ["some", "home", "rome"]
a.each_with_index do |value, index|
  if index == a.length - 1
    puts "Last value is #{value}"
  else
    puts "Values before last: #{value}"
  end
end
While #flatten will solve your problem, you might want to see what a DIY solution could look like:
def flatten_recursive(collection, target = [])
  collection.each do |item|
    if item.is_a?(Array)
      flatten_recursive(item, target)
    else
      target << item
    end
  end
  target
end
Or an iterative solution (that is limited to two levels):
def flatten_iterative(collection)
  target = []
  collection.each do |sub|
    sub.each do |item|
      target << item
    end
  end
  target
end

Is there a way to reshape an array that does not maintain the original size (or a convenient work-around)?

As a simplified example, suppose I have a dataset composed of 40 sorted values. The values of this example are all integers, though this is not necessarily the case for the actual dataset.
import numpy as np
data = np.linspace(1,40,40)
I am trying to find the maximum value inside the dataset for certain window sizes. The formula to compute the window sizes yields a pattern that is best executed with arrays (in my opinion). For simplicity's sake, let's say the indices denoting the window sizes are a list [1,2,3,4,5]; this corresponds to window sizes of [2,4,8,16,32] (the pattern is 2**index).
## this code looks long because I've provided docstrings
## just in case the explanation was unclear
def shapeshifter(num_col, my_array=data):
    """
    This function reshapes an array to have 'num_col' columns, where
    'num_col' corresponds to index.
    """
    return my_array.reshape(-1, num_col)
def looper(num_col, my_array=data):
    """
    This function calls 'shapeshifter' and returns a list of the
    MAXimum values of each row in 'my_array' for 'num_col' columns.
    The length of each row (or the number of columns per row if you
    prefer) denotes the size of each window.
    EX:
        num_col = 2
        ==> window_size = 2
        ==> check max( data[1], data[2] ),
                  max( data[3], data[4] ),
                  max( data[5], data[6] ),
                  .
                  .
                  .
                  max( data[39], data[40] )
        for k rows, where k = len(my_array)//num_col
    """
    # pass the argument through instead of always reshaping the global 'data'
    my_array = shapeshifter(num_col=num_col, my_array=my_array)
    rows = [my_array[index] for index in range(len(my_array))]
    res = []
    for index in range(len(rows)):
        res.append( max(rows[index]) )
    return res
So far, the code is fine. I checked it with the following:
check1 = looper(2)
check2 = looper(4)
print(check1)
>> [2.0, 4.0, ..., 38.0, 40.0]
print(len(check1))
>> 20
print(check2)
>> [4.0, 8.0, ..., 36.0, 40.0]
print(len(check2))
>> 10
So far so good. Now here is my problem.
def metalooper(col_ls, my_array=data):
    """
    This function calls 'looper' - which calls
    'shapeshifter' - for every 'col' in 'col_ls'.
    EX:
        j_list = [1,2,3,4,5]
        ==> col_ls = [2,4,8,16,32]
        ==> looper(2), looper(4),
            looper(8), ..., looper(32)
        ==> shapeshifter(2), shapeshifter(4),
            shapeshifter(8), ..., shapeshifter(32)
            such that looper(2^j) ==> shapeshifter(2^j)
            for j in j_list
    """
    res = []
    for col in col_ls:
        res.append(looper(num_col=col))
    return res
j_list = [2,4,8,16,32]
check3 = metalooper(j_list)
Running the code above provides this error:
ValueError: total size of new array must be unchanged
With 40 data points, the array can be reshaped into 2 columns of 20 rows, or 4 columns of 10 rows, or 8 columns of 5 rows, BUT at 16 columns, the array cannot be reshaped without clipping data since 40/16 ≠ integer. I believe this is the problem with my code, but I do not know how to fix it.
I am hoping there is a way to cut off the last values that do not fit into a full window. If this is not possible, I am hoping I can append zeroes to fill the missing entries so that the reshape succeeds, and then remove the zeroes afterwards. Or maybe even some complicated if - try - break block. What are some ways around this problem?
I think this will give you what you want in one step:
def windowFunc(a, window, f = np.max):
    return np.array([f(i) for i in np.split(a, range(window, a.size, window))])
With the default f, that will give you an array of maximums for your windows.
Generally, using np.split and range, this will let you split into a (possibly ragged) list of arrays:
def shapeshifter(num_col, my_array=data):
    return np.split(my_array, range(num_col, my_array.size, num_col))
You need a list of arrays because a 2D array can't be ragged (every row needs the same number of columns).
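The ragged result is easy to see with plain Python lists. This is only an illustrative sketch without NumPy; the helper name chunk is made up here and is not part of the answer.

```python
def chunk(values, window):
    """Split values into consecutive windows; the last one may be shorter."""
    return [values[i:i + window] for i in range(0, len(values), window)]


data_list = list(range(1, 41))   # the question's values 1..40 as plain ints
windows = chunk(data_list, 16)

print([len(w) for w in windows])  # [16, 16, 8]
print([max(w) for w in windows])  # [16, 32, 40]
```

With a window of 16 the 40 values fall into two full windows plus a shorter tail of 8, which is exactly why the result cannot be stored as a rectangular 2D array.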
If you really want to pad with zeros, you can use np.pad:
def shapeshifter(num_col, my_array=data):
    # -size % num_col is the number of zeros needed to reach the next multiple of num_col (0 if already a multiple)
    return np.pad(my_array, (0, -my_array.size % num_col), 'constant', constant_values = 0).reshape(-1, num_col)
Warning:
It is also technically possible to use, for example, a.resize(32,2), which will create an ndarray padded with zeros (as you requested). But there are some big caveats:
You would need to calculate the second axis yourself, because -1 tricks don't work with resize.
If the original array a is referenced by anything else, a.resize will fail with the following error:
ValueError: cannot resize an array that references or is referenced
by another array in this way. Use the resize function
The resize function (i.e. np.resize(a, new_shape)) is not equivalent to a.resize: instead of padding with zeros, it loops back to the beginning of the array and repeats its values.
Since you seem to want to reference a through a number of windows, a.resize isn't very useful. But it's a rabbit hole that's easy to fall into.
EDIT:
Looping through a list is slow. If your input is long and windows are small, the windowFunc above will bog down in the for loops. This should be more efficient:
def windowFunc2(a, window, f = np.max):
    tail = - (a.size % window)
    if tail == 0:
        return f(a.reshape(-1, window), axis = -1)
    else:
        body = a[:tail].reshape(-1, window)
        return np.r_[f(body, axis = -1), f(a[tail:])]
Here's a generalized way to reshape with truncation:
def reshape_and_truncate(arr, shape):
    desired_size_factor = np.prod([n for n in shape if n != -1])
    if -1 in shape: # implicit array size
        desired_size = arr.size // desired_size_factor * desired_size_factor
    else:
        desired_size = desired_size_factor
    return arr.flat[:desired_size].reshape(shape)
Your shapeshifter could use this in place of reshape.
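Applied to the window sizes from the question, truncation could look like the sketch below. It assumes you only care about full windows; window_max_truncated is a hypothetical helper name, and the leftover values at the end are simply dropped.

```python
import numpy as np


def window_max_truncated(arr, window):
    """Row-wise maximum over full windows only; the leftover tail is dropped."""
    usable = arr.size // window * window   # largest multiple of window that fits
    return arr[:usable].reshape(-1, window).max(axis=1)


data = np.linspace(1, 40, 40)
result = {w: window_max_truncated(data, w) for w in (2, 4, 8, 16, 32)}

print(result[16])  # [16. 32.]  (values 33..40 do not fill a window)
print(result[32])  # [32.]
```

Note that this sketch assumes the window is no larger than the array; a window bigger than arr.size would leave nothing to reshape.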

How can I eliminate the rows that match these if conditions?

function prealloc()
    situation=zeros(Int64,3^5,5);
    i=1;
    for north=0:2, south=0:2, east=0:2, west=0:2, current=0:2
        situation[i,:]=[north, south, east, west, current]
        if situation[i,:]=[2, 2, 2, 2, 2]
        elseif situation[i,:]=[2, 2, 2, 2, 1]
        elseif situation[i,:]=[2, 2, 2, 2, 0]
        end
        i+=1
    end
    situation
end
How can I eliminate the rows that match those if conditions from the array called situation?
First things first: the code in your question doesn't run (for several reasons). When posting code in questions, it is good form to put it in a "working example" form, where users can copy and paste it into their editor of choice and it will work without the user having to make educated guesses as to what you are actually trying to do. This is probably one reason the question has received down-votes.
With that out of the way, there are two approaches to accomplish what you are trying to do:
1) Construct your matrix without the indicated rows in the first step. Then you don't need to worry about "deleting the rows" later on. For situations as simple as the one in the question, you could just do something like this:
function prealloc()
    x = zeros(Int, 3^5 - 3, 5)
    i = 1
    for n=0:2, s=0:2, ea=0:2, w=0:2, cur=0:2
        if !([n, s, ea, w, cur] == [2, 2, 2, 2, 2] || [n, s, ea, w, cur] == [2, 2, 2, 2, 1] || [n, s, ea, w, cur] == [2, 2, 2, 2, 0])
            x[i, :] = [n, s, ea, w, cur]
            i += 1
        end
    end
    return(x)
end
Notice I'm using Int, not Int64. This will not affect performance, and it means your code will run on both 32-bit and 64-bit architectures.
Another style tip. Don't use semi-colons to end lines. This is a Matlab quirk, and it is not needed in Julia.
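The idea of approach 1 (never generate the unwanted rows in the first place) is not specific to Julia. As a rough sketch in Python, with the variable names excluded and rows chosen here purely for illustration:

```python
from itertools import product

# the three combinations that should not appear in the result
excluded = {(2, 2, 2, 2, 2), (2, 2, 2, 2, 1), (2, 2, 2, 2, 0)}

# product() varies the last position fastest, like the nested loop above
rows = [combo for combo in product(range(3), repeat=5) if combo not in excluded]

print(len(rows))   # 240, i.e. 3**5 - 3
print(rows[-1])    # (2, 2, 2, 1, 2)
```

Because the filter runs while the rows are being generated, there is no second pass and no reallocation afterwards.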
2) As other users have suggested, you could construct the entire matrix (including the undesirable rows), and then remove them at a later point. Of course, this necessitates re-allocating the entire matrix, and so is somewhat inefficient (note, you can remove elements of vectors in place, i.e. without re-allocation, but not any arrays of dimension 2 or greater). In this case, to encourage code re-use, it makes sense to break the routine down into three separate functions. First, we allocate the entire matrix:
function prealloc1()
    x = zeros(Int64,3^5,5)
    i = 1
    for north=0:2, south=0:2, east=0:2, west=0:2, current=0:2
        x[i,:]=[north, south, east, west, current]
        i += 1
    end
    return(x)
end
Next, we obtain a vector of indices that we wish to remove. We do this as its own step because we only want to re-allocate the matrix once, rather than re-allocating every time we find a new row we want to delete. For your situation, you could use a function like this:
function findCondition(x::Matrix{Int})
    inds = Int[]  # empty vector of Int
    for i = 1:size(x, 1)
        if x[i, :] == [2 2 2 2 2]
            push!(inds, i)
        elseif x[i, :] == [2 2 2 2 1]
            push!(inds, i)
        elseif x[i, :] == [2 2 2 2 0]
            push!(inds, i)
        end
    end
    return(inds)
end
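Notice that in my comparison statements in this function I use [2 2 2 2 2] instead of [2, 2, 2, 2, 2]. This is because the first construct is a 2-dimensional array (type Matrix) while the second is 1-dimensional (type Vector). Since x[i, :] is of type Matrix, the difference is important. (This holds for Julia 0.4 and earlier; from Julia 0.5 onward x[i, :] returns a Vector, so there you would compare against [2, 2, 2, 2, 2] instead.)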
Finally, we need to re-allocate the matrix without the offending rows. As user @Matt B. suggests, this can be done with the following one-liner function:
removeIndices(x::Matrix{Int}, inds::Vector{Int}) = x[setdiff(IntSet(1:size(x, 1)), IntSet(inds)), :]
Note, applying setdiff to IntSet here is fast because by construction inds will already be sorted in ascending order.
You cannot just delete a row in Julia; the only way to do it is to create a copy of the array without the row you want to delete. I think in-place row deletion is intentionally not implemented.
So you will have to do it manually. Something like this will create a copy of situation without row i (which is not the same as saying that it will delete row i):
situation = vcat(situation[1:i-1,:],situation[i+1:end,:])
Also, this will change the dimensions of situation in each iteration, so be careful with that.
In addition, your loop will end in a bounds error, since i will eventually run past the last row of the array. You could write something like this to end your loop:
if i == size(situation, 1)
    break
else
    i += 1
end
Ultimately, you can make a function delrow and call it from within your loop:
function delrow(array,row)
    return vcat(array[1:row-1,:],array[row+1:end,:])
end
Then call situation = delrow(situation,i).

lapply and rbind not properly appending the results

SimNo <- 10
for (i in 1:SimNo){
  z1<-rnorm(1000,0,1)
  z2<-rnorm(1000,0,1)
  z3<-rnorm(1000,0,1)
  z4<-rnorm(1000,0,1)
  z5<-rnorm(1000,0,1)
  z6<-rnorm(1000,0,1)
  X<-cbind(z1,z2,z3,z4,z5,z6)
  sx<-scale(X)/sqrt(999)
  det1<-det(t(sx)%*%sx)
  detans<-do.call(rbind,lapply(1:SimNo, function(x) ifelse(det1<1,det1,0)))
}
When I run the commands inside the loop one at a time (all except the last one), I get different values of the determinant, but when I run the whole loop at once, the last value of the determinant is repeated for every entry.
Please help me understand how to handle situations like this.
Is there a shorter and more efficient way to write this code, such that each individual variable can still be accessed?
Whenever you are repeating the same operation multiple times, and without inputs, think about using replicate. Here you can use it twice:
SimNo <- 10
det1 <- replicate(SimNo, {
  X <- replicate(6, rnorm(1000, 0, 1))
  sx <- scale(X) / sqrt(999)
  det(t(sx) %*% sx)
})
detans <- ifelse(det1 < 1, det1, 0)
Otherwise, this is what your code should have looked like with your for loop. You need to create a vector to store the output of each loop iteration:
SimNo <- 10
detans <- numeric(SimNo)
for (i in 1:SimNo) {
  z1<-rnorm(1000,0,1)
  z2<-rnorm(1000,0,1)
  z3<-rnorm(1000,0,1)
  z4<-rnorm(1000,0,1)
  z5<-rnorm(1000,0,1)
  z6<-rnorm(1000,0,1)
  X<-cbind(z1,z2,z3,z4,z5,z6)
  sx<-scale(X)/sqrt(999)
  det1<-det(t(sx)%*%sx)
  detans[i] <- ifelse(det1<1,det1,0)
}
Edit: you asked in the comments how to access X using replicate. You would have to make replicate create and store all your X matrices in a list. Then use the *apply family of functions to loop through that list and finish the computations:
X <- replicate(SimNo, replicate(6, rnorm(1000, 0, 1)), simplify = FALSE)
det1 <- sapply(X, function(x) {
  sx <- scale(x) / sqrt(999)
  det(t(sx) %*% sx)
})
detans <- ifelse(det1 < 1, det1, 0)
Here, X is now a list of matrices, so you can get e.g. the matrix for the second simulation by doing X[[2]].
SimNo <- 10
matdet <- matrix(data=NA, nrow=SimNo, ncol=1, byrow=TRUE)
for (i in 1:SimNo){
  z1<-rnorm(1000,0,1)
  z2<-rnorm(1000,0,1)
  z3<-rnorm(1000,0,1)
  z4<-rnorm(1000,0,1)
  z5<-rnorm(1000,0,1)
  z6<-rnorm(1000,0,1)
  X<-cbind(z1,z2,z3,z4,z5,z6)
  sx<-scale(X)/sqrt(999)
  det1<-det(t(sx)%*%sx)
  # store one value per iteration; the inner lapply over 1:SimNo only repeated det1 SimNo times
  matdet[i] <- ifelse(det1<1,det1,0)
}
matdet
