Creating a list of many ndarrays (different sizes) in Python

I am new to Python. Is there a structure in Python 2.7 similar to MATLAB's multidimensional structure arrays that can hold many ndarrays in a list? For instance, I have 15 of these layers (i.e. layer_X, X=[1,15]) with different sizes, but all of them are 4D:
>>> type(layer_1)
<type 'numpy.ndarray'>
>>> np.shape(layer_1)
(1, 1, 32, 64)
>>> np.shape(layer_12)
(1, 1, 512, 1024)
How do I create a structure that holds these ndarrays together with their position X?

You can use a dictionary:
import numpy as np
layer_dict = {}
for X in range(1, 16):
    # placeholder shape: use each layer's actual shape here
    layer_dict['layer_' + str(X)] = np.zeros((1, 1, 32, 64))
This lets you store arrays of various sizes (and, to be precise, values of any other type), and add and remove entries. Looking up an array by its key is also efficient (constant time on average).
To add a layer, type:
layer_dict['layer_16'] = np.zeros((1, 1, 512, 1024))
To delete one:
del layer_dict['layer_3']
Note that in Python 2.7 the items are not stored in order, but that does not prevent efficient in-order processing with an approach similar to the initial construction loop. If you want an ordered dictionary, you can use OrderedDict from the collections module. (Since Python 3.7, plain dicts also preserve insertion order.)
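A minimal sketch of the OrderedDict variant. The shapes below are made up to stand in for your real layers. Keys come back in insertion order, so iterating visits layer_1 through layer_15 in sequence.

```python
from collections import OrderedDict

import numpy as np

# Hypothetical shapes standing in for your real layer_1 ... layer_15.
layers = OrderedDict()
for X in range(1, 16):
    layers['layer_' + str(X)] = np.zeros((1, 1, 2 * X, 4 * X))

# Iteration follows insertion order: layer_1, layer_2, ..., layer_15.
for name, arr in layers.items():
    print(name + ' ' + str(arr.shape))

# Lookup by position X still goes through the key.
print(layers['layer_12'].shape)  # (1, 1, 24, 48)
```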
If there is any particular rule for choosing the size of each layer, update your question and I will edit my answer.
This is an example of sequential usage:
for X in range(1, 16):
    temp = layer_dict['layer_' + str(X)]
    print(type(temp))
temp is an ndarray, which you can use like any other ndarray.
A more detailed usage example:
for X in range(1, 16):
    temp = layer_dict['layer_' + str(X)]
    temp[0, 0, 2, 0] = 1
    layer_dict['layer_' + str(X)] = temp
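Here each layer is fetched into temp, modified, and then reassigned to layer_dict. The reassignment is not strictly needed, because temp refers to the same array object that is stored in the dictionary.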
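Here is a small self-contained check, using three hypothetical layers with made-up shapes. It shows that writing through temp modifies the array stored in layer_dict even without reassigning it.

```python
import numpy as np

# Three hypothetical layers with different 4D shapes.
layer_dict = {}
for X in range(1, 4):
    layer_dict['layer_' + str(X)] = np.zeros((1, 1, 4 * X, 8 * X))

for X in range(1, 4):
    temp = layer_dict['layer_' + str(X)]  # a reference to the stored array, not a copy
    temp[0, 0, 2, 0] = 1                  # in-place write, visible through layer_dict

print(temp is layer_dict['layer_3'])      # True
```

To modify a copy and keep the stored layer unchanged, fetch it with layer_dict['layer_' + str(X)].copy() instead. Only then is the reassignment needed to store the result.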

You can just use a list:
layers = [layer_1, layer_12]
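If the layers are always numbered consecutively from 1, a list is enough. Here is a minimal sketch with made-up shapes standing in for layer_1 ... layer_15, showing the 0-based offset to keep in mind.

```python
import numpy as np

# Hypothetical layers with different 4D shapes (stand-ins for layer_1 ... layer_15).
layers = [np.zeros((1, 1, 2 * X, 4 * X)) for X in range(1, 16)]

X = 12
layer = layers[X - 1]   # lists are 0-based, so layer_X lives at index X - 1
print(layer.shape)      # (1, 1, 24, 48)

layers.append(np.zeros((1, 1, 512, 1024)))  # add a 16th layer of another size
```

Unlike the dictionary, deleting an element from the list shifts the positions of all later layers. If layers can be removed, the dict keeps X stable.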

Related

Most computationally efficient way to batch alter values in each array of a 2d array, based on conditions for particular values by indices

Say that I have a batch of arrays, and I would like to alter them based on conditions of particular values located by indices.
For example, say that I would like to decrease one value and increase its neighbor if the difference between those two values is less than two.
For a single 1D array it can be done like this:
import numpy as np
single2 = np.array([8, 8, 9, 10])
if abs(single2[1] - single2[2]) < 2:
    single2[1] = single2[1] - 1
    single2[2] = single2[2] + 1
single2
array([ 8, 7, 10, 10])
But I do not know how to do it for batch of arrays. This is my initial attempt
import numpy as np
single1 = np.array([6, 0, 3, 7])
single2 = np.array([8, 8, 9, 10])
single3 = np.array([2, 15, 15, 20])
batch = np.array([
np.copy(single1),
np.copy(single2),
np.copy(single3),
])
if abs(batch[:, 1] - batch[:, 2]) < 2:
    batch[:, 1] = batch[:, 1] - 1
    batch[:, 2] = batch[:, 2] + 1
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Looking at np.any and np.all, they reduce an array of boolean values to a single boolean (or to one per row or column with the axis argument), and I am not sure how they could be used in the code snippet above.
My second attempt uses np.where, using the method described here for comparing particular values of a batch of arrays by creating new versions of the arrays with values added to the front/back of the arrays.
https://stackoverflow.com/a/71297663/3259896
In the case of the example, I am comparing values that are right next to each other, so I created copies that shift the arrays forwards and backwards by 1. I also use only the particular slice of the array that I am comparing, since the other numbers would also be used in the comparison in np.where.
batch_ap = np.concatenate(
    (batch[:, 1:2+1], np.repeat(-999, 3).reshape(3, 1)),
    axis=1
)
batch_pr = np.concatenate(
    (np.repeat(-999, 3).reshape(3, 1), batch[:, 1:2+1]),
    axis=1
)
Finally, I do the comparisons, and adjust the values
batch[:, 1:2+1] = np.where(
    abs(batch_ap[:, 1:] - batch_ap[:, :-1]) < 2,
    batch[:, 1:2+1] - 1,
    batch[:, 1:2+1]
)
batch[:, 1:2+1] = np.where(
    abs(batch_pr[:, 1:] - batch_pr[:, :-1]) < 2,
    batch[:, 1:2+1] + 1,
    batch[:, 1:2+1]
)
print(batch)
[[ 6 0 3 7]
[ 8 7 10 10]
[ 2 14 16 20]]
However, I am not sure whether this is the most computationally efficient or the most elegant method for this task. It seems like a lot of operations and code, but I do not have a strong enough mastery of numpy to be certain.
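This works. Build a boolean mask with one entry per row, then use it to update only the rows that satisfy the condition: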
mask = abs(batch[:,1]-batch[:,2])<2
batch[mask,1] -= 1
batch[mask,2] += 1
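Here is the same boolean-mask idea wrapped in a small helper, so the column pair and the threshold become parameters. adjust_close_pairs is a hypothetical name, not a NumPy function. On the example batch it gives the same result as the np.where version in the question.

```python
import numpy as np

def adjust_close_pairs(batch, i, j, threshold=2):
    """In place: for every row where |batch[row, i] - batch[row, j]| < threshold,
    decrement column i and increment column j."""
    mask = np.abs(batch[:, i] - batch[:, j]) < threshold  # one boolean per row
    batch[mask, i] -= 1
    batch[mask, j] += 1
    return batch

batch = np.array([
    [6, 0, 3, 7],
    [8, 8, 9, 10],
    [2, 15, 15, 20],
])
adjust_close_pairs(batch, 1, 2)
print(batch)
```

The mask is computed once, before either column is modified. As a result, the increment on column j is applied to exactly the same rows as the decrement on column i, which matches the single-array if-statement.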

Scala - Efficient element wise sum of two arrays

I have two arrays which I would like to reduce to one array in which at each index you have the sum of the two elements in the original arrays. For example:
val arr1: Array[Int] = Array(1, 1, 3, 3, 5)
val arr2: Array[Int] = Array(2, 1, 2, 2, 1)
val arr3: Array[Int] = sum(arr1, arr2)
// This should result in:
// arr3 = Array(3, 2, 5, 5, 6)
I've seen this post: Element-wise sum of arrays in Scala, and I currently use that approach (zip/map). However, since this is for a big data application, I am concerned about its performance: with this approach the array(s) have to be traversed at least twice. Is there a better approach in terms of efficiency?
The most efficient way might well be to do it lazily.
As with anything collection-oriented, Scala 2.12 and 2.13 differ here. This code is for Scala 2.13; a 2.12 version will be similar (it might need to extend IndexedSeqLike, but I don't know for sure).
import scala.collection.IndexedSeq
import scala.math.Numeric.Implicits._ // enables `+` on any T that has a Numeric instance

case class SumIndexedSeq[+T: Numeric](seq1: IndexedSeq[T], seq2: IndexedSeq[T]) extends IndexedSeq[T] {
  override val length: Int = seq1.length.min(seq2.length)
  override def apply(i: Int): T =
    if (i >= length) throw new IndexOutOfBoundsException(i.toString)
    else seq1(i) + seq2(i)
}
Arrays are implicitly convertible to a subtype of collection.IndexedSeq, so arr1 and arr2 can be passed in directly. SumIndexedSeq computes the sum of the corresponding elements on every access. This can be desirable: if the underlying sequences are mutable, the sums always reflect their current contents.
If you need an Array, you can get one with only a single traversal via
val arr3: Array[Int] = SumIndexedSeq(arr1, arr2).toArray
but SumIndexedSeq can be used anywhere a Seq is expected, without an up-front traversal.
As a further optimization, especially if you're sure that the underlying collections/arrays won't mutate, you can add a cache so you don't add the same elements together twice. It can also be generalized, if you care to, to any binary operation on T (in which case the Numeric constraint can be removed).
As Luis noted, for a performance question: experiment and benchmark. It's worth keeping in mind that a cache implementation may well entail boxing every element to put in the cache, so you might need to be accessing the same elements many times in order for the cache to be a win (and a sufficiently large cache may have implications for the stability of a distributed system).
Well, first of all, as with all things related to performance the only answer is to benchmark.
Second, are you sure you need plain mutable, invariant, weird Arrays? Can't you use something like Vector or ArraySeq?
Third, you can just do something like this, or use a while loop, which would be equivalent:
import scala.collection.immutable.ArraySeq
val result = ArraySeq.tabulate(math.min(arr1.length, arr2.length)) { i =>
  arr1(i) + arr2(i)
}

How to convert two associated arrays so that elements are evenly distributed?

There are two arrays: an array of images and an array of the corresponding labels (e.g. pictures of figures and their values).
The occurrences in the labels are unevenly distributed.
What I want is to cut both arrays in such a way, that the labels are evenly distributed. E.g. every label occurs 2 times.
To test it, I created two 1D arrays, and it worked:
labels = np.array([1, 2, 3, 3, 1, 2, 1, 3, 1, 3, 1,])
images = np.array(['A','B','C','C','A','B','A','C','A','C','A',])
x, y = zip(*sorted(zip(images, labels)))
label = list(set(y))
new_images = []
new_labels = []
amount = 2
for i in label:
    start = y.index(i)
    stop = start + amount
    new_images = np.append(new_images, x[start: stop])
    new_labels = np.append(new_labels, y[start: stop])
What I get/want is this:
new_labels: [ 1. 1. 2. 2. 3. 3.]
new_images: ['A' 'A' 'B' 'B' 'C' 'C']
(It is not necessary, that the arrays are sorted)
But when I tried it with the real data (images.shape = (35000, 32, 32, 3), labels.shape = (35000,)), I got an error:
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
The existing question with this same error message, "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()", does not help me much.
I think that my solution is quite dirty anyhow. Is there a way to do it right?
Thank you very much in advance!
Sorting the zipped tuples is what fails. Python compares tuples element by element, starting with the first element, which in your code is the image. With your 1D test data the images are strings, so the comparison works. With your real data, however, each image is an array, and comparing two arrays produces an array of booleans instead of a single True or False. Python raises this error when it tries to use that array as a truth value.
Let me explain it in a bit more detail:
x, y = zip(*sorted(zip(images, labels)))
First, you zip your images and labels. This creates tuples of corresponding elements of images and labels: the first element of images with the first element of labels, and so on.
With your real data, each label is paired with an array of shape (32, 32, 3).
Second, you sort all those tuples. sorted compares the first elements of the tuples (the images) and only falls back to the second elements (the labels) when the first ones are equal. Since the images are arrays, they cannot be compared this way, so it throws an error.
You can solve this by explicitly telling sorted to sort only on the label, which is the second tuple element:
x, y = zip(*sorted(zip(images, labels), key=lambda pair: pair[1]))
If performance matters, using itemgetter is typically a little faster:
from operator import itemgetter
x, y = zip(*sorted(zip(images, labels), key=itemgetter(1)))
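Another option, different from the sorting approach above, is to skip sorting entirely and select indices per label with NumPy. This is a sketch: take_n_per_label is a hypothetical helper name, and it keeps the first occurrences of each label rather than a random subset. Because it only indexes with integer arrays, it behaves the same for the 1D test data and for images of shape (35000, 32, 32, 3).

```python
import numpy as np

def take_n_per_label(images, labels, amount):
    """Return (images, labels) keeping the first `amount` samples of each label.

    Labels that occur fewer than `amount` times contribute all of their samples.
    """
    idx = np.concatenate([np.flatnonzero(labels == lab)[:amount]
                          for lab in np.unique(labels)])
    return images[idx], labels[idx]

labels = np.array([1, 2, 3, 3, 1, 2, 1, 3, 1, 3, 1])
images = np.array(['A', 'B', 'C', 'C', 'A', 'B', 'A', 'C', 'A', 'C', 'A'])
new_images, new_labels = take_n_per_label(images, labels, 2)
print(new_labels)  # [1 1 2 2 3 3]
print(new_images)  # ['A' 'A' 'B' 'B' 'C' 'C']
```

To take a random subset per label instead of the first occurrences, shuffle the index array for each label (for example with np.random.permutation) before slicing it.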

Adding value to arrays in scala

I faced a problem where I needed to add a new value in the middle of an Array (i.e. make a copy of the original array that includes the new value and use it in place of the original). I solved it, but I was wondering whether there are other ways to do this without temporarily converting the array to a buffer.
val original = Array(0, 1, 3, 4)
val parts = original.splitAt(2)
val modified = parts._1 ++ (2 +: parts._2)
res0: Array[Int] = Array(0, 1, 2, 3, 4)
What I don't like about my solution is the parts variable; I'd prefer not to use an intermediate step like that. Is this the easiest way to add the value, or are there better ways to add an element?
This is precisely what patch does:
val original = Array(0, 1, 3, 4)
original.patch(2, Array(2), 0) // Array[Int] = Array(0, 1, 2, 3, 4)
You can use a mutable collection, such as ArrayBuffer, to do this. Its insert method does what you want (it inserts an element at a given index).
Well, if indeed the extra variable is what's troubling you, you can do it in one go:
val modified = original.take(2) ++ (2 +: original.drop(2))
But using a mutable collection like Augusto suggested might fit better, depending on your use case (e.g. performance, array size, multiple such edits...).
The question is, what's the context? If you are doing this in a loop, allocating a new array every time will kill your performance anyway, and you should rethink your approach (e.g. collect all the elements you want to insert before inserting them).
If you aren't, well, you can use System.arraycopy to avoid any intermediate conversions:
val original = Array(0, 1, 3, 4)
val index = 2
val valueToInsert = 2
val modified = Array.ofDim[Int](original.length + 1)
System.arraycopy(original, 0, modified, 0, index)
modified(index) = valueToInsert
System.arraycopy(original, index, modified, index + 1, original.length - index)
But note how easy it is to make an off-by-one error here (I think there isn't one, but I haven't tested it). So the only reason to do this is if you really need high performance, and that's only likely if it happens in a loop, in which case go back to the second sentence.

How do I algorithmically instantiate and manipulate a multidimensional array in Scala

I am trying to write a program to manage a database through a Scala GUI, and have been running into a lot of trouble formatting my data so that I can put it into a Table and have the column headers populate. To do this, I have been told I would need to use an Array[Array[Any]] instead of the ArrayBuffer[ArrayBuffer[String]] I have been using.
My problem is that the way I am trying to fill these arrays is modular: I am trying to use the same function to draw from different tables in a MySQL database, each of which has a different number of columns and entries.
I have been able to (I think) define a 2-D array with
val Data = new Array[Array[String]](numColumns)(numRows)
but I haven't found any ways of editing individual cells in this new array.
Data(i)(j)=Value //or
Data(i,j)=Value
do not work, and give me errors about "Update" functionality
I am sure this can't possibly be as complicated as I have been making it, so what is the easy way of managing these things in this language?
You don't need to read your data into an Array of Arrays. You just need to convert it to that format when you feed it to the Table constructor, which is easy, as demonstrated in my answer to your other question: How do I configure the Column names in a Scala Table?
If you're creating a 2D array, the idiom you want is
val data = Array.ofDim[String](numColumns, numRows)
(There is also new Array[String](numColumns, numRows), but that's deprecated.)
You read element (i, j) of data with data(i)(j) and write it with data(i)(j) = value (remember that indices start from 0).
But in general you should avoid mutable collections (like Array, ArrayBuffer) unless there's a good reason. Try Vector instead.
Without knowing the format in which you're retrieving data from the database it's not possible to say how to put it into a collection.
Update:
You can alternatively put the type information on the left hand side, so the following are equivalent (decide for yourself which you prefer):
val a: Array[Array[String]] = Array.ofDim(2,2)
val a = Array.ofDim[String](2,2)
To explain the syntax for accessing and updating elements: as in Java, a multi-dimensional array is just an array of arrays. So here, a(i) is element i of a, which is an Array[String], and so a(i)(j) is element j of that array, which is a String.
Luigi's answer is great, but I'd like to shed some light on why your code isn't working.
val Data = new Array[Array[String]](numColumns)(numRows)
does not do what you expect it to do. The new Array[Array[String]](numColumns) part creates an array of arrays of strings with numColumns entries, all of which (the inner arrays) are null, and returns it. The following (numRows) then just calls the apply method on that returned array. That returns the entry at index numRows, which is null (or throws an ArrayIndexOutOfBoundsException if numRows >= numColumns).
You can try that out in the scala REPL: When you input
new Array[Array[String]](10)(9)
you get this as output:
res0: Array[String] = null
Luigi's solution, on the other hand,
Array.ofDim[String](2,2)
does the right thing:
res1: Array[Array[String]] = Array(Array(null, null), Array(null, null))
It's rather ugly, but you can update a multidimensional array with update
> val data = Array.ofDim[String](2,2)
data: Array[Array[String]] = Array(Array(null, null), Array(null, null))
> data(0).update(0, "foo")
> data
data: Array[Array[String]] = Array(Array(foo, null), Array(null, null))
Not sure about the efficiency of this technique.
Luigi's answer is great, but I just wanted to point out another way of initialising an Array that is more idiomatic/functional – using tabulate. This takes a function that takes the array cell coordinates as input and produces the cell value:
scala> Array.tabulate[String](4, 4) _
res0: (Int, Int) => String => Array[Array[String]] = <function1>
scala> val data = Array.tabulate(4, 4) {case (x, y) => x * y }
data: Array[Array[Int]] = Array(Array(0, 0, 0, 0), Array(0, 1, 2, 3), Array(0, 2, 4, 6), Array(0, 3, 6, 9))
