I am working on a multilabel classification problem with Keras.
When I execute the code below, I get the following error:
ValueError: Error when checking target: expected activation_19 to have 2 dimensions, but got array with shape (32, 6, 6)
This is because of my lists full of "0" and "1" in the labels dictionary, which don't fit keras.utils.to_categorical in the return statement, as I learned recently. Softmax can't handle more than one "1" either.
I guess I first need a LabelEncoder and afterwards one-hot encoding for the labels, to avoid multiple "1"s in the labels, which don't go together with softmax.
I hope someone can give me a hint on how to preprocess or transform the labels data to get the code fixed. I would appreciate it a lot.
Even a code snippet would be awesome.
The CSV looks like this:
Filename label1 label2 label3 label4 ... ID
abc1.jpg 1 0 0 1 ... id-1
def2.jpg 0 1 0 1 ... id-2
ghi3.jpg 0 0 0 1 ... id-3
...
import numpy as np
import keras
from keras.layers import *
from keras.models import Sequential

class DataGenerator(keras.utils.Sequence):
    'Generates data for Keras'
    def __init__(self, list_IDs, labels, batch_size=32, dim=(224,224), n_channels=3,
                 n_classes=21, shuffle=True):
        'Initialization'
        self.dim = dim
        self.batch_size = batch_size
        self.labels = labels
        self.list_IDs = list_IDs
        self.n_channels = n_channels
        self.n_classes = n_classes
        self.shuffle = shuffle
        self.on_epoch_end()

    def __len__(self):
        'Denotes the number of batches per epoch'
        return int(np.floor(len(self.list_IDs) / self.batch_size))

    def __getitem__(self, index):
        'Generate one batch of data'
        # Generate indexes of the batch
        indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
        # Find list of IDs
        list_IDs_temp = [self.list_IDs[k] for k in indexes]
        # Generate data
        X, y = self.__data_generation(list_IDs_temp)
        return X, y

    def on_epoch_end(self):
        'Updates indexes after each epoch'
        self.indexes = np.arange(len(self.list_IDs))
        if self.shuffle == True:
            np.random.shuffle(self.indexes)

    def __data_generation(self, list_IDs_temp):
        'Generates data containing batch_size samples'  # X : (n_samples, *dim, n_channels)
        # Initialization
        X = np.empty((self.batch_size, *self.dim, self.n_channels))
        y = np.empty((self.batch_size, self.n_classes), dtype=int)
        # Generate data
        for i, ID in enumerate(list_IDs_temp):
            # Store sample
            X[i,] = np.load('Folder with npy files/' + ID + '.npy')
            # Store class
            y[i] = self.labels[ID]
        return X, keras.utils.to_categorical(y, num_classes=self.n_classes)
-----------------------
# Parameters
params = {'dim': (224, 224),
          'batch_size': 32,
          'n_classes': 21,
          'n_channels': 3,
          'shuffle': True}
# Datasets
partition = partition
labels = labels
# Generators
training_generator = DataGenerator(partition['train'], labels, **params)
validation_generator = DataGenerator(partition['validation'], labels, **params)
# Design model
model = Sequential()
model.add(Conv2D(32, (3,3), input_shape=(224, 224, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
...
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dense(21))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
# Train model on dataset
model.fit_generator(generator=training_generator,
                    validation_data=validation_generator)
Since you already have the labels as a vector of 21 elements of 0 and 1, you shouldn't use keras.utils.to_categorical in the function __data_generation(self, list_IDs_temp). Just return X and y.
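A minimal sketch of that change inside __data_generation (assuming each entry of self.labels is already the 21-element 0/1 vector; not a tested drop-in):

    def __data_generation(self, list_IDs_temp):
        'Generates multi-hot labels directly; no to_categorical needed'
        X = np.empty((self.batch_size, *self.dim, self.n_channels))
        y = np.empty((self.batch_size, self.n_classes), dtype=int)
        for i, ID in enumerate(list_IDs_temp):
            X[i,] = np.load('Folder with npy files/' + ID + '.npy')
            y[i] = self.labels[ID]   # each value is already a 21-element 0/1 vector
        return X, y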
OK, I have a solution, but I'm not sure it's the best one:
from sklearn import preprocessing  # for LabelEncoder
from keras.utils import to_categorical

labels_list = [x[1] for x in labels.items()]  # get the list of all sequences

def convert(seq):
    return int("".join(map(str, seq)))

label_int = [convert(i) for i in labels_list]  # convert each sequence to an int
print(label_int)  # e.g. [1, 2, 3] becomes 123

le = preprocessing.LabelEncoder()
le.fit(label_int)
labels = le.classes_  # keep only the unique ints
print(labels)

d = dict([(y, x) for x, y in enumerate(labels)])  # map each unique sequence to a label like 0, 1, 2, 3 ...
print(d)

labels_encoded = [d[i] for i in label_int]  # encode every sequence with the label obtained
print(labels_encoded)

labels_encoded = to_categorical(labels_encoded)  # one-hot encode with to_categorical
print(labels_encoded)
This is not really clean, I think, but it's working.
You need to change your last Dense layer to have a number of neurons equal to the length of the labels_encoded sequences.
For the predictions, you will have the dict "d" that maps the predicted value back to your original sequence style.
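For example, turning a single prediction back into the original 0/1 sequence might look roughly like this (a sketch; a trained `model` and a preprocessed input batch `x` are assumed, and zfill restores the leading zeros dropped by the int conversion):

    inv_d = {v: k for k, v in d.items()}              # class index -> integer-encoded sequence
    pred_class = int(np.argmax(model.predict(x)[0]))  # index of the most probable class
    original_int = inv_d[pred_class]                  # e.g. 100100001100000001011
    original_seq = [int(c) for c in str(original_int).zfill(21)]  # back to the 21-element 0/1 list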
Tell me if you need clarification!
For a few test sequences, it gives you this:
labels = {'id-0': [1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1],
'id-1': [0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
'id-2': [0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1],
'id-3': [1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1],
'id-4': [0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]}
[100100001100000001011, 10100001100000000001, 100001100010000001, 100100001100000001011, 10100001100000000001]
[100001100010000001 10100001100000000001 100100001100000001011]
{100001100010000001: 0, 10100001100000000001: 1, 100100001100000001011: 2}
[2, 1, 0, 2, 1]
[[0. 0. 1.]
[0. 1. 0.]
[1. 0. 0.]
[0. 0. 1.]
[0. 1. 0.]]
EDIT after clarification:
OK, I read a little more about the subject. Once more, the problem with softmax is that it will try to maximize one class while minimizing the others.
So I would suggest keeping your arrays of 21 ones and zeros, but instead of using softmax, use sigmoid (to predict a probability between 0 and 1 for each class) with binary_crossentropy.
And use a threshold for your predictions:
preds = model.predict(X_test)
preds[preds>=0.5] = 1
preds[preds<0.5] = 0
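For concreteness, the model-side change would look roughly like this (a sketch based on the model from the question, not a tested drop-in):

    model.add(Dense(21))
    model.add(Activation('sigmoid'))            # one independent probability per label
    model.compile(loss='binary_crossentropy',   # per-label binary loss instead of categorical
                  optimizer='rmsprop',
                  metrics=['accuracy'])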
Keep me posted on the results!
I am working on a Keras multilabel problem. In order to work with a large amount of data and avoid memory issues, I implemented a custom data generator.
So far I work with a CSV file with IDs, filenames and their corresponding labels (21 in total), which looks like this:
Filename label1 label2 label3 label4 ... ID
abc1.jpg 1 0 0 1 ... id-1
def2.jpg 1 0 0 1 ... id-2
ghi3.jpg 1 0 0 1 ... id-3
...
I put the IDs and the labels in dictionaries, which look like this:
partition: {'train': ['id-1','id-2','id-3',...], 'validation': ['id-7','id-14','id-21',...]}
labels: {'id-0': [1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
'id-1': [1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
'id-2': [1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
...}
All my images are converted to arrays and saved in individual npy files: id-1.npy, id-2.npy, ...
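One way to build such dictionaries from the CSV with pandas would be something like this (a sketch; the file name and exact column names are assumptions based on the sample above):

    import pandas as pd

    df = pd.read_csv('labels.csv')
    label_cols = [c for c in df.columns if c.startswith('label')]
    labels = {row['ID']: row[label_cols].astype(int).tolist() for _, row in df.iterrows()}
    # e.g. labels['id-1'] -> [1, 0, 0, 1, ...]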
Then I am executing my code:
import numpy as np
import keras
from keras.layers import *
from keras.models import Sequential

class DataGenerator(keras.utils.Sequence):
    'Generates data for Keras'
    def __init__(self, list_IDs, labels, batch_size=32, dim=(224,224), n_channels=3,
                 n_classes=21, shuffle=True):
        'Initialization'
        self.dim = dim
        self.batch_size = batch_size
        self.labels = labels
        self.list_IDs = list_IDs
        self.n_channels = n_channels
        self.n_classes = n_classes
        self.shuffle = shuffle
        self.on_epoch_end()

    def __len__(self):
        'Denotes the number of batches per epoch'
        return int(np.floor(len(self.list_IDs) / self.batch_size))

    def __getitem__(self, index):
        'Generate one batch of data'
        # Generate indexes of the batch
        indexes = self.indexes[index*self.batch_size:(index+1)*self.batch_size]
        # Find list of IDs
        list_IDs_temp = [self.list_IDs[k] for k in indexes]
        # Generate data
        X, y = self.__data_generation(list_IDs_temp)
        return X, y

    def on_epoch_end(self):
        'Updates indexes after each epoch'
        self.indexes = np.arange(len(self.list_IDs))
        if self.shuffle == True:
            np.random.shuffle(self.indexes)

    def __data_generation(self, list_IDs_temp):
        'Generates data containing batch_size samples'  # X : (n_samples, *dim, n_channels)
        # Initialization
        X = np.empty((self.batch_size, *self.dim, self.n_channels))
        y = np.empty((self.batch_size), dtype=int)
        # Generate data
        for i, ID in enumerate(list_IDs_temp):
            # Store sample
            X[i,] = np.load('Folder with npy files/' + ID + '.npy')
            # Store class
            y[i] = self.labels[ID]
        return X, keras.utils.to_categorical(y, num_classes=self.n_classes)
# Parameters
params = {'dim': (224, 224),
          'batch_size': 32,
          'n_classes': 21,
          'n_channels': 3,
          'shuffle': True}
# Datasets
partition = partition
labels = labels
# Generators
training_generator = DataGenerator(partition['train'], labels, **params)
validation_generator = DataGenerator(partition['validation'], labels, **params)
# Design model
model = Sequential()
model.add(Conv2D(32, (3,3), input_shape=(224, 224, 3)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2,2)))
...
model.add(Flatten())
model.add(Dense(64))
model.add(Activation('relu'))
model.add(Dense(21))
model.add(Activation('softmax'))
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])
# Train model on dataset
model.fit_generator(generator=training_generator,
                    validation_data=validation_generator)
and the following error is raised:
ValueError: setting an array element with a sequence
The following part of the traceback seems to be crucial:
<ipython-input-58-fedc63607310> in __getitem__(self, index)
31
32 # Generate data
---> 33 X, y = self.__data_generation(list_IDs_temp)
34
35 return X, y
<ipython-input-58-fedc63607310> in __data_generation(self, list_IDs_temp)
53
54 # Store class
---> 55 y[i] = self.labels[ID]
56
57 return X, keras.utils.to_categorical(y, num_classes=self.n_classes)
As soon as I replace the labels from the beginning with the following, the code executes:
labels = {'id-0': 0,
'id-1': 2,
'id-2': 1,
...}
I still want to pass multiple labels to the DataGenerator, which is why I put a list in the dictionary as shown at the beginning, but this gives me the ValueError. How can I pass multiple values for a single ID to the DataGenerator anyway? What do I have to adjust?
A hint or a snippet of code would be much appreciated.
If I understand your code correctly, here is the problem:
y = np.empty((self.batch_size), dtype=int)
You are creating an empty 1D array, but here:
y[i] = self.labels[ID]
you are filling it with a sequence:
'id-0': [1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
In order for this to work, you need to create your label array with the shape of your batch_size and the length of your sequence:
y = np.empty((self.batch_size, len(sequence)), dtype=int)
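With the 21-class setup above, that would be something like this (a sketch; self.n_classes stands in for the sequence length):

    # One row per sample, one column per label (21 here).
    y = np.empty((self.batch_size, self.n_classes), dtype=int)
    for i, ID in enumerate(list_IDs_temp):
        y[i] = self.labels[ID]   # a full 21-element 0/1 vector now fits into row i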
EDIT
to_categorical is meant to encode categorical features as arrays like [0, 0, 0, 1], [0, 0, 1, 0], etc. But you are feeding sequences, not categorical features.
Since you are feeding sequences to your network, you don't want to one-hot encode them, so replace:
return X, keras.utils.to_categorical(y, num_classes=self.n_classes)
with:
return X, y
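As a quick illustration of the difference (a hedged example, not from the original post): to_categorical expects integer class indices, not 0/1 vectors.

    from keras.utils import to_categorical

    # Integer class indices -> one-hot rows; this is what to_categorical is for.
    print(to_categorical([0, 2], num_classes=3))
    # [[1. 0. 0.]
    #  [0. 0. 1.]]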
Recommendation from last comment
The problem is that your softmax activation will try to give the best score to the one correct class, but here you give it a sequence array that softmax will interpret as having multiple "correct classes":
For example: if you have 3 labels [1, 2, 3], with one-hot encoding you get [1, 0, 0], [0, 1, 0], [0, 0, 1]. There is only one "1" per encoded label array, i.e. one correct class, and softmax will try to make that class's score as big as possible.
But in your case you are giving arrays with multiple "1"s:
With something like [1, 0, 1], softmax doesn't know which class to give the best score to.
So I would recommend that you start with your 21 labels [0, 1, 2, 3, ...], then one-hot encode this array and give it to your network.
If you really need that sequence, you have to find another solution!
Hope I'm clear!
I have a list of length-20 vectors, and I would like to multiply each of them by one of three matrices, depending on the vectors' names. Here is my unsuccessful attempt. Please suggest how I can improve my code. Any help is much appreciated!
for (i in 1:length(List)){
  .$Value = ifelse(names(List) %in% c("a","b","c"), matrixA %*% .$Value,
                   ifelse(names(List) %in% c("d","e"), matrixB %*% .$Value, matrixC %*% .$Value))
}
Part of my list and the matrices are included below.
list(a = structure(c(3, 0, 0, 5, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 10, 0, 0, 1, 1), .Dim = c(20L, 1L)), b = structure(c(2,
0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 6, 0), .Dim = c(20L,
1L)))
matrixA <- diag(2,20)
matrixB <- diag(1,20)
matrixC <- diag(4,20)
So... I'm not sure I understand, but it seems that if a list element is named a, b or c you want to multiply it by matrixA, if it's d or e you want to multiply it by matrixB, and otherwise the values should be multiplied by matrixC.
Let's use your example.
zz <- list(a = structure(c(3, 0, 0, 5, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 10, 0, 0, 1, 1), .Dim = c(20L, 1L)), b = structure(c(2,
0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 0, 6, 0), .Dim = c(20L,
1L)))
matrixA <- diag(2,20)
matrixB <- diag(1,20)
matrixC <- diag(4,20)
This is probably not the best solution, but it is a simple one. I just made a few tweaks to your idea so it would work: a matrix needs to be wrapped in list() inside an ifelse() (because a list is a vector, and ifelse works with vectors), otherwise you only get the first element. This will return a list of the results.
results <- lapply(seq_along(zz), function(i){
  ifelse(names(zz[i]) %in% c("a","b","c"), list(matrixA %*% zz[[i]]),
         ifelse(names(zz[i]) %in% c("d","e"), list(matrixB %*% zz[[i]]), list(matrixC %*% zz[[i]])))
})
I used lapply to apply the defined function over the sequence 1 to length(zz). For each i, the function looks at the name of the i-th element of zz (zz[i] returns the element of the list with its name) and, depending on which condition it satisfies, we multiply the content of the i-th element of zz (zz[[i]] returns just the content of the i-th element, without its name) by a predefined matrix.
This also works, and you don't need to wrap the matrix in list(), which is kind of a bother.
results <- lapply(seq_along(zz), function(i){
  if(names(zz[i]) %in% c("a","b","c")) matrixA %*% zz[[i]]
  else if(names(zz[i]) %in% c("d","e")) matrixB %*% zz[[i]]
  else matrixC %*% zz[[i]]
})
Edit: @akrun's answer is way more beautiful and shorter.
Maybe this helps:
nm1 <- paste0("matrix", toupper(names(lst1)))
Map(crossprod, lst1, mget(nm1))
Suppose I have a sequence of integers and a number n < 30. How can I produce an array (of length n) that is 0 in all places except at the indices specified by the sequence (where it should be 1)?
For instance
Input:
Seq(1, 2, 5)
7
Output:
Array(0, 1, 1, 0, 0, 1, 0)
scala> val a = Array.fill(7)(0)
a: Array[Int] = Array(0, 0, 0, 0, 0, 0, 0)
scala> Seq(1,2,5).foreach(a(_) = 1)
scala> a
res1: Array[Int] = Array(0, 1, 1, 0, 0, 1, 0)
Alternatively,
scala> val is = Set(1, 2, 5)
is: scala.collection.immutable.Set[Int] = Set(1, 2, 5)
scala> Array.tabulate(10)(i => if (is contains i) 1 else 0)
res0: Array[Int] = Array(0, 1, 1, 0, 0, 1, 0, 0, 0, 0)
def makeArray(indices: Seq[Int], size: Int): Array[Int] = Iterable.tabulate(size) {
  case idx if indices contains idx => 1
  case _ => 0
}.toArray

makeArray(Seq(1, 2, 5), size = 7)
I have a 10 × 10 Array[Int]
val matrix = for {
  r <- 0 until 10
  c <- 0 until 10
} yield r + c
and want to convert the "matrix" to an Array[Array[Int]] with 10 rows and 10 columns.
What is the simplest way to do it?
val matrix = (for {
  r <- 0 until 3
  c <- 0 until 3
} yield r + c).toArray
// Array(0, 1, 2, 1, 2, 3, 2, 3, 4)
scala> matrix.grouped(3).toArray
// Array(Array(0, 1, 2), Array(1, 2, 3), Array(2, 3, 4))
If I understand correctly, you can do:
Array.tabulate(10,10)(_+_)
//> res0: Array[Array[Int]] = Array(Array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), ....)
If you just need a 10 x 10 Array[Int] without any particular values, you can do:
Array.ofDim[Int](10,10)
//> res1: Array[Array[Int]] = Array(Array(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Array(0
//| , 0, 0, 0, 0, 0, 0, 0, 0, 0), Array(0, 0, 0, 0, 0, 0, 0, 0, 0, 0), Array(0, ....
The code you showed gives you a Vector of Int, not an Array. If a Vector is fine and it is okay to generate a new one, you just need to yield twice:
val matrix = for (r <- 1 to 10)
  yield for (c <- 1 to 10)
    yield r + c
If you need to convert the existing Vector to Array[Array[Int]] as you said, use grouped as chris-martin suggested:
matrix.grouped(10).toArray.map(_.toArray)
for (x <- (0 until 10).toArray) yield (x until x + 10).toArray