How to iterate through dataset and train the model - loops

I want to build a model that trains on the first 80000 values, then on the next 80000, and so on. Is this possible, and is it a valid approach?
length=80000
train_data=[]
train_tar=[]
for i in range (0, len(X_train), length):
train_data[i]=X_train.iloc[i:i+length, :]
train_tar[i]=Y_train.iloc[i:i+length, :]
X_training, X_val, Y_training, Y_val = train_test_split(train_data[i], train_tar[i], test_size=0.40, shuffle=False )
scaler1= StandardScaler()
X_training =scaler1.fit_transform(X_training[i])
X_val[i]=scaler1.transform(X_val[i])
X_test[i]=scaler1.transform(X_test[i])
scaler2= StandardScaler()
Y_training[i] =scaler2.fit_transform(Y_train[i])
Y_val[i]=scaler2.transform(Y_val[i])
Y_test[i]=scaler2.transform(Y_test[i])
train_gen[i] = tf.keras.utils.timeseries_dataset_from_array( X_training[i], Y_training[i], sequence_length=160, sequence_stride=1, batch_size=256,sampling_rate=1,shuffle=False)
val_gen[i] = tf.keras.utils.timeseries_dataset_from_array( X_val[i], Y_val[i], sequence_length=160, sequence_stride=1, batch_size=256,sampling_rate=1,shuffle=False)
batch =train_gen
inputs, target=batch
input= inputs.shape[1], inputs.shape[2]
print(input)
def ann():
    model = Sequential()
    model.add(Dense(1000, input_shape=input))
    model.add(Dense(100))
    model.add(Flatten())
    model.add(Dense(2, activation='linear'))
    model.compile(optimizer=Adam(learning_rate = 1e-6), loss= 'mse', metrics=(['accuracy']))
    model.summary()
    return model
model = ann()
history=model.fit(train_gen[i], validation_data=val_gen[i], shuffle=True,epochs=10,verbose=1)
However, I get an error at the following line. How can I resolve it?
train_data[i]=X_train.iloc[i:i+length, :]
IndexError: list assignment index out of range

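i is not the right variable to index train_data and train_tar: it takes the values 0, 80000, 160000, and so on, and both lists start out empty, so assigning to train_data[i] fails because that position does not exist yet. Append each chunk instead, and use enumerate if you also need the chunk number: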
# Now:
# i -> the index of the current loop
# j -> the slicing index
for i, j in enumerate(range(0, len(X_train), length)):
    train_data.append(X_train.iloc[j:j+length, :])
    train_tar.append(Y_train.iloc[j:j+length, :])
    # Rest of your code
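The difference between index assignment and append can be seen on a small stand-in. This is a minimal sketch, assuming a plain Python list in place of X_train and a chunk length of 4 instead of 80000 (both are illustrative and not taken from the question):

```python
# Stand-in data: a plain list instead of the DataFrame X_train (assumption).
X_train = list(range(10))
length = 4  # illustrative chunk size; the question uses 80000

train_data = []

# Assigning to an index that does not exist yet fails on an empty list.
try:
    train_data[0] = X_train[0:length]
    assignment_failed = False
except IndexError:
    assignment_failed = True

# Appending grows the list, so chunk number i ends up at position i.
for i, j in enumerate(range(0, len(X_train), length)):
    train_data.append(X_train[j:j + length])

print(assignment_failed)
print(train_data)
```

The last chunk is shorter when the data length is not a multiple of length; .iloc slicing behaves the same way.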

Related

RandomizedSearchCV - IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed

I'm using the HDBSCAN clustering algorithm with RandomizedSearchCV. When I fit the features with the labels, I get the error "IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed". The shape of embedding is (5000, 4) and the shape of hdb_labels is (5000,). My code is below:
# UMAP
umap_hdb = umap.UMAP(n_components=4, random_state = 42)
embedding = umap_hdb.fit_transform(customer_data_hdb)
# creating HDBSCAN wrapper
class HDBSCANWrapper(hdbscan.HDBSCAN):
    def predict(self,X):
        return self.labels_.astype(int)
# HDBSCAN
clusterer_hdb = HDBSCANWrapper(min_samples=40, min_cluster_size=1000, metric='manhattan', gen_min_span_tree=True).fit(embedding)
hdb_labels = clusterer_hdb.labels_
# specify parameters and distributions to sample from
param_dist = {'min_samples': [10,30,50,60,100,150],
'min_cluster_size':[100,200,300,400,500],
'cluster_selection_method' : ['eom','leaf'],
'metric' : ['euclidean','manhattan']
}
# validity scorer
validity_scorer = make_scorer(hdbscan.validity.validity_index,greater_is_better=True)
n_iter_search = 20
random_search = RandomizedSearchCV(clusterer_hdb
,param_distributions=param_dist
,n_iter=n_iter_search
,scoring=validity_scorer
,random_state=42)
random_search.fit(embedding, hdb_labels)
I'm getting an error in the random_search.fit and could not get rid of it. Any suggestions/help would be appreciated.
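One possible cause, offered as an assumption rather than a confirmed diagnosis: a scorer built with make_scorer calls the score function as score_func(y_true, y_pred), whereas validity_index expects the data matrix as its first argument and the labels as its second. If that is what happens here, the 1-D label array arrives where the 2-D embedding is expected. The sketch below reproduces that situation with NumPy only; toy_validity is a hypothetical stand-in, not the real hdbscan.validity.validity_index, and the array sizes are scaled down.

```python
import numpy as np

def toy_validity(X, labels):
    """Hypothetical stand-in for a score that needs a 2-D data matrix.

    The labels argument is accepted but ignored in this toy version.
    """
    return float(X[:, 0].std())  # two indices: fails when X is 1-D

rng = np.random.default_rng(0)
embedding = rng.random((50, 4))        # 2-D data, like the (5000, 4) embedding
hdb_labels = rng.integers(0, 3, 50)    # 1-D labels, like the (5000,) label array

# Called the way a make_scorer-style wrapper would: (y_true, y_pred).
try:
    toy_validity(hdb_labels, hdb_labels)
    scorer_style_failed = False
except IndexError as err:
    scorer_style_failed = True
    print(err)

# Called with the data matrix first, as the score function expects.
score = toy_validity(embedding, hdb_labels)
print(score)
```

If this is indeed the cause, a scorer callable with the signature (estimator, X, y=None) that passes X and the fitted estimator's labels_ to the score function may be a way around it.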

Lapply function to anova and post hoc test cld

I am new to R and I am trying to get my head around the apply functions. So far I have managed to run my ANOVAs for all the variables in my data and I have the pairwise comparisons.
varlist <- names(dt)[5:length(dt)]
# loop
models <- lapply(X = varlist,
FUN = function(t) lm(formula = paste0("`", t, "` ~ block+irrigation*genotype"), data = dt))
#Name the list of models to the column name
names(models) = varlist
## apply anova to each model stored in the list, models
lapply(models, anova)
# marginal means for all variables
res.model1 <- lapply(models, function(x) pairs(emmeans(x, ~genotype:irrigation)))
res.model1
So far so good. Now I want to create a compact letter display (CLD) that I can use for plotting. Previously I used the following, but I can't work out how to apply it with lapply:
CLD = cld(res.model1,
alpha=0.05,
Letters=letters,
adjust="tukey")
I use the CLD data to create graphs.
I managed to get the letters with the following code, but then I do not get the full ANOVA table.
tx <- with(dt, interaction(irrigation, genotype)) # determining the factors
model2 <- lapply(varlist, function(x) {
lm(substitute(i~block+tx, list(i = as.name(x))), data = dt)}) # using the factors already in "tx"
lapply(model2, anova)
letters = lapply(model2, function(m) HSD.test((m), "tx", alpha = 0.05, group = TRUE, console = TRUE))
Any suggestions on how to achieve this would be appreciated.
Thank you

Python, face_recognition convert string to array

I want to convert a variable to a string and then to an array that I can use for comparison, but I don't know how to do that.
My code:
import face_recognition
import numpy as np
a = face_recognition.load_image_file('C:\\Users\zivsi\OneDrive\תמונות\סרט צילום\WIN_20191115_10_32_24_Pro.jpg') # my picture 1
b = face_recognition.load_image_file('C:\\Users\zivsi\OneDrive\תמונות\סרט צילום\WIN_20191115_09_48_56_Pro.jpg') # my picture 2
c = face_recognition.load_image_file(
'C:\\Users\zivsi\OneDrive\תמונות\סרט צילום\WIN_20191115_09_48_52_Pro.jpg') # my picture 3
d = face_recognition.load_image_file('C:\\Users\zivsi\OneDrive\תמונות\סרט צילום\ziv sion.jpg') # my picture 4
e = face_recognition.load_image_file(
'C:\\Users\zivsi\OneDrive\תמונות\סרט צילום\WIN_20191120_17_46_40_Pro.jpg') # my picture 5
f = face_recognition.load_image_file(
'C:\\Users\zivsi\OneDrive\תמונות\סרט צילום\WIN_20191117_16_19_11_Pro.jpg') # my picture 6
a = face_recognition.face_encodings(a)[0]
b = face_recognition.face_encodings(b)[0]
c = face_recognition.face_encodings(c)[0]
d = face_recognition.face_encodings(d)[0]
e = face_recognition.face_encodings(e)[0]
f = face_recognition.face_encodings(f)[0]
Here I tried to convert the variable to a string
str_variable = str(a)
array_variable = np.array(str_variable)
my_face = a, b, c, d, e, f, array_variable
while True:
    new = input('path: ')
    print('Recognizing...')
    unknown = face_recognition.load_image_file(new)
    unknown_encodings = face_recognition.face_encodings(unknown)[0]
The program cannot use the variable:
    results = face_recognition.compare_faces(array_variable, unknown_encodings, tolerance=0.4)
    print(results)
    recognize_times = int(results.count(True))
    if (3 <= recognize_times):
        print('hello boss!')
        my_face = *my_face, unknown_encodings
Please help. The error shown is:
Traceback (most recent call last):
  File "C:/Users/zivsi/PycharmProjects/AI/pytt.py", line 37, in <module>
    results = face_recognition.compare_faces(my_face, unknown_encodings, tolerance=0.4)
  File "C:\Users\zivsi\AppData\Local\Programs\Python\Python36\lib\site-packages\face_recognition\api.py", line 222, in compare_faces
    return list(face_distance(known_face_encodings, face_encoding_to_check) <= tolerance)
  File "C:\Users\zivsi\AppData\Local\Programs\Python\Python36\lib\site-packages\face_recognition\api.py", line 72, in face_distance
    return np.linalg.norm(face_encodings - face_to_compare, axis=1)
ValueError: operands could not be broadcast together with shapes (7,) (128,)
First of all, the first argument to compare_faces should be a list of the known encodings, not a numpy array built from a string.
You also do not need str at all.
The comparison is based on distance, and distance is only defined between vectors of the same length, so every entry in the list must be a face encoding of the same length (128 elements, as the (128,) in the traceback shows). In your case my_face holds the six encodings a, b, c, d, e, f plus array_variable, which is a string and not an encoding, so the 128-element unknown encoding cannot be subtracted from it. That is where the shapes (7,) and (128,) in the error come from.
Here is a working simple example using the photos from https://github.com/ageitgey/face_recognition/tree/master/examples:
import face_recognition
import numpy as np
from PIL import Image, ImageDraw
from IPython.display import display
# Load a sample picture and learn how to recognize it.
obama_image = face_recognition.load_image_file("obama.jpg")
obama_face_encoding = face_recognition.face_encodings(obama_image)[0]
# Load a second sample picture and learn how to recognize it.
biden_image = face_recognition.load_image_file("biden.jpg")
biden_face_encoding = face_recognition.face_encodings(biden_image)[0]
array_variable = [obama_face_encoding,biden_face_encoding] # list of known encodings
# compare the list with the biden_face_encoding
results = face_recognition.compare_faces(array_variable, biden_face_encoding, tolerance=0.4)
print(results)
# [False, True]  (True means match, False means mismatch)
# False: obama_face_encoding vs biden_face_encoding
# True: biden_face_encoding vs biden_face_encoding
To run it go here: https://beta.deepnote.com/project/09705740-31c0-4d9a-8890-269ff1c3dfaf#
Documentation: https://face-recognition.readthedocs.io/en/latest/face_recognition.html
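What compare_faces does with that list can be sketched with NumPy alone, following the two library lines visible in the traceback above (a norm along axis=1, then a comparison against the tolerance). The random 128-element vectors and the names known and unknown are stand-ins for real encodings, not output of face_recognition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for two known face encodings (assumption: random vectors).
face_a = rng.random(128)
face_b = rng.random(128)
known = [face_a, face_b]          # a list of equal-length vectors
unknown = face_b.copy()           # the "new" face equals the second known one

# One distance per known encoding, then a threshold check.
tolerance = 0.4
distances = np.linalg.norm(np.array(known) - unknown, axis=1)
results = [bool(d <= tolerance) for d in distances]
print(results)
```

Because every entry in known has the same length, the list converts to a 2-D array and the subtraction broadcasts row by row.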
EDIT
To save the known encodings you can use numpy.save
np.save('encodings',biden_face_encoding) # save
load_again = np.load('encodings.npy') # load again
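To store several known encodings in one file, they can be stacked into a single 2-D array first. This is a minimal sketch, assuming random 128-element vectors as stand-ins for real encodings and a temporary directory so that the example leaves no file behind:

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
known = [rng.random(128) for _ in range(3)]   # stand-in encodings (assumption)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "encodings.npy")
    np.save(path, np.stack(known))   # one array of shape (3, 128)
    loaded = np.load(path)           # read fully into memory

known_again = list(loaded)           # back to a list of 1-D encodings
print(loaded.shape)
```

The reloaded list can be passed to compare_faces in the same way as the original one.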

How to use np.where to find the index value of a certain element within an array?

Here is my sample data:
# sample data
xdata = [3.33172, 3.33348, 3.33525, 3.33702, 3.33878, 3.34055, 3.34232,
3.34408, 3.34585, 3.34762, 3.34938, 3.35115, 3.35292 , 3.35468, 3.35645,
3.35822, 3.35998, 3.36175, 3.36352, 3.36529, 3.36705, 3.36882]
ydata = [-0.00437834, -0.00486735, -0.0118371, -0.00582171, 0.00339488,
-0.000369502, -0.000898799, -0.00797662, -0.00853566, -0.0123596,
-0.0162318, -0.0103355, -0.00445416, 0.00137920, -0.00251916, -0.00968244,
0.00260652, 0.00841350, 0.00445556, 0.00373271, 0.00621243, 0.00220983]
How could I use np.where to find the index value of 3.35115, for example?
You first need to turn your data into a numpy array so that you can check where it is equal to your target value:
>>> np.where(np.array(xdata) == 3.35115)
# (array([11]),)
This says that index 11 of xdata is 3.35115.
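Exact comparison with == works here because the target is written with the same literal as the list entry. For values that come out of a calculation, a tolerance-based match is usually safer. A small sketch using np.isclose on the first part of the sample data (the default tolerances of np.isclose are an assumption about what should count as equal):

```python
import numpy as np

# First 14 values of the sample xdata.
xdata = np.array([3.33172, 3.33348, 3.33525, 3.33702, 3.33878, 3.34055,
                  3.34232, 3.34408, 3.34585, 3.34762, 3.34938, 3.35115,
                  3.35292, 3.35468])

# A computed target may differ from the literal 3.35115 in the last bits.
target = 3.35 + 0.00115

# Indices where xdata matches the target within the default tolerances.
matches = np.flatnonzero(np.isclose(xdata, target))
print(matches)
```

np.flatnonzero returns the indices directly as a 1-D array, so there is no tuple to unpack as with np.where.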

numpy slicing using user defined input

I have (in a larger project) data contained in numpy.array.
Based on user input I need to move a selected axis (dimAxisNr) to the first dimension of the array and slice one or more dimensions (including the first) based on user input (such as Select2 and Select0 in the example).
Using this input I generate a DataSelect which contains the information needed to slice. But the output size of the sliced array is different from the one using inline indexing. So basically I need a way to generate the '37:40:2' and '0:2' from an input list.
import numpy as np
dimAxisNr = 1
Select2 = [37,39]
Select0 = [0,1]
plotData = np.random.random((102,72,145,2))
DataSetSize = np.shape(plotData)
DataSelect = [slice(0,item) for item in DataSetSize]
DataSelect[2] = np.array(Select2)
DataSelect[0] = np.array(Select0)
def shift(seq, n):
    n = n % len(seq)
    return seq[n:] + seq[:n]
#Sort and Slice the data
print(np.shape(plotData))
print(DataSelect)
plotData = np.transpose(plotData, np.roll(range(plotData.ndim),-dimAxisNr))
DataSelect = shift(DataSelect,dimAxisNr)
print(DataSelect)
print(np.shape(plotData))
plotData = plotData[DataSelect]
print(np.shape(plotData))
plotDataDirect = plotData[slice(0, 72, None), 37:40:2, slice(0, 2, None), 0:2]
print(np.shape(plotDataDirect))
I'm not sure I've understood your question at all...
But if the question is "How do I generate a slice based on a list of indices like [37,39,40,23]?"
then I would answer: you don't have to, just use the list as is to select the right indices, like so:
a = np.random.rand(4,5)
print(a)
indices = [2,3,1]
print(a[0:2,indices])
Note that the order of the list matters: [2,3,1] yields a different result from [1,2,3].
Output :
>>> a
array([[ 0.47814802, 0.42069094, 0.96244966, 0.23886243, 0.86159478],
[ 0.09248812, 0.85569145, 0.63619014, 0.65814667, 0.45387509],
[ 0.25933109, 0.84525826, 0.31608609, 0.99326598, 0.40698516],
[ 0.20685221, 0.1415642 , 0.21723372, 0.62213483, 0.28025124]])
>>> a[0:2,[2,3,1]]
array([[ 0.96244966, 0.23886243, 0.42069094],
[ 0.63619014, 0.65814667, 0.85569145]])
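One caveat when lists are used on more than one axis at once, which may explain the unexpected output size in the question: NumPy pairs the lists element by element instead of selecting every combination. A small sketch on a 4x5 array of integers (illustrative data, not the question's array):

```python
import numpy as np

a = np.arange(20).reshape(4, 5)

# Two index lists are paired up: this picks a[0, 2] and a[1, 3] only.
paired = a[[0, 1], [2, 3]]

# np.ix_ builds an open mesh: rows 0-1 crossed with columns 2-3.
outer = a[np.ix_([0, 1], [2, 3])]

# The same selection written with slices.
sliced = a[0:2, 2:4]

print(paired.shape, outer.shape, sliced.shape)
```

With a single list and slices on the other axes, as in the example above, the pairing does not come into play.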
I have found the answer to my question. I need to use numpy.ix_.
Here is the working code:
import numpy as np
dimAxisNr = 1
Select2 = [37,39]
Select0 = [0,1]
plotData = np.random.random((102,72,145,2))
DataSetSize = np.shape(plotData)
DataSelect = [np.arange(0,item) for item in DataSetSize]
DataSelect[2] = Select2
DataSelect[0] = Select0
#print(list(37:40:2))
def shift(seq, n):
    n = n % len(seq)
    return seq[n:] + seq[:n]
#Sort and Slice the data
print(np.shape(plotData))
print(DataSelect)
plotData = np.transpose(plotData, np.roll(range(plotData.ndim),-dimAxisNr))
DataSelect = shift(DataSelect,dimAxisNr)
plotDataSlice = plotData[np.ix_(*DataSelect)]
print(np.shape(plotDataSlice))
plotDataDirect = plotData[slice(0, 72, None), 37:40:2, slice(0, 2, None), 0:2]  # 0:2 matches Select0 = [0,1]
print(np.shape(plotDataDirect))
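For the original wish to generate 37:40:2 from [37, 39], a helper along these lines could work when the indices are evenly spaced and increasing. indices_to_slice is a hypothetical name, not a NumPy function, and the small array is illustrative:

```python
import numpy as np

def indices_to_slice(indices):
    """Turn evenly spaced, increasing indices into an equivalent slice."""
    idx = list(indices)
    if not idx:
        raise ValueError("need at least one index")
    if len(idx) == 1:
        return slice(idx[0], idx[0] + 1)
    step = idx[1] - idx[0]
    if step <= 0 or any(b - a != step for a, b in zip(idx, idx[1:])):
        raise ValueError("indices are not evenly spaced and increasing")
    return slice(idx[0], idx[-1] + 1, step)

data = np.arange(4 * 50 * 3).reshape(4, 50, 3)

as_slice = indices_to_slice([37, 39])          # slice(37, 40, 2)
via_slice = data[:, as_slice, :]
via_ix = data[np.ix_(np.arange(4), [37, 39], np.arange(3))]

print(as_slice, via_slice.shape, via_ix.shape)
```

For index lists that are not evenly spaced, such as [37, 39, 40, 23], no single slice exists and np.ix_ remains the general tool.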
