I am trying to use an Artificial Intelligence algorithm to replace a system which identifies whether a quantity is correct. The quantity is classified as "Yes" if it is a multiple of a given number and "No" if it is not. The other factors it uses are whether the quantity is greater than or less than a number. I tried scikit-learn's RandomForestClassifier, but it doesn't get trained to recognise multiples. Can you please suggest an algorithm which will best suit this? Thanks.
Here is what I tried with scikit-learn's RandomForestClassifier:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.model_selection import train_test_split
%matplotlib inline

# Running in Google Colab, so upload qty.csv first
from google.colab import files
files.upload()

# Load and inspect the data
data = pd.read_csv('qty.csv')
data.head()
data.info()

# Encode the categorical columns as integers
validate = LabelEncoder()
data['Type'] = validate.fit_transform(data['Type'])
data['ans'] = validate.fit_transform(data['ans'])
data.head()

# Check the class balance
sns.countplot(data['ans'])

# Split features and target
X = data.drop('ans', axis=1)
y = data['ans']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# sc = StandardScaler()
# X_train = sc.fit_transform(X_train)
# X_test = sc.fit_transform(X_test)
print(X_train)

# Train the random forest
rfc = RandomForestClassifier(n_estimators=200)
rfc.fit(X_train, y_train)

# Predict on a single sample
pred_rfc = rfc.predict([[0, 12, 20]])
# print(X_test)
print(pred_rfc)
If I want to predict a multiple of 12, as in the following, it doesn't work as expected. How can I train an AI algorithm to recognise multiples?
pred_rfc = rfc.predict([[0, 12, 2400]])
In this case I'd start by further defining the problem. Do you need it to work for only multiples in your training set, all multiples within a specified range, or all multiples unconstrained?
If you only need it to work for values in your training set, then most ML algorithms will work just fine. If you need it to work on all values in a specified range, then again most ML algorithms will work just fine, but some might require some additional refinement. If you need it to work for all multiples, then you need to focus on selecting an appropriate underlying model.
A random forest like the one you are using here will not perform well beyond the extremes (high and low) of your training data, because the underlying model cannot extrapolate outside the range it was trained on. There are, however, models that can match multiples precisely, for example a sine wave: its period determines how often the value reaches 1, so if you learn the correct period from the data you can predict all multiples with some degree of success.
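As a concrete sketch of that idea (with made-up data, not your qty.csv, and using modular arithmetic rather than an actual sine wave to keep it short): learn the period from the labelled examples, and prediction then extrapolates to any quantity, including 2400.
import numpy as np

# Hypothetical training data: quantities labelled 1 if they are a multiple of 12
qty = np.array([6, 12, 16, 18, 24, 30, 36, 48, 50, 60, 72, 75, 84])
ans = np.array([0,  1,  0,  0,  1,  0,  1,  1,  0,  1,  1,  0,  1])

# Pick the candidate period that best explains the labels
candidate_periods = range(2, 101)
scores = [np.mean((qty % k == 0) == ans) for k in candidate_periods]
best_k = candidate_periods[int(np.argmax(scores))]
print("learned period:", best_k)  # recovers 12 on this data

def predict(q, k=best_k):
    return int(q % k == 0)

print(predict(2400))  # 2400 is a multiple of 12 -> prints 1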
I'm using caffe to do object detection with the SSD model, and recently I adjusted the loss type of "MultiBoxLoss".
In the multibox_loss_layer.cpp file, the loss uses SOFTMAX by default and also offers a LOGISTIC option. I added a hinge loss (SVM) option to the caffe code and ran the training, but the result is bad.
Now my boss wants me to use an SVM to classify the feature map with Python's sklearn.
A question occurred to me: in the multibox_loss_layer.cpp file, softmax, logistic and hinge loss can all be used to calculate the loss. At that step the data is just "one-dimensional", but the feature map is high-dimensional, and from articles I found on the internet it seems softmax can't classify high-dimensional data.
Ex: if there are three classes, cat, dog and rabbit, the one-dimensional data has just three values to represent cat, dog and rabbit (one value for each class), but high-dimensional data has many values (like a feature map) for each class, and in the high-dimensional case softmax seems not to work.
So I wonder what the difference is between softmax, logistic and SVM. Can anybody help? Thank you!
I have never seen an SVM loss function applied to a neural network. Softmax is the loss function you should use to optimise a multiclass classification problem: it "transforms" the network outputs into a probability for each class. The logistic function usually optimises each output neuron as a separate logistic problem, so it does not force the output to be only one class; you should use it if you want to solve a multi-label problem.
An SVM is not a loss function; it is a different classifier. There is little sense in comparing softmax with an SVM, because the first is a loss function and the second is a classifier.
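A small numpy illustration of that distinction, with made-up scores rather than anything from caffe: softmax turns the raw outputs into a single probability distribution over the classes, while the logistic (sigmoid) function scores each class independently, which is what you want for multi-label problems.
import numpy as np

scores = np.array([2.0, 1.0, -1.0])  # raw network outputs for cat, dog, rabbit

softmax = np.exp(scores) / np.exp(scores).sum()
print(softmax, softmax.sum())  # one distribution, sums to 1

sigmoid = 1.0 / (1.0 + np.exp(-scores))
print(sigmoid)  # independent per-class probabilities, need not sum to 1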
Background
I have a list with the paths of a thousand image stacks (3D numpy arrays), preprocessed and saved as .npy binaries.
Case Study
I would like to calculate the mean of all the images, and in order to speed up the analysis I thought of parallelising the processing.
Approach using dask.delayed
from dask import delayed
from dask.distributed import Client

# A distributed client; created here for completeness
client = Client()

# List with the file names
flist_img_to_filter

# I chunk the list of paths into sublists. The number of chunks corresponds to
# the number of cores used for the analysis
chunked_list

# Scatter the image sublists to be able to process them in parallel
futures = client.scatter(chunked_list)

# Create the dask processing graph
output = []
for future in futures:
    ImgMean = delayed(partial_image_mean)(future)
    output.append(ImgMean)
ImgMean_all = delayed(sum)(output)
ImgMean_all = ImgMean_all / len(futures)

# Compute the graph
ImgMean = ImgMean_all.compute()
Approach using dask.arrays
modified from Matthew Rocklin's blog
import numpy as np
import dask.array as da
from dask import delayed

imread = delayed(np.load, pure=True)  # lazy version of np.load

# Lazily evaluate imread on each path
lazy_values = [imread(img_path) for img_path in flist_img_to_filter]

# Wrap each delayed value in a dask array of known shape and dtype
arrays = [da.from_delayed(lazy_value, dtype=np.uint16, shape=shape)
          for lazy_value in lazy_values]

# Stack all the small dask arrays into one
stack = da.stack(arrays, axis=0)
ImgMean = stack.mean(axis=0).compute()
Questions
1. In the dask.delayed approach is it necessary to pre-chunk the list? If I scatter the original list I obtain a future for each element. Is there a way to tell a worker to process the futures it has access to?
2. The dask.arrays approach is significantly slower and with higher memory usage. Is this a 'bad way' to use dask.arrays?
3. Is there a better way to approach the issue?
Thanks!
In the dask.delayed approach is it necessary to pre-chunk the list? If I scatter the original list I obtain a future for each element. Is there a way to tell a worker to process the futures it has access to?
The simple answer is no: as of Dask version 0.15.4 there is no very robust way to submit a computation on "all of the tasks of a certain type currently present on this worker".
However, you can easily ask the scheduler which keys are present on which workers using the who_has or has_what client methods.
import dask
from dask.distributed import wait

futures = dask.persist(futures)
wait(futures)
client.who_has(futures)
The dask.arrays approach is significantly slower and with higher memory usage. Is this a 'bad way' to use dask.arrays?
You might want to play with the split_every= keyword of the mean function, or else rechunk your array to group images together (probably similar to what you do above) before calling mean, to trade parallelism against memory.
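For instance, a rough sketch of those two knobs, assuming the stack dask array built in the question:
# Combine fewer chunks at each step of the reduction
ImgMean = stack.mean(axis=0, split_every=4).compute()

# ...or group several images into one chunk before reducing
stack = stack.rechunk({0: 10})
ImgMean = stack.mean(axis=0).compute()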
Is there a better way to approach the issue?
You might also try as_completed and compute running means as data completes. You would have to switch from delayed to futures for this.
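A sketch of that approach, reusing the Client, chunked_list and partial_image_mean names from the question (the rest is illustrative): submit one task per sublist of paths and fold the partial means into a running mean as they complete.
from dask.distributed import as_completed

futures = [client.submit(partial_image_mean, chunk) for chunk in chunked_list]

running_sum, n = None, 0
for future in as_completed(futures):
    partial = future.result()
    running_sum = partial if running_sum is None else running_sum + partial
    n += 1

ImgMean = running_sum / n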
I have a 2D numpy array (named enviro_grid) with zeros, ones, and twos that shuffle around per loop iteration. I would like to animate the iterations of a changing colormap so that I can sort of visually verify/demonstrate that all the agents are following the behaviour I expect.
Skeleton outline of code:
import numpy as np
import random
import pylab as plt

# ...initialize values, setup, seed grid with 0's, 1's, 2's, etc...

for t_weeks in range(time_limit):
    for j in range(player_n):
        # ...Here lie a bunch of for/if loops which shuffle values around via rules...
        # ...culminating in grid/array updates via the following line...
        # ...which is seen a few times per iteration as the P[:,j]'s change.
        # ...Note that P[6, j] and P[7, j] are just x, y array locations for each agent...
        # ...while P[0, j] is just a designation of 1 or 2
        enviro_grid[int(P[6, j]), int(P[7, j])] = int(P[0, j])

# ...Then I have this, which I don't really understand so much as
# ...just copy/pasted from somewhere
im = plt.imshow(enviro_grid, cmap='hot')
plt.colorbar(im, orientation='horizontal')
plt.show()
I've looked at a few links already for help; for instance, these
How to animate the colorbar in matplotlib
Colormap issue using animation in matplotlib
http://jakevdp.github.io/blog/2012/08/18/matplotlib-animation-tutorial/
but I'm just too out of my league re: Python to understand half of what I'm seeing, much less tailor it to my own code. Just to illustrate the kind of very basic help I will probably need: the plotting I've done in the past is more of the form plot(x, y, options), and any animation I've done was just a natural byproduct of plotting during loop iterations. All of this fig.method and plt.method stuff, culminating in a final plt.show(), confuses and enrages me. The use of def() functions in all of the above examples further adds to my confusion about how to carry those lines over to my own context.
Would anyone mind providing a working solution for this based on my code? I can provide more details if needed but am trying to keep this short per stackoverflow preference.
Thanks in advance for any help anyone can provide.
The biggest challenge in doing simple animations in matplotlib is to understand and work around the blocking behavior of the plt.show command. It is nicely described in this question. For your specific problem, maybe something like this will be a good starting point:
import numpy as np
import pylab as plt

# Stand-in for one iteration of your simulation
def get_data():
    return np.random.randint(0, 3, size=(5, 5))

im = None
for _ in range(20):  # draw 20 frames
    if im is None:
        # for the first frame generate the plot...
        im = plt.imshow(get_data(), cmap='hot', interpolation='none', vmin=0, vmax=2)
        plt.colorbar(im, orientation='horizontal')
    else:
        # ...for subsequent frames only update the data
        im.set_data(get_data())
    plt.draw()
    plt.pause(0.1)
Note that you need the plt.pause command to give your GUI enough time to actually draw the plot, but you can probably make the time much shorter.
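If you later want to avoid the explicit draw/pause loop, a rough alternative sketch using matplotlib.animation.FuncAnimation, with the same stand-in get_data() as above, looks like this:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

def get_data():
    return np.random.randint(0, 3, size=(5, 5))

fig, ax = plt.subplots()
im = ax.imshow(get_data(), cmap='hot', interpolation='none', vmin=0, vmax=2)
fig.colorbar(im, orientation='horizontal')

def update(frame):
    # Replace get_data() with one iteration of your simulation
    im.set_data(get_data())
    return [im]

anim = FuncAnimation(fig, update, frames=20, interval=100)
plt.show()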
There is no PIL for Python 3. Pillow seems to be the right way to go, but it is not in the standard library. Is there a way to convert a gif (or another image format) to a numpy array that does not require installing additional Python packages?
You can also use the libraries PILasOPENCV or gif2numpy from PyPI. Install them with
pip install gif2numpy
gif2numpy works like this:
import gif2numpy
import cv2
np_frames, extensions, image_specifications = gif2numpy.convert("yourimage.gif")
cv2.imshow("test", np_frames[0])
cv2.waitKey()
If you have SciPy installed (as, I assume, most people using NumPy do), ndimage allows you to read in images as NumPy arrays:
from scipy import ndimage
im_array = ndimage.imread("image_file.gif")
ndimage is good, but if you don't want to see the deprecation warning, you can use
import matplotlib.pyplot as plt
img_array = plt.imread('image_file.gif')
The sensor module in my project consists of a rotating camera, that collects noisy information about moving objects in the surrounding environment.
The information consists of the distance, angle and relative change of the moving objects.
The limited view range of the camera makes it essential to rotate the camera periodically to update the environment information.
I was looking for algorithms / ways to model this information, in order to be able to guess / predict / learn the motion properties of these objects.
My current idea is to store the last n snapshots of each object in a queue and take a weighted average of the positions and velocities of each moving object, but I think it is a poor method.
Can you state some titles that suit this case?
Thanks
Look into Kalman filters (extended, unscented, ...) and particle filters, the latter only after reading about Kalman filters.
Kalman filters learn and predict the correct data from noisy data under a Gaussian assumption, so they may be of use to you. If you need non-Gaussian methods, look at particle filters.
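To make the idea concrete, here is a toy one-dimensional sketch (with made-up numbers and a simple random-walk motion model, not tied to the camera setup above) showing how the filter blends each noisy measurement with its current estimate:
measurements = [1.0, 2.1, 2.9, 4.2, 5.05]  # noisy position readings
x, P = 0.0, 1.0   # state estimate and its variance
Q, R = 0.01, 0.5  # process noise and measurement noise variances

for z in measurements:
    # Predict: the state is assumed unchanged, so only the uncertainty grows
    P = P + Q
    # Update: the Kalman gain weights the measurement against the prediction
    K = P / (P + R)
    x = x + K * (z - x)
    P = (1 - K) * P
    print(f"estimate={x:.2f}, variance={P:.3f}")
For the rotating-camera case you would extend the state to position, angle and velocity, and run one such filter per tracked object.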