What's the difference in how softmax, logistic and SVM "classify"? - artificial-intelligence

I'm using Caffe to do object detection with the SSD model, and recently I have been adjusting the loss type of "MultiBoxLoss".
In the multibox_loss_layer.cpp file, the loss has SOFTMAX as the default and LOGISTIC as an option. I added a hinge loss (SVM) option to the Caffe code and ran the training, but the result is bad.
Now my boss wants me to use an SVM to classify the feature map with Python's sklearn.
That raised a question for me. In the multibox_loss_layer.cpp file, softmax, logistic and hinge loss can all be used to calculate the loss. At this step the data is just "one-dimensional", but the feature map is high-dimensional, and from the articles I found online, it seems softmax can't classify high-dimensional data.
Ex: if there are three classes (cat, dog and rabbit), then the one-dimensional data has just three values to represent cat, dog and rabbit (one value per class). High-dimensional data, like a feature map, has many values for each class, and in that case softmax seems not to work.
So I wonder what the difference is between softmax, logistic and SVM. Can anybody help? Thank you!

I have never seen an SVM loss function applied in a NN. However, softmax is a loss function that should be used to optimize a multiclass classification problem. Softmax "transforms" the NN outputs into the probability of each class occurring. The logistic function usually optimizes each neuron's output as a separate logistic problem, so it does not force the output to be only one class. You should use this function if you want to solve a multi-label problem.
SVM is not a function; it is a different classifier. There is no sense in comparing softmax with SVM, because the first is a loss function and the second is a classifier.
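To make the three options named in the question (SOFTMAX, LOGISTIC and hinge loss) concrete, here is a minimal NumPy sketch that evaluates each of them on the same vector of per-class scores. It is not Caffe's implementation: the function names, the margin of 1.0 and the example scores are assumptions chosen for illustration. In all three cases the loss only sees one score per class, whatever the dimension of the features that produced those scores.

```python
import numpy as np

def softmax_cross_entropy(scores, label):
    # Softmax turns the scores into probabilities that sum to 1,
    # so the classes compete and exactly one is expected to win.
    shifted = scores - scores.max()  # subtract the max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

def logistic_loss(scores, label):
    # One independent sigmoid per class: each class is its own yes/no problem,
    # so nothing forces the output to be a single class.
    targets = np.zeros_like(scores)
    targets[label] = 1.0
    probs = 1.0 / (1.0 + np.exp(-scores))
    return -(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs)).sum()

def multiclass_hinge_loss(scores, label, margin=1.0):
    # SVM-style loss: penalise each wrong class whose score comes
    # within `margin` of the true class score.
    margins = np.maximum(0.0, scores - scores[label] + margin)
    margins[label] = 0.0
    return margins.sum()

scores = np.array([2.0, 0.5, -1.0])  # hypothetical scores for cat, dog, rabbit
label = 0                            # the true class is "cat"

softmax_value = softmax_cross_entropy(scores, label)
logistic_value = logistic_loss(scores, label)
hinge_value = multiclass_hinge_loss(scores, label)
print(softmax_value, logistic_value, hinge_value)
```

With these scores the hinge loss is exactly zero because "cat" already beats the other classes by more than the margin, while the softmax and logistic losses stay positive and keep pushing the scores apart.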

Related

Why are the inputs to my guess_nonlinear() all 1s?

The N2 diagram for my full problem is below.
The N2 diagram for the coupled portion of the problem is below.
I have a DirectSolver handling the coupling between LLTForces and ImplicitLiftingLine, and an LNBGS solver handling the coupling between LiftingLineGroup and TestCL.
The gist for the problem is here: https://gist.github.com/eufren/31c0e569ed703b2aea3e2ef5360610f7
I have implemented guess_nonlinear() on ImplicitLiftingLine, which should use various outputs from LLTGeometry to give a good initial guess for the vortex strengths based on a linearised form of the governing equations.
def guess_nonlinear(self, inputs, outputs, resids):
    # Read the inputs (several of these are outputs of LLTGeometry).
    freestream_unit_vector = inputs['freestream_unit_vector']
    freestream_velocity = inputs['freestream_velocity']
    n = inputs['normal_vectors']
    A = inputs['surface_areas']
    l = inputs['bound_vortices']
    ic_tot = inputs['influence_coefficients_total']
    v_inf = freestream_velocity
    v_inf_vec = v_inf*freestream_unit_vector
    # Linearised form of the governing equations.
    lin_numerator = np.pi * v_inf * A * np.sum(n * v_inf_vec, axis=1)
    lin_denominator = (np.linalg.norm(np.cross(v_inf_vec, l), axis=1) - np.pi * v_inf * A * np.sum(np.sum(n * ic_tot, axis=2), axis=1))
    lin_vtx_str = lin_numerator / lin_denominator
    # Use the linearised solution as the initial guess for the vortex strengths.
    outputs['vortex_strengths'] = lin_vtx_str
However, when the problem is run for the first time, any inputs not explicitly set with p.set_val() are all 1s. This causes guess_nonlinear() to give a bad output and so the system fails to converge:
As far as I can tell, the execution order for the LLT group is correct, and the geometry components should be being executed before the implicit component. I'm confused as to why this doesn't seem to actually be happening when the code is run, and instead these inputs are taking their default values.
What do I need to change to get this to work properly? Additionally, I've found difficulty in getting LNBGS to converge (hence adding guess_nonlinear()) during optimisation - only DirectSolver gets all the way through the optimisation without issues, but it's very slow for large numbers of LLT nodes). How can I improve the linear and nonlinear solver selection, and improve the reliability of the iterative solver?
Note: Thanks for providing a testable example. It made figuring out the answer to your question a lot simpler. Your problem was a bit subtle and I would not have been able to give a good answer without runnable code.
Your first question: "Why are all the inputs 1"
"Short" Answer
You have put the nonlinear solver too high in the model hierarchy, so it included a key precursor component that computes your input values. By moving the solver down to a lower level of the model, I was able to ensure that the precursor component (LLTGeometry) ran and had valid outputs before you got to the guess_nonlinear of the implicit component.
Here is what you had (notice that the implicit solver included LLTGeometry even though the data cycle does not require that component):
I moved both the nonlinear solver and the linear solver down into the LTTCycle group, which then allows the LLTGeometry component to execute before getting to the nonlinear solver and guess_nonlinear step:
My fix is only partially correct, since there is a secondary cycle from the TestCL component that also needs a solver and does not have one. However, that cycle still does not involve the LLTGeometry group. So the fully correct fix is to restructure your model to run the geometry first, and then put the LTTCycle and TestCL groups together so you can run a solver over just them. That was a bit more hacking than I wanted to do on your test problem, but you can see the general idea from the adjusted N2 above.
Long Answer
The guess_nonlinear sequence in OpenMDAO does NOT run the compute method of explicit components or of groups. It follows the execution hierarchy, and calls any guess_nonlinear that it finds. So that means that any explicit components you have in your model will NOT get executed, their outputs will not get updated with computed values, and those computed values will not get passed to the inputs of downstream components.
Things get a little tricky when you have deep model hierarchies. The guess_nonlinear method is called as the first step in the nonlinear solver process. If you have a NonlinearRunOnce solver at the top level, it will follow the compute chain down the line, calling compute or solve_nonlinear on each child and doing a data transfer after each one. If one of those children happens to be a group with a nonlinear solver, then that solver will call guess_nonlinear on its children (grandchildren of the top group with the NonlinearRunOnce solver) as the first step. So any outputs that were computed by the siblings of this group will be valid, but none of the outputs from the grandchild level will have been computed yet.
You may be wondering why not just have the guess_nonlinear method call the compute for any explicit components? There is a difficult to balance trade off here. If you assume that all explicit components are very cheap to run, then it might make sense to run the compute methods --- or it might not. A lot depends on the cyclic data structure. If some early component in the group needs guesses from the later one, then running its compute isn't going to help you much at all. Perhaps more importantly though, not all explicit components are cheap to run. You might have a very expensive computation, and calling compute as part of the guess process would be way too costly.
The compromise here, if you need some kind of top level guess process, is that you can implement guess_nonlinear at the group level. It's less common to do, but it gives you total control over what happens. You can call whatever you need to call in whatever sequence.
So the absolute key thing to remember is that the only data you have available when a guess_nonlinear is called is data that was computed before your containing solver was executed. That means anything that was computed before you got to the model scope of the containing solver (not the scope of the component with the guess_nonlinear method itself).
Your second question: "How can I speed this up when the number of nodes gets large?"
This one is not possible to give a generic answer to at all. I noticed that you have already specified sparse partial derivatives. That is a great start, but if it's still not fast enough for you then it means you're reaching the limits of what you can do with a DirectSolver. You note that this solver is the only one that gets you through the optimization without issues, which I will take to mean that ScipyKrylov and PETScKrylov are not converging the linear system well for you, at least not by themselves. That's not surprising, as Krylov solvers almost always require some kind of preconditioner, and this is why I can't offer a generic answer. Setting up efficient linear solvers for larger-scale compute is a tricky subject. If you look into the literature, you'll find some good suggestions. You can also study open source implementations like VSPAero for some tips.
Effectively, you've reached the limit of what simple linear solvers can offer you. From this point forward, OpenMDAO can help a bit by making it easier to implement some preconditioning, but you'll have to suffer the math side yourself.
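To illustrate why a Krylov solver "by itself" can struggle and what a preconditioner changes, here is a small NumPy sketch. It is not OpenMDAO code and not the lifting-line Jacobian: the conjugate_gradient function, the synthetic badly scaled symmetric positive definite matrix and the choice of a Jacobi (diagonal) preconditioner are all assumptions made for the example. A real lifting-line system is generally not symmetric, so a different Krylov method and preconditioner would be needed there.

```python
import numpy as np

def conjugate_gradient(A, b, inv_diag=None, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive definite A.

    If inv_diag is given, it is applied as a Jacobi (diagonal) preconditioner.
    Returns the solution and the number of iterations used.
    """
    x = np.zeros_like(b)
    r = b - A @ x
    z = r if inv_diag is None else inv_diag * r
    p = z.copy()
    rz = r @ z
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = r if inv_diag is None else inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter

# Synthetic test system: diagonal entries span four orders of magnitude,
# plus a weak symmetric coupling between all unknowns.
rng = np.random.default_rng(0)
n = 200
B = rng.standard_normal((n, n))
A = np.diag(np.logspace(0, 4, n)) + 0.005 * (B + B.T)
b = rng.standard_normal(n)

x_plain, iters_plain = conjugate_gradient(A, b)
x_pre, iters_pre = conjugate_gradient(A, b, inv_diag=1.0 / np.diag(A))
print("iterations without preconditioner:", iters_plain)
print("iterations with Jacobi preconditioner:", iters_pre)
```

On a badly scaled system like this one, the preconditioned run would be expected to need far fewer iterations, because dividing by the diagonal removes most of the spread in the eigenvalues. Which preconditioner works for a given physical problem is exactly the problem-specific part that has no generic answer.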

Logistic regression always predicts the same value when the network is deeper

I'm using darknet to train a logistic regression model, but it always outputs the same prediction for different input images.
When I remove some convolutional layers, it seems to behave normally (different outputs for different input images).
the model cfg file is as follows:
[net]
some parameter...
[convolutions]
[convolutions]
[shortcut]
...
[avgpool]
[connected]
batch_normalize=1
output=1
activation=linear
[logistic]
I tried different learning rates and momentum values; that did not work.
The training data is balanced: two classes, 15000 images for each class.
Any advice?
Thanks.

Can a single image be a positive example for multiple classes?

Bouquets of flowers are a fairly accurate analogy for our problem domain, and we have another S.O. question out there asking about the feasibility of a different approach to our problem/goal.
What if, rather than making classes by flower types, we made our classes according to the actions we need to take depending on the contents and complex combinations of the bouquet?
Let's say that, if in the bouquet in our test image, there are:
>9 roses, >14 pansies, <1 marigold, any qty of other flowers
then we need to take, both, action-a & action-d.
So, then, the same image would be used as a positive example for both class action-a and class action-d.
Inversely, there would absolutely be positive action-d examples which would be negative action-a examples, and vice versa.
Of course, even with this simplification it still gets quite complex.
I imagine this approach would need a huge number of training images.
Even still, I'm hopeful that it might work.
Thoughts?
Yes, you can have the same image in >1 classes inside 1 classifier, as long as you have >=10 unique images per class AND >=20 unique images in the classifier in total, including any negative_examples.
However, you should be careful about what you are "teaching" the system by doing this.
Classes within a classifier are meant to be mutually exclusive. Internally the system is trying to figure out what makes the positive examples of one class different from all the other examples in a classifier's training data.
If the system discovers an exact duplicate of an image file in more than one class of a single classifier, it will use it as a positive example of both classes. Exact duplicates are determined by the checksum of the image file.
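As a way to reason about "one image, several actions" before building training sets, the rule from the question can be written down as a small labelling function that returns a multi-hot target. This is a plain-Python sketch that is independent of any particular classification service. The ACTIONS list, the flower names used as dictionary keys and the helper names are assumptions for illustration; only the rule for action-a and action-d comes from the question.

```python
# Hypothetical action list; action-b and action-c are placeholders.
ACTIONS = ["action-a", "action-b", "action-c", "action-d"]

def actions_for(counts):
    """Return the set of actions required for one bouquet.

    `counts` maps a flower name to how many of that flower are present.
    """
    actions = set()
    # Rule from the question: >9 roses, >14 pansies, <1 marigold,
    # any quantity of other flowers -> action-a and action-d.
    if (counts.get("rose", 0) > 9
            and counts.get("pansy", 0) > 14
            and counts.get("marigold", 0) < 1):
        actions.update({"action-a", "action-d"})
    return actions

def multi_hot(actions):
    # One 0/1 entry per action, so an image can be positive for several actions.
    return [1 if name in actions else 0 for name in ACTIONS]

bouquet = {"rose": 12, "pansy": 20, "tulip": 3}
target = multi_hot(actions_for(bouquet))
print(target)  # [1, 0, 0, 1]
```

Writing the rules out this way also makes it easy to count how many training images each action would receive, which matters given the per-class minimums mentioned above.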
I think you are on the right path, but you have to make sure that you have enough images for training and that the flowers in each image are clearly visible.
Try it

Advice for Object Detection on Embedded System with no non-standard libraries

I am looking for some advice on a good way to detect either square or circular objects in an image. I currently have a Canny edge algorithm running on the original greyscale image and I can produce this output:
http://imgur.com/FAwowr1
Now I can see that there is a cubesat in this picture, but what is a good, computationally efficient way for the program to see that as well? I have looked at the Hough transform, but that seems to be very computation heavy. I have also looked at Harris corner detection, but I feel I would get too many false positives, since I am essentially looking to isolate pictures that contain said cube satellite.
Anyone have any thoughts on some good algorithms to pursue? I am very limited on space so I cannot use any large external libraries like opencv. (This is all in C btw)
Many Thanks!
I would look into what is called mathematical morphology.
Basically you operate on binary images, so you must find a clever way to threshold them first. Then you do operations such as erosion and dilation with a well-selected structuring element to extract areas of interest in your image.
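The erosion and dilation steps described above can be sketched in a few lines. The example is written in Python with NumPy only to keep it short; the question targets C, and the same nested loops over a small window translate directly. The function names, the 3x3 square structuring element and the toy image are assumptions for illustration.

```python
import numpy as np

def erode(img, k=3):
    # A pixel stays set only if every pixel under the k x k square is set.
    pad = k // 2
    padded = np.pad(img, pad, mode="constant", constant_values=False)
    out = np.ones_like(img)
    for dy in range(k):
        for dx in range(k):
            out &= padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def dilate(img, k=3):
    # A pixel becomes set if any pixel under the k x k square is set.
    pad = k // 2
    padded = np.pad(img, pad, mode="constant", constant_values=False)
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out |= padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out

def opening(img, k=3):
    # Erosion followed by dilation: removes specks smaller than the
    # structuring element while keeping larger blobs at their original size.
    return dilate(erode(img, k), k)

# Toy binary image: a 5x5 square "object" plus one isolated noise pixel.
img = np.zeros((9, 9), dtype=bool)
img[2:7, 2:7] = True
img[0, 8] = True

opened = opening(img)
print(opened.astype(int))
```

After the opening, the isolated pixel is gone and the 5x5 square is unchanged, so what remains are candidate regions whose size and shape can be checked cheaply.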

Best method of turning millions of x,y,z particle positions into a visualisation

I'm interested in different algorithms people use to visualise millions of particles in a box. I know you can use Cloud-In-Cell, adaptive mesh, Kernel smoothing, nearest grid point methods etc to reduce the load in memory but there is very little documentation on how to do these things online.
i.e. I have an array with:
x,y,z
1,2,3
4,5,6
6,7,8
xi,yi,zi
for i = 100 million, for example. I don't want a package like Mayavi/Paraview to do it; I want to code this myself and then load the decomposed matrix into Mayavi (rather than on-the-fly rendering). My poor 8Gb Macbook explodes if I try to use the particle positions. Any tutorials would be appreciated.
Analysing and creating visualisations for complex multi-dimensional data is hard. The best visualisation almost always depends on what the data is, and what relationships exist within the data. Of course, you probably want to create a visualisation of the data to show and explore relationships. Ultimately, this comes down to trying different possibilities.
My advice is to think about the data, and try to find sensible ways to slice up the dimensions. 3D plots, like surface plots or voxel renderings may be what you want. Personally, I prefer trying to find 2D representations, because they are easier to understand and to communicate to other people. Contour plots are great because they show 3D information in a 2D form. You can show a sequence of contour plots side by side, or in a timelapse to add a fourth dimension. There are also creative ways to use colour to add dimensions, while keeping the visualisation comprehensible -- which is the most important thing.
I see you want to write the code yourself. I understand that. Doing so will take a non-trivial effort, and afterwards, you might not have an effective visualisation. My advice is this: use a tool to help you prototype visualisations first! I've used gnuplot with some success, although I'm sure there are other options.
Once you have a good handle on the data, and how to communicate what it means, then you will be well positioned to code a good visualisation.
UPDATE
I'll offer a suggestion for the data you have described. It sounds as though you want/need a point density map. These are popular in geographical information systems, but have other uses. I haven't used one before, but the basic idea is to use a function to estimate the density in a 3D space. The density becomes the fourth dimension. Something relatively simple, like the equation below, may be good enough.
The point density map might be easier to slice, summarise and render than the raw particle data.
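One simple way to build such a density map is nearest-grid-point binning, which the question also mentions: count how many particles fall into each cell of a regular 3D grid. The sketch below uses NumPy and processes the particles in chunks so that the full position array never has to be in memory at once. The function name, the grid size, the chunking scheme and the random test data are assumptions for illustration.

```python
import numpy as np

def density_grid(chunks, bounds, n_cells=64):
    """Accumulate particle counts on a regular 3D grid.

    chunks : iterable of (N, 3) arrays of x, y, z positions
    bounds : ((xmin, xmax), (ymin, ymax), (zmin, zmax)) of the box
    """
    edges = [np.linspace(lo, hi, n_cells + 1) for lo, hi in bounds]
    grid = np.zeros((n_cells, n_cells, n_cells), dtype=np.float64)
    for xyz in chunks:
        counts, _ = np.histogramdd(xyz, bins=edges)
        grid += counts
    return grid

# Stand-in data: 10,000 random particles in a unit box, fed in two chunks.
rng = np.random.default_rng(0)
particles = rng.random((10000, 3))
chunks = [particles[:5000], particles[5000:]]

grid = density_grid(chunks, bounds=((0, 1), (0, 1), (0, 1)), n_cells=16)
projection = grid.sum(axis=2)  # 2D map obtained by summing along z
print(grid.shape, projection.shape)
```

The grid has a fixed size regardless of the number of particles (64 cells per side is about 2 MB in float64), so it can be sliced, projected or handed to a renderer much more cheaply than the raw positions.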
The data I have analysed has been of a different nature, so I have not used this particular method before. Hopefully it proves helpful.
PS. I've just seen your comment below, and I'm not sure that this information will help you with that. However, I am posting my update anyway, just in case it is useful information.

Resources