Quantization for object detection

How does quantization for object detection models differ from that for classification models?
Since detection models need to handle bounding-box coordinates (multiple objects per input), there must be some scaling trick involved in the quantization.

You can look at the SSD models in the TensorFlow model zoo here.
SSD is a single-shot detection model: it extracts features in a single pass over the image and produces a classification score for each label along with box coordinates. This type of model is very useful for detecting multiple kinds of objects.
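As a rough illustration of how such a model is commonly quantized, here is a minimal post-training quantization sketch with the TF Lite converter. The SavedModel path, input resolution, and the random calibration tensors are placeholders; detection models from the zoo usually need to be exported in a TFLite-friendly form first, and a real representative dataset should come from your own images so the converter can calibrate scales for both the class scores and the box coordinates.

```python
import tensorflow as tf

# Placeholder path to an exported SSD detection SavedModel.
saved_model_dir = "ssd_mobilenet_v2/saved_model"

converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]

# Calibration data for full-integer quantization; replace the random tensors
# with real preprocessed frames.
def representative_dataset():
    for _ in range(100):
        yield [tf.random.uniform([1, 300, 300, 3], 0.0, 1.0)]

converter.representative_dataset = representative_dataset

tflite_model = converter.convert()
with open("ssd_quantized.tflite", "wb") as f:
    f.write(tflite_model)
```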

Related

Multiple weights per Edge in a JGraphT DAG

Is there a way in JGraphT that I can assign multiple weights to a single edge? For example, suppose I have a graph representing travel time between cities. I want to assign edge weights for "time by plane", "time by car", "time by bus", etc., and then find the least-cost route for some specified mode of travel.
One approach I can think of is to have a distinct graph for each travel mode and then add every city vertex to every graph, but that seems like a messy and memory-intensive solution.
My next thought was that I might be able to extend the class implementing the graph (probably DirectedWeightedPseudograph) and customize the getEdgeWeight() method to take an additional argument specifying which weight value to use. That, however, would require extending all the algorithm classes as well (e.g., DijkstraShortestPath), which I am trying to avoid.
To get around that problem I considered the following:
Extend my graph class by adding a setWeightMode(enum mode) method.
Customize the getEdgeWeight() method to use the currently assigned mode to determine which weight value to return to the caller.
On the plus side, it would be 100% transparent to any existing analysis classes. On the negative side, it would not be thread-safe.
At this point I'm out of ideas. Can anyone suggest an approach that is scalable for large graphs, supports multi-threading, and minimizes the need to re-implement code already provided by JGraphT?
There exists a much easier solution: you want to use the AsWeightedGraph class. This is a wrapper class that allows you to create different weighted views of an underlying graph. From the class description:
Provides a weighted view of a graph. The class stores edge weights internally. All getEdgeWeight calls are handled by this view; all other graph operations are propagated to the graph backing this view.
This class can be used to make an unweighted graph weighted, to override the weights of a weighted graph, or to provide different weighted views of the same underlying graph. For instance, the edges of a graph representing a road network might have two weights associated with them: a travel time and a travel distance. Instead of creating two weighted graphs of the same network, one would simply create two weighted views of the same underlying graph.
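To make the weighted-view idea concrete, here is a small language-agnostic sketch in Python (not JGraphT code; the class and method names are made up for illustration): one underlying graph, several views that each carry their own edge-weight map, and a shortest-path routine that only ever asks the view for weights.

```python
import heapq
from collections import defaultdict

class Graph:
    """Structure only: cities and the roads between them, no weights."""
    def __init__(self):
        self.adj = defaultdict(set)

    def add_edge(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)

class WeightedView:
    """Wraps a Graph and supplies weights from its own per-mode map."""
    def __init__(self, graph, weights):
        self.graph = graph
        self.weights = weights  # (u, v) -> weight for this travel mode

    def edge_weight(self, u, v):
        return self.weights.get((u, v), self.weights.get((v, u), float("inf")))

def dijkstra(view, source, target):
    """Plain Dijkstra that reads weights only through the view."""
    dist = {source: 0.0}
    pq = [(0.0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v in view.graph.adj[u]:
            nd = d + view.edge_weight(u, v)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return float("inf")

# Two views over the same road network: travel time by car vs. by bus.
g = Graph()
g.add_edge("A", "B"); g.add_edge("B", "C"); g.add_edge("A", "C")
by_car = WeightedView(g, {("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 5.0})
by_bus = WeightedView(g, {("A", "B"): 4.0, ("B", "C"): 4.0, ("A", "C"): 3.0})
print(dijkstra(by_car, "A", "C"))  # 2.0 via B
print(dijkstra(by_bus, "A", "C"))  # 3.0 direct
```

In JGraphT, AsWeightedGraph plays the role of WeightedView: it implements the same Graph interface, so the stock algorithms such as DijkstraShortestPath can be run directly on each view without any re-implementation.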

What is the canonical method for varying filter parameters based on the output of a preceding filter in Gstreamer?

I am looking at building an application using GStreamer, but first have some questions regarding its capabilities with respect to a desired use case.
Say I wanted to build a pipeline that processes video data in a similar way as depicted below.
Videosrc -> Facedetect -> Crop -> Videosink
What is the canonical method for taking metadata produced on each frame by a given video filter (e.g. the bounding box from a facial-detection filter) and passing it to a succeeding filter to operate on (e.g. the Crop filter cropping each frame to the bounding box provided by Facedetect)?
I know there are properties and dynamic properties, but as far as I can tell from the docs, both of those require knowing what you want to happen when you construct the pipeline.
I also know that you can attach metadata to the GstBuffer object, which could potentially be used, but that would require an agreed-upon interface, which doesn't seem very portable and may lack support across elements with the same capabilities.
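For reference, one possible shape of the per-frame approach (not necessarily the canonical one) is a buffer probe on the crop element's sink pad that reads whatever the detector attached and re-parameterizes the crop. This is a minimal sketch assuming the standard videocrop element; get_face_bbox() is a hypothetical stand-in for reading the detector's metadata (e.g. a region-of-interest meta or an element message), and videotestsrc stands in for the real source plus detection element.

```python
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst, GLib

Gst.init(None)

pipeline = Gst.parse_launch(
    "videotestsrc ! videoconvert ! videocrop name=crop ! videoconvert ! autovideosink"
)
crop = pipeline.get_by_name("crop")

def get_face_bbox(buf):
    # Hypothetical helper: extract (x, y, w, h) attached upstream per frame.
    return 100, 50, 320, 240

def on_buffer(pad, info):
    x, y, w, h = get_face_bbox(info.get_buffer())
    s = pad.get_current_caps().get_structure(0)
    width, height = s.get_value("width"), s.get_value("height")
    # Convert the bounding box into videocrop's left/right/top/bottom margins.
    crop.set_property("left", x)
    crop.set_property("top", y)
    crop.set_property("right", max(0, width - (x + w)))
    crop.set_property("bottom", max(0, height - (y + h)))
    return Gst.PadProbeReturn.OK

crop.get_static_pad("sink").add_probe(Gst.PadProbeType.BUFFER, on_buffer)
pipeline.set_state(Gst.State.PLAYING)
GLib.MainLoop().run()
```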

Feature selection for Logistic Regression

Both the Kaplan-Meier method and logistic regression have their own feature-selection approaches. I want to use another method to pick the best features, for example backward stepwise feature selection. Is it possible to use this sort of method instead?
My data has more than 130 features and about 3,000 individuals. Since it is medical (cancer) data, I don't want to use simple methods.
Further information about the project can be seen here; the planned workflow, in order, is:
Preprocessing the data
Splitting it into train and test sets
Imputing missing values in the training data
Selecting features using the training data
Training the models, which are Kaplan-Meier and logistic regression
Testing the models
Please let me know whether it is wrong to use any other feature-selection method with these models.
Any tips about the models I have listed are also welcome.
Basically, there are four types of feature-selection (FS) techniques:
1. Filter-based FS
2. Wrapper-based FS
3. Embedded FS techniques
4. Hybrid FS techniques
Each has its own advantages and disadvantages. For example, filter-based FS checks whether a single feature is related to the output variable, so if you have 400 features in your dataset, you would have to repeat the test 400 times!
Wrapper-based methods (as you mentioned in your question), on the other hand, do this in one step, but they are prone to overfitting, whereas filter-based methods are not.
Embedded methods use tree-based models for feature selection.
I do not have enough knowledge about hybrid methods.
I would say you could use a wrapper-based technique like RFECV, since you say you do not want to use simple filter techniques.
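A minimal scikit-learn sketch of RFECV follows; the synthetic data only mimics the stated shape (about 3,000 patients by 130 features), and the estimator, scoring metric, and fold count are assumptions to adapt to your data.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, train_test_split

# Stand-in data matching the described size; replace with your own matrix.
X, y = make_classification(n_samples=3000, n_features=130,
                           n_informative=15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

selector = RFECV(
    LogisticRegression(max_iter=1000),
    step=1,                      # drop one feature per elimination round
    cv=StratifiedKFold(5),
    scoring="roc_auc",
)
selector.fit(X_train, y_train)   # select on the training split only

print("features kept:", selector.n_features_)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)   # apply the same mask to the test split
```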

Facial Detection with LBPH - Extracting Features

I've created the framework of the system, which takes a picture, converts it to an LBPH image, and then gets the histograms from each tile of the grid (8x8). I'm following this paper on it, but am confused about what to do next to identify features after step 4. Do I just compare each square of the grid with a set of known feature squares and find the closest match? This is my first facial-detection program, so I'm very new to it.
So basically, image processing works like this: pixel intensity values are too variable and uninformative by themselves for algorithms to make sense of an image; the local relationships between pixel intensity values are much more useful. Image processing for recognition and detection is therefore basically a two-step process.
Feature extraction - Transform the low-level, high-variance, uninformative features such as pixel intensities into a high-level, lower-variance, more informative feature set (e.g. edges, visual patterns); this is what is referred to as feature extraction. Over the years, a number of feature-extraction mechanisms have been suggested, such as edge detection with Sobel filters, histograms of oriented gradients (HOG), Haar-like features, scale-invariant feature transform (SIFT), and LBPH, which you are trying to use. (Note that in most modern applications that are not computationally limited, convolutional neural networks (CNNs) are used for the feature-extraction step because they empirically work much better.)
Use the transformed features - Once more useful information (a more informative set of features) has been extracted, you need to use these features to perform the reasoning operation you're hoping to accomplish. In this step, you fit a model (function approximator) such that, given your high-level features as input, the model outputs the information you want (in this case, a classification of whether an image contains a face, I think). Thus, you need to select and fit a model that can make use of the high-level features for classification. Some classic approaches to this include decision trees, support vector machines, and neural networks. Essentially, model fitting is a standard machine-learning problem and will require a labelled set of training data to "teach" the model what the high-level feature set looks like for an image that contains a face versus an image that does not.
It sounds like your code in its current state is missing the second piece. As a good starting place, look into scikit-learn's decision tree package.
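As a rough sketch of that second piece, here is a decision tree fitted on tile-histogram features with scikit-learn; the array shapes, bin count, and random data are placeholders for your real concatenated LBP histograms and face/non-face labels.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n_bins = 59                                # assumed histogram bins per tile
X = rng.random((500, 8 * 8 * n_bins))      # stand-in for concatenated LBP histograms
y = rng.integers(0, 2, size=500)           # 1 = face, 0 = non-face

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(max_depth=10, random_state=0)
clf.fit(X_train, y_train)                  # learn face vs. non-face from histograms
print("held-out accuracy:", clf.score(X_test, y_test))
```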

How does Ludwig encode images

I’m looking to understand how Ludwig encodes images. Does it run the images through a pretrained model without computing a loss, or does it compute a loss? If so, what type of loss is used for a large feature set?
All the documentation regarding image pre-processing and encoding can be found here.
In summary:
Currently, two encoders are supported for images: the convolutional stack encoder and the ResNet encoder, which can be selected by setting the encoder parameter to stacked_cnn or resnet in the input feature dictionary of the model definition (stacked_cnn is the default).
Each of these encoders has configurable hyper-parameters. Ludwig does not run the images through a pre-trained model, as that would defeat the purpose: you are training your own model with your own data.
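For illustration, a minimal model definition selecting the image encoder might look like the sketch below; the feature names and CSV path are placeholders, and the exact config keys can vary between Ludwig versions (newer releases nest the encoder settings).

```python
from ludwig.api import LudwigModel

# Placeholder feature names and dataset path; adjust to your data.
config = {
    "input_features": [
        {"name": "image_path", "type": "image", "encoder": "stacked_cnn"}  # or "resnet"
    ],
    "output_features": [
        {"name": "label", "type": "category"}
    ],
}

model = LudwigModel(config)
results = model.train(dataset="training_data.csv")
```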
