Feature selection for Logistic Regression

Both the Kaplan-Meier method and logistic regression have their own forms of feature selection. I want to use another method to pick the best features, for example backward stepwise feature selection. Is it possible to use this sort of method instead?
My data contains more than 130 features and about 3000 individuals. Since it is medical (cancer) data, I don't want to use overly simple methods.
Further information about the project is listed here, in the order of the steps I plan to take:
Preprocessing the data
Separating the data into train and test sets
Data imputation for the train data
Feature selection using the train data
Training the models, which are Kaplan-Meier and logistic regression
Testing the models
Please let me know whether it is wrong to use a different feature-selection method with these models.
Any tips about the models I have listed are also welcome.

Basically there are four types of feature selection (FS) techniques, namely:
1.) Filter-based FS
2.) Wrapper-based FS
3.) Embedded FS techniques
4.) Hybrid FS techniques
Each has its own advantages and disadvantages. For example, filter FS is used when you want to determine whether "one" feature, taken on its own, is important to the output variable. So if you have 400 features in your dataset, you would have to repeat this 400 times!
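In practice a library does the per-feature scoring for you. A rough filter-based sketch with scikit-learn, in which the file name and the binary outcome column "event" are hypothetical:

    import pandas as pd
    from sklearn.feature_selection import SelectKBest, f_classif

    # Hypothetical training data with a binary outcome column "event".
    train = pd.read_csv("train.csv")
    X = train.drop(columns=["event"])
    y = train["event"]

    # Score every feature individually against the outcome (ANOVA F-test),
    # then keep the 20 highest-scoring ones.
    selector = SelectKBest(score_func=f_classif, k=20).fit(X, y)
    print(list(X.columns[selector.get_support()]))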
Wrapper-based methods (as you mentioned in your question), on the other hand, do this in one step. But they are prone to overfitting, whereas filter-based methods are not.
Embedded methods perform feature selection as part of model training, for example tree-based methods.
I do not have enough knowledge about hybrid methods.
I would say you could use a wrapper-based technique like RFECV (recursive feature elimination with cross-validation), since you say you do not want to use simple filter techniques.
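For example, a minimal wrapper-based sketch with scikit-learn's RFECV, again assuming a hypothetical binary outcome column named "event":

    import pandas as pd
    from sklearn.feature_selection import RFECV
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold

    # Hypothetical training data: ~130 features plus a binary outcome column "event".
    train = pd.read_csv("train.csv")
    X = train.drop(columns=["event"])
    y = train["event"]

    # Recursive feature elimination with cross-validation, wrapped around the same
    # logistic regression that will later be used as the final model.
    selector = RFECV(
        estimator=LogisticRegression(max_iter=1000),
        step=1,                          # drop one feature per elimination round
        cv=StratifiedKFold(n_splits=5),
        scoring="roc_auc",
    )
    selector.fit(X, y)

    print(selector.n_features_, "features kept:", list(X.columns[selector.support_]))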

Related

Facial Detection with LBPH - Extracting Features

I've created the framework of the system, which takes a picture, converts it to an LBPH image, and then gets the histograms from each tile of the grid (8x8). I'm following this paper on it, but am confused about what to do next to identify features after step 4. Do I just compare each square of the grid with a set of known feature squares and find the closest match? This is my first facial detection program, so I'm very new to it.
So basically, image processing works like this. Pixel intensity values are far too variant and uninformative by themselves for algorithms to make sense of an image. Much more useful are the local relationships between pixel intensity values. So image processing for recognition and detection is basically a two-step process.
Feature Extraction - Transform the low-level, high-variance, uninformative features such as pixel intensities into a high-level, lower-variance, more informative feature set (e.g. edges, visual patterns, etc.); this is referred to as feature extraction. Over the years, a number of feature extraction mechanisms have been suggested, such as edge detection with Sobel filters, histograms of oriented gradients (HOG), Haar-like features, scale-invariant feature transform (SIFT), and LBPH as you are trying to use. (Note that in most modern applications that are not computationally limited, convolutional neural networks (CNNs) are used for the feature extraction step because they empirically work much, much better.)
Use Transformed Features - once the more useful information (a more informative set of features) has been extracted, you need to use these features to perform the reasoning operation you're hoping to accomplish. In this step, you fit a model (function approximator) such that, given your high-level features as input, the model outputs the information you want (in this case a classification of whether an image contains a face, I think). Thus, you need to select and fit a model that can make use of the high-level features for classification. Some classic approaches to this include decision trees, support vector machines, and neural networks. Essentially, model fitting is a standard machine learning problem, and it will require using a labelled set of training data to "teach" the model what the high-level feature set will look like for an image that contains a face versus an image that does not.
It sounds like your code in its current state is missing this second piece. As a good starting place, look into scikit-learn's decision tree package.
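As a rough sketch of that second step, assuming the per-tile LBPH histograms have already been flattened into one feature vector per image and stored alongside hypothetical face/no-face labels:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    # X: one row per image, each row the concatenated histograms of the 8x8 grid.
    # y: 1 if the image contains a face, 0 otherwise (hypothetical labels).
    X = np.load("lbph_features.npy")
    y = np.load("labels.npy")

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    # Fit a decision tree on the extracted features and check held-out accuracy.
    clf = DecisionTreeClassifier(max_depth=10)
    clf.fit(X_train, y_train)
    print("held-out accuracy:", clf.score(X_test, y_test))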

Kmeans as custom layer in functional model

We are planning to use kmeans to split our data and have 10 separate fully connected models to estimate results for each group from kmeans separately.
One obvious way is to have 10 separate tfjs models and separate kmeans on the beginning.
Since tfjs supports functional models and custom layers, an alternative is to have kmeans as the first custom layer and then several dense layers connected to it. Is it possible to use the existing layer API to receive 20 tensors, perform kmeans, and produce 10 different sets of 20 tensors as output to the next layers? Do you see any issues with this approach? Is there another alternative?
Kmeans is not yet implemented in tfjs, and even if it were, it could not be considered a layer in itself. You can, however, create a two-stage model in your class, assuming you manage to provide your own implementation of kmeans.
You'll simply have to pass the result of one model to the other using a conditional statement. The first model (kmeans) will output the class of the data, and the second model (one out of 10) is chosen based on the output of the first model.
Having said that, all of this can be done in one shot using either the sequential API tf.sequential or the functional one tf.model. There are kmeans implementations in JS that return JS arrays as vectors. These arrays can be converted to tensors whose shapes will determine the shapes of the layers. Using a fully connected network (FCNN), we can have an output for each kmeans class.
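To illustrate just the two-stage routing idea (not the tfjs layer API), here is a minimal sketch in Python with scikit-learn and synthetic data; a tfjs version would follow the same control flow, with small dense networks as the per-cluster models:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LinearRegression

    # Synthetic stand-in for the real 20-feature inputs and targets.
    X = np.random.rand(1000, 20)
    y = X.sum(axis=1) + 0.1 * np.random.rand(1000)

    # Stage 1: k-means assigns every sample to one of 10 groups.
    km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)

    # Stage 2: one separate model per cluster, trained only on that cluster's data.
    models = {c: LinearRegression().fit(X[km.labels_ == c], y[km.labels_ == c])
              for c in range(10)}

    # Prediction: route each new sample to the model chosen by its cluster.
    def predict(x):
        c = int(km.predict(x.reshape(1, -1))[0])
        return models[c].predict(x.reshape(1, -1))[0]

    print(predict(np.random.rand(20)))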

How to model Data Transfer Objects for different front ends?

I've run into a recurring problem for which I haven't found any good examples or patterns.
I have one core service that performs all heavy database operations and sends results to different front ends (HTML, Silverlight/Flash, web services, etc.).
One of the service operations is "GetDocuments", which provides a list of documents based on different filter criteria. If I only had one front end, I would package the result in a list of Document DTOs (data transfer objects) that just contain the data. However, different front ends need different amounts of "metadata". The simplest client just needs the document headline and a link reference. Other clients want a short text snippet of the document, another one also wants a thumbnail, and a third wants the name of the author. It's basically up to the implementation of the GUI what needs to be displayed.
What's the best way to model this?
As a lot of different DTOs (Document, DocumentWithThumbnail, DocumentWithTextSnippet)
tends to become a lot of classes
As one DTO containing all the data, where the client choose what to display
Lots of unnecessary data sent
As one DTO where certain fields are populated based on what the client requested
Tends to become a very large class that needs to be extended over time
One DTO but with some kind of generic "Metadata" field containing requested metadata.
Or are there other options?
Since I want a high performance service, I need to think about both network load and caching strategies.
Does anyone have any good patterns or practices that might help me?
What I would do is give the front end the ability to request the presence of the wanted metadata (say, getDocument(WITH_THUMBNAILS | WITH_TEXT_SNIPPET)).
Then the DTO is built with only the requested information.
Adding all the possible metadata is, as you said, unacceptable.
I would surely stay with one class defining all the possible methods (getTitle(), getThumbnail()), and if possible it would return a placeholder when the thumbnail was not requested. Something like "Image not available".
If you want to model this like a pattern, take a look at the factory patterns.
Hope this helps you.
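A minimal sketch of that flag-based request, written in Python purely for illustration; the DTO fields, flag names, and the in-memory lookup are all hypothetical stand-ins for the real service:

    from dataclasses import dataclass
    from enum import Flag, auto
    from typing import Optional

    class DocParts(Flag):
        NONE = 0
        THUMBNAIL = auto()
        TEXT_SNIPPET = auto()
        AUTHOR = auto()

    @dataclass
    class DocumentDto:
        title: str
        link: str
        thumbnail: Optional[bytes] = None   # populated only when requested
        text_snippet: Optional[str] = None
        author: Optional[str] = None

    # Stand-in for the real database call.
    _DOCS = {42: {"title": "Annual report", "snippet": "Revenue grew...", "author": "J. Doe"}}

    def get_document(doc_id: int, parts: DocParts = DocParts.NONE) -> DocumentDto:
        row = _DOCS[doc_id]
        dto = DocumentDto(title=row["title"], link=f"/docs/{doc_id}")
        if DocParts.TEXT_SNIPPET in parts:
            dto.text_snippet = row["snippet"]
        if DocParts.AUTHOR in parts:
            dto.author = row["author"]
        return dto

    # The simplest client asks for nothing extra; richer clients request exactly what they display.
    print(get_document(42, DocParts.TEXT_SNIPPET | DocParts.AUTHOR))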
Is there any noticeable cost to creating a DTO that has all the data any of your views could need and using it everywhere? I would do that, especially since it insulates you from a requirement change down the line where one of the views has to incorporate data another view already uses.
For example, maybe your Silverlight/Flash view doesn't show the title itself because it's in the thumbnail now, but later they decide they want to sort by it.
To clarify, I do not necessarily think you need to pass down all of the data every time, but I do think your DTO class should define all of it. Just don't fall into the pits of premature optimization or analysis paralysis. Do the simplest thing first, then justify added complexity. Throw it all in and profile it. If the performance is unacceptable, optimize and try again.

What are the best practices for database development with Delphi?

How can I use the RAD way productively (reusing code)? Are there any samples, existing libraries, or basic CRUD generators?
How can I design the OOP way? Which design patterns should be used for connections, for abstracting the different engines/DB access layers (BDE, dbExpress, ADO), and for basic CRUD operations?
I have my own Delphi/MySQL framework that lets me add 'new screens' very rapidly. I won't share it, but I can describe the approach I take:
I use a tabbed interface with a TFrame based hierarchy. I create a tab and link a TFrame into it.
I take care of all the CRUD plumbing and concurrency controls using a standard MySQL stored procedure implementation: CustomerSEL, CustomerGET, CustomerUPD, CustomerDEL, etc.
My main form essentially contains a navbar panel and a panel containing a TPageControl.
An example of the classes in my hierarchy
TFrame
TMFrame - my derivation, with interface implementations capturing OnShow, OnHide, and some other particulars
--TWebBrowserFrame
--TDataAwareFrame
--TObjectEditFrame
--TCustomerEditFrame
--TOrderEditFrame
etc...
--TObjectListFrame
--TCustomerListFrame
etc...
and some dialogs..
TDialog
TMDialog
--TDataAwareDialog
--TObjectEditDialog
-- TContactEditDialog
etc..
--TObjectSelectDialog
--TContactSelectDialog
etc...
When I add a new object to manage, it could be a new attribute of customers, let's say we want to track which vehicles a customer owns.
create table CustomerVehicles
I run my special sproc generator that creates my SEL, GET, UPD, DEL
test those...
Derive from the base classes I mentioned above, drop some controls. Add a tab to the TCustomerEdit.
Delphi has always had the dataset as the abstraction layer; expose this to your GUI via data sources. Add the dataset to the customer data module and "register" it, via my own custom function in my derived data module class, TMDataModule.
Security control is similarly taken care of in the framework: I "register" components that require a security flag to be visible or enabled.
I can usually add a new object, build the sprocs, add the maintenance screens within an hour.
Of course, that is usually just the start; usually when you add something, you use it for more than tracking. If this is a garage application, we want to add the vehicle the customer brought into the garage and ID it so we can track its history. But even so, it is fast.
I have tried subcontracting to younger guys using 'newer development tools', and they never seem to believe me when I say I can do this all ten times faster with Delphi! I can do in two hours bug-free what it seems to take them two days and they still have bugs...
DO - Be careful planning your VFI (visual form inheritance)! As someone mentioned, if you want to change the name of a component on one of your parent classes, be prepared for trouble. You will need to open and 'edit' each child in the hierarchy, and even if you clean the DCUs you can still have some DFM hell. I can assure you that in 2006 this is still a problem.
DON'T create one monster datamodule
DO take your time in the upfront design, refactoring after you have created a ton of dependents can be a fun challenge, but a nightmare when you have to get something new working quickly!
Be very careful if you use the "put every DB object into one big data module" (or "a few big data modules" in huge applications) approach. This can make the data module so big that you will need an HD monitor to see all the TxDataSet components on it.
Bottom line: switch to using specialized classes for business logic instead of big global data modules. Use global data modules with logic ONLY in very small projects.
Well, I strongly suggest using actions (TActionList) when designing your user interface. There are many predefined actions, including Next/Prev/Insert/Delete/Edit/Update operations that can be performed on datasets, so it is good practice to use these actions and link them to the buttons/menus on your forms. This prevents repeated code for UI logic.
There is no need for a CRUD generator for Delphi! Add a TDataSource, TDBGrid and TActionList to a form, add the predefined data source actions to the action list, link those actions to buttons or menus, and you are done!
For large applications, I use the tiOPF object persistence framework. That lets me deal with objects rather than datasets and swap databases easily. Most of my business logic moves into the business object model (BOM) and my forms are pretty dumb. tiOPF has a few ways to connect the BOM to forms: persistence-aware controls, TtiDataset for data-aware controls, and Model-GUI-Mediator classes for connecting to normal controls.
For small and quick apps, I just use data modules and database components. The main things to remember are:
Put as much code in the data modules (and as little in the forms) as possible.
Do multiple data modules broken down by functionality, e.g. the email module, the income module, the invoicing module...
Test, test, test
Use VFI (visual form inheritance). Design a standard DB form: for example, an empty DataSet, a DataSource, and a PageControl consisting of 2 sheets. The first sheet will be empty; later on you'll add edit controls to manipulate data on the child forms. Add a DBGrid to the second sheet. Beware, this isn't the OOP way, but it's easy and fast.
I would take a look at Data Abstract from RemObjects.

Any business examples of using Markov chains?

What business cases are there for using Markov chains? I've seen the sort of play-area example of a Markov chain applied to someone's blog to write a fake post. I'd like some practical examples though, e.g. something useful in business, prediction of the stock market, or the like...
Edit: Thanks to all who gave examples, I upvoted each one as they were all useful.
Edit2: I selected the answer with the most detail as the accepted answer. All answers I upvoted.
The obvious one: Google's PageRank.
Hidden Markov models are based on a Markov chain and extensively used in speech recognition and especially bioinformatics.
I've seen spam email that was clearly generated using a Markov chain -- certainly that qualifies as a "business use". :)
There is a class of optimization methods based on Markov Chain Monte Carlo (MCMC) methods. These have been applied to a wide variety of practical problems, for example signal & image processing applications to data segmentation and classification. Speech & image recognition, time series analysis, lots of similar examples come out of computer vision and pattern recognition.
We use log-file chain-analysis to derive and promote secondary and tertiary links to otherwise-unrelated documents in our help-system (a collection of 10m docs).
This is especially helpful in bridging otherwise separate taxonomies. e.g. SQL docs vs. IIS docs.
I know AccessData uses them in their forensic password-cracking tools. It lets you explore the more likely password phrases first, resulting in faster password recovery (on average).
Markov chains are used by search companies like Bing to infer the relevance of documents from the sequence of clicks made by users on the results page. The underlying user behaviour in a typical query session is modeled as a Markov chain, with particular behaviours as state transitions...
For example, if the document is relevant, a user may still go on to examine more documents, but with a smaller probability; if it is not relevant, he is likely to examine more documents with a much larger probability.
There are some commercial ray tracing systems that implement Metropolis Light Transport (invented by Eric Veach; basically he applied Metropolis-Hastings to ray tracing), and bidirectional and importance-sampling path tracers also use Markov chains.
These terms are googlable; I omitted further explanation for the sake of this thread.
We plan to use it for predictive text entry on a handheld device for data entry in an industrial environment. In a situation with a reasonable vocabulary size, transitions to the next word can be suggested based on frequency. Our initial testing suggests that this will work well for our needs.
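A toy sketch of that idea: build word-to-word transition counts from previously entered phrases and suggest the most frequent next words (the corpus below is made up):

    from collections import Counter, defaultdict

    # Hypothetical corpus of previously entered phrases.
    corpus = ["check oil level", "check oil filter", "check tire pressure", "replace oil filter"]

    # Count word-to-word transitions: a first-order Markov chain over words.
    transitions = defaultdict(Counter)
    for phrase in corpus:
        words = phrase.split()
        for prev, nxt in zip(words, words[1:]):
            transitions[prev][nxt] += 1

    def suggest(word, k=3):
        """Suggest the k most frequent next words after `word`."""
        return [w for w, _ in transitions[word].most_common(k)]

    print(suggest("oil"))   # -> ['filter', 'level'] for this toy corpus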
IBM has CELM. Check out this link:
http://www.research.ibm.com/journal/rd/513/labbi.pdf
I recently stumbled on a blog example of using Markov chains for creating test data...
http://github.com/emelski/code.melski.net/blob/master/markov/main.cpp
A Markov model is a way of describing a process that goes through a series of states.
HMMs can be applied in many fields where the goal is to recover a data sequence that is not immediately observable (but depends on some other data on that sequence).
Common applications include:
Cryptanalysis, speech recognition, part-of-speech tagging, machine translation, stock prediction, gene prediction, alignment of bio-sequences, gesture recognition, activity recognition, and detecting the browsing pattern of a user on a website.
Markov chains can be used to simulate user interaction, e.g. when browsing a service.
My friend wrote his diploma thesis on plagiarism recognition using Markov chains (he said the input data must be whole books for it to succeed).
It may not be very 'business', but Markov chains can be used to generate fictitious geographical and person names, especially in RPG games.
Markov chains are used in life insurance, particularly in the permanent disability model. There are 3 states:
0 - The life is healthy
1 - The life becomes disabled
2 - The life dies
In a permanent disability model the insurer may pay some sort of benefit if the insured becomes disabled and/or the life insurance benefit when the insured dies. The insurance company would then likely run a Monte Carlo simulation based on this Markov chain to determine the likely cost of providing such insurance.
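As a toy illustration of that kind of simulation, here is a sketch with made-up yearly transition probabilities and benefit amounts; real figures would come from actuarial tables:

    import random

    # States: 0 = healthy, 1 = disabled, 2 = dead (absorbing).
    # In the permanent disability model there is no recovery, so a disabled
    # life either stays disabled or dies.
    P = {
        0: [0.90, 0.07, 0.03],
        1: [0.00, 0.85, 0.15],
        2: [0.00, 0.00, 1.00],
    }
    DISABILITY_BENEFIT = 10_000   # hypothetical yearly benefit while disabled
    DEATH_BENEFIT = 100_000       # hypothetical benefit paid once on death

    def simulate_policy(years=30):
        state, cost = 0, 0.0
        for _ in range(years):
            state = random.choices([0, 1, 2], weights=P[state])[0]
            if state == 1:
                cost += DISABILITY_BENEFIT
            elif state == 2:
                cost += DEATH_BENEFIT
                break
        return cost

    # Monte Carlo estimate of the expected cost of one policy over 30 years.
    n = 100_000
    print(sum(simulate_policy() for _ in range(n)) / n)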
