Kmeans as custom layer in functional model - tensorflow.js

We are planning to use k-means to split our data and have 10 separate fully connected models, one to estimate results for each k-means group.
One obvious way is to have 10 separate tfjs models, with a separate k-means step at the beginning.
Since tfjs supports functional models and custom layers, an alternative is to have k-means as the first custom layer and several dense layers connected to it. Is it possible to use the existing layer API to receive 20 tensors, perform k-means, and output 10 different sets of 20 tensors to the next layers? Do you see any issues with this approach? Is there another alternative?

K-means is not yet implemented in tfjs, and even if it were, it could not be considered a layer in itself. You can, however, create a two-stage model, assuming you manage to have your own implementation of k-means.
You'll simply have to pass the result of one model to the other using a conditional statement. The first model (k-means) will output the class of the data, and the second model (one out of 10) is chosen based on the output of the first.
Having said that, all of this can be done in one shot using either the sequential API (tf.sequential) or the functional one (tf.model). There are k-means implementations in JS that return JS arrays as vectors. These arrays can be converted to tensors whose shape will determine the shape of the layers. Using a fully connected network, we can have an output for each k-means class.
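Here is a minimal sketch of that two-stage idea. kmeansAssign() is a hypothetical function standing in for your own k-means implementation (it takes plain JS arrays and returns a cluster index per sample); everything else uses the standard tfjs layers API:

const tf = require('@tensorflow/tfjs');

const NUM_CLUSTERS = 10;

// Stage 2: one small fully connected model per k-means cluster.
const models = Array.from({length: NUM_CLUSTERS}, () => {
  const m = tf.sequential();
  m.add(tf.layers.dense({units: 32, activation: 'relu', inputShape: [20]}));
  m.add(tf.layers.dense({units: 1}));
  m.compile({optimizer: 'adam', loss: 'meanSquaredError'});
  return m;
});

// Stage 1 picks the group, stage 2 runs that group's model.
function predict(sample) {                    // sample: plain array of 20 numbers
  const cluster = kmeansAssign([sample])[0];  // hypothetical k-means assignment
  return models[cluster].predict(tf.tensor2d([sample]));
}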

Related

Multiple weights per Edge in a JGraphT DAG

Is there a way in JGraphT that I can assign multiple weights to a single edge? For example, suppose I have a graph representing travel-time between cities. I want to assign edge-weights for "time by plane", "time by car", "time by bus", etc., and then find least-cost route by some specified mode of travel.
One approach I can think of is to have a distinct graph for each travel mode and then add every city vertex to every graph, but that seems like a messy and memory-intensive solution.
My next thought was that I might be able to extend the class implementing the graph (probably DirectedWeightedPseudograph) and customize the getEdgeWeight() method to take an additional argument specifying which weight value to use. That, however, would require extending all the algorithm classes as well (e.g., DijkstraShortestPath), which I am trying to avoid.
To get around that problem I considered the following:
Extend my Graph class by adding a method setWeightMode(enum mode)
Customize the getEdgeWeight() method to use the currently assigned mode to determine which weight value to return to the caller.
On the plus side it would be 100% transparent to any existing analysis classes. On the negative side, it would not be thread-safe.
At this point I'm out of ideas. Can anyone suggest an approach that is scalable for large graphs, supports multi-threading, and minimizes the need to re-implement code already provided by JGraphT?
There exists a much easier solution: you want to use the AsWeightedGraph class. This is a wrapper class that allows you to create different weighted views of an underlying graph. From the class description:
Provides a weighted view of a graph. The class stores edge weights internally. All getEdgeWeight calls are handled by this view; all other graph operations are propagated to the graph backing this view.
This class can be used to make an unweighted graph weighted, to override the weights of a weighted graph, or to provide different weighted views of the same underlying graph. For instance, the edges of a graph representing a road network might have two weights associated with them: a travel time and a travel distance. Instead of creating two weighted graphs of the same network, one would simply create two weighted views of the same underlying graph.
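As a rough sketch of how that looks with the JGraphT 1.x API (the city names and hour values are made up for illustration):

import java.util.HashMap;
import java.util.Map;
import org.jgrapht.Graph;
import org.jgrapht.alg.shortestpath.DijkstraShortestPath;
import org.jgrapht.graph.AsWeightedGraph;
import org.jgrapht.graph.DefaultEdge;
import org.jgrapht.graph.SimpleGraph;

public class TravelModes {
    public static void main(String[] args) {
        // One underlying, unweighted graph of cities.
        Graph<String, DefaultEdge> cities = new SimpleGraph<>(DefaultEdge.class);
        cities.addVertex("A");
        cities.addVertex("B");
        cities.addVertex("C");
        DefaultEdge ab = cities.addEdge("A", "B");
        DefaultEdge bc = cities.addEdge("B", "C");

        // One weight map per travel mode (hours; values are illustrative).
        Map<DefaultEdge, Double> byCar = new HashMap<>();
        byCar.put(ab, 3.0);
        byCar.put(bc, 2.0);
        Map<DefaultEdge, Double> byPlane = new HashMap<>();
        byPlane.put(ab, 1.0);
        byPlane.put(bc, 0.5);

        // Two weighted views of the same graph; no vertices or edges are copied.
        Graph<String, DefaultEdge> carView = new AsWeightedGraph<>(cities, byCar);
        Graph<String, DefaultEdge> planeView = new AsWeightedGraph<>(cities, byPlane);

        // Existing algorithms work on each view unchanged.
        double carTime = new DijkstraShortestPath<>(carView).getPath("A", "C").getWeight();
        double planeTime = new DijkstraShortestPath<>(planeView).getPath("A", "C").getWeight();
        System.out.println("car: " + carTime + "h, plane: " + planeTime + "h");
    }
}

Because each mode is a separate view with its own weight map, there is no shared setWeightMode state, which sidesteps the thread-safety concern as long as the underlying graph itself isn't mutated concurrently.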

Feature selection for Logistic Regression

Both the Kaplan-Meier method and logistic regression have their own feature selection. I want to use another method to pick the best features, for example backward stepwise feature selection. Is it possible to use this sort of method instead?
My data has more than 130 features and about 3,000 individuals. Since it is medical [cancer] data, I don't want to use simple methods.
Further information about the project can be seen here; the steps, in order, are:
Preprocessing the data
Separating it into train and test sets
Data imputation for the train data
Feature selection using the train data
Training the models, which are Kaplan-Meier and logistic regression
Testing the models
Please let me know whether it is wrong to use any other feature selection method with them.
Any tips about the models I have listed are welcome too.
Basically there are four types of feature selection (FS) techniques:
1. Filter-based FS
2. Wrapper-based FS
3. Embedded FS techniques
4. Hybrid FS techniques
Each has its own advantages and disadvantages. For example, filter FS is used when you want to determine whether one feature at a time is important to the output variable, so if you have 400 features in your dataset you would have to repeat this 400 times!
Wrapper-based methods (as you mentioned in your question), on the other hand, do this in one step. But they are prone to overfitting, whereas filter-based methods are not.
Embedded methods perform FS as part of model training, for example via tree-based feature importances.
I do not have enough knowledge about hybrid methods.
I would say you could use a wrapper-based technique like RFECV, since you say you do not want to use simple filter techniques.
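For instance, here is a minimal scikit-learn sketch of RFECV wrapped around logistic regression; the generated X and y are placeholders sized roughly like the data in the question, so substitute your own train split:

from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold

# Placeholder data shaped like the question: ~3000 individuals, 130 features.
X, y = make_classification(n_samples=3000, n_features=130, random_state=0)

# RFECV recursively removes the weakest features and uses cross-validation
# to decide how many features to keep.
selector = RFECV(
    estimator=LogisticRegression(max_iter=1000),
    step=1,                  # drop one feature per iteration
    cv=StratifiedKFold(5),
    scoring="roc_auc",
)
selector.fit(X, y)

print("optimal number of features:", selector.n_features_)
print("selected feature mask:", selector.support_)

Note that, to avoid leakage, the selector should be fit on the train split only, which matches the step order listed in the question.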

How to model nested lists with many items using Google Drive Realtime API?

I'd like to model ordered nested lists of uniform items (like what you would see in a standard tree widget) using the Google Drive realtime API. These trees could get quite large, ideally working well with many thousands of items.
One approach would be:
Item:
title: CollaborativeString
attributes: CollaborativeMap
children: CollaborativeList // recursively holds other items
But I'm unsure whether this is feasible when dealing with a large number of items.
An alternative might be to store all items in tree order in a single CollaborativeList and add an additional "level" attribute, then reconstruct the tree structure from that level on the client. That would mean maintaining one big CollaborativeList instead of thousands of small ones. There are probably lots of other alternatives that I don't know about.
Thanks for any pointers on the best way to model this in the Google Drive Realtime API.
So long as the total size of the document is within the size limits, there shouldn't be a significant performance difference between the approaches from a framework perspective. (One caveat: using ObjectChangedListeners with a highly connected graph may slow things down. Prefer registering listeners on the specific objects instead.)
Modeling it as a real tree makes sense, since that will be the easiest to work with, and you can use the new move operation to atomically rearrange items in the lists.
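A hedged sketch of the tree-as-custom-type route, using the Realtime API's custom object support (the field names come from the question; registration must happen before the document is loaded):

function Item() {}

gapi.drive.realtime.custom.registerType(Item, 'Item');
Item.prototype.title = gapi.drive.realtime.custom.collaborativeField('title');
Item.prototype.attributes = gapi.drive.realtime.custom.collaborativeField('attributes');
Item.prototype.children = gapi.drive.realtime.custom.collaborativeField('children');

// Once the document is loaded, build items whose fields hold collaborative objects.
function buildItem(model, titleText) {
  var item = model.create(Item);
  item.title = model.createString(titleText);
  item.attributes = model.createMap();
  item.children = model.createList();  // recursively holds other Items
  return item;
}

// The move operation mentioned above rearranges children atomically:
// parent.children.move(oldIndex, newIndex);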

Self Tracking Entities Traffic Optimization

I'm working on a personal project using WPF with Entity Framework and Self-Tracking Entities. I have a WCF web service which exposes some methods for the CRUD operations. Today I decided to do some tests to see what actually travels over this service and, even though I expected something like this, I got really disappointed. The problem is that for a simple update (or delete) operation on just one object, let's say a Category, I send to the server the whole object graph, including all of its parent categories, their items, child categories and their items, etc. In my case it was a 170 KB XML payload on a really small database (2 main categories, about 20 categories total, and about 60 items). I can't imagine what will happen if I have a really big database.
I tried to google for some articles concerning traffic optimization with STE, but with no success, so I decided to ask here if somebody has done something similar, knows some good practices, etc.
One of the possible ways I came up with is to get the data I need per object type with more service calls:
return context.Categories.ToList();//only the categories
...
return context.Items.ToList();//only the items
Instead of:
return context.Categories.Include("Items").ToList();
This way the categories and the items are separated, and when making changes or deleting objects, less data is sent over the wire.
Has any of you faced a similar problem, and did you manage to solve it? If so, how?
We've encountered similar challenges. The first measure, as you already mentioned, is to keep the entities as small as possible (as dictated by the desired client functionality). The second, when sending entities back over the wire to be persisted, is to strip all navigation properties (nested objects) that haven't changed. This sounds very simple but is not at all trivial. What we do is recursively dig into the entities present in the trackable collections of, say, the "topmost" entity (and their trackable collections, and theirs, and so on) and remove them when their change-tracking state is Unchanged. But be careful with this, because in some cases you still need these entities: they may have been removed from or added to the trackable collections of their parent entity (and then you shouldn't remove them).
This, which we call "StripEntity", is also mentioned (though without any code sample) in Julie Lerman's Programming Entity Framework.
And although it might not be as efficient as a more purist kind of approach, the use of STEs saves a lot of code for queries against the database. We do not need optimal performance in a high-traffic situation, so STEs suit our needs and take away a lot of the code needed to communicate with the database. You have to decide what the "best" solution is for your situation. Good luck!
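A minimal sketch of that stripping pass, assuming the standard STE template (every entity exposes ChangeTracker.State of type ObjectState, and tracking can be paused via ChangeTrackingEnabled); Category and Item are the question's entities, and StripItem is a hypothetical sibling that recurses one level down:

private static void StripCategory(Category category)
{
    // Pause tracking so the removals below aren't recorded as deletions.
    category.ChangeTracker.ChangeTrackingEnabled = false;

    // Walk the trackable collection backwards so removal is safe.
    for (int i = category.Items.Count - 1; i >= 0; i--)
    {
        Item item = category.Items[i];
        StripItem(item); // hypothetical: recurse into the item's own collections

        // Keep anything with pending changes; drop the unchanged rest.
        if (item.ChangeTracker.State == ObjectState.Unchanged)
        {
            category.Items.RemoveAt(i);
        }
    }

    category.ChangeTracker.ChangeTrackingEnabled = true;
}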
You can find an Entity Framework project item at http://selftrackingentity.codeplex.com/. With version 0.9.8, I added a method called GetObjectGraphChanges() that returns an optimized entity object graph with only the objects that have changes.
Also, there are two helper methods: EstimateObjectGraphSize() and EstimateObjectGraphChangeSize(). The former returns the estimated size of the whole entity object along with its object graph; the latter returns the estimated size of the optimized entity object graph with only the objects that have changes. With these two helper methods, you can decide whether it makes sense to call GetObjectGraphChanges() or not.
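Purely as an illustration of that decision (the method names come from the project description above, but the exact signatures, return types, and the service call are assumptions):

// Estimate both payloads, then send whichever graph is smaller.
long fullSize  = category.EstimateObjectGraphSize();        // whole graph
long deltaSize = category.EstimateObjectGraphChangeSize();  // changed objects only

if (deltaSize < fullSize)
{
    service.UpdateCategory(category.GetObjectGraphChanges()); // hypothetical WCF call
}
else
{
    service.UpdateCategory(category);
}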

Using CakePHP, how do I display a series of data without using foreach loops?

The application I'm building is a tennis draw, which has a list of matches organized by round. I've also given the matches a position number as a way to order them and to manage matches at the top of the draw. In a 32-player draw there are 16 matches in the first round, with match positions ordered from 1 to 16.
The draw is laid out using HTML and isn't just a series of table rows; it has some complexity to it. So in order to display match players and scores in the view, I need to place data out of sequence compared with a typical row-by-row display.
How can I display 16 matches of data without using a foreach loop? Is it best for each variable in the controller to hold a single record?
I had thought that in the controller I could have 16 variables, but I'd prefer to learn better DRY approaches as I learn PHP and CakePHP.
Much appreciated.
Just display the array elements.
<?php echo $matches['Match'][0]['Player1']?>
<?php echo $matches['Match'][12]['Score']?>
If you want to have a good look at the data array, to figure out which bits are where, you can use pr().
<?php pr($matches)?>
You can just echo out a specific part of an array; arrays are not exclusively tied to foreach loops. This is the approach I would take, although be aware that as your data changes between rounds you'll obviously need to either update the view or create a new one for each round.
Don't give up on foreach just yet.
If each row of data is "fairly complex", perhaps it's best to move it to an element.
If you need to display data outside of the typical table, maybe it would be better to extract that data using the awesome Set class, and then do your calculations and whatnot.
Although, as sibidiba has already mentioned, it would be best if you did all that in your model, so your views stay clean and clear. All the data you want to display outside your standard table should already be ready for display, prepared in your model and passed on to your views by your controller. As mentioned, the Set class is a powerful tool; see if you can use it.
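For example, a hedged CakePHP 1.x sketch (the round field and the match_row element are hypothetical names for this draw). In the controller, pull out just the first-round matches with the Set class, whose extract() method takes an XPath-like path over find() result arrays:

<?php
$firstRound = Set::extract('/Match[round=1]', $matches);
$this->set('firstRound', $firstRound);
?>

Then in the view, render each match through a reusable element instead of inline markup:

<?php foreach ($firstRound as $match): ?>
    <?php echo $this->element('match_row', array('match' => $match)); ?>
<?php endforeach; ?>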
If you are really unable to iterate through the data in your view (consider also nested foreach loops), you are passing the wrong data to the view!
Refactor your controller to pass data in an iterable form. Containables are really awesome in CakePHP.
