From what I see, there's no way to upload multiple training sets to the new Watson NLC tooling. I need to manage separate training sets and their associated classifiers. What am I missing here?
Preferred option: Provision an NLC service instance for each set of training data you'd like to work with and separately access the tooling for each.
Workaround: Currently, the flow for managing multiple training sets in one NLC service instance is as follows:
1. (Optional, to start fresh) Go to the training data page and click the garbage icon to delete all training data.
2. Upload a training set on the training data page using the upload icon.
3. Manipulate the data as necessary: add texts and classes, tag texts with classes, etc.
4. Create a classifier. A classifier is essentially a snapshot of your current training data, since you can retrieve that data later from the classifiers page.
5. Repeat steps 1-4 as necessary until you have uploaded all of your training data sets and created the corresponding classifiers.
When you want to continue working on a previous training set:
1. Clear your training data (step 1 from above).
2. Go to the classifiers page.
3. Click the download icon for the classifier that contains the training data you'd like to work with.
4. Return to the training data page and upload the file downloaded in step 3. (See the scripted sketch below for one way to automate this snapshot-per-classifier approach.)
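As an aside: if you ever want to script this snapshot-per-classifier flow rather than click through the beta tooling, the NLC REST API's /v1/classifiers endpoint accepts the same CSV training data you upload in the tooling. Below is a hedged TypeScript sketch for Node.js 18+; the service URL, credentials, and CSV paths are placeholders for your own instance, and you keep one local CSV per training set because this does not pull training data back out of a classifier.

```typescript
// Hedged sketch: one NLC classifier per local training-set CSV.
// NLC_URL, the credentials, and the CSV path are placeholders for your service instance.
import { readFileSync } from "node:fs";

const NLC_URL = "https://gateway.watsonplatform.net/natural-language-classifier/api/v1/classifiers";

async function createClassifier(name: string, csvPath: string, user: string, pass: string) {
  const form = new FormData(); // global in Node 18+
  form.append("training_metadata", JSON.stringify({ language: "en", name }));
  form.append("training_data", new Blob([readFileSync(csvPath)]), "training.csv");

  const res = await fetch(NLC_URL, {
    method: "POST",
    headers: { Authorization: "Basic " + Buffer.from(`${user}:${pass}`).toString("base64") },
    body: form,
  });
  return res.json(); // includes classifier_id and the training status
}

// One call per training set gives you a named classifier "snapshot";
// GET /v1/classifiers (same auth) lists them all for that service instance.
```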
The best way to manage multiple training sets is to use a different NLC service instance for each training set.
The current beta NLC tooling is not intended to manage separate training sets within a single service instance. For example, the tool makes suggestions when you add texts without classes; these suggestions are based on the most recently trained classifier, which won't make sense if that classifier was trained on a completely different training set.
The workaround suggested by John Bufe will work if you have a hard limit on the number of NLC services you can use for some reason, e.g. you have reached your limit of Bluemix services. Cost is not a factor here, as additional NLC service instances will not increase the overall price: the monthly charge is for trained classifier instances. For example, if you have four service instances with a single classifier in each, you'll see 3 charged and 1 free.
If you want to use the NLC beta tooling to manage your training data, I would recommend using separate NLC services for each training set you require.
Currently, in our frontend project (AngularJS), we need to consume different endpoints built in a microservices architecture and show the data in a list view. We then need to allow users to sort the data by the columns they select. For example, we list 10 columns, of which 6 are rendered from Service A and the other 4 are pulled from Service B. The two services have no direct relational mapping; Service B returns its data based on the object id.
We have consolidated the list, shown the columns, and allowed users to choose the columns they want. As a next step, we need to let users sort any column seamlessly. Is there a best practice in the microservices paradigm for retrieving the data from both services, sorting it, and showing the result?
We have a few options:
1. Fetch all the data at once from both services and sort it in the frontend. The problem with this approach is that with larger data sets the user may experience slowness, and at times the browser can hang. We are using AngularJS in our project and already face slowness as the data set grows.
2. Introduce an intermediate API service (a lightweight Node.js server) that coordinates the request, internally fetches the data from the different services, and sends the result back.
3. Create an intermediate API service that caches the data, orchestrates the request, and responds with the data from multiple services.
Can anyone share any other practices that could be followed for the above use case? In the current microservices trend, every API is exposed as a separate service, which makes the frontend a bit complex: it has to coordinate calls across different APIs and present the data to users in the UI.
Any suggestions or approaches or hint will be helpful.
Thanks in advance.
Srini
Like you said, there are a few ways to handle your scenario. In my opinion the best approach would be option 2. It is similar to the Gateway Aggregation pattern, where you introduce a gateway layer to handle the aggregation of your service APIs. An added benefit is that you may be able to park some common functionality in this gateway layer if required.
Of course, the obvious drawback is that you now have another layer that needs to be highly available and managed, so do consider the pros and cons carefully before deciding on your approach. For example, if this is the only aggregation you will ever need, then option 3 may be a better choice.
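To make option 2 concrete, here is a minimal TypeScript sketch of a gateway-aggregation endpoint on Node.js 18+ (for the global fetch) with Express. The service URLs and the shared `id` field are assumptions of mine, not your actual contracts: the gateway joins Service B's columns onto Service A's rows and sorts on whichever column the UI asks for.

```typescript
import express from "express";

const app = express();
const SERVICE_A_URL = "http://service-a/api/items";   // hypothetical endpoint
const SERVICE_B_URL = "http://service-b/api/details"; // hypothetical endpoint

type Row = { id: string; [column: string]: unknown };

app.get("/api/list", async (req, res) => {
  const sortBy = String(req.query.sortBy ?? "id");
  const desc = req.query.order === "desc";

  // Call both microservices in parallel.
  const [aRes, bRes] = await Promise.all([fetch(SERVICE_A_URL), fetch(SERVICE_B_URL)]);
  const aRows = (await aRes.json()) as Row[];
  const bRows = (await bRes.json()) as Row[];

  // Join Service B's columns onto Service A's rows by object id.
  const bById = new Map(bRows.map((b) => [b.id, b]));
  const merged = aRows.map((a) => ({ ...a, ...(bById.get(a.id) ?? {}) }));

  // Sort on any of the consolidated columns before returning to the UI.
  merged.sort((x, y) => {
    const cmp = String(x[sortBy] ?? "").localeCompare(String(y[sortBy] ?? ""));
    return desc ? -cmp : cmp;
  });

  res.json(merged);
});

app.listen(3000);
```

A production version would also handle paging, numeric sort keys, and per-service error handling; option 3 is essentially this same sketch with a cache in front of the two fetch calls.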
I am trying to make a chatbot. All the chatbots I have looked at (Rasa, IBM Watson, and other well-known bots) are built from structured data. Is there a way to convert unstructured data into some sort of structure that can be used for bot training? Consider the paragraph below:
Packaging unit
A packaging unit is used to combine a certain quantity of identical items to form a group. The quantity specified here is then used when printing the item labels so that you do not have to label items individually when the items are not managed by serial number or by batch. You can also specify the dimensions of the packaging unit here and enable and disable them separately for each item.
It is possible to store several EAN numbers per packaging unit since these numbers may differ for each packaging unit even when the packaging units are identical. These settings can be found on the Miscellaneous tab:
There are also two more settings in the system settings that are relevant to mobile data entry:
When creating a new item, the item label should be printed automatically. For this reason, we have added the option ‘Print item label when creating new storage locations’ to the settings. When using mobile data entry devices, every item should be assigned to a storage location, where an item label is subsequently printed that should be applied to the shelf in the warehouse to help identify the item faster.
How can I make a bot from such data? Any lead would be highly appreciated. Thanks!
Will the idea in the attached picture (just_a_thought) work?
The data you are showing seems to be a good candidate for passage search. Basically, you would like to answer the user's question with the most relevant paragraph found in your training data. This use case is handled by the Watson Discovery service, which can analyze unstructured data such as yours; you can then query the service with input text and it answers with the closest passage found in the data.
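If you go the Discovery route, the query itself is small. Here is a hedged TypeScript sketch; the environment and collection IDs, the version date, and the API key are placeholders, and the natural_language_query and passages parameters are my recollection of the Discovery v1 query API.

```typescript
// Hedged sketch: ask Watson Discovery for the best-matching passage.
// ENV_ID, COLL_ID, the version date, and the API key are placeholders.
const DISCOVERY_QUERY_URL =
  "https://gateway.watsonplatform.net/discovery/api/v1/environments/ENV_ID/collections/COLL_ID/query?version=2018-03-05";

async function findPassage(question: string, apiKey: string): Promise<string | undefined> {
  const res = await fetch(DISCOVERY_QUERY_URL, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: "Basic " + Buffer.from(`apikey:${apiKey}`).toString("base64"),
    },
    body: JSON.stringify({ natural_language_query: question, passages: true, count: 3 }),
  });
  const data = (await res.json()) as { passages?: { passage_text: string }[] };
  return data.passages?.[0]?.passage_text; // the closest passage found in your documents
}
```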
In my experience you can also get good results by implementing your own custom TF/IDF algorithm tailored to your use case (TF/IDF is a nice similarity measure that takes care of things like stopwords for you, since very common words end up with low weight).
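To illustrate that custom route, here is a small TypeScript TF/IDF sketch of my own (the regex tokenizer and the length normalisation are deliberate simplifications): it scores every paragraph of your documentation against the user's question and returns the best match, and stopwords naturally receive low IDF weight.

```typescript
// Minimal TF/IDF similarity sketch: score every paragraph against a question.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

function bestParagraph(paragraphs: string[], question: string): string {
  const docs = paragraphs.map(tokenize);
  const n = docs.length;

  // Document frequency per term; frequent terms (stopwords) end up with low IDF.
  const df = new Map<string, number>();
  for (const doc of docs) {
    for (const term of new Set(doc)) df.set(term, (df.get(term) ?? 0) + 1);
  }
  const idf = (term: string) => Math.log((n + 1) / ((df.get(term) ?? 0) + 1)) + 1;

  const queryTerms = tokenize(question);
  const scores = docs.map((doc) => {
    const tf = new Map<string, number>();
    for (const term of doc) tf.set(term, (tf.get(term) ?? 0) + 1);
    // Sum of tf * idf over the question terms, normalised by paragraph length.
    let score = 0;
    for (const term of queryTerms) score += (tf.get(term) ?? 0) * idf(term);
    return doc.length > 0 ? score / doc.length : 0;
  });

  return paragraphs[scores.indexOf(Math.max(...scores))];
}

// Usage: the paragraph with the highest score becomes the bot's answer.
const answer = bestParagraph(
  ["A packaging unit is used to combine a certain quantity of identical items ...",
   "It is possible to store several EAN numbers per packaging unit ..."],
  "How many EAN numbers can a packaging unit have?"
);
console.log(answer);
```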
If your goal is to bootstrap a rule-based chatbot from this kind of data, however, the data is not ideal. For a rule-based chatbot the best data is actual conversations between users asking questions about the target domain and answers from a subject matter expert. With the data you have, you might at least be able to do some analysis that helps you pinpoint the relevant topics and domains the chatbot should handle, but I think you will have a hard time using it to bootstrap a set of intents (the questions users will ask) for a rule-based chatbot.
TLDR
If I wanted to use a Watson service, I would start with Watson Discovery. Alternatively, I would implement my own search algorithm starting with TF/IDF (which maps rather nicely to your proposed solution).
I am trying to build a context-aware recommender system with Cloud ML Engine that uses the context prefiltering method (as described in slide 55, solution a), and I am using this Google Cloud tutorial (part 2) to build a demo. For the purposes of this demo I have split the dataset into Weekday and Weekend contexts and Noon and Afternoon contexts by timestamp.
In practice I will train four models, so that I can context-filter by Weekday-unknown, Weekend-unknown, unknown-Noon, unknown-Afternoon, Weekday-Afternoon, Weekday-Noon, and so on. The idea is to get predictions from all the relevant models for a user and then weight the resulting recommendations based on what is known about the context (where "unknown" means all the context models along that dimension are used and a weighted result is returned).
I need something that responds fast, and it seems I will unfortunately need some kind of middleware if I don't want to do the weighting in the frontend.
I know that App Engine has a prediction mode where it keeps the models in RAM, which guarantees fast responses because you don't have to bootstrap the prediction models; resolving the context would then be fast.
However, is there a simpler solution that would also guarantee similar performance on Google Cloud?
The reason I am using Cloud ML Engine is that building a context-aware recommender system this way makes the amount of hyperparameter tuning grow hugely. I don't want to do that manually; instead I use the Cloud ML Engine Bayesian hypertuner, so that I only need to tune the range of parameters one to three times per context model (with an automated script). This saves a lot of data scientist development time whenever the dataset is reiterated.
There are four possible solutions:
1. Learn 4 models and use SavedModel to save them. Then create a 5th model that restores the 4 saved models. This model has no trainable weights; it simply computes the context, applies the appropriate weight to each of the 4 saved models, and returns the value. It is this 5th model that you deploy.
2. Learn a single model. Make the context a categorical input to your model, i.e. follow the approach in https://arxiv.org/abs/1606.07792
3. Use a separate AppEngine service that computes the context, invokes the underlying 4 services, weighs them, and returns the result.
4. Use an AppEngine service written in Python that loads up all four saved models, invokes the 4 models, weights them, and returns the result.
Option 1 involves more coding and is quite tricky to get right.
Option 2 would be my choice, although it changes the model formulation from what you desire. If you go this route, here's sample code on MovieLens that you can adapt: https://github.com/GoogleCloudPlatform/cloudml-samples/tree/master/movielens
Option 3 introduces more latency because of the additional network overhead.
Option 4 reduces the network latency of option 3, but you lose the parallelism. You will have to experiment between options 3 and 4 to see which provides better overall performance; a sketch of the fan-out-and-weight logic they share follows below.
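For options 3 and 4, the middleware boils down to: work out the context weights, fan out to the models, and blend the scores. Here is a minimal TypeScript sketch of that logic; the predict URLs, the request body, and the `{ itemId, score }` response shape are placeholders I invented, not the real ML Engine endpoints or your model signature.

```typescript
type Context = "weekday-noon" | "weekday-afternoon" | "weekend-noon" | "weekend-afternoon";

const MODEL_URLS: Record<Context, string> = {
  "weekday-noon": "https://example.com/models/weekday-noon:predict",           // placeholder
  "weekday-afternoon": "https://example.com/models/weekday-afternoon:predict", // placeholder
  "weekend-noon": "https://example.com/models/weekend-noon:predict",           // placeholder
  "weekend-afternoon": "https://example.com/models/weekend-afternoon:predict", // placeholder
};

// Known context dimensions put all the weight on matching models; an unknown
// dimension spreads weight evenly across every model it could match.
function contextWeights(day?: "weekday" | "weekend", time?: "noon" | "afternoon"): Map<Context, number> {
  const matching = (Object.keys(MODEL_URLS) as Context[]).filter((ctx) => {
    const [d, t] = ctx.split("-");
    return (!day || d === day) && (!time || t === time);
  });
  return new Map(matching.map((ctx) => [ctx, 1 / matching.length]));
}

async function recommend(userId: string, day?: "weekday" | "weekend", time?: "noon" | "afternoon") {
  const weights = contextWeights(day, time);
  const blended = new Map<string, number>(); // itemId -> weighted score

  await Promise.all(
    [...weights.entries()].map(async ([ctx, w]) => {
      const res = await fetch(MODEL_URLS[ctx], {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ instances: [{ userId }] }), // assumed request shape
      });
      const { predictions } = (await res.json()) as { predictions: { itemId: string; score: number }[] };
      for (const p of predictions) blended.set(p.itemId, (blended.get(p.itemId) ?? 0) + w * p.score);
    })
  );

  // Highest blended score first.
  return [...blended.entries()].sort((a, b) => b[1] - a[1]);
}
```

The weighting function also covers your "unknown" cases: an unknown dimension simply spreads the weight evenly over every model that could match.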
I am wondering if someone can provide some insight about an approach for Google Maps. I am currently developing a visualization with the Google Maps API v3. This visualization will map out polygons for countries, states, zip codes, cities, etc., as well as 3 other marker types (balloon, circle, etc.). The data is dynamically driven by an underlying report that can have filters applied and can be drilled to many levels. The biggest problem I am running into is dynamically rendering the polygons. The data necessary to generate a polygon with Google Maps v3 is large, and it requires a good deal of processing at runtime.
My thought is that, since my visualization will never let the user return very large data sets (e.g. all zip codes for the USA), I could employ dynamically created Fusion Tables.
Let's say each run of my report returns 50 states or 50 zip codes. Users can drill from state > zip.
On the first run of the visualization, users will run a report and it will return the state names and 4 metrics. Would it be possible to dynamically create a Fusion Table based on this information? Would I be able to pass through the 4 metrics and the formatting for all of the different markers to be drawn on the map?
On the second run, the user will drill from state to zip code. The report will then return 50 zip codes and 4 metrics. Could the initial table be dropped and another table created to build a map with the same requirements as above, providing the Fusion Table with the zip codes (22054, 55678, ...) plus the 4 metric values and formatting?
Sorry for being long winded. Even after reading the fusion table documentation I am not 100% certain on this.
Fully-hosted solution
If you can upload the full dataset and get Google to do the drill-down, you could check out the Google Maps Engine platform. It's built to handle big sets of geospatial data, so you don't have to do the heavy lifting.
Product page is here: http://www.google.com/intl/en/enterprise/mapsearth/products/mapsengine.html
API doco here: https://developers.google.com/maps-engine/
Details on hooking your data up with the normal Maps API here: https://developers.google.com/maps/documentation/javascript/mapsenginelayers
Dynamic hosted solution
However, since you want to do this dynamically it's a little trickier. Neither the Fusion Tables API nor the Maps Engine API currently supports table creation via the API, so your best option is to model your data in a consistent schema so you can create your table (on either platform) ahead of time and use the API to upload and delete data on demand.
For example, you could create a table in MapsEngine ahead of time for each drill-down level (e.g. one for state, one for zip-code) & use the batchInsert method to add data at run-time.
If you prefer Fusion Tables, you can use insert or importRows.
Client-side solution
The above solutions are fairly complex & you may be better off generating your shapes using the Maps v3 API drawing features (e.g. simple polygons).
If your data mapping is quite complex, you may find it easier to bind your data to a Google Map using D3.js. There's a good example here. Unfortunately, this does mean investigating yet another API.
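To illustrate the client-side route, here is a short TypeScript sketch using the Maps v3 drawing features mentioned above. The ShapeRecord type (a name, one metric, and a lat/lng path) is an assumed simplification of what your report might return, not a real API.

```typescript
// Assumes the Maps JavaScript API is loaded (and @types/google.maps for TypeScript).
interface ShapeRecord {
  name: string;                      // e.g. a state or a zip code
  metric: number;                    // one of the report's metric values
  path: google.maps.LatLngLiteral[]; // polygon outline
}

function renderShapes(map: google.maps.Map, shapes: ShapeRecord[]): google.maps.Polygon[] {
  return shapes.map((shape) => new google.maps.Polygon({
    paths: shape.path,
    strokeColor: "#333333",
    strokeWeight: 1,
    fillColor: shape.metric > 100 ? "#cc0000" : "#3366cc", // style driven by the metric
    fillOpacity: 0.5,
    map,
  }));
}

// Drilling from state to zip is then just: clear the previous polygons and
// render the next set returned by the report.
function clearShapes(polygons: google.maps.Polygon[]): void {
  polygons.forEach((p) => p.setMap(null));
}
```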
Hopefully someone has been down this road before and can offer some sound advice as to which direction I should take. I am currently involved in a project in which we will be utilizing a custom database to store data extracted from Excel files based on pre-established templates (to maintain consistency). We currently have a process (written in C#.NET 2008) that can extract the necessary data from the spreadsheets and import it into our custom database.
What I am primarily interested in is figuring out the best method for integrating that process with our portal. What I would like to do is let SharePoint keep track of the metadata about the spreadsheet itself and let the custom database keep track of the data contained within the spreadsheet. So, one thing I need is a way to link spreadsheets from SharePoint to the custom database and vice versa. As these spreadsheets will be updated periodically, I need a tried and true way of ensuring that the data remains synchronized between SharePoint and the custom database. I am also interested in finding out how to use the data from the custom database to create reports within the SharePoint portal. Any and all information will be greatly appreciated.
I have actually written a similar system in SharePoint for a large Financial institution as well.
The way we approached it was to have an event receiver on the Document library. Whenever a file was uploaded or updated the event receiver was triggered and we parsed through the data using Aspose.Cells.
The key to matching data in the excel sheet with the data in the database was a small header in a hidden sheet that contained information about the reporting period and data type. You could also use the SharePoint Item's unique ID as a key or the file's full path. It all depends a bit on how the system will be used and your exact requirements.
I think this might be awkward. The Business Data Catalog (BDC) functionality will enable you to tightly integrate with your database, but simultaneously trying to remain perpetually in sync with a separate spreadsheet might be tricky. I guess you could do it by catching the update events for the document library that handles the spreadsheets themselves and subsequently pushing the right info into your database. If you're going to do that, though, it's not clear to me why you can't choose just one or the other:
1. Spreadsheets in a document library, or
2. BDC integration with your database
If you go with #1, then you still have the ability to search within the documents themselves and updating them is painless. If you go with #2, you don't have to worry about sync'ing with an actual sheet after the initial load, and you could (for example) create forms as needed to allow people to modify the data.
Also, depending on your use case, you might benefit from the MOSS server-side Excel services. I think the "right" decision here might require more information about how you and your team expect to interact with these sheets and this data after it's initially uploaded into your SharePoint world.
So... I'm going to assume that you are leveraging Excel because it is an easy way to define, build, and test the math required. Your spreadsheet has a set of input data elements, a bunch of math, and then there are some output elements. Have you considered using Excel Services? In this scenario you would avoid running a batch process to generate your output elements. Instead, you can call Excel services directly in SharePoint and run through your calculations. More information: available online.
You can also surface information in SharePoint directly from the spreadsheet. For example, if you have a graph in the spreadsheet, you can link to that graph and expose it. When the data changes, so does the graph.
There are also some High Performance Computing (HPC) Excel options coming out from Microsoft in the near future. If your spreadsheet is really, really big then the Excel Services route might not work. There is some information available online (search for HPC excel - I can't post the link).