Watson Knowledge Studio: how to teach my model to get the recipe name?

I've been trying to figure out how to use Watson Knowledge Studio for a couple of weeks now. I've been working with cooking recipes to keep the data simple and easy to annotate.
My goal is to be able to submit a recipe as unstructured text and get a structured response with the recipe name, ingredients, cooking devices, budget, diet, etc.
It's actually doing OK so far, except for the recipe name.
So my question is: how do I teach the model to identify this very specific part (the recipe name), since it's almost always different?
Any advice welcome :)

In the "Annotator Component" of the Watson Knowledge studio, you have a component called Machine learning. Create a corpus of few representative documents and complete human annotation. You can use this set as training set for the machine learning component and see the statistics of the evaluation and fine tune the model. The process works like this:
Create type system (you can create custom dictionaries for auto annotate the documents) --> Create a document corpus of representative documents --> Human Annotate the documents (entities, relations & conferences) --> Submit the annotations--> Approve the annotations --> Create machine learning annotator --> select the document corpus --> Build Training Set, Test Set and Blind Set (or you can use the system proposed distribution) --> Train & Evaluate --> Check statistics --> Create a snapshot the version --> Deploy the version with your AlchemyAPI key --> Your model will be created.
Try the model with new documents to see how it performs; you can repeat the process to fine-tune it.
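Once the model is deployed, you can call it from code. Below is a minimal sketch (not part of the original answer) using the Python ibm-watson SDK; it assumes the WKS model has been deployed to a Natural Language Understanding instance, and the API key, service URL and custom model ID are placeholders you would replace with your own.

from ibm_watson import NaturalLanguageUnderstandingV1
from ibm_watson.natural_language_understanding_v1 import Features, EntitiesOptions
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Placeholders: replace with your NLU credentials and the model ID of the deployed WKS model.
authenticator = IAMAuthenticator('YOUR_NLU_APIKEY')
nlu = NaturalLanguageUnderstandingV1(version='2021-08-01', authenticator=authenticator)
nlu.set_service_url('YOUR_NLU_URL')

recipe_text = open('recipe.txt').read()

response = nlu.analyze(
    text=recipe_text,
    features=Features(
        entities=EntitiesOptions(model='YOUR_WKS_MODEL_ID')  # custom entities model from WKS
    )
).get_result()

# Each detected entity carries the type from your type system (e.g. RecipeName, Ingredient).
for entity in response.get('entities', []):
    print(entity['type'], '->', entity['text'])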
HTH
Gopal

Related

Is there a way to show a PDF in its original structure in the human review custom entity labelling in AWS SageMaker?

I have modified this sample to read PDFs in tabular format. I would like to keep the tabular structure of the original PDF when doing the human review process. I notice the custom worker task template uses the crowd-entity-annotation element, which seems to read only text. I am aware that the human review process reads from an S3 key which contains raw text written by the Textract process.
I have been considering writing to S3 using tabulate but I don't think that is the best solution. I would like to keep the structure and still have the ability to annotate custom entities.
Amazon Comprehend now natively supports detecting custom-defined entities in PDF documents. To do so, you can try the following steps:
Follow this GitHub README to start the annotation process for PDF documents.
Once the annotations are produced, you can use the Comprehend CreateEntityRecognizer API to train a custom entity model for semi-structured documents.
Once the entity recognizer is trained, you can use the StartEntitiesDetectionJob API to run inference on PDF documents (a rough boto3 sketch follows below).
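As a rough sketch (not part of the original answer), the two API calls look like this with boto3; the S3 URIs, IAM role ARN, entity types and region are placeholders, and the training InputDataConfig has to match whatever format the annotation step actually produced.

import boto3

comprehend = boto3.client('comprehend', region_name='us-east-1')

# 1. Train a custom entity recognizer from the annotated documents (placeholder values throughout).
training = comprehend.create_entity_recognizer(
    RecognizerName='my-pdf-entity-recognizer',
    LanguageCode='en',
    DataAccessRoleArn='arn:aws:iam::123456789012:role/ComprehendDataAccessRole',
    InputDataConfig={
        'EntityTypes': [{'Type': 'MY_ENTITY'}],
        'Documents': {'S3Uri': 's3://my-bucket/training-docs/', 'InputFormat': 'ONE_DOC_PER_FILE'},
        'Annotations': {'S3Uri': 's3://my-bucket/annotations/'},
    },
)
recognizer_arn = training['EntityRecognizerArn']

# 2. Once the recognizer reaches TRAINED status, run an inference job over the PDFs.
job = comprehend.start_entities_detection_job(
    JobName='pdf-entity-detection',
    LanguageCode='en',
    EntityRecognizerArn=recognizer_arn,
    DataAccessRoleArn='arn:aws:iam::123456789012:role/ComprehendDataAccessRole',
    InputDataConfig={'S3Uri': 's3://my-bucket/inference-pdfs/', 'InputFormat': 'ONE_DOC_PER_FILE'},
    OutputDataConfig={'S3Uri': 's3://my-bucket/output/'},
)
print(job['JobId'])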

Best way to save objects in database

The Story:
I'm trying to develop a large application in PHP. The main problem is that I have to deal with objects and I need to apply CRUD operations to them. Example: suppose we have this class diagram (a compiler):
Project { name:string, statements:list …}
Statement{ type:string }
IfStatement extends Statement { condition:Exp, …}
…
The question (what is the best design for the ERD or database?)
As far as I know, I have two solutions:
Serialize the main object and save it in the DB
Make a table for each class in the class diagram and link them with foreign keys
Note: I've read about ORMs, but I think that's similar to the 1st solution
Serializing your objects into your database is, in my opinion, not a good solution.
You must be able to find and edit each object's properties in your database.
It requires a simple logic structure, with relations and keys.
This way you'll be able to build a CRUD Admin Dashboard, read and edit all your data in a logical and reliable way.
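To make the second option concrete, here is a small sketch of the table-per-class layout from the question. It uses Python with SQLite only to keep the example self-contained (the question is about PHP/MySQL); what matters is the schema and the foreign keys, and the table and column names are just illustrative.

import sqlite3

# One table per class, linked by foreign keys (Project / Statement / IfStatement from the question).
conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE project (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE statement (
    id         INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES project(id),
    type       TEXT NOT NULL              -- 'if', 'while', ...
);
CREATE TABLE if_statement (
    statement_id INTEGER PRIMARY KEY REFERENCES statement(id),
    condition    TEXT NOT NULL            -- the condition expression
);
""")

# CRUD becomes plain SQL: every property can be found and edited directly.
conn.execute("INSERT INTO project (id, name) VALUES (1, 'compiler demo')")
conn.execute("INSERT INTO statement (id, project_id, type) VALUES (1, 1, 'if')")
conn.execute("INSERT INTO if_statement (statement_id, condition) VALUES (1, 'x > 0')")
conn.execute("UPDATE if_statement SET condition = 'x >= 0' WHERE statement_id = 1")

for row in conn.execute("""
    SELECT p.name, s.type, i.condition
    FROM project p
    JOIN statement s ON s.project_id = p.id
    JOIN if_statement i ON i.statement_id = s.id
"""):
    print(row)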
MySQL Workbench is a really good starting point; it works well alongside phpMyAdmin, and you can use it to build your database and create schemas.
Then if you're looking for a professional CRUD Generator I suggest this one: https://www.phpcrudgenerator.com/
[Disclaimer: I'm the author]
It can build your CRUD admin dashboard in a very simple way within a few minutes, and comes with many advanced features.
The admin dashboard generated by the CRUD Generator is powered by Bootstrap 4, jQuery, PHP objects and TWIG templates.

What is the formal model behind Sense/Net ECM?

First, I don't know if this is the right place to discuss ideas related to Sense/Net (SN) evolution and the process of learning about it!
Anyway, this is my story:
I have tried & tested some SN functionality, especially the content type definition (CTD); it is really elegant!
The Sense/Net wiki documentation gives us "know-how", and one could write 200 wiki pages about SN. All the included information is true. However, we don't have the complete model from which we can see the whole system and how all the cases derive from it.
I searched the SN codeplex.com pages but didn't find how SN evolved into a mature ECM platform.
I also searched Google using the following keywords:
"Document Management System Modeling"
"Role-based access control (RBAC) model"
.....
Please collaborate & help.
It's curious that no one from SenseNet has answered, but I'll give it a shot even though I don't know a lot of the history. I've been working with SenseNet for the last 4+ years, developed the pysensenet extension, communicate with the developers, and am familiar with the source code, so I know a bit about the framework.
The framework has evolved over the last 15+ years and is pretty remarkable. Here are a few facts and highlights:
The data model is at its core an XML Tree where each tree node has an internal representation as a C# class and can hold any number of properties/Fields. This is referred to as Content, and the database as the Content Repository.
The XML Tree is persisted in a SQL Database and uses Lucene.NET for indexing.
Content / data queries are made in Lucene and not SQL.
At one time the database was arbitrary (any SQL database), then stored procedures in MS SQL Server locked it into MS SQL, although more recently (SenseNet 7) it supports blob storage in MongoDB.
Fields can be one of 9 built-in field types, or a custom type that you define.
A node in the XML Tree, aka "Content", can hold a field that references another node somewhere else in the tree, like a linked list inside a tree! OK, a doubly linked list since both nodes can refer to each other. Very cool.
There is no "external model", or as SenseNet says, "Everything is Content".
The permission system is node-based and incredibly granular. For example, you can define permissions such that one role, group or person can only see the Content at a particular node. And it integrates with Active Directory.
All Content can be versioned and tracked. For example, a Content Type of "Contact" (person) could have versioning on for the person's name. This way, if someone changed their name, the Content Repository would have a history of all the name changes.
Hopefully this doesn't come off as a SenseNet marketing piece -- I don't work for them and don't benefit if you purchase a license -- but it may help you compare it to other technologies such as SharePoint and Alfresco.

Generate a series of documents based on SQL table

I am trying to formulate a proposal for an application that allows a user to print a batch of documents based on data stored in a SQL table. The SQL table indicates which documents are due and also contains all the demographic information. This is outside of what I normally do, and I am trying to see if there is a platform/application that already exists to do such a task.
For example
List of all documents: Document #1 - Document #10
Person 1 is due for document #: 1,5,7,8
Person 2 is due for document #: 2, 6
Person 3 is due for document #: 7,8,10
etc
Ideally, what I would like is for the user to be able to push a button and get a printed stack of documents that have been customized for each person, including basic demographic info like name, DOB, etc.
Like I said at the top, I already have all of the needed information in a database; I am just trying to figure out the best approach to move that information onto a document.
I have done some research and found that some people have used mail merge in Word or used Access as a front end, but I don't know if this is the best way. I've also found this document. Any advice would be greatly appreciated.
If I understand your problem correctly, it is two-fold: first, you need to find a way to generate documents based on data (mail merge), and second, you might need to print them too.
For document generation you have two basic approaches: template-based, or programmatically from scratch. I suppose that you will opt for the template-based approach, which basically means that you design (in MS Word) a template document (Word, RTF, ...) that contains placeholders and other tags designating the »dynamic« parts of the document. Then, at document generation time, you pass this template document and the data to a .NET library/processor, which populates the template with the data and returns the resulting document.
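The rest of this answer focuses on .NET libraries, but the flow itself is language-agnostic. Here is a rough sketch of the idea (a template with placeholders, plus one row of data per person, yields one generated document per person) using only Python's standard library; the table layout and the plain-text templates are made up and merely stand in for real Word templates.

import sqlite3
from string import Template

# Made-up plain-text templates standing in for Word/RTF templates with placeholders.
TEMPLATES = {
    1: Template('Document #1\nName: $name\nDOB: $dob\n...'),
    5: Template('Document #5\nDear $name, ...'),
}

# Assumed schema: person(person_id, name, dob) and documents_due(person_id, document_no).
conn = sqlite3.connect('patients.db')
rows = conn.execute("""
    SELECT p.person_id, p.name, p.dob, d.document_no
    FROM person p
    JOIN documents_due d ON d.person_id = p.person_id
""")

for person_id, name, dob, document_no in rows:
    template = TEMPLATES.get(document_no)
    if template is None:
        continue  # no template defined for this document number
    text = template.substitute(name=name, dob=dob)
    with open(f'out_{person_id}_doc{document_no}.txt', 'w') as fh:
        fh.write(text)

A real implementation would swap the plain-text templates for Word templates processed by a mail-merge library, and feed the generated files to a printing step.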
One way to achieve this functionality would be to employ MS Word's native mail merge, but you should know that this involves using Office COM and Word Application Automation, which should almost always be avoided.
Another option is to build such a system on top of the Open XML SDK. This is a valid option, but it will be a pretty demanding task and will most probably cost you much more than buying a commercial .NET library that does mail merge out of the box (been there, done that). But of course, the good side here is that you will be able to tailor the solution to your needs. If you go down this road, I recommend that you use Content Controls for tagging documents/templates. The solution with CCs will be much easier to implement than the solution with bookmarks.
I'm not very familiar with the open source solutions and I'm not sure how many there are that can do mail merge. One I know is FlexDoc (on CodePlex), but its problem is that it uses a construct (XmlControl) for tagging that is deprecated in Word 2010+.
Then there are commercial solutions. Again, I don't know them in detail, but I know that the majority of them are general-purpose document processing libraries. Our company has been using this document generation toolkit for some time now and I can say it covers all our »template-based document generation« needs. It doesn't require MS Word at document generation time, has a really helpful add-in for MS Word, and you only need several lines of code to integrate it into your project. Templating is very powerful and you can set up a template in a very short time. While the templates are Word documents, you can generate PDF or XPS documents as well. XPS is useful because you can use the .NET/WPF printing framework, which works with XPS documents, to print them. This is a very high-end solution, but of course, the downside is that it is not free.

Getting Started With Lift, Using Databases to Build Dynamic Sites

So I have been looking around the internet for a good explanation of how Lift works with databases. I have not found anything very helpful yet. What I am looking for is a simple explanation or code example that shows how Lift connects to its databases to perform transactions, and how to use this to create new tables and models or update and edit existing tables.
For example: with Django I fairly easily figured out how it generates database tables from model classes and executes updates on them through methods inherited from the framework.
I am trying to create a simple app at the moment that would have users, information about them, posts on a website, etc.
I am currently reading through the available Lift books and would greatly appreciate more help in learning how to use Lift.
Lift configures its data source in Boot.scala:
if (!DB.jndiJdbcConnAvailable_?) {
  val vendor =
    new StandardDBVendor(Props.get("db.driver") openOr "org.h2.Driver",
                         Props.get("db.url") openOr "jdbc:h2:lift_proto.db;AUTO_SERVER=TRUE",
                         Props.get("db.user"), Props.get("db.password"))
  LiftRules.unloadHooks.append(vendor.closeAllConnections_! _)
  DB.defineConnectionManager(DefaultConnectionIdentifier, vendor)
}
It can generate table schemas for you using Schemifier:
Schemifier.schemify(true, Schemifier.infoF _, User, Post, Tag, PostTags)
For a general Lift project, you can just use Lift Mapper as an ORM tool; it's not complete, but it works for most cases.
You can refer to the Lift wiki, Simply Lift (written by the author) or Exploring Lift.
From my perspective, the documentation available so far is rather disappointing.
It's said that Lift in Action is very well written, but it won't come out until this summer; you can read it through MEAP.
In the Exploring Lift book, the PocketChange example contains code showing how to define a User using MetaProtoUser and other features. I would start there for a better understanding of Lift, the model, and the built-in CRUD and User prototype objects.
http://exploring.liftweb.net/master/index-2.html#toc-Chapter-2
Keep in mind that the 'new' approach to DB integration will be via the Record. This is very much a work in progress, so I wouldn't rush to start learning it.
You can also look at the source for Lift in Action to get some ideas. Here's a link to the travel app built in the first couple chapters
https://github.com/timperrett/lift-travel
And to the source code for the entire book. Chapter 10 is the Mapper chapter.
https://github.com/timperrett/lift-in-action
The default ORM in Lift is Mapper, which gives you, among other things, a quick path to CRUD functionality for your DB entities.
However, if you would like a more traditional JPA persistence approach (or rather SPA, since the entities would in that case be written in Scala), I usually find the JPA-like sample application that is part of the Lift distribution very useful. To try it out, assuming Maven is installed, just type:
mvn archetype:generate -DarchetypeRepository=http://scala-tools.org/repo-snapshots -DarchetypeGroupId=net.liftweb -DarchetypeArtifactId=lift-archetype-jpa-basic_2.8.1 -DarchetypeVersion=2.3-SNAPSHOT -DgroupId=org.mycompany.myproject -DartifactId=MyProject -Dversion=1.0
This will create a MyProject Lift project containing a simple library application with 2 entities (Author and Book) that have a one-to-many relationship, as well as CRUD snippets showing how you can create and edit such entities in a JDBC-compliant database.
