Is there a best way to make training data including entity? - ibm-watson

Can an intent use a wildcard in place of an entity in its training data?
When an intent is trained on examples that include an entity, the confidence changes depending on the entity value.
Is there a best way to build training data that includes entities?

By default, Watson Conversation automatically factors entities into the training of intents.
Just be aware that the entity values themselves are treated as keywords when they are associated with an intent.
For example:
If you created an entity with a value of xxx and your intent had the example "What is xxx", then any value in the xxx entity would be swapped in for it during training.
Now, if your entity had dog as a value, it would only look for that keyword. It wouldn't understand that the word is related to puppy, for example, unless you explicitly added puppy to the entity.
There are more details here. That was written before pattern entities existed.
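The keyword behaviour described above can be pictured with a small toy model. This is only a sketch of literal keyword matching, not Watson's actual implementation; the `ENTITIES` structure and the `find_entities` helper are hypothetical names made up for illustration.

```python
# Toy model of keyword entities: each value lists the exact words that count
# as a match. Hypothetical structure, not the Watson Conversation API.
ENTITIES = {
    "animal": {
        "dog": ["dog", "hound"],
    },
}


def find_entities(utterance, entities):
    """Return (entity, value) pairs whose value or synonym appears as a word."""
    cleaned = utterance.lower().replace("?", " ").replace(".", " ")
    words = set(cleaned.split())
    found = []
    for entity, values in entities.items():
        for value, synonyms in values.items():
            if any(synonym in words for synonym in synonyms):
                found.append((entity, value))
    return found


matched_dog = find_entities("What is a dog?", ENTITIES)
matched_puppy = find_entities("What is a puppy?", ENTITIES)

# "puppy" is only recognised once it is explicitly added to the entity.
ENTITIES["animal"]["dog"].append("puppy")
matched_puppy_after = find_entities("What is a puppy?", ENTITIES)
```

Here `matched_puppy` stays empty until puppy is listed explicitly, which is the limitation the answer points out.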

You cannot use wildcards in intents because that’s not how NLP works. It needs access to the full sentence to process it.
What I see in most projects are examples with and without entities. For example, an intent to buy a house could use this:
- I want to buy a house
- I have to purchase a place
- I need to buy
- I am going to acquire
- I have money for a
- I need a home
Assume you have an entity house with some synonyms: place, apartment, home, etc.
So you have an intent for buying that is general, but it also has some examples with entities.

Related

Generic relation for database

I have to design a generic entity that can refer to various other entities.
In my example, that would be a commentary entity inside a web application. You could post commentaries on users, classifieds, articles, varieties (botanical ones), and so on.
So that entity would be made like this:
As a matter of fact, the design (kind of) pattern would be this one:
What are the pros and cons of this kind of pattern?
What I see is:
Pros
It decreases the number of entities if the concept is the same (commentaries for example);
You can therefore easily manipulate heterogeneous objects;
You can aggregate these objects easily (e.g. this user's last commentaries in the whole site, presented easily in a same thread);
Cons
It invites abuse (overuse it and both your database and source code become ugly);
The database cannot enforce integrity on the reference, so that check must be done inside the application code.
What is the performance impact?
Conclusion
Is this kind of pattern suitable for a relational database? If not, what should we do instead?
Thank you in advance.
One more con:
This scheme relies on a mapping between values and names for the "entities" referred to by those values. Think of all the fun you'll have resolving issues where, in the TEST system, the ORDER entity has number 734 but in production it has number 256. You can use the entity names themselves as the values of your entity_id stuff, but you will never be able to avoid hardcoding values for them in your programs (or, say, in view definitions) anyway, thereby defeating whatever advantage you thought you could win.
This kind of scheme is a disease mostly suffered by OO programmers. They see structures that are largely similar and they have this instinctive reflex: "I must find a way to reuse the existing thing for this", forgetting that database design is not program design.
EDIT
(if it wasn't clear, this means my answer to your question "Is this kind of pattern suitable for a relational database?" is a principled "NO".)
This is the classic Polymorphic Association anti-pattern. There are a number of possible solutions:
1) Exclusive Arcs, e.g. for the Commentary entity:
   Id
   User_Id
   Classified_Id
   Article_Id
   Variety_Id
Where User_Id, Classified_Id, Article_Id and Variety_Id are nullable and exactly one must be not null.
2) Reverse the Relationship, e.g. remove the Target_Entity and Target_Entity_Id from the Commentary entity and create four new entities:
   User_Commentary: Commentary_Id, User_Id
   Classified_Commentary: Commentary_Id, Classified_Id
   Article_Commentary: Commentary_Id, Article_Id
   Variety_Commentary: Commentary_Id, Variety_Id
Where Commentary_Id is unique and relates to the Id in Commentary.
3) Create a super-type entity for User, Classified, Article and Variety and have the Commentary entity reference the unique attribute of this new entity.
You would need to decide which of these approaches you feel is most appropriate in your specific situation.
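As a rough illustration of the first option (Exclusive Arcs), here is a minimal sketch using SQLite through Python's standard `sqlite3` module. The table names (`user_account`, `classified`, `article`, `variety`, `commentary`), the `body` column, and the `try_insert` helper are assumptions made for the example; the point is only that the database itself can enforce both the foreign keys and the "exactly one is not null" rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE user_account (id INTEGER PRIMARY KEY);
CREATE TABLE classified   (id INTEGER PRIMARY KEY);
CREATE TABLE article      (id INTEGER PRIMARY KEY);
CREATE TABLE variety      (id INTEGER PRIMARY KEY);

CREATE TABLE commentary (
    id            INTEGER PRIMARY KEY,
    body          TEXT NOT NULL,
    user_id       INTEGER REFERENCES user_account(id),
    classified_id INTEGER REFERENCES classified(id),
    article_id    INTEGER REFERENCES article(id),
    variety_id    INTEGER REFERENCES variety(id),
    -- Exclusive arc: exactly one of the four targets must be set.
    CHECK (
        (user_id IS NOT NULL) + (classified_id IS NOT NULL) +
        (article_id IS NOT NULL) + (variety_id IS NOT NULL) = 1
    )
);
""")


def try_insert(sql):
    """Run an INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


conn.execute("INSERT INTO user_account (id) VALUES (1)")
conn.execute("INSERT INTO article (id) VALUES (1)")

one_target = try_insert(
    "INSERT INTO commentary (body, article_id) VALUES ('Nice read', 1)")
no_target = try_insert(
    "INSERT INTO commentary (body) VALUES ('orphan')")
two_targets = try_insert(
    "INSERT INTO commentary (body, user_id, article_id) VALUES ('both', 1, 1)")
missing_parent = try_insert(
    "INSERT INTO commentary (body, article_id) VALUES ('dangling', 99)")

stored = conn.execute("SELECT COUNT(*) FROM commentary").fetchone()[0]
```

Only the first insert is accepted. The other three are rejected by the database rather than by application code, which addresses the "no control in the database" con raised in the question.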

Should I put the user in the ancestor path or separately?

My app should contain several users, each of them having a list of objects (each object is owned by exactly one user).
My question is: would it be better to have a User entity that references the IDs of its objects, or should I make the user the ancestor of the objects? Please be kind, I am just beginning with NoSQL and Datastore!
The approach you take will depend heavily on your access patterns: what makes sense for easy retrieval, the frequency of writes, and so on. Start your design process by building a basic entity-relationship model, then elaborate on what information you need to get to, how frequently it is required, and what security restrictions apply. Then look at how you need to adjust the model to reflect these access use cases, taking into account performance, ease of use, and security requirements.
Which approach you should choose depends mainly on the consistency model (strong vs eventual) you require for your entities. In Google Cloud Datastore, an entity group (an entity and its descendants) is a unit with strong consistency, transactionality, and locality.
You can read more on the topic here and here.
There is one more important thing to take into account. If you model a parent-child relationship between a user and an object, the parent becomes part of the object's key. Hence, if you change the object's owner later, you will end up with a different object in terms of its key.
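The point about keys can be illustrated without the Datastore client library. The sketch below models a key as a plain path of (kind, id) pairs; the `Key` class and `child_key` helper are simplified stand-ins invented for this example, not the real Datastore API.

```python
from typing import NamedTuple, Tuple


class Key(NamedTuple):
    """Simplified stand-in for a Datastore key: a path of (kind, id) pairs."""
    path: Tuple[Tuple[str, str], ...]


def child_key(parent: Key, kind: str, ident: str) -> Key:
    """Build a key whose path starts with the parent's path."""
    return Key(parent.path + ((kind, ident),))


alice = Key((("User", "alice"),))
bob = Key((("User", "bob"),))

# Ancestor approach: the owner is baked into the object's key.
owned_by_alice = child_key(alice, "Item", "42")
owned_by_bob = child_key(bob, "Item", "42")
same_key_after_transfer = owned_by_alice == owned_by_bob

# Separate approach: the owner is an ordinary property, so the key is stable.
standalone = {"key": Key((("Item", "42"),)), "owner": alice}
key_before = standalone["key"]
standalone["owner"] = bob
key_unchanged = standalone["key"] == key_before
```

With the ancestor approach, "the same" item under a new owner has a different key, so a transfer amounts to creating a new entity and deleting the old one. With the owner stored as a property, the key is unaffected.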

Database Design - Best way to manage these relationships

I am working on a project (based in Django although that's not really relevant to my question) and I am struggling to work out the best way to represent the data models.
I have the four following models:
User,
Client,
Meeting,
Location
User and Client have a many-to-many relationship through the Meeting model. The Meeting model has a one-to-one relationship with the Location model.
Meetings will take place at either:
The address defined in the User (or UserProfile) model
The address defined in the Client model.
Some other location which has to be defined at a later date.
I'm struggling to work out the best way to store the Location data in order to make it as clean and reusable as possible.
I considered making Location as a field in the Meetings model rather than a model in its own right - although this could also lead to redundant data if lots of Meetings are created at the same location, so this is probably a non-starter.
I could automatically create Location records for each User and Client that gets created and use a generic relationship between the relevant records, however, I understand that this can lead to inefficient database performance. Also, not every Client / User would be able to hold meetings at their Location.
Can anyone see a tidier alternative?
Any advice appreciated.
Thanks.
"I considered making Location as a field in the Meetings model rather than a model in its own right - although this could also lead to redundant data if lots of Meetings are created at the same location, so this is probably a non-starter."
No, that's a really good thought, because it points you straight at the real problem.
The real problem is that there's a difference between a meeting and the parties that attend a meeting. A meeting has some attributes that have nothing to do with the attendees: it has at the very least a time and a place.
So I think you should change your thinking about the Meeting model.
Instead of users having a M:N relationship with clients through the Meeting model, they should have a M:N relationship through, say, an Attendance model. (A Registration or Reservation or MightAttend model might be more appropriate for you.) And the Meeting model should change to reflect the unique attributes of a real-world meeting: time and place.
I would expect Meetings and Locations to have a many-to-one relationship. Can't a location be used for more than one meeting? (at different times, of course)
It seems to me that a location has attributes that persist beyond its use for a single meeting. Example: seating capacity.
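One way to picture the suggested split is with plain Python dataclasses rather than Django models. This is only a sketch: the field names, the string identifiers for attendees, and the `attendees` helper are assumptions for illustration, and a real Django version would use `ForeignKey` fields instead of object references.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Location:
    name: str
    seating_capacity: int  # persists beyond any single meeting


@dataclass
class Meeting:
    starts_at: datetime
    location: Optional[Location] = None  # may be defined at a later date


@dataclass
class Attendance:
    """Links one party (a User or a Client) to one Meeting."""
    meeting: Meeting
    attendee: str  # simplified identifier such as "user:ann"


def attendees(meeting, records):
    """List everyone registered for the given meeting."""
    return [r.attendee for r in records if r.meeting is meeting]


office = Location("Client office", seating_capacity=12)
m1 = Meeting(datetime(2024, 1, 8, 9, 0), office)
m2 = Meeting(datetime(2024, 1, 9, 14, 0), office)  # same place, different time
m3 = Meeting(datetime(2024, 1, 10, 10, 0))          # place not decided yet

records = [
    Attendance(m1, "user:ann"),
    Attendance(m1, "client:acme"),
    Attendance(m2, "user:ann"),
]
m1_attendees = attendees(m1, records)
```

Many meetings point at one Location, so the address is stored once, and a meeting can exist before its location is known.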

ER modelling - Generating patient ID every visit

Hi I am doing an assignment on ER modelling and there is a part that I'm stuck on, here is an extract:
Patient is a person who is either admitted to the hospital or is registered in an outpatient program. Each patient has a patient number (ID), name, dob, and tele. Resident patients have a Date Admitted. Each outpatient is scheduled for zero or more return visits, which have data and comments. Each time a patient is admitted to the hospital or registered as an outpatient, they receive a new patient number.
I can't work out how to model the last sentence (the one in bold). Here is my attempt: http://tinypic.com/r/358dus9/4
Also, if anyone could check whether I've done it correctly, it would be highly appreciated. Thanks!
Sometimes assignments also contain "information" that is pretty much immaterial.
The purpose is precisely to learn to filter out the 'real' information from the noise.
With the caveat that there are dozens and dozens of ER dialects, each with its own peculiarities, ER does not have a way to express the information that "attribute x in entity y is to be autogenerated by the system". For this reason, as far as the actual ER modeling is concerned, your bold phrase is just noise.
I agree with Erwin on this one. I'll add that not needing a consistent structure for the patient means that you don't have to create another table for the patient; you can just put it into the ER case directly.
Generally, however, this is bad practice. In reality, you would still have a regular patients table with patients who remain identifiable across several visits. Then again, this is a class and, as we all know, the #1 rule is not to disobey the teacher (no matter how insane it is). The real lesson here is to learn how to take requirements, ask the client to clarify them, explain the consequences if they don't follow your advice on how the data should be modeled, and then go ahead with whatever they say, as the client has the final say.
It depends on the course that you're taking, as well. Microsoft SQL Server/SQL Express offers an autonumber setting, while Oracle does not feature this (although it's accomplished through this). As far as the modeling is concerned, there is no way to model that requirement specifically, as far as I know.
Entity-relationship diagrams are used to model the relationships and the data itself as it exists. What you're looking for is more of a UML approach to describing the process by which the data for that field is created.

Google Appengine: Is This a Good set of Entity Groups?

I am trying to wrap my head around Entity Groups in Google AppEngine. I understand them in general, but since it sounds like you cannot change the relationships once the object is created AND I have a big data migration to do, I want to try to get it right the first time.
I am making an Art site where members can sign up as a regular Member or as one of a handful of non-polymorphic Entity "types" (Artist, Venue, Organization, ArtistRepresentative, etc). Artists, for example, can have Artwork, which can in turn have other Relationships (Gallery, Media, etc). All these things are connected via References and I understand that you don't need Entity Groups to merely do References. However, some of the References NEED to exist, which is why I am looking at Entity Groups.
From the docs:
"A good rule of thumb for entity groups is that they should be about the size of a single user's worth of data or smaller."
That said, I have a couple hopefully yes/no questions.
Question 0: I gather you don't need Entity Groups just to do transactions. However, since Entity Groups are stored in the same region of Big Table, this helps cut down on consistency issues and race conditions. Is this a fair look at Entity Groups and Transactions together?
Question 1: When a child Entity is saved, do any parent objects get implicitly accessed/saved? i.e. If I set up an Entity Group with path Member/Artist/Artwork, if I save an Artwork object, do the Member and Artist objects get updated/accessed? I would think not, but I am just making sure.
Question 2: If the answer to Question 1 is yes, does the accessing/updating only travel up the path and not affect other children? i.e. If I update Artwork, no other Artwork child of Member is updated.
Question 3: Assuming it is very important that the Member and its associated account type entity exist when a user signs up and that only the user will be updating its Member and associated account type Entity, does it make sense to put these in Entity Groups together?
i.e. Member/Artist, Member/Organization, Member/Venue.
Similarly, assuming only the user will be able to update the Artwork entities, does it make sense to include those as well? Note: Media/Gallery/etc which are references to Artwork may be related to lots of Artwork, not just those owned by the user (i.e. many to many relations).
It makes sense to have all the user's bits in an entity group if it works the way I suspect (i.e. Q1/Q2 are "no"), since they will all be in the same region of BigTable. However, adding the Artwork to the entity group seems like it might violate the "keep it small" principle and, honestly, it may not need to be in Transactions aside from saving bandwidth/retries when users are uploading artwork images.
Any thoughts? Am I approaching Entity Groups wrong?
0: You do need entity groups for transactions among multiple entities
1: Modifying/accessing children does not modify/access a parent
2: N/A
3: Sounds reasonable. My feeling is, entity groups should not be used unless you need transactions among them.
It is not necessary to make the Artwork a child for permission purposes. But if you need transactional modification of them (including e.g. creation and deletion), it might be better. For example: if you delete an account, you delete the user entity, but before you delete the child you get DeadlineExceeded or the server crashes. Now you have an orphaned Artwork. If you have more than 1,000 Artworks for an Artist, you must delete in batches.
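The batching mentioned above might look roughly like the sketch below. It does not use the App Engine API: `delete_many` is a hypothetical callback standing in for whatever bulk-delete call you use, and the batch size of 500 is an arbitrary assumption. Deleting the children before the parent is also a design choice of this sketch, made so that an interrupted run leaves the account in place to retry instead of leaving orphaned Artwork.

```python
from typing import Callable, Iterator, List, Sequence


def batches(keys: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` keys."""
    for start in range(0, len(keys), size):
        yield list(keys[start:start + size])


def delete_account(
    user_key: str,
    artwork_keys: Sequence[str],
    delete_many: Callable[[List[str]], None],
    batch_size: int = 500,
) -> None:
    """Delete the children in batches first, then the parent last."""
    for batch in batches(artwork_keys, batch_size):
        delete_many(batch)
    delete_many([user_key])


# Usage with a fake delete function that just records each call.
calls: List[List[str]] = []
artworks = [f"artwork-{i}" for i in range(1200)]
delete_account("member-1", artworks, calls.append)
batch_sizes = [len(batch) for batch in calls]
```

If the process dies partway through, the Member entity still exists, so the deletion can simply be run again on whatever Artwork remains.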
Good luck!
